TGV2RayScraper

TGV2RayScraper is a Python project that collects data from Telegram channels, extracts V2Ray configurations from it, and then cleans, normalizes, and deduplicates them. It keeps channel information up to date and includes tools for managing the channel lists. Both synchronous and asynchronous tools are provided for data collection and V2Ray configuration processing.

The project requires Python 3.10 or higher.

For the Russian version, see docs/ru/README.md.

Quick Start

Clone the repository

Clones the project to your computer:

git clone https://github.com/denxv/TGV2RayScraper.git

Changes into the project directory:

cd TGV2RayScraper

Working with the `uv` command

All uv commands work the same on Linux, macOS, and Windows.

Creating a virtual environment

Creates the virtual environment (uv sync and uv run use it automatically, so manual activation is not required):

uv venv

Installing dependencies

Installs only the main dependencies for running the project:

uv sync --no-dev

Installs all dependencies, including dev packages for tests and development:

uv sync

Running the project

Runs the main project script:

uv run python main.py

Alternative way to run the project:

uv run main.py

This will update the channel list, collect data, and clean V2Ray configurations in a single run.

Testing and linting (only for development)

Runs all project tests automatically:

uv run pytest

Checks type correctness in all files:

uv run mypy .

Checks code style and errors:

uv run ruff check .

Working with the `pip` command

Creating a virtual environment

Creates a virtual environment for the project:

python -m venv venv

Activates the virtual environment on Linux/macOS:

source venv/bin/activate

Activates the virtual environment on Windows:

.\venv\Scripts\Activate.ps1

If PowerShell blocks script execution, temporarily allow it with:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process

Then run .\venv\Scripts\Activate.ps1 again.

Installing dependencies

Installs the required libraries for running the project:

pip install -r requirements.txt

Installs all dependencies, including dev packages for tests and development:

pip install -r requirements.txt -r requirements-dev.txt

Running the project

Runs the main project script:

python main.py

This will update the channel list, collect data, and clean V2Ray configurations in a single run.

Testing and linting (only for development)

Runs all project tests automatically:

pytest

Checks type correctness in all files:

mypy .

Checks code style and errors:

ruff check .

Dependencies

Main Dependencies

The project requires the following Python libraries (works with Python ≥ 3.10):

aiofiles – asynchronous file handling
asteval – safe evaluation of Python expressions (used for filtering configurations)
httpx – modern HTTP client with support for both synchronous and asynchronous requests
lxml – parsing and processing HTML/XML
tqdm – progress bar for long-running operations

The full list of dependencies is available in requirements.txt.

Development Dependencies (Dev-dependencies)

For development and testing of the project, additional tools are required:

mypy – type checking
pytest – testing framework
pytest-asyncio – support for asynchronous tests in pytest
pytest-cov – test coverage reporting
ruff – static code analysis and linting

All dev-dependencies are listed in requirements-dev.txt.

Project Structure

adapters/ — adapters for synchronous and asynchronous data operations
- async_/ — asynchronous implementations (channels, configurations, scraping)
  - channels.py — asynchronous operations with channels
  - configs.py — asynchronous processing of configurations
  - scraper.py — asynchronous channel data scraper
- sync/ — synchronous implementations
  - channels.py — synchronous operations with channels
  - configs.py — synchronous processing of configurations
  - scraper.py — synchronous channel data scraper
channels/ — folder for storing channel and URL list files
- current.json — main file with Telegram channel information
- urls.txt — main file with Telegram channel links
- backups of these files are also stored (e.g., current-backup-<timestamp>.json, urls-backup-<timestamp>.txt)
configs/ — folder for storing V2Ray configurations
- v2ray-clean.txt — cleaned configurations
- v2ray-raw.txt — raw configurations
core/ — core utilities and constants
- constants.py — constants, default paths, URL templates, regex patterns, script flags
- decorators.py — decorators (e.g., for logging)
- logger.py — logging utility with colored console output and microsecond timestamps
- typing.py — custom type aliases for the project (channels, V2Ray configs, CLI, sessions, etc.)
- utils.py — utility and helper functions
docs/ — project documentation in multiple languages
- ru/ — Russian documentation
  - README.md — user guide in Russian
  - LICENSE — project license in Russian
domain/ — business logic and domain-specific functions
- channel.py — operations with channels, sorting, filtering
- config.py — processing and normalization of configurations
- predicates.py — filtering logic and predicates
logs/ — folder for script logs
- log files with timestamps (e.g., 2020-10-10.log)
scripts/ — helper scripts for performing project tasks
- async_scraper.py — script collects data from Telegram channels asynchronously
- scraper.py — script collects data from Telegram channels synchronously
- update_channels.py — script to update channels (removing inactive channels and adding new ones)
- v2ray_cleaner.py — script cleans, normalizes, and processes obtained V2Ray configurations
tests/ — directory with all project tests, verifying correctness, stability, and module functionality (currently in progress...)
- e2e/ — end-to-end tests, covering full usage scenarios
- fixtures/ — helper files and test data
- integration/ — integration tests, checking module interactions
- unit/ — unit tests, checking individual functions and classes in isolation
- conftest.py — pytest configuration: fixtures, hooks, and common test settings
LICENSE — project license (default in English)
main.py — main script to run all project operations, including updating channels, collecting data, and processing configurations
pyproject.toml — configuration file for project metadata, dependencies, and development tools (e.g., mypy, ruff, pytest), centralizing build and tooling settings
README.md — main project documentation (default in English)
requirements-dev.txt — list of development dependencies (testing, type checking, linters — pytest, mypy, ruff, etc.)
requirements.txt — list of all required libraries for running the project
uv.lock — dependency lock file, recording exact package versions for a reproducible environment

Channel JSON Structure

The file channels/current.json stores metadata about Telegram channels. Top-level keys are channel usernames, and values are objects with channel state.

Example

{
    "channel_new_default": {
        "count": 0,
        "current_id": 1,
        "last_id": -1
    },
    "channel_is_not_live": {
        "count": -1,
        "current_id": 100,
        "last_id": -1
    },
    "channel_live": {
        "count": 500,
        "current_id": 100,
        "last_id": 100
    },
    "channel_will_be_deleted": {
        "count": -3,
        "current_id": 100,
        "last_id": -1
    }
}
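
A minimal sketch of reading this structure in Python. The names `ChannelState` and `load_channels` are illustrative assumptions and are not taken from the project's code; only the three field names come from the example above.

```python
import json
from typing import TypedDict


class ChannelState(TypedDict):
    """State stored for one channel (field names from the example above)."""
    count: int
    current_id: int
    last_id: int


def load_channels(text: str) -> dict[str, ChannelState]:
    """Parse the JSON text of a channels file into a username -> state mapping."""
    data = json.loads(text)
    return {
        name: ChannelState(
            count=int(state["count"]),
            current_id=int(state["current_id"]),
            last_id=int(state["last_id"]),
        )
        for name, state in data.items()
    }


sample = '{"channel_live": {"count": 500, "current_id": 100, "last_id": 100}}'
channels = load_channels(sample)
print(channels["channel_live"]["count"])  # 500
```

To read the real file, pass the contents of channels/current.json to `load_channels`.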

Field Description

count
- > 0 → number of V2Ray configurations found in an active channel (count >= 1)
- = 0 → nothing found, or channel temporarily unavailable (last_id = -1)
- < 0 → number of failed attempts to access the channel
  - Each failed attempt decreases the value (-1, -2, …).
  - When count <= -3, the channel is considered inactive and removed from current.json and urls.txt.
current_id
- starting message ID for scraping
- 1 → start from the beginning of the channel
- negative → take the last N messages
  - Example: if last_id = 150 and current_id = -100, the effective current_id is 150 - 100 = 50. Scraping will start from message 50 and move toward the last message (last_id = 150).
last_id
- latest message ID in the channel
- updated on each run
- -1 → channel temporarily or permanently unavailable
- otherwise, a positive integer
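
The field rules above can be sketched as two small functions. This is an illustration of the described semantics, not the project's implementation: the function names, the status labels, and the clamping of the start ID to 1 are assumptions.

```python
def effective_start_id(current_id: int, last_id: int) -> int:
    """Resolve the message ID at which scraping starts.

    A negative current_id means "the last N messages", so it is added to
    last_id. Clamping the result to 1 is an assumption of this sketch.
    """
    if current_id < 0:
        return max(1, last_id + current_id)
    return current_id


def channel_status(count: int, max_failures: int = 3) -> str:
    """Classify a channel by its count field (labels are hypothetical)."""
    if count > 0:
        return "active"
    if count == 0:
        return "empty"
    if count <= -max_failures:
        return "inactive"
    return "failing"


print(effective_start_id(-100, 150))  # 50
print(channel_status(-3))  # inactive
```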

Supported Protocols

The cleaned configuration file (configs/v2ray-clean.txt) contains entries in one of the following formats:

AnyTLS

anytls://password@host:port/path?params#name
anytls://password@host:port?params#name

Hy2 / Hysteria2

hy2://password@host:port/path?params#name
hy2://password@host:port?params#name
hysteria2://password@host:port/path?params#name
hysteria2://password@host:port?params#name

Shadowsocks / ShadowsocksR

ss://base64(method:password)@host:port#name
ss://method:password@host:port#name
ss://base64(method:password@host:port)#name
ssr://base64(host:port:protocol:method:obfs:base64(password)/?param=base64(value))

Trojan

trojan://password@host:port/path?params#name
trojan://password@host:port?params#name

TUIC

tuic://uuid:password@host:port/path?params#name
tuic://uuid:password@host:port?params#name

VLESS

vless://uuid@host:port/path?params#name
vless://uuid@host:port?params#name

VMess

vmess://base64(json)
vmess://uuid@host:port/path?params#name
vmess://uuid@host:port?params#name
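
A minimal sketch of decoding the vmess://base64(json) form. The function name `decode_vmess` and the keys in the sample payload are assumptions for illustration; the URL-style vmess variants are not handled here.

```python
import base64
import json


def decode_vmess(link: str) -> dict:
    """Decode a vmess://base64(json) link into a dictionary."""
    prefix = "vmess://"
    if not link.startswith(prefix):
        raise ValueError("not a vmess link")
    payload = link[len(prefix):].strip()
    # Links often omit base64 padding, so restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    # urlsafe_b64decode accepts both the standard and the URL-safe alphabet.
    return json.loads(base64.urlsafe_b64decode(payload))


# Build a sample link instead of hard-coding an encoded string.
config = {"add": "example.com", "port": "443", "ps": "demo"}
link = "vmess://" + base64.b64encode(json.dumps(config).encode()).decode()
print(decode_vmess(link)["add"])  # example.com
```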

WireGuard

wireguard://privatekey@host:port/path?params#name
wireguard://privatekey@host:port?params#name
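
Most of the formats above follow the same URL shape, so they can be split with the standard library. The sketch below is one possible approach, not the project's parser; `parse_link` and the keys of the returned dictionary are hypothetical. Base64-encoded forms (such as vmess://base64(json) or ssr://) need decoding first and are not covered.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_link(link: str) -> dict:
    """Split a URL-style config link into its components."""
    parts = urlsplit(link)
    return {
        "protocol": parts.scheme,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # Keep only the first value of each query parameter.
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
        "name": unquote(parts.fragment),
    }


entry = parse_link("trojan://secret@example.com:443?sni=example.com#My%20Node")
print(entry["host"], entry["port"], entry["name"])  # example.com 443 My Node
```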

Usage

1. Update Channels

You can run the channel update script as follows:

python -m scripts.update_channels

You can also prepend uv run before any python command to run it through uv.

An alternative method using PYTHONPATH is also available (shell syntax for Linux/macOS):

PYTHONPATH=. python scripts/update_channels.py

You can use the -h flag to see all available options:

python -m scripts.update_channels -h

Options include:

--no-dry-run — Disable dry run and actually assign current_id (check-only mode is enabled by default).
-B, --no-backup — Skip creating backup files for channel and Telegram URL lists before saving (backups are created by default).
-C, --channels FILE — Path to the input JSON file containing the list of channels (default: channels/current.json).
-D, --delete-channels — Delete channels that are unavailable or meet specific conditions (default: disabled).
-M, --message-offset N — Number of recent messages to include when assigning current_id.
-N, --include-new — Include new channels in processing.
-U, --urls FILE — Path to a text file containing new channel URLs (default: channels/urls.txt).

The script performs the following:

Loads the current list of channels from channels/current.json.
Merges with new URLs from channels/urls.txt.
By default, performs a dry run without making changes (--no-dry-run disables it).
Can assign current_id to channels based on a message offset (--message-offset).
Can include new channels in processing (--include-new).
Supports deletion of unavailable or flagged channels (--delete-channels).
Creates backup copies of both files with a timestamp (can be disabled using the --no-backup option).
Saves the updated list back to current.json and urls.txt.
Logs detailed warnings and debug information for each channel.
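
The merge and cleanup steps above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, each URL is assumed to end with the channel username (for example https://t.me/name), and new channels receive the default state shown for "channel_new_default" in the JSON example.

```python
def merge_urls(channels: dict[str, dict], urls: list[str]) -> dict[str, dict]:
    """Add channels found in a URL list to the channel mapping.

    Existing channels keep their state; new ones get the default state.
    """
    merged = dict(channels)
    for url in urls:
        # Assumption: the username is the last path segment of the URL.
        name = url.strip().rstrip("/").rsplit("/", 1)[-1]
        if name and name not in merged:
            merged[name] = {"count": 0, "current_id": 1, "last_id": -1}
    return merged


def drop_inactive(channels: dict[str, dict], max_failures: int = 3) -> dict[str, dict]:
    """Remove channels whose count has reached -max_failures or lower."""
    return {n: s for n, s in channels.items() if s["count"] > -max_failures}


current = {"old": {"count": -3, "current_id": 100, "last_id": -1}}
updated = drop_inactive(merge_urls(current, ["https://t.me/new_channel"]))
print(sorted(updated))  # ['new_channel']
```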

Example usage:

python -m scripts.update_channels -C channels/current.json --urls channels/urls.txt --delete-channels -M 50 --include-new --no-dry-run --no-backup


2. Running Scrapers

Asynchronous Scraper (faster, experimental)

You can run the asynchronous scraper as follows:

python -m scripts.async_scraper

You can also prepend uv run before any python command to run it through uv.

An alternative method using PYTHONPATH is also available (shell syntax for Linux/macOS):

PYTHONPATH=. python scripts/async_scraper.py

You can use the -h flag to see all available options:

python -m scripts.async_scraper -h

Options include:

-C, --channels FILE — Path to the input JSON file containing the list of channels (default: channels/current.json).
-E, --batch-extract N — Number of messages processed in parallel to extract V2Ray configs (default: 20).
-R, --configs-raw FILE — Path to the output file for saving scraped V2Ray configs (default: configs/v2ray-raw.txt).
-T, --time-out SECONDS — HTTP client timeout in seconds for requests used while updating channel info and extracting V2Ray configurations (default: 30.0).
-U, --batch-update N — Maximum number of channels updated in parallel (default: 100).
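
Limits such as --batch-update cap how many operations run at the same time. The sketch below shows one common way to do that with asyncio, using a semaphore. It is not the project's code, which may split work into fixed batches instead; `run_limited` and `fake_update` are hypothetical names, and the worker stands in for a real network request.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def run_limited(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[int]],
    limit: int,
) -> list[int]:
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: str) -> int:
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_update(channel: str) -> int:
    """Stand-in for a network request; returns the length of the name."""
    await asyncio.sleep(0)
    return len(channel)


print(asyncio.run(run_limited(["a", "bb", "ccc"], fake_update, limit=2)))  # [1, 2, 3]
```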

Example usage:

python -m scripts.async_scraper -E 20 -U 100 --time-out 30.0 -C channels/current.json -R configs/v2ray-raw.txt


Synchronous Scraper (simpler, slower)

You can run the synchronous scraper as follows:

python -m scripts.scraper

You can also prepend uv run before any python command to run it through uv.

Alternatively, you can run it with PYTHONPATH (shell syntax for Linux/macOS):

PYTHONPATH=. python scripts/scraper.py

Use -h to see all available options:

python -m scripts.scraper -h

Options include:

-C, --channels FILE — Path to the input JSON file containing the list of channels (default: channels/current.json).
-R, --configs-raw FILE — Path to the output file for saving scraped V2Ray configs (default: configs/v2ray-raw.txt).
-T, --time-out SECONDS — HTTP client timeout in seconds for requests used while updating channel info and extracting V2Ray configurations (default: 30.0).
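
Both scrapers extract V2Ray configurations from message text. A minimal sketch of that step with a regular expression follows. The pattern is an assumption built from the schemes listed under Supported Protocols, and `extract_links` is a hypothetical name; the project's own patterns live in core/constants.py and may differ.

```python
import re

# Schemes taken from the "Supported Protocols" section of this README.
SCHEMES = ("anytls", "hy2", "hysteria2", "ss", "ssr", "trojan",
           "tuic", "vless", "vmess", "wireguard")

# The lookbehind keeps a scheme from matching in the middle of a longer word.
LINK_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:" + "|".join(SCHEMES) + r")://[^\s<>\"']+"
)


def extract_links(text: str) -> list[str]:
    """Return every config link found in a message, in order of appearance."""
    return LINK_RE.findall(text)


message = "New: vless://id@example.com:443?type=ws#a and https://example.com"
print(extract_links(message))  # ['vless://id@example.com:443?type=ws#a']
```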

Example usage:

python -m scripts.scraper --time-out 30.0 -C channels/current.json -R configs/v2ray-raw.txt


3. Cleaning V2Ray Configurations

You can run the V2Ray configuration cleaner script as follows:

python -m scripts.v2ray_cleaner

You can also prepend uv run before any python command to run it through uv.

Alternatively, you can run it using PYTHONPATH (shell syntax for Linux/macOS):

PYTHONPATH=. python scripts/v2ray_cleaner.py

You can also run with -h to see all available options:

python -m scripts.v2ray_cleaner -h

Options include:

-D, --duplicate [FIELDS] — Remove duplicate entries by specified comma-separated fields. If used without value (e.g., -D), the default fields are protocol,host,port. If omitted, duplicates are not removed.
-F, --filter CONDITION — Filter entries using a Python-like condition. Example: "host == '1.1.1.1' and port > 1000". Only matching entries are kept.
-I, --configs-raw FILE — Path to the input file with raw V2Ray configs (default: configs/v2ray-raw.txt).
-N, --no-normalize — Disable normalization (enabled by default).
-O, --configs-clean FILE — Path to the output file for cleaned and processed configs (default: configs/v2ray-clean.txt).
-R, --reverse — Sort entries in descending order (only applies with --sort).
-S, --sort [FIELDS] — Sort entries by comma-separated fields. If used without value (e.g., -S), the default fields are host,port. If omitted, entries are not sorted.

The script performs the following:

Reads raw configs from configs/v2ray-raw.txt.
Applies regex-based filters and normalization.
Removes duplicates (if --duplicate is used).
Sorts entries (if --sort is used).
Saves cleaned and processed configs to configs/v2ray-clean.txt.
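
The deduplication and sorting steps can be sketched on parsed entries represented as dictionaries. The function names and the entry layout are assumptions; comparing values as strings is a simplification, so numeric fields such as port sort lexicographically in this sketch.

```python
def deduplicate(entries: list[dict], fields: list[str]) -> list[dict]:
    """Keep the first entry for each distinct combination of `fields`."""
    seen: set[tuple] = set()
    unique = []
    for entry in entries:
        key = tuple(entry.get(field) for field in fields)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


def sort_entries(entries: list[dict], fields: list[str], reverse: bool = False) -> list[dict]:
    """Sort by `fields`; values are compared as strings so mixed types do not fail."""
    return sorted(
        entries,
        key=lambda entry: tuple(str(entry.get(field, "")) for field in fields),
        reverse=reverse,
    )


entries = [
    {"protocol": "vless", "host": "b.example", "port": 443},
    {"protocol": "vless", "host": "a.example", "port": 443},
    {"protocol": "vless", "host": "b.example", "port": 443},
]
result = sort_entries(deduplicate(entries, ["protocol", "host", "port"]), ["host"])
print([e["host"] for e in result])  # ['a.example', 'b.example']
```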

Example usage:

python -m scripts.v2ray_cleaner -I configs/v2ray-raw.txt -O configs/v2ray-clean.txt --filter "re_search(r'speedtest|google', host)" -D "host, port" -S "protocol, host, port" --reverse


4. Running All Steps via `main.py`

python main.py

You can also prepend uv run before any python command to run it through uv.

You can also run with -h or --help-scripts to see all available options:

python main.py -h

python main.py --help-scripts

Options include:

-H, --help-scripts — Display help information for all internal pipeline scripts.
-N, --no-async — Use slower but simpler synchronous scraping mode instead of the default asynchronous mode.

The script performs the following:

Executes the pipeline steps in order:
1. update_channels.py – updates the list of channels.
2. async_scraper.py – collects channel data from Telegram asynchronously (faster, used by default), or scraper.py – collects it synchronously if --no-async is used (slower, simpler).
3. v2ray_cleaner.py – cleans, normalizes, and processes the scraped proxy configuration files.
Passes each script only the arguments relevant to it.
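
The step selection described above can be sketched as follows. How main.py actually invokes the scripts (subprocesses or direct imports) is not stated here, so the `python -m` commands and both function names are assumptions.

```python
import sys


def pipeline_modules(no_async: bool = False) -> list[str]:
    """Return the script modules to run, in order."""
    scraper = "scripts.scraper" if no_async else "scripts.async_scraper"
    return ["scripts.update_channels", scraper, "scripts.v2ray_cleaner"]


def build_commands(no_async: bool = False) -> list[list[str]]:
    """Build one `python -m <module>` command per pipeline step."""
    return [[sys.executable, "-m", module] for module in pipeline_modules(no_async)]


print(pipeline_modules(no_async=True))
# ['scripts.update_channels', 'scripts.scraper', 'scripts.v2ray_cleaner']
```

Each command from `build_commands` could then be passed to `subprocess.run`, stopping at the first step that fails.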

Example usage:

python main.py --batch-extract 10 --batch-update 100 --filter "host and port" --duplicate --sort "protocol" --reverse


Notes

Always update the channel list before running the scrapers.
Use the V2Ray cleaner after scraping to normalize configurations.
Scripts are provided as-is; use at your own risk.

Disclaimer

This software is provided "as-is". The author is not responsible for any damage, data loss, or other consequences resulting from the use of this software.

Important: Intended for educational/personal use only. The author is not responsible for:

Misuse, including spamming or overloading Telegram servers
Unauthorized data collection
Any legal, financial, or other consequences

Use responsibly and comply with platform terms.

License

This project is licensed under the MIT License – see the LICENSE file for details.
