TGV2RayScraper is a Python project that collects data from Telegram channels and extracts V2Ray configurations, then cleans, normalizes, and deduplicates them. It keeps channel information up to date and includes tools for managing channel lists. Both synchronous and asynchronous tools are provided for data collection and V2Ray configuration processing.
The project runs on Python version 3.10 or higher.
For the Russian version, see docs/ru/README.md.
Clones the project to your computer:
git clone https://github.com/denxv/TGV2RayScraper.git
Changes into the project directory:
cd TGV2RayScraper
All uv commands work the same on Linux, macOS, and Windows.
Creates the virtual environment (commands started with uv run use it without manual activation):
uv venv
Installs only the main dependencies for running the project:
uv sync --no-dev
Installs all dependencies, including dev packages for tests and development:
uv sync
Runs the main project script:
uv run python main.py
Alternative way to run the project:
uv run main.py
This will update the channel list, collect data, and clean V2Ray configurations in a single run.
Runs all project tests automatically:
uv run pytest
Checks type correctness in all files:
uv run mypy .
Checks code style and errors:
uv run ruff check .
Creates a virtual environment for the project:
python -m venv venv
Activates the virtual environment on Linux/macOS:
source venv/bin/activate
Activates the virtual environment on Windows:
.\venv\Scripts\Activate.ps1
If PowerShell blocks script execution, temporarily allow it with:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
Then run .\venv\Scripts\Activate.ps1 again.
Installs the required libraries for running the project:
pip install -r requirements.txt
Installs all dependencies, including dev packages for tests and development:
pip install -r requirements.txt -r requirements-dev.txt
Runs the main project script:
python main.py
This will update the channel list, collect data, and clean V2Ray configurations in a single run.
Runs all project tests automatically:
pytest
Checks type correctness in all files:
mypy .
Checks code style and errors:
ruff check .
The project requires the following Python libraries (works with Python ≥ 3.10):
- aiofiles – asynchronous file handling
- asteval – safe evaluation of Python expressions (used for filtering configurations)
- httpx – modern HTTP client with support for both synchronous and asynchronous requests
- lxml – parsing and processing HTML/XML
- tqdm – progress bar for long-running operations
The full list of dependencies is available in requirements.txt.
For development and testing of the project, additional tools are required:
- mypy – type checking
- pytest – testing framework
- pytest-asyncio – support for asynchronous tests in pytest
- pytest-cov – test coverage reporting
- ruff – static code analysis and linting
All dev-dependencies are listed in requirements-dev.txt.
- adapters/ – adapters for synchronous and asynchronous data operations
  - async_/ – asynchronous implementations (channels, configurations, scraping)
    - channels.py – asynchronous operations with channels
    - configs.py – asynchronous processing of configurations
    - scraper.py – asynchronous channel data scraper
  - sync/ – synchronous implementations
    - channels.py – synchronous operations with channels
    - configs.py – synchronous processing of configurations
    - scraper.py – synchronous channel data scraper
- channels/ – folder for storing channel and URL list files
  - current.json – main file with Telegram channel information
  - urls.txt – main file with Telegram channel links
  - backups of these files are also stored (e.g., current-backup-<timestamp>.json, urls-backup-<timestamp>.txt)
- configs/ – folder for storing V2Ray configurations
  - v2ray-clean.txt – cleaned configurations
  - v2ray-raw.txt – raw configurations
- core/ – core utilities and constants
  - constants.py – constants, default paths, URL templates, regex patterns, script flags
  - decorators.py – decorators (e.g., for logging)
  - logger.py – logging utility with colored console output and microsecond timestamps
  - typing.py – custom type aliases for the project (channels, V2Ray configs, CLI, sessions, etc.)
  - utils.py – utility and helper functions
- docs/ – project documentation in multiple languages
  - ru/ – Russian documentation
    - README.md – user guide in Russian
    - LICENSE – project license in Russian
- domain/ – business logic and domain-specific functions
  - channel.py – operations with channels, sorting, filtering
  - config.py – processing and normalization of configurations
  - predicates.py – filtering logic and predicates
- logs/ – folder for script logs
  - log files with timestamps (e.g., 2020-10-10.log)
- scripts/ – helper scripts for performing project tasks
  - async_scraper.py – collects data from Telegram channels asynchronously
  - scraper.py – collects data from Telegram channels synchronously
  - update_channels.py – updates channels (removes inactive channels and adds new ones)
  - v2ray_cleaner.py – cleans, normalizes, and processes the obtained V2Ray configurations
- tests/ – directory with all project tests, verifying correctness, stability, and module functionality (work in progress)
  - e2e/ – end-to-end tests, covering full usage scenarios
  - fixtures/ – helper files and test data
  - integration/ – integration tests, checking module interactions
  - unit/ – unit tests, checking individual functions and classes in isolation
  - conftest.py – pytest configuration: fixtures, hooks, and common test settings
- LICENSE – project license (default in English)
- main.py – main script to run all project operations, including updating channels, collecting data, and processing configurations
- pyproject.toml – configuration file for project metadata, dependencies, and development tools (e.g., mypy, ruff, pytest), centralizing build and tooling settings
- README.md – main project documentation (default in English)
- requirements-dev.txt – list of development dependencies (testing, type checking, linters: pytest, mypy, ruff, etc.)
- requirements.txt – list of all required libraries for running the project
- uv.lock – dependency lock file, recording exact package versions for a reproducible environment
The file channels/current.json stores metadata about Telegram channels. Top-level keys are channel usernames, and values are objects with channel state.
{
    "channel_new_default": {
        "count": 0,
        "current_id": 1,
        "last_id": -1
    },
    "channel_is_not_live": {
        "count": -1,
        "current_id": 100,
        "last_id": -1
    },
    "channel_live": {
        "count": 500,
        "current_id": 100,
        "last_id": 100
    },
    "channel_will_be_deleted": {
        "count": -3,
        "current_id": 100,
        "last_id": -1
    }
}
- count
  - > 0 → number of V2Ray configurations in an active channel (count = 1)
  - = 0 → nothing found, or the channel is temporarily unavailable (last_id = -1)
  - < 0 → number of failed attempts to access the channel
    - Each failed attempt decreases the value (-1, -2, …).
    - When count <= -3, the channel is considered inactive and is removed from current.json and urls.txt.
- current_id
  - starting message ID for scraping
  - 1 → start from the beginning of the channel
  - negative → take the last N messages
    - Example: if last_id = 150 and current_id = -100, the effective current_id is 150 - 100 = 50. Scraping starts from message 50 and moves toward the last message (last_id = 150).
- last_id
  - latest message ID in the channel
  - updated on each run
  - -1 → channel temporarily or permanently unavailable
  - otherwise, a positive integer
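The field rules above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the names `ChannelState`, `classify`, and `effective_start_id` are hypothetical, the returned labels are invented for readability, and clamping the start ID to 1 is an assumption.

```python
from typing import TypedDict


class ChannelState(TypedDict):
    """One value object from channels/current.json."""
    count: int
    current_id: int
    last_id: int


# Threshold taken from the description above: count <= -3 means removal.
MAX_FAILED_ATTEMPTS = 3


def classify(state: ChannelState) -> str:
    """Return a coarse, illustrative label for a channel entry."""
    if state["count"] <= -MAX_FAILED_ATTEMPTS:
        return "remove"
    if state["count"] < 0:
        return "failing"
    if state["last_id"] == -1:
        return "unavailable"
    return "active"


def effective_start_id(state: ChannelState) -> int:
    """Resolve a negative current_id relative to last_id."""
    current_id, last_id = state["current_id"], state["last_id"]
    if current_id >= 0:
        return current_id
    # Assumption: never start below the first message of the channel.
    return max(1, last_id + current_id)
```

With `last_id = 150` and `current_id = -100`, `effective_start_id` returns 50, matching the example in the list.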
The cleaned configuration file (configs/v2ray-clean.txt) contains entries in one of the following formats:
anytls://password@host:port/path?params#name
anytls://password@host:port?params#name
hy2://password@host:port/path?params#name
hy2://password@host:port?params#name
hysteria2://password@host:port/path?params#name
hysteria2://password@host:port?params#name
ss://base64(method:password)@host:port#name
ss://method:password@host:port#name
ss://base64(method:password@host:port)#name
ssr://base64(host:port:protocol:method:obfs:base64(password)/?param=base64(value))
trojan://password@host:port/path?params#name
trojan://password@host:port?params#name
tuic://uuid:password@host:port/path?params#name
tuic://uuid:password@host:port?params#name
vless://uuid@host:port/path?params#name
vless://uuid@host:port?params#name
vmess://base64(json)
vmess://uuid@host:port/path?params#name
vmess://uuid@host:port?params#name
wireguard://privatekey@host:port/path?params#name
wireguard://privatekey@host:port?params#name
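Most of the formats above are ordinary URLs, so their parts can be read with the standard library. The sketch below is one possible way to do it and is not taken from the project: `parse_url_config` and `decode_vmess_json` are hypothetical helpers, and the base64 variants of ss:// and ssr:// are deliberately left out.

```python
import base64
import json
from urllib.parse import urlsplit


def parse_url_config(line: str) -> dict:
    """Split a URL-style entry (vless, trojan, tuic, hy2, ...) into its parts."""
    parts = urlsplit(line.strip())
    return {
        "protocol": parts.scheme,
        "user": parts.username,   # uuid or password, depending on the protocol
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "params": parts.query,
        "name": parts.fragment,
    }


def decode_vmess_json(line: str) -> dict:
    """Decode the vmess://base64(json) form into a dictionary."""
    payload = line.strip().removeprefix("vmess://")
    # Restore padding that is often stripped from shared links.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.b64decode(payload))
```

For example, `parse_url_config("vless://uuid@example.com:443/path?security=tls#name")` yields the protocol `vless`, host `example.com`, and port `443`.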
You can run the channel update script as follows:
python -m scripts.update_channels
You can also prepend uv run to any python command to run it through uv.
An alternative method using PYTHONPATH is also available:
PYTHONPATH=. python scripts/update_channels.py
You can use the -h flag to see all available options:
python -m scripts.update_channels -h
Options include:
- --no-dry-run – Disable dry run and actually assign current_id (check-only mode is enabled by default).
- -B, --no-backup – Skip creating backup files for channel and Telegram URL lists before saving (backups are created by default).
- -C, --channels FILE – Path to the input JSON file containing the list of channels (default: channels/current.json).
- -D, --delete-channels – Delete channels that are unavailable or meet specific conditions (default: disabled).
- -M, --message-offset N – Number of recent messages to include when assigning current_id.
- -N, --include-new – Include new channels in processing.
- -U, --urls FILE – Path to a text file containing new channel URLs (default: channels/urls.txt).
The script performs the following:
- Loads the current list of channels from channels/current.json.
- Merges with new URLs from channels/urls.txt.
- By default, performs a dry run without making changes (--no-dry-run disables it).
- Allows assigning current_id to channels, taking the message offset into account (--message-offset).
- Can include new channels in processing (--include-new).
- Supports deletion of unavailable or flagged channels (--delete-channels).
- Creates backup copies of both files with a timestamp (can be disabled using the --no-backup option).
- Saves the updated list back to current.json and urls.txt.
- Logs detailed warnings and debug information for each channel.
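The merge and deletion steps listed above could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the script's real implementation: the helper names are hypothetical, links are assumed to end with the channel username (for example https://t.me/s/name), and new channels are assumed to receive the same default state as `channel_new_default` in the current.json example.

```python
# Default state for a newly added channel, mirroring "channel_new_default".
DEFAULT_STATE = {"count": 0, "current_id": 1, "last_id": -1}


def username_from_url(url: str) -> str:
    """Take the last path segment of a channel link (assumed link format)."""
    return url.strip().rstrip("/").rsplit("/", 1)[-1]


def merge_channels(channels: dict, urls: list[str]) -> dict:
    """Add unknown channels with the default state; keep existing entries as they are."""
    merged = dict(channels)
    for url in urls:
        name = username_from_url(url)
        if name and name not in merged:
            merged[name] = dict(DEFAULT_STATE)
    return merged


def drop_inactive(channels: dict, threshold: int = -3) -> dict:
    """Remove channels whose count has reached the failure threshold."""
    return {name: state for name, state in channels.items() if state["count"] > threshold}
```

Both helpers return new dictionaries instead of modifying their input, which keeps a dry run trivial: the result is simply not written back.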
Example usage:
python -m scripts.update_channels -C channels/current.json --urls channels/urls.txt --delete-channels -M 50 --include-new --no-dry-run --no-backup
You can add uv run before the python command to run it through uv.
You can run the asynchronous scraper as follows:
python -m scripts.async_scraper
You can also prepend uv run to any python command to run it through uv.
An alternative method using PYTHONPATH is also available:
PYTHONPATH=. python scripts/async_scraper.py
You can use the -h flag to see all available options:
python -m scripts.async_scraper -h
Options include:
- -C, --channels FILE – Path to the input JSON file containing the list of channels (default: channels/current.json).
- -E, --batch-extract N – Number of messages processed in parallel to extract V2Ray configs (default: 20).
- -R, --configs-raw FILE – Path to the output file for saving scraped V2Ray configs (default: configs/v2ray-raw.txt).
- -T, --time-out SECONDS – HTTP client timeout in seconds for requests used while updating channel info and extracting V2Ray configurations (default: 30.0).
- -U, --batch-update N – Maximum number of channels updated in parallel (default: 100).
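A limit such as --batch-update can be enforced in asyncio with a semaphore. The sketch below shows that general technique only; whether the scraper uses a semaphore or fixed-size batches is not stated here, and `update_channel` is a hypothetical placeholder for the real network request.

```python
import asyncio


async def update_channel(name: str) -> str:
    """Placeholder for a real network call (hypothetical)."""
    await asyncio.sleep(0)
    return name.upper()


async def update_all(names: list[str], batch_update: int = 100) -> list[str]:
    """Update channels with at most `batch_update` requests in flight at once."""
    semaphore = asyncio.Semaphore(batch_update)

    async def limited(name: str) -> str:
        async with semaphore:
            return await update_channel(name)

    # gather() returns results in the same order as the input names.
    return await asyncio.gather(*(limited(name) for name in names))
```

It can be driven with `asyncio.run(update_all(names, batch_update=100))`.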
Example usage:
python -m scripts.async_scraper -E 20 -U 100 --time-out 30.0 -C channels/current.json -R configs/v2ray-raw.txt
You can add uv run before the python command to run it through uv.
You can run the synchronous scraper as follows:
python -m scripts.scraper
You can also prepend uv run to any python command to run it through uv.
Alternatively, you can run it with PYTHONPATH:
PYTHONPATH=. python scripts/scraper.py
Use -h to see all available options:
python -m scripts.scraper -h
Options include:
- -C, --channels FILE – Path to the input JSON file containing the list of channels (default: channels/current.json).
- -R, --configs-raw FILE – Path to the output file for saving scraped V2Ray configs (default: configs/v2ray-raw.txt).
- -T, --time-out SECONDS – HTTP client timeout in seconds for requests used while updating channel info and extracting V2Ray configurations (default: 30.0).
Example usage:
python -m scripts.scraper --time-out 30.0 -C channels/current.json -R configs/v2ray-raw.txt
You can add uv run before the python command to run it through uv.
You can run the V2Ray configuration cleaner script as follows:
python -m scripts.v2ray_cleaner
You can also prepend uv run to any python command to run it through uv.
Alternatively, you can run it using PYTHONPATH:
PYTHONPATH=. python scripts/v2ray_cleaner.py
You can also run with -h to see all available options:
python -m scripts.v2ray_cleaner -h
Options include:
- -D, --duplicate [FIELDS] – Remove duplicate entries by the specified comma-separated fields. If used without a value (e.g., -D), the default fields are protocol, host, port. If omitted, duplicates are not removed.
- -F, --filter CONDITION – Filter entries using a Python-like condition. Example: "host == '1.1.1.1' and port > 1000". Only matching entries are kept.
- -I, --configs-raw FILE – Path to the input file with raw V2Ray configs (default: configs/v2ray-raw.txt).
- -N, --no-normalize – Disable normalization (enabled by default).
- -O, --configs-clean FILE – Path to the output file for cleaned and processed configs (default: configs/v2ray-clean.txt).
- -R, --reverse – Sort entries in descending order (only applies with --sort).
- -S, --sort [FIELDS] – Sort entries by comma-separated fields. If used without a value (e.g., -S), the default fields are host, port. If omitted, entries are not sorted.
The script performs the following:
- Reads raw configs from configs/v2ray-raw.txt.
- Applies regex-based filters and normalization.
- Removes duplicates (if --duplicate is used).
- Sorts entries (if --sort is used).
- Saves cleaned and processed configs to configs/v2ray-clean.txt.
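The deduplication and sorting steps above can be illustrated on already-parsed entries. This is a minimal sketch that assumes each config is a dictionary with protocol, host, and port keys; `deduplicate` and `sort_configs` are hypothetical names, and the default field lists are the ones quoted for -D and -S.

```python
def deduplicate(configs: list[dict], fields: tuple[str, ...] = ("protocol", "host", "port")) -> list[dict]:
    """Keep the first entry for every distinct combination of the given fields."""
    seen = set()
    unique = []
    for config in configs:
        key = tuple(config.get(field) for field in fields)
        if key not in seen:
            seen.add(key)
            unique.append(config)
    return unique


def sort_configs(configs: list[dict], fields: tuple[str, ...] = ("host", "port"), reverse: bool = False) -> list[dict]:
    """Sort by the given fields; reverse=True gives descending order."""
    # Assumes every entry has all sort fields with mutually comparable values.
    return sorted(configs, key=lambda config: tuple(config[field] for field in fields), reverse=reverse)
```

Deduplicating before sorting keeps the first occurrence in input order, so the result does not depend on the chosen sort fields.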
Example usage:
python -m scripts.v2ray_cleaner -I configs/v2ray-raw.txt -O configs/v2ray-clean.txt --filter "re_search(r'speedtest|google', host)" -D "host, port" -S "protocol, host, port" --reverse
You can add uv run before the python command to run it through uv.
You can run the full pipeline with the main script as follows:
python main.py
You can also prepend uv run to any python command to run it through uv.
You can also run with -h or --help-scripts to see all available options:
python main.py -h
python main.py --help-scripts
Options include:
- -H, --help-scripts – Display help information for all internal pipeline scripts.
- -N, --no-async – Use slower but simpler synchronous scraping mode instead of the default asynchronous mode.
The script performs the following:
- Executes all pipeline steps in order:
  - update_channels.py – updates the list of channels.
  - async_scraper.py – collects channel data from Telegram asynchronously (faster, used by default).
  - scraper.py – collects channel data synchronously if --no-async is used (slower, simpler).
  - v2ray_cleaner.py – cleans, normalizes, and processes the scraped proxy configuration files.
- Automatically passes each script only the arguments relevant to it.
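The step order above can be expressed as a short list of modules. The sketch below is an assumption about how such a pipeline might be wired, not a description of main.py: running each step in a subprocess is a choice made for the example (the real script may import and call the modules directly), and argument forwarding is omitted.

```python
import subprocess
import sys


def pipeline_modules(no_async: bool = False) -> list[str]:
    """Return the pipeline steps in execution order."""
    scraper = "scripts.scraper" if no_async else "scripts.async_scraper"
    return ["scripts.update_channels", scraper, "scripts.v2ray_cleaner"]


def run_pipeline(no_async: bool = False) -> None:
    """Run every step with the current interpreter."""
    for module in pipeline_modules(no_async):
        # check=True stops the pipeline at the first failing step.
        subprocess.run([sys.executable, "-m", module], check=True)
```

Calling `run_pipeline(no_async=True)` from the project root would use the synchronous scraper in place of the asynchronous one.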
Example usage:
python main.py --batch-extract 10 --batch-update 100 --filter "host and port" --duplicate --sort "protocol" --reverse
You can add uv run before the python command to run it through uv.
- Always update the channel list before running the scrapers.
- Use the V2Ray cleaner after scraping to normalize configurations.
- Scripts are provided as-is; use at your own risk.
This software is provided "as-is". The author is not responsible for any damage, data loss, or other consequences resulting from the use of this software.
Important: Intended for educational/personal use only. The author is not responsible for:
- Misuse, including spamming or overloading Telegram servers
- Unauthorized data collection
- Any legal, financial, or other consequences
Use responsibly and comply with platform terms.
This project is licensed under the MIT License – see the LICENSE file for details.