Skip to content

infocyde/x_list_scraper

Repository files navigation

X OSINT List Pull

Pull tweets from an X (Twitter) list, convert to Markdown, and automatically check for significant war developments using the Inception Labs Mercury AI model.

Setup

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Create a .env file in the directory above the project root one directory so your .env file isn't caught up in a git push.

INCEPTION_API_KEY=your_inception_labs_api_key

Cookies

Grab your X auth cookies by running:

python x_grab_cookies.py

The script will prompt you for two cookie values. To find them:

  1. Go to x.com in your browser and make sure you're logged in
  2. Press F12 to open Developer Tools
  3. Click the Application tab (Chrome/Edge) or Storage tab (Firefox)
  4. In the left sidebar, expand Cookies and click https://x.com
  5. Find auth_token in the list and copy its Value
  6. Find ct0 in the list and copy its Value
  7. Paste each value when the script prompts you

Configuration

Note: currently the id is my public OSINT list on X. You are fine to use that.

Edit config.json:

{
  "list_id": "YOUR_LIST_ID", 
  "tweet_count": 70,
  "scrape_media": false
}
  • list_id — The numeric X list ID. You can find this in the URL when viewing a list on x.com (e.g. https://x.com/i/lists/1672634236292169729). Change this to pull from any public list — an OSINT list, a news list, a tech list, etc.
  • tweet_count — Number of tweets to pull
  • scrape_media — Set to true to include photo/video URLs

Usage

Run the Full Pipeline

Double-click run_pipeline.bat or run:

python run_pipeline.py

This runs three steps in order, stopping if any step fails:

  1. Scrape — Pulls tweets from the X list to list_tweets.json
  2. Convert — Creates a timestamped Markdown file in markdown_output/
  3. Check — Sends the Markdown to the Mercury API to flag significant developments

Run Steps Individually

python x_list_scraper.py      # Scrape tweets to JSON
python x_to_markdown.py       # Convert JSON to Markdown
python check_significant.py   # Check for significant developments

Batch File

run_pipeline.bat is a Windows batch file that:

  1. Navigates to the project directory
  2. Activates the virtual environment
  3. Runs the full pipeline
  4. Stays open until you press a key

You can place a shortcut to run_pipeline.bat on your desktop or pin it to the taskbar for quick access.

What Counts as Significant

The AI check (defined in prompts/prompt.txt) flags these events:

  • US military casualties
  • US plane shot down
  • US ship hit or sunk
  • Major terrorist attack in Europe or the USA
  • US troops entering Iran (not just discussion/plans)
  • Iran surrenders

These criteria are defined in prompts/prompt.txt and can be freely edited without touching any code. The prompt uses a {tweets} placeholder that gets replaced with the tweet content at runtime. You could repurpose this for entirely different use cases — tracking stock-moving news, monitoring for natural disasters, flagging mentions of specific companies, etc. Just rewrite the prompt to match what you're looking for.

AI Model

The significance check uses Inception Labs' Mercury model, a diffusion-based language model. Unlike traditional autoregressive models that generate one token at a time, Mercury generates all tokens in parallel through an iterative denoising process, making it significantly faster at inference. The API follows the OpenAI chat completions format, so check_significant.py can be adapted to use virtually any model or provider (OpenAI, Anthropic, Ollama, etc.) by changing the endpoint URL, auth header, and model name.

Disclaimer

This tool uses twikit, an unofficial/reverse-engineered X (Twitter) client library. It is not authorized by X's official API program and operates in a legal and policy grey area. Using twikit may violate X's Terms of Service, and excessive or automated use could result in your account being rate-limited, temporarily locked, or permanently suspended. Use this tool at your own risk, keep request volumes low, and be aware that X could take action against accounts that engage in unofficial scraping.

Files

File Purpose
run_pipeline.bat One-click Windows launcher
run_pipeline.py Runs all three steps in sequence
x_grab_cookies.py Save auth cookies from browser DevTools
x_list_scraper.py Pull tweets from an X list
x_to_markdown.py Convert JSON output to Markdown
check_significant.py Check tweets for significant developments via AI
prompts/prompt.txt Editable prompt template for the AI check
config.json Scraper settings
.env API keys (not committed)
x_cookies.json Saved session cookies (not committed)

About

Trying out the Inception DLLM to do some simple analytics on a given topic of scaped content from an X list.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages