Deep Research Tool

The Deep Research Tool is an automated research assistant that uses iterative deep-search techniques to gather insights from the web. It combines large language models, SERP (Search Engine Results Page) queries, and a recursive research methodology to compile detailed reports. The tool offers both a command-line (console) interface and a web-based (Streamlit) interface.

Features

  • Iterative Deep Research:

    • Generates follow-up questions from the initial user query.
    • Creates tailored SERP queries based on ongoing research learnings.
    • Recursively drills down into topics by controlling research breadth and depth.
  • Report Generation:

    • Processes SERP results to extract key learnings and URLs.
    • Compiles all gathered information into a detailed final report.
  • Multiple Interfaces:

    • Console Application: Interactively guides the user via terminal inputs.
    • Streamlit Web App: Provides an easy-to-use UI with live progress updates and a downloadable final report.
  • Configurable Parameters:

    • Control search breadth (the number of query branches per level) and depth (the number of recursive search levels); for example, breadth 4 and depth 2 starts four query branches, each of which can spawn one further round of narrower searches.
    • Configure environment variables for API keys and endpoints.

Project Structure

.
├── src
│   ├── deep_research.py         # Core research logic, utility functions, and asynchronous deep research routine.
│   ├── console_app.py           # Console (CLI) application for running deep research.
│   └── streamlit_app.py         # Streamlit web application to run deep research with a GUI.
├── .env                       # Environment configuration (API keys, endpoints, etc.).
├── requirements.txt          # Python dependencies.
└── README.md                 # This file.

Prerequisites

  • Python 3.11+ is required.
  • Install the required packages using the provided requirements.txt.

Installation

  1. Clone the Repository:

    git clone https://github.com/mohocp/deep-research-python.git
    cd deep-research-python
  2. Create and Activate a Virtual Environment (Optional but Recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt

Configuration

Before running the application, set up your environment variables. You can do this by editing the provided .env file. The following keys must be configured:

  • LLM (Large Language Model) Settings:

    • LLM_KEY — API key for the language model (e.g., OpenAI).
    • LLM_MODEL — The model name (default: gpt-4o).
    • LLM_ENDPOINT — The endpoint URL for your LLM API (default: https://api.openai.com/v1).
  • Firecrawl (SERP Search) Settings:

    • FIRECRAWL_KEY — Your API key for Firecrawl.
    • FIRECRAWL_BASE_URL — Base URL for the Firecrawl API.
  • Other Parameters:

    • CONTEXT_SIZE — Maximum allowed context size (default: 128000).
    • MAX_OUTPUT_TOKENS — Maximum tokens for LLM responses (default: 8000).
    • BREADTH — Default breadth (number of query branches) for research (default: 4).
    • DEPTH — Default depth (recursion levels) for research (default: 2).

Note: If you are using a self-hosted Firecrawl instance or different LLM settings, update the .env file accordingly.
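
A minimal example .env using the keys above. The values shown are placeholders; substitute your own keys, and adjust the endpoints if you self-host Firecrawl or use a different LLM provider:

    # LLM settings
    LLM_KEY=sk-...
    LLM_MODEL=gpt-4o
    LLM_ENDPOINT=https://api.openai.com/v1

    # Firecrawl settings
    FIRECRAWL_KEY=fc-...
    FIRECRAWL_BASE_URL=https://api.firecrawl.dev

    # Research parameters
    CONTEXT_SIZE=128000
    MAX_OUTPUT_TOKENS=8000
    BREADTH=4
    DEPTH=2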

Usage

You can run either the console app or the Streamlit app.

1. Running the Console Application

The console app provides an interactive command-line interface:

python src/console_app.py
  • Step 1: Enter your research query.
  • Step 2: Answer follow-up questions generated by the tool.
  • Step 3: The tool performs deep research, displays progress, and saves the final report to output.md.

2. Running the Streamlit Web Application

The Streamlit app provides a graphical interface with live progress updates:

streamlit run src/streamlit_app.py
  • Step 1: Enter your research query and adjust the breadth/depth parameters if needed.
  • Step 2: Answer the generated follow-up questions.
  • Step 3: Watch the research progress in real time and view/download the final report directly from the browser.

How It Works

  1. Feedback & Query Generation:

    • The tool first asks follow-up questions based on the initial user query to clarify research direction.
    • It then generates multiple SERP queries using an assistant agent powered by a large language model.
  2. SERP Search & Processing:

    • Each generated query is run concurrently (subject to a concurrency limit).
    • The SERP results are processed to extract key learnings and URLs.
  3. Recursive Deep Research:

    • If additional depth is allowed, the tool recursively generates new queries based on follow-up questions and learnings from the current search (see the sketch after this list).
  4. Final Report Compilation:

    • All learnings and sources are combined into a detailed report.
    • The report is output as Markdown and saved locally (output.md for the console app).
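
To make steps 1-3 concrete, here is a simplified, self-contained sketch of the recursive loop. The helper functions and the breadth-halving rule are illustrative placeholders rather than the exact API of src/deep_research.py:

    import asyncio

    CONCURRENCY_LIMIT = 2  # maximum simultaneous SERP searches

    # --- Hypothetical helpers; the real logic lives in src/deep_research.py ---

    async def generate_serp_queries(query, n, learnings):
        # Placeholder: the real function asks the LLM for up to n search
        # queries, informed by the learnings gathered so far.
        return [f"{query} (angle {i + 1})" for i in range(n)]

    async def search_and_extract(serp_query):
        # Placeholder: the real code runs a Firecrawl search, then has the LLM
        # distill the results into (learnings, follow-up questions, source URLs).
        return [f"learning about {serp_query}"], [f"open question on {serp_query}"], []

    # --- The recursive deep-research loop ---

    async def deep_research(query, breadth, depth, learnings=None, urls=None):
        learnings, urls = learnings or [], urls or []
        queries = await generate_serp_queries(query, breadth, learnings)
        sem = asyncio.Semaphore(CONCURRENCY_LIMIT)

        async def run_one(q):
            async with sem:  # enforce the concurrency limit
                new_learnings, follow_ups, new_urls = await search_and_extract(q)
            if depth > 1:
                # Drill down: fold follow-up questions into the next query,
                # narrow the breadth, and spend one level of depth.
                next_query = q + " | follow-ups: " + "; ".join(follow_ups)
                return await deep_research(next_query, max(1, breadth // 2), depth - 1,
                                           learnings + new_learnings, urls + new_urls)
            return learnings + new_learnings, urls + new_urls

        results = await asyncio.gather(*(run_one(q) for q in queries))
        all_learnings = sorted({l for ls, _ in results for l in ls})  # deduplicate
        all_urls = sorted({u for _, us in results for u in us})
        return all_learnings, all_urls

    if __name__ == "__main__":
        print(asyncio.run(deep_research("history of solar power", breadth=2, depth=2)))

In the actual tool, the accumulated learnings and source URLs are then compiled into the final Markdown report (step 4).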

Troubleshooting & Notes

  • API Rate Limits:
    The tool implements exponential backoff when it encounters rate limits (a sketch of this retry pattern appears at the end of this section). If you see repeated rate-limit messages, consider reducing your API usage or reviewing the Firecrawl documentation.

  • Asynchronous Execution:
    Both the console and Streamlit apps use asynchronous programming (asyncio) to perform multiple searches concurrently; asyncio ships with the standard library in Python 3.11+, which this project already requires.

  • Customizing Prompts:
    You can adjust the research instructions and system prompt in src/deep_research.py to better fit your use case.
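
The rate-limit handling mentioned above follows the standard exponential-backoff pattern. A minimal sketch, with an illustrative exception class and retry parameters (not the tool's exact values):

    import asyncio
    import random

    class RateLimitError(Exception):
        # Stand-in for whatever rate-limit error your HTTP/LLM client raises.
        pass

    async def with_backoff(make_call, max_retries=5, base_delay=1.0):
        # Retry an async call, doubling the wait after each rate-limit error
        # and adding jitter so concurrent tasks do not retry in lockstep.
        for attempt in range(max_retries):
            try:
                return await make_call()
            except RateLimitError:
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
                print(f"Rate limited; retrying in {delay:.1f}s "
                      f"(attempt {attempt + 1}/{max_retries})")
                await asyncio.sleep(delay)
        raise RuntimeError("Giving up after repeated rate-limit errors")

Each search or LLM call would be wrapped in such a helper, e.g. await with_backoff(lambda: search(query)) for a hypothetical async search function.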

License

MIT License
