The Deep Research Tool is an automated research assistant that uses iterative deep-search techniques to gather insights from the web. It leverages large language models, SERP (Search Engine Results Page) queries, and a recursive research methodology to compile detailed reports. The solution supports both a command-line (console) interface and a web-based (Streamlit) interface.
## Features

- **Iterative Deep Research**
  - Generates follow-up questions from the initial user query.
  - Creates tailored SERP queries based on ongoing research learnings.
  - Recursively drills down into topics by controlling research breadth and depth.
- **Report Generation**
  - Processes SERP results to extract key learnings and source URLs.
  - Compiles all gathered information into a detailed final report.
- **Multiple Interfaces**
  - Console application: interactively guides the user via terminal inputs.
  - Streamlit web app: provides an easy-to-use UI with live progress updates and a downloadable final report.
- **Configurable Parameters**
  - Control search breadth (number of query branches) and depth (levels of recursive search).
  - Configure environment variables for API keys and endpoints.
## Project Structure

```
.
├── src
│   ├── deep_research.py   # Core research logic, utility functions, and the asynchronous deep research routine.
│   ├── console_app.py     # Console (CLI) application for running deep research.
│   └── streamlit_app.py   # Streamlit web application to run deep research with a GUI.
├── .env                   # Environment configuration (API keys, endpoints, etc.).
├── requirements.txt       # Python dependencies.
└── README.md              # This file.
```
## Prerequisites

- Python 3.11+ is required.
- Install the required packages using the provided `requirements.txt`.
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/mohocp/deep-research-python.git
   cd deep-research-python
   ```

2. Create and activate a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate   # On Windows use: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Configuration

Before running the application, set up your environment variables by editing the provided `.env` file. The following keys must be configured:

- **LLM (Large Language Model) settings:**
  - `LLM_KEY`: API key for the language model (e.g., OpenAI).
  - `LLM_MODEL`: The model name (default example: `gpt-4o`).
  - `LLM_ENDPOINT`: The endpoint URL for your LLM API (default: `https://api.openai.com/v1`).
- **Firecrawl (SERP search) settings:**
  - `FIRECRAWL_KEY`: Your API key for Firecrawl.
  - `FIRECRAWL_BASE_URL`: Base URL for the Firecrawl API.
- **Other parameters:**
  - `CONTEXT_SIZE`: Maximum allowed context size (default: `128000`).
  - `MAX_OUTPUT_TOKENS`: Maximum tokens for LLM responses (default: `8000`).
  - `BREADTH`: Default breadth (number of query branches) for research (default: `4`).
  - `DEPTH`: Default depth (recursion levels) for research (default: `2`).

Note: If you are using a self-hosted Firecrawl instance or different LLM settings, update the `.env` file accordingly.
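For reference, a complete `.env` might look like the following. All values are placeholders; the Firecrawl base URL shown is the public cloud endpoint, so substitute your own if you self-host:

```env
# LLM settings
LLM_KEY=sk-your-openai-key
LLM_MODEL=gpt-4o
LLM_ENDPOINT=https://api.openai.com/v1

# Firecrawl settings
FIRECRAWL_KEY=fc-your-firecrawl-key
FIRECRAWL_BASE_URL=https://api.firecrawl.dev

# Research parameters
CONTEXT_SIZE=128000
MAX_OUTPUT_TOKENS=8000
BREADTH=4
DEPTH=2
```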
## Usage

### Console Application

The console app provides an interactive command-line interface:

```bash
python src/console_app.py
```

- Step 1: Enter your research query.
- Step 2: Answer follow-up questions generated by the tool.
- Step 3: The tool performs deep research, displays progress, and saves the final report to `output.md`.
### Streamlit Web App

The Streamlit app provides a graphical interface with live progress updates:

```bash
streamlit run src/streamlit_app.py
```

- Step 1: Enter your research query and adjust the breadth/depth parameters if needed.
- Step 2: Answer the generated follow-up questions.
- Step 3: Watch the research progress in real time and view/download the final report directly from the browser.
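If you are curious how live progress updates can be wired up, the sketch below shows one common pattern rather than the app's actual implementation: a placeholder coroutine stands in for the real research routine and reports progress through a callback that writes into a reserved Streamlit slot.

```python
import asyncio
import streamlit as st

async def fake_deep_research(query: str, on_progress) -> str:
    """Stand-in for the real research routine; reports progress as it runs."""
    for step in range(1, 5):
        await asyncio.sleep(1)  # placeholder for a SERP round trip
        on_progress(f"Completed query batch {step}/4")
    return f"# Report for: {query}\n\n(learnings would go here)"

status = st.empty()  # reserved slot that each progress update overwrites
query = st.text_input("Research query")
if st.button("Run research") and query:
    report = asyncio.run(fake_deep_research(query, status.write))
    st.markdown(report)
    st.download_button("Download report", report, file_name="report.md")
```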
## How It Works

1. **Feedback & Query Generation**
   - The tool first asks follow-up questions based on the initial user query to clarify the research direction.
   - It then generates multiple SERP queries using an assistant agent powered by a large language model.
2. **SERP Search & Processing**
   - Each generated query is run concurrently (subject to a concurrency limit).
   - The SERP results are processed to extract key learnings and URLs.
3. **Recursive Deep Research**
   - If additional depth is allowed, the tool recursively generates new queries based on follow-up questions and learnings from the current search (see the sketch after this list).
4. **Final Report Compilation**
   - All learnings and sources are combined into a detailed report.
   - The report is output as Markdown and saved locally (`output.md` for the console app).
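The recursive loop can be sketched as follows. This is an illustrative outline rather than the actual code in `deep_research.py`; the stub functions simulate the LLM and Firecrawl calls, and all names are hypothetical:

```python
import asyncio

semaphore = asyncio.Semaphore(2)  # concurrency limit for SERP searches

async def generate_serp_queries(query: str, breadth: int,
                                learnings: list[str]) -> list[str]:
    # Stub: the real tool prompts the LLM with the query and prior learnings.
    return [f"{query} (angle {i + 1})" for i in range(breadth)]

async def search_and_process(serp_query: str):
    # Stub: the real tool runs a Firecrawl search, then distills the results.
    await asyncio.sleep(0.1)  # simulate network latency
    return ([f"learning from {serp_query!r}"],
            [f"https://example.com/{abs(hash(serp_query)) % 1000}"],
            [f"follow-up question about {serp_query!r}"])

async def deep_research(query: str, breadth: int, depth: int):
    """Fan out `breadth` queries, then recurse until `depth` is exhausted."""
    learnings: set[str] = set()
    urls: set[str] = set()

    async def run_branch(serp_query: str) -> None:
        async with semaphore:  # cap concurrent searches
            new_learnings, new_urls, follow_ups = await search_and_process(serp_query)
        learnings.update(new_learnings)
        urls.update(new_urls)
        if depth > 1:  # drill down with halved breadth and one less depth level
            sub_learnings, sub_urls = await deep_research(
                " ".join(follow_ups), max(1, breadth // 2), depth - 1)
            learnings.update(sub_learnings)
            urls.update(sub_urls)

    queries = await generate_serp_queries(query, breadth, sorted(learnings))
    await asyncio.gather(*(run_branch(q) for q in queries))
    return learnings, urls

learnings, urls = asyncio.run(deep_research("solid-state batteries", breadth=4, depth=2))
print(f"Collected {len(learnings)} learnings from {len(urls)} sources")
```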
## Notes

**API Rate Limits:** The tool implements exponential backoff when it encounters rate limits. If you see repeated rate-limit messages, consider adjusting your API usage or reviewing the Firecrawl documentation.
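The pattern in question looks roughly like this; it is a sketch rather than the tool's exact code, and `RateLimitError` is a placeholder for whatever exception your HTTP client raises on an HTTP 429 response:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Placeholder for your HTTP client's rate-limit (HTTP 429) exception."""

async def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry an async callable, doubling the wait (plus jitter) on each rate limit."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RateLimitError:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)  # waits ~1s, 2s, 4s, 8s, ...
    raise RuntimeError("Rate limit persisted after all retries")

# Usage: result = await with_backoff(lambda: client.search(query))
```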
**Asynchronous Execution:** Both the console and Streamlit apps use asynchronous programming (`asyncio`) to perform multiple searches concurrently. Ensure your Python environment supports `asyncio` (Python 3.8+).
**Customizing Prompts:** You can adjust the research instructions and system prompt in `deep_research.py` to better fit your use case.
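For example, if the system prompt lives in a module-level string, customizing it is a one-line edit. The variable name below is illustrative; check `deep_research.py` for the actual definition:

```python
# In deep_research.py (name illustrative):
SYSTEM_PROMPT = (
    "You are an expert researcher. Prefer primary and peer-reviewed sources, "
    "flag speculation explicitly, and keep each learning to a single sentence."
)
```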