A full-stack, AI-powered web application built with Streamlit: upload a CSV file and it automatically generates insights, visualizations, and an interactive chat interface for exploring your data.
- Exploratory Data Analysis (EDA): Automatic generation of summary statistics, missing value heatmaps, correlation matrices, and distribution/count plots.
- AI-Powered Insights: Get plain-English insights (trends, correlations, anomalies) generated by Groq AI.
- Chat with Data: An interactive chat interface, powered by LangChain's pandas DataFrame agent, for asking questions directly about your data.
- Recommendations: Actionable preprocessing steps (handling missing values, outliers) suggested by AI.
- Report Export: Generate and download a comprehensive HTML report containing your visualizations and AI insights.
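The Recommendations feature asks the LLM for preprocessing advice. As a rough illustration of the kind of outlier check such a suggestion usually refers to, here is a minimal sketch of the interquartile-range (IQR) rule using only the standard library. The function name `iqr_outliers` and the 1.5 multiplier are illustrative assumptions, not taken from DataWhisper's code.

```python
from statistics import quantiles
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Hypothetical helper for illustration only; DataWhisper itself
    delegates recommendations to the LLM.
    """
    if len(values) < 4:
        # Too few points for meaningful quartiles.
        return []
    # method="inclusive" matches the linear interpolation used by
    # pandas' default Series.quantile().
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


sample = [10, 12, 11, 13, 12, 100]  # toy numbers, not from the Titanic data
print(iqr_outliers(sample))  # [100]
```

Here Q1 = 11.25 and Q3 = 12.75, so anything outside [9.0, 15.0] is flagged, which leaves only 100.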
```
DataWhisper/
├── app.py                    # Main Streamlit application
├── requirements.txt          # Project dependencies
├── README.md                 # Project documentation
├── sample_data/
│   └── titanic.csv           # Sample dataset for testing
└── src/
    ├── chat.py               # LangChain Pandas Agent integration
    ├── data_loader.py        # CSV file parsing and basic info
    ├── eda.py                # Matplotlib/Seaborn visualization logic
    ├── llm_insights.py       # LLM integration for dataset insights
    ├── recommendations.py    # LLM integration for data recommendations
    └── report_generator.py   # HTML report generation logic
```
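`src/data_loader.py` handles CSV parsing and basic info. Its implementation isn't shown here, and since the project lists pandas for data processing the real module likely differs; the sketch below uses only the standard library `csv` module to illustrate the kind of summary such a loader produces. The function `basic_info` and its return shape are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def basic_info(stream: TextIO) -> Dict[str, object]:
    """Summarise a CSV: row count, column names, and missing cells per column.

    Hypothetical helper. Empty (or whitespace-only) fields count as missing;
    pandas.read_csv likewise reads empty fields as NaN by default.
    """
    reader = csv.DictReader(stream)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Short rows yield None for absent fields; count those as missing too.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing}


sample = io.StringIO("Name,Age,Cabin\nAllen,29,B5\nBraund,22,\nCumings,,C85\n")
print(basic_info(sample))
# {'rows': 3, 'columns': ['Name', 'Age', 'Cabin'], 'missing': {'Name': 0, 'Age': 1, 'Cabin': 1}}
```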
- Python 3.9 or higher installed (download from python.org)
- Git installed (download from git-scm.com)
- An OpenAI API key (or Groq API key) for AI features
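To confirm the interpreter meets the Python 3.9 requirement before installing anything, a small check like the following works. `meets_minimum` is a hypothetical helper for illustration, not part of the project.

```python
import sys
from typing import Optional, Sequence, Tuple

MINIMUM = (3, 9)  # from the prerequisites above


def meets_minimum(version: Optional[Sequence[int]] = None,
                  minimum: Tuple[int, int] = MINIMUM) -> bool:
    """Return True if the (major, minor) part of `version` is at least `minimum`.

    Defaults to the running interpreter's sys.version_info.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if meets_minimum() else "too old, please upgrade"
    print(f"Python {found}: {status}")
```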
```bash
git clone https://github.com/Payal-Dhokane/DataWhisper.git
cd DataWhisper
```

Using a virtual environment prevents dependency conflicts with other Python projects.
On Windows:

```
python -m venv venv
venv\Scripts\activate
```

On macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

You should see `(venv)` appear in your terminal prompt, indicating that the virtual environment is active.
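If the `(venv)` prefix doesn't show up (some shells and IDE terminals hide it), you can check from Python itself. In an environment created with `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. The helper name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment.

    venv sets sys.prefix to the environment directory while sys.base_prefix
    keeps pointing at the base Python install. Older `virtualenv` releases
    set sys.real_prefix instead, so check for that too.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("sys.prefix =", sys.prefix)
```

The same check also works as a one-liner: `python -c "import sys; print(sys.prefix != sys.base_prefix)"`.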
```bash
pip install --upgrade pip
pip install -r requirements.txt
```

If you encounter installation errors, install the packages directly instead; if one still fails, install them one at a time to find the culprit:

```bash
pip install streamlit pandas numpy matplotlib seaborn plotly langchain langchain-core langchain-groq langchain-experimental tabulate PyYAML python-dotenv
```

Create a `.env` file in the project root directory:
```
# .env file
OPENAI_API_KEY=your_openai_api_key_here
# Optional: alternative to OpenAI
GROQ_API_KEY=your_groq_api_key_here
```

Replace `your_openai_api_key_here` with your actual API key from platform.openai.com. If you use Groq, set `GROQ_API_KEY` to a key from console.groq.com.
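`python-dotenv` (listed in the dependencies) is the usual way to load a `.env` file into environment variables. The sketch below shows one way to check which key is actually configured. The preference order (OpenAI first, Groq as the alternative, as the template comment suggests) and the placeholder check are assumptions for illustration, not DataWhisper's actual logic; `configured_provider` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# Assumed preference order, following the .env template: OpenAI first, Groq as the alternative.
PROVIDERS = (("openai", "OPENAI_API_KEY"), ("groq", "GROQ_API_KEY"))


def configured_provider(environ: Mapping[str, str]) -> Optional[str]:
    """Return the first provider whose key holds a real (non-placeholder) value."""
    for provider, var in PROVIDERS:
        value = environ.get(var, "").strip()
        # Skip unset keys and template placeholders such as "your_openai_api_key_here".
        if value and not value.startswith("your_"):
            return provider
    return None


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        # Finds the nearest .env file and loads it into os.environ
        # (variables that are already set are not overwritten).
        load_dotenv()
    except ImportError:
        print("python-dotenv not installed; using existing environment variables only")
    print("configured provider:", configured_provider(os.environ) or "none found")
```

A `None` result here corresponds to the "API key not found" row in the troubleshooting table.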
```bash
streamlit run app.py
```

This opens the app in your default web browser at http://localhost:8501.
| Error | Likely Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'streamlit'` | Dependencies not installed | Run `pip install -r requirements.txt` |
| `ImportError: cannot import name '...' from 'langchain'` | LangChain version mismatch | Run `pip install --upgrade langchain langchain-core langchain-experimental` |
| `OpenAI API key not found` | Missing `.env` file or API key | Create a `.env` file with `OPENAI_API_KEY=your_key` |
| Streamlit App - Module Not Found | Running from the wrong directory | Make sure you're in the project root (`cd DataWhisper`) |
| Permission denied on venv activation (Windows) | Execution policy restriction | Run PowerShell as Administrator and execute: `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser` |
| `pip: command not found` | Python/pip not in PATH | Reinstall Python and check "Add Python to PATH" during installation |
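Most of the import errors above come down to a package missing from the active environment. A quick way to check, without importing anything heavy, is `importlib.util.find_spec`, which returns `None` for a top-level module that can't be found. The package list mirrors the manual install command above; the pip-name-to-import-name mapping (for example, PyYAML is imported as `yaml`) is standard, but the `missing_modules` helper itself is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable, List

# pip package name -> top-level import name, following the install command above.
REQUIRED: Dict[str, str] = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "plotly": "plotly",
    "langchain": "langchain",
    "langchain-core": "langchain_core",
    "langchain-groq": "langchain_groq",
    "langchain-experimental": "langchain_experimental",
    "tabulate": "tabulate",
    "PyYAML": "yaml",
    "python-dotenv": "dotenv",
}


def missing_modules(import_names: Iterable[str]) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    by_module = {module: package for package, module in REQUIRED.items()}
    missing = missing_modules(by_module)
    if missing:
        packages = " ".join(by_module[m] for m in missing)
        print("Missing packages:", packages)
        print("Try: pip install " + packages)
    else:
        print("All required packages are importable.")
```

Run it with the virtual environment activated; otherwise it inspects whichever interpreter `python` resolves to.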
If you're still stuck, please open an issue.
- Launch the app with `streamlit run app.py`
- Upload a CSV file using the file uploader widget
- Explore the automatically generated visualizations and statistics
- Use the chat interface to ask questions about your data in plain English
- Download the HTML report for sharing or documentation
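The HTML report is produced by `src/report_generator.py`, whose internals aren't described here. As a rough sketch of one common approach, the function below builds a single self-contained HTML file by escaping the AI-generated text and embedding each chart as a base64-encoded PNG, so the report can be shared without separate image files. `build_report` and its parameters are hypothetical.

```python
import base64
import html
from typing import Dict, List


def build_report(title: str, insights: List[str], charts: Dict[str, bytes]) -> str:
    """Return a standalone HTML document.

    insights: plain-text paragraphs (e.g. LLM output), HTML-escaped before insertion.
    charts:   caption -> PNG bytes, inlined as data URIs so no external files are needed.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
        "<h2>AI Insights</h2>",
    ]
    parts += [f"<p>{html.escape(text)}</p>" for text in insights]
    parts.append("<h2>Visualizations</h2>")
    for caption, png in charts.items():
        encoded = base64.b64encode(png).decode("ascii")
        parts.append(
            f"<figure><img src='data:image/png;base64,{encoded}' "
            f"alt='{html.escape(caption)}'>"
            f"<figcaption>{html.escape(caption)}</figcaption></figure>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

With Matplotlib, the PNG bytes for a figure can be obtained by calling `fig.savefig(buffer, format="png")` on an `io.BytesIO` buffer and then `buffer.getvalue()`; the returned string can be written to disk with `pathlib.Path("report.html").write_text(report, encoding="utf-8")`.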
A sample Titanic dataset is included in the sample_data/ directory to test the app without your own data.
- Frontend: Streamlit
- Data Processing: Pandas, NumPy
- Visualization: Matplotlib, Seaborn, Plotly
- AI/LLM: LangChain, OpenAI / Groq
- Authentication: Streamlit-Authenticator, Streamlit-OAuth