feat: Add support for Excel and JSON file uploads (fixes #71) #72

Open
ArshVermaGit wants to merge 1 commit into Payal-Dhokane:main from ArshVermaGit:main

Summary

Resolves #71

This PR adds support for Excel (.xlsx/.xls) and JSON file uploads to
DataWhisper, which previously accepted only CSV files.

Problem

The data loader (data_loader.py) hardcoded pd.read_csv() as the only file reader,
and the Streamlit file uploader in app.py restricted uploads to type=["csv"]. This
blocked users from uploading Excel or JSON files — two of the most common data formats
in enterprise and API-driven workflows.

Changes Made

src/data_loader.py — Refactored

  • Added _get_file_extension() to detect file type from both UploadedFile objects and string paths
  • Extracted CSV logic into _load_csv() (preserves multi-encoding fallback)
  • Added _load_excel() — reads .xlsx/.xls via pd.read_excel() with openpyxl engine
  • Added _load_json() — reads standard JSON with automatic fallback to JSON Lines (lines=True)
  • Created _LOADERS dispatch dict for clean extension → loader routing
  • Exported SUPPORTED_EXTENSIONS as single source of truth for accepted file types

app.py — Updated

  • Imported SUPPORTED_EXTENSIONS from data_loader
  • Updated st.file_uploader() to accept csv, xlsx, xls, json
  • Added help tooltip listing supported formats
  • Updated page header and instruction text to reflect multi-format support
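
As a rough illustration of the uploader change, the accepted types and the tooltip text can both be derived from the same extension list. `uploader_options` and the label string are hypothetical names for this sketch, and it assumes the extensions carry a leading dot; the `type` and `help` keywords are the ones `st.file_uploader` takes. The Streamlit call itself is shown as a comment so the snippet runs without Streamlit installed.

```python
def uploader_options(extensions):
    """Build the `type` and `help` keyword arguments for st.file_uploader."""
    # Drop any leading dot so the list reads csv, xlsx, xls, json.
    types = [ext.lstrip(".").lower() for ext in extensions]
    help_text = "Supported formats: " + ", ".join(t.upper() for t in types)
    return {"type": types, "help": help_text}


options = uploader_options((".csv", ".xlsx", ".xls", ".json"))

# In app.py this could then be passed along as:
#   uploaded = st.file_uploader("Upload a dataset", **options)
```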

requirements.txt — Updated

  • Added openpyxl>=3.1.0 (the engine pandas uses to read .xlsx files)

What's NOT Changed

No changes to eda.py, llm_insights.py, report_generator.py, chat.py, or
ui_components.py — these modules operate on pd.DataFrame objects and are
format-agnostic.

Testing

  • ✅ All modified files compile without syntax errors
  • ✅ File extension detection verified for .csv, .xlsx, .xls, .json, and unsupported types
  • ✅ Existing CSV functionality (including sample Titanic data) is fully preserved
  • ✅ Clear error messages are shown for unsupported file types and for a missing openpyxl package

Refactor data_loader.py to detect file type by extension and dispatch
to the appropriate pandas reader (CSV, Excel, JSON). Update the
file uploader in app.py to accept all supported formats and add
openpyxl dependency for Excel reading.

@ArshVermaGit (author) left a comment

All changes are tested and ready. Please review and merge when you get a chance. Thanks!

@ArshVermaGit

Hi @Payal-Dhokane! Issue #71 has been resolved. Please review the PR and merge it under GSSoC. Thanks!

Development

Successfully merging this pull request may close these issues.

Bug / Feature Request: Add Support for Excel (.xlsx/.xls) and JSON File Uploads