Developer: Cuma KURT
Email: [email protected]
GitHub
A modular, high-performance file and web domain metadata analyzer and sensitive data scanner.
- Modular Architecture: Clean separation of CLI, scanning, analysis, reporting, configuration, and utilities.
- Multiprocessing: Fast file scanning and analysis using parallel processing (default: 5 workers, configurable).
- Sensitive Data Detection: Finds sensitive patterns (e.g., passwords, emails, Turkish ID, credit cards) in files and web downloads.
- Archive Extraction: Automatically extracts and scans the contents of .zip, .rar, .tar, .gz, .bz2, .tgz, .tar.gz, .tar.bz2, and .tbz2 files (excluding password-protected archives).
- Office & PDF Analysis: Deep metadata and content analysis for Office (.doc, .docx, .xls, .xlsx, .ppt, .pptx) and PDF files, with fallback and external tool support (catdoc).
- HTML Reporting: Generates modern, interactive HTML reports with clickable file paths and downloadable sensitive data summaries.
- Sensitive Data Export: Download all detected sensitive data (type, value, file) as a .txt file directly from the report.
- Web Domain Scanning: Crawls and downloads files from web domains, with clear error messages for unreachable URLs.
- Configurable Logging: All debug and error messages are shown only with --debug; otherwise, logs are silent and third-party library logs are suppressed.
- Extensible Filtering: Filter results by user, date, keyword, and more.
- Testable: Includes unit tests for validators and filters.
- Optional Advanced Analyses: Enable with CLI flags:
  - --analyze-links: Extract links, emails, IPs, and domains from PDF, DOCX, and HTML files (shown in a separate exportable table in the report).
  - --analyze-macros: Detect macros, scripts, and OLE objects in Office files (shown in a separate exportable table in the report).
  - --analyze-stegano: Detect basic steganography (LSB) in images and hidden base64 data in text files (shown in a separate exportable table in the report).
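The multiprocessing feature listed above can be pictured with a small sketch. This is not the project's actual scanner: the names scan_file and scan_many and the e-mail pattern are hypothetical, and only the default of 5 workers comes from the feature list.

```python
import re
import sys
from multiprocessing import Pool

# Hypothetical pattern for illustration; the project's real patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_file(path):
    """Return (path, e-mail addresses found); unreadable files give an empty list."""
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            text = handle.read()
    except OSError:
        return path, []
    return path, EMAIL_RE.findall(text)


def scan_many(paths, workers=5):
    """Scan files in parallel; 5 workers mirrors the default stated above."""
    paths = list(paths)
    if not paths:
        return []
    with Pool(processes=workers) as pool:
        # Each worker process handles one file at a time.
        return pool.map(scan_file, paths)


if __name__ == "__main__":
    # Usage: python sketch.py file1.txt file2.txt
    for path, hits in scan_many(sys.argv[1:]):
        print(path, hits)
```

Keeping the per-file function free of shared state is what makes it safe to hand to a process pool.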
- Clone the repository:
git clone https://github.com/cumakurt/filemeta.git
cd filemeta/filemeta
- Install dependencies:
pip install -r requirements.txt
- For .rar archive support: run pip install rarfile and ensure unrar is installed on your system.
- For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).
- Clone the repository:
git clone https://github.com/cumakurt/filemeta.git
cd filemeta/filemeta
- Create and activate a virtual environment:
python3 -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
- Install dependencies:
pip install -r requirements.txt
- For .rar archive support: run pip install rarfile and ensure unrar is installed on your system.
- For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).
- Deactivate the virtual environment when done:
deactivate
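To confirm that the optional helpers mentioned in the installation steps are present, a quick check along these lines may help. It is only a sketch: check_optional_support is a hypothetical helper, not part of filemeta, and it only looks for the rarfile module and the unrar and catdoc executables on the current PATH.

```python
import importlib.util
import shutil


def check_optional_support():
    """Report which optional helpers from the install notes are available."""
    return {
        "rarfile module": importlib.util.find_spec("rarfile") is not None,
        "unrar binary": shutil.which("unrar") is not None,
        "catdoc binary": shutil.which("catdoc") is not None,
    }


if __name__ == "__main__":
    for name, available in check_optional_support().items():
        print(f"{name}: {'found' if available else 'missing'}")
```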
- Clone the repository:
git clone https://github.com/cumakurt/filemeta.git
cd filemeta
- Build Docker image:
docker build -t filemeta:latest .
docker run --rm filemeta:latest --help

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: This command sequence does the following:
- Mounts the /home/cuma/Downloads directory into the Docker container as /scan
- Scans this directory and creates reports in /home/appuser/app/filemeta/reports/
- Copies the reports to your current directory (host machine)
- Removes the used Docker container
docker run --name filemeta filemeta:latest -u https://www.cumakurt.com && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: This command sequence does the following:
- Scans the specified website (https://www.cumakurt.com)
- Saves the generated reports to the reports folder inside the container
- Copies the reports folder to your current directory (host machine)
- Removes the used Docker container and cleans up the environment
filemeta.py -p /path/to/files
filemeta.py -u https://example.com

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --analyze-links --analyze-macros --analyze-stegano && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

filemeta -p /path/to/files --analyze-links
filemeta -p /path/to/files --analyze-macros
filemeta -p /path/to/files --analyze-stegano
filemeta -p /path/to/files --analyze-links --analyze-macros --analyze-stegano

- By default, advanced analyses (links, macros, steganography) are disabled for performance. Enable them with the flags above.
- Run docker run --rm filemeta:latest --help (Docker) or filemeta --help (local) for all options and usage details.
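As a rough picture of what the --analyze-links analysis looks for, the sketch below pulls URLs, e-mail addresses, and IPv4 addresses out of plain text. It is an illustration under assumptions, not filemeta's implementation: extract_indicators and the regular expressions are hypothetical, and extracting text from PDF, DOCX, or HTML files is left out.

```python
import ipaddress
import re

# Deliberately simple, hypothetical patterns; real-world matching needs more care.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def _is_ipv4(candidate):
    """Reject look-alikes such as 999.1.1.1 that the regex alone would accept."""
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return False
    return True


def extract_indicators(text):
    """Return the URLs, e-mail addresses, and valid IPv4 addresses found in text."""
    return {
        "urls": _unique(URL_RE.findall(text)),
        "emails": _unique(EMAIL_RE.findall(text)),
        "ips": _unique(ip for ip in IPV4_RE.findall(text) if _is_ipv4(ip)),
    }
```

Validating the IP candidates with the standard ipaddress module keeps the regular expression short while still filtering out impossible octets.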
docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --debug && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

python filemeta.py -p /path/to/scan --debug

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --workers 8 && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

python filemeta.py -p /path/to/scan --workers 8

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan -o /output/dir -r myreport.html && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

python filemeta.py -p /path/to/scan -o /output/dir -r myreport.html

- All file paths in the report are clickable (file:// links).
- In the Sensitive Patterns section, you can download all detected sensitive data as a .txt file (type, value, file).
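The two report features above can be sketched in a few lines. The function names and the tab-separated layout are assumptions made for illustration; the report's real export format is not documented here beyond the three fields (type, value, file).

```python
from pathlib import Path


def format_findings(findings):
    """Render (type, value, file) tuples as tab-separated lines (assumed layout)."""
    return "".join(f"{kind}\t{value}\t{path}\n" for kind, value, path in findings)


def file_link(path):
    """Build a file:// URI; as_uri() needs an absolute path, hence resolve()."""
    return Path(path).resolve().as_uri()


def export_findings(findings, destination):
    """Write the findings to a .txt file at the given destination."""
    Path(destination).write_text(format_findings(findings), encoding="utf-8")
```

For example, export_findings([("email", "a@b.com", "/tmp/x.txt")], "sensitive.txt") would write a single line containing the three fields.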
- Default settings (extensions, workers, crawling limits, etc.) are in config.py.
- You can override settings via environment variables or CLI arguments.
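One plausible way to layer these overrides is shown below for the worker count. Treat it as a sketch under stated assumptions: the environment variable name FILEMETA_WORKERS is hypothetical, and the precedence (CLI argument, then environment variable, then default) is an assumption rather than documented behavior. Only the --workers flag and the default of 5 come from this README.

```python
import argparse
import os

DEFAULT_WORKERS = 5  # default stated in this README
ENV_WORKERS = "FILEMETA_WORKERS"  # hypothetical variable name, for illustration only


def resolve_workers(argv=None, env=None):
    """Pick the worker count: CLI flag first, then environment, then default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--workers", type=int, default=None)
    # Ignore any other options so the sketch stays focused on one setting.
    args, _unknown = parser.parse_known_args(argv)
    if args.workers is not None:
        return args.workers
    if env.get(ENV_WORKERS):
        return int(env[ENV_WORKERS])
    return DEFAULT_WORKERS
```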
- By default, the application is silent (no debug or error output to console).
- Use --debug to see detailed logs and errors.
- All third-party library logs (Scrapy, urllib3, asyncio) are suppressed unless debugging is enabled.
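The logging behavior described above could be set up roughly as follows with the standard logging module. This is a minimal sketch, not the project's code: configure_logging is a hypothetical name, and the logger names simply follow the libraries listed above.

```python
import logging

NOISY_LOGGERS = ("scrapy", "urllib3", "asyncio")
SILENT = logging.CRITICAL + 1  # above every standard level, so nothing is emitted


def configure_logging(debug=False):
    """Stay silent by default; show everything when debug is requested."""
    root = logging.getLogger()
    if not root.handlers:
        root.addHandler(logging.StreamHandler())
    root.setLevel(logging.DEBUG if debug else SILENT)
    for name in NOISY_LOGGERS:
        # NOTSET lets the library logger inherit the root level while debugging.
        logging.getLogger(name).setLevel(logging.NOTSET if debug else SILENT)
```

Calling configure_logging(debug=True) would correspond to running with --debug.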
- Add new file analyzers or sensitive pattern detectors in the utils/ directory.
- Run unit tests:
pytest tests/
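For anyone adding a detector, a validator with pytest-style tests might look like the sketch below. The function names are hypothetical and the code is not taken from the repository's tests/ directory; it uses the standard Luhn checksum for card numbers and the published checksum rule for 11-digit Turkish ID numbers.

```python
import re


def luhn_valid(number):
    """Check a card number (spaces or dashes allowed) with the Luhn checksum."""
    cleaned = re.sub(r"[\s-]", "", number)
    if not re.fullmatch(r"[0-9]{13,19}", cleaned):
        return False
    total = 0
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def tc_kimlik_valid(value):
    """Check an 11-digit Turkish ID number against its two check digits."""
    if not re.fullmatch(r"[1-9][0-9]{10}", value):
        return False
    d = [int(char) for char in value]
    tenth = (sum(d[0:9:2]) * 7 - sum(d[1:8:2])) % 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def test_luhn_accepts_common_test_number():
    assert luhn_valid("4111 1111 1111 1111")


def test_luhn_rejects_wrong_check_digit():
    assert not luhn_valid("4111111111111112")


def test_tc_kimlik_checks_both_digits():
    assert tc_kimlik_valid("10000000146")
    assert not tc_kimlik_valid("10000000147")
```

Checksum validation of this kind helps cut false positives that a bare digit-matching regular expression would report.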
GPL v3
- Uses Rich for beautiful terminal output.
- Uses Scrapy for web crawling.
- Uses PyPDF2, olefile, python-docx, openpyxl, python-pptx, rarfile, and others for file analysis.
Project Home: https://github.com/cumakurt/filemeta
