FileMeta

Developer: Cuma KURT
Email: [email protected]
LinkedIn
GitHub

A modular, high-performance file and web domain metadata analyzer and sensitive data scanner.

Features

Modular Architecture: Clean separation of CLI, scanning, analysis, reporting, configuration, and utilities.
Multiprocessing: Fast file scanning and analysis using parallel processing (default: 5 workers, configurable).
Sensitive Data Detection: Finds sensitive patterns (e.g., passwords, email addresses, Turkish ID numbers, credit card numbers) in files and web downloads.
Archive Extraction: Automatically extracts and scans contents of .zip, .rar, .tar, .gz, .bz2, .tgz, .tar.gz, .tar.bz2, .tbz2 files (excluding password-protected archives).
Office & PDF Analysis: Deep metadata and content analysis for Office (.doc, .docx, .xls, .xlsx, .ppt, .pptx) and PDF files, with fallback and external tool support (catdoc).
HTML Reporting: Generates modern, interactive HTML reports with clickable file paths and downloadable sensitive data summaries.
Sensitive Data Export: Download all detected sensitive data (type, value, file) as a .txt file directly from the report.
Web Domain Scanning: Crawls and downloads files from web domains, with clear error messages for unreachable URLs.
Configurable Logging: All debug and error messages are shown only with --debug; otherwise, logs are silent and third-party library logs are suppressed.
Extensible Filtering: Filter results by user, date, keyword, and more.
Testable: Includes unit tests for validators and filters.
Optional Advanced Analyses: Enable with CLI flags:
- --analyze-links: Extract links, emails, IPs, and domains from PDF, DOCX, and HTML files (shown in a separate exportable table in the report).
- --analyze-macros: Detect macros, scripts, and OLE objects in Office files (shown in a separate exportable table in the report).
- --analyze-stegano: Detect basic steganography (LSB) in images and hidden base64 data in text files (shown in a separate exportable table in the report).
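
To make the Archive Extraction feature above more concrete, here is a minimal sketch of how a .zip archive could be listed while skipping password-protected ones. It uses only the standard library `zipfile` module; the function name `list_zip_members` is hypothetical and is not taken from FileMeta's code.

```python
import zipfile


def list_zip_members(path):
    """Return file names inside a zip archive, or None if it is password-protected."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        # Bit 0 of flag_bits is set on entries that are encrypted.
        if any(info.flag_bits & 0x1 for info in infos):
            return None
        # Directory entries carry no content, so they are left out.
        return [info.filename for info in infos if not info.is_dir()]
```

A scanner built this way would extract and analyze each returned member, and simply report the archive as skipped when the function returns None.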

Installation

Method 1: Direct Installation

Clone the repository:

git clone https://github.com/cumakurt/filemeta.git
cd filemeta/filemeta

Install dependencies:
```
pip install -r requirements.txt
```
- For .rar archive support: pip install rarfile and ensure unrar is installed on your system.
- For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).

Method 2: Virtual Environment Installation

⚠️ Important: FileMeta requires Python 3.10, 3.11, or 3.12. Python 3.13 is currently not supported.

Clone the repository:

git clone https://github.com/cumakurt/filemeta.git
cd filemeta/filemeta

Create and activate a virtual environment:

python3 -m venv myenv
source myenv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
- For .rar archive support: pip install rarfile and ensure unrar is installed on your system.
- For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).

Deactivate the virtual environment when done:
```
deactivate
```

Method 3: Docker Installation

Clone the repository:

git clone https://github.com/cumakurt/filemeta.git
cd filemeta

Build Docker image:
```
docker build -t filemeta:latest .
```

Usage

Docker Usage

Display Help Menu

docker run --rm filemeta:latest --help

Scan a Local Directory

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: this command sequence does the following:

- Mounts the /home/cuma/Downloads directory into the Docker container as /scan.
- Scans that directory and writes reports to /home/appuser/app/filemeta/reports/ inside the container.
- Copies the reports to the current directory on the host machine.
- Removes the container.

Scan a Website

docker run --name filemeta filemeta:latest -u https://www.cumakurt.com && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: this command sequence does the following:

- Scans the specified website (https://www.cumakurt.com).
- Saves the resulting reports to the reports folder inside the container.
- Copies the reports folder to the current directory on the host machine.
- Removes the container.

Direct Installation Usage

Basic Examples

python filemeta.py -p /path/to/files
python filemeta.py -u https://example.com

Advanced Analysis Examples

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --analyze-links --analyze-macros --analyze-stegano && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/files --analyze-links
python filemeta.py -p /path/to/files --analyze-macros
python filemeta.py -p /path/to/files --analyze-stegano
python filemeta.py -p /path/to/files --analyze-links --analyze-macros --analyze-stegano

By default, advanced analyses (links, macros, steganography) are disabled for performance. Enable them with the flags above.
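
As an illustration of the kind of output --analyze-links produces, the sketch below pulls URLs, email addresses, and IPv4 addresses out of plain text. The regular expressions and the function name `extract_indicators` are assumptions made for this example; FileMeta's own extraction rules may differ.

```python
import ipaddress
import re

URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _is_ipv4(candidate):
    """Reject dotted strings such as 999.1.1.1 that are not real addresses."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_indicators(text):
    """Collect unique URLs, emails, and valid IPv4 addresses from plain text."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "ips": sorted({ip for ip in IPV4_RE.findall(text) if _is_ipv4(ip)}),
    }
```

For PDF, DOCX, or HTML input, the text would first have to be extracted with a suitable parser before being passed to a function like this.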

Full CLI Help

Docker Usage

docker run --rm filemeta:latest --help

Direct Installation Usage

python filemeta.py --help

The help output lists all options and usage details.

Enable Debug/Verbose Output

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --debug && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan --debug

Control Number of Workers

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --workers 8 && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan --workers 8
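
The effect of --workers can be pictured with a small worker-pool sketch. This is not FileMeta's implementation: `analyze_file` is a stand-in that only hashes each file, and `scan_directory` is a hypothetical helper that distributes files across a `multiprocessing.Pool` of the requested size.

```python
import hashlib
from multiprocessing import Pool
from pathlib import Path


def analyze_file(path):
    """Stand-in for per-file analysis: return path, size, and SHA-256 digest."""
    data = Path(path).read_bytes()
    return str(path), len(data), hashlib.sha256(data).hexdigest()


def scan_directory(root, workers=5):
    """Analyze every regular file under root using a pool of worker processes."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with Pool(processes=workers) as pool:
        # map() splits the file list across the workers and keeps input order.
        return pool.map(analyze_file, files)
```

Calling `scan_directory("/path/to/scan", workers=8)` corresponds roughly to the command above. On platforms that start worker processes with the spawn method, the call should sit under an `if __name__ == "__main__":` guard.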

Output Directory and Report Name

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan -o /output/dir -r myreport.html && \
docker cp filemeta:/output/dir . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan -o /output/dir -r myreport.html

HTML Report

All file paths in the report are clickable (file:// links).
In the Sensitive Patterns section, you can download all detected sensitive data as a .txt file (type, value, file).

Configuration

Default settings (extensions, workers, crawling limits, etc.) are in config.py.
You can override settings via environment variables or CLI arguments.
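
One common way to layer such overrides is sketched below for the worker count. The precedence shown (CLI argument, then environment variable, then default) and the variable name `FILEMETA_WORKERS` are assumptions for illustration only; check config.py for the names and order FileMeta actually uses.

```python
import argparse
import os

DEFAULT_WORKERS = 5  # default worker count stated in the Features list


def resolve_workers(argv=None, environ=None):
    """Pick the worker count: CLI flag first, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=None)
    # parse_known_args ignores other flags, so this can run on a full command line.
    args, _unknown = parser.parse_known_args(argv)
    if args.workers is not None:
        return args.workers
    # FILEMETA_WORKERS is a hypothetical variable name used for illustration.
    if "FILEMETA_WORKERS" in environ:
        return int(environ["FILEMETA_WORKERS"])
    return DEFAULT_WORKERS
```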

Error & Debug Control

By default, the application is silent (no debug or error output to console).
Use --debug to see detailed logs and errors.
All third-party library logs (Scrapy, urllib3, asyncio) are suppressed unless debugging is enabled.
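
The behavior described above can be approximated with the standard `logging` module. The sketch below is an assumption about how such a switch might be wired, not FileMeta's actual code; `configure_logging` is a hypothetical name, and the logger names are the libraries listed above.

```python
import logging

NOISY_LOGGERS = ("scrapy", "urllib3", "asyncio")


def configure_logging(debug=False):
    """Show detailed logs only in debug mode; otherwise keep the console quiet."""
    # A level above CRITICAL filters out every standard log record.
    level = logging.DEBUG if debug else logging.CRITICAL + 1
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        root.addHandler(logging.StreamHandler())
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)
```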

Extending & Testing

Add new file analyzers or sensitive pattern detectors in the utils/ directory.
Run unit tests:
```
pytest tests/
```
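
As a sketch of what a new sensitive pattern detector and its unit test might look like, the example below finds candidate credit card numbers and keeps only those that pass the Luhn checksum. The names `luhn_valid` and `find_card_numbers` are hypothetical, and the interface FileMeta expects from detectors in utils/ is not shown here.

```python
import re

CARD_RE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit runs of card-like length that pass the Luhn check."""
    return [match for match in CARD_RE.findall(text) if luhn_valid(match)]


def test_find_card_numbers():
    # 4111111111111111 is a widely used test number; changing its last digit breaks the checksum.
    text = "ok 4111111111111111 bad 4111111111111112"
    assert find_card_numbers(text) == ["4111111111111111"]
```

Placing the test function in a file under tests/ lets the pytest command above pick it up automatically.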

License

GPL v3

Credits

Uses Rich for beautiful terminal output.
Uses Scrapy for web crawling.
Uses PyPDF2, olefile, python-docx, openpyxl, python-pptx, rarfile, and others for file analysis.

Project Home: https://github.com/cumakurt/filemeta
