FileMeta

Developer: Cuma KURT
Email: [email protected]
LinkedIn
GitHub

A modular, high-performance file and web domain metadata analyzer and sensitive data scanner.

FileMeta Report Example

Features

  • Modular Architecture: Clean separation of CLI, scanning, analysis, reporting, configuration, and utilities.
  • Multiprocessing: Fast file scanning and analysis using parallel processing (default: 5 workers, configurable).
  • Sensitive Data Detection: Finds sensitive patterns (e.g., passwords, emails, Turkish ID, credit cards) in files and web downloads.
  • Archive Extraction: Automatically extracts and scans contents of .zip, .rar, .tar, .gz, .bz2, .tgz, .tar.gz, .tar.bz2, .tbz2 files (excluding password-protected archives).
  • Office & PDF Analysis: Deep metadata and content analysis for Office (.doc, .docx, .xls, .xlsx, .ppt, .pptx) and PDF files, with fallback and external tool support (catdoc).
  • HTML Reporting: Generates modern, interactive HTML reports with clickable file paths and downloadable sensitive data summaries.
  • Sensitive Data Export: Download all detected sensitive data (type, value, file) as a .txt file directly from the report.
  • Web Domain Scanning: Crawls and downloads files from web domains, with clear error messages for unreachable URLs.
  • Configurable Logging: All debug and error messages are shown only with --debug; otherwise, logs are silent and third-party library logs are suppressed.
  • Extensible Filtering: Filter results by user, date, keyword, and more.
  • Testable: Includes unit tests for validators and filters.
  • Optional Advanced Analyses: Enable with CLI flags:
    • --analyze-links: Extract links, emails, IPs, and domains from PDF, DOCX, and HTML files (shown in a separate exportable table in the report).
    • --analyze-macros: Detect macros, scripts, and OLE objects in Office files (shown in a separate exportable table in the report).
    • --analyze-stegano: Detect basic steganography (LSB) in images and hidden base64 data in text files (shown in a separate exportable table in the report).
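
As an illustration of the Sensitive Data Detection feature, here is a minimal sketch of pattern-based scanning. The pattern names, the regexes, and the `find_sensitive` helper are assumptions made for this example and may not match FileMeta's actual detectors. The Luhn check is the standard checksum for payment card numbers and is used here to discard digit runs that only look like cards.

```python
import re

# Hypothetical patterns for illustration; FileMeta's real pattern set may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (type, value) pairs for every pattern match in `text`."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop card-like digit runs that fail the checksum.
            if name == "credit_card" and not luhn_valid(value):
                continue
            hits.append((name, value))
    return hits
```

Pairing a permissive regex with a checksum keeps false positives down without making the pattern itself hard to read.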

Installation

Method 1: Direct Installation

  1. Clone the repository:
    git clone https://github.com/cumakurt/filemeta.git
    cd filemeta/filemeta
  2. Install dependencies:
    pip install -r requirements.txt
    • For .rar archive support: pip install rarfile and ensure unrar is installed on your system.
    • For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).

Method 2: Virtual Environment Installation

⚠️ Important: FileMeta requires Python 3.10, 3.11, or 3.12. Python 3.13 is currently not supported.
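
Before creating the environment, it can help to confirm which interpreter `python3` resolves to. The snippet below is a small sketch of such a check; the `is_supported` helper is not part of FileMeta, and the version bounds are taken from the note above.

```python
import sys

# Supported range taken from the note above: 3.10 through 3.12 inclusive.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version):
    """Return True if a (major, minor) tuple falls inside the supported range."""
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


current = sys.version_info[:2]
status = "supported" if is_supported(current) else "not supported"
print(f"Python {current[0]}.{current[1]} is {status} by FileMeta")
```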

  1. Clone the repository:

    git clone https://github.com/cumakurt/filemeta.git
    cd filemeta/filemeta
  2. Create and activate a virtual environment:

    python3 -m venv myenv
    source myenv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
    • For .rar archive support: pip install rarfile and ensure unrar is installed on your system.
    • For .doc content extraction: install catdoc (e.g., sudo apt install catdoc).
  4. Deactivate virtual environment when done:

    deactivate

Method 3: Docker Installation

  1. Clone the repository:
    git clone https://github.com/cumakurt/filemeta.git
    cd filemeta
  2. Build Docker image:
    docker build -t filemeta:latest .

Usage

Docker Usage

Display Help Menu

docker run --rm filemeta:latest --help

Scan a Local Directory

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: This command sequence does the following:

  • Mounts the /home/cuma/Downloads directory into the container as /scan.
  • Scans that directory and writes reports to /home/appuser/app/filemeta/reports/ inside the container.
  • Copies the reports to the current directory on the host.
  • Removes the container.

Scan a Website

docker run --name filemeta filemeta:latest -u https://www.cumakurt.com && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Explanation: This command sequence does the following:

  • Scans the specified website (https://www.cumakurt.com).
  • Writes the reports to the reports folder inside the container.
  • Copies the reports folder to the current directory on the host.
  • Removes the container.

Direct Installation Usage

Basic Examples

python filemeta.py -p /path/to/files
python filemeta.py -u https://example.com

Advanced Analysis Examples

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --analyze-links --analyze-macros --analyze-stegano && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/files --analyze-links
python filemeta.py -p /path/to/files --analyze-macros
python filemeta.py -p /path/to/files --analyze-stegano
python filemeta.py -p /path/to/files --analyze-links --analyze-macros --analyze-stegano
  • Advanced analyses (links, macros, steganography) are disabled by default for performance. Enable them with the flags above.
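
To give a sense of what a link analysis pass involves, the sketch below pulls links, emails, IPv4 addresses, and domains out of already-extracted text. It is an assumption-laden illustration: the regexes and the `extract_indicators` function are hypothetical and are not taken from FileMeta's source, and real PDF or DOCX input would first need its text extracted.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only; a production scanner would likely be stricter.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Collect unique links, emails, IPv4 addresses, and link domains."""
    links = sorted(set(URL_RE.findall(text)))
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Keep only dotted quads whose octets are all in the 0-255 range.
    ips = sorted(
        ip for ip in set(IPV4_RE.findall(text))
        if all(int(part) <= 255 for part in ip.split("."))
    )
    # Domains are derived from the host part of each link.
    hosts = (urlparse(link).hostname for link in links)
    domains = sorted({host for host in hosts if host})
    return {"links": links, "emails": emails, "ips": ips, "domains": domains}
```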

Full CLI Help

Docker Usage

docker run --rm filemeta:latest --help

Direct Installation Usage

python filemeta.py --help

Either command prints all options and usage details.

Enable Debug/Verbose Output

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --debug && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan --debug

Control Number of Workers

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan --workers 8 && \
docker cp filemeta:/home/appuser/app/filemeta/reports/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan --workers 8
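
The effect of a worker count can be pictured with a small process pool. This is a sketch under assumptions, not FileMeta's scanner: `analyze_file` is a stand-in that only hashes a file, and `scan` is a hypothetical wrapper showing how a worker count could be passed to Python's `multiprocessing.Pool`.

```python
import hashlib
from multiprocessing import Pool
from pathlib import Path

DEFAULT_WORKERS = 5  # default stated in the Features list


def analyze_file(path: str) -> dict:
    """Stand-in for per-file analysis: report size and SHA-256 of the content."""
    data = Path(path).read_bytes()
    return {
        "path": path,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def scan(paths: list[str], workers: int = DEFAULT_WORKERS) -> list[dict]:
    """Analyze files in parallel; results come back in input order."""
    with Pool(processes=workers) as pool:
        return pool.map(analyze_file, paths)
```

Call `scan(paths, workers=8)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn worker processes. More workers help mainly when per-file analysis is CPU-bound; on slow disks the gain may be smaller.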

Output Directory and Report Name

Docker Usage

docker run --name filemeta -v /home/cuma/Downloads/:/scan filemeta:latest -p /scan -o /output/dir -r myreport.html && \
docker cp filemeta:/output/dir/ . && \
docker rm filemeta

Direct Installation Usage

python filemeta.py -p /path/to/scan -o /output/dir -r myreport.html

HTML Report

  • All file paths in the report are clickable (file:// links).
  • In the Sensitive Patterns section, you can download all detected sensitive data as a .txt file (type, value, file).

Configuration

  • Default settings (extensions, workers, crawling limits, etc.) are in config.py.
  • You can override settings via environment variables or CLI arguments.
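
One common way to layer these sources is CLI argument first, then environment variable, then the built-in default. The sketch below shows that order for the worker count. Both the precedence and the `FILEMETA_WORKERS` variable name are assumptions for illustration; check config.py for the names FileMeta actually reads.

```python
import argparse

DEFAULT_WORKERS = 5  # default stated in the Features list


def resolve_workers(argv: list[str], environ: dict[str, str]) -> int:
    """Pick the worker count: CLI flag, then environment, then default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int, default=None)
    # parse_known_args ignores flags this sketch does not model.
    args, _unknown = parser.parse_known_args(argv)
    if args.workers is not None:
        return args.workers
    env_value = environ.get("FILEMETA_WORKERS")  # hypothetical variable name
    if env_value is not None:
        return int(env_value)
    return DEFAULT_WORKERS
```

In a real entry point this would be called as `resolve_workers(sys.argv[1:], os.environ)`; passing both in as parameters keeps the function easy to test.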

Error & Debug Control

  • By default, the application is silent (no debug or error output to console).
  • Use --debug to see detailed logs and errors.
  • All third-party library logs (Scrapy, urllib3, asyncio) are suppressed unless debugging is enabled.
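
The behavior described above can be approximated with the standard `logging` module. This is a sketch of one possible setup, not FileMeta's code: `configure_logging` is a hypothetical helper, and the logger names are simply the libraries listed above.

```python
import logging

NOISY_LOGGERS = ("scrapy", "urllib3", "asyncio")  # libraries named above


def configure_logging(debug: bool) -> None:
    """Show everything with debug=True; otherwise silence all log output."""
    # A level above CRITICAL means no record is ever emitted.
    level = logging.DEBUG if debug else logging.CRITICAL + 1
    # force=True replaces handlers that earlier imports may have installed.
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
        force=True,
    )
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)
```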

Extending & Testing

  • Add new file analyzers or sensitive pattern detectors in the utils/ directory.
  • Run unit tests:
    pytest tests/
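
As an example of what a new detector and its test might look like, the sketch below validates a Turkish national ID number by its two check digits and exercises it with pytest-style tests. The function name and file layout are assumptions; FileMeta's own validators may be organized differently. The checksum rule itself is the publicly documented one for these numbers.

```python
def is_valid_tckn(value: str) -> bool:
    """Validate an 11-digit Turkish national ID number by its check digits."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    if value[0] == "0":
        return False
    d = [int(c) for c in value]
    # 10th digit: (7 * sum of digits 1,3,5,7,9 - sum of digits 2,4,6,8) mod 10
    tenth = (7 * sum(d[0:9:2]) - sum(d[1:8:2])) % 10
    # 11th digit: sum of the first ten digits mod 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def test_accepts_valid_number():
    assert is_valid_tckn("10000000146")


def test_rejects_bad_input():
    assert not is_valid_tckn("10000000147")  # wrong final check digit
    assert not is_valid_tckn("00000000000")  # leading zero
    assert not is_valid_tckn("1234")         # too short
```

Saved as a file whose name starts with `test_` under tests/, the two test functions would be collected by the `pytest tests/` command shown above.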

License

GPL v3

Credits


Project Home: https://github.com/cumakurt/filemeta
