A high-performance file deduplication tool that detects and manages duplicate files using content hashing and intelligent similarity analysis.
- Fast and Concurrent: Uses Go's goroutines and worker pools for high-performance parallel file hashing
- SHA-256 Hashing: Secure and reliable content-based duplicate detection
- Fuzzy Hashing: Optional similarity detection for finding near-duplicate files
- Recursive Scanning: Scan entire directory trees with configurable depth
- Smart Filtering:
  - Exclude patterns (glob-style)
  - File size thresholds (min/max)
  - Hidden file handling
  - Symbolic link following
- Multiple Output Formats: Text, JSON, and CSV reports
- Interactive Modes:
  - Terminal UI (TUI) using Bubbletea
  - Interactive CLI deletion
- Flexible Actions:
  - Dry-run mode (preview without changes)
  - Automatic deletion (keeps oldest file)
  - Hard-link replacement (saves disk space)
- Cross-Platform: Works on Linux, macOS, and Windows
Build from source:

```bash
git clone https://github.com/BaseMax/go-smart-deduper.git
cd go-smart-deduper
go build -o go-smart-deduper
```

Or install directly with `go install`:

```bash
go install github.com/BaseMax/go-smart-deduper@latest
```

Scan the current directory for duplicates:
```bash
go-smart-deduper
```

Scan specific directories:
```bash
go-smart-deduper /path/to/dir1 /path/to/dir2
```

Set minimum file size (in bytes):
```bash
go-smart-deduper --min-size 1024
```

Set maximum file size:
```bash
go-smart-deduper --max-size 10485760  # 10MB
```

Exclude patterns:
```bash
go-smart-deduper --exclude "*.tmp" --exclude "*.log"
```
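Patterns are matched glob-style against each file name. As a rough illustration of how such a filter can be implemented with Go's standard `path/filepath.Match`, here is a minimal sketch; the `excluded` helper is hypothetical, not the tool's actual API:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether a file's base name matches any glob pattern.
func excluded(path string, patterns []string) bool {
	name := filepath.Base(path)
	for _, pat := range patterns {
		// filepath.Match errors only on malformed patterns,
		// which we treat as non-matching here.
		if ok, err := filepath.Match(pat, name); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*.tmp", "*.log"}
	for _, p := range []string{"/data/cache.tmp", "/data/report.pdf"} {
		fmt.Println(p, "excluded:", excluded(p, patterns))
	}
}
```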
Include hidden files:

```bash
go-smart-deduper --exclude-hidden=false
```

Follow symbolic links:
```bash
go-smart-deduper --follow-symlinks
```

Generate JSON report:
```bash
go-smart-deduper --format json
```

Generate CSV report:
```bash
go-smart-deduper --format csv --output duplicates.csv
```

Verbose output:
```bash
go-smart-deduper -v
```

Dry-run (preview without making changes):
```bash
go-smart-deduper --delete --dry-run
```

Interactive deletion (choose which files to delete):
```bash
go-smart-deduper --interactive
```

Automatic deletion (keeps oldest file in each duplicate group):
```bash
go-smart-deduper --delete
```

Hard-link replacement (replace duplicates with hard links to save space):
```bash
go-smart-deduper --hard-link
```

Launch the interactive TUI:
```bash
go-smart-deduper --tui
```

In TUI mode:
- Use arrow keys or `j`/`k` to navigate
- Press `space` to select duplicate groups
- Press `q` to quit
Note: TUI mode currently displays duplicates for review only. To delete files, use CLI mode with the `--interactive`, `--delete`, or `--hard-link` options.
Use fuzzy hashing for similarity detection:
```bash
go-smart-deduper --fuzzy
```

Set number of worker threads:
```bash
go-smart-deduper --workers 8
```

Combine multiple options:

```bash
go-smart-deduper /home/user/Documents \
--min-size 1024 \
--exclude "*.tmp" \
--exclude-hidden \
--workers 8 \
--format json \
--output report.json \
  -v
```

Sample text output:

```
=== Duplicate Files Report ===
Group 1 (Hash: d2a84f4b8b650937...):
Count: 3 files
Size: 12 B per file
Wasted space: 24 B
Files:
- /tmp/test-deduper/file1.txt (modified: 2025-12-19 17:28:05)
- /tmp/test-deduper/file2.txt (modified: 2025-12-19 17:28:05)
- /tmp/test-deduper/subdir/file4.txt (modified: 2025-12-19 17:28:12)
=== Summary ===
Total duplicate groups: 1
Total duplicate files: 3
Total wasted space: 24 B
```

Sample JSON output (`--format json`):

```json
{
"duplicates": [
{
"hash": "d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26",
"files": [
"/tmp/test-deduper/file1.txt",
"/tmp/test-deduper/file2.txt",
"/tmp/test-deduper/subdir/file4.txt"
],
"size": 12,
"count": 3
}
],
"summary": {
"total_files": 3,
"total_groups": 1,
"wasted_space": 24
}
}
```

Sample CSV output (`--format csv`):

```
Group,Hash,File,Size,Modified
1,d2a84f4b8b650937...,/tmp/test-deduper/file1.txt,12,2025-12-19 17:28:05
1,d2a84f4b8b650937...,/tmp/test-deduper/file2.txt,12,2025-12-19 17:28:05
1,d2a84f4b8b650937...,/tmp/test-deduper/subdir/file4.txt,12,2025-12-19 17:28:12
```

The tool is organized into several packages:
- scanner: Recursive directory scanning with filtering
- hasher: SHA-256 and fuzzy hashing implementation
- deduper: Duplicate detection with worker pool pattern
- reporter: Report generation in multiple formats (see the sketch after this list)
- tui: Terminal UI using Bubbletea
- cmd: Command-line interface using Cobra
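As an illustration of the reporter side, the JSON document shown above maps naturally onto a pair of Go structs. This is a minimal sketch assuming hypothetical type names, not the `reporter` package's actual definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// DuplicateGroup describes one set of byte-identical files.
type DuplicateGroup struct {
	Hash  string   `json:"hash"`  // SHA-256 of the file content
	Files []string `json:"files"` // paths of all files in the group
	Size  int64    `json:"size"`  // size of each file in bytes
	Count int      `json:"count"` // number of files in the group
}

// Summary aggregates statistics across all groups.
type Summary struct {
	TotalFiles  int   `json:"total_files"`
	TotalGroups int   `json:"total_groups"`
	WastedSpace int64 `json:"wasted_space"` // bytes recoverable by deduplication
}

// Report is the top-level JSON document.
type Report struct {
	Duplicates []DuplicateGroup `json:"duplicates"`
	Summary    Summary          `json:"summary"`
}

func main() {
	report := Report{
		Duplicates: []DuplicateGroup{{
			Hash:  "d2a84f4b8b650937...",
			Files: []string{"/tmp/test-deduper/file1.txt", "/tmp/test-deduper/file2.txt"},
			Size:  12,
			Count: 2,
		}},
		Summary: Summary{TotalFiles: 2, TotalGroups: 1, WastedSpace: 12},
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(report); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```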
The tool uses several optimizations for performance; the hashing pipeline is sketched after this list:
- Size-based pre-filtering: Only files with identical sizes are compared
- Worker pool pattern: Concurrent file hashing with configurable workers
- Buffered I/O: Efficient file reading with 64KB buffers
- Early termination: Stops processing when no duplicates are possible
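For example, the size-based pre-filter and worker pool can be combined in roughly the following way. This is a minimal, self-contained sketch using only the standard library; function names and channel layout are illustrative, not the `deduper` package's actual code:

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"sync"
)

// hashFile computes the SHA-256 of a file using buffered 64KB reads.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, bufio.NewReaderSize(f, 64*1024)); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashCandidates hashes only files whose size occurs more than once
// (the size-based pre-filter), fanning work out to `workers` goroutines.
func hashCandidates(bySize map[int64][]string, workers int) map[string][]string {
	jobs := make(chan string)
	type result struct{ hash, path string }
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				if h, err := hashFile(path); err == nil {
					results <- result{h, path}
				} // inaccessible files are skipped; scanning continues
			}
		}()
	}
	go func() { wg.Wait(); close(results) }()

	go func() {
		for _, paths := range bySize {
			if len(paths) < 2 {
				continue // unique size: cannot be a duplicate
			}
			for _, p := range paths {
				jobs <- p
			}
		}
		close(jobs)
	}()

	byHash := make(map[string][]string)
	for r := range results {
		byHash[r.hash] = append(byHash[r.hash], r.path)
	}
	return byHash
}

func main() {
	bySize := map[int64][]string{12: {"/tmp/a.txt", "/tmp/b.txt"}}
	for h, files := range hashCandidates(bySize, 4) {
		if len(files) > 1 {
			fmt.Printf("%s...: %v\n", h[:16], files)
		}
	}
}
```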
Several safeguards protect against accidental data loss (the oldest-first rule is sketched after this list):
- Dry-run mode: Preview changes before committing
- Interactive mode: Manual control over deletions
- Oldest-first preservation: Automatic mode keeps the oldest file
- Error handling: Continues scanning even if some files are inaccessible
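The oldest-first rule boils down to sorting each duplicate group by modification time. A minimal sketch, where the `keepOldest` helper is hypothetical rather than the tool's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"time"
)

// keepOldest returns the path to preserve and the remaining candidates
// for deletion, ordering the group by modification time (oldest first).
func keepOldest(paths []string) (keep string, remove []string, err error) {
	if len(paths) == 0 {
		return "", nil, nil
	}
	type entry struct {
		path string
		mod  time.Time
	}
	entries := make([]entry, 0, len(paths))
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			return "", nil, err
		}
		entries = append(entries, entry{p, info.ModTime()})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].mod.Before(entries[j].mod) })
	keep = entries[0].path
	for _, e := range entries[1:] {
		remove = append(remove, e.path)
	}
	return keep, remove, nil
}

func main() {
	keep, remove, err := keepOldest([]string{"/tmp/test-deduper/file1.txt", "/tmp/test-deduper/file2.txt"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "stat:", err)
		return
	}
	fmt.Println("keep:  ", keep)
	for _, p := range remove {
		fmt.Println("delete:", p) // with --dry-run this is only printed, never executed
	}
}
```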
```
Flags:
  -d, --delete            Automatically delete duplicates (keep oldest)
  -n, --dry-run           Don't actually delete or modify files
  -e, --exclude strings   Exclude patterns (glob style)
      --exclude-hidden    Exclude hidden files and directories (default true)
      --follow-symlinks   Follow symbolic links
  -f, --format string     Output format: text, json, csv (default "text")
      --fuzzy             Use fuzzy hashing for similarity detection
      --hard-link         Replace duplicates with hard links
  -h, --help              help for go-smart-deduper
  -i, --interactive       Interactive deletion mode
      --max-size int      Maximum file size in bytes (0 = no limit)
      --min-size int      Minimum file size in bytes
  -o, --output string     Output file (default: stdout)
  -p, --path strings      Paths to scan (can specify multiple) (default [.])
  -t, --tui               Use Terminal UI mode
  -v, --verbose           Verbose output
  -w, --workers int       Number of worker goroutines for hashing (default 4)
```
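As a sketch of what `--hard-link` implies at the filesystem level: each duplicate is removed and re-created as a hard link to the kept file, which only works within a single filesystem. The helper below is illustrative, not the tool's actual implementation:

```go
package main

import (
	"fmt"
	"os"
)

// linkDuplicate replaces dup with a hard link to keep, so both paths
// share one copy of the data. Both must be on the same filesystem.
func linkDuplicate(keep, dup string) error {
	if err := os.Remove(dup); err != nil {
		return fmt.Errorf("remove %s: %w", dup, err)
	}
	if err := os.Link(keep, dup); err != nil {
		return fmt.Errorf("link %s -> %s: %w", dup, keep, err)
	}
	return nil
}

func main() {
	if err := linkDuplicate("/tmp/test-deduper/file1.txt", "/tmp/test-deduper/file2.txt"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```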
Run the test suite:

```bash
go test ./pkg/...
```

Run tests with coverage:

```bash
go test ./pkg/... -cover
```

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Max Base (@BaseMax)