This project implements a comprehensive floating-point compression analysis framework that evaluates the impact of zeroing out least significant bits (LSBs) on different statistical distributions. The tool performs in-depth entropy analysis, measures information loss, and includes parallel processing capabilities to leverage multi-core systems.
- Multi-distribution Analysis: Tests compression effects on various statistical distributions (Uniform, Gaussian, Exponential, Skewed, Bimodal)
- Entropy Measurement: Calculates Shannon entropy and Kullback-Leibler divergence to quantify information loss
- Bit-level Analysis: Examines bit patterns in IEEE 754 floating-point representation
- Parallel Processing: Uses Python's
ProcessPoolExecutorand MPI for scalable multi-core analysis - Comprehensive Visualization: Generates detailed plots showing compression ratio, MSE, entropy changes, and distribution comparisons
- Performance Analysis: Evaluates parallel scaling efficiency with Amdahl's Law estimates
- OS: Linux (Ubuntu or similar distributions recommended)
- CPU: Multi-core processor (testing performed on AMD Ryzen 5 5600H)
- Memory: Sufficient RAM to handle parallel processes (8GB+ recommended)
- Software:
- Python 3.x
- Open MPI (for MPI-based execution)
- Python packages: numpy, matplotlib, seaborn, pandas, scipy, zlib, json, mpi4py (optional)
# Create and activate a virtual environment
python3 -m venv compression_env
source compression_env/bin/activatepip install numpy matplotlib seaborn pandas scipy mpi4py# On Ubuntu/Debian
sudo apt update
sudo apt install openmpi-bin libopenmpi-dev
# Verify installation
mpirun --versionTo run the basic analysis using the EnhancedFloatingPointCompressor class:
python3 EnhancedFloatingPointCompressor.pyThis will:
- Generate various statistical distributions
- Analyze compression at different LSB zeroing levels
- Create visualizations in the
enhanced_compression_analysisdirectory - Output comprehensive reports on entropy and compression metrics
The main script already includes multi-core analysis using Python's built-in ProcessPoolExecutor. By default, it will test with 1, 2, 4, and 8 cores, but you can modify this in the script:
# Modify these values in the script
core_counts = [1, 2, 4, 8] # Adjust based on your hardwareFor MPI-based execution, use the provided shell script:
# Make the script executable
chmod +x multicoreAna.sh
# Run the analysis
./multicoreAna.shThis script will run the analysis with 1, 2, 4, and 6 cores by default. You can modify CORE_COUNTS in the script to test with different core counts.
The analysis generates the following outputs in the enhanced_compression_analysis directory:
-
PDF Visualizations:
comprehensive_compression_analysis.pdf: Detailed compression metrics for each distribution*_distribution_comparison.pdf: Visual comparison of original vs. compressed distributionsbit_pattern_analysis.pdf: Analysis of bit patterns for different distributionsmulticore_performance_analysis.pdf: Parallel performance metrics
-
Text Reports:
comprehensive_analysis_report.txt: Detailed metrics on compression effectivenessmulticore_performance_report.txt: Analysis of parallel scaling efficiency
-
JSON Data:
compression_analysis_results.json: Raw data for all compression metricsmulticore_performance_results.json: Performance measurements across core counts
- Entropy Reduction: Indicates information loss due to compression
- KL Divergence: Measures how much the compressed distribution differs from the original
- Compression Ratio: Higher values indicate better compressibility
- MSE (Mean Squared Error): Quantifies numerical accuracy loss
- Speedup: Execution time improvement relative to single-core baseline
- Efficiency: Speedup divided by core count (ideal is 100%)
- Theoretical Maximum: Estimated using Amdahl's Law based on serial fraction
EnhancedFloatingPointCompressor.py: Main analysis classCompressingInsightsParalleld.py: MPI-based parallel implementationmulticoreAna.sh: Shell script for running MPI analysis with various core countsenhanced_compression_analysis/: Output directory for results and visualizations
- "Insufficient slots" error: Use
--use-hwthread-cpusor--oversubscribeflag with mpirun - Executable not found: Ensure mpirun and Python are correctly installed and in your PATH
- Performance degradation: Too many processes can cause overhead; monitor system resources
- Missing modules: Ensure all required packages are installed
- Memory errors: Reduce sample_size in the compressor initialization if RAM is limited
- Implement adaptive LSB zeroing based on data characteristics
- Explore domain-specific compression strategies for scientific computing
- Add GPU-accelerated analysis for larger datasets
- Implement additional quality metrics for domain-specific applications
- Integrate with other compression algorithms for comparison