Lossy Floating Point Compression Analysis

This project implements a framework for analyzing lossy floating-point compression. It evaluates how zeroing out the least significant bits (LSBs) of floating-point values affects data drawn from different statistical distributions. The tool measures entropy and information loss, and it can run in parallel on multi-core systems.
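As a rough illustration of the core operation, LSB zeroing can be done by masking the integer view of each value. The sketch below assumes 32-bit IEEE 754 floats, and zero_lsbs is a hypothetical helper name; the project's own class may use a different width or interface.

```python
import numpy as np

def zero_lsbs(values, n_bits):
    """Return a float32 array with the n_bits lowest mantissa bits cleared.

    Hypothetical helper for illustration; assumes float32 (23 mantissa bits).
    """
    if not 0 <= n_bits <= 23:
        raise ValueError("n_bits must be between 0 and 23 for float32")
    arr = np.asarray(values, dtype=np.float32)
    bits = arr.view(np.uint32)  # reinterpret the same bytes as unsigned integers
    mask = np.uint32((0xFFFFFFFF << n_bits) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)  # new array; the input is left untouched

data = np.array([3.14159265, -0.001234, 12345.678], dtype=np.float32)
for n in (0, 8, 16):
    print(n, zero_lsbs(data, n))
```

For normal (non-denormal) values, clearing n mantissa bits moves each value toward zero by a relative amount below 2^(n-23), which is why the error grows quickly as more bits are dropped.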

📊 Features

Multi-distribution Analysis: Tests compression effects on various statistical distributions (Uniform, Gaussian, Exponential, Skewed, Bimodal)
Entropy Measurement: Calculates Shannon entropy and Kullback-Leibler divergence to quantify information loss
Bit-level Analysis: Examines bit patterns in IEEE 754 floating-point representation
Parallel Processing: Uses Python's ProcessPoolExecutor and MPI for scalable multi-core analysis
Comprehensive Visualization: Generates detailed plots showing compression ratio, MSE, entropy changes, and distribution comparisons
Performance Analysis: Evaluates parallel scaling efficiency with Amdahl's Law estimates
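To make the bit-level view concrete, here is a minimal sketch that splits a value into its IEEE 754 single-precision fields. The function names are hypothetical and the choice of float32 is an assumption; the project may analyze a different width.

```python
import struct

def float32_fields(x):
    """Split x (rounded to float32) into (sign, biased exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8-bit exponent, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23-bit fraction field
    return sign, exponent, mantissa

def float32_bit_string(x):
    """Render x as 'sign exponent mantissa' in binary."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"

print(float32_fields(1.0))
print(float32_bit_string(-0.15625))
```

LSB zeroing only touches the rightmost bits of the mantissa field, so the sign and exponent are preserved.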

🔧 System Requirements

OS: Linux (Ubuntu or similar distributions recommended)
CPU: Multi-core processor (testing performed on AMD Ryzen 5 5600H)
Memory: Sufficient RAM to handle parallel processes (8GB+ recommended)
Software:
- Python 3.x
- Open MPI (for MPI-based execution)
- Python packages: numpy, matplotlib, seaborn, pandas, scipy, and optionally mpi4py (zlib and json ship with the Python standard library and need no separate installation)

🚀 Installation

1. Set up a Python environment

# Create and activate a virtual environment
python3 -m venv compression_env
source compression_env/bin/activate

2. Install required Python packages

pip install numpy matplotlib seaborn pandas scipy

# Optional, only for MPI-based execution; install MPI (step 3) first
pip install mpi4py

3. Install MPI (for MPI-based execution)

# On Ubuntu/Debian
sudo apt update
sudo apt install openmpi-bin libopenmpi-dev

# Verify installation
mpirun --version

💻 Usage

Single-core Analysis

To run the basic analysis using the EnhancedFloatingPointCompressor class:

python3 EnhancedFloatingPointCompressor.py

This will:

Generate various statistical distributions
Analyze compression at different LSB zeroing levels
Create visualizations in the enhanced_compression_analysis directory
Output comprehensive reports on entropy and compression metrics
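The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's code: the distribution parameters, the log-normal stand-in for "skewed", float32 storage, and the ratio definition (raw bytes divided by zlib-compressed bytes) are all choices made for the example, and make_distributions and evaluate are hypothetical names.

```python
import zlib
import numpy as np

def make_distributions(n, seed=0):
    """Draw n samples from five example distributions (parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    half = n // 2
    return {
        "uniform": rng.uniform(0.0, 1.0, n),
        "gaussian": rng.normal(0.0, 1.0, n),
        "exponential": rng.exponential(1.0, n),
        "skewed": rng.lognormal(0.0, 1.0, n),  # log-normal as a stand-in for "skewed"
        "bimodal": np.concatenate([rng.normal(-2.0, 0.5, half),
                                   rng.normal(2.0, 0.5, n - half)]),
    }

def evaluate(samples, n_bits):
    """Return (compression ratio, MSE) after clearing n_bits low bits of float32 data."""
    original = np.asarray(samples, dtype=np.float32)
    mask = np.uint32((0xFFFFFFFF << n_bits) & 0xFFFFFFFF)
    lossy = (original.view(np.uint32) & mask).view(np.float32)
    # Assumed definition: raw size divided by zlib-compressed size of the lossy data.
    ratio = original.nbytes / len(zlib.compress(lossy.tobytes()))
    diff = original.astype(np.float64) - lossy.astype(np.float64)
    return ratio, float(np.mean(diff ** 2))

results = {
    name: {n_bits: evaluate(samples, n_bits) for n_bits in (0, 8, 16)}
    for name, samples in make_distributions(10_000).items()
}
for name, by_bits in results.items():
    for n_bits, (ratio, mse) in by_bits.items():
        print(f"{name:12s} lsb={n_bits:2d} ratio={ratio:.2f} mse={mse:.3e}")
```

Zeroed bits form long runs of identical bytes, so the ratio rises with the number of cleared bits while the MSE grows alongside it.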

Multi-core Analysis with ProcessPoolExecutor

The main script also runs a multi-core analysis using Python's built-in ProcessPoolExecutor. By default it tests 1, 2, 4, and 8 cores; to change this, edit core_counts in the script:

# Modify these values in the script
core_counts = [1, 2, 4, 8]  # Adjust based on your hardware
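For readers unfamiliar with this kind of measurement, the following is a minimal sketch of timing one workload at several worker counts. cpu_task is a stand-in workload and all function names are hypothetical; the project's script may organize its timing differently.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_task(n):
    """Stand-in CPU-bound workload (sum of squares); not the project's analysis."""
    return sum(i * i for i in range(n))

def time_run(n_workers, tasks):
    """Run all tasks on n_workers processes and return elapsed wall time in seconds."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(cpu_task, tasks))  # consume the iterator so every task finishes
    return time.perf_counter() - start

def summarize(timings):
    """Map {cores: seconds} to {cores: (speedup, efficiency)} using the 1-core time."""
    baseline = timings[1]
    return {c: (baseline / t, baseline / t / c) for c, t in timings.items()}

def measure_scaling(core_counts, tasks):
    """Time the same task list at each core count; core_counts must include 1."""
    return summarize({c: time_run(c, tasks) for c in core_counts})
```

Call measure_scaling([1, 2, 4, 8], [200_000] * 32) from under an if __name__ == "__main__": guard, since ProcessPoolExecutor may re-import the main module in its worker processes on some platforms.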

Multi-core Analysis with MPI

For MPI-based execution, use the provided shell script:

# Make the script executable
chmod +x multicoreAna.sh

# Run the analysis
./multicoreAna.sh

This script will run the analysis with 1, 2, 4, and 6 cores by default. You can modify CORE_COUNTS in the script to test with different core counts.

📋 Output

The analysis generates the following outputs in the enhanced_compression_analysis directory:

PDF Visualizations:
- comprehensive_compression_analysis.pdf: Detailed compression metrics for each distribution
- *_distribution_comparison.pdf: Visual comparison of original vs. compressed distributions
- bit_pattern_analysis.pdf: Analysis of bit patterns for different distributions
- multicore_performance_analysis.pdf: Parallel performance metrics
Text Reports:
- comprehensive_analysis_report.txt: Detailed metrics on compression effectiveness
- multicore_performance_report.txt: Analysis of parallel scaling efficiency
JSON Data:
- compression_analysis_results.json: Raw data for all compression metrics
- multicore_performance_results.json: Performance measurements across core counts

📊 Understanding the Results

Compression Metrics

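Entropy Reduction: Drop in Shannon entropy after LSB zeroing; a larger drop means more information was discarded
KL Divergence: Measures how much the compressed distribution differs from the original; 0 means the two distributions are identical
Compression Ratio: Higher values indicate better compressibility
MSE (Mean Squared Error): Mean of the squared differences between original and compressed values; lower values mean less numerical accuracy was lost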
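As a sketch of how the two information-theoretic metrics can be computed, the example below estimates them from histograms. The binning, the base-2 logarithm, the smoothing constant eps, and the rounding used as a stand-in for lossy data are all assumptions; the project may instead measure entropy over bytes or bit patterns.

```python
import numpy as np

def histogram_probs(data, bins, value_range):
    """Estimate a discrete probability distribution from a histogram."""
    counts, _ = np.histogram(data, bins=bins, range=value_range)
    return counts / counts.sum()

def shannon_entropy(p):
    """Shannon entropy in bits; empty bins contribute nothing."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits, smoothed by eps to avoid division by zero."""
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return float(np.sum(p * np.log2(p / q)))

rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, 50_000)
lossy = np.round(original, 1)  # crude stand-in for a lossy reconstruction
value_range = (min(original.min(), lossy.min()), max(original.max(), lossy.max()))
p = histogram_probs(original, 64, value_range)
q = histogram_probs(lossy, 64, value_range)
print("entropy change:", shannon_entropy(p) - shannon_entropy(q))
print("KL divergence:", kl_divergence(p, q))
```

Both numbers depend on the bin count, so comparisons are only meaningful when the original and compressed data share the same bins and range.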

Parallel Performance

Speedup: Single-core execution time divided by the execution time on N cores
Efficiency: Speedup divided by core count (ideal is 100%)
Theoretical Maximum: Upper bound on speedup, estimated with Amdahl's Law from the serial fraction of the workload
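Amdahl's Law gives the speedup on N cores as 1 / (s + (1 - s) / N), where s is the serial fraction. The sketch below also shows one common way to back out s from a measured speedup (the Karp-Flatt metric); whether the project estimates s this way is an assumption, and the function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Predicted speedup on n_cores for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

def estimate_serial_fraction(measured_speedup, n_cores):
    """Karp-Flatt estimate of the serial fraction; requires n_cores > 1."""
    return (1.0 / measured_speedup - 1.0 / n_cores) / (1.0 - 1.0 / n_cores)

def max_speedup(serial_fraction):
    """Limit of Amdahl's Law as the core count grows without bound."""
    return float("inf") if serial_fraction <= 0 else 1.0 / serial_fraction

s = estimate_serial_fraction(measured_speedup=3.2, n_cores=4)
print(f"serial fraction ~ {s:.3f}, speedup limit ~ {max_speedup(s):.1f}x")
```

For example, a measured speedup of 3.2 on 4 cores corresponds to a serial fraction of about 0.083, which caps the achievable speedup at roughly 12x no matter how many cores are added.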

🔍 Project Structure

EnhancedFloatingPointCompressor.py: Main analysis class
CompressingInsightsParalleld.py: MPI-based parallel implementation
multicoreAna.sh: Shell script for running MPI analysis with various core counts
enhanced_compression_analysis/: Output directory for results and visualizations

🛠️ Troubleshooting

MPI Issues

"Insufficient slots" error: Pass --use-hwthread-cpus (count hardware threads as slots) or --oversubscribe (allow more processes than available slots) to mpirun
Executable not found: Ensure mpirun and Python are correctly installed and in your PATH
Performance degradation: Too many processes can cause overhead; monitor system resources

Python Issues

Missing modules: Ensure all required packages are installed
Memory errors: Reduce sample_size in the compressor initialization if RAM is limited

🔮 Future Improvements

Implement adaptive LSB zeroing based on data characteristics
Explore domain-specific compression strategies for scientific computing
Add GPU-accelerated analysis for larger datasets
Implement additional quality metrics for domain-specific applications
Integrate with other compression algorithms for comparison
