CRUSH is a novel lossless compression algorithm for floating-point data. It leverages a controlled realignment strategy with uniform shifts to achieve high compression ratios.
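For floating-point data, "lossless" means each decoded value matches its input bit for bit, not merely under `==` (which treats 0.0 and -0.0 as equal and never matches NaN). The sketch below shows one way to check that property. It is not part of the CRUSH code base: `bitwise_equal`, `is_lossless_roundtrip`, and `example_usage` are hypothetical names, and the identity copy stands in for a real compress/decompress pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of CRUSH): compares two doubles by their raw
// 64-bit patterns, so the sign of zero and NaN payloads are checked as well.
inline bool bitwise_equal(double a, double b) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t ua = 0;
    std::uint64_t ub = 0;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

// Returns true only if both sequences have the same length and every pair of
// values is bit-identical.
inline bool is_lossless_roundtrip(const std::vector<double>& original,
                                  const std::vector<double>& decoded) {
    if (original.size() != decoded.size()) {
        return false;
    }
    for (std::size_t i = 0; i < original.size(); ++i) {
        if (!bitwise_equal(original[i], decoded[i])) {
            return false;
        }
    }
    return true;
}

// Usage sketch: an identity copy is a placeholder for decode(encode(input)).
inline void example_usage() {
    const std::vector<double> input = {1.5, -0.0, 3.141592653589793};
    const std::vector<double> decoded = input;
    assert(is_lossless_roundtrip(input, decoded));
}
```

A check of this kind would flag a codec that returns 0.0 where the input held -0.0, which a plain `==` comparison would miss.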
We recommend using Ubuntu 22.04 with the clang compiler for the best compatibility.
- Install essential build tools and clang:
sudo apt-get update
sudo apt-get install -y build-essential clang cmake pkg-config liblzma-dev
- Clone the repository:
git clone git@github.com:Spatio-Temporal-Lab/Crush.git
cd Crush
The project uses CMake for building.
- Create a build directory and enter it:
mkdir build && cd build
- Configure the project with CMake (from the build directory):
cmake ..
- Compile the project:
make PerformanceProgram -j4
This builds the main performance-testing executable, PerformanceProgram, inside the build directory.
We have integrated CRUSH and other baseline algorithms into Apache IoTDB v1.1 using the Java Native Interface (JNI). The implementation can be found in the jni/ directory.
A pre-built Docker image with these integrations is available on Docker Hub:
- Image: sagann/iotdb:latest
You can pull the image using the following command:
docker pull sagann/iotdb:latest
The core logic for CRUSH is located in baselines/crush/.
baselines/crush/
├── crush.h # Main header for CRUSH
├── CrushCompressor.cpp # Compression logic
├── CrushDecompressor.cpp # Decompression logic
├── CrushAbCompressor.cpp # Ablation study compressor
├── CrushAbDecompressor.cpp # Ablation study decompressor
├── utils.cc # Utility functions
└── BitStream/ # Bit manipulation utilities
- crush.h: Declares the interfaces for the CRUSH compressor and decompressor.
- CrushCompressor.cpp: Implements the CRUSH compression algorithm.
- CrushDecompressor.cpp: Implements the CRUSH decompression algorithm.
- Perf_test.cc: The main file for running performance tests.
The PerformanceProgram executable created in the build directory is used to run the performance evaluations.
cd build
./PerformanceProgram
You can also run specific test suites using filters:
./PerformanceProgram --gtest_filter=Perf.BatchSize
./PerformanceProgram --gtest_filter=Perf.Beta
./PerformanceProgram --gtest_filter=Perf.Ablation
The datasets used for evaluation are located in the data_set/ directory. These include a variety of real-world floating-point data sources.
- Air-pressure.csv
- Air-sensor.csv
- Basel-temp.csv
- Bitcoin-price.csv
- ... and many more.
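To experiment with one of these files outside the test harness, the values first have to be read into memory. The sketch below is one way to do that, under a loud assumption: the column layout of the CSV files is not described here, so the code takes the last comma-separated field of each line as the value and skips lines that do not parse as a number (for example, a header row). `parse_values`, `load_dataset`, and `example_usage_parse` are hypothetical helpers, not functions from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one double per line from a text stream.
// Assumption: the value is the last comma-separated field on the line.
// Lines whose last field is empty or not a pure number are skipped.
inline std::vector<double> parse_values(std::istream& in) {
    std::vector<double> values;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        const std::size_t comma = line.rfind(',');
        const std::string field =
            (comma == std::string::npos) ? line : line.substr(comma + 1);
        if (field.empty()) {
            continue;
        }
        char* end = nullptr;
        const double value = std::strtod(field.c_str(), &end);
        // Keep the value only if the whole field was consumed as a number.
        if (end == field.c_str() || *end != '\0') {
            continue;
        }
        values.push_back(value);
    }
    return values;
}

// Hypothetical helper: loads a dataset file by path. Returns an empty vector
// if the file cannot be opened.
inline std::vector<double> load_dataset(const std::string& path) {
    std::ifstream file(path);
    return parse_values(file);
}

// Usage sketch on in-memory text; the header line "value" is skipped.
inline void example_usage_parse() {
    std::istringstream sample("value\n1.25\n-3.5\n");
    const std::vector<double> parsed = parse_values(sample);
    assert(parsed.size() == 2 && parsed[0] == 1.25 && parsed[1] == -3.5);
}
```

If the real files use a different layout (for instance, the value in the first column), only the field selection inside `parse_values` would need to change.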