PLA Benchmark

This is the official repository for the article "Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis". We benchmark several Piecewise Linear Approximation (PLA) algorithms, including GreedyPLA, OptimalPLA, and SwingFilter, that are used in learned indexes (e.g., PGM-Index, FITing-Tree).

Note

All experiments in the paper were conducted at the O0 optimization level (compiler optimizations disabled). This prevents the compiler from vectorizing some algorithms but not others, which keeps the performance comparison of the PLA algorithms fair and directly comparable.

Introduction

This project systematically compares different ε-PLA algorithms under unified benchmarking settings. These algorithms split the key-to-index mapping into linear segments whose prediction error is bounded by ε, which is a core building block of learned index structures such as:

PGM-Index: hierarchical index using PLA at each level
FITing-Tree: learned B+ Tree with PLA-based leaf segments
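
To make the ε bound concrete, here is a minimal sketch of one simple way to build an ε-bounded segmentation: a greedy "shrinking cone" pass over sorted keys. It is an illustration only and is not the repository's implementation of GreedyPLA, OptimalPLA, or SwingFilter. The names Segment, build_segments, and max_error are hypothetical, and the sketch assumes strictly increasing keys whose differences fit in a double without loss of precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One linear segment (hypothetical type, for illustration):
// predicted position = first_pos + slope * (key - first_key).
struct Segment {
    std::uint64_t first_key;  // key at which the segment starts
    std::size_t first_pos;    // index of that key in the sorted array
    double slope;             // positions per unit of key
};

// Greedy shrinking-cone segmentation. Each segment is anchored at its first
// point; [lo, hi] is the range of slopes that keeps every point seen so far
// within eps of its true position. Assumes keys are strictly increasing.
std::vector<Segment> build_segments(const std::vector<std::uint64_t>& keys,
                                    double eps) {
    assert(eps >= 0.0);
    std::vector<Segment> segments;
    const double inf = std::numeric_limits<double>::infinity();
    std::size_t start = 0;
    double lo = 0.0, hi = inf;

    // Emit the current segment with a slope from the middle of the cone.
    auto close_segment = [&]() {
        const double slope = std::isinf(hi) ? lo : 0.5 * (lo + hi);
        segments.push_back({keys[start], start, slope});
    };

    for (std::size_t i = 1; i < keys.size(); ++i) {
        const double dx = static_cast<double>(keys[i] - keys[start]);
        const double dy = static_cast<double>(i - start);
        // Slopes that place point i within eps of its true position.
        const double new_lo = std::max(lo, (dy - eps) / dx);
        const double new_hi = std::min(hi, (dy + eps) / dx);
        if (new_lo > new_hi) {
            // The cone is empty: point i cannot join, so start a new segment.
            close_segment();
            start = i;
            lo = 0.0;
            hi = inf;
        } else {
            lo = new_lo;
            hi = new_hi;
        }
    }
    if (!keys.empty()) close_segment();
    return segments;
}

// Largest |predicted - true| position error over all keys.
double max_error(const std::vector<std::uint64_t>& keys,
                 const std::vector<Segment>& segments) {
    double worst = 0.0;
    std::size_t s = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // Advance to the segment that covers position i.
        while (s + 1 < segments.size() && segments[s + 1].first_pos <= i) ++s;
        const Segment& seg = segments[s];
        const double pred = static_cast<double>(seg.first_pos) +
                            seg.slope * static_cast<double>(keys[i] - seg.first_key);
        worst = std::max(worst, std::fabs(pred - static_cast<double>(i)));
    }
    return worst;
}
```

Calling build_segments(keys, eps) on a sorted key array returns the segments, and max_error(keys, segments) should not exceed eps apart from floating-point rounding. A learned index can then predict a position from the segment covering a key and search only the surrounding ±ε window. Fewer segments for the same ε means a smaller index, and segment count versus construction cost is the kind of trade-off the benchmarked algorithms differ on.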

Getting started

Clone the repository:

git clone https://github.com/bdhxxnix/PLABench.git

Prepare datasets

The preparation scripts are based on the SOSD repository.

First, install Python and use pip to install the required dependencies, numpy and scipy.
Second, install zstd, which is needed to decompress the datasets:

sudo apt install zstd

Third, run the preparation script and wait for it to finish:

bash prepare_data.sh

Compile

After preparing your dataset, build the project by running:

./build.sh

This will generate all test runners inside the build/ directory.

Run

Before running the tests, ensure your dataset is ready and placed in the correct directory. Then run the tests with:

./test.sh ./your_dataset/data

This script will automatically execute the following tests in order:

Linear Test
Linear Test with Different Threads
PGM Test
FIT Test

You can also navigate to the build/ directory and execute individual test runners manually. For example, to run only the PGM test:

cd build/
./pgmtest ./your_dataset/data

To capture all output in a log file:

./test.sh ./your_dataset/data > output.log 2>&1

This saves everything printed to the terminal (both stdout and stderr) to output.log.
