feat(arc): add ClickBench results for Arc on c6a.4xlarge #634
Merged
33 commits
All commits by xe-nvdk:

- 29e86a5 adding arc values
- a433b23 we missed one query, now is complete
- 01b692e fixing run.sh and re run, just in case both benchmark in pro m3 max a…
- b663892 disabling query caching and re ran the benchmarks
- 7a40588 updating repo to match the current for arc
- b934055 Merge branch 'main' into main
- fa56ed3 Merge branch 'ClickHouse:main' into main
- 7c0ccab Merge branch 'ClickHouse:main' into main
- 1db5924 adding updated values for m3 max
- 08fe758 Merge branch 'main' of github.com:Basekick-Labs/ClickBench
- bde45ce updating results and scripts for arc
- 7135fff Merge branch 'ClickHouse:main' into main
- 3a00ca3 fixing benchmark to load the data
- 757d7fa Merge branch 'main' of github.com:Basekick-Labs/ClickBench
- 6e70633 fixing token creation
- 32c62ba fixing api env passing
- 56702bc fixing db specification for api creation
- 82abc81 making sure that we don't have enabled query cache
- d6904f8 adding results for arc in clickbench
- 48a8fc9 Merge branch 'main' into main
- 8333f83 Merge branch 'ClickHouse:main' into main
- 799b4a7 refining format of the results
- ecd0414 refining format of the results
- b905b50 Merge branch 'ClickHouse:main' into main
- ad86bf5 Merge branch 'main' of github.com:Basekick-Labs/ClickBench
- 97da2bd deleting comments in the results
- 716b715 adding time-series tag
- 705c8bf Merge branch 'ClickHouse:main' into main
- 4fef7fc fix: improve benchmark output clarity and cache status reporting
- a49a8ef Some fixes for results display, and print of caching status
- 9a0b9b1 fixing and modifying things based on clickbench team
- 229e53f Merge branch 'main' of github.com:Basekick-Labs/ClickBench
- 9f3b46d Merge branch 'ClickHouse:main' into main
# Arc - ClickBench Benchmark

Arc is a high-performance time-series data warehouse built on DuckDB, Parquet, and object storage.

## System Information

- **System:** Arc
- **Date:** 2025-10-15
- **Machine:** m3_max (14 cores, 36GB RAM)
- **Tags:** Python, time-series, DuckDB, Parquet, columnar, HTTP API
- **License:** AGPL-3.0
- **Repository:** https://github.com/Basekick-Labs/arc

## Performance

Arc achieves:
- **Write throughput:** 1.89M records/sec (MessagePack binary protocol)
- **ClickBench:** ~22 seconds total (43 analytical queries)
- **Storage:** DuckDB + Parquet with MinIO/S3/GCS backends

## Prerequisites

- Ubuntu/Debian Linux (or compatible)
- Python 3.11+
- 8GB+ RAM recommended
- Internet connection for dataset download
- Sudo access (only if system dependencies are missing)

## Quick Start

The benchmark script handles everything automatically:

```bash
./benchmark.sh
```

This will:
1. Create a Python virtual environment (no system packages modified)
2. Clone the Arc repository
3. Install dependencies in the venv
4. Start the Arc server with an optimal worker count (2x CPU cores)
5. Download the ClickBench dataset (14GB Parquet file)
6. Run 43 queries × 3 iterations
7. Output results in ClickBench JSON format
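The timing loop behind steps 6 and 7 can be sketched roughly as follows. This is a sketch, not the actual script: `run_query` is a hypothetical stand-in for the HTTP call to the Arc server, and the single query shown stands in for the full set of 43.

```python
import time

def run_query(sql):
    # Hypothetical stand-in for an HTTP POST to Arc's query endpoint.
    # The real script sends `sql` to the server and waits for the
    # full result set before stopping the timer.
    pass

queries = ["SELECT COUNT(*) FROM hits"]  # the real benchmark runs 43 queries

for sql in queries:
    times = []
    for _ in range(3):  # 3 iterations per query
        start = time.monotonic()
        run_query(sql)
        times.append(round(time.monotonic() - start, 4))
    print(times)  # one line per query, e.g. [0.0226, 0.0233, 0.0284]
```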
## Manual Steps

### 1. Install Dependencies

```bash
sudo apt-get update -y
sudo apt-get install -y python3-pip python3-venv wget curl
```

### 2. Create Virtual Environment

```bash
python3 -m venv arc-venv
source arc-venv/bin/activate
```

### 3. Clone and Set Up Arc

```bash
git clone https://github.com/Basekick-Labs/arc.git
cd arc
pip install -r requirements.txt
mkdir -p data logs
```

### 4. Create API Token

```bash
python3 << 'EOF'
from api.auth import AuthManager

auth = AuthManager(db_path='./data/arc.db')
token = auth.create_token(name='clickbench', description='ClickBench benchmark')
print(f"Token: {token}")
EOF
```
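The printed token is then attached to every request. A minimal sketch of building an authenticated query request with Python's `urllib` is below; the `/query` endpoint path and the bearer-token header scheme are assumptions for illustration, not confirmed details of Arc's API.

```python
import json
import urllib.request

ARC_URL = "http://localhost:8000"
TOKEN = "your-token-from-step-4"  # printed by the snippet above

# Build (but do not yet send) an authenticated query request.
payload = json.dumps({"sql": "SELECT COUNT(*) FROM hits"}).encode()
req = urllib.request.Request(
    f"{ARC_URL}/query",  # hypothetical endpoint path
    data=payload,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it once the server is running.
print(req.get_header("Authorization"))
```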
### 5. Start Arc Server

```bash
# Auto-detect cores
CORES=$(nproc)
WORKERS=$((CORES * 2))

# Start server
gunicorn -w $WORKERS -b 0.0.0.0:8000 \
    -k uvicorn.workers.UvicornWorker \
    --timeout 300 \
    api.main:app
```

### 6. Download Dataset

```bash
wget https://datasets.clickhouse.com/hits_compatible/hits.parquet
```

### 7. Run Benchmark

```bash
export ARC_URL="http://localhost:8000"
export ARC_API_KEY="your-token-from-step-4"
export DATABASE="clickbench"
export TABLE="hits"

./run.sh
```

**Note:** The benchmark uses the Apache Arrow columnar format for optimal performance and requires `pyarrow` to be installed.

## Configuration

Arc uses optimal settings for ClickBench (all automatic, no configuration needed):

- **Workers:** Auto-detected cores × 2 (optimal for analytical workloads)
- **Query cache:** Disabled (per ClickBench rules)
- **Storage:** Local filesystem (fastest for single-node)
- **Timeout:** 300 seconds per query
- **Format:** Apache Arrow (columnar, high-performance)
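The worker-count rule above is the same cores × 2 heuristic the shell snippet computes with `$(nproc)`; in Python it reduces to:

```python
import os

# Same heuristic as CORES=$(nproc); WORKERS=$((CORES * 2)):
# two gunicorn workers per CPU core.
cores = os.cpu_count() or 1  # cpu_count() can return None; fall back to 1
workers = cores * 2
print(f"Starting Arc with {workers} workers on {cores} cores")
```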
## Results Format

Results are output in the official ClickBench format:

```
Load time: 0
Data size: 14779976446
[0.0226, 0.0233, 0.0284]
[0.0324, 0.0334, 0.0392]
...
```

- **Load time:** Arc queries Parquet files directly without a data loading phase (load time = 0)
- **Data size:** Size of the dataset in bytes (~14GB)
- **Query results:** 43 lines, each containing 3 execution times (in seconds) for the same query
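As a worked example, the per-query lines can be parsed and summarized as follows. Summing the best of the 3 runs per query is an assumption of this sketch, chosen to match the "~22 seconds total" figure quoted earlier; it is not necessarily how the ClickBench site aggregates.

```python
import json

# Two sample result lines in the format shown above; the real file has 43.
lines = [
    "[0.0226, 0.0233, 0.0284]",
    "[0.0324, 0.0334, 0.0392]",
]

runs = [json.loads(line) for line in lines]   # each line is a JSON array
best = [min(r) for r in runs]                 # fastest of the 3 iterations
total = sum(best)
print(f"queries: {len(runs)}, total best-run time: {total:.4f}s")
```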
## Notes

- **Virtual Environment:** All dependencies installed in isolated venv (no `--break-system-packages` needed)
- **Authentication:** Uses Arc's built-in token auth (simpler than permission-based auth)
- **Query Cache:** Disabled to ensure a fair benchmark (no cache hits)
- **Worker Count:** Auto-detected based on CPU cores, optimized for analytical workloads
- **Timeout:** Generous 300s timeout for complex queries

## Architecture

```
ClickBench Query → Arc Arrow API → DuckDB → Parquet File → Arrow Results
```

Arc queries the Parquet file directly via DuckDB's `read_parquet()` function and returns results in the Apache Arrow columnar format for maximum efficiency.
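The table-to-file mapping can be illustrated with a simple rewrite: a query referencing the logical table name is translated into a DuckDB `read_parquet()` call. The function below is purely illustrative and is not Arc's actual implementation, which would use proper SQL parsing rather than string replacement.

```python
def rewrite_for_parquet(sql: str, table: str, path: str) -> str:
    """Replace a logical table name with a DuckDB read_parquet() call.

    Illustrative only: a real rewriter would parse the SQL instead of
    doing plain string substitution.
    """
    return sql.replace(f"FROM {table}", f"FROM read_parquet('{path}')")

print(rewrite_for_parquet("SELECT COUNT(*) FROM hits", "hits", "hits.parquet"))
# → SELECT COUNT(*) FROM read_parquet('hits.parquet')
```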
## Performance Characteristics

Arc is optimized for:
- **High-throughput writes** (1.89M RPS with MessagePack)
- **Analytical queries** (DuckDB's columnar engine)
- **Columnar data transfer** (Apache Arrow IPC for efficient results)
- **Object storage** (S3, GCS, MinIO compatibility)
- **Time-series workloads** (built-in time-based indexing)

## Support

- GitHub: https://github.com/Basekick-Labs/arc
- Issues: https://github.com/Basekick-Labs/arc/issues
- Docs: https://docs.arc.basekick.com (coming soon)

## License

Arc Core is licensed under AGPL-3.0.
There should be no prerequisites - the benchmark runs automatically on an empty AWS machine with Ubuntu AMI.
Thanks for the feedback. We’ll revisit the submission later this year. For now, we’re happy to have the benchmark numbers internally and will use them for our own reference. Once we release official binaries, we’ll try again to get included in ClickBench.
It's not a problem, let's push this PR to ClickBench. The more systems included, the better.
Hi @alexey-milovidov, we just updated: we were able to run benchmark.sh according to the ClickBench guidelines. Let me know if you have issues running it, but you shouldn't have any. Thank you.