@xe-nvdk
Contributor

@xe-nvdk xe-nvdk commented Oct 6, 2025

Hey everyone,

We’re the new folks in the neighborhood, sharing ClickBench results for Arc, our time-series warehouse that’s launching soon.
I’ve made sure everything follows the benchmark requirements, but happy to adjust if needed.

Appreciate your work on this project!
– Ignacio

@CLAassistant

CLAassistant commented Oct 6, 2025

CLA assistant check
All committers have signed the CLA.

@rschu1ze rschu1ze self-assigned this Oct 6, 2025
Contributor Author

@xe-nvdk xe-nvdk left a comment

We are going to push a new update of this PR in a few minutes. Thank you for marking the issues.

@xe-nvdk

This comment was marked as resolved.

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 7, 2025

Just updated the files and made the repo public. Thanks.

arc/benchmark.sh Outdated

# Install Python and dependencies
echo "Installing dependencies..."
pip3 install fastapi uvicorn duckdb pyarrow requests gunicorn
Member

This requires running pip with --break-system-packages.

Would it be possible to create a Python venv? See e.g. chdb/benchmark.sh for an example.
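
A minimal sketch of what the venv approach could look like in benchmark.sh. The `VENV_DIR` variable and the `setup_venv` function name are hypothetical and not taken from the PR or from chdb/benchmark.sh; only the package list comes from the line under review.

```shell
#!/bin/bash
# Sketch only: VENV_DIR and setup_venv are hypothetical names.
VENV_DIR="${VENV_DIR:-arc-venv}"

setup_venv() {
    # Create an isolated environment so pip never touches system packages.
    python3 -m venv "$VENV_DIR" || return 1
    # Activate it for the remainder of the script.
    source "$VENV_DIR/bin/activate" || return 1
    # Same package list as the original pip3 line, installed into the venv.
    pip install fastapi uvicorn duckdb pyarrow requests gunicorn
}
```

Calling `setup_venv` near the top of the script would remove the need for --break-system-packages. On a stock Ubuntu image the `python3-venv` package may have to be installed first.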

Contributor Author

Yep, we have that in start.sh in our repo; I'm adding it to this script.

arc/benchmark.sh Outdated

# Create API token for benchmark
python3 << EOF
from api.auth import AuthManager, Permission
Member

I got the following error here:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Permission' from 'api.auth' (/data/ClickBench/arc/arc/api/auth.py)

I checked: there is indeed no Permission class in auth.py.
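
A failure like this could be caught before the token step with a small pre-flight check. The sketch below is an assumption about how such a check might look; `check_names` is a hypothetical helper, not part of Arc or of the PR.

```shell
#!/bin/bash
# check_names MODULE NAME...: exit non-zero if MODULE cannot be imported
# or lacks any of the given attribute names. Hypothetical helper.
check_names() {
    python3 - "$@" <<'EOF'
import importlib
import sys

module_name, *names = sys.argv[1:]
module = importlib.import_module(module_name)
missing = [n for n in names if not hasattr(module, n)]
if missing:
    print(f"{module_name} is missing: {', '.join(missing)}", file=sys.stderr)
    sys.exit(1)
EOF
}
```

For example, `check_names api.auth AuthManager || exit 1`, run from a directory where `api` is importable, would stop the script with a short message instead of a traceback in the middle of the run.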

Contributor Author

@xe-nvdk xe-nvdk Oct 7, 2025

Uff, thank you for this. It's old code; we have this right in our repo. Let me update it here too.

## Prerequisites

- Ubuntu/Debian Linux (or compatible)
- Python 3.11+
Member

There should be no prerequisites - the benchmark runs automatically on an empty AWS machine with Ubuntu AMI.
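
One way to meet this requirement is for the script to install whatever it needs itself. The sketch below assumes a stock Ubuntu image with `apt-get` and `sudo`; the function names are hypothetical and the package names are the usual Ubuntu ones, not taken from the PR.

```shell
#!/bin/bash
# missing_cmds CMD...: print each command that is not on PATH, one per line.
missing_cmds() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# install_prereqs: assumes Ubuntu with apt-get and sudo available.
install_prereqs() {
    sudo apt-get update -y
    sudo DEBIAN_FRONTEND=noninteractive apt-get install -y \
        python3 python3-pip python3-venv
}
```

A script could then start with `[ -z "$(missing_cmds python3 pip3)" ] || install_prereqs`, so that the README needs no prerequisites section.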

Contributor Author

Thanks for the feedback. We’ll revisit the submission later this year. For now, we’re happy to have the benchmark numbers internally and will use them for our own reference. Once we release official binaries, we’ll try again to get included in ClickBench.

Member

It's not a problem, let's push this PR to ClickBench. The more systems included, the better.

Contributor Author

Hi @alexey-milovidov, we just pushed an update and were able to run benchmark.sh according to the ClickBench guidelines. Let me know if you have any issues running it, but you shouldn't have any. Thank you.

@alexey-milovidov
Member

No success so far:

Running ClickBench queries via Arc HTTP API...
================================================
Checking if Arc is running at http://localhost:8000...
Arc is running. Using parquet file: /ClickBench/arc/hits.parquet
Running 43 queries via Arc HTTP API...
Query 1 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 2 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 3 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 4 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 5 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 6 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 7 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 8 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 9 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 10 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 11 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 12 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 13 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 14 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 15 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 16 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 17 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 18 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 19 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 20 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 21 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 22 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 23 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 24 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 25 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 26 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 27 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 28 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 29 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 30 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 31 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 32 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 33 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 34 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 35 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 36 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 37 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 38 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 39 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 40 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 41 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 42 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Query 43 failed: 401 - {"error":"Unauthorized","detail":"Invalid or missing API token"}
Benchmark complete!

@alexey-milovidov
Member

However, it did something before:

Creating API token...
Created API token: EdodzXfV99KRO-0XoWONWxxUs0AGH2HqjcNRq4c4rfg
Token created successfully
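
The thread does not say what caused the 401 responses. One common failure mode with this pattern is that the token is printed during setup but never handed to the step that sends the queries. The sketch below shows one way to carry it across. `extract_token` relies only on the "Created API token:" line shown above; the `/query` path, the request body and the Bearer header in `run_query` are assumptions, not Arc's documented API.

```shell
#!/bin/bash
# extract_token: read setup output on stdin and print the token from the
# first "Created API token: ..." line.
extract_token() {
    sed -n 's/^Created API token: //p' | head -n 1
}

# run_query TOKEN BODY: hypothetical request shape. The endpoint path and
# the Authorization header format are assumptions, not confirmed by the PR.
run_query() {
    curl -sS -X POST "http://localhost:8000/query" \
        -H "Authorization: Bearer $1" \
        -H "Content-Type: application/json" \
        --data "$2"
}
```

In use, the setup step's output would be piped through `extract_token`, the result stored in a variable, and that variable passed as the first argument to every `run_query` call.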

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 13, 2025

OK, thanks. This should be good now.

@alexey-milovidov
Member

This still does not fit the format. It prints:

Formatting results...

✓ Benchmark complete!

Results saved to: results.json
Logs saved to: log.txt

To view results:
  cat results.json

But it does not run the actual command cat results.json to put results in the log.
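
A minimal sketch of the requested behavior, assuming the results live in results.json as the output above says. `print_results` is a hypothetical helper name.

```shell
#!/bin/bash
# print_results FILE: write the file's contents to stdout so they end up
# in the captured log; fail loudly if the file is missing.
print_results() {
    local file="${1:-results.json}"
    if [ ! -f "$file" ]; then
        echo "results file not found: $file" >&2
        return 1
    fi
    cat "$file"
}
```

Ending the script with `print_results results.json` instead of a hint to run `cat` puts the numbers directly into the log.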

@alexey-milovidov
Member

This looks contradictory:

Verifying query cache configuration...
======================================================================
Query Cache Configuration Check
======================================================================
✓ arc.conf:     enabled = True
✓ .env:         QUERY_CACHE_ENABLED = false
  Environment:  QUERY_CACHE_ENABLED not set

⚠️  FINAL RESULT: Query cache is ENABLED
    TTL: 60s
    Max size: 100
======================================================================

@alexey-milovidov
Member

This is not compliant:

Dataset size:
-rw-r--r-- 1 root root 14G Jun 25  2022 hits.parquet

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 14, 2025

This still does not fit the format. It prints:

Formatting results...

✓ Benchmark complete!

Results saved to: results.json
Logs saved to: log.txt

To view results:
  cat results.json

But it does not run the actual command cat results.json to put results in the log.

OK. I thought you were referring to the posted results for Arc, not to what was printed.

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 14, 2025

This is not compliant:

Dataset size:
-rw-r--r-- 1 root root 14G Jun 25  2022 hits.parquet

What do you mean? That is the file that is downloaded and saved in the data folder, to be queried through the HTTP query endpoint.

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 14, 2025

This looks contradictory:

Verifying query cache configuration...
======================================================================
Query Cache Configuration Check
======================================================================
✓ arc.conf:     enabled = True
✓ .env:         QUERY_CACHE_ENABLED = false
  Environment:  QUERY_CACHE_ENABLED not set

⚠️  FINAL RESULT: Query cache is ENABLED
    TTL: 60s
    Max size: 100
======================================================================

But as you can see, the final result is what matters.

It looks like we are hunting for every small detail as a reason not to move forward with this. As I said, I respect what you have built here, but I'm not going to keep chasing this, at least for now. We have the numbers for our internal reference, which is what matters for now. If somebody wants to replicate them, they can use what we have in our fork, and the numbers can easily be reproduced.

Thank you.

@alexey-milovidov
Member

No need to close the PR. I can help with merging it.
We will need to remove all the AI slop, and it will be alright.

@xe-nvdk
Contributor Author

xe-nvdk commented Oct 14, 2025

Ok, let me go through this in a few days.

xe-nvdk and others added 5 commits October 15, 2025 20:14
Two fixes to benchmark.sh based on PR feedback:

1. Actually output results.json content to log
   - Changed from showing "To view results: cat results.json"
   - Now runs `cat results.json` directly so results appear in log
   - Makes CI logs and benchmark runs more useful

2. Remove contradictory checkmarks in cache configuration check
   - Was showing ✓ for both arc.conf (enabled=True) AND .env (enabled=false)
   - Now shows config sources as informational only (no checkmarks)
   - Only final result gets status indicator:
     * ✓ for cache disabled (good for benchmarks)
     * ✗ for cache enabled (with warning)
   - Clearer indication of actual runtime behavior

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <[email protected]>
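
The reporting described in the second fix could be sketched roughly as follows. This is not the code from the commit; `report_cache` is a hypothetical function, and how the effective value is determined is left to the caller.

```shell
#!/bin/bash
# report_cache CONF_VALUE DOTENV_VALUE EFFECTIVE
# The first two values are printed for information only. Only EFFECTIVE,
# the setting the running server actually uses, gets a status mark.
report_cache() {
    echo "  arc.conf: enabled = $1"
    echo "  .env:     QUERY_CACHE_ENABLED = $2"
    case "$3" in
        [Tt]rue|1)
            echo "✗ FINAL RESULT: Query cache is ENABLED"
            return 1
            ;;
        [Ff]alse|0)
            echo "✓ FINAL RESULT: Query cache is DISABLED"
            return 0
            ;;
        *)
            echo "? FINAL RESULT: unrecognized value: $3" >&2
            return 2
            ;;
    esac
}
```

Because the function returns non-zero when the cache is enabled, a call such as `report_cache True false "$effective" || exit 1` would also stop the benchmark before any cached timings are recorded.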
@xe-nvdk
Contributor Author

xe-nvdk commented Oct 16, 2025

I think we have it, @alexey-milovidov. Can you check and let me know? It now prints everything, including the results, on screen. Thank you!

Also, I deleted the cache-enabled results. We are going to submit those in a different folder, such as arc-query-cache, unless you recommend something different.

@alexey-milovidov
Member

It works! I will run it on every machine type...

@alexey-milovidov alexey-milovidov merged commit d164185 into ClickHouse:main Oct 16, 2025
@xe-nvdk
Contributor Author

xe-nvdk commented Oct 16, 2025

Thank you! We are going to add more systems too! Let me know if you have any feedback. Thanks again!

@alexey-milovidov
Member

@xe-nvdk, something is wrong: #658
