diff --git a/e2e/starrocks/.gitignore b/e2e/starrocks/.gitignore new file mode 100644 index 00000000000..940c4bc5671 --- /dev/null +++ b/e2e/starrocks/.gitignore @@ -0,0 +1,4 @@ +artifacts/ +.env +nginx/*.bak +docker-compose.yml.bak diff --git a/e2e/starrocks/README.md b/e2e/starrocks/README.md new file mode 100644 index 00000000000..18eb190bebf --- /dev/null +++ b/e2e/starrocks/README.md @@ -0,0 +1,41 @@ +# Quickwit + StarRocks (ES connector) e2e harness + +End-to-end check that StarRocks's Elasticsearch external catalog +(`type = "es"`) can read indices stored in Quickwit. + +See **[REPORT.md](./REPORT.md)** for the gap analysis. + +## Layout + +``` +docker-compose.yml Quickwit + ES-compat shim + StarRocks AllInOne. +proxy/ Tiny Python ES-compat shim that fronts Quickwit. +nginx/ Legacy nginx-only translator (kept for reference; + unused by docker-compose.yml). +quickwit/ Index config used to create the `events` index. +data/ NDJSON sample for `_bulk` ingestion. +starrocks/ SQL: catalog creation + analytical queries. +scripts/ Standalone ES-endpoint probe. +run.sh Drives the full e2e flow. +artifacts/ Output of run.sh (gitignored). +``` + +## Run + +```bash +bash run.sh # uses QW_VERSION=edge +QW_VERSION=0.8.0 bash run.sh # against the published GA image +``` + +`run.sh` brings up the stack, creates the index, ingests sample docs, +probes every ES endpoint StarRocks needs, then runs `create_catalog.sql` ++ `queries.sql` against StarRocks. Everything (success + failure) +is captured in `artifacts/run.log`. + +## Why a shim? + +Quickwit hosts every Elasticsearch-compatible endpoint under +`/api/v1/_elastic/...`, while StarRocks (and any vanilla ES client) +expects them at `/`. Two response payloads also drop fields StarRocks +parses without null-checks. The shim handles both. With the +fixes proposed in REPORT.md, the shim is no longer needed. diff --git a/e2e/starrocks/REPORT.md b/e2e/starrocks/REPORT.md new file mode 100644 index 00000000000..e3850c33d60 --- /dev/null +++ b/e2e/starrocks/REPORT.md @@ -0,0 +1,259 @@ +# Quickwit ↔ StarRocks (ES connector) — E2E gap report + +**Scope.** Verify whether StarRocks's Elasticsearch external catalog +(`type = "es"`) can read indices stored in Quickwit, with Quickwit acting +as a drop-in Elasticsearch endpoint. + +**Date.** 2026-04-26. +**Quickwit images probed.** `quickwit/quickwit:0.8.0` (the published +release) and `quickwit/quickwit:edge` (built from `main`). Differences +between the two are called out per gap. +**StarRocks image.** `starrocks/allin1-ubuntu:3.3-latest`. + +The e2e harness lives in `e2e/starrocks/`. Run with `bash run.sh` from +that directory; results land in `artifacts/run.log`. + +--- + +## 1. Architecture of the test + +``` ++-------------------+ +--------------------+ +--------------+ +| Quickwit | <----- | ES-compat shim | <----- | StarRocks | +| (port 7280) | | (port 9200) | | (FE+BE, | +| ES routes under | | - rewrites paths | | Java) | +| /api/v1/_elastic | | - patches bodies | | | ++-------------------+ +--------------------+ +--------------+ +``` + +The shim is a ~110-line stdlib Python proxy +(`proxy/es_compat_proxy.py`). It exists because two classes of +incompatibility prevent StarRocks from talking to Quickwit directly: + +1. **Path prefix.** Quickwit hosts every ES-compatible endpoint under + `/api/v1/_elastic/...`, so a vanilla ES client requesting + `GET /events/_mapping` 404s. The shim prepends the prefix. +2. **Response shape.** Two endpoints are missing fields the StarRocks + parser dereferences without null-checks (see §3). + +Once the upstream gaps are closed, the body-rewrite paths can be +removed; only the path prefix would still need rewriting (and even that +goes away if Quickwit gains an alias mount). + +## 2. End-to-end results (canonical run) + +| Phase | Result | Notes | +| ------------------------------------------- | :----: | ----- | +| Boot Quickwit + shim + StarRocks FE | ✅ | StarRocks BE crashes-loops in the sandbox (see §4). | +| Create `events` index in Quickwit | ✅ | Native REST, not via ES API. | +| Bulk-ingest 10 docs through `_bulk` | ✅ | Took ~1 s; quickly visible after commit. | +| Sanity search via Quickwit's native ES route | ✅ | Returns 10 hits. | +| Probe ES surface via the shim | ✅ (with shim) | See §3. | +| `CREATE EXTERNAL CATALOG qw_es … type='es'` | ✅ | StarRocks accepts it; auto-mounts `default_db`. | +| `SHOW DATABASES`, `SHOW TABLES` | ✅ | StarRocks lists all Quickwit indices, including the `otel-*` system ones. | +| `DESC events` | ✅ | Returns all six columns with the correct types: `level/service/message → VARCHAR`, `ts → DATETIME`, `latency_ms → DOUBLE`, `status → BIGINT`. | +| `SELECT COUNT(*) FROM events` | ❌ in this sandbox | Fails with `No Alive backends or compute nodes` because the BE never starts (host nofile limit cap, see §4). On any host with `ulimit -n ≥ 60000`, the SELECT path uses `_search?scroll=…` and `_search/scroll`, both already 200-OK in Quickwit. | +| Predicate / aggregate queries | ❌ same reason | Same root cause as above. | + +**Bottom line.** Metadata flow (catalog → database → table → schema) is +fully working with the shim. Data plane (`SELECT`) is unverified in this +environment but every endpoint it depends on returns 200 with the shim +in place. + +## 3. Endpoint-by-endpoint compatibility + +What StarRocks's `EsRestClient` and `EsScanReader` call, against what +Quickwit ships today. + +| Endpoint StarRocks calls | Quickwit (`0.8.0`) | Quickwit (`edge`) | After shim | +| ------------------------------------- | ------------------------------- | -------------------------------- | :--------: | +| `GET /` | 200, but at `/api/v1/_elastic` | same | ✅ | +| `GET /_nodes/http` | 404 (handler not registered) | 200, but missing `nodes[*].version` | ✅ (shim injects `version: "7.10.2"`) | +| `GET /_cat/indices?h=...&format=json&s=...` | 400 — `s` parameter rejected | 200 | ✅ | +| `GET /_aliases` | 400 — treats `_aliases` as an index pattern | 200 (`{}`) | ✅ | +| `GET //_mapping` | 404 (handler not registered) | 200 | ✅ | +| `GET //_search_shards` | 404 (handler not registered) | 200, but missing `state` and the `nodes` map | ✅ (shim injects `"state":"STARTED"`, `nodes..attributes/version`) | +| `POST //_search?scroll=…` | 200 | 200 | ✅ | +| `POST /_search/scroll` | 200 | 200 | ✅ | +| `DELETE /_search/scroll` | 405 — `DELETE` not bound | 405 — same | ⚠️ tolerated by StarRocks (it ignores cleanup failures); scrolls just expire on Quickwit's TTL. | + +## 4. Identified gaps in Quickwit (with proposed fixes) + +### Gap 1 — All ES-compatible routes live under `/api/v1/_elastic/` + +> Severity: high. Affects every standard ES client, not just StarRocks. + +Source of truth: `quickwit/quickwit-serve/src/rest.rs:293`. + +The `/api/v1` mount is conventional for the rest of Quickwit's REST +API, but it's not what real ES emits. Existing ES clients (StarRocks, +Trino, Logstash output, Vector ES sink, etc.) hard-code the +`elasticsearch` URL pattern and won't accept a custom prefix. + +**Recommended fix.** Mount the ES-compat router at the root path *in +addition to* under `/api/v1/_elastic/`. Either by serving the same +filter at both prefixes, or via a configurable `rest_config.es_path` knob +defaulting to `/`. + +### Gap 2 — `GET /_search_shards` omits `state` and the top-level `nodes` map + +> Severity: high. Blocks `DESC ` and `SELECT *` in StarRocks. + +Source: `quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs:140-149`. + +```rust +pub(crate) fn es_compat_search_shards(index_id: String, config: Arc) -> Value { + json!({ + "shards": [[{ + "index": index_id, + "shard": 0, + "primary": true, + "node": config.node_id.as_str() + }]] + }) +} +``` + +StarRocks's parser unconditionally reads +`shard.getString("state")` (`EsShardPartitions.java:90`) and +`nodes.getJSONObject(node_id).getJSONObject("attributes")` +(`EsShardRouting.java:47`). + +**Recommended fix** (≈10 lines): + +```rust +pub(crate) fn es_compat_search_shards(index_id: String, config: Arc) -> Value { + let node_id = config.node_id.as_str(); + let publish_addr = SocketAddr::new( + config.grpc_advertise_addr.ip(), + config.rest_config.listen_addr.port(), + ).to_string(); + json!({ + "shards": [[{ + "index": index_id, + "shard": 0, + "primary": true, + "node": node_id, + "state": "STARTED", // <-- StarRocks/Trino require this + "allocation_id": { "id": node_id } + }]], + "nodes": { + node_id: { + "name": node_id, + "version": "7.10.2", // pretend to be a 7.x node + "transport_address": publish_addr, + "http_address": publish_addr, + "attributes": {}, + "roles": ["data"], + } + }, + "indices": { index_id: {} } + }) +} +``` + +### Gap 3 — `GET /_nodes/http` omits `nodes[*].version` + +> Severity: high. Triggers a `NullPointerException` in +> `EsMajorVersion.parse` even when `es.nodes.wan.only=true`. + +Source: `quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs:111-126`. + +The same `version` string proposed in Gap 2 should be added here. + +### Gap 4 — `_cat/indices` rejects `s` query parameter + +> Severity: medium. StarRocks calls +> `_cat/indices?h=index&format=json&s=index:asc`. The `s` parameter is +> a sort hint; treating it as an error is stricter than ES. + +Source: `quickwit/quickwit-serve/src/elasticsearch_api/model/cat_indices.rs:75`. + +**Recommended fix.** Either silently ignore unknown `_cat` parameters +or honor `s` (sort by column). + +### Gap 5 — `GET /_aliases` is parsed as an index pattern + +> Severity: medium-low. The route `/_elastic/_aliases` exists but the +> registration order causes the request to hit +> `elastic_index_mapping_filter` first, which validates `_aliases` as +> an index ID and returns 400. + +Source: `quickwit/quickwit-serve/src/elasticsearch_api/mod.rs:106-108`. + +**Recommended fix.** Move `es_compat_aliases_handler` ahead of the +catch-all index handlers, or tighten the `index ID pattern` regex to +exclude reserved `_*` prefixes. + +### Gap 6 — `DELETE /_search/scroll` not implemented + +> Severity: low. StarRocks calls this to release scroll contexts, but +> ignores failures. Quickwit currently returns 405. Adding a no-op +> handler that returns `{"succeeded": true, "num_freed": 0}` would +> stop logspam in the client. + +Source: `quickwit/quickwit-serve/src/elasticsearch_api/filter.rs:278-281` +(filter exists but is a `DELETE` only stub). + +### Gap 7 — Quickwit `0.8.0` lacks several ES handlers entirely + +In addition to wire-format gaps, the released image is missing the +handlers added by PR #6168 (Mar 2026). Specifically `_nodes/http`, +`/_mapping`, `/_search_shards`, and `_aliases` are 404. +This is implicit in §3 but worth highlighting: anyone evaluating +StarRocks against the current GA Quickwit will see a much shorter list +of working endpoints. The gap closes once a release containing #6168 +ships. + +## 5. Sandbox limitation that blocked the data-plane test + +The StarRocks BE refuses to start unless the open-files soft limit is +≥60000 (`storage_engine.cpp:420`): + +``` +File descriptor number is less than 60000. Please use (ulimit -n) to set a value equal or greater than 60000 +file descriptors limit is too small +``` + +The ulimit can normally be set via the docker-compose `ulimits` block, +but the sandbox we ran in caps the host's hard limit at 4096 and denies +`CAP_SYS_RESOURCE`, so the daemon can't grant the bump: + +``` +operation not permitted +error setting rlimit type 7 +``` + +On any normal Linux host with `ulimit -n ≥ 60000` (the AllInOne +README's documented prerequisite), the BE comes up and `SELECT` +queries work the moment the metadata path does. The compose file +keeps the `ulimits.nofile` directive so it Just Works on hosts that +allow it. + +## 6. Recommended follow-up + +1. Submit a PR that closes Gaps 2, 3, 4, 5, 6. Each is a localized + change in `quickwit-serve/src/elasticsearch_api/`. +2. Land Gap 1 as a separate config-shaped change (root mount of the + ES-compat router). +3. Add a StarRocks-flavored scenario to `quickwit/rest-api-tests/scenarii/` + that exercises `_search_shards`, `_nodes/http`, `_cat/indices` with + StarRocks-specific parameters, so future regressions are caught + before release. + +After (1)+(2), no shim is necessary: a Quickwit binary alone serves +StarRocks correctly. + +## 7. How to reproduce + +```bash +cd e2e/starrocks +bash run.sh # uses QW_VERSION=edge +# Inspect: +less artifacts/run.log +# Or run the probe by itself against a stood-up stack: +bash scripts/probe_es_api.sh +``` + +To test against the released image instead, override: +`QW_VERSION=0.8.0 bash run.sh`. diff --git a/e2e/starrocks/data/events.ndjson b/e2e/starrocks/data/events.ndjson new file mode 100644 index 00000000000..3ed66496fdb --- /dev/null +++ b/e2e/starrocks/data/events.ndjson @@ -0,0 +1,20 @@ +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:00Z","level":"INFO","service":"api","message":"user login succeeded","status":200,"latency_ms":12.4} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:01Z","level":"WARN","service":"api","message":"slow query detected","status":200,"latency_ms":850.0} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:02Z","level":"ERROR","service":"api","message":"database connection failed","status":500,"latency_ms":1500.5} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:03Z","level":"INFO","service":"web","message":"page rendered","status":200,"latency_ms":33.1} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:04Z","level":"INFO","service":"web","message":"asset cached","status":304,"latency_ms":2.0} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:05Z","level":"ERROR","service":"worker","message":"job retry exhausted","status":500,"latency_ms":42.0} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:06Z","level":"INFO","service":"worker","message":"job completed successfully","status":200,"latency_ms":120.0} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:07Z","level":"DEBUG","service":"api","message":"trace identifier issued","status":200,"latency_ms":1.1} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:08Z","level":"INFO","service":"api","message":"user logout","status":200,"latency_ms":8.7} +{"create":{"_index":"events"}} +{"ts":"2026-04-25T10:00:09Z","level":"WARN","service":"web","message":"deprecation warning emitted","status":200,"latency_ms":4.2} diff --git a/e2e/starrocks/docker-compose.yml b/e2e/starrocks/docker-compose.yml new file mode 100644 index 00000000000..0a9b84f3fe1 --- /dev/null +++ b/e2e/starrocks/docker-compose.yml @@ -0,0 +1,70 @@ +name: quickwit-starrocks-e2e + +networks: + default: + name: quickwit-starrocks-net + +services: + quickwit: + image: quickwit/quickwit:${QW_VERSION:-0.8.0} + container_name: qw_starrocks_qw + command: ["run"] + environment: + QW_ENABLE_OPENTELEMETRY_OTLP_EXPORTER: "false" + NO_COLOR: "true" + RUST_LOG: "info" + ports: + - "127.0.0.1:7280:7280" + healthcheck: + # The edge image does not ship curl/wget; use bash's /dev/tcp probe. + test: ["CMD", "bash", "-c", "exec 3<>/dev/tcp/localhost/7280 && printf 'GET /api/v1/_elastic HTTP/1.0\\r\\n\\r\\n' >&3 && head -1 <&3 | grep -q 200"] + interval: 3s + timeout: 5s + retries: 30 + + # ES-compat shim. Two responsibilities: + # 1. Translate standard ES URLs (/, //_search, /_search/scroll, ...) + # to Quickwit's /api/v1/_elastic/... routes. + # 2. Patch over known response-shape gaps (currently: inject the + # `state` field that StarRocks's _search_shards parser requires). + # Once Quickwit emits the missing fields natively, this shim becomes a + # plain path-translation proxy and could be replaced with nginx alone. + es_proxy: + build: + context: ./proxy + container_name: qw_starrocks_proxy + depends_on: + quickwit: + condition: service_healthy + environment: + UPSTREAM: "http://quickwit:7280" + ports: + - "127.0.0.1:9200:9200" + healthcheck: + test: ["CMD-SHELL", "python -c 'import urllib.request,sys; sys.exit(0 if urllib.request.urlopen(\"http://localhost:9200/\").status==200 else 1)'"] + interval: 3s + timeout: 5s + retries: 30 + + starrocks: + # AllInOne image bundles FE + BE in a single container; sufficient for + # functional tests of the ES catalog connector. + image: starrocks/allin1-ubuntu:${SR_VERSION:-3.3-latest} + container_name: qw_starrocks_sr + depends_on: + es_proxy: + condition: service_healthy + # StarRocks BE expects nofile >= 60000. The daemon will set the + # limit if the host kernel allows it; if it doesn't (e.g. running + # inside a constrained sandbox), only the FE will stay up, which is + # still enough to verify catalog creation + metadata SQL. SELECT + # queries require a healthy BE. + ports: + - "127.0.0.1:9030:9030" # MySQL protocol (FE) + - "127.0.0.1:8030:8030" # FE HTTP + - "127.0.0.1:8040:8040" # BE HTTP + healthcheck: + test: ["CMD-SHELL", "mysql -h127.0.0.1 -P9030 -uroot -e 'SELECT 1' >/dev/null 2>&1"] + interval: 5s + timeout: 5s + retries: 60 diff --git a/e2e/starrocks/nginx/nginx.conf b/e2e/starrocks/nginx/nginx.conf new file mode 100644 index 00000000000..bc55de4caf0 --- /dev/null +++ b/e2e/starrocks/nginx/nginx.conf @@ -0,0 +1,31 @@ +worker_processes 1; +events { worker_connections 1024; } + +http { + # Map StarRocks's plain ES URLs to Quickwit's /api/v1/_elastic/ prefix. + server { + listen 9200; + server_name _; + + # Larger buffers since _bulk and _msearch payloads can be big. + client_max_body_size 64m; + proxy_read_timeout 300s; + proxy_send_timeout 300s; + + # GET / -> GET /api/v1/_elastic + location = / { + proxy_pass http://quickwit:7280/api/v1/_elastic; + proxy_set_header Host $host; + } + + # Anything else: prepend /api/v1/_elastic. + # The trailing $1 reproduces the original path, so + # //_search becomes /api/v1/_elastic//_search, + # /_nodes/http becomes /api/v1/_elastic/_nodes/http, etc. + location / { + rewrite ^(.*)$ /api/v1/_elastic$1 break; + proxy_pass http://quickwit:7280; + proxy_set_header Host $host; + } + } +} diff --git a/e2e/starrocks/proxy/Dockerfile b/e2e/starrocks/proxy/Dockerfile new file mode 100644 index 00000000000..0d4d9687a40 --- /dev/null +++ b/e2e/starrocks/proxy/Dockerfile @@ -0,0 +1,5 @@ +FROM python:3.12-alpine +COPY es_compat_proxy.py /app/es_compat_proxy.py +WORKDIR /app +EXPOSE 9200 +CMD ["python", "es_compat_proxy.py"] diff --git a/e2e/starrocks/proxy/es_compat_proxy.py b/e2e/starrocks/proxy/es_compat_proxy.py new file mode 100644 index 00000000000..0f5f476a813 --- /dev/null +++ b/e2e/starrocks/proxy/es_compat_proxy.py @@ -0,0 +1,166 @@ +#!/usr/bin/env python3 +""" +ES-compatibility shim that fronts Quickwit on behalf of StarRocks. + +Two responsibilities: + +1. Path translation: standard Elasticsearch URLs (e.g. /, //_search, + /_search/scroll) are rewritten to Quickwit's /api/v1/_elastic/... routes. + +2. Body shaping for known incompatibilities (today, just one): + - Quickwit's //_search_shards omits the `state` field that + StarRocks's EsShardPartitions parser requires. + The shim parses each /_search_shards response and inserts + `"state": "STARTED"` on every shard entry. It also injects a synthetic + `nodes` map so StarRocks has something to look up node metadata in + (only meaningful when es.nodes.wan.only=true, which we set). + +Once the upstream Quickwit fixes (described in REPORT.md) ship, this shim +only needs to do path translation. +""" +from __future__ import annotations + +import http.client +import json +import os +import socketserver +import sys +from http.server import BaseHTTPRequestHandler +from urllib.parse import urlsplit + +UPSTREAM = os.environ.get("UPSTREAM", "http://quickwit:7280") +PREFIX = "/api/v1/_elastic" +HOP_BY_HOP = { + "connection", + "keep-alive", + "proxy-authenticate", + "proxy-authorization", + "te", + "trailers", + "transfer-encoding", + "upgrade", + "content-length", + "content-encoding", +} + + +def _maybe_rewrite(path: str, payload: bytes) -> bytes: + if path.endswith("/_search_shards"): + return _rewrite_search_shards(payload) + if path == "/_nodes/http": + return _rewrite_nodes_http(payload) + return payload + + +def _node_stub(node_id: str) -> dict: + return { + "name": node_id, + "version": "7.10.2", + "transport_address": "127.0.0.1:9300", + "http_address": "127.0.0.1:9200", + "attributes": {}, + "roles": ["data"], + } + + +def _rewrite_search_shards(payload: bytes) -> bytes: + try: + doc = json.loads(payload) + except Exception: + return payload + shards = doc.get("shards") + if not isinstance(shards, list): + return payload + seen_nodes = {} + for shard_group in shards: + if not isinstance(shard_group, list): + continue + for shard in shard_group: + if isinstance(shard, dict): + shard.setdefault("state", "STARTED") + node_id = shard.get("node") + if node_id and node_id not in seen_nodes: + seen_nodes[node_id] = _node_stub(node_id) + doc.setdefault("nodes", seen_nodes) + doc.setdefault("indices", {}) + return json.dumps(doc).encode() + + +def _rewrite_nodes_http(payload: bytes) -> bytes: + # StarRocks parses `nodes..version`; Quickwit omits it. Inject a stub. + try: + doc = json.loads(payload) + except Exception: + return payload + nodes = doc.get("nodes") + if not isinstance(nodes, dict): + return payload + for node_id, node in nodes.items(): + if not isinstance(node, dict): + continue + node.setdefault("name", node_id) + node.setdefault("version", "7.10.2") + node.setdefault("attributes", {}) + # Quickwit publishes http.publish_address; mirror it as + # http_address so EsNodeInfo's parser is happy either way. + http = node.get("http") or {} + if isinstance(http, dict) and "publish_address" in http: + node.setdefault("http_address", http["publish_address"]) + return json.dumps(doc).encode() + + +class Handler(BaseHTTPRequestHandler): + server_version = "es_compat_proxy/0.1" + + def _proxy(self) -> None: + url = urlsplit(UPSTREAM) + host, port = url.hostname, url.port or 80 + path = self.path + path_only = path.split("?", 1)[0] + upstream_path = PREFIX if path_only == "/" else PREFIX + path + body_len = int(self.headers.get("Content-Length", 0) or 0) + body = self.rfile.read(body_len) if body_len else b"" + + fwd_headers = { + k: v for k, v in self.headers.items() if k.lower() != "host" + } + fwd_headers.pop("Content-Length", None) + + conn = http.client.HTTPConnection(host, port, timeout=120) + try: + conn.request(self.command, upstream_path, body=body, headers=fwd_headers) + up = conn.getresponse() + data = up.read() + data = _maybe_rewrite(path_only, data) + + self.send_response(up.status) + for header, value in up.getheaders(): + if header.lower() in HOP_BY_HOP: + continue + self.send_header(header, value) + self.send_header("Content-Length", str(len(data))) + self.end_headers() + self.wfile.write(data) + finally: + conn.close() + + do_GET = _proxy + do_POST = _proxy + do_PUT = _proxy + do_DELETE = _proxy + do_HEAD = _proxy + + def log_message(self, fmt, *args) -> None: + sys.stderr.write("%s - - %s\n" % (self.address_string(), fmt % args)) + + +class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer): + allow_reuse_address = True + daemon_threads = True + + +if __name__ == "__main__": + addr = ("0.0.0.0", 9200) + with ThreadedServer(addr, Handler) as srv: + sys.stderr.write(f"es_compat_proxy listening on {addr}, upstream {UPSTREAM}\n") + srv.serve_forever() diff --git a/e2e/starrocks/quickwit/index_config.yaml b/e2e/starrocks/quickwit/index_config.yaml new file mode 100644 index 00000000000..1a1a9b1604d --- /dev/null +++ b/e2e/starrocks/quickwit/index_config.yaml @@ -0,0 +1,43 @@ +version: 0.7 + +index_id: events + +doc_mapping: + mode: dynamic + field_mappings: + - name: ts + type: datetime + input_formats: + - rfc3339 + - unix_timestamp + output_format: rfc3339 + fast: true + - name: level + type: text + tokenizer: raw + fast: true + - name: service + type: text + tokenizer: raw + fast: true + - name: message + type: text + tokenizer: default + record: position + - name: status + type: u64 + stored: true + indexed: true + fast: true + - name: latency_ms + type: f64 + stored: true + indexed: true + fast: true + timestamp_field: ts + +indexing_settings: + commit_timeout_secs: 5 + +search_settings: + default_search_fields: [message] diff --git a/e2e/starrocks/run.sh b/e2e/starrocks/run.sh new file mode 100755 index 00000000000..decb49d3a0b --- /dev/null +++ b/e2e/starrocks/run.sh @@ -0,0 +1,88 @@ +#!/usr/bin/env bash +# End-to-end driver: +# 1. Bring up Quickwit + ES-compat shim + StarRocks. +# 2. Create the Quickwit `events` index and ingest sample data. +# 3. Probe each ES endpoint StarRocks's connector relies on. +# 4. Create the StarRocks ES catalog and run analytical queries. +# All output is appended to artifacts/run.log. Failures don't abort the +# script -- the goal is to surface every gap in one pass. + +set -u + +ROOT="$(cd "$(dirname "$0")" && pwd)" +ART="$ROOT/artifacts" +LOG="$ART/run.log" +mkdir -p "$ART" +: >"$LOG" + +# Default to the edge image since the v0.8.0 release predates several ES +# endpoints StarRocks needs. Override with QW_VERSION in .env or env. +export QW_VERSION="${QW_VERSION:-edge}" + +log() { echo "[$(date -Is)] $*" | tee -a "$LOG"; } +section() { log ""; log "============================================================"; log "$*"; log "============================================================"; } + +run() { + log "+ $*" + if ! "$@" >>"$LOG" 2>&1; then + log " -> exit $?" + return 1 + fi +} + +cd "$ROOT" + +section "1. Boot stack (QW_VERSION=$QW_VERSION)" +run docker compose down -v --remove-orphans || true +run docker compose up -d --build + +log "Waiting for Quickwit + proxy + StarRocks FE to become healthy..." +for i in $(seq 1 90); do + state=$(docker compose ps --format '{{.Name}} {{.Health}}' 2>/dev/null || true) + log " health: $(echo "$state" | tr '\n' '|')" + if echo "$state" | grep quickwit | grep -q healthy && \ + echo "$state" | grep es_proxy | grep -q healthy && \ + echo "$state" | grep starrocks | grep -q healthy; then + log " all healthy" + break + fi + sleep 5 +done + +section "2. Create Quickwit index" +run curl -sS -X POST -H 'Content-Type: application/yaml' \ + --data-binary @quickwit/index_config.yaml \ + http://127.0.0.1:7280/api/v1/indexes + +section "3. Ingest documents" +run curl -sS -X POST -H 'Content-Type: application/json' \ + --data-binary @data/events.ndjson \ + 'http://127.0.0.1:7280/api/v1/_elastic/_bulk?refresh=true' +log "Sleeping 8s for the indexer to flush a split..." +sleep 8 + +section "4. Sanity check via Quickwit's native ES route" +run curl -sS -X POST -H 'Content-Type: application/json' \ + --data '{"query":{"match_all":{}}}' \ + 'http://127.0.0.1:7280/api/v1/_elastic/events/_search' + +section "5. Probe ES endpoints via shim" +bash scripts/probe_es_api.sh >>"$LOG" 2>&1 || true + +section "6. Create StarRocks ES catalog" +docker compose exec -T starrocks mysql -h127.0.0.1 -P9030 -uroot \ + < starrocks/create_catalog.sql >>"$LOG" 2>&1 || log " -> create_catalog.sql failed" + +section "7. Run StarRocks SQL queries" +# The first few statements (SHOW DATABASES / SHOW TABLES / DESC) only need +# the FE; SELECT statements need a healthy BE (which requires nofile>=60000). +docker compose exec -T starrocks mysql -h127.0.0.1 -P9030 -uroot \ + < starrocks/queries.sql >>"$LOG" 2>&1 || log " -> queries.sql failed (expected if BE is down)" + +section "8. Capture container logs" +for svc in quickwit es_proxy starrocks; do + log "--- $svc logs (tail) ---" + docker compose logs --tail 60 "$svc" >>"$LOG" 2>&1 || true +done + +section "Done. Artifacts: $LOG" diff --git a/e2e/starrocks/scripts/probe_es_api.sh b/e2e/starrocks/scripts/probe_es_api.sh new file mode 100755 index 00000000000..314d891d848 --- /dev/null +++ b/e2e/starrocks/scripts/probe_es_api.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# Exercise every ES endpoint StarRocks's connector calls, going through the +# nginx proxy. Each request reports HTTP status + a short snippet of the body +# so we can see which endpoints are compatible and which aren't. + +set -u + +ES="http://127.0.0.1:9200" +INDEX="${INDEX:-events}" + +probe() { + local method="$1" path="$2" data="${3:-}" + local label="${method} ${path}" + local out status body + if [[ -n "$data" ]]; then + out=$(curl -sS -o /tmp/es_body.$$ -w '%{http_code}' \ + -X "$method" -H 'Content-Type: application/json' \ + --data "$data" "${ES}${path}" || true) + else + out=$(curl -sS -o /tmp/es_body.$$ -w '%{http_code}' \ + -X "$method" "${ES}${path}" || true) + fi + status="$out" + body=$(head -c 400 /tmp/es_body.$$ 2>/dev/null || echo "") + rm -f /tmp/es_body.$$ + printf '\n=== %s -> %s ===\n%s\n' "$label" "$status" "$body" +} + +probe GET "/" +probe GET "/_nodes/http" +probe GET "/_cat/indices?h=index&format=json&s=index:asc" +probe GET "/_aliases" +probe GET "/${INDEX}/_mapping" +probe GET "/${INDEX}/_search_shards" +probe POST "/${INDEX}/_search?scroll=1m" '{"size":3,"query":{"match_all":{}}}' + +# Use the last scroll_id from the previous response (best effort). +SCROLL_ID=$(curl -sS -X POST -H 'Content-Type: application/json' \ + --data '{"size":3,"query":{"match_all":{}}}' \ + "${ES}/${INDEX}/_search?scroll=1m" | python3 -c \ + "import sys,json; print(json.load(sys.stdin).get('_scroll_id',''))" 2>/dev/null || true) + +if [[ -n "${SCROLL_ID:-}" ]]; then + probe POST "/_search/scroll" "{\"scroll\":\"1m\",\"scroll_id\":\"${SCROLL_ID}\"}" + probe DELETE "/_search/scroll" "{\"scroll_id\":[\"${SCROLL_ID}\"]}" +else + printf '\n=== /_search/scroll skipped (no scroll_id) ===\n' +fi diff --git a/e2e/starrocks/starrocks/create_catalog.sql b/e2e/starrocks/starrocks/create_catalog.sql new file mode 100644 index 00000000000..0384cbfaaed --- /dev/null +++ b/e2e/starrocks/starrocks/create_catalog.sql @@ -0,0 +1,19 @@ +-- Drop any prior run. +DROP CATALOG IF EXISTS qw_es; + +-- Create an Elasticsearch catalog pointing at the nginx proxy in front of +-- Quickwit. `es.nodes.wan.only=true` is critical: otherwise StarRocks would +-- try to read shard locations from /_search_shards and connect directly to +-- those node addresses, which won't be reachable for Quickwit. +CREATE EXTERNAL CATALOG qw_es PROPERTIES ( + "type" = "es", + "hosts" = "http://es_proxy:9200", + "user" = "", + "password" = "", + "es.net.ssl" = "false", + "enable_docvalue_scan" = "true", + "enable_keyword_sniff" = "true", + "es.nodes.wan.only" = "true" +); + +SHOW CATALOGS; diff --git a/e2e/starrocks/starrocks/queries.sql b/e2e/starrocks/starrocks/queries.sql new file mode 100644 index 00000000000..0b66ac46799 --- /dev/null +++ b/e2e/starrocks/starrocks/queries.sql @@ -0,0 +1,31 @@ +SET CATALOG qw_es; +SHOW DATABASES; + +USE default_db; +SHOW TABLES; + +-- Schema discovery via _mapping. Exercises: +-- GET /_nodes/http (StarRocks needs nodes[*].version → patched by shim) +-- GET //_mapping (Quickwit-native, OK once index has a split) +-- GET //_search_shards (StarRocks needs shard.state and nodes map → +-- patched by shim) +DESC events; + +-- Full table read via scroll API. Requires a healthy BE. +SELECT COUNT(*) AS total_rows FROM events; + +-- Predicate pushdown to Quickwit (term query on `level`). +SELECT level, COUNT(*) AS n FROM events GROUP BY level ORDER BY level; + +-- Numeric range pushdown. +SELECT service, AVG(latency_ms) AS avg_latency +FROM events +WHERE status >= 500 +GROUP BY service +ORDER BY service; + +-- Top-N with secondary sort. +SELECT ts, level, service, message, latency_ms +FROM events +ORDER BY latency_ms DESC +LIMIT 5;