Make SDK ingestion docs agent-friendly #387
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -188,15 +188,22 @@ func main() { | |
| } | ||
| defer stream.Close() | ||
|
|
||
| // 4. Send record to server and get offset | ||
| // The offset is a logical sequence number assigned to this record | ||
| // 4. Send record to server and get offset. | ||
| // IngestRecordOffset returns as soon as the record is queued; the offset is a | ||
| // logical sequence number assigned to this record. The server round-trip | ||
| // happens in the background. | ||
| offset, err := stream.IngestRecordOffset(`{"id": 1, "message": "Hello"}`) | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| log.Printf("Record queued for ingestion with offset %d", offset) | ||
|
|
||
| // 5. Wait for server to acknowledge the record is durably written | ||
| // 5. Wait for the server to acknowledge the record is durably written. | ||
| // | ||
| // WaitForOffset confirms this one record, which is all this example needs. | ||
| // For real workloads, the idiomatic flow is to ingest many records in a loop | ||
| // and call stream.Flush() once at the end. See the "Idiomatic high-throughput | ||
| // flow" under Usage Guide → Ingest Data. | ||
| if err := stream.WaitForOffset(offset); err != nil { | ||
| log.Fatal(err) | ||
| } | ||
|
|
@@ -247,7 +254,9 @@ if err != nil { | |
| } | ||
| log.Printf("Record queued with offset %d", offset) | ||
|
|
||
| // 4. Optionally wait for server acknowledgment | ||
| // 4. Optionally wait for server acknowledgment of this specific record. | ||
| // For bulk ingestion, ingest in a loop and call stream.Flush() once instead — | ||
| // see the "Idiomatic high-throughput flow" under Usage Guide → Ingest Data. | ||
| if err := stream.WaitForOffset(offset); err != nil { | ||
| log.Fatal(err) | ||
| } | ||
|
|
@@ -477,21 +486,40 @@ defer stream.Close() | |
|
|
||
| ### 4. Ingest Data | ||
|
|
||
| **Single record:** | ||
| > **Acknowledgments and throughput.** Ingestion is asynchronous. | ||
| > `IngestRecordOffset()` returns as soon as the record is queued; the SDK sends it | ||
| > and tracks its acknowledgment in the background. To confirm records are durably | ||
| > committed, call `Flush()` — it returns once everything queued so far is | ||
| > acknowledged. The idiomatic flow is **ingest in a loop, then `Flush()`** (once | ||
| > for a bounded batch, or periodically for a long-running stream). Each ingest | ||
| > also returns the record's offset, and `WaitForOffset(offset)` blocks until that | ||
| > offset is acknowledged — handy when a specific record must be confirmed before | ||
| > continuing (acks are ordered, so waiting on the last offset confirms the whole | ||
| > run). Just avoid calling `WaitForOffset()` after every record in a tight loop, | ||
| > since that limits throughput to one record per round-trip. | ||
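The throughput claim in the note above can be made concrete with a small simulation. This is a minimal sketch, not the SDK: `fakeStream` is a hypothetical in-memory stand-in that only counts how often the caller blocks, and it assumes one blocking wait corresponds to roughly one server round-trip.

```go
package main

import "fmt"

// fakeStream is a hypothetical stand-in for the SDK stream. It only counts
// blocking waits; it does not talk to any server.
type fakeStream struct {
	next  int64 // next offset to assign
	acked int64 // highest acknowledged offset; -1 means none yet
	waits int   // number of times the caller blocked
}

func newFakeStream() *fakeStream { return &fakeStream{acked: -1} }

// IngestRecordOffset queues a record and returns its offset without blocking.
func (s *fakeStream) IngestRecordOffset(record string) (int64, error) {
	off := s.next
	s.next++
	return off, nil
}

// WaitForOffset models one blocking wait that acknowledges every offset up to
// and including the requested one (acks are ordered).
func (s *fakeStream) WaitForOffset(offset int64) error {
	if offset >= s.next {
		return fmt.Errorf("offset %d was never queued", offset)
	}
	if offset <= s.acked {
		return nil // already acknowledged, nothing to wait for
	}
	s.waits++
	s.acked = offset
	return nil
}

// Flush models a single blocking wait covering everything queued so far.
func (s *fakeStream) Flush() error {
	if s.next == 0 {
		return nil
	}
	return s.WaitForOffset(s.next - 1)
}

// waitPerRecord ingests n records and blocks after each one.
func waitPerRecord(n int) int {
	s := newFakeStream()
	for i := 0; i < n; i++ {
		off, _ := s.IngestRecordOffset(fmt.Sprintf(`{"id": %d}`, i)) // the fake never fails
		if err := s.WaitForOffset(off); err != nil {
			panic(err)
		}
	}
	return s.waits
}

// ingestThenFlush ingests n records and blocks once at the end.
func ingestThenFlush(n int) int {
	s := newFakeStream()
	for i := 0; i < n; i++ {
		s.IngestRecordOffset(fmt.Sprintf(`{"id": %d}`, i)) // the fake never fails
	}
	if err := s.Flush(); err != nil {
		panic(err)
	}
	return s.waits
}

func main() {
	fmt.Println("blocking waits, WaitForOffset per record:", waitPerRecord(1000))
	fmt.Println("blocking waits, ingest loop then Flush:  ", ingestThenFlush(1000))
}
```

In this model the per-record pattern blocks once per record, while the loop-then-flush pattern blocks once in total, which is the gap the note warns about.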
|
|
||
| **Idiomatic high-throughput flow: ingest in a loop, then `Flush()` once.** | ||
|
|
||
| ```go | ||
| // JSON (string) - queues record and returns offset | ||
| offset, err := stream.IngestRecordOffset(`{"id": 1, "value": "hello"}`) | ||
| if err != nil { | ||
| // Ingest many records without waiting between them. | ||
| for i := 0; i < 100000; i++ { | ||
| jsonData := fmt.Sprintf(`{"id": %d, "timestamp": %d}`, i, time.Now().Unix()) | ||
| if _, err := stream.IngestRecordOffset(jsonData); err != nil { | ||
| log.Printf("Record %d failed: %v", i, err) | ||
| continue | ||
| } | ||
| } | ||
|
|
||
| // Wait for ALL pending records to be acknowledged in a single call. | ||
| if err := stream.Flush(); err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| log.Printf("Record queued at offset: %d", offset) | ||
| ``` | ||
|
|
||
| **Batch ingestion for high throughput:** | ||
| **Batch ingestion for even higher throughput:** | ||
|
|
||
| ```go | ||
| // Ingest multiple records at once | ||
| // Ingest multiple records in one call (one offset for the whole batch). | ||
| records := []interface{}{ | ||
| `{"id": 1, "value": "first"}`, | ||
| `{"id": 2, "value": "second"}`, | ||
|
|
@@ -502,28 +530,26 @@ if err != nil { | |
| log.Fatal(err) | ||
| } | ||
| log.Printf("Batch queued with offset: %d", batchOffset) | ||
| // ... ingest more batches ... | ||
| stream.Flush() // wait for everything at the end | ||
|
Contributor
Other if err := stream.Flush(); err != nil {
log.Fatal(err)
} |
||
| ``` | ||
|
|
||
| **High throughput pattern:** | ||
| **Single record with explicit confirmation:** | ||
|
|
||
| ```go | ||
| // Ingest many records without waiting | ||
| for i := 0; i < 100000; i++ { | ||
| jsonData := fmt.Sprintf(`{"id": %d, "timestamp": %d}`, i, time.Now().Unix()) | ||
| offset, err := stream.IngestRecordOffset(jsonData) | ||
| if err != nil { | ||
| log.Printf("Record %d failed: %v", i, err) | ||
| continue | ||
| } | ||
|
|
||
| // Optional: log progress | ||
| if i%10000 == 0 { | ||
| log.Printf("Ingested %d records, latest offset: %d", i, offset) | ||
| } | ||
| // JSON (string) - queues record and returns offset. | ||
| offset, err := stream.IngestRecordOffset(`{"id": 1, "value": "hello"}`) | ||
| if err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| log.Printf("Record queued at offset: %d", offset) | ||
|
|
||
| // Wait for all records to be acknowledged | ||
| stream.Flush() | ||
| // WaitForOffset confirms this specific record before continuing. For bulk | ||
| // ingestion, prefer ingesting in a loop and calling Flush() once (see the | ||
| // high-throughput flow above). | ||
| if err := stream.WaitForOffset(offset); err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| ``` | ||
|
|
||
| **Concurrent ingestion with goroutines:** | ||
|
|
@@ -620,13 +646,18 @@ if err := stream.WaitForOffset(batchOffset); err != nil { | |
| } | ||
| log.Println("Batch confirmed") | ||
|
|
||
| // High-throughput: | ||
| // Idiomatic high-throughput flow: ingest in a loop, then Flush() once to | ||
| // confirm everything queued so far. | ||
| for i := 0; i < 1000; i++ { | ||
| _, _ := stream.IngestRecordOffset(record) | ||
| if _, err := stream.IngestRecordOffset(record); err != nil { | ||
| log.Printf("Record %d failed: %v", i, err) | ||
| } | ||
| } | ||
|
|
||
| // Use Flush() to wait for all pending acknowledgments at once | ||
| stream.Flush() | ||
| // Use Flush() to wait for all pending acknowledgments at once. | ||
| if err := stream.Flush(); err != nil { | ||
| log.Fatal(err) | ||
| } | ||
| ``` | ||
|
|
||
| ### 6. Error Handling | ||
|
|
@@ -898,16 +929,17 @@ The test suite includes: | |
|
|
||
| 1. **Reuse SDK Instances** - Create one `ZerobusSdk` per application and reuse for multiple streams | ||
| 2. **Always Close Streams** - Use `defer stream.Close()` to ensure all data is flushed | ||
| 3. **Choose the Right Ingestion Method**: | ||
| 3. **Ingest, then `Flush()`** - `IngestRecordOffset()`/`IngestRecordsOffset()` return as soon as the record is queued and track acknowledgment in the background. The idiomatic flow is to ingest in a loop and call `Flush()` to confirm durability. Use `WaitForOffset()` when a specific record must be confirmed before continuing (acks are ordered, so the last offset confirms the whole group). Just avoid calling `WaitForOffset()` after every record in a tight loop, since that limits throughput to one record per round-trip. | ||
| 4. **Choose the Right Ingestion Method**: | ||
| - Use `IngestRecordsOffset()` for high throughput batch ingestion | ||
| - Use `IngestRecordOffset()` when processing records individually | ||
| - Both return offsets directly; use `WaitForOffset()` to explicitly wait for acknowledgments | ||
| - The older `IngestRecord()` method is deprecated | ||
| 4. **Tune Inflight Limits** - Adjust `MaxInflightRequests` based on memory and throughput needs | ||
| 5. **Enable Recovery** - Always set `Recovery: true` in production environments | ||
| 6. **Use Batch Ingestion** - For high throughput, ingest many records before calling `Flush()` | ||
| 7. **Monitor Errors** - Log and alert on non-retryable errors | ||
| 8. **Use Protocol Buffers for Production** - More efficient than JSON for high-volume scenarios | ||
| 5. **Tune Inflight Limits** - Adjust `MaxInflightRequests` based on memory and throughput needs | ||
| 6. **Enable Recovery** - Always set `Recovery: true` in production environments | ||
| 7. **Use Batch Ingestion** - For high throughput, ingest many records before calling `Flush()` | ||
| 8. **Monitor Errors** - Log and alert on non-retryable errors | ||
| 9. **Use Protocol Buffers for Production** - More efficient than JSON for high-volume scenarios | ||
|
Contributor
Duplicate number 9 in list below, this line should be followed with 10/11/12. |
||
| 9. **Secure Credentials** - Never hardcode secrets; use environment variables or secret managers | ||
| 10. **Test Recovery** - Simulate failures to verify your error handling logic | ||
| 11. **One Stream Per Goroutine** - Don't share streams across goroutines; create separate streams for concurrent ingestion | ||
|
|
@@ -1120,6 +1152,12 @@ Waits for the server to acknowledge that a specific record has been durably writ | |
|
|
||
| Unlike `Flush()` which waits for all pending records, this waits only for a specific offset, allowing more granular control. | ||
|
|
||
| > Use `WaitForOffset()` when a specific record must be confirmed before | ||
| > continuing; acks are ordered, so waiting on the last offset of a group confirms | ||
| > all prior offsets too. For bulk durability, prefer ingesting in a loop and | ||
| > calling `Flush()` once. Avoid calling `WaitForOffset()` after every record in a | ||
| > tight loop, since that limits throughput to one record per round-trip. | ||
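The ordered-acknowledgment point above can be sketched with a single watermark. This is an illustrative model only: `ackTracker` and its methods are hypothetical names, not part of the SDK, and it assumes the server confirms offsets strictly in increasing order.

```go
package main

import (
	"fmt"
	"sync"
)

// ackTracker is a hypothetical model of ordered acknowledgments: because
// offsets are confirmed in increasing order, one watermark describes them all.
type ackTracker struct {
	mu        sync.Mutex
	cond      *sync.Cond
	watermark int64 // highest acknowledged offset; -1 means none yet
}

func newAckTracker() *ackTracker {
	t := &ackTracker{watermark: -1}
	t.cond = sync.NewCond(&t.mu)
	return t
}

// ack records that every offset up to and including offset is durable.
func (t *ackTracker) ack(offset int64) {
	t.mu.Lock()
	if offset > t.watermark {
		t.watermark = offset
	}
	t.mu.Unlock()
	t.cond.Broadcast()
}

// waitForOffset blocks until the watermark reaches offset.
func (t *ackTracker) waitForOffset(offset int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for t.watermark < offset {
		t.cond.Wait()
	}
}

// isAcked reports whether offset is at or below the watermark.
func (t *ackTracker) isAcked(offset int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return offset <= t.watermark
}

// confirmGroup simulates in-order acks for offsets 0..n-1 arriving in the
// background, waits only on the last offset, and returns how many offsets of
// the group are confirmed at that point.
func confirmGroup(n int64) int64 {
	t := newAckTracker()
	go func() {
		for off := int64(0); off < n; off++ {
			t.ack(off)
		}
	}()

	t.waitForOffset(n - 1) // a single wait on the last offset of the group

	var confirmed int64
	for off := int64(0); off < n; off++ {
		if t.isAcked(off) {
			confirmed++
		}
	}
	return confirmed
}

func main() {
	fmt.Println("offsets confirmed after waiting on the last one:", confirmGroup(5))
}
```

Waiting on the last offset is enough here because the watermark can only reach it after passing every earlier offset.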
|
|
||
| **Example:** | ||
| ```go | ||
| // Send multiple records | ||
|
|
||
`IngestBatch` is a method on `ZerobusArrowStream` (go/arrow_stream.go:168), not on `ZerobusStream` where `IngestRecordOffset`/`IngestRecordsOffset` live. This reads as if all three are on the same type. We should maybe drop `IngestBatch()` from the list or note it belongs to the Arrow stream. Flagged by LLM so probably useful.