
Conversation


@ryan-gang ryan-gang commented Jul 10, 2025

Add Produce API extension with 7 stages and updated stage descriptions.

Summary by CodeRabbit

  • New Features

    • Introduced a new "Producing Messages" challenge extension, focusing on Kafka's Produce API and message storage.
    • Added seven progressive stages covering API versioning, error handling for invalid topics, producing records to topics, multiple records, multiple partitions, and multiple topics.
  • Documentation

    • Added detailed instructional guides for each new stage, outlining learning objectives, expected behaviors, and testing criteria for implementing Kafka's Produce API functionality.

@ryan-gang ryan-gang self-assigned this Jul 10, 2025

coderabbitai bot commented Jul 10, 2025

Warning

Rate limit exceeded

@ryan-gang has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 0 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 7b3f26b and 0140d31.

📒 Files selected for processing (5)
  • stage_descriptions/producing-messages-03-dp1.md (1 hunks)
  • stage_descriptions/producing-messages-04-ps7.md (1 hunks)
  • stage_descriptions/producing-messages-05-sb8.md (1 hunks)
  • stage_descriptions/producing-messages-06-mf2.md (1 hunks)
  • stage_descriptions/producing-messages-07-ar4.md (1 hunks)

Walkthrough

A new "Producing Messages" extension has been added to the course, introducing seven progressive stages that guide learners through implementing Kafka's Produce API. Each stage is accompanied by detailed instructional documentation, covering topics from API version advertisement to producing messages across multiple topics and partitions, with increasing complexity.

Changes

| File(s) | Change Summary |
| --- | --- |
| course-definition.yml | Added "Producing Messages" extension and seven new stages (um3, ck2, dp1, ps7, sb8, mf2, ar4) with descriptions and difficulty levels. |
| stage_descriptions/producing-messages-01-um3.md | New doc: Guide for including Produce API in APIVersions response. |
| stage_descriptions/producing-messages-02-ck2.md | New doc: Guide for handling Produce requests to invalid topics/partitions. |
| stage_descriptions/producing-messages-03-dp1.md | New doc: Guide for responding to Produce requests with topic/partition validation using cluster metadata. |
| stage_descriptions/producing-messages-04-ps7.md | New doc: Guide for producing a single record and persisting it to disk. |
| stage_descriptions/producing-messages-05-sb8.md | New doc: Guide for producing multiple records in a single Produce request. |
| stage_descriptions/producing-messages-06-mf2.md | New doc: Guide for producing messages to multiple partitions in one request. |
| stage_descriptions/producing-messages-07-ar4.md | New doc: Guide for producing messages to multiple topics in one request. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Broker

    Client->>Broker: APIVersions Request
    Broker-->>Client: APIVersions Response (includes Produce API)

    Client->>Broker: Produce Request (topic/partition)
    alt Invalid topic/partition
        Broker-->>Client: Produce Response (error code 3)
    else Valid topic/partition
        Broker->>Broker: Validate via cluster metadata
        Broker->>Broker: Persist record(s) to log file(s)
        Broker-->>Client: Produce Response (success, offsets, etc.)
    end
```
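To make the flow above concrete, here is a minimal sketch of the branch a broker takes for each Produce request. It is illustrative only: the in-memory metadata and log stand-ins and all names are placeholders, not the course's tester interfaces.

```python
# Illustrative sketch only: in-memory stand-ins for cluster metadata and the log.
UNKNOWN_TOPIC_OR_PARTITION = 3  # Kafka error code for an unknown topic/partition
NO_ERROR = 0

# Hypothetical metadata: topic name -> set of known partition indexes.
CLUSTER_METADATA = {"greetings": {0, 1}}

# Hypothetical "log": (topic, partition) -> list of appended record batches.
LOGS: dict[tuple[str, int], list[bytes]] = {}

def handle_produce(topic: str, partition: int, record_batch: bytes) -> dict:
    """Return a simplified Produce response for a single topic-partition."""
    if partition not in CLUSTER_METADATA.get(topic, set()):
        return {"topic": topic, "partition": partition,
                "error_code": UNKNOWN_TOPIC_OR_PARTITION, "base_offset": -1}
    log = LOGS.setdefault((topic, partition), [])
    base_offset = len(log)      # simplified: assumes one record per batch
    log.append(record_batch)    # a real broker persists this to the partition's log file
    return {"topic": topic, "partition": partition,
            "error_code": NO_ERROR, "base_offset": base_offset}

print(handle_produce("greetings", 0, b"<record batch bytes>"))  # success, base_offset 0
print(handle_produce("unknown-topic", 0, b""))                  # error_code 3
```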

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐇
Hopping through the Kafka fields,
New stages sprout where learning yields.
Messages produced, from one to many,
Topics and partitions, oh there are plenty!
With docs in tow and guidance bright,
The broker learns to do it right.
Produce, persist—what a delight!




ryan-gang commented Jul 10, 2025

@rohitpaulk some more work is required.

  1. We want to send multiple consecutive requests from stage 5 onwards, but I've yet to confirm how the response changes on subsequent requests.
  2. The last couple of stages are turning out to be very dry.
  3. We don't really have a binspec for the cluster metadata payload records (TOPIC_RECORD, PARTITION_RECORD). Need to think about how to convey this easily.
  4. Details are pending on how we would assert their on-disk writes. Need to implement this.

But I think we can finalize the breakdown from this. In the next review cycle, we can finalize the language.

@ryan-gang ryan-gang requested a review from rohitpaulk July 10, 2025 18:37

@rohitpaulk rohitpaulk left a comment


Lots of work left on wording etc., but when it comes to the stage breakdown, stages 2 & 3 are the ones that definitely need rework. A stage 2 implementation is likely to pass stage 3 automatically.

## Produce API response for invalid partitions

When a Kafka broker receives a Produce request, it first needs to validate that the partition exists. If a partition doesn't exist, it returns an appropriate error code and response.
To validate that a partition exists, the broker reads the `__cluster_metadata` topic's log file, located at `/tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log`. Inside the log file, the broker finds the topic's metadata, which is a `record` (inside a RecordBatch) with a payload of type `PARTITION_RECORD`. If there is a `PARTITION_RECORD` with the given partition index, the UUID of the topic it is associated with, and the UUID of the directory it is associated with, the partition exists. Otherwise, the partition doesn't exist.
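As a rough sketch of the check this describes (with the RecordBatch and record decoding abstracted away behind a simple dataclass, so every name here is illustrative), the partition-existence test boils down to scanning the decoded PARTITION_RECORDs:

```python
# Sketch only: PartitionRecord stands in for a decoded PARTITION_RECORD payload;
# parsing the actual on-disk wire format is assumed to happen elsewhere.
from dataclasses import dataclass
from typing import Iterable
from uuid import UUID

UNKNOWN_TOPIC_OR_PARTITION = 3  # error code returned when the check fails

@dataclass
class PartitionRecord:
    partition_index: int
    topic_uuid: UUID       # UUID of the topic this partition belongs to
    directory_uuid: UUID   # UUID of the log directory it is assigned to

def partition_exists(records: Iterable[PartitionRecord],
                     topic_uuid: UUID, partition_index: int) -> bool:
    """True if __cluster_metadata contains a matching PARTITION_RECORD."""
    return any(r.topic_uuid == topic_uuid and r.partition_index == partition_index
               for r in records)

def produce_error_code(records, topic_uuid, partition_index) -> int:
    if partition_exists(records, topic_uuid, partition_index):
        return 0  # NO_ERROR
    return UNKNOWN_TOPIC_OR_PARTITION
```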
Member

From an implementer's standpoint, the code for the previous stage is automatically going to pass this one too - they don't really have to parse the cluster metadata file at all and can hardcode the error case. We should rework this; it doesn't make sense to break this down unless we can actually test whether they're implementing the steps we expect.

In the Fetch extension, we tell people that they can hardcode the error response for now, and that they'll get to actually parsing / checking etc. in later stages.

Contributor Author

Yep, there's no way to test that they are parsing PARTITION_RECORDs, so we're going to merge stages 2 and 3.


## Single Record Production

When a Kafka broker receives a Produce request for a valid topic and partition, it needs to validate that the topic and partition exist (using the `__cluster_metadata` topic's log file), store the record in the appropriate log file using Kafka's on-disk format, and return a successful response with the assigned offset.
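For the simplest case this paragraph describes (one record into an empty partition), a hedged sketch of the persistence step could look like the following; the path pattern is taken from the stage text, everything else is illustrative:

```python
from pathlib import Path

def persist_single_batch(log_dir: str, topic: str, partition: int,
                         record_batch: bytes) -> int:
    """Write one RecordBatch into an (assumed empty) partition log; return its base offset."""
    partition_dir = Path(log_dir) / f"{topic}-{partition}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    log_file = partition_dir / "00000000000000000000.log"
    log_file.write_bytes(record_batch)  # the batch from the Produce request, stored as-is
    return 0                            # first record in an empty partition -> offset 0

# e.g. persist_single_batch("/tmp/kraft-combined-logs", "greetings", 0, batch_bytes)
```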
Member

Given how this breakdown is currently set up, I think users are going to end up hardcoding the error response for stages 2-3, and then in stage 4 they'll be stuck having to implement both parsing cluster_metadata and storing records in the log file.

Is there a way around this?


@ryan-gang ryan-gang Jul 17, 2025


Don't think so.
One unconventional way might be:
Stage 2: Introduce creating successful responses (hard-coded).
Stage 3: Introduce the error case. At this point, since they can receive both successful and invalid requests, they would be forced to parse the cluster metadata log and check for the existence of both topics and partitions.


@ryan-gang ryan-gang requested a review from rohitpaulk July 17, 2025 19:30

@rohitpaulk rohitpaulk left a comment


Added one round of comments


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
course-definition.yml (1)

214-255: Duplicate slug values detected – please rename for uniqueness
The following slug values appear more than once across your YAML files, which will silently break course progression:

  • concurrent-clients
  • consuming-messages
  • listing-partitions
  • producing-messages

Action items:

  • Identify each occurrence of the above slugs (e.g. in extension-definition.yml or course-definition.yml).
  • Rename the offending slug: entries to unique identifiers.
  • Re-run:
grep -RhoP 'slug:\s*"\K[^"]+' . | sort | uniq -d

to verify that no duplicates remain.

♻️ Duplicate comments (9)
course-definition.yml (1)

236-247: Marketing copy contradicts earlier feedback about multi-partition tests

Line 20 of producing-messages-06-ar4.md still says “a single partition for each topic”, even after feedback requesting multi-partition coverage. Please align the marketing text here with the actual test plan (multi-partition should be supported).

No code change needed in this file, but update the description for clarity.

stage_descriptions/producing-messages-06-ar4.md (2)

38-40: Missing binspec link for on-disk file format

Previous review asked to link the RecordBatch on-disk binspec wherever persistence is tested. Add a bullet under “Tests”:

- 🔎 [RecordBatch file format](https://binspec.org/kafka-record-batch-v2)

20-22: Still restricts to one partition/topic – contradicts prior review

Line 20 explicitly says “a single partition for each topic”; prior feedback requested multi-partition coverage. Update wording & tests or mark reason for deviation.

stage_descriptions/producing-messages-03-ps7.md (3)

14-14: Provide the actual RecordBatch spec URL instead of the placeholder.

The [RecordBatch](add-link) placeholder is still unresolved. Replace it with the real binspec (or official docs) so learners can jump straight to the format definition.

- Kafka's on-disk log format uses the same [RecordBatch](add-link) format that is used in `Produce` and `Fetch` requests.
+Kafka's on-disk log format uses the same [RecordBatch](https://binspec.org/kafka-record-batch) format that is used in `Produce` and `Fetch` requests.

35-38: Explicitly name the topics array and require exactly one element.

Previous feedback asked for consistency with the Fetch stages. Mention the exact field (topics) and the expected cardinality to avoid ambiguity for implementers.

- Inside the topic response:
+The response’s `topics` array must contain exactly **one** entry.
+Inside that single topic response:

46-46: Clarify what “CRC validation” means for implementers.

They only need to write the CRC for each RecordBatch; validating it on write-path isn’t required in this stage and may confuse readers. Re-phrase accordingly.

- On-disk log files must be stored in `RecordBatch` format with proper CRC validation.
+On-disk log files must be stored in `RecordBatch` format, including the correct CRC for each batch.
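Since "the correct CRC for each batch" is the part implementers most often get wrong, here is a small sketch of CRC-32C, the checksum the RecordBatch `crc` field uses. It is my own illustration, assuming the RecordBatch v2 layout in which the CRC covers everything from the attributes field to the end of the batch (bytes 21 onward, with the CRC stored at bytes 17..20).

```python
import struct

def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli), bit-by-bit for clarity; real brokers use table/hardware versions."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1))
    return crc ^ 0xFFFFFFFF

def fill_batch_crc(batch: bytearray) -> None:
    """Compute the CRC over batch[21:] and store it big-endian at bytes 17..20.

    Offsets assume the v2 header: baseOffset(8) + batchLength(4) +
    partitionLeaderEpoch(4) + magic(1) = 17, then crc(4), then attributes.
    """
    struct.pack_into(">I", batch, 17, crc32c(bytes(batch[21:])))

assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
```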
stage_descriptions/producing-messages-04-sb8.md (1)

5-5: Add a binspec link for a multi-record topic log file.

Earlier feedback requested a binspec showing a regular topic’s log file with multiple messages; that context is still missing and would greatly help users verify their on-disk output.

When a Kafka broker receives a `Produce` request containing a `RecordBatch` with multiple records, it needs to validate that the topic and partition exist, assign sequential offsets to each record within the batch, and store the entire batch atomically to the log file. 
+
+You can inspect an example topic log file containing a multi-record batch here:
+- 🔎 [Sample topic log file](https://binspec.org/kafka-topic-log-multi-record)
+
The `RecordBatch` containing multiple records must be stored as a single unit to the topic's log file at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log`.
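To illustrate the "sequential offsets" requirement, a small sketch (mine, not the stage's) of how per-record offsets, last_offset_delta, and the next batch's base offset relate:

```python
def record_offsets(base_offset: int, record_count: int) -> list[int]:
    """Absolute offsets for records in one batch: base_offset + offset_delta (0..n-1)."""
    return [base_offset + delta for delta in range(record_count)]

# A 3-record batch appended to an empty partition:
base_offset = 0
offsets = record_offsets(base_offset, 3)        # [0, 1, 2]
last_offset_delta = len(offsets) - 1            # 2, stored in the batch header
next_base_offset = base_offset + len(offsets)   # 3: where the next batch would start
print(offsets, last_offset_delta, next_base_offset)
```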
stage_descriptions/producing-messages-02-ck2.md (2)

21-25: Link to an error-response binspec rather than the generic success response.

Learners need to see the exact on-wire layout for an error scenario.

-🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11)
+🔎 [Produce Error Response (v11)](https://binspec.org/kafka-produce-error-response-v11)

8-18: Trim the deep metadata-parsing explanation; this stage only requires a hard-coded error response.

Earlier review feedback pointed out that learners can’t be tested on UUID look-ups yet. A concise “hard-code UNKNOWN_TOPIC_OR_PARTITION for any unknown topic/partition” keeps the stage focused.

-### Topic Validation
-...
-If there exists a `TOPIC_RECORD` with the given topic name and the topic ID, the topic exists.
-
-### Partition Validation
-...
-If there exists a `PARTITION_RECORD` with the given partition index, the UUID of the topic it is associated with, and the UUID of the directory it is associated with, the partition exists.
-
-If either the topic or partition doesn't exist, the broker returns an error code of `3` (UNKNOWN_TOPIC_OR_PARTITION).
+For this stage you **don’t** have to read `__cluster_metadata`.  
+Simply parse the `Produce` request to extract the topic & partition, and respond with error code `3` (UNKNOWN_TOPIC_OR_PARTITION) whenever they don’t match a hard-coded “known” topic/partition pair.  
+Full metadata validation will be introduced in a later stage.
🧹 Nitpick comments (8)
course-definition.yml (1)

221-227: Stage name is wordy & deviates from earlier naming guidelines

Previous review suggested shorter, action-focused titles (“Produce to an invalid topic”) to keep the stage list readable.
Current title "Produce to an invalid topic or partition" is still long and mixes two error cases. Consider splitting or trimming:

-    name: "Produce to an invalid topic or partition"
+    name: "Produce to an invalid topic/partition"
stage_descriptions/producing-messages-01-um3.md (1)

12-13: Tiny wording tweak for flow

Consider plural “clients” for better readability:

-This will let the client know that the broker supports the `Produce` API.
+This lets clients know that your broker supports the `Produce` API.
stage_descriptions/consuming-messages-01-gs0.md (2)

29-31: Spelling: “atleast” ➜ “at least”

-- The `MaxVersion` for the Fetch API is atleast 16.
+- The `MaxVersion` for the Fetch API is at least 16.

26-29: Consistent constant casing

Other docs use NO_ERROR; this file uses “No Error”. Align to avoid learner confusion.

-- The error code in the response body is `0` (No Error).
+- The error code in the response body is `0` (NO_ERROR).
stage_descriptions/consuming-messages-02-dh6.md (1)

42-45: Remove stray double-space & line-wrap

The markdown has two spaces before “to scroll”, producing a non-breaking space in rendered HTML.

- Make sure  to scroll down to the "Fetch Response (Version: 16)" section.
+ Make sure to scroll down to the "Fetch Response (Version: 16)" section.
stage_descriptions/producing-messages-04-sb8.md (1)

32-33: Keep the wording minimal; the 0 base-offset rule is already declared.

The parenthetical explanation is redundant given the bullet explicitly says the value must be 0. Dropping it streamlines the spec.

-    - The `base_offset` field is 0 (the base offset for the batch).
+    - The `base_offset` field is `0`.
stage_descriptions/producing-messages-05-mf2.md (2)

24-31: State that the topics array must contain exactly one element for consistency.

Other stages make this explicit; repeating it here prevents off-by-one misunderstandings when multiple topics are introduced in the next stage.

- There is a single topic present in the response.
+The response’s `topics` array must contain exactly **one** element.

44-45: Mention the on-disk path to keep instructions uniform.

Previous stages specify the exact log-file path pattern; adding it here maintains consistency and removes guess-work.

- On-disk log files must be stored in `RecordBatch` format with proper CRC validation.
+On-disk log files for each partition must be stored in `RecordBatch` format (with correct CRC) at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log`.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7572554 and 9e1ef42.

📒 Files selected for processing (9)
  • course-definition.yml (2 hunks)
  • stage_descriptions/consuming-messages-01-gs0.md (2 hunks)
  • stage_descriptions/consuming-messages-02-dh6.md (2 hunks)
  • stage_descriptions/producing-messages-01-um3.md (1 hunks)
  • stage_descriptions/producing-messages-02-ck2.md (1 hunks)
  • stage_descriptions/producing-messages-03-ps7.md (1 hunks)
  • stage_descriptions/producing-messages-04-sb8.md (1 hunks)
  • stage_descriptions/producing-messages-05-mf2.md (1 hunks)
  • stage_descriptions/producing-messages-06-ar4.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test_course_definition / test (go)
  • GitHub Check: test_course_definition / test (typescript)
  • GitHub Check: test_course_definition / test (zig)
  • GitHub Check: test_course_definition / test (java)
  • GitHub Check: test_course_definition / test (cpp)
🔇 Additional comments (1)
course-definition.yml (1)

74-82: Ensure list item indentation matches YAML spec

The new producing-messages extension sits at the same indentation level as its siblings, which is correct, but the nested block under description_markdown: must be indented by exactly two spaces relative to the key to be parsed as a multi-line scalar. A quick scan shows 4-space indentation – that's fine, since earlier extensions follow the same convention; just make sure you keep it consistent to avoid silent YAML parse failures.

ryan-gang added 18 commits July 21, 2025 23:19
…ve protocol inspector links and clarifying APIVersions requirements.
…xistent topics, including error response details and interactive protocol inspector links
…s for invalid topics, including error response details
…lid partitions, detailing error responses and adding interactive protocol inspector links
…uction process, including validation steps, log file format, and updated response fields
…uction by consolidating validation, storage, and response steps into a single sentence
… format and add interactive protocol inspector link for cluster metadata topic's log file
…le records in a single RecordBatch, including validation, storage, and updated response fields
…e partitions, including validation, response structure, and log file requirements
…le topics and partitions, including validation, response structure, and offset management
…ng on baseOffset and log_start_offset handling in Kafka
…cluding multiple stages for handling various production scenarios such as invalid topics, partitions, and batch production
… invalid topics and partitions, enhancing clarity on error handling and response structure
…tions in a single request, including validation, independent offset management, and nested response structure
…f invalid topics or partitions, enhancing consistency in error response messaging
…for invalid topics and partitions, removing detailed validation steps for clarity
…id topics, including validation steps and interactive protocol inspector links

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
stage_descriptions/producing-messages-07-ar4.md (3)

29-37: Bullet-nesting is off; Markdown renders this list flat

Lines 29-37 should be indented by two spaces so that “Inside each topic response” and its sub-items render as a nested list. Otherwise the hierarchy is lost and the doc looks jumbled.

-29 - Inside each topic response:
+29   - Inside each topic response:
-30   - The `name` field matches the topic name in the request.
+30     - The `name` field matches the topic name in the request.
  … (indent the remaining sub-bullets similarly)

5-5: Clarify independent offsets example

The example “topic foo partition 0 and topic bar partition 0 can both have records starting at offset 0” is good, but explicitly stating that offsets are scoped per-topic-partition removes any ambiguity for newcomers.

-Each topic-partition combination maintains its own independent offset sequence,
+Each <topic, partition> pair maintains its own independent, zero-based offset sequence,
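A tiny, purely illustrative sketch of that independence: each pair keeps its own counter, nothing is shared.

```python
# Offsets are scoped per <topic, partition>; two different pairs can both start at 0.
next_offset: dict[tuple[str, int], int] = {}

def assign_offsets(topic: str, partition: int, record_count: int) -> int:
    """Return the base offset for a batch and advance this pair's counter."""
    key = (topic, partition)
    base = next_offset.get(key, 0)
    next_offset[key] = base + record_count
    return base

assert assign_offsets("foo", 0, 1) == 0
assert assign_offsets("bar", 0, 1) == 0   # independent sequence, also starts at 0
assert assign_offsets("foo", 0, 2) == 1   # continues foo-0's own sequence
```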

38-39: Persist-to-disk criterion could mention RecordBatch format

Earlier stages emphasise persisting in RecordBatch with CRC validation; restating it here keeps consistency and prevents accidental regression.

- - Records are persisted to the correct topic-partition log files on disk.
+ - Records are persisted to the correct topic-partition log files on disk in valid Kafka `RecordBatch` format (CRC verified).
stage_descriptions/producing-messages-06-mf2.md (2)

29-37: Fix list indentation for proper nesting

Indent “Inside the topic response” and its child bullets to render correctly.

-29 - Inside the topic response:
+29   - Inside the topic response:
-30   - The `name` field matches the topic name in the request.
+30     - The `name` field matches the topic name in the request.
  … (indent remaining sub-bullets)

34-34: Highlight meaning of base_offset

Explicitly say it’s the first offset of the RecordBatch; this prevents confusion when batches hold multiple records.

-     - The `base_offset` field contains the assigned offset for that partition.
+     - `base_offset` equals the first offset of the RecordBatch written to that partition.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c77abf3 and 7b40866.

📒 Files selected for processing (6)
  • stage_descriptions/producing-messages-02-ck2.md (1 hunks)
  • stage_descriptions/producing-messages-03-dp1.md (1 hunks)
  • stage_descriptions/producing-messages-04-ps7.md (1 hunks)
  • stage_descriptions/producing-messages-05-sb8.md (1 hunks)
  • stage_descriptions/producing-messages-06-mf2.md (1 hunks)
  • stage_descriptions/producing-messages-07-ar4.md (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • stage_descriptions/producing-messages-05-sb8.md
  • stage_descriptions/producing-messages-03-dp1.md
  • stage_descriptions/producing-messages-04-ps7.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • stage_descriptions/producing-messages-02-ck2.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test_course_definition / test (typescript)
  • GitHub Check: test_course_definition / test (go)
  • GitHub Check: test_course_definition / test (zig)
  • GitHub Check: test_course_definition / test (java)
  • GitHub Check: test_course_definition / test (cpp)
🔇 Additional comments (1)
stage_descriptions/producing-messages-06-mf2.md (1)

44-44: Good call-out on RecordBatch & CRC

Re-emphasising CRC validation reinforces an important correctness requirement.

…s multiple partitions for each topic, enhancing understanding of request structure
…pdating link titles for consistency and accuracy
…nsistency and clarity across Produce request validations
@ryan-gang ryan-gang requested a review from rohitpaulk July 21, 2025 20:18
@rohitpaulk

Will review this tomorrow; I want to see if I can use it to test automated reviews via core.

@ryan-gang

@rohitpaulk this is pending.

@rohitpaulk

Yep, getting to it now


@rohitpaulk rohitpaulk left a comment


Reviewed half; let's revise based on these and request a review again, please.


We've created an interactive protocol inspector for the request & response structures for `Produce`:

- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11)
Member

@ryan-gang did we settle on v11 vs v12? What was the conclusion?

Contributor Author

Yes, we are going to go with v12.
The Request and Response structures are identical.
Changes in stage descriptions and binspec would be very minimal.
For the tester, changes would be limited, but we would have to change the version of Kafka we use for testing on CI and locally, and the fixtures are going to change a lot.

So I thought I would ship the tester with v11 and then, in a separate PR, upgrade the version and update CI + fixtures.

ryan-gang and others added 3 commits July 25, 2025 23:27
… 'Produce' references and improving overall readability
…th valid topics and partitions, and improve link references for RecordBatch format
… in stage descriptions to streamline content

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
stage_descriptions/producing-messages-06-mf2.md (2)

30-38: Fine-tune nested bullet indentation for reliable Markdown rendering

GitHub CommonMark sometimes treats two-space indentation ambiguously; four spaces (or a tab) is safest for guaranteeing a properly nested list.
Consider shifting lines 31-37 four spaces right to avoid the chance of the sub-list collapsing into plain text on some renderers.


35-36: Clarify expected offset semantics

Stating that base_offset “contains the assigned offset” leaves room for interpretation.
Appending a brief note such as “(e.g., 0 for the first record in an empty partition)” reinforces that offsets are scoped per partition and start from 0 when no prior records exist.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 543120f and 1096865.

📒 Files selected for processing (7)
  • stage_descriptions/producing-messages-01-um3.md (1 hunks)
  • stage_descriptions/producing-messages-02-ck2.md (1 hunks)
  • stage_descriptions/producing-messages-03-dp1.md (1 hunks)
  • stage_descriptions/producing-messages-04-ps7.md (1 hunks)
  • stage_descriptions/producing-messages-05-sb8.md (1 hunks)
  • stage_descriptions/producing-messages-06-mf2.md (1 hunks)
  • stage_descriptions/producing-messages-07-ar4.md (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • stage_descriptions/producing-messages-05-sb8.md
  • stage_descriptions/producing-messages-04-ps7.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • stage_descriptions/producing-messages-03-dp1.md
  • stage_descriptions/producing-messages-07-ar4.md
  • stage_descriptions/producing-messages-01-um3.md
  • stage_descriptions/producing-messages-02-ck2.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test_course_definition / test (typescript)
🔇 Additional comments (1)
stage_descriptions/producing-messages-06-mf2.md (1)

1-46: Overall description reads clearly — no blocking issues

The stage requirements, validation steps, and external references are precise and technically accurate. Nice work.

…ion from v11 to v12, ensuring consistency in request and response links and validation criteria
coderabbitai[bot]

This comment was marked as outdated.

…est now includes a recordBatch containing a single record, enhancing clarity on request structure
…ractive protocol inspector links and clarifying record handling for multiple records and partitions
…hat records are written to each partition's log file independently
@ryan-gang ryan-gang requested a review from rohitpaulk July 25, 2025 18:30
The record must be persisted to the topic's log file at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log` using Kafka's RecordBatch format.
You can refer to the official Kafka docs for the [RecordBatch format](https://kafka.apache.org/documentation/#recordbatch).

Kafka's on-disk log format uses the same [RecordBatch](https://binspec.org/kafka-record-batches) format that is used in `Produce` and `Fetch` requests. The `RecordBatch` you receive in the `Produce` request can be written to the log file as is.
Member

Is this actually a binspec file we're going to make? Could we just link to a recordBatch within a topic's binspec?

Contributor Author

Yes, we could link to a topic's log file. It would be just a couple of record batches.

@@ -0,0 +1,43 @@
In this stage, you'll add support for producing multiple records in a single `Produce` request.

## Producing multiple records
Member

@ryan-gang at what point do we test "appending" to a log file? That seems like something that'd need extra implementation and something we should explicitly call out and test.
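For context on why appending is extra work, here is a rough sketch (my own, hedged) of what an append path would add on top of the single-batch case: derive the next offset from the batches already on disk and rewrite the incoming batch's baseOffset before writing. The header offsets assume the RecordBatch v2 on-disk layout; since the CRC is computed over the bytes after it, rewriting baseOffset should not invalidate it.

```python
import struct
from pathlib import Path

def next_base_offset(log_bytes: bytes) -> int:
    """Scan existing batches and return the offset the next batch should start at."""
    pos, next_offset = 0, 0
    while pos < len(log_bytes):
        base_offset, batch_length = struct.unpack_from(">qi", log_bytes, pos)
        last_offset_delta = struct.unpack_from(">i", log_bytes, pos + 23)[0]
        next_offset = base_offset + last_offset_delta + 1
        pos += 12 + batch_length  # 8 (baseOffset) + 4 (batchLength) + payload
    return next_offset

def append_batch(log_file: Path, record_batch: bytes) -> int:
    """Rewrite the batch's baseOffset to the next offset, append it, return that offset."""
    existing = log_file.read_bytes() if log_file.exists() else b""
    base_offset = next_base_offset(existing)
    patched = struct.pack(">q", base_offset) + record_batch[8:]  # overwrite baseOffset only
    with log_file.open("ab") as f:
        f.write(patched)
    return base_offset
```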

ryan-gang and others added 2 commits July 28, 2025 21:21
…ect accurate RecordBatch format references and clarify log format details
The record must be persisted to the topic's log file at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log` using Kafka's [RecordBatch format](https://kafka.apache.org/documentation/#recordbatch).

TODO: edit link after topic log binspec is in
Kafka's on-disk log format uses the same [RecordBatch](https://binspec.org/kafka-topic-log) format that is used in `Produce` and `Fetch` requests. The `RecordBatch` you receive in the `Produce` request can be written to the log file as is.

Bug: Remove Outdated Comment

The "TODO: edit link after topic log binspec is in" comment was accidentally committed. This development note is outdated and confusing, as the binspec.org/kafka-topic-log link it refers to is already present and used in the immediate vicinity. It should be removed.

