diff --git a/all-stages.md b/all-stages.md new file mode 100644 index 0000000..b595461 --- /dev/null +++ b/all-stages.md @@ -0,0 +1,28 @@ +- Include CreateTopics in APIVersions +- CreateTopics with Invalid Topic Name (empty, ., .., length > 249, invalid characters) +- CreateTopics with Invalid Partition Values (negative, zero) +- CreateTopics with Invalid Replication Values (negative, zero, > broker count) +- CreateTopics with Reserved Topic Name (__cluster_metadata) +- CreateTopics with Existing Topic (read __cluster_metadata) +- CreateTopics with Authorization Check (no CREATE permission on CLUSTER or TOPIC resource) +- CreateTopics with Single Topic (single topic in single request) +- CreateTopics with Multiple Topics (multiple topics in single request) +- CreateTopics with Manual Assignments (manual assignment of partitions and replicas) +- CreateTopics with Validation Only +- CreateTopics with Invalid Request (Duplicate topic names in single request) + + +1. Topic Name Validation - INVALID_TOPIC_EXCEPTION (17) +- Empty topic name or "." or ".." +- Length > 249 characters +- Invalid characters (only ASCII alphanumerics, '.', '_', '-') +- Internal topic collision detection + + + +Stage CTX: CreateTopics with Invalid Parameters +- Handle negative/zero partitions, invalid replication factors +- Duplicate topic names in single request +- Error codes: INVALID_PARTITIONS (37), INVALID_REPLICATION_FACTOR (38) + +Stage CTX: Topic-specific configurations diff --git a/course-definition.yml b/course-definition.yml index c4f7483..9adfaee 100644 --- a/course-definition.yml +++ b/course-definition.yml @@ -71,6 +71,24 @@ extensions: [fetch-api]: https://kafka.apache.org/protocol.html#The_Messages_Fetch + - slug: "producing-messages" + name: "Producing Messages" + description_markdown: | + In this challenge extension you'll add support for producing messages by implementing the [Produce][produce-api] API. + + Along the way you'll learn about how Kafka's Produce API works, how Kafka stores messages on disk and more. + + [produce-api]: https://kafka.apache.org/protocol.html#The_Messages_Produce + + - slug: "creating-topics" + name: "Creating Topics" + description_markdown: | + In this challenge extension you'll add support for creating topics by implementing the [CreateTopics][create-topics-api] API. + + Along the way you'll learn about how Kafka's CreateTopics API works, topic validation, the `__cluster_metadata` topic and more. + + [create-topics-api]: https://kafka.apache.org/protocol.html#The_Messages_CreateTopics + stages: - slug: "vi6" name: "Bind to a port" @@ -199,3 +217,91 @@ stages: difficulty: hard marketing_md: |- In this stage, you'll implement the Fetch response for a topic with multiple messages, reading them from disk. + + # Producing Messages + + - slug: "um3" + primary_extension_slug: "producing-messages" + name: "Include Produce in APIVersions" + difficulty: easy + marketing_md: |- + In this stage, you'll add the Produce API to the APIVersions response. + + - slug: "ck2" + primary_extension_slug: "producing-messages" + name: "Produce to an invalid topic or partition" + difficulty: medium + marketing_md: |- + In this stage, you'll implement the Produce response for an invalid topic or partition. + + - slug: "ps7" + primary_extension_slug: "producing-messages" + name: "Produce a single record" + difficulty: hard + marketing_md: |- + In this stage, you'll implement successfully producing a single record to disk. 
+ + - slug: "sb8" + primary_extension_slug: "producing-messages" + name: "Produce multiple records" + difficulty: hard + marketing_md: |- + In this stage, you'll implement producing multiple records in a single record batch. + + - slug: "mf2" + primary_extension_slug: "producing-messages" + name: "Produce to multiple partitions" + difficulty: hard + marketing_md: |- + In this stage, you'll implement producing to multiple partitions of the same topic. + + - slug: "ar4" + primary_extension_slug: "producing-messages" + name: "Produce to multiple topics" + difficulty: hard + marketing_md: |- + In this stage, you'll implement producing to multiple topics in a single request. + + # Creating Topics + + - slug: "yb1" + primary_extension_slug: "creating-topics" + name: "Include CreateTopics in APIVersions" + difficulty: easy + marketing_md: |- + In this stage, you'll add the CreateTopics API to the APIVersions response. + + - slug: "ve7" + primary_extension_slug: "creating-topics" + name: "CreateTopics with Invalid Topic Name" + difficulty: medium + marketing_md: |- + In this stage, you'll implement the CreateTopics response for invalid topic names. + + - slug: "us2" + primary_extension_slug: "creating-topics" + name: "CreateTopics with Existing Topic Name" + difficulty: hard + marketing_md: |- + In this stage, you'll implement the CreateTopics response for existing and reserved topic names. + + - slug: "hh9" + primary_extension_slug: "creating-topics" + name: "CreateTopics with Validation Only" + difficulty: hard + marketing_md: |- + In this stage, you'll implement the CreateTopics response for validation-only mode. + + - slug: "rk2" + primary_extension_slug: "creating-topics" + name: "CreateTopics with a single topic" + difficulty: hard + marketing_md: |- + In this stage, you'll implement creating a single topic. + + - slug: "fl3" + primary_extension_slug: "creating-topics" + name: "CreateTopics with Multiple Topics" + difficulty: hard + marketing_md: |- + In this stage, you'll implement creating multiple topics in a single request. diff --git a/final-stages.md b/final-stages.md new file mode 100644 index 0000000..5e3e908 --- /dev/null +++ b/final-stages.md @@ -0,0 +1,21 @@ +Stage CT1: Include CreateTopics in APIVersions +- Add API key 19 (CreateTopics) to APIVersions response +- Foundation stage enabling API discovery + +Stage CT2: CreateTopics with Invalid Topic Name (Hard code error) +- Handle invalid characters, reserved names, empty names +- Error code: INVALID_TOPIC_EXCEPTION (17) + +Stage CT3: CreateTopics with Existing Topic Name (Read __cluster_metadata) +- Handle attempts to create system topics (__cluster_metadata): INVALID_REQUEST (42) +- Handle duplicate topic creation attempts: TOPIC_ALREADY_EXISTS (36) + +Stage CT4: CreateTopics with Validation Only +- Success case without persisting any data + +Stage CT5: CreateTopics with Valid Parameters (Success Case) +- Successfully create single topic with valid parameters +- Core success functionality + +Stage CT6: Multiple topics in single CreateTopics request +- Handle multiple topics in a single request diff --git a/stage_descriptions/consuming-messages-01-gs0.md b/stage_descriptions/consuming-messages-01-gs0.md index 20403af..a0f68b3 100644 --- a/stage_descriptions/consuming-messages-01-gs0.md +++ b/stage_descriptions/consuming-messages-01-gs0.md @@ -1,10 +1,15 @@ In this stage, you'll add an entry for the `Fetch` API to the APIVersions response. -🚧 **We're still working on instructions for this stage**. 
You can find notes on how the tester works below. +### The Fetch API -In the meantime, please use -[this link](https://forum.codecrafters.io/new-topic?category=Challenges&tags=challenge%3Akafka&title=Question+about+gs0%3A+Include+Fetch+in+APIVersions&body=%3Cyour+question+here%3E) -to ask questions on the forum. +The [Fetch API](https://kafka.apache.org/protocol#The_Messages_Fetch) (API key `1`) is used to fetch messages from a Kafka topic. + +We've created an interactive protocol inspector for the request & response structures for `Fetch`: + +- 🔎 [Fetch Request (v16)](https://binspec.org/kafka-fetch-request-v16) +- 🔎 [Fetch Response (v16)](https://binspec.org/kafka-fetch-response-v16) + +In this stage, you'll only need to add an entry for the `Fetch` API to the APIVersions response you implemented in earlier stages. We'll get to responding to `Fetch` requests in later stages. ### Tests @@ -26,6 +31,6 @@ The tester will validate that: ### Notes -- You don't have to implement support for the `Fetch` request in this stage. We'll get to this in later stages. -- You'll still need to include the entry for `APIVersions` in your response to pass the previous stage. +- You don't have to implement support for handling `Fetch` requests in this stage. We'll get to this in later stages. +- You'll still need to include the entry for `APIVersions` in your response to pass earlier stages. - The `MaxVersion` for the `Fetch` and `APIVersions` are different. For `APIVersions`, it is 4. For `Fetch`, it is 16. diff --git a/stage_descriptions/consuming-messages-02-dh6.md b/stage_descriptions/consuming-messages-02-dh6.md index b413300..d358007 100644 --- a/stage_descriptions/consuming-messages-02-dh6.md +++ b/stage_descriptions/consuming-messages-02-dh6.md @@ -1,10 +1,15 @@ -In this stage, you'll implement the Fetch response for a Fetch request with no topics. +In this stage, you'll implement the response for a `Fetch` request with no topics. -🚧 **We're still working on instructions for this stage**. You can find notes on how the tester works below. +### Fetch API response for no topics -In the meantime, please use -[this link](https://forum.codecrafters.io/new-topic?category=Challenges&tags=challenge%3Akafka&title=Question+about+dh6%3A+Fetch+with+no+topics&body=%3Cyour+question+here%3E) -to ask questions on the forum. +A `Fetch` request includes a list of topics to fetch messages from. If the request contains an empty list of topics, the `responses` field in the response will be an empty array. + +Here are interactive visualizations of what the `Fetch` request & response will look like when the request contains an empty list of topics: + +- 🔎 [Fetch Request (v17) with no topics](https://binspec.org/kafka-fetch-request-v17-no-topics) +- 🔎 [Fetch Response (v17) with no topics](https://binspec.org/kafka-fetch-response-v17-no-topics) + +In this stage, you'll need to implement the response for a `Fetch` request with an empty list of topics. We'll get to handling `Fetch` requests with topics in later stages. ### Tests @@ -36,4 +41,4 @@ The tester will validate that: - You don't need to parse the fields in the `Fetch` request in this stage, we'll get to this in later stages. - The official docs for the `Fetch` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Fetch). Make sure -to scroll down to the "Fetch Response (Version: 16)" section. \ No newline at end of file + to scroll down to the "Fetch Response (Version: 16)" section. 
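To make the shape of this response concrete, here's a minimal sketch of encoding it, assuming the v16 layout shown in the inspectors (response header v1 with a tag buffer, then `throttle_time_ms`, `error_code`, `session_id`, and a compact `responses` array). The function name and the zeroed `session_id` are illustrative, not prescribed by the tester:

```python
import struct

def empty_fetch_response(correlation_id: int) -> bytes:
    # Response header v1: correlation_id (INT32) followed by an empty tag buffer.
    header = struct.pack(">i", correlation_id) + b"\x00"
    body = struct.pack(">i", 0)   # throttle_time_ms
    body += struct.pack(">h", 0)  # error_code (0 = no error)
    body += struct.pack(">i", 0)  # session_id
    body += b"\x01"               # responses: empty compact array (length 0, encoded as 0 + 1)
    body += b"\x00"               # empty tag buffer
    message = header + body
    return struct.pack(">i", len(message)) + message  # 4-byte message length prefix
```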
diff --git a/stage_descriptions/creating-topics-01-yb1.md b/stage_descriptions/creating-topics-01-yb1.md new file mode 100644 index 0000000..800524d --- /dev/null +++ b/stage_descriptions/creating-topics-01-yb1.md @@ -0,0 +1,36 @@ +In this stage, you'll add an entry for the `CreateTopics` API to the APIVersions response. + +## The CreateTopics API + +The [CreateTopics API](https://kafka.apache.org/protocol#The_Messages_CreateTopics) (API key `19`) is used to create topics in a Kafka cluster. We'll use a single broker in our cluster for this extension. + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +In this stage, you'll only need to add an entry for the `CreateTopics` API to the APIVersions response you implemented in earlier stages. This will let the client know that the broker supports the `CreateTopics` API. We'll get to responding to `CreateTopics` requests in later stages. + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a valid `APIVersions` (v4) request. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (No Error). +- The response body contains at least one entry for the API key `19` (CreateTopics). +- The `MaxVersion` for the CreateTopics API is at least 6. + +## Notes + +- You don't have to implement support for the `CreateTopics` request in this stage. We'll get to this in later stages. +- You'll still need to include the entry for `APIVersions` in your response to pass previous stages. +- The `MaxVersion` for the `CreateTopics` API is 6. diff --git a/stage_descriptions/creating-topics-02-ve7.md b/stage_descriptions/creating-topics-02-ve7.md new file mode 100644 index 0000000..ac1f405 --- /dev/null +++ b/stage_descriptions/creating-topics-02-ve7.md @@ -0,0 +1,46 @@ +In this stage, you'll add support for handling `CreateTopics` requests with invalid topic names. + +## CreateTopics API response for invalid topics + +When a Kafka broker receives a CreateTopics request, it first needs to validate that the topic name follows Kafka's naming rules. If a topic name is invalid, it returns an appropriate error code and response without attempting to create the topic. + +Topic names must follow these rules: +- Cannot be empty string +- Cannot be "." or ".." +- Maximum length of 249 characters +- Only ASCII alphanumerics, '.', '_', and '-' are allowed + +If the topic name is invalid, the broker returns an error code of `17` (INVALID_TOPIC_EXCEPTION). + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +In this stage, you'll need to implement basic topic name validation without needing to check against existing topics or system topics. You can hard code the error response for invalid topic names in this stage. We'll get to checking against existing topics and system topics in later stages. 
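Here's a minimal sketch of these naming rules in Python (the function name is illustrative; Kafka's own implementation lives in the `Topic.java` file linked in the notes below):

```python
import re

TOPIC_NAME_MAX_LENGTH = 249
LEGAL_CHARS = re.compile(r"^[a-zA-Z0-9._-]+$")  # ASCII alphanumerics, '.', '_' and '-'

def is_valid_topic_name(name: str) -> bool:
    if name in ("", ".", ".."):
        return False
    if len(name) > TOPIC_NAME_MAX_LENGTH:
        return False
    return LEGAL_CHARS.match(name) is not None
```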
+ +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `CreateTopics` (v6) request with an invalid topic name. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the topic response is `17` (INVALID_TOPIC_EXCEPTION). +- The `throttle_time_ms` field in the response is `0`. +- The `name` field in the topic response corresponds to the topic name in the request. +- The `error_message` field contains `Topic name is invalid`. +- The `num_partitions` and `replication_factor` fields are `-1`. + +## Notes + +- You'll need to parse the `CreateTopics` request in this stage to get the topic names. +- The official docs for the `CreateTopics` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_CreateTopics). +- Topic name validation logic is in the Kafka source code at `clients/src/main/java/org/apache/kafka/common/internals/Topic.java`. \ No newline at end of file diff --git a/stage_descriptions/creating-topics-03-us2.md b/stage_descriptions/creating-topics-03-us2.md new file mode 100644 index 0000000..5ebffde --- /dev/null +++ b/stage_descriptions/creating-topics-03-us2.md @@ -0,0 +1,50 @@ +In this stage, you'll add support for handling `CreateTopics` requests with existing topic names. + +## CreateTopics API response for existing topics + +When a Kafka broker receives a CreateTopics request, it needs to validate that the topic doesn't already exist and is not reserved for system use. If a topic already exists or is reserved, it returns an appropriate error code and response without attempting to create the topic again. + +To validate that a topic exists, the broker reads the `__cluster_metadata` topic's log file, located at `/tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log`. Inside the log file, the broker finds the topic's metadata, which is a `record` (inside a RecordBatch) with a payload of type `TOPIC_RECORD`. Each `TOPIC_RECORD` contains a topic's name and UUID; if a `TOPIC_RECORD` with the given topic name exists, the topic already exists. + +If the topic already exists, the broker returns an error code of `36` (TOPIC_ALREADY_EXISTS), with the `error_message` field containing `Topic '<topic-name>' already exists.` (where `<topic-name>` is the name from the request). + +Reserved topic names include: +- `__cluster_metadata` - the KRaft metadata topic + +If the topic name is reserved, the broker returns an error code of `42` (INVALID_REQUEST), with the `error_message` field containing `Creation of internal topic __cluster_metadata is prohibited.`. + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +We've also created an interactive protocol inspector for the `__cluster_metadata` topic's log file: +- 🔎 [__cluster_metadata topic's log file](https://binspec.org/kafka-cluster-metadata) + +This will help you understand the structure of the `TOPIC_RECORD` record inside the `__cluster_metadata` topic's log file. + +In this stage, you'll need to implement topic existence checking by reading the cluster metadata.
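As a rough sketch, the decision logic looks like this, assuming you've already collected the topic names from the `TOPIC_RECORD`s in the metadata log (the log-file parsing itself isn't shown, and the helper's shape is illustrative):

```python
TOPIC_ALREADY_EXISTS = 36
INVALID_REQUEST = 42

def check_topic_creation(name: str, existing_topics: set) -> tuple:
    # existing_topics: topic names collected from TOPIC_RECORDs in the
    # __cluster_metadata topic's log file.
    if name == "__cluster_metadata":
        return (INVALID_REQUEST, "Creation of internal topic __cluster_metadata is prohibited.")
    if name in existing_topics:
        return (TOPIC_ALREADY_EXISTS, f"Topic '{name}' already exists.")
    return (0, None)  # no error; creation itself comes in later stages
```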
+ +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `CreateTopics` (v6) request for an existing topic and a reserved topic name. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the topic response is `36` (TOPIC_ALREADY_EXISTS) or `42` (INVALID_REQUEST) depending on the request. +- The `throttle_time_ms` field in the response is `0`. +- The `name` field in the topic response corresponds to the topic name in the request. +- The `error_message` field contains the expected error message. +- The `num_partitions` and `replication_factor` fields are `-1`. + +## Notes + +- The official Kafka docs don't cover the structure of records inside the `__cluster_metadata` topic, but you can find the definitions in the Kafka source code [here](https://github.com/apache/kafka/tree/trunk/metadata/src/main/resources/common/metadata). \ No newline at end of file diff --git a/stage_descriptions/creating-topics-04-hh9.md b/stage_descriptions/creating-topics-04-hh9.md new file mode 100644 index 0000000..dcef5db --- /dev/null +++ b/stage_descriptions/creating-topics-04-hh9.md @@ -0,0 +1,50 @@ +In this stage, you'll add support for handling `CreateTopics` requests in validation mode. + +## Validation-Only Mode + +When a Kafka broker receives a CreateTopics request with the `validate_only` flag set to `true`, it needs to perform all validation checks as if creating the topics but without actually creating them or modifying any persistent state. This allows clients to validate topic creation parameters without side effects. + +The broker performs the following steps: +1. Validates the topic name +2. Checks that the topic doesn't already exist +3. Validates partition count and replication factor parameters +4. Returns success/error responses as if creating the topics +5. Does NOT write any `TOPIC_RECORD` or `PARTITION_RECORD` to the `__cluster_metadata` topic's log file +6. Does NOT create any partition directories in the log directory +7. Does NOT modify any persistent state (e.g. topic metadata, partition files, etc.) + +The response should be identical to what would be returned for actual topic creation, but without any persistence side effects. + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +In this stage, you'll need to check the `validate_only` flag in the request and modify your behavior accordingly. + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `CreateTopics` (v6) request with `validate_only` set to `true`. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the topic result is `0` (NO_ERROR) for valid topics. +- The `throttle_time_ms` field in the response is `0`. +- The `name` field in the topic result corresponds to the topic name in the request. 
+- NO topic metadata is written to the `__cluster_metadata` topic's log file. +- NO topic directories are created in the log directory. +- The response is identical to what would be returned for actual topic creation. + +## Notes + +- This stage tests the ability to perform dry-run validation without side effects. +- All validation logic from previous stages should be applied. +- The response should be identical to actual topic creation, but without persistence. \ No newline at end of file diff --git a/stage_descriptions/creating-topics-05-rk2.md b/stage_descriptions/creating-topics-05-rk2.md new file mode 100644 index 0000000..459b3de --- /dev/null +++ b/stage_descriptions/creating-topics-05-rk2.md @@ -0,0 +1,51 @@ +In this stage, you'll add support for handling `CreateTopics` requests for a single topic. + +## Single Topic Creation + +When a Kafka broker receives a CreateTopics request for a valid topic name that doesn't already exist, it needs to validate the parameters, create the topic metadata, persist it to the `__cluster_metadata` topic's log file, create the topic directory structure, and return a successful response. + +The broker performs the following steps: +1. Validates the topic name (from previous stages) +2. Checks that the topic doesn't already exist (from previous stages) +3. Validates partition count and replication factor parameters (request will include valid values) +4. Creates a new topic UUID +5. Writes a `TOPIC_RECORD` to the `__cluster_metadata` topic's log file +6. Writes `PARTITION_RECORD`s to the `__cluster_metadata` topic's log file for each partition +7. Creates partition directories in the log directory (e.g., `/tmp/kraft-combined-logs/topic-name-0/`) +8. Returns a successful response with error code `0` (NO_ERROR) + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +We've also created an interactive protocol inspector for the `__cluster_metadata` topic's log file: +- 🔎 [__cluster_metadata topic's log file](https://binspec.org/kafka-cluster-metadata) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `CreateTopics` (v6) request with a non-existent topic name. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the topic response is `0` (NO_ERROR). +- The `throttle_time_ms` field in the response is `0`. +- The `name` field in the topic response corresponds to the topic name in the request. +TODO: details +- The topic metadata is written to the `__cluster_metadata` topic's log file. +TODO: details +- The topic directory structure is created in the log directory. +- The `num_partitions` and `replication_factor` fields in the topic response correspond to the request. + +## Notes + +- The topic metadata should be persisted in the `__cluster_metadata` topic as a `TOPIC_RECORD`. +- Topic directories should be created in the log directory (e.g., `/tmp/kraft-combined-logs/topic-name-0/`). 
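As a rough outline of these steps in Python (the metadata-record encoding is deliberately stubbed out in comments, and the function shape is illustrative, not prescribed):

```python
import os
import uuid

LOG_DIR = "/tmp/kraft-combined-logs"

def create_topic(name: str, num_partitions: int) -> bytes:
    topic_id = uuid.uuid4().bytes  # 16-byte UUID assigned to the new topic

    # Append a TOPIC_RECORD (topic name + UUID) to the __cluster_metadata
    # topic's log file, then one PARTITION_RECORD per partition. Both payloads
    # are wrapped in a RecordBatch; the encoding itself isn't shown here.

    # Create one directory per partition, e.g. /tmp/kraft-combined-logs/foo-0/
    for partition in range(num_partitions):
        os.makedirs(os.path.join(LOG_DIR, f"{name}-{partition}"), exist_ok=True)

    return topic_id
```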
\ No newline at end of file diff --git a/stage_descriptions/creating-topics-06-fl3.md b/stage_descriptions/creating-topics-06-fl3.md new file mode 100644 index 0000000..a88c06a --- /dev/null +++ b/stage_descriptions/creating-topics-06-fl3.md @@ -0,0 +1,50 @@ +In this stage, you'll add support for handling `CreateTopics` requests with multiple topics. + +## Batch Topic Creation + +When a Kafka broker receives a CreateTopics request containing multiple topics, it needs to process each topic independently, applying all validation rules and creation logic to each one. The broker processes each topic in the array and returns individual results for each topic, regardless of whether other topics succeed or fail. + +The broker performs the following steps for each topic: +1. Validates the topic name (from previous stages) +2. Checks that the topic doesn't already exist (from previous stages) +3. Validates partition count and replication factor parameters +4. Creates a new topic UUID (for successful topics) +5. Writes a `TOPIC_RECORD` to the `__cluster_metadata` topic's log file (for successful topics) +6. Writes `PARTITION_RECORD`s to the `__cluster_metadata` topic's log file for each partition (for successful topics) +7. Creates partition directories in the log directory (for successful topics) +8. Returns individual results for each topic in the response array + +Mixed success/failure scenarios are handled gracefully - if one topic fails validation, it doesn't prevent other valid topics from being created. + +We've created an interactive protocol inspector for the request & response structures for `CreateTopics`: + +- 🔎 [CreateTopics Request (v6)](https://binspec.org/kafka-createtopics-request-v6) +- 🔎 [CreateTopics Response (v6)](https://binspec.org/kafka-createtopics-response-v6) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `CreateTopics` (v6) request with multiple topics. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The response contains individual results for each topic in the request. +- Each topic result has the correct error code (0 for success, appropriate error codes for failures). +- The `throttle_time_ms` field in the response is `0`. +- The `name` field in each topic response corresponds to the topic name in the request. +TODO: details +- Successfully created topics have their metadata written to the `__cluster_metadata` topic. +TODO: details +- The topic directory structure is created in the log directory. +- Failed topics have appropriate error messages. + +## Notes + +- The CreateTopics request can contain multiple topics, and each should be processed independently. \ No newline at end of file diff --git a/stage_descriptions/producing-messages-01-um3.md b/stage_descriptions/producing-messages-01-um3.md new file mode 100644 index 0000000..d1b3ee5 --- /dev/null +++ b/stage_descriptions/producing-messages-01-um3.md @@ -0,0 +1,36 @@ +In this stage, you'll add an entry for the `Produce` API to the APIVersions response. + +## The Produce API + +The [Produce API](https://kafka.apache.org/protocol#The_Messages_Produce) (API key `0`) is used to produce messages to a Kafka topic. 
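As a rough illustration, each entry in the APIVersions response's `api_keys` array is three INT16s followed by a tag buffer (in the flexible v3+ encoding). This is a sketch, and the `min_version` of `0` is illustrative — the tester only checks that `MaxVersion` is at least 11:

```python
import struct

def api_key_entry(api_key: int, min_version: int, max_version: int) -> bytes:
    # INT16 api_key, INT16 min_version, INT16 max_version, empty tag buffer.
    return struct.pack(">hhh", api_key, min_version, max_version) + b"\x00"

produce_entry = api_key_entry(0, 0, 11)  # API key 0 = Produce
```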
+ +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +In this stage, you'll only need to add an entry for the `Produce` API to the APIVersions response you implemented in earlier stages. This will let the client know that the broker supports the `Produce` API. We'll get to responding to `Produce` requests in later stages. + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a valid `APIVersions` (v4) request. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (NO_ERROR). +- The response body contains at least one entry for the API key `0` (Produce). +- The `MaxVersion` for the Produce API is at least 11. + +## Notes + +- You don't have to implement support for handling `Produce` requests in this stage. We'll get to this in later stages. +- You'll still need to include the entry for `APIVersions` in your response to pass earlier stages. +- The `MaxVersion` for `Produce` and `APIVersions` are different. For `APIVersions`, it is 4. For `Produce`, it is 11. diff --git a/stage_descriptions/producing-messages-02-ck2.md b/stage_descriptions/producing-messages-02-ck2.md new file mode 100644 index 0000000..5b60d16 --- /dev/null +++ b/stage_descriptions/producing-messages-02-ck2.md @@ -0,0 +1,59 @@ +In this stage, you'll add support for handling Produce requests to invalid topics or partitions. + +## Produce API Response for Invalid Topics or Partitions + +When a Kafka broker receives a Produce request, it needs to validate that both the topic and partition exist. If either the topic or partition doesn't exist, it returns an appropriate error code and response. + +The broker performs validation in this order: +1. **Topic validation**: Check if the topic exists by reading the `__cluster_metadata` topic's log file +2. **Partition validation**: If the topic exists, check if the partition exists within that topic + +### Topic Validation + +To validate that a topic exists, the broker reads the `__cluster_metadata` topic's log file, located at `/tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log`. Inside the log file, the broker finds the topic's metadata, which is a `record` (inside a RecordBatch) with a payload of type `TOPIC_RECORD`. If there exists a `TOPIC_RECORD` with the given topic name and the topic ID, the topic exists. + +### Partition Validation + +To validate that a partition exists, the broker reads the same `__cluster_metadata` topic's log file and finds the partition's metadata, which is a `record` (inside a RecordBatch) with a payload of type `PARTITION_RECORD`. If there exists a `PARTITION_RECORD` with the given partition index, the UUID of the topic it is associated with, and the UUID of the directory it is associated with, the partition exists. + +If either the topic or partition doesn't exist, the broker returns an error code of `3` (UNKNOWN_TOPIC_OR_PARTITION). 
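To make the error path concrete, here's a sketch of encoding the per-partition portion of the v11 response for this case (field order per the inspectors below; the trailing `record_errors`, `error_message`, and tag buffer bytes are included for completeness, and the function name is illustrative):

```python
import struct

UNKNOWN_TOPIC_OR_PARTITION = 3

def error_partition_response(index: int) -> bytes:
    # index (INT32), error_code (INT16), then base_offset, log_append_time_ms
    # and log_start_offset (all INT64, set to -1 on error).
    body = struct.pack(">ihqqq", index, UNKNOWN_TOPIC_OR_PARTITION, -1, -1, -1)
    body += b"\x01"  # record_errors: empty compact array
    body += b"\x00"  # error_message: null compact string
    body += b"\x00"  # empty tag buffer
    return body
```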
+ +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +We've also created an interactive protocol inspector for the `__cluster_metadata` topic's log file: +- 🔎 [__cluster_metadata log file](https://binspec.org/kafka-cluster-metadata) + +In this stage, you'll need to implement the response for a `Produce` request with either an invalid topic or an invalid partition. In later stages, you'll handle successfully producing messages to valid topics and partitions and persisting messages to disk using Kafka's RecordBatch format. + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `Produce` (v11) request with either an invalid topic name or a valid topic but invalid partition. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The `error_code` in the response body is `3` (UNKNOWN_TOPIC_OR_PARTITION). +- The `throttle_time_ms` field in the response is `0`. +- Inside the topic response: + - The `name` field matches the topic name in the request. + - Inside the partition response: + - The `index` field matches the partition in the request. + - The `base_offset` field is `-1`. + - The `log_append_time_ms` field is `-1`. + - The `log_start_offset` field is `-1`. + +## Notes + +- You'll need to parse the `Produce` request in this stage to get the topic name and partition to send in the response. +- The official docs for the `Produce` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Produce). Make sure to scroll down to the "Produce Response (Version: 11)" section. +- The official Kafka docs don't cover the structure of records inside the `__cluster_metadata` topic, but you can find the definitions in the Kafka source code [here](https://github.com/apache/kafka/tree/5b3027dfcbcb62d169d4b4421260226e620459af/metadata/src/main/resources/common/metadata). diff --git a/stage_descriptions/producing-messages-03-ps7.md b/stage_descriptions/producing-messages-03-ps7.md new file mode 100644 index 0000000..d0bb75b --- /dev/null +++ b/stage_descriptions/producing-messages-03-ps7.md @@ -0,0 +1,48 @@ +In this stage, you'll add support for successfully producing a single record. + +## Single Record Production + +When a Kafka broker receives a Produce request, it needs to validate that the topic and partition exist (using the `__cluster_metadata` topic's log file), store the record in the appropriate log file using Kafka's on-disk format, and return a successful response with the assigned offset. + +The record must be persisted to the topic's log file at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log` (e.g., `/tmp/kraft-combined-logs/topic-name-0/00000000000000000000.log`) using Kafka's RecordBatch format. + +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +Kafka's on-disk log format is just records inside a `RecordBatch`. The same `RecordBatch` format that is used in `Produce` requests and `Fetch` responses is also used in the on-disk log file.
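One practical consequence: the CRC inside a `RecordBatch` only covers the bytes from the `attributes` field onward, so you can take the batch bytes from the request, stamp in the assigned base offset (the first 8 bytes of the batch), and append the result to the log file without recomputing the CRC. A minimal sketch, with illustrative names:

```python
import os
import struct

def append_batch(log_dir: str, topic: str, partition: int, batch: bytes, base_offset: int) -> None:
    # The first 8 bytes of a RecordBatch are its base offset (INT64, big-endian).
    stamped = struct.pack(">q", base_offset) + batch[8:]
    path = os.path.join(log_dir, f"{topic}-{partition}", "00000000000000000000.log")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "ab") as f:
        f.write(stamped)
```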
+ +You can refer to the following interactive protocol inspector for Kafka's log file format: +- 🔎 [__cluster_metadata topic's log file](https://binspec.org/kafka-cluster-metadata) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send multiple successive `Produce` (v11) requests to a valid topic and partition with single records as the payload. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (NO_ERROR). +- The `throttle_time_ms` field in the response is `0`. +- Inside the topic response: + - The `name` field matches the topic name in the request. + - Inside the partition response: + - The `index` field matches the partition in the request. + - The `base_offset` field contains the assigned offset for the record. (Each request in this stage carries a single record, so the assigned `base_offset` is `0` for the first record produced, `1` for the second, and so on.) + - The `log_append_time_ms` field is `-1` (signifying that the topic uses `CreateTime` timestamps rather than broker-assigned `LogAppendTime`). + - The `log_start_offset` field is `0`. +- The record is persisted to the appropriate log file on disk at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log`. + +## Notes + +- On-disk log files must be stored in `RecordBatch` format with proper CRC validation. +- The offset assignment should start from `0` for new partitions and increment for each subsequent record. +- The official docs for the `Produce` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Produce). Make sure to scroll down to the "Produce Response (Version: 11)" section. diff --git a/stage_descriptions/producing-messages-04-sb8.md b/stage_descriptions/producing-messages-04-sb8.md new file mode 100644 index 0000000..ba2bc0d --- /dev/null +++ b/stage_descriptions/producing-messages-04-sb8.md @@ -0,0 +1,41 @@ +In this stage, you'll add support for producing multiple records from a single request. + +## Batch Processing + +When a Kafka broker receives a `Produce` request containing a `RecordBatch` with multiple records, it needs to validate that the topic and partition exist, assign sequential offsets to each record within the batch, and store the entire batch atomically to the log file. The `RecordBatch` containing multiple records must be stored as a single unit in the topic's log file at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log`. + +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `Produce` (v11) request containing a RecordBatch with multiple records. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (NO_ERROR). +- The `throttle_time_ms` field in the response is `0`. +- Inside the topic response: + - The `name` field matches the topic name in the request.
+ - Inside the partition response: + - The `index` field matches the partition in the request. + - The `base_offset` field is `0` (the base offset assigned to the batch). + - The `log_append_time_ms` field is `-1` (signifying that the topic uses `CreateTime` timestamps rather than broker-assigned `LogAppendTime`). + - The `log_start_offset` field is `0`. +- The records are persisted to the appropriate log file on disk at `<log-dir>/<topic-name>-<partition-index>/00000000000000000000.log` with sequential offsets. + +## Notes + +- Records within a batch must be assigned sequential offsets (e.g., if the base offset is 5, records get offsets 5, 6, 7). +- The response should return the base offset of the batch, not individual record offsets. +- The official docs for the `Produce` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Produce). Make sure to scroll down to the "Produce Response (Version: 11)" section. diff --git a/stage_descriptions/producing-messages-05-mf2.md b/stage_descriptions/producing-messages-05-mf2.md new file mode 100644 index 0000000..09601f2 --- /dev/null +++ b/stage_descriptions/producing-messages-05-mf2.md @@ -0,0 +1,45 @@ +In this stage, you'll add support for producing to multiple partitions of the same topic in a single request. + +## Partition Routing + +When a Kafka broker receives a Produce request targeting multiple partitions of the same topic, it needs to validate that the topic and all partitions exist, write records to each partition's log file independently, and return a response containing results for all partitions. Each partition maintains its own offset sequence independently, so partition 0 and partition 1 can both have records starting at offset 0. + +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `Produce` (v11) request targeting multiple partitions of the same topic. The request will contain multiple RecordBatches, one for each partition. Each RecordBatch will contain a single record. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (NO_ERROR). +- The `throttle_time_ms` field in the response is `0`. +- There is a single topic present in the response. +- Inside the topic response: + - The `name` field matches the topic name in the request. + - Each partition in the request has a corresponding partition response. + - Inside each partition response: + - The `index` field matches the partition in the request. + - The `base_offset` field contains the assigned offset for that partition. + - The error code is `0` (NO_ERROR). + - The `log_append_time_ms` field is `-1` (signifying that the topic uses `CreateTime` timestamps rather than broker-assigned `LogAppendTime`). + - The `log_start_offset` field is `0`. +- Records are persisted to the correct partition log files on disk. +- Offset assignment is independent per partition (partition 0 and partition 1 can both have offset 0). + +## Notes + +- The response must include entries for all requested partitions. +- On-disk log files must be stored in `RecordBatch` format with proper CRC validation.
- The official docs for the `Produce` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Produce). Make sure to scroll down to the "Produce Response (Version: 11)" section. diff --git a/stage_descriptions/producing-messages-06-ar4.md b/stage_descriptions/producing-messages-06-ar4.md new file mode 100644 index 0000000..e309c1c --- /dev/null +++ b/stage_descriptions/producing-messages-06-ar4.md @@ -0,0 +1,43 @@ +In this stage, you'll add support for producing to multiple topics in a single request. + +## Cross-Topic Production + +When a Kafka broker receives a `Produce` request targeting multiple topics with their respective partitions, it needs to validate that all topics and partitions exist, write records to each topic-partition's log file independently, and return a nested response structure containing results for all topics and their partitions. Each topic-partition combination maintains its own independent offset sequence, so topic "foo" partition 0 and topic "bar" partition 0 can both have records starting at offset 0. + +We've created an interactive protocol inspector for the request & response structures for `Produce`: + +- 🔎 [Produce Request (v11)](https://binspec.org/kafka-produce-request-v11) +- 🔎 [Produce Response (v11)](https://binspec.org/kafka-produce-response-v11) + +## Tests + +The tester will execute your program like this: + +```bash +./your_program.sh /tmp/server.properties +``` + +It'll then connect to your server on port 9092 and send a `Produce` (v11) request targeting multiple topics with their respective partitions. The request will contain data for multiple topics, and a single partition for each topic. + +The tester will validate that: + +- The first 4 bytes of your response (the "message length") are valid. +- The correlation ID in the response header matches the correlation ID in the request header. +- The error code in the response body is `0` (NO_ERROR). +- The `throttle_time_ms` field in the response is `0`. +- Each topic in the request has a corresponding topic response. +- Inside each topic response: + - The `name` field matches the topic name in the request. + - Each partition in the request has a corresponding partition response. + - Inside each partition response: + - The `index` field matches the partition in the request. + - The error code is `0` (NO_ERROR). + - The `base_offset` field contains the assigned offset for that topic-partition. + - The `log_append_time_ms` field is `-1` (signifying that the topic uses `CreateTime` timestamps rather than broker-assigned `LogAppendTime`). + - The `log_start_offset` field is `0`. +- Records are persisted to the correct topic-partition log files on disk. +- Offset assignment is independent per topic-partition combination. + +## Notes + +- The official docs for the `Produce` request can be found [here](https://kafka.apache.org/protocol.html#The_Messages_Produce). Make sure to scroll down to the "Produce Response (Version: 11)" section.
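A simple way to model the independent offset sequences is a map keyed by `(topic, partition)`; a minimal sketch with illustrative names:

```python
from collections import defaultdict

# Next offset to assign, tracked independently per (topic, partition).
next_offset = defaultdict(int)

def assign_base_offset(topic: str, partition: int, record_count: int) -> int:
    base = next_offset[(topic, partition)]
    next_offset[(topic, partition)] += record_count
    return base  # e.g. ("foo", 0) and ("bar", 0) both start at offset 0
```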