feat(transactions): Transaction added to garantee only once paradigm.…#297
marcosschroh wants to merge 1 commit into master from
Conversation
| @@ -0,0 +1,241 @@
| `Kafka 0.11.0` includes support for `idempotent` and `transactional` capabilities in the `producer`. Idempotent delivery ensures that messages are delivered `exactly once`
| to a particular topic partition during the lifetime of a `single producer`. Transactional delivery allows producers to send data to multiple partitions such that either
| all messages are successfully delivered, or none of them are. Together, these capabilities enable `*exactly once semantics*` in Kafka.
I think transactions lack some explanation here. I would add that "exactly once semantics" apply to only one scenario:
- consume-process-produce under the same Kafka cluster.
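To make the consume-process-produce scenario concrete, here is a toy sketch of the loop that "exactly once semantics" cover: read from one topic, transform, and write to another, with the read offset committed atomically alongside the produced record. This is plain Python standing in for Kafka primitives (the lists and offset dict are illustrative stand-ins), not the kstreams API.

```python
# Toy consume-process-produce loop. In Kafka, the produce and the offset
# commit happen inside one transaction, so a crash either replays or skips
# the whole unit of work, never half of it.

def consume_process_produce(source, sink, committed_offsets, topic="input"):
    offset = committed_offsets.get(topic, 0)
    for record in source[offset:]:
        sink.append(record.upper())            # "produce" the processed record
        committed_offsets[topic] = offset + 1  # commit offset with the produce
        offset += 1
    return sink
```

Re-running the loop after a simulated crash resumes from the committed offset, so no record is processed twice.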
| ) -> typing.Awaitable[RecordMetadata]: ...
|
| class Transaction(typing.Protocol):
Add a docstring to this class, please. It's not clear why we have a protocol `Transaction` and then we have `transaction.Transaction`; an explanation in the docstring would clarify.
Consider also renaming this to `TransactionProtocol`.
| from .clients import Producer
|
| if typing.TYPE_CHECKING:
|     from . import transaction  # pragma: no cover
Is there a way to avoid this circular import? Could `transaction.Transaction` also be a protocol?
| - Transaction state is stored in a new internal topic `__transaction_state`. This topic is not created until the the first attempt to use a transactional request API. There are several settings to control the topic's configuration.
| - `Topics` which are included in transactions should be configured for durability. In particular, the `replication.factor` should be at least `3`, and the `min.insync.replicas` for these topics should be set to `2`
| - `Transactions` always add overhead, meaning that more effort is needed to produce events and the consumer have to apply filters
| For example, `transaction.state.log.min.isr` controls the minimum ISR for this topic.
For discoverability I would add links to the Kafka settings.
| For example, `transaction.state.log.min.isr` controls the minimum ISR for this topic.
| For example, [`transaction.state.log.min.isr`](https://kafka.apache.org/documentation/#brokerconfigs_transaction.state.log.min.isr) controls the minimum ISR for this topic.
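The durability settings discussed above map to standard Kafka broker configuration keys. A minimal `server.properties` sketch, using the values the document recommends (not mandatory defaults):

```properties
# Durability for the internal __transaction_state topic
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

# Durability for topics participating in transactions
default.replication.factor=3
min.insync.replicas=2
```

`min.insync.replicas` can also be set per topic at creation time instead of broker-wide.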
| ## Usage
|
| From the `kstreams` point of view, the `transaction pattern` is a `context manager` that will `start` a transaction and then `commit` or `aboort` it. Always a `transaction` starts when an event is send. Once that we have the `context` we can send events in two ways:
| From the `kstreams` point of view, the `transaction pattern` is a `context manager` that will `start` a transaction and then `commit` or `aboort` it. Always a `transaction` starts when an event is send. Once that we have the `context` we can send events in two ways: |
| From the `kstreams` point of view, the "transaction pattern" is a `context manager` that will `start` a transaction and then `commit` or `abort` it. A `transaction` always starts when an event is sent. Once we have the `context` we can send events in two ways: |
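The "start, then commit or abort" shape of the pattern can be sketched with a toy async context manager. The real kstreams/aiokafka objects are replaced here with a plain list; `ToyTransaction` and its `send` method are illustrative names only, not the library's API.

```python
import asyncio

class ToyTransaction:
    """Commit-or-abort semantics: events buffered inside the context
    become visible only if the block exits without an exception."""

    def __init__(self, log):
        self.log = log      # committed events end up here
        self.pending = []   # events sent inside the transaction

    def send(self, event):
        self.pending.append(event)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.log.extend(self.pending)  # commit: events become visible
        # on error: abort by simply dropping self.pending
        return False  # re-raise any exception

async def main():
    log = []
    async with ToyTransaction(log) as t:   # committed transaction
        t.send("event-1")
        t.send("event-2")
    try:
        async with ToyTransaction(log) as t:  # aborted transaction
            t.send("event-3")
            raise RuntimeError("boom")
    except RuntimeError:
        pass
    return log

# asyncio.run(main()) returns ["event-1", "event-2"]; "event-3" is aborted
```

The context manager is what guarantees that abort runs on any exception path, which is why the PR models the pattern this way.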
| From the `kstreams` point of view, the `transaction pattern` is a `context manager` that will `start` a transaction and then `commit` or `aboort` it. Always a `transaction` starts when an event is send. Once that we have the `context` we can send events in two ways:
|
| Using the `StreamEngine` directly:
This should be a side note; the recommended way should be with the type annotation, IMO.
Excellent work! 💪🏻

I have hit an issue in
Masqueey left a comment
What an insane commit! Incredible that you managed to implement such a large feature and adapt the testing client to work and test your changes accordingly.
I left some spelling corrections and some questions, but mostly I'm very impressed with the work you did.
| @@ -0,0 +1,241 @@
| `Kafka 0.11.0` includes support for `idempotent` and `transactional` capabilities in the `producer`. Idempotent delivery ensures that messages are delivered `exactly once`
Although we call it Kafka, formally it is called Apache Kafka. I think this first mention would be a good place to use the full name; further references can stay Kafka.
| It is important to notice that:
|
| - `Transaction` always start from a `send` (producer)
| - Events sent to one or more topics will only be visible on consumers after the transaction is committed.
| - To use transactions, a `transaction id` (unique id per transaction) must be set prior an event is sent. If you do not provide one `kstreams` will auto generate one for you.
| - Transaction state is stored in a new internal topic `__transaction_state`. This topic is not created until the the first attempt to use a transactional request API. There are several settings to control the topic's configuration.
| - `Topics` which are included in transactions should be configured for durability. In particular, the `replication.factor` should be at least `3`, and the `min.insync.replicas` for these topics should be set to `2`
| - `Transactions` always add overhead, meaning that more effort is needed to produce events and the consumer have to apply filters
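One bullet above says kstreams auto-generates a `transaction id` when the caller does not provide one. A minimal sketch of one way to do that with `uuid4`; the `kstreams-` prefix and the generation scheme are assumptions for illustration, not necessarily what the PR implements.

```python
import uuid

def resolve_transaction_id(transaction_id=None, prefix="kstreams-"):
    """Return the caller's transaction id, or auto-generate a unique one.

    NOTE: the prefix and uuid4 scheme are hypothetical; the actual
    strategy in kstreams may differ.
    """
    if transaction_id is not None:
        return transaction_id
    return prefix + str(uuid.uuid4())
```

Uniqueness matters because Kafka fences producers by transactional id: reusing an id from another live producer aborts the older one.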
Not all sentences in this itemisation end in a period (.); I would change them to make sure all of them do. (Sorry for nitpicking.)
| - `Topics` which are included in transactions should be configured for durability. In particular, the `replication.factor` should be at least `3`, and the `min.insync.replicas` for these topics should be set to `2`
| - `Transactions` always add overhead, meaning that more effort is needed to produce events and the consumer have to apply filters
| For example, `transaction.state.log.min.isr` controls the minimum ISR for this topic.
| - `Streams` (consumers) will have to filter or not transactional events
Maybe more: "Consumers will have to set per Stream whether to filter out events based on their transaction header." (Or something like that; I might not understand the process well enough yet.)
| queue "Topic A" as topic_a
|
| producer -> topic_a: "produce with a transaction"
| topic_a --> stream_a: "consume only transactional events"
It's more about the success of the transaction than whether or not it was part of a transaction, right? So more "filter out aborted transactional events" than "only transactional"?
| self._producer: typing.Optional[typing.Type[Producer]] = None
| self._producer: typing.Optional[Producer] = None
| self._streams: typing.List[Stream] = []
| self._transaction_manager = TransactionManager(
How come this TransactionManager no longer needs to be imported?
| @@ -74,7 +132,10 @@ def __init__(self, group_id: Optional[str] = None, **kwargs) -> None:
|
| # Called to make sure that has all the kafka attributes like _coordinator
| # so it will behave like an real Kafka Consumer
| @pytest.mark.asyncio
| async def test_producer_with_transaction_no_consumer(
I do not entirely get what this test is supposed to test. Not sure what this scenario is representing.
| await asyncio.wait_for(stream.start(), timeout=0.1)
|
| await stream.stop()
The only test I'm still missing here is the produce-then-commit transaction test. (Although I'm aware that that one doesn't work because of the bug in aiokafka.)
… Related to #265