Writes to a new Zookeeper znode should take advantage of Zookeeper's atomic create + write primitive. If they do not, a read that was triggered by a watch can return an empty string. The previous workaround for this does not always work (e.g., when an empty string is returned due to a race) and can potentially cause a call-stack overflow. This change-set fixes this race and removes the workaround. It also adds a call to Zookeeper's Sync() on a Get operation, only when an empty string (or SOH) is returned, to guard against an older version of libkv doing create+write in a non-atomic fashion.

This change-set addresses github.com/docker-archive/classicswarm/issues/1915

Signed-off-by: Amir Malekpour <amir.malekpour@cohodata.com>
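The sync-and-retry guard described in this commit message can be sketched roughly as follows. This is not the libkv code: `zkConn`, `getSynced`, `fakeConn`, and the retry limit are hypothetical names introduced for illustration, the SOH value is assumed to be the byte `\x01`, and the fake client only approximates Sync() by making a pending write visible. The loop is bounded on purpose, since the commit message notes that the earlier workaround could overflow the call stack.

```go
package main

import (
	"errors"
	"fmt"
)

// zkConn is a hypothetical, minimal stand-in for the two client calls used
// here. It is not the real Zookeeper client interface.
type zkConn interface {
	Get(path string) ([]byte, error)
	Sync(path string) error
}

// soh is assumed to be the placeholder byte an older writer may leave behind.
const soh = "\x01"

var errEmptyValue = errors.New("value still empty after sync")

// getSynced reads path. If the value is empty or SOH (a sign of a non-atomic
// create followed by a write), it syncs and reads again, up to maxRetries
// times, using a loop instead of recursion.
func getSynced(c zkConn, path string, maxRetries int) ([]byte, error) {
	for attempt := 0; ; attempt++ {
		data, err := c.Get(path)
		if err != nil {
			return nil, err
		}
		if len(data) != 0 && string(data) != soh {
			return data, nil
		}
		if attempt == maxRetries {
			return nil, errEmptyValue
		}
		if err := c.Sync(path); err != nil {
			return nil, err
		}
	}
}

// fakeConn simulates a server on which a pending write becomes visible only
// after Sync has been called.
type fakeConn struct {
	visible []byte
	pending []byte
	syncs   int
}

func (f *fakeConn) Get(string) ([]byte, error) { return f.visible, nil }

func (f *fakeConn) Sync(string) error {
	f.syncs++
	if f.pending != nil {
		f.visible, f.pending = f.pending, nil
	}
	return nil
}

func main() {
	c := &fakeConn{pending: []byte("node-1:2375")}
	v, err := getSynced(c, "/swarm/nodes/node-1", 3)
	fmt.Println(string(v), err, c.syncs)
}
```

With the fake above, the first read returns an empty value, one Sync() makes the write visible, and the second read succeeds.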
refs CIO-39409

In the implementation of Store.Watch(), which watches for changes in a specific key, clients could miss changes in the value of interest due to the following pattern in the implementation:

1. get value of key
2. send value of key to client on channel
3. get value of key and set watch, but ignore value of key
4. when watch fires, get value of key and send value of key to client on channel

The above has been reduced to:

1. get value of key and set watch, send value of key to client on channel

In the implementation of Store.WatchTree(), which watches for changes in the children of specific key, clients could miss events due to the following pattern in the implementation:

1. get children of the key
2. send child list to the client over the channel
3. get children and set watch, but ignore the set of children
4. when watch fires, get children of key and send child list to the client over the channel

Step 4 was problematic because any failure to get the children after the event was fired would result in clients missing the change to the set of children until the watch fired again due to a subsequent change.

The implementation has been reduced to:

1. get children and set watch, send children to client on channel and retry immediately if values of child nodes could not be read

Signed-off-by: Daniel Ferstay <dan@cohodata.com>
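The reduced Store.Watch() pattern, where each value sent to the client comes from the same call that sets the next watch, might look roughly like the sketch below. It is an illustration under assumptions, not the libkv implementation: `watcher`, `watch`, and `memStore` are hypothetical names, and the in-memory store only imitates one-shot watches.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical stand-in for a client call that reads a key and
// registers a one-shot watch in the same operation.
type watcher interface {
	GetW(key string) (value string, fired <-chan struct{}, err error)
}

// watch sends the current value, waits for the watch to fire, and repeats.
// Every value is read by the call that also sets the next watch, so there is
// no window between a read and a watch registration in which a change could
// go unnoticed.
func watch(w watcher, key string, stop <-chan struct{}) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for {
			val, fired, err := w.GetW(key)
			if err != nil {
				return
			}
			select {
			case out <- val:
			case <-stop:
				return
			}
			select {
			case <-fired:
			case <-stop:
				return
			}
		}
	}()
	return out
}

// memStore is an in-memory fake whose watches fire once, on the next Set.
type memStore struct {
	mu      sync.Mutex
	val     string
	watches []chan struct{}
}

func (m *memStore) GetW(string) (string, <-chan struct{}, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan struct{})
	m.watches = append(m.watches, ch)
	return m.val, ch, nil
}

func (m *memStore) Set(v string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.val = v
	for _, ch := range m.watches {
		close(ch)
	}
	m.watches = nil
}

func main() {
	s := &memStore{val: "a"}
	stop := make(chan struct{})
	updates := watch(s, "/key", stop)
	fmt.Println(<-updates) // initial value
	s.Set("b")
	fmt.Println(<-updates) // value read together with the re-armed watch
	close(stop)
}
```

In the older four-step pattern, a change landing between the first read and the watch registration in step 3 would be read but ignored, and the client would not hear about it until a later change fired the watch.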
The CI failure is fixed by #154
There were a few issues that I spotted with respect to ZK watch handling in the libkv library.
When setting a watch, the library calls GetW() on the Zookeeper client, but it did not handle the race condition that Amir Malekpour previously fixed for Get by sync'ing and retrying.
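In the implementation of Store.Watch(), which watches for changes in a specific key, clients could miss changes in the value of interest due to the following pattern in the implementation:

1. get value of key
2. send value of key to client on channel
3. get value of key and set watch, but ignore value of key
4. when watch fires, get value of key and send value of key to client on channel

The above has been reduced to:

1. get value of key and set watch, send value of key to client on channel

In the implementation of Store.WatchTree(), which watches for changes in the children of specific key, clients could miss events due to the following pattern in the implementation:

1. get children of the key
2. send child list to the client over the channel
3. get children and set watch, but ignore the set of children
4. when watch fires, get children of key and send child list to the client over the channel

Step 4 was problematic because any failure to get the children after the event was fired would result in clients missing the change to the set of children until the watch fired again due to a subsequent change.

The implementation has been reduced to:

1. get children and set watch, send children to client on channel and retry immediately if values of child nodes could not be read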
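For the WatchTree() side, the "retry immediately if values of child nodes could not be read" behaviour could be sketched as below. Everything named here (`treeClient`, `snapshot`, `fakeTree`, `errNoNode`, and the bounded retry count) is a hypothetical illustration rather than the libkv code; the assumption is that a child can disappear between being listed and being read.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoNode is a hypothetical error for a child that no longer exists.
var errNoNode = errors.New("node does not exist")

// treeClient is a hypothetical stand-in for the client calls used here.
type treeClient interface {
	ChildrenW(dir string) (names []string, fired <-chan struct{}, err error)
	Get(path string) (string, error)
}

// snapshot lists the children of dir (setting a watch) and reads each child.
// If a child vanished between the listing and the read, it starts over at
// once instead of waiting for the watch to fire again. The retry bound is an
// assumption added to keep the sketch from looping forever.
func snapshot(c treeClient, dir string, maxRetries int) (map[string]string, <-chan struct{}, error) {
	for attempt := 0; ; attempt++ {
		names, fired, err := c.ChildrenW(dir)
		if err != nil {
			return nil, nil, err
		}
		vals := make(map[string]string, len(names))
		stale := false
		for _, n := range names {
			v, err := c.Get(dir + "/" + n)
			if errors.Is(err, errNoNode) {
				stale = true
				break
			}
			if err != nil {
				return nil, nil, err
			}
			vals[n] = v
		}
		if !stale {
			return vals, fired, nil
		}
		if attempt == maxRetries {
			return nil, nil, errNoNode
		}
	}
}

// fakeTree returns a scripted sequence of child listings; the last listing
// is repeated once the script runs out.
type fakeTree struct {
	listings [][]string
	data     map[string]string
	calls    int
}

func (f *fakeTree) ChildrenW(string) ([]string, <-chan struct{}, error) {
	i := f.calls
	if i >= len(f.listings) {
		i = len(f.listings) - 1
	}
	f.calls++
	return f.listings[i], make(chan struct{}), nil
}

func (f *fakeTree) Get(p string) (string, error) {
	v, ok := f.data[p]
	if !ok {
		return "", errNoNode
	}
	return v, nil
}

func main() {
	c := &fakeTree{
		listings: [][]string{{"a", "gone"}, {"a"}},
		data:     map[string]string{"/nodes/a": "10.0.0.1"},
	}
	vals, _, err := snapshot(c, "/nodes", 3)
	fmt.Println(vals, err, c.calls)
}
```

Here the first listing still contains a child that has already been removed, so the read fails and the listing is repeated immediately; the client receives the consistent second snapshot together with its watch channel.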