-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-48134: [C++] Make StructArray::field() thread-safe #48128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
b2edd9c to
821e498
Compare
|
|
|
Can we simplify this something like the following? diff --git a/cpp/src/arrow/array/array_nested.cc b/cpp/src/arrow/array/array_nested.cc
index b821451419..4eb5e82087 100644
--- a/cpp/src/arrow/array/array_nested.cc
+++ b/cpp/src/arrow/array/array_nested.cc
@@ -1086,8 +1086,10 @@ const std::shared_ptr<Array>& StructArray::field(int i) const {
field_data = data_->child_data[i];
}
result = MakeArray(field_data);
- std::atomic_store(&boxed_fields_[i], std::move(result));
- return boxed_fields_[i];
+ // If some other thread inserted the array in the meantime, just
+ // drop this array.
+ std::shared_ptr<Array> expected = nullptr;
+ std::atomic_compare_exchange_weak(&boxed_fields_[i], &expected, std::move(result));
}
return boxed_fields_[i];
} |
|
While that is a smaller change it would probably work, I think the proposed change is superior. The supposed optimization of returning a const ref in the old implementation does not work anyways since the function bumps the ref-count internally in the way it checks if the array is already materialized. So to compare the ref-count changes in case the caller takes ownership: If the caller does not take ownership: Another, probably more important argument: is a potential heap-use-after-free that is fixed with the proposed implementation. Also, "simple" is somewhat subjective, and returning a |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems ok to me. It would be an API change but one that should generally be source-compatible in practice. (Unless someone relies on getting exactly the same Array*!)
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the principle, just one suggestion
|
@github-actions crossbow submit -g cpp |
|
Revision: 821e498 Submitted crossbow builds: ursacomputing/crossbow @ actions-55e2a943b6 |
Before this change it was possible for two threads calling `field()` with the same index at the same time to cause a race on the stored entry in `boxed_fields_`. I.e. if a second thread goes into the path that calls `MakeArray` before the first thread stored its own new array, the second thread would also write to the same shared_ptr and invalidate the shared_ptr from the first thread, thereby also invalidating the returned reference.
821e498 to
a9900d6
Compare
|
Thanks a lot @tobim . I will merge if CI is green. |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 98620f5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Before this change it was possible for two threads calling
field()with the same index at the same time to cause a race on the stored entry inboxed_fields_. I.e. if a second thread goes into the path that callsMakeArraybefore the first thread stored its own new array, the second thread would also write to the sameshared_ptrand invalidate theshared_ptrfrom the first thread, thereby also invalidating the returned reference.What changes are included in this PR?
This PR changes the return type of
StructArray::field()fromshared_ptr<Array>&toshared_ptr<Array>giving the caller co-ownership of the data and safeguarding against any potential concurrent writes to the underlyingboxed_fields_vector.It also changes the body to use the CAS pattern to avoid multiple concurrent writes to the same address.
Are these changes tested?
I don't know how to write a deterministic test that triggers the issue before the fix. Even a non-deterministic test needs to run with address sanitizer or valgrind or something similar.
I can however confirm that this change fixes an issue that I've been debugging in https://github.com/tenzir/tenzir.
Are there any user-facing changes?
While changing
StructArray::field()to return by value is an API change, I believe this should be compatible with regular uses of that function.