@tobim
Contributor

@tobim tobim commented Nov 13, 2025

Rationale for this change

Before this change, two threads calling field() with the same index at the same time could race on the stored entry in boxed_fields_. Specifically, if a second thread enters the path that calls MakeArray before the first thread has stored its own new array, the second thread also writes to the same shared_ptr. That write invalidates the shared_ptr stored by the first thread, and with it the reference the first thread returned.

What changes are included in this PR?

This PR changes the return type of StructArray::field() from shared_ptr<Array>& to shared_ptr<Array>, giving the caller co-ownership of the data and safeguarding against any potential concurrent writes to the underlying boxed_fields_ vector.
It also changes the body to use a compare-and-swap (CAS) pattern so that concurrent callers do not overwrite an entry that has already been stored.
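
For readers unfamiliar with the pattern, here is a minimal sketch of a lazily filled cache that combines a by-value return with CAS. It is not the code from this PR: `Box`, `LazyBoxes`, and `CheckSameInstance` are hypothetical stand-ins for `Array`, `StructArray`, and a small sanity check. It assumes the free-function atomic operations on `std::shared_ptr` from `<memory>`, which are deprecated since C++20 in favor of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for arrow::Array; only object identity matters here.
struct Box {
  int index;
};

// Hypothetical stand-in for StructArray with its lazily filled cache.
class LazyBoxes {
 public:
  explicit LazyBoxes(int n) : slots_(n) {}

  // Returns by value, so the caller co-owns the result and is unaffected
  // by anything that later happens to the cache slot.
  std::shared_ptr<Box> field(int i) const {
    std::shared_ptr<Box> cached = std::atomic_load(&slots_[i]);
    if (cached) return cached;

    auto fresh = std::make_shared<Box>(Box{i});
    std::shared_ptr<Box> expected;  // null: we expect the slot to be empty
    if (std::atomic_compare_exchange_strong(&slots_[i], &expected, fresh)) {
      return fresh;  // we won the race and published our object
    }
    // Another thread stored its object first; on failure `expected` now
    // holds that object, so return it and let `fresh` be dropped.
    return expected;
  }

 private:
  mutable std::vector<std::shared_ptr<Box>> slots_;
};

// Repeated calls observe the same cached object.
inline void CheckSameInstance() {
  LazyBoxes boxes(2);
  assert(boxes.field(0) == boxes.field(0));
}
```

The strong variant of compare-exchange is used because there is no retry loop; the weak variant may fail spuriously even when the slot is still empty.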

Are these changes tested?

I don't know how to write a deterministic test that triggers the issue before the fix. Even a non-deterministic test needs to run with address sanitizer or valgrind or something similar.

I can however confirm that this change fixes an issue that I've been debugging in https://github.com/tenzir/tenzir.

Are there any user-facing changes?

While changing StructArray::field() to return by value is an API change, I believe this should be compatible with regular uses of that function.

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@tobim tobim force-pushed the push-pszzuwsnowxv branch from b2edd9c to 821e498 Compare November 14, 2025 11:44
@tobim tobim changed the title Make StructArray::field() thread-safe GH-48134: [C++] Make StructArray::field() thread-safe Nov 14, 2025
@github-actions

⚠️ GitHub issue #48134 has been automatically assigned in GitHub to PR creator.

@kou
Member

kou commented Nov 15, 2025

Can we simplify this to something like the following?

diff --git a/cpp/src/arrow/array/array_nested.cc b/cpp/src/arrow/array/array_nested.cc
index b821451419..4eb5e82087 100644
--- a/cpp/src/arrow/array/array_nested.cc
+++ b/cpp/src/arrow/array/array_nested.cc
@@ -1086,8 +1086,10 @@ const std::shared_ptr<Array>& StructArray::field(int i) const {
       field_data = data_->child_data[i];
     }
     result = MakeArray(field_data);
-    std::atomic_store(&boxed_fields_[i], std::move(result));
-    return boxed_fields_[i];
+    // If some other thread inserted the array in the meantime, just
+    // drop this array.
+    std::shared_ptr<Array> expected = nullptr;
+    std::atomic_compare_exchange_weak(&boxed_fields_[i], &expected, std::move(result));
   }
   return boxed_fields_[i];
 }

@tobim
Contributor Author

tobim commented Nov 15, 2025

While that is a smaller change and would probably work, I think the proposed change is superior. The supposed optimization of returning a const ref in the old implementation does not work anyway, since the function bumps the ref-count internally when it checks whether the array is already materialized.

So to compare the ref-count changes in case the caller takes ownership:
original implementation: +1 -1 +1
proposed implementation: +1

If the caller does not take ownership:
original implementation: +1 -1
proposed implementation: +1 -1

Another, probably more important argument:

auto fieldarray = std::move(parent).field(n)

is a potential heap-use-after-free that is fixed with the proposed implementation.
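
To illustrate the lifetime argument, here is a small sketch of why a by-value return stays valid after the parent is gone. `Parent`, `MakeParent`, and `TakeFieldFromTemporary` are hypothetical names, not Arrow APIs, and the `int` payload stands in for an `Array`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for StructArray: it owns its children through
// shared_ptr and hands them out by value.
struct Parent {
  std::vector<std::shared_ptr<int>> children;

  // By-value return: the caller receives its own owning shared_ptr.
  std::shared_ptr<int> field(int i) const { return children[i]; }
};

inline Parent MakeParent() {
  Parent p;
  p.children.push_back(std::make_shared<int>(42));
  return p;
}

// The temporary Parent is destroyed at the end of the return statement,
// but the returned shared_ptr keeps the child alive. With a
// `const std::shared_ptr<int>&` return type, a caller that bound the result
// to a reference would instead be left pointing into the destroyed parent.
inline std::shared_ptr<int> TakeFieldFromTemporary() {
  return MakeParent().field(0);
}

inline void CheckFieldOutlivesParent() {
  std::shared_ptr<int> f = TakeFieldFromTemporary();
  assert(f != nullptr);
  assert(*f == 42);
}
```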

Also, "simple" is somewhat subjective, and returning a const shared_ptr& from a const function that internally modifies a mutable cache implies certain assumptions. To me, that's nowhere near "simple".

@kou
Member

kou commented Nov 19, 2025

@pitrou @lidavidm Do you have any opinion on this?

const & was added in #13364 and you reviewed the PR.

Member

@lidavidm lidavidm left a comment

It seems ok to me. It would be an API change but one that should generally be source-compatible in practice. (Unless someone relies on getting exactly the same Array*!)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Nov 19, 2025
Member

@pitrou pitrou left a comment

LGTM on the principle, just one suggestion

@pitrou
Member

pitrou commented Nov 20, 2025

@github-actions crossbow submit -g cpp

@github-actions

Revision: 821e498

Submitted crossbow builds: ursacomputing/crossbow @ actions-55e2a943b6

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@tobim tobim force-pushed the push-pszzuwsnowxv branch from 821e498 to a9900d6 Compare November 20, 2025 16:22
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 20, 2025
@pitrou
Member

pitrou commented Nov 20, 2025

Thanks a lot @tobim. I will merge if CI is green.

@pitrou pitrou merged commit 98620f5 into apache:main Nov 20, 2025
42 of 45 checks passed
@pitrou pitrou removed the awaiting change review Awaiting change review label Nov 20, 2025
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 98620f5.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
