Fix lookup for current Thread in async signal handler #122048

janvorli · 2025-11-28T23:10:11Z

The current CheckActivationSafePoint uses thread local storage to
get the current Thread instance. But this function is called from
an async signal handler (the activation signal handler), where
accessing TLS variables is not allowed: the access can allocate,
and if the interrupted code was itself in the middle of an
allocation, it could crash.
There was no problem with this since .NET 1.0, but a change in the
recent glibc version has broken this. We've got reports of crashes
in this code due to the reason mentioned above.

This change introduces an async-safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow but never shrinks. Entries for
threads are added when the runtime creates a thread or attaches to an
external thread, and removed when the thread dies.
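A minimal sketch of a grow-only segmented map of this kind, not the code from this PR. It assumes that insert and remove are serialized by the caller and never run inside a signal handler, that a thread only ever looks up its own id, and that 0 is never a valid thread id. The names `Slot`, `Segment`, `map_insert`, `map_remove`, `map_find` and the tiny `SLOTS_PER_SEGMENT` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_SEGMENT 4   /* hypothetical; kept tiny so growth is easy to exercise */

struct Slot    { size_t osThread; void *pThread; };   /* osThread == 0 means "empty" */
struct Segment { struct Slot slots[SLOTS_PER_SEGMENT]; struct Segment *pNext; };

static struct Segment s_head;   /* first segment is static, so lookups need no allocation */

static struct Slot *probe(struct Segment *seg, size_t osThread, size_t i)
{
    return &seg->slots[(osThread + i) % SLOTS_PER_SEGMENT];
}

/* Writer side. Assumption: callers serialize insert/remove, so plain reads of
   the writer-owned fields are fine and calloc may be used. */
static int map_insert(size_t osThread, void *pThread)
{
    for (struct Segment *seg = &s_head; ; seg = seg->pNext)
    {
        for (size_t i = 0; i < SLOTS_PER_SEGMENT; i++)
        {
            struct Slot *slot = probe(seg, osThread, i);
            if (slot->osThread == 0)
            {
                /* Payload first, then publish the key with release semantics. */
                __atomic_store_n(&slot->pThread, pThread, __ATOMIC_RELAXED);
                __atomic_store_n(&slot->osThread, osThread, __ATOMIC_RELEASE);
                return 1;
            }
        }
        if (seg->pNext == NULL)
        {
            struct Segment *fresh = calloc(1, sizeof(*fresh));
            if (fresh == NULL)
                return 0;
            /* Segments are only ever appended, never unlinked or freed. */
            __atomic_store_n(&seg->pNext, fresh, __ATOMIC_RELEASE);
        }
    }
}

static void map_remove(size_t osThread)
{
    for (struct Segment *seg = &s_head; seg != NULL; seg = seg->pNext)
    {
        for (size_t i = 0; i < SLOTS_PER_SEGMENT; i++)
        {
            struct Slot *slot = probe(seg, osThread, i);
            if (slot->osThread == osThread)
            {
                /* Retract the key, then clear the payload. */
                __atomic_store_n(&slot->osThread, (size_t)0, __ATOMIC_RELEASE);
                __atomic_store_n(&slot->pThread, NULL, __ATOMIC_RELAXED);
                return;
            }
        }
    }
}

/* Reader side: no locks, no allocation, no TLS, only atomic loads. */
static void *map_find(size_t osThread)
{
    for (struct Segment *seg = &s_head; seg != NULL;
         seg = __atomic_load_n(&seg->pNext, __ATOMIC_ACQUIRE))
    {
        for (size_t i = 0; i < SLOTS_PER_SEGMENT; i++)
        {
            struct Slot *slot = probe(seg, osThread, i);
            if (__atomic_load_n(&slot->osThread, __ATOMIC_ACQUIRE) == osThread)
                return __atomic_load_n(&slot->pThread, __ATOMIC_RELAXED);
        }
    }
    return NULL;
}
```

Because segments are never freed, a lookup that is walking the list can never touch released memory, which is what makes the read path usable from a signal handler in this sketch.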

The safety check for activation injection was further enhanced
to make sure that the ScanReaderLock is not taken. In cases where it
would need to be taken, we just reject the location.

Since NativeAOT is subject to the same issue, the code that maintains the
thread id to Thread instance map is placed in minipal and shared
between coreclr and NativeAOT.

Closes #121581


dotnet-policy-service · 2025-11-28T23:11:09Z

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR fixes crashes occurring when async signal handlers access Thread Local Storage (TLS) in recent glibc versions. The fix introduces an async-safe, lock-free segmented array to map OS thread IDs to Thread instances, avoiding TLS access in signal handlers. The implementation is shared between CoreCLR and NativeAOT through the minipal library.

Key changes:

New async-safe thread lookup mechanism using lock-free segmented arrays
Added minipal_get_current_thread_id_no_cache() to avoid TLS in signal handlers
Enhanced activation safe point checks to avoid taking ScanReaderLock
Integrated async-safe thread lookup in both CoreCLR and NativeAOT runtimes
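For illustration, a thread id lookup that avoids any TLS-backed cache could look roughly like the following. This is a sketch under assumptions, not the body of minipal_get_current_thread_id_no_cache(): the name `current_thread_id_no_cache` is hypothetical, the Linux branch assumes a raw `gettid` system call is acceptable in a signal handler, and the fallback branch is only a placeholder for other platforms.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#else
#include <pthread.h>
#endif

/* Hypothetical helper: ask the OS for the thread id on every call instead of
   reading a value cached in a thread-local variable. */
static size_t current_thread_id_no_cache(void)
{
#if defined(__linux__)
    /* Direct system call; nothing is read from or written to TLS here. */
    return (size_t)syscall(SYS_gettid);
#else
    /* Placeholder for other platforms; pthread_t is not guaranteed to be an
       integer or pointer type everywhere, so this cast is an assumption. */
    return (size_t)(uintptr_t)pthread_self();
#endif
}
```

The cost is one system call per lookup on Linux, which is the price of not touching a cached thread-local value.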

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.


File	Description
src/native/minipal/thread.h	Adds async-safe API declarations and no-cache thread ID function
src/native/minipal/thread.c	Implements lock-free segmented array for async-safe thread lookup
src/native/minipal/CMakeLists.txt	Adds thread.c to build
src/coreclr/vm/threadsuspend.cpp	Updates CheckActivationSafePoint to use async-safe thread lookup
src/coreclr/vm/threads.h	Declares GetThreadAsyncSafe for Unix
src/coreclr/vm/threads.cpp	Implements GetThreadAsyncSafe and integrates with async-safe map
src/coreclr/vm/codeman.h	Adds IsManagedCodeNoLock and GetScanFlags parameter
src/coreclr/vm/codeman.cpp	Implements IsManagedCodeNoLock for use without reader lock
src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp	Updates activation handler to use async-safe thread lookup
src/coreclr/nativeaot/Runtime/threadstore.inl	Conditionalizes GetCurrentThread and adds async-safe variant
src/coreclr/nativeaot/Runtime/threadstore.h	Declares GetCurrentThreadIfAvailableAsyncSafe
src/coreclr/nativeaot/Runtime/threadstore.cpp	Implements async-safe thread lookup and map integration


Copilot · 2025-11-28T23:14:47Z

src/coreclr/vm/threadsuspend.cpp

+    _ASSERTE(pThread != NULL);



The assertion _ASSERTE(pThread != NULL) on line 5735 conflicts with the subsequent null check on line 5739. If the assertion is correct and pThread can never be NULL, then the null checks on lines 5739 and 5745 are redundant and should be removed. However, if there are cases where GetThreadAsyncSafe() can return NULL (e.g., if the thread hasn't been registered yet), then the assertion is incorrect and should be changed to a null check with early return.

Given that this is called from an async signal handler where the thread might not be fully initialized, the safer approach would be to remove the assertion and handle the NULL case gracefully:

    Thread *pThread = GetThreadAsyncSafe();
    if (pThread == NULL)
    {
        return FALSE;
    }

Suggested change

-    _ASSERTE(pThread != NULL);
+    if (pThread == NULL)
+    {
+        return FALSE;
+    }

This looks like valid feedback. We may be seeing a signal sent by some other component (e.g. a second NAOT runtime). We should forward the signal to the next component in that case.

Could you please do some ad-hoc stress testing to make sure that it works well? We should not be crashing, and we should not be eating signals that other components may expect.
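The forwarding described above can be sketched with plain `sigaction` chaining. This is an illustration under assumptions, not the runtime's handler: `find_current_thread_async_safe`, `activation_handler`, `previous_handler` and `install_handlers` are hypothetical names, and the lookup is stubbed to always return NULL so that the forwarding path is the one exercised.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static struct sigaction s_previous;          /* whatever was installed before us */
static volatile sig_atomic_t s_forwarded;    /* demo-only counter */

/* Hypothetical stand-in for the async-safe lookup. Always NULL here, which
   models "this thread does not belong to our runtime". */
static void *find_current_thread_async_safe(void)
{
    return NULL;
}

static void activation_handler(int sig, siginfo_t *info, void *context)
{
    if (find_current_thread_async_safe() == NULL)
    {
        /* Not our thread: do not eat the signal, pass it to the previous owner. */
        if (s_previous.sa_flags & SA_SIGINFO)
        {
            if (s_previous.sa_sigaction != NULL)
                s_previous.sa_sigaction(sig, info, context);
        }
        else if (s_previous.sa_handler != SIG_DFL && s_previous.sa_handler != SIG_IGN)
        {
            s_previous.sa_handler(sig);
        }
        /* SIG_DFL is not emulated in this sketch. */
        return;
    }
    /* ... the runtime's own activation handling would go here ... */
}

/* Plays the role of the "other component" that registered first. */
static void previous_handler(int sig)
{
    (void)sig;
    s_forwarded++;
}

static int install_handlers(int sig)
{
    struct sigaction other = {0};
    other.sa_handler = previous_handler;
    sigemptyset(&other.sa_mask);
    if (sigaction(sig, &other, NULL) != 0)
        return -1;

    struct sigaction ours = {0};
    ours.sa_sigaction = activation_handler;
    ours.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&ours.sa_mask);
    /* The third argument captures the handler we are replacing. */
    return sigaction(sig, &ours, &s_previous);
}
```

With this shape, an ad-hoc stress test can count how many signals reach `previous_handler` and compare that with the number sent to threads the runtime does not own.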

Copilot · 2025-11-28T23:14:47Z

src/native/minipal/thread.c

+            if (__atomic_compare_exchange_n(
+                    &pSegment->entries[index].osThread,
+                    &expected, osThread,
+                    false /* weak */,
+                    __ATOMIC_RELEASE  /* success_memorder */,
+                    __ATOMIC_RELAXED /* failure_memorder */))
+            {
+                // Successfully inserted
+                pSegment->entries[index].pThread = pThread;


The write to pThread at line 78 is not atomic, which creates a race condition with the concurrent read in minipal_find_thread_in_async_safe_map at line 122. After the osThread field is atomically set with release semantics, a concurrent reader could load the osThread value with acquire semantics but then read a stale or partially-written pThread value.

To fix this, the write to pThread should happen before the atomic write to osThread, ensuring a proper happens-before relationship:

    pSegment->entries[index].pThread = pThread;
    __atomic_store_n(&pSegment->entries[index].osThread, osThread, __ATOMIC_RELEASE);

Then update lines 70-75 to just check if the slot is empty before proceeding.

Suggested change

-            if (__atomic_compare_exchange_n(
-                    &pSegment->entries[index].osThread,
-                    &expected, osThread,
-                    false /* weak */,
-                    __ATOMIC_RELEASE /* success_memorder */,
-                    __ATOMIC_RELAXED /* failure_memorder */))
-            {
-                // Successfully inserted
-                pSegment->entries[index].pThread = pThread;
+            if (expected == __atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE))
+            {
+                // Write pThread first, then publish osThread atomically
+                pSegment->entries[index].pThread = pThread;
+                __atomic_store_n(&pSegment->entries[index].osThread, osThread, __ATOMIC_RELEASE);
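One way to get "payload before key" ordering while still letting concurrent writers compete for a slot is a two-step publish: reserve the slot with a compare-exchange, fill in the payload, then release-store the real key. A plain load followed by a store would let two writers both see the slot as empty, so the sketch keeps a compare-exchange for the claim. This is an illustration only; `Entry`, `SLOT_RESERVED`, `entry_try_publish` and `entry_lookup` are hypothetical, and it assumes neither 0 nor `(size_t)-1` is ever a real thread id.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_EMPTY    ((size_t)0)
#define SLOT_RESERVED ((size_t)-1)   /* assumption: never a valid thread id */

struct Entry { size_t osThread; void *pThread; };

/* Claim an empty slot, write the payload, then publish the key.
   Returns false if another writer already owns the slot. */
static bool entry_try_publish(struct Entry *e, size_t osThread, void *pThread)
{
    size_t expected = SLOT_EMPTY;
    if (!__atomic_compare_exchange_n(&e->osThread, &expected, SLOT_RESERVED,
                                     false /* weak */,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED))
    {
        return false;
    }

    /* Readers cannot match SLOT_RESERVED, so the payload is still private. */
    __atomic_store_n(&e->pThread, pThread, __ATOMIC_RELAXED);

    /* The release store makes the payload visible to any reader that
       acquire-loads the key and sees osThread. */
    __atomic_store_n(&e->osThread, osThread, __ATOMIC_RELEASE);
    return true;
}

/* Reader: the acquire load of the key pairs with the release store above. */
static void *entry_lookup(struct Entry *e, size_t osThread)
{
    if (__atomic_load_n(&e->osThread, __ATOMIC_ACQUIRE) != osThread)
        return NULL;
    return __atomic_load_n(&e->pThread, __ATOMIC_RELAXED);
}
```

A reader that matches the key is then guaranteed to see the payload written before the release store, and a second writer racing for the same slot simply moves on to the next one.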

Copilot · 2025-11-28T23:14:48Z

src/native/minipal/thread.c

+        for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
+        {
+            size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
+            // Use acquire to synchronize with release in insert_thread_to_async_safe_map


The comment references insert_thread_to_async_safe_map but the actual function name is minipal_insert_thread_into_async_safe_map. This should be corrected for clarity and consistency.

Suggested change

-            // Use acquire to synchronize with release in insert_thread_to_async_safe_map
+            // Use acquire to synchronize with release in minipal_insert_thread_into_async_safe_map

Copilot · 2025-11-28T23:14:48Z

src/native/minipal/thread.c

+            // Use acquire to synchronize with release in insert_thread_to_async_safe_map
+            if (__atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE) == osThread)
+            {
+                return pSegment->entries[index].pThread;


The pThread field is read non-atomically at line 122, which can race with the non-atomic write at line 78 in minipal_insert_thread_into_async_safe_map. Even though the osThread field is read with acquire semantics, this doesn't protect the subsequent pThread read because it's a separate memory location.

The pThread read should use an atomic load with acquire semantics to properly synchronize with the insert operation:

    void* result = __atomic_load_n(&pSegment->entries[index].pThread, __ATOMIC_ACQUIRE);
    return result;

This ensures that if we observe a matching osThread value, we'll also observe the correctly written pThread value.

Suggested change

-                return pSegment->entries[index].pThread;
+                return __atomic_load_n(&pSegment->entries[index].pThread, __ATOMIC_ACQUIRE);

Copilot · 2025-11-28T23:14:48Z

src/native/minipal/thread.c

+        pSegment = pSegment->pNext;
+    }
+}
+
+void minipal_remove_thread_from_async_safe_map(size_t osThread, void* pThread)
+{
+    size_t startIndex = osThread % MAX_THREADS_IN_SEGMENT;
+
+    struct ThreadSegment* pSegment = s_pAsyncSafeThreadMapHead;
+    while (pSegment)
+    {
+        for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
+        {
+            size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
+            if (pSegment->entries[index].pThread == pThread)
+            {
+                // Found the entry, remove it
+                pSegment->entries[index].pThread = NULL;
+                __atomic_exchange_n(&pSegment->entries[index].osThread, (size_t)0, __ATOMIC_RELEASE);
+                return;
+            }
+        }
+        pSegment = pSegment->pNext;
+    }
+}
+
+void *minipal_find_thread_in_async_safe_map(size_t osThread)
+{
+    size_t startIndex = osThread % MAX_THREADS_IN_SEGMENT;
+    struct ThreadSegment* pSegment = s_pAsyncSafeThreadMapHead;
+    while (pSegment)
+    {
+        for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
+        {
+            size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
+            // Use acquire to synchronize with release in insert_thread_to_async_safe_map
+            if (__atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE) == osThread)
+            {
+                return pSegment->entries[index].pThread;
+            }
+        }
+        pSegment = pSegment->pNext;


The pNext field is read non-atomically on lines 84, 106, and 125, which can race with the atomic write to pNext at lines 50-56 in the insert function. This could lead to reading a partially written pointer value, causing crashes or incorrect behavior.

All reads of pSegment->pNext should use atomic loads with acquire semantics to synchronize with the release store in the insert operation:

pSegment = __atomic_load_n(&pSegment->pNext, __ATOMIC_ACQUIRE);

This ensures proper synchronization across all functions that traverse the segment list.
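A small sketch of what acquire traversal paired with a compare-exchange append could look like when several threads may try to grow the list at once. It is not the code under review: `Segment` is reduced to a dummy payload, and `segment_get_or_append` and `segment_count` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct Segment { int payload; struct Segment *pNext; };

/* Return the segment after `tail`, appending a zeroed one if none is linked yet.
   Returns NULL only if the allocation fails. Not for use in a signal handler. */
static struct Segment *segment_get_or_append(struct Segment *tail)
{
    struct Segment *next = __atomic_load_n(&tail->pNext, __ATOMIC_ACQUIRE);
    if (next != NULL)
        return next;

    struct Segment *fresh = calloc(1, sizeof(*fresh));
    if (fresh == NULL)
        return NULL;

    struct Segment *expected = NULL;
    if (__atomic_compare_exchange_n(&tail->pNext, &expected, fresh,
                                    false /* weak */,
                                    __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
    {
        return fresh;      /* we linked our segment */
    }

    free(fresh);           /* lost the race; `expected` now holds the winner */
    return expected;
}

/* Traversal as a signal handler would do it: acquire loads of pNext only. */
static size_t segment_count(struct Segment *head)
{
    size_t n = 0;
    for (struct Segment *s = head; s != NULL;
         s = __atomic_load_n(&s->pNext, __ATOMIC_ACQUIRE))
    {
        n++;
    }
    return n;
}
```

The acquire load of pNext pairs with the releasing compare-exchange, so a reader that sees the new pointer also sees the zero-initialized contents of the segment it points to.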

jkotas · 2025-11-29T01:46:18Z

src/native/minipal/thread.c

+    struct ThreadSegment* pNext;
+};
+
+static struct ThreadSegment *s_pAsyncSafeThreadMapHead = NULL;


I am not sure whether the implementation of this data structure belongs to minipal.

If you have put it into minipal just to facilitate sharing between NAOT and regular CoreCLR, we have better place for that: `src\coreclr\runtime`.

Initially, native/minipal was created to share code between src/coreclr and something outside of coreclr (libs, corehost or mono). Then we added cpuid/cpufeatures and bunch of other stuff which is not used by anything outside coreclr. Today, src/native/minipal is just being used to share C code regardless of whether it’s used outside of src/coreclr.

> to facilitate sharing between NAOT and regular CoreCLR, we have better place for that: `src\coreclr\runtime`.

I thought src\coreclr\minipal is the place for that facility and coreclr\runtime is to unify asm code between them.

I think about src\minipal as general-purpose stuff that should be abstracting the platform. I know we have a few exceptions that are not really a platform abstraction - but they are still very general-purpose utilities.

This data structure looks very specific to how the thread suspension is implemented in JIT (regular CoreCLR) and NAOT runtimes. I think about it as an implementation detail of the thread suspension. Ideally, the whole implementation of thread suspension would be shared between JIT and NAOT - we are not there yet.

> cpuid/cpufeatures

BTW: cpuid/cpufeatures is also used in crossgen2/ilc that is distinct from the runtimes.

> src\coreclr\minipal is the place for that facility

coreclr\minipal has the messier and less general-purpose PAL abstractions that are specific to the JIT runtime. It is not used by anything else. We tend to move bits of src\coreclr\pal there that are too difficult to clean up completely; https://github.com/dotnet/runtime/blob/main/src/coreclr/minipal/dn-stdio.h#L15 is a great example.

> coreclr\runtime is to unify asm code between them.

It is meant to be used for more than just the .asm files. It has CachedInterfaceDispatch.cpp/CachedInterfaceDispatch.h today. It would be nice to move the other .h/cpp files that are shared between JIT and NAOT runtime too. These files are cherry-picked from VM directory in a hacky way currently that does not look great, e.g.: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime/gcinfodecoder.cpp.

jkotas · 2025-11-29T02:31:45Z

> There was no problem with this since .NET 1.0, but a change in the recent glibc version has broken this

Have you been able to track down the change? I think we just got lucky that it has not shown up on the radar earlier.

janvorli added 2 commits November 25, 2025 00:38

Add support for NativeAOT

712bdb6

janvorli added this to the 11.0.0 milestone Nov 28, 2025

janvorli requested a review from jkotas November 28, 2025 23:10

janvorli self-assigned this Nov 28, 2025

janvorli requested a review from MichalStrehovsky as a code owner November 28, 2025 23:10

Copilot AI review requested due to automatic review settings November 28, 2025 23:10

janvorli added the area-VM-coreclr label Nov 28, 2025

Copilot started reviewing on behalf of janvorli November 28, 2025 23:11 View session

Copilot finished reviewing on behalf of janvorli November 28, 2025 23:13

Copilot AI reviewed Nov 28, 2025


jkotas reviewed Nov 29, 2025


build-analysis bot mentioned this pull request Nov 29, 2025

wasm build failure in CI #117017

Open


