Skip to content

Conversation

@janvorli
Copy link
Member

The current CheckActivationSafePoint uses thread local storage to
get the current Thread instance. But this function is called from
async signal handler (the activation signal handler) and it is not
allowed to access TLS variables there because the access can allocate
and if the interrupted code was running in an allocation code, it
could crash.
There was no problem with this since .NET 1.0, but a change in the
recent glibc version has broken this. We've got reports of crashes
in this code due to the reason mentioned above.

This change introduces an async safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow, but never shrink. Entries for
threads are added when runtime creates a thread / attaches to an
external thread and removed when the thread dies.

The check for safety of the activation injection was further enhanced
to make sure that the ScanReaderLock is not taken. In cases it would
need to be taken, we just reject the location.

Since NativeAOT is subject to the same issue, the code to maintain the
thread id to thread instance map is placed to the minipal and shared
between coreclr and NativeAOT.

Closes #121581

The current CheckActivationSafePoint uses thread local storage to
get the current Thread instance. But this function is called from
async signal handler (the activation signal handler) and it is not
allowed to access TLS variables there because the access can allocate
and if the interrupted code was running in an allocation code, it
could crash.
There was no problem with this since .NET 1.0, but a change in the
recent glibc version has broken this. We've got reports of crashes
in this code due to the reason mentioned above.

This change introduces an async safe mechanism for accessing the
current Thread instance from async signal handlers. It uses a
segmented array that can grow, but never shrink. Entries for
threads are added when runtime creates a thread / attaches to an
external thread and removed when the thread dies.

Closes dotnet#121581
@janvorli janvorli added this to the 11.0.0 milestone Nov 28, 2025
@janvorli janvorli requested a review from jkotas November 28, 2025 23:10
@janvorli janvorli self-assigned this Nov 28, 2025
Copilot AI review requested due to automatic review settings November 28, 2025 23:10
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copilot finished reviewing on behalf of janvorli November 28, 2025 23:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes crashes occurring when async signal handlers access Thread Local Storage (TLS) in recent glibc versions. The fix introduces an async-safe, lock-free segmented array to map OS thread IDs to Thread instances, avoiding TLS access in signal handlers. The implementation is shared between CoreCLR and NativeAOT through the minipal library.

Key changes:

  • New async-safe thread lookup mechanism using lock-free segmented arrays
  • Added minipal_get_current_thread_id_no_cache() to avoid TLS in signal handlers
  • Enhanced activation safe point checks to avoid taking ScanReaderLock
  • Integrated async-safe thread lookup in both CoreCLR and NativeAOT runtimes

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/native/minipal/thread.h Adds async-safe API declarations and no-cache thread ID function
src/native/minipal/thread.c Implements lock-free segmented array for async-safe thread lookup
src/native/minipal/CMakeLists.txt Adds thread.c to build
src/coreclr/vm/threadsuspend.cpp Updates CheckActivationSafePoint to use async-safe thread lookup
src/coreclr/vm/threads.h Declares GetThreadAsyncSafe for Unix
src/coreclr/vm/threads.cpp Implements GetThreadAsyncSafe and integrates with async-safe map
src/coreclr/vm/codeman.h Adds IsManagedCodeNoLock and GetScanFlags parameter
src/coreclr/vm/codeman.cpp Implements IsManagedCodeNoLock for use without reader lock
src/coreclr/nativeaot/Runtime/unix/PalUnix.cpp Updates activation handler to use async-safe thread lookup
src/coreclr/nativeaot/Runtime/threadstore.inl Conditionalizes GetCurrentThread and adds async-safe variant
src/coreclr/nativeaot/Runtime/threadstore.h Declares GetCurrentThreadIfAvailableAsyncSafe
src/coreclr/nativeaot/Runtime/threadstore.cpp Implements async-safe thread lookup and map integration

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment on lines +5735 to 5736
_ASSERTE(pThread != NULL);

Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion _ASSERTE(pThread != NULL) on line 5735 conflicts with the subsequent null check on line 5739. If the assertion is correct and pThread can never be NULL, then the null checks on lines 5739 and 5745 are redundant and should be removed. However, if there are cases where GetThreadAsyncSafe() can return NULL (e.g., if the thread hasn't been registered yet), then the assertion is incorrect and should be changed to a null check with early return.

Given that this is called from an async signal handler where the thread might not be fully initialized, the safer approach would be to remove the assertion and handle the NULL case gracefully:

Thread *pThread = GetThreadAsyncSafe();
if (pThread == NULL)
{
    return FALSE;
}
Suggested change
_ASSERTE(pThread != NULL);
if (pThread == NULL)
{
return FALSE;
}

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like valid feedback. We can be seeing a signal sent by some other component (e.g. second NAOT runtime). We should forward the signal to the next component in that case.

Could you please do some ad-hoc stress testing to make sure that it works well? We should not be crashing, and we should not be eating signals that other components may expect.

Comment on lines +70 to +78
if (__atomic_compare_exchange_n(
&pSegment->entries[index].osThread,
&expected, osThread,
false /* weak */,
__ATOMIC_RELEASE /* success_memorder */,
__ATOMIC_RELAXED /* failure_memorder */))
{
// Successfully inserted
pSegment->entries[index].pThread = pThread;
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write to pThread at line 78 is not atomic, which creates a race condition with the concurrent read in minipal_find_thread_in_async_safe_map at line 122. After the osThread field is atomically set with release semantics, a concurrent reader could load the osThread value with acquire semantics but then read a stale or partially-written pThread value.

To fix this, the write to pThread should happen before the atomic write to osThread, ensuring proper happens-before relationship:

pSegment->entries[index].pThread = pThread;
__atomic_store_n(&pSegment->entries[index].osThread, osThread, __ATOMIC_RELEASE);

Then update line 70-75 to just check if the slot is empty before proceeding.

Suggested change
if (__atomic_compare_exchange_n(
&pSegment->entries[index].osThread,
&expected, osThread,
false /* weak */,
__ATOMIC_RELEASE /* success_memorder */,
__ATOMIC_RELAXED /* failure_memorder */))
{
// Successfully inserted
pSegment->entries[index].pThread = pThread;
if (expected == __atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE))
{
// Write pThread first, then publish osThread atomically
pSegment->entries[index].pThread = pThread;
__atomic_store_n(&pSegment->entries[index].osThread, osThread, __ATOMIC_RELEASE);

Copilot uses AI. Check for mistakes.
for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
{
size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
// Use acquire to synchronize with release in insert_thread_to_async_safe_map
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment references insert_thread_to_async_safe_map but the actual function name is minipal_insert_thread_into_async_safe_map. This should be corrected for clarity and consistency.

Suggested change
// Use acquire to synchronize with release in insert_thread_to_async_safe_map
// Use acquire to synchronize with release in minipal_insert_thread_into_async_safe_map

Copilot uses AI. Check for mistakes.
// Use acquire to synchronize with release in insert_thread_to_async_safe_map
if (__atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE) == osThread)
{
return pSegment->entries[index].pThread;
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pThread field is read non-atomically at line 122, which can race with the non-atomic write at line 78 in minipal_insert_thread_into_async_safe_map. Even though the osThread field is read with acquire semantics, this doesn't protect the subsequent pThread read because it's a separate memory location.

The pThread read should use an atomic load with acquire semantics to properly synchronize with the insert operation:

void* result = __atomic_load_n(&pSegment->entries[index].pThread, __ATOMIC_ACQUIRE);
return result;

This ensures that if we observe a matching osThread value, we'll also observe the correctly written pThread value.

Suggested change
return pSegment->entries[index].pThread;
return __atomic_load_n(&pSegment->entries[index].pThread, __ATOMIC_ACQUIRE);

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +125
pSegment = pSegment->pNext;
}
}

void minipal_remove_thread_from_async_safe_map(size_t osThread, void* pThread)
{
size_t startIndex = osThread % MAX_THREADS_IN_SEGMENT;

struct ThreadSegment* pSegment = s_pAsyncSafeThreadMapHead;
while (pSegment)
{
for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
{
size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
if (pSegment->entries[index].pThread == pThread)
{
// Found the entry, remove it
pSegment->entries[index].pThread = NULL;
__atomic_exchange_n(&pSegment->entries[index].osThread, (size_t)0, __ATOMIC_RELEASE);
return;
}
}
pSegment = pSegment->pNext;
}
}

void *minipal_find_thread_in_async_safe_map(size_t osThread)
{
size_t startIndex = osThread % MAX_THREADS_IN_SEGMENT;
struct ThreadSegment* pSegment = s_pAsyncSafeThreadMapHead;
while (pSegment)
{
for (size_t i = 0; i < MAX_THREADS_IN_SEGMENT; i++)
{
size_t index = (startIndex + i) % MAX_THREADS_IN_SEGMENT;
// Use acquire to synchronize with release in insert_thread_to_async_safe_map
if (__atomic_load_n(&pSegment->entries[index].osThread, __ATOMIC_ACQUIRE) == osThread)
{
return pSegment->entries[index].pThread;
}
}
pSegment = pSegment->pNext;
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pNext field is read non-atomically on lines 84, 106, and 125, which can race with the atomic write to pNext at line 50-56 in the insert function. This could lead to reading a partially written pointer value, causing crashes or incorrect behavior.

All reads of pSegment->pNext should use atomic loads with acquire semantics to synchronize with the release store in the insert operation:

pSegment = __atomic_load_n(&pSegment->pNext, __ATOMIC_ACQUIRE);

This ensures proper synchronization across all functions that traverse the segment list.

Copilot uses AI. Check for mistakes.
struct ThreadSegment* pNext;
};

static struct ThreadSegment *s_pAsyncSafeThreadMapHead = NULL;
Copy link
Member

@jkotas jkotas Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether the implementation of this data structure belongs to minipal.

If you have put it into minipal just to facilitate sharing between NAOT and regular CoreCLR, we have better place for that: `src\coreclr\runtime .

Copy link
Member

@am11 am11 Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, native/minipal was created to share code between src/coreclr and something outside of coreclr (libs, corehost or mono). Then we added cpuid/cpufeatures and bunch of other stuff which is not used by anything outside coreclr. Today, src/native/minipal is just being used to share C code regardless of whether it’s used outside of src/coreclr.

to facilitate sharing between NAOT and regular CoreCLR, we have better place for that: `src\coreclr\runtime .

I thought src\coreclr\minipal is the place for that facility and coreclr\runtime is to unify asm code between them.

Copy link
Member

@jkotas jkotas Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think about src\minipal as general-purpose stuff that should be abstracting the platform. I know we have a few exceptions that are not really a platform abstraction - but they are still very general-purpose utilities.

This data structure looks very specific to how the thread suspension is implemented in JIT (regular CoreCLR) and NAOT runtimes. I think about it as an implementation detail of the thread suspension. Ideally, the whole implementation of thread suspension would be shared between JIT and NAOT - we are not there yet.

cpuid/cpufeatures

BTW: cpuid/cpufeatures is also used in crossgen2/ilc that is distinct from the runtimes.

src\coreclr\minipal is the place for that facility

coreclr\minipal has the messier and less general-purpose PAL abstractions that are specific to the JIT runtime. It is not used by anything else. We tend to be moving bits of src\coreclr\pal that are too difficult to cleanup completely, https://github.com/dotnet/runtime/blob/main/src/coreclr/minipal/dn-stdio.h#L15 is a great example.

coreclr\runtime is to unify asm code between them.

It is meant to be used for more than just the .asm files. It has CachedInterfaceDispatch.cpp/CachedInterfaceDispatch.h today. It would be nice to move the other .h/cpp files that are shared between JIT and NAOT runtime too. These files are cherry-picked from VM directory in a hacky way currently that does not look great, e.g.: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime/gcinfodecoder.cpp.

@jkotas
Copy link
Member

jkotas commented Nov 29, 2025

There was no problem with this since .NET 1.0, but a change in the recent glibc version has broken this

Have you been able to trace down the change? I think we just got lucky that it has not showed up on the radar earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Address sanitizer: Crash/deadlock in inject signal handler on Linux with glibc >= 2.40

3 participants