6.0.0-beta00: domainworkers fails with Assertion failed or SIGSEGV sometimes #1079

@edwintorok

It doesn't always happen, but running dune runtest on the 6.0.0-beta00 tag sometimes fails like this:

dune runtest --force
unixpipe: ✓
Testing library 'retry'...
..............
Ok. 14 tests ran, 0 tests skipped in 0.01 seconds
Testing library 'lwt_direct'...
.............
Ok. 13 tests ran, 0 tests skipped in 0.00 seconds
preempting: ✓
Testing library 'core'...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................SSSSSSSSSSSSSSSSSSSSSSS..................................................................................................................................................
Ok. 697 tests ran, 23 tests skipped in 0.09 seconds
basic: ✓
moving-promises: ✓
File "test/multidomain/dune", line 2, characters 15-28:
2 |   (names basic domainworkers movingpromises unixpipe preempting)
                   ^^^^^^^^^^^^^
Fatal error: exception File "src/core/lwt.ml", line 1039, characters 23-29: Assertion failed
Testing library 'ppx'...
................
Ok. 16 tests ran, 0 tests skipped in 1.20 seconds
Testing library 'react'...
...........
Ok. 11 tests ran, 0 tests skipped in 4.50 seconds
Testing library 'unix'...
...........................................................................................................................
Ok. 123 tests ran, 0 tests skipped in 6.01 seconds

It doesn't happen when running just that test in a loop.

Running dune runtest --force a few more times makes domainworkers fail in a different way, though, this time with a SIGSEGV:

dune runtest --force
unixpipe: ✓
Testing library 'retry'...
..............
Ok. 14 tests ran, 0 tests skipped in 0.01 seconds
Testing library 'lwt_direct'...
.............
Ok. 13 tests ran, 0 tests skipped in 0.00 seconds
preempting: ✓
Testing library 'core'...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................SSSSSSSSSSSSSSSSSSSSSSS..................................................................................................................................................
Ok. 697 tests ran, 23 tests skipped in 0.09 seconds
basic: ✓
moving-promises: ✓
File "test/multidomain/dune", line 2, characters 15-28:
2 |   (names basic domainworkers movingpromises unixpipe preempting)
                   ^^^^^^^^^^^^^
Command got signal SEGV.
Testing library 'ppx'...
................
Ok. 16 tests ran, 0 tests skipped in 1.20 seconds

This is with OCaml 5.3.0 on an AMD Ryzen 9 7950X 16-Core Processor running Fedora 42.
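
Since the failure only shows up intermittently when the whole test suite runs, the repeated runs described above can be scripted. The following is a minimal sketch and not part of the Lwt repository: repeat_until_fail is a hypothetical helper name, the attempt limit is arbitrary, and it assumes the command exits with a non-zero status when a test fails.

```shell
# Hypothetical helper (not from the Lwt repository): rerun a command
# until it exits with a non-zero status or the attempt limit is reached.
# Usage: repeat_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
repeat_until_fail() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    # Run the command exactly as given; stop at the first failure.
    if ! "$@"; then
      echo "failed on attempt $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max attempts"
  return 0
}
```

For example, repeat_until_fail 50 dune runtest --force would rerun the full suite up to 50 times and stop at the first failing run, leaving its output on the terminal for inspection.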

GDB stacktrace
  Id   Target Id                                    Frame 
* 1    Thread 0x7f2eab18f100 (LWP 249173)           camlLwt.run_callbacks_1040 () at src/core/lwt.ml:1304
  2    Thread 0x7f2e99ffe6c0 (LWP 249184) (Exiting) 0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
  3    Thread 0x7f2e9afff6c0 (LWP 249182) (Exiting) 0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
  4    Thread 0x7f2e92ffe6c0 (LWP 249188)           __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
  5    Thread 0x7f2e93fff6c0 (LWP 249185)           futex_wait (futex_word=0x1cb42e30, expected=2, private=0) at ../sysdeps/nptl/futex-internal.h:146
  6    Thread 0x7f2e91ffd6c0 (LWP 249190)           __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56

Thread 6 (Thread 0x7f2e91ffd6c0 (LWP 249190)):
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
No locals.
#1  0x00007f2eab1fe75c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=4294967295, nr=202) at cancellation.c:49
        result = <optimized out>
        pd = <optimized out>
        ch = <optimized out>
#2  0x00007f2eab1fedcc in __futex_abstimed_wait_common64 (private=0, futex_word=0x1cb43004, expected=<optimized out>, op=<optimized out>, abstime=0x0, cancel=true) at futex-internal.c:57
No locals.
#3  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x1cb43004, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
        err = <optimized out>
        clockbit = <optimized out>
        op = <optimized out>
#4  0x00007f2eab1fee2f in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x1cb43004, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
No locals.
#5  0x00007f2eab20149e in __pthread_cond_wait_common (cond=0x1cb42fe0, mutex=0x1cb42fb8, clockid=0, abstime=0x0) at pthread_cond_wait.c:426
        signals = <optimized out>
        g1_start = <optimized out>
        buffer = {__routine = 0x7f2eab2012c0 <__condvar_cleanup_waiting>, __arg = 0x7f2e91ffcdc0, __canceltype = 0, __prev = 0x0}
        cbuffer = {wseq = 3, cond = 0x1cb42fe0, mutex = 0x1cb42fb8, private = 0}
        err = <optimized out>
        result = 0
        wseq = 3
        g = <optimized out>
        seq = 1
        flags = <optimized out>
        private = 0
#6  ___pthread_cond_wait (cond=cond@entry=0x1cb42fe0, mutex=mutex@entry=0x1cb42fb8) at pthread_cond_wait.c:458
No locals.
#7  0x00000000004c8499 in caml_plat_wait (cond=cond@entry=0x1cb42fe0, mut=mut@entry=0x1cb42fb8) at runtime/platform.c:127
No locals.
#8  0x00000000004af936 in backup_thread_func (v=0x1cb42fa0) at runtime/domain.c:1068
        di = 0x1cb42fa0
        msg = <optimized out>
        s = 0x1cb42fb0
#9  0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139837994686144, -2486909826438284792, 139837994686144, 139838011464368, 0, 139838011464631, -2486909826480227832, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#10 0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 5 (Thread 0x7f2e93fff6c0 (LWP 249185)):
#0  futex_wait (futex_word=0x1cb42e30, expected=2, private=0) at ../sysdeps/nptl/futex-internal.h:146
        __ret = -512
        err = <optimized out>
#1  __GI___lll_lock_wait (futex=futex@entry=0x1cb42e30, private=0) at lowlevellock.c:49
No locals.
#2  0x00007f2eab205501 in lll_mutex_lock_optimized (mutex=0x1cb42e30) at pthread_mutex_lock.c:48
        __futex = 0x1cb42e30
        private = <optimized out>
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x1cb42e30) at pthread_mutex_lock.c:93
        type = <optimized out>
        __PRETTY_FUNCTION__ = "___pthread_mutex_lock"
        id = <optimized out>
#4  0x00000000004af8c4 in caml_plat_lock_blocking (m=0x1cb42e30) at runtime/caml/platform.h:458
No locals.
#5  backup_thread_func (v=0x1cb42d90) at runtime/domain.c:1076
        di = 0x1cb42d90
        msg = <optimized out>
        s = 0x1cb42da0
#6  0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838028248768, -2486905429465515512, 139838028248768, 140735637062656, 0, 140735637062919, -2486905429507458552, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#7  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 4 (Thread 0x7f2e92ffe6c0 (LWP 249188)):
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
No locals.
#1  0x00007f2eab1fe75c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=232) at cancellation.c:49
        result = <optimized out>
        pd = <optimized out>
        ch = <optimized out>
#2  0x00007f2eab1fe7a4 in __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=232) at cancellation.c:75
        r = <optimized out>
#3  0x00007f2eab285615 in epoll_wait (epfd=<optimized out>, events=<optimized out>, maxevents=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
No locals.
#4  0x00007f2eab474c44 in epoll_poll (loop=0x7f2e8c026080, timeout=<optimized out>) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev_epoll.c:155
        i = <optimized out>
        eventcnt = <optimized out>
#5  0x00007f2eab477807 in ev_run (loop=0x7f2e8c026080, flags=2) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev.c:4157
        waittime = 0.01999962985428283
        sleeptime = 0
        prev_mn_now = <optimized out>
        to = <optimized out>
        to = <optimized out>
        __PRETTY_FUNCTION__ = "ev_run"
#6  0x000000000049b418 in ev_loop (loop=0x7f2e8c026080, flags=2) at /usr/include/ev.h:841
No locals.
#7  0x000000000049b6d0 in lwt_libev_loop (val_loop=139838110178112, val_block=3) at lwt_libev_stubs.c:123
        loop = 0x7f2e8c026080
#8  <signal handler called>
No symbol table info available.
#9  0x00000000004063a6 in camlLwt_engine.fun_2534 () at src/unix/lwt_engine.ml:187
No locals.
#10 0x00000000004166aa in camlLwt_main.run_loop_696 () at src/unix/lwt_main.ml:45
No locals.
#11 0x000000000041698f in camlLwt_main.run_756 () at src/unix/lwt_main.ml:113
No locals.
#12 0x000000000045ced6 in camlStdlib__Domain.body_741 () at domain.ml:266
No locals.
#13 <signal handler called>
No symbol table info available.
#14 0x00000000004ac8b0 in caml_callback_exn (closure=<optimized out>, closure@entry=139838147682368, arg=<optimized out>, arg@entry=1) at runtime/callback.c:208
        domain_state = 0x7f2e8c002b80
#15 0x00000000004acd79 in caml_callback_res (closure=closure@entry=139838147682368, arg=arg@entry=1) at runtime/callback.c:321
No locals.
#16 0x00000000004af006 in domain_thread_func (v=<optimized out>) at runtime/domain.c:1244
        unrooted_callback = 139838147682368
        res = <optimized out>
        mut = <optimized out>
        p = <optimized out>
        ml_values = 0x1cb9f1f0
        signal_stack = 0x7f2e8c000b70
#17 0x00007f2eab201f54 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838011467456, -2486903229905389048, 139838011467456, 140735637062944, 0, 140735637063207, -2486903229947332088, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#18 0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 3 (Thread 0x7f2e9afff6c0 (LWP 249182) (Exiting)):
#0  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
No locals.
#1  0x00007f2eab20210f in advise_stack_range (mem=0x7f2e99fff000, size=16781312, pd=139838145689280, guardsize=<optimized out>) at /usr/src/debug/glibc-2.41-11.fc42.x86_64/nptl/allocatestack.c:196
        sp = 139838145687152
        pagesize_m1 = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "advise_stack_range"
#2  start_thread (arg=<optimized out>) at pthread_create.c:558
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838145689280, -2486920822628304376, 139838145689280, 140735637062944, 0, 140735637063207, -2486920822670247416, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 2 (Thread 0x7f2e99ffe6c0 (LWP 249184) (Exiting)):
#0  0x00007f2eab2813cb in __GI_madvise () at ../sysdeps/unix/syscall-template.S:117
No locals.
#1  0x00007f2eab20210f in advise_stack_range (mem=0x7f2e98ffe000, size=16781312, pd=139838128907968, guardsize=<optimized out>) at /usr/src/debug/glibc-2.41-11.fc42.x86_64/nptl/allocatestack.c:196
        sp = 139838128905840
        pagesize_m1 = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "advise_stack_range"
#2  start_thread (arg=<optimized out>) at pthread_create.c:558
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139838128907968, -2486927419161200120, 139838128907968, 139838145686192, 0, 139838145686455, -2486927419203143160, -2486887426650424824}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007f2eab28532c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

Thread 1 (Thread 0x7f2eab18f100 (LWP 249173)):
#0  camlLwt.run_callbacks_1040 () at src/core/lwt.ml:1304
No locals.
#1  0x000000000041a393 in camlLwt.run_in_resolution_loop_1147 () at src/core/lwt.ml:1339
No locals.
#2  0x000000000041a5f8 in camlLwt.resolve_1165 () at src/core/lwt.ml:1375
No locals.
#3  0x000000000041b367 in camlLwt.callback_1437 () at src/core/lwt.ml:1701
No locals.
#4  0x0000000000417fa0 in camlLwt_sequence.loop_347 () at src/core/lwt_sequence.ml:132
No locals.
#5  0x000000000044f045 in camlStdlib__Array.iter_340 () at array.ml:113
No locals.
#6  <signal handler called>
No symbol table info available.
#7  0x00000000004ac8b0 in caml_callback_exn (closure=<optimized out>, arg=<optimized out>) at runtime/callback.c:208
        domain_state = 0x1cb4b9c0
#8  0x00000000004ace09 in caml_callback (closure=<optimized out>, arg=<optimized out>) at runtime/callback.c:347
No locals.
#9  0x000000000049b779 in handle_io (loop=0x1cba93b0, watcher=0x1cba9bd0, revents=1) at lwt_libev_stubs.c:161
No locals.
#10 0x00007f2eab47423b in ev_invoke_pending (loop=0x1cba93b0) at /usr/src/debug/libev-4.33-13.fc42.x86_64/ev.c:3770
        p = <optimized out>
#11 0x000000000049b6e1 in lwt_libev_loop (val_loop=139838147681440, val_block=3) at lwt_libev_stubs.c:127
        loop = 0x1cba93b0
#12 <signal handler called>
No symbol table info available.
#13 0x00000000004063a6 in camlLwt_engine.fun_2534 () at src/unix/lwt_engine.ml:187
No locals.
#14 0x00000000004166aa in camlLwt_main.run_loop_696 () at src/unix/lwt_main.ml:45
No locals.
#15 0x000000000041698f in camlLwt_main.run_756 () at src/unix/lwt_main.ml:113
No locals.
#16 0x0000000000404e5e in camlDune__exe__Domainworkers.main_850 () at test/multidomain/domainworkers.ml:45
No locals.
#17 0x0000000000405232 in camlDune__exe__Domainworkers.entry () at test/multidomain/domainworkers.ml:74
No locals.
#18 0x0000000000401af7 in caml_program ()
No symbol table info available.
#19 <signal handler called>
No symbol table info available.
#20 0x00000000004d2954 in caml_startup_common (pooling=<optimized out>, argv=0x7fff91a78978) at runtime/startup_nat.c:127
        exe_name = <optimized out>
        proc_self_exe = <optimized out>
        res = <optimized out>
#21 caml_startup_common (argv=0x7fff91a78978, pooling=<optimized out>) at runtime/startup_nat.c:86
        exe_name = <optimized out>
        proc_self_exe = <optimized out>
        res = <optimized out>
#22 0x00000000004d29cb in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:134
No locals.
#23 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:139
        res = <optimized out>
#24 caml_main (argv=<optimized out>) at runtime/startup_nat.c:146
No locals.
#25 0x000000000040166c in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37
No locals.