
My great test #3

Closed
fabien-marchand wants to merge 1 commit into Intersec:master from fabien-marchand:hackathon

Conversation

@fabien-marchand
Contributor

Commit message.

Refs: #

nicopauss pushed a commit that referenced this pull request Feb 12, 2026
Because our el_wake API didn't use explicit memory barriers between the
read/write synchronization points, TSAN reported data races such as the
one below:

WARNING: ThreadSanitizer: data race
  Write of size 8 by main thread:
    #0 close <null>
    #1 el_fd_unregister ./src/core/el-epoll.in.c:109:13
    #2 el_wake_unregister ./src/core/el.blk:1645:9
    #3 el_unregister ./src/core/el.blk:1894:36
    #4 dns_resolv_ctx_wipe ./src/net/addr.blk:239:5
    #5 dns_resolv_ctx_delete ./src/net/addr.blk:245:1
    #6 ____addr_info_async_block_invoke ./src/net/addr.blk:297:9
    #7 el_wake_on_event ./src/core/el.blk:1607:9
    #8 el_fd_fire ./src/core/el.blk:1307:13
    #9 el_fds_loop ./src/core/el.blk:1461:17
    #10 z_connect_ics_from_addr_and_wait ./tests/zchk-iop-rpc.c:316:13
    #11 __z_iop_rpc_block_invoke_6 ./tests/zchk-iop-rpc.c:608:9
    #12 z_iop_rpc ./tests/zchk-iop-rpc.c:640:7
    #13 z_run ./src/core/z.blk:1545:9
    #14 main ./tests/zchk.c:1206:12

  Previous read of size 8 by thread T8:
    #0 write <null>
    #1 el_wake_fire ./src/core/el.blk:1660:5
    #2 ____addr_info_async_block_invoke_2 ./src/net/addr.blk:331:9
    #3 job_run ./src/core/thr-job.blk:281:9
    #4 thr_run_deque_entry ./src/core/thr-job.blk:381:12
    #5 thr_job_try_steal ./src/core/thr-job.blk:473:20
    #6 thr_job_steal ./src/core/thr-job.blk:513:15
    #7 thr_job_main ./src/core/thr-job.blk:874:13
    #8 thr_hooks_wrapper ./src/core/thr.c:89:11

Indeed, even though the eventfd() API guarantees the thread safety of
the write/read sequence itself, we still have to ensure that no further
use of the fd is possible before closing it.

Thus we add an explicit memory barrier in the form of an atomic counter.

Change-Id: I519147361d912e155341eec76f1fb4018699235c
Priv-Id: b1aef4b55b87f433f478950e7b79207846a0f623
nicopauss pushed a commit that referenced this pull request Feb 12, 2026
thr_queue_drain() seemed to assume that no other thread was running the
queue when called, and thus started with an `atomic_store(&q->running_on,
id)` without checking the current `running_on` value.

But when destroying a queue (`thr_queue_destroy`), TSAN reported this
race:

WARNING: ThreadSanitizer: data race
  Write of size 8  by main thread:
    #0 free <null>
    #1 libc_free ./src/core/mem.blk:140:9
    #2 mp_ifree ./src/core/mem.blk:351:5
    #3 thr_queue_delete ./src/core/thr-job.blk:574:1
    #4 thr_queue_drain ./src/core/thr-job.blk:618:9
    #5 thr_queue_sync ./src/core/thr-job.blk:705:13
    #6 thr_queue_destroy ./src/core/thr-job.blk:730:9
    #7 test_queue ./tests/zchk-thrjob.blk:381:9
    #8 __z_thrjobs_block_invoke_4 ./tests/zchk-thrjob.blk:647:9
    #9 z_thrjobs ./tests/zchk-thrjob.blk:648:7
    #10 z_run ./src/core/z.blk:1545:9
    #11 main ./tests/zchk.c:1206:12

  Previous atomic write of size 8 at 0x7210000001e0 by thread T28:
    #0 thr_queue_drain ./src/core/thr-job.blk:614:5
    #1 thr_queue_run ./src/core/thr-job.blk:624:5
    #2 job_run ./src/core/thr-job.blk:285:9
    #3 thr_run_deque_entry ./src/core/thr-job.blk:381:12
    #4 thr_job_try_steal ./src/core/thr-job.blk:473:20
    #5 thr_job_steal ./src/core/thr-job.blk:513:15
    #6 thr_job_main ./src/core/thr-job.blk:884:25
    #7 thr_hooks_wrapper ./src/core/thr.c:89:11

Indeed, a thread finishing a drain of the queue ends with:

    do {
        […]
    } while (!mpsc_queue_drain_end(&it, &thr_qnode_destroy));
    atomic_compare_exchange_strong(&q->running_on, &id,
                                   THR_QUEUE_NOT_RUNNING);

So after removing the last element of the queue (`mpsc_queue_drain_end`),
the queue's `running_on` is reset to THR_QUEUE_NOT_RUNNING.

But when destroying, the sole condition to immediately drain the queue
is:

    if (mpsc_queue_push(&q->q, &n->qnode)) {
        thr_queue_drain(q);

So the race is obvious here: as soon as the queue is emptied, the
destroying thread can already be freeing the queue while the previous
thread is still trying to reset `q->running_on`.

To fix this, we now actually wait for the queue to be released before
draining it again.

Change-Id: I159d6426ec7ace01d0e7aaf685a7e32a6bf31749
Priv-Id: 07021192cda3217760c2ba85fc0fa12af17ef20f