My great test #3
Closed
fabien-marchand wants to merge 1 commit into
Commit message. Refs: #
nicopauss pushed a commit that referenced this pull request on Feb 12, 2026
Because our el_wake API did not use explicit memory barriers between the
read/write synchronization points, TSAN reported data races such as the
one below:
    WARNING: ThreadSanitizer: data race
      Write of size 8 by main thread:
        #0 close <null>
        #1 el_fd_unregister ./src/core/el-epoll.in.c:109:13
        #2 el_wake_unregister ./src/core/el.blk:1645:9
        #3 el_unregister ./src/core/el.blk:1894:36
        #4 dns_resolv_ctx_wipe ./src/net/addr.blk:239:5
        #5 dns_resolv_ctx_delete ./src/net/addr.blk:245:1
        #6 ____addr_info_async_block_invoke ./src/net/addr.blk:297:9
        #7 el_wake_on_event ./src/core/el.blk:1607:9
        #8 el_fd_fire ./src/core/el.blk:1307:13
        #9 el_fds_loop ./src/core/el.blk:1461:17
        #10 z_connect_ics_from_addr_and_wait ./tests/zchk-iop-rpc.c:316:13
        #11 __z_iop_rpc_block_invoke_6 ./tests/zchk-iop-rpc.c:608:9
        #12 z_iop_rpc ./tests/zchk-iop-rpc.c:640:7
        #13 z_run ./src/core/z.blk:1545:9
        #14 main ./tests/zchk.c:1206:12
      Previous read of size 8 by thread T8:
        #0 write <null>
        #1 el_wake_fire ./src/core/el.blk:1660:5
        #2 ____addr_info_async_block_invoke_2 ./src/net/addr.blk:331:9
        #3 job_run ./src/core/thr-job.blk:281:9
        #4 thr_run_deque_entry ./src/core/thr-job.blk:381:12
        #5 thr_job_try_steal ./src/core/thr-job.blk:473:20
        #6 thr_job_steal ./src/core/thr-job.blk:513:15
        #7 thr_job_main ./src/core/thr-job.blk:874:13
        #8 thr_hooks_wrapper ./src/core/thr.c:89:11
Indeed, even though the eventfd() API makes the write/read sequence
itself thread-safe, we still have to ensure that no thread can use the
fd anymore before closing it.
Thus, we add an explicit memory barrier in the form of an atomic counter.
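A minimal sketch of the idea, assuming a hypothetical `struct wake_guard` (this is not the actual el.blk code): firing threads register in an atomic `users` counter before touching the eventfd, and the unregistering thread raises a `closing` flag, then waits for the counter to drop to zero before calling close(). It relies on the default sequentially consistent C11 atomics so that either the closer sees a firer's increment, or the firer sees the `closing` flag. It is Linux-specific because of eventfd().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical wake-up object: an eventfd guarded by an atomic user counter. */
struct wake_guard {
    int fd;
    atomic_int users;     /* threads currently inside wake_guard_fire() */
    atomic_bool closing;  /* set once the owner starts tearing down */
};

static int wake_guard_init(struct wake_guard *w)
{
    w->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    atomic_init(&w->users, 0);
    atomic_init(&w->closing, false);
    return w->fd < 0 ? -1 : 0;
}

/* Called from any thread; returns false if the wake-up was dropped. */
static bool wake_guard_fire(struct wake_guard *w)
{
    uint64_t one = 1;
    ssize_t res;

    /* Register *before* checking `closing` (seq_cst ordering): the closer
     * then either sees us in `users` or we see `closing` set. */
    atomic_fetch_add(&w->users, 1);
    if (atomic_load(&w->closing)) {
        atomic_fetch_sub(&w->users, 1);
        return false;
    }
    res = write(w->fd, &one, sizeof(one));
    atomic_fetch_sub(&w->users, 1);
    return res == (ssize_t)sizeof(one);
}

/* Called by the owner: waits until no firer is using the fd, then closes it. */
static void wake_guard_close(struct wake_guard *w)
{
    atomic_store(&w->closing, true);
    while (atomic_load(&w->users) != 0) {
        /* busy-wait for in-flight firers; a real version could yield */
    }
    close(w->fd);
    w->fd = -1;
}
```

Incrementing `users` before reading `closing` is what closes the window: checking the flag first would let a firer pass the check, get preempted, and then write to an fd the owner has meanwhile closed.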
Change-Id: I519147361d912e155341eec76f1fb4018699235c
Priv-Id: b1aef4b55b87f433f478950e7b79207846a0f623
nicopauss pushed a commit that referenced this pull request on Feb 12, 2026
thr_queue_drain() seemed to assume that no other thread was running the
queue when it was called, and thus started with an
`atomic_store(&q->running_on, id)` without checking the current
`running_on` value.
But when destroying a queue (`thr_queue_destroy`), TSAN reported this
race:
    WARNING: ThreadSanitizer: data race
      Write of size 8 by main thread:
        #0 free <null>
        #1 libc_free ./src/core/mem.blk:140:9
        #2 mp_ifree ./src/core/mem.blk:351:5
        #3 thr_queue_delete ./src/core/thr-job.blk:574:1
        #4 thr_queue_drain ./src/core/thr-job.blk:618:9
        #5 thr_queue_sync ./src/core/thr-job.blk:705:13
        #6 thr_queue_destroy ./src/core/thr-job.blk:730:9
        #7 test_queue ./tests/zchk-thrjob.blk:381:9
        #8 __z_thrjobs_block_invoke_4 ./tests/zchk-thrjob.blk:647:9
        #9 z_thrjobs ./tests/zchk-thrjob.blk:648:7
        #10 z_run ./src/core/z.blk:1545:9
        #11 main ./tests/zchk.c:1206:12
      Previous atomic write of size 8 at 0x7210000001e0 by thread T28:
        #0 thr_queue_drain ./src/core/thr-job.blk:614:5
        #1 thr_queue_run ./src/core/thr-job.blk:624:5
        #2 job_run ./src/core/thr-job.blk:285:9
        #3 thr_run_deque_entry ./src/core/thr-job.blk:381:12
        #4 thr_job_try_steal ./src/core/thr-job.blk:473:20
        #5 thr_job_steal ./src/core/thr-job.blk:513:15
        #6 thr_job_main ./src/core/thr-job.blk:884:25
        #7 thr_hooks_wrapper ./src/core/thr.c:89:11
Indeed, when finishing draining the queue, another thread would end with:
    do {
        […]
    } while (!mpsc_queue_drain_end(&it, &thr_qnode_destroy));
    atomic_compare_exchange_strong(&q->running_on, &id,
                                   THR_QUEUE_NOT_RUNNING);
So the queue's `running_on` is only reset to THR_QUEUE_NOT_RUNNING after
the last element of the queue has been removed (`mpsc_queue_drain_end`).
But when destroying the queue, the sole condition for draining it
immediately is:
    if (mpsc_queue_push(&q->q, &n->qnode)) {
        thr_queue_drain(q);
So the race is obvious here: as soon as the queue is emptied, the
destroying thread may already be freeing the queue while the previous
thread is still trying to reset `q->running_on`.
To fix this, we now actually wait for the queue to be released before
draining it again.
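One way to express "wait for the queue to be released" is to replace the unconditional `atomic_store` with a compare-and-swap loop that only claims the queue once `running_on` is back to THR_QUEUE_NOT_RUNNING. The sketch below assumes a hypothetical, stripped-down `struct queue` with a `uintptr_t` owner id and a zero sentinel standing in for THR_QUEUE_NOT_RUNNING; the `queue_*` helpers are illustrative names, not the actual thr-job.blk implementation.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sentinel value; the real THR_QUEUE_NOT_RUNNING may differ. */
#define THR_QUEUE_NOT_RUNNING ((uintptr_t)0)

/* Hypothetical queue reduced to its ownership word. */
struct queue {
    _Atomic uintptr_t running_on;
};

/* Wait until no other thread runs the queue, then claim it for `id`. */
static void queue_acquire(struct queue *q, uintptr_t id)
{
    uintptr_t expected = THR_QUEUE_NOT_RUNNING;

    while (!atomic_compare_exchange_weak(&q->running_on, &expected, id)) {
        /* Still owned (or spurious failure): reset and retry later. */
        expected = THR_QUEUE_NOT_RUNNING;
        sched_yield();
    }
}

/* Release the queue, but only if `id` is still its owner. */
static void queue_release(struct queue *q, uintptr_t id)
{
    atomic_compare_exchange_strong(&q->running_on, &id,
                                   THR_QUEUE_NOT_RUNNING);
}

static void queue_drain(struct queue *q, uintptr_t id)
{
    queue_acquire(q, id);
    /* ... pop and run every queued job here ... */
    queue_release(q, id);
}
```

With this, the destroying thread's drain can only start after the previous owner's final compare-and-swap has completed, so freeing the queue afterwards no longer races with that reset. The sched_yield() call is only a simple back-off for the sketch.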
Change-Id: I159d6426ec7ace01d0e7aaf685a7e32a6bf31749
Priv-Id: 07021192cda3217760c2ba85fc0fa12af17ef20f