[libc][rwlock] fix the race condition in waiter queue#201629
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
@llvm/pr-subscribers-libc Author: Schrodinger ZHU Yifan (SchrodingerZhu) ChangesFix #201615. Fix the issue that non atomic operations races in waiting queue, which causes missed futex wakeup signals. Confirmed by TSAN: AI wrote the detection script. Manually fixed. Full diff: https://github.com/llvm/llvm-project/pull/201629.diff 1 Files Affected:
diff --git a/libc/src/__support/threads/raw_rwlock.h b/libc/src/__support/threads/raw_rwlock.h
index 9bc1b88cabd69..1b0bba9c72675 100644
--- a/libc/src/__support/threads/raw_rwlock.h
+++ b/libc/src/__support/threads/raw_rwlock.h
@@ -75,11 +75,11 @@ class WaitingQueue final : private RawMutex {
else
return queue.pending_writers;
}
- template <Role role> LIBC_INLINE FutexWordType &serialization() {
+ template <Role role> LIBC_INLINE Futex &serialization() {
if constexpr (role == Role::Reader)
- return queue.reader_serialization.val;
+ return queue.reader_serialization;
else
- return queue.writer_serialization.val;
+ return queue.writer_serialization;
}
friend WaitingQueue;
};
@@ -391,8 +391,9 @@ class RawRwLock {
// sleep on the futex, we can avoid such waiting.
old = RwState::fetch_set_pending_bit<role>(state,
cpp::MemoryOrder::RELAXED);
- // no need to use atomic since it is already protected by the mutex.
- serial_number = guard.serialization<role>();
+ // relaxed atomic since it is already protected by the mutex.
+ serial_number =
+ guard.serialization<role>().load(cpp::MemoryOrder::RELAXED);
}
// Phase 6: do futex wait until the lock is available or timeout is
@@ -437,10 +438,12 @@ class RawRwLock {
{
WaitingQueue::Guard guard = queue.acquire(is_pshared);
if (guard.pending_count<Role::Writer>() != 0) {
- guard.serialization<Role::Writer>()++;
+ guard.serialization<Role::Writer>().fetch_add(
+ 1, cpp::MemoryOrder::RELEASE);
status = WakeTarget::Writers;
} else if (guard.pending_count<Role::Reader>() != 0) {
- guard.serialization<Role::Reader>()++;
+ guard.serialization<Role::Reader>().fetch_add(
+ 1, cpp::MemoryOrder::RELEASE);
status = WakeTarget::Readers;
} else {
status = WakeTarget::None;
|
There was a problem hiding this comment.
Pull request overview
This PR updates the libc RawRwLock waiter-queue serialization handling to eliminate data races on the futex “serialization” words, addressing flakiness/hangs caused by missed futex wakeups (Issue #201615). The change aligns the serialization counters with the Futex atomic interface so waiters and wakers always interact with the futex word atomically.
Changes:
- Switch
WaitingQueue::Guard::serialization()to return aFutex&instead of a plainFutexWordType&. - Read the serialization value with an atomic
load(RELAXED)when registering as pending. - Increment the serialization value with an atomic
fetch_addwhen deciding which waiters to wake.
Fix llvm#201615. Fix the issue that non atomic operations race in waiting queue, which causes missed futex wakeup signals. Confirmed by TSAN: ``` ================== WARNING: ThreadSanitizer: data race (pid=388518) Write of size 4 at 0x7ffd21cf98e4 by thread T23: #0 __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads() ./libc/src/__support/threads/raw_rwlock.h:443:44 llvm#1 __llvm_libc_23_0_0_git::RawRwLock::unlock() ./libc/src/__support/threads/raw_rwlock.h:520:5 llvm#2 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:104:18 llvm#3 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5 Previous atomic read of size 4 at 0x7ffd21cf98e4 by thread T4: #0 __llvm_libc_23_0_0_git::cpp::Atomic<unsigned int>::load(...) ./libc/src/__support/CPP/atomic.h:115:5 llvm#1 __llvm_libc_23_0_0_git::Futex::wait(...) ./libc/src/__support/threads/linux/futex_utils.h:43:17 llvm#2 __llvm_libc_23_0_0_git::cpp::expected<int, int> __llvm_libc_23_0_0_git::rwlock::WaitingQueue::wait<Role::Reader>(...) ./libc/src/__support/threads/raw_rwlock.h:101:35 llvm#3 __llvm_libc_23_0_0_git::rwlock::LockResult __llvm_libc_23_0_0_git::RawRwLock::lock_slow<Role::Reader>(...) ./libc/src/__support/threads/raw_rwlock.h:402:34 llvm#4 __llvm_libc_23_0_0_git::RawRwLock::read_lock(...) ./libc/src/__support/threads/raw_rwlock.h:485:12 llvm#5 randomized_thread_operation(SharedData*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:79:16 llvm#6 thread_runner(void*) ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:148:5 Thread T23 (tid=388553, running) created by main thread at: #0 pthread_create ... llvm#1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5 Thread T4 (tid=388533, running) created by main thread at: #0 pthread_create ... llvm#1 main ./libc/test/integration/src/__support/threads/tsan_full_rwlock.cpp:166:5 SUMMARY: ThreadSanitizer: data race ./libc/src/__support/threads/raw_rwlock.h:443:44 in __llvm_libc_23_0_0_git::RawRwLock::notify_pending_threads() ================== ``` AI wrote the detection script. Manually fixed.
Fix #201615.
Fix the issue that non atomic operations race in waiting queue, which causes missed futex wakeup signals.
Confirmed by TSAN:
AI wrote the detection script. Manually fixed.