build: fix build with gcc 7 #1
Closed
JeanMarcCoic wants to merge 1 commit into
Conversation
When building the generated source from flex using gcc 7, there is a
warning about an implicit fallthrough:
```
iopc/iopc-lex.c: In function ‘iopc_lex’:
iopc/iopc-lex.l:267:19: error: this statement may fall through [-Werror=implicit-fallthrough=]
yyextra->len = yyleng;
iopc/iopc-lex.c:1507:2: note: in expansion of macro ‘YY_USER_ACTION’
#endif
^~~
iopc/iopc-lex.l:503:1: note: in expansion of macro ‘YY_RULE_SETUP’
<<EOF>> { ERROR("unterminated doxygen param"); }
^ ~~~
iopc/iopc-lex.l:504:1: note: here
.|^","|","{HS}*"," {
^
cc1: all warnings being treated as errors
```
Interestingly, gcc 8 doesn't complain about this fallthrough. To fix it, we
simply define two separate error messages instead of relying on an implicit
fallthrough.
vthib
approved these changes
Aug 29, 2019
skyj
approved these changes
Sep 3, 2019
Contributor
Pull request merged: 32a28fc
vthib
added a commit
that referenced
this pull request
Jan 6, 2020
The consistency of the QPS blocks could get corrupted in some realloc situations. A block may end up marked as having its previous block free while it isn't. Use of the freelist might then end up handing out the wrong blocks, and then... you get 0x1010101 in all your handles. Makes sense!

To describe the bug, here is a quick summary of QPS, or rather of QPS's implementation of the TLSF allocator. Allocated blocks have an associated header which indicates the status of the block. Notably, two bits are used to indicate whether the block is used or free, and whether the previous block is used or free.

Here is a buggy situation:

* First block is used, with size N
* Second block is free, with size 1
* Third block is used, with size M

In the qps->hdrs, this ends up as:

```
+---------------+---------------+--------------------------+
| size: N, USED | size: 1, FREE | size: M, PREV_FREE, USED |
+---------------+---------------+--------------------------+
```

The first block is reallocated to size N+1. As the next block is free and (block_size + next_block_size) = (N + 1) >= asked_size, the second block is removed from the freelist and the first block's size is adjusted:

```
+-------------------------------+--------------------------+
| size: N + 1, USED             | size: M, PREV_FREE, USED |
+-------------------------------+--------------------------+
```

The bug is right here: we should update the PREV_FREE flag of the new next block, but we didn't. This leads to a corruption of the invariants of the allocator (clearly detected by the check_invariants routine). What this ends up causing probably depends a lot on the access pattern of the QPS. This is too big a bug not to have been triggered regularly, but as direct access through the freelist is still fine, we are probably safe unless we end up reallocating or using blocks around the corrupted one.
As for the 0x1010101 value, well, here's the backtrace:

```
#0 qhat_flatten_leaf8 at qps-hat.in.c:110
#1 qhat_set_path8 at qps-hat.in.c:268
#2 qhat_set_path at qps-hat.h:286
```

We end up writing the byte value 1 repeatedly in a uint8_t array, at the wrong address, with a wrong len. My guess is that the bug is triggered by repeated +1 reallocs followed by repeated frees. We end up corrupting the headers, then using a wrong header when reallocating, and thus an invalid ptr & len when setting a simple value "1". This ends up corrupting a handle's list with the repeated 0x1010101 value, which can get copied around but leads to segfaults when dereferenced.

Change-Id: I4650d9666c5cab3af8ab824b98a300b9095ead8a
rip-it: 39cffc8 uprooted
nicopauss
added a commit
that referenced
this pull request
Jan 18, 2021
The Azure pipelines on Ubuntu seem to deadlock randomly. The issue can be reproduced on docker with an Ubuntu image. The deadlocks seem to occur inside the ASAN library, so they are very difficult to debug:

```
(gdb) thread apply all bt

Thread 2 (Thread 0x7f10ed3fc700 (LWP 11433)):
#0  0x00000000004bde50 in __sanitizer::BlockingMutex::Lock() ()
#1  0x00000000004355c0 in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) ()
#2  0x00000000004354c3 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >::PerClass*, __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >*, unsigned long) ()
#3  0x0000000000435112 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> > >*, unsigned long, unsigned long) ()
#4  0x0000000000434ee1 in __sanitizer::QuarantineCache<__asan::QuarantineCallback>::Enqueue(__asan::QuarantineCallback, void*, unsigned long) ()
#5  0x0000000000434d53 in __asan::Allocator::QuarantineChunk(__asan::AsanChunk*, void*, __sanitizer::BufferedStackTrace*) ()
#6  0x00000000004a7692 in free ()
#7  0x00007f10f3f7ce51 in __pthread_attr_destroy (attr=<optimized out>) at pthread_attr_destroy.c:38
#8  0x00000000004c4654 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ()
#9  0x00000000004c4aaa in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ()
#10 0x00000000004b2b0e in __asan::AsanThread::SetThreadStackAndTls(__asan::AsanThread::InitOptions const*) ()
#11 0x00000000004b270d in __asan::AsanThread::Init(__asan::AsanThread::InitOptions const*) ()
#12 0x00000000004b2bd8 in __asan::AsanThread::ThreadStart(unsigned long long, __sanitizer::atomic_uintptr_t*) ()
#13 0x00007f10f479c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f10f4007293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f10f202d980 (LWP 11432)):
#0  0x00000000004bd8a7 in __sanitizer::internal_sched_yield() ()
#1  0x0000000000492705 in pthread_create ()
#2  0x00000000010f664a in thr_create (thread=0x7fff14247480, attr=0x7fff142472e0, fn=0x10f3540 <thr_job_main>, arg=0x621000044d00) at src/core/thr.c:111
#3  0x00000000010ecfe2 in thr_fork_threads () at src/core/thr-job.blk:949
#4  0x00000000010ed910 in thr_initialize (arg=0x0) at src/core/thr-job.blk:1066
#5  0x0000000001063db1 in module_require (module=0x60f000000310, required_by=0x60f000000220) at src/core/module.c:282
#6  0x0000000001063a66 in module_require (module=0x60f000000220, required_by=0x0) at src/core/module.c:276
#7  0x0000000000a2e669 in z_qps_hat () at tests/zchk-hat.blk:394
#8  0x0000000000745aa6 in z_run () at src/core/z.blk:1181
#9  0x00000000008dd459 in main (argc=1, argv=0x7fff1424a2f8) at tests/zchk.c:1010
```

We never encountered such a deadlock on our buildbots. There is one difference between our buildbots and the Azure pipelines: on the buildbots we set the environment variable ASAN_OPTIONS to 'handle_segv=0:detect_leaks=1'. With this variable set, the deadlocks disappear. So let's use this variable for the Azure pipelines too.

Change-Id: I2c33526422717ddcbf808fd618e17b8f15532c17
rip-it: adb53e9
nicopauss
pushed a commit
that referenced
this pull request
Feb 12, 2026
Because our el_wake API didn't use explicit memory barriers between the
read/write synchronization points, TSAN reported data races such as the
one below:
```
WARNING: ThreadSanitizer: data race
Write of size 8 by main thread:
#0 close <null>
#1 el_fd_unregister ./src/core/el-epoll.in.c:109:13
#2 el_wake_unregister ./src/core/el.blk:1645:9
#3 el_unregister ./src/core/el.blk:1894:36
#4 dns_resolv_ctx_wipe ./src/net/addr.blk:239:5
#5 dns_resolv_ctx_delete ./src/net/addr.blk:245:1
#6 ____addr_info_async_block_invoke ./src/net/addr.blk:297:9
#7 el_wake_on_event ./src/core/el.blk:1607:9
#8 el_fd_fire ./src/core/el.blk:1307:13
#9 el_fds_loop ./src/core/el.blk:1461:17
#10 z_connect_ics_from_addr_and_wait ./tests/zchk-iop-rpc.c:316:13
#11 __z_iop_rpc_block_invoke_6 ./tests/zchk-iop-rpc.c:608:9
#12 z_iop_rpc ./tests/zchk-iop-rpc.c:640:7
#13 z_run ./src/core/z.blk:1545:9
#14 main ./tests/zchk.c:1206:12

Previous read of size 8 by thread T8:
#0 write <null>
#1 el_wake_fire ./src/core/el.blk:1660:5
#2 ____addr_info_async_block_invoke_2 ./src/net/addr.blk:331:9
#3 job_run ./src/core/thr-job.blk:281:9
#4 thr_run_deque_entry ./src/core/thr-job.blk:381:12
#5 thr_job_try_steal ./src/core/thr-job.blk:473:20
#6 thr_job_steal ./src/core/thr-job.blk:513:15
#7 thr_job_main ./src/core/thr-job.blk:874:13
#8 thr_hooks_wrapper ./src/core/thr.c:89:11
```
Indeed, even if the eventfd() API ensures the thread safety of the
write/read sequence itself, we still have to ensure that no further use
of the fd is possible before closing it.
Thus we add an explicit memory barrier in the form of an atomic counter.
Change-Id: I519147361d912e155341eec76f1fb4018699235c
Priv-Id: b1aef4b55b87f433f478950e7b79207846a0f623
nicopauss
pushed a commit
that referenced
this pull request
Feb 12, 2026
thr_queue_drain() seemed to assume that no other thread was running the
queue when it is called, and thus started with an `atomic_store(&q->running_on,
id)` without taking care of the current `running_on` value.
But when destroying a queue (`thr_queue_destroy`) TSAN reported this
race:
```
WARNING: ThreadSanitizer: data race
Write of size 8 by main thread:
#0 free <null>
#1 libc_free ./src/core/mem.blk:140:9
#2 mp_ifree ./src/core/mem.blk:351:5
#3 thr_queue_delete ./src/core/thr-job.blk:574:1
#4 thr_queue_drain ./src/core/thr-job.blk:618:9
#5 thr_queue_sync ./src/core/thr-job.blk:705:13
#6 thr_queue_destroy ./src/core/thr-job.blk:730:9
#7 test_queue ./tests/zchk-thrjob.blk:381:9
#8 __z_thrjobs_block_invoke_4 ./tests/zchk-thrjob.blk:647:9
#9 z_thrjobs ./tests/zchk-thrjob.blk:648:7
#10 z_run ./src/core/z.blk:1545:9
#11 main ./tests/zchk.c:1206:12

Previous atomic write of size 8 at 0x7210000001e0 by thread T28:
#0 thr_queue_drain ./src/core/thr-job.blk:614:5
#1 thr_queue_run ./src/core/thr-job.blk:624:5
#2 job_run ./src/core/thr-job.blk:285:9
#3 thr_run_deque_entry ./src/core/thr-job.blk:381:12
#4 thr_job_try_steal ./src/core/thr-job.blk:473:20
#5 thr_job_steal ./src/core/thr-job.blk:513:15
#6 thr_job_main ./src/core/thr-job.blk:884:25
#7 thr_hooks_wrapper ./src/core/thr.c:89:11
```
Indeed, when finishing draining the queue, the other thread would finish with:

```
do {
    […]
} while (!mpsc_queue_drain_end(&it, &thr_qnode_destroy));
atomic_compare_exchange_strong(&q->running_on, &id,
                               THR_QUEUE_NOT_RUNNING);
```
So after removing the last element of the queue (`mpsc_queue_drain_end`)
the queue's `running_on` is reset to THR_QUEUE_NOT_RUNNING.
But when destroying, the sole condition to immediately drain the queue
is:

```
if (mpsc_queue_push(&q->q, &n->qnode)) {
    thr_queue_drain(q);
```
So the race is obvious here: as soon as the queue is emptied, the
destroying thread could already be freeing the queue while the previous
thread is still trying to reset `q->running_on`.
To fix this, we now actually wait for the queue to be released before
draining it again.
Change-Id: I159d6426ec7ace01d0e7aaf685a7e32a6bf31749
Priv-Id: 07021192cda3217760c2ba85fc0fa12af17ef20f