update by RanderWang · Pull Request #9 · thesofproject/linux

RanderWang · 2018-06-28T01:43:17Z

No description provided.

…fig. Machine drivers statically define a number of DAI links that currently cannot be changed or removed by topology. This means PCMs and platform components cannot be changed by topology at runtime AND machine drivers are tightly coupled to topology. This patch allows topology to override the machine driver DAI link config in order to reuse machine drivers with different topologies and platform components. The patch supports :- 1) create new FE PCMs with a topology defined PCM ID. 2) destroy existing static FE PCMs 3) change the platform component driver. 4) assign any new HW params fixups. The patch requires no changes to the machine drivers, but does add some platform component flags that the platform component driver can assign before loading topologies. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

…writes Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Give topology clients more access to the topology data by passing index, pcm, link_config and dai_driver to clients. This allows clients to fully instantiate and track topology objects. The SOF driver is the first user of these new APIs and needs them to build component topology driver and FW objects. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add a callback fro clients for notification about DAPM route loading and unloading. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Currently ALSA core blocks userspace for about 10 seconds for PCM R/W IO. This needs to be configurable for modern hardware like DSPs where no pointer update in milliseconds can indicate terminal DSP errors. Add a substream variable to set the wait time in ms. This allows userspace and drivers to recover more quickly from terminal DSP errors. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Might not be needed. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Align with firmware tools, no functionality change Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

There is no need to deal with DMICs if the DSP is not present and there is no ACPI machine ID found. Simplify before moving these ACPI tables to sound/soc/intel/common Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

No functionality change, just move to common tables to make it easier to deal with SOF and share the same machine drivers - as done previously for BYT/CHT/HSW/BDW. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

No functionality change for Skylake driver, add relevant names needed by SOF. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

The Sound Open Firmware driver core is a generic architecture independent layer that allows SOF to be used on many different different architectures and platforms. It abstracts DSP operations and IO methods so that the target DSP can be an internal memory mapped or external SPI or I2C based device. This abstraction also allows SOF to be run on many different VMs on the same physical HW. SOF also requires some data in ASoC PCM runtime data for looking up SOF data during ASoC PCM operations. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

SOF exposes regular ALSA Kcontrols that are defined by topology. This patch converts the Kcontrol IO to DSP IPC. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add debugFS files that can be used to expose DSP memories and and peripherals to userspace to assist with firmware debugging. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Define an IPC ABI for all host <--> DSP communication. This ABI should be transport agnostic. i.e. it should work on MMIO and SPI/I2C style interfaces. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for exposing PCMs to userspace. PCMs are defined by topology and the operations in this patch map to SOF IPC calls. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

SOF uses topology to define the DAPM graphs and widgets, DAIs, PCMs and set parameters for init and run time usage. This patch loads topology and maps it to IPC commands that are build the topology on the DSP. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add a trace event buffer that can be used by userspace to read DSP runtime trace events alongside bespoke trace data in realtime for firmware debug. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add operation pointers that can be called by core to control a wide variety of DSP targets. The DSP HW drivers will fill in these operations. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

The firmware loader exports APIs that can be called by core to load and process multiple different file formats. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for compressed audio playback/capture in SOF. TODO: to be completed. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for saving and restoring DSP context in D3 to host DDR. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add a simple "fallback" machine driver that can be used to enable SOF on boards with no codec device. This machine driver can also be forced for debug/development. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for the audio DSP hardware found on Intel Baytrail, Cherrytrail and Braswell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add DSP hardware support for Intel Haswell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add SOF support for Intel Broadwell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add SOF hardware DSP support for Intel Apollolake and Cannonlake based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Support HDA controller operations for DSP and provide space for future DSP HDA FW integration. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for various PM and core reset/run state transitions. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add HDA specific IPC mechanism for Intel DSP HW. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

Add support for loading DSP firmware on Intel HDA based platforms. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

The recent commit 01eb018 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") changed some of the handling of floating point/vector restore. In particular it caused current->thread.fpexc_mode to be copied into the current MSR (via msr_check_and_set()), rather than just into regs->msr (which is moved into MSR on return to userspace). This can lead to a crash in the kernel if we take a floating point exception when restoring FPSCR: Oops: Exception in kernel mode, sig: 8 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty thesofproject#9 NIP: c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570 REGS: c0000016b7cfb3b0 TRAP: 0700 Not tainted (5.9.0-rc1-00098-g18445bf405cb-dirty) MSR: 900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002444 XER: 00000000 CFAR: c00000000001a7a8 IRQMASK: 1 GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740 GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00 GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000 GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58 GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638 GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0 GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28 GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900 NIP load_fp_state+0x4/0x214 LR restore_math+0x17c/0x1f0 Call Trace: 0xc0000016b7cfb680 (unreliable) __switch_to+0x330/0x460 __schedule+0x318/0x920 schedule+0x74/0x140 schedule_timeout+0x318/0x3f0 wait_for_completion+0xc8/0x210 call_usermodehelper_exec+0x234/0x280 do_coredump+0xedc/0x13c0 get_signal+0x1d4/0xbe0 do_notify_resume+0x1a0/0x490 interrupt_exit_user_prepare+0x1c4/0x230 interrupt_return+0x14/0x1c0 Instruction dump: ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4 782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010 Fix it by only loading the fpexc_mode value into regs->msr. Also add a comment to explain that although VSX is subject to the value of fpexc_mode, we don't have to handle that separately because we only allow VSX to be enabled if FP is also enabled. Fixes: 01eb018 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au

…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082 #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159 #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189 #12 test_ipc () at tests/parse-metric.c:208 ..... ..... omitted many more lines This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump. It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes which was missing. The marco map_for_each_event() loops over that array tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue. Output after: [root@t35lp46 perf]# ./perf test 67 67: Parse and process metrics : Ok [root@t35lp46 perf]# Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag. ================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The aliases were never released causing the following leaks: Indirect leak of 1224 byte(s) in 9 object(s) allocated from: #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628) thesofproject#1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322 thesofproject#2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778 thesofproject#3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295 thesofproject#4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367 thesofproject#5 0x56332c76a09b in run_test tests/builtin-test.c:410 thesofproject#6 0x56332c76a09b in test_and_print tests/builtin-test.c:440 thesofproject#7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695 thesofproject#8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807 thesofproject#9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 956a783 ("perf test: Test pmu-events aliases") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) thesofproject#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 thesofproject#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 thesofproject#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 thesofproject#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 thesofproject#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 thesofproject#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 thesofproject#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 thesofproject#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 thesofproject#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 thesofproject#10 0x559fbbc0109b in run_test tests/builtin-test.c:410 thesofproject#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 thesofproject#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 thesofproject#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 thesofproject#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The test_generic_metric() missed to release entries in the pctx. Asan reported following leak (and more): Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) thesofproject#1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14) thesofproject#2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497) thesofproject#3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111 thesofproject#4 0x55f7e7341667 in expr__add_ref util/expr.c:120 thesofproject#5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783 thesofproject#6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858 thesofproject#7 0x55f7e712390b in compute_single tests/parse-metric.c:128 thesofproject#8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180 thesofproject#9 0x55f7e712446d in compute_metric tests/parse-metric.c:196 thesofproject#10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295 thesofproject#11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355 thesofproject#12 0x55f7e70be09b in run_test tests/builtin-test.c:410 thesofproject#13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 thesofproject#14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 thesofproject#15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 thesofproject#16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 6d432c4 ("perf tools: Add test_generic_metric function") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The metricgroup__add_metric() can find multiple match for a metric group and it's possible to fail. Also it can fail in the middle like in resolve_metric() even for single metric. In those cases, the intermediate list and ids will be leaked like: Direct leak of 3 byte(s) in 1 object(s) allocated from: #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) thesofproject#1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683 thesofproject#2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906 thesofproject#3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940 thesofproject#4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993 thesofproject#5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045 thesofproject#6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087 thesofproject#7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164 thesofproject#8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196 thesofproject#9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318 thesofproject#10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356 thesofproject#11 0x55f7e70be09b in run_test tests/builtin-test.c:410 thesofproject#12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 thesofproject#13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 thesofproject#14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 thesofproject#15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) thesofproject#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 thesofproject#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 thesofproject#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 thesofproject#4 0x560578e07045 in test__pmu tests/pmu.c:155 thesofproject#5 0x560578de109b in run_test tests/builtin-test.c:410 thesofproject#6 0x560578de109b in test_and_print tests/builtin-test.c:440 thesofproject#7 0x560578de401a in __cmd_test tests/builtin-test.c:661 thesofproject#8 0x560578de401a in cmd_test tests/builtin-test.c:807 thesofproject#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 thesofproject#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 thesofproject#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 thesofproject#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 thesofproject#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 thesofproject#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 thesofproject#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 thesofproject#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 thesofproject#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 thesofproject#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 thesofproject#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 thesofproject#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 thesofproject#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 thesofproject#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 thesofproject#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 thesofproject#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 thesofproject#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 thesofproject#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 thesofproject#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 thesofproject#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 thesofproject#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 thesofproject#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 thesofproject#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 thesofproject#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 thesofproject#4 0x560578e07045 in test__pmu tests/pmu.c:155 thesofproject#5 0x560578de109b in run_test tests/builtin-test.c:410 thesofproject#6 0x560578de109b in test_and_print tests/builtin-test.c:440 thesofproject#7 0x560578de401a in __cmd_test tests/builtin-test.c:661 thesofproject#8 0x560578de401a in cmd_test tests/builtin-test.c:807 thesofproject#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 thesofproject#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 thesofproject#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 thesofproject#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 thesofproject#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 thesofproject#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 thesofproject#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 thesofproject#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 thesofproject#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 thesofproject#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 thesofproject#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 thesofproject#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 thesofproject#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) thesofproject#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 thesofproject#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 thesofproject#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 thesofproject#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 thesofproject#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 thesofproject#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 thesofproject#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 thesofproject#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 thesofproject#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 thesofproject#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 thesofproject#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 thesofproject#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 thesofproject#4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 thesofproject#5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 thesofproject#6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 thesofproject#7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 thesofproject#8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 thesofproject#9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 thesofproject#10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 thesofproject#11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 thesofproject#12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) thesofproject#13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 thesofproject#14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 thesofproject#15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 thesofproject#16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 thesofproject#17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 thesofproject#18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 thesofproject#19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 thesofproject#20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 thesofproject#21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 thesofproject#22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 thesofproject#23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 thesofproject#24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 thesofproject#4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 thesofproject#5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 thesofproject#6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 thesofproject#7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 thesofproject#8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 thesofproject#9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 thesofproject#10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 thesofproject#11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) thesofproject#12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 thesofproject#13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 thesofproject#14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 thesofproject#15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 thesofproject#16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 thesofproject#17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 thesofproject#18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 thesofproject#19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 thesofproject#20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 thesofproject#21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 thesofproject#22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 thesofproject#23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 thesofproject#4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 thesofproject#5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 thesofproject#6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 thesofproject#7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 thesofproject#8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 thesofproject#9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 thesofproject#10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 thesofproject#11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 thesofproject#12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 thesofproject#13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) thesofproject#14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 thesofproject#15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 thesofproject#16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 thesofproject#17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 thesofproject#18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 thesofproject#19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 thesofproject#20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 thesofproject#21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 thesofproject#22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 thesofproject#23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 thesofproject#24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 thesofproject#25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 thesofproject#4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 thesofproject#5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) thesofproject#6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 thesofproject#7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 thesofproject#8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 thesofproject#9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 thesofproject#10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 thesofproject#11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 thesofproject#12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 thesofproject#13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 thesofproject#14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 thesofproject#15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 thesofproject#16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 thesofproject#17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Very sporadically I had test case btrfs/069 from fstests hanging (for years, it is not a recent regression), with the following traces in dmesg/syslog: [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0 [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds. [162513.514318] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.514747] task:btrfs-transacti state:D stack: 0 pid:1356167 ppid: 2 flags:0x00004000 [162513.514751] Call Trace: [162513.514761] __schedule+0x5ce/0xd00 [162513.514765] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.514771] schedule+0x46/0xf0 [162513.514844] wait_current_trans+0xde/0x140 [btrfs] [162513.514850] ? finish_wait+0x90/0x90 [162513.514864] start_transaction+0x37c/0x5f0 [btrfs] [162513.514879] transaction_kthread+0xa4/0x170 [btrfs] [162513.514891] ? btrfs_cleanup_transaction+0x660/0x660 [btrfs] [162513.514894] kthread+0x153/0x170 [162513.514897] ? kthread_stop+0x2c0/0x2c0 [162513.514902] ret_from_fork+0x22/0x30 [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds. [162513.515192] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.515680] task:fsstress state:D stack: 0 pid:1356184 ppid:1356177 flags:0x00004000 [162513.515682] Call Trace: [162513.515688] __schedule+0x5ce/0xd00 [162513.515691] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.515697] schedule+0x46/0xf0 [162513.515712] wait_current_trans+0xde/0x140 [btrfs] [162513.515716] ? finish_wait+0x90/0x90 [162513.515729] start_transaction+0x37c/0x5f0 [btrfs] [162513.515743] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.515753] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.515758] ? __ia32_sys_fdatasync+0x20/0x20 [162513.515761] iterate_supers+0x87/0xf0 [162513.515765] ksys_sync+0x60/0xb0 [162513.515768] __do_sys_sync+0xa/0x10 [162513.515771] do_syscall_64+0x33/0x80 [162513.515774] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.515781] RIP: 0033:0x7f5238f50bd7 [162513.515782] Code: Bad RIP value. [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0 [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds. [162513.516064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.516617] task:fsstress state:D stack: 0 pid:1356185 ppid:1356177 flags:0x00000000 [162513.516620] Call Trace: [162513.516625] __schedule+0x5ce/0xd00 [162513.516628] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.516634] schedule+0x46/0xf0 [162513.516647] wait_current_trans+0xde/0x140 [btrfs] [162513.516650] ? finish_wait+0x90/0x90 [162513.516662] start_transaction+0x4d7/0x5f0 [btrfs] [162513.516679] btrfs_setxattr_trans+0x3c/0x100 [btrfs] [162513.516686] __vfs_setxattr+0x66/0x80 [162513.516691] __vfs_setxattr_noperm+0x70/0x200 [162513.516697] vfs_setxattr+0x6b/0x120 [162513.516703] setxattr+0x125/0x240 [162513.516709] ? lock_acquire+0xb1/0x480 [162513.516712] ? mnt_want_write+0x20/0x50 [162513.516721] ? rcu_read_lock_any_held+0x8e/0xb0 [162513.516723] ? preempt_count_add+0x49/0xa0 [162513.516725] ? __sb_start_write+0x19b/0x290 [162513.516727] ? preempt_count_add+0x49/0xa0 [162513.516732] path_setxattr+0xba/0xd0 [162513.516739] __x64_sys_setxattr+0x27/0x30 [162513.516741] do_syscall_64+0x33/0x80 [162513.516743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.516745] RIP: 0033:0x7f5238f56d5a [162513.516746] Code: Bad RIP value. [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470 [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700 [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004 [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0 [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds. [162513.517064] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.517763] task:fsstress state:D stack: 0 pid:1356196 ppid:1356177 flags:0x00004000 [162513.517780] Call Trace: [162513.517786] __schedule+0x5ce/0xd00 [162513.517789] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.517796] schedule+0x46/0xf0 [162513.517810] wait_current_trans+0xde/0x140 [btrfs] [162513.517814] ? finish_wait+0x90/0x90 [162513.517829] start_transaction+0x37c/0x5f0 [btrfs] [162513.517845] btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs] [162513.517857] btrfs_sync_fs+0x61/0x1c0 [btrfs] [162513.517862] ? __ia32_sys_fdatasync+0x20/0x20 [162513.517865] iterate_supers+0x87/0xf0 [162513.517869] ksys_sync+0x60/0xb0 [162513.517872] __do_sys_sync+0xa/0x10 [162513.517875] do_syscall_64+0x33/0x80 [162513.517878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.517881] RIP: 0033:0x7f5238f50bd7 [162513.517883] Code: Bad RIP value. [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7 [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053 [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0 [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053 [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340 [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds. [162513.518298] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.519157] task:fsstress state:D stack: 0 pid:1356197 ppid:1356177 flags:0x00000000 [162513.519160] Call Trace: [162513.519165] __schedule+0x5ce/0xd00 [162513.519168] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.519174] schedule+0x46/0xf0 [162513.519190] wait_current_trans+0xde/0x140 [btrfs] [162513.519193] ? finish_wait+0x90/0x90 [162513.519206] start_transaction+0x4d7/0x5f0 [btrfs] [162513.519222] btrfs_create+0x57/0x200 [btrfs] [162513.519230] lookup_open+0x522/0x650 [162513.519246] path_openat+0x2b8/0xa50 [162513.519270] do_filp_open+0x91/0x100 [162513.519275] ? find_held_lock+0x32/0x90 [162513.519280] ? lock_acquired+0x33b/0x470 [162513.519285] ? do_raw_spin_unlock+0x4b/0xc0 [162513.519287] ? _raw_spin_unlock+0x29/0x40 [162513.519295] do_sys_openat2+0x20d/0x2d0 [162513.519300] do_sys_open+0x44/0x80 [162513.519304] do_syscall_64+0x33/0x80 [162513.519307] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.519309] RIP: 0033:0x7f5238f4a903 [162513.519310] Code: Bad RIP value. [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903 [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470 [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002 [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013 [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620 [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds. [162513.519727] Not tainted 5.9.0-rc6-btrfs-next-69 #1 [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [162513.520508] task:btrfs state:D stack: 0 pid:1356211 ppid:1356178 flags:0x00004002 [162513.520511] Call Trace: [162513.520516] __schedule+0x5ce/0xd00 [162513.520519] ? _raw_spin_unlock_irqrestore+0x3c/0x60 [162513.520525] schedule+0x46/0xf0 [162513.520544] btrfs_scrub_pause+0x11f/0x180 [btrfs] [162513.520548] ? finish_wait+0x90/0x90 [162513.520562] btrfs_commit_transaction+0x45a/0xc30 [btrfs] [162513.520574] ? start_transaction+0xe0/0x5f0 [btrfs] [162513.520596] btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs] [162513.520619] btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs] [162513.520639] btrfs_ioctl+0x2a25/0x36f0 [btrfs] [162513.520643] ? do_sigaction+0xf3/0x240 [162513.520645] ? find_held_lock+0x32/0x90 [162513.520648] ? do_sigaction+0xf3/0x240 [162513.520651] ? lock_acquired+0x33b/0x470 [162513.520655] ? _raw_spin_unlock_irq+0x24/0x50 [162513.520657] ? lockdep_hardirqs_on+0x7d/0x100 [162513.520660] ? _raw_spin_unlock_irq+0x35/0x50 [162513.520662] ? do_sigaction+0xf3/0x240 [162513.520671] ? __x64_sys_ioctl+0x83/0xb0 [162513.520672] __x64_sys_ioctl+0x83/0xb0 [162513.520677] do_syscall_64+0x33/0x80 [162513.520679] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [162513.520681] RIP: 0033:0x7fc3cd307d87 [162513.520682] Code: Bad RIP value. [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87 [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003 [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003 [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001 [162513.520703] Showing all locks held in the system: [162513.520712] 1 lock held by khungtaskd/54: [162513.520713] #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197 [162513.520728] 1 lock held by in:imklog/596: [162513.520729] #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [162513.520782] 1 lock held by btrfs-transacti/1356167: [162513.520784] #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs] [162513.520798] 1 lock held by btrfs/1356190: [162513.520800] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60 [162513.520805] 1 lock held by fsstress/1356184: [162513.520806] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520811] 3 locks held by fsstress/1356185: [162513.520812] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520815] #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120 [162513.520820] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520833] 1 lock held by fsstress/1356196: [162513.520834] #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0 [162513.520838] 3 locks held by fsstress/1356197: [162513.520839] #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50 [162513.520843] #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50 [162513.520846] #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] [162513.520858] 2 locks held by btrfs/1356211: [162513.520859] #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs] [162513.520877] #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs] This was weird because the stack traces show that a transaction commit, triggered by a device replace operation, is blocking trying to pause any running scrubs but there are no stack traces of blocked tasks doing a scrub. After poking around with drgn, I noticed there was a scrub task that was constantly running and blocking for shorts periods of time: >>> t = find_task(prog, 1356190) >>> prog.stack_trace(t) #0 __schedule+0x5ce/0xcfc #1 schedule+0x46/0xe4 #2 schedule_timeout+0x1df/0x475 #3 btrfs_reada_wait+0xda/0x132 #4 scrub_stripe+0x2a8/0x112f #5 scrub_chunk+0xcd/0x134 thesofproject#6 scrub_enumerate_chunks+0x29e/0x5ee thesofproject#7 btrfs_scrub_dev+0x2d5/0x91b thesofproject#8 btrfs_ioctl+0x7f5/0x36e7 thesofproject#9 __x64_sys_ioctl+0x83/0xb0 thesofproject#10 do_syscall_64+0x33/0x77 thesofproject#11 entry_SYSCALL_64+0x7c/0x156 Which corresponds to: int btrfs_reada_wait(void *handle) { struct reada_control *rc = handle; struct btrfs_fs_info *fs_info = rc->fs_info; while (atomic_read(&rc->elems)) { if (!atomic_read(&fs_info->reada_works_cnt)) reada_start_machine(fs_info); wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0, (HZ + 9) / 10); } (...) So the counter "rc->elems" was set to 1 and never decreased to 0, causing the scrub task to loop forever in that function. Then I used the following script for drgn to check the readahead requests: $ cat dump_reada.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) def dump_re(re): nzones = re.nzones.value_() print(f're at {hex(re.value_())}') print(f'\t logical {re.logical.value_()}') print(f'\t refcnt {re.refcnt.value_()}') print(f'\t nzones {nzones}') for i in range(nzones): dev = re.zones[i].device name = dev.name.str.string_() print(f'\t\t dev id {dev.devid.value_()} name {name}') print() for _, e in radix_tree_for_each(fs_info.reada_tree): re = cast('struct reada_extent *', e) dump_re(re) $ drgn dump_reada.py re at 0xffff8f3da9d25ad8 logical 38928384 refcnt 1 nzones 1 dev id 0 name b'/dev/sdd' $ So there was one readahead extent with a single zone corresponding to the source device of that last device replace operation logged in dmesg/syslog. Also the ID of that zone's device was 0 which is a special value set in the source device of a device replace operation when the operation finishes (constant BTRFS_DEV_REPLACE_DEVID set at btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was the source of a device replace operation. Normally there should be as many zones in the readahead extent as there are devices, and I wasn't expecting the extent to be in a block group with a 'single' profile, so I went and confirmed with the following drgn script that there weren't any single profile block groups: $ cat dump_block_groups.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1" mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info) BTRFS_BLOCK_GROUP_DATA = (1 << 0) BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1) BTRFS_BLOCK_GROUP_METADATA = (1 << 2) BTRFS_BLOCK_GROUP_RAID0 = (1 << 3) BTRFS_BLOCK_GROUP_RAID1 = (1 << 4) BTRFS_BLOCK_GROUP_DUP = (1 << 5) BTRFS_BLOCK_GROUP_RAID10 = (1 << 6) BTRFS_BLOCK_GROUP_RAID5 = (1 << 7) BTRFS_BLOCK_GROUP_RAID6 = (1 << 8) BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9) BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10) def bg_flags_string(bg): flags = bg.flags.value_() ret = '' if flags & BTRFS_BLOCK_GROUP_DATA: ret = 'data' if flags & BTRFS_BLOCK_GROUP_METADATA: if len(ret) > 0: ret += '|' ret += 'meta' if flags & BTRFS_BLOCK_GROUP_SYSTEM: if len(ret) > 0: ret += '|' ret += 'system' if flags & BTRFS_BLOCK_GROUP_RAID0: ret += ' raid0' elif flags & BTRFS_BLOCK_GROUP_RAID1: ret += ' raid1' elif flags & BTRFS_BLOCK_GROUP_DUP: ret += ' dup' elif flags & BTRFS_BLOCK_GROUP_RAID10: ret += ' raid10' elif flags & BTRFS_BLOCK_GROUP_RAID5: ret += ' raid5' elif flags & BTRFS_BLOCK_GROUP_RAID6: ret += ' raid6' elif flags & BTRFS_BLOCK_GROUP_RAID1C3: ret += ' raid1c3' elif flags & BTRFS_BLOCK_GROUP_RAID1C4: ret += ' raid1c4' else: ret += ' single' return ret def dump_bg(bg): print() print(f'block group at {hex(bg.value_())}') print(f'\t start {bg.start.value_()} length {bg.length.value_()}') print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}') bg_root = fs_info.block_group_cache_tree.address_of_() for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'): dump_bg(bg) $ drgn dump_block_groups.py block group at 0xffff8f3d673b0400 start 22020096 length 16777216 flags 258 - system raid6 block group at 0xffff8f3d53ddb400 start 38797312 length 536870912 flags 260 - meta raid6 block group at 0xffff8f3d5f4d9c00 start 575668224 length 2147483648 flags 257 - data raid6 block group at 0xffff8f3d08189000 start 2723151872 length 67108864 flags 258 - system raid6 block group at 0xffff8f3db70ff000 start 2790260736 length 1073741824 flags 260 - meta raid6 block group at 0xffff8f3d5f4dd800 start 3864002560 length 67108864 flags 258 - system raid6 block group at 0xffff8f3d67037000 start 3931111424 length 2147483648 flags 257 - data raid6 $ So there were only 2 reasons left for having a readahead extent with a single zone: reada_find_zone(), called when creating a readahead extent, returned NULL either because we failed to find the corresponding block group or because a memory allocation failed. With some additional and custom tracing I figured out that on every further ocurrence of the problem the block group had just been deleted when we were looping to create the zones for the readahead extent (at reada_find_extent()), so we ended up with only one zone in the readahead extent, corresponding to a device that ends up getting replaced. So after figuring that out it became obvious why the hang happens: 1) Task A starts a scrub on any device of the filesystem, except for device /dev/sdd; 2) Task B starts a device replace with /dev/sdd as the source device; 3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently starting to scrub a stripe from block group X. This call to btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add() calls reada_add_block(), it passes the logical address of the extent tree's root node as its 'logical' argument - a value of 38928384; 4) Task A then enters reada_find_extent(), called from reada_add_block(). It finds there isn't any existing readahead extent for the logical address 38928384, so it proceeds to the path of creating a new one. It calls btrfs_map_block() to find out which stripes exist for the block group X. On the first iteration of the for loop that iterates over the stripes, it finds the stripe for device /dev/sdd, so it creates one zone for that device and adds it to the readahead extent. Before getting into the second iteration of the loop, the cleanup kthread deletes block group X because it was empty. So in the iterations for the remaining stripes it does not add more zones to the readahead extent, because the calls to reada_find_zone() returned NULL because they couldn't find block group X anymore. As a result the new readahead extent has a single zone, corresponding to the device /dev/sdd; 4) Before task A returns to btrfs_reada_add() and queues the readahead job for the readahead work queue, task B finishes the device replace and at btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new device /dev/sdg; 5) Task A returns to reada_add_block(), which increments the counter "->elems" of the reada_control structure allocated at btrfs_reada_add(). Then it returns back to btrfs_reada_add() and calls reada_start_machine(). This queues a job in the readahead work queue to run the function reada_start_machine_worker(), which calls __reada_start_machine(). At __reada_start_machine() we take the device list mutex and for each device found in the current device list, we call reada_start_machine_dev() to start the readahead work. However at this point the device /dev/sdd was already freed and is not in the device list anymore. This means the corresponding readahead for the extent at 38928384 is never started, and therefore the "->elems" counter of the reada_control structure allocated at btrfs_reada_add() never goes down to 0, causing the call to btrfs_reada_wait(), done by the scrub task, to wait forever. Note that the readahead request can be made either after the device replace started or before it started, however in pratice it is very unlikely that a device replace is able to start after a readahead request is made and is able to complete before the readahead request completes - maybe only on a very small and nearly empty filesystem. This hang however is not the only problem we can have with readahead and device removals. When the readahead extent has other zones other than the one corresponding to the device that is being removed (either by a device replace or a device remove operation), we risk having a use-after-free on the device when dropping the last reference of the readahead extent. For example if we create a readahead extent with two zones, one for the device /dev/sdd and one for the device /dev/sde: 1) Before the readahead worker starts, the device /dev/sdd is removed, and the corresponding btrfs_device structure is freed. However the readahead extent still has the zone pointing to the device structure; 2) When the readahead worker starts, it only finds device /dev/sde in the current device list of the filesystem; 3) It starts the readahead work, at reada_start_machine_dev(), using the device /dev/sde; 4) Then when it finishes reading the extent from device /dev/sde, it calls __readahead_hook() which ends up dropping the last reference on the readahead extent through the last call to reada_extent_put(); 5) At reada_extent_put() it iterates over each zone of the readahead extent and attempts to delete an element from the device's 'reada_extents' radix tree, resulting in a use-after-free, as the device pointer of the zone for /dev/sdd is now stale. We can also access the device after dropping the last reference of a zone, through reada_zone_release(), also called by reada_extent_put(). And a device remove suffers the same problem, however since it shrinks the device size down to zero before removing the device, it is very unlikely to still have readahead requests not completed by the time we free the device, the only possibility is if the device has a very little space allocated. While the hang problem is exclusive to scrub, since it is currently the only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free problem affects any path that triggers readhead, which includes btree_readahead_hook() and __readahead_hook() (a readahead worker can trigger readahed for the children of a node) for example - any path that ends up calling reada_add_block() can trigger the use-after-free after a device is removed. So fix this by waiting for any readahead requests for a device to complete before removing a device, ensuring that while waiting for existing ones no new ones can be made. This problem has been around for a very long time - the readahead code was added in 2011, device remove exists since 2008 and device replace was introduced in 2013, hard to pick a specific commit for a git Fixes tag. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

We had kernel panic, it is caused by unload module and last close confirmation. call trace: [1196029.743127] free_sess+0x15/0x50 [rtrs_client] [1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client] [1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client] [1196029.743130] close_rtrs+0x25/0x50 [rnbd_client] [1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client] [1196029.743132] __x64_sys_delete_module+0x190/0x260 And in the crashdump confirmation kworker is also running. PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2" #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891 thesofproject#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98 thesofproject#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938 thesofproject#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7 thesofproject#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e thesofproject#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client] thesofproject#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client] thesofproject#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client] thesofproject#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client] thesofproject#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client] The problem is both code path try to close same session, which lead to panic. To fix it, just skip the sess if the refcount already drop to 0. Fixes: f7a7a5c ("block/rnbd: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

The bit that indicates if the device supports 160MHZ is bit #9. The macro checks bit #8. Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit. Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com> Fixes: d6f2134 ("iwlwifi: add mac/rf types and 160MHz to the device tables") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20210122144849.bddbf9b57a75.I16e09e2b1404b16bfff70852a5a654aa468579e2@changeid

Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The evlist has the maps with its own refcounts so we don't need to set the pointers to NULL. Otherwise following error was reported by Asan. # perf test -v 4 4: Read samples using the mmap interface : --- start --- test child forked, pid 139782 mmap size 528384B ================================================================= ==139782==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f1f76daee8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 #1 0x564ba21a0fea in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x564ba21a1a0f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149 #3 0x564ba21a21cf in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166 #4 0x564ba21a21cf in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181 #5 0x564ba1e48298 in test__basic_mmap tests/mmap-basic.c:55 #6 0x564ba1e278fb in run_test tests/builtin-test.c:428 #7 0x564ba1e278fb in test_and_print tests/builtin-test.c:458 #8 0x564ba1e29a53 in __cmd_test tests/builtin-test.c:679 #9 0x564ba1e29a53 in cmd_test tests/builtin-test.c:825 #10 0x564ba1e95cb4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #11 0x564ba1d1fa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #12 0x564ba1d1fa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #13 0x564ba1d1fa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #14 0x7f1f768e4d09 in __libc_start_main ../csu/libc-start.c:308 ... test child finished with 1 ---- end ---- Read samples using the mmap interface: FAILED! failed to open shell test directory: /home/namhyung/libexec/perf-core/tests/shell Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20210301140409.184570-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evlist has the maps with its own refcounts so we don't need to set the pointers to NULL. Otherwise following error was reported by Asan. Also change the goto label since it doesn't need to have two. # perf test -v 24 24: Number of exit events of a simple workload : --- start --- test child forked, pid 145915 mmap size 528384B ================================================================= ==145915==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fc44e50d1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164 #1 0x561cf50f4d2e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23 #2 0x561cf4eeb949 in thread_map__new_by_tid util/thread_map.c:63 #3 0x561cf4db7fd2 in test__task_exit tests/task-exit.c:74 #4 0x561cf4d798fb in run_test tests/builtin-test.c:428 #5 0x561cf4d798fb in test_and_print tests/builtin-test.c:458 #6 0x561cf4d7ba53 in __cmd_test tests/builtin-test.c:679 #7 0x561cf4d7ba53 in cmd_test tests/builtin-test.c:825 #8 0x561cf4de7d04 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #9 0x561cf4c71a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #10 0x561cf4c71a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #11 0x561cf4c71a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #12 0x7fc44e042d09 in __libc_start_main ../csu/libc-start.c:308 ... test child finished with 1 ---- end ---- Number of exit events of a simple workload: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evlist has the maps with its own refcounts so we don't need to set the pointers to NULL. Otherwise following error was reported by Asan. Also change the goto label since it doesn't need to have two. # perf test -v 25 25: Software clock events period values : --- start --- test child forked, pid 149154 mmap size 528384B mmap size 528384B ================================================================= ==149154==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fef5cd071f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164 #1 0x56260d5e8b8e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23 #2 0x56260d3df7a9 in thread_map__new_by_tid util/thread_map.c:63 #3 0x56260d2ac6b2 in __test__sw_clock_freq tests/sw-clock.c:65 #4 0x56260d26d8fb in run_test tests/builtin-test.c:428 #5 0x56260d26d8fb in test_and_print tests/builtin-test.c:458 #6 0x56260d26fa53 in __cmd_test tests/builtin-test.c:679 #7 0x56260d26fa53 in cmd_test tests/builtin-test.c:825 #8 0x56260d2dbb64 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #9 0x56260d165a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #10 0x56260d165a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #11 0x56260d165a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #12 0x7fef5c83cd09 in __libc_start_main ../csu/libc-start.c:308 ... test child finished with 1 ---- end ---- Software clock events period values : FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evlist and the cpu/thread maps should be released together. Otherwise following error was reported by Asan. Note that this test still has memory leaks in DSOs so it still fails even after this change. I'll take a look at that too. # perf test -v 26 26: Object code reading : --- start --- test child forked, pid 154184 Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols Parsing event 'cycles' mmap size 528384B ... ================================================================= ==154184==ERROR: LeakSanitizer: detected memory leaks Direct leak of 439 byte(s) in 1 object(s) allocated from: #0 0x7fcb66e77037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x55ad9b7e821e in dso__new_id util/dso.c:1256 #2 0x55ad9b8cfd4a in __machine__addnew_vdso util/vdso.c:132 #3 0x55ad9b8cfd4a in machine__findnew_vdso util/vdso.c:347 #4 0x55ad9b845b7e in map__new util/map.c:176 #5 0x55ad9b8415a2 in machine__process_mmap2_event util/machine.c:1787 #6 0x55ad9b8fab16 in perf_tool__process_synth_event util/synthetic-events.c:64 #7 0x55ad9b8fab16 in perf_event__synthesize_mmap_events util/synthetic-events.c:499 #8 0x55ad9b8fbfdf in __event__synthesize_thread util/synthetic-events.c:741 #9 0x55ad9b8ff3e3 in perf_event__synthesize_thread_map util/synthetic-events.c:833 #10 0x55ad9b738585 in do_test_code_reading tests/code-reading.c:608 #11 0x55ad9b73b25d in test__code_reading tests/code-reading.c:722 #12 0x55ad9b6f28fb in run_test tests/builtin-test.c:428 #13 0x55ad9b6f28fb in test_and_print tests/builtin-test.c:458 #14 0x55ad9b6f4a53 in __cmd_test tests/builtin-test.c:679 #15 0x55ad9b6f4a53 in cmd_test tests/builtin-test.c:825 #16 0x55ad9b760cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #17 0x55ad9b5eaa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #18 0x55ad9b5eaa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #19 0x55ad9b5eaa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #20 0x7fcb669acd09 in __libc_start_main ../csu/libc-start.c:308 ... SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s). test child finished with 1 ---- end ---- Object code reading: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evlist and the cpu/thread maps should be released together. Otherwise following error was reported by Asan. $ perf test -v 28 28: Use a dummy software event to keep tracking: --- start --- test child forked, pid 156810 mmap size 528384B ================================================================= ==156810==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f637d2bce8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 #1 0x55cc6295cffa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x55cc6295da1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149 #3 0x55cc6295e1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166 #4 0x55cc6295e1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181 #5 0x55cc626287cf in test__keep_tracking tests/keep-tracking.c:84 #6 0x55cc625e38fb in run_test tests/builtin-test.c:428 #7 0x55cc625e38fb in test_and_print tests/builtin-test.c:458 #8 0x55cc625e5a53 in __cmd_test tests/builtin-test.c:679 #9 0x55cc625e5a53 in cmd_test tests/builtin-test.c:825 #10 0x55cc62651cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #11 0x55cc624dba88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #12 0x55cc624dba88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #13 0x55cc624dba88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #14 0x7f637cdf2d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s). test child finished with 1 ---- end ---- Use a dummy software event to keep tracking: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The evlist and cpu/thread maps should be released together. Otherwise the following error was reported by Asan. $ perf test -v 35 35: Track with sched_switch : --- start --- test child forked, pid 159287 Using CPUID GenuineIntel-6-8E-C mmap size 528384B 1295 events recorded ================================================================= ==159287==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7fa28d9a2e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 #1 0x5652f5a5affa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x5652f5a5ba1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149 #3 0x5652f5a5c1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166 #4 0x5652f5a5c1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181 #5 0x5652f5723bbf in test__switch_tracking tests/switch-tracking.c:350 #6 0x5652f56e18fb in run_test tests/builtin-test.c:428 #7 0x5652f56e18fb in test_and_print tests/builtin-test.c:458 #8 0x5652f56e3a53 in __cmd_test tests/builtin-test.c:679 #9 0x5652f56e3a53 in cmd_test tests/builtin-test.c:825 #10 0x5652f574fcc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #11 0x5652f55d9a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #12 0x5652f55d9a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #13 0x5652f55d9a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #14 0x7fa28d4d8d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s). test child finished with 1 ---- end ---- Track with sched_switch: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

It missed to call perf_thread_map__put() after using the map. $ perf test -v 43 43: Synthesize thread map : --- start --- test child forked, pid 162640 ================================================================= ==162640==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fd48cdaa1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164 #1 0x563e6d5f8d0e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23 #2 0x563e6d3ef69a in thread_map__new_by_pid util/thread_map.c:46 #3 0x563e6d2cec90 in test__thread_map_synthesize tests/thread-map.c:97 #4 0x563e6d27d8fb in run_test tests/builtin-test.c:428 #5 0x563e6d27d8fb in test_and_print tests/builtin-test.c:458 #6 0x563e6d27fa53 in __cmd_test tests/builtin-test.c:679 #7 0x563e6d27fa53 in cmd_test tests/builtin-test.c:825 #8 0x563e6d2ebce4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #9 0x563e6d175a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #10 0x563e6d175a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #11 0x563e6d175a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #12 0x7fd48c8dfd09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 8224 byte(s) leaked in 2 allocation(s). test child finished with 1 ---- end ---- Synthesize thread map: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

It should be released after printing the map. $ perf test -v 52 52: Print cpu map : --- start --- test child forked, pid 172233 ================================================================= ==172233==ERROR: LeakSanitizer: detected memory leaks Direct leak of 156 byte(s) in 1 object(s) allocated from: #0 0x7fc472518e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 #1 0x55e63b378f7a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x55e63b37a05c in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:237 #3 0x55e63b056d16 in cpu_map_print tests/cpumap.c:102 #4 0x55e63b056d16 in test__cpu_map_print tests/cpumap.c:120 #5 0x55e63afff8fb in run_test tests/builtin-test.c:428 #6 0x55e63afff8fb in test_and_print tests/builtin-test.c:458 #7 0x55e63b001a53 in __cmd_test tests/builtin-test.c:679 #8 0x55e63b001a53 in cmd_test tests/builtin-test.c:825 #9 0x55e63b06dc44 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #10 0x55e63aef7a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #11 0x55e63aef7a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #12 0x55e63aef7a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #13 0x7fc47204ed09 in __libc_start_main ../csu/libc-start.c:308 ... SUMMARY: AddressSanitizer: 448 byte(s) leaked in 7 allocation(s). test child finished with 1 ---- end ---- Print cpu map: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

It should release the maps at the end. $ perf test -v 71 71: Convert perf time to TSC : --- start --- test child forked, pid 178744 mmap size 528384B 1st event perf time 59207256505278 tsc 13187166645142 rdtsc time 59207256542151 tsc 13187166723020 2nd event perf time 59207256543749 tsc 13187166726393 ================================================================= ==178744==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7faf601f9e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145 #1 0x55b620cfc00a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x55b620cfca2f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149 #3 0x55b620cfd1ef in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166 #4 0x55b620cfd1ef in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181 #5 0x55b6209ef1b2 in test__perf_time_to_tsc tests/perf-time-to-tsc.c:73 #6 0x55b6209828fb in run_test tests/builtin-test.c:428 #7 0x55b6209828fb in test_and_print tests/builtin-test.c:458 #8 0x55b620984a53 in __cmd_test tests/builtin-test.c:679 #9 0x55b620984a53 in cmd_test tests/builtin-test.c:825 #10 0x55b6209f0cd4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #11 0x55b62087aa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #12 0x55b62087aa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #13 0x55b62087aa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #14 0x7faf5fd2fd09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s). test child finished with 1 ---- end ---- Convert perf time to TSC: FAILED! Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210301140409.184570-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

I got a segfault when using -r option with event groups. The option makes it run the workload multiple times and it will reuse the evlist and evsel for each run. While most of resources are allocated and freed properly, the id hash in the evlist was not and it resulted in the bug. You can see it with the address sanitizer like below: $ perf stat -r 100 -e '{cycles,instructions}' true ================================================================= ==693052==ERROR: AddressSanitizer: heap-use-after-free on address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8 WRITE of size 8 at 0x6080000003d0 thread T0 #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644 #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237 #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244 #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285 #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765 #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782 #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895 #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014 #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446 #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308 #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9) Actually the nodes in the hash table are struct perf_stream_id and they were freed in the previous run. Fix it by resetting the hash. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

I got several memory leak reports from Asan with a simple command. It was because VDSO is not released due to the refcount. Like in __dsos_addnew_id(), it should put the refcount after adding to the list. $ perf record true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ] ================================================================= ==692599==ERROR: LeakSanitizer: detected memory leaks Direct leak of 439 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256 #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132 #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347 #4 0x559bce50826c in map__new util/map.c:175 #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 thesofproject#6 0x559bce512f6b in machines__deliver_event util/session.c:1481 thesofproject#7 0x559bce515107 in perf_session__deliver_event util/session.c:1551 thesofproject#8 0x559bce51d4d2 in do_flush util/ordered-events.c:244 thesofproject#9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 thesofproject#10 0x559bce519bea in __perf_session__process_events util/session.c:2268 thesofproject#11 0x559bce519bea in perf_session__process_events util/session.c:2297 thesofproject#12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 thesofproject#13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 thesofproject#14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 thesofproject#15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 thesofproject#16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 thesofproject#17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 thesofproject#18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 thesofproject#19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 thesofproject#20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169 #2 0x559bce50821b in map__new util/map.c:168 #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 #4 0x559bce512f6b in machines__deliver_event util/session.c:1481 #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551 thesofproject#6 0x559bce51d4d2 in do_flush util/ordered-events.c:244 thesofproject#7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 thesofproject#8 0x559bce519bea in __perf_session__process_events util/session.c:2268 thesofproject#9 0x559bce519bea in perf_session__process_events util/session.c:2297 thesofproject#10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 thesofproject#11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 thesofproject#12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 thesofproject#13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 thesofproject#14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 thesofproject#15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 thesofproject#16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 thesofproject#17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 thesofproject#18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc #1 schedule at ffffffff81866de6 #2 inode_dio_wait at ffffffff812a2d04 #3 ocfs2_setattr at ffffffffc05f322e [ocfs2] #4 notify_change at ffffffff812a5a09 #5 do_truncate at ffffffff812808f5 #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 #7 sys_ftruncate at ffffffff81280d8e #8 do_syscall_64 at ffffffff81003949 #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc #1 schedule+54 at ffffffff81866de6 #2 rwsem_down_write_failed+536 at ffffffff8186aa28 #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 #4 down_write+45 at ffffffff81869c9d #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] #7 dio_complete+140 at ffffffff812c873c #8 dio_aio_complete_work+25 at ffffffff812c89f9 #9 process_one_work+361 at ffffffff810b1889 #10 worker_thread+77 at ffffffff810b233d #11 kthread+261 at ffffffff810b7fd5 #12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

This is a big PR, but most of it is interdependent to the rest. - Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`. + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference: `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB) `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB) i.e. the bulk is now shared. + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes thesofproject#9). + Faster compilation (less crates to compile & less repetition). + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s. - Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C. + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure: less files around. - Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies. + Also helps with the faster compilation. - Dependency handling integration with `Kbuild`/`fixdep`. + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed. + Works as expected with parallel support (`-j`). - Automatic generation of the `exports.c` list: + Instead of manually handling the list, all non-local functions available in `core`, `alloc` and `kernel` are exported, so all modules should work, regardless of what they need, and without failing linking due to symbols in the manual list not existing (e.g. due to differences in config options). + They are a lot, though: * ~6k Rust symbols vs. ~4k C symbols in release. * However, 4k of those are `bindings_raw` (i.e. duplicated C ones), which shouldn't be exported. Thus we should look into making `bindings_raw` private to the crate (at the moment, the (first) Rust example requires `<kernel::bindings...::miscdevice as Default>::default`). + Licensing: * `kernel`'s symbols are exported as GPL. * `core`'s and `alloc`'s symbols are exported as non-GPL so that third-parties can build Rust modules as long as they write their own kernel support infrastructure, i.e. without taking advantage of `kernel`. This seemed to make the most sense compared to other exports from the kernel, plus it follows more closely the original licence of the crates. - Support for GCC-compiled kernels: + The generated bindings do not have meaningful differences in our release config, between GCC 10.1 and Clang 11. + Other configs (e.g. our debug one) may add/remove types and functions. That is fine unless we use them form our bindings. + However, there are config options that may not work (e.g. the randstruct GCC plugin if we use one of those structs). - Support for `arm64` architecture: + Added to the CI: BusyBox is cross-compiled on the fly (increased timeout a bit to match). + Requires weakening of a few compiler builtins and adding `copy_{from,to}_user` helpers. - Support for custom `--sysroot` via `KRUSTCFLAGS`. - Proper `make clean` support. - Offline builds by default (there is no "online compilation" anymore; fixes thesofproject#17). - No interleaved Cargo output (fixes thesofproject#29). - No nightly dependency on Cargo's `build-std`; since now we manage the cross-compilation ourselves (should fix thesofproject#27). - "Big" kallsyms symbol support: + I already raised ksym names from 128 to 256 back when I wrote the first integration. However, Rust symbols can be huge in debug/non-optimized, so I increased it again to 512; plus the module name from 56 to 248. + In turn, this required tuning the table format to support 2-byte lengths for ksyms. Compression at generation and kernel decompression is covered, although it may be the case that some script/tool also requires changes to understand the new table format. - Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`). - Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), by default following what the user selected for C, but customizable through a Kconfig menu. As well as options for tweaking overflow checks and debug assertions. - This rewrite of the Kbuild support is cleaner, i.e. less hacks in general handling paths (e.g. no more `shell readlink` for `O=`). - Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time. - Do not export any helpers' symbols. - Updated the quick start guide. - Updated CI: + Now we always test with 2 builtins and 2 loadables Rust example drivers, removing the matrix test for builtin/loadable. + Added `toolchain` to matrix: now we test building with GCC, Clang or a full LLVM toolchain. + Added `arch` to matrix: now we test both arm64 and x86_64. + Added `rustc` to matrix: now we test with a very recent nightly as well. + Only build `output == build` once to reduce the number of combinations. + Debug x86_64 config: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more what a developer would have. Running the CI will be slightly slower, but should be OK. Also enable `-C opt-level=0` to test that such an extreme works and also to see how much bloated everything becomes. + Release x86_64 config: disabled `EXPERT` and changed a few things to make it look more like a normal desktop configuration, although it is still pretty minimal. + The configs for arm64 are `EXPERT` and `EMBEDDED` ones, very minimal, for the particular CPU we are simulating. + Update configs to v5.10. + Use `$GITHUB_ENV` to simplify. - Less `extern crate`s needed since we pass it via `rustc` (closer to idiomatic 2018 edition Rust code). Things to note: - There is two more nightly features used: + The new Rust mangling scheme: we know it will be stable (and the default on, later on). + The binary dep-info output: if we remove all other nightly features, this one can easily go too. - The hack at `exports.c` to export symbols to loadable modules. - The hack at `allocator.rs` to get the `__rust_*()` functions. - The hack to get the proper flags for bindgen on GCC builds. Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

…bit. The bit that indicates if the device supports 160MHZ is bit thesofproject#9. The macro checks bit thesofproject#8. Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit. Change-Id: I16e09e2b1404b16bfff70852a5a654aa468579e2 Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com> iwl7000-tree: 7c3faa9fa57dbcbd9d3b2c275154b42fe458a118

…nect It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP): BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000 Cc: <stable@vger.kernel.org> Fixes: d646960 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

lrgirdwo and others added 30 commits June 4, 2018 10:58

[SQUASHME?] ASoC: core: Add name prefix for machines with topology re…

c59e0d3

…writes Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: topology: Add callback for DAPM route load/unload

7b47108

Add a callback fro clients for notification about DAPM route loading and unloading. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: DAPM: Check for DAI private data - NEEDED ???

908d35a

Might not be needed. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: Intel: rename 'reef' to 'sof' in ACPI matching table

cf8e473

Align with firmware tools, no functionality change Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

ASoC: Intel: common: add firmware/topology information for SOF

0779134

No functionality change for Skylake driver, add relevant names needed by SOF. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

ASoC: SOF: Add Sound Open Firmware KControl support

40ac5ea

SOF exposes regular ALSA Kcontrols that are defined by topology. This patch converts the Kcontrol IO to DSP IPC. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add driver debug support.

936c53a

Add debugFS files that can be used to expose DSP memories and and peripherals to userspace to assist with firmware debugging. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add support for IPC IO between DSP and Host

39315dd

Define an IPC ABI for all host <--> DSP communication. This ABI should be transport agnostic. i.e. it should work on MMIO and SPI/I2C style interfaces. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add PCM operations support

6245e5e

Add support for exposing PCMs to userspace. PCMs are defined by topology and the operations in this patch map to SOF IPC calls. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add DSP firmware trace event support

1ebacc8

Add a trace event buffer that can be used by userspace to read DSP runtime trace events alongside bespoke trace data in realtime for firmware debug. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add DSP HW abstraction operations

a1e523d

Add operation pointers that can be called by core to control a wide variety of DSP targets. The DSP HW drivers will fill in these operations. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add firmware loader support

121138b

The firmware loader exports APIs that can be called by core to load and process multiple different file formats. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add compressed PCM support

b4f4dc0

Add support for compressed audio playback/capture in SOF. TODO: to be completed. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add PM support

a1ed847

Add support for saving and restoring DSP context in D3 to host DDR. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Add Nocodec machine driver support

a46071d

Add a simple "fallback" machine driver that can be used to enable SOF on boards with no codec device. This machine driver can also be forced for debug/development. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add BYT, CHT and BSW DSP HW support.

2a80a39

Add support for the audio DSP hardware found on Intel Baytrail, Cherrytrail and Braswell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add HSW HW DSP support

308c3a9

Add DSP hardware support for Intel Haswell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add support for BDW HW DSP support

9adbdbc

Add SOF support for Intel Broadwell based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add APL/CNL HW DSP support

ddcad8b

Add SOF hardware DSP support for Intel Apollolake and Cannonlake based devices. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add HDA controller for Intel DSP

fc43fc9

Support HDA controller operations for DSP and provide space for future DSP HDA FW integration. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add Intel specific HDA DSP HW operations

17eb800

Add support for various PM and core reset/run state transitions. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add Intel specific HDA IPC mechanisms.

042e224

Add HDA specific IPC mechanism for Intel DSP HW. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

ASoC: SOF: Intel: Add Intel specific HDA firmware loader

aef0e8e

Add support for loading DSP firmware on Intel HDA based platforms. Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>

aplitvinov mentioned this pull request Apr 21, 2023

No sound on Huawei Matebook 14 (2021) - even on Linux 6.2 #4305

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update#9

update#9
RanderWang wants to merge 75 commits intothesofproject:topic/sof-devfrom
RanderWang:topic/sof-dev

RanderWang commented Jun 28, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

RanderWang commented Jun 28, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants