Conversation
include/sound/soc.h
Outdated
| * This connection will be established at runtime by triggering the | ||
| * hostless pipeline with a kcontrol attached to the component. | ||
| */ | ||
| unsigned int virtual:1; |
There was a problem hiding this comment.
Just wondering if we can set dynamic = 1 and no_pcm = 1 to achieve the same effect and hence no need for this ?
There was a problem hiding this comment.
i had the same question, it feels somewhat redundant with existing mechanisms
There was a problem hiding this comment.
Fixed in the revised version
sound/soc/soc-pcm.c
Outdated
| /* define virtual FE DAI link */ | ||
| link->virtual = 1; | ||
| link->name = link_name; | ||
| link->id = 1; |
There was a problem hiding this comment.
I removed the ID as it isnt really needed
sound/soc/soc-pcm.c
Outdated
| link->platform_name = platform_name; | ||
| link->codec_name = "snd-soc-dummy"; | ||
| link->codec_dai_name = "snd-soc-dummy-dai"; | ||
| link->num_codecs = 1; |
There was a problem hiding this comment.
Do we need codecs ? should it b 0 ?
sound/soc/soc-pcm.c
Outdated
| link->dynamic = 1; | ||
|
|
||
| /*TODO: check if we need to handle capture for virtual FE */ | ||
| link->dpcm_playback = 1; |
There was a problem hiding this comment.
yes, this looks like a generic capability on capture where a component can capture audio and throw some sort of non-audio event DAPM knows nothing about.
There was a problem hiding this comment.
Should the virtual FE have both dpcm_playback and dpcm_capture enabled by default ?
Or does it depend on the widget that calls vfe_new(). So in the case of tone, I can only have dpcm_playback enabled?
There was a problem hiding this comment.
I've added capture support in virtual FE too. The widget creating the vfe must specify the stream direction.
sound/soc/soc-pcm.c
Outdated
| EXPORT_SYMBOL_GPL(soc_dpcm_vfe_new); | ||
|
|
||
| /* free virtual FE DAI link */ | ||
| int soc_dpcm_vfe_free(struct snd_soc_card *card) |
There was a problem hiding this comment.
we need to individually delete these link by link.
There was a problem hiding this comment.
Is the idea then that the widget that created the virtual fe, will delete it?
In that case, can I associate the link with the comp by adding it to the snd_sof_control associated with the comp?
There was a problem hiding this comment.
FIxed to delete vfe by name
sound/soc/soc-pcm.c
Outdated
| for_each_rtdcom_safe(rtd, rtdcom1, rtdcom2) | ||
| kfree(rtdcom1); | ||
|
|
||
| INIT_LIST_HEAD(&rtd->component_list); |
There was a problem hiding this comment.
Why is the list head init here in free() ?
|
|
||
| /* Drop starting point */ | ||
| list_del(widgets.next); | ||
| if (!list_is_singular(&widgets)) |
There was a problem hiding this comment.
What happens if you have more than one widget between widget and BE, it's valid and could be constructed.
There was a problem hiding this comment.
This part is a bit unclear to me. The problem I was facing was that the snd_soc_dapm_dai_get_connected_widgets() returned a list with only the BE DAI in the case of the tone pipeline and dropping the starting point resulted in an empty list of paths to process.
the dai argument passed to this method was the cpu_dai->playback_widget() which is the BE dai in my case. And this was the only way I could get it to be enabled. I could use some help here to figure out the right way to do it.
There was a problem hiding this comment.
Yes, what happens if you have tone followed by a volume control?
There was a problem hiding this comment.
We do have a tone followed by volume in my topology. But the snd_soc_dapm_dai_get_connected_widgets() is called starting with the dai->playback_widget as the starting point. And this is the BE DAI.
There was a problem hiding this comment.
I think we should also state in a comment what we are doing here and why.
sound/soc/soc-pcm.c
Outdated
| * pcm device and substream for the requested direction | ||
| */ | ||
| if (rtd->dai_link->virtual) { | ||
| struct snd_pcm_str *pstr; |
There was a problem hiding this comment.
I would offload this logic as a static function.
There was a problem hiding this comment.
and can any part of this be done by call core ALSA functions directly ?
There was a problem hiding this comment.
The only function that does this is snd_new_pcm() but that also registers the pcm device. So I created this to avoid that part.
|
Thanks, Liam and Pierre. Let me work on revising the patchset today. |
This patch proposes to enable kcontrols for siggen type dapm widgets. This will allow triggering the tone generating siggen component and also modifying the run-time attributes such as frequency, amplitude etc. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
This patch adds the methods for create a virtual FE dal link and add it to the sound card. It also adds the method to free the virtual FE connected to the card. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
When walking the graph to discover the path from the virtual FE to the BE, there is only one widget in the path. Do not remove this BE widget from the list, so it can be used to connect with the virtual FE dai link. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
…irtual dai link Virtual FE dai links should be manually set to running state by default with a pcm runtime. The active count of their cpu_dai and codec_dai's should also be updated. This is required to establish FE-BE connection and enable the BE DAI when the dpcm runtime is updated. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
|
@plbossart , @lgirdwood . I've pushed some changes. Could you please review the new version? |
Virtual FE dai links do not need to register the pcm device. So just create the empty pcm device and substream in the requested direction. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
the soc_dpcm_runtime_update() method will be called to establish a connection to the BE and enable the codec. So make this method accessible to modules. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
plbossart
left a comment
There was a problem hiding this comment.
Looks better! I think it does make sense to reuse no_pcm and dynamic, this is easier to push upstream. but I am still not clear on a couple of comments related to init/free/activation.
| dpcm_be_disconnect(rtd, stream_dir); | ||
|
|
||
| /* free pcm runtime */ | ||
| kfree(rtd->dpcm[stream_dir].runtime); |
There was a problem hiding this comment.
Question: all these frees don't reflect what is done for the _new method, could we rely on a helper to avoid tracking all these actions? It looks like either copy-paste or something extracted from somewhere else.
There was a problem hiding this comment.
This is actually not a copy-paste at all. I traced all the steps from the point of adding the link to the time when the sound card is registered and freed all the data that is allocated. Here're the steps I traced:
- Bind DAI link: The PCM runtime and the dpcm runtime is created during this step.
There are some other runtime child structures allocated during this step too ie codec dais and component list. - Probe DAI link :The pcm device and the substream is created here.
- dpcm runtime update: the FE link is connected to the BE
So, in the vfe_free() method, I undid all of this.
But I wonder if I can avoid all of this manually in the free method. Let me dig into these steps a bit more.
|
|
||
| /* Drop starting point */ | ||
| list_del(widgets.next); | ||
| if (!list_is_singular(&widgets)) |
There was a problem hiding this comment.
Yes, what happens if you have tone followed by a volume control?
|
|
||
| soc_add_pcm_runtime(card, rtd); | ||
|
|
||
| /* if this is a virtual FE link, create runtime and set it as active */ |
There was a problem hiding this comment.
I don't know what 'active' runtime means. I would think it becomes 'active' when the switch is On, but maybe it's a different concept you are referring to.
There was a problem hiding this comment.
Oh, what I mean by active here is when the kcontrol is switched on and the dpcm_runtime_update() is called, it checks to see if playback or capture is active on the FE link. So I set the FE playback/capture active status by default.
There was a problem hiding this comment.
I see, but I wonder if this has negative implications related to suspend/resume? If there is nothing going on it's a bit odd to force a runtime to be active.
There was a problem hiding this comment.
Yes, thats a good point.
The problem is the runtime_update() method is oblivious to where it is being called from. It just looks for an active link to process the update.
Maybe I can avoid this by activating the FE in the kcontrol IO handler just before calling runtime_update.
lgirdwood
left a comment
There was a problem hiding this comment.
Sorry, have more questions on the core changes.
|
|
||
| /* Create any new FE <--> BE connections */ | ||
| for (i = 0; i < list->num_widgets; i++) { | ||
|
|
| substream->pstr = pstr; | ||
| substream->number = 0; | ||
| substream->stream = stream_dir; | ||
| sprintf(substream->name, "subdevice #%i", 0); |
There was a problem hiding this comment.
Is the name exposed by /sysff or /proc ? if so, do we have any naming collisions when we register 2 or more VFE ?
There was a problem hiding this comment.
hmm I did not check if it is exposed. Let me try to make it unique
| kcname_in_long_name = true; | ||
| } else { | ||
| switch (w->id) { | ||
| case snd_soc_dapm_siggen: |
There was a problem hiding this comment.
as discussed last week, I wonder if we should also add a snd_soc_dapm_sig_sink (the dual of a generator). This would be very useful for testing e.g. by letting us set-up all sort of DMA channels with a self-test checking.
There was a problem hiding this comment.
Sure, I will add it to my next update.
|
@plbossart , @lgirdwood . I'm closing this pull request and replacing it with a new pull request that has all the commits to enable the tone generator. This will be helpful for you review the entire flow for enabling the VFE link and the pipeline. I have addressed your comments in the new pull request. |
[ Upstream commit 934140a ] cachefiles_read_waiter() has the right to access a 'monitor' object by virtue of being called under the waitqueue lock for one of the pages in its purview. However, it has no ref on that monitor object or on the associated operation. What it is allowed to do is to move the monitor object to the operation's to_do list, but once it drops the work_lock, it's actually no longer permitted to access that object. However, it is trying to enqueue the retrieval operation for processing - but it can only do this via a pointer in the monitor object, something it shouldn't be doing. If it doesn't enqueue the operation, the operation may not get processed. If the order is flipped so that the enqueue is first, then it's possible for the work processor to look at the to_do list before the monitor is enqueued upon it. Fix this by getting a ref on the operation so that we can trust that it will still be there once we've added the monitor to the to_do list and dropped the work_lock. The op can then be enqueued after the lock is dropped. The bug can manifest in one of a couple of ways. The first manifestation looks like: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! RIP: 0010:fscache_put_operation+0x1e3/0x1f0 ... fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 This is due to the operation being in the DEAD state (6) rather than INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through fscache_put_operation(). The bug can also manifest like the following: kernel BUG at fs/fscache/operation.c:69! ... [exception RIP: fscache_enqueue_operation+246] ... thesofproject#7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 thesofproject#8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 thesofproject#9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not entirely clear which assertion failed. Fixes: 9ae326a ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Lei Xue <carmark.dlut@gmail.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Reported-by: Anthony DeRobertis <aderobertis@metrics.net> Reported-by: NeilBrown <neilb@suse.com> Reported-by: Daniel Axtens <dja@axtens.net> Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5ba1d9 upstream. We have reports of the following crash: PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0" #0 [ffff88085c6db710] machine_kexec at ffffffff81046239 #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248 #2 [ffff88085c6db830] oops_end at ffffffff81008ae7 #3 [ffff88085c6db860] no_context at ffffffff81050b8f thesofproject#4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75 thesofproject#5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83 thesofproject#6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e thesofproject#7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c thesofproject#8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122 [exception RIP: uart_put_char+149] RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006 RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081 RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120 RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320 R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000 R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 thesofproject#9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544 thesofproject#10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c thesofproject#11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b thesofproject#12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2 thesofproject#13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b thesofproject#14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a thesofproject#15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016 thesofproject#16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194 thesofproject#17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a thesofproject#18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2 thesofproject#19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d thesofproject#20 [ffff88085c6dbeb0] kthread at ffffffff81096384 thesofproject#21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f after slogging through some dissasembly: ffffffff814b6720 <uart_put_char>: ffffffff814b6720: 55 push %rbp ffffffff814b6721: 48 89 e5 mov %rsp,%rbp ffffffff814b6724: 48 83 ec 20 sub $0x20,%rsp ffffffff814b6728: 48 89 1c 24 mov %rbx,(%rsp) ffffffff814b672c: 4c 89 64 24 08 mov %r12,0x8(%rsp) ffffffff814b6731: 4c 89 6c 24 10 mov %r13,0x10(%rsp) ffffffff814b6736: 4c 89 74 24 18 mov %r14,0x18(%rsp) ffffffff814b673b: e8 b0 8e 58 00 callq ffffffff81a3f5f0 <mcount> ffffffff814b6740: 4c 8b a7 88 02 00 00 mov 0x288(%rdi),%r12 ffffffff814b6747: 45 31 ed xor %r13d,%r13d ffffffff814b674a: 41 89 f6 mov %esi,%r14d ffffffff814b674d: 49 83 bc 24 70 01 00 cmpq $0x0,0x170(%r12) ffffffff814b6754: 00 00 ffffffff814b6756: 49 8b 9c 24 80 01 00 mov 0x180(%r12),%rbx ffffffff814b675d: 00 ffffffff814b675e: 74 2f je ffffffff814b678f <uart_put_char+0x6f> ffffffff814b6760: 48 89 df mov %rbx,%rdi ffffffff814b6763: e8 a8 67 58 00 callq ffffffff81a3cf10 <_raw_spin_lock_irqsave> ffffffff814b6768: 41 8b 8c 24 78 01 00 mov 0x178(%r12),%ecx ffffffff814b676f: 00 ffffffff814b6770: 89 ca mov %ecx,%edx ffffffff814b6772: f7 d2 not %edx ffffffff814b6774: 41 03 94 24 7c 01 00 add 0x17c(%r12),%edx ffffffff814b677b: 00 ffffffff814b677c: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b6782: 75 23 jne ffffffff814b67a7 <uart_put_char+0x87> ffffffff814b6784: 48 89 c6 mov %rax,%rsi ffffffff814b6787: 48 89 df mov %rbx,%rdi ffffffff814b678a: e8 e1 64 58 00 callq ffffffff81a3cc70 <_raw_spin_unlock_irqrestore> ffffffff814b678f: 44 89 e8 mov %r13d,%eax ffffffff814b6792: 48 8b 1c 24 mov (%rsp),%rbx ffffffff814b6796: 4c 8b 64 24 08 mov 0x8(%rsp),%r12 ffffffff814b679b: 4c 8b 6c 24 10 mov 0x10(%rsp),%r13 ffffffff814b67a0: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff814b67a5: c9 leaveq ffffffff814b67a6: c3 retq ffffffff814b67a7: 49 8b 94 24 70 01 00 mov 0x170(%r12),%rdx ffffffff814b67ae: 00 ffffffff814b67af: 48 63 c9 movslq %ecx,%rcx ffffffff814b67b2: 41 b5 01 mov $0x1,%r13b ffffffff814b67b5: 44 88 34 0a mov %r14b,(%rdx,%rcx,1) ffffffff814b67b9: 41 8b 94 24 78 01 00 mov 0x178(%r12),%edx ffffffff814b67c0: 00 ffffffff814b67c1: 83 c2 01 add $0x1,%edx ffffffff814b67c4: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b67ca: 41 89 94 24 78 01 00 mov %edx,0x178(%r12) ffffffff814b67d1: 00 ffffffff814b67d2: eb b0 jmp ffffffff814b6784 <uart_put_char+0x64> ffffffff814b67d4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff814b67db: 00 00 00 00 00 for our build, this is crashing at: circ->buf[circ->head] = c; Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf) protected by the "per-port mutex", which based on uart_port_check() is state->port.mutex. Indeed, the lock acquired in uart_put_char() is uport->lock, i.e. not the same lock. Anyway, since the lock is not acquired, if uart_shutdown() is called, the last chunk of that function may release state->xmit.buf before its assigned to null, and cause the race above. To fix it, let's lock uport->lock when allocating/deallocating state->xmit.buf in addition to the per-port mutex. v2: switch to locking uport->lock on allocation/deallocation instead of locking the per-port mutex in uart_put_char. Note that since uport->lock is a spin lock, we have to switch the allocation to GFP_ATOMIC. v3: move the allocation outside the lock, so we can switch back to GFP_KERNEL Signed-off-by: Tycho Andersen <tycho@tycho.ws> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 286e877 upstream. Commit efda1b5 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Introduced additional hardening for ambiguity in the ACPI spec for ars_status output sizing. However, it had a couple of cases mixed up. Where it should have been checking for (and returning) "out_field[1] - 4" it was using "out_field[1] - 8" and vice versa. This caused a four byte discrepancy in the buffer size passed on to the command handler, and in some cases, this caused memory corruption like: ./daxdev-errors.sh: line 76: 24104 Aborted (core dumped) ./daxdev-errors $busdev $region malloc(): memory corruption Program received signal SIGABRT, Aborted. [...] thesofproject#5 0x00007ffff7865a2e in calloc () from /lib64/libc.so.6 thesofproject#6 0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136 thesofproject#7 0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144 thesofproject#8 test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>) at daxdev-errors.c:332 Cc: <stable@vger.kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Lukasz Dorau <lukasz.dorau@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Fixes: efda1b5 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-of-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The workaround for missing CAS bit is also needed for xHC on Intel sunrisepoint PCH. For more details see: Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8 Cc: <stable@vger.kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cc4a08 upstream. info->nr_rings isn't adjusted in case of ENOMEM error from negotiate_mq(). This leads to kernel panic in error path. Typical call stack involving panic - thesofproject#8 page_fault at ffffffff8175936f [exception RIP: blkif_free_ring+33] RIP: ffffffffa0149491 RSP: ffff8804f7673c08 RFLAGS: 00010292 ... thesofproject#9 blkif_free at ffffffffa0149aaa [xen_blkfront] thesofproject#10 talk_to_blkback at ffffffffa014c8cd [xen_blkfront] thesofproject#11 blkback_changed at ffffffffa014ea8b [xen_blkfront] thesofproject#12 xenbus_otherend_changed at ffffffff81424670 thesofproject#13 backend_changed at ffffffff81426dc3 thesofproject#14 xenwatch_thread at ffffffff81422f29 thesofproject#15 kthread at ffffffff810abe6a thesofproject#16 ret_from_fork at ffffffff81754078 Cc: stable@vger.kernel.org Fixes: 7ed8ce1 ("xen-blkfront: move negotiate_mq to cover all cases of new VBDs") Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…eadlock commit b72c3ab upstream. [BUG] For certain crafted image, whose csum root leaf has missing backref, if we try to trigger write with data csum, it could cause deadlock with the following kernel WARN_ON(): WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400 CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ thesofproject#8 Workqueue: btrfs-endio-write btrfs_endio_write_helper RIP: 0010:btrfs_tree_lock+0x3e2/0x400 Call Trace: btrfs_alloc_tree_block+0x39f/0x770 __btrfs_cow_block+0x285/0x9e0 btrfs_cow_block+0x191/0x2e0 btrfs_search_slot+0x492/0x1160 btrfs_lookup_csum+0xec/0x280 btrfs_csum_file_blocks+0x2be/0xa60 add_pending_csums+0xaf/0xf0 btrfs_finish_ordered_io+0x74b/0xc90 finish_ordered_fn+0x15/0x20 normal_work_helper+0xf6/0x500 btrfs_endio_write_helper+0x12/0x20 process_one_work+0x302/0x770 worker_thread+0x81/0x6d0 kthread+0x180/0x1d0 ret_from_fork+0x35/0x40 [CAUSE] That crafted image has missing backref for csum tree root leaf. And when we try to allocate new tree block, since there is no EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot and use it. The extent tree of the image looks like: Normal image | This fuzzed image ----------------------------------+-------------------------------- BG 29360128 | BG 29360128 One empty slot | One empty slot 29364224: backref to UUID tree | 29364224: backref to UUID tree Two empty slots | Two empty slots 29376512: backref to CSUM tree | One empty slot (bad type) <<< 29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree ... | ... Since bytenr 29376512 has no METADATA/EXTENT_ITEM, when btrfs try to alloc tree block, it's an valid slot for btrfs. And for finish_ordered_write, when we need to insert csum, we try to CoW csum tree root. By accident, empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K, BG_OFFSET + 12K is already used by tree block COW for other trees, the next empty slot is BG_OFFSET + 16K, which should be the backref for CSUM tree. But due to the bad type, btrfs can recognize it and still consider it as an empty slot, and will try to use it for csum tree CoW. Then in the following call trace, we will try to lock the new tree block, which turns out to be the old csum tree root which is already locked: btrfs_search_slot() called on csum tree root, which is at 29376512 |- btrfs_cow_block() |- btrfs_set_lock_block() | |- Now locks tree block 29376512 (old csum tree root) |- __btrfs_cow_block() |- btrfs_alloc_tree_block() |- btrfs_reserve_extent() | Now it returns tree block 29376512, which extent tree | shows its empty slot, but it's already hold by csum tree |- btrfs_init_new_buffer() |- btrfs_tree_lock() | Triggers WARN_ON(eb->lock_owner == current->pid) |- wait_event() Wait lock owner to release the lock, but it's locked by ourself, so it will deadlock [FIX] This patch will do the lock_owner and current->pid check at btrfs_init_new_buffer(). So above deadlock can be avoided. Since such problem can only happen in crafted image, we will still trigger kernel warning for later aborted transaction, but with a little more meaningful warning message. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405 Reported-by: Xu Wen <wen.xu@gatech.edu> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 33a1a7b ] The issue is found by a fuzzing test. If tty_find_polling_driver() recevies an incorrect input such as ',,' or '0b', the len becomes 0 and strncmp() always return 0. In this case, a null p->ops->poll_init() is called and it causes a kernel panic. Fix this by checking name length against zero in tty_find_polling_driver(). $echo ,, > /sys/module/kgdboc/parameters/kgdboc [ 20.804451] WARNING: CPU: 1 PID: 104 at drivers/tty/serial/serial_core.c:457 uart_get_baud_rate+0xe8/0x190 [ 20.804917] Modules linked in: [ 20.805317] CPU: 1 PID: 104 Comm: sh Not tainted 4.19.0-rc7ajb thesofproject#8 [ 20.805469] Hardware name: linux,dummy-virt (DT) [ 20.805732] pstate: 20000005 (nzCv daif -PAN -UAO) [ 20.805895] pc : uart_get_baud_rate+0xe8/0x190 [ 20.806042] lr : uart_get_baud_rate+0xc0/0x190 [ 20.806476] sp : ffffffc06acff940 [ 20.806676] x29: ffffffc06acff940 x28: 0000000000002580 [ 20.806977] x27: 0000000000009600 x26: 0000000000009600 [ 20.807231] x25: ffffffc06acffad0 x24: 00000000ffffeff0 [ 20.807576] x23: 0000000000000001 x22: 0000000000000000 [ 20.807807] x21: 0000000000000001 x20: 0000000000000000 [ 20.808049] x19: ffffffc06acffac8 x18: 0000000000000000 [ 20.808277] x17: 0000000000000000 x16: 0000000000000000 [ 20.808520] x15: ffffffffffffffff x14: ffffffff00000000 [ 20.808757] x13: ffffffffffffffff x12: 0000000000000001 [ 20.809011] x11: 0101010101010101 x10: ffffff880d59ff5f [ 20.809292] x9 : ffffff880d59ff5e x8 : ffffffc06acffaf3 [ 20.809549] x7 : 0000000000000000 x6 : ffffff880d59ff5f [ 20.809803] x5 : 0000000080008001 x4 : 0000000000000003 [ 20.810056] x3 : ffffff900853e6b4 x2 : dfffff9000000000 [ 20.810693] x1 : ffffffc06acffad0 x0 : 0000000000000cb0 [ 20.811005] Call trace: [ 20.811214] uart_get_baud_rate+0xe8/0x190 [ 20.811479] serial8250_do_set_termios+0xe0/0x6f4 [ 20.811719] serial8250_set_termios+0x48/0x54 [ 20.811928] uart_set_options+0x138/0x1bc [ 20.812129] uart_poll_init+0x114/0x16c [ 20.812330] tty_find_polling_driver+0x158/0x200 [ 20.812545] configure_kgdboc+0xbc/0x1bc [ 20.812745] param_set_kgdboc_var+0xb8/0x150 [ 20.812960] param_attr_store+0xbc/0x150 [ 20.813160] module_attr_store+0x40/0x58 [ 20.813364] sysfs_kf_write+0x8c/0xa8 [ 20.813563] kernfs_fop_write+0x154/0x290 [ 20.813764] vfs_write+0xf0/0x278 [ 20.813951] __arm64_sys_write+0x84/0xf4 [ 20.814400] el0_svc_common+0xf4/0x1dc [ 20.814616] el0_svc_handler+0x98/0xbc [ 20.814804] el0_svc+0x8/0xc [ 20.822005] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 20.826913] Mem abort info: [ 20.827103] ESR = 0x84000006 [ 20.827352] Exception class = IABT (current EL), IL = 16 bits [ 20.827655] SET = 0, FnV = 0 [ 20.827855] EA = 0, S1PTW = 0 [ 20.828135] user pgtable: 4k pages, 39-bit VAs, pgdp = (____ptrval____) [ 20.828484] [0000000000000000] pgd=00000000aadee003, pud=00000000aadee003, pmd=0000000000000000 [ 20.829195] Internal error: Oops: 84000006 [#1] SMP [ 20.829564] Modules linked in: [ 20.829890] CPU: 1 PID: 104 Comm: sh Tainted: G W 4.19.0-rc7ajb thesofproject#8 [ 20.830545] Hardware name: linux,dummy-virt (DT) [ 20.830829] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 20.831174] pc : (null) [ 20.831457] lr : serial8250_do_set_termios+0x358/0x6f4 [ 20.831727] sp : ffffffc06acff9b0 [ 20.831936] x29: ffffffc06acff9b0 x28: ffffff9008d7c000 [ 20.832267] x27: ffffff900969e16f x26: 0000000000000000 [ 20.832589] x25: ffffff900969dfb0 x24: 0000000000000000 [ 20.832906] x23: ffffffc06acffad0 x22: ffffff900969e160 [ 20.833232] x21: 0000000000000000 x20: ffffffc06acffac8 [ 20.833559] x19: ffffff900969df90 x18: 0000000000000000 [ 20.833878] x17: 0000000000000000 x16: 0000000000000000 [ 20.834491] x15: ffffffffffffffff x14: ffffffff00000000 [ 20.834821] x13: ffffffffffffffff x12: 0000000000000001 [ 20.835143] x11: 0101010101010101 x10: ffffff880d59ff5f [ 20.835467] x9 : ffffff880d59ff5e x8 : ffffffc06acffaf3 [ 20.835790] x7 : 0000000000000000 x6 : ffffff880d59ff5f [ 20.836111] x5 : c06419717c314100 x4 : 0000000000000007 [ 20.836419] x3 : 0000000000000000 x2 : 0000000000000000 [ 20.836732] x1 : 0000000000000001 x0 : ffffff900969df90 [ 20.837100] Process sh (pid: 104, stack limit = 0x(____ptrval____)) [ 20.837396] Call trace: [ 20.837566] (null) [ 20.837816] serial8250_set_termios+0x48/0x54 [ 20.838089] uart_set_options+0x138/0x1bc [ 20.838570] uart_poll_init+0x114/0x16c [ 20.838834] tty_find_polling_driver+0x158/0x200 [ 20.839119] configure_kgdboc+0xbc/0x1bc [ 20.839380] param_set_kgdboc_var+0xb8/0x150 [ 20.839658] param_attr_store+0xbc/0x150 [ 20.839920] module_attr_store+0x40/0x58 [ 20.840183] sysfs_kf_write+0x8c/0xa8 [ 20.840183] sysfs_kf_write+0x8c/0xa8 [ 20.840440] kernfs_fop_write+0x154/0x290 [ 20.840702] vfs_write+0xf0/0x278 [ 20.840942] __arm64_sys_write+0x84/0xf4 [ 20.841209] el0_svc_common+0xf4/0x1dc [ 20.841471] el0_svc_handler+0x98/0xbc [ 20.841713] el0_svc+0x8/0xc [ 20.842057] Code: bad PC value [ 20.842764] ---[ end trace a8835d7de79aaadf ]--- [ 20.843134] Kernel panic - not syncing: Fatal exception [ 20.843515] SMP: stopping secondary CPUs [ 20.844289] Kernel Offset: disabled [ 20.844634] CPU features: 0x0,21806002 [ 20.844857] Memory Limit: none [ 20.845172] ---[ end Kernel panic - not syncing: Fatal exception ]--- Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcd5e74 upstream. When running generic/475, we may get the following warning in dmesg: [ 6902.102154] WARNING: CPU: 3 PID: 18013 at fs/btrfs/extent-tree.c:9776 btrfs_free_block_groups+0x2af/0x3b0 [btrfs] [ 6902.109160] CPU: 3 PID: 18013 Comm: umount Tainted: G W O 4.19.0-rc8+ thesofproject#8 [ 6902.110971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 6902.112857] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs] [ 6902.118921] RSP: 0018:ffffc9000459bdb0 EFLAGS: 00010286 [ 6902.120315] RAX: ffff880175050bb0 RBX: ffff8801124a8000 RCX: 0000000000170007 [ 6902.121969] RDX: 0000000000000002 RSI: 0000000000170007 RDI: ffffffff8125fb74 [ 6902.123716] RBP: ffff880175055d10 R08: 0000000000000000 R09: 0000000000000000 [ 6902.125417] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175055d88 [ 6902.127129] R13: ffff880175050bb0 R14: 0000000000000000 R15: dead000000000100 [ 6902.129060] FS: 00007f4507223780(0000) GS:ffff88017ba00000(0000) knlGS:0000000000000000 [ 6902.130996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6902.132558] CR2: 00005623599cac78 CR3: 000000014b700001 CR4: 00000000003606e0 [ 6902.134270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6902.135981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6902.137836] Call Trace: [ 6902.138939] close_ctree+0x171/0x330 [btrfs] [ 6902.140181] ? kthread_stop+0x146/0x1f0 [ 6902.141277] generic_shutdown_super+0x6c/0x100 [ 6902.142517] kill_anon_super+0x14/0x30 [ 6902.143554] btrfs_kill_super+0x13/0x100 [btrfs] [ 6902.144790] deactivate_locked_super+0x2f/0x70 [ 6902.146014] cleanup_mnt+0x3b/0x70 [ 6902.147020] task_work_run+0x9e/0xd0 [ 6902.148036] do_syscall_64+0x470/0x600 [ 6902.149142] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 6902.150375] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 6902.151640] RIP: 0033:0x7f45077a6a7b [ 6902.157324] RSP: 002b:00007ffd589f3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 6902.159187] RAX: 0000000000000000 RBX: 000055e8eec732b0 RCX: 00007f45077a6a7b [ 6902.160834] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e8eec73490 [ 6902.162526] RBP: 0000000000000000 R08: 000055e8eec734b0 R09: 00007ffd589f26c0 [ 6902.164141] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e8eec73490 [ 6902.165815] R13: 00007f4507ac61a4 R14: 0000000000000000 R15: 00007ffd589f40d8 [ 6902.167553] irq event stamp: 0 [ 6902.168998] hardirqs last enabled at (0): [<0000000000000000>] (null) [ 6902.170731] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00 [ 6902.172773] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00 [ 6902.174671] softirqs last disabled at (0): [<0000000000000000>] (null) [ 6902.176407] ---[ end trace 463138c2986b275c ]--- [ 6902.177636] BTRFS info (device dm-3): space_info 4 has 273465344 free, is not full [ 6902.179453] BTRFS info (device dm-3): space_info total=276824064, used=4685824, pinned=18446744073708158976, reserved=0, may_use=0, readonly=65536 In the above line there's "pinned=18446744073708158976" which is an unsigned u64 value of -1392640, an obvious underflow. When transaction_kthread is running cleanup_transaction(), another fsstress is running btrfs_commit_transaction(). The btrfs_finish_extent_commit() may get the same range as btrfs_destroy_pinned_extent() got, which causes the pinned underflow. Fixes: d4b450c ("Btrfs: fix race between transaction commit and empty block group removal") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was observed that a process blocked indefintely in __fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP to be cleared via fscache_wait_for_deferred_lookup(). At this time, ->backing_objects was empty, which would normaly prevent __fscache_read_or_alloc_page() from getting to the point of waiting. This implies that ->backing_objects was cleared *after* __fscache_read_or_alloc_page was was entered. When an object is "killed" and then "dropped", FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is ->backing_objects cleared. This leaves a window where something else can set FSCACHE_COOKIE_LOOKING_UP and __fscache_read_or_alloc_page() can start waiting, before ->backing_objects is cleared There is some uncertainty in this analysis, but it seems to be fit the observations. Adding the wake in this patch will be handled correctly by __fscache_read_or_alloc_page(), as it checks if ->backing_objects is empty again, after waiting. Customer which reported the hang, also report that the hang cannot be reproduced with this fix. The backtrace for the blocked process looked like: PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh" #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache] #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache] #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs] #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs] #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs] #10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756 #11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa #12 [ffff881ff43eff18] sys_read at ffffffff811fda62 #13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David Howells <dhowells@redhat.com>
Function graph tracing recurses into itself when stackleak is enabled, causing the ftrace graph selftest to run for up to 90 seconds and trigger the softlockup watchdog. Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 200 mcount_get_lr_addr x0 // pointer to function's saved lr (gdb) bt \#0 ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200 \#1 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \#2 0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106 \#3 0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507 \#4 0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286 \#5 ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321 \#6 0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153 \#7 0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876 \#8 0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650 \#9 0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205 Rework so we mark stackleak_track_stack as notrace Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Kees Cook <keescook@chromium.org>
The *_frag_reasm() functions are susceptible to miscalculating the byte count of packet fragments in case the truesize of a head buffer changes. The truesize member may be changed by the call to skb_unclone(), leaving the fragment memory limit counter unbalanced even if all fragments are processed. This miscalculation goes unnoticed as long as the network namespace which holds the counter is not destroyed. Should an attempt be made to destroy a network namespace that holds an unbalanced fragment memory limit counter the cleanup of the namespace never finishes. The thread handling the cleanup gets stuck in inet_frags_exit_net() waiting for the percpu counter to reach zero. The thread is usually in running state with a stacktrace similar to: PID: 1073 TASK: ffff880626711440 CPU: 1 COMMAND: "kworker/u48:4" #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480 #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856 #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0 #10 [ffff880621563e38] process_one_work at ffffffff81096f14 It is not possible to create new network namespaces, and processes that call unshare() end up being stuck in uninterruptible sleep state waiting to acquire the net_mutex. The bug was observed in the IPv6 netfilter code by Per Sundstrom. I thank him for his analysis of the problem. The parts of this patch that apply to IPv4 and IPv6 fragment reassembly are preemptive measures. Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Reported-by: Per Sundstrom <per.sundstrom@redqube.se> Acked-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit b2ca374 upstream. syzbot hit the following crash on upstream commit 87ef120 (Wed Apr 18 19:48:17 2018 +0000) Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744 syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value BUG: unable to handle kernel paging request at ffffed006b2a50c0 PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ thesofproject#8 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline] RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06 RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8 R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80 R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600 FS: 0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803 mount_bdev+0x30c/0x3e0 fs/super.c:1165 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1268 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2517 [inline] do_mount+0x564/0x3070 fs/namespace.c:2847 ksys_mount+0x12d/0x140 fs/namespace.c:3063 __do_sys_mount fs/namespace.c:3077 [inline] __se_sys_mount fs/namespace.c:3074 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443d6a RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000 RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0 RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0 CR2: ffffed006b2a50c0 ---[ end trace a2034989e196ff17 ]--- Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
… allocation commit a1ad1cc upstream. After memory allocation failure vc_allocate() doesn't clean up data which has been initialized in visual_init(). In case of fbcon this leads to divide-by-0 in fbcon_init() on next open of the same tty. memory allocation in vc_allocate() may fail here: 1097: vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_KERNEL); on next open() fbcon_init() skips vc_font.data initialization: 1088: if (!p->fontdata) { division by zero in fbcon_init() happens here: 1149: new_cols /= vc->vc_font.width; Additional check is needed in fbcon_deinit() to prevent usage of uninitialized vc_screenbuf: 1251: if (vc->vc_hi_font_mask && vc->vc_screenbuf) 1252: set_vc_hi_font(vc, false); Crash: thesofproject#6 [ffffc90001eafa60] divide_error at ffffffff81a00be4 [exception RIP: fbcon_init+463] RIP: ffffffff814b860f RSP: ffffc90001eafb18 RFLAGS: 00010246 ... thesofproject#7 [ffffc90001eafb60] visual_init at ffffffff8154c36e thesofproject#8 [ffffc90001eafb80] vc_allocate at ffffffff8154f53c thesofproject#9 [ffffc90001eafbc8] con_install at ffffffff8154f624 ... Signed-off-by: Grzegorz Halat <ghalat@redhat.com> Reviewed-by: Oleksandr Natalenko <oleksandr@redhat.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…git/vgupta/arc Pull ARC updates from Vineet Gupta: - long due rewrite of do_page_fault - refactoring of entry/exit code to utilize the double load/store instructions - hsdk platform updates * tag 'arc-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [plat-hsdk]: Enable AXI DW DMAC in defconfig ARC: [plat-hsdk]: enable DW SPI controller ARC: hide unused function unw_hdr_alloc ARC: [haps] Add Virtio support ARCv2: entry: simplify return to Delay Slot via interrupt ARC: entry: EV_Trap expects r10 (vs. r9) to have exception cause ARCv2: entry: rewrite to enable use of double load/stores LDD/STD ARCv2: entry: avoid a branch ARCv2: entry: push out the Z flag unclobber from common EXCEPTION_PROLOGUE ARCv2: entry: comments about hardware auto-save on taken interrupts ARC: mm: do_page_fault refactor thesofproject#8: release mmap_sem sooner ARC: mm: do_page_fault refactor thesofproject#7: fold the various error handling ARC: mm: do_page_fault refactor thesofproject#6: error handlers to use same pattern ARC: mm: do_page_fault refactor thesofproject#5: scoot no_context to end ARC: mm: do_page_fault refactor thesofproject#4: consolidate retry related logic ARC: mm: do_page_fault refactor #3: tidyup vma access permission code ARC: mm: do_page_fault refactor #2: remove short lived variable ARC: mm: do_page_fault refactor #1: remove label @good_area
A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revert the commit bd293d0. The proper fix has been made available with commit d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread"). Note that the fix offered by commit bd293d0 doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: bd293d0 ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
It can happen that a commit message refers to an invalid commit id,
because the referenced hash changed following a rebase, or simply by
mistake. Add a check in checkpatch.pl which checks that an hash
referenced by a Fixes tag, or just cited in the commit message, is a valid
commit id.
$ scripts/checkpatch.pl <<'EOF'
Subject: [PATCH] test commit
Sample test commit to test checkpatch.pl
Commit 1da177e4c3f4 ("Linux-2.6.12-rc2") really exists,
commit 0bba044 ("tree") is valid but not a commit,
while commit b4cc0b1c0cca ("unknown") is invalid.
Fixes: f0cacc14cade ("unknown")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
EOF
WARNING: Unknown commit id '0bba044c4ce7', maybe rebased or not pulled?
thesofproject#8:
commit 0bba044 ("tree") is valid but not a commit,
WARNING: Unknown commit id 'b4cc0b1c0cca', maybe rebased or not pulled?
thesofproject#9:
while commit b4cc0b1c0cca ("unknown") is invalid.
WARNING: Unknown commit id 'f0cacc14cade', maybe rebased or not pulled?
thesofproject#11:
Fixes: f0cacc14cade ("unknown")
total: 0 errors, 3 warnings, 4 lines checked
Link: http://lkml.kernel.org/r/20190711001640.13398-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We would not be transmitting using the correct SYSTEMPORT transmit queue during ndo_select_queue() which looks up the internal TX ring map because while establishing the mapping we would be off by 4, so for instance, when we populate switch port mappings we would be doing: switch port 0, queue 0 -> ring index #0 switch port 0, queue 1 -> ring index #1 ... switch port 0, queue 3 -> ring index #3 switch port 1, queue 0 -> ring index #8 (4 + 4 * 1) ... instead of using ring index #4. This would cause our ndo_select_queue() to use the fallback queue mechanism which would pick up an incorrect ring for that switch port. Fix this by using the correct switch queue number instead of SYSTEMPORT queue number. Fixes: 25c4407 ("net: systemport: Simplify queue mapping logic") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: b293f02 ("Btrfs: Add writepages support") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
It is not that hard to trigger lockdep splats by calling printk from
under zone->lock. Most of them are false positives caused by lock
chains introduced early in the boot process and they do not cause any
real problems (although most of the early boot lock dependencies could
happen after boot as well). There are some console drivers which do
allocate from the printk context as well and those should be fixed. In
any case, false positives are not that trivial to workaround and it is
far from optimal to lose lockdep functionality for something that is a
non-issue.
So change has_unmovable_pages() so that it no longer calls dump_page()
itself - instead it returns a "struct page *" of the unmovable page back
to the caller so that in the case of a has_unmovable_pages() failure,
the caller can call dump_page() after releasing zone->lock. Also, make
dump_page() is able to report a CMA page as well, so the reason string
from has_unmovable_pages() can be removed.
Even though has_unmovable_pages doesn't hold any reference to the
returned page this should be reasonably safe for the purpose of
reporting the page (dump_page) because it cannot be hotremoved in the
context of memory unplug. The state of the page might change but that
is the case even with the existing code as zone->lock only plays role
for free pages.
While at it, remove a similar but unnecessary debug-only printk() as
well. A sample of one of those lockdep splats is,
WARNING: possible circular locking dependency detected
------------------------------------------------------
test.sh/8653 is trying to acquire lock:
ffffffff865a4460 (console_owner){-.-.}, at:
console_unlock+0x207/0x750
but task is already holding lock:
ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&(&zone->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock+0x2f/0x40
rmqueue_bulk.constprop.21+0xb6/0x1160
get_page_from_freelist+0x898/0x22c0
__alloc_pages_nodemask+0x2f3/0x1cd0
alloc_pages_current+0x9c/0x110
allocate_slab+0x4c6/0x19c0
new_slab+0x46/0x70
___slab_alloc+0x58b/0x960
__slab_alloc+0x43/0x70
__kmalloc+0x3ad/0x4b0
__tty_buffer_request_room+0x100/0x250
tty_insert_flip_string_fixed_flag+0x67/0x110
pty_write+0xa2/0xf0
n_tty_write+0x36b/0x7b0
tty_write+0x284/0x4c0
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
redirected_tty_write+0x6a/0xc0
do_iter_write+0x248/0x2a0
vfs_writev+0x106/0x1e0
do_writev+0xd4/0x180
__x64_sys_writev+0x45/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #2 (&(&port->lock)->rlock){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
tty_port_tty_get+0x20/0x60
tty_port_default_wakeup+0xf/0x30
tty_port_tty_wakeup+0x39/0x40
uart_write_wakeup+0x2a/0x40
serial8250_tx_chars+0x22e/0x440
serial8250_handle_irq.part.8+0x14a/0x170
serial8250_default_handle_irq+0x5c/0x90
serial8250_interrupt+0xa6/0x130
__handle_irq_event_percpu+0x78/0x4f0
handle_irq_event_percpu+0x70/0x100
handle_irq_event+0x5a/0x8b
handle_edge_irq+0x117/0x370
do_IRQ+0x9e/0x1e0
ret_from_intr+0x0/0x2a
cpuidle_enter_state+0x156/0x8e0
cpuidle_enter+0x41/0x70
call_cpuidle+0x5e/0x90
do_idle+0x333/0x370
cpu_startup_entry+0x1d/0x1f
start_secondary+0x290/0x330
secondary_startup_64+0xb6/0xc0
-> #1 (&port_lock_key){-.-.}:
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
_raw_spin_lock_irqsave+0x3a/0x50
serial8250_console_write+0x3e4/0x450
univ8250_console_write+0x4b/0x60
console_unlock+0x501/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
-> #0 (console_owner){-.-.}:
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Chain exists of:
console_owner --> &(&port->lock)->rlock --> &(&zone->lock)->rlock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&zone->lock)->rlock);
lock(&(&port->lock)->rlock);
lock(&(&zone->lock)->rlock);
lock(console_owner);
*** DEADLOCK ***
9 locks held by test.sh/8653:
#0: ffff88839ba7d408 (sb_writers#4){.+.+}, at:
vfs_write+0x25f/0x290
#1: ffff888277618880 (&of->mutex){+.+.}, at:
kernfs_fop_write+0x128/0x240
#2: ffff8898131fc218 (kn->count#115){.+.+}, at:
kernfs_fop_write+0x138/0x240
#3: ffffffff86962a80 (device_hotplug_lock){+.+.}, at:
lock_device_hotplug_sysfs+0x16/0x50
#4: ffff8884374f4990 (&dev->mutex){....}, at:
device_offline+0x70/0x110
#5: ffffffff86515250 (cpu_hotplug_lock.rw_sem){++++}, at:
__offline_pages+0xbf/0xa10
#6: ffffffff867405f0 (mem_hotplug_lock.rw_sem){++++}, at:
percpu_down_write+0x87/0x2f0
#7: ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
__offline_isolated_pages+0x179/0x3e0
#8: ffffffff865a4920 (console_lock){+.+.}, at:
vprintk_emit+0x100/0x340
stack backtrace:
Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 Gen10,
BIOS U34 05/21/2019
Call Trace:
dump_stack+0x86/0xca
print_circular_bug.cold.31+0x243/0x26e
check_noncircular+0x29e/0x2e0
check_prev_add+0x107/0xea0
validate_chain+0x8fc/0x1200
__lock_acquire+0x5b3/0xb40
lock_acquire+0x126/0x280
console_unlock+0x269/0x750
vprintk_emit+0x10d/0x340
vprintk_default+0x1f/0x30
vprintk_func+0x44/0xd4
printk+0x9f/0xc5
__offline_isolated_pages.cold.52+0x2f/0x30a
offline_isolated_pages_cb+0x17/0x30
walk_system_ram_range+0xda/0x160
__offline_pages+0x79c/0xa10
offline_pages+0x11/0x20
memory_subsys_offline+0x7e/0xc0
device_offline+0xd5/0x110
state_store+0xc6/0xe0
dev_attr_store+0x3f/0x60
sysfs_kf_write+0x89/0xb0
kernfs_fop_write+0x188/0x240
__vfs_write+0x50/0xa0
vfs_write+0x105/0x290
ksys_write+0xc6/0x160
__x64_sys_write+0x43/0x50
do_syscall_64+0xcc/0x76c
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Link: http://lkml.kernel.org/r/20200117181200.20299-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: thesofproject#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 thesofproject#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 thesofproject#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd thesofproject#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 thesofproject#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 thesofproject#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a thesofproject#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 thesofproject#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 thesofproject#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf thesofproject#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 thesofproject#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 thesofproject#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d thesofproject#17 [ffffc9002219fc90] schedule at ffffffff81a3e219 thesofproject#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 thesofproject#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 thesofproject#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc thesofproject#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c thesofproject#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 thesofproject#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: --- a/tools/trace.py +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> thesofproject#1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ #823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: 31603d4 ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix tcon use-after-free and NULL ptr deref.
Customer system crashes with the following kernel log:
[462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
[462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347107] CIFS VFS: Close unmatched open
[462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
...
[exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
#6 [...] smb2_cancelled_close_fid at ... [cifs]
#7 [...] process_one_work at ...
#8 [...] worker_thread at ...
#9 [...] kthread at ...
The most likely explanation we have is:
* When we put the last reference of a tcon (refcount=0), we close the
cached share root handle.
* If closing a handle is interrupted, SMB2_close() will
queue a SMB2_close() in a work thread.
* The queued object keeps a tcon ref so we bump the tcon
refcount, jumping from 0 to 1.
* We reach the end of cifs_put_tcon(), we free the tcon object despite
it now having a refcount of 1.
* The queued work now runs, but the tcon, ses & server was freed in
the meantime resulting in a crash.
THREAD 1
========
cifs_put_tcon => tcon refcount reach 0
SMB2_tdis
close_shroot_lease
close_shroot_lease_locked => if cached root has lease && refcount = 0
smb2_close_cached_fid => if cached root valid
SMB2_close => retry close in a thread if interrupted
smb2_handle_cancelled_close
__smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
queue_work(cifsiod_wq, &cancelled->work) => queue work
tconInfoFree(tcon); ==> freed!
cifs_put_smb_ses(ses); ==> freed!
THREAD 2 (workqueue)
========
smb2_cancelled_close_fid
SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
*CRASH*
Fixes: d919131 ("CIFS: Close cached root handle only if it has a lease")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: thesofproject#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 thesofproject#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 thesofproject#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd thesofproject#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 thesofproject#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 thesofproject#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a thesofproject#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 thesofproject#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 thesofproject#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf thesofproject#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 thesofproject#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 thesofproject#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d thesofproject#17 [ffffc9002219fc90] schedule at ffffffff81a3e219 thesofproject#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 thesofproject#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 thesofproject#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc thesofproject#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c thesofproject#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 thesofproject#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> #1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbef280 upstream. If KVM wasn't used at all before we crash the cleanup procedure fails with BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 23215067 P4D 23215067 PUD 23217067 PMD 0 Oops: 0000 [thesofproject#8] SMP PTI CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G D 5.6.0-rc2+ thesofproject#823 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel] The root cause is that loaded_vmcss_on_cpu list is not yet initialized, we initialize it in hardware_enable() but this only happens when we start a VM. Previously, we used to have a bitmap with enabled CPUs and that was preventing [masking] the issue. Initialized loaded_vmcss_on_cpu list earlier, right before we assign crash_vmclear_loaded_vmcss pointer. blocked_vcpu_on_cpu list and blocked_vcpu_on_cpu_lock are moved altogether for consistency. Fixes: 31603d4 ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e #5 [f418de68] sctp_do_sm at f8db8c77 [sctp] thesofproject#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] thesofproject#7 [f418df48] inet_shutdown at c080baf9 thesofproject#8 [f418df5c] sys_shutdown at c079eedf thesofproject#9 [f418df7] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: jere.leppanen@nokia.com CC: marcelo.leitner@gmail.com CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 20a785a ] This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c thesofproject#1 [f418dd7c] oops_end at c0863e02 thesofproject#2 [f418dd90] do_invalid_op at c040aaca thesofproject#3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 thesofproject#4 [f418de5c] add_timer at c046fa5e thesofproject#5 [f418de68] sctp_do_sm at f8db8c77 [sctp] thesofproject#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] thesofproject#7 [f418df48] inet_shutdown at c080baf9 thesofproject#8 [f418df5c] sys_shutdown at c079eedf thesofproject#9 [f418df7] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: jere.leppanen@nokia.com CC: marcelo.leitner@gmail.com CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c85f4ab ] Fix use after free when user user space request uobject concurrently for the same object, within the RCU grace period. In that case, remove_handle_idr_uobject() is called twice and we will have an extra put on the uobject which cause use after free. Fix it by leaving the uobject write locked after it was removed from the idr. Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of UVERBS_LOOKUP_WRITE will do the work. refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 thesofproject#8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x94/0xce panic+0x234/0x56f __warn+0x1cc/0x1e1 report_bug+0x200/0x310 fixup_bug.part.11+0x32/0x80 do_error_trap+0xd3/0x100 do_invalid_op+0x31/0x40 invalid_op+0x1e/0x30 RIP: 0010:refcount_warn_saturate+0xfe/0x1a0 Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3 R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08 uverbs_uobject_put+0xfd/0x140 __uobj_perform_destroy+0x3d/0x60 ib_uverbs_close_xrcd+0x148/0x170 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49 RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 7452a3c ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate") Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/5
1 lock held by swapper/5/0:
#0: ffff80001002bd90 (samples/ftrace/sample-trace-array.c:38){+.-.}-{0:0}, at: call_timer_fn+0x8/0x3e0
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.7.0+ #8
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x20/0x30
dump_stack+0xe4/0x150
___might_sleep+0x160/0x200
__might_sleep+0x58/0x90
__mutex_lock+0x64/0x948
mutex_lock_nested+0x3c/0x58
__ftrace_set_clr_event+0x44/0x88
trace_array_set_clr_event+0x24/0x38
mytimer_handler+0x34/0x40 [sample_trace_array]
mutex_lock() will be called in interrupt context, using workqueue to fix it.
Link: https://lkml.kernel.org/r/20200610011244.2209486-1-wangkefeng.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 89ed424 ("tracing: Sample module to demonstrate kernel access to Ftrace instances.")
Reviewed-by: Divya Indi <divya.indi@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit 7e9f5e6 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc unwinder to unwind out of signal handlers using the .eh_frame information populated by our .cfi directives. In conjunction with a4eb355 ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has been shown to cause segmentation faults originating from within the unwinder during thread cancellation: | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault. | 0x0000000000435e24 in uw_frame_state_for () | (gdb) bt | #0 0x0000000000435e24 in uw_frame_state_for () | #1 0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 () | #2 0x00000000004374d8 in _Unwind_ForcedUnwind () | #3 0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121 | #4 0x0000000000429808 in __do_cancel () at ./pthreadP.h:304 | #5 sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200 | thesofproject#6 sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165 | thesofproject#7 <signal handler called> | thesofproject#8 futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 After considerable bashing of heads, it appears that our CFI directives for unwinding out of the sigreturn trampoline are only processed by libgcc when both a .eh_frame_hdr section is present *and* the mysterious NOP is covered by an entry in .eh_frame. With both of these now in place, it has highlighted that our CFI directives are not comprehensive enough to restore the stack pointer of the interrupted context. This results in libgcc falling back to an arm64-specific unwinder after computing a bogus PC value from the unwind tables. The unwinder promptly dereferences this bogus address in an attempt to see if the pointed-to instruction sequence looks like the sigreturn trampoline. Restore the old unwind behaviour, which relied solely on heuristics in the unwinder, by removing the .eh_frame_hdr section from the vDSO and commenting out the insufficient CFI directives for now. Add comments to explain the current, miserable state of affairs. Cc: Tamas Zsoldos <tamas.zsoldos@arm.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Acked-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
I/O requests may be held in scheduler queue because of resource contention. The starvation scenario was handled properly in the regular completion path but we failed to account for it during I/O submission. This lead to the hang captured below. Make sure we run the queue when resource contention is encountered in the submission path. [ 39.054963] scsi 13:0:0:0: rejecting I/O to dead device [ 39.058700] scsi 13:0:0:0: rejecting I/O to dead device [ 39.087855] sd 13:0:0:1: [sdd] Synchronizing SCSI cache [ 39.088909] scsi 13:0:0:1: rejecting I/O to dead device [ 39.095351] scsi 13:0:0:1: rejecting I/O to dead device [ 39.096962] scsi 13:0:0:1: rejecting I/O to dead device [ 247.021859] INFO: task scsi-stress-rem:813 blocked for more than 122 seconds. [ 247.023258] Not tainted 5.8.0-rc2 thesofproject#8 [ 247.024069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 247.025331] scsi-stress-rem D 0 813 802 0x00004000 [ 247.025334] Call Trace: [ 247.025354] __schedule+0x504/0x55f [ 247.027987] schedule+0x72/0xa8 [ 247.027991] blk_mq_freeze_queue_wait+0x63/0x8c [ 247.027994] ? do_wait_intr_irq+0x7a/0x7a [ 247.027996] blk_cleanup_queue+0x4b/0xc9 [ 247.028000] __scsi_remove_device+0xf6/0x14e [ 247.028002] scsi_remove_device+0x21/0x2b [ 247.029037] sdev_store_delete+0x58/0x7c [ 247.029041] kernfs_fop_write+0x10d/0x14f [ 247.031281] vfs_write+0xa2/0xdf [ 247.032670] ksys_write+0x6b/0xb3 [ 247.032673] do_syscall_64+0x56/0x82 [ 247.034053] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 247.034059] RIP: 0033:0x7f69f39e9008 [ 247.036330] Code: Bad RIP value. [ 247.036331] RSP: 002b:00007ffdd8116498 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 247.037613] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f69f39e9008 [ 247.039714] RDX: 0000000000000002 RSI: 000055cde92a0ab0 RDI: 0000000000000001 [ 247.039715] RBP: 000055cde92a0ab0 R08: 000000000000000a R09: 00007f69f3a79e80 [ 247.039716] R10: 000000000000000a R11: 0000000000000246 R12: 00007f69f3abb780 [ 247.039717] R13: 0000000000000002 R14: 00007f69f3ab6740 R15: 0000000000000002 Link: https://lore.kernel.org/r/20200720025435.812030-1-ming.lei@redhat.com Cc: linux-block@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch addresses an irq free warning and null pointer dereference error problem when nvme devices got timeout error during initialization. This problem happens when nvme_timeout() function is called while nvme_reset_work() is still in execution. This patch fixed the problem by setting flag of the problematic request to NVME_REQ_CANCELLED before calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns an error code and let nvme_submit_sync_cmd() fail gracefully. The following is console output. [ 62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller [ 62.488796] nvme nvme0: could not set timestamp (881) [ 62.494888] ------------[ cut here ]------------ [ 62.495142] Trying to free already-free IRQ 11 [ 62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370 [ 62.495742] Modules linked in: [ 62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8 [ 62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4 [ 62.496772] Workqueue: nvme-reset-wq nvme_reset_work [ 62.497019] RIP: 0010:free_irq+0x1f7/0x370 [ 62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70 [ 62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086 [ 62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000 [ 62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c [ 62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163 [ 62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00 [ 62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000 [ 62.500140] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 62.500534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 62.501864] Call Trace: [ 62.501993] pci_free_irq+0x13/0x20 [ 62.502167] nvme_reset_work+0x5d0/0x12a0 [ 62.502369] ? update_load_avg+0x59/0x580 [ 62.502569] ? ttwu_queue_wakelist+0xa8/0xc0 [ 62.502780] ? try_to_wake_up+0x1a2/0x450 [ 62.502979] process_one_work+0x1d2/0x390 [ 62.503179] worker_thread+0x45/0x3b0 [ 62.503361] ? process_one_work+0x390/0x390 [ 62.503568] kthread+0xf9/0x130 [ 62.503726] ? kthread_park+0x80/0x80 [ 62.503911] ret_from_fork+0x22/0x30 [ 62.504090] ---[ end trace de9ed4a70f8d71e2 ]--- [ 123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller [ 123.914670] nvme nvme0: 1/0/0 default/read/poll queues [ 123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 123.917469] #PF: supervisor write access in kernel mode [ 123.917725] #PF: error_code(0x0002) - not-present page [ 123.917976] PGD 0 P4D 0 [ 123.918109] Oops: 0002 [#1] SMP PTI [ 123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G W 5.8.0+ #8 [ 123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4 [ 123.919219] Workqueue: nvme-reset-wq nvme_reset_work [ 123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80 [ 123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4 [ 123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286 [ 123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000 [ 123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000 [ 123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000 [ 123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000 [ 123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000 [ 123.922641] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 123.923032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 123.924353] Call Trace: [ 123.924479] blk_mq_alloc_tag_set+0x137/0x2a0 [ 123.924694] nvme_reset_work+0xed6/0x12a0 [ 123.924898] process_one_work+0x1d2/0x390 [ 123.925099] worker_thread+0x45/0x3b0 [ 123.925280] ? process_one_work+0x390/0x390 [ 123.925486] kthread+0xf9/0x130 [ 123.925642] ? kthread_park+0x80/0x80 [ 123.925825] ret_from_fork+0x22/0x30 [ 123.926004] Modules linked in: [ 123.926158] CR2: 0000000000000000 [ 123.926322] ---[ end trace de9ed4a70f8d71e3 ]--- [ 123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80 [ 123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4 [ 123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286 [ 123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000 [ 123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000 [ 123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000 [ 123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000 [ 123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000 [ 123.929715] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 123.930106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082 #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159 #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189 #12 test_ipc () at tests/parse-metric.c:208 ..... ..... omitted many more lines This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump. It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes which was missing. The marco map_for_each_event() loops over that array tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue. Output after: [root@t35lp46 perf]# ./perf test 67 67: Parse and process metrics : Ok [root@t35lp46 perf]# Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag. ================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The aliases were never released causing the following leaks:
Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
#0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
thesofproject#1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
thesofproject#2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
thesofproject#3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
thesofproject#4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
thesofproject#5 0x56332c76a09b in run_test tests/builtin-test.c:410
thesofproject#6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
thesofproject#7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
thesofproject#8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
thesofproject#9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
thesofproject#10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
thesofproject#11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
thesofproject#12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
thesofproject#13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The evsel->unit borrows a pointer of pmu event or alias instead of
owns a string. But tool event (duration_time) passes a result of
strdup() caused a leak.
It was found by ASAN during metric test:
Direct leak of 210 byte(s) in 70 object(s) allocated from:
#0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
thesofproject#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
thesofproject#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
thesofproject#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
thesofproject#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
thesofproject#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
thesofproject#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
thesofproject#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
thesofproject#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
thesofproject#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
thesofproject#10 0x559fbbc0109b in run_test tests/builtin-test.c:410
thesofproject#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
thesofproject#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
thesofproject#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
thesofproject#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
thesofproject#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
thesofproject#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
thesofproject#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
thesofproject#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test_generic_metric() missed to release entries in the pctx. Asan
reported following leak (and more):
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
thesofproject#1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
thesofproject#2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
thesofproject#3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
thesofproject#4 0x55f7e7341667 in expr__add_ref util/expr.c:120
thesofproject#5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
thesofproject#6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
thesofproject#7 0x55f7e712390b in compute_single tests/parse-metric.c:128
thesofproject#8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
thesofproject#9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
thesofproject#10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
thesofproject#11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
thesofproject#12 0x55f7e70be09b in run_test tests/builtin-test.c:410
thesofproject#13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
thesofproject#14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
thesofproject#15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
thesofproject#16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
thesofproject#17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
thesofproject#18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
thesofproject#19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
thesofproject#20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The metricgroup__add_metric() can find multiple match for a metric group
and it's possible to fail. Also it can fail in the middle like in
resolve_metric() even for single metric.
In those cases, the intermediate list and ids will be leaked like:
Direct leak of 3 byte(s) in 1 object(s) allocated from:
#0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
thesofproject#1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
thesofproject#2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
thesofproject#3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
thesofproject#4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
thesofproject#5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
thesofproject#6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
thesofproject#7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
thesofproject#8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
thesofproject#9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
thesofproject#10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
thesofproject#11 0x55f7e70be09b in run_test tests/builtin-test.c:410
thesofproject#12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
thesofproject#13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
thesofproject#14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
thesofproject#15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
thesofproject#16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
thesofproject#17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
thesofproject#18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
thesofproject#19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following leaks were detected by ASAN:
Indirect leak of 360 byte(s) in 9 object(s) allocated from:
#0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
thesofproject#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
thesofproject#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
thesofproject#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
thesofproject#4 0x560578e07045 in test__pmu tests/pmu.c:155
thesofproject#5 0x560578de109b in run_test tests/builtin-test.c:410
thesofproject#6 0x560578de109b in test_and_print tests/builtin-test.c:440
thesofproject#7 0x560578de401a in __cmd_test tests/builtin-test.c:661
thesofproject#8 0x560578de401a in cmd_test tests/builtin-test.c:807
thesofproject#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
thesofproject#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
thesofproject#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
thesofproject#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
thesofproject#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308
Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This set of patches introduces the concept of a virtual FE dai link.
Previously, when trying to enable the hostless pipeline, it was
discovered that the codec does not get enabled and the tone remained inaudible.
Therefore, in such cases, a virtual FE dai link will be used to establish
a connection to the BE dai and enable the codec when the pipeline is triggered.
The virtual FE will be created when a siggen widget is loaded and freed when
the siggen widget is unloaded. It will be used for enabling the BE dai in the
kcontrol IO handler for the kcontrol attached to the siggen by calling
the soc_spcm_runtime_update() method.