Skip to content

[DNM] Random attempts to fix controller reset issue#4894

Closed
plbossart wants to merge 3 commits intothesofproject:topic/sof-devfrom
plbossart:test/better-pio-use
Closed

[DNM] Random attempts to fix controller reset issue#4894
plbossart wants to merge 3 commits intothesofproject:topic/sof-devfrom
plbossart:test/better-pio-use

Conversation

@plbossart
Copy link
Member

@plbossart plbossart commented Apr 1, 2024

suggested hacks based on code review only

I don't see any improvement on the hardware but the code does look suspicious.

with the 3rd commit I can't reproduce the errors reported in issue #4889 root-caused to the use of PIO commands (see PR #4892)

cc:

…commands mode

When we use the PIO mode, we should not program anything related to CORB.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
This doesn't seem to have any effect on the controller reset issue,
but I wonder why we disable interrupts and then stop the
commands. This sounds backwards, the commands first need to complete
and then the interrupts should be disabled?

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
For some reason, the programming sequences in the SOF driver do not
include the required clear of the WAKE_STS bits when the controller is
not in reset.

Adding this sequence avoids a regression on LunarLake when using PIO
commands. The correlation or causation is not clear, but there was no
reason in the first place to deviate from the recommended programming
sequence.

Closes: thesofproject#4889
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@plbossart
Copy link
Member Author

@bardliao can you take a look? The coincidence with the WAKE_STS also used for SoundWire is troubling.

@ujfalusi comments/test welcome. I just randomly looked at potential issues and found this 3rd commit by comparison with the existing programming sequences in sound/hda.

@plbossart
Copy link
Member Author

FWIW the 3rd commit partially reverts 6ec3295 ("ASoC: SOF: Intel: hda: remove duplicated clear WAKESTS")

I have no idea why this was removed.

It could also be that there's an interference from ae5ff22 ("ASoC: SOF: Intel: hda-ctrl: only clear WAKESTS for HDaudio codecs"), where in the end we didn't clear all the WAKE_STS fields.

Wow, my brain is officially fried haha.

@plbossart
Copy link
Member Author

Wait, this gets even better.

This WAKE_STS clear was improved in b37a151 ("ALSA: hda: avoid write to STATESTS if controller is in reset") by our very own @kv2019i - but we never used this in the SOF driver.

Meh.

@plbossart
Copy link
Member Author

plbossart commented Apr 1, 2024

And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?

commit e8a7f136f5edb6ae83b14faaa0da2a3c4558f431
Author: Danny Tholen <obiwan@mailmij.org>
Date:   Tue Sep 11 21:41:56 2007 +0200

    [ALSA] hda-intel - Improve HD-audio codec probing robustness
    
    When modem is disabled in the BIOS, detection of the number of codecs
    always fails after booting if STATESTS is not cleared first.
    This patch fixes this problem and also adds an error check in a place
    where a read error would lead to a very large number of pointless loops.
    
    Signed-off-by: Danny Tholen <obiwan@mailmij.org>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Jaroslav Kysela <perex@suse.cz>

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 1, 2024

And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?

This could have been worse, it could have been "JiaT75"

@ujfalusi
Copy link
Collaborator

ujfalusi commented Apr 2, 2024

I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?

gctl = snd_sof_dsp_read(sdev, HDA_DSP_HDA_BAR, SOF_HDA_GCTL);
if (gctl & SOF_HDA_GCTL_RESET)
snd_sof_dsp_write(sdev, HDA_DSP_HDA_BAR,
SOF_HDA_WAKESTS, SOF_HDA_WAKESTS_INT_MASK);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart @ujfalusi This is really curious. HDA-spec wise this clearing is not needed, the state will be reset with controller reset. But indeed in snd-hda-intel, this has been historically done to avoid bugs, but until now, we've not needed this in SOF.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, based on my test, the commit does help on updating GCTL value in hda_dsp_ctrl_link_reset(). gctl will be always 1 without this commit. But I don't get the reason why clear WAKE_STS helps on this. Looks like something is blocked?

@plbossart
Copy link
Member Author

I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?

There is a clear correlation between SoundWire and WAKE_STS. From LNL onwards, we use the HDaudio wake detection instead of the SoundWire wake detection. So clearing all the bits prior to entering reset makes sense, otherwise there might be some bitfields that are left active.

But yeah I don't have any explanation for the PIO role in triggering this. We may need to double check what happens if we don't enable PIO for LNL.

@ujfalusi
Copy link
Collaborator

ujfalusi commented Apr 2, 2024

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

@plbossart
Copy link
Member Author

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.

/* disable controller CIE and GIE */
snd_sof_dsp_update_bits(sdev, HDA_DSP_HDA_BAR, SOF_HDA_INTCTL,
SOF_HDA_INT_CTRL_EN | SOF_HDA_INT_GLOBAL_EN,
0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is does snd_sof_dsp_write need the interrupts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is useful, what I was thinking is that disabling the interrupts and then stopping the commands makes no sense in general. If there was an in-flight command it would be blocked, be it with the CORB/RIRB or PIO mode.

For now this doesn't have any effect so we can drop it. Still it'd be good to double-check this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PIO mode is polling on the IC registers, it is not using interrupts, CORB/RIRB uses interrupts.

@ujfalusi
Copy link
Collaborator

ujfalusi commented Apr 2, 2024

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.

I don't have access to the entire dmesg, but daily test 38398 (from 29.02.2024) does have these:

[ 4399.067924] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[ 4399.067964] kernel: soundwire sdw-master-0-0: trf on Slave 6 failed:-5 read addr 8038 count 0
[ 4399.068012] kernel: rt711-sdca sdw:0:0:025d:0711:01: Failed to get private value: 6100038 => ffff92fd ret=-5
[ 4399.571720] kernel: soundwire_intel soundwire_intel.link.3: IO transfer timed out, cmd 3 device 1 addr 45 len 1
[ 4399.571785] kernel: soundwire sdw-master-0-3: trf on Slave 1 failed:-110 write addr 45 count 0
[ 4399.571810] kernel: rt1316-sdca sdw:0:3:025d:1316:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571826] kernel: rt1316-sdca sdw:0:3:025d:1316:01: clock stop prepare failed:-110
[ 4399.571839] kernel: soundwire_intel soundwire_intel.link.3: prepare clock stop failed -110
[ 4399.571852] kernel: soundwire_intel soundwire_intel.link.3: intel_stop_bus: cannot stop clock: -110
[ 4399.571921] kernel: soundwire_intel soundwire_intel.link.1: IO transfer timed out, cmd 3 device 7 addr 45 len 1
[ 4399.571942] kernel: soundwire sdw-master-0-1: trf on Slave 7 failed:-110 write addr 45 count 0
[ 4399.571958] kernel: rt715-sdca sdw:0:1:025d:0714:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571971] kernel: rt715-sdca sdw:0:1:025d:0714:01: clock stop prepare failed:-110
[ 4399.571983] kernel: soundwire_intel soundwire_intel.link.1: prepare clock stop failed -110
[ 4399.571994] kernel: soundwire_intel soundwire_intel.link.1: intel_stop_bus: cannot stop clock: -110
[ 4399.572048] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out

I don't see gctl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants