loader,smp: Enable caches on all cores for shareability #465
Conversation
midnightveil
left a comment
This seems like it would be correct to me, but I'd like to confirm it against the ARM ARM first.
What is the shareability of these mappings, do you know? Hopefully we set it up as ISH.
Maybe I misunderstood @KurtWu10, but I was under the impression that this would be fine as we only disabled the MMU & L1 cache. (Actually, thinking about it, I guess we also disable the cache itself. I was asking about how it works if you just disable the MMU, not the cache too.)
This probably would affect spin table booting too, though I'm inclined to leave it as is as nothing but raspberry pi uses it (looking at Linux it does a cache clean & invalidate to PoC there)
(Not something to fix here) this does make me realise I could have switched to the kernel page tables instead of disabling the uboot ones, as that would stop the loader loading things over the current page tables since it knows to not load over itself. Although we probably want to clean caches then.
Initially, I thought this was correct, but now I am not sure whether enabling the MMU is required for cache coherency. According to the Arm ARM, D8.2.12 "The effects of disabling an address translation stage", assuming that the effective value of
then, according to B2.3 "Ordering requirements defined by the formal concurrency model",
My interpretation is: with the MMU disabled, data memory accesses use the Device attribute, and cache coherency for them is provided only for load-acquire and store-release instructions. The reasoning above assumes that the effective value of
I think you are correct here: cacheability and coherency can indeed apply regardless of the MMU, as long as the cores are in the same shareability domain and the cache attributes are consistent. My mistake in the description was that I confused "enabling the caches" with "enabling the MMU", because both are currently done inside enable_mmu(). A possible refinement would be to separate cache activation from enable_mmu() with a new function called before accessing shared data. Which gives two possible fixes:
I am fine with either option. Thank you for raising this inconsistency!
In #464 I set inner shareability in the memory translation attributes
That's the point: we could disable the MMU and only enable the cache. But in that situation, I don't know what the default shareability settings are. I tested on STM32MP and it works, but I'm afraid we could fall into SoC-specific or undefined behaviour. Making sure the attributes in the translation tables are set (thus with the MMU enabled) is IMHO the best way to avoid ambiguity.
Yes, I agree. I think it makes sense to turn MMU and caches on.
FYI, you'll need to fix the git commit title length, since there is gitlint set up on seL4 repos. No more than 50 chars in the title and 70 in the body. Same for the other PR.
Ensure that caches are activated on all cores before accessing shared cached synchronisation data. Caches are enabled within arch_mmu_enable(), so shareability attributes are consistently configured. Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
sp[1] = 0;

/* clean up the cache line containing sp before use by secondary core */
asm volatile("dc cvac, %0" :: "r"(sp) : "memory");
I guess this is an argument that we should make the loader use outer shareable instead of inner shareable, then I think this should be fine.
I don't think that would solve any coherency problems when the caches are disabled on the secondary cores.
I don't think that would solve any coherency problems when the caches are disabled on the secondary cores.
Caches are enabled early in arm_secondary_cpu_entry(). Before that, the only data that must be coherent is the parameter read from the stack by arm_secondary_cpu_entry_asm() (without this I read 0 as the logical_cpu parameter, which fails). Instructions should be fine.
We could conservatively clean the entire cache here, but that seems like overkill. Am I missing anything, or do you see any other possible source of incoherence before caches are enabled in this sequence?
(edit: arm_secondary_cpu_entry_asm() reads the stack, not the PSCI call)
Apologies, I didn't notice this was for the stack, which probably does need a cache clean here, yes.
What I had thought of was that print_lock is a variable shared across cores that won't be in the same shareability domain. I probably need to recheck the manual here; it might be that a "dsb sy" (full system barrier) would be OK when the memory is mapped ISH as opposed to OSH, but I also don't know whether that's what the compiler atomics expect and whether the instructions they emit work for that case.
Admittedly this isn't an issue with this PR at all...
No problem. It's best to discuss the details now :). I think the print_lock exclusive accesses are in the same shareability domain, because they are accessed after enable_mmu() on all cores, with the same MMU/cache attributes. Also, we enable both ISH and OSH shareability in the translation tables.
Also, we enable both ISH and OSH shareability in the translation tables
I'm not sure what you mean by this?
Anyway, it's a Friday evening and I'm mixing everything up 🙃. For some reason I was thinking that ISH means "non-shareable"; I'm mixing things up. ISH should be fine for cores; all cores are (usually) in the same inner shareable domain, or at least that's what the seL4 SMP code assumes. Non-shareable would be a different matter; it was just that the comment in here claiming the shareability matches what seL4 uses seems to be untrue, as seL4 uses NSH in non-SMP configurations.
tl;dr: no issue, I mixed things up.
Hello,
This fixes an SMP boot issue observed on the STM32MP2 platform: core 0 does not start because print_lock is never seen as released by the secondary cores. The issue is related to cacheability, which is not yet enabled at this stage.
Ensure all cores are in the same coherency domain before using cached shared data by enabling the MMU on all cores before any shared data is accessed.
Thus, arch_mmu_enable() is moved out of start_kernel().