Improve TLS codegen by marking the panic/init path as cold by orlp · Pull Request #143511 · rust-lang/rust

orlp · 2025-07-05T23:17:16Z

This is an extension of the performance improvements seen from #141685. I noticed that the non-const TLS still didn't have the #[cold] attribute for the uninit/panic path, and I also realized that neither implementation should have the initialization or panic path inlined, ever.

These paths are taken either only once per thread (init) or never (panic, in a well-behaving Rust program), thus they don't deserve to litter the code generated each time you access a thread-local variable. So in addition to #[cold] I added the more aggressive #[inline(never)] to both cold paths as well.
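
To make the intended shape concrete, here is a minimal sketch of a lazily initialized slot whose initialization lives in a separate function marked `#[cold]` and `#[inline(never)]`. `LazySlot`, `get`, and `init_slow` are hypothetical names chosen for illustration; this is not the actual `std` thread-local implementation.

```rust
use std::cell::Cell;

/// Hypothetical lazily initialized slot (illustration only, not the std type).
pub struct LazySlot {
    value: Cell<Option<u64>>,
}

impl LazySlot {
    pub const fn new() -> Self {
        LazySlot { value: Cell::new(None) }
    }

    /// Hot path: one check, then return the cached value.
    #[inline]
    pub fn get(&self, init: fn() -> u64) -> u64 {
        match self.value.get() {
            Some(v) => v,
            // Only reached while the slot is still empty.
            None => self.init_slow(init),
        }
    }

    /// Cold path: `#[cold]` hints that calls are unlikely, and
    /// `#[inline(never)]` asks the compiler to keep the body out of callers.
    #[cold]
    #[inline(never)]
    fn init_slow(&self, init: fn() -> u64) -> u64 {
        let v = init();
        self.value.set(Some(v));
        v
    }
}
```

With a slot created by `LazySlot::new()`, the first `slot.get(|| 7)` call goes through `init_slow`, and later calls return the cached value without touching it.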

rustbot · 2025-07-05T23:17:20Z

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

compiler-errors · 2025-07-05T23:30:07Z

Not sure if this will show up at all on perf but 🤷

@bors2 try @rust-timer queue

Do you have any local benchmarks?

rust-bors · 2025-07-05T23:30:15Z

⌛ Trying commit db7b096 with merge 9f2c18a…

To cancel the try build, run the command @bors2 try cancel.

orlp · 2025-07-05T23:32:01Z

@compiler-errors No, I don't have any local benchmarks. But I look at assembly output a lot, and trust me when I say these code paths should never get inlined.

Could you restart the benchmark with my second commit included?

compiler-errors · 2025-07-05T23:32:53Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-05T23:32:57Z

⌛ Trying commit cf4669e with merge 8b17150…

(The previously running try build was automatically cancelled.)

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-07-06T01:46:59Z

☀️ Try build successful (CI)
Build commit: 8b17150 (8b17150009e237f23856ea93eb9b208049d8a621, parent: 175e04331be56c5b4bdf77478434b1a5e0556770)

rust-timer · 2025-07-06T10:56:21Z

Finished benchmarking commit (8b17150): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	1
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	1
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	-0.3%	[-0.3%, -0.3%]	1

Max RSS (memory usage)

Results (primary 5.4%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	5.4%	[4.3%, 7.1%]	3
Regressions ❌ (secondary)	2.4%	[2.4%, 2.4%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	5.4%	[4.3%, 7.1%]	3

Cycles

Results (primary 2.6%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.6%	[2.6%, 2.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-2.8%, -2.8%]	1
All ❌✅ (primary)	2.6%	[2.6%, 2.6%]	1

Binary size

Results (primary 0.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.5%]	15
Regressions ❌ (secondary)	0.1%	[0.0%, 0.1%]	37
Improvements ✅ (primary)	-0.2%	[-0.7%, -0.0%]	5
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.0%	[-0.7%, 0.5%]	20

Bootstrap: 459.09s -> 461.518s (0.53%)
Artifact size: 372.18 MiB -> 372.13 MiB (-0.01%)

orlp · 2025-07-06T13:43:47Z

I removed some inline(never)s because they pessimized codegen. I had forgotten that the TLS pointer returned by get() is still wrapped inside LocalKey, which checks it again to see whether a panic is required. This PR now only adds hot paths, with the fallback marked #[cold].

Codegen is still nicer just from the addition of #[cold]: it at least moves the initialization out of the hot path (and the compiler may still decide not to inline it).
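
As a rough illustration of the second check mentioned above, the sketch below shows a wrapper in the spirit of `LocalKey::with` that re-checks what the inner accessor returned and diverts to a panic function carrying only `#[cold]`. The names `access_error` and `with_value` are hypothetical, and `Option<&u32>` stands in for a nullable pointer (it has the same representation).

```rust
/// Hypothetical stand-in for the panic taken when a thread-local is accessed
/// after destruction. It is only marked `#[cold]`, so the compiler remains
/// free to decide whether to inline it.
#[cold]
fn access_error() -> ! {
    panic!("thread-local accessed after destruction");
}

/// Hypothetical wrapper: `None` plays the role of the null pointer that the
/// inner accessor would return once the value is gone.
pub fn with_value<R>(slot: Option<&u32>, f: impl FnOnce(&u32) -> R) -> R {
    match slot {
        // Hot path: the value is available, run the caller's closure.
        Some(value) => f(value),
        // Never taken in a well-behaved program.
        None => access_error(),
    }
}
```

Because the wrapper already branches on the result, forcing the inner accessor out of line with `#[inline(never)]` would keep the optimizer from merging the two checks, which is one plausible reading of why it pessimized codegen.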

lqd · 2025-07-06T15:01:45Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-06T15:01:48Z

⌛ Trying commit 92fa8e8 with merge 9782d0a…

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-07-06T17:15:29Z

☀️ Try build successful (CI)
Build commit: 9782d0a (9782d0a1d99759de86b20e0863061637a0a3c245, parent: c83e217d268d25960a0c79c6941bcb3917a6a0af)

rust-timer · 2025-07-06T22:56:58Z

Finished benchmarking commit (9782d0a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	2
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.0%	[0.0%, 0.0%]	1
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	9
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.0%	[-0.0%, -0.0%]	1
All ❌✅ (primary)	0.0%	[0.0%, 0.0%]	1

Bootstrap: 461.809s -> 462.209s (0.09%)
Artifact size: 372.19 MiB -> 372.13 MiB (-0.02%)

joboet · 2025-08-29T22:35:36Z

    /// The resulting pointer may not be used after reentrant initialization
    /// or thread destruction has occurred.
+    #[inline]
    pub fn get(&'static self, i: Option<&mut Option<T>>, f: impl FnOnce() -> T) -> *const T {


While you're at it, I think it might be beneficial to inline the ptr.addr() == 1 case into this function, as that might yield more optimized LocalKey::withs.

I disagree, see my other comment.

joboet · 2025-08-29T22:41:39Z

+        if let State::Alive = self.state.get() {
+            self.val.get()
+        } else {
+            unsafe { self.get_or_init_slow() }


I don't think this is beneficial – the returned pointer is later compared against null in LocalKey::with anyway, so the optimiser should be able to merge the state comparison into that.

I think it is beneficial. I think anything that is not the initialized path should be marked cold and gotten out of the way, even if that makes the non-initialized path slightly slower and have duplicated work.

The initialized hot path is what matters 99.999% of the time and should be prioritized over all else.

I've made this toy example to illustrate this: https://rust.godbolt.org/z/hd6hnGWGh.

Note that because UnsafeCell::get cannot return a null pointer, the fast-path once inlined completely eliminates the nullptr check and only checks the state.
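
The following is a separate, simplified sketch of that argument, not the code behind the Godbolt link. `Storage`, `State`, `get_slow`, `mark_destroyed`, and the free function `with` are assumed names, and the real implementation differs (for example, it registers destructors).

```rust
use std::cell::{Cell, UnsafeCell};
use std::ptr;

#[derive(Clone, Copy)]
enum State {
    Initial,
    Alive,
    Destroyed,
}

/// Hypothetical eagerly constructed storage with a liveness state.
pub struct Storage<T> {
    state: Cell<State>,
    val: UnsafeCell<T>,
}

impl<T> Storage<T> {
    pub const fn new(val: T) -> Self {
        Storage { state: Cell::new(State::Initial), val: UnsafeCell::new(val) }
    }

    /// Fast path: in the `Alive` branch the pointer comes straight from
    /// `UnsafeCell::get`, which never returns null.
    #[inline]
    pub fn get(&self) -> *const T {
        if let State::Alive = self.state.get() {
            self.val.get().cast_const()
        } else {
            self.get_slow()
        }
    }

    /// Everything else is cold: first use, or access after destruction.
    #[cold]
    fn get_slow(&self) -> *const T {
        match self.state.get() {
            State::Initial => {
                self.state.set(State::Alive);
                self.val.get().cast_const()
            }
            State::Alive => self.val.get().cast_const(),
            State::Destroyed => ptr::null(),
        }
    }

    /// Stand-in for what a destructor would do in this sketch.
    pub fn mark_destroyed(&self) {
        self.state.set(State::Destroyed);
    }
}

/// Caller-side null check, similar in spirit to what `LocalKey::with` does.
pub fn with<T, R>(storage: &Storage<T>, f: impl FnOnce(&T) -> R) -> Option<R> {
    let ptr = storage.get();
    // SAFETY: a non-null pointer from `get` points into `storage.val`,
    // which is borrowed for the duration of this call.
    let value = unsafe { ptr.as_ref() };
    value.map(f)
}
```

Once `get` is inlined into `with`, the `Alive` branch yields a pointer the optimizer can know is non-null, so only the state comparison needs to remain on that path; the null check matters only after `get_slow`.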

rustbot · 2026-06-05T10:55:50Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

orlp · 2026-06-05T11:10:52Z

Sorry for the delay. I've rebased on the latest main and addressed the review comments (either solving their concern or contesting).

I've also made one small change: I've explicitly assigned discriminant 0 to the Alive state, which can make the check faster on Arm (a direct cbz instead of a cmp first).
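
A minimal sketch of what an explicit zero discriminant looks like follows. The variant set and the `is_alive` helper are assumptions for illustration, and whether a compare-with-zero branch is actually emitted depends on the target and optimization level.

```rust
/// Hypothetical state enum with `Alive` pinned to discriminant 0, so that
/// the hot-path test is a comparison against zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum State {
    Alive = 0,
    Unregistered = 1,
    Destroyed = 2,
}

/// Hot-path check: a zero test on the discriminant.
#[inline]
pub fn is_alive(state: State) -> bool {
    state as u8 == 0
}
```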

@rustbot label -S-waiting-on-author +S-waiting-on-review

joboet · 2026-06-07T11:50:29Z

I tested the performance of my idea, and it truly appears to be worse.

Let's do a final perf-run of this, and then this should be good to go.

@bors try @rust-timer queue

rust-bors · 2026-06-07T13:59:43Z

☀️ Try build successful (CI)
Build commit: 4fa0244 (4fa024445d244dbe0860ff6c1500d718a02ec239, parent: 43a4909ee98ed4d006d9d773f5d94dc58e34f846)

rust-timer · 2026-06-07T14:40:43Z

Finished benchmarking commit (4fa0244): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up.

@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This perf run didn't have relevant results for this metric.

Max RSS (memory usage)

Results (secondary -4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.9%	[-4.9%, -4.9%]	1
All ❌✅ (primary)	-	-	0

Cycles

This perf run didn't have relevant results for this metric.

Binary size

This perf run didn't have relevant results for this metric.

Bootstrap: 516.093s -> 514.238s (-0.36%)
Artifact size: 400.83 MiB -> 401.34 MiB (0.13%)

joboet · 2026-06-08T13:52:18Z

Well, that's underwhelming...

But I think I'll merge this anyway, it makes the implementations more consistent and is nice to have in general...

@bors r+ rollup

rust-bors · 2026-06-08T13:52:21Z

📌 Commit 2de39c0 has been approved by joboet

It is now in the queue for this repository.

…uwer Rollup of 13 pull requests

Successful merges:

- #147302 (asm! support for the Xtensa architecture)
- #148820 (Add very basic "comptime" fn implementation)
- #157299 (Fix unstable diagnostics in tests)
- #143511 (Improve TLS codegen by marking the panic/init path as cold)
- #154608 (Add `_value` API for number literals in proc-macro)
- #156762 (xfs support in `test_rename_directory_to_non_empty_directory`)
- #157300 (Relax test requirements for consistency)
- #157383 (tests: codegen-llvm: Ignore BPF targets in c-variadic-opt)
- #157413 (fix: don't suggest .into_iter() for .cloned()/.copied() on non-reference Option)
- #157578 (Fix diagnostics for non-exhaustive destructuring assignments (#157553))
- #157587 (explain that the size_of constant also serves to avoid optimizing away 'unused' size_of calls)
- #157596 (test: remove ineffective link-extern-crate-with-drop-type test)
- #157602 (rustdoc: Remove unnecessary fast path)

Rollup merge of #143511 - orlp:tls-cold-init, r=joboet

JonathanBrouwer · 2026-06-12T06:03:40Z

@rust-timer build 6f652bc

rust-timer · 2026-06-12T07:22:28Z

Finished benchmarking commit (6f652bc): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up.

@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This perf run didn't have relevant results for this metric.

Max RSS (memory usage)

Results (primary -1.3%, secondary 6.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.8%	[2.8%, 2.8%]	1
Regressions ❌ (secondary)	6.8%	[6.8%, 6.8%]	1
Improvements ✅ (primary)	-5.4%	[-5.4%, -5.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.3%	[-5.4%, 2.8%]	2

Cycles

Results (secondary -0.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.6%	[3.6%, 3.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.6%	[-4.6%, -4.6%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results (primary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.0%	[0.0%, 0.0%]	3
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.0%	[0.0%, 0.0%]	3

Bootstrap: 517.572s -> 515.951s (-0.31%)
Artifact size: 400.85 MiB -> 400.77 MiB (-0.02%)

rustbot assigned workingjubilee Jul 5, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 5, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

ibraheemdev reviewed Jul 22, 2025

Comment thread library/std/src/sys/thread_local/native/eager.rs

joboet reviewed Aug 29, 2025

orlp added 2 commits June 5, 2026 12:45

Don't use inline(never)

8fd8fbf

Rename Uninitialized to Unregistered

a7790c6

orlp force-pushed the tls-cold-init branch from 92fa8e8 to a7790c6 Compare June 5, 2026 10:55

Use zero to represent alive state for faster comparison on Arm

2de39c0

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 5, 2026

joboet mentioned this pull request Jun 6, 2026

std: micro-optimize thread local accesses #157537

Closed

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 7, 2026

rust-bors Bot pushed a commit that referenced this pull request Jun 7, 2026

Auto merge of #143511 - orlp:tls-cold-init, r=<try>

4fa0244

Improve TLS codegen by marking the panic/init path as cold

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 7, 2026

rust-bors Bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 8, 2026

JonathanBrouwer mentioned this pull request Jun 8, 2026

Rollup of 13 pull requests #157616

Merged

rust-bors Bot merged commit 7fc9526 into rust-lang:main Jun 8, 2026
13 checks passed

rustbot added this to the 1.98.0 milestone Jun 8, 2026
