Skip to content

perf(cache): migrate H22 cache async commits to virtual threads#35992

Open
wezell wants to merge 1 commit into
mainfrom
issue-35991-h22cache-virtual-threads
Open

perf(cache): migrate H22 cache async commits to virtual threads#35992
wezell wants to merge 1 commit into
mainfrom
issue-35991-h22cache-virtual-threads

Conversation

@wezell
Copy link
Copy Markdown
Member

@wezell wezell commented Jun 5, 2026

Proposed Changes

Migrates the H22 cache async commit executor from a fixed pool of 5 platform threads to virtual threads.

Async cache writes (put/remove against the embedded H2 store) are I/O-bound, so they're a natural fit for virtual threads now that we're on Java 25 — JEP 491 means synchronized blocks in Hikari/H2 no longer pin the carrier thread, so the classic "virtual threads + JDBC pinning" concern does not apply.

What changed (H22Cache.java)

  • spawnNewThreadPool() now returns Executors.newThreadPerTaskExecutor(...) with a named virtual-thread factory (H22-ASYNC-COMMIT-*). Removed the Guava ThreadFactoryBuilder, ThreadPoolExecutor, and LinkedBlockingQueue.
  • A shared submitAsync(Runnable) helper now backs both putAsync and removeAsync.

Behavior preserved (not a behavior change)

Concern How it's handled
Backpressure The real backpressure was always the up-front shouldAsync() check, which falls back to a synchronous commit on the caller. Unchanged. The old CallerRunsPolicy was effectively dead code — the unbounded queue never rejected.
Backlog metric A virtual-thread-per-task executor has no shared queue, so isAllocationWithinTolerance() now reads an AtomicInteger in-flight counter instead of asyncTaskQueue.size(). Same semantics; the @Ignored H22CacheTest still compiles.
DB concurrency A Semaphore (sized by the existing cache_h22_async_threads knob, default 5) acquired inside each task caps how many commits hit the Hikari pool at once — preventing connection-timeout storms and spurious shard rebuilds under burst load. Virtual threads block cheaply on the semaphore.
Shutdown In-flight counter is balanced on RejectedExecutionException at shutdown; executor shutdown path unchanged.

The obsolete cache.h22.async.caller.runs.policy config is no longer read.

This fixes / relates to

Part of #35991 (Java 25 Performance Improvements) — first child PR.

Testing

  • ./mvnw compile -pl :dotcms-coreBUILD SUCCESS.
  • No integration suite run locally (per repo guidance). Reviewer note: worth a sanity run of H22CacheTest and a write-heavy cache smoke test.

Checklist

  • Compiles (BUILD SUCCESS)
  • No public API / behavior change intended
  • Reviewer to confirm cache behavior under load

🤖 Generated with Claude Code

Replace the fixed 5-thread ThreadPoolExecutor that handles async H22 cache
commits (put/remove to the embedded H2 store) with a virtual-thread-per-task
executor. Each commit now runs on its own virtual thread that blocks cheaply
on JDBC/disk I/O instead of contending for a small pool of platform threads.

Java 25's JEP 491 means synchronized blocks in Hikari/H2 no longer pin the
carrier thread, so the classic "virtual threads + JDBC pinning" concern does
not apply.

Behavior preserved:
- The real backpressure was always the shouldAsync() pre-check, which falls
  back to a synchronous commit on the caller; that path is unchanged. The old
  CallerRunsPolicy was effectively dead code (unbounded queue never rejected).
- Since a virtual-thread-per-task executor has no shared queue, the queue-depth
  backpressure metric now reads an AtomicInteger in-flight counter instead of
  asyncTaskQueue.size(); isAllocationWithinTolerance()/shouldAsync() semantics
  are otherwise unchanged (and the @ignore'd H22CacheTest still compiles).
- A Semaphore (sized by the existing cache_h22_async_threads knob, default 5)
  acquired inside each task bounds how many commits hit the Hikari pool at once,
  preventing connection-timeout storms and spurious shard rebuilds under load.

Part of #35991.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the Area : Backend PR changes Java/Maven backend code label Jun 5, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 5, 2026

Claude finished @wezell's task in 3m 8s —— View job


Code Review

  • Read changed file
  • Analyze implementation
  • Post review

Overall the migration is sound. The counter-balancing logic in submitAsync is correct, InterruptedException handling is proper (semaphore not released when acquire throws), and the semaphore-as-concurrency-cap pattern is a legitimate substitute for the old bounded pool. A few things worth a second look:

1. Shutdown ordering: pools close before executor drains (shutdownPools, lines 476–499)

The pools are set to null and closed before executorService.shutdown() is called. In-flight virtual threads blocked on dbWorkPermits will proceed after pool closure, call createConnection()getPool(), and find pools[db] == null. The getPool() early-return guard is !isInitialized.get() && ShutdownCoordinator.isShutdownStarted() (both must be true). If ShutdownCoordinator.isShutdownStarted() is false at that moment, getPool() will attempt to rebuild a pool for a cache that's shutting down. Reversing the order — drain the executor first, then close pools — avoids this and prevents a cascade of spurious error-log lines or unnecessary pool rebuilds.

2. asyncTaskQueueSize semantics have silently changed (isAllocationWithinTolerance, line 320–326)

The old asyncTaskQueue.size() measured items sitting in a bounded LinkedBlockingQueue. CallerRunsPolicy was the hard backstop. Now inFlightTasks counts every virtual thread that has been submitted but not yet finished — including those blocked on the semaphore. There is no longer a hard bound at asyncTaskQueueSize; it's a soft throttle. This is fine in practice (the prior queue was "never full" per the PR description), but the config property cache_h22_async_task_queue and its comment now misrepresent what's being measured. Worth a rename or at minimum an updated comment/javadoc so operators don't tune it expecting queue-depth semantics.

3. shouldAsync() falling back to synchronous writes is now the only backpressure (line 334–336)

With a thread-per-task executor there is no queue to observe back-pressure at the OS scheduler level. The only throttle is: (a) the tolerance check (soft threshold at 9,800 in-flight tasks), and (b) the semaphore (hard limit at 5 concurrent DB operations). Under a write burst, the path is: tasks pile up blocked on the semaphore → inFlightTasks climbs → shouldAsync() flips false → callers write synchronously. That's reasonable, but it means a burst can create up to ~9,800 virtual threads all holding their Fqn+Serializable content closures before the fallback kicks in. The old queue had the same ceiling, so this is not worse — just worth confirming the 10,000 default is right for the expected object size, since these closures live until the semaphore permits them through.

4. Minor: namedThreadFactory as an instance field

Thread.ofVirtual().name("H22-ASYNC-COMMIT-", 0).factory() — the counter restarts at 0 for each H22Cache instance. If multiple instances run in the same JVM (tests, multi-tenant), thread names collide. Moving this to spawnNewThreadPool() or making it static avoids the ambiguity. Low severity.

Fix this →

@wezell wezell enabled auto-merge June 5, 2026 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant