Skip to content

25.3.8-fips: Make tests in CI a bit more green#1614

Merged
mkmkme merged 9 commits intoreleases/25.3.8-fipsfrom
mkmkme/fips/disable-broken-tests
Apr 7, 2026
Merged

25.3.8-fips: Make tests in CI a bit more green#1614
mkmkme merged 9 commits intoreleases/25.3.8-fipsfrom
mkmkme/fips/disable-broken-tests

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented Apr 2, 2026

Fix a config in one test, disable a couple of tests.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

It requires PRQL that is disabled in FIPS.
@mkmkme mkmkme added fips Work related to Altinity FIPS releases fips-25.3 fips-25.3.8.30001 labels Apr 2, 2026
@mkmkme mkmkme changed the title Disable broken tests 25.3.8-fips: Disable broken tests Apr 3, 2026
mkmkme added 3 commits April 3, 2026 10:27
DeltaLake is not supported in FIPS.
We replaced aes_128_gcm_siv with aes_128_gcm, but it was not reflected
in the test configs. Let's fix it.
@mkmkme mkmkme marked this pull request as ready for review April 3, 2026 08:28
@mkmkme mkmkme changed the title 25.3.8-fips: Disable broken tests 25.3.8-fips: Make tests in CI a bit greener Apr 3, 2026
@mkmkme mkmkme changed the title 25.3.8-fips: Make tests in CI a bit greener 25.3.8-fips: Make tests in CI a bit more green Apr 3, 2026
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

It's quite odd test_disks_app_func/test.py::test_disks_app_func_rm_shared_recursive is failing in the CI semi-consistently. It's passing locally.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

03170_ecs_crash is something odd. Never seen it failing before. And these changes didn't touch it at all.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] is an interesting failing test. It also fails on non-fips build from branch releases/25.3.8. So it's unrelated.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_disks_app_other_disk_types/test.py::test_disks_app_plain_rewritable also can't be reproduced locally, but for some reason is reproduced in CI

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 3, 2026

test_disks_app_other_disk_types/test.py::test_disks_app_plain_rewritable also can't be reproduced locally, but for some reason is reproduced in CI

Okay it does fail on asan due to memory leak. Investigating

mkmkme and others added 2 commits April 4, 2026 10:41
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state`
struct (48 bytes + 24-byte pointer array) on the first FIPS-approved
crypto operation via `service_indicator_get()` →
`CRYPTO_set_thread_local()` → `OPENSSL_malloc()`. This state tracks
whether each cryptographic operation used a FIPS-approved algorithm, as
required by FIPS 140-3 compliance.

The memory is registered with a pthread TLS destructor (`OPENSSL_free`)
that fires when the thread exits. However, in ClickHouse the crypto
operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool
worker threads. These worker threads outlive the application-level thread
pools (like `threadpool_writer`) because `ThreadFromGlobalPoolImpl`
submits jobs to the GlobalThreadPool rather than owning threads directly.
The GlobalThreadPool worker threads are only joined during static
destruction of `GlobalThreadPool::the_instance`, which races with
LeakSanitizer's atexit check.

This is a false positive: the memory IS freed when the worker threads
eventually exit, but LSAN runs its scan before GlobalThreadPool static
destruction completes.

This issue is FIPS-specific: the `FIPS_service_indicator_update_state`
code path only exists in AWS-LC FIPS builds. Non-FIPS builds (OpenSSL
3.2.1) do not have service indicator tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS-LC FIPS 2.0.0 allocates a per-thread `fips_service_indicator_state`
(48 + 24 bytes) on the first FIPS-approved crypto operation via
`service_indicator_get()`. This state is freed by a pthread TLS
destructor when the thread exits, but in ClickHouse the crypto
operations (e.g. SHA-256 for S3 request signing) run on GlobalThreadPool
worker threads whose lifetime extends beyond LeakSanitizer's check.

This causes LSAN to report a false positive in integration tests that
invoke `clickhouse disks` or other CLI tools performing S3 operations.

The fix:
- Add `tests/integration/helpers/lsan_suppressions.txt` with a
  suppression for `service_indicator_get` (placed in helpers/ because
  only `tests/integration/` is mounted into the test runner container)
- Configure `cluster.py` to set `LSAN_OPTIONS` with the suppressions
  file path and copy it into each instance's config directory

This issue is FIPS-specific: the `FIPS_service_indicator_update_state`
code path only exists in AWS-LC FIPS builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Apr 4, 2026

[gw8] [ 71%] PASSED test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] 
[gw8] [ 71%] ERROR test_dictionaries_all_layouts_separate_sources/test_mongo_uri.py::test_simple_ssl[flat-True] 

This seems to be failing due to a memory leak in BIO_do_handshake that is being fixed in #1616

mkmkme and others added 3 commits April 4, 2026 13:52
The previous commit (bf4af17) added LSAN suppressions for
integration tests via cluster.py, but stateless tests inherit
LSAN_OPTIONS from the Docker image's ENV directive which did not
include the suppressions file path.

The base Dockerfile already wrote the correct LSAN_OPTIONS (with
suppressions path) to /etc/environment (line 38) for services, but
the ENV on line 45 — used by non-login shells including the test
runner — omitted the suppressions= directive. This caused
clickhouse-disks and other CLI tools to abort when LSAN detected
the AWS-LC FIPS service_indicator_get allocation.

Fix both:
- docker/test/base/Dockerfile: add suppressions path to ENV
  LSAN_OPTIONS, matching the /etc/environment entry
- tests/config/lsan_suppressions.txt: add service_indicator_get
  suppression (this file is installed at
  /usr/share/clickhouse-test/config/lsan_suppressions.txt in the
  test Docker image)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The kitware apt key baked into the altinityinfra/test-util base image
has expired, causing `apt-get update` to fail with:

  GPG error: https://apt.kitware.com/ubuntu jammy InRelease:
  NO_PUBKEY 65ADECD7A7039392

Re-fetch the key before the first apt-get update, same approach as
53bcc4f and 4bc620d for the binary-builder image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set LSAN_OPTIONS with the suppressions file path in the stateless test
runner (tests/clickhouse-test) instead of modifying the Docker image.
This ensures clickhouse-disks and other CLI tools invoked from stateless
tests pick up the AWS-LC FIPS service_indicator_get suppression.

Also reverts the docker/test/base/Dockerfile changes from the previous
two commits (LSAN_OPTIONS ENV and kitware GPG key refresh) to avoid
triggering Docker image rebuilds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mkmkme mkmkme force-pushed the mkmkme/fips/disable-broken-tests branch from 37f10f6 to 50ffdb2 Compare April 4, 2026 15:08
@DimensionWieldr DimensionWieldr added the verified Approved for release label Apr 6, 2026
@DimensionWieldr
Copy link
Copy Markdown
Collaborator

Just CI fixes. Integration fail fixed by #1616 . LGTM

AI audit returned no major defects.

@mkmkme mkmkme merged commit a7fe013 into releases/25.3.8-fips Apr 7, 2026
295 of 314 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fips Work related to Altinity FIPS releases fips-25.3 fips-25.3.8.30001 verified Approved for release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants