
[HOTFIX] Use importlib.util.find_spec for pluggable worker discovery #1918

Merged
chandrasekharan-zipstack merged 3 commits into v0.163.2-hotfix from fix/pluggable-worker-find-spec-v0.163.2 on Apr 21, 2026
Conversation

@chandrasekharan-zipstack chandrasekharan-zipstack commented Apr 17, 2026

What

Replaces the filesystem worker.py existence check in WorkerBuilder._verify_pluggable_worker_exists() with importlib.util.find_spec().

Why

The pre-import check at workers/shared/infrastructure/config/builder.py:338 hardcoded worker.py as the file to look for. This breaks any deployment where the pluggable worker has been compiled to an extension module (e.g. Nuitka or Cython produces a worker.cpython-312-x86_64-linux-gnu.so that replaces worker.py) — the module is perfectly importable, but the hard .py existence check rejects the worker before importlib.import_module() ever runs.
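For context, the removed check was shaped roughly like this (a reconstruction for illustration — the exact names around builder.py:338 are assumptions, not the repository's code):

```python
from pathlib import Path

# Old pre-import check (reconstructed): only a literal worker.py on disk
# counts, so a compiled worker.cpython-312-x86_64-linux-gnu.so is rejected
# even though the import system could load it.
worker_file = Path("/app/pluggable_worker") / worker_type.value / "worker.py"
if not worker_file.exists():
    logger.error(f"Pluggable worker file not found: {worker_file}")
    return False
```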

Observed error pattern (paraphrased):

```text
ERROR: Pluggable worker file not found: /app/pluggable_worker/<name>/worker.py
ImportError: Pluggable worker '<name>' not found.
Expected module: pluggable_worker.<name>.worker
```

This is not specific to any one obfuscation tool — the check would also reject plugins distributed as Cython .so, bytecode-only .pyc, namespace packages, or any future loader that doesn't materialize a .py file on disk.

How

Use Python's standard "is this module importable?" primitive instead of filesystem introspection:

```python
import importlib
import importlib.util

module_path = f"pluggable_worker.{worker_type.value}.worker"

# Ask the import system whether any registered finder can produce the module.
if importlib.util.find_spec(module_path) is None:
    logger.error(f"Pluggable worker module not importable: {module_path}")
    return False

# Still import it afterwards — catches findable-but-broken modules.
importlib.import_module(module_path)
```

find_spec() consults every registered finder (source .py, compiled .so, bytecode .pyc, namespace packages, zipimports) and returns None cleanly if nothing can produce the module. The subsequent import_module() still catches findable-but-broken modules (e.g. ImportError inside the loaded module).
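To illustrate the contract with nothing but the standard library:

```python
import importlib.util

# A resolvable module yields a ModuleSpec; .origin points at whichever
# representation the winning finder located (a source .py here, but a
# compiled .so or bytecode-only .pyc resolves the same way).
spec = importlib.util.find_spec("json")
print(spec.origin)  # e.g. /usr/lib/python3.12/json/__init__.py

# A top-level name no finder can produce returns None rather than raising.
print(importlib.util.find_spec("definitely_not_installed"))  # None
```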

The only semantic change: when a parent package exists but an intermediate subpackage is missing (pluggable_worker/ exists but pluggable_worker/<name>/ does not), find_spec raises ModuleNotFoundError. That's a subclass of ImportError and is already handled by the existing except ImportError block, so the function correctly returns False just like before.
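That edge is easy to reproduce with any stdlib package standing in for pluggable_worker:

```python
import importlib.util

# "email" imports fine, "email.nope" does not — find_spec must import the
# parent of the requested module first, and that import raises
# ModuleNotFoundError, which `except ImportError` already catches.
try:
    importlib.util.find_spec("email.nope.worker")
except ModuleNotFoundError as exc:
    assert isinstance(exc, ImportError)  # the subclass relationship the handler relies on
    print(f"handled by the existing block: {exc}")
```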

Can this PR break any existing features?

No. Behavior is preserved across all deployment scenarios, verified via a synthetic fixture that compiles a test worker with Nuitka and runs _verify_pluggable_worker_exists() against it (core logic sketched after the table):

| Scenario | .py present | .so present | Before | After |
| --- | --- | --- | --- | --- |
| No subpackage (pluggable workers disabled) | no | no | False (file not found) | False (ImportError) |
| Source available | yes | no | True | True |
| Compiled extension module | no | yes | False ← bug | True ← fixed |
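The verified logic reduces to the following shape (a sketch — the real check is a method on WorkerBuilder taking a WorkerType; this standalone function is illustrative only):

```python
import importlib
import importlib.util

def verify_worker_module(module_path: str) -> bool:
    """Mirrors the core of the new _verify_pluggable_worker_exists() logic."""
    try:
        if importlib.util.find_spec(module_path) is None:
            return False  # no registered finder can produce the module
        importlib.import_module(module_path)  # catches findable-but-broken modules
        return True
    except (ImportError, ValueError):
        return False  # missing subpackage, broken module, or stale sys.modules entry

# Stdlib analogues of the "source available" and "no subpackage" rows:
assert verify_worker_module("json") is True
assert verify_worker_module("pluggable_worker.missing.worker") is False
```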

Relevant Docs

@coderabbitai Bot commented Apr 17, 2026

Review skipped — auto reviews are disabled on base/target branches other than the default branch. Check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository; to trigger a single review, invoke the @coderabbitai review command.

greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This hotfix replaces the hardcoded worker.py filesystem existence check in _verify_pluggable_worker_exists() with importlib.util.find_spec(), allowing pluggable workers compiled to .so (Nuitka/Cython) or other non-.py forms to be correctly discovered. A secondary fix in worker.py corrects the core-worker directory derivation for api-deployment, whose on-disk path uses hyphens — the old value.replace("-", "_") was a no-op on already-underscored enum values and would have missed the actual directory.

Confidence Score: 5/5

Safe to merge — both changes are targeted correctness fixes with no regressions on the unaffected paths.

The builder.py change is correct and the previously flagged ValueError is now handled. The worker.py fix properly resolves the api-deployment directory mismatch. No new P0/P1 issues found; all remaining notes from prior review threads have been addressed.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| workers/shared/infrastructure/config/builder.py | Replaces the filesystem .py check with importlib.util.find_spec(); ValueError now caught alongside ImportError; logic and exception handling are correct. |
| workers/worker.py | Core-worker directory derivation now uses to_import_path().rsplit() instead of a blind replace, correctly resolving api-deployment's hyphenated on-disk path. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_verify_pluggable_worker_exists(worker_type)"] --> B{is_pluggable?}
    B -- No --> C[return True]
    B -- Yes --> D["module_path = 'pluggable_worker.{worker_type}.worker'"]
    D --> E["importlib.util.find_spec(module_path)"]
    E -- "returns None" --> F["logger.error\nreturn False"]
    E -- "raises ValueError" --> G["except ImportError | ValueError\nreturn False"]
    E -- "spec found" --> H["importlib.import_module(module_path)"]
    H -- "ImportError / ValueError" --> G
    H -- "OSError / AttributeError" --> I["except OSError | AttributeError\nreturn False"]
    H -- success --> J["logger.debug\nreturn True"]
```

Reviews (4) — last reviewed commit: "[FIX] Resolve api-deployment worker dire..."

Comment thread: workers/shared/infrastructure/config/builder.py
_verify_pluggable_worker_exists() previously checked for the literal file
`pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin
has been compiled to a .so (Nuitka, Cython, or any C extension) — the
module is perfectly importable but the pre-check rejects it because only
the .py extension is considered.

Replace the filesystem check with importlib.util.find_spec(), which is
Python's standard way to ask "is this module resolvable by the import
system?". It honors every registered finder — source .py, compiled .so,
bytecode .pyc, namespace packages, zipimports — so the function now
matches what its docstring claims: verifying the module can be loaded,
not that a specific file extension is present.

Behavior is preserved for existing deployments:
- Images with no `pluggable_worker/<name>/` subpackage → find_spec
  raises ModuleNotFoundError (ImportError subclass) → returns False.
- Images with source .py → find_spec resolves the .py → returns True.
- Images with compiled .so → find_spec resolves the .so → returns True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile-flagged edge case: importlib.util.find_spec() can raise
ValueError (not just ImportError) when sys.modules has a partially
initialised module entry with __spec__ = None from a prior failed import.
Broaden the except to catch both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
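
The edge case reproduces in isolation (a minimal repro, independent of the project code):

```python
import importlib.util
import sys
import types

# Simulate what a failed prior import can leave behind: a sys.modules
# entry whose __spec__ is None.
half_imported = types.ModuleType("half_imported")
half_imported.__spec__ = None
sys.modules["half_imported"] = half_imported

try:
    importlib.util.find_spec("half_imported")
except ValueError as exc:
    print(f"ValueError, not ImportError: {exc}")  # why the except was broadened
```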

@muhammad-ali-e muhammad-ali-e left a comment


```python
# module representation (source .py, Nuitka/Cython .so, .pyc,
# namespace packages, zipimports) — unlike a filesystem check
# for a specific file extension, which breaks for compiled plugins.
module_path = f"pluggable_worker.{worker_type.value}.worker"
```

Hope this doesn't break running the worker locally instead of in Docker

chandrasekharan-zipstack (Author) replied:

Tested locally — doesn't break local runs. find_spec() is actually a superset of the previous filesystem check.

Local runs go through workers/run-worker.sh, which sets `export PYTHONPATH="$WORKERS_DIR:${PYTHONPATH:-}"` (line 702) before launching celery. So `import pluggable_worker.<name>.worker` resolves exactly the way the launcher itself resolves the dotted module name at line 444 (`celery_app_module="pluggable_worker.${worker_type}.worker"`).

Coverage comparison vs old code:

| Scenario | Old (`Path(...)/worker.py`) | New (`find_spec`) |
| --- | --- | --- |
| Local .py via run-worker.sh | ✅ | ✅ (PYTHONPATH set by script) |
| Docker with uncompiled plugin | ✅ | ✅ |
| Docker with compiled .so (Cython/Nuitka) | ❌ (checks .py) | ✅ |
| OSS with no pluggable plugins | n/a (is_pluggable() returns False; early return) | same |
| Bare python worker.py from repo root, no PYTHONPATH | ✅ | ❌ (and would fail at actual import too) |

The only way someone could get a false negative from the new code but not the old is if they ran a pluggable worker outside run-worker.sh without any PYTHONPATH setup — in that case the old check returned True but celery would have crashed on the real import a few lines later. So net change: we fail earlier with a cleaner log instead of getting a stack trace.

Also confirmed on snapshot.180420262 locally: pluggable workers (agentic_callback + bulk_download worker.so) register and run fine — previously they crash-looped because the Path(...)/worker.py check returned False for compiled modules.

worker.py:452 did worker_type.value.replace("-", "_") to derive the
on-disk dir name. All WorkerType enum values already use underscores,
so the replace was a no-op; for API_DEPLOYMENT whose dir is
"api-deployment" (hyphen), it resolved to "api_deployment" and the
os.path.exists() check failed. Boot then logged a spurious
"❌ Worker directory not found: /app/api_deployment" at ERROR level.

The task registration path (builder + celery autodiscover via
to_import_path) is unaffected, so this was purely log noise — but
noise at ERROR level that masks real failures in log scans.

Fix: derive the directory from the authoritative to_import_path()
which already handles the hyphen case (api_deployment -> api-deployment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
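
Sketched with literal strings (hedged — the real code reads these from the WorkerType enum, and the exact return shape of to_import_path() is an assumption based on this commit message):

```python
value = "api_deployment"                 # WorkerType.API_DEPLOYMENT.value — already underscored

old_dir = value.replace("-", "_")        # no-op: "api_deployment"; the dir on disk is "api-deployment"

import_path = "api-deployment.worker"    # assumed shape of to_import_path()'s output
new_dir = import_path.rsplit(".", 1)[0]  # "api-deployment" — matches the on-disk directory
```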

@chandrasekharan-zipstack chandrasekharan-zipstack merged commit e6bb412 into v0.163.2-hotfix Apr 21, 2026
6 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the fix/pluggable-worker-find-spec-v0.163.2 branch April 21, 2026 06:28
chandrasekharan-zipstack added a commit that referenced this pull request Apr 21, 2026

[HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

* [FIX] Use importlib.util.find_spec for pluggable worker discovery
* [FIX] Handle ValueError from find_spec in pluggable worker verification
* [FIX] Resolve api-deployment worker directory from enum import path

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
athul-rs added a commit that referenced this pull request May 7, 2026
* [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

* [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944)

* [FEAT] Allow Bedrock to fall through to boto3's default credential chain

Match the S3/MinIO connector pattern: when AWS access keys are left blank
on the Bedrock LLM and embedding adapter forms, drop them from the kwargs
dict so boto3's default credential chain handles authentication. This
unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on
hosts that already have ambient AWS credentials (e.g. EKS workers with
IRSA, EC2 with an instance profile).

- llm1/static/bedrock.json: clarify access-key descriptions to mention
  IRSA and instance profile (already non-required at v0.163.2 base).
- embedding1/static/bedrock.json: drop aws_access_key_id and
  aws_secret_access_key from top-level required; same description fix;
  expose aws_profile_name for parity with the LLM form.
- base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters
  now strip empty access-key values from the validated kwargs before
  returning, so empty strings don't override boto3's default chain.
  AWSBedrockEmbeddingParameters fields gain explicit None defaults
  and an aws_profile_name field.

Backward-compatible: existing adapters with access keys filled in
continue to work unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Add Authentication Type selector to Bedrock adapter form

Add an explicit `auth_type` selector with two options, making the auth
choice clear to users:

- "Access Keys" (default): existing flow, keys required
- "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on
  boto3's default credential chain (IRSA on EKS, task role on ECS,
  instance profile on EC2). Description on the selector explicitly notes
  this option is only for AWS-hosted Unstract deployments.

The form-only auth_type field is stripped before LiteLLM validation in
both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.
validate(). Empty access keys continue to be stripped so boto3 falls
through to the default chain even when the access_keys arm is selected
without values (matches the S3/MinIO connector pattern).

Backward-compatible: legacy adapters without auth_type behave as
"Access Keys" mode (the default), and existing keys are forwarded
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [REVIEW] Address Bedrock auth_type review feedback

Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on
PR #1944.

Behaviour fixes:
- Stale-key leak in IAM Role mode: switching an existing adapter from
  Access Keys to IAM Role would carry truthy stored access keys through
  the strip-empty-only loop, so boto3 silently authenticated with the
  old long-lived credentials instead of falling through to the host's
  IRSA / instance-profile identity. Both LLM and embedding paths were
  affected.
- Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or
  a malformed payload from a non-UI client passed through the dict
  comprehension untouched, with no enum guard.
- Cross-field validation gap: explicit Access Keys mode with blank or
  whitespace-only values silently fell through to the default
  credential chain instead of surfacing the misconfiguration.

Implementation:
- Add a module-level _resolve_bedrock_aws_credentials helper used by
  both AWSBedrockLLMParameters.validate() and
  AWSBedrockEmbeddingParameters.validate(), so the auth-type contract is
  expressed once (sketched after this commit message).
  - Validates auth_type against an allowlist (None | "access_keys" |
    "iam_role"); raises ValueError on anything else.
  - iam_role: unconditionally drops aws_access_key_id and
    aws_secret_access_key.
  - access_keys (explicit): requires non-blank values; raises ValueError
    if either is empty or whitespace-only.
  - Legacy (auth_type absent): retains the lenient strip behaviour so
    pre-PR adapter configurations continue to deserialise unchanged.
- Restore aws_region_name as required (no `= None` default) on
  AWSBedrockEmbeddingParameters; only credentials may legitimately be
  absent.
- Drop the orphan aws_profile_name field from
  embedding1/static/bedrock.json: it was added for parity with the LLM
  form but lives outside the auth_type oneOf and contradicts the
  selector's "no further input" semantics. The LLM form already had
  aws_profile_name pre-PR and is left alone for backwards compatibility.

Tests:
- New tests/test_bedrock_adapter.py covers 15 cases across LLM and
  embedding adapters: legacy-no-auth-type, explicit access_keys with
  valid/blank/whitespace keys, iam_role with stale/no keys, unknown
  auth_type rejection, cross-field validation, and preservation of
  unrelated params (model_id, aws_profile_name, region, thinking).

Skipped (P2 nice-to-have):
- Comment-scope clarification, MinIO reference rewording,
  validate-mutates-caller'\''s-dict, and the LLM form description nit
  about aws_profile_name visibility. These don'\''t change behaviour
  and can be addressed in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
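
A hedged sketch of that helper's contract — the behaviour below is paraphrased from this commit message, not copied from the repository:

```python
AWS_KEY_FIELDS = ("aws_access_key_id", "aws_secret_access_key")

def resolve_bedrock_aws_credentials(kwargs: dict) -> dict:
    """Paraphrase of _resolve_bedrock_aws_credentials: allowlist, drop, require, or lenient strip."""
    auth_type = kwargs.pop("auth_type", None)  # form-only field, never forwarded to LiteLLM
    if auth_type not in (None, "access_keys", "iam_role"):
        raise ValueError(f"Unknown auth_type: {auth_type!r}")
    if auth_type == "iam_role":
        # Unconditionally drop keys so boto3 falls through to the default chain.
        return {k: v for k, v in kwargs.items() if k not in AWS_KEY_FIELDS}
    if auth_type == "access_keys":
        # Explicit mode: blank or whitespace-only keys are a misconfiguration.
        if any(not str(kwargs.get(k) or "").strip() for k in AWS_KEY_FIELDS):
            raise ValueError("Access Keys mode requires both AWS keys")
        return kwargs
    # Legacy (auth_type absent): lenient — strip only empty values.
    return {
        k: v for k, v in kwargs.items()
        if k not in AWS_KEY_FIELDS or (v is not None and str(v).strip())
    }
```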

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>
kirtimanmishrazipstack added a commit that referenced this pull request May 7, 2026
* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS (#1938)

* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS

Production socket connections were failing for `*.env.us-central.unstract.com`
because python-socketio does exact-string comparison on `cors_allowed_origins`,
so a literal `*` pattern silently rejected every real subdomain.

- Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`.
- Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — a single
  list entry covers all wildcard subdomains, no library subclass needed (sketched below).
- Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths
  in env are stripped (also fixes the `…com//oauth-status/` double-slash).
- Add startup guard for malformed env values.

Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io,
fallback) are owned separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
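
The idea in miniature (a sketch — the real class moved to backend/utils/cors_origin.py in the follow-up commit; anything beyond what the messages state is an assumption):

```python
import re

class RegexOrigin:
    """Equality *is* the regex match, so one list entry covers every subdomain."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)

    def __eq__(self, other: object):
        if not isinstance(other, str):
            return NotImplemented
        return self._regex.fullmatch(other) is not None

    __hash__ = None  # unhashable by design; see the follow-up commit below

# python-socketio does a linear `origin in cors_allowed_origins` check:
allowed = [RegexOrigin(r"https://[a-z0-9-]+\.env\.us-central\.unstract\.com")]
print("https://app.env.us-central.unstract.com" in allowed)  # True
print("https://evil.example.com" in allowed)                 # False
```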

* UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests

Addresses five review comments on #1938:

1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize
   `Origin` headers with a lowercase host and no explicit default ports;
   `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443`
   would silently fail to match the browser's `https://app.example.com`.
   Switch to `parsed_url.hostname` + drop default ports, and reject
   non-http(s) schemes at startup.

2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match`
   plus `$`, a candidate ending in `\n` matches because `$` is allowed
   before an optional trailing newline. `fullmatch` removes the ambiguity
   (demonstrated in the sketch after this list).

3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)`
   (one fixed pattern hash vs. many matching strings). Today this is
   masked because python-socketio uses linear `__eq__` on a list, but if
   the allow-list is ever wrapped in a set, every legitimate subdomain
   would silently be rejected — exactly the failure mode UN-3439 closes.
   Make instances unhashable so the contract can't be broken.

4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py`
   (33 cases) covering: regex match/no-match, lookalike spoofing, scheme
   mismatch, trailing-newline rejection, non-string equality protocol,
   unhashability, ReDoS bounds, URL normalization (case, default ports,
   trailing slash, paths, queries), startup-guard rejections (empty,
   no-scheme, non-browser-scheme, no-host), and end-to-end via the same
   `RegexOrigin` path SocketIO uses.

5. self — Over-clever wildcard-to-regex builder. The
   `split('*').join(re.escape, ...)` construction generalised to N
   wildcards but the input has exactly one; replace with a direct rf-string
   that's self-evident on review.
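
The `$`-versus-`fullmatch` difference from item 2, concretely:

```python
import re

pattern = r"https://app\.example\.com"

# re.match + "$": "$" also matches just before a trailing newline.
print(bool(re.match(pattern + "$", "https://app.example.com\n")))  # True — spoofable
# re.fullmatch: the entire candidate string must match.
print(bool(re.fullmatch(pattern, "https://app.example.com\n")))    # False
```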

Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin`
into `backend/utils/cors_origin.py` (Django-free, importable from settings
and tests). Settings now delegates to one helper call; `log_events.py`
imports `RegexOrigin`. No behavioural change beyond what each comment fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address SonarCloud quality gate

The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:

- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
  doesn't see the implicit `__hash__` call). Drove the C reliability rating.
  Fix: use `len({ro})` so the side effect is via an explicit function call;
  test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
  doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
  tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
  These are intentional inputs proving the rejection logic. Annotate with
  `# NOSONAR` and an explanatory comment so the hotspots can be marked
  reviewed.

No production code changed; tests still 33/33 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder

Sonar S5727 correctly inferred that ``ro == None`` is statically always
False (NotImplemented falls back to identity), making the assertion look
tautological. The intent is to lock the protocol contract: ``__eq__`` must
return the ``NotImplemented`` sentinel for non-strings. Test that directly
via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound

Two minor follow-ups from the second CodeRabbit pass:

- `parsed.port` is a property that raises ValueError on malformed/out-of-range
  inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error
  message and surfaced as a stack trace. Wrap the access and re-raise with
  the same actionable text. Adds two test cases (`https://example.com:abc`,
  `https://example.com:99999`) to lock the new behaviour.

- The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to
  500ms — still orders of magnitude below what catastrophic backtracking
  would produce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.161.4 hotfix (#1943)

* Change csp to report only

* [HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (v0.161.4) (#1939)

[HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (#1937)

[FIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var

os.environ.get returns the raw string when the variable is set, so
ENABLE_HIGHLIGHT_API_DEPLOYMENT="False" was truthy in Python (any
non-empty string is truthy). Wrap in CommonUtils.str_to_bool so
"False" / "false" / "0" actually evaluate to False.

The setting is consumed by the cloud configuration plugin's spec
default (ConfigSpec.default in plugins/configuration/cloud_config.py)
on cloud and on-prem builds. With this fix, an admin who explicitly
sets the env var to a falsy string sees highlight data stripped as
expected.

Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
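
The truthiness trap, illustrated (the str_to_bool body below is a stand-in with assumed semantics, not the real CommonUtils helper):

```python
import os

os.environ["ENABLE_HIGHLIGHT_API_DEPLOYMENT"] = "False"

raw = os.environ.get("ENABLE_HIGHLIGHT_API_DEPLOYMENT")
print(bool(raw))  # True — any non-empty string is truthy, even "False"

def str_to_bool(value: str) -> bool:
    # Stand-in for CommonUtils.str_to_bool (assumed semantics).
    return str(value).strip().lower() in ("true", "1", "yes")

print(str_to_bool(raw))  # False — what the admin intended
```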

---------

Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line in uv-lock-automation workflow (#1941)

* UN-3448 [FIX] Add --system flag to uv pip install in uv-lock-automation workflow

Modern uv requires uv pip install to run inside a virtual environment OR
with the explicit --system flag. The workflow currently has neither, so
it errors out:

  error: No virtual environment found for Python 3.12.9; run `uv venv`
  to create an environment, or pass `--system` to install into a
  non-virtual environment

This breaks every PR that touches a pyproject.toml (the workflow's
paths filter triggers on those). Last successful run was 2026-04-01,
before a behaviour change in uv or astral-sh/setup-uv@v7.

The --system flag is exactly what the error message suggests and is
correct here — we install pip into the runner's system Python; the
downstream uv-lock.sh script creates its own venvs as needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line per review

Per @jaseemjaskp's review: the pre-step `uv pip install ... pip` does
nothing useful for this workflow. The downstream uv-lock.sh script
uses uv sync at line 74, which manages its own venvs internally and
never invokes pip directly:

  $ grep -rn 'pip' docker/scripts/uv-lock-gen/
  docker/scripts/uv-lock-gen/uv-lock.sh:2:set -o pipefail

Only match is pipefail (shell option), no real pip references.

Removing the line entirely is cleaner than papering over with --system.
The line was likely copy-pasted from a sibling workflow that legitimately
needed pip in the system Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.163.2 hotfix (#1946)

* [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

* [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944)

---------

Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>

* batch notification

---------

Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Praveen Kumar <praveen@zipstack.com>
Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>