
fix: Prompt studio index and run issues #1

Merged
jaseemjaskp merged 1 commit into main from fix/prompt-studio-index-and-run on Feb 26, 2024

Conversation

@chandrasekharan-zipstack
Contributor

What

  • Updated platform key retrieval logic in PromptIdeBaseTool

Why

  • Noticed issues with indexing a file and running prompts, traced to the
    platform key retrieval logic (error screenshot below)

How

...

Relevant Docs

Related Issues or PRs

Dependencies Versions / Env Variables

Notes on Testing

  • Tried indexing and running a prompt on a file

Screenshots

(two screenshots attached to the PR)

Checklist

I have read and understood the Contribution Guidelines.

@nehabagdia
Contributor

@jaseemjaskp I am not merging this yet as I want you to confirm if org_id is really mandatory and if the approach for multi-tenancy sounds OK.
Pulling in the change makes my prompt studio work OK locally.

@jaseemjaskp
Contributor

> @jaseemjaskp I am not merging this yet as I want you to confirm if org_id is really mandatory and if the approach for multi-tenancy sounds OK. Pulling in the change makes my prompt studio work OK locally.

@nehabagdia I think it is impossible to get the platform key without the org_id.

@jaseemjaskp jaseemjaskp merged commit c64479c into main Feb 26, 2024
@jaseemjaskp jaseemjaskp deleted the fix/prompt-studio-index-and-run branch February 26, 2024 07:25
pk-zipstack pushed a commit that referenced this pull request Aug 20, 2025
ritwik-g added a commit that referenced this pull request Nov 10, 2025
1. Remove deprecated acquire_slot() method (PR comment #1)
   - Method marked as deprecated, not used anywhere in codebase
   - check_and_acquire() is the only method used (atomic operation)
   - Dead code removal improves maintainability

2. Add detailed comment explaining TTL vs manual cleanup (PR comment #2)
   - Clarifies why _cleanup_expired_entries() is needed even with TTL
   - Redis ZSET TTL expires entire key, not individual entries
   - Manual cleanup (ZREMRANGEBYSCORE) removes stale entries within ZSET
   - Both mechanisms work together for accurate rate limiting
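The distinction drawn above — key-level TTL expiring the whole ZSET versus score-based pruning of individual stale entries — can be sketched with an in-memory stand-in (the `MiniZSet` class and `active_count` helper are illustrative names, not the project's real API):

```python
import time

class MiniZSet:
    """Tiny in-memory stand-in for a Redis sorted set (illustration only)."""
    def __init__(self):
        self.scores = {}  # member -> score (here, an acquisition timestamp)

    def zadd(self, member, score):
        self.scores[member] = score

    def zremrangebyscore(self, min_score, max_score):
        # Redis ZREMRANGEBYSCORE: drop members whose score is in range.
        stale = [m for m, s in self.scores.items() if min_score <= s <= max_score]
        for m in stale:
            del self.scores[m]
        return len(stale)

    def zcard(self):
        return len(self.scores)

def active_count(zset, ttl_seconds, now=None):
    """Prune entries older than ttl_seconds, then count the survivors.

    A key-level EXPIRE would delete the entire ZSET at once; pruning by
    score keeps live slots while discarding only the stale ones.
    """
    now = time.time() if now is None else now
    zset.zremrangebyscore(float("-inf"), now - ttl_seconds)
    return zset.zcard()
```

With one entry added at t=100 and one at t=200, a cleanup at t=300 with a 150-second TTL removes only the first entry, leaving an accurate active count of 1.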

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
vishnuszipstack pushed a commit that referenced this pull request Nov 10, 2025
…he and per-org locks (#1649)

* UN-2972: Implement API deployment rate limiting with Django cache and per-org locks

This commit implements a comprehensive rate limiting system for API deployments
with atomic operations, caching, and management commands.

Features:
- Organization-level and global concurrent request limits
- Per-organization Redis distributed locks to prevent race conditions
- Django cache integration for org limits (~95% reduction in DB queries)
- Automatic cache invalidation on limit updates
- Centralized constants to eliminate magic strings
- Management commands for easy administration

Technical Details:
- Uses Redis ZSET for tracking active executions with TTL-based cleanup
- Per-org locks ensure atomic check-and-acquire operations
- Global limit enforced with eventual consistency (acceptable tolerance)
- Rate limiter fails open on Redis errors for system reliability
- Retry-After header in 429 responses for proper client backoff
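The atomic check-and-acquire under a per-org lock described above can be sketched with in-process primitives (class and method names here are hypothetical; the real implementation uses Redis ZSETs and distributed locks in `backend/api_v2/rate_limiter.py`):

```python
import threading
import time
from collections import defaultdict

class OrgRateLimiter:
    """Sketch of a per-org concurrent-slot limiter.

    Holding a lock scoped to one organization keeps the check and the
    acquire atomic without serializing unrelated orgs behind a global lock.
    """
    def __init__(self, org_limit=20):
        self.org_limit = org_limit
        self._locks = defaultdict(threading.Lock)   # per-org, not global
        self._active = defaultdict(dict)            # org -> {execution_id: start_ts}

    def check_and_acquire(self, org_id, execution_id):
        with self._locks[org_id]:                   # atomic check + acquire
            slots = self._active[org_id]
            if len(slots) >= self.org_limit:        # block at >=, as the limiter does
                return False
            slots[execution_id] = time.time()
            return True

    def release(self, org_id, execution_id):
        with self._locks[org_id]:
            self._active[org_id].pop(execution_id, None)
```

With a limit of 2, the third concurrent acquisition for an org is refused until one of its active executions releases its slot; other orgs are unaffected.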

New Files:
- backend/api_v2/rate_limit_constants.py: Centralized keys and constants
- backend/api_v2/rate_limiter.py: Core rate limiting logic
- backend/api_v2/migrations/0003_add_organization_rate_limit.py: DB migration
- backend/api_v2/management/commands/set_org_rate_limit.py: Set org limits
- backend/api_v2/management/commands/get_org_rate_limit.py: View org limits
- backend/api_v2/management/commands/list_org_rate_limits.py: List all limits

Modified Files:
- backend/api_v2/models.py: OrganizationRateLimit model with auto cache clearing
- backend/api_v2/deployment_helper.py: Use atomic check_and_acquire
- backend/api_v2/admin.py: Admin interface for OrganizationRateLimit
- backend/api_v2/exceptions.py: RateLimitExceeded exception
- backend/backend/exceptions.py: Add Retry-After header to 429 responses
- backend/workflow_manager/workflow_v2/models/execution.py: Auto-release on completion
- backend/backend/settings/base.py: Rate limiting configuration
- backend/sample.env: Configuration documentation

Configuration (environment variables):
- API_DEPLOYMENT_DEFAULT_RATE_LIMIT: Default per-org limit (default: 5)
- API_DEPLOYMENT_GLOBAL_RATE_LIMIT: System-wide limit (default: 50)
- API_DEPLOYMENT_RATE_LIMIT_TTL_HOURS: ZSET entry TTL (default: 6 hours)
- API_DEPLOYMENT_RATE_LIMIT_CACHE_TTL: Cache TTL (default: 1 hour)
- API_DEPLOYMENT_RATE_LIMIT_LOCK_TIMEOUT: Lock timeout (default: 2 seconds)
- API_DEPLOYMENT_RATE_LIMIT_LOCK_BLOCKING_TIMEOUT: Lock wait (default: 5 seconds)
- API_DEPLOYMENT_RATE_LIMIT_RETRY_AFTER: Retry-After value (default: 300 seconds)

Management Commands:
- python manage.py set_org_rate_limit <org-id-or-name> <limit>
- python manage.py get_org_rate_limit <org-id-or-name> [--clear-cache]
- python manage.py list_org_rate_limits [--with-usage]

Performance Impact:
- DB queries for org limits: ~95% reduction (cached)
- Rate limit check latency: 10-20ms (acceptable tradeoff for no race conditions)
- Lock contention: Minimal (per-org, not global)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* UN-2972 [FIX] Move signal imports to top of models.py to fix Ruff E402

Fix Ruff linting errors by moving Django signal imports to the top of the file
according to PEP 8 conventions.

Changes:
- Moved `from django.db.models.signals import post_delete` to line 7
- Moved `from django.dispatch import receiver` to line 8
- Removed duplicate imports from line 268-269 (bottom of file)
- Signal handler function remains at bottom unchanged

Fixes Ruff errors:
- E402: Module level import not at top of file (line 266)
- E402: Module level import not at top of file (line 267)

Also installed pre-commit hooks locally for future commits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix organization lookup in rate limit management commands

The Organization.organization_id field is a CharField (not UUIDField),
storing custom string IDs like "org_qijtoAkJNhznYhNt". The previous
implementation attempted UUID parsing which failed for these values.

Changes:
- Remove UUID validation that assumes organization_id is a UUID
- Try organization_id lookup first (handles both UUID and string formats)
- Fall back to name lookup if organization_id lookup fails
- Update help text to remove "UUID" reference

This allows commands to work with both:
- Organization ID: python manage.py set_org_rate_limit org_qijtoAkJNhznYhNt 10
- Organization name: python manage.py set_org_rate_limit zipstack 10
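The two-step lookup described in this commit — try `organization_id` first, fall back to `name` — can be sketched without the ORM (`find_organization`, `OrgNotFoundError`, and the dict-based table are illustrative stand-ins for the Django queryset calls):

```python
class OrgNotFoundError(Exception):
    """Raised when neither organization_id nor name matches."""

def find_organization(orgs, id_or_name):
    """Resolve an org by organization_id first, then by name.

    `orgs` stands in for the ORM table: dicts with 'organization_id'
    (a CharField-style string such as 'org_qijtoAkJNhznYhNt', not
    necessarily a UUID) and 'name'. No UUID parsing is attempted, so
    custom string IDs work the same as UUID-shaped ones.
    """
    for org in orgs:                       # ~ .get(organization_id=...)
        if org["organization_id"] == id_or_name:
            return org
    for org in orgs:                       # fallback ~ .get(name=...)
        if org["name"] == id_or_name:
            return org
    raise OrgNotFoundError(id_or_name)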

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add delete_org_rate_limit management command and improve error logging

New command allows administrators to remove custom organization rate limits,
reverting organizations back to the system default limit.

Features:
- Delete custom rate limit for an organization (by ID or name)
- Confirmation prompt (can be skipped with --force)
- Shows default limit that will be used after deletion
- Warns if current usage exceeds default
- Automatic cache clearing via post_delete signal

Also improved error logging in WorkflowExecution to use logger.exception()
instead of logger.error() for better stack trace visibility.

Usage:
  python manage.py delete_org_rate_limit org_qijtoAkJNhznYhNt
  python manage.py delete_org_rate_limit zipstack --force

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add cache clearing command and improve cache TTL behavior

New Features:
1. Management command to clear organization rate limit cache
   - Clear specific org: --org-id <org_id>
   - Clear all with custom limits: (default)
   - Clear ALL orgs: --all
   - Uses Redis pattern deletion (rate_limit:cache:org_limit:*)
     for performance, falls back to individual deletion

2. Reduced cache TTL from 1 hour to 10 minutes
   - Allows faster pickup of default limit changes from ENV
   - Still provides significant DB query reduction

3. Implement cache TTL refresh on every read
   - TTL is extended by 10 minutes on every cache hit
   - Frequently-used orgs stay cached indefinitely
   - Inactive orgs expire after 10 minutes
   - LRU-like behavior without explicit LRU cache
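The sliding-TTL behaviour in item 3 — every cache hit extends the entry's lifetime, so hot orgs stay cached while idle ones expire — can be sketched as follows (the `RefreshingCache` class is a hypothetical stand-in for the Django-cache-backed logic):

```python
import time

class RefreshingCache:
    """Sliding-TTL cache sketch: a hit re-arms the entry's expiry,
    giving LRU-like retention without an explicit LRU structure."""
    def __init__(self, ttl_seconds=600):   # 10-minute TTL, as in the commit
        self.ttl = ttl_seconds
        self._store = {}                   # key -> (value, expires_at)

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[1] <= now:
            self._store.pop(key, None)     # expired: drop and miss
            return None
        self._store[key] = (entry[0], now + self.ttl)  # refresh TTL on read
        return entry[0]
```

A key set at t=0 and read at t=500 and t=1000 stays alive (each read pushes expiry 600s out), but with no reads after t=1000 it is gone by t=1700.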

Usage:
  # Clear specific organization cache
  python manage.py clear_org_rate_limit_cache --org-id org_qijtoAkJNhznYhNt

  # Clear cache for all orgs with custom limits
  python manage.py clear_org_rate_limit_cache

  # Clear cache for ALL organizations (uses pattern deletion if Redis)
  python manage.py clear_org_rate_limit_cache --all

This is useful when changing API_DEPLOYMENT_DEFAULT_RATE_LIMIT to ensure
organizations pick up the new default value immediately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add comprehensive rate limiting documentation and update default limits

Documentation:
- Created docs/API_DEPLOYMENT_RATE_LIMITING.md with:
  - Architecture overview (per-org + global limits, Redis locks, cache)
  - All 7 ENV variables with detailed explanations
  - All 5 management commands with usage examples
  - Usage scenarios and best practices
  - Troubleshooting guide
  - Performance characteristics and security considerations
  - Confirms rate limiting ONLY affects API deployments (not ETL/tasks)

Default Limit Updates (3 locations):
- Organization limit: 5 → 20 concurrent requests
- Global limit: 50 → 100 concurrent requests
- Updated in:
  1. backend/api_v2/rate_limit_constants.py (RateLimitDefaults)
  2. backend/sample.env (environment variable defaults)
  3. backend/backend/settings/base.py (Django setting defaults)

Also updated cache TTL default in base.py: 3600s → 600s (10 minutes)
to match the change made in previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use centralized constants and remove Retry-After functionality

Issues Fixed:
1. RateLimitExceeded was not using centralized RateLimitMessages constants
2. Response format documentation didn't match actual drf-standardized-errors format
3. Retry-After header and retry_after_seconds were not useful (hard to predict accurately)

Changes:
- RateLimitExceeded now uses RateLimitMessages.get_org_limit_exceeded_message()
  and get_global_limit_exceeded_message() for consistent error messages
- Removed retry_after_seconds parameter from RateLimitExceeded.__init__()
- Removed retry_after_seconds from rate_limiter.py return dicts
- Removed retry_after_seconds from deployment_helper.py exception call
- Removed dead Retry-After header code from backend/exceptions.py
- Removed API_DEPLOYMENT_RATE_LIMIT_RETRY_AFTER from:
  - sample.env
  - settings/base.py
  - rate_limit_constants.py (DEFAULT_RETRY_AFTER_SECONDS)
- Updated documentation to show actual drf-standardized-errors response format:
  {
    "type": "client_error",
    "errors": [{"code": "error", "detail": "...", "attr": null}]
  }
- Removed all Retry-After references from documentation

Clients should implement their own retry logic with exponential backoff.
Rate limits are released when active requests complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix late imports and comparison bug in rate limit management commands

- Move APIDeploymentRateLimiter import to module top (PEP 8 compliance)
- Fix off-by-one warning bug: change > to >= to match rate limiter blocking logic
- Affects: delete_org_rate_limit, set_org_rate_limit, get_org_rate_limit, list_org_rate_limits

The rate limiter blocks when usage >= limit, so warnings must also use >= for consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Refactor rate limiting to view layer with dual release paths

**Problem:**
Review bot identified critical bug where early failures (DB, file staging, queue dispatch)
would leak rate limit slots for 6 hours, potentially exhausting org/global quotas.

**Root Cause:**
- Rate limiting was in deployment_helper.py
- Helper has internal try-catch that swallows exceptions and returns error response
- Exceptions that don't propagate can't be caught by outer handlers
- No slot release on these failures → orphaned slots

**Solution:**
Move rate limiting to view layer with dual release strategy:

1. **View Layer (api_deployment_views.py)**
   - Acquires rate limit slot before calling execute_workflow()
   - Wraps execute_workflow() in try-catch
   - Releases slot if exception propagates (early setup failures)

2. **Helper Layer (deployment_helper.py)**
   - Accepts optional execution_id parameter
   - Existing exception handler now also releases slot
   - Handles failures in async dispatch that don't propagate

3. **Signal (workflow completion)**
   - Unchanged - releases slot when async job completes successfully

**Coverage:**
✅ Lines 197-241 failures (Tag creation, WorkflowExecution, file staging) → View catches & releases
✅ Lines 243-289 failures (Async dispatch, config checks) → Helper catches & releases
✅ Async job completion → Signal releases
✅ No double-release (each path has one release point)
✅ No orphaned slots on any error path

**Files Changed:**
- api_deployment_views.py: Add rate limit check + blanket exception handler
- deployment_helper.py: Accept execution_id param, add slot release to exception handler
- API_DEPLOYMENT_RATE_LIMITING.md: Update architecture diagram and code locations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use logger.exception() to capture full traceback in rate limit exception handler

Changed logger.error() to logger.exception() in api_deployment_views.py exception
handler to automatically include the full stack trace. This provides better debugging
information when workflow execution fails during rate limit protected operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update documentation with generic organization IDs

* Address PR review comments: remove dead code and clarify TTL behavior

1. Remove deprecated acquire_slot() method (PR comment #1)
   - Method marked as deprecated, not used anywhere in codebase
   - check_and_acquire() is the only method used (atomic operation)
   - Dead code removal improves maintainability

2. Add detailed comment explaining TTL vs manual cleanup (PR comment #2)
   - Clarifies why _cleanup_expired_entries() is needed even with TTL
   - Redis ZSET TTL expires entire key, not individual entries
   - Manual cleanup (ZREMRANGEBYSCORE) removes stale entries within ZSET
   - Both mechanisms work together for accurate rate limiting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ritwik-g pushed a commit that referenced this pull request May 5, 2026
…o CORS (#1938)

* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS

Production socket connections were failing for `*.env.us-central.unstract.com`
because python-socketio does exact-string comparison on `cors_allowed_origins`,
so a literal `*` pattern silently rejected every real subdomain.

- Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`.
- Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single
  list entry covers all wildcard subdomains, no library subclass needed.
- Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths
  in env are stripped (also fixes the `…com//oauth-status/` double-slash).
- Add startup guard for malformed env values.
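The `_RegexOrigin` trick above — a single list entry whose `__eq__` runs a regex, so python-socketio's exact-string comparison accepts every wildcard subdomain — can be sketched like this (incorporating the `fullmatch` and unhashability fixes from the follow-up commits; the real class lives in `backend/utils/cors_origin.py`, and the domain pattern here is just an example):

```python
import re

class RegexOrigin:
    """Wildcard-origin matcher for exact-string allow-lists.

    python-socketio compares the request's Origin against
    cors_allowed_origins with ==, so one instance whose __eq__ runs a
    regex covers all wildcard subdomains with no library subclass.
    """
    # Unhashable on purpose: one fixed pattern would hash differently
    # from the many strings it equals, breaking the a == b => hash(a) ==
    # hash(b) contract if the allow-list were ever put in a set.
    __hash__ = None

    def __init__(self, pattern):
        self._regex = re.compile(pattern)

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented          # protocol contract for non-strings
        # fullmatch, not match + $: with re.match a candidate ending in
        # "\n" would still be accepted, since $ allows a trailing newline.
        return self._regex.fullmatch(other) is not None

wildcard = RegexOrigin(r"https://[a-z0-9-]+\.env\.us-central\.unstract\.com")
```

Membership tests against a plain list then behave as the library expects: a real subdomain origin is found, a newline-suffixed lookalike is not, and hashing the matcher raises `TypeError`.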

Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io,
fallback) are owned separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests

Addresses five review comments on #1938:

1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize
   `Origin` headers with a lowercase host and no explicit default ports;
   `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443`
   would silently fail to match the browser's `https://app.example.com`.
   Switch to `parsed_url.hostname` + drop default ports, and reject
   non-http(s) schemes at startup.

2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match`
   plus `$`, a candidate ending in `\n` matches because `$` is allowed
   before an optional trailing newline. `fullmatch` removes the ambiguity.

3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)`
   (one fixed pattern hash vs. many matching strings). Today this is
   masked because python-socketio uses linear `__eq__` on a list, but if
   the allow-list is ever wrapped in a set, every legitimate subdomain
   would silently be rejected — exactly the failure mode UN-3439 closes.
   Make instances unhashable so the contract can't be broken.

4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py`
   (33 cases) covering: regex match/no-match, lookalike spoofing, scheme
   mismatch, trailing-newline rejection, non-string equality protocol,
   unhashability, ReDoS bounds, URL normalization (case, default ports,
   trailing slash, paths, queries), startup-guard rejections (empty,
   no-scheme, non-browser-scheme, no-host), and end-to-end via the same
   `RegexOrigin` path SocketIO uses.

5. self — Over-clever wildcard-to-regex builder. The
   `split('*').join(re.escape, ...)` construction generalised to N
   wildcards but the input has exactly one; replace with a direct rf-string
   that's self-evident on review.

Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin`
into `backend/utils/cors_origin.py` (Django-free, importable from settings
and tests). Settings now delegates to one helper call; `log_events.py`
imports `RegexOrigin`. No behavioural change beyond what each comment fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address SonarCloud quality gate

The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:

- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
  doesn't see the implicit `__hash__` call). Drove the C reliability rating.
  Fix: use `len({ro})` so the side effect is via an explicit function call;
  test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
  doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
  tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
  These are intentional inputs proving the rejection logic. Annotate with
  `# NOSONAR` and an explanatory comment so the hotspots can be marked
  reviewed.

No production code changed; tests still 33/33 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder

Sonar S5727 correctly inferred that ``ro == None`` is statically always
False (NotImplemented falls back to identity), making the assertion look
tautological. The intent is to lock the protocol contract: ``__eq__`` must
return the ``NotImplemented`` sentinel for non-strings. Test that directly
via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound

Two minor follow-ups from the second CodeRabbit pass:

- `parsed.port` is a property that raises ValueError on malformed/out-of-range
  inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error
  message and surfaced as a stack trace. Wrap the access and re-raise with
  the same actionable text. Adds two test cases (`https://example.com:abc`,
  `https://example.com:99999`) to lock the new behaviour.

- The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to
  500ms — still orders of magnitude below what catastrophic backtracking
  would produce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kirtimanmishrazipstack added a commit that referenced this pull request May 7, 2026
* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS (#1938)

* UN-3439 [FIX] Accept wildcard subdomain origins in SocketIO and Django CORS

Production socket connections were failing for `*.env.us-central.unstract.com`
because python-socketio does exact-string comparison on `cors_allowed_origins`,
so a literal `*` pattern silently rejected every real subdomain.

- Add `CORS_ALLOWED_ORIGIN_REGEXES` derived from `WEB_APP_ORIGIN_URL_WITH_WILD_CARD`.
- Wire SocketIO via `_RegexOrigin` whose `__eq__` does the regex match — single
  list entry covers all wildcard subdomains, no library subclass needed.
- Normalize `WEB_APP_ORIGIN_URL` through `urlparse` so trailing slashes / paths
  in env are stripped (also fixes the `…com//oauth-status/` double-slash).
- Add startup guard for malformed env values.

Resolves item #1 of UN-3439. Items #2/#3 (decoupling indexing from Socket.io,
fallback) are owned separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address PR review: canonical origin, fullmatch, unhashable RegexOrigin, tests

Addresses five review comments on #1938:

1. coderabbitai (Major) — RFC 6454 canonicalization. Browsers serialize
   `Origin` headers with a lowercase host and no explicit default ports;
   `parsed_url.netloc` preserved both, so `https://APP.EXAMPLE.COM:443`
   would silently fail to match the browser's `https://app.example.com`.
   Switch to `parsed_url.hostname` + drop default ports, and reject
   non-http(s) schemes at startup.

2. greptile (P2) — `re.fullmatch` instead of `re.match`. With `re.match`
   plus `$`, a candidate ending in `\n` matches because `$` is allowed
   before an optional trailing newline. `fullmatch` removes the ambiguity.

3. self — `_RegexOrigin.__hash__` violated `a == b ⇒ hash(a) == hash(b)`
   (one fixed pattern hash vs. many matching strings). Today this is
   masked because python-socketio uses linear `__eq__` on a list, but if
   the allow-list is ever wrapped in a set, every legitimate subdomain
   would silently be rejected — exactly the failure mode UN-3439 closes.
   Make instances unhashable so the contract can't be broken.

4. self — No regression tests. Add `backend/utils/tests/test_cors_origin.py`
   (33 cases) covering: regex match/no-match, lookalike spoofing, scheme
   mismatch, trailing-newline rejection, non-string equality protocol,
   unhashability, ReDoS bounds, URL normalization (case, default ports,
   trailing slash, paths, queries), startup-guard rejections (empty,
   no-scheme, non-browser-scheme, no-host), and end-to-end via the same
   `RegexOrigin` path SocketIO uses.

5. self — Over-clever wildcard-to-regex builder. The
   `split('*').join(re.escape, ...)` construction generalised to N
   wildcards but the input has exactly one; replace with a direct rf-string
   that's self-evident on review.

Refactor for testability: extract `RegexOrigin` and `normalize_web_app_origin`
into `backend/utils/cors_origin.py` (Django-free, importable from settings
and tests). Settings now delegates to one helper call; `log_events.py`
imports `RegexOrigin`. No behavioural change beyond what each comment fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address SonarCloud quality gate

The Sonar quality gate failed with C reliability + 5 security hotspots, all
on the new test file:

- S905 (Bug, Major) — `{ro}` flagged as no-side-effect statement (Sonar
  doesn't see the implicit `__hash__` call). Drove the C reliability rating.
  Fix: use `len({ro})` so the side effect is via an explicit function call;
  test still asserts the same `TypeError`.
- S5727 (Code Smell, Critical) — `assert ro != None` is tautological and
  doesn't exercise `__eq__`. Switch to `(ro == None) is False` which directly
  tests that `NotImplemented` falls back to identity-equality.
- S5332 × 5 (Hotspots) — `http://` and `ftp://` literals in test data.
  These are intentional inputs proving the rejection logic. Annotate with
  `# NOSONAR` and an explanatory comment so the hotspots can be marked
  reviewed.

No production code changed; tests still 33/33 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Remove last S5727 code smell — test __eq__ via dunder

Sonar S5727 correctly inferred that ``ro == None`` is statically always
False (NotImplemented falls back to identity), making the assertion look
tautological. The intent is to lock the protocol contract: ``__eq__`` must
return the ``NotImplemented`` sentinel for non-strings. Test that directly
via ``ro.__eq__(None) is NotImplemented`` instead of going through ``==``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3439 [FIX] Address remaining CodeRabbit nits — port validation, ReDoS bound

Two minor follow-ups from the second CodeRabbit pass:

- `parsed.port` is a property that raises ValueError on malformed/out-of-range
  inputs (e.g. `:abc`, `:99999`). That bypassed our normalized config-error
  message and surfaced as a stack trace. Wrap the access and re-raise with
  the same actionable text. Adds two test cases (`https://example.com:abc`,
  `https://example.com:99999`) to lock the new behaviour.

- The 50ms ReDoS timing bound is too tight for noisy CI runners. Loosen to
  500ms — still orders of magnitude below what catastrophic backtracking
  would produce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.161.4 hotfix (#1943)

* Change csp to report only

* [HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (v0.161.4) (#1939)

[HOTFIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var (#1937)

[FIX] Bool-parse ENABLE_HIGHLIGHT_API_DEPLOYMENT env var

os.environ.get returns the raw string when the variable is set, so
ENABLE_HIGHLIGHT_API_DEPLOYMENT="False" was truthy in Python (any
non-empty string is truthy). Wrap in CommonUtils.str_to_bool so
"False" / "false" / "0" actually evaluate to False.

The setting is consumed by the cloud configuration plugin's spec
default (ConfigSpec.default in plugins/configuration/cloud_config.py)
on cloud and on-prem builds. With this fix, an admin who explicitly
sets the env var to a falsy string sees highlight data stripped as
expected.

Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line in uv-lock-automation workflow (#1941)

* UN-3448 [FIX] Add --system flag to uv pip install in uv-lock-automation workflow

Modern uv requires uv pip install to run inside a virtual environment OR
with the explicit --system flag. The workflow currently has neither, so
it errors out:

  error: No virtual environment found for Python 3.12.9; run `uv venv`
  to create an environment, or pass `--system` to install into a
  non-virtual environment

This breaks every PR that touches a pyproject.toml (the workflow's
paths filter triggers on those). Last successful run was 2026-04-01,
before a behaviour change in uv or astral-sh/setup-uv@v7.

The --system flag is exactly what the error message suggests and is
correct here — we install pip into the runner's system Python; the
downstream uv-lock.sh script creates its own venvs as needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3448 [FIX] Remove vestigial `uv pip install` line per review

Per @jaseemjaskp's review: the pre-step `uv pip install ... pip` does
nothing useful for this workflow. The downstream uv-lock.sh script
uses uv sync at line 74, which manages its own venvs internally and
never invokes pip directly:

  $ grep -rn 'pip' docker/scripts/uv-lock-gen/
  docker/scripts/uv-lock-gen/uv-lock.sh:2:set -o pipefail

Only match is pipefail (shell option), no real pip references.

Removing the line entirely is cleaner than papering over with --system.
The line was likely copy-pasted from a sibling workflow that legitimately
needed pip in the system Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ReverseMerge: V0.163.2 hotfix (#1946)

* [HOTFIX] Use importlib.util.find_spec for pluggable worker discovery (#1918)

* [FIX] Use importlib.util.find_spec for pluggable worker discovery

_verify_pluggable_worker_exists() previously checked for the literal file
`pluggable_worker/<name>/worker.py` on disk, which breaks when the plugin
has been compiled to a .so (Nuitka, Cython, or any C extension) — the
module is perfectly importable but the pre-check rejects it because only
the .py extension is considered.

Replace the filesystem check with importlib.util.find_spec(), which is
Python's standard way to ask "is this module resolvable by the import
system?". It honors every registered finder — source .py, compiled .so,
bytecode .pyc, namespace packages, zipimports — so the function now
matches what its docstring claims: verifying the module can be loaded,
not that a specific file extension is present.

Behavior is preserved for existing deployments:
- Images with no `pluggable_worker/<name>/` subpackage → find_spec
  raises ModuleNotFoundError (ImportError subclass) → returns False.
- Images with source .py → find_spec resolves the .py → returns True.
- Images with compiled .so → find_spec resolves the .so → returns True.
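The replacement check can be sketched as below (function name is illustrative; it also folds in the `ValueError` edge case handled by the follow-up commit):

```python
import importlib.util

def verify_pluggable_worker_exists(name):
    """Ask the import system whether pluggable_worker.<name>.worker is
    resolvable.

    find_spec honours every registered finder — source .py, compiled .so,
    bytecode .pyc, namespace packages, zipimports — unlike a filesystem
    check for a literal worker.py file.
    """
    module = f"pluggable_worker.{name}.worker"
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # ImportError: the parent package is missing entirely.
        # ValueError: sys.modules holds a partially initialised entry
        # with __spec__ = None left over from a prior failed import.
        return False
```

On an image without a `pluggable_worker` package this returns False via the caught `ModuleNotFoundError`, matching the old behaviour for missing plugins while now accepting compiled modules.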

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Handle ValueError from find_spec in pluggable worker verification

Greptile-flagged edge case: importlib.util.find_spec() can raise
ValueError (not just ImportError) when sys.modules has a partially
initialised module entry with __spec__ = None from a prior failed import.
Broaden the except to catch both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Resolve api-deployment worker directory from enum import path

worker.py:452 did worker_type.value.replace("-", "_") to derive the
on-disk dir name. All WorkerType enum values already use underscores,
so the replace was a no-op; for API_DEPLOYMENT whose dir is
"api-deployment" (hyphen), it resolved to "api_deployment" and the
os.path.exists() check failed. Boot then logged a spurious
"❌ Worker directory not found: /app/api_deployment" at ERROR level.

The task registration path (builder + celery autodiscover via
to_import_path) is unaffected, so this was purely log noise — but
noise at ERROR level that masks real failures in log scans.

Fix: derive the directory from the authoritative to_import_path()
which already handles the hyphen case (api_deployment -> api-deployment).
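
The no-op is easy to see in isolation (the literal values here mirror the commit's description of the enum and directory names):

```python
# Enum values already use underscores, so this replace() never changes
# anything...
assert "api_deployment".replace("-", "_") == "api_deployment"

# ...while the directory actually shipped in the image is hyphenated,
# so the derived path never matches and exists() fails spuriously.
derived_dir = "api_deployment"
actual_dir = "api-deployment"
assert derived_dir != actual_dir
```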

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [HOTFIX] Add IAM Role / Instance Profile auth mode to AWS Bedrock adapter (#1944)

* [FEAT] Allow Bedrock to fall through to boto3's default credential chain

Match the S3/MinIO connector pattern: when AWS access keys are left blank
on the Bedrock LLM and embedding adapter forms, drop them from the kwargs
dict so boto3's default credential chain handles authentication. This
unlocks IAM role / instance profile / IRSA / AWS Profile scenarios on
hosts that already have ambient AWS credentials (e.g. EKS workers with
IRSA, EC2 with an instance profile).

- llm1/static/bedrock.json: clarify access-key descriptions to mention
  IRSA and instance profile (already non-required at v0.163.2 base).
- embedding1/static/bedrock.json: drop aws_access_key_id and
  aws_secret_access_key from top-level required; same description fix;
  expose aws_profile_name for parity with the LLM form.
- base1.py: AWSBedrockLLMParameters and AWSBedrockEmbeddingParameters
  now strip empty access-key values from the validated kwargs before
  returning, so empty strings don't override boto3's default chain.
  AWSBedrockEmbeddingParameters fields gain explicit None defaults
  and an aws_profile_name field.

Backward-compatible: existing adapters with access keys filled in
continue to work unchanged.
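
A minimal sketch of the strip-empty-keys behaviour described above (the helper name and exact field set are assumptions; only the pattern of dropping blank values so boto3's default chain applies comes from the commit):

```python
_AWS_KEY_FIELDS = ("aws_access_key_id", "aws_secret_access_key")


def strip_empty_aws_keys(kwargs: dict) -> dict:
    """Drop blank AWS access-key entries so boto3 falls back to its
    default credential chain (env vars, shared config, IRSA, instance
    profile) instead of receiving empty-string credentials."""
    return {
        k: v
        for k, v in kwargs.items()
        if k not in _AWS_KEY_FIELDS or (v is not None and v.strip())
    }


params = {
    "aws_region_name": "us-east-1",
    "aws_access_key_id": "",
    "aws_secret_access_key": None,
}
print(strip_empty_aws_keys(params))  # {'aws_region_name': 'us-east-1'}
```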

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Add Authentication Type selector to Bedrock adapter form

Add an explicit `auth_type` selector with two options, making the auth
choice clear to users:

- "Access Keys" (default): existing flow, keys required
- "IAM Role / Instance Profile (on-prem AWS only)": no fields; relies on
  boto3's default credential chain (IRSA on EKS, task role on ECS,
  instance profile on EC2). Description on the selector explicitly notes
  this option is only for AWS-hosted Unstract deployments.

The form-only auth_type field is stripped before LiteLLM validation in
both AWSBedrockLLMParameters.validate() and AWSBedrockEmbeddingParameters.
validate(). Empty access keys continue to be stripped so boto3 falls
through to the default chain even when the access_keys arm is selected
without values (matches the S3/MinIO connector pattern).

Backward-compatible: legacy adapters without auth_type behave as
"Access Keys" mode (the default), and existing keys are forwarded
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [REVIEW] Address Bedrock auth_type review feedback

Fixes the P0/P1 issues raised by greptile-apps and jaseemjaskp on
PR #1944.

Behaviour fixes:
- Stale-key leak in IAM Role mode: switching an existing adapter from
  Access Keys to IAM Role would carry truthy stored access keys through
  the strip-empty-only loop, so boto3 silently authenticated with the
  old long-lived credentials instead of falling through to the host's
  IRSA / instance-profile identity. Both LLM and embedding paths were
  affected.
- Silent acceptance of unknown auth_type: a typo (e.g. "access_key") or
  a malformed payload from a non-UI client passed through the dict
  comprehension untouched, with no enum guard.
- Cross-field validation gap: explicit Access Keys mode with blank or
  whitespace-only values silently fell through to the default
  credential chain instead of surfacing the misconfiguration.

Implementation:
- Add a module-level _resolve_bedrock_aws_credentials helper used by
  both AWSBedrockLLMParameters.validate() and
  AWSBedrockEmbeddingParameters.validate(), so the auth-type contract
  is expressed once.
  - Validates auth_type against an allowlist (None | "access_keys" |
    "iam_role"); raises ValueError on anything else.
  - iam_role: unconditionally drops aws_access_key_id and
    aws_secret_access_key.
  - access_keys (explicit): requires non-blank values; raises ValueError
    if either is empty or whitespace-only.
  - Legacy (auth_type absent): retains the lenient strip behaviour so
    pre-PR adapter configurations continue to deserialise unchanged.
- Restore aws_region_name as required (no `= None` default) on
  AWSBedrockEmbeddingParameters; only credentials may legitimately be
  absent.
- Drop the orphan aws_profile_name field from
  embedding1/static/bedrock.json: it was added for parity with the LLM
  form but lives outside the auth_type oneOf and contradicts the
  selector's "no further input" semantics. The LLM form already had
  aws_profile_name pre-PR and is left alone for backwards compatibility.

Tests:
- New tests/test_bedrock_adapter.py covers 15 cases across LLM and
  embedding adapters: legacy-no-auth-type, explicit access_keys with
  valid/blank/whitespace keys, iam_role with stale/no keys, unknown
  auth_type rejection, cross-field validation, and preservation of
  unrelated params (model_id, aws_profile_name, region, thinking).

Skipped (P2 nice-to-have):
- Comment-scope clarification, MinIO reference rewording,
  validate-mutates-caller's-dict, and the LLM form description nit
  about aws_profile_name visibility. These don't change behaviour
  and can be addressed in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

---------

Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>

* batch notification

---------

Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
Co-authored-by: vishnuszipstack <117254672+vishnuszipstack@users.noreply.github.com>
Co-authored-by: Praveen Kumar <praveen@zipstack.com>
Co-authored-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Athul <89829560+athul-rs@users.noreply.github.com>
ritwik-g pushed a commit that referenced this pull request May 8, 2026
* UN-3450 [FEAT] Add golden-path smoke test for callback worker

Adds workers/tests/test_callback_sanity.py mirroring the existing
test_executor_sanity.py pattern. Covers:

- Worker enums and registry (WorkerType.CALLBACK, QueueName.CALLBACK +
  CALLBACK_API, TaskName.PROCESS_BATCH_CALLBACK, health port 8083,
  WorkerRegistry queue config + task routing).
- Celery task wiring (process_batch_callback, process_batch_callback_api,
  healthcheck — registration, retry config, max_retries, autoretry_for).
- Full dispatch -> task -> return round-trip via Celery eager mode,
  using the simple healthcheck task (process_batch_callback itself
  needs a configured InternalAPIClient + downstream HTTP, which is
  heavy mocking territory unsuitable for a smoke test — covered later
  in #1.2 characterisation suite).

Coverage gains on callback module:
- callback/__init__.py:  0% -> 100%
- callback/worker.py:    0% -> 52%
- callback/tasks.py:     0% -> 10%
- Module total:          0% -> 12%

14 tests, run in ~17s. Regression net for spine PRs #6 (CallbackStatus
enum migration) and #13 (chord -> Barrier lift in callback chain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3450 [FEAT] Address Greptile review on callback smoke test

Two P2 findings from Greptile, both fixed:

1. test_healthcheck_result_is_json_serializable: drop the `default=str`
   fallback in json.dumps. With it, any non-JSON-serializable value
   (UUID, datetime, custom object) gets silently coerced to a string —
   exactly the failure mode the test claims to catch. Without it, the
   assertion faithfully tests "round-trips cleanly via JSON" the way
   Celery's serializer would.

2. Add registration check for `process_batch_callback_django_compat`
   (the backward-compat task name
   `workflow_manager.workflow_v2.file_execution_tasks.process_batch_callback`).
   Both this task and `process_batch_callback` delegate to
   `_process_batch_callback_core`, so refactors that touch the core
   (e.g. the upcoming CallbackStatus enum migration) affect this path
   too. Three-line addition.

15 tests now (was 14), runtime 11s.
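
The `default=str` pitfall is easy to demonstrate: with the fallback, a UUID silently round-trips as a string; without it, json.dumps raises, which is exactly what the serializability test needs to observe:

```python
import json
import uuid

value = uuid.uuid4()

# With default=str the UUID is silently coerced to a string -- the
# exact failure mode the serializability test is supposed to catch.
coerced = json.dumps({"id": value}, default=str)

# Without the fallback, a non-JSON-serializable value raises TypeError,
# faithfully mirroring what Celery's JSON serializer would do.
try:
    json.dumps({"id": value})
    strict_ok = True
except TypeError:
    strict_ok = False

print(strict_ok)  # False
```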

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>