removed translator, PII redactor and workflow settings#11
Closed
muhammad-ali-e wants to merge 2 commits intomainfrom
Closed
removed translator, PII redactor and workflow settings#11muhammad-ali-e wants to merge 2 commits intomainfrom
muhammad-ali-e wants to merge 2 commits intomainfrom
Conversation
Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
chandrasekharan-zipstack
added a commit
that referenced
this pull request
Apr 17, 2026
… wrappers Address review comments on PR #1886: - #9 (typing): call_with_retry / acall_with_retry / iter_with_retry previously returned `object`, erasing caller type info. Add PEP 695 generics so the return type flows from the wrapped callable: acall_with_retry now takes Callable[[], Awaitable[T]] and iter_with_retry takes Callable[[], Iterable[T]] -> Generator[T, ...]. - #11 / #13 (DRY): `_pop_retry_params` in embedding.py and `_disable_litellm_retry` in llm.py were identical logic. Lift to shared `pop_litellm_retry_kwargs` helper in retry_utils.py and delete both methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
muhammad-ali-e
added a commit
that referenced
this pull request
Apr 17, 2026
* [FIX] Unified retry for LLM and embedding providers litellm's retry only works for SDK-based providers (OpenAI/Azure). httpx-based providers (Anthropic, Vertex, Bedrock, Mistral) and ALL embedding calls silently ignore max_retries. This adds self-managed retry with exponential backoff at the SDK layer, disabling litellm's own retry entirely for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [REFACTOR] DRY retry logic into reusable call_with_retry utilities Move retry loops out of LLM/Embedding classes into generic call_with_retry, acall_with_retry, and iter_with_retry functions in retry_utils.py. Both classes now call these directly instead of maintaining their own retry helper methods. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [FIX] Consolidate retry logic, expose max_retries for all adapters - Extract _get_retry_delay() shared helper to eliminate duplicated retry decision logic across call_with_retry, acall_with_retry, iter_with_retry, and retry_with_exponential_backoff - Add num_retries=0 to embedding._pop_retry_params() to fully disable litellm's internal retry for embedding calls - Expose max_retries in UI JSON schemas for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM — previously the field existed in Pydantic models but wasn't shown to users, silently defaulting to 0 retries - Add debug logging to LLM and Embedding retry parameter extraction - Clarify docstrings distinguishing is_retryable_litellm_error() from is_retryable_error() (different exception hierarchies) - Remove stale noqa: C901 from simplified retry_with_exponential_backoff Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [FIX] Set max_retries default to 3 for all embedding and Ollama LLM adapters Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [FIX] Address greptile review: fix shadowed ConnectionError, use MRO check - Fix `requests.ConnectionError` shadowing Python's builtin `ConnectionError` in `is_retryable_litellm_error()` — rename import to `RequestsConnectionError` and use `builtins.ConnectionError` / `builtins.TimeoutError` explicitly - Use `__mro__`-based class name check instead of `type(error).__name__` to also catch subclasses of retryable error types - P1 (num_retries not zeroed) was already fixed in prior commit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [FIX] Address CodeRabbit review: add APITimeoutError, validate max_retries - Add APITimeoutError to _RETRYABLE_ERROR_NAMES for explicit OpenAI SDK timeout coverage - Add _validate_max_retries() guard to call_with_retry, acall_with_retry, iter_with_retry to fail fast on negative values instead of silently returning None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * UN-3344 [FIX] Reduce cognitive complexity and remove useless except clause Address SonarCloud findings on PR #1886: - S3776: Flatten retry_with_exponential_backoff.wrapper by moving the success logging + return out of the try block and using `continue` in the retry path, so the except branch only handles the give-up case. - S2737: Drop the `except Exception: raise` clause — it was a no-op that added complexity without changing behavior (non-matching exceptions propagate naturally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3344 [FIX] Extract retry loop to top-level helper to drop cognitive complexity Sonar still flagged retry_with_exponential_backoff at complexity 16 after the previous flatten. Nested def decorator / def wrapper counted against the outer function's score. Move the retry body to a module-level _invoke_with_retries helper so the decorator factory just delegates, bringing the outer function well under the 15 threshold. Behavior is unchanged — all paths (success, retry, give-up, non-retryable propagate) are preserved and covered by the existing SDK1 tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3344 [FIX] Honor Retry-After, close stream gen on retry, share give-up log Address review comments on PR #1886: - #10 (resource leak): close the generator returned by fn() before retrying in iter_with_retry — otherwise streaming providers leak an in-flight HTTP socket until GC. - #12 (behavioral regression): when we zero out SDK/wrapper retries we also lose the OpenAI SDK's native Retry-After handling on 429/503. _get_retry_delay now checks error.response.headers["retry-after"] and uses that value ahead of exponential backoff. HTTP-date form is not parsed; those fall back to backoff. - #8 (observability gap): move the "Giving up ... after N attempt(s)" log into _get_retry_delay so all four retry helpers (call_with_retry, acall_with_retry, iter_with_retry, decorator) share the same exhaustion signal. Previously only the decorator path logged it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3344 [REFACTOR] Share retry-kwargs helper and add TypeVar to retry wrappers Address review comments on PR #1886: - #9 (typing): call_with_retry / acall_with_retry / iter_with_retry previously returned `object`, erasing caller type info. Add PEP 695 generics so the return type flows from the wrapped callable: acall_with_retry now takes Callable[[], Awaitable[T]] and iter_with_retry takes Callable[[], Iterable[T]] -> Generator[T, ...]. - #11 / #13 (DRY): `_pop_retry_params` in embedding.py and `_disable_litellm_retry` in llm.py were identical logic. Lift to shared `pop_litellm_retry_kwargs` helper in retry_utils.py and delete both methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
11 tasks
muhammad-ali-e
added a commit
that referenced
this pull request
Apr 28, 2026
Two valid nitpicks from coderabbit on the previous commits:
* ``_coerce_id`` now explicitly rejects ``bool`` before the ``int``
check. Without the guard, ``True``/``False`` would coerce to the
literal strings ``"True"``/``"False"`` because ``bool`` is a
subclass of ``int`` in Python.
* ``_bind_task_context`` now logs at DEBUG with ``exc_info=True``
when ``_extract_request_id`` raises, instead of swallowing the
exception silently. Behaviour is unchanged in production
(request_id still falls back to task_id); the log line lets
operators diagnose malformed payloads from the executor pod.
Skipped coderabbit's third suggestion (replace ``@functools.lru_cache``
with explicit ``threading.Lock`` + flag) -- contradicts the explicit
preference in PR review #11 for the simpler lru_cache pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ritwik-g
pushed a commit
that referenced
this pull request
May 8, 2026
* UN-3435 [FIX] Bind request_id and trace context to worker logs
The new workers service was emitting `request_id:- trace_id:- span_id:-`
in every log line, blocking ops debugging. The plumbing existed
(RequestIDFilter, OTelFieldFilter, LogContext.request_id) but was
never wired up to actual values.
Changes in workers/shared/infrastructure/logging/logger.py:
* RequestIDFilter now falls back to LogContext.request_id from the
thread-local context when record.request_id is unset, so log_context()
and signal-based binding actually populate the structured field.
* Added _extract_request_id() with priority order
request_id > file_execution_id > execution_id > run_id, preserving
the legacy structure-tool convention of using file_execution_id as
the per-tool correlation ID. Migration to true HTTP X-Request-ID
later is purely additive on the backend side.
* Added Celery task_prerun/task_postrun signals that bind request_id
onto LogContext for every task and clear it on completion. Falls
back to Celery task_id when the payload has none, so non-execution
tasks (scheduler, log-consumer) still get a usable correlation ID.
* Signal install is idempotent and silently no-ops if Celery is not
importable (unit tests).
Trace ID / span ID for the executor pod is fixed by a paired helm
change in unstract-cloud (workerExecutorV2 OTel auto-instrumentation
wrapper in otel.values.yaml).
Verified with sanity tests covering filter fallback, explicit-extra
precedence, payload extraction, prerun/postrun lifecycle, and full
Celery signal round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* UN-3435 [FIX] Address PR review comments
Three review fixes from greptile bot in #1932:
* `_clear_task_context` now resets only task-scoped fields (`request_id`,
`task_id`) instead of deleting the entire `LogContext`. This preserves
baseline fields like `worker_name` set at `WorkerLogger.configure()` --
previously the first task_postrun would wipe `worker_name` permanently
and subsequent tasks ran without it.
* Signal install is now thread-safe via double-checked locking with
`threading.Lock`. The previous bare-flag check could allow two
threads racing through `_install_celery_request_id_signals` to both
pass the guard and register handlers twice -- Celery's
`Signal.connect()` does not deduplicate, so handlers would fire twice
per task.
* `_extract_request_id` now also scans dict values inside `kwargs`
(e.g. `task(context={"execution_id": "abc"})`). The previous version
only looked at top-level kwargs and dict args; dict-valued kwargs
were silently missed and fell back to `task_id`. Search order now:
top-level kwargs > nested dict in kwargs > dict args.
Verified:
* 20 concurrent installs register the signal exactly once.
* `worker_name` survives `_clear_task_context`; `request_id` and
`task_id` are nulled.
* `kwargs={'context': {'execution_id': 'X'}}` resolves to 'X'.
* Original 9 sanity tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* UN-3435 [FIX] Address PR review on request_id extraction
Addresses 13 review comments on #1932. Highlights:
* P0 -- Positional-string args now resolve. ``async_execute_bin``
is dispatched as ``send_task("async_execute_bin", args=[schema,
workflow_id, execution_id, file_hash])`` -- all positional strings.
The previous implementation only inspected dict args and dict
kwargs values, so it returned None and fell back to the Celery
task UUID, defeating per-execution log granularity for the main
workflow path. ``_gather_containers`` now uses
``inspect.signature(task.run).bind_partial(*args, **kwargs)`` to
map positional args to parameter names before scanning.
* Priority order is now by KEY, not by container. Outer loop is
over ``_REQUEST_ID_KEYS``, inner over containers, so a payload
with file_execution_id in one nested dict and execution_id in
another deterministically picks the higher-priority key
regardless of insertion order.
* ``_coerce_id`` rejects non-id types. Previous ``str(value)``
could stringify dataclass instances or arbitrary objects into
``"<X object at 0x...>"`` log lines and OTel attributes. Only
``str``, ``int``, and ``UUID`` are accepted now.
* Dataclass positional args are supported via
``dataclasses.is_dataclass`` + ``dataclasses.asdict``.
* ``_bind_task_context`` wraps extraction in try/except so a
misbehaving payload falls back to ``task_id`` instead of leaving
the previous task's id bound on the thread.
* Signal install simplified to ``@functools.lru_cache(maxsize=1)``
-- idempotent and thread-safe by construction; replaces the
explicit flag + Lock pattern.
* ``ImportError`` for ``celery.signals`` now logs at debug level
instead of disappearing silently, so a broken deployment is
diagnosable rather than mysteriously missing request_ids.
* Type hints use ``Mapping[str, Any]`` / ``Sequence[Any]`` instead
of bare ``dict`` / ``tuple``.
* Docstring on ``RequestIDFilter`` no longer name-checks a single
helper as the sole writer of ``LogContext.request_id``.
Tests added at ``workers/shared/tests/test_logger_request_id.py``
(22 tests covering filter fallback chain, coerce rejection, key-
priority scan, signature-bound positional args, dataclass arg,
nested dict priority, baseline preservation, and concurrent install).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* UN-3435 [FIX] Drop test file from PR; deferred to follow-up
Tests at workers/shared/tests/test_logger_request_id.py have been
moved out of this PR. Rationale:
* Keeps this PR scoped to the P0 ops fix (logger.py only).
* Test conventions for workers/shared/tests/ deserve their own
review (only one existing test file there today).
* pytest project config integration (coverage flags etc.) needs
separate verification.
The 22 tests have been verified locally against the current
implementation; a follow-up PR will add them once test layout
conventions are confirmed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* UN-3435 [FIX] Reduce _gather_containers cognitive complexity
Sonar flagged _gather_containers at cognitive complexity 19 (max 15).
Extracted two single-purpose helpers:
* _bind_positional_args_to_names -- the inspect.signature path that
catches send_task("async_execute_bin", args=[schema, wf, exec, ...])
where ids are passed as positional strings.
* _arg_as_mapping -- coerces a single positional arg into a Mapping
(Mapping pass-through, dataclass via dataclasses.asdict, else None).
_gather_containers itself is now a flat ordered append: bound names,
kwargs + nested mappings, then per-arg coercion. Behaviour is
unchanged; verified with sanity scenarios for each input shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* UN-3435 [FIX] Address coderabbit nitpicks
Two valid nitpicks from coderabbit on the previous commits:
* ``_coerce_id`` now explicitly rejects ``bool`` before the ``int``
check. Without the guard, ``True``/``False`` would coerce to the
literal strings ``"True"``/``"False"`` because ``bool`` is a
subclass of ``int`` in Python.
* ``_bind_task_context`` now logs at DEBUG with ``exc_info=True``
when ``_extract_request_id`` raises, instead of swallowing the
exception silently. Behaviour is unchanged in production
(request_id still falls back to task_id); the log line lets
operators diagnose malformed payloads from the executor pod.
Skipped coderabbit's third suggestion (replace ``@functools.lru_cache``
with explicit ``threading.Lock`` + flag) -- contradicts the explicit
preference in PR review #11 for the simpler lru_cache pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
...
Screenshots
...
Checklist
I have read and understood the Contribution Guidelines.