[Enhancement] Add metadata tracking system with latency percentiles and node costs #59
Merged
zephyrzilla merged 7 commits into main on Nov 13, 2025
…ics and cost tracking

- Implemented MetadataCollector singleton for centralized metrics collection
- Added automatic tracking via @track_model_request decorator
- Integrated LangChain callback for agent/tool tracking
- Added comprehensive test suite (94 tests, 100% passing)

Features:
- Multi-level tracking: aggregate, per-model, and per-node metrics
- Token statistics: prompt, completion, and total tokens, with averages
- Performance metrics: latency (total, average, percentiles), throughput, failure rates
- Latency statistics: min, max, mean, median, std_dev, p50, p95, p99, computed with the Python statistics module
- Cost tracking: per-model, per-node, and aggregate costs in USD
- Response code distribution tracking
- Git context capture (commit hash, branch, dirty status)
- Environment metadata (Python version, SyGra version)
- Thread-safe implementation with proper locking
- Toggle support via the --disable_metadata flag
- Automatic JSON export with timestamp synchronization

Supported Models:
- OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, GPT-4o, GPT-4o-mini)
- Azure OpenAI
- Anthropic Claude (via AWS Bedrock)
- vLLM (OpenAI-compatible endpoints)
- TGI (Text Generation Inference)

Documentation:
- Comprehensive feature documentation in docs/features/metadata_tracking.md
- Usage examples and API reference
- Architecture overview
- Token extraction implementation details

Tests:
- test_metadata_collector.py: Core collector functionality (32 tests)
- test_metadata_integration.py: Decorator integration (16 tests)
- test_metadata_end_to_end.py: End-to-end workflows (12 tests)
- test_langchain_callback.py: LangChain integration (6 tests)
- test_metadata_toggle.py: Enable/disable functionality (12 tests)
- test_metadata.py: Additional integration tests (16 tests)
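The commit lists latency statistics (min, max, mean, median, std_dev, p50, p95, p99) computed with Python's statistics module. The sketch below shows one way such a summary could be produced; `latency_statistics` is a hypothetical helper rather than SyGra's API, and the choice of the "inclusive" quantile method and the sample standard deviation are assumptions, since the PR does not say which variants are used.

```python
import statistics
from typing import Dict, List


def latency_statistics(latencies: List[float]) -> Dict[str, float]:
    """Summarize request latencies (in seconds) into min/max/mean/median/std_dev/percentiles."""
    if not latencies:
        return {}
    ordered = sorted(latencies)
    if len(ordered) == 1:
        # stdev() and (before Python 3.13) quantiles() need at least two data points.
        only = ordered[0]
        return {"min": only, "max": only, "mean": only, "median": only,
                "std_dev": 0.0, "p50": only, "p95": only, "p99": only}
    # n=100 yields 99 cut points, so cuts[k - 1] is the k-th percentile.
    # "inclusive" treats the samples as the whole population, keeping percentiles within [min, max].
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std_dev": statistics.stdev(ordered),  # sample standard deviation (assumption)
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


stats = latency_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print({k: round(v, 3) for k, v in stats.items()})
# {'min': 1.0, 'max': 5.0, 'mean': 3.0, 'median': 3.0, 'std_dev': 1.581, 'p50': 3.0, 'p95': 4.8, 'p99': 4.96}
```

With the inclusive method, p50 always equals the median, which matches the identical `median` and `p50` values shown in the metadata file example.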
bidyapati-p reviewed on Nov 12, 2025
vipul-mittal previously approved these changes on Nov 13, 2025
Commit 121dba2
vipul-mittal approved these changes on Nov 13, 2025
psriramsnc approved these changes on Nov 13, 2025
amitsnow approved these changes on Nov 13, 2025
Summary
This PR introduces a comprehensive metadata tracking system for SyGra that automatically captures execution metrics, token usage, costs, and performance data across all LLM calls and workflow executions. The system provides detailed latency statistics (including percentiles), per-node cost tracking, and multi-level metrics aggregation, requiring zero changes to existing code.
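To make the per-node cost tracking concrete, here is a small thread-safe accumulator sketch. `NodeCostTracker` and `request_cost` are hypothetical names, not SyGra's `MetadataCollector` or its `calculate_cost()`; the gpt-4o-mini prices ($0.15 per 1M input tokens, $0.60 per 1M output tokens) are assumed values, and the 22/46 prompt/completion split per call is an assumption chosen to match the summarizer node's 220/460 token totals in the metadata example. Under those assumptions, ten calls come to roughly $0.00031, in line with that node's `total_cost_usd`.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Assumed USD prices per 1M tokens as (input, output); real pricing changes over time.
PRICES_PER_MILLION = {"gpt-4o-mini": (0.15, 0.60)}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of a single request in USD under the assumed price table (KeyError if unknown model)."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


class NodeCostTracker:
    """Hypothetical per-node accumulator guarded by a lock so concurrent nodes can record safely."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._enabled = True
        self._nodes: Dict[str, dict] = defaultdict(
            lambda: {"executions": 0, "cost": 0.0, "prompt": 0, "completion": 0}
        )

    def set_enabled(self, enabled: bool) -> None:
        with self._lock:
            self._enabled = enabled

    def record(self, node: str, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        cost = request_cost(model, prompt_tokens, completion_tokens)
        with self._lock:
            if not self._enabled:
                return  # tracking disabled: drop the sample without creating a node entry
            stats = self._nodes[node]
            stats["executions"] += 1
            stats["cost"] += cost
            stats["prompt"] += prompt_tokens
            stats["completion"] += completion_tokens

    def summary(self) -> Dict[str, dict]:
        """Per-node totals shaped like the "nodes" section of the metadata file."""
        with self._lock:
            return {
                node: {
                    "total_executions": s["executions"],
                    "cost": {
                        "total_cost_usd": s["cost"],
                        "average_cost_per_execution": s["cost"] / s["executions"],
                    },
                    "token_statistics": {
                        "total_prompt_tokens": s["prompt"],
                        "total_completion_tokens": s["completion"],
                        "total_tokens": s["prompt"] + s["completion"],
                    },
                }
                for node, s in self._nodes.items()
            }


tracker = NodeCostTracker()
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(10):
        pool.submit(tracker.record, "summarizer", "gpt-4o-mini", 22, 46)
print(round(tracker.summary()["summarizer"]["cost"]["total_cost_usd"], 5))  # 0.00031
```

The cost is computed outside the lock because it depends only on the arguments; only the shared counters need mutual exclusion.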
Features implemented:
1. Centralized Metadata Collection System
   - MetadataCollector for tracking all execution metrics
2. Latency Statistics
3. Per-Node Cost Tracking
   - calculate_cost() method
   - total_cost_usd and average_cost_per_execution per node
4. Cost Tracking with LangChain Community Integration (langchain-community)
5. Automatic Tracking Infrastructure
   - @track_model_request decorator for custom model wrappers
   - MetadataTrackingCallback for LangChain agent LLM calls
   - BaseNode for consistent tracking across all node types
6. Comprehensive Metrics Tracking
7. Timestamp Synchronization
   - output_2025-10-30_18-19-07.json -> metadata_..._2025-10-30_18-19-07.json
8. Toggle Support
   - --disable_metadata CLI flag
   - collector.set_enabled(False)
9. Supported Models
How to Test the Feature

Test 1: Library Usage with Latency Statistics
Expected Result:
- test/output_YYYY-MM-DD_HH-MM-SS.json
- test/metadata/metadata_test_metadata_YYYY-MM-DD_HH-MM-SS.json

Test 2: CLI Usage
Expected Result:
- tasks/examples/glaive_code_assistant/metadata/

Expected Result:
Screenshots (if applicable)
Metadata File Structure
{
  "metadata_version": "1.0.0",
  "generated_at": "2025-11-05T21:57:10.123456",
  "execution": {
    "task_name": "tasks.examples.glaive_code_assistant",
    "timing": {
      "start_time": "2025-11-05T21:57:07.899389",
      "end_time": "2025-11-05T21:57:10.657968",
      "duration_seconds": 2.759
    },
    "environment": {
      "python_version": "3.11.12",
      "sygra_version": "1.0.0"
    },
    "git": {
      "commit_hash": "139a535...",
      "branch": "scratch/metadata",
      "is_dirty": false
    }
  },
  "aggregate_statistics": {
    "tokens": {
      "total_prompt_tokens": 440,
      "total_completion_tokens": 920,
      "total_tokens": 1360
    },
    "cost": {
      "total_cost_usd": 0.00062,
      "average_cost_per_record": 0.000062
    },
    "requests": {
      "total_requests": 20,
      "total_failures": 0,
      "failure_rate": 0.0
    }
  },
  "models": {
    "gpt-4o-mini": {
      "model_type": "OpenAI",
      "performance": {
        "average_latency_seconds": 3.203,
        "tokens_per_second": 21.23,
        "latency_statistics": {
          "min": 2.105,
          "max": 4.821,
          "mean": 3.203,
          "median": 3.150,
          "std_dev": 0.652,
          "p50": 3.150,
          "p95": 4.512,
          "p99": 4.759
        }
      },
      "cost": {
        "total_cost_usd": 0.00062,
        "average_cost_per_request": 0.000031
      }
    }
  },
  "nodes": {
    "summarizer": {
      "node_name": "summarizer",
      "node_type": "llm",
      "model_name": "gpt-4o-mini",
      "total_executions": 10,
      "latency_statistics": {
        "min": 2.105,
        "max": 4.821,
        "mean": 3.203,
        "median": 3.150,
        "std_dev": 0.652,
        "p50": 3.150,
        "p95": 4.512,
        "p99": 4.759
      },
      "cost": {
        "total_cost_usd": 0.00031,
        "average_cost_per_execution": 0.000031
      },
      "token_statistics": {
        "total_prompt_tokens": 220,
        "total_completion_tokens": 460,
        "total_tokens": 680
      }
    }
  }
}

Checklist
Breaking Changes
None. This is a purely additive feature with full backward compatibility. The feature works automatically with all existing code.