# DTMS Platform

## Table of Contents

- Overview
- Architecture
- Core Features
- Technology Stack
- Getting Started
- Design Decisions
- Limitations
- Future Roadmap
- License
## Overview

DTMS Platform is an OpenShift-based distributed monitoring system designed to simulate and monitor multi-site data transfers across heterogeneous infrastructure.
The platform demonstrates multi-site monitoring, streaming analytics, anomaly detection, and Kubernetes-native deployment under strict resource constraints.

Key features:
- Multi-site transfer exporters (SITE_A, SITE_B, SITE_C)
- Real-time metric collection with Prometheus
- Visualization with Grafana dashboards
- Anomaly detection using Isolation Forest ML
- Streaming pipeline with Kafka + Spark
- REST API powered by FastAPI
- Data freshness validation (Go microservice)
- Correlation analytics via Kubernetes CronJobs
- Containerized deployment on OpenShift
- Resource-optimized for strict 3 vCPU quota environments
## Architecture

The system follows a distributed producer-consumer pattern with centralized observability and batch analytics.
```mermaid
graph TD
    subgraph Exporters
        A[Exporter A] --> PVC[(Shared PVC)]
        B[Exporter B] --> PVC
        C[Exporter C] --> PVC
    end
    PVC --> Anomaly[Anomaly Exporter<br/>Isolation Forest]
    Anomaly --> Prom
    subgraph Streaming
        Prod[Producer] --> Kafka
        Kafka --> Spark[Spark Streaming]
        Spark --> Prom
    end
    subgraph Observability
        Prom[Prometheus] --> Grafana[Grafana Dashboards]
        Prom --> API[FastAPI Aggregator]
        API --> Go[Go Freshness Service]
    end
    subgraph Analytics
        Cron[Correlation CronJob] -.-> API
    end
```
Data flow:

- Exporters (A/B/C) write simulated transfer data to a shared PVC
- Anomaly Exporter reads data and applies Isolation Forest detection
- Prometheus scrapes metrics from all services
- FastAPI aggregates anomalies and provides REST interface
- Go Service validates data freshness in real-time
- CronJob runs periodic correlation analytics
## Core Features

### Multi-Site Transfer Simulation

Three independent exporters simulate distributed transfer activity across heterogeneous sites, replicating real-world multi-datacenter scenarios.
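An exporter's write path can be sketched as follows. This is a minimal illustration: the record fields are assumptions, and the temp-directory path stands in for the shared PVC mount defined in `k8s/storage/`.

```python
import json
import random
import tempfile
import time
from pathlib import Path

# Stand-in for the shared PVC mount point (illustrative; the real path
# comes from the volume definition in k8s/storage/).
DATA_DIR = Path(tempfile.gettempdir()) / "dtms_transfers"

def write_transfer_record(site):
    """Simulate one transfer event and append it to the site's JSONL file."""
    record = {
        "site": site,
        "timestamp": time.time(),
        "bytes_transferred": random.randint(1_000_000, 500_000_000),
        "latency_ms": round(random.uniform(5, 250), 2),
    }
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    with open(DATA_DIR / f"{site.lower()}.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

write_transfer_record("SITE_A")
```

Appending one JSON object per line keeps the file trivially consumable by the downstream anomaly exporter, which only ever needs to read forward.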
### Real-Time Metric Collection

Prometheus scrapes comprehensive metrics:
- Transfer throughput and latency
- Anomaly detection scores
- Streaming event rates
- Data freshness heartbeats
- System health indicators
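A minimal sketch of how an exporter might publish such metrics with the official `prometheus_client` library. The metric and label names here are illustrative, not necessarily the platform's actual ones.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

registry = CollectorRegistry()

# Illustrative per-site gauges; real metric names may differ.
throughput = Gauge(
    "dtms_transfer_throughput_bytes",
    "Simulated transfer throughput per site",
    ["site"],
    registry=registry,
)
latency = Gauge(
    "dtms_transfer_latency_ms",
    "Simulated transfer latency per site",
    ["site"],
    registry=registry,
)

throughput.labels(site="SITE_A").set(42_000_000)
latency.labels(site="SITE_A").set(17.5)

# generate_latest() renders the text exposition format Prometheus scrapes.
exposition = generate_latest(registry).decode()
print(exposition)
```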
### Anomaly Detection

Implements an Isolation Forest machine learning model:
- Detects approximately 5% anomalous transfers
- Real-time scoring based on transfer patterns
- Publishes anomaly ratios and confidence scores
- Adaptive threshold management
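A minimal sketch of this approach with scikit-learn's `IsolationForest`. The feature choice and data distributions are invented for illustration; `contamination=0.05` mirrors the roughly 5% anomaly rate cited above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Invented transfer features: [throughput_mbps, latency_ms].
normal = rng.normal(loc=[500.0, 20.0], scale=[50.0, 5.0], size=(950, 2))
anomalous = rng.normal(loc=[50.0, 200.0], scale=[20.0, 30.0], size=(50, 2))
X = np.vstack([normal, anomalous])

# contamination=0.05 tells the model to flag roughly 5% of samples.
model = IsolationForest(contamination=0.05, random_state=42).fit(X)
labels = model.predict(X)            # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous

anomaly_ratio = float((labels == -1).mean())
print(f"anomaly ratio: {anomaly_ratio:.3f}")
```

The `contamination` parameter doubles as the adaptive threshold: scikit-learn derives the cut-off on `decision_function` scores from it, so the published anomaly ratio tracks the configured expectation.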
### Streaming Pipeline

Built on Kafka + Spark Streaming:

- Kafka producer → Spark Streaming consumer → per-batch aggregation exposed as Prometheus metrics
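The per-batch aggregation can be illustrated in plain Python. The real pipeline performs this step inside Spark Streaming; the event schema here is an assumption.

```python
import json
from collections import defaultdict

def aggregate_batch(raw_events):
    """Reduce one micro-batch of JSON transfer events to per-site counters,
    the shape of numbers the Spark job would expose as Prometheus metrics."""
    stats = defaultdict(lambda: {"events": 0, "bytes": 0})
    for raw in raw_events:
        event = json.loads(raw)
        site = stats[event["site"]]
        site["events"] += 1
        site["bytes"] += event["bytes"]
    return dict(stats)

batch = [
    json.dumps({"site": "SITE_A", "bytes": 1000}),
    json.dumps({"site": "SITE_A", "bytes": 2500}),
    json.dumps({"site": "SITE_B", "bytes": 700}),
]
print(aggregate_batch(batch))
# {'SITE_A': {'events': 2, 'bytes': 3500}, 'SITE_B': {'events': 1, 'bytes': 700}}
```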
### Data Freshness Service

Lightweight Go microservice for reliability monitoring:
- Polls API to validate system liveness
- Computes per-site data freshness metrics
- Exposes `dtms_data_fresh_seconds` and `dtms_data_fresh_ok`
- Minimal resource footprint
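The service's core logic, sketched here in Python for brevity (the real implementation is Go, and the 60-second threshold is an assumed value, not necessarily the service's actual cut-off):

```python
import time

FRESHNESS_THRESHOLD_S = 60  # assumed cut-off; the real service may differ

def freshness_metrics(last_seen, now=None):
    """Compute per-site freshness, mirroring the dtms_data_fresh_seconds
    and dtms_data_fresh_ok metrics the Go service exposes."""
    now = time.time() if now is None else now
    metrics = {}
    for site, last_ts in last_seen.items():
        age = now - last_ts
        metrics[site] = {
            "dtms_data_fresh_seconds": age,
            "dtms_data_fresh_ok": 1 if age <= FRESHNESS_THRESHOLD_S else 0,
        }
    return metrics

print(freshness_metrics({"SITE_A": 1000.0, "SITE_B": 900.0}, now=1030.0))
```

Splitting the signal into a continuous gauge plus a boolean makes both alerting (on `_ok`) and trend dashboards (on `_seconds`) straightforward.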
### Correlation Analytics

Kubernetes CronJob for batch processing:
- Analyzes anomaly patterns vs throughput
- Computes correlation coefficients
- Automated execution every 10 minutes
- Exports findings to monitoring stack
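The correlation step can be sketched as a plain Pearson coefficient over paired samples (the sample values below are invented for illustration):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Invented samples: anomaly ratio rises as throughput falls,
# so the coefficient should come out strongly negative.
throughput = [500, 480, 450, 300, 250, 200]
anomaly_ratio = [0.02, 0.03, 0.04, 0.08, 0.10, 0.12]
r = pearson(throughput, anomaly_ratio)
print(f"r = {r:.3f}")
```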
### Resource Optimization

Engineered for sandbox and edge environments:
- Hard Limit: Operates under 3 vCPU request quota
- Memory Optimized: Minimal footprint per service
- Storage Efficient: Shared RWX PVC strategy
- Cost-Effective: Suitable for free-tier clusters
## Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Core Logic | Python 3.9+ | Main exporters and ML models |
| Microservice | Go 1.19+ | Freshness validation |
| API | FastAPI | REST aggregation layer |
| Streaming | Kafka + Spark | Real-time event processing |
| Monitoring | Prometheus | Metric collection & TSDB |
| Visualization | Grafana | Dashboard & alerting |
| Orchestration | OpenShift / K8s | Container orchestration |
| Storage | PVC (RWX) | Shared persistent volume |
| ML | scikit-learn | Isolation Forest model |
## Getting Started

### Prerequisites

- OpenShift 4.x or Kubernetes 1.24+
- `kubectl` or `oc` CLI tool
- Access to a cluster namespace
- At least 3 vCPU and 4Gi memory quota
### Deployment

```bash
# Clone the repository
git clone https://github.com/yourusername/dtms-platform.git
cd dtms-platform

# Create namespace
oc new-project dtms-monitoring

# Deploy storage
oc apply -f k8s/storage/

# Deploy services
oc apply -f k8s/deployments/
oc apply -f k8s/services/
oc apply -f k8s/routes/

# Deploy monitoring
oc apply -f k8s/config/

# Deploy analytics
oc apply -f k8s/cronjobs/
```
### Verification and Access

```bash
# Verify deployment
oc get pods -n dtms-monitoring

# Get routes
oc get routes

# Access Grafana Dashboard
open $(oc get route grafana -o jsonpath='{.spec.host}')

# Access API Documentation
open $(oc get route dtms-api -o jsonpath='{.spec.host}')/docs
```

## Design Decisions

### Why Go for the Freshness Service?

- Lightweight: 10x smaller memory footprint vs Python equivalent
- Performance: Sub-millisecond response times
- Polyglot Showcase: Demonstrates multi-language architecture
- Sidecar Pattern: Ideal for health-check microservices
### Why a CronJob for Analytics?

- Resource Efficiency: Avoids continuous CPU usage
- Batch Pattern: Analytics valuable in periodic intervals
- Quota Management: Fits within 3 vCPU constraint
- Kubernetes Native: Leverages built-in scheduling
### Why a Shared PVC?

- Simplicity: No external object storage (S3) needed
- Unified Source: Single source of truth for all readers
- Sandbox Friendly: Works in constrained environments
- Cost Effective: No egress charges
## Limitations

- Sandbox Optimized: Tuned for development clusters, not production-scale
- Kafka Configuration: Single broker, limited Zookeeper resources
- Security: No TLS between internal services (demo environment)
- Scaling: Not currently configured for horizontal pod autoscaling
- Persistence: Limited retention policies for demo purposes
## Future Roadmap

- GitLab CI/CD automation
- Alertmanager rules for anomaly thresholds
- Helm chart packaging
- Resource usage benchmarking
- Structured logging and tracing integration
## License

This project is licensed under the MIT License - see the LICENSE file for details.