ApiCortex
GitHub: 0xarchit/ApiCortex
Live Demo: https://api-cortex.vercel.app
Tip
Predict API Failures Before They Happen. An enterprise-grade SaaS platform using ML analytics on real production traffic.
Overview
ApiCortex is an enterprise-grade SaaS platform that predicts API failures before they occur using machine learning analytics on real production traffic. The platform ensures API contract compliance and provides proactive failure detection through advanced anomaly detection algorithms.
Key Capabilities
- Predictive Analytics: ML-powered failure prediction with 95%+ accuracy
- Real-time Monitoring: Sub-second telemetry processing via Kafka streaming
- Contract Validation: OpenAPI specification enforcement and drift detection
- Multi-tenant Architecture: Organization-based isolation with RBAC
- Time-series Analytics: Historical querying with TimescaleDB
- Developer Dashboard: Interactive Next.js UI with live metrics
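Real-time monitoring starts with clients reporting telemetry to the ingest service. A hedged sketch of such a client, posting to the `POST /v1/telemetry` endpoint shown in the data-flow diagram below (the payload fields, identifiers, and port are illustrative assumptions):

```python
import json
import urllib.request

def send_telemetry(event: dict, base_url: str = "http://localhost:8080") -> bytes:
    """POST one telemetry event to the ingest service.
    The payload shape is illustrative, not the documented schema."""
    req = urllib.request.Request(
        f"{base_url}/v1/telemetry",
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Hypothetical event: one observed call with its outcome and latency.
event = {
    "api_id": "orders-v1",
    "endpoint": "/orders",
    "status_code": 500,
    "latency_ms": 412,
}
```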
Deployment Status (MVP)
For the initial MVP launch, we adopted a hybrid-cloud strategy built on high-performance managed services.
Note
To maximize efficiency and minimize cross-service latency on free-tier resources, the core backend services (Ingest, Control Plane, and ML Service) are orchestrated within a unified Docker container on HuggingFace Spaces.
Architecture
System Flow Diagram
graph TB
subgraph "Presentation Layer"
A[Next.js Dashboard]
B[REST API Clients]
end
subgraph "Control Plane"
C[FastAPI Server]
D[Auth Service]
E[API Management]
F[Contract Validator]
end
subgraph "Data Plane"
G[Go Ingest Service]
H[Kafka Producer]
I[Rate Limiter]
end
subgraph "ML Plane"
J[Python ML Service]
K[Feature Engineering]
L[XGBoost Predictor]
M[Anomaly Detector]
end
subgraph "Execution Plane"
Q[Rust Testing Engine]
R[SSRF Shield]
S[External APIs]
end
subgraph "Storage"
N[(PostgreSQL)]
O[(TimescaleDB)]
P[Kafka Topics]
end
A --> C
B --> C
C --> D
C --> E
C --> F
C <--> Q
Q --> R
R --> S
G --> H
H --> P
J --> P
J --> K
K --> L
L --> M
C --> N
G --> O
J --> O
Features
Core Features
Technical Specifications
- Throughput: 10,000+ events/second
- Latency: <50ms p99 for telemetry ingestion
- Accuracy: 95%+ failure prediction accuracy
- Retention: Configurable (default 30 days)
- Scalability: Horizontal scaling with Kafka partitions
System Components
1. Data Plane (Go)
Location: ingest-service/
Responsible for high-throughput telemetry collection and streaming.
Key Files:
- `cmd/server/main.go` - Application entry point
- `internal/api/handler.go` - HTTP request handlers
- `internal/kafka/producer.go` - Kafka producer
- `internal/buffer/batcher.go` - Event batching
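The batcher's role is to buffer incoming events and flush them downstream when either the batch size or the flush interval is hit, matching the documented `BATCH_SIZE=500` / `FLUSH_INTERVAL_SECONDS=2` defaults. A minimal sketch of that logic (in Python for illustration; the real service is Go, and the flush callback is a stand-in for the Kafka producer):

```python
import time

class Batcher:
    """Buffer events; flush when the batch size or flush interval is reached.

    Defaults mirror the documented config (500 events / 2 s). The `flush`
    callable is a stand-in for publishing a batch to Kafka.
    """
    def __init__(self, flush, batch_size=500, flush_interval=2.0):
        self.flush = flush
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        interval_due = time.monotonic() - self.last_flush >= self.flush_interval
        if len(self.buffer) >= self.batch_size or interval_due:
            self._flush()

    def _flush(self):
        if self.buffer:
            self.flush(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()

# With batch_size=3, seven adds produce two full batches and one buffered event.
batches = []
b = Batcher(batches.append, batch_size=3)
for i in range(7):
    b.add({"seq": i})
```

Flushing on whichever trigger fires first keeps tail latency bounded under light traffic while amortizing producer overhead under load.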
2. Control Plane (FastAPI)
Location: control-plane/
Handles authentication, API metadata, and contract management.
Key Files:
- `app/main.py` - FastAPI application
- `app/routers/auth.py` - Authentication endpoints
- `app/routers/apis.py` - API management
- `app/services/contract_service.py` - Contract validation
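Contract validation amounts to checking observed traffic against the registered OpenAPI contract and reporting drift. A minimal sketch of such a check (the spec shape and finding messages here are illustrative assumptions, not the actual `contract_service.py` API):

```python
def detect_drift(spec_fields: dict[str, str], observed: dict) -> list[str]:
    """Compare an observed response body against declared field types.

    `spec_fields` maps field name -> expected Python type name (a simplified
    stand-in for a full OpenAPI schema). Returns human-readable findings;
    an empty list means the response is contract-compliant.
    """
    findings = []
    for field, typename in spec_fields.items():
        if field not in observed:
            findings.append(f"missing field: {field}")
        elif type(observed[field]).__name__ != typename:
            findings.append(f"type drift on {field}: expected {typename}")
    for field in observed:
        if field not in spec_fields:
            findings.append(f"undeclared field: {field}")
    return findings

spec = {"id": "int", "status": "str"}
ok = detect_drift(spec, {"id": 7, "status": "ok"})        # compliant
drifted = detect_drift(spec, {"id": "7", "extra": True})  # three findings
```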
3. ML Plane (Python)
Location: ml-service/
Processes telemetry streams and generates failure predictions.
Key Files:
- `app/main.py` - ML worker entry
- `workers/inference_worker.py` - Inference pipeline
- `app/features/feature_engineering.py` - Feature extraction
- `app/inference/predictor.py` - Model prediction
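The inference pipeline consumes raw events, extracts rolling-window features, scores them, and raises an alert above the documented 0.8 threshold. A sketch of that loop, with a hand-written scorer standing in for the trained XGBoost model (feature names and weights are assumptions):

```python
from collections import deque

ALERT_THRESHOLD = 0.8  # matches the documented ML-service default

class WindowFeatures:
    """Rolling error-rate and latency features over a bounded event window."""
    def __init__(self, maxlen=100):
        self.events = deque(maxlen=maxlen)

    def update(self, event: dict) -> list[float]:
        self.events.append(event)
        n = len(self.events)
        error_rate = sum(e["status_code"] >= 500 for e in self.events) / n
        avg_latency_ms = sum(e["latency_ms"] for e in self.events) / n
        return [error_rate, avg_latency_ms]

def score(features: list[float]) -> float:
    # Stand-in for XGBoost inference: weight error rate heavily,
    # latency lightly, and clamp the risk score to [0, 1].
    error_rate, avg_latency_ms = features
    return min(1.0, 0.9 * error_rate + avg_latency_ms / 10_000)

w = WindowFeatures()
risks = [score(w.update({"status_code": s, "latency_ms": 120}))
         for s in (200, 200, 500, 500, 500)]
alert = risks[-1] > ALERT_THRESHOLD
```

As 5xx responses accumulate in the window, the risk score rises; crossing the threshold would publish to the alerts topic instead of only storing the prediction.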
4. Presentation Plane (Next.js)
Location: frontend/
Developer dashboard for monitoring and management.
5. Execution Engine (Rust)
Location: api-testing/
High-performance, secure engine optimized for executing REST, GraphQL, and WebSocket tests.
Key Files:
- `src/main.rs` - Axum server entry
- `src/executor.rs` - Core execution & security logic
- `src/protocols/` - WebSocket & HTTP handlers
- `src/models.rs` - Result & Snapshot schemas
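The SSRF shield's job is to refuse test targets that resolve to private, loopback, or link-local addresses, so user-supplied URLs cannot reach internal infrastructure. A minimal sketch of that check (in Python for illustration; the real engine is Rust, and its resolution details will differ):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_target(url: str) -> bool:
    """Reject URLs whose host resolves to a private, loopback,
    or link-local address; resolve-then-check blocks names like
    localhost as well as raw internal IPs."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True

print(is_safe_target("http://127.0.0.1/admin"))  # False
```

Note that a production shield must also pin the resolved address for the actual request (otherwise DNS rebinding can bypass the check) and restrict redirects.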
Data Flow
Telemetry Data Flow
sequenceDiagram
participant Client as API Client
participant Ingest as Ingest Service
participant Kafka as Apache Kafka
participant ML as ML Service
participant DB as TimescaleDB
participant UI as Dashboard
Client->>Ingest: POST /v1/telemetry
Ingest->>Ingest: Validate & Buffer
Ingest->>Kafka: Publish telemetry.raw
Ingest->>DB: Store telemetry
Ingest-->>Client: 200 OK
ML->>Kafka: Consume telemetry.raw
ML->>ML: Feature Engineering
ML->>ML: XGBoost Prediction
ML->>DB: Store prediction
ML->>Kafka: Publish alerts
UI->>DB: Query metrics
UI->>UI: Display charts
Prediction Flow
flowchart TD
A[Telemetry Event] --> B{Kafka Consumer}
B --> C[Feature Extraction]
C --> D[1m Window Stats]
C --> E[5m Window Stats]
C --> F[15m Window Stats]
D --> G[Feature Vector]
E --> G
F --> G
G --> H{XGBoost Model}
H --> I[Risk Score]
I --> J{Threshold Check}
J -->|Score > 0.8| K[Generate Alert]
J -->|Score < 0.8| L[Store Prediction]
K --> M[Kafka Alerts Topic]
L --> N[TimescaleDB]
Getting Started
Prerequisites
- Go: 1.26 or later
- Python: 3.11 or later
- Node.js: 22 or later
- PostgreSQL: 16+ or NeonDB
- TimescaleDB: Latest version
- Apache Kafka: 3.0 or later
Installation
# Clone repository
git clone https://github.com/0xarchit/apicortex.git
cd apicortex
# Set up environment variables
cp .env.example .env
# Edit .env with your credentials
# Start infrastructure (Docker)
docker-compose up -d
Running Services
# Ingest Service
cd ingest-service && go run cmd/server/main.go
# Control Plane
cd control-plane && uvicorn app.main:app --reload
# ML Service
cd ml-service && python app/main.py
# API Testing Engine (Rust)
cd api-testing && cargo run
# Frontend
cd frontend && npm run dev
Configuration
Environment Variables
Service Configuration
Ingest Service (ingest-service/.env):
PORT=8080
KAFKA_SERVICE_URI=kafka:9092
BATCH_SIZE=500
FLUSH_INTERVAL_SECONDS=2
ACTIVE_POLLING_ENABLED=true
Control Plane (control-plane/.env):
DATABASE=postgresql://user:pass@host:5432/db
JWT_SECRET_KEY=your-secret-key
OAUTH_GITHUB_CLIENT_ID=your-client-id
ML Service (ml-service/.env):
KAFKA_TOPIC_RAW=telemetry.raw
MODEL_PATH=model/xgboost_failure_prediction.pkl
ALERT_THRESHOLD=0.8
ENABLE_SHAP=true
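A sketch of how a service might load and validate these variables at startup (variable names follow the examples above; the defaults and the validation rule are assumptions):

```python
import os

def load_ml_config(env=os.environ) -> dict:
    """Read ML-service settings from the environment,
    falling back to the documented example values."""
    cfg = {
        "kafka_topic_raw": env.get("KAFKA_TOPIC_RAW", "telemetry.raw"),
        "model_path": env.get("MODEL_PATH", "model/xgboost_failure_prediction.pkl"),
        "alert_threshold": float(env.get("ALERT_THRESHOLD", "0.8")),
        "enable_shap": env.get("ENABLE_SHAP", "true").lower() == "true",
    }
    # Fail fast on nonsensical thresholds rather than alerting never/always.
    if not 0.0 <= cfg["alert_threshold"] <= 1.0:
        raise ValueError("ALERT_THRESHOLD must be in [0, 1]")
    return cfg

cfg = load_ml_config({"ALERT_THRESHOLD": "0.9", "ENABLE_SHAP": "false"})
```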
Usage
Dashboard Access
- Open your browser at http://localhost:3000
- Sign in with OAuth (Google/GitHub)
- Navigate to Dashboard
API Endpoints
Monitoring
Health Checks
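Assuming each service exposes a conventional `/health` endpoint (the paths and ports below mirror the local setup but are not confirmed by this document), a simple probe might look like:

```python
import json
import urllib.error
import urllib.request

# Hypothetical local endpoints; adjust to your deployment.
SERVICES = {
    "ingest": "http://localhost:8080/health",
    "control-plane": "http://localhost:8000/health",
}

def check(url: str, timeout: float = 2.0) -> bool:
    """Return True iff the endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

statuses = {name: check(url) for name, url in SERVICES.items()}
print(json.dumps(statuses))
```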
Logging
All services use structured logging in JSON format:
- Ingest: Zerolog
- Control Plane: Python logging
- ML Service: Python logging
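For the Python services, structured JSON logging needs nothing beyond the standard library; a minimal formatter sketch (the field names are an assumption, not the services' actual log schema):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("ml-service")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("prediction stored")
```

One JSON object per line keeps the logs greppable locally and directly ingestible by log aggregators.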
Troubleshooting
Common Issues
Services Won't Start
Solution:
# Check environment variables
printenv | grep APICORTEX
# Verify database connectivity
psql "$DATABASE" -c "SELECT 1"
# Check Kafka connection
kafka-consumer-groups --bootstrap-server $KAFKA_URI --list
High Memory Usage
Solution:
# Reduce batch size
BATCH_SIZE=100
# Limit buffer capacity
MAX_BUFFER_CAPACITY=10000
Kafka Consumer Lag
Solution:
- Increase consumer parallelism
- Add more ML worker instances
- Check network connectivity
Debug Mode
DEBUG=true
LOG_LEVEL=debug
Security
Authentication Flow
sequenceDiagram
participant User
participant Frontend
participant ControlPlane
participant OAuth
participant DB
User->>Frontend: Click "Login"
Frontend->>ControlPlane: Initiate OAuth
ControlPlane->>OAuth: Redirect
User->>OAuth: Authenticate
OAuth->>ControlPlane: OAuth Callback
ControlPlane->>DB: Create/Update User
ControlPlane->>Frontend: JWT Token
Frontend->>User: Dashboard Access
API Key Management
- Keys are hashed with pepper before storage
- Keys are rotated every 90 days
- Audit logging for all key operations
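Hashing with a pepper means combining the key with a server-side secret before hashing, so a leaked database alone is not enough to verify or forge keys. A sketch using HMAC-SHA-256 (the platform's actual scheme is not specified here):

```python
import hashlib
import hmac
import secrets

# In practice the pepper lives in a secret store or env var,
# never alongside the hashes in the database.
PEPPER = b"server-side-secret"

def new_api_key() -> tuple[str, str]:
    """Return (key to show the user once, peppered digest to store)."""
    key = secrets.token_urlsafe(32)
    digest = hmac.new(PEPPER, key.encode(), hashlib.sha256).hexdigest()
    return key, digest

def verify(key: str, stored_digest: str) -> bool:
    candidate = hmac.new(PEPPER, key.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, stored_digest)

key, digest = new_api_key()
```

Because API keys are long and random (unlike passwords), a keyed fast hash like HMAC is a common choice; slow password hashes are unnecessary here.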
Contributing
- Fork the repository
- Create feature branch
- Submit pull request
- Pass CI/CD pipeline
Development Setup
# Install dependencies
go mod download
pip install -r requirements.txt
npm install
# Run tests
go test ./...
pytest
npm test
Support
- Email: mail@0xarchit.is-a.dev
- Discussions: GitHub Discussions
- Issues: GitHub Issues