Overview

UltraBalancer provides comprehensive metrics and observability features for monitoring performance, tracking request patterns, and identifying issues in production environments.

Real-Time Metrics

Live performance metrics with millisecond precision

Prometheus Export

Native Prometheus metrics format support

HTTP Endpoints

JSON and Prometheus metrics endpoints

Near-Zero Overhead

Lock-free atomic operations for minimal impact

Metrics Collection

UltraBalancer collects metrics automatically without configuration. All metrics are stored in memory with minimal performance impact using atomic operations and lock-free data structures.
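
The sketch below illustrates the technique: request counters backed by lock-free atomics that worker threads can update concurrently. It is a simplified illustration of the approach, not UltraBalancer's actual internal implementation:
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative only: counters that many worker threads can bump without locks.
#[derive(Default)]
struct RequestCounters {
    total: AtomicU64,
    successful: AtomicU64,
    failed: AtomicU64,
}

impl RequestCounters {
    fn record_success(&self) {
        // Relaxed ordering is sufficient for monotonic counters that are only
        // read as point-in-time snapshots.
        self.total.fetch_add(1, Ordering::Relaxed);
        self.successful.fetch_add(1, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
        self.failed.fetch_add(1, Ordering::Relaxed);
    }

    fn snapshot(&self) -> (u64, u64, u64) {
        (
            self.total.load(Ordering::Relaxed),
            self.successful.load(Ordering::Relaxed),
            self.failed.load(Ordering::Relaxed),
        )
    }
}

fn main() {
    let counters = RequestCounters::default();
    counters.record_success();
    counters.record_failure();
    let (total, ok, failed) = counters.snapshot();
    println!("total={total} successful={ok} failed={failed}");
}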

Available Metrics

Request Counters
  • total_requests - Total number of requests processed
  • successful_requests - Requests completed successfully (2xx, 3xx)
  • failed_requests - Requests that failed (4xx, 5xx, timeouts)
  • requests_per_second - Current requests/second rate

Response Times
  • avg_response_time_ms - Mean response time
  • min_response_time_ms - Fastest response time
  • max_response_time_ms - Slowest response time
  • p50_response_time_ms - 50th percentile (median)
  • p95_response_time_ms - 95th percentile
  • p99_response_time_ms - 99th percentile
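
To make the percentile metrics above concrete, the sketch below derives p50/p95/p99 from a buffer of response-time samples using the nearest-rank method. This is an illustration only and does not reflect how UltraBalancer computes its percentiles internally:
// Nearest-rank percentile over a non-empty sample buffer (values in ms).
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    assert!(!samples.is_empty());
    samples.sort_by(|a, b| a.total_cmp(b));
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    // Hypothetical response times in milliseconds.
    let mut response_times_ms = vec![8.2, 9.9, 10.1, 12.4, 15.0, 28.4, 45.7, 230.0];
    println!("p50 = {:.2} ms", percentile(&mut response_times_ms, 50.0));
    println!("p95 = {:.2} ms", percentile(&mut response_times_ms, 95.0));
    println!("p99 = {:.2} ms", percentile(&mut response_times_ms, 99.0));
}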

Metrics Endpoints

JSON Metrics Endpoint

The /metrics endpoint returns metrics in JSON format:
curl http://localhost:8080/metrics
Response Format:
{
  "total_requests": 1547823,
  "successful_requests": 1542891,
  "failed_requests": 4932,
  "avg_response_time_ms": 12.45,
  "min_response_time_ms": 1.23,
  "max_response_time_ms": 234.56,
  "p50_response_time_ms": 10.12,
  "p95_response_time_ms": 28.45,
  "p99_response_time_ms": 45.67,
  "uptime_seconds": 86400,
  "requests_per_second": 17.91,
  "backends": [
    {
      "address": "192.168.1.10:8080",
      "healthy": true,
      "active_connections": 42,
      "total_requests": 515941,
      "failures": 0,
      "avg_response_time_ms": 11.23
    },
    {
      "address": "192.168.1.11:8080",
      "healthy": true,
      "active_connections": 38,
      "total_requests": 515940,
      "failures": 0,
      "avg_response_time_ms": 12.87
    }
  ]
}
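
To consume the endpoint from code instead of curl, the sketch below fetches /metrics and deserializes a few of the fields shown above. It assumes reqwest (with the blocking and json features) and serde (with derive) as dependencies; field names follow the example response and may need adjusting for your build:
use serde::Deserialize;

// Only the fields we care about; serde ignores the rest of the JSON by default.
#[derive(Deserialize)]
struct Metrics {
    total_requests: u64,
    failed_requests: u64,
    requests_per_second: f64,
    p99_response_time_ms: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let metrics: Metrics =
        reqwest::blocking::get("http://localhost:8080/metrics")?.json()?;

    let error_rate = metrics.failed_requests as f64 / metrics.total_requests as f64;
    println!(
        "rps={:.2} p99={:.2}ms errors={:.4}%",
        metrics.requests_per_second,
        metrics.p99_response_time_ms,
        error_rate * 100.0
    );
    Ok(())
}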

Prometheus Metrics Endpoint

The /prometheus endpoint exports metrics in the Prometheus text exposition format:
curl http://localhost:8080/prometheus
Response Format:
# HELP ultrabalancer_requests_total Total number of requests
# TYPE ultrabalancer_requests_total counter
ultrabalancer_requests_total 1547823

# HELP ultrabalancer_requests_successful Successful requests
# TYPE ultrabalancer_requests_successful counter
ultrabalancer_requests_successful 1542891

# HELP ultrabalancer_requests_failed Failed requests
# TYPE ultrabalancer_requests_failed counter
ultrabalancer_requests_failed 4932

# HELP ultrabalancer_response_time_seconds Response time in seconds
# TYPE ultrabalancer_response_time_seconds histogram
ultrabalancer_response_time_seconds_bucket{le="0.005"} 123456
ultrabalancer_response_time_seconds_bucket{le="0.01"} 234567
ultrabalancer_response_time_seconds_bucket{le="0.025"} 345678
ultrabalancer_response_time_seconds_bucket{le="0.05"} 456789
ultrabalancer_response_time_seconds_bucket{le="0.1"} 512345
ultrabalancer_response_time_seconds_bucket{le="0.25"} 545678
ultrabalancer_response_time_seconds_bucket{le="0.5"} 547123
ultrabalancer_response_time_seconds_bucket{le="1"} 547823
ultrabalancer_response_time_seconds_bucket{le="+Inf"} 1547823
ultrabalancer_response_time_seconds_sum 19247.456
ultrabalancer_response_time_seconds_count 1547823

# HELP ultrabalancer_backend_healthy Backend health status
# TYPE ultrabalancer_backend_healthy gauge
ultrabalancer_backend_healthy{backend="192.168.1.10:8080"} 1
ultrabalancer_backend_healthy{backend="192.168.1.11:8080"} 1

# HELP ultrabalancer_backend_requests_total Total requests per backend
# TYPE ultrabalancer_backend_requests_total counter
ultrabalancer_backend_requests_total{backend="192.168.1.10:8080"} 515941
ultrabalancer_backend_requests_total{backend="192.168.1.11:8080"} 515940

# HELP ultrabalancer_backend_active_connections Active connections per backend
# TYPE ultrabalancer_backend_active_connections gauge
ultrabalancer_backend_active_connections{backend="192.168.1.10:8080"} 42
ultrabalancer_backend_active_connections{backend="192.168.1.11:8080"} 38

# HELP ultrabalancer_uptime_seconds Load balancer uptime
# TYPE ultrabalancer_uptime_seconds counter
ultrabalancer_uptime_seconds 86400

Prometheus Integration

Prometheus Configuration

Add UltraBalancer as a scrape target in your Prometheus configuration:
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'ultrabalancer'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/prometheus'
    scrape_interval: 5s

Dynamic Service Discovery

Use Prometheus service discovery to locate UltraBalancer instances dynamically, for example in Kubernetes:
prometheus.yml
scrape_configs:
  - job_name: 'ultrabalancer'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - production
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: ultrabalancer
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8080
      - target_label: __metrics_path__
        replacement: /prometheus

Query Examples

Common Prometheus queries for UltraBalancer metrics:
# Request rate per second
rate(ultrabalancer_requests_total[5m])

# Success rate percentage
100 * (
  rate(ultrabalancer_requests_successful[5m]) /
  rate(ultrabalancer_requests_total[5m])
)

# Error rate
rate(ultrabalancer_requests_failed[5m])

# Average response time
rate(ultrabalancer_response_time_seconds_sum[5m]) /
rate(ultrabalancer_response_time_seconds_count[5m])

# 95th percentile response time
histogram_quantile(0.95, rate(ultrabalancer_response_time_seconds_bucket[5m]))

# 99th percentile response time
histogram_quantile(0.99, rate(ultrabalancer_response_time_seconds_bucket[5m]))

# Requests per backend
sum by (backend) (rate(ultrabalancer_backend_requests_total[5m]))

# Unhealthy backends
count(ultrabalancer_backend_healthy == 0)

# Total active connections
sum(ultrabalancer_backend_active_connections)

Grafana Dashboards

Pre-Built Dashboard

Import the official UltraBalancer Grafana dashboard:

Step 1: Download Dashboard

Download the dashboard JSON from the repository:
curl -O https://raw.githubusercontent.com/bas3line/ultrabalancer/main/grafana/dashboard.json

Step 2: Import to Grafana

  1. Open Grafana UI
  2. Go to Dashboards → Import
  3. Upload dashboard.json
  4. Select your Prometheus data source
  5. Click Import

Step 3: Configure Variables

Set the following variables:
  • datasource: Your Prometheus instance
  • instance: UltraBalancer instance (supports regex)
  • backend: Backend server filter

Custom Dashboard Panels

grafana-panel.json
{
  "title": "Request Rate",
  "type": "graph",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "rate(ultrabalancer_requests_total[5m])",
      "legendFormat": "Total Requests/s"
    },
    {
      "expr": "rate(ultrabalancer_requests_successful[5m])",
      "legendFormat": "Successful Requests/s"
    },
    {
      "expr": "rate(ultrabalancer_requests_failed[5m])",
      "legendFormat": "Failed Requests/s"
    }
  ]
}

Alert Rules

Create Grafana alerts for critical conditions:
grafana-alerts.yml
groups:
  - name: ultrabalancer
    interval: 30s
    rules:
      # High error rate alert
      - alert: HighErrorRate
        expr: |
          100 * (
            rate(ultrabalancer_requests_failed[5m]) /
            rate(ultrabalancer_requests_total[5m])
          ) > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanize }}% (threshold: 5%)"

      # Slow response time alert
      - alert: SlowResponseTime
        expr: |
          histogram_quantile(0.95,
            rate(ultrabalancer_response_time_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time detected"
          description: "p95 response time is {{ $value | humanizeDuration }}"

      # Backend down alert
      - alert: BackendDown
        expr: ultrabalancer_backend_healthy == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Backend {{ $labels.backend }} is down"
          description: "Backend has been unhealthy for 1 minute"

      # All backends down
      - alert: AllBackendsDown
        expr: sum(ultrabalancer_backend_healthy) == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "All backends are down!"
          description: "Load balancer has no healthy backends"

      # High connection count
      - alert: HighConnectionCount
        expr: sum(ultrabalancer_backend_active_connections) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count detected"
          description: "Total active connections: {{ $value }}"

Performance Impact

Metrics Collection Overhead

UltraBalancer uses highly optimized atomic operations for metrics collection:
# Run metrics benchmark
cargo bench --bench metrics

# Results (Intel i9-9900K)
increment_counter      time: 2.1234 ns
record_response_time   time: 45.678 ns
get_snapshot           time: 123.456 ns

Impact on throughput: < 0.1%

Metrics collection in UltraBalancer uses lock-free atomic operations and has negligible performance impact even at 100k+ requests/second.
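
The cargo bench run above exercises the project's own benchmark suite. As a rough independent sanity check of raw atomic-counter cost, a Criterion micro-benchmark of the same shape might look like the following (Criterion is assumed as a dev-dependency; this is not the project's benchmark code):
use criterion::{criterion_group, criterion_main, Criterion};
use std::sync::atomic::{AtomicU64, Ordering};

// Measures the cost of a single lock-free counter increment, comparable in
// spirit to the increment_counter result shown above.
fn bench_increment(c: &mut Criterion) {
    let counter = AtomicU64::new(0);
    c.bench_function("increment_counter", |b| {
        b.iter(|| counter.fetch_add(1, Ordering::Relaxed))
    });
}

criterion_group!(benches, bench_increment);
criterion_main!(benches);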

Exporting Metrics

StatsD Export

Export metrics to StatsD/DogStatsD:
use statsd::Client;

// Metrics are prefixed with "ultrabalancer" and sent over UDP to a local agent.
let client = Client::new("127.0.0.1:8125", "ultrabalancer").unwrap();

// Example values; in practice these come from the metrics snapshot.
let duration_ms = 12.45;
let healthy_count = 2.0;

// Send metrics (the statsd crate takes f64 values; timings use timer()).
client.count("requests.total", 1.0);
client.timer("response_time", duration_ms);
client.gauge("backends.healthy", healthy_count);

CloudWatch Export

Export metrics to AWS CloudWatch:
export_cloudwatch.py
import boto3
import requests
from datetime import datetime

cloudwatch = boto3.client('cloudwatch')

def export_metrics():
    # Get metrics from UltraBalancer
    response = requests.get('http://localhost:8080/metrics')
    metrics = response.json()

    # Push to CloudWatch
    cloudwatch.put_metric_data(
        Namespace='UltraBalancer',
        MetricData=[
            {
                'MetricName': 'TotalRequests',
                'Value': metrics['total_requests'],
                'Unit': 'Count',
                'Timestamp': datetime.utcnow()
            },
            {
                'MetricName': 'AvgResponseTime',
                'Value': metrics['avg_response_time_ms'],
                'Unit': 'Milliseconds',
                'Timestamp': datetime.utcnow()
            }
        ]
    )

if __name__ == '__main__':
    export_metrics()

Datadog Export

Export metrics to Datadog:
export_datadog.js
const dogapi = require('dogapi');
const axios = require('axios');

dogapi.initialize({
  api_key: process.env.DATADOG_API_KEY,
  app_key: process.env.DATADOG_APP_KEY
});

async function exportMetrics() {
  const response = await axios.get('http://localhost:8080/metrics');
  const metrics = response.data;

  const now = Math.floor(Date.now() / 1000);

  dogapi.metric.send_all([
    {
      metric: 'ultrabalancer.requests.total',
      points: [[now, metrics.total_requests]],
      type: 'count'
    },
    {
      metric: 'ultrabalancer.response_time.avg',
      points: [[now, metrics.avg_response_time_ms]],
      type: 'gauge'
    }
  ]);
}

exportMetrics();

Custom Metrics Collection

Programmatic Access

Access metrics programmatically in your application:
use ultrabalancer::MetricsCollector;
use std::sync::Arc;
use std::time::Duration;

let metrics = Arc::new(MetricsCollector::new());

// Record metrics
metrics.increment_total_requests();
metrics.increment_successful_requests();
metrics.record_response_time(Duration::from_millis(15));

// Get snapshot
let snapshot = metrics.snapshot();
println!("Total requests: {}", snapshot.total_requests);
println!("Avg response: {}ms", snapshot.avg_response_time_ms);