Overview
UltraBalancer provides comprehensive metrics and observability features for monitoring performance, tracking request patterns, and identifying issues in production environments.
Real-Time Metrics - Live performance metrics with millisecond precision
Prometheus Export - Native Prometheus metrics format support
HTTP Endpoints - JSON and Prometheus metrics endpoints
Zero Overhead - Lock-free atomic operations for minimal impact
Metrics Collection
UltraBalancer collects metrics automatically without configuration. All metrics are stored in memory with minimal performance impact using atomic operations and lock-free data structures.
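As a rough illustration of this pattern (a minimal sketch only, not UltraBalancer's actual internals), a pair of counters can be updated from many worker threads with relaxed atomic fetch_add operations and read back without ever taking a lock:

use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Hypothetical, simplified counters; the real collector tracks many more fields.
struct Counters {
    total_requests: AtomicU64,
    successful_requests: AtomicU64,
}

fn main() {
    let counters = Arc::new(Counters {
        total_requests: AtomicU64::new(0),
        successful_requests: AtomicU64::new(0),
    });

    // Simulate request-handling threads updating the counters concurrently.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..100_000 {
                    // Relaxed ordering is enough for a statistics counter:
                    // no other memory access depends on these values.
                    c.total_requests.fetch_add(1, Ordering::Relaxed);
                    c.successful_requests.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // A "snapshot" is just a set of atomic loads; no lock is ever taken.
    println!("total_requests = {}", counters.total_requests.load(Ordering::Relaxed));
}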
Available Metrics
Metrics are grouped into three categories: request metrics, backend metrics, and system metrics.
Request Counters
total_requests - Total number of requests processed
successful_requests - Requests completed successfully (2xx, 3xx)
failed_requests - Requests that failed (4xx, 5xx, timeouts)
requests_per_second - Current requests/second rate
Response Times
avg_response_time_ms - Mean response time
min_response_time_ms - Fastest response time
max_response_time_ms - Slowest response time
p50_response_time_ms - 50th percentile (median)
p95_response_time_ms - 95th percentile
p99_response_time_ms - 99th percentile
Backend Health
backend_healthy - Number of healthy backends
backend_unhealthy - Number of unhealthy backends
backend_failures - Failure count per backend
circuit_breaker_state - Circuit breaker state per backend
Backend Load
backend_active_connections - Active connections per backend
backend_total_requests - Total requests sent to backend
backend_avg_response_time - Average response time per backend
System Information
uptime_seconds - Load balancer uptime
version - UltraBalancer version
algorithm - Active load balancing algorithm
Performance
connections_active - Currently active connections
connections_total - Total connections handled
bytes_sent - Total bytes sent to backends
bytes_received - Total bytes received from backends
Metrics Endpoints
JSON Metrics Endpoint
The /metrics endpoint returns metrics in JSON format:
curl http://localhost:8080/metrics
Response Format:
{
  "total_requests": 1547823,
  "successful_requests": 1542891,
  "failed_requests": 4932,
  "avg_response_time_ms": 12.45,
  "min_response_time_ms": 1.23,
  "max_response_time_ms": 234.56,
  "p50_response_time_ms": 10.12,
  "p95_response_time_ms": 28.45,
  "p99_response_time_ms": 45.67,
  "uptime_seconds": 86400,
  "requests_per_second": 17.91,
  "backends": [
    {
      "address": "192.168.1.10:8080",
      "healthy": true,
      "active_connections": 42,
      "total_requests": 515941,
      "failures": 0,
      "avg_response_time_ms": 11.23
    },
    {
      "address": "192.168.1.11:8080",
      "healthy": true,
      "active_connections": 38,
      "total_requests": 515940,
      "failures": 0,
      "avg_response_time_ms": 12.87
    }
  ]
}
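Scripts and sidecars can consume this JSON directly. The sketch below is a hedged example rather than part of UltraBalancer itself: it assumes the reqwest crate (blocking and json features) plus serde_json, and uses the field names from the example response above to derive a success rate and flag unhealthy backends:

use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the current metrics snapshot from the JSON endpoint.
    let metrics: Value = reqwest::blocking::get("http://localhost:8080/metrics")?.json()?;

    let total = metrics["total_requests"].as_u64().unwrap_or(0);
    let ok = metrics["successful_requests"].as_u64().unwrap_or(0);
    let success_rate = if total > 0 { 100.0 * ok as f64 / total as f64 } else { 100.0 };

    println!("success rate: {:.2}%", success_rate);
    println!("p99 latency:  {} ms", metrics["p99_response_time_ms"]);

    // Flag any backend the balancer currently reports as unhealthy.
    if let Some(backends) = metrics["backends"].as_array() {
        for backend in backends {
            if backend["healthy"].as_bool() == Some(false) {
                eprintln!("unhealthy backend: {}", backend["address"]);
            }
        }
    }
    Ok(())
}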
Prometheus Metrics Endpoint
The /prometheus endpoint exports metrics in Prometheus format:
curl http://localhost:8080/prometheus
Response Format:
# HELP ultrabalancer_requests_total Total number of requests
# TYPE ultrabalancer_requests_total counter
ultrabalancer_requests_total 1547823
# HELP ultrabalancer_requests_successful Successful requests
# TYPE ultrabalancer_requests_successful counter
ultrabalancer_requests_successful 1542891
# HELP ultrabalancer_requests_failed Failed requests
# TYPE ultrabalancer_requests_failed counter
ultrabalancer_requests_failed 4932
# HELP ultrabalancer_response_time_seconds Response time in seconds
# TYPE ultrabalancer_response_time_seconds histogram
ultrabalancer_response_time_seconds_bucket{le="0.005"} 123456
ultrabalancer_response_time_seconds_bucket{le="0.01"} 234567
ultrabalancer_response_time_seconds_bucket{le="0.025"} 345678
ultrabalancer_response_time_seconds_bucket{le="0.05"} 456789
ultrabalancer_response_time_seconds_bucket{le="0.1"} 512345
ultrabalancer_response_time_seconds_bucket{le="0.25"} 545678
ultrabalancer_response_time_seconds_bucket{le="0.5"} 547123
ultrabalancer_response_time_seconds_bucket{le="1"} 547823
ultrabalancer_response_time_seconds_bucket{le="+Inf"} 1547823
ultrabalancer_response_time_seconds_sum 19247.456
ultrabalancer_response_time_seconds_count 1547823
# HELP ultrabalancer_backend_healthy Backend health status
# TYPE ultrabalancer_backend_healthy gauge
ultrabalancer_backend_healthy{backend="192.168.1.10:8080"} 1
ultrabalancer_backend_healthy{backend="192.168.1.11:8080"} 1
# HELP ultrabalancer_backend_requests_total Total requests per backend
# TYPE ultrabalancer_backend_requests_total counter
ultrabalancer_backend_requests_total{backend="192.168.1.10:8080"} 515941
ultrabalancer_backend_requests_total{backend="192.168.1.11:8080"} 515940
# HELP ultrabalancer_backend_active_connections Active connections per backend
# TYPE ultrabalancer_backend_active_connections gauge
ultrabalancer_backend_active_connections{backend="192.168.1.10:8080"} 42
ultrabalancer_backend_active_connections{backend="192.168.1.11:8080"} 38
# HELP ultrabalancer_uptime_seconds Load balancer uptime
# TYPE ultrabalancer_uptime_seconds counter
ultrabalancer_uptime_seconds 86400
Prometheus Integration
Prometheus Configuration
Add UltraBalancer as a scrape target in your Prometheus configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'ultrabalancer'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/prometheus'
    scrape_interval: 5s
Dynamic Service Discovery
Use Prometheus service discovery to locate UltraBalancer instances dynamically:
Kubernetes:
scrape_configs:
  - job_name: 'ultrabalancer'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - production
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: ultrabalancer
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8080
      - target_label: __metrics_path__
        replacement: /prometheus
Consul:
scrape_configs:
  - job_name: 'ultrabalancer'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['ultrabalancer']
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job
      - target_label: __metrics_path__
        replacement: /prometheus
Docker:
scrape_configs:
  - job_name: 'ultrabalancer'
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__meta_docker_container_label_app]
        action: keep
        regex: ultrabalancer
      - source_labels: [__meta_docker_network_ip]
        target_label: __address__
        replacement: ${1}:8080
      - target_label: __metrics_path__
        replacement: /prometheus
File-Based:
scrape_configs:
  - job_name: 'ultrabalancer'
    file_sd_configs:
      - files:
          - '/etc/prometheus/ultrabalancer_targets.json'
        refresh_interval: 30s

ultrabalancer_targets.json:
[
  {
    "targets": ["lb1.example.com:8080", "lb2.example.com:8080"],
    "labels": {
      "env": "production",
      "region": "us-west-2"
    }
  }
]
Query Examples
Common Prometheus queries for UltraBalancer metrics:
# Request rate per second
rate(ultrabalancer_requests_total[5m])
# Success rate percentage
100 * (
rate(ultrabalancer_requests_successful[5m]) /
rate(ultrabalancer_requests_total[5m])
)
# Error rate
rate(ultrabalancer_requests_failed[5m])
# Average response time
rate(ultrabalancer_response_time_seconds_sum[5m]) /
rate(ultrabalancer_response_time_seconds_count[5m])
# 95th percentile response time
histogram_quantile(0.95, rate(ultrabalancer_response_time_seconds_bucket[5m]))
# 99th percentile response time
histogram_quantile(0.99, rate(ultrabalancer_response_time_seconds_bucket[5m]))
# Requests per backend
sum by (backend) (rate(ultrabalancer_backend_requests_total[5m]))
# Unhealthy backends
count(ultrabalancer_backend_healthy == 0)
# Total active connections
sum(ultrabalancer_backend_active_connections)
Grafana Dashboards
Pre-Built Dashboard
Import the official UltraBalancer Grafana dashboard:
Download Dashboard
Download the dashboard JSON from the repository:
curl -O https://raw.githubusercontent.com/bas3line/ultrabalancer/main/grafana/dashboard.json
Import to Grafana
Open Grafana UI
Go to Dashboards → Import
Upload dashboard.json
Select your Prometheus data source
Click Import
Configure Variables
Set the following variables:
datasource: Your Prometheus instance
instance: UltraBalancer instance (supports regex)
backend: Backend server filter
Custom Dashboard Panels
Example panel definitions for request rate, response time percentiles, backend health, and traffic distribution by backend:
{
  "title": "Request Rate",
  "type": "graph",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "rate(ultrabalancer_requests_total[5m])",
      "legendFormat": "Total Requests/s"
    },
    {
      "expr": "rate(ultrabalancer_requests_successful[5m])",
      "legendFormat": "Successful Requests/s"
    },
    {
      "expr": "rate(ultrabalancer_requests_failed[5m])",
      "legendFormat": "Failed Requests/s"
    }
  ]
}

{
  "title": "Response Time Percentiles",
  "type": "graph",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "histogram_quantile(0.50, rate(ultrabalancer_response_time_seconds_bucket[5m]))",
      "legendFormat": "p50"
    },
    {
      "expr": "histogram_quantile(0.95, rate(ultrabalancer_response_time_seconds_bucket[5m]))",
      "legendFormat": "p95"
    },
    {
      "expr": "histogram_quantile(0.99, rate(ultrabalancer_response_time_seconds_bucket[5m]))",
      "legendFormat": "p99"
    }
  ]
}

{
  "title": "Backend Health",
  "type": "stat",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "sum(ultrabalancer_backend_healthy)",
      "legendFormat": "Healthy Backends"
    }
  ],
  "options": {
    "colorMode": "value",
    "graphMode": "none",
    "thresholds": {
      "mode": "absolute",
      "steps": [
        { "value": 0, "color": "red" },
        { "value": 1, "color": "yellow" },
        { "value": 2, "color": "green" }
      ]
    }
  }
}

{
  "title": "Traffic Distribution by Backend",
  "type": "piechart",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "sum by (backend) (rate(ultrabalancer_backend_requests_total[5m]))",
      "legendFormat": "{{ backend }}"
    }
  ]
}
Alert Rules
Define alert rules for critical conditions (shown in Prometheus alerting rule format):
groups:
  - name: ultrabalancer
    interval: 30s
    rules:
      # High error rate alert
      - alert: HighErrorRate
        expr: |
          (
            rate(ultrabalancer_requests_failed[5m]) /
            rate(ultrabalancer_requests_total[5m])
          ) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"

      # Slow response time alert
      - alert: SlowResponseTime
        expr: |
          histogram_quantile(0.95,
            rate(ultrabalancer_response_time_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time detected"
          description: "p95 response time is {{ $value | humanizeDuration }}"

      # Backend down alert
      - alert: BackendDown
        expr: ultrabalancer_backend_healthy == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Backend {{ $labels.backend }} is down"
          description: "Backend has been unhealthy for 1 minute"

      # All backends down
      - alert: AllBackendsDown
        expr: sum(ultrabalancer_backend_healthy) == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "All backends are down!"
          description: "Load balancer has no healthy backends"

      # High connection count
      - alert: HighConnectionCount
        expr: sum(ultrabalancer_backend_active_connections) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count detected"
          description: "Total active connections: {{ $value }}"
Metrics Collection Overhead
UltraBalancer uses highly optimized atomic operations for metrics collection:
Benchmark Results
# Run metrics benchmark
cargo bench --bench metrics

# Results (Intel i9-9900K)
increment_counter       time: 2.1234 ns
record_response_time    time: 45.678 ns
get_snapshot            time: 123.456 ns

Impact on throughput: < 0.1%

Memory Usage
# Memory overhead per metric
Counter: 8 bytes (AtomicU64)
Histogram: ~80 KB (10,000 samples)
Total overhead: ~250 KB per instance

CPU Impact
# CPU usage with metrics enabled
Base load balancer: 15% CPU
With metrics: 15.2% CPU (+0.2%)
# Metrics collection is negligible
Metrics collection in UltraBalancer uses lock-free atomic operations and has negligible performance impact even at 100k+ requests/second.
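To reproduce the order-of-magnitude numbers above on your own hardware, a benchmark of the counter pattern can be written with Criterion. The sketch below is illustrative only: it assumes the criterion crate registered as a Cargo bench target with harness = false, and is not the repository's actual bench harness.

use criterion::{criterion_group, criterion_main, Criterion};
use std::sync::atomic::{AtomicU64, Ordering};

// Measures a single lock-free counter increment, the hot-path operation
// behind counters such as total_requests.
fn bench_increment_counter(c: &mut Criterion) {
    let counter = AtomicU64::new(0);
    c.bench_function("increment_counter", |b| {
        b.iter(|| counter.fetch_add(1, Ordering::Relaxed))
    });
}

criterion_group!(benches, bench_increment_counter);
criterion_main!(benches);

Once the bench target is registered, run it with cargo bench.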
Exporting Metrics
StatsD Export
Export metrics to StatsD/DogStatsD:
use statsd::Client;

let client = Client::new("127.0.0.1:8125", "ultrabalancer").unwrap();

// Send metrics
client.count("requests.total", 1);
client.timing("response_time", duration_ms);
client.gauge("backends.healthy", healthy_count);
CloudWatch Export
Export metrics to AWS CloudWatch:
import boto3
import requests
from datetime import datetime

cloudwatch = boto3.client('cloudwatch')

def export_metrics():
    # Get metrics from UltraBalancer
    response = requests.get('http://localhost:8080/metrics')
    metrics = response.json()

    # Push to CloudWatch
    cloudwatch.put_metric_data(
        Namespace='UltraBalancer',
        MetricData=[
            {
                'MetricName': 'TotalRequests',
                'Value': metrics['total_requests'],
                'Unit': 'Count',
                'Timestamp': datetime.utcnow()
            },
            {
                'MetricName': 'AvgResponseTime',
                'Value': metrics['avg_response_time_ms'],
                'Unit': 'Milliseconds',
                'Timestamp': datetime.utcnow()
            }
        ]
    )

if __name__ == '__main__':
    export_metrics()
Datadog Export
Export metrics to Datadog:
const dogapi = require('dogapi');
const axios = require('axios');

dogapi.initialize({
  api_key: process.env.DATADOG_API_KEY,
  app_key: process.env.DATADOG_APP_KEY
});

async function exportMetrics() {
  const response = await axios.get('http://localhost:8080/metrics');
  const metrics = response.data;
  const now = Math.floor(Date.now() / 1000);

  dogapi.metric.send_all([
    {
      metric: 'ultrabalancer.requests.total',
      points: [[now, metrics.total_requests]],
      type: 'count'
    },
    {
      metric: 'ultrabalancer.response_time.avg',
      points: [[now, metrics.avg_response_time_ms]],
      type: 'gauge'
    }
  ]);
}

exportMetrics();
Custom Metrics Collection
Programmatic Access
Access metrics programmatically in your application:
use std::sync::Arc;
use std::time::Duration;
use ultrabalancer::MetricsCollector;

let metrics = Arc::new(MetricsCollector::new());

// Record metrics
metrics.increment_total_requests();
metrics.increment_successful_requests();
metrics.record_response_time(Duration::from_millis(15));

// Get snapshot
let snapshot = metrics.snapshot();
println!("Total requests: {}", snapshot.total_requests);
println!("Avg response: {}ms", snapshot.avg_response_time_ms);