# Telemetry & IoT Data Generation

Generate machine telemetry and IoT sensor data for dashboard and observability demos.

## Overview

The telemetry generators create data suitable for:

- DevOps/SRE dashboards
- Anomaly detection systems
- Capacity planning demos
- Infrastructure monitoring
- Crossfilter visualization
## Telemetry

The `telemetry()` function generates comprehensive machine metrics with configurable scenarios.

### Basic Usage

```python
from superstore import telemetry

# Generate telemetry with default settings
df = telemetry(n_machines=50, n_readings=1000)

# Use a preset scenario
df = telemetry(scenario="anomaly_detection", n_machines=100)
```
### Output Schema

The generated frame contains the following fields; the exact column names are exported as `TELEMETRY_SCHEMA` (see [Schemas](#schemas) below).

| Type | Description |
|---|---|
| datetime | Reading timestamp |
| str | Machine identifier |
| str | Machine type (core, edge, worker) |
| str | Zone identifier |
| str | Region identifier |
| float | CPU utilization (0-100) |
| float | Memory utilization (0-100) |
| float | Disk utilization (0-100) |
| float | Network ingress (Mbps) |
| float | Network egress (Mbps) |
| int | Request count |
| int | Error count |
| float | 50th percentile latency (ms) |
| float | 99th percentile latency (ms) |
| bool | Anomaly label (for ML training) |
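A quick way to confirm the schema on your installed version is to inspect the frame directly. A minimal sketch, assuming `telemetry()` returns a pandas DataFrame (as the `df` naming above suggests):

```python
from superstore import telemetry

df = telemetry(n_machines=10, n_readings=200)

# List the columns and dtypes actually produced
print(df.dtypes)

# Peek at the first few readings
print(df.head())
```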
### Preset Scenarios

Use preset scenarios for common use cases:

```python
from superstore import telemetry, TELEMETRY_SCENARIOS

# See available scenarios
print(TELEMETRY_SCENARIOS)

# Use a scenario
df = telemetry(scenario="production", n_machines=100)
```
The presets cover the following situations (print `TELEMETRY_SCENARIOS` for the exact scenario names):

- Normal operating conditions
- Scheduled maintenance patterns
- Growth trend data
- Training data with labeled anomalies
- Multi-datacenter deployment
- High CPU workload
- High memory workload
- Network-intensive workload
- Progressive degradation and recovery
- Full realistic environment
- High anomaly rates for chaos engineering
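To compare presets side by side, you can generate a small frame per scenario. A minimal sketch, assuming that iterating `TELEMETRY_SCENARIOS` yields the scenario names accepted by the `scenario=` argument (if it is a dict, iteration yields its keys, which works the same way):

```python
from superstore import telemetry, TELEMETRY_SCENARIOS

# Generate a small sample per preset and compare row counts;
# iterating TELEMETRY_SCENARIOS is assumed to yield scenario names
for name in TELEMETRY_SCENARIOS:
    df = telemetry(scenario=name, n_machines=5, n_readings=100)
    print(f"{name}: {len(df)} rows")
```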
## Crossfilter Data

For dashboard demos, use the individual crossfilter generators.

### machines()

Generate machine metadata:

```python
from superstore import machines

df = machines(n_machines=100)
```
The exact column names are exported as `MACHINE_SCHEMA`.

| Type | Description |
|---|---|
| str | Machine identifier |
| str | Machine type |
| int | CPU cores |
| int | Memory in GB |
| str | Zone |
| str | Region |
| datetime | Provisioning date |
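A typical first step for a crossfilter dashboard is a fleet breakdown by topology. The `"region"` and `"zone"` column names below are assumptions based on the descriptions above; check `MACHINE_SCHEMA` for the actual names:

```python
from superstore import machines

df = machines(n_machines=100)

# Count machines per region and zone -- "region" and "zone" are
# assumed column names; see MACHINE_SCHEMA for the real ones
print(df.groupby(["region", "zone"]).size())
```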
### usage()

Generate machine usage metrics:

```python
from superstore import usage

df = usage(n_machines=100, n_readings=1000)
```
The exact column names are exported as `USAGE_SCHEMA`.

| Type | Description |
|---|---|
| datetime | Reading timestamp |
| str | Machine identifier |
| float | CPU utilization |
| float | Memory utilization |
| float | Disk utilization |
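A common pattern is joining usage readings back to machine metadata so readings can be filtered by zone or type. This sketch assumes both generators draw from the same pool of identifiers for a given `n_machines`; the `"machine_id"` join key is hypothetical, so check `USAGE_SCHEMA` and `MACHINE_SCHEMA` for the actual column:

```python
from superstore import machines, usage

fleet = machines(n_machines=100)
readings = usage(n_machines=100, n_readings=1000)

# Join readings to metadata -- "machine_id" is a hypothetical
# join key; use whatever identifier the schemas actually define
enriched = readings.merge(fleet, on="machine_id")
print(enriched.head())
```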
### status()

Generate machine status records:

```python
from superstore import status

df = status(n_machines=100)
```
The exact column names are exported as `STATUS_SCHEMA`.

| Type | Description |
|---|---|
| str | Machine identifier |
| str | Current status (healthy, degraded, down) |
| datetime | Last heartbeat time |
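For a health summary panel, a simple count of states is usually enough. The `"status"` column name is an assumption; see `STATUS_SCHEMA` for the exact one:

```python
from superstore import status

df = status(n_machines=100)

# Distribution of machine states -- "status" is an assumed
# column name; see STATUS_SCHEMA
print(df["status"].value_counts())
```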
### jobs()

Generate job/task records:

```python
from superstore import jobs

df = jobs(n_machines=100, n_jobs=5000)
```
The exact column names are exported as `JOBS_SCHEMA`.

| Type | Description |
|---|---|
| str | Job identifier |
| str | Executing machine |
| str | Job type |
| str | Job status |
| datetime | Job start time |
| datetime | Job end time |
| float | Job duration |
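To summarize job behavior, aggregate durations by type. The `"job_type"` and `"duration"` column names are assumptions, and the schema above does not state the duration units, so check `JOBS_SCHEMA`:

```python
from superstore import jobs

df = jobs(n_machines=100, n_jobs=5000)

# Mean duration per job type -- "job_type" and "duration" are
# assumed column names; see JOBS_SCHEMA for the real ones
print(df.groupby("job_type")["duration"].mean())
```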
## Configuration

Use `CrossfilterConfig` for detailed control:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=2000,
    seed=42,
)

df = telemetry(config=config)
```
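Because the config carries a `seed`, identical configs should yield identical data, which is useful for tests and repeatable demos. A minimal check, assuming `telemetry()` returns a pandas DataFrame:

```python
import pandas as pd

from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(n_machines=10, n_readings=100, seed=42)

# Two runs with the same seed should produce identical frames
a = telemetry(config=config)
b = telemetry(config=config)
pd.testing.assert_frame_equal(a, b)
```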
### Machine Configuration

Configure the machine fleet:

```python
config = CrossfilterConfig(
    n_machines=100,
    # Machine types
    machine_types=["core", "edge", "worker"],
    # Hardware specs
    cores_range=(8, 128),  # CPU cores range
    # Topology
    zones=["zone-a", "zone-b", "zone-c", "zone-d"],
    regions=["us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"],
)
```
| Parameter | Description |
|---|---|
| `machine_types` | Types of machines |
| `cores_range` | CPU cores range (min, max) |
| `zones` | Available zones |
| `regions` | Available regions |
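Restricting the topology should be visible in the generated data. A quick check, where `"zone"` and `"region"` are assumed column names:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=50,
    zones=["zone-a", "zone-b"],
    regions=["us-east-1"],
)
df = telemetry(config=config)

# Every reading should fall inside the configured topology --
# "zone" and "region" are assumed column names
print(df["zone"].unique())
print(df["region"].unique())
```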
### Usage Profiles

Configure baseline resource utilization:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=1000,
    base_cpu_load=0.4,     # 40% base CPU
    base_memory_load=0.6,  # 60% base memory
    load_variance=0.25,    # Load variability
)
```
| Parameter | Description |
|---|---|
| `base_cpu_load` | Base CPU utilization (0-1) |
| `base_memory_load` | Base memory utilization (0-1) |
| `load_variance` | Variance in load readings |
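Since CPU utilization is reported on a 0-100 scale while `base_cpu_load` is a 0-1 fraction, the fleet-wide mean should land near `base_cpu_load * 100`, give or take variance, temporal patterns, and anomalies. A rough check, where `"cpu"` is an assumed column name:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=50,
    n_readings=500,
    base_cpu_load=0.4,
    load_variance=0.1,
)
df = telemetry(config=config)

# Mean CPU utilization should sit near 40 on the 0-100 scale --
# "cpu" is an assumed column name
print(df["cpu"].mean())
```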
### Anomaly Injection

Inject anomalies for detection training:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.03,            # 3% CPU spikes
        "memory_leak_probability": 0.015,         # Memory leak starts
        "network_saturation_probability": 0.02,   # Network saturation
    },
)
```
| Parameter | Description |
|---|---|
| `enable` | Enable anomaly injection |
| `cpu_spike_probability` | Probability of CPU spike |
| `memory_leak_probability` | Probability of memory leak |
| `network_saturation_probability` | Probability of network saturation |
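Injected anomalies surface through the boolean label column in the output schema, which makes the realized anomaly rate easy to measure. The `"anomaly"` column name below is an assumption; see `TELEMETRY_SCHEMA` for the exact one:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=50,
    n_readings=1000,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.05,
        "memory_leak_probability": 0.01,
        "network_saturation_probability": 0.01,
    },
)
df = telemetry(config=config)

# Fraction of readings labeled anomalous -- "anomaly" is an
# assumed name for the boolean label column
print(df["anomaly"].mean())
```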
### Temporal Patterns

Add realistic time-of-day and day-of-week patterns:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=5000,
    temporal_patterns={
        "enable_diurnal": True,      # Day/night patterns
        "enable_weekly": True,       # Weekday/weekend patterns
        "peak_hour": 14,             # Peak at 2 PM
        "night_load_factor": 0.25,   # 25% load at night
        "weekend_load_factor": 0.4,  # 40% load on weekends
    },
)
```
| Parameter | Description |
|---|---|
| `enable_diurnal` | Enable day/night load patterns |
| `enable_weekly` | Enable weekday/weekend patterns |
| `peak_hour` | Hour of peak load (0-23) |
| `night_load_factor` | Load factor during the night |
| `weekend_load_factor` | Load factor during weekends |
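With diurnal patterns enabled, average load grouped by hour of day should peak near `peak_hour`. A sketch using assumed `"timestamp"` and `"cpu"` column names:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=50,
    n_readings=5000,
    temporal_patterns={
        "enable_diurnal": True,
        "enable_weekly": True,
        "peak_hour": 14,
        "night_load_factor": 0.25,
        "weekend_load_factor": 0.4,
    },
)
df = telemetry(config=config)

# Average CPU by hour of day should peak around hour 14 --
# "timestamp" and "cpu" are assumed column names
hourly = df.groupby(df["timestamp"].dt.hour)["cpu"].mean()
print(hourly.round(1))
```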
### Failure Simulation

Simulate machine failures and cascades:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    enable_failures=True,
    failure_probability=0.002,        # 0.2% failure rate per reading
    cascade_failure_probability=0.4,  # 40% chance of cascade
)
```
| Parameter | Description |
|---|---|
| `enable_failures` | Enable failure simulation |
| `failure_probability` | Per-reading failure probability |
| `cascade_failure_probability` | Probability that a failure cascades to dependent machines |
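How failures surface in the metrics is generator-specific; a reasonable guess is elevated error counts, but that is an assumption. A speculative comparison against a failure-free baseline, with `"errors"` as an assumed column name:

```python
from superstore import telemetry, CrossfilterConfig

# Baseline without failures vs. a run with failures enabled; the
# expectation that failures raise error counts is an assumption
baseline = telemetry(config=CrossfilterConfig(
    n_machines=50, n_readings=1000, seed=1, enable_failures=False,
))
failing = telemetry(config=CrossfilterConfig(
    n_machines=50, n_readings=1000, seed=1,
    enable_failures=True, failure_probability=0.01,
))

print("baseline errors:", baseline["errors"].mean())
print("with failures:  ", failing["errors"].mean())
```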
## Complete Example

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=3000,
    seed=42,
    # Machine topology
    machine_types=["core", "edge", "worker"],
    cores_range=(8, 96),
    zones=["zone-a", "zone-b", "zone-c"],
    regions=["us-east-1", "us-west-2", "eu-west-1"],
    # Usage profiles
    base_cpu_load=0.35,
    base_memory_load=0.55,
    load_variance=0.2,
    # Anomalies
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.02,
        "memory_leak_probability": 0.01,
        "network_saturation_probability": 0.01,
    },
    # Temporal patterns
    temporal_patterns={
        "enable_diurnal": True,
        "enable_weekly": True,
        "peak_hour": 15,
        "night_load_factor": 0.2,
        "weekend_load_factor": 0.35,
    },
    # Failures
    enable_failures=True,
    failure_probability=0.001,
    cascade_failure_probability=0.25,
)

df = telemetry(config=config)
```
## Schemas

Access schema constants for validation:

```python
from superstore import (
    MACHINE_SCHEMA,
    USAGE_SCHEMA,
    STATUS_SCHEMA,
    JOBS_SCHEMA,
    TELEMETRY_SCHEMA,
    TELEMETRY_SCENARIOS,
)
```
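The structure of these constants is not shown here; a common convention is a mapping of column name to dtype, and the sketch below assumes that (`set()` also works if they are plain lists of names):

```python
from superstore import telemetry, TELEMETRY_SCHEMA

df = telemetry(n_machines=10, n_readings=100)

# Confirm the generated frame exposes exactly the documented
# columns -- assumes TELEMETRY_SCHEMA is keyed by column name
assert set(df.columns) == set(TELEMETRY_SCHEMA)
```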
## API Reference

See the full API documentation for complete function signatures and default values.