# Telemetry & IoT Data Generation

Generate machine telemetry and IoT sensor data for dashboard and observability demos.

## Overview

The telemetry generators create data suitable for:

- DevOps/SRE dashboards
- Anomaly detection systems
- Capacity planning demos
- Infrastructure monitoring
- Crossfilter visualization

## Telemetry

The `telemetry()` function generates comprehensive machine metrics with configurable scenarios.

### Basic Usage

```python
from superstore import telemetry

# Generate telemetry with default settings
df = telemetry(n_machines=50, n_readings=1000)

# Use a preset scenario
df = telemetry(scenario="anomaly_detection", n_machines=100)
```

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | datetime | Reading timestamp |
| `machine_id` | str | Machine identifier |
| `machine_type` | str | Machine type (core, edge, worker) |
| `zone` | str | Zone identifier |
| `region` | str | Region identifier |
| `cpu_percent` | float | CPU utilization (0-100) |
| `memory_percent` | float | Memory utilization (0-100) |
| `disk_percent` | float | Disk utilization (0-100) |
| `network_in_mbps` | float | Network ingress (Mbps) |
| `network_out_mbps` | float | Network egress (Mbps) |
| `request_count` | int | Request count |
| `error_count` | int | Error count |
| `latency_p50` | float | 50th percentile latency (ms) |
| `latency_p99` | float | 99th percentile latency (ms) |
| `is_anomaly` | bool | Anomaly label (for ML training) |

### Preset Scenarios

Use preset scenarios for common use cases:

```python
from superstore import telemetry, TELEMETRY_SCENARIOS

# See available scenarios
print(TELEMETRY_SCENARIOS)

# Use a scenario
df = telemetry(scenario="production", n_machines=100)
```

| Scenario | Description |
|----------|-------------|
| `baseline` | Normal operating conditions |
| `maintenance_window` | Scheduled maintenance patterns |
| `capacity_planning` | Growth trend data |
| `anomaly_detection` | Training data with labeled anomalies |
| `multi_zone` | Multi-datacenter deployment |
| `cpu_bound` | High CPU workload |
| `memory_bound` | High memory workload |
| `network_heavy` | Network-intensive workload |
| `degradation_cycle` | Progressive degradation and recovery |
| `production` | Full realistic environment |
| `chaos` | High anomaly rates for chaos engineering |

---

## Crossfilter Data

For dashboard demos, use the individual crossfilter generators:

### machines()

Generate machine metadata:

```python
from superstore import machines

df = machines(n_machines=100)
```

| Column | Type | Description |
|--------|------|-------------|
| `machine_id` | str | Machine identifier |
| `machine_type` | str | Machine type |
| `cores` | int | CPU cores |
| `memory_gb` | int | Memory in GB |
| `zone` | str | Zone |
| `region` | str | Region |
| `created_at` | datetime | Provisioning date |

### usage()

Generate machine usage metrics:

```python
from superstore import usage

df = usage(n_machines=100, n_readings=1000)
```

| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | datetime | Reading timestamp |
| `machine_id` | str | Machine identifier |
| `cpu_percent` | float | CPU utilization |
| `memory_percent` | float | Memory utilization |
| `disk_percent` | float | Disk utilization |
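
The crossfilter tables share the `machine_id` key, so they can be joined to drive dashboard dimensions. A minimal sketch, assuming the generators return pandas DataFrames keyed by the same `machine_id` values:

```python
from superstore import machines, usage

fleet = machines(n_machines=100)
readings = usage(n_machines=100, n_readings=1000)

# Attach zone/region/hardware metadata to each reading via the shared key
enriched = readings.merge(fleet, on="machine_id", how="left")

# A typical crossfilter dimension: mean CPU utilization per zone
print(enriched.groupby("zone")["cpu_percent"].mean())
```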
### status()

Generate machine status records:

```python
from superstore import status

df = status(n_machines=100)
```

| Column | Type | Description |
|--------|------|-------------|
| `machine_id` | str | Machine identifier |
| `status` | str | Current status (healthy, degraded, down) |
| `last_heartbeat` | datetime | Last heartbeat time |

### jobs()

Generate job/task records:

```python
from superstore import jobs

df = jobs(n_machines=100, n_jobs=5000)
```

| Column | Type | Description |
|--------|------|-------------|
| `job_id` | str | Job identifier |
| `machine_id` | str | Executing machine |
| `job_type` | str | Job type |
| `status` | str | Job status |
| `start_time` | datetime | Job start time |
| `end_time` | datetime | Job end time |
| `duration_seconds` | float | Job duration |

---

## Configuration

Use `CrossfilterConfig` for detailed control:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=2000,
    seed=42,
)

df = telemetry(config=config)
```

### Machine Configuration

Configure the machine fleet:

```python
config = CrossfilterConfig(
    n_machines=100,
    # Machine types
    machine_types=["core", "edge", "worker"],
    # Hardware specs
    cores_range=(8, 128),  # CPU cores range
    # Topology
    zones=["zone-a", "zone-b", "zone-c", "zone-d"],
    regions=["us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"],
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `machine_types` | `[core, edge, worker]` | Types of machines |
| `cores_range` | `(4, 64)` | CPU cores range |
| `zones` | `[zone-a, zone-b, zone-c]` | Available zones |
| `regions` | `[us-east-1, us-west-2, eu-west-1]` | Available regions |

### Usage Profiles

Configure baseline resource utilization:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=1000,
    base_cpu_load=0.4,     # 40% base CPU
    base_memory_load=0.6,  # 60% base memory
    load_variance=0.25,    # Load variability
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `base_cpu_load` | `0.3` | Base CPU utilization (0-1) |
| `base_memory_load` | `0.5` | Base memory utilization (0-1) |
| `load_variance` | `0.2` | Variance in load readings |

### Anomaly Injection

Inject anomalies for detection training:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.03,           # 3% chance of a CPU spike
        "memory_leak_probability": 0.015,        # 1.5% chance a memory leak starts
        "network_saturation_probability": 0.02,  # 2% chance of network saturation
    },
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable` | `False` | Enable anomaly injection |
| `cpu_spike_probability` | `0.02` | Probability of CPU spike |
| `memory_leak_probability` | `0.01` | Probability of memory leak |
| `network_saturation_probability` | `0.01` | Probability of network saturation |
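
Because injected anomalies are labeled in the `is_anomaly` column, the output doubles as training data. A minimal sketch, again assuming a pandas DataFrame, that checks the realized anomaly rate and compares metrics by label:

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    seed=42,
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.03,
        "memory_leak_probability": 0.015,
        "network_saturation_probability": 0.02,
    },
)
df = telemetry(config=config)

# Fraction of readings labeled anomalous (varies with probabilities and seed)
print(df["is_anomaly"].mean())

# Do labeled anomalies look different? Compare metric means by label
print(df.groupby("is_anomaly")[["cpu_percent", "latency_p99"]].mean())
```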
### Temporal Patterns

Add realistic time-of-day and day-of-week patterns:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=5000,
    temporal_patterns={
        "enable_diurnal": True,      # Day/night patterns
        "enable_weekly": True,       # Weekday/weekend patterns
        "peak_hour": 14,             # Peak at 2 PM
        "night_load_factor": 0.25,   # 25% load at night
        "weekend_load_factor": 0.4,  # 40% load on weekends
    },
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_diurnal` | `False` | Enable day/night load patterns |
| `enable_weekly` | `False` | Enable weekday/weekend patterns |
| `peak_hour` | `14` | Hour of peak load (0-23) |
| `night_load_factor` | `0.3` | Load factor during night |
| `weekend_load_factor` | `0.5` | Load factor during weekends |

### Failure Simulation

Simulate machine failures and cascades:

```python
config = CrossfilterConfig(
    n_machines=100,
    n_readings=2000,
    enable_failures=True,
    failure_probability=0.002,        # 0.2% failure rate per reading
    cascade_failure_probability=0.4,  # 40% chance of cascade
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable_failures` | `False` | Enable failure simulation |
| `failure_probability` | `0.001` | Per-reading failure probability |
| `cascade_failure_probability` | `0.3` | Probability that a failure cascades to dependent machines |

### Complete Example

```python
from superstore import telemetry, CrossfilterConfig

config = CrossfilterConfig(
    n_machines=200,
    n_readings=3000,
    seed=42,
    # Machine topology
    machine_types=["core", "edge", "worker"],
    cores_range=(8, 96),
    zones=["zone-a", "zone-b", "zone-c"],
    regions=["us-east-1", "us-west-2", "eu-west-1"],
    # Usage profiles
    base_cpu_load=0.35,
    base_memory_load=0.55,
    load_variance=0.2,
    # Anomalies
    anomalies={
        "enable": True,
        "cpu_spike_probability": 0.02,
        "memory_leak_probability": 0.01,
        "network_saturation_probability": 0.01,
    },
    # Temporal patterns
    temporal_patterns={
        "enable_diurnal": True,
        "enable_weekly": True,
        "peak_hour": 15,
        "night_load_factor": 0.2,
        "weekend_load_factor": 0.35,
    },
    # Failures
    enable_failures=True,
    failure_probability=0.001,
    cascade_failure_probability=0.25,
)

df = telemetry(config=config)
```

---

## Schemas

Access schema constants for validation:

```python
from superstore import (
    MACHINE_SCHEMA,
    USAGE_SCHEMA,
    STATUS_SCHEMA,
    JOBS_SCHEMA,
    TELEMETRY_SCHEMA,
    TELEMETRY_SCENARIOS,
)
```

---

## API Reference

See the full API documentation:

- [telemetry()](api.md)
- [machines()](api.md)
- [usage()](api.md)
- [status()](api.md)
- [jobs()](api.md)
- [CrossfilterConfig](api.md)
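
As a closing smoke test, the schema constants above can back a quick column check. A minimal sketch, assuming iterating a schema constant yields the expected column names (e.g. dict keys or a list; confirm the exact structure against the API reference):

```python
from superstore import telemetry, TELEMETRY_SCHEMA

df = telemetry(n_machines=10, n_readings=100)

# Assumes iterating TELEMETRY_SCHEMA yields the expected column names
missing = set(TELEMETRY_SCHEMA) - set(df.columns)
assert not missing, f"telemetry() output is missing columns: {missing}"
```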