# Log Data Generation Generate realistic web server access logs and application event logs with configurable traffic patterns. ## Overview The log generators create data suitable for: - Log analytics dashboards - Error rate monitoring - Latency analysis - Security analysis and fraud detection - Observability pipeline testing ## Web Server Logs The `logs()` function generates HTTP access logs in Apache Combined or Common format. ### Basic Usage ```python from superstore import logs # Generate 10,000 log entries df = logs(count=10000) # Apache Combined format (default) df = logs(count=10000, format="combined") # JSON structured logs df = logs(count=10000, format="json") ``` ### Output Schema | Column | Type | Description | |--------|------|-------------| | `timestamp` | datetime | Request timestamp | | `ip_address` | str | Client IP address | | `method` | str | HTTP method (GET, POST, etc.) | | `path` | str | Request path | | `status_code` | int | HTTP status code | | `response_size` | int | Response size in bytes | | `latency_ms` | float | Request latency in milliseconds | | `user_agent` | str | User agent string | | `referer` | str | Referrer URL | | `user_id` | str | User identifier (if authenticated) | --- ## Application Logs The `app_logs()` function generates application-level event logs with log levels, trace IDs, and exceptions. ### Basic Usage ```python from superstore import app_logs # Generate 5,000 application log entries df = app_logs(count=5000) ``` ### Output Schema | Column | Type | Description | |--------|------|-------------| | `timestamp` | datetime | Event timestamp | | `level` | str | Log level (DEBUG, INFO, WARN, ERROR) | | `logger` | str | Logger name/component | | `message` | str | Log message | | `trace_id` | str | Distributed trace ID | | `span_id` | str | Span ID | | `exception` | str | Exception type (if error) | | `stack_trace` | str | Stack trace (if error) | --- ## Configuration Use `LogsConfig` for detailed control over log generation: ```python from superstore import logs, LogsConfig config = LogsConfig( count=50000, seed=42, format="combined", ) df = logs(config=config) ``` ### Traffic Patterns Control the traffic rate and timing: ```python config = LogsConfig( count=10000, start_time="2024-01-15T10:00:00", # ISO format start time requests_per_second=250.0, # Average RPS (Poisson arrival) ) ``` | Parameter | Default | Description | |-----------|---------|-------------| | `start_time` | (current time) | Start timestamp in ISO format | | `requests_per_second` | `100.0` | Average requests per second | ### Status Code Distribution Configure success and error rates: ```python config = LogsConfig( count=10000, success_rate=0.98, # 98% success (2xx responses) ) ``` | Parameter | Default | Description | |-----------|---------|-------------| | `success_rate` | `0.95` | Base success rate (2xx responses) | ### Error Bursts Simulate error bursts for monitoring/alerting demos: ```python config = LogsConfig( count=50000, error_burst={ "enable": True, "burst_probability": 0.03, # 3% chance of entering burst "burst_duration_seconds": 45, # Average burst duration "burst_error_rate": 0.6, # 60% errors during burst } ) ``` | Parameter | Default | Description | |-----------|---------|-------------| | `enable` | `True` | Enable error burst simulation | | `burst_probability` | `0.02` | Probability of entering burst state | | `burst_duration_seconds` | `30` | Average burst duration | | `burst_error_rate` | `0.5` | Error rate during bursts | ### Latency Distribution Configure request latency behavior: ```python config = LogsConfig( count=10000, latency={ "base_latency_ms": 60.0, # Median latency "latency_stddev": 0.9, # Log-normal spread "slow_request_probability": 0.08, # 8% slow requests "slow_request_multiplier": 15.0, # Slow = 15x base } ) ``` | Parameter | Default | Description | |-----------|---------|-------------| | `base_latency_ms` | `50.0` | Base/median latency in milliseconds | | `latency_stddev` | `0.8` | Standard deviation (log-normal) | | `slow_request_probability` | `0.05` | Probability of slow requests | | `slow_request_multiplier` | `10.0` | Multiplier for slow request latency | ### Request Details Customize request generation: ```python config = LogsConfig( count=10000, include_user_agent=True, # Include user agent strings unique_ips=2000, # Number of unique client IPs unique_users=800, # Number of unique user IDs api_path_ratio=0.8, # 80% API paths, 20% static ) ``` | Parameter | Default | Description | |-----------|---------|-------------| | `include_user_agent` | `True` | Include user agent strings | | `unique_ips` | `1000` | Number of unique IP addresses | | `unique_users` | `500` | Number of unique user IDs | | `api_path_ratio` | `0.7` | Ratio of API vs static paths | ### Complete Example ```python from superstore import logs, LogsConfig config = LogsConfig( count=100000, seed=42, format="json", # Traffic start_time="2024-06-01T00:00:00", requests_per_second=500.0, # Success rate success_rate=0.97, # Error bursts for monitoring demos error_burst={ "enable": True, "burst_probability": 0.02, "burst_duration_seconds": 60, "burst_error_rate": 0.7, }, # Latency latency={ "base_latency_ms": 45.0, "latency_stddev": 0.7, "slow_request_probability": 0.05, "slow_request_multiplier": 20.0, }, # Request details unique_ips=5000, unique_users=2000, api_path_ratio=0.85, ) df = logs(config=config) ``` --- ## API Reference See the full API documentation: - [logs()](api.md) - [app_logs()](api.md) - [LogsConfig](api.md)