Log Data Generation
Generate realistic web server access logs and application event logs with configurable traffic patterns.
Overview
The log generators create data suitable for:
- Log analytics dashboards
- Error rate monitoring
- Latency analysis
- Security analysis and fraud detection
- Observability pipeline testing
Web Server Logs
The logs() function generates HTTP access logs in Apache Combined or Common format, or as JSON structured logs.
Basic Usage
from superstore import logs
# Generate 10,000 log entries
df = logs(count=10000)
# Apache Combined format (default)
df = logs(count=10000, format="combined")
# JSON structured logs
df = logs(count=10000, format="json")
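For readers unfamiliar with the Apache Combined Log Format, the sketch below renders a single record as one log line. It uses only the standard library and does not call superstore. The dictionary keys (`ip`, `user`, `bytes`, and so on) are hypothetical names chosen for illustration, not the library's column names.

```python
from datetime import datetime, timezone


def to_combined_line(rec: dict) -> str:
    """Render one record in Apache Combined Log Format.

    Layout: host ident user [time] "request" status bytes "referer" "user-agent"
    The keys of `rec` are hypothetical, chosen only for this sketch.
    """
    ts = rec["timestamp"].strftime("%d/%b/%Y:%H:%M:%S %z")
    user = rec.get("user") or "-"          # "-" marks a missing field
    referer = rec.get("referer") or "-"
    agent = rec.get("user_agent") or "-"
    return (
        f'{rec["ip"]} - {user} [{ts}] '
        f'"{rec["method"]} {rec["path"]} HTTP/1.1" '
        f'{rec["status"]} {rec["bytes"]} "{referer}" "{agent}"'
    )


record = {
    "timestamp": datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc),
    "ip": "203.0.113.7",
    "user": "alice",
    "method": "GET",
    "path": "/api/items",
    "status": 200,
    "bytes": 512,
    "referer": None,
    "user_agent": "curl/8.0",
}
line = to_combined_line(record)
print(line)
```

The Common format is the same line without the two trailing quoted fields (referrer and user agent).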
Output Schema
| Column | Type | Description |
|---|---|---|
| | datetime | Request timestamp |
| | str | Client IP address |
| | str | HTTP method (GET, POST, etc.) |
| | str | Request path |
| | int | HTTP status code |
| | int | Response size in bytes |
| | float | Request latency in milliseconds |
| | str | User agent string |
| | str | Referrer URL |
| | str | User identifier (if authenticated) |
Application Logs
The app_logs() function generates application-level event logs with log levels, trace IDs, and exceptions.
Basic Usage
from superstore import app_logs
# Generate 5,000 application log entries
df = app_logs(count=5000)
Output Schema
| Column | Type | Description |
|---|---|---|
| | datetime | Event timestamp |
| | str | Log level (DEBUG, INFO, WARN, ERROR) |
| | str | Logger name/component |
| | str | Log message |
| | str | Distributed trace ID |
| | str | Span ID |
| | str | Exception type (if error) |
| | str | Stack trace (if error) |
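To make the relationship between trace IDs and span IDs concrete, here is a minimal standard-library sketch in which several spans share one trace. The ID widths (32 hex characters for a trace, 16 for a span) follow the W3C Trace Context convention; whether app_logs() uses that exact format is an assumption, and the helper names are hypothetical.

```python
import random


def new_trace_id(rng: random.Random) -> str:
    # Assumption: 128-bit trace ID rendered as 32 lowercase hex characters.
    return f"{rng.getrandbits(128):032x}"


def new_span_id(rng: random.Random) -> str:
    # Assumption: 64-bit span ID rendered as 16 lowercase hex characters.
    return f"{rng.getrandbits(64):016x}"


rng = random.Random(42)
trace_id = new_trace_id(rng)

# Three log events emitted while handling the same request share one trace ID
# but each carries its own span ID.
spans = [{"trace_id": trace_id, "span_id": new_span_id(rng)} for _ in range(3)]
```

Grouping log rows by trace ID is what lets a dashboard reassemble all events belonging to one request.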
Configuration
Use LogsConfig for detailed control over log generation:
from superstore import logs, LogsConfig
config = LogsConfig(
    count=50000,
    seed=42,
    format="combined",
)
df = logs(config=config)
Traffic Patterns
Control the traffic rate and timing:
config = LogsConfig(
    count=10000,
    start_time="2024-01-15T10:00:00",  # ISO format start time
    requests_per_second=250.0,         # Average RPS (Poisson arrival)
)
| Parameter | Default | Description |
|---|---|---|
| start_time | (current time) | Start timestamp in ISO format |
| requests_per_second | | Average requests per second |
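The "Poisson arrival" note means that gaps between consecutive requests are exponentially distributed with mean 1 / requests_per_second. The sketch below illustrates that idea with the standard library only; it is not the library's implementation, and `poisson_timestamps` is a hypothetical helper.

```python
import random
from datetime import datetime, timedelta


def poisson_timestamps(start: str, rps: float, count: int, seed: int = 42):
    """Return `count` timestamps whose gaps are exponential with mean 1/rps."""
    rng = random.Random(seed)
    t = datetime.fromisoformat(start)
    out = []
    for _ in range(count):
        # expovariate(rate) draws an inter-arrival gap in seconds.
        t += timedelta(seconds=rng.expovariate(rps))
        out.append(t)
    return out


stamps = poisson_timestamps("2024-01-15T10:00:00", rps=250.0, count=10_000)
span_seconds = (stamps[-1] - stamps[0]).total_seconds()
```

At 250 requests per second, 10,000 entries cover roughly 40 seconds of simulated time, so count and requests_per_second together determine how long a window the data spans.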
Status Code Distribution
Configure success and error rates:
config = LogsConfig(
    count=10000,
    success_rate=0.98,  # 98% success (2xx responses)
)
| Parameter | Default | Description |
|---|---|---|
| success_rate | | Base success rate (2xx responses) |
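As a rough illustration of what a base success rate implies, the following sketch samples status codes and measures the observed share of 2xx responses. The error pool is hypothetical; how the library splits non-2xx responses between 4xx and 5xx codes is not documented in this section.

```python
import random


def sample_status(rng: random.Random, success_rate: float) -> int:
    """Return 200 with probability `success_rate`, else a non-2xx code."""
    if rng.random() < success_rate:
        return 200
    # Hypothetical error pool, chosen only for this sketch.
    return rng.choice([400, 404, 500, 503])


rng = random.Random(42)
statuses = [sample_status(rng, success_rate=0.98) for _ in range(10_000)]
observed_success = sum(200 <= s < 300 for s in statuses) / len(statuses)
```

The same ratio computed on generated data is a quick sanity check that a configured success_rate took effect.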
Error Bursts
Simulate error bursts for monitoring/alerting demos:
config = LogsConfig(
    count=50000,
    error_burst={
        "enable": True,
        "burst_probability": 0.03,     # 3% chance of entering burst
        "burst_duration_seconds": 45,  # Average burst duration
        "burst_error_rate": 0.6,       # 60% errors during burst
    }
)
| Parameter | Default | Description |
|---|---|---|
| enable | | Enable error burst simulation |
| burst_probability | | Probability of entering burst state |
| burst_duration_seconds | | Average burst duration |
| burst_error_rate | | Error rate during bursts |
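One way to picture these parameters is as a two-state model: traffic is either in a normal state, where errors occur at 1 - success_rate, or in a burst state, where they occur at burst_error_rate. The sketch below is a standard-library illustration under two stated assumptions: the chance of entering a burst is rolled once per simulated second, and burst length is exponentially distributed around the configured average. The library's actual mechanics may differ.

```python
import random


def simulate_error_flags(count, rps, success_rate, burst, seed=42):
    """Return a list of (in_burst, is_error) pairs, one per request."""
    rng = random.Random(seed)
    enabled = burst.get("enable", False)
    now = 0.0          # simulated seconds since start
    burst_end = 0.0    # a burst is active while now < burst_end
    last_second = -1
    rows = []
    for _ in range(count):
        now += rng.expovariate(rps)  # exponential gaps between requests
        second = int(now)
        if enabled and second != last_second:
            last_second = second
            # Assumption: one burst roll per simulated second while idle.
            if now >= burst_end and rng.random() < burst["burst_probability"]:
                # Assumption: burst length is exponential around the average.
                mean = burst["burst_duration_seconds"]
                burst_end = now + rng.expovariate(1.0 / mean)
        in_burst = now < burst_end
        error_rate = burst["burst_error_rate"] if in_burst else 1.0 - success_rate
        rows.append((in_burst, rng.random() < error_rate))
    return rows


burst_cfg = {
    "enable": True,
    "burst_probability": 0.03,
    "burst_duration_seconds": 45,
    "burst_error_rate": 0.6,
}
rows = simulate_error_flags(50_000, rps=250.0, success_rate=0.98, burst=burst_cfg)
overall_error_rate = sum(err for _, err in rows) / len(rows)
```

Because bursts are clustered in time, the overall error rate alone hides them; plotting errors per time bucket is what makes the bursts visible in a monitoring demo.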
Latency Distribution
Configure request latency behavior:
config = LogsConfig(
    count=10000,
    latency={
        "base_latency_ms": 60.0,           # Median latency
        "latency_stddev": 0.9,             # Log-normal spread
        "slow_request_probability": 0.08,  # 8% slow requests
        "slow_request_multiplier": 15.0,   # Slow = 15x base
    }
)
| Parameter | Default | Description |
|---|---|---|
| base_latency_ms | | Base/median latency in milliseconds |
| latency_stddev | | Standard deviation (log-normal) |
| slow_request_probability | | Probability of slow requests |
| slow_request_multiplier | | Multiplier for slow request latency |
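A log-normal distribution with median m and spread sigma can be sampled as exp(N(ln m, sigma)). The sketch below applies that with the parameters above, assuming latency_stddev is the standard deviation of the log of the latency and that a slow request multiplies the sampled value. Both are assumptions about semantics, not a description of the library's code.

```python
import math
import random
import statistics


def sample_latency_ms(rng, base_latency_ms, latency_stddev,
                      slow_request_probability, slow_request_multiplier):
    """Draw one latency in milliseconds from a log-normal with a slow tail."""
    # mu = ln(median) makes base_latency_ms the median of the distribution.
    latency = rng.lognormvariate(math.log(base_latency_ms), latency_stddev)
    # Assumption: slow requests scale the sampled latency by the multiplier.
    if rng.random() < slow_request_probability:
        latency *= slow_request_multiplier
    return latency


rng = random.Random(42)
samples = [sample_latency_ms(rng, 60.0, 0.9, 0.08, 15.0) for _ in range(10_000)]
median_ms = statistics.median(samples)
p99_ms = sorted(samples)[int(0.99 * len(samples))]
```

Comparing the median with a high percentile such as p99 shows how the slow-request settings stretch the tail while leaving typical latency near base_latency_ms.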
Request Details
Customize request generation:
config = LogsConfig(
    count=10000,
    include_user_agent=True,  # Include user agent strings
    unique_ips=2000,          # Number of unique client IPs
    unique_users=800,         # Number of unique user IDs
    api_path_ratio=0.8,       # 80% API paths, 20% static
)
| Parameter | Default | Description |
|---|---|---|
| include_user_agent | | Include user agent strings |
| unique_ips | | Number of unique IP addresses |
| unique_users | | Number of unique user IDs |
| api_path_ratio | | Ratio of API vs static paths |
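To show how a fixed pool size and a path ratio shape the data, this standard-library sketch builds a pool of distinct client IPs and samples request paths at a given API share. The address range and the path lists are hypothetical; they only stand in for whatever the library generates.

```python
import ipaddress
import random


def build_ip_pool(rng: random.Random, unique_ips: int) -> list:
    """Draw `unique_ips` distinct addresses from 10.0.0.0/8 (arbitrary choice)."""
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    # sample() without replacement guarantees the addresses are distinct.
    offsets = rng.sample(range(1, 2**24 - 1), unique_ips)
    return [str(ipaddress.IPv4Address(base + off)) for off in offsets]


def sample_path(rng: random.Random, api_path_ratio: float) -> str:
    # Hypothetical path lists, for illustration only.
    api_paths = ["/api/users", "/api/orders", "/api/items"]
    static_paths = ["/static/app.js", "/static/site.css"]
    pool = api_paths if rng.random() < api_path_ratio else static_paths
    return rng.choice(pool)


rng = random.Random(42)
ip_pool = build_ip_pool(rng, unique_ips=2000)
requests = [(rng.choice(ip_pool), sample_path(rng, 0.8)) for _ in range(10_000)]
api_share = sum(path.startswith("/api/") for _, path in requests) / len(requests)
```

A smaller unique_ips value concentrates traffic on fewer clients, which is useful when a demo needs per-IP rate limiting or anomaly detection to trigger.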
Complete Example
from superstore import logs, LogsConfig
config = LogsConfig(
    count=100000,
    seed=42,
    format="json",
    # Traffic
    start_time="2024-06-01T00:00:00",
    requests_per_second=500.0,
    # Success rate
    success_rate=0.97,
    # Error bursts for monitoring demos
    error_burst={
        "enable": True,
        "burst_probability": 0.02,
        "burst_duration_seconds": 60,
        "burst_error_rate": 0.7,
    },
    # Latency
    latency={
        "base_latency_ms": 45.0,
        "latency_stddev": 0.7,
        "slow_request_probability": 0.05,
        "slow_request_multiplier": 20.0,
    },
    # Request details
    unique_ips=5000,
    unique_users=2000,
    api_path_ratio=0.85,
)
df = logs(config=config)
API Reference
See the full API documentation: