superstore
High-performance synthetic data generation library for testing and development.
Overview
superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides:
Data Generators
| Generator | Description | Use Cases |
|---|---|---|
| | Sales transactions, employees | BI dashboards, forecasting |
| | Financial-style series with regimes, jumps | Quant research, backtesting |
| | Sensor data with seasonal/diurnal patterns | IoT analytics, anomaly detection |
| | Web server & application logs | Observability, alerting |
| | Stock prices, OHLCV, options chains | Trading systems, risk analysis |
| | Machine metrics, anomalies, failures | DevOps dashboards, ML training |
Statistical Tools
| Tool | Description | Use Cases |
|---|---|---|
| | Sample from statistical distributions | Simulation, Monte Carlo |
| | Correlated multivariate data | Risk modeling, portfolio analysis |
| | AR, Markov chains, random walks | Time series simulation |
Key Features
Rust-powered: High-performance generation, 10-100x faster than pure Python
Flexible output: pandas DataFrame, polars DataFrame, or Python dicts
Configurable: Pydantic config classes for validated, structured configuration
Reproducible: Seed support for deterministic generation
Scalable: Streaming and parallel generation for large datasets
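As a rough illustration of what streaming generation means in practice, the sketch below yields records in fixed-size chunks so that a large dataset never has to sit in memory at once. It uses only the Python standard library. `stream_records` and its `chunk_size` parameter are hypothetical names chosen for this example; they do not describe superstore's streaming API.

```python
import random


def stream_records(total, chunk_size=1000, seed=None):
    """Hypothetical chunked generator; not superstore's API."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # One private RNG for the whole stream keeps the output deterministic
    # for a given seed, regardless of how the consumer iterates.
    rng = random.Random(seed)
    produced = 0
    while produced < total:
        size = min(chunk_size, total - produced)
        yield [
            {"row": produced + i, "value": rng.random()}
            for i in range(size)
        ]
        produced += size


# Consume the stream one chunk at a time; only one chunk is held in memory.
chunk_sizes = [len(chunk) for chunk in stream_records(2500, chunk_size=1000, seed=7)]
print(chunk_sizes)  # [1000, 1000, 500]
```

Each chunk can be written out or aggregated before the next one is requested, which is the property that matters for datasets larger than memory.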
Installation
pip install superstore
For development with polars support:
pip install "superstore[develop]"
Quick Start
from superstore import superstore, employees, timeseries, weather
# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)
# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")
# Generate as list of dicts
records = superstore(count=1000, output="dict")
Reproducibility with Seeds
All data generators support an optional seed parameter for reproducible random data generation:
from superstore import superstore, employees, timeseries, weather, machines
# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2) # True
# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)
# No seed means random data each time
df3 = superstore(count=100) # Different each call
Development
Setup
# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore
# Install development dependencies
make develop
Building
# Build Python wheel
make build
Testing
# Run all tests
make test
Linting
# Run linters
make lint
# Fix formatting
make fix
Architecture
superstore uses a hybrid Rust/Python architecture:
rust/: Core Rust library with all data generation logic
src/: PyO3 bindings exposing Rust functions to Python
superstore/: Python package with native module
The core data generation is implemented in Rust for performance, with PyO3 providing seamless Python integration. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.
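To make the three output options concrete, here is a rough pure-Python sketch of how one set of records might be reshaped for each `output` value. In superstore this conversion happens in the Rust bindings, so the `to_columns` and `to_output` helpers below are assumptions for illustration only and say nothing about the actual implementation. pandas and polars are imported lazily so that the `dict` path runs without either installed.

```python
def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys (hypothetical helper).
    """
    columns = {}
    for row in records:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_output(records, output="pandas"):
    """Return records in the requested format (hypothetical helper)."""
    if output == "dict":
        return list(records)
    columns = to_columns(records)
    if output == "pandas":
        import pandas as pd  # imported only when this format is requested
        return pd.DataFrame(columns)
    if output == "polars":
        import polars as pl  # imported only when this format is requested
        return pl.DataFrame(columns)
    raise ValueError(f"unsupported output format: {output!r}")


rows = [{"city": "Austin", "sales": 120.5}, {"city": "Boston", "sales": 87.0}]
print(to_columns(rows))
print(to_output(rows, output="dict"))
```

Building a column-oriented intermediate first is a common choice because both DataFrame constructors accept a mapping of column names to lists directly.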
License
This library is released under the Apache 2.0 license.
Note: This library was generated using copier from the Base Python Project Template repository.