superstore

High-performance synthetic data generation library for testing and development.

Overview

superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides:

Data Generators

| Generator   | Description                                 | Use Cases                        |
|-------------|---------------------------------------------|----------------------------------|
| Retail      | Sales transactions, employees               | BI dashboards, forecasting       |
| Time Series | Financial-style series with regimes, jumps  | Quant research, backtesting      |
| Weather     | Sensor data with seasonal/diurnal patterns  | IoT analytics, anomaly detection |
| Logs        | Web server & application logs               | Observability, alerting          |
| Finance     | Stock prices, OHLCV, options chains         | Trading systems, risk analysis   |
| Telemetry   | Machine metrics, anomalies, failures        | DevOps dashboards, ML training   |
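
The Time Series row mentions regimes and jumps. To illustrate what that phrase usually means (this is not superstore's actual algorithm), here is a stdlib-only sketch: a log-price random walk whose volatility switches between two regimes via a two-state Markov chain, plus rare Gaussian jumps. The function name `regime_switching_series` and all parameter values are hypothetical.

```python
import math
import random


def regime_switching_series(n, seed=None, p_switch=0.05, jump_prob=0.02,
                            jump_scale=0.05, start=100.0):
    """Hypothetical sketch, not superstore's implementation.

    Simulate a price-like series whose volatility switches between a calm
    and a turbulent regime and which occasionally jumps.
    """
    rng = random.Random(seed)  # private RNG so a seed gives repeatable output
    vols = (0.005, 0.02)       # per-step volatility: calm vs. turbulent regime
    regime = 0
    price = start
    prices, regimes = [], []
    for _ in range(n):
        # Two-state Markov chain: flip regime with probability p_switch.
        if rng.random() < p_switch:
            regime = 1 - regime
        # Gaussian log-return scaled by the current regime's volatility.
        log_ret = rng.gauss(0.0, vols[regime])
        # Rare jump: add a larger Gaussian shock on top.
        if rng.random() < jump_prob:
            log_ret += rng.gauss(0.0, jump_scale)
        # Exponentiating the log-return keeps the price strictly positive.
        price *= math.exp(log_ret)
        prices.append(price)
        regimes.append(regime)
    return prices, regimes
```

For example, `prices, regimes = regime_switching_series(250, seed=42)` yields 250 positive prices and a parallel list of 0/1 regime labels.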

Statistical Tools

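| Tool            | Description                           | Use Cases                         |
|-----------------|---------------------------------------|-----------------------------------|
| Distributions   | Sample from statistical distributions | Simulation, Monte Carlo           |
| Copulas         | Correlated multivariate data          | Risk modeling, portfolio analysis |
| Temporal Models | AR, Markov chains, random walks       | Time series simulation            |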
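
The Copulas row refers to correlated multivariate data. A common construction is the Gaussian copula: correlate standard normals, push them through the normal CDF to get dependent uniforms, then apply any inverse CDF to get the desired marginals. The sketch below shows the bivariate case using only the standard library; the names `gaussian_copula_pairs` and `to_exponential` are hypothetical and the code is not taken from superstore's source.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_pairs(n, rho, seed=None):
    """Hypothetical sketch: draw n pairs (u, v) of uniforms on (0, 1)
    whose dependence follows a bivariate Gaussian copula with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must be strictly between -1 and 1")
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]] applied to (z1, z2).
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # The normal CDF maps each correlated normal to a uniform marginal.
        pairs.append((phi(x1), phi(x2)))
    return pairs


def to_exponential(u, rate):
    """Inverse-CDF transform of a uniform draw to an Exponential(rate) marginal."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) if the CDF rounds to 1.0
    return -math.log1p(-u) / rate
```

For example, `[(to_exponential(u, 1.0), to_exponential(v, 0.5)) for u, v in gaussian_copula_pairs(1000, rho=0.8, seed=1)]` gives two exponentially distributed series that tend to be large together, the kind of joint tail behavior risk and portfolio models care about.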

Key Features

  • Rust-powered: High-performance generation, 10-100x faster than pure Python

  • Flexible output: pandas DataFrame, polars DataFrame, or Python dicts

  • Configurable: Pydantic config classes for validated, structured configuration

  • Reproducible: Seed support for deterministic generation

  • Scalable: Streaming and parallel generation for large datasets
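
The Scalable bullet mentions streaming generation. superstore's own streaming API is not documented in this section, so the sketch below only illustrates the general idea in plain Python: a generator yields fixed-size chunks, so peak memory depends on the chunk size rather than on the total row count. `stream_records` and the `id`/`sales` fields are hypothetical.

```python
import random


def stream_records(total, chunk_size, seed=None):
    """Hypothetical sketch: yield lists of at most chunk_size synthetic
    records so that only one chunk is held in memory at a time."""
    rng = random.Random(seed)  # one RNG across chunks keeps the stream reproducible
    produced = 0
    while produced < total:
        size = min(chunk_size, total - produced)  # last chunk may be shorter
        chunk = [
            {"id": produced + i, "sales": round(rng.uniform(1.0, 500.0), 2)}
            for i in range(size)
        ]
        produced += size
        yield chunk
```

A consumer can then process or write out each chunk in turn, e.g. `for chunk in stream_records(10_000_000, 100_000, seed=1): ...`, without ever materializing all rows at once.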

Installation

pip install superstore

For development with polars support:

pip install "superstore[develop]"

Quick Start

from superstore import superstore, employees, timeseries, weather

# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)

# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")

# Generate as list of dicts
records = superstore(count=1000, output="dict")

Reproducibility with Seeds

All data generators support an optional seed parameter for reproducible random data generation:

from superstore import superstore, employees, timeseries, weather, machines

# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2)  # True

# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)

# No seed means random data each time
df3 = superstore(count=100)  # Different each call
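
One common way to implement the seed behavior described above is to give each call its own `random.Random(seed)` instance instead of seeding the global `random` module. The toy `generate` function below is hypothetical and only illustrates that pattern in pure Python; superstore does its generation in Rust, so its internals may differ.

```python
import random


def generate(count, seed=None):
    """Hypothetical toy generator illustrating per-call seeding."""
    # A private RNG: the same seed always yields the same sequence, and
    # seed=None falls back to OS entropy (or the system time) on each call.
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(count)]


assert generate(5, seed=42) == generate(5, seed=42)  # same seed, same data
```

Keeping the RNG local also means a seeded call does not disturb other code that relies on the global `random` state.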

Development

Setup

# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore

# Install development dependencies
make develop

Building

# Build Python wheel
make build

Testing

# Run all tests
make test

Linting

# Run linters
make lint

# Fix formatting
make fix

Architecture

superstore uses a hybrid Rust/Python architecture:

  • rust/: Core Rust library with all data generation logic

  • src/: PyO3 bindings exposing Rust functions to Python

  • superstore/: Python package with native module

The core data generation is implemented in Rust for performance, with PyO3 providing seamless Python integration. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.
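
Purely to illustrate the dispatch on the `output` argument, here is a hypothetical Python equivalent of that conversion step. It assumes the generator first produces column-oriented data (a dict mapping column names to equal-length lists); the real conversion lives in the Rust bindings and may work differently. pandas and polars are imported lazily so the `dict` path needs neither.

```python
def convert_output(columns, output="pandas"):
    """Hypothetical sketch: turn column-oriented data into the requested type."""
    if output == "dict":
        # Transpose columns into a list of per-row dicts.
        names = list(columns)
        n = len(columns[names[0]]) if names else 0
        return [{name: columns[name][i] for name in names} for i in range(n)]
    if output == "pandas":
        import pandas as pd  # imported only when this format is requested
        return pd.DataFrame(columns)
    if output == "polars":
        import polars as pl  # imported only when this format is requested
        return pl.DataFrame(columns)
    raise ValueError(f"unknown output format: {output!r}")
```

Building columns first suits DataFrame construction, since both `pd.DataFrame` and `pl.DataFrame` accept a dict of column lists directly.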

License

This library is released under the Apache 2.0 license.

Note

This library was generated using copier from the Base Python Project Template repository.