API Reference
Full API reference for all public functions and classes.
For detailed guides with examples, see:
- Retail Data - superstore(), employees()
- Time Series - timeseries()
- Weather - weather()
- Logs - logs(), app_logs()
- Finance - stock_prices(), options_chain(), finance()
- E-commerce - ecommerce_data(), ecommerce_sessions(), ecommerce_products()
- Telemetry - telemetry(), crossfilter functions
- Distributions - sample*() functions
- Copulas - copula classes
- Temporal Models - AR1, MarkovChain, RandomWalk
Data Generators
- superstore.superstore(config=None, count=None, output=None, seed=None)¶
Generate superstore sales data with structured configuration.
- Parameters:
config – Optional SuperstoreConfig pydantic model, dict, or int (for backward compatibility). If int, treated as count. If None, uses default configuration.
count – Number of rows (overrides config if provided)
output – Output format (“pandas”, “polars”, or “dict”)
seed – Random seed (overrides config if provided)
- Returns:
Superstore sales data in the specified format.
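The precedence rules described above (an int config is treated as count, and explicit keyword arguments override the config) can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_options` and `DEFAULTS` are hypothetical names, and the default values are taken from the SuperstoreConfig fields documented later in this reference.

```python
# Hypothetical helper illustrating the documented precedence rules.
# Defaults mirror the SuperstoreConfig fields (count=1000, output="dict", seed=None).
DEFAULTS = {"count": 1000, "output": "dict", "seed": None}


def resolve_options(config=None, count=None, output=None, seed=None):
    """Merge a config (None, int, or dict) with explicit keyword overrides."""
    if config is None:
        resolved = dict(DEFAULTS)                  # default configuration
    elif isinstance(config, int):
        resolved = {**DEFAULTS, "count": config}   # int is treated as count
    else:
        resolved = {**DEFAULTS, **dict(config)}    # dict-style configuration
    # Explicit arguments win over whatever the config supplied.
    for name, value in (("count", count), ("output", output), ("seed", seed)):
        if value is not None:
            resolved[name] = value
    return resolved


print(resolve_options(50))
print(resolve_options({"count": 10, "seed": 1}, count=20))
```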
- superstore.employees(count=1000, output='pandas', seed=None)¶
- superstore.timeseries(config=None, nper=None, freq=None, ncol=None, output=None, seed=None)¶
Generate time series data with structured configuration.
- Parameters:
config – Optional TimeseriesConfig pydantic model, dict, or int (for backward compatibility). If int, treated as nper. If None, uses default configuration.
nper – Number of periods (overrides config if provided)
freq – Frequency string (overrides config if provided)
ncol – Number of columns (overrides config if provided)
output – Output format (“pandas”, “polars”, or “dict”)
seed – Random seed (overrides config if provided)
- Returns:
Time series data in the specified format.
- superstore.weather(config=None, count=None, output='pandas', seed=None)¶
Generate weather data with structured configuration.
- Parameters:
config – Optional WeatherConfig pydantic model or dict with configuration. If None, uses default configuration.
count – Number of readings (overrides config if provided)
output – Output format (“pandas”, “polars”, or “dict”)
seed – Random seed (overrides config if provided)
- Returns:
Weather sensor data in the specified format.
- superstore.logs(config=None)¶
Generate web server access logs.
Returns realistic HTTP access log entries with configurable traffic patterns, status code distributions (via Markov chain), latency (LogNormal), and error bursts.
- Parameters:
config – Optional LogsConfig or dict with generation parameters
- Returns:
DataFrame (pandas/polars) or dict of log entries
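To make the description of logs() concrete, here is a rough sketch of how a Markov chain over status codes can be combined with log-normal latency. It assumes nothing about the library's internals: the states, transition probabilities, latency parameters, and the `simulate_requests` name are all hypothetical.

```python
import random

STATES = ["200", "404", "500"]
# Hypothetical transition probabilities; each row sums to 1.
TRANSITIONS = {
    "200": [0.95, 0.03, 0.02],
    "404": [0.80, 0.15, 0.05],
    "500": [0.50, 0.05, 0.45],  # errors tend to persist, which produces bursts
}


def simulate_requests(n, seed=None):
    """Return n fake requests with Markov-chain status codes and log-normal latency."""
    rng = random.Random(seed)
    state = "200"
    rows = []
    for _ in range(n):
        # The next status depends only on the current one.
        state = rng.choices(STATES, weights=TRANSITIONS[state])[0]
        # Log-normal latency with median exp(4.0), roughly 55 ms.
        latency_ms = rng.lognormvariate(4.0, 0.5)
        rows.append({"status": state, "latency_ms": latency_ms})
    return rows


rows = simulate_requests(100, seed=0)
```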
- superstore.app_logs(config=None)¶
Generate application event logs.
Returns application-level log entries with log levels, loggers, messages, thread IDs, trace/span IDs, and optional exceptions.
- Parameters:
config – Optional LogsConfig or dict with generation parameters
- Returns:
DataFrame (pandas/polars) or dict of application log entries
- superstore.stock_prices(config=None)¶
Generate stock price data (OHLCV bars).
Returns realistic stock price data using Geometric Brownian Motion with optional jump diffusion. Includes OHLCV bars with realistic intraday relationships and volume patterns.
- Parameters:
config – Optional FinanceConfig or dict with generation parameters
- Returns:
DataFrame (pandas/polars) or dict of OHLCV bars
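For readers unfamiliar with Geometric Brownian Motion, the sketch below simulates a series of closing prices with the standard exact log-normal step. It is only an illustration of the technique named above: `gbm_path` and its parameters are hypothetical, and the jump diffusion, OHLC construction, and volume modelling done by stock_prices() are not shown.

```python
import math
import random


def gbm_path(s0, mu, sigma, n_steps, dt=1.0 / 252, seed=None):
    """Simulate closing prices under Geometric Brownian Motion."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Exact step: S_{t+dt} = S_t * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * z)
        log_return = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(log_return))
    return prices


# One year of daily closes with 8% drift and 20% annualised volatility.
closes = gbm_path(100.0, mu=0.08, sigma=0.2, n_steps=252, seed=1)
```

Because each step multiplies by an exponential, the simulated prices stay strictly positive.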
- superstore.options_chain(config=None, spot_price=None, date=None)¶
Generate options chain with Greeks.
Returns options data including Black-Scholes pricing and Greeks (delta, gamma, theta, vega) for various strikes and expirations.
- Parameters:
config – Optional FinanceConfig or dict with generation parameters
spot_price – Current underlying price (default: 100.0)
date – Pricing date (default: “2024-01-15”)
- Returns:
DataFrame (pandas/polars) of options chain
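As background for the pricing mentioned above, this is a minimal sketch of the textbook Black-Scholes formula for a European call, with delta, gamma, and vega (theta is omitted for brevity). The function name and the `rate` and `vol` parameters are assumptions for illustration; how options_chain() chooses rates, volatilities, strikes, and expirations is not documented here.

```python
import math
from statistics import NormalDist


def black_scholes_call(spot, strike, t_years, rate, vol):
    """Price a European call and return a few Greeks (no dividends)."""
    n = NormalDist()  # standard normal
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    price = spot * n.cdf(d1) - strike * math.exp(-rate * t_years) * n.cdf(d2)
    return {
        "price": price,
        "delta": n.cdf(d1),                        # sensitivity to spot
        "gamma": n.pdf(d1) / (spot * vol * sqrt_t),  # sensitivity of delta to spot
        "vega": spot * n.pdf(d1) * sqrt_t,         # sensitivity to volatility
    }


quote = black_scholes_call(spot=100.0, strike=100.0, t_years=1.0, rate=0.05, vol=0.2)
```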
- superstore.finance(config=None)¶
Generate complete finance dataset: stock prices + options.
Returns both OHLCV price data and an options chain for the last trading day.
- Parameters:
config – Optional FinanceConfig or dict with generation parameters
- Returns:
Tuple of (prices_df, options_df)
- superstore.telemetry(config=None, scenario=None)¶
Generate IoT telemetry data with configurable behaviors and preset scenarios.
- Parameters:
config – Optional TelemetryConfig dict, or None for defaults
scenario – Optional preset scenario name: “normal”, “cpu_spikes”, “memory_leak”, “network_congestion”, “disk_pressure”, “cascade_failure”, “maintenance_window”, “sensor_drift”, “degradation_cycle”, “production”, “chaos”
- Returns:
DataFrame (pandas/polars) with telemetry readings
- superstore.machines(config=None, count=None, json=False, seed=None)¶
Generate machine data with structured configuration.
- Parameters:
config – Optional CrossfilterConfig pydantic model, dict, or int (for backward compatibility). If int, treated as count. If None, uses default configuration.
count – Number of machines (overrides config if provided)
json – Whether to return JSON (deprecated, unused)
seed – Random seed (overrides config if provided)
- Returns:
List of machine dictionaries.
- superstore.usage(machine, json=False, seed=None)¶
- superstore.status(machine, json=False)¶
- superstore.jobs(machine, json=False, seed=None)¶
- superstore.ecommerce_sessions(count, seed=None, output='pandas')¶
Generate e-commerce sessions
- Parameters:
count – Number of sessions to generate
seed – Optional random seed for reproducibility
output – Output format (“pandas”, “polars”, or “dict”)
- Returns:
DataFrame or dict with session data
- superstore.ecommerce_products(count, seed=None, output='pandas')¶
Generate e-commerce product catalog
- Parameters:
count – Number of products to generate
seed – Optional random seed for reproducibility
output – Output format (“pandas”, “polars”, or “dict”)
- Returns:
DataFrame or dict with product data
- superstore.ecommerce_data(config=None, output='pandas')¶
Generate complete e-commerce dataset
- Parameters:
config – EcommerceConfig dict with generation parameters
output – Output format (“pandas”, “polars”, or “dict”)
- Returns:
Dict with DataFrames for products, sessions, cart_events, orders, customers
Streaming & Parallel
- superstore.superstoreStream(total_count, chunk_size=1000, seed=None)¶
Create a streaming superstore data generator.
This returns an iterator that yields chunks of data, allowing memory-efficient processing of large datasets.
- Parameters:
total_count – Total number of rows to generate
chunk_size – Number of rows per chunk (default: 1000)
seed – Optional seed for reproducibility
- Returns:
An iterator yielding lists of dicts
Example
>>> for chunk in superstoreStream(1_000_000, chunk_size=10000):
...     process(chunk)  # Each chunk is a list of 10000 dicts
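The chunked-iterator pattern behind the streaming functions can be sketched in pure Python as below. `stream_rows` and its row shape are hypothetical, and the assumption that the final chunk is shorter when total_count is not a multiple of chunk_size is ours, not something this reference states.

```python
import random


def stream_rows(total_count, chunk_size=1000, seed=None):
    """Yield lists of dicts, at most chunk_size rows at a time."""
    rng = random.Random(seed)
    produced = 0
    while produced < total_count:
        size = min(chunk_size, total_count - produced)
        # Hypothetical row shape; the real generator's columns are not shown here.
        yield [{"row_id": produced + i, "value": rng.random()} for i in range(size)]
        produced += size


# Only one chunk is held in memory at a time.
sizes = [len(chunk) for chunk in stream_rows(2500, chunk_size=1000, seed=0)]
print(sizes)  # [1000, 1000, 500]
```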
- superstore.employeesStream(total_count, chunk_size=1000, seed=None)¶
Create a streaming employee data generator.
This returns an iterator that yields chunks of data, allowing memory-efficient processing of large datasets.
- Parameters:
total_count – Total number of employees to generate
chunk_size – Number of employees per chunk (default: 1000)
seed – Optional seed for reproducibility
- Returns:
An iterator yielding lists of dicts
Example
>>> for chunk in employeesStream(1_000_000, chunk_size=10000):
...     process(chunk)  # Each chunk is a list of 10000 dicts
- superstore.superstoreParallel(count=1000, output='pandas', seed=None)¶
Generate superstore data in parallel using multiple CPU cores.
This function uses Rayon to parallelize data generation across all available CPU cores, providing significant speedup for large datasets.
- Parameters:
count – Number of rows to generate
output – Output format - “pandas”, “polars”, or “dict” (default: “pandas”)
seed – Optional seed for reproducibility
- Returns:
DataFrame or list of dicts depending on output format
Example
>>> df = superstoreParallel(1_000_000) # Uses all CPU cores
- superstore.employeesParallel(count=1000, output='pandas', seed=None)¶
Generate employee data in parallel using multiple CPU cores.
This function uses Rayon to parallelize data generation across all available CPU cores, providing significant speedup for large datasets.
- Parameters:
count – Number of employees to generate
output – Output format - “pandas”, “polars”, or “dict” (default: “pandas”)
seed – Optional seed for reproducibility
- Returns:
DataFrame or list of dicts depending on output format
Example
>>> df = employeesParallel(1_000_000) # Uses all CPU cores
- superstore.numThreads()¶
Get the number of CPU threads available for parallel operations.
- Returns:
Number of threads Rayon will use
- superstore.setNumThreads(num_threads)¶
Set the number of threads for parallel operations.
This should be called early in the program before any parallel operations. Once set, it cannot be changed.
- Parameters:
num_threads – Number of threads to use for parallel generation
- Raises:
RuntimeError – If the thread pool has already been initialized
Export Functions
- superstore.superstoreArrowIpc(count, seed=None)¶
Generate superstore data as Arrow IPC bytes.
The returned bytes can be read by PyArrow:
```python
import pyarrow as pa
from superstore import superstoreArrowIpc

ipc_bytes = superstoreArrowIpc(100)
reader = pa.ipc.open_stream(ipc_bytes)
table = reader.read_all()
df = table.to_pandas()
```
- Parameters:
count – Number of rows to generate
seed – Optional random seed for reproducibility
- Returns:
Arrow IPC stream bytes
- superstore.employeesArrowIpc(count, seed=None)¶
Generate employee data as Arrow IPC bytes.
The returned bytes can be read by PyArrow:
```python
import pyarrow as pa
from superstore import employeesArrowIpc

ipc_bytes = employeesArrowIpc(100)
reader = pa.ipc.open_stream(ipc_bytes)
table = reader.read_all()
df = table.to_pandas()
```
- Parameters:
count – Number of rows to generate
seed – Optional random seed for reproducibility
- Returns:
Arrow IPC stream bytes
- superstore.superstoreToParquet(path, count, seed=None, compression=None)¶
Write superstore data directly to a Parquet file.
- Parameters:
path – Output file path
count – Number of rows to generate
seed – Optional random seed for reproducibility
compression – Compression type: ‘none’, ‘snappy’ (default), or ‘zstd’
- Returns:
Number of rows written
- superstore.employeesToParquet(path, count, seed=None, compression=None)¶
Write employee data directly to a Parquet file.
- Parameters:
path – Output file path
count – Number of rows to generate
seed – Optional random seed for reproducibility
compression – Compression type: ‘none’, ‘snappy’ (default), or ‘zstd’
- Returns:
Number of rows written
- superstore.superstoreToCsv(path, count, seed=None)¶
Write superstore data directly to a CSV file.
- Parameters:
path – Output file path
count – Number of rows to generate
seed – Optional random seed for reproducibility
- Returns:
Number of rows written
- superstore.employeesToCsv(path, count, seed=None)¶
Write employee data directly to a CSV file.
- Parameters:
path – Output file path
count – Number of rows to generate
seed – Optional random seed for reproducibility
- Returns:
Number of rows written
Distributions
- superstore.sampleUniform(min, max, n=1, seed=None)¶
Sample from a uniform distribution.
- Parameters:
min – Minimum value
max – Maximum value
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
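The return convention shared by the sample*() functions (a scalar when n=1, a list otherwise) can be sketched like this. The standard library's `random` module stands in for the library's own generator, so the values will not match superstore's output for the same seed; `sample_uniform` is a hypothetical name.

```python
import random


def sample_uniform(min_value, max_value, n=1, seed=None):
    """Return one float when n == 1, otherwise a list of n floats."""
    rng = random.Random(seed)  # a seeded generator gives reproducible draws
    values = [rng.uniform(min_value, max_value) for _ in range(n)]
    return values[0] if n == 1 else values


single = sample_uniform(0.0, 10.0, seed=1)       # a float
several = sample_uniform(0.0, 10.0, n=5, seed=1)  # a list of five floats
```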
- superstore.sampleNormal(mean, std_dev, n=1, seed=None)¶
Sample from a normal (Gaussian) distribution.
- Parameters:
mean – Mean of the distribution
std_dev – Standard deviation
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
- superstore.sampleLogNormal(mu, sigma, n=1, seed=None)¶
Sample from a log-normal distribution.
- Parameters:
mu – Mean of the underlying normal distribution
sigma – Standard deviation of the underlying normal distribution
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
- superstore.sampleExponential(lambda_, n=1, seed=None)¶
Sample from an exponential distribution.
- Parameters:
lambda_ – Rate parameter (1/mean)
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
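Because the rate parameter is the reciprocal of the mean, a rate of 2.0 should give draws that average about 0.5. The sketch below checks this with the standard library's `random.expovariate`; it is an illustration of the parameterisation, not the library's sampler, and `sample_exponential` is a hypothetical name.

```python
import random


def sample_exponential(rate, n=1, seed=None):
    """Draw from an exponential distribution with the given rate (1/mean)."""
    rng = random.Random(seed)
    values = [rng.expovariate(rate) for _ in range(n)]
    return values[0] if n == 1 else values


draws = sample_exponential(2.0, n=20000, seed=42)
sample_mean = sum(draws) / len(draws)  # expected to be close to 1 / 2.0
```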
- superstore.sampleBeta(alpha, beta, n=1, seed=None)¶
Sample from a Beta distribution.
- Parameters:
alpha – Shape parameter alpha
beta – Shape parameter beta
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise (values in [0, 1])
- superstore.sampleGamma(shape, scale, n=1, seed=None)¶
Sample from a Gamma distribution.
- Parameters:
shape – Shape parameter
scale – Scale parameter
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
- superstore.sampleWeibull(shape, scale, n=1, seed=None)¶
Sample from a Weibull distribution.
- Parameters:
shape – Shape parameter
scale – Scale parameter
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
- superstore.samplePareto(scale, shape, n=1, seed=None)¶
Sample from a Pareto (power law) distribution.
- Parameters:
scale – Scale parameter (minimum value)
shape – Shape parameter (tail index)
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
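The roles of the two Pareto parameters are easiest to see through inverse-transform sampling: the scale is the smallest value that can occur, and a smaller shape gives a heavier tail. This is a sketch under the usual Pareto Type I definition; whether the library uses the same method is an assumption, and `sample_pareto` is a hypothetical name.

```python
import random


def sample_pareto(scale, shape, n=1, seed=None):
    """Inverse-transform sampling: x = scale / u**(1/shape) with u in (0, 1]."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids division by zero.
    values = [scale / (1.0 - rng.random()) ** (1.0 / shape) for _ in range(n)]
    return values[0] if n == 1 else values


incomes = sample_pareto(scale=1000.0, shape=2.5, n=1000, seed=7)
```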
- superstore.samplePoisson(lambda_, n=1, seed=None)¶
Sample from a Poisson distribution.
- Parameters:
lambda_ – Rate parameter (expected count)
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
- superstore.sampleCategorical(weights, n=1, seed=None)¶
Sample from a categorical distribution with weights.
- Parameters:
weights – List of weights for each category (will be normalized)
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single category index if n=1, list of indices otherwise
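Weight normalisation means the weights only need to be non-negative and not all zero; `[2, 6, 2]` and `[0.2, 0.6, 0.2]` describe the same distribution. A minimal sketch with the standard library, where `sample_categorical` is a hypothetical stand-in for the library function:

```python
import random


def sample_categorical(weights, n=1, seed=None):
    """Return category indices drawn in proportion to the weights."""
    rng = random.Random(seed)
    total = sum(weights)
    probabilities = [w / total for w in weights]  # normalise so they sum to 1
    indices = rng.choices(range(len(weights)), weights=probabilities, k=n)
    return indices[0] if n == 1 else indices


# A category with zero weight is never selected.
picks = sample_categorical([0, 5, 0], n=50, seed=3)
```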
- superstore.sampleMixture(means, std_devs, weights, n=1, seed=None)¶
Sample from a mixture of normal distributions.
- Parameters:
means – List of means for each component
std_devs – List of standard deviations for each component
weights – List of weights for each component (will be normalized)
n – Number of samples (default: 1)
seed – Optional seed for reproducibility
- Returns:
Single value if n=1, list of values otherwise
Example
>>> # Bimodal distribution
>>> samples = sampleMixture([30000, 80000], [10000, 20000], [0.6, 0.4], n=1000)
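A normal mixture is usually sampled in two steps: pick a component according to the weights, then draw from that component's normal distribution. The sketch below follows that recipe with the standard library; it is an assumption that the library does the same, and `sample_mixture` is a hypothetical name.

```python
import random


def sample_mixture(means, std_devs, weights, n=1, seed=None):
    """Two-step sampling: choose a component, then draw from its normal."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        values.append(rng.gauss(means[k], std_devs[k]))
    return values[0] if n == 1 else values


salaries = sample_mixture([30000, 80000], [10000, 20000], [0.6, 0.4], n=5000, seed=11)
# The mixture mean is the weighted mean of the components: 0.6 * 30000 + 0.4 * 80000 = 50000.
mixture_mean = sum(salaries) / len(salaries)
```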
- superstore.addGaussianNoise(values, std_dev, seed=None)¶
Add Gaussian noise to values.
- Parameters:
values – List of values to add noise to
std_dev – Standard deviation of the noise
seed – Optional seed for reproducibility
- Returns:
List of values with noise added
- superstore.applyMissing(values, probability, seed=None)¶
Apply missing at random to values.
- Parameters:
values – List of values
probability – Probability of each value being missing (0-1)
seed – Optional seed for reproducibility
- Returns:
List of values with some replaced by None
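"Missing at random" here means each value is dropped independently with the same probability. A minimal sketch of that behaviour, using a hypothetical `apply_missing` helper built on the standard library:

```python
import random


def apply_missing(values, probability, seed=None):
    """Replace each value with None independently with the given probability."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so probability 0 keeps everything and 1 drops everything.
    return [None if rng.random() < probability else v for v in values]


readings = [20.1, 20.4, 19.8, 21.0, 20.7]
with_gaps = apply_missing(readings, probability=0.3, seed=5)
```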
Correlation & Copulas
- superstore.pearsonCorrelation(x, y)¶
Compute the Pearson correlation coefficient between two lists.
- Parameters:
x – First variable
y – Second variable
- Returns:
The correlation coefficient (between -1 and 1)
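For reference, the Pearson coefficient is the covariance of the two lists divided by the product of their standard deviations. A plain-Python sketch of that definition follows; the function name and the error handling are illustrative choices, not the library's.

```python
import math


def pearson_correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing
r_down = pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
```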
- superstore.sampleBivariate(n, rho, mean1=0.0, std1=1.0, mean2=0.0, std2=1.0, seed=None)¶
Generate correlated bivariate normal data.
This is a convenience function for the common case of generating two correlated variables.
- Parameters:
n – Number of samples
rho – Correlation coefficient between the two variables
mean1, std1 – Mean and standard deviation for first variable
mean2, std2 – Mean and standard deviation for second variable
seed – Optional random seed
- Returns:
A tuple of two lists (x, y)
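One standard way to produce two normals with correlation rho is to mix two independent standard normals: z2' = rho * z1 + sqrt(1 - rho^2) * z2. The sketch below uses that construction; it is an assumption that the library does the same, and `sample_bivariate` here is a hypothetical pure-Python stand-in.

```python
import math
import random


def sample_bivariate(n, rho, mean1=0.0, std1=1.0, mean2=0.0, std2=1.0, seed=None):
    """Generate two lists of normal draws whose correlation is rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing z1 into the second variable induces correlation rho.
        correlated = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        xs.append(mean1 + std1 * z1)
        ys.append(mean2 + std2 * correlated)
    return xs, ys


heights, weights_kg = sample_bivariate(20000, rho=0.8, mean1=170, std1=10, mean2=70, std2=12, seed=7)
```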
- class superstore.GaussianCopula(correlation_matrix)¶
Bases: object
Gaussian (Normal) Copula.
Uses multivariate normal distribution to model dependencies. The correlation between variables is specified via a correlation matrix.
Example
>>> copula = GaussianCopula([[1.0, 0.8], [0.8, 1.0]])
>>> samples = copula.sample(100)
>>> # Each sample is a list of uniform [0,1] values with the specified correlation
- dim¶
Get the dimension of the copula.
- sample(n, seed=None)¶
Generate n samples from the copula.
- Parameters:
n – Number of samples to generate
seed – Optional random seed
- Returns:
List of n samples, where each sample is a list of d uniform [0,1] values
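Conceptually, a Gaussian copula sample is a correlated normal vector pushed through the standard normal CDF, which leaves uniform marginals while keeping the dependence. The sketch below does this for two dimensions only; `gaussian_copula_2d` is a hypothetical helper and not the class's implementation, which accepts a full correlation matrix.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_2d(rho, n, seed=None):
    """Bivariate Gaussian copula: correlated normals mapped through the normal CDF."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    samples = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # The CDF turns each normal marginal into a uniform on [0, 1].
        samples.append([cdf(z1), cdf(z2)])
    return samples


pairs = gaussian_copula_2d(0.8, 1000, seed=2)
```

Other marginals can then be obtained by applying an inverse CDF to each uniform column.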
- class superstore.ClaytonCopula(theta, dim)¶
Bases: object
Clayton Copula.
An Archimedean copula with lower tail dependence. Good for modeling dependencies where extreme low values tend to occur together.
Example
>>> copula = ClaytonCopula(2.0, 2)  # theta=2, 2 dimensions
>>> samples = copula.sample(100)
- dim¶
Get the dimension of the copula.
- kendalls_tau()¶
Get Kendall’s tau (measure of correlation).
- sample(n, seed=None)¶
Generate n samples from the copula.
- theta¶
Get theta parameter.
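For the Clayton copula, Kendall's tau and the lower tail dependence coefficient have closed forms in theta: tau = theta / (theta + 2) and lambda_L = 2^(-1/theta), for theta > 0. The helpers below are hypothetical and simply evaluate these textbook formulas; they are a way to choose theta, not a description of what kendalls_tau() does internally.

```python
def clayton_kendalls_tau(theta):
    """Kendall's tau for a Clayton copula with theta > 0."""
    return theta / (theta + 2.0)


def clayton_lower_tail_dependence(theta):
    """Lower tail dependence coefficient for a Clayton copula with theta > 0."""
    return 2.0 ** (-1.0 / theta)


tau = clayton_kendalls_tau(2.0)                # 0.5
lambda_l = clayton_lower_tail_dependence(2.0)  # about 0.707
```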
- class superstore.FrankCopula(theta)¶
Bases: object
Frank Copula.
An Archimedean copula with symmetric dependence and no tail dependence in either tail. Good for modeling overall dependence without tail asymmetry.
Example
>>> copula = FrankCopula(5.0)  # positive dependence
>>> samples = copula.sample(100)
- kendalls_tau()¶
Get Kendall’s tau (measure of correlation).
- sample(n, seed=None)¶
Generate n bivariate samples from the copula.
- Returns:
List of n tuples (u, v), each containing two uniform [0,1] values
- theta¶
Get theta parameter.
- class superstore.GumbelCopula(theta)¶
Bases: object
Gumbel Copula.
An Archimedean copula with upper tail dependence. Good for modeling dependencies where extreme high values tend to occur together.
Example
>>> copula = GumbelCopula(2.0)  # theta=2 means moderate upper tail dependence
>>> samples = copula.sample(100)
- kendalls_tau()¶
Get Kendall’s tau (measure of correlation).
- sample(n, seed=None)¶
Generate n bivariate samples from the copula.
- Returns:
List of n tuples (u, v), each containing two uniform [0,1] values
- theta¶
Get theta parameter.
- upper_tail_dependence()¶
Get upper tail dependence coefficient.
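For the Gumbel copula with theta >= 1, the textbook closed forms are tau = 1 - 1/theta and lambda_U = 2 - 2^(1/theta). The hypothetical helpers below evaluate them so the effect of theta can be checked by hand; they are not taken from the library's source.

```python
def gumbel_kendalls_tau(theta):
    """Kendall's tau for a Gumbel copula with theta >= 1."""
    return 1.0 - 1.0 / theta


def gumbel_upper_tail_dependence(theta):
    """Upper tail dependence coefficient for a Gumbel copula with theta >= 1."""
    return 2.0 - 2.0 ** (1.0 / theta)


tau = gumbel_kendalls_tau(2.0)                # 0.5
lambda_u = gumbel_upper_tail_dependence(2.0)  # about 0.586
independent = gumbel_upper_tail_dependence(1.0)  # theta = 1 is independence, so 0.0
```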
Temporal Models
- class superstore.AR1(phi, sigma, mean=0.0)¶
Bases: object
AR(1) autoregressive model for generating temporally dependent data.
Generates values according to:
x_t = mean + phi * (x_{t-1} - mean) + epsilon_t
where epsilon_t ~ N(0, sigma^2)
- mean¶
Get the mean.
- phi¶
Get the phi coefficient.
- reset()¶
Reset the state to the mean.
- sample(n, seed=None)¶
Generate n samples.
- Parameters:
n – Number of samples to generate
seed – Optional random seed
- Returns:
List of n values
- sigma¶
Get the sigma value.
- state¶
Get the current state.
- stationary_variance()¶
Get the stationary variance of the process.
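The recursion and the stationary variance, sigma^2 / (1 - phi^2) for |phi| < 1, can be sketched in a few lines. `AR1Sketch` is a hypothetical pure-Python stand-in that mirrors the documented members; details such as how the real class seeds its generator or carries state between calls are assumptions.

```python
import random


class AR1Sketch:
    """Illustrative AR(1) process: x_t = mean + phi * (x_{t-1} - mean) + eps_t."""

    def __init__(self, phi, sigma, mean=0.0):
        self.phi, self.sigma, self.mean = phi, sigma, mean
        self.state = mean  # start at the long-run mean

    def reset(self):
        self.state = self.mean

    def sample(self, n, seed=None):
        rng = random.Random(seed)
        out = []
        for _ in range(n):
            eps = rng.gauss(0.0, self.sigma)
            self.state = self.mean + self.phi * (self.state - self.mean) + eps
            out.append(self.state)
        return out

    def stationary_variance(self):
        # Valid only for |phi| < 1; otherwise the process is not stationary.
        return self.sigma**2 / (1.0 - self.phi**2)


model = AR1Sketch(phi=0.5, sigma=1.0)
series = model.sample(200, seed=4)
```

A phi close to 1 makes the series wander slowly; a phi near 0 makes it look like independent noise around the mean.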
- class superstore.ARp(coefficients, sigma, mean=0.0)¶
Bases: object
AR(p) autoregressive model of order p.
Generates values according to:
x_t = mean + sum_i(phi_i * (x_{t-i} - mean)) + epsilon_t
- static ar2(phi1, phi2, sigma, mean=0.0)¶
Create an AR(2) model.
- order()¶
Get the order of the AR model.
- reset()¶
Reset the state to the mean.
- sample(n, seed=None)¶
Generate n samples.
- class superstore.MarkovChain(transition_matrix, states)¶
Bases: object
Markov chain for generating temporally dependent categorical data.
- current_state¶
Get current state.
- sample(n, seed=None)¶
Generate n state transitions.
- sample_indices(n, seed=None)¶
Generate n state transitions as indices.
- set_state(state)¶
Set current state by name.
- states()¶
Get all states.
- stationary_distribution()¶
Get stationary distribution.
- static two_state(state_a, state_b, prob_a_to_b, prob_b_to_a)¶
Create a simple two-state Markov chain.
- Parameters:
state_a – Name of first state
state_b – Name of second state
prob_a_to_b – Probability of transitioning from A to B
prob_b_to_a – Probability of transitioning from B to A
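For a two-state chain the transition matrix and the stationary distribution follow directly from the two switching probabilities: the long-run share of time in A is prob_b_to_a / (prob_a_to_b + prob_b_to_a). The helpers below are hypothetical and only illustrate that arithmetic.

```python
def two_state_matrix(prob_a_to_b, prob_b_to_a):
    """Row-stochastic transition matrix with rows ordered [A, B]."""
    return [
        [1.0 - prob_a_to_b, prob_a_to_b],
        [prob_b_to_a, 1.0 - prob_b_to_a],
    ]


def two_state_stationary(prob_a_to_b, prob_b_to_a):
    """Long-run probabilities of being in A and B."""
    total = prob_a_to_b + prob_b_to_a
    return [prob_b_to_a / total, prob_a_to_b / total]


# Example: a service that fails 10% of the time when up and recovers 30% of the time when down.
matrix = two_state_matrix(0.1, 0.3)
pi = two_state_stationary(0.1, 0.3)  # about [0.75, 0.25]
```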
Configuration Classes
- pydantic model superstore.SuperstoreConfig[source]¶
Bases: BaseModel
Configuration for the superstore data generator.
Generates realistic retail transaction data with correlations between sales, quantity, discount, and profit.
Show JSON schema
{ "title": "SuperstoreConfig", "description": "Configuration for the superstore data generator.\n\nGenerates realistic retail transaction data with correlations\nbetween sales, quantity, discount, and profit.", "type": "object", "properties": { "count": { "default": 1000, "description": "Number of rows to generate", "minimum": 1, "title": "Count", "type": "integer" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "pool_size": { "default": 1000, "description": "Size of pre-generated data pools for performance", "maximum": 100000, "minimum": 1, "title": "Pool Size", "type": "integer" }, "sales_quantity_correlation": { "default": 0.8, "description": "Sales-quantity correlation", "maximum": 1.0, "minimum": -1.0, "title": "Sales Quantity Correlation", "type": "number" }, "sales_profit_correlation": { "default": 0.9, "description": "Sales-profit correlation", "maximum": 1.0, "minimum": -1.0, "title": "Sales Profit Correlation", "type": "number" }, "discount_profit_correlation": { "default": -0.6, "description": "Discount-profit correlation", "maximum": 1.0, "minimum": -1.0, "title": "Discount Profit Correlation", "type": "number" }, "enable_price_points": { "default": true, "description": "Round prices to realistic $X.99 values", "title": "Enable Price Points", "type": "boolean" }, "seasonality": { "$ref": "#/$defs/SeasonalityConfig", "description": "Seasonal patterns" }, "promotions": { "$ref": "#/$defs/PromotionalConfig", "description": "Promotional effects" }, "customers": { "$ref": "#/$defs/CustomerConfig", "description": "Customer behavior" } }, "$defs": { "CustomerConfig": { "description": "Configuration for customer behavior patterns.", "properties": { "enable_cohorts": { "default": true, "description": "Enable customer cohort modeling", "title": "Enable Cohorts", 
"type": "boolean" }, "repeat_customer_rate": { "default": 0.7, "description": "Fraction of orders from repeat customers", "maximum": 1.0, "minimum": 0.0, "title": "Repeat Customer Rate", "type": "number" }, "vip_segment_rate": { "default": 0.1, "description": "Fraction of customers in VIP segment", "maximum": 0.5, "minimum": 0.0, "title": "Vip Segment Rate", "type": "number" }, "vip_order_multiplier": { "default": 2.0, "description": "VIP customer order value multiplier", "maximum": 5.0, "minimum": 1.0, "title": "Vip Order Multiplier", "type": "number" } }, "title": "CustomerConfig", "type": "object" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", "dict" ], "title": "OutputFormat", "type": "string" }, "PromotionalConfig": { "description": "Configuration for promotional effects.", "properties": { "enable": { "default": true, "description": "Enable promotional patterns", "title": "Enable", "type": "boolean" }, "discount_quantity_correlation": { "default": 0.5, "description": "How much discounts increase quantity (correlation factor)", "maximum": 1.0, "minimum": 0.0, "title": "Discount Quantity Correlation", "type": "number" }, "price_elasticity": { "default": -0.8, "description": "Price elasticity of demand", "maximum": 0.0, "minimum": -2.0, "title": "Price Elasticity", "type": "number" } }, "title": "PromotionalConfig", "type": "object" }, "SeasonalityConfig": { "description": "Configuration for seasonal patterns in sales data.", "properties": { "enable": { "default": true, "description": "Enable seasonal effects", "title": "Enable", "type": "boolean" }, "q4_multiplier": { "default": 1.5, "description": "Q4 (holiday) sales multiplier", "maximum": 3.0, "minimum": 1.0, "title": "Q4 Multiplier", "type": "number" }, "summer_multiplier": { "default": 0.9, "description": "Summer sales multiplier", "maximum": 1.5, "minimum": 0.5, "title": "Summer Multiplier", "type": "number" }, "back_to_school_multiplier": { "default": 
1.2, "description": "August/September sales multiplier", "maximum": 2.0, "minimum": 1.0, "title": "Back To School Multiplier", "type": "number" } }, "title": "SeasonalityConfig", "type": "object" } } }
- field count: int = 1000¶
Number of rows to generate
- field output: OutputFormat = OutputFormat.DICT¶
Output format
- field seed: int | None = None¶
Random seed for reproducibility
- field pool_size: int = 1000¶
Size of pre-generated data pools for performance
- field sales_quantity_correlation: float = 0.8¶
Sales-quantity correlation
- field sales_profit_correlation: float = 0.9¶
Sales-profit correlation
- field discount_profit_correlation: float = -0.6¶
Discount-profit correlation
- field enable_price_points: bool = True¶
Round prices to realistic $X.99 values
- field seasonality: SeasonalityConfig [Optional]¶
Seasonal patterns
- field promotions: PromotionalConfig [Optional]¶
Promotional effects
- field customers: CustomerConfig [Optional]¶
Customer behavior
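Since superstore() also accepts a plain dict, a configuration might look like the sketch below. The field names and bounds are copied from the JSON schema above; the `TOP_LEVEL_BOUNDS` table and `out_of_range` helper are hypothetical and only show how the documented limits could be checked before calling the generator.

```python
# Field names and limits come from the SuperstoreConfig schema; the values are examples.
config = {
    "count": 500,
    "output": "pandas",
    "seed": 42,
    "sales_quantity_correlation": 0.7,
    "seasonality": {"enable": True, "q4_multiplier": 1.8},
    "customers": {"repeat_customer_rate": 0.6, "vip_segment_rate": 0.05},
}

# (minimum, maximum) for a few top-level numeric fields; None means unbounded.
TOP_LEVEL_BOUNDS = {
    "count": (1, None),
    "pool_size": (1, 100000),
    "sales_quantity_correlation": (-1.0, 1.0),
    "sales_profit_correlation": (-1.0, 1.0),
    "discount_profit_correlation": (-1.0, 1.0),
}


def out_of_range(cfg):
    """Return the names of top-level fields that violate the documented bounds."""
    bad = []
    for name, (low, high) in TOP_LEVEL_BOUNDS.items():
        if name not in cfg:
            continue
        value = cfg[name]
        if (low is not None and value < low) or (high is not None and value > high):
            bad.append(name)
    return bad


problems = out_of_range(config)  # an empty list means the checked fields are within bounds
```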
- pydantic model superstore.TimeseriesConfig[source]¶
Bases: BaseModel
Configuration for the time series generator.
Generates financial-style time series with optional regime changes, volatility clustering, and jump diffusion.
Show JSON schema
{ "title": "TimeseriesConfig", "description": "Configuration for the time series generator.\n\nGenerates financial-style time series with optional regime changes,\nvolatility clustering, and jump diffusion.", "type": "object", "properties": { "nper": { "default": 30, "description": "Number of periods", "minimum": 1, "title": "Nper", "type": "integer" }, "ncol": { "default": 4, "description": "Number of columns (max 26)", "maximum": 26, "minimum": 1, "title": "Ncol", "type": "integer" }, "freq": { "default": "B", "description": "Frequency: B=business, D=daily, W=weekly, M=monthly", "enum": [ "B", "D", "W", "M" ], "title": "Freq", "type": "string" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "ar_phi": { "default": 0.95, "description": "AR(1) persistence parameter", "maximum": 1.0, "minimum": -1.0, "title": "Ar Phi", "type": "number" }, "sigma": { "default": 1.0, "description": "Innovation standard deviation", "minimum": 0.0, "title": "Sigma", "type": "number" }, "drift": { "default": 0.0, "description": "Drift/trend per period", "title": "Drift", "type": "number" }, "cumulative": { "default": true, "description": "Apply cumulative sum (price-like behavior)", "title": "Cumulative", "type": "boolean" }, "use_fat_tails": { "default": false, "description": "Use Student-t instead of normal innovations", "title": "Use Fat Tails", "type": "boolean" }, "degrees_freedom": { "default": 5.0, "description": "Degrees of freedom for Student-t", "maximum": 30.0, "minimum": 2.1, "title": "Degrees Freedom", "type": "number" }, "cross_correlation": { "default": 0.0, "description": "Correlation between columns (0 = independent)", "maximum": 1.0, "minimum": -1.0, "title": "Cross Correlation", "type": "number" }, "regimes": { "$ref": "#/$defs/RegimeConfig", "description": "Regime 
switching configuration" }, "jumps": { "$ref": "#/$defs/JumpConfig", "description": "Jump diffusion configuration" } }, "$defs": { "JumpConfig": { "description": "Configuration for jump diffusion.", "properties": { "enable": { "default": false, "description": "Enable jump diffusion", "title": "Enable", "type": "boolean" }, "jump_probability": { "default": 0.01, "description": "Probability of jump per period", "maximum": 0.1, "minimum": 0.0, "title": "Jump Probability", "type": "number" }, "jump_mean": { "default": 0.0, "description": "Mean jump size", "title": "Jump Mean", "type": "number" }, "jump_stddev": { "default": 0.05, "description": "Standard deviation of jump size", "minimum": 0.0, "title": "Jump Stddev", "type": "number" } }, "title": "JumpConfig", "type": "object" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", "dict" ], "title": "OutputFormat", "type": "string" }, "RegimeConfig": { "description": "Configuration for regime-switching behavior.", "properties": { "enable": { "default": false, "description": "Enable regime switching", "title": "Enable", "type": "boolean" }, "n_regimes": { "default": 2, "description": "Number of regimes", "maximum": 5, "minimum": 2, "title": "N Regimes", "type": "integer" }, "regime_persistence": { "default": 0.95, "description": "Probability of staying in current regime", "maximum": 0.99, "minimum": 0.5, "title": "Regime Persistence", "type": "number" }, "volatility_multipliers": { "description": "Volatility multiplier for each regime", "items": { "type": "number" }, "title": "Volatility Multipliers", "type": "array" } }, "title": "RegimeConfig", "type": "object" } } }
- field nper: int = 30¶
Number of periods
- field ncol: int = 4¶
Number of columns (max 26)
- field freq: Literal['B', 'D', 'W', 'M'] = 'B'¶
Frequency: B=business, D=daily, W=weekly, M=monthly
- field output: OutputFormat = OutputFormat.DICT¶
Output format
- field seed: int | None = None¶
Random seed for reproducibility
- field ar_phi: float = 0.95¶
AR(1) persistence parameter
- field sigma: float = 1.0¶
Innovation standard deviation
- field drift: float = 0.0¶
Drift/trend per period
- field cumulative: bool = True¶
Apply cumulative sum (price-like behavior)
- field use_fat_tails: bool = False¶
Use Student-t instead of normal innovations
- field degrees_freedom: float = 5.0¶
Degrees of freedom for Student-t
- field cross_correlation: float = 0.0¶
Correlation between columns (0 = independent)
- field regimes: RegimeConfig [Optional]¶
Regime switching configuration
- field jumps: JumpConfig [Optional]¶
Jump diffusion configuration
- pydantic model superstore.WeatherConfig[source]¶
Bases: BaseModel
Configuration for the weather data generator.
Generates realistic outdoor sensor data with temporal patterns, seasonal variations, and weather events.
Show JSON schema
{ "title": "WeatherConfig", "description": "Configuration for the weather data generator.\n\nGenerates realistic outdoor sensor data with temporal patterns,\nseasonal variations, and weather events.", "type": "object", "properties": { "count": { "default": 1000, "description": "Number of readings to generate", "minimum": 1, "title": "Count", "type": "integer" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "start_date": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Start date (YYYY-MM-DD). Defaults to 30 days ago.", "title": "Start Date" }, "frequency_minutes": { "default": 15, "description": "Reading frequency in minutes", "maximum": 1440, "minimum": 1, "title": "Frequency Minutes", "type": "integer" }, "climate_zone": { "$ref": "#/$defs/ClimateZone", "default": "temperate", "description": "Climate zone for realistic patterns" }, "latitude": { "default": 40.0, "description": "Latitude for day/night calculations", "maximum": 90.0, "minimum": -90.0, "title": "Latitude", "type": "number" }, "base_temp_celsius": { "default": 15.0, "description": "Annual average temperature in Celsius", "maximum": 50.0, "minimum": -50.0, "title": "Base Temp Celsius", "type": "number" }, "temp_daily_amplitude": { "default": 10.0, "description": "Day/night temperature swing in Celsius", "maximum": 30.0, "minimum": 0.0, "title": "Temp Daily Amplitude", "type": "number" }, "temp_seasonal_amplitude": { "default": 15.0, "description": "Summer/winter temperature swing in Celsius", "maximum": 40.0, "minimum": 0.0, "title": "Temp Seasonal Amplitude", "type": "number" }, "temp_noise_stddev": { "default": 2.0, "description": "Random noise standard deviation", "maximum": 10.0, "minimum": 0.0, "title": "Temp Noise Stddev", "type": "number" }, 
"base_humidity_percent": { "default": 60.0, "description": "Average humidity percentage", "maximum": 100.0, "minimum": 0.0, "title": "Base Humidity Percent", "type": "number" }, "humidity_temp_correlation": { "default": -0.3, "description": "Correlation between temp and humidity (-1 to 1)", "maximum": 1.0, "minimum": -1.0, "title": "Humidity Temp Correlation", "type": "number" }, "precipitation_probability": { "default": 0.15, "description": "Base probability of precipitation", "maximum": 1.0, "minimum": 0.0, "title": "Precipitation Probability", "type": "number" }, "enable_weather_events": { "default": true, "description": "Enable weather event simulation", "title": "Enable Weather Events", "type": "boolean" }, "event_probability": { "default": 0.05, "description": "Probability of weather event occurring", "maximum": 1.0, "minimum": 0.0, "title": "Event Probability", "type": "number" }, "outlier_probability": { "default": 0.01, "description": "Probability of outlier readings (sensor errors)", "maximum": 0.1, "minimum": 0.0, "title": "Outlier Probability", "type": "number" }, "sensor_drift": { "default": false, "description": "Enable gradual sensor calibration drift", "title": "Sensor Drift", "type": "boolean" }, "sensor_drift_rate": { "default": 0.001, "description": "Rate of sensor drift per reading", "maximum": 0.1, "minimum": 0.0, "title": "Sensor Drift Rate", "type": "number" } }, "$defs": { "ClimateZone": { "description": "Climate zone affecting weather patterns.", "enum": [ "tropical", "subtropical", "temperate", "continental", "polar", "arid", "mediterranean" ], "title": "ClimateZone", "type": "string" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", "dict" ], "title": "OutputFormat", "type": "string" } } }
- field count: int = 1000¶
Number of readings to generate
- field output: OutputFormat = OutputFormat.DICT¶
Output format
- field seed: int | None = None¶
Random seed for reproducibility
- field start_date: str | None = None¶
Start date (YYYY-MM-DD). Defaults to 30 days ago.
- field frequency_minutes: int = 15¶
Reading frequency in minutes
- field climate_zone: ClimateZone = ClimateZone.TEMPERATE¶
Climate zone for realistic patterns
- field latitude: float = 40.0¶
Latitude for day/night calculations
- field base_temp_celsius: float = 15.0¶
Annual average temperature in Celsius
- field temp_daily_amplitude: float = 10.0¶
Day/night temperature swing in Celsius
- field temp_seasonal_amplitude: float = 15.0¶
Summer/winter temperature swing in Celsius
- field temp_noise_stddev: float = 2.0¶
Random noise standard deviation
- field base_humidity_percent: float = 60.0¶
Average humidity percentage
- field humidity_temp_correlation: float = -0.3¶
Correlation between temp and humidity (-1 to 1)
- field precipitation_probability: float = 0.15¶
Base probability of precipitation
- field enable_weather_events: bool = True¶
Enable weather event simulation
- field event_probability: float = 0.05¶
Probability of weather event occurring
- field outlier_probability: float = 0.01¶
Probability of outlier readings (sensor errors)
- field sensor_drift: bool = False¶
Enable gradual sensor calibration drift
- field sensor_drift_rate: float = 0.001¶
Rate of sensor drift per reading
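The temperature fields above read naturally as an annual mean plus a seasonal swing, a daily swing, and Gaussian noise. The following is a minimal standard-library sketch of one such composition, not the library's implementation: the function names `sketch_temperature` and `sketch_readings`, the cosine form, the treatment of each "swing" as peak-to-trough, and the phase choices (warmest at 15:00 and on day 200 of the year) are all assumptions made for illustration.

```python
import math
import random
from datetime import datetime, timedelta


def sketch_temperature(ts, base=15.0, daily_amp=10.0, seasonal_amp=15.0,
                       noise_std=2.0, rng=None):
    """Hypothetical model: annual mean + seasonal term + daily term + noise."""
    rng = rng or random.Random()
    day_of_year = ts.timetuple().tm_yday
    hour = ts.hour + ts.minute / 60.0
    # Seasonal term: assumed to peak on day 200; swing is peak-to-trough.
    seasonal = (seasonal_amp / 2.0) * math.cos(
        2 * math.pi * (day_of_year - 200) / 365.0)
    # Daily term: assumed to peak at 15:00 local time.
    daily = (daily_amp / 2.0) * math.cos(2 * math.pi * (hour - 15.0) / 24.0)
    return base + seasonal + daily + rng.gauss(0.0, noise_std)


def sketch_readings(count=96, frequency_minutes=15, start=None, seed=None):
    """Return (timestamp, temperature) pairs spaced frequency_minutes apart."""
    rng = random.Random(seed)
    start = start or datetime(2024, 1, 1)
    step = timedelta(minutes=frequency_minutes)
    return [(start + i * step, sketch_temperature(start + i * step, rng=rng))
            for i in range(count)]
```

Passing the same `seed` yields the same series, which mirrors the purpose of the `seed` field.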
- pydantic model superstore.LogsConfig[source]¶
Bases: BaseModel
Configuration for the logs data generator.
Generates realistic web server access logs and application event logs with configurable traffic patterns, error rates, and latency distributions.
Show JSON schema
{ "title": "LogsConfig", "description": "Configuration for the logs data generator.\n\nGenerates realistic web server access logs and application event logs\nwith configurable traffic patterns, error rates, and latency distributions.", "type": "object", "properties": { "count": { "default": 1000, "description": "Number of log entries to generate", "minimum": 1, "title": "Count", "type": "integer" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format (pandas, polars, or dict)" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "format": { "$ref": "#/$defs/LogFormat", "default": "combined", "description": "Log format style" }, "start_time": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Start timestamp (ISO format). Defaults to current time.", "title": "Start Time" }, "requests_per_second": { "default": 100.0, "description": "Average requests per second (Poisson rate)", "minimum": 0.1, "title": "Requests Per Second", "type": "number" }, "success_rate": { "default": 0.95, "description": "Base success rate (2xx responses)", "maximum": 1.0, "minimum": 0.0, "title": "Success Rate", "type": "number" }, "error_burst": { "$ref": "#/$defs/ErrorBurstConfig", "description": "Error burst configuration" }, "latency": { "$ref": "#/$defs/LatencyConfig", "description": "Latency distribution configuration" }, "include_user_agent": { "default": true, "description": "Include user agent strings", "title": "Include User Agent", "type": "boolean" }, "unique_ips": { "default": 1000, "description": "Number of unique IP addresses to generate", "minimum": 1, "title": "Unique Ips", "type": "integer" }, "unique_users": { "default": 500, "description": "Number of unique user IDs", "minimum": 1, "title": "Unique Users", "type": "integer" }, "api_path_ratio": { "default": 0.7, "description": "Ratio of API paths 
vs static paths", "maximum": 1.0, "minimum": 0.0, "title": "Api Path Ratio", "type": "number" } }, "$defs": { "ErrorBurstConfig": { "description": "Configuration for error burst behavior in logs.", "properties": { "enable": { "default": true, "description": "Enable error burst simulation", "title": "Enable", "type": "boolean" }, "burst_probability": { "default": 0.02, "description": "Probability of entering a burst state per second", "maximum": 1.0, "minimum": 0.0, "title": "Burst Probability", "type": "number" }, "burst_duration_seconds": { "default": 30, "description": "Average duration of error bursts in seconds", "minimum": 1, "title": "Burst Duration Seconds", "type": "integer" }, "burst_error_rate": { "default": 0.5, "description": "Error rate during burst periods", "maximum": 1.0, "minimum": 0.0, "title": "Burst Error Rate", "type": "number" } }, "title": "ErrorBurstConfig", "type": "object" }, "LatencyConfig": { "description": "Configuration for request latency distribution.", "properties": { "base_latency_ms": { "default": 50.0, "description": "Base latency in milliseconds (median)", "minimum": 1.0, "title": "Base Latency Ms", "type": "number" }, "latency_stddev": { "default": 0.8, "description": "Standard deviation for log-normal distribution", "minimum": 0.1, "title": "Latency Stddev", "type": "number" }, "slow_request_probability": { "default": 0.05, "description": "Probability of a slow request", "maximum": 1.0, "minimum": 0.0, "title": "Slow Request Probability", "type": "number" }, "slow_request_multiplier": { "default": 10.0, "description": "Multiplier for slow request latency", "minimum": 1.0, "title": "Slow Request Multiplier", "type": "number" } }, "title": "LatencyConfig", "type": "object" }, "LogFormat": { "description": "Log output format styles.", "enum": [ "combined", "common", "json", "application" ], "title": "LogFormat", "type": "string" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", 
"dict" ], "title": "OutputFormat", "type": "string" } } }
- field count: int = 1000¶
Number of log entries to generate
- field output: OutputFormat = OutputFormat.DICT¶
Output format (pandas, polars, or dict)
- field seed: int | None = None¶
Random seed for reproducibility
- field format: LogFormat = LogFormat.COMBINED¶
Log format style
- field start_time: str | None = None¶
Start timestamp (ISO format). Defaults to current time.
- field requests_per_second: float = 100.0¶
Average requests per second (Poisson rate)
- field success_rate: float = 0.95¶
Base success rate (2xx responses)
- field error_burst: ErrorBurstConfig [Optional]¶
Error burst configuration
- field latency: LatencyConfig [Optional]¶
Latency distribution configuration
- field include_user_agent: bool = True¶
Include user agent strings
- field unique_ips: int = 1000¶
Number of unique IP addresses to generate
- field unique_users: int = 500¶
Number of unique user IDs
- field api_path_ratio: float = 0.7¶
Ratio of API paths vs static paths
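The `latency` sub-configuration in the schema describes a log-normal distribution whose median is `base_latency_ms`, with an occasional slow request scaled by `slow_request_multiplier`. The sketch below shows how such a draw could look using only the standard library. The function name `sketch_latency_ms` is hypothetical, and whether the library combines these parameters in exactly this way is an assumption.

```python
import math
import random


def sketch_latency_ms(rng, base_latency_ms=50.0, latency_stddev=0.8,
                      slow_request_probability=0.05,
                      slow_request_multiplier=10.0):
    """Draw one latency value in milliseconds (illustrative only)."""
    # A log-normal with mu = ln(median) has its median at base_latency_ms.
    latency = rng.lognormvariate(math.log(base_latency_ms), latency_stddev)
    # Assumed: slow requests simply scale the drawn latency.
    if rng.random() < slow_request_probability:
        latency *= slow_request_multiplier
    return latency


rng = random.Random(42)
samples = sorted(sketch_latency_ms(rng) for _ in range(10_000))
median = samples[len(samples) // 2]
```

Because 5% of draws are multiplied by 10, the sample median sits slightly above 50 ms while the upper tail becomes much heavier.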
- pydantic model superstore.FinanceConfig[source]¶
Bases: BaseModel
Configuration for the finance data generator.
Generates realistic financial market data including OHLCV stock prices, multi-asset correlated returns, and options chains with Black-Scholes pricing.
Show JSON schema
{ "title": "FinanceConfig", "description": "Configuration for the finance data generator.\n\nGenerates realistic financial market data including OHLCV stock prices,\nmulti-asset correlated returns, and options chains with Black-Scholes pricing.", "type": "object", "properties": { "ndays": { "default": 252, "description": "Number of trading days to generate (252 = 1 year)", "minimum": 1, "title": "Ndays", "type": "integer" }, "n_assets": { "default": 1, "description": "Number of assets (1 = single stock, >1 = correlated multi-asset)", "minimum": 1, "title": "N Assets", "type": "integer" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format (pandas, polars, or dict)" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "start_date": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Start date (ISO format YYYY-MM-DD). 
Defaults to 2024-01-02.", "title": "Start Date" }, "tickers": { "description": "Ticker symbols for the assets", "items": { "type": "string" }, "title": "Tickers", "type": "array" }, "asset_correlation": { "default": 0.5, "description": "Correlation between assets (for multi-asset generation)", "maximum": 1.0, "minimum": -1.0, "title": "Asset Correlation", "type": "number" }, "stock": { "$ref": "#/$defs/StockConfig", "description": "Stock price generation configuration" }, "ohlcv": { "$ref": "#/$defs/OhlcvConfig", "description": "OHLCV bar configuration" }, "options": { "$ref": "#/$defs/OptionsConfig", "description": "Options chain configuration" } }, "$defs": { "OhlcvConfig": { "description": "Configuration for OHLCV (Open-High-Low-Close-Volume) bar generation.", "properties": { "avg_volume": { "default": 1000000, "description": "Average daily trading volume", "minimum": 1, "title": "Avg Volume", "type": "integer" }, "volume_volatility": { "default": 0.5, "description": "Volatility of volume (log-normal sigma)", "minimum": 0.0, "title": "Volume Volatility", "type": "number" }, "intraday_volatility": { "default": 0.02, "description": "Intraday price range volatility", "minimum": 0.0, "title": "Intraday Volatility", "type": "number" }, "volume_price_correlation": { "default": 0.3, "description": "Correlation between volume and absolute returns", "maximum": 1.0, "minimum": -1.0, "title": "Volume Price Correlation", "type": "number" } }, "title": "OhlcvConfig", "type": "object" }, "OptionsConfig": { "description": "Configuration for options chain generation with Black-Scholes pricing.", "properties": { "risk_free_rate": { "default": 0.05, "description": "Annual risk-free interest rate", "title": "Risk Free Rate", "type": "number" }, "dividend_yield": { "default": 0.02, "description": "Annual dividend yield", "minimum": 0.0, "title": "Dividend Yield", "type": "number" }, "expirations": { "description": "Days to expiration for option contracts", "items": { "type": 
"integer" }, "title": "Expirations", "type": "array" }, "strike_offsets": { "description": "Strike prices as multipliers of spot price", "items": { "type": "number" }, "title": "Strike Offsets", "type": "array" } }, "title": "OptionsConfig", "type": "object" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", "dict" ], "title": "OutputFormat", "type": "string" }, "StockConfig": { "description": "Configuration for stock price generation using Geometric Brownian Motion.", "properties": { "annual_drift": { "default": 0.08, "description": "Annual expected return (mu). E.g., 0.08 = 8% annual return", "title": "Annual Drift", "type": "number" }, "annual_volatility": { "default": 0.2, "description": "Annual volatility (sigma). E.g., 0.20 = 20% annual volatility", "minimum": 0.0, "title": "Annual Volatility", "type": "number" }, "initial_price": { "default": 100.0, "description": "Initial stock price", "exclusiveMinimum": 0.0, "title": "Initial Price", "type": "number" }, "enable_jumps": { "default": false, "description": "Enable jump diffusion for more realistic price movements", "title": "Enable Jumps", "type": "boolean" }, "jump_probability": { "default": 0.02, "description": "Daily probability of a jump event", "maximum": 1.0, "minimum": 0.0, "title": "Jump Probability", "type": "number" }, "jump_mean": { "default": 0.0, "description": "Mean of jump size (log-normal)", "title": "Jump Mean", "type": "number" }, "jump_stddev": { "default": 0.05, "description": "Standard deviation of jump size", "minimum": 0.0, "title": "Jump Stddev", "type": "number" } }, "title": "StockConfig", "type": "object" } } }
- field ndays: int = 252¶
Number of trading days to generate (252 = 1 year)
- field n_assets: int = 1¶
Number of assets (1 = single stock, >1 = correlated multi-asset)
- field output: OutputFormat = OutputFormat.DICT¶
Output format (pandas, polars, or dict)
- field seed: int | None = None¶
Random seed for reproducibility
- field start_date: str | None = None¶
Start date (ISO format YYYY-MM-DD). Defaults to 2024-01-02.
- field tickers: list[str] [Optional]¶
Ticker symbols for the assets
- field asset_correlation: float = 0.5¶
Correlation between assets (for multi-asset generation)
- field stock: StockConfig [Optional]¶
Stock price generation configuration
- field ohlcv: OhlcvConfig [Optional]¶
OHLCV bar configuration
- field options: OptionsConfig [Optional]¶
Options chain configuration
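The schema describes `StockConfig` as generating prices with Geometric Brownian Motion. As a rough illustration of what the `annual_drift`, `annual_volatility`, and `initial_price` parameters mean, here is a standard daily-step GBM simulation. The function name `sketch_gbm_closes` and the 252-day year used for the time step are assumptions, and jumps and OHLCV bars are left out.

```python
import math
import random


def sketch_gbm_closes(ndays=252, initial_price=100.0, annual_drift=0.08,
                      annual_volatility=0.20, seed=None):
    """Simulate daily closing prices under GBM (illustrative only)."""
    rng = random.Random(seed)
    dt = 1.0 / 252.0  # one trading day as a fraction of a year (assumed)
    prices = [initial_price]
    for _ in range(ndays - 1):
        z = rng.gauss(0.0, 1.0)
        # Exact GBM log-return over one step:
        # (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * z
        log_return = ((annual_drift - 0.5 * annual_volatility ** 2) * dt
                      + annual_volatility * math.sqrt(dt) * z)
        prices.append(prices[-1] * math.exp(log_return))
    return prices
```

Working in log-returns keeps every simulated price strictly positive, which is consistent with the `exclusiveMinimum` of 0 on `initial_price`.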
- pydantic model superstore.CrossfilterConfig[source]¶
Bases: BaseModel
Configuration for crossfilter IoT data generator.
Generates machine telemetry data suitable for dashboard demos with optional anomalies and temporal patterns.
Show JSON schema
{ "title": "CrossfilterConfig", "description": "Configuration for crossfilter IoT data generator.\n\nGenerates machine telemetry data suitable for dashboard demos\nwith optional anomalies and temporal patterns.", "type": "object", "properties": { "n_machines": { "default": 10, "description": "Number of machines", "minimum": 1, "title": "N Machines", "type": "integer" }, "n_readings": { "default": 1000, "description": "Number of usage readings per machine", "minimum": 1, "title": "N Readings", "type": "integer" }, "output": { "$ref": "#/$defs/OutputFormat", "default": "dict", "description": "Output format" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "machine_types": { "description": "Types of machines to generate", "items": { "$ref": "#/$defs/MachineType" }, "title": "Machine Types", "type": "array" }, "cores_range": { "default": [ 4, 64 ], "description": "Range of CPU cores per machine", "maxItems": 2, "minItems": 2, "prefixItems": [ { "type": "integer" }, { "type": "integer" } ], "title": "Cores Range", "type": "array" }, "zones": { "description": "Available zones", "items": { "type": "string" }, "title": "Zones", "type": "array" }, "regions": { "description": "Available regions", "items": { "type": "string" }, "title": "Regions", "type": "array" }, "base_cpu_load": { "default": 0.3, "description": "Base CPU utilization", "maximum": 1.0, "minimum": 0.0, "title": "Base Cpu Load", "type": "number" }, "base_memory_load": { "default": 0.5, "description": "Base memory utilization", "maximum": 1.0, "minimum": 0.0, "title": "Base Memory Load", "type": "number" }, "load_variance": { "default": 0.2, "description": "Variance in load readings", "maximum": 0.5, "minimum": 0.0, "title": "Load Variance", "type": "number" }, "anomalies": { "$ref": "#/$defs/AnomalyConfig", "description": "Anomaly injection settings" }, "temporal_patterns": { "$ref": 
"#/$defs/TemporalPatternConfig", "description": "Temporal pattern settings" }, "enable_failures": { "default": false, "description": "Enable machine failure simulation", "title": "Enable Failures", "type": "boolean" }, "failure_probability": { "default": 0.001, "description": "Probability of failure per reading", "maximum": 0.1, "minimum": 0.0, "title": "Failure Probability", "type": "number" }, "cascade_failure_probability": { "default": 0.3, "description": "Probability of cascade failure when dependent machine fails", "maximum": 1.0, "minimum": 0.0, "title": "Cascade Failure Probability", "type": "number" } }, "$defs": { "AnomalyConfig": { "description": "Configuration for anomaly injection.", "properties": { "enable": { "default": false, "description": "Enable anomaly injection", "title": "Enable", "type": "boolean" }, "cpu_spike_probability": { "default": 0.02, "description": "Probability of CPU spike", "maximum": 0.1, "minimum": 0.0, "title": "Cpu Spike Probability", "type": "number" }, "memory_leak_probability": { "default": 0.01, "description": "Probability of memory leak start", "maximum": 0.1, "minimum": 0.0, "title": "Memory Leak Probability", "type": "number" }, "network_saturation_probability": { "default": 0.01, "description": "Probability of network saturation", "maximum": 0.1, "minimum": 0.0, "title": "Network Saturation Probability", "type": "number" } }, "title": "AnomalyConfig", "type": "object" }, "MachineType": { "description": "Types of machines for crossfilter.", "enum": [ "core", "edge", "worker" ], "title": "MachineType", "type": "string" }, "OutputFormat": { "description": "Output format for generators.", "enum": [ "pandas", "polars", "dict" ], "title": "OutputFormat", "type": "string" }, "TemporalPatternConfig": { "description": "Configuration for temporal patterns in IoT data.", "properties": { "enable_diurnal": { "default": false, "description": "Enable day/night load patterns", "title": "Enable Diurnal", "type": "boolean" }, 
"enable_weekly": { "default": false, "description": "Enable weekday/weekend patterns", "title": "Enable Weekly", "type": "boolean" }, "peak_hour": { "default": 14, "description": "Hour of peak load (0-23)", "maximum": 23, "minimum": 0, "title": "Peak Hour", "type": "integer" }, "night_load_factor": { "default": 0.3, "description": "Load factor during night hours", "maximum": 1.0, "minimum": 0.0, "title": "Night Load Factor", "type": "number" }, "weekend_load_factor": { "default": 0.5, "description": "Load factor during weekends", "maximum": 1.0, "minimum": 0.0, "title": "Weekend Load Factor", "type": "number" } }, "title": "TemporalPatternConfig", "type": "object" } } }
- field n_machines: int = 10¶
Number of machines
- field n_readings: int = 1000¶
Number of usage readings per machine
- field output: OutputFormat = OutputFormat.DICT¶
Output format
- field seed: int | None = None¶
Random seed for reproducibility
- field machine_types: list[MachineType] [Optional]¶
Types of machines to generate
- field cores_range: tuple[int, int] = (4, 64)¶
Range of CPU cores per machine
- field zones: list[str] [Optional]¶
Available zones
- field regions: list[str] [Optional]¶
Available regions
- field base_cpu_load: float = 0.3¶
Base CPU utilization
- field base_memory_load: float = 0.5¶
Base memory utilization
- field load_variance: float = 0.2¶
Variance in load readings
- field anomalies: AnomalyConfig [Optional]¶
Anomaly injection settings
- field temporal_patterns: TemporalPatternConfig [Optional]¶
Temporal pattern settings
- field enable_failures: bool = False¶
Enable machine failure simulation
- field failure_probability: float = 0.001¶
Probability of failure per reading
- field cascade_failure_probability: float = 0.3¶
Probability of cascade failure when dependent machine fails
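The `temporal_patterns` settings in the schema (`peak_hour`, `night_load_factor`, `weekend_load_factor`) suggest a multiplier applied to the base load. The sketch below shows one plausible reading: a cosine that equals 1.0 at the peak hour and `night_load_factor` twelve hours later, with an extra factor on weekends. The function names and the cosine shape are assumptions, not the library's behaviour.

```python
import math
from datetime import datetime


def sketch_load_factor(ts, peak_hour=14, night_load_factor=0.3,
                       weekend_load_factor=0.5,
                       enable_diurnal=False, enable_weekly=False):
    """Multiplier for the base load at timestamp ts (illustrative only)."""
    factor = 1.0
    if enable_diurnal:
        # phase is +1 at peak_hour and -1 twelve hours away from it.
        phase = math.cos(2 * math.pi * (ts.hour - peak_hour) / 24.0)
        factor *= night_load_factor + (1.0 - night_load_factor) * (phase + 1.0) / 2.0
    if enable_weekly and ts.weekday() >= 5:  # Saturday or Sunday
        factor *= weekend_load_factor
    return factor


def sketch_cpu_load(ts, base_cpu_load=0.3, **pattern_kwargs):
    """Base CPU load scaled by the temporal factor, clamped to [0, 1]."""
    load = base_cpu_load * sketch_load_factor(ts, **pattern_kwargs)
    return min(1.0, max(0.0, load))
```

With both flags left at their default of `False`, the factor is always 1.0, matching the schema defaults in which temporal patterns are disabled.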
- pydantic model superstore.EcommerceConfig[source]¶
Bases: BaseModel
Configuration for e-commerce data generation.
Generates realistic e-commerce data including:
- User sessions via MarkovChain state machines
- Shopping cart events with abandonment patterns
- Customer RFM (Recency, Frequency, Monetary) metrics
- Product catalog with categories and pricing
- Conversion funnels with realistic drop-off rates
Show JSON schema
{ "title": "EcommerceConfig", "description": "Configuration for e-commerce data generation.\n\nGenerates realistic e-commerce data including:\n- User sessions via MarkovChain state machines\n- Shopping cart events with abandonment patterns\n- Customer RFM (Recency, Frequency, Monetary) metrics\n- Product catalog with categories and pricing\n- Conversion funnels with realistic drop-off rates", "type": "object", "properties": { "sessions": { "default": 10000, "description": "Number of sessions to generate", "minimum": 1, "title": "Sessions", "type": "integer" }, "customers": { "default": 2000, "description": "Number of unique customers", "minimum": 1, "title": "Customers", "type": "integer" }, "seed": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Random seed for reproducibility", "title": "Seed" }, "start_date": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Start date for data generation (YYYY-MM-DD)", "title": "Start Date" }, "days": { "default": 30, "description": "Number of days to generate", "minimum": 1, "title": "Days", "type": "integer" }, "session": { "$ref": "#/$defs/SessionConfig", "description": "Session behavior configuration" }, "cart": { "$ref": "#/$defs/CartConfig", "description": "Cart behavior configuration" }, "catalog": { "$ref": "#/$defs/CatalogConfig", "description": "Product catalog configuration" }, "rfm": { "$ref": "#/$defs/RfmConfig", "description": "RFM analysis configuration" }, "funnel": { "$ref": "#/$defs/FunnelConfig", "description": "Conversion funnel configuration" } }, "$defs": { "CartConfig": { "description": "Configuration for cart behavior.", "properties": { "avg_items_per_cart": { "default": 2.5, "description": "Average items per cart", "minimum": 1.0, "title": "Avg Items Per Cart", "type": "number" }, "remove_probability": { "default": 0.1, "description": "Probability of removing an item from cart", "maximum": 1.0, "minimum": 0.0, "title": 
"Remove Probability", "type": "number" }, "quantity_update_probability": { "default": 0.05, "description": "Probability of updating quantity", "maximum": 1.0, "minimum": 0.0, "title": "Quantity Update Probability", "type": "number" }, "max_items": { "default": 20, "description": "Maximum items per cart", "minimum": 1, "title": "Max Items", "type": "integer" }, "enable_abandonment": { "default": true, "description": "Enable cart abandonment simulation", "title": "Enable Abandonment", "type": "boolean" }, "abandonment_rate": { "default": 0.7, "description": "Cart abandonment rate", "maximum": 1.0, "minimum": 0.0, "title": "Abandonment Rate", "type": "number" } }, "title": "CartConfig", "type": "object" }, "CatalogConfig": { "description": "Configuration for product catalog.", "properties": { "num_products": { "default": 500, "description": "Number of unique products", "minimum": 1, "title": "Num Products", "type": "integer" }, "min_price": { "default": 5.0, "description": "Minimum product price", "minimum": 0.01, "title": "Min Price", "type": "number" }, "max_price": { "default": 1000.0, "description": "Maximum product price", "minimum": 1.0, "title": "Max Price", "type": "number" }, "lognormal_prices": { "default": true, "description": "Price follows log-normal distribution (realistic skew)", "title": "Lognormal Prices", "type": "boolean" }, "categories": { "description": "Product categories", "items": { "type": "string" }, "title": "Categories", "type": "array" } }, "title": "CatalogConfig", "type": "object" }, "FunnelConfig": { "description": "Configuration for conversion funnel.", "properties": { "enable": { "default": true, "description": "Enable funnel stage tracking", "title": "Enable", "type": "boolean" }, "stages": { "description": "Funnel stages", "items": { "type": "string" }, "title": "Stages", "type": "array" }, "time_of_day_effects": { "default": true, "description": "Time-of-day effects on conversions", "title": "Time Of Day Effects", "type": "boolean" 
}, "day_of_week_effects": { "default": true, "description": "Day-of-week effects on conversions", "title": "Day Of Week Effects", "type": "boolean" } }, "title": "FunnelConfig", "type": "object" }, "RfmConfig": { "description": "Configuration for RFM (Recency, Frequency, Monetary) analysis.", "properties": { "enable": { "default": true, "description": "Enable RFM metrics calculation", "title": "Enable", "type": "boolean" }, "recency_window_days": { "default": 365, "description": "Days to look back for recency", "minimum": 1, "title": "Recency Window Days", "type": "integer" }, "num_buckets": { "default": 5, "description": "Number of RFM score buckets (typically 5)", "maximum": 10, "minimum": 2, "title": "Num Buckets", "type": "integer" }, "pareto_shape": { "default": 1.5, "description": "Pareto distribution shape for customer value (80/20 rule)", "minimum": 1.0, "title": "Pareto Shape", "type": "number" } }, "title": "RfmConfig", "type": "object" }, "SessionConfig": { "description": "Configuration for session behavior in e-commerce.", "properties": { "avg_pages_per_session": { "default": 5.0, "description": "Average pages viewed per session", "minimum": 1.0, "title": "Avg Pages Per Session", "type": "number" }, "cart_add_probability": { "default": 0.15, "description": "Probability of adding item to cart given product view", "maximum": 1.0, "minimum": 0.0, "title": "Cart Add Probability", "type": "number" }, "checkout_start_probability": { "default": 0.4, "description": "Probability of starting checkout given cart view", "maximum": 1.0, "minimum": 0.0, "title": "Checkout Start Probability", "type": "number" }, "purchase_completion_probability": { "default": 0.65, "description": "Probability of completing purchase given checkout start", "maximum": 1.0, "minimum": 0.0, "title": "Purchase Completion Probability", "type": "number" }, "avg_session_duration_seconds": { "default": 300, "description": "Average session duration in seconds", "minimum": 1, "title": "Avg 
Session Duration Seconds", "type": "integer" }, "enable_bounces": { "default": true, "description": "Enable session bounces (single-page visits)", "title": "Enable Bounces", "type": "boolean" }, "bounce_rate": { "default": 0.35, "description": "Bounce rate (probability of immediate exit)", "maximum": 1.0, "minimum": 0.0, "title": "Bounce Rate", "type": "number" } }, "title": "SessionConfig", "type": "object" } } }
- field sessions: int = 10000¶
Number of sessions to generate
- field customers: int = 2000¶
Number of unique customers
- field seed: int | None = None¶
Random seed for reproducibility
- field start_date: str | None = None¶
Start date for data generation (YYYY-MM-DD)
- field days: int = 30¶
Number of days to generate
- field session: SessionConfig [Optional]¶
Session behavior configuration
- field cart: CartConfig [Optional]¶
Cart behavior configuration
- field catalog: CatalogConfig [Optional]¶
Product catalog configuration
- field rfm: RfmConfig [Optional]¶
RFM analysis configuration
- field funnel: FunnelConfig [Optional]¶
Conversion funnel configuration
- pydantic model superstore.SessionConfig[source]¶
Bases: BaseModel
Configuration for session behavior in e-commerce.
Show JSON schema
{ "title": "SessionConfig", "description": "Configuration for session behavior in e-commerce.", "type": "object", "properties": { "avg_pages_per_session": { "default": 5.0, "description": "Average pages viewed per session", "minimum": 1.0, "title": "Avg Pages Per Session", "type": "number" }, "cart_add_probability": { "default": 0.15, "description": "Probability of adding item to cart given product view", "maximum": 1.0, "minimum": 0.0, "title": "Cart Add Probability", "type": "number" }, "checkout_start_probability": { "default": 0.4, "description": "Probability of starting checkout given cart view", "maximum": 1.0, "minimum": 0.0, "title": "Checkout Start Probability", "type": "number" }, "purchase_completion_probability": { "default": 0.65, "description": "Probability of completing purchase given checkout start", "maximum": 1.0, "minimum": 0.0, "title": "Purchase Completion Probability", "type": "number" }, "avg_session_duration_seconds": { "default": 300, "description": "Average session duration in seconds", "minimum": 1, "title": "Avg Session Duration Seconds", "type": "integer" }, "enable_bounces": { "default": true, "description": "Enable session bounces (single-page visits)", "title": "Enable Bounces", "type": "boolean" }, "bounce_rate": { "default": 0.35, "description": "Bounce rate (probability of immediate exit)", "maximum": 1.0, "minimum": 0.0, "title": "Bounce Rate", "type": "number" } } }
- field avg_pages_per_session: float = 5.0¶
Average pages viewed per session
- field cart_add_probability: float = 0.15¶
Probability of adding item to cart given product view
- field checkout_start_probability: float = 0.4¶
Probability of starting checkout given cart view
- field purchase_completion_probability: float = 0.65¶
Probability of completing purchase given checkout start
- field avg_session_duration_seconds: int = 300¶
Average session duration in seconds
- field enable_bounces: bool = True¶
Enable session bounces (single-page visits)
- field bounce_rate: float = 0.35¶
Bounce rate (probability of immediate exit)
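The three conditional probabilities above can be chained to estimate an end-to-end conversion rate. The worked example below multiplies them together with the non-bounce share. It assumes a single product view per non-bouncing session and that every cart add leads to a cart view, which is a simplification rather than a description of the generator's Markov chain. The function name `sketch_session_conversion` is hypothetical.

```python
def sketch_session_conversion(bounce_rate=0.35, cart_add_probability=0.15,
                              checkout_start_probability=0.40,
                              purchase_completion_probability=0.65,
                              enable_bounces=True):
    """Rough per-session purchase probability (illustrative only)."""
    # Share of sessions that get past the first page.
    survive = 1.0 - bounce_rate if enable_bounces else 1.0
    return (survive
            * cart_add_probability
            * checkout_start_probability
            * purchase_completion_probability)


rate = sketch_session_conversion()
print(f"{rate:.5f}")  # 0.65 * 0.15 * 0.40 * 0.65 -> 0.02535
```

Under these assumptions the defaults give roughly 2.5% of sessions ending in a purchase, or 3.9% when bounces are disabled.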
- pydantic model superstore.CartConfig[source]¶
Bases: BaseModel
Configuration for cart behavior.
Show JSON schema
{ "title": "CartConfig", "description": "Configuration for cart behavior.", "type": "object", "properties": { "avg_items_per_cart": { "default": 2.5, "description": "Average items per cart", "minimum": 1.0, "title": "Avg Items Per Cart", "type": "number" }, "remove_probability": { "default": 0.1, "description": "Probability of removing an item from cart", "maximum": 1.0, "minimum": 0.0, "title": "Remove Probability", "type": "number" }, "quantity_update_probability": { "default": 0.05, "description": "Probability of updating quantity", "maximum": 1.0, "minimum": 0.0, "title": "Quantity Update Probability", "type": "number" }, "max_items": { "default": 20, "description": "Maximum items per cart", "minimum": 1, "title": "Max Items", "type": "integer" }, "enable_abandonment": { "default": true, "description": "Enable cart abandonment simulation", "title": "Enable Abandonment", "type": "boolean" }, "abandonment_rate": { "default": 0.7, "description": "Cart abandonment rate", "maximum": 1.0, "minimum": 0.0, "title": "Abandonment Rate", "type": "number" } } }
- field avg_items_per_cart: float = 2.5¶
Average items per cart
- field remove_probability: float = 0.1¶
Probability of removing an item from cart
- field quantity_update_probability: float = 0.05¶
Probability of updating quantity
- field max_items: int = 20¶
Maximum items per cart
- field enable_abandonment: bool = True¶
Enable cart abandonment simulation
- field abandonment_rate: float = 0.7¶
Cart abandonment rate
- pydantic model superstore.CatalogConfig[source]¶
Bases: BaseModel
Configuration for product catalog.
Show JSON schema
{ "title": "CatalogConfig", "description": "Configuration for product catalog.", "type": "object", "properties": { "num_products": { "default": 500, "description": "Number of unique products", "minimum": 1, "title": "Num Products", "type": "integer" }, "min_price": { "default": 5.0, "description": "Minimum product price", "minimum": 0.01, "title": "Min Price", "type": "number" }, "max_price": { "default": 1000.0, "description": "Maximum product price", "minimum": 1.0, "title": "Max Price", "type": "number" }, "lognormal_prices": { "default": true, "description": "Price follows log-normal distribution (realistic skew)", "title": "Lognormal Prices", "type": "boolean" }, "categories": { "description": "Product categories", "items": { "type": "string" }, "title": "Categories", "type": "array" } } }
- field num_products: int = 500¶
Number of unique products
- field min_price: float = 5.0¶
Minimum product price
- field max_price: float = 1000.0¶
Maximum product price
- field lognormal_prices: bool = True¶
Price follows log-normal distribution (realistic skew)
- field categories: list[str] [Optional]¶
Product categories
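To illustrate what "log-normal prices" within a price range can look like, here is a rough standard-library sketch. It is not the library's generator: the function name `sketch_prices`, the choice of `mu` and `sigma`, and clamping to the range are all assumptions made for this example. Only the defaults (500 products, prices between 5.0 and 1000.0) come from the schema above.

```python
import math
import random
import statistics


def sketch_prices(num_products=500, min_price=5.0, max_price=1000.0, seed=0):
    """Draw log-normal prices and clamp them to [min_price, max_price]."""
    rng = random.Random(seed)
    # Assumption: centre the distribution at the geometric midpoint of the range.
    mu = math.log(math.sqrt(min_price * max_price))
    sigma = 1.0  # assumption: chosen only to make the skew visible
    return [
        min(max(rng.lognormvariate(mu, sigma), min_price), max_price)
        for _ in range(num_products)
    ]


prices = sketch_prices()
# A right-skewed distribution has a mean above its median.
print(f"median={statistics.median(prices):.2f} mean={statistics.mean(prices):.2f}")
```

Most prices cluster toward the low end of the range while a few expensive items pull the mean upward, which is the "realistic skew" the field description refers to.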
- pydantic model superstore.RfmConfig[source]¶
Bases: BaseModel
Configuration for RFM (Recency, Frequency, Monetary) analysis.
JSON schema:
{ "title": "RfmConfig", "description": "Configuration for RFM (Recency, Frequency, Monetary) analysis.", "type": "object", "properties": { "enable": { "default": true, "description": "Enable RFM metrics calculation", "title": "Enable", "type": "boolean" }, "recency_window_days": { "default": 365, "description": "Days to look back for recency", "minimum": 1, "title": "Recency Window Days", "type": "integer" }, "num_buckets": { "default": 5, "description": "Number of RFM score buckets (typically 5)", "maximum": 10, "minimum": 2, "title": "Num Buckets", "type": "integer" }, "pareto_shape": { "default": 1.5, "description": "Pareto distribution shape for customer value (80/20 rule)", "minimum": 1.0, "title": "Pareto Shape", "type": "number" } } }
- field enable: bool = True¶
Enable RFM metrics calculation
- field recency_window_days: int = 365¶
Days to look back for recency
- field num_buckets: int = 5¶
Number of RFM score buckets (typically 5)
- field pareto_shape: float = 1.5¶
Pareto distribution shape for customer value (80/20 rule)
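The link between `pareto_shape` and the 80/20 rule can be made concrete. For a classical (type I) Pareto distribution with shape greater than 1, the top fraction p of the population holds a share p^(1 - 1/shape) of the total value. The sketch below assumes the library uses this parameterization, which the reference does not state; `top_share` is a hypothetical helper.

```python
import math


def top_share(shape, top_fraction=0.2):
    """Share of total value held by the top `top_fraction` of a Pareto population."""
    if shape <= 1.0:
        # The theoretical mean is infinite here, so shares of the total are undefined.
        raise ValueError("the Pareto mean is infinite for shape <= 1")
    return top_fraction ** (1.0 - 1.0 / shape)


# Shape at which the top 20% hold exactly 80% of the value.
shape_80_20 = math.log(5) / math.log(4)

print(f"shape=1.5 -> top 20% hold {top_share(1.5):.1%}")
print(f"80/20 shape = {shape_80_20:.3f}")
```

Under this assumption the default shape of 1.5 gives the top 20% of customers roughly 58% of the value, and an exact 80/20 split corresponds to a shape of about 1.16. Lower shape values concentrate value in fewer customers.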
- pydantic model superstore.FunnelConfig[source]¶
Bases: BaseModel
Configuration for conversion funnel.
JSON schema:
{ "title": "FunnelConfig", "description": "Configuration for conversion funnel.", "type": "object", "properties": { "enable": { "default": true, "description": "Enable funnel stage tracking", "title": "Enable", "type": "boolean" }, "stages": { "description": "Funnel stages", "items": { "type": "string" }, "title": "Stages", "type": "array" }, "time_of_day_effects": { "default": true, "description": "Time-of-day effects on conversions", "title": "Time Of Day Effects", "type": "boolean" }, "day_of_week_effects": { "default": true, "description": "Day-of-week effects on conversions", "title": "Day Of Week Effects", "type": "boolean" } } }
- field enable: bool = True¶
Enable funnel stage tracking
- field stages: list[str] [Optional]¶
Funnel stages
- field time_of_day_effects: bool = True¶
Time-of-day effects on conversions
- field day_of_week_effects: bool = True¶
Day-of-week effects on conversions
Enums¶
- class superstore.ClimateZone(value)[source]¶
Bases: str, Enum
Climate zone affecting weather patterns.
- ARID = 'arid'¶
- CONTINENTAL = 'continental'¶
- MEDITERRANEAN = 'mediterranean'¶
- POLAR = 'polar'¶
- SUBTROPICAL = 'subtropical'¶
- TEMPERATE = 'temperate'¶
- TROPICAL = 'tropical'¶
- class superstore.OutputFormat(value)[source]¶
Bases: str, Enum
Output format for generators.
- DICT = 'dict'¶
- PANDAS = 'pandas'¶
- POLARS = 'polars'¶
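Because these enums derive from both `str` and `Enum`, their members behave like plain strings in comparisons. The sketch below uses a local stand-in class that mirrors the documented members of `superstore.OutputFormat`, so it runs without the library installed; the same standard `enum` behavior applies to any `str`/`Enum` class, including `ClimateZone`.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Stand-in mirroring the documented members of superstore.OutputFormat."""

    DICT = "dict"
    PANDAS = "pandas"
    POLARS = "polars"


# Lookup by value returns the member; an unknown value raises ValueError.
fmt = OutputFormat("polars")
print(fmt is OutputFormat.POLARS)

# The str mixin makes members compare equal to their plain-string values.
print(OutputFormat.PANDAS == "pandas")
```

Both lines print `True`, which is why the generators can document `output` with plain strings such as "pandas" while the enum defines the same values.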