API Reference

Full API reference for all public functions and classes.

For detailed guides with examples, see:


Data Generators

superstore.superstore(config=None, count=None, output=None, seed=None)

Generate superstore sales data with structured configuration.

Parameters:
  • config – Optional SuperstoreConfig pydantic model, dict, or int (for backward compatibility). If int, treated as count. If None, uses default configuration.

  • count – Number of rows (overrides config if provided)

  • output – Output format (“pandas”, “polars”, or “dict”)

  • seed – Random seed (overrides config if provided)

Returns:

Superstore sales data in the specified format.

superstore.employees(count=1000, output='pandas', seed=None)
superstore.timeseries(config=None, nper=None, freq=None, ncol=None, output=None, seed=None)

Generate time series data with structured configuration.

Parameters:
  • config – Optional TimeseriesConfig pydantic model, dict, or int (for backward compatibility). If int, treated as nper. If None, uses default configuration.

  • nper – Number of periods (overrides config if provided)

  • freq – Frequency string (overrides config if provided)

  • ncol – Number of columns (overrides config if provided)

  • output – Output format (“pandas”, “polars”, or “dict”)

  • seed – Random seed (overrides config if provided)

Returns:

Time series data in the specified format.

superstore.weather(config=None, count=None, output='pandas', seed=None)

Generate weather data with structured configuration.

Parameters:
  • config – Optional WeatherConfig pydantic model or dict with configuration. If None, uses default configuration.

  • count – Number of readings (overrides config if provided)

  • output – Output format (“pandas”, “polars”, or “dict”)

  • seed – Random seed (overrides config if provided)

Returns:

Weather sensor data in the specified format.

superstore.logs(config=None)

Generate web server access logs.

Returns realistic HTTP access log entries with configurable traffic patterns, status code distributions (via Markov chain), latency (LogNormal), and error bursts.

Parameters:
  • config – Optional LogsConfig or dict with generation parameters

Returns:

DataFrame (pandas/polars) or dict of log entries

superstore.app_logs(config=None)

Generate application event logs.

Returns application-level log entries with log levels, loggers, messages, thread IDs, trace/span IDs, and optional exceptions.

Parameters:
  • config – Optional LogsConfig or dict with generation parameters

Returns:

DataFrame (pandas/polars) or dict of application log entries

superstore.stock_prices(config=None)

Generate stock price data (OHLCV bars).

Returns realistic stock price data using Geometric Brownian Motion with optional jump diffusion. Includes OHLCV bars with realistic intraday relationships and volume patterns.

Parameters:
  • config – Optional FinanceConfig or dict with generation parameters

Returns:

DataFrame (pandas/polars) or dict of OHLCV bars
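Geometric Brownian Motion multiplies the price at each step by a log-normal factor. The sketch below shows only the close-price recursion, using the standard exact GBM step. It is not the library's implementation. The names `mu`, `sigma`, and `dt` are illustrative assumptions, not FinanceConfig fields, and the sketch leaves out the jump-diffusion term and the OHLC/volume construction that stock_prices() performs.

```python
import math
import random


def gbm_path(s0, mu, sigma, n, dt=1.0 / 252, seed=None):
    """Simulate n closing prices with Geometric Brownian Motion.

    Exact GBM step:
        S_{t+1} = S_t * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z),  Z ~ N(0, 1)
    mu and sigma are annualized drift and volatility (illustrative names only).
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * dt
    vol = sigma * math.sqrt(dt)
    price = s0
    prices = []
    for _ in range(n):
        price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
        prices.append(price)
    return prices


# One trading year of daily closes starting at 100
closes = gbm_path(100.0, mu=0.05, sigma=0.2, n=252, seed=42)
```

Each step multiplies the price by a positive factor, so prices can never go negative. This is the usual reason GBM is chosen as a baseline model for equity prices.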

superstore.options_chain(config=None, spot_price=None, date=None)

Generate options chain with Greeks.

Returns options data including Black-Scholes pricing and Greeks (delta, gamma, theta, vega) for various strikes and expirations.

Parameters:
  • config – Optional FinanceConfig or dict with generation parameters

  • spot_price – Current underlying price (default: 100.0)

  • date – Pricing date (default: “2024-01-15”)

Returns:

DataFrame (pandas/polars) of the options chain
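For reference, the sketch below gives the textbook Black-Scholes price, delta, and gamma for a European option on a non-dividend-paying underlying. It is a standalone illustration, not the library's implementation. The risk-free rate, volatility, and time-to-expiry inputs are assumptions, because the FinanceConfig fields are not listed in this reference. Theta and vega are omitted for brevity.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_scholes(spot, strike, t, rate, vol, kind="call"):
    """Return (price, delta, gamma) for a European option with no dividends.

    t is time to expiry in years; rate and vol are annualized.
    """
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    disc = math.exp(-rate * t)
    if kind == "call":
        price = spot * norm_cdf(d1) - strike * disc * norm_cdf(d2)
        delta = norm_cdf(d1)
    else:
        price = strike * disc * norm_cdf(-d2) - spot * norm_cdf(-d1)
        delta = norm_cdf(d1) - 1.0
    # Gamma is identical for calls and puts
    gamma = norm_pdf(d1) / (spot * vol * sqrt_t)
    return price, delta, gamma


call = black_scholes(100.0, 105.0, t=30 / 365, rate=0.05, vol=0.25, kind="call")
put = black_scholes(100.0, 105.0, t=30 / 365, rate=0.05, vol=0.25, kind="put")
```

A quick consistency check on any generated chain is put-call parity: for the same strike and expiry, call − put should equal spot − strike·e^(−rate·t).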

superstore.finance(config=None)

Generate complete finance dataset: stock prices + options.

Returns both OHLCV price data and an options chain for the last trading day.

Parameters:
  • config – Optional FinanceConfig or dict with generation parameters

Returns:

Tuple of (prices_df, options_df)

superstore.telemetry(config=None, scenario=None)

Generate IoT telemetry data with configurable behaviors and preset scenarios.

Parameters:
  • config – Optional TelemetryConfig dict, or None for defaults

  • scenario – Optional preset scenario name: “normal”, “cpu_spikes”, “memory_leak”, “network_congestion”, “disk_pressure”, “cascade_failure”, “maintenance_window”, “sensor_drift”, “degradation_cycle”, “production”, or “chaos”

Returns:

DataFrame (pandas/polars) with telemetry readings

superstore.machines(config=None, count=None, json=False, seed=None)

Generate machine data with structured configuration.

Parameters:
  • config – Optional CrossfilterConfig pydantic model, dict, or int (for backward compatibility). If int, treated as count. If None, uses default configuration.

  • count – Number of machines (overrides config if provided)

  • json – Whether to return JSON (deprecated, unused)

  • seed – Random seed (overrides config if provided)

Returns:

List of machine dictionaries.

superstore.usage(machine, json=False, seed=None)
superstore.status(machine, json=False)
superstore.jobs(machine, json=False, seed=None)
superstore.ecommerce_sessions(count, seed=None, output='pandas')

Generate e-commerce sessions.

Parameters:
  • count – Number of sessions to generate

  • seed – Optional random seed for reproducibility

  • output – Output format (“pandas”, “polars”, or “dict”)

Returns:

DataFrame or dict with session data

superstore.ecommerce_products(count, seed=None, output='pandas')

Generate an e-commerce product catalog.

Parameters:
  • count – Number of products to generate

  • seed – Optional random seed for reproducibility

  • output – Output format (“pandas”, “polars”, or “dict”)

Returns:

DataFrame or dict with product data

superstore.ecommerce_data(config=None, output='pandas')

Generate a complete e-commerce dataset.

Parameters:
  • config – EcommerceConfig dict with generation parameters

  • output – Output format (“pandas”, “polars”, or “dict”)

Returns:

Dict with DataFrames for products, sessions, cart_events, orders, customers


Streaming & Parallel

superstore.superstoreStream(total_count, chunk_size=1000, seed=None)

Create a streaming superstore data generator.

This returns an iterator that yields chunks of data, allowing memory-efficient processing of large datasets.

Parameters:
  • total_count – Total number of rows to generate

  • chunk_size – Number of rows per chunk (default: 1000)

  • seed – Optional seed for reproducibility

Returns:

An iterator yielding lists of dicts

Example

>>> for chunk in superstoreStream(1_000_000, chunk_size=10000):
...     process(chunk)  # Each chunk is a list of 10000 dicts

superstore.employeesStream(total_count, chunk_size=1000, seed=None)

Create a streaming employee data generator.

This returns an iterator that yields chunks of data, allowing memory-efficient processing of large datasets.

Parameters:
  • total_count – Total number of employees to generate

  • chunk_size – Number of employees per chunk (default: 1000)

  • seed – Optional seed for reproducibility

Returns:

An iterator yielding lists of dicts

Example

>>> for chunk in employeesStream(1_000_000, chunk_size=10000):
...     process(chunk)  # Each chunk is a list of 10000 dicts

superstore.superstoreParallel(count=1000, output='pandas', seed=None)

Generate superstore data in parallel using multiple CPU cores.

This function uses Rayon to parallelize data generation across all available CPU cores, providing significant speedup for large datasets.

Parameters:
  • count – Number of rows to generate

  • output – Output format - “pandas”, “polars”, or “dict” (default: “pandas”)

  • seed – Optional seed for reproducibility

Returns:

DataFrame or list of dicts depending on output format

Example

>>> df = superstoreParallel(1_000_000)  # Uses all CPU cores

superstore.employeesParallel(count=1000, output='pandas', seed=None)

Generate employee data in parallel using multiple CPU cores.

This function uses Rayon to parallelize data generation across all available CPU cores, providing significant speedup for large datasets.

Parameters:
  • count – Number of employees to generate

  • output – Output format - “pandas”, “polars”, or “dict” (default: “pandas”)

  • seed – Optional seed for reproducibility

Returns:

DataFrame or list of dicts depending on output format

Example

>>> df = employeesParallel(1_000_000)  # Uses all CPU cores

superstore.numThreads()

Get the number of CPU threads available for parallel operations.

Returns:

Number of threads Rayon will use

superstore.setNumThreads(num_threads)

Set the number of threads for parallel operations.

This should be called early in the program before any parallel operations. Once set, it cannot be changed.

Parameters:

num_threads – Number of threads to use for parallel generation

Raises:

RuntimeError – If the thread pool has already been initialized


Export Functions

superstore.superstoreArrowIpc(count, seed=None)

Generate superstore data as Arrow IPC bytes.

The returned bytes can be read by PyArrow:

```python
import pyarrow as pa
from superstore import superstoreArrowIpc

ipc_bytes = superstoreArrowIpc(100)
reader = pa.ipc.open_stream(ipc_bytes)
table = reader.read_all()
df = table.to_pandas()
```

Parameters:
  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

Returns:

Arrow IPC stream bytes

superstore.employeesArrowIpc(count, seed=None)

Generate employee data as Arrow IPC bytes.

The returned bytes can be read by PyArrow:

```python
import pyarrow as pa
from superstore import employeesArrowIpc

ipc_bytes = employeesArrowIpc(100)
reader = pa.ipc.open_stream(ipc_bytes)
table = reader.read_all()
df = table.to_pandas()
```

Parameters:
  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

Returns:

Arrow IPC stream bytes

superstore.superstoreToParquet(path, count, seed=None, compression=None)

Write superstore data directly to a Parquet file.

Parameters:
  • path – Output file path

  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

  • compression – Compression type: ‘none’, ‘snappy’ (default), or ‘zstd’

Returns:

Number of rows written

superstore.employeesToParquet(path, count, seed=None, compression=None)

Write employee data directly to a Parquet file.

Parameters:
  • path – Output file path

  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

  • compression – Compression type: ‘none’, ‘snappy’ (default), or ‘zstd’

Returns:

Number of rows written

superstore.superstoreToCsv(path, count, seed=None)

Write superstore data directly to a CSV file.

Parameters:
  • path – Output file path

  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

Returns:

Number of rows written

superstore.employeesToCsv(path, count, seed=None)

Write employee data directly to a CSV file.

Parameters:
  • path – Output file path

  • count – Number of rows to generate

  • seed – Optional random seed for reproducibility

Returns:

Number of rows written


Distributions

superstore.sampleUniform(min, max, n=1, seed=None)

Sample from a uniform distribution.

Parameters:
  • min – Minimum value

  • max – Maximum value

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

superstore.sampleNormal(mean, std_dev, n=1, seed=None)

Sample from a normal (Gaussian) distribution.

Parameters:
  • mean – Mean of the distribution

  • std_dev – Standard deviation

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

superstore.sampleLogNormal(mu, sigma, n=1, seed=None)

Sample from a log-normal distribution.

Parameters:
  • mu – Mean of the underlying normal distribution

  • sigma – Standard deviation of the underlying normal distribution

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise
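Because mu and sigma describe the underlying normal distribution, they are not the mean and standard deviation of the samples themselves. If you know the target mean and standard deviation, the standard conversion below gives the matching (mu, sigma), which could then be passed to sampleLogNormal. In this sketch, Python's random.lognormvariate stands in for the library sampler.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert a target mean/std of X into (mu, sigma) of log(X)."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


# e.g. latencies with mean 200 ms and standard deviation 100 ms
mu, sigma = lognormal_params(mean=200.0, std=100.0)
rng = random.Random(0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(5)]
```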

superstore.sampleExponential(lambda_, n=1, seed=None)

Sample from an exponential distribution.

Parameters:
  • lambda_ – Rate parameter (1/mean)

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise
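Since lambda_ is a rate, larger values produce smaller samples, and the mean is 1/lambda_. The check below illustrates this with Python's random.expovariate, which uses the same rate parameterization. It is a stand-in for sampleExponential, not a call to it.

```python
import random
import statistics

rate = 2.0  # lambda_: events per unit time, so the mean gap is 1 / rate = 0.5
rng = random.Random(1)
gaps = [rng.expovariate(rate) for _ in range(100_000)]
mean_gap = statistics.fmean(gaps)  # close to 0.5
```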

superstore.sampleBeta(alpha, beta, n=1, seed=None)

Sample from a Beta distribution.

Parameters:
  • alpha – Shape parameter alpha

  • beta – Shape parameter beta

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise (values in [0, 1])

superstore.sampleGamma(shape, scale, n=1, seed=None)

Sample from a Gamma distribution.

Parameters:
  • shape – Shape parameter

  • scale – Scale parameter

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

superstore.sampleWeibull(shape, scale, n=1, seed=None)

Sample from a Weibull distribution.

Parameters:
  • shape – Shape parameter

  • scale – Scale parameter

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

superstore.samplePareto(scale, shape, n=1, seed=None)

Sample from a Pareto (power law) distribution.

Parameters:
  • scale – Scale parameter (minimum value)

  • shape – Shape parameter (tail index)

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise
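With scale as the minimum and shape as the tail index, a Pareto sample can be drawn by inverse-transform sampling as scale · U^(−1/shape) for U uniform on (0, 1]. The sketch below is an illustration of the distribution, not the library's sampler. Python's random.paretovariate(alpha) gives the same distribution with minimum 1, so multiplying its output by scale is equivalent. One property worth checking: P(X > 2·scale) = 2^(−shape).

```python
import random


def pareto_sample(scale, shape, n, seed=None):
    """Inverse-transform sampling: X = scale * U ** (-1 / shape), U ~ Uniform(0, 1]."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids raising 0 to a negative power
    return [scale * (1.0 - rng.random()) ** (-1.0 / shape) for _ in range(n)]


incomes = pareto_sample(scale=20_000.0, shape=1.5, n=10_000, seed=7)
```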

superstore.samplePoisson(lambda_, n=1, seed=None)

Sample from a Poisson distribution.

Parameters:
  • lambda_ – Rate parameter (expected count)

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

superstore.sampleCategorical(weights, n=1, seed=None)

Sample from a categorical distribution with weights.

Parameters:
  • weights – List of weights for each category (will be normalized)

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single category index if n=1, list of indices otherwise
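Normalization means that weights [2, 1, 1] and [0.5, 0.25, 0.25] describe the same distribution. The sketch below shows one common approach: normalize, build the cumulative distribution, then binary-search a uniform draw. Whether sampleCategorical uses this exact method is an assumption. For everyday use, random.choices(range(k), weights=w) does the same job.

```python
import bisect
import itertools
import random


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = float(sum(weights))
    return [w / total for w in weights]


def sample_categorical(weights, n, seed=None):
    """Draw n category indices by inverse-CDF lookup on the normalized weights."""
    cdf = list(itertools.accumulate(normalize(weights)))
    rng = random.Random(seed)
    last = len(cdf) - 1
    # min() guards against rounding leaving cdf[-1] slightly below 1.0
    return [min(bisect.bisect_right(cdf, rng.random()), last) for _ in range(n)]


idx = sample_categorical([2, 1, 1], n=10_000, seed=3)
```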

superstore.sampleMixture(means, std_devs, weights, n=1, seed=None)

Sample from a mixture of normal distributions.

Parameters:
  • means – List of means for each component

  • std_devs – List of standard deviations for each component

  • weights – List of weights for each component (will be normalized)

  • n – Number of samples (default: 1)

  • seed – Optional seed for reproducibility

Returns:

Single value if n=1, list of values otherwise

Example

>>> # Bimodal distribution
>>> samples = sampleMixture([30000, 80000], [10000, 20000], [0.6, 0.4], n=1000)
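The two-step sketch below reproduces the bimodal example above with the standard library. Each draw first picks a component according to weights (random.choices normalizes them), then samples from that component's normal distribution. The overall mean is the weighted average of the component means: 0.6 × 30000 + 0.4 × 80000 = 50000. This illustrates the technique and is not the library's code.

```python
import random
import statistics


def sample_mixture(means, std_devs, weights, n, seed=None):
    """Pick a component by weight, then draw from that component's normal."""
    rng = random.Random(seed)
    components = range(len(means))
    out = []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]
        out.append(rng.gauss(means[k], std_devs[k]))
    return out


salaries = sample_mixture([30000, 80000], [10000, 20000], [0.6, 0.4], n=20_000, seed=5)
overall_mean = statistics.fmean(salaries)  # close to 50000
```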

superstore.addGaussianNoise(values, std_dev, seed=None)

Add Gaussian noise to values.

Parameters:
  • values – List of values to add noise to

  • std_dev – Standard deviation of the noise

  • seed – Optional seed for reproducibility

Returns:

List of values with noise added

superstore.applyMissing(values, probability, seed=None)

Randomly replace values with None to simulate missing data.

Parameters:
  • values – List of values

  • probability – Probability of each value being missing (0-1)

  • seed – Optional seed for reproducibility

Returns:

List of values with some replaced by None
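The sketch below shows the masking described above. It assumes each value is dropped independently with the same probability, regardless of its value or position (missing completely at random). It is a stand-in, not the library's code, but it is handy for reasoning about how many rows survive: with probability 0.3, roughly 70% of values remain.

```python
import random


def apply_missing(values, probability, seed=None):
    """Replace each value with None independently with the given probability."""
    rng = random.Random(seed)
    return [None if rng.random() < probability else v for v in values]


masked = apply_missing(list(range(10_000)), probability=0.3, seed=11)
```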


Correlation & Copulas

superstore.pearsonCorrelation(x, y)

Compute the Pearson correlation coefficient between two lists.

Parameters:
  • x – First variable

  • y – Second variable

Returns:

The correlation coefficient (between -1 and 1)
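The coefficient is the covariance of the two lists divided by the product of their standard deviations. The pure-Python version below computes the same quantity and can be used to sanity-check generated data. On Python 3.10+, statistics.correlation is a built-in equivalent. Raising on mismatched lengths is this sketch's own choice; the library's error behavior is not documented here.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson([1, 2, 3], [1, 3, 2])` returns 0.5.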

superstore.sampleBivariate(n, rho, mean1=0.0, std1=1.0, mean2=0.0, std2=1.0, seed=None)

Generate correlated bivariate normal data.

This is a convenience function for the common case of generating two correlated variables.

Parameters:
  • n – Number of samples

  • rho – Correlation coefficient between the two variables

  • mean1, std1 – Mean and standard deviation of the first variable

  • mean2, std2 – Mean and standard deviation of the second variable

  • seed – Optional random seed

Returns:

A tuple of two lists (x, y)
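The usual construction behind correlated bivariate normal data works in three steps. Take two independent standard normals z1 and z2. Set the second variable to rho·z1 + sqrt(1 − rho²)·z2. Then scale and shift each variable. The sketch below uses the same arguments as sampleBivariate; whether the library uses this exact construction is an assumption.

```python
import math
import random


def bivariate_normal(n, rho, mean1=0.0, std1=1.0, mean2=0.0, std2=1.0, seed=None):
    """Correlated pair via y = rho*z1 + sqrt(1 - rho^2)*z2, then scale and shift."""
    rng = random.Random(seed)
    k = math.sqrt(1.0 - rho * rho)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(mean1 + std1 * z1)
        ys.append(mean2 + std2 * (rho * z1 + k * z2))
    return xs, ys


x, y = bivariate_normal(20_000, rho=0.7, mean1=50.0, std1=10.0, seed=2)
```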

class superstore.GaussianCopula(correlation_matrix)

Bases: object

Gaussian (Normal) Copula.

Uses a multivariate normal distribution to model dependencies. The correlation between variables is specified via a correlation matrix.

Example

>>> copula = GaussianCopula([[1.0, 0.8], [0.8, 1.0]])
>>> samples = copula.sample(100)
>>> # Each sample is a list of uniform [0,1] values whose dependence follows the correlation matrix

dim

Get the dimension of the copula.

sample(n, seed=None)

Generate n samples from the copula.

Parameters:
  • n – Number of samples to generate

  • seed – Optional random seed

Returns:

List of n samples, where each sample is a list of d uniform [0,1] values
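The standard Gaussian-copula recipe has three steps. Draw independent standard normals, correlate them with the Cholesky factor of the correlation matrix, then map each coordinate through the standard normal CDF to get uniforms. A pure-Python sketch follows; it is not the library's code. One consequence is worth knowing: the correlation matrix applies to the latent normals. The uniforms' own (Spearman) correlation is therefore (6/π)·arcsin(ρ/2), about 0.786 for ρ = 0.8, not 0.8 itself.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_copula_sample(corr, n, seed=None):
    """Correlate standard normals with the Cholesky factor, then map through the normal CDF."""
    rng = random.Random(seed)
    low = cholesky(corr)
    phi = NormalDist().cdf
    d = len(corr)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        out.append([phi(v) for v in x])
    return out


u = gaussian_copula_sample([[1.0, 0.8], [0.8, 1.0]], n=5_000, seed=4)
```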

class superstore.ClaytonCopula(theta, dim)

Bases: object

Clayton Copula.

An Archimedean copula with lower tail dependence. Good for modeling dependencies where extreme low values tend to occur together.

Example

>>> copula = ClaytonCopula(2.0, 2)  # theta=2, 2 dimensions
>>> samples = copula.sample(100)

dim

Get the dimension of the copula.

kendalls_tau()

Get Kendall’s tau (measure of correlation).

sample(n, seed=None)

Generate n samples from the copula.

theta

Get theta parameter.
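Clayton copulas are commonly sampled with the Marshall–Olkin (frailty) method. Draw a shared Gamma(1/θ, 1) variable V, then set each coordinate to (1 + E_i/V)^(−1/θ) using independent Exp(1) draws E_i. The sketch below illustrates this method; it is not the library's implementation. It also gives the closed forms for Kendall's tau, θ/(θ + 2), and for the lower tail dependence coefficient, 2^(−1/θ). For the θ = 2 example above, τ = 0.5.

```python
import random


def clayton_sample(theta, dim, n, seed=None):
    """Marshall-Olkin sampling for a Clayton copula (theta > 0)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)  # shared frailty, Gamma(1/theta, 1)
        out.append([(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(dim)])
    return out


def clayton_tau(theta):
    """Kendall's tau for a Clayton copula."""
    return theta / (theta + 2.0)


def clayton_lower_tail(theta):
    """Lower tail dependence coefficient, 2 ** (-1 / theta)."""
    return 2.0 ** (-1.0 / theta)


samples = clayton_sample(2.0, 2, n=800, seed=9)
```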

class superstore.FrankCopula(theta)

Bases: object

Frank Copula.

An Archimedean copula that is radially symmetric and has no tail dependence in either tail. Good for modeling overall dependence without tail asymmetry.

Example

>>> copula = FrankCopula(5.0)  # positive dependence
>>> samples = copula.sample(100)

kendalls_tau()

Get Kendall’s tau (measure of correlation).

sample(n, seed=None)

Generate n bivariate samples from the copula.

Returns:

List of n tuples (u, v), each containing two uniform [0,1] values

theta

Get theta parameter.
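Unlike the Clayton and Gumbel copulas, Frank's Kendall's tau has no elementary closed form. It is τ = 1 − 4/θ + (4/θ)·D₁(θ), where D₁ is the first Debye function. The numerical sketch below evaluates it with midpoint-rule integration, a method chosen here for simplicity rather than taken from the library. It shows that θ = 5 corresponds to τ of roughly 0.46, and that τ is odd in θ, so a negative θ gives negative dependence.

```python
import math


def debye1(theta, steps=10_000):
    """First Debye function D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints never hit t = 0, where the integrand is 0/0
        total += t / math.expm1(t)
    # (1/theta) * sum(f) * h simplifies to sum(f) / steps
    return total / steps


def frank_tau(theta):
    """Kendall's tau for a Frank copula (theta != 0)."""
    return 1.0 - 4.0 / theta + 4.0 * debye1(theta) / theta
```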

class superstore.GumbelCopula(theta)

Bases: object

Gumbel Copula.

An Archimedean copula with upper tail dependence. Good for modeling dependencies where extreme high values tend to occur together.

Example

>>> copula = GumbelCopula(2.0)  # theta=2 means moderate upper tail dependence
>>> samples = copula.sample(100)

kendalls_tau()

Get Kendall’s tau (measure of correlation).

sample(n, seed=None)

Generate n bivariate samples from the copula.

Returns:

List of n tuples (u, v), each containing two uniform [0,1] values

theta

Get theta parameter.

upper_tail_dependence()

Get upper tail dependence coefficient.
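Both quantities reported by GumbelCopula have standard closed forms in θ, valid for θ ≥ 1, where θ = 1 means independence. Kendall's tau is 1 − 1/θ, and the upper tail dependence coefficient is 2 − 2^(1/θ). Assuming the methods follow these textbook definitions, the θ = 2 example above gives τ = 0.5 and an upper tail dependence of about 0.586.

```python
import math


def gumbel_tau(theta):
    """Kendall's tau for a Gumbel copula (theta >= 1)."""
    return 1.0 - 1.0 / theta


def gumbel_upper_tail(theta):
    """Upper tail dependence coefficient, 2 - 2 ** (1 / theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)


tau = gumbel_tau(2.0)            # 0.5
lam_u = gumbel_upper_tail(2.0)   # 2 - sqrt(2), about 0.586
```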


Temporal Models

class superstore.AR1(phi, sigma, mean=0.0)

Bases: object

AR(1) autoregressive model for generating temporally dependent data.

Generates values according to: x_t = mean + phi * (x_{t-1} - mean) + epsilon_t where epsilon_t ~ N(0, sigma^2)

mean

Get the mean.

phi

Get the phi coefficient.

reset()

Reset the state to the mean.

sample(n, seed=None)

Generate n samples.

Parameters:
  • n – Number of samples to generate

  • seed – Optional random seed

Returns:

List of n values

sigma

Get the sigma value.

state

Get the current state.

stationary_variance()

Get the stationary variance of the process.
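The sketch below implements the recursion above in pure Python; it is not the library's code. It starts from the mean, mirroring reset(). It also checks the closed form that stationary_variance() presumably reports, σ²/(1 − φ²). That value is finite only when |φ| < 1; for φ = 0.5 and σ = 1 it equals 4/3.

```python
import random
import statistics


def ar1_sample(phi, sigma, mean=0.0, n=1, seed=None):
    """x_t = mean + phi * (x_{t-1} - mean) + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = mean  # start at the mean
    out = []
    for _ in range(n):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
        out.append(x)
    return out


phi, sigma = 0.5, 1.0
xs = ar1_sample(phi, sigma, mean=10.0, n=50_000, seed=8)
stationary_var = sigma**2 / (1 - phi**2)
```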

class superstore.ARp(coefficients, sigma, mean=0.0)

Bases: object

AR(p) autoregressive model of order p.

Generates values according to: x_t = mean + sum_i(phi_i * (x_{t-i} - mean)) + epsilon_t

static ar2(phi1, phi2, sigma, mean=0.0)

Create an AR(2) model.

order()

Get the order of the AR model.

reset()

Reset the state to the mean.

sample(n, seed=None)

Generate n samples.
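The same idea generalizes to p lags by keeping the last p values. The sketch below is illustrative, not the library's code. It assumes the history starts at the mean, mirroring reset(). An AR(2) model, such as the one ar2(phi1, phi2, sigma) creates, corresponds to coefficients [phi1, phi2].

```python
import random
from collections import deque


def arp_sample(coefficients, sigma, mean=0.0, n=1, seed=None):
    """x_t = mean + sum_i phi_i * (x_{t-i} - mean) + eps_t, history starting at the mean."""
    rng = random.Random(seed)
    p = len(coefficients)
    # history[0] is x_{t-1}, history[1] is x_{t-2}, ...
    history = deque([mean] * p, maxlen=p)
    out = []
    for _ in range(n):
        x = mean + sum(phi * (h - mean) for phi, h in zip(coefficients, history))
        x += rng.gauss(0.0, sigma)
        history.appendleft(x)  # newest first; maxlen discards the oldest lag
        out.append(x)
    return out


ys = arp_sample([0.5, 0.3], sigma=1.0, mean=0.0, n=1_000, seed=6)
```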

class superstore.MarkovChain(transition_matrix, states)

Bases: object

Markov chain for generating temporally dependent categorical data.

current_state

Get current state.

sample(n, seed=None)

Generate n state transitions.

sample_indices(n, seed=None)

Generate n state transitions as indices.

set_state(state)

Set current state by name.

states()

Get all states.

stationary_distribution()

Get stationary distribution.

static two_state(state_a, state_b, prob_a_to_b, prob_b_to_a)

Create a simple two-state Markov chain.

Parameters:
  • state_a – Name of first state

  • state_b – Name of second state

  • prob_a_to_b – Probability of transitioning from A to B

  • prob_b_to_a – Probability of transitioning from B to A
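For the two-state case, the stationary distribution has a closed form. With p = prob_a_to_b and q = prob_b_to_a, the chain spends a fraction q/(p + q) of its time in A. The standalone sketch below computes this value and confirms it by simulation; it is not the library's code. The simulation assumes the chain starts in state A. This is the kind of value stationary_distribution() would be expected to return for such a chain.

```python
import random


def two_state_stationary(prob_a_to_b, prob_b_to_a):
    """Long-run share of time in A and B: pi_A = q / (p + q), pi_B = p / (p + q)."""
    total = prob_a_to_b + prob_b_to_a
    return prob_b_to_a / total, prob_a_to_b / total


def simulate_two_state(state_a, state_b, prob_a_to_b, prob_b_to_a, n, seed=None):
    """Simulate n transitions, starting in state_a."""
    rng = random.Random(seed)
    state = state_a
    path = []
    for _ in range(n):
        switch_prob = prob_a_to_b if state == state_a else prob_b_to_a
        if rng.random() < switch_prob:
            state = state_b if state == state_a else state_a
        path.append(state)
    return path


pi_up, pi_down = two_state_stationary(0.1, 0.3)  # about (0.75, 0.25)
path = simulate_two_state("up", "down", 0.1, 0.3, n=50_000, seed=12)
```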

class superstore.RandomWalk(sigma, start=0.0, drift=0.0)

Bases: object

Random walk model.

position

Get current position.

sample(n, seed=None)

Generate n samples.
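The RandomWalk parameters suggest the standard Gaussian random walk with drift, where each step adds drift plus N(0, σ²) noise to the current position. The minimal sketch below follows that assumption. It also assumes the first sample is one step past start, which this reference does not specify.

```python
import random


def random_walk(sigma, start=0.0, drift=0.0, n=1, seed=None):
    """x_t = x_{t-1} + drift + eps_t, eps_t ~ N(0, sigma^2), beginning from start."""
    rng = random.Random(seed)
    x = start
    out = []
    for _ in range(n):
        x += drift + rng.gauss(0.0, sigma)
        out.append(x)
    return out


walk = random_walk(sigma=1.0, start=100.0, drift=0.1, n=250, seed=13)
```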

class superstore.ExponentialSmoothing(alpha, sigma, initial=0.0)

Bases: object

Exponential smoothing generator for smooth trend generation.

sample(n, seed=None)

Generate n samples.

smoothed

Get current smoothed value.


Configuration Classes

pydantic model superstore.SuperstoreConfig

Bases: BaseModel

Configuration for the superstore data generator.

Generates realistic retail transaction data with correlations between sales, quantity, discount, and profit.

JSON schema:
{
   "title": "SuperstoreConfig",
   "description": "Configuration for the superstore data generator.\n\nGenerates realistic retail transaction data with correlations\nbetween sales, quantity, discount, and profit.",
   "type": "object",
   "properties": {
      "count": {
         "default": 1000,
         "description": "Number of rows to generate",
         "minimum": 1,
         "title": "Count",
         "type": "integer"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "pool_size": {
         "default": 1000,
         "description": "Size of pre-generated data pools for performance",
         "maximum": 100000,
         "minimum": 1,
         "title": "Pool Size",
         "type": "integer"
      },
      "sales_quantity_correlation": {
         "default": 0.8,
         "description": "Sales-quantity correlation",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Sales Quantity Correlation",
         "type": "number"
      },
      "sales_profit_correlation": {
         "default": 0.9,
         "description": "Sales-profit correlation",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Sales Profit Correlation",
         "type": "number"
      },
      "discount_profit_correlation": {
         "default": -0.6,
         "description": "Discount-profit correlation",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Discount Profit Correlation",
         "type": "number"
      },
      "enable_price_points": {
         "default": true,
         "description": "Round prices to realistic $X.99 values",
         "title": "Enable Price Points",
         "type": "boolean"
      },
      "seasonality": {
         "$ref": "#/$defs/SeasonalityConfig",
         "description": "Seasonal patterns"
      },
      "promotions": {
         "$ref": "#/$defs/PromotionalConfig",
         "description": "Promotional effects"
      },
      "customers": {
         "$ref": "#/$defs/CustomerConfig",
         "description": "Customer behavior"
      }
   },
   "$defs": {
      "CustomerConfig": {
         "description": "Configuration for customer behavior patterns.",
         "properties": {
            "enable_cohorts": {
               "default": true,
               "description": "Enable customer cohort modeling",
               "title": "Enable Cohorts",
               "type": "boolean"
            },
            "repeat_customer_rate": {
               "default": 0.7,
               "description": "Fraction of orders from repeat customers",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Repeat Customer Rate",
               "type": "number"
            },
            "vip_segment_rate": {
               "default": 0.1,
               "description": "Fraction of customers in VIP segment",
               "maximum": 0.5,
               "minimum": 0.0,
               "title": "Vip Segment Rate",
               "type": "number"
            },
            "vip_order_multiplier": {
               "default": 2.0,
               "description": "VIP customer order value multiplier",
               "maximum": 5.0,
               "minimum": 1.0,
               "title": "Vip Order Multiplier",
               "type": "number"
            }
         },
         "title": "CustomerConfig",
         "type": "object"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      },
      "PromotionalConfig": {
         "description": "Configuration for promotional effects.",
         "properties": {
            "enable": {
               "default": true,
               "description": "Enable promotional patterns",
               "title": "Enable",
               "type": "boolean"
            },
            "discount_quantity_correlation": {
               "default": 0.5,
               "description": "How much discounts increase quantity (correlation factor)",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Discount Quantity Correlation",
               "type": "number"
            },
            "price_elasticity": {
               "default": -0.8,
               "description": "Price elasticity of demand",
               "maximum": 0.0,
               "minimum": -2.0,
               "title": "Price Elasticity",
               "type": "number"
            }
         },
         "title": "PromotionalConfig",
         "type": "object"
      },
      "SeasonalityConfig": {
         "description": "Configuration for seasonal patterns in sales data.",
         "properties": {
            "enable": {
               "default": true,
               "description": "Enable seasonal effects",
               "title": "Enable",
               "type": "boolean"
            },
            "q4_multiplier": {
               "default": 1.5,
               "description": "Q4 (holiday) sales multiplier",
               "maximum": 3.0,
               "minimum": 1.0,
               "title": "Q4 Multiplier",
               "type": "number"
            },
            "summer_multiplier": {
               "default": 0.9,
               "description": "Summer sales multiplier",
               "maximum": 1.5,
               "minimum": 0.5,
               "title": "Summer Multiplier",
               "type": "number"
            },
            "back_to_school_multiplier": {
               "default": 1.2,
               "description": "August/September sales multiplier",
               "maximum": 2.0,
               "minimum": 1.0,
               "title": "Back To School Multiplier",
               "type": "number"
            }
         },
         "title": "SeasonalityConfig",
         "type": "object"
      }
   }
}

field count: int = 1000

Number of rows to generate

field output: OutputFormat = OutputFormat.DICT

Output format

field seed: int | None = None

Random seed for reproducibility

field pool_size: int = 1000

Size of pre-generated data pools for performance

field sales_quantity_correlation: float = 0.8

Sales-quantity correlation

field sales_profit_correlation: float = 0.9

Sales-profit correlation

field discount_profit_correlation: float = -0.6

Discount-profit correlation

field enable_price_points: bool = True

Round prices to realistic $X.99 values

field seasonality: SeasonalityConfig [Optional]

Seasonal patterns

field promotions: PromotionalConfig [Optional]

Promotional effects

field customers: CustomerConfig [Optional]

Customer behavior
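Because superstore() accepts a plain dict as config, the schema above can be used without importing the pydantic model. The dict below is a sketch. It sets a few top-level fields and overrides selected nested fields, assuming omitted nested fields fall back to their schema defaults (standard pydantic behavior for fields with defaults). Every value stays within the minimum/maximum bounds listed in the schema.

```python
# A nested dict mirroring the SuperstoreConfig schema above.
config = {
    "count": 5_000,
    "output": "pandas",
    "seed": 42,
    "discount_profit_correlation": -0.7,
    "seasonality": {"enable": True, "q4_multiplier": 2.0},
    "promotions": {"enable": True, "price_elasticity": -1.2},
    "customers": {"vip_segment_rate": 0.05, "vip_order_multiplier": 3.0},
}

# Per the superstore() docs, a dict is accepted in place of a SuperstoreConfig:
# df = superstore.superstore(config=config)
```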

pydantic model superstore.TimeseriesConfig

Bases: BaseModel

Configuration for the time series generator.

Generates financial-style time series with optional regime changes, volatility clustering, and jump diffusion.

JSON schema:
{
   "title": "TimeseriesConfig",
   "description": "Configuration for the time series generator.\n\nGenerates financial-style time series with optional regime changes,\nvolatility clustering, and jump diffusion.",
   "type": "object",
   "properties": {
      "nper": {
         "default": 30,
         "description": "Number of periods",
         "minimum": 1,
         "title": "Nper",
         "type": "integer"
      },
      "ncol": {
         "default": 4,
         "description": "Number of columns (max 26)",
         "maximum": 26,
         "minimum": 1,
         "title": "Ncol",
         "type": "integer"
      },
      "freq": {
         "default": "B",
         "description": "Frequency: B=business, D=daily, W=weekly, M=monthly",
         "enum": [
            "B",
            "D",
            "W",
            "M"
         ],
         "title": "Freq",
         "type": "string"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "ar_phi": {
         "default": 0.95,
         "description": "AR(1) persistence parameter",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Ar Phi",
         "type": "number"
      },
      "sigma": {
         "default": 1.0,
         "description": "Innovation standard deviation",
         "minimum": 0.0,
         "title": "Sigma",
         "type": "number"
      },
      "drift": {
         "default": 0.0,
         "description": "Drift/trend per period",
         "title": "Drift",
         "type": "number"
      },
      "cumulative": {
         "default": true,
         "description": "Apply cumulative sum (price-like behavior)",
         "title": "Cumulative",
         "type": "boolean"
      },
      "use_fat_tails": {
         "default": false,
         "description": "Use Student-t instead of normal innovations",
         "title": "Use Fat Tails",
         "type": "boolean"
      },
      "degrees_freedom": {
         "default": 5.0,
         "description": "Degrees of freedom for Student-t",
         "maximum": 30.0,
         "minimum": 2.1,
         "title": "Degrees Freedom",
         "type": "number"
      },
      "cross_correlation": {
         "default": 0.0,
         "description": "Correlation between columns (0 = independent)",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Cross Correlation",
         "type": "number"
      },
      "regimes": {
         "$ref": "#/$defs/RegimeConfig",
         "description": "Regime switching configuration"
      },
      "jumps": {
         "$ref": "#/$defs/JumpConfig",
         "description": "Jump diffusion configuration"
      }
   },
   "$defs": {
      "JumpConfig": {
         "description": "Configuration for jump diffusion.",
         "properties": {
            "enable": {
               "default": false,
               "description": "Enable jump diffusion",
               "title": "Enable",
               "type": "boolean"
            },
            "jump_probability": {
               "default": 0.01,
               "description": "Probability of jump per period",
               "maximum": 0.1,
               "minimum": 0.0,
               "title": "Jump Probability",
               "type": "number"
            },
            "jump_mean": {
               "default": 0.0,
               "description": "Mean jump size",
               "title": "Jump Mean",
               "type": "number"
            },
            "jump_stddev": {
               "default": 0.05,
               "description": "Standard deviation of jump size",
               "minimum": 0.0,
               "title": "Jump Stddev",
               "type": "number"
            }
         },
         "title": "JumpConfig",
         "type": "object"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      },
      "RegimeConfig": {
         "description": "Configuration for regime-switching behavior.",
         "properties": {
            "enable": {
               "default": false,
               "description": "Enable regime switching",
               "title": "Enable",
               "type": "boolean"
            },
            "n_regimes": {
               "default": 2,
               "description": "Number of regimes",
               "maximum": 5,
               "minimum": 2,
               "title": "N Regimes",
               "type": "integer"
            },
            "regime_persistence": {
               "default": 0.95,
               "description": "Probability of staying in current regime",
               "maximum": 0.99,
               "minimum": 0.5,
               "title": "Regime Persistence",
               "type": "number"
            },
            "volatility_multipliers": {
               "description": "Volatility multiplier for each regime",
               "items": {
                  "type": "number"
               },
               "title": "Volatility Multipliers",
               "type": "array"
            }
         },
         "title": "RegimeConfig",
         "type": "object"
      }
   }
}

field nper: int = 30

Number of periods

field ncol: int = 4

Number of columns (max 26)

field freq: Literal['B', 'D', 'W', 'M'] = 'B'

Frequency: B = business day, D = calendar day, W = weekly, M = monthly

field output: OutputFormat = OutputFormat.DICT

Output format

field seed: int | None = None

Random seed for reproducibility

field ar_phi: float = 0.95

AR(1) persistence parameter

field sigma: float = 1.0

Innovation standard deviation

field drift: float = 0.0

Drift/trend per period

field cumulative: bool = True

Apply cumulative sum (price-like behavior)

field use_fat_tails: bool = False

Use Student-t instead of normal innovations

field degrees_freedom: float = 5.0

Degrees of freedom for Student-t

field cross_correlation: float = 0.0

Correlation between columns (0 = independent)

field regimes: RegimeConfig [Optional]

Regime switching configuration

field jumps: JumpConfig [Optional]

Jump diffusion configuration

pydantic model superstore.WeatherConfig

Bases: BaseModel

Configuration for the weather data generator.

Generates realistic outdoor sensor data with temporal patterns, seasonal variations, and weather events.

{
   "title": "WeatherConfig",
   "description": "Configuration for the weather data generator.\n\nGenerates realistic outdoor sensor data with temporal patterns,\nseasonal variations, and weather events.",
   "type": "object",
   "properties": {
      "count": {
         "default": 1000,
         "description": "Number of readings to generate",
         "minimum": 1,
         "title": "Count",
         "type": "integer"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "start_date": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Start date (YYYY-MM-DD). Defaults to 30 days ago.",
         "title": "Start Date"
      },
      "frequency_minutes": {
         "default": 15,
         "description": "Reading frequency in minutes",
         "maximum": 1440,
         "minimum": 1,
         "title": "Frequency Minutes",
         "type": "integer"
      },
      "climate_zone": {
         "$ref": "#/$defs/ClimateZone",
         "default": "temperate",
         "description": "Climate zone for realistic patterns"
      },
      "latitude": {
         "default": 40.0,
         "description": "Latitude for day/night calculations",
         "maximum": 90.0,
         "minimum": -90.0,
         "title": "Latitude",
         "type": "number"
      },
      "base_temp_celsius": {
         "default": 15.0,
         "description": "Annual average temperature in Celsius",
         "maximum": 50.0,
         "minimum": -50.0,
         "title": "Base Temp Celsius",
         "type": "number"
      },
      "temp_daily_amplitude": {
         "default": 10.0,
         "description": "Day/night temperature swing in Celsius",
         "maximum": 30.0,
         "minimum": 0.0,
         "title": "Temp Daily Amplitude",
         "type": "number"
      },
      "temp_seasonal_amplitude": {
         "default": 15.0,
         "description": "Summer/winter temperature swing in Celsius",
         "maximum": 40.0,
         "minimum": 0.0,
         "title": "Temp Seasonal Amplitude",
         "type": "number"
      },
      "temp_noise_stddev": {
         "default": 2.0,
         "description": "Random noise standard deviation",
         "maximum": 10.0,
         "minimum": 0.0,
         "title": "Temp Noise Stddev",
         "type": "number"
      },
      "base_humidity_percent": {
         "default": 60.0,
         "description": "Average humidity percentage",
         "maximum": 100.0,
         "minimum": 0.0,
         "title": "Base Humidity Percent",
         "type": "number"
      },
      "humidity_temp_correlation": {
         "default": -0.3,
         "description": "Correlation between temp and humidity (-1 to 1)",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Humidity Temp Correlation",
         "type": "number"
      },
      "precipitation_probability": {
         "default": 0.15,
         "description": "Base probability of precipitation",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Precipitation Probability",
         "type": "number"
      },
      "enable_weather_events": {
         "default": true,
         "description": "Enable weather event simulation",
         "title": "Enable Weather Events",
         "type": "boolean"
      },
      "event_probability": {
         "default": 0.05,
         "description": "Probability of weather event occurring",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Event Probability",
         "type": "number"
      },
      "outlier_probability": {
         "default": 0.01,
         "description": "Probability of outlier readings (sensor errors)",
         "maximum": 0.1,
         "minimum": 0.0,
         "title": "Outlier Probability",
         "type": "number"
      },
      "sensor_drift": {
         "default": false,
         "description": "Enable gradual sensor calibration drift",
         "title": "Sensor Drift",
         "type": "boolean"
      },
      "sensor_drift_rate": {
         "default": 0.001,
         "description": "Rate of sensor drift per reading",
         "maximum": 0.1,
         "minimum": 0.0,
         "title": "Sensor Drift Rate",
         "type": "number"
      }
   },
   "$defs": {
      "ClimateZone": {
         "description": "Climate zone affecting weather patterns.",
         "enum": [
            "tropical",
            "subtropical",
            "temperate",
            "continental",
            "polar",
            "arid",
            "mediterranean"
         ],
         "title": "ClimateZone",
         "type": "string"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      }
   }
}

field count: int = 1000

Number of readings to generate

field output: OutputFormat = OutputFormat.DICT

Output format

field seed: int | None = None

Random seed for reproducibility

field start_date: str | None = None

Start date (YYYY-MM-DD). Defaults to 30 days ago.

field frequency_minutes: int = 15

Reading frequency in minutes

field climate_zone: ClimateZone = ClimateZone.TEMPERATE

Climate zone for realistic patterns

field latitude: float = 40.0

Latitude for day/night calculations

field base_temp_celsius: float = 15.0

Annual average temperature in Celsius

field temp_daily_amplitude: float = 10.0

Day/night temperature swing in Celsius

field temp_seasonal_amplitude: float = 15.0

Summer/winter temperature swing in Celsius

field temp_noise_stddev: float = 2.0

Standard deviation of random temperature noise in Celsius

field base_humidity_percent: float = 60.0

Average humidity percentage

field humidity_temp_correlation: float = -0.3

Correlation between temperature and humidity (-1 to 1)

field precipitation_probability: float = 0.15

Base probability of precipitation

field enable_weather_events: bool = True

Enable weather event simulation

field event_probability: float = 0.05

Probability of a weather event occurring

field outlier_probability: float = 0.01

Probability of outlier readings (sensor errors)

field sensor_drift: bool = False

Enable gradual sensor calibration drift

field sensor_drift_rate: float = 0.001

Rate of sensor drift per reading

pydantic model superstore.LogsConfig

Bases: BaseModel

Configuration for the logs data generator.

Generates realistic web server access logs and application event logs with configurable traffic patterns, error rates, and latency distributions.

{
   "title": "LogsConfig",
   "description": "Configuration for the logs data generator.\n\nGenerates realistic web server access logs and application event logs\nwith configurable traffic patterns, error rates, and latency distributions.",
   "type": "object",
   "properties": {
      "count": {
         "default": 1000,
         "description": "Number of log entries to generate",
         "minimum": 1,
         "title": "Count",
         "type": "integer"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format (pandas, polars, or dict)"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "format": {
         "$ref": "#/$defs/LogFormat",
         "default": "combined",
         "description": "Log format style"
      },
      "start_time": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Start timestamp (ISO format). Defaults to current time.",
         "title": "Start Time"
      },
      "requests_per_second": {
         "default": 100.0,
         "description": "Average requests per second (Poisson rate)",
         "minimum": 0.1,
         "title": "Requests Per Second",
         "type": "number"
      },
      "success_rate": {
         "default": 0.95,
         "description": "Base success rate (2xx responses)",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Success Rate",
         "type": "number"
      },
      "error_burst": {
         "$ref": "#/$defs/ErrorBurstConfig",
         "description": "Error burst configuration"
      },
      "latency": {
         "$ref": "#/$defs/LatencyConfig",
         "description": "Latency distribution configuration"
      },
      "include_user_agent": {
         "default": true,
         "description": "Include user agent strings",
         "title": "Include User Agent",
         "type": "boolean"
      },
      "unique_ips": {
         "default": 1000,
         "description": "Number of unique IP addresses to generate",
         "minimum": 1,
         "title": "Unique Ips",
         "type": "integer"
      },
      "unique_users": {
         "default": 500,
         "description": "Number of unique user IDs",
         "minimum": 1,
         "title": "Unique Users",
         "type": "integer"
      },
      "api_path_ratio": {
         "default": 0.7,
         "description": "Ratio of API paths vs static paths",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Api Path Ratio",
         "type": "number"
      }
   },
   "$defs": {
      "ErrorBurstConfig": {
         "description": "Configuration for error burst behavior in logs.",
         "properties": {
            "enable": {
               "default": true,
               "description": "Enable error burst simulation",
               "title": "Enable",
               "type": "boolean"
            },
            "burst_probability": {
               "default": 0.02,
               "description": "Probability of entering a burst state per second",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Burst Probability",
               "type": "number"
            },
            "burst_duration_seconds": {
               "default": 30,
               "description": "Average duration of error bursts in seconds",
               "minimum": 1,
               "title": "Burst Duration Seconds",
               "type": "integer"
            },
            "burst_error_rate": {
               "default": 0.5,
               "description": "Error rate during burst periods",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Burst Error Rate",
               "type": "number"
            }
         },
         "title": "ErrorBurstConfig",
         "type": "object"
      },
      "LatencyConfig": {
         "description": "Configuration for request latency distribution.",
         "properties": {
            "base_latency_ms": {
               "default": 50.0,
               "description": "Base latency in milliseconds (median)",
               "minimum": 1.0,
               "title": "Base Latency Ms",
               "type": "number"
            },
            "latency_stddev": {
               "default": 0.8,
               "description": "Standard deviation for log-normal distribution",
               "minimum": 0.1,
               "title": "Latency Stddev",
               "type": "number"
            },
            "slow_request_probability": {
               "default": 0.05,
               "description": "Probability of a slow request",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Slow Request Probability",
               "type": "number"
            },
            "slow_request_multiplier": {
               "default": 10.0,
               "description": "Multiplier for slow request latency",
               "minimum": 1.0,
               "title": "Slow Request Multiplier",
               "type": "number"
            }
         },
         "title": "LatencyConfig",
         "type": "object"
      },
      "LogFormat": {
         "description": "Log output format styles.",
         "enum": [
            "combined",
            "common",
            "json",
            "application"
         ],
         "title": "LogFormat",
         "type": "string"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      }
   }
}

field count: int = 1000

Number of log entries to generate

field output: OutputFormat = OutputFormat.DICT

Output format (pandas, polars, or dict)

field seed: int | None = None

Random seed for reproducibility

field format: LogFormat = LogFormat.COMBINED

Log format style

field start_time: str | None = None

Start timestamp (ISO format). Defaults to current time.

field requests_per_second: float = 100.0

Average requests per second (Poisson rate)

field success_rate: float = 0.95

Baseline fraction of requests that succeed (2xx responses)

field error_burst: ErrorBurstConfig [Optional]

Error burst configuration

field latency: LatencyConfig [Optional]

Latency distribution configuration

field include_user_agent: bool = True

Include user agent strings

field unique_ips: int = 1000

Number of unique IP addresses to generate

field unique_users: int = 500

Number of unique user IDs

field api_path_ratio: float = 0.7

Fraction of request paths that are API paths (the remainder are static paths)
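The `requests_per_second` and `LatencyConfig` descriptions map onto two standard distributions: Poisson arrivals, whose inter-arrival gaps are exponential with mean `1 / requests_per_second`, and a log-normal latency whose median is `base_latency_ms` when `latency_stddev` is the sigma of the underlying normal. The sketch below draws one request using Python's `random.expovariate` and `random.lognormvariate`. Treating slow requests as a plain multiplication by `slow_request_multiplier` is an assumption, and `sample_request` is a hypothetical helper rather than part of superstore.

```python
import math
import random


def sample_request(
    rng: random.Random,
    requests_per_second: float = 100.0,
    base_latency_ms: float = 50.0,
    latency_stddev: float = 0.8,
    slow_request_probability: float = 0.05,
    slow_request_multiplier: float = 10.0,
) -> tuple[float, float]:
    """Return (seconds since the previous request, latency in ms) for one request."""
    # Poisson arrivals <=> exponential inter-arrival times with mean 1/rate.
    gap_s = rng.expovariate(requests_per_second)
    # Log-normal latency: log(latency) ~ Normal(log(median), sigma),
    # so base_latency_ms is the median of the non-slow requests.
    latency_ms = rng.lognormvariate(math.log(base_latency_ms), latency_stddev)
    # A small fraction of requests are slowed down by a constant factor.
    if rng.random() < slow_request_probability:
        latency_ms *= slow_request_multiplier
    return gap_s, latency_ms
```

Accumulating the returned gaps from `start_time` would give request timestamps.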

pydantic model superstore.FinanceConfig

Bases: BaseModel

Configuration for the finance data generator.

Generates realistic financial market data including OHLCV stock prices, multi-asset correlated returns, and options chains with Black-Scholes pricing.

{
   "title": "FinanceConfig",
   "description": "Configuration for the finance data generator.\n\nGenerates realistic financial market data including OHLCV stock prices,\nmulti-asset correlated returns, and options chains with Black-Scholes pricing.",
   "type": "object",
   "properties": {
      "ndays": {
         "default": 252,
         "description": "Number of trading days to generate (252 = 1 year)",
         "minimum": 1,
         "title": "Ndays",
         "type": "integer"
      },
      "n_assets": {
         "default": 1,
         "description": "Number of assets (1 = single stock, >1 = correlated multi-asset)",
         "minimum": 1,
         "title": "N Assets",
         "type": "integer"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format (pandas, polars, or dict)"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "start_date": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Start date (ISO format YYYY-MM-DD). Defaults to 2024-01-02.",
         "title": "Start Date"
      },
      "tickers": {
         "description": "Ticker symbols for the assets",
         "items": {
            "type": "string"
         },
         "title": "Tickers",
         "type": "array"
      },
      "asset_correlation": {
         "default": 0.5,
         "description": "Correlation between assets (for multi-asset generation)",
         "maximum": 1.0,
         "minimum": -1.0,
         "title": "Asset Correlation",
         "type": "number"
      },
      "stock": {
         "$ref": "#/$defs/StockConfig",
         "description": "Stock price generation configuration"
      },
      "ohlcv": {
         "$ref": "#/$defs/OhlcvConfig",
         "description": "OHLCV bar configuration"
      },
      "options": {
         "$ref": "#/$defs/OptionsConfig",
         "description": "Options chain configuration"
      }
   },
   "$defs": {
      "OhlcvConfig": {
         "description": "Configuration for OHLCV (Open-High-Low-Close-Volume) bar generation.",
         "properties": {
            "avg_volume": {
               "default": 1000000,
               "description": "Average daily trading volume",
               "minimum": 1,
               "title": "Avg Volume",
               "type": "integer"
            },
            "volume_volatility": {
               "default": 0.5,
               "description": "Volatility of volume (log-normal sigma)",
               "minimum": 0.0,
               "title": "Volume Volatility",
               "type": "number"
            },
            "intraday_volatility": {
               "default": 0.02,
               "description": "Intraday price range volatility",
               "minimum": 0.0,
               "title": "Intraday Volatility",
               "type": "number"
            },
            "volume_price_correlation": {
               "default": 0.3,
               "description": "Correlation between volume and absolute returns",
               "maximum": 1.0,
               "minimum": -1.0,
               "title": "Volume Price Correlation",
               "type": "number"
            }
         },
         "title": "OhlcvConfig",
         "type": "object"
      },
      "OptionsConfig": {
         "description": "Configuration for options chain generation with Black-Scholes pricing.",
         "properties": {
            "risk_free_rate": {
               "default": 0.05,
               "description": "Annual risk-free interest rate",
               "title": "Risk Free Rate",
               "type": "number"
            },
            "dividend_yield": {
               "default": 0.02,
               "description": "Annual dividend yield",
               "minimum": 0.0,
               "title": "Dividend Yield",
               "type": "number"
            },
            "expirations": {
               "description": "Days to expiration for option contracts",
               "items": {
                  "type": "integer"
               },
               "title": "Expirations",
               "type": "array"
            },
            "strike_offsets": {
               "description": "Strike prices as multipliers of spot price",
               "items": {
                  "type": "number"
               },
               "title": "Strike Offsets",
               "type": "array"
            }
         },
         "title": "OptionsConfig",
         "type": "object"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      },
      "StockConfig": {
         "description": "Configuration for stock price generation using Geometric Brownian Motion.",
         "properties": {
            "annual_drift": {
               "default": 0.08,
               "description": "Annual expected return (mu). E.g., 0.08 = 8% annual return",
               "title": "Annual Drift",
               "type": "number"
            },
            "annual_volatility": {
               "default": 0.2,
               "description": "Annual volatility (sigma). E.g., 0.20 = 20% annual volatility",
               "minimum": 0.0,
               "title": "Annual Volatility",
               "type": "number"
            },
            "initial_price": {
               "default": 100.0,
               "description": "Initial stock price",
               "exclusiveMinimum": 0.0,
               "title": "Initial Price",
               "type": "number"
            },
            "enable_jumps": {
               "default": false,
               "description": "Enable jump diffusion for more realistic price movements",
               "title": "Enable Jumps",
               "type": "boolean"
            },
            "jump_probability": {
               "default": 0.02,
               "description": "Daily probability of a jump event",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Jump Probability",
               "type": "number"
            },
            "jump_mean": {
               "default": 0.0,
               "description": "Mean of jump size (log-normal)",
               "title": "Jump Mean",
               "type": "number"
            },
            "jump_stddev": {
               "default": 0.05,
               "description": "Standard deviation of jump size",
               "minimum": 0.0,
               "title": "Jump Stddev",
               "type": "number"
            }
         },
         "title": "StockConfig",
         "type": "object"
      }
   }
}

field ndays: int = 252

Number of trading days to generate (252 = 1 year)

field n_assets: int = 1

Number of assets (1 = single stock, >1 = correlated multi-asset)

field output: OutputFormat = OutputFormat.DICT

Output format (pandas, polars, or dict)

field seed: int | None = None

Random seed for reproducibility

field start_date: str | None = None

Start date (ISO format YYYY-MM-DD). Defaults to 2024-01-02.

field tickers: list[str] [Optional]

Ticker symbols for the assets

field asset_correlation: float = 0.5

Correlation between assets (for multi-asset generation)

field stock: StockConfig [Optional]

Stock price generation configuration

field ohlcv: OhlcvConfig [Optional]

OHLCV bar configuration

field options: OptionsConfig [Optional]

Options chain configuration
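For the options chain, the textbook Black-Scholes-Merton formula with a continuous dividend yield `q` gives European prices `C = S e^{-qT} N(d1) - K e^{-rT} N(d2)` and `P = K e^{-rT} N(-d2) - S e^{-qT} N(-d1)`, where `d1 = (ln(S/K) + (r - q + sigma^2/2) T) / (sigma sqrt(T))` and `d2 = d1 - sigma sqrt(T)`. The sketch below assumes `expirations` are calendar days converted with a 365-day year and that each strike is `spot * offset` for an entry in `strike_offsets`. Whether superstore uses the same conventions is not stated here, and `black_scholes` is a hypothetical helper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(
    spot: float,
    strike: float,
    days_to_expiry: int,
    volatility: float,
    risk_free_rate: float = 0.05,
    dividend_yield: float = 0.02,
) -> tuple[float, float]:
    """European (call, put) prices under Black-Scholes-Merton with a dividend yield."""
    t = days_to_expiry / 365.0  # calendar-day year fraction (assumption)
    disc_q = math.exp(-dividend_yield * t)
    disc_r = math.exp(-risk_free_rate * t)
    vol_sqrt_t = volatility * math.sqrt(t)
    d1 = (
        math.log(spot / strike)
        + (risk_free_rate - dividend_yield + 0.5 * volatility**2) * t
    ) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    call = spot * disc_q * norm_cdf(d1) - strike * disc_r * norm_cdf(d2)
    put = strike * disc_r * norm_cdf(-d2) - spot * disc_q * norm_cdf(-d1)
    return call, put
```

For example, `black_scholes(100.0, 100.0, 365, 0.2, 0.05, 0.0)` prices the classic at-the-money one-year case, with the call near 10.45 and the put near 5.57. A chain could then be built by pricing every `(days, spot * offset)` pair from `expirations` and `strike_offsets`.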

pydantic model superstore.CrossfilterConfig

Bases: BaseModel

Configuration for crossfilter IoT data generator.

Generates machine telemetry data suitable for dashboard demos with optional anomalies and temporal patterns.

{
   "title": "CrossfilterConfig",
   "description": "Configuration for crossfilter IoT data generator.\n\nGenerates machine telemetry data suitable for dashboard demos\nwith optional anomalies and temporal patterns.",
   "type": "object",
   "properties": {
      "n_machines": {
         "default": 10,
         "description": "Number of machines",
         "minimum": 1,
         "title": "N Machines",
         "type": "integer"
      },
      "n_readings": {
         "default": 1000,
         "description": "Number of usage readings per machine",
         "minimum": 1,
         "title": "N Readings",
         "type": "integer"
      },
      "output": {
         "$ref": "#/$defs/OutputFormat",
         "default": "dict",
         "description": "Output format"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "machine_types": {
         "description": "Types of machines to generate",
         "items": {
            "$ref": "#/$defs/MachineType"
         },
         "title": "Machine Types",
         "type": "array"
      },
      "cores_range": {
         "default": [
            4,
            64
         ],
         "description": "Range of CPU cores per machine",
         "maxItems": 2,
         "minItems": 2,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Cores Range",
         "type": "array"
      },
      "zones": {
         "description": "Available zones",
         "items": {
            "type": "string"
         },
         "title": "Zones",
         "type": "array"
      },
      "regions": {
         "description": "Available regions",
         "items": {
            "type": "string"
         },
         "title": "Regions",
         "type": "array"
      },
      "base_cpu_load": {
         "default": 0.3,
         "description": "Base CPU utilization",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Base Cpu Load",
         "type": "number"
      },
      "base_memory_load": {
         "default": 0.5,
         "description": "Base memory utilization",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Base Memory Load",
         "type": "number"
      },
      "load_variance": {
         "default": 0.2,
         "description": "Variance in load readings",
         "maximum": 0.5,
         "minimum": 0.0,
         "title": "Load Variance",
         "type": "number"
      },
      "anomalies": {
         "$ref": "#/$defs/AnomalyConfig",
         "description": "Anomaly injection settings"
      },
      "temporal_patterns": {
         "$ref": "#/$defs/TemporalPatternConfig",
         "description": "Temporal pattern settings"
      },
      "enable_failures": {
         "default": false,
         "description": "Enable machine failure simulation",
         "title": "Enable Failures",
         "type": "boolean"
      },
      "failure_probability": {
         "default": 0.001,
         "description": "Probability of failure per reading",
         "maximum": 0.1,
         "minimum": 0.0,
         "title": "Failure Probability",
         "type": "number"
      },
      "cascade_failure_probability": {
         "default": 0.3,
         "description": "Probability of cascade failure when dependent machine fails",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Cascade Failure Probability",
         "type": "number"
      }
   },
   "$defs": {
      "AnomalyConfig": {
         "description": "Configuration for anomaly injection.",
         "properties": {
            "enable": {
               "default": false,
               "description": "Enable anomaly injection",
               "title": "Enable",
               "type": "boolean"
            },
            "cpu_spike_probability": {
               "default": 0.02,
               "description": "Probability of CPU spike",
               "maximum": 0.1,
               "minimum": 0.0,
               "title": "Cpu Spike Probability",
               "type": "number"
            },
            "memory_leak_probability": {
               "default": 0.01,
               "description": "Probability of memory leak start",
               "maximum": 0.1,
               "minimum": 0.0,
               "title": "Memory Leak Probability",
               "type": "number"
            },
            "network_saturation_probability": {
               "default": 0.01,
               "description": "Probability of network saturation",
               "maximum": 0.1,
               "minimum": 0.0,
               "title": "Network Saturation Probability",
               "type": "number"
            }
         },
         "title": "AnomalyConfig",
         "type": "object"
      },
      "MachineType": {
         "description": "Types of machines for crossfilter.",
         "enum": [
            "core",
            "edge",
            "worker"
         ],
         "title": "MachineType",
         "type": "string"
      },
      "OutputFormat": {
         "description": "Output format for generators.",
         "enum": [
            "pandas",
            "polars",
            "dict"
         ],
         "title": "OutputFormat",
         "type": "string"
      },
      "TemporalPatternConfig": {
         "description": "Configuration for temporal patterns in IoT data.",
         "properties": {
            "enable_diurnal": {
               "default": false,
               "description": "Enable day/night load patterns",
               "title": "Enable Diurnal",
               "type": "boolean"
            },
            "enable_weekly": {
               "default": false,
               "description": "Enable weekday/weekend patterns",
               "title": "Enable Weekly",
               "type": "boolean"
            },
            "peak_hour": {
               "default": 14,
               "description": "Hour of peak load (0-23)",
               "maximum": 23,
               "minimum": 0,
               "title": "Peak Hour",
               "type": "integer"
            },
            "night_load_factor": {
               "default": 0.3,
               "description": "Load factor during night hours",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Night Load Factor",
               "type": "number"
            },
            "weekend_load_factor": {
               "default": 0.5,
               "description": "Load factor during weekends",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Weekend Load Factor",
               "type": "number"
            }
         },
         "title": "TemporalPatternConfig",
         "type": "object"
      }
   }
}

field n_machines: int = 10

Number of machines

field n_readings: int = 1000

Number of usage readings per machine

field output: OutputFormat = OutputFormat.DICT

Output format

field seed: int | None = None

Random seed for reproducibility

field machine_types: list[MachineType] [Optional]

Types of machines to generate

field cores_range: tuple[int, int] = (4, 64)

Range of CPU cores per machine

field zones: list[str] [Optional]

Available zones

field regions: list[str] [Optional]

Available regions
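The roster fields above (`n_machines`, `machine_types`, `cores_range`, `zones`, `regions`) describe each machine's static attributes. The sketch below shows one way they could combine. It assumes every attribute is drawn uniformly and independently, which the reference does not confirm. The zone and region names and the `make_roster` helper are hypothetical placeholders; the real defaults for `zones` and `regions` come from the model's default factories, which are not shown here.

```python
import random

# Values of the MachineType enum documented above.
MACHINE_TYPES = ["core", "edge", "worker"]


def make_roster(n_machines=10, cores_range=(4, 64),
                zones=("zone-a", "zone-b"), regions=("region-1", "region-2"),
                seed=None):
    """Hypothetical roster builder: one dict per machine.

    Assumes uniform, independent draws for every attribute. The zone and
    region names are placeholders, not the library's defaults.
    """
    rng = random.Random(seed)
    lo, hi = cores_range
    return [
        {
            "machine_id": f"machine-{i}",
            "machine_type": rng.choice(MACHINE_TYPES),
            "cores": rng.randint(lo, hi),  # randint is inclusive on both ends
            "zone": rng.choice(zones),
            "region": rng.choice(regions),
        }
        for i in range(n_machines)
    ]


roster = make_roster(seed=42)
```

Passing the same `seed` reproduces the same roster, mirroring the role of the `seed` field.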

field base_cpu_load: float = 0.3

Base CPU utilization

field base_memory_load: float = 0.5

Base memory utilization

field load_variance: float = 0.2

Variance in load readings

field anomalies: AnomalyConfig [Optional]

Anomaly injection settings

field temporal_patterns: TemporalPatternConfig [Optional]

Temporal pattern settings
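The reference does not spell out how `base_cpu_load`, `load_variance`, and the `TemporalPatternConfig` factors interact. The following is a minimal sketch of one plausible model, not the library's implementation. It uses a cosine day curve that peaks at `peak_hour` and falls to `night_load_factor` twelve hours later, an extra `weekend_load_factor` multiplier on Saturdays and Sundays, and Gaussian noise whose standard deviation is taken to be `load_variance`. Treating `load_variance` as a standard deviation is an assumption, since the field is named a variance. The function name `cpu_load` is hypothetical.

```python
import math
import random


def cpu_load(hour, weekday, base_cpu_load=0.3, load_variance=0.2,
             enable_diurnal=False, enable_weekly=False, peak_hour=14,
             night_load_factor=0.3, weekend_load_factor=0.5, rng=None):
    """Return one simulated CPU-utilization reading, clamped to [0, 1].

    Hypothetical model: cosine day curve, weekend multiplier, and Gaussian
    noise with standard deviation load_variance. weekday follows
    datetime.weekday(): Monday=0 through Sunday=6.
    """
    if rng is None:
        rng = random.Random()
    factor = 1.0
    if enable_diurnal:
        # phase is 1.0 at peak_hour and -1.0 twelve hours away
        phase = math.cos(2 * math.pi * (hour - peak_hour) / 24)
        # map phase from [-1, 1] onto [night_load_factor, 1.0]
        factor *= night_load_factor + (1 - night_load_factor) * (phase + 1) / 2
    if enable_weekly and weekday >= 5:  # Saturday or Sunday
        factor *= weekend_load_factor
    load = base_cpu_load * factor + rng.gauss(0.0, load_variance)
    return min(1.0, max(0.0, load))


rng = random.Random(0)
# One Tuesday (weekday=1) of hourly readings with the diurnal pattern enabled
day = [cpu_load(h, weekday=1, enable_diurnal=True, rng=rng) for h in range(24)]
```

The function defaults for `enable_diurnal` and `enable_weekly` mirror the documented `False` defaults. With the noise set to zero, a weekday reading at the peak hour equals `base_cpu_load`, and the default weekend factor halves a weekend reading at the peak.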

field enable_failures: bool = False

Enable machine failure simulation

field failure_probability: float = 0.001

Probability of failure per reading

field cascade_failure_probability: float = 0.3

Probability of cascade failure when dependent machine fails
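The two failure probabilities can be read as a per-reading hazard plus a propagation step. The sketch below rests on three assumptions, none of which the reference documents: a `dependents` mapping from each machine to the machines that depend on it, cascades that can chain through several hops, and failed machines that stay down for the rest of the run. `simulate_failures` is a hypothetical helper.

```python
import random


def simulate_failures(n_readings, machines, dependents,
                      failure_probability=0.001,
                      cascade_failure_probability=0.3, seed=None):
    """Return {machine: index of the reading at which it failed}.

    Hypothetical model: every live machine fails independently with
    failure_probability on each reading; when a machine fails, each machine
    listed in dependents[machine] also fails with
    cascade_failure_probability, and cascades can chain. Failed machines
    are assumed to stay down.
    """
    rng = random.Random(seed)
    failed_at = {}
    for t in range(n_readings):
        for m in machines:
            if m in failed_at or rng.random() >= failure_probability:
                continue
            failed_at[m] = t
            stack = [m]  # propagate the failure through the dependency graph
            while stack:
                src = stack.pop()
                for d in dependents.get(src, []):
                    if d not in failed_at and rng.random() < cascade_failure_probability:
                        failed_at[d] = t
                        stack.append(d)
    return failed_at


machines = ["m0", "m1", "m2"]
dependents = {"m0": ["m1", "m2"]}  # hypothetical: m1 and m2 depend on m0
failures = simulate_failures(1000, machines, dependents, seed=42)
```

Because the cascade walks the dependency graph with a stack, a single failure can take down several machines within the same reading.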

pydantic model superstore.EcommerceConfig

Bases: BaseModel

Configuration for e-commerce data generation.

Generates realistic e-commerce data, including:

  • User sessions via MarkovChain state machines

  • Shopping cart events with abandonment patterns

  • Customer RFM (Recency, Frequency, Monetary) metrics

  • Product catalog with categories and pricing

  • Conversion funnels with realistic drop-off rates

{
   "title": "EcommerceConfig",
   "description": "Configuration for e-commerce data generation.\n\nGenerates realistic e-commerce data including:\n- User sessions via MarkovChain state machines\n- Shopping cart events with abandonment patterns\n- Customer RFM (Recency, Frequency, Monetary) metrics\n- Product catalog with categories and pricing\n- Conversion funnels with realistic drop-off rates",
   "type": "object",
   "properties": {
      "sessions": {
         "default": 10000,
         "description": "Number of sessions to generate",
         "minimum": 1,
         "title": "Sessions",
         "type": "integer"
      },
      "customers": {
         "default": 2000,
         "description": "Number of unique customers",
         "minimum": 1,
         "title": "Customers",
         "type": "integer"
      },
      "seed": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Random seed for reproducibility",
         "title": "Seed"
      },
      "start_date": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Start date for data generation (YYYY-MM-DD)",
         "title": "Start Date"
      },
      "days": {
         "default": 30,
         "description": "Number of days to generate",
         "minimum": 1,
         "title": "Days",
         "type": "integer"
      },
      "session": {
         "$ref": "#/$defs/SessionConfig",
         "description": "Session behavior configuration"
      },
      "cart": {
         "$ref": "#/$defs/CartConfig",
         "description": "Cart behavior configuration"
      },
      "catalog": {
         "$ref": "#/$defs/CatalogConfig",
         "description": "Product catalog configuration"
      },
      "rfm": {
         "$ref": "#/$defs/RfmConfig",
         "description": "RFM analysis configuration"
      },
      "funnel": {
         "$ref": "#/$defs/FunnelConfig",
         "description": "Conversion funnel configuration"
      }
   },
   "$defs": {
      "CartConfig": {
         "description": "Configuration for cart behavior.",
         "properties": {
            "avg_items_per_cart": {
               "default": 2.5,
               "description": "Average items per cart",
               "minimum": 1.0,
               "title": "Avg Items Per Cart",
               "type": "number"
            },
            "remove_probability": {
               "default": 0.1,
               "description": "Probability of removing an item from cart",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Remove Probability",
               "type": "number"
            },
            "quantity_update_probability": {
               "default": 0.05,
               "description": "Probability of updating quantity",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Quantity Update Probability",
               "type": "number"
            },
            "max_items": {
               "default": 20,
               "description": "Maximum items per cart",
               "minimum": 1,
               "title": "Max Items",
               "type": "integer"
            },
            "enable_abandonment": {
               "default": true,
               "description": "Enable cart abandonment simulation",
               "title": "Enable Abandonment",
               "type": "boolean"
            },
            "abandonment_rate": {
               "default": 0.7,
               "description": "Cart abandonment rate",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Abandonment Rate",
               "type": "number"
            }
         },
         "title": "CartConfig",
         "type": "object"
      },
      "CatalogConfig": {
         "description": "Configuration for product catalog.",
         "properties": {
            "num_products": {
               "default": 500,
               "description": "Number of unique products",
               "minimum": 1,
               "title": "Num Products",
               "type": "integer"
            },
            "min_price": {
               "default": 5.0,
               "description": "Minimum product price",
               "minimum": 0.01,
               "title": "Min Price",
               "type": "number"
            },
            "max_price": {
               "default": 1000.0,
               "description": "Maximum product price",
               "minimum": 1.0,
               "title": "Max Price",
               "type": "number"
            },
            "lognormal_prices": {
               "default": true,
               "description": "Price follows log-normal distribution (realistic skew)",
               "title": "Lognormal Prices",
               "type": "boolean"
            },
            "categories": {
               "description": "Product categories",
               "items": {
                  "type": "string"
               },
               "title": "Categories",
               "type": "array"
            }
         },
         "title": "CatalogConfig",
         "type": "object"
      },
      "FunnelConfig": {
         "description": "Configuration for conversion funnel.",
         "properties": {
            "enable": {
               "default": true,
               "description": "Enable funnel stage tracking",
               "title": "Enable",
               "type": "boolean"
            },
            "stages": {
               "description": "Funnel stages",
               "items": {
                  "type": "string"
               },
               "title": "Stages",
               "type": "array"
            },
            "time_of_day_effects": {
               "default": true,
               "description": "Time-of-day effects on conversions",
               "title": "Time Of Day Effects",
               "type": "boolean"
            },
            "day_of_week_effects": {
               "default": true,
               "description": "Day-of-week effects on conversions",
               "title": "Day Of Week Effects",
               "type": "boolean"
            }
         },
         "title": "FunnelConfig",
         "type": "object"
      },
      "RfmConfig": {
         "description": "Configuration for RFM (Recency, Frequency, Monetary) analysis.",
         "properties": {
            "enable": {
               "default": true,
               "description": "Enable RFM metrics calculation",
               "title": "Enable",
               "type": "boolean"
            },
            "recency_window_days": {
               "default": 365,
               "description": "Days to look back for recency",
               "minimum": 1,
               "title": "Recency Window Days",
               "type": "integer"
            },
            "num_buckets": {
               "default": 5,
               "description": "Number of RFM score buckets (typically 5)",
               "maximum": 10,
               "minimum": 2,
               "title": "Num Buckets",
               "type": "integer"
            },
            "pareto_shape": {
               "default": 1.5,
               "description": "Pareto distribution shape for customer value (80/20 rule)",
               "minimum": 1.0,
               "title": "Pareto Shape",
               "type": "number"
            }
         },
         "title": "RfmConfig",
         "type": "object"
      },
      "SessionConfig": {
         "description": "Configuration for session behavior in e-commerce.",
         "properties": {
            "avg_pages_per_session": {
               "default": 5.0,
               "description": "Average pages viewed per session",
               "minimum": 1.0,
               "title": "Avg Pages Per Session",
               "type": "number"
            },
            "cart_add_probability": {
               "default": 0.15,
               "description": "Probability of adding item to cart given product view",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Cart Add Probability",
               "type": "number"
            },
            "checkout_start_probability": {
               "default": 0.4,
               "description": "Probability of starting checkout given cart view",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Checkout Start Probability",
               "type": "number"
            },
            "purchase_completion_probability": {
               "default": 0.65,
               "description": "Probability of completing purchase given checkout start",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Purchase Completion Probability",
               "type": "number"
            },
            "avg_session_duration_seconds": {
               "default": 300,
               "description": "Average session duration in seconds",
               "minimum": 1,
               "title": "Avg Session Duration Seconds",
               "type": "integer"
            },
            "enable_bounces": {
               "default": true,
               "description": "Enable session bounces (single-page visits)",
               "title": "Enable Bounces",
               "type": "boolean"
            },
            "bounce_rate": {
               "default": 0.35,
               "description": "Bounce rate (probability of immediate exit)",
               "maximum": 1.0,
               "minimum": 0.0,
               "title": "Bounce Rate",
               "type": "number"
            }
         },
         "title": "SessionConfig",
         "type": "object"
      }
   }
}

field sessions: int = 10000

Number of sessions to generate

field customers: int = 2000

Number of unique customers

field seed: int | None = None

Random seed for reproducibility

field start_date: str | None = None

Start date for data generation (YYYY-MM-DD)

field days: int = 30

Number of days to generate

field session: SessionConfig [Optional]

Session behavior configuration

field cart: CartConfig [Optional]

Cart behavior configuration

field catalog: CatalogConfig [Optional]

Product catalog configuration

field rfm: RfmConfig [Optional]

RFM analysis configuration

field funnel: FunnelConfig [Optional]

Conversion funnel configuration
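`start_date` and `days` together define the generation window. The sketch below assumes the window begins on `start_date` and covers `days` consecutive calendar days. When `start_date` is None, it falls back to today's date; that fallback is an assumption and may not match the library's default. `generation_dates` is a hypothetical helper.

```python
from datetime import date, timedelta


def generation_dates(start_date=None, days=30):
    """Hypothetical helper: the calendar days covered by start_date + days.

    Assumes the window starts on start_date (YYYY-MM-DD) and spans `days`
    consecutive days; the today() fallback is an assumption.
    """
    start = date.fromisoformat(start_date) if start_date else date.today()
    return [start + timedelta(days=i) for i in range(days)]


window = generation_dates("2024-01-01", days=30)  # 2024-01-01 through 2024-01-30
```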

pydantic model superstore.SessionConfig

Bases: BaseModel

Configuration for session behavior in e-commerce.

{
   "title": "SessionConfig",
   "description": "Configuration for session behavior in e-commerce.",
   "type": "object",
   "properties": {
      "avg_pages_per_session": {
         "default": 5.0,
         "description": "Average pages viewed per session",
         "minimum": 1.0,
         "title": "Avg Pages Per Session",
         "type": "number"
      },
      "cart_add_probability": {
         "default": 0.15,
         "description": "Probability of adding item to cart given product view",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Cart Add Probability",
         "type": "number"
      },
      "checkout_start_probability": {
         "default": 0.4,
         "description": "Probability of starting checkout given cart view",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Checkout Start Probability",
         "type": "number"
      },
      "purchase_completion_probability": {
         "default": 0.65,
         "description": "Probability of completing purchase given checkout start",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Purchase Completion Probability",
         "type": "number"
      },
      "avg_session_duration_seconds": {
         "default": 300,
         "description": "Average session duration in seconds",
         "minimum": 1,
         "title": "Avg Session Duration Seconds",
         "type": "integer"
      },
      "enable_bounces": {
         "default": true,
         "description": "Enable session bounces (single-page visits)",
         "title": "Enable Bounces",
         "type": "boolean"
      },
      "bounce_rate": {
         "default": 0.35,
         "description": "Bounce rate (probability of immediate exit)",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Bounce Rate",
         "type": "number"
      }
   }
}

field avg_pages_per_session: float = 5.0

Average pages viewed per session

field cart_add_probability: float = 0.15

Probability of adding item to cart given product view

field checkout_start_probability: float = 0.4

Probability of starting checkout given cart view

field purchase_completion_probability: float = 0.65

Probability of completing purchase given checkout start

field avg_session_duration_seconds: int = 300

Average session duration in seconds

field enable_bounces: bool = True

Enable session bounces (single-page visits)

field bounce_rate: float = 0.35

Bounce rate (probability of immediate exit)
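Each `SessionConfig` probability is conditional on the previous step, so the probabilities chain multiplicatively. The sketch below walks one session through a simplified state machine. It is not the library's exact MarkovChain and makes these assumptions: a non-bounced session views a geometric number of product pages with mean `avg_pages_per_session`, any page view may add an item to the cart, and the session then moves to checkout and purchase with the configured probabilities. `simulate_session` and the outcome labels are hypothetical.

```python
import random


def simulate_session(rng, avg_pages_per_session=5.0, cart_add_probability=0.15,
                     checkout_start_probability=0.4,
                     purchase_completion_probability=0.65,
                     enable_bounces=True, bounce_rate=0.35):
    """Walk one session through a simplified funnel; return its end state.

    Hypothetical chain: landing -> bounce, or browse a geometric number of
    product pages (mean avg_pages_per_session); any page view may add to
    cart; cart -> checkout -> purchase, exiting at each step on failure.
    """
    if enable_bounces and rng.random() < bounce_rate:
        return "bounce"
    # Geometric page count on {1, 2, ...}: continuing with probability
    # 1 - 1/mean gives an expected count equal to the mean.
    continue_p = 1 - 1 / avg_pages_per_session
    pages = 1
    while rng.random() < continue_p:
        pages += 1
    if not any(rng.random() < cart_add_probability for _ in range(pages)):
        return "browse_exit"
    if rng.random() >= checkout_start_probability:
        return "cart_abandon"
    if rng.random() >= purchase_completion_probability:
        return "checkout_abandon"
    return "purchase"


rng = random.Random(7)
outcomes = [simulate_session(rng) for _ in range(10_000)]
purchase_rate = outcomes.count("purchase") / len(outcomes)
```

Under this simplified chain, the defaults convert roughly 7.9% of sessions: 0.65 (no bounce) × 0.469 (at least one cart add over a geometric page count with mean 5) × 0.4 × 0.65.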

pydantic model superstore.CartConfig

Bases: BaseModel

Configuration for cart behavior.

{
   "title": "CartConfig",
   "description": "Configuration for cart behavior.",
   "type": "object",
   "properties": {
      "avg_items_per_cart": {
         "default": 2.5,
         "description": "Average items per cart",
         "minimum": 1.0,
         "title": "Avg Items Per Cart",
         "type": "number"
      },
      "remove_probability": {
         "default": 0.1,
         "description": "Probability of removing an item from cart",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Remove Probability",
         "type": "number"
      },
      "quantity_update_probability": {
         "default": 0.05,
         "description": "Probability of updating quantity",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Quantity Update Probability",
         "type": "number"
      },
      "max_items": {
         "default": 20,
         "description": "Maximum items per cart",
         "minimum": 1,
         "title": "Max Items",
         "type": "integer"
      },
      "enable_abandonment": {
         "default": true,
         "description": "Enable cart abandonment simulation",
         "title": "Enable Abandonment",
         "type": "boolean"
      },
      "abandonment_rate": {
         "default": 0.7,
         "description": "Cart abandonment rate",
         "maximum": 1.0,
         "minimum": 0.0,
         "title": "Abandonment Rate",
         "type": "number"
      }
   }
}

field avg_items_per_cart: float = 2.5

Average items per cart

field remove_probability: float = 0.1

Probability of removing an item from cart

field quantity_update_probability: float = 0.05

Probability of updating an item's quantity in the cart

field max_items: int = 20

Maximum items per cart

field enable_abandonment: bool = True

Enable cart abandonment simulation

field abandonment_rate: float = 0.7

Cart abandonment rate (fraction of carts abandoned before purchase)
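The sketch below shows how `avg_items_per_cart`, `max_items`, and the abandonment settings could combine. It assumes the item count is 1 plus a Poisson draw with mean `avg_items_per_cart - 1`, so the mean matches the configured average whenever the cap does not bind. It also treats abandonment as a single Bernoulli trial per cart. The distribution choice and the `make_cart` helper are assumptions. The sketch does not model `remove_probability` or `quantity_update_probability`.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def make_cart(rng, avg_items_per_cart=2.5, max_items=20,
              enable_abandonment=True, abandonment_rate=0.7):
    """Hypothetical cart: 1 + Poisson(avg - 1) items, capped at max_items,
    plus one Bernoulli trial for abandonment."""
    n_items = min(max_items, 1 + poisson(rng, avg_items_per_cart - 1))
    abandoned = enable_abandonment and rng.random() < abandonment_rate
    return {"n_items": n_items, "abandoned": abandoned}


rng = random.Random(3)
carts = [make_cart(rng) for _ in range(10_000)]
mean_items = sum(c["n_items"] for c in carts) / len(carts)
abandoned_share = sum(c["abandoned"] for c in carts) / len(carts)
```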

pydantic model superstore.CatalogConfig

Bases: BaseModel

Configuration for product catalog.

{
   "title": "CatalogConfig",
   "description": "Configuration for product catalog.",
   "type": "object",
   "properties": {
      "num_products": {
         "default": 500,
         "description": "Number of unique products",
         "minimum": 1,
         "title": "Num Products",
         "type": "integer"
      },
      "min_price": {
         "default": 5.0,
         "description": "Minimum product price",
         "minimum": 0.01,
         "title": "Min Price",
         "type": "number"
      },
      "max_price": {
         "default": 1000.0,
         "description": "Maximum product price",
         "minimum": 1.0,
         "title": "Max Price",
         "type": "number"
      },
      "lognormal_prices": {
         "default": true,
         "description": "Price follows log-normal distribution (realistic skew)",
         "title": "Lognormal Prices",
         "type": "boolean"
      },
      "categories": {
         "description": "Product categories",
         "items": {
            "type": "string"
         },
         "title": "Categories",
         "type": "array"
      }
   }
}

field num_products: int = 500

Number of unique products

field min_price: float = 5.0

Minimum product price

field max_price: float = 1000.0

Maximum product price

field lognormal_prices: bool = True

If True, prices follow a log-normal distribution (a realistic right skew)

field categories: list[str] [Optional]

Product categories
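With `lognormal_prices` enabled, most products cluster toward the low end of the price range, with a long tail of expensive items. The sketch below shows one way to produce that shape within `[min_price, max_price]` using the standard library's `random.Random.lognormvariate`. Three choices are assumptions rather than the library's documented parameterization: centring the distribution on the geometric mean of the bounds, the value of sigma, and clipping to the bounds. `sample_price` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_price(rng, min_price=5.0, max_price=1000.0, lognormal_prices=True):
    """Draw one price in [min_price, max_price], rounded to cents.

    Hypothetical parameterization: the log-normal's median is the geometric
    mean of the bounds and sigma puts each bound three standard deviations
    away in log space; rare out-of-range draws are clipped.
    """
    if lognormal_prices:
        lo, hi = math.log(min_price), math.log(max_price)
        price = rng.lognormvariate((lo + hi) / 2, (hi - lo) / 6)
    else:
        price = rng.uniform(min_price, max_price)
    return round(min(max_price, max(min_price, price)), 2)


rng = random.Random(11)
prices = [sample_price(rng) for _ in range(10_001)]
median_price = statistics.median(prices)  # close to sqrt(5 * 1000), about 70.7
```

Setting the median at the geometric mean, rather than the arithmetic mean of 502.5, is what produces the right skew: half of the sketch's products cost under about 71 even though the range extends to 1000.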

pydantic model superstore.RfmConfig

Bases: BaseModel

Configuration for RFM (Recency, Frequency, Monetary) analysis.

{
   "title": "RfmConfig",
   "description": "Configuration for RFM (Recency, Frequency, Monetary) analysis.",
   "type": "object",
   "properties": {
      "enable": {
         "default": true,
         "description": "Enable RFM metrics calculation",
         "title": "Enable",
         "type": "boolean"
      },
      "recency_window_days": {
         "default": 365,
         "description": "Days to look back for recency",
         "minimum": 1,
         "title": "Recency Window Days",
         "type": "integer"
      },
      "num_buckets": {
         "default": 5,
         "description": "Number of RFM score buckets (typically 5)",
         "maximum": 10,
         "minimum": 2,
         "title": "Num Buckets",
         "type": "integer"
      },
      "pareto_shape": {
         "default": 1.5,
         "description": "Pareto distribution shape for customer value (80/20 rule)",
         "minimum": 1.0,
         "title": "Pareto Shape",
         "type": "number"
      }
   }
}

field enable: bool = True

Enable RFM metrics calculation

field recency_window_days: int = 365

Look-back window, in days, for computing recency

field num_buckets: int = 5

Number of RFM score buckets (typically 5)

field pareto_shape: float = 1.5

Pareto distribution shape for customer value (80/20 rule)
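RFM scores are conventionally assigned by ranking customers on each metric and splitting the ranking into `num_buckets` equal-sized groups. Recency is reversed so that more recent buyers score higher. The sketch below implements that convention. It also applies the standard Lorenz-curve result for a Pareto distribution, which shows how concentrated customer value is for a given `pareto_shape`. Whether the library scores customers exactly this way, or applies `pareto_shape` to the monetary value, is an assumption. `bucket_scores` and `top_share` are hypothetical helpers.

```python
import math


def bucket_scores(values, num_buckets=5, higher_is_better=True):
    """Quantile scores 1..num_buckets by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        score = rank * num_buckets // n + 1  # equal-sized rank groups
        scores[i] = score if higher_is_better else num_buckets + 1 - score
    return scores


def top_share(top_fraction, pareto_shape):
    """Share of total value held by the top `top_fraction` of customers
    when value is Pareto-distributed with this shape (must exceed 1)."""
    return top_fraction ** (1 - 1 / pareto_shape)


# Recency: days since last order, so fewer days should score higher
recency = bucket_scores([3, 200, 40, 90, 10], higher_is_better=False)  # [5, 1, 3, 2, 4]
default_top20 = top_share(0.2, 1.5)
exact_8020_shape = math.log(5) / math.log(4)
```

Under this idealized formula, the default `pareto_shape` of 1.5 gives the top 20% of customers about 58% of total value. An exact 80/20 split corresponds to a shape of log 5 / log 4 ≈ 1.16.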

pydantic model superstore.FunnelConfig

Bases: BaseModel

Configuration for conversion funnel.

{
   "title": "FunnelConfig",
   "description": "Configuration for conversion funnel.",
   "type": "object",
   "properties": {
      "enable": {
         "default": true,
         "description": "Enable funnel stage tracking",
         "title": "Enable",
         "type": "boolean"
      },
      "stages": {
         "description": "Funnel stages",
         "items": {
            "type": "string"
         },
         "title": "Stages",
         "type": "array"
      },
      "time_of_day_effects": {
         "default": true,
         "description": "Time-of-day effects on conversions",
         "title": "Time Of Day Effects",
         "type": "boolean"
      },
      "day_of_week_effects": {
         "default": true,
         "description": "Day-of-week effects on conversions",
         "title": "Day Of Week Effects",
         "type": "boolean"
      }
   }
}

field enable: bool = True

Enable funnel stage tracking

field stages: list[str] [Optional]

Funnel stages

field time_of_day_effects: bool = True

Enable time-of-day effects on conversion rates

field day_of_week_effects: bool = True

Enable day-of-week effects on conversion rates
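Funnel drop-off is usually reported as the conversion rate between consecutive stages. The minimal sketch below uses hypothetical stage names and counts, because the default `stages` list is supplied by the model and is not shown here. The counts are chosen so the stage-to-stage rates match the `SessionConfig` defaults of 0.15, 0.4, and 0.65. `stage_conversion` is a hypothetical helper.

```python
def stage_conversion(stage_counts):
    """Conversion rate between each pair of consecutive funnel stages."""
    rates = {}
    for (prev, n_prev), (cur, n_cur) in zip(stage_counts, stage_counts[1:]):
        rates[f"{prev}->{cur}"] = n_cur / n_prev if n_prev else 0.0
    return rates


# Hypothetical stage names and counts, for illustration only
counts = [("product_view", 1000), ("add_to_cart", 150),
          ("checkout", 60), ("purchase", 39)]
rates = stage_conversion(counts)
# {'product_view->add_to_cart': 0.15, 'add_to_cart->checkout': 0.4,
#  'checkout->purchase': 0.65}
overall = counts[-1][1] / counts[0][1]  # end-to-end conversion: 0.039
```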


Enums

class superstore.ClimateZone(value)

Bases: str, Enum

Climate zone affecting weather patterns.

ARID = 'arid'
CONTINENTAL = 'continental'
MEDITERRANEAN = 'mediterranean'
POLAR = 'polar'
SUBTROPICAL = 'subtropical'
TEMPERATE = 'temperate'
TROPICAL = 'tropical'

class superstore.OutputFormat(value)

Bases: str, Enum

Output format for generators.

DICT = 'dict'
PANDAS = 'pandas'
POLARS = 'polars'
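These enums subclass both `str` and `Enum`, so their members compare equal to their string values. That is presumably why the generators accept either `output="pandas"` or `OutputFormat.PANDAS`. The sketch below demonstrates that behavior with a local stand-in class that has the same members. It does not import `superstore`, so it shows standard Python semantics rather than anything specific to the library.

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Local stand-in with the same members as superstore.OutputFormat.

    Declared here only to demonstrate str-Enum behavior; it does not import
    or wrap the library's class.
    """

    DICT = "dict"
    PANDAS = "pandas"
    POLARS = "polars"


fmt = OutputFormat("polars")            # look up a member by its value
same = OutputFormat.PANDAS == "pandas"  # str mixin: equal to the raw string
```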

class superstore.LogLevel(value)

Bases: str, Enum

Log severity levels.

DEBUG = 'debug'
ERROR = 'error'
INFO = 'info'
TRACE = 'trace'
WARN = 'warn'

class superstore.LogFormat(value)

Bases: str, Enum

Log output format styles.

APPLICATION = 'application'
COMBINED = 'combined'
COMMON = 'common'
JSON = 'json'