````markdown
# E-commerce Data Generation

Generate synthetic e-commerce data with realistic user sessions, shopping carts, orders, and customer RFM metrics.

## Overview

The e-commerce generators create data suitable for:

- Conversion funnel analysis
- Customer segmentation (RFM)
- Cart abandonment studies
- Session behavior analytics
- A/B testing simulations

## Quick Start

```python
from superstore import ecommerce_data, EcommerceConfig

# Generate complete e-commerce dataset
data = ecommerce_data()

# Access individual tables
sessions_df = data["sessions"]
products_df = data["products"]
cart_events_df = data["cart_events"]
orders_df = data["orders"]
customers_df = data["customers"]
```

---

## Session Data

The `ecommerce_sessions()` function generates user session records with MarkovChain-based page navigation.

### Basic Usage

```python
from superstore import ecommerce_sessions

# Generate 1000 sessions
df = ecommerce_sessions(count=1000, seed=42)
```

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `session_id` | str | Unique session identifier |
| `user_id` | str | User identifier |
| `start_time` | datetime | Session start timestamp |
| `end_time` | datetime | Session end timestamp |
| `duration_seconds` | int | Total session duration |
| `device_type` | str | Device type (desktop, mobile, tablet) |
| `browser` | str | Browser name |
| `traffic_source` | str | Traffic source (organic, paid_search, direct, social, email, referral, affiliate) |
| `landing_page` | str | First page viewed |
| `pages_viewed` | int | Number of pages viewed |
| `bounced` | bool | Whether session was a bounce (single page) |
| `converted` | bool | Whether session resulted in purchase |
| `total_value` | float | Total purchase value (0 if not converted) |

### MarkovChain Session States

Sessions navigate through states using a configurable transition matrix:

```
landing → browse → view_product → add_to_cart → view_cart → checkout_start → checkout_payment → purchase
   ↘         ↘           ↘             ↘            ↘
    exit      exit        exit          exit         exit
```

---

## Product Catalog

The `ecommerce_products()` function generates a product catalog with realistic pricing.

### Basic Usage

```python
from superstore import ecommerce_products

# Generate 500 products
df = ecommerce_products(count=500, seed=42)
```

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `product_id` | str | Unique product identifier |
| `name` | str | Product name |
| `category` | str | Product category |
| `subcategory` | str | Product subcategory |
| `price` | float | Product price (log-normal distribution) |
| `rating` | float | Average rating (1.0-5.0) |
| `review_count` | int | Number of reviews |
| `in_stock` | bool | Stock availability |

---

## Cart Events

Cart events track user interactions with shopping carts.

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `event_id` | str | Unique event identifier |
| `session_id` | str | Associated session |
| `user_id` | str | User identifier |
| `timestamp` | datetime | Event timestamp |
| `event_type` | str | Event type (add, remove, update_quantity) |
| `product_id` | str | Product identifier |
| `quantity` | int | Item quantity |
| `unit_price` | float | Price per unit |
| `total_price` | float | Total line price |

---

## Orders

Completed purchase orders.

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `order_id` | str | Unique order identifier |
| `user_id` | str | Customer identifier |
| `session_id` | str | Originating session |
| `order_time` | datetime | Order timestamp |
| `total_items` | int | Number of items |
| `subtotal` | float | Subtotal before tax/shipping |
| `discount` | float | Discount amount |
| `tax` | float | Tax amount |
| `shipping` | float | Shipping cost |
| `total` | float | Final order total |
| `payment_method` | str | Payment method (credit_card, paypal, apple_pay, etc.) |
| `status` | str | Order status (completed, processing, shipped) |

---

## Customers with RFM Metrics

Customer records include RFM (Recency, Frequency, Monetary) segmentation.

### Output Schema

| Column | Type | Description |
|--------|------|-------------|
| `customer_id` | str | Unique customer identifier |
| `email` | str | Customer email |
| `first_order_date` | date | First purchase date |
| `last_order_date` | date | Most recent purchase date |
| `total_orders` | int | Lifetime order count |
| `total_spent` | float | Lifetime spend |
| `avg_order_value` | float | Average order value |
| `rfm_recency` | int | Recency score (1-5) |
| `rfm_frequency` | int | Frequency score (1-5) |
| `rfm_monetary` | float | Monetary value |
| `rfm_score` | str | Combined RFM score (e.g., "544") |
| `rfm_segment` | str | Customer segment label |

### RFM Segments

| Segment | Description |
|---------|-------------|
| Champions | High recency, frequency, and monetary |
| Loyal Customers | High frequency and monetary |
| Potential Loyalists | Recent customers with medium frequency |
| New Customers | Very recent, low frequency |
| At Risk | Previously good customers, declining |
| Need Attention | Below average across metrics |
| Hibernating | Low activity, long time since purchase |
| Lost | No recent activity, low value |

---

## Configuration

Use `EcommerceConfig` for detailed control:

```python
from superstore import ecommerce_data, EcommerceConfig

config = EcommerceConfig(
    sessions=10000,   # Number of sessions
    customers=2000,   # Number of unique customers
    days=30,          # Time span in days
    seed=42,          # Reproducibility
)

data = ecommerce_data(config=config.model_dump())
```

### Session Configuration

Control user session behavior:

```python
config = EcommerceConfig(
    sessions=5000,
    session={
        "avg_pages_per_session": 5.0,
        "cart_add_probability": 0.15,
        "checkout_start_probability": 0.40,
        "purchase_completion_probability": 0.65,
        "avg_session_duration_seconds": 300,
        "enable_bounces": True,
        "bounce_rate": 0.35,
    }
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `avg_pages_per_session` | `5.0` | Average pages viewed |
| `cart_add_probability` | `0.15` | P(add to cart \| view product) |
| `checkout_start_probability` | `0.40` | P(start checkout \| view cart) |
| `purchase_completion_probability` | `0.65` | P(purchase \| checkout start) |
| `avg_session_duration_seconds` | `300` | Average session length |
| `enable_bounces` | `True` | Enable single-page bounces |
| `bounce_rate` | `0.35` | Bounce probability |

### Cart Configuration

Configure cart behavior and abandonment:

```python
config = EcommerceConfig(
    cart={
        "avg_items_per_cart": 2.5,
        "remove_probability": 0.10,
        "quantity_update_probability": 0.05,
        "max_items": 20,
        "enable_abandonment": True,
        "abandonment_rate": 0.70,
    }
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `avg_items_per_cart` | `2.5` | Average items added |
| `remove_probability` | `0.10` | P(remove item) |
| `quantity_update_probability` | `0.05` | P(update quantity) |
| `max_items` | `20` | Maximum cart size |
| `enable_abandonment` | `True` | Enable cart abandonment |
| `abandonment_rate` | `0.70` | Cart abandonment rate |

### Catalog Configuration

Configure the product catalog:

```python
config = EcommerceConfig(
    catalog={
        "num_products": 500,
        "min_price": 5.0,
        "max_price": 1000.0,
        "lognormal_prices": True,
        "categories": ["Electronics", "Clothing", "Home", "Sports"],
    }
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `num_products` | `500` | Number of products |
| `min_price` | `5.0` | Minimum product price |
| `max_price` | `1000.0` | Maximum product price |
| `lognormal_prices` | `True` | Use log-normal price distribution |
| `categories` | `[...]` | Product categories |

### RFM Configuration

Configure RFM analysis parameters:

```python
config = EcommerceConfig(
    rfm={
        "enable": True,
        "recency_window_days": 365,
        "num_buckets": 5,
        "pareto_shape": 1.5,
    }
)
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `enable` | `True` | Calculate RFM metrics |
| `recency_window_days` | `365` | Recency lookback period |
| `num_buckets` | `5` | Number of RFM score buckets |
| `pareto_shape` | `1.5` | Shape parameter for 80/20 distribution |

### Funnel Configuration

Configure conversion funnel tracking:

```python
config = EcommerceConfig(
    funnel={
        "enable": True,
        "stages": ["visit", "view_product", "add_to_cart", "checkout", "purchase"],
        "time_of_day_effects": True,
        "day_of_week_effects": True,
    }
)
```

---

## Complete Example

```python
from superstore import ecommerce_data, EcommerceConfig

config = EcommerceConfig(
    sessions=20000,
    customers=5000,
    days=90,
    seed=42,
    session={
        "avg_pages_per_session": 6.0,
        "bounce_rate": 0.30,
        "purchase_completion_probability": 0.70,
    },
    cart={
        "abandonment_rate": 0.65,
        "avg_items_per_cart": 3.0,
    },
    catalog={
        "num_products": 1000,
        "categories": ["Electronics", "Fashion", "Home", "Beauty", "Sports"],
    },
    rfm={
        "num_buckets": 5,
        "pareto_shape": 1.8,
    },
)

data = ecommerce_data(config=config.model_dump())

# Analyze conversion rates
sessions = data["sessions"]
conversion_rate = sessions["converted"].mean()
print(f"Conversion rate: {conversion_rate:.2%}")

# Segment customers by RFM
customers = data["customers"]
segment_counts = customers["rfm_segment"].value_counts()
print(segment_counts)
```

---

## API Reference

See the full API documentation:

- [ecommerce_sessions()](api.md)
- [ecommerce_products()](api.md)
- [ecommerce_data()](api.md)
- [EcommerceConfig](api.md)
````
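To make the session state diagram and the session probabilities documented above more concrete, here is a minimal standard-library sketch of a Markov-chain funnel walk. It is not the library's implementation: the `TRANSITIONS` table and the `simulate_session` helper are hypothetical names introduced for illustration. Only the values `0.35`, `0.15`, `0.40`, and `0.65` come from the documented defaults (`bounce_rate`, `cart_add_probability`, `checkout_start_probability`, `purchase_completion_probability`); every other probability, and the choice of which states may loop back to `browse`, is an assumption.

```python
import random

TERMINAL = ("purchase", "exit")

# Hypothetical transition table (illustration only, not the library's matrix).
# Each state maps to (next_state, probability) pairs that sum to 1.
TRANSITIONS = {
    # 0.35 mirrors the documented default bounce_rate
    "landing": [("browse", 0.65), ("exit", 0.35)],
    # assumed values
    "browse": [("view_product", 0.60), ("exit", 0.40)],
    # 0.15 mirrors cart_add_probability; the remainder is assumed
    "view_product": [("add_to_cart", 0.15), ("browse", 0.45), ("exit", 0.40)],
    # assumed values
    "add_to_cart": [("view_cart", 0.50), ("browse", 0.30), ("exit", 0.20)],
    # 0.40 mirrors checkout_start_probability
    "view_cart": [("checkout_start", 0.40), ("exit", 0.60)],
    # assumed: checkout always proceeds to payment, so that
    # P(purchase | checkout start) equals the 0.65 used below
    "checkout_start": [("checkout_payment", 1.0)],
    # 0.65 mirrors purchase_completion_probability
    "checkout_payment": [("purchase", 0.65), ("exit", 0.35)],
}


def simulate_session(rng, max_steps=50):
    """Walk the chain from 'landing' until 'purchase' or 'exit' is reached."""
    state = "landing"
    path = [state]
    while state not in TERMINAL and len(path) < max_steps:
        next_states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    if state not in TERMINAL:
        # Safety cap reached: treat the session as abandoned.
        path.append("exit")
    return path


rng = random.Random(42)  # seeded for reproducibility
paths = [simulate_session(rng) for _ in range(10_000)]

# A session converts if its walk ends in 'purchase'.
conversion_rate = sum(p[-1] == "purchase" for p in paths) / len(paths)
# A bounce is a session that leaves straight from the landing page.
bounce_rate = sum(p == ["landing", "exit"] for p in paths) / len(paths)

print(f"Simulated conversion rate: {conversion_rate:.2%}")
print(f"Simulated bounce rate: {bounce_rate:.2%}")
```

Because the exit branches compound at every step, the simulated conversion rate ends up far below any single stage probability, which is the behavior that the `converted` and `bounced` session columns are meant to capture.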