Correlation & Copulas¶
Generate correlated multivariate data using copula models.
Overview¶
Copulas allow you to:
Generate correlated random variables
Control the dependence structure independently from marginals
Model different types of dependence (tail dependence, asymmetry)
Create realistic multivariate distributions
Correlation Functions¶
Pearson Correlation¶
Compute Pearson correlation between two arrays:
from superstore import pearsonCorrelation
correlation = pearsonCorrelation(x, y)
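For reference, the Pearson coefficient is the covariance of the two samples divided by the product of their standard deviations. The pure-Python sketch below shows that computation. `pearson` is a hypothetical helper written for illustration, not the superstore implementation, and how `pearsonCorrelation` handles edge cases such as constant input is not covered by this page.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products; the 1/(n-1) factors cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(pearson(x, y))  # close to 1: y is nearly a linear function of x
```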
Bivariate Sampling¶
Generate correlated bivariate samples:
from superstore import sampleBivariate
# Generate correlated pairs with rho=0.7
x, y = sampleBivariate(n=1000, rho=0.7)
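This page does not state which distribution `sampleBivariate` draws from. As a sketch of one standard construction, assuming standard normal marginals, a pair with target correlation `rho` can be built by mixing one normal draw with independent noise. The function name `sample_bivariate_normal` and its `seed` parameter are hypothetical.

```python
import math
import random

def sample_bivariate_normal(n, rho, seed=None):
    """Draw n pairs of standard normals with correlation rho (illustrative helper)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # Mix z1 with independent noise so that Corr(x, y) = rho and Var(y) = 1.
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys

xs, ys = sample_bivariate_normal(n=20000, rho=0.7, seed=0)
```

The sample correlation of `xs` and `ys` fluctuates around 0.7, and the spread shrinks as `n` grows.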
Copula Models¶
Gaussian Copula¶
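The Gaussian copula induces dependence through a multivariate normal distribution. It has no tail dependence: for any pairwise correlation strictly between -1 and 1, the probability that one variable is extreme given that another is extreme shrinks to zero as the threshold moves further into the tail.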
from superstore import GaussianCopula
# Create a Gaussian copula with correlation matrix
copula = GaussianCopula(
correlation=[[1.0, 0.7, 0.3],
[0.7, 1.0, 0.5],
[0.3, 0.5, 1.0]]
)
# Generate 1000 correlated uniform samples
u = copula.sample(n=1000)
# u has shape (1000, 3), each column is marginally Uniform(0,1)
Properties:
Symmetric dependence
No tail dependence
Easy to parameterize with correlation matrix
Good default when dependence is roughly symmetric and joint extremes are not the main concern
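To make the construction concrete, the following is a minimal pure-Python sketch of how Gaussian copula samples can be produced: draw independent normals, correlate them with a Cholesky factor of the correlation matrix, then push each coordinate through the standard normal CDF. The helpers `cholesky` and `gaussian_copula_sample` are hypothetical and say nothing about how `GaussianCopula.sample` is implemented internally.

```python
import math
import random
from statistics import NormalDist

def cholesky(matrix):
    """Lower-triangular factor L with L @ L.T == matrix (positive definite input)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gaussian_copula_sample(correlation, n, seed=None):
    """Return n rows of correlated Uniform(0,1) values (illustrative helper)."""
    rng = random.Random(seed)
    lower = cholesky(correlation)
    d = len(correlation)
    std_normal = NormalDist()
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals, then map each one to (0, 1) via the normal CDF.
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        rows.append([std_normal.cdf(v) for v in x])
    return rows

corr = [[1.0, 0.7, 0.3],
        [0.7, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
u = gaussian_copula_sample(corr, n=1000, seed=0)
print(len(u), len(u[0]))  # 1000 3
```

The correlation of the resulting uniforms is slightly lower than the matrix entries, because the CDF transform is nonlinear; the matrix describes the latent normals.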
Clayton Copula¶
The Clayton copula has lower tail dependence - extreme low values are more likely to occur together. Useful for:
Credit risk (joint defaults)
Insurance (correlated claims)
Portfolio risk (market crashes)
from superstore import ClaytonCopula
# Create a Clayton copula with theta=2.0
# Higher theta = stronger dependence
copula = ClaytonCopula(theta=2.0, dim=3)
# Generate samples
u = copula.sample(n=1000)
Properties:
Asymmetric dependence
Lower tail dependence (crashes happen together)
No upper tail dependence
theta > 0 for positive dependence
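The lower tail behaviour can be checked empirically. The sketch below samples a Clayton copula with the Marshall-Olkin frailty construction (a shared Gamma variable couples all coordinates) and compares how often two coordinates fall into the same tail. `clayton_sample` and `joint_tail_rate` are hypothetical helpers, and the sampling method is a standard textbook one, not necessarily what `ClaytonCopula.sample` uses.

```python
import random

def clayton_sample(theta, dim, n, seed=None):
    """Sample n points from a dim-dimensional Clayton copula (theta > 0)."""
    if theta <= 0:
        raise ValueError("this sampler requires theta > 0")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Shared frailty V ~ Gamma(shape=1/theta, scale=1) couples all coordinates.
        v = rng.gammavariate(1.0 / theta, 1.0)
        # Each coordinate applies the Clayton generator (1 + t)^(-1/theta) to E_i / V.
        rows.append([(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
                     for _ in range(dim)])
    return rows

def joint_tail_rate(rows, q, upper=False):
    """Estimate P(second coordinate in the q-tail | first coordinate in the q-tail)."""
    def in_tail(value):
        return value > 1.0 - q if upper else value < q
    first = [r for r in rows if in_tail(r[0])]
    if not first:
        return float("nan")
    return sum(in_tail(r[1]) for r in first) / len(first)

u = clayton_sample(theta=2.0, dim=3, n=20000, seed=0)
print(joint_tail_rate(u, 0.05))              # lower tail: coordinates often crash together
print(joint_tail_rate(u, 0.05, upper=True))  # upper tail: noticeably weaker co-movement
```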
Frank Copula¶
The Frank copula has no tail dependence but can model both positive and negative dependence:
from superstore import FrankCopula
# Positive dependence
copula = FrankCopula(theta=5.0, dim=2)
# Negative dependence
copula = FrankCopula(theta=-5.0, dim=2)
u = copula.sample(n=1000)
Properties:
Symmetric dependence
No tail dependence
Can model negative dependence (theta < 0), but only in the bivariate case (dim=2); theta = 0 corresponds to independence
Good for weak dependencies
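As an illustration of how the sign of theta controls the direction of dependence, the sketch below samples a bivariate Frank copula by conditional inversion: draw `u` and an auxiliary uniform `w`, then solve the conditional CDF of `v` given `u` for `w`. `frank_sample_bivariate` is a hypothetical helper and is not taken from the superstore source.

```python
import math
import random

def frank_sample_bivariate(theta, n, seed=None):
    """Sample n pairs from a bivariate Frank copula (theta != 0)."""
    if theta == 0:
        raise ValueError("theta = 0 is the independence limit; draw independent uniforms instead")
    rng = random.Random(seed)
    g = math.expm1(-theta)  # e^(-theta) - 1
    pairs = []
    for _ in range(n):
        u = rng.random()
        w = rng.random()
        a = math.exp(-theta * u)
        # Invert the conditional CDF dC/du(u, v) = w to obtain v.
        v = -math.log1p(w * g / (w + (1.0 - w) * a)) / theta
        pairs.append((u, v))
    return pairs

positive = frank_sample_bivariate(theta=5.0, n=5000, seed=0)
negative = frank_sample_bivariate(theta=-5.0, n=5000, seed=0)
```

With `theta=5.0` large values of `u` tend to come with large values of `v`; with `theta=-5.0` the relationship reverses.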
Gumbel Copula¶
The Gumbel copula has upper tail dependence - extreme high values are more likely to occur together. Useful for:
Flood modeling (extreme rainfall)
Insurance (extreme losses)
Finance (market bubbles)
from superstore import GumbelCopula
# Create a Gumbel copula with theta=3.0
# theta >= 1, higher = stronger dependence
copula = GumbelCopula(theta=3.0, dim=2)
u = copula.sample(n=1000)
Properties:
Asymmetric dependence
Upper tail dependence (booms happen together)
No lower tail dependence
theta >= 1 (theta = 1 gives independence)
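A Gumbel copula can be sampled with the same frailty idea as other Archimedean copulas, except that the shared variable follows a positive stable distribution with index 1/theta. The sketch below uses Kanter's representation for that draw. Both `positive_stable` and `gumbel_sample` are hypothetical helpers under these assumptions and do not describe the internals of `GumbelCopula.sample`.

```python
import math
import random

def positive_stable(alpha, rng):
    """Draw S > 0 with Laplace transform E[exp(-t*S)] = exp(-t**alpha), 0 < alpha <= 1."""
    if alpha == 1.0:
        return 1.0  # degenerate case: S is the constant 1
    while True:
        angle = math.pi * rng.random()
        w = rng.expovariate(1.0)
        if angle > 0.0 and w > 0.0:  # keep both draws strictly positive
            break
    # Kanter's representation of a positive stable variable.
    return (math.sin(alpha * angle) / math.sin(angle) ** (1.0 / alpha)
            * (math.sin((1.0 - alpha) * angle) / w) ** ((1.0 - alpha) / alpha))

def gumbel_sample(theta, dim, n, seed=None):
    """Sample n points from a dim-dimensional Gumbel copula (theta >= 1)."""
    if theta < 1:
        raise ValueError("theta must be >= 1")
    rng = random.Random(seed)
    alpha = 1.0 / theta
    rows = []
    for _ in range(n):
        v = positive_stable(alpha, rng)
        # Apply the Gumbel generator exp(-t**(1/theta)) to E_i / V for each coordinate.
        rows.append([math.exp(-((rng.expovariate(1.0) / v) ** alpha))
                     for _ in range(dim)])
    return rows

u = gumbel_sample(theta=3.0, dim=2, n=1000, seed=0)
print(len(u), len(u[0]))  # 1000 2
```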
Choosing a Copula¶
| Copula | Lower Tail | Upper Tail | Use Case |
|---|---|---|---|
| Gaussian | No | No | General correlation |
| Clayton | Yes | No | Joint crashes, defaults |
| Frank | No | No | Weak/negative dependence |
| Gumbel | No | Yes | Joint extremes (high) |
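The Yes/No entries correspond to whether the bivariate tail dependence coefficient is positive. For the four families above the standard closed forms are 0 for Gaussian (with correlation strictly between -1 and 1) and Frank, `2**(-1/theta)` in the lower tail for Clayton, and `2 - 2**(1/theta)` in the upper tail for Gumbel. The helper below is a hypothetical convenience function that evaluates these formulas; it is not part of the superstore API.

```python
def tail_dependence(family, theta=None):
    """Return (lower, upper) tail dependence coefficients for a bivariate copula."""
    family = family.lower()
    if family in ("gaussian", "frank"):
        # Gaussian: assumes |correlation| < 1. Frank: zero for every theta.
        return 0.0, 0.0
    if theta is None:
        raise ValueError("theta is required for Clayton and Gumbel")
    if family == "clayton":
        if theta <= 0:
            raise ValueError("Clayton formula assumes theta > 0")
        return 2.0 ** (-1.0 / theta), 0.0
    if family == "gumbel":
        if theta < 1:
            raise ValueError("Gumbel requires theta >= 1")
        return 0.0, 2.0 - 2.0 ** (1.0 / theta)
    raise ValueError(f"unknown family: {family}")

for name, theta in [("gaussian", None), ("clayton", 2.0), ("frank", 5.0), ("gumbel", 3.0)]:
    lower, upper = tail_dependence(name, theta)
    print(f"{name:8s} lower={lower:.3f} upper={upper:.3f}")
```

A coefficient near 1 means that an extreme value in one variable is almost always accompanied by an extreme value in the other; a Gumbel copula with `theta=1` returns 0 in both tails, consistent with independence.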
Combining Copulas with Marginals¶
Copula samples have Uniform(0,1) marginals. To obtain other distributions, apply the inverse CDF (`ppf`) of each target distribution to the corresponding column:
from superstore import GaussianCopula
from scipy.stats import norm, lognorm
# Generate correlated uniforms
copula = GaussianCopula(correlation=[[1.0, 0.8], [0.8, 1.0]])
u = copula.sample(n=10000)
# Transform to different marginal distributions
x = norm.ppf(u[:, 0], loc=0, scale=1) # Standard normal
y = lognorm.ppf(u[:, 1], s=0.5, scale=100) # Log-normal
# x and y keep the copula's rank dependence but have different marginals;
# their Pearson correlation will generally differ from 0.8
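One consequence worth seeing in numbers: inverse-CDF transforms are monotone, so they leave rank correlation (Spearman) unchanged, while the Pearson correlation depends on the marginals. The standard-library sketch below reproduces the normal and log-normal transforms without superstore or scipy. The helpers `ranks`, `pearson`, and `spearman` are hypothetical, and the correlated uniforms are generated by hand from a bivariate normal.

```python
import math
import random
from statistics import NormalDist

def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

rng = random.Random(0)
std_normal = NormalDist()
rho = 0.8
u0, u1 = [], []
for _ in range(10000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    u0.append(std_normal.cdf(z1))  # correlated uniforms from a Gaussian copula
    u1.append(std_normal.cdf(z2))

# Inverse-CDF transforms to the target marginals.
x = [std_normal.inv_cdf(u) for u in u0]                          # standard normal
y = [100.0 * math.exp(0.5 * std_normal.inv_cdf(u)) for u in u1]  # log-normal, s=0.5, scale=100

print(spearman(u0, u1), spearman(x, y))  # identical: ranks are unchanged by the transforms
print(pearson(x, y))                     # typically somewhat below rho = 0.8
```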
Examples¶
Credit Risk Defaults¶
from superstore import ClaytonCopula
# Strong lower tail dependence for joint defaults
copula = ClaytonCopula(theta=3.0, dim=5)
u = copula.sample(n=10000)
# Transform to default indicators
default_threshold = 0.03 # 3% marginal default probability per entity
defaults = u < default_threshold # Boolean array, True where an entity defaults
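To see why the choice of copula matters here, the sketch below compares the share of scenarios with at least two simultaneous defaults under a Clayton copula against fully independent defaults, keeping the same 3% marginal probability. It is a standalone illustration using only the standard library; `clayton_sample` and `share_with_multiple_defaults` are hypothetical helpers, and the sampler is the textbook Gamma frailty construction rather than the superstore implementation.

```python
import random

def clayton_sample(theta, dim, n, rng):
    """Sample n points from a dim-dimensional Clayton copula (theta > 0)."""
    rows = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)  # shared frailty
        rows.append([(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
                     for _ in range(dim)])
    return rows

def share_with_multiple_defaults(samples, threshold, at_least=2):
    """Fraction of scenarios in which at least `at_least` entities default."""
    hits = sum(1 for row in samples
               if sum(u < threshold for u in row) >= at_least)
    return hits / len(samples)

rng = random.Random(42)
n, dim, threshold = 20000, 5, 0.03
dependent = clayton_sample(3.0, dim, n, rng)
independent = [[rng.random() for _ in range(dim)] for _ in range(n)]

print(share_with_multiple_defaults(dependent, threshold))    # clustered defaults
print(share_with_multiple_defaults(independent, threshold))  # much rarer under independence
```

Each entity still defaults about 3% of the time in both cases; only the clustering of defaults changes.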
Insurance Claims¶
from superstore import GumbelCopula
from scipy.stats import pareto
# Upper tail dependence for extreme claims
copula = GumbelCopula(theta=2.5, dim=3)
u = copula.sample(n=5000)
# Transform to Pareto claims
claims = pareto.ppf(u, b=2.0, scale=10000)
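For readers without scipy, the transform in the last line has a simple closed form. With shape `b` and minimum value `scale` (and scipy's default `loc=0`), the Pareto inverse CDF is `scale * (1 - u) ** (-1 / b)`. The sketch below applies it to a few hypothetical uniform values; `pareto_ppf` is an illustrative helper, not a superstore function.

```python
def pareto_ppf(u, b, scale=1.0):
    """Inverse CDF of a Pareto distribution with shape b and minimum value `scale`."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return scale * (1.0 - u) ** (-1.0 / b)

# Hypothetical uniforms for one scenario across three lines of business.
u_row = [0.50, 0.90, 0.99]
claims_row = [pareto_ppf(u, b=2.0, scale=10000) for u in u_row]
print([round(c) for c in claims_row])  # [14142, 31623, 100000]
```

Note that a Pareto distribution with `b=2.0` has infinite variance, so sample averages and Pearson correlations of the resulting claims converge slowly; rank-based measures are more stable for checking the dependence.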
API Reference¶
See the full API Reference for all copula classes.