
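CAPM doesn't explain enough?
Add a size factor and a value factor: the Fama-French three-factor model lifts R² from about 0.3 to as high as 0.7.
Each extra factor buys some explanatory power. But are more factors always better? No.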
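| Factor | Meaning | Investment intuition |
|---|---|---|
| MKT | Market excess return | Effect of broad market moves |
| SMB | Small cap minus big cap | Small-cap premium |
| HML | High minus low book-to-market | Value premium |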
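To make the "minus" in SMB and HML concrete: in the standard Fama-French construction, both factors are built from six portfolios formed by sorting stocks on size and book-to-market. The sketch below assumes those six portfolio returns for one period are already available. The struct and field names are hypothetical and chosen only for illustration.

```rust
/// One period's returns on the six size/book-to-market portfolios.
/// Field names are hypothetical: s/b = small/big, v/n/g = value/neutral/growth.
struct SixPortfolioReturns {
    sv: f64,
    sn: f64,
    sg: f64,
    bv: f64,
    bn: f64,
    bg: f64,
}

/// SMB: average of the three small portfolios minus average of the three big ones.
fn smb(p: &SixPortfolioReturns) -> f64 {
    (p.sv + p.sn + p.sg) / 3.0 - (p.bv + p.bn + p.bg) / 3.0
}

/// HML: average of the two value portfolios minus average of the two growth ones.
fn hml(p: &SixPortfolioReturns) -> f64 {
    (p.sv + p.bv) / 2.0 - (p.sg + p.bg) / 2.0
}
```

Both factors are long-short returns, so they can be fed into a regression directly alongside the market excess return.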
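Why three factors?
CAPM alone explains only about 30% of the variance in returns. Adding SMB and HML raises that to 60-70%.
The remaining 30% or so? It may come from momentum, quality, low volatility and other factors, which later articles will cover.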
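Written out, the three-factor regression takes the usual time-series form below. The subscript notation is an assumption made here for clarity: $i$ indexes the stock and $t$ the period.

$$
R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,\mathrm{MKT}}\,\mathrm{MKT}_t + \beta_{i,\mathrm{SMB}}\,\mathrm{SMB}_t + \beta_{i,\mathrm{HML}}\,\mathrm{HML}_t + \varepsilon_{i,t}
$$

Stacking the observations into $y$ and the design matrix $X = [1, \mathrm{MKT}, \mathrm{SMB}, \mathrm{HML}]$, the OLS estimate is $\hat{\beta} = (X'X)^{-1} X'y$, and $R^2 = 1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$.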
```rust
/// Multiple OLS regression via the normal equations.
/// `x_matrix` holds one `Vec<f64>` per regressor, each of length `y.len()`.
fn multiple_ols_regression(y: &[f64], x_matrix: &[Vec<f64>]) -> MultiRegressionResult {
    let n = y.len();
    let k = x_matrix.len(); // number of regressors (excluding the intercept)

    // Build the design matrix X = [1, x1, x2, ...], one row per observation
    let x_augmented: Vec<Vec<f64>> = (0..n)
        .map(|i| {
            let mut row = vec![1.0]; // intercept
            for x_col in x_matrix {
                row.push(x_col[i]);
            }
            row
        })
        .collect();

    // X'X
    let p = k + 1; // number of estimated parameters (including the intercept)
    let mut xtx = vec![vec![0.0; p]; p];
    for i in 0..p {
        for j in 0..p {
            xtx[i][j] = (0..n)
                .map(|obs| x_augmented[obs][i] * x_augmented[obs][j])
                .sum::<f64>();
        }
    }

    // X'y
    let mut xty = vec![0.0; p];
    for i in 0..p {
        xty[i] = (0..n)
            .map(|obs| x_augmented[obs][i] * y[obs])
            .sum::<f64>();
    }

    // Solve (X'X) b = X'y by Gaussian elimination
    let coefficients = solve_linear_system(&xtx, &xty);

    // Fitted values and residuals
    let predicted: Vec<f64> = (0..n)
        .map(|obs| {
            x_augmented[obs]
                .iter()
                .zip(coefficients.iter())
                .map(|(x, b)| x * b)
                .sum::<f64>()
        })
        .collect();
    let residuals: Vec<f64> = y
        .iter()
        .zip(predicted.iter())
        .map(|(yi, yh)| yi - yh)
        .collect();

    // R²
    let y_mean = y.iter().sum::<f64>() / n as f64;
    let ss_total: f64 = y.iter().map(|yi| (yi - y_mean).powi(2)).sum();
    let ss_residual: f64 = residuals.iter().map(|r| r.powi(2)).sum();
    let r_squared = 1.0 - ss_residual / ss_total;

    // Adjusted R² (requires n > p)
    let adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) as f64 / (n - p) as f64;

    MultiRegressionResult {
        coefficients,
        r_squared,
        adj_r_squared,
        residuals,
        n,
        p,
    }
}

/// Solve a linear system by Gaussian elimination (no pivoting).
/// X'X is positive definite when the regressors are linearly independent,
/// so every pivot is nonzero; perfectly collinear factors produce a zero pivot.
fn solve_linear_system(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();

    // Augmented matrix [A | b]
    let mut aug = vec![vec![0.0; n + 1]; n];
    for i in 0..n {
        for j in 0..n {
            aug[i][j] = a[i][j];
        }
        aug[i][n] = b[i];
    }

    // Forward elimination
    for col in 0..n {
        for row in col + 1..n {
            let factor = aug[row][col] / aug[col][col];
            for j in col..=n {
                aug[row][j] -= factor * aug[col][j];
            }
        }
    }

    // Back substitution
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        x[i] = aug[i][n];
        for j in i + 1..n {
            x[i] -= aug[i][j] * x[j];
        }
        x[i] /= aug[i][i];
    }
    x
}
```
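Plain Gaussian elimination divides by each diagonal entry as it goes, so it breaks down when a pivot is zero and loses accuracy when one is tiny, which is what happens as factors become nearly collinear. One possible hardening is partial pivoting with a singularity check. The function name `solve_with_pivoting` and the `1e-12` tolerance are assumptions made for this sketch, not part of the code above.

```rust
/// Hypothetical variant of the solver with partial pivoting.
/// Returns `None` when the best available pivot is numerically zero.
fn solve_with_pivoting(a: &[Vec<f64>], b: &[f64]) -> Option<Vec<f64>> {
    let n = b.len();

    // Augmented matrix [A | b]
    let mut aug: Vec<Vec<f64>> = a
        .iter()
        .zip(b.iter())
        .map(|(row, &bi)| {
            let mut r = row.clone();
            r.push(bi);
            r
        })
        .collect();

    for col in 0..n {
        // Choose the row with the largest absolute entry in this column
        let mut pivot_row = col;
        for row in col + 1..n {
            if aug[row][col].abs() > aug[pivot_row][col].abs() {
                pivot_row = row;
            }
        }
        // Assumed absolute tolerance; a scaled tolerance may suit real data better
        if aug[pivot_row][col].abs() < 1e-12 {
            return None;
        }
        aug.swap(col, pivot_row);

        // Eliminate the entries below the pivot
        for row in col + 1..n {
            let factor = aug[row][col] / aug[col][col];
            for j in col..=n {
                aug[row][j] -= factor * aug[col][j];
            }
        }
    }

    // Back substitution
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let mut sum = aug[i][n];
        for j in i + 1..n {
            sum -= aug[i][j] * x[j];
        }
        x[i] = sum / aug[i][i];
    }
    Some(x)
}
```

Returning `Option` lets the caller treat a singular system, such as one with two identical factors, as a data problem instead of silently producing NaN coefficients.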
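```rust
/// Result of a multiple OLS regression.
struct MultiRegressionResult {
    pub coefficients: Vec<f64>, // [intercept, beta_1, ..., beta_k]
    pub r_squared: f64,
    pub adj_r_squared: f64,
    pub residuals: Vec<f64>,
    pub n: usize, // number of observations
    pub p: usize, // number of parameters (including the intercept)
}
```

```rust
fn fama_french_regression(
    stock_excess_returns: &[f64],
    mkt: &[f64],
    smb: &[f64],
    hml: &[f64],
) -> Result<(), Box<dyn std::error::Error>> {
    let x_matrix = vec![mkt.to_vec(), smb.to_vec(), hml.to_vec()];
    let result = multiple_ols_regression(stock_excess_returns, &x_matrix);

    println!("=== Fama-French three-factor regression ===");
    println!("Alpha (α): {:.6}", result.coefficients[0]);
    println!("Beta_MKT: {:.4}", result.coefficients[1]);
    println!("Beta_SMB: {:.4}", result.coefficients[2]);
    println!("Beta_HML: {:.4}", result.coefficients[3]);
    println!("R²: {:.4}", result.r_squared);
    println!("Adj. R²: {:.4}", result.adj_r_squared);

    // Interpret the factor exposures (loadings beyond ±0.3 count as a tilt)
    println!("\n--- Factor exposure ---");
    if result.coefficients[2] > 0.3 {
        println!("Small-cap exposure: tilts toward small caps");
    } else if result.coefficients[2] < -0.3 {
        println!("Large-cap exposure: tilts toward large caps");
    }
    if result.coefficients[3] > 0.3 {
        println!("Value exposure: tilts toward value stocks");
    } else if result.coefficients[3] < -0.3 {
        println!("Growth exposure: tilts toward growth stocks");
    }
    Ok(())
}
```

Output:

```
=== Fama-French three-factor regression ===
Alpha (α): 0.001234
Beta_MKT: 1.1234
Beta_SMB: 0.3456
Beta_HML: -0.2134
R²: 0.6834
Adj. R²: 0.6812

--- Factor exposure ---
Small-cap exposure: tilts toward small caps
```

The HML loading of -0.2134 leans toward growth but stays inside the ±0.3 band, so no value or growth line is printed.

Correlation among the factors inflates the standard errors of the estimated betas. The variance inflation factor (VIF) detects this collinearity:

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
$$

where $R_j^2$ is the R² from regressing the j-th regressor on all the other regressors.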
```rust
fn variance_inflation_factor(x_matrix: &[Vec<f64>]) -> Vec<f64> {
    let k = x_matrix.len();
    (0..k)
        .map(|j| {
            // Regress factor j on all the other factors
            let y = &x_matrix[j];
            let other_x: Vec<Vec<f64>> = x_matrix
                .iter()
                .enumerate()
                .filter(|(i, _)| *i != j)
                .map(|(_, col)| col.clone())
                .collect();
            let result = multiple_ols_regression(y, &other_x);
            1.0 / (1.0 - result.r_squared)
        })
        .collect()
}
```

Rule of thumb:
| VIF | Collinearity |
|---|---|
| < 5 | No problem |
| 5-10 | Moderate collinearity |
| > 10 | Severe collinearity |
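To get a feel for these thresholds, consider the special case of exactly two regressors. Regressing one on the other gives an R² equal to the squared Pearson correlation ρ², so both share the same VIF of 1 / (1 - ρ²). The sketch below covers only that two-regressor case, and the function names are hypothetical.

```rust
/// Pearson correlation between two equally long series.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let vx: f64 = x.iter().map(|a| (a - mx).powi(2)).sum();
    let vy: f64 = y.iter().map(|b| (b - my).powi(2)).sum();
    cov / (vx * vy).sqrt()
}

/// VIF shared by both regressors in a two-regressor model: 1 / (1 - rho^2).
fn vif_two_factors(x1: &[f64], x2: &[f64]) -> f64 {
    let rho = pearson(x1, x2);
    1.0 / (1.0 - rho * rho)
}
```

In this two-regressor setting, a VIF of 5 corresponds to |ρ| ≈ 0.89 and a VIF of 10 to |ρ| ≈ 0.95, so the table only flags factors that move almost in lockstep.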
Adding factors never lowers R², but the gain may be overfitting. AIC and BIC both penalize the number of parameters.
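In the least-squares form, dropping additive constants that are identical across models fitted to the same data, the two criteria can be written as:

$$
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k, \qquad \mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n
$$

Here $n$ is the number of observations, $\mathrm{RSS}$ the residual sum of squares, and $k$ the number of estimated coefficients, taken in this sketch to include the intercept. Lower values are better for both.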
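```rust
/// AIC and BIC for a least-squares fit, up to an additive constant.
/// `k` is the number of estimated parameters (including the intercept).
fn model_selection_criteria(
    residuals: &[f64],
    n: usize,
    k: usize,
) -> (f64, f64) {
    let rss: f64 = residuals.iter().map(|r| r.powi(2)).sum();
    let aic = n as f64 * (rss / n as f64).ln() + 2.0 * k as f64;
    let bic = n as f64 * (rss / n as f64).ln() + k as f64 * (n as f64).ln();
    (aic, bic)
}
```

BIC is stricter than AIC: its per-parameter penalty ln(n) exceeds AIC's 2 once n ≥ 8. With a large sample, BIC therefore tends to select the more parsimonious model.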
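How much stricter depends on the sample size. The sketch below isolates the two penalty terms so they can be compared directly. The helper names are hypothetical, and the fit term n·ln(RSS/n) is left out because both criteria share it.

```rust
/// AIC penalty for k estimated parameters.
fn aic_penalty(k: usize) -> f64 {
    2.0 * k as f64
}

/// BIC penalty for k estimated parameters and n observations.
fn bic_penalty(n: usize, k: usize) -> f64 {
    k as f64 * (n as f64).ln()
}

/// Extra penalty BIC charges over AIC; positive means BIC is stricter.
fn bic_minus_aic(n: usize, k: usize) -> f64 {
    bic_penalty(n, k) - aic_penalty(k)
}
```

For a four-parameter model such as FF3 with an intercept, `bic_minus_aic(1000, 4)` is about 19.6, so an extra factor has to cut the residual variance by noticeably more to win under BIC than under AIC.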
```rust
fn compare_models(
    stock_returns: &[f64],
    mkt: &[f64],
    smb: &[f64],
    hml: &[f64],
) -> Result<(), Box<dyn std::error::Error>> {
    // CAPM (1 factor)
    let capm = multiple_ols_regression(stock_returns, &[mkt.to_vec()]);
    let (capm_aic, capm_bic) = model_selection_criteria(
        &capm.residuals, capm.n, capm.p);

    // FF3 (3 factors)
    let ff3 = multiple_ols_regression(stock_returns,
        &[mkt.to_vec(), smb.to_vec(), hml.to_vec()]);
    let (ff3_aic, ff3_bic) = model_selection_criteria(
        &ff3.residuals, ff3.n, ff3.p);

    println!("=== Model selection ===");
    println!("{:<10} {:>8} {:>8} {:>8} {:>8}",
        "Model", "R²", "Adj R²", "AIC", "BIC");
    println!("{:<10} {:>8.4} {:>8.4} {:>8.1} {:>8.1}",
        "CAPM", capm.r_squared, capm.adj_r_squared, capm_aic, capm_bic);
    println!("{:<10} {:>8.4} {:>8.4} {:>8.1} {:>8.1}",
        "FF3", ff3.r_squared, ff3.adj_r_squared, ff3_aic, ff3_bic);
    Ok(())
}
```

Output:

```
=== Model selection ===
Model            R²   Adj R²      AIC      BIC
CAPM         0.3245   0.3234  -4523.2  -4512.8
FF3          0.6834   0.6812  -5678.4  -5657.6
```

FF3 has the lower AIC and the lower BIC, so it is the better model. Both criteria already charge FF3 for its two extra parameters, so the jump in R² is not just overfitting.
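Are more factors always better? No. A good model is one that is parsimonious and still explains a lot.
Next stop: factor selection and rolling regressions. Strong past performance is one thing, but will it hold up in the future?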