Machine Learning for Economists
ECON 128
- Economists often need two things from a model in order to design effective policy: prediction and causation
- Many computer science algorithms can handle prediction, and some machine learning ones can handle causation
- Double machine learning handles both but is limited
- Trade-offs
- Prediction Accuracy vs. Intuitive Interpretation
- Linear models are simple and easy to interpret; more complex models are harder to interpret but often more accurate
- Good fit vs. Over/Under Fit
- An overfit model predicts new data poorly because it has fit the noise in the training data; an underfit model does not fit even the training data well
- Simplicity (Parsimony) vs. Black-box
- A model with fewer variables can be better than one that uses every available variable
- Bias vs. Variance
- More complicated models tend to have lower bias but higher variance (i.e., they are more sensitive to noise in the data)
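The fit and bias/variance trade-offs above can be seen in a small simulation. This is a minimal sketch, not part of the course material: the sine-shaped data-generating process, the noise level, the sample sizes, and the polynomial degrees are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data-generating process: smooth signal plus noise.
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def train_test_mse(degree):
    # Fit a polynomial of the given degree by least squares on the training set.
    coefs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train, test

# Degree 1 is a simple model, degree 10 a very flexible one.
results = {d: train_test_mse(d) for d in (1, 3, 10)}
for d, (train, test) in results.items():
    print(f"degree={d:2d}  train MSE={train:.3f}  test MSE={test:.3f}")
```

Training error can only fall as the degree grows, because each polynomial nests the simpler ones. Comparing it with the test error for each degree shows whether the extra flexibility is picking up signal or noise.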
Feature Reduction Techniques
Subset Selection
- Given p predictors, for each k with 1 <= k <= p, fit all $\binom{p}{k}$ models that contain exactly k predictors
- Pick the best model amongst these and call it $M_k$
- Pick the best model from the set of per-size winners $M_1, M_2, …, M_p$
- Inefficient; fitting every subset means $2^p - 1$ models in total, so computation grows quickly with p
- Since the search space is so large, the winning model might look best only due to chance
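The procedure above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the notes do not say how "best" is measured, so the code uses residual sum of squares (RSS) within each size k and BIC across sizes. The helper names `rss`, `best_subset`, and `bic`, and the simulated data, are hypothetical.

```python
from itertools import combinations
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def best_subset(X, y):
    """Return {k: column indices of M_k}, the lowest-RSS model of each size k."""
    p = X.shape[1]
    return {
        k: min(combinations(range(p), k), key=lambda cols: rss(X, y, cols))
        for k in range(1, p + 1)
    }

def bic(X, y, cols):
    # Assumed selection criterion across sizes: Gaussian BIC up to a constant.
    n = len(y)
    return n * np.log(rss(X, y, cols) / n) + (len(cols) + 1) * np.log(n)

# Simulated example: only columns 0 and 2 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)

best = best_subset(X, y)                                   # M_1, ..., M_p
chosen = min(best.values(), key=lambda cols: bic(X, y, cols))
print(best, chosen)
```

Comparing the $M_k$ by RSS alone would always favor the largest model, which is why the final step needs a criterion that charges for model size (BIC here, though cross-validation would serve the same purpose).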
Forward Stepwise Selection
- Start from $M_0$, the model with no predictors; for k = 0, …, p - 1, fit every model that adds one more predictor to $M_k$
- Choose the best out of these new models; this becomes $M_{k+1}$
- After iterating and creating $M_0$ to $M_p$, choose the best model out of these
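The greedy search described above might look like the following. It is a minimal sketch, assuming that "best" at each step means lowest residual sum of squares; the helper names `rss` and `forward_stepwise` and the simulated data are hypothetical.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(X, y):
    """Return the path [M_0, M_1, ..., M_p] as tuples of column indices."""
    p = X.shape[1]
    selected = []
    path = [tuple(selected)]            # M_0: intercept only
    for _ in range(p):
        remaining = [j for j in range(p) if j not in selected]
        # Try adding each remaining predictor and keep the one with lowest RSS.
        best_j = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best_j)
        path.append(tuple(selected))    # M_{k+1}
    return path

# Simulated example: only columns 0 and 2 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)

path = forward_stepwise(X, y)
print(path)
```

This search fits $1 + p(p+1)/2$ models rather than every possible subset, at the cost of never revisiting a predictor once it has been added. Choosing among $M_0, …, M_p$ still requires a criterion that accounts for model size, which this sketch leaves out.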
Shrinkage
- Regularization: A strategy for building models where the cost function increases in value when more features are used
- Strategies take the form of minimizing $\frac{1}{n} dev(\beta) + \lambda \sum_j c(\beta_j)$, where $\lambda \ge 0$ sets the strength of the penalty
- $dev$ is the deviance; for a linear model it is the sum of squared differences between predictions and actual data
- $c$ is the cost function that adds a penalty for nonzero coefficients
- OLS is the special case that does not include the cost function $c$ (equivalently, $\lambda = 0$)
- Types of cost functions
- Ridge: Places big penalties on large coefficients and smaller penalties on smaller ones ($\beta ^2$)
- Lasso: Penalty is equal to absolute value of coefficient ($\vert \beta \vert$)
- Elastic net: A linear combination of ridge and lasso
- Log penalty: Penalizes changes from 0 to small values heavily but not as much for changes in big coefficients ($\log(1 + \vert \beta \vert)$)
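The penalties listed above can be written out directly and compared. This is a minimal sketch under stated assumptions: the elastic net mixing weight `alpha`, the helper names, the no-intercept linear model, and the simulated data are all hypothetical, and the ridge solution uses the closed form that follows from setting the gradient of the penalized objective to zero.

```python
import numpy as np

def ridge(b):
    return b ** 2

def lasso(b):
    return np.abs(b)

def elastic_net(b, alpha=0.5):
    # alpha is an assumed mixing weight between the lasso and ridge terms.
    return alpha * np.abs(b) + (1 - alpha) * b ** 2

def log_penalty(b):
    return np.log1p(np.abs(b))

def marginal(cost, start, step=0.1):
    """Extra penalty incurred by moving a coefficient from start to start + step."""
    return cost(start + step) - cost(start)

def objective(beta, X, y, lam, cost):
    """(1/n) * sum of squared residuals + lam * sum_j cost(beta_j)."""
    resid = y - X @ beta
    return resid @ resid / len(y) + lam * np.sum(cost(beta))

# Simulated linear model without an intercept (an assumption for brevity).
rng = np.random.default_rng(0)
n, lam = 100, 0.5
X = rng.normal(size=(n, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# Ridge minimizer of the objective above: (X'X + n*lam*I) beta = X'y.
beta_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(3), X.T @ y)

for name, cost in [("ridge", ridge), ("lasso", lasso), ("log", log_penalty)]:
    print(name, marginal(cost, 0.0), marginal(cost, 5.0))
print(beta_ols, beta_ridge)
```

The marginal costs make the verbal descriptions concrete: ridge charges far more for growing a large coefficient than a small one, lasso charges the same either way, and the log penalty charges most for the first move away from zero.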
Time Series Analysis
- Types of dynamic models
- $y_t = f(x_t, x_{t-1}, x_{t-2}, …)$
- $y_t = f(y_{t-1}, x_t)$
- $y_t = f(x_t) + e_t$, where $e_t = g(e_{t-1})$
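To make the three model types above concrete, each can be simulated with a linear choice of the unspecified functions. This is a minimal sketch: the linear forms, the coefficient values, the lag lengths, and the added random shock `u` are illustrative assumptions, not something the notes specify.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)               # exogenous regressor
u = rng.normal(scale=0.5, size=T)    # assumed random shock

# 1. y depends on current and past values of x (two lags assumed here).
y_dl = np.full(T, np.nan)            # first two periods lack enough history
for t in range(2, T):
    y_dl[t] = 1.0 * x[t] + 0.5 * x[t - 1] + 0.25 * x[t - 2] + u[t]

# 2. y depends on its own previous value and on current x.
y_ldv = np.zeros(T)
for t in range(1, T):
    y_ldv[t] = 0.7 * y_ldv[t - 1] + 1.0 * x[t] + u[t]

# 3. y depends on current x, with an error term that depends on its own past.
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + u[t]
y_ar = 1.0 * x + e
```

In the first and third cases the dynamics come from the regressors or the errors, while in the second they come from $y$ itself, so a one-time change in $x_t$ keeps affecting later values of $y$ through $y_{t-1}$.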
Types of Variables
- Stationary
- Nonstationary
- Trending