• Economists often require two parts to a model in order to make effective policies: prediction and causation
    • Many computer science algorithms can handle prediction, and some machine learning ones can handle causation
    • Double machine learning handles both but is limited
  • Trade offs
    • Prediction Accuracy vs. Intuitive Interpretation
      • Linear models are simple and easy to interpret; more complex models are harder to interpret but often more accurate
    • Good fit vs. Over/Under Fit
      • Overfitting is bad because the model won’t predict new data accurately; underfitting is bad because the model doesn’t even fit the training data accurately
    • Simplicity (Parsimony) vs. Black-box
      • A model with fewer, carefully chosen variables can outperform one that uses all available variables
    • Bias vs. Variance
      • More complicated models will have less bias but more variance (i.e., be more susceptible to noisy data); the sketch below illustrates this and the over/underfitting trade-off
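
A quick way to see the last two trade-offs is to fit polynomials of increasing degree to noisy data and compare training error against held-out error. This is a minimal sketch using scikit-learn on synthetic data; the dataset, degrees, and seed are illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=100)
x_tr, y_tr, x_te, y_te = x[:60], y[:60], x[60:], y[60:]

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_tr, model.predict(x_tr)):.3f}, "
          f"test MSE {mean_squared_error(y_te, model.predict(x_te)):.3f}")
# Typical pattern: degree 1 underfits (high bias, both errors high);
# degree 15 overfits (low bias, high variance: tiny training error but
# much larger test error); degree 3 sits in between.
```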

Feature Reduction Techniques

Subset Selection

  • Given p predictors and $1 \le k \le p$, fit all $\binom{p}{k}$ models that contain exactly k predictors
  • Pick the best model amongst these and call it $M_k$
  • Pick the best model from the set of best models $M_1, M_2, \ldots, M_p$, e.g., by cross-validation, AIC, or BIC (a sketch of the full procedure follows this list)
  • Inefficient; requires lots of computation, especially for larger p
    • Since the search space is so large, the apparent best model might win purely by chance (the selection step itself can overfit)
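
A minimal sketch of best subset selection, assuming a linear model: within each size k the candidates are ranked by in-sample RSS, and the winners $M_1, \ldots, M_p$ are compared by adjusted $R^2$. The function name and synthetic data are illustrative.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

def best_subset(X, y):
    n, p = X.shape
    tss = np.sum((y - y.mean()) ** 2)
    best_per_k = {}  # k -> (rss, columns): the best M_k for each size k
    for k in range(1, p + 1):
        # Fit all C(p, k) models that contain exactly k predictors
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            fit = LinearRegression().fit(X[:, cols], y)
            rss = np.sum((y - fit.predict(X[:, cols])) ** 2)
            if k not in best_per_k or rss < best_per_k[k][0]:
                best_per_k[k] = (rss, cols)
    # Compare M_1, ..., M_p using adjusted R^2 to penalize model size
    def adj_r2(rss, k):
        return 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return max((adj_r2(rss, len(c)), c) for rss, c in best_per_k.values())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=100)
print(best_subset(X, y))  # typically recovers predictors [0, 1]
```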

Forward Stepwise Selection

  • Start from the null model $M_0$ (no predictors); for k = 0, …, p - 1, consider adding each of the p - k remaining predictors to $M_k$
  • Choose the best out of these new models; this becomes $M_{k+1}$
  • After iterating and creating $M_0$ to $M_p$, choose the best model among these (sketched below)
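
A sketch of the same greedy loop, again assuming a linear model scored by RSS; the names and data are illustrative. Unlike best subset, this fits only $1 + p(p+1)/2$ models rather than $2^p$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y):
    n, p = X.shape
    chosen, path = [], []
    remaining = list(range(p))
    for k in range(p):
        # Among the p - k remaining predictors, add the one that
        # lowers RSS the most; the result is M_{k+1}
        rss, best = min(
            (np.sum((y - LinearRegression()
                         .fit(X[:, chosen + [j]], y)
                         .predict(X[:, chosen + [j]])) ** 2), j)
            for j in remaining
        )
        chosen.append(best)
        remaining.remove(best)
        path.append((list(chosen), rss))
    return path  # compare M_1..M_p with cross-validation, AIC, or BIC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=100)
for subset, rss in forward_stepwise(X, y):
    print(subset, round(rss, 1))
```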

Shrinkage

  • Regularization: A strategy for building models where the objective grows as coefficients become larger or as more of them are nonzero
    • Strategies minimize an objective of the form $\frac{1}{n} \mathrm{dev}(\beta) + \lambda \sum_j c(\beta_j)$
    • $\mathrm{dev}$ is the deviance; for linear regression, the sum of squared differences between predictions and actual values
    • $c$ is the cost function that adds a penalty for nonzero coefficients
    • OLS uses this strategy but with no cost function $c$, i.e., the special case $\lambda = 0$
  • Types of cost functions
    • Ridge: Places big penalties on large coefficients and smaller penalties on smaller ones ($c(\beta_j) = \beta_j^2$)
    • Lasso: Penalty is equal to the absolute value of the coefficient ($c(\beta_j) = \vert \beta_j \vert$)
    • Elastic net: A linear combination of the ridge and lasso penalties
    • Log penalty: Penalizes moving a coefficient away from 0 heavily at first, but the marginal penalty flattens for large coefficients ($c(\beta_j) = \log(1 + \vert \beta_j \vert)$); the first three are sketched below
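
The first three penalties are available off the shelf in scikit-learn (the log penalty is not). A minimal sketch on synthetic data; scikit-learn's `alpha` plays the role of $\lambda$, and the specific values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first three coefficients are truly nonzero
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
X = StandardScaler().fit_transform(X)  # penalties are scale-sensitive

models = [
    ("ridge", Ridge(alpha=1.0)),                            # c = beta_j^2
    ("lasso", Lasso(alpha=0.1)),                            # c = |beta_j|
    ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5)),   # mix of both
]
for name, model in models:
    print(f"{name:12s}", np.round(model.fit(X, y).coef_, 2))
# Ridge shrinks the seven noise coefficients toward zero; lasso and
# elastic net typically set them exactly to zero (sparse solutions).
```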

Time Series Analysis

  • Types of dynamic models
    • Distributed lag: $y_t = f(x_t, x_{t-1}, x_{t-2}, \ldots)$
    • Lagged dependent variable: $y_t = f(y_{t-1}, x_t)$
    • Serially correlated errors: $y_t = f(x_t) + e_t$, where $e_t = f(e_{t-1})$
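
A sketch of how each form can be fit, assuming linear $f$ throughout; the simulated process and coefficient values are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T = 300
x = rng.normal(size=T)
eps = rng.normal(scale=0.3, size=T)
y = np.empty(T)
y[0] = x[0] + eps[0]
y[1:] = x[1:] + 0.5 * x[:-1] + eps[1:]  # true process: distributed lag

df = pd.DataFrame({"y": y, "x": x})
df["x_lag1"] = df["x"].shift(1)  # x_{t-1}
df["y_lag1"] = df["y"].shift(1)  # y_{t-1}
df = df.dropna()

# Distributed lag: y_t = f(x_t, x_{t-1})
dl = LinearRegression().fit(df[["x", "x_lag1"]], df["y"])
# Lagged dependent variable: y_t = f(y_{t-1}, x_t)
ar = LinearRegression().fit(df[["y_lag1", "x"]], df["y"])
# Serially correlated errors: regress y on x, then fit e_t on e_{t-1}
ols = LinearRegression().fit(df[["x"]], df["y"])
e = df["y"] - ols.predict(df[["x"]])
rho = LinearRegression().fit(e.shift(1).iloc[1:].to_frame(), e.iloc[1:])
print(np.round(dl.coef_, 2), np.round(ar.coef_, 2), round(rho.coef_[0], 2))
```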

Types of Variables

  • Stationary: mean, variance, and autocovariance do not change over time
  • Nonstationary: statistical properties drift over time (e.g., a random walk, whose variance grows without bound)
  • Trending: the mean changes systematically with time (see the simulation sketch below)
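
A simulation sketch contrasting the three types; the processes chosen (an AR(1), a random walk, and a linear trend) are standard illustrative examples.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
eps = rng.normal(size=T)

# Stationary: AR(1) with |rho| < 1, so shocks die out
stationary = np.zeros(T)
for t in range(1, T):
    stationary[t] = 0.6 * stationary[t - 1] + eps[t]

# Nonstationary: random walk, shocks accumulate forever
random_walk = np.cumsum(eps)

# Trending: deterministic time trend plus stationary noise
trending = 0.02 * np.arange(T) + eps

for name, s in (("stationary", stationary),
                ("random walk", random_walk),
                ("trending", trending)):
    a, b = s[: T // 2].mean(), s[T // 2 :].mean()
    print(f"{name:12s} first-half mean {a:6.2f} vs second-half mean {b:6.2f}")
# The stationary series should keep a roughly stable mean across the
# two halves; the random walk and the trending series typically drift.
```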