14.7 Bayesian Modeling Approaches
Bayesian modeling takes a similar but slightly different approach to prediction than the frequentist approaches described above. While many of the constraints are similar (e.g. feature selection), Bayesian modeling naturally applies shrinkage through the choice of priors. Priors encode our knowledge about how the world works before we see any data (e.g. the probability that there is a tornado is low). The posterior is then proportional to the prior multiplied by the likelihood of the data we actually observe. It represents our updated belief in light of the data.
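The move from prior to posterior can be made concrete with a small conjugate example. The sketch below assumes a Beta prior on the probability of a rare event and binomial data, in which case the posterior is again a Beta distribution. The prior parameters and the counts are hypothetical values chosen for illustration only.

```python
def beta_binomial_update(prior_a, prior_b, events, trials):
    """Return the Beta posterior parameters after observing binomial data."""
    post_a = prior_a + events
    post_b = prior_b + (trials - events)
    return post_a, post_b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief: the event is rare, roughly 1 in 100.
prior_a, prior_b = 1.0, 99.0

# Hypothetical data: 3 events in 50 observations.
events, trials = 3, 50
post_a, post_b = beta_binomial_update(prior_a, prior_b, events, trials)

prior_mean = beta_mean(prior_a, prior_b)  # belief before seeing data
post_mean = beta_mean(post_a, post_b)     # updated belief
raw_rate = events / trials                # what the data alone would say

print(f"prior mean     = {prior_mean:.4f}")
print(f"raw rate       = {raw_rate:.4f}")
print(f"posterior mean = {post_mean:.4f}")
```

The posterior mean lands between the prior mean and the raw observed rate, which is the shrinkage the prior provides.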
Priors can be made very tight around zero in a Bernoulli regression (essentially logistic regression), and horseshoe or Laplace priors can be used to mimic lasso or elastic net style regularization for feature selection. This results in natural shrinkage.
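To see why a Laplace prior behaves like a selection penalty, consider the simplest possible case: a single coefficient observed with unit-variance normal noise. This is a simplified sketch, not a full Bernoulli regression, and it does not cover the horseshoe prior. In this setting the posterior mode under a Laplace prior is the soft-thresholding rule familiar from the lasso, while a normal prior only shrinks proportionally. The function names and input values are hypothetical.

```python
def laplace_map(y, lam):
    """Posterior mode of beta when y ~ Normal(beta, 1) and beta ~ Laplace(0, 1/lam).

    This minimizes 0.5 * (y - beta)**2 + lam * abs(beta), whose solution
    is soft thresholding: small estimates are set exactly to zero.
    """
    magnitude = max(abs(y) - lam, 0.0)
    return magnitude if y >= 0 else -magnitude


def normal_map(y, prior_sd):
    """Posterior mode of beta when y ~ Normal(beta, 1) and beta ~ Normal(0, prior_sd**2).

    The estimate is shrunk proportionally but never reaches exactly zero.
    """
    return y * prior_sd**2 / (prior_sd**2 + 1.0)


weak_effect = laplace_map(0.4, lam=1.0)    # weak signal is removed entirely
strong_effect = laplace_map(3.0, lam=1.0)  # strong signal survives, shrunk by lam
ridge_like = normal_map(0.4, prior_sd=1.0) # normal prior: shrunk, but kept

print(weak_effect, strong_effect, ridge_like)
```

A full Bayesian fit would report the whole posterior rather than just its mode, so coefficients are pulled toward zero rather than set exactly to zero, but the intuition about which effects survive carries over.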
Additionally, we can include random slopes and intercepts in the form of hierarchical models. The benefit of hierarchical models is that they allow us to capture some of the heterogeneity of effects (via the random slopes and/or intercepts) and, with partial pooling, they borrow information across groups. A good example of this is illustrated here.
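Partial pooling can be sketched for the simplest case of group means with known variances. This is an assumption made for illustration: a real hierarchical model estimates the within-group and between-group standard deviations from the data. Each group's estimate is a precision-weighted blend of its own mean and the grand mean, so small groups borrow more from the rest. All names and numbers below are hypothetical.

```python
def partial_pool(group_mean, n, grand_mean, within_sd, between_sd):
    """Partially pooled estimate of a group mean under a normal-normal model.

    The weight on the group's own mean grows with its sample size n;
    the remainder of the weight goes to the grand mean.
    """
    precision_data = n / within_sd**2
    precision_prior = 1.0 / between_sd**2
    weight = precision_data / (precision_data + precision_prior)
    return weight * group_mean + (1.0 - weight) * grand_mean


grand_mean = 0.5

# Two hypothetical groups with the same observed mean but very different sizes.
small_group = partial_pool(0.9, n=4, grand_mean=grand_mean,
                           within_sd=1.0, between_sd=0.5)
large_group = partial_pool(0.9, n=400, grand_mean=grand_mean,
                           within_sd=1.0, between_sd=0.5)

print(f"small group estimate = {small_group:.3f}")
print(f"large group estimate = {large_group:.3f}")
```

The small group is pulled well toward the grand mean, while the large group stays close to its own observed mean.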
One nice feature of Bayesian modeling is that it does not require up-sampling, down-sampling, or SMOTE. The rare event problem is still a challenge, but when using hierarchical modeling techniques it is best not to apply these pre-treatment techniques. Additionally, as mentioned in the text, Bayesian modeling allows you to specify how much heterogeneity you expect (constrained via priors) without the curse of dimensionality that you would encounter in frequentist approaches.
14.7.1 LOO and Model Averaging
An additional feature of Bayesian modeling is Bayesian LOO (leave-one-out cross-validation), which is generally a good way to assess and compare how well models predict out of sample. A further nicety is model averaging, which creates synthesized models. For example, you may have one model that more accurately predicts certain kinds of students at high risk of leaving and another that predicts students with different attributes a little better. These can be combined into a larger synthetic model through Bayesian model averaging approaches. This is useful because sometimes you need different models to better reflect different populations.
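One simple way to turn LOO results into an averaged prediction is with pseudo-BMA weights, where each model is weighted in proportion to the exponential of its estimated expected log predictive density (elpd). The sketch below assumes the elpd values have already been computed elsewhere. The elpd numbers and per-model predictions are hypothetical, and this is only one of several weighting schemes (stacking is a common alternative), not necessarily the one used in the text.

```python
import math


def pseudo_bma_weights(elpd_values):
    """Weights proportional to exp(elpd), normalized to sum to one.

    Subtracting the maximum first keeps the exponentials numerically stable.
    """
    top = max(elpd_values)
    unnormalized = [math.exp(e - top) for e in elpd_values]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def average_predictions(predictions, weights):
    """Weighted average of the models' predicted probabilities."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical LOO elpd estimates for two candidate models (higher is better).
elpd_loo = [-120.0, -121.0]
weights = pseudo_bma_weights(elpd_loo)

# Hypothetical predicted probabilities that one student leaves, one per model.
risk = average_predictions([0.30, 0.10], weights)

print([round(w, 3) for w in weights])
print(round(risk, 3))
```

The averaged prediction leans toward the model with the better LOO score while still drawing on the other.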