Chapter 14 Retention Modeling Method

A large portion of the data scientist position is predictive modeling. This modeling ranges from predicting student retention to predicting student financial aid yield. What is important is to understand the nuances of the data and to recognize that pooling data over many years can introduce different biases. In my opinion, it is better to have a simple model with higher bias than a more complex model with higher variance (e.g., a model that overfits and then degrades as the data change).

In the course of modeling, both traditional machine learning techniques and Bayesian hierarchical models were used. Unsurprisingly, the Bayesian hierarchical models provided the best predictions for students who would go on to leave. An added benefit of Bayesian modeling is that the parameters and outputs are often highly interpretable.
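
To give a feel for why hierarchical models help when data are pooled across years, here is a minimal sketch of partial pooling in Python. It is not the model used in this chapter: it swaps in a simple Beta-Binomial shrinkage estimate for a full Bayesian hierarchical fit, and the cohort counts, the `partial_pool` function, and the `prior_strength` value are all hypothetical and chosen only for illustration.

```python
# Hypothetical counts per cohort year: (students enrolled, students who left).
cohorts = {
    "2015": (400, 60),
    "2016": (380, 50),
    "2017": (40, 12),  # small cohort, so its raw rate is noisy
}


def partial_pool(cohorts, prior_strength=50.0):
    """Shrink each cohort's attrition rate toward the pooled rate.

    Uses a Beta prior centered on the pooled rate and worth
    `prior_strength` pseudo-students (an assumed, untuned value).
    Returns the posterior mean attrition rate for each cohort.
    """
    total_n = sum(n for n, _ in cohorts.values())
    total_left = sum(k for _, k in cohorts.values())
    pooled = total_left / total_n

    # Beta(alpha, beta) prior with mean equal to the pooled rate.
    alpha = pooled * prior_strength
    beta = (1 - pooled) * prior_strength

    # Posterior mean of Beta(alpha + k, beta + n - k).
    return {
        name: (k + alpha) / (n + alpha + beta)
        for name, (n, k) in cohorts.items()
    }


estimates = partial_pool(cohorts)
for name, (n, k) in cohorts.items():
    print(name, "raw:", round(k / n, 3), "shrunk:", round(estimates[name], 3))
```

The small cohort is pulled strongly toward the overall rate, while the large cohorts barely move. Each estimate is also easy to read: it is a weighted blend of the cohort's own rate and the pooled rate, which is part of what makes this family of models interpretable.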