14.3 Data
It all starts with data. Good, clean data is critical to a successful model. The quantity and types of data you have also determine the scope of the models you can build.
A rule of thumb is that each predictor in a model needs roughly 100 data points. If you do not have much data, you must therefore build simpler models (e.g. 500 data points support a model with at most 5 predictors).
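The rule of thumb can be written as a one-line calculation. The sketch below is only an illustration: the function name `max_predictors` and the `rows_per_predictor` parameter are made up for this example, and the default of 100 is the rule of thumb from the text, not a hard statistical limit.

```python
def max_predictors(n_rows: int, rows_per_predictor: int = 100) -> int:
    """Rough upper bound on the number of predictors for a data set.

    Applies the rule of thumb of about 100 data points per predictor.
    Both the function name and the default are illustrative assumptions.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if rows_per_predictor <= 0:
        raise ValueError("rows_per_predictor must be positive")
    # Integer division: a partial block of rows does not earn another predictor.
    return n_rows // rows_per_predictor


print(max_predictors(500))   # 5
print(max_predictors(1250))  # 12
```

Treat the result as a ceiling to start from rather than a target; a model with fewer predictors may still be the better choice.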
Collect data that represents all of the major drivers of student success:
- Demographics
- High school performance
- Socio-economic background
- University performance
- On-campus engagement (clubs, sports, Greek, gym)

There are also many open sources of information to use. These include information about the high schools that students attended (Department of Education) and Census information. Many of these are contained in the LSDS and should be leveraged where applicable.
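As a minimal sketch of how open, school-level information might be attached to student records, the example below joins the two on a shared high school identifier. Everything in it is hypothetical: the field names (`hs_id`, `hs_enrollment`, `hs_matched`), the identifiers, and the values are invented for illustration and do not reflect the actual layout of any Department of Education or Census file.

```python
# Hypothetical student records; "hs_id" is a made-up high school identifier.
students = [
    {"student_id": 1, "hs_id": "HS-A", "hs_gpa": 3.4},
    {"student_id": 2, "hs_id": "HS-B", "hs_gpa": 2.9},
    {"student_id": 3, "hs_id": "HS-Z", "hs_gpa": 3.8},  # no school-level match
]

# Hypothetical school-level data from an open source, keyed by the same id.
high_schools = {
    "HS-A": {"hs_enrollment": 1200},
    "HS-B": {"hs_enrollment": 450},
}


def attach_school_data(students, high_schools):
    """Left-join school-level fields onto each student record."""
    enriched = []
    for student in students:
        row = dict(student)  # copy so the input records are not modified
        school = high_schools.get(student["hs_id"])
        # Flag unmatched students instead of dropping them silently.
        row["hs_matched"] = school is not None
        if school is not None:
            row.update(school)
        enriched.append(row)
    return enriched


enriched = attach_school_data(students, high_schools)
for row in enriched:
    print(row)
```

Keeping unmatched students and flagging them, rather than dropping them, makes it easier to see how much of the population the external source actually covers.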