4.17 Database Formats
After the LSDS is completely combined, the final step is outputting two data sets that can be used in a database format. Many of the operations and analyses done by the data scientist are easiest to run against database-style tables. The Building FR cohort file_4_Xlong_format.sas program created two data sets:
db_student_features - for those items that are static and captured once per student (e.g. race, gender, cohort code, hometown)
db_student_results - a long format (multiple rows per student) that contains information about the activities the student completed per term (e.g. how many IM games played in the 201880 term on one row, 201910 on another row)
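Both tables are keyed on the banner_id field, which appears once per student in db_student_features and repeats on each of a student's term rows in db_student_results. I find that this is the easiest way of manipulating this data.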
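
As a rough illustration of how the two tables relate, here is a minimal sketch in Python using an in-memory SQLite database rather than the SAS program itself. Only the table names and banner_id come from the description above. The other column names (gender, cohort_code, term, im_games), the sample rows, and the choice of (banner_id, term) as the row key for db_student_results are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for wherever the two data sets are loaded.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas: only the table names and banner_id come from the text.
conn.executescript("""
    CREATE TABLE db_student_features (
        banner_id   TEXT PRIMARY KEY,   -- one row per student
        gender      TEXT,
        cohort_code TEXT
    );
    CREATE TABLE db_student_results (
        banner_id TEXT REFERENCES db_student_features (banner_id),
        term      TEXT,                 -- e.g. 201880, 201910
        im_games  INTEGER,              -- IM games played in that term
        PRIMARY KEY (banner_id, term)   -- assumed: one row per student per term
    );
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO db_student_features VALUES (?, ?, ?)",
    [("B001", "F", "FR2018"), ("B002", "M", "FR2018")],
)
conn.executemany(
    "INSERT INTO db_student_results VALUES (?, ?, ?)",
    [("B001", "201880", 3), ("B001", "201910", 5), ("B002", "201880", 0)],
)

# Join the static features to the per-term results on banner_id and
# roll the long table up to one summary row per student.
rows = conn.execute("""
    SELECT f.banner_id,
           f.cohort_code,
           COUNT(r.term)   AS terms_recorded,
           SUM(r.im_games) AS total_im_games
    FROM db_student_features AS f
    JOIN db_student_results  AS r ON r.banner_id = f.banner_id
    GROUP BY f.banner_id, f.cohort_code
    ORDER BY f.banner_id
""").fetchall()

for row in rows:
    print(row)
```

Keeping the static attributes in one table and the per-term activity in another means a question such as "total IM games per student" becomes a single join and GROUP BY on banner_id, with no need to reshape a wide file first.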