4.17 Database Formats

After the LSDS is completely combined, the final step is to output two data sets that can be used in a database format. Many of the operations and analyses a data scientist performs are best done in a database format. The Building FR cohort file_4_Xlong_format.sas program creates two data sets:

  • db_student_features - static items that are captured once per student (e.g., race, gender, cohort code, hometown)
  • db_student_results - a long-format data set (multiple rows per student) that contains the activities each student completed per term (e.g., the number of IM games played in the 201880 term on one row and in the 201910 term on another row)

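The banner_id field is the key for both tables. It uniquely identifies a row in db_student_features; in db_student_results, which has multiple rows per student, it identifies a row only in combination with the term. I find that keying both tables on banner_id is the easiest way of manipulating this data.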
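
The two-table layout described above can be sketched in a small, self-contained example. This is only an illustration in Python with an in-memory SQLite database, not the SAS program itself. The table names and banner_id come from this section; the columns term_code and im_games_played, the sample rows, and the function names are all assumptions made up for the demo. Because db_student_results has several rows per student, the sketch keys that table on banner_id plus the assumed term column.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two tables and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per student: static items captured once.
        CREATE TABLE db_student_features (
            banner_id TEXT PRIMARY KEY,
            gender    TEXT,
            hometown  TEXT
        );
        -- Long format: one row per student per term.
        -- term_code and im_games_played are hypothetical column names.
        CREATE TABLE db_student_results (
            banner_id       TEXT NOT NULL REFERENCES db_student_features (banner_id),
            term_code       TEXT NOT NULL,
            im_games_played INTEGER,
            PRIMARY KEY (banner_id, term_code)
        );
        """
    )
    # Made-up demo data, not real student records.
    conn.executemany(
        "INSERT INTO db_student_features VALUES (?, ?, ?)",
        [("B001", "F", "Springfield"), ("B002", "M", "Shelbyville")],
    )
    conn.executemany(
        "INSERT INTO db_student_results VALUES (?, ?, ?)",
        [("B001", "201880", 3), ("B001", "201910", 5), ("B002", "201880", 1)],
    )
    conn.commit()
    return conn


def games_per_student(conn):
    """Join the two tables on banner_id and total the games across terms."""
    return conn.execute(
        """
        SELECT f.banner_id, f.hometown, SUM(r.im_games_played) AS total_games
        FROM db_student_features AS f
        JOIN db_student_results AS r ON r.banner_id = f.banner_id
        GROUP BY f.banner_id, f.hometown
        ORDER BY f.banner_id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in games_per_student(build_demo_db()):
        print(row)
```

Keeping the static attributes in one table and the per-term activity in another means a question such as "total IM games by hometown" becomes a single join on banner_id followed by a GROUP BY, with no need to reshape wide per-term columns first.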