Introduction

One of the key functions in the irtools package, and the one that powers most of the usualsuspects template, is compare_chi_square2. The function has evolved over time to be flexible enough to perform a categorical analysis, print a message, and write to a log file for future analysis.

When to Use It

Use compare_chi_square2 when you are comparing the distribution of a categorical variable across the levels of a treatment variable. The function runs a chi-square analysis and returns a table. As a reminder, a chi-square test is generally considered appropriate only when the expected cell counts are at least 5. The function has no good way to detect this issue, so please use caution when reviewing the results.
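Because the function cannot check expected cell counts for you, one option is to inspect them yourself before relying on the result. The following is a minimal sketch in base R, not part of irtools; the counts are fictitious and the object names (counts, expected, small_cells) are chosen only for illustration.

```r
# Fictitious 2 x 2 table of counts (illustrative only).
# matrix() fills column by column: first column is the control group.
counts <- matrix(
  c(124, 165,   # Did not Participate: Female, Male
    166,  45),  # Participated: Female, Male
  nrow = 2,
  dimnames = list(
    gender  = c("Female", "Male"),
    treated = c("Did not Participate", "Participated")
  )
)

# chisq.test() stores the expected counts under independence.
expected <- chisq.test(counts)$expected

# Rule of thumb: flag the table if any expected count falls below 5.
small_cells <- expected < 5
if (any(small_cells)) {
  message("Some expected cell counts are below 5; interpret with caution.")
} else {
  message("All expected cell counts are at least 5.")
}
```

If any cell is flagged, an exact test such as base R's fisher.test() may be a better fit than the chi-square test.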

To get started, let’s load the necessary packages.

Say, for instance, we have the following fictitious student data:

ipeds_race  gender  student_ses    treated  gpa   hrs_earned  hrs_attempted  ap_hours_earned
Black       Female  Medium Income  1        3.18  14.15       14.15          17.44
Asian       Female  High Income    1        3.00  13.43       13.43          17.76
White       Female  Medium Income  1        3.62  13.12       13.12          19.46
NRA         Male    Low Income     0        3.55  12.22       12.22          14.33
White       Female  High Income    0        3.68  11.45       11.45          12.83
White       Male    High Income    0        3.36  12.18       12.34          14.60

This is a generated data set and is completely fictitious.

This data set contains some demographics, an indicator variable for whether the student participated in a given intervention or event (treated), and several continuous variables.

Running the Function

To run the function you need to supply a few arguments:

Basic Arguments

  • data: the data frame you plan to use
  • group1: the categorical variable you want to test (e.g. ipeds_race)
  • response: the treatment variable. If you use something other than 0 or 1, it is best practice to make this a factor variable and set your desired control as the first level.
  • treated_name: replaces the first level of response and is the name that is printed
  • between_lingo: the wording used when describing the difference between the two groups
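The advice about making response a factor with the control as its first level can be sketched in base R as follows. This is an illustration only; the labels "Attended" and "Did not attend" and the object names are hypothetical and do not come from irtools.

```r
# Hypothetical treatment indicator stored as text rather than 0/1.
participation <- c("Attended", "Did not attend", "Attended", "Did not attend")

# By default, factor() orders levels alphabetically, which would put
# "Attended" first. Listing the levels explicitly makes the control
# group ("Did not attend") the first level.
response <- factor(
  participation,
  levels = c("Did not attend", "Attended")
)

levels(response)
```

Setting the levels explicitly avoids depending on alphabetical order, which can silently change which group is treated as the control.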

Advanced Arguments

  • print_log: TRUE/FALSE; whether to write the output message to a log file
  • verbose: TRUE/FALSE; whether to print a message indicating whether there is a statistically significant difference and for which group

Alert

If you are using an R Markdown document, you will need to set the results option of the chunk to 'asis'.

Example

{r chunk-name, results='asis'}

Some Examples

The basic application of the function would be as follows:

gender  Did not Participate  Participated
Female  43% (124)            79% (166)
Male    57% (165)            21% (45)
Note: Column percentages
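The column percentages in a table like this can be reproduced from the raw counts. The sketch below uses base R only and is not the irtools implementation; it takes the gender counts shown above and formats each cell as "percent (count)". The object names are chosen for illustration.

```r
# Counts taken from the gender table above (filled column by column).
counts <- matrix(
  c(124, 165,   # Did not Participate: Female, Male
    166,  45),  # Participated: Female, Male
  nrow = 2,
  dimnames = list(
    gender  = c("Female", "Male"),
    treated = c("Did not Participate", "Participated")
  )
)

# margin = 2 divides each cell by its column total (column percentages).
col_pct <- prop.table(counts, margin = 2)

# Format as "43% (124)". sprintf() returns a plain vector, so the
# matrix shape and labels are restored afterwards.
formatted <- matrix(
  sprintf("%.0f%% (%.0f)", 100 * col_pct, counts),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)

formatted
```

Each column sums to 100%, so the table reads as "of those who participated, what share were Female?" rather than the reverse.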

If we wanted to see the more detailed response, we could set the verbose argument to TRUE:

There was evidence of a statistically significant difference for gender between those who participated versus those who did not at the 0.05 level and medium (0.77+/-0.1) effect size. Those who participated were more likely to be Female.
gender  Did not Participate  Participated
Female  43% (124)            79% (166)
Male    57% (165)            21% (45)
Note: Column percentages

Additionally, we may want to include some details about the definitions that we are using. We can add extra information to this message by using the description_text argument.

There was evidence of a statistically significant difference for gender between those who participated versus those who did not at the 0.05 level and medium (0.77+/-0.1) effect size. Those who participated were more likely to be Female. Biological sex was used as per current US Department of Education reporting guidelines.[1]
gender  Did not Participate  Participated
Female  43% (124)            79% (166)
Male    57% (165)            21% (45)
Note: Column percentages

If you need to change the case of the described category, use the description_case argument to specify one of title, lower, upper, or asis. If you need to add an extra word, use the extra_word argument. In the output below, "Female" is printed as "a female".

There was evidence of a statistically significant difference for gender between those who participated versus those who did not at the 0.05 level and medium (0.77+/-0.1) effect size. Those who participated were more likely to be a female. Biological sex was used as per current US Department of Education reporting guidelines.[2]
gender  Did not Participate  Participated
Female  43% (124)            79% (166)
Male    57% (165)            21% (45)
Note: Column percentages

The Log

One nice feature of this function is that it writes a log with the significant results (named log_file.txt). This can be helpful when writing up the final analysis. If you run this function multiple times, the new record will be appended to the bottom of the file.

There was evidence of a statistically significant difference for gender between those who participated versus those who did not at the 0.05 level and medium (0.77+/-0.1) effect size. Those who participated were more likely to be a female.
gender  Did not Participate  Participated
Female  43% (124)            79% (166)
Male    57% (165)            21% (45)
Note: Column percentages

The log looks like the following:

You can then use this log file to write any kind of summary regarding the data that you have.
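As a starting point for such a summary, the log can be read back into R. The sketch below uses base R only; read_log is a hypothetical helper, not an irtools function, and it assumes only that log_file.txt is a plain-text file in the working directory. The exact layout of the log entries is not assumed.

```r
# Hypothetical helper: return the non-empty lines of the log, or an
# empty character vector if the log has not been written yet.
read_log <- function(path = "log_file.txt") {
  if (!file.exists(path)) {
    return(character(0))
  }
  lines <- readLines(path, warn = FALSE)
  # Drop lines that are empty or contain only whitespace.
  lines[nzchar(trimws(lines))]
}

entries <- read_log()
length(entries)  # number of non-empty lines currently in the log
```

Because new records are appended to the bottom of the file, the most recent results will be at the end of entries.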

Rinse and Repeat

Of course, this function can be repeated for any other variable. Below is an example of the message for a non-significant finding. It is also an example of renaming the grouping variable (here, student_ses) to a friendlier name.

There was no evidence of a statistically significant difference for household income between those who participated versus those who did not at the 0.05 level.
Household Income  Did not Participate  Participated
Low Income        29% (84)             31% (66)
Medium Income     18% (52)             20% (42)
High Income       53% (153)            49% (103)
Note: Column percentages

  1. See https://edsurveys.rti.org/IPEDS_TRP_DOCS/prod/documents/TRP51_Summary.pdf for more details on the conversations regarding gender and biological sex definition for National Center for Education Statistics reporting

  2. See https://edsurveys.rti.org/IPEDS_TRP_DOCS/prod/documents/TRP51_Summary.pdf for more details on the conversations regarding gender and biological sex definition for National Center for Education Statistics reporting