irtools
compare-ap-ib.Rmd
To support the `usualsuspects` template, several functions were developed to explore and interpret inferences about continuous variables. While `make_chi_square2` is the function used for categorical data, `make_compare_continuous` is the function used for continuous variables. Additionally, to describe and interpret the outputs, another function, `make_ap_ib_determination`, was created, again to assist with the `usualsuspects` reporting process.
As always, the required libraries must be loaded:
library(tidyverse)
#> -- Attaching packages ------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
#> v ggplot2 3.2.0.9000 v purrr 0.3.2
#> v tibble 2.1.3 v dplyr 0.8.3
#> v tidyr 0.8.3 v stringr 1.4.0
#> v readr 1.3.1 v forcats 0.4.0
#> -- Conflicts --------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(irverse)
#> -- Attaching packages ---------------------------------------------------------------------------------------------------------------------- irverse0.0.0.9000 --
#> v knitr 1.23 v usualsuspects 0.0.1
#> v kableExtra 1.1.0 v brms 2.9.0
#> v wfudata 0.1.0 v janitor 1.2.0
#> v irtools 0.0.1
#> -- Conflicts ----------------------------------------------------------------------------------------------------------------------------- irverse_conflicts() --
#> x janitor::chisq.test() masks stats::chisq.test()
#> x dplyr::filter() masks stats::filter()
#> x janitor::fisher.test() masks stats::fisher.test()
#> x kableExtra::group_rows() masks dplyr::group_rows()
#> x dplyr::lag() masks stats::lag()
library(irtools)
Here we will again use the simulated data from the `irtools` package:
ipeds_race | gender | student_ses | treated | gpa | hrs_earned | hrs_attempted | ap_hours_earned |
---|---|---|---|---|---|---|---|
Black | Female | Medium Income | 1 | 3.18 | 14.15 | 14.15 | 17.44 |
Asian | Female | High Income | 1 | 3.00 | 13.43 | 13.43 | 17.76 |
White | Female | Medium Income | 1 | 3.62 | 13.12 | 13.12 | 19.46 |
NRA | Male | Low Income | 0 | 3.55 | 12.22 | 12.22 | 14.33 |
White | Female | High Income | 0 | 3.68 | 11.45 | 11.45 | 12.83 |
White | Male | High Income | 0 | 3.36 | 12.18 | 12.34 | 14.60 |
Here we see that the `ap_hours_earned` column is available to us for analysis.
First, we need to prepare the data by calculating the group size, mean, variance, and standard deviation for each treatment group, as shown below:
ap_comparison <- fake_student_data %>%
  group_by(treated) %>%
  summarise(freq = n(),
            mu = mean(ap_hours_earned),
            var = var(ap_hours_earned),
            my_sd = sd(ap_hours_earned)) %>%
  mutate(combined_results = paste0(round(mu, 1), " (n = ", freq, "; ", "Std = ", round(my_sd, 1), ")"))
ap_comparison
#> # A tibble: 2 x 6
#> treated freq mu var my_sd combined_results
#> <int> <int> <dbl> <dbl> <dbl> <chr>
#> 1 0 289 14.0 0.992 0.996 14 (n = 289; Std = 1)
#> 2 1 211 18.0 1.05 1.03 18 (n = 211; Std = 1)
The resulting data frame can then be manipulated for use with the `make_compare_continuous` function. This series of steps takes the initial data frame, converts the treatment codes to readable labels, applies the `make_compare_continuous` function, and selects the desired columns in order to make a nice table.
ap_ib_difference <- ap_comparison %>%
  select(treated, combined_results) %>%
  mutate(treated = ifelse(treated == 1, "Participated", "Did Not Participate")) %>%
  spread(treated, combined_results) %>%
  bind_cols(
    make_compare_continuous(ap_comparison) %>% select(effect_size, formatted)
  )
ap_ib_difference
#> # A tibble: 1 x 4
#> `Did Not Participate` Participated effect_size formatted
#> <chr> <chr> <chr> <chr>
#> 1 14 (n = 289; Std = 1) 18 (n = 211; Std = 1) Huge 4.01+/-0.31
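The reshaping step above uses `spread()`, which matches the tidyr 0.8.3 version shown in the startup messages. In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The sketch below shows what the same long-to-wide step might look like with the newer function. The `summary_rows` data frame is a hypothetical stand-in that copies the two `combined_results` strings from the output above, so the example runs without `irtools` installed.

```r
library(tidyr)

# Hypothetical stand-in for the two relevant columns of ap_comparison;
# the strings are copied from the output printed above
summary_rows <- data.frame(
  treated = c(0L, 1L),
  combined_results = c("14 (n = 289; Std = 1)", "18 (n = 211; Std = 1)"),
  stringsAsFactors = FALSE
)

# Convert the 0/1 treatment codes to readable labels
summary_rows$treated <- ifelse(summary_rows$treated == 1,
                               "Participated", "Did Not Participate")

# One column per treatment label, one row in total
wide <- pivot_wider(summary_rows,
                    names_from = treated,
                    values_from = combined_results)
wide
```

The effect size columns could then be attached with `bind_cols()` exactly as in the original pipeline.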
For reference, if the `make_compare_continuous` function is applied directly to the summary data frame, you will get the following output:
ap_comparison %>%
  make_compare_continuous()
#> es es_sigma es_ci effect_size formatted
#> 1 4.009424 0.1558045 0.3053767 Huge 4.01+/-0.31
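This provides the effect size (`es`), the standard deviation of the effect size (`es_sigma`), the half-width of the effect size confidence interval (`es_ci`, here about 1.96 times `es_sigma`), and a Cohen-style effect size description (`effect_size`). The `formatted` column combines the effect size and its interval into a single string. This function is useful whenever you are comparing a continuous variable between two groups. For example, this entire analysis could be repeated with the credit hours earned (`hrs_earned`) if desired.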
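To make these columns more concrete, here is a minimal sketch of how such values could be computed from group summaries. It is an assumption, not the `irtools` implementation: it uses a pooled-standard-deviation Cohen's d, a common large-sample approximation for the standard error of d, and descriptive thresholds from Sawilowsky's extension of Cohen's conventions (assumed because the output uses the label "Huge"). The function name `cohens_d_from_summary` and its arguments are hypothetical.

```r
# Hypothetical helper for illustration; not part of irtools.
# Assumes exactly two groups, a pooled standard deviation, and a
# large-sample normal approximation for the standard error of d.
cohens_d_from_summary <- function(mu, var, n, z = 1.96) {
  stopifnot(length(mu) == 2, length(var) == 2, length(n) == 2)

  # Pooled standard deviation across the two groups
  pooled_sd <- sqrt(((n[1] - 1) * var[1] + (n[2] - 1) * var[2]) /
                      (n[1] + n[2] - 2))

  # Standardized mean difference (absolute value)
  es <- abs(mu[2] - mu[1]) / pooled_sd

  # Approximate standard error of d
  es_sigma <- sqrt((n[1] + n[2]) / (n[1] * n[2]) +
                     es^2 / (2 * (n[1] + n[2])))

  # Descriptive label; thresholds are an assumption (see lead-in)
  label <- as.character(cut(
    es,
    breaks = c(-Inf, 0.2, 0.5, 0.8, 1.2, 2.0, Inf),
    labels = c("Very small", "Small", "Medium", "Large", "Very large", "Huge"),
    right = FALSE
  ))

  data.frame(es = es, es_sigma = es_sigma, es_ci = z * es_sigma,
             effect_size = label, stringsAsFactors = FALSE)
}

# Two equal-sized groups whose means are one pooled SD apart
res <- cohens_d_from_summary(mu = c(0, 1), var = c(1, 1), n = c(50, 50))

# Rounded summary values from the ap_comparison output above
res_ap <- cohens_d_from_summary(mu = c(14.0, 18.0),
                                var = c(0.992, 1.05),
                                n = c(289, 211))
```

With the rounded summary values, this sketch gives an effect size of about 3.97 rather than the 4.01 reported above. The gap is what you would expect from feeding in printed, rounded means and variances instead of the full-precision data.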
When it comes time to interpret the results, you can use the `make_ap_ib_determination` function. As with `compare_chi_square2`, you will need to make sure the `results` option in the R Markdown code chunk is set to `'asis'`. Additionally, I am going to supply some descriptive text for the message.
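As a brief illustration of why the `results` option matters, the sketch below builds a Markdown sentence and writes it with `cat()`. It does not call `make_ap_ib_determination`; the variable names and the sentence wording are hypothetical, and the label and value are copied from the output shown earlier.

```r
# Intended for a chunk whose header is: {r, results = 'asis'}
# The label and value below are copied from the output shown earlier.
effect_label <- "Huge"
effect_value <- "4.01+/-0.31"

# Build a Markdown sentence; with results = 'asis', knitr passes the
# text from cat() through as Markdown rather than as verbatim output
message_md <- paste0("The difference in AP hours is **", effect_label,
                     "** (effect size ", effect_value, ").")
cat(message_md, "\n")
```

Without `results = 'asis'`, the same text would appear in the rendered document as a literal output block, asterisks included, instead of formatted prose.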
Now we can pass our `ap_ib_object` to the function and supply arguments for what to call the treated group, what to call the control group, and how to phrase these two groups. We can also print a nice table with our interpretation.
Did Not Participate | Participated | Effect Size | Effect Size Value |
---|---|---|---|
14 (n = 289; Std = 1) | 18 (n = 211; Std = 1) | Huge | 4.01+/-0.31 |
See the new student information page at https://newstudents.wfu.edu/academics/academic-success/planning-for-registration/.