risk_score) by whether or not the defendant is a person of color (race_binary)?

Here, we’ll be working from the Defendants2025 data set to examine mean differences in a defendant’s risk score (risk_score: measured as an interval-ratio variable) by whether they are a person of color or white (race_binary: measured no or yes).
The t-test examines the difference in means between two groups, in an effort to see whether the observed difference reflects a true difference that we could expect to find in the population.
The assumptions for a t-test are…
Because the Defendants2025 data have been randomly sampled, we have met the assumption of independence of observations.

For both of the above assumptions (2 and 3), we can examine the univariate data table, broken out by group:
##
## Descriptive statistics by group
## group: non-white
##    vars    n mean   sd median trimmed  mad  min max range skew kurtosis   se
## X1    1 1356 4.71 2.82   4.66    4.66 3.44 0.01  10  9.99 0.09    -1.16 0.08
## -------------------------------------------------------------------------------
## group: white
##    vars   n mean   sd median trimmed mad  min  max range skew kurtosis   se
## X1    1 382  3.8 2.53   3.42    3.58 2.7 0.03 9.98  9.95 0.61    -0.36 0.13
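The ratios relevant to these two assumptions can be read straight off the table. The short sketch below simply re-enters the group sizes and standard deviations printed above (the variable names are illustrative) and computes the larger-to-smaller ratio for each.

```r
# Group sizes and standard deviations copied from the descriptive table above
n_by_group  <- c("non-white" = 1356, "white" = 382)
sd_by_group <- c("non-white" = 2.82, "white" = 2.53)

# Larger-to-smaller ratios
n_ratio  <- max(n_by_group) / min(n_by_group)    # about 3.55
sd_ratio <- max(sd_by_group) / min(sd_by_group)  # about 1.11

# The 3:1 rule of thumb for standard deviations used in this section
sd_within_3_to_1 <- sd_ratio <= 3
print(c(n_ratio = n_ratio, sd_ratio = sd_ratio))
print(sd_within_3_to_1)
```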
Based on the group sizes in the table above, we have met the assumption of equal sample sizes. Further, given that the standard deviations for the two groups do not exceed a 3:1 ratio, we have met the assumption of homogeneity of variance.

Plot the histogram for risk score (Y variable), broken out by whether the defendant was a person of color or white (levels of the X variable)…
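One way to draw grouped histograms in base R is sketched below. The toy data frame is simulated as a hypothetical stand-in for Defendants2025 (its values are not the real data), and the base hist function is used here rather than any vannstats plotting helper.

```r
# Hypothetical stand-in data: simulated, NOT the real Defendants2025 values
set.seed(1)
toy <- data.frame(
  risk_score  = c(rnorm(60, mean = 4.7, sd = 2.8), rnorm(40, mean = 3.8, sd = 2.5)),
  race_binary = rep(c("non-white", "white"), times = c(60, 40))
)

# Split the outcome by group, then draw one histogram per group side by side
groups <- split(toy$risk_score, toy$race_binary)
op <- par(mfrow = c(1, length(groups)))
for (g in names(groups)) {
  hist(groups[[g]], main = g, xlab = "risk_score")
}
par(op)  # restore the previous plotting layout
```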

In the histograms, the distributions of the dependent variable (risk_score) by the predictor/grouping/independent variable (race_binary) are relatively normal.

Boxplots also provide a visual representation of the normality of a distribution. The boxplot has a box, a line through the box, two whiskers on either end of the box, and sometimes dots/points outside the whiskers. Below, we get a sense of what each part of the boxplot represents…
To tell whether a variable is normally distributed using the box-and-whisker plot, we generally want to see that there is some distance between the box and the end of each whisker, that the box isn’t pushed too close to either whisker, that the median line (dot) is near the center of the box, and that there aren’t many outliers (dots) outside the whiskers.
To plot a boxplot, broken out by race, we can do the following…
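A base R version of a grouped boxplot might look like the sketch below. The data are simulated as a hypothetical stand-in for Defendants2025, and the formula interface of the base boxplot function is used instead of a vannstats helper.

```r
# Hypothetical stand-in data: simulated, NOT the real Defendants2025 values
set.seed(1)
toy <- data.frame(
  risk_score  = c(rnorm(60, mean = 4.7, sd = 2.8), rnorm(40, mean = 3.8, sd = 2.5)),
  race_binary = rep(c("non-white", "white"), times = c(60, 40))
)

# outcome ~ group draws one box per level of the grouping variable
bp <- boxplot(risk_score ~ race_binary, data = toy,
              xlab = "race_binary", ylab = "risk_score")

# bp$stats holds, per group: lower whisker, lower hinge, median,
# upper hinge, upper whisker; bp$out holds any points beyond the whiskers
print(bp$stats)
```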

The quantile-quantile plot is a visual tool to help us figure out if the empirical distribution of our variable fits (or rather, comes from) a theoretical normal distribution.
We can assess normality with this plot and break it out by a grouping variable.
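As a minimal sketch, grouped Q-Q plots can be drawn in base R with qqnorm and qqline. The data below are simulated as a hypothetical stand-in for Defendants2025; this is not the vannstats plotting call.

```r
# Hypothetical stand-in data: simulated, NOT the real Defendants2025 values
set.seed(1)
toy <- data.frame(
  risk_score  = c(rnorm(60, mean = 4.7, sd = 2.8), rnorm(40, mean = 3.8, sd = 2.5)),
  race_binary = rep(c("non-white", "white"), times = c(60, 40))
)

# One Q-Q plot per group: sample quantiles against theoretical normal quantiles
groups <- split(toy$risk_score, toy$race_binary)
op <- par(mfrow = c(1, length(groups)))
for (g in names(groups)) {
  qqnorm(groups[[g]], main = paste("Q-Q plot:", g))
  qqline(groups[[g]])  # reference line; points curling away suggest non-normality
}
par(op)  # restore the previous plotting layout
```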

In the Q-Q plots for the dependent variable (risk_score), the data are somewhat normal; however, the data clearly tend to curl away from the normality line at the tails of each distribution. This indicates some deviation from normality, but not a drastic one, so it is safe to proceed with the statistical test.

Across the plots of risk_score broken out by race_binary, the distributions do not seem to deviate drastically from normality. Therefore, we can assume normality.

The calculation for the t-test is:
\(t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1}+\frac{SD_2^2}{n_2}}}\)
where…
In addition, the degrees of freedom (\(df\)) for the test are…
\(df = n_1 + n_2 -2\) (aka \(df = N-2\))
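As a rough check on the formula, the sketch below plugs the group sizes, means, and (rounded) standard deviations reported in this section into two versions of the standard error. The variable names are illustrative. The pooled-variance version is an assumption about what an equal-variance test computes, not something taken from the vannstats source.

```r
# Summary statistics reported in this section (SDs are rounded to 2 decimals)
n1 <- 1356; m1 <- 4.705524; sd1 <- 2.82   # non-white
n2 <- 382;  m2 <- 3.797853; sd2 <- 2.53   # white

# Degrees of freedom: n1 + n2 - 2
df <- n1 + n2 - 2

# (a) Separate-variance standard error, as written in the formula above
se_separate <- sqrt(sd1^2 / n1 + sd2^2 / n2)
t_separate  <- (m1 - m2) / se_separate        # about 6.0

# (b) Pooled-variance standard error (assumed form when variances are treated as equal)
sp2       <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled  <- (m1 - m2) / se_pooled            # about 5.68

print(c(df = df, t_separate = t_separate, t_pooled = t_pooled))
```

With these rounded inputs, the pooled form lands close to the t value of 5.6788 reported in the function output in this section, while the separate-variance form comes out somewhat larger. This suggests the reported statistic is based on a pooled variance, in keeping with \(df = n_1 + n_2 - 2\).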
To run the independent samples t-test in R, we can use the traditional t.test function. But, in the vannstats package, we can use the is.t function. Within the is.t function, the data frame is listed first, followed by the (interval-ratio level) dependent variable, and then the (discrete/categorical) independent variable.
If you meet the assumptions of the independent samples t-test, you can assume equal variances, which the function does by default (var.equal=TRUE). If you violate the homogeneity of variance assumption, you must add the following argument to the function call: var.equal=FALSE.
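For comparison, the same choice can be sketched with the base R t.test function, where var.equal defaults to FALSE and so has to be set explicitly to get the equal-variance test. The data are simulated as a hypothetical stand-in for Defendants2025, so the numbers will not match the real output.

```r
# Hypothetical stand-in data: simulated, NOT the real Defendants2025 values
set.seed(1)
toy <- data.frame(
  risk_score  = c(rnorm(60, mean = 4.7, sd = 2.8), rnorm(40, mean = 3.8, sd = 2.5)),
  race_binary = rep(c("non-white", "white"), times = c(60, 40))
)

# Equal variances assumed: pooled variance, df = n1 + n2 - 2
res_equal <- t.test(risk_score ~ race_binary, data = toy, var.equal = TRUE)

# Equal variances not assumed: Welch test with adjusted (usually fractional) df
res_welch <- t.test(risk_score ~ race_binary, data = toy, var.equal = FALSE)

print(res_equal$parameter)  # degrees of freedom for the pooled test
print(res_welch$parameter)  # degrees of freedom for the Welch test
```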
## Call:
## is.t(df = data1, var1 = risk_score, var2 = race_binary)
##
## Independent Samples (Two Sample) t-test:
##
## 𝑡 Critical 𝑡 df p-value
## 5.6788 1.9610 1736 0.00000001587 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Group Means:
## x̅: non-white x̅: white
## 4.705524 3.797853
In the output above, we see the t-obtained value (5.6788, or rather, \(\pm\) 5.6788), the degrees of freedom (1736), and the p-value (.00000001587, which is less than our set alpha level of .05).
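The critical value and p-value in the output can be approximated from the reported t and df alone. The sketch below assumes a two-tailed test at an alpha level of .05 and uses the base R functions qt and pt; the variable names are illustrative.

```r
# Values reported in the output above
t_obtained <- 5.6788
df         <- 1736
alpha      <- 0.05

# Two-tailed critical t: cuts off alpha/2 in each tail
t_critical <- qt(1 - alpha / 2, df)        # about 1.961

# Two-tailed p-value: area beyond |t| in both tails
p_value <- 2 * pt(-abs(t_obtained), df)    # well below 0.05

# Decision rule: reject the null when |t obtained| exceeds t critical
reject_null <- abs(t_obtained) > t_critical
print(c(t_critical = t_critical, p_value = p_value))
print(reject_null)
```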
To interpret the findings, we report the following information:
“Using an independent samples t-test, I reject/fail to reject the null hypothesis that there is no mean difference between group 1 and group 2, in the population, \(t(?) = ?, p ? .05\)”
“Using an independent samples t-test, I reject the null hypothesis that there is no difference between the mean risk score for people of color and white individuals, in the population, \(t(1736) = \pm 5.6788, p \lt .05\).”
This means that non-white folks (people of color) have a higher mean risk score (4.71) than white individuals (3.80).