risk_score) by whether or not the defendant
had a gun in possession for their current case
(gun)?
Here, we’ll be working from the Defendants2025 data set to
examine mean differences in a defendant’s risk score
(risk_score: measured as an
interval-ratio variable), by whether or not they had a gun in their
possession (gun: measured no or
yes).
The t-test examines the difference in means between two groups, in an effort to see whether that difference reflects a true difference that we could expect to find in the population.
The assumptions for a t-test are…
Defendants2025 data have been
randomly-sampled, we have met the assumption of independence of
observations.
For both of the above assumptions (2 and 3), we can examine the univariate data table, broken out by group:
##
## Descriptive statistics by group
## group: no
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1015 4.24 2.66 4.01 4.13 3.25 0.01 9.98 9.97 0.27 -0.96 0.08
## ------------------------------------------------------------------------------
## group: yes
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 723 4.88 2.91 4.77 4.85 3.66 0.04 10 9.96 0.07 -1.21 0.11
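The sample-size and standard-deviation comparisons can be read directly off the table above. The following is a minimal sketch in base R that computes the two ratios from the printed values; the variable names (`ns`, `sds`, `sd_ratio`, `n_ratio`) are illustrative and not part of the tutorial's own code, and only the 3:1 guideline for standard deviations is taken from the text.

```r
# Values copied from the grouped descriptive table above.
ns  <- c(no = 1015, yes = 723)   # group sizes
sds <- c(no = 2.66, yes = 2.91)  # group standard deviations

# Ratio of the larger to the smaller standard deviation.
sd_ratio <- max(sds) / min(sds)

# Ratio of the larger to the smaller group size.
n_ratio <- max(ns) / min(ns)

round(c(sd_ratio = sd_ratio, n_ratio = n_ratio), 2)

# TRUE when the standard deviations stay within the 3:1 guideline.
sd_ratio < 3
```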
we have met the assumption of equal sample sizes. Further,
given that the standard deviations for both groups do not exceed a 3:1
ratio,
we have met the assumption of homogeneity of variance.
Plot the histogram for risk score (Y variable) broken out by whether or not the defendant was in possession of a gun (levels of the X variable)…
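The plotting call itself is not shown here, so the following is a substitute sketch in base R rather than the tutorial's own function. It assumes a hypothetical data frame `sim` with simulated values standing in for data1; with the real data, the same loop would run over data1$risk_score and data1$gun.

```r
set.seed(1)

# Hypothetical stand-in for data1 (simulated, not the Defendants2025 data).
sim <- data.frame(
  risk_score = runif(200, min = 0, max = 10),
  gun = factor(rep(c("no", "yes"), each = 100))
)

# Draw one histogram panel per level of the grouping variable.
op <- par(mfrow = c(1, 2))
for (g in levels(sim$gun)) {
  hist(sim$risk_score[sim$gun == g],
       main = paste("gun:", g),
       xlab = "risk_score")
}
par(op)  # restore the previous plotting layout
```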

risk_score) by the
predictor/grouping/independent variable (gun), are
relatively normal.
Boxplots also provide a visual representation of the normality of a distribution. The boxplot has a box, a line through the box, two whiskers on either end of the box, and sometimes dots/points outside the whiskers. Below, we get a sense of what each part of the boxplot represents…
To tell if a variable is normally distributed using the box-and-whisker plot, we generally want to see that there is some distance between the box and the end of the whiskers, that the box isn’t pushed too close to either whisker, that the median line (dot) is near the center of the box, and that there aren’t many outliers (dots) outside the whiskers.
To plot a boxplot, broken out by gun possession, we can do the following…
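As one possible approach (a base R sketch, not necessarily the call the tutorial uses), `boxplot()` accepts a formula of the form outcome ~ group. The data frame `sim` below is a hypothetical, simulated stand-in for data1.

```r
set.seed(2)

# Hypothetical stand-in for data1 (simulated, not the Defendants2025 data).
sim <- data.frame(
  risk_score = runif(200, min = 0, max = 10),
  gun = factor(rep(c("no", "yes"), each = 100))
)

# One box per level of gun; the returned object holds the plotted statistics.
b <- boxplot(risk_score ~ gun, data = sim,
             xlab = "gun", ylab = "risk_score")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
b$stats
```

Comparing the median row with the two hinge rows gives a numeric check on whether the median sits near the center of each box.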

The quantile-quantile plot is a visual tool to help us figure out if the empirical distribution of our variable fits (or rather, comes from) a theoretical normal distribution.
To assess normality by group, we break this plot out by a grouping variable.
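A grouped Q-Q plot can be sketched in base R with `qqnorm()` and `qqline()`, one panel per group. This is an illustrative substitute for whatever plotting function the tutorial uses, and `sim` is a hypothetical, simulated stand-in for data1.

```r
set.seed(3)

# Hypothetical stand-in for data1 (simulated, not the Defendants2025 data).
sim <- data.frame(
  risk_score = runif(200, min = 0, max = 10),
  gun = factor(rep(c("no", "yes"), each = 100))
)

# One normal Q-Q panel per level of the grouping variable.
op <- par(mfrow = c(1, 2))
for (g in levels(sim$gun)) {
  y <- sim$risk_score[sim$gun == g]
  qqnorm(y, main = paste("Normal Q-Q, gun:", g))
  qqline(y)  # reference line through the first and third quartiles
}
par(op)  # restore the previous plotting layout
```

Points that curl away from the reference line at either end indicate tails that are lighter or heavier than a normal distribution would produce.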

risk_score), the
data are somewhat normal, however, it is clear that the data tend to
curl away from the normality line at the tails of each
distribution. This indicates some deviation from normality.
Therefore, it is safe to proceed with the statistical test.
risk_score broken out by
gun, the variables do not seem to drastically
deviate from normality. Therefore,
we can assume normality.
The calculation for the t-Test is:
\(t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1}+\frac{SD_2^2}{n_2}}}\)
where…
In addition, the degrees of freedom (\(df\)) for the test is…
\(df = n_1 + n_2 -2\) (aka \(df = N-2\))
To run the independent samples t-test in R, we can use the traditional
t.test function. But, in
the vannstats package, we
can use the is.t function.
Within the is.t
function, the data frame is listed first, followed by the
(interval-ratio level) dependent variable, followed by the
(discrete/categorical) independent variable.
If you meet the assumptions of the independent samples t-test, you
can assume equal variances, which is the default
in the function (equivalent to the argument var.equal=TRUE). If you violate
this assumption, you must add the following argument to the function call: var.equal=FALSE.
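To see what the var.equal switch changes, here is a minimal sketch using base R's `t.test()` on simulated data. The data frame `sim` is a hypothetical stand-in for data1. Note that base `t.test()` defaults to var.equal = FALSE, so the argument is spelled out in both calls.

```r
set.seed(42)

# Hypothetical stand-in for data1 (simulated, not the Defendants2025 data).
sim <- data.frame(
  risk_score = c(rnorm(60, mean = 4.2, sd = 2.7),
                 rnorm(40, mean = 4.9, sd = 2.9)),
  gun = factor(rep(c("no", "yes"), times = c(60, 40)))
)

# Equal variances assumed: pooled standard error, df = n1 + n2 - 2.
pooled <- t.test(risk_score ~ gun, data = sim, var.equal = TRUE)

# Equal variances not assumed: Welch test with adjusted degrees of freedom.
welch <- t.test(risk_score ~ gun, data = sim, var.equal = FALSE)

pooled$parameter  # degrees of freedom under the pooled test
welch$parameter   # Welch degrees of freedom, usually not a whole number
```

The two calls share the same difference in means; only the standard error and the degrees of freedom differ.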
#t.test(data1$risk_score ~ data1$gun, var.equal=TRUE)
#ttest <- is.t(data1, risk_score, gun, var.equal=TRUE) #this is same as below
ttest <- is.t(data1, risk_score, gun)
summary(ttest)
## Call:
## is.t(df = data1, var1 = risk_score, var2 = gun)
##
## Independent Samples (Two Sample) t-test:
##
## 𝑡 Critical 𝑡 df p-value
## -4.714 1.961 1736 2.623e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Group Means:
## x̅: no x̅: yes
## 4.241941 4.876763
In the output above, we see the t-obtained value (-4.714, or rather, \(\pm\) 4.714), the degrees of freedom (1736), and the p-value (.000002623, which is less than our set alpha level of .05).
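The reported values can be approximately reproduced by hand from the group sizes and standard deviations in the descriptive table and the group means in the output. The sketch below is illustrative only (all variable names are made up for this example). It computes the statistic two ways: with a pooled variance, which is what an equal-variance test uses, and with the separate-variance standard error written in the formula above. Because the standard deviations are rounded to two decimals, agreement with the printed output should only be expected to about two decimal places.

```r
# Values copied from the descriptive table and the is.t output above.
n1 <- 1015; n2 <- 723            # group sizes (no, yes)
sd1 <- 2.66; sd2 <- 2.91         # group standard deviations (rounded)
m1 <- 4.241941; m2 <- 4.876763   # group means

df_t <- n1 + n2 - 2              # degrees of freedom

# Pooled-variance standard error (equal variances assumed).
sp2 <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df_t
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled <- (m1 - m2) / se_pooled

# Separate-variance standard error, as in the displayed formula.
se_unpooled <- sqrt(sd1^2 / n1 + sd2^2 / n2)
t_unpooled <- (m1 - m2) / se_unpooled

t_crit <- qt(0.975, df_t)              # two-tailed critical t at alpha = .05
p_val <- 2 * pt(-abs(t_pooled), df_t)  # two-tailed p-value

round(c(t_pooled = t_pooled, t_unpooled = t_unpooled, t_crit = t_crit), 3)
p_val
```

Under these inputs the pooled version lands close to the reported t of -4.714, while the separate-variance version is slightly smaller in magnitude.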
To interpret the findings, we report the following information:
“Using an independent samples t-test, I reject/fail to reject the null hypothesis that there is no mean difference between group 1 and group 2, in the population, \(t(?) = ?, p ? .05\)”
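Filling in the template can be done mechanically from the output. The following sketch uses the values printed above and an alpha of .05; the variable names and the exact wording produced by `sprintf()` are illustrative, not a required format.

```r
# Values copied from the is.t output above.
t_obt <- -4.714
df_t  <- 1736L
p_val <- 2.623e-06
alpha <- 0.05

# Decide between the two wordings offered in the template.
decision <- if (p_val < alpha) "reject" else "fail to reject"
relation <- if (p_val < alpha) "<" else ">"

report <- sprintf(
  paste0("Using an independent samples t-test, I %s the null hypothesis ",
         "that there is no mean difference between group 1 and group 2, ",
         "in the population, t(%d) = %.3f, p %s .05"),
  decision, df_t, t_obt, relation
)

cat(report, "\n")
```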