risk_score) from our sample and the risk
score of the population of defendants?
Here, we’ll be working from the Defendants2025 data set to
examine the difference between the mean risk score of defendants
(risk_score: measured as an interval-ratio
variable) in our sample and the (hypothetical) mean risk score for all
defendants in the population.
The one sample t-test examines the difference between two means: the mean of our sample and the (known or hypothesized) mean of the population.
The assumptions for a t-test are…
Because the Defendants2025 data have been
randomly sampled, we have met the assumption of independence of
observations.
Plot the histogram for risk score (Y variable)…
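As a minimal sketch of this step, the base R `hist()` function can draw the histogram. The vector `sim_risk` below is a simulated, hypothetical stand-in for `risk_score` (it is not the Defendants2025 data), so its shape will look more normal than real data might.

```r
# Simulated stand-in for risk_score (hypothetical values, NOT the Defendants2025 data)
set.seed(2025)
sim_risk <- rnorm(500, mean = 4.5, sd = 2.8)

# Base R histogram: a roughly symmetric, single-peaked shape suggests normality
h <- hist(sim_risk,
          main = "Histogram of simulated risk scores",
          xlab = "risk_score (simulated)")

# hist() also returns the bin counts, which can be inspected directly
h$counts
```

With the real data, the same call would take the `risk_score` column of the data frame in place of `sim_risk`.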

The histogram shows that the distribution of risk score (risk_score) is
relatively normal.
Boxplots also provide a visual representation of the normality of a distribution. The boxplot has a box, a line through the box, two whiskers on either end of the box, and sometimes dots/points outside the whiskers. Below, we get a sense of what each part of the boxplot represents…
To tell whether a variable is normally distributed using the box-and-whisker plot, we generally want to see that there is some distance between the box and the end of each whisker, that the box isn’t pushed too close to either whisker, that the median line (dot) is near the center of the box, and that there aren’t many outliers (dots) outside the whiskers.
To plot a boxplot for risk score we do the
following…
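One way to sketch this in base R is with `boxplot()`, which also returns the numbers behind each part of the plot. The data here are simulated and hypothetical (`sim_risk` is not the Defendants2025 data).

```r
# Simulated stand-in for risk_score (hypothetical values, NOT the Defendants2025 data)
set.seed(2025)
sim_risk <- rnorm(500, mean = 4.5, sd = 2.8)

# Horizontal box-and-whisker plot
b <- boxplot(sim_risk,
             horizontal = TRUE,
             main = "Boxplot of simulated risk scores",
             xlab = "risk_score (simulated)")

# b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats

# b$out: the points drawn beyond the whiskers (potential outliers)
length(b$out)
```

Comparing the median in `b$stats` with the two hinges gives a quick numeric check of whether the median line sits near the center of the box.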

The quantile-quantile plot is a visual tool to help us figure out if the empirical distribution of our variable fits (or rather, comes from) a theoretical normal distribution.
We assess normality for risk score using the
following…
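A minimal base R sketch of a Q-Q plot uses `qqnorm()` for the points and `qqline()` for the normality line. The data are simulated and hypothetical (`sim_risk` is not the Defendants2025 data), so the points will hug the line more closely than real scores might.

```r
# Simulated stand-in for risk_score (hypothetical values, NOT the Defendants2025 data)
set.seed(2025)
sim_risk <- rnorm(500, mean = 4.5, sd = 2.8)

# Plot sample quantiles against theoretical normal quantiles
qq <- qqnorm(sim_risk, main = "Normal Q-Q plot of simulated risk scores")

# Add the reference (normality) line through the first and third quartiles
qqline(sim_risk)
```

Points that fall along the line suggest normality; points that curl away from it, especially at the tails, suggest deviation.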

The Q-Q plot shows that risk score (risk_score) is somewhat normal; however,
it is clear that the data tend to curl away from the normality
line at the tails of the distribution. This indicates some
deviation from normality. Because the deviation is modest, it is still reasonably safe to proceed with the
statistical test.
For risk_score, the variable does not seem to
deviate drastically from normality. Therefore,
we can assume normality.
The calculation for the t-test is:
\(t = \frac{\bar{x}-\mu_0}{\frac{SD}{\sqrt{n}}}\)
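where \(\bar{x}\) is the sample mean, \(\mu_0\) is the (hypothetical) population mean, \(SD\) is the sample standard deviation, and \(n\) is the sample size.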
In addition, the degrees of freedom (\(df\)) for the test is…
\(df = n - 1\)
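The formula can be checked by hand in base R and compared against `t.test()`. This is a sketch on simulated, hypothetical data (`x` is not the Defendants2025 data); only the population mean of 3.6 is taken from the example in this section.

```r
# Simulated sample (hypothetical values, NOT the Defendants2025 data)
set.seed(42)
x   <- rnorm(200, mean = 4.5, sd = 2.8)
mu0 <- 3.6                      # population mean used in this section's example

# t = (x-bar - mu0) / (SD / sqrt(n)), with df = n - 1
n         <- length(x)
t_manual  <- (mean(x) - mu0) / (sd(x) / sqrt(n))
df_manual <- n - 1

# Base R one sample t-test for comparison
res <- t.test(x, mu = mu0)

# The hand calculation should match the function's statistic and df
all.equal(t_manual, unname(res$statistic))
df_manual == unname(res$parameter)
```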
To run the one sample t-test in R, we can use the traditional t.test function. In the
vannstats package, however, we can
use the os.t function.
Within the os.t
function, the data frame is listed first, followed by the
(interval-ratio level) variable for the sample, followed by the
(interval-ratio level) mean value for the population.
The function assumes equal variances by default
(using the call var.equal=TRUE); to change this,
add the following call to the function: var.equal=FALSE.
Note, however, that equal variances is an assumption about comparisons between two samples. A one sample t-test involves only one sample variance, so there is no equal-variance assumption to check for this test.
## Call:
## os.t(df = data1, var1 = risk_score, mu = 3.6)
##
## One Sample t-test:
##
##       𝑡   Critical 𝑡     df                  p-value
##  13.567        1.961   1737   < 0.00000000000000022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample and Population Means:
##       x̅:         μ:
## 4.506024   3.600000
In the output above, we see the t-obtained value (13.567), the critical t (1.961), the degrees of freedom (1737), and the p-value (< .00000000000000022, which is less than our set alpha level of .05).
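The critical t and the p-value in the output can be approximately reproduced from the reported t and degrees of freedom using base R's `qt()` and `pt()`. This sketch assumes a two-tailed test at an alpha of .05, which is consistent with the critical value shown; the variable names are illustrative.

```r
# Values copied from the output above
t_obt  <- 13.567
dof    <- 1737
x_bar  <- 4.506024
mu0    <- 3.6
alpha  <- 0.05    # assumed two-tailed alpha level

# Two-tailed critical t: the cutoff leaving alpha/2 in each tail
t_crit <- qt(1 - alpha / 2, dof)
round(t_crit, 3)

# Two-tailed p-value for the obtained t
p_two <- 2 * pt(-abs(t_obt), dof)
p_two < 2.2e-16

# Standard error implied by the output: (x-bar - mu0) / t
se_implied <- (x_bar - mu0) / t_obt
se_implied
```

Because the obtained t is far beyond the critical t, the p-value falls below the smallest value R prints by default, which is why the output reports it as less than 0.00000000000000022.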
To interpret the findings, we report the following information:
“Using a one sample t-test, I reject/fail to reject the null hypothesis that there is no mean difference between our sample and the population, \(t(?) = ?, p ? .05\)”