Formatting + Summary Statistics

Lecture 3

Dr. Elijah Meyer

NC State University
ST 511 - Fall 2025

2025-08-25

Checklist

– Are you keeping up with the prepare material?

– Did you clone today’s repo? We will demo this again today.

      If not, you can open the raw .qmd file in the workbench (located on Moodle).
      If you cannot clone a repo, please talk to me ASAP.

– Can you make a PDF?

    Run tinytex::install_tinytex() in your Console and restart R

– Quiz-1 released (Due Friday on Moodle)

– HW-1 released Thursday afternoon

Announcements

Having trouble with a memory issue on the workbench?

Terry from IT is on it and asks that you log out instead of just closing the browser. You will not lose any work by doing so.

Warm up

What is a function?

What is an R package?

Warm up

Function - A function is a set of statements organized together to perform a specific task. It’s an action that we can call whenever we need it.

Example

glimpse

library(tidyverse)

glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

Warm up

R packages are extensions to the base R statistical programming language. They bundle code, data, and documentation.

Warm up

What do we call the following?

Warm up

Shortcut to insert a code chunk on Windows: Ctrl + Alt + I

Shortcut to insert a code chunk on Mac: Cmd + Option + I

Warm up

What’s wrong?

Warm up

Make sure that the three backticks that open and close each code chunk are left-aligned, or your code chunk won’t close.

Formatting in Quarto

Goals

– Section headers

– Bold / Italicize

– Insert tables

– Output control

The tidyverse pipe

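|> is called a pipe operator (it is the native pipe built into R 4.1 and later; the tidyverse’s %>% works much the same way)

It expresses a sequence of coding actions: the result on the left becomes the first argument of the function on the right

Read it as “and then”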
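As a small sketch of the “and then” reading, the two computations below should give the same result. The vector x is a made-up toy example, not course data, and the native pipe assumes R 4.1 or later.

```r
# Hypothetical toy vector, used only for illustration
x <- c(4, 9, 16)

# Nested call: read from the inside out
nested <- sum(sqrt(x))

# Piped call: take x, and then take square roots, and then add them up
piped <- x |>
  sqrt() |>
  sum()

# Both approaches compute the same value
identical(nested, piped)
```

The piped version reads top to bottom in the order the steps happen, which is why it tends to be easier to follow once there are several steps.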

The tidyverse pipe

Compare

library(tidyverse)

glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

vs

library(tidyverse)

mtcars |>
  glimpse()
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

New functions

group_by()

summarise()

n()

mean(); median(); sd() …etc.
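A minimal sketch of how these functions might fit together, using the built-in mtcars data from earlier. The choice of cyl as the grouping variable and the column names count, mean_mpg, median_mpg, and sd_mpg are illustrative assumptions, not part of the course materials.

```r
library(dplyr)  # also attached by library(tidyverse)

mpg_by_cyl <- mtcars |>
  group_by(cyl) |>            # one group per number of cylinders
  summarise(
    count      = n(),         # number of cars in each group
    mean_mpg   = mean(mpg),   # average miles per gallon
    median_mpg = median(mpg), # middle value of miles per gallon
    sd_mpg     = sd(mpg)      # spread of miles per gallon
  )

mpg_by_cyl
```

The result has one row per value of cyl, with one column for each statistic named inside summarise().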

Summary statistics review

Question

How do we summarize quantitative variables? How do we summarize categorical variables?

Summary statistics

\[ \text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Median = The middle number of the sorted data (the average of the two middle numbers when n is even)

There is no widely accepted standard notation for the median

\[ \text{proportion} = \hat{p} = \frac{\text{number of successes}}{\text{total number of observations}} \]
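The formulas above can be checked by hand in R. This is a sketch using mtcars; treating a manual transmission (am == 1) as a “success” is an arbitrary choice made only for illustration.

```r
mpg <- mtcars$mpg
n <- length(mpg)

# Mean: add up the values and divide by how many there are
x_bar <- sum(mpg) / n
all.equal(x_bar, mean(mpg))  # should agree with the built-in mean()

# Median: the middle value of the sorted data
median(mpg)

# Proportion: successes divided by total
# (assumption: a manual transmission, am == 1, counts as a "success")
p_hat <- sum(mtcars$am == 1) / nrow(mtcars)
p_hat
```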

Other statistics

– Standard deviation

– IQR

Standard deviation

Numerical summary of how spread out your observations are from the center (mean)

\[ \text{sd} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \]
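To connect the formula to R, the sketch below computes the standard deviation of mpg in mtcars step by step and compares it with the built-in sd(). The variable names are illustrative.

```r
mpg <- mtcars$mpg
n <- length(mpg)
x_bar <- mean(mpg)

# Squared distance of each observation from the mean
squared_dev <- (mpg - x_bar)^2

# Sum them, divide by n - 1, then take the square root
sd_by_hand <- sqrt(sum(squared_dev) / (n - 1))

# Should agree with the built-in function
all.equal(sd_by_hand, sd(mpg))
```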

IQR

Measures the spread of the middle 50% of your data.

Q3 - Q1

75th percentile - 25th percentile

We can calculate all of these (and more) in R
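For instance, the IQR of mpg in mtcars can be sketched as Q3 - Q1 and compared with the built-in IQR(). Note that quantile() uses R's default quantile definition, so a by-hand textbook method may give a slightly different answer.

```r
mpg <- mtcars$mpg

q1 <- quantile(mpg, 0.25)  # 25th percentile
q3 <- quantile(mpg, 0.75)  # 75th percentile

# unname() drops the "75%" label that quantile() attaches
iqr_by_hand <- unname(q3 - q1)

# Should agree with the built-in function
all.equal(iqr_by_hand, IQR(mpg))
```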

Practice

In summary

– We use the pipe operator when we are writing a sequence of actions

– group_by() groups our data and allows us to create summary statistics on the grouped data

– summarise() allows us to calculate summary statistics!

Plots

What types of plots can we make?

Golden Rule: We let the type of variable(s) dictate the appropriate plot

  • Quantitative

  • Categorical

Pick a plot

Which plot is appropriate for each of the following scenarios?

– One quantitative variable

– One quantitative variable; one categorical variable

– Two quantitative variables

– One categorical variable

– Two categorical variables

– Scatter plot

– Histogram

– Bar plot

– Segmented bar plot

– Box plot

Scatter plot

Two quantitative variables

data |>
  ggplot() +
  geom_point()
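The template above still needs to know which variables go on which axis. A minimal filled-in sketch, assuming mtcars as the data and wt and mpg as the two quantitative variables (both choices are illustrative):

```r
library(ggplot2)  # also attached by library(tidyverse)

scatter <- mtcars |>
  ggplot(aes(x = wt, y = mpg)) +  # two quantitative variables
  geom_point()

scatter
```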

Histogram

One quantitative variable

data |>
  ggplot() +
  geom_histogram()
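A filled-in sketch of the histogram template, assuming mtcars as the data and mpg as the quantitative variable. The value bins = 10 is an arbitrary illustrative choice.

```r
library(ggplot2)  # also attached by library(tidyverse)

histogram <- mtcars |>
  ggplot(aes(x = mpg)) +       # one quantitative variable
  geom_histogram(bins = 10)    # bins controls how many bars are drawn

histogram
```

Trying a few different values of bins is usually worthwhile, since the shape of a histogram can change with the bin width.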

Bar plot

One categorical variable

data |>
  ggplot() +
  geom_bar() # or geom_col() if the counts are already computed
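A filled-in sketch of the bar plot template, assuming mtcars as the data. Because cyl is stored as a number, factor() is used here to treat it as categorical; that choice of variable is illustrative.

```r
library(ggplot2)  # also attached by library(tidyverse)

bar <- mtcars |>
  ggplot(aes(x = factor(cyl))) +  # one categorical variable
  geom_bar()                      # counts the rows in each category

bar
```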

Segmented bar plot

Two categorical variables

data |>
  ggplot() +
  geom_bar()
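A filled-in sketch of the segmented bar plot, assuming mtcars as the data with cyl and am treated as categorical. The second variable goes in the fill aesthetic, and position = "fill" is one option that rescales every bar to proportions.

```r
library(ggplot2)  # also attached by library(tidyverse)

segmented <- mtcars |>
  ggplot(aes(x = factor(cyl), fill = factor(am))) +  # two categorical variables
  geom_bar(position = "fill")  # each bar shows proportions that sum to 1

segmented
```

Leaving out position = "fill" gives stacked counts instead of proportions.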

Box plot

One quantitative variable; one categorical variable

data |>
  ggplot() +
  geom_boxplot()
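A filled-in sketch of the box plot template, assuming mtcars as the data, mpg as the quantitative variable, and cyl (converted with factor()) as the categorical variable. These variable choices are illustrative.

```r
library(ggplot2)  # also attached by library(tidyverse)

box <- mtcars |>
  ggplot(aes(x = factor(cyl), y = mpg)) +  # categorical on x, quantitative on y
  geom_boxplot()                           # one box per category

box
```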