Lab 15: Regression Tables

(Stats) Nothing new. (R) How to report results of regression in a table with R.

Published

April 18, 2023

Outline

Objectives

This lab will guide you through the process of

Creating simple display tables
Creating interactive displays for large HTML tables
Building data summary tables
Building regression tables
Exporting tables

R Packages

We will be using the following packages:

⚠️ Don’t forget to install gt and gtsummary with install.packages(c("gt", "gtsummary")). Best to run this in the console!

library(archdata)
library(gt)
library(gtsummary)
library(here)
library(palmerpenguins)
library(tidyverse)

Data

DartPoints
- Includes measurements of 91 Archaic dart points recovered during surface surveys at Fort Hood, Texas.
- package: archdata
- reference: https://cran.r-project.org/web/packages/archdata/archdata.pdf
penguins
- Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
- package: palmerpenguins
- reference: https://allisonhorst.github.io/palmerpenguins/reference/penguins.html
Snodgrass
- Includes measurements of size, location, and contents of 91 pit houses at the Snodgrass site in Butler County, Missouri.
- package: archdata
- reference: https://cran.r-project.org/web/packages/archdata/archdata.pdf

Grammar of Tables

Tables have a grammar? 🤔 Well, sort of… the grammar here refers to a cohesive language for describing the parts of a table, not a data table, per se, but a display table, a table meant to represent your data rather than simply store it.

A simple table of data includes column or variable labels and a body (all the rows and cells containing values of the variables). A display table, though, can also include (i) a header containing a title for the whole table (and not just a column!), (ii) a footer with, well, footnotes, and (iii) a “stub”, which includes row or observation labels and groupings of those observations. To create a display table, you can use the eponymous gt() function. It will create a gt object having all the components shown in the figure above, though some may be excluded if you do not explicitly specify them. Importantly, you can use gt() to generate tables in the most common formats, namely HTML, LaTeX, and RTF, and you can export those tables to an even larger number of formats, including HTML, PDF, Word, and PNG (if you want an image of the table). That said, the real power of gt is its support of HTML tables, and in particular interactive tables.
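To make the stub and row groups a little more concrete, here is a minimal sketch that fills in each part of a display table. The summary data frame mass_by_group and the titles are made up for illustration, and rowname_col, groupname_col, tab_stubhead(), and tab_source_note() are gt features we don't otherwise use in this lab.

```r
library(gt)
library(palmerpenguins)

# Mean body mass for each species-island combination (base R; rows with
# missing body mass are dropped by aggregate()'s default na.action).
mass_by_group <- aggregate(
  body_mass_g ~ species + island,
  data = penguins,
  FUN = mean
)

stub_table <- mass_by_group |>
  gt(
    rowname_col = "island",    # row labels go in the stub
    groupname_col = "species"  # rows are grouped by species
  ) |>
  tab_stubhead(label = "Island") |>                            # label above the stub
  tab_header(title = "Mean body mass of Palmer penguins") |>   # header
  cols_label(body_mass_g = "Body mass (g)") |>                 # column label
  fmt_number(columns = body_mass_g, decimals = 0) |>           # body formatting
  tab_source_note("Data: palmerpenguins package.")             # footer

stub_table
```

The table body here has just one column; the island names live in the stub and the species names become row group headings.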

As a simple motivating example, consider the scenario where a reviewer asks you to include a table with all your raw data in the supplement. That’s pretty easy with gt(). In the following example, we’ll also specify the table header with tab_header() and add a footnote with tab_footnote().

head(penguins) |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("*I added a footnote to this table.")

Table 1. Palmer Penguins Data
this is the header
species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Adelie	Torgersen	39.1	18.7	181	3750	male	2007
Adelie	Torgersen	39.5	17.4	186	3800	female	2007
Adelie	Torgersen	40.3	18.0	195	3250	female	2007
Adelie	Torgersen	NA	NA	NA	NA	NA	2007
Adelie	Torgersen	36.7	19.3	193	3450	female	2007
Adelie	Torgersen	39.3	20.6	190	3650	male	2007
*I added a footnote to this table.

But what if your table has a lot of data, like hundreds of rows? Obviously, you can just dump all that data into a giant table and let the user suffer through navigating it. Alternatively, you know, if you have a 💟, you can use gt() to create an interactive HTML table with search and scroll features. You simply pass the gt table to opt_interactive().

⚠️ A word of caution: this is a brand new feature that is under active development, so it might be a smidge buggy. It also only works in HTML documents, not Word or PDF.

penguins |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("I added a footnote to this table.") |> 
  opt_interactive(
    use_compact_mode = TRUE, # squish table
    use_highlight = TRUE, # highlight rows on mouse hover
    use_page_size_select = TRUE, # specify number of rows displayed
    use_resizers = TRUE, # allow resizing columns
    use_search = TRUE # add a search text box to table
  ) |> 
  tab_options(container.height = px(500))

Notice that I used px(500) to specify the height of the container. The px() function is a helper for specifying height or width in pixels, rather than, say, inches or centimeters.
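As a quick sketch of how these helpers behave, px() has a sibling, pct(), for percentages. Both just build the strings that gt passes along to the HTML table. The table.width and table.font.size options used here are only examples of settings that accept such values; they are not needed anywhere else in this lab.

```r
library(gt)
library(palmerpenguins)

px(500)  # a pixel value as a string: "500px"
pct(80)  # a percentage as a string: "80%"

sized_table <- head(penguins) |>
  gt() |>
  tab_options(
    table.width = pct(80),    # table spans 80% of the available width
    table.font.size = px(12)  # font size in pixels
  )

sized_table
```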

With the gt package, the sky is the limit on formatting beautiful tables in R. For instance, you can add color codes to columns with data_color() like so.

head(penguins) |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("*I added a footnote to this table.") |> 
  data_color(
    columns = bill_length_mm:body_mass_g,
    palette = "BrBG"
  )

Table 1. Palmer Penguins Data
this is the header
species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
Adelie	Torgersen	39.1	18.7	181	3750	male	2007
Adelie	Torgersen	39.5	17.4	186	3800	female	2007
Adelie	Torgersen	40.3	18.0	195	3250	female	2007
Adelie	Torgersen	NA	NA	NA	NA	NA	2007
Adelie	Torgersen	36.7	19.3	193	3450	female	2007
Adelie	Torgersen	39.3	20.6	190	3650	male	2007
*I added a footnote to this table.

And that’s just the tip of the iceberg! To learn more, I recommend perusing the package website and playing around with different functions and settings.
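To give a taste of what else is in there, here is a small sketch that tidies up the raw-data table with three more gt functions: cols_label() for readable column labels, fmt_number() for controlling decimals, and sub_missing() for replacing NA values. The labels and the choice of decimals are just illustrative.

```r
library(gt)
library(palmerpenguins)

formatted_table <- head(penguins) |>
  gt() |>
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm = "Bill length (mm)",
    bill_depth_mm = "Bill depth (mm)",
    flipper_length_mm = "Flipper length (mm)",
    body_mass_g = "Body mass (g)",
    sex = "Sex",
    year = "Year"
  ) |>
  # one decimal place for the bill measurements
  fmt_number(columns = c(bill_length_mm, bill_depth_mm), decimals = 1) |>
  # whole numbers for body mass
  fmt_number(columns = body_mass_g, decimals = 0) |>
  # show a dash-like placeholder instead of NA in every column
  sub_missing(missing_text = "---")

formatted_table
```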

Exercises

Load in the DartPoints data with data("DartPoints").
Make all the variable names lower case with rename_with(tolower).
Subset the data to include only name, tarl (the Smithsonian Trinomial), length, width, and thickness.
Create an interactive table with the DartPoints data.
- Add a header and footnote with tab_header() and tab_footnote().
- Specify the height of the table’s container.
- Try experimenting with the different arguments you can pass to opt_interactive() to see what they do.

Data summary

The number and variety of options available in the gt package can be overwhelming. The gt package is also not capable of summarizing data or models by itself. That’s where the gtsummary package comes in. It’s a wrapper around gt that automatically generates summary tables of data and regression models. To generate a summary table of data, we use the tbl_summary() function. This is similar to the skim function from the skimr package, but with a slightly different aesthetic. It also gives you more fine-grained control over the output and style, which means it’s a smidge more complicated to work with, too.

penguins |> tbl_summary()

Characteristic	N = 344¹
species
Adelie	152 (44%)
Chinstrap	68 (20%)
Gentoo	124 (36%)
island
Biscoe	168 (49%)
Dream	124 (36%)
Torgersen	52 (15%)
bill_length_mm	44.5 (39.2, 48.5)
Unknown	2
bill_depth_mm	17.30 (15.60, 18.70)
Unknown	2
flipper_length_mm	197 (190, 213)
Unknown	2
body_mass_g	4,050 (3,550, 4,750)
Unknown	2
sex
female	165 (50%)
male	168 (50%)
Unknown	11
year
2007	110 (32%)
2008	114 (33%)
2009	120 (35%)
¹ n (%); Median (IQR)

By default, the summary gives you the counts and percentages for categorical data, and the median values of continuous variables (with their first and third quartiles, which bound the interquartile range or IQR, in parentheses). ‘Unknown’ here refers to missing values. These are fine as far as they go, but the styling leaves a lot to be desired, especially if vertical space is prime real estate in whatever document you are creating.
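If the defaults are not what you want to report, you can change them. The following is a minimal sketch, not the only way to do it: it uses the include, statistic, digits, and missing arguments of tbl_summary() to report means and standard deviations for a few variables and to drop the ‘Unknown’ rows. The particular variables and statistics are just my choices for illustration.

```r
library(gtsummary)
library(palmerpenguins)

mean_sd_table <- penguins |>
  tbl_summary(
    # keep the table short by summarizing only a few variables
    include = c(species, bill_length_mm, body_mass_g),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",      # mean (SD) instead of median (IQR)
      all_categorical() ~ "{n} / {N} ({p}%)"   # count / total (percent)
    ),
    digits = all_continuous() ~ 1,             # one decimal for continuous summaries
    missing = "no"                             # hide the 'Unknown' rows
  )

mean_sd_table
```

Limiting the variables with include and hiding the missing-value rows are two easy ways to save vertical space.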

In this case, something we might actually care about in our analysis is differences between species, which requires that we summarize the data by species. To do that, we simply specify the species variable in the by= argument.

penguins |> tbl_summary(by = species)

Characteristic	Adelie, N = 152¹	Chinstrap, N = 68¹	Gentoo, N = 124¹
island
Biscoe	44 (29%)	0 (0%)	124 (100%)
Dream	56 (37%)	68 (100%)	0 (0%)
Torgersen	52 (34%)	0 (0%)	0 (0%)
bill_length_mm	38.8 (36.8, 40.8)	49.5 (46.3, 51.1)	47.3 (45.3, 49.5)
Unknown	1	0	1
bill_depth_mm	18.40 (17.50, 19.00)	18.45 (17.50, 19.40)	15.00 (14.20, 15.70)
Unknown	1	0	1
flipper_length_mm	190 (186, 195)	196 (191, 201)	216 (212, 221)
Unknown	1	0	1
body_mass_g	3,700 (3,350, 4,000)	3,700 (3,488, 3,950)	5,000 (4,700, 5,500)
Unknown	1	0	1
sex
female	73 (50%)	34 (50%)	58 (49%)
male	73 (50%)	34 (50%)	61 (51%)
Unknown	6	0	5
year
2007	50 (33%)	26 (38%)	34 (27%)
2008	50 (33%)	18 (26%)	46 (37%)
2009	52 (34%)	24 (35%)	44 (35%)
¹ n (%); Median (IQR)

By default, the table uses the variable names from the data as row labels. R doesn’t like spaces in names, so it’s common to see underscores in the labels. We can amend these in the table by specifying all the changes in a named list.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  )

Characteristic	Adelie, N = 152¹	Chinstrap, N = 68¹	Gentoo, N = 124¹
Island
Biscoe	44 (29%)	0 (0%)	124 (100%)
Dream	56 (37%)	68 (100%)	0 (0%)
Torgersen	52 (34%)	0 (0%)	0 (0%)
Bill length (mm)	38.8 (36.8, 40.8)	49.5 (46.3, 51.1)	47.3 (45.3, 49.5)
Unknown	1	0	1
Bill depth (mm)	18.40 (17.50, 19.00)	18.45 (17.50, 19.40)	15.00 (14.20, 15.70)
Unknown	1	0	1
Flipper length (mm)	190 (186, 195)	196 (191, 201)	216 (212, 221)
Unknown	1	0	1
Body mass (g)	3,700 (3,350, 4,000)	3,700 (3,488, 3,950)	5,000 (4,700, 5,500)
Unknown	1	0	1
Sex
female	73 (50%)	34 (50%)	58 (49%)
male	73 (50%)	34 (50%)	61 (51%)
Unknown	6	0	5
Year
2007	50 (33%)	26 (38%)	34 (27%)
2008	50 (33%)	18 (26%)	46 (37%)
2009	52 (34%)	24 (35%)	44 (35%)
¹ n (%); Median (IQR)

The gtsummary package also offers some helper functions for adding various columns to a summary table. For example, you can add a column that summarizes all the observations together, ignoring the grouping variable, using the add_overall() function.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  ) |> 
  add_overall(last = TRUE)

Characteristic	Adelie, N = 152¹	Chinstrap, N = 68¹	Gentoo, N = 124¹	Overall, N = 344¹
Island
Biscoe	44 (29%)	0 (0%)	124 (100%)	168 (49%)
Dream	56 (37%)	68 (100%)	0 (0%)	124 (36%)
Torgersen	52 (34%)	0 (0%)	0 (0%)	52 (15%)
Bill length (mm)	38.8 (36.8, 40.8)	49.5 (46.3, 51.1)	47.3 (45.3, 49.5)	44.5 (39.2, 48.5)
Unknown	1	0	1	2
Bill depth (mm)	18.40 (17.50, 19.00)	18.45 (17.50, 19.40)	15.00 (14.20, 15.70)	17.30 (15.60, 18.70)
Unknown	1	0	1	2
Flipper length (mm)	190 (186, 195)	196 (191, 201)	216 (212, 221)	197 (190, 213)
Unknown	1	0	1	2
Body mass (g)	3,700 (3,350, 4,000)	3,700 (3,488, 3,950)	5,000 (4,700, 5,500)	4,050 (3,550, 4,750)
Unknown	1	0	1	2
Sex
female	73 (50%)	34 (50%)	58 (49%)	165 (50%)
male	73 (50%)	34 (50%)	61 (51%)	168 (50%)
Unknown	6	0	5	11
Year
2007	50 (33%)	26 (38%)	34 (27%)	110 (32%)
2008	50 (33%)	18 (26%)	46 (37%)	114 (33%)
2009	52 (34%)	24 (35%)	44 (35%)	120 (35%)
¹ n (%); Median (IQR)
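Other helpers follow the same pattern. As a hedged sketch, here are two that I believe are available in current versions of gtsummary: add_n(), which adds a column with the number of non-missing observations for each variable, and add_stat_label(), which moves the description of the statistic from the footnote into the row labels. The subset of variables is only there to keep the example short.

```r
library(gtsummary)
library(palmerpenguins)

helper_table <- penguins |>
  tbl_summary(
    by = species,
    include = c(bill_length_mm, flipper_length_mm, body_mass_g),
    missing = "no"
  ) |>
  add_n() |>          # column with the non-missing sample size per variable
  add_stat_label()    # append the statistic description to each row label

helper_table
```

The add_p() function works the same way if you want p-values for comparisons between groups, though you should check which test it picks before reporting the result.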

We can also add some custom formatting, like adding a caption, removing the “Characteristic” label (that column should be obvious), and making the variable names or labels bold.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  ) |> 
  add_overall(last = TRUE) |> 
  modify_header(label ~ "") |>
  modify_caption("**Table 1. Penguin Characteristics**") |> 
  bold_labels()

**Table 1. Penguin Characteristics**
	Adelie, N = 152¹	Chinstrap, N = 68¹	Gentoo, N = 124¹	Overall, N = 344¹
Island
Biscoe	44 (29%)	0 (0%)	124 (100%)	168 (49%)
Dream	56 (37%)	68 (100%)	0 (0%)	124 (36%)
Torgersen	52 (34%)	0 (0%)	0 (0%)	52 (15%)
Bill length (mm)	38.8 (36.8, 40.8)	49.5 (46.3, 51.1)	47.3 (45.3, 49.5)	44.5 (39.2, 48.5)
Unknown	1	0	1	2
Bill depth (mm)	18.40 (17.50, 19.00)	18.45 (17.50, 19.40)	15.00 (14.20, 15.70)	17.30 (15.60, 18.70)
Unknown	1	0	1	2
Flipper length (mm)	190 (186, 195)	196 (191, 201)	216 (212, 221)	197 (190, 213)
Unknown	1	0	1	2
Body mass (g)	3,700 (3,350, 4,000)	3,700 (3,488, 3,950)	5,000 (4,700, 5,500)	4,050 (3,550, 4,750)
Unknown	1	0	1	2
Sex
female	73 (50%)	34 (50%)	58 (49%)	165 (50%)
male	73 (50%)	34 (50%)	61 (51%)	168 (50%)
Unknown	6	0	5	11
Year
2007	50 (33%)	26 (38%)	34 (27%)	110 (32%)
2008	50 (33%)	18 (26%)	46 (37%)	114 (33%)
2009	52 (34%)	24 (35%)	44 (35%)	120 (35%)
¹ n (%); Median (IQR)

Notice that you can use markdown syntax, like including asterisks around text you want to bold or italicize, and it will be rendered appropriately.

Exercises

Create a summary table of the DartPoints data with tbl_summary().
Use the by= argument to summarize by dart type (the variable name is name).
Update the labels as needed.
Add the overall counts for each group.
Remove the “Characteristic” label with modify_header().
Add a caption.
Make the variable labels bold.

Regression summary

Summarizing models works in a similar way to summarizing data tables. Consider this simple linear model of bill length by flipper length and species.

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species, 
  data = penguins
)

Here is the base R summary of the model:

lm_penguins |> summary()
#> 
#> Call:
#> lm(formula = bill_length_mm ~ flipper_length_mm + species, data = penguins)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -6.662 -1.746  0.028  1.825 12.354 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)        -2.0586     4.0386   -0.51                 0.61    
#> flipper_length_mm   0.2151     0.0212   10.13 < 0.0000000000000002 ***
#> speciesChinstrap    8.7801     0.3991   22.00 < 0.0000000000000002 ***
#> speciesGentoo       2.8569     0.6586    4.34             0.000019 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.6 on 338 degrees of freedom
#>   (2 observations deleted due to missingness)
#> Multiple R-squared:  0.776,  Adjusted R-squared:  0.774 
#> F-statistic:  390 on 3 and 338 DF,  p-value: <0.0000000000000002

And here is the gtsummary:

lm_penguins |> tbl_regression()

Characteristic	Beta	95% CI¹	p-value
flipper_length_mm	0.22	0.17, 0.26	<0.001
species
Adelie	—	—
Chinstrap	8.8	8.0, 9.6	<0.001
Gentoo	2.9	1.6, 4.2	<0.001
¹ CI = Confidence Interval

The first thing to note here is that this is just the coefficients table from the summary() output. Instead of “Estimate”, though, it uses the label “Beta” - just another way of referring to the coefficient estimates or estimates of the betas for the different variables. The second and way more important thing to note is that the intercept is not reported by default. I do not know why this is the case, but I recommend you always report it (at least if it’s a model that estimates an intercept). To do that, we add intercept = TRUE. While we’re at it, let’s also rename the variables.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  )

Characteristic	Beta	95% CI¹	p-value
(Intercept)	-2.1	-10, 5.9	0.6
Flipper length (mm)	0.22	0.17, 0.26	<0.001
Species
Adelie	—	—
Chinstrap	8.8	8.0, 9.6	<0.001
Gentoo	2.9	1.6, 4.2	<0.001
¹ CI = Confidence Interval

Also, the test statistic used to generate the p-value is suppressed by default, and the 95% confidence interval is shown in place of the standard error of the estimates. These are some odd design choices by the package authors, maybe related to the fact that they designed the package to work with every model you can think of putting together in R. Whatever the reason, I recommend that you unhide those columns like so.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error))

Characteristic	Beta	SE¹	Statistic	95% CI¹	p-value
(Intercept)	-2.1	4.04	-0.510	-10, 5.9	0.6
Flipper length (mm)	0.22	0.021	10.1	0.17, 0.26	<0.001
Species
Adelie	—	—	—	—
Chinstrap	8.8	0.399	22.0	8.0, 9.6	<0.001
Gentoo	2.9	0.659	4.34	1.6, 4.2	<0.001
¹ SE = Standard Error, CI = Confidence Interval
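If you are wondering where the names statistic and std.error come from, they match the column names that the broom package uses when it tidies a model, and by default tbl_regression() relies on broom::tidy() to extract the coefficients. Here is a small sketch for looking at that table directly. It assumes broom is installed, which it should be if you have the tidyverse; the object name coef_table is just for illustration.

```r
library(palmerpenguins)

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

# Coefficient table as a data frame, with 95% confidence intervals
coef_table <- broom::tidy(lm_penguins, conf.int = TRUE)
coef_table

# Column names you can expect to see in the regression table
names(coef_table)
```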

Finally, the coefficients table doesn’t have information about the model as a whole, like the \(R^2\) value or the F-statistic used in the ANOVA. We can add that information to our table either as additional rows with add_glance_table() or as a footnote with add_glance_source_note(). My own preference is for the latter, but journals will have their own formatting guidelines.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(adj.r.squared, sigma, statistic, df.residual, df, p.value),
    label = list(
      sigma = "σ",
      statistic = "F-statistic"
    )
  )

Characteristic	Beta	SE¹	Statistic	95% CI¹	p-value
(Intercept)	-2.1	4.04	-0.510	-10, 5.9	0.6
Flipper length (mm)	0.22	0.021	10.1	0.17, 0.26	<0.001
Species
Adelie	—	—	—	—
Chinstrap	8.8	0.399	22.0	8.0, 9.6	<0.001
Gentoo	2.9	0.659	4.34	1.6, 4.2	<0.001
Adjusted R² = 0.774; σ = 2.60; F-statistic = 390; Residual df = 338; df = 3; p-value = <0.001
¹ SE = Standard Error, CI = Confidence Interval

By default, a number of statistics will be added to the table, but you should limit these to the ones that are most relevant to evaluating the type of model you have fit. In this case, we fit a linear model, so we included the adjusted \(R^2\), the residual standard error (sigma), and information about the ANOVA.
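To see which statistics are available for a given model, you can call broom::glance() on it yourself, since that is where add_glance_table() and add_glance_source_note() get their values. This is just a quick sketch for inspecting the names you can pass to include; the object name model_stats is made up.

```r
library(palmerpenguins)

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

# One-row data frame of model-level statistics
model_stats <- broom::glance(lm_penguins)

# These names are the options for the include argument
names(model_stats)
```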

Let’s do one more example with a GLM. Here we’ll see if we can differentiate Gentoo penguins from the rest. Hint: they’re the really big ones. Then we’ll walk through the steps of displaying the model.

gentoo <- penguins |> 
  mutate(gentoo = ifelse(species == "Gentoo", 1, 0))

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm, 
  family = binomial,
  data = gentoo
)

glm_penguins |> summary()
#> 
#> Call:
#> glm(formula = gentoo ~ body_mass_g + bill_length_mm, family = binomial, 
#>     data = gentoo)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.3865  -0.1753  -0.0390   0.0622   2.6125  
#> 
#> Coefficients:
#>                  Estimate Std. Error z value          Pr(>|z|)    
#> (Intercept)    -31.990945   4.528334   -7.06 0.000000000001611 ***
#> body_mass_g      0.006257   0.000821    7.62 0.000000000000026 ***
#> bill_length_mm   0.091252   0.057726    1.58              0.11    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 446.80  on 341  degrees of freedom
#> Residual deviance: 115.33  on 339  degrees of freedom
#>   (2 observations deleted due to missingness)
#> AIC: 121.3
#> 
#> Number of Fisher Scoring iterations: 7

glm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  )

Characteristic	log(OR)¹	SE¹	Statistic	95% CI¹	p-value
(Intercept)	-32	4.53	-7.06	-42, -24	<0.001
Body mass (g)	0.01	0.001	7.62	0.00, 0.01	<0.001
Bill length (mm)	0.09	0.058	1.58	-0.02, 0.21	0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

Note that we are given estimates of the coefficients on the logit or log-odds scale. These values can be really hard to interpret, so it’s recommended that you exponentiate them to get odds ratios. The odds here are the probability of being a Gentoo penguin relative to the probability of not being a Gentoo penguin, and an odds ratio tells you by how much those odds are multiplied for a one-unit increase in the variable. To do that, we simply tell tbl_regression() to exponentiate the terms for us.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  )

Characteristic	OR¹	SE¹	Statistic	95% CI¹	p-value
(Intercept)	0.00	4.53	-7.06	0.00, 0.00	<0.001
Body mass (g)	1.01	0.001	7.62	1.00, 1.01	<0.001
Bill length (mm)	1.10	0.058	1.58	0.98, 1.23	0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
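The odds ratio for body mass looks tiny because it is the change in odds for a single gram. As a worked sketch of the arithmetic, we can exponentiate the coefficients by hand and rescale the body mass effect to a 100 g increase. The 100 g step is just my choice for illustration.

```r
library(palmerpenguins)

# Same model as above, rebuilt with base R so this block runs on its own
gentoo <- penguins
gentoo$gentoo <- ifelse(gentoo$species == "Gentoo", 1, 0)

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm,
  family = binomial,
  data = gentoo
)

log_odds <- coef(glm_penguins)  # coefficients on the log-odds scale
odds_ratios <- exp(log_odds)    # the values shown in the OR column

# Odds ratio for a 1 g increase in body mass (about 1.006)
odds_ratios["body_mass_g"]

# Odds ratio for a 100 g increase: exponentiate 100 times the coefficient
or_100g <- exp(100 * log_odds["body_mass_g"])
or_100g
```

So a penguin that is 100 g heavier has odds of being a Gentoo that are a bit less than twice as large, holding bill length constant.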

A final note about formatting. All the things you can change in a data summary table, you can modify in a regression summary table, too.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  modify_header(label ~ "") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()

**Table 2. Gentoo Model**
	OR¹	SE¹	Statistic	95% CI¹	p-value
(Intercept)	0.00	4.53	-7.06	0.00, 0.00	<0.001
Body mass (g)	1.01	0.001	7.62	1.00, 1.01	<0.001
Bill length (mm)	1.10	0.058	1.58	0.98, 1.23	0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

You can also re-order the columns, but it’s a little tricky because of the structure of the summary table. We modify the table body, relocating columns or variables with the dplyr function relocate(), telling it to move the statistic column before the p.value column.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  modify_table_body(~.x |> relocate(statistic, .before = p.value)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  modify_header(label ~ "") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()

**Table 2. Gentoo Model**
	OR¹	SE¹	95% CI¹	Statistic	p-value
(Intercept)	0.00	4.53	0.00, 0.00	-7.06	<0.001
Body mass (g)	1.01	0.001	1.00, 1.01	7.62	<0.001
Bill length (mm)	1.10	0.058	0.98, 1.23	1.58	0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

This ordering makes a smidge more sense because SE and CI are naturally related to each other, as are the test statistic and p-value.

Exercises

Create a regression table for a logistic model fit to the Snodgrass data.
Load in the Snodgrass data with data("Snodgrass").
Make all the variable names lower case with rename_with(tolower).
Use select() to subset the data to include only the area of each structure and whether it is inside the inner walls.
Use a GLM to model whether the structure is found inside the inner walls as a function of its total area.
- Formula should be inside ~ area.
- Set family = binomial and data = Snodgrass.
Summarize the model with summary().
Now create a regression table with tbl_regression().
- Be sure to exponentiate the coefficient estimates.
- Include the intercept.
- Unhide the test statistic and standard error for the coefficient estimates with modify_column_unhide().
- Be sure to re-arrange the columns with modify_table_body(~.x |> relocate(statistic, .before = p.value)).
- Add model statistics with add_glance_source_note().
- Add a caption with modify_caption().
- Remove the “Characteristic” label with modify_header(label ~ "").

Exporting tables

The gtsummary package doesn’t provide tools for exporting tables. Fortunately, it does let you convert its summary tables to gt tables, and the gt package does provide tools for exporting tables.

Here’s the last table we made in the previous section:

gtsummary_penguins <- glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  modify_table_body(~.x |> relocate(statistic, .before = p.value)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  # modify_header(label ~ " ") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()

We convert this to a gt object using as_gt() and save it to disk with the gtsave() function from the gt package. This works similarly to write_csv() and ggsave(). We simply tell it what table we want to save and where we want to save it.

gtsave(
  as_gt(gtsummary_penguins),
  filename = here("manuscript", "model-summary.png")
)

Saving as a png actually involves taking a screenshot of the HTML table and requires that the webshot2 package be installed. You’ll get a note to this effect if you try saving to this format and don’t have webshot2 installed already. You can also save to other formats like PDF (.pdf) and Word (.docx), though the latter probably won’t work right now (there appears to be a bug in the code for gtsave() that is currently being worked on).
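If you just want to check that exporting works without installing webshot2, saving to HTML is a simple option, since gtsave() picks the output format from the file extension. Here is a minimal sketch. It writes to a temporary folder so it runs anywhere; the folder and file names are placeholders, and in the lab project you would point to here("manuscript") instead.

```r
library(gt)
library(palmerpenguins)

small_table <- head(penguins) |>
  gt() |>
  tab_header(title = "Palmer Penguins Data")

# A throwaway folder standing in for the project's manuscript folder
out_dir <- file.path(tempdir(), "manuscript")
dir.create(out_dir, showWarnings = FALSE)

# The .html extension tells gtsave() to write an HTML file
gtsave(small_table, filename = "penguins-table.html", path = out_dir)

# Check that the file was written
file.exists(file.path(out_dir, "penguins-table.html"))
```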

Exercises

Make sure you are in the QAAD project directory.
Add a folder called “manuscript.”
Now try saving the data summary and regression summary tables you made in the last two sections into that folder. Use here("manuscript", <name of file.png>).
Navigate to that folder and check that it successfully saved.

Homework

No homework this week.