Lab 15: Regression Tables

(Stats) Nothing new. (R) How to report results of regression in a table with R.

Published

April 18, 2023

Outline

Objectives

This lab will guide you through the process of

  1. Creating simple display tables
  2. Creating interactive displays for large HTML tables
  3. Building data summary tables
  4. Building regression tables
  5. Exporting tables

R Packages

We will be using the following packages:

⚠️ Don’t forget to install gt and gtsummary with install.packages(c("gt", "gtsummary")). Best to run this in the console!
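As a rough sketch of the setup, here are the library calls that the code in this lab appears to rely on. Only gt and gtsummary are named explicitly above; the rest of the list (archdata for DartPoints and Snodgrass, palmerpenguins for penguins, dplyr for the data wrangling verbs, and here for file paths) is an assumption based on the functions and data sets used below.

```r
# Assumed package list: only gt and gtsummary are named in the lab text.
# install.packages(c("archdata", "dplyr", "gt", "gtsummary", "here", "palmerpenguins"))

library(archdata)        # assumed source of DartPoints and Snodgrass
library(dplyr)           # mutate(), select(), relocate(), rename_with()
library(gt)              # display tables
library(gtsummary)       # data and model summary tables
library(here)            # project-relative file paths
library(palmerpenguins)  # assumed source of the penguins data
```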

Data

Grammar of Tables

Tables have a grammar? 🤔 Well, sort of… the grammar here refers to a cohesive language for describing the parts of a table, not a data table, per se, but a display table, a table meant to represent your data rather than simply store it.

A simple table of data includes column or variable labels and a body (all the rows and cells containing values of the variables). A display table, though, can also include (i) a header containing a title for the whole table (and not just a column!), (ii) a footer with, well, footnotes, and (iii) a “stub”, which includes row or observation labels and groupings of those observations. To create a display table, you can use the eponymous gt() function. It will create a gt object having all the components shown in the figure above, though some may be left out if you do not explicitly specify them. Importantly, you can use gt() to render tables in the most common document formats, namely HTML, LaTeX, and RTF, and you can export them to an even larger number of file types, including HTML, PDF, Word, and PNG (if you want an image of the table). That said, the real power of gt is its support of HTML tables, and in particular interactive tables.
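To make the parts of a display table concrete, here is a minimal sketch that builds one with a header, a footer, and a stub. The toy data frame and all of its values are invented purely for illustration; the gt functions themselves (gt(), tab_stubhead(), tab_header(), tab_source_note()) are real.

```r
library(gt)

# Toy data, made up for this example only
toy <- data.frame(
  region = c("North", "North", "South"),
  site   = c("A", "B", "C"),
  count  = c(12, 7, 30)
)

toy_tbl <- toy |>
  # the stub: row labels come from `site`, row groups from `region`
  gt(rowname_col = "site", groupname_col = "region") |>
  tab_stubhead(label = "Site") |>                           # label above the stub
  tab_header(title = "Toy counts by site") |>               # the header
  tab_source_note("Values are made up for this example.")   # the footer

toy_tbl
```

If you leave out rowname_col and groupname_col, you get a table with no stub, which is what happens in the examples that follow.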

As a simple motivating example, consider the scenario where a reviewer asks you to include a table with all your raw data in the supplement. That’s pretty easy with gt(). In the following example, we’ll also specify the table header with tab_header() and add a footnote with tab_footnote().

head(penguins) |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("*I added a footnote to this table.")
Table 1. Palmer Penguins Data
this is the header
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007
*I added a footnote to this table.

But what if your table has a lot of data, like hundreds of rows? Obviously, you can just dump all that data into a giant table and let the user suffer through navigating it. Alternatively, you know, if you have a 💟, you can use gt() to create an interactive HTML table with search and scroll features. You simply pass the gt table to opt_interactive().

⚠️ A word of caution, this is actually a brand new feature that is under active development, so it might be a smidge buggy. It will also only work in HTML documents, not Word or PDF.

penguins |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("I added a footnote to this table.") |> 
  opt_interactive(
    use_compact_mode = TRUE, # squish table
    use_highlight = TRUE, # highlight rows on mouse hover
    use_page_size_select = TRUE, # specify number of rows displayed
    use_resizers = TRUE, # allow resizing columns
    use_search = TRUE # add a search text box to table
  ) |> 
  tab_options(container.height = px(500))

Notice that I used px(500) to specify the height of the container. The px() function is a helper for specifying height or width in pixels, rather than, say, inches or centimeters.
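A quick sketch of what these helpers return. gt also has a companion helper, pct(), for percentages. The table below uses the built-in mtcars data only so the example is self-contained, and the 80% width is an arbitrary choice.

```r
library(gt)

px(500)  # "500px"
pct(80)  # "80%"

# Hypothetical use: let a small table take up 80% of its container
narrow_tbl <- head(mtcars) |>
  gt() |>
  tab_options(table.width = pct(80))
```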

With the gt package, the sky is the limit on formatting beautiful tables in R. For instance, you can add color codes to columns with data_color() like so.

head(penguins) |> 
  gt() |> 
  tab_caption("Table 1. Palmer Penguins Data") |> 
  tab_header(title = "this is the header") |>  
  tab_footnote("*I added a footnote to this table.") |> 
  data_color(
    columns = bill_length_mm:body_mass_g,
    palette = "BrBG"
  )
Table 1. Palmer Penguins Data
this is the header
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007
*I added a footnote to this table.

And that’s just the tip of the iceberg! To learn more, I recommend perusing the package website and playing around with different functions and settings.
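As one small taste, here is a sketch that tidies up the table from above: it relabels two columns, fixes the number of decimals, and swaps the NA cells for a dash. This assumes penguins comes from the palmerpenguins package, and the particular labels and the dash are just my choices for the example.

```r
library(gt)
library(palmerpenguins)  # assumed source of the penguins data

penguin_tbl <- head(penguins) |>
  gt() |>
  # friendlier column labels for two of the measurements
  cols_label(
    bill_length_mm = "Bill length (mm)",
    bill_depth_mm = "Bill depth (mm)"
  ) |>
  # always show one decimal place in those columns
  fmt_number(columns = c(bill_length_mm, bill_depth_mm), decimals = 1) |>
  # replace missing values everywhere with a dash
  sub_missing(columns = everything(), missing_text = "-")

penguin_tbl
```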

Exercises

  1. Load in the DartPoints data with data("DartPoints").
  2. Make all the variable names lower case with rename_with(tolower).
  3. Subset the data to include only name, tarl (the Smithsonian Trinomial), length, width, and thickness.
  4. Create an interactive table with the DartPoints data.

Data summary

The number and variety of options available in the gt package can be overwhelming. It’s also not capable of summarizing data or models by itself. That’s where the gtsummary package comes in. It’s a wrapper around gt that automatically generates summary tables of data and regression models. To generate a summary table of data, we use the tbl_summary() function. This is similar to the skim() function from the skimr package, but with a slightly different aesthetic. It also gives you more fine-grained control over the output and style, which means it’s a smidge more complicated to work with, too.

penguins |> tbl_summary()
Characteristic N = 344¹
species
    Adelie 152 (44%)
    Chinstrap 68 (20%)
    Gentoo 124 (36%)
island
    Biscoe 168 (49%)
    Dream 124 (36%)
    Torgersen 52 (15%)
bill_length_mm 44.5 (39.2, 48.5)
    Unknown 2
bill_depth_mm 17.30 (15.60, 18.70)
    Unknown 2
flipper_length_mm 197 (190, 213)
    Unknown 2
body_mass_g 4,050 (3,550, 4,750)
    Unknown 2
sex
    female 165 (50%)
    male 168 (50%)
    Unknown 11
year
    2007 110 (32%)
    2008 114 (33%)
    2009 120 (35%)
¹ n (%); Median (IQR)

By default, the summary gives you the counts and percentages for categorical variables and the medians of continuous variables, with their first and third quartiles (the bounds of the interquartile range, IQR) in parentheses. ‘Unknown’ here refers to missing values. These are fine as far as they go, but the styling leaves a lot to be desired, especially if vertical space is prime real estate in whatever document you are creating.
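If those defaults don't suit you, they can be changed. Here is a minimal sketch, assuming penguins comes from the palmerpenguins package, that keeps only a few variables, reports the mean and standard deviation instead of the median and quartiles, and drops the ‘Unknown’ rows to save vertical space. The choice of variables and statistics is arbitrary.

```r
library(gtsummary)
library(palmerpenguins)  # assumed source of the penguins data

compact_summary <- penguins |>
  tbl_summary(
    include = c(bill_length_mm, body_mass_g, sex),          # only these variables
    statistic = list(all_continuous() ~ "{mean} ({sd})"),   # mean (SD) for continuous
    missing = "no"                                          # hide the 'Unknown' rows
  )

compact_summary
```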

In this case, something we might actually care about in our analysis is differences between species, which requires that we summarize the data by species. To do that, we simply specify the species variable in the by= argument.

penguins |> tbl_summary(by = species)
Characteristic Adelie, N = 152¹ Chinstrap, N = 68¹ Gentoo, N = 124¹
island
    Biscoe 44 (29%) 0 (0%) 124 (100%)
    Dream 56 (37%) 68 (100%) 0 (0%)
    Torgersen 52 (34%) 0 (0%) 0 (0%)
bill_length_mm 38.8 (36.8, 40.8) 49.5 (46.3, 51.1) 47.3 (45.3, 49.5)
    Unknown 1 0 1
bill_depth_mm 18.40 (17.50, 19.00) 18.45 (17.50, 19.40) 15.00 (14.20, 15.70)
    Unknown 1 0 1
flipper_length_mm 190 (186, 195) 196 (191, 201) 216 (212, 221)
    Unknown 1 0 1
body_mass_g 3,700 (3,350, 4,000) 3,700 (3,488, 3,950) 5,000 (4,700, 5,500)
    Unknown 1 0 1
sex
    female 73 (50%) 34 (50%) 58 (49%)
    male 73 (50%) 34 (50%) 61 (51%)
    Unknown 6 0 5
year
    2007 50 (33%) 26 (38%) 34 (27%)
    2008 50 (33%) 18 (26%) 46 (37%)
    2009 52 (34%) 24 (35%) 44 (35%)
¹ n (%); Median (IQR)

By default, the table uses the column names from the data as row labels. R doesn’t like spaces in names, so it’s common to see underscores in the labels. We can amend these in the table by passing a named list of replacements to the label argument.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  ) 
Characteristic Adelie, N = 152¹ Chinstrap, N = 68¹ Gentoo, N = 124¹
Island
    Biscoe 44 (29%) 0 (0%) 124 (100%)
    Dream 56 (37%) 68 (100%) 0 (0%)
    Torgersen 52 (34%) 0 (0%) 0 (0%)
Bill length (mm) 38.8 (36.8, 40.8) 49.5 (46.3, 51.1) 47.3 (45.3, 49.5)
    Unknown 1 0 1
Bill depth (mm) 18.40 (17.50, 19.00) 18.45 (17.50, 19.40) 15.00 (14.20, 15.70)
    Unknown 1 0 1
Flipper length (mm) 190 (186, 195) 196 (191, 201) 216 (212, 221)
    Unknown 1 0 1
Body mass (g) 3,700 (3,350, 4,000) 3,700 (3,488, 3,950) 5,000 (4,700, 5,500)
    Unknown 1 0 1
Sex
    female 73 (50%) 34 (50%) 58 (49%)
    male 73 (50%) 34 (50%) 61 (51%)
    Unknown 6 0 5
Year
    2007 50 (33%) 26 (38%) 34 (27%)
    2008 50 (33%) 18 (26%) 46 (37%)
    2009 52 (34%) 24 (35%) 44 (35%)
¹ n (%); Median (IQR)

The gtsummary package also offers some helper functions for adding various columns to a summary table. For example, you can add a column that summarizes all observations, ignoring the grouping, using the add_overall() function.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  ) |> 
  add_overall(last = TRUE)
Characteristic Adelie, N = 152¹ Chinstrap, N = 68¹ Gentoo, N = 124¹ Overall, N = 344¹
Island
    Biscoe 44 (29%) 0 (0%) 124 (100%) 168 (49%)
    Dream 56 (37%) 68 (100%) 0 (0%) 124 (36%)
    Torgersen 52 (34%) 0 (0%) 0 (0%) 52 (15%)
Bill length (mm) 38.8 (36.8, 40.8) 49.5 (46.3, 51.1) 47.3 (45.3, 49.5) 44.5 (39.2, 48.5)
    Unknown 1 0 1 2
Bill depth (mm) 18.40 (17.50, 19.00) 18.45 (17.50, 19.40) 15.00 (14.20, 15.70) 17.30 (15.60, 18.70)
    Unknown 1 0 1 2
Flipper length (mm) 190 (186, 195) 196 (191, 201) 216 (212, 221) 197 (190, 213)
    Unknown 1 0 1 2
Body mass (g) 3,700 (3,350, 4,000) 3,700 (3,488, 3,950) 5,000 (4,700, 5,500) 4,050 (3,550, 4,750)
    Unknown 1 0 1 2
Sex
    female 73 (50%) 34 (50%) 58 (49%) 165 (50%)
    male 73 (50%) 34 (50%) 61 (51%) 168 (50%)
    Unknown 6 0 5 11
Year
    2007 50 (33%) 26 (38%) 34 (27%) 110 (32%)
    2008 50 (33%) 18 (26%) 46 (37%) 114 (33%)
    2009 52 (34%) 24 (35%) 44 (35%) 120 (35%)
¹ n (%); Median (IQR)
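Another helper of this kind is add_n(), which adds a column with the number of non-missing observations for each variable. The sketch below assumes penguins comes from the palmerpenguins package and uses only two variables to keep the table short.

```r
library(gtsummary)
library(palmerpenguins)  # assumed source of the penguins data

summary_with_n <- penguins |>
  tbl_summary(
    by = species,
    include = c(bill_length_mm, sex)
  ) |>
  add_overall(last = TRUE) |>  # summary over all species, placed last
  add_n()                      # count of non-missing values per variable

summary_with_n
```

With add_n() in place, you could arguably drop the ‘Unknown’ rows, since the missing values are already implied by the counts.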

We can also add some custom formatting, like adding a caption, removing the “Characteristic” label (that column should be obvious), and making the variable names or labels bold.

penguins |> 
  tbl_summary(
    by = species,
    label = list(
      island = "Island",
      bill_length_mm = "Bill length (mm)",
      bill_depth_mm = "Bill depth (mm)",
      flipper_length_mm = "Flipper length (mm)",
      body_mass_g = "Body mass (g)",
      sex = "Sex",
      year = "Year"
    )
  ) |> 
  add_overall(last = TRUE) |> 
  modify_header(label ~ "") |>
  modify_caption("**Table 1. Penguin Characteristics**") |> 
  bold_labels()
Table 1. Penguin Characteristics
Adelie, N = 152¹ Chinstrap, N = 68¹ Gentoo, N = 124¹ Overall, N = 344¹
Island
    Biscoe 44 (29%) 0 (0%) 124 (100%) 168 (49%)
    Dream 56 (37%) 68 (100%) 0 (0%) 124 (36%)
    Torgersen 52 (34%) 0 (0%) 0 (0%) 52 (15%)
Bill length (mm) 38.8 (36.8, 40.8) 49.5 (46.3, 51.1) 47.3 (45.3, 49.5) 44.5 (39.2, 48.5)
    Unknown 1 0 1 2
Bill depth (mm) 18.40 (17.50, 19.00) 18.45 (17.50, 19.40) 15.00 (14.20, 15.70) 17.30 (15.60, 18.70)
    Unknown 1 0 1 2
Flipper length (mm) 190 (186, 195) 196 (191, 201) 216 (212, 221) 197 (190, 213)
    Unknown 1 0 1 2
Body mass (g) 3,700 (3,350, 4,000) 3,700 (3,488, 3,950) 5,000 (4,700, 5,500) 4,050 (3,550, 4,750)
    Unknown 1 0 1 2
Sex
    female 73 (50%) 34 (50%) 58 (49%) 165 (50%)
    male 73 (50%) 34 (50%) 61 (51%) 168 (50%)
    Unknown 6 0 5 11
Year
    2007 50 (33%) 26 (38%) 34 (27%) 110 (32%)
    2008 50 (33%) 18 (26%) 46 (37%) 114 (33%)
    2009 52 (34%) 24 (35%) 44 (35%) 120 (35%)
¹ n (%); Median (IQR)

Notice that you can use markdown syntax, like including asterisks around text you want to bold or italicize, and it will be rendered appropriately.
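The same markdown syntax works in other gtsummary text helpers. As a sketch, here is an italic caption together with a bold spanning header over the species columns. It assumes penguins comes from the palmerpenguins package, and the caption and header text are placeholders.

```r
library(gtsummary)
library(palmerpenguins)  # assumed source of the penguins data

markdown_tbl <- penguins |>
  tbl_summary(
    by = species,
    include = c(island, body_mass_g)
  ) |>
  # double asterisks for bold: one header spanning all the species columns
  modify_spanning_header(all_stat_cols() ~ "**Species**") |>
  # single asterisks for italics
  modify_caption("*Table 1. Penguin Characteristics*")

markdown_tbl
```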

Exercises

  1. Create a summary table of the DartPoints data with tbl_summary().
  2. Use the by= argument to summarize by dart type (the variable name is name).
  3. Update the labels as needed.
  4. Add the overall counts for each group.
  5. Remove the “Characteristic” label with modify_header().
  6. Add a caption.
  7. Make the variable labels bold.

Regression summary

Summarizing models works in a similar way to summarizing data tables. Consider this simple linear model of bill length by flipper length and species.

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species, 
  data = penguins
)

Here is the base R summary of the model:

lm_penguins |> summary()
#> 
#> Call:
#> lm(formula = bill_length_mm ~ flipper_length_mm + species, data = penguins)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -6.662 -1.746  0.028  1.825 12.354 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)        -2.0586     4.0386   -0.51                 0.61    
#> flipper_length_mm   0.2151     0.0212   10.13 < 0.0000000000000002 ***
#> speciesChinstrap    8.7801     0.3991   22.00 < 0.0000000000000002 ***
#> speciesGentoo       2.8569     0.6586    4.34             0.000019 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.6 on 338 degrees of freedom
#>   (2 observations deleted due to missingness)
#> Multiple R-squared:  0.776,  Adjusted R-squared:  0.774 
#> F-statistic:  390 on 3 and 338 DF,  p-value: <0.0000000000000002

And here is the gtsummary:

lm_penguins |> tbl_regression()
Characteristic Beta 95% CI¹ p-value
flipper_length_mm 0.22 0.17, 0.26 <0.001
species
    Adelie
    Chinstrap 8.8 8.0, 9.6 <0.001
    Gentoo 2.9 1.6, 4.2 <0.001
¹ CI = Confidence Interval

The first thing to note here is that this is just the coefficients table from the summary() output. Instead of “Estimate”, though, it uses the label “Beta” - just another way of referring to the coefficient estimates or estimates of the betas for the different variables. The second and way more important thing to note is that the intercept is not reported by default. I do not know why this is the case, but I recommend you always report it (at least if it’s a model that estimates an intercept). To do that, we add intercept = TRUE. While we’re at it, let’s also rename the variables.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  )
Characteristic Beta 95% CI¹ p-value
(Intercept) -2.1 -10, 5.9 0.6
Flipper length (mm) 0.22 0.17, 0.26 <0.001
Species
    Adelie
    Chinstrap 8.8 8.0, 9.6 <0.001
    Gentoo 2.9 1.6, 4.2 <0.001
¹ CI = Confidence Interval
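If you want to convince yourself that these are the same numbers base R gives you, here is a small sketch that pulls the coefficients table out of summary() and puts the 95% confidence intervals from confint() next to it. It refits the model so the block runs on its own and assumes penguins comes from the palmerpenguins package.

```r
library(palmerpenguins)  # assumed source of the penguins data

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

# estimates, standard errors, t values, and p-values as a matrix
coef_table <- coef(summary(lm_penguins))

# lower and upper bounds of the 95% confidence intervals
ci <- confint(lm_penguins, level = 0.95)

# both are indexed by term, so they can be bound side by side
cbind(coef_table, ci)
```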

Also, the test statistic used to generate the p-value is currently hidden, and the 95% confidence interval is shown in place of the standard error of the estimates. These are some odd design choices by the package authors, maybe related to the fact that they designed the package to work with every model you can think of putting together in R. Whatever the reason, I recommend that you unhide those columns like so.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error))
Characteristic Beta SE¹ Statistic 95% CI¹ p-value
(Intercept) -2.1 4.04 -0.510 -10, 5.9 0.6
Flipper length (mm) 0.22 0.021 10.1 0.17, 0.26 <0.001
Species
    Adelie
    Chinstrap 8.8 0.399 22.0 8.0, 9.6 <0.001
    Gentoo 2.9 0.659 4.34 1.6, 4.2 <0.001
¹ SE = Standard Error, CI = Confidence Interval

Finally, the coefficients table doesn’t have information about the model as a whole, like the \(R^2\) value or the F-statistic used in the ANOVA. We can add that information to our table either as additional rows with add_glance_table() or as a footnote with add_glance_source_note(). My own preference is for the latter, but journals will have their own formatting guidelines.

lm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      bill_length_mm = "Bill length (mm)",
      flipper_length_mm = "Flipper length (mm)",
      species = "Species"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(adj.r.squared, sigma, statistic, df.residual, df, p.value),
    label = list(
      sigma = "σ",
      statistic = "F-statistic"
    )
  )
Characteristic Beta SE¹ Statistic 95% CI¹ p-value
(Intercept) -2.1 4.04 -0.510 -10, 5.9 0.6
Flipper length (mm) 0.22 0.021 10.1 0.17, 0.26 <0.001
Species
    Adelie
    Chinstrap 8.8 0.399 22.0 8.0, 9.6 <0.001
    Gentoo 2.9 0.659 4.34 1.6, 4.2 <0.001
Adjusted R² = 0.774; σ = 2.60; F-statistic = 390; Residual df = 338; df = 3; p-value = <0.001
¹ SE = Standard Error, CI = Confidence Interval

By default, a number of statistics will be added to the table, but you should limit these to the ones that are most relevant to evaluating the type of model you have fit. In this case, we fit a linear model, so we included the adjusted \(R^2\), the residual standard error (sigma), and information about the ANOVA.
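For comparison, here is a sketch of the other option mentioned above, add_glance_table(), which puts the model statistics in extra rows at the bottom of the table rather than in a source note. The model is refit so the block runs on its own, penguins is assumed to come from the palmerpenguins package, and the set of statistics is just one reasonable choice for a linear model.

```r
library(gtsummary)
library(palmerpenguins)  # assumed source of the penguins data

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

glance_rows_tbl <- lm_penguins |>
  tbl_regression(intercept = TRUE) |>
  # model-level statistics as additional rows instead of a source note
  add_glance_table(
    include = c(adj.r.squared, sigma, statistic, p.value)
  )

glance_rows_tbl
```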

Let’s do one more example with a GLM. Here we’ll see if we can differentiate Gentoo penguins from the rest. Hint: they’re the really big ones. Then we’ll walk through the steps of displaying the model.

gentoo <- penguins |> 
  mutate(gentoo = ifelse(species == "Gentoo", 1, 0))

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm, 
  family = binomial,
  data = gentoo
)

glm_penguins |> summary()
#> 
#> Call:
#> glm(formula = gentoo ~ body_mass_g + bill_length_mm, family = binomial, 
#>     data = gentoo)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.3865  -0.1753  -0.0390   0.0622   2.6125  
#> 
#> Coefficients:
#>                  Estimate Std. Error z value          Pr(>|z|)    
#> (Intercept)    -31.990945   4.528334   -7.06 0.000000000001611 ***
#> body_mass_g      0.006257   0.000821    7.62 0.000000000000026 ***
#> bill_length_mm   0.091252   0.057726    1.58              0.11    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 446.80  on 341  degrees of freedom
#> Residual deviance: 115.33  on 339  degrees of freedom
#>   (2 observations deleted due to missingness)
#> AIC: 121.3
#> 
#> Number of Fisher Scoring iterations: 7

And here is the gtsummary table for the same model:

glm_penguins |> 
  tbl_regression(
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  )
Characteristic log(OR)¹ SE¹ Statistic 95% CI¹ p-value
(Intercept) -32 4.53 -7.06 -42, -24 <0.001
Body mass (g) 0.01 0.001 7.62 0.00, 0.01 <0.001
Bill length (mm) 0.09 0.058 1.58 -0.02, 0.21 0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

Note that we are given estimates of the coefficients on the logit or log odds scale. These values can be really hard to interpret, so it’s recommended that you exponentiate them to get odds ratios. Here the odds are the probability of being a Gentoo penguin relative to the probability of not being a Gentoo penguin, and each odds ratio is the multiplicative change in those odds for a one-unit increase in the variable. To do that, we simply tell tbl_regression() to exponentiate the terms for us.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  )
Characteristic OR¹ SE¹ Statistic 95% CI¹ p-value
(Intercept) 0.00 4.53 -7.06 0.00, 0.00 <0.001
Body mass (g) 1.01 0.001 7.62 1.00, 1.01 <0.001
Bill length (mm) 1.10 0.058 1.58 0.98, 1.23 0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
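To see where the odds ratios come from, here is the arithmetic done by hand with the two slope estimates copied from the summary() output above. Notice in the table that only the estimates and confidence intervals change; the SE and Statistic columns are the same as in the log odds table. The rescaling to 100 g at the end is my own addition, not something tbl_regression() does for you.

```r
# slope estimates on the log odds scale, copied from summary(glm_penguins)
log_odds <- c(body_mass_g = 0.006257, bill_length_mm = 0.091252)

# exponentiate to get odds ratios
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)  # 1.01 and 1.10, matching the OR column

# a 1 g change is tiny, so the odds ratio for a 100 g increase
# in body mass is easier to talk about (roughly 1.87)
or_per_100g <- exp(100 * log_odds["body_mass_g"])
or_per_100g
```

In words, each additional 100 g of body mass multiplies the odds of being a Gentoo by a little under two, holding bill length fixed.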

A final note about formatting. All the things you can change in a data summary table, you can modify in a regression summary table, too.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  modify_header(label ~ "") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()
Table 2. Gentoo Model
OR¹ SE¹ Statistic 95% CI¹ p-value
(Intercept) 0.00 4.53 -7.06 0.00, 0.00 <0.001
Body mass (g) 1.01 0.001 7.62 1.00, 1.01 <0.001
Bill length (mm) 1.10 0.058 1.58 0.98, 1.23 0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

You can also re-order the columns, but it’s a little tricky because of the structure of the summary table. We modify the table body, relocating columns or variables with the dplyr function relocate(), telling it to move the statistic column before the p.value column.

glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  modify_table_body(~.x |> relocate(statistic, .before = p.value)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  modify_header(label ~ "") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()
Table 2. Gentoo Model
OR¹ SE¹ 95% CI¹ Statistic p-value
(Intercept) 0.00 4.53 0.00, 0.00 -7.06 <0.001
Body mass (g) 1.01 0.001 1.00, 1.01 7.62 <0.001
Bill length (mm) 1.10 0.058 0.98, 1.23 1.58 0.11
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval

This ordering makes a smidge more sense because SE and CI are naturally related to each other, as are the test statistic and p-value.

Exercises

  1. Create a regression table for a logistic model fit to the Snodgrass data.
  2. Load in the Snodgrass data with data("Snodgrass").
  3. Make all the variable names lower case with rename_with(tolower).
  4. Use select() to subset the data to include only the area of each structure and whether it is inside the inner walls.
  5. Use a GLM to model whether the structure is found inside the inner walls as a function of its total area.
    • Formula should be inside ~ area.
    • Set family = binomial and data = Snodgrass.
  6. Summarize the model with summary().
  7. Now create a regression table with tbl_regression().
    • Be sure to exponentiate the coefficient estimates.
    • Include the intercept.
    • Unhide the test statistic and standard error for the coefficient estimates with modify_column_unhide().
    • Be sure to re-arrange the columns with modify_table_body(~.x |> relocate(statistic, .before = p.value)).
    • Add model statistics with add_glance_source_note().
    • Add a caption with modify_caption().
    • Remove the “Characteristic” label with modify_header(label ~ "").

Exporting tables

The gtsummary package doesn’t provide tools for exporting tables. Fortunately, it does let you convert its summary tables to gt tables, and the gt package does provide tools for exporting tables.

Here’s the last table we made in the previous section:

gtsummary_penguins <- glm_penguins |> 
  tbl_regression(
    exponentiate = TRUE,
    intercept = TRUE,
    label = list(
      gentoo = "Gentoo",
      body_mass_g = "Body mass (g)",
      bill_length_mm = "Bill length (mm)"
    )
  ) |> 
  modify_column_unhide(columns = c(statistic, std.error)) |> 
  modify_table_body(~.x |> relocate(statistic, .before = p.value)) |> 
  add_glance_source_note(
    include = c(AIC, null.deviance, df.null, deviance, df.residual)
  ) |>
  # modify_header(label ~ " ") |>
  modify_caption("**Table 2. Gentoo Model**") |> 
  bold_labels()

We convert this to a gt object using as_gt() and save it to disk with the gtsave() function from the gt package. This works similarly to write_csv() and ggsave(). We simply tell it what table we want to save and where we want to save it.

gtsave(
  as_gt(gtsummary_penguins),
  filename = here("manuscript", "model-summary.png")
)

Saving as a PNG actually involves taking a screenshot of the HTML table and requires that the webshot2 package be installed. You’ll get a note to this effect if you try saving to this format and don’t have webshot2 installed already. You can also save to other formats like PDF (.pdf) and Word (.docx), though the latter probably won’t work right now (there appears to be a bug in the code for gtsave() that is currently being worked on).
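If you just want to check that saving works without installing anything extra, HTML is the easiest target, since it does not need a screenshot. Here is a minimal sketch that writes a small table to a temporary folder. The folder, the file name, and the use of the built-in mtcars data are all placeholders chosen so the block runs on its own; in the lab you would point it at your manuscript folder with here() instead.

```r
library(gt)

# hypothetical output folder inside R's temporary directory
out_dir <- file.path(tempdir(), "manuscript")
dir.create(out_dir, showWarnings = FALSE)

# any gt table will do; this one is just a stand-in
example_tbl <- head(mtcars) |> gt()

# the file extension decides the output format
gtsave(example_tbl, filename = "example-table.html", path = out_dir)

# confirm the file was written
saved_ok <- file.exists(file.path(out_dir, "example-table.html"))
saved_ok
```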

Exercises

  1. Make sure you are in the QAAD project directory.
  2. Add a folder called “manuscript.”
  3. Now try saving the data summary and regression summary tables you made in the last two sections into that folder. Use here("manuscript", <name of file.png>).
  4. Navigate to that folder and check that it successfully saved.

Homework

No homework this week.