Lab 15: Regression Tables
(Stats) Nothing new. (R) How to report results of regression in a table with R.
Outline
Objectives
This lab will guide you through the process of
- Creating simple display tables
- Creating interactive displays for large HTML tables
- Building data summary tables
- Building regression tables
- Exporting tables
R Packages
We will be using the following packages:
⚠️ Don’t forget to install gt and gtsummary with install.packages(c("gt", "gtsummary")). Best to run this in the console!
Data
- DartPoints
  - Includes measurements of 91 Archaic dart points recovered during surface surveys at Fort Hood, Texas.
  - package: archdata
  - reference: https://cran.r-project.org/web/packages/archdata/archdata.pdf
- penguins
  - Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
  - package: palmerpenguins
  - reference: https://allisonhorst.github.io/palmerpenguins/reference/penguins.html
- Snodgrass
  - Includes measurements of size, location, and contents of 91 pit houses at the Snodgrass site in Butler County, Missouri.
  - package: archdata
  - reference: https://cran.r-project.org/web/packages/archdata/archdata.pdf
Grammar of Tables
Tables have a grammar? 🤔 Well, sort of… the grammar here refers to a cohesive language for describing the parts of a table. Not a data table, per se, but a display table: a table meant to present your data rather than simply store it.
A simple table of data includes column or variable labels and a body (all the rows and cells containing values of the variables). A display table, though, can also include (i) a header containing a title for the whole table (and not just a column!), (ii) a footer with, well, footnotes, and (iii) a “stub”, which includes row or observation labels and groupings of those observations. To create a display table, you can use the eponymous gt() function. It will create a gt object having all the components shown in the figure above, though some may be excluded if you do not explicitly specify them. Importantly, gt() can generate tables in the most common formats, namely HTML, LaTeX, and RTF, and gt can export them to an even larger number of formats, including HTML, PDF, Word, and PNG (if you want an image of the table). That said, the real power of gt is its support for HTML tables, and in particular interactive tables.
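To see the stub, row groups, header, and footer described above in one place, here is a small sketch. It assumes dplyr and palmerpenguins are installed alongside gt; the two-rows-per-species subset, the id column, and the object names are my own inventions for illustration.

```r
library(gt)
library(dplyr)
library(palmerpenguins)

# Hypothetical subset: the first two penguins of each species, each given an ID
penguin_sample <- penguins |>
  group_by(species) |>
  slice_head(n = 2) |>
  ungroup() |>
  mutate(id = paste0("P", row_number())) |>
  select(id, species, island, body_mass_g)

# rowname_col puts id in the stub (row labels);
# groupname_col turns species into row groups within the stub
penguin_gt <- penguin_sample |>
  gt(rowname_col = "id", groupname_col = "species") |>
  tab_header(title = "Palmer penguins", subtitle = "Two birds per species") |> # header
  tab_stubhead(label = "Bird") |>                                              # label above the stub
  tab_source_note(source_note = "Data: palmerpenguins package")                # footer

penguin_gt
```

Printing penguin_gt in an HTML document should show the species as group headings, with the IDs running down the left-hand stub.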
As a simple motivating example, consider the scenario where a reviewer asks you to include a table with all your raw data in the supplement. That’s pretty easy with gt(). In the following example, we’ll also set a caption with tab_caption(), specify the table header with tab_header(), and add a footnote with tab_footnote().
head(penguins) |>
gt() |>
tab_caption("Table 1. Palmer Penguins Data") |>
tab_header(title = "this is the header") |>
tab_footnote("*I added a footnote to this table.")
this is the header
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
*I added a footnote to this table.
But what if your table has a lot of data, like hundreds of rows? Obviously, you can just dump all that data into a giant table and let the user suffer through navigating it. Alternatively, you know, if you have a 💟, you can use gt() to create an interactive HTML table with search and scroll features. You simply pass the gt table to opt_interactive().
⚠️ A word of caution: this is actually a brand new feature that is under active development, so it might be a smidge buggy. It also only works in HTML documents, not Word or PDF.
penguins |>
gt() |>
tab_caption("Table 1. Palmer Penguins Data") |>
tab_header(title = "this is the header") |>
tab_footnote("I added a footnote to this table.") |>
opt_interactive(
use_compact_mode = TRUE, # squish table
use_highlight = TRUE, # highlight rows on mouse hover
use_page_size_select = TRUE, # let readers choose how many rows to show per page
use_resizers = TRUE, # allow resizing columns
use_search = TRUE # add a search text box to table
) |>
tab_options(container.height = px(500))
Notice that I used px(500) to specify the height of the container. The px() function is a helper for specifying height or width in pixels, rather than, say, inches or centimeters.
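Besides px(), gt also has a pct() helper for percentages. A minimal sketch of both, assuming palmerpenguins is loaded; the particular sizes are arbitrary choices for illustration:

```r
library(gt)
library(palmerpenguins)

# px() and pct() build CSS length strings (pixels and percentages)
px(500)
pct(80)

# Use them to size the table and its container (these options affect HTML output)
sized_gt <- penguins |>
  gt() |>
  tab_options(
    table.width = pct(80),      # table spans 80% of the available width
    container.height = px(300)  # container is 300 pixels tall; extra rows scroll
  )
```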
With the gt package, the sky is the limit on formatting beautiful tables in R. For instance, you can color-code columns with data_color() like so.
head(penguins) |>
gt() |>
tab_caption("Table 1. Palmer Penguins Data") |>
tab_header(title = "this is the header") |>
tab_footnote("*I added a footnote to this table.") |>
data_color(
columns = bill_length_mm:body_mass_g,
palette = "BrBG"
)
this is the header
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
*I added a footnote to this table.
And that’s just the tip of the iceberg! To learn more, I recommend perusing the package website and playing around with different functions and settings.
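As one more taste of what gt can do, here is a sketch that relabels the columns with cols_label(), groups the measurement columns under a spanner with tab_spanner(), and formats body mass with fmt_number(). The labels and the spanner name are arbitrary choices for illustration.

```r
library(gt)
library(palmerpenguins)

labelled_gt <- head(penguins) |>
  gt() |>
  tab_header(title = "Palmer penguins") |>
  # Replace raw column names with readable labels
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm = "Bill length (mm)",
    bill_depth_mm = "Bill depth (mm)",
    flipper_length_mm = "Flipper length (mm)",
    body_mass_g = "Body mass (g)",
    sex = "Sex",
    year = "Year"
  ) |>
  # Put the four measurement columns under one shared label
  tab_spanner(
    label = "Measurements",
    columns = bill_length_mm:body_mass_g
  ) |>
  # Show body mass with a thousands separator and no decimals
  fmt_number(columns = body_mass_g, decimals = 0)

labelled_gt
```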
Exercises
- Load in the DartPoints data with data("DartPoints").
- Make all the variable names lower case with rename_with(tolower).
- Subset the data to include only name, tarl (the Smithsonian Trinomial), length, width, and thickness.
- Create an interactive table with the DartPoints data.
  - Add a header and footnote with tab_header() and tab_footnote().
  - Specify the height of the table’s container.
  - Try experimenting with the different arguments you can pass to opt_interactive() to see what they do.
Data summary
The number and variety of options available in the gt package can be overwhelming. gt also can’t summarize data or models by itself. That’s where the gtsummary package comes in. It’s a wrapper around gt that automatically generates summary tables of data and regression models. To generate a summary table of data, we use the tbl_summary() function. This is similar to the skim() function from the skimr package, but with a slightly different aesthetic. It also gives you more fine-grained control over the output and style, which means it’s a smidge more complicated to work with, too.
penguins |> tbl_summary()
Characteristic | N = 344¹ |
---|---|
species | |
Adelie | 152 (44%) |
Chinstrap | 68 (20%) |
Gentoo | 124 (36%) |
island | |
Biscoe | 168 (49%) |
Dream | 124 (36%) |
Torgersen | 52 (15%) |
bill_length_mm | 44.5 (39.2, 48.5) |
Unknown | 2 |
bill_depth_mm | 17.30 (15.60, 18.70) |
Unknown | 2 |
flipper_length_mm | 197 (190, 213) |
Unknown | 2 |
body_mass_g | 4,050 (3,550, 4,750) |
Unknown | 2 |
sex | |
female | 165 (50%) |
male | 168 (50%) |
Unknown | 11 |
year | |
2007 | 110 (32%) |
2008 | 114 (33%) |
2009 | 120 (35%) |
¹ n (%); Median (IQR)
By default, the summary gives you counts and percentages for categorical variables and, for continuous variables, the median with the first and third quartiles in parentheses (the bounds of the interquartile range, IQR). ‘Unknown’ here refers to missing values. These are fine as far as they go, but the styling leaves a lot to be desired, especially if vertical space is prime real estate in whatever document you are creating.
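If vertical space is tight, two tbl_summary() arguments may help: missing = "no" drops the ‘Unknown’ rows, and statistic lets you choose a different summary for continuous variables, such as mean (SD). A sketch, restricted with include to a few variables I picked arbitrarily:

```r
library(gtsummary)
library(palmerpenguins)

compact_summary <- penguins |>
  tbl_summary(
    include = c(bill_length_mm, flipper_length_mm, body_mass_g, sex), # only these variables
    statistic = list(all_continuous() ~ "{mean} ({sd})"),           # mean (SD) instead of median (IQR)
    missing = "no"                                                   # drop the "Unknown" rows
  )

compact_summary
```

Whether mean (SD) or median (IQR) is the better choice depends on how skewed your variables are.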
In this case, something we might actually care about in our analysis is differences between species, which requires that we summarize the data by species. To do that, we simply specify the species variable in the by= argument.
penguins |> tbl_summary(by = species)
Characteristic | Adelie, N = 152¹ | Chinstrap, N = 68¹ | Gentoo, N = 124¹ |
---|---|---|---|
island | |||
Biscoe | 44 (29%) | 0 (0%) | 124 (100%) |
Dream | 56 (37%) | 68 (100%) | 0 (0%) |
Torgersen | 52 (34%) | 0 (0%) | 0 (0%) |
bill_length_mm | 38.8 (36.8, 40.8) | 49.5 (46.3, 51.1) | 47.3 (45.3, 49.5) |
Unknown | 1 | 0 | 1 |
bill_depth_mm | 18.40 (17.50, 19.00) | 18.45 (17.50, 19.40) | 15.00 (14.20, 15.70) |
Unknown | 1 | 0 | 1 |
flipper_length_mm | 190 (186, 195) | 196 (191, 201) | 216 (212, 221) |
Unknown | 1 | 0 | 1 |
body_mass_g | 3,700 (3,350, 4,000) | 3,700 (3,488, 3,950) | 5,000 (4,700, 5,500) |
Unknown | 1 | 0 | 1 |
sex | |||
female | 73 (50%) | 34 (50%) | 58 (49%) |
male | 73 (50%) | 34 (50%) | 61 (51%) |
Unknown | 6 | 0 | 5 |
year | |||
2007 | 50 (33%) | 26 (38%) | 34 (27%) |
2008 | 50 (33%) | 18 (26%) | 46 (37%) |
2009 | 52 (34%) | 24 (35%) | 44 (35%) |
¹ n (%); Median (IQR)
By default, the table labels each variable with its column name from the data. R doesn’t like spaces in names, so it’s common to see underscores in those labels. We can replace them by passing all the changes as a named list to the label argument.
penguins |>
tbl_summary(
by = species,
label = list(
island = "Island",
bill_length_mm = "Bill length (mm)",
bill_depth_mm = "Bill depth (mm)",
flipper_length_mm = "Flipper length (mm)",
body_mass_g = "Body mass (g)",
sex = "Sex",
year = "Year"
)
)
Characteristic | Adelie, N = 152¹ | Chinstrap, N = 68¹ | Gentoo, N = 124¹ |
---|---|---|---|
Island | |||
Biscoe | 44 (29%) | 0 (0%) | 124 (100%) |
Dream | 56 (37%) | 68 (100%) | 0 (0%) |
Torgersen | 52 (34%) | 0 (0%) | 0 (0%) |
Bill length (mm) | 38.8 (36.8, 40.8) | 49.5 (46.3, 51.1) | 47.3 (45.3, 49.5) |
Unknown | 1 | 0 | 1 |
Bill depth (mm) | 18.40 (17.50, 19.00) | 18.45 (17.50, 19.40) | 15.00 (14.20, 15.70) |
Unknown | 1 | 0 | 1 |
Flipper length (mm) | 190 (186, 195) | 196 (191, 201) | 216 (212, 221) |
Unknown | 1 | 0 | 1 |
Body mass (g) | 3,700 (3,350, 4,000) | 3,700 (3,488, 3,950) | 5,000 (4,700, 5,500) |
Unknown | 1 | 0 | 1 |
Sex | |||
female | 73 (50%) | 34 (50%) | 58 (49%) |
male | 73 (50%) | 34 (50%) | 61 (51%) |
Unknown | 6 | 0 | 5 |
Year | |||
2007 | 50 (33%) | 26 (38%) | 34 (27%) |
2008 | 50 (33%) | 18 (26%) | 46 (37%) |
2009 | 52 (34%) | 24 (35%) | 44 (35%) |
¹ n (%); Median (IQR)
The gtsummary package also offers some helper functions for adding various columns to a summary table. For example, you can add a column summarizing all the groups combined using the add_overall() function.
penguins |>
tbl_summary(
by = species,
label = list(
island = "Island",
bill_length_mm = "Bill length (mm)",
bill_depth_mm = "Bill depth (mm)",
flipper_length_mm = "Flipper length (mm)",
body_mass_g = "Body mass (g)",
sex = "Sex",
year = "Year"
)
) |>
add_overall(last = TRUE)
Characteristic | Adelie, N = 152¹ | Chinstrap, N = 68¹ | Gentoo, N = 124¹ | Overall, N = 344¹ |
---|---|---|---|---|
Island | ||||
Biscoe | 44 (29%) | 0 (0%) | 124 (100%) | 168 (49%) |
Dream | 56 (37%) | 68 (100%) | 0 (0%) | 124 (36%) |
Torgersen | 52 (34%) | 0 (0%) | 0 (0%) | 52 (15%) |
Bill length (mm) | 38.8 (36.8, 40.8) | 49.5 (46.3, 51.1) | 47.3 (45.3, 49.5) | 44.5 (39.2, 48.5) |
Unknown | 1 | 0 | 1 | 2 |
Bill depth (mm) | 18.40 (17.50, 19.00) | 18.45 (17.50, 19.40) | 15.00 (14.20, 15.70) | 17.30 (15.60, 18.70) |
Unknown | 1 | 0 | 1 | 2 |
Flipper length (mm) | 190 (186, 195) | 196 (191, 201) | 216 (212, 221) | 197 (190, 213) |
Unknown | 1 | 0 | 1 | 2 |
Body mass (g) | 3,700 (3,350, 4,000) | 3,700 (3,488, 3,950) | 5,000 (4,700, 5,500) | 4,050 (3,550, 4,750) |
Unknown | 1 | 0 | 1 | 2 |
Sex | ||||
female | 73 (50%) | 34 (50%) | 58 (49%) | 165 (50%) |
male | 73 (50%) | 34 (50%) | 61 (51%) | 168 (50%) |
Unknown | 6 | 0 | 5 | 11 |
Year | ||||
2007 | 50 (33%) | 26 (38%) | 34 (27%) | 110 (32%) |
2008 | 50 (33%) | 18 (26%) | 46 (37%) | 114 (33%) |
2009 | 52 (34%) | 24 (35%) | 44 (35%) | 120 (35%) |
¹ n (%); Median (IQR)
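add_overall() isn’t the only helper of this kind. As a sketch, add_n() adds a column with the number of non-missing observations for each variable (the variable subset here is an arbitrary choice to keep the table short):

```r
library(gtsummary)
library(palmerpenguins)

n_summary <- penguins |>
  tbl_summary(
    by = species,
    include = c(bill_length_mm, body_mass_g, sex)
  ) |>
  add_n() # adds an "N" column: non-missing observations per variable

n_summary
```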
We can also add some custom formatting, like adding a caption, removing the “Characteristic” label (that column should be obvious), and making the variable names or labels bold.
penguins |>
tbl_summary(
by = species,
label = list(
island = "Island",
bill_length_mm = "Bill length (mm)",
bill_depth_mm = "Bill depth (mm)",
flipper_length_mm = "Flipper length (mm)",
body_mass_g = "Body mass (g)",
sex = "Sex",
year = "Year"
)
) |>
add_overall(last = TRUE) |>
modify_header(label ~ "") |>
modify_caption("**Table 1. Penguin Characteristics**") |>
bold_labels()
| | Adelie, N = 152¹ | Chinstrap, N = 68¹ | Gentoo, N = 124¹ | Overall, N = 344¹ |
|---|---|---|---|---|
| Island | | | | |
| Biscoe | 44 (29%) | 0 (0%) | 124 (100%) | 168 (49%) |
| Dream | 56 (37%) | 68 (100%) | 0 (0%) | 124 (36%) |
| Torgersen | 52 (34%) | 0 (0%) | 0 (0%) | 52 (15%) |
| Bill length (mm) | 38.8 (36.8, 40.8) | 49.5 (46.3, 51.1) | 47.3 (45.3, 49.5) | 44.5 (39.2, 48.5) |
| Unknown | 1 | 0 | 1 | 2 |
| Bill depth (mm) | 18.40 (17.50, 19.00) | 18.45 (17.50, 19.40) | 15.00 (14.20, 15.70) | 17.30 (15.60, 18.70) |
| Unknown | 1 | 0 | 1 | 2 |
| Flipper length (mm) | 190 (186, 195) | 196 (191, 201) | 216 (212, 221) | 197 (190, 213) |
| Unknown | 1 | 0 | 1 | 2 |
| Body mass (g) | 3,700 (3,350, 4,000) | 3,700 (3,488, 3,950) | 5,000 (4,700, 5,500) | 4,050 (3,550, 4,750) |
| Unknown | 1 | 0 | 1 | 2 |
| Sex | | | | |
| female | 73 (50%) | 34 (50%) | 58 (49%) | 165 (50%) |
| male | 73 (50%) | 34 (50%) | 61 (51%) | 168 (50%) |
| Unknown | 6 | 0 | 5 | 11 |
| Year | | | | |
| 2007 | 50 (33%) | 26 (38%) | 34 (27%) | 110 (32%) |
| 2008 | 50 (33%) | 18 (26%) | 46 (37%) | 114 (33%) |
| 2009 | 52 (34%) | 24 (35%) | 44 (35%) | 120 (35%) |
¹ n (%); Median (IQR)
Notice that you can use markdown syntax, like wrapping text in double asterisks to bold it (or single asterisks to italicize it), and it will be rendered appropriately.
Exercises
- Create a summary table of the DartPoints data with tbl_summary().
  - Use the by= argument to summarize by dart type (the variable name is name).
  - Update the labels as needed.
  - Add the overall counts for each group.
  - Remove the “Characteristic” label with modify_header().
  - Add a caption.
  - Make the variable labels bold.
Regression summary
Summarizing models works in a similar way to summarizing data tables. Consider this simple linear model of bill length by flipper length and species.
lm_penguins <- lm(
bill_length_mm ~ flipper_length_mm + species,
data = penguins
)
Here is the base R summary of the model:
lm_penguins |> summary()
#>
#> Call:
#> lm(formula = bill_length_mm ~ flipper_length_mm + species, data = penguins)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -6.662 -1.746 0.028 1.825 12.354
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.0586 4.0386 -0.51 0.61
#> flipper_length_mm 0.2151 0.0212 10.13 < 0.0000000000000002 ***
#> speciesChinstrap 8.7801 0.3991 22.00 < 0.0000000000000002 ***
#> speciesGentoo 2.8569 0.6586 4.34 0.000019 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.6 on 338 degrees of freedom
#> (2 observations deleted due to missingness)
#> Multiple R-squared: 0.776, Adjusted R-squared: 0.774
#> F-statistic: 390 on 3 and 338 DF, p-value: <0.0000000000000002
And here is the gtsummary version:
lm_penguins |> tbl_regression()
Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
flipper_length_mm | 0.22 | 0.17, 0.26 | <0.001 |
species | |||
Adelie | — | — | |
Chinstrap | 8.8 | 8.0, 9.6 | <0.001 |
Gentoo | 2.9 | 1.6, 4.2 | <0.001 |
¹ CI = Confidence Interval
The first thing to note here is that this is basically the coefficients table from the summary() output. Instead of “Estimate”, though, it uses the label “Beta”, which is just another way of referring to the coefficient estimates (the estimated betas for the different variables). The second and far more important thing to note is that the intercept is not reported by default. I do not know why this is the case, but I recommend you always report it (at least if the model estimates an intercept). To do that, we add intercept = TRUE. While we’re at it, let’s also rename the variables.
lm_penguins |>
tbl_regression(
intercept = TRUE,
label = list(
bill_length_mm = "Bill length (mm)",
flipper_length_mm = "Flipper length (mm)",
species = "Species"
)
)
Characteristic | Beta | 95% CI¹ | p-value |
---|---|---|---|
(Intercept) | -2.1 | -10, 5.9 | 0.6 |
Flipper length (mm) | 0.22 | 0.17, 0.26 | <0.001 |
Species | |||
Adelie | — | — | |
Chinstrap | 8.8 | 8.0, 9.6 | <0.001 |
Gentoo | 2.9 | 1.6, 4.2 | <0.001 |
¹ CI = Confidence Interval
Also, the test statistic used to generate the p-value is hidden by default, and the 95% confidence interval is shown in place of the standard error of the estimates. These are some odd design choices by the package authors, maybe related to the fact that they designed the package to work with every model you can think of putting together in R. Whatever the reason, I recommend that you unhide those columns like so.
lm_penguins |>
tbl_regression(
intercept = TRUE,
label = list(
bill_length_mm = "Bill length (mm)",
flipper_length_mm = "Flipper length (mm)",
species = "Species"
)
) |>
modify_column_unhide(columns = c(statistic, std.error))
Characteristic | Beta | SE¹ | Statistic | 95% CI¹ | p-value |
---|---|---|---|---|---|
(Intercept) | -2.1 | 4.04 | -0.510 | -10, 5.9 | 0.6 |
Flipper length (mm) | 0.22 | 0.021 | 10.1 | 0.17, 0.26 | <0.001 |
Species | |||||
Adelie | — | — | — | — | |
Chinstrap | 8.8 | 0.399 | 22.0 | 8.0, 9.6 | <0.001 |
Gentoo | 2.9 | 0.659 | 4.34 | 1.6, 4.2 | <0.001 |
¹ SE = Standard Error, CI = Confidence Interval
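Since the table now shows both the standard error and the 95% confidence interval, it may help to see how the two are related for a linear model: the interval is the estimate plus or minus a t critical value (using the residual degrees of freedom) times the standard error. A base R sketch that refits the same model and checks this against confint():

```r
library(palmerpenguins)

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

est <- coef(lm_penguins)
se <- summary(lm_penguins)$coefficients[, "Std. Error"]

# 95% CI for a linear model: estimate +/- t(0.975, residual df) * SE
t_crit <- qt(0.975, df = df.residual(lm_penguins))
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

# Compare with R's built-in confint()
builtin_ci <- confint(lm_penguins, level = 0.95)
all.equal(unname(manual_ci), unname(builtin_ci))
```

After rounding, these intervals should match the CI column above (e.g., 0.17 to 0.26 for flipper length).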
Finally, the coefficients table doesn’t have information about the model as a whole, like the \(R^2\) value or the F-statistic used in the ANOVA. We can add that information to our table either as additional rows with add_glance_table() or as a source note beneath the table with add_glance_source_note(). My own preference is for the latter, but journals will have their own formatting guidelines.
lm_penguins |>
tbl_regression(
intercept = TRUE,
label = list(
bill_length_mm = "Bill length (mm)",
flipper_length_mm = "Flipper length (mm)",
species = "Species"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
add_glance_source_note(
include = c(adj.r.squared, sigma, statistic, df.residual, df, p.value),
label = list(
sigma = "σ",
statistic = "F-statistic"
)
)
Characteristic | Beta | SE¹ | Statistic | 95% CI¹ | p-value |
---|---|---|---|---|---|
(Intercept) | -2.1 | 4.04 | -0.510 | -10, 5.9 | 0.6 |
Flipper length (mm) | 0.22 | 0.021 | 10.1 | 0.17, 0.26 | <0.001 |
Species | |||||
Adelie | — | — | — | — | |
Chinstrap | 8.8 | 0.399 | 22.0 | 8.0, 9.6 | <0.001 |
Gentoo | 2.9 | 0.659 | 4.34 | 1.6, 4.2 | <0.001 |
Adjusted R² = 0.774; σ = 2.60; F-statistic = 390; Residual df = 338; df = 3; p-value = <0.001
¹ SE = Standard Error, CI = Confidence Interval
By default, add_glance_source_note() adds every model statistic it can find, but you should limit these to the ones that are most relevant to evaluating the type of model you have fit. In this case, we fit a linear model, so we included the adjusted \(R^2\), the residual standard error (sigma), and the results of the overall F-test (the F-statistic, its degrees of freedom, and its p-value).
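The statistics that add_glance_source_note() can include come from broom::glance(), so printing glance() for your model is a quick way to see which names you can pass to include. A sketch, assuming the broom package is installed:

```r
library(broom)
library(palmerpenguins)

lm_penguins <- lm(
  bill_length_mm ~ flipper_length_mm + species,
  data = penguins
)

# One-row summary of the whole model; its column names are what include = refers to
model_stats <- glance(lm_penguins)
model_stats[, c("adj.r.squared", "sigma", "statistic", "df", "df.residual", "p.value")]
```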
Let’s do one more example with a GLM. Here we’ll see if we can differentiate Gentoo penguins from the rest. Hint: they’re the really big ones. Then we’ll walk through the steps of displaying the model.
gentoo <- penguins |>
mutate(gentoo = ifelse(species == "Gentoo", 1, 0))
glm_penguins <- glm(
gentoo ~ body_mass_g + bill_length_mm,
family = binomial,
data = gentoo
)
glm_penguins |> summary()
#>
#> Call:
#> glm(formula = gentoo ~ body_mass_g + bill_length_mm, family = binomial,
#> data = gentoo)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.3865 -0.1753 -0.0390 0.0622 2.6125
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -31.990945 4.528334 -7.06 0.000000000001611 ***
#> body_mass_g 0.006257 0.000821 7.62 0.000000000000026 ***
#> bill_length_mm 0.091252 0.057726 1.58 0.11
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 446.80 on 341 degrees of freedom
#> Residual deviance: 115.33 on 339 degrees of freedom
#> (2 observations deleted due to missingness)
#> AIC: 121.3
#>
#> Number of Fisher Scoring iterations: 7
glm_penguins |>
tbl_regression(
intercept = TRUE,
label = list(
gentoo = "Gentoo",
body_mass_g = "Body mass (g)",
bill_length_mm = "Bill length (mm)"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
add_glance_source_note(
include = c(AIC, null.deviance, df.null, deviance, df.residual)
)
Characteristic | log(OR)¹ | SE¹ | Statistic | 95% CI¹ | p-value |
---|---|---|---|---|---|
(Intercept) | -32 | 4.53 | -7.06 | -42, -24 | <0.001 |
Body mass (g) | 0.01 | 0.001 | 7.62 | 0.00, 0.01 | <0.001 |
Bill length (mm) | 0.09 | 0.058 | 1.58 | -0.02, 0.21 | 0.11 |
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
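Note that the coefficient estimates are given on the logit, or log odds, scale: the intercept is the log odds of being a Gentoo penguin when both predictors are zero, and each slope is a log odds ratio. These values can be really hard to interpret, so it’s recommended that you exponentiate them to get them onto the odds ratio scale. An odds ratio is the factor by which the odds of being a Gentoo penguin (the probability of being a Gentoo divided by the probability of not being one) are multiplied for each one-unit increase in a predictor. To get the odds ratios, we simply tell tbl_regression() to exponentiate the terms for us. Only the estimates and confidence intervals are transformed; the standard errors, test statistics, and p-values stay on the log odds scale, which is why those columns don’t change.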
glm_penguins |>
tbl_regression(
exponentiate = TRUE,
intercept = TRUE,
label = list(
gentoo = "Gentoo",
body_mass_g = "Body mass (g)",
bill_length_mm = "Bill length (mm)"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
add_glance_source_note(
include = c(AIC, null.deviance, df.null, deviance, df.residual)
)
Characteristic | OR¹ | SE¹ | Statistic | 95% CI¹ | p-value |
---|---|---|---|---|---|
(Intercept) | 0.00 | 4.53 | -7.06 | 0.00, 0.00 | <0.001 |
Body mass (g) | 1.01 | 0.001 | 7.62 | 1.00, 1.01 | <0.001 |
Bill length (mm) | 1.10 | 0.058 | 1.58 | 0.98, 1.23 | 0.11 |
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
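To see what exponentiate = TRUE is doing, you can exponentiate the coefficients by hand. A sketch that refits the same GLM; rescaling body mass to a 100 g change is my own illustrative choice, since a 1 g change is too small to mean much:

```r
library(dplyr)
library(palmerpenguins)

gentoo <- penguins |>
  mutate(gentoo = ifelse(species == "Gentoo", 1, 0))

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm,
  family = binomial,
  data = gentoo
)

log_odds <- coef(glm_penguins)   # intercept: log odds; slopes: log odds ratios
odds_ratios <- exp(log_odds)     # what the OR column reports

# An odds ratio applies to a one-unit change; for a 100 g change in body mass,
# multiply the log odds ratio by 100 before exponentiating
or_per_100g <- exp(100 * log_odds[["body_mass_g"]])

odds_ratios
or_per_100g
```

The hand-computed odds ratios round to the values in the OR column above (1.01 for body mass, 1.10 for bill length).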
A final note about formatting. All the things you can change in a data summary table, you can modify in a regression summary table, too.
glm_penguins |>
tbl_regression(
exponentiate = TRUE,
intercept = TRUE,
label = list(
gentoo = "Gentoo",
body_mass_g = "Body mass (g)",
bill_length_mm = "Bill length (mm)"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
add_glance_source_note(
include = c(AIC, null.deviance, df.null, deviance, df.residual)
) |>
modify_header(label ~ "") |>
modify_caption("**Table 2. Gentoo Model**") |>
bold_labels()
| | OR¹ | SE¹ | Statistic | 95% CI¹ | p-value |
|---|---|---|---|---|---|
| (Intercept) | 0.00 | 4.53 | -7.06 | 0.00, 0.00 | <0.001 |
| Body mass (g) | 1.01 | 0.001 | 7.62 | 1.00, 1.01 | <0.001 |
| Bill length (mm) | 1.10 | 0.058 | 1.58 | 0.98, 1.23 | 0.11 |
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
You can also re-order the columns, but it’s a little tricky because of the structure of the summary table. We modify the table body with modify_table_body(), relocating columns with the dplyr function relocate() and telling it to move the statistic column before the p.value column.
glm_penguins |>
tbl_regression(
exponentiate = TRUE,
intercept = TRUE,
label = list(
gentoo = "Gentoo",
body_mass_g = "Body mass (g)",
bill_length_mm = "Bill length (mm)"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
modify_table_body(~.x |> relocate(statistic, .before = p.value)) |>
add_glance_source_note(
include = c(AIC, null.deviance, df.null, deviance, df.residual)
) |>
modify_header(label ~ "") |>
modify_caption("**Table 2. Gentoo Model**") |>
bold_labels()
| | OR¹ | SE¹ | 95% CI¹ | Statistic | p-value |
|---|---|---|---|---|---|
| (Intercept) | 0.00 | 4.53 | 0.00, 0.00 | -7.06 | <0.001 |
| Body mass (g) | 1.01 | 0.001 | 1.00, 1.01 | 7.62 | <0.001 |
| Bill length (mm) | 1.10 | 0.058 | 0.98, 1.23 | 1.58 | 0.11 |
AIC = 121; Null deviance = 447; Null df = 341; Deviance = 115; Residual df = 339
¹ OR = Odds Ratio, SE = Standard Error, CI = Confidence Interval
This ordering makes a smidge more sense because SE and CI are naturally related to each other, as are the test statistic and p-value.
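Because modify_table_body() hands you the underlying data frame, it can help to look at its column names before deciding what to relocate. A sketch, refitting the Gentoo model so the block runs on its own (tbl_gentoo is a hypothetical name):

```r
library(dplyr)
library(gtsummary)
library(palmerpenguins)

gentoo <- penguins |>
  mutate(gentoo = ifelse(species == "Gentoo", 1, 0))

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm,
  family = binomial,
  data = gentoo
)

tbl_gentoo <- glm_penguins |>
  tbl_regression(exponentiate = TRUE, intercept = TRUE) |>
  modify_column_unhide(columns = c(statistic, std.error))

# The data frame that modify_table_body() operates on
names(tbl_gentoo$table_body)

# Same reordering as above: test statistic immediately before the p-value
tbl_reordered <- tbl_gentoo |>
  modify_table_body(~ .x |> relocate(statistic, .before = p.value))
```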
Exercises
- Create a regression table for a logistic model fit to the Snodgrass data.
  - Load in the Snodgrass data with data("Snodgrass").
  - Make all the variable names lower case with rename_with(tolower).
  - Use select() to subset the data to include only the area of each structure and whether it is inside the inner walls.
  - Use a GLM to model whether the structure is found inside the inner walls as a function of its total area.
    - Formula should be inside ~ area.
    - Set family = binomial and data = Snodgrass.
  - Summarize the model with summary().
  - Now create a regression table with tbl_regression().
    - Be sure to exponentiate the coefficient estimates.
    - Include the intercept.
    - Unhide the test statistic and standard error for the coefficient estimates with modify_column_unhide().
    - Be sure to re-arrange the columns with modify_table_body(~.x |> relocate(statistic, .before = p.value)).
    - Add model statistics with add_glance_source_note().
    - Add a caption with modify_caption().
    - Remove the “Characteristic” label with modify_header(label ~ "").
Exporting tables
The gtsummary package doesn’t provide tools for exporting tables. Fortunately, it does let you convert its summary tables to gt tables, and the gt package does provide tools for exporting tables.
Here’s the last table we made in the previous section:
gtsummary_penguins <- glm_penguins |>
tbl_regression(
exponentiate = TRUE,
intercept = TRUE,
label = list(
gentoo = "Gentoo",
body_mass_g = "Body mass (g)",
bill_length_mm = "Bill length (mm)"
)
) |>
modify_column_unhide(columns = c(statistic, std.error)) |>
modify_table_body(~.x |> relocate(statistic, .before = p.value)) |>
add_glance_source_note(
include = c(AIC, null.deviance, df.null, deviance, df.residual)
) |>
# modify_header(label ~ " ") |>
modify_caption("**Table 2. Gentoo Model**") |>
bold_labels()
We convert this to a gt object using as_gt() and save it to disk with the gtsave() function from the gt package. This works similarly to write_csv() and ggsave(): we simply tell it what table we want to save and where we want to save it.
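A minimal sketch of those two steps, rebuilding a shortened version of the Gentoo table so the block runs on its own. It writes to a temporary .html file, which (unlike PNG) does not need webshot2; the file name is hypothetical, and for the exercises you would pass a path built with here() instead.

```r
library(dplyr)
library(gt)
library(gtsummary)
library(palmerpenguins)

gentoo <- penguins |>
  mutate(gentoo = ifelse(species == "Gentoo", 1, 0))

glm_penguins <- glm(
  gentoo ~ body_mass_g + bill_length_mm,
  family = binomial,
  data = gentoo
)

gtsummary_penguins <- glm_penguins |>
  tbl_regression(exponentiate = TRUE, intercept = TRUE) |>
  modify_caption("**Table 2. Gentoo Model**")

# Step 1: convert the gtsummary table to a gt table
gt_penguins <- as_gt(gtsummary_penguins)

# Step 2: write it to disk; the file extension determines the output format.
# In the lab you might use here("manuscript", "gentoo_model.png") instead.
out_file <- tempfile(fileext = ".html")
gtsave(gt_penguins, filename = out_file)
```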
Saving as a PNG actually involves taking a screenshot of the HTML table and requires that the webshot2 package be installed. You’ll get a note to this effect if you try saving to this format without webshot2 installed. You can also save to other formats like PDF (.pdf) and Word (.docx), though the latter probably won’t work right now (there appears to be a bug in the code for gtsave() that is currently being worked on).
Exercises
- Make sure you are in the QAAD project directory.
- Add a folder called “manuscript.”
- Now try saving the data summary and regression summary tables you made in the last two sections into that folder. Use here("manuscript", <name of file.png>).
- Navigate to that folder and check that it successfully saved.
Homework
No homework this week.