How do I work this?

this ⟵ R project

acknowledgments

Most of the ideas presented here come from Jenny Bryan’s blog post, Project-oriented workflow.

Similar ideas can be found in chapter nine of R4DS (2e).

⚠️ problem 1: crooked paths

A guest asks where the bathroom is, so you say:

"Go to the kitchen / turn left down the hallway / first door on the right"

Easy enough. Now imagine going to someone else’s house and offering the exact same directions.

Now you know what’s wrong with absolute file paths.

crooked paths in R

This appears at the top of Blake’s R script:

setwd("C:/Users/blake/this/path/works/only/on/my/computer")

which is fine for Blake, but who works alone in science?
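To see it from the collaborator's side, here is a minimal sketch of what happens when that line runs on a machine that is not Blake's. The path is the one from the slide; the tryCatch() wrapper is only there for illustration, so the failure is captured as a message instead of stopping the script.

```r
# The hard-coded path from Blake's script.
blake_dir <- "C:/Users/blake/this/path/works/only/on/my/computer"

# Try to move there; on most machines the folder does not exist,
# so setwd() throws an error, which we catch and keep as text.
result <- tryCatch(
  {
    old <- setwd(blake_dir)  # setwd() returns the previous working directory
    setwd(old)               # put things back if it somehow worked
    "worked (you must be Blake)"
  },
  error = function(e) conditionMessage(e)
)

result
```

Anywhere the folder is missing, result holds the error message rather than a usable working directory.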

⚠️ problem 2: cross-contamination

Imagine trying to cook dinner in a CDC lab…

That’s EXACTLY what it’s like working on multiple research projects in the same programming workspace.


Seriously, don’t do it.

cross-contamination in R

solution ⟵ project

A cartoon of a cracked glass cube looking frustrated with casts on its arm and leg, with bandaids on it, containing “setwd”, looks on at a metal riveted cube labeled “R Proj” holding a skateboard looking sympathetic, and a smaller cube with a helmet on labeled “here” doing a trick on a skateboard.

Artwork by Allison Horst.

🏗️ project basics

A typical project folder might look like this:

📁 my-r-project

  • |- 📁 _misc
  • |- 📁 data
    • |- data.csv
    • |- my-spatial.gpkg
    • |- elevation.tiff
  • |- 📁 figures
  • |- 📁 R
    • |- 📄 analysis.qmd
    • |- 📄 data-wrangling.R
  • |- 📄 .gitignore
  • |- 📄 my-r-project.Rproj <— this makes it an R project!
  • |- 📄 README.md

If you open this in the RStudio IDE, the working directory will automatically be set to “root/path/to/my-r-project”.
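For the curious, the skeleton in the tree above can be sketched in a few lines of base R. This is only an illustration: it builds the folders in a temporary directory (an assumption, so nothing lands on your real drive), and in practice RStudio's "New Project" menu writes the .Rproj file for you.

```r
# Hypothetical scratch location; swap in a real parent folder when doing this for keeps.
root <- file.path(tempdir(), "my-r-project")

# Create the sub-folders shown in the tree.
for (d in c("_misc", "data", "figures", "R")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# An .Rproj file is a small plain-text settings file; RStudio normally creates it.
writeLines("Version: 1.0", file.path(root, "my-r-project.Rproj"))

# Empty placeholders for the remaining top-level files.
file.create(file.path(root, "README.md"), file.path(root, ".gitignore"))

# Check what was made (all.files = TRUE so .gitignore shows up too).
list.files(root, all.files = TRUE, no.. = TRUE)
```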

📁 my-r-project

  • |- 📁 _misc <— what goes here?
  • |- 📁 data
    • |- data.csv
    • |- my-spatial.gpkg
    • |- elevation.tiff
  • |- 📁 figures
  • |- 📁 R
    • |- 📄 analysis.qmd
    • |- 📄 data-wrangling.R
  • |- 📄 .gitignore
  • |- 📄 my-r-project.Rproj
  • |- 📄 README.md

workflow vs product

Need to distinguish the essentials from the inessentials!

x is product iff x can run without error

head(cars)
mean(cars$dist) + 1

# don't forget to do your laundry!
i <- sample(1:nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ]

plot(cars2) 

Sys.Date()

bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)

But all this will 🏃🏃🏃…

x is product iff the goal requires x

head(cars)
mean(cars$dist) + 1

# don't forget to do your laundry!
i <- sample(1:nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ] 

plot(cars2) # <--- is this necessary?

Sys.Date()

bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)

But teleology means it just depends… 🤷‍

a good rule of thumb 👍

consider what details you’d include when giving directions

your code is like that, but from your raw data to your results
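As a rough sketch, suppose the goal of the script above is just the model summary. Then the directions might shrink to something like this. The set.seed() call and its value are additions, not part of the original script; they are there so the random subsample comes out the same on every run.

```r
# Assumption: the only thing we need at the end is the fitted model.
set.seed(42)  # hypothetical seed, makes the subsample reproducible

# Draw 25 rows from the built-in cars data.
i <- sample(seq_len(nrow(cars)), size = 25, replace = FALSE)
cars2 <- cars[i, ]

# Fit and report the model.
bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)
```

Whether plot(cars2) belongs in there still depends on the goal; if the figure ends up in the paper, it is product too.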

only the source is real

so, be like Quine

“Wyman’s overpopulated universe is in many ways unlovely. It offends the aesthetic sense of us who have a taste for desert landscapes, but this is not the worst of it. Wyman’s slum of possibles is a breeding ground for disorderly elements.”
On What There Is (1948)

Translation: trust your R script! and be ruthless with your use of rm()!
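A tiny sketch of what that ruthlessness can look like, using the same built-in cars data as before. Which objects count as disposable is a judgment call; here the row index is treated as a throwaway.

```r
# Draw a random subsample of the built-in cars data.
i <- sample(seq_len(nrow(cars)), size = 25, replace = FALSE)
cars2 <- cars[i, ]

# The index has done its job, so drop it from the workspace.
rm(i)

# If it is ever needed again, re-run the script instead of hoping it is still around.
exists("i", inherits = FALSE)
```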

here() I am

A cartoon showing two paths side-by-side. On the left is a scary spooky forest, with spiderwebs and gnarled trees, with file paths written on the branches like “~/mmm/nope.csv” and “setwd(“/haha/good/luck/”), with a scared looking cute fuzzy monster running out of it. On the right is a bright, colorful path with flowers, rainbow and sunshine, with signs saying “here!” and “it’s all right here!” A monster facing away from us in a backpack and walking stick is looking toward the right path. Stylized text reads “here: find your path.”

Artwork by Allison Horst.

Note that here() finds the path to the project folder, though RStudio will do this, too…

library(here)

# on blake's computer
here() 
#> [1] "C:/Users/blake/rstuff/our-r-project"

# on bob's computer
here()
#> [1] "C:/Users/bob/likes/subfolders/our-r-project"

# on simon's computer
here()
#> [1] "?????/our-r-project"

here(), however, will also reference the top project directory no matter where you are in the project.

library(here)

# on blake's computer, in the R folder
here("data", "elevation.tiff") 
#> [1] "C:/Users/blake/rstuff/our-r-project/data/elevation.tiff"

# on bob's computer, in the figures folder
here("data", "elevation.tiff")
#> [1] "C:/Users/bob/likes/subfolders/our-r-project/data/elevation.tiff"

# on simon's computer, in the _misc folder
here("data", "elevation.tiff")
#> [1] "?????/our-r-project/data/elevation.tiff"
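In day-to-day use, those paths get handed straight to whatever reads or writes the file. A minimal sketch, assuming the data/ and figures/ folders from the example project; the figure file name is made up, and the file.exists() guard is only there so the snippet runs outside that project too.

```r
library(here)

# Build the path from the project root, wherever this script happens to live.
csv_path <- here("data", "data.csv")

# Guarded read: only attempted if the file is actually there.
if (file.exists(csv_path)) {
  dat <- read.csv(csv_path)
  head(dat)
}

# Output paths work the same way.
fig_path <- here("figures", "speed-vs-dist.png")  # hypothetical file name
fig_path
```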

but the multiple drafts problem!

📁 my-r-project

  • |- 📁 manuscript
    • |- 📄 draft_230202.docx
    • |- 📄 draft_bob-comments_230317.docx
    • |- 📄 draft_simon-comments_230319.docx
    • |- 📄 draft_simon-comments-on-bobs-comments_230318.docx
    • |- 📄 draft_blake-makes-changes-without-reading-comments_230320.docx
    • |- 📄 draft_bob-comments-on-newest-draft_230402.docx
    • |- 📄 draft_blake-incorporates-simons-comments-from-230319.docx
    • |- 📄 draft_simon-changes-his-mind-about-original-draft_230415.docx

etc., etc., etc.

and we haven’t even gotten to drafts of our R scripts! hmmm… 🤔

he got git

Once you have Git and GitHub set up, RStudio makes version control super super easy.

See Happy Git and GitHub for the useR (happygitwithr.com) for details.

An R project should be self-contained

“but I want to share data across projects,” you will inevitably find yourself saying

and now you’re on the cutting edge 🔪🔪🔪

pin() your data?

“The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.”
- From the package website

This looks promising, but I don’t have much experience with it. I need buy-in from collaborators on using projects first…
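For a taste of what that could look like, here is a minimal sketch using the pins package. It writes to a temporary board so nothing is shared for real; the pin name is made up, and a real setup would more likely use something like board_folder() pointed at a shared location (an assumption about how a team might arrange it).

```r
library(pins)

# board_temp() is a throwaway board, used here only so the sketch is self-contained.
board <- board_temp()

# "Project A" publishes a data set under a hypothetical name.
pin_write(board, cars, name = "cars-shared", type = "rds")

# "Project B" reads it back by name, with no file path in sight.
cars_again <- pin_read(board, "cars-shared")
head(cars_again)
```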

now for some hands-on stuff

let’s make an R project!