How do I work this?

this ⟵ R project

acknowledgments

Most of the ideas presented here come from Jenny Bryan’s blog post, Project-oriented workflow.

Similar ideas can be found in chapter nine of R4DS (2e).

⚠️ problem 1: crooked paths

A guest asks where the bathroom is, so you say:

"Go to the kitchen / turn left down the hallway / first door on the right"

Easy enough. Now imagine going to someone else’s house and offering the exact same directions.

Now you know what’s wrong with absolute file paths.

crooked paths in R

This appears at the top of Blake’s R script:

setwd("C:/Users/blake/this/path/works/only/on/my/computer")

which is fine for Blake, but who works alone in science?

⚠️ problem 2: cross-contamination

Imagine trying to cook dinner in a CDC lab…

That’s EXACTLY what it’s like working on multiple research projects in the same programming workspace.


Seriously, don’t do it.

cross-contamination in R

solution ⟵ project

A cartoon of a cracked glass cube looking frustrated with casts on its arm and leg, with bandaids on it, containing “setwd”, looks on at a metal riveted cube labeled “R Proj” holding a skateboard looking sympathetic, and a smaller cube with a helmet on labeled “here” doing a trick on a skateboard.

Artwork by Allison Horst.

🏗️ project basics

A typical project folder might look like this:

📁 my-r-project

  • |- 📁 _misc
  • |- 📁 data
    • |- data.csv
    • |- my-spatial.gpkg
    • |- elevation.tiff
  • |- 📁 figures
  • |- 📁 R
    • |- 📄 analysis.qmd
    • |- 📄 data-wrangling.R
  • |- 📄 .gitignore
  • |- 📄 my-r-project.Rproj <-- this makes it an R project!
  • |- 📄 README.md

If you open this in the RStudio IDE, the working directory will automatically be set to “root/path/to/my-r-project”.
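A minimal sketch of what that buys you, assuming the folder layout above. The temporary folder and the `setwd()` call only simulate what opening the `.Rproj` file does for you; in a real project you would not write either.

```r
# Simulate the project folder in a temporary directory (illustration only).
proj <- file.path(tempdir(), "my-r-project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(cars, file.path(proj, "data", "data.csv"), row.names = FALSE)

# RStudio does the equivalent of this when you open the .Rproj file.
old_wd <- setwd(proj)

# Relative paths now resolve against the project root, on any computer.
dat <- read.csv("data/data.csv")
head(dat)

# Put the working directory back (only needed because this is a simulation).
setwd(old_wd)
```

The only path in the script is `"data/data.csv"`, which contains nothing specific to Blake's computer.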

📁 my-r-project

  • |- 📁 _misc <-- what goes here?
  • |- 📁 data
    • |- data.csv
    • |- my-spatial.gpkg
    • |- elevation.tiff
  • |- 📁 figures
  • |- 📁 R
    • |- 📄 analysis.qmd
    • |- 📄 data-wrangling.R
  • |- 📄 .gitignore
  • |- 📄 my-r-project.Rproj
  • |- 📄 README.md

workflow vs product

Need to distinguish the essentials from the inessentials!

x is product iff x can run without error

head(cars)
mean(cars$dist) + 1

# don't forget to do your laundry!
i <- sample(1:nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ]

plot(cars2) 

Sys.Date()

bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)

But all this will 🏃🏃🏃…

x is product iff the goal requires x ✔

head(cars)
mean(cars$dist) + 1

# don't forget to do your laundry!
i <- sample(1:nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ] 

plot(cars2) # <--- is this necessary?

Sys.Date()

bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)

But teleology means it just depends… 🤷

a good rule of thumb 👍

consider what details you’d include when giving directions

your code is like that, but from your raw data to your results
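As a sketch, here is the `cars` script from earlier pared down to "directions only", under the assumption that the goal is the regression summary. The `set.seed()` call is an addition not in the original script, included so the random sample comes out the same on every run.

```r
# Assumed goal: report the regression of stopping distance on speed.
set.seed(42)  # assumption: fix the seed so the sample is reproducible

# Draw 25 of the 50 rows at random.
i <- sample(nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ]

# Fit and report the model; everything else was workflow, not product.
bb8 <- lm(dist ~ speed, data = cars2)
summary(bb8)
```

Whether `plot(cars2)` belongs here still depends on the goal: if a figure is one of the results, it stays.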

only the source is real

so, be like Quine

“Wyman’s overpopulated universe is in many ways unlovely. It offends the aesthetic sense of us who have a taste for desert landscapes, but this is not the worst of it. Wyman’s slum of possibles is a breeding ground for disorderly elements.”
On What There Is (1948)

Translation: trust your R script! and be ruthless with your use of rm()!
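One possible reading of this advice, sketched with the built-in `cars` data: drop intermediate objects once they have done their job, since re-running the script can always rebuild them.

```r
# Build the object we actually want.
i <- sample(nrow(cars), size = 25, replace = FALSE)
cars2 <- cars[i, ]

# The index vector was only a stepping stone, so remove it.
rm(i)

# Only cars2 is left from this step; the script can recreate i at any time.
"i" %in% ls()
"cars2" %in% ls()
```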

here() I am

A cartoon showing two paths side-by-side. On the left is a scary spooky forest, with spiderwebs and gnarled trees, with file paths written on the branches like “~/mmm/nope.csv” and “setwd("/haha/good/luck/")”, with a scared looking cute fuzzy monster running out of it. On the right is a bright, colorful path with flowers, rainbow and sunshine, with signs saying “here!” and “it’s all right here!” A monster facing away from us in a backpack and walking stick is looking toward the right path. Stylized text reads “here: find your path.”

Artwork by Allison Horst.

Note that here() finds the path to the project folder, though RStudio will do this, too…

library(here)

# on blake's computer
here() 
#> [1] "C:/Users/blake/rstuff/our-r-project"

# on bob's computer
here()
#> [1] "C:/Users/bob/likes/subfolders/our-r-project"

# on simon's computer
here()
#> [1] "?????/our-r-project"

here(), however, will also reference the top project directory no matter where you are in the project.

library(here)

# on blake's computer, in the R folder
here("data", "elevation.tiff") 
#> [1] "C:/Users/blake/rstuff/our-r-project/data/elevation.tiff"

# on bob's computer, in the figures folder
here("data", "elevation.tiff")
#> [1] "C:/Users/bob/likes/subfolders/our-r-project/data/elevation.tiff"

# on simon's computer, in the _misc folder
here("data", "elevation.tiff")
#> [1] "?????/our-r-project/data/elevation.tiff"
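In a script, the usual move is to hand the result of `here()` straight to whatever reads or writes the file. A minimal sketch, assuming the here package is installed and the project has the `data` folder shown earlier; the `file.exists()` guard is only there so the snippet runs even where the file is absent.

```r
library(here)

# Build the path from the project root, not from the current folder.
csv_path <- here("data", "data.csv")
csv_path

# Read the file if it exists; the same line works from R/, figures/, or _misc/.
if (file.exists(csv_path)) {
  dat <- read.csv(csv_path)
  head(dat)
}
```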

but the multiple drafts problem!

📁 my-r-project

  • |- 📁 manuscript
    • |- 📄 draft_230202.docx
    • |- 📄 draft_bob-comments_230317.docx
    • |- 📄 draft_simon-comments_230319.docx
    • |- 📄 draft_simon-comments-on-bobs-comments_230318.docx
    • |- 📄 draft_blake-makes-changes-without-reading-comments_230320.docx
    • |- 📄 draft_bob-comments-on-newest-draft_230402.docx
    • |- 📄 draft_blake-incorporates-simons-comments-from-230319.docx
    • |- 📄 draft_simon-changes-his-mind-about-original-draft_230415.docx

etc., etc., etc.

and we haven’t even gotten to drafts of our R scripts! hmmm… 🤔

he got git

Once you have Git and GitHub set up, RStudio makes version control super super easy.

See Happy Git and GitHub for the useR for details.

An R project should be self-contained

“but I want to share data across projects,” you will inevitably find yourself saying

and now you’re on the cutting edge 🔪🔪🔪

pin() your data?

“The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.”
- From the package website

This looks promising, but I don’t have much experience with it. Need buy-in from the collabs on using projects first…
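For a rough idea of what that could look like, here is a sketch assuming the pins package (version 1.0 or later) is installed. The temporary folder stands in for a shared drive, and the pin name `"cars-data"` is made up for illustration.

```r
library(pins)

# Hypothetical shared location; a temporary folder stands in for a network
# drive or synced folder that every project can reach.
shared_dir <- file.path(tempdir(), "shared-pins")
dir.create(shared_dir, showWarnings = FALSE)
board <- board_folder(shared_dir)

# Project A publishes a data set under a (made-up) name.
pin_write(board, cars, name = "cars-data", type = "rds")

# Project B reads it back without copying files between project folders.
cars_shared <- pin_read(board, "cars-data")
head(cars_shared)
```

Each project stays self-contained in the sense that its scripts only name the board and the pin, not a path inside another project.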

now for some hands-on stuff

let’s make an R project!