Sharing R products

How to share R data and code

Sharing R data (.rda) and R code (.R) files is becoming more common in the social sciences. Brilliant! However, I think that the process of sharing R products could be made significantly easier for everyone involved if people followed one simple piece of advice:

Don’t set R’s working directory from within an R script.

Sharing data is easy, because users can put the data wherever they want on their computers, and use it from there. However, the majority of shared R scripts I’ve encountered suffer from the same problem: They attempt to, or instruct users to, set the working directory from within the .R script itself:

setwd("C:/user1/arbitrary/and/idiosyncratic/path/")

Don’t do it (Gandrud 2016)! This practice uses absolute file paths, which are bad. Instead, you should be using relative file paths, which are good. Let me explain.

Absolute file paths

Setting the working directory from within a script relies on absolute file paths, which point to a specific location on a computer’s hard drive. This absolute location is sensitive to the folders on the user’s computer, because it specifies the entire path to the location (C:/user1/...). Therefore, if your user name isn’t “user1”, or you don’t have exactly the same folders as in the code snippet above, the script won’t run on your computer. I’d like my scripts to run on everyone’s computers!

Some also recommend that you simply write the appropriate folder inside the quotes in setwd("") once you’ve downloaded the file(s). Why would I do that extra work? What if I later decide to move the file to another folder? It’s much better to use relative file paths instead.
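If you want to check your own scripts for this before sharing them, a quick scan can help. What follows is only a minimal sketch: the helper name find_setwd_calls and the example script lines are hypothetical, and the comment stripping is deliberately naive (it would also cut a "#" that appears inside a string).

```r
# Hypothetical helper: return the line numbers of a script that call setwd().
find_setwd_calls <- function(script_lines) {
  # Drop everything after a comment character (naive: ignores "#" in strings)
  code_only <- sub("#.*$", "", script_lines)
  which(grepl("\\bsetwd\\s*\\(", code_only, perl = TRUE))
}

# Hypothetical script contents, for illustration only
example_script <- c(
  'setwd("C:/user1/project/")',
  'd <- read.csv("data.csv")',
  '# setwd() is only mentioned in this comment'
)

# Flags line 1 only; the mention inside the comment is ignored
find_setwd_calls(example_script)
```

For a real file you would pass in readLines("analysis.R"), where analysis.R stands for whatever your script is called.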

Relative file paths

A better option is not to set the working directory at all. Instead, assume that users maintain their own computing environment. The easiest way to achieve this is to recommend that users work in R Projects, which automatically set the working directory to the project folder. Then, refer to files in your scripts by using relative file paths. R resolves a relative file path against the working directory, so inside a project it points to a file in relation to the project folder. For example, if your R script file and data file (e.g. data.csv) are both in that folder, in your script you would refer to the data with:

d <- read.csv("data.csv")

No path needed!
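To see why this works on any computer, it helps to remember that R resolves a relative path against the current working directory. The sketch below mimics that with a hypothetical helper, resolve_relative, and two made-up project locations; it only builds path strings and does not touch any files.

```r
# Hypothetical helper: shows the full path a relative path would resolve to,
# given a working directory (by default, the current one).
resolve_relative <- function(rel_path, wd = getwd()) {
  file.path(wd, rel_path)
}

# The same relative path on two hypothetical users' computers
resolve_relative("data.csv", wd = "C:/user1/project")
resolve_relative("data.csv", wd = "/home/user2/downloads/project")

# Files in subfolders work the same way; file.path() joins the parts
# with "/" on every platform
resolve_relative(file.path("data", "data.csv"), wd = "C:/user1/project")
```

The script itself only ever contains "data.csv" or file.path("data", "data.csv"); the part in front is supplied by each user's working directory.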

However, if you must set the working directory, use something external to the script, such as RStudio’s GUI.

Figure 1: The correct way to set R’s working directory

Share products, not files

Once your script files use relative file paths, it doesn’t matter where other users put the downloaded files. The files can be on a shared drive in Spain and users still don’t need to change the path with setwd(). This leads to an important insight: We shouldn’t be sharing individual files in a haphazard manner. Instead, we should share the entire project itself, with its documentation files, data files, code, etc. Importantly, sharing the entire project preserves the links between files (relative file paths), and makes it easy for people to collaborate on projects.

The simplest way to share the R “product” (all the files!) is to keep all your stuff related to the project under one main folder, and email compressed (.zip) packages of the folder to your collaborators. Of course, with a little bit more effort, you can share the project with collaborators directly, using Dropbox (not great) or GitHub (great).
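As a rough illustration of "everything under one main folder", the sketch below builds a small project in a temporary directory. The folder and file names (my-project, README.txt, data/, R/analysis.R) are assumptions chosen for the example, not a required layout.

```r
# Hypothetical project layout, built in a temporary folder for illustration
project_dir <- file.path(tempdir(), "my-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "R"), showWarnings = FALSE)

# Placeholder files standing in for documentation, data, and code
writeLines("What this project is and how to run it.",
           file.path(project_dir, "README.txt"))
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)),
          file.path(project_dir, "data", "data.csv"), row.names = FALSE)
writeLines('d <- read.csv(file.path("data", "data.csv"))',
           file.path(project_dir, "R", "analysis.R"))

# Files are listed relative to the project folder, so the layout stays
# the same wherever the folder ends up
project_files <- list.files(project_dir, recursive = TRUE)
print(project_files)
```

Because the script refers to the data as file.path("data", "data.csv"), zipping or cloning the whole folder keeps that link intact.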

Thanks for reading.

References

Gandrud, Christopher. 2016. Reproducible Research with R and R Studio, Second Edition. CRC Press.