
This lesson is meant to be used in RStudio, not CoCalc!

Open your Rproj file

First, open your R Project file (library_carpentry.Rproj) created in the Before We Start lesson.

If you did not complete that step, do the following:

Presentation of the data

This data was downloaded from the University of Houston–Clear Lake Integrated Library System in 2018. It is a relatively random sample of books from the catalog. It consists of 10,000 observations of 11 variables.

These variables are:

Getting data into R

Ways to get data into R

In order to use your data in R, you must import it and turn it into an R object. There are many ways to get data into R.

Organizing your working directory

Using a consistent folder structure across your projects will help keep things organized and make it easy to find/file things in the future. This can be especially helpful when you have multiple projects. In general, you might create directories (folders) for scripts, data, and documents. Here are some examples of suggested directories:

- data/ Use this folder to store your raw data and intermediate datasets. For the sake of transparency and provenance, you should always keep a copy of your raw data accessible and do as much of your data cleanup and preprocessing programmatically (i.e., with scripts, rather than manually) as possible.
- data_output/ When you need to modify your raw data, it might be useful to store the modified versions of the datasets in a different folder.
- documents/ Used for outlines, drafts, and other text.
- fig_output/ This folder can store the graphics that are generated by your scripts.
- scripts/ A place to keep your R scripts for different analyses or plotting.

You may want additional directories or subdirectories depending on your project needs, but these should form the backbone of your working directory.

The working directory

The working directory is an important concept to understand. It is the place on your computer where R will look for and save files. When you write code for your project, your scripts should refer to files in relation to the root of your working directory and only to files within this structure.

Using RStudio projects makes this easy and ensures that your working directory is set up properly. If you need to check it, you can use getwd(). If for some reason your working directory is not what it should be, you can change it in the RStudio interface by navigating in the file browser to where your working directory should be, clicking on the blue gear icon "More", and selecting "Set As Working Directory". Alternatively, you can use setwd("/path/to/working/directory") to reset your working directory. However, your scripts should not include this line, because it will fail on someone else's computer.
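As a minimal sketch of checking the working directory and referring to files relative to it: the path data/books.csv is the file used later in this lesson, and the variable names wd and books_path are chosen here only for illustration.

```r
# Show the directory R currently treats as the working directory.
wd <- getwd()
print(wd)

# Build a path relative to the working directory. file.path() joins the
# components with "/" on every platform.
books_path <- file.path("data", "books.csv")
print(books_path)

# normalizePath() expands the relative path to an absolute one.
# mustWork = FALSE avoids an error if the file does not exist yet.
print(normalizePath(books_path, winslash = "/", mustWork = FALSE))
```

Because the path is relative, the same script should work on another computer as long as the project folder has the same layout.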

Setting your working directory with setwd()

Some points to note about setting your working directory:

The directory must be in quotation marks.

On Windows computers, directories in file paths are separated with a backslash \. However, in R the backslash is an escape character inside strings, so you must use a forward slash / instead (or double each backslash as \\). You can copy and paste the path from the Windows Explorer window directly into R and use find/replace (Ctrl/Cmd + F) in RStudio to replace all backslashes with forward slashes.

On Mac computers, open the Finder and navigate to the directory you wish to set as your working directory. Right-click on that folder and hold the Option key on your keyboard. The 'Copy "Folder Name"' option will change to 'Copy "Folder Name" as Pathname'. Selecting it copies the path to the folder to the clipboard. You can then paste this into your setwd() function. You do not need to replace backslashes with forward slashes.

After you set your working directory, you can use ./ to represent it. So if you have a folder in your working directory called data, you can use "./data" to refer to that sub-directory, and a file inside it can be read with, for example, read.csv("./data/books.csv").
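The points above about slashes and ./ can be sketched as follows. The Windows path is a made-up example, not a path from this lesson, and win_path, r_path, and data_dir are illustrative names.

```r
# A hypothetical Windows path as copied from Explorer. Inside an ordinary
# R string, each backslash has to be written twice.
win_path <- "C:\\Users\\me\\Documents\\library_carpentry"

# Replace every backslash with a forward slash. fixed = TRUE makes gsub()
# treat the pattern as a literal backslash, not a regular expression.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(r_path)

# "./" stands for the working directory, so this refers to its data folder.
data_dir <- file.path(".", "data")
print(data_dir)
```

The converted string is in the form that setwd() accepts, although, as noted earlier, setwd() calls are best kept out of shared scripts.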

Downloading the data and getting set up

Now that you have set your working directory, we will create our folder structure using the dir.create() function.

For this lesson we will use the following folders in our working directory: data/, data_output/ and fig_output/. Let's write them all in lowercase to be consistent. We can create them using the RStudio interface by clicking on the "New Folder" button in the file pane (bottom right), or directly from R by typing at the console:

dir.create("data")
dir.create("data_output")
dir.create("fig_output")
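If you rerun the lines above, dir.create() warns that the folders already exist. One possible variation, shown here as a sketch and not as part of the original lesson, checks for each folder first so the code can be run repeatedly without warnings:

```r
# The three folders used in this lesson.
folders <- c("data", "data_output", "fig_output")

# Create each folder only if it is not already present.
for (folder in folders) {
  if (!dir.exists(folder)) {
    dir.create(folder)
  }
}

# Confirm that all three folders now exist.
print(dir.exists(folders))
```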

Go to the Figshare page for this curriculum and download the dataset called "books.csv". The direct download link is: https://ndownloader.figshare.com/files/22031487. Place this downloaded file in the data/ folder you just created. Alternatively, you can do this directly from R by copying and pasting the following into your R console:

download.file("https://ndownloader.figshare.com/files/22031487",
              "data/books.csv", mode = "wb")

Now if you navigate to your data folder, the books.csv file should be there. We now need to load it into our R session.
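To make the download step repeatable and to confirm from R that the file is where you expect, something like the following could be used. Both download_if_missing() and report_file() are hypothetical helpers written for this sketch; they are not part of the lesson or of any package.

```r
# Hypothetical helper: fetch a file only when it is not already on disk.
# Returns TRUE if a download happened, FALSE otherwise.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  download.file(url, destfile, mode = "wb")
  TRUE
}

# Hypothetical helper: report whether a file is present without stopping
# the script. file.size() gives the size in bytes.
report_file <- function(path) {
  if (file.exists(path)) {
    message("Found ", path, " (", file.size(path), " bytes)")
  } else {
    message("Could not find ", path, "; check getwd() and the data/ folder")
  }
  invisible(file.exists(path))
}
```

For this lesson, the calls would be download_if_missing("https://ndownloader.figshare.com/files/22031487", "data/books.csv") followed by report_file("data/books.csv").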