Introduction
Welcome to the SURGE Discover Coding R workshop! This workshop is aimed at people from science disciplines who have no previous experience with a programming language. We teach the basics of coding using R, a very popular language used in science and data science. If you have programmed before, but are new to R, you will also find this workshop useful, if perhaps a bit slow at times.
Our Approach
This is not an introduction to computer science. The SURGE Discover Coding series aims to teach people working in science how to use R as a tool for working with data. As such, our focus is on:
- learning the fundamentals of R
- learning the fundamentals of programming logic
- using R for data science, including:
  - reading data
  - manipulating/processing data (e.g., extracting specific data, splitting data according to variables, applying functions, combining data)
  - exploratory data analysis
  - basic statistical analyses of data sets
Learning Objectives
Upon completing this workshop, you will be able to:
- understand and use variables
- work with R to import data files and packages
- obtain basic summary statistics from data files
- manipulate and extract data from imported data files
- visualize data using R’s ggplot2 package, and customize these plots
- write R code according to standard style guidelines
Cloud-Based R
For this workshop, we will use a cloud-based resource (CoCalc) for all of the programming. When you register for a workshop, you will receive instructions to create an account on CoCalc. Please do this before the start of the workshop to avoid delays. Then, at the start of the workshop, attendees’ CoCalc accounts will be upgraded to a paid version for the duration of the workshop.
Using CoCalc has a number of advantages: you do not need to install R on your own computer, and you are not limited by the type or specifications of the computer you have. As well, workshop attendees will have all the data files they need, and “skeleton” files for each workshop, automatically copied to their CoCalc accounts. An added advantage of the cloud platform is that members of the teaching team can see the work of any participant in the workshop. This is very useful when participants get stuck and need help.
If you want to install R on your own computer (either to follow along with a workshop, or to use afterwards), that’s fine. We recommend you install R together with RStudio, a widely used development environment for R. Once the packages used in the workshop are added, this setup should cover what you’ll need for the workshop, and likely for the majority of your future data science work as well. However, if you run R on your own computer, instead of on the cloud server for this workshop, we will not be able to view your work if you are having trouble, so our ability to support you will be more limited.
Origins of this Material
This workshop is adapted from one created by the Software Carpentry foundation. It uses freely available open source data from Gapminder, an independent Swedish foundation whose mission is to “fight devastating ignorance with a fact-based worldview everyone can understand”, as well as data available through the UCI Machine Learning Repository. Gapminder is perhaps most famous for the TED talks given by its co-founder, the late Dr. Hans Rosling (the other founders were Ola Rosling and Anna Rosling Rönnlund). Dr. Rosling’s TED talk, The best statistics you’ve ever seen, became one of the most watched TED talks ever (nearly 15 million views as of Feb. 4, 2021). The Gapminder data we will be working with here are a subset of those used in Dr. Rosling’s talk. Specifically, the data are the gross domestic product (GDP) of countries from around the world, over the period from 1952 to 2007.
While we could introduce R with virtually any data, the Gapminder data have several advantages: they are relatively easy to understand without deep technical knowledge of a domain (GDP is a measure of a country’s wealth, and the data reflect how this changes over time); the data files are open source and so support open sharing and transparency in science; and Dr. Rosling’s talks provide a colourful and interesting introduction to the data. We have adapted the Software Carpentry (SC) version of the workshop based on our experience teaching it. We found that some parts of the SC workshop were unclear, that others assumed background knowledge (particularly mathematics) that was not universally held by our target audience, and that some concepts were presented in a sequence that was not intuitive to us. We have great respect for the SC organization and its aims, and indeed most members of the SURGE Discover Coding teaching team are SC-certified instructors. Adapting and customizing the workshop is in the spirit of the open source movement and open educational resources, which SURGE wholeheartedly supports.