This document compiles R resources that I recommend. R is not the most straightforward programming language, and different styles of R code can appear drastically different. Almost every style will look quite different from more conventional programming languages like C or Python.
Given that, it is easy to head down the wrong path in R. The resources I list here are largely chosen to avoid those pitfalls.
Basics:
Do you want to learn R?
If you have little to no coding experience and want to get started, or are new to R and data processing, this online book is great:
Statistical Inference via Data Science https://moderndive.com/1-getting-started.html
If you are kinda familiar with code, or have even coded extensively in old-school base R, but are new to tidy R or data science with R, Hadley’s book is the definitive guide to analyzing data with R:
R for Data Science https://r4ds.had.co.nz/index.html
Both of these resources are listed on the tidyverse website.
Statistics:
Do you want to get some basic descriptive statistics (mean, variance, etc)? Do you want to make some plots or fit a basic model?
R for Data Science, again, has you covered.
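To give a rough idea of what that looks like, here is a minimal sketch that computes a few descriptive statistics, makes a plot, and fits a basic model. The built-in `mtcars` dataset is my choice for illustration, not something taken from the book, and I stick to base R functions here so it runs without any extra packages.

```r
# mtcars ships with R; mpg = miles per gallon, wt = weight in 1000 lbs
mpg_mean <- mean(mtcars$mpg)
mpg_var  <- var(mtcars$mpg)
mpg_sd   <- sd(mtcars$mpg)

# Five-number summary plus the mean
summary(mtcars$mpg)

# Scatter plot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Basic linear model: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Add the fitted line to the existing scatter plot
abline(fit)
```

R for Data Science does the same kind of thing with tidyverse functions, which scale better once you are grouping and summarising many variables at once.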
Do you want to model your data for legit scientific use? Understand linear or logistic regression? Use (and actually understand) generalized additive models?
By far the best resource for statistical data analysis I’ve ever seen is Cosma Shalizi’s:
Advanced Data Analysis from an Elementary Point of View https://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
Most texts either focus on the principles and theory of statistics, or the nitty gritty of data analysis code. This book stays focussed on the basics while also demonstrating everything with actual R code and plots. The one downside is that it uses base R instead of tidyverse stuff. However, once you are familiar with tidyverse stuff it is not hard to go between the two. The largest difference is in the base `plot` versus the tidy `ggplot`.
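To make the plotting difference concrete, here is a small side-by-side sketch of the same scatter plot in both styles. It assumes the ggplot2 package is installed, and the `mtcars` dataset and axis labels are just my choices for illustration.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Base R: a single function call draws straight to the graphics device
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: build a plot object from data, aesthetic mappings, and layers
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# The plot is only drawn when the object is printed
print(p)
```

The base version is shorter for a quick look, while the ggplot version gives you an object you can keep adding layers to.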
Some of the content in this book is probably not of interest. I certainly haven’t made it through the whole thing. So I’ll highlight some sections I’ve found particularly useful here:
- 1.1 Regression: Predicting and Relating Quantitative Features
- 2.4 Linear Regression Is Not the Philosopher’s Stone
- 3 Model Evaluation
- 8 Additive Models
- 12 Generalized Linear Models and Generalized Additive Models
I think if you were to work through that whole book you would have a better statistics education than the majority of professional data scientists.
Spatial Data:
Do you want to make maps? Do spatial statistics? Explore movements?
I just found this text by Yeh Chia Jung, but it looks really nice. It’s basically an in-depth how-to on using the `sf` package. It illustrates how to use `ggplot` in combination with spatial data, and also explains what goes into making data spatial.
Spatial Analysis with R https://chiajung-yeh.github.io/Spatial-Analysis/
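As a taste of what "making data spatial" means, here is a minimal sketch, assuming the sf and ggplot2 packages are installed. The `obs` data frame and its coordinates are hypothetical values I made up for the example; they are not from the text above.

```r
library(sf)       # assumes the sf package is installed
library(ggplot2)  # assumes the ggplot2 package is installed

# Hypothetical observation points with plain longitude/latitude columns
obs <- data.frame(
  id  = c("a", "b", "c"),
  lon = c(-122.33, -122.68, -123.12),
  lat = c(47.61, 45.52, 49.28)
)

# Making the data spatial: say which columns hold the coordinates and which
# coordinate reference system they use (EPSG:4326 is WGS 84 lon/lat)
pts <- st_as_sf(obs, coords = c("lon", "lat"), crs = 4326)

# geom_sf() reads the geometry column, so no x/y mapping is needed
p <- ggplot(pts) +
  geom_sf(aes(colour = id))

print(p)
```

Once the data frame is an `sf` object, the usual tidyverse verbs still work on it, and the geometry column comes along for the ride.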
Advanced R topics:
At some point while writing tidy R code, you are going to get frustrated and feel limited in what you can do. Most likely you will then start trying to write Python in R, with for loops and such. This can work, but it will also get really messy, and if you choose this path you will regret ever using R, because things would have been easier and faster had you started with Python.
Fortunately there is another path, and once again Hadley is there to guide us:
Advanced R https://adv-r.hadley.nz/
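One example of that other path is the functional style that Advanced R covers in its chapter on functionals. The sketch below contrasts a for loop with the same computation written using `vapply()`. The `samples` list is a hypothetical input I made up, and I use base R here so it runs without extra packages.

```r
# Hypothetical input: a named list of numeric vectors of different lengths
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop style, the way you might write it coming from Python
means_loop <- numeric(length(samples))
for (i in seq_along(samples)) {
  means_loop[i] <- mean(samples[[i]])
}
names(means_loop) <- names(samples)

# Functional style: hand the function to vapply() and declare that each
# call must return exactly one number
means_fun <- vapply(samples, mean, FUN.VALUE = numeric(1))

# Both approaches give the same named numeric vector
all.equal(means_loop, means_fun)
```

The tidyverse equivalent would be `purrr::map_dbl(samples, mean)`. Either way, the bookkeeping (preallocating, indexing, naming) disappears, which is most of what makes loop-heavy R code messy.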
I built this package because I realized how hard it was to reliably share R scripts for analyzing telemetry data. In much the same way that putting all your telemetry data in a consistent, self-contained database allows you to share data and maintain data consistency, putting R code into a package allows you to share your code and maintain reproducibility.
Developing in the context of an R package is different than writing small one-off scripts, but there are huge benefits, and once you are past the initial hurdles, it does not add an incredible amount of overhead. This book will guide you through the process:
R Packages https://r-pkgs.org/
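To give a sense of how small the step from script to package can be, here is a sketch of what a single function file in a package's `R/` directory might look like. The function `ms_to_kmh()` is a hypothetical example of mine, not something from this package; the `#'` comments are roxygen2 tags, which the R Packages book explains how to turn into help pages.

```r
#' Convert speed from metres per second to kilometres per hour
#'
#' Hypothetical helper, used here only to illustrate package layout.
#'
#' @param x Numeric vector of speeds in m/s.
#' @return Numeric vector of speeds in km/h.
#' @export
ms_to_kmh <- function(x) {
  # Fail early with a clear error instead of returning nonsense
  stopifnot(is.numeric(x))
  x * 3.6
}

# Example use once the function is defined (or the package is loaded)
ms_to_kmh(c(1, 10))
```

The code itself is the same as it would be in a script. The package structure adds documentation, a declared interface, and a place to put tests.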
R Markdown:
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
For more info on how to write R Markdown and do really fancy stuff with it:
R Markdown: The Definitive Guide https://bookdown.org/yihui/rmarkdown/