


In this post I describe the dslabs package, which contains some datasets that I use in my data science courses.

A much discussed topic in stats education is that computing should play a more prominent role in the curriculum. I strongly agree, but I think the main improvement will come from bringing applications to the forefront and mimicking, as best as possible, the challenges applied statisticians face in real life. I therefore try to avoid widely used toy examples, such as the mtcars dataset, when I teach data science. However, my experience has been that finding examples that are realistic, interesting, and appropriate for beginners is not easy. After a few years of teaching I have collected a few datasets that I think fit these criteria. To facilitate their use in introductory classes, I have included them in the dslabs package.
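A minimal sketch of getting the package, assuming it is installed from CRAN. The `repos` URL is my assumption, added so the call also works in a non-interactive session:

```r
# Install dslabs from CRAN (only needed once), then load it.
# The repos argument is an assumption; any CRAN mirror should do.
install.packages("dslabs", repos = "https://cloud.r-project.org")
library(dslabs)
```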


Below I show some examples of how you can use these datasets. You can see the datasets that are included here:
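One way to get that listing, sketched with base R's `data()` function and assuming dslabs is already installed:

```r
library(dslabs)

# data() with the package argument returns an index of the datasets
# shipped with that package; printing it shows names and titles.
dslabs_index <- data(package = "dslabs")
dslabs_index$results[, c("Item", "Title")]
```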


Note that the package also includes some of the scripts used to wrangle the data from their original source:
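A sketch of how those scripts can be located in an installed copy of the package. The directory name `script` is an assumption about the package layout; if it does not exist, `system.file()` returns an empty string and the listing is empty:

```r
# Find the script directory inside the installed package and list its files.
# "script" is an assumed directory name, not confirmed by the text above.
script_dir <- system.file("script", package = "dslabs")
scripts <- list.files(script_dir)
scripts
```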

If you want to learn more about how we use these datasets in class, you can read this paper or this online book.

This dataset includes gun murder data for US states in 2012. I use this dataset to introduce the basics of R programming.
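As an illustration of the kind of first exercise this supports, here is a base R sketch that computes a murder rate per 100,000 people. It assumes the data frame is named `murders` and has `state`, `population`, and `total` columns:

```r
library(dslabs)
data(murders)

# Inspect the structure: one row per state plus the District of Columbia.
str(murders)

# Murders per 100,000 people, then the states with the highest rates.
murders$rate <- murders$total / murders$population * 1e5
head(murders[order(murders$rate, decreasing = TRUE), c("state", "rate")])
```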

This dataset includes health and income outcomes for 184 countries from 1960 to 2016. It also includes two character vectors, OECD and OPEC, with the names of OECD and OPEC countries from 2016. I use this dataset to teach data visualization and ggplot2.
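A small ggplot2 sketch of the sort of plot this dataset lends itself to. The choice of year (1962) and of variables is mine, and the column names `fertility`, `life_expectancy`, `continent`, and `year` are assumed to match the `gapminder` data frame in the package:

```r
library(dslabs)
library(ggplot2)
data(gapminder)

# Keep a single year so that each point is one country.
gapminder_1962 <- subset(gapminder, year == 1962)

# Fertility versus life expectancy, colored by continent.
p <- ggplot(gapminder_1962,
            aes(x = fertility, y = life_expectancy, color = continent)) +
  geom_point() +
  labs(x = "Fertility (children per woman)",
       y = "Life expectancy (years)")
print(p)
```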


This dataset contains yearly counts for Hepatitis A, measles, mumps, pertussis, polio, rubella, and smallpox for US states. Original data courtesy of Tycho Project. I use it to show ways one can plot more than 2 dimensions.
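One way to show three dimensions at once (year, state, and rate) is a tile plot. The sketch below assumes the data frame is named `us_contagious_diseases`, that it has `disease`, `state`, `year`, `count`, and `population` columns, and that measles is coded as `"Measles"`. The rate here is a simple count per 10,000 people and does not adjust for the number of weeks reported:

```r
library(dslabs)
library(ggplot2)
data(us_contagious_diseases)

# Keep one disease and compute cases per 10,000 people.
measles <- subset(us_contagious_diseases, disease == "Measles")
measles$rate <- measles$count / measles$population * 1e4

# Year on x, state on y, and rate encoded as fill color.
p <- ggplot(measles, aes(x = year, y = state, fill = rate)) +
  geom_tile(color = "grey50") +
  scale_fill_gradient(low = "white", high = "red", na.value = "grey90") +
  labs(x = "Year", y = NULL, fill = "Cases per 10,000")
print(p)
```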

This dataset includes poll results from the 2016 US presidential election aggregated from HuffPost Pollster, RealClearPolitics, polling firms and news reports. The package also includes election results (popular vote) and electoral college votes in results_us_election_2016. I use this dataset to teach inference.
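A rough sketch of a first inference exercise: estimating the national spread between the two main candidates from the polls. It assumes the data frame is named `polls_us_election_2016`, that national polls are marked with `state == "U.S."`, and that `rawpoll_clinton` and `rawpoll_trump` hold percentages:

```r
library(dslabs)
data(polls_us_election_2016)

# Keep national polls only (assumed to be coded as "U.S.").
national <- subset(polls_us_election_2016, state == "U.S.")

# Spread in percentage points for each poll, then its average and
# a naive standard error that treats polls as independent draws.
national$spread <- national$rawpoll_clinton - national$rawpoll_trump
spread_mean <- mean(national$spread, na.rm = TRUE)
spread_se <- sd(national$spread, na.rm = TRUE) / sqrt(sum(!is.na(national$spread)))
c(estimate = spread_mean, se = spread_se)
```

Treating polls as independent draws ignores pollster effects and time trends, so the standard error here is only a starting point for discussion.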

These are self-reported heights in inches for males and females from a data science course, collected across several years. I use this to teach distributions and summary statistics.
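For example, group-wise summary statistics can be computed in base R. This sketch assumes the data frame is named `heights` and has a `sex` factor and a numeric `height` column:

```r
library(dslabs)
data(heights)

# Mean and standard deviation of height (in inches) by sex.
height_mean <- tapply(heights$height, heights$sex, mean)
height_sd <- tapply(heights$height, heights$sex, sd)
rbind(mean = height_mean, sd = height_sd)
```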


These data have been highly wrangled, because students often reported heights in units other than inches. The original entries are here:
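A sketch for looking at the raw entries, assuming they are stored in a data frame named `reported_heights` whose `height` column is a character vector:

```r
library(dslabs)
data(reported_heights)

# Entries that cannot be read directly as a number are the ones
# that need wrangling.
parsed <- suppressWarnings(as.numeric(reported_heights$height))
not_numeric <- is.na(parsed)
head(reported_heights$height[not_numeric], 10)
```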

We use this as an example to teach string processing and regex.
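To give a flavor of the exercise, here is a small sketch that converts entries written as feet and inches (such as `5'7`) into inches using a regular expression. The pattern and the helper `to_inches()` are hypothetical illustrations, not the code used in class, and they handle only this one format:

```r
# One digit for feet (4 to 7), an apostrophe, one or two digits for inches,
# and an optional trailing double quote. Spaces around the apostrophe are allowed.
feet_inches <- "^([4-7])[[:space:]]*'[[:space:]]*([0-9]{1,2})\"?$"

# Hypothetical helper: returns inches for matching entries, NA otherwise.
to_inches <- function(x) {
  matches <- regmatches(x, regexec(feet_inches, x))
  vapply(matches, function(g) {
    if (length(g) == 3) as.numeric(g[2]) * 12 + as.numeric(g[3]) else NA_real_
  }, numeric(1))
}

to_inches(c("5'7", "5' 11\"", "68", "165cm"))
```

Entries that are already plain numbers, or that use other units, come back as `NA` and would need their own rules.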

Finally, here is a silly example from the website Spurious Correlations that I use when teaching that correlation does not imply causation.
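A sketch of how the correlation can be computed, assuming the data frame is named `divorce_margarine` and has columns `margarine_consumption_per_capita` and `divorce_rate_maine`:

```r
library(dslabs)
data(divorce_margarine)

# Pearson correlation between margarine consumption and the divorce rate
# in Maine. A high value here says nothing about cause.
r <- cor(divorce_margarine$margarine_consumption_per_capita,
         divorce_margarine$divorce_rate_maine)
r
```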
