
A refreshing dip in the data pool


Harry Caufield

A tiger, enjoying a swim. At least I'm assuming it's enjoying itself. Photo by Ber'Zophus on Wikimedia Commons.

Feeling overheated by all the Big Data breathing down your neck? Cool off with some toy data sets. Here, I'm using "toy" to mean "anything you don't have to be responsible for and can just have some fun with."

R users are familiar with mtcars, a data set describing 32 automobile models from the early 1970s. It's an old standard. Additional R data sets can be listed using data(), and more can be loaded from packages like MASS (which ships with standard R installations, so don't worry about installing it). If you'd prefer to use these data sets in Python, there's a package called PyDataset to make it easy.
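As a quick sketch of that workflow, the snippet below looks at mtcars, lists the data sets R can see, and loads one from MASS. Boston is simply one MASS data set picked for illustration, and the variable names all_sets and mass_sets are my own.

```r
# mtcars is available as soon as R starts (it lives in the datasets package)
str(mtcars)       # 32 observations of 11 variables
head(mtcars, 3)   # first three rows

# data() with no arguments lists data sets in every attached package;
# assigning the result keeps it from opening in a pager
all_sets <- data()
head(all_sets$results[, c("Package", "Item")])

# Restrict the listing to one package
mass_sets <- data(package = "MASS")$results[, "Item"]
head(mass_sets)

# Load a single data set by name without attaching the whole package
data("Boston", package = "MASS")
head(Boston, 3)
```

Calling data("Boston", package = "MASS") copies just that data set into the workspace, which is handy when you don't want the rest of the package's functions masking anything.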

Not happy with that data? Try - it's currently the home of nearly 186,000 data sets across numerous disciplines. They vary in format as well: some are nice, clean CSVs while others may just be collections of spreadsheets. Still others may require some navigation to get to the useful material.

Here are some examples, found through and other sources:

Kaggle has some fun data sets to work with too, as does Amazon Web Services.

Or you can just give up and make a small synthetic data frame in R:

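# 50 rows x 10 columns of random integers between 0 and 100
syn <- data.frame(replicate(10, sample(0:100, 50, replace = TRUE)))
# Random 4-character row names (digits and capitals); make.unique() guards against the rare duplicate
rownames(syn) <- make.unique(replicate(50, paste(sample(c(0:9, LETTERS), 4, replace = TRUE), collapse = "")))
# Random 4-letter column names
colnames(syn) <- make.unique(replicate(10, paste(sample(LETTERS, 4, replace = TRUE), collapse = "")))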
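If you find yourself doing this often, the same idea can be wrapped in a small function. This is only a sketch: make_toy_frame() and its arguments are hypothetical names of my own, not part of any package, and the optional seed is there so that a toy set can be regenerated exactly.

```r
# Hypothetical helper: build a data frame of random integers with random names
make_toy_frame <- function(n_rows = 50, n_cols = 10, seed = NULL) {
  # Setting a seed makes the "random" data reproducible
  if (!is.null(seed)) set.seed(seed)

  # Draw all values at once and shape them into n_rows x n_cols
  values <- matrix(sample(0:100, n_rows * n_cols, replace = TRUE), nrow = n_rows)
  toy <- data.frame(values)

  # Build one random 4-character label from the given alphabet
  random_label <- function(alphabet) {
    paste(sample(alphabet, 4, replace = TRUE), collapse = "")
  }

  # make.unique() appends a suffix if two labels happen to collide
  rownames(toy) <- make.unique(replicate(n_rows, random_label(c(0:9, LETTERS))))
  colnames(toy) <- make.unique(replicate(n_cols, random_label(LETTERS)))
  toy
}

toy <- make_toy_frame(seed = 1)   # the seed value is arbitrary
dim(toy)                          # 50 rows, 10 columns
summary(toy[, 1:3])               # quick look at the first three columns
```

Drawing the values into a matrix first, rather than using replicate(), keeps the shape right even when you ask for a single row or a single column.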