Oh say say say: treemaps in R

R provides a seemingly endless toolbox of data visualization options. As a quick example, I was trying to find a way to create a treemap yesterday, and a Flowingdata post provided the ideal R solution. It requires the portfolio package. Here's a test with some randomized data:

> library(portfolio)

> testdata = data.frame(replicate(4, sample(0:1000, 150, replace=FALSE)))

> map.market(id=testdata$X1, area=testdata$X2, group=testdata$X4, color=testdata$X3, main="Random Map")

The result looks like this:

Ignore the color key - there are no negative values here.
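Because the data are random, the figure changes on every run. A minimal sketch of a repeatable version, assuming a seed value of 42 (my arbitrary pick, not part of the original test), uses only base R:

```r
# Fix the random number generator so the same test data come back each run.
# The seed value 42 is an arbitrary, hypothetical choice.
set.seed(42)

# Four columns (X1..X4), each holding 150 distinct integers drawn from 0..1000.
testdata <- data.frame(replicate(4, sample(0:1000, 150, replace = FALSE)))

# Quick look at the shape and column names before plotting.
dim(testdata)
names(testdata)
```

Each column is sampled without replacement, so values are unique within a column but can repeat across columns.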

True, this example defeats one purpose of a treemap: there's usually a hierarchy defined by qualitative variables, and here the group is just another random number. There is, in fact, room for two different quantitative variables (area and color) and one category (group) for each item in the data. We can certainly add a category to the data as it is:

> testdata$category <- ifelse(testdata$X4 > 800,"Large", "Small")

Now every item with an X4 value greater than 800 will be in one category and everything else will be in the other. The output:

Neatly-organized boxes.

As expected, the smaller category ("Large") gets squished into one side of the figure and its contents get resized to fit. I miss the individual data labels, though they did get a bit dense.
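A quick count shows why "Large" ends up as the smaller group: only values above 800 out of a 0 to 1000 range qualify. This is a base-R sketch that rebuilds the test data with an assumed seed (42, hypothetical), so the exact counts will differ from the figure above:

```r
# Rebuild comparable test data; the seed is an arbitrary, hypothetical choice.
set.seed(42)
testdata <- data.frame(replicate(4, sample(0:1000, 150, replace = FALSE)))

# Same rule as above: X4 values over 800 are "Large", the rest "Small".
testdata$category <- ifelse(testdata$X4 > 800, "Large", "Small")

# Tally the items per category.
counts <- table(testdata$category)
counts
```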

The remaining issue is one of colors. The map.market function allows for different scales but doesn't appear to provide many customization options.

This blog post describes an alternate implementation which is RColorBrewer-friendly. After installing and loading RColorBrewer (it's essential code, but it never seems to be installed when I need it), the newer treemap method is handled like this:

> treemap(testdata$X1, testdata$X2, testdata$category, testdata$X3, main="Random Map with Categories", pal="Reds", textcol="black")

From Sunset Snowbank to Waxy Red Delicious.
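To see roughly what a sequential palette does with the color variable, here is a base-R sketch that bins values and looks up a shade for each bin. It is an illustration only, not necessarily how the treemap function above assigns colors internally; the seed, the two endpoint colors, and the choice of 9 bins are my assumptions. With RColorBrewer loaded, brewer.pal(9, "Reds") would supply a comparable set of shades.

```r
# Stand-in for testdata$X3; the seed is an arbitrary, hypothetical choice.
set.seed(42)
values <- sample(0:1000, 150, replace = FALSE)

# Interpolate 9 shades between a pale and a dark red (endpoints picked by hand).
reds <- colorRampPalette(c("#FFF5F0", "#67000D"))(9)

# Split the value range into 9 equal-width bins; labels = FALSE returns bin numbers.
bins <- cut(values, breaks = 9, labels = FALSE)

# Look up one shade per item: low values get pale reds, high values dark reds.
fill <- reds[bins]
head(data.frame(values, bins, fill))
```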

A treemap like this is likely a bit overkill for most applications, especially if there are too many categories or data items to be informative. A figure like this treemap of Vietnam's industrial exports provides an example: the smaller items are all unlabeled, so why include them? As always, results will vary and depend upon both data and conclusions.