#makeovermonday: Data breaches from informationisbeautiful.net

I’m going to try to do something quick for Andy Kriebel and Andy Cotgreave’s #makeovermonday every week, both to keep showcasing the different types of graphs you can make in my graphing web app Playfair and to identify and quash bugs! This week they chose an interactive from David McCandless’s informationisbeautiful.net showing the number of records leaked in data breaches between 2004 and 2016. Andy gives a nice run-down of the pros and cons of the original graphic.

I made the mistake of looking at a few early entries, and a major theme that seems to have struck several people is the division between data breaches that were the result of hacks and those that weren’t (including user error and lost equipment). A quick look at the data shows that the number of records leaked through hacks is increasing rapidly. My first thought was that an area chart would show this trend nicely, but then I remembered a variation on area charts I recently implemented that might work well here. I’m not sure what this kind of chart is called, but it’s simply an area chart with two categories where both areas originate from y=0: the area for one category sits above the x-axis and the area for the other sits below it. Here’s my entry for this data set:

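[Figure: my Playfair entry, an area chart of records leaked per year with hacked breaches above the x-axis and all other breaches below it]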
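The hacked-versus-other split boils down to a group-by: sum the records leaked per year and collapse every method of leak other than hacking into a single Other category. Below is a minimal Python sketch of that step. It assumes each breach arrives as a hypothetical (year, method, records) tuple and that hacks are labelled "hacked"; neither is a documented format of the informationisbeautiful.net data, and this is not how Playfair itself processes input.

```python
from collections import defaultdict


def totals_by_year(breaches):
    """Sum records leaked per year, split into "Hacked" and "Other".

    Assumes each breach is a (year, method, records) tuple and that hacks carry
    the method label "hacked" (both are assumptions, not the dataset's documented
    schema). Every other method, e.g. user error or lost equipment, counts as "Other".
    """
    # Each new year starts with both categories at zero.
    totals = defaultdict(lambda: {"Hacked": 0, "Other": 0})
    for year, method, records in breaches:
        category = "Hacked" if method.strip().lower() == "hacked" else "Other"
        totals[year][category] += records
    # Sort by year so the result reads chronologically.
    return dict(sorted(totals.items()))
```

Because each year starts with both categories at 0, a year with only one kind of breach still gets an explicit zero for the other, matching rows such as 2005 Hacked 0 in the table below.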

Why would you want to do this? I like this type of chart because it fixes one of the primary problems with stacked area charts: it’s easy to read quantities for the bottom-most category, which starts at y=0, but hard to figure out the year-by-year quantities for any other category because they no longer start at 0. You have to find the top of the category below and subtract it from the top of the category you’re interested in.
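To make the reading problem concrete, here is a small Python sketch comparing the band edges a reader sees in a conventional stacked area chart with those in the mirrored layout. The function names are hypothetical and this is not Playfair’s rendering code; the numbers are the 2012 values from the data table further down, in millions of records.

```python
def stacked_edges(lower, upper):
    """Conventional stacked area chart: the upper series is drawn on top of the
    lower one, so its band runs from lower[i] to lower[i] + upper[i]."""
    return [(lo, lo + up) for lo, up in zip(lower, upper)]


def mirrored_edges(above, below):
    """Mirrored area chart: both series start at y=0. `above` is drawn upward and
    `below` downward, so each band's far edge is the value itself (negated below)."""
    return [(0.0, a) for a in above], [(-b, 0.0) for b in below]


# 2012 values from the data table, in millions of records.
hacked, other = [453.73], [224.398792]

bottom, top = stacked_edges(hacked, other)[0]
print("stacked: Other =", round(top - bottom, 6))  # reader must subtract two edges

upper_bands, lower_bands = mirrored_edges(hacked, other)
print("mirrored: Other =", -lower_bands[0][0])  # the lower edge already is the value
```

In the stacked version the reader has to compute top minus bottom to recover the Other total; in the mirrored version the far edge of each band is already the value, just with its sign flipped below the axis.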

I added a few annotations highlighting a couple of the stories, since I think they add interesting context to what hacking and negligence mean in practice. The focus of my graph is clearly a bit different, though: McCandless’s chart is exploratory, letting you see where the data came from for each breach, whereas mine simply summarizes one interesting trend.

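Here’s the data as I entered it into Playfair. Records is the number of records leaked; top and bottom are the upper and lower edges of each area in millions of records, so Hacked areas run from 0 up to the total and Other areas run from 0 down to the negative of the total: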

year type Records top bottom
2004 Hacked 92000000 92 0
2005 Hacked 0 0 0
2006 Hacked 4000000 4 0
2007 Hacked 106100000 106.1 0
2008 Hacked 6500000 6.5 0
2009 Hacked 176521778 176.521778 0
2010 Hacked 5976400 5.9764 0
2011 Hacked 198735838 198.735838 0
2012 Hacked 453730000 453.73 0
2013 Hacked 236376000 236.376 0
2014 Hacked 364370000 364.37 0
2015 Hacked 96797000 96.797 0
2016 Hacked 457670436 457.670436 0
2004 Other 0 0 0
2005 Other 4225000 0 -4.225
2006 Other 66300000 0 -66.3
2007 Other 50186405 0 -50.186405
2008 Other 41566500 0 -41.5665
2009 Other 80290788 0 -80.290788
2010 Other 10160076 0 -10.160076
2011 Other 28327649 0 -28.327649
2012 Other 224398792 0 -224.398792
2013 Other 15926000 0 -15.926
2014 Other 20128000 0 -20.128
2015 Other 650000 0 -0.65
2016 Other 68614105 0 -68.614105
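The top and bottom columns follow mechanically from Records, so they can be generated rather than typed by hand. Here is a minimal Python sketch, assuming yearly totals are held in a dict of the form {year: {"Hacked": n, "Other": m}} (a hypothetical structure, not a Playfair input format). It prints rows in the same space-separated layout as the table above, using the 2004 and 2005 values as input.

```python
def millions(records):
    """Format a record count in millions without trailing zeros, e.g. 4225000 -> "4.225"."""
    return f"{records / 1_000_000:.6f}".rstrip("0").rstrip(".")


def playfair_rows(totals):
    """Build space-separated rows of: year type Records top bottom.

    Hacked areas rise from y=0 to +Records; Other areas drop from y=0 to -Records.
    top and bottom are expressed in millions of records.
    """
    rows = ["year type Records top bottom"]
    for category in ("Hacked", "Other"):
        for year in sorted(totals):
            records = totals[year][category]
            # Negate the Other totals so that area is drawn below the x-axis.
            top, bottom = (records, 0) if category == "Hacked" else (0, -records)
            rows.append(f"{year} {category} {records} {millions(top)} {millions(bottom)}")
    return rows


# Two years copied from the table above.
totals = {
    2004: {"Hacked": 92000000, "Other": 0},
    2005: {"Hacked": 0, "Other": 4225000},
}
print("\n".join(playfair_rows(totals)))
```

Running it prints the header plus the 2004 and 2005 rows exactly as they appear in the table, grouped by type in the same order.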

And here’s the variable setup in the area element tab:

[Figure: screenshot of the variable setup in Playfair’s area element tab]
