Ah, the glamorous life of the data visualization designer… to draw or not to draw? To obfuscate or not to obfuscate? I’ve been doing some reading lately about a debate that is making its way amongst the data viz community. At what point does too much illustration, creativity or innovation get in the way of the primary purpose of data visualization? And how well is the design community being transparent about art based on data versus data visualization? Or, to put it more simply, should data visualization be easy to understand and what happens when it’s not?
Allow me, first, to offer up my own definition, artfully cadged from people much smarter than I and enhanced by my own experience in the field, such as it is. So, data visualization is what, exactly?
Information served up visually in order to inform and improve/enhance our understanding of the data.
Clumsy, but I’m hitting the main points: inform and understanding. If pressed, I would add the word “easily.” Actually, it’s the word “easily” that prompted me to write this.
If you can’t understand a data visualization piece, then it’s pretty useless, isn’t it? Maybe it’s beautiful, but if you walk away more confused than you began, it’s useless. And if you walk away as confused, or a bit less confused, it’s still useless.
How far can we take this concept? Here is a quick survey of what folks have been saying lately. Props to infosthetics for providing a good starting point for these discussions. And here they are:
Stephen Few’s blog post on the two types of data viz is a good start. According to Few (Tufte’s alter-ego), there are two approaches to presenting data graphically: data visualization and data art. As he puts it, “rarely do the twain meet.” Therein lies the problem. They do meet. All the time. Though Few makes a good point (failing to distinguish between them creates confusion and harm), I would argue that the two are not mutually exclusive.
Few defines data visualizations as products created to inform, and “data art” as visualizations of data created to entertain—“art based on data”—something which can be judged accordingly.
My response? Would that the public were quite as discerning as he. The train has left the station and what we have before us is, at worst, a proliferation of eager designers too quick on the draw to consider the very important questions that need to be asked about the data that are being depicted. At best, a cadre of informed (and willing to learn) designers who humbly allow the information, the audience and the goals of the visualization to drive the design, who are loath to add one extra pixel that doesn’t belong, and willing to take away any element that obscures a better understanding of the data. I’d like to think that I fall into the latter category but I fall somewhere in the middle, as do most designers.
Rather than drawing a bright line between these two approaches and dogmatically refusing to accept a middle ground, I suggest we embrace a blend of these when they are produced well—when they inform and present a clearer understanding of the data and are at the same time aesthetically pleasing. As a designer who chooses to serve both masters—art and data, I find joy in being able to translate a jumble of Excel rows and columns into a plain bar chart—sometimes the beauty lies in the hard work of sifting through the data and simplifying complexity. And sometimes the joy comes from experimenting with different formats and adding visual accents to enhance the data—provided, of course, that the user’s ability to understand the data is not impeded, but enhanced.
Nevertheless, I agree with Few’s depiction of the pitfalls of “data art” being misperceived as data visualization, and I’ll add one myself. In addition to spreading poor practice instead of best practice, it creates unrealistic expectations about what is acceptable in a data visualization, particularly for those of us who are working in the industry in a supportive capacity to researchers and writers with an uneven understanding of best practices (how many of us have been asked to create 3D graphics or exploding pie-charts on a whim?).
And a rising tide lifts all boats. In this case, I’ll agree with Few’s point that the proliferation of “data art” and other fancy-schmancy graphics that pass for data visualization implies that data viz is a closely guarded secret known only to denizens of the data underworld (paraphrasing liberally from Mr. Few, here). But I take issue with his assertion that this prevents the “democratization of data,” implying that the public is somehow being dissuaded from engaging with and creating data. For better or for worse, they aren’t. Just google “infographics.”
As an interesting aside, note that Eagereyes’ Robert Kosara wrote a primer on the two types of data visualization that Few discusses, waaaay back in 2007. Like Few, Kosara was also bothered by the blurred line between data and art. What Few calls “data art” Kosara called “artistic visualization.” Nonetheless, they each underscore the same points—keep data and art separate in order to be as transparent and clear about the data as possible. I agree with the goal.
As Kosara puts it, “looking at one type of visualization expecting the other will lead to disappointment and misunderstandings.”
Kosara uses what is, in my opinion, one of the best data viz sites out there (infosthetics) as an example of sites that don’t make those distinctions, thus creating confusion. Granted, this was back in 2007. I wonder what he’d say now? Nonetheless, I disagree. Let’s not confuse lack of best practice (for example, normalizing your data to prove a point, and not being transparent about it) with the so-called sin of creating a piece that is visually striking. A designer can produce a graph with no artistic aspirations whatsoever that nonetheless obscures the data. And a designer can produce a terrific visual that observes best practices (to inform) and serves up the data artistically and well.
Adam Crymble has a different moniker for Few’s “data art” and Kosara’s “artistic visualizations.” He calls these graphics “shock and awe.” I love that term. Of all the discussions that I have read, Adam’s makes the most sense to me. He doesn’t touch on all data viz that is artistic, but rather focuses on the extreme, and in this I strongly agree with the points he makes.
Adam Crymble: “shock and awe” graphics
We’ve all seen these very beautiful, complex visualizations that belong inside of a picture frame or a screensaver. Or, for a few seconds, they give us pause and food for thought.
I’ve seen them, written about them and admire them for what they are—unique explorations of the complexity of data. An artistic or visual expression of the complexity of the information we spew out and take in. But they don’t inform in the traditional sense of the definitions of data viz. They may underscore a pattern, convey a sense of weight through sheer numbers or complexity (as the example above does), but that’s about it. They’re pretty much impossible to understand on a granular level without some work.
Adam’s assertion that these complex visualizations have no place in the academic world is beyond my ken. For the record, the example above is mine, not his (see his post for his own, more humorous example). But if he is correct that peer reviewers are afraid to betray their lack of understanding of these graphics, and thus—through tacit acceptance—are endorsing their validity, well then that should concern all of us.
The most interesting point to be gleaned from Adam’s perspective, I think, is the bullying nature of shoving a terabyte of data in front of someone’s face and saying “Aren’t I clever? Don’t you get it?” I don’t. Point well-taken, Adam.
Mark Ravina writes an interesting rebuttal to Adam’s criticism of “shock and awe” graphics. He compares these artistic and complex visualizations to early feminist scholarship that provoked anger when it challenged the systemic sexism of the ivory tower. I’m a huge fan of confrontation and anger-provoking methods to push movements forward. In the early 90s, ACT-UP did the same thing for GLBT rights, if you’ll recall. Without ACT-UP, Queer Nation and Lesbian Avengers, there would be no fancy Human Rights Campaign fundraising dinners today. I get it.
But Ravina’s assertion that these complex visualizations of data somehow push the field forward is a bit much for me. He calls them “intellectual challenges.” I’m not so sure about that. How many of us are willing to spend more than a few seconds trying to piece together a gazillion threads and data points in a fancy graphic? I think we consider it more of a waste of time to do anything other than admire the concept and the novelty of the presentation, and then move on. Intellect doesn’t play a big role here (the creator, on the other hand, gets some bragging rights for creativity). Does it stick? Does it move the field forward? Um, maybe, sometimes?
Ravina spends a fair amount of time discussing how humanities researchers (he knows them better than I, certainly) insist on tables when they ask for data. I didn’t really read that into Adam’s criticism of these graphics. He was merely pointing out that data viz designers were making information too complex; he never claimed that the solution was to create charts. Then Ravina cites the misuse of pie charts to make the point that familiar formats can be misused too. Is he implying that unfamiliar things can’t? As he puts it, “is schlock worse than shock?” Aside from the clever turn of phrase, it’s a bit of a moot point. Nothing that I have read criticizes innovation, merely obfuscation.
Mark Ravina: “Is schlock worse than shock?”
Ravina makes good points. He surveyed (presumably informally) graphs produced in history journals and notes that the bulk of them rely on formats developed (according to him) 200 years ago—pie charts, line charts and bar graphs. And he mentions how slow the field (I’m unclear if he means academics or history journals in particular) has been to adopt and thus understand formats that even today’s eighth graders are learning (box plots, for example). That’s a valid argument, certainly, but it has little to do with the complex visualizations that Adam was addressing or, for that matter, that Kosara and Few discuss. (To be fair, Ravina’s post was mostly in response to Adam’s).
However, he conflates different types of complexity, predictably citing Tufte and Minard (some of you know how I feel about that) as well as Rosling. Perhaps it’s a matter of taste, but I feel that Rosling bends over backwards to make his visualizations inspiring and accessible (not necessarily complex and beautiful), whereas the Minard graphic, while certainly elegant and ground-breaking, does not (of course not, and how could it, given when it was produced).
Lastly, one of the most important concerns that Adam raised was around obscuring data. By introducing unnecessary complexity into a visualization or graphic, data visualization designers can make academic and peer review verification and transparency needlessly difficult. Ravina counters this by saying that liars will lie. I don’t think that’s the point. They will lie, but transparency is as much about spotting errors or raising valid concerns as it is about unmasking willful deceit. Hats off to Ravina for taking the time to provide some very thoughtful counterpoints to the discussion.
Excelcharts is a pretty good resource for charting and data viz in general, despite the Excelcharts.com name (*smiling*). Jorge Camoes nicely (and literally) draws the elusive line between art/entertainment and data/information.
More importantly, he puts a restraining hand on eager designers, quite reasonably underscoring Few’s point to make sure that, as designers, we emphasize that charts and graphics are readable and easy to understand, not memorable or beautiful. Of course, I’ll see your readability and raise you ten, Jorge. Let’s make the data understandable and, if we can, beautiful as well.
Lastly, there is this. It is a tome. You could spend hours here. It’s an open-review paper, part of which is around data viz, part of which I have skimmed. It deserves careful reading, and I’m eager to do so and write a follow-up post.
Well, if you’ve hung in there with me, I hope you have learned something. I know I have.
There are not many good examples of concentric circle graphics out there. La Nación produced one last year about subway strikes, and The Guardian produced an interactive graphic on gay rights in the U.S. Both of these intrigued me because, in my day job, I produce endless variations of graphics dealing with 50-state data. And most of the time, when we look at 50-state data, we draw… you guessed it: maps. Or bar graphs showing quantity, or line graphs showing changes and trends over time. But no matter what we do, it involves data for the 50 states, most often over time. 50 states multiplied by several years is a lot of lines to draw, bars to fill and state maps to create. So I’ve been thinking about ways to tell the story in different formats, going beyond the map, so to speak. Last Wednesday, we created this concentric circle interactive. Here’s how we did it, and the process we took to decide on the format.
One of the most onerous dimensions to 50-state data is the sheer physical size and length of the data. Our website used to allow for a content well of 500 pixels. Try shoving 50 state labels across 500 pixels and you’ll quickly see why it’s a challenge.
But even with all the real estate in the world, long, horizontal displays are also taxing on the user if there is a comparative aspect to the data. There is simply too much bouncing back and forth from the left to the right. Go long and you lose the comparative advantages of a horizontal layout because users with small screens must scroll vertically and can’t see the entire landscape at once. Of course, layering the data into different views as an interactive can solve that. But sometimes you want to show the data all at once. And for that, a static graphic can work well.
Understandably, a map is often the solution. But maps have their limitations too. There’s only so much that you can infer from a map. If your data consist of more than 4-5 gradations it can be tough to create the at-a-glance, concise overview for which a map is best suited.
And if there are no regional patterns discernible in your map, readers wind up staring at a jumble of color with only a legend to tie it all together.
Which brings me to concentric charts. They’re not pie charts. (If you look up pie charts on Wikipedia, you will see that there is a distant cousin to the pie chart called a “ring chart,” also known as a multi-level pie or a radial tree.) Ring charts look somewhat similar to concentric circle graphs but have a different use: they tend to show hierarchy in data. You might see one when your computer shows you how much disk space you have, for example.
A concentric chart, on the other hand, can tell a different story altogether. In a recent post on La Nación’s subway strike graphic, I mentioned how designer Florencia Abd manages to plot out time across four nodes (year, month, day and time) as well as another variable: type of incident/strike. That’s a lot of ground to cover in a static graphic. Imagine doing it in other ways and I’m sure you’ll agree.
Because a circle is, well, round, its shape lends itself quite well to a relationship-based approach. Not so much a pie-chart (where the user sees the parts in their physical relationship to the whole), but rather using the organic form of a circle to help the user more easily compare complex data. And if you add concentric circles, you take advantage of the hierarchy inherent to those circles to create layers–an intuitive way to order your data–perfect for showing levels or ratings where you use the inner and outer rings to denote the endpoints in a scale (e.g., one thing is stronger, larger or more intense on the outside than it is on the inside) or time, as the subway graphic above shows (the outer ring shows 5 a.m. and the inner ring shows 11 p.m.).
So, what does all this have to do with the U.S. map? As I mentioned, the strength of a map is to show geographic relationships in data. For example, southern states vote “red” (or conservative) in the U.S., whereas a swath of northeastern states might vote “blue” (progressive). For this, a map is helpful because regional differences tell the story and are easy to spot.
But the nice thing about concentric charts is that they, too, can show geography, or any groupings, for that matter. As the Guardian’s example shows, each “slice” of the concentric chart belongs to a state and groups of slices are regions. In the Guardian example, each ring (or level) of the chart denotes a particular right afforded to gay couples.
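For the curious, the slice-and-ring layout described above boils down to a little geometry: each state gets an equal angular slice, and each level (or time period) gets a radial band. Here is a minimal Python sketch of that idea, assuming equal-width slices and rings. The function name `cell_geometry`, its parameters and the default radii are my own hypothetical choices for illustration, not anything taken from the Guardian or La Nación graphics.

```python
import math


def cell_geometry(slice_index, ring_index, n_slices, n_rings,
                  inner_radius=50.0, outer_radius=250.0):
    """Return (start_angle, end_angle, r_inner, r_outer) for one cell.

    Angles are in radians, starting at 0 and running around the circle.
    ring_index 0 is the innermost ring. All sizes are assumed values.
    """
    if not 0 <= slice_index < n_slices:
        raise ValueError("slice_index out of range")
    if not 0 <= ring_index < n_rings:
        raise ValueError("ring_index out of range")

    # Every slice (e.g. a state) gets the same share of the full circle.
    slice_width = 2 * math.pi / n_slices
    # Every ring (e.g. a level or a time period) gets the same thickness.
    ring_width = (outer_radius - inner_radius) / n_rings

    start_angle = slice_index * slice_width
    end_angle = start_angle + slice_width
    r_inner = inner_radius + ring_index * ring_width
    r_outer = r_inner + ring_width
    return start_angle, end_angle, r_inner, r_outer
```

With 50 slices and, say, five rings, `cell_geometry(0, 0, 50, 5)` describes the innermost cell of the first state. A drawing library would then turn those four numbers into an arc or wedge; grouping neighboring slices into regions is just a matter of how you order the states.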
My team took this in a different direction. We wanted to show states and regions as well. But we also wanted to show change over time, as well as intensity on a scale. So when the Bureau of Labor Statistics released its employment figures, we had a few choices. We needed to show how changes in employment have affected each state since the recession (from April, 2007 to April, 2012). Because the recession started in December, 2007, we wanted to show how employment looked in each state before the recession, during the recession and how (and which) states were pulling themselves out of the recession.
We could have created an interactive that showed how the same views above changed over time (presumably you’d see a pre-recession view showing states doing well, a recession view showing most states doing poorly, and post-recession years showing mixed results). The most valuable piece of this would be, of course, geographical patterns in the data, if they existed (how did the Rust Belt fare, or the East Coast, for example). You could overlay this with population or any other demographic data to tell an interesting story.
When we looked at the data, we saw that there were not very strong geographic patterns to show. So we decided to create a concentric chart. Why? Because we didn’t have geographic patterns, but we did have temporal patterns (most states did poorly during a particular period of time, which contrasted well with the mixed results that states showed as they were attempting to pull themselves out of the recession, at least in terms of their employment figures). And the fact that we used a circle meant that we didn’t have to create a very long or wide table or chart, and we could stray from the map approach.
We decided to make this a light interactive–by rolling your cursor over each state’s cell you can see a small bar graph showing change in employment over time. This worked for us because our goal wasn’t to show specific numbers (how much employment rose and fell in a particular state), but rather intensity and patterns over time.
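To make "intensity rather than specific numbers" a bit more concrete, here is a rough Python sketch of how a change in employment could be turned into a color step. It is an illustration only, not our production code: the function names, the threshold values and the sample figures in the usage note are all made up.

```python
def percent_change(baseline, value):
    """Percent change of value relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


def intensity_bucket(change_pct, thresholds=(-5.0, -1.0, 1.0, 5.0)):
    """Map a percent change onto an ordinal bucket.

    With the (assumed) default thresholds, bucket 0 is the steepest
    decline and bucket 4 is the strongest growth. Each bucket would
    correspond to one step on the chart's color scale.
    """
    bucket = 0
    for threshold in thresholds:
        if change_pct >= threshold:
            bucket += 1
    return bucket
```

For example, a hypothetical state that went from 1,000 jobs to 930 would show a change of about -7 percent and land in bucket 0, the darkest "decline" color. Keeping the number of buckets small is what lets the pattern read at a glance.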
The debate continues (check out the comments on Nathan Yau’s post on the Guardian graphic) on whether or not these concentric graphs are merely eye candy when a simple bar or line chart would do just as well. I would opine that, if used correctly, they work well. Let me know if you agree. Here’s a screenshot of our interactive, and you can view the live version here.
What I learned at Alan Smithee’s Tableau blog? Alan Smithee is not Alan Smithee. The name is a pseudonym that Hollywood directors use when they don’t wish to use their real names in production credits for a (usually terrible) movie.
I also learned a few helpful tips for getting around Tableau, the dataviz software (proprietary, though the Tableau Public edition is free) that many bloggers use. “Alan” catalogs and discusses a fairly robust array of dataviz formats, presenting helpful insights on working out the less well-known ways of presenting information. There are also a few very good sections on geocoding and Tableau hacks. Love it.
Icing on the cake for newbies? A tabbed data viz presentation (in Tableau of course) of all the charts that Tableau can produce. Oh Alan, I love you.
The commenters on the blog are engaged, responding to both newbies and math geeks. It’s a good blog.
Better living through data visualizations? A new web app called “Spark” claims to improve your body through data viz. And art. And a gizmo called a fitbit. Whatever you call it, it’s both interesting and scary. If you have the time to spare (and, presumably, the calories), you can purchase the fitbit gizmo to track your every physical movement to help you get a very, very detailed sense of your physical activity throughout the seconds and minutes of your life. Really. People do this.
Okay, enough of that. What’s interesting is the use of data visualization to emotionally inspire people to keep moving, walking, jogging, or whatever people do who don’t have enough sense to ride a bike.
Upload your fitbit data (remember, that’s the gizmo you have to purchase and presumably wetwire into the back of your skull) to your computer or tablet, log into “Spark” and you’ll be rewarded with piles of visualizations reflecting your activity level. In real time (using the fitbit API, Raphaël and HTML5 Canvas). Please ignore the fact that Spark is hosted on a website with a URL that begins with “QuantifiedSelf.com.” Apparently data viz is headed for greener pastures.
Sarcasm aside, Spark provides an interesting example of how data visualization can extend into nontraditional paths. More power to ’em, I say.
I’m beginning to realize that, for developing countries like Bolivia, technology (by that I mean information and communications technologies ranging from cellphones and internet access, usage and affordability to the use of social media) is a chicken-and-egg dynamic. For Bolivia, both the egg and the chicken seem out of reach, though there are signs that some things might be improving.
The World Economic Forum and INSEAD recently released the 2012 Global Information Technology Report which scores 142 world economies on their use of information and communications technologies. Below is an infographic that I designed detailing how poorly and how well (mostly the former) Bolivia is using technology to improve the lives of its citizens and to become modestly globally competitive in, as the report puts it, “a hyper connected world.”
Don’t get too depressed; there are some bright spots. If you’re interested, read more about how a newspaper in Argentina is using open data to circumvent its government’s lack of open data transparency. And if you’re really interested, e-mail me.
The good (rankings out of 12 countries in South America):
- Bolivia’s political and regulatory environment (as it relates to technology) ranks 7th in South America.
- Although Bolivia ranks last in business and innovation, it does show a relatively high (3rd) availability of venture capital.
- The quality of Bolivia’s math and science education, its educational system overall, and its adult literacy rate rank 7th, 7th and 8th, respectively.
- And, though Bolivia’s individual usage of technologies ranked last (12th), its citizen participation measure ranks a promising 6.
- Additionally, Bolivia’s capacity for innovation rank (5) is highly encouraging, despite another last place ranking for business usage of information and communications technologies overall.
The bad:
- One of the clearest challenges for Bolivia is to increase the affordability, availability and reliability of its Information and Communication Technologies (ICT) for its citizens and the businesses that operate within its borders.
- Bolivia ranks last, or close to last, along almost every index. The country’s overall Networked Readiness rank is 12.
Three solid entries from Spain, Brazil and Argentina are among the 58 nominees featured in the first-ever international competition for data journalism, the Data Journalism Awards. The winners will be announced by the Global Editors Network on May 31. In the meantime, keep your eye on these three nominees:
“La trama de la SGAE,” from El Mundo’s Spanish designer David Alameda, covers last year’s “Operation Saga,” an undercover investigation of fraudulent financial activities conducted by the president and other members of Spain’s influential Society of Authors and Editors (SGAE). This piece boils down the complex network of who gave money to whom, how much and when into one of the best examples of interactive flowcharts that I’ve seen. As with the best data visualizations, this interactive avoids the many common missteps that could have occurred through the overuse of photos, text, talking heads, etc. Instead, Alameda keeps his focus, and ours, on a tightly scripted interactive that guides the user quickly and efficiently through the web of financial whodunits.
2011 Brazil State-Level Business Environment Ranking ranks the country’s business environment along eight categories (ranging from the political climate to innovation) and a series of indicators specific to each category. The interface is clean and simple to understand. Navigation, categories and indicators are well-prioritized and intuitive. One of my favorite features is the linked rollover behaviour between all four elements on the screen: a regional map, a deeper state-specific map, a regional bar graph and an overall scoring graph. A lot of information packed into a clean, well-designed interactive.
Lastly, Argentina’s La Nación is doing great stuff with open data. By my calculations, given that the country ranks sixth of 12 South American countries (and 92nd out of 142 economies globally, according to the recent Global Information Technology Report’s Networked Readiness Index), this is a telling example of how Argentina’s relatively advanced use of information and communication technologies seems to be paying off, even if its government doesn’t always play along.
La Nación’s Subsidies for the Bus Transportation System is not so much a data visualization as a series of efforts to use open data to report on how bus subsidies in Argentina are being conducted. Dig a little and you’ll find a few good infographics, investigative pieces that detail a government’s efforts to be less than transparent about dollar figures, and an encouraging collaboration between the newspaper and Junar’s open data platform to create a Tableau dashboard that is beginning to circumvent Argentina’s lack of open data infrastructure. Interestingly, the newspaper compares its early efforts to the U.S.’s Freedom of Information Act laws and the American government’s data.gov platform. The dashboard presents a snapshot of indicators key to Argentina (ranging from crime and accident rates to political indicators and legislative data). It’s a promising approach that may help other countries (like Bolivia) with similar challenges (see related article on Bolivia’s recent technology rankings).
This interactive world map, created by Spanish data journalist and developer Chiqui Esteban, is fun, beautiful and informative even if etymology is not your bent. I’m going to spend way, way too much time on this map. Roll your cursor over the map to see a customizable, magnified view of each country’s name, and its English translation, ranging from Equalizer, to Chief Smith, to Land of the Enjoyers of Beautiful Things. I’ll let you figure out which those are, and ponder the sad irony of good intentions gone wrong, in many instances.
If I get hit by a bus, I’d like to come back reincarnated as a lab rat in the Universidad Porto’s Laboratorio de Infografía (Portugal). I’m pretty sure I know how the Catholic Church feels about reincarnation. But I wonder how they would feel about this interactive that covers 15 popes from 1900 to 2001. I suspect this is pretty old (it’s Flash) but it’s fun and not very complex. The interface is somewhat clunky–the transitions over the word clouds (of which I am not a fan, but for entertainment, they work) could be much smoother. Nonetheless, if you must use these lazy gems (and I have) this is an interesting use of word clouds and rollovers.
While you’re at it, check out this football transfers data visualization, also from Universidad Porto. It ranks the amounts spent on player transfers in European football clubs (as well as the biggest spenders and how Portuguese clubs fare). The first tab shows the European market: roll over the club names to find the total spent (shown in millions of euros) and where (inside the circle). Unfortunately the donut chart format that is meant to complement the numeric totals is pretty useless. Totals run along the bottom. The second and third tabs show the four largest Portuguese clubs (incoming and outgoing transfers) and the most expensive players, respectively. Don’t forget to click on the second tab’s player origin/destination link (“ver origem e destino dos jogadores”).
David McCandless has published a brilliant infographic on how we/you/I manipulate rhetoric and logical thinking. Whether you appeal to authority, flattery, probability or tradition, this infographic is for you. Faulty deduction or garbled cause and effect? There’s a place for all of us in this chart. As a smart consumer of visual information, I’m sure you’ll appreciate this infographic.