Untangling listicles and data

This is a picture of my son’s ball of yarn. A bad listicle is like a knotted ball of yarn… sometimes there’s a lot of untangling to do to get to the facts.


I remember the day a managing editor uttered the word “listicle.” I giggled and was torn between a Beavis and Butthead snort and a disgusted of-course-only-a-guy-would-say-that reaction.

Shortly thereafter, I designed my first one (the effects of the recession on Wisconsin, if you’re curious). It was actually a charticle because it had pictures. (The American Journalism Review doesn’t like charticles; I suspect they like listicles even less.)

Wired recently wrote in defense of listicles. It is reassuring to find out that listicles won’t, in fact, give you ADHD. The Guardian is more tongue-in-cheek skeptical, but kindly proffers a few literary examples to redeem the form. I think they lay it on a little thick in number 7 but it’s a good read. More seriously, the trend spawned a new genre, the apocalypsticle, roundly lambasted by Politico a few months ago as “dumbing up” the tragedy of Ukraine.

I won’t go into our obsession with lists. People like them, and you already know why they do. I confess that I’ve never been successful at writing a listicle. I’m too wordy. I rebel at the constraints of a finite number. I just can’t get the thing to hang together to my satisfaction. I need paragraphs to do that. So perhaps I’m just a little jealous.

But I am concerned that our collective obsession and fascination with the genre is not making us stupider, per se, but that—coupled with our penchant for so-called data nuggets—it is becoming too acceptable for journalists and others to conjure up a type of content that does not always responsibly use data and facts to support a story.

Listicles: If you’re going to compare apples to bicycles, go ahead. But don’t pretend you’re just comparing apples.

My biggest beef with the quick comparisons used in listicles is that they make it too easy to cherry-pick disparate data points and thread them together into a seemingly logical order to support a seemingly logical claim. They don’t always rest on logical comparisons. You’ll frequently see different periods in time compared against each other, or slightly different variables set side by side. If you don’t look closely, you can get the wrong message. And if you do look closely, you’ll be confused. It’s okay to compare things from different time periods, but you need to say so. You also need to explain which things change from one comparison point to another.

The order in which points are presented in a list can also hide invalid comparisons.

If I tell you that, in 2010, 1 million green widgets were produced in the U.S., compared to 5 million red ones produced in 2012, you might think this is a ridiculous comparison and you’d be hard-pressed to understand what, if any, trend existed here. The time period is different, as is the variable.

But what if I separate those comparisons with other items in a list? You might lose track.

1. In 2010, 1 million green widgets were produced in the U.S.

2. In 2014, China is the largest manufacturer of widgets, with the U.S. ranked at number 5.

3. Over the past 5 years in the U.S., the majority of widgets have been produced in the South.

4. Nationally, widget manufacturing plants are closing down.

5. In 2012, there were only 5 million red widgets produced in the U.S.

Now think back to how often you have come across this in a list.
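This kind of mismatch can even be caught mechanically. Here’s a hypothetical sketch (the data points and field names are my own invention, not drawn from any real dataset) that flags list items whose time period or variable differ, which is exactly what makes the widget list above misleading:

```python
# Hypothetical sanity check for a list of data points: flag pairs that
# differ in time period or in the variable being measured.

points = [
    {"claim": 1, "year": 2010, "variable": "green widgets produced"},
    {"claim": 5, "year": 2012, "variable": "red widgets produced"},
]

def comparable(a, b):
    """Two points support a direct comparison only if they share
    a time period and measure the same variable."""
    return a["year"] == b["year"] and a["variable"] == b["variable"]

for i, a in enumerate(points):
    for b in points[i + 1:]:
        if not comparable(a, b):
            print(f"Claims {a['claim']} and {b['claim']} compare "
                  f"{a['variable']} ({a['year']}) against "
                  f"{b['variable']} ({b['year']})")
```

A real fact-checking workflow is more nuanced than this, of course, but the principle stands: before you thread two numbers together, check that they share a time period and a variable.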

Lists should offer data in order to provide perspective and context

Too many lists and listicles are simply a series of disparate data points that offer the reader no meaningful way to compare numbers with broader context.

If I tell you that there are “x” number of people living in dire poverty in the United States, I owe you a bit more than that.

I need to tell you what percentage that number is of the broader population, and I need to define that population. I need to tell you what “dire” means. Better, I should give you the dollar threshold that defines poverty (and cite that definition to a credible source) so that you’ll understand the context. I should probably give you the time period for this data. And in a second bullet, it would be helpful if I further provided information about change in this figure over time–preferably a long period of time, to smooth out short-term variance. Here’s a better way to accomplish the first bullet:

Of the [#] million adults* living in the U.S. in 2013, [#] of them (x %) are living at or below 50% of the federal poverty level (an annual income of $5,835 for one person**)

*Adults aged 18 and older
**Federal poverty levels 2014
Citation for data source
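To sketch the arithmetic behind that bullet: the $5,835 figure is 50% of the 2014 federal poverty level of $11,670 for one person. The population and poverty counts below are placeholders for illustration, not real Census figures:

```python
# Sketch of the arithmetic behind a well-contextualized poverty bullet.
# The population figures are placeholders, not real Census data; only the
# poverty-level math follows the 2014 federal guideline for one person.

adult_population = 240_000_000  # adults 18+ in the U.S. (placeholder)
in_dire_poverty = 12_000_000    # at or below 50% of poverty level (placeholder)
poverty_level_1p = 11_670       # 2014 federal poverty level, one person

share = in_dire_poverty / adult_population * 100
dire_threshold = poverty_level_1p * 0.5  # "dire" defined as 50% of the level

print(f"Of the {adult_population / 1e6:.0f} million adults, "
      f"{in_dire_poverty / 1e6:.0f} million ({share:.1f}%) live at or below "
      f"50% of the poverty level (${dire_threshold:,.0f}/year for one person)")
```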

The examples above are just that, examples. But the other day I stumbled across Mother Jones’ “10 Poverty Myths, Busted.” It proves my point. I took the time to deconstruct it below.

Not every set of facts lends itself to a list, or a conclusion.

Before you read further, please understand that my notes below are not to undermine the claim of the author, but rather to strengthen it. The notes are not intended to be exhaustive. Reasonable minds can poke holes in them at will. But my point remains the same: We have lists. We have data. We have people very willing to read them and share them. If we have strong facts that actually relate to one another, and we lay those facts out clearly and in a logical order, the reader will draw logical conclusions. But if we don’t do these things, at best, we lead readers down a rabbit hole that leaves them frustrated and confused.

At worst, we turn facts into opinions that are interpreted as facts.

Deconstructing “10 Poverty Myths, Busted,” by Mother Jones

The main problem with this list is the disparate, unconnected (by time, logic, data set) nature of 10 claims that, together, attempt to combat poverty stereotypes. The Mother Jones listicle and its bullets are in bold followed by italics. The text prefaced by “issue” is mine.

1. Single moms are the problem. Only 9 percent of low-income, urban moms have been single throughout their child’s first five years. Thirty-five percent were married to, or in a relationship with, the child’s father for that entire time.*

Issue: To me, this bullet appears to cherry-pick the data. What makes the first 5 years a magic number? What about the first 10, 15 or 18? If there is importance to those first 5 years, context would be helpful.

Issue: It is unclear whether the dads in the relationships are actually living in the household. This matters because whether or not a dad lives in a household affects the income level of the mom and child. If he lives in the household for example, the household income can be higher (if he has an income to contribute), which can affect a mom’s low-income status.

2. Absent dads are the problem. Sixty percent of low-income dads see at least one of their children daily. Another 16 percent see their children weekly.*

Issue: There is a difference between low-income and poverty. The label “low-income” is subjective. The label “poverty” is quantitative (there is a federal poverty level). So if you use a subjective label like “low-income” and don’t define it upfront, you can give the appearance of anything you like, really.

Also note how this bullet is worded. On the flip side, you could also say that a quarter of kids don’t see their dad as often as once a week.

And notice how bullet #1 discusses dads living in the household (presumably), whereas this item discusses dads who live outside of the household.

3. Black dads are the problem. Among men who don’t live with their children, black fathers are more likely than white or Hispanic dads to have a daily presence in their kids’ lives.

Issue: Cherry-picking again. Why single out black dads in this one bullet? There are Hispanic dads who live at or below the poverty level. Ditto white dads. Did MJ simply pick the most appealing (higher) number to make the point stronger?

4. Poor people are lazy. In 2004, there was at least one adult with a job in 60 percent of families on food stamps that had both kids and a nondisabled, working-age adult.

Issue: The time period shifts for these comparisons, making it impossible to compare or understand trends. This bullet dates to 2004. Others (bullets 5 and 6, for example) date to 2012.

5. If you’re not officially poor, you’re doing okay. The federal poverty line for a family of two parents and two children in 2012 was $23,283. Basic needs cost at least twice that in 615 of America’s cities and regions.

Issue: The Economic Policy Institute calculator and data from which this bullet is derived are for 2013. The federal poverty level cited is for 2012. Not a huge deal, but you can expect the numbers to change in a year due to inflation, cost of living, etc.

6. Go to college, get out of poverty. In 2012, about 1.1 million people who made less than $25,000 a year, worked full time, and were heads of household had a bachelor’s degree.**

Issue: Context would be helpful here. 1.1 million people out of how many? And isn’t it expected for new college grads to not make much money? How long have these 1.1 million people been in the workforce? Poverty is unacceptable at any level (my opinion here), but telling me that a forty-year-old mother of two with a degree, who has been in the workforce for 20 years, is living at the federal poverty level is one thing. Telling me the same about a recent grad a year out of school sends a different message.

7. We’re winning the war on poverty. The number of households with children living on less than $2 a day per person has grown 160 percent since 1996, to 1.65 million families in 2011.

Issue: This myth is so subjective and vague that it’s hard to dissect. It would be helpful to know if the $2 a day includes or does not include federal assistance (TANF, SNAP, etc.).
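One piece of context is recoverable from the bullet itself. Assuming “grown 160 percent” means the 2011 count is 2.6 times the 1996 count, the implied baseline works out as follows:

```python
# Sketch: what "grown 160 percent since 1996" implies about the baseline,
# assuming the 2011 figure (1.65 million families) is 2.6x the 1996 figure.

families_2011 = 1.65e6
growth = 1.60  # 160 percent growth

families_1996 = families_2011 / (1 + growth)
print(f"Implied 1996 baseline: about {families_1996:,.0f} families")
```

Stating that implied baseline outright would have made the trend much easier to grasp than a bare growth percentage.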

8. The days of old ladies eating cat food are over. The share of elderly single women living in extreme poverty jumped 31 percent from 2011 to 2012.

Issue: This lacks the context needed to be clear and helpful. What does extreme poverty mean? If you dig into the source report for this, you’ll see on page 3 that extreme poverty is defined as income at or below the federal poverty level. The statistic itself is startling and would be strengthened by an upfront definition of the spectrum of poverty, preferably in dollar figures as well.

9. The homeless are drunk street people. One in 45 kids in the United States experiences homelessness each year. In New York City alone, 22,000 children are homeless.

Issue: To put this into perspective, what is the percentage of homeless kids compared to the percentage of homeless adults with (presumably) addiction problems? The connection between homeless individuals with substance abuse issues and children may be widely discussed, but as a statistical comparison here, the pairing is clunky.

10. Handouts are bankrupting us. In 2012, total welfare funding was 0.47 percent of the federal budget.

Issue: It’s difficult to follow the citation, partly because of how “2012” and “federal budget” are defined. If you follow the citation, you’ll see that this bullet cites the budget that the president proposed for FY2013 on February 13, 2012 ($3.8 trillion). The key word is “proposed,” and the budget itself was for FY2013, not 2012, as the wording in this bullet implies. The bullet suggests that spending happened as a proportion of an actual budget. But in reality, it’s citing a 2012 figure ($16.6 billion) as a percentage of a proposed budget for a different year (2013), not the actual budget for 2012. Furthermore, remember that a proposed budget is simply a draft. It must be approved by Congress (remember all the subsequent political wrangling, counter-proposals by Republicans, sequestration, etc.). (If you want to know what happened to the 2013 budget, you can check out CBO’s later analysis.) So saying that total welfare funding in 2012 “was” a percent of a proposed budget intended for FY2013 is neither correct nor easy to follow.
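As a minimal sketch of the arithmetic, using the rounded figures cited in this post (a $16.6 billion 2012 outlay against the $3.8 trillion proposed FY2013 budget): note that it doesn’t even reproduce the bullet’s 0.47%, which underscores how hard the citation is to follow.

```python
# Sketch using the figures cited above: a 2012 welfare outlay divided by
# the *proposed* FY2013 budget. The choice of denominator drives the
# headline percentage, which is the whole problem with this bullet.

welfare_2012 = 16.6e9    # spending figure cited for 2012
proposed_fy13 = 3.8e12   # budget proposed in Feb 2012 for FY2013

share = welfare_2012 / proposed_fy13 * 100
print(f"{share:.2f}% of the proposed FY2013 budget")  # ~0.44%, not 0.47%
```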

Issue: Then there’s how “welfare” is used in the bullet. It’s unclear what that means, and clarifying it upfront would have been more helpful. Presumably “welfare” refers to the $16.5 billion cited by the Center on Budget and Policy Priorities for TANF (Temporary Assistance for Needy Families – financial aid for some poor families). But you have to do a little digging in the source to ascertain that. And “welfare” could mean TANF, it could mean SNAP (food stamps), or it could mean something else, as a helpful post on Real Clear Politics points out. This is needlessly unclear, given that “welfare” is such a politically charged word.

Sources as cited in the above Mother Jones list:
*Source: Analysis by Dr. Laura Tach at Cornell University.
**Source: Census

So, listicles. Use them wisely and well. And if not, stick to the goofy stuff, not the serious stuff. Like, top 6 reasons this post would have been shorter and more effective if I had used a list.

Big data, small budget, good mission: Using data and other cool stuff for social change

These days, you can throw a rock and reliably hit any number of articles and headlines proclaiming the power of big data, open data, and transparency. The acceptance and adoption of using large, public sets of information to make informed decisions represents a sea change in how the corporate world, academia, think tanks and large NGOs are investing in their capacity to crunch more than numbers. No surprise there. But how does the little guy—the small grassroots organization with a small budget and a big mission around social change—fare?

I’ve been thinking a lot about this lately. Last year, I made a job shift. I moved from a very large, well-funded nonprofit to a relatively small healthcare advocacy organization. In my old job, I worked in data visualization and regularly called upon the considerable financial, technological and statistical resources that my employer afforded me. Today is a different story. I work with supremely talented and passionate people, but the data resources that I once took for granted are gone. The “data divide” is now staring me in the face. And that’s the reason for this post—the reality that, for all the promise that big data and technology claim to offer, many of today’s smaller nonprofits and grassroots organizations are not equipped to collect, understand and harness information to advance their social mission. We are the “have nots” looking out onto the world of the “haves”: those with statistical modeling tools, economists or statisticians on hand, and coders on staff or on contract.

The data divide—what is it?

The “data divide” is by now a familiar term to many of us. The Guardian wrote about “data apartheid” when it reported on the findings from the recent 2013 Open Data Index last November. Similar findings are in the Open Data Barometer 2013 report released late last year too. And we know how it exacerbates problems faced by developing countries in fostering an open, transparent government and an informed, participatory citizenry. As I wrote last year, a good example of how open data helps citizens overcome these hurdles lies in how La Nacion (Argentina’s national newspaper) teamed up with data journalists to publish data on a variety of indicators to the Argentine public—despite the government’s lack of a Freedom of Information law.

Data divide: Access to data does not translate into results

In a blog post dating back to 2011, Michael Gurstein describes the data divide in terms that echo how health care advocates talk about healthcare today. In discussing the Affordable Care Act, advocates regularly say that access to health care is not enough—it’s the quality of care that matters. And there is an entire movement around health system reform that underscores this. Gurstein makes a similar point about data: “[A]ccess is not enough,” he writes. “[I]t is whether opportunities and pre-conditions are in place for the effective use of the technology particularly for those at the grassroots.” Go Mike. I haven’t a clue who you are, but you nailed it.

In the same way that the “digital divide” of the 90s and 00s required education and digital literacy to make real the opportunities that online access offered, bridging the data divide for small organizations takes more than making data available. It also means affording these groups the ability to use data effectively, through knowledge (data literacy—an understanding of how to read data and how to represent and visualize it effectively for a common purpose) and resources (the realization of this understanding into actual tools).

How can data help grassroots organizations and smaller nonprofits?

Here in D.C., Applied Predictive Technologies (APT), a tech firm that sells predictive analytics software, volunteered to analyze the data that a local charter school was collecting from the tablet apps its students were using. APT used this “data dive” to help teachers assess how well the tablet reading apps were working for different kids—allowing teachers to tweak the reading curriculum and tailor interventions to different types of students.

One of the best organizations out there is New York-based DataKind. If you really want to understand how socially conscious data scientists are working to achieve social change through data, take five minutes to check out the variety of projects they work on. Over the past several years, DataKind has been launching DataDives in cities around the U.S. Similar in nature to “hackathons” or “code-a-thons,” these DataDives team up volunteer data scientists and analysts with social organizations over the course of a few days to take on a well-defined data problem, building the apps or software to solve it.

When DataKind held a DataDive for D.C. Action for Children, a small organization that collects data on the indicators that affect the well-being of children in order to mitigate poverty, good things started to happen. The nonprofit also runs the D.C. chapter of the Kids Count program and, through Kids Count, it was doing a great job at collecting data (that was its mission). But the work it was producing was static—PDFs—a situation common to many small organizations. Fortunately, they realized that, to make the data meaningful, easier to analyze, and more effective at highlighting the poverty problems that needed to be solved, they needed to visualize it. This is where DataKind came in. Their volunteers worked for a month to create an interactive data visualization tool (eDatabook) that mapped the well-being indicators and poverty clusters across the District. The best part? It’s replicable. Other Kids Count programs across the country can adopt it as well.

Using data and hackathons to help on a local level

The vast global data modeling regularly published by the World Bank is impressive. But municipalities are using data to tackle local problems too. Like D.C. Action for Children, cities are pairing up with volunteer data analysts and coders to sidestep the issues of in-house capacity and expertise.

To fight ongoing problems with obesity and diabetes, for example, New York City launched its first health data code-a-thon this past December. The result? An app called “Vera.” Based on a user’s risk for diabetes, Vera texts users reminders and tips for physical activity, glucose monitoring and even good food intake.

Leveraging hackathons for broader impact

Voting: On a broader scale, the Voting Information Project, a small group of elections experts who focus on improving the voting experience through cutting-edge technology, held its first hackathon in November 2013 (disclaimer: I was affiliated with the organization that funds this project). The hackathon yielded fast and effective results, including first-ever voter lookup tools that were used by Americans everywhere.

Healthcare: On June 2, 2014, Health Code-a-palooza will bring together programmer teams who, over the course of 48 hours, compete to see who can use a Medicare data set to build the best app for doctors to improve the quality of care they deliver to patients. This hackathon is part of the Health Data Consortium’s annual Health Datapalooza, an event that features data and healthcare experts discussing how open data can drive meaningful improvements in the health reform movement. But you have to admit, the coding is pretty cool too. If you’re interested in learning more about how open data is playing out in the field of healthcare, read more about the Health Data Consortium.

Challenges and questions around transforming the data culture in small nonprofits

Lack of data literacy can impede an organization’s ability to articulate its needs.

As I mentioned, part of the problem is not just access to data, but being able to frame a goal, understand which data to collect and establish good collection practices—data literacy. For an organization taking nascent steps toward data collection, this can be daunting. It requires a change in the organization’s culture, an investment of time (if not technology and staff) and a reprioritization of traditional methods of executing its goals. Much of that work is internal. But some of it can be helped by organizations such as DataKind, which mentors organizations to help them frame their problem and prepare for the end result.

Sustainability beyond the initial volunteer effort

And what happens after the project concludes? What if something breaks? How do you continue to foster an environment of learning and change in an organization after it takes its first steps toward a data culture? Again, an approach like DataKind’s is promising. They stick around, monitor the project and provide follow-up support to ensure that the work keeps going. That makes sense, because it’s part of DataKind’s mission. In future posts, this is something that I’ll be writing more about, as well as how data volunteers and organizations are finding each other. If you’ve got ideas or stories to share, let me know. You can follow me on Twitter at @uriona.

Case study: creating a 50-state data visualization on elections administration

Ever wonder how well states are running their elections systems? Want to know which state rejects the highest number of absentee ballots? Or which state has the shortest voting wait times? And which state has the highest rate of disability- or illness-related voting problems?

A new interactive elections tool by The Pew Charitable Trusts (the Elections Performance Index) sheds light on many of the issues that affect how well states administer the process of ensuring that their citizens are able to vote and to have those votes counted. Measuring these and other indicators (17 in all, count ‘em), Pew’s elections geeks (I was part of the team) partnered with Pitch Interactive to develop a first-of-its-kind tool to see how states fare. Today’s post is a quick take on how the project was created from a data visualization perspective.


Pew’s latest elections interactive: The Elections Performance Index

Lots of data here, folks. 50 states (and the District), two elections (2008 presidential and 2010 midterm) and 17 ways to measure performance. Add to that the ability for viewers to make their own judgments–there is an overall score, for sure–but the beauty of this tool is that it allows users to slice and dice the data along some or all indicators, years and states to create custom views and rankings.

You might already know about Pitch Interactive. They’re the developers who created the remarkably cool and techy interactive that tracks government-sponsored weapons/ammunition transactions for Google’s Chrome workshop (view this in Chrome) as well as static graphics like Popular Science’s Evolution of Innovation and Wired’s 24 hours of 311 calls in New York.

The data will dictate your approach to a good visualization

When we sat down with Pitch to kick around ideas for the elections interactive, we were initially inspired by Moritz Stefaner’s very elegant Your Better Life visualization, a tool that measures 11 indicators of quality of life in the 30-plus member countries of the Organization for Economic Cooperation and Development (OECD). Take a look–it’s a beautiful representation of data.

And though, initially, we thought that our interactive might go in the same direction, a deeper dive into the data proved otherwise. Comparing 30 countries along 11 indicators is very different from comparing 50 states plus D.C. along 17 indicators and 2 election cycles. Add to that the moving target of creating an algorithm to calculate indicators for different user-selected combinations, and you’ve got yourself a project.

After our interactive was live, I talked to Wesley Grubbs (founder and creative director at Pitch) about the project. I was interested in hearing about the hurdles that the data and design presented and how his creativity was challenged when working with the elections data. One of the first things he recalled was the sheer quantity of data, and the complications of measuring indicators across very different election cycles. If this sounds too wonky, bear with me. Remember, one of the cool things about this interactive is that it allows you to see voter patterns (e.g., voter turnout) across two very different types of elections–midterm elections (when many states elect their governors and members of Congress and, in many cases, hold municipal elections) and the higher-profile presidential elections. Pitting these two against one another is a bit like comparing the proverbial apples and oranges. Voting patterns are dramatically different. (The highest rate of voter turnout in 2008–a presidential election–was 78.1% in Minnesota. Compare that to the highest rate in the 2010 midterm election–56% for Maine–and you’ll see what I mean.)

Your audiences will influence your design

Another challenge early on was the tension between artistry and function. In an ideal world, the most beautiful thing is also the clearest thing (an earlier post, “Should graphics be easy to understand?”, delves into this further). I remember reviewing the awesomeness behind Wes and his team’s early representations of the data. From my perspective as a designer, these were breathtakingly visual concepts that, to those who hung in there, served up beauty as well as clarity. But from a more pragmatic perspective, an analysis of our audience (policymakers and key influencers, as well as members of the media and state election administration officials) revealed that comfort levels with more abstract forms of visualization were bound to be a mixed bag. Above all else, we needed to be clear and straightforward, getting to the data as quickly as possible.

Wes decided to do just that. “It’s funny,” he said. “We don’t often use bar graphs in our work. But in this case we asked, what’s the most basic way to do rankings? And we realized, it’s simple. You put things on top of one another. So what’s more basic than a bar chart?”

“We had to build trust–you can’t show sparkle balls flying across the screen to impress [your users]–you have to impress them with the data.”–Wesley Grubbs, Pitch Interactive

When I asked Wes how, at the time, he had felt about possibly letting go of some of the crazy creativity that led him to create the Google weapons/ammunitions graphic, he simply responded, “Well, yes, we do lots of cutting edge, wild and crazy stuff. In this case, however, a good developer is going to go where the data leads them. In addition, the audiences for this tool are journalists, academics, media–the range of tech-savviness is very broad. We had to build trust–you can’t show sparkle balls flying across the screen to impress them–you have to impress them with the data.”

Turn your challenges into an asset

When we brought up the oft-cited concern around vertical space (“How long do you expect people to scroll for 50 states, Wes?”, I remember asking) his approach was straightforward: “Let’s blow up the bar chart and make it an intentional use of vertical space. Let’s make the user scroll–build that into the design instead of trying to cram everything above the fold.”

I think it worked. This is a terrific example of visualization experts who, responsibly, put the data and the end users above all else. “We could have wound up with a beautiful visualization that only some of our audiences understood,” says Wes. “We opted to design something accessible to everyone.”

How did Pitch build the Elections Performance Index tool?

Primarily using D3, a JavaScript library that many developers now use for visualizations. It was not without its drawbacks, however. When I asked Wes about lessons learned, the first thing he mentioned was the importance of understanding the impact of end-user technology on different programming languages. “D3 isn’t for everyone,” he notes. “Take a look at your users. What browsers are they using? The older stuff simply won’t work with many of the best tools of today. You have to scale back expectations at the beginning. The hardest part can be convincing organizations that the cutting-edge stuff requires modern technology and their users may not be in line with that. It’s all about the end user.”

Well, as an end user and a participant in the process I’m pleased. I hope you’ll agree to take the tool for a spin.

Geeking out on a good infographic

I stumbled across Junk Charts’ informative deconstruction of a data-driven infographic on income distribution across the U.S.

Bottom line (and I agree)–lead with the data, but unobtrusively–don’t overtax the reader. The first thing you see is an intuitively simple breakdown of income distribution. The use of color is excellent–you don’t even have to read the legend closely to understand that dark means highest concentration of income (rich) and light means least (poor). And you can see at a glance how this plays out across states.

However, I did spend a minute trying to figure out what the top horizontal line meant on the second part of this chart (income distribution by state) and realized, belatedly, that it was the national average. I would have treated that just like a state so that users could compare easily, perhaps setting the color differently (e.g., dark blue to light blue). And, as Junk Charts correctly points out, ordering the states by something other than alpha order (e.g., quintiles) makes sense.
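The reordering idea is simple to sketch. Here’s a hypothetical example (the state figures are made up, not from the infographic) of ranking by value rather than alphabetically, with the national average treated as its own reference point:

```python
# Hypothetical sketch: order states by the measured value rather than
# alphabetically, and treat the national average as its own reference.
# All figures below are invented for illustration.

shares = {"Alabama": 0.18, "Alaska": 0.12, "Arizona": 0.22, "US average": 0.17}

national = shares.pop("US average")  # pull the average out as a benchmark
ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

for state, value in ranked:
    marker = " <-- above national average" if value > national else ""
    print(f"{state}: {value:.0%}{marker}")
```

Sorting this way lets the reader find outliers at a glance, which is exactly what alphabetical order hides.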

There’s some interesting back and forth about how the top and bottom scales are colored. The same colors used for two different scales–good or bad? Design or accuracy? You don’t always have to choose one or the other–I would have opted for a different, albeit complementary, color scheme for each of the two.

Here’s the original infographic on income distribution, posted back in December 2011.

Income Distribution across the US