Big data, small budget, good mission: Using data and other cool stuff for social change

These days, you can throw a rock and reliably hit any number of articles and headlines proclaiming the power of big data, open data, and transparency. The acceptance and adoption of large, public sets of information to make informed decisions represents a sea change in how the corporate world, academia, think tanks and large NGOs are investing in their capacity to crunch more than numbers. No surprise there. But how does the little guy—the small grassroots organization with a small budget and a big mission around social change—fare?

I’ve been thinking a lot about this lately. Last year, I made a job shift. I moved from a very large, well-funded nonprofit to a relatively small healthcare advocacy organization. In my old job, I worked in data visualization and regularly called upon the considerable financial, technological and statistical resources that my employer afforded me. Today is a different story. I work with supremely talented and passionate people, but the data resources that I once took for granted are gone. The “data divide” is now staring me in the face. And that’s the reason for this post—the reality that, for all the promise that big data and technology claim to offer, many of today’s smaller nonprofits and grassroots organizations are not equipped to collect, understand and harness information to advance their social mission. We are the “have nots” who look out onto the world of the “haves,” with their statistical modeling tools, economists or statisticians on hand, and coders on staff or on contract.

The data divide—what is it?

The “data divide” is by now a familiar term to many of us. The Guardian wrote about “data apartheid” when it reported on the findings of the 2013 Open Data Index last November. Similar findings appear in the Open Data Barometer 2013 report, also released late last year. And we know how the data divide exacerbates the problems developing countries face in fostering an open, transparent government and an informed, participatory citizenry. As I wrote last year, a good example of how open data helps citizens overcome these hurdles lies in how La Nacion (Argentina’s national newspaper) teamed up with data journalists to publish data on a variety of indicators to the Argentine public—despite the government’s lack of a Freedom of Information law.

Data divide: Access to data does not translate into results

In a blog post dating back to 2011, Michael Gurstein describes the data divide in a way that echoes how many advocates talk about health care today. In discussing the Affordable Care Act, advocates regularly say that access to health care is not enough—it’s the quality of care that matters. And there is an entire movement around health system reform that underscores this. Gurstein makes a similar point about data: “[A]ccess is not enough,” he writes. “[I]t is whether opportunities and pre-conditions are in place for the effective use of the technology particularly for those at the grassroots.” Go Mike. I haven’t a clue who you are, but you nailed it.

In the same way that the “digital divide” of the ’90s and ’00s required education and digital literacy to make real the opportunities that online access offered, bridging the data divide for small organizations depends on more than making data available. It also means giving these groups the ability to use data effectively, through knowledge (data literacy—an understanding of how to read data and how to represent or visualize it effectively for a common purpose) and resources (the translation of that understanding into actual tools).

How can data help grassroots organizations and smaller nonprofits?

Here in D.C., Applied Predictive Technologies (APT), a tech firm that sells predictive analytics software, volunteered to analyze the data that a local charter school was collecting from the tablet apps its students were using. APT used this “data dive” to help teachers assess how well the tablet reading apps were working for different kids—allowing teachers to tweak the reading curriculum and tailor interventions to different types of students.

One of the best organizations out there is New York-based DataKind. If you really want to understand how socially conscious data scientists are working to achieve social change through data, take five minutes to check out the variety of projects they work on. Over the past several years, DataKind has been launching “DataDives” in cities around the U.S. Similar in nature to hackathons or code-a-thons, these DataDives pair volunteer data scientists and analysts with social organizations for a few days to build apps or software around a well-defined data problem, and then solve it.

When DataKind held a DataDive for D.C. Action for Children, a small organization that collects data on indicators of children’s well-being in order to fight poverty, good things started to happen. The nonprofit also runs the DC chapter of the Kids Count program and, through Kids Count, it was doing a great job of collecting data (that was its mission). But the work it was producing was static PDFs, a situation common to many small organizations. Fortunately, they realized that, to make the data meaningful, easier to analyze, and more effective at highlighting the poverty problems that needed to be solved, they needed to visualize it. This is where DataKind came in. Its volunteers worked for a month to create an interactive data visualization tool (the eDatabook) that mapped well-being indicators and poverty clusters across the District. The best part? It’s replicable. Other Kids Count programs across the country can adopt it as well.

Using data and hackathons to help on a local level

The vast global data modeling regularly published by the World Bank is impressive. But municipalities are using data to tackle local problems too. Like D.C. Action for Children, cities are pairing up with volunteer data analysts and coders to sidestep the issues of in-house capacity and expertise.

To fight ongoing problems with obesity and diabetes, for example, New York City launched its first health data code-a-thon this past December. The result? An app called “Vera.” Based on a user’s risk for diabetes, Vera texts users reminders and tips about physical activity, glucose monitoring and healthy eating.

Leveraging hackathons for broader impact

Voting: On a broader scale, the Voting Information Project, a small group of elections experts focused on improving the public’s voting experience through cutting-edge technology, held its first hackathon in November 2013 (disclosure: I was affiliated with the organization that funds this project). The hackathon yielded fast and effective results, including first-ever voter lookup tools that were used by Americans everywhere.

Healthcare: On June 2, 2014, Health Code-a-palooza will bring together programmer teams who, over the course of 48 hours, compete to see who can use a Medicare data set to build the best app to help doctors improve the quality of care they deliver to patients. This hackathon is part of the Health Data Consortium’s annual Health Datapalooza, an event that features data and healthcare experts discussing how open data can drive meaningful improvements in the health reform movement. But you have to admit, the coding is pretty cool too. If you’re interested in learning more about how open data is playing out in the field of healthcare, read more about the Health Data Consortium.

Challenges and questions around transforming the data culture in small nonprofits

Lack of data literacy can impede an organization’s ability to articulate its need.
As I mentioned, part of the problem is not just access to data, but the ability to frame a goal, understand which data to collect and establish good collection practices—data literacy. For an organization taking nascent steps toward data collection, this can be daunting. It requires a change in the organization’s culture, an investment of time (if not technology and staff) and a reprioritization of traditional methods of executing its goals. Much of that work is internal. But some of it can be helped along by organizations such as DataKind, which mentors organizations to help them frame their problem and prepare for the end result.

Sustainability beyond the initial volunteer effort

And what happens after the project concludes? What if something breaks? How do you continue to foster an environment of learning and change in an organization after it takes its first steps toward a data culture? Again, an approach like DataKind’s is promising. They stick around, monitor the project and provide follow-up support to ensure that the work keeps going. That makes sense, because it’s part of DataKind’s mission. I’ll be writing more about this in future posts, along with how data volunteers and organizations are finding each other. If you’ve got ideas or stories to share, let me know. You can follow me on Twitter at @uriona.

Case study: creating a 50-state data visualization on elections administration

Ever wonder how well states are running their elections systems? Want to know which state rejects the highest number of absentee ballots? Or which state has the shortest wait times to vote? And which state has the highest rate of disability- or illness-related voting problems?

A new interactive elections tool from The Pew Charitable Trusts, the Elections Performance Index, sheds light on many of the issues that affect how well states administer the process of ensuring that their citizens can vote and have those votes counted. Measuring these and other indicators (17 in all, count ’em), Pew’s elections geeks (I was a part of the team) partnered with Pitch Interactive to develop a first-of-its-kind tool to see how states fare. Today’s post is a quick take on how the project was created from a data visualization perspective.

Pew’s latest elections interactive: The Elections Performance Index

Lots of data here, folks: 50 states (and the District), two elections (the 2008 presidential and the 2010 midterm) and 17 ways to measure performance. Add to that the ability for viewers to make their own judgments. There is an overall score, for sure, but the beauty of this tool is that it lets users slice and dice the data along some or all indicators, years and states to create custom views and rankings.

You might already know about Pitch Interactive. They’re the developers who created the remarkably cool and techy interactive that tracks government-sponsored weapons/ammunition transactions for Google’s Chrome workshop (view this in Chrome) as well as static graphics like Popular Science’s Evolution of Innovation and Wired’s 24 hours of 311 calls in New York.

The data will dictate your approach to a good visualization

When we sat down with Pitch to kick around ideas for the elections interactive, we were initially inspired by Moritz Stefaner’s very elegant Your Better Life visualization, a tool that measures 11 indicators of quality of life in the 30-plus member countries of the Organization for Economic Cooperation and Development (OECD). Take a look–it’s a beautiful representation of data.

And though we initially thought our interactive might go in the same direction, a deeper dive into the data proved otherwise. Comparing 30 countries along 11 indicators is very different from comparing 50 states plus D.C. along 17 indicators and two election cycles. Add to that the moving target of creating an algorithm to calculate scores for different user-selected combinations of indicators, and you’ve got yourself a project.
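
To make that concrete, here’s a minimal sketch of the core calculation such a tool performs: averaging normalized values across whichever indicators a user selects, then ranking the states. This is plain JavaScript with hypothetical data and field names, not Pitch’s actual algorithm.

```javascript
// Hypothetical sketch: composite scores from user-selected indicators.
// Not Pitch's actual code; the data and field names are invented.
const selected = ["turnout", "waitTime", "registrationRate"]; // user's picks

const states = [
  { name: "Minnesota", turnout: 0.78, waitTime: 0.90, registrationRate: 0.88 },
  { name: "Maine",     turnout: 0.72, waitTime: 0.85, registrationRate: 0.90 },
  // ...one row per state, each indicator already normalized to 0-1
];

// Average the selected indicators for each state, then rank by that score.
const ranked = states
  .map(s => ({
    name: s.name,
    score: selected.reduce((sum, k) => sum + s[k], 0) / selected.length,
  }))
  .sort((a, b) => b.score - a.score);

ranked.forEach((s, i) => console.log(`${i + 1}. ${s.name}: ${s.score.toFixed(2)}`));
```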

After our interactive was live, I talked to Wesley Grubbs (founder and creative director at Pitch) about the project. I was interested in hearing about the hurdles that the data and design presented and how his creativity was challenged when working with the elections data. One of the first things he recalled was the sheer quantity of data, and the complications of measuring indicators across very different election cycles. If this sounds too wonky, bear with me. Remember, one of the cool things about this interactive is that it allows you to see voter patterns (e.g., voter turnout) across two very different types of elections: the midterm elections (when many states elect their governors and members of Congress and, in many cases, hold municipal elections) and the higher-profile presidential elections. Pitting these two against one another is a bit like comparing the proverbial apples and oranges. Voting patterns are dramatically different. (The highest voter turnout in 2008, a presidential election, was 78.1% in Minnesota. Compare that to the highest rate in the 2010 midterm, 56% in Maine, and you’ll see what I mean.)

Your audiences will influence your design

Another challenge early on was the tension between artistry and function. In an ideal world, the most beautiful thing is the clearest thing (an earlier post, “Should graphics be easy to understand?“, delves into this further). I remember reviewing the awesomeness behind Wes and his team’s early representations of the data. From my perspective as a designer, these were breathtakingly visual concepts that, to those who hung in there, served up beauty as well as clarity. But from a more pragmatic perspective, an analysis of our audience (policymakers and key influencers, as well as members of the media and state election administration officials) revealed that comfort levels with more abstract forms of visualization were bound to be a mixed bag. Above all else, we needed to be clear and straightforward, getting to the data as quickly as possible.

Wes decided to do just that. “It’s funny,” he said. “We don’t often use bar graphs in our work. But in this case we asked, what’s the most basic way to do rankings? And we realized, it’s simple. You put things on top of one another. So what’s more basic than a bar chart?”

“We had to build trust–you can’t show sparkle balls flying across the screen to impress [your users]–you have to impress them with the data.”–Wesley Grubbs, Pitch Interactive

When I asked Wes how, at the time, he had felt about possibly letting go of some of the crazy creativity that led him to create the Google weapons/ammunitions graphic, he simply responded, “Well, yes, we do lots of cutting-edge, wild and crazy stuff. In this case, however, a good developer is going to go where the data leads them. In addition, the audiences for this tool are journalists, academics, media–the range of tech-savviness is very broad. We had to build trust–you can’t show sparkle balls flying across the screen to impress them–you have to impress them with the data.”

Turn your challenges into an asset

When we brought up the oft-cited concern around vertical space (“How long do you expect people to scroll for 50 states, Wes?” I remember asking), his approach was straightforward: “Let’s blow up the bar chart and make it an intentional use of vertical space. Let’s make the user scroll–build that into the design instead of trying to cram everything above the fold.”

I think it worked. This is a terrific example of visualization experts who, responsibly, put the data and the end users above all else. “We could have wound up with a beautiful visualization that only some of our audiences understood,” says Wes. “We opted to design something accessible to everyone.”

How did Pitch build the Elections Performance Index tool?

Primarily using D3, a JavaScript library that many developers now use for visualizations. It was not without its drawbacks, however. When I asked Wes about lessons learned, the first thing he mentioned was the importance of understanding how end-user technology limits your choice of tools. “D3 isn’t for everyone,” he notes. “Take a look at your users. What browsers are they using? The older stuff simply won’t work with many of the best tools of today. You have to scale back expectations at the beginning. The hardest part can be convincing organizations that the cutting-edge stuff requires modern technology and their users may not be in line with that. It’s all about the end user.”
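
If you’re curious what building with D3 actually looks like, here’s a minimal sketch of a ranked horizontal bar chart along the lines Wes describes. It assumes D3 (v4+ API) is loaded on the page; the data, dimensions and styling are made up for illustration, and this is not the EPI’s actual code.

```javascript
// Minimal sketch of a ranked horizontal bar chart (D3 v4+ API).
// Illustrative only: hypothetical data, arbitrary dimensions.
const data = [
  { state: "Minnesota", score: 0.78 },
  { state: "Wisconsin", score: 0.70 },
  { state: "Maine", score: 0.72 },
].sort((a, b) => b.score - a.score); // rank: highest score on top

const barHeight = 24;
const width = 400;

// Map a 0-1 score onto pixel widths.
const x = d3.scaleLinear().domain([0, 1]).range([0, width]);

const svg = d3.select("body").append("svg")
  .attr("width", width + 120) // leave room for state labels
  .attr("height", data.length * barHeight);

// One group per state, stacked top to bottom in rank order.
const row = svg.selectAll("g")
  .data(data)
  .enter().append("g")
  .attr("transform", (d, i) => `translate(120, ${i * barHeight})`);

row.append("rect")
  .attr("height", barHeight - 4)
  .attr("width", d => x(d.score));

row.append("text")
  .attr("x", -8)
  .attr("y", barHeight / 2)
  .attr("dy", "0.35em")
  .attr("text-anchor", "end")
  .text(d => d.state);
```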

Well, as an end user and a participant in the process I’m pleased. I hope you’ll agree to take the tool for a spin.

Geeking out on a good infographic

I stumbled across Junk Charts’ informative deconstruction of a data-driven infographic on income distribution across the U.S.

Bottom line (and I agree): lead with the data, but unobtrusively; don’t overtax the reader. The first thing you see is an intuitively simple breakdown of income distribution. The use of color is excellent: you don’t even have to read the legend closely to understand that dark means the highest concentration of income (rich) and light means the least (poor). And you can see at a glance how this plays out across states.

However, I did spend a minute trying to figure out what the top horizontal line meant on the second part of this chart (income distribution by state) before realizing, belatedly, that it was the national average. I would have treated it just like a state so that users could compare easily, perhaps setting the color differently (e.g., dark blue to light blue). And, as Junk Charts correctly points out, ordering the states by something other than alphabetical order (e.g., by quintile) makes sense.

There’s some interesting back and forth about how the top and bottom scales are colored. The same colors used for two different scales: good or bad? Design or accuracy? You don’t always have to choose one or the other. I would have opted for a different, albeit complementary, color scheme for each of the two.
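
For the curious, here’s what that choice might look like in code: a quick sketch using D3’s quantize scale (v4+ API) to map values onto two separate sequential ramps, one per scale. The domains and colors are invented for illustration, not taken from the original graphic.

```javascript
// Sketch: two distinct sequential color ramps, one per scale,
// built with D3's quantize scale (v4+ API). Values are invented.
const incomeColor = d3.scaleQuantize()
  .domain([0, 0.5]) // e.g., share of total income held by a bracket
  .range(["#deebf7", "#9ecae1", "#4292c6", "#08519c"]); // light = low, dark = high

const populationColor = d3.scaleQuantize()
  .domain([0, 0.5]) // same domain, deliberately different scheme
  .range(["#e5f5e0", "#a1d99b", "#41ab5d", "#006d2c"]);

console.log(incomeColor(0.42));     // "#08519c" (darkest blue: high concentration)
console.log(populationColor(0.12)); // "#e5f5e0" (lightest green: low)
```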

Here’s the original infographic on income distribution, posted back in December 2011.

Income Distribution across the US