Finding the right problems for Citizen Science

December 07, 2016

Originally posted on Medium

Citizen Science is something I’ve been interested in for a few years. There are some great opportunities to make games that have a real impact in the world, while also educating the general public in scientific principles and methodology.

However, whenever I think “I’d like to do a citizen science project”, I’m blocked: like all science, Citizen Science is, first and foremost, about finding the right problem. You need an unsolved scientific problem that the tools of Citizen Science are uniquely suited to solve.

One part of this is understanding the science (that’s the bit I don’t know), and the other part is understanding the design constraints (the bit I know). Often the two skills aren’t in the same room, and this can make starting a project difficult. So here, in a nutshell, is the bit I know.

The types of Citizen Science project

Citizen Science problems fall into three major areas: data collection, human analysis, and human problem solving.

Data collection problems are those where you need data about the world. The data you need is simple to gather (i.e. you don’t need a lab), but there’s some sort of scale factor that makes it difficult (you need to gather data over a large population, over a long period of time, or you just need a hell of a lot of random numbers).

Human analysis problems are those where basic human analysis of data is useful (i.e. better than computers in some way). This is often used in image analysis (e.g. how many penguins are in this picture?).

Human problem solving problems are problems (yep, I see it too…) where the human ability to unpack and solve complex problems is useful. The canonical example is FoldIt, and Quantum Moves is also pretty great.

Scale

At the heart of all of these problems must be some sort of scale factor. There’s little point building a complex problem solving game if there’s only a single problem to be solved, or a few images to analyse.

Citizen science projects are, at their best, a large collection of similar, bite-sized problems. To that end, it’s important that any data sets required can be easily collated and split up (for instance, Galaxy Zoo requires large images to be split up, so that each shows a single galaxy).
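
To make the "split it up" idea concrete, here’s a minimal sketch in Python. It assumes you already have a catalogue of object positions in a large image, and it cuts out one small square per object so that each square can become its own task. The function names (`cut_out`, `make_tasks`), the `half_size` parameter and the toy image are all made up for illustration; this isn’t how Galaxy Zoo actually prepares its data.

```python
def cut_out(image, centre_row, centre_col, half_size):
    """Return a square crop around a centre point, clipped to the image edges."""
    top = max(centre_row - half_size, 0)
    bottom = min(centre_row + half_size + 1, len(image))
    left = max(centre_col - half_size, 0)
    right = min(centre_col + half_size + 1, len(image[0]))
    return [row[left:right] for row in image[top:bottom]]


def make_tasks(image, centres, half_size=1):
    """Turn one large image into one small task per catalogued object."""
    return [
        {"task_id": i, "centre": (r, c), "pixels": cut_out(image, r, c, half_size)}
        for i, (r, c) in enumerate(centres)
    ]


# A toy 6x6 "survey image": each pixel value is just row * 10 + column.
survey = [[r * 10 + c for c in range(6)] for r in range(6)]

# Hypothetical catalogue of object centres as (row, column) pairs.
tasks = make_tasks(survey, centres=[(1, 1), (4, 4), (0, 5)])
```

Each entry in `tasks` is small enough to hand to one player at a time, and the `task_id` gives you a way to tie their answer back to the original image.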

Verification and Finding Anomalies

As AI increases in effectiveness, human analysis problems, in particular, start to feel like a race against the technological clock. Maybe, after all, we should just wait for AI to catch up.

However, even with improved AI, it can be very hard to verify that the AI has made a correct assessment over a large data set. It’s even harder to know if there were any anomalies: that is, things that the AI didn’t know to look for. Humans are, at present, still much better at this, as Planet Hunters demonstrated when its volunteers discovered what’s being called the ‘Alien Megastructure Star’ (note: it’s probably not actually aliens).
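
As a rough illustration of humans checking the machine, here’s a small Python sketch. It assumes each item has one AI label and several volunteer answers, and that volunteers can also pick a catch-all "something odd" option. It takes the majority volunteer answer, then lists items where people clearly disagree with the AI and items that people flagged as odd. The names (`consensus`, `review`, `min_agreement`), the 60% threshold and the data are all hypothetical, not taken from Planet Hunters or any real project.

```python
from collections import Counter


def consensus(votes):
    """Return the most common volunteer answer and the share of votes it got."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def review(ai_labels, volunteer_votes, min_agreement=0.6):
    """Compare volunteer consensus with AI labels and collect items worth a closer look."""
    disagreements, anomalies = [], []
    for item_id, votes in volunteer_votes.items():
        label, share = consensus(votes)
        if share < min_agreement:
            continue  # volunteers are split, so there is no clear answer to compare
        if label == "something odd":
            anomalies.append(item_id)  # people saw something outside the usual options
        elif label != ai_labels[item_id]:
            disagreements.append(item_id)  # people and the AI gave different answers
    return disagreements, anomalies


# Made-up example data.
ai_labels = {"img1": "planet", "img2": "no planet", "img3": "no planet"}
volunteer_votes = {
    "img1": ["planet", "planet", "planet"],
    "img2": ["planet", "planet", "no planet"],
    "img3": ["something odd", "something odd", "something odd"],
}

disagreements, anomalies = review(ai_labels, volunteer_votes)
```

The point of the catch-all option is that the AI can only answer the question it was trained on, while a person can say "this doesn’t look like anything I was told to expect".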

Constraints of the user base

Citizen Science projects aren’t great for anything where you need a representative sample of people. The user base is usually going to skew towards those interested in science or the topic area. This mostly affects Data Collection problems, but is certainly worth thinking about for the other types.

Personally, I’m interested in finding ways of bringing Citizen Science to other audiences through games and genres that aren’t so immediately sciency (as an exercise in science outreach and communication; the ownership inherent in Citizen Science is pretty powerful). But even then, there are going to be large skews in the population based on the availability and audience of the game.

Domain Knowledge

While I’m all for using Citizen Science to enhance science education, it’s important that someone can come into a Citizen Science game and do something meaningful without deep domain knowledge. It’s no use asking the general public to solve general relativity equations (or really, any equations) as a starting point.

Having said that, part of the design challenge of making a Citizen Science game is finding a way to express complex problems in an understandable way. But you’ll still need to scaffold the user’s learning, and a game where you only start getting useful data in the third hour is a little risky.

To wrap up…

In today’s world of large data sets, there are likely huge swaths of science that could really use the large-scale human power that Citizen Science can provide, and there are a lot of people in the world who’d benefit from being a part of scientific research, even if it’s in a small way.

In many cases, the skills that are needed to make these things work are cross-discipline: a combination of domain knowledge, design, academic rigour, science communication and software engineering. Hopefully, this sheds a bit of light on what these projects entail, and leads to some great projects.

On that note… Part of the reason I put this article together is that there’s currently funding available for Citizen Science projects in Australia, which means that collaborating on such a project is suddenly much more feasible. So if you see this and have an idea for a project, please get in touch!