Playtesting 105: How to Measure Qualitatively

August 06, 2011

Originally posted on Gamasutra

This is part 5 in a series on how to playtest games (click through to parts 1, 2, 3, 4 and 6).

This time, we’ll be looking at how to measure those things that can’t easily be put into numbers: from player reactions and sticking points to interface feedback.

This is going to be a longer post, as there are really two things that I’ll be going through: different ways of measuring qualitative data, and how to write good interviews and questionnaires.

1. Ways of measuring the Qualitative

Qualitative data usually involves player reaction and emotion: whether they are frustrated or happy, where they have trouble and where they are engaged. It also involves gathering responses and suggestions by the player.

There are two main ways of gathering this data: observation and questioning.

Observation involves watching the player as they play. It can involve noting down their emotional state as the game progresses, or what they say. In HCI circles, a notable technique is to ask the user to ‘Think Aloud’ during the test, so that you know exactly what the user is thinking (this is especially useful in discovering ‘perceived affordances’ - that is, what an interface tells a user about its function). The issue with the think-aloud technique, particularly in the realm of games, is that it changes how the player performs. In an action game, talking while playing slows response times and will likely make the game prohibitively difficult; in a puzzle game, thinking aloud often helps the player deduce the solutions more quickly than they otherwise would.

Think-aloud is, therefore, a very useful technique which you should use with caution. It can give you a much deeper insight into the player mindset as they play, but can fundamentally affect this mindset as well.

The alternative, straight observation of emotion, is fraught with difficulty, not least because the player will react to the observation itself. Recording the player (audio and video) can help, as it allows you to review their reactions without watching them personally. This only works, however, if you have time to go over all the footage.

The other way to get qualitative data from players is by asking them: through interviews and questionnaires. It’s important to start off every playtest with a questionnaire that will tell you important data about the tester themselves: their experience and knowledge in gaming and the specific aspects you are testing. You should then ask the players questions about their experiences with the game after each level or section, so as to get the freshest view on each of these.

So do you use an interview or a questionnaire? Questionnaires are good at asking direct, specific questions. They are often faster to perform, and can be done without your presence. Interviews are generally better for exploratory tests, and allow you to adapt your questions to better find the cause of a specific gripe or issue. They do, however, take longer to perform, and require your presence.

Both of these techniques require you to design questions (you can perform an interview on the fly, but it’s better to have some basic questions, both to start things rolling and to ensure you cover everything you wanted to).

Which segues nicely into part 2 of this mega-post:

2. Designing Questions

To continue the theme of ‘two things to talk about’ that seems to be threading its way through this post, there are two basic types of questions you can ask: the ‘make a choice’ kind and the ‘describe’ kind.

Making a choice comes in three basic forms: rankings, ratings and Likert scales. Rankings involve the player ordering levels or powerups in terms of challenge, usefulness, fun or any number of other aspects. Likert scales and ratings are about asking the player to rate a property from 1-5, ‘Very Easy’ to ‘Very Hard’, and so on.

Likert scales and ratings should always have at least 5 options (for instance, ‘Very Hard’, ‘Hard’, ‘Neutral’, ‘Easy’ and ‘Very Easy’) and should always have an odd number of options (so that the player can always take a middle ground). Likert scales allow the player to choose a value on a scale between two extremes, while ratings are used to give magnitude to a specific property. It’s often useful to present these as lists: for instance, asking the player to rate, one after another, how well they understood a number of aspects; or asking them to rate how well the design informs them of the different metrics they might need to keep track of for effective play.

In both of these, it’s very important that the language you use to describe the choice doesn’t lead the player in a particular direction. Use ‘Rate the challenge of the level from 1-5’ rather than ‘Rate how hard the level is’, as ‘hard’ is one end of this scale (thus pointing the player towards that end of the scale). It’s also important to label the ends of the scale (is 1 ‘easy’ or ‘hard’?), and to be consistent about your labels throughout the questions (changing 1 from ‘easy’ to ‘hard’ between questions only confuses the player). There are circumstances in which it’s useful to swap these, but they are incredibly rare.
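If you keep your questionnaire in a script or spreadsheet export, the scale rules above are easy to check automatically. Here is a minimal Python sketch of that idea. The names `RatingScale`, `problems` and `flipped_questions` are made up for illustration, and it assumes each scale is stored as a tuple of labels ordered from option 1 upwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingScale:
    """A labelled scale, e.g. 1 = 'Very Easy' ... 5 = 'Very Hard'.

    Hypothetical structure: one label per option, ordered from 1 upwards.
    """
    labels: tuple

    def problems(self):
        """Return a list of rule violations (empty if the scale looks fine)."""
        issues = []
        if len(self.labels) < 5:
            issues.append("fewer than 5 options")
        if len(self.labels) % 2 == 0:
            issues.append("even number of options, so no middle ground")
        if self.labels and (not self.labels[0] or not self.labels[-1]):
            issues.append("ends of the scale are not labelled")
        return issues


def flipped_questions(questions):
    """Given {question text: RatingScale}, return the questions that reuse
    an earlier question's labels but in a different order
    (e.g. 1 silently changing from 'easy' to 'hard')."""
    first_ordering = {}  # set of labels -> first ordering seen
    flipped = []
    for text, scale in questions.items():
        key = frozenset(scale.labels)
        reference = first_ordering.setdefault(key, scale.labels)
        if scale.labels != reference:
            flipped.append(text)
    return flipped
```

Running `problems()` on each scale and `flipped_questions()` on the whole questionnaire before you print it is enough to catch the common mistakes. It won’t catch leading wording, though; that still needs a human read-through.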

Rankings, on the other hand, are about being able to pinpoint which of a number of aspects of a game requires the most attention: which levels are most in need of work, for instance.
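One simple way to combine rankings from several testers is to average each item’s position. The sketch below does this in Python; `mean_ranks` is a made-up name, and it assumes each tester hands you a list ordered from ‘most in need of work’ to ‘least’. Mean rank is only one of several ways to aggregate rankings, so treat it as a starting point.

```python
from collections import defaultdict


def mean_ranks(rankings):
    """Average each item's position across testers.

    rankings: list of lists, each one tester's ordering from
    'most in need of work' (position 1) downwards (assumed convention).
    Returns (item, mean position) pairs, neediest first.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for order in rankings:
        for position, item in enumerate(order, start=1):
            totals[item] += position
            counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    return sorted(means.items(), key=lambda pair: pair[1])


testers = [
    ["Level 3", "Level 1", "Level 2"],
    ["Level 3", "Level 2", "Level 1"],
    ["Level 1", "Level 3", "Level 2"],
]
for level, score in mean_ranks(testers):
    print(f"{level}: mean rank {score:.2f}")
```

With these three made-up testers, Level 3 comes out on top of the list, so that is where you would look first.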

All of these choice-based measures only give you half of the story - they’re usually good at pinpointing what you need to work on, but are much less useful for working out why. This is why most questionnaires will include a ‘Why’ after each rating question, to find out what, in particular, led to a less than perfect score. A great technique to use here is to combine a questionnaire with an interview: get the players to answer ‘make a choice’ questions on paper, and then get them to justify these in the interview.
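If you type the paper answers in before the interview, you can generate the ‘Why?’ follow-ups from them. This is a rough Python sketch, with `interview_prompts` as a made-up helper. It assumes a 1-5 scale where 5 is the ‘perfect’ score, which won’t hold for every question (on a difficulty scale, the middle may be what you are aiming for).

```python
def interview_prompts(answers, top_score=5):
    """Turn one tester's ratings into follow-up questions.

    answers: {aspect: score} from the paper questionnaire.
    Assumes higher is better and top_score is 'perfect' (an assumption,
    not a rule). Lowest-rated aspects come first.
    """
    low = sorted(
        (score, aspect)
        for aspect, score in answers.items()
        if score < top_score
    )
    return [
        f"You rated {aspect} {score} out of {top_score}. Why?"
        for score, aspect in low
    ]


paper_answers = {"controls": 5, "level 2 pacing": 2, "tutorial": 4}
for prompt in interview_prompts(paper_answers):
    print(prompt)
```

Asking about the lowest scores first means that if the interview runs short, you have at least covered the biggest complaints.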

Thus, we reach the ‘describe’ form of question. This is where you want the player to make some more freeform comments. It’s probably good not to clog up your questionnaires with too many of these (or to make them optional), especially if players are testing online. Some good questions you can use here include:

What was your favourite aspect of *? Why?
What was your least favourite aspect of *? Why?
Did you find any aspects of the game frustrating? (note that this question is deliberately leading)
Did you find the powerups/skills/units useful? Which was the best one and why? Which was the worst and why?
Any other comments? (This should be in EVERY questionnaire)

And that’s about it. I had to cover a lot of stuff here in a single post, and so some of it might be a little generalised (please let me know if you find this the case - I can return to some of these topics later to explore the subtleties of them if needed). Next time we’ll look at the actual running of the playtest, and what you do on the day!
