A touchscreen game for learning Mandarin tones

Check out the 30-second game demo!


Our game, Swipe Tone, is designed to help native English speakers learn to distinguish between and identify the five tones in Mandarin Chinese in a visual manner using a 3x3 grid and swipe gesture system.

My Contribution

Developed the game concept and instructional content; led user research sessions, including cognitive task analysis and playtesting; coordinated an external development team.

Duration 6 weeks 

Course Design of Educational Games 

Team  Amalya Henderson


Final Report | Final Presentation |  CTA Report


  • Literature Review    
  • Cognitive Task Analysis (CTA)    
  • Think-Aloud                  
  • MDA Framework  
  • Playtesting 




Because they are hard for beginners ...

Native English speakers routinely have difficulty identifying tones in Chinese during the early stages of learning the language. This is especially true for students with no previous experience with tonal languages, since their ears are not attuned to these types of sound changes. Our sources* extensively document the challenges that native English speakers face when learning tones in beginning Mandarin classes.


And it is important to get them right!

  • If you don’t know the tone, you simply don’t know how to pronounce the word.
  • The tones are an integrated part of a word. They carry as much meaning as vowels do.
  • The importance of tones increases with the complexity of the meaning and context.




Our primary educational objective is to help players identify and distinguish between the five tones in Mandarin Chinese. A secondary objective is to teach the player the swipe gesture system required to play the game. The gestures are based on the two vocal qualities that help the player tell the tones apart: pitch and duration. The shapes of the gestures also mimic the shapes of the symbols that the Mandarin phonetic system uses to mark the tone of each syllable in a word.


Cognitive Task Analysis is the general term for a set of methods and techniques that specify the cognitive structures and processes associated with task performance. It improves instruction by capturing the critical steps and decisions in a learning event. Based on our learning hypothesis that novice learners usually have specific difficulties with the 3rd tone and with tone pairs, we divided 33 tasks into three main categories:

  • single syllable, varied tones (easy level)
  • two syllables, both same tone (medium level)
  • two syllables, varied tones (hard level, focused on the practice of 3rd tone)

In the think-aloud studies that followed, we gave the participants (all novice learners) initial instructions and the tasks, and asked them to identify which tone or tones they thought were used in each audio recording by writing the symbol for the tone.

Based on our analysis of the think-aloud data, we created a CTA model of the desired cognitive process, revised the initial learning goals, tasks, and task sequence, and generated more design ideas.



swipe tone playtesting data

Important Observations

1. For complete novices, it is very difficult to remember the 5 options and gestures without a key or individual tutorial.

2. Longer single syllables with multiple vowels or sounds (e.g., xiao1, qiang2) can be more confusing to native English speakers.

3. Single syllables with more complex sounds can cause more difficulties for novices.

4. Players had a difficult time telling whether an answer was marked incorrect because of an incorrect gesture or because they had chosen the wrong tone.

“I can tell I was getting better...very interesting. I think I would come back and play the latter levels to test myself.” - CMU student


Significant Changes

1. Better scaffolding of the gesture system

  • We integrated the primary tone and gesture tutorial into beginning levels.

  • We now accept swipe selections only from left to right, which helps players avoid invalid input.

  • We made the selection area for each circle narrower so that it is easier for players to drag their fingers through the big V pattern for the 3rd tone swipe without accidentally selecting nearby circles on their way to and from the bottom circle in the V.

2. Included more specific feedback

  • When players finish a prompt, the system now shows the correct pinyin answer (with tone markings) for that prompt (for both correct and incorrect answers). 

  • When a player does an invalid swipe (a swipe pattern that is not one of the tone options), it does not count as an incorrect answer. The game shows feedback that it is an invalid swipe, and tells them to try again.
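The input rules described above (left-to-right swipes only, and invalid swipes that prompt a retry instead of counting as wrong answers) can be sketched roughly as follows. This is an illustrative Python sketch, not the game's actual GameMaker code. The grid coordinates, the names `TONE_PATHS` and `classify_swipe`, and every swipe path except the V shape for the 3rd tone are assumptions made for the example.

```python
from enum import Enum

# Hypothetical swipe paths on the 3x3 grid, written as (row, column) pairs
# with row 0 at the top and column 0 on the left. Only the V shape for the
# 3rd tone is described in the text; the other paths are assumed.
TONE_PATHS = {
    1: ((0, 0), (0, 1), (0, 2)),  # assumed: high and level
    2: ((2, 0), (1, 1), (0, 2)),  # assumed: rising
    3: ((0, 0), (2, 1), (0, 2)),  # big V through the bottom circle
    4: ((0, 0), (1, 1), (2, 2)),  # assumed: falling
    5: ((1, 0), (1, 1)),          # assumed: short swipe for the fifth tone
}


class Result(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    INVALID = "invalid"  # not a tone pattern: the player tries again


def moves_left_to_right(path):
    """Return True if every step of the swipe moves one or more columns right."""
    columns = [column for _, column in path]
    return all(a < b for a, b in zip(columns, columns[1:]))


def classify_swipe(path, expected_tone):
    """Classify a swipe as correct, incorrect, or invalid for a prompt."""
    path = tuple(path)
    if not moves_left_to_right(path):
        return Result.INVALID
    for tone, tone_path in TONE_PATHS.items():
        if path == tone_path:
            if tone == expected_tone:
                return Result.CORRECT
            return Result.INCORRECT
    # The swipe matches no tone pattern, so it is not scored as wrong.
    return Result.INVALID
```

Under these assumptions, `classify_swipe([(0, 0), (2, 1), (0, 2)], 3)` is a correct 3rd tone answer, while the same V drawn from right to left is reported as invalid and would not be scored. Keeping "invalid" separate from "incorrect" is what lets the feedback tell a gesture slip apart from a wrong tone choice.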

3. Added features to keep novices motivated

  • We now show progress through each level with a counter at the bottom of the screen that shows the number of prompts completed out of the total number in the level.

  • At the end of each level, a separate screen shows the player’s current score, the current high score for that level, and the average first-time player score. The average is included to prevent feelings of defeat.





Technical constraints were our primary challenge. To make the project succeed, we consulted with the instructor and actively sought help from a GameMaker expert, as well as a possible partnership with a student with a computer science background. Luckily, we were able to find Zhongye Yang, a game developer from the ETC program at Carnegie Mellon. We also got in touch with two other students from the ETC, a game designer and a visual designer, for our future work on this game beyond this class. While working with these outside team members, we gained firsthand experience with the communication difficulties an educational game designer can have with people outside the education domain.

Amalya and I won the prize money during the final pitch.


What Worked

We successfully improved many game mechanics based on our observations and players’ feedback. The general gesture system is now well integrated into the gameplay. It was relatively easy to find playtesters, since everyone we talked to was very excited about the game concept and wanted to try it.

What Did Not Work So Well

We did not have enough time to build the learning and game instructions into the actual mobile game. The current iteration is still a relatively short game with limited replay value. In terms of the grid mechanic, we are currently using a simplified 3x3 grid, which is easier for novices to manage (and for us to program) but is not the most accurate representation of the pitch differences. As for the aesthetics, the current interface is quite simple and, while more attractive than the original grey interface, not that appealing.

Looking Forward 

In the future, we plan to add:

  • additional feedback indicators, such as sound effects for right, wrong, and invalid input, to make the feedback for each prompt more immediately obvious to the players.
  • a wider range of difficulty levels: multiple syllables, rule variations for the 3rd tone, and a gradual shift to natural Chinese speech speed and sound (with less tonal emphasis) in later levels.
  • a 5 x 4 grid, ideally, if it still fits comfortably on a touchscreen alongside the other interface elements.
  • a simple storyline (such as a thief trying to break into a safe, where the only clue to the combination is a Chinese recording) and a more artistic interface to make the game more attractive.

* Sources

  • Yang Yang Cheng (yoyochinese.com)

  • Hacking Chinese (hackingchinese.com)

  • Chen, Trevor H. and Massaro, Dominic W. “Seeing pitch: Visual information for lexical tones of Mandarin-Chinese.” The Journal of the Acoustical Society of America, 123, 2356-2366 (2008), DOI:http://dx.doi.org/10.1121/1.2839004.

  • Wu, Hang, and L Keith Miller. “A Tutoring Package to Teach Pronunciation of Mandarin Chinese Characters.” Ed. Michael Kelley. Journal of Applied Behavior Analysis 40.3 (2007): 583–586. PMC. Web. 6 Apr. 2015.

  • Darren Edge, Kai-Yin Cheng, Michael Whitney, Yao Qian, Zhijie Yan, and Frank Soong. “Tip tap tones: mobile microtraining of mandarin sounds.” In Proceedings of the 14th international conference on Human-computer interaction with mobile devices and services companion (MobileHCI '12). ACM, New York, NY, USA, 215-216. 2012. DOI=10.1145/2371664.2371715 http://doi.acm.org/10.1145/2371664.2371715