Human Model Evaluation in Interactive Supervised Learning, by Rebecca Fiebrink, Perry R. Cook, and Daniel Trueman. Published in CHI '11: Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems.
Author Bios
Rebecca Fiebrink is currently an assistant professor of Computer Science at Princeton University. She holds a PhD from Princeton and spent most of 2011 as a postdoc at the University of Washington.
Perry R. Cook is a professor emeritus at Princeton University in the Departments of Computer Science and Music. He is no longer teaching, but he still researches, lectures, and makes music.
Daniel Trueman is a musician, primarily with the fiddle and the laptop. He currently teaches composition at Princeton University.
Summary
- Hypothesis
- Because model evaluation plays a special role in interactive machine learning systems, it is important to develop a better understanding of which model evaluation criteria matter most to users.
- Methods
- The authors performed three studies of people applying supervised learning in their work. In the first study they led a design process with seven composers, focused on refining the Wekinator; participants met regularly to discuss the software in relation to their own work and to suggest improvements. In the second study, students used the Wekinator for an assignment on supervised learning in interactive music performance systems: specifically, they were asked to use an input device to create two gesturally controlled music performance systems. The third study was a case study with a professional musician, building a gesture recognition system for a sensor-equipped cello bow; the goal was a set of gesture classifiers that take sensor data from the bow and produce musically appropriate labels.
- Results
- In the first study, participants found it difficult to control the sound-generating algorithms in a musically satisfying way using either a GUI or an explicitly programmed sequence. Unlike the first study, the second and third both made some use of cross-validation. Users in the second study treated high cross-validation accuracy as an indicator of good performance and relied on it as such, whereas in the third study it served more as a quick sanity check. Participants in all three studies used direct evaluation much more frequently than cross-validation (a rough sketch contrasting the two appears after this summary). The criteria used in direct evaluation fell into six categories: correctness, cost, decision boundary shape, label confidence and posterior shape, complexity, and unexpectedness.
- Contents
- The researchers present work studying how users evaluate and interact with supervised learning systems. They examine what kinds of criteria are used in evaluation and report observations of different techniques, such as cross-validation and direct evaluation. In this setting, evaluation serves both to judge algorithm performance and to guide improvements to the trained models, for example by supplying more effective training data.

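The paper studies how people evaluate models rather than proposing new algorithms, but as a rough, generic illustration of the two evaluation styles mentioned above, here is a minimal sketch in Python using scikit-learn (the Wekinator itself is Java-based; the data and classifier choices below are hypothetical stand-ins, not the paper's setup):

```python
# A generic contrast between cross-validation accuracy (an aggregate number
# computed from the training set) and "direct" evaluation (running the
# trained model on freshly supplied examples and judging each output).
# This is an illustrative sketch, not the Wekinator's implementation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: gesture feature vectors with binary labels.
X_train = rng.normal(size=(60, 4))
y_train = (X_train[:, 0] > 0).astype(int)

clf = KNeighborsClassifier(n_neighbors=3)

# Cross-validation: a quick aggregate accuracy measure over held-out folds.
cv_accuracy = cross_val_score(clf, X_train, y_train, cv=5).mean()

# Direct evaluation: train on all the data, then run the model on new
# inputs and judge whether each individual output is acceptable.
clf.fit(X_train, y_train)
X_new = rng.normal(size=(5, 4))
predictions = clf.predict(X_new)

print(f"cross-validation accuracy: {cv_accuracy:.2f}")
print("predictions on new gestures:", predictions)
```

In the studies, the second kind of check dominated: users cared less about the aggregate number and more about whether specific, musically meaningful inputs produced acceptable outputs.
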
I think this paper does a very good job of presenting its findings and being thorough in its research and methodology. By my estimation the researchers accomplished their goal of gathering useful data about how people evaluate supervised learning systems, and I think this work will be very beneficial in the future. I did not find any glaring faults with the paper itself: it states its purpose, carries out the research, reports the findings, and even discusses potential benefits and applications.