Gary Ross, Glenn Norris, and Stephen Henneberry
JALT-CALL 2019, Aoyama Gakuin University, Tokyo, Japan
Session abstract:
Speech Recognition (computerized listening) and Speech Synthesis (computerized speech), together generally shortened to Speech Recognition, form the most important interface development in technology, representing the final stage in human-to-device interaction. Technologies such as Siri promise to revolutionize our interactions with our devices. For the language learner, the ability to speak to a device that can take on different genders and accents on demand enables learners to take control of their learning process: they can shift their practice in both time and place while working at their own pace, gaining vastly more opportunities to practice speaking and to receive immediate automated feedback.

Speech Recognition's power is threefold: (i) students can practice speaking at any time and receive instant feedback; (ii) every utterance can be stored as machine-readable text in a database, allowing computer analysis of student patterns to discern common errors, which can then be displayed to the instructor automatically; and (iii) machine learning (artificial intelligence) techniques can analyze massive amounts of data to discover deeper spoken patterns as well as syntactic and semantic errors.

As the initial part of a 4-year cross-institutional research project funded by a Japanese Government Grant-in-Aid (Kakenhi), this paper presents (a) the challenges of setting up such a system for both desktop and mobile, (b) a pattern analysis of over 1,000,000 utterances collected with the system at 3 Japanese universities, (c) an analysis of the effectiveness of online speaking practice on student outcomes at those institutions, and (d) student feedback and reactions to speaking with a machine.
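To make the recognition-plus-synthesis loop described above concrete, the following is a minimal browser sketch in TypeScript, assuming the standard Web Speech API (supported in Chromium-based browsers; recognition still uses a vendor-prefixed constructor in some of them). The locale strings and the console feedback are illustrative; this is a sketch of the general technique, not the project's actual system.

```typescript
// Minimal browser sketch: recognize one learner utterance, then read a
// response back in a chosen accent. Assumes the Web Speech API; voices
// available for synthesis vary by device and may load asynchronously.

// The recognition constructor is not in every TypeScript DOM lib yet,
// so fall back to the vendor-prefixed name.
const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function listenOnce(lang: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const rec = new Recognition();
    rec.lang = lang;               // e.g. "en-US"
    rec.maxAlternatives = 1;
    rec.onresult = (e: any) => resolve(e.results[0][0].transcript);
    rec.onerror = (e: any) => reject(e.error);
    rec.start();
  });
}

function speak(text: string, lang: string): void {
  // Pick any installed voice for the requested locale; different voices
  // provide the different genders and accents mentioned above.
  const voice = speechSynthesis.getVoices().find(v => v.lang === lang);
  const utt = new SpeechSynthesisUtterance(text);
  if (voice) utt.voice = voice;
  speechSynthesis.speak(utt);
}

// Example: transcribe the student, log the machine-readable text
// (ready for storage), and respond in a British-English voice.
listenOnce("en-US").then(transcript => {
  console.log("Stored utterance:", transcript);
  speak(`You said: ${transcript}`, "en-GB");
});
```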
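Point (ii) above depends on utterances being stored as machine-readable text. The sketch below, kept in TypeScript for continuity, shows one simple way such stored rows could be tallied to surface common errors for the instructor; the row shape, the word-level comparison, and the sample data are assumptions for illustration, not the project's schema or analysis method.

```typescript
// Illustrative sketch: given stored utterance rows, count words that
// appear in the target sentence but are missing from the transcript,
// so frequent omissions can be displayed to the instructor.

interface UtteranceRow {
  studentId: string;
  target: string;     // the prompt the student tried to say
  transcript: string; // what the recognizer heard
}

function missingWordCounts(rows: UtteranceRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const heard = new Set(row.transcript.toLowerCase().split(/\s+/));
    for (const word of row.target.toLowerCase().split(/\s+/)) {
      if (!heard.has(word)) {
        counts.set(word, (counts.get(word) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Example with hypothetical data: rank the most frequently dropped words.
const rows: UtteranceRow[] = [
  { studentId: "s1", target: "she has lived here for a year", transcript: "she has lived here a year" },
  { studentId: "s2", target: "she has lived here for a year", transcript: "she lived here for a year" },
];
const ranked = [...missingWordCounts(rows).entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked); // [["for", 1], ["has", 1]]
```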