Improving black-box speech-to-text systems via machine learning techniques

Schwade, Guilherme Vieira

dc.contributor.advisor	Silva, Bruno Castro da	pt_BR
dc.contributor.author	Schwade, Guilherme Vieira	pt_BR
dc.date.accessioned	2016-08-25T02:16:26Z	pt_BR
dc.date.issued	2016	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/147632	pt_BR
dc.description.abstract	There are several ways a user can interact with a computer. Not every way is equally appropriate for all situations: when typing, a keyboard is more appropriate; a mouse, on the other hand, is a better fit when the user needs to control the cursor with precision. In some complex systems, the user might need to execute several different tasks and, therefore, might need different ways to interact with the system. To simplify those interactions, voice commands might be a good strategy, since they often allow the user to specify the task to be executed with a richer input vocabulary than that available via other, more standard input devices. However, the development of robust speech-to-text converters (STT converters) requires a lot of time and resources, which development teams often do not have. There are widely used STT converters available on the internet, such as the Web Speech API from Google; these systems are in a very advanced stage of maturity for general-purpose applications, for instance, when they are used to analyze terms and words that occur in day-to-day conversations. However, these systems are often not effective when used to analyze context-specific terms, which occur only in particular systems or applications. Furthermore, these systems are usually black boxes and cannot be modified or improved by developers who wish to use them to solve particular specialized speech-to-text problems. To analyze possible solutions to this problem, we study the development of an additional layer of software, trained via machine learning techniques, to correct or adapt the imperfect transcriptions generated by a black-box STT converter when applied to a specific domain. In particular, we propose and evaluate several machine learning solutions to improve a complex flight-ticket management system to which we wish to add voice-control capabilities. 
In the first part of this work, we discuss our motivation and describe the domain in which the proposed methods are evaluated. After that, the mathematical and theoretical background is presented, and we introduce possible solutions for the particular domain at hand. Finally, a critical analysis of the results is made and future work is discussed.	en
dc.format.mimetype	application/pdf
dc.language.iso	eng	pt_BR
dc.rights	Open Access	en
dc.subject	Speech Recognition	en
dc.subject	Reconhecimento : Padroes	pt_BR
dc.subject	Machine learning	en
dc.subject	Aprendizagem : Maquina	pt_BR
dc.subject	Levenshtein distance	en
dc.subject	Phonetic algorithm	en
dc.title	Improving black-box speech-to-text systems via machine learning techniques	pt_BR
dc.type	Trabalho de conclusão de graduação	pt_BR
dc.identifier.nrb	000999675	pt_BR
dc.degree.grantor	Universidade Federal do Rio Grande do Sul	pt_BR
dc.degree.department	Instituto de Informática	pt_BR
dc.degree.local	Porto Alegre, BR-RS	pt_BR
dc.degree.date	2016	pt_BR
dc.degree.graduation	Ciência da Computação: Ênfase em Ciência da Computação: Bacharelado	pt_BR
dc.degree.level	graduação	pt_BR

Files in this item

Name: 000999675.pdf
Size: 1.420Mb
Format: PDF
Description: Full text (English)

This item is licensed under a Creative Commons License
