'Early recognition' of polysyllabic words in continuous speech

Scharenborg, Odette; Ten Bosch, Louis; Boves, Lou

doi:10.1016/j.csl.2005.12.001


Scharenborg, O., Ten Bosch, L., & Boves, L. (2007). 'Early recognition' of polysyllabic words in continuous speech. Computer Speech & Language, 21, 54-71. doi:10.1016/j.csl.2005.12.001.

Item is Released


Basic data


Item permalink: https://hdl.handle.net/11858/00-001M-0000-0012-D1DF-E
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0012-D1E1-6

Genre: Journal article

Files


2382BDEDd01.pdf (publisher version), 276 KB


File permalink:
https://hdl.handle.net/11858/00-001M-0000-0012-D1DE-0

Name:
2382BDEDd01.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
-

Copyright info:
-

License:
-

External references

Creators

Creators:
Scharenborg, Odette¹, Author
Ten Bosch, Louis¹, Author
Boves, Lou¹, Author

Affiliations:
¹ Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, ou_55203

Content

Keywords: -

Abstract: Humans are able to recognise a word before its acoustic realisation is complete. This is in contrast to conventional automatic speech recognition (ASR) systems, which compute the likelihood of a number of hypothesised word sequences, and identify the words that were recognised on the basis of a trace back of the hypothesis with the highest eventual score, in order to maximise efficiency and performance. In the present paper, we present an ASR system, SpeM, based on principles known from the field of human word recognition that is able to model the human capability of ‘early recognition’ by computing word activation scores (based on negative log likelihood scores) during the speech recognition process. Experiments on 1463 polysyllabic words in 885 utterances showed that 64.0% (936) of these polysyllabic words were recognised correctly at the end of the utterance. For 81.1% of the 936 correctly recognised polysyllabic words the local word activation allowed us to identify the word before its last phone was available, and 64.1% of those words were already identified one phone after their lexical uniqueness point. We investigated two types of predictors for deciding whether a word is considered as recognised before the end of its acoustic realisation. The first type is related to the absolute and relative values of the word activation, which trade false acceptances for false rejections. The second type of predictor is related to the number of phones of the word that have already been processed and the number of phones that remain until the end of the word. The results showed that SpeM’s performance increases if the amount of acoustic evidence in support of a word increases and the risk of future mismatches decreases.
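The two predictor types named in the abstract can be illustrated with a minimal sketch. It is not SpeM's actual computation: the `WordHypothesis` class, the `early_recognition` function, the assumption that higher activation means more support, and all threshold values are hypothetical and chosen only to show how an activation-based test and a phone-count test might be combined.

```python
from dataclasses import dataclass


@dataclass
class WordHypothesis:
    """Hypothetical snapshot of one word candidate at the current phone."""
    word: str
    activation: float       # assumed: higher value means more support
    phones_processed: int   # phones of the word already processed
    phones_total: int       # phones in the word's full form


def early_recognition(candidates, min_activation=0.5, min_ratio=2.0,
                      min_phones_processed=3, max_phones_remaining=None):
    """Return the word accepted before its end, or None (illustrative rule)."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c.activation, reverse=True)
    best = ranked[0]
    runner_up = ranked[1].activation if len(ranked) > 1 else 0.0

    # Predictor type 1: absolute and relative activation values.
    # Stricter thresholds give fewer false acceptances, more false rejections.
    if best.activation < min_activation:
        return None
    if runner_up > 0 and best.activation / runner_up < min_ratio:
        return None

    # Predictor type 2: phones already processed (evidence so far) and
    # phones remaining (risk of future mismatches).
    if best.phones_processed < min_phones_processed:
        return None
    remaining = best.phones_total - best.phones_processed
    if max_phones_remaining is not None and remaining > max_phones_remaining:
        return None
    return best.word


# Made-up example values, not data from the paper.
candidates = [
    WordHypothesis("amsterdam", activation=0.9, phones_processed=5, phones_total=8),
    WordHypothesis("amster", activation=0.3, phones_processed=5, phones_total=5),
]
decision = early_recognition(candidates)
```

Raising `min_ratio` or lowering `max_phones_remaining` makes the rule more conservative, which mirrors the trade-off between false acceptances and false rejections described above.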

Details

Language(s):

Date: Published: 2007

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: DOI: 10.1016/j.csl.2005.12.001

Degree type: -

Source 1

Title: Computer Speech & Language

Source genre: Journal

Creators:

Affiliations:

Place, publisher, edition: Elsevier

Pages: -
Volume / Issue: 21
Article number: -
Start / end page: 54 - 71
Identifier: -
