Comparing human and machine recognition performance on a VCV corpus

Scharenborg, Odette; Cooke, M. P.

Scharenborg, O., & Cooke, M. P. (2008). Comparing human and machine recognition performance on a VCV corpus. In ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery".

Item is Released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-0012-D20B-F
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0012-D20C-D

Genre: Conference paper

Files

3CB1B9BFd01.pdf (Publisher version), 108KB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-0012-D20A-2

Name:
3CB1B9BFd01.pdf

Description:
-

OA status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright Info:
-

License:
-

External references

Creators

Creators:
Scharenborg, Odette¹, Author
Cooke, M. P.², Author

Affiliations:
¹ Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands
² Speech and Hearing Research Group, Dept. of Computer Science, University of Sheffield, UK

Content

Keywords: human-machine comparison, acoustic feature representations, articulatory feature classification.

Abstract: Listeners outperform ASR systems in every speech recognition task. However, what is not clear is where this human advantage originates. This paper investigates the role of acoustic feature representations. We test four (MFCCs, PLPs, Mel Filterbanks, Rate Maps) acoustic representations, with and without ‘pitch’ information, using the same backend. The results are compared with listener results at the level of articulatory feature classification. While no acoustic feature representation reached the levels of human performance, both MFCCs and Rate maps achieved good scores, with Rate maps nearing human performance on the classification of voicing. Comparing the results on the most difficult articulatory features to classify showed similarities between the humans and the SVMs: e.g., ‘dental’ was by far the least well identified by both groups. Overall, adding pitch information seemed to hamper classification performance.

Details

Language(s): eng - English

Dates: Published: 2008

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review method: -

Identifiers: -

Degree type: -

Event

Title: ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery"

Venue: Aalborg, Denmark

Start/end date: 2008-06-04 - 2008-06-06

Source 1

Title: ISCA Tutorial and Research Workshop (ITRW) on "Speech Analysis and Processing for Knowledge Discovery"

Source genre: Proceedings

Creators:

Affiliations:

Place, publisher, edition: -

Pages: -
Volume / issue: -
Article number: -
Start / end page: -
Identifier: -
