  A New Workflow for Semi-automatized Annotations: Tests with Long-Form Naturalistic Recordings of Childrens Language Environments

Casillas, M., Bergelson, E., Warlaumont, A. S., Cristia, A., Soderstrom, M., VanDam, M., et al. (2017). A New Workflow for Semi-automatized Annotations: Tests with Long-Form Naturalistic Recordings of Childrens Language Environments. In Proceedings of Interspeech 2017 (pp. 2098-2102). doi:10.21437/Interspeech.2017-1418.

Files

Casillas_etal_2017a.PDF (publisher version), 307KB
Name: Casillas_etal_2017a.PDF
Description: -
OA status:
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata:
Copyright date: -
Copyright info: -
License: -

Creators

Creators:
Casillas, Marisa (1), Author
Bergelson, Elika (2), Author
Warlaumont, Anne S. (3), Author
Cristia, Alejandrina (4), Author
Soderstrom, Melanie (5), Author
VanDam, Mark (6, 7), Author
Sloetjes, Han (8), Author
Affiliations:
(1) Language Development Department, MPI for Psycholinguistics, Max Planck Society, ou_2340691
(2) Duke University, ou_persistent22
(3) Department of Cognitive and Information Sciences, University of California Merced, ou_persistent22
(4) Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d’Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, ou_persistent22
(5) University of Manitoba, ou_persistent22
(6) Washington State University, ou_persistent22
(7) Hearing Oral Program of Excellence (HOPE) of Spokane, USA, ou_persistent22
(8) The Language Archive, MPI for Psycholinguistics, Max Planck Society, ou_530892

Content

Keywords: daylong recordings, language acquisition, annotation, speech recognition, speaker diarization
Abstract: Interoperable annotation formats are fundamental to the utility, expansion, and sustainability of collective data repositories. In language development research, shared annotation schemes have been critical to facilitating the transition from raw acoustic data to searchable, structured corpora. Current schemes typically require comprehensive and manual annotation of utterance boundaries and orthographic speech content, with an additional, optional range of tags of interest. These schemes have been enormously successful for datasets on the scale of dozens of recording hours but are untenable for long-format recording corpora, which routinely contain hundreds to thousands of audio hours. Long-format corpora would benefit greatly from (semi-)automated analyses, both on the earliest steps of annotation (voice activity detection, utterance segmentation, and speaker diarization) and on later steps (e.g., classification-based codes such as child- vs. adult-directed speech, and speech recognition to produce phonetic/orthographic representations). We present an annotation workflow specifically designed for long-format corpora which can be tailored by individual researchers and which interfaces with the current dominant scheme for short-format recordings. The workflow allows semi-automated annotation and analyses at higher linguistic levels. We give one example of how the workflow has been successfully implemented in a large cross-database project.

Details

Language(s): eng - English
Dates: 2017-03-21; 2017-06-02; 2017-03-14; 2017-05-22; 2017
Publication status: Published online
Pages: -
Place, publisher, edition: -
Table of contents: -
Review type: Peer review
Identifiers: DOI: 10.21437/Interspeech.2017-1418
Degree type: -

Event

Title: Interspeech 2017
Venue: Stockholm, Sweden
Start/end date: 2017-08-20 - 2017-08-24

Source 1

Title: Proceedings of Interspeech 2017
Source genre: Proceedings
Creators:
Affiliations:
Place, publisher, edition: -
Pages: -
Volume / issue: -
Article number: -
Start / end page: 2098 - 2102
Identifier: -