A Resource for Natural Language Processing of Swiss German Dialects

Abstract

Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP applications for Swiss German. We extended the original corpus and improved its annotation consistency. Furthermore, we trained dialect-specific PoS-tagging models and implemented a baseline system for dialect identification.

Publication
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, pages 76–84, Dublin, Ireland, August 23, 2014.