Swiss German PoS Tagging

Swiss German is a dialect continuum within the Alemannic dialect group. It comprises numerous varieties spoken in the German-speaking part of Switzerland. Although these varieties (Mundarten) are mainly oral, they are frequently used in written communication. Owing to their high acceptance in Swiss culture and the rise of digital communication, Swiss German has spread to all kinds of communication forms, including social media. Because there are no standard spelling rules, people write the way they speak, which leads to great variability in written forms.

This situation poses a challenge for NLP, and we would like to provide data and resources that serve as a stepping stone toward automatically processing texts written in these dialects. We compiled NOAH’s Corpus of Swiss German Dialects, which covers various text genres and is manually annotated with Part-of-Speech tags. Furthermore, we used this corpus as the training set for a statistical Part-of-Speech tagger and a dialect identification model.
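To give a rough idea of what training a statistical tagger on annotated tokens involves, here is a minimal sketch of a most-frequent-tag baseline in Python. It is not the tagger trained on NOAH’s Corpus. The function names, the toy sentences, and the STTS-style tag labels are all assumptions made up for illustration, and none of the data is taken from the corpus.

```python
from collections import Counter, defaultdict

# Made-up toy data (NOT from NOAH's Corpus): the same sentence, "I go home",
# in different spellings, plus one more sentence. Tags are illustrative.
TOY_SENTENCES = [
    [("ich", "PPER"), ("gang", "VVFIN"), ("hei", "ADV")],
    [("i", "PPER"), ("gah", "VVFIN"), ("hei", "ADV")],
    [("mir", "PPER"), ("gönd", "VVFIN"), ("jetzt", "ADV"), ("hei", "ADV")],
]


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per lowercased token and overall."""
    tag_counts_per_token = defaultdict(Counter)
    overall_tag_counts = Counter()
    for sentence in tagged_sentences:
        for token, tag in sentence:
            tag_counts_per_token[token.lower()][tag] += 1
            overall_tag_counts[tag] += 1
    lexicon = {
        token: counts.most_common(1)[0][0]
        for token, counts in tag_counts_per_token.items()
    }
    # Fallback for unseen tokens: the most frequent tag in the training data.
    default_tag = overall_tag_counts.most_common(1)[0][0]
    return lexicon, default_tag


def tag_tokens(tokens, lexicon, default_tag):
    """Tag each token with its learned tag, or the default if unseen."""
    return [(token, lexicon.get(token.lower(), default_tag)) for token in tokens]


if __name__ == "__main__":
    lexicon, default_tag = train_baseline(TOY_SENTENCES)
    # "gange" is an unseen spelling variant, so it falls back to the default.
    print(tag_tokens(["ich", "gange", "hei"], lexicon, default_tag))
```

The unseen spelling in the usage example shows why the lack of spelling rules matters: a purely lexical baseline can only guess for any variant it has not observed, so a practical tagger also needs contextual or sub-word evidence.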

Resources

Get NOAH’s Corpus

Get PoS-Tagging Models trained on NOAH’s Corpus.

Publications

Noëmi Aepli, Nora Hollenstein, Simon Clematide. NOAH 3.0: Recent Improvements in a Part-of-Speech Tagged Corpus for Swiss German Dialects. SwissText 2018: 116. (PDF) (poster)

Nora Hollenstein & Noëmi Aepli. A Resource for Natural Language Processing of Swiss German Dialects. GSCL 2015: 108. (PDF) (poster)

Nora Hollenstein & Noëmi Aepli. Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging. VarDial@COLING 2014: 85. (PDF) (poster)

Talks

Swiss German NLP, IBM Developer UnConference Meetup Zürich, 18.1.2018 (slides)

Swiss German NLP, NLP Meetup Zürich #3, 28.9.2017 (slides)

Demo

Check out the demo of our Swiss German PoS-Tagger!

Acknowledgements

We would like to thank the Institute of Computational Linguistics at the University of Zurich for its financial support, as well as all the students who contributed to the annotation work.