Just as pop music has evolved from what was originally a youth-culture phenomenon into an integral part of modern culture, its lyrics have become omnipresent in everyday language. We are surrounded by pop lyrics, e.g. on the car radio, on online streaming services, as background music in department stores and restaurants, or on TV shows. Given this wide communicative reach, the empirical exploration of pop lyrics remains a substantial desideratum for linguistics. The Corpus of Song Lyrics ("Songkorpus") addresses this desideratum: it contains sustainably usable, multi-layer annotated song lyrics that exhibit phenomena of both written and spoken discourse. It is dedicated to linguistic research as well as to related disciplines, such as media, cultural and literary studies, the social sciences, or musicology, that have a scientific interest in the language of contemporary German rock and pop music.

Corpus Archives

For detailed information, please refer to the general statistics and visualizations. So far, the corpus comprises the following archives:

We warmly thank the artists above, who kindly allow us to provide their lyrics for non-commercial scientific research! All of these archives contain song lyrics annotated in XML TEI P5, with lemmatization and part-of-speech annotation (extended STTS). Named entities, neologisms, and constituent structures are annotated throughout, and in some cases rhyme types as well.
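The lemma and part-of-speech layers described above lend themselves to simple programmatic queries. The following is a minimal sketch using only the Python standard library, assuming that tokens are encoded as TEI P5 `<w>` elements carrying `lemma` and `pos` attributes (a common TEI convention). The actual element structure of the Songkorpus files may differ, and the sample fragment as well as the function names `tokens` and `pos_frequencies` are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI P5 namespace; the "tei" prefix is only used in the search path below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: assumes each token is a TEI <w> element with
# lemma and pos (STTS) attributes. The real Songkorpus markup may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><lg><l>
    <w lemma="ich" pos="PPER">Ich</w>
    <w lemma="lieben" pos="VVFIN">liebe</w>
    <w lemma="du" pos="PPER">dich</w>
  </l></lg></body></text>
</TEI>"""


def tokens(xml_text):
    """Return (form, lemma, pos) triples for every <w> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        ((w.text or "").strip(), w.get("lemma"), w.get("pos"))
        for w in root.iterfind(".//tei:w", TEI_NS)
    ]


def pos_frequencies(xml_text):
    """Count part-of-speech tags (STTS-style labels) in one document."""
    return Counter(pos for _, _, pos in tokens(xml_text))


if __name__ == "__main__":
    for form, lemma, pos in tokens(SAMPLE):
        print(form, lemma, pos)
    print(pos_frequencies(SAMPLE))
```

If the actual files use different element or attribute names, only the search path `.//tei:w` and the attribute lookups would presumably need adjusting.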

In addition, the corpus features several thematic archives:


To use the corpus in derived formats for your own scientific work, please proceed here.

Publications

Schneider, Roman / Faaß, Gertrud (2023): Challenges in Computational Linguistics, Empiric Research & Multidisciplinary Potential of German Song Lyrics. Special Issue of the Journal for Language Technology and Computational Linguistics (JLCL). Vol. 36(1). [PDF]

Schneider, Roman (2022): Zwischen Schriftlichkeit und Mündlichkeit: Songtexte in der deskriptiven Sprachforschung. In: Sprachreport 1/2022. 38-50. [PDF]

Schneider, Roman (2022): Das Songkorpus – Perspektiven einer korpuslinguistischen Nutzung deutschsprachiger Popmusik für die Fremd- und Zweitsprachenvermittlung. In: Korpora Deutsch als Fremdsprache 2(2). 149-153. [PDF]

Schneider, Roman / Hansen, Sandra / Lang, Christian (2022): Das Vokabular von Songtexten im gesellschaftlichen Kontext – ein diachron-empirischer Beitrag. In: Kämper, Heidrun / Plewnia, Albrecht (eds.): Sprache in Politik und Gesellschaft: Perspektiven und Zugänge. Berlin, Boston: De Gruyter. 295-304. [PDF]

Amin, Miriam / Fankhauser, Peter / Kupietz, Marc / Schneider, Roman (2021): Data-driven Identification of Idioms in Song Lyrics. In: Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021), Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL). 13-22. [PDF]

Schneider, Roman (2020): A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC). Marseille: European Language Resources Association (ELRA). 835-841. [PDF]

Schneider, Roman (2020): Songkorpus - Multiply Annotated German Song Lyrics. In: WebAnno - Use Cases. TU Darmstadt: Ubiquitous Knowledge Processing (UKP) Lab.

Schneider, Roman (2019): "Konservenglück in Tiefkühl-Town" - Das Songkorpus als empirische Ressource interdisziplinärer Erforschung deutschsprachiger Poptexte. In: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019). Erlangen: German Society for Computational Linguistics & Language Technology (GSCL). 229-236. [PDF]

Talks and Posters

11th December 2023: Songtexte als lexikografische Datenbasis, Berlin-Brandenburgische Akademie der Wissenschaften (BBAW), Berlin.

2nd December 2022: Spoken, Written, and the Continuum in Between – Empirical Identification of Heterogenous Language Data, Digital Research Data and Human Sciences (DRDHum 2022) Conference, Jyväskylä, Finland.

23rd September 2021: The Corpus of German Song Lyrics: Recent Developments and Interdisciplinary Potential, 2nd International Conference of the European Association for Digital Humanities (EADH), Krasnoyarsk, Russia.

6th August 2021: Data-driven Identification of Idioms in Song Lyrics, 17th Workshop on Multiword Expressions (MWE 2021), Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL), co-located with ACL-IJCNLP 2021, Bangkok, Thailand.

15th July 2021: From Characters to Words to Multiword Expressions - Datasets and Measures within the Corpus of German Song Lyrics. Corpus Linguistics Conference (CL2021), Limerick, Ireland.

9th March 2021: Nicht nur Liebe als Thema - Sprachwissenschaftler untersuchen deutsche Popsongs, Deutschlandfunk Kultur. [Interview MP3]

9th March 2021: Das Vokabular von Songtexten im gesellschaftlichen Kontext – ein diachron-empirischer Beitrag. 57th Annual Conference of the Leibniz Institute for the German Language (IDS), Mannheim, Germany.

26th January 2021: Das Songtextkorpus - multidisziplinäre Perspektiven einer empirischen Ressource zur deutschsprachigen Popmusik. Research Network Educational Linguistics, Justus Liebig University, Gießen, Germany.

2nd December 2020: Digital Lyrics: Multidisciplinary Research on German-language Pop Culture. 4th Digital Humanities Day Leipzig (DHDL), Leipzig, Germany. [Poster]

May 2020: A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". 12th Language Resources and Evaluation Conference (LREC), Marseille, France.

3rd March 2020: Empirical Research Between Standard and Non-Standard: The German Song Corpus. 42nd Annual Conference of the German Linguistic Society (DGfS), Hamburg, Germany.

11th October 2019: "Konservenglück in Tiefkühl-Town" - Das Songkorpus als empirische Ressource interdisziplinärer Erforschung deutschsprachiger Poptexte. 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany.


English-language reference for citing the corpus:

BibTeX Citation RIS Citation

Schneider, Roman (2020): A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC). Marseille: European Language Resources Association (ELRA). 835-841. [PDF]

German-language reference for citing the corpus:

BibTeX Citation RIS Citation

Schneider, Roman (2022): Zwischen Schriftlichkeit und Mündlichkeit: Songtexte in der deskriptiven Sprachforschung. In: Sprachreport 1/2022. 38-50. [PDF]