About this corpus
Just as pop music has evolved from a youth cultural phenomenon into an integral part of modern culture, its lyrical content has become increasingly present in everyday language. Reflecting the high communicative impact of pop lyrics, the Corpus of Song Lyrics ("Songkorpus") offers a sustainably usable, multilayer-annotated collection of more than 15,000 texts that exhibit characteristics of both written and spoken discourse. It is designed for linguistic research and is also relevant to related disciplines such as cultural and literary studies, the social sciences, media studies and musicology, all of which share a scholarly interest in the language of contemporary German song lyrics.
Corpus Archives
So far, the corpus comprises the following archives:
We warmly thank the above artists, who kindly allow us to provide their lyrics for non-commercial scientific research! All these archives contain TEI P5 XML-annotated song lyrics with lemmatization and part-of-speech annotations (see the format examples below). Named entities, neologisms, and constituent structures are annotated throughout, and rhyme types are annotated in some cases.
Besides, the corpus features some thematic archives:
For detailed information please refer to the general statistics and visualizations. The corpus is available in derived formats for download.
Publications
Schneider, Roman (in preparation): Linguistic resources for the study of pop culture. In: Werner, Valentin / Cutler, Cecelia / Moody, Andrew (Eds.): Handbook of Language and Pop Culture. Berlin, Boston: De Gruyter Mouton.
Schneider, Roman / Lang, Christian / Hansen, Sandra (in preparation): Modalpartikeln in Songtexten – ein empirisch-algorithmischer Ansatz. In: Westphal, Michael (Ed.): Language and Pop Music/Sprache und Popmusik. Berlin: Lang.
Schneider, Roman / Faaß, Gertrud (2023): Challenges in Computational Linguistics, Empiric Research & Multidisciplinary Potential of German Song Lyrics. Special Issue of the Journal for Language Technology and Computational Linguistics (JLCL). Vol. 36(1). [PDF]
Schneider, Roman (2022): Zwischen Schriftlichkeit und Mündlichkeit: Songtexte in der deskriptiven Sprachforschung. In: Sprachreport 1/2022. 38-50. [PDF]
Schneider, Roman (2022): Das Songkorpus – Perspektiven einer korpuslinguistischen Nutzung deutschsprachiger Popmusik für die Fremd- und Zweitsprachenvermittlung. In: Korpora Deutsch als Fremdsprache 2(2). 149–153. [PDF]
Schneider, Roman / Hansen, Sandra / Lang, Christian (2022): Das Vokabular von Songtexten im gesellschaftlichen Kontext – ein diachron-empirischer Beitrag. In: Kämper, Heidrun / Plewnia, Albrecht (Eds.): Sprache in Politik und Gesellschaft: Perspektiven und Zugänge. Berlin, Boston: De Gruyter. 295-304. [PDF]
Amin, Miriam / Fankhauser, Peter / Kupietz, Marc / Schneider, Roman (2021): Data-driven Identification of Idioms in Song Lyrics. In: Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021), Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL). 13-22. [PDF]
Schneider, Roman (2020): A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC). Marseille: European Language Resources Association (ELRA). 835-841. [PDF]
Schneider, Roman (2020): Songkorpus - Multiply Annotated German Song Lyrics. In: WebAnno - Use Cases. TU Darmstadt: Ubiquitous Knowledge Processing (UKP) Lab.
Schneider, Roman (2019): "Konservenglück in Tiefkühl-Town" - Das Songkorpus als empirische Ressource interdisziplinärer Erforschung deutschsprachiger Poptexte. In: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019). Erlangen: German Society for Computational Linguistics & Language Technology (GSCL). 229-236. [PDF]
Talks and Posters
17th July 2025: Digital Humanities Meets Language Technology: Empirical Insights from a Broadly Stratified Media Resource, Digital Humanities Conference 2025 (DH2025), Lisbon, Portugal.
9th November 2024: "Wir spitten Feuer gegen die Kälte". Warum Deutschrap für die Sprachforschung besonders tight ist, welche methodischen Ansätze Props einheimsen — und womit auch derbe Algorithmen strugglen, Keynote Hiphop-Symposium, Popakademie Baden-Württemberg, Mannheim.
11th December 2023: Songtexte als lexikografische Datenbasis, Berlin-Brandenburgische Akademie der Wissenschaften (BBAW), Berlin.
6th March 2023: „Alter, sprich mich bloß nicht an!“ Modalpartikeln in Popsongs – Kontextabhängige Detektion mit selbstlernenden Algorithmen, 60th Annual Conference of the Leibniz Institute for the German Language (IDS), Mannheim, Germany.
2nd December 2022: Spoken, Written, and the Continuum in Between – Empirical Identification of Heterogenous Language Data, Digital Research Data and Human Sciences (DRDHum 2022) Conference, Jyväskylä, Finland.
23rd September 2021: The Corpus of German Song Lyrics: Recent Developments and Interdisciplinary Potential, 2nd International Conference of the European Association for Digital Humanities (EADH), Krasnoyarsk, Russia.
6th August 2021: Data-driven Identification of Idioms in Song Lyrics, 17th Workshop on Multiword Expressions (MWE 2021), Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL), co-located with ACL-IJCNLP 2021, Bangkok, Thailand.
15th July 2021: From Characters to Words to Multiword Expressions - Datasets and Measures within the Corpus of German Song Lyrics. Corpus Linguistics Conference (CL2021), Limerick, Ireland.
9th March 2021: Das Vokabular von Songtexten im gesellschaftlichen Kontext – ein diachron-empirischer Beitrag. 57th Annual Conference of the Leibniz Institute for the German Language (IDS), Mannheim, Germany.
26th January 2021: Das Songtextkorpus - multidisziplinäre Perspektiven einer empirischen Ressource zur deutschsprachigen Popmusik. Research Network Educational Linguistics, Justus Liebig University, Gießen, Germany.
2nd December 2020: Digital Lyrics: Multidisciplinary Research on German-language Pop Culture. 4th Digital Humanities Day Leipzig (DHDL), Leipzig, Germany. [Poster]
May 2020: A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". 12th Language Resources and Evaluation Conference (LREC), Marseille, France.
3rd March 2020: Empirical Research Between Standard and Non-Standard: The German Song Corpus. 42nd Annual Conference of the German Linguistic Society (DGfS), Hamburg, Germany.
11th October 2019: "Konservenglück in Tiefkühl-Town" - Das Songkorpus als empirische Ressource interdisziplinärer Erforschung deutschsprachiger Poptexte. 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany.
Media and Outreach
23rd September 2025: Songtexte sprachlich erforscht: "Wenn Udo dabei ist, kann das nicht so ganz verkehrt sein". Die Welt.
4th September 2025: Wie Lindenberg das Mannheimer Songkorpus ins Rollen brachte. Mannheimer Morgen.
3rd April 2025: Hiphop als linguistische Fundgrube - Wie man Songtexte sprachwissenschaftlich untersucht. Jungen-Zukunftstag Boys'Day.
19th January 2025: Warum Hip-Hop für die Sprachwissenschaft so wertvoll ist. Deutschlandfunk Nova, Hörsaal.
27th April 2023: Deutschrap vs. Schlager – Wie positiv sind Liedtexte? Mädchen-Zukunftstag Girls'Day.
9th March 2021: Nicht nur Liebe als Thema - Sprachwissenschaftler untersuchen deutsche Popsongs. Deutschlandfunk Kultur.
Citation
English-language reference for citing the corpus:
Schneider, Roman (2020): A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC). Marseille: European Language Resources Association (ELRA). 835-841. [PDF]
German-language reference for citing the corpus:
Schneider, Roman (2022): Zwischen Schriftlichkeit und Mündlichkeit: Songtexte in der deskriptiven Sprachforschung. In: Sprachreport 1/2022. 38-50. [PDF]
Corpus Format
Text Encoding Initiative (TEI) P5 XML Format: Song Lyrics Header
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI SYSTEM "https://songkorpus.de/songkorpus.dtd">
<TEI xmlns="http://www.tei-c.org/ns/1.0" version="3.4.0">
<teiHeader>
<fileDesc>
<titleStmt xml:id="SK/STO.2023.5">
<title>Kommt mal alle wieder runter</title>
<author role="primary">Stoppok</author>
<author role="text">Stefan Stoppok</author>
</titleStmt>
<publicationStmt>
<publisher>Grundvermögen Edition</publisher>
<date>2023</date>
<ref><name>Teufelsküche</name></ref>
</publicationStmt>
<sourceDesc>
<ab>
<link target="https://stoppok.de/alben/27/"/>
</ab>
</sourceDesc>
</fileDesc>
</teiHeader>
Additional metadata may be annotated, such as a song’s chart rank or musical genre.
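As a minimal sketch of how the header fields shown above could be read programmatically, the following Python snippet uses only the standard library. The function name `read_header`, the dictionary keys, and the embedded `SAMPLE` string (an abridged copy of the header with closing tags added so that it parses on its own) are illustrative assumptions and not part of the corpus distribution. The meaning of the `<ref><name>` element is not stated here, so it is returned under the neutral key `ref_name`.

```python
import xml.etree.ElementTree as ET

# TEI namespace as declared on the <TEI> root element of the example.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes the xml:id attribute under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged copy of the header example; closing tags added for well-formedness.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" version="3.4.0">
<teiHeader>
<fileDesc>
<titleStmt xml:id="SK/STO.2023.5">
<title>Kommt mal alle wieder runter</title>
<author role="primary">Stoppok</author>
<author role="text">Stefan Stoppok</author>
</titleStmt>
<publicationStmt>
<publisher>Grundvermögen Edition</publisher>
<date>2023</date>
<ref><name>Teufelsküche</name></ref>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>"""


def read_header(xml_text: str) -> dict:
    """Collect basic metadata from a Songkorpus-style TEI header (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    title_stmt = root.find(".//tei:titleStmt", TEI_NS)
    pub_stmt = root.find(".//tei:publicationStmt", TEI_NS)
    return {
        "id": title_stmt.get(XML_ID),
        "title": title_stmt.findtext("tei:title", namespaces=TEI_NS),
        # Map each author's role attribute ("primary", "text") to the name.
        "authors": {
            author.get("role"): author.text
            for author in title_stmt.findall("tei:author", TEI_NS)
        },
        "publisher": pub_stmt.findtext("tei:publisher", namespaces=TEI_NS),
        "year": pub_stmt.findtext("tei:date", namespaces=TEI_NS),
        "ref_name": pub_stmt.findtext("tei:ref/tei:name", namespaces=TEI_NS),
    }


print(read_header(SAMPLE))
```

All element lookups are namespace-qualified because the example declares the TEI namespace as the default namespace; unqualified paths such as `.//titleStmt` would not match.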
Text Encoding Initiative (TEI) P5 XML Format: Song Lyrics Body
<text>
<body>
<div1 type="song">
<lg type="verse">
<l>Kommt mal alle wieder runter,</l>
<l>da oben ist sowieso Ende.</l>
<l>Es wird jetzt auch nicht mehr bunter hier,</l>
<l>eher im Gegenteil,</l>
<l>jetzt kommt erst die eigentliche Wende.</l>
</lg>
<lg type="verse">
<l>Die Frage ist: wer schlägt wem den Schädel ein?</l>
<l>Wer schleppt noch mehr Kohle heim?</l>
<l>Wer schreit am lautesten, wer hat den längsten,</l>
<l>wer hat am wenigsten Skrupel und Bedenken?</l>
</lg>
</div1>
</body>
</text>
</TEI>
Additional metadata, such as spelling changes, may be annotated. Linguistic metadata — e.g., part of speech (extended STTS), lemma, named entities — is annotated separately using the WebLicht TCF format and the UIMA CAS XMI format.
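The body structure shown above (`<lg>` line groups containing `<l>` lines) could be traversed as sketched below, again with the Python standard library only. The function name `read_stanzas` and the shortened `SAMPLE_BODY` string are assumptions made for illustration; the sketch covers only the TEI body and does not touch the separate TCF or XMI annotation layers.

```python
import xml.etree.ElementTree as ET

# TEI namespace as declared on the <TEI> root element of the example.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened copy of the body example, wrapped in a <TEI> root so it parses alone.
SAMPLE_BODY = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<text>
<body>
<div1 type="song">
<lg type="verse">
<l>Kommt mal alle wieder runter,</l>
<l>da oben ist sowieso Ende.</l>
</lg>
<lg type="verse">
<l>Wer schleppt noch mehr Kohle heim?</l>
</lg>
</div1>
</body>
</text>
</TEI>"""


def read_stanzas(xml_text: str) -> list:
    """Return (type, lines) pairs, one per <lg> line group (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    stanzas = []
    for line_group in root.iterfind(".//tei:lg", TEI_NS):
        # itertext() also picks up text nested inside any inline child elements.
        lines = [
            "".join(line.itertext()).strip()
            for line in line_group.findall("tei:l", TEI_NS)
        ]
        stanzas.append((line_group.get("type"), lines))
    return stanzas


for stanza_type, lines in read_stanzas(SAMPLE_BODY):
    print(stanza_type, len(lines), "lines")
```

Keeping the `type` attribute alongside the lines makes it possible to distinguish line groups later if other values than `verse` occur in the data.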
Chronicle of Releases
The Songkorpus is extended continually, and existing material is revised on an ongoing basis as part of quality management.
Release | Date | Innovations |
---|---|---|
1.0 | 20/06/2019 | Initial online presence featuring Udo Lindenberg lyrics. |
2.0 | 12/02/2020 | New archives: Konstantin Wecker, Stoppok, Chart Hits, GDR, HipHop. |
3.0 | 03/02/2021 | Addition of last year’s songs. New archives: Ulla Meinecke, Hannes Wader, Element of Crime, Fettes Brot. |
4.0 | 15/02/2022 | Addition of last year’s songs. New archive: Neue Deutsche Welle (NDW). |
5.0 | 21/02/2023 | Addition of last year’s songs. New archives: Dota, Ohrenfeindt. |
6.0 | 31/01/2024 | Addition of last year’s songs. |
7.0 | 22/02/2025 | Addition of last year’s songs. Redesign of the HipHop archive based on the official German hip-hop charts. |
At this point, our heartfelt thanks go once again to all the artists involved. We sincerely appreciate your musical and lyrical creativity!