Corpus "Deutsch in Namibia"
The corpus "German in Namibia" („Deutsch in Namibia“, DNam) was created between 2016 and 2021 in the DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context", WI 2155/9-1 and SI 750/4-1), directed by Heike Wiese and Horst Simon in cooperation with Marianne Zappen-Thomson. The project was based at the University of Potsdam (until 2019), HU Berlin (since 2019), FU Berlin and UNAM Windhoek.
The corpus documents language use in formal and informal situations and language attitudes within the German minority community in Namibia. The data are available as audio data with aligned and annotated transcriptions, supplemented by metadata on the speakers (biographical data, information on language competence and language use).
More details on the DNam-Corpus.
In addition to the main corpus, there is a supplementary corpus DNam-Wenker, which contains "Wenker" data on Namibian German: Renderings of the 40 classic "Wenker sentences" into Namibian German were collected via an online questionnaire, supplemented by a personal questionnaire on the biographical, social and sociolinguistic data of the speakers.
More details on DNam-Wenker.
Main corpus: DNam
Funding
DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context"), WI 2155/9-1 and SI 750/4-1.
Access to the corpus
The corpus is freely accessible online via the Datenbank für Gesprochenes Deutsch (Database for Spoken German – DGD).
Here is a short tutorial on how to use DGD, with thanks to Dr. Thomas Schmidt:
Corpus size and sub-corpora
Total size: 226 recordings, 18:39 hours, 110 speakers
Elicitation set-up | Tokens | Duration (hh:mm:ss) | Speakers | Recordings |
Free conversations | 115,004 | 9:15:00 | 65 | 21 |
Speech situations | 51,509 | 4:41:30 | 103 | 198 |
Semi-structured interviews | 57,879 | 4:42:15 | 15 | 7 |
Total | 224,392 | 18:38:45 | 110 | 226 |
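The totals in the table can be cross-checked with a few lines of Python. This is a minimal sketch that uses only the figures from the table; the helper names `to_seconds` and `to_hms` are our own. The speaker column is not summed: the per-set-up counts add up to more than 110, presumably because one speaker can take part in more than one set-up.

```python
# Figures copied from the table: (tokens, duration, recordings) per set-up.
SETUPS = {
    "Free conversations": (115_004, "9:15:00", 21),
    "Speech situations": (51_509, "4:41:30", 198),
    "Semi-structured interviews": (57_879, "4:42:15", 7),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to 'h:mm:ss'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_tokens = sum(tokens for tokens, _, _ in SETUPS.values())
total_duration = to_hms(sum(to_seconds(d) for _, d, _ in SETUPS.values()))
total_recordings = sum(recs for _, _, recs in SETUPS.values())

print(total_tokens, total_duration, total_recordings)
```

The three sums agree with the "Total" row of the table (224392 tokens, 18:38:45, 226 recordings).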
The recordings transcribed for the corpus are part of a larger collection of data. The selection criteria were:
- Balanced sample of set-ups (similar weighting of the three set-ups) and speakers (farmers - urban dwellers; pupils from private and government schools; speakers from different areas of Namibia); preference for speakers who were born in Namibia; broad spectrum in terms of educational level, occupations and age groups.
- Among the free conversations, those with long and frequent pauses, many meta-linguistic comments, few participants and/or pure discourse on the given topics were not considered.
Data collection
Period of collection: 2017
Recording locations: German-speaking schools, farms, private homes, public spaces in (the vicinity of) Windhoek, Witvlei, Omaruru, Swakopmund and Otjiwarongo
The collection of data took place in three different set-ups: free conversations, speech situations and semi-structured interviews.
Speakers
The metadata of the speakers include:
- Biographical information: gender, year of birth, occupation (for students: school), place of birth (country, town), information on where they grew up (country, region, place name).
- Sociolinguistic information: languages of mother and father, languages of parents with each other
Group | Number | Number (male) | Number (female) | Age |
Children (not pupils) | 3 | 3 | 0 | 6, 14, 17 |
Pupils | 81 | 43 | 38 | 14-18, Average: 16 (7 no age stated) |
Adults | 26 | 13 | 13 | 26-75, Average: 48 (1 no age stated) |
Total | 110 | 59 | 51 | 6-75, Average: 24 |
Annotation, transcription, anonymisation/sigla
Annotation levels
The data are available as audio files with annotation and in the form of transcripts. The transcripts have six annotation levels:
- Transcription level (trans): original transcription level (literary transcription)
- Tokenised transcription level (trans_tok): division of the transcription into individual tokens
- Normalised level (norm): transcription according to standard orthography; no modification of non-standard utterances (e.g. in terms of case or gender).
- Word types/part-of-speech tagging (pos): based on STTS 2.0, supplemented by three corpus-specific tags (ATM: audible breathing, META: double bracket in transcription of paraverbal utterances "((laughs))", SOART: contraction of son and inflectional forms)
- Lemma level (lemma): word lemma
- Annotation of contact language tokens (FW): information on donor language, extent of integration and existence of a lexicon entry in the online version of the "Duden" dictionary (2020). This annotation level is not yet available in the DGD, but will be made available in future releases. The annotation guidelines for the contact language tokens can be found here.
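For readers who process exported transcripts in scripts, the annotation levels listed above can be written down as a small lookup. This is only a sketch: the tier names and the three corpus-specific tags come from the list above, while the record layout (one `dict` per token keyed by tier name) and the helper names are assumptions for illustration, not the DGD or EXMARaLDA export format.

```python
# Tier names and descriptions as given in the list of annotation levels.
TIERS = {
    "trans": "original transcription (literary transcription)",
    "trans_tok": "tokenised transcription",
    "norm": "normalised level (standard orthography)",
    "pos": "part-of-speech tag (STTS 2.0 plus corpus-specific tags)",
    "lemma": "word lemma",
    "FW": "contact language token annotation (not yet available in the DGD)",
}

# The three tags that DNam adds to STTS 2.0.
CORPUS_SPECIFIC_POS = {
    "ATM": "audible breathing",
    "META": "paraverbal utterance in double brackets, e.g. ((laughs))",
    "SOART": "contraction of son and inflectional forms",
}

# Tiers a script can currently expect; FW is excluded until it is released.
AVAILABLE_TIERS = ("trans", "trans_tok", "norm", "pos", "lemma")


def is_corpus_specific(pos_tag: str) -> bool:
    """True if the tag is one of the three DNam additions to STTS 2.0."""
    return pos_tag in CORPUS_SPECIFIC_POS


def missing_tiers(record: dict) -> list:
    """List the expected tiers that a (hypothetical) token record lacks."""
    return [tier for tier in AVAILABLE_TIERS if tier not in record]
```

A script could use `is_corpus_specific` to separate the DNam-specific tags from regular STTS tags before running standard tooling over the `pos` tier.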
The following figure illustrates the transcription levels in an EXMARaLDA transcript:
Transcription
The orthographic transcription was done with the score editor of EXMARaLDA (Schmidt, 2016); the annotation guidelines are a slight modification of the cGAT conventions (Schmidt et al. 2015). The annotation guidelines with the deviations from the cGAT conventions can be found here. The transcription largely follows the standard orthography, but at the same time captures typical phenomena of spoken language (e.g. elisions, contractions, word breaks, pauses in conversation) as well as paraverbal and non-verbal information. The first versions of the transcriptions were each checked by another team member; deviations were discussed and resolved with the original transcriber. A final check was done by a German-speaking Namibian.
Anonymisation and sigla
- Anonymisation of personal names, specific location information (e.g. farms) as well as all statements that allow conclusions to be drawn about the identity of persons.
- Masking in the audio files
- Anonymisation through four types of sigla in the corpus, some of which contain meta-linguistic information
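- Sigla for speakers, made up of four parts: Speaker, ID-No., Gender, Age Group (example: NAM 006 W 1)
Part | Value | Meaning |
ID-No. | 001 - 2xx | one number per speaker |
Gender | M | male |
Gender | W | female |
Age Group | 1 | under 21 |
Age Group | 2 | 21 - 40 |
Age Group | 3 | 41 - 60 |
Age Group | 4 | over 60 |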
- Sigla for the researchers (e.g. RES1-RES4)
- Sigla for individual tokens that have been anonymised: initial letter of the anonymised expression + three-digit number, e.g. N001
- Sigla for anonymised expressions consisting of several tokens: Phrase „anonymisierte_Äußerung“ ("anonymised_expression") + three-digit number, e.g. anonymisierte_Äußerung001.
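The sigla conventions above lend themselves to simple pattern matching, for example when filtering anonymised tokens out of a frequency count. The following is a hedged sketch: the order of the parts follows the example "NAM 006 W 1", but the exact written form in the corpus files (spaces, underscores or no separator) is an assumption, so the pattern accepts all three. All names in the code are our own.

```python
import re
from typing import NamedTuple, Optional

# Assumption: parts appear in the order of the example "NAM 006 W 1",
# optionally separated by a space, underscore or hyphen.
SPEAKER_SIGLUM = re.compile(r"NAM[ _-]?(\d{3})[ _-]?([MW])[ _-]?([1-4])")

# Single anonymised token: initial letter + three-digit number, e.g. "N001".
TOKEN_SIGLUM = re.compile(r"[^\W\d_]\d{3}")

# Multi-token expression: fixed phrase + three-digit number.
PHRASE_SIGLUM = re.compile(r"anonymisierte_Äußerung\d{3}")

GENDER = {"M": "male", "W": "female"}
AGE_GROUP = {"1": "under 21", "2": "21 - 40", "3": "41 - 60", "4": "over 60"}


class Speaker(NamedTuple):
    id_no: str
    gender: str
    age_group: str


def parse_speaker(siglum: str) -> Optional[Speaker]:
    """Decode a speaker siglum, or return None if it does not match."""
    match = SPEAKER_SIGLUM.fullmatch(siglum.strip())
    if match is None:
        return None
    id_no, gender, age = match.groups()
    return Speaker(id_no, GENDER[gender], AGE_GROUP[age])


def is_anonymised(token: str) -> bool:
    """Heuristic check for the two kinds of anonymisation sigla."""
    return bool(TOKEN_SIGLUM.fullmatch(token) or PHRASE_SIGLUM.fullmatch(token))


print(parse_speaker("NAM 006 W 1"))
```

The pattern accepts any three-digit ID and does not enforce the 001 - 2xx range. `is_anonymised` is a heuristic and would also flag any genuine token that happens to consist of one letter followed by three digits.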
Project participants
PIs: Heike Wiese, Horst J. Simon
Cooperation partners: Marianne Zappen-Thomson, Thomas Schmidt, Hans Boas
Project collaborators: Christian Zimmer, Janosch Leugner, Yannic Bracke, Britta Stuhl, Laura Perlitz
Student assistants: Jones Anam, Christian Anders, Alexandra Fosså, Semra Kizilkaya, Carina Schüffler, Claudia Czarniak, Philipp Klaußner, Jula Kostka, Anika Kroll-Tjingaete, Johanna Pott, Britta Stuhl
Citation
Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta; Perlitz, Laura, & Schmidt, Thomas: DNam-Korpus zum Deutschen in Namibia.
Literature
Wiese, Heike; Simon, Horst J.; Zimmer, Christian & Schumann, Kathleen (2017). German in Namibia: A vital speech community and its multilingual dynamics. In Péter Maitz & Craig A. Volker (eds.), Language Contact in the German Colonies. pp. 221-245.
Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta & Schmidt, Thomas (2020). Das Korpus Deutsch in Namibia (DNam): Eine Ressource für die Kontakt-, Variations- und Soziolinguistik. Deutsche Sprache 3: 210-232.
Supplementary corpus to DNam: DNam-Wenker
In 2013/14, "Wenker" data on Namibian German were collected via an online platform.
The survey was aimed at Namibian speakers of all ages and served to obtain broad data on specific areas of lexicon and grammar. Because the Wenker sentences are a classic tool of Germanic dialect research, these data are broadly comparable with other, including older, studies on dialectal forms in German. In order to reach as many speakers as possible, we developed an online questionnaire with the 40 original "Wenker sentences", supplemented by an introductory text on Namibian German, the research project and the "Wenker sentences", as well as a personal questionnaire on biographical, social and sociolinguistic data at the end.
You will find the exact wording of the task in the online survey here and the 40 Wenker sentences queried here.
Through extensive media work and dissemination in the German-speaking community via radio, newspapers, church congregations and schools, more than 200 participants were recruited; this corresponds to approximately one percent of the speaker community. For their committed support in disseminating information on the "Wenker" survey, we would like to thank the Delta School and the German Higher Private School (Deutsche Höhere Privatschule, DHPS) Windhoek, Wilfried Hähner from "Hitradio Namibia", the "Allgemeine Zeitung Windhoek" and Bishop Hertel, the then Bishop of the Evangelical Lutheran Church in Namibia.
The results of the Wenker-Namdeutsch survey are freely available as an Excel spreadsheet under the CC BY 3.0 licence.
Wenker-Namdeutsch by Heike Wiese is licensed under a Creative Commons Attribution 3.0 Germany License.
Documentation
Not all survey participants completed the questionnaire in full. In the corresponding data records, the empty fields are marked with "NA". The information provided by the respondents was transferred over without modification, except for three areas that were changed for data protection reasons:
- Information on occupation was removed
- The year of birth was given as a period of ten years.
- For the question regarding place of residence during the period of the survey, the country "Namibia" was given for all Namibian towns. All records with a place of residence outside Namibia were removed from the table. Records with no indication of place of residence were retained.
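The three data-protection changes described above can be sketched as a single function over one questionnaire record. This is an illustration under stated assumptions, not the procedure actually used: the field names (`occupation`, `birth_year`, `residence`), the format of the ten-year period ("1980-1989") and the small town lookup are all hypothetical. Only the "NA" marker and the three rules themselves come from the documentation.

```python
from typing import Optional

NA = "NA"  # marker used in the table for empty fields

# Illustrative lookup only: Namibian towns named elsewhere in this
# documentation. A real check would need a complete gazetteer.
NAMIBIAN_TOWNS = {"Windhoek", "Witvlei", "Omaruru", "Swakopmund", "Otjiwarongo"}


def decade(year: str) -> str:
    """Replace a year of birth with a ten-year period (format assumed)."""
    if year == NA:
        return NA
    start = int(year) // 10 * 10
    return f"{start}-{start + 9}"


def anonymise(record: dict) -> Optional[dict]:
    """Return the anonymised record, or None if the record is dropped."""
    residence = record.get("residence", NA)
    if residence != NA and residence not in NAMIBIAN_TOWNS:
        return None  # place of residence outside Namibia: record removed
    # Rule 1: information on occupation is removed.
    cleaned = {key: value for key, value in record.items() if key != "occupation"}
    # Rule 2: year of birth becomes a period of ten years.
    cleaned["birth_year"] = decade(record.get("birth_year", NA))
    # Rule 3: Namibian towns are replaced by the country; missing stays "NA".
    cleaned["residence"] = NA if residence == NA else "Namibia"
    return cleaned


raw = {"birth_year": "1987", "occupation": "farmer", "residence": "Windhoek"}
print(anonymise(raw))
```

Records without a place of residence pass through with "NA" in that field, matching the rule that such records were retained.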
Collaboration / Support
Heike Wiese, Hans C. Boas, Horst J. Simon, Marianne Zappen-Thomson, Laura Perlitz, Oliver Bunk
Citation
Wiese, Heike (2014): DNam-Wenker. Ein Korpus mit 'Wenker'-Sätzen zum Namibiadeutschen.