Corpus "Deutsch in Namibia"

Street in Namibia with multilingual signs: Luisen Apotheke/Pharmacy/Apteek, Otto Mühr & Co.

Photo: Heike Wiese

Multilingual pharmacy sign in Windhoek.

The corpus "German in Namibia" („Deutsch in Namibia“, DNam) was created between 2016 and 2021 in the DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context", WI 2155/9-1 and SI 750/4-1). The project was directed by Heike Wiese and Horst Simon in cooperation with Marianne Zappen-Thomson, and was based at the University of Potsdam (until 2019), HU Berlin (since 2019), FU Berlin and UNAM Windhoek.

The corpus documents language use in formal and informal situations and language attitudes within the German minority community in Namibia. The data are available as audio data with aligned and annotated transcriptions, supplemented by metadata on the speakers (biographical data, information on language competence and language use).

More details on the DNam-Corpus.

In addition to the main corpus, there is a supplementary corpus DNam-Wenker, which contains "Wenker" data on Namibian German: Renderings of the 40 classic "Wenker sentences" into Namibian German were collected via an online questionnaire, supplemented by a personal questionnaire on the biographical, social and sociolinguistic data of the speakers.

More details on DNam-Wenker.

Main corpus: DNam

Funding

DFG project „NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias“ ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context"), WI 2155/9-1 and SI 750/4-1.

Access to the corpus

The corpus is freely accessible online via the Datenbank für Gesprochenes Deutsch (Database for Spoken German – DGD).

A short tutorial on how to use the DGD is available, with thanks to Dr. Thomas Schmidt.

Corpus size and sub-corpora

Total size: 226 recordings, 18:39 hours, 110 speakers

Elicitation set-up	Tokens	Duration (hh:mm:ss)	Speakers	Recordings
Free conversations	115,004	9:15:00	65	21
Speech situations	51,509	4:41:30	103	198
Semi-structured interviews	57,879	4:42:15	15	7
Total	224,392	18:38:45	110	226
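As a quick consistency check, the token, duration and recording totals can be recomputed from the three sub-corpus rows. The sketch below only uses figures copied from the table; the variable and function names are illustrative. The speaker column is deliberately not summed: the per-set-up counts add up to 183, more than the 110 speakers in the corpus, presumably because some speakers took part in more than one set-up.

```python
# Figures copied from the table above (tokens written without thousands separators).
SUBCORPORA = {
    "Free conversations": {"tokens": 115004, "duration": "9:15:00", "recordings": 21},
    "Speech situations": {"tokens": 51509, "duration": "4:41:30", "recordings": 198},
    "Semi-structured interviews": {"tokens": 57879, "duration": "4:42:15", "recordings": 7},
}


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert seconds back to an h:mm:ss string."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_tokens = sum(row["tokens"] for row in SUBCORPORA.values())
total_recordings = sum(row["recordings"] for row in SUBCORPORA.values())
total_duration = to_hms(sum(to_seconds(row["duration"]) for row in SUBCORPORA.values()))

print(total_tokens, total_recordings, total_duration)
```

The recomputed values match the "Total" row of the table.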

The recordings transcribed for the corpus are part of a larger collection of data. The selection criteria were:

Balanced sample of set-ups (similar weighting of the three set-ups) and speakers (farmers - urban dwellers; pupils from private and government schools; speakers from different areas of Namibia); preference for speakers who were born in Namibia; broad spectrum in terms of educational level, occupations and age groups.
Among the free conversations, those with long and frequent pauses, many meta-linguistic comments, few participants and/or discussion limited to the given topics were not considered.

Data collection

Period of collection: 2017

Recording locations: German-speaking schools, farms, private homes, public spaces in (the vicinity of) Windhoek, Witvlei, Omaruru, Swakopmund and Otjiwarongo

The data were collected in three different set-ups: free conversations, speech situations and semi-structured interviews.

Stimulus for the LangSit survey method: series of pictures of a car accident

Speakers

Photo: Yannic Bracke

Recording situation with researchers (Christian Zimmer, Heike Wiese).

The metadata of the speakers include:

Biographical information: gender, year of birth, occupation (for pupils: school), place of birth (country, town), and information on where they grew up (country, region, place name).
Sociolinguistic information: languages of the mother and the father, and the language(s) the parents use with each other.

Group	Number	Number (male)	Number (female)	Age
Children (not pupils)	3	3	0	6, 14, 17
Pupils	81	43	38	14-18, Average: 16 (7 no age stated)
Adults	26	13	13	26-75, Average: 48 (1 no age stated)
Total	110	59	51	6-75, Average: 24

Annotation, transcription, anonymisation/sigla

Annotation levels

The data are available as audio files with annotation and in the form of transcripts. The transcripts have six annotation levels:

Transcription level (trans): original transcription level (literary transcription)
Tokenised transcription level (trans_tok): division of the transcription into individual tokens
Normalised level (norm): transcription according to standard orthography; non-standard utterances are not modified (e.g. in terms of case or grammatical gender).
Word types/part-of-speech tagging (pos): based on STTS 2.0, supplemented by three corpus-specific tags (ATM: audible breathing, META: double bracket in transcription of paraverbal utterances "((laughs))", SOART: contraction of son and inflectional forms)
Lemma level (lemma): word lemma
Annotation of contact language tokens (FW): information on donor language, extent of integration and existence of a lexicon entry in the online version of the "Duden" dictionary (2020). This annotation level is not yet available in the DGD, but will be made available in future releases. The annotation guidelines for the contact language tokens can be found here.

The following figure illustrates the transcription levels in an EXMARaLDA transcript:

Screenshot of a transcript with all transcription levels.

Transcription

Photo: Heike Wiese

One of the recording locations: A farm

The orthographic transcription was done with the score editor of EXMARaLDA (Schmidt 2016); the annotation guidelines are a slight modification of the cGAT conventions (Schmidt et al. 2015). The annotation guidelines with the deviations from the cGAT conventions can be found here. The transcription largely follows standard orthography, but also captures typical phenomena of spoken language (e.g. elisions, contractions, word breaks, pauses in conversation) as well as paraverbal and non-verbal information. The first version of each transcription was checked by another team member; deviations were discussed and resolved with the original transcriber. A final check was done by a German-speaking Namibian.

Anonymisation and sigla

Anonymisation of personal names, specific location information (e.g. farms) as well as all statements that allow conclusions to be drawn about the identity of persons.
Masking in the audio files
Anonymisation through four types of sigla in the corpus, some of which contain meta-linguistic information
- Sigla for speakers, made up of four parts (speaker, ID-No., gender, age group), e.g. NAM 006 W 1:
  Speaker: NAM
  ID-No.: 001 - 2xx (one number per speaker)
  Gender: M (male), W (female)
  Age group: 1 (under 21), 2 (21 - 40), 3 (41 - 60), 4 (over 60)
- Sigla for the researchers (e.g. RES1-RES4)
- Sigla for individual tokens that have been anonymised: initial letter of the anonymised expression + three-digit number, e.g. N001
- Sigla for anonymised expressions consisting of several tokens: Phrase „anonymisierte_Äußerung“ ("anonymised_expression") + three-digit number, e.g. anonymisierte_Äußerung001.
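To make the sigla scheme concrete, here is a minimal sketch of how the four types of sigla could be recognised and how a speaker siglum could be decoded. It is not part of the corpus tooling. It assumes that the parts of a speaker siglum are separated by optional whitespace, as in the example "NAM 006 W 1", and that researcher sigla consist of "RES" plus a number; the exact form used in the DGD files may differ, and all function and variable names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

GENDERS = {"M": "male", "W": "female"}
AGE_GROUPS = {1: "under 21", 2: "21 - 40", 3: "41 - 60", 4: "over 60"}

# Assumption: whitespace between the parts is optional ("NAM 006 W 1").
SPEAKER = re.compile(r"NAM\s*(\d{3})\s*([MW])\s*([1-4])")

# Assumed patterns for the four types of sigla described above.
PATTERNS = {
    "speaker": SPEAKER,
    "researcher": re.compile(r"RES\d+"),
    "anonymised expression": re.compile(r"anonymisierte_Äußerung\d{3}"),
    "anonymised token": re.compile(r"[^\W\d_]\d{3}"),  # one letter + three digits
}


@dataclass(frozen=True)
class Speaker:
    id_number: str
    gender: str
    age_group: str


def classify_siglum(text: str) -> Optional[str]:
    """Return the type of siglum, or None if the text is not a siglum."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(text.strip()):
            return kind
    return None


def parse_speaker_siglum(siglum: str) -> Speaker:
    """Decode a speaker siglum into ID number, gender and age group."""
    match = SPEAKER.fullmatch(siglum.strip())
    if match is None:
        raise ValueError(f"not a speaker siglum: {siglum!r}")
    id_number, gender, age_group = match.groups()
    return Speaker(id_number, GENDERS[gender], AGE_GROUPS[int(age_group)])


print(parse_speaker_siglum("NAM 006 W 1"))
print(classify_siglum("N001"), "|", classify_siglum("anonymisierte_Äußerung001"))
```

Decoding the age group from the siglum makes it possible to group utterances by age band without consulting the speaker metadata.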

Project participants

Photo: Heike Wiese

Multilingual advertisement for a Christmas market in Namibia.

PIs: Heike Wiese, Horst J. Simon

Cooperation partners: Marianne Zappen-Thomson, Thomas Schmidt, Hans Boas

Project collaborators: Christian Zimmer, Janosch Leugner, Yannic Bracke, Britta Stuhl, Laura Perlitz

Student assistants: Jones Anam, Christian Anders, Alexandra Fosså, Semra Kizilkaya, Carina Schüffler, Claudia Czarniak, Philipp Klaußner, Jula Kostka, Anika Kroll-Tjingaete, Johanna Pott, Britta Stuhl

Citation

Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta; Perlitz, Laura, & Schmidt, Thomas: DNam-Korpus zum Deutschen in Namibia.

Literature

Wiese, Heike; Simon, Horst J.; Zimmer, Christian & Schumann, Kathleen (2017). German in Namibia: A vital speech community and its multilingual dynamics. In Péter Maitz & Craig A. Volker (eds.), Language Contact in the German Colonies. pp. 221-245.

Zimmer, Christian; Wiese, Heike; Simon, Horst J.; Zappen-Thomson, Marianne; Leugner, Janosch; Bracke, Yannic; Stuhl, Britta & Schmidt, Thomas (2020). Das Korpus Deutsch in Namibia (DNam): Eine Ressource für die Kontakt-, Variations- und Soziolinguistik. Deutsche Sprache 3: 210-232.

Supplementary corpus to DNam: DNam-Wenker

In 2013/14, "Wenker" data on Namibian German were collected via an online platform.

The survey was aimed at Namibian speakers of all ages. Its goal was to obtain broad data on specific areas of lexicon and grammar; because the "Wenker sentences" are a classic tool of Germanic dialect research, the data are broadly comparable with other, including much older, studies on dialectal forms in German. In order to reach as many speakers as possible, we developed an online questionnaire with the 40 original "Wenker sentences". It opened with an introductory text on Namibian German, the research project and the "Wenker sentences", and closed with a personal questionnaire on biographical, social and sociolinguistic data.

You will find the exact wording of the task in the online survey here and the 40 Wenker sentences queried here.

Through extensive media work and dissemination in the German-speaking community via radio, newspapers, church congregations and schools, more than 200 participants were recruited; this corresponds to approximately one percent of the speaker community. For their committed support in disseminating information on the "Wenker" survey, we would like to thank the Delta School and the German Higher Private School (Deutsche Höhere Privatschule, DHPS) Windhoek, Wilfried Hähner from "Hitradio Namibia" and the "Allgemeine Zeitung Windhoek" and the then Bishop of the Evangelical Lutheran Church in Namibia, Bishop Hertel.

The results of the Wenker-Namdeutsch survey are freely available as an Excel spreadsheet under the CC BY 3.0 licence.

Wenker-Namdeutsch by Heike Wiese is licensed under a Creative Commons Attribution 3.0 Germany License.

Documentation

Not all survey participants completed the questionnaire in full. In the corresponding data records, the empty fields are marked with "NA". The information provided by the respondents was transferred over without modification, except for three areas that were changed for data protection reasons:

Information on occupation was removed.
The year of birth was replaced by a period of ten years.
For the question regarding place of residence during the period of the survey, the country "Namibia" was given for all Namibian towns. All records with a place of residence outside Namibia were removed from the table. Records with no indication of place of residence were retained.
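The three data protection changes can be illustrated with a small sketch. This is not the procedure actually used by the project, only an illustration under stated assumptions: the field names ("occupation", "year_of_birth", "residence") are hypothetical, the "YYYY-YYYY" format for the ten-year period is an assumption, the list of Namibian place names is a placeholder built from the towns mentioned on this page, and the sample record is invented.

```python
from typing import Optional

NA = "NA"  # marker used in the published table for empty fields

# Placeholder list for illustration only; a real list would need every Namibian town.
NAMIBIAN_PLACES = {"Namibia", "Windhoek", "Witvlei", "Omaruru", "Swakopmund", "Otjiwarongo"}


def ten_year_period(year_of_birth: str) -> str:
    """Replace a birth year by a ten-year period (output format is an assumption)."""
    if year_of_birth == NA:
        return NA
    start = int(year_of_birth) // 10 * 10
    return f"{start}-{start + 9}"


def anonymise(record: dict) -> Optional[dict]:
    """Apply the three documented changes; return None if the record is dropped."""
    residence = record.get("residence", NA)
    if residence == NA:
        new_residence = NA  # records without a place of residence are retained
    elif residence in NAMIBIAN_PLACES:
        new_residence = "Namibia"  # towns are replaced by the country
    else:
        return None  # place of residence outside Namibia: record removed

    # Drop the occupation field, keep everything else unchanged.
    cleaned = {key: value for key, value in record.items() if key != "occupation"}
    cleaned["year_of_birth"] = ten_year_period(record.get("year_of_birth", NA))
    cleaned["residence"] = new_residence
    return cleaned


# Invented example record, not taken from the survey data.
print(anonymise({"occupation": "teacher", "year_of_birth": "1984", "residence": "Swakopmund"}))
```

When reading the published spreadsheet, it helps to treat "NA" as a missing value and not as a literal answer.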

Collaboration / Support

Heike Wiese, Hans C. Boas, Horst J. Simon, Marianne Zappen-Thomson, Laura Perlitz, Oliver Bunk

Citation

Wiese, Heike (2014): DNam-Wenker. Ein Korpus mit 'Wenker'-Sätzen zum Namibiadeutschen.

Faculty of Language, Literature and Humanities - German in Multilingual Contexts
