Short workshop (Kurz-AG): Encoding language and linguistic information in historical corpora
39th Annual Conference of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS) 2017
Conference date: March 8–10, 2017
Conference venue: Saarbrücken, Germany
Conference Homepage: http://dgfs2017.uni-saarland.de/
AG 4 Short session Homepage: http://dgfs2017.uni-saarland.de/wordpress/en/sessions/ag-4/
Working group: Encoding language and linguistic information in historical corpora
Historical corpora have been established as an empirical digital base for various types of linguistic studies. The corpora are based on texts (sometimes images) and often require special information encodings, e.g. transcription and normalization. With respect to corpus linguistics as a method, annotating a (historical) corpus is always a matter of interpretation, either of its structure or of its content, and need not be universally consensual. Additionally, annotations have to strike a balance between a diplomatic representation of historical texts and their linguistic analysis. This requires a linguistic modelling of annotations to develop (i) annotation guidelines, both standardized and customized, (ii) annotation concepts, such as spans, trees or graphs, (iii) annotation assignment methods, and (iv) corpus architectures.
This working group would like to ask which annotation methods have proven successful in balancing historical diplomatic representation and linguistic analysis in historical corpus-linguistic studies. Additionally, we would like to learn from cases where common linguistic annotations are not sufficient for the structured exploration of historical corpus data, and where new approaches address these requirements.
Invited Speaker: Prof. Dr. Mathilde Hennig (Justus-Liebig-Universität Gießen)
This workshop would like to bring together linguists interested in and using historical corpora, corpus linguists, and computational linguists.
Program AG 4
Universität des Saarlandes, building B 3.1, room 0.12
Thursday, March 9th, 2017
11:15 – 12:15 | Mathilde Hennig | Slides
12:15 – 12:45 | Svetlana Petrova | Slides
12:45 – 13:45 | Lunch break
13:45 – 14:15 | Lisa Dücker, Stefan Hartmann & Renata Szczepaniak | Slides
Friday, March 10th, 2017
11:30 – 12:00 | Maarten Janssen: TEITOK: Combining language and linguistic information without compromise (Abstract) | Slides
12:00 – 12:30 | Zarah Weiß & Gohar Schnelle | Slides
12:30 – 13:00 | Cătălina Mărănduc, Cenel-Augusto Perez, Ludmila Malahov & Alexandru Colesnicov
13:00 – 13:30 | Katrin Goldschmidt | Slides
13:30 – 14:00 | Nicoletta Puddu | tba
We are looking forward to seeing you at the DGfS 2017!
Kerstin Eckart and Carolin Odebrecht
Kerstin Eckart
Pfaffenwaldring 5b, D-70569 Stuttgart
Carolin Odebrecht
Unter den Linden 6, D-10099 Berlin