Overview

Current projects

ANNIS

ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for Annotation of Information Structure, has been designed to provide access to the data of the SFB 632 "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio/video annotations is also required.

http://www.sfb632.uni-potsdam.de/d1/annis/

 

DDB

The DDB (Deutsche Diachrone Baumbank) is a small (ca. 8000 tokens), deeply syntactically annotated corpus consisting of three subcorpora from different periods of German (Old High German, Middle High German, Early New High German). The design of the corpus mainly follows the TIGER corpus, one of the largest freely accessible treebanks of German. DDB was developed within the project "Interdisciplinary research network linguistics – bioinformatics for the computation of kinship and descent", supported by the Senate of Berlin.

Homepage: http://korpling.german.hu-berlin.de/ddb-doku/index.htm
Corpus: http://korpling.german.hu-berlin.de/ddd/search.html

The network, which is funded by the German Research Foundation (DFG), combines expertise from German linguistics, computational linguistics, computer science and psychology to pursue two goals: first, based on a set of concrete research questions, to compile proposals for standards and for the processing of linguistic data from German internet-based communication; and second, to develop methods and tools for the empirical, computer-assisted analysis of such data. The findings will be documented in publications, and the proposed standards and procedures will be made available online step by step.

http://www.empirikom.net

Falko

Falko is a freely available error-annotated learner corpus of German as a foreign language.

http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/research/falko/

Kompost

Using methods from computational linguistics, this project will identify indicators of the quality of German-language student texts. Special emphasis will be placed on how those quality indicators evolve across competence levels, i.e. how observable parameter values develop over time as the students’ language skills improve. The study will be based on essays, test results, students’ attitudes and personal information from the city of Hamburg’s longitudinal KESS study, as well as material from other surveys. The core of this dataset consists of approximately 9000 essays, which were rated along several dimensions.

http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/research/kompost

Laudatio

http://www2.hu-berlin.de/laudatio/wordpress/

Network Kobalt-DaF

Annotation and analysis of argumentative learner texts
Converging approaches to a written corpus of German as a foreign language

http://www.uni-konstanz.de/Kobalt/

RIDGES

The RIDGES project (Register in Diachronic German Science) is an investigation into the development of the German scientific language in the early modern and modern periods, ranging from the mid-16th to the late 19th century.

SaltNPepper

With SaltNPepper we provide two powerful frameworks for dealing with linguistically annotated data. SaltNPepper is an open source project developed at the Humboldt University of Berlin. In linguistic research a variety of formats exists, but there is no common way of dealing with them. We therefore developed a metamodel called Salt, which abstracts over linguistic data. Salt is based on a general graph structure and treats linguistic data as sets of nodes and edges, which makes it usable in very different contexts of linguistic analysis. Pepper is a pluggable framework into which new modules can be plugged (using OSGi). The architecture of Pepper is flexible and makes it possible to benefit from already existing modules.

https://korpling.german.hu-berlin.de/saltnpepper
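To illustrate the idea of treating linguistic data as sets of nodes and edges, here is a minimal sketch in Python. It is not the Salt API (Salt itself is a separate framework); the class names `Node`, `Edge` and `AnnotationGraph`, the edge type `"dominance"` and the example annotations are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node, e.g. a token or a syntactic constituent (hypothetical)."""
    node_id: str
    annotations: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed edge between two nodes (hypothetical)."""
    source: str
    target: str
    edge_type: str


@dataclass
class AnnotationGraph:
    """Linguistic data held as a set of nodes and a list of edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **annotations):
        self.nodes[node_id] = Node(node_id, dict(annotations))

    def add_edge(self, source, target, edge_type):
        # Refuse edges whose endpoints are unknown to the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, edge_type))

    def children(self, node_id, edge_type=None):
        """Return target ids of edges leaving node_id, optionally by type."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id
            and (edge_type is None or e.edge_type == edge_type)
        ]


# Two tokens with part-of-speech annotations, grouped under one phrase node.
graph = AnnotationGraph()
graph.add_node("tok1", text="Das", pos="ART")
graph.add_node("tok2", text="Haus", pos="NN")
graph.add_node("np1", cat="NP")
graph.add_edge("np1", "tok1", "dominance")
graph.add_edge("np1", "tok2", "dominance")
```

Because every annotation layer is expressed with the same two building blocks, the same traversal code (here `children`) works regardless of which source format the data came from.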

<tiger2/>

<tiger2/> is an XML format conformant to the SynAF model (ISO 24615:2010), designed to express syntactic structures for a wide variety of theoretical formalisms and corpus architectures. It is closely related to TigerXML and develops its ideas: the declared goal of the project is to extend TigerXML only as far as required to represent current advanced syntactic resources, without changes that are not strictly necessary and that might steepen the learning curve or require substantial alterations to existing tools. Like TigerXML, the format is conceived as theory-neutral: it is suited to both shallow and deep parsing in any number of theories and supports pure constituency trees, pure dependency trees, and combinations of the two.

http://korpling.german.hu-berlin.de/tiger2/

This project seeks to systematically identify linguistic structures of German that pose a specific difficulty for the acquisition of German as a foreign language (GFL). Conventionally, this is done by observing learner errors (see Borin & Prütz 2004 or Westergren-Axelsson & Hahn 2001). However, if learners avoid difficult elements, this method fails. We claim that the relative underrepresentation of structures in learner data implies that these structures are difficult to acquire. Therefore, we propose a systematic study of underrepresented structures.

http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/research/learner-difficulties/WHIG-en
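One simple way to make "relative underrepresentation" concrete is to compare how often a structure occurs per token in learner data and in a reference corpus. The sketch below is only an illustration under that assumption, not the project's actual method: the function name `representation_ratio`, the choice of a native-speaker reference corpus, and the counts in the usage lines are all invented for the example.

```python
def representation_ratio(learner_count, learner_size, reference_count, reference_size):
    """Ratio of a structure's per-token rate in learner data to its rate
    in a reference corpus. Values below 1 mean the structure is
    underrepresented in the learner data (hypothetical measure)."""
    if learner_size <= 0 or reference_size <= 0:
        raise ValueError("corpus sizes must be positive")
    if reference_count <= 0:
        raise ValueError("structure must occur in the reference corpus")
    # (learner_count / learner_size) / (reference_count / reference_size),
    # rearranged into a single division.
    return (learner_count * reference_size) / (learner_size * reference_count)


# Invented counts, for illustration only: 30 hits in 100,000 learner tokens
# versus 120 hits in 200,000 reference tokens.
ratio = representation_ratio(30, 100_000, 120, 200_000)
is_underrepresented = ratio < 1
```

A real study would additionally need a significance test and controls for text type and topic before reading a low ratio as evidence of acquisition difficulty.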

last modified 12-04-30 by krauseto