Faculty of Language, Literature and Humanities - Corpus Linguistics and Morphology

Dr. Shujun Wan

Researcher

Contact                                         

Office: Dorotheenstraße 24, Room 3.408, 10117 Berlin

Tel: +49 (0)30 2093 9618

Email: shujun.wan(at)hu-berlin.de

Postal address
Institut für deutsche Sprache und Linguistik
Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin

 

 

Research Interests

learner corpora, discourse analysis, natural language processing, language acquisition, transfer, contrastive rhetoric, stance markers, intercultural communication. 

 

Academic Work

Since 06.2023

Research Intern: German Research Center for Artificial Intelligence (DFKI)

Since 06.2020

Research Fellow: SFB 1412 (C04) Register knowledge in advanced learner language

Since 12.2020

Associate Researcher: Project Building a Chinese German Learner Corpus and Researching Learners' Writing Competence Development. Funded by the Chinese National Social Science Foundation.

Since 05.2020

Member: DFG-SFB1412-MGK (Integrated Graduate School)

10.2017 - 07.2018

Lecturer: Grammatical Description Model and Research Methods in Linguistics.

 

Ongoing Projects

OpenGPT-X
  • OpenGPT-X builds and trains large-scale AI language models to drive innovative language application services for the European economy. Through the open Gaia-X infrastructure, businesses will be able to use and share data and services free of charge, in multiple languages and according to the highest European data protection standards to develop products and processes with a wide variety of language features (e.g. chatbots, digital assistants and personalised media reports).
  • As a visiting scholar at DFKI (German Research Center for Artificial Intelligence), I investigate how rhetorical constructions differ between texts automatically generated by OpenGPT-X and human-authored texts. This is joint work with Julián Moreno Schneider.

 

SFB1412 C04: Register knowledge in advanced learner language
  • The project investigates the acquisition of register competence in a foreign language using various spoken and written learner corpora. If register knowledge is acquired mainly (implicitly) from linguistic experience, it can be assumed that even advanced learners of a foreign language may have less register knowledge. It follows that, in certain registers, they choose different alternatives than native speakers do and that they can make fewer distinctions overall (i.e. the range of alternatives is smaller). C04 investigates this using the phenomenon of modification.
  • More info: SFB 1412

 

RST-Tace
  • RST-Tace is a tool for the automatic comparison and evaluation of RST trees. It can be used regardless of the language or the size of the rhetorical trees. The tool measures the agreement between two annotators and reports the result as F-measure and inter-annotator agreement.
  • Currently, RST-Tace can be used via a command-line interface. For up-to-date information and complete documentation, please refer to the GitHub repository.

 

Connective-Lex.info for Chinese
  • Connective-Lex.info is a web-based multilingual lexical resource for connectives. Users can search it for connectives and the discourse relations they may signal. So far, a number of languages (including English, German, and French) have been included.
  • The task of our project is to expand Connective-Lex.info with Chinese connectives. This ongoing project is conducted jointly with Peter Bourgonje, Manfred Stede, Clara Wan Ching Ho, and Hongling Xiao.

 

Doctoral Thesis
  • My dissertation is a corpus-based contrastive rhetorical study, focusing on the argumentation strategies employed by Chinese learners of German and native speakers of German in their German essays. These strategies include not only linguistic formulation but also discourse-level considerations. 

  • The doctoral thesis is supervised by Prof. Dr. Anke Lüdeling and Prof. Dr. Manfred Stede.

  • Funded by the China Scholarship Council and the German Research Foundation (DFG) - SFB 1412, 416591334.

 

Publications

  •  Wan, Shujun; Bourgonje, Peter; Xiao, Hongling; Wan Ching Ho, Clara; Stede, Manfred (2023): Chinese-DiMLex - A Lexicon of Chinese Discourse Connectives. Poster presented at the 4th Workshop on Computational Approaches to Discourse, held in conjunction with ACL2023. July 9th-14th, Toronto, Canada. LINK
  • Hirschmann, Hagen; Lüdeling, Anke; Shadrova, Anna; Bobeck, Dominique; Klotz, Martin; Akbari, Roodabeh; Schneider, Sarah; Wan, Shujun (2022): FALKO. Eine Familie vielseitig annotierter Lernerkorpora des Deutschen als Fremdsprache. In: KorDaF (Korpora Deutsch als Fremdsprache) 2(2), 139–148. LINK
  • Wan, Shujun (2022): Zur Positionierung der eigenen Meinung und der Verwendung von Appellen in argumentativen Texten: Chinesische DaF-Lerner/-innen, L1-Sprecher/-innen des Deutschen und chinesische EFL-Lerner/-innen im Vergleich. In: Deutsch als Fremdsprache. 165-177.  LINK
  • Wan, Shujun (2021): Kobalt_RST: Die Annotation von rhetorischen Strukturen im Kobalt-DaF-Korpus. Zenodo. LINK
  • Lüdeling, Anke; Hirschmann, Hagen; Shadrova, Anna & Wan, Shujun (2021): Tiefe Analyse von Lernerkorpora. In: Deutsch in Europa. De Gruyter, 235-284. LINK
  • Wan, Shujun; Kutschbach, Tino; Lüdeling, Anke & Stede, Manfred (2019): RST-Tace: A tool for automatic comparison and evaluation of RST trees. In: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019), Minneapolis, MN. LINK
  • Wan, Shujun & Lüdeling, Anke (2019): Discourse structure in German argumentative essays: A comparison of L1 German and Chinese learner German. In: Proceedings of the 5th Learner Corpus Research Conference (LCR 2019), Warsaw.
  • Wan, Shujun (2016): Überlegungen zur Förderung der interkulturellen Kompetenz im chinesischen DaF-Unterricht – Das Modell der global orientierten Persönlichkeitsentwicklung und Handlungskompetenz. (Master's thesis)
  • Wan, Shujun; Li, Yuan (2014): Das Leben in fremder Kultur – Eine empirische Untersuchung zur Lebenssituation der chinesischen Migranten in Deutschland. In: Theorie und Praxis der Sprachkommunikation. Materialien zur VI. Internationalen wissenschaftlich-didaktischen Konferenz, 25.-26. Juni 2014, 345–352.
  • Wan, Shujun (2013): Lebenssituation der chinesischen Migranten in Deutschland – Eine empirische Untersuchung in Berlin. (Bachelor's thesis; Best Bachelor Thesis Award)

 

Talks

  • ACL 2023 and the 4th Workshop on Computational Approaches to Discourse. Poster: Chinese-DiMLex: A Lexicon of Chinese Discourse Connectives. July 9th-14th, 2023, Toronto, Canada.
  • Invited talk at Zhejiang University: Einleitung, Hauptteil und Schluss: wie werden sie in L1- und L2-Texten rhetorisch aufgebaut? - Eine korpusbasierte kontrastive Studie. January 5th, 2023, Hangzhou, China.
  • KO Korpuslinguistik und Phonetik: Schreibstrategien in argumentativen Texten chinesischer Deutschlerner:innen. December 14th, 2022, HU Berlin.
  • Workshop Word formation and discourse structure: Complex nouns as markers of academic register in L1- and L2-authored essays (together with Julia Lukassek, Anke Lüdeling, Anna Shadrova). May 5th-6th, 2022, Leipzig.

  • Invited talk at the University of Potsdam: Kontrastive Rhetorik: Merkmale deutscher und chinesischer argumentativen Texte "Erörterung" und "Yilunwen" im Vergleich. September 15th, 2021, Potsdam.

  • 16. Dokorandin Tag 2019. Poster: Discourse structure in German argumentative essays. October 08th, 2019, HU Berlin. (Best Poster Award).
  • LCR Conference. Poster: Discourse structure in German argumentative essays: a comparison of L1 German and Chinese learners of German. September 12th-14th, 2019, Warsaw, Poland.
  • NAACL 2019 and the workshop on Discourse Relation Parsing and Treebanking. June 03rd-07th, 2019, Minneapolis, USA. 
  • KO Korpuslinguistik und Phonetik: Eine statistische Methode zur Evaluation der Annotation und dem Vergleich der rhetorischen Strukturen. December 07th, 2018, HU Berlin.
  • International summer school "Learner Corpus Research: Theory and Applications": Discourse structure in German argumentative essays: a comparison of L1 German and Chinese learner German - Description and evaluation of the annotation process. August 27th-31st, 2018, Bremen.

  • MISC Workshop: Annotation der Diskursrelationen - Herausforderungen. May 18th, 2018, HU Berlin.
  • Textlink Final Action Conference. March 21st-23rd, 2018, Toulouse, France.
  • Fifth GF Summer School. Using GF to translate weather reports among English, German, and Chinese. August 14th-25th, 2017, Riga, Latvia. (Grammatical Framework (GF) is a grammar formalism and a programming language for multilingual computational grammars. https://www.grammaticalframework.org/)
  • Internationales Symposium zur Übersetzung, Rezeption und Erforschung der Schweizer Gegenwartsliteratur: Rezeptionsästhetik in der Übersetzungspraxis am Beispiel von „Die melodielosen Jahre“. May 20th-22nd, 2016, Hangzhou, China.
  • The National Research Competition for Undergraduate Students of China 2012: "Lebenssituationen der chinesischen Migranten und der türkischen Migranten in Deutschland – eine Vergleichforschung". (Outstanding Project Award)

 

Education

10.2016 - 03.2023*

Ph.D.

Corpus Linguistics, Humboldt University of Berlin

10.2014 - 09.2015

Master - Double Degree

German as a Foreign and Technical Language, Technical University of Berlin

09.2013 - 06.2016

Master

German Studies, Zhejiang University

07.2013 - 08.2013

Exchange

German Language and Culture, Kiel University

10.2011 - 09.2012

Exchange Semester

Germanic Linguistics, Humboldt University of Berlin

09.2009 - 06.2013

Bachelor

German Language and Literature, Zhejiang University

* 07.2020 to 03.2022: Part-time study due to parental leave.
* Disputation is expected to take place in the fall of 2023.

 

Teaching

Summer term 2018

- SE Grammatical Description Model

Thurs. 12 - 2 pm. Sophienstr. 22-22a, Institute building, Room 0.01

- UE Research Methods in Linguistics

Thurs. 2 - 4 pm. Sophienstr. 22-22a, Institute building, Room 0.01

Winter term 2017/2018

- SE Grammatical Description Model

Wed. 8 - 10 am. Dorotheenstr. 24, Room 1.401

- UE Research Methods in Linguistics

Wed. 12 - 2 pm. Dorotheenstr. 24, Room 1.401