Dr. Shujun Wan
Contact
Office: Dorotheenstraße 24, Room 3.408, 10117 Berlin
Tel: +49 (0)30 2093 9618
Email: shujun.wan(at)hu-berlin.de
Postal address
Institut für deutsche Sprache und Linguistik
Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Research Interests
learner corpora, discourse analysis, natural language processing, language acquisition, transfer, contrastive rhetoric, stance markers, intercultural communication.
Academic Work
Since 06.2023 | Research Intern: German Research Center for Artificial Intelligence (DFKI) |
Since 06.2020 |
Research Fellow: SFB 1412 (C04) Register knowledge in advanced learner language |
Since 12.2020 |
Associate Researcher: Project Building a Chinese German Learner Corpus and Researching Learners' Writing Competence Development. Funded by the Chinese National Social Science Foundation. |
Since 05.2020 |
Member: DFG-SFB1412-MGK (Integrated Graduate School) |
10.2017 - 07.2018 |
Lecturer: Grammatical Description Model and Research Methods in Linguistics. |
Ongoing Projects
OpenGPT-X
- OpenGPT-X builds and trains large-scale AI language models to drive innovative language application services for the European economy. Through the open Gaia-X infrastructure, businesses will be able to use and share data and services free of charge, in multiple languages and according to the highest European data protection standards to develop products and processes with a wide variety of language features (e.g. chatbots, digital assistants and personalised media reports).
- My work in this project as a visiting scholar at DFKI (German Research Center for Artificial Intelligence) is to investigate the differences in rhetorical constructions between texts automatically generated by OpenGPT-X and human-authored texts. This is joint work with Julián Moreno Schneider.
SFB1412 C04: Register knowledge in advanced learner language
- The project investigates the acquisition of register competence in a foreign language using various spoken and written learner corpora. If register knowledge is acquired mainly (implicitly) from linguistic experience, one can assume that even advanced learners of a foreign language may have less register knowledge. It follows that in certain registers they choose different alternatives than native speakers do, and that overall they can make fewer distinctions (i.e. the range of alternatives is smaller). C04 investigates this using the phenomenon of modification.
- More information: SFB 1412
RST-Tace
- RST-Tace is a tool for automatic comparison and evaluation of RST trees. It can be used regardless of the language or the size of the rhetorical trees. The tool aims to measure the agreement between two annotators. The results are reported as F-measure and inter-annotator agreement.
- Currently, RST-Tace can be used via a command-line interface. For up-to-date information and complete documentation, please refer to the GitHub repository.
Connective-Lex.info for Chinese
- Connective-Lex.info is a web-based multilingual lexical resource for connectives. Users can search it for connectives and the discourse relations they may signal. So far, a number of languages (such as English, German, and French) have been included.
- The task of our project is to expand Connective-Lex.info with Chinese connectives. This ongoing project is conducted jointly with Peter Bourgonje, Manfred Stede, Clara Wan Ching Ho, and Hongling Xiao.
Doctoral Thesis
- My dissertation is a corpus-based contrastive rhetorical study, focusing on the argumentation strategies employed by Chinese learners of German and native speakers of German in their German essays. These strategies include not only linguistic formulation but also discourse-level considerations.
- The doctoral thesis is supervised by Prof. Dr. Anke Lüdeling and Prof. Dr. Manfred Stede.
- Funded by the China Scholarship Council and the German Research Foundation (DFG) - SFB 1412, 416591334.
Publications
- Wan, Shujun; Bourgonje, Peter; Xiao, Hongling; Wan Ching Ho, Clara; Stede, Manfred (2023): Chinese-DiMLex - A Lexicon of Chinese Discourse Connectives. Poster presented at the 4th Workshop on Computational Approaches to Discourse, held in conjunction with ACL2023. July 9th-14th, Toronto, Canada. LINK
- Hirschmann, Hagen; Lüdeling, Anke; Shadrova, Anna; Bobeck, Dominique; Klotz, Martin; Akbari, Roodabeh; Schneider, Sarah; Wan, Shujun (2022): FALKO. Eine Familie vielseitig annotierter Lernerkorpora des Deutschen als Fremdsprache. In: KorDaF (Korpora Deutsch als Fremdsprache) 2(2), 139–148. LINK
- Wan, Shujun (2022): Zur Positionierung der eigenen Meinung und der Verwendung von Appellen in argumentativen Texten: Chinesische DaF-Lerner/-innen, L1-Sprecher/-innen des Deutschen und chinesische EFL-Lerner/-innen im Vergleich. In: Deutsch als Fremdsprache. 165-177. LINK
- Wan, Shujun (2021): Kobalt_RST: Die Annotation von rhetorischen Strukturen im Kobalt-DaF-Korpus. Zenodo. LINK
- Lüdeling, Anke; Hirschmann, Hagen; Shadrova, Anna & Wan, Shujun (2021): Tiefe Analyse von Lernerkorpora. In: Deutsch in Europa. De Gruyter, 235-284. LINK
- Wan, Shujun; Kutschbach, Tino; Lüdeling, Anke & Stede, Manfred (2019): RST-Tace: A tool for automatic comparison and evaluation. In: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019), Minneapolis, MN. LINK
- Wan, Shujun & Lüdeling, Anke (2019): Discourse structure in German argumentative essays: A comparison of L1 German and Chinese learner German. In: Proceedings of the 5th Learner Corpus Research Conference (LCR 2019), Warsaw.
- Wan, Shujun (2016): Überlegungen zur Förderung der interkulturellen Kompetenz im chinesischen DaF-Unterricht – Das Modell der global orientierten Persönlichkeitsentwicklung und Handlungskompetenz. (Master Thesis)
- Wan, Shujun; Li, Yuan (2014): Das Leben in fremder Kultur – Eine empirische Untersuchung zur Lebenssituation der chinesischen Migranten in Deutschland. In: Theorie und Praxis der Sprachkommunikation. Materialien zur VI. Internationalen wissenschaftlich-didaktischen Konferenz, June 25th-26th, 2014, pp. 345-352.
- Wan, Shujun (2013): Lebenssituation der chinesischen Migranten in Deutschland – Eine empirische Untersuchung in Berlin. (Bachelor Thesis) Best Bachelor Thesis Award.
Talks
- ACL2023 and the 4th Workshop on Computational Approaches to Discourse. Poster: Chinese-DiMLex: A Lexicon of Chinese Discourse Connectives. July 9th-14th, 2023, Toronto, Canada.
- Invited talk at Zhejiang University: Einleitung, Hauptteil und Schluss: wie werden sie in L1- und L2-Texten rhetorisch aufgebaut? - Eine korpusbasierte kontrastive Studie. January 5th, 2023, Hangzhou, China.
- KO Korpuslinguistik und Phonetik: Schreibstrategien in argumentativen Texten chinesischer Deutschlerner:innen. December 14th, 2022, HU Berlin.
- Workshop Word formation and discourse structure: Complex nouns as markers of academic register in L1- and L2-authored essays (together with Julia Lukassek, Anke Lüdeling, Anna Shadrova). May 5th-6th, 2022, Leipzig.
- Invited talk at the University of Potsdam: Kontrastive Rhetorik: Merkmale deutscher und chinesischer argumentativen Texte "Erörterung" und "Yilunwen" im Vergleich. September 15th, 2021, Potsdam.
- 16. Doktorandin Tag 2019. Poster: Discourse structure in German argumentative essays. October 8th, 2019, HU Berlin. (Best Poster Award)
- LCR Conference. Poster: Discourse structure in German argumentative essays: a comparison of L1 German and Chinese learners of German. September 12th-14th, 2019, Warsaw, Poland.
- NAACL 2019 and the workshop on Discourse Relation Parsing and Treebanking. June 3rd-7th, 2019, Minneapolis, USA.
- KO Korpuslinguistik und Phonetik: Eine statistische Methode zur Evaluation der Annotation und dem Vergleich der rhetorischen Strukturen. December 7th, 2018, HU Berlin.
- International summer school "Learner Corpus Research: Theory and Applications": Discourse structure in German argumentative essays: a comparison of L1 German and Chinese learner German - Description and evaluation of the annotation process. August 27th-31st, 2018, Bremen.
- MISC Workshop: Annotation der Diskursrelationen - Herausforderungen. May 18th, 2018, HU Berlin.
- Textlink Final Action Conference. March 21st-23rd, 2018, Toulouse, France.
- Fifth GF Summer School. Using GF to translate weather reports among English, German, and Chinese. August 14th-25th, 2017, Riga, Latvia. (Grammatical Framework (GF) is a grammar formalism and a programming language for multilingual computational grammars. https://www.grammaticalframework.org/)
- Internationales Symposium zur Übersetzung, Rezeption und Erforschung der Schweizer Gegenwartsliteratur: Rezeptionsästhetik in der Übersetzungspraxis am Beispiel von „Die melodielosen Jahre“. May 20th-22nd, 2016, Hangzhou, China.
- The National Research Competition for Undergraduate Students of China 2012: "Lebenssituationen der chinesischen Migranten und der türkischen Migranten in Deutschland – eine Vergleichforschung". (Outstanding Project Award)
Education
10.2016 - 03.2023* |
Ph.D. |
Corpus Linguistics | Humboldt University of Berlin |
10.2014 - 09.2015 |
Master - Double Degree |
German as a Foreign Language | Technical University of Berlin |
09.2013 - 06.2016 |
Master |
German Studies | Zhejiang University |
07.2013 - 08.2013 |
Exchange |
German Language and Culture | Kiel University |
10.2011 - 09.2012 |
Exchange Semester |
Germanic Linguistics | Humboldt University of Berlin |
09.2009 - 06.2013 |
Bachelor |
German Language and Literature | Zhejiang University |
* 07.2020 to 03.2022: Part-time study due to parental leave.
* Disputation is expected to take place in the fall of 2023.
Teaching
Summer term 2018
- SE Grammatical Description Model
Thurs. 12 - 2 pm. Sophienstr. 22-22a, Institutsgebäude, R. 0.01
- UE Research Methods in Linguistics
Thurs. 2 - 4 pm. Sophienstr. 22-22a, Institutsgebäude, R. 0.01
Winter term 2017/2018
- SE Grammatical Description Model
Wed. 8 - 10 am. Dorotheenstr. 24, Room 1.401
- UE Research Methods in Linguistics
Wed. 12 - 2 pm. Dorotheenstr. 24, Room 1.401