Education and Profession

Chair: Heike Zinsmeister

The SIG focuses on two main aspects:

  • Education and teaching in computational linguistics and neighboring disciplines that include natural language processing modules (such as Digital Humanities);
  • Job prospects for CL graduates.

Both aspects have changed in recent years due to the digitalization of everyday life, the accessibility of large amounts of digital text, and new machine learning methods. The basic questions about the discipline and its didactics have remained the same, though: “What are the characteristics of good CL teaching? What kinds of learning spaces promote students in their CL studies? How to design and run CL curricula? What are the requirements for CL graduates to be employed by companies after graduation?” etc. (Irene Cramer, former SIG chair).

Upcoming events:

Teach4DH, a workshop on teaching NLP for digital humanities, co-located with GSCL 2017.

A new edition of the results of the 2006 company survey by the GSCL (at that time still known as GLDV), in German, on requirements and expectations for prospective employees graduating in computational linguistics. See also Dr. Nils Lenke’s article in the current GSCL newsletter, "ComputerlinguistInnen bei Nuance: Neue Arbeitsmöglichkeiten in einem sich wandelnden Industriezweig".

Dialog Systems

Chair: David Schlangen, Bernhard Schröder

Computational Linguistics for Education

Chair: Andrea Horbach, Ramon Ziai

Computational linguistics has gained considerable importance in education in recent years. At the international level, this development is reflected in workshops such as BEA (Innovative Use of NLP for Building Educational Applications) and NLP4CALL. In German-speaking countries, there are currently no workshops or working groups that address this topic. There are, however, thematic overlaps with the DFG network INDUS (Individualized Language Learning), which will expire in 2018.

Our goal is to provide a platform for computational linguistic research that places computational linguistics at the service of education. Our working group includes all areas of research and application in which language is automatically processed for teaching and learning purposes. Following on from the dormant working group "Language and Text Technology Methods in eLearning", we want to bring together researchers, teachers and industry representatives in German-speaking countries to discuss their work on automatic language processing for educational purposes and to create synergies.

Thematic priorities

  • Automatic evaluation of linguistic data: support for teachers
  • Intelligent tutoring systems: automatic feedback for learners on form and content
  • Grammatical error detection
  • Generation of language exercises
  • Evaluation of the difficulty of tests and exercises
  • Preparation of texts for learners
  • Native language identification for L2 texts

Other, thematically related topics are always welcome.


Hypermedia

Chair: Roman Schneider, Bernhard Schröder, Angelika Storrer
The Hypermedia working group explores the possibilities of hypermedia from a linguistic, computational-linguistic and textual perspective. In working meetings, joint projects and publications, we address the following topics:

  • Hypermedia as a publication medium in humanities applications and education
  • Multimedia linguistic information and learning systems, hypertextualization of grammars and dictionaries and the opportunities of the new medium for language description
  • Multimodal corpora and annotations
  • Standards for hypermedia and multimedia
  • Hypermedia and XML-aware database technologies
  • Computational linguistics tools (e.g. lemmatization programs, morphological analysis, semantic annotation) and their integration into hypermedia applications
  • Mobile systems and user adaptivity
  • Interactive and collaborative elements in Web 2.0 (wikis, weblogs etc.)
  • Media aspects, web design and usability

Workshop 2017

Corpus Linguistics

Chair: Alexander Mehler, Armin Hoenen
The working group Corpus Linguistics and Quantitative Linguistics deals with the development and testing of tools for the automatic analysis of corpora, as well as with the construction and application of mathematical, quantitative models for exploratory corpus analysis.

The working group addresses the following topics:

  • Preparation and annotation of corpora.
  • Corpus-based measurement (metrization) of properties and relations of linguistic units.
  • Extraction, reconstruction or exploration of linguistic knowledge from corpora of natural language texts.
  • Promotion of applications in the field of text analysis and text technology.
  • Support of linguistic theories.

Machine Translation

The SIG "Machine Translation" covers the whole range of automatic translation, from theory to practical experience. Our current focus is on hybrid systems that combine statistical and linguistic knowledge. Other areas of interest are the evaluation of MT systems, tools for computer-aided translation (CAT), dictionary and terminology interfaces, and the exchange of data between MT systems from different producers.


German Sentiment Analysis

Mission Statement
Sentiment analysis is attracting more and more attention, not only in academia but also in business (e.g. web and business intelligence). In general, sentiment analysis refers to the task of identifying and extracting opinions, emotions and appraisals from a given input stream (e.g. text documents, product reviews, micro-blogging services or speech) with respect to a certain target. While sentiment analysis is a highly active research area in the English-speaking (and international) community, there are few research collaborations and resources that focus on German as the target language.

This project is a European research collaboration that addresses German sentiment analysis. The collaboration currently involves eight partners from three European countries. The issues addressed range from theory to applications of computer-aided opinion mining, sentiment analysis, polarity detection and affective computing. While the members of this project approach sentiment analysis from various viewpoints, they ultimately all work within a common research area, which makes this endeavour an exciting and interesting collaboration.

Communication and resources are the key concerns of this research group, along with services that may be offered among interested parties and related academic communities. Our short-term goal is to work intensively on available resources (dictionaries, lexicons and corpora) and to improve their quality and their acceptance by the research community. The main target language is German, and the main audience is the German research community, since the number of German benchmark collections, corpora, subjectivity dictionaries and other resources is rather limited.

A first milestone of this research collaboration will be the creation and establishment of a first gold-standard benchmark collection for the German language, in order to allow a coherent and comparable evaluation of sentiment analysis algorithms and systems.
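To illustrate what a comparable evaluation against a gold-standard collection might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the five labeled sentences and the function names are hypothetical and do not come from the project's actual resources. It uses a simple lexicon-based polarity classifier, which is only one of many possible approaches.

```python
# Hypothetical toy polarity lexicon (German word -> score); not a real resource.
TOY_LEXICON = {
    "gut": 1, "toll": 1, "hervorragend": 1,
    "schlecht": -1, "langsam": -1, "enttäuschend": -1,
}

# Hypothetical gold-standard items: (text, manually assigned label).
TOY_GOLD = [
    ("Das Essen war gut und der Service toll", "positive"),
    ("Der Akku ist schlecht und das Gerät langsam", "negative"),
    ("Ein hervorragend geschriebenes Buch", "positive"),
    ("Die Lieferung kam am Dienstag", "neutral"),
    ("Nicht gut", "negative"),  # negation: a plain lexicon lookup gets this wrong
]


def classify(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all whitespace-separated tokens."""
    tokens = text.lower().split()
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def evaluate(classifier, gold):
    """Return the share of gold items whose label the classifier reproduces."""
    correct = sum(1 for text, label in gold if classifier(text) == label)
    return correct / len(gold)


if __name__ == "__main__":
    print(f"Accuracy on the toy gold standard: {evaluate(classify, TOY_GOLD):.2f}")
```

Because every system is scored against the same labeled items, results become directly comparable; the negated example also shows the kind of weakness such a shared benchmark can expose.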

Social Media / Computer-mediated Communication

The working group deals with the linguistic, computational-linguistic and text-technological foundations needed to build annotated corpora of language use in social media and Internet-based communication, as well as of corresponding data in web corpora. Internet-based communication (also known as "computer-mediated communication") comprises dialogical forms of communication that use the Internet as a communication infrastructure, for example communication in online forums, chats, instant messaging applications and via Skype, on wiki discussion pages, in blog and video blog comment threads, on Twitter, on social network profile pages, and in multimodal interaction spaces (learning environments, MMORPGs, and "virtual worlds").

There are already national and international initiatives on the subject areas of the working group (e.g. as part of the Text Encoding Initiative). The working group builds on these and collaborates with researchers from linguistics, computational linguistics and language technology to develop solutions specifically for German-language data.

Thematic priorities
The working group consolidates topics, projects and lines of discussion with computational-linguistic, linguistic and text-technological aspects that were treated within the framework of the DFG network Empirical Research on Internet-based Communication (Empirikom) and that are of central importance for the development of methods for processing and annotating language data from social media and from genres of Internet-based communication.
This includes:

  • anchoring the topic of "Social Media / Internet-based Communication" on the agenda of national and international standardization initiatives in the field of language and text technology;
  • the documentation of annotation guidelines, gold standards and results from projects that adapt existing NLP procedures to the automatic linguistic annotation of language data from social media and from genres of Internet-based communication;
  • the creation of standardized components for the automatic processing of language data from social media and from genres of Internet-based communication, e.g. in cooperation with the development teams of Apache UIMA and the DKPro framework; it is planned to develop the components according to the UIMA standard and to make them freely available as part of DKPro;
  • the documentation of legal issues relating to the collection, annotation and provision of language data from the treated genres in corpora and to their use for empirical language analysis and in the field of language technology;
  • the establishment of a network of researchers, at home and abroad, who work on the issues addressed by the working group (based on existing contacts and cooperations).

Planned activities include regular workshops on changing key topics, exchange via a mailing list and a digital newsletter, and documentation on the GSCL website of current projects and events related to the topics of the working group.

  • Workshop of the working group at KONVENS 2014: "NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media"
    University of Hildesheim, October 6, 2014
    Website for the workshop and call for papers: /site/nlp4cmc/
  • Workshop "Social Media Corpora for the eHumanities: Standards, Challenges, and Perspectives"
    TU Dortmund, February 20-21, 2014
    The workshop focuses on topics that have been the focal points of the DFG network "Empirical Research in Internet-based Communication" over the past three and a half years. Using examples of corpus projects from Germany, France, the Netherlands, Italy and Switzerland, it will address questions of the linguistic description of language use in social media as well as corpus-linguistic and computational-linguistic aspects of the construction, annotation and processing of corpora of language use on the Internet and in social media.

Networking and cooperation

Text Technology

The Text Technology working group is primarily concerned with integrating the Standard Generalized Markup Language and related standards (XML, DSSSL, HyTime) with linguistic data processing. The goal is to enable the development of innovative text models and content-oriented text processing and use.

In the 1980s, the Standard Generalized Markup Language (SGML) provided a basis for the media-independent description of textual structures and annotation systems, which in recent years has led to a multitude of applications (HTML is arguably the best known), software systems and derived standards. Although one of the roots of SGML lies in linguistics, computational language and text processing has so far remained virtually unconnected to it. The Text Technology working group has set itself the goal of promoting the coupling of SGML-based information processing, linguistics and language processing in order to enable the development of innovative text models and content-oriented text processing and use.

If there is one single aspect that characterizes SGML [...] it is that it puts the computing power of information technology behind the all-encompassing descriptive power of human language. [Liora Alschuler, ABCD ... SGML. 1995, 1]

In the wake of SGML, a number of other standards have emerged that are also relevant to this objective:

  • The Document Style Semantics and Specification Language (DSSSL) makes it possible to define the transformation of SGML instances into any presentation format, including other SGML target formats.
  • The Hypermedia/Time-based Structuring Language (HyTime) is a convention for expressing references in and between texts as well as timelines and synchronizations in SGML instances.
  • For the use of SGML, DSSSL and HyTime on the World Wide Web, simplified versions have also been developed or are currently in development: the Extensible Markup Language (XML), a simplification of SGML; the Extensible Linking Language (XLL), a subset of HyTime; and the Extensible Style Language (XSL), a major simplification of DSSSL.
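As a small illustration of how markup and linguistic data processing can be coupled, the following Python sketch reads linguistically annotated XML and derives a simple presentation format from it. It is only a sketch under stated assumptions: the element names (text, s, w), the attribute names (lemma, pos) and the sample sentence are hypothetical and do not follow any particular annotation standard. It relies solely on Python's standard xml.etree.ElementTree module and stands in for, but is not, a DSSSL or XSL transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence; element and attribute names are illustrative only.
SAMPLE = """<text>
  <s id="s1">
    <w lemma="der" pos="ART">Die</w>
    <w lemma="Grammatik" pos="NN">Grammatiken</w>
    <w lemma="werden" pos="VAFIN">wurden</w>
    <w lemma="annotieren" pos="VVPP">annotiert</w>
  </s>
</text>"""


def extract_tokens(xml_string):
    """Return (word form, lemma, part of speech) for every <w> element."""
    root = ET.fromstring(xml_string)
    return [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]


def render_tagged(xml_string):
    """Render each <s> element as one 'word/POS' line (a simple presentation format)."""
    root = ET.fromstring(xml_string)
    return [
        " ".join(f"{w.text}/{w.get('pos')}" for w in s.iter("w"))
        for s in root.iter("s")
    ]


if __name__ == "__main__":
    for form, lemma, pos in extract_tokens(SAMPLE):
        print(form, lemma, pos)
    print(render_tagged(SAMPLE))
```

The same annotated source serves both functions: one extracts the linguistic information, the other produces a presentation, which is the separation of content and presentation that the standards above are designed to support.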


