Important: If you are new to text annotation, this tutorial is great preparation for our tutorial on the INCEpTION platform!

This tutorial guides you through the steps of manual annotation for constructing text corpora for machine learning. We will cover the following questions:

  • What is “text annotation”?
  • Why do we need it?
  • What should you keep in mind when selecting your (textual) data sources?
  • How do you design an annotation scheme?
  • What should be part of your annotation guidelines?
  • Which steps need to be taken during annotation scheme development and corpus annotation?
  • How long will it take?
  • How do you compute and interpret agreement coefficients?
  • What are common file formats?
  • What are stand-off annotations?
  • What are the advantages of a web-based annotation system?
  • Practical hints


Annemarie Friedrich is a Research Scientist at the Bosch Center for Artificial Intelligence, where she currently works on text mining for the scientific domain. Before that, she was a postdoctoral researcher at the Center for Information and Language Processing (CIS) at Ludwig-Maximilians-Universität in München. She holds a Ph.D. in Computational Linguistics and an M.Sc. in Language Science and Technology from Saarland University. She is currently serving as a Senior Area Chair for EMNLP 2021 and regularly reviews for ACL, NAACL, EACL, EMNLP and COLING. She is a member of the committee of the ACL Special Interest Group for Annotation (SIGANN). Her research interests are computational semantics, discourse processing and linguistic annotation. She is especially interested in modeling phenomena at the interface of syntax, semantics and discourse, such as temporal and aspectual structure.