The development of natural language processing and AI applications requires gold-standard datasets. Data is the pillar of these intelligent applications, and annotation is one way to acquire it. In this talk, I will first discuss the main components of WebAnno, one of the most popular annotation tools to date, which is a generic, distributed, web-based annotation tool. I will then discuss CodeAnno, an extension of WebAnno that supports hierarchical document-level annotation, particularly for codebook annotation in the social sciences. In the second part of my talk, I will discuss the approaches to and challenges of building social NLP datasets, such as hate speech, sentiment, and fake news datasets, using crowdsourcing frameworks. I will conclude the talk by presenting ASAB, our Telegram bot-based social media annotation tool, which is built to alleviate the limitations of crowdsourcing when annotating low-resource languages.

Biography

Seid is currently a postdoctoral researcher at the Language Technology (LT) Group, Universität Hamburg, under the supervision of Prof. Chris Biemann. His research focuses on social NLP, adaptive ML for data annotation, big data analysis, and NLP for low-resource languages. He has been working as a scientific software engineer at the LT Group since September 2012. He has participated in the development of NLP tools such as Par4Sem, WebAnno/CodeAnno, new/s/leak, GermaNER, and Network of the Day. He also teaches various NLP- and AI-related courses and projects.