One GSCL MicroGrant (2,500 euros) has been awarded to Julia Romberg (GESIS) and Matthias Orlikowski (Bielefeld University) for their proposal “Social Survey Data for Systematic Investigations of Annotator Characteristics”. Congratulations!

Project Description: Social Survey Data for Systematic Investigations of Annotator Characteristics

Human oversight is essential to the development of capable and robust language processing systems. Such oversight is often provided through manual annotation of texts, which records how humans would complete a given task. Crucially, annotation outcomes can be influenced by the characteristics of annotators, such as their socio-demographic backgrounds, morals, and values. Since it is usually unclear which characteristics matter in a given context, researchers often decide ad hoc which characteristics to include in annotation studies. Funded by the GSCL MicroGrant, Matthias Orlikowski (Bielefeld University) and Julia Romberg (GESIS - Leibniz Institute for the Social Sciences) will explore the potential of a more informed approach to annotation that uses existing survey data from the social sciences.

The proposed SURVEY annotation framework is based on a key observation: the social sciences curate a rich body of large-scale surveys that capture opinions on a wide range of topics. These surveys are often based on representative samples of the general population. This representativeness could enable researchers to identify characteristics that influence how individuals perceive topics relevant to a given annotation study. For example, for annotations of hate speech, researchers could check which characteristics correlate with measures of racism or misogyny in the general population. Annotator characteristics for annotation projects can then be selected in a more informed manner. This approach promises to enhance annotation quality and may also allow project resources to be managed more efficiently.

The GSCL MicroGrant will fund an exemplary annotation study to test the SURVEY annotation framework. As a first test scenario, the study will use controversial social media posts discussing the introduction of a general speed limit in Germany. These posts will be annotated for key point analysis, a task that involves identifying the high-level arguments for and against a topic of debate. The framework will be tested as follows. First, potentially relevant annotator characteristics will be derived from the German Longitudinal Election Study (GLES) using regression analysis. As a general-population survey, the GLES asked participants about the general speed limit and also recorded their socio-demographics and their attitudes towards related topics such as climate change. Second, a crowdsourced annotation study will be conducted that collects not only the task-specific annotations but also information about the annotators: standard socio-demographics, their assessments of the general speed limit, and their assessments of the annotated posts, such as their personal agreement with a post and how persuasive they find it. This fine-grained information will be used to analyze how different characteristics influence the behavior of the annotators. Finally, the characteristics selected on the basis of the GLES will be compared with the common ad hoc approach, which relies on standard socio-demographics.
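The regression step can be illustrated with a small, hypothetical sketch. The project description does not specify the model, the GLES variables, or the selection criterion, so everything below is an assumption: the characteristic names (`age`, `climate_concern`, `car_commuter`) are illustrative, all data are synthetic and not taken from the GLES, and characteristics are ranked by the absolute value of their standardized ordinary least squares (OLS) coefficients, which is only one plausible choice. The sketch uses only the Python standard library.

```python
import random

# Hypothetical sketch: synthetic data (NOT GLES); names and effects are illustrative.

def standardize(values):
    """Return z-scores (mean 0, population std 1) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """OLS via the normal equations (X^T X) beta = X^T y, with an intercept prepended."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    return solve(xtx, xty)

def rank_characteristics(survey, outcome):
    """Rank characteristics by the absolute value of their standardized coefficient."""
    names = list(survey)
    columns = [standardize(survey[name]) for name in names]
    beta = ols(columns, standardize(outcome))
    effects = dict(zip(names, beta[1:]))  # drop the intercept
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up respondents: support for a speed limit depends strongly on climate
# concern, moderately (negatively) on car commuting, and weakly on age.
rng = random.Random(0)
n = 500
survey = {
    "age": [rng.uniform(18, 80) for _ in range(n)],
    "climate_concern": [rng.uniform(1, 5) for _ in range(n)],
    "car_commuter": [float(rng.random() < 0.5) for _ in range(n)],
}
support = [
    0.005 * survey["age"][i]
    + 0.8 * survey["climate_concern"][i]
    - 0.6 * survey["car_commuter"][i]
    + rng.gauss(0, 0.5)
    for i in range(n)
]

ranking = rank_characteristics(survey, support)
for name, coef in ranking:
    print(f"{name:16s} {coef:+.2f}")
```

A real analysis would more likely use a statistics package, account for survey weights, check significance, and consider models suited to ordinal answer scales; the sketch only shows the basic idea of ranking candidate characteristics by how strongly they predict attitudes towards the debate topic.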

Contact

Julia Romberg

GESIS - Leibniz Institute for the Social Sciences

Matthias Orlikowski

Bielefeld University
