Computational linguists cared about data “before it was cool”. Among ML/AI practitioners, however, “model work” gets more love than “data work”. Small and medium-sized businesses, while not immune to the AI hype, often (1) do not have enough (representative) data to train their machine learning models, (2) lack the in-house expertise and resources to collect realistic data, and (3) underestimate the effort needed to prevent data-related issues. I will present recent studies showing the importance of a more data-oriented approach to use-case-specific models. I will discuss how insufficient attention to data harms its quality and has ethical consequences, and I will argue that a data-centered and user-centered perspective is a missing link when transferring technologies from academia to industrial use cases.

Biography

A linguist at heart with a background in NLP and psycholinguistics (Almae Matres: Pisa, Saarbrücken and Stuttgart), she has worked on lexical semantics, on world knowledge in incremental language processing, and on human-machine interaction. After several years at Fraunhofer IIS in Erlangen, she recently joined the University of Applied Sciences in Augsburg as a Research Professor (Forschungsprofessur) of Language Technologies and Cognitive Assistants.