As part of the cross-disciplinary AI – Data Science programme, LIRMM is pleased to welcome Prof. Stefan Dietze (Heinrich-Heine-Universität Düsseldorf, GESIS, HeiCAD, L3S) for a seminar to be held on Tuesday, 14 October 2025 at 3 p.m. in the seminar room of Building 4.
Title: Ensuring Social Scientific Data Quality and Reproducibility in the Big Data/AI Era: Challenges and Pathways
Abstract: Over the last decades, the social sciences have increasingly adopted novel forms of research data, e.g. data mined from the web and social media platforms. This, together with recent advances in artificial intelligence (AI) and related areas such as natural language processing (NLP), has led to a much more widespread adoption of diverse computational methods, including techniques from machine learning and, most prominently, large language models. However, increasingly complex computational methods lead to new challenges with respect to the transparency, reproducibility and overall quality of social science research and data, further aggravating an already widely recognised reproducibility crisis. This talk will, on the one hand, introduce challenges posed by the use of AI-based methods in social science research. On the other hand, it will show pathways to address such problems. Examples include work on sharing computational (AI) methods in the social sciences in a reproducible and citable way; on understanding and tracing the adoption of, and relations between, methods and datasets at large scale across social science research (e.g. by mining scientific publications); and on novel ways of providing access to sensitive research data in the social sciences (e.g. social media data) to facilitate reproducible research without violating ethical or legal constraints or principles.
Speaker Bio
Stefan Dietze is full professor of Data & Knowledge Engineering at Heinrich-Heine-University Düsseldorf (HHU) and scientific director of the department Knowledge Technologies for the Social Sciences (KTS) at GESIS. He is also deputy director of the Heine Center for Artificial Intelligence & Data Science (HeiCAD) and an affiliated member of the L3S Research Center (Hannover, Germany). His research interests lie at the intersection of information retrieval, knowledge graphs, NLP and machine learning, and his work is concerned with the extraction, fusion and search of knowledge and data, in particular on the Web. Previous positions include the Knowledge Media Institute (KMI) of The Open University (UK) and the Fraunhofer Institute for Software and Systems Engineering (ISST, now part of Fraunhofer FOKUS). Stefan has obtained substantial research funding as PI and Co-PI (DFG, Leibniz, EC, BMBF) and frequently publishes at top-tier conferences such as EMNLP, The WebConf, CIKM, SIGIR or NAACL and in high-impact journals.
LIRMM contact: konstantin.todorov@lirmm.fr