Semester 2, 2025-2026
Type of course: Methodological and Practical Courses
Date: April 9, 2026
Location: University of Groningen
1 day
Maximum number of participants: 20
ECTS: 0.5 EC will be awarded for participation in the complete course
Staff: Claudia C. Kitz
From Digital Content to Discovery: Web-Scraping and Unsupervised Machine Learning for Social Science Research
Content
As social life increasingly takes place online, web-scraping and unsupervised machine learning provide social scientists with powerful tools to access and analyze large-scale digital data—transforming words into numbers to reveal new perspectives on social processes. This course introduces participants to the core concepts, tools, and practical applications of web-scraping and unsupervised machine learning, with a focus on how they can be used to support original research in the social sciences.
The day familiarizes participating students with web-scraping and unsupervised machine learning methods and shows how these approaches can be applied to research in the social sciences. In the first part of the day, we will discuss the kinds of research questions that can be addressed using online data and unsupervised techniques. We will focus on the practical challenges and benefits of using web-scraping and unsupervised learning in social research, including data processing, analysis techniques, and model interpretation. We will illustrate these approaches by discussing examples of research that uses large-scale digital trace data (e.g., from online platforms or forums). We will also share practical tips and tools for building scalable scraping pipelines and choosing appropriate unsupervised methods to answer different research questions (e.g., (structural) topic modeling, sentiment analysis).
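To give a flavour of what "transforming words into numbers" can look like in practice, the minimal sketch below extracts paragraph text from an HTML page and turns it into a document-term count matrix, the kind of input a topic model typically starts from. It is an illustration only and not course material: the sample page, the `ParagraphExtractor` class, and the `document_term_matrix` helper are hypothetical names invented for this example, and a real project would fetch pages over the network (respecting a site's terms of use) and use a more careful tokenizer.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Only keep text that sits inside a <p> element.
        if self._in_p:
            self.paragraphs[-1] += data


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Hypothetical stand-in for a scraped forum page (no network access needed).
SAMPLE_HTML = """
<html><body>
<p>Housing costs keep rising.</p>
<p>Rising costs worry students.</p>
</body></html>
"""

parser = ParagraphExtractor()
parser.feed(SAMPLE_HTML)
vocab, matrix = document_term_matrix(parser.paragraphs)

print(vocab)
for row in matrix:
    print(row)
```

Each row of the matrix represents one post and each column one word, so texts become numeric vectors that unsupervised methods can compare or cluster.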
At the end of the day, participants will form small groups to discuss how web-scraping and unsupervised machine learning could be used to answer research questions relevant to their PhD projects. During this time, each group will work on a short pitch for a research study that incorporates digital data and unsupervised methods.
Time schedule
10:15 – 10:30: Arrival and Welcome
10:30 – 12:30: Introduction to web scraping and unsupervised machine learning
12:30 – 13:00: Lunch Break
13:00 – 14:00: Applied Examples
14:00 – 14:15: Coffee Break
14:30 – 16:00: Develop your research idea in groups
16:00 – 16:45: Pitch your ideas
16:45 – 17:00: Wrap-up
17:00: Goodbye and Drinks (optional)
Learning goals
By the end of the workshop, participants will:
Literature
Compulsory: