A course developed and taught by Indira Sen, Maximilian Kreutner, and Georg Ahnert

This course focuses on the collection and analysis of social media data for two types of societally relevant applications: studying the impact of social media on society, and using social media data to learn about society. The course introduces technical details on the development of social media platforms, programmatic and large-scale collection of platform data, and the analysis of this data using computational methods. Students will be introduced to data collection from a variety of platforms, including Wikipedia, YouTube, and TikTok. The course will also help students critically reflect on the epistemology of social media data and the validity of its analysis.

| Week | Lecture | Readings | Tutorial |
|---|---|---|---|
| 1 | Course Intro + Potentials and Pitfalls of Social Media Data | | Infrastructure setup |
| 2 | Data Collection 1: Web Scraping | 1. Lazer, David MJ, et al. "Computational social science: Obstacles and opportunities." Science 369.6507 (2020): 1060-1062. 2. Gerard, Patrick, Nicholas Botzer, and Tim Weninger. "Truth Social dataset." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 17. 2023. | Python recap and web scraping |
| 3 | Data Collection 2: APIs | 1. David Garcia, "Background on APIs" 2. Murtfeldt, Ryan, et al. "RIP Twitter API: A eulogy to its vast research contributions." arXiv preprint arXiv:2404.07340 (2024). | Dynamic webpage scraping, API intro |
| 4 | Data Collection 3: Platform Affordances | | Wikipedia, Bluesky API |
| 5 | Data Collection 4: Sampling Social Media Data | | YouTube, TikTok |
| 6 | Data Processing 1: Data Cleaning | | Data cleaning |
| 7 | Data Processing 2: Data Exploration | | Data visualization and exploratory data analysis |
| 8 | Data Analysis 1: Text Analysis I | | Infrastructure and project consultation |
| 9 | Project Pitches + Background | | Stats revision + NLP basics |
| 10 | Data Analysis 2: Text Analysis II | | NLP intermediate (topic modeling, transformers, LLMs) |
| 11 | Data Analysis 3: Network Analysis | | Network science basics |
| 12 | Ethics, Reproducibility, and Documentation | | TBD |
| 13 | Midway Presentations | | Project work + consultation |
| 14 | Summary and Outlook | | Project work + consultation |

Some of the materials in this course are based on a series of Social Media Workshops that Indira conducted with Prof. Katrin Weller. We're grateful to her for working on these materials!