Summer schools on quantitative methods: A report by Tereza Menšíková

15 Aug 2024 Tereza Menšíková

The summer days are almost over, and with them the blissful academic break from lectures for students and teachers alike. However, as science never sleeps, many academics and PhD students use this time away from the classroom for other educational pursuits. One of them is attending summer schools. These schools are packed with different courses: some focus on how to approach specific thematic issues, some on getting more familiar with theoretical approaches, and a fair number aim at mastering methods and techniques of data analysis. During my PhD studies, I have been lucky to attend two of them, both on computational text analysis and machine learning in R. I say lucky because I consider the courses extremely valuable but also quite expensive for an ordinary student. Alas, not many grants would pay for such an experience, at least not in Czechia, so motivated students must wait for an opportunity to arise at the right time and in the right place.

In this short text, I would like to look back at the experience and evaluate how beneficial it was for me as a student to attend methods summer schools. For many students and academics, summer schools are like CEEPUS trips: they know some people have been on them (even repeatedly), but they have no idea where to go or what to expect. My first experience was with the Essex Summer School in Social Science Data Analysis, which I attended on a scholarship provided under the Masaryk University student grant IGA. The Essex Summer School is known as one of the best in data analysis, along with the LSE Summer School in London, and, unfortunately, it is also one of the most expensive. You will spend two weeks with dozens of academics from around the world, emanating the vibes of converts. They all want to transform their work (and even themselves) into something new and innovative by acquiring new skills, shedding their previous ideas about methods and software like a lizard's skin (Stata and SPSS, rest in peace).

My first weapon of choice was the course Quantitative Text Analysis with Iulia Cioroianu, taught in R and Python. Some of my classmates were at a similarly basic level as me, and some had already done statistical magic with neural networks I had never heard of before. For some, this course was a kick and an inspiration; for others, an opportunity to improve and to get advice on more demanding methods. For me, it was the kick I needed the most, since I had only read about these methods before I arrived. Most quantitatively focused courses mix lectures with exercises, taking you from simple methods to something like Bayesian predictive models all in one batch. This brings with it one feature of these courses: an overload of novelty, with far more math than the average humanities (or social science) scholar sees in a lifetime. While this may seem discouraging, realistically no one expects you to understand the math behind all the algorithms in two weeks, or to run all the models by yourself, which could take several days. Therefore, the first step is not to be afraid to overestimate your skills a little and dive into the unknown. You will come away with more insight and useful scripts to play with at home.

I won't lie, the first summer school gave me the kick, but I still felt lost most of the time when using machine learning methods, especially since I chose to work in the programming language R and not Python. A choice that haunts me to this day. Therefore, when I got the opportunity to attend another summer school this year, I went for it immediately. This time, I chose a more accessible organisation with a great reputation among social scientists: the ECPR Methods School. They organize online courses over Zoom and offer ECTS credits for completing assignments at home. I deliberately went for a course with content very similar to the first one: Quantitative Text Analysis and Machine Learning using R with Dr Zachary Greene. We went from dictionaries, sentiment analysis, word embeddings, and topic and hierarchical models to large language models (LLMs), which you probably know through ChatGPT. The course attracted more students this time, but there were also academics who had only heard of natural language processing (NLP) and wanted to take it up. Interestingly, even though both courses used the same libraries (mostly leaning on quanteda) and the same methods, their approaches to handling texts with metadata were quite different. The lecturers were like sculptors carving the same bear statue out of wood, only using different techniques.

Since I originally came from ethnography-based research, one of the most surprising things about the courses was how the lecturers addressed our expectations. Both stressed that computational models are specific simplifications of the phenomena we research and that, at times, statisticians have to make decisions very similar to those of ethnographers. There are no simple instructions on how to get meaningful results and interpret them. All methods, be they predictive models or focused coding, are concerned with understanding patterns beyond random occurrence and struggle with their limits through validation and verification. Many predictive models cannot even do without manual coding. Therefore, seeing the lecturers crack the imaginary wall between quantitative and qualitative methods and argue for mixed designs was deeply refreshing and unexpected.

In general, methods schools offer a window into another world, usually concentrated into one or two weeks, and they will supply you with a ladder that you can use to climb there by yourself in the future. They help to dismantle magical concepts such as digital humanities or big data and replace them with practical examples and specific methods with all their advantages and limits. You could ask, why not buy a DataCamp or Udemy license for a year and learn what you want there? I would urge you to do that regardless. However, being able to discuss your issues with the code and ask a professional questions in real time is a huge advantage. The other benefit is very simple: you have limited time to focus and learn as much about the topic and methods as possible, and that can make the difference between stumbling around DataCamp for a year without ever actually trying the methods and diving into them nose-deep. In the end, summer schools can help even an ethnographer like me not to be afraid of the big bad wolf of mathematics and causal inference. And that's quite something.
