Introduction to CTA with Dr. Vojtěch Kaše

Name: Introduction to CTA with Dr. Vojtěch Kaše
Start: 2020-03-31T10:00:00+02:00
End: 2020-03-31T16:00:00+02:00
Location: The seminar is held fully in the virtual environment. See details below.

31 March 2020
10:00 AM – 4:00 PM

The Centre for the Digital Research of Religion, a sub-department of the Department for the Study of Religions, Faculty of Arts, Masaryk University (https://religionistika.phil.muni.cz/cedrr) is pleased to invite you to the latest in its series of open seminars aimed at enhancing digital competencies.
Due to the current circumstances, the seminar will be delivered online.

The seminar will focus on the basics of computational text analysis. Examples will mainly be drawn from patristic literature. As well as learning about the techniques themselves, participants will have the opportunity to gain first-hand experience of their application.

At a time when we all find our opportunities for academic interaction limited, we believe that this seminar will be a refreshing participatory experience for all who attend.

When: Tuesday, March 31, 2020, 10:00-12:30 and 13:30-16:00

Where: Zoom Meeting
https://cesnet.zoom.us/j/746514643?pwd=YmNOOTc2NnAyS2xMczdaUE5WU01VQT09
Meeting ID: 746 514 643
Password: 774530
Language: English

Helper doc: https://docs.google.com/document/d/e/2PACX-1vQhzXblq3tVWJDbIoIjDTsl0BlSC0Wynyc6UYnCiXViLNLKAmi7KcIuAsvJe__B3JIdW9XwMGIE2mcA/pub

You do not need to register for the seminar or for the videoconferencing service. Simply open the Zoom link above and enter the password. Zoom offers a standalone application for download, but it also works in all modern web browsers.

Annotation:

This online workshop introduces some basic methods and tools for computational text analysis in Python and is aimed at a humanities audience with very limited or no programming experience. The workshop will consist of two 2-hour sessions full of hands-on exercises, based mainly on textual data of potential relevance for the audience (e.g. historical sources in Latin or Greek). The first session will begin slowly, enabling everyone to properly set up the working environment and run all scripts and exercises. The second session will culminate in some more advanced examples of word embeddings. The workshop will cover concepts and themes such as cleaning textual data, tokenization, lemmatization, POS tagging, parsing TEI-XML files, the bag-of-words model, word co-occurrences, the distributional hypothesis of meaning, and word embeddings. The workshop does not require installing any special software; the only technical requirements are an internet connection, a web browser, and a Google account.

Preliminary schedule:

Morning session

10:00 - a short intro to Google Colaboratory, Jupyter Notebooks, and Python
10:15 - text as a variable
10:30 - cleaning textual data
10:45 - lemmatization and POS tagging

10:55 - short break

11:00 - HTTP requests and TEI-XML parsing
11:15 - word frequencies and TF-IDF
11:30 - word clouds
11:45 - case study 1: Jesus’ sayings across 5 gospels

12:00 - 13:30 lunch break

Afternoon session

13:30 - introduction to distributional semantics
13:40 - word-by-word & word-by-doc matrices
14:00 - the idea of word embeddings
14:15 - latent semantic analysis

14:25 - short break

14:30 - case study 2: text mining cultural evolution of moralizing religions
15:00 - word co-occurrence networks

15:15 - concluding discussion

We look forward to seeing you,
David Zbíral
Head of the Centre for the Digital Research of Religion
