
NEH Institute materials

July 2017


Week 2, Day 4: Thursday, July 20

Synopsis

On Week 2, Day 4, Mike Kestemont joins the instructional team to teach bag of words and text processing. Unfortunately, because of copyright restrictions, Mike is unable to publish the materials used during this institute online, so for now we link to his excellent presentations instead: Authenticity criticism and Stylometry with R.

Outcome goals


9:00–10:30: Text analytics 1


Time | Topic | Type
15 min | Bag of words | Presentation
30 min | Text processing | Presentation
15 min | Text as tables | Code lab
30 min | Query the tables | Code lab

10:30–11:00: Coffee break

11:00–12:30: Text analytics 1 (cont.)

Time | Topic | Type
90 min | Bag of words, text processing, text as tables, query the tables (continued) | Code lab

12:30–2:00: Lunch

2:00–3:30: Modeling: annotations as layers to the text


Time | Topic | Type
15 min | Review of tokenization, normalization, and collation from the point of view of annotations | Discussion
15 min | Envisioning your edition as a layered model | Talk lab
15 min | Existing models (e.g., computational linguistics) | Presentation
15 min | Hands-on: identify your own layers | Talk lab
30 min | Hands-on: model your edition’s pipeline | Code lab

3:30–4:00: Coffee break

4:00–5:30: Collation 2


Time | Topic | Type
15 min | Advanced collation: alignment in the Gothenburg model | Presentation
45 min | Near-matching: theory (as a step in the computational pipeline) | Code lab
30 min | Review | Talk lab

We’ll end each day with a request for feedback, based on a general version of the day’s outcome goals, and we’ll try to adapt on the fly to your responses. Please complete the Week 2, Day 4 feedback form (copy and paste it into a plain-text document) and email your response to Kaylen at kaylensanders@pitt.edu with the subject heading “Week 2, Day 4 feedback”.