NEH Institute materials

July 2017

Week 2, Day 3: Wednesday, July 19

Synopsis

Week 2, Day 3 expands upon the idea of digital editions as text processing pipelines. After a short recap of day 2, which covered tokenization, we continue with the next step, normalization. We will show how these two pipeline stages, tokenization and normalization, prepare texts for automated collation. We also discuss automated collation from a modeling perspective, using the Gothenburg Model. Participants learn how their research goals and questions shape the computational pipelines they build.

Outcome goals


9:00–10:30: Normalization

Time | Topic | Type
15 min | Review of week 2, day 2: computational pipelines, modeling, processing, and tokenization | Discussion
15 min | Basic normalization | Presentation
15 min | Using NLTK for normalization | Code lab
15 min | Regular expressions | Code lab
15 min | Basic XML normalization: transforming XML to a stream of normalized (word) tokens | Code lab
15 min | Hands-on exercise with NLTK and regular expressions | Code lab
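
As a rough preview of the XML normalization lab, the sketch below turns a small XML fragment into a stream of normalized word tokens. It is only an illustration, not the code used in the session: it relies on the Python standard library instead of NLTK, the sample line and the function names (`tokenize`, `normalize`, `xml_to_normalized_tokens`) are made up for this example, and lowercasing stands in for whatever normalization rules your own research questions call for.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def normalize(token):
    """Lowercase a token (a deliberately simple, hypothetical rule)."""
    return token.lower()


def xml_to_normalized_tokens(xml_string):
    """Strip the markup from an XML fragment and return normalized word tokens."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text content in document order, ignoring the tags.
    text = "".join(root.itertext())
    return [normalize(token) for token in tokenize(text)]


# Hypothetical sample input, not taken from the Institute materials.
sample = "<l>The <hi>Quick</hi> brown fox, the lazy Dog.</l>"
tokens = xml_to_normalized_tokens(sample)
print(tokens)
# ['the', 'quick', 'brown', 'fox', 'the', 'lazy', 'dog']
```

Keeping tokenization and normalization in separate functions makes it easy to swap in a different rule (for example, spelling regularization) without touching the rest of the pipeline.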

10:30–11:00: Coffee break

11:00–12:30: Collation

Time | Topic | Type
15 min | Modeling and collation | Presentation
15 min | Collation within editorial theory | Talk lab
30 min | Collation practice | Code lab
30 min | Tokenization and normalization for collation purposes | Code lab
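
To make the link between normalization and collation concrete, here is a minimal sketch that aligns two witnesses on their normalized tokens while still reporting the original readings. It is not the collation tool used in the labs and not a full implementation of the Gothenburg Model: it uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner, handles only two witnesses, and the witness texts and the names `tokens` and `collate` are invented for illustration.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Tokenize a witness into word tokens."""
    return re.findall(r"\w+", text)


def collate(witness_a, witness_b):
    """Align two witnesses on normalized tokens.

    Returns a list of (tag, a_tokens, b_tokens) rows, where tag is one of
    'equal', 'replace', 'delete', or 'insert' (difflib's opcode names).
    """
    a, b = tokens(witness_a), tokens(witness_b)
    # Normalization: compare lowercased forms, but keep the original readings.
    norm_a = [t.lower() for t in a]
    norm_b = [t.lower() for t in b]
    matcher = SequenceMatcher(a=norm_a, b=norm_b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# Hypothetical witnesses, not taken from the Institute materials.
rows = collate("The quick brown fox", "the quick red fox")
for row in rows:
    print(row)
# ('equal', ['The', 'quick'], ['the', 'quick'])
# ('replace', ['brown'], ['red'])
# ('equal', ['fox'], ['fox'])
```

Because the comparison runs on the normalized forms, "The" and "the" align as equal. Without normalization, the same pair would be reported as a variant, which is one way research goals end up shaping the pipeline.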

12:30–2:00: Lunch

2:00–3:30: Challenging textual phenomena: Introducing Text as Graph (TAG)

Time | Topic | Type
90 min | Challenging textual phenomena: Introducing Text as Graph (TAG) | Presentation

3:30–4:00: Coffee break

4:00–5:30: Review

Time | Topic | Type
90 min | Review | Talk lab

We’ll end each day with a request for feedback, based on a general version of the day’s outcome goals, and we’ll try to adapt on the fly to your responses. Please complete Week 2, Day 3 feedback (just copy and paste it into a plain-text document) and email your response to Kaylen at kaylensanders@pitt.edu with the subject heading “Week 2, Day 3 feedback”.