
NEH Institute materials

July 2017


Week 1, Day 3: Wednesday, July 12

Synopsis

Day three opens with a deeper look at regular expressions, refining patterns to fit specific research needs. It then introduces grep, which applies regular expressions on the command line, and closes with an introduction to Python and the basics of its text-processing capabilities.

Outcome goals


9:00–10:30: Regular expressions 2

In the last session we covered simple patterns and repetition and practiced them with egrep. Today we cover alternation and grouping, then continue using egrep with more advanced expressions. We close by comparing regular expressions in egrep and in Python.

Time | Topic | Type
30 min | Alternation | Code lab
30 min | Grouping | Code lab
30 min | Comparison to Python | Code lab
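
As a rough illustration of how alternation and grouping carry over from egrep to Python, here is a minimal sketch using Python's standard `re` module. The sample lines and the `matching_lines` helper are invented for illustration and are not part of the Institute materials. The same pattern could be passed to `egrep -i` on the command line.

```python
import re

# Hypothetical sample lines, invented for illustration.
lines = [
    "The colour of the manuscript is faded.",
    "The color of the ink is black.",
    "Gray and grey are both attested.",
    "No match here.",
]

# Alternation (|) offers a choice between spellings. Grouping with
# parentheses limits each choice to the part of the word that varies.
spelling = re.compile(r"col(ou|o)r|gr(a|e)y", re.IGNORECASE)

def matching_lines(pattern, text_lines):
    """Return the lines in which the pattern matches anywhere, as egrep would print them."""
    return [line for line in text_lines if pattern.search(line)]

matches = matching_lines(spelling, lines)
for line in matches:
    print(line)

# Unlike egrep, Python can also report which alternative a group captured.
variant = re.search(r"gr(a|e)y", "grey").group(1)
print(variant)
```

One difference worth noticing: egrep only reports whole matching lines, while Python gives access to the text captured by each group.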

10:30–11:00: Coffee break

11:00–12:30: Command line 3

With regular expressions under our belts, we are ready to learn advanced grep skills for the command line. We will explore how to search for lines that match a pattern and how to exploit the advanced features of grep (and work around its complications). We will also learn about the find command.

Time | Topic | Type
15 min | Working with files | Code lab
15 min | Looping with for | Code lab
45 min | Matching things with grep | Code lab
15 min | Finding files with find | Code lab
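
To make the ideas behind this session concrete, the following Python sketch imitates what grep and find do: one function reports the numbered lines of a file that match a pattern (roughly `grep -n -E`), and another lists files under a directory by name (roughly `find` with `-name`). The function names, file names, and sample contents are all hypothetical. This is an analogy to help read the shell commands, not a replacement for them.

```python
import re
import tempfile
from pathlib import Path

def grep(pattern, path):
    """Yield (line_number, line) for each line of the file that matches the pattern."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")

def find(root, glob="*.txt"):
    """Return the files under root whose names match glob, searched recursively."""
    return sorted(p for p in Path(root).rglob(glob) if p.is_file())

# Build a small throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("whale\nship\nwhaling\n", encoding="utf-8")
    (root / "sub" / "b.txt").write_text("harbor\n", encoding="utf-8")
    (root / "notes.md").write_text("whale\n", encoding="utf-8")

    # Find the .txt files, then loop over them and search each one.
    found = [p.name for p in find(root)]
    hits = {p.name: list(grep(r"whal(e|ing)", p)) for p in find(root)}

print(found)
print(hits)
```

The loop over `find(root)` plays the role of a shell `for` loop over file names, with `grep` applied to each file in turn.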

12:30–2:00: Lunch

2:00–3:30: Python clinic 1

The first day of the Python clinic introduces the basics of text processing, in preparation for the more complex textual analysis that follows later in this unit. We will introduce and discuss the Natural Language Toolkit (NLTK) and the role it plays in processing text files. Participants will also gain familiarity with the Jupyter Notebook environment.

Time | Topic | Type
15 min | Data prep and Jupyter tips | Discussion
30 min | Python basics | Presentation
20 min | Using NLTK | Presentation
25 min | Processing a single text file | Code lab
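
As a minimal sketch of what processing a single text file can look like, the example below reads a file, splits it into lowercase word tokens, and counts them. It deliberately uses only the Python standard library, with a simple regular expression standing in for NLTK's tokenizers, so that it runs without any installation. The file name, sample sentence, and function names are assumptions made for illustration.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(path):
    """Read one text file and count how often each token occurs."""
    text = Path(path).read_text(encoding="utf-8")
    return Counter(tokenize(text))

# Write a tiny sample file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("The whale, the ship, and the sea.\n", encoding="utf-8")
    counts = word_frequencies(sample)
    top_word, top_count = counts.most_common(1)[0]

print(top_word, top_count)
```

With NLTK installed, the regular-expression tokenizer could be swapped for one of NLTK's own (for example `nltk.word_tokenize`) and `Counter` for `nltk.FreqDist`, while the overall read, tokenize, and count shape stays the same.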

3:30–4:00: Coffee break

4:00–5:30: Review

Time | Topic | Type
90 min | Review | Talk lab

We’ll end each day with a request for feedback, based on a general version of the day’s outcome goals, and we’ll try to adapt on the fly to your responses. Please complete Week 1, Day 3 feedback (just copy and paste it into a plain-text document) and email your response to Kaylen at kaylensanders@pitt.edu with the subject heading “Week 1, Day 3 feedback”.