Week 1, Day 3: Wednesday, July 12
Synopsis
Day three kicks off with a deeper look at refining regular expressions to tailor
patterns to specific research needs. It then introduces grep, which applies
regular expressions on the command line. The day closes with an introduction to Python
and the basics of its text processing capabilities.
Outcome goals
- Creating regular expressions for more complicated match patterns
- Using regular expressions and the command line together
- Facility with NLTK
- Processing a text file in Python
- Facility with Python
Legend
- Presentation: by instructors
- Discussion: instructors and participants
- Talk lab: participants discuss or plan in small groups
- Code lab: participants code alone or in small groups
9:00–10:30: Regular expressions 2
Last session we covered simple patterns and repetition. We also did some exercises on this using egrep. Today, we cover alternation and grouping before we continue using egrep with more advanced expressions. We then compare Regex in egrep and Python.
Time | Topic | Type |
---|---|---|
30 min | Alternation | Code lab |
30 min | Grouping | Code lab |
30 min | Comparison to Python | Code lab |
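
As a rough preview of the alternation and grouping topics, here is a minimal sketch using Python's built-in re module. The sample lines and both patterns are invented for illustration and are not taken from the session materials. The first pattern is similar in spirit to what one might pass to egrep, while the second shows how parentheses let Python hand back only the captured pieces.

```python
import re

# Hypothetical sample lines, invented for illustration only.
lines = [
    "The colour of the sky",
    "The color of the sea",
    "A grey cat and a gray dog",
    "Nothing to match here",
]

# Alternation: the | operator matches either branch.
# (?:...) groups the a/e choice without capturing it.
alternation = re.compile(r"colou?r|gr(?:a|e)y")

# Grouping: each pair of plain parentheses captures part of the match.
grouping = re.compile(r"gr(a|e)y (\w+)")

# Keep only the lines that contain a match, much as egrep prints matching lines.
matching_lines = [line for line in lines if alternation.search(line)]

# With two capturing groups, findall returns one tuple per match.
captured = grouping.findall("A grey cat and a gray dog")

print(matching_lines)
print(captured)
```

One difference worth noticing: egrep reports whole matching lines, whereas Python can return just the captured groups, which is often more convenient when extracting data.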
10:30–11:00: Coffee break
11:00–12:30: Command line 3
With regular expressions under our belts, we are ready to learn advanced grep skills for the command line. We will explore how to search for lines that match a pattern and how to exploit advanced features (and work around complications) of this utility. We will also learn about the find command.
Time | Topic | Type |
---|---|---|
15 min | Working with files | Code lab |
15 min | Looping with for | Code lab |
45 min | Matching things with grep | Code lab |
15 min | Finding files with find | Code lab |
12:30–2:00: Lunch
2:00–3:30: Python clinic 1
The first day of the Python clinic will introduce the basics of text processing, in preparation for more complex textual analysis in the continuation of this unit. We will introduce and discuss the Natural Language Toolkit (NLTK) and the role it plays in processing text files. Participants will also gain familiarity with the Jupyter Notebook environment.
Time | Topic | Type |
---|---|---|
15 min | Data prep and Jupyter tips | Discussion |
30 min | Python basics | Presentation |
20 min | Using NLTK | Presentation |
25 min | Processing a single text file | Code lab |
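
To give a sense of what processing a single text file can look like, here is a minimal sketch that reads a file, splits it into lowercase word tokens, and counts them. It uses only the Python standard library rather than NLTK, so the tokenizing regular expression is a simple stand-in for an NLTK tokenizer. The function name word_frequencies, the file name sample.txt, and the sample sentence are all hypothetical.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def word_frequencies(path):
    """Read a UTF-8 text file and count its lowercase word tokens."""
    text = Path(path).read_text(encoding="utf-8")
    # Simple stand-in tokenizer: runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Write a small hypothetical file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("The cat sat. The cat ran.\n", encoding="utf-8")
    counts = word_frequencies(sample)

# Show the two most frequent tokens with their counts.
print(counts.most_common(2))
```

In a Jupyter Notebook, the function definition and the call would typically live in separate cells, which makes it easy to rerun the count on a different file.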
3:30–4:00: Coffee break
4:00–5:30: Review
Time | Topic | Type |
---|---|---|
90 min | Review | Talk lab |
We’ll end each day with a request for feedback, based on a general version of the day’s outcome goals, and we’ll try to adapt on the fly to your responses. Please complete Week 1, Day 3 feedback (just copy and paste it into a plain-text document) and email your response to Kaylen at kaylensanders@pitt.edu with the subject heading “Week 1, Day 3 feedback”.