NEH Institute materials

July 2017

Command line 3

Working with files

Review

New

Looping with for

for syntax

A job for for

For more information about substring removal, see http://wiki.bash-hackers.org/syntax/pe#substring_removal (including the section immediately below, subtitled “How the heck does that help to make my life easier?”).
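As a minimal sketch of how a for loop and substring removal fit together, the following makes a .bak copy of every .txt file in a throwaway directory. The file names (notes.txt, draft.txt) and the variable names are hypothetical, chosen only for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/draft.txt"

for path in "$workdir"/*.txt
do
    name=${path##*/}     # remove the longest leading match of */ (the directory part)
    stem=${name%.txt}    # remove the shortest trailing match of .txt (the extension)
    cp "$path" "$workdir/$stem.bak"
    echo "$name -> $stem.bak"
done
```

The # and ## forms remove from the front of the value; % and %% remove from the end. The doubled forms remove the longest match, the single forms the shortest.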

Finding things with grep

Useful command switches for grep

grep activity

In data-shell/writing, working with haiku.txt:

Matching strings

Leaving aside haiku.txt for the moment, find all files in the writing directory that contain ‘the’. Compare:

Not every command has an -s switch. Inside data-shell try:

wc *

Leaving aside haiku.txt for the moment, list the names of all files in the writing directory or its subdirectories that contain the (case-insensitive). List just the filenames, not the matching lines. The answer should be:

data/LittleWomen.txt
data/one.txt
data/two.txt
haiku.txt
tools/format

Now get the count. The answer should be:

data/LittleWomen.txt:8418
data/one.txt:28
data/two.txt:142
haiku.txt:4
old/.gitkeep:0
thesis/empty-draft.md:0
tools/format:1
tools/old/oldtool:0
tools/stats:0

Notice the files with 0 hits are included. We’ll get rid of those shortly.

Matching regex

grep searches for a regex pattern, not just a literal string. So, in haiku.txt:

Greedy and non-greedy matching

The task is to find all quoted text in data-shell/writing/data/LittleWomen.txt. Try:

grep -En --color '".*"' LittleWomen.txt

and scroll up to line 12378. What’s the problem? Regex matches are greedy by default; they prefer the longest possible match.

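To fix it, make the match non-greedy with the following (the *? form works under -E in the grep that ships with macOS; GNU grep needs -P in place of -E):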

grep -En --color '".*?"' LittleWomen.txt

That doesn’t fix all the problems, though. Scroll up to line 15624; what’s the problem there? (It can’t be fixed in grep, but there are regex contexts that can deal with it.)
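For quotations that open and close on the same line, a negated character class is a portable alternative that does not depend on non-greedy support. The sketch below uses a small hypothetical sample file rather than LittleWomen.txt, and adds -o so that only the matched text is printed.

```shell
# Hypothetical sample text, not taken from LittleWomen.txt.
sample=$(mktemp)
printf '%s\n' 'She said "yes" and then "no" again.' 'No quotes here.' > "$sample"

# [^"]* matches any run of characters that are not a double quote,
# so each match stops at the first closing quote it meets.
quotes=$(grep -Eo '"[^"]*"' "$sample")
echo "$quotes"
```

This prints "yes" and "no" on separate lines. It does nothing for a quotation that continues onto the next line, because grep still examines one line at a time.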

Inverting a search with grep -v

Previously we counted the lines that contain the (case-insensitive) in each file in the writing directory and its subdirectories. The result was:

data/LittleWomen.txt:8418
data/one.txt:28
data/two.txt:142
haiku.txt:4
old/.gitkeep:0
thesis/empty-draft.md:0
tools/format:1
tools/old/oldtool:0
tools/stats:0

Now: Pipe this output into a second grep command to get rid of the lines that report zero hits.
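One possible shape for such a filter, sketched on hypothetical file:count lines rather than on the real output (data/ten.txt is an invented name, included to show why the pattern needs care):

```shell
# Hypothetical counts in the same file:count shape as the listing above.
counts=$(printf '%s\n' 'haiku.txt:4' 'old/.gitkeep:0' 'tools/format:1' 'data/ten.txt:10')

# -v inverts the match: keep only the lines that do NOT end in ":0".
# The colon matters; a bare '0$' would also drop a count such as 10.
nonzero=$(echo "$counts" | grep -v ':0$')
echo "$nonzero"
```

Anchoring the pattern with $ ties it to the end of the line, so a 0 that appears in a filename is left alone.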

grep complications

Sibling rivalry

for sis in Jo Meg Beth Amy
do
	echo "$sis:"
	grep -ocw "$sis" LittleWomen.txt
done
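One complication worth seeing in isolation: -c counts matching lines, not individual matches, so a name that appears twice on one line is counted once. A minimal sketch on a hypothetical two-line sample, comparing the line count with a count of matches obtained by piping grep -o into wc -l:

```shell
# Hypothetical sample: "Jo" appears three times, on two lines.
sample=$(mktemp)
printf '%s\n' 'Jo met Jo at the gate.' 'Beth waved to Jo.' > "$sample"

# -c reports the number of lines that contain a match.
line_count=$(grep -cw Jo "$sample")

# -o prints each match on its own line, so wc -l counts matches.
# tr strips the padding that some versions of wc add around the number.
match_count=$(grep -ow Jo "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
```

Here the two numbers differ (2 lines, 3 matches), which is the gap to keep in mind when comparing the sisters.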

Finding files with find

grep finds lines that contain text. find finds files with filenames that match a string.

Warning: find is recursive. Run it at the top of your filesystem and it will look at every file, which can take a long time.
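A minimal sketch of keeping find's scope small by giving it a specific starting directory. The directory tree and file names below are hypothetical and are created in a throwaway location.

```shell
# Throwaway directory tree with hypothetical file names.
root=$(mktemp -d)
mkdir -p "$root/writing/data"
touch "$root/writing/haiku.txt" "$root/writing/data/one.txt" "$root/writing/data/notes.md"

# Start find in a specific directory rather than / so it walks only that subtree.
# Quote the pattern so the shell passes *.txt to find instead of expanding it.
found=$(cd "$root" && find writing -name '*.txt' | sort)
echo "$found"
```

find descends into writing/data on its own; the .md file is skipped because it does not match the -name pattern.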

Practice with find

Inside writing:

Combining find with other commands
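As one hedged illustration of combining find with other commands, the sketch below counts the total number of lines in all .txt files under a directory. The files are hypothetical and live in a throwaway directory.

```shell
# Throwaway directory with hypothetical files: two .txt files and one .md file.
root=$(mktemp -d)
mkdir -p "$root/data"
printf 'one\ntwo\n' > "$root/a.txt"
printf 'three\n' > "$root/data/b.txt"
touch "$root/data/skip.md"

# -type f restricts the results to regular files (no directories).
# -exec cat {} + hands the matching file names to cat in batches;
# the combined text is then piped to wc -l for a single total.
total=$(find "$root" -type f -name '*.txt' -exec cat {} + | wc -l | tr -d ' ')
echo "total lines in .txt files: $total"
```

Because find selects the files and the pipeline does the counting, the same pattern works however deeply the files are nested.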