More Bash Scripting

Pipes and Filters


Capturing output from commands

~~~bash
clear
cd
pwd
wget --no-check-certificate https://www.cs.wcupa.edu/lngo/data/shell-lesson-data.zip
unzip shell-lesson-data.zip
cd ~/shell-lesson-data/exercise-data/proteins
ls -l *.pdb
~~~
List files in current directory
~~~bash
man wc
wc *.pdb
wc -l *.pdb
~~~

~~~bash
ls
wc -l *.pdb > lengths.txt
ls
cat lengths.txt
wc -l *.pdb >> lengths.txt
cat lengths.txt
wc -l *.pdb > lengths.txt
cat lengths.txt
~~~
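
The difference between `>` and `>>` can be checked on throwaway files. The following is a minimal sketch, assuming a scratch directory from `mktemp -d` and two made-up files, `two.txt` and `three.txt`, that are not part of the lesson data:

```bash
# Work in a scratch directory so the lesson data is not touched.
workdir=$(mktemp -d)
cd "$workdir"

printf 'a\nb\n' > two.txt        # hypothetical two-line file
printf 'a\nb\nc\n' > three.txt   # hypothetical three-line file

# > creates lengths.txt, or overwrites it if it already exists.
wc -l two.txt three.txt > lengths.txt
first_size=$(wc -l < lengths.txt)

# >> appends to the end of the existing file.
wc -l two.txt three.txt >> lengths.txt
appended_size=$(wc -l < lengths.txt)

# > again throws away the old contents.
wc -l two.txt three.txt > lengths.txt
final_size=$(wc -l < lengths.txt)

echo "lines in lengths.txt: $first_size, then $appended_size, then $final_size"
```

`wc -l` prints one line per file plus a `total` line, so `lengths.txt` holds 3 lines after the first command, 6 after the append, and 3 again after the final overwrite.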

Filtering output

~~~bash
man sort
~~~
Challenge: what does sort -n do?
~~~bash
sort ~/shell-lesson-data/exercise-data/numbers.txt
10
19
2
22
6
~~~

~~~bash
sort -n ~/shell-lesson-data/exercise-data/numbers.txt
2
6
10
19
22
~~~
Solution
~~~bash
sort -n lengths.txt
sort -n lengths.txt > sorted-lengths.txt
cat sorted-lengths.txt
~~~

~~~bash
head -n 1 sorted-lengths.txt
~~~

Passing output to another command

~~~bash
sort -n lengths.txt | head -n 1
~~~

~~~bash
wc -l *.pdb | sort -n | head -n 1
~~~
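
A pipeline gives the same answer as the step-by-step version with intermediate files. Here is a small sketch that compares the two, assuming three made-up files (`long.txt`, `medium.txt`, `short.txt`) in a scratch directory rather than the lesson's `.pdb` files:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input files with 3, 2 and 1 lines.
printf '1\n2\n3\n' > long.txt
printf '1\n2\n' > medium.txt
printf '1\n' > short.txt

# Step-by-step version: every stage writes a file that the next stage reads.
wc -l long.txt medium.txt short.txt > lengths.txt
sort -n lengths.txt > sorted-lengths.txt
via_files=$(head -n 1 sorted-lengths.txt)

# Pipeline version: each | feeds one command's output straight into the next.
via_pipe=$(wc -l long.txt medium.txt short.txt | sort -n | head -n 1)

echo "with files: $via_files"
echo "with pipes: $via_pipe"
```

Both variables end up holding the line for `short.txt`, but the pipeline leaves no `lengths.txt` or `sorted-lengths.txt` behind to clean up.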
Challenge: piping commands together
  1. wc -l * > sort -n > head -n 3
  2. wc -l * | sort -n | head -n 1-3
  3. wc -l * | head -n 3 | sort -n
  4. wc -l * | sort -n | head -n 3
Solution
Challenge: pipe reading comprehension
~~~bash
cat ~/shell-lesson-data/exercise-data/animal-counts/animals.csv
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
~~~

~~~bash
cat animals.csv | head -n 5 | tail -n 3 | sort -r > final.txt
~~~

Solution
~~~
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-05,raccoon,7
~~~
Challenge: pipe construction
~~~bash
man cut
cut -d , -f 2 animals.csv
~~~
Solution
~~~bash
cut -d , -f 2 animals.csv | sort | uniq
~~~
Challenge: which pipe?
~~~
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
...
~~~

The uniq command has a -c option which gives a count of the number of times a line occurs in its input. Assuming your current directory is shell-lesson-data/exercise-data/animal-counts, what command would you use to produce a table that shows the total count of each type of animal in the file?

  1. sort animals.csv | uniq -c
  2. sort -t, -k2,2 animals.csv | uniq -c
  3. cut -d, -f 2 animals.csv | uniq -c
  4. cut -d, -f 2 animals.csv | sort | uniq -c
  5. cut -d, -f 2 animals.csv | sort | uniq -c | wc -l
Solution

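Option 4 is the correct answer. Because uniq only collapses identical lines that are adjacent, the species column has to be extracted with cut and sorted before uniq -c can count it.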
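
The effect of leaving out `sort` is easy to see on the eight records shown earlier. This is a sketch that copies those records into a temporary file (the `mktemp` file is an assumption made so the example runs anywhere) and counts how many rows each pipeline produces:

```bash
# Recreate the sample records in a temporary file.
animals=$(mktemp)
cat > "$animals" <<'EOF'
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
2012-11-06,fox,4
2012-11-07,rabbit,16
2012-11-07,bear,1
EOF

# Without sort: no two neighbouring lines match, so nothing is merged.
unsorted_rows=$(cut -d, -f 2 "$animals" | uniq -c | wc -l)

# With sort: repeated species become adjacent and are counted together.
sorted_rows=$(cut -d, -f 2 "$animals" | sort | uniq -c | wc -l)

echo "rows without sort: $unsorted_rows"
echo "rows with sort:    $sorted_rows"

# The table itself: one row per species with its number of records.
cut -d, -f 2 "$animals" | sort | uniq -c
```

Without `sort` the table has 8 rows, one per record; with `sort` it has 5, one per species. Note that `uniq -c` counts how many records mention each animal; it does not add up the numbers in the third column.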


Nelle’s Pipeline: Checking Files

~~~bash
cd ~/shell-lesson-data/north-pacific-gyre
ls -l
~~~

~~~bash
wc -l *.txt | sort -n | head -n 5
~~~

~~~bash
ls *Z.txt
~~~

Loop

Suppose we have several hundred genome data files with names such as basilisk.dat, minotaur.dat, and unicorn.dat. For this example, we’ll use the exercise-data/creatures directory, which has only three example files, but the same principles apply to many more files at once.

These files all share the same structure. Let’s look at the first few lines of each:

~~~bash
cd ~/shell-lesson-data/exercise-data/creatures/
head -n 5 basilisk.dat minotaur.dat unicorn.dat
~~~
Viewing DNA contents of mystical creatures
~~~bash
for thing in list_of_things
do
    operation_using $thing    # Indentation within the loop is not required, but aids legibility
done
~~~

and we can apply this to our example like this:

~~~bash
for filename in basilisk.dat minotaur.dat unicorn.dat
> do
>   head -n 2 $filename | tail -n 1
> done
~~~
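
The same loop can be tried without the lesson data. The sketch below is written as it would appear in a script file, so it has no `>` continuation prompts; the three two-line files it creates are made-up stand-ins for the real creature files:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the creature files, two lines each.
printf 'NAME: basilisk\nCLASS: reptile\n' > basilisk.dat
printf 'NAME: minotaur\nCLASS: mammal\n' > minotaur.dat
printf 'NAME: unicorn\nCLASS: equine\n' > unicorn.dat

# On each pass, $filename holds the next name from the list.
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    # head keeps the first two lines, tail keeps the last of those.
    head -n 2 "$filename" | tail -n 1
done > second-lines.txt   # a redirect after done captures the whole loop's output

cat second-lines.txt
```

The loop body runs three times, so `second-lines.txt` ends up with the second line of each file, in the order the names were listed.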

to match all files ending in .pdb and then lists them using ls.

Challenge: limiting sets of files
~~~bash
cd ~/shell-lesson-data/exercise-data/proteins/
for filename in c*
> do
>   ls $filename
> done
~~~
  1. No files are listed.
  2. All files are listed.
  3. Only cubane.pdb, octane.pdb and pentane.pdb are listed.
  4. Only cubane.pdb is listed.
~~~bash
cd ~/shell-lesson-data/exercise-data/proteins/
for filename in *c*
> do
>   ls $filename
> done
~~~
  1. The same files would be listed.
  2. All the files are listed this time.
  3. No files are listed this time.
  4. The files cubane.pdb and octane.pdb will be listed.
  5. Only the file octane.pdb will be listed.
Solution
Challenge: saving to a file in a Loop
~~~bash
cd ~/shell-lesson-data/exercise-data/proteins/
for alkanes in *.pdb
> do
>   echo $alkanes
>   cat $alkanes > alkanes.pdb
> done
~~~
  1. Prints cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
  2. Prints cubane.pdb, ethane.pdb, and methane.pdb, and the text from all three files would be concatenated and saved to a file called alkanes.pdb.
  3. Prints cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb, and the text from propane.pdb will be saved to a file called alkanes.pdb.
  4. None of the above.
~~~bash
cd ~/shell-lesson-data/exercise-data/proteins/
for datafile in *.pdb
> do
>   cat $datafile >> all.pdb
> done
~~~
  1. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, and pentane.pdb would be concatenated and saved to a file called all.pdb.
  2. The text from ethane.pdb will be saved to a file called all.pdb.
  3. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be concatenated and saved to a file called all.pdb.
  4. All of the text from cubane.pdb, ethane.pdb, methane.pdb, octane.pdb, pentane.pdb and propane.pdb would be printed to the screen and saved to a file called all.pdb.
Solution

More complicated loop

~~~bash
cd ~/shell-lesson-data/exercise-data/creatures
for filename in *.dat
> do
>   echo $filename
>   head -n 100 $filename | tail -n 20
> done
~~~
~~~bash
cp *.dat original-*.dat
~~~

because that would expand to:

~~~bash
cp basilisk.dat minotaur.dat unicorn.dat original-*.dat
~~~

This wouldn’t back up our files; instead, we get an error:

~~~
cp: target `original-*.dat' is not a directory
~~~

~~~bash
for filename in *.dat
> do
>   cp $filename original-$filename
> done
~~~

The following diagram shows what happens when the modified loop is executed, and demonstrates how the judicious use of echo is a good debugging technique.
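
Using `echo` as a dry run can be sketched as follows. The example assumes three empty placeholder files in a scratch directory instead of the real creature data; it first prints the `cp` commands the loop would run, then runs them:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Empty placeholder files standing in for the real .dat files.
touch basilisk.dat minotaur.dat unicorn.dat

# Dry run: echo prints each cp command instead of executing it.
dry_run=$(for filename in *.dat
do
    echo cp "$filename" "original-$filename"
done)
echo "$dry_run"

# Real run: same loop with echo removed.
# *.dat is expanded once, before the first copy is made,
# so the new original-*.dat files are not copied again.
for filename in *.dat
do
    cp "$filename" "original-$filename"
done

ls
```

After the real run the directory holds six files: the three originals and their three `original-` copies.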


Nelle’s Pipeline: Processing Files

Nelle is now ready to process her data files using goostats.sh — a shell script written by her supervisor. This calculates some statistics from a protein sample file, and takes two arguments:

  1. an input file (containing the raw data)
  2. an output file (to store the calculated statistics)

Since she’s still learning how to use the shell, she decides to build up the required commands in stages. Her first step is to make sure that she can select the right input files — remember, these are ones whose names end in ‘A’ or ‘B’, rather than ‘Z’. Starting from her home directory, Nelle types:

~~~bash
cd ~/shell-lesson-data/north-pacific-gyre
for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile
> done
~~~

Her next step is to decide what to call the files that the goostats.sh analysis program will create. Prefixing each input file’s name with ‘stats’ seems simple, so she modifies her loop to do that:

~~~bash
for datafile in NENE*A.txt NENE*B.txt
> do
>     echo $datafile stats-$datafile
> done
~~~

She hasn’t actually run goostats.sh yet, but now she’s sure she can select the right files and generate the right output filenames.

Typing in commands over and over again is becoming tedious, though, and Nelle is worried about making mistakes, so instead of re-entering her loop, she presses the up arrow key (↑). In response, the shell redisplays the whole loop on one line (using semicolons to separate the pieces):

~~~bash
for datafile in NENE*A.txt NENE*B.txt; do echo $datafile stats-$datafile; done
~~~

Using the left arrow key, Nelle backs up and changes the command echo to bash goostats.sh:

~~~bash
for datafile in NENE*A.txt NENE*B.txt; do bash goostats.sh $datafile stats-$datafile; done
~~~

When she presses Enter, the shell runs the modified command. However, nothing appears to happen: there is no output. After a moment, Nelle realizes that since her script doesn’t print anything to the screen any longer, she has no idea whether it is running, much less how quickly. She kills the running command by typing Ctrl+C, uses the up arrow key (↑) to repeat the command, and edits it to read:

~~~bash
for datafile in NENE*A.txt NENE*B.txt; do echo $datafile;
bash goostats.sh $datafile stats-$datafile; done
~~~

<details class="details details--default" data-variant="default"><summary>Beginning and End</summary>
<ul>
  <li>We can move to the beginning of a line in the shell by typing 
<kbd>Ctrl</kbd>+<kbd>A</kbd> and to the end using <kbd>Ctrl</kbd>+<kbd>E</kbd>.</li>
</ul>

</details>
When she runs her program now, it produces one line of output every five seconds or so.
1518 times 5 seconds, divided by 60, tells her that her script will take about two hours to run.
As a final check, she opens another terminal window, goes into `north-pacific-gyre`,
and uses `cat stats-NENE01729B.txt` to examine one of the output files.
It looks good, so she decides to get some coffee and catch up on her reading.
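
Nelle's estimate can be reproduced with the shell's built-in integer arithmetic. This is only a sketch of the calculation; the variable names are made up for the example:

```bash
files=1518            # number of data files, from the text above
seconds_per_file=5    # roughly one line of output every five seconds

total_seconds=$(( files * seconds_per_file ))
total_minutes=$(( total_seconds / 60 ))   # integer division drops the remainder

echo "$total_seconds seconds is about $total_minutes minutes"
```

The result is 7590 seconds, or 126 whole minutes, which is where the figure of about two hours comes from.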

<details class="details details--default" data-variant="default"><summary>Those Who Know History Can Choose to Repeat It</summary>
<p>Another way to repeat previous work is to use the <code class="language-plaintext highlighter-rouge">history</code> command to 
get a list of the last few hundred commands that have been executed, and 
then to use <code class="language-plaintext highlighter-rouge">!123</code> (where ‘123’ is replaced by the command number) to 
repeat one of those commands. For example, if Nelle types this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">history</span> | <span class="nb">tail</span> <span class="nt">-n</span> 5
  456  <span class="nb">ls</span> <span class="nt">-l</span> NENE0<span class="k">*</span>.txt
  457  <span class="nb">rm </span>stats-NENE01729B.txt.txt
  458  bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
  459  <span class="nb">ls</span> <span class="nt">-l</span> NENE0<span class="k">*</span>.txt
  460  <span class="nb">history</span>
</code></pre></div></div>

<p>then she can re-run <code class="language-plaintext highlighter-rouge">goostats.sh</code> on <code class="language-plaintext highlighter-rouge">NENE01729B.txt</code> simply by typing
<code class="language-plaintext highlighter-rouge">!458</code>.</p>

</details>
<details class="details details--default" data-variant="default"><summary>Challenge: doing a dry run</summary>
<ul>
  <li>A loop is a way to do many things at once — or to make many mistakes at
once if it does the wrong thing. One way to check what a loop <em>would</em> do
is to <code class="language-plaintext highlighter-rouge">echo</code> the commands it would run instead of actually running them.</li>
  <li>Suppose we want to preview the commands the following loop will execute
without actually running those commands:</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>datafile <span class="k">in</span> <span class="k">*</span>.pdb
<span class="o">&gt;</span> <span class="k">do</span>
<span class="o">&gt;</span>   <span class="nb">cat</span> <span class="nv">$datafile</span> <span class="o">&gt;&gt;</span> all.pdb
<span class="o">&gt;</span> <span class="k">done</span>
</code></pre></div></div>

<ul>
  <li>What is the difference between the two loops below, and which one would we
want to run?</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Version 1</span>
<span class="k">for </span>datafile <span class="k">in</span> <span class="k">*</span>.pdb
<span class="o">&gt;</span> <span class="k">do</span>
<span class="o">&gt;</span>   <span class="nb">echo cat</span> <span class="nv">$datafile</span> <span class="o">&gt;&gt;</span> all.pdb
<span class="o">&gt;</span> <span class="k">done</span>
</code></pre></div></div>

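<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Version 2</span>
<span class="k">for </span>datafile <span class="k">in</span> <span class="k">*</span>.pdb
<span class="o">&gt;</span> <span class="k">do</span>
<span class="o">&gt;</span>   <span class="nb">echo</span> <span class="s2">"cat </span><span class="nv">$datafile</span><span class="s2"> &gt;&gt; all.pdb"</span>
<span class="o">&gt;</span> <span class="k">done</span>
</code></pre></div></div>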

<details class="details details--note" data-variant="note"><summary>Solution</summary>
<ul>
  <li>The second version is the one we want to run.
This prints to screen everything enclosed in the quote marks, expanding the 
loop variable name because we have prefixed it with a dollar sign. 
It also <em>does not</em> modify nor create the file <code class="language-plaintext highlighter-rouge">all.pdb</code>, as the <code class="language-plaintext highlighter-rouge">&gt;&gt;</code> 
is treated literally as part of a string rather than as a 
redirection instruction.</li>
  <li>The first version appends the output from the command <code class="language-plaintext highlighter-rouge">echo cat $datafile</code> 
to the file, <code class="language-plaintext highlighter-rouge">all.pdb</code>. This file will just contain the list; 
<code class="language-plaintext highlighter-rouge">cat cubane.pdb</code>, <code class="language-plaintext highlighter-rouge">cat ethane.pdb</code>, <code class="language-plaintext highlighter-rouge">cat methane.pdb</code> etc.</li>
  <li>Try both versions for yourself to see the output! Be sure to change to the 
proper directory and open <code class="language-plaintext highlighter-rouge">all.pdb</code> file to view its contents.</li>
</ul>

</details>
</details>
<details class="details details--default" data-variant="default"><summary>Challenge: nested loops</summary>
<ul>
  <li>Suppose we want to set up a directory structure to organize 
some experiments measuring reaction rate constants with different compounds 
<em>and</em> different temperatures.  What would be the result of the following code:</li>
</ul>

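<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>species <span class="k">in</span> cubane ethane methane
<span class="o">&gt;</span> <span class="k">do</span>
<span class="o">&gt;</span>     <span class="k">for </span>temperature <span class="k">in</span> 25 30 37 40
<span class="o">&gt;</span>     <span class="k">do</span>
<span class="o">&gt;</span>         <span class="nb">mkdir</span> <span class="nv">$species</span>-<span class="nv">$temperature</span>
<span class="o">&gt;</span>     <span class="k">done</span>
<span class="o">&gt;</span> <span class="k">done</span>
</code></pre></div></div>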

<details class="details details--note" data-variant="note"><summary>Solution</summary>
<ul>
  <li>We have a nested loop, i.e. contained within another loop, so for each species
in the outer loop, the inner loop (the nested loop) iterates over the list of
temperatures, and creates a new directory for each combination.</li>
  <li>Try running the code for yourself to see which directories are created!</li>
</ul>

</details>
</details>
---

## Shell scripting

- Let's start by going back to `~/shell-lesson-data/exercise-data/proteins` and creating a new file,
`middle.sh`, which will become our shell script:

~~~bash
cd ~/shell-lesson-data/exercise-data/proteins
nano middle.sh
cat middle.sh
~~~

- Add the following line to `middle.sh` and save:
  - `head -n 15 octane.pdb | tail -n 5`
- Once we have saved the file, we can ask the shell to execute the commands it contains.
Our shell is called `bash`, so we run the following command:

~~~bash
bash middle.sh
~~~

    



<figure>
  <picture>
    <source
      class="responsive-img-srcset"
      srcset="/assets/img/courses/csc586/09-scripting-linux/script-middle-480.webp 480w,/assets/img/courses/csc586/09-scripting-linux/script-middle-800.webp 800w,/assets/img/courses/csc586/09-scripting-linux/script-middle-1400.webp 1400w,"
      type="image/webp"
      sizes="95vw"
    >
    <img
      src="/assets/img/courses/csc586/09-scripting-linux/script-middle.png"
      width="50%"
      height="auto"
      data-zoomable
      loading="lazy"
      onerror="this.onerror=null; $('.responsive-img-srcset').remove();"
    >
  </picture>
</figure>

    

<details class="details details--default" data-variant="default"><summary>Text vs. Whatever</summary>
<p>We usually call programs like Microsoft Word or LibreOffice Writer <em>text 
editors</em>, but we need to be a bit more careful when it comes to 
programming. By default, Microsoft Word uses <code class="language-plaintext highlighter-rouge">.docx</code> files to store not 
only text, but also formatting information about fonts, headings, and so 
on. This extra information isn’t stored as characters and doesn’t mean 
anything to tools like <code class="language-plaintext highlighter-rouge">head</code>: they expect input files to contain 
nothing but the letters, digits, and punctuation on a standard computer 
keyboard. When editing programs, therefore, you must either use a plain 
text editor, or be careful to save files as plain text.</p>

</details>
- What if we want to select lines from an arbitrary file? We could edit 
`middle.sh` each time to change the filename, but that would probably 
take longer than typing the command out again in the shell and 
executing it with a new file name. Instead, let's edit `middle.sh` 
and make it more versatile:
  - Edit `middle.sh` and replace the text `octane.pdb` with the special variable called `$1`. 
    - Wrap `$1` inside double quotes: `"$1"`. 
  - `$1` means 'the first filename (or other argument) on the command line'.
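
The reason for the double quotes shows up as soon as a file name contains a space. The sketch below assumes a made-up file, `my data.txt`, holding the numbers 1 to 20, and writes a one-line `middle.sh` into a scratch directory:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical data file whose name contains a space; lines are 1..20.
seq 1 20 > "my data.txt"

# middle.sh: print lines 11-15 of the file named by the first argument.
cat > middle.sh <<'EOF'
head -n 15 "$1" | tail -n 5
EOF

# "$1" reaches head as one word, so the space in the name is harmless.
result=$(bash middle.sh "my data.txt")
echo "$result"
```

With `"$1"` the script prints lines 11 through 15. With an unquoted `$1`, the shell would split the name into `my` and `data.txt`, and `head` would look for two files that do not exist.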

~~~bash
nano middle.sh
cat middle.sh
bash middle.sh octane.pdb
bash middle.sh pentane.pdb
~~~

~~~bash
nano middle.sh
cat middle.sh
bash middle.sh pentane.pdb 15 5
~~~

~~~bash
bash middle.sh pentane.pdb 20 5
~~~

~~~bash
wc -l *.pdb | sort -n
~~~

~~~bash
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
~~~

~~~bash
cd ~/shell-lesson-data/exercise-data/proteins
nano sorted.sh
cat sorted.sh
bash sorted.sh *.pdb ../creatures/*.dat
~~~

~~~bash
#!/bin/bash
~~~

~~~bash
chmod 755 sorted.sh
./sorted.sh
~~~
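
Putting the pieces together, a script with a `#!/bin/bash` first line and execute permission can be run by path instead of through `bash`. This is a sketch under the assumption of a scratch directory and two made-up input files, `three.txt` and `one.txt`:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input files with 3 lines and 1 line.
printf '1\n2\n3\n' > three.txt
printf '1\n' > one.txt

# The first line tells the system which program should run the script.
cat > sorted.sh <<'EOF'
#!/bin/bash
# Sort files by their length.
# Usage: ./sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
EOF

chmod 755 sorted.sh    # owner may read/write/execute; everyone else read/execute

# "$@" inside the script expands to every argument, each kept as one word.
shortest=$(./sorted.sh three.txt one.txt | head -n 1)
echo "$shortest"
```

The first line of the sorted output is the entry for `one.txt`. If `./sorted.sh` is run with no arguments at all, `wc -l` reads from standard input instead, so the script appears to hang until input ends with Ctrl+D or it is interrupted with Ctrl+C.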

<details class="details details--default" data-variant="default"><summary>Challenge: list unique species</summary>
<ul>
  <li>Leah has several hundred data files, each of which is formatted like this:</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
</code></pre></div></div>

<ul>
  <li>An example of this type of file is given in 
<code class="language-plaintext highlighter-rouge">shell-lesson-data/exercise-data/animal-counts/animals.csv</code>.</li>
  <li>We can use the command <code class="language-plaintext highlighter-rouge">cut -d , -f 2 animals.csv | sort | uniq</code> to produce 
the unique species in <code class="language-plaintext highlighter-rouge">animals.csv</code>.</li>
  <li>In order to avoid having to type out this series of commands every time, 
a scientist may choose to write a shell script instead.</li>
  <li>Write a shell script called <code class="language-plaintext highlighter-rouge">species.sh</code> that takes any number of 
filenames as command-line arguments, and uses a variation of the above command 
to print a list of the unique species appearing in each of those files separately.</li>
</ul>

<details class="details details--note" data-variant="note"><summary>Solution</summary>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Script to find unique species in csv files where species is the second data field</span>
<span class="c"># This script accepts any number of file names as command line arguments</span>
<span class="c"># Loop over all files; "$@" keeps each file name as one word, even with spaces</span>
<span class="k">for </span>file <span class="k">in</span> "<span class="nv">$@</span>"
<span class="k">do
  </span><span class="nb">echo</span> <span class="s2">"Unique species in </span><span class="nv">$file</span><span class="s2">:"</span>
  <span class="c"># Extract species names</span>
  <span class="nb">cut</span> <span class="nt">-d</span> , <span class="nt">-f</span> 2 "<span class="nv">$file</span>" | <span class="nb">sort</span> | <span class="nb">uniq
</span><span class="k">done</span>
</code></pre></div></div>

</details>
</details>
- Suppose we have just run a series of commands that did something useful --- for example,
that created a graph we'd like to use in a paper. We'd like to be able to re-create the 
graph later if we need to, so we want to save the commands in a file. 
- Instead of typing them in again (and potentially getting them wrong) we can do this:

~~~bash
history | tail -n 5 > redo-figure-3.sh
~~~

The file `redo-figure-3.sh` could now contain:

~~~
297 bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
298 bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt
299 cut -d ',' -f 2-3 01729-differences.txt > 01729-time-series.txt
300 ygraph --format scatter --color bw --borders none 01729-time-series.txt figure-3.png
301 history | tail -n 5 > redo-figure-3.sh
~~~

- After a moment's work in an editor to remove the serial numbers on the commands,
and to remove the final line where we called the `history` command, 
we have a completely accurate record of how we created that figure.
- In practice, most people develop shell scripts by running commands at the shell prompt a few 
times to make sure they're doing the right thing, then saving them in a file for re-use.
- This style of work allows people to recycle what they discover about their data and their 
workflow with one call to `history` and a bit of editing to clean up the output
and save it as a shell script.
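
The clean-up described above is done by hand in an editor, but it could also be scripted. The following is one possible sketch using `sed`, not part of the original workflow; it assumes the saved history is in a temporary file and uses a shortened, three-line version of the example output:

```bash
# A shortened copy of the saved history, numbered the way history prints it.
saved=$(mktemp)
cat > "$saved" <<'EOF'
  297  bash goostats.sh NENE01729B.txt stats-NENE01729B.txt
  298  bash goodiff.sh stats-NENE01729B.txt /data/validated/01729.txt > 01729-differences.txt
  301  history | tail -n 5 > redo-figure-3.sh
EOF

# '$d' deletes the last line (the history call itself);
# the substitution strips the leading spaces, the number and the gap after it.
cleaned=$(sed -E -e '$d' -e 's/^ *[0-9]+ +//' "$saved")

echo "$cleaned"
```

What remains are the two commands on their own lines, ready to be saved as a script. It is still worth reading the result before running it, since the history may contain commands that should not be repeated.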

---

## Nelle's Pipeline: Creating a Script

- Nelle's supervisor insisted that all her analytics must be reproducible.
The easiest way to capture all the steps is in a script.

- First we return to Nelle's project directory:

~~~bash
cd ../../north-pacific-gyre/
~~~

~~~bash
nano do-stats.sh
~~~

~~~bash
#!/bin/bash
# Calculate stats for data files.
for datafile in "$@"
do
    echo $datafile
    bash goostats.sh $datafile stats-$datafile
done
~~~

~~~bash
./do-stats.sh NENE*A.txt NENE*B.txt
~~~

~~~bash
./do-stats.sh NENE*A.txt NENE*B.txt | wc -l
~~~

~~~bash
#!/bin/bash
# Calculate stats for Site A and Site B data files.
for datafile in NENE*A.txt NENE*B.txt
do
    echo $datafile
    bash goostats.sh $datafile stats-$datafile
done
~~~

Challenge: variables in shell scripts
~~~bash
#!/bin/bash
head -n $2 $1
tail -n $3 $1
~~~

While you are in the proteins directory, you type the following command:

~~~bash
./script.sh '*.pdb' 1 1
~~~

Which of the following outputs would you expect to see?

  1. All of the lines between the first and the last lines of each file ending in .pdb in the proteins directory
  2. The first and the last line of each file ending in .pdb in the proteins directory
  3. The first and the last line of each file in the proteins directory
  4. An error because of the quotes around *.pdb
~~~bash
head -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb
tail -n 1 cubane.pdb ethane.pdb octane.pdb pentane.pdb propane.pdb
~~~

Challenge: find the longest file with a given extension
~~~bash
./longest.sh shell-lesson-data/data/pdb pdb
~~~

would print the name of the .pdb file in shell-lesson-data/data/pdb that has the most lines.

Feel free to test your script on another directory, e.g.:

~~~bash
bash longest.sh shell-lesson-data/writing/data txt
~~~

Solution
~~~bash
#!/bin/bash
# Shell script which takes two arguments:
#    1. a directory name
#    2. a file extension
# and prints the name of the file in that directory
# with the most lines which matches the file extension.
wc -l $1/*.$2 | sort -n | tail -n 2 | head -n 1
~~~
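
The `tail -n 2 | head -n 1` step is there because `wc -l` adds a `total` line when it is given more than one file, and `sort -n` places that line last. A sketch that exercises the script on two made-up files (`long.pdb` and `short.pdb` in a scratch directory; the quoting of `$1` and `$2` is added here as a precaution):

```bash
workdir=$(mktemp -d)
mkdir "$workdir/data"

# Hypothetical files: long.pdb has 3 lines, short.pdb has 1.
printf '1\n2\n3\n' > "$workdir/data/long.pdb"
printf '1\n' > "$workdir/data/short.pdb"

cat > "$workdir/longest.sh" <<'EOF'
#!/bin/bash
# $1 is a directory, $2 a file extension.
# After sort -n the last line is the total, so the longest file is second to last.
wc -l "$1"/*."$2" | sort -n | tail -n 2 | head -n 1
EOF

longest=$(bash "$workdir/longest.sh" "$workdir/data" pdb)
echo "$longest"
```

The script prints the line count followed by the path of `long.pdb`, so the output includes the count as well as the file name.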
Challenge: script reading comprehension
~~~bash
# Script 1
echo *.*
~~~

~~~bash
# Script 2
for filename in $1 $2 $3
do
  cat $filename
done
~~~

~~~bash
# Script 3
echo $@.pdb
~~~

Solution

In each case, the shell expands the wildcard in *.pdb before passing the resulting list of file names as arguments to the script.

~~~
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb.pdb
~~~

Challenge: debugging scripts
~~~bash
# Calculate stats for data files.
for datafile in "$@"
do
  echo $datfile
  bash goostats.sh $datafile stats-$datafile
done
~~~

~~~bash
bash do-errors.sh NENE*A.txt NENE*B.txt
~~~

~~~bash
bash -x do-errors.sh NENE*A.txt NENE*B.txt
~~~

Solution
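
One way to see what `-x` reveals is to run it on a cut-down copy of the script. In this sketch the `goostats.sh` call is left out so that it runs anywhere, and the file name passed in is only an example:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Cut-down copy of do-errors.sh: the loop variable is datafile,
# but the echo line misspells it as datfile.
cat > do-errors.sh <<'EOF'
# Calculate stats for data files.
for datafile in "$@"
do
  echo $datfile
done
EOF

# -x prints each command, after expansion, to standard error with a leading +.
trace=$(bash -x do-errors.sh NENE01729A.txt 2>&1)
echo "$trace"
```

The trace contains a line reading just `+ echo`, with nothing after it. The variable `$datfile` was never set, so it expands to nothing, which is why the script prints blank lines instead of file names. Correcting the name to `$datafile` fixes it.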