We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.
Let's start by putting the following lines in the file sorted_lengths.sh:
#! /bin/bash
wc -l *.pdb | sort -n > sorted_lengths.txt
The first line is a “shebang” (or “hash bang”): it tells the system which program to run the script with. In this case it is a bash script; for a Python script, one would give the path to the Python interpreter instead. The second line is our pipe-and-filter command for generating a sorted list of file line counts.
We can try to run the script:
$ ./sorted_lengths.sh
bash: ./sorted_lengths.sh: Permission denied
Not quite what we expected. We don't have permission to execute / run the file.
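The error comes from the missing execute permission, not from the script's contents. As a minimal sketch (run in a scratch directory, with a hypothetical demo.sh standing in for our script), the shell's -x test reports whether a file is executable, and handing the file to bash as an argument only requires read permission:

```bash
#!/bin/bash
# Sketch: a script without execute permission can still be read and run by bash.
# demo.sh and the scratch directory are illustrative, not part of the lesson data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '#!/bin/bash\necho "hello from demo"\n' > demo.sh
chmod u-x demo.sh   # make sure the owner's execute bit is off

if [ -x demo.sh ]; then
    status="executable"
else
    status="not executable"
fi
echo "demo.sh is $status"

# Running it through bash explicitly works, because bash only has to read the file.
output=$(bash demo.sh)
echo "$output"
```

This is a workaround rather than a fix: to run the script by name, as in ./sorted_lengths.sh, the execute permission still has to be granted.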
Unix controls who can read, modify, and run files using *permissions*.
Users can belong to any number of groups, each of which has a unique group name and numeric group ID. The list of who's in what group is usually stored in the file /etc/group.
Now let's look at files and directories. Every file and directory on a Unix computer belongs to one owner and one group. Along with each file's content, the operating system stores the numeric IDs of the user and group that own it.
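To see these names and numeric IDs on your own machine, the id command and the -n flag of ls can help. The following is a small sketch; the scratch file example.txt is made up for the illustration, and the values printed will differ from system to system:

```bash
#!/bin/bash
# Sketch: inspect the IDs that the operating system uses for ownership.
user_name=$(id -un)    # your login name
user_id=$(id -u)       # the numeric user ID behind that name
group_name=$(id -gn)   # your primary group
group_id=$(id -g)      # its numeric group ID
echo "user:  $user_name ($user_id)"
echo "group: $group_name ($group_id)"
echo "all groups: $(id -Gn)"

# ls -n is like ls -l, but prints numeric owner and group IDs instead of names.
workdir=$(mktemp -d)
touch "$workdir/example.txt"   # hypothetical file, owned by whoever runs this
ls -n "$workdir/example.txt"
```

The third column of the ls -n output should match the number printed by id -u, since a newly created file belongs to the user who created it.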
The user-and-group model means that for each file every user on the system falls into one of three categories: the owner of the file, someone in the file's group, and everyone else.
For each of these three categories, the computer keeps track of whether people in that category can read the file, write to the file, or execute the file (i.e., run it if it is a program).
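For example, if a file had the following set of permissions:
| | user | group | all |
|---|---|---|---|
| read | yes | yes | yes |
| write | yes | no | no |
| execute | no | no | no |
it would mean that the file's owner can read and write it but not run it, while people in the file's group and everyone else can read it but cannot modify or run it.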
Let's look at this model in action.
Let's run ls -l:
$ ls -l
The -l flag tells ls to give us a long-form listing.
It's a lot of information, so let's go through the columns in turn.
On the right side, we have the files' names. Next to them, moving left, are the times and dates they were last modified. Backup systems and other tools use this information in a variety of ways, but you can use it to tell when you (or anyone else with permission) last changed a file.
Next to the modification time is the file's size in bytes and the names of the user and group that own it (in this case, jens and jens respectively).
We'll skip over the second column for now (the one showing 1 for each file) because it's the first column that we care about most. This shows the file's permissions, i.e., who can read, write, or execute it.
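As a rough aid for reading the listing, the columns can be labelled with awk. This is only a sketch: notes.txt is a hypothetical file created for the example, and splitting on whitespace assumes that the file, user, and group names contain no spaces.

```bash
#!/bin/bash
# Sketch: label the columns of `ls -l` for one illustrative file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line one\nline two\n' > notes.txt   # hypothetical file, 18 bytes

# awk splits each line on whitespace; the date and time fields between
# the size and the name are skipped here.
summary=$(ls -l notes.txt | awk '{
    print "permissions: " $1
    print "links:       " $2
    print "owner:       " $3
    print "group:       " $4
    print "size:        " $5 " bytes"
    print "name:        " $NF
}')
echo "$summary"
```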
Let's have a closer look at one of those permission strings: -rwxr-xr-x.
The first character tells us what type of thing this is:
'-' means it's a regular file,
while 'd' means it's a directory,
and other characters mean more esoteric things.
The next three characters tell us what permissions the file's owner has.
Here, the owner can read, write, and execute the file: rwx.
The middle triplet shows us the group's permissions.
If the permission is turned off, we see a dash, so r-x means “read and execute, but not write”.
The final triplet shows us what everyone who isn't the file's owner, or in the file's group, can do.
In this case, it's 'r-x' again, so everyone on the system can look at the file's contents and run it.
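The way the ten characters break down can be sketched with bash's substring expansion. The function describe_mode is a hypothetical helper written for this illustration, not a standard command:

```bash
#!/bin/bash
# Sketch: split a permission string such as -rwxr-xr-x into its four parts.
# describe_mode is a hypothetical helper written for this example.
describe_mode() {
    local mode=$1
    case "${mode:0:1}" in
        -) echo "type:  regular file" ;;
        d) echo "type:  directory" ;;
        *) echo "type:  something else (${mode:0:1})" ;;
    esac
    echo "owner: ${mode:1:3}"   # characters 2-4
    echo "group: ${mode:4:3}"   # characters 5-7
    echo "other: ${mode:7:3}"   # characters 8-10
}

describe_mode "-rwxr-xr-x"
```

For the string above this prints "regular file" as the type, rwx for the owner, and r-x for both the group and everyone else.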
To change permissions, we use the chmod command (whose name stands for “change mode”):
$ chmod u+x sorted_lengths.sh
The 'u' signals that we're changing the privileges of the user (i.e., the file's owner), + that we are adding a permission, and x is the permission being added: execute.
A quick ls -l shows us that it worked, because the owner's permissions now include execute:
$ ls -l
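Since the exact listing depends on your system, here is a small self-contained check of what chmod u+x changes. It is a sketch that uses a stand-in file, demo.sh, in a scratch directory, and first puts the file into a known state so the result is predictable:

```bash
#!/bin/bash
# Sketch: watch the owner's execute bit change. demo.sh is a stand-in file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '#!/bin/bash\necho "it runs"\n' > demo.sh
chmod u=rw,g=r,o=r demo.sh             # start from a known state

before=$(ls -l demo.sh | cut -c1-10)   # first 10 characters: type + 3 triplets
chmod u+x demo.sh                      # add execute for the owner only
after=$(ls -l demo.sh | cut -c1-10)

echo "before: $before"   # -rw-r--r--
echo "after:  $after"    # -rwxr--r--
```

Only the fourth character changes, from a dash to x; the group's and everyone else's triplets are untouched because the command named only u.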
Now we can finally run our script.
$ ./sorted_lengths.sh
We check the result with ls and cat.
$ ls
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb sorted_lengths.sh sorted_lengths.txt
$ cat sorted_lengths.txt
  9 methane.pdb
 12 ethane.pdb
 15 propane.pdb
 20 cubane.pdb
 21 pentane.pdb
 30 octane.pdb
107 total
Maybe we want to make the input more flexible so that we can tell the script which files to run on.
Let's edit the file:
#! /bin/bash
wc -l $* | sort -n > sorted_lengths.txt
$* means “All of the command-line parameters to the shell script.”
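One caveat worth knowing: an unquoted $* is split again on spaces, so a file name that contains a space arrives as two separate words. The quoted form "$@" keeps each parameter intact. The sketch below illustrates the difference; count_args, with_star, and with_at are hypothetical helper functions, the file names are made up, and $# (the number of parameters) is used only for counting:

```bash
#!/bin/bash
# Sketch: compare unquoted $* with "$@" when a file name contains a space.
# count_args, with_star and with_at are hypothetical helpers for this demo.
count_args() { echo "$#"; }          # prints how many arguments it received
with_star()  { count_args $*; }      # unquoted: the shell re-splits on spaces
with_at()    { count_args "$@"; }    # quoted: each original argument stays whole

star_count=$(with_star "my file.pdb" methane.pdb)
at_count=$(with_at "my file.pdb" methane.pdb)
echo "\$* passed on $star_count arguments"     # 3: my, file.pdb, methane.pdb
echo "\"\$@\" passed on $at_count arguments"   # 2
```

For file names without spaces, such as the .pdb files used here, both forms behave the same.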
Before we try this out let's add a comment:
#! /bin/bash
# Write a sorted list of file lengths (in lines) to sorted_lengths.txt
# Usage: sorted_lengths.sh filenames
wc -l $* | sort -n > sorted_lengths.txt
A comment starts with a # character and runs to the end of the line. The computer ignores comments, but they're invaluable for helping people understand and use scripts.
We can run the script by naming the files explicitly:
$ ./sorted_lengths.sh methane.pdb ethane.pdb propane.pdb
or using the wild-card:
$ ./sorted_lengths.sh *.pdb
Let's make another script:
#! /bin/bash
# Select lines from the middle of a file
# Usage: middle.sh filename -end_line -num_lines
head $2 $1 | tail $3
Inside a shell script, $1 means “the first filename (or other parameter) on the command line”. $2 and $3 mean the “second parameter” and “third parameter”, respectively.
What does this script do?
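One way to explore the question is to run the script on a file whose contents are easy to check. The sketch below recreates middle.sh in a scratch directory and feeds it numbers.txt, a hypothetical test file holding the numbers 1 to 20, one per line; the script is started through bash so that no execute permission is needed:

```bash
#!/bin/bash
# Sketch: try middle.sh on a file with known contents.
# numbers.txt is a hypothetical test file; the arguments -15 and -5 are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 20 > numbers.txt

# Recreate the script from the lesson.
cat > middle.sh << 'EOF'
#! /bin/bash
# Select lines from the middle of a file
# Usage: middle.sh filename -end_line -num_lines
head $2 $1 | tail $3
EOF

# Inside the script this becomes: head -15 numbers.txt | tail -5
selected=$(bash middle.sh numbers.txt -15 -5)
echo "$selected"
```

Here head keeps the first 15 lines and tail keeps the last 5 of those, so the numbers 11 to 15 are printed.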
Now we have two scripts that we can run in succession. When we make a small change to one of the input files, is there a quick way to rerun the whole analysis?
$* refers to all of a shell script's command-line parameters.
$1, $2, etc., refer to specific command-line parameters.