Five Shell Programming Tips
February 10th, 2007
I recently read the article Good Shell Coding Practices, which covers handling command line arguments in scripts. Here I am going to cover some general shell programming tips I have learned over the years.
- grep -q: Do you use the exit status of grep? If so, use the -q switch instead of sending the output to /dev/null:
#!/bin/bash
if grep 'Synergy' messages >/dev/null
then
echo 'Synergy has logged to messages'
else
echo 'Synergy has NOT logged to messages'
fi
if grep -q 'Synergy' messages
then
echo 'Synergy has logged to messages'
else
echo 'Synergy has NOT logged to messages'
fi
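The same check can be wrapped in a small function so the test reads naturally inside a larger script. This is only a sketch: the helper name has_logged and the throwaway log file are hypothetical and not part of the original script. It relies only on grep's exit status, which is 0 when a line matched and 1 when none did.

```shell
#!/bin/bash
# has_logged PATTERN FILE (hypothetical helper name)
# Succeeds (exit status 0) when FILE contains a line matching PATTERN.
has_logged() {
    grep -q -- "$1" "$2"
}

# Build a small throwaway log so the sketch is self-contained.
log=$(mktemp)
printf '%s\n' 'Feb 10 20:51 Synergy: client connected' \
              'Feb 10 20:52 kernel: eth0 up' > "$log"

if has_logged 'Synergy' "$log"; then
    echo 'Synergy has logged to messages'
else
    echo 'Synergy has NOT logged to messages'
fi

rm -f "$log"
```

The -- guards against patterns that begin with a dash, and quoting "$2" keeps file names with spaces intact.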
Why use -q instead of >/dev/null? Well, it’s a lot faster. I have not looked at the source, but I believe -q implies -m 1, as the two run at the same speed when used together (see below). The option -m 1 tells grep to stop after the first match. When using -q you only care whether the file contains your search term, not how many times it appears or what those lines look like. Thus it makes sense that -q implies -m 1. The GNU grep manual agrees: with -q, grep exits immediately with zero status as soon as any match is found.
[root@mojito ~]# time grep 'Synergy' messages >/dev/null
real 0m0.360s
user 0m0.266s
sys 0m0.072s
[root@mojito ~]# time grep -q 'Synergy' messages
real 0m0.002s
user 0m0.001s
sys 0m0.001s
[root@mojito ~]# time grep -q -m 1 'Synergy' messages
real 0m0.002s
user 0m0.000s
sys 0m0.002s
- grep -c: Instead of using grep to match lines in a file or input and then piping that to wc -l, just use grep -c. It’s faster and simpler:
[root@mojito ~]# time grep 'Synergy' messages | wc -l
495948
real 0m0.519s
user 0m0.363s
sys 0m0.118s
[root@mojito ~]# time grep -c 'Synergy' messages
495948
real 0m0.245s
user 0m0.167s
sys 0m0.052s
- VAL=$( funcName ): Most people new to UNIX/Linux don’t use functions in their scripts. More experience brings more functions. However, too many people have those functions set environment variables instead of returning the values directly. You cannot use the return command to return values (other than an exit status), but you can use echo. For example, here are two simple functions, one using the environment variable method and one using the echo method:
[root@mojito ~]# cat returnVals.sh
#!/bin/bash
function badFunc()
{
export badFuncVar=VALUE
}
function goodFunc()
{
echo VALUE
}
badFunc
var1=$badFuncVar
var2=$( goodFunc )
var3=` goodFunc `
echo "var1=$var1 , var2=$var2 , var3=$var3"
[root@mojito ~]# ./returnVals.sh
var1=VALUE , var2=VALUE , var3=VALUE
As you can see, setting the environment variable takes two steps and requires you to remember the name of the variable! Too much work for me. See my discussion on the number of days in a month function.
Note: `` and $() are the same thing. I personally use $() to call functions and `` to call executables.
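A slightly fuller sketch of the echo method follows. The function get_extension is hypothetical and exists only for illustration. It shows two details the short example leaves out: the exit status can still signal failure alongside the echoed value, and any debugging output has to go to stderr so that $( ) does not capture it.

```shell
#!/bin/bash
# Hypothetical helper: print the extension of a file name on stdout.
# Diagnostics go to stderr so they are not captured by $( ).
# The exit status reports success or failure separately from the value.
get_extension() {
    local name=$1
    echo "debug: checking '$name'" >&2
    case $name in
        *.*) echo "${name##*.}" ;;   # text after the last dot
        *)   return 1 ;;             # no dot: nothing to return
    esac
}

# The assignment takes the exit status of the command substitution,
# so the value and the success test happen in one step.
if ext=$( get_extension 'messages.0' ); then
    echo "ext=$ext"
else
    echo 'no extension found'
fi
```

Each $( ) call runs the function in a subshell, so this style trades some speed for clarity when called many times in a loop.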
- xargs: Newbies often use loops or variables when xargs would suffice. xargs reads standard input, splits the input on whitespace (spaces and newlines), and then executes the command you specify with the processed input as its arguments. The three examples below all delete files in the current working directory matching ‘mess*’ which contain the word ‘Synergy’. However, the third is the simplest and most concise. It’s the UNIX solution.
Using a variable:
[root@mojito ~]# export FILES=`find . -type f -name 'mess*' -exec grep -q 'Synergy' {} \; -print`
[root@mojito ~]# rm -f $FILES
Using a loop:
[root@mojito ~]# for FILE in `find . -type f -name 'mess*' -exec grep -q 'Synergy' {} \; -print`
do rm -f $FILE; done
Using xargs:
[root@mojito ~]# find . -type f -name 'mess*' -exec grep -q 'Synergy' {} \; -print | xargs rm -f
- read: read is a command built into the shell. You can use it to obtain data interactively from a user:
[root@mojito ~]# read VAR
value
[root@mojito ~]# echo $VAR
value
Or you can use read in scripts like the following. If the following script seems complex, don’t get hung up on the specifics. Just notice that the condition in the while loop is ‘read FILE’, which causes the loop to continue until read reaches the end of its input and can no longer assign FILE a new value.
[root@mojito ~]# cat logRotate.sh
#!/bin/bash
find . -type f -name 'mess*.[0-9]' | sort -r | \
while read FILE
do
NUM=`echo $FILE | awk -F. '{print $3}'` # filename Ex: ./file.$NUM
BASE=`echo $FILE | awk -F. '{print $2}'`
((NUM++))
mv -f $FILE .$BASE.$NUM
done
cp -f messages messages.0
cat /dev/null > messages
[root@mojito ~]# ls -l messages*
-rw------- 1 root root 107207426 Feb 10 21:00 messages
-rw------- 1 root root 107207426 Feb 10 20:58 messages.0
-rw------- 1 root root 107207426 Feb 10 20:57 messages.1
-rw------- 1 root root 107207426 Feb 10 20:57 messages.2
-rw------- 1 root root 107207426 Feb 10 20:55 messages.3
-rw------- 1 root root 107207426 Feb 10 20:55 messages.4
-rw------- 1 root root 107207426 Feb 10 20:55 messages.5
-rw------- 1 root root 107207426 Feb 10 20:51 messages.6
[root@mojito ~]# ./logRotate.sh
[root@mojito ~]# ls -l messages*
-rw------- 1 root root 0 Feb 10 21:00 messages
-rw------- 1 root root 107207426 Feb 10 21:00 messages.0
-rw------- 1 root root 107207426 Feb 10 20:58 messages.1
-rw------- 1 root root 107207426 Feb 10 20:57 messages.2
-rw------- 1 root root 107207426 Feb 10 20:57 messages.3
-rw------- 1 root root 107207426 Feb 10 20:55 messages.4
-rw------- 1 root root 107207426 Feb 10 20:55 messages.5
-rw------- 1 root root 107207426 Feb 10 20:55 messages.6
-rw------- 1 root root 107207426 Feb 10 20:51 messages.7
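The while read pattern can also be sketched on its own, outside the rotation script. Two things here are additions that the original script does not use: IFS= together with -r, which keep leading whitespace and backslashes in each line intact, and process substitution in place of a pipe, so that variables set inside the loop are still visible after it ends. The sample file names are made up.

```shell
#!/bin/bash
# Count lines and remember the longest one, reading one line per iteration.
# read returns a non-zero status at end of input, which ends the loop.
count=0
longest=''
while IFS= read -r line
do
    count=$((count + 1))
    if [ "${#line}" -gt "${#longest}" ]; then
        longest=$line
    fi
done < <(printf '%s\n' './messages.0' './messages.10' './my messages.2')

echo "lines=$count longest=$longest"
```

With a pipe feeding the loop, as in logRotate.sh, bash runs the loop body in a subshell. That is fine for moving files but would lose count and longest here.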
December 9th, 2007 at 12:59 pm
Promoting grep -c & -q encourages the proliferation of options in programs that don’t need them—how big is the cat(1) man page on your system? If grep didn’t have all those extra options, maybe the clearer and more general grep | wc would be faster yet!
(If it’s the program start-up time you’re concerned about, that’s what the sticky bit is for.)
December 20th, 2007 at 8:49 pm
While I believe your opinion has merit, I disagree. Also regarding the sticky bit, see the chmod manual:
STICKY FILES
On older Unix systems, the sticky bit caused executable files to be hoarded in swap space. This feature is not useful on modern VM systems, and the Linux kernel ignores the sticky bit on files. Other kernels may use the sticky bit on files for system-defined purposes. On some systems, only the superuser can set the sticky bit on files.
January 8th, 2009 at 1:53 am
I agree with Joel’s opinion. I believe the *nix philosophy is to use small, sharp tools. I mean, grep -c is definitely less intuitive than grep | wc -l.
Then again, the information here does fit under the title “Five Shell Programming __Tips__”, I guess it’s just that we should opt for the more intuitive use than some obscure command line arguments in most everyday usages.
August 9th, 2009 at 8:43 pm
If you need to use xargs for filenames with spaces use the zero terminated format instead:
find . -type f -name 'mess*' -exec grep -q 'Synergy' {} \; -print0 | xargs -0 rm -f
November 10th, 2009 at 10:47 pm
$ *nix philosophy: make each program do 1 thing well, causes:
1 small program
2 process creation is as cheap as possible
3 features aren’t easily added to program
4 piping becomes the natural / intuitive way for ipc
$ the tips no 1,2,4 can be summarized to “create process as necessary”, this can be achieved in many ways (eg. using features
of tool used, tweaking the pipeline)
- the maintainer of each tool considers many things before a feature is added to the tool, such as will it be used
frequently?, does it offers much efficiency?
- no one is expected to remember all features (except the famous ones) of a tool unless he limits his area of expertise.
however, each feature added till now is logically appropriate, i mean u can expect that grep has a feature that only output
the num of matched line / diff can suppress its output & only tell whether files are different. that’s why i think doing
optimization to a script is only done after the script has satisfied its functional requirement (up to this point the script
is made in the intuitive way) & only scripts that are heavily used treated this way. the fun part is that u can skip this
boring task by making someone do it, cause it is simple
- tools’ maintainers tend to use the same opt char in controlling similar features (ex: for suppressing output, it’s either q / s)
$ tips no 3 isn’t consistent with the others if what u’re looking for is performance, it forks another subsh to do the func.
a solution to this problem is to maintain a convention on the func identifier & the var that stores the return val, just like
maintaining convention of what a func do & its identifier. example:
_mul(){
_MUL=$(($1*$2))
}
mul(){
_mul "$1" "$2"; echo "$_MUL"
}
- this way u can store _MUL to a var / use it directly as long as access to _MUL is serialized
- try it with time: compare: time v=$(mul 2 3) with time { _mul 2 3; v=$_MUL; }
$ no 5 isn’t a tip, ‘read’ is important since it’s the easiest & cheapest (builtin) way, u can do v=$(</dev/stdin) /
v=$(dd) with eof ending the input but it’s just dumb unless u need a particular characteristic where ‘read’ can’t satisfy
December 27th, 2009 at 1:10 am
I’d like to comment on point 3.
good_func can only return one value. Also, a sub-shell is created. If you call good_func in a loop, it can be very slow.
bad_func runs in the same shell. It can set multiple values. Typically, you can have a convention such as storing the result in RESULT. It runs faster in a loop.
In good_func, which echoes the result, what happens if you later add some debugging echoes inside the function? You must then be careful to redirect those echoes to stderr.