Win a book by debugging this shell script
October 15th, 2007
Win a copy of Chris F.A. Johnson’s Shell Scripting Recipes by telling me why the script below does not work.
UPDATE: Quite a few people responded correctly (see comments). I will sort it out tonight and decide who wins.
#!/bin/bash
doRead()
{
local retVal=1
ps -e -o user | grep apache | \
while read user
do
echo $user
retVal=0
done
return $retVal
}
doRead
echo "doRead exited with retVal = $?"
When the script runs, assuming the host has a user named apache that is running at least one process, you should see “apache” a few times and then “doRead exited with retVal = 0”. However, the actual output is below:
# ./readReturnValExample.sh
apache
apache
apache
apache
apache
apache
apache
apache
doRead exited with retVal = 1
I ran into this problem when writing the monitor cpu usage shell script. I could have put echo statements all over the place in order to find out what the value of retVal is at various points in the script. However, I used the shell’s builtin debug option instead. I just added “set -x” to the top of the script, like so:
#!/bin/bash
set -x
doRead()
This prints out exactly what the shell is doing.
# ./readReturnValExample.sh
++ doRead
++ local retVal=1
++ ps -e -o user
++ grep apache
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ return 1
++ echo 'doRead exited with retVal = 1'
doRead exited with retVal = 1
As you can see, $retVal is still 1 directly after the last, unsuccessful read statement. The $25.99 question is, why? I remember reading something about this a long, long time ago, but it’s quite hazy and a cursory Google search did not resolve my question. The person who answers that question (except Chris, as he won my last contest) gets the book. If multiple people answer, the best single answer gets the book.


October 15th, 2007 at 10:48 am
From what I can see the changes to the variable $retVal in the loop have no effect because scope of the var is confined to the subshell created by the pipe.
October 15th, 2007 at 10:53 am
It’s been a while, and I don’t have time to check my facts or the details concerning what works and what doesn’t, but…
Piping to a “while read” causes the loop to execute as a subshell, and the variables within the loop effectively end up as local to the loop. (I believe that pdksh has a bug where this is always true.)
Last I had to deal with this, workarounds were pretty messy. Most solutions took advantage of Awk’s ability to return a value (or at least that’s how I remember it).
Anyway, thanks for the nice blog feed. Keep it up.
October 15th, 2007 at 11:02 am
Because the pipe creates a subshell so retVal inside the while loop is in a different scope == a different variable.
October 15th, 2007 at 11:23 am
Your script is equivalent to
x=1;(echo $x;x=2;echo $x);echo $x
(i.e. your changes are made in a subshell, and changes made in a subshell do not carry over to the parent shell)
Try this
while …
done < <(ps -e -o user | grep apache)
to execute the while loop in the current shell
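Filled out as a complete function, the suggestion above might look like the sketch below. It assumes bash, since process substitution is not available in plain sh, and listUsers is a hypothetical stand-in for ps -e -o user | grep apache so that the result does not depend on the host.

```shell
#!/bin/bash
# listUsers is a hypothetical stand-in for: ps -e -o user | grep apache
listUsers()
{
    printf 'apache\napache\n'
}

doRead()
{
    local retVal=1
    # The loop reads from a process substitution instead of a pipe,
    # so it runs in the current shell and can change retVal.
    while read -r user
    do
        echo "$user"
        retVal=0
    done < <(listUsers)
    return $retVal
}

doRead
echo "doRead exited with retVal = $?"
```

Only listUsers runs in a subshell here, so the assignment inside the loop is still visible when return is reached.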
October 15th, 2007 at 11:30 am
see E4 on bash FAQ
ftp://ftp.cwru.edu/pub/bash/FAQ
October 15th, 2007 at 12:06 pm
Simple, just put the while loop & return statement in a subshell like so:
——
#!/bin/bash
doRead()
{
local retVal=1
ps -e -o user | grep qitest | \
(
while read user
do
echo $user
retVal=0
done
return $retVal
)
}
doRead
echo "doRead exited with retVal = $?"
——
Why does this fix it? The answer is explained by the issue addressed in question 24 on this site:
http://wooledge.org/mywiki/BashFAQ
October 15th, 2007 at 12:09 pm
BTW, in my recent contest answer reply I had changed the username from ‘apache’ to ‘qitest’ to try it out on a system of my own that had no apache user processes. Hopefully you noticed that!
October 15th, 2007 at 12:13 pm
I think the problem lies in the use of the while statement. It is throwing off the exit status variable, which only shows the success or failure of the command immediately preceding it.
Since the while statement checks for the truth of whether apache is in the list or not, the final hit will always be false (except in an infinite loop of apaches). So the final run of the while loop will always exit false.
I refined your function to use a for statement with command substitution and it seemed to do the trick in so far as giving the proper return value.
#!/bin/bash
doRead()
{
local retVal=1
for user in $(ps -e -o user | grep apache )
do
echo $user
retVal=10
done
return $retVal
}
doRead
echo "doRead exited with retVal = $?"
You can see that I modified the content of the retVal within the for loop so that there is no ambiguity as to where the return value comes from.
October 15th, 2007 at 12:26 pm
the while loop is executed in a subshell (as all the commands in the pipe) and thus the retVal original value, 1, is never changed.
October 15th, 2007 at 12:28 pm
Although, I could be wrong.
Fun puzzle either way. I look forward to other people’s explanations.
October 15th, 2007 at 12:34 pm
I forgot: to fix the script just add parentheses around the while and return statements:
( while read user
do
echo $user
retVal=0
done
return $retVal)
October 15th, 2007 at 12:37 pm
Presumably you want solutions emailed to you like the last contest?
October 15th, 2007 at 12:43 pm
The entire pipeline beginning with ps -e -o user and ending with the end of the while loop is run in a subshell, so the retVal set inside the while loop is actually a separate variable, local to the subshell, which shadows the retVal declared at the beginning of the function. Once the while loop ends, the subshell-local retVal goes away, and the value of the original retVal (which was never changed) is returned.
October 15th, 2007 at 12:43 pm
I ran across this interesting problem on the Planet Sysadmin site
The problem in the script is caused by a combination of variable scope and sub-shells. When piping the output of the ps and grep to the while loop, BASH will execute the while loop within a sub-shell. BASH does this for the commands in a pipeline, whether the command feeding the pipe is external or a builtin. Variables set within that sub-shell, the retVal=0 in this case, do not have scope outside of the sub-shell. If you’re running BASH version 3, this is easy to test by sprinkling in a few more echo statements to print out the internal BASH_SUBSHELL variable like this:
#!/bin/bash
doRead()
{
local retVal=1
echo "Sub-shell level is $BASH_SUBSHELL"
ps -e -o user | grep postfix | \
while read user
do
echo $user
echo "Sub-shell level is $BASH_SUBSHELL"
retVal=0
done
echo "Sub-shell level is $BASH_SUBSHELL"
return $retVal
}
doRead
echo "doRead exited with retVal = $?"
(Note that I replaced apache with postfix in my example as I don’t have Apache installed on my workstation)
Running it will produce an output like this:
Sub-shell level is 0
postfix
Sub-shell level is 1
postfix
Sub-shell level is 1
Sub-shell level is 0
doRead exited with retVal = 1
The entire while loop is running within a sub-shell and so variables set there are not visible outside the sub-shell.
The best way I’ve found to do this is to use a for statement like this:
doRead()
{
local retVal=1
for user in $(ps -e -o user | grep postfix)
do
echo $user
retVal=0
done
return $retVal
}
doRead
echo "doRead exited with retVal = $?"
What this does is to force the ps and grep to run within a sub-shell while the for loop reads the output from the sub-shell and runs at the same level as the rest of the script. You can show this by printing out the BASH_SUBSHELL variable in a similar way with this code:
#!/bin/bash
doRead()
{
local retVal=1
for user in $(echo "Sub-shell level is $BASH_SUBSHELL" >&2 ; \
ps -e -o user | grep postfix)
do
echo $user
retVal=0
echo "Sub-shell level is $BASH_SUBSHELL"
done
return $retVal
}
doRead
echo "doRead exited with retVal = $?"
Note that the first echo of BASH_SUBSHELL is redirected to STDERR so that its output isn’t passed along to the loop. This produces the following output:
Sub-shell level is 1
postfix
Sub-shell level is 0
postfix
Sub-shell level is 0
doRead exited with retVal = 0
A good place to find more about this is the advanced bash scripting guide at http://www.tldp.org/LDP/abs/html , mainly chapter 20 on sub-shells and chapter 31 on gotchas.
I hope this was helpful and educational for you. (It certainly was for me!)
- Jeff
October 15th, 2007 at 12:55 pm
I would say it is due to the process hierarchy in UNIX systems. The read is executed in a subshell, which means it is treated as another process, a child process if you like. And child processes are not allowed to change the environment of their parents.
I would suggest changing your function into:
doRead() {
local retVal=1
ps -e -o user | grep blah 2>/dev/null && retVal=0
return $retVal
}
or even better:
doRead() {
ps -e -o user | grep blah 2>/dev/null
return $?
}
Hope this helps.
October 15th, 2007 at 1:20 pm
The last “read user” statement returns non-zero because the “user” variable has no value.
October 15th, 2007 at 1:21 pm
it’s because the loop is running in a subshell.
loops made this way
CMD | while read line; do …; done
create a subshell.
solution maybe create a temporary file?
October 15th, 2007 at 1:28 pm
ah, there is a better way to do it than creating a temp file.
it’s this way:
while read line
do
# alter variable line, or whatever
done < <( CMD )
here is a test script:
v=1
echo 234 | while read linha
do
v=$linha
echo $v
done
echo "=$v"
v=1
while read linha
do
v=$linha
echo $v
done < <( echo 234; )
echo "=$v"
October 15th, 2007 at 1:46 pm
I faced this issue some time ago and noticed that the only way to get the actual value outside the loop is to redirect it into a file within the loop and then read it back from that file. In other words, to get retVal = 0 (the right value) it’s necessary to modify the script:
#!/bin/bash
doRead()
{
local retVal=1
ps -e -o user | grep apache | \
while read user
do
echo $user
retVal=0
echo $retVal > /tmp/retVal
done
}
doRead
echo "doRead exited with retVal = $(cat /tmp/retVal)"
Unfortunately I didn’t find any other explanation why bash handles variables within the loop this way, but redirecting into a file works for me.
October 15th, 2007 at 1:56 pm
I didn’t expect so many responses so fast! Entries are now closed…I’ll sort through this tonight.
October 15th, 2007 at 2:02 pm
Bash is spawning a subshell for your while-read loop. It’s because it is receiving piped input.
It’s explained in the Advanced Bash Scripting guide and can be seen in the first example here:
http://tldp.org/LDP/abs/html/redircb.html
Sorry for the terse answer, but…
You can work around this problem by dumping your output to a temporary file and reading it into your while loop like so:
—
doRead()
{
local retVal=1
ps -e -o user | grep apache > temp.data
{
while read user
do
echo $user
retVal=0
done
} < temp.data
return $retVal
}
—
doRead
echo "doRead exited with retVal = $?"
If someone else has something more elegant (no file needed), I’d love to see it.
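One possible answer to the request above, as a sketch only: bash releases from 4.2 onward have a lastpipe shell option that runs the last command of a pipeline in the current shell. It only takes effect when job control is off, which is the normal case inside a script. The listUsers function is a hypothetical stand-in for ps -e -o user | grep apache.

```shell
#!/bin/bash
# Assumption: bash 4.2 or later. lastpipe only applies while job control
# is off, as it is by default in a non-interactive script.
shopt -s lastpipe

# listUsers is a hypothetical stand-in for: ps -e -o user | grep apache
listUsers()
{
    printf 'apache\napache\n'
}

doRead()
{
    local retVal=1
    listUsers | \
    while read -r user
    do
        echo "$user"
        retVal=0    # runs in the current shell because of lastpipe
    done
    return $retVal
}

doRead
echo "doRead exited with retVal = $?"
```

The pipeline itself is unchanged from the original script, and no temporary file is needed. On an older bash the shopt line fails, so this is not a portable fix.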
October 15th, 2007 at 4:31 pm
I know entries are closed, but it’s still a cool puzzle.
Couldn’t you just export $retVal? That way you’d have access to the same variable in subshells.
October 15th, 2007 at 4:49 pm
My understanding has always been that a for loop is better for iterating a definable list (which we can get with the grep statement in this example) than the more open ended while loop anyway.
I wonder if there is a way (with bash) to quantify the relative difference. It would seem, at first blush, that the while loop would take more allocated memory and might be slower.
The for loop probably has drawbacks as well.
Anybody have any numbers that they can share?
October 15th, 2007 at 5:05 pm
Sorry for my eng.
There is a scope problem.
retVal inside the loop is not the same as retVal outside the loop.
Tnx. (I would like to have a copy of the book.)
October 15th, 2007 at 5:22 pm
Jim,
Here’s a test that shows that the while loop is much faster. I am quite surprised…
$ ./whileForLoopTest.sh
55
31
$ ./whileForLoopTest.sh
54
30
Here’s the script.
$ cat whileForLoopTest.sh
#!/bin/bash
run()
{
for((I=0; I < 500; I++))
do
eval "$@"
done
}
forTest()
{
for user in $( ps -e -o user )
do
:
done
}
whileTest()
{
ps -e -o user | \
while read user
do
:
done
}
start=$( date +%s )
run forTest
expr $( date +%s ) - $start
start=$( date +%s )
run whileTest
expr $( date +%s ) - $start
October 15th, 2007 at 5:24 pm
dosnlinux,
Yes, I could have exported the value. However, it seems inelegant.
October 15th, 2007 at 5:27 pm
PAshaRome,
Yes, thanks for the answer. Do not worry about the English. Not everyone is going to get a copy, I’ll let you know.
Thanks.
October 16th, 2007 at 5:28 am
Exporting retVal won’t work – the subshell will have access to its copy of the parent’s variable, not to the parent’s variable itself
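A small sketch of that point, assuming bash with its default settings. The variable is exported, changed inside a piped while loop, and then printed again in the parent shell.

```shell
#!/bin/bash
retVal=1
export retVal

# The right-hand side of the pipe runs in a subshell that starts out
# with its own copy of retVal, exported or not.
echo apache | while read -r user
do
    retVal=0
    echo "inside the loop: retVal = $retVal"
done

# The parent's retVal was never touched by the subshell.
echo "after the loop: retVal = $retVal"
```

The value flows from parent to child only. Nothing the subshell assigns is copied back.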
November 1st, 2007 at 1:18 pm
This is a problem of scope.
The scope of the shell is not accessible inside the subshell.
If we do not initialize or declare variable “retVal” before while loop (i.e. remove statement retVal=1 ), the script will run fine.
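One caveat may be worth checking before relying on this. With retVal never set, return $retVal expands to a bare return, which hands back the exit status of the last command run, here the pipeline. The sketch below uses noUsers, a hypothetical stand-in for a ps | grep pipeline that matches nothing, to look at the case where no user is found.

```shell
#!/bin/bash
unset retVal    # make sure nothing is inherited from the environment

# noUsers is a hypothetical stand-in for a ps | grep pipeline with no matches.
noUsers()
{
    :
}

doRead()
{
    # retVal is deliberately not declared or initialized here
    noUsers | \
    while read -r user
    do
        echo "$user"
        retVal=0
    done
    # retVal is empty in this shell, so this is a bare "return", which
    # passes on the exit status of the pipeline above.
    return $retVal
}

doRead
echo "doRead exited with retVal = $?"
```

A while loop whose body never runs exits with status 0, so this version reports 0 even when there is no match. It looks fine when the user exists, but it can no longer signal the failure case.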
February 20th, 2008 at 7:49 am
The
local retVal=1
and the retVal inside the while loop are treated differently by the shell.
Whatever the loop does, it has no effect on the outer retVal.
So successful or unsuccessful, the function will exit with value = 1.
June 14th, 2008 at 2:18 pm
The problem is with the scope of the calls and the variable being set local within the function. If the variable had been set in the script instead of the function the return value would have been 0.
June 20th, 2008 at 8:13 am
The biggest problem I found with for is when dealing with space separated multi-word data input. “For” doesn’t know where the real data begins or ends.
My vote is for extending the subshell with ()’s giving you a chance to deal with vars. Good point. Thank you.
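The splitting problem with for can be seen in a short sketch. It assumes bash, and listNames is a hypothetical command whose output lines contain spaces.

```shell
#!/bin/bash
# listNames is a hypothetical command whose output lines contain spaces.
listNames()
{
    printf 'John Smith\nJane Doe\n'
}

countWithFor()
{
    local count=0
    # Unquoted command substitution is split on every space and newline.
    for name in $(listNames)
    do
        count=$((count + 1))
    done
    echo $count
}

countWithWhile()
{
    local count=0
    # read takes one whole line per iteration.
    while IFS= read -r name
    do
        count=$((count + 1))
    done < <(listNames)
    echo $count
}

echo "for saw $(countWithFor) items"
echo "while saw $(countWithWhile) items"
```

The for loop counts four words while the while loop counts two lines, which is why while read is usually the safer choice for line-oriented input.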