Win a copy of Chris F.A. Johnson’s Shell Scripting Recipes by telling me why the script below does not work.

UPDATE: Quite a few people responded correctly (see comments).  I will sort it out tonight and decide who wins.

#!/bin/bash
doRead()
{
  local retVal=1
  ps -e -o user | grep apache | \
  while read user
  do
  echo $user
  retVal=0
  done
  return $retVal
}
doRead
echo "doRead exited with retVal = $?"

When the script runs, assuming the host has a user named apache that is running at least one process, you should see “apache” a few times and then “doRead exited with retVal = 0”. However, the actual output is below:

# ./readReturnValExample.sh
apache
apache
apache
apache
apache
apache
apache
apache
doRead exited with retVal = 1

I ran into this problem when writing the monitor cpu usage shell script. I could have put echo statements all over the place to find out the value of retVal at various points in the script. However, I used the shell’s built-in debug option instead. I just added “set -x” to the top of the script, like so:

#!/bin/bash
set -x
doRead()

This prints out exactly what the shell is doing.

# ./readReturnValExample.sh
++ doRead
++ local retVal=1
++ ps -e -o user
++ grep apache
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ echo apache
apache
++ retVal=0
++ read user
++ return 1
++ echo 'doRead exited with retVal = 1'
doRead exited with retVal = 1

As you can see, retVal is set to 0 on every iteration, yet the function returns 1 directly after the last, unsuccessful read statement. The $25.99 question is: why? I remember reading something about this a long, long time ago, but it’s quite hazy and a cursory Google search did not resolve my question. The person who answers that question (except Chris, as he won my last contest) gets the book. If multiple people answer, the best single answer gets the book.

32 Responses to “Win a book by debugging this shell script”

  1. Stuart Colville Says:

    From what I can see, the changes to the variable $retVal in the loop have no effect because the scope of the variable is confined to the subshell created by the pipe.

  2. John A. Says:

    It’s been a while, and I don’t have time to check my facts or the details concerning what works and what doesn’t, but…

    Piping to a “while read” causes the loop to execute as a subshell, and the variables within the loop effectively end up as local to the loop. (I believe that pdksh has a bug where this is always true.)

    Last I had to deal with this, workarounds were pretty messy. Most solutions took advantage of Awk’s ability to return a value (or at least that’s how I remember it).

    Anyway, thanks for the nice blog feed. Keep it up.

  3. Nickus Says:

    Because the pipe creates a subshell so retVal inside the while loop is in a different scope == a different variable.

  4. carco Says:

    Your script is equivalent to
    x=1;(echo $x;x=2;echo $x);echo $x
    (i.e. your changes are made in a subshell, and changes made in a subshell do not carry over to the parent shell)

    Try this
    while …
    done < <(ps -e -o user | grep apache)
    to execute the while loop in the current shell
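
Applied to the original function, that suggestion might look like the sketch below. It assumes bash, since process substitution is not available in plain sh, and listUsers is a hypothetical stand-in for `ps -e -o user` so that the example behaves the same on any machine.

```shell
#!/bin/bash
# listUsers is a hypothetical stand-in for `ps -e -o user`.
listUsers()
{
  printf '%s\n' root apache apache postfix
}

doRead()
{
  local retVal=1
  # The loop reads from a process substitution instead of a pipe,
  # so it runs in the current shell and can change retVal.
  while read -r user
  do
    echo "$user"
    retVal=0
  done < <(listUsers | grep apache)
  return $retVal
}

doRead
status=$?
echo "doRead exited with retVal = $status"
```

Only the producer side (listUsers and grep) runs in a separate process here; the loop and the return statement share the function's own retVal.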

  5. carco Says:

    see E4 on bash FAQ
    ftp://ftp.cwru.edu/pub/bash/FAQ

  6. drcheap Says:

    Simple, just put the while loop & return statement in a subshell like so:

    ——
    #!/bin/bash
    doRead()
    {
    local retVal=1
    ps -e -o user | grep qitest | \
    (
    while read user
    do
    echo $user
    retVal=0
    done
    return $retVal
    )
    }
    doRead
    echo "doRead exited with retVal = $?"
    ——

    Why does this fix it? The answer is explained by the issue addressed in question 24 on this site:
    http://wooledge.org/mywiki/BashFAQ

  7. drcheap Says:

    BTW, in my recent contest answer reply I had changed the username from ‘apache’ to ‘qitest’ to try it out on a system of my own that had no apache user processes. Hopefully you noticed that!

  8. Jim Says:

    I think the problem lies in the use of the while statement. It is throwing off the exit status variable, which only shows the success or failure of the command immediately preceding it.

    Since the while statement checks for the truth of whether apache is in the list or not, the final hit will always be false (except in an infinite loop of apaches). So the final run of the while loop will always exit false.

    I refined your function to use a for statement with command substitution and it seemed to do the trick in so far as giving the proper return value.
    #!/bin/bash
    doRead()
    {
    local retVal=1
    for user in $(ps -e -o user | grep apache )
    do
    echo $user
    retVal=10
    done
    return $retVal
    }
    doRead
    echo "doRead exited with retVal = $?"

    You can see that I modified the content of the retVal within the for loop so that there is no ambiguity as to where the return value comes from.

  9. Daniel Says:

    the while loop is executed in a subshell (as all the commands in the pipe) and thus the retVal original value, 1, is never changed.

  10. Jim Says:

    Although, I could be wrong.

    Fun puzzle either way. I look forward to other people’s explanations.

  11. Daniel Says:

    I forgot: to fix the script just add parentheses around the while and return statements:
    ( while read user
    do
    echo $user
    retVal=0
    done
    return $retVal)

  12. brett Says:

    Presumably you want solutions emailed to you like the last contest?

  13. Brent Yorgey Says:

    The entire pipeline beginning with ps -e -o user and ending with the end of the while loop is run in a subshell, so the retVal set inside the while loop is actually a separate variable, local to the subshell, which shadows the retVal declared at the beginning of the function. Once the while loop ends, the subshell-local retVal goes away, and the value of the original retVal (which was never changed) is returned.

  14. Jeff Berntsen Says:

    I ran across this interesting problem on the Planet Sysadmin site

    The problem in the script is caused by a combination of variable scope and sub-shells. When piping the output of the ps and grep to the while loop, BASH will execute the while loop within a sub-shell. BASH normally does this when processing the output of external commands (but not the output from internal commands). Variables set within that sub-shell, the retVal=0 in this case, do not have scope outside of the sub-shell. If you’re running BASH version 3, this is easy to test by sprinkling in a few more echo statements to print out the internal BASH_SUBSHELL variable like this:

    #!/bin/bash
    doRead()
    {
    local retVal=1
    echo "Sub-shell level is $BASH_SUBSHELL"
    ps -e -o user | grep postfix | \
    while read user
    do
    echo $user
    echo "Sub-shell level is $BASH_SUBSHELL"
    retVal=0
    done
    echo "Sub-shell level is $BASH_SUBSHELL"
    return $retVal
    }
    doRead
    echo "doRead exited with retVal = $?"

    (Note that I replaced apache with postfix in my example as I don’t have Apache installed on my workstation)

    Running it will produce an output like this:

    Sub-shell level is 0
    postfix
    Sub-shell level is 1
    postfix
    Sub-shell level is 1
    Sub-shell level is 0
    doRead exited with retVal = 1

    The entire while loop is running within a sub-shell and so variables set there are not visible outside the sub-shell.

    The best way I’ve found to do this is to use a for statement like this:

    doRead()
    {
    local retVal=1
    for user in $(ps -e -o user | grep postfix)
    do
    echo $user
    retVal=0
    done
    return $retVal
    }
    doRead
    echo "doRead exited with retVal = $?"

    What this does is to force the ps and grep to run within a sub-shell while the for loop reads the output from the sub-shell and runs at the same level as the rest of the script. You can show this by printing out the BASH_SUBSHELL variable in a similar way with this code:

    #!/bin/bash
    doRead()
    {
    local retVal=1
    for user in $(echo "Sub-shell level is $BASH_SUBSHELL" >&2 ; \
    ps -e -o user | grep postfix)
    do
    echo $user
    retVal=0
    echo "Sub-shell level is $BASH_SUBSHELL"
    done
    return $retVal
    }
    doRead
    echo "doRead exited with retVal = $?"

    Note that the first echo of BASH_SUBSHELL is redirected to STDERR so that its output isn’t passed along to the loop. This produces the following output:

    Sub-shell level is 1
    postfix
    Sub-shell level is 0
    postfix
    Sub-shell level is 0
    doRead exited with retVal = 0

    A good place to find more about this is the advanced bash scripting guide at http://www.tldp.org/LDP/abs/html , mainly chapter 20 on sub-shells and chapter 31 on gotchas.

    I hope this was helpful and educational for you. (It certainly was for me!)

    - Jeff

  15. miljan Says:

    I would say it is due to the process hierarchy in UNIX systems. read is executed in a subshell, which means it is treated as another process, a child process if you like. And child processes are not allowed to change the environment of their parents.

    I would suggest to change your function into:

    doRead() {
    local retVal=1
    ps -e -o user | grep blah 2>/dev/null && retVal=0
    return $retVal
    }

    or even better:

    doRead() {
    ps -e -o user | grep blah 2>/dev/null && retVal=0
    return $?
    }

    Hope this helps.
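
When the loop body is not needed, a variant of this idea is to let grep's exit status become the function's return value directly. The sketch below is a minimal illustration and not code from the thread: listUsers and hasUser are hypothetical names, with listUsers standing in for `ps -e -o user`, and it relies on grep's standard -q (quiet) and -x (whole-line match) options.

```shell
#!/bin/bash
# listUsers is a hypothetical stand-in for `ps -e -o user`.
listUsers()
{
  printf '%s\n' root apache apache postfix
}

# A function returns the status of its last command; for a pipeline
# that is the status of the final command, here grep.
hasUser()
{
  listUsers | grep -q -x -- "$1"
}

hasUser apache
apacheStatus=$?
hasUser nosuchuser
missingStatus=$?
echo "apache: $apacheStatus, nosuchuser: $missingStatus"
```

No variable has to cross a subshell boundary in this form, because the only thing passed back is an exit status.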

  16. roko Says:

    The last “read user” statement returns non-zero because the “user” variable has no value.

  17. elias junior Says:

    it’s because the loop is running in a subshell.
    loops made this way

    CMD | while read line; do …; done

    create a subshell.

    solution maybe create a temporary file?

  18. elias junior Says:

    ah, there is a better way than creating a temp file.
    it’s this way:

    while read line
    do
    # alter variable line, or whatever
    done < <( CMD )

    here is a test script:
    v=1
    echo 234 | while read linha
    do
    v=$linha
    echo $v
    done
    echo "=$v"

    v=1
    while read linha
    do
    v=$linha
    echo $v
    done < <( echo 234; )
    echo "=$v"

  19. Artem Nosulchik Says:

    I faced this issue some time ago, and the only way I found to get the actual value outside the loop was to redirect it into a file within the loop and then read it back from that file. In other words, to get retVal = 0 (the right value) it’s necessary to modify the script:

    #!/bin/bash
    doRead()
    {
    local retVal=1
    ps -e -o user | grep apache | \
    while read user
    do
    echo $user
    retVal=0
    echo $retVal > /tmp/retVal
    done
    }
    doRead
    echo "doRead exited with retVal = $(cat /tmp/retVal)"

    Unfortunately I didn’t find any explanation of why bash handles variables within the loop this way, but redirecting into a file works for me. :)

  20. admin Says:

    I didn’t expect so many responses so fast! Entries are now closed…I’ll sort through this tonight.

  21. stanleypane Says:

    Bash is spawning a new subshell during each iteration of your while-read loop. It’s because it is receiving piped input.

    It’s explained in the Advanced Bash Scripting guide and can be seen in the first example here:

    http://tldp.org/LDP/abs/html/redircb.html

    Sorry for the terse answer, but…

    You can work around this problem by dumping your output to a temporary file and reading it into your while loop like so:

    doRead()
    {
    local retVal=1
    ps -e -o user | grep apache > temp.data

    {
    while read user
    do
    echo $user
    retVal=0
    done
    } < temp.data

    return $retVal
    }

    doRead
    echo "doRead exited with retVal = $?"

    If someone else has something more elegant (no file needed), I’d love to see it.
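
One file-free option, assuming bash 4.2 or later, is the lastpipe shell option, which runs the last command of a pipeline in the current shell when job control is off (the default in scripts). A minimal sketch, with listUsers as a hypothetical stand-in for `ps -e -o user`:

```shell
#!/bin/bash
# Requires bash 4.2+; only takes effect while job control is off,
# which is the default for non-interactive scripts.
shopt -s lastpipe

# listUsers is a hypothetical stand-in for `ps -e -o user`.
listUsers()
{
  printf '%s\n' root apache apache postfix
}

doRead()
{
  local retVal=1
  listUsers | grep apache | \
  while read -r user
  do
    echo "$user"
    retVal=0
  done
  return $retVal
}

doRead
status=$?
echo "doRead exited with retVal = $status"
```

The pipeline keeps its original shape, which makes this the smallest change to the script, at the cost of depending on a newer bash than the one in use when this thread was written.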

  22. dosnlinux Says:

    I know entries are closed, but it’s still a cool puzzle.

    Couldn’t you just export $retVal? That way you’d have access to the same variable in subshells.

  23. Jim Says:

    My understanding has always been that a for loop is better for iterating a definable list (which we can get with the grep statement in this example) than the more open ended while loop anyway.

    I wonder if there is a way (with bash) to quantify the relative difference. It would seem, at first blush, that the while loop would take more allocated memory and might be slower.

    The for loop probably has drawbacks as well.

    Anybody have any numbers that they can share?

  24. PAshaRome Says:

    Sorry for my eng.

    There is a scope problem.

    retVal inside the loop is not the same as retVal outside the loop.

    Tnx. (I would like to have a copy of the book ;) )

  25. admin Says:

    Jim,

    Here’s a test that shows that the while loop is much faster. I am quite surprised…


    $ ./whileForLoopTest.sh
    55
    31
    $ ./whileForLoopTest.sh
    54
    30

    Here’s the script.


    $ cat whileForLoopTest.sh
    #!/bin/bash
    run()
    {
    for((I=0; I < 500; I++))
    do
    eval "$@"
    done
    }

    forTest()
    {
    for user in $( ps -e -o user )
    do
    :
    done
    }

    whileTest()
    {
    ps -e -o user | \
    while read user
    do
    :
    done
    }

    start=$( date +%s )
    run forTest
    expr $( date +%s ) - $start

    start=$( date +%s )
    run whileTest
    expr $( date +%s ) - $start

  26. admin Says:

    dosnlinux,

    Yes, I could have exported the value. However, it seems inelegant.

  27. admin Says:

    PAshaRome,

    Yes, thanks for the answer. Do not worry about the English. Not everyone is going to get a copy; I’ll let you know.

    Thanks.

  28. Adny Says:

    Exporting retVal won’t work - the subshell will have access to its copy of the parent’s variable, not to the parent’s variable itself :)
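
A quick way to check this is the sketch below, which assumes bash with its default settings (in particular, lastpipe is not enabled). The variable is exported, changed inside a piped loop, and printed again afterwards.

```shell
#!/bin/bash
retVal=1
export retVal

# The loop runs in a subshell that starts with a copy of retVal.
echo apache | while read -r user
do
  retVal=0
  echo "inside the loop: retVal=$retVal"
done

# The parent shell's variable is untouched by the subshell.
echo "after the loop: retVal=$retVal"
```

Exporting controls what child processes inherit; it does not give them a way to write back to the parent.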

  29. Mangesh Says:

    This is problem of scope.
    Scope of shell is not accessible inside subshell.
    If we do not initialize or declare variable “retVal” before while loop (i.e. remove statement retVal=1 ), the script will run fine.

  30. Sulakshana Says:

    The
    local retVal=1

    and the retVal inside the while loop are treated differently by the shell.

    Whatever the loop does, the outer retVal is not affected.

    So successful or unsuccessful, the function will exit with value = 1.

  31. Bobby Says:

    The problem is with the scope of the calls and the variable being set local within the function. If the variable had been set in the script instead of the function the return value would have been 0.

  32. Al Says:

    The biggest problem I found with for is when dealing with space separated multi-word data input. “For” doesn’t know where the real data begins or ends.

    My vote is for extending the subshell with ()’s giving you a chance to deal with vars. Good point. Thank you.
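
The word-splitting problem with for can be seen in a small sketch. It assumes bash, and data is a hypothetical function that emits two lines of two words each.

```shell
#!/bin/bash
# data is a hypothetical producer of multi-word lines.
data()
{
  printf '%s\n' 'alice smith' 'bob jones'
}

# for splits the command substitution on any whitespace,
# so each word becomes its own item.
forCount=0
for item in $(data)
do
  forCount=$((forCount + 1))
done

# while read consumes one whole line per iteration.
whileCount=0
while IFS= read -r line
do
  whileCount=$((whileCount + 1))
done < <(data)

echo "for saw $forCount items, while read saw $whileCount lines"
```

With this input the for loop iterates four times and the while loop twice, which is the difference Al describes.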
