Eight ways to speed up your shell scripts
March 21st, 2008
UPDATE: Including the one I added after posting and Elias’ quoting example in the comments, we are up to eight.
After reading Shell Scripting Recipes, I became more interested in the speed of shell operations. In his book, Chris says “Command Substitution Is Slow.” He is correct!
$ f() { echo -n; }; time for i in {0..100}; do v=$( f ); done
real 0m4.189s
user 0m0.000s
sys 0m4.188s
$ f() { _F=""; }; time for i in {0..100}; do f; v=$_F; done
real 0m0.006s
user 0m0.000s
sys 0m0.000s
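The pattern timed above can be sketched as follows. This is only an illustration: the function names and the `_GREETING` variable are made up for the example. The first function prints its result, so the caller has to capture it with command substitution; the second leaves its result in a global variable, so no subshell is involved.

```shell
# Slow pattern: the function prints its result and the caller captures it
# with command substitution, which runs the function in a subshell.
greeting_slow() {
    printf 'hello %s' "$1"
}

# Faster pattern: the function stores its result in a global variable
# (_GREETING is a hypothetical name), so the caller just reads it back.
greeting_fast() {
    _GREETING="hello $1"
}

slow_result=$(greeting_slow world)

greeting_fast world
fast_result=$_GREETING

echo "$slow_result"
echo "$fast_result"
```

The trade-off is that the global variable is shared state, so each function needs its own result variable or the caller must copy the value before calling again.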
I found a few other equivalent operations that can speed up shell scripts to varying degrees, depending on the task at hand, though none as dramatically as the above. As Chris says, “the extra few milliseconds … may not seem significant, but scripts often loop hundreds or even thousands of times.”
Assigning to index ${#array[@]} is faster than rebuilding with () when appending to an array (#7)
$ a=(); time for i in {0..1000}; do a=(${a[@]} $i);done; echo ${#a[@]}
real 0m3.545s
user 0m3.544s
sys 0m0.000s
1001
$ a=(); time for i in {0..1000}; do a[${#a[@]}]=$i;done; echo ${#a[@]}
real 0m0.043s
user 0m0.040s
sys 0m0.003s
1001
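A minimal sketch of the faster form outside a timing loop, assuming bash arrays. The array name and sample values are invented for the example. Because the value is assigned directly to the next index, entries containing spaces stay intact.

```shell
# Append by assigning to the next free index instead of rebuilding the
# whole array with items=(${items[@]} value).
items=()
for word in "first item" second "third item"; do
    items[${#items[@]}]=$word
done

echo "${#items[@]}"   # number of elements
echo "${items[0]}"
```

Note that `${#items[@]}` is the element count, so this only targets the next free slot when the array has no gaps in its indices.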
< is faster than cat
$ time for i in {0..10000}; do var=`cat out`;done
real 0m9.328s
user 0m2.892s
sys 0m6.436s
$ time for i in {0..10000}; do var=`<out`;done
real 0m5.930s
user 0m1.412s
sys 0m4.520s
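The same comparison as a small script, using the `$( )` spelling of command substitution. This is a sketch under the assumption that bash is the shell (the `$(<file)` form is not available in every sh), and the temporary file from `mktemp` stands in for the `out` file used above.

```shell
# Create a small sample file; the path comes from mktemp.
tmpfile=$(mktemp)
printf 'line one\nline two\n' > "$tmpfile"

# External process: the shell has to start cat.
with_cat=$(cat "$tmpfile")

# Bash reads the file itself; no cat process is started.
with_redirect=$(<"$tmpfile")

rm -f "$tmpfile"

echo "$with_redirect"
```

Both forms strip trailing newlines from the captured text, so the two variables should hold the same value.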
echo is faster than printf (though not nearly as powerful)
$ time for i in {0..100000}; do printf "\n"; done >/dev/null
real 0m4.446s
user 0m4.076s
sys 0m0.236s
$ time for i in {0..100000}; do echo; done >/dev/null
real 0m3.291s
user 0m3.100s
sys 0m0.184s
Arithmetic Evaluation is faster than let
$ i=0; time while :; do let "i = i + 1"; [[ $i -gt 100000 ]] && break;done
real 0m8.211s
user 0m7.900s
sys 0m0.304s
$ i=0; time while :; do ((i++)); [[ $i -gt 100000 ]] && break;done
real 0m5.287s
user 0m4.980s
sys 0m0.304s
UPDATE: This appears to still be true, but by a different margin. See comments.
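For reference, here are the two forms side by side with the same expression in each, so that only `(( ))` versus `let` differs. This is a sketch, not a benchmark; the `limit` value is arbitrary and chosen for the example.

```shell
limit=1000

# Arithmetic evaluation: both the increment and the loop test use (( )).
i=0
while (( i < limit )); do
    (( i += 1 ))
done

# The same loop with let doing the increment.
j=0
while (( j < limit )); do
    let "j += 1"
done

echo "$i $j"
```

One thing to keep in mind: `(( expr ))` and `let` return a non-zero exit status when the expression evaluates to 0, so `((i++))` with `i=0` reports failure. That matters in scripts that run under `set -e`.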
List expansion is faster than seq and command substitution (though not always available)
$ time for i in $(seq 0 1000000); do :; done
real 0m28.482s
user 0m28.066s
sys 0m0.412s
$ time for i in {0..1000000}; do :; done
real 0m24.563s
user 0m24.402s
sys 0m0.156s
UPDATE: On BSD systems the apparent seq equivalent (jot) is faster than list expansion. See comments.
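One practical limit of list expansion in bash is that the bounds must be literal. A minimal sketch, assuming bash: with a variable bound the braces are not expanded into a sequence, and a C-style `for` loop is one way to avoid both `seq` and `jot`. The variable names are made up for the example.

```shell
n=5

# Brace expansion happens before variable expansion, so {0..$n} is not
# turned into a sequence; the loop body runs once with the literal text.
literal_count=0
for i in {0..$n}; do
    literal_count=$((literal_count + 1))
done

# A C-style for loop accepts a variable bound and needs no external command.
sum=0
for ((i = 0; i <= n; i++)); do
    sum=$((sum + i))
done

echo "$literal_count $sum"
```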
: is faster than true
$ i=0; time while true; do ((i++)); [[ $i -gt 1000000 ]] && break;done
real 0m57.360s
user 0m53.967s
sys 0m3.392s
$ i=0; time while :; do ((i++)); [[ $i -gt 1000000 ]] && break;done
real 0m54.138s
user 0m50.571s
sys 0m3.560s


March 23rd, 2008 at 7:19 pm
Hi,
in “Arithmetic Evaluation is faster than let” you’re comparing
$ i=0; time while :; do let "i = i + 1"; [[ $i -gt 100000 ]] && break;done
and
$ i=0; time while :; do ((i++)); [[ $i -gt 100000 ]] && break;done
That’s not really “Arithmetic Evaluation vs. let”, though, because the former can be written as
$ i=0; time while :; do let i++; [[ $i -gt 100000 ]] && break;done
resulting in mostly the same speed. So what’s slow here is really
((i++)) vs. ((i=i+1)) or `let i++` vs. `let i=i+1`
March 23rd, 2008 at 7:31 pm
Hi,
On BSD systems, jot(1) is faster than List expansion.
bash-3.2$ time for i in `jot 1000001 0`; do :; done
real 0m8.765s
user 0m8.799s
sys 0m0.187s
bash-3.2$ time for i in {0..1000000}; do :; done
real 0m9.411s
user 0m9.346s
sys 0m0.053s
PS: naturally, using jot(1) doesn’t exactly make a script portable since it won’t work on, say, linux, but then again using seq(1) would break the script on BSD…
March 23rd, 2008 at 9:58 pm
@Elias,
You make an excellent point. The comparisons are not the same. Not sure why I did not use “let i++”. With that said, ((i++)) still seems to be faster than let i++ but not by the same margin.
Brock
March 23rd, 2008 at 10:01 pm
@Elias,
Thanks for the BSD tip. I have been meaning to start up an OpenBSD and a FreeBSD VMware instance.
Brock
March 25th, 2008 at 7:46 am
Hi,
it’s more important than I would’ve thought, too, not to overuse quotes - especially with double quotes when they’re not needed:
March 25th, 2008 at 8:32 am
From Elias via email:
Regarding:
$ a=(); time for i in {0..1000}; do a=(${a[@]} $i);done; echo ${#a[@]}
real 0m3.545s
user 0m3.544s
sys 0m0.000s
1001
$ a=(); time for i in {0..1000}; do a[${#a[@]}]=$i;done; echo ${#a[@]}
real 0m0.043s
user 0m0.040s
sys 0m0.003s
1001
Not only is the former far less efficient, it also isn’t safe for entries that contain spaces:
$ A[0]='a b'
$ A[1]=c
$ for ((i=0; i< ${#A[@]}; i++)); do echo ${A[$i]}; done
a b
c
$ B=('a b')
$ B=(${B[@]} c)
$ for ((i=0; i<${#B[@]}; i++)); do echo ${B[$i]}; done
a
b
c
$ C=('a b') # now with quotes!
$ C=("${C[@]}" c)
$ for ((i=0; i<${#C[@]}; i++)); do echo ${C[$i]}; done
a b
c
– Elias
March 25th, 2008 at 8:42 am
@Elias,
> it’s more important than I would’ve thought, too, not to overuse quotes
That is very interesting… I am guessing BASH checks each token to see if it’s quoted; if so, it must remove the quotes, which is the extra time we are seeing. I am not exactly sure why there would be a difference between single and double quotes. Likely, I guess, because double quotes can have enclosed expressions whereas single quotes are taken literally.
> Not only is the former far more efficient, it also isn’t safe for entries that contain spaces:
Good point, I did not quote the variable in my example.
March 25th, 2008 at 9:27 am
I meant to say ‘far less efficient’, sorry.
March 25th, 2008 at 9:59 pm
Fixed.