Missing space - deleting open files

I ran into this one again today. If a file is still open when it is deleted, it no longer appears in a directory listing, but it keeps taking up space until the process holding it open closes it or exits.

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   58G   11G  86% /
# cat - >>large-file &
[1] 8958
# lsof large-file
COMMAND  PID USER   FD   TYPE DEVICE       SIZE    NODE NAME
cat     8958 root    1w   REG  253,0 5120000000 4300883 large-file
# rm -f large-file
# lsof | grep large-file
cat       8958      root    1w      REG      253,0 5120000000    4300883 /root/large-file (deleted)
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   58G   11G  86% /
# kill -9 8958
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      72G   53G   15G  79% /
[1]+  Killed                  cat - >>large-file
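The same effect can be reproduced without filling a disk. The following is a minimal sketch, assuming Linux with /proc mounted and bash; the scratch file and the choice of descriptor 3 are arbitrary and only for illustration. On a real system, `lsof +L1` lists open files whose link count has dropped below one.

```shell
#!/bin/bash
# Hold a scratch file open on descriptor 3, delete it, and inspect the descriptor.
scratch=$(mktemp)                 # hypothetical scratch file
exec 3>>"$scratch"                # keep it open for appending
echo "still writable" >&3
rm -f "$scratch"                  # the directory entry is gone ...

# ... but the descriptor is still valid. readlink inherits descriptor 3,
# so /proc/self/fd/3 points at the same deleted file.
fd_target=$(readlink /proc/self/fd/3)
echo "$fd_target"                 # Linux appends " (deleted)" to the old path

exec 3>&-                         # closing the last descriptor frees the blocks
```

If the process cannot be killed or restarted, closing or truncating the file through its descriptor is what finally returns the space.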

uuencode/uudecode on RHEL (CentOS)

Earlier today I was looking to use uuencode on my RHEL host. Unfortunately, yum did not help:

# yum search uuencode
Loading "installonlyn" plugin
Setting up repositories
base                      100% |=========================| 1.1 kB    00:00
updates                   100% |=========================|  951 B    00:00
addons                    100% |=========================|  951 B    00:00
extras                    100% |=========================| 1.1 kB    00:00
Reading repository metadata in from local files
No Matches found

Furthermore, I struggled to find the correct search terms for Google to provide me with an answer. The correct package is “sharutils.” Anyways, for good measure, here is a quick demo of uuencode/uudecode:

$ echo "BASH Cures Cancer" > test.txt
$ zip test.zip test.txt
  adding: test.txt (stored 0%)
$ uuencode < test.zip -
begin 664 -
M4$L#!`H``````-%9=3@7HDD\$@```!(````(`!4`=&5S="YT>'155`D``^G>
MXT?IWN-'57@$`/0!]`%"05-(($-U<F5S($-A;F-E<@I02P$"%P,*``````#1
M674X%Z))/!(````2````"``-```````!````M($`````=&5S="YT>'155`4`
?`^G>XT=5>```4$L%!@`````!``$`0P```$T`````````
`
end
$ uuencode < test.zip - | uudecode > test2.zip
$ unzip test2.zip
Archive:  test2.zip
replace test.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
 extracting: test.txt
$ cat test.txt
BASH Cures Cancer

From the manual: “Uuencode reads file (or by default the standard input) and writes an encoded version to the standard output. The encoding uses only printing ASCII characters and includes the mode of the file and the operand name for use by uudecode.”

The other day, I began wondering which comparator, test, [, or [[, was fastest? Here are the results:

$ time for i in {1..100000}; do [[ -d . ]];done
real    0m1.256s
user    0m1.018s
sys     0m0.238s
$ time for i in {1..100000}; do [ -d . ];done
real    0m3.407s
user    0m2.704s
sys     0m0.703s
$ time for i in {1..100000}; do test -d .;done

real    0m3.223s
user    0m2.607s
sys     0m0.616s

The double bracket is a “compound command,” whereas test and the single bracket are shell built-ins (and are in fact the same command). Thus, the single bracket and double bracket execute different code.

The test and single bracket are the most portable, as they also exist as separate, external commands. However, if you’re using any remotely modern version of BASH, the double bracket is supported.

Here are the performance numbers for the external versions of test and single bracket:

$ time for i in {1..100000}; do /usr/bin/test -d .;done

real    5m49.324s
user    0m51.771s
sys     4m48.013s
$ time for i in {1..100000}; do /usr/bin/[ -d . ];done

real    5m45.728s
user    0m52.536s
sys     4m46.259s

Wow! This shows the high cost of process creation!
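To check which implementation bash will actually run for each spelling, the `type` built-in can be asked directly. This is a small sketch assuming bash; `resolve` is a hypothetical helper name used only here.

```shell
#!/bin/bash
# resolve is a hypothetical wrapper: report how bash classifies a command name.
resolve() { type -t "$1"; }

for cmd in '[[' '[' test; do
    printf '%s is a %s\n' "$cmd" "$(resolve "$cmd")"
done
```

In bash this reports `[[` as a keyword and both `[` and `test` as builtins, which matches the timings: only the explicit /usr/bin paths force a new process for every test.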

Five Quick Command Line Tips

January 18th, 2008

These are my tips for the week. Have you any tips to share?

Create many files of random size

$ for i in $(seq 1 1000); do let "size = ($RANDOM/100)"; dd if=/dev/urandom \
of=file-$i count=$size 2>/dev/null;done

Delete files with find, faster

Many people use find to delete files. However, they often do so like this:

find directory conditionals -exec rm {} \;

Which certainly works. However, find has a built-in “-delete” action which is significantly faster.

$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -exec rm {} \;

real    0m2.540s
user    0m1.076s
sys     0m1.196s
$ find . -name 'file-*'| wc -l
0

$ find . -name 'file-*'| wc -l
1000
$ time find . -name 'file-*' -delete

real    0m0.118s
user    0m0.008s
sys     0m0.100s
$ find . -name 'file-*'| wc -l
0
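Not every find implementation has -delete. In that case, ending -exec with + instead of \; hands many file names to each rm invocation, which removes most of the per-file process cost. A minimal sketch, assuming a throwaway directory created with mktemp and 200 empty files made up for the demo:

```shell
#!/bin/bash
workdir=$(mktemp -d)                       # throwaway directory for the demo
for i in $(seq 1 200); do : > "$workdir/file-$i"; done

before=$(find "$workdir" -name 'file-*' | wc -l)
# "{} +" passes many names to each rm, unlike "{} \;" which runs rm once per file.
find "$workdir" -name 'file-*' -exec rm {} +
after=$(find "$workdir" -name 'file-*' | wc -l)

echo "before=$((before)) after=$((after))"
rmdir "$workdir"
```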

Finding more fun

Often those new to the shell do something like this to find the size of files:

$ ls -la | awk '/^\-/ {print $5}'
98816
99328
99840
99840
99840

Or the last modification date:

$ ls -la | awk '/^\-/ {print $6, $7}'
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03

Find offers a much cleaner interface to this information:

$ find . -maxdepth 1 -type f -printf "%s\n"
98816
99328
99840
99840
99840
$ find . -maxdepth 1 -type f -printf "%TY-%Tm-%Td %TH:%TM\n"
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
2008-01-17 23:03
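Because -printf formats can be combined, size and name can be emitted together and sorted without any awk in between. This sketch assumes GNU find (the -printf action is a GNU extension) and uses two invented files in a temporary directory:

```shell
#!/bin/bash
workdir=$(mktemp -d)
printf 'aaaa' > "$workdir/small"          # 4 bytes
printf 'aaaaaaaaaa' > "$workdir/large"    # 10 bytes

# %s is the size in bytes, %f the file name without leading directories.
largest=$(find "$workdir" -maxdepth 1 -type f -printf '%s %f\n' | sort -n | tail -n 1)
echo "$largest"
rm -r "$workdir"
```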

Standard input can only be read once

$ echo a > a
$ echo b > b
$ echo c > c

That is, this:

$ cat a b c | cat -
a
b
c

Is the same as:

$ cat a b c | cat - -
a
b
c

Meaning that standard input is “read destructive.” As Pádraig Brady said recently on the coreutils mailing list:

$ echo mouse | cat - -
mouse

Odd characters showing up in your terminal when using PuTTY

Go to your settings:

Window
Translation
Change “Received data assumed to be in which character set” to UTF-8

Update: I changed the first examples when finding last modification date to use awk instead of grep and awk.  Hard to change my ways!

Do it better with awk 1

January 10th, 2008

If you are a system administrator or developer, you need to process log files to get a better grasp of what is happening. Many people use Perl or Python to help with this task. However, using one of the P languages is often overkill. Furthermore, every single day I am on a machine that I cannot make changes to and thus cannot use my helper scripts. Fortunately, awk has the tools to solve most on-the-fly log processing problems directly from the command line. In addition, awk can provide a more concise and faster solution than the pipeline of cut, grep, sort, and other commands you are currently using.

In this article, this is the format of the file I am working with:

$ tail -n 1 access_log-2008-01
1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856

Basically what we have here is: ip address, date, request, response code, response size. (Ignoring the dashes after the ip address.)

How would you find the largest response sent by your HTTP server? My typical solution has always been:

$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272

However, there is clearly a better solution.

By default, awk splits input lines on whitespace, and assigns the entire line to $0, each field to $n, and the number of fields to NF. See this example:

$ echo a b c d e f | awk '{print $0}'
a b c d e f
$ echo a b c d e f | awk '{print $1}'
a
$ echo a b c d e f | awk '{print $2}'
b
$ echo a b c d e f | awk '{print NF}'
6

Note that you can print the last field by using NF as the field number:

$ echo a b c d e f | awk '{print $NF}'
f

Or print the second field from the end:

$ echo a b c d e f | awk '{print $(NF-1)}'
e

Look at my example again:

$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272

That solution starts four processes (awk, egrep, sort, and tail) and passes the data through four stages. That is exceedingly inefficient! How about this:

$ awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272

This starts one process and reads the data only once. That command in English says: for each line, if the last field is greater than max, store it in the variable “max”. Once we have processed all the lines, print max.
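One caveat: the size field is “-” for responses that sent no body, which is why the pipeline version filters out dashes, and awk compares such values as strings. Here is a hedged variant that only considers purely numeric fields, run against three invented log lines rather than the real log:

```shell
#!/bin/bash
# Three invented log lines; the second has "-" instead of a size.
sample_log='1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856
2.2.2.2 - - [10/Jan/2008:17:26:52 -0600] "GET /a HTTP/1.1" 304 -
3.3.3.3 - - [10/Jan/2008:17:26:53 -0600] "GET /b HTTP/1.1" 200 9000'

# Only numeric last fields take part; "+ 0" forces a numeric comparison.
max_size=$(printf '%s\n' "$sample_log" |
    awk '$NF ~ /^[0-9]+$/ && $NF + 0 > max { max = $NF + 0 } END { print max + 0 }')
echo "$max_size"
```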

Which command do you suppose is faster?

$ time awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272
real    0m1.107s
user    0m1.070s
sys     0m0.037s

$ time awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272
real    0m0.207s
user    0m0.194s
sys     0m0.012s

Experts state that “1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.” That log file is only 12MB in size, and there is already a difference in speed which you can notice at the terminal. Imagine if the log were 300MB.

Awk also has extremely accessible associative arrays. Here I use an array to count HTTP response codes:

$ awk '{counts[$(NF-1)]+=1}; END {for(code in counts) print code, counts[code]}' \
access_log-2008-01
206 177
301 1212
302 302
304 5051
403 5
200 82539
404 906
405 1
500 183

The previous command in English says: for each line, using the second to last field as our index, increment our array. Once we have processed all lines, loop through the array, assigning each array index to “code”, and print the code along with its count.
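awk’s for (key in array) loop visits the keys in no particular order, as the output above shows. To try the counting idiom without a real log, here is a small sketch using four invented log lines, with sort added for a stable listing:

```shell
#!/bin/bash
# Four invented log lines in the same format as the access log.
sample_log='1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856
2.2.2.2 - - [10/Jan/2008:17:26:52 -0600] "GET /a HTTP/1.1" 304 -
3.3.3.3 - - [10/Jan/2008:17:26:53 -0600] "GET /b HTTP/1.1" 200 9000
1.1.1.1 - - [10/Jan/2008:17:26:54 -0600] "GET /c HTTP/1.1" 404 512'

# Count lines per response code, then sort because awk's key order is unspecified.
code_counts=$(printf '%s\n' "$sample_log" |
    awk '{ counts[$(NF-1)] += 1 } END { for (code in counts) print code, counts[code] }' |
    sort -n)
echo "$code_counts"
```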

Let’s count the number of requests for each URL:

$ awk '{counts[$(NF-3)]+=1}; END {for(url in counts) print counts[url], url}' \
access_log-2008-01 | sort -n
...output removed...
796 /media/centos5.0_install/common/AA-bios.jpg
846 /robots.txt
1063 /media/misc/why-bad-interpreter-premature-end-of-script-headers.png
1425 /media/10-linux-commands-youve-never-used/mkfifo-write-to-pipe.png
1443 /media/10-linux-commands-youve-never-used/read-from-pipe.png
1629 /
2066 /feed/
3073 /10-linux-commands-youve-never-used.html
3909 /wp2.3/wp-content/themes/minn-01/style.css
6989 /favicon.ico

Now let’s sum the response sizes for each URL and display the totals in MB:

$ awk '{sizes[$(NF-3)]+=$NF}; END {for(url in sizes) print (sizes[url]/1024/1024) "MB", url}' \
access_log-2008-01  | sort -n
...output removed...
68.6784MB /media/centos5.0_install/gui_common/AQ-install-in-progress-3.png
72.0453MB /media/centos5.0_install/gui_common/AP-install-in-progress-2.png
74.0067MB /media/centos5.0_install/gui_common/AT-setup-agent-welcome.png
74.6089MB /media/centos5.0_install/gui_common/AV-setup-agent-firewall-r-u-sure.png
78.2652MB /media/centos5.0_install/gui_common/BA-setup-agent-sound-card.png
80.3148MB /media/centos5.0_install/gui_common/AG-bootloader-configuration.png
85.8359MB /media/centos5.0_install/gui_common/AI-set-timezone.png
101.836MB /media/centos_4.4_boot.iso
137.622MB /
263.253MB /media/centos_5.0_boot.iso

Let’s do the same for IP addresses:

$ awk '{counts[$1]+=1}; END {for(ip in counts) print counts[ip], ip}' \
access_log-2008-01 | sort -n
...output removed...
378 67.202.20.7
402 65.214.45.100
476 195.225.177.39
493 87.207.147.201
702 66.150.96.121
704 213.239.195.172
968 82.150.18.3
1335 65.28.61.246
2330 66.249.73.75
2883 71.63.249.40

$ awk '{sizes[$1]+=$NF}; END {for(ip in sizes) print (sizes[ip]/1024/1024) "MB", ip}' \
access_log-2008-01 | sort -n
...output removed...
20.9338MB 61.64.209.144
21.8517MB 116.71.182.210
23.4265MB 85.102.126.48
31.5194MB 213.239.195.172
32.732MB 67.176.123.158
37.9046MB 66.249.73.75
56.1901MB 71.63.249.40
57.9892MB 67.202.20.7
78.6117MB 65.28.61.246

Sum the size of all responses by IP address if the response code is 200:

$ awk '$(NF-1) == 200 {sizes[$1]+=$NF}; END {for(ip in sizes) print (sizes[ip]/1024/1024) "MB", ip}' \
access_log-2008-01 | sort -n
...output removed...
16.5405MB 220.181.38.245
16.7031MB 207.67.117.178
16.7661MB 128.227.0.66
16.9171MB 67.176.123.158
18.2246MB 71.72.54.173
31.5194MB 213.239.195.172
37.3774MB 66.249.73.75
53.6944MB 71.63.249.40
57.9885MB 67.202.20.7
76.9965MB 65.28.61.246

The command in English: for each line, if the response code is 200 ($(NF-1)), then increment our array at index ip address ($1), by response size ($NF).

Any questions, comments, or suggestions? I will be writing a second article on some other features of awk in the near future.

I have had several people write asking how to fix unresolved UIDs and GIDs after moving servers. That is, they moved to a new hardware/operating system install and did not create the users and groups on the new host with the same IDs as on the old host. I am presenting a resolution. The script, when run on the old host, will output a list of find commands which you can run on the new host. This script is only meant to run on Linux. The Solaris/AIX versions of find do not support the options I need to perform this change safely.

Note: This script assumes that the old IDs are not resolved on the new host. That is, none of the old IDs resolve, accidentally, to users on the new host. If this is NOT the case, you can remove the -nouser and -nogroup find options. However, I would recommend running the script as is and then resolving any other issues by hand.

The process is as follows:

  • Download the script to the old host.
  • Run with the parameters needed and output redirected to a file.
  • Copy output file to the new host and make executable.
  • Execute as root.

Usage:

# ./fixNoUserGroupNames.sh
Usage: fixNoUserGroupNames.sh
-u uid do not alter users below this uid
-g gid do not alter groups below this gid
-p path start at this path

Sample run without redirecting to a file:

# ./fixNoUserGroupNames.sh -u 500 -g 500 -p /tmp/
#!/bin/bash
[[ "$USER" != "root" ]] && ( echo "must be root"; exit 1 )
[[ -d "/home/brock" ]] && chown brock /home/brock
find /tmp/ -nouser -uid 5022 -exec chown brock {} \;
[[ -d "/var/lib/nfs" ]] && chown nfsnobody /var/lib/nfs
find /tmp/ -nouser -uid 65534 -exec chown nfsnobody {} \;
[[ -d "/home/USER" ]] && chown USER /home/USER
find /tmp/ -nouser -uid 501 -exec chown USER {} \;
find /tmp/ -nogroup -gid 5022 -exec chgrp brock {} \;
find /tmp/ -nogroup -gid 65534 -exec chgrp nfsnobody {} \;
find /tmp/ -nogroup -gid 501 -exec chgrp USER {} \;
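The original script is not reproduced here. As a rough idea of how such output could be generated, the following sketch turns passwd-format lines into the find/chown commands shown above. The function name gen_user_fixes and the sample accounts are hypothetical; on a real old host the input would more likely come from `getent passwd`.

```shell
#!/bin/bash
# Hypothetical helper, not the original script: reads passwd-format lines on
# stdin and prints one find command per account at or above the minimum uid.
gen_user_fixes() {
    local min_uid=$1 start_path=$2
    awk -F: -v min="$min_uid" -v path="$start_path" '
        $3 >= min { printf "find %s -nouser -uid %s -exec chown %s {} \\;\n", path, $3, $1 }'
}

# Invented sample input in /etc/passwd format; root is skipped because 0 < 500.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'brock:x:5022:5022::/home/brock:/bin/bash' |
    gen_user_fixes 500 /tmp/
```

A group version would work the same way from /etc/group format, printing -nogroup, -gid, and chgrp instead.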

Running the script and resolving an unresolved uid and gid:

# ./fixNoUserGroupNames.sh -u 500 -g 500 -p /tmp/ >fix.sh
# chmod +x fix.sh
# ls -l /tmp/no-*
-rw-r--r-- 1 brock 5022 0 Oct 22 00:51 /tmp/no-group
-rw-r--r-- 1 5022 brock 0 Oct 22 00:51 /tmp/no-user
# ./fix.sh
# ls -l /tmp/no-*
-rw-r--r-- 1 brock brock 0 Oct 22 00:51 /tmp/no-group
-rw-r--r-- 1 brock brock 0 Oct 22 00:51 /tmp/no-user

I just came across a script that had a beautiful method of printing help messages. I am unsure why this method is not used more. Here is an example:


$ cat helpExample.sh
#!/bin/bash
usage()
{
echo "Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
" >&2
exit $1
}
usage 1

The script uses echo " to open a multi-line help message and closes it with " >&2, which redirects the message to standard error. Output:


$ ./helpExample.sh
Usage: helpExample.sh
This is an example help message in BASH.
-x this option does nothing
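A here-document is a common alternative that gives the same result and avoids having to escape double quotes inside the message. This is a sketch of that variant, not the method used in the script above:

```shell
#!/bin/bash
# Variant of usage() built on a here-document instead of a quoted echo.
usage() {
    cat >&2 <<EOF
Usage: ${0##*/}
This is an example help message in BASH.
-x this option does nothing
EOF
    exit "${1:-0}"    # exit with the given status, or 0 if none was passed
}
```

Calling `usage 1` prints the message to standard error and exits with status 1, exactly like the echo version.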

Knowing commands is essential to using the command line. It is even more so when programming the command line. However, you don’t have to know them yet! I can teach you the basics and then you can learn as you go. The most important thing you can possibly get from this article is the next sentence. UNIX/Linux systems have the ENTIRE user manual in neat organized sections relating to the specific command you want to know about. It’s called the man pages or manual pages. Just type “man” and then any command. Example “man find”.

This is the third installment of Shell Programming Beginners Class. If you’re reading this article first and would prefer to start from the beginning, you can do so here: motivation. If you would simply like to read the previous article, it is available here: variables.

Commands in UNIX/Linux, Windows, and Mac OS all follow the same format. That is “command [options] [arguments]”. Technically options are arguments but who cares. Example “find /home/brock -name ‘ECON1001*’” or “rmdir test_directory”.

Enough already, here is the list in no particular order:

  1. cd: change to a different directory
  2. cp: copy files and directories
  3. more: used for viewing a file one page at a time
  4. less: less is more (less is better than more)
  5. mv: move or rename files and directories
  6. vi: a text editor
  7. rmdir: remove empty directories
  8. rm: remove files and directories

Most of the commands should be self-explanatory. cd changes from one directory to another, cp copies things, rm removes things, and rmdir removes empty directories. You’re probably not sure why we have both rm and rmdir, or possibly what less, more, and vi are.

rmdir exists basically as the “correct” or “safe” way to delete directories. The command line does not have a trash bin. When it’s gone, it’s gone. Thus rm is often set up with the -i option enabled by default, which asks you to confirm EVERY SINGLE TIME you want to delete a file. You’re supposed to remove all the files in a directory and then, and only then, delete it with rmdir. So what does everyone do when they want to delete something? “rm -rf”.

rm -rf is jokingly called the nuclear option in some very dorky circles. It deletes recursively and does not prompt. Running “rm -rf /” will bring your system down and delete everything on your hard disk.
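To see the “safe” route in action without risking anything, here is a minimal sketch that works entirely inside a throwaway directory from mktemp; the directory and file names are made up for the example.

```shell
#!/bin/bash
sandbox=$(mktemp -d)              # throwaway directory, safe to delete
mkdir "$sandbox/test_directory"
: > "$sandbox/test_directory/notes.txt"

# rmdir refuses because the directory still contains a file.
if rmdir "$sandbox/test_directory" 2>/dev/null; then
    first_try=removed
else
    first_try=refused
fi

rm "$sandbox/test_directory/notes.txt"    # empty it first ...
rmdir "$sandbox/test_directory"           # ... and now rmdir succeeds
echo "first attempt: $first_try"
rmdir "$sandbox"
```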

When using the command line you typically have a small window where you type in your commands. As you type in commands, the output scrolls to the top of the window and eventually disappears off the top. Thus the need for more and less. They allow you to view a file one page at a time. Less is more, than more.

vi is the Windows Notepad of UNIX/Linux, although it’s very hard for new people to learn. If you have the command “nano” or “pico” available (just type it in) and are getting distressed with vi, give them a try. I’d rather have you never learn vi than never learn the command line because of vi.

If you have no idea what I am talking about, here are a few sample commands being executed. When the user types “vi file” they would see an editor screen, which you will not see below. After closing vi they would return to what we see below. The user copies a file, edits the new file, and then removes the old one. Looks like some cheating is going on.

# cp econ1101-brad.doc econ1101-jessica.doc
# vi econ1101-jessica.doc
# rm econ1101-brad.doc

This is the second installment of Shell Programming Beginners Class. The first installment was on motivation.

Variables are the heart of programming. They allow you to do something with a value which you do not know in advance. In practical terms, let’s say we have a program or script which takes one integer value (Ex: 1, 2, 3, 4, …) and simply adds 1 to that value. You give it 5, it gives you 6. Not very useful. However, in that program you are going to have to call whatever number the user gives you something. When writing the program you obviously do not know what value the user is going to enter. They may not even enter an integer. Let’s assume they will indeed give you an integer, for now. Whatever you call the number which the user gives you is the name of the VARIABLE.

Let’s say I call the user-entered value IN_INT; then my variable name would be IN_INT. In BASH, when I want to use the value of that variable (Ex: 6), I prefix the variable name with a dollar sign, $.

Here is our sample program: (All your programs must start with #!/bin/bash and must be executable, chmod 744 filename.)

# cat addOne.sh
#!/bin/bash
echo -n "Type an integer and press enter: "
read IN_INT
echo "You entered $IN_INT to the program."
echo "Here is your output master: " `expr $IN_INT + 1`

I will run the script:

# ./addOne.sh
Type an integer and press enter: 6
You entered 6 to the program.
Here is your output master: 7

Following along? If not, send me an email.

In BASH variables can be numbers or strings. Example:

# STR1="this is a string."
# echo $STR1
this is a string.
# STR2=THIS_IS_A_STRING_WITH_NO_SPACES
# echo $STR2
THIS_IS_A_STRING_WITH_NO_SPACES
# INT1=21
# echo $INT1
21

You can also assign the output of a command to a variable with the ` or backtick character:

# pwd
/root
# CWD=`pwd`
# echo $CWD
/root
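The $( ) form does the same job as backticks and is easier to nest. A short sketch, assuming bash or any POSIX shell; DIR_NAME is a name invented for this example:

```shell
#!/bin/bash
CWD=$(pwd)                        # same effect as CWD=`pwd`
DIR_NAME=$(basename "$(pwd)")     # nesting needs no extra escaping
echo "$CWD"
echo "$DIR_NAME"
```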

You can perform mathematical operations on variables in a few different ways. The simplest is the expr command. expr allows you to add, subtract, multiply, divide, find remainders, test values against other values, and a whole range of other things. I used expr above to add one to a number. Read the manual page of expr if you would like to learn more.
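Another of those ways is bash’s built-in arithmetic expansion, $(( )), which does not need an external command. The sketch below also checks that the input really is an integer, a case the sample program skipped. It assumes bash, and add_one is a hypothetical helper name.

```shell
#!/bin/bash
# add_one is a hypothetical helper: print N+1, or fail if N is not an integer.
add_one() {
    local n=$1
    local re='^(0|-?[1-9][0-9]*)$'    # plain decimal integers, no leading zeros
    if [[ $n =~ $re ]]; then
        echo $(( n + 1 ))
    else
        echo "not an integer: $n" >&2
        return 1
    fi
}

add_one 6                             # prints 7
add_one six || echo "rejected"        # error on standard error, then "rejected"
```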

If you would like to learn more please read the next article in this series Shell Programming Beginners Class Lesson 3 - Essential Commands.

I am going to write a few lessons on shell programming for beginners, from the basic to the advanced. I thought I would write some motivation first, as I know many people who would benefit greatly from shell scripting but choose not to learn.

Why in the world would you want to learn to use your shell environment as a programming environment?

Instead of doing this:

# ssh -l user host1
user @ host1 ~$ cd /app/path
user @ host1 /app/path$ cp important.conf important.conf.02122007
user @ host1 /app/path$ vi important.conf
user @ host1 /app/path$ exit
# ssh -l user host2
user @ host2 ~$ cd /app/path
user @ host2 /app/path$ cp important.conf important.conf.02122007
user @ host2 /app/path$ vi important.conf
user @ host2 /app/path$ exit
# ssh -l user host3
user @ host3 ~$ cd /app/path
user @ host3 /app/path$ cp important.conf important.conf.02122007
user @ host3 /app/path$ vi important.conf
user @ host3 /app/path$ exit
# ssh -l user host4
user @ host4 ~$ cd /app/path
user @ host4 /app/path$ cp important.conf important.conf.02122007
user @ host4 /app/path$ vi important.conf
user @ host4 /app/path$ exit
# ssh -l user host5
user @ host5 ~$ cd /app/path
user @ host5 /app/path$ cp important.conf important.conf.02122007
user @ host5 /app/path$ vi important.conf
user @ host5 /app/path$ exit

You do something like this:

# for HOST in host1 host2 host3 host4 host5; do \
ssh -l user $HOST "cd /app/path && \
perl -i.02122007 -pe 's@^key\s*=\s*val\$@key=newval@g' important.conf ; \
egrep '^key' important.conf" ; \
done
key=newval
key=newval
key=newval
key=newval
key=newval
#
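To try the in-place edit locally before pointing it at real hosts, here is a minimal sketch. It swaps in sed for the perl one-liner (an assumption made so the example needs nothing beyond standard tools) and works on an invented config file in a temporary directory.

```shell
#!/bin/bash
conf_dir=$(mktemp -d)                       # stands in for /app/path
printf 'key = val\nother=1\n' > "$conf_dir/important.conf"

# Same idea as the perl one-liner: edit in place and keep a dated backup.
sed -i.02122007 's/^key[[:space:]]*=[[:space:]]*val$/key=newval/' "$conf_dir/important.conf"

new_line=$(grep '^key' "$conf_dir/important.conf")
old_line=$(grep '^key' "$conf_dir/important.conf.02122007")
echo "now: $new_line (backup: $old_line)"
rm -r "$conf_dir"
```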

Then go get a cup of coffee or get some more work done and then get a raise! If you would like to learn more please read the next article in this series Shell Programming Beginners Class Lesson 2 - Variables.