Data Pipelines (Basic)

December 10th, 2006

A pipeline sends the standard output of one program to the standard input of another. This lets you chain commands without writing the output of one command to a file and then passing that file to the next command for further processing. Here is an example:

[root@www ~]# find . -type f | wc -l
1708
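
To make the "no intermediate file" point concrete, here is a minimal sketch that counts the same files both ways. The scratch directory and file names are made up for the demonstration, and it assumes a mktemp that accepts -d (GNU coreutils and recent BSDs do).

```shell
# Scratch area with three hypothetical files, so the count is predictable.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/files"
touch "$demo_dir/files/a.txt" "$demo_dir/files/b.txt" "$demo_dir/files/c.txt"

# Without a pipe: save the listing to a file, then count its lines.
list_file="$demo_dir/list.txt"
find "$demo_dir/files" -type f > "$list_file"
count_from_file=$(wc -l < "$list_file" | tr -d ' ')

# With a pipe: same result, no intermediate file to clean up.
count_from_pipe=$(find "$demo_dir/files" -type f | wc -l | tr -d ' ')

echo "$count_from_file $count_from_pipe"   # prints "3 3"
rm -rf "$demo_dir"
```

The tr -d ' ' is only there because some versions of wc pad the number with spaces.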

The find | wc pipeline counts the files in the current directory and its subdirectories. You can chain more than one pipe. The next example displays the five largest files or directories in the current directory, with sizes in kilobytes:

[root@www ~]# du -ks * | sort -rn | head -n 5
17207 08-28-2006.rar
14868 gb2.sql
6738 jpgraph-1.20.5
4387 jpgraph-1.20.5.tar.gz
602 temp.sql
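
The -n flag on sort matters in that pipeline. The sketch below feeds a few of the sizes from the listing to sort with and without it; the variable names are my own, and LC_ALL=C is set only to make the text comparison predictable.

```shell
# A few lines shaped like `du -ks *` output (taken from the listing above).
sizes='602 temp.sql
17207 08-28-2006.rar
4387 jpgraph-1.20.5.tar.gz
14868 gb2.sql'

# -rn: compare the leading numbers as numbers, largest first.
top_numeric=$(printf '%s\n' "$sizes" | sort -rn | head -n 2)

# -r alone: compare as text, so "602" sorts ahead of "17207".
top_text=$(printf '%s\n' "$sizes" | LC_ALL=C sort -r | head -n 2)

echo "$top_numeric"
echo "---"
echo "$top_text"
```

With -rn the two largest entries (17207 and 14868) come out on top. With plain -r the first two lines are 602 and 4387, because the comparison goes character by character.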

Here is another example, which uses four pipes. It displays the number of running processes for each user, sorted in descending order:

[root@www ~]# ps -ef --no-headers | awk '{print $1}' | sort | uniq -c | sort -rn
     42 root
      9 apache
      4 brock
      1 xfs
      1 smmsp
      1 mysql
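
The sort | uniq -c | sort -rn part is worth seeing on its own. Here is a small sketch using a made-up list of user names in place of real ps output; the trailing awk is my addition and only trims the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample standing in for the first column of `ps -ef`.
users='root
apache
root
brock
apache
root'

# uniq only collapses adjacent duplicates, so the first sort groups
# identical names together. uniq -c then counts each group, and
# sort -rn puts the largest count first.
counts=$(printf '%s\n' "$users" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "$counts"
# 3 root
# 2 apache
# 1 brock
```

Drop the first sort and uniq -c would report root several times, once for each separate run of the name.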

If you run a database server, you will probably want to back it up from time to time. This dumps my database (named bash), compresses it, and saves it to a file:

[root@www ~]# mysqldump -p bash | gzip -c > bash.sql.gz
Enter password:
[root@www ~]# ls -lh bash.sql.gz
-rw-r--r-- 1 root root 31K Dec 11 00:21 bash.sql.gz
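
A pipe works for the trip back, too. The sketch below round-trips a couple of made-up SQL lines through gzip so it runs without a database server; the file name and SQL are placeholders, not a real dump. The commented line at the end shows how the same shape would presumably feed the mysql client, assuming the bash database already exists on the server.

```shell
workdir=$(mktemp -d)

# Stand-in for mysqldump output (hypothetical SQL, not from a real database).
printf 'CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n' \
    | gzip -c > "$workdir/demo.sql.gz"

# gunzip -c writes the decompressed data to standard output,
# so it can be piped straight into the next command.
restored_lines=$(gunzip -c "$workdir/demo.sql.gz" | wc -l | tr -d ' ')
first_line=$(gunzip -c "$workdir/demo.sql.gz" | head -n 1)

echo "$restored_lines"   # prints 2
rm -rf "$workdir"

# With a real server, the restore would look something like:
#   gunzip -c bash.sql.gz | mysql -p bash
```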
