Learn the UNIX/Linux command line

Everyone should write shell scripts

Yes, everyone should write shell scripts. No matter what you are doing on a computer, chances are you will want to do it more than once. At that point, it should be in a shell script.

I teach Apache Hadoop and in our courses we have students complete labs. The very first lab in one course is simply pushing files to HDFS:


$ hadoop fs -put file .
$ hadoop fs -ls


Even these trivial examples should be scripted. Why? Paul MacCready. Paul was one of the two inventors of the first human-powered aircraft. As this excellent post argues, Paul’s solution to the problem was not that he was a better engineer than those who had tried previously (though that could be true as well), but that Paul knew they had to iterate fast.

Paul realized that what needed to be solved was not, in fact, human-powered flight. That was a red herring. The problem was the process itself, and along with it the blind pursuit of a goal without a deeper understanding of how to tackle deeply difficult challenges. He came up with a new problem that he set out to solve: how can you build a plane that can be rebuilt in hours, not months?

Once again, from The Wrong Problem.

Thus, the reason you should write shell scripts is so that you can iterate faster. Simply typing “hadoop fs -put” the first time is OK. Once you understand the command, you should be scripting it. This is how I solve over 75% of my problems: I write a script called “run.sh” which simply sets up the environment, deletes previous output files, and executes the command sequence I think will resolve my problem. If it’s a little off, I just make a small change and iterate. Instead of:


$ cmd1
$ cmd2
$ cmd3
$ ls -l
$ rm -rf output
$ cmd1
$ cmd2
$ cmd2.5
$ cmd3
$ ls -l
$ rm -rf output

it’s simply:


$ ./run.sh
$ vim run.sh
$ ./run.sh
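
For concreteness, a run.sh matching that loop might look something like the following. This is only a sketch; cmd1 through cmd3 and the output directory are placeholders for whatever your problem needs:

#!/bin/bash
# run.sh - set up, clean old output, run the command sequence
set -e                 # stop at the first failing command

rm -rf output          # delete previous output
cmd1
cmd2
cmd3
ls -l output           # inspect the results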

Setting a max heap on fuse-dfs

fuse-dfs is the FUSE plugin for HDFS (Hadoop Distributed File System). fuse-dfs creates a JVM through the native API and proxies FUSE requests to the Java API. This means the fuse-dfs process encapsulates a JVM.

The problem is that, by default, that JVM will consume a fair amount of memory unless you set a max heap. The code allows you to pass JVM options via the LIBHDFS_OPTS environment variable.

If you are using CDH, you can do this by creating a file in /etc/default/:

$ cat /etc/default/hadoop-0.20-fuse
export LIBHDFS_OPTS="-Xms128m -Xmx128m"
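
If you are mounting by hand rather than through the init scripts, the same idea applies: export the variable before starting fuse-dfs. A rough sketch using the hadoop-fuse-dfs wrapper, where the hostname, port, and mount point are placeholders for your environment:

$ export LIBHDFS_OPTS="-Xms128m -Xmx128m"
$ hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs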

Twin Cities Code Camp 10 (#TCCC10)

Hello,

Tomorrow, on the second day of TCCC10, I will be presenting two sessions.

  • Introduction to Apache Hadoop
  • Introduction to BASH Programming

See you then!

Symlinks are the Devil and Other Reasons Why You Should Reconsider Using Them

Symlinks are abused and misused far too often. I have seen it most often in storage. Let’s say project A is funded and they buy a few servers and some storage.

The storage is mounted on all the servers as /projectA/featureA and life goes on. Eventually they need to buy more storage. The storage they have is perfectly fine; they just want to add more. They need to decide where to place the storage for a new feature, B. They could mount the storage at /projectA/featureB.

However, since the project started, the business has decided that all storage will be mounted in /storage/mount#. They wanted to achieve a global namespace so storage could be moved from server to server without local path dependencies. Projects will have to use the storage in /storage. The newly purchased storage is mounted at /storage/mount1 as it’s the first storage purchased after the new mount policy came into effect.

Of course, the application developers do not like the fact that they have half their storage in /projectA/featureA and the other half in /storage/mount1. They have the UNIX team create a symbolic link from /projectA/featureB to /storage/mount1. Perfect, they think! Now they can always reference /projectA/..!

A few months later the project is a success! The business asks them for features X, Y, and Z, each requiring the purchase of more storage. Now they have five mounts:


/projectA/featureA
/projectA/featureB -> /storage/mount1
/projectA/featureX -> /storage/mount2
/projectA/featureY -> /storage/mount3
/projectA/featureZ -> /storage/mount4


Of course, by this time the project name has changed five times and it’s called ProjectF, but everyone remembers it used to be called ProjectA, and that is why the data lives in /projectA. Usage is continuing to increase and they decide to buy 50 more servers. The UNIX engineer who created the initial symbolic links is long gone and the new engineer has no idea these local dependencies exist. The servers are built with all the correct storage mounted: /projectA/featureA and /storage/mount1-4. The operations team starts the applications on the servers and heads home for the night.

The next morning there are complaints about the project’s website returning all kinds of errors! After some investigation they find that the symbolic links from /projectA/feature[BXYZ] do not exist on the new servers. Frantically they create the links and service is restored.

If you find yourself creating symbolic links for a production application, take a breather. Think about this decision.
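
If you do keep the links, at least make the dependency explicit so a newly built server can be checked before the applications start. A quick sanity check, using the feature names from the story above purely as an illustration:

$ for f in featureB featureX featureY featureZ; do [ -L "/projectA/$f" ] || echo "missing symlink: /projectA/$f"; done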

Timeout – new coreutils command

The command “timeout”, which debuted in Coreutils 7.0 beta (2008-10-05), is exceedingly useful. Often I write a script which needs to run a particular command for a period of time and then stop and restart it. Before timeout this was quite painful. Some commands such as nc (netcat) and curl offer native timeouts as they are dealing with network operations; non-network commands almost never do.
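
For context, before timeout existed you typically had to hand-roll something like the following. This is only a sketch; the command and the 60-second limit are placeholders:

$ some_long_command &        # run the command in the background
$ pid=$!
$ sleep 60                   # wait the full time limit, even if the command finished early
$ kill "$pid" 2>/dev/null    # stop it if it is still running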

Often I have to use a utility program where I am unsure of its ability to complete successfully in a given time period. Using the timeout command, I can ensure the command will stop within my timeframe. An example would be a Java class I don’t fully trust: with timeout it can complete successfully when it finishes within my time limit, and it is killed once the time limit has expired. Below I run the suspect Java class “SometimesLongRunning” using timeout. The command is supposed to finish in 5 seconds; however, I am giving it up to 1 minute to finish. The first time the class “SometimesLongRunning” executes, it completes successfully in 5 seconds:

$ date; timeout 1m java SometimesLongRunning; date
Sat Jan 15 10:11:33 CST 2011
Sat Jan 15 10:11:38 CST 2011

However, as I suspected, on subsequent runs it will not complete in 5 seconds and the 1 minute timeout is enforced:

$ date; timeout 1m java SometimesLongRunning; date
Sat Jan 15 10:13:53 CST 2011
Sat Jan 15 10:14:53 CST 2011

By default timeout sends the TERM signal after the timeout has expired; however, with the -s or --signal option you can tell timeout to send any signal you desire. I can see many uses for this feature. For example, if I am having a problem with initialization of a JVM, I can use this to send the QUIT signal and have a thread dump taken soon after the JVM is launched. That is much quicker than I could do by hand and less racy than other solutions.
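
A hedged sketch of that idea, where MyHangingApp and the 2-second delay are placeholders; QUIT only triggers a thread dump, so the JVM keeps running afterwards:

$ timeout -s QUIT 2s java MyHangingApp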

Back to the long-running Java class. Timeout allows me to do one better than simply killing the process after a specified time. I can use the --signal and --kill-after options together to take a thread dump after 10 seconds and then kill the JVM one minute later if it still has not exited. This both ensures the process respects my time limit and gives me debugging information as to where the JVM was stuck.

$ date; timeout --signal QUIT --kill-after 60s 10s java SometimesLongRunning; date
Sat Jan 15 10:18:03 CST 2011
2011-01-15 10:18:13
Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.0-b16 mixed mode):
"Low Memory Detector" daemon prio=10 tid=0x00002aacab78b000 nid=0x7161 runnable [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread1" daemon prio=10 tid=0x00002aacab788000 nid=0x7160 waiting on condition [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"CompilerThread0" daemon prio=10 tid=0x00002aacab4e2800 nid=0x715f waiting on condition [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00002aacab4e0c00 nid=0x715e runnable [0x0000000000000000..0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00002aacab4ce000 nid=0x715d in Object.wait() [0x0000000040d35000..0x0000000040d35ca0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aac023f1210> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
- locked <0x00002aac023f1210> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x00002aacab4cc800 nid=0x715c in Object.wait() [0x0000000040c34000..0x0000000040c34e20]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aac023f1078> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00002aac023f1078> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x0000000040111c00 nid=0x7152 waiting on condition [0x000000004022a000..0x000000004022af60]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at SometimesLongRunning.main(SometimesLongRunning.java:5)
"VM Thread" prio=10 tid=0x00002aacab4c7000 nid=0x715b runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c400 nid=0x7153 runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011e000 nid=0x7154 runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x000000004011f800 nid=0x7155 runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000040121000 nid=0x7156 runnable
"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000040122c00 nid=0x7157 runnable
"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000040124400 nid=0x7158 runnable
"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000040125c00 nid=0x7159 runnable
"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000040127800 nid=0x715a runnable
"VM Periodic Task Thread" prio=10 tid=0x00002aacab78d000 nid=0x7162 waiting on condition
JNI global references: 590
Heap
PSYoungGen total 149952K, used 2571K [0x00002aac023f0000, 0x00002aac0cb40000, 0x00002aaca9940000)
eden space 128576K, 2% used [0x00002aac023f0000,0x00002aac02672e28,0x00002aac0a180000)
from space 21376K, 0% used [0x00002aac0b660000,0x00002aac0b660000,0x00002aac0cb40000)
to space 21376K, 0% used [0x00002aac0a180000,0x00002aac0a180000,0x00002aac0b660000)
PSOldGen total 342720K, used 0K [0x00002aaab3940000, 0x00002aaac87f0000, 0x00002aac023f0000)
object space 342720K, 0% used [0x00002aaab3940000,0x00002aaab3940000,0x00002aaac87f0000)
PSPermGen total 21248K, used 2479K [0x00002aaaae540000, 0x00002aaaafa00000, 0x00002aaab3940000)
object space 21248K, 11% used [0x00002aaaae540000,0x00002aaaae7abf28,0x00002aaaafa00000)
Killed
Sat Jan 15 10:19:13 CST 2011

Free Shell Scripting Basics Webinar with a FREE SHELL ACCOUNT

Shell Scripting Basics! Have you always wanted to learn shell scripting? Have you struggled with moving beyond the basics? Oftentimes users new to the shell are not sure what questions to ask. In this webinar we will cover the basics and provide a forum to ask those questions you may have trouble resolving via Google. Active participation is encouraged!

At the end of the session, I will provide every participant with a free shell account! *

The following is a list of topics for the session. This is open to change as the attendees see fit.

  • Accessing a BASH shell from Windows or a Mac
  • File system layout
  • Basic commands: echo, ls, cd, pwd, etc.
  • Variables and quoting
  • Editing files
  • More commands: grep, tail, head, cut, etc.
  • Control structures: if/test, for and while loops

*The shell account will be provided on a shared VM and is to be used for functional education only. I reserve the right to terminate or deny access at my discretion.

Details:

  • Dates: Nov 09, 2010 and Nov 10, 2010
  • Time: 7 PM CST
  • WebEx: details sent to those who sign up below