rsync --partial

Everybody knows the rsync command. It’s very useful when you need to keep two machines in sync over the network, minimizing the bandwidth used.

The only problem arises the first time you run it, or when a new file appears and has to be uploaded to the other end: if the network connection is unstable and drops, rsync deletes the partially uploaded file, and the whole file has to be uploaded again. In other words, when copying new files rather than syncing existing ones, rsync only keeps complete files.

Luckily, there’s a command-line option that changes this behaviour: in the event of a network failure, it makes rsync leave the partially uploaded file as-is and continue from there on the next run:

# rsync --help
...
     --partial               keep partially transferred files
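Since --partial only pays off if rsync is actually run again after a failure, on an unreliable link it can be convenient to wrap it in a retry loop. The sketch below assumes bash; the rsync_retry name, its argument order and the retry policy are illustrative choices, not anything provided by rsync itself.

```shell
# rsync_retry MAX_TRIES DELAY_SECONDS RSYNC_ARGS...
# Hypothetical helper: re-runs "rsync -a --partial" until it succeeds
# or MAX_TRIES attempts have failed.
rsync_retry() {
    local max=$1 delay=$2 try=1
    shift 2
    while :; do
        # --partial keeps a partially transferred file, so the next
        # attempt can continue from it instead of starting over.
        if rsync -a --partial "$@"; then
            return 0
        fi
        if [ "$try" -ge "$max" ]; then
            echo "rsync failed after $try attempts" >&2
            return 1
        fi
        echo "rsync failed (attempt $try), retrying in ${delay}s" >&2
        try=$((try + 1))
        sleep "$delay"
    done
}
# Example: rsync_retry 10 30 -z bigfile.iso usr@srv:/backups/
```

Note that rsync’s -P option is shorthand for --partial --progress, which is handy when running it interactively.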

cut | sort | uniq: Apache logs

Often the reason a web server is slow and unresponsive is that it’s under “attack”, deliberate or not, by a bot. I’ve seen cases where Google Bot, crawlers run by university research projects, or some other kind of indexer were responsible for more than half of a site’s traffic. These aren’t real DoS attacks, and the traffic can be considered legitimate, but the result is the same: the service goes down. You can ask some of these bots to visit your site less often (Google Bot, for example, through Google Webmaster Tools, sitemaps and/or robots.txt), but usually you can’t, and you have to consider filtering that traffic at the firewall. In any case, the first step is realizing that a single IP (or a couple of them) is responsible for most of your traffic, identifying it, and using whois to find out who it belongs to.

You can run something like this to list the top five IP addresses in your Apache access.log:

cut -d" " -f 1 access.log | sort | uniq -c | sort -nr | head -n 5
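The same pipeline can be wrapped in a small function so the log file and the number of results can vary, and its output can be fed straight into whois. The top_ips name is hypothetical, and the sketch assumes the client IP is the first space-separated field of each line, as in Apache’s common and combined log formats.

```shell
# top_ips LOGFILE [N]: hypothetical wrapper around the pipeline above.
# Prints the N (default 5) most frequent client IPs with their hit counts.
top_ips() {
    local log=$1 n=${2:-5}
    # Field 1 is the client address in the common/combined log formats.
    cut -d" " -f 1 "$log" | sort | uniq -c | sort -nr | head -n "$n"
}
# Examples:
#   top_ips access.log 10
#   whois "$(top_ips access.log 1 | awk '{print $2}')"
```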

tar stdin/stdout

The tar command, like many other UNIX commands, can use stdin/stdout as its input/output target by using “-” instead of a real file.

What’s this useful for? For example, copying a full directory tree while keeping dates, ownerships and permissions (ownership is only restored when extracting as root), locally or remotely with ssh.

$ tar cf - . | ( cd /some/other/dir && tar xfv - )
$ tar cf - dir | ssh usr@srv "tar xfv -"
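For repeated use, the local copy can be wrapped in a function. This sketch (the copy_tree name is hypothetical) uses tar’s -C option, available in GNU and BSD tar, so each side works relative to its own directory and nothing gets unpacked in the wrong place if the destination is mistyped; -p asks tar to restore permissions even when not running as root.

```shell
# copy_tree SRC_DIR DEST_DIR: hypothetical helper that copies a directory
# tree through a tar pipe, keeping timestamps and permissions.
copy_tree() {
    local src=$1 dest=$2
    mkdir -p "$dest" || return 1
    # The first tar writes the archive to stdout ("-"), the second reads it
    # from stdin; -C makes each one work relative to its own directory.
    tar -C "$src" -cf - . | tar -C "$dest" -xpf -
}
```

For the remote case, adding z on both sides (for example tar -C dir -czf - . | ssh usr@srv "tar -C /some/other/dir -xzpf -") compresses the stream, which can help on slow links.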

cssh: Cluster SSH

Cluster SSH (cssh) is one of mankind’s greatest achievements. Really. Or at least, from a sysadmin’s point of view. :-) It’s one of those programs that, once you know it and start using it, make you wonder how you were able to survive all those dark years without it.

cssh takes a username and several IP addresses on the command line and opens an SSH session to each of those servers in its own xterm window. You can then click on any of these windows and work with that particular server on its own, or go back to the cssh console and type there, and cssh will relay each keypress to every SSH session.

This is a very useful tool when working with server farms, clusters, etc. For example, lately I’ve been doing some monitoring and maintenance tasks on a 32-server farm at a major ISP: making each configuration change one server at a time would have taken hours, but with cssh it’s a matter of minutes. :) Besides, since it replicates every keypress, it even works with text editors like vi: you can edit a file on every server at the same time, navigate through it, modify it… in parallel on every server. You have to be very careful when doing this, though. ;)

Shell-script: timestamp

A one-liner function that, when another command’s output is piped into it, prefixes each line of that output with a timestamp.

Very useful with commands that periodically print lines without a timestamp (like vmstat): if you simply redirect their output to a file, when you come back to it later you have no way of knowing when each line was printed.

$ function timestamp { while read -r l; do printf '%s %s\n' "$(date +%H:%M:%S)" "$l"; done; }
$ vmstat 1 | timestamp
12:17:03 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
12:17:03 r b swpd free buff cache si so bi bo in cs us sy id wa
12:17:03 1 0 4 16740 42816 395164 0 0 6 5 174 11 3 1 96 0
12:17:04 0 0 4 16656 42816 395184 0 0 0 0 393 510 1 1 98 0
12:17:05 0 0 4 16656 42816 395184 0 0 0 0 391 781 2 1 98 0
12:17:06 1 0 4 16656 42824 395176 0 0 0 84 462 976 3 1 95 0
12:17:07 0 0 4 16656 42824 395184 0 0 0 0 433 1545 11 3 86 0
12:17:08 0 0 4 16656 42824 395184 0 0 0 0 356 807 1 2 97 0
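A variant that takes the date(1) format as an argument (full date and time by default) can be handy for logs that span more than a day. The timestamp_fmt name is hypothetical; IFS= and read -r keep each line exactly as the command printed it, including leading spaces and backslashes, and printf prints it verbatim.

```shell
# timestamp_fmt [FORMAT]: hypothetical variant of timestamp that takes a
# date(1) format string (default: full date and time).
timestamp_fmt() {
    local fmt="${1:-%Y-%m-%d %H:%M:%S}" line
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +"$fmt")" "$line"
    done
}
# Example: vmstat 5 | timestamp_fmt > vmstat.log
```

On bash 4.2 and later, printf '%(%H:%M:%S)T' -1 prints the current time without forking date for every line, which can matter with fast-scrolling output.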

pkill

I’m almost ashamed to admit I didn’t know about this command until very recently. :-)

Everybody knows the kill command, and almost everybody knows killall too (which works like kill but takes the process name instead of its PID). The problem with killall is that it only signals processes whose name matches exactly, and sometimes you want to act on a series of related processes whose names start with the same word, or that have a given word in their command line. pkill does just that (-f matches against the full command line instead of just the process name):

pkill -9 -f foo

As a side note, what I used before when I wanted to do this was:

kill $SIGNAL $(ps aux | grep "$PROG" | grep -v grep | awk '{print $2}')
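Because -f matches anywhere in the full command line, a short pattern can hit more processes than intended. Its companion pgrep accepts the same matching options but only lists, so a cautious sketch checks first and then signals. The kill_matching helper is hypothetical; the -a flag (print the full command line) assumes the procps-ng pgrep found on most Linux systems, and the helper sends SIGTERM unless told otherwise, leaving -9 as a last resort.

```shell
# kill_matching PATTERN [SIGNAL]: hypothetical helper that shows which
# processes match PATTERN (against the full command line), then signals them.
kill_matching() {
    local pattern=$1 sig=${2:-TERM}
    # pgrep -a -f prints the PID and full command line of each match
    # (procps-ng); it exits non-zero when nothing matches.
    if ! pgrep -a -f -- "$pattern"; then
        echo "no process matches: $pattern" >&2
        return 1
    fi
    # pkill repeats the same match and sends SIGNAL to each process found,
    # with the signal given first as in "pkill -9 -f foo".
    pkill "-$sig" -f -- "$pattern"
}
# Examples: kill_matching foo
#           kill_matching 'java .*myapp' KILL
```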

ifstat

ifstat is a sort of vmstat for network interfaces: at a regular interval, it prints two columns per network interface showing the input and output bandwidth.

It’s quite useful when you need to diagnose a slow network, identify bottlenecks, monitor bandwidth consumption, etc. When invoked with the -t parameter, it adds a timestamp to each line, which is very important if you plan to leave it running for hours dumping its output to a file.

# ifstat -t
  Time           eth0                eth1
HH:MM:SS   KB/s in  KB/s out   KB/s in  KB/s out
12:40:25      0.05      0.19      0.00      0.00
12:40:26      0.26      0.24      0.00      0.00
12:40:27      6.87     30.14      0.00      0.00
12:40:28      0.17      0.26      0.00      0.00
12:40:29      0.05      0.22      0.00      0.00
12:40:30      0.05      0.14      0.00      0.00
12:40:31      0.05      0.14      0.00      0.00
12:40:32      0.29      0.38      0.00      0.00
12:40:33      0.29      0.38      0.00      0.00
12:40:34      0.13      0.22      0.00      0.00
12:40:35      2.85      8.70      0.00      0.00
12:40:36      9.01     38.84      0.00      0.00
12:40:37      0.55      0.24      0.00      0.00
12:40:38      0.05      0.14      0.00      0.00
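Once ifstat -t has been logging to a file for a while, a few lines of awk can pull out the peaks. The peak_rates name is hypothetical, and the sketch assumes the two header lines and column layout shown above, reading only the in/out columns of the first interface.

```shell
# peak_rates LOGFILE: hypothetical helper that scans a saved "ifstat -t" log
# and reports the highest KB/s in and out seen on the first interface.
peak_rates() {
    awk '
        # Data lines start with an HH:MM:SS timestamp; the headers do not.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            if ($2 + 0 > max_in)  { max_in = $2 + 0;  t_in = $1 }
            if ($3 + 0 > max_out) { max_out = $3 + 0; t_out = $1 }
        }
        END {
            printf "peak in: %.2f KB/s at %s\n", max_in, t_in
            printf "peak out: %.2f KB/s at %s\n", max_out, t_out
        }
    ' "$1"
}
# Example: ifstat -t -i eth0 5 > ifstat.log   (later: peak_rates ifstat.log)
```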