Tag Archive for 'grep'

cut | sort | uniq: Apache logs

Often the reason a web server is slow and unresponsive is that it’s under “attack”, on purpose or not, by a bot. I’ve seen cases where Googlebot, crawlers from university research projects, or some other kind of indexer were responsible for more than half the traffic of a site. These are not real DoS attacks, and the traffic can be considered legitimate, but the result is the same: it brings the service down. You can instruct some of these bots not to visit your site so often (Googlebot, for example, through Google Webmaster Tools, sitemaps, and/or robots.txt files), but usually you can’t, and you have to consider filtering the traffic at the firewall. In any case, the first step is realizing that a single IP (or a couple of them) is responsible for most of your traffic, identifying that IP, and using whois to learn who it belongs to.

You can run something like this to list the five IP addresses with the most requests in your Apache access.log:

cut -d" " -f 1 access.log | sort | uniq -c | sort -nr | head -n 5
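To make the steps of that pipeline easier to follow, here is a minimal sketch that wraps it in a function and runs it against a tiny made-up log. The function name top_ips, the sample log lines, and the documentation-range IP addresses are all assumptions for illustration; the only thing taken from the post is the pipeline itself, which relies on the client IP being the first space-separated field of each line.

```shell
# Hypothetical helper: print the N busiest client IPs in an access log.
# Assumes the client IP is the first space-separated field of each line.
top_ips() {
    log_file=$1
    count=${2:-5}
    # cut keeps the first field, sort groups identical IPs together,
    # uniq -c counts each group, sort -nr puts the largest counts first.
    cut -d" " -f 1 "$log_file" | sort | uniq -c | sort -nr | head -n "$count"
}

# Build a small sample log so the example runs on its own (made-up data).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.7 - - [10/Oct/2010:13:55:36 +0200] "GET / HTTP/1.1" 200 2326
198.51.100.4 - - [10/Oct/2010:13:55:37 +0200] "GET /a HTTP/1.1" 200 512
203.0.113.7 - - [10/Oct/2010:13:55:38 +0200] "GET /b HTTP/1.1" 200 128
203.0.113.7 - - [10/Oct/2010:13:55:39 +0200] "GET /c HTTP/1.1" 404 64
198.51.100.4 - - [10/Oct/2010:13:55:40 +0200] "GET /d HTTP/1.1" 200 256
192.0.2.1 - - [10/Oct/2010:13:55:41 +0200] "GET /e HTTP/1.1" 200 99
EOF

# Show the two busiest addresses with their request counts.
top_ips "$sample_log" 2

# Keep only the busiest address, for example to look it up afterwards
# with: whois "$busiest_ip"
busiest_ip=$(top_ips "$sample_log" 1 | awk '{print $2}')
echo "busiest: $busiest_ip"

rm -f "$sample_log"
```

On a real server you would point top_ips at the actual access.log instead of the sample file.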

pkill

I’m almost ashamed to admit I didn’t know about this command until very recently. :-)

Everybody knows the kill command, and almost everybody knows killall too (like kill, but using the process name instead of its PID). The problem with killall is that it only signals the processes matching the exact name you specify, and sometimes you want to act on a series of related processes that begin with the same word, or that have a given word on their command line. pkill does just that: it treats its argument as a pattern, and with -f it matches that pattern against the full command line:

pkill -9 -f foo
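Since a pattern can match more than you expect, it may help to preview the matches with pgrep, which accepts the same matching options as pkill. The following is only a sketch under some assumptions: the two sleep jobs are throwaway processes started for the demonstration, and -P $$ (match only children of the current shell) is added here so the example cannot touch anything else on the machine. It also sends the default SIGTERM rather than -9.

```shell
# Start two throwaway background jobs whose command lines share a prefix.
sleep 600 &
sleep 601 &

# Preview what would be signalled. -f matches against the full command
# line, -l lists the match next to its PID, and -P $$ restricts the match
# to children of this shell (a safety net for this demo only).
pgrep -l -f -P $$ 'sleep 60'

# Signal the same set of processes. Without -9 pkill sends SIGTERM,
# which gives a process the chance to clean up.
pkill -f -P $$ 'sleep 60'

# Reap the finished jobs, then check whether anything still matches.
wait 2>/dev/null
if pgrep -f -P $$ 'sleep 60' >/dev/null; then
    state=alive
else
    state=gone
fi
echo "matching processes: $state"
```

In everyday use you would drop -P $$ and run pgrep -l -f foo first, then pkill -f foo once the list looks right.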

As a side note, this is what I used before when I wanted to do the same thing (with $SIGNAL set to something like -9 and $PROG to the word to match):

kill $SIGNAL $(ps aux | grep $PROG | awk '{print $2}')





Creative Commons Attribution-NonCommercial 2.5 Spain