Archive for the 'lighttpd' category

cut | sort | uniq: Apache logs

Often the reason a web server is slow and unresponsive is that it’s under “attack”, deliberately or not, by a bot. I’ve seen cases where Google Bot, research bots run by universities, or some other kind of indexer were responsible for more than half of a site’s traffic. These are not real DoS attacks, and the traffic can be considered legitimate, but the result is the same: it brings the service down. You can instruct some of these bots not to visit your site so often (Google Bot, for example, through Google Webmaster Tools, sitemaps and/or robots.txt files), but often you can’t, and you have to consider filtering the traffic at the firewall. In any case, the first step is realizing that a single IP (or a couple of them) is responsible for most of your traffic, identifying that IP, and using whois to learn who it belongs to.

You can run something like this to list the top five IP addresses in your Apache access.log:

cut -d' ' -f 1 access.log | sort | uniq -c | sort -nr | head -n 5
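
As a variation on the pipeline above, here is a minimal sketch that also prints each address’s share of the total requests, which makes a dominant client easier to spot. The function name top_ips is made up for illustration, and it assumes the client IP is the first space-separated field of each line, as in Apache’s common and combined log formats.

```shell
#!/bin/sh
# top_ips LOGFILE [N]  (hypothetical helper, not part of any package)
# Prints "hits share ip" for the N busiest client addresses (default 5).
# Assumes the client IP is the first space-separated field of every line.
top_ips() {
    awk '{ hits[$1]++; total++ }
         END {
             for (ip in hits)
                 printf "%d %.1f%% %s\n", hits[ip], 100 * hits[ip] / total, ip
         }' "$1" |
        sort -k1,1nr -k3,3 |
        head -n "${2:-5}"
}
```

Something like top_ips access.log 5 would give the same ranking as the cut pipeline, with a percentage column added; the address at the top is the one to feed to whois.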

Daemontools scripts for lighttpd and PHP

I’ve prepared a set of daemontools scripts to launch and monitor lighttpd and its PHP processes spawned with spawn-fcgi. Here is the README, a tar file with the scripts, and here you can browse the directories with all the scripts.

PS: yes, I like daemontools. It helps me achieve high availability for many services, keeping them up even when a server misbehaves and some process dies. This avoids a lot of late-night calls. It’s a great invention. :)

PHP with lighttpd 1.5.0

PHP integration and configuration have changed in lighttpd 1.5.0: mod_fastcgi isn’t used any more and you need mod_proxy_backend_fastcgi instead, and lighty won’t launch the PHP processes itself, so you’ll have to start them with the spawn-fcgi program.

To set up mod_proxy_core for use with PHP, this is the bare minimum configuration (put it in lighttpd.conf, or conf-enabled/php.conf):

server.modules += ( "mod_proxy_core", "mod_proxy_backend_fastcgi" )

$PHYSICAL["existing-path"] =~ "\.php$" {
    proxy-core.allow-x-sendfile = "enable"
    proxy-core.protocol = "fastcgi"
    proxy-core.backends = ( "unix:/tmp/php-fastcgi.sock" )
    proxy-core.max-pool-size = 16
}

For the PHP FastCGI processes, just run the following command, or prepare an init.d script that runs it:

/usr/bin/spawn-fcgi -s /tmp/php-fastcgi.sock -f /usr/bin/php-cgi -u www-data -g www-data -C 5 -P /var/run/spawn-fcgi.pid
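
If you go the init.d route, a wrapper along these lines may be enough as a starting point. It is only a sketch: the function names and the overridable SPAWN_FCGI, SOCKET and PIDFILE variables are my own assumptions for illustration, the LSB header and status handling a real init script would have are left out, and stopping relies on the PID file that the -P option writes.

```shell
#!/bin/sh
# Hypothetical start/stop wrapper around the spawn-fcgi command above.
# Variable and function names are illustrative, not taken from any package.
SPAWN_FCGI="${SPAWN_FCGI:-/usr/bin/spawn-fcgi}"
SOCKET="${SOCKET:-/tmp/php-fastcgi.sock}"
PIDFILE="${PIDFILE:-/var/run/spawn-fcgi.pid}"

start_php() {
    # Same flags as the one-line command: socket, binary, user, group,
    # number of PHP children, and the PID file.
    "$SPAWN_FCGI" -s "$SOCKET" -f /usr/bin/php-cgi \
        -u www-data -g www-data -C 5 -P "$PIDFILE"
}

stop_php() {
    # Nothing to do if no PID file was left by a previous start.
    [ -f "$PIDFILE" ] || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE" "$SOCKET"
}

case "${1:-}" in
    start)   start_php ;;
    stop)    stop_php ;;
    restart) stop_php; start_php ;;
esac
```

Keeping the paths in variables means the same script can serve a second PHP pool on a different socket without editing the functions.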

lighttpd 1.5.0-SVN r1992 for Debian Sarge

I’ve built .deb packages of lighttpd 1.5.0-SVN r1992 for Debian Sarge. They’re based on the latest packages in testing, upgraded to 1.5.0. The only thing missing is mod_mysql_vhost, as I don’t have MySQL 5.0 installed at the moment. This server already runs lighty 1.5.0, so the fact that you’re reading this page is the best proof that it works. ;)

The packages are available for download here.

(PS: I know, I know, what I should do is upgrade to Etch altogether…)

UPDATE (20070921): new release with linux-aio-sendfile support. You’ll need the libaio port too.

CakePHP and lighttpd

I’ve been busy the last couple of days with a web project I’m developing using CakePHP, and I wanted to use lighttpd as the web server. CakePHP ships with the typical .htaccess containing Apache mod_rewrite rules, which need to be converted to lighty’s format. The solution can be found here, 3rd comment:

Lighttpd and CakePHP setup in subdirectories

url.rewrite-once = (
    "/(.*)\.(.*)" => "$0",
    "/(css|files|img|js)/" => "$0",
    "^/([^.]+)$" => "/index.php?url=$1"
)
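
To see what these rules do to a request, the following sketch mimics them with grep and sed. It is only an approximation: cake_rewrite is a made-up helper, POSIX extended regexes stand in for lighttpd’s PCRE, and it assumes that "$0" expands to the text matched by the pattern and that the first matching rule wins.

```shell
#!/bin/sh
# cake_rewrite URL  (hypothetical helper that imitates the three rules above)
# Prints the URL as it would look after rewriting, under the stated assumptions.
cake_rewrite() {
    url="$1"
    # Rules 1 and 2: anything with a dot, or under css/files/img/js,
    # is replaced by the matched text itself ("$0").
    for pattern in '/(.*)\.(.*)' '/(css|files|img|js)/'; do
        match=$(printf '%s\n' "$url" | grep -Eo "$pattern" | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return
        fi
    done
    # Rule 3: dot-free paths are handed to CakePHP's front controller;
    # anything else falls through unchanged.
    printf '%s\n' "$url" | sed -E 's#^/([^.]+)$#/index.php?url=\1#'
}
```

Calling it with a controller path such as /posts/view/5 exercises the third rule, while a static asset such as /css/style.css is caught by the first one and never reaches index.php.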

Google’s highly scalable architectures

A coworker has sent me three interesting articles from High Scalability, a site I didn’t know yet but which I’ve already added to my Google Reader list. :) The articles discuss the design and the computer/network architecture decisions taken at YouTube, Google and GTalk to handle the heavy load their services face. They also cover each site’s current architecture and how it evolved over time.

Some lessons to learn from these articles:

  • Don’t try to fix everything with one single architecture or tool. Divide the problem, see whether each sub-problem is CPU-, bandwidth- or IO-bound, and optimize it. Specialize servers for each task and coordinate their work.
  • Cache content whenever possible. Pre-generate content whenever possible. Make good use of HTTP’s cache-control directives. Use squid as a reverse proxy to take load off your application servers.
  • Think about externalizing some things, like hosting images or videos off-site. These elements may need more bandwidth than you currently have, and moving them off-site can be a good idea, even if it’s just a temporary measure while you get more bandwidth. The service must run at all times.
  • Simplicity. It lets you make changes and evolve your architecture without screwing up.
  • Commodity-PC based clusters. They maximize the power/price ratio. Have a redundancy system in place so that when one node goes down or needs maintenance, the system keeps working without it. Have a system to easily install or change a node, also without affecting the service. And start planning for power and cooling problems ahead of time. ;)
  • Programming today is largely about libraries and frameworks. Don’t reinvent the wheel. Use a common framework in all your developments, homegrown or not. This way novice programmers will be able to start writing code faster, will be able to switch projects easily, won’t have to code the same things over and over again, and a system upgrade will benefit all your applications.
  • Think about the architecture you’ll need from the start. I’m sadly used to developers not caring about what their code runs on, or whether their code will lead to CPU, IO or bandwidth problems. Google seems to approach every new development by looking at the architecture they’ll need to handle the service, and then develops the code around that architecture. This is what sets Google apart from the rest.

Riding the worms

“You must ride the sand in the light of day
that Shai-hulud shall see and know you have no fear.”

Dune, by Frank Herbert

“If it’s not on Google, it doesn’t exist.” This categorical statement holds for online shops and corporate websites as much as for our personal blog. When we need to find information on the Internet, we go to Google. And by Google I also mean Yahoo, MSN, or any other search engine. We have to be there.

These search engines use “bots” or “spiders” to index the content of our pages: programs that periodically crawl all the sites they already know, looking for updates and new links through which to discover, process and index more and more pages.

Nobody doubts that the work these programs do is beneficial, but we usually overlook the fact that they generate extra traffic on our site. Hard as it may be to believe, I know sites where the traffic from GoogleBot and company (mind you, I mean the bot itself, not visits referred by the search engine) consumed up to a third of the total bandwidth. We’re talking about GIGABYTES of traffic per day.

Search engines also penalize duplicated information: if we have several pages with identical or very similar content, or worse, if the same page can be loaded through several different URLs, we can get unpleasant surprises, such as pages that don’t appear in search results in favor of a feed or a summary (section index, category, etc.) with similar content, or pages with a low PageRank because it is “diluted” across several URLs.

That is why it’s important to learn how these bots work, so we know how to optimize their passage through our website and how to “lead them by the hand” to the information we want to prioritize, improving our ranking in the results while minimizing, wherever possible, the amount of data transferred so as not to saturate our connection and servers.
Continue reading ‘Riding the worms’ »

lighttpd 1.4.15 for Debian Sarge

I’ve updated my lighttpd packages for Debian Sarge to the 1.4.15 release. Find them here.

lighttpd + WordPressMU (english)

For those who still don’t know it, lighttpd is a cool web server: small, efficient, extensible, powerful… But just as with any other new server, you have to get used to it and to doing things “its way” instead of “as always”.

Continue reading ‘lighttpd + WordPressMU (english)’ »

lighttpd + WordPressMU (español)

For those who don’t know it, lighttpd is a terrific web server: small, efficient, extensible, powerful… a gem. But as with everything that is “the same but different”, it can take some getting used to, and sometimes you have to work out for yourself how to do something that would have been trivial with “the usual”.
Continue reading ‘lighttpd + WordPressMU (español)’ »