At the company I work for, we manage the digital editions of several local newspapers spread all over Spain. None of them is big in a nationwide sense, but almost all of them are leaders in their region.
For quite some time, we’ve had performance problems with one of them: from our location it performed well (<5s load times), but users in the region where that particular newspaper is distributed kept complaining about poor performance (>40s load times, unbelievably high). The more we optimized our server and network infrastructure, the HTML layout, CSS, code… the more they complained and the more obvious it became that there was something else going on.
After some investigation we discovered that the routing between the major ISP of that region, which almost all of our readers used, and our own ISP was the cause of the problem: a traceroute from a local DSL line there to our servers showed that the traffic went to Germany before coming back to Spain, with quite high latency and round-trip times.
So it wasn’t our fault, and the real solution to the real problem was out of our reach. But in the end our image was at stake, so it was OUR problem. What could we do?
After some inspiration the solution became clear: rent housing (colocation) at the local ISP that had the problem, set up a reverse proxy there, and redirect all clients of that ISP to this proxy. Sure, the connection between the proxy and our servers would be as bad as before, but as the content would be cached and refreshed in the background, the end user shouldn’t notice it any more!
There are just two pieces of software involved here:
- squid, the most widely used proxy in the Linux/UNIX world.
- djbdns, our DNS server of choice. Among other things, it has the ability to return different IP addresses to an A query depending on the IP address of the client.
squid
squid is quite easy to set up as a reverse proxy. After installing it (“apt-get install squid” on our Debian-based server), edit the main config file at /etc/squid/squid.conf and set:
# Make it work as a reverse proxy on port 80 instead of 3128
http_port 80 vhost
# Treat several concurrent queries for the same URI as one,
# reduces bandwidth and in our case improves performance
collapsed_forwarding on
# Define which domains we are going to serve
# Refuse anything else
acl myDomains dstdomain www.example.com isp2.example.com
http_access deny !myDomains
Obviously there's much more to configuring squid than this. These are just the basic options to get our solution going and run some preliminary tests. Beyond that there are memory limits, object-cache management, cache-expiry management (which you'd better handle in your application code anyway), peer caches, and much, much more. Get a good Squid HOWTO or book if you want to learn how to tweak it for optimum performance.
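To get a feel for what the collapsed_forwarding option in the config above buys us, here is a toy Python model of the idea. It is only a sketch of the concept, not how squid implements it: the CollapsingFetcher class and the fetch function passed to it are made up for illustration, and error handling is left out (if the upstream fetch fails, waiting requests simply get None).

```python
import threading


class CollapsingFetcher:
    """Toy model: concurrent requests for the same URI share one upstream fetch."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical function: uri -> response body
        self._lock = threading.Lock()
        self._in_flight = {}       # uri -> entry for a fetch currently in progress

    def get(self, uri):
        with self._lock:
            entry = self._in_flight.get(uri)
            leader = entry is None
            if leader:
                # First request for this URI: register it so later ones can wait.
                entry = {"done": threading.Event(), "body": None}
                self._in_flight[uri] = entry
        if leader:
            try:
                entry["body"] = self._fetch(uri)
            finally:
                with self._lock:
                    del self._in_flight[uri]
                entry["done"].set()   # wake up every request that piggybacked
        else:
            entry["done"].wait()      # reuse the leader's response
        return entry["body"]
```

If five readers ask for the front page while the first fetch is still crawling over the slow link, only one request crosses that link and all five get the same response. That is exactly the situation our proxy is in.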
djbdns
Now the tricky part: directing some users to our servers and others to the proxy. Luckily, the DNS server we use (djbdns) has a built-in option to do this.
What we've done is define two names, isp1.example.com pointing to our IP and isp2.example.com pointing to the proxy, and then a CNAME for www.example.com which points to one or the other depending on the client's IP, much like Akamai does. This way we can still reach each server individually by its own name.
# XXX.YYY.x.x IP range, will send it to the proxy
%PX:XXX.YYY
# All the rest
%RS:
# A records for our server and the proxy
=isp1.example.com:A.B.C.D:300
=isp2.example.com:Z.Y.X.W:300
# Pivoting CNAME depending on the client's IP
Cwww.example.com:isp2.example.com.:300::PX
Cwww.example.com:isp1.example.com.:300::RS
Of course, following this scheme we could add as many proxies at as many ISPs as we wanted, creating an Akamai-like CDN (Content Delivery Network). You get the picture.
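To make the location matching concrete, here is a small Python sketch that mimics how I understand tinydns picks an answer: the client gets the location whose IP prefix matches the most octets, and a record tagged with a location is only visible to clients in that location. The helper names (parse, client_location, resolve_cname) are made up, and the addresses are placeholders from the documentation ranges, with 198.51 standing in for the ISP's XXX.YYY range. It handles only the % and C lines, nothing else.

```python
# Hypothetical data file, same shape as the one above with placeholder IPs.
DATA = """\
%PX:198.51
%RS:
=isp1.example.com:192.0.2.10:300
=isp2.example.com:203.0.113.20:300
Cwww.example.com:isp2.example.com.:300::PX
Cwww.example.com:isp1.example.com.:300::RS
"""


def parse(data):
    """Collect location prefixes and CNAME records from tinydns-style data."""
    locations = {}   # ip prefix -> location code
    cnames = []      # (fqdn, target, location code)
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, fields = line[0], line[1:].split(":")
        if kind == "%":              # %lo:ipprefix
            locations[fields[1] if len(fields) > 1 else ""] = fields[0]
        elif kind == "C":            # Cfqdn:p:ttl:timestamp:lo
            lo = fields[4] if len(fields) > 4 else ""
            cnames.append((fields[0], fields[1], lo))
    return locations, cnames


def client_location(locations, client_ip):
    """Return the location whose prefix matches the most leading octets."""
    octets = client_ip.split(".")
    best_code, best_len = "", -1
    for prefix, code in locations.items():
        parts = prefix.split(".") if prefix else []
        if octets[:len(parts)] == parts and len(parts) > best_len:
            best_code, best_len = code, len(parts)
    return best_code


def resolve_cname(data, fqdn, client_ip):
    """Return the CNAME target this client would see, or None."""
    locations, cnames = parse(data)
    where = client_location(locations, client_ip)
    for name, target, lo in cnames:
        if name == fqdn and lo in ("", where):
            return target
    return None
```

With this data, resolve_cname(DATA, "www.example.com", "198.51.100.7") gives the proxy's name (isp2.example.com.), while any client outside the 198.51 range falls into RS and gets isp1.example.com.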
For more info on djbdns data syntax, please check: http://cr.yp.to/djbdns/tinydns-data.html