Friday, May 03, 2013

The Benefits of a Reverse Proxy

A typical public ASP.NET website hosted on IIS is usually configured so that the server IIS is installed on is visible to the public internet. HTTP requests from a browser or web service client are routed directly to IIS, which also hosts the ASP.NET worker process. All the functionality needed to produce the website lives on a single server, including caching, SSL termination, authentication, serving static files and compression. This approach is simple and straightforward for small sites, but it is hard to scale, both in terms of performance and in terms of managing the complexity of a large application. This is especially true if you have a distributed service-oriented architecture with multiple HTTP endpoints that appear and disappear frequently.

A reverse proxy is a server component that sits between the internet and your web servers. It accepts HTTP requests, provides various services, and forwards the requests to one or more web servers.


Having a point at which you can inspect, transform and route HTTP requests before they reach your web servers provides a whole host of benefits. Here are some:

Load Balancing

This is the reverse proxy function that people are most familiar with. Here the proxy routes incoming HTTP requests to a number of identical web servers. This can work on a simple round-robin basis, or if you have stateful web servers (it’s better not to) there are session-aware load balancers available. It’s such a common function that load balancing reverse proxies are usually just referred to as ‘load balancers’. There are specialized load balancing products available, but many general purpose reverse proxies also provide load balancing functionality.
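As a rough illustration of the round-robin idea, here is a minimal Python sketch. The class name and the back-end addresses are made up for the example, and a real load balancer would also track server health and connection counts.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out back-end addresses in strict rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one back-end is required")
        # cycle() repeats the list forever, which is all round-robin needs.
        self._rotation = cycle(backends)

    def next_backend(self):
        """Return the address that should receive the next request."""
        return next(self._rotation)


# Hypothetical back-end addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [balancer.next_backend() for _ in range(4)]
# The fourth pick wraps around to the first server again.
```

Because no request is tied to a particular server, this only works cleanly when the web servers are stateless.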

Security

A reverse proxy can hide the topology and characteristics of your back-end servers by removing the need for direct internet access to them. You can place your reverse proxy in an internet facing DMZ, but hide your web servers inside a non-public subnet.

Authentication

You can use your reverse proxy to provide a single point of authentication for all HTTP requests.
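To make this concrete, here is a small Python sketch of the kind of check a proxy could run before forwarding a request. It assumes HTTP Basic authentication and an in-memory user table. Both are choices made for the example rather than anything specific to a particular proxy product, and the `USERS` dictionary and `is_authorised` function are hypothetical names.

```python
import base64
import hmac

# Hypothetical credential store; a real proxy would more likely consult
# an identity provider or a directory service.
USERS = {"alice": "s3cret"}


def is_authorised(authorization_header):
    """Return True if the Authorization header carries valid Basic credentials."""
    scheme, _, encoded = (authorization_header or "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return False
    user, _, password = decoded.partition(":")
    expected = USERS.get(user)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))
```

Requests that fail the check can be rejected with a 401 at the proxy, so the back-end servers never see them.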

SSL Termination

Here the reverse proxy handles incoming HTTPS connections, decrypting the requests and passing unencrypted requests on to the web servers. This has several benefits:

  • Removes the need to install certificates on many back end web servers.
  • Provides a single point of configuration and management for SSL/TLS.
  • Takes the processing load of encrypting/decrypting HTTPS traffic away from web servers.
  • Makes testing and intercepting HTTP requests to individual web servers easier.

Serving Static Content

Strictly speaking this is not ‘reverse proxying’ as such, but some reverse proxy servers can also act as web servers for static content. The average web page can often consist of megabytes of static content such as images, CSS files and JavaScript files. By serving these from the proxy you can take considerable load off the back-end web servers, leaving them free to render dynamic content.

Caching

The reverse proxy can also act as a cache. You can either have a dumb cache that simply expires after a set period, or better still a cache that respects Cache-Control and Expires headers. This can considerably reduce the load on the back-end servers.
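The following Python sketch shows one way a cache could honour Cache-Control. It is deliberately simplified: it only looks at `max-age`, treats `no-store`, `no-cache` and `private` as "do not cache", and ignores `Expires`, `s-maxage` and revalidation. The names `max_age_seconds` and `ResponseCache` are invented for the example.

```python
import re
import time


def max_age_seconds(cache_control):
    """Return the max-age from a Cache-Control value, or None if not cacheable."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # Simplification: anything that forbids or restricts shared caching is skipped.
    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return None
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


class ResponseCache:
    """Stores response bodies until their max-age lifetime has elapsed."""

    def __init__(self):
        self._entries = {}  # url -> (expires_at, body)

    def store(self, url, body, cache_control, now=None):
        """Cache the body if the header allows it; return whether it was stored."""
        now = time.time() if now is None else now
        lifetime = max_age_seconds(cache_control)
        if not lifetime:  # None or 0 both mean "nothing to cache"
            return False
        self._entries[url] = (now + lifetime, body)
        return True

    def lookup(self, url, now=None):
        """Return the cached body, or None if it is missing or has expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if now >= expires_at:
            del self._entries[url]
            return None
        return body
```

The optional `now` argument exists only so the behaviour can be exercised without waiting for real time to pass.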

Compression

In order to reduce the bandwidth needed for individual requests, the reverse proxy can decompress incoming requests and compress outgoing responses. This takes the compression work away from the back-end servers, and makes debugging requests to, and responses from, the back-end servers easier.
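As a sketch of the outgoing half, the Python below gzips a response body only when the client's Accept-Encoding header lists gzip. It is a simplification that ignores q-values (so `gzip;q=0` would wrongly count as accepted), and the function names are made up for the example.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if 'gzip' appears among the client's accepted encodings."""
    # Simplification: parameters such as ';q=0.5' are stripped and ignored.
    return any(
        token.split(";")[0].strip().lower() == "gzip"
        for token in accept_encoding.split(",")
    )


def compress_response(body, accept_encoding):
    """Return (body, extra_headers), gzipping the body when the client allows it."""
    if accepts_gzip(accept_encoding):
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


page = b"<p>hello reverse proxy</p>" * 100
compressed, extra_headers = compress_response(page, "gzip, deflate")
```

The back-end server produces and receives plain, uncompressed bodies throughout, which is what makes its traffic easy to inspect.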

Centralised Logging and Auditing

Because all HTTP requests are routed through the reverse proxy, it makes an excellent point for logging and auditing.

URL Rewriting

Sometimes the URL scheme that a legacy application presents is not ideal for discovery or search engine optimisation. A reverse proxy can rewrite URLs before passing them on to your back-end servers. For example, a legacy ASP.NET application might have a URL for a product that looks like this:

http://www.myexampleshop.com/products.aspx?productid=1234

You can use a reverse proxy to present a search engine optimised URL instead:

http://www.myexampleshop.com/products/1234/lunar-module
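The mapping between the two URLs above could be sketched in Python like this. The regular expression and the `rewrite_path` function are illustrative assumptions. The sketch treats the trailing name segment (`lunar-module`) as decoration and only passes the numeric ID through to the legacy page.

```python
import re

# Matches /products/<numeric id>, optionally followed by a descriptive slug.
_PRODUCT_URL = re.compile(r"^/products/(?P<id>\d+)(?:/[^/?#]*)?/?$")


def rewrite_path(path):
    """Map a friendly product path onto the legacy products.aspx URL."""
    match = _PRODUCT_URL.match(path)
    if match is None:
        # Anything that is not a product URL is passed through unchanged.
        return path
    return "/products.aspx?productid=" + match.group("id")


legacy = rewrite_path("/products/1234/lunar-module")
# legacy == "/products.aspx?productid=1234"
```

The legacy application keeps seeing the URLs it has always handled, while browsers and search engines only ever see the friendly form.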

Aggregating Multiple Websites Into the Same URL Space

In a distributed architecture it’s desirable to have different pieces of functionality served by isolated components. A reverse proxy can route different branches of a single URL address space to different internal web servers.

For example, say I’ve got three internal web servers:

http://products.internal.net/
http://orders.internal.net/
http://stock-control.internal.net/

I can route these from a single external domain using my reverse proxy:

http://www.example.com/products/ -> http://products.internal.net/
http://www.example.com/orders/   -> http://orders.internal.net/
http://www.example.com/stock/    -> http://stock-control.internal.net/

To an external customer it appears that they are simply navigating a single website, but internally the organisation is maintaining three entirely separate sites. This approach can work extremely well for web service APIs where the reverse proxy provides a consistent single public facade to an internal distributed component oriented architecture.
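The routing table above amounts to a prefix lookup. Here is a minimal Python sketch using the same example hosts. The `ROUTES` list and `resolve_upstream` function are names invented for the illustration, and the sketch assumes the matched prefix is stripped before forwarding, which is one possible design rather than the only one.

```python
# Public path prefix -> internal server, taken from the example above.
ROUTES = [
    ("/products/", "http://products.internal.net/"),
    ("/orders/", "http://orders.internal.net/"),
    ("/stock/", "http://stock-control.internal.net/"),
]


def resolve_upstream(path):
    """Return the internal URL for a public path, or None if no route matches."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            # Drop the public prefix and append the remainder to the upstream root.
            return upstream + path[len(prefix):]
    return None


target = resolve_upstream("/orders/42")
# target == "http://orders.internal.net/42"
```

Adding or retiring an internal service then becomes a change to the routing table rather than to the public URL scheme.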

So …

So, a reverse proxy can offload many of the infrastructure concerns of a high-volume distributed web application.

We’re currently looking at Nginx for this role. Expect some practical Nginx-related posts about how to do some of this stuff in the very near future.

Happy proxying!

10 comments:

Ken Egozi said...

From High Availability standpoint, you'd probably need more than a single instance of the reverse-proxy server running. When that is the case, some of the factors you listed become void (centralized management etc - what is the difference between multiple "front ends" and multiple "back ends"?)

also - if you do choose to go for a separate reverse-proxy layer, yet you are still in windows-land, why not use IIS for that? modern IIS has most of the abilities listed out of the box, and is rather easy to extend for specific uses (i.e. smart load balancing etc.)
Nginx is a great tool, but adding another tool to the mix comes with a tax (security updates is an obvious one)

Mike Hadlow said...

Thanks Ken, some very good points.

Anonymous said...

Which reverse proxy products do you recommend?

REA_ANDREW said...

Have you looked at Varnish in your product comparisons?

Joao Franco said...

Thank you for your article. It helps me, as a layman, understand what a reverse proxy is.

Unknown said...

I want to share with you this video what is reverse proxy
I have created it, I tried to explain reverse proxy.

4m said...

Very good, no one can explain this so simply, thank you

Damon said...

Regarding the points made under "SSL Termination"... I was always under the belief you should treat the DMZ as a hostile area, even considering traffic traversing from the DMZ to a back end server. Consequently, you'd want to SSL that traffic as well. Thoughts/opinions?

Damon

Damon said...

Regarding the discussion points under "SSL Termination", I was always under the belief you should treat the DMZ as hostile territory, and thus secure sensitive information even traversing the backside (from a DMZ proxy to a back end app server) with SSL. Thoughts / opinions?

Malcolm Turnbull said...

Mike,
You make a nice clear description of the benefits of a reverse proxy such as Nginx or HAProxy.

But you do get some negatives, especially when you deploy a reverse proxy in a two arm configuration (which is unnecessary if you already have a DMZ).
It drives me nuts when people have DMZs inside DMZs - why not just deploy your proxy in one-arm mode?

It can also potentially make debugging problems difficult - people always blame the load balancer :-).