Balancers Infrastructure SOP
At present Fedora does not have a dedicated load-balancing resource, so Apache's mod_proxy_balancer is used instead. Documentation is available on the Apache website. Requests that come in to the proxy servers are balanced round robin, on a per-request basis. The balancer flags dead servers as such and stops sending them traffic. Fedora's timeout is set to 3 seconds. This timeout applies to the TCP connection, not to the request.
Contact Information
Owner: Fedora Infrastructure Team
Contact: #fedora-admin, sysadmin-main, sysadmin-web group
Location: Phoenix
Servers: proxy1, proxy2
Purpose: Provides load balancing from the proxy layer to our application layer.
DNS Balancing
In addition to the normal load balancing we are doing at PHX, we also have balancing done via the DNS servers. This allows us to have multiple proxy servers in different locations. While not completely symmetric it does help spread the load. See:
dig fedoraproject.org
;; ANSWER SECTION:
fedoraproject.org. 55 IN A 66.35.62.162
fedoraproject.org. 55 IN A 209.132.176.122
You'll notice fedoraproject.org has multiple A records. The '55' above means that these addresses will expire in 55 seconds (the TTL is set to 60). This allows us to make changes quickly if one site fails. Remote proxy servers access the application and other servers via the VPN.
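As a rough illustration of how the answer section is laid out, the sketch below pulls the name, remaining TTL, and address out of dig-style answer lines. The `ARecord` type and `parse_a_records` function are hypothetical names introduced here, and the sample text is simply the output quoted above; this is not a tool used by Fedora Infrastructure.

```python
from typing import List, NamedTuple


class ARecord(NamedTuple):
    name: str
    ttl: int       # seconds left before a resolver must query again
    address: str


def parse_a_records(answer_text: str) -> List[ARecord]:
    """Extract A records from dig-style lines: name ttl IN A address."""
    records = []
    for line in answer_text.splitlines():
        fields = line.split()
        # Skip dig comments (";; ANSWER SECTION:") and malformed lines.
        if len(fields) != 5 or line.lstrip().startswith(";"):
            continue
        name, ttl, rr_class, rr_type, address = fields
        if rr_class == "IN" and rr_type == "A":
            records.append(ARecord(name, int(ttl), address))
    return records


# Sample text copied from the dig output shown in this section.
SAMPLE = """\
;; ANSWER SECTION:
fedoraproject.org. 55 IN A 66.35.62.162
fedoraproject.org. 55 IN A 209.132.176.122
"""

if __name__ == "__main__":
    for record in parse_a_records(SAMPLE):
        print(record.address, "expires in", record.ttl, "seconds")
```

Each address returned is one proxy site, so a record that is removed from DNS stops being handed out once the cached TTL runs down.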
Accessing the proxy balancer
The balancers run on both proxy1 and proxy2 and cannot be accessed externally. They require a login and only provide basic access to the balancer. The following command forwards access from the proxy servers to your localhost using ssh tunneling.
ssh -L 8080:proxy1:80 -L 8081:proxy2:80 -L 8082:proxy3.vpn.fedoraproject.org:80 bastion.fedora.redhat.com -N
Once logged in, proxy1 should be accessible at http://localhost:8080/ and proxy2 at http://localhost:8081/. Note that after clicking on links, the actual form to make changes is at the bottom of the page. Any disabling or other changes will be lost if Apache is restarted.
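Before opening a browser it can be handy to confirm that the forwarded ports are actually listening. The following is a minimal sketch, assuming the local ports 8080 and 8081 from the ssh command above; `tunnel_is_up` is a hypothetical helper, and its 3-second default is only chosen to mirror the balancer's TCP timeout.

```python
import socket


def tunnel_is_up(port: int, host: str = "localhost", timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Local ports taken from the ssh -L options in the command above.
    for name, port in (("proxy1", 8080), ("proxy2", 8081)):
        state = "tunnel up" if tunnel_is_up(port) else "tunnel down"
        print(name, state)
```

A successful connection only shows that the tunnel endpoint accepts TCP connections; it says nothing about whether the balancer page itself will load or accept the login.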
All balancer configs should be stored in:
/etc/httpd/conf.d/balancer.conf
This configuration file is managed by Puppet. Each application is typically split out for flexibility.
Balancer Behavior
The balancer is a very basic HTTP balancer. A request comes in and Apache passes that request on to the application server. The application server receives that request and sends the response back to the proxy server which, in turn, sends the response back to the user. All requests on the application side look as if they are coming from the proxy servers.
If a proxy server sends a request to an application server that is down, slow, or for whatever reason cannot complete a TCP connection within 3 seconds, that worker is flagged as dead and Apache moves on to the next worker. This is typically transparent to the user. Apache then waits 60 seconds before attempting to contact that worker again. If all workers are down, Apache sends an error back to the client and continues to try all workers until a successful connection is made.
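The behavior described above can be sketched as a toy model. This is not mod_proxy_balancer's implementation: `RoundRobinBalancer`, the worker names `app1` to `app3`, and the `can_connect` callback (standing in for a TCP connect that succeeds within 3 seconds) are all assumptions made for illustration. The choice to ignore the 60-second wait when every worker is flagged dead is one reading of "continue to try all workers".

```python
from typing import Callable, Dict, List, Optional


class RoundRobinBalancer:
    """Toy model: rotate through workers, skip dead ones, retry them later."""

    def __init__(self, workers: List[str], retry_seconds: float = 60.0) -> None:
        self.workers = list(workers)
        self.retry_seconds = retry_seconds
        self.dead_since: Dict[str, float] = {}  # worker -> time it was flagged dead
        self._next = 0                          # index of the next worker in rotation

    def _eligible(self, worker: str, now: float) -> bool:
        flagged = self.dead_since.get(worker)
        return flagged is None or now - flagged >= self.retry_seconds

    def route(self, can_connect: Callable[[str], bool], now: float) -> Optional[str]:
        """Return the worker that took the request, or None (error to the client)."""
        # Assumption: if every worker is flagged dead, try them all anyway.
        all_dead = not any(self._eligible(w, now) for w in self.workers)
        for _ in range(len(self.workers)):
            worker = self.workers[self._next]
            self._next = (self._next + 1) % len(self.workers)
            if not all_dead and not self._eligible(worker, now):
                continue  # still inside the retry wait, skip this worker
            if can_connect(worker):
                self.dead_since.pop(worker, None)  # worker is healthy again
                return worker
            self.dead_since[worker] = now          # flag as dead, move on
        return None


if __name__ == "__main__":
    up = {"app1", "app3"}  # hypothetical: app2 is down
    balancer = RoundRobinBalancer(["app1", "app2", "app3"])
    for second in range(4):
        print(second, balancer.route(lambda w: w in up, now=float(second)))
```

Passing the clock in as `now` keeps the model deterministic, so the 60-second wait can be exercised without sleeping.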
Troubleshooting and Resolution
Pages not being displayed correctly
One common problem with load balancing is that the servers get out of sync or end up in different states. As a result, a page may not load at all, or it may load while every other request fails, so that some CSS, images, and JavaScript load and others do not. This is considered an outage.
Log in to proxy1 and proxy2 and look at the relevant error logs (/var/log/httpd/). One may have to access the application servers directly to determine which one is out of sync or incorrect in some way. See "Accessing the proxy balancer" above to explicitly disable a cluster or determine what its current state is.
Outage at a proxy site
Right now we have proxy servers running at Phoenix and at Tummy.com (Denver). Because both addresses are in the A record for fedoraproject.org, people will, more or less, get sent to both sites. If something fails at either site, for example proxy[1-2] die or the load balancer goes dead, we can direct traffic to the other site by commenting out the dead record in DNS. A complete outage at PHX will cause an outage of some pieces of our site, but primary DNS and at least some of the static and cached content in the proxy server should allow partial uptime.

