Wiki Infrastructure SOP

From FedoraProject

Revision as of 20:11, 3 June 2008 by Anubis (Talk | contribs)



Wiki - SOP

Contact Information

Owner: Fedora Infrastructure Team / Fedora Website Team

Contact: #fedora-admin or #fedora-websites on irc.freenode.net

Location: http://fedoraproject.org/wiki/

Servers: proxy[1-2] app[1-2]

Purpose: Provides our production wiki

Description

Our wiki currently runs moin, based on the stock version in EPEL with an ACL patch. Common performance issues relate to the size of our wiki. Every page save iterates over each user to determine who to notify, and pages that build dynamic lists from categories can DoS the site, because moin iterates over all pages to determine which category each one is in.

Architecture

[Figure: wiki architecture diagram (Infrastructure SOP wiki wiki.png)]

Troubleshooting and Resolution

Pages only partially loading

Symptom: Pages only partially load. Content or images are missing, or there are CSS / formatting issues.
Problem: The most common cause is that one of the app servers has become overloaded.
Solution: Remove the offending app server from the mix by disabling its proxy server. (Note: this is a temporary solution until we get an actual load balancer between the proxy servers and the app servers.) For example, if app1 is overloaded, shut off puppet and httpd on proxy1 (proxy1 -> app1, proxy2 -> app2).
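
As a rough illustration of the step above, the sketch below maps an overloaded app server to its proxy and prints the commands to run there. The proxy-to-app mapping comes from this SOP; the service names `puppet` and `httpd`, the use of `/sbin/service`, and ssh/sudo access are assumptions about the hosts, and both function names are hypothetical. Puppet is stopped first, presumably so that it does not start httpd again.

```shell
# Map an overloaded app server to the proxy that feeds it.
# Mapping taken from this SOP: proxy1 -> app1, proxy2 -> app2.
proxy_for_app() {
    case "$1" in
        app1) echo proxy1 ;;
        app2) echo proxy2 ;;
        *)    echo "unknown app server: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the commands that take that proxy out of the mix.
# Assumption: both daemons are SysV services named "puppet" and "httpd".
drain_commands() {
    proxy=$(proxy_for_app "$1") || return 1
    echo "ssh $proxy sudo /sbin/service puppet stop"
    echo "ssh $proxy sudo /sbin/service httpd stop"
}
```

Running `drain_commands app1` prints the two commands for proxy1 so they can be reviewed before being executed by hand.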

High load / unresponsive app server

Symptom: An application server has become unresponsive and has high load.
Problem: The most common cause is that one of the app servers has become overloaded doing something inefficient on the wiki. Certain page formatting, searches, and notification emails can overload an app server, especially if the user keeps clicking search or save. The load can also come from a popular page being hit heavily (for example, on release day).
Solution: Remove the offending app server from the mix by disabling its proxy server. (Note: this is a temporary solution until we get an actual load balancer between the proxy servers and the app servers.) For example, if app1 is overloaded, shut off puppet and httpd on proxy1 (proxy1 -> app1, proxy2 -> app2).
Solution 2: If load is high simply because the wiki is popular (Slashdot, release day, etc.), find which pages are being hit the most. The command below lists the top 20 pages hit over the last 5 hours; run it on the proxy servers. Take abnormally popular pages and convert them to static HTML pages using wget or by saving them from your browser. Place these static pages on the proxy servers and create an alias or redirect for them. (Remember to create these aliases through puppet, or puppet will overwrite your changes. Disable puppet while you're testing if needed.) If it is not possible to get a static copy of a page, shut the website down until load drops enough to fetch it.
awk '{ print $7 }' $(ls -tr fedoraproject.org-access.log.* | tail -n 5) | \
grep -v "css\|js\|wikidata\|/wiki/WikiGraphics" | sort | uniq -c | \
sort -n | tail -n 20
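
The workflow in Solution 2 could be sketched roughly as follows. `top_pages` wraps the same pipeline so it can be fed log lines on stdin, and `snapshot_page` shows one way to fetch a static copy with wget. The function names, the file-naming scheme, and the target directory are hypothetical, and the alias or redirect that points at the saved file still has to be added through puppet.

```shell
# Rank requested paths from access-log lines read on stdin.
# $1 = number of entries to keep (the SOP uses 20).
top_pages() {
    awk '{ print $7 }' \
        | grep -v 'css\|js\|wikidata\|/wiki/WikiGraphics' \
        | sort | uniq -c | sort -n | tail -n "$1"
}

# Turn a wiki path into a flat file name,
# e.g. /wiki/Releases/9 becomes Releases_9.html (naming scheme is an assumption).
static_name() {
    echo "$1" | sed -e 's|^/wiki/||' -e 's|/|_|g' -e 's|$|.html|'
}

# Fetch one page into a directory of static copies.
# $1 = wiki path, $2 = target directory (hypothetical, e.g. /srv/web/static).
snapshot_page() {
    wget -q -O "$2/$(static_name "$1")" "http://fedoraproject.org$1"
}

# Example usage on a proxy server (not executed here):
#   cat $(ls -tr fedoraproject.org-access.log.* | tail -n 5) | top_pages 20
#   snapshot_page /wiki/Releases/9 /srv/web/static
```

Keeping the ranking step separate from the fetch step makes it easy to review the list of hot pages before deciding which ones are worth freezing.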

UnicodeEncodeError

Symptom: Pages fail with a UnicodeEncodeError.
Problem: NULL characters in the edit-log of the page in question and in the main edit-log.
Solution: Edit the edit-log of the page in question and the main edit-log to remove entries containing NULL characters. An update to Moin is ready upstream to fix this bug.
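
Instead of editing by hand, the cleanup above could be scripted along these lines. This is a minimal sketch that assumes GNU grep with `-P` support and one edit-log entry per line; the function name is hypothetical, and the locations of the edit-log files depend on the moin installation, so they are left for the caller to supply.

```shell
# Write a copy of an edit-log with every entry (line) that contains a NUL byte dropped.
# $1 = input edit-log, $2 = cleaned output file.
# Assumes GNU grep built with -P (PCRE); -a forces text mode on binary data.
strip_null_entries() {
    LC_ALL=C grep -a -v -P '\x00' "$1" > "$2"
}

# Example usage (not executed here); work from a backup so nothing is lost:
#   cp edit-log edit-log.bak
#   strip_null_entries edit-log.bak edit-log
```

Run it once for the page's edit-log and once for the main edit-log, and compare the backup with the result before discarding anything.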