Infrastructure Cleanup Tasks 2011

Good beginner/introductory items are marked with (*), but feel free to ask about any item that interests you!

Fix all the things that we have

 * Upgrade TurboGears1 apps to TurboGears2
 * Write automated tests using TG2's test framework
 * Fix the FAS authenticators to be less chatty
 * Put FAS session information into memcached
 * Update FAS to have an admin console (no more direct db needs)
 * Update pkgdb to have an admin console (no more direct db needs)
 * Fix the Django auth providers to be faster
 * Move the publictest machines to the cloud and define a sundown (expiry) policy for them
 * Automated hosted projects (*)
 * Automated creation of new machines -- run one command and it's up
 * puppet staging vs production
 * Use yubikey for two-factor auth (instead of either/or auth)
 * glusterfs/cloudfs fedorapeople filesystem
 * glusterfs/cloudfs fedorahosted filesystem
 * Talk to the mediawiki folks about how to host mediawiki attachments so that we don't need a special machine (possibly glusterfs again?)
   * upload.wikimedia.org
   * http://wikitech.wikimedia.org/view/Media_server/2011_Media_Storage_plans
   * http://wikitech.wikimedia.org/view/Media_server/Distributed_File_Storage_choices
 * Split the db so that FAS moves to a different db server
 * Replicate db so that we don't have a SPOF
 * logging sucks (*)
   * Client IPs reach the proxies, but we also need them to reach the app servers. (*)
   * FAS needs to log more actions to its database (this is in a new version of FAS, we just need to upgrade)
 * Do periodic reinstallations of guests (like app servers) so that we know nothing has been changed outside of puppet.
 * fix backups
   * Make sure we're backing up everything (*)
   * Stop backing up system binary data (/usr)
 * Reduce koji's resources
 * Finish and deploy coprs
 * Go through the rpm -Va output for all hosts (in /var/tmp/global-rpm-va on puppet01) and make sure every file listed there has a counterpart in puppet that explains its changes (*)
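
For the rpm -Va item above, a small script can make the comparison less tedious. The following is only a sketch: it assumes the usual `rpm -Va` line layout (a flag string or the word "missing", an optional file-type marker, then the path), and the set of puppet-managed paths is a hypothetical input that would have to be collected separately. The function names are made up for illustration.

```python
import re

# One line of `rpm -Va` output: an 8- or 9-character flag string (or the word
# "missing"), an optional file-type marker (c, d, g, l, r), then the path.
LINE_RE = re.compile(
    r"^(?P<flags>[SM5DLUGTP.?]{8,9}|missing)\s+"
    r"(?:(?P<kind>[cdglr])\s+)?"
    r"(?P<path>/.*)$"
)


def parse_rpm_va(text):
    """Return a list of (flags, kind, path) tuples, one per recognised line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(
                (match.group("flags"), match.group("kind"), match.group("path"))
            )
    return entries


def unexplained(entries, puppet_paths):
    """Return the sorted paths that changed but are not in puppet_paths.

    puppet_paths is an assumed, separately collected set of file paths that
    puppet manages; this sketch does not query puppet itself.
    """
    return sorted({path for _, _, path in entries if path not in puppet_paths})
```

Anything returned by `unexplained` is a file that differs from its package and has no puppet entry to account for it, which is the list that needs a human to look at.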

 * Look at whether the git email hook can be done async. If so, make it async and change it to query the packagedb for people to email instead of using the PACKAGE-owner email aliases. (This will eliminate bounces when the alias does not exist, for instance for new package requests, or when the only owner of a package is orphan@fp.o.)
 * The puppet node names do not match the hostnames in nagios. Add aliases to the nagios hostnames to match them up correctly. This will allow us to trigger passive checks using nsca.
 * Set up a schedule for rebooting hosts (to test for broken hardware when it's not a critical point in the release cycle)
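
To illustrate why the puppet/nagios name mismatch matters for passive checks: send_nsca reads results from standard input, one per line, as tab-separated host name, service description, return code and plugin output, and the host name has to be one nagios knows. A minimal sketch, assuming a hand-maintained mapping from puppet node names to nagios hostnames (the mapping, the function name and the example names are all hypothetical):

```python
def nsca_line(puppet_node, service, return_code, output, aliases):
    """Format one passive service check result for send_nsca's stdin.

    aliases maps puppet node names to nagios hostnames (an assumed,
    hand-maintained dict); names without an entry are passed through as-is.
    """
    host = aliases.get(puppet_node, puppet_node)
    return "\t".join([host, service, str(return_code), output]) + "\n"


# Hypothetical example names, not real hosts:
# nsca_line("node1.example.org", "puppet-run", 0, "OK", {"node1.example.org": "node1"})
```

With aliases added on the nagios side as the item suggests, the mapping becomes unnecessary and the puppet node name can be sent directly.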

Done items

 * Upgrade quota to 2GB
 * Go through and look for any /var/spool/mail entries on all boxes and correct:
   * that they are going there at all
   * what they are outputting
 * Automated build overrides
 * Move transifex to tx.net
 * Move blogs to wordpress.com
 * monitor more things for possible problems (*)
   * mail queues on SMTP machines, particularly bastion (*)
   * puppet reports to make sure that puppet is being run regularly on all managed machines. (*)


 * Go through and look at all entries in /var/spool/cron on every system and make sure each one is:
   * not duplicated
   * supposed to be running at all
   * if you find tasks, think about moving them to a file in /etc/cron.d/ on those systems in puppet
 * Set up epylog on log01/log02 and start configuring weedlists and report collections
 * Create a new TG captcha widget that is easier for humans to use
   * My idea would be: an image of a simple math equation (7 + 92 = ?). The human types in the answer to that. (*)
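
The math-equation captcha idea above could be prototyped along these lines. This is only a sketch of the question/answer logic: the function names are made up, and rendering the question as an image and wiring it into a TG widget are left out.

```python
import random


def make_challenge(rng=random):
    """Return (question_text, expected_answer) for a simple addition question."""
    a = rng.randint(1, 9)
    b = rng.randint(10, 99)
    return "%d + %d = ?" % (a, b), a + b


def check_answer(submitted, expected):
    """Return True if the submitted form value is the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except (ValueError, AttributeError):
        # Non-numeric text or a missing value counts as a wrong answer.
        return False
```

The expected answer would be kept server-side (for example in the session) and only the question text turned into an image for the user.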