Fedora infrastructure tasks 2013
This page is to help us collect things we want to work on and get done in 2013. Initially it will help us organize what we want to get done at the upcoming FUDCon Lawrence (hackfests, presentations, etc.).
Let's coordinate and gather here the things we want to do at FUDCon. (Don't forget to add these to the main FUDCon page as soon as we have decided on them.)
technical sessions (friday)
hackfests (saturday and sunday)
- cloudy with a chance of infrastructure - finish up stuff around private clouds, move to production.
- revamp our apprentice/new contributor process - figure out a way to get more people involved long term. (more mentoring?)
lightning talks (friday)
This will be a list of things we want to get done in those timeframes.
2013 infrastructure FAD
The last FAD worked great for getting two-factor auth done. If we can get funding, we should consider holding another one on a different topic. Ideas welcome here.
- Logging - fix our application logs, set up a secondary/archive host for logs, make logs use a write-only netapp, make our log processing better.
- monitoring - fix nagios, revamp how we manage it, make it stop bothering us all, but still tell us about issues, etc.
In the Fedora 19 cycle
In the Fedora 20 cycle
old stuff from 2011 / 2012
Here's stuff we talked about in the past and never got done:
- Upgrade TurboGears1 apps to TurboGears2
- Write automated tests using TG2's test framework
- Fix the FAS authenticators to be less chatty
- Put fas session information into memcached
- Update FAS to have an admin console (no more direct db needs)
- Update pkgdb to have an admin console (no more direct db needs)
- Fix the Django auth providers to be faster
- Move publictest instances to the cloud and set a sunset (expiry) policy for them
- Automated hosted projects (*)
- Automated creation of new machines -- run one command and it's up
- glusterfs/cloudfs fedorapeople filesystem
- Replicate db so that we don't have a SPOF
- logging sucks (*)
- IPs hit proxies but we also need them to hit the app servers. (*)
- FAS needs to log more actions to its database (this is in a new version of FAS, we just need to upgrade)
- Do periodic reinstallations of guests (like app servers) so that we know nothing has changed outside of puppet.
- Reduce koji's resources
- Finish and deploy coprs
- Go through the rpm -Va output for all hosts (in /var/tmp/global-rpm-va on puppet01) and make sure every changed file listed there has a counterpart in puppet that explains the change (*)
- Look at whether the git email hook can be done async. If so, make it async and change it to query the packagedb for people to email instead of using the PACKAGE-owner email aliases. (This will eliminate bounces when the alias does not exist, for instance, new package requests and when the only owner of a package is firstname.lastname@example.org)
- The puppet node names do not match the hostnames in nagios. Add aliases to the nagios hostnames to match them up correctly. This will allow us to trigger passive checks using nsca.
- Set up a schedule for rebooting hosts (to test for broken hardware when it's not a critical point in the release cycle)
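
For the rpm -Va item above, a minimal sketch of how the comparison could be scripted. It assumes the saved output uses the standard rpm -V line format (a flag field or the word "missing", an optional file-type letter, then the path). The function names and the `managed_paths` set are hypothetical; how the list of puppet-managed files is gathered, and how the files under /var/tmp/global-rpm-va are laid out, is not covered here.

```python
import re

# One rpm -V line: a flag field (8 or 9 characters, or the word "missing"),
# an optional one-letter file-type marker (c, d, g, l, r), then the file path.
LINE_RE = re.compile(
    r"^(?P<flags>missing|[SM5DLUGTP?.]{8,9})\s+"
    r"(?:(?P<kind>[cdglr])\s+)?"
    r"(?P<path>/.*)$"
)


def parse_rpm_va(text):
    """Return {path: flags} for every recognizable line of rpm -Va output."""
    changes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            changes[match.group("path")] = match.group("flags")
    return changes


def unexplained(changes, managed_paths):
    """Return the changed paths that are not in the (hypothetical) set of
    files managed by puppet, sorted for stable reporting."""
    return sorted(path for path in changes if path not in managed_paths)
```

Example use: read one host's saved output, call `parse_rpm_va` on it, then pass the result to `unexplained` together with the set of paths puppet manages for that host. Whatever comes back is the list of files that still needs an explanation.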