Fedora infrastructure tasks 2013

From FedoraProject


Latest revision as of 14:02, 19 January 2013

2013 Fedora Infrastructure tasks

overview

This page is to help us collect things we want to work on and get done in 2013. Initially it will serve to help us organize what we want to get done at the upcoming FUDCon Lawrence (hackfests, presentations, etc.). After that it may be repurposed to note the things we are actually going to work on in the coming year.

fudcon

Let's coordinate and gather here the things we want to do at FUDCon. (Don't forget to add these to the main FUDCon page as soon as we have decided on them.) I am planning to hold a high-level "These are things we want to work on" session Saturday morning. Hopefully everyone can attend that, and then we can go off and do those things.

technical sessions (friday)

  • Several infrastructure folks will be giving tech talks. Please attend and heckle^Whelp.

hackfests (saturday and sunday)

  • cloudy with a chance of infrastructure - finish up stuff around private clouds, move to production.
  • revamp our apprentice/new contributor process - figure out a way to get more people involved long term. (more mentoring?)
  • ansible - figure out any setup and questions, timetable to replace puppet

lightning talks (friday)

2013

This will be a list of things we want to get done in the following timeframes.

2013 infrastructure FAD

The FAD worked great for getting two-factor auth done; if we can get funding, we should consider holding another one on a different topic. Ideas welcome here.

  • monitoring - fix nagios, revamp how we manage it, make it stop bothering us all, but still tell us about issues, etc.

In the Fedora 19 cycle

  • Move publictest to the cloud and create a sundown on them
  • Move dev instances all to the cloud.
  • Make a push-based fasClient with ansible; replace the fasClient cron job on the infra boxes with it.

In the Fedora 20 cycle

Idea box for 2012 and beyond

  • Integrate jenkins into our infrastructure and framework (pingou)
  • Make a clearer division between back-end and front-end in our (web) apps (pingou)
    • Helps with testing (unit tests)
    • Reduces the dependency of the application on a particular framework
  • Automate the generation of the statistics report: https://fedoraproject.org/wiki/Statistics (pingou)
  • Reduce the number of frameworks used? (pingou)
    • This does mean porting old apps to a new(er) framework
  • Question: what are we going to do when/if EL7 is released in 2013? (From an app point of view)
  • Set up an Intrusion Detection System (lmacken)
    • Have had great experiences with using suricata personally...
  • restructure our app/proxy layout: (skvidal)
    • Our current app model makes it difficult to determine which app is causing a problem, so our solutions tend to be pretty coarse-grained. Given the failure-prone state of our apps, it would seem we should adopt a model that makes it simpler to see where the problems are coming from. As our apps stabilize, we can move to an environment that shares more resources.
  • ARM servers in infrastructure
    • Discuss issues around using some ARM instances for our needs.
    • Would likely need to use Fedora instead of RHEL
    • What things would be good for them?
  • Revamp nagios
    • Use check_mk on all machines and add a small amount of custom checks on top.
    • Automate adding nodes, etc
  • Extend 2 factor auth or other security measures past sysadmin groups?
    • hosted? pkgs? specific groups?
    • signed commits?
  • Fedorahosted-ng
    • Ditch trac for something better?
    • gitlabhq or other easier interface for git repos?
    • Decentralize!
  • Search engine? try and get dpsearch working again?
  • Rework fasClient (laxathom)
    • By making it daemonizable
    • By making it a friend of fedmsg so we can trigger actions (push mode) only when needed, based on the server's profile.
  • Interactive shell for FAS administration (laxathom)
    • so we can avoid hacking directly into the FAS DB, for example.
  • Koji-stg (laxathom)
    • just finish what should be done here.
  • fedorahosted - auto-setup-scm-project (laxathom)
    • Use the fasClient rework above so we can trigger creation (push mode) of the SCM once the related group has been created in FAS.
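  • FAS v3 (laxathom)
    • Started a proposal at https://fedorahosted.org/fas/wiki/docs/draft/FAS3#no1
    • This proposal will need some rework with all the nice features we added in 2012.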
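
The "Rework fasClient" idea above (trigger a sync only when needed, based on the server's profile) could look roughly like the following. This is only a sketch to make the idea concrete: the topic suffixes, the message layout (a "topic" string plus a "msg" payload with an optional "group" key) and the notion of a per-host set of groups are all assumptions for illustration, not the real fedmsg or FAS interfaces.

```python
"""Sketch: decide whether a host should re-sync accounts for one message."""

# Hypothetical topic suffixes; the real message topics are an assumption here.
RELEVANT_SUFFIXES = (
    "fas.group.member.sponsor",
    "fas.group.member.remove",
    "fas.user.update",
)


def needs_sync(message, host_groups):
    """Return True if `message` should trigger a sync on this host.

    `message` is a dict with a "topic" string and a "msg" payload dict.
    `host_groups` is the set of FAS groups this host's profile cares about.
    """
    topic = message.get("topic", "")
    # Ignore anything that is not an account or group change.
    if not topic.endswith(RELEVANT_SUFFIXES):
        return False
    group = message.get("msg", {}).get("group")
    # A user-level change carries no group, so sync everywhere in that case.
    if group is None:
        return True
    # Group-level change: only hosts whose profile includes the group sync.
    return group in host_groups
```

A daemonized client would call `needs_sync()` for each incoming message and run the existing sync step only when it returns True, instead of syncing from cron on every host.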

old stuff from 2011 / 2012

Here's stuff we talked about in the past and never got done:

  • Upgrade TurboGears1 apps to TurboGears2
    • Write automated tests using TG2's test framework
  • Fix the FAS authenticators to be less chatty
    • Put fas session information into memcached
  • Update FAS to have an admin console (no more direct db needs)
  • Update pkgdb to have an admin console (no more direct db needs)
  • Fix the Django auth providers to be faster
  • Automated hosted projects (*)
  • Automated creation of new machines -- run one command and it's up
  • glusterfs/cloudfs fedorapeople filesystem
  • Replicate db so that we don't have a SPOF
  • logging sucks (*)
    • IPs hit proxies but we also need them to hit the app servers. (*)
    • FAS needs to log more actions to its database (this is in a new version of FAS, we just need to upgrade)
  • Do periodic reinstallations of guests (like app servers) so that we know nothing has changed that is not in puppet.
    • Reduce koji's resources
  • Finish and deploy coprs
  • The puppet nodenames do not match the hostnames in nagios. Add aliases to the nagios hostnames to match them up correctly. This will allow us to trigger passive checks using nsca. (ties to nagios revamp)
  • Set up a schedule for rebooting hosts (to test for broken hw when it's not a critical point in the release cycle)
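
For the puppet/nagios naming item above, a first pass at working out which aliases are needed might be sketched as below. It assumes the two name sets differ mainly in how fully qualified they are, so it pairs names by their first DNS label; that assumption, the function name and the example hostnames are all hypothetical, and anything ambiguous is left for a human to sort out.

```python
def match_nodes(puppet_nodes, nagios_hosts):
    """Pair nagios hostnames with puppet nodenames by short hostname.

    Returns (aliases, unmatched): `aliases` maps each nagios hostname to
    the single puppet nodename sharing its first DNS label, `unmatched`
    lists nagios hostnames with zero or several candidates.
    """
    # Index puppet nodenames by their lower-cased first label.
    by_short = {}
    for node in puppet_nodes:
        by_short.setdefault(node.split(".")[0].lower(), []).append(node)

    aliases, unmatched = {}, []
    for host in nagios_hosts:
        candidates = by_short.get(host.split(".")[0].lower(), [])
        if len(candidates) == 1:
            aliases[host] = candidates[0]
        else:
            # No match, or more than one: needs manual review.
            unmatched.append(host)
    return aliases, unmatched
```

The resulting `aliases` mapping could then be used to generate the alias entries, while `unmatched` gives the list of hosts to check by hand.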

See: https://fedoraproject.org/wiki/Infrastructure_Cleanup_Tasks_2011 for more.