Infrastructure/RFR Responsibilities(draft)


Revision as of 22:58, 16 March 2011 by Ricky

This page is a draft only
It is still under construction and content may change. Do not rely on the information on this page.

Responsibilities of an RFR Owner

If you have a service that is accepted as an RFR and that you plan to eventually have deployed permanently in Fedora, you are signing up for maintenance as well as for initial coding and deployment. Keep that in mind :-)

Here's a list of some of the things that we expect of an RFR maintainer. Note that infrastructure is a team, so it won't just be on your shoulders to do these things. Equally, infrastructure is a team of which you're a part, and it's not fair to the rest of the team to bring in a new maintenance burden without pulling your own weight.

Bringing in a team of people to do these things is always appreciated so there's not a single point of failure.

  • Recruiting and training other people to work on the service so that you aren't a single point of failure
  • Applying rpm updates to the service (and to any underlying pieces of the application stack) if necessary.
    • Note that there are infra policies on freeze periods around releases, and updating pieces of the software stack may require you to interact with the teams working on other services deployed in infrastructure.
  • Applying hotfixes via the puppet hotfix module if there's a security fix or bugfix that needs to go in and it's not worth spinning a new rpm.
  • Keeping up with upstream development
    • This includes keeping track of security fixes. Note that for many apps, this task involves much more work than simply following the Fedora package updates.
  • Answering questions about whether a yum update (to your app or to the underlying stack) might break your app.
    • Also, testing if you don't immediately know the answer
    • Also, coding patches to fix things should it become apparent that the update breaks the app
  • Fixing things should an app start throwing errors in production for unknown reasons
    • Could include deploying to staging
    • Could include coding and diagnosing
    • Could include spending long hours staring at log files
  • Working on deployment problems
    • It's too slow, what can we change to speed up this page?
    • Testing things in staging before deploying to production
    • Rolling new rpms of the application
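
For the "long hours staring at log files" item above, a small script can at least narrow down where to look. The following is a minimal sketch only, not an infrastructure tool: the function name summarize_errors and the keywords it searches for (ERROR, CRITICAL, Traceback) are assumptions for illustration, and your app's log format may well differ.

```python
import re
from collections import Counter

# Assumption: problem lines contain one of these keywords as a whole word.
# Adjust the pattern to match whatever your application actually logs.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def summarize_errors(lines):
    """Count how many lines contain each error keyword.

    `lines` can be any iterable of strings, such as an open log file.
    Only the first keyword found on a line is counted for that line.
    """
    counts = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, pass an open file object, for example `with open(path) as f: print(summarize_errors(f))`, where `path` is whichever log file your service writes to. Comparing the counts between staging and production, or before and after an update, may hint at when a problem started.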