User:Toshio/From RFR to production service
The goal of this page is to set proper expectations about the generic work needed to get a service into production in infrastructure, and to explain how each requirement contributes to a more sustainable infrastructure that can maintain its reliability as the number of services grows.
Deploying the service gives us a good idea of the hardware requirements of a new service, but it doesn't tell the complete story about manpower. With manpower we are trying to ensure that there are always people maintaining the service and that they have the time and skills to do a good job of it. Doing this when people can come and go from the project at any time is not easy, but making the attempt will hopefully keep us from problems like talk.fedoraproject.org, blogs.fedoraproject.org, or translate.fedoraproject.org, all of which we had to remove.
How many maintainers?
- Definitely more than one. We don't want a single maintainer to leave the project or get busy with real life, leaving us with no one who knows how to maintain the service.
- Perhaps maintainers who have shown past experience mentoring and onboarding new people to help build their projects would need less redundancy than those who haven't. The expectation here would be that those maintainers *are* onboarding new admins to work on their pet services, though.
Commitment is hard in a volunteer project. Sometimes you unexpectedly have to change jobs, or deadlines at work conflict with something important happening to your service in infrastructure. This might be an area where we have to mitigate the case where the maintainer leaves, rather than trying to get hard and fast commitments of time from people.
On the other hand, having some statement of commitment is desirable. That way we know the service will at least get a good, solid launch.
How much work does the service take to maintain?
Estimates of the amount of work might include:
- How often does the package need to be updated (for instance for security fixes)?
- Do we sometimes have to write our own code, whether to fix things, to authenticate against FAS, etc.?
- Is the upstream alive or dead?
- Do we have a relationship with upstream where we can ask them to do things for us?
- Is the upstream branch going to keep producing bugfixes (or at least security fixes) for the service for a long time?
- How easy is updating? At one end of the spectrum it's "yum update" and done. At the other end, we package the software ourselves, port our custom addons, run a series of scripts to update the production database, update the config file, and finally take an outage to actually perform the update.