We need something more sophisticated than what we have now to model releng processes. Right now, we have a collection of shell scripts, python bits, and koji tasks that each know "how to do" whatever needs to be done. Whatever artifacts they produce is what we produce.
When we introduced new types of artifacts (server/cloud/workstation, vagrant, docker, atomic, etc.) as releng requirements over the past few years, we started to strain the existing processes. Those scripts became much more complicated and difficult to debug.
Long term, we would like to move to a more structured architecture for releng workflow, one that uses basic software engineering paradigms like MVC. To start on that journey, we're looking to deploy something that can serve as just the M there (the Model).
With such a thing, we could rewrite some of our scripts to behave dynamically in response to the state of the model. In the best case scenario (read: utopia), we would simply define a new variant of a deliverable in the model, and our tools would produce it. (Of course, in practice it will involve more work than that.)
- We need something which can be queried to find out what types of artifacts releng is supposed to be producing.
- We need something which can be queried to find out what specific artifacts releng produced in the past (yesterday, last week, etc.).
- We need something which can be queried to find out what inputs go into which artifacts.
- The system should be eventually consistent with respect to the rest of our infrastructure.
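To make the three queries above concrete, here is a minimal in-memory sketch of the kind of model we want to be able to ask. It is not PDC's schema or API: `Artifact`, `ReleaseModel`, and all method and field names are hypothetical, chosen only to show the shape of the questions (what should we produce, what did we produce in a time window, what went into it).

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    """One produced deliverable (hypothetical record, not a PDC schema)."""
    artifact_type: str      # e.g. "cloud-qcow2", "workstation-live"
    built_at: datetime
    inputs: frozenset       # NVRs or other identifiers that went into it


@dataclass
class ReleaseModel:
    """Hypothetical in-memory stand-in for the model PDC would provide."""
    artifact_types: set = field(default_factory=set)
    artifacts: list = field(default_factory=list)

    def record(self, artifact):
        # Refuse artifacts whose type the model does not know about.
        if artifact.artifact_type not in self.artifact_types:
            raise ValueError(f"unknown artifact type: {artifact.artifact_type}")
        self.artifacts.append(artifact)

    def expected_types(self):
        """What types of artifacts are we supposed to be producing?"""
        return sorted(self.artifact_types)

    def produced_between(self, start, end):
        """What specific artifacts did we produce in [start, end)?"""
        return [a for a in self.artifacts if start <= a.built_at < end]

    def inputs_of(self, artifact_type):
        """What inputs went into artifacts of this type (union over history)?"""
        inputs = set()
        for a in self.artifacts:
            if a.artifact_type == artifact_type:
                inputs |= a.inputs
        return inputs
```

In this sketch, "define a new variant of a deliverable" amounts to adding a string to `artifact_types`; a real model would carry much more (releases, variants, arches).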
Ideas for populating it over time:
- Approach 1: We could instrument all of our existing releng tools to feed info to PDC about what they are doing, as they do it.
- Approach 2: Write a pdc-updater project. It will be a single service that listens for general activity from those tools on the fedmsg bus, and updates PDC about what they're doing.
Problems with Approach 1: we have to modify all the tools. If the PDC API changes, we need to update every one of them. We have to distribute PDC credentials to all those tools. And none of those tools will work if PDC is unavailable.
We're going to go with Approach 2. Its weakness is that a message could be dropped, so we'll also have to write an audit script that runs once a day or week from a cron job. It will comb through all our systems and make sure that what PDC thinks is true actually is true.
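The core of such an audit script is a reconciliation between two views of the same data. Below is a minimal sketch, assuming both PDC's view and "reality" (e.g. what koji reports) have already been fetched into dicts keyed by an identifier such as a build NVR; the fetching itself and the function name `audit` are hypothetical.

```python
def audit(pdc_view, actual):
    """Compare what PDC believes against what actually exists.

    Both arguments map an identifier (e.g. a build NVR) to its metadata.
    Returns (missing, stale, mismatched), each a sorted list of identifiers:
      missing    -- exists in reality but not in PDC (e.g. a dropped message)
      stale      -- recorded in PDC but no longer exists in reality
      mismatched -- present in both, but the metadata disagrees
    """
    missing = sorted(actual.keys() - pdc_view.keys())
    stale = sorted(pdc_view.keys() - actual.keys())
    mismatched = sorted(
        key for key in actual.keys() & pdc_view.keys()
        if actual[key] != pdc_view[key]
    )
    return missing, stale, mismatched
```

Reporting the three buckets separately leaves the decision of whether to auto-repair or just alert to the cron job that calls it.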
List of pdc-updater interactions
For some background, check out the PDC API first.
This is a working list. Please add ideas to it as you see fit. Some of these ideas might not actually make sense in practice when we go to implement them, and we'll have to revise.
- When new packages are added to pkgdb, add them to pdc.
- When new packages are added to pkgdb, add them to the pdc bugzilla-components API.
- When new composes are completed by the releng/scripts/, add them to pdc.
- When new images are built in koji, add them to the pdc images/ API.
- When new rpms are built in koji, add them to the pdc rpms/ API.
- When new commits are pushed to dist-git, add them to the pdc changesets/ API.
- When new users are added in FAS, add them to the persons db.
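The interactions above suggest a simple shape for pdc-updater: a table mapping bus topics to handlers, each of which turns one message into one or more PDC writes. This is a sketch under loud assumptions: the topic suffixes, message body field names, and endpoint names `global-components/` and `persons/` are assumed rather than checked against fedmsg or the PDC API, and `RecordingPDC` is a hypothetical stand-in for a real PDC client. In a real deployment, `dispatch` would be fed from the fedmsg bus; that wiring is omitted.

```python
from collections import defaultdict


class RecordingPDC:
    """Hypothetical stand-in for a PDC client: records POSTs per endpoint."""

    def __init__(self):
        self.posted = defaultdict(list)

    def post(self, endpoint, payload):
        self.posted[endpoint].append(payload)


HANDLERS = {}
KOJI_BUILD_COMPLETE = 1  # koji's numeric build state for "complete"


def handles(topic_suffix):
    """Register a handler for bus topics ending in topic_suffix (assumed names)."""
    def register(func):
        HANDLERS[topic_suffix] = func
        return func
    return register


@handles("pkgdb.package.new")
def new_package(pdc, body):
    name = body["package"]["name"]
    pdc.post("global-components/", {"name": name})
    pdc.post("bugzilla-components/", {"name": name})


@handles("buildsys.build.state.change")
def build_completed(pdc, body):
    # Ignore builds that are still running, failed, or were cancelled.
    # (Image builds would be routed to images/ similarly; omitted here.)
    if body["new"] != KOJI_BUILD_COMPLETE:
        return
    nvr = f'{body["name"]}-{body["version"]}-{body["release"]}'
    pdc.post("rpms/", {"nvr": nvr})


@handles("git.receive")
def new_commit(pdc, body):
    commit = body["commit"]
    pdc.post("changesets/", {"repo": commit["repo"], "rev": commit["rev"]})


@handles("fas.user.create")
def new_user(pdc, body):
    pdc.post("persons/", {"username": body["user"]})


def dispatch(pdc, topic, body):
    """Route one bus message to its handler; return False for unknown topics."""
    for suffix, handler in HANDLERS.items():
        if topic.endswith("." + suffix):
            handler(pdc, body)
            return True
    return False
```

Matching on the topic suffix keeps the same handlers usable in prod and staging, and keeps PDC credentials in this one service instead of in every tool.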
We will then manage the releases/release-types/release-variants/products db tables (with scripts) by hand when we go to branch a new release, add a new artifact, etc.
Open question: pkgdb currently has a notion of 'collections' which indicate what branches we have active (F24? F23? EPEL7?). We use the pkgdb API around town in lots of scripts to figure out what kinds of things to render, show, update, etc. It was kind of like a primordial PDC.
So, now that we (will) have PDC, do we update PDC from pkgdb when an admin adds a new collection there? Or do we update pkgdb from PDC when an admin adds a new release there?
Do we make PDC the canonical source of truth about what releases/etc. we are building, and have pkgdb just mirror that, or vice versa? I'm inclined to favor the former (making PDC the canonical source).
Another related question: we'll use the component-groups feature to indicate which rings things are in. Should PDC just be the place to get and update that info, or should pkgdb grow that feature and PDC simply mirror pkgdb?
Below here are Old Notes that are super hand-wavey, and maybe not relevant any more.
The initial idea for "composedb" was to have something that knows what goes into every compose and what comes out of it: the atomic repos, the live CDs, etc., and what's in them; what's in cloud, server, workstation, etc. We need such a thing so that we have a place where we can ask what changed between one compose and another, so we can easily visualize what's different between primary arch composes and secondary arch composes, etc.
Furthermore, it would be a more robust solution than the releng dash for showing (for instance) when the last working nightly compose was, and for visualizing when the last updates push was done.
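Both questions reduce to small operations once compose contents are recorded. The sketch below assumes a hypothetical `Compose` record holding a status string and a mapping of package name to version-release; the field names and status values ("FINISHED", "DOOMED") are illustrative, not taken from PDC or pungi.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Compose:
    """Hypothetical compose record; field names are illustrative only."""
    compose_id: str
    day: date
    status: str        # e.g. "FINISHED" for success, anything else for failure
    packages: dict     # package name -> version-release


def diff_composes(old, new):
    """What changed between two composes?

    Returns (added, removed, changed) where added/removed are sorted package
    names and changed maps name -> (old version-release, new version-release).
    """
    added = sorted(new.packages.keys() - old.packages.keys())
    removed = sorted(old.packages.keys() - new.packages.keys())
    changed = {
        name: (old.packages[name], new.packages[name])
        for name in new.packages.keys() & old.packages.keys()
        if old.packages[name] != new.packages[name]
    }
    return added, removed, changed


def last_good(composes, ok_status="FINISHED"):
    """Most recent compose whose status counts as success, or None."""
    good = [c for c in composes if c.status == ok_status]
    return max(good, key=lambda c: c.day, default=None)
```

The same `diff_composes` call works for comparing a primary arch compose against a secondary arch one, as long as both are recorded the same way.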
Action: We're going to look at PDC (Product Definition Center) for this. It's a Django app that does almost all of what is described above.
It may not currently be able to support the notion of "rings" (à la Fedora.NEXT). We need a way to say what's in the different rings (so they can have different policies and processes); the component groups feature of PDC may be able to model this. And there are lots of things that can be built using this information that we can't do today.
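If component groups end up encoding ring membership, a per-ring policy becomes a lookup. Here is a minimal sketch, assuming a hypothetical `component_ring` mapping (which, in practice, might be derived from PDC component groups) and a hypothetical `ring_policy` mapping each ring number to the set of checks it requires. Components not assigned to any ring fall into the outermost (least strict) ring.

```python
def required_checks(inputs, component_ring, ring_policy):
    """Union of the checks demanded by the rings of every input component.

    inputs:         iterable of component names that went into an artifact
    component_ring: component name -> ring number (hypothetical mapping)
    ring_policy:    ring number -> set of required check names (hypothetical)
    """
    outermost = max(ring_policy)
    checks = set()
    for component in inputs:
        checks |= ring_policy[component_ring.get(component, outermost)]
    return checks


def gate(inputs, passed, component_ring, ring_policy):
    """Return (ok, missing): ok only if every required check has passed."""
    missing = required_checks(inputs, component_ring, ring_policy) - set(passed)
    return not missing, sorted(missing)
```

Taking the union means an artifact is held to the policy of its strictest input, so a single ring 0 package pulls a whole image up to ring 0 requirements.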
Beyond having a system that knows what inputs go into which releng artifacts, it would be great to then develop tooling around that data source. For instance:
- it would be cool if, when we're doing the rawhide compose, we could look and see that nothing has changed in XFCE so we don't rebuild that livecd, but we do rebuild other artifacts where things actually changed.
- furthermore, with that kind of knowledge we can rebuild artifacts as their inputs change (fedmsg) instead of doing things on a nightly or semi-annual basis like we do now.
- it would be cool to produce reports on the different editions and their artifacts over time, e.g., showing how the size of the workstation image is growing (so we can fix it) or how the size of the cloud image is shrinking (so we can celebrate).
- it would be cool to automatically impose gating via taskotron for some artifacts, depending on what "rings" (Fedora.NEXT) the inputs are in and what policies we have associated with those rings.
- leverage taskotron QA checks to create side-tags where we automatically rebuild stuff in the event of soname bumps. We could then also auto-gate artifacts and keep them from reaching the next step in the process if (for instance) things fail depcheck. Say stuff in ring 0 and ring 1 requires tests X, Y, and Z, but ring 2 requires fewer; we could make sure that "rawhide is never broken".
- it could be wise to build artifacts immediately (as their inputs change) but to gate publication to the mirrors on some sort of human sign-off from releng.
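The "don't rebuild the XFCE livecd if nothing changed" idea can be sketched by fingerprinting each artifact's inputs and comparing against the fingerprint recorded at its last build. This assumes inputs are plain strings such as NVRs and that the last fingerprints are stored somewhere (perhaps alongside the artifact in PDC); `fingerprint` and `plan_rebuilds` are hypothetical names.

```python
import hashlib


def fingerprint(inputs):
    """Order-independent SHA-256 digest of an artifact's inputs (e.g. NVRs)."""
    h = hashlib.sha256()
    for item in sorted(inputs):
        h.update(item.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def plan_rebuilds(current_inputs, last_fingerprints):
    """Return the sorted names of artifacts whose inputs changed.

    current_inputs:    artifact name -> iterable of inputs for this compose
    last_fingerprints: artifact name -> fingerprint recorded at its last build
    Artifacts that have never been built are always scheduled.
    """
    return sorted(
        name for name, inputs in current_inputs.items()
        if last_fingerprints.get(name) != fingerprint(inputs)
    )
```

Sorting before hashing means the fingerprint does not depend on the order in which a compose happens to list its inputs.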
If PDC is the system that knows what we build and what goes into what, consider also that pungi/koji know how to build those things (or should). We're then missing a third system that knows when to do those rebuilds. For a time we were thinking of writing it from scratch and calling the system Outhouse. Think of it as a rewrite of the collection of shell scripts in the releng repo into a continuously-running daemon. After discussions at Flock 2015, we started considering re-using Taskotron for this.
We realized that we can't necessarily use taskotron as-is. A releng trigger system would need admin permissions in koji, and the existing taskotron instance is in the QA network, whose nodes are of an insufficient security grade.
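At the heart of such a trigger daemon (whether Outhouse or something built on Taskotron) is a reverse lookup: given an input that just changed on the bus, which artifacts consume it? Below is a minimal sketch, assuming the artifact-to-inputs mapping comes from the model and that `schedule` is a hypothetical callback that would, in practice, submit a pungi/koji job.

```python
from collections import defaultdict


def build_reverse_index(artifact_inputs):
    """Invert artifact -> inputs into input -> set of artifacts consuming it."""
    index = defaultdict(set)
    for artifact, inputs in artifact_inputs.items():
        for item in inputs:
            index[item].add(artifact)
    return index


def on_input_changed(index, changed_input, schedule):
    """React to one change event by scheduling every dependent artifact."""
    # .get() avoids inserting empty entries for inputs nothing depends on.
    for artifact in sorted(index.get(changed_input, ())):
        schedule(artifact)
```

Building the index once and updating it as the model changes keeps each event's handling proportional to the number of affected artifacts rather than to the size of the whole release.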
We could deploy a second instance of the taskotron software on release engineering maintained nodes (call it "relengotron") to deal with this.
Writing relengotron tasks: check out the format for taskotron tasks. We would need to write new taskotron "directives" for interfacing with PDC and pungi, but after that, writing releng "rules" would be relatively straightforward, and the rules would be readable and maintainable!
- How does OSBS fit into this? Is it going to sit purely behind koji as a content-generator?
- How does reactor (a la OSBS) fit into this?
- Add your question here...