From Fedora Project Wiki
Revision as of 16:21, 3 November 2015


PDC

Summary

The [https://github.com/release-engineering/product-definition-center Product Definition Center (PDC)] is a webapp and API designed for storing and querying product metadata. We want to stand up an instance in Fedora Infrastructure and automatically populate it with data from our existing releng tools/processes. It will enable us to develop more sane tooling down the road for future releases.

Owner

  • Name: Ralph Bean
  • Email: rbean@redhat.com
  • Release notes owner:

Current status

  • Targeted release: Fedora 24
  • Last updated: 2015-11-03
  • Tracker bug: <will be assigned by the Wrangler>

Detailed Description

We need something more sophisticated than we have now to model releng processes. Right now, we have a collection of shell scripts, python bits, and koji tasks that all know "how to do" whatever it is that needs to be done. Whatever artifacts they produce is what we produce.

When we introduced new types of artifacts (server/cloud/workstation, vagrant, docker, atomic, etc..) as requirements for releng in the past few years, we started to strain the existing processes. Those scripts became much more complicated and difficult to debug.

Long term, we would like to move to a more structured architecture for releng workflow, one that uses basic software engineering paradigms, like MVC. To start on that journey, we're looking to deploy something which can serve just as the M there (the Model).

With such a thing, we could rewrite some of our scripts to behave dynamically in response to the state of the model. In the best case scenario (read: utopia), we would simply define a new variant of a deliverable in the model, and our tools would produce it. (Of course, things will involve more work than that.)

Requirements

  • We need something which can be queried to find out what types of artifacts releng is supposed to be producing.
  • We need something which can be queried to find out what specific artifacts releng produced in the past (yesterday, last week, etc..).
  • We need something which can be queried to find out what inputs go into which artifacts.
  • We would like to be able to tier the mapping of inputs to artifacts, so that we can model layered builds.
  • We need something which can be queried to find the QE status of a compose and the QE status of an artifact.
  • That system should be [https://en.wikipedia.org/wiki/Eventual_consistency eventually consistent] with respect to the rest of our infrastructure.
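
To make the querying requirements a little more concrete, here is a minimal sketch of an in-memory model that answers "what are we supposed to produce?" and "what inputs go into this artifact?", including layered builds. The class `ReleaseModel`, its methods, and the artifact names are hypothetical illustrations and are not PDC's schema or API.

```python
from collections import defaultdict


class ReleaseModel:
    """Toy in-memory model; hypothetical, not PDC's actual schema."""

    def __init__(self):
        # artifact name -> set of direct inputs (packages or other artifacts)
        self.inputs = defaultdict(set)

    def add_input(self, artifact, input_name):
        self.inputs[artifact].add(input_name)

    def artifact_types(self):
        """What is releng supposed to be producing?"""
        return sorted(self.inputs)

    def all_inputs(self, artifact):
        """Resolve inputs transitively, so layered builds are covered."""
        seen = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for item in self.inputs.get(current, ()):
                if item not in seen:
                    seen.add(item)
                    stack.append(item)
        return seen


# Illustrative data only: a layered image built on top of a base image.
model = ReleaseModel()
model.add_input("docker-base-image", "glibc")
model.add_input("docker-httpd-image", "docker-base-image")
model.add_input("docker-httpd-image", "httpd")
```

Asking `model.all_inputs("docker-httpd-image")` walks through the base image as well, which is the "tiered" mapping the list above asks for.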

Design

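For this central know-it-all system, we're going to deploy [https://github.com/release-engineering/product-definition-center PDC]. We have a [https://pdc.fedorainfracloud.org dev instance] set up, but without any data in it, it is useless. We need to populate it, both initially and over time.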

Ideas for populating it over time:

  • Approach 1: We could instrument all of our existing releng tools to feed info to PDC about what they are doing, as they do it.
  • Approach 2: Write a pdc-updater project. It will be a single service that listens for general activity from those tools on the fedmsg bus, and updates PDC about what they're doing.

Problems with Approach 1: we have to modify all the tools. If the PDC API changes, we need to modify it in all those places. We have to distribute PDC credentials to all those tools. None of those tools will work if PDC is not present.

We're going to go with Approach 2. Its drawback is that a message could theoretically be dropped, so we'll have to write an audit script which can run once a day/week in a cron job. It will comb through all our systems and make sure that what PDC thinks is true is actually true.
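
The core of such an audit is a two-way comparison between what PDC holds and what a source system reports. The following is only a sketch under that assumption: the function `audit`, the keys of its report, and the example identifiers are made up for illustration, and a real script would first have to fetch both lists from PDC and from the system being audited.

```python
def audit(pdc_records, actual_records):
    """Compare what PDC believes against what a source system reports.

    Both arguments are iterables of hashable identifiers (for example,
    RPM name-version-release strings).  Returns a dict describing drift.
    """
    pdc_set = set(pdc_records)
    actual_set = set(actual_records)
    return {
        # Present in the source system, but PDC never heard about it
        # (for instance because a message was dropped).
        "missing_from_pdc": sorted(actual_set - pdc_set),
        # PDC has an entry that the source system does not back up.
        "unknown_to_source": sorted(pdc_set - actual_set),
    }


# Illustrative identifiers only.
report = audit(
    pdc_records=["bash-4.3.42-1.fc24", "zsh-5.1.1-2.fc24"],
    actual_records=["bash-4.3.42-1.fc24", "vim-7.4.827-1.fc24"],
)
```

An empty list under both keys would mean PDC and the source system agree.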

List of pdc-updater interactions

For some background, check out the [https://pdc.fedorainfracloud.org/rest_api/v1/ PDC API] first.

This is a base list -- we will likely add new interactions as we go along. Some of these ideas might not actually make sense in practice when we go to implement them, and we'll have to revise.

  • When new packages are added to pkgdb, add them to pdc.
  • When new packages are added to pkgdb, add them to the pdc bugzilla-components API.
  • When new composes are completed by the releng/scripts/, add them to pdc.
  • When new images are built in koji, add them to the pdc images/ API.
  • When new rpms are built in koji, add them to the pdc rpms/ API.
  • When new commits are pushed to dist-git, add them to the pdc changesets/ API.
  • When new users are added in FAS, add them to the persons db.
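
The interactions above amount to a routing table from bus topics to PDC resources. Below is a rough sketch of that idea, not the pdc-updater implementation: `FakePDC` stands in for a real client, the topic suffixes in `ROUTES` are assumptions, and the resource names simply follow the list above (with `persons/` assumed for the persons db).

```python
class FakePDC:
    """Stand-in for a PDC client; records calls instead of doing HTTP."""

    def __init__(self):
        self.calls = []

    def create(self, resource, data):
        self.calls.append((resource, data))


# Hypothetical mapping from a fedmsg topic suffix to a PDC resource.
# The topic suffixes are assumed, not taken from the real bus.
ROUTES = {
    "pkgdb.package.new": "bugzilla-components/",
    "buildsys.image.done": "images/",
    "buildsys.rpm.done": "rpms/",
    "git.receive": "changesets/",
    "fas.user.create": "persons/",
}


def handle(pdc, topic, msg):
    """Forward one bus message to PDC; return True if it was handled."""
    for suffix, resource in ROUTES.items():
        if topic.endswith(suffix):
            pdc.create(resource, msg)
            return True
    return False  # Not a topic we care about; ignore it.


pdc = FakePDC()
handled = handle(pdc, "org.example.prod.git.receive", {"rev": "abc123"})
ignored = handle(pdc, "org.example.prod.wiki.edit", {})
```

Keeping the routing in one service is what lets the existing tools stay unmodified: only this table changes if the PDC API does.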

We will then manage the releases/release-types/release-variants/products db tables (with scripts) by hand when we go to branch a new release, or add a new artifact, etc.

Open Questions

  • pkgdb currently has a notion of 'collections' which indicate what branches we have active (F24? F23? EPEL7?). We use the pkgdb API around town in lots of scripts to figure out what kinds of things to render, show, update, etc. It was kind of like a primordial PDC. So, now that we (will) have PDC, do we update PDC from pkgdb when an admin adds a new collection there? Or do we update pkgdb from PDC when an admin adds a new release there? Do we make PDC the canonical source of truth about what releases/etc. we are building, and have pkgdb just mirror that, or vice versa? I'm inclined to favor the former (making PDC the canonical source).
  • We'll use the component-groups feature to indicate what rings things are in. Should PDC just be the place to get and update that info, or should pkgdb grow that feature and PDC can just mirror pkgdb?

The Hand-Wavy Future

Beyond having a system that knows what inputs go into which releng artifacts (PDC), it would be great to then develop tooling around that data source. For instance:

  • it would be cool if, when we're doing the rawhide compose, we can look and see that nothing has changed in XFCE so we don't rebuild that livecd, but we do rebuild other artifacts where things actually changed.
  • furthermore, with that kind of knowledge we can rebuild artifacts as their inputs change (fedmsg) instead of doing things on a nightly or semi-annual basis like we do now.
  • it would be cool to produce reports on the different editions and their artifacts over time, e.g., show how the size of the workstation image is growing (so we can fix it) or show how the size of the cloud image is shrinking (so we can celebrate).
  • it would be cool to automatically impose gating via taskotron for some artifacts, depending on what "rings" (Fedora.NEXT) the inputs are in and what policies we have associated with those rings.
  • leverage taskotron QA checks to create side-tags where we automatically rebuild stuff in the event of soname bumps. We could then also auto-gate artifacts and keep them from reaching the next step in the process if (for instance) things fail depcheck. Say, stuff in ring 0 and ring 1 requires tests X, Y, and Z, but ring 2 requires less. We could make sure that "rawhide is never broken".
  • it could be advantageous to build artifacts immediately (as their inputs change) but to gate publication to the mirrors on some sort of human sign-off from releng.

These are all things that are not a part of this Change, but are ideas that will be easier to implement after this Change is completed.
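
As an illustration of the first idea in the list (skip the XFCE livecd when nothing in it changed), here is a small sketch that selects only the artifacts whose inputs changed. The function name, the artifact names, and the package lists are hypothetical; in practice the mapping would come from PDC and the changed inputs from the message bus.

```python
def artifacts_to_rebuild(artifact_inputs, changed_inputs):
    """Return the artifacts with at least one changed input, sorted."""
    changed = set(changed_inputs)
    return sorted(
        artifact
        for artifact, inputs in artifact_inputs.items()
        if changed & set(inputs)
    )


# Illustrative mapping only; not the real contents of these artifacts.
artifact_inputs = {
    "xfce-livecd": ["xfce4-session", "xfwm4"],
    "workstation-livecd": ["gnome-shell", "mutter"],
    "cloud-image": ["cloud-init"],
}

to_rebuild = artifacts_to_rebuild(artifact_inputs, ["mutter", "cloud-init"])
```

Here the XFCE livecd is left alone because none of its listed inputs changed.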

If PDC is the system that knows what we build and what goes into what, consider also that pungi/koji knows how to build those things (or, it should). We're then missing a third system that knows when to do those rebuilds. For a time we were thinking of writing it from scratch and calling the system [https://twitter.com/TheMaxamillion/status/608040785829871616 Outhouse]. Think of it as a rewrite of the collection of shell scripts in the releng repo into a continuously-running daemon. After discussions at Flock 2015, we started considering re-using a privileged instance of [[Taskotron]] for this.

We considered that we can't necessarily use the qa instance of taskotron as-is. We would need a releng trigger system to have rights to do things with admin permissions in koji, and the existing taskotron instance is in the QA network -- the nodes there are of an insufficient security grade.

We could deploy a second instance of the taskotron software on release engineering maintained nodes (call it "relengotron") to deal with this.

Writing relengotron tasks -- Check out the [http://libtaskotron.readthedocs.org/en/latest/taskyaml.html format for taskotron tasks]. We would need to write new taskotron "directives" for interfacing with PDC and pungi, but after that, the task of writing releng "rules" would be relatively straightforward, and would be readable -- and maintainable!
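
Setting the taskotron task format aside, the ring-based gating rule mentioned earlier (ring 0 and ring 1 require tests X, Y, and Z, ring 2 requires less) could be modelled as a small policy table. This is a sketch only: `REQUIRED_CHECKS`, `may_proceed`, and the choice of which single test ring 2 requires are assumptions, and nothing here is a taskotron directive.

```python
# Hypothetical policy: which checks must pass for each ring.
REQUIRED_CHECKS = {
    0: {"X", "Y", "Z"},
    1: {"X", "Y", "Z"},
    2: {"X"},  # "requires less"; the exact subset is an assumption.
}


def may_proceed(input_rings, passed_checks):
    """Gate an artifact on the checks demanded by all rings of its inputs."""
    required = set()
    for ring in input_rings:
        # Require the union of the checks demanded by every ring involved.
        required |= REQUIRED_CHECKS[ring]
    return required <= set(passed_checks)


# An artifact with a ring 0 input is held back until Y and Z also pass.
ring0_result = may_proceed(input_rings=[0, 2], passed_checks={"X", "Y"})
# An artifact built purely from ring 2 inputs only needs X.
ring2_result = may_proceed(input_rings=[2], passed_checks={"X"})
```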

Benefit to Fedora

If Fedora is the sausage, then the releng toolchain is "how the sausage gets made". We'll hopefully end up with a sausage-making pipeline that is less gross and more maintainable.

Scope

Note that this change should not affect any other development efforts. It does not require new instrumentation of any of our existing tools and so, should it fail as a project, there is no need for a contingency plan to back things out -- we can just abandon it.

  • Proposal owners:
    • Set up a devel instance of PDC (already [https://pdc.fedorainfracloud.org done here]).
    • Write pdc-updater, the daemon that updates PDC with data from our existing toolchain (via fedmsg).
    • Write an audit script that checks that PDC's data is consistent.
    • Set up and deploy staging and production instances of PDC and pdc-updater in fedora-infra.
    • Run the audit scripts to ensure that PDC's knowledge is consistent with the actual state of our release infra.
    • Install the audit script in cron (or something) and attach it to a nagios alert, so we're made aware of inconsistencies.
  • Other developers: N/A (not a System Wide Change)
  • Release engineering: N/A (not a System Wide Change)
  • Policies and guidelines: N/A (not a System Wide Change)
  • Trademark approval: N/A (not needed for this Change)

Upgrade/compatibility impact

N/A (not a System Wide Change)

How To Test

The audit script should let us know if PDC's data is consistent with our release infra's output.
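
Since the audit is meant to be attached to a nagios alert, its result has to be turned into an exit status. A minimal sketch, assuming the audit yields a count of inconsistencies: the exit codes 0, 1, and 2 are the conventional Nagios plugin values for OK, WARNING, and CRITICAL, while the function name and the threshold of 10 are arbitrary choices for illustration.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Conventional Nagios plugin exit codes.


def nagios_status(inconsistencies, critical_threshold=10):
    """Map a count of inconsistencies to (exit_code, message)."""
    if inconsistencies == 0:
        return OK, "OK: PDC is consistent"
    if inconsistencies < critical_threshold:
        return WARNING, "WARNING: %d inconsistencies" % inconsistencies
    return CRITICAL, "CRITICAL: %d inconsistencies" % inconsistencies


code, message = nagios_status(3)
```

A cron wrapper would print `message` and exit with `code`.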

User Experience

N/A (not a System Wide Change)

Dependencies

N/A (not a System Wide Change)

Contingency Plan

  • Contingency mechanism: (What to do? Who will do it?) N/A (not a System Wide Change)
  • Contingency deadline: N/A (not a System Wide Change)
  • Blocks release? N/A (not a System Wide Change), No
  • Blocks product? N/A (not a System Wide Change)

Documentation

N/A (not a System Wide Change)

Release Notes