<!-- Self Contained or System Wide Change Proposal?
Use this guide to determine which category your proposed change belongs to.


Self Contained Changes are:
* changes to isolated/leaf package without the impact on other packages/rest of the distribution
* limited scope changes without the impact on other packages/rest of the distribution
* coordinated effort within SIG with limited impact outside SIG functional area, accepted by the SIG


System Wide Changes are:
* changes that do not fit Self Contained Changes category touching
* changes that require coordination within the distribution (for example mass rebuilds, release engineering or other teams effort etc.)
* changing system defaults


For Self Contained Changes, sections marked as "REQUIRED FOR SYSTEM WIDE CHANGES" are OPTIONAL, but FESCo/Wrangler can request more details (especially in case the change proposal category is
improper or updated to the System Wide category). For System Wide Changes, all fields on this form are required for FESCo acceptance (when applicable).


We request that you maintain the same order of sections so that all of the change proposal pages are uniform.
-->


<!-- The actual name of your proposed change page should look something like: Changes/Your_Change_Proposal_Name. This keeps all change proposals in the same namespace -->


= Product Definition Center =


== Summary ==
<!-- A sentence or two summarizing what this change is and what it will do. This information is used for the overall changeset summary page for each release. -->
The [https://github.com/release-engineering/product-definition-center Product Definition Center (PDC)] is a webapp and API designed for storing and querying product metadata. We want to stand up an instance in Fedora Infrastructure and automatically populate it with data from our existing releng tools/processes.  It will enable us to develop more sane tooling down the road for future releases.


== Owner ==
<!--
For change proposals to qualify as self-contained, owners of all affected packages need to be included here. Alternatively, a SIG can be listed as an owner if it owns all affected packages.
This should link to your home wiki page so we know who you are.
-->
* Name: [[User:Ralph| Ralph Bean]]
* Email: rbean@redhat.com
* Release notes owner: <!--- To be assigned by docs team [[User:FASAccountName| Release notes owner name]] <email address> -->
<!--- UNCOMMENT only for Changes with assigned Shepherd (by FESCo)
* FESCo shepherd: [[User:FASAccountName| Shepherd name]] <email address>
-->
<!--- UNCOMMENT only if this Change aims specific product, working group (Cloud, Workstation, Server, Base, Env & Stacks)
* Product:
* Responsible WG:
-->


== Current status ==
* Targeted release: [[Releases/24 | Fedora 24 ]]
* Last updated: <!-- this is an automatic macro — you don't need to change this line -->  {{REVISIONYEAR}}-{{REVISIONMONTH}}-{{REVISIONDAY2}}
<!-- After the change proposal is accepted by FESCo, tracking bug is created in Bugzilla and linked to this page
Bugzilla states meaning as usual:
NEW -> change proposal is submitted and announced
ASSIGNED -> accepted by FESCo with on going development
MODIFIED -> change is substantially done and testable
ON_QA -> change is code completed and could be tested in the Beta release (optionally by QA)
CLOSED as NEXTRELEASE -> change is completed and verified and will be delivered in next release under development
-->
* Tracker bug: [https://bugzilla.redhat.com/show_bug.cgi?id=1281377 #1281377]


== Detailed Description ==


<!-- Expand on the summary, if appropriate.  A couple sentences suffices to explain the goal, but the more details you can provide the better. -->


We need something more sophisticated than we have now to model releng
processes.  Right now, we have a collection of shell scripts, python bits, and
koji tasks that all know "how to do" whatever it is that needs to be done.
Whatever artifacts they produce are what we produce.


When we introduced new types of artifacts (server/cloud/workstation, vagrant,
docker, atomic, etc..) as requirements for releng in the past few years, we
started to strain the existing processes.  Those scripts became much more
complicated and difficult to debug.


Long term, we would like to move to a more structured architecture for releng
workflow, one that uses basic software engineering paradigms, like MVC.  To
start on that journey, we're looking to deploy something which can serve just
as the '''M''' there (the ''Model'').


With such a thing, we could rewrite some of our scripts to behave dynamically
in response to state of the model. In the best case scenario (read: utopia),
we would simply define a new variant of a deliverable in the model, and our
tools would produce it.  (Of course, things will involve more work than that).


=== Requirements ===


* We need something which can be queried to find out what types of artifacts releng is supposed to be producing.
* We need something which can be queried to find out what specific artifacts releng produced in the past (yesterday, last week, etc..).
* We need something which can be queried to find out what inputs go into which artifacts.
* We would like to be able to tier the mapping of inputs to artifacts, so that we can model layered builds.
* We need something which can be queried to find the QE status of a compose and the QE status of an artifact.
* That system should be [https://en.wikipedia.org/wiki/Eventual_consistency eventually consistent] with respect to the rest of our infrastructure.
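As a rough illustration of the kind of queries we mean (a sketch only -- the endpoint names follow the public PDC REST API, but the exact filters and field names here are assumptions):

<pre>
# Sketch: ask a PDC instance what composes it knows about for a release, and
# what rpms it thinks went into one of them.
import requests

PDC_URL = "https://pdc.fedoraproject.org/rest_api/v1"

def latest_composes(release="fedora-24"):
    """List the composes PDC knows about for a release (first page only)."""
    resp = requests.get("%s/composes/" % PDC_URL, params={"release": release})
    resp.raise_for_status()
    return resp.json()["results"]

def rpms_in_compose(compose_id):
    """Return the rpm manifest PDC has recorded for a given compose."""
    resp = requests.get("%s/compose-rpms/%s/" % (PDC_URL, compose_id))
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for compose in latest_composes():
        print(compose["compose_id"])
</pre>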


=== Design ===


For this central know-it-all system, we're going to deploy
[https://github.com/release-engineering/product-definition-center PDC].  We
have a [https://pdc.fedoraproject.org prod instance] set up, but without any
data in it, it is useless.  We need to populate it, both initially and over
time.


Ideas for populating it over time:


* ''Approach 1'': We could instrument all of our existing releng tools to feed info ''to PDC'' about what they are doing, as they do it.
* ''Approach 2'': Write a [https://github.com/fedora-infra/pdc-updater pdc-updater] project.  It will be a single service that listens for general activity from those tools on the fedmsg bus, and updates PDC about what they're doing.


Problems with ''Approach 1'':  we have to modify ''all'' the tools.  If the PDC API changes, we need to modify it in ''all'' those places.  We have to distribute PDC credentials to ''all'' those tools.  None of those tools will work if PDC is not present.


We're going to go with ''Approach 2''.  The risk it carries is that a message could theoretically be dropped, so we'll have to write an audit script which can run once a day/week in a cron job.  It will comb through all our systems and make sure that what PDC ''thinks'' is true is actually true.
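To sketch what ''Approach 2'' might look like (this is only a sketch -- the real pdc-updater may be structured differently, and the topic suffixes below are illustrative), a single service can simply tail the fedmsg bus and react:

<pre>
# Rough sketch of the Approach-2 listener.  Assumes the fedmsg library is
# installed and configured; fedmsg.tail_messages() yields
# (name, endpoint, topic, msg) tuples as messages arrive on the bus.
import fedmsg

def main():
    for name, endpoint, topic, msg in fedmsg.tail_messages():
        # Route a handful of interesting topics to PDC updates.  The topic
        # suffixes here are illustrative; the real list would live in
        # pdc-updater's configuration.
        if topic.endswith('pkgdb.package.new'):
            print("new package -> would add it to PDC (msg %s)" % msg.get('msg_id'))
        elif topic.endswith('buildsys.build.state.change'):
            print("koji build -> would consider adding its rpms to PDC (msg %s)" % msg.get('msg_id'))

if __name__ == '__main__':
    main()
</pre>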


==== List of pdc-updater interactions ====


For some background, check out the [https://pdc.fedoraproject.org/rest_api/v1/ PDC API] first.
 
This is a base list -- we will likely add new interactions as we go along.  Some of these ideas might not actually make sense in practice when we go to implement them, and we'll have to revise.
 
* When new packages are added to pkgdb, add them to pdc.
* When new packages are added to pkgdb, add them to the pdc bugzilla-components API.
* When new composes are completed by the releng/scripts/, add them to pdc.
* When new images are built in koji, add them to the pdc images/ API.
* <strike>When new rpms are built in koji, add them to the pdc rpms/ API.</strike> When new rawhide rpms are built in koji and when rpms get <strike>included in a bodhi update</strike> tagged into a stable updates tag, add them to the pdc rpms/ API.  No need to model ''all'' rpms -- no need for the ones we don't end up shipping.
* <strike>When new commits are pushed to dist-git, add them to the pdc changesets/ API.</strike>  It seems that this is not what the PDC changesets/ API is meant to model... and we don't have a use case for modelling git commits, so we'll drop this one.
* When new users are added in FAS, add them to the persons db.
 
We will then manage the releases/release-types/release-variants/products db tables (with scripts) by hand when we go to branch a new release, or add a new artifact, etc.
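For instance, the first two interactions above might boil down to something like the following sketch (the bugzilla-components endpoint comes from the list above; the global-components endpoint, the token handling, and the payload fields are assumptions that would need checking against the deployed instance):

<pre>
# Sketch: when pkgdb announces a new package, record it in PDC.
import requests

PDC_URL = "https://pdc.fedoraproject.org/rest_api/v1"
PDC_TOKEN = "..."  # obtained out of band; placeholder only

def add_package_to_pdc(package_name):
    headers = {"Authorization": "Token %s" % PDC_TOKEN}
    # Register the package as a component in pdc...
    requests.post("%s/global-components/" % PDC_URL, headers=headers,
                  json={"name": package_name}).raise_for_status()
    # ...and mirror it into the bugzilla-components API as well.
    requests.post("%s/bugzilla-components/" % PDC_URL, headers=headers,
                  json={"name": package_name}).raise_for_status()
</pre>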
 
==== Open Questions ====
 
* pkgdb currently has a notion of 'collections' which indicate what branches we have active (F24?  F23?  EPEL7?).  We use the pkgdb API around town in lots of scripts to figure out what kinds of things to render, show, and update, etc..  It was kind of like a primordial PDC.  So, now that we (will) have PDC, do we update PDC from pkgdb when an admin adds a new collection there?  Or do we update pkgdb from PDC when an admin adds a new release there?  Do we make PDC the canonical source of truth about what releases/etc we are building, and have pkgdb just mirror that, or vice versa?  I'm inclined to favor the former (making PDC the canonical source).
* We'll use the component-groups feature to indicate what rings things are in.  Should PDC just be ''the'' place to get and update that info, or should pkgdb grow that feature and PDC can just mirror pkgdb?
 
=== The Hand-Wavy Future ===
 
Beyond having a system that knows ''what'' inputs go into which releng artifacts (PDC), it would be great to then develop tooling around that data source.  For instance:
 
* it would be cool if, when we're doing the rawhide compose, we could look and see that nothing has changed in XFCE so we don't rebuild that livecd, but do rebuild other artifacts where things actually changed.
* furthermore, with that kind of knowledge we can rebuild artifacts as their inputs change (fedmsg) instead of doing things on a nightly or semi-annual basis like we do now.
* it would be cool to produce reports on the different editions and their artifacts over time.  i.e., show how the size of the workstation image is growing (so we can fix it) or show how the size of the cloud image is shrinking (so we can celebrate).
* it would be cool to automatically impose gating via taskotron for some artifacts, depending on what "rings" (Fedora.NEXT) the inputs are in and what policies we have associated with those rings.
* leverage taskotron QA checks to create side-tags where we automatically rebuild stuff in the event of soname bumps.  We could then also auto-gate artifacts and keep them from reaching the next step in the process if (for instance) things fail depcheck.  Say, stuff in ring 0 and ring 1 requires tests X, Y, and Z, but ring 2 requires fewer.  We could make sure that "rawhide is never broken".
* it could be auspicious to build artifacts immediately (as their inputs change) but to gate publication to the mirrors on some sort of human sign-off from releng.
 
These are all things that '''are not a part of this Change''', but are ideas that will be easier to implement after this Change is completed.
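As a purely hypothetical illustration of the first two bullets above (again, not part of this Change): once PDC can hand us the manifest of inputs for two composes, the "should we rebuild this artifact?" decision becomes a trivial comparison.

<pre>
# Hypothetical: given two compose manifests (source package name -> NVR) as
# PDC could provide them, decide whether an artifact built from the given
# components needs to be rebuilt.
def needs_rebuild(old_manifest, new_manifest, components):
    """Return True if any input component changed between the two composes."""
    return any(old_manifest.get(name) != new_manifest.get(name)
               for name in components)

# Nothing in the Xfce stack changed, so skip rebuilding that livecd.
old = {"xfce4-panel": "4.12.0-1.fc24", "xfwm4": "4.12.3-1.fc24"}
new = {"xfce4-panel": "4.12.0-1.fc24", "xfwm4": "4.12.3-1.fc24"}
print(needs_rebuild(old, new, ["xfce4-panel", "xfwm4"]))  # -> False
</pre>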
 
If PDC is the system that '''knows what''' we build and what goes into what, consider also that pungi/koji '''knows how''' to build those things (or, it should).  We're missing, then, a third system that '''knows when''' to do those rebuilds.  For a time we were thinking of writing it from scratch and calling the system [https://twitter.com/TheMaxamillion/status/608040785829871616 Outhouse].  Think of it as a rewrite of the collection of shell scripts in the releng repo into a continuously-running daemon.  After discussions at Flock 2015, we started considering using a privileged instance of [[Taskotron]] for this instead of writing something from scratch.
 
We concluded that we can't necessarily use the QA instance of taskotron as-is.  We would need a releng trigger system to have rights to do things with admin permissions in koji, and the existing taskotron instance is in the QA network -- the nodes there are of an insufficient security grade.
 
We could deploy a second instance of the taskotron software on release engineering maintained nodes (call it "relengotron") to deal with this.
 
'''Writing relengotron tasks''' -- Check out the [http://libtaskotron.readthedocs.org/en/latest/taskyaml.html format for taskotron tasks].  We would need to write new taskotron "directives" for interfacing with PDC and pungi, but after that, the task of writing releng "rules" would be relatively straightforward, and would be readable -- and maintainable!
 
=== Why a Change Proposal? ===
 
[[User:jwboyer | Josh Boyer]] [https://lists.fedoraproject.org/pipermail/rel-eng/2015-November/020967.html asked the question on the rel-eng list.]  This is entirely self-contained
in Rel-Eng and Infrastructure.
 
* Is there anything FESCo needs to review here?
* Is it even FESCo's responsibility to approve it?
 
The answer is perhaps "no" to both questions.  However, we're submitting a F24 Change mainly to ''raise visibility'' of the effort.
 
== Benefit to Fedora ==
 
If Fedora is the sausage, then the releng toolchain is "how the sausage gets made".  We'll hopefully end up with a sausage-making pipeline that is less gross and more maintainable.
 
== Scope ==
 
Note that this change should not affect any other development efforts.  It does not require new instrumentation of any of our existing tools and so, should it fail as a project, there is no need for a contingency plan to back things out -- we can just abandon it.
 
* Proposal owners:
** Set up a devel instance of PDC (already [https://pdc.fedorainfracloud.org done here]).
** Write pdc-updater, the daemon that updates PDC with data from our existing toolchain (via fedmsg).
** Write an audit script that checks that PDC's data is consistent.
** Set up and deploy staging and production instances of PDC and pdc-updater in fedora-infra.
** Run the audit scripts to ensure that PDC's knowledge is consistent with the actual state of our release infra.
** Install the audit script in cron (or something) and attach it to a nagios alert, so we're made aware of inconsistencies.
 
* Other developers: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- What work do other developers have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
 
* Release engineering: N/A (not a System Wide Change)  <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Does this feature require coordination with release engineering (e.g. changes to installer image generation or update package delivery)?  Is a mass rebuild required?  If a rel-eng ticket exists, add a link here.
Please work with releng prior to feature submission, and ensure that someone is on board to do any process development work and testing; don't just assume that a bullet point in a change puts someone else on the hook.-->
** [[Fedora_Program_Management/ReleaseBlocking/Fedora{{FedoraVersionNumber|next}}|List of deliverables]]: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Please check the list of Fedora release deliverables and list all the differences the feature brings -->
 
* Policies and guidelines: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Do the packaging guidelines or other documents need to be updated for this feature?  If so, does it need to happen before or after the implementation is done?  If a FPC ticket exists, add a link here. -->
 
* Trademark approval: N/A (not needed for this Change)
<!-- If your Change may require trademark approval (for example, if it is a new Spin), file a ticket ( https://fedorahosted.org/council/ ) requesting trademark approval from the Fedora Council. This approval will be done via the Council's consensus-based process. -->
 
== Upgrade/compatibility impact ==
<!-- What happens to systems that have had a previous versions of Fedora installed and are updated to the version containing this change? Will anything require manual configuration or data migration? Will any existing functionality be no longer supported? -->
 
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A (not a System Wide Change)
 
== How To Test ==
<!-- This does not need to be a full-fledged document. Describe the dimensions of tests that this change implementation is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.
 
Remember that you are writing this how to for interested testers to use to check out your change implementation - documenting what you do for testing is OK, but it's much better to document what *I* can do to test your change.
 
A good "how to test" should answer these four questions:
 
0. What special hardware / data / etc. is needed (if any)?
1. How do I prepare my system to test this change? What packages
need to be installed, config files edited, etc.?
2. What specific actions do I perform to check that the change is
working like it's supposed to?
3. What are the expected results of those actions?
-->
 
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
The audit script should let us know if PDC's data is consistent with our release infra's output.
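For example, one audit check might look roughly like this (a sketch under assumptions: the koji tag name and the PDC filter parameters shown here are illustrative and would need to match however pdc-updater actually records rpms):

<pre>
# Sketch: compare koji's view of a tag with PDC's view of the matching
# release, and exit non-zero (cron/nagios friendly) if they disagree.
import sys
import koji
import requests

KOJI_HUB = "https://koji.fedoraproject.org/kojihub"
PDC_URL = "https://pdc.fedoraproject.org/rest_api/v1"

def nvrs_from_koji(tag="f24-updates"):
    session = koji.ClientSession(KOJI_HUB)
    return set(build["nvr"] for build in session.listTagged(tag, latest=True))

def nvrs_from_pdc(release="fedora-24-updates"):
    nvrs, url = set(), "%s/rpms/" % PDC_URL
    params = {"linked_release": release, "arch": "src"}
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        nvrs.update("%(name)s-%(version)s-%(release)s" % rpm
                    for rpm in data["results"])
        url, params = data.get("next"), None  # follow pagination links
    return nvrs

if __name__ == "__main__":
    missing = nvrs_from_koji() - nvrs_from_pdc()
    if missing:
        print("PDC is missing %i builds, e.g. %s" % (len(missing), sorted(missing)[:5]))
        sys.exit(1)
    print("PDC agrees with koji.")
</pre>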
 
== User Experience ==
<!-- If this change proposal is noticeable by its target audience, how will their experiences change as a result?  Describe what they will see or notice. -->
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A (not a System Wide Change)
 
== Dependencies ==
<!-- What other packages (RPMs) depend on this package?  Are there changes outside the developers' control on which completion of this change depends?  In other words, completion of another change owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate?  Other upstream projects like the kernel (if this is not a kernel change)? -->
 
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A (not a System Wide Change)
 
== Contingency Plan ==
 
<!-- If you cannot complete your feature by the final development freeze, what is the backup plan?  This might be as simple as "Revert the shipped configuration".  Or it might not (e.g. rebuilding a number of dependent packages).  If your feature is not completed in time, we want to assure others that other parts of Fedora will not be in jeopardy.  -->
* Contingency mechanism: (What to do?  Who will do it?) N/A (not a System Wide Change)  <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- When is the last time the contingency mechanism can be put in place?  This will typically be the beta freeze. -->
* Contingency deadline: N/A (not a System Wide Change)  <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Does finishing this feature block the release, or can we ship with the feature in incomplete state? -->
* Blocks release? N/A (not a System Wide Change), No <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Blocks product? N/A (not a System Wide Change) <!-- Applicable for Changes that blocks specific product release/Fedora.next -->
 
== Documentation ==
<!-- Is there upstream documentation on this change, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
 
<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A (not a System Wide Change)
 
== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this change, indicate them here.  A link to upstream documentation will often satisfy this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release.
 
Release Notes are not required for initial draft of the Change Proposal but has to be completed by the Change Freeze.
-->
 
[[Category:ChangeAcceptedF24]]
<!-- When your change proposal page is completed and ready for review and announcement -->
<!-- remove Category:ChangePageIncomplete and change it to Category:ChangeReadyForWrangler -->
<!-- The Wrangler announces the Change to the devel-announce list and changes the category to Category:ChangeAnnounced (no action required) -->
<!-- After review, the Wrangler will move your page to Category:ChangeReadyForFesco... if it still needs more work it will move back to Category:ChangePageIncomplete-->
 
<!-- Select proper category, default is Self Contained Change -->
[[Category:SelfContainedChange]]
<!-- [[Category:SystemWideChange]] -->
