From Fedora Project Wiki

Revision as of 19:12, 5 June 2015

Koji Content Generators

A Koji Content Generator is an external service that generates content (jars, zips, tarballs, .npm, .wheel, .gem, etc.) which is then passed to Koji for management and delivery to other processes in the release workflow. Content Generators can evolve independently of the Koji codebase, so the build process can adapt more readily to changing requirements and new technologies while Koji continues to provide stable APIs and interfaces to other processes.

Along with the content to be managed by Koji, a Content Generator will provide enough metadata to enable a reasonable level of auditing and reproducibility. The exact data provided and the format used are still being discussed, but will include information like the upstream source URL, build tools used, build environment contents, and any container/virtualization technologies used.

The intention is that a team dedicated to managing a specific content type will design and maintain their own Content Generator, in coordination with the Koji developers. Once the Content Generator is ready for production use it will be given permission to import content and metadata it produces into Koji. Policies on the Koji hub will validate imported content and metadata and ensure that it is complete and consistent.

Requirements for writing a Content Generator

There are certain requirements a Content Generator must meet before it will be authorized to add content to Koji. These requirements should be factored into the design and implementation of the CG from the start. Please note that while we're confident that the items listed below are real requirements, we are not sure the list is complete. As of this writing, we have not yet fully integrated a CG with Koji.

Avoid Using the Host's Software

During the build process, the code should avoid using the host's installed software. The more a build relies on installed software, the greater the risk that future changes (such as upgrading a builder) will break the build process. Use mock chroots, VM guests, or containers wherever possible to insulate against changes. Isolating the build environment from the host environment makes reproducing work much easier and more predictable.

Binaries (or Other Compiled Content) from Upstream May Not Be Included in Output

Content Generators may pull content from rubygems.org, Nexus, or some other external repository to provide build tools. Content downloaded this way must not be included in CG build output and must not be imported into Koji. In other words, output must be built from sources in the CG or Koji, not retrieved from the internet. Tools necessary to build product content can be downloaded and cached in the CG.

Log All Transformations of Content

While content is being built, as much as possible should be logged. In addition to compilation, any other transformation the content goes through, such as a change of format, should be logged as well. There can be no black-box transformations of the output. To understand the motivation behind this requirement, imagine having to figure out how a piece of content was built five years from now. Details of the build environment and the tools used in it should be recorded too.
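As an illustration of the logging requirement, the following is a minimal Python sketch, not part of Koji or of any existing Content Generator. It wraps each build or transformation step so that the command, its exit status, and its output all reach the log. The helper name `run_logged`, the logger name `cg-build`, and the example step are assumptions made for this sketch.

```python
import logging
import subprocess
import sys


def run_logged(step_name, command, log):
    """Run one build step and record its command, exit status, and output.

    `run_logged` is a hypothetical helper for illustration only.
    """
    log.info("step=%s command=%r", step_name, command)
    result = subprocess.run(command, capture_output=True, text=True)
    log.info("step=%s returncode=%d", step_name, result.returncode)
    for line in result.stdout.splitlines():
        log.info("step=%s stdout: %s", step_name, line)
    for line in result.stderr.splitlines():
        log.info("step=%s stderr: %s", step_name, line)
    # Fail loudly after logging, so a broken step is never silently skipped.
    result.check_returncode()
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    log = logging.getLogger("cg-build")
    # Record a detail of the build environment alongside the steps themselves.
    log.info("python=%s", sys.version.split()[0])
    # Stand-in for a real transformation step (assumed example).
    run_logged("example", [sys.executable, "-c", "print('converted')"], log)
```

Routing every step through one wrapper like this means no transformation can run without leaving a record, which is the property the requirement asks for.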

Preserve All Inputs

All inputs to a build task should be preserved, whether in logs, in a database, or as output of the build itself.
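One possible way to keep a record of build inputs is a checksum manifest stored next to the build logs. The Python sketch below is an assumption about how a Content Generator might do this, not a format Koji defines. The function names and the manifest layout (`path`, `size`, `sha256`) are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_input_manifest(input_paths, manifest_path):
    """Write a JSON manifest describing every input file of a build.

    The manifest layout is a hypothetical example, not a Koji format.
    """
    entries = [
        {"path": p, "size": Path(p).stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(str(x) for x in input_paths)
    ]
    Path(manifest_path).write_text(json.dumps(entries, indent=2) + "\n")
    return entries
```

A manifest like this does not replace preserving the inputs themselves, but it lets a later audit confirm that the preserved files are the ones the build actually consumed.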

Preserve All Outputs

Naturally the outputs of a build should be preserved too. Transient artifacts are not strictly required, but if they're not onerous to maintain, they should be included. It must not be necessary to further transform the content to make it usable.

Do Not Use Caching Mechanisms

Content Generators must build without caching mechanisms (in compilers or yum) wherever possible. Caches make results harder to reproduce in the future and introduce layers of indirection that can make debugging a build more difficult. This requirement exists because of risks such as re-shipping a security flaw that was compiled in from an outdated library cached in the Content Generator.

Metadata

Metadata will be provided by the Content Generator as a JSON file. A proposed Content Generator Metadata format is available for review.
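Because the metadata format is still a proposal, the following Python sketch only shows the general idea of assembling such a JSON file from the kinds of information named earlier on this page: upstream source URL, build tools, build environment contents, and isolation technology. Every field name, the helper names, and the example values are assumptions made for this sketch and are not taken from the proposed format.

```python
import json


# Hypothetical field names; the real format is defined by the proposal.
REQUIRED_FIELDS = ("source_url", "build_tools", "environment", "isolation")


def build_metadata(source_url, build_tools, environment_packages, isolation):
    """Collect build provenance into a plain dict ready for JSON output."""
    return {
        "source_url": source_url,
        "build_tools": sorted(build_tools),
        "environment": sorted(environment_packages),
        "isolation": isolation,
    }


def missing_fields(metadata):
    """Return the names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


if __name__ == "__main__":
    # All values below are placeholders, not real build data.
    metadata = build_metadata(
        source_url="https://example.org/src/foo-1.0.tar.gz",
        build_tools=["maven-3.2.5"],
        environment_packages=["java-1.8.0-openjdk", "bash"],
        isolation="mock chroot",
    )
    assert missing_fields(metadata) == []
    print(json.dumps(metadata, indent=2, sort_keys=True))
```

A completeness check of this kind on the Content Generator side mirrors, in a small way, the validation that policies on the Koji hub are expected to perform on import.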