From Fedora Project Wiki
mNo edit summary
Line 9: Line 9:
== Requirements for writing a Content Generator ==
== Requirements for writing a Content Generator ==


There are certain requirements a Content Generator must meet before it will be authorized to add content to Koji. These requirements should be factored into the initial design and implementation of the CG from the start. Please note that while we're confident that the list below are real requirements, we are not sure this is a complete list.  We have not fully integrated a CG with Koji yet as of this writing.
From an implementation perspective, content generators have wide latitude in how they perform
builds. To ensure sanity in the build process, we strongly recommend that administrators of
Koji systems set policies about what content generators are allowed to do, and make sure that
those policies are followed before the content generator is granted authorization in their
Koji system.
 
Below are some examples of the sorts of policies that one might require. Content Generators
should be designed and implemented with the requirements in mind. Please note that the list
below is not complete.  


=== Avoid Using the Host's Software ===
=== Avoid Using the Host's Software ===


During the building process, the code should avoid using the host's installed software. The more reliance on installed software, the more risk in the future that changes (such as upgrading a builder) will break the build processes. Use mock chroots, VM guests, or containers wherever possible to insulate against changes. Isolating the build environment from the host environment makes reproducing work much easier and predictable.
During the building process, the code should avoid using the host's installed software. The more reliance on installed software, the more risk in the future that changes (such as upgrading a builder) will break the build processes. Use mock chroots, VM guests, or containers wherever possible to insulate against changes. Isolating the build environment from the host environment makes reproducing work much easier and predictable.
=== Source of build environment content ===
The build environment must come from somewhere. In a standard Koji build, it comes from content
already in Koji, or from configured external repositories.
CG authors will likely want to pull content from sources outside of Koji. Koji administrators
should set a clear policy about which sources are acceptable. The use of arbitrary sources
can make it difficult or impossible to reproduce build environments.


=== Binaries (or other compiled content) from Upstream May Not become included in output ===
=== Binaries (or other compiled content) from Upstream May Not become included in output ===


Content Generators may pull content from rubygems.org, Nexus, or some other external repository to provide build tools. Content downloaded in this fashion may not be included in CG build output, and may not be imported into Koji. In other words, output must be built from sources in the CG or Koji, not retrieved from the internet. Tools necessary to build product content can be downloaded and cached in the CG.
If tools or other content downloaded from external sources are used in the build, they may
not be included in CG build output, and may not be imported into Koji. In other words, output
must be built from sources in the CG or Koji, not retrieved from the internet. Tools necessary
to build product content can be downloaded and cached in the CG.


=== Log all Transformations of Content ===
=== Log all Transformations of Content ===

Revision as of 22:46, 18 June 2015

Koji Content Generators

A Koji Content Generator is an external service that generates content (jars, zips, tarballs, .npm, .wheel, .gem, etc) which is then passed to Koji for management and delivery to other processes in the release workflow. Content Generators can evolve independently of the Koji codebase, enabling the build process to be more agile and flexible to changing requirements and new technologies, while allowing Koji to provide stable APIs and interfaces to other processes.

Along with the content to be managed by Koji, a Content Generator will provide enough metadata to enable a reasonable level of auditing and reproduceability. The exact data provided and the format used is being discussed, but will include information like the upstream source URL, build tools used, build environment contents, and any container/virtualization technologies used.

The intention is that a team dedicated to managing a specific content type will design and maintain their own Content Generator, in coordination with the Koji developers. Once the Content Generator is ready for production use it will be given permission to import content and metadata it produces into Koji. Policies on the Koji hub will validate imported content and metadata and ensure that it is complete and consistent.

Requirements for writing a Content Generator

From an implementation perspective, content generators have wide latitude in how they perform builds. To ensure sanity in the build process, we strongly recommend that administrators of Koji systems set policies about what content generators are allowed to do, and make sure that those policies are followed before the content generator is granted authorization in their Koji system.

Below are some examples of the sorts of policies that one might require. Content Generators should be designed and implemented with the requirements in mind. Please note that the list below is not complete.

Avoid Using the Host's Software

During the building process, the code should avoid using the host's installed software. The more reliance on installed software, the more risk in the future that changes (such as upgrading a builder) will break the build processes. Use mock chroots, VM guests, or containers wherever possible to insulate against changes. Isolating the build environment from the host environment makes reproducing work much easier and predictable.

Source of build environment content

The build environment must come from somewhere. In a standard Koji build, it comes from content already in Koji, or from configured external repositories.

CG authors will likely want to pull content from sources outside of Koji. Koji administrators should set a clear policy about which sources are acceptable. The use of arbitrary sources can make it difficult or impossible to reproduce build environments.

Binaries (or other compiled content) from Upstream May Not become included in output

If tools or other content downloaded from external sources are used in the build, they may not be included in CG build output, and may not be imported into Koji. In other words, output must be built from sources in the CG or Koji, not retrieved from the internet. Tools necessary to build product content can be downloaded and cached in the CG.

Log all Transformations of Content

When the content is building, as much should be logged as possible. In addition to compilation, if the content goes through other transformations, perhaps changing formats, that should be logged as well. There can be no black-box transformations of the output. Imagine having to figure out how a piece of content was built 5 years into the future to understand the motivation behind this requirement. Details of the build environment and tools used in the environment should be recorded too.

Preserve All Inputs

All inputs to a build task should be preserved either as logs, a database, or as output of the build itself.

Preserve All Outputs

Naturally the outputs of a build should be preserved too. Transient artifacts are not strictly required, but if they're not onerous to maintain, they should be included. It must not be necessary to further transform the content to make it usable.

Do Not Use Caching Mechanisms

Content Generators must build without caching mechanisms (in compilers or yum) wherever possible. Caches make reproducing results in the future more difficult, and also introduce layers of indirection that can make debugging a build more difficult. Consider the risk of re-shipping a security flaw that is compiled in because an outdated library was cached in the Content Generator, this is why we have this requirement.

Metadata

Metadata will be provided by the Content Generator as a JSON file. There is a proposal of the Content Generator Metadata format available for review.