- 1 Redefinition of what constitutes a secondary/alternate architecture in Fedora
- 2 Questions and Answers:
- 3 Q: How does this deal with the promotion of an Edition between "primary" and "secondary"?
- 4 Q: Why do I have to worry about s390x/powerpc/aarch64 when I didn't before?
- 5 Q: I don't have access to hardware to debug issues on those architectures?
- 6 Q: I can just use ExcludeArch if things fail, right?
- 7 Q: Will my builds be slower?
- 8 Q: When will the new ARMv7 builders be in place?
- 9 Q: Will a single arch failure affect the overall build failure?
- 10 Q: How does this affect the kernel?
- 11 Q: koji build failure debugs?
- 12 Q: What is the cost/budget required for this?
Redefinition of what constitutes a secondary/alternate architecture in Fedora
With Fedora 20 we promoted our first new architecture ever, and in Fedora 21 we entirely redefined what was delivered with the introduction of the Workstation/Server/Cloud Editions. Since then we've "demoted" certain components of i686 without any real defined process for how to do so, yet there are still a lot of reasons to keep i686 around as an almost first-class citizen (it is currently at least 20% of the installed userbase). For the last few releases there have also been a lot of queries and requests about promoting components of AArch64, such as Server and Docker. For those two there's currently no cut-and-dried means of doing this because, unlike i686, the builds aren't in the main koji instance where they could easily be consumed for the Server edition or a Cloud component such as Docker. As the Fedora ARM lead there's no way I'd currently recommend an AArch64 Workstation edition, but in the context of Server/Docker an AArch64 promotion makes a lot of sense. Similarly, it has made a lot of sense for a couple of cycles to be able to demote some parts of i686 while still keeping it around and consuming other parts as "multilib" to support some applications in the x86_64 world.
There is currently no easy means of promoting or demoting a "Fedora artefact deliverable" between primary and secondary within an architecture, because our current definition of what constitutes a primary/secondary architecture revolves around the koji build system. While that might have been suitable in the "Everything as one" world before modern Fedora Editions, it now means we need to redefine "primary" and "secondary" architectures in terms of artefact deliverables rather than koji instances, and to define how promotion/demotion works.
It's clear we need to redefine what constitutes a secondary/alternate architecture and how we deal with architectures and artefacts of them as a project. So how best do we redefine alternate architectures and make it easier to promote artefacts?
The proposal here is to remove koji as the definition of what constitutes a primary and secondary release.
With the changes to i686 we have, for all intents and purposes, already done this. It has been the case for some time that any major toolchain issue with a non-x86 "secondary" release has impacted primary, so "primary" is not isolated from the issues of other alternate architectures as some believe. This would eventually result in all architectures running in the same instance of koji (like the Red Hat internal koji instance "brew"), with the distinction of what makes a "primary" or "secondary"/"alternate" release of Fedora decided at compose time and the release artefacts delivered according to their status.
The proposal would be to initially import the AArch64 builds in the Fedora 26 cycle and complete the transition with Power64 and s390x in the Fedora 27 cycle. In both cases this would require a mass rebuild of all packages to be guaranteed for the cycle. This has already been requested by the toolchain team for the Fedora 26 cycle.
Questions and Answers:
Q: How does this deal with the promotion of an Edition between "primary" and "secondary"?
A: It doesn't. This proposal is about decoupling the architecture, and the location where the build artifacts reside (the koji instance), from whether a certain Edition, artifact or any other output of a compose is considered a primary or secondary artifact of a release.
Q: Why do I have to worry about s390x/powerpc/aarch64 when I didn't before?
A: Packagers already have to deal with aarch64/Power64/s390x. They get bug reports when packages FTBFS or there are other issues. The only change here is that rather than dealing with it after the initial primary build, they'll need to deal with it then and there. The secondary arch teams will be available to assist as before; in fact they'll have more time to assist, as they won't be dealing with the "tail chasing" that is koji-shadow and its associated processes.
Q: I don't have access to hardware to debug issues on those architectures?
A: For all of aarch64/Power64/s390x there are means to get access to the hardware to fix issues, and there are the secondary arch teams that can assist in the fixing process. In the case where the package just doesn't work on that arch, or there is no major need for it there, there is already the ExcludeArch/ExclusiveArch option, which for the vast majority of packages that fit this description has already been put in place by the secondary arch teams.
In the vast majority of cases this will be no more work, and packagers will never have an issue with these other architectures. I would say around 98% of packagers will barely notice. As for noarch packages, there are 9733 of them out of 18154, so over 50% of the source packages in the distro are pure noarch, and they already deal with ppc64/ppc64le builders due to EPEL.
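As a quick check of the noarch figure quoted above, the share can be worked out directly from the two package counts. This is only an arithmetic sketch; the function and variable names are made up for illustration.

```python
# Package counts as quoted in the answer above.
NOARCH_SOURCE_PACKAGES = 9733
TOTAL_SOURCE_PACKAGES = 18154


def noarch_share(noarch: int, total: int) -> float:
    """Return the noarch share of source packages as a percentage."""
    return 100.0 * noarch / total


share = noarch_share(NOARCH_SOURCE_PACKAGES, TOTAL_SOURCE_PACKAGES)
arch_specific = TOTAL_SOURCE_PACKAGES - NOARCH_SOURCE_PACKAGES

print(f"noarch share: {share:.1f}%")                       # 53.6%
print(f"arch-specific source packages: {arch_specific}")   # 8421
```

So a little over half of the source packages never produce arch-specific binaries at all, which is consistent with the "over 50%" statement.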
Q: I can just use ExcludeArch if things fail, right?
A: No. The policy here doesn't change. There are already procedures to deal with these architecture-dependent failures for all the current non-x86 architectures.
Q: Will my builds be slower?
A: No. The slowest architecture (s390 31-bit) was retired for Fedora 24. The PowerPC builders, which were previously the slowest in the primary koji instance (used for EPEL arch builds and any noarch), have been replaced with POWER8 hardware, resolving the PPC/noarch issues, and we have new 64-bit ARM hardware soon to go into production to provide new ARMv7 virtual builders of much higher spec than is currently in production. It's likely that overall your builds will actually be faster in the near future.
Q: When will the new ARMv7 builders be in place?
A: Soon! The current plan is mid to late July. This proposal isn't impacted by this as ARMv7 is already a primary architecture.
Q: Will a single arch failure affect the overall build failure?
A: Yes. An architecture failure will always cause a build failure and be dealt with, as it is now. In the case of x86_64/i686/ARMv7 it's instant; in the case of AArch64/Power64/s390x it's currently slightly delayed. Any toolchain or other issue in the current secondary architecture set already affects the primary builds, as was seen with a toolchain issue in the F-21/22 cycle.
The fact is, though, that the packages that are ever affected by arch-specific build issues (and I'm taking this from all non-x86 arches) are a small fraction of the 18K packages we ship.
Of those that are ever affected, the vast majority are maintained by RH people who are paid to care about them across all arches and who deal with the issues already (gcc toolchain stack, glibc, python, golang, etc.), and the vast majority of those teams want all the arch builds to happen as a single build they can deal with.
We have an enhancement for koji planned (it should be relatively minor) where all arch builds will run to completion (whether pass or fail) rather than cancelling all the rest when one fails. This enables quicker debugging of which arches have issues (one, all, etc.) and comparison between them. This has been requested internally for brew for some time.
The issue with not failing all builds when a single arch fails is how we deal with any builds that are dependent on that package.
For example: a new major version of library X lands with a soname bump, and one arch fails. How do we deal with the soname bump across all the arches? If a dependent package then tries to build, it will either get disparate sonames depending on the arch, or a missing library on the arch where the build failed. Either way it currently ends in a big mess. How do you suggest we deal with that? It's actually less problematic to fail them all and ensure there are arch people there to help out. It doesn't currently cause us a big issue with the 3 arches already in place (or even taking into account secondary arches), and for 95%+ of the packages I doubt it'll ever cause any issue.
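The soname argument above can be made concrete with a small model. This is only an illustrative sketch: the library name `libX`, the sonames and the `keep_partial` switch are hypothetical and do not correspond to any koji option; the model also assumes that a failed arch simply keeps the old library.

```python
from typing import Dict


def soname_view(old: str, new: str, build_ok: Dict[str, bool],
                keep_partial: bool) -> Dict[str, str]:
    """Return the soname a dependent package would see on each arch.

    keep_partial=False models "fail them all": if any arch fails, no arch
    gets the new build. keep_partial=True models keeping whatever built.
    """
    if not keep_partial and not all(build_ok.values()):
        return {arch: old for arch in build_ok}
    return {arch: (new if ok else old) for arch, ok in build_ok.items()}


def is_consistent(view: Dict[str, str]) -> bool:
    """True when every arch exposes the same soname."""
    return len(set(view.values())) == 1


# Hypothetical outcome: the bump builds everywhere except one arch.
results = {"x86_64": True, "i686": True, "armv7hl": True, "aarch64": False}

partial = soname_view("libX.so.1", "libX.so.2", results, keep_partial=True)
fail_all = soname_view("libX.so.1", "libX.so.2", results, keep_partial=False)

print(is_consistent(partial))   # False: dependents see different sonames
print(is_consistent(fail_all))  # True: every arch still has libX.so.1
```

Under these assumptions, keeping partial results leaves dependent packages facing a different soname on the failed arch, while failing the whole build keeps the repository consistent until the arch issue is fixed.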
Basically, from experience with all the architectures, all packages fall into one of 5 basic categories:
1) No issues ever: probably 97+% of the 18K source packages.
2) Toolchains and core bits (gcc/glibc/python/golang etc.): Red Hat already pays people to care about all these core toolchains across all architectures no matter where the package resides. Currently they have to build/test etc. in 4 different koji instances. This change will be a reduction in work for these people and enable the same process internally and externally.
3) kernel/grub2/shim: we already deal with this in a similar manner now as we would with it merged. The Fedora kernel maintainers only actively care about x86_64; i686 is a token effort, with the maintainers actively seeking others to assume maintenance; ARMv7/AArch64/Power64/s390x are actively maintained by others.
4) Arch-specific packages for HW enablement, or packages whose toolchain only supports some arches: the ExcludeArch/ExclusiveArch rpm options are already configured in the vast majority of cases. There might be a few that need minor adjustments, but this is already in process as part of the work of the secondary architecture teams.
5) A handful of packages that occasionally cause issues (the only ones that actually come to mind at the moment are firefox/xulrunner/thunderbird and friends, and possibly libreoffice): this is where we need to put process in place, and we'll have more resources to assist with these issues, as the people who would do this will no longer be doing mindless "shadow" work.
Q: How does this affect the kernel?
A: It doesn't. All architectures are already supported in the current kernel src.rpm. Even on primary architectures, the kernels for i686 and ARMv7 aren't actively maintained by the kernel leads. The AArch64/Power64/s390x kernel issues are already dealt with by other teams as well.
Q: koji build failure debugs?
A: We're working on an enhancement to koji to enable all builds to run to completion if a single architecture fails. The plan is for all sub tasks to run to completion no matter if one sub task fails. This will let a maintainer see whether a build failure is specific to one architecture or is a general failure across all architectures, which aids quicker debugging and turnaround. The primary task will still fail on any sub task failure.
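The intended semantics can be sketched as follows. This is a simplified, sequential model and not koji code: `run_build`, `BuildResult` and the `run_to_completion` flag are hypothetical names, and real sub tasks would be scheduled on builders rather than run in a loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class BuildResult:
    parent_ok: bool
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def run_build(arches: Iterable[str],
              build_one: Callable[[str], bool],
              run_to_completion: bool = True) -> BuildResult:
    """Run one sub task per arch; the parent fails if any sub task fails.

    run_to_completion=False models cancelling the remaining sub tasks
    on the first failure.
    """
    passed: List[str] = []
    failed: List[str] = []
    for arch in arches:
        try:
            ok = build_one(arch)
        except Exception:
            ok = False  # a crashed sub task counts as a failure
        if ok:
            passed.append(arch)
        else:
            failed.append(arch)
            if not run_to_completion:
                break  # remaining arches are never attempted
    return BuildResult(parent_ok=not failed, passed=passed, failed=failed)


# Hypothetical build where only aarch64 fails.
arches = ["x86_64", "aarch64", "s390x"]
result = run_build(arches, lambda arch: arch != "aarch64")
print(result.parent_ok, result.passed, result.failed)
```

With `run_to_completion=True` the maintainer sees the outcome for every arch and can tell an arch-specific failure from a general one; in both modes `parent_ok` is false as soon as any sub task fails.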
Q: What is the cost/budget required for this?
A: No material change in cost is expected. Initially there may be a slight increase in storage usage on the primary koji instance due to the new architectures, but this will be offset by the decommissioning of the arm/ppc/s390 koji instances, where a lot of files are already duplicated (src/noarch rpms). There will be no need for new builders, as the existing infrastructure will be reused and there's already enough capacity for the transition.