Dissent / Discussion
This content used to appear at the bottom of the SecondaryArchitectures draft. I've moved it to its own page.
(Placed separately, rather than editing the proposal directly to reflect my opinions. dwmw2)
I propose the following changes to the SecondaryArchitectures proposal:
Package maintainers should have access to a build/test machine of each architecture
Failures on an architecture where a package used to build successfully are usually generic problems. But if the maintainer can't log in to a machine of the architecture where the bug happens to bite today, they have no way of debugging it. Packagers should have suitable access, probably through their fedoraproject account. They wouldn't be forced to use it, but it should be available to them.
The ArchTeam for each architecture must run regular 'rebuild all in mock' runs
By doing this, they'll be more likely to catch compiler bugs and other arch-specific issues early, before those issues ever bother 'generic' package maintainers. Such runs would also have caught, for example, the fallout from switching to a 128-bit 'long double' on PowerPC, and the fallout from switching to 64KiB pages. Both of those were self-inflicted, arch-specific issues where 'IBM made us break it'.
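To illustrate the 64KiB-page fallout: code that hard-codes a 4KiB page size works on x86 but computes misaligned offsets on a kernel built with 64KiB pages. A minimal Python sketch, with hypothetical helper names, contrasting the hard-coded constant with the real page size reported by `mmap.PAGESIZE`:

```python
import mmap

# Non-portable: silently wrong on kernels configured with 64KiB pages.
ASSUMED_PAGE_SIZE = 4096


def round_up_to_page(length: int, page_size: int = mmap.PAGESIZE) -> int:
    """Round length up to a whole number of pages.

    page_size must be a power of two. By default it is the real page size
    of the running system rather than an assumed constant.
    """
    return (length + page_size - 1) & ~(page_size - 1)


def is_page_aligned(offset: int, page_size: int = mmap.PAGESIZE) -> bool:
    """True if offset is a multiple of page_size."""
    return offset % page_size == 0


if __name__ == "__main__":
    print("running page size:", mmap.PAGESIZE)
    # An offset rounded using the 4KiB assumption...
    offset = round_up_to_page(1, ASSUMED_PAGE_SIZE)
    # ...is not page-aligned on a system with 64KiB pages.
    print("aligned on 64KiB pages?", is_page_aligned(offset, 65536))
```

The arithmetic is identical either way; the fix is only in where the page size comes from.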
It should be explicit that Fedora packages are expected to be 'reasonably portable'
Although each ArchTeam would have ultimate responsibility for ensuring that packages build and run correctly on its own architecture, Fedora packages shouldn't be gratuitously unportable either. In general, they should be free of stupid errors such as endianness assumptions, unconditional inline assembly, or assumptions that 'char' is signed. To catch such errors early, it may make sense for the 'Primary' architectures to include something 32-bit, something 64-bit, something big-endian, something little-endian, something with 'signed' char by default and something with 'unsigned' char by default.
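Endianness assumptions are not limited to C. As a minimal Python sketch, assuming a hypothetical binary header that its file format defines as big-endian, compare reading it with an explicit byte order against reading it in host byte order. The sketch covers only the endianness case, not char signedness or inline assembly:

```python
import struct
import sys

# Hypothetical on-disk header: 32-bit magic, 16-bit version, 16-bit flags,
# defined by the file format to be stored big-endian.
HEADER_FORMAT = ">IHH"  # '>' = big-endian, standard sizes, no padding


def parse_header(data: bytes) -> tuple:
    """Portable: the byte order comes from the file format, not the host."""
    return struct.unpack_from(HEADER_FORMAT, data)


def parse_header_native(data: bytes) -> tuple:
    """Buggy: '=' means host byte order, so this only matches the format on
    big-endian hosts and returns byte-swapped values on little-endian ones."""
    return struct.unpack_from("=IHH", data)


if __name__ == "__main__":
    data = bytes.fromhex("cafebabe00010002")
    magic, version, flags = parse_header(data)
    print(hex(magic), version, flags)  # same on every architecture
    print("native parse matches:", parse_header_native(data) == parse_header(data),
          "on a", sys.byteorder, "endian host")
```

Which host hides the bug depends on the developer's own machine; building on both a big-endian and a little-endian architecture exposes it either way, which is the point of the coverage suggested above.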
Builds on all architectures should be synchronous
After a failure on one architecture, however, the build would still run to completion on all the others. If the maintainer decides that the failure is not a generic problem, he/she would be able to push a button in koji to 'ship it anyway' on the architectures where it succeeded.
To revisit the rationale:
- There are people who are very motivated to produce 'ports' of Fedora to less common architectures such as SPARC, Alpha, ARM, etc. They currently build packages for themselves, out of sync with the Fedora build system. This is very painful and we would like to make things easier for them.
- We would like to do this without placing a higher burden on existing package maintainers.
Obviously, there is a trade-off here. It is perfectly evident that people are already capable of building Fedora for other architectures even if we do absolutely nothing to assist them. But there are some things we could do to help them, which wouldn't cost us much -- and we would like to do that. Although the ArchTeam would be responsible for actually making things build on any specific architecture, Fedora packages should be reasonably portable -- no stupid endianness bugs, unconditional inline assembly, etc. Hence the recommendation that our 'Primary' architectures have reasonable coverage of such differences.
One of the main difficulties of building a Fedora port entirely externally is that you are decoupled from the build system. Packages are built in a certain order for Fedora, and your port might build them in a different order, causing subtle problems and differences between your port and the official builds. Spot's observation on the SecondaryArchitectures draft is entirely correct: as far as possible, the secondary architectures will want to keep their repositories entirely in sync with Fedora's, never allowing packages to be missing or out of date. In fact, if they cannot do this, there seems little benefit in the whole 'Secondary Architectures' proposal; they might as well keep building everything on their own and feed patches back as needed.
With the current proposal, not only will secondary architectures easily get out of sync whenever a generic bug bites there, but the slower ones will also suffer when a packager builds one package, waits for it to complete on the primary architectures, and then builds a second package which depends on the first. On a slower secondary architecture where the first package hasn't finished building yet, the second package will either fail to build or silently build against the wrong version of the first package.
Experience with the FE-ExcludeArch-ppc tracker has shown me that once a given package is building on a certain architecture, subsequent build failures are often not actually arch-specific. They are often indicative of generic bugs, which just happen to bite in some situations rather than others. To allow partially-failed builds to hit the repository automatically is, in my opinion, a very bad idea. Package maintainers should be expected to at least glance at the failure and make a decision about it, before the package hits the repositories.
On architectures where GCC is less well maintained, the other common class of build failures for existing packages is compiler bugs. To reduce their impact, I propose that it should be mandatory for Secondary Architectures to run the periodic 'complete rebuild in mock' test runs, as Matt Domsch has been doing for x86 and x86_64. That way, the ArchTeam would be more likely to notice new compiler bugs before 'generic' package maintainers do.
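Much of the value of a periodic mass rebuild comes from comparing one run with the previous one. A minimal sketch of that triage step, assuming (hypothetically) that each run's results are recorded as a mapping from package name to whether it built: packages that built last time but fail now are the ones to look at first, since that is where a newly introduced compiler bug would show up.

```python
from typing import Dict, List


def new_failures(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Packages that built in the previous rebuild but fail in the current one.

    Packages absent from the previous run, or which already failed there,
    are not reported: they are not regressions.
    """
    return sorted(
        pkg for pkg, ok in current.items()
        if not ok and previous.get(pkg, False)
    )
```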
I propose that Secondary Architecture builds should be synchronous. Builds for the faster architectures could still be made available directly from the builder as soon as they complete, and getting a package out to the mirrors takes much longer anyway; that's hardly a fast path. If the build fails on some architecture and the packager decides she doesn't care, it's best if she doesn't have to resubmit the build. It should run to completion on the builders even after a partial failure, and the packager should be able to hit a 'ship it anyway' button in koji to let it through to the repositories, after filing the required ExcludeArch bug, of course. There could be automated assistance with filing such a bug, but it shouldn't happen entirely automatically, because it will usually need input from the package maintainer outlining the problem and why she believes it to be arch-specific. In particular, the package should never be released automatically if it has failed on any architecture on which it used to build.
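The release rule proposed here can be sketched as a small decision function. This is a minimal sketch, not koji's actual data model: the `Build`/`ArchBuild` types, the `ship_anyway` flag standing in for the button, and the `excludearch_bug` field are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Result(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class ArchBuild:
    arch: str
    result: Result = Result.RUNNING
    # Hypothetical: number of the ExcludeArch bug the maintainer filed, if any.
    excludearch_bug: Optional[int] = None


@dataclass
class Build:
    nvr: str
    arches: List[ArchBuild] = field(default_factory=list)
    # Hypothetical flag set by the 'ship it anyway' button.
    ship_anyway: bool = False


def may_release(build: Build) -> bool:
    """Decide whether a build may go to the repositories at all."""
    if any(a.result is Result.RUNNING for a in build.arches):
        return False  # synchronous: wait until every architecture has finished
    failures = [a for a in build.arches if a.result is Result.FAILED]
    if not failures:
        return True  # clean on every architecture: release automatically
    if not build.ship_anyway:
        return False  # partial failure: never released without the maintainer
    # 'Ship it anyway' still requires an ExcludeArch bug for each failed arch.
    return all(a.excludearch_bug is not None for a in failures)


def arches_to_ship(build: Build) -> List[str]:
    """Architectures whose packages go out if the build is released."""
    if not may_release(build):
        return []
    return [a.arch for a in build.arches if a.result is Result.SUCCEEDED]
```

The key design choice is that a partial failure only ever blocks; nothing short of an explicit maintainer action, backed by a filed bug, turns it into a release.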
The build system has a chain-build feature which would reduce waiting time when a package maintainer wants to build a sequence of packages. This could be made to work relatively easily without 'committing' the results to the build system (or the public repository, of course) until the build has completed (or been excused) on all architectures.
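A minimal sketch of that idea, assuming a hypothetical `build(name, arch, buildroot)` callback in place of the real builders, and leaving out the 'excused' case: each architecture builds the chain in order against a private copy of its repository, so a dependent package always sees the version built earlier in the chain, and nothing is committed unless the whole chain succeeded everywhere. Architectures are iterated one after another here for simplicity; real builders would run them in parallel.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical builder: returns True if `name` built for `arch` against
# `buildroot` (a map of package name -> version visible to the build).
Builder = Callable[[str, str, Dict[str, str]], bool]


def chain_build(
    chain: List[Tuple[str, str]],
    arches: List[str],
    repo: Dict[str, Dict[str, str]],
    build: Builder,
) -> bool:
    """Build the (name, version) pairs in `chain`, in order, on every arch.

    `repo` maps arch -> committed repository contents; it is only updated
    if every package built on every architecture.
    """
    staged: Dict[str, Dict[str, str]] = {}
    for arch in arches:
        # Private, uncommitted view of this architecture's repository.
        buildroot = dict(repo[arch])
        for name, version in chain:
            if not build(name, arch, buildroot):
                return False  # nothing has been committed anywhere
            # Later packages in the chain build against this new version,
            # even on an architecture that is slower than the others.
            buildroot[name] = version
        staged[arch] = buildroot
    # The whole chain succeeded everywhere: commit all architectures together.
    for arch, contents in staged.items():
        repo[arch] = contents
    return True
```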
Advantages of synchronous builds:
- Genuine bugs are often found by failures on one architecture but not others. We catch more bugs before shipping packages.
- Secondary architecture repositories don't get out of date so easily.
- We help set reasonable expectations: that package maintainers look after the portability of their own packages and at least look at failures, although we don't force them to.
Advantages of the original proposal:
- Builds make it into the repository faster (but not really, because there's still a mirror sync to wait for).
- Maintainers are not expected to check on build failures (even though they're likely to be generic bugs anyway).
- Maintainers can build packages in sequence more quickly without needing to use the new chain-build features (although this completely screws over Secondary Architectures which haven't finished building the first package yet).
- Maintainers know sooner that their build is actually finished and can relax (even though it may have a generic bug which shows up on a Secondary Architecture).