Features/NoarchSubpackages

= Support Noarch Sub Packages in Fedora =

Summary
RPM already supports noarch sub packages, but the Fedora infrastructure does not yet. This feature provides the technical means to build noarch sub packages and helps packagers use them across the distribution.

Owner

 * Name: Florian Festi (ffesti@redhat.com)
 * Name: Jindrich Novy (jnovy@redhat.com)

Current status

 * Targeted release: Fedora 12
 * Last updated: --Ffesti 21:37, 3 March 2009 (UTC)
 * Percentage of completion: 50%

Detailed Description
There are several steps needed:


 * Support in rpm (100%)
 * Support in koji (100%) - Special thanks to Mike Bonnet and Dennis Gilmore for taking care!
   * see Ticket
 * Fix tool chain
   * RHBZ#487591
 * Get a list of possible candidates (sub packages) (100%)
 * Write a mail to f-d-l and package owners (100%)
 * Write best practice documentation (0%)
 * Get packaging policy adjusted (see /PolicyChanges) (10%)
 * Get the packages changed (see /PackagesChanged) (ongoing)

Benefit to Fedora
Noarch packages have several benefits over arch dependent packages:


 * They can be shared between different architectures and thus use up less disk space and bandwidth on both the Fedora infrastructure and our mirrors
 * They avoid double installation of data for multilib packages.
 * They tell the user that the content of the package is arch independent.

By increasing the use of noarch packages we also increase the effect of these benefits.

Additionally we can get rid of some hacks that are used to generate noarch sub packages for very few packages right now.

Scope
A small statistic on Fedora rawhide x86_64 (2009-06-15) to give an idea how many packages/files/bytes could be affected:

All files were put into one of the following categories:
 * bin32: 32 bit binaries including libraries(!) (as known to rpm, file color==1)
 * bin64: 64 bit binaries including libraries (file color==2)
 * lib32: other files in /lib or /usr/lib
 * lib64: other files in /lib64 or /usr/lib64
 * noarch: everything else
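
The categorisation above can be sketched as a small function. This is only an illustration, not the script used to produce the statistics: the function name `classify` and its arguments are hypothetical, and it assumes the file color has already been read from the rpm header (0 meaning the file has no color).

```python
def classify(path, color):
    """Return the category of one packaged file.

    path  -- absolute path of the file inside the package
    color -- rpm file color (assumed: 1 = 32 bit binary, 2 = 64 bit binary,
             0 = no color), as described in the list above
    """
    if color == 1:
        return "bin32"
    if color == 2:
        return "bin64"
    # Not a binary: decide by location only.
    if path.startswith(("/lib64/", "/usr/lib64/")):
        return "lib64"
    if path.startswith(("/lib/", "/usr/lib/")):
        return "lib32"
    return "noarch"
```

The color is checked first so that a 64 bit library under /usr/lib64 counts as bin64 rather than lib64, matching "other files" in the lib32 and lib64 definitions.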

Sizes are (uncompressed) bytes in files and therefore map directly neither to package sizes nor to used disk space.

 * 17391 packages (49 GB in 2.2 M files)
   * 12 k bin32 files (2.4 GB)
   * 32 k bin64 files (7.3 GB)
   * 161 k lib32 files (1.9 GB)
   * 173 k lib64 files (6.0 GB)
   * 1.9 M noarch files (32 GB)
 * 9666 x86_64 packages (26 GB in 1.0 M files)
   * 32 k bin64 files (7.3 GB)
   * 24 k lib32 files (535 MB)
   * 173 k lib64 files (6.0 GB)
   * 834 k noarch files (12 GB)
 * 3484 i586 packages (5.6 GB in 291 k files)
   * 11 k bin32 files (2.3 GB)
   * 36 k lib32 files (662 MB)
   * 243 k noarch files (2.7 GB)
 * 4221 noarch packages (17 GB in 928 k files)
   * 101 k lib32 files (747 MB)
   * 64 lib64 files (492 kB)
   * 828 k noarch files (17 GB)

Test Plan

 * 1) Create one noarch subpackage by adding BuildArch: noarch to the subpackage section
 * 2) Scratch build the package to see whether there are any problems with koji
 * 3) Build package for rawhide - check that it correctly shows up in the repository and is shown as noarch package in the metadata
 * 4) See if the package installs correctly via yum
 * 5) Check if updating from an arch dependent previous version to the new noarch package works

package     | tested                | result
------------+-----------------------+-------
rpm-apidocs | i386 x86_64 -> noarch | PASSED

User Experience

 * Slightly better mirror performance due to smaller transfer sizes
 * Only packages containing binaries will be arch dependent

Dependencies

 * rpm >= 4.6.0 (has been in Fedora for months, counting release candidates)
 * the steps listed in the Detailed Description.

Contingency Plan

 * Move target to Fedora 12
 * As soon as the technical problems have been fixed, moving more sub packages to noarch can be a continuing process.

What's this all about?
Since version 4.6.0, RPM supports noarch subpackages: just add "BuildArch: noarch" to the subpackage section in the spec file.
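
To show where the tag goes, here is a minimal sketch that embeds a made-up spec file and reports which sub packages declare "BuildArch: noarch". The package `example`, its `doc` sub package and the helper `noarch_subpackages` are all hypothetical, and the parsing is deliberately naive: it does not expand macros or handle conditionals, so it is no substitute for rpm's own spec parser.

```python
import re

# Hypothetical spec file: an arch dependent main package with one
# noarch sub package holding the documentation.
SPEC = """\
Name: example
Version: 1.0
Release: 1
Summary: Hypothetical example package
License: MIT

%description
Arch dependent main package.

%package doc
Summary: Documentation for example
BuildArch: noarch

%description doc
Arch independent documentation.
"""

def noarch_subpackages(spec_text):
    """Return names of sub packages whose %package section sets BuildArch: noarch."""
    found = []
    current = None  # name of the %package section being read, if any
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%package"):
            # "%package doc" and "%package -n example-doc" both end in the name
            current = stripped.split()[-1]
        elif stripped.startswith("%"):
            current = None  # any other section ends the %package section
        elif current and re.match(r"(?i)buildarch\s*:\s*noarch\s*$", stripped):
            found.append(current)
    return found

print(noarch_subpackages(SPEC))  # prints ['doc']
```

Note that the tag sits in the sub package's own `%package` section; the main package has no BuildArch tag and stays arch dependent.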

Koji compares the noarch subpackages built on the different arches with rpmdiff, ignoring time stamps, sizes and md5 sums of files. If any other differences are found, the build is rejected. Even with those automatic checks in place, it is the responsibility of the packager to make sure that the package really is arch independent, just as for regular noarch packages.
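
The idea behind this check can be sketched as follows. This is not koji's or rpmdiff's actual code: the data layout (one dictionary of attributes per file) and the attribute names `mtime`, `size`, `md5` and `mode` are assumptions made for the example.

```python
# Attributes that may legitimately differ between builds on different arches.
IGNORED = {"mtime", "size", "md5"}

def differences(files_a, files_b, ignored=IGNORED):
    """Compare two {path: {attribute: value}} maps, skipping ignored attributes.

    Returns a sorted list of (path, attribute) pairs that differ. A path that
    exists in only one of the builds is reported with attribute None.
    """
    diffs = []
    for path in sorted(set(files_a) | set(files_b)):
        a, b = files_a.get(path), files_b.get(path)
        if a is None or b is None:
            diffs.append((path, None))
            continue
        for attr in sorted((set(a) | set(b)) - ignored):
            if a.get(attr) != b.get(attr):
                diffs.append((path, attr))
    return diffs

# Two hypothetical builds of the same noarch sub package.
i386 = {"/usr/share/doc/foo/README": {"mode": 0o644, "mtime": 1, "size": 10, "md5": "a"}}
x86_64 = {"/usr/share/doc/foo/README": {"mode": 0o644, "mtime": 2, "size": 11, "md5": "b"}}

# An empty list means the builds agree on everything that is not ignored.
print(differences(i386, x86_64))  # prints []
```

In this sketch a build would be accepted only if the returned list is empty.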

Candidates for being switched to noarch
To get a list of good candidates, all x86_64 packages that contain no binaries or libraries (as known to rpm) and no files in /lib64 or /usr/lib64 were selected as a starting point. To refine the selection further and get an idea of what can go wrong, rpmdiff was run against the i386 sister packages, both with the relaxed koji settings and with the strict -t settings. This showed a small number of false positives, mostly development packages that put files in different locations or undetected binary packages.

Subpackages are marked by one surrounding '*' if they only fail the stricter rpmdiff -t check and by two if they also fail the rpmdiff check as used by koji. It is assumed that packages without '*' can be switched to noarch directly (assuming they don't do weird stuff on other arches). One '*' requires a more detailed look but should be OK in most cases, and two '*'s are most likely a sign of a false positive. The diffs can be found below (limited to 20 lines).
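
The marking scheme can be written down as a tiny helper. The function name `mark` and its boolean arguments are hypothetical, and the exact layout of the markers in the candidate lists may differ from what this sketch prints.

```python
def mark(name, passes_koji_check, passes_strict_check):
    """Surround a sub package name with '*' markers as described above."""
    if not passes_koji_check:
        # Fails even the relaxed check used by koji: most likely a false positive.
        return f"**{name}**"
    if not passes_strict_check:
        # Only fails the strict rpmdiff -t check: needs a closer look.
        return f"*{name}*"
    # Passes both checks: assumed to be directly switchable to noarch.
    return name
```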

Data from 2009-06-15:


 * [[media:NoarchCandidates.txt]] (50kB)
 * [[media:NoarchRpmdiff.txt]] (70kB)
 * [[media:NoarchStrictRpmdiff.txt]] (250kB)


 * Sorted by Size: [[media:NoarchCandidatesByContentSize.txt]] (75kB), [[media:NoarchCandidatesByPackageSize.txt]] (75kB) - these are without markers for rpmdiff problems

The 1014 (sub) packages (1.2 GB rpms / 3.9 GB in 267 k files) are distributed over 641 source packages.

How many switched packages gain how much saved space?
When sorting the packages by package size:

# new noarch sub packages | content | pkg size
--------------------------+---------+---------
                       10 |  1.2 GB |   359 MB
                       20 |  1.7 GB |   515 MB
                       30 |  2.0 GB |   641 MB
                       40 |  2.3 GB |   719 MB
                       50 |  2.5 GB |   783 MB
                       60 |  2.6 GB |   837 MB
                       70 |  2.8 GB |   882 MB
                       80 |  2.9 GB |   907 MB
                       90 |  3.0 GB |   921 MB
                      100 |  3.0 GB |   952 MB
                      200 |  3.5 GB |   1.1 GB
                      300 |  3.7 GB |   1.1 GB
                      400 |  3.8 GB |   1.2 GB
                      500 |  3.8 GB |   1.2 GB
                      600 |  3.8 GB |   1.2 GB
                      700 |  3.8 GB |   1.2 GB
                      800 |  3.9 GB |   1.2 GB
                      900 |  3.9 GB |   1.2 GB
                     1000 |  3.9 GB |   1.2 GB
                     1014 |  3.9 GB |   1.2 GB

Candidates for splitting off noarch subpackages
To search for more data that could be moved into noarch sub packages, all files in the distribution were put into one of the following categories:


 * bin32: 32 bit binaries including libraries(!) (as known to rpm, file color==1)
 * bin64: 64 bit binaries including libraries (file color==2)
 * lib32: other files in /lib or /usr/lib
 * lib64: other files in /lib64 or /usr/lib64
 * noarch: everything else

To be able to detect arch independent files in (/usr)/lib, x86_64 packages have been examined. It is assumed that lib32 and noarch files can be moved to noarch sub packages, that bin64 and lib64 files can't, and that bin32 files should not be found. This is only a very rough estimate: it must be checked for each package and doesn't take other architectures into account. Nevertheless it gives a good idea of which packages should be considered and what results can be expected.
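
A minimal sketch of this estimate for a single x86_64 package, given the number of bytes it holds in each category. The function name `split_estimate` and the input layout are hypothetical, and raising an error on bin32 content is a choice made for the example, since the text only says such files should not be found.

```python
MOVABLE = ("lib32", "noarch")  # assumed movable into a noarch sub package
PINNED = ("bin64", "lib64")    # assumed to stay arch dependent

def split_estimate(sizes):
    """Estimate how much of one x86_64 package could become noarch.

    sizes maps a category name to the bytes the package holds in it.
    Returns (movable_bytes, pinned_bytes, movable_ratio).
    """
    if sizes.get("bin32", 0):
        raise ValueError("unexpected bin32 content in an x86_64 package")
    movable = sum(sizes.get(category, 0) for category in MOVABLE)
    pinned = sum(sizes.get(category, 0) for category in PINNED)
    total = movable + pinned
    ratio = movable / total if total else 0.0
    return movable, pinned, ratio
```

For example, `split_estimate({"noarch": 900, "lib32": 100, "bin64": 500, "lib64": 500})` returns `(1000, 1000, 0.5)`: half of that package's content would be a candidate for a noarch sub package.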

Data from 2009-06-15:


 * [[media:SplitCandidates.txt]] (90kB) - 1000 most worthy splitting candidates
 * [[media:PackageFileTypes.txt]] (1.0 MB) - file type distribution for each x86_64 package sorted by owner
 * [[media:PackageFileTypesForComaintainers.txt]] (1.5 MB) - same as above but each package is listed for every comaintainer

For some packages it might be better to just change the borders among the subpackages instead of blindly splitting them. Such situations are not reflected well in the above lists.

How many packages are worth splitting
The table below shows how much content can be moved to noarch packages by splitting a given number of packages - assuming that the most worthy packages are split.

# new noarch sub packages | noarch content | other content | all content | ratio | pkg size
--------------------------+----------------+---------------+-------------+-------+---------
                       10 |         1.4 GB |         16 MB |      1.4 GB |   98% |   747 MB
                       20 |         2.1 GB |        155 MB |      2.2 GB |   93% |   1.1 GB
                       30 |         2.5 GB |        177 MB |      2.7 GB |   93% |   1.4 GB
                       40 |         2.9 GB |        197 MB |      3.1 GB |   93% |   1.5 GB
                       50 |         3.2 GB |        336 MB |      3.5 GB |   90% |   1.7 GB
                       60 |         3.4 GB |        405 MB |      3.8 GB |   89% |   1.9 GB
                       70 |         3.7 GB |        436 MB |      4.1 GB |   89% |   2.0 GB
                       80 |         3.9 GB |        457 MB |      4.3 GB |   89% |   2.0 GB
                       90 |         4.1 GB |        566 MB |      4.7 GB |   88% |   2.2 GB
                      100 |         4.3 GB |        588 MB |      4.9 GB |   88% |   2.3 GB
                      200 |         5.5 GB |        1.1 GB |      6.6 GB |   83% |   2.9 GB
                      300 |         6.2 GB |        1.6 GB |      7.8 GB |   79% |   3.3 GB
                      400 |         6.7 GB |        1.9 GB |      8.6 GB |   77% |   3.6 GB
                      500 |         7.0 GB |        2.1 GB |      9.1 GB |   76% |   3.8 GB
                      600 |         7.3 GB |        2.4 GB |      9.7 GB |   75% |   4.0 GB
                      700 |         7.5 GB |        2.6 GB |       10 GB |   74% |   4.2 GB
                      800 |         7.7 GB |        2.8 GB |       10 GB |   73% |   4.3 GB
                      900 |         7.9 GB |        3.0 GB |       11 GB |   72% |   4.4 GB
                     1000 |         8.0 GB |        3.2 GB |       11 GB |   71% |   4.5 GB
                     2000 |         8.6 GB |        4.6 GB |       13 GB |   65% |   5.3 GB
                     3000 |         8.8 GB |        5.5 GB |       14 GB |   61% |   5.6 GB
                     4000 |         8.9 GB |        6.4 GB |       15 GB |   58% |   6.0 GB
                     5000 |         9.0 GB |        7.1 GB |       16 GB |   55% |   6.2 GB
                     6000 |         9.0 GB |        8.2 GB |       17 GB |   52% |   6.6 GB
                     7000 |         9.0 GB |        8.9 GB |       18 GB |   50% |   6.8 GB
                     8000 |         9.0 GB |         12 GB |       21 GB |   43% |   7.9 GB
                     8652 |         9.0 GB |         13 GB |       22 GB |   40% |   8.3 GB
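
A table of this kind can be derived from per-package numbers with a cumulative sum. The sketch below is an assumption about the method, not the script actually used: it ranks packages by their noarch content as a stand-in for "most worthy", and the function name `cumulative_gain` and the input layout are hypothetical.

```python
from itertools import accumulate

def cumulative_gain(packages, checkpoints):
    """Cumulative content when splitting the n most worthy packages.

    packages    -- list of (noarch_bytes, other_bytes), one entry per package
    checkpoints -- package counts to report, e.g. [10, 20, 30]
    Returns a list of (n, noarch_total, other_total, noarch_ratio) rows.
    """
    # Assumption: "most worthy" means the largest amount of noarch content.
    ranked = sorted(packages, key=lambda p: p[0], reverse=True)
    noarch_totals = list(accumulate(p[0] for p in ranked))
    other_totals = list(accumulate(p[1] for p in ranked))
    rows = []
    for n in checkpoints:
        k = min(n, len(ranked))  # never look past the last package
        noarch = noarch_totals[k - 1] if k else 0
        other = other_totals[k - 1] if k else 0
        total = noarch + other
        rows.append((n, noarch, other, noarch / total if total else 0.0))
    return rows
```

Because the packages are ranked by noarch content, each further package adds less than the one before, which is why the noarch column flattens out towards the end of the table.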

Note that there is not much gain above 1000 packages. Even the 500 packages between 500 and 1000 add less than another GB to the 7.2 GB we get from the first 500. The decision to split the packages is of course left to the maintainers, but we should try to split as many of the first 300 or 400 packages as possible. Together with the packages that can be converted to noarch directly, they contain nearly 11 out of 13 GB of the noarch content not yet in noarch packages (see the two tables above).

Candidates for being packaged differently
Some packages may require more work before noarch content can be split off. We are only collecting packages that really scream to be changed:


 * openoffice.org-langpack-* - 589MB package size, all residing in $LIBDIR right now.

What about other packages?
A lot of other packages could also make use of this feature. When considering splitting up your package, please avoid overly complicated spec files and don't increase the number of packages unnecessarily. Use your best judgement.

What can you do as a packager?
There is now support for noarch subpackages in rawhide (Fedora 11) and Fedora 10. You can start to adjust your packages. Have a look in the package lists above to see if your package should be changed.

Please add the packages you changed or plan to change to /PackagesChanged. Put the latter in parentheses. Thanks!

What if you don't want to change your packages?
That's perfectly fine. There is no plan to force packagers to use noarch subpackages. I hope we can develop a more detailed plan on how to make use of this feature in future Fedora releases. You might be interested in taking part in this discussion.

What does that mean for the Packaging Policy?
The packaging policy will require a few additions. See /PolicyChanges. Comments and help are welcome.

Release Notes
Not applicable as visibility for the users is low and developers need to know before the release.