From Fedora Project Wiki
Revision as of 08:49, 7 March 2018


Glibc collation update and sync with cldr

Summary

Update collation data in glibc to an ISO file from 2015 (in sync with Unicode 9.0.0) and sync collation rules of the locales with CLDR.

Owner

  • Name: Mike Fabian
  • Email: <mfabian@redhat.com>
  • Release notes ticket: #79

Current status

  • Targeted release: Fedora 28
  • Last updated: 2018-03-07
  • Tracker bug: #1537247
  • Change is pushed to glibc master branch upstream.
  • I have now backported the change to the glibc 2.27 release branch to make patches for the Fedora 28 glibc rpm packages.

Detailed Description

The collation data in glibc is extremely out of date: most locales base their collation rules on an iso14651_t1_common file that has probably not been updated for more than 15 years. Therefore, all characters added in later Unicode versions are missing and are not sorted at all, which causes bugs like [Bug 1336308 - Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq]. This change updates the iso14651_t1_common file to the latest version available from ISO, which is from 2015 and up to date with Unicode 9.0.0. Because of additions and syntax changes in the new iso14651_t1_common file, updating it requires changing the collation rules of almost all locales. Since all these collation rules have to be touched anyway, this is a good opportunity to fix bugs in the collation rules and sync them with the collation rules in CLDR.
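The symptom in Bug 1336308 comes down to two distinct strings comparing as equal under a locale's collation rules, so sort places them arbitrarily and uniq merges them. The sketch below shows one way to check a pair of strings for that. It assumes a Linux system, where Python's locale.strcoll() delegates to the C library's wcscoll(); the helper name collates_equal is hypothetical, not part of glibc or Python.

```python
import locale


def collates_equal(a, b, loc):
    """Return True if a and b compare equal under the LC_COLLATE rules of loc.

    Hypothetical helper. On a glibc system, locale.strcoll() calls wcscoll(),
    so the result reflects glibc's collation data for the given locale.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        # Raises locale.Error if loc is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, loc)
        return locale.strcoll(a, b) == 0
    finally:
        # Restore the caller's collation locale (setlocale is process-global).
        locale.setlocale(locale.LC_COLLATE, previous)
```

For example, collates_equal("∞", "∅", "en_US.UTF-8") can be run before and after updating glibc, assuming that locale is installed. According to the bug report, the old collation data treated the two characters as the same, which corresponds to True here; with the updated data the expectation is False. The "C" locale compares code points, so it never reports such a tie.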

Benefit to Fedora

This will fix many bugs in the collation and make glibc sort more correctly according to current standards.

Scope

  • Proposal owners: Work with upstream, file bugs and provide patches where required.
  • Other developers: This change will impact glibc and everything that sorts strings using the collation functions from glibc. Other developers do not need to make any changes on their end, but they should watch how their applications behave with the improved locale data. Proper testing is needed to make sure that the change does not break any application.
  • Policies and guidelines: No, this change does not require any updates to policies or packaging guidelines.
  • Trademark approval: N/A (not needed for this Change)
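Applications are affected whenever they sort through glibc's locale-aware collation, as sort(1) does. A minimal sketch of that pattern, assuming Python on a glibc system, where locale.strxfrm() wraps wcsxfrm(); the function name sort_for_locale is hypothetical:

```python
import locale


def sort_for_locale(words, loc):
    """Sort words with the collation rules of loc (hypothetical helper)."""
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
        # strxfrm() turns each string into a key whose plain comparison
        # matches strcoll(), so each key is computed only once per string.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

Any code built this way picks up the new collation data automatically after the glibc update, which is why no source changes are needed but testing is. Note that setlocale() changes process-wide state, so this pattern is not thread-safe.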

Upgrade/compatibility impact

The sort order of strings in many locales will change somewhat.
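One way for a package maintainer to see how the order changes is to record a sorted list before the upgrade and compare it afterwards. The sketch below assumes Python on a glibc system; the function order_changes and the idea of a stored baseline are illustrative assumptions, not part of this change.

```python
import locale


def order_changes(words, loc, baseline):
    """Compare the locale-aware sort order of words with a recorded baseline.

    baseline must be a permutation of words, e.g. saved before upgrading glibc.
    Returns (position, baseline_item, current_item) for every differing position.
    """
    if sorted(baseline) != sorted(words):
        raise ValueError("baseline must contain exactly the same strings as words")
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
        current = sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
    return [(i, old, new)
            for i, (old, new) in enumerate(zip(baseline, current))
            if old != new]
```

An empty result means the order in that locale is unchanged; otherwise each tuple shows where the new collation rules moved a string.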

How To Test

Test whether locale-specific sorting works correctly according to the sorting rules of each locale. Test whether characters added up to Unicode 9.0.0 sort correctly.
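A simple automated check for the second test is to take a set of characters (for example ones added in recent Unicode versions) and look for distinct strings that a locale still collates as equal. The sketch below assumes Python on a glibc system and a caller-supplied character list; find_collation_ties is a hypothetical name.

```python
import locale


def find_collation_ties(strings, loc):
    """Return adjacent pairs of distinct strings that loc collates as equal.

    Ties like these are what made sort and uniq treat ∞ and ∅ as the same.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
        # Deduplicate, then sort: strings with equal collation keys end up
        # next to each other, so checking neighbours is enough.
        ordered = sorted(set(strings), key=locale.strxfrm)
        return [(a, b) for a, b in zip(ordered, ordered[1:])
                if locale.strcoll(a, b) == 0]
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

Running it as find_collation_ties(["∞", "∅", ...], "en_US.UTF-8") on an updated system is expected to return an empty list, assuming the locale is installed; any pairs it returns are candidates for a bug report.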

User Experience

Better sorting of strings by glibc, more up-to-date with current standards.

Dependencies

  • Upstream release schedule.
  • If our patches are not accepted upstream, we will not try to patch them into Fedora. So this change will make it into Fedora 28 only if glibc 2.27 is released in time for Fedora 28.

Contingency Plan

  • Contingency mechanism: Will move change to Fedora 29 release.
  • Contingency deadline: Fedora 29 Beta release.
  • Blocks release? No.
  • Blocks product? No.

Documentation

[Bug 14095 - Review / update collation data from Unicode / ISO 14651]

Release Notes