From Fedora Project Wiki

Revision as of 14:53, 12 March 2012 by Ppisar (talk | contribs) (→‎User Experience: Fix a typo)

PCRE 8.30

Summary

Upgrade the PCRE (Perl Compatible Regular Expressions) library to version 8.30 or newer. This version adds UTF-16 support and changes the API, which affects many packages.

Owner

Current status

  • Targeted release: Fedora 18
  • Last updated: 2012-02-28
  • Percentage of completion: 100.00 %

Done

  • PCRE 8.30 has been built
  • 109 of 109 reverse dependencies have been rebuilt.
  • Old libpcre.so.0 has been removed from pcre-8.30-2.fc18.

Detailed Description

Each PCRE release brings new fixes and features, such as updated Unicode tables, so Fedora needs to stay synchronized with upstream releases. Version 8.30 changes the API. Because PCRE is on the critical path and in the minimal build root, the upgrade must be done carefully, and a feature page is needed to track its progress. Version 8.30 also adds support for the UTF-16 encoding, which helps applications that use this encoding internally by avoiding expensive recoding between UTF-16 and UTF-8. Qt upstream has already expressed its intention to move from its own regular expression implementation to PCRE.

Benefit to Fedora

Fedora will keep providing the latest upstream PCRE version with the latest Unicode tables. Fedora will also provide the UTF-16 mode of PCRE.

Scope

Version 8.30 changes the API, as described by upstream:

  • The pcre_info() function, which has been obsolete for over 10 years, has been removed.
  • When a compiled pattern was saved to a file and later reloaded on a host with different endianness, PCRE used automatically to swap the bytes in some of the data fields. With the advent of the 16-bit library, where more of this swapping is needed, it is no longer done automatically. Instead, the bad endianness is detected and a specific error is given. The user can then call a new function called pcre_pattern_to_host_byte_order() (or an equivalent 16-bit function) to do the swap.
  • In UTF-8 mode, the values 0xd800 to 0xdfff are not legal Unicode code points and are now faulted. (They are the so-called "surrogates" that are reserved for coding high values in UTF-16.)

These changes are reflected in the libpcre SONAME, which changed from libpcre.so.0 to libpcre.so.1. The change affects 109 packages. All of them need to be rebuilt, and some may need to be ported to the new API.
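The surrogate rule from the list above can be illustrated without PCRE itself. The following is a minimal sketch, not PCRE's actual validation code: is_surrogate() and decode_utf8_3() are hypothetical helper names, and the decoder assumes the caller passes a well-formed three-byte UTF-8 sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp lies in the UTF-16 surrogate range U+D800..U+DFFF. */
static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/*
 * Decode a three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
 * Hypothetical helper: it assumes the byte layout is already well formed
 * and performs no validation of its own.
 */
static uint32_t decode_utf8_3(const unsigned char b[3])
{
    return ((uint32_t)(b[0] & 0x0F) << 12) |
           ((uint32_t)(b[1] & 0x3F) << 6)  |
            (uint32_t)(b[2] & 0x3F);
}
```

For example, the bytes ED A0 80 decode to 0xD800, which is a surrogate and therefore rejected in UTF-8 mode, while E2 82 AC decode to 0x20AC (the euro sign), which is a legal code point.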

How To Test

  • Check that the distribution contains pcre >= 8.30.
  • Check that no package depends on the old PCRE SONAME libpcre.so.0 (e.g. repoquery --whatrequires 'libpcre.so.0()(64bit)').
  • Check that PCRE is compiled with UTF-16 support (install pcre-tools and check that pcretest -C pcre16 returns 1).
  • Check that the PCRE tools work properly with the UTF-16 variant of the PCRE library (install pcre-tools, read the pcretest(1) manual page, and try pcretest -16 …).
  • Check that applications can be compiled against the pcre16 library. Install pcre-devel, check for the presence of the pcre16_*(3) manual pages, and check the output of pcre-config --libs16 and pkg-config --libs libpcre16. Try to compile and link a short program that uses the pcre16 library.

User Experience

There is no visible change for end users. Developers will find that pcre_info(3) has been removed. Users of pcre_info(3) need to migrate to pcre_fullinfo(3), as the pcre_info(3) manual page has advised for the last 10 years.

Dependencies

109 packages need rebuilding:

   adanaxisgpl
   blender
   bti
   cclive
   ccze
   cduce
   cegui
   cegui06
   cfengine
   classads
   coccinelle
   collada-dom
   condor
   dansguardian
   EMBOSS
   eterm
   ettercap
   exim
   fsniper
   gambas2
   gambas3
   ganglia
   ghc-hakyll
   ghc-pcre-light
   ghc-regex-pcre
   git
   gnaughty
   gnome-mud
   gnote
   gource
   grep
   gsmartcontrol
   gxneur
   haproxy
   highlighting-kate
   httpd
   imapfilter
   Io-language
   kannel
   kaya
   kdelibs
   kdelibs3
   kismet
   leafnode
   ledger
   less
   libast
   libguestfs
   lighttpd
   logstalgia
   lua-rex
   maildrop
   matahari
   mboxgrep
   mcstrans
   medusa
   mod_security
   mongodb
   monotone
   mysql-workbench
   nekovm
   nginx
   ngrep
   nmap
   ocaml-ocamlnet
   octave
   openCOLLADA
   openscada
   openscap
   opensips
   ovaldi
   pads
   pandoc
   perl-HTML-Template-Pro
   php
   picviz
   pidgin-musictracker
   poco
   postfix
   prelude-lml
   privoxy
   proftpd
   R
   regexxer
   rekall
   root
   scilab
   slang
   spring-installer
   sssd
   suricata
   swig
   syncevolution
   syslog-ng
   tabled
   Thunar
   tin
   tintin
   tinyfugue
   varnish
   wmweather+
   xastir
   xfce4-verve-plugin
   xgrep
   xmlcopyeditor
   xneur
   znc-infobot
   zoneminder
   389-ds-base

Contingency Plan

There is no contingency plan. All reverse dependencies will be rebuilt, adapted to the new API where needed, or removed from the distribution.

Documentation

  • /usr/share/doc/pcre-8.30/NEWS
  • /usr/share/doc/pcre-8.30/ChangeLog
  • pcre16(3) manual page for UTF-16 feature
  • pcre_fullinfo(3) manual page as replacement for pcre_info(3)
  • tinyfugue conversion from pcre_info() to pcre_fullinfo()
  • Private _pcre_valid_utf8() function has been renamed to _pcre_valid_utf()

Release Notes

  • UTF-16 support added through the pcre16 library
  • API changes in the pcre library are documented in NEWS and ChangeLog

Comments and Discussion