Infrastructure/PackageDatabase/RoadMap


 * 1) !rst (-)

=
Next Steps

=

 * Author: Toshio Kuratomi
 * Contact: toshio-tiki-lounge-com
 * Date: Sunday, 21 January, 2007

.. contents::

Import Data

=
Currently, information about the packages comes from several sources. We need to import data from those sources into the packageDB.

owners.list ---


 * Location: CVSROOT=:pserver:anonymous@cvs.fedora.redhat.com:/cvs/extras cvs co owners; cat owners/owners.list
 * Data: Package names, summary, owner, qacontact, watchers
 * Duration: One-time; Obsoletes
 * Priority: Essential
 * Status: Information is largely imported. Need to perform a similar step for

owners.epel.list. A few people in initialcc do not have Fedora accounts.

owners.list contains the bulk of new information that we are interested in. This file should contain information on all packages whether they are current in the repository or not.

This importer is almost complete. A bit more work needs to be done to combine the information here with information from the CVS Repository. With a combination of these we should have all the basic information necessary for the first stage of the package DB.

We are not currently retrieving any information for RHL releases and do not plan to.

CVS Repository --


 * Location: CVSROOT:pserver:anonymous@cvs.fedora.redhat.com:/cvs/extras
 * Data: Collections that the package is present in, package metadata present in the spec file (URL, upstream source, etc.)
 * Duration: One-time; SyncReversal
 * Priority: Essential

The VCS repository (currently CVS) is the source of most information pertaining to the package itself. Most of this information is better taken from the built packages, though, as the metadata has already been parsed and pulled into the repomd files by that time.

Unique to the CVS repository is knowledge of what packages are present in which collections -- FC-{1,2,3,4,5,devel} and FE-{1,2,3,4,5,devel}. RH-{9}, and soon EL-{4,5}. Regardless of whether they were obsoleted or otherwise, disappeared in the course of the release.

The cvs repository is also gaining ACL knowledge shortly. We could pull this information either from the files within the cvs tree or from the CVSROOT/avail file.

In the future, the packageDB will manage the branches in the VCS Repository.

Package Reviews (mailing list) --


 * Location: APPROVED_trim.txt REQUESTS_trim.txt
 * Data: URL, dates, and authors of package reviews
 * Duration: One-time
 * Priority: High

In the early days, packages were approved via the fedora-extras mailing list. These approval messages have been abstracted to two files. We need to enter this information into the packagedb so people know which packages need to be reviewed before being built for the combined Core + Extras.

The URL goes into the package table. The date and author go into the PackageLog.

Package Reviews (bugzilla) --


 * Location: bugzilla
 * Data: URL, dates, and authors of package reviews
 * Duration: Ongoing
 * Priority: High

Currently, packages are reviewed in bugzilla. We need to pull information about packages which were formerly reviewed into the packagedb. Then we need to set up a system that continues to sync against it. Perhaps we could have a packagedb front end that opens up a bugzilla report for a new package in order to start the process. The packagedb can periodically scan FE-REVIEW and FE-APPROVE tracker bugs to tell when a package changes state.

The URL goes into the package table. The date and author go into the PackageLog.

comps-fcX.xml.in


 * Location: CVSROOT=:pserver:anonymous@cvs.fedora.redhat.com:/cvs/extras cvs co comps; cat comps/comps-fe6.xml.in
 * Data: Groups that the package belongs to.
 * Duration: One-time; SyncReversal
 * Priority: Add-on

There is one comps.xml.in file per release. It is used to create the comps.xml file that lists what groups packages in the repository are for.

In the future, comps-fcX.xml.in could be generated from the packageDB.

Download Repository ---


 * Location: http://download.fedora.redhat.com/pub/fedora/linux/extras/$release/repodata/repomd.xml
 * Data: RPM Metadata extracted into an xml file. License, URL, upstream, present package EVR.
 * Duration: Ongoing
 * Priority: Add-on

Parsing the Download repository information gives us EVR information which is needed for the package_version table. There is a wealth of other information in the files that we could use as well (file listings, summary, etc.) however, this is a large extension from the initial stage of the packagedb.

We probably need to cache the repomd files in order to extract information sensibly as we're going to be pulling the data from the files on an ongoing basis. Having the cached files allows us to find the differences between the old and new information.

Build Server


 * Location: http://buildsys.fedoraproject.org/logs/fedora-development-extras/19621-gnubiff-2.2.2-3.fc6/x86_64/build.log
 * Data: Logs of failed and completed builds
 * Duration: Ongoing
 * Priority: Add-on

What do we want to do with these? They are a large source of raw information but there's not much in terms of metadata type data. If we have the db horsepower, we could put them into a Full Text Searchable field and put them in the db so we can search on them.

Write Web Front End

=
======

The Web front end is the primary interface to the package db.

Status --

postgreSQL-SQLAlchemy-TurboGears combination. Account System.
 * A Turbo Gears skeleton has been constructed.
 * A database schema has been created that works with the
 * Configured apache to pass requests on to the TurboGears instance
 * Wrote a CGI to autostart the TurboGears CGI when a request is made
 * Wrote a TurboGears identity layer to validate users against the Fedora

TODO

Version 0.1

(owners.list is finished. (ACLs are not ready yet.) (EPEL is being put off. It requires information from cvs and owners.epel.list). for each package.
 * See the two "essentials" from Importing Data.
 * Finish filling in read-only pages that allow us to see what's in the database

Version 0.2


 * Add the ability for authenticated users to change owners.list information:

- Add packages. - Set themselves as owner for a package. - Set a package as orphaned. - Request to watch a package.

Before F7


 * Add the ability for owners to hand out permissions to other users.
 * Allow owners and trusted users to make changes to the packages listed here.
 * Send email notifications when things change.
 * Write a backend for the cvs repository to pull ACLs from the packageDB.
 * Integrate with the accounts system.

As time permits

will have a cron job that pulls ACLs from the packagedb to the CVS acls file. The next generation should have the packageDB set the ACLs in CVS instead.
 * Make ACLs a push technology instead of pull. The first generation of code

from here.
 * Add comps group information to the database schema so we can generate comps


 * Write admin functionality:

- Branch script to make new release branches in the VCS. Owners can immediately branch for certain releases. For other releases, owner requests and admin clicks one button to confirm the request. Script takes care of the actual branch.


 * RSS Feeds of package changes.

on is updated.
 * Second level notification: Tell package owners when a package they depend

Ideas

want to include them in this interface but they are a bit more problematic due to the way they operate. functionality? This would take us into the realm of what Debian's web interface to their package browser does but it may be too many things for this project.
 * Hidden branches (for security fixes) need to be integrated somehow. We may
 * Maybe take more information from repomd and subsume some of repoview's