
Warning: This page is a draft only.
It is still under construction and content may change. Do not rely on the information on this page.

This document summarizes alternative approaches for storing and presenting automated test results used by different automation solutions.

Our discussion outputs

Beaker

rpmdiff

Background

rpmdiff is a collection of tools ...

  • A test tool for static and comparative analysis of packages
  • A test scheduler for pushing tests to test systems
  • A web front-end for results reporting/waiving

Scheduling

Rpmdiff tests are initiated by ...

  • ad-hoc "make" invocations by maintainers
  • ad-hoc XML-RPC calls by maintainers (via a CLI tool)
  • web UI for manual scheduling
  • package updates - analogous to bodhi updates

There is more information recorded in the database around queuing and running tests on systems. I'm ignoring this since it's not something we're including in our discussion.
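
For illustration, a minimal Python sketch of the ad-hoc XML-RPC scheduling path might look like the one below. The endpoint URL, the schedule_run method, and its argument layout are assumptions made up for this sketch, not the actual rpmdiff interface.

import xmlrpc.client

SCHEDULER_URL = "https://rpmdiff.example.com/xmlrpc"  # hypothetical endpoint

def schedule_comparison(old_nvr, new_nvr, requester):
    """Ask the scheduler to queue a comparative run and return its run id."""
    proxy = xmlrpc.client.ServerProxy(SCHEDULER_URL)
    # schedule_run and its argument structure are made up for this sketch
    return proxy.schedule_run({
        "old": old_nvr,            # baseline package NVR
        "new": new_nvr,            # candidate package NVR
        "requested_by": requester,
    })

if __name__ == "__main__":
    print(schedule_comparison("foo-1.0-1.fc13", "foo-1.0-2.fc13", "maintainer"))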

Execution

Not much to say here: rpmdiff compares two packages and reports feedback. Note that yum-utils also includes a utility called rpmdiff; the two are similar, but not the same.

Results

Each test run produces a list of results. Each result consists of:

  • severity - INFO, GOOD, BAD, VERIFY
  • text - the message for the end-user
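
A minimal Python sketch of such a result record (the severity values mirror the list above; the example messages are invented):

from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    INFO = "INFO"
    GOOD = "GOOD"
    BAD = "BAD"
    VERIFY = "VERIFY"

@dataclass
class Result:
    severity: Severity
    text: str  # message shown to the end user

results = [
    Result(Severity.GOOD, "no changes in provides/requires"),
    Result(Severity.BAD, "file permissions changed"),
]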

Reporting

All test results are sent to stdout. In addition, the results are saved to an XML data file which is sent back to the scheduler (via XML-RPC). The scheduler processes the input and ...

  • Updates the database so that results are visible from the web front-end
  • Based on the results, emails failure information to maintainers
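
A rough sketch of this reporting path, assuming a simple XML layout and a hypothetical submit_results XML-RPC method (neither is the actual rpmdiff format):

import xmlrpc.client
import xml.etree.ElementTree as ET

def write_results_xml(results, path):
    """Serialize (severity, text) pairs into a simple, made-up XML layout."""
    root = ET.Element("results")
    for severity, text in results:
        item = ET.SubElement(root, "result", severity=severity)
        item.text = text
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def send_back(run_id, path, scheduler_url="https://rpmdiff.example.com/xmlrpc"):
    """Push the XML data file back to the scheduler over XML-RPC."""
    with open(path) as f:
        payload = f.read()
    proxy = xmlrpc.client.ServerProxy(scheduler_url)  # hypothetical endpoint
    proxy.submit_results(run_id, payload)             # hypothetical method

write_results_xml([("GOOD", "binaries unchanged"), ("BAD", "new unpackaged files")], "results.xml")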

DB schema

Thankfully, the database doesn't appear very complex. The same database is used for storing both scheduler and results information; the scheduler part isn't something we'd need to worry about, so I'll focus just on the results ...

There are more tables, but the 4 key tables for test results appear to be:

  1. tests - just some metadata about the test, including a URL to a wiki page
  2. runs - stores job information (time, what is being tested, NVRs, etc.); enough information so that the same test could be initiated again at a future date.
  3. results - all the test results produced by the run
    • run (foreign key)
    • test (foreign key) - basically a link back to the specific test
    • test result - PASS, INFO, FAIL, WAIVE, INSPECT
    • log - stdout associated with the test
  4. waivers - contains information for all result waivers
    • person - some reference to who is waiving
    • description - user supplied text describing waiver
    • old_result - the test result prior to waiving
    • datestamp
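
The following is a rough reconstruction of these four tables as SQLite DDL driven from Python; the column names and types are guesses based only on the field lists above, not the real rpmdiff schema.

import sqlite3

SCHEMA = """
CREATE TABLE tests (
    id       INTEGER PRIMARY KEY,
    name     TEXT,
    wiki_url TEXT                        -- URL to a wiki page describing the test
);
CREATE TABLE runs (
    id         INTEGER PRIMARY KEY,
    started_at TEXT,                     -- time of the run
    nvr_old    TEXT,                     -- what is being tested (NVRs etc.)
    nvr_new    TEXT
);
CREATE TABLE results (
    id      INTEGER PRIMARY KEY,
    run_id  INTEGER REFERENCES runs(id),
    test_id INTEGER REFERENCES tests(id),
    result  TEXT CHECK (result IN ('PASS','INFO','FAIL','WAIVE','INSPECT')),
    log     TEXT                         -- stdout associated with the test
);
CREATE TABLE waivers (
    id          INTEGER PRIMARY KEY,
    result_id   INTEGER REFERENCES results(id),
    person      TEXT,                    -- some reference to who is waiving
    description TEXT,                    -- user supplied text describing the waiver
    old_result  TEXT,                    -- the test result prior to waiving
    datestamp   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)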

autotest

SpikeSource

SpikeSource has a public definition/schema for test results that they use to allow federation between test systems:

http://dev.spikesource.com/wiki/index.php/Test_Results_Publication_Interface

Discussion with Petr Šplíchal

  • He asked whether the ResultsDB functionality is not already (at least partially) provided by Beaker. If so, we wouldn't have to write things from scratch and could, for example, collaborate on finishing it in Beaker, provided that that piece could be used standalone (because Beaker as a whole is probably not yet finished enough to be used instead of Autotest). As a local Brno contact for Beaker, he guessed it could be mcsontos.
  • In RHTS all the main metadata about a test are in its Makefile, e.g. RunFor (list of packages for which to run the test), Requires (dependencies), MaxTime, Destructive, Archs, Releases (list of distributions to run the test for), etc. Some of these metadata are also in the TCMS (Testopia), so they are duplicated.
  • He described how you can group individual test cases into recipes, then into recipe sets, and then into jobs. This provides a means to run a predefined set of tests on particular architectures and distributions, for example. Should we also consider some kind of grouping?
  • Among other things, RHTS collects and stores these artifacts:
    • global test result (PASS, FAIL, WARN) - ABORT also being considered for the future?
    • test phase results - currently the global result is FAIL if any of the phases fails. It would be a nice improvement to be able to define which phases may fail and which may not, or to provide some custom specification of how the global result should depend on the phases. This could be illustrated on something RHTS calls a "real comparative workflow": you install an old package, run the test (which is most probably supposed to fail), then you upgrade/patch the package and run the test again. The global result in RHTS is FAIL (since the old version of the package failed the test), but the real outcome should depend on the result of testing the new package.
    • score - integer? can be used to measure anything, from number of errors to performance
    • logs - arbitrary files, also collected some system logs (installed packages, messages.log, kickstart/anaconda logs, etc.)
    • summary output - a short summary generated from assertions and other beakerlib commands, but you may also add something of your own; this summary is shown by default when reviewing the test
    • run time
  • beakerlib stores the log (journal) of the test in an XML structure. This could be the basis for all the logs of all tests, so that we could extract useful information from it with automated tools and display it in the front-end. This means that one could easily have different levels of detail - it's possible to build a "quick summary" (just the pass/fail state of all the test phases), a "detailed view" (for example which asserts failed), and a "complete view" (the complete log, with stdout/stderr logged, etc.) from just one file, since the XML produced by beakerlib is quite well structured. This of course has the minor drawback that beakerlib is for shell and our wrappers are in Python. But the journalling part of beakerlib is written in Python, and Petr Muller is working on (if I recall correctly) making this journalling part a standalone library, so the journalling could be done directly.
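
A small sketch of what extracting a "quick summary" from a beakerlib-style XML journal could look like. The element and attribute names used here (phase, name, result) are assumptions; the real journal schema produced by beakerlib may differ in detail.

import xml.etree.ElementTree as ET

JOURNAL = """\
<BEAKER_TEST>
  <log>
    <phase name="Setup" result="PASS"/>
    <phase name="Test old package" result="FAIL"/>
    <phase name="Test patched package" result="PASS"/>
    <phase name="Cleanup" result="PASS"/>
  </log>
</BEAKER_TEST>
"""

def quick_summary(journal_xml):
    """Return (phase name, result) pairs - the 'quick summary' level of detail."""
    root = ET.fromstring(journal_xml)
    return [(p.get("name"), p.get("result")) for p in root.iter("phase")]

for name, result in quick_summary(JOURNAL):
    print(f"{result:5s} {name}")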

Discussion with David Kovalský

  • He drew me a little diagram of what RHTS looks like and what would be better. Currently there's a Scheduler handling job scheduling. It has access to an Inventory, which contains information about available hardware. There's an RHTS Hub, which executes the tests and collects the results. What would be better is if the results went through a TCMS and were stored there.
  • He said that in the past all the tests could use any output format they wanted. Then they started to use rhtslib (now beakerlib), and that unified the output. He stressed that this approach really simplified everything and he really recommends it (having a unified output format).
  • Ideally the whole process should be centered around the TCMS. The tests and jobs should be defined in the TCMS, all the test metadata should be there, and all the test results should be reported there as well. He sees it as a very important building block of the whole process and recommends it.
  • Some test phases should be mandatory and some optional, so that the test won't stop on an optional phase (see the sketch after this list).
  • The current set of test results (pass, fail, warn) could be extended to a finer granularity.
  • He strongly recommended looking at the Nitrate TCMS, so that we don't develop something of our own. It should be available and ready; some more XML-RPC API is to be added soon. The author is Victor Chen.
  • He basically stressed that it is important to look around first before developing something, because we may find a lot of things already available in Red Hat and ready for pushing upstream. For example, the people working with cobbler and kickstart have really good tools and techniques for clean machine installation/repair, etc.
  • Some other interesting contacts: psplicha, pmuller, cward, bpeck; mailing lists: rhts, test-auto, tcms
  • We can have a look at the Inventory at https://inventory.engineering.redhat.com/
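
Below is a small sketch of the mandatory/optional phase idea mentioned above (and of the "real comparative workflow" problem from the previous discussion); the phase structure and the "optional" flag are assumptions made up for illustration.

PHASES = [
    {"name": "Test old package", "result": "FAIL", "optional": True},   # expected to fail
    {"name": "Upgrade package",  "result": "PASS", "optional": False},
    {"name": "Test new package", "result": "PASS", "optional": False},
]

def global_result(phases):
    """FAIL only if a mandatory phase failed; WARN if only optional phases failed."""
    if any(p["result"] == "FAIL" and not p["optional"] for p in phases):
        return "FAIL"
    if any(p["result"] == "FAIL" for p in phases):
        return "WARN"
    return "PASS"

print(global_result(PHASES))  # -> WARN: the old package's expected failure does not fail the run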