Taskotron/SystemTriage

For System Triage
This document describes some basic steps for triaging the Taskotron system when things aren't working correctly. You will need root access to the machines running Taskotron for most of these steps. For triaging task failures, see the documentation for the individual tasks.

Taskotron Services

There are 5 major components to most Taskotron deployments:

taskmaster (buildmaster)
taskotron-trigger
resultsdb
resultsdb_frontend
taskotron clients

While it is possible to have all 5 components on a single machine, most non-dev deployments are set up as:

one machine has resultsdb and resultsdb_frontend
one machine has the taskmaster and taskotron-trigger
several machines acting as clients

All publicly available services are proxied such that they are available through a single hostname (assumed to be taskotron.localdomain for the rest of this document).

Making Local Changes
Some of the steps described here involve changing configuration files by hand, directly on the affected machines. Be careful if you do this! If the machines are controlled by a configuration management system (ansible et al.), your local changes will likely be overwritten, and you should probably make the changes through that system instead.

Taskmaster (buildmaster)

The buildmaster for Taskotron is run as a non-privileged user. Most of the buildmaster logs are stored in /home/<user>/master/, and the log most likely to be interesting is twistd.log, which contains the logs from the buildmaster process. If there are problems with buildbot, this is the first place to check.
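As a first pass over twistd.log, it can help to pull out only the lines that look like trouble. The following is a minimal sketch, not part of Taskotron: the function names are hypothetical, and the pattern (any line mentioning an error, an exception, or a traceback) is an assumption about what is worth reading first rather than a description of the actual log format.

```python
import re

# Assumption: lines mentioning these words are the ones worth reading first.
SUSPICIOUS = re.compile(r"error|exception|traceback", re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that match SUSPICIOUS."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


def scan_log(path):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling scan_log() on the twistd.log in the buildmaster directory returns the matching lines together with their line numbers, which makes it easier to jump to the surrounding context in the full log.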

The buildmaster process is managed by a systemd unit (buildmaster.service) and can be started, stopped, and inspected like any other systemd service. Be aware that restarting the buildmaster may disrupt running jobs; configuration changes can be applied without a restart by using the reconfig command. However, configuration changes should be made through ansible, which renders the config, checks it, and reloads the buildmaster only if the new configuration is valid (the run fails if the new config contains syntax errors).
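The commands involved can be sketched as a small helper. This is an illustration only: the helper names are hypothetical, and it assumes that systemctl and the buildbot command-line tool are on the PATH and that reconfig is invoked as "buildbot reconfig" followed by the master's base directory. On managed hosts, ansible remains the right way to change configuration.

```python
import subprocess

UNIT = "buildmaster.service"  # unit name as given in this document

# Only the actions needed for basic triage are allowed.
ALLOWED_ACTIONS = {"status", "is-active", "start", "stop", "restart"}


def systemctl_cmd(action, unit=UNIT):
    """Build a systemctl command line for the given action and unit."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return ["systemctl", action, unit]


def reconfig_cmd(basedir):
    """Build the reconfig command (assumed form: buildbot reconfig BASEDIR)."""
    return ["buildbot", "reconfig", basedir]


def run(cmd):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout
```

For example, run(systemctl_cmd("is-active")) reports whether the unit is up without touching running jobs, whereas run(systemctl_cmd("restart")) is the disruptive option described above.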

Resultsdb and Resultsdb Frontend

From an administrative point of view, resultsdb and resultsdb_frontend are pretty simple. Both are Flask apps and behave like any other app hosted through mod_wsgi: starting, stopping, or restarting httpd on the node that hosts resultsdb affects the apps as expected. Note that they can be configured to send log messages to syslog instead of the normal httpd logs (error_log, access_log, etc.).

Taskotron Clients

The Taskotron clients are generally VMs running a buildslave. Similarly to the buildmaster, this is run as a non-privileged user and controlled through a systemd unit file (buildslave.service). If there are problems with a particular slave, looking at the slave's log in /home/<user>/slave/twistd.log is a good place to start.

Taskotron Trigger

The Taskotron trigger is built upon fedmsg-hub (part of fedmsg). The important logs to look for are:

/var/log/fedmsg/taskotron-trigger.log (raw logs of activity which triggers jobs)
/var/log/taskotron-trigger/jobs.csv (record of which jobs were triggered and when, can be used to reschedule previously scheduled jobs)
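When looking for jobs to reschedule, jobs.csv can be filtered with a few lines of Python. The column layout of the file is not described here, so this sketch deliberately makes no assumption about it: it treats each row as a plain list of strings and keeps the rows in which any field contains the search text. The function name is hypothetical.

```python
import csv


def matching_jobs(csv_lines, needle):
    """Yield rows (lists of strings) in which any field contains needle.

    csv_lines may be an open file or any iterable of CSV-formatted lines.
    No particular column order is assumed.
    """
    for row in csv.reader(csv_lines):
        if any(needle in field for field in row):
            yield row
```

Opening /var/log/taskotron-trigger/jobs.csv with newline="" and passing the file object to matching_jobs() along with, say, a package name yields the rows for that package, in the order they were recorded.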
