From Fedora Project Wiki
(RCA 2017-02-18 proxy01 disk full)
 
Revision as of 09:56, 18 February 2017

Proxy01 down issue

Description

Proxy01 went down, followed by a flood of Nagios notifications. It also took Koji down for internal clients, because internally koji.fp.o is mapped only to proxy01.


When the issue presented itself

The first Nagios notification arrived on February 18, 2017, at 09:15 UTC.

When it recovered (or got fixed)

09:20 UTC, rerouted internal traffic from proxy01 to proxy10 and disabled proxy01 in DNS to prevent issues for users.

09:40 UTC, httpd on proxy01 started again after the Koji access logs were cleared.

09:45 UTC, proxy01 was re-enabled in DNS.

Root cause

The Koji access logs filled proxy01's disk to 100%. For reasons not yet understood, logrotate had not rotated the 20170218 logs into xz-compressed form, so many multi-GB uncompressed access log files accumulated until the disk was completely full.
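The failure mode above (large uncompressed logs piling up because rotation did not happen) can be spotted with a simple scan of the log directory. The following is a minimal Python sketch, not the check actually used on proxy01. It assumes that rotated logs end in `.xz`, that any other file larger than a threshold is suspect, and that 1 GiB is a sensible default. The function names `find_unrotated_logs` and `disk_used_fraction` are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical threshold: flag any uncompressed log larger than 1 GiB.
SIZE_LIMIT = 1 << 30


def find_unrotated_logs(log_dir, size_limit=SIZE_LIMIT):
    """Return (path, size) pairs for files in log_dir larger than size_limit.

    Assumption: files ending in ".xz" have already been rotated and
    compressed, so they are ignored. Results are sorted largest first.
    """
    results = []
    for path in Path(log_dir).iterdir():
        # Skip subdirectories and already-compressed (rotated) logs.
        if not path.is_file() or path.suffix == ".xz":
            continue
        size = path.stat().st_size
        if size > size_limit:
            results.append((str(path), size))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def disk_used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

For example, `find_unrotated_logs("/var/log/httpd")` would list oversized unrotated files, assuming that is where the Koji access logs live (the real path on proxy01 is not stated here). Running the scan regularly, alongside `disk_used_fraction`, could flag a rotation failure before the disk reaches 100%.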


Service owners


Follow-up steps

  • Figure out why logrotate didn't work.
  • Figure out why the access logs were larger than usual (the access logs on proxy01 are gone, so this data will need to come from the hubs).
  • Make internal clients use both proxy01 and proxy10 for Koji access.
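To find out why the access logs grew larger than usual, the hub logs could be summarized by client and by request path to see which traffic dominated. The following is a minimal Python sketch under stated assumptions. It assumes the logs use Apache's "combined" LogFormat, which the page does not confirm. It drops query strings when grouping paths. The function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" LogFormat, i.e.
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines, top=5):
    """Count requests per client and per path, plus bytes sent per path.

    Lines that do not match the assumed format are counted as skipped.
    """
    by_client = Counter()
    by_path = Counter()
    bytes_by_path = Counter()
    skipped = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            skipped += 1
            continue
        # Group requests by path without the query string.
        path = m["path"].split("?", 1)[0]
        # Apache logs "-" when no response body was sent.
        size = 0 if m["size"] == "-" else int(m["size"])
        by_client[m["client"]] += 1
        by_path[path] += 1
        bytes_by_path[path] += size
    return {
        "clients": by_client.most_common(top),
        "paths": by_path.most_common(top),
        "bytes": bytes_by_path.most_common(top),
        "skipped": skipped,
    }
```

A file can be passed directly, e.g. `with open(log_path) as f: print(summarize(f))`. Comparing the output for 20170218 with that of a normal day would show whether a single client or endpoint drove the growth.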


Future ideas

  • Make it easier to disable a proxy by requiring only its name in the cmds/ files.