Nagios Infrastructure SOP

From FedoraProject


Revision as of 04:25, 18 February 2009

Shortcut: ISOP:NAGIOS

This SOP describes the Nagios configurations.


Contact Information

Owner: Fedora Infrastructure Team

Contact: #fedora-admin, sysadmin-main & sysadmin-noc groups

Location: Anywhere

Servers: noc1, noc2, puppet1

Purpose: This SOP describes the Nagios configurations.

Initial Configuration

CGI Access

To view information in Nagios (anything with cgi-bin in the path) you need to grant yourself access. After checking out the Puppet CVS tree as described in the Puppet SOP, edit configs/system/nagios/cgi.cfg and append your FAS username to 'authorized_for_system_commands'.
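The edit to cgi.cfg can be sketched in Python. This is a minimal illustration, assuming the directive uses the usual Nagios `name=user1,user2` form; the helper name `add_user_to_directive` and the sample usernames are hypothetical and not part of the Fedora tooling.

```python
def add_user_to_directive(cfg_text, directive, username):
    """Return cfg_text with username appended to a comma-separated directive.

    Hypothetical helper: only uncommented lines of the form
    'directive=user1,user2' are touched; everything else is kept as-is.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(directive + "="):
            key, _, value = stripped.partition("=")
            users = [u.strip() for u in value.split(",") if u.strip()]
            if username not in users:  # keep the edit idempotent
                users.append(username)
            line = key + "=" + ",".join(users)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# sample cgi.cfg fragment\nauthorized_for_system_commands=admin\n"
    print(add_user_to_directive(sample, "authorized_for_system_commands", "fasname"))
```

Running the function twice with the same username leaves the line unchanged, which makes it safe to re-apply.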

Contact Information

Caution: You must configure a contacts file to be able to acknowledge outages.

Create a new file named 'fasname.cfg' in configs/system/nagios/contacts/ with the following details:

define contact{
    contact_name                    fasname
    alias                           Real Name
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           Email address (any)
}
Warning: Using the 24x7 notification period may cause duplicate messages if you are a member of sysadmin-main, in which case you can specify 'never' instead.
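As an illustration, the contact definition above could be generated rather than typed by hand. The sketch below is only a convenience under stated assumptions: the function name `render_contact` and the example address are hypothetical, and the field values are copied from the template in this SOP.

```python
def render_contact(fasname, real_name, email, period="24x7"):
    """Render a Nagios contact definition matching the template in this SOP.

    Hypothetical helper. Pass period="never" if you are in sysadmin-main
    and want to avoid duplicate notifications.
    """
    fields = [
        ("contact_name", fasname),
        ("alias", real_name),
        ("service_notification_period", period),
        ("host_notification_period", period),
        ("service_notification_options", "w,u,c,r"),
        ("host_notification_options", "d,u,r"),
        ("service_notification_commands", "notify-by-email"),
        ("host_notification_commands", "host-notify-by-email"),
        ("email", email),
    ]
    # Pad directive names so the values line up in one column.
    width = max(len(name) for name, _ in fields) + 2
    body = "\n".join("    " + name.ljust(width) + value for name, value in fields)
    return "define contact{\n" + body + "\n}\n"


if __name__ == "__main__":
    # The address below is a placeholder, not a real contact.
    print(render_contact("fasname", "Real Name", "fasname@example.org"))
```

The output would then be saved as configs/system/nagios/contacts/fasname.cfg.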

Next, append your FAS username (the contact_name used above) to the 'members' section of configs/system/nagios/contactgroups/fedora-sysadmin-email.cfg.
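The contact group edit can be sketched the same way. This assumes the file holds a standard Nagios contactgroup definition with a comma-separated `members` line; the helper name `add_member` and the sample group contents are hypothetical.

```python
import re

# Matches a 'members' directive line and captures its leading text and value.
MEMBERS_RE = re.compile(r"^(\s*members\s+)(.*?)\s*$")


def add_member(cfg_text, username):
    """Append username to every 'members' line in a contactgroup file."""
    out = []
    for line in cfg_text.splitlines():
        m = MEMBERS_RE.match(line)
        if m:
            members = [x.strip() for x in m.group(2).split(",") if x.strip()]
            if username not in members:  # do not add the same name twice
                members.append(username)
            line = m.group(1) + ",".join(members)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "define contactgroup{\n"
        "    contactgroup_name  fedora-sysadmin-email\n"
        "    members            admin1,admin2\n"
        "}\n"
    )
    print(add_member(sample, "fasname"))
```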

nagios-external

The same changes need to be applied to the nagios-external configuration (configs/system/nagios-external).

Commit Changes

Caution: Remember to "cvs add" the contacts/fasname.cfg files.

Commit changes by running cvs commit -m "Adding fasname to Nagios", and then mark the changes for distribution by running make install.
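The commit sequence can be summarised as a small dry-run script that only prints the commands. It assumes you run them from the top of the Puppet checkout and that nagios-external has a contacts/ directory mirroring the one under nagios (the SOP only says "the same changes"); the function name `commit_commands` is hypothetical.

```python
import shlex


def commit_commands(fasname, trees=("nagios", "nagios-external")):
    """Build the cvs/make commands for adding a new contact (dry run only).

    Assumption: each tree keeps contacts under configs/system/<tree>/contacts/.
    """
    contact_files = [
        "configs/system/%s/contacts/%s.cfg" % (tree, fasname) for tree in trees
    ]
    return [
        ["cvs", "add"] + contact_files,
        ["cvs", "commit", "-m", "Adding %s to Nagios" % fasname],
        ["make", "install"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for review instead of executing them.
    for cmd in commit_commands("fasname"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands can be pasted into a shell once reviewed.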

Configuration

Instances

Fedora Project runs two Nagios instances: nagios (noc1, https://admin.fedoraproject.org/nagios) and nagios-external (noc2, http://admin.fedoraproject.org/nagios-external). You must be in the 'sysadmin' group to access them.

nagios (noc1)

The Nagios configuration on noc1 should only monitor general host statistics: puppet status, uptime, apache status (up/down), SSH, etc.

The configurations are found at configs/system/nagios/ in the puppet tree.

nagios-external (noc2)

The Nagios instance on noc2 is located outside of our main datacenter and should monitor our user-facing websites/applications (fedoraproject.org, FAS, PackageDB, Bodhi/Updates).

The configurations are found at configs/system/nagios-external/ in the puppet tree.

Understanding the Messages

General

Nagios notifications are generally easy to read, and follow this consistent format:

** PROBLEM/ACKNOWLEDGEMENT/RECOVERY alert - hostname/Check is WARNING/CRITICAL/OK **
** HOST DOWN/UP alert - hostname **

Reading the message will provide extra information on what is wrong.
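To make the format concrete, here is a rough Python sketch that splits a notification subject into its parts. The regular expressions are derived only from the two patterns shown above; the function name `parse_subject` and the host name `app1` are hypothetical, and real subjects may vary.

```python
import re

# Service alerts: "** PROBLEM alert - hostname/Check is CRITICAL **"
SERVICE_RE = re.compile(
    r"^\*\* (PROBLEM|ACKNOWLEDGEMENT|RECOVERY) alert - "
    r"([^/]+)/(.+) is (WARNING|CRITICAL|OK) \*\*$"
)
# Host alerts: "** HOST DOWN alert - hostname **"
HOST_RE = re.compile(r"^\*\* HOST (DOWN|UP) alert - (\S+) \*\*$")


def parse_subject(subject):
    """Return a dict describing the alert, or None if the format is unknown."""
    subject = subject.strip()
    m = SERVICE_RE.match(subject)
    if m:
        kind, host, check, state = m.groups()
        return {"type": "service", "kind": kind, "host": host,
                "check": check, "state": state}
    m = HOST_RE.match(subject)
    if m:
        state, host = m.groups()
        return {"type": "host", "host": host, "state": state}
    return None


if __name__ == "__main__":
    print(parse_subject("** PROBLEM alert - app1/SSH is CRITICAL **"))
    print(parse_subject("** HOST DOWN alert - app1 **"))
```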

Disk Space Warning/Critical

Disk space warnings normally include the following information:

DISK WARNING/CRITICAL/OK - free space: mountpoint freespace(MB) (freespace(%) inode=freeinodes(%)):

A message stating "(1% inode=99%)" means that 1% of the disk space and 99% of the inodes are free. In other words, disk space is critical, not inode usage, and more disk space is required.
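The distinction between free space and free inodes can be checked mechanically. The sketch below parses a message in the format shown above and reports which resource is scarcer; it assumes the free space is rendered as a number followed by "MB", and the function name `parse_disk_message` and the sample message are hypothetical.

```python
import re

# "DISK CRITICAL - free space: / 120 MB (1% inode=99%):"
DISK_RE = re.compile(
    r"DISK (WARNING|CRITICAL|OK) - free space: "
    r"(\S+) (\d+) MB \((\d+)% inode=(\d+)%\)"
)


def parse_disk_message(message):
    """Extract the free-space and free-inode percentages from a disk alert."""
    m = DISK_RE.search(message)
    if m is None:
        return None
    state, mount, free_mb, free_pct, inode_pct = m.groups()
    free_pct, inode_pct = int(free_pct), int(inode_pct)
    return {
        "state": state,
        "mountpoint": mount,
        "free_mb": int(free_mb),
        "free_pct": free_pct,
        "free_inode_pct": inode_pct,
        # Both percentages are *free* amounts, so the smaller one is the problem.
        "scarce": "disk space" if free_pct <= inode_pct else "inodes",
    }


if __name__ == "__main__":
    print(parse_disk_message("DISK CRITICAL - free space: / 120 MB (1% inode=99%):"))
```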

Further Reading

* Puppet SOP (Infrastructure/SOP/Puppet)
* Outages SOP (Infrastructure/SOP/Outage)