Revision as of 05:21, 29 August 2012
Agent-Free Systems Management
On server-class systems, remotely monitoring and managing hardware health and configuration for a large number of systems is crucial. One important component of this systems management solution on each server is the Service Processor. System administrators and monitoring/configuration software (such as Nagios) connect to the Service Processor via shared or dedicated management networks.
The information provided by the Service Processor is mostly independent of the operating system running on the server. Systems management software installed on the operating system can provide a richer set of systems management functionality overall. On Linux, such software is specific to the server vendor and can be proprietary. It can also be bulky, and it must be validated and managed like any other application.
We can envision an ideal systems management solution in which the Service Processor and the operating system together “just work”, without the need for vendor-specific (and sometimes proprietary) software and without a major loss of features.
The goal of this feature is to substitute a native implementation for some of the important functionality of the systems management software that is usually installed on the operating system. This will also put standards already in use by Service Processors, such as IPMI and WSMAN, to better use.
- Name: Charles Rose
- Email: firstname.lastname@example.org
- Targeted release: Fedora 18
- Last updated: 2012-08-29
- Percentage of completion: 70 %
- 1. Publish OS information to Service Processor - 100%
- Purpose: OS information should be accessible via the Service Processor remotely.
- On systems that contain a service processor, upon each boot-up, publish these to the Service Processor:
- “OS Name”, Example: “Fedora”
- “OS Version”, Example: “17”
- “System Host Name”, Example: “fedora.example.com”
- os-name and os-hostname are standard IPMI commands; os-version is an OEM command.
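The publishing step above can be sketched as follows. This is only an illustration, not the implementation shipped with this feature. It assumes that the installed ipmitool provides the `mc setsysinfo` subcommand with the `os_name`, `system_name` and Dell-specific `delloem_os_version` arguments, and that the OS name and version are read from os-release style text. The function names are hypothetical. The sketch only builds the command lines and does not talk to any hardware.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes the optional shell-style quoting around values.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def build_publish_commands(os_release_text, hostname):
    """Return the ipmitool invocations that would publish OS information.

    Assumption: 'mc setsysinfo' and its argument names exist in the
    installed ipmitool; 'delloem_os_version' is an OEM (Dell) parameter.
    """
    info = parse_os_release(os_release_text)
    return [
        ["ipmitool", "mc", "setsysinfo", "os_name", info.get("NAME", "")],
        ["ipmitool", "mc", "setsysinfo", "delloem_os_version",
         info.get("VERSION_ID", "")],
        ["ipmitool", "mc", "setsysinfo", "system_name", hostname],
    ]


if __name__ == "__main__":
    sample = 'NAME=Fedora\nVERSION="17 (Beefy Miracle)"\nVERSION_ID=17\n'
    for cmd in build_publish_commands(sample, "fedora.example.com"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

A start-up script could pass each returned list to `subprocess.check_call` once the IPMI drivers have loaded.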
- 2. Heartbeat to Service Processor - 100%
- Purpose: Capture screen shot in Service Processor for debugging on system crash during runtime and install-time
- On each startup or during installation, set up the IPMI watchdog via systemd's hardware watchdog support. Example: RuntimeWatchdogSec=120
- During runtime, setting IPMI_WATCHDOG=yes in /etc/sysconfig/ipmi (OpenIPMI package) should enable the systemd watchdog
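The mapping from the OpenIPMI setting to the systemd option can be sketched as below. This is a minimal illustration that assumes /etc/sysconfig/ipmi contains plain shell-style KEY=value assignments. The function names and the default timeout of 120 seconds (taken from the example above) are choices made for this sketch, not part of OpenIPMI or systemd. `RuntimeWatchdogSec` itself is a systemd option set in the `[Manager]` section of /etc/systemd/system.conf.

```python
def ipmi_watchdog_enabled(sysconfig_text):
    """Return True if the last IPMI_WATCHDOG assignment is 'yes'."""
    enabled = False
    for line in sysconfig_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("IPMI_WATCHDOG="):
            value = line.split("=", 1)[1].strip().strip("\"'")
            # As in a shell file, a later assignment overrides an earlier one.
            enabled = value.lower() == "yes"
    return enabled


def systemd_watchdog_setting(sysconfig_text, timeout_sec=120):
    """Return the system.conf line to apply, or None if not enabled."""
    if not ipmi_watchdog_enabled(sysconfig_text):
        return None
    return "RuntimeWatchdogSec={}".format(timeout_sec)


if __name__ == "__main__":
    print(systemd_watchdog_setting("IPMI_WATCHDOG=yes\n"))
```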
- 3. Retrieve log from Service Processor - Will not be ready for F18
- Purpose: Have syslog log Service Processor events so there is one log where system administrators can look for OS and Service Processor events.
- OS daemon should fetch logs from Service Processor, preferably with filtering capability.
- In addition to what ipmievd can provide currently
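The filtering capability mentioned above could look roughly like the following sketch. It assumes the events arrive as pipe-separated text lines in the style printed by `ipmitool sel list` (record id, date, time, sensor, event, direction). The sample line, the field names and the message format are hypothetical and exist only for this illustration. How the daemon obtains the lines is left open.

```python
def parse_sel_line(line):
    """Split one pipe-separated SEL line into named fields, or return None."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 5:
        return None  # not a recognisable event line
    return {
        "id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": fields[4],
        "direction": fields[5] if len(fields) > 5 else "",
    }


def filter_sel(lines, sensor_prefix):
    """Keep only entries whose sensor name starts with sensor_prefix."""
    entries = (parse_sel_line(line) for line in lines)
    return [e for e in entries
            if e is not None and e["sensor"].startswith(sensor_prefix)]


def to_syslog_message(entry):
    """Format one entry as a single log message string."""
    message = "SEL {id}: {sensor}: {event}".format(**entry)
    if entry["direction"]:
        message += " ({})".format(entry["direction"])
    return message


if __name__ == "__main__":
    # Hypothetical sample input, not real log data.
    sample = [
        "   1 | 08/29/2012 | 05:21:00 | Temperature #0x30 | Upper Critical going high | Asserted",
        "   2 | 08/29/2012 | 05:22:10 | Fan #0x41 | Lower Critical going low | Asserted",
    ]
    for entry in filter_sel(sample, "Temperature"):
        print(to_syslog_message(entry))
```

The resulting strings could then be handed to Python's standard `syslog.syslog()` so that Service Processor events land in the same log as OS events.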
- 4. Support for redirection of SNMP - 100%
- Purpose: Utilize SNMP agent on Service Processor and provide access to Service Processor MIB via the OS's SNMP agent.
- Redirect selected snmp queries to the service processor's IP
- Redirect traps from the Service Processor to Fedora's trap destination.
- Detect Service Processor interface IP and configure SNMP redirection and trap configuration
- Example on Dell PowerEdge Servers: To redirect OIDs under .1.3.6.1.4.1.674.10892.2 to the service processor, we will need this in /etc/snmpd.conf: proxy -v2c -Os -c public <Serv_Proc_IP> .1.3.6.1.4.1.674.10892.2
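Generating the redirection entry once the Service Processor's IP address has been detected can be sketched as follows. This is a minimal illustration: the function name and the input validation are assumptions of this sketch, and the directive layout simply mirrors the example above. The OID used in the demonstration lies in the Dell enterprise subtree (1.3.6.1.4.1.674), and 192.0.2.10 is a documentation-only address standing in for the detected IP.

```python
import ipaddress
import re

# Dotted numeric OID, with or without a leading dot.
_OID_PATTERN = re.compile(r"^\.?\d+(\.\d+)+$")


def snmp_proxy_directive(sp_ip, oid, community="public"):
    """Build an snmpd.conf proxy line that forwards 'oid' to 'sp_ip'."""
    # ip_address() raises ValueError for anything that is not a valid IP.
    ipaddress.ip_address(sp_ip)
    if not _OID_PATTERN.match(oid):
        raise ValueError("not a numeric OID: {!r}".format(oid))
    return "proxy -v2c -Os -c {} {} {}".format(community, sp_ip, oid)


if __name__ == "__main__":
    print(snmp_proxy_directive("192.0.2.10", ".1.3.6.1.4.1.674.10892.2"))
```

Keeping the OID as a parameter matches the plan to store it in a per-OEM configuration file rather than hard-coding it in the start-up script.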
- 5. Include IPMI support in anaconda - Work in progress
- Purpose: Set/Retrieve systems management information during install-time
- Use-case: Communicate various stages of the anaconda installation to the Service Processor to aid in debugging installation issues.
- Use-case: Capture screen-shot from Service Processor in case of install-time system crash/hang.
- Development: http://www.redhat.com/archives/anaconda-devel-list/2012-July/msg00101.html
- 6. Publish Service Processor URL and IP via WSMAN - 50%
- Purpose: One-to-Many management consoles are able to launch the service processor management console by retrieving the URL from an OS based agent. Moving this functionality into the OS enables the same feature without the need to install an additional application into the OS.
- Retrieve the IP address and URL of the service processor and expose them via:
- the standard DMTF name-space
- an environment variable for privileged users
- Needs to be dynamic: Any changes to service processor IP address/URL should reflect on the host OS or when queried by wsman
- Can use existing CIM name space: https://sblim.sf.net/wbem/wscim/1/cim-schema/2
- Completed fetch and set environment variable
- Working upstream (openwsman) on exposing variables via WSMAN
- 7. WS-MAN Provider Redirection - Will not be ready for F18
- Purpose: Many management consoles and tools manage hardware via WS-MAN. This requires the addition of a WMI provider from the hardware vendor. Placing a WS-MAN redirection to the service processor’s WS-MAN stack into the OS enables the same feature without the need to install an additional application into the OS
- Development: http://sourceforge.net/mailarchive/message.php?msg_id=29640731
Benefit to Fedora
- Fedora users of servers that contain Service Processors do not have to install additional software dedicated to systems management and can still expect standard pieces of information to be available remotely.
- Assists with debugging system failures (panic, hang, etc.) remotely.
The new features will require:
- Automated loading of ipmi driver where service processor hardware is available.
- Proposed: https://lkml.org/lkml/2012/7/26/278
- Contingency: OpenIPMI already has systemd start-up script that loads the drivers when enabled.
- One start-up script that will run after ipmi drivers have loaded to:
- fetch service processor IP address/URL
- set OS name, version in the service processor
- setup snmpd.conf for redirection
- A configuration file that accompanies the start-up script that contains:
- snmpd OID of the service processor (will differ for each OEM)
- systemd already has support for hardware watchdog, but we will require that ipmi_watchdog driver is loaded and does not conflict with iTCO_wdt on systems that have both watchdog hardware.
- This is not Dell specific and will work on any system with an IPMI compliant Service Processor
- Patch /usr/libexec/openipmi-helper to set systemd watchdog if /etc/sysconfig/ipmi:IPMI_WATCHDOG=yes
- Support for IPMI (driver and freeipmi) in Anaconda
- systemd already supports hardware watchdog.
How To Test
- Install Fedora on test machine with service processor
- Publish OS information to Service Processor
- Service Processor should provide the OS Version and Name via various supported interfaces
- Heartbeat to Service Processor
- With the watchdog daemon configured, a kernel panic or system crash should cause the system to reboot after the configured timeout, and the Service Processor should record a snapshot of the crash and/or an entry in the SEL.
- Retrieve log from Service Processor
- syslog should contain IPMI SEL events logged by ipmievd
- Support for redirection of SNMP
- After configuration of /etc/snmpd.conf, snmp queries to Fedora with the Service Processor OID should succeed and return correct values that would otherwise be retrieved via the Service Processor's snmp agent.
- Include IPMI support in anaconda
- During install time, we should have access to ipmitool or freeipmi commands that can be used via kickstart's pre-install section.
- Publish Service Processor URL and IP via WSMAN
- Get Service Processor URL and IP by querying the OS's wsman server.
- Easier management of larger networks using industry standard technologies.
- Minimal intervention from user in configuring Service Processor access details in the OS.
- Document with script-lets on how similar functionality can be achieved.
- Better management of Fedora system remotely via Service Processor
- On systems that contain IPMI compliant Service Processors, it is now possible to have closer integration of OS and Service Processor without the need for 3rd party software. This will enable better management of the system remotely.