Mass Upgrade Infrastructure SOP

From FedoraProject

(Difference between revisions)
(Add proper category.)
(redirect page to new infra-docs)
 
(20 intermediate revisions by 3 users not shown)
Line 2:
 
{{shortcut|ISOP:UPGRADES}}
  
Every once in a while, we need to apply mass upgrades to our servers for security fixes and other updates.
 
  
== Contact Information ==

Owner: Fedora Infrastructure Team

Contact: #fedora-admin, sysadmin-main, fedora-infrastructure-list@redhat.com
  
Location: Phoenix
 
 
Servers: all
 
 
Purpose: Apply kernel/other upgrades to all of our servers
 
 
== Preparation ==
 
 
# Follow the [[Outage Infrastructure SOP]] and send advance notification to fedora-infrastructure-list and fedora-devel-announce. Try to schedule the update for a time when many admins are around to help and watch for problems.
# Plan an order for rebooting the machines, considering two factors:
#* Location of systems on the xen clusters. You will normally reboot all systems on a cluster together.
#* Impact of systems going down on other services, operations and users. Because the database and NFS servers are the backbone of many other systems, they and the systems that share their xen hosts should be rebooted before other boxes.
# Switch DNS in advance so that it points to PHX only. This allows the external proxy servers to be rebooted without causing downtime.
# Schedule downtime in nagios.
# Make doubly sure that the various app owners are aware of the reboots.
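
The downtime scheduling step can be scripted. The following is a minimal sketch, assuming downtime is submitted through the Nagios external command file using the <code>SCHEDULE_HOST_DOWNTIME</code> command. The function name <code>nagios_downtime_cmd</code>, the author name, and the command-file location are illustrative assumptions, not part of this SOP.

```shell
# Build one SCHEDULE_HOST_DOWNTIME line for the Nagios external command file.
# Arguments: host, start (epoch seconds), duration (seconds), author, comment,
#            and optionally the submission time (defaults to now).
# nagios_downtime_cmd is a hypothetical helper name.
nagios_downtime_cmd() (
    host="$1"; start="$2"; duration="$3"; author="$4"; comment="$5"
    now="${6:-$(date +%s)}"
    end=$((start + duration))
    # Fields: host;start;end;fixed=1;trigger_id=0;duration;author;comment
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$now" "$host" "$start" "$end" "$duration" "$author" "$comment"
)
```

To use it, append the output to the file named by <code>command_file</code> in the local nagios.cfg. That path is site-specific, so check it before relying on this.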
 
 
== Staging ==
 
 
Any updates that can be tested in staging or a pre-production environment should be tested there first. This includes new kernels, updates to core database applications and libraries, web applications, and other libraries.
 
 
== Minimizing Downtime ==
 
 
To minimize downtime as much as possible, the following main servers (and thus their respective xen hosts) should probably be rebooted first.  Note that the xen servers may change from update to update.
 
 
* db1
 
* db2
 
* db3
 
* nfs1
 
* cvs1
 
* proxy2 (the proxy server for all PHX machines)
 
* kojipkgs1
 
* secondary1
 
* fas1 (minor, only absolutely needed for certificate generation)
 
* torrent1
 
* hosted1
 
* people1
 
 
When rebooting servers, try to avoid having all of the machines in any of these groups down at the same time.
 
 
* proxy1, proxy2
 
* app1, app2, app3, app4
 
* fas1, fas2
 
* memcached1, memcached2
 
* bastion1, bastion2 (these use heartbeat, but they will probably cause VPN blips on rebooting)
 
* koji1, koji2 (also on heartbeat)
 
* ns1, ns2
 
 
External xen hosts can generally be done at any time during this, with the exception of the main machines listed above.
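
Before taking down a batch of machines, the redundancy groups listed above can be checked mechanically. This is a rough sketch only. The group list is copied from this section, while the variable <code>REDUNDANCY_GROUPS</code> and the function <code>fully_down_groups</code> are hypothetical names introduced for illustration.

```shell
# Redundancy groups from the list above, one group per line.
REDUNDANCY_GROUPS='proxy1 proxy2
app1 app2 app3 app4
fas1 fas2
memcached1 memcached2
bastion1 bastion2
koji1 koji2
ns1 ns2'

# Print every group whose members would all be down at the same time.
# Arguments: the hosts in the proposed reboot batch.
# Exit status: 0 if no group is fully covered, 1 otherwise.
fully_down_groups() (
    batch=" $* "
    status=0
    while read -r group; do
        all_down=1
        for member in $group; do
            case "$batch" in
                *" $member "*) ;;      # this member is in the batch
                *) all_down=0 ;;       # at least one member stays up
            esac
        done
        if [ "$all_down" -eq 1 ]; then
            echo "$group"
            status=1
        fi
    done <<EOF
$REDUNDANCY_GROUPS
EOF
    exit "$status"
)
```

For example, <code>fully_down_groups db1 proxy1 fas1 fas2</code> prints <code>fas1 fas2</code> and returns a non-zero status, which signals that the batch should be split.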
 
 
== Doing the upgrade ==
 
 
To aid in organizing a mass reboot with many people helping, it may help to create a checklist of machines in a gobby document.
 
 
In the order determined above, reboots will usually be grouped by the xen hosts that the servers are on.  For each xen host, log in to each guest and upgrade it:
 
 
<pre>
xm console guestname          # attach to the guest console and log in
yum update                    # review the list of updates; ask if anything looks likely to break
w                             # ping anyone who is logged in so they are not kicked off unexpectedly
grep default /etc/grub.conf   # confirm that the upgraded kernel is the default boot entry
shutdown -h now
</pre>
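
The <code>grep default</code> check only shows the index of the default entry, so it still has to be matched against the <code>title</code> lines by eye. The sketch below resolves the index to its title. It assumes a GRUB legacy style <code>grub.conf</code> with a <code>default=N</code> line and one <code>title</code> line per entry; <code>default_grub_title</code> is a hypothetical helper name.

```shell
# Print the title of the boot entry selected by "default" in grub.conf.
# Argument: path to the config file (defaults to /etc/grub.conf).
# Exits non-zero if the default index has no matching title line.
default_grub_title() (
    conf="${1:-/etc/grub.conf}"
    awk '
        BEGIN { want = 0; n = 0 }   # GRUB legacy boots entry 0 if no default is set
        /^default[= \t]/ { sub(/^default[= \t]+/, ""); want = $0 + 0 }
        /^title[ \t]/    { sub(/^title[ \t]+/, ""); titles[n++] = $0 }
        END { if (want in titles) print titles[want]; else exit 1 }
    ' "$conf"
)
```

Running <code>default_grub_title</code> on a guest after <code>yum update</code> should print the title of the newly installed kernel. If it prints an older one, fix the <code>default</code> line before shutting down.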
 
 
This is also a good time to double check that each xen guest has a proper symlink in /etc/xen/auto if it should be started automatically.  When the guests are done, double check that no guests are running, then reboot the xen host.
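
The symlink check can be sketched as a small helper. It assumes that each autostart symlink in <code>/etc/xen/auto</code> is named after its guest, which may not match every host; <code>missing_autostart</code> is a hypothetical name.

```shell
# Print each guest that has no symlink in the autostart directory.
# Arguments: autostart directory, then the guest names to check.
missing_autostart() (
    auto_dir="$1"
    shift
    for guest in "$@"; do
        # -L is true only if the path exists and is a symbolic link
        [ -L "$auto_dir/$guest" ] || echo "$guest"
    done
)
```

For example, <code>missing_autostart /etc/xen/auto db1 nfs1</code> lists whichever of the two guests lacks a symlink. Running <code>xm list</code> on the host afterwards shows whether any guests are still running before the reboot.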
 
 
== Aftermath ==
 
# Make sure that everything is running fine.
# Re-enable nagios notifications as needed.
# Perform any manual post-boot setup (such as loading SSH keys for transifex or entering passphrases for encrypted volumes).
 
  
 
[[Category:Infrastructure SOPs]]

Latest revision as of 18:28, 19 December 2011

Shortcut:
ISOP:UPGRADES


This SOP has moved to the Fedora Infrastructure SOP git repo. Please see the current document at: http://infrastructure.fedoraproject.org/infra/docs/massupgrade.txt

For changes, questions or comments, please contact anyone in the Fedora Infrastructure team.