Fedora Cloud Infrastructure SOP
Fedora Cloud computing
Owner: Fedora Infrastructure Team
Contact: #fedora-admin, sysadmin-cloud group
Persons: mmcgrath, SmootherFrOgZ, G
Location: Phoenix ?
Servers: capp1.fedoraproject.org, cnode[1-5].fedoraproject.org, store[1-4]
Purpose: Provide virtual machines for Fedora contributors.
Rebuild capp1 (ovirt-server)
Log into cnode1
Check that no capp1 domain is running
sudo virsh list
If a capp1 domain is running, destroy and undefine it:
sudo virsh destroy capp1
sudo virsh undefine capp1
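The check-then-remove sequence above can be scripted. What follows is only a minimal sketch, assuming the default tabular output of virsh list (Id, Name and State columns). The helper name domain_is_listed is hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh list` output on stdin and succeeds
# if a domain with the given name appears in the Name column.
domain_is_listed() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage on cnode1 (commented out so the helper can be sourced safely):
# if sudo virsh list | domain_is_listed capp1; then
#     sudo virsh destroy capp1     # force the running domain off
#     sudo virsh undefine capp1    # remove its libvirt definition
# fi
```

Matching on the whole Name column, rather than grepping for the string, avoids a false hit on a domain whose name merely contains "capp1".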
Format the capp1 disk so that the new virtual install starts from a clean filesystem:
sudo /sbin/mkfs.ext3 -j /dev/VolGroup00/appliance1
You can now start installing a fresh capp1 virtual system:
sudo virt-install -n capp1 -r 1024 --vcpus=2 --os-variant fedora11 --os-type linux \
    -l http://mirrors.kernel.org/fedora/releases/11/Fedora/x86_64/os/ \
    --disk="path=/dev/VolGroup00/appliance1" --nographics --noacpi --hvm --network=bridge:br2 \
    --accelerate -x "console=ttyS0 ks=http://infrastructure.fedoraproject.org/rhel/ks/fedora ip=220.127.116.11 netmask=255.255.254.0 gateway=18.104.22.168 dns=22.214.171.124"
Note: If the network setup fails during the install prompts, just configure it manually. NetworkManager will take care of it from there.
Note 2: The kickstart file above appears to use the graphical install method. Build a new kickstart file or do a manual install to continue.
The capp1 network interfaces need to be set up manually so that they work against the physical ones.
To proceed, create the network interface file:
sudo vi /etc/sysconfig/network-scripts/ifcfg-eth1
Then add the following configuration to the file:
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
PEERNTP=yes
IPADDR=$physical_br_IP
NETMASK=$physical_br_NETMASK
HWADDR=$random_mac_addr
Repeat the above for eth3 against br3.
You can get the br? IP and netmask on cnode1 with the ifconfig command.
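The interface file described above has the same shape for each device, so it can be generated instead of typed by hand. The sketch below is one possible way to do that. The function names random_mac and render_ifcfg are hypothetical, and the 52:54:00 prefix (the one conventionally used for QEMU/KVM guests) is an assumption, since the procedure only asks for a random MAC address.

```shell
#!/bin/sh
# Print a random MAC address with the 52:54:00 prefix conventionally
# used for QEMU/KVM guests (the prefix choice is an assumption).
random_mac() {
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "52:54:00:%s:%s:%s\n", $1, $2, $3 }'
}

# Print an ifcfg file body for one interface.
# Arguments: device name, IP address, netmask, MAC address.
render_ifcfg() {
    cat <<EOF
DEVICE=$1
BOOTPROTO=static
ONBOOT=yes
PEERNTP=yes
IPADDR=$2
NETMASK=$3
HWADDR=$4
EOF
}

# Usage on capp1 (values come from the matching bridge on cnode1):
# render_ifcfg eth1 "$physical_br_IP" "$physical_br_NETMASK" "$(random_mac)" |
#     sudo tee /etc/sysconfig/network-scripts/ifcfg-eth1
```

Printing to standard output and piping through sudo tee keeps the functions free of side effects, so the result can be reviewed before it is written.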
VMs can't receive tasks anymore
If for some reason a VM no longer appears to receive tasks, it is because the VM is not reachable anymore.
Reloading your browser will show the VM in the <unreachable> state in the ovirt UI.
First, check that the hosts are still available. If they are, you will have to manually restart ovirt taskomatic to restore its connection to the broker.
Log in to capp1 and restart the services in this order (wait a few seconds after each):
sudo service ovirt-taskomatic restart
If you get an error like the one below in taskomatic.log:
ERROR Wed Nov 11 18:46:11 +0000 2009 (3382) Task action processing failed: RuntimeError: No agent responded within timeout period
Restart the qpid service first (the taskomatic process will die, which is normal):
sudo service qpidd restart
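The decision above (restart qpidd only when the timeout error shows up in the log) can be sketched as a small check. The helper name has_agent_timeout is hypothetical, and the log path is passed as an argument because its location is not stated in this SOP. Restarting ovirt-taskomatic after qpidd is an assumption drawn from the word "first" above, and the pause length is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log file contains the
# "No agent responded" timeout error quoted in this SOP.
has_agent_timeout() {
    grep -q 'No agent responded within timeout period' "$1"
}

# Usage on capp1 (the log path is a placeholder, not a known location):
# if has_agent_timeout /path/to/taskomatic.log; then
#     sudo service qpidd restart
#     sleep 5                                # pause length is an assumption
#     sudo service ovirt-taskomatic restart  # bring taskomatic back up
# fi
```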
Cnodes or VMs are unreachable
If cnode(s) or VMs become unreachable, there are a couple of ways to figure out what is going on.
- 0. First off, logs are always useful (specifically db-omatic.log and task-omatic.log).
- 1. Check that the cnode(s) or VMs are still "physically" reachable.
- 2. If the cnode(s) are reachable but the VMs are not, check whether libvirt-qpid and libvirtd are still running on the related cnode(s).
If they are not:
sudo service libvirt-qpid start
sudo service libvirtd start
- 3. If both the cnode and the VM are still up and running, the cause could be a timeout on the qmf connection, or db-omatic may have died for no apparent reason. The best way to fix this is to restart the ovirt qmf/qpid processes as follows:
sudo service qpidd restart
sudo service ovirt-db-omatic restart
sudo service ovirt-taskomatic restart
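Because the order of these restarts matters, it can help to wrap them in a loop that stops at the first failure. This is only a sketch: restart_in_order, SERVICE_CMD and PAUSE are hypothetical names introduced here, and the 5-second default pause is an assumption.

```shell
#!/bin/sh
# Restart services one at a time, pausing between them.
# SERVICE_CMD and PAUSE are hypothetical knobs for this sketch;
# SERVICE_CMD defaults to "sudo service" as used in this SOP.
restart_in_order() {
    for svc in "$@"; do
        # Unquoted on purpose so "sudo service" splits into two words.
        ${SERVICE_CMD:-sudo service} "$svc" restart || return 1
        sleep "${PAUSE:-5}"
    done
}

# Usage on capp1:
# restart_in_order qpidd ovirt-db-omatic ovirt-taskomatic
```

Stopping on the first failed restart avoids starting db-omatic or taskomatic against a broker that did not come back.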