HPUX ServiceGuard Cluster Manager
HPUX ServiceGuard Cluster Commands and “Patching” example:
First off, read the man pages. There aren’t that many, and they aren’t that long:
ls -1 /usr/sbin/cm* | cut -d'/' -f 4 | xargs -l man | col -b >> clustermanpages
(See article on how to dump man pages to Word also.)
A node is a computer and a package is the application that runs on a node.
The cluster is two or more systems that see some of each other's disks and can run each other's applications.
The application, or package, can only be run on one node at a time. All of the cmxxx commands can be run on either node and you will get the exact same output regardless of which node you issue the command on.
cmhaltnode is very friendly and safe according to the man pages. Run on its own, it won’t halt the node if any packages are running on it. If you use the -f option, it will halt the packages first and then they will start on the other node (if failover is set). If a package fails to halt, then cmhaltnode will fail rather than stop a node with a package still running on it.
Moving packages - There isn’t really a “move” command. The man pages say to move a package like this: (say from node 1 to 2)
cmhaltpkg package1
cmrunpkg -n node2 package1
cmmodpkg -e package1
(Note: cmhaltpkg will operate on the package no matter what node it is running on. cmrunpkg without the -n option starts the package on the node where you run the command, so give -n when the target is a different node.)
(See note below for purpose of cmmodpkg)
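The three commands above can be wrapped in a small helper so that a failed halt stops the move instead of being missed. This is only a sketch: move_pkg is a made-up name, not a ServiceGuard command, and it assumes the cm commands return a non-zero exit status on failure, which is worth confirming in the man pages for your release.

```shell
# move_pkg is a hypothetical helper name, not a ServiceGuard command.
# Usage: move_pkg <package> <target_node>
# Assumption: cmhaltpkg, cmrunpkg and cmmodpkg exit non-zero on failure.
move_pkg() (
    pkg=$1
    node=$2
    if [ -z "$pkg" ] || [ -z "$node" ]; then
        echo "usage: move_pkg <package> <target_node>" >&2
        exit 2
    fi
    # Halt the package wherever it is currently running.
    cmhaltpkg "$pkg" || { echo "halt of $pkg failed, not starting it on $node" >&2; exit 1; }
    # Start it on the target node.
    cmrunpkg -n "$node" "$pkg" || { echo "start of $pkg on $node failed" >&2; exit 1; }
    # cmhaltpkg disabled package switching, so turn it back on.
    cmmodpkg -e "$pkg"
)
```

For example, `move_pkg package1 node2` runs the same sequence as the three lines above. The body runs in a subshell so its variables do not leak into your login shell.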
cmhaltpkg may confuse you. When it is run, the cluster knows it and assumes you meant what you said, so it will not “fail over” the package to another node. Failover only happens when the package fails for other reasons, or if you use cmhaltnode with the -f option, which does allow packages to move over.
Because of this, after moving a package manually you need to reenable package switching for the package with “cmmodpkg -e packagename”, since cmhaltpkg disabled package switching for that package. You can see the status of package switching with cmviewcl.
One issue is Failback. If it is set to “auto” on any package, it could present a problem. When your node comes back up, packages may unexpectedly fall back onto it.
Run:
cmviewcl -v | grep Failback
This will list all of the Failback settings. If they all say “manual” then you are in good shape, because a package will not move back to its primary node without manual intervention.
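If there are a lot of packages, it is easier to print only the settings that are NOT manual. A minimal sketch, assuming the verbose output has lines that start with the word Failback followed by the value (check this against your own cmviewcl -v output); failback_not_manual is a made-up name:

```shell
# failback_not_manual is a hypothetical helper name.
# Reads `cmviewcl -v` output on stdin and prints every Failback line whose
# value is not "manual". Assumes lines of the form "Failback <value>".
failback_not_manual() {
    awk '$1 == "Failback" && $2 != "manual"'
}

# Typical use on a cluster node:
#   cmviewcl -v | failback_not_manual
```

No output means every package is set to manual.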
It is possible to use cmmodpkg to tell the packages that they may not move to a given node, which would help you in this case. See the man page for cmmodpkg.
Before doing anything crazy, you should do a cmviewcl and a cmviewcl -v and copy down the info. That way you can see how things were set up before.
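Instead of copying the info down by hand, you can save both views to files. This is a sketch only: save_cluster_state is a made-up name, and the /tmp default and the file names are arbitrary choices.

```shell
# save_cluster_state is a hypothetical helper name.
# Usage: save_cluster_state [directory]   (directory defaults to /tmp)
# Writes the short and verbose cluster views to timestamped files and
# prints the two file names.
save_cluster_state() (
    dir=${1:-/tmp}
    stamp=$(date +%Y%m%d%H%M%S)
    cmviewcl    > "$dir/cmviewcl.$stamp"   || exit 1
    cmviewcl -v > "$dir/cmviewcl-v.$stamp" || exit 1
    echo "$dir/cmviewcl.$stamp $dir/cmviewcl-v.$stamp"
)
```

Afterwards you can diff the saved verbose file against a fresh cmviewcl -v to spot anything that did not go back the way it was.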
When you are done, do a cmviewcl -v and make sure “PKG_SWITCH” is enabled for all packages. It is possible for the cmhaltpkg to disable this on some of them, and for you to forget to put it back.
Here are some tests. These should come back with nothing:
cmviewcl -v | grep disabled
cmviewcl -v | grep down
These should come back with everything:
cmviewcl -v | grep enabled
cmviewcl -v | grep up
Check to see that everyone is running on their primary server:
cmviewcl -v | grep Primary
Run cmviewcl -v through more also, and just look to see that it all looks right.
You should find out from the application contact what order the packages should go down and come up in. 99.9% of the time the order does NOT matter. Common sense says they come up 1, 2, 3 and go down 3, 2, 1. On this box pkgftp* can go down anytime, and pkg01-05 go down in reverse order and come up in order.
cmviewcl
CLUSTER            STATUS
clustername-cl5    up
  NODE       STATUS    STATE
  server1    up        running
    PACKAGE    STATUS    STATE      PKG_SWITCH    NODE
    pkg01      up        running    enabled       server1
    pkg02      up        running    enabled       server1
    pkg03      up        running    enabled       server1
    pkg04      up        running    enabled       server1
    pkg05      up        running    enabled       server1
    pkgftp1    up        running    enabled       server1
  NODE       STATUS    STATE
  server2    up        running
    PACKAGE    STATUS    STATE      PKG_SWITCH    NODE
    pkg06      up        running    enabled       server2
    pkg07      up        running    enabled       server2
    pkg08      up        running    enabled       server2
    pkg09      up        running    enabled       server2
    pkg10      up        running    enabled       server2
    pkgftp2    up        running    enabled       server2
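With output in this shape, a one-line awk filter can list just the packages whose PKG_SWITCH is not enabled. This is a sketch that assumes the five-column package rows shown above (PACKAGE STATUS STATE PKG_SWITCH NODE); other releases may label or order the columns differently. pkgs_switch_disabled is a made-up name.

```shell
# pkgs_switch_disabled is a hypothetical helper name.
# Reads plain `cmviewcl` output on stdin and prints "package node" for every
# package row whose PKG_SWITCH column is not "enabled".
# Assumes package rows have exactly five columns:
#   PACKAGE STATUS STATE PKG_SWITCH NODE
pkgs_switch_disabled() {
    awk 'NF == 5 && $1 != "PACKAGE" && $4 != "enabled" { print $1, $5 }'
}

# Typical use on a cluster node:
#   cmviewcl | pkgs_switch_disabled
```

Cluster and node rows have fewer columns, so they are skipped. No output means every package has switching enabled.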
So, as root from any member of the cluster:
(I suggest that you do a cmviewcl between these commands periodically to make sure that what you expect to happen is actually happening.)
cmhaltpkg pkgftp1
cmrunpkg -n server2 pkgftp1
(Better to do it this way than just downing the node, unless you really feel cool, in which case you could technically just do a ‘cmhaltnode -f server1’ and all packages should move over to server2 automatically, if failover is set. Note that cmhaltcl is a different command that halts the whole cluster. See the man pages.)
cmhaltpkg pkg05
cmrunpkg -n server2 pkg05
cmhaltpkg pkg04
cmrunpkg -n server2 pkg04
cmhaltpkg pkg03
cmrunpkg -n server2 pkg03
(You can hardly mess up these commands; they will complain if you tell them to do the wrong thing.)
cmhaltpkg pkg02
cmrunpkg -n server2 pkg02
cmhaltpkg pkg01
cmrunpkg -n server2 pkg01
(Some boxes can take 30 minutes per package! Some boxes take 45 minutes to move stuff over!)
(The command will hang there until it is done moving the package, so that is one good reason to do them one at a time.)
(Remember, the order may be important, so ask the app contact ahead of time!)
cmmodpkg -e pkg01 pkg02 pkg03 pkg04 pkg05 pkgftp1
(Turns the enable for failover back on for all packages.)
(This should be done if there is a 3 node cluster, so that they can fail to node 3, otherwise they cannot fail over to anywhere after this)
(With switching reenabled, if the other node goes down after your patched node comes back up, the packages can fail over to the patched node before you move them yourself. My trainer had this happen once: his patched server came back up and then packages suddenly came to it, because another box failed on him during the patching window.)
(The cmhaltpkg command automatically disables the “pkg_switch” option, as you will see in a cmviewcl display. This is covered in the man page for cmhaltpkg. The concept is that if you halt a package manually, you don’t want it to go starting up anywhere; you want it to stay halted, or to start where you put it and stay there.)
(Also, you may get patrol alerts if the pkg_switch is disabled.)
cmhaltnode server1
cmviewcl
(It may take a few minutes for the cluster to finish reforming (it reads some files) before cmviewcl shows things properly.)
shutdown -y 0, or whatever it is you need to do.
After the server comes back up, the cluster software on it will come up by itself (remember, you halted it), but the packages should not move back over to it UNLESS Failback is set to automatic for a package and its PKG_SWITCH is enabled.
(Some clients let the SA edit the pkg files, some clients don’t want the SA to mess with them at all.)
When moving packages BACK to their home server, you can use a little trick. Just cmhaltpkg the package, which halts it and disables the PKG_SWITCH, then do a ‘cmmodpkg -e package’ and it will automatically start back up on the package’s primary server. You get to skip the ‘cmrunpkg -n server package’ command for each package.
cmhaltpkg pkgftp1
cmmodpkg -e pkgftp1
cmhaltpkg pkg05
cmmodpkg -e pkg05
cmhaltpkg pkg04
cmmodpkg -e pkg04
(Only downside here is the cmmodpkg comes back before the package has started, unlike the cmrunpkg, so you have to use cmviewcl to see when the package has started up.)
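Rather than rerunning cmviewcl by hand, you can poll for the package. This is a sketch, not a ServiceGuard feature: wait_for_pkg is a made-up name, the defaults of 60 checks at 30 second intervals are arbitrary, and it assumes the five-column package rows (PACKAGE STATUS STATE PKG_SWITCH NODE) from the plain cmviewcl output.

```shell
# wait_for_pkg is a hypothetical helper name.
# Usage: wait_for_pkg <package> [max_checks] [sleep_seconds]
# Polls plain `cmviewcl` until the package row reads "up running".
wait_for_pkg() (
    pkg=$1
    tries=${2:-60}
    pause=${3:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumed row layout: PACKAGE STATUS STATE PKG_SWITCH NODE
        node=$(cmviewcl | awk -v p="$pkg" '$1 == p && $2 == "up" && $3 == "running" { print $5 }')
        if [ -n "$node" ]; then
            echo "$pkg is up on $node"
            exit 0
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "$pkg did not come up after $tries checks" >&2
    exit 1
)
```

For example, `cmhaltpkg pkg05; cmmodpkg -e pkg05; wait_for_pkg pkg05` returns once pkg05 shows as running, and also tells you which node it landed on.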
You can also do:
cmhaltpkg pkg03 pkg02; cmmodpkg -e pkg03 pkg02
to save a little typing.
(One cool note, if you are patching or upgrading boxes, moving all of the packages to server1 after you patched it and before you patch server2, and then testing the application gives you a quick and dirty real world test of whether whatever you did will break the application before you upgrade server2. If it does break the application right away, then you have server2 still in pre-broken state, so you can just move things there while you roll back server1.)
