Lofland bLOG

Unix Notes

I am currently a Unix System Administrator by vocation. I work primarily with SUN Solaris, although I dabble in Linux, HPUX and AIX (in that order). These are reference notes that I have taken in my studies of and work with Unix and Linux. Very little of it is original with me. I have just put it here so that I can easily reference it. If it is also of help to you, wonderful! Please leave a comment if you see a problem or can improve on something.

HPUX Commands

Filed under Unix Notes on Wednesday, May 2nd, 2007 @ 9:50am by Christen

HPUX single user mode.

When it is booting, wait for the “hit any key” prompt and press a key to interrupt the boot.

mount -a : mounts everything, which may be important if in single user mode, which mounts only /

Use these commands to see what boot device you are using/should be using:
setboot
lvlnboot -v

Quick SUN Solaris Disk commands

Filed under Unix Notes on Wednesday, May 2nd, 2007 @ 9:45am by Christen

Commands to see what you have for disks on Solaris:
vxdisk list
vxprint
print|format
iostat -En
luxadm display FCloop
These are all “harmless” commands that just display info.

SSH to SSH slow

Filed under Unix Notes on Monday, April 23rd, 2007 @ 11:37am by Christen

I have noticed that sometimes when you ssh into one system, and then try to ssh into another system from there, it hangs for a while during the login.

ssh hostname

If you TELNET into the first server and then SSH from there, it is MUCH faster!

using ssh -v hostname showed me that it was hanging when trying to set up X11 forwarding, so a quick fix was like this:

ssh -x hostname

And it should go much faster. This is especially noticeable in for loops like this:

for i in $(cat SystemList);do ssh -x $i 'hostname;cat /etc/passwd' >> output;done;less output;rm output

Solaris will not boot due to disk errors

Filed under Unix Notes on Wednesday, March 21st, 2007 @ 8:25am by Christen

when Solaris goes into single user mode due to disk problems, just type:

fsck -y

do that over and over until it is happy, then exit

if it is still not happy, it will dump you to single again, so fsck -y again

eventually it will be happy and finish booting.

Now if it boots ok, another reboot is probably in order, for a test.
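The “over and over” part could be scripted. This is only a rough sketch: repeat_until_clean is a made-up name, the device path in the comment is a placeholder, and it assumes fsck exits 0 only once the filesystem is clean (check man fsck on your release before trusting that).

```shell
#!/bin/sh
# repeat_until_clean MAX CMD...: rerun CMD until it exits 0 or MAX passes
# have been used up. (Hypothetical helper, not a standard tool.)
repeat_until_clean() {
    max=$1; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "clean after $attempt pass(es)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "still failing after $max passes" >&2
    return 1
}

# Example with a placeholder device (not run here):
#   repeat_until_clean 5 fsck -y /dev/rdsk/c0t0d0s0
```

Capping the number of passes keeps a disk that is really dying from looping forever.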

Use awk to combine lines

Filed under Unix Notes on Tuesday, March 20th, 2007 @ 2:34pm by Christen

I used this script to find out what version of Sendmail was on each server by package:

for i in $(cat SystemList);do ssh $i 'hostname;pkginfo -l Sendmail | grep VERSION:;pkginfo -l OtherSendmail | grep VERSION:' >> output;done;cat output

What output looks like is this:

hostname1
VERSION: 8.12.10
hostname2
VERSION: 8.13.7
hostname3
VERSION: 8.13.8
hostname4
VERSION: 8.13.8
hostname5
VERSION: 8.13.8

This is hard to parse, I wanted it on one line.

Per this web page: http://unix-simple.blogspot.com/2006/12/awk-script-to-combine-lines-in-file.html

I modified the code slightly and ran this:

cat output | awk '{d = (d == "" ? $0 : d " " $0)}
/VERSION/ {
print d
d=""
}'

and got this:

hostname1 VERSION: 8.12.10
hostname2 VERSION: 8.13.7
hostname3 VERSION: 8.13.8
hostname4 VERSION: 8.13.8
hostname5 VERSION: 8.13.8

VERY cool, and easy to parse, and even to stick into Excel. :)
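The same join can be wrapped in a small shell function so the marker pattern becomes a parameter. This is a sketch only: join_until is a made-up name and the sample input is invented to match the shape of the output file above.

```shell
#!/bin/sh
# join_until PATTERN: read lines on stdin, glue them together with spaces,
# and print the glued line whenever a line matches PATTERN.
# (join_until is a made-up name for this sketch, not a standard tool.)
join_until() {
    awk -v pat="$1" '
        { d = (d == "" ? $0 : d " " $0) }   # append the current line
        $0 ~ pat { print d; d = "" }        # marker seen: flush and reset
    '
}

# Sample input in the same shape as the output file above.
printf '%s\n' hostname1 'VERSION: 8.12.10' hostname2 'VERSION: 8.13.7' |
    join_until VERSION
```

With the real file it would be run as join_until VERSION < output.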
____

Make it comma delimited:
for i in $(cat SystemList);do ssh $i 'uname -n;echo ,;crontab -l|grep -i SEARCHTEXT1;echo ,;crontab -l|grep -i SEARCHTEXT2;echo done'>>output;done;less output

cat output | awk '{d=d""$0}
/done/ {
print d
d=""
}'

Move root’s home dir from / to /root

Filed under Unix Notes on Tuesday, March 20th, 2007 @ 2:33pm by Christen

Here are the steps (maybe to script soon):
Open a console connection to the box and log into it. Just leave it there for emergency.
from SSH login
cd
pwd
mkdir /root
ls -la
rm .profile.orig
mv .forward .profile .rhosts .sh_history .ssh .Xauthority /root/
mail -f mbox
mail -f mbox
rm mbox
ls -la
vi /etc/passwd (change root’s home directory from / to /root)
SSH in again and see if it works.
Log off and back on at console to make sure it works.
pwd
whoami

Expect

Filed under Unix Notes on Tuesday, March 20th, 2007 @ 9:56am by Christen

Expect is really cool, and you can use autoexpect to MAKE an expect script from a session, like a macro recorder!
http://www.linuxjournal.com/article/3065
http://www.oreilly.com/catalog/expect/chapter/ch03.html
Here is my script to just telnet to port 25 to see the Sendmail version:

set timeout 10
spawn telnet $argv 25
match_max 100000
expect -exact "sendmail"
send -- "quit\r"
expect eof

It was made with autoexpect and then edited, there is some more stuff in the file autoexpect spit out.
Here is how I loop it:
for i in $(cat MissingList);do echo $i >> output;test2.exp $i | grep -i sendmail >> output;done;cat output

Solaris Patch Return Codes

Filed under Unix Notes on Tuesday, January 30th, 2007 @ 11:58am by Christen

#               0       No error
#               1       Usage error
#               2       Attempt to apply a patch that’s already been applied
#               3       Effective UID is not root
#               4       Attempt to save original files failed
#               5       pkgadd failed
#               6       Patch is obsoleted
#               7       Invalid package directory
#               8       Attempting to patch a package that is not installed
#               9       Cannot access /usr/sbin/pkgadd (client problem)
#               10      Package validation errors
#               11      Error adding patch to root template
#               12      Patch script terminated due to signal
#               13      Symbolic link included in patch
#               14      NOT USED
#               15      The prepatch script had a return code other than 0.
#               16      The postpatch script had a return code other than 0.
#               17      Mismatch of the -d option between a previous patch
#                       install and the current one.
#               18      Not enough space in the file systems that are targets
#                       of the patch.
#               19      $SOFTINFO/INST_RELEASE file not found
#               20      A direct instance patch was required but not found
#               21      The required patches have not been installed on the manager
#               22      A progressive instance patch was required but not found
#               23      A restricted patch is already applied to the package
#               24      An incompatible patch is applied
#               25      A required patch is not applied
#               26      The user specified backout data can’t be found
#               27      The relative directory supplied can’t be found
#               28      A pkginfo file is corrupt or missing
#               29      Bad patch ID format
#               30      Dryrun failure(s)
#               31      Path given for -C option is invalid
#               32      Must be running Solaris 2.6 or greater
#               33      Bad formatted patch file or patch file not found
#               34      The appropriate kernel jumbo patch needs to be installed
#               35      Later revision already installed

Show Installed patches on Solaris

Filed under Unix Notes on Monday, December 11th, 2006 @ 4:03pm by Christen

showrev -p

Move network cable to another port on Solaris

Filed under Unix Notes on Monday, October 30th, 2006 @ 12:12pm by Christen

Port CE0 on the quad card (NIC) appears to be bad.

Even though the lights were all showing good link status on the card, the box was unpingable. When I put in the ticket with SUN on the NIC, the SUN tech was actually on site. He suggested moving the cable to another port on the same card. After he did this and I plumbed the port and assigned an address to it, connectivity was restored.

So the box is working now, but port CE0 is bad.

Here is how it looks now:

root@hostname: ifconfig -a
lo0: flags=1000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce0: flags=11000802 mtu 1500 index 2
inet 192.168.1.78 netmask ffffff00 broadcast 192.168.1.255
groupname nicgroup
ether …
ce2: flags=1000843 mtu 1500 index 3
inet 192.168.1.78 netmask ffffff00 broadcast 192.168.1.255
ether …

Please be aware that when the NIC is replaced, the network cable needs to be moved back to port CE0.

Also please be aware that if the box is rebooted before the NIC is replaced it will revert to using CE0. You will need to manually bring up CE2 again. There is no reason that I am aware of for the box to be rebooted before the NIC is replaced.

Here are the commands I used to bring up the IP on CE2:

ifconfig ce2 plumb
ifconfig ce2 192.168.1.78
ifconfig ce2 netmask 255.255.255.0 broadcast 192.168.1.255
ifconfig ce2 up
ifconfig ce0 down

Replace Text in Files with Perl

Filed under Unix Notes on Monday, October 30th, 2006 @ 11:58am by Christen

I didn’t write this, just posting it here for my reference:

Subject: UNIX Trick

Here’s a really quick way to edit a bunch of files in a search and replace manner. For example, I had 80 files for sending test transactions and had to change the IP Addr when I copied them to each server. I could have manually edited each one, but by using the following, I was able to change it in all 80 in a matter of about 10 seconds. It’s done using “perl” from the command line and is similar to using global replaces within “vi” if you’ve ever used that.

perl -pi -e 's/wordToFind/replaceWithThisWord/' *.fileExtension
perl -pi -e 's/wordToFind/replaceWithThisWord/g' *.fileExtension
perl -pi -e 's/wordToFind/replaceWithThisWord/gi' *.fileExtension

Remember to escape any special characters like “*”, “.”, etc by putting a “\” in front. Here’s the command I used this morning that shows what I mean:

perl -pi -e 's/192\.168\.1\.75/10\.0\.0\.79/g' *

The above command replaces 192.168.1.75 with 10.0.0.79 in all the files in the directory I was working in. I put the “\” in front of the “.” to instruct perl to ignore the special meaning of the “.”

Useful Links for Unix Admins

Filed under Unix Notes on Monday, October 30th, 2006 @ 10:39am by Christen

Network Calculators:
http://www.subnetmask.info/

Miscellaneous Solaris Notes:
http://www.brandonhutchinson.com/Miscellaneous_Solaris_notes.html

NIC Speed:
http://docs.sun.com/source/816-2128/paramset_2.html

Java slow? Research. Ideas:
http://www.biostat.wustl.edu/archives/html/s-news/2001-02/msg00105.html
http://groups.google.com/group/borland.public.jbuilder.ide/browse_thread/thread/a31006d3ab995820/1924a9c0a5d82e4c?lnk=st&q=java+slow+xwindows&rnum=3&hl=en#1924a9c0a5d82e4c

Php arrays:
http://us3.php.net/manual/en/function.in-array.php

SUN Account PW’s and locking/non-login accts.
http://blogs.sun.com/gbrunett/entry/managing_non_login_and_locked

Filesystem Corruption on a Veritas Disk

Filed under Unix Notes on Monday, October 30th, 2006 @ 10:26am by Christen

I had filesystem corruption on a veritas disk. In order to fix it I had to do this:

You will need to:
1) boot from an alternate disk from the ok prompt: boot otherlocation -s
2) stop the Veritas volumes: vxvol -g rootdg stopall
run vxprint -htrg rootdg and make sure the volumes are disabled
3) fsck -o f -y /dev/rdsk/c#t#d#s0 where c#t#d# is the rootdisk from a vxdisk list
reiterate until it gives no errors, or until it only gives the “FILE SYSTEM STATE IN SUPERBLOCK IS WRONG” error.
root@hostname: vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced appldg01 appldg online
c1t1d0s2 sliced appldg02 appldg online
c1t2d0s2 sliced - - error
c2t0d0s2 sliced rootdisk rootdg online
c2t1d0s2 sliced hotspare rootdg online nohotuse
c3t0d0s2 sliced rootmirror rootdg online
c3t1d0s2 sliced appldg03 appldg online
c3t2d0s2 sliced appldg04 appldg online

- then re-enable the rootvol:
vxvol -g rootdg start rootvol

Then check it again; now it checks the mirrors too:

fsck -o f -y /dev/vx/rdsk/rootdg/rootvol

4) make sure you can mount the root disk by slices: mount /dev/dsk/c#t#d#s0 /mnt
5) umount /mnt
6) reboot off of rootdisk

CORE files on SUN Solaris

Filed under Unix Notes on Wednesday, October 25th, 2006 @ 1:29pm by Christen

I expect to find core dump files in /, and I often do, but also if the system crashes, the system core dumps should be saved under /var/crash

Screen Notes

Filed under Unix Notes on Tuesday, October 3rd, 2006 @ 11:24am by Christen

I found some time recently to play with the screen program.

Telnet into your server.
Now run:
screen -R -T vt100

Now do whatever you want, open a file with less, edit a file in vi, or
start a TUI, or whatever.

Now hit that big X in the upper corner. That is right, just toast your
telnet session, don’t exit out of it, just toast it.

Now log into your server again.
Now run:
screen -R -T vt100

You should be right back where you left from! Screen does not terminate
when your session drops, and it holds whatever programs, or sessions you
have open with it open too. Even ssh sessions to other servers.

I have the line “screen -R -T vt100” as the last line in my .profile on
my server, and it was saving my tail almost daily while I was WFH. Every
time the VPN dropped me, I didn’t have to start over what I was doing, or
wonder just what the patch job did when it got terminated half way
through.

(Technical NOTE: The -R tells screen to reconnect to any unconnected
screen session for the user, and if there isn’t one, start a new one, so
it either starts new, or connects. The -T gives screen the terminal
type. I found that programs like vi went nuts without this.)

Screen does lots of other cool things too. Here are some good sites
about it:
http://gentoo-wiki.com/TIP_Using_screen
http://www.hmug.org/man/1/screen.php
http://www.bangmoney.org/presentations/screen.html

Another cool feature is that two (or more) people can connect to the
SAME screen session. Great for training and collaborating. Since you can
do a telnet/ssh session from screen, there is no need for it to run on
your server, you can just start from your server and go from there. I
haven’t thoroughly tested that feature yet.

Here is my .screenrc file:
hardstatus on
hardstatus lastline
hardstatus string ‘Current:%n %t | %W | %C%A %D, %M %d, %Y’
vbell_msg “BEEP!”
bell_msg ‘BEEP on %n’
vbell off

It gives me a nice status line at the bottom with a list of the open windows within screen and their names. I use some scripts to make sure my new windows get good names when I open them. I haven’t experimented with the status line much, but it could probably be a lot fancier.
MultiUser Mode:

you have to go to the command mode (ctrl a :) and type multiuser on and
then command mode again and type acladd userid where userid is the
person you want to share with and you have to do that for every userid.

SUN Patch Return Codes

Filed under Unix Notes on Tuesday, September 26th, 2006 @ 1:18pm by Christen

Patch Return Codes

The complete list:
#               0       No error
#               1       Usage error
#               2       Attempt to apply a patch that’s already been applied
#               3       Effective UID is not root
#               4       Attempt to save original files failed
#               5       pkgadd failed
#               6       Patch is obsoleted
#               7       Invalid package directory
#               8       Attempting to patch a package that is not installed
#               9       Cannot access /usr/sbin/pkgadd (client problem)
#               10      Package validation errors
#               11      Error adding patch to root template
#               12      Patch script terminated due to signal
#               13      Symbolic link included in patch
#               14      NOT USED
#               15      The prepatch script had a return code other than 0.
#               16      The postpatch script had a return code other than 0.
#               17      Mismatch of the -d option between a previous patch
#                       install and the current one.
#               18      Not enough space in the file systems that are targets
#                       of the patch.
#               19      $SOFTINFO/INST_RELEASE file not found
#               20      A direct instance patch was required but not found
#               21      The required patches have not been installed on the manager
#               22      A progressive instance patch was required but not found
#               23      A restricted patch is already applied to the package
#               24      An incompatible patch is applied
#               25      A required patch is not applied
#               26      The user specified backout data can’t be found
#               27      The relative directory supplied can’t be found
#               28      A pkginfo file is corrupt or missing
#               29      Bad patch ID format
#               30      Dryrun failure(s)
#               31      Path given for -C option is invalid
#               32      Must be running Solaris 2.6 or greater
#               33      Bad formatted patch file or patch file not found
#               34      The appropriate kernel jumbo patch needs to be installed
#               35      Later revision already installed

srsproxy

Filed under Unix Notes on Monday, July 10th, 2006 @ 2:42pm by Christen

srsproxy is part of Sun’s netconnect monitoring software. You can kill -9 it; it will restart on its own.

Same for sh_prv, ssha_pvr_exec & ssh_pvr_runner.sh

inetd

Filed under Unix Notes on Monday, July 10th, 2006 @ 2:35pm by Christen

Inetd monitors ports and when a connection is made, it passes it to a given program.

The configuration is in /etc/inetd.conf

If you edit that file, to get inetd to read it again, just HUP it:

ps -ef|grep inetd
kill -HUP pid
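The two steps can be joined up so you don’t have to copy the pid by hand. A minimal sketch, with pids_matching being a made-up helper name; it only assumes that the PID is the second column of ps -ef, as it is on Solaris. Where pkill exists, pkill -HUP inetd should do the same job.

```shell
#!/bin/sh
# pids_matching NAME: read "ps -ef" style lines on stdin and print the PID
# column (field 2) of lines that mention NAME. Wrapping the first letter in
# brackets keeps the grep process itself out of the results.
# (pids_matching is a hypothetical helper for this sketch.)
pids_matching() {
    first=$(printf '%s' "$1" | cut -c1)
    rest=$(printf '%s' "$1" | cut -c2-)
    grep "[$first]$rest" | awk '{ print $2 }'
}

# Typical use (as root; not run here):
#   kill -HUP $(ps -ef | pids_matching inetd)
```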

Mailman & Sendmail

Filed under Unix Notes on Monday, July 3rd, 2006 @ 11:45am by Christen

Mailman works by using the aliases file to pipe mail sent to certain addresses to a program (mailman).

Probably, if you get it set up right, you’ll see this error in your maillog:

Jul  3 10:44:36 SERVER smrsh: uid 1: attempt to use “mailman post MAILADDRESS” (stat failed)
Jul  3 10:44:36 SERVER sendmail[5555]: NUMBERS: to=”|/folders/mailman/mail/mailman post MAILADDRESS”, ctladdr= (1/0), delay=00:00:01, xdelay=00:00:00, mailer=prog, pri=30539, dsn=5.0.0, stat=Service unavailable

Here is the text from mailman’s docs:

Problem: I use Sendmail as my mail server, and when I send mail to the list, I get back mail saying, “sh:
mailman not available for sendmail programs”.
Solution: Your system uses the Sendmail restricted shell (smrsh). You need to configure smrsh by creating a
symbolic link from the mail wrapper (‘$prefix/mail/mailman’) to the directory identifying executables allowed
to run under smrsh.
Some common names for this directory are ‘/var/admin/sm.bin’, ‘/usr/admin/sm.bin’ or ‘/etc/smrsh’.
Note that on Debian Linux, the system makes ‘/usr/lib/sm.bin’, which is wrong, you will need to create the
directory ‘/usr/admin/sm.bin’ and add the link there. Note further any aliases newaliases spits out will need to
be adjusted to point to the secure link to the wrapper.

and here is the fix:

ln -s /appl/wfa/mailman/mail/mailman /usr/adm/sm.bin/mailman

How to do a screenshot on the Zaurus (and on X in general?)

Filed under Unix Notes on Thursday, June 29th, 2006 @ 9:38am by Christen

This is straight out of the forums at www.oesf.org

Use the following command:
xwd -display :0.0 -root -out screenshot.xwd
this will take a screenshot right away - so often it’s better to use:
sleep 5;xwd -display :0.0 -root -out screenshot.xwd
then use gimp to convert.

If you install ImageMagick you can convert it directly. I use a simple script like this to take the screenshot, convert it and give it a unique name. Adjust the sleep command and the location of where the files go to your liking.
CODE

sleep 10

if test -z "$1"; then
let i=1
f="/mnt/card/screen$i.png"
echo $f
while test -f $f; do
let i=1+$i
f="/mnt/card/screen$i.png"
done

else
f="$1"
fi

xwd -display :0.0 -root -out "$f".xwd
convert "$f".xwd "$f"
rm "$f".xwd

Root’s Shell

Filed under Unix Notes on Tuesday, June 13th, 2006 @ 8:26am by Christen

In the old days when /usr was a separate file system, making your shell /bin/ksh was not a good thing, because /bin was a link to /usr/bin and if /usr didn’t mount you had no shell. However, /sbin is part of root so you always had /sbin.

At least here, the way our filesystem is normally laid out today, /usr is part of / so this isn’t an issue.

I always change the shell to /bin/ksh. Make sure you make it /bin and not /sbin or you will lock yourself out of the box.

So, in short, make sure your shell will be available if ONLY / mounts.

Also, it is a good idea to test that you can log into the box again before terminating the connection you used to make the change.

SUN Solaris & Veritas bug

Filed under Unix Notes on Wednesday, May 31st, 2006 @ 8:00am by Christen

There is a bug in Veritas volume manager 3.5 and greater with an encapsulated rootdisk and ufs logging enabled. We enable ufs logging on systems running Solaris 8, and on Solaris 9 it is enabled by default. The error message you see is below as well as a resolution to this problem. Anyone with systems running Solaris 8 or greater needs to verify that logging is either disabled or that the necessary patch for Veritas volume manager is applied. See the sunsolve article for patch numbers and fixes.

ERROR MESSAGE ON THE CONSOLE
00:19:24: WARNING: Error writing ufs log state
00:19:32: WARNING: ufs log for / changed state to Error
00:19:32: WARNING: Please umount(1M) / and run fsck(1M)
00:19:33: WARNING: Error writing master during ufs log roll
00:19:33: WARNING: ufs log for / changed state to Error
00:19:33: WARNING: Please umount(1M) / and run fsck(1M)
00:19:34: Cannot mount root on /pseudo/vxio@0:0 fstype ufs

http://sunsolve.sun.com/search/document.do?assetkey=1-26-57636-1

Example commands to check your server:
grep -i /rootvol /etc/vfstab
pkginfo -l VRTSvxvm | grep VERSION
patchadd -p | grep 112392

If you do not want to apply the patch, you can add nologging to the end of the rootvol and var mount lines in /etc/vfstab like this: (one line with and one without)
/dev/vx/dsk/rootvol /dev/vx/rdsk/rootvol / ufs 1 no nologging
/dev/vx/dsk/var /dev/vx/rdsk/var /var ufs 1 no -
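To see at a glance which ufs filesystems have nologging set, the options column can be pulled out of vfstab. This is a sketch under the assumption that the file uses the standard seven-field vfstab layout (mount point in field 3, mount options in field 7); ufs_mount_options is a made-up name.

```shell
#!/bin/sh
# ufs_mount_options FILE: print "mount-point options" for each ufs line of a
# vfstab-format file (field 3 is the mount point, field 7 the mount options).
# Comment lines and short lines are skipped.
# (ufs_mount_options is a hypothetical helper for this sketch.)
ufs_mount_options() {
    awk '$1 !~ /^#/ && NF >= 7 && $4 == "ufs" { print $3, $7 }' "$1"
}

# Typical use on a live box (not run here):
#   ufs_mount_options /etc/vfstab
```

A “-” in the second column means no options were given, so the release default applies.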

You must reboot for the above to take effect, either way. It may take two reboots for it to actually “take” when going from “nologging” to “-”. The first boot creates the metaspace for the log, and the second actually starts logging.

IF you DO get a machine that will not boot because of this, then you can try booting from the oscopy disk and fixing things like this:

In the OBP:

ok setenv auto-boot? false
ok reset-all
(it won’t try to boot since auto-boot is false)

ok boot oscopy -s

vxdisk list

ksh -o vi

mount /dev/dsk/c1t0d0s0 /mnt
cd /mnt
ls -la
cd
umount /mnt
fsck /dev/dsk/c1t0d0s0 /mnt
#IF var also has trouble:

mount /dev/dsk/c1t0d0s5 /mnt/var
cd /mnt/var
umount /mnt/var
mount -o nologging /dev/dsk/c1t0d0s5 /mnt/var
umount /mnt/var
umount /mnt
You have to keep doing it until all errors are gone. You could try this:

for i in 0 5 6 7
do
fsck -o f -y /dev/rdsk/c1t0d0s${i}
done

SAME
fsck -o f -y /dev/rdsk/c1t0d0s${i}

OVER AND OVER until ALL errors are gone
See if it will work:

mount -o nologging /dev/dsk/c1t0d0s0 /mnt
You must also change the logging status:
cd /mnt/etc
vi vfstab

in vfstab
add nologging to the end of the rootvol and var mount lines

Change the Gateway (Default Router) on SUN

Filed under Unix Notes on Thursday, May 11th, 2006 @ 3:22pm by Christen

The default route is carried in a file called /etc/defaultrouter which is read at boot time. In order to avoid having to reboot, you can also add a default route manually and delete the old one.

Here are the steps I took to change the default route on a host recently:

#first do a traceroute for later comparison

traceroute knownhost.anothersub.net

#This command will show you the current default route
netstat -rn

#just for the record, take a look at what the route is now, if any
cat /etc/defaultrouter

#This sets the new default route for the next reboot
echo "10.0.0.1" > /etc/defaultrouter

#Again, just checking to be sure it worked!
cat /etc/defaultrouter

#Now we add the new default route to the current routing table
#If you are going to reboot, you do not need to do this step
route add default 10.0.0.1

#and we take out the old one
#If you are going to reboot, you do not need to do this step
route delete default 192.168.0.1

#Now we check that the new route is in and the old one is gone
netstat -rn

#and we do one last traceroute and compare it to the first one to see that our work had some effect
traceroute knownhost.anothersub.net

I usually log all of this, just for the record.
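For the log, the before and after check can be boiled down to just the gateway address. A minimal sketch, assuming Solaris-style netstat -rn output where the destination column literally says “default” (other systems may print 0.0.0.0 instead); default_gateways is a made-up name.

```shell
#!/bin/sh
# default_gateways: read "netstat -rn" output on stdin and print the gateway
# column of every line whose destination is "default".
# (default_gateways is a hypothetical helper for this sketch.)
default_gateways() {
    awk '$1 == "default" { print $2 }'
}

# Typical use (not run here):
#   netstat -rn | default_gateways
```

If it prints two addresses, the old route has not been deleted yet.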

NOTE that you can do this while connected remotely via SSH. It will not drop your connection, although it doesn’t hurt to have a console connection open and logged in case you change it to a non-working gateway and need to quickly put it back!

Mounted File system and Mount Point permissions

Filed under Unix Notes on Tuesday, March 21st, 2006 @ 12:12pm by Christen

The permissions on a folder used as a mount point, and the permissions on the top of the filesystem itself are not related.

In other words, if you ls a directory, then mount a filesystem on that directory and run the same ls command, the permissions will look different.

Thus:

1. Restrict the permissions on the mount point (directory) to be read only. This way, if the filesystem is not mounted for some reason no one can write any files to the directory. This will prevent two things. First, it will prevent someone from filling up the root filesystem by accident. Second, it will prevent files from getting spread out across the directory (inside the mount point) and the filesystem itself. It can be a real mess trying to resync up a filesystem and the files stuck in the directory.

2. When creating a new filesystem, be sure to mount it, and set the permissions, since you can’t do this without the filesystem being mounted.

How do you tell if a directory is just that, or if it is its own filesystem? df -k :

root> df -k /home
/home (/dev/vg00/lvol5 ) : 20464 total allocated Kb
8728 free allocated Kb
11736 used allocated Kb
57 % allocation used
root> df -k /etc
/ (/dev/vg00/lvol3 ) : 143360 total allocated Kb
3976 free allocated Kb
139384 used allocated Kb
97 % allocation used
See, /home is its own filesystem, but /etc is in /, as you can see above.
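The same check can be scripted. This sketch uses df -P (the POSIX output format, where the last column is the mount point) rather than the HPUX df -k layout shown above; is_mount_point is a made-up name, and mount points with spaces in them are not handled.

```shell
#!/bin/sh
# is_mount_point DIR: succeed if DIR is the top of a mounted filesystem.
# Uses the portable "df -P" format, where the last column is the mount point.
# Returns 2 if DIR cannot be entered.
# (is_mount_point is a hypothetical helper for this sketch.)
is_mount_point() {
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    mounted_on=$(df -P "$dir" | awk 'NR == 2 { print $NF }')
    [ "$mounted_on" = "$dir" ]
}

for d in / /etc; do
    if is_mount_point "$d"; then
        echo "$d is its own filesystem"
    else
        echo "$d is a plain directory"
    fi
done
```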

What do you do if you have a directory full of files that should have been in a filesystem mounted at that point?

First, mount the filesystem manually to another directory. Under temp or something.

Use fuser to see if any of the files are open. They must not be.

Then do something like this to get the files over to the mount point.

cd /directory_that_should_not_have_files_in_it

find . -depth -xdev | cpio -dumpx /tmp/mount_point

man find so you can verify my switches here (HPUX). These find switches say to traverse the directory tree first (-depth) and don’t cross a mount point (-xdev). The cpio switches say -d make the directories first, -u unconditionally -m retain modify times -p pass-through from std in and -x save or restore special files.

Now unmount the filesystem from the temp location and mount it properly. Oh! First fix the permissions on that directory so that next time the application or user cannot write to it!

Who is holding a file open?

Filed under Unix Notes on Monday, March 13th, 2006 @ 2:53pm by Christen

lsof will tell you who is using what file system. It is a pretty standard unix util, but not all OS’s install it by default. There is a package to install it on HPUX

man lsof

fuser can also be used to find out what process has a file open on a file system

man fuser

Running out of free space in /var on HPUX

Filed under Unix Notes on Monday, March 13th, 2006 @ 11:24am by Christen

cd /var
bdf .
= free space just here

cleanup -c 1 = clean up any patches that have been superseded at least one time. May free up some space

check /var/tombstones
check /var/adm/wtmp = last user login info, type “last” to get the info, most systems zero this out daily

Use this to list how much space is used by each directory, starting with the biggest. Useful for spotting where the problem may be:
du -kxa . | sort -rn | more

remember:

> filename = zero a file out, used for open files! (same as cat /dev/null > filename)

/var/adm/wtmp holds the info that “last” shows. It gets quite huge if no one cleans it out. Most boxes have a cron job to do this; just run:
> /var/adm/wtmp to empty it

last = last good login
lastb = last login failure

Remember to cp things out to another filesystem and gzip them before zeroing them out if you are unsure about whether they will be needed later or not.

Open Log Files

Filed under Unix Notes on Monday, March 13th, 2006 @ 11:02am by Christen

I’ve patched this together, some of my facts may be wrong, but I think the solution is right.
When an application is writing to a file, it may keep the file open. The application will “grab” the file and open a “handle.” If you delete the file, the application still has an open handle to the file’s “inode.” Deleting or moving the file out from under the application can have some bad effects. One possibility is that the application just keeps writing to the same handle, and the space is not cleared from the drive. So if your drive was filling up, it just keeps filling. Another possibility is that the application just gets upset and will not log anymore until you restart it.

The solution is to “zero” the file instead by simply running:

cat /dev/null > file_name

this has become such a common thing to do that on most Unix flavors this does the same thing:

> file_name

That will “zero” the file so that it is empty, but leave the inode alone, and the open handle that the application has will still work, but now your disk will have free space again!

If you need to preserve the file’s contents, use the cp command to copy the file to somewhere else first before you “zero” it. Then gzip the copy. Just don’t ever use mv or rm on an open file!

The best thing to do is stop the application, then move the log, then restart it, but usually you don’t want to do that, and besides, isn’t “uptime” what unix is all about? Also, some applications simply open the file for each log entry and then close it again, in which case you can rm or mv the file, but knowing that is more difficult than just using the “zero” method above.
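Here is a small sketch of the copy, gzip, then zero routine, with a check that the inode really does stay the same. The names zero_log and inode_of and the throwaway file are invented for the demo; on a real box the backup directory should be on a different filesystem with room to spare.

```shell
#!/bin/sh
# zero_log FILE [BACKUP_DIR]: optionally keep a gzipped copy in BACKUP_DIR,
# then truncate FILE in place so any open handle stays valid.
# (zero_log and inode_of are hypothetical helpers for this sketch.)
zero_log() {
    file=$1
    backup_dir=${2:-}
    if [ -n "$backup_dir" ]; then
        cp -p "$file" "$backup_dir/" || return 1
        gzip -f "$backup_dir/$(basename "$file")" || return 1
    fi
    : > "$file"
}

# inode_of FILE: print the inode number of FILE.
inode_of() { ls -i "$1" | awk '{ print $1 }'; }

# Demo on a throwaway file.
tmp=$(mktemp -d)
mkdir "$tmp/saved"
echo "old log data" > "$tmp/app.log"
echo "inode before: $(inode_of "$tmp/app.log")"
zero_log "$tmp/app.log" "$tmp/saved"
echo "inode after:  $(inode_of "$tmp/app.log")"
ls -l "$tmp/app.log" "$tmp/saved"
```

The two inode numbers should match, which is the whole point: the file is emptied, not replaced.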

Logical Volume Manager (LVM) on HPUX

Filed under Unix Notes on Thursday, March 9th, 2006 @ 4:36pm by Christen

See page 495 of “HPUX CSA” by Rafeeq Rehman
This entry contains notes from training by coworkers, personal experience, and the above mentioned book. I do not claim any of it to be original with me, unless you see a mistake. I am sure the mistakes are mine.

Take a look at your disks. An easy way is to use bdf. This is kind of the HPUX equivalent of df.

vgdisplay will show all of the existing volume groups, and vgdisplay -v will give the details about the logical and physical volumes in the group.

Each volume group has a unique directory under /dev/ where the LVM device files are kept.
Here we call them vg*

/dev/vg00 will usually be the volume for the onboard disks, where HPUX lives. The other vg’s will probably exist on drives from an SAN

Each /dev/vg*/ directory contains three things:

A group file.
- It is a “c” (character special) file
- Each volume group needs one of these files. It is created before the vg is actually built
- It must have a unique “minor number.” The minor number is a hexadecimal number starting with 0x in the long listing. The first two digits are the vg #. The last four digits are always 0’s. If you name your vg groups vg01, vg02, vg03, etc, then you can make the minor number correspond to the vg name. Otherwise I’m not sure that there is any direct correlation.
- The major number for all LVM device files is 64
root:/dev/vg00> ll group
crwxr-xr-x 1 root sys 64 0x000000 Dec 12 2003 group

The minor number for the group file must be unique for each vg (it is a hex number). You only work with the first two digits, the last four are always 0000.
A “b” (block device) file for each logical volume
- This is the device you mount

A “c” (character special) file for each logical volume called rlvolname
-This is the raw device, used to format the filesystem

Each logical volume device file will have the major number 64, and the minor number will be the same as the group file’s, but with the last digits showing the lv number. Again, like the vg#’s, they may or may not correspond to the names (LV01, LV02, etc.), but they will be sequential. See pg 499 of the HPUX CSA book.
You can see the various mount points (logical volumes) for HPUX under /dev/vg00
- Remember, group is not a lvol, it is a special file for the vg

The first thing you have to do before making a new vg is find out what the 0x numbers are for the group file in each volume group, so that you can pick an unused one for the new group.

To find all of them:
find /dev -name group -exec ls -al {} \;

root:/root> find /dev -name group -exec ls -al {} \;
crwxr-xr-x 1 root sys 64 0x000000 Dec 12 2003 /dev/vg00/group
crw------- 1 root sys 64 0x040000 Mar 8 11:04 /dev/vgfapp/group
crw------- 1 root sys 64 0x050000 Mar 8 11:04 /dev/vgfdata/group

or just typing this usually works too:
ll /dev/*/group
(ll = ls -la on non HPUX machines)

So now we can use 0x020000, 0x030000, or 0x060000 and above. They don’t have to be sequential, but there isn’t any point in using random numbers here.
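If you want to double check a candidate number before using it, something like the sketch below works. It is only an illustration: the listing is pasted in as text (on a real box you would feed it the output of ll /dev/*/group), and minor_in_use is a helper name I made up.

```shell
#!/bin/sh
# Sample listing in the same layout as the output shown above.
listing='crwxr-xr-x 1 root sys 64 0x000000 Dec 12 2003 /dev/vg00/group
crw------- 1 root sys 64 0x040000 Mar 8 11:04 /dev/vgfapp/group
crw------- 1 root sys 64 0x050000 Mar 8 11:04 /dev/vgfdata/group'

# Hypothetical helper: succeeds if the given minor number already appears
# in field 6 of the listing, fails otherwise.
minor_in_use() {
    printf '%s\n' "$listing" |
        awk -v m="$1" '$6 == m { found = 1 } END { exit !found }'
}

for candidate in 0x020000 0x040000; do
    if minor_in_use "$candidate"; then
        echo "$candidate is already taken"
    else
        echo "$candidate is free"
    fi
done
```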

mkdir /dev/vg02

man vgcreate: look under examples, the mkdir and mknod command is right there

mknod /dev/vg02/group c 64 0x020000

We are telling it to make a character special file, with a major number of 64 (all volume groups have a major number of 64) and the minor number of 0x020000 (which we explained above).
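To avoid typos in the minor number, you can let printf build it. This is just a sketch: vg_minor is a name I made up, and it assumes you number your volume groups in decimal and want that number converted to the two leading hex digits.

```shell
#!/bin/sh
# Hypothetical helper: print the group-file minor number for vg number N.
# The vg number goes in the first two hex digits; the last four are zeros.
vg_minor() {
    printf '0x%02x0000\n' "$1"
}

vg_minor 2     # prints 0x020000
vg_minor 10    # prints 0x0a0000 (10 decimal is 0a in hex)

# Show the mknod command you would run, without running it.
echo "mknod /dev/vg02/group c 64 $(vg_minor 2)"
```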

Now we need to find some free hard drives (disks) to use:

The disks on the system are listed here:
ls /dev/dsk/*

To list all of the existing volume groups, and show what disks they are using you can run this:
vgdisplay -v

ioscan -funCdisk shows a bunch of disks, but how do we know how/when/where they are used? Plus with a SAN, there is a primary and an alternate path to each disk (they look like two disks to the system, but they are not), so the SAN disks show twice. You really only have half of the number listed.

Again, how do I know if they are being used? - HP has no good way of telling you what is not used.

ls /dev/dsk/* will tell us what disks are available to HPUX
for i in `ls /dev/dsk/*`; do pvdisplay $i; done 1>/dev/null
- The errors are the drives that are not in a volume group (we redirected stdout to /dev/null and just watched stderr)

or

To get a good list of available drives:
for i in `ls /dev/dsk/*`; do pvdisplay $i; done 2>/tmp/disk.out
cat /tmp/disk.out | grep -v Could | grep -v belongs | cut -d' ' -f6 | sed s/\"//g | sed s/\.$//g > /tmp/disklist
(rm /tmp/disk.out)

Ok, now /tmp/disklist has a list of available disks.
Just to double check our list:

for i in `cat /tmp/disklist`; do pvdisplay $i; done 2>/dev/null

- We sent the errors to /dev/null, and every disk in the list should return only an error, so the output should be BLANK, meaning nothing in the list is a physical volume in a volume group

(Remember, all disks on an HP box are in a vg if they are used.)

Now we know what disks are not being used, however, half of those are alternate links. How do we know which ones?

EMC SAN Arrays:

inq -sortsymm = this is an EMC tool, so it works if we have EMC disks in the box. Each disk has a serial number. The first 3 digits (220) is the frame serial number, the next two (E8) is the disk, and the last part is the frame. So here you can see which drives are the same drive. The Array usually has a bunch of 72 gig drives, but they are shown to us as 8 gig drives. Internal drives will always be a d0, b/c they are not carved up.

Usually with EMC disks, the C#’s are different but the t and d #’s are the same for the same drive on the EMC. It doesn’t have to be that way though, so watch those serial numbers.

Each EMC drive has a unique serial #, so you can see if two disks on the local system are just redundant paths to the same EMC drive.

To combine the free disk list from HPUX with the serial number list from the EMC SAN, and get a list of available drives with serial numbers:

for i in `ls /dev/dsk/*`; do pvdisplay $i; done 2>/tmp/disk.out
cat /tmp/disk.out | grep -v Could | grep -v belongs | cut -d' ' -f6 | sed s/\"//g | sed s/\.$//g > /tmp/disklist
rm /tmp/disk.out

inq -sortsymm | grep "rdsk" > /tmp/seriallist
for i in `cat /tmp/disklist | cut -d"/" -f4`; do grep $i /tmp/seriallist; done | sort -k 5 > /tmp/availabledisklist
rm /tmp/disklist
rm /tmp/seriallist
echo "Your list is in /tmp/availabledisklist"
#Maybe we could combine a few more commands and eliminate one more temp file, but the commands start to get insane

Obviously the DVD-ROM isn’t “available” and any other “local” drives are not what we want. The list should clearly show if they are EMC drives or something else.

Also, there are what may be EMC “control” disks, they are small 4 meg, and 2 meg drives. Don’t use these. Just use the ones that are in the standard 8gig size.

Hitachi SAN Arrays:

lunstat -ts | egrep "Serial|Device" | paste -s -d"\t\n" -

This will list the devices with the serial numbers, so you can see which drives are identical.
(Be sure to sanity check it all. The ctd numbers should be similar for the same drive. When you add drives to the volume group, it should automatically recognize them as “alternate paths” to the same drive.)

Don’t use drives that are not on the SAN. They will have different serial numbers.

To combine the free disk list from HPUX with the serial number list from the Hitachi SAN, and get a list of available drives with serial numbers:

for i in `ls /dev/dsk/*`; do pvdisplay $i; done 2>/tmp/disk.out
cat /tmp/disk.out | grep -v Could | grep -v belongs | cut -d' ' -f6 | sed s/\"//g | sed s/\.$//g > /tmp/disklist
rm /tmp/disk.out

lunstat -ts | egrep "Serial|Device|Manufacturer" | paste -s -d"\t\t\n" - > /tmp/seriallist
for i in `cat /tmp/disklist | cut -d"/" -f4`; do grep $i /tmp/seriallist; done | sort -k 9 > /tmp/availabledisklist
rm /tmp/disklist
rm /tmp/seriallist
echo "Your list is in /tmp/availabledisklist"
#Maybe we could combine a few more commands and eliminate one more temp file, but the commands start to get insane

The list should clearly show if they are Hitachi drives or something else. We don’t want the drives made by “HP,” etc. They are either internal disks, or things like the DVD-ROM. You can always check on them with ioscan -funCdisk

(Note that the Hitachi lunstat program works on EMC arrays, so you can use it if you want. However, the EMC inq program does not report the serial number on Hitachi drives, so it will not work for them.)

Now, to create a volume group:

pvcreate /dev/rdsk/c29t12d2 -> told us it already belongs to a volume group
pvdisplay /dev/dsk/c29t12d2 -> says no volume group
- The problem is, the drive WAS a member of a volume group, but isn’t anymore
-- If you vgreduce this drive out of the volume group before you export the vg, this probably won’t happen
- If you are SURE it is not used anymore then:
pvcreate -f /dev/rdsk/c29t12d2 - DANGER! This will wipe out a drive, so be very careful! The -f forces it.

man vgcreate
vgcreate -e 30000 vg02 /dev/dsk/c29t12d2 - must give it at least one drive.

NOTE: -e sets the Max PE per PV. Once you create the volume group and use the disk, by default whatever size that disk is, that is the largest disk you can add. If you built it with a 4 gig disk and then later added a 20 gig disk, you could only use 4 gigs on the 20 gig disk. Instead, we set the limit ourselves: PE Size (Mbytes) x Max PE per PV = the largest disk the vg can fully use. John always uses 30000 for Max PE per PV, with a PE Size (Mbytes) of 4 (the default PE size).
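To make the arithmetic concrete, here is a quick sketch using the numbers from the note above. The variable names are mine. The 1000 MB logical volume at the end is just an example size to show how extents round up.

```shell
#!/bin/sh
pe_size_mb=4          # PE Size (Mbytes), the default
max_pe_per_pv=30000   # the value given to vgcreate -e

# Largest physical volume that can be fully used in this volume group.
max_pv_mb=$((pe_size_mb * max_pe_per_pv))
echo "Largest fully usable disk: ${max_pv_mb} MB"    # 120000 MB, roughly 117 GB

# Number of extents a logical volume of a given size takes (rounded up).
lv_mb=1000
extents=$(( (lv_mb + pe_size_mb - 1) / pe_size_mb ))
echo "A ${lv_mb} MB logical volume uses ${extents} extents"    # 250 extents
```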

vgdisplay (and there it is!) (size is Total PE x PE Size)

Now find the other drive with the same serial number and add it, it should show up as an “alternate link”

vgextend vg02 /dev/dsk/c27t12d2
- It figures out that this is the same disk and adds it as an “Alternate Link”
- You only need to use pvcreate on the disk once, not once for each “link”
pvcreate /dev/rdsk/c29t12d3
pvdisplay /dev/dsk/c29t12d3
pvcreate -f /dev/rdsk/c29t12d3
vgextend vg02 /dev/dsk/c29t12d3
vgreduce vg02 /dev/dsk/c29t12d3
(oops!)
vgextend vg02 /dev/dsk/c27t12d3 (Swap controllers, so that the second drive in the vg has the other controller as primary)
vgextend vg02 /dev/dsk/c29t12d3
vgdisplay -v
- Now the primary controller alternates back and forth. HPUX goes to the primary first, then to the backup, so this allows the system to use both controllers.

NOTE: We could just put all of the PV names on the same command line with the first vgcreate command, rather than using a bunch of vgextend commands. vgcreate will take the names of multiple physical volumes. A vgextend command will also probably take multiple pv names.

man lvcreate
(size is given with -L in megabytes, or with -l as a number of logical extents)
lvcreate -L 1000m -n lvol1 vg02
vgdisplay -v (now some disk space is missing from PV’s and there is an LV)

(try to mount it)
cd /
mkdir chris
mount /dev/vg02/lvol1 /chris
(NOT FORMATTED! :) )

newfs -F vxfs /dev/vg02/rlvol1
(newfs is a front end for mkfs)
mount /dev/vg02/lvol1 /chris
bdf
It worked!

Set permissions on the filesystem after you mount it. Setting them on the mount directory first doesn’t do any good.

(now make it 2000m)

lvextend -L 2000m /dev/vg02/lvol1

vgdisplay -v
(now it is bigger)

bdf
(not bigger :( )
We gave the lvol more space, but not the filesystem

fsadm -b 2000m /chris

Now the FS has been expanded in place, online

(man pages for fsadm are incorrect, what we did is not in there)

swlist | grep -i online
- fsadm to do this is called “onlinejfs”

DIG FOR SOME FSADM documentation!

mount - will show it is there
vi /etc/fstab
add
/dev/vg02/lvol1 /chris vxfs delaylog 0 2 (0 & 2 is for when to check for dirty bit, 1 = at boot 2= later)

umount /chris

now you can just type
mount /chris
and it knows where to mount it

To remove a volume group, use vgexport

umount /chris
vgdisplay -v
(good to back it down)
(Remove alternate links first. If you remove the primary first, it just switches the alternate to primary. It doesn’t really matter, but…)
lvremove /dev/vg02/lvol1
vgreduce vg02 /dev/dsk/c27t12d3
vgreduce vg02 /dev/dsk/c29t12d3
(I don’t think you can reduce the last disk out of the volume group)
vgchange -a n vg02
vgexport vg02 (don’t be in /dev/vg02 when you do this or it won’t remove the directory)
(gone)
vgdisplay -v

Create a mirrored volume?

lvextend -m 1 /dev/vgora/lvolora

-But that won’t work if we only have one pv in the vg! You can turn off strict, but it is really silly.

pvcreate /dev/rdsk/c35t9d2
vgextend vgora /dev/dsk/c35t9d2
vgextend vgora /dev/dsk/c34t9d2

lvextend -m 1 /dev/vgora/lvolora

Make a stripe set?

lvcreate -L 1000m -n lvolstripeset -i 3 -I 128 vgora

Patching SUN: (single patch)

Filed under Unix Notes on Tuesday, February 28th, 2006 @ 3:02pm by Christen

patchadd -p will list the installed patches

from console

patchadd PATCHNAME

this is done in multi-user mode

If it asks you, go to single user mode. The patch may actually tell you that it is better to go to single user mode.

To drop to single user mode on SUN:
init s

SUN requires the root password when you go to single-user mode
(You can’t “backdoor” SUN machines from single-user mode. If you don’t know the root password you have to boot from CD and mount the root file system and change the root password that way, kind of like Windows NT.)

Again, now in single user mode, the command is the same:
patchadd PATCHNAME

to get out of single user mode:
exit

It will ask you what runlevel to go to, and you want 3

NOTE: (This is Solaris specific:)

There is a huge difference between patchadd and pkgadd as pertaining to the -d option. Short story is you should NEVER use the -d option with patchadd. Unlike pkgadd, where the -d specifies the device (location) for the package, the -d in patchadd tells it not to back up the files being patched, i.e., you can never remove the patch if you need to. That is a bad thing. From the man pages:

patchadd:

-d Does not back up the files to be patched. The patch cannot be removed.

pkgadd:

-d device
Install or copy a package from device. device can be a
full path name to a directory or the identifiers for
tape, floppy disk, or removable disk (for example,
/var/tmp or /floppy/floppy_name ). It can also be a
device alias (for example, /floppy/floppy0).

Adding & Removing Disks on HPUX

Filed under Unix Notes on Tuesday, February 28th, 2006 @ 2:40pm by Christen

Exercise. Disk is removed and /dev files wiped out, return the disk to HPUX:

diskinfo /dev/rdsk/c0t8d0

ioscan -funCdisk

disk 4 8/4.8.0 sdisk CLAIMED DEVICE SEAGATE ST34572WC
/dev/dsk/c0t8d0 /dev/rdsk/c0t8d0
(The u option tells ioscan to pull info from the kernel’s memory. It avoids actually scanning the hardware. This is faster, but it also means any changes won’t be found. If you take out the u, then it will scan the hardware, and if there was a change, it will find it. You may not want the system to find hardware changes sometimes, like if an array has gone offline temporarily.)

yank disk 8/4.8.0

ioscan -funCdisk
-still shows the disk as CLAIMED because of the u switch

ioscan -fnCdisk
-finds hardware changes, and returns “NO_HW” instead of “CLAIMED”
-Same thing would happen if anything had tried to read/write the disk


Now use rmsf to remove the /dev files for the drive.
-If you ever need to remove a /dev file, don’t use rm, use rmsf (man is your friend)

(The example here is removing a Hitachi SAN disk)

ioscan will show you both the hardware address (That number with the /’s) and the /dev files for the hardware:

disk    101  0/0/10/0/0.1.0.5.0.10.0  sdisk    NO_HW       DEVICE       HITACHI DF600F
/dev/dsk/c34t10d0   /dev/rdsk/c34t10d0
To wipe out a piece of hardware run:
rmsf -a /dev/dsk/c0t8d0

(The -a option should cause this to also remove the /dev/rdsk file and it shouldn’t show up in ioscan -funCdisk anymore either)

Now ioscan will show no signs of the hardware.

If you don’t use the “-a” option then you have to remove both the /dev/dsk and the /dev/rdsk files individually. Then ioscan will STILL see the drive:

disk    101  0/0/10/0/0.1.0.5.0.10.0  sdisk    NO_HW       DEVICE       HITACHI DF600F
So then you must use the hardware address to remove the drive:

rmsf -H  0/0/10/0/0.1.0.5.0.10.0

put disk back in (or put in a new disk)

(The “replace” example here is a local disk, the “new” example is an EMC SAN disk.)

ioscan -fnCdisk
- finds hardware changes and recognizes disk again

Devices physically attached to the system should automatically show up when you run ioscan without the “u” option, as above. However, there are no /dev files for things like disks, so you can’t use them yet.

insf is used to replace the /dev files
(opposite of rmsf)
insf -e is just like the box rebooted: it tries to install EVERYTHING. Use it if you don’t remember the hardware path.
or
insf -e -Cdisk will JUST do the disks, instead of the entire box
or
insf -e -H 8/4.8.0 if you know the hardware address, this is right out of ioscan

Again:
disk     49  0/0/8/0/0.3.0.3.0.4.7    sdisk    CLAIMED     DEVICE       EMC     SYMMETRIX
insf -e -H 0/0/8/0/0.3.0.3.0.4.7

disk     49  0/0/8/0/0.3.0.3.0.4.7    sdisk    CLAIMED     DEVICE       EMC     SYMMETRIX
/dev/dsk/c29t4d7   /dev/rdsk/c29t4d7
Another good command to check that all disks are there:

for i in `ls /dev/dsk/*`;do pvdisplay $i;done

To check for disks that are not being used use diskinfo instead of pvdisplay:

for i in `ls /dev/rdsk/*`;do diskinfo $i;done

Xserver

Filed under Unix Notes on Tuesday, February 28th, 2006 @ 2:25pm by Christen

Server:
xinit (for no window manager; get one from the client)
or startx (includes window manager, in cygwin it integrates nicely with XP)
With Cygwin you can run C:\cygwin\usr\X11R6\bin\startxwin.bat without first starting Cygwin

xhost IPofXClient
or
xhost + (allows ANY client)

Client:
DISPLAY=IPofServer:0.0
export DISPLAY

AnyXProgram &

You can use a window manager from the client like this:
openbox &
or just run a single program like a terminal on the client:
aterm &
xterm &

Be sure to & the programs so that you don’t lose your prompt.

Dump Man pages to Word

Filed under Unix Notes on Monday, February 27th, 2006 @ 4:48pm by Christen

Dump text formatted man pages for all commands in /usr/sbin starting with mk* to testmans:
ls -1 /usr/sbin/mk* | cut -d'/' -f 4 | xargs -l man | col -b >> testmans

There are lines that appear at the top/bottom of every man page. You may want to use grep to eliminate these.
Use grep SOMETEXTINLINE file to see if it matches the right thing, then "grep -v SOMETEXTINLINE file > newfile" will output all lines WITHOUT that text to the new file.

Items to consider using to grep out extra lines:
"Hewlett-Packard Company" - Gets rid of page break fillers between pages on HPUX
'(1M).*(1M)$' - We only want the lines with two occurrences, one at the end of the line, b/c those appear at the top of every page.

Like so:
grep -v "Hewlett-Packard Company" testmans > testmans1
grep -v '(1M).*(1M)$' testmans1 > testmans

Or put them onto one line if you are confident:
grep -v "Hewlett-Packard Company" testmans | grep -v '(1M).*(1M)$' > testmans1;rm testmans
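If you want to see what those two filters do before turning them loose on a big file, here is a tiny test run. The sample text is invented to look roughly like a dumped HPUX man page, so treat it as a sketch and check against your real output.

```shell
#!/bin/sh
# Invented sample resembling a dumped man page (not real man output).
sample=$(mktemp)
cat > "$sample" <<'EOF'
 mkfs(1M)                                                    mkfs(1M)
 NAME
      mkfs - construct a file system
 Hewlett-Packard Company            - 1 -      HP-UX Release 11i
 SEE ALSO
      newfs(1M)
EOF

# Drop the page-break filler, then drop header lines that have (1M) twice
# with the second one at the end of the line.
cleaned=$(grep -v "Hewlett-Packard Company" "$sample" | grep -v '(1M).*(1M)$')
rm -f "$sample"

printf '%s\n' "$cleaned"
```

The single newfs(1M) reference survives because the pattern needs two occurrences. A line that lists two or more (1M) commands and ends with one would be dropped too, so skim the result.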

Open the file in winword and accept “Windows (Default)” as the Text encoding

Knock your top & bottom margins down to the minimum
(Adjusting the left/right margins will likely make no difference, so don’t bother)
(For very small jobs, you could leave top and bottom if you like. I’m usually working with close to 100 pages though)

Replace: ^p^p^p
with: ^p
-Do this repeatedly. The first time the page count may cut in half. After that you’ll get diminishing returns. Keep doing it until you either get 0 replacements, or the same number repeatedly. Sometimes there will be a string at the bottom of the document that can’t be removed this way.

Finally, make sure there are no blank pages at the end.

Scroll through the document and make sure it doesn’t have any huge white space sections.

Print if you like!

HPUX ServiceGuard Cluster Manager

Filed under Unix Notes on Monday, February 27th, 2006 @ 4:38pm by Christen

HPUX ServiceGuard Cluster Commands and “Patching” example:

First off, read the man pages. There aren’t that many, and they aren’t that long:

ls -1 /usr/sbin/cm* | cut -d'/' -f 4 | xargs -l man | col -b >> clustermanpages
(See article on how to dump man pages to Word also.)

A node is a computer and a package is the application that runs on a node.

The cluster is two or more systems that see some of each other’s disks and can run each other’s applications.

The application, or package, can only be run on one node at a time. All of the cmxxx commands can be run on either node and you will get the exact same output regardless of which node you issue the command on.

cmhaltnode is very friendly and safe according to the man pages. Run on its own, it won’t halt the node if any packages are running on it. If you use the -f option, it will halt the packages first and then they will start on the other node (if failover is set). If a package fails to halt, then cmhaltnode will fail. It won’t stop a node with a package running on it.

Moving packages - There isn’t really a “move” command. The man pages say to move a package like this: (say from node 1 to 2)

cmhaltpkg package1
cmrunpkg -n node2 package1
cmmodpkg -e package1
(Note, both the halt and run commands will operate on the package no matter what node it is on, if a node isn’t given with the -n command)
(See note below for purpose of cmmodpkg)

cmhaltpkg may confuse you. When it is run, the cluster knows it, and assumes you meant what you said, so it will not “fail over” the package to another node. Failover only happens when the package fails for other reasons, or if you use the cmhaltnode program with the -f option, which does allow the packages to move over.
In fact, due to this, after moving a package manually, you need to reenable package switching for the package with “cmmodpkg -e packagename” since cmhaltpkg disabled package switching for that package. You can see the status of package switching with cmviewcl.

One issue is Failback. If it is set to “auto” on any package, it could present a problem. When your node comes back up, packages may unexpectedly fall back onto it.
Run:
cmviewcl -v | grep Failback
This will list all of the Failback settings. If they all say “manual” then you are in good shape, because a package will not move back to its primary node without manual intervention.
It is possible to use cmmodpkg to tell the packages that they may not move to a given node, which would help you in this case. See the man page for cmmodpkg

Before doing anything crazy, you should do a cmviewcl and a cmviewcl -v and copy down the info. That way you can see how things were set up before.

When you are done, do a cmviewcl -v and make sure “PKG_SWITCH” is enabled for all packages. It is possible for the cmhaltpkg to disable this on some of them, and for you to forget to put it back.
Here are some tests. These should come back with nothing:
cmviewcl -v | grep disabled
cmviewcl -v | grep down
These should come back with everything:
cmviewcl -v | grep enabled
cmviewcl -v | grep up
Check to see that everyone is running on their primary server:
cmviewcl -v | grep Primary
Run cmviewcl -v through more also, and just look to see that it all looks right.
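If you save the plain cmviewcl output to a file, a little awk can point out the packages that still need attention. This is only a sketch: the sample text is made up (including the "unowned" node for the halted package), and it assumes the five-column PACKAGE layout from plain cmviewcl.

```shell
#!/bin/sh
# Hypothetical saved output, in the column layout of plain "cmviewcl".
view='PACKAGE STATUS STATE PKG_SWITCH NODE
pkg01 up running enabled server1
pkg02 up running disabled server1
pkgftp1 down halted disabled unowned'

# Print the name of every package that is not up, or whose PKG_SWITCH
# column is not "enabled". The header row is skipped.
not_ok=$(printf '%s\n' "$view" |
    awk '$1 != "PACKAGE" && NF == 5 && ($2 != "up" || $4 != "enabled") { print $1 }')

if [ -n "$not_ok" ]; then
    echo "Packages needing attention:"
    printf '%s\n' "$not_ok"
else
    echo "All packages are up with switching enabled"
fi
```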

You should find out from the application contact what order the packages should go down and come up in. 99.9% of the time the order does NOT matter. Common sense says they come up 1, 2, 3 and go down 3, 2, 1. On this box pkgftp* can go down anytime, and pkg01-05 go down in reverse order and come up in order.

cmviewcl

CLUSTER STATUS
clustername-cl5 up

NODE STATUS STATE
server1 up running

PACKAGE STATUS STATE PKG_SWITCH NODE
pkg01 up running enabled server1
pkg02 up running enabled server1
pkg03 up running enabled server1
pkg04 up running enabled server1
pkg05 up running enabled server1
pkgftp1 up running enabled server1

NODE STATUS STATE
server2 up running

PACKAGE STATUS STATE PKG_SWITCH NODE
pkg06 up running enabled server2
pkg07 up running enabled server2
pkg08 up running enabled server2
pkg09 up running enabled server2
pkg10 up running enabled server2
pkgftp2 up running enabled server2

So, as root from any member of the cluster:
(I suggest that you do a cmviewcl between these commands periodically to make sure that what you expect to happen is actually happening.)
cmhaltpkg pkgftp1
cmrunpkg -n server2 pkgftp1
(It is better to do it this way than just downing the node, unless you really feel cool, in which case you could technically just do a 'cmhaltnode -f server1' and all packages should move over to server2 automatically. See the man page.)
cmhaltpkg pkg05
cmrunpkg -n server2 pkg05
cmhaltpkg pkg04
cmrunpkg -n server2 pkg04
cmhaltpkg pkg03
cmrunpkg -n server2 pkg03
(You can hardly mess up these commands; they will complain if you tell them to do the wrong thing)
cmhaltpkg pkg02
cmrunpkg -n server2 pkg02
cmhaltpkg pkg01
cmrunpkg -n server2 pkg01
(Some boxes can take 30 minutes per package! Some boxes take 45 minutes to move stuff over!)
(The command will hang there until it is done moving the package, so that is one good reason to do them one at a time.)
(Remember, the order may be important, so ask the app contact ahead of time!)
cmmodpkg -e pkg01 pkg02 pkg03 pkg04 pkg05 pkgftp1
(Turns the enable for failover back on for all packages.)
(This should be done if there is a 3 node cluster, so that they can fail to node 3, otherwise they cannot fail over to anywhere after this)
(By reenabling it, if the other node went down after your patched node came back up, the packages could come over to it before you moved them yourself. My trainer had this happen once, where his patched server came back up and then suddenly packages came to it, b/c another box failed on him during the patching window)
(The cmhaltpkg command automatically disables the “pkg_switch” option, as you will see in a cmviewcl display. This is covered in the man page for cmhaltpkg. The concept is that if you halt a package manually, you don’t want it to go starting up anywhere, you want it to stay halted, or to start where you put it and stay there.)
(Also, you may get patrol alerts if the pkg_switch is disabled)
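Before typing all of that, it can help to print the plan and read it over. The sketch below is a dry run: it only echoes the commands in the order used above and never calls the Serviceguard tools. The variable names are mine, and the package order is the one from this example box, so confirm yours with the app contact.

```shell
#!/bin/sh
target=server2
# Order the packages go down in on this example box (ftp first, then 05 to 01).
pkgs="pkgftp1 pkg05 pkg04 pkg03 pkg02 pkg01"

# Build the list of commands without running any of them.
plan=$(
    for p in $pkgs; do
        echo "cmhaltpkg $p"
        echo "cmrunpkg -n $target $p"
    done
    echo "cmmodpkg -e $pkgs"
)

printf '%s\n' "$plan"
```

Running the real commands one at a time by hand, with a cmviewcl in between, is still the safer habit. The printout is just a checklist.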

cmhaltnode server1

cmviewcl
(It may take a few minutes for the cluster to finish reforming (it reads some files) before cmviewcl shows things properly.)

shutdown -y 0, or whatever it is you need to do.

After the server comes back up, the cluster will come up by itself (remember, you halted it), but the packages should not move over to it, UNLESS you have AUTO_FALL_BACK set on, and the PKG_SWITCH is enabled.

(Some clients let the SA edit the pkg files, some clients don’t want the SA to mess with them at all.)

When moving packages BACK to their home server, you can use a little trick. Just cmhaltpkg the package, which halts it and disables the PKG_SWITCH, then just do a 'cmmodpkg -e package' and it will automatically start back up on the package’s primary server. You get to skip the 'cmrunpkg -n server package' command for each package.

cmhaltpkg pkgftp1
cmmodpkg -e pkgftp1

cmhaltpkg pkg05
cmmodpkg -e pkg05

cmhaltpkg pkg04
cmmodpkg -e pkg04

(Only downside here is the cmmodpkg comes back before the package has started, unlike the cmrunpkg, so you have to use cmviewcl to see when the package has started up.)

you can also:

cmhaltpkg pkg03 pkg02;cmmodpkg -e pkg03 pkg02

to save a little typing

(One cool note, if you are patching or upgrading boxes, moving all of the packages to server1 after you patched it and before you patch server2, and then testing the application gives you a quick and dirty real world test of whether whatever you did will break the application before you upgrade server2. If it does break the application right away, then you have server2 still in pre-broken state, so you can just move things there while you roll back server1.)

Cheat Sheet

Filed under Unix Notes on Monday, February 27th, 2006 @ 3:51pm by Christen

This is common stuff that I always forget at the wrong moment, so I print this and put it on my wall at work.

shell :

To see what shell you are using (being in just “sh” explains a lot):
echo $SHELL

Make vi your command line editor:
set -o vi
Or, if you are still in sh, to switch to ksh and set vi as the command line editor:
ksh -o vi

crontab:
minutes(0-59) hours(0-23) dates(1-31) months(1-12) days(0-6)
* * * * * command
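A couple of sample entries, to show how the five fields line up. The script paths are made up for illustration. Day 0 is Sunday.

```
# min  hour  date  month  day  command
30     2     *     *      0    /usr/local/bin/rotate_logs.sh
0      *     *     *      *    /usr/local/bin/check_disk.sh
```

The first entry runs at 02:30 every Sunday, the second at the top of every hour.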

Set your backspace key to work:
stty erase BACKSPACE
(It may be you just need to switch how Putty is set up)

Set CTRL-Z to suspend job:
stty susp ^Z

Set other Terminal parameters:
stty rows 25
stty cols 80
export TERM=vt100

Command line completion:
HPX: ESC\ or ESC ESC
SUN: ESC\
AIX: ESC\
ABOVE: ESC= - List all options ESC* - Put all options on command line
LNX: TAB

vi:

h - left
j - down
k - up
l - right

i - insert
a - append (A - append at end of line)
x - delete
X - backspace

0 - start of line (^ - first non-blank character of the line)
$ - end of line

b - back word
w - forward word

CTRL-H - backspace

/ - search
n - next instance

Other handy vi commands:

#G - go to line # of file
G - last line of file

o - open a line below me to type in
O - open a line above me to type in

u - undo

:1,$ s/4/john/g - from first line to last line of file, search for 4 and replace with john globally (all instances on the line)

yy - yank the line (copy)
dd - cut current line
p - paste (below) P - Paste (above)

:x! - (as ROOT) write and quit, even Read-Only files

Firefox:
F7 - Caret Browsing - select text w/ keyboard
/ - search
' - Search only links

Simple awk lines I always use but never remember:

awk '{ print $3 }' = prints column 3
awk '/dev/{print $2}' = prints column 2 of lines matching /dev/
c='$'$n;awk "{print $c}" = prints column $n
More awk stuff:
http://mattwalsh.com/twiki/bin/view/Main/AwkTutorial
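Here are those three awk lines run against a few lines of made-up, df-style text, so you can see what each one returns. The last line shows awk's -v option as a tidier way to pass in the column number. That part is my own addition, not something from the notes above.

```shell
#!/bin/sh
# Made-up sample input: device, size, used, mount point.
sample='/dev/vg00/lvol3 204800 35000 /
/dev/vg02/lvol1 1024000 1200 /chris
swap 0 0 -'

# Column 3 of every line.
third=$(printf '%s\n' "$sample" | awk '{ print $3 }')

# Column 2, but only for lines that contain "dev".
dev_second=$(printf '%s\n' "$sample" | awk '/dev/{print $2}')

# Column $n, building the field reference in the shell first.
n=4
c='$'$n
nth=$(printf '%s\n' "$sample" | awk "{print $c}")

# Same result, passing n into awk as a variable instead.
nth_v=$(printf '%s\n' "$sample" | awk -v n="$n" '{ print $n }')

printf '%s\n' "$third" "$dev_second" "$nth"
```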

banner test > /dev/console

Kill Usage

Filed under Unix Notes on Monday, February 20th, 2006 @ 9:38am by Christen

kill -HUP PID
kill -TERM PID
kill -KILL PID
(I did not write this, I just found it and posted it here for my own reference.)
Why does everyone jump straight to -9?

DO NOT USE -9 AS YOUR FIRST KILL SIGNAL.

Try a nice friendly -HUP … perhaps a little stronger -TERM.

If, for some reason, these don’t work, knock a little louder with -KILL (ie: -9), but don’t whine if something else gets hosed due to the strong signal (not to scare anyone, it’s unlikely that anything critical is happening anyway, any more than just writing a file– but you could leave resources locked/in use (ie: memory, etc)).

Also, use -HUP and -TERM and -KILL …. you’ll be thankful when you miss your kill -1 and do a kill 1 instead … you’ll use -HUP from then on (well, maybe not on a pda, but on a production server etc.)
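If you want the escalation spelled out, here is a rough sketch. gentle_kill and its grace period are my own made-up names, not a standard tool. It is meant for processes that are not children of your current shell, since a dead child that has not been reaped yet still answers kill -0.

```shell
#!/bin/sh
# Hypothetical helper: send HUP, then TERM, then KILL, pausing after each
# one and stopping as soon as the process is gone.
# Usage: gentle_kill PID [seconds_to_wait_after_each_signal]
gentle_kill() {
    pid=$1
    wait_secs=${2:-2}
    for sig in HUP TERM KILL; do
        # If the signal cannot be sent, the process is already gone
        # (or it is not ours to signal), so stop here.
        kill -s "$sig" "$pid" 2>/dev/null || return 0
        i=0
        while [ "$i" -lt "$wait_secs" ]; do
            sleep 1
            kill -0 "$pid" 2>/dev/null || return 0   # no longer running
            i=$((i + 1))
        done
    done
    return 1    # still there even after KILL
}

# Example (not run here): gentle_kill 12345 5
```

Using the signal names instead of the numbers is deliberate, for the reason given above.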

UNIX kill signals

Signal Name   Signal Number   Action
HUP           1               Hangup
INT           2               Interrupt
QUIT          3               Quit (dumps core file)
ILL           4               Illegal instruction (dumps core file)
TRAP          5               Breakpoint trap (dumps core file)
IOT           6               I/O trap (dumps core file)
EMT           7               Emulator trap (dumps core file)
FPE           8               Floating Point Exception (dumps core file)
KILL          9               Kill with extreme prejudice
BUS           10              Bus error (dumps core file)
SEGV          11              Segment Violation (dumps core file)
SYS           12              Bad system call argument (dumps core file)
PIPE          13              Write to nonexistent pipe
ALRM          14              Alarm clock timeout
TERM          15              Terminate
USR1          16              User defined signal
USR2          17              User defined signal
CHLD          18              Child status (aka CLD)
PWR           19              Power failure or restart
WINCH         20              Window size change
URG           21              Urgent socket condition
IO            22              Socket I/O (aka POLL)
STOP          23              Stop from non-tty process (see CONT)
TSTP          24              Stop from tty process (see CONT)
CONT          25              Continue a stopped process
TTIN          26              Waiting for background tty input (see CONT)
TTOU          27              Waiting for background tty output (see CONT)
VTALRM        28              Virtual alarm timeout
PROF          29              Profiling timeout
XCPU          30              CPU time limit exceeded (dumps core file)
XFSZ          31              File size limit exceeded (dumps core file)
