Archive for the ‘Linux’ Category

Does Google Hurt Efficiency?

July 31st, 2011 1 comment

The other night we were doing a hardware upgrade on a cluster and testing it. We were using the clusvcadm command to relocate a service from one host in the cluster to another, but the originating server kept getting power fenced. We assumed the problem was the command switches we were using, so I went straight to the man page and my coworker went straight to Google. For reference, there is a 10 year difference in our ages. I grew up with man pages, and it is a pet peeve of mine when no man page exists or it is a terrible placeholder. But I digress. His search turned up a webified man page while I was reading the real one. When I needled him about it his answer was, “But mine is nicely formatted and I can search the web page.” I was surprised. I can search the man page too, right in the pager, and I can even change man page viewers by setting the PAGER variable.

Three weeks ago I needed to bring up an https server on Ubuntu. I spent 45 minutes googling around, reading old, outdated or completely wrong howtos, before finally going to help.ubuntu.com. Twenty minutes later it was done.

The same thing happened over the past couple of weeks while working with Xen and VirtualBox. I’ve toiled away looking at poorly written documentation and even mentioned it in my last Red Hat class. The instructor worked for Red Hat and took umbrage at my statement. He was amazed that I did not think Red Hat had great documentation; I was even more shocked that he considered their documentation more than rudimentary. Have a look for yourself at the Red Hat documentation.

Just this week I was helping a friend, the server and network administrator for a small school system, configure the proper etherchannel load balancing for a server, and he was frustrated with the Cisco documentation. I was astonished. He seemed overwhelmed. He was stuck googling around trying to find the “right” documentation rather than learning the layout of the Cisco documentation website.

The point of this post is that lately I seem to waste more time searching the web for good information than I would spend going straight to the best source of information.

Categories: Linux, Routing Tags:

DRBD and Heartbeat

May 10th, 2011 No comments

I spent a considerable amount of time over the last couple of days working with DRBD and Heartbeat.

Below are the links I used to get things running:
http://wiki.centos.org/HowTos/Ha-Drbd
http://www.howtoforge.com/vm_replication_failover_vmware_debian_etch_p3
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/s-intro-pacemaker.html
http://www.drbd.org/users-guide/s-heartbeat-r1.html
http://www.drbd.org/users-guide/s-heartbeat-config.html
http://www.drbd.org/users-guide/s-heartbeat-crm.html

Part of my problem was not understanding the difference between R1-style and CRM-style clusters and their accompanying daemons: heartbeat, pacemaker and the different protocol versions. Pacemaker is a more advanced cluster resource manager that can work with both Corosync and Heartbeat. Heartbeat uses an older protocol whereas pacemaker uses OpenAIS to be compatible with Red Hat cluster services.

Here are my notes for configuration. For completeness, they are a mix of doing this first on VMware and then on a Xen cluster, so any inconsistencies (device names such as /dev/xvdb and /dev/sdb, or differing host names) are the result of doing this multiple times in different environments. The errors are mine and I would recommend reading the documentation linked above.

The basic idea behind the setup is that DRBD, a network block device, mirrors data between two servers. The heartbeat daemon keeps track of the shared IP and the daemons that are in HA, and runs the init scripts appropriately.

DRBD Initialization

Partition the disk:

fdisk /dev/xvdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 10443.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p
Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-10443, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-10443, default 10443):
Using default value 10443

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): p
Disk /dev/xvdb: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1               1       10443    83883366   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Make sure that the host names are consistent throughout all of these configuration files. This may mean ensuring they are correct in DNS and /etc/hosts.

Locally configured name on each server:

uname -n
drbd01.chainringcircus.org

uname -n
drbd02.chainringcircus.org

DNS name for each server:

dig +short drbd01.chainringcircus.org
192.168.1.191
dig +short drbd02.chainringcircus.org
192.168.1.192

The /etc/drbd.conf file was designed to allow a verbatim copy on both nodes of the cluster.

cat /etc/drbd.conf
#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#

global {
        usage-count no;
}

common {
        protocol C;
        handlers {
                pri-on-incon-degr "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
                #pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
                #pri-on-incon-degr This handler is called if the node is primary, degraded and the local
                #copy of the data is inconsistent.  It broadcasts an error, sleeps for 60 seconds and then halts.
        }

        startup {
                wfc-timeout 10;                 # Wait for connection timeout.  The init script blocks the boot process
                                                          # until the DRBD resources are connected.  We wait for 10 seconds.
                degr-wfc-timeout 30;        # Wait for connection timeout if this node was a degraded cluster.
        }

        disk {
                on-io-error detach;
        } # or panic, ...

        net {  
                cram-hmac-alg "sha1";
                shared-secret "CHANGEME";        # Don't forget to choose a secret for auth
                max-buffers   20000;                  # Play with this setting to achieve highest possible performance
                unplug-watermark   12000;         # Play with this setting to achieve highest possible performance
                max-epoch-size 20000;               # Should be the same as max-buffers
        }
        syncer {
                rate 100M;
        }
}

resource sites {
        device /dev/drbd0;
        disk /dev/sdb;
        meta-disk internal;     # Internal means that the last part of the backing device is used to store the metadata.
        on drbd01.chainringcircus.org {       #on hostname as seen in uname -n and the DNS lookup.
                address 192.168.1.191:7788;
        }
        on drbd02.chainringcircus.org {
                address 192.168.1.192:7788;
        }
}
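
Because the host names in the "on" sections have to match what uname -n prints on each node, a quick check can catch a mismatch before DRBD is started. What follows is only a sketch: the function names drbd_conf_hosts and node_in_drbd_conf are made up for illustration, and the parsing assumes each "on" section starts on its own line as in the file above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD.

# List the host names used in "on <host> {" sections of a DRBD config file.
drbd_conf_hosts() {
    # $1: path to the config file (for example /etc/drbd.conf)
    # "on-io-error" is skipped because whitespace must follow "on".
    sed -n 's/^[[:space:]]*on[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\).*/\1/p' "$1"
}

# Succeed if the given node name appears as an "on" host in the config file.
node_in_drbd_conf() {
    # $1: node name (normally "$(uname -n)"), $2: config file
    drbd_conf_hosts "$2" | grep -Fxq -- "$1"
}

# Example, to be run on each node:
#   node_in_drbd_conf "$(uname -n)" /etc/drbd.conf || echo "uname -n not in drbd.conf"
```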

Copy the configuration file:

scp /etc/drbd.conf root@drbd02.chainringcircus.org:/etc/

Tried to start DRBD but got an error:

service drbd start
Starting DRBD resources: [
sites
no suitable meta data found :(
Command '/sbin/drbdmeta 0 v08 /dev/sdb internal check-resize' terminated with exit code 255
drbdadm check-resize sites: exited with code 255
d(sites) 0: Failure: (119) No valid meta-data signature found.

        ==> Use 'drbdadm create-md res' to initialize meta-data area. <==


[sites] cmd /sbin/drbdsetup 0 disk /dev/sdb /dev/sdb internal --set-defaults --create-device --on-io-error=detach  failed - continuing!
 
s(sites) n(sites) ]..........
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
m:res    cs            ro                 ds                 p  mounted  fstype
0:sites  WFConnection  Secondary/Unknown  Diskless/DUnknown  C


/etc/init.d/drbd stop
Stopping all DRBD resources: .

I had not initialized the meta data storage, which must be done before a DRBD resource can be brought online. The DRBD resource needs to be down, or detached from its backing storage, when you do this.

drbdadm create-md sites
md_offset 1073737728
al_offset 1073704960
bm_offset 1073672192

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

service drbd start
Starting DRBD resources: [
sites
Found valid meta data in the expected location, 1073737728 bytes into /dev/sdb.
d(sites) s(sites) n(sites) ]..........

Check the status:

cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1048508

Make it primary:

drbdadm -- --overwrite-data-of-peer primary sites
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:67584 nr:0 dw:0 dr:67584 al:0 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:980924
        [>...................] sync'ed:  6.7% (980924/1048508)K delay_probe: 10
        finish: 0:01:27 speed: 11,264 (11,264) K/sec
[root@localhost etc]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:1019904 nr:0 dw:0 dr:1019904 al:0 bm:62 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:28604
        [==================>.] sync'ed: 97.7% (28604/1048508)K delay_probe: 195
        finish: 0:00:02 speed: 11,132 (10,404) K/sec
[root@localhost etc]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1048508 nr:0 dw:0 dr:1048508 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
You have new mail in /var/spool/mail/root
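
Rather than re-running cat /proc/drbd by hand, the connection state, roles and disk states can be pulled out of that file with a small helper. This is a sketch under the assumption that the status line looks like the 8.3.8 output above; drbd_state and drbd_in_sync are names I made up, and the optional second argument exists only so the parsing can be tried against a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD.

# Print "<cs> <ro> <ds>" for a DRBD minor from /proc/drbd-style text.
drbd_state() {
    # $1: minor number, $2: status file (defaults to /proc/drbd)
    awk -v minor="$1" '
        $1 == (minor ":") {
            out = ""
            for (i = 2; i <= NF; i++) {
                n = split($i, kv, ":")
                if (n == 2 && (kv[1] == "cs" || kv[1] == "ro" || kv[1] == "ds"))
                    out = out (out == "" ? "" : " ") kv[2]
            }
            print out
            exit
        }
    ' "${2:-/proc/drbd}"
}

# Succeed once both sides of the given minor report UpToDate.
drbd_in_sync() {
    case "$(drbd_state "$1" "${2:-/proc/drbd}")" in
        *" UpToDate/UpToDate") return 0 ;;
        *) return 1 ;;
    esac
}

# Example: wait for the initial sync of minor 0 to finish.
#   until drbd_in_sync 0; do sleep 5; done
```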

Make a file system:

mkfs.ext3 /dev/drbd0
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262127 blocks
13106 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Testing the filesystem:

mount /dev/drbd0 /sites

mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

touch /sites/test.txt

ls /sites
lost+found  test.txt

umount /sites

drbdadm secondary sites

On the second server:

drbdadm primary sites

mount /dev/drbd0 /sites/

mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda5 on /home type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
.host:/ on /mnt/hgfs type vmhgfs (rw,ttl=1)
none on /proc/fs/vmblock/mountPoint type vmblock (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/drbd0 on /sites type ext3 (rw)

ls /sites
lost+found  test.txt
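
The manual switchover above boils down to two halves, one run on each server. The sketch below wraps the same commands in two functions; release_resource, acquire_resource, run and the DRY_RUN variable are my own names, added so the sequence can be printed and reviewed before anything is actually unmounted.

```shell
#!/bin/sh
# Hypothetical wrappers around the commands used in the test above.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the current primary: unmount, then demote the resource.
release_resource() {
    # $1: DRBD resource name, $2: mount point
    run umount "$2" && run drbdadm secondary "$1"
}

# On the other server: promote the resource, then mount it.
acquire_resource() {
    # $1: DRBD resource name, $2: DRBD device, $3: mount point
    run drbdadm primary "$1" && run mount "$2" "$3"
}

# Example:
#   first server:   release_resource sites /sites
#   second server:  acquire_resource sites /dev/drbd0 /sites
```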

Heartbeat R1-style

Heartbeat in an R1-style configuration uses three files that must be configured if you are using the heartbeat protocol:
/etc/ha.d/ha.cf
/etc/ha.d/haresources
/etc/ha.d/authkeys

cat /etc/ha.d/authkeys
auth 1          # A numerical identifier between 1 and 15 inclusive
                    # must be unique within the file.
1 sha1 CHANGEME   # Methods can be md5 sha1 or crc.
                                # The password is just a string.
chmod 600 /etc/ha.d/authkeys
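
Instead of a hand-typed password, the key in authkeys can be generated from /dev/urandom. This is a minimal sketch, not the only way to do it: make_authkeys is a made-up name, and the 20 random bytes printed as 40 hex characters are an arbitrary choice on my part, since the key is just a string.

```shell
#!/bin/sh
# Hypothetical helper: print an authkeys file with a random sha1 key.
make_authkeys() {
    # 20 random bytes rendered as 40 lowercase hex characters.
    key=$(dd if=/dev/urandom bs=20 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

# Example (as root); the umask keeps the file at mode 600 from the start:
#   ( umask 077; make_authkeys > /etc/ha.d/authkeys )
```

The same file then has to be copied to the other node, since both sides must share the key.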

Before we take care of the ha.cf file we need to set up the ha_logd configuration file.

cp /usr/share/doc/heartbeat-2.1.3/logd.cf /etc/

And make changes to the logd.cf file accordingly. Be sure to copy /etc/logd.cf to both servers. Also note that I had to completely stop and then restart the heartbeat daemon for my logging changes to take effect.

cat /etc/logd.cf
#       File to write debug messages to
#       Default: /var/log/ha-debug
debugfile /var/log/ha-debug.log

#
#
#       File to write other messages to
#       Default: /var/log/ha-log
logfile /var/log/ha.log

#
#
#       Facility to use for syslog()/logger
#       Default: daemon
#logfacility    daemon

#       Entity to be shown at beginning of a message
#       for logging daemon
#       Default: "logd"
entity logd

#       Do we register to apphbd
#       Default: no
#useapphbd no

#       There are two processes running for logging daemon
#               1. parent process which reads messages from all client channels
#               and writes them to the child process
#  
#               2. the child process which reads messages from the parent process through IPC
#               and writes them to syslog/disk

#       set the send queue length from the parent process to the child process
#
#sendqlen 256

#       set the recv queue length in child process
#
#recvqlen 256

cat /etc/ha.d/ha.cf
# The recommendation is to use logd.
use_logd yes
# Default option is 0, values are 0-255 with 1-3 being the most useful.
debug 0
# Timing according to the FAQ at www.linux-ha.org/wiki/FAQ
# warntime should be at least 2 * keepalive
# warntime should be 1/2 to 1/4 deadtime
# The interval between heartbeat packets.
keepalive 1
# How quickly Heartbeat should issue a "late heartbeat" warning.  Warntime is
# important for tuning deadtime.
warntime 5
# How long to decide a cluster node is dead.  Too low will falsely declare
# a death and too high will hinder takeover during a failure.
# Can be specified as a floating point number followed by a units specifier.
# If units are omitted it defaults to seconds.
# deadtime 1
# deadtime 100ms 100 milliseconds
# deadtime 1000us 1000 microseconds
deadtime 10
# 694 is the default but can be changed if multiple clusters are in use.
udpport 694
# Which interfaces send UDP broadcast traffic, more than one can be specified.
bcast   eth0
# auto_failback can be "on" "off" or "legacy"
auto_failback off
# Set the nodes in the cluster.
node    in1.eamc.org        
node    in2.eamc.org
# Make sure this IP address is pingable from the bcast network above.
ping 192.168.1.1    
respawn hacluster /usr/lib/heartbeat/ipfail
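
The timing rules quoted in the comments of ha.cf (warntime at least 2 * keepalive, and warntime between 1/4 and 1/2 of deadtime) are easy to get wrong when tuning. As a sketch only, with check_hb_timing being a made-up name and whole seconds assumed for all three values, they can be checked like this:

```shell
#!/bin/sh
# Hypothetical helper: check keepalive/warntime/deadtime against the
# rules of thumb quoted in the ha.cf comments. Whole seconds only.
check_hb_timing() {
    # $1: keepalive, $2: warntime, $3: deadtime
    keep=$1 warn=$2 dead=$3
    [ "$warn" -ge $((2 * keep)) ] || { echo "warntime < 2 * keepalive"; return 1; }
    [ $((4 * warn)) -ge "$dead" ] || { echo "warntime < deadtime / 4"; return 1; }
    [ $((2 * warn)) -le "$dead" ] || { echo "warntime > deadtime / 2"; return 1; }
    echo "ok"
}

# The values used in the file above:
check_hb_timing 1 5 10
```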

cat /etc/ha.d/haresources
drbd01 192.168.1.190 drbddisk::sites Filesystem::/dev/drbd0::/sites::ext3 httpd
# Explanation:
# Primary Server name --> virtual IP address to be used --> DRBD resource as configured in /etc/drbd.conf
# --> where to mount the DRBD resource and the filesystem type --> resource to start/stop in case of failover

Cluster Management
To take over cluster management from a primary server:

/usr/lib/heartbeat/hb_takeover

Relinquishing cluster management to a secondary server:

/usr/lib/heartbeat/hb_standby
/etc/init.d/heartbeat stop

The order of operations as set by the init scripts:

ls -al /etc/rc3.d/ | egrep "hear|drb"
lrwxrwxrwx  1 root root   14 Apr  1 11:40 S70drbd -> ../init.d/drbd
lrwxrwxrwx  1 root root   19 Jun  1 08:58 S75heartbeat -> ../init.d/heartbeat

Notes for Xen users:

# cat /etc/modprobe.d/drbd.conf
options drbd disable_sendpage=1

To allow live migration on Xen:

        net {
                allow-two-primaries;
        }

Split-brain
Playing around this morning I got the cluster into split-brain.

Jun  1 10:46:53 in1 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun  1 10:46:53 in1 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0

Here is how to fix it.
Run this first on the secondary node.

drbdadm -- --discard-my-data connect sites

Run this on the primary node.

drbdadm connect sites
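
Since split-brain shows up as a kernel log line, it can be detected with a grep, and the recovery commands can be printed for review before anyone types them. This is a sketch: the function names are made up, and the extra "drbdadm secondary" step on the node whose data is discarded is my addition, following the usual manual recovery procedure for DRBD 8.3 rather than my notes above.

```shell
#!/bin/sh
# Hypothetical helpers for spotting and reviewing a split-brain recovery.

# Succeed if the given log file contains a DRBD split-brain message.
split_brain_detected() {
    # $1: log file, for example /var/log/messages
    grep -q 'Split-Brain detected' "$1"
}

# Print the commands for the node whose changes will be thrown away.
split_brain_victim_cmds() {
    # $1: DRBD resource name
    printf 'drbdadm secondary %s\n' "$1"
    printf 'drbdadm -- --discard-my-data connect %s\n' "$1"
}

# Print the command for the node whose data is kept.
split_brain_survivor_cmds() {
    printf 'drbdadm connect %s\n' "$1"
}

# Example:
#   split_brain_detected /var/log/messages && split_brain_victim_cmds sites
```

The commands are only printed, never executed, because picking the wrong node as the victim throws away the good copy of the data.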
Categories: Linux Tags:

Data Loss Prevention

March 7th, 2011 No comments

Every once in a while I get to write a neat piece of code that I can share. This is one of those times. I realize it is not large and by PerlMonks standards not very elegant. Therein lies the problem: maintainability over the next few years. Regardless, I like what I wrote and would like to share it.

At the Circus we suspected that we had some data leakage. Nothing like people taking off with everything needed to get home loans and rip off customers, just people not thinking about what they send through email. We didn’t know the extent of the problem or even whether we really had one. Our C-level executives didn’t believe that employees would be so careless with customer data. We decided to find out.

I must say that the results were actually quite positive. We had a couple of people email work related data home so they could work at home over the weekend and a few emails regarding employment, but they were originated by the prospective employee.

Regardless, in order for us to find out I wrote a few scripts that hook into our email system. One that I am particularly proud of recurses through a directory of email messages and attachments scanning each file for relevant data.

Please note that by the time these scripts touch the data it has been scrubbed by the antivirus and other checks we have in place. I am only looking for keywords or regular expressions that would indicate customer related data loss.
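
To give a feel for the kind of regular expression involved, here is a small shell sketch for Social Security and card number shapes. It is an illustration only: looks_like_ssn and looks_like_card are made-up names, and requiring a non-digit (or the start or end of the text) on both sides of the number is my assumption, made so that a long account number is not reported as an SSN.

```shell
#!/bin/sh
# Hypothetical helpers illustrating number-shape matching with grep -E.

# Succeed if the text contains nine digits shaped like an SSN
# (optionally dashed as 3-2-4) that are not part of a longer number.
looks_like_ssn() {
    printf '%s\n' "$1" |
        grep -Eq '(^|[^0-9])[0-9]{3}-?[0-9]{2}-?[0-9]{4}([^0-9]|$)'
}

# Succeed if the text contains 16 digits in groups of four, dashes optional.
looks_like_card() {
    printf '%s\n' "$1" |
        grep -Eq '(^|[^0-9])[0-9]{4}-?[0-9]{4}-?[0-9]{4}-?[0-9]{4}([^0-9]|$)'
}

# Example:
#   looks_like_ssn "SSN: 123-45-6789" && echo "flag for review"
```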

Let me explain the directory structure. Under the email system is the directory /var/spool/filter that contains every email message that has been sent in the last 30 minutes. There is a cleanup process that erases all the files in that directory and that is actually where I wrote the hook, in the cleanup process. Here is a sample listing of the directory.

#ls -1 /var/spool/filter/
msg-1299451572-29517-0
msg-1299451626-29523-0
msg-1299451695-29528-0
msg-1299452467-29565-0
msg-1299452491-29570-0
msg-1299453007-29593-0
msg-1299453086-29599-0

Inside each message directory, shown below, the email header file ends with a .hed extension and the message body is in .txt format. The ETP.doc file is an attachment.

#ls -1 /var/spool/filter/msg-1299451626-29523-0/
ETP.doc
msg-29523-1.txt
msg-29523-2.dat.hed

The subroutine I am most pleased with is the one that recurses through the directory structure. The slurp_tree function returns a hash reference keyed by file name, and the value for a subdirectory is itself a hash reference. I look for it with the following line of code.

if (ref $structure->{$key} eq 'HASH')

That is how I find subdirectories to push onto the stack of recursive calls. As it traverses each directory it just looks at each file extension and makes a determination as to what to do with it.
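
The same walk-and-dispatch idea can be sketched in a few lines of shell, which is handy for checking by hand what would be done with each file in the spool. This is not the Perl shown in this post, only an illustration: classify_attachment and walk_spool are made-up names, the handler names are just printed labels, and the case patterns require a literal dot before the extension.

```shell
#!/bin/sh
# Hypothetical helpers: map each file in a spool directory to a handler name.

# Print the handler label for one file name, based on its extension.
classify_attachment() {
    case "$1" in
        *.doc)          echo parse_doc ;;
        *.xls|*.xlsx)   echo parse_excel ;;
        *.txt)          echo parse_message ;;
        *.pdf)          echo parse_pdf ;;
        *.hed)          echo parse_head ;;
        *)              echo skip ;;
    esac
}

# Walk a directory tree and print "<handler> <path>" for every regular file.
walk_spool() {
    # $1: top-level directory, for example /var/spool/filter
    find "$1" -type f | while IFS= read -r path; do
        printf '%s %s\n' "$(classify_attachment "$path")" "$path"
    done
}

# Example:
#   walk_spool /var/spool/filter
```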

I realize most system administrators are asking why I didn’t use the file command to make sure the script was acting appropriately for each file type but that does not work with the new Microsoft document types.

# file Test-Excel.xlsx
Test-Excel.xlsx: Zip archive data, at least v2.0 to extract

I thought it was a fun project and I enjoyed writing what I felt was an interesting piece of code.

#!/usr/bin/perl
# 2011-01-12 Jud Bishop
# This script goes looking for customer data being sent out through email and
# flags it for further review.
use strict;
use warnings;
use File::Find;
use File::Basename;
use File::Copy::Recursive qw(fcopy dircopy rcopy);
use File::Slurp::Tree;

#my $dir = "/home/jud/TestMessages";
#my $log = "/home/jud/TestMessages/violation";
#my $auditdir = "/home/jud/TestMessages/Trash/";
my $dir = "/var/spool/filter";
my $log = "/var/log/hipaa/violation";
my $auditdir = "/opt/smtpaudit/";
my $debug = 0;


###################
# MAIN
###################
my %tree;
my $tree = slurp_tree($dir);

open (LOG, '>>', $log) or die $!;

traverse_structure($dir, $tree);

close LOG or die $!;


##########
# This does the heavy lifting of the whole program.  It recursively
# iterates through the directory structure and works on a file accordingly.
# Each directory is a hash key.
##########
sub traverse_structure {
        if($debug){print "##traverse_structure\n";}
        my ($base, $structure) = @_;
        my $path;
    my @violation;
    # Set once per directory so the flag from parse_head survives the loop.
    my $secure = 0;
        foreach my $key ( keys %$structure) {
                $path = $base . "/" . $key;
        ## If it's a HASH then it's a directory.
                if (ref $structure->{$key} eq 'HASH'){
            if($debug){print "key: $key\n"};
                        traverse_structure( $path, $structure->{$key} );
                } else {
            if($debug){print "file  : $key\n"};
            if($debug){print "base  : $base\n"};
            if($debug){print "path  : $path\n"};
            if($debug){print "secure: $secure\n"};
            if($debug){print "violation: $#violation\n"};
   
            ## Pick a parser based on the file extension.
            if ($path =~ m/doc$/){
                parse_doc($path, \@violation);
            } elsif ($path =~ m/xlsx$|xls$/) {
                parse_excel($path, \@violation);
            } elsif ($path =~ m/txt$/) {
                parse_message($path, \@violation);
            } elsif ($path =~ m/pdf$/) {
                parse_pdf($path, \@violation);
            } elsif ($path =~ m/hed$/) {
                parse_head($path, \@violation, \$secure);
            }
                }
    }
       # If it is a secure email then it is encrypted on
       # the fly and not a violation.
    if ( ($secure == 0) && ($#violation > 3) ){
        push (@violation, "EMAIL: " . $base);
        log_it(@violation);
        copy_dir($base);
    }
}

# For later review.
sub copy_dir {
    my $path = shift;
    if($debug){print "##copy_dir $path\n";}
    my $file = fileparse($path);
   
    if ($file =~ m/^msg/){
        my $basename = basename($path);
        my $newpath = $auditdir . $basename;
   
        if($debug){print "dircopy $path $newpath\n";}
        dircopy($path,$newpath);
    }
}

# Log file that is easy to read because an employee goes through
# this file and decides if it is a REAL violation.
sub log_it {
    my @text = @_;
    my $line;
    if($debug){print "##log_it\n";}
    print LOG "---------------------------------------------\n";
    foreach $line (@text) {
        print LOG "$line\n";
    }
    print LOG "---------------------------------------------\n";
}

sub parse_head {
    my ($file, $violation_ref, $secure_ref) = @_;
    my @body;
    my $line;
    if($debug){print "##parse_head $file\n";}

    open(FILE,$file) || return 0;
        @body = <FILE>;
    close(FILE);

    foreach $line (@body)   {
        if ($line =~ m/^From/){
                        push (@$violation_ref, $line);
        } elsif ($line =~ m/^To/) {
                        push (@$violation_ref, $line);
        } elsif ($line =~ m/^Subject/) {
                        push (@$violation_ref, $line);
            # A subject that starts with "secure" marks the message as encrypted.
            if ($line =~ m/^Subject:\s*secure/i )
            {
                $$secure_ref = 1;
            }
        }
    }
}

sub parse_pdf {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_pdf $dir $file\n";}
    $CMD = "/usr/bin/pdftotext \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
        system($CMD);
        parse_text ($new_file, $violation_ref);
}

sub parse_doc {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_doc $dir $file\n";}
    $CMD = "/usr/bin/antiword -st \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
        system($CMD);
        parse_text ($new_file, $violation_ref);
}

sub parse_excel {
    my ($file, $violation_ref) = @_;
    my @body;
    my $new_file = $file . ".txt";
    my $CMD;

    if($debug){print "##parse_excel $file\n";}
    $CMD = "/usr/local/bin/antiexcel \"" . $file . "\" > \"" . $new_file . "\"";
    if($debug){print "CMD: $CMD\n";}
        system($CMD);
        parse_text ($new_file, $violation_ref);
}

sub parse_text {
    my ($file, $violation_ref) = @_;
    my @body;
    if($debug){print "##parse_text $file\n";}

    open(FILE,$file) || return 0;
        @body = <FILE>;
    close(FILE);

    compare_text(\@body, $violation_ref);
}

sub parse_message {
    my ($file, $violation_ref) = @_;
    my @body;
    if($debug){print "##parse_message $file\n";}

    open(FILE,$file) || return 0;
        @body = <FILE>;
    close(FILE);

    compare_text(\@body, $violation_ref);
}

# All of the earlier subroutines call this one.  
# It takes the text and looks for keywords.
sub compare_text {
    my ($text_ref, $violation_ref) = @_;
        my @difference;
    my @text_array;
    my @elements;
        my %count;
        my %rules;
        my $element;
    if($debug){print "##compare_text\n";}

    foreach $element (@$text_ref){
            @elements = split(' ', $element);
        push (@text_array, @elements);
    }

        # Keywords that suggest customer data.
        my @rule = ("DOB", "D.O.B.", "d.o.b.", "dob", "death:", "release", "admit", "admission", "Age:", "SSN", "Social", "Security", "Account", "Acct", "claimant", "MRI", "myelogram", "credit", "card");

    # Me being lazy.
        foreach $element (@rule)
        {
                $rules{$element} = 1;
        }

        foreach $element (@text_array)
        {
                if (exists $rules{$element})
                {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
                        push (@$violation_ref, $element);
                }
                # Social Security Number
                elsif($element =~ /\d{3}-?\d{2}-?\d{4}/)
                {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
                        push (@$violation_ref, $element);
                }
                # Credit Card Number or MRN
                elsif($element =~ /\d{4}-?\d{4}-?\d{4}-?\d{4}/)
                {
            if($debug){print "$element\n";}
            $element = "VIOLATION: " . $element;
                        push (@$violation_ref, $element);
                }

        }
}
Categories: Code, Linux Tags:

Veritas/Symantec Baremetal Restore

February 1st, 2011 No comments

I spent a considerable amount of time over the last couple of months testing different restore processes. This is my documentation for restoring Veritas/Symantec backups to a Linux server.

The general outline is this:
1. Create a LiveUSB drive to boot CentOS with a persistent overlay.
2. Install Symantec Backup Exec on the LiveUSB drive.
3. Recreate the drive layout on the new server.
4. Restore to the new server.

Create LiveUSB
The CentOS project publishes a LiveCD toolset and directions for creating a LiveUSB drive with a persistent overlay. Please follow those links for more in-depth directions.

You must install CentOS LiveUSB on an ext2/3/4 formatted USB drive in order for Symantec to work. If you leave the drive formatted as VFAT, Symantec will not work properly and you will get the error “An unknown error occurred within the NDMP subsystem.” Once I reformatted the USB drive as ext3 and installed a new LiveUSB with persistent overlay, Symantec worked. My guess is it has to do with permission bits, but that is only a guess.
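
A quick guard against repeating the VFAT mistake is to check the filesystem type of the USB partition before installing anything on it. This is a sketch: usb_fs_supported is a made-up name, and the accepted list simply mirrors the ext2/3/4 requirement described above.

```shell
#!/bin/sh
# Hypothetical helper: accept only the filesystem types that worked here.
usb_fs_supported() {
    # $1: filesystem type string, for example "ext3" or "vfat"
    case "$1" in
        ext2|ext3|ext4) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (needs root and the real device name):
#   usb_fs_supported "$(blkid -o value -s TYPE /dev/sdb1)" || echo "reformat as ext3 first"
```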

I downloaded the LiveCD tools for Centos here.

Here is some of my history from that server:

umount /mnt
fdisk /dev/sdb
mkfs -t ext3 /dev/sdb1
mkfs -t ext3 /dev/sdb2
livecd-iso-to-disk --overlay-size-mb 1500 CentOS-5.5-i386-LiveCD.iso /dev/sdb1
mount /dev/sdb1 /mnt
ls /mnt

LiveUSB Setup
I wanted to give it a persistent name and IP address for use in our data center. For some of this I was shooting in the dark in order to get Symantec working, but for thoroughness I include it here.

vi /etc/sysconfig/network
HOSTNAME=recovery.chainringcircus.org
vi /etc/sysconfig/networking/devices/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.255.255.0
IPADDR=192.168.1.200
GATEWAY=192.168.1.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
DNS1=192.168.1.201
DNS2=192.168.1.202
DOMAIN=chainringcircus.org

After I configured the hostname and network settings I rebooted to make sure that the persistent overlay worked. I also turned on sshd and set the default runlevel to 3 in /etc/inittab because I did not want to mess with a GUI, but that is your choice. When everything came up properly I installed Symantec and we did a test restore.

Install Symantec
I cover installing Symantec on Linux in another post here. You need to install an older package for compatibility:

yum install compat-libstdc++-296-2.96-138.i386

The specific Symantec rpms I installed are listed below. I did try a newer package from Symantec, but it did not allow us to restore and failed with a different error message. That was, however, while we were still on a VFAT partition. Once I got everything working on an ext3 partition I quit testing.

VRTSvxmsa-4.2.1-211.i386.rpm
VRTSralus-10.00.5629-0.i386.rpm

Recreate Drive Layout
For thoroughness I am going to cover creating the logical volumes that are default for CentOS and RHEL.

First I need to lay out the drive mappings. This is from the old server which I am cloning onto a similar server. In this section I am just going to show the output of a number of commands that confirm the file system layout of the server.

File layout on the old server
From the file /etc/fstab:

LABEL=/boot             /boot                   ext3    defaults        1 2
/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0

From the mount command:

/dev/sda1 on /boot type ext3 (rw)
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)

From the fdisk command:

Disk /dev/sda: 219.8 GB, 219823472640 bytes
255 heads, 63 sectors/track, 26725 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       26725   214564140   8e  Linux LVM

Next I work my way up from the bottom of the LVM stack: the physical volume, the volume group and finally the logical volumes.

From pvdisplay:

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup00
  PV Size               204.62 GB / not usable 31.29 MB
  Allocatable           yes
  PE Size (KByte)       32768
  Total PE              6547
  Free PE               4
  Allocated PE          6543
  PV UUID               jAuzGO-3Zpz-4T3K-mqcI-Ql6D-1dqf-wj917q

From vgdisplay:

  --- Volume group ---
  VG Name               VolGroup00
  System ID            
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               204.59 GB
  PE Size               32.00 MB
  Total PE              6547
  Alloc PE / Size       6543 / 204.47 GB
  Free  PE / Size       4 / 128.00 MB
  VG UUID               LJc2HJ-D7Gr-ketA-5TSe-ppQM-m5di-4YMEgZ

From lvdisplay:

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                HcyaVT-DOEs-1Rdy-h7af-7i0t-P0EF-K2cCxy
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                202.53 GB
  Current LE             6481
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                ZpAnvu-Of5D-PoEO-HaDN-2krv-zIXp-1fF5av
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.94 GB
  Current LE             62
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

On the old server the drive is broken into two partitions, sda1 and sda2:
sda1 /boot 100MB
sda2 Volume Group ~200GB

The volume group on the old server on the sda2 partition is broken into two logical volumes:
LogVol00 / ~200GB
LogVol01 swap ~2GB
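
The sizes in the display output above can be cross-checked by hand, because LVM allocates space in physical extents: a size is simply the extent count multiplied by the 32 MB extent size. Here is a minimal sketch of that arithmetic. The function names pe_to_mb and mb_to_gb are made up for this example and are not LVM commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LVM) to cross-check the display output.

# pe_to_mb EXTENTS PE_SIZE_MB: size in MB covered by a number of extents.
pe_to_mb() {
    echo $(( $1 * $2 ))
}

# mb_to_gb MB: convert to GB (1 GB = 1024 MB here) with two decimals.
mb_to_gb() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1024 }'
}

# Total PE from vgdisplay: 6547 extents of 32 MB each.
mb_to_gb "$(pe_to_mb 6547 32)"    # compare with the VG Size line
# Current LE of LogVol00 from lvdisplay: 6481 extents.
mb_to_gb "$(pe_to_mb 6481 32)"    # compare with the LV Size line
```

The two results should line up with the 204.59 GB volume group and the 202.53 GB root logical volume reported on the old server.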

It is important to remember that the drive mappings on the old server will not necessarily match the mappings on the new one. For instance, on the old server the RAID was on /dev/sda and on the new server the RAID drive is mapped to /dev/sdb. That is only because I am booting from /dev/sda on the LiveUSB; under normal circumstances it will come back up as /dev/sda.

Working on the new server, recreate the partitions:

fdisk /dev/sdb
Command (m for help): p

Disk /dev/sdb: 1199.9 GB, 1199906488320 bytes
255 heads, 63 sectors/track, 145880 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-145880, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-145880, default 145880): +200M

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (26-145880, default 26):
Using default value 26
Last cylinder or +size or +sizeM or +sizeK (26-145880, default 145880):
Using default value 145880

Command (m for help): a
Partition number (1-4): 1

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)

Command (m for help): p

Disk /dev/sdb: 1199.9 GB, 1199906488320 bytes
255 heads, 63 sectors/track, 145880 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          25      200781   83  Linux
/dev/sdb2              26      145880  1171580287+  8e  Linux LVM

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

  /usr/sbin/lvmdiskscan
  /dev/ramdisk       [       16.00 MB]
  /dev/live-squashed [      669.80 MB]
  /dev/root          [        4.00 GB]
  /dev/ram           [       16.00 MB]
  /dev/live-osimg    [        4.00 GB]
  /dev/live          [        7.45 GB]
  /dev/ram2          [       16.00 MB]
  /dev/live-overlay  [        1.46 GB]
  /dev/sda2          [        7.47 GB] LVM physical volume
  /dev/ram3          [       16.00 MB]
  /dev/ram4          [       16.00 MB]
  /dev/ram5          [       16.00 MB]
  /dev/ram6          [       16.00 MB]
  /dev/ram7          [       16.00 MB]
  /dev/ram8          [       16.00 MB]
  /dev/ram9          [       16.00 MB]
  /dev/ram10         [       16.00 MB]
  /dev/ram11         [       16.00 MB]
  /dev/ram12         [       16.00 MB]
  /dev/ram13         [       16.00 MB]
  /dev/ram14         [       16.00 MB]
  /dev/ram15         [       16.00 MB]
  /dev/sdb1          [      196.08 MB]
  /dev/sdb2          [        1.09 TB]
  7 disks
  16 partitions
  0 LVM physical volume whole disks
  1 LVM physical volume

Turn off the LVM in order to make changes. This is just a precautionary step if you have repartitioned your drive.

lvm vgchange -an

Create the LVM.

  vgscan
  Reading all physical volumes.  This may take a while...

  pvcreate -ff /dev/sdb2
  Physical volume "/dev/sdb2" successfully created

Create and activate the volume groups.

  vgcreate VolGroup00 -l 0 -p 0 -s 32m /dev/sdb2
  Volume group "VolGroup00" successfully created

  vgchange -ay VolGroup00
  0 logical volume(s) in volume group "VolGroup00" now active

Finally, create the logical volumes. Even though I have 1.1T I decided to start using 800G, leaving myself room if I want to add another mount point.

  lvcreate -L 800000m -r auto -n LogVol00 VolGroup00
  Logical volume "LogVol00" created

  lvcreate -L 4096m -r auto -n LogVol01 VolGroup00
  Logical volume "LogVol01" created
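
To see how much room those two lvcreate calls leave in the volume group, the sizes can be turned into extents. This is a rough sketch only: extents_needed and extents_available are hypothetical helpers, not LVM commands, and the estimate ignores the small area LVM reserves for its own metadata.

```shell
#!/bin/sh
# Hypothetical helpers (not LVM commands) for planning logical volume sizes.

# extents_needed SIZE_MB PE_SIZE_MB: whole extents required, rounding up.
extents_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# extents_available PARTITION_KB PE_SIZE_MB: whole extents that fit in a
# partition, ignoring the space LVM keeps for metadata.
extents_available() {
    echo $(( $1 / 1024 / $2 ))
}

root_le=$(extents_needed 800000 32)           # LogVol00
swap_le=$(extents_needed 4096 32)             # LogVol01
total_le=$(extents_available 1171580287 32)   # /dev/sdb2 blocks from fdisk

echo "LogVol00: $root_le extents"
echo "LogVol01: $swap_le extents"
echo "Roughly free: $(( total_le - root_le - swap_le )) extents"
```

Note that 800000m works out to about 781 GB in LVM's units (800000 / 1024), so the root volume is a little under a true 800G.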

Read in the new volume groups.

  vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2

Format all of the partitions:

mkfs -t ext3 /dev/sdb1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
[output removed for brevity]
mkfs -t ext3 /dev/VolGroup00/LogVol00
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
[output removed for brevity]
mkswap /dev/VolGroup00/LogVol01
Setting up swapspace version 1, size = 4294963 kB

Here are some quick commands if you mess up and need to delete any of the LVM stack.

lvremove -f /dev/VolGroup00/LogVol00
lvm lvremove -f /dev/VolGroup00/LogVol01
lvm vgchange -a n VolGroup00
lvm vgremove VolGroup00

Restore
Mount the root under /mnt and then have Veritas restore to that mount point.

mount -t ext3 /dev/VolGroup00/LogVol00 /mnt

If you have made it this far then you must really need the information. Here are a couple of screenshots from our backup guru to help in the restore process.

“Preserve Tree” is selected by default.
[Screenshot: de-select]

Select “Restore over existing files”, “Restore all information for files and directories” and “Preserve tree”.
[Screenshot: select]

Install grub on the new machine
During the restore we restored all of the files and directories to /mnt, including /boot. In order to get everything working again we need to set up the boot directory and then grub. Note that Red Hat and CentOS 4.x use legacy grub.

Copy all of /mnt/boot to the real /boot directory.

mkdir /mnt/newboot
mount /dev/sdb1 /mnt/newboot
cp -r /mnt/boot/* /mnt/newboot/

umount /mnt/newboot

mount /dev/sdb1 /boot

Start the legacy grub shell by running grub, then install it onto the new drive:

grub> root (hd1,0)
 Filesystem type is ext2fs, partition type 0x83

grub> find /grub/stage1
 (hd1,0)

grub> setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+16 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub> quit

Restore /dev and /tmp
Depending upon your backup options you may need to restore the /dev directory and create a tmp directory. You need to set the sticky bit on /tmp.

cp devices.tar /mnt/lvm/VolGroup00-LogVol00/
cd /mnt/lvm/VolGroup00-LogVol00/
tar -tvf devices.tar
tar -xvf devices.tar

chroot /mnt/lvm/VolGroup00-LogVol00/
mkdir /tmp
ls -al /
chmod a+rwx /tmp
chmod +t /tmp
exit
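
If you want to see what the two chmod calls do before running them inside the chroot, here is a small sketch that repeats them on a throwaway directory. The demo_root and tmp_mode names are only for this example.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing on the real system is changed.
demo_root=$(mktemp -d)
mkdir "$demo_root/tmp"

# Same two steps as above: open the directory to everyone, then add the
# sticky bit so users can only remove their own files.
chmod a+rwx "$demo_root/tmp"
chmod +t "$demo_root/tmp"

# The first ten characters of the long listing show type and permissions.
tmp_mode=$(ls -ld "$demo_root/tmp" | cut -c1-10)
echo "$tmp_mode"

rm -rf "$demo_root"
```

On a normal filesystem this should print drwxrwxrwt, where the trailing t is the sticky bit. For a freshly created directory the two calls are equivalent to the single octal form chmod 1777.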

Finally you need to set up your ethernet interfaces by editing the file,
/etc/sysconfig/networking/devices/ifcfg-eth0

Categories: Linux Tags:

Password Aging

January 28th, 2011 No comments

At the Circus we have a password policy to change all passwords every 90 days. Today it was brought to my attention that one of the Linux servers was not following that policy. I confirmed that was true and after a little digging I found that it only affected accounts that had been migrated from AIX to Linux. But we couldn’t force around 2000 users to all change their passwords at the same time because we would inundate the help desk.

This is the script that I wrote to fix the problem and distribute the password changes over a month. The result is that only 78 users per day are forced to change their password over a 28 day period.

#!/bin/bash
# 2011-01-28
# Jud Bishop
# Checks for passwords set to never expire and gives an expiration date.
# Distributes the password changes over a 28 day spread.

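X=0

for I in $(cut -d: -f1 /etc/passwd)
do
        # "Password expires : never" yields "never"; a real date yields the month name.
        DATE=$(chage -l "$I" | grep "Password expires" | cut -d: -f2 | cut -d' ' -f2)
        if [ "$DATE" = "never" ]
        then
                echo "$I"
                # Cycle X through 1..28 so each day gets an equal share of users.
                if [ "$X" -le 27 ]
                then
                        X=$((X + 1))
                else
                        X=1
                fi
                echo "$X $I"
                # Record the last change as 2010-11-X; with a 90 day maximum
                # the password expires 90 days after that date.
                chage -d "2010-11-$X" -M 90 "$I"
        fi
done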
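
The rotation in the script above is easy to sanity-check without touching any accounts. This is a minimal sketch: day_for and users_on_day are hypothetical helpers, and 2184 is an assumed account total chosen because it divides evenly into 28 days of 78 users.

```shell
#!/bin/sh
# Hypothetical helpers to reason about the 28 day rotation; they do not
# touch any accounts.

# day_for N: day of the month (1..28) given to the Nth matching account.
day_for() {
    echo $(( ($1 - 1) % 28 + 1 ))
}

# users_on_day TOTAL DAY: how many of TOTAL accounts land on DAY.
users_on_day() {
    base=$(( $1 / 28 ))
    extra=$(( $1 % 28 ))
    if [ "$2" -le "$extra" ]; then
        echo $(( base + 1 ))
    else
        echo "$base"
    fi
}

day_for 28             # last slot of the cycle
day_for 29             # wraps back to day 1
users_on_day 2184 1    # assumed total of 2184 accounts
```

With exactly 2000 accounts the split would be slightly uneven instead: the first 12 days would get 72 users and the rest 71.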
Categories: Code, Linux Tags:

Red Hat Upgrades

December 3rd, 2010 No comments

Now that RHEL6 is out I’ve begun playing with upgrading our older RHEL servers, starting with anything that is RHEL4 and then moving forward to RHEL5. The simplest one to start with was one of our network management servers. They are vital to our job but not customer facing and we can deal with some downtime on them.

My original plan was to install a base RHEL4 on the new server, which would give us a platform with the base system installed and the restore software hosted, then restore from backup over the top.

I took an old server we had that still had maintenance and pressed it into service. I wanted to take the opportunity to test our backups as well as the upgrade path from RHEL4 to RHEL6. So I installed RHEL4 on the new server and ran up2date to make sure it was the “latest” and greatest.

On the old server I ran up2date and then I queried the rpm database to see what packages were installed. The problem is that you cannot feed rpm -qa output straight into an update script. Up2date wants “freetype-devel” as the package name, not the whole package name with the version number as listed in the output, “freetype-devel-2.1.9-17.el4_8.1”. Notice also that some package names have multiple dashes while others have only one dash.

[root@server] # rpm -qa
freetype-devel-2.1.9-17.el4_8.1
krb5-devel-1.3.4-62.el4_8.3
php-4.3.9-3.31

I could not easily use cut in a simple bash script so I wrote the following quick hack.

#!/usr/bin/perl

# 2010-12-03 Jud Bishop
# Run:
# rpm -qa >/tmp/installed-software.log
# Then run:
# for I in `rhel-upgrade.pl`; do up2date $I; done

use strict;
use warnings;

my $log_file = "/tmp/installed-software.log";

open(my $fh, '<', $log_file) or die "Error: can't open log file\n $! \n";
while (<$fh>)
{
    chomp;
    my @log = split /-/;
    my $package = "";

    # Keep only the dash-separated pieces that start with a letter;
    # version and release pieces normally start with a digit.
    for (my $i = 0; $i <= $#log; $i++)
    {
        if ( $log[$i] =~ /^[a-zA-Z]/ )
        {
            if ( $i > 0 )
            {
                $package = $package . "-" . $log[$i];
            } else {
                $package = $log[0];
            }
        }
    }
    print "$package\n";
}

close $fh or die "Error: can't close file\n $! \n";
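
For comparison, here is a different approach from the script above, sketched in shell. It relies on the fact that an rpm version and release never contain a dash, so the package name is whatever remains after removing the last two dash-separated fields. The function name strip_version_release is made up for this example, and it assumes the lines look like the rpm -qa output shown earlier, with no architecture suffix.

```shell
#!/bin/sh
# strip_version_release: read name-version-release lines on stdin and
# print only the name. Hypothetical helper, not an rpm or up2date option.
strip_version_release() {
    sed 's/-[^-]*-[^-]*$//'
}

printf '%s\n' \
    freetype-devel-2.1.9-17.el4_8.1 \
    krb5-devel-1.3.4-62.el4_8.3 \
    php-4.3.9-3.31 | strip_version_release
```

This also keeps name pieces that start with a digit, which the letter test in the Perl script would drop.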

It has been interesting testing our backup software. It appears it will take some refinement for us to get the restore process worked out.

Categories: Code, Linux Tags:

McKesson Star and DHCP

July 17th, 2010 No comments

Aren’t statistics wonderful? I was looking through some referrer traffic and it appears that McKesson Star and DHCP are often googled and this blog comes up as #1 for that query. So I figured I had better write a post on how to set up McKesson Star and DHCP to play well together.

Which leads to a funny story. When I first came to my present employer all PCs that accessed Star had static IP addresses. Well to be fair not all of them, but the default-lease-time was literally set for one year and IP addresses were used in the ports table. At the time we had ~1,500 PCs and 1,000 of them were static IP addresses. Woe unto you if you had a laptop and tried to access Star.

I guess the previous administrator was thinking he would only have to change an IP address in the ports table if the PC was turned off, or once annually _if_ it got a new address upon a renewal request. My day was filled with changing DNS entries and fixing that was high on my list of priorities.

We use ISC BIND and DHCP so let me give you an example of my DHCP configuration. I have another post on DHCP here.

# /etc/dhcpd.conf
# This dhcpd server is the _real_ deal.
authoritative;

# Update using DDNS
# Tells the client where to send the forward update.
ddns-domainname "sub.chainringcircus.org";
ddns-update-style interim;
ddns-updates on;

# Leases
default-lease-time 345600;  # 4 days
max-lease-time 604800;  # 7 days
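
The lease times in dhcpd.conf are given in seconds, which makes them hard to read at a glance. A quick sketch of the conversion, where days_to_seconds is a hypothetical helper and not part of ISC DHCP:

```shell
#!/bin/sh
# days_to_seconds DAYS: lease length in seconds for dhcpd.conf.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 4      # default-lease-time above
days_to_seconds 7      # max-lease-time above
days_to_seconds 365    # the one year lease the old configuration used
```

The first two values should match the 345600 and 604800 in the configuration; a one year lease comes to 31536000 seconds.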

/etc/tcpd.conf
McKesson wrote their own telnet daemon. The reason is that the view you get in Star, as well as your default printer, is set according to a DNS lookup done by their daemon. The McKesson telnet daemon options are set in /etc/tcpd.conf. Let’s discuss this next because how you define name lookups also makes a big difference. As a side note, our tcpd.conf did not change when we moved from AIX to Linux.

From /etc/tcpd.conf:

##  EXAMPLES:
##      GETNAME=NONE        Do not try to get the callers name.
##      GETNAME=SIMPLE      Try to get the callers simple name.
##      GETNAME=FULL        Try to get the callers full name.
##
##
##  Lines beginning with MAPNAME= are used to determine if the callers
##  name gotten from getname should be mapped to lower or upper case.
##
##  FORMAT:
##      MAPNAME=VALUE
##
##      VALUE ......... NONE, the callers name is unchanged. This
##                  is the default if the parameter is
##                  not in the configuration file.
##
##              LOWER, the callers name will be mapped to
##                  lower case.
##
##              UPPER, the callers name will be mapped to
##                  upper case.
##
##  EXAMPLES:
##      MAPNAME=NONE        Do not remap callers name.
##      MAPNAME=LOWER       Map callers name to lower case.
##      MAPNAME=UPPER       Map callers name to upper case.
##
PURGETIME=3h
GETNAME=SIMPLE
MAPNAME=LOWER

What does all of this mean? Keep in mind that UNIX is case sensitive and so is Star. When you define a computer name in Star as well as on the PC, it is important to make sure that they match exactly. That is why it’s easier to use an IP address. The default file does not specify MAPNAME, so whether a PC technician uses HumpBack, ALLCAPS or lowercase makes a difference in how a host name must be defined in the Star tables.

GETNAME
The GETNAME option defines whether or not the server does a query for host.chainringcircus.org or just host. If you decide to do a SIMPLE lookup make sure you have all of the possible domains listed in /etc/resolv.conf.

cat /etc/resolv.conf
nameserver 192.168.1.1
nameserver 192.168.1.2
domain chainringcircus.org
search chainringcircus.org sub.chainringcircus.org chainringcircus.com chainringcircus.net

We use simple because a host is defined as host in the Star table and returns the correct information from an nslookup command.

[root@StarCluster ~]# nslookup host1
Server:     192.168.1.1
Address:    192.168.1.1#53

Name:   host1.chainringcircus.org
Address: 192.168.1.22

MAPNAME
If you don’t set MAPNAME you will have to make sure that the PC name, DNS name and Star table name all match case. We decided to stay with all lowercase PC names. This is very important so let me explain this again, differently. Go to a Windows PC and look at its PC name.

Click:
My Computer
–> Properties
–> Computer Name

If it is DoctorPC521 then it will register in DNS as DoctorPC521. It will return from an nslookup as DoctorPC521 and so it had better be in the Star table as DoctorPC521 not DOCTORPC521 or it will not get the correct view and printer.
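
To make the effect of the two options concrete, here is a minimal sketch that applies the GETNAME and MAPNAME rules, as described in the tcpd.conf comments, to a host name. It is only an illustration: star_name is a hypothetical function, I am assuming that SIMPLE means everything before the first dot, and this is not how the McKesson daemon is actually implemented.

```shell
#!/bin/sh
# star_name HOSTNAME GETNAME MAPNAME
# Illustration only: mimics the documented option meanings, not the daemon.
star_name() {
    name=$1
    case $2 in
        SIMPLE) name=${name%%.*} ;;    # assumed: drop the domain part
    esac
    case $3 in
        LOWER) name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]') ;;
        UPPER) name=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]') ;;
    esac
    printf '%s\n' "$name"
}

# Our settings: GETNAME=SIMPLE and MAPNAME=LOWER.
star_name DoctorPC521.chainringcircus.org SIMPLE LOWER
# Without a MAPNAME the case of the PC name is passed through untouched.
star_name DoctorPC521.chainringcircus.org SIMPLE NONE
```

With MAPNAME=LOWER the Star table entry only ever needs to be doctorpc521, no matter how the technician typed the PC name.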

I hope this helps other administrators trying to figure out how to make McKesson Star and DHCP work well together.

Categories: Linux Tags:

ClusterIt

July 16th, 2010 No comments

I’ve been playing with more clustering as I prepare for a Red Hat class in August and figured I would write about ClusterIt. I was looking to run a few commands on about six servers and went looking for a simple solution. I believe ClusterIt provides an elegant solution for very little work.

Commands
Here is a list of commands and their descriptions from their respective manpages.
dsh – Run a command on a cluster of machines as defined in the CLUSTER environmental variable.
dshbak – Takes input from the dsh command and formats it to look nicer for the user.
run – Run a command on a machine at random.
rseq – Run a command on a sequence of machines or cluster.
pcp – Copy a file to a number of machines.
pdf – Display free disk space across a number of machines, can be for a single filesystem or the entire machine.
prm – Delete a file, directory or list of files on a number of machines.
rvt – Remote terminal emulator.
clustersed – Quickly dissect cluster files, used to cut individual groups out of a cluster file.
dtop – Used to remotely monitor and display top information, this program segfaulted on my system.

There are also some more involved commands; the daemons for these must be set up on the remote machines.
barrier – Used to synchronize execution of commands on slower and faster machines. When a barrier is set, the process is not released until all of the nodes or processes have met the barrier condition.
barrierd – The daemon portion of barrier that accepts connections from the client program barrier.
jsh – Run scheduled commands on remote machines.
jsd – A simple command scheduling daemon for remote execution.

Installation
The first thing you need to do is make sure you have ssh password-less login set up. I went to our network management server and added a couple of the servers that needed to be able to run commands remotely.

In case you are doing this from scratch, here is the sequence of commands. Generate private/public keys on your management server A.

ssh-keygen -t dsa
press enter when it asks for the filename
press enter when it asks for the passphrase (yes, a blank passphrase)

This will generate two files: ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub. You now want to allow access from this server (A) to the remote server (B) by putting the contents of ~/.ssh/id_dsa.pub from A into ~/.ssh/authorized_keys2 on B.

cat ~/.ssh/id_dsa.pub | ssh B 'cat >> ~/.ssh/authorized_keys2'

Make sure permissions are correct and are not writable or readable except by the owner. Do this on both server A and B.

chmod a-x,go-w,o-r ~/.ssh/*

And to verify it works.

ssh B ls -la

Now it’s time to install ClusterIt. I like to have a suite of programs installed in a common directory but don’t want to modify my MANPATH or worry about other nonsense. This is how I installed ClusterIt.

./configure --bindir=/usr/local/clusterit
make
make install
cd /usr/local/clusterit/
ls

If you read the manpage for dsh or one of the other programs in ClusterIt you can see a number of environment variables and how to set up the ClusterIt files. Here is a snippet of the manpage for dsh.

ENVIRONMENT
dsh utilizes the following environment variables.

CLUSTER            Contains a filename, which is a newline separated
list of nodes in the cluster.

RCMD_CMD           Command to use to connect to remote machines.  
The command chosen must be able to connect with no password to
the remote host.  Defaults to rsh

 ...removed for brevity...

FILES
The file pointed to by the CLUSTER environment variable has the
following format:
           pollux
           castor
           GROUP:alpha
           rigel
           kent
           GROUP:sparc
           alshain
           altair
           LUMP:alphasparc
           alpha
           sparc

This example would have pollux and castor a member of no groups,
rigel and kent a member of group 'alpha', and alshain and altair a
member of group 'sparc'.  Note the format of the GROUP command,
it is in all capital letters, followed by a colon, and the group name.
There can be no spaces following the GROUP command, or in the
name of the group.
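
The file format described in that snippet is simple enough to pick apart with awk. Here is a small sketch that lists the hosts in one group, using the example file from the manpage. group_members is a hypothetical helper, not one of the ClusterIt tools, and it assumes each entry starts at the beginning of its line.

```shell
#!/bin/sh
# group_members FILE GROUP: print the hosts listed under GROUP:<name>.
# A hypothetical helper, not one of the ClusterIt tools.
group_members() {
    awk -v want="$2" '
        /^GROUP:/ { group = substr($0, 7); next }   # start of a new group
        /^LUMP:/  { group = ""; next }              # lumps list groups, not hosts
        group != "" && group == want { print }
    ' "$1"
}

# Recreate the manpage example in a temporary file.
cluster_file=$(mktemp)
trap 'rm -f "$cluster_file"' EXIT
cat > "$cluster_file" <<'EOF'
pollux
castor
GROUP:alpha
rigel
kent
GROUP:sparc
alshain
altair
LUMP:alphasparc
alpha
sparc
EOF

group_members "$cluster_file" alpha
group_members "$cluster_file" sparc
```

The first call should list rigel and kent, the second alshain and altair; pollux and castor belong to no group and are never printed.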

As a result I set up my .bashrc with the following options for ClusterIt.

CLUSTER=/etc/clusterit/servers
export CLUSTER

RCMD_CMD=/usr/bin/ssh
export RCMD_CMD

PATH=$PATH:/usr/local/clusterit
export PATH

Make sure you re-source your .bashrc.

source ~/.bashrc

And I have a simple /etc/clusterit/servers file:

cat /etc/clusterit/servers
B
C
D

Now to test.

dsh uptime
B:  17:44:26 up 24 days,  6:32,  5 users,  load average: 0.02, 0.01, 0.00
C:  17:46:56 up 443 days,  9:53,  2 users,  load average: 0.00, 0.00, 0.00
D:  17:46:56 up 443 days,  9:52,  1 user,  load average: 0.00, 0.01, 0.00

Testing
And finally run some commands.

man pcp
pcp /usr/local/bin/script.sh /usr/local/bin/script.sh
dsh /usr/local/bin/script.sh -d /tmp
dsh scp /tmp/output.txt user@A:/tmp/

For that last command you must have password-less login from the ClusterIt servers back to your management server.

Categories: Linux Tags:

The rest of the story.

May 30th, 2010 No comments

In short, I returned my e-book to Narbik. I would recommend Micronicstraining to anyone. In fact I am now even more likely to go to Narbik’s class than I was before this incident.

The long version.
Later that day I called Micronicstraining to discuss my misgivings with them and actually spoke with Narbik. He was very helpful and understood my concerns saying that there would be no problem giving me licenses for more than one computer. With that I got off the phone placated to some extent. I tried to install LockLizard onto Wine and figured I would just deal with the inconvenience. But the installation onto Wine failed and I did not install LockLizard on Windows nor did I try to open the e-book. I didn’t even unrar the files.

That night I tossed and turned, woke up in the middle of the night and pondered my predicament. I figured I had nothing to lose by asking for my money back. The next morning I sent an email to Narbik explaining my dilemma. It is below.

Sir,

Regretfully I am writing to you to request a refund. I have not
activated my LockLizard license and am requesting that you have it
deactivated.

I would like to thank you for taking the time with me on the phone
yesterday. I had fewer misgivings concerning the number of computers
I would be allowed to study on after our conversation, however, I have
developed a study routine over the past 18 months and shoehorning
Windows into that process would not be beneficial at this time. I do
realize the lab PC runs Windows but I had already decided the last few
months of lab practice would be done in a Windows environment, not the
core of my studies.

I am truly disappointed. I downloaded the free workbook and have done
a number of labs from it. Because of that previous experience with
Micronics I did not expect the type of copy protection used in the
workbook as there is no mention of LockLizard on the Micronics
website. Over the past few months I have frequently visited the table
of contents for your workbook to map out my studies. My work
environment is based upon Linux, I do not have a Windows PC at home,
and I would be forced to change my study process in order to use the
workbook.

If you decide to change your copy protection to something more along
the lines of O’Reilly Media or Internetwork Expert please contact me,
I will be the first to purchase your workbook in a more portable
format. If you need to speak with me directly, my office phone number
is (xxx) xxx-xxxx and my cell phone number is (xxx) xxx-xxxx.

Sincerely,

Jud Bishop

Categories: Linux, Musings, Routing Tags:

Integrate McKesson MSE into AD

May 24th, 2010 2 comments

I use the term hacking in the classic sense, not in the cracker sense.

We moved one of our enterprise electronic medical records (EMR) from AIX to Linux over the last few weeks. Go-live was last Thursday night, and I would like to take the time to discuss one of the more interesting hacks we did. It was a long project with some interesting puzzles but this was the most interesting to me.

We were told that you cannot integrate Star/MSE into Active Directory. As far as I was concerned that was throwing down the gauntlet of a challenge to make it work. We have had our fair share of problems with Samba and AD over the years so my boss was pushing to use Likewise rather than pure Samba. We have a split infrastructure: most of the virtual servers use Likewise because my boss set them up, whereas all of the pure Linux servers use Samba because I set them up. It boiled down to this: my boss can hack around Likewise and I am more comfortable hacking Samba. I talked him into Samba so I had to make it work. My boss had hacked Likewise to do something similar so we discussed it and the resulting code is below.

Those who use Star/MSE probably understand the login process; for those who don’t, let me explain. Every user who gets a GUI interface on a Star server shares the same home directory under a restricted Korn shell. We have about 1,500 users that all share one home directory but it doesn’t matter because the .profile just fires off a GUI program. In a typical setup all of the users are in the hbo group and in the password file their home points to /home/mse.

We configured winbind to use the system files first, then AD. This is so that we could have an orderly move from system authentication to AD authentication.

# cat /etc/nsswitch.conf | grep winbind
passwd:     files winbind
shadow:     files winbind
group:      files winbind

In AD we made two groups, hbo to map to the Linux hbo group and a nomse group. Then we forced every AD user into /home/mse directory upon login with the following configuration in /etc/samba/smb.conf.

template shell = /bin/rksh
template homedir = /home/mse
winbind use default domain = true
obey pam restrictions = yes

The point of the nomse group is to be able to pick out the users who should not have the GUI fired off upon login. Even though the group numbers do not match and they are not group mapped with the net groupmap command it doesn’t matter. The trick here is that I am looking for group names in the .profile rather than gids. Below is a portion of the .profile, I would include more but I am not sure of the copyright and it is not pertinent to the discussion.

## 2010-05-19  Jud Bishop
## This is for Active Directory integration of MSE.
## DO NOT CHANGE THIS PORTION OF THE FILE OR USERS WILL NOT BE ABLE TO LOGIN.

USER=`whoami`

for I in `groups |cut -d \: -f 2`
do
        if [ "$I" = "nomse" ]
        then
                export HOME="/home/AD/$USER"
                export SHELL="/bin/bash"
                # The MSEFLAG used to be set below, it is now set here for AD integration.
                MSEFLAG=NO
                # This break is crucial because it exits out with the correct $HOME
                break
        else
                export HOME="/home/mse"
                MSEFLAG=YES
        fi
done
echo "Setting home directory to $HOME"
cd "$HOME"
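
The group test in that .profile fragment can be tried out on its own without logging in as an AD user. This is a minimal sketch of the same decision as a function: choose_home is a hypothetical name, and the user and group names passed to it are made up for the example.

```shell
#!/bin/sh
# choose_home USER GROUP...: print the home directory the .profile logic
# would pick. Membership in nomse wins as soon as it is seen.
choose_home() {
    user=$1
    shift
    for group in "$@"; do
        if [ "$group" = "nomse" ]; then
            echo "/home/AD/$user"
            return
        fi
    done
    echo "/home/mse"
}

choose_home jud hbo nomse    # an administrator who should skip the GUI
choose_home nurse1 hbo       # a regular Star user
```

Because the test compares group names rather than gids, it keeps working even though the AD and Linux group numbers do not match.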
Categories: Linux Tags: