Corrupted Boot Archive after Solaris X86 patch update

I’ve installed a number of Solaris 10 X86 (U3) systems recently and hit a very annoying issue on each one of them: the system will not boot after installing the latest applicable patches. Immediately after the GRUB boot menu times out and it attempts to boot Solaris, it returns a “corrupted boot_archive. No boot device available” message. No other information is presented.

Here is how I recovered from this situation:

  1. Boot the system in Failsafe mode
  2. The system will detect your Solaris boot partition and offer to mount it on /a. Select Yes when asked about this.
  3. Once the system completes its Failsafe boot, go to /a/platform/i86pc and remove the file called boot_archive.
  4. Reboot the system using the “reboot” command, whereby the system appears to re-generate the file you just deleted.
  5. The system should then boot normally again (see the command sketch for steps 3 and 4 below).
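
For reference, once the Failsafe boot has mounted the root file system on /a, steps 3 and 4 amount to something like this (a minimal sketch; the path shown is the standard boot_archive location on x86):

# cd /a/platform/i86pc
# rm boot_archive
# reboot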

After installing and registering a fresh Solaris system, I usually run the smpatch update command at least once to bring the system to a reasonable patch level (before installing any other software on it). I realise that this may not be entirely advisable in a live environment but, on a fresh install, I feel it is a reasonable thing to do. After all, the man page for the smpatch command states (for the update subcommand):

This subcommand analyzes the system, then downloads the appropriate updates from the Sun update server to your system. After the availability of the updates has been confirmed, the updates are applied based on the update policy. …If an update does not meet the policy for applying updates, the update is not applied.

I have used this technique several times on SPARC-based systems without issue. It only appears to happen on X86 installations.

Mounting a CD image on Solaris

If you need to access files on a CD/DVD image from a Solaris system that does not have a CD/DVD drive installed, you can do so using the Loopback File Driver (lofi) as follows:

# lofiadm -a /path/to/your/file.iso

This will create a new device file in /dev/lofi (e.g. /dev/lofi/1) which can then be mounted in the usual fashion:

# mount -F hsfs /dev/lofi/1 /mnt/cdrom

I found this very useful when installing a Solaris JumpStart server on a SunFire T1000, which does not have a CD/DVD drive. I haven’t yet looked at doing this at boot time but I expect there are many ways it could be done.
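
When you are finished with the image, it can be unmounted and the lofi device removed again (assuming the device created above was /dev/lofi/1):

# umount /mnt/cdrom
# lofiadm -d /dev/lofi/1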

Changing the port on Solaris 10 Web Console

If enabled, the Solaris Management Console (SMC) runs a secure web server on port 6789, and this port setting is well documented in many Sun documents as well as several online forums. However, what is not so well documented is how to change it.

Several Sun documents suggest that a simple change to the /etc/opt/webconsole/server.xml file (followed by a restart of the SMC web server) will do the trick, but I have found this not to be the case. Each time I restarted the web server after changing this file, the port on which it ran reverted back to the original setting of 6789. A comment at the top of the server.xml file saying “DO NOT EDIT THIS FILE” also suggests that this is not the correct file.

Instead (after a lot of searching), I found what I believe to be the correct file, /usr/lib/webconsole/.runtimerc, and here is how I eventually changed the port on which the SMC web server runs:

# vi /usr/lib/webconsole/.runtimerc
# /usr/sbin/smcwebserver restart

You should be able to verify the new port is in use using the netstat -an command.
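
For example, if you had changed the port to 8989 (a purely hypothetical value), something like the following should show the web server listening on the new port:

# netstat -an | grep 8989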

Installing Emacs on Solaris

I tried to install emacs on a Solaris 10 system earlier today and ran into trouble along the way. It wasn’t anything major but I was not able to find the solution on the web so I’m publishing my comments here.

The Problem

I downloaded, unpacked and installed the emacs binaries (version 22.0.91) as well as all specified dependencies from sunfreeware.com, but when I tried to run emacs from the command line after installing, I got the following error:

ld.so.1: emacs: fatal: /usr/local/lib/libpng.so.3: wrong ELF class: ELFCLASS64

It turns out that the latest version of the libpng library (1.2.21) was built as a 64-bit library, but all of the other dependencies of emacs that I downloaded (including emacs itself) were built as 32-bit binaries. Naturally, a 32-bit application cannot load a 64-bit library, hence the error above.
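
The file command reports the ELF class of a binary or library, which makes this kind of mismatch easy to spot:

# file /usr/local/lib/libpng.so.3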

The Solution

The solution was as simple as reverting to an earlier version of the libpng library (1.2.20), which appears to have been built as a 32-bit library. I was able to download this from ftp://ftp.sunfreeware.com/pub/freeware/sparc/10
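
For completeness, swapping the library packages amounts to something like the following (the package and file names shown are illustrative only; check the actual package name on your system with pkginfo first):

# pkginfo | grep -i png
# pkgrm SMClibpng
# pkgadd -d libpng-1.2.20-sol10-sparc-local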

Dumping MySQL databases on a replication slave

I spent some time looking at the mysqldump command in detail this afternoon with a view to backing up some in-house MySQL databases which were configured in a simple replicated environment (single master, single slave). I knew that executing something like:

# mysqldump -u user --password=xxxx dbname > dumpfile.sql

on the replication slave would do most of what I needed, but I wanted to be sure that I was getting a consistent backup of the data. At first, I thought I might have to stop MySQL before and after the dump but then I remembered, of course, that MySQL actually needs to be running to do a dump. My Subversion hat was duly removed for the afternoon!

Then I started to think about locking and flushing databases and tables: a simple read lock/unlock of the database before and after the dump should do the trick. Fortunately, this is taken care of for me by the mysqldump utility through the --opt parameter. This parameter is a shorthand way of enabling several other options, most notably the --lock-tables option, which locks the tables in the database being dumped for the duration of the dump itself.
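
So the earlier command, with --opt spelled out explicitly (recent versions of mysqldump enable it by default, so this is purely for clarity), becomes:

# mysqldump -u user --password=xxxx --opt dbname > dumpfile.sql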

Interestingly though, the replication process is also suspended by mysqldump, and this actually led me to believe (falsely, and only for a short time) that all databases had been locked instead of just the one being dumped.

So, in essence, the simple command I started out with above actually does everything I need.

Creating start/finish scripts for Solaris JumpStart installations

One of my earlier posts dealt with the installation of a Solaris JumpStart server. However, I have since been asked to publish some details on the use of start/finish scripts (a means by which you can further tailor the JumpStart process). So, here they are:

What are Start/Finish scripts?

In my understanding, a start script is any executable script file (in any supported language) that is executed just before the Solaris JumpStart installation commences on a given system. A finish script is any executable script file (in any supported language) that is executed just after the Solaris JumpStart installation has completed but before the system is finally rebooted. Personally, I have only ever used finish scripts, and did so to automate the installation of certain Sun packages as well as the creation of certain user accounts and directories.

/export/jumpstart/rules

My original post showed a rules file entry which looked something like this:

network XX.XX.XX.0 && arch sparc - myT1000 -

In this instance, there are no start or finish scripts being used (as indicated by the hyphens on either side of the profile name, myT1000). To specify a finish script change the above entry to look like this:

network XX.XX.XX.0 && arch sparc - myT1000 myFinish.sh

Don’t forget to re-run the check program (/export/install/check) after you update the rules file.
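
Re-running it is simply a matter of executing the script from the JumpStart configuration directory, for example:

# cd /export/install
# ./check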

/export/install/myFinish.sh

An important thing to note about the Solaris JumpStart process is that, prior to the final reboot of the (new) system, the root file system of the new system is mounted on /a and remains there until the system performs its final reboot. So, when creating a finish script, you must bear this in mind and adjust your finish script accordingly. Aside from this, the finish script uses tools and techniques familiar to most Solaris administrators (well, mine does anyway), with some notable exceptions. Please note that I have deliberately omitted certain instructions for security reasons (but you should get the idea in any case):

Script Setup

BASE=/a
MNT=$BASE/mnt
GROUPS_FILE=$BASE/etc/group
PASSWD_FILE=$BASE/etc/passwd

Creating Groups and Users

As I recall, it was a bit tricky to automate this so I went with the following solution:

echo "Creating Groups ..." | tee -a $INSTALL_LOG
echo "$GRP_MYSQL::$GID_MYSQL:" >> $GROUPS_FILE
echo "Creating Users ..." | tee -a $INSTALL_LOG
echo "$USR_MYSQL:x:$UID_MYSQL:$GID_MYSQL::/home/$USR_MYSQL:/dev/null" >> $PASSWD_FILE

Creating Directories

echo "Creating Directories ..." | tee -a $INSTALL_LOG
EXTRA_DIRS="$MNT/media $BASE/storage $BASE/export/zones"
for d in $EXTRA_DIRS
do
    if [ ! -d $d ]; then
        echo "Creating $d ..." | tee -a $INSTALL_LOG
        mkdir -p $d >> $INSTALL_LOG 2>&1
    fi
done

Installing Additional Packages

This can be a little tricky as it requires the creation of a package administration file (new to me) in order to install packages non-interactively. Also, the packages I wanted to install were located on another NFS server:

PKG_ADMIN=$BASE/tmp/admin
PKG_REPO=xxxx:/export/install/pkgs
echo "Configuring additional software ..." | tee -a $INSTALL_LOG
mount -F nfs $PKG_REPO $MNT >> $INSTALL_LOG 2>&1
cat > $PKG_ADMIN << DONT_ASK
mail=root
instance=overwrite
partial=nocheck
runlevel=nocheck
idepend=nocheck
rdepend=nocheck
space=ask
setuid=nocheck
conflict=nocheck
action=nocheck
basedir=default
DONT_ASK

echo "Installing MySQL ..." | tee -a $INSTALL_LOG
pkgadd -n -a $PKG_ADMIN -R $BASE -d $MNT/CSKmysql_sparc.pkg all >> $INSTALL_LOG 2>&1

umount $MNT >> $INSTALL_LOG 2>&1

Clearly, the above snippets will not work out of the box for you. However, they should give you a good starting point.

Configuring Syslog-NG using SSH on Solaris

Objective

The objective of this exercise is to enable remote systems (clients) to write to the system log on a central log server (server), without losing the ability to write to their own local system logs. This is achieved by creating a reverse SSH tunnel from the server to each client (on a special port) such that, if the client configures its system log to use that port, the log entry will be sent across the tunnel to the log server.

Configuring SSH Access between Client and Server

The server will need to create an SSH connection to each client. The SSH tunnel will be initiated by the root user at the server, but SSH connections by root have been disabled on all of my clients (for obvious reasons), so we will need to connect as a non-root user (e.g. someuser) when creating the SSH tunnel. The steps below were used to set up SSH access between server and client:

  1. Create a public key on the server as the user who will initiate the SSH tunnel to the clients (root)

    server# ssh-keygen -t rsa

  2. Copy the resultant public key file to the SSH directory for the someuser at the client

    server# scp /.ssh/id_rsa.pub someuser@remotehost:

  3. At the client, append the public key file just copied to the ~someuser/.ssh/authorized_keys file (see the example below)

The server should now be able to create an unchallenged SSH connection to the client (as user someuser).
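
For completeness, step 3 typically amounts to something like the following at the client (assuming the key was copied into someuser’s home directory as id_rsa.pub):

client$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys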

Configuring SSH Tunnelling

Syslog normally uses port 514, so it follows that this is the port that would need to be tunnelled from the clients to the server in order to enable remote logging. However, the non-root user at the client (someuser) will not be allowed to open port 514, since only root can open ports below 1024 on Unix. Therefore, what we require is a tunnelled connection from a port above 1024, say 1514. If you are using a firewall or IP packet filtering software, you will also need to configure it to allow TCP traffic on port 514 at the server.

Use the following command to test the reverse SSH tunnel between the server and client:

server# /usr/bin/ssh -nNTx -R 1514:127.0.0.1:514 someuser@remotehost.domain.com

If this works, then proceed to the next step. If it does not, try adding -vv to the ssh command to see additional information about the failure. I had a problem here and it turned out that port forwarding was disabled at my client. To enable port forwarding, I modified the SSH configuration and then restarted SSH at the client:

client# vi /etc/ssh/sshd_config
Change the value of AllowTcpForwarding to yes
client# svcadm refresh ssh

Automating SSH Tunnelling

To enable the server to automatically create a reverse tunnel to a given client, the following entry should be added to the /etc/inittab file at the server:

log1:3:respawn:/usr/bin/ssh -nNTx
-R 1514:127.0.0.1:514
someuser@remotehost.domain.com > /dev/null 2>&1

This should occupy a single line only (but has been split over several lines here for readability). The someuser and remotehost.domain.com values should be replaced with a valid user and client hostname for your system.

Configuring the Log Server

  1. Download the syslog-ng package from campin.net and copy it to the system that will become the loghost (do not install it yet). You could obtain a copy of syslog-ng from sunfreeware.com but the one from campin.net installs as a proper SMF service on Solaris and is a cleaner package to work with.
  2. Now remove the existing system-log service from the server. This is required so that syslog-ng can become the primary system logging service on this system.

    # svcadm disable system-log
    # svccfg delete system-log

  3. Install the syslog-ng package downloaded above:

    # pkgadd -d NCsysng-1.6.7-1.pkg

  4. Ensure that it is listed as a valid service

    # svcs -a | grep system-log-ng

  5. Edit the configuration file /usr/local/etc/syslog-ng/syslog-ng.conf (you might like to take a copy of the existing file first) as appropriate:

    options {
        check_hostname(yes);
        keep_hostname(yes);
        chain_hostnames(no);
    };
    source inputs {
        internal();
        sun-streams("/dev/log");
        udp(ip("127.0.0.1"));
        tcp(ip("127.0.0.1") max_connections(100) keep-alive(yes));
    };
    destination logfile {
        file("/var/adm/syslog-ng/$HOST/$YEAR/$MONTH/$FACILITY.$YEAR$MONTH$DAY"
            owner(root) group(root) perm(0600)
            create_dirs(yes) dir_perm(0700));
    };
    log {
        source(inputs);
        destination(logfile);
    };

  6. Start the new syslog-ng service

    # svcadm restart system-log-ng

  7. Verify that the service is operating correctly (should not be listed in output from command below)

    # svcs -xv system-log-ng

Configuring the Log Client

Repeat steps 1-7 above but in Step 5, add the following 2 additional settings to the configuration file:

destination remote {
    tcp("127.0.0.1" port(1514));
};
log {
    source(inputs);
    destination(remote);
};

Note the inclusion of 127.0.0.1 and port(1514) here. This tells the syslog service to write to port 1514 on the local system. This port represents one end of the SSH tunnel, and writing to it results in the log entry being sent to port 514 on the log server, which in turn results in a new entry on that log server.

Notes

  1. Some sites recommend using the keep-alive(yes) setting with the tcp() function above on both the server and the client (to avoid SSH hang-ups). However, I found that this is not supported at the client and it caused my system-log-ng service to enter maintenance mode there. The reason given by svcs -xv was “restarting too quickly”, which was very vague. I ended up searching through the manifest files for the system-log-ng service to see what command it was actually executing (/usr/local/sbin/syslog-ng) and then running that by hand (see the sketch below). It was only then that I saw an error indicating an invalid parameter in my configuration file.
  2. The use of ip("127.0.0.1") in tcp() and udp() in the server configuration ensures that the log server will only listen for local traffic on port 514 on that system. This is more secure.
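
Running the daemon by hand against its configuration file is a quick way of surfacing configuration errors that the SMF output hides; something like the following (the paths are those used by the campin.net package):

# /usr/local/sbin/syslog-ng -f /usr/local/etc/syslog-ng/syslog-ng.conf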

Best of luck!

Registering a Solaris system from the command-line

After you install Solaris on a system, you must register it with Sun before you can do anything useful with it (in particular, apply patches). The most common way of doing this is via the Update Manager application (/usr/bin/updatemanager), which normally runs the registration wizard the first time it is used. However, this application requires a graphical terminal, which you may not always have.

So, here is how to register your Solaris system from the command-line:

1. Create a Registration properties file (copy a sample one already on the system)

# cp /usr/lib/breg/data/RegistrationProfile.properties /tmp/myreg.properties

2. Add your Sun Developer Connection (SDC) username and password to the new file and save

# vi /tmp/myreg.properties

3. Register the system as follows:

# sconadm -a -r /tmp/myreg.properties

That should be all that you need to do. There is also no need to retain the properties file.
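
Since the properties file contains your SDC credentials in plain text, it is probably wise to remove it once registration has succeeded:

# rm /tmp/myreg.properties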

Configuring MySQL Database Replication using Solaris and ZFS

The following notes were taken during an exercise to configure MySQL database replication across two SunFire T2000 servers running Solaris 10, each of which also has a single ZFS file system mounted in /storage/xxxx (where the respective MySQL data files are located). The snapshot capabilities of ZFS were hugely beneficial in this scenario as they allowed mere seconds of database downtime (as opposed to potentially several minutes or even hours otherwise).

The process of setting up replication is already well documented on the Internet, particularly here on the MySQL website. However, as usual, I have chosen to share my experiences and observations in the hope that they may prove useful to others.

Introduction

Each of the steps below is preceded by the system on which the commands are to be carried out, MASTER or SLAVE.

MASTER

The following commands will affect the availability of the database, so prepare a number of login sessions with the commands ready to be executed, and execute them as quickly as possible in the correct order so as to minimize the downtime of the database.

1. Lock the Database
master$ mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;

2. Create a snapshot of the database
master# zfs snapshot tank/masterzfs@mysql-2007-07-18

3. Record the File and Position information from Master Status (to be used on slave later on)
mysql> SHOW MASTER STATUS;
Typical values include File: mysql-bin.000008 and Position: 420414560

4. Unlock the database again
mysql> UNLOCK TABLES;

5. Verify the successful creation of the ZFS snapshot
master# zfs list

6. Compress/pack the required contents of the ZFS snapshot
master# cd /storage/masterzfs/.zfs/snapshot/mysql-2007-07-18
master# tar -cf /storage/masterzfs/masterzfs-mysql-2007-07-13.tar mysql/

[NOTE] The .zfs directory above is a hidden directory and may not be visible (even via “ls -a”). However, it is there and you can access it!

SLAVE

7. Copy the database archive produced on the master to the slave (using SSH compression)
slave# mkdir /storage/slavezfs/tmp
slave# cd /storage/slavezfs/tmp
slave# scp -C etadev@db219:/storage/masterzfs/masterzfs-mysql-2007-07-13.tar .

[NOTE] This took approximately 75 minutes for a 4.7GB file over a 100Mb connection

8. Unpack the database archive at the slave
slave# cd /storage/slavezfs/tmp
slave# tar -xvf masterzfs-mysql-2007-07-13.tar

MASTER

9. Grant database access to the slave
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'xx.xx.xx.xx' IDENTIFIED BY 'replpass';
Clearly, you will need to substitute your own username, slave IP Address and replication password here

10. Update the Master MySQL Configuration File(s) and restart MySQL (if required)
master# vi /etc/my.cnf
server-id=1
log-bin=mysql-bin (uncomment)
innodb_flush_log_at_trx_commit = 1 (uncomment if using InnoDB tables)

SLAVE

11. Move the unpacked files from the Master into place (ensure that MySQL and all services that use it are stopped)
slave# cd /storage/slavezfs
slave# mv mysql mysql.pre-repl
slave# mv tmp/mysql .

12. Update the Slave MySQL Configuration File(s) and restart MySQL (if required)
slave# vi /etc/my.cnf
server-id=2
log-bin=mysql-bin (uncomment)
innodb_flush_log_at_trx_commit = 1 (uncomment if using InnoDB tables)

13. Configure and Start Replication
mysql> CHANGE MASTER TO MASTER_HOST='xx.xx.xx.xx', MASTER_PORT=XXXX, MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=420414560;
mysql> START SLAVE;

Clearly, you will need to substitute your own username, master IP address, port and replication password here. Also, MASTER_LOG_FILE and MASTER_LOG_POS are the values recorded from SHOW MASTER STATUS in Step 3. Be careful when entering the CHANGE MASTER command above, as the inclusion of white space in the log file setting can prevent replication from working correctly. Also, be aware that you are now accessing a different database than may previously have been on this system, so the users/passwords may be different.

14. Verify Replication is Operating correctly
mysql> SHOW SLAVE STATUS;
and examine the Seconds_Behind_Master field. When the slave is fully synchronised with the master, this will be 0.

Best of luck!

Cloning a Solaris Zone

I tried out cloning a Solaris Zone today and it was a breeze: so much easier (and far, far quicker) than creating another zone from scratch and re-installing all the same users, packages, port lock-downs etc. Here are my notes from the exercise:

Existing System Setup

A SunFire T1000 with a single sparse root zone (zone1) installed in /export/zones/zone1. The objective is to create a clone of zone1, called zone2, but using a different IP address and physical network port. I am not using any ZFS datasets (yet).

Procedure

1. Export the configuration of the zone you want to clone/copy

# zonecfg -z zone1 export > zone2.cfg

2. Change the details of the new zone that differ from the existing one (e.g. IP address, data set names, network interface etc.)

# vi zone2.cfg

3. Create a new (empty, unconfigured) zone in the usual manner based on this configuration file

# zonecfg -z zone2 -f zone2.cfg

4. Ensure that the zone you intend to clone/copy is not running

# zoneadm -z zone1 halt

5. Clone the existing zone

# zoneadm -z zone2 clone zone1
Cloning zonepath /export/zones/zone1...
This took around 5 minutes to clone a 1GB zone (see notes below)

6. Verify both zones are correctly installed

# zoneadm list -vi
ID NAME STATUS PATH
0 global running /
- zone1 installed /export/zones/zone1
- zone2 installed /export/zones/zone2

7. Boot the zones again (and reverify correct status)

# zoneadm -z zone1 boot
# zoneadm -z zone2 boot
# zoneadm list -vi
ID NAME STATUS PATH
0 global running /
5 zone1 running /export/zones/zone1
6 zone2 running /export/zones/zone2

8. Configure the new zone via its console (very important)

# zlogin -C zone2

The above step is required to configure the locale, language and IP settings of the new zone. It also creates the system-wide RSA key pairs for the new zone, without which you cannot SSH into it. If this step is not done, many of the services on the new zone will not start and you may observe /etc/.UNCONFIGURED errors in certain log files.

Summary

You should now be able to log into the new zone, either from the global zone using zlogin or directly via ssh (if configured). All of the software that was installed in the existing zone was present and accounted for in the new zone, including SMF services, user configuration and security settings etc.

Notes

If you are using ZFS datasets in your zones, then you may see the following error when trying to execute the clone command for the newly created zone:

Could not verify zfs dataset tank/xxxxx: mountpoint cannot be inherited
zoneadm: zone xxxxx failed to verify

To resolve this, you need to ensure that the mountpoint for the dataset (i.e. ZFS file system) being used has been explicitly set to none. This has happened to me a number of times, even though the output from a zfs list command in the global zone suggested that the dataset did not have a mount point, and in each case the following command did the trick for me:

# zfs set mountpoint=none tank/xxxxx

Easy!