Recovering an Amazon instance with a corrupted SSH configuration

One of our test Amazon EC2 instances decided to go on holiday last week (reason still under investigation), and as part of the attempted recovery we tried the obligatory “system reboot”. Unfortunately, this failed to fix the problem, and we then discovered that an unsupported setting previously added to the OpenSSH configuration (/etc/ssh/sshd_config) prevented the SSH daemon from starting at all (we found this out from the “View System Log” option in the AWS Management Console).

In any case, as you may know, SSH is pretty much the only way to gain administrative access to an Amazon instance (there is no Unix/Serial console or Remote Desktop to speak of), so if you mess up your SSH configuration, you really are in big trouble!

Actually, that’s not entirely true. As it happens, if your instance is EBS-backed (i.e. uses an EBS volume for its root/system partition), you can detach the root volume from the broken instance (once it has been stopped), attach and mount it on another running instance, fix the erroneous configuration setting, re-attach the volume to the broken instance, and start it up again. Voilà, problem solved!
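For anyone who prefers the command line, the volume shuffle described above can be sketched roughly as follows. This assumes the AWS CLI (aws) is installed and configured, which is not necessarily what we used at the time. The function names, the AWS override variable, and the device names /dev/sdf and /dev/sda1 are all made up for this illustration, so check them against your own instance before running anything.

```shell
#!/bin/sh
# Sketch only: instance IDs, volume ID and device names come from the caller.
# AWS is a hypothetical override so the commands can be previewed with AWS=echo.
AWS="${AWS:-aws}"

# Stop the broken instance and move its root volume to a rescue instance.
move_root_to_rescue() {
    broken="$1"; rescue="$2"; volume="$3"
    $AWS ec2 stop-instances --instance-ids "$broken"
    $AWS ec2 wait instance-stopped --instance-ids "$broken"
    $AWS ec2 detach-volume --volume-id "$volume"
    $AWS ec2 wait volume-available --volume-ids "$volume"
    # /dev/sdf is an assumed free device name on the rescue instance.
    $AWS ec2 attach-volume --volume-id "$volume" --instance-id "$rescue" --device /dev/sdf
}

# After fixing sshd_config on the rescue instance, put the volume back.
return_root_to_broken() {
    broken="$1"; volume="$2"
    # /dev/sda1 is an assumed root device name; yours may differ.
    root_device="${3:-/dev/sda1}"
    $AWS ec2 detach-volume --volume-id "$volume"
    $AWS ec2 wait volume-available --volume-ids "$volume"
    $AWS ec2 attach-volume --volume-id "$volume" --instance-id "$broken" --device "$root_device"
    $AWS ec2 start-instances --instance-ids "$broken"
}
```

Between the two calls you would log in to the rescue instance, mount the attached volume somewhere like /mnt, correct etc/ssh/sshd_config under that mount point, and unmount it again. Setting AWS=echo before calling either function prints the commands instead of running them, which is a cheap way to check the IDs first.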

This is all very reminiscent of that call from a family member saying their computer won’t boot any more, whereupon you remove their hard disk and insert it as a second drive in your own PC in order to get a copy of all the files they invariably haven’t backed up.

For the record we found the solution on the Amazon Forums, and also uncovered another useful command in the process:

sshd -t

Very much like the “apachectl -t” command, this will syntax-check your SSH configuration files for you. You should run this command every time you make a change to your SSH configuration settings.
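To make that habit harder to forget, the check can be wrapped around the reload itself, so that a broken configuration never reaches the running daemon. The following is a minimal sketch rather than a tested tool: safe_sshd_reload, SSHD_BIN and RELOAD_CMD are names invented here, the path /usr/sbin/sshd is an assumption, and the reload command (the service may be called ssh rather than sshd) varies between distributions.

```shell
#!/bin/sh
# SSHD_BIN and RELOAD_CMD are hypothetical overrides added for this sketch.
SSHD_BIN="${SSHD_BIN:-/usr/sbin/sshd}"

# Reload sshd only if the given configuration file passes "sshd -t".
safe_sshd_reload() {
    config="${1:-/etc/ssh/sshd_config}"
    if "$SSHD_BIN" -t -f "$config"; then
        echo "sshd config OK: $config"
        # Assumed reload command; adjust for your distribution.
        ${RELOAD_CMD:-service sshd reload}
    else
        echo "sshd config INVALID, not reloading: $config" >&2
        return 1
    fi
}
```

Run as root, safe_sshd_reload with no arguments checks /etc/ssh/sshd_config and reloads only when the check succeeds; otherwise it prints an error, returns non-zero, and leaves the running daemon alone.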
