A Simple Model for Managing Change Windows

One of the more common things we do in the Cloud Operations team at Red Hat Mobile is facilitate changes to environments hosted on the Red Hat Mobile Application Platform, either on behalf of our customers or for our own internal operational purposes.

These are normally done within what is commonly known as a “Change Window”, which is a predetermined period of time during which specific changes are allowed to be made to a system, in the knowledge that fewer people will be using the system or where some level of service impact (or diminished performance) has been deemed acceptable by the business owner.

We have used a number of different models for managing Change Windows over the years, but one of our favourite approaches (that adapts equally well to both simple and complex changes and that is easy for our customers and internal stakeholders to understand) is this 5-phase model.

Planning

The planning phase is about producing (and documenting) a solid plan that will serve as a rule book for all the other phases in this model (below). In addition to specifying the (technical) steps required to make (and validate) the necessary changes, your plan should also include the additional (non-technical) information that you will most likely need to share externally so as to set appropriate expectations with the affected users. This includes specifying:

  • What changes are you planning to make?
  • When are you proposing to make them?
  • How long will they take to complete?
  • What will the impact (if any) be on the users of the system before, during and after the changes are made?
  • Is there anything your customers/users need to do beforehand or afterwards?
  • Why are you making these changes?

Your planning phase should also include a provision for formally communicating the key elements of your plan (above) with those interested in (or affected by) it.

Commencement

The commencement phase is about executing on the elements of your plan that can be done ahead of time (i.e. in the hours or minutes before the Change Window formally opens) but that do not involve any actual changes.

Examples include:

  1. Capturing the current state of the system (before it is changed) so that you can verify the system has returned to this state afterwards (see the sketch just after this list).
  2. Issuing a final communication notice to your users, confirming that the Change Window is still going ahead.
  3. Configuring any monitoring dashboards so that the progress (and impact) of the changes can be analysed in real time once they commence.
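
As a minimal illustration of the first example above, here is a hedged sketch of a pre-change snapshot (the commands and file paths are placeholders; the right set will depend entirely on the system being changed):

# Capture a simple "before" snapshot of the system for later comparison
SNAP=/var/tmp/pre-change-$(date +%Y%m%d-%H%M).txt
{
  date                                   # when the snapshot was taken
  uptime                                 # load averages before the change
  df -h                                  # disk usage before the change
  ps aux --sort=-%cpu | head -20         # the busiest processes
  ss -tlnp 2>/dev/null || netstat -tlnp  # which services are listening
} > "$SNAP"
echo "Snapshot written to $SNAP"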

The commencement phase can be a very effective way to maximise the time available during the formal Change Window itself, giving you extra time to test your changes or handle any unexpected issues that arise.

Execution

The execution phase is where the planned changes actually take place. Ideally, this will involve iterating through a predefined set of commands (or steps) in accordance with your plan.

One important mantra which has stood us in good stead here over the years is, “stick to the plan”. By this we mean that, within reason, you should try not to get distracted by minor variations in system responses, which can consume valuable time to the point where you run out of it and have to abandon (or roll back) your changes.

It’s also strongly recommended that the inputs to (and outputs from) all commands/steps are recorded for reference. This data can be invaluable later on if there is a delayed impact on the system and steps need to be retraced.
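
One lightweight way to do this (a hedged sketch rather than a prescription) is to wrap the whole session in the standard script utility, which records both the commands typed and their output:

# Start recording the change-window session to a timestamped log file
$ script -a /var/tmp/change-window-$(date +%Y%m%d-%H%M).log

# ... work through the planned commands/steps ...

# Stop recording when the work is complete
$ exit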

Validation

Again, this phase should be about iterating through a predefined set of verification steps, which may include examining various monitoring dashboards and running automated acceptance/regression test tooling, all in accordance with two very basic principles:

  1. Have the changes achieved what they were designed to (i.e. does the new functionality work)?
  2. Have there been any unintended consequences of the changes (i.e. does all the old functionality still work, or have you broken something)?

Again, it’s very important to capture evidence of the outcomes of the validation phase, both to confirm that the changes have been completed successfully and to show that the system has returned to its original state.
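
If a pre-change snapshot was captured during the commencement phase, one hedged way of producing this evidence is to take an equivalent snapshot now and compare the two (the file names below are placeholders):

# Capture an "after" snapshot using the same commands as the pre-change one,
# then diff the two; review the differences (timestamps and load will naturally vary)
{ date; uptime; df -h; ps aux --sort=-%cpu | head -20; } > /var/tmp/post-change.txt
diff /var/tmp/pre-change-*.txt /var/tmp/post-change.txt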

All Clear

This phase is very closely linked to the validation phase but is slightly more abstract (and usually less technical) in nature. Its primary purpose is to act as a higher-level checklist of tasks that must be completed before the final, formal communication can be sent to the customer (or users), confirming that the work has been completed and verified successfully.


A new era begins for Red Hat in Waterford

As the year that was 2016 draws to a close, we embrace and celebrate the dawn of a new era for Red Hat in Ireland with the opening of our brand new offices in my home city of Waterford on Monday, 12 December 2016.

This is an immensely proud moment for the entire Red Hat team in Waterford, especially so for those involved in the FeedHenry acquisition from October 2014 which has led us to this wonderful occasion.

It is also fitting that the new offices are the first to feature the trademark We Are Red Hat internal branding in the Irish language, which translates as “Is Sinne Red Hat”.

So to the management, staff, families and friends of the growing Red Hat community in Waterford, take a bow and enjoy the celebration and delights that this day will bring.

It is everything that we have worked for and is no more than our wonderful city deserves.

Muhammad Ali and My Grandfather

Edmund (Neddy) Mernin

One of my most prized possessions is some very old VHS footage of my Grandfather, Edmund (Neddy) Mernin (1893-1983), being interviewed for an Irish history documentary in 1969. It really is such a privilege to be able to share this tiny slice of family history with my own children, where they can see real-life footage of their Great Grandfather.

The documentary, entitled Gift of a Church, tells the unusual story of how (in 1965) the church in his home village of Villierstown, Co. Waterford, had been donated by the Church of Ireland to the Catholic people in the village so that they would not have to walk several miles to the nearest village to celebrate Mass on Sunday. The church was in need of some repair and was seemingly no longer needed as the Protestant population had moved away.

The documentary was first broadcast on 30 October 1969 as part of an RTÉ programme called Newsbeat and was reported by the renowned Irish history documentary maker and TV presenter, Cathal O’Shannon (well known for his distinctive voice). And with huge thanks to the team at RTÉ Archives, an excerpt (showing Neddy speaking) is now available online.

Muhammad Ali

So what’s the connection with Muhammad Ali, I hear you say? Well, it turns out that the very same broadcaster who interviewed my Grandfather in 1969 also went on to conduct a famous interview with boxing legend Muhammad Ali when he visited Ireland (to fight Alvin Lewis in Croke Park) in July 1972.

That visit was also chronicled in another Irish documentary from 2012 by Ross Whitaker, entitled When Ali Came to Ireland.

Two Legends, One Story

So it turns out that Muhammad Ali was not the only legendary world figure that Cathal O’Shannon had the privilege of interviewing. He also interviewed Neddy Mernin!


How to exhaust the processing power of 128 CPUs

Amazon Web Services launched another first earlier this year in the form of a virtual server (or EC2 instance as they call it) with a staggering 128 virtual CPUs and over 1.9 Terabytes of memory.

The instance, which is an x1.32xlarge in their naming scheme, could cost you as much as $115,000 per year to operate, though you could reduce that figure significantly (to around $79,000) if you knew ahead of time that you would be running it 24×7 for the whole year.

In any case, during a recent experiment using one of these instances, we set about trying to find some novel ways to max out the processing power and memory, and here are the two techniques we settled on (with evidence of each of them in action).

CPU Exhaustion

This was, strangely, a lot easier than we expected and simply involved using the Unix yes command which, it turns out, will happily consume an entire CPU core when its output is simply discarded rather than put to its normal purpose.

So for our x1.32xlarge instance, with its 128 vCPUs, we used the command below to spawn 127 processes each running the yes command, and we then monitored its impact using the htop command.

$ for i in {1..127}; do yes>/dev/null & done

And here it is in action:

The reason for spawning just 127 processes (instead of the full 128) was to ensure that the htop monitoring utility itself would have enough resources to be able to function, which can be seen clearly above.

Memory Exhaustion

Exhausting the memory was a little harder (to do quickly), but one of our more hard-core Unix guys came up with this old-school beauty, which combines the processor-hungry yes command with a character replacement and a search for a character sequence that will never be found.

$ for i in `seq 1 40`; do cat <(yes | tr \\n x | head -c $((10240*1024*4096))) <(sleep 18000) | grep n &  done
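
For anyone curious about why this exhausts memory, here is the same pipeline spread over several lines with comments (functionally equivalent to the one-liner above):

for i in $(seq 1 40); do
  # yes emits "y\n" forever; tr strips every newline; head caps the stream at ~40 GiB
  # cat then appends <(sleep 18000), which outputs nothing but keeps the pipe open for 5 hours
  # grep searches for "n", which never appears, and because the input now contains no
  # newlines it must buffer the entire stream in memory while waiting for a line to end
  cat <(yes | tr '\n' x | head -c $((10240*1024*4096))) <(sleep 18000) | grep n &
done
# 40 background copies at ~40 GiB each is roughly 1.6 TB of memory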

And here it is in action too, noting the actual memory usage in the bottom, left:

Note that the CPU usage, while almost at the limit, is not as clear-cut as before, with all processors being utilised equally (for the most part). Note also the Load Average of 235 (bottom, right of centre), which supports the theory that Unix systems can sustain load averages of up to twice the number of processors before encountering performance issues. Some folks believe the figure to be closer to one times the number of processors, but the results above suggest otherwise.

Amazon Web Services X1

The original announcement of the X1 instance type is available at:

How to store 1.8 trillion photos on AWS

During a recent evaluation of Amazon’s Elastic File System service, we were astounded to discover that it is backed by what can only be described as a gargantuan storage volume, spanning a whopping 9 Exabytes in size.

For those of you familiar with the Unix operating system, here is a screen shot showing the 9 Exabytes in action (note the sheer number of digits in the Available space column).

To put this mammoth of a number into perspective, assume that:

  1. The average size of a photo taken with a decent smartphone these days is around 5 Megabytes (MB).
  2. It’s quite common to see many such smartphones with a capacity of 16 Gigabytes (GB), which is 16,000 Megabytes, which is around 3,200 photos.
  3. Several laptop models now come with as much as 1 Terabyte (TB) of storage, which is the equivalent of 1,000 Gigabytes (1,000,000 Megabytes), which is enough for around 200,000 photos.

But to get from there to an Exabyte, you’d need to eat your way through a further 1,000 Terabytes to get to what’s known as a Petabyte. And it’d take a further 1,000 of those to finally get to an Exabyte. And remembering that our EFS volume is 9 of those, that’s the equivalent of 1,800 billion (or 1.8 trillion) photos!

And fascinatingly, when using the more helpful, human-readable form of the above Unix command (df -h), which also shows used space in percentage terms, you would have to copy an astonishing 90 Petabytes of information (or 18 billion photos) onto this disk volume just to get the Used column to move off 0%.
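
For anyone who wants to check the back-of-the-envelope sums above, they can be reproduced with simple shell arithmetic (decimal units, 5 MB per photo):

$ echo $(( 9 * 10**12 / 5 ))     # 9 EB expressed in MB, divided by 5 MB per photo
1800000000000
$ echo $(( 9 * 10**6 / 100 ))    # 1% of 9 EB, expressed in TB (i.e. 90 PB)
90000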

Amazon Elastic File System

The main selling points of EFS are that it:

  1. Elastically grows (and shrinks) to meet your storage needs;
  2. Runs across multiple data centres (Availability Zones);
  3. Can be attached to more than one server at a time (by virtue of the fact it’s powered by NFS; see the example below);
  4. Only charges you for the storage you consume.
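
As a quick illustration of point 3, attaching an EFS file system to a Linux server is a standard NFSv4 mount along these lines (the file system ID, region and mount point are placeholders):

$ sudo mkdir -p /mnt/efs
$ sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
$ df -h /mnt/efs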

For more, see http://docs.aws.amazon.com/efs/latest/ug/whatisefs.html.

Why you should always evaluate each new AWS region

Amazon Web Services launched another new hosting region last month, this time in Mumbai, India. The official press release is available at:

Based on our experiences from previous region launches (e.g. Sydney), where we discovered that not all of the services we use/require are available at the time of launch, we decided to compile a list of those features and put together a set of evaluation criteria for determining if/when we might be able to launch some new services from Mumbai.

And while it may seem excessive or wasteful to formally evaluate whether all of the services you use are available (plus, the press release normally contains some information about this), we actually discovered that a number of Instance Types (that we were still using in other regions) were not in fact being made available in Mumbai at all.

Findings

The items below are what we discovered were not as we expected in the Mumbai region, but the list could, of course, be different for you or for the next region launch, so there is still value in compiling your own list of the services your organisation uses/requires.

EC2 Instance Types

Not all instance types were available.
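
If you want to verify this for yourself, the current AWS CLI has a sub-command for exactly this (a hedged example; it post-dates our original evaluation, and the instance type and region shown are purely illustrative):

$ aws ec2 describe-instance-type-offerings --region ap-south-1 \
    --location-type region --filters Name=instance-type,Values=m3.large
# An empty InstanceTypeOfferings list means that type is not offered in the region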

Elastic File System

At the time of the Mumbai launch, EFS was not available there; in fact, it was not yet available in the Sydney region either.

EC2 API Version

It was actually when first evaluating the Frankfurt region (eu-central-1) that we discovered that AWS would not be supporting V1 of their APIs there (so we had to update some of our tooling). It was the same in Mumbai.

Availability Zones

The Mumbai region only had two Availability Zones at initial launch and this was insufficient for a Disaster Recovery (DR) deployment of MongoDB. This is because MongoDB requires a minimum of 3 members to form a valid Replica Set. And while you can still form a 3-member Replica Set across two AZs, two of those members must then share an AZ, so a single AZ failure could take out the majority of the set and leave it unable to elect a primary.

Summary

None of these issues were insurmountable for us but learning about these differences ahead of time did enable us to adjust our delivery timelines and manage customer expectations accordingly (e.g. to allow extra time for new AMI generation), which is a very valuable part of any planning process.

Top Tips for AWS Certificate Manager service

On foot of the recent launch of the AWS Certificate Manager service, we decided to check it out. Here are some of our highlights along with some noteworthy items you may find helpful.

Highlights

  1. The acronym for the new service is ACM (AWS Certificate Manager).
  2. You can programmatically generate certificates, using either the AWS command-line tools or via their APIs (see below).
  3. Certificates generated via ACM are free of charge.
  4. The certificates will automatically renew each year.
  5. Wildcard certificates are also fully supported.

Important to Note

  1. You can only use the certificates within AWS and so cannot extract them to use with externally hosted web servers.
  2. Even though you can programmatically generate certificates, there is still a manual validation process that needs to be completed.
  3. This validation process will be triggered as part of the automatic annual renewal of certificates.
  4. When generating wildcard certificates (e.g. *.acme.example.com), you must also ensure that you include the non-wildcard (base) address as a Subject Alternative Name so that visitors to the site using only that base address (e.g. https://acme.example.com) will avoid security warnings.
  5. You do not appear to have control over the name/id of the generated certificate, so if you had devised some tooling around a naming convention for your previous certs (imported from another provider), the ACM certs may not work with this.

Examples

This command will generate a wildcard certificate in your default region using your default AWS profile (i.e. account):

$ aws acm request-certificate --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com

This command shows how to specify the region and profile to be used for the new certificate:

$ aws acm request-certificate --profile default --region us-east-1 --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com
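
Once a certificate has been requested, its details and validation status can be checked as follows (the ARN shown is a placeholder):

$ aws acm list-certificates --region us-east-1
$ aws acm describe-certificate --certificate-arn arn:aws:acm:us-east-1:123456789012:certificate/<certificate-id>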

For more details about the AWS Certificate Manager service, visit https://aws.amazon.com/certificate-manager.

Monitoring the Health of your Security Certificates

Security Certificates

Most modern websites are protected by some form of Security Certificate that ensures the data transmitted from your computer to it (and vice versa) is encrypted. You can usually tell if you are interacting with an encrypted website via the presence of a padlock symbol (usually green in colour) near the website address in your browser.

These certificates are more commonly known as SSL Certificates but in actual fact, the more technically correct name for them is TLS Certificates (it’s just that nobody really calls them that as the older name has quite a sticky sound/feel to it).

Certificate Monitoring

In any case, one of the things we do a lot of at Red Hat Mobile is monitoring, and over the years we’ve designed a large collection of security certificate monitoring checks. The majority of these are powered by the OpenSSL command-line utility (on a Linux system), which contains some pretty neat features.

This article explains some of my favourite things you can do with this utility, targeting a website secured with an SSL Certificate.

Certificate Analysis

Certificate Dumps

Quite often, in order to extract interesting data from a security certificate you first need to dump its contents to a file (for subsequent analysis):

$ echo "" | openssl s_client -connect <server-address>:<port> > /tmp/cert.txt

You will, of course, need to replace <server-address> and <port> with the details of the particular website you are targeting.

Determine Expiry Date

Using the dump of your certificate from above, you can then extract its expiry date like this:

$ openssl x509 -in /tmp/cert.txt -noout -enddate|cut -d"=" -f2
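
For monitoring purposes, a handy variation on this (a hedged sketch, using a 30-day threshold as an example) is the -checkend flag, which turns the expiry date into a simple pass/fail check:

$ openssl x509 -in /tmp/cert.txt -noout -checkend $((30*24*3600)) \
    && echo "OK: valid for at least 30 more days" \
    || echo "WARNING: expires within 30 days"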

Determine Common Name (Subject)

The Common Name (sometimes called Subject) for a security certificate is the website address by which the secured website is normally accessed. The reason it is important is that, in order for your website to operate securely (and seamlessly to the user), the Common Name in the certificate must match the server’s Internet address.

$ openssl x509 -in /tmp/cert.txt -noout -subject | sed -n '/^subject/s/^.*CN=//p' | cut -d"." -f2-

Extract Cipher List

This is a slightly more technical task but useful in some circumstances nevertheless (e.g. when determining the cipher, and therefore the level of encryption, being negotiated with the server).

$ echo "" | openssl s_client -connect <server-address>:<port> 2>/dev/null | grep -i "cipher"

View Certificate Chain

In order that certain security certificates can be verified more thoroughly, a trust relationship between them and the Certificate Authority (CA) that created them must be established. Most desktop web browsers are able to do this automatically (as they are pre-programmed with a list of authorities they will always trust) but some mobile devices need some assistance with establishing this trust relationship.

To assist such devices with this, the website administrator can opt to install an additional set of certificates on the server (sometimes known as Intermediate Certs) which will provide a link (or chain) from the server to the CA. This chain can be verified as follows:

$ echo "" | openssl s_client -connect <server-address>:<port> -showcerts -status -verify 0 2>&1 | egrep "Verify return|subject=/serial"

The existence of the string “0 (ok)” in the response indicates a valid certificate chain.

OpenSSL

You can find out more about the openssl command-line utility at http://manpages.ubuntu.com/manpages/xenial/en/man1/openssl.1ssl.html.

The True Value of a Modern Smartphone

While vacationing with my family recently, I stumbled into a conversation with my 11-year-old daughter about smartphones and the ever-growing number of other devices they are replacing as they digitally transform our lives.

For fun, we decided to compare the relative cost of the vacation with and without my smartphone (a Samsung Galaxy S3 at the time, by the way), by imagining how much extra the same vacation would have cost a mere 10 years earlier, without a smartphone to hand.

Smart Cost Savings

I was actually quite shocked at the outcome, both in terms of the number of other devices the modern smartphone now replaces (we managed to count 10) and at the potential cost savings it can yield, which we estimated at a whopping $3,450!

Smart Cost Analysis

The estimations below are really just for fun and are not based on very extensive research on my part (more of a gut feeling about a moment in time, plus some quick Googling). You can also assume a 3-week vacation near some theme parks in North America.

Telephony: $100

Assuming two 15 minute phone calls per week, from USA to Ireland, at mid-week peak rates, you could comfortably burn $100 here.

Camera: $1,000

Snapping around 1,000 old-school, non-digital photos (at 25 photos per 35mm roll of film) would require approximately 40 rolls of film (remember, no live preview). Then factoring in the cost of a decent SLR camera, plus the cost of developing those 40 rolls of film, you could comfortably spend well in excess of $1,000 here.

Of course digital cameras would indeed have been an option 10 years ago too, but it’s not unreasonable to suggest that a decent digital camera (with optical zoom, of sufficient portability and quality for a 3-week family vacation) could also have set you back $1,000.

Music Player: $300

The cost of an Apple iPod in 2005 was around $299.

GPS / Satellite Navigation: $400

It’s possible that in 2005, the only way to obtain a GPS system for use in North America was to rent one from the car rental company. Today, this costs around $10 per day, so let’s assume it would have cost around/under $20 per day in 2005, or roughly $400 over a 3-week rental.

Games Console: $300

The retail price for a Nintendo DS in 2005 was $149.99 but you also need to add in the cost of a selection of games, which cost around $50 each. Let’s be reasonable and suggest 3 games (one for each week of the vacation).

Laptop Computer: $1,000

I’m not entirely sure how practical/easy it would have been to access the Internet (at the same frequency) while on vacation in 2005 (i.e. how many outlets offered WiFi at all, never mind free WiFi). Internet cafés would have been an option too, but would not have offered the levels of flexibility I’d have needed to catch up on emails and update/author some important business documents, so let’s assume the price of a small laptop here.

Mobile Hotspot / MiFi: $200

Again, I’m not quite sure if these were freely available (or feasible) in 2005, but let’s nominally assume they were and price them at double what they cost today, plus $100 for Internet access itself.

Alarm Clock: $50

I guess you could request a wake up call in your hotel but if you were not staying in a hotel and needed an alarm clock, you’d either have needed a watch with one on it, or had to purchase an alarm clock.

Compass: $50

Entirely optional of course, but if you’re the outdoor type and fancy a little roaming in some of the national parks, you might like to have a decent compass with you.

Torch: $50

Again, if you’re the outdoor type, or just like to have some basic/emergency tools with you on vacation, you might have brought (or purchased) a portable torch or Maglite Flashlight.