How to store 1.8 trillion photos on AWS

During a recent evaluation of Amazon’s Elastic File System service, we were astounded to discover that it is backed by what can only be described as a gargantuan storage volume, spanning a whopping 9 Exabytes in size.

For those of you familiar with the Unix operating system, here is a screenshot showing the 9 Exabytes in action (note the sheer number of digits in the Available space column).

To put this mammoth of a number into perspective, assume that:

  1. The average size of a photo taken with a decent smartphone these days is around 5 Megabytes (MB).
  2. It’s quite common to see many such smartphones with a capacity of 16 Gigabytes (GB), which is 16,000 Megabytes, which is around 3,200 photos.
  3. Several laptop models now come with as much as 1 Terabyte (TB) of storage, which is the equivalent of 1,000 Gigabytes (1,000,000 Megabytes), which is enough for around 200,000 photos.

But to get from there to an Exabyte, you’d need to eat your way through a further 1,000 Terabytes to get to what’s known as a Petabyte. And it’d take a further 1,000 of those to finally get to an Exabyte. And remembering that our EFS volume is 9 of those, that’s the equivalent of 1,800 billion (or 1.8 trillion) photos!

And fascinatingly, when using the more helpful alternative of the above Unix command (df -h) which shows the used space in percentage terms, you would have to copy an astonishing 90 Petabytes of information (or 18 billion photos) onto this disk volume just to get the Used column to move off 0%.
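
The arithmetic above can be checked with a few lines of shell. This is only a sketch: it assumes decimal units (1,000 MB per GB, and so on) and the 5 MB average photo size used in this post, and the variable and function names are made up for illustration.

```shell
# Sizes are expressed in megabytes, using decimal (x1,000) steps as in the text.
MB_PER_GB=1000
MB_PER_TB=$((MB_PER_GB * 1000))
MB_PER_PB=$((MB_PER_TB * 1000))
MB_PER_EB=$((MB_PER_PB * 1000))

PHOTO_MB=5     # assumed average photo size
VOLUME_EB=9    # size reported for the EFS volume

# photos_in_mb SIZE_MB: how many whole photos fit in SIZE_MB megabytes.
photos_in_mb() {
  echo $(( $1 / PHOTO_MB ))
}

phone_photos=$(photos_in_mb $((16 * MB_PER_GB)))
laptop_photos=$(photos_in_mb "$MB_PER_TB")
volume_photos=$(photos_in_mb $((VOLUME_EB * MB_PER_EB)))

# One percent of the volume: the amount this post says is needed
# before the Used column moves off 0%.
one_percent_mb=$((VOLUME_EB * MB_PER_EB / 100))
one_percent_pb=$((one_percent_mb / MB_PER_PB))
one_percent_photos=$(photos_in_mb "$one_percent_mb")

echo "16 GB phone:      $phone_photos photos"
echo "1 TB laptop:      $laptop_photos photos"
echo "9 EB volume:      $volume_photos photos"
echo "1% of the volume: $one_percent_pb PB, or $one_percent_photos photos"
```

Working in megabytes rather than bytes keeps every intermediate value comfortably inside 64-bit shell arithmetic.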

Amazon Elastic File System

The main selling points of EFS are that it:

  1. Elastically grows (and shrinks) to meet your storage needs;
  2. Runs across multiple data centres (Availability Zones);
  3. Can be attached to more than one server at a time (by virtue of the fact it’s powered by NFS);
  4. Charges you only for the storage you consume.
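
Because EFS is exposed over NFS, attaching it to a server comes down to an ordinary NFS mount. The sketch below only prints the mount command rather than running it. The file system ID (fs-12345678), the region and the mount point are hypothetical placeholders, the function name is made up, and the DNS name format and mount options are assumptions based on the AWS documentation, so check them against the current EFS guide before relying on them.

```shell
# efs_mount_cmd FS_ID REGION MOUNT_POINT
# Prints (does not run) an NFSv4.1 mount command for an EFS file system.
efs_mount_cmd() {
  fs_id=$1
  region=$2
  mount_point=$3
  # Assumed DNS name format: <file-system-id>.efs.<region>.amazonaws.com
  host="${fs_id}.efs.${region}.amazonaws.com"
  opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"
  echo "sudo mount -t nfs4 -o ${opts} ${host}:/ ${mount_point}"
}

# Hypothetical placeholder values.
efs_mount_cmd fs-12345678 eu-west-1 /mnt/efs
```

Running the printed command on each server that needs the data is what lets several servers share the same file system.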

For more, see http://docs.aws.amazon.com/efs/latest/ug/whatisefs.html.

Why you should always evaluate each new AWS region

Amazon Web Services launched another new hosting region last month, this time in Mumbai, India. The official press release is available at:

Based on our experiences from previous region launches (e.g. Sydney), where we discovered that not all of the services we use/require are available at the time of launch, we decided to compile a list of those features and put together a set of evaluation criteria for determining if/when we might be able to launch some new services from Mumbai.

And while it may seem excessive or wasteful to formally evaluate whether all of the services you use are available (plus, the press release normally contains some information about this), we actually discovered that a number of Instance Types (that we were still using in other regions) were in fact not being made available in Mumbai at all.

Findings

The items below are what we discovered were not as we expected (in the Mumbai region), but the list could, of course, be different for you or for the next region launch. So there is still value in compiling your own list of the services your organisation uses/requires.

EC2 Instance Types

Not all instance types were available.
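
A check like this is easy to script. The sketch below compares the instance types you rely on against the types offered in a region and prints the ones that are missing. The function name and the sample lists are made up for illustration; in practice the second list could come from a command such as aws ec2 describe-instance-type-offerings, assuming your version of the AWS command-line tools includes it.

```shell
# missing_items REQUIRED_FILE AVAILABLE_FILE
# Prints every line of REQUIRED_FILE that has no exact match in AVAILABLE_FILE.
missing_items() {
  grep -F -x -v -f "$2" "$1"
}

required=$(mktemp)
available=$(mktemp)

# Illustrative sample data only, not the real Mumbai offering.
printf '%s\n' t2.micro m3.medium c4.large > "$required"
printf '%s\n' t2.micro c4.large m4.large > "$available"

# With real data, the available list could be produced with something like:
#   aws ec2 describe-instance-type-offerings --region ap-south-1 \
#     --location-type region \
#     --query 'InstanceTypeOfferings[].InstanceType' --output text | tr '\t' '\n'

missing=$(missing_items "$required" "$available")
echo "Not available: $missing"

rm -f "$required" "$available"
```

The same comparison works for any list of names, so it can be reused for services or AMIs as well as instance types.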

Elastic File System

At the time of the Mumbai launch, EFS was not available there, and it was not yet available in the Sydney region either.

EC2 API Version

It was actually when first evaluating the Frankfurt region (eu-central-1) that we discovered that AWS would not be supporting V1 of their APIs there (so we had to update some of our tooling). It was the same in Mumbai.

Availability Zones

The Mumbai region only had two Availability Zones at initial launch, and this was insufficient for a Disaster Recovery (DR) deployment of MongoDB. This is because MongoDB requires a minimum of 3 members to form a valid Replica Set. And while you can still form a 3-member Replica Set using two AZs, two of those members must share an AZ, so a failure of that AZ would take out two of the three and leave the surviving member without the majority it needs to elect a primary.
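
The reasoning comes down to majority arithmetic: a Replica Set can only elect a primary while a majority of its voting members can reach each other. The sketch below checks whether a given spread of members across AZs keeps that majority when any single AZ is lost. The function name is hypothetical, and each argument is the number of voting members placed in one AZ.

```shell
# survives_any_az_loss N1 N2 ...
# Prints "yes" if a majority of members remains after losing any one AZ,
# otherwise "no".
survives_any_az_loss() {
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  majority=$((total / 2 + 1))
  for n in "$@"; do
    # Members left if the AZ holding n members fails.
    if [ $((total - n)) -lt "$majority" ]; then
      echo "no"
      return 0
    fi
  done
  echo "yes"
}

echo "2 AZs (2 + 1 members):     $(survives_any_az_loss 2 1)"
echo "3 AZs (1 + 1 + 1 members): $(survives_any_az_loss 1 1 1)"
```

With two AZs, whichever way the three members are split, one AZ always holds at least two of them, which is why a third AZ is needed.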

Summary

None of these issues were insurmountable for us, but learning about these differences ahead of time did enable us to adjust our delivery timelines and manage customer expectations accordingly (e.g. to allow extra time for new AMI generation), which is a very valuable part of any planning process.

Top Tips for AWS Certificate Manager service

On foot of the recent launch of the AWS Certificate Manager service, we decided to check it out. Here are some of our highlights along with some noteworthy items you may find helpful.

Highlights

  1. The acronym for the new service is ACM (AWS Certificate Manager).
  2. You can programmatically generate certificates, using either the AWS command-line tools or via their APIs (see below).
  3. Certificates generated via ACM are free of charge.
  4. The certificates will automatically renew each year.
  5. Wildcard certificates are also fully supported.

Important to Note

  1. You can only use the certificates within AWS and so cannot extract them to use with externally hosted web servers.
  2. Even though you can programmatically generate certificates, there is still a manual validation process that needs to be completed.
  3. This validation process will be triggered as part of the automatic annual renewal of certificates.
  4. When generating wildcard certificates (e.g. *.acme.example.com), you must also ensure that you include the non-wildcard (base) address as a Subject Alternative Name so that visitors to the site using only that base address (e.g. https://acme.example.com) will avoid security warnings.
  5. You do not appear to have control over the name/id of the generated certificate, so if you had devised some tooling around a naming convention for your previous certs (imported from another provider), the ACM certs may not work with this.

Examples

This command will generate a wildcard certificate in your default region using your default AWS profile (i.e. account):

$ aws acm request-certificate --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com

This command shows how to specify the region and profile to be used for the new certificate:

$ aws acm request-certificate --profile default --region us-east-1 --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com
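
If you request wildcard certificates often, a small wrapper can make sure the base address is never forgotten. This is a sketch only: the function name is made up, and it prints the command for review instead of running it.

```shell
# acm_wildcard_cmd BASE_DOMAIN [REGION]
# Prints an aws acm request-certificate command covering *.BASE_DOMAIN and
# BASE_DOMAIN itself (as a Subject Alternative Name).
acm_wildcard_cmd() {
  base=$1
  region=${2:-}
  cmd="aws acm request-certificate"
  if [ -n "$region" ]; then
    cmd="$cmd --region $region"
  fi
  # The wildcard is quoted so the shell does not try to expand the asterisk.
  echo "$cmd --domain-name '*.$base' --subject-alternative-names '$base'"
}

acm_wildcard_cmd acme.example.com
acm_wildcard_cmd acme.example.com us-east-1
```

Printing the command first gives you a chance to review the domain names before a validation request is sent out.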

For more details about the AWS Certificate Manager service, visit https://aws.amazon.com/certificate-manager.