Building a Cloud while in the Clouds

So you’re heading to the US for some business meetings with your Chief Architect, then you get upgraded to business class where there’s free WiFi and you’ve got 6 hours to kill. Your options are to watch movies (seen them all before), drink wine (a given) and/or have an in-flight hackathon to test out the quality of the WiFi.

And so we did just that and went ahead and provisioned an instance of the latest Aerogear Mobile Services powered by OpenShift Origin, resulting in our very own cloud platform built in the clouds!

Indeed, the Internet connection was spotty at best but in between the spottiness, our installer script did run to completion…

…and we did (eventually) get the all-elusive OpenShift Console with the Mobile tab in all its beautiful glory.

We also needed to get very creative in order to share the screen shots (which involved USB-C cables and several other travel accessories that only an Architect and Director would have) despite physically sitting beside each other, but such is life. And for good measure, we also published this blog article from the air!

So what have you done to test your in-flight WiFi and how was it for you?

A case for more Open Source at Apple

Open Source Context

I’ve been involved in the software industry for almost 30 years and have long been an admirer of open source software, both in terms of what it stands for and in terms of the inherent value it provides to the communities that create it and support it.

I’m even old enough to remember reading about the creation of the Free Software Foundation itself by Richard Stallman in 1985 and couldn’t be happier to now find myself working at the kingpins of open source, Red Hat (through the acquisition of FeedHenry in 2014).

And while in recent years it’s been reassuring to see more and more other companies adopt an open source strategy for some of their products, including the likes of Apple and Microsoft, it’s been equally soul destroying having to live with the continued closed source nature of some of their other products. Take Apple’s Photos app for iOS as a case in point.

Apple iPhoto

Some time around 2011, I took the decision to switch to Apple’s excellent iPhoto app for managing my personal photo collection, principally due to the facial recognition and geolocation features but also because of the exceptional and seamless user experience across the multitude of Apple devices I was amassing.

Then, in late 2012, I undertook a very lengthy personal project (spanning 9 months or more) to convert my extended family’s vintage photo collection to digital format, importing the photos to iPhoto and going the extra mile to complete the facial and location tagging also.

The resultant experience was incredible, particularly when synced onto my iPad of the time (running iOS 6). Hours at a time were spent perusing through the memories it invoked, with brief interludes of tears and laughter along the way. What was particularly astonishing was how the older generations embraced the iPad experience within minutes of holding the device for the very first time. This was the very essence of what Steve Jobs worked his entire life for, and for this I am eternally grateful to the genius he clearly was.

Apple Photos

However, since then, with the launch of subsequent releases of iOS I have never been able to recreate the same experience, for two reasons.

Firstly, the user interface of the iPhoto app kept changing (becoming less intuitive each time, as evidenced by the lessening magic experienced by the same generation that previously loved it so much), and secondly, it was replaced outright by the Photos app which, incredibly, has one simple but quite unbelievable bug – it cannot sort!

Yes, quite incredibly, the Photos app for iOS cannot sort my photos when using the Faces view. If you don’t believe me, just Google the phrase “apple photos app sort faces” and take your pick of the articles lamenting such a rudimentary failing.

A Case for Open Source

“So what does this have to do with open source?”, I hear you ask.

Well, trawling through the countless support articles on Apple’s user forums, it seems that this bug has been confirmed by hundreds of users but, several years later, it is still not fixed. If this were an open source project, I’m sure it would long since have been fixed by one of the members of the community that would form around it, and potentially even by me!

So c’mon Apple, let’s have some more open source and let’s make your products better, together.

A Simple Model for Managing Change Windows

One of the more common things we do in the Cloud Operations team at Red Hat Mobile is facilitate changes to environments hosted on the Red Hat Mobile Application Platform, either on behalf of our customers or for our own internal operational purposes.

These are normally done within what is commonly known as a “Change Window”, which is a predetermined period of time during which specific changes are allowed to be made to a system, in the knowledge that fewer people will be using the system or where some level of service impact (or diminished performance) has been deemed acceptable by the business owner.

We have used a number of different models for managing Change Windows over the years, but one of our favourite approaches (that adapts equally well to both simple and complex changes and that is easy for our customers and internal stakeholders to understand) is this 5-phase model.


Planning

The planning phase is basically about identifying (and documenting) a solid plan that will serve as a rule book for all the other elements in this model (below). In addition to specifying the (technical) steps required to make (and validate) the necessary changes, your plan should also include additional (non-technical) information that you will most likely need to share externally so as to set the appropriate expectations with the affected users. This includes specifying:

  • What changes are you planning to make?
  • When are you proposing to make them?
  • How long will they take to complete?
  • What will the impact (if any) be on the users of the system before, during and after the changes are made?
  • Is there anything your customers/users need to do beforehand or afterwards?
  • Why are you making these changes?

Your planning phase should also include a provision for formally communicating the key elements of your plan (above) with those interested in (or affected by) it.


Commencement

The commencement phase is about executing on the elements of your plan that can be done ahead of time (i.e. in the hours or minutes before the Change Window formally opens) but that do not involve any actual changes.

Examples include:

  1. Capturing the current state of the system (before it is changed) so that you can verify the system has returned to this state afterwards.
  2. Issuing a final communication notice to your users, confirming that the Change Window is still going ahead.
  3. Configuring any monitoring dashboards so that the progress (and impact) of the changes can be analysed in real time once they commence.

The commencement phase can be a very effective way to maximise the time available during the formal Change Window itself, giving you extra time to test your changes or handle any unexpected issues that arise.


Execution

The execution phase is where the planned changes actually take place. Ideally, this will involve iterating through a predefined set of commands (or steps) in accordance with your plan.

One important mantra which has stood us in good stead here over the years is, “stick to the plan”. By this we mean, within reason, try not to get distracted by minor variations in system responses which could consume valuable time, to the point where you run out of time and have to abandon (or roll back) your changes.

It’s also strongly recommended that the input to (and outputs from) all commands/steps are recorded for reference. This data can be invaluable later on if there is a delayed impact on the system and steps need to be retraced.
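One lightweight way to record the inputs and outputs of each step is a small wrapper function. The following is a minimal sketch only: the function name run_step and the CHANGE_LOG location are hypothetical, and are not taken from our own tooling.

```shell
# Hypothetical log location; override CHANGE_LOG to suit your environment.
CHANGE_LOG=${CHANGE_LOG:-/tmp/change-window.log}

# Run one step of the plan, recording the command, its output and exit code.
run_step() {
  local rc
  printf '[%s] RUN: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$CHANGE_LOG"
  "$@" >> "$CHANGE_LOG" 2>&1 && rc=0 || rc=$?
  printf '[%s] EXIT: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rc" >> "$CHANGE_LOG"
  return "$rc"
}
```

Each planned command is then simply prefixed with run_step (for example, run_step uptime), and the log file becomes the record to retrace later.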


Validation

Again, this phase should be about iterating through a predefined set of verification steps (which may include examining various monitoring dashboards or running automated acceptance/regression test tooling), all in accordance with two very basic principles:

  1. Have the changes achieved what they were designed to (i.e. does the new functionality work)?
  2. Have there been any unintended consequences of the changes (i.e. does all the old functionality still work, or have you broken something)?

Again, it’s very important to capture evidence of the outcomes from the validation phase, both to confirm that the changes have been completed successfully and to show that the system has returned to its original state.
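The before-and-after comparison can be as simple as a diff of two snapshots. This is a minimal sketch, assuming the state you care about can be printed as plain text by some command; the function names are hypothetical.

```shell
# Capture the output of a state-reporting command into a snapshot file.
# $1 = snapshot file, remaining arguments = the command that reports state
snapshot_state() {
  local out=$1
  shift
  "$@" > "$out" 2>&1
}

# Succeeds (exit 0) only if the two snapshots are identical;
# otherwise prints the differences and returns non-zero.
state_unchanged() {
  diff -u "$1" "$2"
}
```

Taking one snapshot before the Change Window opens and another after validation gives you a small, shareable piece of evidence either way.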

All Clear

This phase is very closely linked to the validation phase but is slightly more abstract (and usually less technical) in nature. Its primary purpose is to act as a higher-level checklist of tasks that must be completed, in order that the final, formal communication to the customer (or users) can be sent, confirming that the work has been completed and verified successfully.


How to exhaust the processing power of 128 CPUs

Amazon Web Services launched another first earlier this year in the form of a virtual server (or EC2 instance as they call it) with a staggering 128 virtual CPUs and over 1.9 Terabytes of memory.

The instance, which is an x1.32xlarge in their naming scheme, could cost you as much as $115,000 per year to operate but you could certainly reduce that figure significantly (e.g. $79,000) if you knew ahead of time that you would be running it 24×7 during that time.

In any case, during a recent experiment using one of these instances, we set about trying to find some novel ways to max out the processing power and memory, and here are the two techniques we settled on (with evidence of each of them in action).

CPU Exhaustion

This was strangely a lot easier than we expected and simply involved using the Unix yes command which, it seems, draws excessive amounts of processing power when used in isolation from its normal purpose.

So for our x1.32xlarge instance, with its 128 vCPUs, we used the command below to spawn 127 processes each running the yes command and we then monitored its impact using the htop command.

$ for i in {1..127}; do yes>/dev/null & done

And here it is in action:

The reason for spawning just 127 processes (instead of the full 128) was to ensure that the htop monitoring utility itself would have enough resources to be able to function, which can be seen clearly above.
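The same idea can be generalised to any instance size by deriving the process count from the number of CPUs. This is a sketch only: worker_count and start_workers are hypothetical helpers, and nproc is assumed to be available (it ships with GNU coreutils).

```shell
# Print how many 'yes' processes to start: one fewer than the CPU count
# (leaving a CPU free for monitoring), but never fewer than one.
# $1 = CPU count (optional; defaults to the output of nproc)
worker_count() {
  local cpus=${1:-$(nproc)}
  if [ "$cpus" -gt 1 ]; then
    echo $((cpus - 1))
  else
    echo 1
  fi
}

# Start the workers in the background and print their PIDs, one per line,
# so that they can be stopped again afterwards with kill.
start_workers() {
  local n i=0
  n=$(worker_count "$@")
  while [ "$i" -lt "$n" ]; do
    yes > /dev/null &
    echo $!
    i=$((i + 1))
  done
}
```

Saving the output of start_workers to a file makes the clean-up a one-liner later on, since the PIDs can be passed straight to kill.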

Memory Exhaustion

Exhausting the memory was a little harder (to do quickly) but one of the more hard-core Unix guys came up with this old-school beauty which combines the processor-hungry yes command with some complex character replacements, plus searches for character sequences that will never be found.

$ for i in `seq 1 40`; do cat <(yes | tr \\n x | head -c $((10240*1024*4096))) <(sleep 18000) | grep n &  done
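To put some numbers on that command (a back-of-the-envelope sketch, assuming I am reading the pipeline correctly): head -c caps each stream at 10240*1024*4096 bytes, and because tr has replaced every newline, grep never sees the end of a line and so has to keep buffering its input, while the sleep keeps the pipe open. The helper name below is hypothetical.

```shell
# Bytes requested per pipeline by 'head -c', and the total across N
# pipelines, both expressed in GiB (1 GiB = 1024^3 bytes).
memory_target_gib() {
  local pipelines=$1
  local per_pipeline_bytes=$((10240 * 1024 * 4096))
  local per_pipeline_gib=$((per_pipeline_bytes / 1024 / 1024 / 1024))
  echo "$per_pipeline_gib GiB per pipeline, $((per_pipeline_gib * pipelines)) GiB in total"
}

memory_target_gib 40
```

In other words, the 40 pipelines between them ask for a large share of the memory available on the instance.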

And here it is in action too, noting the actual memory usage in the bottom, left:

Note also that the CPU usage, while almost at the limit, is not as clear-cut as before and all processors are being utilised equally (for the most part). Note also the Load Average of 235 (bottom, right of centre) which supports the theory that Unix systems can theoretically sustain load averages of twice the number of processors before encountering performance issues. Some folks believe this to be closer to one times the number of processors but the results above suggest otherwise.
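For comparison across machines of different sizes, it can help to express the load average per CPU. A small sketch, assuming awk is available; load_per_cpu is a hypothetical helper and, on a live Linux system, its first argument would typically be read from /proc/loadavg.

```shell
# Print the load average divided by the CPU count, to two decimal places.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The figures from this experiment: a load average of 235 on 128 vCPUs.
load_per_cpu 235 128
```

For this experiment that works out at a little under two per vCPU.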

Amazon Web Services X1

The original announcement of the X1 instance type is available at:

Monitoring the Health of your Security Certificates

Security Certificates

Most modern websites are protected by some form of Security Certificate that ensures the data transmitted from your computer to it (and vice versa) is encrypted. You can usually tell if you are interacting with an encrypted website via the presence of a padlock symbol (usually green in colour) near the website address in your browser.

These certificates are more commonly known as SSL Certificates but in actual fact, the more technically correct name for them is TLS Certificates (it’s just that nobody really calls them that as the older name has quite a sticky sound/feel to it).

Certificate Monitoring

In any case, one of the things we do a lot of at Red Hat Mobile is monitoring, and over the years we’ve designed a large collection of security certificate monitoring checks. The majority of these are powered by the OpenSSL command-line utility (on a Linux system), which contains some pretty neat features.

This article explains some of my favourite things you can do with this utility, targeting a website secured with an SSL Certificate.

Certificate Analysis

Certificate Dumps

Quite often, in order to extract interesting data from a security certificate you first need to dump its contents to a file (for subsequent analysis):

$ echo "" | openssl s_client -connect <server-address>:<port> > /tmp/cert.txt

You will, of course, need to replace <server-address> and <port> with the details of the particular website you are targeting.

Determine Expiry Date

Using the dump of your certificate from above, you can then extract its expiry date like this:

$ openssl x509 -in /tmp/cert.txt -noout -enddate|cut -d"=" -f2
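For monitoring purposes, the raw date is usually less useful than the number of days remaining. The sketch below assumes GNU date (for its -d option) and the date format printed by openssl; the function names are hypothetical.

```shell
# Whole days between two dates, e.g. "Jan 11 00:00:00 2024 GMT".
# Relies on GNU date's -d option to parse the date strings.
days_between() {
  local from to
  from=$(date -u -d "$1" +%s) || return 1
  to=$(date -u -d "$2" +%s) || return 1
  echo $(( (to - from) / 86400 ))
}

# Days until the certificate in the given dump file expires.
# $1 = certificate dump (e.g. /tmp/cert.txt)
days_until_expiry() {
  local enddate
  enddate=$(openssl x509 -in "$1" -noout -enddate | cut -d"=" -f2)
  days_between now "$enddate"
}
```

A check can then alert when days_until_expiry /tmp/cert.txt drops below a threshold of your choosing. The openssl x509 command also has a -checkend option, which takes a number of seconds and sets its exit status according to whether the certificate expires within that period.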

Determine Common Name (Subject)

The Common Name (sometimes called Subject) for a security certificate is the website address by which the secured website is normally accessed. The reason it is important is that, in order for your website to operate securely (and seamlessly to the user), the Common Name in the certificate must match the server’s Internet address.

$ openssl x509 -in /tmp/cert.txt -noout -subject | sed -n '/^subject/s/^.*CN=//p' | cut -d"." -f2-

Extract Cipher List

This is a slightly more technical task but useful in some circumstances nevertheless (e.g. when determining the level of encryption being used).

$ echo "" | openssl s_client -connect <server-address>:<port> -cipher
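If what you need is the cipher that was actually negotiated, it can be picked out of the s_client output. This is a sketch, assuming the output contains a line of the form "New, <protocol>, Cipher is <name>" (which is what the OpenSSL versions I have used print); the function name is hypothetical. Note also that, in those versions, the -cipher option expects a cipher list as its argument.

```shell
# Read s_client output on stdin and print the negotiated cipher name.
negotiated_cipher() {
  sed -n 's/^.*Cipher is //p' | head -n 1
}
```

It is intended to sit at the end of the pipeline, i.e. with the output of openssl s_client -connect <server-address>:<port> piped into negotiated_cipher.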

View Certificate Chain

In order that certain security certificates can be verified more thoroughly, a trust relationship between them and the Certificate Authority (CA) that created them must be established. Most desktop web browsers are able to do this automatically (as they are pre-programmed with a list of authorities they will always trust) but some mobile devices need some assistance with establishing this trust relationship.

To assist such devices with this, the website administrator can opt to install an additional set of certificates on the server (sometimes known as Intermediate Certs) which will provide a link (or chain) from the server to the CA. This chain can be verified as follows:

$ echo "" | openssl s_client -connect <server-address>:<port> -showcerts -status -verify 0 2>&1 | egrep "Verify return|subject=/serial"

The existence of the string “0 (ok)” in the response indicates a valid certificate chain.
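In a monitoring check, that test can be reduced to an exit status. A minimal sketch, assuming the "Verify return code: 0 (ok)" wording printed by s_client; chain_is_valid is a hypothetical name.

```shell
# Read s_client output on stdin; succeed only if the chain verified cleanly.
chain_is_valid() {
  grep -q 'Verify return code: 0 (ok)'
}
```

Piping the s_client output into chain_is_valid then lets the calling script branch on success or failure in the usual way.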


You can find out more about the openssl command-line utility at

The True Value of a Modern Smartphone

While vacationing with my family recently, I stumbled into a conversation with my 11-year old daughter about smartphones and the ever growing number of other devices they are replacing as they digitally transform our lives.

For fun, we decided to compare the relative cost of the vacation with and without my smartphone at the time (a Samsung Galaxy S3, by the way). In other words, had we taken the same vacation a mere 10 years earlier, how much extra would it have cost without that smartphone?

Smart Cost Savings

I was actually quite shocked at the outcome, both in terms of the number of other devices the modern smartphone now replaces (we managed to count 10) and at the potential cost savings it can yield, which we estimated at a whopping $3,450!

Smart Cost Analysis

The estimations below are really just for fun and are not based on very extensive research on my part (more of a gut feeling about a moment in time, plus some quick Googling). You can also assume a 3-week vacation near some theme parks in North America.

Telephony: $100

Assuming two 15 minute phone calls per week, from USA to Ireland, at mid-week peak rates, you could comfortably burn $100 here.

Camera: $1,000

Snapping around 1,000 old-school, non-digital photos (at 25 photos per 35mm roll of film) would require approximately 40 rolls of film (remember, no live preview). Then, factoring in the cost of a decent SLR camera plus the cost of developing those 40 rolls of film, you could comfortably spend well in excess of $1,000 here.

Of course, digital cameras would indeed have been an option 10 years ago too, but it’s not unreasonable to suggest that a decent digital camera (with optical zoom, of sufficient portability and quality for a 3-week family vacation) could also have set you back $1,000.

Music Player: $300

The cost of an Apple iPod in 2005 was around $299.

GPS / Satellite Navigation: $400

It’s possible that in 2005, the only way to obtain a GPS system for use in North America was to rent one from the car rental company. Today, this costs around $10 per day, so let’s assume it would have cost around/under $20 per day in 2005 (roughly $400 over a 3-week vacation).

Games Console: $300

The retail price for a Nintendo DS in 2005 was $149.99 but you also need to add in the cost of a selection of games, which cost around $50 each. Let’s be reasonable and suggest 3 games (one for each week of the vacation).

Laptop Computer: $1,000

I’m not entirely sure how practical/easy it would have been to access the Internet (at the same frequency) while on vacation in 2005 (i.e. how many outlets offered WiFi at all, never mind free WiFi). Internet cafés would have been an option too, but would not have offered the levels of flexibility I’d have needed to catch up on emails and update/author some important business documents, so let’s assume the price of a small laptop here.

Mobile Hotspot / MiFi: $200

Again, not quite sure if these were freely available (or feasible) in 2005, but let’s nominally assume they were and price them at double what they cost today, plus $100 for Internet access itself.

Alarm Clock: $50

I guess you could request a wake up call in your hotel but if you were not staying in a hotel and needed an alarm clock, you’d either have needed a watch with one on it, or had to purchase an alarm clock.

Compass: $50

Entirely optional of course, but if you’re the outdoor type and fancy a little roaming in some of the national parks, you might like to have a decent compass with you.

Torch: $50

Again, if you’re the outdoor type, or just like to have some basic/emergency tools with you on vacation, you might have brought (or purchased) a portable torch or Maglite Flashlight.

10 Ways to Analyze your Apache Access Logs using Bash

Since its original launch in 1995, the Apache Web Server has continued to be one of the most popular web servers in use. One of my favourite parts of working with Apache during this time has been discovering and analysing the wealth of valuable information contained in its access logs (i.e. the files produced on foot of general usage from users viewing the websites served by Apache on a given server).

And while there are plenty of free tools available to help analyse and visualise this information (in an automated or report-driven way), it can often be quicker (and more efficient) to just get down and dirty with the raw logs and fetch the data you need that way instead (subject to traffic volumes and log file sizes, of course). This is also a lot more fun.

For the purposes of this discussion, let’s assume that you are working with compressed backups of the access logs and have a single, compressed file for each day of the month (e.g. the logs were rotated by the operating system). I’m also assuming that you are using the standard “combined” LogFormat.

1. Total Number of Requests

This is just a simple word count (wc) on the relevant file(s):

Requests For Single Day

$ zcat access.log-2014-12-31.gz | wc -l

Requests For Entire Month

$ zcat access.log-2014-12-*.gz | wc -l

2. Requests per Day (or Month)

This will show the number of requests per day for a given month:

$ for d in {01..31}; do echo "$d = `zcat access.log-2014-12-$d.gz | wc -l`"; done

Or the requests per month for a given year:

$ for m in {01..12}; do echo "$m = `zcat access.log-2014-$m-*.gz | wc -l`"; done
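Both loops assume that a file exists for every day (including a 31st). A variation that only visits files that are actually present is sketched below; the directory argument and the function name are hypothetical, the file naming is the one assumed above, and gzip -dc is used as an equivalent of zcat.

```shell
# Print "<file> = <request count>" for each compressed log matching a
# year-month prefix (e.g. 2014-12) in the given directory.
# $1 = directory holding the logs, $2 = year-month prefix
requests_per_file() {
  local dir=$1 prefix=$2 f
  for f in "$dir"/access.log-"$prefix"-*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "$(basename "$f") = $(gzip -dc "$f" | wc -l | tr -d ' ')"
  done
}
```

For example, requests_per_file . 2014-12 lists the per-day totals for December using the files in the current directory.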

3. Traffic Sources

Show the list of unique IP addresses the given requests originated from:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq

List the unique IP addresses along with the number of requests from each, from lowest to highest:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq -c | sort -n
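If you are only interested in the busiest clients, the same pipeline can be reversed and truncated. A small sketch, wrapped in a hypothetical function so that it can be fed from zcat (or any other source of log lines):

```shell
# Read combined-format log lines on stdin; print the N busiest client IPs,
# highest first, as "<count> <ip>". $1 = N (defaults to 10).
top_ips() {
  local n=${1:-10}
  cut -d" " -f1 | sort | uniq -c | sort -rn | head -n "$n" | awk '{ print $1, $2 }'
}
```

For example, piping the output of zcat access.log-2014-12-31.gz into top_ips 5 shows the five addresses responsible for the most requests that day.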

4. Traffic Volumes

Show the number of requests per minute over a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c

Show the number of requests per second over a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c

5. Traffic Peaks

Show the peak number of requests per minute for a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c | sort -n

Show the peak number of requests per second for a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c | sort -n
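Since the peak ends up as the last line of the sorted output, a tail can be added to isolate it. Note that the uniq -c in these pipelines works without a preceding sort only because access logs are written in chronological order. The function name below is hypothetical.

```shell
# Read combined-format log lines on stdin; print the busiest minute as
# "<count> <HH:MM>". Assumes the lines are in chronological order, so
# that 'uniq -c' can count each minute without a prior sort.
peak_minute() {
  cut -d" " -f4 | cut -d":" -f2,3 | uniq -c | sort -n | tail -n 1 | awk '{ print $1, $2 }'
}
```

As before, it is intended to be fed from zcat, e.g. with the contents of access.log-2014-12-31.gz piped into peak_minute.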

Commands Reference

The selection of Unix commands used to conduct the above analysis (along with what they’re doing above) was:

  • zcat – prints the contents of a compressed file.
  • wc – counts the number of lines in a file (or input stream).
  • cut – splits data based on a delimiting character.
  • sort – sorts data, either alphabetically or numerically (via -n option).
  • uniq – removes duplicates from a list, or shows how much duplication there is.

Injecting Data into Scanned Vintage Photos

In a previous post, entitled Vintage Photo Scanning – A Journey of Discovery, I shared my experiences while digitising my parents’ entire vintage photo collection.

As part of that pet project, I also took the liberty of digitally injecting some of the precious data I had learned about them (i.e. when and where the photos were taken, and who was in them) into the photo files themselves, so that it would be preserved forever along with the photo it pertained to. This article explains how I did that.

Quick Recap

My starting point was a collection of around 750 digital images, arranged into a series of folders and subdirectories, and named in accordance with the following convention:


So the task at hand was how to programmatically (and thus, automatically) inject data into each of these files, starting over if required, so that sorting them or importing them into photo management software later becomes much easier.
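Judging by the field-splitting code further down, each file name carries four tilde-separated fields (date, title, people and location) ahead of the .jpg extension. Here is a sketch of how such a name breaks down; the function name and the sample file name are made up purely for illustration.

```shell
# Split a file name of the assumed form Date~Title~People~Location.jpg
# into its four fields and print them one per line.
split_photo_name() {
  local base
  base=$(basename "$1" .jpg)
  printf 'Date: %s\n'     "$(printf '%s\n' "$base" | cut -d"~" -f1)"
  printf 'Title: %s\n'    "$(printf '%s\n' "$base" | cut -d"~" -f2)"
  printf 'People: %s\n'   "$(printf '%s\n' "$base" | cut -d"~" -f3)"
  printf 'Location: %s\n' "$(printf '%s\n' "$base" | cut -d"~" -f4)"
}

# Hypothetical example file name.
split_photo_name "1975-06~01-School Concert~Mary, John~Dublin, Ireland.jpg"
```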

Ready, Steady, Bash!

In order to process many files at a time, you’re going to need to do a little programming. My favourite language for this sort of stuff is Bash, so expect to see plenty of snippets written in Bash from here on. I am also going to assume that you are running this on a Unix/Linux-based system (sorry, Mr. Gates).


While each photo will have a different date, title, list of people and location, there are some other pieces of data you might like to store within them (while you’re at it), which you (or your grandchildren) may be glad of later on. To this end, let’s set up some assumptions and common fields.

Default Data

If you only know the year of a photo, you’ll need to make some assumptions about the month and day:
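For example, you could default to the first of January. The specific values here are an assumption on my part (pick whatever makes sense for your collection); the variable names are the ones used in the date handling later on.

```shell
# Assumed defaults: treat a photo with only a known year as 1 January.
DEFAULT_MONTH=01
DEFAULT_DAY=01
```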


Reusable File Locations

Define the location of the exiftool utility or any other programs you’re using (if they are located in a non-standard part of your system), along with any other temporary files you’ll be creating:

CSVFILE=$(dirname "$0")/photos.csv
GPSFILE=$(dirname "$0")/gps.txt
GPS_COORDS=$(dirname "$0")/gps-coords.txt

Common Metadata

Most photo files support other types of metadata including the make/model of camera used, the applicable copyright statement and the owner of the files. If you wish to use these, you could define their values in some variables that can be used (for each file) later on:

EXIF_MODEL="HP Deskjet M4500"
EXIF_COPYRIGHT="Copyright (c) 2014, James Mernin"
EXIF_OWNER="James Mernin"

Data Injection

List, Iterate & Extract

Firstly, compile a list of the files you wish to process:

find "$IMAGE_DIR" -name "*.jpg" > "$PHOTO_FILES"

Now iterate over that list, processing one file at a time:

while IFS= read -r line; do
 # See below for what to do...
done < "$PHOTO_FILES"

Now split the input line to extract the 4 fields of data:

BASENAME=$(basename "$line" .jpg)
ID_DATE=$(echo "$BASENAME"|cut -d"~" -f1)
ID_TITLE=$(echo "$BASENAME"|cut -d"~" -f2)
ID_PEOPLE=$(echo "$BASENAME"|cut -d"~" -f3)
ID_LOCATION=$(echo "$BASENAME"|cut -d"~" -f4)

Prepare Date & Title

Prepare a suitable value for Year, Month and Day (taking into account that the month and day may be unknown):

DATE_Y=$(echo "$ID_DATE"|cut -d"-" -f1)
DATE_M=$(echo "$ID_DATE"|cut -d"-" -f2)
if [ -z "$DATE_M" ] || [ "$DATE_M" = "$DATE_Y" ]; then DATE_M=$DEFAULT_MONTH; fi
DATE_D=$(echo "$ID_DATE"|cut -d"-" -f3)
if [ -z "$DATE_D" ] || [ "$DATE_D" = "$DATE_Y" ]; then DATE_D=$DEFAULT_DAY; fi
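The comparison against the year is needed because cut prints the whole line when the delimiter is absent, so a bare year would otherwise be picked up as the month and day too. An alternative that avoids this is to let read do the splitting. This is a sketch only: the function name is hypothetical, the default values are assumed, and the 12:00:00 time matches the one used when the date is injected later.

```shell
# Assumed default values for an unknown month or day.
DEFAULT_MONTH=${DEFAULT_MONTH:-01}
DEFAULT_DAY=${DEFAULT_DAY:-01}

# Convert YYYY, YYYY-MM or YYYY-MM-DD into "YYYY:MM:DD 12:00:00",
# filling in the defaults for any missing month or day.
exif_date() {
  local y m d
  IFS=- read -r y m d <<EOF
$1
EOF
  echo "${y}:${m:-$DEFAULT_MONTH}:${d:-$DEFAULT_DAY} 12:00:00"
}
```

For instance, exif_date 1975-06 fills in only the missing day.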

It’s possible the title of some photos may contain a numbered prefix in order to separate multiple photos taken at the same event (e.g. 01-School Concert, 02-School Concert). This can be handled as follows:

# Default to the full title; strip the prefix only if it is numeric
TITLE=$ID_TITLE
TITLE_ORDER=$(echo "$ID_TITLE"|cut -d"-" -f1)
if [ -n "$TITLE_ORDER" ]; then
 if [ "$TITLE_ORDER" -eq "$TITLE_ORDER" ] 2>/dev/null; then TITLE=$(echo "$ID_TITLE"|cut -d"-" -f2-); fi
fi

Location Processing

This is a somewhat complex process but essentially boils down to trying to determine the GPS coordinates for the location of each photo. This is because most photo files only support GPS coordinates inside their metadata. I have used the Google Maps APIs for this step, with an assumption that you get an exact match first time. You can, of course, complete this step as a separate exercise beforehand and store the results of that in a separate file to be used as input here.

In any case, the following snippet will attempt to fetch the GPS coordinates for the location of the given photo. Pardon also the crude use of Python for post-processing of the JSON data returned by the Google Maps APIs.

ENCODED_LOCATION=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$ID_LOCATION")
GPSDATA=`curl -s "$ENCODED_LOCATION&sensor=false"`
NUM_RESULTS=$(echo "$GPSDATA"|python3 -c "import json, sys; data=json.load(sys.stdin); print(len(data['results']))")
# Only read the coordinates if at least one result came back
if [ "$NUM_RESULTS" -gt 0 ]; then
 GPS_LAT=$(echo "$GPSDATA"|python3 -c "import json, sys; data=json.load(sys.stdin); print(data['results'][0]['geometry']['location']['lat'])")
 GPS_LNG=$(echo "$GPSDATA"|python3 -c "import json, sys; data=json.load(sys.stdin); print(data['results'][0]['geometry']['location']['lng'])")
fi

Convert any negative Latitude and Longitude values to North/South or West/East so that your coordinates end up in the correct hemisphere and on the correct side of Greenwich, London:

if [ "`echo $GPS_LAT|cut -c1`" = "-" ]; then GPS_LAT_REF=South; else GPS_LAT_REF=North; fi
if [ "`echo $GPS_LNG|cut -c1`" = "-" ]; then GPS_LNG_REF=West; else GPS_LNG_REF=East; fi
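The same sign test can also be written as a small helper using case, which some may find easier to read. This is just a sketch and gps_ref is a hypothetical name; the sample coordinates in the comments are illustrative.

```shell
# Print the hemisphere reference for a signed decimal coordinate.
# $1 = "lat" or "lng", $2 = the coordinate (e.g. -6.26)
gps_ref() {
  case "$1:$2" in
    lat:-*) echo South ;;
    lat:*)  echo North ;;
    lng:-*) echo West ;;
    lng:*)  echo East ;;
    *)      return 1 ;;   # unknown coordinate type
  esac
}
```

With this in place, the two assignments become GPS_LAT_REF=$(gps_ref lat "$GPS_LAT") and GPS_LNG_REF=$(gps_ref lng "$GPS_LNG").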

Inject Data

You now have all the data you need to prepare the exiftool command:

EXIF_DATE="${DATE_Y}:${DATE_M}:${DATE_D} 12:00:00"
echo "Title: $TITLE" > $PHOTO_DESC
echo "People: $ID_PEOPLE" >> $PHOTO_DESC
echo "Location: $ID_LOCATION" >> $PHOTO_DESC

if [ -n "$ID_LOCATION" ]; then
 # Location known: include the GPS coordinates
 $EXIFTOOL -overwrite_original \
  -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
  -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
  -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
  -GPSLatitude="$GPS_LAT" -GPSLongitude="$GPS_LNG" -GPSLatitudeRef="$GPS_LAT_REF" -GPSLongitudeRef="$GPS_LNG_REF" \
  "$line"
else
 # Location unknown: omit the GPS tags
 $EXIFTOOL -overwrite_original \
  -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
  -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
  -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
  "$line"
fi

Cross your fingers and hope for the best!

Apple iPhoto Integration

Personally, I manage my portfolio of personal photographs using Apple iPhoto, so I wanted to import these scanned photos there too. The Data Injection measures I took above simplified this process greatly (especially the Date and Location fields, which iPhoto understands natively).

While I did then go on to use iPhoto’s facial recognition features and manually tag each of the people in the photographs (adding several weeks to my project), the metadata injected into the files helped make this a lot easier (as it was visible in the information panel displayed beside each photo in iPhoto).

Return on Investment


In all, this entire project took almost 9 months to complete (with an investment of 2-3 hours per evening, 1-2 nights per week). The oldest photograph I scanned was from 1927 and the most precious one was one from my early childhood holding a teddy bear that my own children now play with.


The total number of photos processed was somewhere in the region of 750. And while that may appear to be a very long time for relatively few photographs, the return on investment is likely to last for many, many multiples of that time.

Upon Reflection

I’ve also been asked a few times since then, “Would I do it again?”, to which the answer is an emphatic “Yes!” as the rewards will last far longer than I could ever have spent doing the work.

However, when asked, “Could I do it again for someone else?”, that has to be a “No”. And not because I would not have the time or the energy, but simply because I would not have the same level of emotional attachment to the subject matter (either the people or the occasions in the photos) and I believe that this would ultimately be reflected in the overall outcome. So hopefully these notes will help others to take on the same journey with their own families.

Vintage Photo Scanning – A Journey of Discovery

I recently undertook a project to scan and digitally convert a collection of vintage photographs belonging to my parents and wanted to share some of my findings, both from a technical and an emotional perspective. So if, like me, you discover a treasure trove of old photographs buried in a drawer somewhere in your parents’ house, don’t put them back, but do keep reading!

Parental Archives

Like many of my generation and the generations before me, I grew up in an almost exclusively non-digital era with an unwilling reliance on a minimal selection of analog TV and radio channels, cassette tapes and film cameras.

And while the invention of the Internet, coupled with services like YouTube, iTunes and Spotify, has meant that many of the TV, radio and musical memories of my youth can be resurrected in the blink of an eye, alas the same does not hold true for photographic memories. These are way harder to resurrect (impossible in some cases) as they cannot be reproduced or digitally remastered without the original content itself, which in most cases is in the possession of a single entity – your parents!

And this also means that you will require that your parents have done two things:

  1. Taken the time to capture photographs of your childhood in the first place;
  2. Ensured that these (and others of their own) were preserved intact over the intervening years.

And indeed the exact same applies to your parents and to the memories of the life they had before your arrival.


In terms of the equipment used to conduct this year-long exercise, here is what I needed:

  • Digital Photo Scanner: You don’t need to pay a lot of money for this (mine was an HP Deskjet F4580 that I bought for just €50); it just needs to support both Greyscale and Colour scanning (which most do) at a decent resolution (300dpi).
  • Computer: Again, a relatively inexpensive laptop/desktop will be fine, although the scanning software can benefit from a little extra RAM at times (when you’re scanning a lot of photos in a single session). Mine (actually, my wife’s) was a Dell Latitude running Microsoft Windows.
  • Scanning Software: This may depend on your scanning device. I used the software that came with my scanner.
  • Graphics Software: As you are highly likely to want to crop some of the scanned images afterwards, you may need some additional graphics software for this. My personal open source favourite is GIMP, but there are lots to choose from and your scanning software may even do this for you anyway.
  • Post-its: These could prove really handy for cataloging and sorting the photographs so they can be reinserted into their original albums afterwards.
  • Exiftool: This is a Unix command-line utility for injecting metadata into digital image files, such as GPS location, date & time and names of those in the photograph.

Copious amounts of patience, coffee and beer are also strongly recommended.

Planning & Sorting

Strangely, one of the first challenges you’ll face is exactly how to remove the photos from their albums without damaging them and in such a way that you’ll be able to reinsert them in roughly the same order afterwards.

And don’t forget that, while you may feel that your project is complete once you’ve scanned the photos and have them on your laptop, your parents may want them restored into their original setting and you need to respect that.

So in my view, here is the best way to approach this:

  1. Devise an album/page numbering scheme and attach some post-its (or equivalent) to the pages in the various albums.
  2. Remove all of the Black & White photos first, because it’ll be more efficient to scan these together using the same scanner resolution/quality settings.
  3. As you remove each photo, write the album and page number on the rear, preferably using a pencil (which can easily be removed afterwards if required).
  4. Once removed, arrange the photographs into bundles of roughly the same size. This will also make for more efficient scanning (and cropping) of images later on.


Some of the photos may also be too faded, blurred, cropped or too small to be worth scanning so you may wish to omit those from the process early on. Similarly, keeping multiple (but very similar) photos of the same occasion (with the same people in them) can sometimes dilute the power of just one photo of that occasion.

This is just something you’ll need to make a personal judgement call on but you could use the following logic:

  1. Is there another, similar photo of the same occasion with the same people in it?
  2. Although it’s blurred, or of poor quality, is this the only photo with a particular person or group in it?
  3. Is there a favourite piece of music that this photo could go with, if you were to include it in a musical slide show or movie?

In my case, the percentage success rate here was actually only around 50% (i.e. I ended up skipping roughly half the entire collection) but given the nature of photographic technology at the time, this is not entirely surprising.

Testing, Trial and Error

The first thing you need to do once you think you are ready to start scanning is to stop and do some testing (with just a couple of photos) to be sure you are going to be happy with the results. Here, you are looking to settle on your optimum scanning technique and preferred resolution, file format, compression ratio, colour balance etc.
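When settling on a resolution, it can help to work out roughly what a given setting will produce before you scan anything. The sketch below is just arithmetic: it assumes a hypothetical 6 x 4 inch print and 24-bit colour (3 bytes per pixel), neither of which comes from my own notes, and it estimates the uncompressed size only, so a saved JPEG will be considerably smaller.

```python
def scan_dimensions(width_in, height_in, dpi=300):
    """Estimate the pixel size of a scanned print.

    width_in / height_in are the print dimensions in inches
    (hypothetical values, measure your own photos).
    Returns (width_px, height_px, megapixels, raw_bytes), where raw_bytes
    assumes uncompressed 24-bit colour (3 bytes per pixel).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    megapixels = width_px * height_px / 1_000_000
    raw_bytes = width_px * height_px * 3
    return width_px, height_px, megapixels, raw_bytes


# Compare two candidate resolutions for the same (assumed) 6 x 4 inch print.
for dpi in (300, 600):
    w, h, mp, raw = scan_dimensions(6, 4, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {mp:.2f} MP, ~{raw / 1_000_000:.1f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, which is worth knowing before you commit to scanning several hundred photos at the higher setting.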

In terms of the file format, this is important too because not all formats are supported by the popular exiftool utility and so if you plan to inject metadata into the scanned images later on, you need to test this now so you do not use a format you will later regret. For example, I had scanned several hundred photos in PNG format before I realised that I could not inject metadata into them using the exiftool utility. I found that the JPEG format worked best for me.

So trust me, testing beforehand will save you a huge amount of time (and stress) later on, and you will thank me for warning you now.

Scanning & Cropping

In terms of the scanning effort itself, I found that scanning multiple (similarly sized) images at the same time was way more efficient. I also found it more efficient to crop the images from within the scanning software (that came with my scanner) before saving them to disk as separate images.

You might be forgiven for thinking this is the longest phase of the journey, but for me it wasn’t – the dating of the photos, naming of the image files and insertion of meta-data took a lot longer.

Naming Convention

In terms of how you name the image files produced by the scanning exercise, this is really a matter of personal preference. You could just stick with the arbitrary names assigned by the scanning software, but based on my experience you are far better off to invest a little extra time in devising a naming scheme for the files so that you can search for (and/or rearrange) them more easily later on.

What worked for me here was to construct the name of each file using 4 basic pieces of data, separated by a tilde character:

  <Date>~<Title>~<People>~<Location>

  • <Date> follows the standard YYYY-MM-DD date format. This means that the files will naturally sort themselves chronologically on most standard file browsing applications.
  • <Title> is some sort of snappy, 4-5 word title for the photo or event, possibly prefixed by a number if there are multiple photos taken on the same day at the same event.
  • <People> is a comma-separated list of the names of the people in the photo (as they would be commonly known to your family).
  • <Location> is a succinct description of where the photograph was originally taken (e.g. something that would match a search in Google Maps).

The use of a tilde character as the field separator (as opposed to a comma or hyphen, for example) is also optional, of course, but works well for me in many situations as it is rarely used within any of the other field/data types, thus allowing you to have commas and hyphens in those other fields without confusion.
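If you would rather generate and check these names programmatically than type them by hand, the scheme above could be sketched in Python roughly as follows. This is only an illustration, not the tooling I used: the PhotoName class, the parse_filename helper, the .jpg extension and the sample names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

SEPARATOR = "~"  # field separator described above


@dataclass
class PhotoName:
    """The four fields of the naming scheme (hypothetical helper class)."""
    taken: date
    title: str
    people: List[str]
    location: str

    def to_filename(self, extension: str = ".jpg") -> str:
        # People are joined with commas; commas in other fields are fine too,
        # because only the tilde separates the fields.
        fields = [
            self.taken.isoformat(),  # YYYY-MM-DD, so names sort by date
            self.title,
            ",".join(self.people),
            self.location,
        ]
        for field in fields:
            if SEPARATOR in field:
                raise ValueError(f"field must not contain {SEPARATOR!r}: {field!r}")
        return SEPARATOR.join(fields) + extension


def parse_filename(filename: str) -> PhotoName:
    """Split a name built by to_filename back into its four fields.

    Assumes the filename ends in an extension such as ".jpg".
    """
    stem, _dot, _extension = filename.rpartition(".")
    date_text, title, people_text, location = stem.split(SEPARATOR)
    people = [p.strip() for p in people_text.split(",") if p.strip()]
    return PhotoName(date.fromisoformat(date_text), title, people, location)


# Made-up example values, purely for illustration.
example = PhotoName(date(1975, 6, 14), "01 Wedding Day", ["Mam", "Dad"], "Galway, Ireland")
print(example.to_filename())
```

Because the date comes first and is zero-padded, a plain alphabetical sort of the resulting names is also a chronological one.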


Personally, I would not advise storing several hundred photos in a single directory as I think it would make them harder to manage, find and sort. I therefore decided to store batches of related files in a series of hierarchical subdirectories, some of which themselves included dates in their name. This is again a personal preference thing but it may work in your favour if you are planning to share a copy of the finished photo collection (on a USB stick or CD or via Dropbox) with friends and family.
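As a rough sketch of how files could be batched into dated subdirectories, the snippet below groups file names by decade and year using the date at the start of each name. The decade/year layout and the plan_directories helper are assumptions made purely for illustration; my own directories were arranged by hand around related events. It only computes a plan and does not move any files.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_directories(filenames):
    """Map each file name to a 'decade/year' directory (hypothetical layout).

    Assumes every name starts with a YYYY-MM-DD date, as in the naming
    scheme described earlier. Returns {directory: [file names]}.
    """
    plan = defaultdict(list)
    for name in sorted(filenames):
        year = int(name[:4])
        decade = f"{year // 10 * 10}s"
        directory = PurePosixPath(decade) / str(year)
        plan[str(directory)].append(name)
    return dict(plan)


# Made-up file names, purely for illustration.
files = [
    "1975-06-14~Wedding Day~Mam,Dad~Galway.jpg",
    "1962-01-01~New Year~Nana~Dublin.jpg",
    "1975-01-02~Snow Day~Dad~Galway.jpg",
]
for directory, names in plan_directories(files).items():
    print(directory, names)
```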

Dating & Facial Recognition

This was by far the most enjoyable part of the journey. Not only did I learn so much about my wider family (and about myself) but the time I shared with my parents while undertaking this phase was hugely rewarding, both for them and for me. More mature readers will already know this, of course.

The facial recognition itself is relatively straightforward, in that your parents will either recognise the people or not, and it really doesn’t have to be any more complicated than that.

Putting a date on an old photograph, however, can be a lot more difficult, especially when the folks are that little bit older. There are some tricks you can use to help with the accuracy here too, which essentially boil down to asking one or more of the following questions:

  1. Were you married when this photo was taken?
  2. Was it before or after an important event in your life (e.g. Holy Communion, Confirmation, 21st Birthday, Wedding)?
  3. Was I (or any of my siblings) born when it was taken?
  4. Were your parents still alive when it was taken?
  5. Where did you/we live when that was taken?

By trying to evaluate the date of the photo in the context of seemingly unrelated milestones in their lives, you may find yourselves able to home in on the real date with a reasonable degree of accuracy.
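The reasoning behind those questions is essentially the narrowing of a date range: every "before" answer lowers the latest possible date and every "after" answer raises the earliest possible date. A minimal sketch of that idea is shown below; the narrow_date_range function and the milestone dates are hypothetical and simply illustrate the logic.

```python
from datetime import date


def narrow_date_range(answers):
    """Narrow down when a photo was taken from before/after answers.

    answers is a list of (milestone_date, relation) pairs, where relation
    is "before" or "after" (the photo was taken before/after the milestone).
    Returns (earliest, latest); either may be None if nothing bounds it.
    """
    earliest = None
    latest = None
    for milestone, relation in answers:
        if relation == "after":
            if earliest is None or milestone > earliest:
                earliest = milestone
        elif relation == "before":
            if latest is None or milestone < latest:
                latest = milestone
        else:
            raise ValueError(f"relation must be 'before' or 'after': {relation!r}")
    if earliest is not None and latest is not None and earliest > latest:
        raise ValueError("answers contradict each other")
    return earliest, latest


# Made-up milestones: taken after the wedding, before the first child
# was born, and before a house move.
answers = [
    (date(1968, 9, 7), "after"),    # wedding
    (date(1971, 3, 2), "before"),   # first child born
    (date(1973, 5, 1), "before"),   # moved house
]
print(narrow_date_range(answers))
```

Even two or three answers like these can narrow an unknown date down to a window of a few years, which is usually enough for the date portion of the file name.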

Are We There Yet?

At this point, you should have all of the photos scanned, cropped and named according to when they were taken, what the event was, who was in the photo and where it was taken. And for many people that would be more than enough.

However, the engineer in me was of course not happy to leave it at that. So watch out for my next blog post on how to inject metadata into your scanned images and use that to aid the importing of the photos into popular photo management software.