Muhammad Ali and My Grandfather

Edmund (Neddy) Mernin

One of my most prized possessions is some very old VHS footage of my Grandfather, Edmund (Neddy) Mernin (1893-1983), being interviewed for an Irish history documentary in 1969. It really is such a privilege to be able to share this tiny slice of family history with my own children, where they can see real-life footage of their Great Grandfather.

The documentary, entitled Gift of a Church, tells the unusual story of how (in 1965) the church in his home village of Villierstown, Co. Waterford, had been donated by the Church of Ireland to the Catholic people in the village so that they would not have to walk several miles to the nearest village to celebrate Mass on Sunday. The church was in need of some repair and was seemingly no longer needed as the Protestant population had moved away.

The documentary was first broadcast on 30 October 1969 as part of an RTÉ programme called Newsbeat and was reported by the renowned Irish history documentary maker and TV presenter, Cathal O’Shannon (well known for his distinctive voice). And with huge thanks to the team at RTÉ Archives, an excerpt (showing Neddy speaking) is now available online.

Muhammad Ali

So what’s the connection with Muhammad Ali, I hear you say? Well, it turns out that the very same broadcaster that interviewed my Grandfather in 1969 also went on to conduct an infamous interview with boxing legend, Muhammad Ali, when he visited Ireland (to fight Alvin Lewis in Croke Park) in July 1972.

That visit was also chronicled in another Irish documentary from 2012 by Ross Whitaker, entitled When Ali Came to Ireland.

Two Legends, One Story

So it turns out that Muhammad Ali was not the only legendary world figure that Cathal O’Shannon had the privilege of interviewing. He also interviewed Neddy Mernin!

 

How to exhaust the processing power of 128 CPUs

Amazon Web Services launched another first earlier this year in the form of a virtual server (or EC2 instance as they call it) with a staggering 128 virtual CPUs and over 1.9 Terabytes of memory.

The instance, which is an x1.32xlarge in their naming scheme, could cost you as much as $115,000 per year to operate, though you could reduce that figure significantly (e.g. to around $79,000) if you knew ahead of time that you would be running it 24×7 for the full year.

In any case, during a recent experiment using one of these instances, we set about trying to find some novel ways to max out the processing power and memory, and here are the two techniques we settled on (with evidence of each of them in action).

CPU Exhaustion

This was, strangely, a lot easier than we expected and simply involved using the Unix yes command which, it seems, draws excessive amounts of processing power when used in isolation from its normal purpose.

So for our x1.32xlarge instance, with its 128 vCPUs, we used the command below to spawn 127 processes each running the yes command, and we then monitored its impact using the htop command.

$ for i in {1..127}; do yes>/dev/null & done

And here it is in action:

The reason for spawning just 127 processes (instead of the full 128) was to ensure that the htop monitoring utility itself would have enough resources to function, as can be seen clearly above.
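Incidentally, once you're finished experimenting, those yes processes will happily spin forever, so don't forget to clean up after yourself. A minimal approach (assuming nothing else important on the box happens to be called yes) is:

$ killall yes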

Memory Exhaustion

Exhausting the memory was a little harder (to do quickly) but one of the more hard-core Unix guys came up with this old-school beauty which combines the processor-hungry yes command with some complex character replacements, plus searches for character sequences that will never be found.

$ for i in `seq 1 40`; do cat <(yes | tr \\n x | head -c $((10240*1024*4096))) <(sleep 18000) | grep n &  done

And here it is in action too, noting the actual memory usage in the bottom, left:

Note also that the CPU usage, while almost at the limit, is not as clear-cut as before, and all processors are being utilised equally (for the most part). Note also the Load Average of 235 (bottom, right of centre), which supports the theory that Unix systems can sustain load averages of up to twice the number of processors before encountering performance issues. Some folks believe this figure to be closer to one times the number of processors, but the results above suggest otherwise.
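As a rough sanity check of the memory figures: each head -c in the loop emits 10240*1024*4096 bytes with the newlines stripped out, so each grep appears to end up buffering one enormous 40 GiB "line", or roughly 1,600 GiB across all 40 processes, which is not far off the instance's full complement of memory. A quick bit of shell arithmetic confirms the per-process figure:

$ echo $(( 10240 * 1024 * 4096 / 1024**3 )) # prints 40, i.e. GiB buffered per process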

Amazon Web Services X1

The original announcement of the X1 instance type is available on the AWS website.

How to store 1.8 trillion photos on AWS

During a recent evaluation of Amazon’s Elastic File System service, we were astounded to discover that it is backed by what can only be described as a gargantuan storage volume, spanning a whopping 9 Exabytes in size.

For those of you familiar with the Unix operating system, here is a screen shot showing the 9 Exabytes in action (note the sheer number of digits in the Available space column).

To put this mammoth of a number into perspective, assume that:

  1. The average size of a photo taken with a decent smartphone these days is around 5 Megabytes (MB).
  2. It’s quite common to see many such smartphones with a capacity of 16 Gigabytes (GB), which is 16,000 Megabytes, which is around 3,200 photos.
  3. Several laptop models now come with as much as 1 Terabyte (TB) of storage, which is the equivalent of 1,000 Gigabytes (1,000,000 Megabytes), which is enough for around 200,000 photos.

But to get from there to an Exabyte, you’d need to eat your way through a further 1,000 Terabytes to reach what’s known as a Petabyte, and then a further 1,000 of those to finally get to an Exabyte. And remembering that our EFS volume is 9 of those, that’s the equivalent of 1,800 billion (or 1.8 trillion) photos!
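If you'd like to sanity-check that figure yourself, a quick bit of shell arithmetic (using decimal units and the 5 MB-per-photo assumption from above) gets you there:

$ echo $(( 9 * 10**18 / (5 * 10**6) )) # prints 1800000000000, i.e. 1.8 trillion photos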

And fascinatingly, when using the more human-friendly form of the above Unix command (df -h), which shows the used space in percentage terms, you would have to copy an astonishing 90 Petabytes of information (or 18 billion photos) onto this disk volume just to get the Used column to move off 0%.

Amazon Elastic File System

The main selling points of EFS are that it:

  1. Elastically grows (and shrinks) to meet your storage needs;
  2. Runs across multiple data centres (Availability Zones);
  3. Can be attached to more than one server at a time, by virtue of the fact that it’s powered by NFS (see the mount sketch below);
  4. Only charges you for the storage you consume.
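As a rough illustration of point 3, this is the sort of NFS mount you would perform from an EC2 instance. The file system ID, region and mount point below are just placeholders, so substitute your own:

$ sudo mkdir -p /mnt/efs
$ sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.eu-west-1.amazonaws.com:/ /mnt/efs
$ df -h /mnt/efs

Running df -h against the mount point like this is also how you can see the 9 Exabytes (and the stubbornly 0% Used column) for yourself.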

For more, see http://docs.aws.amazon.com/efs/latest/ug/whatisefs.html.

Why you should always evaluate each new AWS region

Amazon Web Services launched another new hosting region last month, this time in Mumbai, India. The official press release is available on the AWS website.

Based on our experiences from previous region launches (e.g. Sydney), where we discovered that not all of the services we use/require were available at the time of launch, we decided to compile a list of those services and put together a set of evaluation criteria for determining if/when we might be able to launch some new services from Mumbai.

And while it may seem excessive or wasteful to formally evaluate whether all of the services you use are available (plus, the press release normally contains some information about this), we actually discovered that a number of Instance Types (that we were still using in other regions) were in fact not being made available in Mumbai at all.

Findings

The items below are what we discovered were not as we expected in the Mumbai region, but the list could, of course, be different for you or for the next region launch. So there is still value in compiling your own list of the services your organisation uses/requires.

EC2 Instance Types

Not all instance types were available.
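These days you can script this kind of check with the standard AWS command-line tools. The sketch below (the describe-instance-type-offerings call postdates the Mumbai launch, and the instance type shown is just an example) reports whether a given type is offered in the region:

$ aws ec2 describe-instance-type-offerings --region ap-south-1 --location-type region --filters Name=instance-type,Values=m3.large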

Elastic File System

At the time of the Mumbai launch, this was not yet available in the Sydney region either.

EC2 API Version

It was actually when first evaluating the Frankfurt region (eu-central-1) that we discovered that AWS would not be supporting V1 of their APIs there (so we had to update some of our tooling). It was the same in Mumbai.

Availability Zones

The Mumbai region only had two Availability Zones at initial launch and this was insufficient for a Disaster Recovery (DR) deployment of MongoDB. This is because MongoDB requires a minimum of 3 members to form a valid Replica Set. And while you can still form a 3-member Replica Set using two AZs, one of those AZs must then host two of the members, so a single AZ failure could take out the majority of the set.
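A quick way to see how many Availability Zones a region actually exposes to your account is a sketch like the following (ap-south-1 being the Mumbai region):

$ aws ec2 describe-availability-zones --region ap-south-1 --query 'AvailabilityZones[].ZoneName' --output text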

Summary

None of these issues were insurmountable for us but learning about these differences ahead of time did enable us to adjust our delivery timelines and manage customer expectations accordingly (e.g. to allow extra time for new AMI generation), which is a very valuable part of any planning process.

Top Tips for AWS Certificate Manager service

On foot of the recent launch of the AWS Certificate Manager service, we decided to check it out. Here are some of our highlights along with some noteworthy items you may find helpful.

Highlights

  1. The acronym for the new service is ACM (AWS Certificate Manager).
  2. You can programmatically generate certificates, using either the AWS command-line tools or via their APIs (see below).
  3. Certificates generated via ACM are free of charge.
  4. The certificates will automatically renew each year.
  5. Wildcard certificates are also fully supported.

Important to Note

  1. You can only use the certificates within AWS and so cannot extract them to use with externally hosted web servers.
  2. Even though you can programmatically generate certificates, there is still a manual validation process that needs to be completed.
  3. This validation process will be triggered as part of the automatic annual renewal of certificates.
  4. When generating wildcard certificates (e.g. *.acme.example.com), you must also ensure that you include the non-wildcard (base) address as a Subject Alternative Name so that visitors to the site using only that base address (e.g. https://acme.example.com) will avoid security warnings.
  5. You do not appear to have control over the name/id of the generated certificate, so if you had devised some tooling around a naming convention for your previous certs (imported from another provider), the ACM certs may not work with this.

Examples

This command will generate a wildcard certificate in your default region using your default AWS profile (i.e. account):

$ aws acm request-certificate --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com

This command shows how to specify the region and profile to be used for the new certificate:

$ aws acm request-certificate --profile default --region us-east-1 --domain-name "*.acme.example.com" --subject-alternative-names acme.example.com
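Once requested, you can confirm the certificate and retrieve its ARN (for wiring into a load balancer or CloudFront distribution later) with the same tooling. A minimal sketch, with the ARN left as a placeholder:

$ aws acm list-certificates --region us-east-1
$ aws acm describe-certificate --region us-east-1 --certificate-arn <certificate-arn>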

For more details about the AWS Certificate Manager service, visit https://aws.amazon.com/certificate-manager.

Monitoring the Health of your Security Certificates

Security Certificates

Most modern websites are protected by some form of Security Certificate that ensures the data transmitted from your computer to it (and vice versa) is encrypted. You can usually tell if you are interacting with an encrypted website via the presence of a padlock symbol (usually green in colour) near the website address in your browser.

These certificates are more commonly known as SSL Certificates but in actual fact, the more technically correct name for them is TLS Certificates (it’s just that nobody really calls them that as the older name has quite a sticky sound/feel to it).

Certificate Monitoring

In any case, one of the things we do a lot of at Red Hat Mobile is monitoring, and over the years we’ve designed a large collection of security certificate monitoring checks. The majority of these are powered by the OpenSSL command-line utility (on a Linux system), which contains some pretty neat features.

This article explains some of my favourite things you can do with this utility, targeting a website secured with an SSL Certificate.

Certificate Analysis

Certificate Dumps

Quite often, in order to extract interesting data from a security certificate you first need to dump its contents to a file (for subsequent analysis):

$ echo "" | openssl s_client -connect <server-address>:<port> > /tmp/cert.txt

You will, of course, need to replace <server-address> and <port> with the details of the particular website you are targeting.
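If the server in question hosts several certificates behind a single address (i.e. it relies on SNI), you may also need to pass the -servername option so that the correct certificate is returned. A variant of the above, using the same placeholders:

$ echo "" | openssl s_client -connect <server-address>:<port> -servername <server-address> > /tmp/cert.txt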

Determine Expiry Date

Using the dump of your certificate from above, you can then extract its expiry date like this:

$ openssl x509 -in /tmp/cert.txt -noout -enddate|cut -d"=" -f2
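For monitoring purposes, openssl can also answer the "is it about to expire?" question directly via its -checkend option (which takes a number of seconds). A small sketch that flags a certificate expiring within the next 30 days:

$ openssl x509 -in /tmp/cert.txt -noout -checkend $((30*86400)) && echo "OK" || echo "Expiring within 30 days"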

Determine Common Name (Subject)

The Common Name (sometimes called Subject) for a security certificate is the website address by which the secured website is normally accessed. The reason it is important is that, in order for your website to operate securely (and seamlessly to the user), the Common Name in the certificate must match the server’s Internet address.

$ openssl x509 -in /tmp/cert.txt -noout -subject | sed -n '/^subject/s/^.*CN=//p' | cut -d"." -f2-

Extract Cipher List

This is a slightly more technical task but useful in some circumstances nevertheless (e.g. when determining the level of encryption being used).

$ echo "" | openssl s_client -connect <server-address>:<port> -cipher

View Certificate Chain

In order that certain security certificates can be verified more thoroughly, a trust relationship between them and the Certificate Authority (CA) that created them must be established. Most desktop web browsers are able to do this automatically (as they are pre-programmed with a list of authorities they will always trust) but some mobile devices need some assistance with establishing this trust relationship.

To assist such devices with this, the website administrator can opt to install an additional set of certificates on the server (sometimes known as Intermediate Certs) which will provide a link (or chain) from the server to the CA. This chain can be verified as follows:

$ echo "" | openssl s_client -connect <server-address>:<port> -showcerts -status -verify 0 2>&1 | egrep "Verify return|subject=/serial"

The existence of the string “0 (ok)” in the response indicates a valid certificate chain.

OpenSSL

You can find out more about the openssl command-line utility at http://manpages.ubuntu.com/manpages/xenial/en/man1/openssl.1ssl.html.

The True Value of a Modern Smartphone

While vacationing with my family recently, I stumbled into a conversation with my 11-year-old daughter about smartphones and the ever-growing number of other devices they are replacing as they digitally transform our lives.

For fun, we decided to imagine we’d taken the same vacation a mere 10 years earlier and to estimate how much extra it would have cost without my smartphone at the time (a Samsung Galaxy S3, by the way).

Smart Cost Savings

I was actually quite shocked at the outcome, both in terms of the number of other devices the modern smartphone now replaces (we managed to count 10) and at the potential cost savings it can yield, which we estimated at a whopping $3,450!

Smart Cost Analysis

The estimations below are really just for fun and are not based on very extensive research on my part (more of a gut feeling about a moment in time, plus some quick Googling). You can also assume a 3-week vacation near some theme parks in North America.

Telephony: $100

Assuming two 15 minute phone calls per week, from USA to Ireland, at mid-week peak rates, you could comfortably burn $100 here.

Camera: $1,000

Snapping around 1,000 old-school, non-digital photos (at 25 photos per 35mm roll of film) would require approximately 40 rolls of film (remember, no live preview). Then factoring in the cost of a decent SLR camera, plus the cost of developing those 40 rolls of film, you could comfortably spend well in excess of $1,000 here.

Of course digital cameras would indeed have been an option 10 years ago too, but it’s not unreasonable to suggest that a decent digital camera (with optical zoom, and of sufficient portability and quality for a 3-week family vacation) could also have set you back $1,000.

Music Player: $300

The cost of an Apple iPod in 2005 was around $299.

GPS / Satellite Navigation: $400

It’s possible that in 2005, the only way to obtain a GPS system for use in North America was to rent one from the car rental company. Today, this costs around $10 per day, so let’s assume it would have cost around $20 per day in 2005, which works out at roughly $400 for a 3-week rental.

Games Console: $300

The retail price for a Nintendo DS in 2005 was $149.99 but you also need to add in the cost of a selection of games, which cost around $50 each. Let’s be reasonable and suggest 3 games (one for each week of the vacation).

Laptop Computer: $1,000

I’m not entirely sure how practical/easy it would have been to access the Internet (at the same frequency) while on vacation in 2005 (i.e. how many outlets offered WiFi at all, never mind free WiFi). Internet cafés would have been an option too, but would not have offered the levels of flexibility I’d have needed to catch up on emails and update/author some important business documents, so let’s assume the price of a small laptop here.

Mobile Hotspot / MiFi: $200

Again, I'm not quite sure if these were freely available (or feasible) in 2005, but let’s nominally assume they were and price them at double what they cost today, plus $100 for Internet access itself.

Alarm Clock: $50

I guess you could request a wake up call in your hotel but if you were not staying in a hotel and needed an alarm clock, you’d either have needed a watch with one on it, or had to purchase an alarm clock.

Compass: $50

Entirely optional of course, but if you’re the outdoor type and fancy a little roaming in some of the national parks, you might like to have a decent compass with you.

Torch: $50

Again, if you’re the outdoor type, or just like to have some basic/emergency tools with you on vacation, you might have brought (or purchased) a portable torch or Maglite Flashlight.

10 Ways to Analyze your Apache Access Logs using Bash

Since its original launch in 1995, the Apache Web Server continues to be one of the most popular web servers in use today. One of my favourite parts of working with Apache during this time has been discovering and analysing the wealth of valuable information contained in its access logs (i.e. the files produced on foot of general usage, as users view the websites served by Apache on a given server).

And while there are plenty of free tools available to help analyse and visualise this information (in an automated or report-driven way), it can often be quicker (and more efficient) to just get down and dirty with the raw logs and fetch the data you need that way instead (subject to traffic volumes and log file sizes, of course). This is also a lot more fun.

For the purposes of this discussion, let’s assume that you are working with compressed backups of the access logs and have a single, compressed file for each day of the month (e.g. the logs were rotated by the operating system). I’m also assuming that you are using the standard “combined” LogFormat.

1. Total Number of Requests

This is just a simple word count (wc) on the relevant file(s):

Requests For Single Day

$ zcat access.log-2014-12-31.gz | wc -l

Requests For Entire Month

$ zcat access.log-2014-12-*.gz | wc -l

2. Requests per Day (or Month)

This will show the number of requests per day for a given month:

$ for d in {01..31}; do echo "$d = `zcat access.log-2014-12-$d.gz | wc -l`"; done

Or the requests per month for a given year:

$ for m in {01..12}; do echo "$m = `zcat access.log-2014-$m-*.gz | wc -l`"; done

3. Traffic Sources

Show the list of unique IP addresses the given requests originated from:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq

List the unique IP addresses along with the number of requests from each, from lowest to highest:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq -c | sort -n
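And if you only care about the heaviest hitters, reversing that sort and taking the top of the list is a handy variation:

$ zcat access.log-2014-12-31.gz | cut -d" " -f1 | sort | uniq -c | sort -rn | head -10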

4. Traffic Volumes

Show the number of requests per minute over a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c

Show the number of requests per second over a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c

5. Traffic Peaks

Show the peak number of requests per minute for a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3 | uniq -c | sort -n

Show the peak number of requests per second for a given day:

$ zcat access.log-2014-12-31.gz | cut -d" " -f4 | cut -d":" -f2,3,4 | uniq -c | sort -n
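A closely related trick, worth having to hand when a traffic peak turns out to be trouble, is a breakdown of the HTTP response codes (field 9 in the combined LogFormat):

$ zcat access.log-2014-12-31.gz | cut -d" " -f9 | sort | uniq -c | sort -rn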

Commands Reference

The selection of Unix commands used to conduct the above analysis (along with what they’re doing above) was:

  • zcat – prints the contents of a compressed file.
  • wc – counts the number of lines in a file (or input stream).
  • cut – splits data based on a delimiting character.
  • sort – sorts data, either alphabetically or numerically (via the -n option).
  • uniq – removes duplicates from a list, or shows how much duplication there is.

Injecting Data into Scanned Vintage Photos

In a previous post, entitled Vintage Photo Scanning – A Journey of Discovery, I shared my experiences while digitising my parents’ entire vintage photo collection.

As part of that pet project, I also took the liberty of digitally injecting some of the precious data I had learned about them (i.e. when and where the photos were taken, and who was in them) into the photo files themselves, so that it would be preserved forever along with the photo it pertained to. This article explains how I did that.

Quick Recap

My starting point was a collection of around 750 digital images, arranged into a series of folders and subdirectories, and named in accordance with the following convention:

<Date>~<Title>~<People>~<Location>.jpg
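For illustration, a (purely hypothetical) filename following that convention might look like this:

1972-08-15~Seaside Picnic~John, Mary, Kate~Tramore, Co. Waterford.jpg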

So the task at hand was how to programmatically (and thus, automatically) inject data into each of these files, starting over if required, so that sorting them or importing them into photo management software later becomes much easier.

Ready, Steady, Bash!

In order to process many files at a time, you’re going to need to do a little programming. My favourite language for this sort of stuff is Bash, so expect to see plenty of snippets written in Bash from here on. I am also going to assume that you are running this on a Unix/Linux-based system (sorry, Mr. Gates).

Setup

While each photo will have a different date, title, list of people and location, there are some other pieces of data you might like to store within them (while you’re at it), which you (or your grandchildren) may be glad of later on. To this end, let’s set up some assumptions and common fields.

Default Data

If you only know the year of a photo, you’ll need to make some assumptions about the month and day:

DEFAULT_MONTH="06"
DEFAULT_DAY="01"
DEFAULT_LOCATION="-"

Reusable File Locations

Define the location of the exiftool utility, or any other programs you’re using (if they are located in a non-standard part of your system), along with any other temporary files you’ll be creating:

EXIFTOOL=/usr/local/bin/exiftool
CSVFILE=$(dirname "$0")/photos.csv
GPSFILE=$(dirname "$0")/gps.txt
GPS_COORDS=$(dirname "$0")/gps-coords.txt
PHOTO_FILES=/tmp/photo-files.txt
PHOTO_DESC=/tmp/photo-description.txt

Common Metadata

Most photo files support other types of metadata, including the make/model of the camera used, the applicable copyright statement and the owner of the files. If you wish to use these, you could define their values in some variables that can be used (for each file) later on:

EXIF_MAKE="HP"
EXIF_MODEL="HP Deskjet M4500"
EXIF_COPYRIGHT="Copyright (c) 2014, James Mernin"
EXIF_OWNER="James Mernin"

Data Injection

List, Iterate & Extract

Firstly, compile a list of the files you wish to process:

find "$IMAGE_DIR" -name "*.jpg" > $PHOTO_FILES

Now iterate over that list, processing one file at a time:

while read line; do
 # See below for what to do...
done < "$PHOTO_FILES"

Now split the input line to extract the 4 fields of data:

BASENAME=`basename "$line" .jpg`
ID_DATE=`echo $BASENAME|cut -d"~" -f1`
ID_TITLE=`echo $BASENAME|cut -d"~" -f2`
ID_PEOPLE=`echo $BASENAME|cut -d"~" -f3`
ID_LOCATION=`echo $BASENAME|cut -d"~" -f4`

Prepare Date & Title

Prepare a suitable value for Year, Month and Day (taking into account that the month and day may be unknown):

DATE_Y=`echo $ID_DATE|cut -d"-" -f1`
DATE_M=`echo $ID_DATE|cut -d"-" -f2`
if [ -z "$DATE_M" ] || [ "$DATE_M" = "$DATE_Y" ]; then DATE_M=$DEFAULT_MONTH; fi
DATE_D=`echo $ID_DATE|cut -d"-" -f3`
if [ -z "$DATE_D" ] || [ "$DATE_D" = "$DATE_Y" ]; then DATE_D=$DEFAULT_DAY; fi

It’s possible the title of some photos may contain a numbered prefix in order to separate multiple photos taken at the same event (e.g. 01-School Concert, 02-School Concert). This can be handled as follows:

TITLE=$ID_TITLE
TITLE_ORDER=`echo $ID_TITLE|cut -d"-" -f1`
if [ -n "$TITLE_ORDER" ]; then
 # The -eq test only succeeds when TITLE_ORDER is numeric, so the prefix is only stripped if one is present
 if [ $TITLE_ORDER -eq $TITLE_ORDER ] 2>/dev/null; then TITLE=`echo $ID_TITLE|cut -d"-" -f2-`; fi
fi

Location Processing

This is a somewhat complex process but essentially boils down to trying to determine the GPS coordinates for the location of each photo, because most photo files only support location as GPS coordinates inside their metadata. I have used the Google Maps APIs for this step, with an assumption that you get an exact match first time. You can, of course, complete this step as a separate exercise beforehand and store the results of that in a separate file to be used as input here.

In any case, the following snippet will attempt to fetch the GPS coordinates for the location of the given photo. Pardon also the crude use of Python for post-processing of the JSON data returned by the Google Maps APIs.

ENCODED_LOCATION=`python -c "import urllib; print urllib.quote(\"$ID_LOCATION\");"`
GPSDATA=`curl -s "http://maps.google.com/maps/api/geocode/json?address=$ENCODED_LOCATION&sensor=false"`
NUM_RESULTS=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print len(data['results'])"`
if [ $NUM_RESULTS -eq 0 ]; then
 GPS_LAT=0
 GPS_LNG=0
else 
 GPS_LAT=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print data['results'][0]['geometry']['location']['lat']"`
 GPS_LNG=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print data['results'][0]['geometry']['location']['lng']"`
fi

Convert any negative Latitude and Longitude values to North/South or West/East so that your coordinates end up in the correct hemisphere and on the correct side of Greenwich, London:

if [ "`echo $GPS_LAT|cut -c1`" = "-" ]; then GPS_LAT_REF=South; else GPS_LAT_REF=North; fi
if [ "`echo $GPS_LNG|cut -c1`" = "-" ]; then GPS_LNG_REF=West; else GPS_LNG_REF=East; fi

Inject Data

You now have all the data you need to prepare the exiftool command:

EXIF_DATE="${DATE_Y}:${DATE_M}:${DATE_D} 12:00:00"
echo "Title: $TITLE" > $PHOTO_DESC
echo "People: $ID_PEOPLE" >> $PHOTO_DESC
echo "Location: $ID_LOCATION" >> $PHOTO_DESC

if [ -n "$ID_LOCATION" ]; then
 $EXIFTOOL -overwrite_original \
  -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
  -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
  -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
  -GPSLatitude=$GPS_LAT -GPSLongitude=$GPS_LNG -GPSLatitudeRef=$GPS_LAT_REF -GPSLongitudeRef=$GPS_LNG_REF \
  "$line"
 else
  $EXIFTOOL -overwrite_original \
   -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
   -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
   -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
   "$line"
 fi

Cross your fingers and hope for the best!
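Or, if you'd like something a little more scientific than crossed fingers, you can spot-check any processed file by reading the injected tags back out again. A minimal sketch (the file name is a placeholder):

$ $EXIFTOOL -Title -Description -XMP-iptcExt:PersonInImage -DateTimeOriginal -GPSLatitude -GPSLongitude "<some-photo>.jpg"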

Apple iPhoto Integration

Personally, I manage my portfolio of personal photographs using Apple iPhoto, so I wanted to import these scanned photos there too. The data injection measures I took above simplified this process greatly (especially the Date and Location fields, which iPhoto understands natively).

While I did then go on to use iPhoto’s facial recognition features and manually tag each of the people in the photographs (adding several weeks to my project), the metadata injected into the files helped make this a lot easier, as it was visible in the information panel displayed beside each photo in iPhoto.

Return on Investment

Timescales

In all, this entire project took almost 9 months to complete (with an investment of 2-3 hours per evening, 1-2 nights per week). The oldest photograph I scanned was from 1927, and the most precious was one of me in early childhood holding a teddy bear that my own children now play with.

Benefits

The total number of photos processed was somewhere in the region of 750. And while 9 months may appear to be a very long time for relatively few photographs, the return on investment is likely to last for many, many multiples of that time.

Upon Reflection

I’ve also been asked a few times since then, “Would I do it again?”, to which the answer is an emphatic “Yes!” as the rewards will last far longer than I could ever have spent doing the work.

However, when asked, “Could I do it again for someone else?”, that has to be a “No”. And not because I would not have the time or the energy, but simply because I would not have the same level of emotional attachment to the subject matter (either the people or the occasions in the photos) and I believe that this would ultimately be reflected in the overall outcome. So hopefully these notes will help others to take on the same journey with their own families.