Injecting Data into Scanned Vintage Photos

In a previous post, entitled Vintage Photo Scanning – A Journey of Discovery, I shared my experiences while digitising my parent’s entire vintage photo collection.

As part of that pet project, I also took the liberty of digitally injecting some of the precious data I had learned about them (i.e. when and where the photos were taken, and who was in them) into the photo files themselves, so that it would be preserved forever along with the photo is pertained to. This article explains how I did that.

Quick Recap

My starting point was a collection of around 750 digital images, arranged into a series of folders and subdirectories, and named in accordance with the following convention:

<Date>~<Title>~<People>~<Location>.jpg

So the task at hand was how to programmatically (and thus, automatically) inject data into each of these files, starting over if required, so that sorting them or importing them into photo management software later becomes much easier.

Ready, Steady, Bash!

In order to process many files at a time, you’re going to need to do a little programming. My favourite language for this sort of stuff is Bash, so expect to see plenty of snippets written in Bash from here on. I am also going to assume that you are running this on a Unix/Linux-based system (sorry, Mr. Gates).

Setup

While each photo will have a different date, title, list of people and location, there are some other pieces of data you like to store within them (while you’re at it), which you (or your grandchildren) may be glad of later on. To this end, let’s set up some assumptions and common fields.

Default Data

If you only know the year of a photo, you’ll need to make some assumptions about the month and day:

DEFAULT_MONTH="06"
DEFAULT_DAY="01"
DEFAULT_LOCATION="-"

Reusable File Locations

Define the location of exiftool utility or any other programs you’re using (if they are located in a non-standard part of your system), along with any other temporary files you’ll be creating:

EXIFTOOL=/usr/local/bin/exiftool
CSVFILE=$(dirname "$0")/photos.csv
GPSFILE=$(dirname "$0")/gps.txt
GPS_COORDS=$(dirname "$0")/gps-coords.txt
PHOTO_FILES=/tmp/photo-files.txt
PHOTO_DESC=/tmp/photo-description.txt

Common Metadata

Most photo files support other types of metadata including the make/model of camera used, the applicable copyright statement and of the owner of the files. If you wish to use these, you could define their values in some variables that can be used (for each file) later on:

EXIF_MAKE="HP"
EXIF_MODEL="HP Deskjet M4500"
EXIF_COPYRIGHT="Copyright (c) 2014, James Mernin"
EXIF_OWNER="James Mernin"

Data Injection

List, Iterate & Extract

Firstly, compile a list of the files you wish to process:

find "$IMAGE_DIR" -name "*.jpg" > $PHOTO_FILES

Now iterate over that list, processing one file at a time:

while read line; do
 # See below for what to do...
done < "$PHOTO_FILES"

Now split the input line to extract the 4 field of data:

BASENAME=`basename "$line" .jpg`
ID_DATE=`echo $BASENAME|cut -d"~" -f1`
ID_TITLE=`echo $BASENAME|cut -d"~" -f2`
ID_PEOPLE=`echo $BASENAME|cut -d"~" -f3`
ID_LOCATION=`echo $BASENAME|cut -d"~" -f4`

Prepare Date & Title

Prepare a suitable value for Year, Month and Day (taking into account the month and day maybe unknown):

DATE_Y=`echo $ID_DATE|cut -d"-" -f1`
DATE_M=`echo $ID_DATE|cut -d"-" -f2`
if [ -z "$DATE_M" ] || [ "$DATE_M" = "$DATE_Y" ]; then DATE_M=$DEFAULT_MONTH; fi
DATE_D=`echo $ID_DATE|cut -d"-" -f3`
if [ -z "$DATE_D" ] || [ "$DATE_D" = "$DATE_Y" ]; then DATE_D=$DEFAULT_DAY; fi

It’s possible the title of some photos may contain a numbered prefix in order to separate multiple photos taken at the same event (e.g. 01-School Concert, 02-School Concert). This can be handled as follows:

TITLE=$ID_TITLE
TITLE_ORDER=`echo $ID_TITLE|cut -d"-" -f1`
if [ -n "$TITLE_ORDER" ]; then
 if [ $TITLE_ORDER -eq $TITLE_ORDER ] 2>/dev/null; then TITLE=`echo $ID_TITLE|cut -d"-" -f2-`; fi
fi

Location Processing

This is a somewhat complex process but essential boils down to trying to determine the GPS coordinates for the location of each photo. This is because most photo file only support the GPS coordinates inside their metadata. I have used the Google Maps APIs for this step with an assumption that you get an exact match first time. You can, of course, complete this step as a separate exercise beforehand and store the results of that in a separate file to be used as input here.

In any case, the following snippet will attempt to fetch the GPS coordinates for the location of the given photo. Pardon also the crude use of Python for post-processing of the JSON data returned by the Google Maps APIs.

ENCODED_LOCATION=`python -c "import urllib; print urllib.quote(\"$ID_LOCATION\");"`
GPSDATA=`curl -s "http://maps.google.com/maps/api/geocode/json?address=$ENCODED_LOCATION&sensor=false"`
NUM_RESULTS=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print len(data['results'])"`
if [ $NUM_RESULTS -eq 0 ]; then
 GPS_LAT=0
 GPS_LNG=0
else 
 GPS_LAT=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print data['results'][0]['geometry']['location']['lat']"`
 GPS_LNG=`echo $GPSDATA|python -c "import json; import sys; data=json.load(sys.stdin); print data['results'][0]['geometry']['location']['lng']"`
fi

Convert any negative Latitude and Longitude values to North/South or West/East so that your coordinates end up in the correct hemisphere and on the correct side Greenwich, London:

if [ "`echo $GPS_LAT|cut -c1`" = "-" ]; then GPS_LAT_REF=South; else GPS_LAT_REF=North; fi
if [ "`echo $GPS_LNG|cut -c1`" = "-" ]; then GPS_LNG_REF=West; else GPS_LNG_REF=East; fi

Inject Data

You now have all the data you need to prepare the exiftool command:

EXIF_DATE="${DATE_Y}:${DATE_M}:${DATE_D} 12:00:00"
echo "Title: $TITLE" > $PHOTO_DESC
echo "People: $ID_PEOPLE" >> $PHOTO_DESC
echo "Location: $ID_LOCATION" >> $PHOTO_DESC

if [ -n "$ID_LOCATION" ]; then
 $EXIFTOOL -overwrite_original \
  -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
  -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
  -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
  -GPSLatitude=$GPS_LAT -GPSLongitude=$GPS_LNG -GPSLatitudeRef=$GPS_LAT_REF -GPSLongitudeRef=$GPS_LNG_REF \
  "$line"
 else
  $EXIFTOOL -overwrite_original \
   -Make="$EXIF_MAKE" -Model="$EXIF_MODEL" -Credit="$EXIF_CREDIT" -Copyright="$EXIF_COPYRIGHT" -Owner="$EXIF_OWNER" \
   -FileSource="Reflection Print Scanner" -Title="$TITLE" -XMP-iptcExt:PersonInImage="$ID_PEOPLE" "-Description<=$PHOTO_DESC" \
   -AllDates="$EXIF_DATE" -DateTimeOriginal="$EXIF_DATE" -FileModifyDate="$EXIF_DATE" \
   "$line"
 fi

Cross your fingers and hope for the best!

Apple iPhoto Integration

Personally, I manage my portfolio of personal photographs using Apple iPhoto so I wanted to import these scanned photos there too. And so the Data Injection measures above I took above simplified this process greatly (especially the Date and Location fields which iPhoto understands natively).

While I did then go on to use iPhoto’s facial recognition features and manually tag each of the people in the photographs (adding several weeks to my project), the metadata injected into the files helped make this a lot easier (as it was visible in the information panel displayed beside each photo) in iPhoto.

Return on Investment

Timescales

In all, this entire project took almost 9 months to complete (with an investment of 2-3 hours per evening, 1-2 nights per week). The oldest photograph I scanned was from 1927 and the most precious one was one from my early childhood holding a teddy bear that my own children now play with.

Benefits

The total number of photos processed was somewhere in the region of 750. And while that may appear to be a very long time for relatively few photographs, the return on investment is likely to last for many, many multiples of that time.

Upon Reflection

I’ve also been asked a few times since then, “Would I do it again?”, to which the answer is an emphatic “Yes!” as the rewards will last far longer than I could ever have spent doing the work.

However, when asked, “Could I do it again for someone else?”, that has to be a “No”. And not because I would not have the time or the energy, but simply because I would not have the same level of emotional attachment to the subject matter (either the people or the occasions in the photos) and I believe that this would ultimately be reflected in the overall outcome. So hopefully these notes will help others to take on the same journey with their own families.


One thought on “Injecting Data into Scanned Vintage Photos”

Comments are closed.