The Frazer Nash Archive
One of the first collections (archives) created after learning Greenstone was for Frazer Nash cars. It is based on an imported (exploded) CSV file listing all the pre-war and post-war cars, about 440 in total (including "replicas"). The archive also contains about 120 objects, consisting of photos, documents and sample web pages. These objects were categorized in Greenstone with the standard Dublin Core metadata categories, plus author-created car-specific categories such as "car.make", "car.year", etc.
The Frazer Nash archive was completed in September 2012 and remained unchanged while other collections were created and other management systems were reviewed. Based on lessons learned through work on Digital Asset Management (DAM) systems and Collections Management Systems (CMS), the archive was upgraded with new digital archival techniques. However, very little difference is visible to an end user. These changes improve searching and make the addition of new material more rational. They also bring the Frazer Nash archive closer to "best practices".
These are the types of improvements to the Frazer Nash archive:
Section D describes the first use of the ExifTool to efficiently put metadata into a group of photos from an Excel spreadsheet (exported to a CSV file). This should be very useful for archives with many digital photos which are partly or poorly identified.
The initial list of Frazer Nash cars in the archive had a simple numbering scheme: the first Frazer Nash built is "F001", the next one is "F002", etc. Museum and archive practice recommends an "accession number", in which the first part is the year an object enters the collection and the second part is a serial number for each object received after that. Objects that enter as a group get a third number to tell them apart.
The first Frazer Nash built in 1925 with S/N 1008 is now assigned "1925.1008". None of these cars actually "enter the collection", but this combination of year and chassis number seems to be a practical approach. Postwar cars have serial numbers such as "421/100/168". The last three digits are enough to identify a car uniquely and are easily remembered, so a specific car built in 1952 becomes "1952.168".
Other resources (photos, documents, books) will use the archive convention if the "collection entry date" is known and meaningful. Most frequently, numbers are assigned from the date(s) best identified with the object. A photograph or document from April 1954 will become 1954.4.xxx, with the "xxx" assigned as necessary. A spreadsheet is used to record the file names, accession numbers and a description of each item. As discussed below, this data will eventually be used as "metadata" for each object/resource.
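The numbering rules above can be sketched in a few lines of Python. This is only an illustration of the convention as described; the function names are hypothetical and no such script was part of the original work.

```python
def car_accession(year: int, serial: str) -> str:
    """Build a car accession number as "<year>.<chassis>".

    For a post-war serial such as "421/100/168" only the last
    slash-separated group is kept; a pre-war serial such as "1008"
    is used whole.
    """
    chassis = serial.strip().split("/")[-1]
    return f"{year}.{chassis}"


def dated_accession(year: int, month: int, sequence: int) -> str:
    """Build an accession number for a dated photo or document."""
    return f"{year}.{month}.{sequence}"


print(car_accession(1925, "1008"))         # 1925.1008
print(car_accession(1952, "421/100/168"))  # 1952.168
print(dated_accession(1954, 4, 1))         # 1954.4.1
```

In practice the same result can be had with a spreadsheet formula; the point is only that the accession number is derived mechanically from data already in the master list.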
Figure 1 below is the master spreadsheet showing data for the early cars and the assigned accession numbers.
Figure 1 - Extract from a Frazer Nash spreadsheet
This spreadsheet was converted to a CSV file and imported into the Frazer Nash archive (see a guide to this process here) on July 20, 2013. Actually, it was imported, deleted, imported, etc. a few times before it was done correctly! As forecast, there are no apparent changes for an end-user.
"embedded metadata" is the next topic. There
are many standard metadata classifications but the primary
one used by Greenstone is the "Dublin Core" which has 15 basic
Table 1 - Dublin Core categories defined
These categories were known when the Frazer Nash archive was created, but their use was inconsistent. To better understand improving their use, the Dublin Core examples were reviewed. Specific examples in Table 2 below come from this Frazer Nash photo (Figure 2).
Figure 2 - Frazer Nash Mille Miglia publicity photo, Duke Donaldson is the driver
Table 2 - Frazer Nash Dublin Core examples
Data similar to that in Table 2 is already in the Frazer Nash archive for all the objects and resources, but it was not very well "controlled." To approach archival standards, many of the terms should adhere to a "controlled vocabulary" from an "authority". For example, the Library of Congress Name Authority File recognizes "Frazer Nash" as the preferred term for this car, and this project will use it as part of a controlled vocabulary. No authority could be found for John Stuart "Duke" Donaldson, the importer of several cars and owner of the Frazer Nash car and team that won Sebring in 1952.
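As a rough illustration of what a controlled vocabulary does in practice, the sketch below maps variant spellings to one preferred term. "Frazer Nash" is the preferred term named above; the variant spellings and the function name are hypothetical examples, not entries taken from any authority file.

```python
# Preferred terms keyed by lower-cased variant spellings.
# The variants on the left are hypothetical examples.
PREFERRED_TERMS = {
    "frazer nash": "Frazer Nash",
    "frazer-nash": "Frazer Nash",
    "fraser nash": "Frazer Nash",
}


def normalize_term(term: str) -> str:
    """Return the preferred form of a term, or the term itself if unknown."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    return PREFERRED_TERMS.get(key, term.strip())


print(normalize_term("  Frazer-Nash "))
print(normalize_term("Duke Donaldson"))
```

Terms with no entry are returned unchanged, so names without an authority (such as Donaldson) simply pass through and can be reviewed by hand.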
All the resources in the original Frazer Nash Greenstone archive are described and classified by "external" metadata, which are stored by the Greenstone program as separate files. For the most part, the work that was done to create this metadata is only useful in the Greenstone environment, with some exceptions for specialized exports of Greenstone files. Collection managers now realize there are many benefits to "embedding" the metadata in the digital objects, as this forms a link between the metadata and the object that can only be changed by deliberate editing.
Embedding metadata is discussed elsewhere on this website, and the Picasa and ExifTool programs allow this to be done efficiently. (A recent, unreviewed video on getting started with the ExifTool and GUI is here.) The ExifTool (and its GUI) allow the metadata to be extracted to spreadsheet-compatible files. These, in turn, can be imported to Greenstone, or a Greenstone archive can be built directly using only the embedded metadata. In summary: "type it once, use it many times".
One initial decision is to choose which categories to use for embedding data - there are hundreds, perhaps thousands! Picasa captions are put into XMP and IPTC categories: "Description". Picasa tags (keywords) are put into "XMP.Keywords" and the "DC.Subject" categories. The ExifToolGUI was found to be more flexible, useful and efficient than Picasa. There is some inconsistent transfer of keywords between IPTC.Keywords and DC.Subject when using Picasa to embed "tags".
In the ExifToolGUI program, a custom Workspace file was created to embed metadata in certain Dublin Core, EXIF and XMP categories. The ExifToolGUI will display metadata in PDF, Word and Excel files. Although the ExifToolGUI will write metadata to some PDF files, expect to use other programs (Acrobat, Lightning PDF, etc.) to embed metadata in PDF files. Word, Excel and compatible programs in LibreOffice and OpenOffice can embed limited categories of metadata through the "Properties" menu choice for "doc" and "xls" files.
Table 3, below, is the Workspace "set" (part of the "ExifToolGUIv5.ini" file) to use for work on the FN archive, created by much trial and error!
Table 3 - ExifToolGUI Workspace for the Frazer Nash archive
Below is the actual "WorkspaceTags" part of the "ini" file that will produce the Workspace described above and shown below. It can be copied and pasted into the corresponding section of any ExifToolGUIv5.ini:
Figure 4 below shows what the ExifToolGui sees on the image file from Figure 2 before adding the example data from Table 2 (above), as embedded metadata.
Figure 4 - Screenshot from the ExifToolGui
The payoff from using a modified Workspace manager with the ExifToolGUI is the efficient ability to embed user-chosen data in digital photos and documents for later search and retrieval. Further, extracting the data to spreadsheet compatible files is easily done for many future uses. Experience builds proficiency with the ExifToolGUI - it can embed data in selected batches of photos quickly. The original photo file date can also be preserved.
Figure 5, below, is an Excel spreadsheet made from ExifTool extraction of metadata embedded in 200+ images and documents in the Frazer Nash Greenstone archive. Although the "external metadata" in the original archive was not changed, this demonstrates that the title, subject, keywords, etc. embedded in these documents can also be used for classifying and searching the archive and for any other future need. Note that the "Identifier" category (column) is the new accession number assigned to these digital objects.
Figure 5 - Screenshot from an Excel spreadsheet of extracted metadata
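An extraction like the one behind Figure 5 can be scripted. The sketch below builds an ExifTool command using its "-csv" output option and "-r" (recurse into subfolders) and redirects the result to a file that Excel can open. The folder name, the output file name and the list of tags are assumptions chosen for illustration; the command actually used for Figure 5 is not recorded here. ExifTool must be installed and on the PATH for "extract_to_csv" to work.

```python
import subprocess

# Dublin Core style tags to extract; this selection is an assumption.
DEFAULT_TAGS = ("Title", "Description", "Subject", "Identifier", "Relation")


def build_extract_command(folder, tags=DEFAULT_TAGS, recursive=True):
    """Return the exiftool argument list that prints metadata as CSV."""
    cmd = ["exiftool", "-csv"]
    if recursive:
        cmd.append("-r")  # also process subfolders
    cmd += [f"-{tag}" for tag in tags]
    cmd.append(str(folder))
    return cmd


def extract_to_csv(folder, out_csv):
    """Run exiftool and save its CSV output to out_csv."""
    with open(out_csv, "w", encoding="utf-8", newline="") as out:
        subprocess.run(build_extract_command(folder), stdout=out, check=True)


# Show the command for a hypothetical folder named "photos".
print(" ".join(build_extract_command("photos")))
```

The first column of the resulting file is "SourceFile", which is what ties each spreadsheet row back to its photo or document.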
This Frazer Nash archive has also embedded the "DC:Relation"/"Primary Object Number" in nearly all photos and documents, showing the relation of the digital object to the physical object (primarily a particular Frazer Nash car). This has the potential for very good, future ... In the example in Figure 4 above, the subject Frazer Nash car is "1952.168". This technique can be used for collections and databases that include lists of car owners, cars, events and digital objects.
Based on more trials and feedback from reviewers with the sample of photos and documents in the archive, this webpage will make further recommendations that may help others making Greenstone collections.
There are about 500 personal photos from the Frazer Nash Car Club Raid to New England, September 24 - October 3, 2013. Just over 400 were considered "good" and tagged into a Picasa Album, then uploaded to a Picasa Web Album. The ExifTool GUI was used to put captions in "DC:Description" category; these appear as captions in Picasa on the photos. The cars were also identified with an assigned accession number in the "DC:Relation"/"Primary Object Number" fields. These accession numbers were created as described above (e.g. "1952.196"), but a few cars could not be identified with a chassis number, so numbers such as "1937.UNK1" were assigned, for temporary, testing reasons. Keywords and map locations were also assigned in the ExifToolGUI and confirmed in Picasa.
Adding a unique accession number (in the "DC:ResourceIdentifier" category) for each photo would be tedious using the ExifToolGUI, so the ExifTool "-tagsFromFile" option was used from the command line instead. This option is described as using data from a CSV file ("saved as" from Excel) to write to entire folders of images as new (or added) metadata. After help from the ExifTool forum, this command was successfully run. These were the steps:
This produced the "Raid1030.csv" file, opened in Excel.
2. The metadata for each photo (in the Excel rows) and in each category (in the columns) was checked.
3. Specific data for 102 photos of individual cars was copied from the "Keywords" column to the empty "Title" column. This set of photos was the initial selection of photos to be added to the Greenstone archive.
4. The Excel "data fill" function was used to create an "accession number" for all 400+ photos in the format "2013.9.1", "2013.9.2" etc. in the "Identifier" column.
5. Columns that had no new data were deleted, leaving only "SourceFile", "Title" and "Identifier".
6. The Excel file was saved in the "CSV" format, using a new file name: "Raid1030input.csv". This was done to prevent confusion with the CSV file which extracted the metadata from the photos.
7. A command window was opened and the working directory was changed to the photo folder, "e:\DigitalLibrary\USRaid\".
8. This ExifTool command was entered:
9. Success! The accession numbers were added to all 405 photos as "Identifiers" and the 102 individual car photos now had "Titles". The ExifTool had backed up the original photos by adding "_original" to each file name.
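Steps 3 to 6 and step 8 can also be done without Excel. The sketch below is one possible way, not the procedure actually followed: it reads the extracted CSV, numbers every row as "<prefix>.1", "<prefix>.2" and so on, copies "Keywords" into "Title" for a chosen set of photos, and writes a three-column input file. It then builds an import command using ExifTool's documented "-csv=CSVFILE" option; the exact command entered in step 8 is not reproduced in this write-up, so that part is an assumption. The function names and the "titled_files" parameter are hypothetical.

```python
import csv


def write_input_csv(extracted_csv, input_csv, prefix, titled_files=()):
    """Write a SourceFile/Title/Identifier CSV for ExifTool to import.

    Every row gets a serial identifier "<prefix>.<n>" in file order.
    Rows whose SourceFile is listed in titled_files also get a Title
    copied from the Keywords column; other rows keep an empty Title.
    Returns the number of rows written.
    """
    titled = set(titled_files)
    count = 0
    with open(extracted_csv, newline="", encoding="utf-8") as src, \
            open(input_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=["SourceFile", "Title", "Identifier"])
        writer.writeheader()
        for count, row in enumerate(reader, start=1):
            source = row["SourceFile"]
            title = row.get("Keywords", "") if source in titled else ""
            writer.writerow({
                "SourceFile": source,
                "Title": title,
                "Identifier": f"{prefix}.{count}",
            })
    return count


def build_import_command(input_csv, folder="."):
    """Return the exiftool argument list that imports tags from a CSV."""
    return ["exiftool", f"-csv={input_csv}", folder]


print(" ".join(build_import_command("Raid1030input.csv")))
```

The "SourceFile" values in the input file have to match the file paths ExifTool sees, so the import should be run from the same directory that was used for the extraction.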
77 of the 102 photos were further selected and added to the Frazer Nash Greenstone archive. Only a single Greenstone item of "external" metadata was added in the "DC:Description" category at the "folder" level: "Frazer Nash cars on the Raid to New England, 2013" using the Greenstone "Enrich" function.
The Raid car photos can be reviewed in the Greenstone archive in the "titles" browsing tab by looking for the "year of manufacture" of any car; it's the last tab segment: "0-9".
One anomaly was noted in the displayed "Document/photo date" field for some Raid cars - the date of the most recent photo modification (when photos were resized smaller for this archive by an export from Picasa) is displayed. For other cars, the preferred "DateTimeOriginal" is shown. Review of the metadata in Greenstone shows "DateTimeOriginal" has not been consistently extracted from all files; this is an issue for further investigation.
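One way to start that investigation is to check, outside Greenstone, which files actually carry "DateTimeOriginal". The sketch below works on rows from an ExifTool CSV extraction (for example, read with Python's csv.DictReader). "DateTimeOriginal", "CreateDate" and "FileModifyDate" are ExifTool tag names; the fallback order and the sample rows are assumptions made for illustration, and this does not reproduce how Greenstone chooses the date it displays.

```python
# Date tags in order of preference; the order is an assumption.
DATE_TAGS = ("DateTimeOriginal", "CreateDate", "FileModifyDate")


def best_date(row):
    """Return (tag, value) for the first non-empty date tag in a row."""
    for tag in DATE_TAGS:
        value = (row.get(tag) or "").strip()
        if value:
            return tag, value
    return None, ""


def files_missing_original_date(rows):
    """List the SourceFile of every row with no DateTimeOriginal."""
    return [row["SourceFile"] for row in rows
            if not (row.get("DateTimeOriginal") or "").strip()]


# Hypothetical sample rows standing in for an extracted CSV.
rows = [
    {"SourceFile": "a.jpg", "DateTimeOriginal": "2013:09:25 10:15:00",
     "FileModifyDate": "2013:11:02 09:00:00"},
    {"SourceFile": "b.jpg", "DateTimeOriginal": "",
     "FileModifyDate": "2013:11:02 09:05:00"},
]
print(files_missing_original_date(rows))
print(best_date(rows[1]))
```

If the resized exports turn out to have lost "DateTimeOriginal", the problem lies in the export step and not in Greenstone's extraction.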
To provide another method to find the Raid cars, the Greenstone "Create" function was used to add the metadata category "ex.XMP.Relation" to "car.Serial" as a search index. Because this archive primarily holds digital objects classified with metadata originally imported as Greenstone "external" metadata, "car.Serial" is on nearly all these original objects. The Raid cars' embedded "Relation" field is now the "accession number", assigned from the year AND serial number of each car (if known).
The full metadata descriptor for "Relation" is "ex.XMP.Relation". Greenstone will search both of these fields in the single "car serial number" search box. For example, searching for "2065" (look for "2065" by searching for "car serial number") will display two photos of the 1932 TT Replica that visited the Raid at the Lime Rock race track and the simple (null) record originally imported in July, 2013. See section A. above and Figure 1.
The ExifTool was later used on the original photographs in the "sep13" and "oct13" folders, in two steps. First, all photos had complete (and new) accession numbers added, even those not related to the Raid. Next, those photos intended for the Greenstone archive had "Titles" added, exactly as done previously. This new metadata was visible, of course, in the Picasa Album for the Raid. Later all the original Raid photos had accession numbers and "titles" added. Individual car photos were exported and resized, as done previously, and the photos replaced those previously in the Frazer Nash Archive. These steps were repeated to develop and confirm a process that can be recommended for other collections and archives.
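The idea of one search box covering two fields can be shown with a small sketch. It is not how Greenstone builds or queries its indexes; it only illustrates why "2065" finds both the originally imported record (through "car.Serial") and the Raid photos (through the accession number in "ex.XMP.Relation"). The sample records and the function name are hypothetical.

```python
SEARCH_FIELDS = ("car.Serial", "ex.XMP.Relation")


def search_serial(records, query, fields=SEARCH_FIELDS):
    """Return records where any field holds query as a dot-separated part."""
    hits = []
    for record in records:
        for field in fields:
            parts = str(record.get(field, "")).split(".")
            if query in parts:
                hits.append(record)
                break  # one matching field is enough
    return hits


# Hypothetical records standing in for archive items.
records = [
    {"title": "Imported car record", "car.Serial": "2065"},
    {"title": "Raid photo at Lime Rock", "ex.XMP.Relation": "1932.2065"},
    {"title": "Another car", "car.Serial": "168"},
]
for hit in search_serial(records, "2065"):
    print(hit["title"])
```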
When a photo is found or viewed on the Frazer Nash archive (search for "2065" as above), this photo can be saved - "Save Image As..." - and the metadata can be reviewed in the ExifToolGUI or other programs. The file name may have been changed, but the original metadata has been preserved.
Alternative Greenstone search and browsing categories are possible, as are changes to the display format of the search and browsing results. Greenstone reports it has extracted 90+ metadata items from most digital objects, so many, many search and display formats are possible!
E. Further Metadata Trials and Recommendations
After a visit to the Frazer Nash Archives in September 2014, the command-line ExifTool was used to create Excel spreadsheets from more than 2000 Frazer Nash photos in 53 subdirectories of a single "AFNPics" folder/directory to evaluate its possible application to the digital resources in the Frazer Nash Archives. The ExifTool was also used to create embedded metadata for the 800+ travel photos in England.
Based on this recent experience, these recommendations should be considered as "next steps" for any collection of digital assets:
In conclusion, why consider creating "embedded metadata"? Most significantly, embedding metadata in digital objects (photos, etc.) results in those objects being very well identified for many future uses - not only for Greenstone! Databases, Digital Asset Management (DAM) systems and Collections Management Systems (CMS, for archives and museums) can almost always use the exported Excel-file data, directly or indirectly, as imports into their systems.
November 3, 2014
update December 10, 2015