Car Collections: A Greenstone Sample

Bob Schmitt

November 11, 2011


Summary:

The DVD labeled "RGS-Collections" is a self-running program (Greenstone software) which contains three sample "collections" intended to demonstrate this software's ability to organize data, related particularly to car collections.  These separate files, selected from the opening screen, are:

  • Petersen Automotive Museum data (370 documents, images, webpages)

  • Frazer Nash data (100 documents, images)

  • Fabulous Fifties Newsletters (48 documents)

After selecting any "collection", you can further search for documents, photos or webpages using the full-text search window or a preset "browser" to display the content based on manually-set categories (meta-tags), such as "Title" or "Description".

Put in the DVD and you should see this screen:

Click "Enter Library" and this will appear:

You can choose any of the three "collections".  If you choose the Frazer Nash logo, this will appear:

From this screen, you can do a full-text Search or browse the Titles, Descriptions or Subjects categories.  If you choose "car.location, car.model, (etc)" from the pull-down in the "Search in" box, you will see this:

If you search on "1952", you will get this result:

These are all the results that have been categorized under "1952".  You can further click on any item to see the full image or document.

Go back in your web-browser interface to the original library screen and click "titles" in the menu bar.  You will get this:

The icon on the left is a "bookshelf", indicating that one or more images or documents have been categorized under one of these "titles".

Click "Bristol and Frazer Nash Documents", third from the top, and you will see this:

You can click on any of the small images or the Acrobat icon to see the full document.  If you click on the "page" to the left of these, you may see Greenstone's text of the same document, if available.

You can go back to the main screen again and enter the "Fabulous Fifties" newsletter collection or the Petersen Automotive Museum collection and search or browse in a similar manner.

At this stage, the indexing and categorization in these collections are not consistent or complete, but these samples should demonstrate Greenstone's potential nevertheless.  As we learn more, our categorization skills will improve!  More importantly, we want to form a community of users to settle on standards for "titles", "descriptions", and "subjects".  Tags such as "car.make", "car.manufacturer", and "car.year" are much easier to understand!
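The fielded search and the "titles" browse described above can be pictured with a small Python sketch. This is only an illustration of the idea, not how Greenstone works internally; the file names, tag values and the `search` and `browse` helpers are all invented for the example.

```python
from collections import defaultdict

# Hypothetical records: file names and tag values are invented for illustration.
ITEMS = [
    {"file": "fn_lemans_01.jpg", "Title": "Le Mans Replica photos",
     "car.make": "Frazer Nash", "car.model": "Le Mans Replica", "car.year": "1952"},
    {"file": "fn_targa_brochure.pdf", "Title": "Bristol and Frazer Nash Documents",
     "car.make": "Frazer Nash", "car.model": "Targa Florio", "car.year": "1953"},
    {"file": "fn_chassis_note.pdf", "Title": "Bristol and Frazer Nash Documents",
     "car.make": "Frazer Nash", "car.model": "Le Mans Replica", "car.year": "1952"},
]


def search(items, term, fields):
    """Return items whose value in any of the given fields contains term."""
    term = term.lower()
    return [item for item in items
            if any(term in item.get(field, "").lower() for field in fields)]


def browse(items, field):
    """Group file names under each distinct value of one field (a 'bookshelf')."""
    shelves = defaultdict(list)
    for item in items:
        shelves[item.get(field, "(untagged)")].append(item["file"])
    return dict(shelves)


if __name__ == "__main__":
    car_fields = ["car.make", "car.model", "car.year"]
    for hit in search(ITEMS, "1952", car_fields):
        print(hit["file"])
    for title, files in sorted(browse(ITEMS, "Title").items()):
        print(title, files)
```

The sketch also shows why agreed tag names matter: a search on "car.year" only finds items where everyone used that exact tag.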

Email me with any questions!  Bob Schmitt, rgschmitt@gmail.com

Greenstone Digital Library Background

Greenstone is designed to organize digital media into a "library".  It originated at the University of Waikato in Hamilton, New Zealand, which still has an active development team.  It follows widely-accepted digital library standards and is both very professional and powerful.

Making a Greenstone Digital Collection

To make a Greenstone collection:

1.   Define a "Collection" name.

2.   "Gather" the material by dragging the images/documents in from local drives to a window.

3.   Optionally "Enrich" each item with "meta-tags", a traditional librarian function. Numerous new meta-tags can be added to any item.  Many documents in the sample have had meta-tags added with the Enrich function.

4.   In the "Design" stage, an author or librarian can pre-define numerous other index and search categories based on each item's meta-tags.  Also, Greenstone "plug-ins" extract meta-tags from many item types.  The most recent Greenstone release, 2.85, extracts EXIF meta-tags from digital photos.  My testing confirms this.

5.  In the "Create" stage all items in the Collection are indexed and categorized.  This can be a lengthy process depending on the speed of the computer and Collection size.

6.   "Preview" then becomes available through a web browser.

7.   Your default web browser lets you search and view all Collection items, such as documents and images; the full image or document can be viewed after clicking on it from the search results.  Documents, including newer PDF files, are fully text-searchable.
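To give a feel for what "indexing" in the Create stage means, here is a toy full-text index in Python. It is a minimal sketch under my own assumptions, not Greenstone's actual indexer; the document names, the sample text and the `build_index` and `lookup` helpers are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical documents; names and text are invented for illustration.
DOCS = {
    "newsletter_01.txt": "Fabulous Fifties tour announced for spring",
    "newsletter_02.txt": "Spring tour photos and a Frazer Nash feature",
}


def build_index(docs):
    """Map each lowercase word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, query):
    """Return documents containing every word in the query, sorted by name."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(DOCS)
    print(lookup(index, "spring tour"))
    print(lookup(index, "Frazer Nash"))
```

Building the index is the slow part, since every document has to be read once; searching afterwards only consults the index, which is why the Create stage can take a while but searches come back quickly.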

The Greenstone.org website has workshop courses and tutorials to download.  These are well-designed courses and explain how to use existing meta-tags or add custom meta-tags.  For the sample data, I've added the "car.xxx" meta-tags.

Greenstone has functions to import nearly any type of file, change the web-browser interface appearance and change how the search and browse results are displayed.  The Librarian Interface makes this customization "easier", but expect to need some technical skills to get full use of the Greenstone functions.

Greenstone is very robust; see the examples of completed Collections at http://www.greenstone.org/examples.  The largest collection, with a reported 1,000,000+ images, is an extensive archive of New Zealand newspapers, both as images and full text.

A similar open-source program, DSpace, was developed by MIT and HP for academic use.  The DSpace site shows over 300 institutions using this system, primarily in the U.S.  Greenstone has a tutorial to show how to move a digital collection from DSpace to Greenstone and vice-versa.  Comparisons of both systems on the Internet seem to confer no advantage to either and note that meta-tag classification in either collection is preserved.

Appendix - Contents of the Collections

1.  The Fabulous Fifties collection consists of 48 newsletters and special announcements from this "non-club" organization.  Most are in PDF format and can be searched in full-text by Greenstone.

2.  The Petersen collection includes several of their web pages, photos of their current exhibits and some digital assets in a group titled Digital Library Collection. A list of the video interviews done by Bill Pollack is part of these digital assets.

The Digital Library Collection also includes early auto periodicals.  Eight years ago Bob Norton (gone from us too early) embarked on a massive scanning project of early west coast racing newsletters and magazines.  Less than 25% of his work is included in the Petersen collection.  He made a full 28-page index of the 9 volumes of "MotoRacing" (Index.pdf), which is text-rich and searchable, including within this Greenstone sample collection.  Otherwise, the PDF files of the documents in the Norton segment are images only, not text-searchable.  However, such PDF images can be put through OCR software to generate text.  One page of one copy of MotoRacing was processed using ABBYY FineReader and it worked very well, with about 95% accuracy.

3.  In the Frazer Nash collection is a (redacted) Excel list of the Frazer Nash Club members and their cars.  The "cars" file was expanded to include all the postwar cars.  There are also approximately 100 Frazer Nash images and documents in this collection, many of the postwar cars.

Conclusion/Recommendations:

A car collector or the manager of a(ny) collection with a need to organize the data on the vehicles in the collection should:

1.   Review the data currently available and create a database from lists of cars, owners, club members. An Excel spreadsheet is a good first step if nothing else has been done. Data that has been structured in an Excel file or almost any database is nearly always easily transported into an improved database system.

2.   Digital assets and other "unstructured" data should be categorized. File folders and index cards are a good start for documents and books. Images and other digitized data can be put into labeled hard-drive folders; a re-naming system for individual images can be useful, but perhaps not essential.

3.   Web pages made for the collection are also digital assets and should be marked for indexing.

4.   Start a Greenstone trial with sample digital assets. Much can be learned about the usefulness of file-naming standards, meta-tags/metadata, scanning needs, etc. 

5.   You may find Greenstone meets your needs, but a trial/test process will assist you in developing plans (and specifications/requirements) for your continuing project and also help with decisions for added resources (trained volunteers or professional services) or use of a different digital/content management system.

6.   Consider using a "content management system" if Greenstone or other "library system" does not seem suitable.
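As a rough illustration of the first recommendation, the sketch below loads a car list into a small SQLite database using Python's standard library. It assumes the Excel sheet has first been saved as CSV; the column names, the sample rows and the `load_cars` helper are hypothetical and would need to match your own list.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV export of the spreadsheet; columns and rows are invented.
SAMPLE_CSV = """chassis,make,model,year,owner
X001,Frazer Nash,Le Mans Replica,1952,Owner A
X002,Frazer Nash,Targa Florio,1953,Owner B
"""


def load_cars(csv_text, conn):
    """Create a 'cars' table and fill it from CSV text with a header row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cars ("
        "chassis TEXT PRIMARY KEY, make TEXT, model TEXT, year INTEGER, owner TEXT)"
    )
    # Each CSV row becomes a dict keyed by the header names.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT INTO cars VALUES (:chassis, :make, :model, :year, :owner)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file name to keep the data
    load_cars(SAMPLE_CSV, conn)
    for row in conn.execute("SELECT chassis, model FROM cars WHERE year = 1952"):
        print(row)
```

Once the list is in a database like this, the same column names (make, model, year) are natural candidates for the "car.xxx" meta-tags in a Greenstone trial.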

About Bob Schmitt: I am a car hobbyist and mostly retired from my former occupation as a technology contracts manager (tech-contracts.com).  My car hobby is currently centered on a Frazer Nash car which was restored in Arizona and New Zealand and is currently in the Classic Car Museum in Nelson, NZ (my website for the car is FrazerNash-USA.com). Long before my career in law and contract management, I completed grad school in Information Science at the University of Hawaii (in the days of punch cards!).  However, when a law school opened at UH and I had unused GI Bill time, I took a path away from computers for a long time.

I've been involved with several database and scanning projects for contracts with my last three (large corporate) employers; the results did not seem to justify the costs.

A few years ago, I began to help Bill Pollack with his video interview project (with people more or less well-known in the car/racing world) for the Petersen Automotive Museum (Los Angeles).  Initially I digitized video tapes of Bill's completed interviews and later helped with his new interviews. We now use high-definition video, which I process to create standard DVDs from the hi-def format.  I've also collected and organized the results of a scanning project of '50s-'60s motoring journals done by retired engineer Bob Norton.
