Importing Files Into Greenstone
This webpage is intended to show a step-by-step process for getting from a "list" to a basic Greenstone collection of records, with their metadata, ready to build a digital library/archive. If you have landed on this page, are new to Greenstone, and are unsure why you would want to import a list, click on the Return to Main Page link above. If you have immediate questions or comments, please email me, Bob Schmitt.
This suggested process assumes you have some familiarity with Greenstone and Excel. The Greenstone wiki has links to both a tutorial and the lesson plans for three to five-day workshops. All are very good and recommended. I highly recommend obtaining the very complete book, "How to Build a Digital Library" by Witten, Bainbridge and Nichols. It contains excellent background and considerations for a digital library and a great Greenstone tutorial and reference.
The examples on this webpage are Excel files with real, working data actually imported into Greenstone. A few were done many times, until I re-read the manual or learned from mistakes.
This process also assumes you have some knowledge of library classification techniques. If not, do a little research on this topic or talk to an expert. One of the official Dublin Core websites is a good starting point, especially topic 4, "Elements" at the bottom of that webpage. Basic categories such as "Title", "Description", and "Subject and Keyword" can be confusing and tedious to correct after an import if you are dealing with many records. It's best to get it (mostly) right in the beginning.
"The Perfect is the enemy of the Good (enough)" - Voltaire
So don't hesitate too long to get started!
Why Do This?
If you have a collection of objects - cars, paintings, books, photographs - and have been reasonably organized, you probably have a list (inventory) of these objects and some of their characteristics: "1948 Oldsmobile 88, green, bought in 1985 from Ted Smith, loaned to the Oakville Museum, call Fred Friday for status" or "Snowy Winter, oil on canvas, by Patricia Jones, 26"x18", bought in 1997 for $175". If you are a bit more organized, this list and its attributes may now be on a computer, in some type of database. But what do you do with the documents about the restoration of your 1948 Oldsmobile or the hundreds of photos at various car shows? You can put linked references - or sometimes the actual document - on your PC. But when the documents grow to hundreds and thousands and you want direct, repeatable access to any type of digital file related to your collection, a software program that goes beyond databases or photo organizers is recommended - Greenstone Digital Library software. The cost is right - it's open-source (free to use). And it's not just for "libraries", but applicable to collections, archives, libraries and museums.
When you convert your (Excel) list to Greenstone, the attributes of each object become metadata in categories you can use for accessing or classifying your current collection and future acquisitions. Further, if you use Greenstone's metadata sets, these classifications will be recognized and can be accessible to a wider audience. Your collection on Greenstone can remain private on your computer or home network or be made accessible on the Internet.
Lists, Databases and Excel
A good example of a "list" is a simple contact list of names, addresses and other personal or business information. Such a list would look like this:
Lists such as the example above have been hand-written and typed for generations, but computers allow us to put this data into Excel (or Word) tables or more advanced programs, such as Access. For a useful table (or list) to be used as a "database", some rules must be followed:
Note in the example above, Record 01 uses "CA" as the standard abbreviation for "California" whereas Record 02 spells out the full state name. This is not good database practice.
We highly recommend, and this webpage will use, Excel as the table/list/database for gathering records and data for import into Greenstone. If you have typed lists, investigate using OCR to convert them to Excel files. Or find a good typist familiar with Excel! If your tables are in Word or another program, they usually can be copied into or imported into Excel.
Excel provides many functions to help you review and clean up your data. Sorting, copying, pasting and moving cells of data will speed up the work of making your data uniform and consistent with good database practice.
For more information on using Excel as a database, see Using Excel As A Database or any Excel book.
Let's look at another example of records and a database, closer to data we may want to bring into Greenstone:
This is another example we will use to show how an archive catalog can be imported into Greenstone:
Finally, without providing much detail at this stage, here are two examples of field names (column headings) that have been used for Excel files to record data on cars (vehicles) and their owners, in separate files:
Create an Excel File/Database
Using the examples above, create or check your Excel file to ensure all the data for each record is on a single line, "like kind" data is in each column, variations in each data item (spelling and abbreviations) have been made uniform, and blank lines have been eliminated. Blank cells are OK.
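The checks in this step can also be run outside Excel. The following is only a minimal sketch in Python, assuming the list has been saved as comma-delimited text; the sample names, the "State" column and the abbreviation table are made up for illustration and are not part of the example files on this page.

```python
import csv
import io

# Hypothetical lookup of spelled-out values to their standard abbreviations.
STATE_ABBREVIATIONS = {"California": "CA", "Oregon": "OR"}

def clean_rows(rows, state_field="State"):
    """Drop fully blank rows, trim whitespace, and make one field uniform."""
    cleaned = []
    for row in rows:
        # Trim stray spaces; treat missing cells as empty strings.
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if not any(trimmed.values()):
            continue  # a blank line: skip it (blank cells alone are fine)
        if state_field in trimmed:
            state = trimmed[state_field]
            trimmed[state_field] = STATE_ABBREVIATIONS.get(state, state)
        cleaned.append(trimmed)
    return cleaned

# Made-up sample: one blank line and one spelled-out state name.
SAMPLE = "Name,City,State\nAnn Lee,Fresno,CA\n,,\nBo Diaz, Eureka ,California\n"
records = clean_rows(csv.DictReader(io.StringIO(SAMPLE)))
for record in records:
    print(record)
```

The blank line is dropped, the extra spaces around "Eureka" are trimmed, and "California" becomes "CA", so both records follow the same convention.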
Dates often vary greatly in format. They can be either "text" or one of Excel's date formats - which can look exactly like text. A good method to fix dates is to sort the entire file on the "date" field. Dates in text format should be at the top and should be corrected to one of the Excel formats for dates.
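Sorting in Excel is the method suggested here. If the dates have already been exported as text, a similar check could be sketched in Python as below. The list of formats is an assumption and should be extended to match whatever actually appears in your data; anything that does not match is returned as None so it can be corrected by hand.

```python
from datetime import datetime

# Assumed input formats; extend to match what actually appears in your data.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %b %Y")

def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    return None

# Made-up samples: four recognizable dates and one that needs manual review.
samples = ["1948-06-01", "6/1/1948", "June 1, 1948", "1 Jun 1948", "circa 1948"]
for sample in samples:
    print(sample, "->", normalize_date(sample))
```

Note that "%m/%d/%Y" assumes month-first dates; swap it for "%d/%m/%Y" if your list uses day-first dates.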
Your file can have a few records - probably best for initial trials - or thousands of records. Greenstone seems to import very quickly!
Finally, create a new first column for your Excel file with a name such as "RecordID" - the exact field name is not critical. The data in this column should be a name that means something to you plus a number, perhaps, to make sure each record has a unique identifier. For the Frazer Nash car file above, this would be something like "HighSpeed05", "Highspeed06", and "LeMansReplica08". For the archive file above, this would be "Sales-Promo001". If you have many records, this can be tedious to do manually, so use Excel's process to create a list of consecutive numbers, format the numbers into a standard format (i.e. "01", "02") and then use the "Concatenate" function to combine this number field with a data item from a different column. This column of data will be very useful later in both Greenstone and Access!
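For readers who prefer a script to Excel's "Concatenate" function, the same idea (a meaningful name plus a zero-padded running number) might look like this in Python. This is a sketch only; the function name and the sample labels are illustrative.

```python
def make_record_ids(labels, width=2, start=1):
    """Combine each label with a zero-padded running number."""
    ids = []
    for number, label in enumerate(labels, start=start):
        # Remove spaces so the identifier is a single token.
        ids.append(f"{label.replace(' ', '')}{number:0{width}d}")
    # Running numbers normally make every ID unique, but verify anyway.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate RecordID generated")
    return ids

# Labels taken from another column of the list (illustrative values).
ids = make_record_ids(["High Speed", "High Speed", "Le Mans Replica"])
print(ids)  # ['HighSpeed01', 'HighSpeed02', 'LeMansReplica03']
```

Use width=3 for files with more than 99 records, so that the identifiers keep the same length and sort correctly.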
Import/Backup to Access
Consider importing your Excel file(s) into Access. Your data will be more secure (harder to delete inadvertently) and Access will give you excellent reporting (print or online) and query abilities. Files (tables) can be linked together to make a potentially very powerful and useful relational database. Examples of Access databases for car collections, linking to car owners, events and other historical data can be found on this related webpage. The "RecordID" field you created in the previous step can become a key index.
Review Standard Greenstone Metadata Categories
Assuming you have made at least an initial exploration of Greenstone, you should be familiar with the "Dublin Core Metadata Standard", which is the basic classification scheme used in Greenstone and widely recognized by digital libraries and other resources (including web pages).
The Dublin Core basically consists of these elements:
In Greenstone, each Dublin Core element is prefixed with "dc.", so they appear as dc.Title, dc.Creator, etc. Because these elements are widely accepted and recognized, it is a good idea to match your field names to the Dublin Core elements, insofar as that is possible.
For our example of an archive file, we will use this mapping:
Note that not all Dublin Core metatags are mapped from the file scheduled to be imported; unused metatags can be added after the import as needed, directly in Greenstone. Note also that new "item" metatags have appeared. These should be added to Greenstone before the import. This process is described below.
Other metatags will also be added, such as "car.Make", "car.Model", etc., specifically because the data of this archive is from a car company and a car club.
The "mapping" is very easy - just rename your column headings to the relevant Dublin Core element or a new metatag you plan to add to Greenstone.
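Renaming headings by hand in Excel is usually all that is needed. For a file already saved as comma-delimited text, the renaming could be sketched in Python as follows. The mapping shown is hypothetical and is not the mapping used for the archive file on this page; headings that are not in the mapping are left unchanged.

```python
import csv
import io

# Hypothetical mapping from spreadsheet headings to metadata names.
COLUMN_MAP = {"Title": "dc.Title", "Author": "dc.Creator", "Notes": "dc.Description"}

def rename_headings(csv_text, column_map):
    """Return the CSV text with its first row (the headings) renamed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # nothing to rename in an empty file
    # Unmapped headings (such as RecordID) pass through unchanged.
    rows[0] = [column_map.get(name.strip(), name.strip()) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Made-up sample with one record.
SAMPLE = "RecordID,Title,Author,Notes\nSales-Promo001,Brochure,Unknown,Two pages\n"
renamed = rename_headings(SAMPLE, COLUMN_MAP)
print(renamed)
```

Only the heading row changes; the records themselves are written back exactly as they were read.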
Adding Metatags to Greenstone
There is a Greenstone tutorial which describes how to add new metadata elements with the Metadata Set Editor. Either use this approach or click on the "Manage Metadata Sets" box in the lower left when you are in Greenstone's "Enrich" panel.
For the imports of the Excel example files shown above, two metadata sets were created: one for "cars.XXX" and one for "item.XXX".
"Exploding" (Importing) Your Database
If you have worked with Greenstone, you know "importing" an Excel file/database is very easy - in the "Gather" panel, just drag the file across from the Local Filespace on the left to the Collection panel on the right. You should do this as the first record in your new collection. When the collection is set up, this will give you a searchable file containing all your records.
But the purpose of the process on this webpage is to create hundreds of (null/empty) records, with each data element of a record stored as a metatag, ready to be joined to a document, photo or other digital item. The database must be "exploded"!
A Greenstone tutorial explains this - on that page, follow onwards from step 15.
An Excel file cannot be "exploded", but such a file is easily saved to a "comma-delimited" (csv) file.
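Before handing the CSV file to Greenstone, it may be worth checking that every row has the same number of cells as the heading row and that every identifier is unique. The sketch below does this in Python. It is not part of Greenstone, and it assumes the identifier column is named "RecordID" as suggested earlier on this page.

```python
import csv
import io
from collections import Counter

def check_csv(csv_text, id_field="RecordID"):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    if id_field not in header:
        return [f"no '{id_field}' column in the header"]
    id_index = header.index(id_field)
    problems = []
    # Row 1 is the header, so data rows are counted from 2.
    for row_number, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} cells, found {len(row)}"
            )
    # Count identifiers to find any that are used more than once.
    counts = Counter(row[id_index] for row in body if len(row) > id_index)
    for record_id, count in counts.items():
        if count > 1:
            problems.append(f"'{record_id}' appears {count} times")
    return problems

# Made-up sample with one duplicated identifier and one short row.
SAMPLE = (
    "RecordID,dc.Title\n"
    "Sales-Promo001,Brochure\n"
    "Sales-Promo001,Price list\n"
    "Sales-Promo003\n"
)
problems = check_csv(SAMPLE)
for problem in problems:
    print(problem)
```

An empty list means neither problem was found; it does not guarantee that the import itself will succeed.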
Your work to complete your new collection has only just begun!
If you would like specific help with your Excel file, send it to me by email (all or part) and I'll send you suggestions or make the actual import to a Greenstone collection. All will be volunteer work, until I feel fully qualified to charge for services!
Email me with any questions! Bob Schmitt, firstname.lastname@example.org
April 26, 2012