Technical Documentation - ALA Archives Digital Collections
 
 

This page is intended primarily for anyone who will continue the project by digitizing more collections and making them accessible through the web. The technical details covered here are therefore specific to the ALA Archives Digital Collections Project. However, all are welcome to read it :) The technical information on this page also applies to digitizing other collections in the University of Illinois Archives besides the ALA Archives. General readers might also find the tips and tricks for using Query Builder to create Predefined Custom Searches useful. The Predefined Custom Search feature is very useful in helping users target a specific part of a collection, something users will never be able to do using the generic Search Page provided by CONTENTdm.



Scanning Process | Image Post Processing | Cataloging & Data Entry | Creating Predefined Custom Searches in CONTENTdm

SCANNING PROCESS

The Digital Imaging & Media Technology Initiative has developed "Guidelines for Digital Imaging Projects" for the University of Illinois at Urbana-Champaign. These guidelines were used as the reference in digitizing the ALA Archives. Since I learned of the guidelines only some time after I started the project, some Full Resolution images were scanned at 300 dpi instead of the 600 dpi that the guidelines recommend. I decided to proceed and follow the guidelines for subsequent batches of materials, without going back to rescan the materials that had already been scanned at 300 dpi.

 

IMAGE POST PROCESSING

Image post processing is needed for several reasons:

Photoshop was used as the image processing software to perform some or all of the following tasks:


 

CATALOGING & DATA ENTRY

When starting to catalog a new batch of materials for a particular collection (new Theme), it is recommended to use the ALA/UIUC Archives Digital Collection Worksheet to assist the data entry for the first 20-25 records. The worksheet helps us get a 'feel' for the various values that will be used in that particular collection, since it is easier to compare records on printed pages than on screen. It also helps us be more consistent in assigning values to the various fields.

After we get the 'feel' of the whole batch of materials, we can start entering data directly into the CONTENTdm Acquisition Module without the worksheet. For consistency, it is always recommended to use standard terms in describing each item. The following standards were used to assist in describing each item in the collection: (see Description of Metadata for further information)

To use the CONTENTdm Acquisition Module you will need the Assistant Archivist's User ID and Password. It is recommended to upload the contents of the Acquisition Module to the server periodically and notify the server administrator to approve the newly uploaded contents. It is also useful to make use of the indexes that are available for each searchable field in the collection. You will need to ask the server administrator to re-index the searchable fields so that the changes are reflected in the Acquisition Module. Any upgrade of the server can cause incompatibility with the Acquisition Module. Contact the server administrator (Tim Cole) for further information.

Basically, we use the Acquisition Module to catalog and upload the Access Images (JPG) and the Thumbnails to the server. The Thumbnail images are created automatically by the Acquisition Module. The Full Resolution Images, however, have to be managed manually by storing them on Compact Discs. For each record it is crucial to always enter the CD Volume Name, which identifies the CD where the Full Resolution Image (TIF) for that record is 'physically' stored. This information is needed whenever a user wants to access the Full Resolution Image for a particular Access Image he/she finds on the web through CONTENTdm. Every batch of Full Resolution Images with a total file size of 650 MB or less can be stored on a single CD.

CONTENTdm actually has a Full Resolution Manager (FRM), which can assist us in managing the Full Resolution Images. FRM automatically creates a folder on the local computer (where the Acquisition Module is installed) and adds a field called "Full Resolution" to the database, which records the CD Volume Name where a particular image is stored. The Volume Name is incremented automatically (Example: from ALA-9901015-1 to ALA-9901015-2) and another folder with that name is created every time the current folder reaches 650 MB. All we need to do is copy each folder onto a CD. The Prefix for the CD Volume Name (Example: ALA-9901015-) can be set up in the Full Resolution Manager page, accessible through the Acquisition Module. Using the FRM will not cause the Acquisition Module to upload the Full Resolution Images to the server. However, due to security and other considerations, the server administrator decided not to provide this feature in the Acquisition Module. To compensate for this, I added a field called CD Volume that holds the CD Volume Name for each image. To learn more about the Full Resolution Manager, go to the CONTENTdm Tutorial on Full Resolution Archiving.
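
Since FRM is not available to us, the CD Volume Name for each image has to be worked out by hand. The following is only a minimal Python sketch of the idea described above (fill a volume until the next image would push it past 650 MB, then increment the number after the prefix). It is not how FRM itself is implemented. The function name assign_volumes, the sample file names and sizes, and the reading of 650 MB as 650 * 1024 * 1024 bytes are all assumptions made for illustration.

```python
# Hypothetical helper: not part of CONTENTdm or its Full Resolution Manager.
CD_CAPACITY_BYTES = 650 * 1024 * 1024  # assumption: 650 MB counted in binary megabytes


def assign_volumes(files, prefix, capacity=CD_CAPACITY_BYTES):
    """Map each file name to a CD Volume Name such as 'ALA-9901015-1'.

    files is an iterable of (file_name, size_in_bytes) pairs, processed in
    the order given. A new volume is started whenever adding the next file
    would exceed the capacity of the current one.
    """
    assignments = {}
    volume = 1
    used = 0
    for name, size in files:
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single CD")
        if used + size > capacity:
            volume += 1
            used = 0
        assignments[name] = f"{prefix}{volume}"
        used += size
    return assignments


MB = 1024 * 1024
sample = [("photo1.tif", 400 * MB), ("photo2.tif", 200 * MB), ("photo3.tif", 100 * MB)]
volumes = assign_volumes(sample, "ALA-9901015-")
for file_name, volume_name in volumes.items():
    print(file_name, volume_name)
```

The value printed next to each file is what would be typed into the CD Volume field of the corresponding record.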

To modify a record that you have already uploaded to the server, use the Search feature of the Acquisition Module to find the record, save it temporarily in the Acquisition Module, make the changes, and upload it back to the server. Don't forget to notify the server administrator to approve your upload every time you do this. Otherwise your newly uploaded record(s) won't be accessible from the web.
 

 

CREATING PREDEFINED CUSTOM SEARCHES IN CONTENTdm

This section is not intended to explain how to use Query Builder, since CONTENTdm provides a tutorial called "Query Builder: Creating Customized Collection Interface." Instead, this section gives you tricks for expanding the capability of Query Builder that are not available in the CONTENTdm documentation. I found the tricks by trial and error. Before reading on, it is important that you have read and understood the Query Builder tutorial. Knowledge of HTML will also be very useful, though not required.

Here are some basics you need to know to understand why we need the tricks. Query Builder is a web-based 'wizard' that allows you to create predefined custom searches. You can paste the predefined custom searches created with Query Builder into your web page in the form of simple hyperlinks, pull-down lists, and text-boxes. You can define the look and feel of the search results page by using predefined templates or by creating your own templates. You don't need to know HTML to use it. Basically, Query Builder produces a URL for each predefined custom search you create, and you can paste the URL(s) into your web page. However, it only allows you to enter search term(s) or keyword(s) in ONE field. If you need to use multiple fields for your search, you must use the CONTENTdm 'Generic' Search Page. While the 'Generic' Search Page is powerful, since you can use multiple fields for your searches, it constrains the format of the search results page. Besides, you and your users will have to type in the keyword(s) every time you want to do a search. Because of these constraints, it cannot be used to provide users with predefined custom searches that are always available.

The trick to overcome those limitations is to combine the features of both the Query Builder and the 'Generic' Search Page. Having the capability to search in multiple fields using Query Builder will enable you to create predefined custom searches to assist users in finding the information/item they need. You will also have the capability to create a custom 'online exhibition' for users since you can create your own templates for displaying the search results from your predefined custom searches.

Here are the tricks. First we need to know how the CONTENTdm 'Generic' Search Page formulates queries using MULTIPLE fields. Since it uses frames, we cannot see the URL for the search results. You will only see http://images.libraryillinois.edu:8081/cgi-bin/htmlclient.exe?CISOOP=&CISOROOT4=%2FALA as the URL of the 'Generic' Search Page, as shown in Figure 1 below.


Figure 1. The CONTENTdm 'Generic' Search Page uses frames, which conceal the URL of the search page that contains the boxes where you type in the keywords (middle part).

We need to view the page source to get the URL of the Search Page alone. What we actually need is the URL for the Search Results Page, but to get it we have to submit a query first, and for that we need the URL of the Search Page. Confused? I hope not :) Click on View | Source in Internet Explorer or View | Page Source in Netscape. Copy /cgi-bin/htmlsrch.exe?CISOROOT=/ALA from the source and use it to replace the /cgi-bin/htmlclient.exe?CISOOP=&CISOROOT4=%2FALA part of the 'Generic' Search Page URL. Now you should have the full URL of the Search Page, which is:
http://images.libraryillinois.edu:8081/cgi-bin/htmlsrch.exe?CISOROOT=/ALA. ALA in the URL refers to the ALA Archives Digital Collections; other collections will have a different name. Paste the URL into the Address bar of the browser. You should get the Search Page only, as shown in Figure 2 below.
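
The replacement step above can also be done programmatically. This is a minimal Python sketch using the standard urllib.parse module; the function name search_page_url is hypothetical, and the two URLs are simply the ones quoted in this section.

```python
from urllib.parse import urljoin

# URL shown in the browser's Address bar for the framed 'Generic' Search Page.
FRAMESET_URL = (
    "http://images.libraryillinois.edu:8081"
    "/cgi-bin/htmlclient.exe?CISOOP=&CISOROOT4=%2FALA"
)

# Path and query copied from the page source of the frameset.
SEARCH_PAGE_PATH = "/cgi-bin/htmlsrch.exe?CISOROOT=/ALA"


def search_page_url(frameset_url, inner_path):
    """Keep the scheme, host and port of frameset_url; swap in inner_path."""
    return urljoin(frameset_url, inner_path)


print(search_page_url(FRAMESET_URL, SEARCH_PAGE_PATH))
```

For another collection only the value after CISOROOT= would change.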


Figure 2. Search Page where we type in the keyword(s) in the available text-boxes.

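Type the keyword(s) into the text-boxes. As an example we will use the keywords that appear in the query URL shown further below: Conference Photographs in the Theme field, F. W. Faxon Collections in the Sub Theme field, and ALA Ex-Presidents in the Subject field.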

When you click the Search button, another page will pop up containing the search results, as shown in Figure 3 below. We need to copy the URL from the Address bar of the browser and analyze it to get an idea of how CONTENTdm formulates searches using multiple fields.


Figure 3. A Search Results Page displaying the search results of our query.

The format of the URL, representing CONTENTdm multiple-field queries, is as follows:

http://images.libraryillinois.edu:8081/cgi-bin/htmlquery.exe?CISOROOT1=%2FALA&CISOMAX=4&CISOFIELD1=theme
&CISOBOX1=Conference+Photographs&CISOFIELD2=sub&CISOBOX2=F.+W.+Faxon+Collections&CISOFIELD3=subjea
&CISOBOX3=ALA+Ex-Presidents&CISOFIELD4=CISOSEARCHALL&CISOBOX4=

Analyzing the URL format we can conclude that:
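
The structure of this URL can also be inspected programmatically. The following is a minimal Python sketch using the standard urllib.parse module. It assumes the three lines above are one URL wrapped for display, and the function name field_terms is hypothetical. It only decodes what is in the URL and makes no claim about how CONTENTdm interprets each parameter.

```python
from urllib.parse import parse_qsl, urlsplit

# The multiple-field query URL from above, rejoined into a single string.
RESULT_URL = (
    "http://images.libraryillinois.edu:8081/cgi-bin/htmlquery.exe"
    "?CISOROOT1=%2FALA&CISOMAX=4&CISOFIELD1=theme"
    "&CISOBOX1=Conference+Photographs&CISOFIELD2=sub"
    "&CISOBOX2=F.+W.+Faxon+Collections&CISOFIELD3=subjea"
    "&CISOBOX3=ALA+Ex-Presidents&CISOFIELD4=CISOSEARCHALL&CISOBOX4="
)


def field_terms(url):
    """Return (all decoded parameters, list of (field, keywords) pairs)."""
    # keep_blank_values keeps the empty CISOBOX4 instead of dropping it.
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    pairs = []
    n = 1
    while f"CISOFIELD{n}" in params:
        pairs.append((params[f"CISOFIELD{n}"], params.get(f"CISOBOX{n}", "")))
        n += 1
    return params, pairs


params, pairs = field_terms(RESULT_URL)
print(params["CISOROOT1"], params["CISOMAX"])
for field, keywords in pairs:
    print(field, "=", repr(keywords))
```

Decoding turns %2F back into / and each + back into a space, which makes the numbered CISOFIELDx/CISOBOXx pairs easy to read.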


The second thing to do is to find out how the CONTENTdm Query Builder formulates queries on a SINGLE field. Go to http://images.libraryillinois.edu:8081/cgi-bin/qbuild.exe and use Query Builder to create a simple hyperlink for a predefined custom search. A sample query using "Conference Photographs" as the keyword(s) in the Theme field will yield the following URL:

http://images.libraryillinois.edu:8081/cgi-bin/pquery.exe?&CISOROOT1=/ALA&CISOFIELD1=theme&CISOBOX1=Conference%20Photographs
&CISOOP=all&CISORESTMP=/qbuild/template1.html&CISOVIEWTMP=/qbuild/template2.html&CISOMODE=thumbnails
&CISOROWS=4&CISOCOLS=5

Analyzing the URL format we can conclude that it has a similar format to the 'Generic' Search Page URL, with the following differences/additions:

The query above will produce a Search Results page as shown in Figure 4 below.


Figure 4. Search Results Page produced by the Query Builder using the default HTML template named "template1.html".

Clicking on one of the thumbnails will bring us to a new page, displaying the Access Image (full size) in JPG format, as shown in Figure 5 below.


Figure 5. Item Display Page produced by Query Builder using the default HTML template named "template2.html"

The Search Results Page (Figure 4) and the Item Display Page (Figure 5), which are produced by Query Builder, are customizable. We can take advantage of this feature to create a custom page for online exhibitions by making use of the Theme, Sub Theme, Subject and/or any other field's values. However, since we will need more than one field to do that, we should apply our knowledge of the URL format produced by the 'Generic' Search Page, which can handle multiple-field queries. To save you time, here is a sample of the combined format that I arrived at after much trial and error:

http://images.libraryillinois.edu:8081/cgi-bin/pquery.exe?&CISOROOT1=/ALA&CISOFIELD1=theme&CISOBOX1=Conference%20Photographs
&CISOFIELD2=sub&CISOBOX2=F.%20W.%20Faxon%20Collections&CISOFIELD3=subjea&CISOBOX3=ALA%20Ex-Presidents&CISOOP=all
&CISORESTMP=/qbuild/template1.html&CISOVIEWTMP=/qbuild/template2.html&CISOMODE=thumbnails&CISOROWS=3&CISOCOLS=5
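
If you need many such hyperlinks, the combined format can be generated rather than typed by hand. The following is a minimal Python sketch that reproduces the sample URL above. The function name build_predefined_search and its parameter names are hypothetical. The parameter order and the ?& prefix simply copy the sample, and whether the server accepts a different order has not been verified.

```python
from urllib.parse import quote

BASE_URL = "http://images.libraryillinois.edu:8081/cgi-bin/pquery.exe"


def build_predefined_search(collection, field_terms, rows=3, cols=5,
                            result_template="/qbuild/template1.html",
                            view_template="/qbuild/template2.html"):
    """Build a multiple-field predefined custom search URL.

    field_terms is a list of (CISOFIELDx value, keywords) pairs; they are
    numbered 1, 2, 3, ... in the order given.
    """
    params = [("CISOROOT1", collection)]
    for n, (field, keywords) in enumerate(field_terms, start=1):
        params.append((f"CISOFIELD{n}", field))
        params.append((f"CISOBOX{n}", keywords))
    params += [
        ("CISOOP", "all"),
        ("CISORESTMP", result_template),
        ("CISOVIEWTMP", view_template),
        ("CISOMODE", "thumbnails"),
        ("CISOROWS", str(rows)),
        ("CISOCOLS", str(cols)),
    ]
    # quote() encodes spaces as %20 and leaves "/" untouched, as in the sample.
    query = "&".join(f"{key}={quote(value)}" for key, value in params)
    return f"{BASE_URL}?&{query}"


url = build_predefined_search(
    "/ALA",
    [
        ("theme", "Conference Photographs"),
        ("sub", "F. W. Faxon Collections"),
        ("subjea", "ALA Ex-Presidents"),
    ],
)
print(url)
```

The printed URL can be pasted into a web page as a simple hyperlink, in the same way as a URL produced by Query Builder.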

Notice that the values for CISOFIELDx are a little different from the field names defined in Table 1. ALA Archives Digital Collection Metadata Set in CONTENTdm Format. Not all fields defined in that table have a corresponding CISOFIELDx value. Only fields that are defined as Searchable have one. Table 2 below lists the searchable fields in ALA Archives Digital Collections and their corresponding values for CISOFIELDx.

Table 2. Searchable fields in ALA Archives Digital Collections and their corresponding values for CISOFIELDx.

Field Name                  CISOFIELDx Name
Title                       title
Type                        type
Digitized Material          digiti
Year Coverage               covera
Geographic Coverage         coverc
Names Coverage              subjec
Subject                     subjea
Description                 descry
Creator Corporate           creata
Creator Role                creatb
Publisher                   publis
Theme                       theme
Sub Theme                   sub
Inclusive Dates             inclus
Language                    langua
Medium                      materi
Technique                   techni
Format - Measurement        measub
Date Created                date
Search Across All Fields    CISOSEARCHALL
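
When generating many predefined searches, it can help to keep Table 2 as a lookup so that queries can be written with the field names people actually see. This is a minimal Python sketch. The names CISOFIELD_BY_NAME and to_field_terms are hypothetical, and the values are copied from Table 2 exactly as listed.

```python
# Table 2 as a lookup: displayed field name -> CISOFIELDx value.
CISOFIELD_BY_NAME = {
    "Title": "title",
    "Type": "type",
    "Digitized Material": "digiti",
    "Year Coverage": "covera",
    "Geographic Coverage": "coverc",
    "Names Coverage": "subjec",
    "Subject": "subjea",
    "Description": "descry",
    "Creator Corporate": "creata",
    "Creator Role": "creatb",
    "Publisher": "publis",
    "Theme": "theme",
    "Sub Theme": "sub",
    "Inclusive Dates": "inclus",
    "Language": "langua",
    "Medium": "materi",
    "Technique": "techni",
    "Format - Measurement": "measub",
    "Date Created": "date",
    "Search Across All Fields": "CISOSEARCHALL",
}


def to_field_terms(criteria):
    """Translate {field name: keywords} into (CISOFIELDx value, keywords) pairs.

    Raises KeyError for a field that is not listed as searchable in Table 2.
    """
    return [(CISOFIELD_BY_NAME[name], keywords) for name, keywords in criteria.items()]


pairs = to_field_terms({
    "Theme": "Conference Photographs",
    "Sub Theme": "F. W. Faxon Collections",
})
print(pairs)
```

A field that is not searchable raises an error here instead of silently producing a query that cannot match.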

Although I have never tried using more than three fields in a predefined custom search, I'm almost sure that you can use as many fields as you like. This makes predefined custom searches very powerful and useful: they give us the ability to pinpoint any specific part of any collection and present it in customized web pages.

The predefined multiple-field searches in the Query Builder format we have learned so far are useful for creating predefined custom searches in the form of simple hyperlinks. I have never tried to find out whether they can also be applied to the pull-down and text-box versions. I just don't see the need, since we can use the 'Generic' Search Page for that.


Composed by: Aditya Nugraha (anugraha_a@yahoo.com)
Created: July 2003
Last updated: August 01, 2003
