Cover Article

Architecting a Digital Library Solution

There is a growing feeling that, despite technological advances, many rare documents kept in our museums and libraries are still maintained using traditional methods that are not foolproof and may not preserve the documents for long. Creating the awareness needed to reverse this trend is important. One way of preserving this rich resource is to transfer it to digital media and build Digital Libraries. Prof N Chandrasekaran discusses how to architect a Digital Library solution and describes C-DAC's efforts in this direction.

A nation or society that lacks awareness of its heritage cannot fully comprehend its present and hence cannot steer its future. Our forefathers meticulously documented numerous aspects of our culture, and the generations that followed have safeguarded these telltale records of it. Digital Library technology, which enables the safe and effective storage of ancient documents, has matured considerably. Over the past year, C-DAC has done significant work to architect a Digital Library solution that can facilitate the creation of a digital repository for the country's rare possessions.

Digital Library technology caters to two types of applications: scientific and heritage. A number of commercial solutions are available for designing a Digital Library, and they can be broadly classified as framework-based solutions or direct-methodology solutions. In our experience, a direct methodology, which tackles the problem head-on and provides a one-time solution, is more suitable. Framework-based solutions, by contrast, require significant customisation effort and cannot be automated without expert intervention.

Heritage Solutions: One of the most intensely pursued applications of digital libraries is heritage, in the form of digital repositories of rare texts, manuscripts, images, paintings and artifacts. As part of a major project, C-DAC is currently developing an open and scalable Digital Library solution for preserving Indian heritage.

One of the pilot projects C-DAC has on hand is for a famous South Indian temple, in collaboration with the Department of Culture, the Department of Science and Technology and the Department of Information Technology. The source materials used in this study are among the rarest, most valuable and most beautiful manuscripts ever created.

Most of the contents are hundreds of years old. The solution aims to capture them using the most advanced technologies so that they can be accessed by the general public, an approach that also preserves them. An attempt is also being made to provide transliteration capabilities, so that multilingual rendering makes the texts easier to understand. The heritage domain is therefore well suited to a Digital Library.

Technology for Creating Digital Contents: C-DAC is working to preserve Indian heritage content. The system designed to meet the project's needs includes the following:

• A subsystem to inspect the digitised images.
• A subsystem that provides access to the images and cataloguing information over a network. This subsystem also provides additional image-display functions to support the needs of local scholars.
• A subsystem that examines, archives, replicates and ships the scanned images. Routine examination of the scanned images ensures that the scanning subsystem is operating correctly and that the images are of high quality.
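
The examination subsystem's routine checks can be illustrated with a minimal sketch. It is not C-DAC's implementation: it assumes each scan is available as a raw 8-bit grayscale raster together with its scanning resolution, and the thresholds MIN_DPI and MIN_CONTRAST, like the function names, are hypothetical. A SHA-256 digest recorded at archiving time lets every replica be verified before it is shipped.

```python
import hashlib
import statistics
from dataclasses import dataclass, field

MIN_DPI = 300        # hypothetical minimum acceptable scanning resolution
MIN_CONTRAST = 10.0  # hypothetical: a pixel std. deviation below this suggests a blank or failed scan


@dataclass
class ScanReport:
    sha256: str                                         # fixity digest recorded when archiving
    problems: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.problems


def inspect_scan(pixels: bytes, dpi: int) -> ScanReport:
    """Check one raw 8-bit grayscale scan and compute its archival digest."""
    report = ScanReport(sha256=hashlib.sha256(pixels).hexdigest())
    if dpi < MIN_DPI:
        report.problems.append(f"resolution {dpi} dpi is below {MIN_DPI} dpi")
    if not pixels:
        report.problems.append("image is empty")
    elif statistics.pstdev(pixels) < MIN_CONTRAST:
        report.problems.append("image looks blank (almost no contrast)")
    return report


def verify_replica(expected_sha256: str, replica: bytes) -> bool:
    """True if a replicated copy is bit-for-bit identical to the archived original."""
    return hashlib.sha256(replica).hexdigest() == expected_sha256


if __name__ == "__main__":
    page = bytes(range(256)) * 4           # stand-in for a page with a full tonal range
    report = inspect_scan(page, dpi=400)
    print(report.ok, report.problems)      # True []
    print(verify_replica(report.sha256, page))  # True
```

Recording the digest once, at archiving time, means the same value can later confirm the integrity of CD or tape copies without re-examining the image itself.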

Selected fields from the catalogue records (e.g., manuscript name, page) are also imported into the database to index the images stored within the system. This database helps researchers physically located at the Library to locate and retrieve images of manuscript pages.
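
Indexing images by selected catalogue fields can be sketched with an embedded SQL database. This is an illustration only: the table name page_index, its columns, the helper functions and the sample records are hypothetical, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS page_index (
    manuscript TEXT    NOT NULL,  -- catalogue field: manuscript name
    page       INTEGER NOT NULL,  -- catalogue field: page number
    image_path TEXT    NOT NULL,  -- where the scanned image is stored
    PRIMARY KEY (manuscript, page)
)
"""


def import_catalogue(conn: sqlite3.Connection, records) -> None:
    """Load (manuscript, page, image_path) tuples taken from catalogue records."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO page_index VALUES (?, ?, ?)", records)
    conn.commit()


def pages_of(conn: sqlite3.Connection, manuscript: str) -> list[tuple[int, str]]:
    """Return (page, image_path) pairs for one manuscript, in page order."""
    cur = conn.execute(
        "SELECT page, image_path FROM page_index WHERE manuscript = ? ORDER BY page",
        (manuscript,),
    )
    return cur.fetchall()


def image_for(conn: sqlite3.Connection, manuscript: str, page: int):
    """Return the stored image path for one page, or None if it is not indexed."""
    row = conn.execute(
        "SELECT image_path FROM page_index WHERE manuscript = ? AND page = ?",
        (manuscript, page),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_catalogue(conn, [  # hypothetical sample records
        ("Manuscript A", 2, "scans/ms_a/p002.tif"),
        ("Manuscript A", 1, "scans/ms_a/p001.tif"),
        ("Manuscript B", 1, "scans/ms_b/p001.tif"),
    ])
    print(pages_of(conn, "Manuscript A"))
```

The composite primary key ensures that each page of a manuscript is indexed exactly once.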

To illustrate these concepts, the steps involved in digitising newspaper archives are detailed below:

Scan the newspapers with the OmniScan Photo Imaging Equipment. This document scanner, the first of its kind in India, comes with OmniScan 5.0, software that controls book scanners with high operating speed and reliability. The software needs no special hardware and recognises a large number of graphic formats, so scanned images can be stored in whichever format suits the later processing. All images from one scanning task can be filed in a single multi-page TIFF file. A document may be scanned as a left/right page, a split page or a full-frame image. The software also offers "granularity of storage", meaning that the institution can control the resolution at which images are transmitted and stored.
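
Granularity of storage matters because raw image size grows with the square of the resolution. For a page W inches wide and H inches tall scanned at D dpi with b bits per pixel, the uncompressed size is (W·D)·(H·D)·b/8 bytes. The sketch below evaluates this for three hypothetical storage tiers; the 15 × 22 inch page and the tier settings are assumptions for illustration, not OmniScan defaults.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    dpi: int
    bits_per_pixel: int  # 8 = grayscale, 24 = RGB colour


# Hypothetical tiers: an archival master, a reading copy and a thumbnail.
TIERS = [
    Tier("archival master", 600, 24),
    Tier("reading copy", 300, 8),
    Tier("thumbnail", 100, 8),
]


def raw_size_bytes(width_in: float, height_in: float, tier: Tier) -> int:
    """Uncompressed size: (W*D) x (H*D) pixels times bytes per pixel."""
    width_px = round(width_in * tier.dpi)
    height_px = round(height_in * tier.dpi)
    return width_px * height_px * tier.bits_per_pixel // 8


if __name__ == "__main__":
    for tier in TIERS:
        size = raw_size_bytes(15, 22, tier)  # assumed broadsheet page, 15 x 22 inches
        print(f"{tier.name:16} {size / 1e6:8.1f} MB")
```

Halving the resolution alone cuts the raw size by a factor of four, which is why lower-resolution copies for transmission are usually derived from a single high-resolution master.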

Process the scanned images using the following steps:

• Reduce image size as required
• Crop or erase parts of the image
• Sharpen the images
• Enhance readability
• Rotate to proper orientation
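
The five processing steps can be illustrated on a grayscale image held as a list of pixel rows. This is a simplified, dependency-free sketch: the sharpening kernel, the contrast-stretch rule and the function names are illustrative choices, not the operations OmniScan actually performs.

```python
Image = list[list[int]]  # rows of 8-bit grayscale pixels, 0 = black, 255 = white


def reduce_size(img: Image, factor: int) -> Image:
    """Downscale by averaging each factor x factor block (partial edge blocks are dropped)."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    area = factor * factor
    return [
        [
            sum(img[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) // area
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the given rectangle, discarding margins or unwanted regions."""
    return [row[left:left + width] for row in img[top:top + height]]


def sharpen(img: Image) -> Image:
    """Convolve with [[0,-1,0],[-1,5,-1],[0,-1,0]]; border pixels are copied unchanged."""
    out = [row[:] for row in img]
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            v = (5 * img[r][c] - img[r - 1][c] - img[r + 1][c]
                 - img[r][c - 1] - img[r][c + 1])
            out[r][c] = max(0, min(255, v))
    return out


def stretch_contrast(img: Image) -> Image:
    """Improve readability by mapping the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def rotate(img: Image, quarter_turns: int) -> Image:
    """Rotate clockwise by 90 degrees per quarter turn."""
    for _ in range(quarter_turns % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


def process(img: Image, factor: int, box: tuple[int, int, int, int], quarter_turns: int) -> Image:
    """Apply the five steps in the order listed: reduce, crop, sharpen, enhance, rotate."""
    img = reduce_size(img, factor)
    img = crop(img, *box)
    img = sharpen(img)
    img = stretch_contrast(img)
    return rotate(img, quarter_turns)
```

A production system would rely on an image-processing library; plain lists simply keep the sketch self-contained and make each step easy to follow.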

Information Storing and Retrieval: In a Digital Library solution built on XML with built-in search capabilities, the keywords will be catalogued and indexed. A data-entry page will let users enter keywords for searching, and the retrieved data will be displayed. The scanned images will be backed up on CDs or digital tapes to help preserve the data.
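
Keyword cataloguing and search over XML records might look like the following minimal sketch. The catalogue/item/keyword schema, the sample records and the rule that a multi-word query must match every keyword are assumptions made for illustration; the article does not specify its XML format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical catalogue schema: one <item> per scanned page, with its keywords.
CATALOGUE = """<catalogue>
  <item id="paper-a-p1">
    <title>Newspaper A, page 1</title>
    <image>scans/paper_a/p1.tif</image>
    <keywords><keyword>Railway</keyword><keyword>Monsoon</keyword></keywords>
  </item>
  <item id="paper-b-p3">
    <title>Newspaper B, page 3</title>
    <image>scans/paper_b/p3.tif</image>
    <keywords><keyword>Railway</keyword><keyword>Harbour</keyword></keywords>
  </item>
</catalogue>"""


def build_keyword_index(xml_text: str):
    """Return (keyword -> set of item ids, item id -> record) from catalogue XML."""
    root = ET.fromstring(xml_text)
    index: dict[str, set[str]] = defaultdict(set)
    records: dict[str, dict[str, str]] = {}
    for item in root.iter("item"):
        item_id = item.get("id")
        records[item_id] = {
            "title": item.findtext("title", default=""),
            "image": item.findtext("image", default=""),
        }
        for kw in item.iter("keyword"):
            term = (kw.text or "").strip().lower()
            if term:
                index[term].add(item_id)
    return index, records


def search(index, records, query: str) -> list[dict[str, str]]:
    """Return the records that carry every keyword typed on the data-entry page."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [records[i] for i in sorted(hits)]


if __name__ == "__main__":
    index, records = build_keyword_index(CATALOGUE)
    for record in search(index, records, "railway"):
        print(record["title"], "->", record["image"])
```

Lower-casing keywords at indexing time and at query time makes the search case-insensitive.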

Once the contents have been captured, they can be organised into a digital library, which requires storing them in an RDBMS. Using Oracle as the database engine has particular advantages: Oracle interMedia, which supports ConText (full-text) queries, comes integrated with the RDBMS. The general steps for enabling text queries in a query application are:

1. Load the text
2. Index the text
3. Issue queries
4. Present the documents that satisfy a query
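
The four steps can be sketched independently of any database. This is not Oracle interMedia's interface (there the text index is created and queried in SQL, for example with the CONTAINS operator); it is a minimal in-memory inverted index, with a hypothetical TextIndex class and a simple term-frequency score, meant only to show what each step does.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN.findall(text)]


class TextIndex:
    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.postings: dict[str, Counter] = defaultdict(Counter)  # term -> {doc id: count}

    # 1. Load the text
    def load(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    # 2. Index the text: build an inverted index of term frequencies
    def build(self) -> None:
        self.postings.clear()
        for doc_id, text in self.docs.items():
            for term in tokenize(text):
                self.postings[term][doc_id] += 1

    # 3. Issue a query: a document must contain every query term; rank by total term count
    def query(self, q: str) -> list[str]:
        terms = tokenize(q)
        if not terms:
            return []
        matches = set(self.docs)
        for term in terms:
            matches &= set(self.postings.get(term, ()))
        score = {d: sum(self.postings[t][d] for t in terms) for d in matches}
        return sorted(matches, key=lambda d: (-score[d], d))

    # 4. Present the documents that satisfy the query
    def present(self, doc_ids: list[str], width: int = 40) -> list[str]:
        return [f"{d}: {self.docs[d][:width]}" for d in doc_ids]


if __name__ == "__main__":
    idx = TextIndex()
    idx.load("a", "The temple festival and the temple car")
    idx.load("b", "Temple records of the festival")
    idx.build()
    print(idx.present(idx.query("temple festival")))
```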

The salient features of the pilot projects completed by C-DAC are summarised below:

Sarasvati Mahal Library: Some of the library's rare documents and manuscripts were digitised using the imaging equipment and then compiled into a website. A hierarchical relationship between the pages was established using NetObjects Fusion, the site was programmed in JavaScript, and component plug-ins were employed to further enhance its object-oriented capabilities. A query-based search becomes possible if the images and their descriptions are stored in a database. The collection also includes a series of palm-leaf manuscripts containing dance dramas written in an old Tamil script that cannot be read even by those who know the language. These were therefore transliterated using C-DAC's Leap package, with the help of language experts.
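
The article does not describe how the Leap package performs transliteration. As an illustration of the general idea only, the sketch below converts Tamil text that has already been keyed in as Unicode into Latin script using ISO 15919-style letters. It applies the three rules a rule-based Tamil transliterator needs: a bare consonant carries an inherent "a", a following vowel sign replaces that "a", and the pulli (virama) suppresses it. Reading the old palm-leaf letterforms, which required language experts, is outside its scope; Grantha letters and other characters simply pass through, and the function name is hypothetical.

```python
import unicodedata

# Tamil letters mapped to ISO 15919-style Latin (Grantha letters omitted).
CONSONANTS = {
    "க": "k", "ங": "ṅ", "ச": "c", "ஞ": "ñ", "ட": "ṭ", "ண": "ṇ",
    "த": "t", "ந": "n", "ப": "p", "ம": "m", "ய": "y", "ர": "r",
    "ல": "l", "வ": "v", "ழ": "ḻ", "ள": "ḷ", "ற": "ṟ", "ன": "ṉ",
}
VOWELS = {
    "அ": "a", "ஆ": "ā", "இ": "i", "ஈ": "ī", "உ": "u", "ஊ": "ū",
    "எ": "e", "ஏ": "ē", "ஐ": "ai", "ஒ": "o", "ஓ": "ō", "ஔ": "au",
}
VOWEL_SIGNS = {  # dependent vowel signs written after a consonant
    "\u0bbe": "ā", "\u0bbf": "i", "\u0bc0": "ī", "\u0bc1": "u", "\u0bc2": "ū",
    "\u0bc6": "e", "\u0bc7": "ē", "\u0bc8": "ai", "\u0bca": "o", "\u0bcb": "ō",
    "\u0bcc": "au",
}
PULLI = "\u0bcd"  # virama: cancels the consonant's inherent vowel


def transliterate(text: str) -> str:
    out: list[str] = []
    pending_a = False  # True while the last consonant still carries its inherent "a"
    # NFC joins two-part vowel signs (e.g. U+0BC6 + U+0BBE) into one code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
            pending_a = False
        elif ch == PULLI:
            pending_a = False
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(VOWELS.get(ch, ch))  # unknown characters pass through unchanged
    if pending_a:
        out.append("a")
    return "".join(out)
```

For example, the word for "Tamil" (த, ம, the vowel sign ி, ழ, pulli) becomes "tamiḻ".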

Salar Jung Museum: A list of the museum's artifacts, with their descriptions, was stored in a Microsoft Access database, and query-based search was enabled.

Indian Institute of Astrophysics: Some rare documents of Kepler's work belonging to the Indian Institute of Astrophysics were digitised using the image-capture equipment, and all the digitised books were stored on compact discs to protect them and ensure their permanence.

Prof. N. Chandrasekaran, Director, C-DAC, heads the team for the Digital Library Project at C-DAC, Bangalore.