Issue 3 : June 2005

Heterogeneity in Digital Libraries: Two Sides of the Same Coin

Georgia Koutrika provides an overview of the challenges that the diversity of both data resources and users poses for digital library design, and describes how IAP user surveys are addressing this difficulty.

Introduction

Heterogeneity may be regarded as a benefit to digital libraries, but it also presents developers with something of a problem. We come across it in a number of forms. Even at a very high level of abstraction, data sources and users are two of the most fundamental constituents of a digital library. Accordingly, two basic types of heterogeneity are evident: data source heterogeneity and user heterogeneity. One cannot design and implement a digital library without considering these issues extremely carefully. Consequently, their importance to digital library design has prompted the production of four separate surveys within the DELOS context.

Data Source Heterogeneity

A digital library can be a vast collection of objects stored and maintained by multiple information sources, including databases, image banks, file systems, e-mail systems, the World Wide Web, and others. Therefore, assembling information of relevance on a specific topic involves searching for correct information items emanating from a wide variety of sources.

Data source heterogeneity can cause significant problems when accessing multiple data sources. In effect, it is the degree of dissimilarity between the component data sources that determines how difficult a data integration system is to implement. Data sources may differ in many ways. At the lower level, heterogeneity arises from differing hardware platforms, operating systems, networking protocols and access interfaces. At the higher level, it arises from differences among programming and data models, as well as from different perceptions and models of the same real world. Moreover, sources are dynamic: a source that is included in a system at one point may be removed at a later time.

Four types of data source heterogeneity have been identified:

Consequently, users need to be able to access digital library objects seamlessly and transparently, despite the heterogeneity and dynamism of the information sources involved. Interoperable information sources and services allow users to focus on using information instead of being obliged to acquire and combine the required content manually from the different sources.

Syntactic and structural interoperability supports the proper handling, exchange and combining of data, with due regard to formats, encodings, properties, values, data types and so forth. A data integration system provides users with transparent access to a collection of related data sources as if these sources, as a whole, constituted a single data source. Its main objective is to let users focus on specifying what data they want, rather than on describing how to obtain it. To achieve this, the system provides an integrated view of the data stored in the underlying data sources. Users of a data integration system are interested mainly in querying the integrated data rather than in updating the data through the integrated view. It is something of an understatement to say that heterogeneous data sources present designers of data integration systems with a raft of difficulties.
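The idea of an integrated view can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory sources, their field names, the Record class and the query_integrated function are hypothetical and do not come from any system covered by the surveys. Each source gets a small wrapper that translates its native layout into one common record type, so that the user states what is wanted (a keyword) and not how each source is read.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    """Common (integrated) view of an item, whatever source it came from."""
    title: str
    author: str
    source: str


# Hypothetical source 1: a database exposing rows as dictionaries.
DB_ROWS = [
    {"doc_title": "Digital Library Design", "doc_creator": "A. Author"},
    {"doc_title": "Query Processing", "doc_creator": "B. Writer"},
]

# Hypothetical source 2: a file system exposing (author, filename) pairs.
FILES = [
    ("C. Scholar", "digital_preservation.pdf"),
    ("D. Reader", "image_retrieval.pdf"),
]


def db_wrapper() -> Iterable[Record]:
    """Translate database rows into the common Record view."""
    for row in DB_ROWS:
        yield Record(row["doc_title"], row["doc_creator"], "database")


def file_wrapper() -> Iterable[Record]:
    """Derive a title from each filename and emit the common Record view."""
    for author, filename in FILES:
        title = filename.rsplit(".", 1)[0].replace("_", " ").title()
        yield Record(title, author, "file system")


# Sources can be added or removed here without changing the query code.
WRAPPERS: List[Callable[[], Iterable[Record]]] = [db_wrapper, file_wrapper]


def query_integrated(keyword: str) -> List[Record]:
    """Answer a keyword query over all sources as if they were one."""
    needle = keyword.lower()
    return [r for wrap in WRAPPERS for r in wrap() if needle in r.title.lower()]


hits = query_integrated("digital")
```

Because the wrappers are held in a list, a source that disappears can simply be dropped from WRAPPERS, which is one simple way of coping with the dynamism described above.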

The Data Integration Services Survey

Given such challenges, the aim of the Data Integration Services Survey is to describe and compare thoroughly the different approaches, schemes, frameworks and systems in the current literature that support information integration from structurally heterogeneous sources. The survey covers the following data source description approaches:

Semantic Interoperability

Semantic interoperability, on the other hand, allows users to negotiate and understand the meaning of the metadata items both in the same application domain and between application domains. Semantic interoperability refers to the extent to which different metadata schemes express the same semantics in their categorization. Successful interoperation requires clarity on how the categories of metadata relate to each other across different schemes. To this end, several questions must be answered:

Furthermore, different application domains have established different metadata standards, which makes it difficult for applications from different domains to interoperate. The problem becomes even more complicated when a vast body of standards already exists for the same application domain.
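One simple way to make the relationship between metadata schemes explicit is a crosswalk table that maps each scheme's fields onto shared concepts. The sketch below assumes two invented schemes, scheme_a and scheme_b, with invented field and concept names; it is not taken from any particular metadata standard.

```python
from typing import Dict, Optional

# Hypothetical crosswalk: (scheme, field) -> shared concept.
CROSSWALK: Dict[tuple, str] = {
    ("scheme_a", "creator"): "agent",
    ("scheme_a", "title"): "name",
    ("scheme_b", "author"): "agent",
    ("scheme_b", "heading"): "name",
}


def to_shared(scheme: str, record: Dict[str, str]) -> Dict[str, str]:
    """Re-express a record in shared concepts, dropping unmapped fields."""
    shared: Dict[str, str] = {}
    for field, value in record.items():
        concept: Optional[str] = CROSSWALK.get((scheme, field))
        if concept is not None:
            shared[concept] = value
    return shared


def same_semantics(scheme_x: str, field_x: str,
                   scheme_y: str, field_y: str) -> bool:
    """True when both fields are mapped to the same shared concept."""
    concept_x = CROSSWALK.get((scheme_x, field_x))
    return concept_x is not None and concept_x == CROSSWALK.get((scheme_y, field_y))


converted = to_shared("scheme_b", {"author": "G. K.", "heading": "Survey", "shelf": "9"})
```

A flat table like this only captures one-to-one correspondences. Fields that overlap partially, or that split one concept across several fields, need richer mappings.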

Semantic Interoperability Survey

The state-of-the-art survey on Semantic Interoperability in Digital Libraries focuses on semantic interoperability issues, and in particular on:

User Heterogeneity

On the other hand, Internet access has resulted in digital libraries being used increasingly by diverse communities for a variety of purposes; among these, sharing and collaboration have become important social elements. In addition, a user's information-seeking activities are no longer bound either geographically or temporally. Information can be accessed through a variety of devices from users' offices, homes or hotel rooms, or even on the move, at any time of the day or night, seven days a week. As a result, information systems are seeing far greater use. More importantly still, the people using them now range well beyond the librarians and scientists who once made up their audience.

User heterogeneity is hence a significant problem for digital libraries. Users have ever more complex needs, and different users have differing requirements. At the same time, users want to achieve their goals with a minimum of cognitive load and as much enjoyment as possible. We must also factor in information overload, which fuels the need for more sophisticated, user-centered services that provide access to the content of digital libraries. Individuals as much as groups of users have to be better supported if they are to capture, structure and share knowledge successfully. In the same context, both formal and informal learning activities require similar support.

Personalization

An integral step towards these ends lies in building effective profiles of digital library users. A user profile is a description of the user, created either manually by the user or automatically by the system. The system uses it during its interaction with digital library users in order to anticipate their needs and satisfy them in the best possible way. This is achieved by adapting presentation, content and services to a person's task, background, history, device, information needs, location and so forth, as recorded in the user profile. Digital libraries that fail to meet the personalization requirements of their users will ultimately find it difficult to retain their user base or to attract new users.

This has led to the development of personalization systems, which adapt their behaviour to the goals, interests and other characteristics of their users, either as individuals or as members of particular groups.

Central to all personalization systems is the issue of user profile representation. The profile provides the means to record the user's preferences and status, and so to filter the content retrieved, personalize the services offered, and track user access behaviour and needs. However, constructing user profiles can involve considerable effort, which remains largely invisible to the layman.

The aim of the User Modeling for Personalization in Digital Libraries Survey is to study user profiling in Information Retrieval and Information Filtering. It describes different user profile representations, such as history-based, vector space model, weighted n-grams, and classifier-based profiles, explicit and implicit methods for user profile acquisition, user context, existing standards and models, and user profile management in major commercial systems and research projects.
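As an illustration of one representation named above, the following minimal sketch builds a vector space profile implicitly from the texts a user has viewed and compares it with candidate documents by cosine similarity. It is a simplified assumption, not a method described in the survey: it uses raw term counts with whitespace tokenization, with no stop-word removal or term weighting such as TF-IDF. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def term_vector(text: str) -> Counter:
    """Represent a text as a bag of lower-cased terms with raw counts."""
    return Counter(text.lower().split())


def build_profile(viewed: Iterable[str]) -> Counter:
    """Implicit acquisition: sum the term vectors of the documents viewed."""
    profile: Counter = Counter()
    for doc in viewed:
        profile.update(term_vector(doc))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


profile = build_profile(["digital library metadata", "metadata standards"])
related = cosine(profile, term_vector("metadata harvesting"))
unrelated = cosine(profile, term_vector("football results"))
```

Here the document that shares the term "metadata" with the profile scores higher than the one that shares nothing, which is the property a system would exploit when ranking content for this user.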

User profiles can be used in a variety of ways to individualize the user experience, so approaches to personalization differ accordingly. It has been commonly observed, however, that the largest proportion of research derives from the Information Retrieval (IR) community, followed by the Database community, whose work is in many cases inspired by IR.

The Profile Usage for Personalization in Digital Libraries Survey covers personalization methods proposed in the IR and Database communities. It describes information filtering, continuous queries, recommender systems and personalized search engines.
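Of the methods listed, information filtering is perhaps the simplest to sketch: incoming items are scored against a stored profile and only those above a threshold are delivered. The example below is a hypothetical illustration, not an algorithm from the survey. The profile is assumed to be a dictionary of term weights (negative weights mark disliked topics), and the threshold value is arbitrary.

```python
from typing import Dict, Iterable, List


def score(item: str, profile: Dict[str, float]) -> float:
    """Sum the profile weights of the terms that occur in the item."""
    terms = set(item.lower().split())
    return sum(weight for term, weight in profile.items() if term in terms)


def filter_stream(items: Iterable[str], profile: Dict[str, float],
                  threshold: float) -> List[str]:
    """Keep only the incoming items whose score reaches the threshold."""
    return [item for item in items if score(item, profile) >= threshold]


# Hypothetical profile and item stream.
profile = {"metadata": 1.0, "personalization": 0.8, "sports": -1.0}
incoming = [
    "New metadata standard released",
    "Sports metadata roundup",
    "Personalization in search engines",
    "Weather update",
]
delivered = filter_stream(incoming, profile, threshold=0.5)
```

The same scoring function could in principle re-rank search results instead of filtering a stream, which hints at why a single profile can serve several of the personalization methods named above.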

Other Vital Work

Heterogeneity is by no means the only issue to consider in digital library design. During its first year of work, the IAP cluster drafted a set of comprehensive surveys and reports on other key areas of interest, in order to provide broad overviews of existing models and approaches and to identify problems. These surveys formed the basis for establishing common approaches to information access, information integration and personalization; they were also instrumental in initiating joint research in a number of these areas.

Apart from the surveys mentioned above, other surveys already in draft relate to the following topics:

Work on these surveys has served to identify major themes in research on both information access and personalization, as follows:

The surveys are available from the Information Access and Personalization cluster website:
http://delos.di.uoa.gr/transactions.php?type=Reports

Author Details

Georgia Koutrika
University of Athens
Email: koutrika@di.uoa.gr
Telephone: +30 210 727 5242
Fax: +30 210 727 5214
