Issue 3 : June 2005

Reports from the DELOS Clusters

Each issue of the DELOS Newsletter carries reports from clusters working within the DELOS Network of Excellence.


Digital Library Architecture

Can Türker has provided us with an update on activity by the Digital Library Architecture cluster in terms of workshops, exchange programme work and site development.

Over recent months considerable progress has been made in preparing a survey on service-oriented architectures (SoA), peer-to-peer architectures (P2P), and grid infrastructures, which is one of the central objectives of WP1 within the first 18 months of the DELOS Project. Input from all WP1 partners on applying SoA, P2P, and/or grid technology to Digital Libraries has been collected and integrated. The survey was completed in February 2005. For further information, do contact the cluster leader.

Many WP1 partners were also involved in preparing revised, extended versions of the papers they presented at the Sixth Thematic Workshop of DELOS on Digital Library Architectures at S. Margherita di Pula (Cagliari), Italy, 24-25 June 2004. (See http://ii.umit.at/research/delos_website/6thworkshop.html) These papers have now undergone a second reviewing phase. Accepted papers will be published in the workshop post-proceedings and appear as a book in the Lecture Notes in Computer Science (LNCS) series of Springer. Can Türker (ETHZ), Hans-Jörg Schek (ETHZ/UMIT), and Maristella Agosti (U Padua) are in charge of editing and producing this book. For further information, do contact the cluster leader.

Over the same period, joint work between several DELOS partners has intensified within the researcher exchange programme. For instance, MUNI hosted Giuseppe Amato from ISTI-CNR Pisa for two weeks in November 2004 and ETHZ hosted Harald Krottmaier from Graz for three weeks in total from September to November 2004.

The DELOS OpenDLib (http://www.opendlib.com) managed by ISTI/CNR has been updated by including documents produced during the DELOS events organized over the autumn. In particular, it now contains material from the Third International Summer School on Digital Library Technologies (ISDL 2004), the DELOS WP7 Workshop on the Evaluation of Digital Libraries, and CLEF 2004.

The Spring 2005 joint WP1/WP2 workshop organized by MPII took place at Dagstuhl Castle, Germany, from 29 March to 1 April. This workshop, the 8th in the DELOS series of Thematic Workshops, was devoted to two critical themes in digital libraries: system architecture and information access. The main workshop objectives were to bring together researchers and practitioners interested in these two areas and their inter-connections, to identify fundamental system services that allow the development and operation of future Digital Libraries, and to explore the main relevant technical directions. More information about this workshop is available at http://ii.umit.at/research/delos_website/8thworkshop.html

Can Türker
Member
ETH Zurich,
Institute of Information Systems,
CH-8092 Zurich

E-mail: tuerker@inf.ethz.ch
url: http://www.dbs.ethz.ch
Telephone: +41 1 6327248
Fax: +41 1 6321172

DELOS Network of Excellence
Name of Cluster: Digital Library Architecture
Name of Leader: Hans-Jörg Schek
email: schek@inf.ethz.ch
phone: +30 210 727-5224
fax: +30 210 727-5214
url: http://www.dbs.ethz.ch/

Information Access and Personalization

The Information Access and Personalization cluster enters the second period of the project with a new, exciting research agenda. Georgia Koutrika outlines for us the new research activities envisaged in the new phase.

The New Research Agenda for IAP

Introduction

The Information Access and Personalization cluster enters the second period of the project with a new, exciting research agenda. This agenda has come as a product of the efforts of the cluster so far as well as a logical continuation of them. For this reason, it comprises a list of vital research topics with a fair distribution of interest and effort among the major areas of the cluster's interests. In addition, it provides a strong incentive for collaboration among researchers with different background and interests in order to solve new problems that require expertise from different fields.

IAP's Strategic Goals

This agenda represents the combined effect of the cluster's past efforts. These have been planned in line with the following strategic goals:

  1. Construction of a common, comprehensive framework for information access and personalization approaches. Towards this end, a set of comprehensive surveys and reports have been generated. These provide a broad coverage of the general areas of interest to the cluster, and describe problems, existing models and approaches.
  2. Promotion of knowledge about available practices in the fields of information access and personalization in digital libraries. To further this objective, several information-sharing and dissemination activities have taken place.
  3. Initiation of research on new information access and personalization models and methodologies. This was made possible through the afore-mentioned activities and the researchers' exchange programme.

All these activities have led to the identification of major research trends and significant obstacles in the fields of information access and personalization as well as the establishment of cooperation between researchers and of several research proposals. In addition, they have provided the proper basis for the new objectives.

Phase Two

During the second phase, the Information Access and Personalization cluster will build on and continue the work performed during the first period and will pursue the following strategic goals:

  1. Substantial support of cooperation between individual research groups initiated during the first phase of the project.
  2. Research on new models and methodologies in order to eliminate inefficiencies in existing ones.
  3. Development of toolkits and systems for purposes of re-use and demonstration of proposed methods and models.

IAP Research Agenda

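For the second phase of the DELOS Project, we have drawn three basic lines of research:

  1. Advanced Information Access Methods
  2. P2P Resource-sharing for Digital Libraries
  3. User Context Modelling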

Along these lines, five new activities have been planned overall. These, along with those activities continuing from the first period of the project’s lifetime, constitute the new research agenda of the IAP cluster. In what follows, I will try to give an overview of the aforementioned research directions.

1. Advanced Information Access Methods

The importance of this topic is justified by the ever-growing volume of multimedia objects in digital libraries. Added to this is the critical role multimedia plays in improving a user's experience when accessing a Digital Library. Effective and efficient access to multimedia content is therefore becoming increasingly important. In particular, the main challenge here is as follows:

The main issues to be investigated include the following:

2. P2P Resource-sharing for Digital Libraries

The peer-to-peer (P2P) paradigm is an intriguing approach for coping with dynamically evolving federations of loosely coupled digital libraries. The large scale and high dynamics of a P2P system pose very challenging issues as far as information access is concerned:

Such a P2P system promises unlimited scalability, robustness to failures, fluctuation, and load dynamics, and also much reduced vulnerability to attacks and information manipulation.

The main issues that will be investigated include:

3. User Context Modelling

The interpretation and suitability of data increasingly depend on changing conditions, such as the current position of the user or the device he or she is using (laptop, mobile, PDA, etc.), and on user characteristics, e.g. preferences. All these parameters constitute the user's context, which should be taken into account during information access in order to serve the user better. To this end, it is essential that information providers specify the context under which information becomes relevant. Conversely, information users can specify their (context-dependent or context-free) preferences as well as their current context when requesting data.
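As an illustration only, the distinction between context-dependent and context-free preferences can be sketched in a few lines of Python. The `Preference` class, the `device` and `location` context keys, and the weights below are hypothetical and are not taken from the cluster's work. A preference applies when all of its conditions match the user's current context, and a preference without conditions always applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """A user preference, optionally tied to a context (hypothetical model)."""
    topic: str
    weight: float
    # (key, value) conditions that must all hold in the current context;
    # an empty tuple makes the preference context-free.
    when: tuple = ()


def active_preferences(preferences, context):
    """Return the preferences whose conditions all match the current context."""
    return [
        p for p in preferences
        if all(context.get(key) == value for key, value in p.when)
    ]


def score(item_topics, preferences, context):
    """Sum the weights of the active preferences that match an item's topics."""
    return sum(
        p.weight
        for p in active_preferences(preferences, context)
        if p.topic in item_topics
    )


# Illustrative data: two context-dependent preferences and one context-free.
prefs = [
    Preference("maps", 0.75, when=(("device", "mobile"),)),
    Preference("full-text articles", 0.5, when=(("device", "laptop"),)),
    Preference("digital libraries", 0.25),
]
mobile_context = {"device": "mobile", "location": "Athens"}
```

With `mobile_context`, only the "maps" and "digital libraries" preferences are active, so an item tagged with both topics would be scored higher on a mobile device than the same item requested from a laptop.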

Towards the enhancement of the personalization capabilities of DL systems, the following issues will be investigated:

Concluding Remarks

The outcomes from the efforts of the Information Access and Personalization cluster so far have served to underline the significance of the research directions selected. Furthermore, the research agenda of the IAP cluster retains two additional and unique features:

With this research agenda before us, the second period of the project for the Information Access and Personalization cluster promises to be both challenging and exciting.

Author Details

Georgia Koutrika
University of Athens
Email: koutrika@di.uoa.gr
Tel: +30 210 727 5242
Fax: +30 210 727 5214

DELOS Network of Excellence
Name of Cluster: Information Access and Personalization
Name of Leader: Yannis Ioannidis
email: yannis@di.uoa.gr
phone: +30 210 727-5224
fax: +30 210 727-5214
url: http://cgi.di.uoa.gr/~yannis/

Audio/Visual and Non-traditional Objects

George Ioannidis gives an outline of the cluster's progress and refers us to greater detail and future directions in the feature of this issue.

Introduction

Over the first 12 months of the project WP3 has aimed to develop a common understanding and foundation for the work that has to be done in DELOS in terms of State of the Art Reports, support for Forum and Testbeds, and efforts at understanding the expertise of the partners and their possible cooperation towards the objectives of DELOS as they are described in the Technical Annex.

Progress on Reports

The reports entitled State of the Art on Metadata Extraction and State of the Art in Audiovisual Content-Based Retrieval, Information Universal Access & Interaction including Data Models & Languages have been completed. A preliminary draft of the state of the art report on Audiovisual Metadata Management has been produced. For further information, do contact the cluster leaders.

Portals and Demonstrators

The Delos Collaborative Portal has been released. The portal is intended to foster exchange of ideas and useful information within the DELOS Community.

The DEMOS portal for demonstrators and testbeds has been created based on an analysis of the requirements for supporting testbeds and demonstrators. The DEMOS portal is described in further detail in Section 3 of the feature. Several demonstrators have already been ingested, some of which are described in Section 4 of the feature. Some testbeds have also been provided. They will not be described here, but may be accessed through the DEMOS portal.

Metadata-related Activity

For ontology-based metadata definition, a tool named GraphOnto has been implemented. An OWL upper ontology that captures the MPEG7 MDS is utilized. This upper ontology is extended with domain knowledge through appropriate OWL domain Ontologies.

In the same context, a study for the integration of the TV-Anytime Metadata model with the SCORM 1.2 Content Aggregation Model has been completed that defines a detailed mapping between the two metadata standards. This mapping allows for the provision of eLearning services on digital TV systems as well as the reuse of TV programs in order to build educational experiences.

MPEG-7-related Work

An analysis of the applicability of MPEG-7 descriptors to the existing video annotation tools that are based on home-grown XML annotation formats was carried out. Based on MPEG-7, a modelling language for magazine broadcasts has been specified. It is capable of describing classes of telecasts, instead of specific telecast instances, for automatic segmentation into semantic structural elements.

A Java class framework has been implemented for the modelling of MPEG-7 descriptions (MDS, Video, Audio). These can be stored in an implemented persistence management framework for media descriptors.

Other Developments

An automated image classifier based on SVM techniques has been designed and realized. An automatic region grouping method for improving semantic meaning of features using psychology laws has been developed. The classifier has been integrated in the MILOS Content Management System, which is also available as a demonstrator through the DEMOS portal. It is described in Section 4.1 of the main feature.

For video analysis, annotation, and retrieval, a prototype video content management system, named VCM, has been developed. It is available through the DEMOS demonstrator portal, and is described in Section 4.6.

A multimedia authoring tool has been defined, which supports content-based constraints for personalizing the presentation of multimedia objects according to users’ preferences and skill level.

A prototype system has been developed for exploring the multimedia content of a digital library (images, text, videos, and audio) relating to theatrical works in 19th Century Milan. It supplies a VR (Virtual Reality) interface, namely a reconstruction of a 19th Century Milanese theatre.

A front-end of a music search engine has been developed, which is accessible through a web browser to allow users to interact using a query-by-example paradigm. Moreover the typical query-by-humming paradigm is also supported. A preliminary version of a component for semi-automatic extraction of song metadata (title, lyrics, cover) from ID3-tags and by querying via web services has also been created. Methodologies for music indexing and retrieval have been extensively evaluated, based on a data fusion approach, with encouraging initial results.
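To make the ID3 step concrete, here is a minimal Python sketch that reads the text fields of an ID3v1 tag, the fixed 128-byte block at the end of an MP3 file. It is not the cluster's component: the function names are hypothetical, and the sketch covers only the simple ID3v1 layout. Lyrics and cover art are not part of ID3v1; they are carried in ID3v2 frames, which this sketch does not parse.

```python
def parse_id3v1(data: bytes):
    """Parse an ID3v1 tag from the last 128 bytes of an MP3 file's content.

    Returns a dict of text fields, or None if no ID3v1 tag is present.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


def build_id3v1(title, artist, album, year):
    """Build a 128-byte ID3v1 tag (used here only to create sample input)."""
    def pad(value, width):
        return value.encode("latin-1")[:width].ljust(width, b"\x00")

    # Layout: "TAG", title, artist, album, year, comment, genre byte.
    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad(album, 30)
            + pad(year, 4) + pad("", 30) + b"\xff")


# Placeholder audio bytes followed by a synthetic tag.
fake_mp3 = b"\x00" * 256 + build_id3v1(
    "Example Title", "Example Artist", "Example Album", "2004")
metadata = parse_id3v1(fake_mp3)
```

Fields recovered this way could then serve as search keys for the web-service queries mentioned above, for example to look up lyrics or cover images that the tag itself does not contain.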

Preliminary tests were made on the use of APIs provided by Web-based CD dealers, to examine the potential for automatically creating a network of composers/performers, with scope for extracting information about their similarities and reflecting customers’ behaviour.

Feature extraction systems for audio content, named Marsyas and SOMeJB, have been installed and tested. Evaluation measures on a larger sample collection based on audio files have been collected and will subsequently be used to define scenarios for interactive retrieval and evaluation of retrieval performance in different scenarios.

An audio classification framework has been implemented for participation in the audio contests of the International Conference on Music Information Retrieval (ISMIR) in the disciplines of rhythm, genre and artist detection. It won the Rhythm Classification Competition, was ranked fourth in the genre classification contest, and was again the winner in the “stress-test” performance of the genre classification contest. A corresponding demonstrator is available through the DEMOS portal. It is described in more detail in Section 4.5 of the feature.

A web crawler, which is based on APIs provided by a major Web Search Engine, has been developed to create a collection of MIDI files automatically, to be used as a testbed for Music Information Retrieval techniques. When launched, the crawler is able to collect and store thousands of MIDI files in a database, partially overcoming the classic problem of lack of test data.

A syllable-based speech recognition engine for English has been developed. A speech recognizer named ISIP was trained with huge amounts of American English broadcast data. Hidden-Markov-Models were used forming context-dependent cross-word-triphone models. The syllable inventory was generated using tools from NIST. The syllable recognition rate is 88.0%. A syllable retrieval system could be implemented with the syllable recognizer, similar to what has been done for German.

NIST TRECVID Evaluation

DELOS members participated in the 2004 NIST TRECVID evaluation - the de facto international standard benchmark for content-based video retrieval. Members participated in the feature extraction task, the shot detection task, and the search task. For the latter task the UvA TRECVID Semantic Video Search Engine was developed, showing the effectiveness of the approaches to content-based retrieval by audio-visual libraries, as well as the parallel implementation thereof. The Semantic Video Search Engine is described in the feature, Section 4.4, and is accessible through the DEMOS portal. The shot detection algorithms implemented for TRECVID participation are also available through the portal. They are referred to in Section 4.6.

Other Advances

Several software components have been continuously refined. These include software for 3D object modelling and retrieval, as well as tools for MPEG-7 manual annotation of videos and real-time automatic video annotation, in particular for soccer video analysis. Further improvements have been made to automatic audio-visual metadata extraction tools.

Advances have been made with the development of a test-bed and demonstrator for the extraction and integration of most of the MPEG-7 standard visual descriptors. The output of the demonstrator is collected in an MPEG-7 stream, and the results of interoperability testing are being analyzed.

Other work has included the following:

Readers are referred to the contents of the feature in this issue to which this summary relates. For further information on the above report, please contact the cluster leaders.

Author Details

George Ioannidis
Technologie-Zentrum Informatik (TZI)
University of Bremen
Germany
url http://www.tzi.de
email: george.ioannidis@tzi.de

DELOS Network of Excellence
Name of Cluster: Audio/Visual and Non-traditional Objects
Name of Leaders:

Stavros Christodoulakis
email: stavros@ced.tuc.gr
phone: +30-2821-037399
fax: +30-2821-037399
url: http://www.music.tuc.gr

Alberto del Bimbo
email: delbimbo@dsi.unifi.it
phone: +39-055-479 6262
fax: +39-055-479 6363
url: http://viplab.dsi.unifi.it/

User Interfaces and Visualization

Tiziana Catarci and Stephen Kimani describe the outcomes from a questionnaire-based study and give their conclusions.

Results from the Questionnaire-based Study

The User Interfaces and Visualization cluster carried out a questionnaire-based study in order to establish the functional and non-functional requirements of digital libraries as described previously in our report in issue 1. In this issue, we report the digital library requirements based on the results that we obtained from the study. (See the study at http://delos.dis.uniroma1.it/C2/Deliverables/default.aspx). In particular, we present:

Overview of the set-up of the questionnaire-based study

The cluster adopted an online questionnaire-based approach and composed two separate questionnaires, one for digital library end-users and one for digital library stakeholders. The questionnaires were designed to gather information on user background and demographics, users' current experience, DL functional requirements, and DL non-functional requirements.

Demographics and user background

There were 45 library end-users (14 female, 25 male, 6 not specified) who responded to the online questionnaire. Most of the respondents ranged from 20 to 55 years in age and they all came from Europe. It was noted that while many of them worked in the field of computer science, the sample contained very divergent respondent backgrounds (from computer scientists to humanities scholars and librarians). A small number of participants reported a considerable degree of disability in one or more of the cognitive, intellectual and visual categories. This sample of 45 DL users was also characterised by multilingualism, a high level of education, considerable experience of computing and the Internet, as well as relatively high experience of DLs.

The questionnaire results indicated that the users frequently accessed a digital library and thus were generally aware of the strengths and weaknesses of current digital systems. In addition, as far as the type of access used for data retrieval was concerned, the vast majority could be typified as public or free access, indicating that most users were unwilling to pay much for retrieving data and knowledge from a digital archive. Moreover, Web access was by far the most popular medium. Two thirds of the DLs identified in the completed questionnaires supported English as the only language of interaction. Slightly less than a quarter of all the DLs mentioned supported both English and some other local national language. Very few DLs identified by respondents offered multilingual support or just the relevant national language.

Data analysis regarding functional and non-functional requirements

The cluster analyzed the results in order to determine the needs and requirements concerning both the functionality of digital libraries and other non-functional characteristics related to interaction which would be important for user interface design. Toward this end, high- and low-importance requirements were identified for both DL stakeholders and end-users, in order to provide a basis for the development of a taxonomy of functionalities and interaction characteristics which will inform the design, implementation and evaluation of future digital libraries.

Stakeholders appeared to pay particular attention to functions for locating and organising resources, including functions for creating cross-reference links among similar resources, as well as functions for storing metadata about resources and checking for inconsistencies among the DL resources. Another interesting observation was that all accessibility requirements (i.e. for all kinds of disabilities) occupied a significant position in the list of high-priority requirements of DL stakeholders; whereas usability requirements reached only the list of medium-priority requirements, except the need for 'ease of use' of the DL. DL stakeholders also placed all kinds of functionalities related to the administration and management of registered DL users on the list of high-priority requirements. On the other hand, the group of requirements that appeared to be of lower value to DL stakeholders included most of the miscellaneous functional and non-functional requirements, as well as usability requirements related to novice users. Furthermore, History facilities and multilingual support also proved to be of relatively low value to these DL stakeholders.

DL end-users, just like stakeholders, paid a lot of attention to all types of DL facilities for locating useful information by subject. Nevertheless, they, in contrast to stakeholders, seemed to pay particular attention to certain miscellaneous non-functional requirements such as system performance, security, privacy, safety, and other ethical requirements. Printing and Up- or Downloading facilities were also assigned significant importance by DL users, followed by general usability requirements and accessibility for people with motor impairments (i.e. mobility and dexterity impairments). On the other hand, personalisation did not appear to assume great importance for DL end-users, and facilities for user-to-user communication and collaboration hardly proved of interest at all.

Overall, it appeared that DL stakeholders were striving for enriched functionality, whereas DL users paid more attention to the perceived behaviour and reliability of a DL.

Conclusions

Three issues surfaced in the analysis conducted. Firstly, while end-users view DLs as personalised environments where privacy is protected, stakeholders appear to view DLs as more collaborative environments. Secondly, the traditional "paper document" metaphor is still seen as prevailing, which may prove a challenge when it comes to a purely digital environment. Finally, there is a conceptual rift between the end-user and the stakeholder in respect of DL non-functional aspects.

The prioritization of requirements identified in this deliverable has the potential to provide a framework for DL user interface design. Toward this end, at least two future steps are planned. The first is a further extension of the study. The second is an investigation into the DL lifecycle, in order to gain an insight into how a digital library is expected to evolve as regards the interaction of users and stakeholders, as well as how the different phases of the lifecycle relate to both functional and non-functional requirements.

Author Details

Tiziana Catarci
Cluster Leader
User Interfaces and Visualization Cluster (UIV)
Università degli Studi di Roma "La Sapienza"
E-mail: catarci@dis.uniroma1.it

url: http://www.dis.uniroma1.it/~catarci/
Telephone: +39-06-4991 8331
Fax: +39-06-4991 8331

Stephen Kimani
University of Rome "La Sapienza"
DIS, Piano 2, Stanza 233
Via Salaria 113
00198 Rome
Italy

url: http://www.dis.uniroma1.it/~kimani/
E-mail: kimani@dis.uniroma1.it
Telephone: +39-06-49918548

DELOS Network of Excellence
Name of Cluster: User Interfaces and Visualization
Name of Leader: Tiziana Catarci
email: catarci@dis.uniroma1.it
phone: +39-06-4991 8331
fax: +39-06-4991 8331
url: http://www.dis.uniroma1.it/~catarci/

Knowledge Extraction and Semantic Interoperability

Martin Doerr provides us with an overview of a comprehensive report on Semantic Interoperability.

The DELOS WP5 cluster on Knowledge Extraction and Semantic Interoperability is finishing a comprehensive report on Semantic Interoperability in Digital Library Systems [1].

The Internet and more particularly the Web have been instrumental in making widely accessible a vast range of digital resources. However, the current state of affairs is such that the task of pulling together relevant information involves searching for individual bits and pieces of information gleaned from a range of sources and services and manually assembling them into a whole. This task becomes increasingly intractable with the rapid rate at which resources are becoming available online.

Interoperability is therefore a major issue that affects all types of digital information systems, but has gained prominence with the widespread adoption of the Web. As far as digital libraries are concerned, interoperability is becoming a paramount issue as the Internet unites digital library systems of differing types, run by separate organisations which are geographically distributed all over the world. Federated digital library systems, in the form of co-operating autonomous systems, are emerging in a bid to make distributed collections of heterogeneous resources appear as a single, virtually integrated collection.

The report defines interoperability very broadly as enabling any form of inter-system communication, or the ability of a system to make use of data from a previously unforeseen source. Interoperability in general is concerned with the capability of different information systems to communicate. This communication may take various forms such as the transfer, exchange, transformation, mediation, migration or integration of information.

Semantic interoperability ("SI") is characterised by the capability of different information systems to communicate information consistent with the intended meaning of the encoded information (as intended by the creators or maintainers of the information system). It involves:

The issue is addressed from the following perspectives:

SI issues are analyzed from a practical point of view for the following extended list of information life cycle elements that reveals the extraordinary relevance of SI in all aspects of Digital Libraries:

  1. Creation, modification
  2. Publication
  3. Acquisition, selection, storage, system and collection building
  4. Cataloguing (metadata, identification/naming, registration), indexing, knowledge organisation, knowledge representation, modelling
  5. Integration, brokering, linking, syntactic and semantic interoperability engineering
  6. Mediation (user interfaces, personalisation, reference, recommendation, transfer etc.)
  7. Access, search and discovery
  8. Use, shared application/collaboration, scholarly communication, annotation, evaluation, reuse, work environments
  9. Maintenance
  10. Archiving and preservation

From a theoretical point of view, the report distinguishes SI at three levels of abstraction:

  1. Data structures, be it for metadata, content data, collection management data, service description data.
  2. Categorical data, i.e. data that refer to universals, such as classification, typologies and general subjects.
  3. Factual data, i.e. data that refer to particulars, such as people, items, places.

It then shows how these levels relate to different problems, methods and systems for achieving SI. The report argues that interoperability is always achieved by a reasonable combination of adherence to common standards and methods for the dynamic interpretation of non-standardized content. The above levels of abstraction differ greatly in the scale of concepts or data to be integrated. Consequently, standards are more easily promoted for the upper level, whereas the lower levels have to be addressed more by automated, dynamic methods of integration. The report also tries to bridge some gaps between the different terminologies that are emerging in the library and computer science communities for the same concepts.

The analysis of enabling factors and technologies to enhance SI begins with the role of foundational and core ontologies. They are not only perceived as a means to improve contents and consistency of terminological systems, but particularly as a means to assist mediation between different data structures and the transition zone of data structures and upper-level terminology.

A central role is played by Knowledge Organisation Systems (KOS) and their use in networks (NKOS), which deserve a classification of their own owing to their large variety in size and in sophistication of intellectual analysis. KOS represent the shared agreement on concepts (categorical data) and important factual data, such as place names, very important people etc. Particular methods to enhance semantic interoperability include KOS transformation, correlation and mapping. Questions of availability and rights are also addressed.

An analysis of the role of architecture and infrastructures examines how communication protocols and central services can support global communication on standards and shared concepts, ranging from metadata registries at level one to gazetteer services at level three.

When implementation strategies for integrated services are discussed, standardization, mediation and data warehousing are frequently the subject of controversy, each being advocated as the best solution. The report instead sees these techniques as alternatives with different application characteristics. A chapter is therefore devoted to the pros and cons of these approaches, so that designers have better decision criteria at hand for their specific application.

Finally, some implications for a research agenda are discussed. At least some important areas for further R&D are identified:

Methodologies and tools for schema matching, mapping, and semantic data transformation, including graphical visualization methods to assist domain experts to formulate equivalences following their conceptualization as well as automated tools proposing schema matching to the expert.

Future issues for Thesaurus and KOS protocols include possible provision of more complex services, such as semantic expansion (beyond basic broader and narrower expansion), more advanced natural language functionality for identifying controlled terminology in free text (documents or query), cross-mapping provision (important for semantic interoperability) and possible data-dependent filters such as the number of postings associated with a concept.
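As a point of reference for the services listed above, the following Python sketch shows the basic broader/narrower expansion that the report treats as the baseline, combined with a data-dependent filter on the number of postings. The thesaurus, the posting counts and all function names are invented for illustration and do not describe any existing KOS protocol.

```python
# Hypothetical thesaurus: each concept maps to its broader concepts.
BROADER = {
    "sonata": ["instrumental music"],
    "symphony": ["instrumental music"],
    "instrumental music": ["music"],
    "opera": ["vocal music"],
    "vocal music": ["music"],
}

# Hypothetical number of documents (postings) indexed under each concept.
POSTINGS = {
    "music": 300,
    "instrumental music": 40,
    "vocal music": 0,
    "sonata": 12,
    "symphony": 0,
    "opera": 7,
}


def narrower_index(broader):
    """Invert the broader-term relation to obtain narrower terms."""
    narrower = {}
    for concept, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(concept)
    return narrower


def expand(concept, relation, max_depth=1):
    """Collect concepts reachable via `relation` within max_depth steps."""
    found, frontier = set(), {concept}
    for _ in range(max_depth):
        frontier = {n for c in frontier for n in relation.get(c, [])} - found
        found |= frontier
    return found


NARROWER = narrower_index(BROADER)


def expand_query(concept, min_postings=1, max_depth=2):
    """Expand a query concept to narrower terms, dropping empty concepts."""
    candidates = {concept} | expand(concept, NARROWER, max_depth)
    return sorted(c for c in candidates if POSTINGS.get(c, 0) >= min_postings)
```

Here `expand_query("music")` keeps only the concepts that actually have postings, which is the kind of filter a protocol could apply on the server side so that clients are not offered expansion terms that retrieve nothing.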

The vision of employing imprecise semantic equivalences between multiple KOS (as "switching languages" etc.) requires a revision of query languages and engines in order to dynamically control the resulting information loss.

Overall, methods and services are sought that lead to a convergence of global resources to higher states of semantic consistency, against the diverging forces of information isolation, update and local innovation.
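
The basic broader and narrower expansion that the KOS protocol item above takes as its baseline can be illustrated with a minimal Python sketch. The toy thesaurus, its terms and the function name `expand` are hypothetical and chosen only for illustration; they are not taken from the report or from any existing KOS protocol.

```python
from collections import deque

# Hypothetical toy thesaurus: each concept maps to its narrower concepts.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi"],
    "bicycle": [],
    "taxi": [],
}

# Derive the broader relation by inverting the narrower relation.
BROADER = {}
for parent, children in NARROWER.items():
    for child in children:
        BROADER.setdefault(child, []).append(parent)


def expand(term, relation, max_depth=1):
    """Return concepts reachable from `term` via `relation` within `max_depth` steps."""
    seen = {term}
    result = []
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for neighbour in relation.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.append(neighbour)
                queue.append((neighbour, depth + 1))
    return result
```

Calling `expand("vehicle", NARROWER)` gives the direct narrower concepts of the query term, while `expand("taxi", BROADER, max_depth=2)` walks two levels up. The more complex services mentioned above, such as cross-mapping or posting-count filters, would sit on top of this kind of traversal.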

References

  1. Available draft: Patel M., Koch T., Doerr M., Tsinaraki C., DELOS Deliverable D5.3.1: Semantic Interoperability in Digital Library Systems, February 2005

Author Details

Martin Doerr
Principal Researcher
Institute of Computer Science
The Foundation for Research and Technology - Hellas (FORTH)
Vassilika Vouton
P.O.Box 1385
GR 711 10 Heraklion, Crete
Greece

email: martinATics.forth.gr
Tel: +30 2810 391625
Fax: +30 2810 391638

DELOS Network of Excellence
Name of Cluster: Knowledge Extraction and Semantic Interoperability
Name of Leader: Elizabeth Lyon
email: e.lyon@ukoln.ac.uk
phone: +44 1225 386580
fax: +44 1225 386838
url: http://www.ukoln.ac.uk

Evaluation

Sarantos Kapidakis provides a summary of activity across the different tasks of the Evaluation cluster.

Task 1: Evaluation Forum

The Evaluation forum website (http://dlib.ionio.gr/wp7/) that brings together DL developers and evaluators was harmonized with the DELOS website and guidelines and further content was added. The forum is represented by two distinct virtual spaces, where the communication and collaboration of WP7 members takes place.

The first virtual space is the Evaluation cluster website, which hosts collections of existing evaluation approaches and testbeds. The collection of existing evaluation approaches (http://dlib.ionio.gr/wp7/literature.html), a list of publications on the evaluation of digital libraries, is available in two bibliographic formats (Harvard and BibTeX) for easy inclusion in publications. The collection of existing testbeds and toolkits (http://dlib.ionio.gr/wp7/testbeds.html) operates as a linking point to testbed collections for the evaluation of digital libraries or to the results of other research projects. To ensure a holistic view of the research area and awareness of previous work, these results are carefully selected and reflect diverse forms of research, methodologies, measurements and metrics.

Finally, the WP7 website is an area for publishing information to other research communities or to the public. Visitors can read about the aims of WP7, the work already completed, the partners and the events that are organized by the cluster.

The second virtual space implements the discussion forum, which serves as an area of communication among the members of the WP. A list of threads reflecting the general interests of the WP and the specific Tasks enables WP members to communicate in a centralized way.

In addition, a satellite website supporting the WP7 workshop on the evaluation of digital libraries was created (http://dlib.ionio.gr/wp7/workshop2004.html); it now contains the presentations from the workshop and the electronic proceedings.

Task 2: Evaluation Models and Methods

In our Workshop on DL evaluation, which took place in Padova on 4-5 October 2004, five keynote speakers covered major aspects of DL evaluation.

In addition, three representatives from DL-related Integrated Projects currently funded by the EU described their plans and expectations concerning DL evaluation. The workshop gave a very good analysis of the state of the art in digital library evaluation. During the final panel session, major issues for further research on DL evaluation were discussed. The presentations from the workshop together with the electronic proceedings are online at http://dlib.ionio.gr/wp7/workshop2004_program.html.

Task 3: INEX

In 2004, INEX consisted of three tracks: the ad-hoc retrieval track, the interactive track (iTrack) focusing on interactive retrieval, and the heterogeneous track dealing with heterogeneous collections. Retrieval effectiveness and efficiency are the evaluation criteria currently considered. Appropriate evaluation measures that consider structural relationships between different answers are still to be developed, as are usage-oriented measures for the interactive track.

Ad-hoc Retrieval Track

For the ad-hoc retrieval track, the following steps were performed: the creation and selection of topics, the submission of runs, the pooling of runs, and relevance assessments by the participating groups. Computation of results is currently underway.
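
Pooling is commonly done by taking the top-ranked results of every submitted run for a topic and merging them into a single duplicate-free set that the assessors then judge. The following Python sketch shows this idea under stated assumptions: the run names, result identifiers, pool depth and the function name `pool_runs` are all hypothetical, and the sketch does not claim to reproduce the procedure actually used in INEX.

```python
def pool_runs(runs, depth):
    """Merge the top `depth` results of every run into one duplicate-free pool."""
    pool = []
    seen = set()
    for ranked in runs.values():
        for doc_id in ranked[:depth]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool


# Hypothetical ranked results for a single topic, best result first.
runs = {
    "run_a": ["d3", "d1", "d7"],
    "run_b": ["d1", "d9", "d3"],
}

# Pool of results to be assessed when only the top 2 of each run are taken.
assessment_pool = pool_runs(runs, depth=2)
```

Only results in the pool are assessed for relevance, so the pool depth trades assessment effort against the risk of leaving relevant results unjudged.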

Interactive Track

In the interactive track, a base system was developed and tested, topics were selected based on the ad-hoc topics, questionnaires were created and guidelines were written. Each of the 10 participants ran the base system with at least 8 users and submitted their results and interaction logs along with the completed questionnaires. In addition, some participants ran their own interactive system with additional users. Evaluation of results and analysis of the interaction logs are underway.

Heterogeneous Track

For the heterogeneous track, six subcollections from different DLs were gathered. Topic selection guidelines were then created, topics were selected from the ad-hoc track, and new topics were also created. After the topics had been distributed and submission guidelines had been formulated, participating groups submitted their runs.

INEX Annual Workshop

The INEX annual workshop took place in Schloss Dagstuhl on December 6-8. Working notes for this workshop are available at http://inex.is.informatik.uni-duisburg.de:2004/pdf/INEX2004PreProceedings.pdf. The final proceedings will appear in the Springer series 'Lecture Notes in Computer Science'. At the workshop, participants presented their work, results were discussed and the evaluation campaign for 2005 was planned.

Task 4: CLEF

For the 2004 evaluation campaign, six different evaluation tracks were defined to assess different aspects of the MLIR (Multilingual Information Retrieval) paradigm.

CLEF activity included: setting up and managing the CLEF 2004 website; preparing and distributing Calls for Participation; preparing and extending the data collection (a Portuguese newspaper collection for 1994/95 has been added to the existing multilingual comparable corpus in nine European languages); preparing and distributing topics for the Ad Hoc and GIRT tracks; receiving and analysing the results. In addition, DELOS supported the overall coordination of the other six tracks - managed on a voluntary basis by research groups with expertise in the areas covered - and the organisation of the CLEF2004 workshop held in Bath, UK, 15-17 September (immediately following ECDL2004).

Of the 64 groups registered, 55 submitted results.

This represented an increase on the 42 groups in CLEF2003. 96 people attended the workshop. 15 European and 2 North American research groups collaborated in the organization under the overall coordination of ISTI-CNR, supported by DELOS. The test collections (consisting of data, queries and relevance assessments for a number of tasks) were expanded. The main multilingual comparable document collection now contains nearly 2 million news documents in ten European languages; new collections were added for cross-language image retrieval.

There was a shift of focus from textual document retrieval to information extraction and multimedia retrieval across languages. Evaluation methodologies were tested for the two new tracks (cross-language question answering and cross-language retrieval in image collections via a combination of text- and content-based methods). The aim has been to stimulate research towards next-generation CLIR (Cross-Language Information Retrieval) systems. A CLEF Steering Committee meeting was held in Bath on 15 September 2004.

Author Details

Sarantos Kapidakis
Laboratory on Digital Libraries and Electronic Publishing
Archive and Library Sciences Department
Ionian University
Platia Eleftherias, Palea Anaktora,
Corfu 49100, Greece

email: sarantos@ionio.gr
Tel: +30 26610 87413
Fax: +30 26610 87436

DELOS Network of Excellence
Name of Cluster: Evaluation
Name of Leader: Norbert Fuhr
email: fuhr@uni-duisburg.de
phone: +49-203-379-2524
fax: +49-203-379-2549
url: http://www.is.informatik.uni-duisburg.de/staff/index.html
