Henri AVANCINI - Research

"verum ipsum factum"
-G.Vico

CASPAR project

How can digitally encoded information still be understood and used in the future when the software, systems, and everyday knowledge will have changed? This is the challenge of CASPAR.

Digital information innervates modern civilization, yet it is extremely vulnerable. Every few years, a huge amount of precious digital information created and stored all over the world becomes inaccessible. Think of losing official records, a museum archive, irreplaceable scientific data, or even a collection of family photos, and it becomes clear that digital preservation affects us all.

CASPAR Finding Manager at CNR Page

CASPAR Finding Manager at CNR wsdl

CASPAR Finding Manager at ESA Page

CASPAR Finding Manager at ESA wsdl

DILIGENT project

DILIGENT project contacts. DILIGENT Scientific Coordinator: Donatella Castelli. DILIGENT Administrative Coordinator: Jessica Michel.

The DILIGENT test-bed will be built by integrating Grid and Digital Library (DL) technologies. Merging of these different technologies will lay the foundations for a next generation e-Science knowledge infrastructure with many different research and industrial applications.

The test-bed will be demonstrated and validated by two complementary real-life application scenarios: one from the environmental e-science domain and one from the cultural heritage domain. The first user community is composed of representatives from leading organisations that operate in the environmental sector; the second consists of scholars distributed all over the world, working together in a three-year project to merge the medical, humanities, social science and communication research areas.

The DILIGENT infrastructure, which will build upon the efforts of the EGEE project, is funded in part by the European Community’s Information Society Technologies priority of the Sixth Framework Programme.

You are invited to explore our website to learn more about DILIGENT, the people behind the project and our objectives and activities.

Automated Web Page Categorization (AWPC)

Advisor: Analia Amandi & Fabrizio Sebastiani

AWPC is a relatively young area: after the first works in the '60s [Maron 61], it was not widely employed until the beginning of the '90s. Despite the research interest of recent years, AWPC still has open problems, e.g. how user feedback can be used to improve categorization effectiveness, and how to take advantage of both the hierarchical structure and the hypertextual nature of Web pages.
General text categorization techniques are in principle applicable; however, they are usually unable to take advantage of the two peculiarities mentioned. A number of categorization techniques that explicitly deal with either or both peculiarities have recently emerged, such as gating networks and expert networks [Ruiz et al. 99], hard and soft filters for hierarchical classification [Dumais et al. 00], post-processing the results of "flat" classification using the hierarchical structure [Frommholz 01], etc.
To exploit the hierarchical structure of documents, I am working on:

A modified version of Rocchio that better characterizes the concept underlying a category by using the quasi-positive documents.
Shrinkage estimators, a statistical technique for choosing estimators having minimal MSE (mean squared error).
JIRE, a Java framework for Information REtrieval.
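
As a point of reference for the first item, the standard Rocchio method builds a category prototype as the weighted centroid of the positive training documents minus the weighted centroid of the negative ones. The Java sketch below shows only that standard formulation; how the modified version weights quasi-positive documents is not described here, so it is left out. The class name, the method names and the beta/gamma parameters are illustrative assumptions and are not part of JIRE.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Standard Rocchio prototype over sparse term-weight vectors (illustrative only). */
class RocchioSketch {

    /** Adds scale * centroid(docs) into target; does nothing for an empty list. */
    static void addCentroid(Map<String, Double> target,
                            List<Map<String, Double>> docs, double scale) {
        if (docs.isEmpty()) return;
        for (Map<String, Double> doc : docs) {
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                target.merge(e.getKey(), scale * e.getValue() / docs.size(), Double::sum);
            }
        }
    }

    /** prototype = beta * centroid(positives) - gamma * centroid(negatives), clipped at 0. */
    static Map<String, Double> prototype(List<Map<String, Double>> positives,
                                         List<Map<String, Double>> negatives,
                                         double beta, double gamma) {
        Map<String, Double> p = new HashMap<>();
        addCentroid(p, positives, beta);
        addCentroid(p, negatives, -gamma);
        // Negative term weights are conventionally set to zero.
        p.replaceAll((term, w) -> Math.max(0.0, w));
        return p;
    }

    /** Cosine similarity between a document vector and a prototype. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}
```

A document would then be assigned to a category when its cosine similarity to the prototype exceeds a threshold chosen on held-out data.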

JIRE is being developed to aid the building of Information Retrieval systems, which may also use Machine Learning techniques. JIRE is an object-oriented framework used in diverse IR experiments (an OO framework is essentially a reuse technique: a semi-complete application that can be specialized to construct concrete applications [Johnson 97]). JIRE follows a pipeline model of processing. Input data (the corpus) is broken up into an internal representation. From that representation it is possible to calculate thresholds (according to the classification technique) and build a model of the training data (using Clustering, Naive Bayes, or a Linear Classifier). After these steps, an evaluation process compares the data model against test documents to obtain standard performance measures (Precision, Recall, F1).
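
For readers unfamiliar with the measures named at the end of the pipeline, the following Java sketch computes Precision, Recall and F1 from the usual contingency counts (true positives, false positives, false negatives). It is not taken from JIRE: the class and method names are hypothetical, and returning 0.0 when a denominator is zero is one possible convention among several.

```java
/** Binary-classification effectiveness measures from contingency counts (illustrative). */
class EvaluationSketch {

    /** Fraction of documents assigned to the category that truly belong to it. */
    static double precision(int tp, int fp) {
        return (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
    }

    /** Fraction of documents belonging to the category that were assigned to it. */
    static double recall(int tp, int fn) {
        return (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall, written directly in terms of the counts. */
    static double f1(int tp, int fp, int fn) {
        int denom = 2 * tp + fp + fn;
        return (denom == 0) ? 0.0 : 2.0 * tp / denom;
    }
}
```

For example, with 8 true positives, 2 false positives and 8 false negatives, precision is 8/10, recall is 8/16 and F1 is 16/26.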
Currently, JIRE can work with the New Reuters Corpus (vol. 1), Reuters-21578, and other sources simply by implementing the method AbstractDocument getDoc (see documentation for details). It is also possible to construct models based on Rocchio, Modified Rocchio and Clustering (other techniques can be easily added); term indexing (instead of classic text indexing) was recently incorporated. JIRE is an ongoing project; to download the current version, send mail to avancini@iei.pi.cnr.it.

Installation:

Cyclades project

CYCLADES contacts: Umberto Straccia

Objectives

The main objective of CYCLADES is to develop advanced Internet accessible mediator services to support scholars both individually and as members of networked communities when interacting with large interdisciplinary electronic (e-print) archives. Such archives are important vehicles for the dissemination of preliminary results and non-peer reviewed "grey literature". Most focus on information dissemination within disciplinary or institutional communities. However, scientific research is now oriented towards an interdisciplinary approach. Scientists thus need to easily retrieve information from diverse sources, and to communicate and collaborate across traditional community boundaries. CYCLADES aims at supporting the transition of e-print systems into genuine building blocks of a transformed scholarly communication model by developing a set of leading edge technologies providing innovative methods for information access, dissemination, sharing and collaborative work.

Description of the work

The proposed open archives environment consists of two components: the archives and the services. The implementation of the former will be carried out by the US partners in the context of the Open Archives Initiative (OAI), which aims at guaranteeing interoperability among e-print archives. The OAI has established a set of simple but potentially powerful interoperability specifications that facilitate the development of third-party services. CYCLADES will base the development of the service environment on these specifications. In particular, a core set of cross-archive value-added services will be developed to constitute a federation of independent but interoperable services. According to this approach, a service provides a functionality and can either work independently or communicate and collaborate with other services to offer a new value-added service. The Service Environment will provide OAI-compliant functionality.

Main Cyclades services

Access: supports harvest-based information gathering, plus indexing and storage of gathered information in a local database.

Query and Browse: develops plans for the execution of user queries. An ad-hoc or profile-based user query is decomposed into simpler sub-queries that are sent to the Access service for execution. The results of the sub-queries are fused and returned to the user. A browse facility is also supported.

Collection: provides mechanisms for dynamically building meaningful collections.

Personalization: supports information personalization on the basis of single user profiles, and of an individual's behavior as a member of a community.

Recommendation: provides recommendations to satisfy information needs of a user by exploiting both user and community profiles.

Collaborative Work: supports collaboration between members of virtual communities. Community working areas are created to use the OAI content in collaborative work.
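
To make the fusion step of the Query and Browse service more concrete, here is a minimal Java sketch in which each sub-query returns a map from record identifier to relevance score, and the fused ranking is obtained by summing the scores a record received across sub-queries. The summing rule, the class name and the record identifiers are assumptions made for illustration; the fusion strategy actually used in CYCLADES is not described on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Fuses the scored results of several sub-queries into one ranking (illustrative). */
class FusionSketch {

    /** Returns record identifiers ordered by summed score, ties broken by identifier. */
    static List<String> fuse(List<Map<String, Double>> subQueryResults) {
        Map<String, Double> total = new HashMap<>();
        for (Map<String, Double> result : subQueryResults) {
            // Accumulate the score each record obtained from this sub-query.
            result.forEach((id, score) -> total.merge(id, score, Double::sum));
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(total.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> e : ranked) ids.add(e.getKey());
        return ids;
    }
}
```

A record retrieved by several sub-queries therefore tends to rank above one retrieved by a single sub-query, which is one simple way to reward agreement between sub-queries.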

FraMaS: Framework for Multi-agent Systems based on Composition

Advisor: Analia Amandi

Agent-based Software Engineering has been developed in order to satisfy the requirements of Multi-agent Systems (MAS). MAS are basically distributed and are composed of a set of software agents that interact in order to satisfy their objectives. Although MAS are applied to a great variety of different computer systems (hardware and/or software), they share a number of characteristics and components, e.g. distributed information, entities with partial knowledge, and no central control system.
Software agents (or intelligent agents or simply agents) are autonomous computational entities driven by objectives and inserted in an environment, which they can perceive and in which they can act. By autonomy we mean the set of actions performed by the agent without explicit indication from the user or from another entity. Moreover, an agent is autonomous if it persists over time, i.e. it does not complete its execution when finished with an action; in fact, it continues perceiving its environment, deciding what action to carry out next, etc.
These and other characteristics of MAS make their development a non-trivial task. It is therefore desirable to have tools to assist with the different stages involved in the construction of MAS. However, despite the research efforts in that area, there are no tools at present.
The reuse of previously constructed software components is a common proposal in Software Engineering. The most common reuse techniques, in the Object Oriented paradigm, are the reuse of libraries of classes, design patterns, and frameworks. Frameworks are the skeleton of a set of applications that belong to a specific domain.
In particular, we propose a framework for the Multi-agent Systems domain called FraMaS, implemented in Java, which incorporates the characteristics of MAS systems. The FraMaS structure is made up of a set of Java classes and interfaces that model Multi-agent Systems by means of a set of multi-agent environments. Each one of these environments contains a set of agents that interact autonomously in order to satisfy their objectives through the use of the services provided by the environment.
Each agent designed under FraMaS has a set of basic actions covered by the advanced functionality. Actions that do not use any kind of deliberation techniques, learning, negotiation, etc. are basic agent actions; e.g. robot move action is a basic action, whereas the planning of the move action is an advanced functionality. The idea behind the design of each agent is to endow the agent with a set of wrappers over the basic actions that can be added (and deleted) dynamically.
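The wrapper idea can be sketched in Java as a chain of decorators that is rebuilt on every call, so that wrappers can be added or removed while the agent is running. This is only an illustration of the design principle: the names Action, WrappedAgent, addWrapper and removeWrapper are hypothetical and do not come from the FraMaS code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** A basic action: it has an effect but involves no deliberation (hypothetical interface). */
interface Action {
    String perform(String target);
}

/** An agent whose basic action can be wrapped by advanced functionality at run time. */
class WrappedAgent {
    private final Action basic;
    private final List<UnaryOperator<Action>> wrappers = new ArrayList<>();

    WrappedAgent(Action basic) {
        this.basic = basic;
    }

    void addWrapper(UnaryOperator<Action> wrapper) {
        wrappers.add(wrapper);
    }

    boolean removeWrapper(UnaryOperator<Action> wrapper) {
        return wrappers.remove(wrapper);
    }

    /** Builds the chain in insertion order, so the last wrapper added is the outermost. */
    String act(String target) {
        Action current = basic;
        for (UnaryOperator<Action> wrapper : wrappers) {
            current = wrapper.apply(current);
        }
        return current.perform(target);
    }
}
```

For instance, a robot agent could start with a basic move action and later receive a planning wrapper:

```java
class WrappedAgentDemo {
    public static void main(String[] args) {
        WrappedAgent robot = new WrappedAgent(target -> "move to " + target);
        UnaryOperator<Action> planner = inner -> target -> "plan route; " + inner.perform(target);
        robot.addWrapper(planner);
        System.out.println(robot.act("dock"));
        robot.removeWrapper(planner);
        System.out.println(robot.act("dock"));
    }
}
```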
Agents are inserted in only one environment at any given time, though they can move from one environment to another. Events that occur in the agent's context (i.e. in other agents or the environment) are perceived by sensors. Communication between agents is direct.
The presented framework (FraMaS) is viable for the construction of multi-agent systems according to our experience with the design, implementation and testing of the following software systems: Meeting Scheduling Agent, Forklift agents' system, and e-commerce agent structure.
FraMaS was developed using a bottom-up approach, i.e. from applications to the design. This permits the inclusion of new components by means of a series of implemented methods that simplify the extension of the framework functionality. Therefore, other applications built using the framework can make use of these components.

FraMaS thesis (Spanish version)

Publications

See CV page.

References

[Dumais et al. 00] Susan T. Dumais and Hao Chen. Hierarchical classification of Web content. Proceedings of SIGIR-00, 23rd ACM International Conference on Research and Development in Information Retrieval. Eds. Nicholas J. Belkin and Peter Ingwersen and Mun-Kew Leong. ACM Press, New York, US. 2000. Pp. 256-263.
[Frommholz 01] Ingo Frommholz. Categorizing Web documents in hierarchical catalogues. In Proceedings of ECIR-01, 23rd European Colloquium on Information Retrieval Research (Darmstadt, DE, 2001).
[Johnson 97] Johnson, R. Components, Frameworks, Patterns. Proceedings of the Symposium on Software Reusability, pp.10-17, 1997.
[Maron 61] M.E. Maron. Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8(3):404-417, 1961.
[Ruiz et al. 99] Miguel E. Ruiz and Padmini Srinivasan. Hierarchical neural networks for text categorization. Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval. Eds. Marti A. Hearst and Fredric Gey and Richard Tong. ACM Press, New York, US. 1999. Pp. 281-282.
