Henri AVANCINI - Research
"verum ipsum factum"
-G.Vico
CASPAR project
How can digitally encoded information still be understood and used in the future,
when software, systems, and everyday knowledge have changed? This is the
challenge of CASPAR.
Digital information innervates modern civilization.
Yet digital information is extremely vulnerable. All over the world, huge amounts
of precious digital information become inaccessible every few years. Think of
losing official records, a museum archive, irreplaceable scientific data, or even
a collection of family photos, and it becomes clear that digital preservation
affects us all.
CASPAR Finding Manager at CNR Page
CASPAR Finding Manager at CNR wsdl
CASPAR Finding Manager at ESA Page
CASPAR Finding Manager at ESA wsdl
DILIGENT project
DILIGENT project contacts. DILIGENT Scientific Coordinator: Donatella
Castelli. DILIGENT Administrative Coordinator: Jessica Michel.
The DILIGENT test-bed will be built by integrating Grid and
Digital Library (DL) technologies.
Merging of these different technologies will lay the foundations for a
next generation e-Science knowledge infrastructure with many different
research and industrial applications.
The test-bed will be demonstrated and validated by two complementary
real-life application scenarios: one from the environmental
e-science domain and one from the cultural heritage domain.
The first user community is composed of representatives from leading
organisations that operate in the environmental sector; the second
consists of scholars distributed all over the world, working together
in a three-year project to merge the medical, humanities, social science
and communication research areas.
The DILIGENT infrastructure, which will build upon the efforts of
the EGEE project, is funded in part by the European Community’s
Information Society Technologies priority of the Sixth Framework
Programme.
You are invited to explore our website to learn more about DILIGENT,
the people behind the project and our objectives and activities.
Automated Web Page Categorization (AWPC)
Advisors: Analia Amandi & Fabrizio Sebastiani
- AWPC is a relatively young area: apart from early work in the
1960s [Maron 61], it was not widely used until the early 1990s.
Despite the research interest of recent years, AWPC still has open
problems, e.g. how user feedback can be used to improve categorization
effectiveness, and how to take advantage of both the hierarchical
structure and the hypertextual nature of Web pages.
- General text categorization techniques are applicable in
principle; however, they are usually unable to take advantage of the
two peculiarities mentioned above. A number of categorization techniques
that explicitly deal with either or both peculiarities have recently
emerged, such as gating networks and expert networks [Ruiz et al. 99],
hard and soft filters for hierarchical classification [Dumais et al.
00], and post-processing of the results of "flat" classification using
the hierarchical structure [Frommholz 01].
- To exploit the hierarchical structure of documents, I am working on:
- A modified version of Rocchio that allows a better
characterization of the concept underlying a category by using the
quasi-positive documents.
- Shrinkage estimators, a statistical technique for choosing
estimators with minimal MSE (mean squared error).
- JIRE, a Java framework for Information REtrieval.
- JIRE is being developed to aid the building of Information
Retrieval systems, which may also use Machine Learning techniques.
JIRE is an object-oriented framework used in diverse IR experiments
(an OO framework is essentially a reuse technique: a semi-complete
application that can be specialized to build concrete applications
[Johnson 97]). JIRE follows a pipeline model of processing. The input
data (corpus) is broken up into an internal representation. From that
representation it is possible to calculate thresholds (according to the
classification technique) and to build a model of the training data
(using Clustering, Naive Bayes, or a Linear Classifier). After these
steps, an evaluation process compares the model against test documents
and yields standard performance measures (Precision, Recall, F1).
- JIRE currently works with the New Reuters Corpus (vol. 1),
Reuters-21578, and other sources simply by implementing the method
AbstractDocument getDoc (see the documentation for details). It is also
possible to build models based on Rocchio, Modified Rocchio and
Clustering (other techniques can easily be added), and term indexing
(instead of classic text indexing) was recently incorporated. JIRE is
an ongoing project; to download the current version, send mail to
avancini@iei.pi.cnr.it.
- Installation:
1. Unpack the tar file into a $JIRE_HOME directory
2. See the instruction file in $JIRE_HOME/tmp/install
3. $JIRE_HOME/bin/ contains scripts to run the indexing, classification
and evaluation modules
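The pipeline described for JIRE above (internal representation, model, threshold, evaluation) can be illustrated with a minimal, self-contained sketch. This is not JIRE code: the class `RocchioPipelineSketch`, its method names, the toy documents, the threshold and the weights `beta = 16`, `gamma = 4` are all assumptions chosen for illustration. It uses the standard Rocchio prototype (not the modified version with quasi-positive documents) and plain term frequencies. It assumes Java 9 or later.

```java
import java.util.*;

/** Illustrative sketch only; all names are hypothetical and unrelated to the JIRE API. */
class RocchioPipelineSketch {

    /** Step 1: break raw text into a term-frequency vector (the internal representation). */
    static Map<String, Double> toVector(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1.0, Double::sum);
        }
        return tf;
    }

    /** Step 2: standard Rocchio prototype = beta * mean(positives) - gamma * mean(negatives). */
    static Map<String, Double> rocchio(List<Map<String, Double>> pos,
                                       List<Map<String, Double>> neg,
                                       double beta, double gamma) {
        Map<String, Double> proto = new HashMap<>();
        for (Map<String, Double> d : pos)
            d.forEach((t, w) -> proto.merge(t, beta * w / pos.size(), Double::sum));
        for (Map<String, Double> d : neg)
            d.forEach((t, w) -> proto.merge(t, -gamma * w / neg.size(), Double::sum));
        return proto;
    }

    /** Cosine similarity between a document vector and the category prototype. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Step 4: precision, recall and F1 from parallel arrays of predicted and true labels. */
    static double[] evaluate(boolean[] predicted, boolean[] actual) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            else if (predicted[i]) fp++;
            else if (actual[i]) fn++;
        }
        double p = tp + fp == 0 ? 0 : (double) tp / (tp + fp);
        double r = tp + fn == 0 ? 0 : (double) tp / (tp + fn);
        double f1 = p + r == 0 ? 0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        // Toy training data for a hypothetical "grain" category.
        List<Map<String, Double>> pos =
            List.of(toVector("wheat grain harvest"), toVector("grain export"));
        List<Map<String, Double>> neg = List.of(toVector("oil price rise"));
        Map<String, Double> model = rocchio(pos, neg, 16.0, 4.0);

        // Step 3: a fixed threshold; a real system would tune it on held-out data.
        double threshold = 0.1;
        String[] tests = {"grain harvest report", "oil market news", "wheat export"};
        boolean[] actual = {true, false, true};
        boolean[] predicted = new boolean[tests.length];
        for (int i = 0; i < tests.length; i++)
            predicted[i] = cosine(toVector(tests[i]), model) >= threshold;

        double[] m = evaluate(predicted, actual);
        System.out.printf("precision=%.2f recall=%.2f F1=%.2f%n", m[0], m[1], m[2]);
    }
}
```

Each stage is a separate static method, so one stage (for example the model builder) could be swapped for another technique without touching the rest, which is the point of a pipeline design.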
Cyclades project
CYCLADES
contacts: Umberto Straccia
Objectives
The main objective of CYCLADES
is to develop advanced Internet accessible mediator services to support
scholars both individually and as members of networked communities when
interacting with large interdisciplinary electronic (e-print) archives.
Such archives are important vehicles for the dissemination of
preliminary results and non-peer reviewed "grey literature". Most focus
on information dissemination within disciplinary or institutional
communities. However, scientific research is now oriented towards an
interdisciplinary approach. Scientists thus need to easily retrieve
information from diverse sources, and to communicate and collaborate
across traditional community boundaries. CYCLADES aims at supporting
the transition of e-print systems into genuine building blocks of a
transformed scholarly communication model by developing a set of
leading edge technologies providing innovative methods for information
access, dissemination, sharing and collaborative work.
Description of the work
The proposed open archives
environment consists of two components: the archives and the services.
The implementation of the former will be carried out by the US partners
in the context of the Open Archives Initiative (OAI), which aims at
guaranteeing interoperability among e-print archives. The OAI has
established a set of simple but potentially powerful interoperability
specifications that facilitate the development of third-party services.
CYCLADES will base the development of the service environment on these
specifications. In particular, a core set of cross-archive value-added
services will be developed to constitute a federation of independent
but interoperable services. According to this approach, a service
provides a functionality and can either work independently or
communicate and collaborate with other services to offer a new
value-added service. The Service Environment will provide OAI-compliant
functionality.
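As a rough illustration of what OAI compliance means for a third-party service, the sketch below builds a harvesting request in the style of the OAI Protocol for Metadata Harvesting (OAI-PMH), which is assumed here to be the specification meant above. The verb `ListRecords` and the arguments `metadataPrefix` and `from` are part of that protocol; the class `OaiRequestSketch`, its method names and the archive URL are hypothetical and are not taken from CYCLADES. It assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not part of the CYCLADES code base. */
class OaiRequestSketch {

    /** Builds an OAI-PMH style request: baseUrl?verb=...&arg=value&... */
    static String buildRequest(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder sb = new StringBuilder(baseUrl).append("?verb=").append(enc(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            sb.append('&').append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** URL-encodes one query component as UTF-8. */
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the arguments in insertion order.
        Map<String, String> query = new LinkedHashMap<>();
        query.put("metadataPrefix", "oai_dc"); // unqualified Dublin Core
        query.put("from", "2002-01-01");       // only records changed since this date
        // Hypothetical archive endpoint, used for illustration only.
        System.out.println(buildRequest("http://archive.example.org/oai", "ListRecords", query));
    }
}
```

Because every compliant archive answers the same small set of requests, a service such as the ones planned in CYCLADES can harvest metadata from many archives with one client.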
Main Cyclades services
FraMaS: Framework for Multi-agent Systems based on Composition
Advisor: Analia Amandi
- Agent-based Software Engineering has been developed to satisfy
the requirements of Multi-agent Systems (MAS). MAS are essentially
distributed systems composed of a set of software agents that interact
in order to satisfy their objectives. Although MAS are applied to a
great variety of computer systems (hardware and/or software), they
share a number of characteristics and components, e.g. distributed
information, entities with partial knowledge, and no central control
system.
- Software agents (or intelligent agents, or simply agents) are
autonomous computational entities driven by objectives and inserted in
an environment, which they can perceive and in which they can act. By
autonomy we mean that the agent performs actions without explicit
instruction from the user or any other entity. Moreover, an agent
is autonomous if it persists over time, i.e. it does not terminate
when it finishes an action; instead, it continues perceiving its
environment, deciding what action to carry out next, etc.
- These and other characteristics of MAS make their development
a non-trivial task. It is therefore desirable to have tools to assist
with the different stages involved in the construction of MAS. However,
despite the research efforts in that area, there are no tools at
present.
- The reuse of previously constructed software components is a
common proposal in Software Engineering. The most common reuse
techniques, in
the Object Oriented paradigm, are the reuse of libraries of classes,
design
patterns, and frameworks. Frameworks are the skeleton of a set of
applications
that belong to a specific domain.
- In particular, we propose a framework for the Multi-agent
Systems domain called FraMaS, implemented in Java, which incorporates
the characteristics of MAS. The FraMaS structure is made up of
a set of Java classes and interfaces that model Multi-agent Systems by
means of a set of multi-agent environments. Each one of these
environments contains a set of agents that interact autonomously in
order to satisfy their objectives through the use of the services
provided by the environment.
- Each agent designed under FraMaS has a set of basic
actions covered by the advanced functionality. Actions that do not use
any kind of deliberation, learning, negotiation, etc. are basic agent
actions; e.g. a robot's move action is a basic action, whereas the
planning of the move is an advanced functionality. The idea behind the
design of each agent is to endow it with a set of wrappers over the
basic actions that can be added (and removed) dynamically.
- Agents are inserted in only one environment at any given time,
though they can move from one environment to another. Events that
occur in the agent's context (i.e. in other agents or the environment)
are perceived by sensors. The communication between agents is direct.
- The presented framework (FraMaS) is viable for the construction
of multi-agent systems, according to our experience with the design,
implementation and testing of the following software systems: a Meeting
Scheduling Agent, a Forklift agents' system, and an e-commerce agent
structure.
- FraMaS was developed using a bottom-up approach, i.e. from
applications to the design. This permits the inclusion of new
components by means of a series of implemented methods that simplify
the extension of the framework functionality. Therefore, other
applications built using the framework can make use of these components.
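The idea of basic actions covered by dynamically added and removed wrappers can be sketched as follows. This is not the FraMaS API: the names `Action`, `Wrapper`, `Agent`, `addWrapper` and `removeWrapper` are hypothetical, and the sketch shows only one plausible way (a decorator chain) to realise the design described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and not taken from FraMaS. */
class AgentWrapperSketch {

    /** A basic action: no deliberation, learning or negotiation. */
    interface Action {
        String perform(String target);
    }

    /** Advanced functionality that covers a basic action. */
    interface Wrapper {
        Action wrap(Action inner);
    }

    static class Agent {
        private final Action basic;
        private final List<Wrapper> wrappers = new ArrayList<>();

        Agent(Action basic) {
            this.basic = basic;
        }

        void addWrapper(Wrapper w) {
            wrappers.add(w);
        }

        void removeWrapper(Wrapper w) {
            wrappers.remove(w);
        }

        /** Rebuilds the chain on every call, so changes to the wrappers take effect at once. */
        String act(String target) {
            Action a = basic;
            for (Wrapper w : wrappers) {
                a = w.wrap(a); // wrappers added later end up outermost
            }
            return a.perform(target);
        }
    }

    public static void main(String[] args) {
        // Basic action: the robot simply moves.
        Action move = target -> "move to " + target;
        // Advanced functionality: plan the move before executing it.
        Wrapper planner = inner -> target -> "plan route to " + target + "; " + inner.perform(target);

        Agent robot = new Agent(move);
        System.out.println(robot.act("dock"));
        robot.addWrapper(planner);
        System.out.println(robot.act("dock"));
        robot.removeWrapper(planner);
        System.out.println(robot.act("dock"));
    }
}
```

Keeping the basic action untouched and composing the advanced behaviour around it means an agent can gain or lose capabilities such as planning at run time without changes to its basic actions.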
FraMaS thesis (Spanish version)
Publications
References
[Dumais et al. 00] Susan T. Dumais and Hao Chen. Hierarchical
classification of Web content. In Proceedings of SIGIR-00, 23rd ACM
International Conference on Research and Development in Information
Retrieval. Eds. Nicholas J. Belkin, Peter Ingwersen and Mun-Kew
Leong. ACM Press, New York, US, 2000, pp. 256-263.
[Frommholz 01] Ingo Frommholz. Categorizing Web documents in
hierarchical catalogues. In Proceedings of ECIR-01, 23rd European
Colloquium on Information Retrieval Research. Darmstadt, DE, 2001.
[Johnson 97] R. Johnson. Components, Frameworks, Patterns. In
Proceedings of the Symposium on Software Reusability, 1997, pp. 10-17.
[Maron 61] M.E. Maron. Automatic indexing: an experimental inquiry.
Journal of the Association for Computing Machinery, 8(3):404-417, 1961.
[Ruiz et al. 99] Miguel E. Ruiz and Padmini Srinivasan. Hierarchical
neural networks for text categorization. In Proceedings of SIGIR-99,
22nd ACM International Conference on Research and Development in
Information Retrieval. Eds. Marti A. Hearst, Fredric Gey and Richard
Tong. ACM Press, New York, US, 1999, pp. 281-282.
Send mail to avancini @ isti . cnr . it
with questions or comments about this web site.
©1995-2009 Henri AVANCINI