Approximate Similarity Search in Metric Spaces
by Giuseppe Amato
Compute Science Department - University of Dortmund, June
2002
Download:
|
Abstract:
There is an urgent need to improve the efficiency of
similarity queries. For this reason, this thesis investigates approximate
similarity search in the environment of metric spaces. Four different
approximation techniques are proposed, each of which obtain high performance at
the price of tolerable imprecision in the results. Measures are defined to
quantify the improvement of performance obtained and the quality of
approximations. The proposed techniques were tested on various synthetic and
real-life files. The results of the experiments confirm the hypothesis that high
quality approximate similarity search can be performed at a much lower cost than
exact similarity search. The approaches that we propose provide an improvement
of efficiency of up to two orders of magnitude, guaranteeing a good quality of
the approximation.
The most promising of the proposed techniques exploits the
measurement of the proximity of ball regions in metric spaces. The proximity of
two ball regions is defined as the probability that data objects are contained
in their intersection. This probability can be easily obtained in vector spaces
but is very difficult to measure in generic metric spaces, where only distance
distribution is available and data distribution cannot be used. Alternative
techniques, which can be used to estimate such probability in metric spaces, are
thus also proposed, discussed, and validated in the thesis.
|
Results of this thesis were also published in:
- Book:
- "Similarity
Search - The Metric Space Approach", P. Zezula, G. Amato, V. Dohnal,
M. Batko, Springer, Series: Advances in Database Systems, Vol. 32,
2006, XVIII, 220 p., Hardcover, ISBN: 0-387-29146-6
- Journals:
- "Approximate
Similarity Retrieval With M-Trees", P. Zezula, P. Savino, G. Amato, F.
Rabitti,
The
VLDB Journal The International Journal on Very Large Data Bases, published by
Springer-Verlag, Volume 7, N. 4, December 1998, pp. 275-293
-
“Region
Proximity in Metric Spaces and Its Use for Approximate Similarity Search”,
G. Amato, F. Rabitti, P. Savino, P. Zezula,
ACM Transaction on Information Systems
(TOIS), Vol 21, No. 2, April 2003, pp. 192-227.
- Conferences:
- "Estimating
Proximity of Metric Ball Regions for Multimedia Data Indexing",
Giuseppe Amato, Pasquale Savino, Fausto Rabitti,
Pavel Zezula.
International conference on Advances in Information
Systems, ADVIS 2000, Izmir, Turkey, 25-27 October
2000, LNCS 1909: pp. 71-81
- "Approximate
Similarity Search in Metric Data by Using Region Proximity",
Giuseppe Amato, Pasquale Savino, Fausto Rabitti,
Pavel Zezula,
DELOS Workshop: Information Seeking, Searching and Querying in Digital
Libraries, December 11-12 2000
|