Reducing the Dimensional Dependence for Rank-based Similarity Search

Jeevan Arjun Shelke; Viresh Shivshankar Humnabadkar; Tushar Tinajirao Kale; Girish Gundopant Kulkarni; PS Kulkarni

Reducing the Dimensional Dependence for Rank-based Similarity Search

Jeevan Arjun Shelke, Viresh Shivshankar Humnabadkar, Tushar Tinajirao Kale, Girish Gundopant Kulkarni, PS Kulkarni

Abstract

In this paper, we try to introduce a data structure for k-NN search, the Rank Cover Tree (RCT). The pruning tests for RCT rely on the comparison of similarity values not on the other properties of the underlying space, such as the triangle inequality. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. Theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.

Cite this Article Jeevan Arjun Shelke, Tushar Tinajirao Kale, Viresh Shivshankar Humnabadkar et al. Reducing the Dimensional Dependence for Rank-Based Similarity Search. Journal of Advanced Database Management & Systems. 2015; 2(3): 20–25p.

Keywords

Nearest neighbor search, intrinsic dimensionality, rank-based search, data, numerical constraints

Full Text:

PDF

References

Guttman A. R-trees: A dynamic index structure for spatial searching. In Proc. Annu. Meeting. 1984; 14(2): 47–57p.

Ciaccia P, Patella M, Zezula P. M-tree: An efficient access method for similarity search in metric spaces. In Proc. 23rd Int. Conf. Very Large Data Bases. 1997; 426–435p.

Goyal N, Lifshits Y, Sch€utze H. Disorder inequality: A combinatorial approach to nearest neighbor search. In Proc. Intern. Conf. Web Search Web Data Mining. 2008; 25–32p.

Guha S, Rastogi R, Shim K. ROCK: A robust clustering algorithm for categorical attributes. Inf. Syst. 2000; 25(5): 345–366p.

Ert€oz L, Steinbach M, Kumar V. Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In Proc. 3rd SIAM Int. Conf. Data Mining. 2003; 1p.

Andoni A, Indyk P. Available at: www.mit.edu/andoni/LSH/.

de Vries T, Chawla S, Houle ME. Finding local anomalies in very high dimensional space. In Proc. IEEE Int. Conf. Data Mining. 2010; 128–137p.

Han J, Kamber M. Data Mining: concepts and techniques. San Francisco, CA, USA: Morgan Kaufmann; 2006.

Chan HTH, Gupta A, Maggs BM et al. On hierarchical routing in doubling metrics. In Proc. 15th Annu. ACMSIAM Symp. Discrete Algorithm. 2005; 762–771p.

Bozkaya T, Ozsoyoglu M. Indexing large metric spaces for similarity search queries. ACM Trans. Database Syst. 1999; 24(3): 361–404p.

Beygelzimer A, Kakade S, Langford J. Cover trees for nearest neighbor. In Proc. 23rd Int. Conf. Mach. Learn. 2006; 97–104p.

Breunig MM, Kriegel HP, Ng RT et al. LOF: Identifying density-based local outliers. SIGMOD Rec. 2000; 29(2): 93–104p.

Ch_avez E, Navarro G, Baeza-Yates R et al. Searching in metric spaces. ACM Comput. Surv. 2001; 33(3): 273–321p.

Beyer KS, Goldstein J, Ramakrishnan R et al. When is ”nearest neighbor” meaningful?” In Proc. 7th Int. Conf. Database Theory. 1999; 1540: 217–235p.

Asuncion A, Newman DJ. Available at: http://www.ics.uci.edu/~mlearn/MLRepository.html.

Chandola V, Banerjee A, Kumar V. Anomaly detection: A survey. ACM Comput. Surv. 2009; 41(3): 1–58p.

Gupta A, Krauthgamer R, Lee JR. Bounded geometries, fractals, and low-distortion embeddings. In Proc. 44th Annu. IEEE Symp. Foundations Comput. Sci. 2003; 534p.

Karger DR, Ruhl M. Finding nearest neighbors in growth restricted metrics. In Proc. 34th ACM Symp. Theory Comput. 2002; 741–750p.

Geusebroek JM, Burghouts GJ, Smeulders AWM. The Amsterdam library of object images. Int. J. Comput. Vis. 2005; 61(1): 103–112p.

Houle ME, Sakuma J. Fast approximate similarity search in extremely high-dimensional data sets. In Proc. 21st Intern. Conf. Data Eng. 2005; 619–630p.

Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory. 1967; 13(1): 21–27p.

Houle ME, Nett M. Rank cover trees for nearest neighbor search. In Proc. Int. Conf. Similarity Search Appl. 2013; 8199: 16–29p.

Han J, Kamber M. Data Mining: Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann; 2006.

Goodman JE, O’Rourke J. Handbook of discrete and computational geometry. Cleveland, OH, USA: CRC Press; 1997.

Indyk P, Motwani R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th ACM Symp. Theory Comput. 1998; 604–613p.

Houle ME. The relevant set correlation model for data clustering. Statist. Anal. Data Mining. 2008; 1(3): 157–176p.

Refbacks

There are currently no refbacks.

This site has been shifted to https://stmcomputers.stmjournals.com/

Username
Password
Remember me