Domain Specific Parallel Crawler Architecture

Himanshu Verma

Abstract

The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because of the growing and dynamic nature of the web, traversing all the URLs in web documents and processing them has become a challenge, so it has become essential to parallelize the crawling process. The crawling process is parallelized in the form of multiple crawler workers that download data from the web in parallel. This paper proposes a novel architecture for a parallel crawler based on domain-specific crawling. The architecture makes the crawling task more effective and scalable, and it shares the load among the different crawlers, each of which downloads in parallel the web pages related to the URLs of a specific domain.
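The load-sharing idea in the abstract can be illustrated with a minimal Python sketch. It is not the architecture from the paper: it assumes that "domain" means the last label of a URL's host name (for example edu or com), and the names domain_of, partition_by_domain, crawl_domain, parallel_crawl and fake_fetch are hypothetical. The download step is a stand-in that performs no network access.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the host name (e.g. 'edu').

    Assumption: this is what 'domain' means; the paper may define it differently.
    """
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def partition_by_domain(urls):
    """Group seed URLs so that each domain gets its own work queue."""
    queues = defaultdict(list)
    for url in urls:
        queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: download every URL in its own domain queue."""
    return domain, [fetch(url) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Submit one task per domain queue to a thread pool and collect results by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP download; returns a placeholder string."""
    return "<page for %s>" % url
```

Calling parallel_crawl(seed_urls, fake_fetch) returns a dictionary that maps each domain to the pages downloaded by its worker. A real crawler would replace fake_fetch with an HTTP client and feed newly extracted links back into the matching domain queue.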

Keywords: URL (uniform resource locator), URI (uniform resource identifier), crawl work

Cite this Article
Himanshu Verma. Domain Specific Parallel Crawler Architecture. Recent Trends in Parallel Computing. 2016; 3(1): 17–21p.
