Domain Specific Parallel Crawler Architecture

Himanshu Verma


The World Wide Web is an interlinked collection of billions of documents formatted in HTML. Because the web is growing and changing constantly, traversing and processing every URL found in web documents has become a challenge, and parallelizing the crawling process has become essential. The crawl is parallelized across a set of crawler workers that download information from the web concurrently. This paper proposes a novel parallel crawler architecture based on domain-specific crawling. The approach aims to make crawling more effective and scalable and to share load among the crawlers, each of which downloads, in parallel, the web pages belonging to the URLs of a different domain.

Keywords: URL (uniform resource locator), URI (uniform resource identifier), crawl work

Himanshu Verma. Domain Specific Parallel Crawler Architecture. Recent Trends in Parallel Computing. 2016; 3(1): 17–21p.
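
The domain-based load sharing described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that "domain" means the last label of the host name (for example `edu` or `com`), and the names `domain_of`, `partition_by_domain`, `crawl_in_parallel` and the caller-supplied `fetch` function are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def domain_of(url):
    """Return the last label of the host name, or 'other' if there is none.

    Assumption: a URL's domain is its top-level label (e.g. 'edu').
    """
    host = urlsplit(url).hostname or ""
    label = host.rsplit(".", 1)[-1]
    return label if label else "other"


def partition_by_domain(urls):
    """Group URLs into one queue per domain, preserving input order."""
    queues = defaultdict(list)
    for url in urls:
        queues[domain_of(url)].append(url)
    return dict(queues)


def _crawl_queue(urls, fetch):
    """Process one domain's queue sequentially with the supplied fetch function."""
    return [fetch(url) for url in urls]


def crawl_in_parallel(queues, fetch):
    """Run one worker per domain queue and collect results keyed by domain."""
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        futures = {
            domain: pool.submit(_crawl_queue, urls, fetch)
            for domain, urls in queues.items()
        }
        return {domain: future.result() for domain, future in futures.items()}
```

Here each domain queue is handled by its own worker thread, so no two workers visit the same URL. A real crawler would pass a `fetch` function that performs the HTTP download; a stand-in such as `len` is enough to exercise the partitioning.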

