Abstract

This article presents Smart Crawler. As the deep web grows, there has been increased interest in techniques that help efficiently locate deep-web interfaces. Because of the large volume of web resources, achieving wide coverage and high efficiency at the same time is challenging. This project therefore proposes a two-stage framework, Smart Crawler, for efficiently harvesting deep-web interfaces. In the first stage, Smart Crawler performs site-based searching for center pages. In the second stage, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. To eliminate bias toward visiting a few highly relevant links in hidden web directories, the project designs a link-tree data structure that achieves wider coverage of a website. Experimental results on a set of representative domains show the agility and accuracy of the proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers. The deep (or hidden) web refers to content that lies behind searchable web interfaces and cannot be indexed by search engines. This content holds a vast amount of valuable information, and entities such as Infomine may be interested in building an index of the deep-web sources in a given domain (such as books).
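The sketch below illustrates the two ideas named in the abstract: a relevance score that ranks candidate sites by topic, and a link tree that spreads the in-site crawling budget across URL directories instead of following one deep branch. It is a minimal illustration only, not the authors' implementation; all names (score_site, LinkTree, TOPIC_KEYWORDS, the example URLs) and the simple keyword-based scoring are assumptions made for this sketch.

import heapq
from urllib.parse import urlparse

# Assumed example topic ("books", as mentioned in the abstract).
TOPIC_KEYWORDS = {"book", "author", "isbn", "publisher"}

def score_site(homepage_text: str) -> float:
    """Rank a candidate site by topic relevance of its homepage text
    (a simple keyword-hit ratio stands in for the paper's ranking)."""
    words = homepage_text.lower().split()
    hits = sum(1 for w in words if w.strip(".,") in TOPIC_KEYWORDS)
    return hits / max(len(words), 1)

class LinkTree:
    """Group in-site links by first URL path segment so the crawl
    visits every directory, reducing bias toward one deep branch."""
    def __init__(self):
        self.branches = {}  # first path segment -> unvisited URLs

    def add(self, url: str):
        segments = [s for s in urlparse(url).path.split("/") if s]
        branch = segments[0] if segments else ""
        self.branches.setdefault(branch, []).append(url)

    def next_urls(self, per_branch: int = 1):
        # Yield a few links from every branch for wider coverage.
        for urls in self.branches.values():
            for _ in range(min(per_branch, len(urls))):
                yield urls.pop()

def crawl(candidate_sites):
    """candidate_sites: list of (homepage_text, site_url, in_site_links)."""
    # Site ranking: most relevant sites first (max-heap via negated score).
    ranked = [(-score_site(text), url, links) for text, url, links in candidate_sites]
    heapq.heapify(ranked)
    while ranked:
        neg_score, site, links = heapq.heappop(ranked)
        tree = LinkTree()
        for link in links:
            tree.add(link)
        # Balanced in-site exploration via the link tree.
        for url in tree.next_urls(per_branch=2):
            print(f"visit {url}  (site score {-neg_score:.3f})")

if __name__ == "__main__":
    crawl([("Rare book search by author and ISBN", "https://example-books.test",
            ["https://example-books.test/search/advanced",
             "https://example-books.test/search/basic",
             "https://example-books.test/about/contact"])])

A real crawler would fetch pages and extract links at each step; the point of the sketch is the control flow, in which site-level ranking decides which site to enter first and the link tree decides which in-site links to visit next.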

Article Details

How to Cite
N. ZahiraJahan, & M. Kalaipriya. (2018). A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces. International Journal of Intellectual Advancements and Research in Engineering Computations, 6(2), 1497–1502. Retrieved from https://ijiarec.com/ijiarec/article/view/684