
An Implementation of Extracting Data and Mining Extracted Data from Web Pages

Dhanashri Sandbhor, Archana Jadhav, Govind Kumar Dubey, Sonali Pawade, Rupesh Dhole

Abstract


The Web holds a wide range of information about any particular object, some of it relevant and some not; these pieces of information are called data records. Extracting the relevant information from web pages is therefore necessary. A web data extraction system extracts data from various web pages. Data on web pages is in an unstructured format, and the extraction process converts it into a structured format. This paper presents a web data extraction system, the stages of making a mashup, and the use of data mining for data clustering. A mashup is a process that provides functionality such as data retrieval, data source modeling, data cleaning/filtering, data integration, and data visualization. We use the “Xtractorz” system for data extraction and for the mashup of data records. Using data mining, we analyze data from different sources, and using text mining, we cluster all the information. We also use the extracted data to provide value-added services.
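As a rough illustration of the extraction, cleaning/filtering, and clustering stages named in the abstract, the sketch below turns an HTML table into structured records and groups them by word overlap. It is not the Xtractorz implementation: the names `RecordExtractor`, `extract_records`, `cluster_by_overlap`, and `SAMPLE_PAGE` are hypothetical, the input is assumed to be a simple table whose first row holds the field names, and greedy Jaccard grouping stands in for whatever text-mining method the paper actually uses.

```python
from html.parser import HTMLParser


class RecordExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Unstructured HTML -> list of dicts (first row gives the field names)."""
    parser = RecordExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    # Cleaning/filtering: drop rows whose cell count does not match the header.
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]


def cluster_by_overlap(texts, threshold=0.5):
    """Greedy single-pass clustering on Jaccard similarity of word sets."""
    clusters = []  # pairs of (representative word set, member indices)
    for i, text in enumerate(texts):
        words = set(text.lower().split())
        for rep, members in clusters:
            union = rep | words
            if union and len(rep & words) / len(union) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((words, [i]))
    return [members for _, members in clusters]


# Hypothetical sample page; the one-cell row plays the role of a non-relevant record.
SAMPLE_PAGE = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Red  cotton shirt</td><td>10</td></tr>
  <tr><td>Blue cotton shirt</td><td>12</td></tr>
  <tr><td>advertisement</td></tr>
  <tr><td>Steel water bottle</td><td>7</td></tr>
</table>
"""

records = extract_records(SAMPLE_PAGE)
groups = cluster_by_overlap([r["title"] for r in records])
print(records)
print(groups)
```

Here the malformed one-cell row is filtered out during cleaning, and the two shirt records fall into one group because their titles share two of four distinct words. A real system would need a more tolerant parser and a proper similarity measure.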


Keywords


Web, data, mashup, data clustering



