
Data Extraction Using NLP for Unstructured Text Categorization

Waarengeye Varun Vikram

Abstract



In this research project, I crawled the web and files to create a dataset that can be fed to FRL (a fuzzy rough set-based semi-supervised learning algorithm). The approach relies on semi-supervised learning, which makes use of unlabeled data for training, typically a small amount of labeled data combined with a large amount of unlabeled data. We define and use various information extraction and web data mining techniques to tokenize the text, categorize its nouns, and find their context. These techniques are applications of Natural Language Processing (NLP) part-of-speech (POS) tagging. The dataset is then created by organizing these nouns and phrases, together with their context, in a tree-like structure, a step termed chunking in NLP. The nouns are extracted with an NLP toolkit library available in programming languages. Once the categorization is done, we pass the labeled data to FRL so that it can rank the items accordingly, which completes the text categorization.
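The chunking step described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the sample sentence, and the simple "determiner, adjectives, nouns" pattern are assumptions chosen for illustration, and the POS tags (Penn Treebank style) are written by hand here rather than produced by a toolkit tagger. The FRL ranking step is not shown.

```python
from typing import List, Tuple, Union

Tagged = Tuple[str, str]           # (word, POS tag), Penn Treebank style
Chunk = Tuple[str, List[Tagged]]   # ("NP", [tagged tokens in the phrase])


def chunk_noun_phrases(tagged: List[Tagged]) -> List[Union[Tagged, Chunk]]:
    """Group an optional determiner, any adjectives, and one or more nouns
    into an NP chunk; every other token is passed through unchanged."""
    result: List[Union[Tagged, Chunk]] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                          # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):    # adjectives
            j += 1
        noun_start = j
        while j < n and tagged[j][1].startswith("NN"):    # nouns
            j += 1
        if j > noun_start:
            # At least one noun was found, so emit the whole span as a chunk.
            result.append(("NP", tagged[i:j]))
            i = j
        else:
            # No noun follows, so keep the current token as it is.
            result.append(tagged[i])
            i += 1
    return result


def noun_phrases(tree: List[Union[Tagged, Chunk]]) -> List[str]:
    """Return the text of every NP chunk in the shallow tree."""
    return [
        " ".join(word for word, _ in node[1])
        for node in tree
        if isinstance(node[1], list)
    ]


# Hand-tagged sample sentence (hypothetical input, not data from the paper).
SAMPLE: List[Tagged] = [
    ("The", "DT"), ("crawler", "NN"), ("extracts", "VBZ"),
    ("unstructured", "JJ"), ("text", "NN"), ("from", "IN"),
    ("web", "NN"), ("pages", "NNS"),
]

if __name__ == "__main__":
    print(noun_phrases(chunk_noun_phrases(SAMPLE)))
```

In a real pipeline the tagged tokens would come from the toolkit's tokenizer and POS tagger, and the extracted noun phrases with their context would form the dataset handed to the learner.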

Keywords: Natural Language Processing, POS Tagging, Web Information Extraction, Rough Sets, Advanced Machine Learning, Semi-Supervised Learning, Data Categorization

Cite this Article
Waarengeye Varun Vikram. Data Extraction Using NLP for Unstructured Text Categorization. Journal of Software Engineering Tools & Technology Trends. 2017; 4(3): 30–39p.



