Pattern Discovery for Text Mining

Ravindra Changala, D Rajeswara Rao, Vidyullatha P, Annapurna Gummadi

Abstract


Text mining is the process of extracting interesting, non-trivial information and knowledge from unstructured text. The challenge is to find accurate features in text documents that help users find what they want. Many term-based methods address this problem, but they suffer from polysemy, where one word has multiple meanings, and synonymy, where multiple words share the same meaning. Phrase-based approaches were proposed later, since phrases may carry more semantic information than single terms. Their performance is limited, however, because phrases have inferior statistical properties to terms, occur with low frequency, and include large numbers of redundant and noisy phrases. To overcome these problems, pattern mining-based approaches have been proposed; they adopt the concept of closed sequential patterns and prune non-closed patterns. These approaches have improved effectiveness to some extent. The paradox is that pattern-based approaches are widely regarded as a significant alternative, yet they have delivered little improvement in effectiveness over term-based methods.
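As an illustration of the closed sequential patterns mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes each paragraph is already tokenized into an ordered list of terms, counts support as the number of paragraphs containing a pattern as an ordered (not necessarily contiguous) subsequence, and keeps only patterns that have no longer pattern with the same support. The function names, the `max_len` cap, and the toy data are hypothetical choices made for this example.

```python
def is_subsequence(pattern, sequence):
    """True if the terms of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(term in it for term in pattern)


def support(pattern, sequences):
    """Number of sequences (paragraphs) that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def frequent_sequential_patterns(sequences, min_sup, max_len=3):
    """Level-wise search: extend each frequent pattern by one term at the end.

    `max_len` is an assumed cap that keeps the sketch small; closedness
    below is therefore judged only among patterns up to this length.
    """
    terms = sorted({t for s in sequences for t in s})
    frequent = {}
    level = [(t,) for t in terms]
    for _ in range(max_len):
        next_level = []
        for candidate in level:
            sup = support(candidate, sequences)
            if sup >= min_sup:
                frequent[candidate] = sup
                next_level.extend(candidate + (t,) for t in terms)
        level = next_level
    return frequent


def closed_patterns(frequent):
    """Prune non-closed patterns: drop p if a longer pattern has equal support."""
    return {
        p: sup
        for p, sup in frequent.items()
        if not any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in frequent.items()
        )
    }


# Toy paragraphs (hypothetical data), each an ordered list of terms.
paragraphs = [
    ["text", "mining", "pattern"],
    ["text", "pattern"],
    ["text", "mining"],
]
frequent = frequent_sequential_patterns(paragraphs, min_sup=2)
closed = closed_patterns(frequent)
```

On this toy input, the single terms "mining" and "pattern" are pruned because the longer patterns ("text", "mining") and ("text", "pattern") appear in exactly the same paragraphs, which is the redundancy that pruning non-closed patterns is meant to remove.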

Keywords: Text mining, closed sequential patterns, pruned non-closed patterns, taxonomy discovery model, term-based, phrase-based

