
A Novel Approach for Performance Analysis of Dynamic Information Integration

Vikash Kumar Garg, Ashish Oberoi, Manish Arora

Abstract

Data analysis and management were not a problem a few years ago, because the amount of data being generated was not large enough to cause any complexity. In recent years, however, the amount of data being generated has increased exponentially. Management and analysis have therefore become a problem, since traditional database management systems were not designed to handle such large amounts of data. Relational database management systems (RDBMS) can still handle large data sets, but doing so increases complexity and makes the task difficult. Hadoop offers a solution to this problem. Hive, an open source data warehouse built on the Hadoop framework, provides a way to handle large data sets. It provides an SQL-like language called Hive Query Language (HQL) for querying and processing large sets of data. An RDBMS takes more time when queries are applied to convert data in rows to columns, i.e., horizontal to vertical. This limitation of the RDBMS is addressed by combining user-defined functions (UDFs) with HQL in Hive. With this approach, many calculations that fall outside the scope of the built-in operations and functions, such as querying many columns, combining several column values into one, and transformations that take more time in an RDBMS, can be handled easily. In the proposed work, seven different data sets taken from the web are used for the experiments. Aggregation queries are run on these data sets using an RDBMS and using Hive in combination with UDFs. The results show that the combination of UDFs with HQL performs better than the RDBMS when aggregation queries are run on horizontal data and when many columns are joined into one. Data sets of varying sizes have been analyzed first using an RDBMS, i.e., MySQL and MS SQL, and then using Hive. The comparisons carried out show the advantage of using Hive over an RDBMS.
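
The two transformations named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation and does not use Hive or any RDBMS; the function names (`unpivot`, `combine_columns`, `sum_by_metric`), the sample rows, and the reading of "horizontal to vertical" as a wide-to-long reshaping are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_columns(row, columns, sep="|"):
    """Join several column values of one row into a single string.

    Hypothetical stand-in for the kind of logic a user-defined function
    might apply to each row.
    """
    return sep.join(str(row[c]) for c in columns)


def unpivot(rows, id_column, value_columns):
    """Reshape wide (horizontal) rows into long (vertical) rows.

    Each input row yields one output row per value column.
    """
    return [
        {id_column: row[id_column], "metric": col, "value": row[col]}
        for row in rows
        for col in value_columns
    ]


def sum_by_metric(long_rows):
    """Aggregate the long table: total value per metric."""
    totals = defaultdict(int)
    for r in long_rows:
        totals[r["metric"]] += r["value"]
    return dict(totals)


# Hypothetical sample data, not taken from the paper's data sets.
wide_rows = [
    {"region": "north", "q1": 10, "q2": 20, "q3": 30},
    {"region": "south", "q1": 5, "q2": 15, "q3": 25},
]
quarters = ["q1", "q2", "q3"]

long_rows = unpivot(wide_rows, "region", quarters)
totals = sum_by_metric(long_rows)
combined = [combine_columns(r, quarters) for r in wide_rows]
```

In this sketch the reshaping and the column concatenation are ordinary per-row functions, which is the role a UDF would play when called from a query; the aggregation then runs over the reshaped rows.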

Keywords: MySQL, MS SQL, Hadoop, Hive, UDF, HQL, HiveServer2, Dlimit, Processing Big Data

Cite this Article

Vikash Kumar Garg, Ashish Oberoi, Manish Arora. A Novel Approach for Performance Analysis of Dynamic Information Integration. Journal of Advances in Shell Programming. 2019; 6(1): 27–41p.


