What are the tools/languages to query Big Data?


Answer:

Various languages are available to query Big Data; they may be functional, dataflow, declarative, or imperative. Querying Big Data often involves certain challenges, for example:

Unstructured data
Latency
Fault tolerance

By 'unstructured data' we mean that the data, as well as the various data sources, do not follow any particular format or protocol.
By 'latency' we mean the time that processes such as MapReduce jobs take to produce a result.
By 'fault tolerance' we mean that the steps of the analysis can cope with partial failures, for example by rolling back to previous results.
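To make 'fault tolerance' more concrete, here is a minimal, single-process sketch of a MapReduce-style word count in which individual map tasks can fail. It assumes a simple recovery policy that is not taken from any particular framework: results of partitions that succeeded are kept, and only the failed partitions are re-executed, up to a retry limit. The function names and the use of `RuntimeError` as a "worker crashed" signal are hypothetical, chosen for illustration only.

```python
from collections import Counter


def map_partition(lines):
    """Map step: count the words in one partition of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_job(partitions, map_fn, max_retries=3):
    """Run map_fn on every partition, re-executing only the failed ones.

    Results of partitions that already succeeded are kept, so a partial
    failure does not force the whole job to start over.
    """
    results = {}                          # partition index -> map output
    pending = list(range(len(partitions)))
    for _attempt in range(max_retries + 1):
        failed = []
        for i in pending:
            try:
                results[i] = map_fn(partitions[i])
            except RuntimeError:          # hypothetical "worker crashed" signal
                failed.append(i)
        if not failed:
            break
        pending = failed                  # retry only what failed
    else:
        raise RuntimeError(
            f"partitions {pending} failed after {max_retries} retries"
        )
    # Reduce step: merge the per-partition word counts.
    total = Counter()
    for counts in results.values():
        total.update(counts)
    return total
```

For example, if `map_fn` raises `RuntimeError` on its first call for one partition, `run_job` re-runs just that partition and still returns the complete word count; only when a partition keeps failing past `max_retries` does the whole job give up.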

Various tools are available to query Big Data, and you have to choose one based on your infrastructure requirements. Some of the tools/languages used to query Big Data are HiveQL, Pig Latin, Scriptella, BigQuery, DB2 Big SQL, and JAQL.

Tools such as Flume and Pig are based on an explicit processing pipeline. The other approach is to translate SQL into equivalent constructs on the Big Data platform; HiveQL, Drill, Impala, and Dremel, for example, follow this approach.
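The difference between the two approaches can be sketched in Python. The first function spells out the pipeline stage by stage (filter, group, aggregate), roughly in the spirit of a Pig Latin script; the second states the same query in SQL and leaves the execution strategy to the engine. SQLite, through Python's built-in `sqlite3` module, stands in for a SQL-on-Big-Data engine such as Hive or Impala purely for illustration; the `clicks` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (user_name, page, seconds_on_page)
CLICKS = [
    ("alice", "home", 12),
    ("bob", "cart", 40),
    ("alice", "cart", 30),
    ("carol", "home", 5),
    ("bob", "home", 8),
]


def explicit_pipeline(rows, min_seconds=10):
    """Dataflow style: each stage of the pipeline is written out by the user."""
    # Stage 1: FILTER out short visits.
    kept = [r for r in rows if r[2] >= min_seconds]
    # Stage 2: GROUP the remaining rows by user.
    groups = defaultdict(list)
    for user_name, _page, seconds in kept:
        groups[user_name].append(seconds)
    # Stage 3: aggregate each group, then sort the output by user.
    return sorted((user_name, sum(secs)) for user_name, secs in groups.items())


def declarative_sql(rows, min_seconds=10):
    """SQL style: describe the result; the engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_name TEXT, page TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT user_name, SUM(seconds) FROM clicks "
        "WHERE seconds >= ? GROUP BY user_name ORDER BY user_name",
        (min_seconds,),
    ).fetchall()
    conn.close()
    return result
```

Both functions return the same rows. The point is where the "how" lives: in the caller's code for the explicit pipeline, and inside the engine for the SQL version.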

From a user's perspective, the second, SQL-based approach is usually preferable: SQL is easy to follow and widely known, and query optimization is left to the tool/system.
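As a small illustration of leaving optimization to the system, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` (via Python's built-in `sqlite3`) as a stand-in for a Big Data SQL engine. The query text never changes; once an index exists, the engine itself switches from a full table scan to an index lookup. The `orders` table, its data, and the index name are hypothetical. Hive, Impala, and Presto each have their own `EXPLAIN` output and optimizer, which differ in detail.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the engine's chosen plan as text (SQLite's EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is a human-readable description of one step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(1000)],
)

# The user only states WHAT to compute.
query = "SELECT SUM(amount) FROM orders WHERE customer = ?"

before = plan(conn, query, ("c7",))   # no index yet: the engine scans the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(conn, query, ("c7",))    # same query text: the engine now uses the index
total = conn.execute(query, ("c7",)).fetchone()[0]

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the shift from a scan to an index search is decided entirely by the engine.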

The major limitation of such SQL-based languages is their built-in operators, which are very limited. Dataflow languages such as Flume and Pig, on the other hand, are designed to incorporate user-specified operators.

Such languages are therefore easily extensible. Their major limitation is that the processing pipeline must be constructed explicitly.
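The extensibility of dataflow languages comes from treating each step as a pluggable operator. The Python sketch below imitates that idea; it is not Pig or Flume code. Stages are plain functions over a stream of records, so a user can drop in a custom operator (here a hypothetical `sessionize` step) next to generic filter and map stages. The limitation mentioned above is visible too: the user has to assemble the pipeline by hand, in the right order.

```python
from typing import Any, Callable, Iterable, Iterator

# A stage is any function that takes a stream of records and returns a stream.
Stage = Callable[[Iterable[Any]], Iterator[Any]]


def filter_stage(pred: Callable[[Any], bool]) -> Stage:
    """Generic operator: keep records for which pred is true."""
    def run(records: Iterable[Any]) -> Iterator[Any]:
        return (r for r in records if pred(r))
    return run


def map_stage(fn: Callable[[Any], Any]) -> Stage:
    """Generic operator: transform every record with fn."""
    def run(records: Iterable[Any]) -> Iterator[Any]:
        return (fn(r) for r in records)
    return run


def sessionize(gap: int) -> Stage:
    """User-specified operator (hypothetical): number each user's sessions.

    A new session starts when the time since that user's previous event
    exceeds `gap` seconds. Assumes events arrive in timestamp order per user.
    """
    def run(records: Iterable[dict]) -> Iterator[dict]:
        last_seen: dict = {}
        session: dict = {}
        for r in records:
            user, ts = r["user"], r["ts"]
            if user not in last_seen or ts - last_seen[user] > gap:
                session[user] = session.get(user, 0) + 1
            last_seen[user] = ts
            yield {**r, "session": session[user]}
    return run


def pipeline(*stages: Stage) -> Stage:
    """Chain the stages explicitly, in the order the user lists them."""
    def run(records: Iterable[Any]) -> Iterator[Any]:
        stream = iter(records)
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Hypothetical click events.
events = [
    {"user": "a", "ts": 0, "page": "home"},
    {"user": "a", "ts": 50, "page": "cart"},
    {"user": "b", "ts": 60, "page": "home"},
    {"user": "bot", "ts": 70, "page": "home"},
    {"user": "a", "ts": 500, "page": "home"},
]

job = pipeline(
    filter_stage(lambda r: r["user"] != "bot"),      # drop crawler traffic
    sessionize(gap=300),                             # custom operator
    map_stage(lambda r: (r["user"], r["session"], r["page"])),
)
```

Calling `list(job(events))` runs the three stages in sequence; user `a`'s last event starts a second session because it arrives more than 300 seconds after the previous one.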

Presto is a good example of a distributed SQL query engine; it is also open source. It can run interactive analytical queries over various data stores.

One feature of Presto worth mentioning is its ability to combine data from multiple stores in a single query, which allows you to perform analytics across the entire organization.
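Presto does this through connectors: each data source is exposed as a catalog, and a table is addressed as `catalog.schema.table`, so one `SELECT` can join, for example, a Hive table with a MySQL table. Setting up Presto is beyond a short example, so the sketch below borrows the same idea from SQLite's `ATTACH DATABASE`, where two separate databases are queried together using qualified table names. The stores, tables, and data are hypothetical, and this is only an analogy for Presto's federation, not Presto syntax or its execution model.

```python
import sqlite3

# Two independent "data stores" (hypothetical: web logs and a CRM).
conn = sqlite3.connect(":memory:")                  # main store: web logs
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, separate store

conn.execute("CREATE TABLE main.page_views (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO main.page_views VALUES (?, ?)",
    [(1, "home"), (1, "pricing"), (2, "home"), (3, "docs")],
)
conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "EU"), (2, "US"), (3, "EU")],
)

# One query that reads from both stores, analogous to a Presto query
# that joins tables from two different catalogs.
rows = conn.execute(
    """
    SELECT c.region, COUNT(*) AS views
    FROM main.page_views AS v
    JOIN crm.customers AS c ON c.id = v.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(rows)
```

The query joins page views from one store with customer regions from the other, which is the kind of cross-source analysis a federated engine makes possible without first copying the data into one place.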
