InterviewSolution
1. What are the tools/languages to query Big Data?
Answer» Big Data can be queried with a variety of languages, which may be functional, dataflow, declarative, or imperative in style. Querying Big Data also involves certain challenges.
Various tools are available for querying Big Data, and the right choice depends on your infrastructure requirements. The tools/languages include HiveQL, Pig Latin, Scriptella, BigQuery, DB2 Big SQL, JAQL, etc. They follow two broad approaches. Tools such as Flume and Pig are based on an explicit processing pipeline. The other approach translates SQL into equivalent constructs on the Big Data platform; HiveQL, Drill, Impala, Dremel, etc. follow this approach.

From a user's perspective, the SQL-based approach is usually preferable: SQL is easy to follow and widely known, and query optimization is left to the tool/system. Its major limitation is the set of built-in operators, which is very limited. Dataflow languages such as Flume and Pig are designed to incorporate user-specified operators, so they are easily extensible. Their major limitation is that the processing pipeline has to be constructed explicitly.

Presto is a good example of an open-source distributed SQL query engine. It can run interactive analytical queries over various data stores. One feature worth mentioning is its ability to combine data from multiple stores in a single query, which allows you to perform analytics across the entire organization.
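The contrast between the SQL-based approach and the explicit-pipeline approach can be illustrated with a small sketch. It is only an analogy, not how Hive or Pig work internally: Python's built-in `sqlite3` module stands in for a SQL engine, and a plain Python generator stands in for a dataflow pipeline. The sample data, the function names, and the `keep` operator are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (username, bytes_transferred)
EVENTS = [("alice", 120), ("bob", 300), ("alice", 80), ("carol", 50), ("bob", 25)]


def total_bytes_sql(events):
    """Declarative style: describe the result and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (username TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    rows = conn.execute(
        "SELECT username, SUM(bytes) FROM events "
        "WHERE bytes >= 50 GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return rows


def total_bytes_pipeline(events, keep):
    """Dataflow style: every stage is spelled out by hand.

    `keep` is a user-specified operator (any Python callable), which is
    what makes this style easy to extend.
    """
    filtered = (e for e in events if keep(e))      # stage 1: filter
    totals = defaultdict(int)
    for username, nbytes in filtered:              # stage 2: group and sum
        totals[username] += nbytes
    return sorted(totals.items())                  # stage 3: order the output


if __name__ == "__main__":
    print(total_bytes_sql(EVENTS))
    print(total_bytes_pipeline(EVENTS, lambda e: e[1] >= 50))
```

Both functions compute the same totals. The SQL version is shorter and leaves the execution plan to the engine, while the pipeline version accepts an arbitrary filter function at the cost of writing each stage explicitly.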
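The idea of combining data from multiple stores in a single query can be sketched as follows. This does not use Presto itself: as an assumption for illustration, two separate in-memory SQLite databases play the role of two data stores, and SQLite's `ATTACH DATABASE` makes both addressable from one query, loosely similar to how Presto addresses tables by catalog and schema. The table names, columns, and data are hypothetical.

```python
import sqlite3


def revenue_by_region():
    """Join tables that live in two separate databases with one SQL query."""
    conn = sqlite3.connect(":memory:")                  # first store, named "main"
    conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, separate store

    # Hypothetical schema: orders in one store, customers in the other.
    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 40), (2, 15), (1, 10), (3, 5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "EU"), (2, "US"), (3, "US")],
    )

    # A single query spans both stores.
    rows = conn.execute(
        "SELECT c.region, SUM(o.amount) "
        "FROM main.orders AS o "
        "JOIN crm.customers AS c ON c.id = o.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(revenue_by_region())
```

In an actual Presto deployment the two sides of the join could be entirely different systems, each exposed through its own connector; the sketch only shows the shape of such a query.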