The Impala GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.
Following is the syntax of the GROUP BY clause.
select data from table_name Group BY col_name;
Assume we have a table named customers in the database my_db and its contents are as follows −
[quickstart.cloudera:21000] > select * from customers; Query: select * from customers +----+----------+-----+-----------+--------+ | id | name | age | address | salary | +----+----------+-----+-----------+--------+ | 1 | Ramesh | 32 | Ahmedabad | 20000 | | 2 | Khilan | 25 | Delhi | 15000 | | 3 | kaushik | 23 | Kota | 30000 | | 4 | Chaitali | 25 | Mumbai | 35000 | | 5 | Hardik | 27 | Bhopal | 40000 | | 6 | Komal | 22 | MP | 32000 | +----+----------+-----+-----------+--------+ Fetched 6 row(s) in 0.51s
You can get the total amount of salary of each customer using GROUP BY query as shown below.
[quickstart.cloudera:21000] > Select name, sum(salary) from customers Group BY name;
On executing, the above query gives the following output.
Query: select name, sum(salary) from customers Group BY name +----------+-------------+ | name | sum(salary) | +----------+-------------+ | Ramesh | 20000 | | Komal | 32000 | | Hardik | 40000 | | Khilan | 15000 | | Chaitali | 35000 | | kaushik | 30000 | +----------+-------------+ Fetched 6 row(s) in 1.75s
Assume that this table has multiple records as shown below.
+----+----------+-----+-----------+--------+ | id | name | age | address | salary | +----+----------+-----+-----------+--------+ | 1 | Ramesh | 32 | Ahmedabad | 20000 | | 2 | Ramesh | 32 | Ahmedabad | 1000| | | 3 | Khilan | 25 | Delhi | 15000 | | 4 | kaushik | 23 | Kota | 30000 | | 5 | Chaitali | 25 | Mumbai | 35000 | | 6 | Chaitali | 25 | Mumbai | 2000 | | 7 | Hardik | 27 | Bhopal | 40000 | | 8 | Komal | 22 | MP | 32000 | +----+----------+-----+-----------+--------+
Now again, you can get the total amount of salaries of the employees, considering the repeated entries of records, using the Group By clause as shown below.
Select name, sum(salary) from customers Group BY name;
On executing, the above query gives the following output.
Query: select name, sum(salary) from customers Group BY name +----------+-------------+ | name | sum(salary) | +----------+-------------+ | Ramesh | 21000 | | Komal | 32000 | | Hardik | 40000 | | Khilan | 15000 | | Chaitali | 37000 | | kaushik | 30000 | +----------+-------------+ Fetched 6 row(s) in 1.75s