This chapter discusses the various JOIN strategies available in Teradata.
Teradata uses different join methods to perform join operations. Some of the commonly used Join methods are −
Merge Join method takes place when the join is based on the equality condition. Merge Join requires the joining rows to be on the same AMP. Rows are joined based on their row hash. Merge Join uses different join strategies to bring the rows to the same AMP.
If the join columns are the primary indexes of the corresponding tables, then the joining rows are already on the same AMP. In this case, no distribution is required.
Consider the following Employee and Salary Tables.
CREATE SET TABLE EMPLOYEE,FALLBACK ( EmployeeNo INTEGER, FirstName VARCHAR(30) , LastName VARCHAR(30) , DOB DATE FORMAT 'YYYY-MM-DD', JoinedDate DATE FORMAT 'YYYY-MM-DD', DepartmentNo BYTEINT ) UNIQUE PRIMARY INDEX ( EmployeeNo );
CREATE SET TABLE Salary ( EmployeeNo INTEGER, Gross INTEGER, Deduction INTEGER, NetPay INTEGER ) UNIQUE PRIMARY INDEX(EmployeeNo);
When these two tables are joined on EmployeeNo column, then no redistribution takes place since EmployeeNo is the primary index of both the tables which are being joined.
Consider the following Employee and Department tables.
CREATE SET TABLE EMPLOYEE,FALLBACK ( EmployeeNo INTEGER, FirstName VARCHAR(30) , LastName VARCHAR(30) , DOB DATE FORMAT 'YYYY-MM-DD', JoinedDate DATE FORMAT 'YYYY-MM-DD', DepartmentNo BYTEINT ) UNIQUE PRIMARY INDEX ( EmployeeNo );
CREATE SET TABLE DEPARTMENT,FALLBACK ( DepartmentNo BYTEINT, DepartmentName CHAR(15) ) UNIQUE PRIMARY INDEX ( DepartmentNo );
If these two tables are joined on DeparmentNo column, then the rows need to be redistributed since DepartmentNo is a primary index in one table and non-primary index in another table. In this scenario, joining rows may not be on the same AMP. In such case, Teradata may redistribute employee table on DepartmentNo column.
For the above Employee and Department tables, Teradata may duplicate the Department table on all AMPs, if the size of Department table is small.
Nested Join doesn’t use all AMPs. For the Nested Join to take place, one of the condition should be equality on the unique primary index of one table and then joining this column to any index on the other table.
In this scenario, the system will fetch the one row using Unique Primary index of one table and use that row hash to fetch the matching records from other table. Nested join is the most efficient of all Join methods.
Product Join compares each qualifying row from one table with each qualifying row from other table. Product join may take place due to some of the following factors −