Teradata - Hashing Algorithm

Advertisements

A row is assigned to a particular AMP based on the primary index value. Teradata uses hashing algorithm to determine which AMP gets the row.

Following is a high level diagram on hashing algorithm.

Following are the steps to insert the data.

The client submits a query.
The parser receives the query and passes the PI value of the record to the hashing algorithm.
The hashing algorithm hashes the primary index value and returns a 32 bit number, called Row Hash.
The higher order bits of the row hash (first 16 bits) is used to identify the hash map entry. The hash map contains one AMP #. Hash map is an array of buckets which contains specific AMP #.
BYNET sends the data to the identified AMP.
AMP uses the 32 bit Row hash to locate the row within its disk.
If there is any record with same row hash, then it increments the uniqueness ID which is a 32 bit number. For new row hash, uniqueness ID is assigned as 1 and incremented whenever a record with same row hash is inserted.
The combination of Row hash and Uniqueness ID is called as Row ID.
Row ID prefixes each record in the disk.
Each table row in the AMP is logically sorted by their Row IDs.

How Tables are Stored

Tables are sorted by their Row ID (Row hash + uniqueness id) and then stored within the AMPs. Row ID is stored with each data row.

Row Hash	EmployeeNo	FirstName	LastName
2A01 2611	101	Mike	James
2A01 2612	104	Alex	Stuart
2A01 2613	102	Robert	Williams
2A01 2614	105	Robert	James
2A01 2615	103	Peter	Paul