O: Oracle Join Methods

Nested Loops Joins

In Nested loops join, for each row of one table rows from the other tables are fetched based on the join conditions used and the where criteria. The rows from the second table are generally fetched using a index on the join column for optimum performance. The first table is the driver table, and must preferably have fewer number of rows for the filter criteria given. The second and subsequent tables are generally large tables and rows from these tables are fetched based on index filter on join column.

SELECT  FIRST_NAME, LAST_NAME, EMAIL,PHONE_NUMBER , jh.START_DATE , jh.JOB_ID, jh.DEPARTMENT_ID
FROM  JOB_HISTORY jh, EMPLOYEES e
WHERE  jh.EMPLOYEE_ID = e.EMPLOYEE_ID;

Explain plan for the same is shown below

Plan hash value: 589903929
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                    				| Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |               |    10 |   700 |    13   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                               |               |       |       |            |          |
|   2 |   NESTED LOOPS                              |               |    10 |   700 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                        | JOB_HISTORY   |    10 |   250 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN                        | EMP_EMP_ID_PK |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID 				| EMPLOYEES     |     1	|    45 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("E"."EMPLOYEE_ID"="JH"."EMPLOYEE_ID")

In the above example the job_history table has about 10 records and the employees table has about 10000 records. So the optimizer does a FTS of the job history table and for each record fetched, it selects the necessary data from employees table for each employee Id using index unique scan.

Ensure statistics are gathered on both the tables before taking the plans.
To enforce nested loops join method for a particular query following hint can be used - USE_NL
In the above query it can be used as follows after the select /*+ USE_NL(jh e) */

SELECT /*+ USE_NL(jh e) */ FIRST_NAME, LAST_NAME ....

Nested loops joins are useful in the following scenarios
When the first few rows want to be fetched immediately for display purpose.
The rows in the second and subsequent tables have a large number of rows and low cardinality for the join condition.
The first table(driving table) gets a selective set of data for the WHERE clause.

Sort - Merge Joins

In a sort-merge join Oracle will sort the data in each data set based on the columns in the join condition.
It will then match rows in both the data sets on the join columns and merge the data from them.
Oracle uses Sort merge joins when the data in both the data sets is fairly large (greater than 5% of the blocks accessed) and lack of data selectivity renders Nested Loops Joins ineffective.

EXPLAIN PLAN FOR
SELECT FIRST_NAME, LAST_NAME, jh.START_DATE , jh.JOB_ID, jh.DEPARTMENT_ID
d.DEPARTMENT_ID = jh.DEPARTMENT_ID)
FROM EMPLOYEES e, JOB_HISTORY jh
WHERE jh.EMPLOYEE_ID = e.EMPLOYEE_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan hash value: 2391369632

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |    10 |   440 |     6  (34)| 00:00:01 |
|   1 |  MERGE JOIN                  |                   |    10 |   440 |     6  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| JOB_HISTORY       |    10 |   250 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | JHIST_EMPLOYEE_IX |    10 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT JOIN                  |                   |   107 |  2033 |     4  (50)| 00:00:01 |
|   5 |    VIEW                      | index$_join$_001  |   107 |  2033 |     3  (34)| 00:00:01 |
|*  6 |     HASH JOIN                |                   |       |       |            |          |
|   7 |      INDEX FAST FULL SCAN    | EMP_NAME_IX       |   107 |  2033 |     1   (0)| 00:00:01 |
|   8 |      INDEX FAST FULL SCAN    | EMP_EMP_ID_PK     |   107 |  2033 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("JH"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
       filter("JH"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")
   6 - access(ROWID=ROWID)

Hash Joins

In Hash join, oracle builds a Hash table of the join key for one table in memory. It then uses a hashing algorithm to fetch records from the other tables. Hash joins are useful only when the PGA_AGGREGATE_TARGET value is set to a large enough value. It is similar to Nested Loops in the sense that it loops through the Hash table in memory and for each Hash value it fetches data from the other tables.

EXPLAIN PLAN FOR
SELECT FIRST_NAME, LAST_NAME, d.DEPARTMENT_NAME
FROM EMPLOYEES e, DEPARTMENTS d
WHERE d.DEPARTMENT_ID = e.DEPARTMENT_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan for the Hash Join query

Plan hash value: 2052257371

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1000K|    32M|  3041   (1)| 00:00:37 |
|*  1 |  HASH JOIN         |             |  1000K|    32M|  3041   (1)| 00:00:37 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS |    27 |   432 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMPLOYEES   |  1000K|    17M|  3033   (1)| 00:00:37 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")

The Hash table preferably should be on the smaller table for optimum execution plan. If one of the tables is small, it can be the first table in the join and you can use the ORDERED hint to ensure the Hash table is made on the smaller table.
Hash joins work only on equi joins. Hash joins use memory resources for creating the Hash table, so if memory is not sufficient it can result in increase in i/o in the temporary tablespace resulting in performance impact and making the query execution slow.

Cartesian Joins

Cartesian Joins are used when one or more tables in the query do not have a join condition with the other tables. The sql engine joins every row in one table with every other row in the other table. The join output is the product of the number of rows in each table.

EXPLAIN PLAN FOR
SELECT FIRST_NAME, LAST_NAME, d.DEPARTMENT_NAME
FROM EMPLOYEES e, DEPARTMENTS d
WHERE d.DEPARTMENT_ID = e.DEPARTMENT_ID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan for the Hash Join query

Plan hash value: 2052257371

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1000K|    32M|  3041   (1)| 00:00:37 |
|*  1 |  HASH JOIN         |             |  1000K|    32M|  3041   (1)| 00:00:37 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS |    27 |   432 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMPLOYEES   |  1000K|    17M|  3033   (1)| 00:00:37 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("D"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")

In the above example number of rows in employees table is 1M. and the number of rows in departments table is 27.
So the 27M is the product of the 2 counts.
Generally tables are always joined on key columns. Cartesian joins may be used in cases where for the records of one table all records from the other table need to be accessed, all other conditions being fulfilled.

Cluster Joins

Cluster joins occurs only when you join tables in a multi table cluster and the equal sign is used on the cluster key columns. Oracle looks up the rows in one table based on the cluster key and join conditions. For the selected rows it selects rows from the other tables in the cluster. The advantage is that it does not have to scan the second table, as the key columns for the table also reside in the same block. So rows from the second table are got by default.

Anti Joins
In earlier versions of Oracle prior to version 7.3 Oracle could not resolve sub queries for the NOT IN condition efficiently. For each row in the outer query the inner query would get processed. So if the outer query were to be processing a million records, the subquery would be executed a million times rendering the query very inefficient. In Anti joins, rows from the subquery are selected once, compared to the rows in the outer query and all the matching rows are discarded. The sub query cannot be a corelated sub query for Anti joins to work. The sub query must be a NOT IN sub query.
Following hints can used to enforce Anti Joins : UNNEST, HASH_AJ or MERGE_AJ.

Thought for the day

Pleasant speech multiplies friends

and a gracious tongue multiplies courtesies

Let those who are friendly with you be many

but let your advisers be one in a thousand

When you gain friends gain them through testing

and do not trust them hastily

Sirach 6:5-7

Navigation Bar

Saturday, September 3, 2016

Oracle Join Methods

Nested Loops Joins

Sort - Merge Joins

Hash Joins

Cartesian Joins

Cluster Joins

No comments:

Post a Comment