Monday, September 5, 2011

Netezza: Joining tables based on columns that may have null values


Scenario:


Table: COWS_ONE

CNUMBER
COW_TYPE
COW_CATEGORY

1
Holstein
Nepali

2
NULL
Angus

3
NULL
Nepali

4
Guernsey
NULL

5
NULL
NULL

Table: COWS_TWO

CNUMBER
COW_TYPE
COW_CATEGORY
DESCRIPTION

1
Holstein
Nepali
FIRST

2
NULL
Angus
SECOND

3
Guernsey
NULL
THIRD

4
NULL
NULL
FOURTH

These queries might not give the desired results:

Test select statement #1 (left join):

SELECT
C1.CNUMBER, C1.COW_TYPE, C1.COW_CATEGORY
, C2.CNUMBER, C2.COW_TYPE, C2.COW_CATEGORY, C2.DESCRIPTION
FROM
COWS_ONE C1
LEFT OUTER JOIN
COWS_TWO C2
ON    
       (C1.COW_TYPE = C2.COW_TYPE
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
ORDER BY 1   
;

CNUMBER
COW_TYPE
COW_CATEGORY
CNUMBER_1
COW_TYPE_1
COW_CATEGORY_1
DESCRIPTION

1
Holstein
Nepali
1
Holstein
Nepali
FIRST

2
NULL
Angus
NULL
NULL
NULL
NULL

3
NULL
Nepali
NULL
NULL
NULL
NULL

4
Guernsey
NULL
NULL
NULL
NULL
NULL

5
NULL
NULL
NULL
NULL
NULL
NULL


Test select statement #2 (inner join):

SELECT C1.CNUMBER, C1.COW_TYPE, C1.COW_CATEGORY
, C2.CNUMBER, C2.COW_TYPE, C2.COW_CATEGORY, C2.DESCRIPTION
FROM
COWS_ONE C1
INNER JOIN
COWS_TWO C2
ON    
       (C1.COW_TYPE = C2.COW_TYPE
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
ORDER BY 1   
;

CNUMBER
COW_TYPE
COW_CATEGORY
CNUMBER_1
COW_TYPE_1
COW_CATEGORY_1
DESCRIPTION

1
Holstein
Nepali
1
Holstein
Nepali
first


Try these solutions:

Solution select statement #1 (left join):

SELECT C1.CNUMBER, C1.COW_TYPE, C1.COW_CATEGORY
, C2.CNUMBER, C2.COW_TYPE, C2.COW_CATEGORY, C2.DESCRIPTION
FROM
COWS_ONE C1
LEFT OUTER JOIN
COWS_TWO C2
ON    
       (C1.COW_TYPE IS NOT NULL
       AND C1.COW_CATEGORY IS NOT NULL
       AND C1.COW_TYPE = C2.COW_TYPE
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_TYPE IS NULL
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_CATEGORY IS NULL
       AND C1.COW_TYPE = C2.COW_TYPE)          
ORDER BY 1   
;

CNUMBER
COW_TYPE
COW_CATEGORY
CNUMBER_1
COW_TYPE_1
COW_CATEGORY_1
DESCRIPTION

1
Holstein
Nepali
1
Holstein
Nepali
FIRST

2
NULL
Angus
2
NULL
Angus
SECOND

3
NULL
Nepali
1
Holstein
Nepali
FIRST

4
Guernsey
NULL
3
Guernsey
NULL
THIRD

5
NULL
NULL
NULL
NULL
NULL
NULL
      

Solution select statement #2 (inner join):

SELECT C1.CNUMBER, C1.COW_TYPE, C1.COW_CATEGORY
, C2.CNUMBER, C2.COW_TYPE, C2.COW_CATEGORY, C2.DESCRIPTION
FROM
COWS_ONE C1
INNER JOIN
COWS_TWO C2
ON    
       (C1.COW_TYPE IS NOT NULL
       AND C1.COW_CATEGORY IS NOT NULL
       AND C1.COW_TYPE = C2.COW_TYPE
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_TYPE IS NULL
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_CATEGORY IS NULL
       AND C1.COW_TYPE = C2.COW_TYPE)          
ORDER BY 1   
;

CNUMBER
COW_TYPE
COW_CATEGORY
CNUMBER_1
COW_TYPE_1
COW_CATEGORY_1
DESCRIPTION

1
Holstein
Nepali
1
Holstein
Nepali
FIRST

2
NULL
Angus
2
NULL
Angus
SECOND

3
NULL
Nepali
1
Holstein
Nepali
FIRST

4
Guernsey
NULL
3
Guernsey
NULL
THIRD

Solution select statement #3 (left join):


SELECT C1.CNUMBER, C1.COW_TYPE, C1.COW_CATEGORY
, C2.CNUMBER, C2.COW_TYPE, C2.COW_CATEGORY, C2.DESCRIPTION
FROM
COWS_ONE C1
INNER JOIN
COWS_TWO C2
ON    
       (C1.COW_TYPE IS NOT NULL AND C1.COW_CATEGORY IS NOT NULL
       AND C2.COW_TYPE IS NOT NULL AND C2.COW_CATEGORY IS NOT NULL
       AND C1.COW_TYPE = C2.COW_TYPE
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_TYPE IS NULL AND C2.COW_TYPE IS NULL
       AND C1.COW_CATEGORY = C2.COW_CATEGORY)
       OR
       (C1.COW_CATEGORY IS NULL AND C2.COW_CATEGORY IS NULL
       AND C1.COW_TYPE = C2.COW_TYPE)          
ORDER BY 1   
;


CNUMBER
COW_TYPE
COW_CATEGORY
CNUMBER_1
COW_TYPE_1
COW_CATEGORY_1
DESCRIPTION

1
Holstein
Nepali
1
Holstein
Nepali
FIRST

2
NULL
Angus
2
NULL
Angus
SECOND

4
Guernsey
NULL
3
Guernsey
NULL
THIRD


The fundamental logic to understand is that joining NULL values does not work.  Even though you have two NULL values Server does not treat these as the same value.  Internally a value of NULL is an unknown value and therefore Server does not equate an unknown value being equal to another unknown value.