Understanding SQL Joins – The Left Join

SQL joins are a crucial part of anything more than the simplest of queries. Many programmers who do not fully comprehend SQL joins end up writing bloated software which will pull information from one table, store it, then run another query to get the information they want to join on. They then use code to process the join. It is much more efficient to have the SQL server handle the joining and processing of these records for you than to create custom code which ties tables together.




Let’s use these tables as an example.

AnimalTypes table

AnimalTypeID Name
1 Dog
2 Cat
3 Turtle
4 Ferret

Animals table

AnimalID AnimalTypeID Name Age
1 1 Dusty 5
2 3 Jonesey 2
3 2 Bonnie 1
4 3 Fiddler 3
5 1 Marci 1
6 5 Tails 2

Here we have a table called AnimalTypes which contains 4 types of animals, and a table called Animals which lists out animals and what type they are.

Let’s say we want to write a program which will display all the animals in our animals table, as well as the Name of the Type of the animal, not the ID of the type. If an animal has an unknown type, we want to display “UNKNOWN.” Here is the bad process that we want to avoid.

SELECT * FROM AnimalTypes
SELECT * FROM Animals

The bad program will then loop through all the animals and print out the Name field in the AnimalTypes results which corresponds to the given type ID. If one does not exist in the AnimalTypes results, the code could then print “UNKNOWN.”

It also might occur to you that you can write a query like this one, which would also be incorrect.

SELECT
      *
FROM
      Animals, AnimalTypes
WHERE
      Animals.AnimalTypeID = AnimalTypes.AnimalTypeID

This is the basis of an inner join. It can be written as an inner join query, but I wrote it in the manner above as an example. This query will return results, however you will only get animals who have a corresponding correct AnimalType. Our program wants to display all animals, regardless of whether or not they have a valid type selected. Using the tables from above, you would get these results:

AnimalID AnimalTypeID Name Age AnimalTypeID Name
1 1 Dusty 5 1 Dog
2 3 Jonesey 2 3 Turtle
3 2 Bonnie 1 2 Cat
4 3 Fiddler 3 3 Turtle
5 1 Marci 1 1 Dog

If you notice, the animal Tails is missing from the list. This is because Tails’ AnimalTypeID of 5 did not have a corresponding AnimalTypeID in the AnimalTypes table. In a query such as this, both values have to exist in both tables. This is not the case in our data, even though we want all animals to display.

What we need to use in this case is a left join or a left outer join. The left join will take each record from the left-hand side of the join, and tie it to records on the right-hand side. If the tie does not exist in the right-hand table, then the left-hand fields will still exist in the results, with NULL values for the right-hand.

Here is the query using a left join.

1
2
3
4
5
6
SELECT 
     * 
FROM 
     Animals
     LEFT JOIN AnimalTypes
          ON Animals.AnimalTypeID = AnimalTypes.AnimalTypeID

What you get here is every record from Animals, joined to the corresponding AnimalType record. In the case of Tails, we won’t get any AnimalType information.

AnimalID AnimalTypeID Name Age AnimalTypeID Name
1 1 Dusty 5 1 Dog
2 3 Jonesey 2 3 Turtle
3 2 Bonnie 1 2 Cat
4 3 Fiddler 3 3 Turtle
5 1 Marci 1 1 Dog
6 5 Tails 2 NULL NULL

A very common use of this join is in financial transactions, where you might be missing the category of the transaction but still want the amount included.


Follow Dave on Twitter

Leave a Reply