SQL

Using COALESCE() in Linq-To-SQL

Posted in CSharp, SQL on November 23rd, 2009 by Dave Andrews – 2 Comments

I had a situation today where I wanted to return a user’s name using Linq. Sure, I could have easily written some code for this, but for the sake of keeping my code simple and readable I wanted Linq to handle the building of the name.

Users Table

UserID FirstName LastName
1 David Andrews
2 Bob NULL

The Linq seemed easy:

1
2
3
4
5
6
7
8
9
10
public static string GetNameOfUser(int userID)
        {
            DatabaseDataContext dc = new DatabaseDataContext();
 
            var user = from u in dc.Users
                       where u.UserID == userID
                       select u.FirstName + " " + u.LastName;
 
            return (user.Count() == 0) ? "Unknown User" : user.First();
        }

However my results were not correct. Bob was left out because he did not have a last name value (it’s NULL in the database.) You will get results like this if you run that for each user (one at a time, but I put them in a table here.)

Results

Expected Value Received
David Andrews David Andrews
Bob <Empty String>

In SQL it is very common to use the COALESCE() function to handle problems with null values. But how can I do that in Linq? I didn’t want to write extra code to check for nulls and concatenate the values myself.

The answer is simple: ??. Yes, that’s right. Two question marks is the “coalesce” operator in Linq. It’s a good idea to wrap what you are testing in coalesce inside parenthesis as well.

What follows is my new function.

1
2
3
4
5
6
7
8
9
10
public static string GetNameOfUser(int userID)
        {
            DatabaseDataContext dc = new DatabaseDataContext();
 
            var user = from u in dc.Users
                       where u.UserID == userID
                       select (u.FirstName ?? "") + " " + (u.LastName ?? "");
 
            return (user.Count() == 0) ? "Unknown User" : user.First();
        }

As you can see, I’ve added ?? “” after both FirstName and LastName. This translates to:

coalesce(FirstName, ”)
coalesce(LastName, ”)

Which means that if FirstName is null you will get ”, which adds correctly in SQL to another string. Same goes for LastName.

Here are my new results:

Results

Expected Value Received
David Andrews David Andrews
Bob Bob

Join on Multiple Columns in Linq To SQL

Posted in CSharp, SQL on November 20th, 2009 by Dave Andrews – Be the first to comment

Today I had to create a Linq-To-SQL query which joined two tables. These tables had compound primary keys (table simplified for example):

Table Referrers

ServerID (PK) ReferrerID (PK) Value
1 1 My value 1
1 2 My value 2
1 3 My value 3

Table ReferrerInfo

ServerID (PK) ReferrerID (PK) Value
1 1 More info… 1
1 2 More info… 2
1 3 More Info… 3

So I wanted to join these together using a Linq-To-SQL query. The first thought that occured to me was to use a join:

1
2
3
var referrers = from r in Referrers
               join ri in ReferrerInfo on r.ServerConnectionID equals ri.ServerConnectionID ..... ??????
               select r;

So, apparently in C# you cannot do multiple columns in your join. I have to join on both ServerConnectionID and ReferrerID.

This page: http://www.onedotnetway.com/linq-to-sql-join-on-multiple-conditions/ explains one method to do so, but unfortunately that Linq query only works in VB.

So, I ended up writing a Linq-To-SQL that just uses the old style of joining, by using the where clause!

1
2
3
4
5
var referrers = from r in Referrers
                from ri in Referrer_Info
                where r.ServerConnectionID == ri.ServerConnectionID &&
                   r.ReferrerID == ri.ReferrerID
                select r;

This successfully executes my multi-column join.

Using Row_Number() to Enumerate and Partition Records in SQL Server

Posted in SQL on October 30th, 2009 by Dave Andrews – Be the first to comment

I had a situation recently where I had a table full of people records, where the people were divided into families. The business logic that needed to be followed was that I had to assign a “Twin Code” to each record. This meant that for each family in the database, if two or more members were born on the same day they should be treated as twins. The twins should be assigned a number enumerating them in order of birth. If the member was not a twin, they should just receive the twin code of 1.

Here’s an example table:

PersonID FamilyID FirstName LastName DateOfBirth
1 1 Joe Johnson 2000-10-23 13:00:00
2 1 Jim Johnson 2001-12-15 05:45:00
3 2 Karly Matthews 2000-05-20 04:00:00
4 2 Kacy Matthews 2000-05-20 04:02:00
5 2 Tom Matthews 2001-09-15 11:52:00

There are lots of ways to achieve the desired result, but the simplest is to just use a simple SELECT statement combined with the ROW_NUMBER() function with a couple parameters as to how to number the rows!

ROW_NUMBER() provides you with the number of a row in a given recordset, where you provide details on how to number the records. For example, if I just had to number the records above based solely upon the date of birth (ignoring families) then I would use this query:

1
2
3
4
5
6
7
8
9
10
11
SELECT
     [PersonID]
    ,[FamilyID]
    ,[FirstName]
    ,[LastName]
    ,[DateOfBirth]
    ,ROW_NUMBER() over (ORDER BY DateOfBirth) AS Number
FROM
	People
ORDER BY 
	PersonID

This just tells the ROW_NUMBER() function to order its numbering ascending by DateOfBirth. Notice that I apply an order myself later on in the query, which is different than the row_number() order. I would get these results:

PersonID FamilyID FirstName LastName DateOfBirth Number
1 1 Joe Johnson 2000-10-23 13:00:00 3
2 1 Jim Johnson 2001-12-15 05:45:00 5
3 2 Karly Matthews 2000-05-20 04:00:00 1
4 2 Kacy Matthews 2000-05-20 04:02:00 2
5 2 Tom Matthews 2001-09-15 11:52:00 4

The number field that is assigned to each record is in the order of DateOfBirth.

Ordering my numbering by DateOfBirth is just half of the picture. I also need to “group” the records by the FamilyID. This is where a clause in T-SQL that you might not be very familiar with comes into play: “PARTITION BY”. The PARTITION BY clause allows us to group the results within the call to ROW_NUMBER() without grouping them ourselves via a GROUP BY. It just tells the ROW_NUMBER what groupings to use when it does its counting.

Here is our final SQL statement, which achieves the business logic we wanted to implement.

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT 
       [PersonID]
      ,[FamilyID]
      ,[FirstName]
      ,[LastName]
      ,[DateOfBirth]
      ,ROW_NUMBER() over(PARTITION BY FamilyID, 
                         CONVERT(NVARCHAR(25), DateOfBirth, 111) 
                         ORDER BY DateOfBirth ASC) TwinCode
 
  FROM [People]
ORDER BY
	PersonID

In the ROW_NUMBER function above, I am doing several things. I’m grouping on FamilyID, and also grouping on a converted DateOfBirth. I convert the DateOfBirth to an nvarchar using the 111 conversion code, because that gets results like ‘2009/10/11′ and ‘2009/10/12′ which can easily be grouped by to achieve distinct dates.

Grouping on the Family, DateOfBirth, and then sorting by DateOfBirth ascending achieves the desired result for the ROW_NUMBER. Here are the results of the query:

PersonID FamilyID FirstName LastName DateOfBirth TwinCode
1 1 Joe Johnson 2000-10-23 13:00:00 1
2 1 Jim Johnson 2001-12-15 05:45:00 1
3 2 Karly Matthews 2000-05-20 04:00:00 1
4 2 Kacy Matthews 2000-05-20 04:02:00 2
5 2 Tom Matthews 2001-09-15 11:52:00 1

As you can see, the two people who qualify as twins above (Karly and Kacy) are enumerated correctly, with Karly receiving a 1 and Kacy receiving a 2. All the records that are not twins properly receive a 1.

SQL Server – Using Computed or Calculated Columns in Tables

Posted in SQL on October 21st, 2009 by Dave Andrews – Be the first to comment

You are probably used to writing views to access data in your tables when there is some sort of computation that must be made in a field. But did you know that you can have your tables make computations themselves, without running through a view?

This can be done with Computed (or Calculated) Columns. These columns are table-level expressions that can operate on the other fields in a given record.

Let’s create a table which uses computed columns. I am going to create a table called Programmers which allows me to store a programmer’s first name, last name, middle initial, and date of birth. The table will include two computed columns: one which combines the elements of the name into a FullName field, and a second column which tells me the programmer’s age. This will all be achieved directly in the table, without the use of a view.

First, let’s create the table. Here is the query to create all of the fields except the computed ones.

1
2
3
4
5
6
7
CREATE TABLE Programmers
(
	ProgrammerID INT IDENTITY(1,1) NOT NULL,
	FirstName NVARCHAR(30),
	LastName NVARCHAR(30),
	MiddleInit NCHAR(1),
	DateOfBirth DATETIME,

Now, let’s create our first computed column. The syntax is simple. Just begin with the name of the column, and then in parenthesis define the expression which will calculate the value of the column.

Let’s begin with the FullName calculation. Just add the LastName, a comma, FirstName, and MiddleInit, and then trim white space off the right to handle a missing initial.

1
2
3
        FullName AS (rtrim(coalesce(LastName, '') + ', ' +
            coalesce(FirstName, '') + ' ' + 
            coalesce(MiddleInit, ''))),

There we have it, our FullName calculation. Each field is encapsulated in coalesce to handle NULL values properly.

The last calculation will be the age. The age is simply the difference in years of the current date from the birth date.

1
2
	Age AS (datediff(year, DateOfBirth, getdate()))
)

We also add a following parenthesis to close out our “CREATE TABLE” statement.

Now lets insert some test data. I added a few records with some NULLs for good testing measure. All standard stuff here. Notice we are not inserting the FullName or Age values.

1
2
3
4
5
6
INSERT INTO Programmers(FirstName, LastName, MiddleInit, DateOfBirth)
     VALUES ('David', 'Andrews', 'C', '1984-09-20')
INSERT INTO Programmers(FirstName, LastName, MiddleInit, DateOfBirth) 
     VALUES ('Billy', 'Jenkins', NULL, '1990-01-20')
INSERT INTO Programmers(FirstName, LastName, MiddleInit, DateOfBirth) 
     VALUES ('Robert', 'Anderson', 'K', NULL)

Now lets test out our fields, using nothing more than a SELECT.

1
SELECT * FROM Programmers

We will get the following results:

ProgrammerID FirstName LastName MiddleInit DateOfBirth FullName Age
1 David Andrews C 1984-09-20 00:00:00.000 Andrews, David C 25
2 Billy Jenkins NULL 1990-01-20 00:00:00.000 Jenkins, Billy 19
3 Robert Anderson K NULL Anderson, Robert K NULL

I emphasized the calculated fields above. Our query did not calculate them, they were considered to be just a part of the table since they are calculated fields.

One thing to keep in mind about calculated fields is that they are difficult to modify. You have to DROP the field and then ADD it back with the same name. This can change the order of fields in your query if you use SELECT *. It can also affect any triggers you may have which rely on the fields being in a certain order.

Also keep in mind any overhead that calculated fields may produce. It’s a good idea to use them for absolutely basic, atomic information, such as what I presented above. Complex calculations could become taxing to your queries.

Understanding SQL Joins – The Left Join

Posted in SQL on October 21st, 2009 by Dave Andrews – Be the first to comment

SQL joins are a crucial part of anything more than the simplest of queries. Many programmers who do not fully comprehend SQL joins end up writing bloated software which will pull information from one table, store it, then run another query to get the information they want to join on. They then use code to process the join. It is much more efficient to have the SQL server handle the joining and processing of these records for you than to create custom code which ties tables together.

Let’s use these tables as an example.

AnimalTypes table

AnimalTypeID Name
1 Dog
2 Cat
3 Turtle
4 Ferret

Animals table

AnimalID AnimalTypeID Name Age
1 1 Dusty 5
2 3 Jonesey 2
3 2 Bonnie 1
4 3 Fiddler 3
5 1 Marci 1
6 5 Tails 2

Here we have a table called AnimalTypes which contains 4 types of animals, and a table called Animals which lists out animals and what type they are.

Let’s say we want to write a program which will display all the animals in our animals table, as well as the Name of the Type of the animal, not the ID of the type. If an animal has an unknown type, we want to display “UNKNOWN.” Here is the bad process that we want to avoid.

SELECT * FROM AnimalTypes
SELECT * FROM Animals

The bad program will then loop through all the animals and print out the Name field in the AnimalTypes results which corresponds to the given type ID. If one does not exist in the AnimalTypes results, the code could then print “UNKNOWN.”

It also might occur to you that you can write a query like this one, which would also be incorrect.

SELECT
      *
FROM
      Animals, AnimalTypes
WHERE
      Animals.AnimalTypeID = AnimalTypes.AnimalTypeID

This is the basis of an inner join. It can be written as an inner join query, but I wrote it in the manner above as an example. This query will return results, however you will only get animals who have a corresponding correct AnimalType. Our program wants to display all animals, regardless of whether or not they have a valid type selected. Using the tables from above, you would get these results:

AnimalID AnimalTypeID Name Age AnimalTypeID Name
1 1 Dusty 5 1 Dog
2 3 Jonesey 2 3 Turtle
3 2 Bonnie 1 2 Cat
4 3 Fiddler 3 3 Turtle
5 1 Marci 1 1 Dog

If you notice, the animal Tails is missing from the list. This is because Tails’ AnimalTypeID of 5 did not have a corresponding AnimalTypeID in the AnimalTypes table. In a query such as this, both values have to exist in both tables. This is not the case in our data, even though we want all animals to display.

What we need to use in this case is a left join or a left outer join. The left join will take each record from the left-hand side of the join, and tie it to records on the right-hand side. If the tie does not exist in the right-hand table, then the left-hand fields will still exist in the results, with NULL values for the right-hand.

Here is the query using a left join.

1
2
3
4
5
6
SELECT 
     * 
FROM 
     Animals
     LEFT JOIN AnimalTypes
          ON Animals.AnimalTypeID = AnimalTypes.AnimalTypeID

What you get here is every record from Animals, joined to the corresponding AnimalType record. In the case of Tails, we won’t get any AnimalType information.

AnimalID AnimalTypeID Name Age AnimalTypeID Name
1 1 Dusty 5 1 Dog
2 3 Jonesey 2 3 Turtle
3 2 Bonnie 1 2 Cat
4 3 Fiddler 3 3 Turtle
5 1 Marci 1 1 Dog
6 5 Tails 2 NULL NULL

A very common use of this join is in financial transactions, where you might be missing the category of the transaction but still want the amount included.

Conditional Sums in SQL Queries

Posted in SQL on September 29th, 2009 by Dave Andrews – 1 Comment

One thing that you may encounter frequently is a situation where you want to count the records in a table which are of a certain type. You want to do this as efficiently as possible, instead of running a new count query for each type. Let’s use the below table for reference.

Name Type Age
Buffy Dog 10
Dusty Dog 13
Clyde Dog 2
Bonnie Cat 1
Milky Ferret 1

We have above a table of animals. How can we determine the count of each type of animal? The simplest (and most inefficient) is to run a count() query for each type of animal. This would mean we have 3 queries to run, as there are 3 types of animals.

1
2
3
4
5
6
7
8
9
declare @numdogs int
declare @numcats int
declare @numferrets int
 
SELECT @numdogs = COUNT(*) FROM animals WHERE type = 'Dog'
SELECT @numcats = COUNT(*) FROM animals WHERE type = 'Cat'
SELECT @numferrets = COUNT(*) FROM animals WHERE type = 'Ferret'
 
SELECT @numdogs AS NumDogs, @numcats AS NumCats, @numferrets AS NumFerrets

We will get results from that query like this.

NumDogs NumCats NumFerrets
3 1 1

As you can imagine, this is terribly inefficient. A good answer in this case is to use a “conditional” sum in the SQL query. So rather than running COUNT(*) we will be running SUM(…). We will combine the SUM() call with a CASE statement, so that it counts correctly. This is the trick I hope to teach you. You can place CASE … END statements inside of aggregate functions.

Throw away all of our temporary variables, the 4 select statements involved, and let’s replace them with this single select.

1
2
3
4
5
6
SELECT 
	SUM(CASE WHEN type='Dog' then 1 else 0 end) AS NumDogs,
	SUM(CASE WHEN type='Cat' then 1 else 0 end) AS NumCats,
	SUM(CASE WHEN type='Ferret' then 1 else 0 end) AS NumFerrets
FROM 
	animals

We get these results:

NumDogs NumCats NumFerrets
3 1 1

Which is exactly what we got previously, but using a much more efficient method. The trick was to insert a CASE … END statement inside the SUM. The case will return a 1 when the type is the type we are looking for, and a 0 otherwise. The sum will then execute to add up all of those 0s and 1s. A 0 will not add anything, so we will effectively end up counting the values which match the type we are looking for.

Mobile Merge Replication combined with SSIS Packages

Posted in SQL, SSIS on September 11th, 2009 by Dave Andrews – 1 Comment

I have a project I’ve worked on recently which involved several database.

  1. Database 1 is the “master” database to which a web interface interacts.
  2. Database 2 is an “intermediate” database, which is set up for merge replication, to publish down to handheld clients running Windows Mobile.
  3. Database 3 is the database on the handhelds which subscribe to database 2.

Data can be entered in the web interface (and therefore directly to database 1) or it can be entered on the handheld, and replicated up to database 2. The handhelds can’t tie directle to database 1, because some transformations have to be applied.

To tie data back and forth between database 1 and 2, I have an SSIS package which runs every half hour to do the following:

  1. The package imports new data from the intermediate database.
  2. The package pushes any changes down from database 1 to database 2, to be replicated out to the handhelds.

Here is the problem I encountered. Any new record I put into the web interface would not make it down to the mobile handhelds, even after the SSIS package put the data into database 2 and the handheld synchronized. I would have to re-create the snapshot and then re-subscribe the handheld in order to see new records. That just won’t work!

After pulling my hair our for about an hour trying to figure out the problem, and a lot of web searching, I finally came across this amazing page about SQL replication which solved the problem immediately.

When populating a merge-replication published database using SSIS, DO NOT USE “FAST LOAD”. Fast load does not run the triggers on the tables which are being replicated, and those triggers have to run in order for replication to do its work. I just changed it from using “Fast Load” to using the normal method, and my replication worked like a charm!

Distance Between Two Latitude and Longitude Points in SQL Server

Posted in SQL on August 28th, 2009 by Dave Andrews – Be the first to comment

I had to create a process to search locations within a given radius recently. I found this page which describes the Haversine formula to calculate distances. I implemented this as a SQL Server Function to calculate the distance.

read more »