Reading Time: 3 minutes

Today was one of those days where I was tasked to install and configure a SQL Server at a customer’s data center somewhere in Scandinavia.

The scope was fairly straightforward but I was missing one important clue: collation.

Collation definition …in-short.

Collation is the mechanism that dictates the way things happen behind the JOIN, ORDER BY and comparison of characters in our database. So, it is quite important as you can imagine for the overall application behavior as well as the database engine and its intricacies.

Collation can be found on the instance level, on the database level and on the column level of a table. There is also the expression-level collation but I will not get into this one.

When a database developer designs the database, depending on the usage of the application and its audience, the columns of a table that hold characters will be using most of the time char or varchar data type. Now, there are times that because of the language and the bits needed to represent a character the Unicode comes to play and the column can become nchar or nvarchar. The Unicode conforms to international standards and can accommodate all (at least to my knowledge and experience) the written characters. With the introduction of SQL Server 2019 Unicode has changed quite a bit and UTF-8 was introduced mainly for backward compatibility. Check the link at the bottom for a detailed guide.

You can get a list of the supported collation by executing the following T-SQL statement:

SELECT Name, Description FROM fn_helpcollations();

Note: SQL Server in the past did not support Windows collations, I know, so there are some left-over SQL Server Collations (<80) that are still there for backward-compatibility but should not be used for new development.

To get a list of Windows Collation on your system, execute the following T-SQL statement:

SELECT * FROM sys.fn_helpcollations() WHERE [name] NOT LIKE N'SQL%';

Why you should care about collation

The initial collation configuration occurs at the time of SQL Server setup and configuration either from the wizard or from a command. At this moment another crucial configuration happens on a system database: tempdb.

Because #1

You should care about the collation because of quite a few reasons but one of the most important is the tempdb and how you interact with it. If your database has different collation than of the instance and consequently of the tempdb, then you could be in for a treat.

Conversions will happen all over the place and in some cases, it could be quite unpredictable and hinder the performance of the solution.

Because #2

It is very hard to change the instance level collation, without recreating many things along the way. It could mean a significant amount of down-time for a production system.

For Azure SQL Managed Instance the collation is defined at the creation and *cannot* be changed afterward. So beware of this and check your database collation before migrating or deploying an instance.

And now the tip! (duh…)

So, my trouble was the collation that I should configure the instance with. I remembered that I was handed a database backup that I was to restore after the installation and configuration of the SQL. That’s it!

I issued the following T-SQL statement:

RESTORE HEADERONLY FROM DISK = '<my path to the database bakcup file>'

The resultset holds quite a lot of information but I was only interested in one column only: Collation.

Based on that, I pushed through with the installation successfully.

Cheers!

Further reading:

  • You can check more on collation here.
  • A very detailed guide on the characters and the whole encoding can be found here.