Pro T-SQL Programmer's Guide (4th Edition), Miguel Cebollero, Apress

Page count: 727
Pro T-SQL Programmer’s Guide
Pro T-SQL Programmer’s Guide is your guide to making the best use of the
powerful Transact-SQL programming language that is built into Microsoft SQL
Server’s database engine. This edition is updated to cover the new, in-memory
features that are part of SQL Server 2014. Discussing new and existing features,
the book takes you on an expert guided tour of Transact-SQL functionality. Fully
functioning examples and downloadable source code bring technically accurate
and engaging treatment of Transact-SQL into your own hands. Step-by-step
explanations ensure clarity, and an advocacy of best practices will steer you down
the road to success.
Transact–SQL is the language developers and DBAs use to interact with SQL
Server. It’s used for everything from querying data, to writing stored procedures, to
managing the database. Support for in-memory stored procedures running queries
against in-memory tables is new in the language and gets coverage in this edition.
Also covered are must-know features such as window functions and data paging
that help in writing fast-performing database queries. Developers and DBAs alike
can benefit from the expressive power of T-SQL, and Pro T-SQL Programmer’s
Guide is your roadmap to success in applying this increasingly important database
language to everyday business and technical tasks.
ISBN 978-1-4842-0146-6
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
Contents at a Glance
About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Foundations of T-SQL
Chapter 2: Tools of the Trade
Chapter 3: Procedural Code
Chapter 4: User-Defined Functions
Chapter 5: Stored Procedures
Chapter 6: In-Memory Programming
Chapter 7: Triggers
Chapter 8: Encryption
Chapter 9: Common Table Expressions and Windowing Functions
Chapter 10: Data Types and Advanced Data Types
Chapter 11: Full-Text Search
Chapter 12: XML
Chapter 13: XQuery and XPath
Chapter 14: Catalog Views and Dynamic Management Views
Chapter 15: .NET Client Programming
Chapter 16: CLR Integration Programming
Chapter 17: Data Services
Chapter 18: Error Handling and Dynamic SQL
Chapter 19: Performance Tuning
Appendix A: Exercise Answers
Appendix B: XQuery Data Types
Appendix C: Glossary
Appendix D: SQLCMD Quick Reference
Index
In the mid-1990s, when Microsoft parted ways with Sybase in their conjoint development of SQL Server, it
was an entirely different product. When SQL Server 6.5 was released in 1996, it was starting to gain credibility
as an enterprise-class database server. It still had rough management tools, only core functionalities, and
some limitations that are forgotten today, like fixed-size devices and the inability to drop table columns. It
functioned as a rudimentary database server: storing and retrieving data for client applications. There was
already plenty for anyone new to the relational database world to learn. Newcomers had to understand
many concepts, such as foreign keys, stored procedures, triggers, and the dedicated language, T-SQL
(which could be a baffling experience—writing SELECT queries sometimes involves a lot of head-scratching).
Even when developers mastered all that, they still had to keep up with the additions Microsoft made to
the database engine with each new version. Some of the changes were not for the faint of heart, like .NET
database modules, support for XML and the XQuery language, and a full implementation of symmetric and
asymmetric encryption. These additions are today core components of SQL Server.
Because a relational database management system (RDBMS) like SQL Server is one of the most
important elements of the IT environment, you need to make the best of it, which implies a good
understanding of its more advanced features. We have designed this book with the goal of helping T-SQL
developers get the absolute most out of the development features and functionality in SQL Server 2014.
We cover all of what’s needed to master T-SQL development, from using management and development
tools to performance tuning. We hope you enjoy the book and that it helps you to become
a pro SQL Server 2014 developer.
Whom This Book Is For
This book is intended for SQL Server developers who need to port code from prior versions of SQL Server,
and those who want to get the most out of database development on the 2014 release. You should have a
working knowledge of SQL, preferably T-SQL on SQL Server 2005 or later, because most of the examples
in this book are written in T-SQL. The book covers some of the basics of T-SQL, including introductory
concepts like data domain and three-valued logic, but this isn’t a beginner’s book. We don’t discuss database
design, database architecture, normalization, and the most basic SQL constructs in any detail. Apress offers
a beginner’s guide to T-SQL 2012 that covers more basic SQL constructs.
We focus here on advanced SQL Server 2014 functionalities, and so we assume you have a basic
understanding of SQL statements like INSERT and SELECT. A working knowledge of C# and the .NET
Framework is also useful (but not required), because two chapters are dedicated to .NET client
programming and .NET database integration.
Some examples in the book are written in C#. When C# sample code is provided, it’s explained in detail,
so an in-depth knowledge of the .NET Framework class library isn’t required.
How This Book Is Structured
This book was written to address the needs of four types of readers:
•	 SQL developers who are coming from other platforms to SQL Server 2014
•	 SQL developers who are moving from prior versions of SQL Server to SQL Server 2014
•	 SQL developers who have a working knowledge of basic T-SQL programming and want to learn about advanced features
•	 Database administrators and non-developers who need a working knowledge of T-SQL functionality to effectively support SQL Server 2014 instances
For all types of readers, this book is designed to act as a tutorial that describes and demonstrates T-SQL
features with working examples, and as a reference for quickly locating details about specific features. The
following sections provide a chapter-by-chapter overview.
Chapter 1
Chapter 1 starts this book by putting SQL Server 2014’s implementation of T-SQL in context, including a
short history, a discussion of the basics, and an overview of T-SQL coding best practices.
Chapter 2
Chapter 2 gives an overview of the tools that are packaged with SQL Server and available to SQL Server
developers. Tools discussed include SQL Server Management Studio (SSMS), SQLCMD, SQL Server Data
Tools (SSDT), and SQL Proler, among others.
Chapter 3
Chapter 3 introduces T-SQL procedural code, including control-of-flow statements like IF...ELSE and
WHILE. This chapter also discusses CASE expressions and CASE-derived functions, and provides an
in-depth discussion of SQL three-valued logic.
Chapter 4
Chapter 4 discusses the various types of T-SQL user-defined functions available to encapsulate T-SQL logic
on the server. We talk about all forms of T-SQL–based user-defined functions, including scalar user-defined
functions, inline table-valued functions, and multistatement table-valued functions.
Chapter 5
Chapter 5 covers stored procedures, which allow you to create server-side T-SQL subroutines.
In addition to describing how to create and execute stored procedures on SQL Server, we also address an
issue that is thorny for some: why you might want to use stored procedures.
Chapter 6
Chapter 6 covers one of the latest features available in SQL Server 2014: In-Memory OLTP tables.
The In-Memory features provide the capability to dramatically increase the database performance of an
OLTP or data-warehouse instance. With the new features also come some limitations.
Chapter 7
Chapter 7 introduces all three types of SQL Server triggers: classic DML triggers, which fire in response
to DML statements; DDL triggers, which fire in response to server and database DDL events; and logon
triggers, which fire in response to server LOGON events.
Chapter 8
Chapter 8 discusses SQL Server encryption, including the column-level encryption functionality introduced
in SQL Server 2005 and the newer transparent data encryption (TDE) and extensible key management
(EKM) functionality, both introduced in SQL Server 2008.
Chapter 9
Chapter 9 dives into the details of common table expressions (CTEs) and windowing functions in
SQL Server 2014, which feature some improvements to the OVER clause to achieve row-level running and
sliding aggregations.
Chapter 10
Chapter 10 discusses T-SQL data types: first some important things to know about basic data types, such
as how to handle date and time in your code, and then advanced data types and features, such as the
hierarchyid complex type and FILESTREAM and filetable functionality.
Chapter 11
Chapter 11 covers the full-text search (FTS) feature and advancements made since SQL Server 2008,
including greater integration with the SQL Server query engine and greater transparency by way of
FTS-specic data-management views and functions.
Chapter 12
Chapter 12 provides an in-depth discussion of SQL Server 2014 XML functionality, which carries forward
and improves on the features introduced in SQL Server 2005. We cover several XML-related topics in this
chapter, including the xml data type and its built-in methods, the FOR XML clause, and XML indexes.
Chapter 13
Chapter 13 discusses XQuery and XPath support in SQL Server 2014, including improvements on the
XQuery support introduced in SQL Server 2005, such as support for the xml data type in XML DML insert
statements and the let clause in FLWOR expressions.
Chapter 14
Chapter 14 introduces SQL Server catalog views, which are the preferred tools for retrieving database and
database object metadata. This chapter also discusses dynamic-management views and functions, which
provide access to server and database state information.
Chapter 15
Chapter 15 focuses on client-side support for SQL Server, including ADO.NET-based connectivity and the
newest Microsoft object-relational mapping (ORM) technology, Entity Framework 4.
Chapter 16
Chapter 16 covers SQL CLR Integration functionality in SQL Server 2014. In this chapter, we discuss and
provide examples of SQL CLR stored procedures, user-defined functions, user-defined types, and
user-defined aggregates.
Chapter 17
Chapter 17 discusses SQL Server connectivity using middle-tier technologies. Because native HTTP
endpoints have been deprecated since SQL Server 2008, we discuss them as items that may need to be
supported in existing databases but shouldn’t be used for new development. We focus instead on possible
replacement technologies, such as ADO.NET data services and IIS/.NET web services.
Chapter 18
Chapter 18 discusses improvements to server-side error handling made possible with the TRY...CATCH
block. We also discuss various methods for debugging code, including using the Visual Studio T-SQL
debugger. is chapter wraps up with a discussion of dynamic SQL and SQL injection, including the causes
of SQL injection and methods you can use to protect your code against this type of attack.
Chapter 19
Chapter 19 provides an overview of performance-tuning SQL Server code. This chapter discusses SQL
Server storage, indexing mechanisms, and query plans. We end the chapter with a discussion of a proven
methodology for troubleshooting T-SQL performance issues.
Appendix A
Appendix A provides the answers to the exercise questions included at the end of each chapter.
Appendix B
Appendix B is designed as a quick reference to the XQuery Data Model (XDM) type system.
Appendix C
Appendix C provides a quick reference glossary to several terms, many of which may be new to those using
SQL Server for the rst time.
Appendix D
Appendix D is a quick reference to the SQLCMD command-line tool, which allows you to execute
ad hoc T-SQL statements and batches interactively, or run script les.
To help make reading this book a more enjoyable experience, and to help you get as much out of it as
possible, we’ve used the following standardized formatting conventions throughout.
C# code is shown in code font. Note that C# code is case sensitive. Here’s an example:
while (i < 10)
T-SQL source code is also shown in code font, with keywords capitalized. Note that we’ve lowercased
the data types in the T-SQL code to help improve readability. Here’s an example:
DECLARE @x xml;
XML code is shown in code font with attribute and element content in bold for readability.
Some code samples and results have been reformatted in the book for easier reading. XML ignores
whitespace, so the significant content of the XML has not been altered. Here’s an example:
<book publisher = "Apress">Pro SQL Server 2014 XML</book>
Note Notes, tips, and warnings are displayed like this, in a special font with solid bars placed
over and under the content.
Sidebars include additional information relevant to the current discussion and other interesting facts.
Sidebars are shown on a gray background.
This book requires an installation of SQL Server 2014 to run the T-SQL sample code provided. Note that the
code in this book has been specifically designed to take advantage of SQL Server 2014 features, and some
of the code samples won’t run on prior versions of SQL Server. The code samples presented in the book
are designed to be run against the AdventureWorks 2014 and SQL Server 2014 In-Memory OLTP sample
databases, available from the CodePlex web site at
For the sake of simplicity, the samples refer to these databases as AdventureWorks and 2014
In-Memory rather than by the full name AdventureWorks2014.
If you’re interested in compiling and deploying the .NET code samples (the client code and SQL
CLR examples) presented in the book, we highly recommend an installation of Visual Studio 2010 or a
later version. Although you can compile and deploy .NET code from the command line, we’ve provided
instructions for doing so through the Visual Studio Integrated Development Environment (IDE).
We nd that the IDE provides a much more enjoyable experience.
Some examples, such as the ADO.NET Data Services examples in Chapter 17, require an installation
of Internet Information Services (IIS) as well. Other code samples presented in the book may have specific
requirements, such as the Entity Framework 4 samples, which require the .NET Framework 3.5. We’ve added
notes to code samples that have additional requirements like these.
Apress Web Site
Visit this book’s web page at for the complete sample code
download for this book. It’s compressed in a zip le and structured so that each subdirectory contains all the
sample code for its corresponding chapter.
We and the Apress team have made every effort to ensure that this book is free from errors and defects.
Unfortunately, the occasional error does slip past us, despite our best efforts. In the event that you find an
error in the book, please let us know! You can submit errors to Apress by visiting and filling out the form on the Errata tab.
Chapter 1
Foundations of T-SQL
SQL Server 2014 is the latest release of Microsoft’s enterprise-class database management system (DBMS). As
the name implies, a DBMS is a tool designed to manage, secure, and provide access to data stored in structured
collections in databases. Transact-SQL (T-SQL) is the language that SQL Server speaks. T-SQL provides query and
data-manipulation functionality, data definition and management capabilities, and security administration tools
to SQL Server developers and administrators. To communicate effectively with SQL Server, you must have a solid
understanding of the language. In this chapter, you begin exploring T-SQL on SQL Server 2014.
A Short History of T-SQL
The history of Structured Query Language (SQL), and its direct descendant Transact-SQL (T-SQL), begins
with a man. Specifically, it all began in 1970 when Dr. E. F. Codd published his influential paper “A Relational
Model of Data for Large Shared Data Banks” in the Communications of the Association for Computing
Machinery (ACM). In his seminal paper, Dr. Codd introduced the definitive standard for relational databases.
IBM went on to create the first relational database management system, known as System R. It subsequently
introduced the Structured English Query Language (SEQUEL, as it was known at the time) to interact with
this early database to store, modify, and retrieve data. The name of this early query language was later
changed from SEQUEL to the now-common SQL due to a trademark issue.
Fast-forward to 1986, when the American National Standards Institute (ANSI) officially approved the
first SQL standard, commonly known as the ANSI SQL-86 standard. The original versions of Microsoft SQL
Server shared a common code base with the Sybase SQL Server product. This changed with the release
of SQL Server 7.0, when Microsoft partially rewrote the code base. Microsoft has since introduced several
iterations of SQL Server, including SQL Server 2000, SQL Server 2005, SQL Server 2008, SQL 2008 R2,
SQL 2012, and now SQL Server 2014. This book focuses on SQL Server 2014, which further extends the
capabilities of T-SQL beyond what was possible in previous releases.
Imperative vs. Declarative Languages
SQL is different from many common programming languages such as C# and Visual Basic because it’s a
declarative language. In contrast, languages such as C++, Visual Basic, C#, and even assembler language are
imperative languages. The imperative language model requires the user to determine what the end result
should be and tell the computer step by step how to achieve that result. It’s analogous to asking a cab driver to
drive you to the airport and then giving the driver turn-by-turn directions to get there. Declarative languages,
on the other hand, allow you to frame your instructions to the computer in terms of the end result. In this
model, you allow the computer to determine the best route to achieve your objective, analogous to telling the
cab driver to take you to the airport and trusting them to know the best route. The declarative model makes a
lot of sense when you consider that SQL Server is privy to a lot of “inside information.” Just like the cab driver
who knows the shortcuts, traffic conditions, and other factors that affect your trip, SQL Server inherently knows
several methods to optimize your queries and data-manipulation operations.
Consider Listing 1-1, which is a simple C# code snippet that reads in a flat file of names and displays
them on the screen.
Listing 1-1. C# Snippet to Read a Flat File
StreamReader sr = new StreamReader("c:\\Person_Person.txt");
string FirstName = null;
// Read one name per line until the end of the file, echoing each to the console
while ((FirstName = sr.ReadLine()) != null)
{
    Console.WriteLine(FirstName);
}
sr.Dispose();
The example performs the following functions in an orderly fashion:
1. The code explicitly opens the storage for input (in this example, a flat file is used
as a “database”).
2. It reads in each record (one record per line), explicitly checking for the end of the file.
3. As it reads the data, the code returns each record for display using Console.WriteLine.
4. Finally, it closes and disposes of the connection to the data file.
Consider what happens when you want to add a name to or delete a name from the flat-file “database.”
In those cases, you must extend the previous example and add custom routines to explicitly reorganize all
the data in the file so that it maintains proper ordering. If you want the names to be listed and retrieved
in alphabetical (or any other) order, you must write your own sort routines as well. Any type of additional
processing on the data requires that you implement separate procedural routines.
The SQL equivalent of the C# code in Listing 1-1 might look something like Listing 1-2.
Listing 1-2. SQL Query to Retrieve Names from a Table
SELECT FirstName FROM Person.Person;
Tip Unless otherwise specified, you can run all the T-SQL samples in this book in the AdventureWorks 2014
or SQL 2014 In-Memory sample database using SQL Server Management Studio or SQLCMD.
To sort your data, you can simply add an ORDER BY clause to the SELECT query in Listing 1-2. With
properly designed and indexed tables, SQL Server can automatically reorganize and index your data for
efficient retrieval after you insert, update, or delete rows.
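As a minimal sketch of the sorted variant (assuming the same Person.Person table from the AdventureWorks sample database):

```sql
-- Retrieve first names in alphabetical order; SQL Server decides how best
-- to perform the sort (or avoid it entirely, if an index already provides
-- rows in that order).
SELECT FirstName
FROM Person.Person
ORDER BY FirstName;
```

Contrast this one declarative clause with the custom sort routine the flat-file approach would require.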
T-SQL includes extensions that allow you to use procedural syntax. In fact, you could rewrite the
previous example as a cursor to closely mimic the C# sample code. These extensions should be used with
care, however, because trying to force the imperative model on T-SQL effectively overrides SQL Server’s
built-in optimizations. More often than not, this hurts performance and makes simple projects a lot more
complex than they need to be.
One of the great assets of SQL Server is that you can invoke its power, in its native language, from nearly any
other programming language. For example, in .NET you can connect to SQL Server and issue SQL queries and
T-SQL statements to it via the System.Data.SqlClient namespace, which is discussed further in Chapter 15. This
gives you the opportunity to combine SQL’s declarative syntax with the strict control of an imperative language.
SQL Basics
Before you learn about developments in T-SQL, or on any SQL-based platform for that matter, let’s make sure
we’re speaking the same language. Fortunately, SQL can be described accurately using well-defined and time-
tested concepts and terminology. Let’s begin the discussion of the components of SQL by looking at statements.
To begin with, in SQL you use statements to communicate your requirements to the DBMS. A statement is
composed of several parts, as shown in Figure 1-1.
Figure 1-1. Components of a SQL statement
As you can see in the figure, SQL statements are composed of one or more clauses, some of which
may be optional depending on the statement. In the SELECT statement shown, there are three clauses: the
SELECT clause, which defines the columns to be returned by the query; the FROM clause, which indicates the
source table for the query; and the WHERE clause, which is used to limit the results. Each clause represents
a primitive operation in the relational algebra. For instance, in the example, the SELECT clause represents
a relational projection operation, the FROM clause indicates the relation, and the WHERE clause performs a
restriction operation.
Note The relational model of databases is the model formulated by Dr. E. F. Codd. In the relational model,
what are known in SQL as tables are referred to as relations; hence the name. Relational calculus and relational
algebra define the basis of query languages for the relational model in mathematical terms.
Understanding the logical order in which SQL clauses are applied within a statement or query is
important when setting your expectations about results. Although vendors are free to physically perform
whatever operations, in any order, that they choose to fulfill a query request, the results must be the
same as if the operations were applied in a standards-defined order.
The WHERE clause in the example contains a predicate, which is a logical expression that evaluates to
one of SQL’s three possible logical results: true, false, or unknown. In this case, the WHERE clause and the
predicate limit the results to only rows in which ContactId equals 1.
The SELECT clause includes an expression that is calculated during statement execution. In the example,
the expression EmailPromotion * 10 is used. This expression is calculated for every row of the result set.
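Putting the clauses together, the statement described above looks something like the following sketch (the exact column list in Figure 1-1 isn’t reproduced here, so treat the projected columns as illustrative):

```sql
SELECT LastName, EmailPromotion * 10  -- SELECT clause: projection, including a calculated expression
FROM Person.Person                    -- FROM clause: the source relation
WHERE ContactId = 1;                  -- WHERE clause: restriction via a predicate
```

Each line maps to one of the relational-algebra primitives discussed: projection, relation, and restriction.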
SQL institutes a logic system that may seem foreign to developers coming from other languages like
C++ or Visual Basic (or most other programming languages, for that matter). Most modern computer
languages use simple two-valued logic: a Boolean result is either true or false. SQL supports the
concept of NULL, which is a placeholder for a missing or unknown value. This results in a more complex
three-valued logic (3VL).
Let’s look at a quick example to demonstrate. If I asked you, “Is x less than 10?” your first response might
be along the lines of, “How much is x?” If I refused to tell you what value x stood for, you would have no
idea whether x was less than, equal to, or greater than 10; so the answer to the question is neither true
nor false—it’s the third truth value, unknown. Now replace x with NULL, and you have the essence of SQL
3VL. NULL in SQL is just like a variable in an equation when you don’t know the variable’s value.
No matter what type of comparison you perform with a missing value, or which other values you compare
the missing value to, the result is always unknown. The discussion of SQL 3VL continues in Chapter 3.
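A minimal T-SQL sketch makes the behavior concrete (assuming the default ANSI_NULLS setting):

```sql
DECLARE @x int = NULL;

-- The predicate @x < 10 evaluates to unknown, not true,
-- so the ELSE branch runs.
IF @x < 10
    PRINT 'true';
ELSE
    PRINT 'false or unknown';

-- Equality behaves the same way: NULL is never equal to anything,
-- not even to another NULL, under three-valued logic.
SELECT CASE WHEN NULL = NULL THEN 'true' ELSE 'false or unknown' END;
```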
The core of SQL is defined by statements that perform five major functions: querying data stored in
tables, manipulating data stored in tables, managing the structure of tables, controlling access to tables, and
managing transactions. These subsets of SQL are defined as follows:
• Querying: The SELECT query statement is complex. It has more optional clauses and
vendor-specific tweaks than any other statement. SELECT is concerned simply with
retrieving data stored in the database.
• Data Manipulation Language (DML): DML is considered a sublanguage of SQL.
It’s concerned with manipulating data stored in the database. DML consists of
four commonly used statements: INSERT, UPDATE, DELETE, and MERGE. DML also
encompasses cursor-related statements. These statements allow you to manipulate
the contents of tables and persist the changes to the database.
• Data Definition Language (DDL): DDL is another sublanguage of SQL. The primary
purpose of DDL is to create, modify, and remove tables and other objects from the
database. DDL consists of variations of the CREATE, ALTER, and DROP statements.
•	 Data Control Language (DCL): DCL is yet another SQL sublanguage. DCL’s goal is to
allow you to restrict access to tables and database objects. It’s composed of various
GRANT and REVOKE statements that allow or deny users access to database objects.
• Transactional Control Language (TCL): TCL is the SQL sublanguage that is
concerned with initiating and committing or rolling back transactions. A transaction
is basically an atomic unit of work performed by the server. TCL comprises the BEGIN
TRANSACTION, COMMIT, and ROLLBACK statements.
A SQL Server instance—an individual installation of SQL Server with its own ports, logins, and databases—
can manage multiple system databases and user databases. SQL Server has five system databases, as follows:
• resource: The resource database is a read-only system database that contains all
system objects. You don’t see the resource database in the SQL Server Management
Studio (SSMS) Object Explorer window, but the system objects persisted in the
resource database logically appear in every database on the server.
• master: The master database is a server-wide repository for configuration and status
information. It maintains instance-wide metadata about SQL Server as well as
information about all databases installed on the current instance. It’s wise to avoid
modifying or even accessing the master database directly in most cases. An entire
server can be brought to its knees if the master database is corrupted. If you need to
access the server configuration and status information, use catalog views instead.
• model: The model database is used as the template from which newly created
databases are essentially cloned. Normally, you won’t want to change this database
in production settings unless you have a very specific purpose in mind and are
extremely knowledgeable about the potential implications of changing the model database.
• msdb: The msdb database stores system settings and configuration information for
various support services, such as SQL Agent and Database Mail. Normally, you use
the supplied stored procedures and views to modify and access this data, rather than
modifying it directly.
• tempdb: The tempdb database is the main working area for SQL Server. When SQL
Server needs to store intermediate results of queries, for instance, they’re written to
tempdb. Also, when you create temporary tables, they’re actually created in tempdb.
The tempdb database is reconstructed from scratch every time you restart SQL Server.
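As a quick illustration, you can list the visible system databases with a catalog view (every database except resource appears here):

```sql
-- List the databases on the current instance; master, model, msdb,
-- and tempdb appear, but the hidden resource database does not.
SELECT name, database_id
FROM sys.databases
ORDER BY database_id;
```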
Microsoft recommends that you use the system-provided stored procedures and catalog views to
modify system objects and system metadata, and let SQL Server manage the system databases. You should
avoid modifying the contents and structure of the system databases directly through ad hoc T-SQL. Only
modify the system objects and metadata by executing the system stored procedures and functions.
User databases are created by database administrators (DBAs) and developers on the server. These
types of databases are so called because they contain user data. The AdventureWorks2014 sample database
is one example of a user database.
Transaction Logs
Every SQL Server database has its own associated transaction log. The transaction log provides recoverability
in the event of failure and ensures the atomicity of transactions. The transaction log accumulates all changes
to the database so that database integrity can be maintained in the event of an error or other problem.
Because of this arrangement, all SQL Server databases consist of at least two files: a database file with an
.mdf extension and a transaction log with an .ldf extension.
THE ADVENTUREWORKS2014 ACID TEST
SQL folks, and IT professionals in general, love their acronyms. A common acronym in the SQL world
is ACID, which stands for “atomicity, consistency, isolation, durability.” These four words form a set of
properties that database systems should implement to guarantee reliability of data storage, processing,
and manipulation:
• Atomicity : All data changes should be transactional in nature. That is, data changes
should follow an all-or-nothing pattern. The classic example is a double-entry
bookkeeping system in which every debit has an associated credit. Recording a
debit-and-credit double entry in the database is considered one transaction, or a single
unit of work. You can’t record a debit without recording its associated credit, and vice
versa. Atomicity ensures that either the entire transaction is performed or none of it is.
• Consistency : Only data that is consistent with the rules set up in the database is stored.
Data types and constraints can help enforce consistency in the database. For instance,
you can’t insert the name Meghan in an integer column. Consistency also applies
when dealing with data updates. If two users update the same row of a table at the
same time, an inconsistency could occur if one update is only partially complete when
the second update begins. The concept of isolation, described in the following bullet
point, is designed to deal with this situation.
• Isolation: Multiple simultaneous updates to the same data should not interfere with one
another. SQL Server includes several locking mechanisms and isolation levels to ensure
that two users can’t modify the exact same data at the exact same time, which could
put the data in an inconsistent state. Isolation also prevents you from even reading
uncommitted data by default.
• Durability : Data that passes all the previous tests is committed to the database.
The concept of durability ensures that committed data isn’t lost. The transaction
log and data backup and recovery features help to ensure durability.
The transaction log is one of the main tools SQL Server uses to enforce the ACID concept when storing
and manipulating data.
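The double-entry bookkeeping example described under Atomicity can be sketched in T-SQL as follows. The dbo.Ledger table is hypothetical; the point is that the two inserts succeed or fail as a single unit of work:

```sql
-- Record a debit and its matching credit as one transaction
BEGIN TRANSACTION;
    INSERT INTO dbo.Ledger (Account, Amount) VALUES (N'Cash',     -100.00);
    INSERT INTO dbo.Ledger (Account, Amount) VALUES (N'Expenses',  100.00);
COMMIT TRANSACTION;  -- both rows are now durable; a ROLLBACK would undo both
```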
Schemas
SQL Server 2014 supports database schemas, which are logical groupings by the owner of database objects.
The AdventureWorks2014 sample database, for instance, contains several schemas, such as HumanResources,
Person, and Production. These schemas are used to group tables, stored procedures, views, and user-
defined functions (UDFs) for management and security purposes.
Tip When you create new database objects, like tables, and don’t specify a schema, they’re automatically
created in the default schema. The default schema is normally dbo, but DBAs may assign different default
schemas to different users. Because of this, it’s always best to specify the schema name explicitly when
creating database objects.
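Following this tip, a hypothetical Accounting schema and a table created explicitly within it might look like this:

```sql
CREATE SCHEMA Accounting;
GO

-- Qualify the table with its schema rather than relying on the default
CREATE TABLE Accounting.Invoice
(
    InvoiceID   INT      NOT NULL PRIMARY KEY,
    InvoiceDate DATETIME NOT NULL,
    Amount      MONEY    NOT NULL
);
```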
Tables
SQL Server supports several types of objects that can be created in a database. SQL Server stores and manages
data in its primary data structures: tables. A table consists of rows and columns, with data stored at the
intersections of these rows and columns. As an example, the AdventureWorks HumanResources.Department
table is shown in Figure 1-2. In SQL Server 2014, you now have the option of creating an In-Memory table. This
feature allows all the table data to be stored in memory and accessed with extremely low latency.
Figure 1-2. HumanResources.Department table
In the table, each row is associated with columns and each column has certain restrictions placed on
its content. These restrictions form the data domain. The data domain defines all the values a column can
contain. At the lowest level, the data domain is based on the data type of the column. For instance, a smallint
column can contain any integer values between -32,768 and +32,767.
The data domain of a column can be further constrained through the use of check constraints, triggers,
and foreign key constraints. Check constraints provide a means of automatically checking that the value of a
column is within a certain range or equal to a certain value whenever a row is inserted or updated. Triggers
can provide functionality similar to that of check constraints. Foreign key constraints allow you to declare a
relationship between the columns of one table and the columns of another table. You can use foreign key
constraints to restrict the data domain of a column to include only those values that appear in a designated
column of another table.
reStrICtING the Data DOMaIN: a COMparISON
This section has given a brief overview of three methods of constraining the data domain for a column.
Each method restricts the values that can be contained in the column. Here’s a quick comparison of the
three methods:
•	Foreign key constraints allow SQL Server to perform an automatic check against
another table to ensure that the values in a given column exist in the referenced table. If
the value you're trying to update or insert in a table doesn't exist in the referenced table,
an error is raised and any changes are rolled back. The foreign key constraint provides a
flexible means of altering the data domain, because adding values to or removing them
from the referenced table automatically changes the data domain for the referencing
table. Also, foreign key constraints offer an additional feature known as cascading
declarative referential integrity (DRI), which automatically updates or deletes rows from
a referencing table if an associated row is removed from the referenced table.
•	Check constraints provide a simple, efficient, and effective tool for ensuring that the
values being inserted or updated in a column are within a given range or a member
of a given set of values. Check constraints, however, aren't as flexible as foreign key
constraints and triggers because the data domain is normally defined using hard-coded
constant values or logical expressions.
•	Triggers are stored procedures attached to insert, update, or delete events on a table or
view. Triggers can be set on DML or DDL events. Both DML and DDL triggers provide a
flexible solution for constraining data, but they may require more maintenance than the
other options because they're essentially a specialized form of stored procedure. Unless
they're extremely well designed, triggers have the potential to be much less efficient
than other methods of constraining data. Generally triggers are avoided in modern
databases in favor of more efficient methods of constraining data. The exception to this
is when you're trying to enforce a foreign key constraint across databases, because
SQL Server doesn't support cross-database foreign key constraints.
Which method you use to constrain the data domain of your column(s) needs to be determined by your
project-specific requirements on a case-by-case basis.
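As a hypothetical sketch of the two declarative options just compared, the following table definition restricts CategoryID to values present in a dbo.Category table (assumed to already exist) and restricts UnitPrice with a check constraint:

```sql
CREATE TABLE dbo.Product
(
    ProductID  INT   NOT NULL PRIMARY KEY,
    CategoryID INT   NOT NULL
        CONSTRAINT FK_Product_Category
        FOREIGN KEY REFERENCES dbo.Category (CategoryID), -- foreign key constraint
    UnitPrice  MONEY NOT NULL
        CONSTRAINT CK_Product_UnitPrice
        CHECK (UnitPrice >= 0)                            -- check constraint
);
```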
Views
A view is like a virtual table—the data it exposes isn't stored in the view object itself. Views are composed of
SQL queries that reference tables and other views, but they’re referenced just like tables in queries. Views
serve two major purposes in SQL Server: they can be used to hide the complexity of queries, and they can be
used as a security device to limit the rows and columns of a table that a user can query. Views are expanded,
meaning their logic is incorporated into the execution plan for queries when you use them in queries and
DML statements. SQL Server may not be able to use indexes on the base tables when the view is expanded,
resulting in less-than-optimal performance when querying views in some situations.
To overcome the query performance issues with views, SQL Server also has the ability to create a special
type of view known as an indexed view. An indexed view is a view that SQL Server persists to the database
like a table. When you create an indexed view, SQL Server allocates storage for it and allows you to query
it like any other table. There are, however, restrictions on inserting into, updating, and deleting from an
indexed view. For instance, you can’t perform data modifications on an indexed view if more than one of the
view’s base tables will be affected. You also can’t perform data modifications on an indexed view if the view
contains aggregate functions or a DISTINCT clause.
You can also create indexes on an indexed view to improve query performance. The downside to an
indexed view is increased overhead when you modify data in the view’s base tables, because the view must
be updated as well.
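A minimal sketch of an indexed view, assuming a hypothetical dbo.SalesOrder base table: the view must be created WITH SCHEMABINDING, and it's the unique clustered index that actually persists the view's data:

```sql
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING  -- required before the view can be indexed
AS
SELECT ProductID,
       COUNT_BIG(*) AS OrderCount,  -- COUNT_BIG(*) is required with GROUP BY
       SUM(ISNULL(LineTotal, 0.0)) AS TotalSales
FROM dbo.SalesOrder
GROUP BY ProductID;
GO

-- Creating the unique clustered index materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
    ON dbo.vSalesByProduct (ProductID);
```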
Indexes
Indexes are SQL Server's mechanisms for optimizing access to data. SQL Server 2014 supports several types
of indexes, including the following:
• Clustered index: A clustered index is limited to one per table. This type of index
defines the ordering of the rows in the table. A clustered index is physically
implemented using a b-tree structure with the data stored in the leaf levels of the
tree. Clustered indexes order the data in a table in much the same way that a phone
book is ordered by last name. A table with a clustered index is referred to as a
clustered table, whereas a table with no clustered index is referred to as a heap.
• Nonclustered index: A nonclustered index is also a b-tree index managed by SQL
Server. In a nonclustered index, index rows are included in the leaf levels of the b-tree.
Because of this, nonclustered indexes have no effect on the ordering of rows in a table.
The index rows in the leaf levels of a nonclustered index consist of the following:
•	A nonclustered key value
•	A row locator, which is the clustered index key on a table with a clustered index,
or a SQL-generated row ID for a heap
•	Nonkey columns, which are added via the INCLUDE clause of the CREATE INDEX
statement
• Columnstore index: A columnstore index is a special index used for very large tables
(>100 million rows) and is mostly applicable to large data-warehouse
implementations. A columnstore index creates an index on the column as opposed
to the row and allows for efficient and extremely fast retrieval of large data sets. Prior
to SQL Server 2014, tables with columnstore indexes were required to be read-
only. In SQL Server 2014, columnstore indexes are now updateable. This feature is
discussed further in Chapter 6.
• XML index: SQL Server supports special indexes designed to help efficiently query
XML data. See Chapter 11 for more information.
• Spatial index: A spatial index is an interesting new indexing structure to support
efficient querying of the new geometry and geography data types. See Chapter 2 for
more information.
• Full-text index: A full-text index (FTI) is a special index designed to efficiently
perform full-text searches of data and documents.
• Memory-optimized index: SQL Server 2014 introduced In-Memory tables that bring with
them new index types. These types of indexes only exist in memory and must be created
with the initial table creation. These index types are covered at length in Chapter 6:
• Nonclustered hash index: This type of index is most efficient in scenarios where
the query will return values for a specific value criteria. For example, SELECT *
FROM <Table> WHERE <Column> = @<ColumnValue>.
• Memory-optimized nonclustered index: This type of index supports the same
functions as a hash index, in addition to seek operations and sort ordering.
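A nonclustered hash index must be declared when the In-Memory table is created, along with an estimated bucket count. The following sketch uses hypothetical names and assumes the database already has a memory-optimized filegroup:

```sql
CREATE TABLE dbo.SessionState
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserName  NVARCHAR(50) NOT NULL,
    LastSeen  DATETIME2    NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```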
You can also include nonkey columns in your nonclustered indexes with the INCLUDE clause of the CREATE
INDEX statement. The included columns give you the ability to work around SQL Server’s index size limitations.
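For example, a nonclustered index on the AdventureWorks Person.Person table might key on LastName and carry FirstName and MiddleName as included, nonkey columns, so that a query touching only those three columns can be answered from the index alone:

```sql
CREATE NONCLUSTERED INDEX IX_Person_LastName
    ON Person.Person (LastName)
    INCLUDE (FirstName, MiddleName); -- nonkey columns stored at the leaf level
```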
Stored Procedures
SQL Server supports the installation of server-side T-SQL code modules via stored procedures (SPs). It’s very
common to use SPs as a sort of intermediate layer or custom server-side application programming interface
(API) that sits between user applications and tables in the database. Stored procedures that are specifically
designed to perform queries and DML statements against the tables in a database are commonly referred to
as CRUD (create, read, update, delete) procedures.
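A minimal "read" procedure in this CRUD style might look like the following (the procedure name is hypothetical; the table is from AdventureWorks):

```sql
CREATE PROCEDURE dbo.GetDepartmentByID
    @DepartmentID SMALLINT
AS
BEGIN
    -- Return exactly one department by its key
    SELECT DepartmentID, Name, GroupName, ModifiedDate
    FROM HumanResources.Department
    WHERE DepartmentID = @DepartmentID;
END;
```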
User-Defined Functions
User-defined functions (UDFs) can perform queries and calculations, and return either scalar values or
tabular result sets. UDFs have certain restrictions placed on them. For instance, they can’t use certain
nondeterministic system functions, nor can they perform DML or DDL statements, so they can’t make
modifications to the database structure or content. They can’t perform dynamic SQL queries or change the
state of the database (cause side effects).
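A simple scalar UDF that stays within these restrictions, shown here as a hypothetical example, performs a calculation and returns a value without causing side effects:

```sql
CREATE FUNCTION dbo.GetFullName
(
    @FirstName NVARCHAR(50),
    @LastName  NVARCHAR(50)
)
RETURNS NVARCHAR(101)
AS
BEGIN
    -- Pure calculation: no DML, no DDL, no side effects
    RETURN @LastName + N', ' + @FirstName;
END;
```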
SQL CLR Assemblies
SQL Server 2014 supports access to Microsoft .NET functionality via the SQL Common Language Runtime
(SQL CLR). To access this functionality, you must register compiled .NET SQL CLR assemblies with the
server. The assembly exposes its functionality through class methods, which can be accessed via SQL CLR
functions, procedures, triggers, user-defined types, and user-defined aggregates. SQL CLR assemblies
replace the deprecated SQL Server extended stored procedure (XP) functionality available in prior releases.
Tip Avoid using extended stored procedures (XPs) on SQL Server 2014. The same functionality provided
by XPs can be provided by SQL CLR code. The SQL CLR model is more robust and secure than the XP model.
Also keep in mind that the XP library is deprecated, and XP functionality may be completely removed in a future
version of SQL Server.
Elements of Style
Now that you’ve had a broad overview of the basics of SQL Server, lets look at some recommended
development tips to help with code maintenance. Selecting a particular style and using it consistently helps
immensely with both debugging and future maintenance. The following sections contain some general
recommendations to make your T-SQL code easy to read, debug, and maintain.
Whitespace
SQL Server ignores extra whitespace between keywords and identifiers in SQL queries and statements. A
single statement or query may include extra spaces and tab characters and can even extend across several
lines. You can use this knowledge to great advantage. Consider Listing 1-3, which is adapted from the
HumanResources.vEmployee view in the AdventureWorks2014 database.
Listing 1-3. The HumanResources.vEmployee View from the AdventureWorks2014 Database
SELECT e.BusinessEntityID, p.Title, p.FirstName, p.MiddleName, p.LastName, p.Suffix,
e.JobTitle, pp.PhoneNumber, pnt.Name AS PhoneNumberType, ea.EmailAddress,
p.EmailPromotion, a.AddressLine1, a.AddressLine2, a.City, sp.Name AS StateProvinceName,
a.PostalCode, cr.Name AS CountryRegionName, p.AdditionalContactInfo
FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p ON p.BusinessEntityID =
e.BusinessEntityID INNER JOIN Person.BusinessEntityAddress AS bea ON bea.BusinessEntityID
= e.BusinessEntityID INNER JOIN Person.Address AS a ON a.AddressID = bea.AddressID INNER
JOIN Person.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID INNER JOIN Person.
CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode LEFT OUTER JOIN Person.
PersonPhone AS pp ON pp.BusinessEntityID = p.BusinessEntityID LEFT OUTER JOIN Person.
PhoneNumberType AS pnt ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID LEFT OUTER JOIN
Person.EmailAddress AS ea ON p.BusinessEntityID = ea.BusinessEntityID
This query will run and return the correct result, but it’s very hard to read. You can use whitespace and
table aliases to generate a version that is much easier on the eyes, as demonstrated in Listing 1-4.
Listing 1-4. The HumanResources.vEmployee View Reformatted for Readability
SELECT e.BusinessEntityID,
    p.Title,
    p.FirstName,
    p.MiddleName,
    p.LastName,
    p.Suffix,
    e.JobTitle,
    pp.PhoneNumber,
    pnt.Name AS PhoneNumberType,
    ea.EmailAddress,
    p.EmailPromotion,
    a.AddressLine1,
    a.AddressLine2,
    a.City,
    sp.Name AS StateProvinceName,
    a.PostalCode,
    cr.Name AS CountryRegionName,
    p.AdditionalContactInfo
FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p
ON p.BusinessEntityID = e.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS bea
ON bea.BusinessEntityID = e.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = bea.AddressID
INNER JOIN Person.StateProvince AS sp
ON sp.StateProvinceID = a.StateProvinceID
INNER JOIN Person.CountryRegion AS cr
ON cr.CountryRegionCode = sp.CountryRegionCode
LEFT OUTER JOIN Person.PersonPhone AS pp
ON pp.BusinessEntityID = p.BusinessEntityID
LEFT OUTER JOIN Person.PhoneNumberType AS pnt
ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
LEFT OUTER JOIN Person.EmailAddress AS ea
ON p.BusinessEntityID = ea.BusinessEntityID;
Notice that the ON keywords are indented, associating them visually with the INNER JOIN operators
directly before them in the listing. The column names on the lines directly after the SELECT keyword are also
indented, associating them visually with SELECT. This particular style is useful in helping visually break up a
query into sections. The personal style you decide on may differ from this one, but once you’ve decided on a
standard indentation style, be sure to apply it consistently throughout your code.
Code that is easy to read is easier to debug and maintain. The code in Listing 1-4 uses table aliases, plenty
of whitespace, and the semicolon (;) terminator to mark the end of SELECT statements, to make the code more
readable. (It’s a good idea to get into the habit of using the terminating semicolon in your SQL queries—it’s
required in some instances.)
Tip Semicolons are required terminators for some statements in SQL Server 2014. Instead of trying to
remember all the special cases where they are or aren’t required, it’s a good idea to use the semicolon
statement terminator throughout your T-SQL code. You’ll notice the use of semicolon terminators in all the
examples in this book.
Naming Conventions
SQL Server allows you to name your database objects (tables, views, procedures, and so on) using just about
any combination of up to 128 characters (116 characters for local temporary table names), as long as you
enclose them in double quotes ("") or brackets ([ ]). Just because you can, however, doesn't necessarily
mean you should. Many of the allowed characters are hard to differentiate from other similar-looking
characters, and some may not port well to other platforms. The following suggestions will help you avoid
potential problems:
•	Use alphabetic characters (A–Z, a–z, and Unicode Standard 3.2 letters) for the first
character of your identifiers. The obvious exceptions are SQL Server variable names
that start with the at sign (@), temporary tables and procedures that start with the
number sign (#), and global temporary tables and procedures that begin with a
double number sign (##).
•	Many built-in T-SQL functions and system variables have names that begin with
a double at sign (@@), such as @@ERROR and @@IDENTITY. To avoid confusion and
possible conflicts, don't use a leading double at sign to name your identifiers.
•	Restrict the remaining characters in your identifiers to alphabetic characters (A–Z,
a–z, and Unicode Standard 3.2 letters), numeric digits (0–9), and the underscore
character (_). The dollar sign ($) character, although allowed, isn't advisable.
•	Avoid embedded spaces, punctuation marks (other than the underscore character),
and other special characters in your identifiers.
•	Avoid using SQL Server 2014 reserved keywords as identifiers. You can find the list
of reserved keywords in BOL.
•	Limit the length of your identifiers. Thirty-two characters or less is a reasonable limit
while not being overly restrictive. Much more than that becomes cumbersome to
type and can hurt your code readability.
Finally, to make your code more readable, select a capitalization style for your identifiers and code, and
use it consistently. My preference is to fully capitalize T-SQL keywords and use mixed-case and underscore
characters to visually break up identifiers into easily readable words. Using all capital characters or
inconsistently applying mixed case to code and identifiers can make your code illegible and hard to maintain.
Consider the example query in Listing 1-5.
Listing 1-5. All-Capital SELECT Query
SELECT P.BUSINESSENTITYID, P.FIRSTNAME, P.LASTNAME, SP.SALESYTD
FROM PERSON.PERSON P INNER JOIN SALES.SALESPERSON SP
ON P.BUSINESSENTITYID = SP.BUSINESSENTITYID;
The all-capital version is difficult to read. It’s hard to tell the SQL keywords from the column and
table names at a glance. Compound words for column and table names aren’t easily identified. Basically,
your eyes have to work a lot harder to read this query than they should, which makes otherwise simple
maintenance tasks more difficult. Reformatting the code and identifiers makes this query much easier on the
eyes, as Listing 1-6 demonstrates.
Listing 1-6. Reformatted, Easy-on-the-Eyes Query
SELECT p.BusinessEntityID, p.FirstName, p.LastName, sp.SalesYTD
FROM Person.Person p INNER JOIN Sales.SalesPerson sp
ON p.BusinessEntityID = sp.BusinessEntityID;
The use of all capitals for the keywords in the second version makes them stand out from the mixed-
case table and column names. Likewise, the mixed-case column and table names make the compound word
names easy to recognize. The net effect is that the code is easier to read, which makes it easier to debug and
maintain. Consistent use of good formatting habits helps keep trivial changes trivial and makes complex
changes easier.
One Entry, One Exit
When writing SPs and UDFs, it’s good programming practice to use the “one entry, one exit” rule. SPs and
UDFs should have a single entry point and a single exit point (RETURN statement).
The SP in Listing 1-7 is a simple procedure with one entry point and several exit points. It retrieves the
ContactTypeID number from the AdventureWorks2014 Person.ContactType table for the ContactType
name passed into it. If no ContactType exists with the name passed in, a new one is created, and the newly
created ContactTypeID is passed back.
Listing 1-7. Stored Procedure Example with One Entry and Multiple Exits
CREATE PROCEDURE dbo.GetOrAdd_ContactType
    @Name NVARCHAR(50),
    @ContactTypeID INT = NULL OUTPUT
AS
BEGIN
    DECLARE @Err_Code INT;
    SELECT @Err_Code = 0;
    SELECT @ContactTypeID = ContactTypeID
    FROM Person.ContactType
    WHERE [Name] = @Name;
    IF @ContactTypeID IS NOT NULL
        RETURN; -- Exit 1: if the ContactType exists
    INSERT
    INTO Person.ContactType ([Name], ModifiedDate)
    SELECT @Name, CURRENT_TIMESTAMP;
    SELECT @Err_Code = @@ERROR;
    IF @Err_Code <> 0
        RETURN @Err_Code; -- Exit 2: if there is an error on INSERT
    SELECT @ContactTypeID = SCOPE_IDENTITY();
    RETURN @Err_Code; -- Exit 3: after successful INSERT
END;
This code has one entry point but three possible exit points. Figure 1-3 shows a simple flowchart for the
paths this code can take.
Figure 1-3. Flowchart for an example with one entry and multiple exits
As you can imagine, maintaining code such as that in Listing 1-7 becomes more difficult because the
flow of the code has so many possible exit points, each of which must be accounted for when you make
modifications to the SP. Listing 1-8 updates Listing 1-7 to give it a single entry point and a single exit point,
making the logic easier to follow.
Listing 1-8. Stored Procedure with One Entry and One Exit
CREATE PROCEDURE dbo.GetOrAdd_ContactType
    @Name NVARCHAR(50),
    @ContactTypeID INT = NULL OUTPUT
AS
BEGIN
    DECLARE @Err_Code INT;
    SELECT @Err_Code = 0;
    SELECT @ContactTypeID = ContactTypeID
    FROM Person.ContactType
    WHERE [Name] = @Name;
    IF @ContactTypeID IS NULL
    BEGIN
        INSERT
        INTO Person.ContactType ([Name], ModifiedDate)
        SELECT @Name, CURRENT_TIMESTAMP;
        SELECT @Err_Code = @@ERROR;
        IF @Err_Code = 0 -- If there's an error, skip next
            SELECT @ContactTypeID = SCOPE_IDENTITY();
    END;
    RETURN @Err_Code; -- Single exit point
END;
Figure 1-4 shows the modified flowchart for this new version of the SP.
Figure 1-4. Flowchart for an example with one entry and one exit
The one entry and one exit model makes the logic easier to follow, which in turn makes the code easier
to manage. This rule also applies to looping structures, which you implement via the WHILE statement
in T-SQL. Avoid using the WHILE loop's CONTINUE and BREAK statements and the GOTO statement; these
statements lead to old-fashioned, difficult-to-maintain spaghetti code.
Defensive Coding
Defensive coding involves anticipating problems before they occur and mitigating them through good coding
practices. The first and foremost lesson of defensive coding is to always check user input. Once you open your
system to users, expect them to do everything in their power to try to break your system. For instance, if you
ask users to enter a number between 1 and 10, expect that they’ll ignore your directions and key in ; DROP
TABLE dbo.syscomments; -- at the first available opportunity. Defensive coding practices dictate that you
should check and scrub external inputs. Don’t blindly trust anything that comes from an external source.
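As a hypothetical sketch of input checking in an SP (one that also follows the one entry, one exit rule from earlier in this chapter), the procedure below rejects out-of-range input before using it:

```sql
CREATE PROCEDURE dbo.SetRating
    @Rating INT
AS
BEGIN
    DECLARE @Err_Code INT = 0;
    IF @Rating IS NULL OR @Rating NOT BETWEEN 1 AND 10
    BEGIN
        -- Reject bad input before it can do any damage
        RAISERROR('@Rating must be an integer between 1 and 10.', 16, 1);
        SET @Err_Code = 1;
    END
    ELSE
    BEGIN
        -- @Rating is safe to use from here on
        SET @Err_Code = 0;
    END;
    RETURN @Err_Code; -- single exit point
END;
```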
Another aspect of defensive coding is a clear delineation between exceptions and run-of-the-mill
issues. The key is that exceptions are, well, exceptional in nature. Ideally, exceptions should be caused by
errors that you can’t account for or couldn’t reasonably anticipate, like a lost network connection or physical
corruption of your application or data storage. Errors that can be reasonably expected, like data-entry errors,
should be captured before they’re raised to the level of exceptions. Keep in mind that exceptions are often
resource-intensive, expensive operations. If you can avoid an exception by anticipating a particular problem,
your application will benefit in both performance and control. SQL Server 2012 introduced a valuable
new error-handling feature called THROW. The TRY/CATCH/THROW statements are discussed in more detail in
Chapter 18.
The SELECT * Statement
Consider the SELECT * style of querying. In a SELECT clause, the asterisk (*) is a shorthand way of specifying
that all columns in a table should be returned. Although SELECT * is a handy tool for ad hoc querying of
tables during development and debugging, you normally shouldn’t use it in a production system. One
reason to avoid this method of querying is to minimize the amount of data retrieved with each call. SELECT *
retrieves all columns, regardless of whether they’re needed by the higher-level applications. For queries that
return a large number of rows, even one or two extraneous columns can waste a lot of resources.
If the underlying table or view is altered, columns may be added to or removed from the returned result
set. This can cause errors that are hard to locate and fix. By specifying the column names, your front-end
application can be assured that only the required columns are returned by a query and that errors caused by
missing columns will be easier to locate.
As with most things, there are always exceptions—for example, if you’re using the FOR XML AUTO clause
to generate XML based on the structure and content of your relational data. In this case, SELECT * can be
quite useful, because you’re relying on FOR XML to automatically generate the node names based on the
table and column names in the source tables.
Tip SELECT * should be avoided, but if you do need to use it, always try to limit the data set being
returned. One way of doing so is to make full use of the T-SQL TOP command and restrict the number of records
returned. In practice, though, you should never write SELECT * in your code—even for small tables. Small
tables today could be large tables tomorrow.
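For instance, against the AdventureWorks Person.Person table, naming the needed columns and capping the row count keeps the result set small and the contract with the application explicit:

```sql
-- Name only the columns the application needs, and cap the rows
SELECT TOP (100)
       BusinessEntityID,
       FirstName,
       LastName
FROM Person.Person
ORDER BY LastName;
```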
Variable Initialization
When you create SPs, UDFs, or any script that uses T-SQL user variables, you should initialize those variables
before the first use. Unlike some other programming languages that guarantee that newly declared variables
will be initialized to 0 or an empty string (depending on their data types), T-SQL guarantees only that newly
declared variables will be initialized to NULL. Consider the code snippet shown in Listing 1-9.
Listing 1-9. Sample Code Using an Uninitialized Variable
DECLARE @i INT; SELECT @i = @i + 5; SELECT @i;
The result is NULL, which is a shock if you were expecting 5. Expecting SQL Server to initialize numeric
variables to 0 (like @i in the previous example) or an empty string will result in bugs that can be extremely
difficult to locate in your T-SQL code. To avoid these problems, always explicitly initialize your variables after
declaration, as demonstrated in Listing 1-10.
Listing 1-10. Sample Code Using an Initialized Variable
DECLARE @i INT = 0; -- Changed this statement to initialize @i to 0
SELECT @i = @i + 5;
Summary
This chapter has served as an introduction to T-SQL, including a brief history of SQL and a discussion of
the declarative programming style. The chapter started with a discussion of ISO SQL standard compatibility
in SQL Server 2014 and the differences between imperative and declarative languages, of which SQL is the
latter. You also saw many of the basic components of SQL, including databases, tables, views, SPs, and other
common database objects. Finally, I provided my personal recommendations for writing SQL code that is
easy to debug and maintain. I subscribe to the “eat your own dog food” theory, and throughout this book
I faithfully follow the best practice recommendations that I’ve asked you to consider.
The next chapter provides an overview of the new and improved tools available out of the box for
developers. Specifically, Chapter 2 discusses the SQLCMD text-based SQL client (originally a replacement
for osql), SSMS, SQL Server 2014 Books Online (BOL), and some of the other available tools that make
writing, editing, testing, and debugging easier and faster than ever.
EXERCISES
1. Describe the difference between an imperative language and a declarative language.
2. What does the acronym ACID stand for?
3. SQL Server 2014 supports seven different types of indexes. Two of these indexes
are newly introduced in SQL 2014. What are they?
4. Name two of the restrictions on any type of SQL Server UDF.
5. [True/False] In SQL Server, newly declared variables are always assigned the
default value 0 for numeric data types and an empty string for character data types.
Chapter 2
Tools of the Trade
SQL Server 2014 comes with a wide selection of tools and utilities to make development easier and more
productive for developers. This chapter introduces some of the most important tools for SQL Server
developers, including SQL Server Management Studio (SSMS) and the SQLCMD utility, SQL Server Data
Tool add-ins to Microsoft Visual Studio, SQL Profiler, Database Tuning Advisor, Extended Events, and SQL
Server 2014 Books Online (BOL). You’re also introduced to supporting tools like SQL Server Integration
Services (SSIS), the Bulk Copy Program (BCP), and the AdventureWorks 2014 sample database, which you
use in examples throughout the book.
SQL Server Management Studio
Back in the heyday of SQL Server 2000, it was common for developers to fire up the Enterprise Manager (EM)
and Query Analyzer GUI database tools in rapid succession every time they sat down to write code. Historically,
developer and DBA roles in the DBMS have been highly separated, and with good reason. DBAs have historically
brought hardware and software administration and tuning skills, database design optimization experience, and
healthy doses of skepticism and security to the table. On the other hand, developers have focused on coding skills,
problem solving, system optimization, and debugging. This separation of powers works very well in production
systems, but in development environments developers are often responsible for their own database design and
management. Sometimes developers are put in charge of their own development server local security.
SQL Server 2000 EM was originally designed as a DBA tool, providing access to the graphical user
interface (GUI) administration interface, including security administration, database object creation and
management, and server management functionality. Query Analyzer was designed as a developer tool, the
primary GUI tool for creating, testing, and tuning queries.
SQL Server 2014 continues the tradition begun with SQL Server 2005 by combining the functionality
of both these GUI tools into a single GUI interface known as SQL Server Management Studio (SSMS).
This makes perfect sense in supporting real-world SQL Server development, where the roles of DBA and
developer are often intermingled in development environments.
Many SQL Server developers prefer the GUI administration and development tools to the text-based
query tool SQLCMD to build their databases, and on this front SSMS doesn’t disappoint. SSMS offers several
features that make development and administration easier, including the following:
•	Integrated, functional Object Explorer, which provides the ability to easily view
all the objects in the server and manage them in a tree structure. The added filter
functionality helps users narrow down the objects they want to work with.
•	Color coding of scripts, making editing and debugging easier.
•	Enhanced keyboard shortcuts that make searching faster and easier. Additionally,
users can map predefined keyboard shortcuts to stored procedures that are used
most often.
•	Two keyboard shortcut schemes: keyboard shortcuts from SQL Server 2008 R2 and
Microsoft Visual Studio 2010 compatibility.
•	Usability enhancements such as the ability to zoom text in the Query Editor by
holding the Ctrl key and scrolling to zoom in and out. Users can drag and drop tabs,
and there is true multimonitor support.
•	Breakpoint validation, which prevents users from setting breakpoints at invalid locations.
•	T-SQL code snippets, which are templates that can be used as starting points to build
T-SQL statements in scripts and batches.
•	T-SQL Debugger Watch and Quick Watch windows, which support watching T-SQL
variables and expressions during debugging.
•	Graphical query execution plans. These are the bread and butter of the query-
optimization process. They greatly simplify the process of optimizing complex
queries, quickly exposing potential bottlenecks in your code.
•	Project-management and code-version control integration, including integration
with Team Foundation Server (TFS) and Visual SourceSafe version control systems.
•	SQLCMD mode, which allows you to execute SQL scripts using SQLCMD. You can
take advantage of SQLCMD's additional script capabilities, like scripting variables
and support for the AlwaysON feature.
SSMS also includes database and server management features, but this discussion is limited to some of
the most important developer-specific features.
IntelliSense is a feature that was introduced in SQL Server 2008. When coding, you often need to look up
language elements such as functions, table names, and column names to complete your code. This feature
allows the SQL Editor to automatically prompt for the completion of the syntax you input, based on partial
words. To enable IntelliSense, go to Tools ➤ Options ➤ Text Editor ➤ Transact-SQL ➤ IntelliSense. Figure 2-1
demonstrates how the IntelliSense feature suggests language elements based on the first letter entered.
Figure 2-1. Using IntelliSense to complete a Select statement
Code Snippets
Code snippets aren’t a new concept to the programming world. Visual Studio developers are very familiar
with this feature; and because SSMS is built on the Visual Studio 2010 shell, SQL inherits this functionality
as well. During the development cycle, developers often use a set of T-SQL statements multiple times
throughout the code being worked on. It’s much more efficient to access a block of code that contains
common code elements such as create stored procedure and create function, to help you build on top
of the code block. Code snippets are building blocks of code that you can use as a starting point when building
T-SQL scripts. This feature can help you be more productivity while increasing reusability and standardization
by enabling the development team to use existing templates or to create and customize a new template.
Code snippets help provide a better T-SQL code-editing experience. In addition, a snippet is an XML
template that can be used to guarantee consistency across the development team. These snippets fall into
three categories:
• Expansion snippets list the common outline of T-SQL commands such as Select,
Insert, and Create Table.
• Surround snippets include constructs such as while, if else, and begin end.
• Custom snippets allow custom templates that can be invoked via the snippet menu.
You can create a custom snippet and add it to the server by importing the snippet
using the Code Snippet Manager. Once you add a custom snippet, the Custom
Snippets category appears in the Code Snippet Manager.
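Because a snippet is just an XML template, a custom snippet can be authored in any text editor and then imported through the Code Snippet Manager. The following is a minimal sketch in the Visual Studio code-snippet XML format that SSMS inherits; the title, literal names, and the query in the CDATA section are illustrative, not from the book:

```xml
<?xml version="1.0" encoding="utf-8"?>
<CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <CodeSnippet Format="1.0.0">
    <Header>
      <Title>Select Top 10</Title>
      <Description>Selects the first 10 rows of a table.</Description>
      <SnippetTypes>
        <SnippetType>Expansion</SnippetType>
      </SnippetTypes>
    </Header>
    <Snippet>
      <Declarations>
        <!-- Each Literal becomes a highlighted, Tab-navigable field in the editor -->
        <Literal>
          <ID>TableName</ID>
          <ToolTip>Name of the table to query</ToolTip>
          <Default>dbo.MyTable</Default>
        </Literal>
      </Declarations>
      <Code Language="SQL">
        <![CDATA[SELECT TOP 10 * FROM $TableName$;]]>
      </Code>
    </Snippet>
  </CodeSnippet>
</CodeSnippets>
```

Saving this with a .snippet extension and importing it via the Code Snippet Manager makes it available under the Custom Snippets category.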
To access the code snippets, select the Code Snippets Manager from the Tools menu. Figure 2-2 shows
the Code Snippet Manager interface, which you can use to add, remove, or import code snippets.
Figure 2-2. Code Snippet Manager
When the snippet is inserted into the T-SQL Editor, fields that need to be customized are highlighted,
and you can use the Tab key to navigate through them. If you mouse over a highlighted token, a tooltip
provides additional information about it. Figure 2-5 shows the CREATE TABLE snippet invoked in the T-SQL
Editor along with the tooltip that lists the field’s description.
To insert a code snippet in the T-SQL Editor, right-click and select Insert Snippet, or press Ctrl+K followed by Ctrl+X.
Figure 2-3 demonstrates how to invoke the Insert Snippet and Surround With commands.
Figure 2-3. Right-click in the T-SQL Editor to invoke the command to insert snippets
Figure 2-4. Inserting a snippet
Once the Insert Snippet command is invoked, you have the option to choose a template based on the
SQL object type, such as Index, Table, Function, Login, Role, Schema, Stored Procedure, Trigger, Custom
Snippets, and so on. Figure 2-4 shows how to insert a snippet.
Figure 2-5. Adding a CREATE TABLE snippet, with the tooltip displayed
Keyboard Shortcut Schemes
If you ask an SQL user and a Visual Studio user, “What is the shortcut key to execute queries?” you’re bound
to receive two different answers: Ctrl+E for SQL users and Ctrl+Shift+E for Visual Studio users. Because
application developers are primarily Visual Studio users, it’s prudent to have an option that lets users pick
the keyboard shortcut scheme that’s familiar based on the tool they have been using. Another advantage
of defining and standardizing the keyboard shortcut scheme at the team level is that doing so helps team
members avoid executing unintended actions in the team environment.
SQL Server 2014 offers two keyboard shortcut schemes: the SQL Server 2014 shortcut scheme (the default)
and the Visual Studio 2010 shortcut scheme. The SSMS interface hasn't been updated in SQL Server 2014;
functionality and color schemes operate the same as in SQL Server 2012. To change the keyboard
shortcut settings, choose Tools ➤ Options ➤ Environment ➤ Keyboard. Figure 2-6 shows the
option to change the keyboard mapping scheme.
Figure 2-6. Keyboard shortcut mapping scheme
T-SQL Debugging
SQL Server 2012 introduced enhancements to T-SQL debugging, including the ability to set conditional
breakpoints, meaning a breakpoint is invoked only when a given expression evaluates to true. T-SQL debugging
also extends support for expression evaluation in the Watch and Quick Watch windows. You can also specify a
hit count, meaning you can specify how many times a breakpoint can be hit before it's invoked. Breakpoints
can also be exported from one session to another. Figure 2-7 shows the Debugging screen with the Output and
Locals windows.
Figure 2-7. T-SQL debugging with the Locals and Output windows
A breakpoint can now be placed on individual statements in a batch, and breakpoints are context-
sensitive. When a breakpoint is set, SQL Server validates the breakpoint's location and immediately provides
feedback if the location is invalid. For example, if you set a breakpoint on a comment,
you get feedback that it's an invalid breakpoint; and if you try to set a breakpoint on one of the lines in a
multiline statement, the breakpoint is added to all the lines.
A DataTip is another debugging enhancement added in SQL Server 2012. It helps you track
variables and expressions in the scope of execution while debugging by providing the ability to “pin” a DataTip
to keep it visible (even when the debug session is restarted). When the debugger is in break mode, if you
mouse over a T-SQL expression that is being evaluated, you can see the current value of that expression.
Figure 2-8 shows a breakpoint and DataTip.
Figure 2-8. A breakpoint and a DataTip
Note The user login must be part of the sysadmin role on the SQL Server instance in order to use T-SQL
debugging capabilities when using SSMS. With SQL Server Data Tools (SSDT), developers now have the option
of debugging without being part of the sysadmin role, using their LocalDB instance of the schema.
SSMS Editing Options
SSMS incorporates and improves on many of the developer features found in Query Editor. You can change
the editing options discussed in this section via Tools ➤ Options.
SSMS includes fully customizable script color coding. The default font has been changed to the
monotype font Consolas, and the background color is now blue to match Visual Studio 2012. You can
customize the foreground and background colors, font face, size, and style for elements of T-SQL, XML,
XSLT, and MDX scripts. Likewise, you can customize just about any feedback that SSMS generates, to suit
your personal taste.
You can set other editing options, such as word wrap, line-number display, indentation, and tabs for
different file types based on their associated file extensions. SSMS lets you configure your own keyboard
shortcuts to execute common T-SQL statements or SPs.
By default, SSMS displays queries using a tabbed window environment. If you prefer the classic multiple-
document interface (MDI) window style, you can switch the environment layout accordingly. You can also
change the query result output style from the default grid output to text or file output.
Context-Sensitive Help
Starting with SQL Server 2012, the product documentation is hosted online (MSDN/TechNet) to ensure that
the content is kept up to date. If you want to access the product documentation from your local computer,
you have to download the help catalogs and set up the Help Viewer. To configure the documentation, go to
the Help menu and select Manage Help Settings. Doing so launches the Help Library Manager. Scroll down
to the SQL Server 2014 section, and click Add next to the documentation you want to download. If the
documentation is already available on your system, the Help Library Manager updates the catalog's index
with the SQL Server documentation.
To access context-sensitive help, highlight the T-SQL or other statement you want help with and press F1.
You can add help pages to your Help Favorites or go directly to MSDN. If pressing F1 doesn’t work, remember
to download the documentation locally and choose to use local help. Figure 2-9 shows the result of calling
context-sensitive help for the CREATE TABLE statement.
Figure 2-9. Using SSMS context-sensitive help to find the CREATE TABLE statement
SSMS has several options that allow you to control help functionality and presentation. You can, for
example, use the SSMS Integrated Help Viewer, shown in Figure 2-9, or you can use the External Online Help
Viewer. The Settings window in the Help Viewer allows you to set a preference to use online or offline help;
it’s shown in Figure 2-10.
Help Search rounds out this discussion of the help functionality in SSMS. The Help Search function
automatically searches several online providers of SQL Server–related information for answers to your
questions. Searches aren’t restricted to SQL Server keywords or statements; you can search for anything,
and the Help Search function scours registered web sites and communities for relevant answers. Figure 2-11
shows the result of using Help Search to find XQuery content and articles.
Figure 2-10. Using the Help Viewer Settings window to personalize SSMS help
Graphical Query Execution Plans
SSMS offers graphical query execution plans similar to the plans available in Query Editor. A graphical query
execution plan is an excellent tool for aiding and optimizing query performance. SSMS allows you to view two
types of graphical query execution plans: estimated and actual. An estimated query execution plan is SQL
Server’s cost-based performance estimate of a query. The actual execution plan is virtually identical to the
estimated execution plan, except that it shows additional information such as actual row counts, number of
rebinds, and number of rewinds when the query is run. Sometimes the actual execution plan differs from the
estimated execution plan; this may be due to changes in indexes or statistics, parallelism, or, in some cases, a
query using temporary tables or DDL statements. These options are available via the Query menu. Figure 2-12
shows an estimated query execution plan in SSMS.
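If you prefer to work outside the GUI, the same two kinds of plan can also be requested with T-SQL SET options. A minimal sketch follows; the query itself is illustrative and assumes the AdventureWorks2014 sample database:

```sql
-- Estimated plan: compile the batch and return the plan XML without executing it
SET SHOWPLAN_XML ON;
GO
SELECT TOP 10 BusinessEntityID, FirstName, LastName
FROM Person.Person;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: execute the batch and return the plan with runtime statistics
-- (actual row counts, rebinds, rewinds) alongside the results
SET STATISTICS XML ON;
GO
SELECT TOP 10 BusinessEntityID, FirstName, LastName
FROM Person.Person;
GO
SET STATISTICS XML OFF;
GO
```

Clicking the returned XML in SSMS opens the same graphical plan view shown in Figure 2-12.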
Figure 2-11. Using Help Search to find help on XQuery
In addition, you can right-click the Execution Plan window and choose to save the XML version of the
graphical query plan to a file. SSMS can open these XML query plan files (with the extension .sqlplan) and
automatically show you the graphical version. In addition, the Properties window of the SQL Server 2014
query plan contains details regarding the MemoryGrantInfo, OptimizerHardwareDependentProperties,
and warnings about data that can affect plans. Figure 2-13 shows a sample Properties window for a query
plan. You also have an option to view the execution plan in XML format by right-clicking the Execution Plan
window and choosing Show Execution Plan XML.
Figure 2-12. Estimated query execution plan for a simple query
Figure 2-13. Sample Properties window for a simple query
Along with the execution plan, you can review query statistics and network statistics in the Client
Statistics tab. This is extremely useful for remotely troubleshooting performance problems with slow-
running queries.
Project-Management Features
SQL Server 2014 SSMS supports project-management features that will be familiar to Visual Studio
developers using solution-based development. These types of solutions, referred to as SQL Server
Management Studio database projects, are a deprecated feature in SQL Server 2014. There is no migration
path for these types of solutions/projects, and they won’t be supported in future releases of SQL Server.
The replacement for this type of functionality is SQL Server Data Tools (SSDT) using Visual Studio database
projects. The two products have completely different project types that can’t be managed or opened in the
other product.
This section explains how to use SSMS project types, but the recommendation is that you start
developing any new projects in SSDT. There is a section discussing SSDT at the end of this chapter.
SSMS lets you create solutions that consist of projects, which contain T-SQL scripts, XML files,
connection information, and other files. By default, projects and solutions are saved in your My Documents\
SQL Server Management Studio\Projects directory. Solution files have the extension .ssmssln, and
project files are saved in an XML format with the .ssmssqlproj extension. SSMS incorporates a Solution
Explorer window similar to Visual Studio's Solution Explorer, as shown in Figure 2-14. You can access the
Solution Explorer through the View menu.
Figure 2-14. Viewing a solution in the SSMS Solution Explorer
SSMS can take advantage of source-control integration with TFS to help you manage versioning and
deployment. To use SSMS’s source-control integration, you have to set the appropriate source-control option
in the Options menu. The Options window is shown in Figure 2-15.
Note To use SSMS with TFS, you need to download and install the appropriate Microsoft Source Code
Control Interface (MSSCCI) provider from Microsoft. Search for “MSSCCI,” and download
the Visual Studio Team System 2010, 2012, or 2013 version of the MSSCCI provider, depending on which version
of Visual Studio you're already using.
After you create a solution and add projects, connections, and SQL scripts, you can add your solution to
TFS by right-clicking the solution in the Solution Explorer and selecting Add Solution to Source Control.
To check out items from source control, open a local copy and choose Check Out for Edit. You can
find options for checking out items from source control on the File ➤ Source Control menu. After checking
out a solution from TFS, SSMS shows you the pending check-ins, letting you add comments to or check in
individual files or projects.
Figure 2-15. Viewing the source-control options
The Object Explorer
The SSMS Object Explorer lets you view and manage database and server objects. In the Object Explorer,
you can view tables, stored procedures (SPs), user-defined functions (UDFs), HTTP endpoints, users, logins,
and just about every other database-specific or server-scoped object. Figure 2-16 shows the Object Explorer
in the left pane and the Object Explorer Details tab on the right.
Figure 2-16. Viewing the Object Explorer and the Object Explorer Details tab
Most objects in the Object Explorer and the Object Explorer Details tab have object-specific pop-up
context menus. Right-clicking any given object brings up the menu. Figure 2-17 shows an example pop-up
context menu for database tables.
Figure 2-17. Object Explorer database table pop-up context menus
Object Explorer in SQL Server 2014 allows developers to filter specific types of objects from all the
database objects. To filter objects, type text with optional wildcard characters in the Object Explorer Details
window, and press Enter. Optionally, you can filter objects using the Filter icon on the Object Explorer
Details toolbar. Figure 2-18 shows an example of filtering objects named “Person”.
The SQLCMD Utility
The SQLCMD utility was originally introduced in SQL Server 2005 as an updated replacement for the SQL
2000 osql command-line utility. You can use SQLCMD to execute batches of T-SQL statements from script
files, individual queries or batches of queries in interactive mode, or individual queries from the command
line. This utility uses SQL Server Native Client to execute the T-SQL statements.
Note Appendix D provides a quick reference to SQLCMD command-line options, scripting variables, and
commands. The descriptions in the appendix are based on extensive testing of SQLCMD and differ in some
areas from the descriptions given in BOL.
SQLCMD supports a wide variety of command-line switches, making it a flexible utility for one-off
batch or scheduled script execution. The following command demonstrates the use of some commonly used
command-line options to connect to an SQL Server instance named SQL2014 and execute a T-SQL script in
the AdventureWorks2014 database:
sqlcmd -S SQL2014 -E -d AdventureWorks2014 -i "d:\scripts\ListPerson.sql"
The options include -S to specify the server\instance name, -E to indicate Windows authentication, -d
to set the database name, and -i to specify the name of a script file to execute. The command-line switches
are all case sensitive, so -v is a different option from -V, for instance.
SQLCMD allows you to use scripting variables that let you use a single script in multiple scenarios.
Scripting variables provide a mechanism for customizing the behavior of T-SQL scripts without modifying
the scripts’ content. You can reference scripting variables that were previously set with the -v command-line
switch, with the SQLCMD :setvar command (discussed in the next section), or via Windows environment
variables. You can also use any of the predefined SQLCMD scripting variables from within your script. The
format to access any of these types of scripting variables from within your script is the same: $(variable_
name). SQLCMD replaces your scripting variables with their respective values during script execution.
Listing 2-1 shows some examples of scripting variables in action.
Figure 2-18. Object Explorer with database objects filtered on “Person”
Listing 2-1. Using Scripting Variables in an SQLCMD Script

-- Windows environment variable
SELECT '$(PATH)';

-- SQLCMD scripting variable
SELECT '$(SQLCMDSERVER)';

-- Command-line scripting variable, set with the -v COLVAR="Name" switch
SELECT $(COLVAR)
FROM Sys.Tables;
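As a sketch of how such a script might be invoked, the command-line scripting variable can be supplied with the -v switch; the script file name and the server name below are assumptions carried over from the earlier example:

```shell
sqlcmd -S SQL2014 -E -d AdventureWorks2014 -v COLVAR="Name" -i "d:\scripts\ListTables.sql"
```

During execution, SQLCMD substitutes $(COLVAR) with Name, so the final statement runs as SELECT Name FROM Sys.Tables;.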
Because scripting variables are replaced in a script wholesale, some organizations may consider their
use a security risk due to the possibility of SQL injection-style attacks. For this reason, you may choose to
turn off this feature by using the -x command-line option, which disables variable substitution.
An example of an SQLCMD scripting variable is the predefined SQLCMDINI, which specifies the
SQLCMD startup script. The startup script is run every time SQLCMD is run. It's useful for setting scripting
variables with the :setvar command, setting initial T-SQL options such as QUOTED_IDENTIFIER and
ANSI_PADDING, and performing any necessary database tasks before other scripts are run.
In addition to T-SQL statements, SQLCMD recognizes several commands specific to the application.
SQLCMD commands allow you to perform tasks like listing servers and scripting variables, connecting to
a server, and setting scripting variables, among others. Except for the batch terminator GO, all SQLCMD
commands begin with a colon (:).
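For example, a short interactive exchange mixing SQLCMD commands and T-SQL might look like the following sketch (the variable name is illustrative; note that the :setvar command executes immediately, so the batch line number stays at 1>):

```sql
1> :setvar COLVAR Name
1> SELECT TOP 5 $(COLVAR) FROM Sys.Tables;
2> GO
```

Only when GO is entered is the cached T-SQL batch sent to the server.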
SQLCMD can also be run interactively. To start an interactive mode session, run SQLCMD with any of
the previous options that don’t exit immediately on completion.
Note SQLCMD options such as -Q, -i, -Z, and -? exit immediately on completion. You can't start an
interactive SQLCMD session if you specify any of these command-line options.
During an interactive SQLCMD session, you can run T-SQL queries and commands from the SQLCMD
prompt. The interactive screen looks similar to Figure 2-19.
The SQLCMD prompt indicates the current line number of the batch (1>, 2>, and so on). You can enter
T-SQL statements or SQLCMD commands at the prompt. T-SQL statements are stored in the statement
cache as they’re entered; SQLCMD commands are executed immediately. Once you have entered a
complete batch of T-SQL statements, use the GO batch terminator to process all the statements in the cache.
SQLCMD has support for the AlwaysOn feature. You can use the -K switch to specify the listener name.
There has been a behavior change in SQLCMD's XML output as well. In SQL Server 2008, text data that
contained a single quote was always replaced with an apostrophe; this behavior has been addressed in SQL
Server 2012. Additionally, legacy datetime values with no fractional seconds do not return three decimal
digits; however, other datetime data types aren't affected.
SQL Server Data Tools
SQL Server 2014 ships with a new developer toolset named SQL Server Data Tools (SSDT), which serves as
a replacement for Business Intelligence Development Studio (BIDS). In the highly competitive business world,
the top three challenges today's developers face are collaboration, targeting different database platforms with
the same codebase, and code stability. SSDT is designed to help with these challenges. It provides a tool that
enables you to add validations at design time instead of at runtime. A common pitfall for developers is that
errors that aren't apparent at design time are discovered only at runtime, and SSDT serves to eliminate this issue.
You can code, build, debug, package, and deploy code without leaving the tool. After importing or
creating a new database project, you can alter the project properties to target a specific database version.
The underlying compiler uses the database version rules engine and compiles the project based on the
database edition features. For example, if you’re developing code for SQL Azure, the tool knows that you
can’t use sequence objects. This type of built-in intelligence in the tool is key to faster effective development
so you don’t discover issues at runtime, which would require rearchitecting the application.
This type of feature is also helpful when you’re upgrading from an older version of SQL to a newer
version. The compiler tells you if the older code will generate errors in the newer version of SQL.
Figure 2-19. Sample query run from the SQLCMD interactive prompt
You can create objects and buffer object editing, with T-SQL IntelliSense support. Once you finalize
development, you can choose the platform to deploy to, and the project is deployed with a single click.
SQL Profiler
SQL Profiler is the primary tool for analyzing SQL Server performance. If you have a performance problem
but aren’t sure where the bottleneck lies, SQL Profiler can help you rapidly narrow down the suspects. It
works by capturing events that occur on the server and logging them to a trace file or table. The classes of
events that can be captured are exhaustive, covering a wide range of server-side events including T-SQL and
SP preparation and execution, security events, transaction activity, locks, and database resizing.
When you create a new trace, SQL Profiler allows you to select all the events you wish to audit.
Normally, you narrow this list as much as possible for both performance and manageability reasons.
Figure 2-21 is a sample trace that captures T-SQL–specific events on the server.
Figure 2-20. SSDT New Project window
SSDT can be used for both connected and disconnected development in the case of a team project.
Figure 2-20 shows the New Project window, which is based on the familiar SSMS Object Explorer.
Once a trace is configured and running, it captures all the specified events on the server. A sample trace
run using T-SQL events is shown in Figure 2-22.
Figure 2-22. Running a trace of T-SQL events
Figure 2-21. Preparing to capture T-SQL events in SQL Profiler
As you can see in the example, even a simple trace that captures a relatively small number of events can
easily become overwhelming, particularly if run against an SQL Server instance with several simultaneous
user connections. SQL Profiler offers the Column Filter option, which lets you eliminate results from a trace.
Using filters, you can narrow the results to include only actions performed by specific applications or users,
or activities relevant only to a particular database. Figure 2-23 shows the Edit Filter window where you select
trace filters.
Figure 2-23. Editing filters in SQL Profiler
SQL Profiler offers several additional options, including trace replay and the ability to save trace results
to either a file or a database table. SQL Profiler is vital to troubleshooting SQL Server performance and
security issues.
SQL Server 2014 lists SQL Profiler for trace capture and trace replay as deprecated; they won’t be
supported in future versions of SQL Server. However, for analysis services workloads, both trace capture and
trace replay will be supported. The replacement feature for the deprecated functionality is Extended Events.
Extended Events
These days it's common to have complex systems with hundreds of cores, supporting applications that
scale out across a set of SQL Servers. The SQL Servers that support these complex applications
use various features such as compression to reduce storage costs, high availability, and disaster-recovery
features. For such complex systems, performance monitoring is vital: Extended Events is designed to handle
these situations and diagnose issues without adding a performance penalty.
The Extended Events (XEvents) diagnostic tool was introduced in SQL Server 2008, and it received a
makeover in SQL Server 2012 with a new GUI interface to aid ease of use. It's a lightweight, asynchronous
eventing system that can retrieve information based on events triggered in the SQL engine. You can use
XEvents to track both high-level issues, such as query execution or blocking in the server, and low-level issues
that are very close to the SQL Server code, such as how long it took for spinlocks to back off. XEvents can
be used to collect additional data about any event and perform predefined actions such as taking a memory
dump when events happen; for example, you may be working with an application whose developer requests
that you take a memory dump when a specific query executes.
Results from XEvents can be written to various targets, including the Windows trace file. If you have an
application that is gathering diagnostic information from IIS, and you want to correlate the data from SQL
Server, writing to the Windows trace file will make debugging much easier. The event data that has been
written to the Windows trace file can be viewed using a tool such as Xperf or tracerpt. As with any diagnostic
tool, the data that is collected can be saved to multiple locations, including the file system, tables, and
Windows logging, simultaneously. Figure 2-24 shows the Extended Events user interface.
Figure 2-24. Extended Events new session
XEvents has been implemented by the SQL Engine, merge replication, analysis services, and reporting
services in SQL Server 2014. In some of the components, such as analysis services, it’s targeted information
and not a complete implementation.
The XEvents UI is integrated with Management Studio: the Object Explorer tree has a separate node called
Extended Events. You can create a new session by right-clicking the Extended Events node and selecting
New Session. XEvents sessions can be based on predefined templates, or you can create a session by
choosing specific events.
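Sessions can also be created entirely in T-SQL with CREATE EVENT SESSION. The following is a minimal sketch, not a production configuration; the session name, duration threshold, and file path are illustrative assumptions:

```sql
-- Capture completed batches that ran longer than one second
-- (the duration predicate is expressed in microseconds)
CREATE EVENT SESSION LongBatches ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE duration > 1000000)
-- Write captured events to an event_file target on disk
ADD TARGET package0.event_file
    (SET filename = N'C:\Temp\LongBatches.xel');
GO

-- Start collecting events
ALTER EVENT SESSION LongBatches ON SERVER STATE = START;
GO
```

The resulting .xel file can then be opened and analyzed directly in SSMS.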
XEvents offers a rich diagnostic framework that is highly scalable and offers the capability to collect
small or large amounts of data in order to troubleshoot a given performance issue. Another reason to start
using XEvents is that SQL Profiler has been marked for deprecation. Extended Events is discussed in detail in
Chapter 19.
SQL Server Integration Services
SSIS was introduced in SQL Server 2005 as the replacement for SQL Server 7.0 and 2000 Data
Transformation Services (DTS). SSIS provides an enterprise-class Extract Transform Load (ETL) tool that
allows you to design simple or complex packages to extract data from multiple sources and integrate them
into your SQL Server databases. It also provides rich BI integration and extensibility. In addition to data
transformations, SSIS provides SQL Server–specific tasks that allow you to perform database-administration
and -management functions like updating statistics and rebuilding indexes.
SSIS divides the ETL process into three major parts: control flow, data flow, and event handlers.
The control flow provides structure to SSIS packages and controls execution via tasks, containers, and
precedence constraints. The data flow imports data from various sources, transforms it, and stores it in
specified destinations. The data flow, from the perspective of the control flow, is just another task. However,
the data flow is important enough to require its own detailed design surface in a package. Event handlers
allow you to perform actions in response to predefined events during the ETL process. Figure 2-25 shows a
simple SSIS data flow that imports data from a table into a flat file.
SSIS is a far more advanced ETL tool than DTS, and it provides significant improvements in features,
functionality, and raw power over the old DTS tools.
The Bulk Copy Program
Although it isn't as flashy or feature-rich as SSIS, the Bulk Copy Program (BCP) is small and fast, and it can
perform simple imports with no hassle. BCP is handy for generating format files for BCP and other bulk-import
tools, for one-off imports where a full-blown SSIS package would be overkill, for exporting data from database
tables to files, and for backward compatibility when you don't have the resources to devote to immediately
upgrading old BCP-based ETL processes.
Figure 2-25. Data flow to import data from a table to flat file
Figure 2-26 shows a simple command-line call to BCP to create a BCP format file and a listing of the
format file. The format files generated by BCP can be used by BCP, SSIS, and the T-SQL BULK INSERT statement.
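As a sketch of the kind of commands Figure 2-26 illustrates, the following generates a format file and then uses it for an export; the server, table, and file names are illustrative assumptions:

```shell
rem Generate a native-mode (-n) format file describing the table's layout;
rem "format nul" tells bcp to produce only the format file, no data
bcp AdventureWorks2014.Person.Person format nul -n -f Person.fmt -S SQL2014 -T

rem Export the table's rows to a data file using that format file
bcp AdventureWorks2014.Person.Person out Person.dat -f Person.fmt -S SQL2014 -T
```

The -T switch uses Windows authentication, paralleling the -E switch shown earlier for SQLCMD.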
Figure 2-26. Generating a format file with BCP
SQL Server 2014 Books Online
Books Online (BOL) is the primary reference for SQL Server programming and administration. SQL Server
2014 introduces the Help Viewer piece from the VS2010 shell and doesn't include BOL with the default
setup. During the SQL installation, you have the option to choose the documentation feature, which in turn
installs the Help Viewer.
You also have the option to install BOL from an online resource. You can access a locally installed copy
of BOL, or you can access it over the Web at Microsoft's web site. Figure 2-27 shows a search of a local copy of BOL.
You can get updates for BOL from Microsoft's web site, where the online version of BOL is also
available. Also keep in mind that you can search online and local versions of BOL, as well as several other
SQL resources, via the Help Search function discussed previously in this chapter.
Tip Microsoft now offers an additional option for obtaining the most up-to-date version of BOL. You can
download the latest BOL updates from the Microsoft Update site. Microsoft has announced plans to refresh
BOL with updated content more often and to integrate SQL Server developer and DBA feedback into BOL
more quickly.
The AdventureWorks Sample Database
SQL Server 2014 has two main sample databases: the AdventureWorks2014 OLTP and SQL Server 2014 RTM
In-Memory OLTP databases. This book refers to the AdventureWorks2014 OLTP database for most examples.
Microsoft now releases SQL Server sample databases through its CodePlex web site, where you can download
the AdventureWorks databases and associated sample code.
Figure 2-27. Searching local BOL for information about the SELECT statement
Note It’s highly recommended that you download the SQL Server AdventureWorks2014 OLTP database so
that you can run the sample code in this book as you go through each chapter.
Summary
SQL Server 2014 includes the tools you’ve come to expect with any SQL Server release. This chapter has
provided an overview of several tools that will be important to you as an SQL Server 2014 developer. The
tools discussed include the following:
•	SSMS, the primary GUI for SQL Server development and administration
•	SQLCMD, SSMS’s text-based counterpart
•	SSDT, an integrated tool for developers
•	SQL Profiler, which supplies event-capture and server-side tracing capabilities for analyzing SQL Server performance and auditing security
•	Extended Events, a lightweight, asynchronous, event-based troubleshooting tool
•	SSIS, the primary ETL tool for SQL Server 2014
•	BCP, a command line–based bulk import tool
•	BOL, the first place to look when you’re trying to locate information about all things SQL Server
•	AdventureWorks, the freely available Microsoft-supplied sample database
These topics could easily fill a book by themselves (and many, in fact, have). The following chapters
review the SQL Server 2014 features in detail.
Exercises
1. SSDT is an SQL development tool. What tools did SSDT replace?
2. [Choose all that apply] SQL Server 2014 SSMS provides which of the following features?
a. Ability to add code snippets and customize them
b. An integrated Object Explorer for viewing and managing the server, databases,
and database objects
c. IntelliSense, which suggests table, object, and function names as you type SQL
d. Customizable keyboard mapping scheme for Visual Studio users
3. SSIS is considered what type of tool?
4. [True/False] SQLCMD can use command-line options, environment variables, and
SQLCMD :setvar commands to set scripting variables.
5. [Choose one] BCP can be used to perform which of the following tasks?
a. Generating format files for use with SSIS
b. Importing data into tables without format files
c. Exporting data from a table to a file
d. All of the above
6. What is one feature that Extended Events offers that SQL Profiler doesn’t?
7. What are the target platforms that can be deployed using SSDT?
Chapter 3
Procedural Code
T-SQL has always included support for procedural programming in the form of control-of-flow statements
and cursors. One thing that throws developers from other languages off their guard when migrating to SQL
is the peculiar three-valued logic (3VL) we enjoy. Chapter 1 introduced you to SQL 3VL, and this chapter
expands further on this topic. SQL 3VL is different from most other programming languages’ simple
two-valued Boolean logic. This chapter also discusses T-SQL control-of-flow constructs, which allow you
to change the normally sequential order of statement execution. Control-of-flow statements let you branch
your code logic with statements like IF...ELSE..., perform loops with statements like WHILE, and perform
unconditional jumps with the GOTO statement. You’re also introduced to CASE expressions and CASE-derived
functions that return values based on given comparison criteria in an expression. Finally, we finish the
chapter by explaining a topic closely tied to procedural code: SQL cursors.
Note Technically the T-SQL TRY...CATCH and the newer TRY_PARSE and TRY_CONVERT are control-of-flow
constructs. But these are specifically used for error handling and are discussed in Chapter 18, which describes
error handling and dynamic SQL.
Three-Valued Logic
SQL Server 2014, like all ANSI-compatible SQL DBMS products, implements a peculiar form of logic known
as 3VL. 3VL is necessary because SQL introduces the concept of NULL to serve as a placeholder for values that
aren’t known at the time they’re stored in the database. The concept of NULL introduces an unknown logical
result into SQL’s ternary logic system. Let’s begin looking at SQL 3VL with a simple set of propositions:
•	Consider the proposition “1 is less than 3.” The result is logically true because the value of the number 1 is less than the value of the number 3.
•	The proposition “5 is equal to 6” is logically false because the value of the number 5 isn’t equal to the value of the number 6.
•	The proposition “X is greater than 10” presents a bit of a problem. The variable X is an algebraic placeholder for an actual value. Unfortunately, we haven’t told you what value X stands for at this time. Because you don’t know what the value of X is, you can’t say the statement is true or false; instead you can say the result is unknown. SQL NULL represents an unknown value in the database in much the same way that the variable X represents an unknown value in this proposition, and comparisons with NULL produce the same unknown logical result in SQL.
Because NULL represents unknown values in the database, comparing anything with NULL (even other NULLs)
produces an unknown logical result. Figure 3-1 is a quick reference for SQL Server 3VL, where p and q
represent 3VL result values.
Figure 3-1. SQL 3VL quick reference chart
As mentioned previously, the unknown logic values shown in the chart are the result of comparisons
with NULL. The following predicates, for example, all evaluate to an unknown result:
@x = NULL
FirstName <> NULL
PhoneNumber > NULL
If you used one of these as the predicate in a WHERE clause of a SELECT statement, the statement would
return no rows—SELECT with a WHERE clause returns only rows where the WHERE clause predicate evaluates to
true; it discards rows for which the WHERE clause is false or unknown. Similarly, the INSERT, UPDATE, and DELETE
statements with a WHERE clause only affect rows for which the WHERE clause evaluates to true.
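The effect is easy to demonstrate. In this sketch against the AdventureWorks2014 schema used throughout the chapter, the first query returns no rows even though the table contains NULL middle names; only the IS NULL predicate finds them:

```sql
-- Comparing with = NULL evaluates to unknown for every row,
-- so this query returns an empty result set
SELECT BusinessEntityID, MiddleName
FROM Person.Person
WHERE MiddleName = NULL;

-- IS NULL is the ISO-compliant test for NULL,
-- so this query returns the rows with no middle name
SELECT BusinessEntityID, MiddleName
FROM Person.Person
WHERE MiddleName IS NULL;
```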
SQL Server provides a proprietary mechanism, the SET ANSI_NULLS OFF option, to allow direct equality
comparisons with NULL using the = and <> operators. The only ISO-compliant way to test for NULL is with the
IS NULL and IS NOT NULL comparison predicates. We highly recommend that you stick with the
ISO-compliant IS NULL and IS NOT NULL predicates for a few reasons:
•	Many SQL Server features like computed columns, indexed views, and XML indexes require SET ANSI_NULLS ON at creation time.
•	Mixing and matching SET ANSI_NULLS settings in your database can confuse other developers who have to maintain your code. Using ISO-compliant NULL handling consistently eliminates confusion.
•	SET ANSI_NULLS OFF allows direct equality comparisons with NULL, returning true if you compare a column or variable to NULL. It doesn’t return true if you compare NULLs contained in two columns, though, which can be confusing.
•	To top it all off, Microsoft has deprecated the SET ANSI_NULLS OFF setting. It will be removed in a future version of SQL Server, so it’s a good idea to start future-proofing your code now.
It’S a CLOSeD WOrLD, aFter aLL
The closed-world assumption (CWA) is an assumption in logic that the world is “black and white,”
“true or false,” or “ones and zeros.” When applied to databases, the CWA basically states that all data
stored in the database is true; everything else is false. The CWA presumes that only knowledge of the
world that is complete can be stored in a database.
NULL introduces an open-world assumption (OWA) to the mix. It allows you to store information in the
database that may or may not be true. This means an SQL database can store incomplete knowledge
of the world—a direct violation of the CWA. Many relational model (RM) theorists see this as an
inconsistency in the SQL DBMS model. This argument fills many an RM textbook and academic blog,
including web sites like Hugh Darwen and C. J. Date’s The Third Manifesto,
so we won’t go deeply into the details here. Just realize that many RM experts dislike SQL NULL. As an
SQL practitioner in the real world, however, you may discover that NULL is often the best option available
to accomplish many tasks.
Control-of-Flow Statements
T-SQL implements procedural language control-of-flow statements, including such constructs as
BEGIN...END, IF...ELSE, WHILE, and GOTO. T-SQL’s control-of-flow statements provide a framework for
developing rich server-side procedural code. Procedural code in T-SQL does come with some caveats,
though, which we discuss in this section.
The BEGIN and END Keywords
T-SQL uses the keywords BEGIN and END to group multiple statements together in a statement block. The BEGIN
and END keywords don’t alter execution order of the statements they contain, nor do they define an atomic
transaction, limit scope, or perform any function other than defining a simple grouping of T-SQL statements.
Unlike other languages, such as C++ and C#, which use braces ({ }) to group statements in logical
blocks, T-SQL’s BEGIN and END keywords don’t define or limit scope. The following sample C# code, for
instance, won’t even compile:

{
    int j = 10;
}
Console.WriteLine(j);

C# programmers will automatically recognize that the variable j in the previous code is defined inside
braces, limiting its scope and making it accessible only inside the braces. T-SQL’s roughly equivalent code,
however, doesn’t limit scope in this manner:

BEGIN
    DECLARE @j int = 10;
END
PRINT @j;

The previous T-SQL code executes with no problem, as long as the DECLARE statement is encountered
before the variable is referenced in the PRINT statement. The scope of variables in T-SQL is defined in terms
of command batches and database object definitions (such as SPs, UDFs, and triggers). Declaring two or
more variables with the same name in one batch or SP results in errors.
Caution T-SQL’s BEGIN and END keywords create a statement block but don’t define a scope. Variables
declared in a BEGIN...END block aren’t limited in scope just to that block, but are scoped to the whole batch,
SP, or UDF in which they’re defined.
BEGIN...END is useful for creating statement blocks where you want to execute multiple statements
based on the results of other control-of-flow statements like IF...ELSE and WHILE. BEGIN...END can also
have another added benefit if you’re using SSMS 2014 or a good third-party SQL editor like ApexSQL Edit:
BEGIN...END can alert the GUI that a section of code is collapsible. Figure 3-2 shows
more than one region of code that is collapsible. This can speed up development and ease debugging,
especially if you’re writing complex T-SQL scripts.
Figure 3-2. BEGIN...END statement blocks marked collapsible in SSMS
Tip Although it’s not required, we like to wrap the body of CREATE PROCEDURE statements with
BEGIN...END. This clearly delineates the body of the stored procedure. This is purely a coding style preference
and has no effect on the stored procedure’s performance or function.
The IF...ELSE Statement
Like many procedural languages, T-SQL implements conditional execution of code using the simplest
of procedural statements: the IF...ELSE construct. The IF statement is followed by a logical predicate.
If the predicate evaluates to true, the single SQL statement or statement block wrapped in BEGIN...END
is executed. If the predicate evaluates to either false or unknown, SQL Server falls through to the ELSE
statement and executes the single statement or statement block following ELSE.
Tip A predicate in SQL is an expression that evaluates to one of the logical results true, false, or unknown.
Predicates are used in IF...ELSE statements, WHERE clauses, and anywhere that a logical result is needed.
The example in Listing 3-1 performs up to three comparisons to determine whether a variable is equal
to a specified value. The second ELSE statement executes if and only if the tests for both true and false
conditions fail.
Listing 3-1. Simple IF...ELSE Example
DECLARE @i int = NULL;
IF @i = 10
    PRINT 'TRUE';
ELSE IF NOT (@i = 10)
    PRINT 'FALSE';
ELSE
    PRINT 'UNKNOWN';
Because the variable @i is NULL in the example, SQL Server reports that the result is unknown. If you
assign the value 10 to the variable @i, SQL Server will report that the result is true; all other values will
report false.
To create a statement block containing multiple T-SQL statements after either the IF statement or the
ELSE statement, simply wrap your statements with the T-SQL BEGIN and END keywords discussed in the
previous section. The example in Listing 3-2 is an IF...ELSE statement with statement blocks. The example
uses IF...ELSE to check the value of the variable @direction. If @direction is ASCENDING, a message is
printed, and the top ten names, in order of last name, are selected from the Person.Person table. If
@direction is DESCENDING, a different message is printed, and the bottom ten names are selected from the
Person.Person table. Any other value results in a message that @direction was not recognized. The results
of Listing 3-2 are shown in Figure 3-3.
Listing 3-2. IF...ELSE with Statement Blocks
DECLARE @direction nvarchar(20) = N'DESCENDING';
IF @direction = N'ASCENDING'
BEGIN
    PRINT 'Start at the top!';
    SELECT TOP (10) LastName, FirstName, MiddleName FROM Person.Person ORDER BY LastName ASC;
END
ELSE IF @direction = N'DESCENDING'
BEGIN
    PRINT 'Start at the bottom!';
    SELECT TOP (10) LastName, FirstName, MiddleName FROM Person.Person ORDER BY LastName DESC;
END
ELSE
    PRINT '@direction was not recognized!';
The WHILE, BREAK, and CONTINUE Statements
Looping is a standard feature of procedural languages, and T-SQL provides looping support through the
WHILE statement and its associated BREAK and CONTINUE statements. The WHILE loop is immediately followed
by a predicate; WHILE executes a given SQL statement or statement block bounded by the BEGIN and END
keywords as long as the associated predicate evaluates to true. If the predicate evaluates to false or unknown,
the code in the WHILE loop doesn’t execute and control passes to the next statement after the WHILE loop. The
WHILE loop in Listing 3-3 is a very simple example that counts from 1 to 10. The result is shown in Figure 3-4.
Listing 3-3. WHILE Statement Example
DECLARE @i int = 1;
WHILE @i <= 10
BEGIN
    PRINT @i;
    SET @i = @i + 1;
END
Figure 3-3. The last ten contact names in the AdventureWorks database
Tip Be sure to update your counter or other flag in the WHILE loop. The WHILE statement will keep looping
until its predicate evaluates to false or unknown. A simple coding mistake can create a nasty infinite loop.
T-SQL also includes two additional keywords that can be used with the WHILE statement: BREAK and
CONTINUE. The CONTINUE keyword forces the WHILE loop to immediately jump to the start of the code block,
as in the modified example in Listing 3-4.
Listing 3-4. WHILE...CONTINUE Example
DECLARE @i int = 1;
WHILE @i <= 10
BEGIN
    PRINT @i;
    SET @i = @i + 1;
    CONTINUE; -- Force the WHILE loop to restart
    PRINT 'The CONTINUE keyword ensures that this will never be printed.';
END
The BREAK keyword, on the other hand, forces the WHILE loop to terminate immediately. In Listing 3-5,
BREAK forces the WHILE loop to exit during the first iteration so that the numbers 2 through 10 are never printed.
Listing 3-5. WHILE...BREAK Example
DECLARE @i int = 1;
WHILE @i <= 10
BEGIN
    PRINT @i;
    SET @i = @i + 1;
    BREAK; -- Force the WHILE loop to terminate
    PRINT 'The BREAK keyword ensures that this will never be printed.';
END
Figure 3-4. Counting from 1 to 10 with WHILE
Tip BREAK and CONTINUE should be avoided in most cases. It’s not uncommon to see a WHILE 1 = 1
statement with a BREAK in the body of the loop. This can always be rewritten, usually very easily, to remove the
BREAK statement. Most of the time, the BREAK and CONTINUE keywords introduce additional complexity to your
logic and cause more problems than they solve.
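As a sketch of the kind of rewrite the tip suggests, a WHILE 1 = 1 loop that BREAKs when no rows remain can carry that condition in the loop predicate instead (dbo.Staging here is a hypothetical work table):

```sql
-- BREAK-based version this replaces:
--   WHILE 1 = 1
--   BEGIN
--       DELETE TOP (1000) FROM dbo.Staging;
--       IF @@ROWCOUNT = 0 BREAK;
--   END

-- Same logic, with the exit condition in the WHILE predicate
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (1000) FROM dbo.Staging;  -- hypothetical table
    SET @rows = @@ROWCOUNT;
END
```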
The GOTO Statement
Despite Edsger W. Dijkstra’s best efforts at warning developers (see Dijkstra’s 1968 letter, “Go To Statement
Considered Harmful”),1 T-SQL still has a GOTO statement. The GOTO statement transfers control of your
program to a specified label unconditionally. Labels are defined by placing the label identifier on a line
followed by a colon (:), as shown in Listing 3-6. This simple example executes its step 1 and uses GOTO to
dive straight into step 3, skipping step 2. The results are shown in Figure 3-5.
Listing 3-6. Simple GOTO Example
PRINT 'Step 1 Begin.';
GOTO Step3_Label;
Step2_Label:
PRINT 'Step 2 will not be printed.';
Step3_Label:
PRINT 'Step 3 End.';
Figure 3-5. The GOTO statement transfers control unconditionally
The GOTO statement is best avoided, because it can quickly degenerate your programs into unstructured
spaghetti code. When you have to write procedural code, you’re much better off using structured
programming constructs like IF...ELSE and WHILE statements.
1. “A Case against the GO TO Statement,” by Edsger W. Dijkstra; Technological University Eindhoven, The Netherlands.
The WAITFOR Statement
The WAITFOR statement suspends execution of a transaction, SP, or T-SQL command batch until a specified
time is reached, a time interval has elapsed, or a message is received from Service Broker.
Note Service Broker is an SQL Server messaging system. We don’t detail Service Broker in this book, but
you can find out more about it in Pro SQL Server 2008 Service Broker, by Klaus Aschenbrenner (Apress, 2008).
The WAITFOR statement has a DELAY option that tells SQL Server to suspend code execution until a
specified time interval has elapsed. The time interval is specified as a valid
time string in the format hh:mm:ss. The time interval can’t contain a date portion; it must only include the
time, and it can be up to 24 hours. Listing 3-7 is an example of the WAITFOR statement with the DELAY option,
which blocks execution of the batch for 3 seconds.
WAITFOR CAVEATS
There are some caveats associated with the WAITFOR statement. In some situations, WAITFOR can
cause longer delays than the interval you specify. SQL Server also assigns each WAITFOR statement
its own thread, and if SQL Server begins experiencing thread starvation, it can randomly stop WAITFOR
threads to free up thread resources. If you need to delay execution for an exact amount of time, you can
guarantee more consistent results by suspending execution through an external application like SQL
Server Integration Services (SSIS).
In addition to its DELAY and TIME options, you can use WAITFOR with the RECEIVE and GET
CONVERSATION GROUP options with Service Broker–enabled applications. When you use WAITFOR with
RECEIVE, the statement waits for receipt of one or more messages from a specified queue.
When you use WAITFOR with the GET CONVERSATION GROUP option, it waits for a conversation group
identifier of a message. GET CONVERSATION GROUP allows you to retrieve information about a message
and lock the conversation group for the conversation containing the message, all before retrieving the
message itself.
Listing 3-7. WAITFOR Example
PRINT 'Step 1 complete. ';
DECLARE @time_to_pass nvarchar(8);
SELECT @time_to_pass = N'00:00:03';
WAITFOR DELAY @time_to_pass;
PRINT 'Step 2 completed three seconds later. ';
You can also use the TIME option with the WAITFOR statement. If you use the TIME option, SQL Server
waits until the appointed time before allowing execution to continue. Datetime variables are allowed, but
the date portion is ignored when the TIME option is used.
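As a brief sketch of the TIME option (the time of day shown is arbitrary), the following suspends the batch until 11:30 PM server time before continuing:

```sql
-- Suspend execution until the appointed time of day (hh:mm:ss)
WAITFOR TIME '23:30:00';
PRINT 'It is now 11:30 PM.';
```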
The RETURN Statement
The RETURN statement exits unconditionally from an SP or command batch. When you use RETURN, you can
optionally specify an integer expression as a return value. The RETURN statement returns a given integer
expression to the calling routine or batch. If you don’t specify an integer expression to return, a value of 0
is returned by default. RETURN isn’t normally used to return calculated results, except for UDFs, which offer
more RETURN options (as detailed in Chapter 4). For SPs and command batches, the RETURN statement is used
almost exclusively to return a success indicator, a failure indicator, or an error code.
All system SPs return 0 to indicate success, or a nonzero value to indicate failure (unless otherwise
documented in BOL). It’s considered bad form to use the RETURN statement to return anything other than
an integer status code from a script or SP.
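A minimal sketch of this convention follows; the procedure name and logic are hypothetical, but the pattern of capturing the status code with EXEC is the standard one:

```sql
-- Hypothetical procedure returning 0 for success, 1 for failure
CREATE PROCEDURE dbo.CheckValue @val int
AS
BEGIN
    IF @val IS NULL
        RETURN 1;   -- failure indicator
    RETURN 0;       -- success indicator
END;
GO

-- Capture the return code in the calling batch
DECLARE @rc int;
EXEC @rc = dbo.CheckValue @val = NULL;
PRINT @rc;  -- prints 1
```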
UDFs, on the other hand, have their own rules. UDFs have a flexible variation of the RETURN statement,
which exits the body of the UDF. In fact, a UDF requires the RETURN statement be used to return scalar
or tabular results to the caller. You see UDFs again in detail in Chapter 4.
Note There are a couple of methods in T-SQL to redirect logic flow based on errors. These include
the TRY...CATCH statement and the THROW statement. Both statements are discussed in detail in Chapter 18.
The CASE Expression
The T-SQL CASE function is SQL Server’s implementation of the ISO SQL CASE expression. Whereas the
previously discussed T-SQL control-of-flow statements allow for conditional execution of SQL statements
or statement blocks, the CASE expression allows for set-based conditional processing in a single query. CASE
provides two syntaxes, simple and searched, which are discussed in this section.
The Simple CASE Expression
The simple CASE expression returns a result expression based on the value of a given input expression. The
simple CASE expression compares the input expression to a series of expressions following WHEN keywords.
Once a match is encountered, CASE returns a corresponding result expression following the keyword THEN.
If no match is found, the expression following the keyword ELSE is returned. NULL is returned if no ELSE
keyword is supplied.
Consider the example in Listing 3-8, which uses a simple CASE expression to count all the
AdventureWorks customers on the West Coast (arbitrarily defined as the states of California, Washington,
and Oregon). The query also uses a common table expression (CTE, discussed more thoroughly in Chapter 9).
The results are shown in Figure 3-6.
Listing 3-8. Counting West Coast Customers with a Simple CASE Expression
WITH EmployeesByRegion(Region)
AS
(
    SELECT
        CASE sp.StateProvinceCode
            WHEN 'CA' THEN 'West Coast'
            WHEN 'WA' THEN 'West Coast'
            WHEN 'OR' THEN 'West Coast'
            ELSE 'Elsewhere'
        END
    FROM HumanResources.Employee e
    INNER JOIN Person.Person p
        ON e.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.BusinessEntityAddress bea
        ON bea.BusinessEntityID = e.BusinessEntityID
    INNER JOIN Person.Address a
        ON a.AddressID = bea.AddressID
    INNER JOIN Person.StateProvince sp
        ON sp.StateProvinceID = a.StateProvinceID
    WHERE sp.CountryRegionCode = 'US'
)
SELECT COUNT(Region) AS NumOfEmployees, Region
FROM EmployeesByRegion
GROUP BY Region;
Figure 3-6. Results of the West Coast Customer Count
The CASE expression in the subquery compares the StateProvinceCode value to each of the state codes
following the WHEN keywords, returning the name West Coast when StateProvinceCode is equal to CA, WA,
or OR. For any other StateProvinceCode in the United States, it returns a value of Elsewhere:
SELECT CASE sp.StateProvinceCode
    WHEN 'CA' THEN 'West Coast'
    WHEN 'WA' THEN 'West Coast'
    WHEN 'OR' THEN 'West Coast'
    ELSE 'Elsewhere'
END
The remainder of the example counts the number of rows returned by the query, grouped by Region.
The simple CASE expression performs basic equality comparisons between the input expression and the
expressions following the WHEN keywords. This means you can’t use the simple CASE expression to check
for NULLs. Recall from the “Three-Valued Logic” section of this chapter that a NULL, when compared to
anything, returns unknown. The simple CASE expression only returns the expression following the THEN
keyword when the comparison returns true. This means if you ever try to use NULL in a WHEN expression,
the corresponding THEN expression won’t be returned. If you need to check for NULL in a CASE expression,
use a searched CASE expression with the IS NULL or IS NOT NULL comparison operators.
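As a small sketch of that advice (the variable and labels here are illustrative), a searched CASE can classify NULL explicitly where a simple CASE cannot:

```sql
DECLARE @code nchar(2) = NULL;  -- illustrative variable

-- A simple CASE branch of WHEN NULL would never match;
-- the searched form with IS NULL handles the unknown case directly
SELECT CASE
           WHEN @code IS NULL THEN 'Unknown'
           WHEN @code = 'CA' THEN 'West Coast'
           ELSE 'Elsewhere'
       END AS Region;
```

With @code set to NULL, this returns Unknown; a simple CASE comparing against NULL would have fallen through to its ELSE branch instead.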
The Searched CASE Expression
The searched CASE expression provides a mechanism for performing more complex comparisons. The
searched CASE evaluates a series of predicates following WHEN keywords until it encounters one that evaluates
to true. At that point, it returns the corresponding result expression following the THEN keyword. If none of
the predicates evaluates to true, the result following the ELSE keyword is returned. If none of the predicates
evaluates to true and ELSE isn’t supplied, the searched CASE expression returns NULL.
Predicates in the searched CASE expression can take advantage of any valid SQL comparison operators
(such as <, >, =, LIKE, and IN). The simple CASE expression from Listing 3-8 can be easily expanded to cover
multiple geographic regions using the searched CASE expression and the IN logical operator, as shown in
Listing 3-9. This example uses a searched CASE expression to group states into West Coast, Pacific, and New
England regions. The results are shown in Figure 3-7.
Listing 3-9. Counting Employees by Region with a Searched CASE Expression
WITH EmployeesByRegion(Region)
AS
(
    SELECT
        CASE WHEN sp.StateProvinceCode IN ('CA', 'WA', 'OR') THEN 'West Coast'
             WHEN sp.StateProvinceCode IN ('HI', 'AK') THEN 'Pacific'
             WHEN sp.StateProvinceCode IN ('CT', 'MA', 'ME', 'NH', 'RI', 'VT')
                THEN 'New England'
             ELSE 'Elsewhere'
        END
    FROM HumanResources.Employee e
    INNER JOIN Person.Person p
        ON e.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.BusinessEntityAddress bea
        ON bea.BusinessEntityID = e.BusinessEntityID
    INNER JOIN Person.Address a
        ON a.AddressID = bea.AddressID
    INNER JOIN Person.StateProvince sp
        ON sp.StateProvinceID = a.StateProvinceID
    WHERE sp.CountryRegionCode = 'US'
)
SELECT COUNT(Region) AS NumOfCustomers, Region
FROM EmployeesByRegion
GROUP BY Region;
The searched CASE expression in the example uses the IN operator to return the geographic area that
StateProvinceCode is in: California, Washington, and Oregon all return West Coast; and Connecticut,
Massachusetts, Maine, New Hampshire, Rhode Island, and Vermont all return New England. If the
StateProvinceCode doesn’t fit in one of these regions, the searched CASE expression returns Elsewhere:
CASE WHEN sp.StateProvinceCode IN ('CA', 'WA', 'OR') THEN 'West Coast'
     WHEN sp.StateProvinceCode IN ('HI', 'AK') THEN 'Pacific'
     WHEN sp.StateProvinceCode IN ('CT', 'MA', 'ME', 'NH', 'RI', 'VT')
        THEN 'New England'
     ELSE 'Elsewhere'
END
The balance of the sample code in Listing 3-9 counts the rows returned, grouped by Region. The CASE
expression, either simple or searched, can be used in SELECT, UPDATE, INSERT, MERGE, and DELETE statements.
a CaSe BY aNY Other NaMe
Many programming and query languages offer expressions that are analogous to the SQL CASE
expression. C++ and C#, for instance, offer the ?: operator, which fulfills the same function as a
searched CASE expression. XQuery has its own flavor of if...then...else expression that is also
equivalent to the SQL searched CASE.
C# and Visual Basic supply the switch and Select statements, respectively, which are semi-analogous
to SQL’s simple CASE expression. The main difference, of course, is that SQL’s CASE expression returns
a scalar value, whereas the C# and Visual Basic statements actually control program flow, allowing you
to execute statements based on an expression’s value. The similarities and differences between SQL
expressions and statements and similar constructs in other languages provide a great starting point for
learning the nitty-gritty details of T-SQL.
CASE and Pivot Tables
Many times, business reporting requirements dictate that a result should be returned in pivot table format.
Pivot table format simply means the labels for columns and/or rows are generated from the data contained
in rows. Microsoft Access and Excel users have long had the ability to generate pivot tables on their data, and
SQL Server 2014 supports the PIVOT and UNPIVOT operators introduced in SQL Server 2005. Back in the days
of SQL Server 2000 and before, however, CASE expressions were the only method of generating pivot table–
type queries. And even though SQL Server 2014 provides the PIVOT and UNPIVOT operators, truly dynamic
pivot tables still require using CASE expressions and dynamic SQL.
Figure 3-7. Results of the regional customer count
The static pivot table query shown in
Listing 3-10 returns a pivot table–formatted result with the total number of orders for each AdventureWorks
sales region in the United States. The results are shown in Figure 3-8.
Listing 3-10. CASE-Style Pivot Table
SELECT
    SUM(CASE WHEN t.Name = 'Northwest' THEN 1 ELSE 0 END) AS Northwest,
    SUM(CASE WHEN t.Name = 'Northeast' THEN 1 ELSE 0 END) AS Northeast,
    SUM(CASE WHEN t.Name = 'Southwest' THEN 1 ELSE 0 END) AS Southwest,
    SUM(CASE WHEN t.Name = 'Southeast' THEN 1 ELSE 0 END) AS Southeast,
    SUM(CASE WHEN t.Name = 'Central' THEN 1 ELSE 0 END) AS Central
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory t
    ON soh.TerritoryID = t.TerritoryID
WHERE t.CountryRegionCode = 'US'
GROUP BY t.CountryRegionCode;
This type of static pivot table can also be used with the SQL Server 2014 PIVOT operator. The sample code
in Listing 3-11 uses the PIVOT operator to generate the same result as the CASE expressions in Listing 3-10.
Listing 3-11. PIVOT Operator Pivot Table
SELECT Northwest, Northeast, Southwest, Southeast, Central
FROM
(
    SELECT t.Name
    FROM Sales.SalesOrderHeader soh
    INNER JOIN Sales.SalesTerritory t
        ON soh.TerritoryID = t.TerritoryID
    WHERE t.CountryRegionCode = 'US'
) p
PIVOT
(
    COUNT (Name)
    FOR Name IN (Northwest, Northeast, Southwest, Southeast, Central)
) AS pvt;
On occasion, you may need to run a pivot table–style report where you don’t know the column names
in advance. In that case, you can use a temporary table and dynamic SQL to generate
a pivot table without specifying the column names in advance. Listing 3-12 demonstrates one method of
generating dynamic pivot tables in T-SQL. The results are shown in Figure 3-9.
Figure 3-8. Number of sales by region in pivot table format
Listing 3-12. Dynamic Pivot Table Query
-- Declare variables
DECLARE @sql nvarchar(4000);
DECLARE @temp_pivot table
(
    TerritoryID int NOT NULL PRIMARY KEY,
    CountryRegion nvarchar(20) NOT NULL,
    CountryRegionCode nvarchar(3) NOT NULL
);

-- Get column names from source table rows
INSERT INTO @temp_pivot (TerritoryID, CountryRegion, CountryRegionCode)
SELECT TerritoryID, Name, CountryRegionCode
FROM Sales.SalesTerritory;

-- Generate dynamic SQL query
SET @sql = N'SELECT' +
    SUBSTRING(
    (
        SELECT N', SUM(CASE WHEN t.TerritoryID = ' +
            CAST(TerritoryID AS nvarchar(3)) +
            N' THEN 1 ELSE 0 END) AS ' + QUOTENAME(CountryRegion) AS "*"
        FROM @temp_pivot
        FOR XML PATH('')
    ), 2, 4000) +
    N' FROM Sales.SalesOrderHeader soh ' +
    N' INNER JOIN Sales.SalesTerritory t ' +
    N' ON soh.TerritoryID = t.TerritoryID; ' ;

-- Print and execute dynamic SQL
PRINT @sql;
EXEC (@sql);
The script in Listing 3-12 first declares an nvarchar variable that holds the dynamically generated SQL
script and a table variable that holds all the column names, which are retrieved from the row values in the
source table:
-- Declare variables
DECLARE @sql nvarchar(4000);
DECLARE @temp_pivot table
(
    TerritoryID int NOT NULL PRIMARY KEY,
    CountryRegion nvarchar(20) NOT NULL,
    CountryRegionCode nvarchar(3) NOT NULL
);
Next, the script grabs a list of distinct territory-specific values from the table and stores them in the
@temp_pivot table variable. These values from the table become column names in the pivot table result:
-- Get column names from source table rows
INSERT INTO @temp_pivot (TerritoryID, CountryRegion, CountryRegionCode)
SELECT TerritoryID, Name, CountryRegionCode
FROM Sales.SalesTerritory;
The script then uses FOR XML PATH to efficiently generate the dynamic SQL SELECT query that contains
CASE expressions and column names generated dynamically based on the values in the @temp_pivot table
variable. This SELECT query creates the dynamic pivot table result:
-- Generate dynamic SQL query
SET @sql = N'SELECT' +
    SUBSTRING(
    (
        SELECT N', SUM(CASE WHEN t.TerritoryID = ' +
            CAST(TerritoryID AS nvarchar(3)) +
            N' THEN 1 ELSE 0 END) AS ' + QUOTENAME(CountryRegion) AS "*"
        FROM @temp_pivot
        FOR XML PATH('')
    ), 2, 4000) +
    N' FROM Sales.SalesOrderHeader soh ' +
    N' INNER JOIN Sales.SalesTerritory t ' +
    N' ON soh.TerritoryID = t.TerritoryID; ' ;
Figure 3-9. Dynamic pivot table result
Finally, the dynamic pivot table query is printed out and executed with the T-SQL PRINT and EXEC
statements:
-- Print and execute dynamic SQL
PRINT @sql;
EXEC (@sql);
Listing 3-13 shows the dynamic SQL pivot table query generated by the code in Listing 3-12.
Listing 3-13. Autogenerated Dynamic SQL Pivot Table Query
SELECT SUM(CASE WHEN t.TerritoryID = 1 THEN 1 ELSE 0 END) AS [Northwest],
 SUM(CASE WHEN t.TerritoryID = 2 THEN 1 ELSE 0 END) AS [Northeast],
 SUM(CASE WHEN t.TerritoryID = 3 THEN 1 ELSE 0 END) AS [Central],
 SUM(CASE WHEN t.TerritoryID = 4 THEN 1 ELSE 0 END) AS [Southwest],
 SUM(CASE WHEN t.TerritoryID = 5 THEN 1 ELSE 0 END) AS [Southeast],
 SUM(CASE WHEN t.TerritoryID = 6 THEN 1 ELSE 0 END) AS [Canada],
 SUM(CASE WHEN t.TerritoryID = 7 THEN 1 ELSE 0 END) AS [France],
 SUM(CASE WHEN t.TerritoryID = 8 THEN 1 ELSE 0 END) AS [Germany],
 SUM(CASE WHEN t.TerritoryID = 9 THEN 1 ELSE 0 END) AS [Australia],
 SUM(CASE WHEN t.TerritoryID = 10 THEN 1 ELSE 0 END) AS [United Kingdom]
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory t
  ON soh.TerritoryID = t.TerritoryID;
Caution Any time you use dynamic SQL, make sure you take precautions against SQL injection—that is,
malicious SQL code being inserted into your SQL statements. This example uses the QUOTENAME function to
quote the column names being dynamically generated, to help avoid SQL injection problems. Chapter 18 covers
dynamic SQL and SQL injection in greater detail.
The IIF Statement
SQL Server 2012 simplified the standard CASE statement by introducing the concept of an IIF statement. You
get the same results as you would using the CASE statement but with much less code. Those familiar with
Microsoft .NET will be glad to see that the same functionality is now part of T-SQL.
The syntax is simple. The command takes a Boolean expression, a value for when the expression
equates to true, and a value for when the expression equates to false. Listing 3-14 shows two examples: one
uses variables, and the other uses table columns. The output for both statements is shown in Figure 3-10.
Listing 3-14. Examples Using the IIF statement
--Example 1. IIF Statement Using Variables
DECLARE @valueA int = 85
DECLARE @valueB int = 45
SELECT IIF (@valueA < @valueB, 'True', 'False') AS Result
--Example 2. IIF Statement Using Table Column
SELECT IIF (Name in ('Alberta', 'British Columbia'), 'Canada', Name)
FROM [Person].[StateProvince]
The CHOOSE Function
Another logical function introduced in SQL Server 2012 is the CHOOSE function. The CHOOSE function allows
you to select a member of an array based on an integer index value. Simply put, CHOOSE lets you select a
member from a list. The member you select can be based on either a static index value or a computed value.
The syntax for the CHOOSE function is as follows:
CHOOSE ( index, val_1, val_2 [, val_n ] )
If the index value isn’t an integer (let’s say it’s a decimal), then SQL converts it to an integer. If the index
value is out of range for the index, then the function returns NULL. Listing 3-15 shows a simple example,
and Figure 3-11 shows the output. The example uses the integer value of PhoneNumberTypeID to determine
the type of phone. In this case, the phone type is defined in the table, so a CHOOSE function wouldn’t be
necessary; but in other cases, the value may not be defined.
Figure 3-10. Partial output of IIF statements
Listing 3-15. Example Using the CHOOSE Statement
SELECT p.FirstName,
CHOOSE(pp.PhoneNumberTypeID, 'Cell', 'Home', 'Work') 'Phone Type'
FROM Person.Person p
JOIN Person.PersonPhone pp
ON p.BusinessEntityID = pp.BusinessEntityID
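The index-handling rules described above are easy to see in a few standalone calls. This is a minimal sketch, not from the text; the exact result of a fractional index depends on the implicit conversion to int:

```sql
-- Valid index: returns the second list member
SELECT CHOOSE(2, 'Cell', 'Home', 'Work');    -- 'Home'
-- Non-integer index: implicitly converted to int before the lookup
SELECT CHOOSE(2.9, 'Cell', 'Home', 'Work');
-- Out-of-range indexes return NULL
SELECT CHOOSE(5, 'Cell', 'Home', 'Work');    -- NULL
SELECT CHOOSE(0, 'Cell', 'Home', 'Work');    -- NULL
```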
COALESCE and NULLIF
The COALESCE function takes a list of expressions as arguments and returns the first non-NULL value from the
list. COALESCE is defined by the ISO standard as shorthand for the following equivalent searched CASE expression:
CASE
  WHEN (expression1 IS NOT NULL) THEN expression1
  WHEN (expression2 IS NOT NULL) THEN expression2
  [ ... ]
END
Figure 3-11. Partial output of the CHOOSE statement
The following COALESCE function returns the value of MiddleName when MiddleName is not NULL, and the
string No Middle Name when MiddleName is NULL:
COALESCE (MiddleName, 'No Middle Name')
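In a full query this might look like the following sketch, which assumes the AdventureWorks Person.Person table used elsewhere in this chapter:

```sql
-- Replace NULL middle names with a placeholder string
SELECT FirstName,
    COALESCE(MiddleName, 'No Middle Name') AS MiddleName,
    LastName
FROM Person.Person;
```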
The NULLIF function accepts exactly two arguments. NULLIF returns NULL if the two expressions are
equal, and it returns the value of the first expression if the two expressions aren’t equal. NULLIF is defined by
the ISO standard as equivalent to the following searched CASE expression:
CASE WHEN expression1 = expression2 THEN NULL
  ELSE expression1
END
NULLIF is often used in conjunction with COALESCE. Consider Listing 3-16, which combines COALESCE
with NULLIF to return the string “This is NULL or A” if the variable @s is set to the character value A or NULL.
Listing 3-16. Using COALESCE with NULLIF
DECLARE @s varchar(10);
SELECT @s = 'A';
SELECT COALESCE(NULLIF(@s, 'A'), 'This is NULL or A');
T-SQL has long had alternate functionality similar to COALESCE. Specifically, the ISNULL function accepts
two parameters and returns the value of the second parameter if the first is NULL; otherwise, it returns the first.
The T-SQL functions COALESCE and ISNULL perform similar functions, but which one should you use?
COALESCE is more flexible than ISNULL and is compliant with the ISO standard to boot. This means
it’s also the more portable option among ISO-compliant systems. COALESCE also implicitly converts
the result to the data type with the highest precedence from the list of expressions. ISNULL implicitly
converts the result to the data type of the first expression. Finally, COALESCE is a bit less confusing than
ISNULL, especially considering that there’s already a comparison operator called IS NULL. In general,
we recommend using the COALESCE function instead of ISNULL.
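The data-type difference is easy to demonstrate with a deliberately narrow variable. In this sketch the replacement literal is longer than the variable's declared length:

```sql
DECLARE @s varchar(5) = NULL;
-- ISNULL converts the result to varchar(5), the type of the first argument,
-- so the replacement string is silently truncated
SELECT ISNULL(@s, 'No Middle Name');    -- 'No Mi'
-- COALESCE uses the highest-precedence type in the list, so nothing is lost
SELECT COALESCE(@s, 'No Middle Name');  -- 'No Middle Name'
```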
Cursors
The word cursor comes from the Latin word for runner, and that is exactly what a T-SQL cursor does: it
“runs” through a result set, returning one row at a time. Many T-SQL programming experts rail against the
use of cursors for a variety of reasons, chief among them the following:
• Cursors use a lot of overhead, often much more than an equivalent set-based
solution.
• Cursors override SQL Server's built-in query optimizations, often making them
much slower than an equivalent set-based solution.
Because cursors are procedural in nature, they’re often the slowest way to manipulate data in T-SQL.
Rather than spend the balance of the chapter ranting against cursor use, however, we’d like to introduce
T-SQL cursor functionality and play devil’s advocate to point out some areas where cursors provide an
adequate solution.
The first such area where we can recommend the use of cursors is in scripts or procedures that perform
administrative tasks. In administrative tasks, the following items often hold true:
• Unlike normal data queries and data manipulations that are performed dozens,
hundreds, or potentially thousands of times per day, administrative tasks are often
performed on a one-off basis or on a regular schedule, like once per day.
• Administrative tasks often require calling an SP or executing a procedural code block
once for each row when the tasks are based on a table of entries.
• Administrative tasks generally don't need to query or manipulate massive amounts
of data to perform their jobs.
• The order of the steps in which administrative tasks are performed, and the order of
the database objects they touch, are often important.
The sample SP in Listing 3-17 is an example of an administrative task performed with a T-SQL cursor.
The sample uses a cursor to loop through all indexes on all user tables in the current database. It then
creates dynamic SQL statements to rebuild every index whose fragmentation level is above a user-specified
threshold. The results are shown in Figure 3-12. Be aware that your results may return different values for
each row.
Listing 3-17. Sample Administrative Task Performed with a Cursor
CREATE PROCEDURE dbo.RebuildIndexes
( @ShowOrRebuild nvarchar(10) = N'show'
, @MaxFrag decimal(20, 2) = 20.0 )
AS
BEGIN
  SET NOCOUNT ON;

  -- Declare variables
  DECLARE @Schema nvarchar(128), @Table nvarchar(128)
    , @Index nvarchar(128), @Sql nvarchar(4000)
    , @DatabaseId int, @SchemaId int
    , @TableId int, @IndexId int;

  -- Create the index list table
  DECLARE @IndexList table
  ( DatabaseName nvarchar(128) NOT NULL
  , DatabaseId int NOT NULL
  , SchemaName nvarchar(128) NOT NULL
  , SchemaId int NOT NULL
  , TableName nvarchar(128) NOT NULL
  , TableId int NOT NULL
  , IndexName nvarchar(128)
  , IndexId int NOT NULL
  , Fragmentation decimal(20, 2)
  , PRIMARY KEY (DatabaseId, SchemaId, TableId, IndexId) );

  -- Populate index list table
  INSERT INTO @IndexList
  ( DatabaseName, DatabaseId
  , SchemaName, SchemaId
  , TableName, TableId
  , IndexName, IndexId
  , Fragmentation )
  SELECT db_name(), db_id()
    , s.Name, s.schema_id
    , t.Name, t.object_id
    , i.Name, i.index_id
    , MAX(ip.avg_fragmentation_in_percent)
  FROM sys.tables t
  INNER JOIN sys.schemas s ON
    t.schema_id = s.schema_id
  INNER JOIN sys.indexes i ON
    t.object_id = i.object_id
  INNER JOIN sys.dm_db_index_physical_stats (db_id(), NULL, NULL, NULL, NULL) ip ON
    ip.object_id = t.object_id AND ip.index_id = i.index_id
  WHERE ip.database_id = db_id()
  GROUP BY s.Name
    , s.schema_id
    , t.Name
    , t.object_id
    , i.Name
    , i.index_id;

  -- If the user specified rebuild, use a cursor to loop through all indexes and
  -- rebuild them
  IF @ShowOrRebuild = N'rebuild'
  BEGIN
    -- Declare a cursor to create the dynamic SQL statements
    DECLARE Index_Cursor CURSOR FAST_FORWARD
    FOR SELECT SchemaName, TableName, IndexName
      FROM @IndexList
      WHERE Fragmentation > @MaxFrag
      ORDER BY Fragmentation DESC, TableName ASC, IndexName ASC;

    -- Open the cursor for reading
    OPEN Index_Cursor;

    -- Loop through all the tables in the database
    FETCH NEXT FROM Index_Cursor
    INTO @Schema, @Table, @Index;
    WHILE @@FETCH_STATUS = 0
    BEGIN -- Create ALTER INDEX statement to rebuild index
      SET @Sql = N'ALTER INDEX ' +
        QUOTENAME(RTRIM(@Index)) + N' ON ' + QUOTENAME(RTRIM(@Schema)) + N'.' +
        QUOTENAME(RTRIM(@Table)) + N' REBUILD WITH (ONLINE = OFF); ';
      -- Execute dynamic SQL
      EXEC (@Sql);
      -- Get the next index
      FETCH NEXT FROM Index_Cursor
      INTO @Schema, @Table, @Index;
    END
    -- Close and deallocate the cursor.
    CLOSE Index_Cursor;
    DEALLOCATE Index_Cursor;
  END

  -- Show results, including old fragmentation and new fragmentation
  -- after index rebuild
  SET @DatabaseId = db_id();
  SELECT il.DatabaseName
    , il.SchemaName
    , il.TableName
    , il.IndexName
    , il.Fragmentation AS FragmentationStart
    , MAX( CAST(ip.avg_fragmentation_in_percent AS decimal(20, 2))
    ) AS FragmentationEnd
  FROM @IndexList il
  INNER JOIN sys.dm_db_index_physical_stats(@DatabaseId, NULL, NULL, NULL, NULL) ip ON
    il.DatabaseId = ip.database_id AND
    il.TableId = ip.object_id AND
    il.IndexId = ip.index_id
  GROUP BY il.DatabaseName
    , il.SchemaName
    , il.TableName
    , il.IndexName
    , il.Fragmentation
  ORDER BY il.Fragmentation DESC
    , il.TableName ASC
    , il.IndexName ASC;
END
GO

-- Execute index rebuild stored procedure
EXEC dbo.RebuildIndexes N'rebuild', 30;
The dbo.RebuildIndexes procedure shown in Listing 3-17 populates a table variable with the
information necessary to identify all indexes on all tables in the current database. It also uses the
sys.dm_db_index_physical_stats dynamic management function to retrieve initial index fragmentation information:
Figure 3-12. The results of a cursor-based index rebuild in the AdventureWorks database
-- Populate index list table
INSERT INTO @IndexList
( DatabaseName, DatabaseId, SchemaName, SchemaId
, TableName, TableId, IndexName, IndexId, Fragmentation )
SELECT db_name(), db_id()
  , s.Name, s.schema_id
  , t.Name, t.object_id
  , i.Name, i.index_id
  , MAX(ip.avg_fragmentation_in_percent)
FROM sys.tables t
INNER JOIN sys.schemas s
  ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i
  ON t.object_id = i.object_id
INNER JOIN sys.dm_db_index_physical_stats (db_id(), NULL, NULL, NULL, NULL) ip
  ON ip.object_id = t.object_id
  AND ip.index_id = i.index_id
WHERE ip.database_id = db_id()
If you specify a rebuild action when you call the procedure, it creates a cursor to loop through the rows
of the @IndexList table, but only for indexes with a fragmentation percentage higher than the level you
specified when calling the procedure:
-- Declare a cursor to create the dynamic SQL statements
DECLARE Index_Cursor CURSOR FAST_FORWARD
FOR SELECT SchemaName, TableName, IndexName
  FROM @IndexList
  WHERE Fragmentation > @MaxFrag
  ORDER BY Fragmentation DESC,
    TableName ASC,
    IndexName ASC;
The procedure then loops through all the indexes in the @IndexList table, creating an ALTER INDEX
statement to rebuild each index. Each ALTER INDEX statement is created as dynamic SQL to be printed and
executed using the SQL PRINT and EXEC statements:
-- Open the cursor for reading
OPEN Index_Cursor;
-- Loop through all the tables in the database
FETCH NEXT FROM Index_Cursor
INTO @Schema, @Table, @Index;
WHILE @@FETCH_STATUS = 0
BEGIN
  -- Create ALTER INDEX statement to rebuild index
  SET @Sql = N'ALTER INDEX ' +
    QUOTENAME(RTRIM(@Index)) + N' ON ' + QUOTENAME(RTRIM(@Schema)) + N'.' +
    QUOTENAME(RTRIM(@Table)) + N' REBUILD WITH (ONLINE = OFF); ';
  -- Execute dynamic SQL
  EXEC (@Sql);
  -- Get the next index
  FETCH NEXT FROM Index_Cursor
  INTO @Schema, @Table, @Index;
END
-- Close and deallocate the cursor.
CLOSE Index_Cursor;
DEALLOCATE Index_Cursor;
The dynamic SQL statements generated by the procedure look similar to the following:
ALTER INDEX [IX_PurchaseOrderHeader_EmployeeID]
ON [Purchasing].[PurchaseOrderHeader] REBUILD WITH (ONLINE = OFF);
The balance of the code simply displays the results, including the new fragmentation percentage after
the indexes are rebuilt.
Notice in the example code in Listing 3-17 that we specifically avoided using database console
commands (DBCCs) like DBCC DBREINDEX and DBCC SHOWCONTIG to manage index fragmentation and
rebuild the indexes in the database. There is a very good reason for this: these DBCC statements, and
many others, are deprecated. Microsoft is planning to do away with many common DBCC statements
in favor of catalog views and enhanced T-SQL statement syntax. The DBCC DBREINDEX statement,
for instance, is being replaced by the ALTER INDEX REBUILD syntax, and DBCC SHOWCONTIG is being
replaced by the sys.dm_db_index_physical_stats dynamic management function. Keep this in mind when porting
code from legacy systems and creating new code.
Another situation where we advise developers to use cursors is when the solution required is a one-
off task, a set-based solution would be very complex, and time is short. Examples include creating
complex running sum-type calculations and performing complex data-scrubbing routines on a very
limited timeframe. We don't recommend using a cursor as a permanent production application solution
without exploring all available set-based options. Remember that whenever you use a cursor, you override SQL
Server's automatic optimizations, and the SQL Server query engine has much better and more current
information to optimize operations than you have access to at any given point in time. Also keep in
mind that tasks you consider extremely complex today will become much easier as SQL's set-based
processing becomes second nature to you.
CURSORS, CURSORS EVERYWHERE
Although cursors commonly get a lot of bad press from SQL gurus, there is nothing inherently evil about
them. They’re just another tool in the toolkit and should be viewed as such. What is wrong is the ways in
which developers abuse them. Generally speaking, as much as 90% of the time, cursors absolutely are
not the best tool for the job when you’re writing T-SQL code. Unfortunately, many SQL newbies find set-
based logic difficult to grasp at first. Cursors provide a comfort zone for procedural developers because
they lend themselves to procedural design patterns.
One of the worst design patterns you can adopt is the “cursors, cursors everywhere” design pattern.
Believe it or not, there are developers who have been writing SQL code for years and have never
bothered learning about SQL's set-based processing. These developers tend to approach every SQL
problem as if it were a C# or Visual Basic problem, and their code tends to reflect it with “cursors,
cursors everywhere.” Replacing cursor-based code with WHILE loops doesn’t solve the problem.
Simulating the behavior of cursors with WHILE loops doesn’t fix the design flaw inherent in the cursor-
based solution: row-by-row processing of data. WHILE loops may, under some circumstances, perform
comparably to cursors; and in some situations even a cursor will outperform a WHILE loop.
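The row-by-row versus set-based contrast is easy to see in a small sketch. This assumes the AdventureWorks Sales.SalesOrderHeader table, and the WHILE version is shown only as the anti-pattern:

```sql
-- Anti-pattern: a WHILE loop that totals orders one key at a time (RBAR)
DECLARE @Total money = 0, @Id int;
SELECT @Id = MIN(SalesOrderID) FROM Sales.SalesOrderHeader;
WHILE @Id IS NOT NULL
BEGIN
    SELECT @Total = @Total + TotalDue
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderID = @Id;
    -- Aggregate over an empty set yields NULL, which ends the loop
    SELECT @Id = MIN(SalesOrderID)
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderID > @Id;
END
SELECT @Total AS LoopTotal;

-- Set-based equivalent: one statement, one optimized pass
SELECT SUM(TotalDue) AS SetTotal
FROM Sales.SalesOrderHeader;
```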
Another horrible design pattern results from what are actually best practices in other procedural
languages. Code reuse isn’t SQLs strong point. Many programmers coming from object-oriented
languages that promote heavy code reuse tend to write layers and layers of SPs that call one another.
These SPs often have cursors, and cursors within cursors, to feed each layer of procedures. Although
it does promote code reuse, this design pattern causes severe performance degradation. A commonly
used term for this type of design pattern, popularized by SQL professional Jeff Moden, is “row-by-
agonizing-row” (RBAR) processing. This design pattern is high on our top-ten list of ways to abuse SQL
Server and will cause you far more problems than it ever solves. SQL Server 2014 offers a feature,
the table-valued parameter, that may help increase manageability and performance of the layered SP
design methodology. Chapter 5 discusses table-valued parameters.
SQL Server supports syntax for both ISO standard cursors and T-SQL extended syntax cursors. The ISO
standard supports the following cursor options:
The • INSENSITIVE option makes a temporary copy of the cursor result set and uses
that copy to fulfill cursor requests. This means changes to the underlying tables
aren’t reflected when you request rows from the cursor.
The • SCROLL option allows you to use all cursor fetch options to position the cursor on
any row in the cursor result set. The cursor fetch options include FIRST, LAST, NEXT,
PRIOR, ABSOLUTE, and RELATIVE. If the SCROLL option isn’t specified, only the NEXT
cursor fetch option is allowed.
The • READ ONLY option in the cursor FOR clause prevents updates to the underlying
data through the cursor. In a non-read only cursor, you can update the underlying
data with the WHERE CURRENT OF clause in the UPDATE and DELETE statements.
The • UPDATE OF option allows you to specify a list of updatable columns in the
cursor’s result set. You can specify UPDATE without the OF keyword and its associated
column list to allow updates to all columns.
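The UPDATE OF and WHERE CURRENT OF behaviors can be sketched with a small self-contained example. The temporary table and its data are illustrative, not from the text:

```sql
-- Set up a throwaway table to cursor over
CREATE TABLE #Contacts (ContactId int PRIMARY KEY, Name nvarchar(50));
INSERT INTO #Contacts VALUES (1, N'Anne'), (2, N'Bob');

DECLARE @Name nvarchar(50);
-- ISO-syntax updatable cursor; only the Name column may be updated
DECLARE Contact_Cursor CURSOR
FOR SELECT Name FROM #Contacts
    FOR UPDATE OF Name;
OPEN Contact_Cursor;
FETCH NEXT FROM Contact_Cursor INTO @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Update the row the cursor is currently positioned on
    UPDATE #Contacts
    SET Name = UPPER(@Name)
    WHERE CURRENT OF Contact_Cursor;
    FETCH NEXT FROM Contact_Cursor INTO @Name;
END
CLOSE Contact_Cursor;
DEALLOCATE Contact_Cursor;

SELECT * FROM #Contacts;  -- names are now upper-cased
DROP TABLE #Contacts;
```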
The T-SQL extended syntax provides many more options than the ISO syntax. In addition to supporting
read-only cursors, the UPDATE OF option, the SCROLL option, and insensitive cursors (using the STATIC
keyword), T-SQL extended syntax cursors support the following options:
• Cursors that are local to the current batch, procedure, or trigger in which they're
created, via the LOCAL keyword. Cursors that are global to the connection in which
they're created can be defined using the GLOBAL keyword.
• The FORWARD_ONLY option, which is the opposite of the SCROLL option, allowing you to
fetch rows from the cursor only with the NEXT option.
• The KEYSET option, which specifies that the membership and order of rows is fixed at the
time the cursor is created. Trying to fetch rows that are subsequently deleted doesn't
succeed, and a @@FETCH_STATUS value of -2 is returned.
• The DYNAMIC option, which specifies a cursor that reflects all data changes made to
the rows in its underlying result set. This type of cursor is one of the slowest, because
every change to the underlying data must be reflected whenever you scroll to a new
row of the result set.
• The FAST_FORWARD option, which specifies a performance-optimized combination
forward-only/read-only cursor.
• The SCROLL_LOCKS option, which locks underlying data rows as they're read to ensure
that data modifications will succeed. The SCROLL_LOCKS option is mutually exclusive
with the FAST_FORWARD and STATIC options.
• The OPTIMISTIC option, which uses timestamps to determine whether a row has changed
since the cursor was loaded. If a row has changed, the OPTIMISTIC option doesn't
allow the current cursor to update the same row. The OPTIMISTIC option is
incompatible with the FAST_FORWARD option.
• The TYPE_WARNING option, which sends a warning if a cursor will be automatically
converted from the requested type to another type. This can happen, for instance, if
SQL Server needs to convert a forward-only cursor to a static cursor.
Note If you don’t specify a cursor as LOCAL or GLOBAL, cursors that are created default to the setting
defined by the default to local cursor database setting.
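You can check which default applies to the current database with a quick query against the sys.databases catalog view, as in this minimal sketch:

```sql
-- 1 = cursors default to LOCAL; 0 = cursors default to GLOBAL
SELECT name, is_local_cursor_default
FROM sys.databases
WHERE name = db_name();
```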
Cursors come in several flavors, and you could spend a lot of time just trying to figure out which one
you need to perform a given task. Most of the time, you need forward-only/read-only cursors. These
cursors are efficient because they move in only one direction and don’t need to perform updates on the
underlying data. Maximizing cursor efficiency by choosing the right type of cursor for the job is a
quick-win strategy that you should keep in mind when you have to resort to a cursor.
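When a cursor is unavoidable, a forward-only/read-only declaration like the following minimal skeleton is usually the efficient starting point. The per-row work shown is just a PRINT placeholder:

```sql
DECLARE @Name sysname;
-- FAST_FORWARD = forward-only + read-only, the cheapest cursor type
DECLARE Table_Cursor CURSOR LOCAL FAST_FORWARD
FOR SELECT name FROM sys.tables;
OPEN Table_Cursor;
FETCH NEXT FROM Table_Cursor INTO @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name;  -- per-row work goes here
    FETCH NEXT FROM Table_Cursor INTO @Name;
END
CLOSE Table_Cursor;
DEALLOCATE Table_Cursor;
```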
Summary
This chapter introduced SQL 3VL, which consists of three logical result values: true, false, and unknown. This is
a key concept to understanding SQL development in general, but it can be a foreign idea to developers coming
from backgrounds in other programming languages. If you’re not yet familiar with the 3VL chart, we highly
recommend revisiting Figure 3-1. This chart summarizes the logic that governs SQL 3VL.
This chapter also introduced T-SQL's control-of-flow statement offerings, which allow you to branch
conditionally and unconditionally, loop, handle exceptions, and force delays in your code. We also covered
the two flavors of CASE expression and some of the more advanced uses of CASE, including dynamic pivot
table queries and CASE-based functions like COALESCE and NULLIF.
Finally, we discussed the redheaded stepchild of SQL development, the cursor. Although cursors
commonly get a bad rep, there’s nothing inherently bad about them; the problem is with how people use them.
The discussion of cursors focused on some common scenarios where they might be considered the best tool for
the job, including administrative and complex one-off tasks. Finally, we presented the options available for ISO-
compliant cursors and T-SQL extended syntax cursors, both of which are supported by SQL Server 2014.
The next chapter begins to discuss T-SQL programmability features, starting with an in-depth look at
T-SQL UDFs in all their various forms.
Exercises
1. [True/False] SQL 3VL supports the logical result values true, false, and unknown.
2. [Choose one] SQL NULL represents which of the following?
a. An unknown or missing value
b. The number 0
c. An empty (zero-length) string
d. All of the above
3. [True/False] The BEGIN and END keywords delimit a statement block and limit the
scope of variables declared in that statement block, like curly braces ({ }) in C#.
4. [Fill in the blank] The ____ keyword forces a WHILE loop to terminate immediately.
5. [True/False] The TRY...CATCH block can catch every possible SQL Server error.
6. [Fill in the blanks] SQL CASE expressions come in two forms, ___ and ___.
7. [Choose all that apply] T-SQL supports which of the following cursor options?
a. Read-only cursors
b. Forward-only cursors
c. Backward-only cursors
d. Write-only cursors
8. Modify the code in Listing 3-10 to generate a pivot table result set that returns the
total dollar amount (TotalDue) of orders by region, instead of the count of orders
by region.
Chapter 4
User-Defined Functions
Each new version of SQL Server features improvements to T-SQL that make development easier. SQL Server
2000 introduced (among other things) the concept of user-defined functions (UDFs). Like functions in other
programming languages, T-SQL UDFs provide a convenient way for developers to define routines that accept
parameters, perform actions based on those parameters, and return data to the caller. T-SQL functions come
in three flavors: inline table-valued functions (TVFs), multistatement TVFs, and scalar functions. SQL Server
2014 also supports the ability to create CLR integration UDFs, which are discussed in Chapter 15.
Scalar Functions
Basically, a scalar UDF is a function that accepts zero or more parameters and returns a single scalar value as
the result. You’re probably already familiar with scalar functions in mathematics, and with T-SQLs built-in
scalar functions (such as ABS and SUBSTRING). The CREATE FUNCTION statement allows you to create custom
scalar functions that behave like the built-in scalar functions.
To demonstrate scalar UDFs, let’s a trip back in time to high school geometry class. In accordance with
the rules passed down from Euclid, this UDF accepts a circle’s radius and returns the area of the circle using
the formula area = p × r2. Listing 4-1 demonstrates this simple scalar UDF.
Listing 4-1. Simple Scalar UDF
CREATE FUNCTION dbo.CalculateCircleArea (@Radius float = 1.0)
RETURNS float
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
  RETURN PI() * POWER(@Radius, 2);
END;
The first line of the CREATE FUNCTION statement defines the schema and name of the function using
a standard SQL Server two-part name (dbo.CalculateCircleArea) and a single required parameter, the
radius of the circle (@Radius). The @Radius parameter is defined as a T-SQL float type. The parameter is
assigned a default value of 1.0 by the = 1.0 after the parameter declaration:
CREATE FUNCTION dbo.CalculateCircleArea (@Radius float =1.0)
The next line contains the RETURNS keyword, which specifies the data type of the result that will be
returned by the UDF. In this instance, the RETURNS keyword indicates that the UDF will return a float result:
RETURNS float
The third line contains additional options following the WITH keyword. The example uses the RETURNS
NULL ON NULL INPUT function option for a performance improvement. The RETURNS NULL ON NULL INPUT
option is a performance-enhancing option that automatically returns NULL if any of the parameters passed in
are NULL. The performance enhancement occurs because SQL Server won't execute the body of the function
if a NULL is passed in and this option is specified:
WITH RETURNS NULL ON NULL INPUT
The AS keyword indicates the start of the function body, which must be enclosed in the T-SQL BEGIN and
END keywords. The sample function in Listing 4-1 is very simple, consisting of a single RETURN statement that
immediately returns the value of the circle area calculation. The RETURN statement must be the last statement
before the END keyword in every scalar UDF:
BEGIN
  RETURN PI() * POWER(@Radius, 2);
END;
You can test this simple UDF with a few SELECT statements like the following. The results are shown
in Figure 4-1:
SELECT dbo.CalculateCircleArea(10);
SELECT dbo.CalculateCircleArea(NULL);
SELECT dbo.CalculateCircleArea(2.5);
Figure 4-1. The results of the sample circle area calculations
UDF PARAMETERS
UDF parameters operate similarly to, but slightly differently from, stored procedure (SP) parameters. It’s
important to be aware of the differences. For instance, if you create a UDF that accepts no parameters,
you still need to include empty parentheses after the function name—both when creating and when
invoking the function. Some built-in functions, like the PI() function used in Listing 4-1, which
represents the value of the constant π (3.14159265358979), don't take parameters. Notice that when
the function is called in the UDF, it's still called with empty parentheses.
When SPs are assigned default values, you can simply leave the parameter off your parameter list
completely when calling the procedure. This isn’t an option with UDFs. To use a UDF default value,
you must use the DEFAULT keyword when calling the UDF. To use the default value for the @radius
parameter of the example dbo.CalculateCircleArea UDF, you call the UDF like this:
SELECT dbo.CalculateCircleArea (DEFAULT);
Finally, SPs have no equivalent to the RETURNS NULL ON NULL INPUT option. You can simulate this
functionality to some extent by checking your parameters for NULL immediately on entering the SP,
though. SPs are discussed in greater detail in Chapter 5.
UDFs provide several creation-time options that allow you to improve performance and security,
including the following:
The • ENCRYPTION option can be used to store your UDF in the database in obfuscated
format. Note that this isn’t true encryption, but rather an easily circumvented
obfuscation of your code. See the “UDF ‘Encryption’” sidebar for more information.
The • SCHEMABINDING option indicates that your UDF will be bound to database
objects referenced in the body of the function. With SCHEMABINDING turned on,
attempts to change or drop referenced tables and other database objects result in an
error. This helps to prevent inadvertent changes to tables and other database objects
that can break your UDF. Additionally, the SQL Server Database Engine team has
published information indicating that SCHEMABINDING can improve the performance
of UDFs, even if they don’t reference other database objects (http://blogs.msdn.
The • CALLED ON NULL INPUT option is the opposite of RETURNS NULL ON NULL INPUT.
When CALLED ON NULL INPUT is specified, SQL Server executes the body of the
function even if one or more parameters are NULL. CALLED ON NULL INPUT is a
default option for all scalar-valued functions.
The • EXECUTE AS option manages caller security on UDFs. You can specify that the
UDF be executed as any of the following:
• CALLER indicates that the UDF should run under the security context of the user
calling the function. This is the default.
• SELF indicates that the UDF should run under the security context of the user
who created (or altered) the function.
• OWNER indicates that the UDF should run under the security context of the owner
of the UDF (or the owner of the schema containing the UDF).
Finally, you can specify that the UDF should run under the security context of a •
specific user by specifying a username.
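As a sketch of these options in use, the following hypothetical scalar UDF combines SCHEMABINDING with RETURNS NULL ON NULL INPUT. The function name and logic are illustrative, not from the text, and the example assumes the AdventureWorks Sales.SalesOrderDetail table:

```sql
-- Hypothetical schema-bound UDF; schema binding requires two-part object names
CREATE FUNCTION dbo.GetOrderTotal (@SalesOrderID int)
RETURNS money
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
    RETURN
    ( SELECT SUM(sod.LineTotal)
      FROM Sales.SalesOrderDetail sod
      WHERE sod.SalesOrderID = @SalesOrderID );
END;
```

With SCHEMABINDING in place, an attempt to drop or alter Sales.SalesOrderDetail fails with an error until the function is dropped or altered to remove the binding.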
UDF "ENCRYPTION"
Using the ENCRYPTION option on UDFs performs a simple obfuscation of your code. It actually does little
more than "keep honest people honest," and in reality it tends to be more trouble than it's worth. Many
developers and DBAs have spent precious time scouring the Internet for tools to decrypt their database
objects because they were convinced the scripts in their source control database were out of sync with
the production database. Keep in mind that those same decryption tools are available to anyone with
an Internet connection and a browser. If you write commercial database scripts or perform database
consulting services, your best (and really only) protection against curious DBAs and developers
reverse-engineering and modifying your code is a well-written contract. Keep this in mind when
deciding whether to “encrypt” your database objects.
Recursion in Scalar User-Defined Functions
Now that you’ve learned the basics, let’s hang out in math class for a few more minutes to talk about
recursion. Like most procedural programming languages that allow function definitions, T-SQL allows
recursion in UDFs. There’s hardly a better way to demonstrate recursion than the most basic recursive
algorithm around: the factorial function.
For those who put factorials out of their minds immediately after graduation, here’s a brief rundown
of what they are. A factorial is the product of all natural (or counting) numbers less than or equal to n,
where n > 0. Factorials are represented in mathematics with the bang notation: n!. As an example,
5! = 1 × 2 × 3 × 4 × 5 = 120. The simple scalar dbo.CalculateFactorial UDF in Listing 4-2 calculates a
factorial recursively for an integer parameter passed into it.
Listing 4-2. Recursive Scalar UDF
CREATE FUNCTION dbo.CalculateFactorial (@n int = 1)
RETURNS decimal(38, 0)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
  RETURN
  ( CASE
      WHEN @n > 1 THEN CAST(@n AS float) * dbo.CalculateFactorial(@n - 1)
      WHEN @n = 1 THEN 1
    END );
END;
The first few lines are similar to Listing 4-1. The function accepts a single int parameter and returns a
scalar decimal value. The RETURNS NULL ON NULL INPUT option returns NULL immediately if NULL is passed in:
CREATE FUNCTION dbo.CalculateFactorial (@n int = 1)
RETURNS decimal(38, 0)
WITH RETURNS NULL ON NULL INPUT
You return a decimal result in this example because of the limitations of the int and bigint types.
Specifically, the int type overflows at 13! and bigint bombs out at 21!. In order to put the UDF through its
paces, you have to allow it to return results up to 32!, as discussed later in this section. As in Listing 4-1, the
body of this UDF is a single RETURN statement, this time with a searched CASE expression:
RETURN
( CASE
    WHEN @n > 1 THEN CAST(@n AS float) * dbo.CalculateFactorial(@n - 1)
    WHEN @n = 1 THEN 1
  END );
The CASE expression checks the value of the UDF parameter, @n. If @n is 0 or negative, no WHEN clause
matches, so the CASE expression (and therefore dbo.CalculateFactorial) returns NULL because the result is
undefined. If @n is greater than 1, dbo.CalculateFactorial returns @n * dbo.CalculateFactorial(@n - 1),
the recursive part of the UDF. This ensures that the UDF continues calling itself recursively, multiplying
the current value of @n by (@n - 1)!.
Finally, when @n reaches 1, the UDF returns 1. This is the part of dbo.CalculateFactorial that stops
the recursion. Without the check for @n = 1, you could theoretically end up in an infinite recursive loop.
In practice, however, SQL Server saves you from yourself by limiting you to a maximum of 32 levels of
recursion. Demonstrating the 32-level limit on recursion is why it was important for the UDF to return
results up to 32!. Following are some examples of dbo.CalculateFactorial calls with various parameters,
and their results:
SELECT dbo.CalculateFactorial(NULL); -- Returns NULL
SELECT dbo.CalculateFactorial(-1); -- Returns NULL
SELECT dbo.CalculateFactorial(0); -- Returns NULL
SELECT dbo.CalculateFactorial(5); -- Returns 120
SELECT dbo.CalculateFactorial(32); -- Returns 263130836933693520000000000000000000
As you can see, the dbo.CalculateFactorial function easily handles the 32 levels of recursion required
to calculate 32!. If you try to go beyond that limit, you get an error message. Executing the following code,
which attempts 33 levels of recursion, doesn’t work:
SELECT dbo.CalculateFactorial(33);
This causes SQL Server to grumble loudly with an error message similar to the following:
Msg 217, Level 16, State 1, Line 1
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).
More Than One Way to Skin a Cat
The 32-level recursion limit is a hard limit; that is, you can’t programmatically change it through server
or database settings. This really isn’t as bad a limitation as you might think. Very rarely do you actually
need to recursively call a UDF more than 32 times, and doing so could result in a severe performance
penalty. There’s generally more than one way to get the job done. You can work around the 32-level
recursion limitation in the dbo.CalculateFactorial function by rewriting it with a WHILE loop or using
a recursive common table expression (CTE), as shown here:
CREATE FUNCTION dbo.CalculateFactorial (@n int = 1)
RETURNS float
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    DECLARE @result float;
    SET @result = NULL;
    IF @n > 0
    BEGIN
        SET @result = 1.0;
        WITH Numbers (num)
        AS (
            SELECT 1
            UNION ALL
            SELECT num + 1
            FROM Numbers
            WHERE num < @n
        )
        SELECT @result = @result * num
        FROM Numbers;
    END;
    RETURN @result;
END;
This rewrite of the dbo.CalculateFactorial function averts the recursive function call limit by
eliminating the recursive function calls. Instead, it pushes the recursion back into the body of the
function through the use of a recursive common table expression (CTE). By default, SQL Server allows
up to 100 levels of recursion in a CTE (you can override this with the MAXRECURSION option), greatly
expanding your factorial calculation power. With this function, you can easily find out that 33! is
8.68331761881189E+36, or even that 100! is 9.33262154439441E+157. The important idea to take
away from this discussion is that although recursive function calls have hard limits on them, you can
often work around those limitations using other T-SQL functionality.
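The default 100-level CTE recursion limit and its MAXRECURSION override can be demonstrated outside a function with a quick sketch (not one of the chapter's listings):

```sql
-- Counting to 200 needs more than the default 100 recursion levels,
-- so the MAXRECURSION hint raises the limit for this one query
WITH Numbers (Num) AS
(
    SELECT 1
    UNION ALL
    SELECT Num + 1
    FROM Numbers
    WHERE Num < 200
)
SELECT MAX(Num) AS MaxNum
FROM Numbers
OPTION (MAXRECURSION 200);
```

Without the OPTION clause, this query fails with Msg 530 ("The maximum recursion 100 has been exhausted before statement completion"). Note that query hints generally can't be attached inside a function body, which is one reason the CTE-based rewrite above relies on the default limit.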
Also keep in mind that although you used factorial calculation as a simple example of recursion, this
method is considered naive, and there are several more-efficient methods of calculating factorials.
Procedural Code in User-Defined Functions
So far, you’ve seen simple functions that demonstrate the basic points of scalar UDFs. But in all likelihood,
unless you’re implementing business logic for a swimming pool installation company, you aren’t likely to
need to spend much time calculating the area of a circle in T-SQL.
A common problem that you have a much greater chance of running into is name-based searching.
T-SQL offers tools for exact matching, partial matching, and even limited pattern matching via the LIKE
predicate. T-SQL even offers built-in phonetic matching (sound-alike matching) through the built-in
SOUNDEX function.
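Before moving on to a better algorithm, a quick sketch shows both the convenience and the coarseness of the built-in SOUNDEX and DIFFERENCE functions:

```sql
-- Smith and Smythe both collapse to the same Soundex code, S530;
-- DIFFERENCE scores the match from 0 (no match) to 4 (best match)
SELECT SOUNDEX(N'Smith') AS SoundexSmith,       -- S530
    SOUNDEX(N'Smythe') AS SoundexSmythe,        -- S530
    DIFFERENCE(N'Smith', N'Smythe') AS Score;   -- 4
```

Because Soundex keeps only the first letter and three digits, many unrelated names collapse to the same code, which is exactly the false-hit problem described above.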
Heavy-duty approximate matching usually requires a more advanced tool, like a better phonetic
matching algorithm. Let’s use one of these algorithms, the New York State Identification and Intelligence
System (NYSIIS) algorithm, to demonstrate procedural code in UDFs.
The NYSIIS algorithm is an improvement on the Soundex phonetic encoding algorithm, itself nearly 90
years old. The NYSIIS algorithm converts groups of one, two, or three alphabetic characters (known
as n-grams) in names to a phonetic (“sounds like”) approximation. This makes it easier to search
for names that have similar pronunciations but different spellings, such as Smythe and Smith. As
mentioned in this section, SQL Server provides a built-in SOUNDEX function, but Soundex provides very
poor accuracy and usually results in many false hits. NYSIIS and other modern algorithms provide much
better results than Soundex.
To demonstrate procedural code in UDFs, you can implement a UDF that phonetically encodes names
using NYSIIS encoding rules. The rules for NYSIIS phonetic encoding are relatively simple, with the majority
of the rules requiring simple n-gram substitutions. The following is a complete list of NYSIIS encoding rules:
1. Remove all non-alphabetic characters from the name.
2. The first characters of the name are encoded according to the n-gram
substitutions shown in the Start of Name table in Figure 4-2. In Figure 4-2, the
n-grams shown to the left of the arrows are replaced with the n-grams to the right
of the arrows during the encoding process.
Figure 4-2. NYSIIS phonetic encoding rules / character substitutions
3. The last characters of the name are encoded according to the n-gram
substitutions shown in the End of Name table in Figure 4-2.
4. The first character of the encoded value is set to the first character of the name.
5. After the first and last n-grams are encoded, all remaining characters in the name
are encoded according to the n-gram substitutions shown in the Middle of Name
table in Figure 4-2.
6. All side-by-side duplicate characters in the encoded name are reduced to a single
character. This means that AA is reduced to A and SS is reduced to S.
7. If the last character of the encoded name is S, it’s removed.
8. If the last characters of the encoded name are AY, they’re replaced with Y.
9. If the last character of the encoded name is A, it’s removed.
10. The result is truncated to a maximum length of six characters.
You could use some fairly large CASE expressions to implement these rules, but let’s go with a more
flexible option: using a replacement table. This table will contain the majority of the replacement rules in
three columns,
as described here:
• Location: This column tells the UDF whether the rule should be applied to the start,
end, or middle of the name.
• NGram: This column is the n-gram, or sequence of characters, that will be encoded.
These n-grams correspond to the left side of the arrows in Figure 4-2.
• Replacement: This column represents the replacement value for the corresponding
n-gram on the same row. These character sequences correspond to the right side of
the arrows in Figure 4-2.
Listing 4-3 is a CREATE TABLE statement that builds the NYSIIS phonetic encoding replacement rules table.
Listing 4-3. Creating the NYSIIS Replacement Rules Table
-- Create the NYSIIS replacement rules table
CREATE TABLE dbo.NYSIIS_Replacements
(Location nvarchar(10) NOT NULL,
NGram nvarchar(10) NOT NULL,
Replacement nvarchar(10) NOT NULL,
PRIMARY KEY (Location, NGram));
Listing 4-4 is a single INSERT statement that uses row constructors to populate all the NYSIIS
replacement rules, as shown in Figure 4-2.
Listing 4-4. INSERT Statement to Populate the NYSIIS Replacement Rules Table
INSERT INTO NYSIIS_Replacements (Location, NGram, Replacement)
VALUES(N'End', N'DT', N'DD'),
(N'End', N'EE', N'YY'),
(N'End', N'IE', N'YY'),
(N'End', N'ND', N'DD'),
(N'End', N'NT', N'DD'),
(N'End', N'RD', N'DD'),
(N'End', N'RT', N'DD'),
(N'Mid', N'A', N'A'),
(N'Mid', N'E', N'A'),
(N'Mid', N'I', N'A'),
(N'Mid', N'K', N'C'),
(N'Mid', N'M', N'N'),
(N'Mid', N'O', N'A'),
(N'Mid', N'Q', N'G'),
(N'Mid', N'U', N'A'),
(N'Mid', N'Z', N'S'),
(N'Mid', N'AW', N'AA'),
(N'Mid', N'EV', N'AF'),
(N'Mid', N'EW', N'AA'),
(N'Mid', N'IW', N'AA'),
(N'Mid', N'KN', N'NN'),
(N'Mid', N'OW', N'AA'),
(N'Mid', N'PH', N'FF'),
(N'Mid', N'UW', N'AA'),
(N'Mid', N'SCH', N'SSS'),
(N'Start', N'K', N'C'),
(N'Start', N'KN', N'NN'),
(N'Start', N'PF', N'FF'),
(N'Start', N'PH', N'FF'),
(N'Start', N'MAC', N'MCC'),
(N'Start', N'SCH', N'SSS');
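With the rules loaded, a quick sanity-check query (not one of the chapter's listings) confirms the table contents before writing the encoding function:

```sql
-- The INSERT above loads 7 'End', 18 'Mid', and 6 'Start' rules
SELECT Location, COUNT(*) AS RuleCount
FROM dbo.NYSIIS_Replacements
GROUP BY Location
ORDER BY Location;
```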
Listing 4-5 is the UDF that encodes a string using NYSIIS. This UDF demonstrates the complexity of the
control-of-flow logic that can be implemented in a scalar UDF.
Listing 4-5. Function to Encode Strings Using NYSIIS
CREATE FUNCTION dbo.EncodeNYSIIS
(
    @String nvarchar(100)
)
RETURNS nvarchar(6)
AS
BEGIN
    DECLARE @Result nvarchar(100);
    SET @Result = UPPER(@String);
    -- Step 1: Remove All Nonalphabetic Characters
    WITH Numbers (Num)
    AS
    (
        SELECT 1
        UNION ALL
        SELECT Num + 1
        FROM Numbers
        WHERE Num < LEN(@Result)
    )
    SELECT @Result = STUFF(@Result, Num, 1,
        CASE WHEN SUBSTRING(@Result, Num, 1) >= N'A'
                AND SUBSTRING(@Result, Num, 1) <= N'Z'
            THEN SUBSTRING(@Result, Num, 1)
            ELSE N'.'
        END)
    FROM Numbers;
    SET @Result = REPLACE(@Result, N'.', N'');
    -- Step 2: Replace the Start N-gram (longest matching n-gram first)
    SELECT TOP (1) @Result = STUFF(@Result, 1, LEN(NGram), Replacement)
    FROM dbo.NYSIIS_Replacements
    WHERE Location = N'Start'
        AND SUBSTRING(@Result, 1, LEN(NGram)) = NGram
    ORDER BY LEN(NGram) DESC;
    -- Step 3: Replace the End N-gram (longest matching n-gram first)
    SELECT TOP (1) @Result = STUFF(@Result,
        LEN(@Result) - LEN(NGram) + 1,
        LEN(NGram), Replacement)
    FROM dbo.NYSIIS_Replacements
    WHERE Location = N'End'
        AND SUBSTRING(@Result, LEN(@Result) - LEN(NGram) + 1, LEN(NGram)) = NGram
    ORDER BY LEN(NGram) DESC;
    -- Step 4: Save the First Letter of the Name
    DECLARE @FirstLetter nchar(1);
    SET @FirstLetter = SUBSTRING(@Result, 1, 1);
    -- Step 5: Replace All Middle N-grams
    DECLARE @Replacement nvarchar(10);
    DECLARE @i int;
    SET @i = 1;
    WHILE @i <= LEN(@Result)
    BEGIN
        SET @Replacement = NULL;
        -- Grab the middle-of-name replacement n-gram, longest match first
        SELECT TOP (1) @Replacement = Replacement
        FROM dbo.NYSIIS_Replacements
        WHERE Location = N'Mid'
            AND SUBSTRING(@Result, @i, LEN(NGram)) = NGram
        ORDER BY LEN(NGram) DESC;
        SET @Replacement = COALESCE(@Replacement, SUBSTRING(@Result, @i, 1));
        -- If we found a replacement, apply it
        SET @Result = STUFF(@Result, @i, LEN(@Replacement), @Replacement);
        -- Move on to the next n-gram
        SET @i = @i + COALESCE(LEN(@Replacement), 1);
    END;
    -- Replace the first character with the first letter we saved at the start
    SET @Result = STUFF(@Result, 1, 1, @FirstLetter);
    -- Here we apply our special rules for the 'H' character. Special handling for 'W'
    -- characters is taken care of in the replacement rules table
    WITH Numbers (Num)
    AS
    (
        SELECT 2    -- Don't bother with the first character
        UNION ALL
        SELECT Num + 1
        FROM Numbers
        WHERE Num < LEN(@Result)
    )
    SELECT @Result = STUFF(@Result, Num, 1,
        CASE SUBSTRING(@Result, Num, 1)
            WHEN N'H' THEN
                CASE WHEN SUBSTRING(@Result, Num + 1, 1)
                        NOT IN (N'A', N'E', N'I', N'O', N'U')
                    OR SUBSTRING(@Result, Num - 1, 1)
                        NOT IN (N'A', N'E', N'I', N'O', N'U')
                    THEN SUBSTRING(@Result, Num - 1, 1)
                    ELSE SUBSTRING(@Result, Num, 1)
                END
            ELSE SUBSTRING(@Result, Num, 1)
        END)
    FROM Numbers;
    -- Step 6: Reduce All Side-by-side Duplicate Characters
    -- First replace the first letter of any sequence of two side-by-side
    -- duplicate letters with a period
    WITH Numbers (Num)
    AS
    (
        SELECT 1
        UNION ALL
        SELECT Num + 1
        FROM Numbers
        WHERE Num < LEN(@Result)
    )
    SELECT @Result = STUFF(@Result, Num, 1,
        CASE SUBSTRING(@Result, Num, 1)
            WHEN SUBSTRING(@Result, Num + 1, 1) THEN N'.'
            ELSE SUBSTRING(@Result, Num, 1)
        END)
    FROM Numbers;
    -- Next replace all periods '.' with an empty string ''
    SET @Result = REPLACE(@Result, N'.', N'');
    -- Step 7: Remove Trailing 'S' Characters
    WHILE RIGHT(@Result, 1) = N'S' AND LEN(@Result) > 1
        SET @Result = STUFF(@Result, LEN(@Result), 1, N'');
    -- Step 8: Replace Trailing 'AY' Characters with 'Y'
    IF RIGHT(@Result, 2) = N'AY'
        SET @Result = STUFF(@Result, LEN(@Result) - 1, 1, N'');
    -- Step 9: Remove Trailing 'A' Characters
    WHILE RIGHT(@Result, 1) = N'A' AND LEN(@Result) > 1
        SET @Result = STUFF(@Result, LEN(@Result), 1, N'');
    -- Step 10: Truncate the Result to 6 Characters
    RETURN SUBSTRING(@Result, 1, 6);
END;
The NYSIIS_Replacements table rules reflect most of the NYSIIS rules described by Robert L. Taft
in his famous paper “Name Search Techniques.”1 The start and end n-grams are replaced, and then the
remaining n-gram rules are applied in a WHILE loop. The special rules for the letter H are applied, side-by-
side duplicates are removed, special handling of certain trailing characters is performed, and the first six
characters of the result are returned.
1. Robert L. Taft, “Name Search Techniques,” Special Report (Albany, NY: Bureau of Systems Development, 1970).
This example uses recursive CTEs to dynamically generate virtual numbers tables in a couple of places.
A numbers table is simply a table of numbers counting up to a specified maximum. The following
recursive CTE generates a small numbers table (the numbers 1 through 100):
WITH Numbers (Num)
AS
(
    SELECT 1
    UNION ALL
    SELECT Num + 1
    FROM Numbers
    WHERE Num < 100
)
SELECT Num FROM Numbers;
Listing 4-5 used the number of characters in the name to limit the recursion of the CTEs. This speeds up
the UDF overall. You can get even more performance gains by creating a permanent numbers table in your
database with a clustered index/primary key on it, instead of using CTEs. A numbers table is always handy
to have around, doesn’t cost you very much to build or maintain, doesn’t take up much storage space, and
is extremely useful for converting loops and cursors to set-based code. A numbers table is by far one of the
handiest and simplest tools you can add to your T-SQL toolkit.
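As a quick illustration of that point, here's a small sketch (assuming the permanent dbo.Numbers table built later in Listing 4-10) that splits a string into one row per character with no loop at all:

```sql
-- One row per character position, no WHILE loop required
SELECT Num, SUBSTRING(N'SMITH', Num, 1) AS Letter
FROM dbo.Numbers
WHERE Num BETWEEN 1 AND LEN(N'SMITH');
```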
As an example, you can use the query in Listing 4-6 to phonetically encode the last names of all contacts
in the AdventureWorks database using NYSIIS. Partial results are shown in Figure 4-3.
Listing 4-6. Using NYSIIS to Phonetically Encode All AdventureWorks Contacts
SELECT LastName,
dbo.EncodeNYSIIS(LastName) AS NYSIIS
FROM Person.Person
GROUP BY LastName;
Using the dbo.EncodeNYSIIS UDF is relatively simple. Listing 4-7 is a simple example of using the new
UDF in the WHERE clause to retrieve all AdventureWorks contacts whose last name is phonetically similar to
the name Liu. The results are shown in Figure 4-4.
Listing 4-7. Retrieving All Contact Phonetic Matches for Liu
SELECT LastName,
    dbo.EncodeNYSIIS(LastName) AS NYSIIS
FROM Person.Person
WHERE dbo.EncodeNYSIIS(LastName) = dbo.EncodeNYSIIS(N'Liu');
Figure 4-3. Partial results of NYSIIS encoding AdventureWorks contacts
The example in Listing 4-7 is the naive method of using a UDF. The query engine must apply the UDF
to every single row of the source table. In this case, the dbo.EncodeNYSIIS function is applied to the nearly
20,000 last names in the Person.Person table, resulting in an inefficient query plan and excessive I/O. A
more efficient method is to perform the NYSIIS encodings ahead of time—to pre-encode the names. The
pre-encoding method is demonstrated in Listing 4-8.
Listing 4-8. Pre-encoding AdventureWorks Contact Names with NYSIIS
CREATE TABLE Person.ContactNYSIIS
(
    BusinessEntityID int NOT NULL PRIMARY KEY,
    NYSIIS nvarchar(6) NOT NULL
);
GO
INSERT INTO Person.ContactNYSIIS (BusinessEntityID, NYSIIS)
SELECT BusinessEntityID, dbo.EncodeNYSIIS(LastName)
FROM Person.Person;
GO
Figure 4-4. Partial listing of AdventureWorks contacts with names phonetically similar to Liu
Once you have pre-encoded the data, queries are much more efficient. The query shown in Listing 4-9 uses
the table created in Listing 4-8 to return the same results as Listing 4-7—just much more efficiently, because this
version doesn’t need to encode every row of data for comparison in the WHERE clause at query time.
Listing 4-9. Efficient NYSIIS Query Using Pre-encoded Data
SELECT c.LastName,
    cn.NYSIIS
FROM Person.ContactNYSIIS cn
INNER JOIN Person.Person c
ON cn.BusinessEntityID = c.BusinessEntityID
WHERE cn.NYSIIS = dbo.EncodeNYSIIS(N'Liu');
To keep dbo.EncodeNYSIIS UDF-based searches efficient, I highly recommend
pre-encoding your search data. This is especially true in production environments where performance is
critical. NYSIIS (and phonetic matching in general) is an extremely useful tool for approximate name-based
searches in a variety of applications, such as customer service, business reporting, and law enforcement.
Multistatement Table-Valued Functions
Multistatement TVFs are similar in style to scalar UDFs, but instead of returning a single scalar value, they
return their result as a table data type. The declaration is very similar to that of a scalar UDF, with a few
important differences:
• The return type following the RETURNS keyword is actually a table variable
declaration, with its structure declared immediately following the table variable
name. Some options allowed in CREATE TABLE statements, such as named constraints,
aren't valid in a multistatement TVF definition.
• The RETURN statement in the body of the multistatement TVF has no values or
variables following it.
Inside the body of the multistatement TVF, you can use the SQL Data Manipulation Language (DML)
statements INSERT, UPDATE, MERGE, and DELETE to create and manipulate the return results in the table
variable that will be returned as the result.
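Putting those pieces together, a minimal multistatement TVF skeleton looks like the following (a hypothetical example, not one of the chapter's listings):

```sql
-- Hypothetical example: the table variable after RETURNS is populated
-- with DML, and the bare RETURN hands it back to the caller
CREATE FUNCTION dbo.GetFirstNumbers (@max int)
RETURNS @result table (Num int NOT NULL PRIMARY KEY)
AS
BEGIN
    DECLARE @i int = 1;
    WHILE @i <= @max
    BEGIN
        INSERT INTO @result (Num) VALUES (@i);
        SET @i = @i + 1;
    END;
    RETURN;    -- No value or variable follows RETURN
END;
```

Calling it is just `SELECT Num FROM dbo.GetFirstNumbers(5);`.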
For the example of a multistatement TVF, let’s create another business application function: a
product pull list for AdventureWorks. This TVF matches the AdventureWorks sales orders stored in the
Sales.SalesOrderDetail table against the product inventory in the Production.ProductInventory
table. It effectively creates a list for AdventureWorks employees, telling them exactly which inventory
bin to go to when they need to fill an order. Some business rules need to be defined before you write this
multistatement TVF:
• In some cases, the number of ordered items may be more than are available in one
bin. In that case, the pull list will instruct the employee to grab the product from
multiple bins.
• Any partial fills from a bin will be reported on the list.
• Any substitution work (for example, substituting a different-colored item of the same
model) will be handled by a separate business process and won't be allowed on this list.
• No zero fills (ordered items for which there is no matching product in inventory) will
be reported back on the list.
For purposes of this example, let’s say there are three customers: Jill, Mike, and Dave. Each of these
three customers places an order for exactly five of item number 783, the black Mountain-200 42-inch
mountain bike. Let’s also say that AdventureWorks has six of this particular inventory item in bin 1, shelf
A, location 7, and another three of this particular item in bin 2, shelf B, location 10. Your business rules will
create a pull list like the following:
• Jill's order: Pull five of item 783 from bin 1, shelf A, location 7; mark the order as a
complete fill.
• Mike's order: Pull one of item 783 from bin 1, shelf A, location 7; mark the order as a
partial fill.
• Mike's order: Pull three of item 783 from bin 2, shelf B, location 10; mark the order as a
partial fill.
In this example, there are only 9 of the ordered items in inventory, but 15 total items have been ordered
(3 customers multiplied by 5 items each). Because of this, Dave’s order is zero-filled—no items are pulled from
inventory to fill his order. Figure 4-5 is designed to help you visualize the sample inventory/order fill scenario.
Figure 4-5. Filling orders from inventory
Because the inventory is out of item 783 at this point (there were nine items in inventory and all nine
were used to fill Jill's and Mike's orders), Dave's order is not even listed on the pull list report. This function
doesn’t concern itself with product substitutions—for example, completing Mike’s and Dave’s orders with a
comparable product such as item ID number 780 (the silver Mountain-200 42-inch mountain bike), if there
happens to be some in stock. The business rule for substitutions states that a separate process handles this
aspect of order fulfillment.
Many developers may see this problem as an opportunity to flex their cursor-based coding muscles.
If you look at the problem from a procedural point of view, it essentially calls for performing nested loops
through AdventureWorks' customer orders and inventory to match them up. However, this problem doesn't
require procedural code, and the task can be completed in a set-based fashion using a numbers table, as
described in the previous section. A numbers table with numbers from 0 to 30,000 is adequate for this task;
the code to create it is shown in Listing 4-10.
Listing 4-10. Creating a Numbers Table
USE [AdventureWorks2014];
GO
IF EXISTS (SELECT * FROM sys.objects
    WHERE object_id = OBJECT_ID(N'[dbo].[Numbers]')
    AND type IN (N'U'))
DROP TABLE [dbo].[Numbers];
GO
-- Create a numbers table to allow the product pull list to be
-- created using set-based logic
CREATE TABLE dbo.Numbers (Num int NOT NULL PRIMARY KEY);
GO
-- Fill the numbers table with numbers from 0 to 30,000
WITH NumberList (Num) AS
(
    SELECT 0 UNION ALL
    SELECT Num + 1 FROM NumberList WHERE Num < 30000
)
INSERT INTO dbo.Numbers (Num)
SELECT Num FROM NumberList
OPTION (MAXRECURSION 0);
GO
So, with a better understanding of order-fulfillment logic and business rules, Listing 4-11 creates a
multistatement TVF to return the product pull list according to the rules provided. As mentioned, this
multistatement TVF uses set-based logic (no cursors or loops) to retrieve the product pull list.
Many programming problems in business suggest a procedural, loop-based solution at first glance.
This applies to problems that you must solve in T-SQL as well. If you look at business problems with a
set-based mindset, you often find a set-based solution. In the product pull list example, the loop-based
process of comparing every row of inventory to the order-detail rows is immediately apparent.
However, if you think of the inventory items and order-detail items as two sets, then the problem becomes
a set-based problem. In this case, the solution is a variation of the classic computer science/mathematics
bin-packing problem. In the bin-packing problem, you’re given a set of bins (in this case, orders) in which
to place a finite set of items (inventory items in this example). The natural bounds provided are the number
of each item in inventory and the number of each item on each order-detail line.
By solving this as a set-based problem in T-SQL, you allow SQL Server to optimize the performance of
your code based on the most current information available. As mentioned in Chapter 3, when you use
cursors and loops, you take away SQL Server’s performance-optimization options, and you assume the
responsibility for performance optimization. You can use set-based logic instead of cursors and loops to
solve this particular problem. In reality, solving this problem with a set-based solution took only about
30 minutes of my time. A cursor or loop-based solution would have taken just as long or longer, and it
wouldn’t have been nearly as efficient.
Listing 4-11. Creating a Product Pull List
CREATE FUNCTION dbo.GetProductPullList()
RETURNS @result table
(
    SalesOrderID int NOT NULL,
    ProductID int NOT NULL,
    LocationID smallint NOT NULL,
    Shelf nvarchar(10) NOT NULL,
    Bin tinyint NOT NULL,
    QuantityInBin smallint NOT NULL,
    QuantityOnOrder smallint NOT NULL,
    QuantityToPull smallint NOT NULL,
    PartialFillFlag nchar(1) NOT NULL,
    PRIMARY KEY (SalesOrderID, ProductID, LocationID, Shelf, Bin)
)
AS
BEGIN
    INSERT INTO @result
    SELECT Order_Details.SalesOrderID, Order_Details.ProductID,
        Inventory_Details.LocationID, Inventory_Details.Shelf, Inventory_Details.Bin,
        Inventory_Details.Quantity, Order_Details.OrderQty,
        COUNT(*) AS PullQty,
        CASE WHEN COUNT(*) < Order_Details.OrderQty THEN N'Y' ELSE N'N' END AS PartialFillFlag
    FROM
    (
        -- One row per individual item in inventory, numbered per product
        SELECT ROW_NUMBER() OVER
            (
                PARTITION BY p.ProductID
                ORDER BY p.ProductID, p.LocationID, p.Shelf, p.Bin
            ) AS Num,
            p.ProductID, p.LocationID, p.Shelf, p.Bin, p.Quantity
        FROM Production.ProductInventory p
        INNER JOIN dbo.Numbers n
            ON n.Num BETWEEN 1 AND Quantity
    ) Inventory_Details
    INNER JOIN
    (
        -- One row per individual item ordered, numbered per product
        SELECT ROW_NUMBER() OVER
            (
                PARTITION BY o.ProductID
                ORDER BY o.ProductID, o.SalesOrderID, o.SalesOrderDetailID
            ) AS Num,
            o.ProductID, o.SalesOrderID, o.OrderQty
        FROM Sales.SalesOrderDetail o
        INNER JOIN dbo.Numbers n
            ON n.Num BETWEEN 1 AND o.OrderQty
    ) Order_Details
        ON Inventory_Details.ProductID = Order_Details.ProductID
            AND Inventory_Details.Num = Order_Details.Num
    GROUP BY Order_Details.SalesOrderID, Order_Details.ProductID,
        Inventory_Details.LocationID, Inventory_Details.Shelf, Inventory_Details.Bin,
        Inventory_Details.Quantity, Order_Details.OrderQty;
    RETURN;
END;
Retrieving the product pull list involves a simple SELECT query like the following. Partial results are
shown in Figure 4-6:
SELECT *
FROM dbo.GetProductPullList();
Figure 4-6. AdventureWorks product pull list (partial)
One interesting aspect of the multistatement TVF is the CREATE FUNCTION statement and its RETURNS clause,
which define the name of the function, the parameters passed in (if any), and the resulting table structure:
CREATE FUNCTION dbo.GetProductPullList()
RETURNS @result table
(
    SalesOrderID int NOT NULL,
    ProductID int NOT NULL,
    LocationID smallint NOT NULL,
    Shelf nvarchar(10) NOT NULL,
    Bin tinyint NOT NULL,
    QuantityInBin smallint NOT NULL,
    QuantityOnOrder smallint NOT NULL,
    QuantityToPull smallint NOT NULL,
    PartialFillFlag nchar(1) NOT NULL,
    PRIMARY KEY (SalesOrderID, ProductID, LocationID, Shelf, Bin)
)
Notice that you define a primary key on the table result. This also serves as the clustered index for the
result set. Due to limitations in table variables, you can’t explicitly specify other indexes on the result set.
The body of the function begins with the INSERT INTO and SELECT clauses that follow:
INSERT INTO @result
SELECT Order_Details.SalesOrderID, Order_Details.ProductID,
    Inventory_Details.LocationID, Inventory_Details.Shelf, Inventory_Details.Bin,
    Inventory_Details.Quantity, Order_Details.OrderQty,
    COUNT(*) AS PullQty,
    CASE WHEN COUNT(*) < Order_Details.OrderQty THEN N'Y' ELSE N'N' END AS PartialFillFlag
These clauses establish population of the @result table variable. The most important point to notice
here is that the return results of this multistatement TVF are created by manipulating the contents of the
@result table variable. When the function ends, the @result table variable is returned to the caller.
Some other important facts about this portion of the multistatement TVF are that the COUNT(*) AS PullQty
aggregate function returns the total number of each item to pull from a given bin to fill a specific order-detail
row, and the CASE expression returns Y when an order-detail item is partially filled from a single bin and N
when an order-detail item is completely filled from a single bin.
The source for the SELECT query is composed of two subqueries joined together. The first subquery,
aliased as Inventory_Details, is shown next. This subquery returns a single row for every item in inventory
with information identifying the precise location where the inventory item can be found:
(
    SELECT ROW_NUMBER() OVER
        (
            PARTITION BY p.ProductID
            ORDER BY p.ProductID, p.LocationID, p.Shelf, p.Bin
        ) AS Num,
        p.ProductID, p.LocationID, p.Shelf, p.Bin, p.Quantity
    FROM Production.ProductInventory p
    INNER JOIN dbo.Numbers n
        ON n.Num BETWEEN 1 AND Quantity
) Inventory_Details
Consider the previous example with the customers Jill, Mike, and Dave. If there are nine black
Mountain-200 42-inch mountain bikes in inventory, this query returns nine rows, one for each instance of
the item in inventory, and each with a unique row number counting from 1.
The Inventory_Details subquery is inner-joined to a second subquery, aliased as Order_Details:
(
    SELECT ROW_NUMBER() OVER
        (
            PARTITION BY o.ProductID
            ORDER BY o.ProductID, o.SalesOrderID, o.SalesOrderDetailID
        ) AS Num,
        o.ProductID, o.SalesOrderID, o.OrderQty
    FROM Sales.SalesOrderDetail o
    INNER JOIN dbo.Numbers n
        ON n.Num BETWEEN 1 AND o.OrderQty
) Order_Details
This subquery breaks up quantities of items in all order details into individual rows. Again, in the
example of Jill, Mike, and Dave, this query breaks each of the order details into five rows, one for each
item of each order detail. The rows are assigned unique numbers for each product. So in the example, the
rows for each black Mountain-200 42-inch mountain bike that the three customers ordered are numbered
individually from 1 to 15.
The rows of both subqueries are joined based on their ProductID numbers and the unique row
numbers assigned to each row of each subquery. This effectively assigns one item from inventory to
fill exactly one item in each order. Figure 4-7 is a visualization of the process described here, where the
inventory items and order-detail items are split into separate rows and the two rowsets are joined together.
Figure 4-7. Splitting and joining individual inventory and order-detail items
The SELECT statement also requires a GROUP BY to aggregate the total number of items to be pulled from
each bin to fill each order detail, as opposed to returning the raw inventory-to-order detail items on a
one-to-one basis:
GROUP BY Order_Details.SalesOrderID, Order_Details.ProductID,
    Inventory_Details.LocationID, Inventory_Details.Shelf, Inventory_Details.Bin,
    Inventory_Details.Quantity, Order_Details.OrderQty;
Finally, the RETURN statement returns the @result table back to the caller as the multistatement TVF
result. Notice that the RETURN statement in a multistatement TVF isn't followed by an expression or variable
as it is in a scalar UDF:
RETURN;
END;
The table returned by a TVF can be used just like a table in a WHERE clause or a JOIN clause of an SQL
SELECT query. Listing 4-12 is a sample query that joins the example TVF to the Production.Product table to
get the product names and colors for each product listed in the pull list. Figure 4-8 shows the output of the
product pull list joined to the Production.Product table.
Listing 4-12. Retrieving a Product Pull List with Product Names
SELECT ppl.*,
    p.Name AS ProductName,
    p.Color
FROM Production.Product p
INNER JOIN dbo.GetProductPullList() ppl
    ON p.ProductID = ppl.ProductID;
Inline Table-Valued Functions
If scalar UDFs and multistatement TVFs aren't enough to get you excited about T-SQL's UDF capabilities,
here comes a third form of UDF: the inline TVF. Inline TVFs are similar to multistatement TVFs in that they
return a tabular rowset result.
However, whereas a multistatement TVF can contain multiple SQL statements and control-of-flow
statements in the function body, the inline function consists of only a single SELECT query. The inline TVF
is literally “inlined” by SQL Server (expanded by the query optimizer as part of the SELECT statement that
contains it), much like a view. In fact, because of this behavior, inline TVFs are sometimes referred to as
parameterized views.
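To see why "parameterized view" is an apt description, compare a view with an equivalent inline TVF. This is a hypothetical example (the view and function names are invented; Production.Product's Color column is nvarchar(15) in AdventureWorks):

```sql
-- A view's filter is fixed at creation time...
CREATE VIEW dbo.RedProducts
AS
SELECT ProductID, Name
FROM Production.Product
WHERE Color = N'Red';
GO
-- ...while an inline TVF accepts the filter value as a parameter
CREATE FUNCTION dbo.GetProductsByColor (@Color nvarchar(15))
RETURNS table
AS
RETURN
(
    SELECT ProductID, Name
    FROM Production.Product
    WHERE Color = @Color
);
GO
SELECT * FROM dbo.GetProductsByColor(N'Red');
```

Like a view, the function body is expanded into the calling query's plan, so there's no per-row function-call overhead.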
The inline TVF declaration must simply state that the result is a table via the RETURNS clause. The body
of the inline TVF consists of an SQL query after a RETURN statement. Because the inline TVF returns the result
of a single SELECT query, you don’t need to bother with declaring a table variable or defining the return-table
structure. The structure of the result is implied by the SELECT query that makes up the body of the function.
The sample inline TVF performs a function commonly implemented by developers in T-SQL using
control-of-flow statements. Many times, a developer determines that a function or SP requires that a large
or variable number of parameters be passed in to accomplish a particular goal. The ideal situation would be
to pass an array as a parameter. T-SQL doesn’t provide an array data type per se, but you can split a comma-
delimited list of strings into a table to simulate an array. This gives you the flexibility of an array that you can
use in SQL joins.
Tip  SQL Server 2008 and later allow table-valued parameters, which are covered in Chapter 5 in the
discussion of SPs. Because table-valued parameters have special requirements, they may not be optimal in all
situations.
Figure 4-8. Joining the product pull list to the Production.Product table
Although you could do this using a multistatement TVF and control-of-flow statement such as a WHILE
loop, you get better performance if you let SQL Server do the heavy lifting with a set-based solution. The
sample function accepts a comma-delimited varchar(max) string and returns a table with two columns, Num
and Element, which are described here:
The • Num column contains a unique number for each element of the array, counting
from 1 to the number of elements in the comma-delimited string.
The • Element column contains the substrings extracted from the comma-delimited
Listing 4-13 is the full code listing for the comma-separated string-splitting function. This function
accepts a single parameter, which is a comma-delimited string like Ronnie,Bobbie,Ricky,Mike. The
output is a table-like rowset with each comma-delimited item returned on its own row. To avoid looping
and procedural constructs (which aren’t allowed in an inline TVF), you use the same Numbers table created
previously in Listing 4-10.
Listing 4-13. Comma-Separated String-Splitting Function
CREATE FUNCTION dbo.GetCommaSplit (@String nvarchar(max))
RETURNS table
AS
RETURN
WITH Splitter (Num, String)
AS
(
    SELECT Num,
        SUBSTRING(@String, Num,
            CASE CHARINDEX(N',', @String, Num)
                WHEN 0 THEN LEN(@String) - Num + 1
                ELSE CHARINDEX(N',', @String, Num) - Num
            END
        ) AS String
    FROM dbo.Numbers
    WHERE Num <= LEN(@String)
        AND (SUBSTRING(@String, Num - 1, 1) = N',' OR Num = 0)
)
SELECT ROW_NUMBER() OVER (ORDER BY Num) AS Num,
    RTRIM(LTRIM(String)) AS Element
FROM Splitter
WHERE String <> N'';
The inline TVF name and parameters are defined at the beginning of the CREATE FUNCTION statement.
The RETURNS table clause specifies that the function returns a table. Notice that the structure of the table
isn’t defined as it is with a multistatement TVF:
CREATE FUNCTION dbo.GetCommaSplit (@String nvarchar(max)) RETURNS table
The body of the inline TVF consists of a single RETURN statement followed by a SELECT query. This
example uses a CTE called Splitter to perform the actual splitting of the comma-delimited list. The query
of the CTE returns each substring from the comma-delimited list. CASE expressions are required to handle
two special cases, as follows:
•	The first item in the list, because it isn't preceded by a comma
•	The last item in the list, because it isn't followed by a comma
WITH Splitter (Num, String)
AS
(
    SELECT Num,
        SUBSTRING(@String, Num,
            CASE CHARINDEX(N',', @String, Num)
                WHEN 0 THEN LEN(@String) - Num + 1
                ELSE CHARINDEX(N',', @String, Num) - Num
            END) AS String
    FROM dbo.Numbers
    WHERE Num <= LEN(@String)
        AND (SUBSTRING(@String, Num - 1, 1) = N',' OR Num = 1)
)
Finally, the query selects each Num (generated with the ROW_NUMBER function) and Element from the CTE as the result to return to the caller.
Extra space characters are stripped from the beginning and end of each string returned, and empty strings
are ignored:
SELECT ROW_NUMBER() OVER (ORDER BY Num) AS Num,
    LTRIM(RTRIM(String)) AS Element
FROM Splitter
WHERE String <> ''
You can use this inline TVF to split up the Jackson family, as shown in Listing 4-14. The results are
shown in Figure 4-9.
Figure 4-9. Splitting up the Jacksons
Listing 4-14. Splitting Up the Jacksons
SELECT Num, Element
FROM dbo.GetCommaSplit ('Michael,Tito,Jermaine,Marlon,Rebbie,Jackie,Janet,La Toya,Randy');
You can use this technique to pull descriptions for a specific set of AdventureWorks products. A usage
like this is good for front-end web page displays or business reports where end users can select multiple
items for which they want data returned. Listing 4-15 retrieves product information for a comma-delimited
list of AdventureWorks product numbers. The results are shown in Figure 4-10.
Listing 4-15. Using the GetCommaSplit Function
SELECT p.ProductNumber, p.Name
FROM Production.Product p
INNER JOIN dbo.GetCommaSplit('FR-R38R-52,FR-M94S-52,FR-M94B-44,BK-M68B-38') n
    ON p.ProductNumber = n.Element;
Figure 4-10. Using a comma-delimited list to retrieve product information
Restrictions on User-Defined Functions
T-SQL imposes some restrictions on UDFs. This section discusses these restrictions and some of the
reasoning behind them.
Nondeterministic Functions
T-SQL prohibits the use of nondeterministic functions in UDFs. A deterministic function is one that returns
the same value every time when passed a given set of parameters (or no parameters). A nondeterministic
function can return different results with the same set of parameters passed to it. An example of a
deterministic function is ABS, the mathematical absolute value function. Every time and no matter how
many times you call ABS(-10), the result is always 10. This is the basic idea behind determinism.
On the flip side, there are functions that don’t return the same value despite the fact that you pass in
the same parameters, or no parameters. Built-in functions such as RAND (without a seed value) and NEWID
are nondeterministic because they return a different result every time they’re called. One hack that people
sometimes use to try to circumvent this restriction is creating a view that invokes the nondeterministic
function and selecting from that view inside their UDFs. Although this may work to some extent, it isn’t
recommended: it could fail to produce the desired results or cause a significant performance hit, because
SQL can’t cache or effectively index the results of nondeterministic functions. Also, if you create a computed
column that tries to reference your UDF, the nondeterministic functions you’re trying to access via your view
can produce unpredictable results. If you need to use nondeterministic functions in your application logic,
SPs are probably the better alternative. Chapter 5 discusses SPs.
In previous versions of SQL Server, there were several restrictions on the use of nondeterministic system functions in UDFs. In SQL Server 2012, these restrictions were somewhat relaxed, and you can now use nondeterministic system functions such as GETDATE, GETUTCDATE, and CURRENT_TIMESTAMP in your UDFs. One thing these system functions have in common is that they don't cause side effects or change the database state when you use them.
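For example, a UDF along the following lines compiles without error, because GETDATE has no side effects. The function name and logic here are illustrative, not part of the sample database; note that SQL Server will still mark the function nondeterministic.

```sql
-- Illustrative sketch: a scalar UDF that calls the nondeterministic
-- (but side-effect-free) system function GETDATE.
CREATE FUNCTION dbo.GetAgeInDays (@StartDate date)
RETURNS int
AS
BEGIN
    RETURN DATEDIFF(day, @StartDate, GETDATE());
END;
```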
If you want to build an index on a view or computed column that uses a UDF, your UDF has to be
deterministic. The requirements to make a UDF deterministic include the following:
•	The UDF must be declared using the WITH SCHEMABINDING option. When a UDF is schema-bound, no changes are allowed to any tables or objects on which it's dependent without dropping the UDF first.
•	Any functions you refer to in your UDF must also be deterministic. This means if you use a nondeterministic system function—such as GETDATE—in your UDF, it's marked as nondeterministic.
•	You can't invoke extended stored procedures (XPs) in the function. This shouldn't be a problem, because XPs are deprecated and will be removed from future versions of SQL Server.
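Putting these requirements together, a minimal deterministic UDF might look like the following sketch. The function is hypothetical, not from the AdventureWorks samples:

```sql
-- Hypothetical deterministic scalar UDF: schema-bound, and free of
-- nondeterministic calls and XPs.
CREATE FUNCTION dbo.CubeVolume (@Side decimal(10, 2))
RETURNS decimal(30, 6)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @Side * @Side * @Side;
END;
```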
If your UDF meets all these criteria, you can check to see if SQL Server has marked it deterministic via
the OBJECTPROPERTY function, with a query like the following:
SELECT OBJECTPROPERTY (OBJECT_ID('dbo.GetCommaSplit'), 'IsDeterministic');
The OBJECTPROPERTY function returns 0 if your UDF is nondeterministic and 1 if it’s deterministic.
State of the Database
One of the restrictions on UDFs is that they aren’t allowed to change the state of the database or cause other
side effects. This prohibition on side effects in UDFs means you can’t even execute PRINT statements from
within a UDF. It also means that although you can query database tables and resources, you can’t execute
INSERT, UPDATE, MERGE, or DELETE statements against database tables. Some other restrictions include the following:
•	You can't create temporary tables within a UDF. You can, however, create and modify table variables in the body of a UDF.
•	You can't execute CREATE, ALTER, or DROP on database tables from within a UDF.
•	Dynamic SQL isn't allowed within a UDF, although XPs and SQLCLR functions can be called.
•	A TVF can return only a single table/result set. If you need to return more than one table/result set, you may be better served by an SP.
Although XPs and SQL CLR functions can be called from a UDF, Microsoft warns against depending on
results returned by XPs and SQL CLR functions that cause side effects. If your XP or SQL CLR function
modifies tables, alters the database schema, accesses the file system, changes system settings, or
utilizes non-deterministic resources external to the database, you may get unpredictable results from
your UDF. If you need to change database state or rely on side effects in your server-side code, consider
using an SQL CLR function or a regular SP instead of a UDF.
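The temporary-table restriction can be sketched as follows: the table variable in this multistatement TVF is legal, whereas a CREATE TABLE #Temp statement in the same position would raise an error. The function itself is illustrative:

```sql
-- Illustrative multistatement TVF: table variables are allowed in a
-- UDF body; temporary tables (#Temp) are not.
CREATE FUNCTION dbo.GetSmallPrimes ()
RETURNS @Primes table (Prime int NOT NULL)
AS
BEGIN
    INSERT INTO @Primes (Prime)
    VALUES (2), (3), (5), (7);
    RETURN;
END;
```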
The prohibition on UDF side effects extends to the SQL Server display and error systems. This means
you can’t use the T-SQL PRINT or RAISERROR statement in a UDF. The PRINT and RAISERROR statements
are useful in debugging stored procedures and T-SQL code batches but are unavailable for use in
UDFs. One workaround that I often use is to temporarily move the body of the UDF code to an SP while
testing. This gives you the ability to use PRINT and RAISERROR while testing and debugging code in
development environments.
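As a sketch of that workaround, the body of a UDF can be pasted into a throwaway SP so PRINT becomes available during debugging. The names and stand-in logic here are illustrative:

```sql
-- Illustrative debugging shell: the UDF body runs inside an SP, where
-- PRINT is allowed; drop the SP once debugging is complete.
CREATE PROCEDURE dbo.DebugMyFunction @Input int
AS
BEGIN
    DECLARE @Result int = @Input * 2;  -- stand-in for the UDF logic
    PRINT 'Intermediate result: ' + CONVERT(varchar(12), @Result);
    SELECT @Result AS Result;
END;
```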
Variables and table variables created in UDFs have a well-defined scope and can’t be accessed outside
of the UDF. Even if you have a recursive UDF, you can’t access the variables and table variables that
were previously declared and assigned values by the calling function. If you need values that were
generated by a UDF, you must pass them in as parameters to another UDF call or return them to the
caller in the UDF result.
Summary
This chapter discussed the three types of T-SQL UDFs and provided working examples of the different types.
Scalar UDFs are analogous to mathematical functions that accept zero or more parameters and return a single
scalar value for a result. You can use the standard SQL statements, as well as control-of-flow statements, in a
scalar UDF. Multistatement TVFs allow control-of-flow statements as well but return a table-style result set
to the caller. You can use the result set returned by a multistatement TVF in WHERE and JOIN clauses. Finally,
inline TVFs also return table-style result sets to the caller; however, the body consists of a single SELECT query,
much like an SQL view. In fact, inline TVFs are sometimes referred to as parameterized views.
The type of UDF that you need to accomplish a given task depends on the problem you’re trying to
solve. For instance, if you need to calculate a single scalar value, a scalar UDF will do the job. On the other
hand, if you need to perform complex calculations or manipulations and return a table, a multistatement
TVF might be the correct choice.
You also learned about recursion in UDFs, including the 32-level recursion limit. Although 32 levels of
recursion is the hard limit, for all practical purposes you should rarely—if ever—hit this limit. If you do need
recursion beyond 32 levels, you can replace recursive function calls with CTEs and other T-SQL constructs.
Finally, the chapter talked about determinism and side effects in UDFs. Specifically, your UDFs should
not cause side effects, and specific criteria must be met in order for SQL Server to mark your UDFs as
deterministic. Determinism is an important aspect of UDFs if you plan to use them in indexed views or
computed columns.
The next chapter looks at SPs—another tool that allows procedural T-SQL code to be consolidated into
server-side units.
EXERCISES
1. [Fill in the blank] SQL Server supports three types of T-SQL UDFs: _______,
________, and _________.
2. [True/False] The RETURNS NULL ON NULL INPUT option is a performance-
enhancing option available for use with scalar UDFs.
3. [True/False] The ENCRYPTION option provides a secure option that prevents anyone
from reverse-engineering your source code.
4. [Choose all that apply] You aren’t allowed to do which of the following in a
multistatement TVF?
a. Execute a PRINT statement
b. Call RAISERROR to generate an exception
c. Declare a table variable
d. Create a temporary table
5. The algebraic formula for converting Fahrenheit measurements to the Celsius scale is:
C=(F – 32.0) × (5/9), where F is the measurement in degrees Fahrenheit and C is
the measurement in degrees Celsius.
Write a deterministic scalar UDF that converts a measurement in degrees Fahrenheit to degrees
Celsius. The UDF should accept a single float parameter and return a float result. You can use the
OBJECTPROPERTY function to ensure that the UDF is deterministic.
Chapter 5
Stored Procedures
Stored procedures (SPs) have been a part of T-SQL from the beginning. SPs provide a means for creating
server-side subroutines written in T-SQL. SQL Server 2014 introduces the ability to natively compile an SP
that accesses memory-optimized tables. The efficiencies gained with natively compiled SPs are an absolute
game-changer in how you consider architecting an OLTP database solution.
This chapter begins with a discussion of what SPs are and why you might want to use them, and it
continues with a discussion of SP creation and usage, including examples. Natively compiled SPs are
introduced in this chapter, but the complete picture of how they work with in-memory tables is covered in
more detail in Chapter 6.
Introducing Stored Procedures
SPs are saved collections of one or more T-SQL statements stored on the server as code units. They’re
analogous to procedures or subroutines in procedural languages like Visual Basic and C#. And just like
procedures in procedural languages, SPs give you the ability to effectively extend the language of SQL Server
by letting you add named custom subroutines to your databases.
An SP declaration begins with the CREATE PROCEDURE keywords followed by the name of the SP.
Microsoft recommends against naming the SP with the prefix sp_. This prefix is used by SQL Server to name
system SPs and isn’t recommended for user SPs in databases other than the master database. The name
can specify a schema name and procedure name, or just a procedure name. If you don’t specify a schema
name when creating an SP, SQL Server creates it in the default schema for your login. It’s a best practice to
always specify the schema name so your SPs are always created in the proper schema, rather than leaving it
up to SQL Server. SQL Server allows you to drop groups of procedures with the same name with a single DROP
PROCEDURE statement.
Warning You can also define a stored procedure with the group number option during SP creation.
The group number option is deprecated and will be removed from future versions of SQL Server. Don’t use this
option in new development, and start planning to update code that uses it.
SPs, like the T-SQL user-defined functions (UDFs) discussed in Chapter 4, can accept parameter values
from and return them to the caller. The parameters are specified in a comma-separated list following the
procedure name in the CREATE PROCEDURE statement. Unlike with UDFs, when you call an SP, you can specify
the parameters in any order; and you can omit them altogether if you assigned a default value at creation
time. You can also specify OUTPUT parameters, which return values from the procedure. All this makes SP
parameters far more flexible than those of UDFs.
Each parameter is declared as a specific type and can also be declared as OUTPUT or with the VARYING
keyword (for cursor parameters only). When calling SPs, you have two choices: you can specify parameters
by position or by name. If you specify an unnamed parameter list, the values are assigned based on position.
If you specify named parameters in the format @parameter = value, they can be in any order. If your
parameter specifies a default value in its declaration, you don’t have to pass in a value for that parameter.
Unlike UDFs, SPs don’t require the DEFAULT keyword as a placeholder to specify default values. Leaving out a
parameter when you call the SP applies the default value to that parameter.
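A short sketch shows both calling styles and the default-value behavior. The procedure is hypothetical:

```sql
-- Hypothetical SP with a defaulted second parameter.
CREATE PROCEDURE dbo.EchoParams
    @First int,
    @Second int = 0
AS
SELECT @First AS First, @Second AS Second;
GO

EXECUTE dbo.EchoParams 1;                       -- positional; @Second defaults to 0
EXECUTE dbo.EchoParams @Second = 2, @First = 1; -- named; any order
```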
Unlike UDFs, which can return results only via the RETURN statement, SPs can communicate with the
caller in a variety of ways:
The SP’s • RETURN statement can return an int value to the caller. Unlike UDFs, SPs
don’t require a RETURN statement. If the RETURN statement is left out of the SP, 0 is
returned by default if no errors were raised during execution.
SPs don’t have the same restrictions on database side effects and determinism as do •
UDFs. SPs can read, write, delete, and update permanent tables. In this way, the caller and
SP can communicate information to one another through the use of permanent tables.
When a temporary table is created in an SP, that temporary table is available to any •
SPs called by that SP. There are two types of temporary tables: local and global. The
scope of a local temporary table is the current session, and the scope of a global
temporary table is all sessions. A local temporary table is prefixed with #, and a global
temporary table is prefixed with ##. As an example, if dbo.MyProc1 creates a local
temporary table named #Temp and then calls dbo.MyProc2, dbo.MyProc2 can access
#Temp as well. If dbo.MyProc2 then calls dbo.MyProc3, dbo.MyProc3 can also access
the same #Temp temporary table. Global temporary tables are accessible by all users
and all connections after they’re created. This provides a useful method of passing an
entire table of temporary results from one SP to another for further processing.
Output parameters provide the primary method of retrieving scalar results from an •
SP. Parameters are specified as output parameters with the OUTPUT keyword.
To return table-type results from an SP, the SP can return one or more result sets. •
Result sets are like virtual tables that can be accessed by the caller. Unlike with views,
updates to these result sets by applications don’t change the underlying tables used
to generate them. Also, unlike table-valued function (TVFs) and inline functions that
return a single table only, SPs can return multiple result sets with a single call.
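The temporary-table scoping described above can be sketched with the dbo.MyProc1/dbo.MyProc2 example:

```sql
-- dbo.MyProc2 can read #Temp because it is called from the procedure
-- (and session) that created the table.
CREATE PROCEDURE dbo.MyProc2
AS
    SELECT Val FROM #Temp;
GO

CREATE PROCEDURE dbo.MyProc1
AS
BEGIN
    CREATE TABLE #Temp (Val int);
    INSERT INTO #Temp (Val) VALUES (42);
    EXECUTE dbo.MyProc2;  -- returns the row inserted above
END;
GO

EXECUTE dbo.MyProc1;
```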
SP RETURN STATEMENTS
Because the SP RETURN statement can’t return tables, character data, decimal numbers, and so on, it’s
normally used only to return an int status or error code. This is a good convention to follow, because
most developers who use your SPs will expect it. The normal practice, followed by most of SQL Server’s
system SPs, is to return a value of 0 to indicate success and a nonzero value or an error code to indicate
an error or a failure.
Metadata Discovery
SQL Server 2012 introduced two new stored procedures and supporting Dynamic Management Views
(DMVs) to provide new capabilities for determining metadata associated with code batches or SPs. This set
of capabilities replaces the SET FMTONLY option, which is being deprecated.
Often it’s necessary to determine the format of a result set without actually executing the query.
There are also scenarios in which you have to ensure that the column and parameter metadata from
query execution is compatible with or identical to the format you specified before executing the query.
For example, if you want to generate dynamic screens based on a SELECT statement, you need to make sure
there are no metadata errors after query execution, so in turn you need to determine whether the parameter
metadata is compatible before and after query execution. This functionality introduces metadata discovery
capabilities for result sets and parameters using the SPs sp_describe_first_result_set and sp_describe_undeclared_parameters and the DMVs dm_exec_describe_first_result_set and dm_exec_describe_first_result_set_for_object.
The SP sp_describe_first_result_set analyzes all possible first result sets and returns the metadata
information for the first result set that is executed from the input T-SQL batch. If the SP returns multiple
result sets, this procedure only returns the first result set. If SQL Server is unable to determine the metadata
for the first query, then an error is raised. This procedure takes three parameters: @tsql passes the T-SQL
batch, @params passes the parameters for the T-SQL batch, and @browse_information_mode determines
whether additional browse information for each result set is returned.
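For example, a call along the following lines describes the single result set of a parameterized batch without executing it. The query text is illustrative but uses standard AdventureWorks tables:

```sql
-- Describe the columns the batch would return, without running it.
EXECUTE sp_describe_first_result_set
    @tsql = N'SELECT BusinessEntityID, JobTitle
              FROM HumanResources.Employee
              WHERE BusinessEntityID = @id;',
    @params = N'@id int',
    @browse_information_mode = 0;
```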
Alternatively, you can use the DMV sys.dm_exec_describe_first_result_set to query against; this
DMV returns the same details as the SP sp_describe_first_result_set. You can use the DMV
sys.dm_exec_describe_first_result_set_for_object to analyze objects such as SPs or triggers in the
database and return the metadata for the first possible result set and the errors associated with them. Let’s
say you want to analyze all the objects in the database and use the information for documentation purposes.
Instead of analyzing the objects one by one, you can use the DMV sys.dm_exec_describe_first_result_set_for_object with a query similar to the following:
SELECT p.name, p.schema_id, x.*
FROM sys.procedures p
CROSS APPLY sys.dm_exec_describe_first_result_set_for_object(p.object_id, 0) x;
The SP sp_describe_undeclared_parameters analyzes the T-SQL batch and returns a suggestion for the best parameter datatype based on the least number of conversions. This feature is very useful when you have
complicated calculations or expressions and you’re trying to figure out the best datatype for the undeclared
parameter value.
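A sketch of its use, with an illustrative batch:

```sql
-- Ask SQL Server to suggest a datatype for the undeclared @price parameter.
EXECUTE sp_describe_undeclared_parameters
    @tsql = N'SELECT ProductID, Name
              FROM Production.Product
              WHERE ListPrice > @price;';
```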
Natively Compiled Stored Procedures
Natively compiled stored procedures are new in SQL Server 2014 and can provide massive performance
gains. These SPs are similar to traditional T-SQL compiled SPs in the way you call them and how they
function. Natively compiled SPs are compiled via C into native machine code, which is stored as a DLL. This allows the CPU to run the code without the need to interpret it at runtime,
providing for some extreme performance gains. By contrast, traditional T-SQL SPs are interpretive; they’re
compiled and then executed every time the SP is called. Natively compiled SPs have several limitations and
can only access memory-optimized tables. (Memory-optimized tables are discussed in Chapter 6.) As of SQL
Server 2014 RTM, creating a natively compiled SP has several limitations and requires a very specific syntax.
Listing 5-1 is a simple example of a traditional T-SQL interpreted SP in the Person schema that accepts
an AdventureWorks employee’s ID and returns the employees full name and e-mail address via output
parameters. The following section contrasts a new natively compiled SP using the same memory-optimized
table objects in Listing 5-1.
Note The SP in the example, Person.GetEmployee, accepts a business entity ID number as an input
parameter and returns the corresponding employee’s e-mail address and full name as output parameters. If the
business entity ID number passed in is valid, the SP returns 0 as a return value; otherwise 1 is returned.
Listing 5-1. Creating a Traditional T-SQL SP That Retrieves an Employee’s Name and E-mail
CREATE PROCEDURE Person.GetEmployee
    @BusinessEntityID int = NULL
    , @Email_Address nvarchar(50) OUTPUT
    , @Full_Name nvarchar(100) OUTPUT
AS
BEGIN
    -- Retrieve email address and full name from HumanResources.Employee table
    SELECT @Email_Address = ea.EmailAddress,
        @Full_Name = p.FirstName + ' ' + COALESCE(p.MiddleName,'') + ' ' + p.LastName
    FROM HumanResources.Employee e
    INNER JOIN Person.Person p
        ON e.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.EmailAddress ea
        ON p.BusinessEntityID = ea.BusinessEntityID
    WHERE e.BusinessEntityID = @BusinessEntityID;

    -- Return a code of 1 when no match is found, 0 for success
    RETURN
    (
        CASE
            WHEN @Email_Address IS NULL THEN 1
            ELSE 0
        END
    );
END;
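Calling the procedure looks like the following sketch: the return code and the OUTPUT parameters are captured into local variables. The ID value passed in is illustrative:

```sql
DECLARE @Email nvarchar(50),
        @Name nvarchar(100),
        @rc int;

EXECUTE @rc = Person.GetEmployee
    @BusinessEntityID = 1,
    @Email_Address = @Email OUTPUT,
    @Full_Name = @Name OUTPUT;

SELECT @rc AS ReturnCode, @Email AS EmailAddress, @Name AS FullName;
```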
To contrast the differences, see Listing 5-2. I break down the differences line by line following this listing.
Note The code in Listing 5-2 will not execute correctly on a test machine until all the in-memory
tables have been created. Chapter 6 discusses all the code samples, with an explanation of how to set up the
in-memory tables.
Listing 5-2. Natively Compiled SP Person.GetEmployee_inmem
CREATE PROCEDURE Person.GetEmployee_inmem
    @BusinessEntityID int = NULL
    , @Email_Address nvarchar(50) OUTPUT
    , @Full_Name nvarchar(100) OUTPUT
/*** New InMemory Syntax ***/
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
/*** New InMemory Syntax ***/
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'us_english')

    /*** New Variable to handle ReturnCode Logic ***/
    DECLARE @ReturnCode bit = 1;

    -- Retrieve email address and full name from HumanResources.Employee table
    SELECT @Email_Address = ea.EmailAddress,
        @Full_Name = p.FirstName + ' ' + ISNULL(p.MiddleName,'') + ' ' + p.LastName
        /*** New Code to handle ReturnCode Logic ***/
        , @ReturnCode = 0
    FROM HumanResources.Employee_inmem e
    INNER JOIN Person.Person_inmem p ON
        e.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.EmailAddress_inmem ea ON
        p.BusinessEntityID = ea.BusinessEntityID
    WHERE e.BusinessEntityID = @BusinessEntityID;

    -- Return a code of 1 when no match is found, 0 for success
    RETURN ( @ReturnCode )
END
You should notice several obvious differences when you look at the SPs in Listings 5-1 and 5-2. Following is an outline of the differences and how to create a natively compiled SP:
1. The tables accessed in Listing 5-2 reference in-memory tables only. The new
tables are identified with the _inmem suffix. It’s an absolute requirement to access
memory-optimized tables from a natively compiled SP. Chapter 6 goes over
how to create in-memory tables, in addition to several of the limitations and
requirements for these types of tables.
2. The first difference from a traditional T-SQL SP is in line 9:
a. The WITH option is required with the indicator NATIVE_COMPILATION to show
that it’s a natively compiled SP.
b. SCHEMABINDING must be specified so it’s bound to the schema of the objects
it references. The tables referenced in the SP can’t be dropped without first
dropping the SP itself.
c. The EXECUTE AS execution context must be specified as EXECUTE AS OWNER,
EXECUTE AS USER, or EXECUTE AS SELF. The default behavior of a T-SQL SP
is EXECUTE AS CALLER, which isn’t supported in a natively compiled SP.
3. The second line with a difference is line 13. BEGIN ATOMIC must be specified so the execution is guaranteed to be atomic. There are two required options for the atomic blocks:
a. TRANSACTION ISOLATION LEVEL must be specified (for example, SNAPSHOT).
b. LANGUAGE must be specified.
4. Line 33 is completely different from the original version of the code (see Figure 5-1), for a very important reason. Natively compiled SPs don't support the CASE statement. This limitation forced me to handle the logic in a different manner: in the SELECT clause of Listing 5-2, the variable @ReturnCode is set whenever a matching row is found, so the valid value is returned.
Figure 5-1. Differences in the RETURN code blocks between the original T-SQL SP and the natively compiled SP
Natively compiled SPs have a significant number of limitations. They're so numerous that it's best to reference the Microsoft MSDN documentation for the latest limitations and workarounds.
One thing to keep in mind: this is the first version of this type of functionality. Each time an SP is compiled into native machine code, SQL Server translates all the T-SQL into C. The limitations arise from the
challenges involved in doing this accurately. Microsoft has promised to continue investing in additional
capabilities in the next version of the in-memory features. Even with their limitations, the enhanced
performance gains of using these features are too compelling to not begin using them now.
Managing Stored Procedures
T-SQL provides two statements that allow you to modify and delete SPs: ALTER PROCEDURE and DROP
PROCEDURE, respectively. ALTER PROCEDURE lets you modify the code for an SP without first dropping it. The
syntax is the same as for the CREATE PROCEDURE statement, except that the keywords ALTER PROCEDURE
are used in place of CREATE PROCEDURE. ALTER PROCEDURE, like CREATE PROCEDURE, must always be the
first statement in a batch. Using the CREATE, DROP, and ALTER PROCEDURE statements forces SQL Server
to generate a new query plan. The advantage of ALTER over CREATE or DROP is that ALTER preserves the
permissions for the object, whereas CREATE and DROP reset the permissions. If you're using a natively compiled SP, ALTER PROCEDURE isn't allowed at all; the only way to alter a natively compiled SP is to drop the procedure and re-create it.
To delete a procedure from your database, use the DROP PROCEDURE statement. Listing 5-3 shows how to
drop the procedure created in Listing 5-1.
Listing 5-3. Dropping the Person.GetEmployee SP
DROP PROCEDURE Person.GetEmployee;
You can specify multiple SPs in a single DROP PROCEDURE statement by putting the SP names in a
comma-separated list. Note that you can’t specify the database or server name when dropping an SP, and
you must be in the database containing the SP in order to drop it. Additionally, as with other database
objects, you can grant or deny EXECUTE permissions on an SP through the GRANT and DENY statements.
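For example (the object and role names here are illustrative):

```sql
-- Drop two SPs in a single statement...
DROP PROCEDURE dbo.GetOrderHistory, dbo.GetOrderDetail;

-- ...and control who may execute another one.
GRANT EXECUTE ON dbo.GetCustomerOrders TO SalesReaders;
DENY EXECUTE ON dbo.GetCustomerOrders TO Interns;
```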
Stored Procedures Best Practices
Stored procedures enable you to store batches of Transact-SQL or Managed Common Language Runtime
(CLR) code centrally on the server. SPs can be very efficient; here are some best practices that can aid
development and avoid common pitfalls that can hurt performance:
Use the • SET NOCOUNT ON statement after the AS keyword, as the first statement in the
body of the procedure, when you have multiple statements in your SP. This turns
off the DONE_IN_PROC messages that SQL Server sends back to the client after each
statement in the SP is executed. This also reduces the processing performed by SQL
Server and the size of the response sent across the network.
Use schema names when creating or referencing the SP and the database objects in •
the procedure. This helps SQL Server find the objects more quickly and thus reduces
compile lock, which results in less processing time.
Don’t use the • SP_ and sys** prefixes to name user-created database objects. They’re
reserved for Microsoft and have different behaviors.
Avoid using scalar functions in • SELECT statements that return many rows of data.
Because the scalar function must be applied to every row, the resulting behavior is
like row-based processing and degrades performance.
Avoid using • SELECT *, and select only the columns you need. This reduces
processing in the database server as well as network traffic.
Use parameters when calling SPs to increase performance. In your SPs, explicitly •
create parameters with type, size, and precision to avoid type conversions.
Use explicit transactions by using • BEGIN/END TRANSACTION, and keep transactions as
short as possible. The longer the transaction, the more chances you have for locking
or blocking, and in some cases deadlocking, as well. Keep transactions short to
reduce blocking and locking.
Use the T-SQL • TRY...CATCH feature for error handling in procedures. TRY...CATCH
can encapsulate an entire block of T-SQL statements. If you’re using TRY...CATCH
with loops, place it outside the loop for better performance. This not only creates
less performance overhead, but also makes error reporting more accurate with
significantly less programming.
Use • NULL or NOT NULL for each column in a temporary table. The ANSI_DFLT_ON
and ANSI_DFLT_OFF options control the way the database engine assigns the
NULL or NOT NULL attribute to columns when these attributes aren’t specified in a
CREATE TABLE or ALTER TABLE statement. If a connection executes a procedure with
different settings for these options than the connection that created the procedure,
the columns of the table created for the second connection can have different
nullability and exhibit different behavior. If NULL or NOT NULL is explicitly stated for
each column, the temporary tables are created by using the same nullability for all
connections that execute the procedure.
Use the • UNION ALL operator instead of the UNION or OR operator, unless there is a
specific need for distinct values. UNION filters and removes the duplicate records,
whereas the UNION ALL operator requires less processing overhead because
duplicates aren’t filtered out of the result set.
WHY STORED PROCEDURES?
Debates have raged through the years over the utility of SQL Server SPs. Traditional SPs in SQL
Server 2014 offer the same execution plan caching and reuse, but the luster of this benefit has faded
somewhat. Query optimization, query caching, and reuse of query execution plans for parameterized
queries have been in a state of constant improvement since SQL Server 2000. Query optimization has
been improved even more in SQL Server 2014. SPs still offer the performance benefit of not having
to send large and complex queries over the network, but the primary benefit of query execution plan
caching and reuse isn’t as enticing as it once was.
So why use SPs? Apart from the performance benefit, which isn’t as big a factor in these days of highly
efficient parameterized queries, SPs offer code modularization and security. Creating code modules
helps reduce redundant code, eliminating potential maintenance nightmares caused by duplicate code
stored in multiple locations. By using SPs, you can deny users the ability to perform direct queries
against tables, but still allow them to use SPs to retrieve the relevant data from those tables. SPs also
offer the advantage of centralized administration of portions of your database code. Finally, SPs can
return multiple result sets with a single procedure call, such as the sp_help system SP demonstrated
here (the results are shown in Figure 5-2):
EXECUTE dbo.sp_help;
Figure 5-2. Results of the dbo.sp_help SP call
Using SPs, you can effectively build an application programming interface (API) for your database. You
can also minimize and almost prevent SQL injection by using SPs with input parameters to filter and
validate all the inputs. Creation and adherence to such an API can help ensure consistent access across
applications and make development easier for front-end and client-side developers who need to access
your database. Some third-party applications, such as certain ETL programs and database drivers, also
require SPs.
Using natively compiled SPs will change the way SPs are thought of in the architecture of an application.
Because they're compiled into machine code, there will be cases in which placing business logic directly
in the database layer performs better than placing it in other tiers.
What are the arguments against SPs? One major issue tends to be that they tightly couple your code
to the DBMS. A code base that is tightly integrated with SQL Server 2014 will be more difficult to port
to another RDBMS (such as Oracle, DB2, or MySQL) in the future. A loosely coupled application, on the
other hand, is much easier to port to different SQL DBMSs.
Portability, in turn, has its own problems. Truly portable code can result in databases and applications
that are slow and inefficient. To get true portability out of any RDBMS system, you have to take great
care to code everything in plain vanilla SQL, meaning a lot of the platform-specific performance-
enhancing functionality offered by SQL Server is off-limits.
I’m not going to dive too deeply into a discussion of the pluses and minuses of SPs. In the end, the
balance between portability and performance needs to be determined by your business requirements
and corporate IT policies on a per-project basis. Just keep these competing factors in mind when
making that decision.
Stored Procedure Example
A common application of SPs is to create a layer of abstraction for various data query, aggregation, and
manipulation functionality. The example SP in Listing 5-4 performs the common business reporting task of
calculating a running total. The results are shown in Figure 5-3.
Listing 5-4. Procedure to Calculate and Retrieve a Running Total for Sales
CREATE PROCEDURE Sales.GetSalesRunningTotal (@Year int)
AS
BEGIN
    WITH RunningTotalCTE
    AS
    (
        SELECT soh.SalesOrderNumber,
            soh.TotalDue,
            (
                SELECT SUM(soh1.TotalDue)
                FROM Sales.SalesOrderHeader soh1
                WHERE soh1.SalesOrderNumber <= soh.SalesOrderNumber
            ) AS RunningTotal,
            SUM(soh.TotalDue) OVER () AS GrandTotal
        FROM Sales.SalesOrderHeader soh
        WHERE DATEPART(year, soh.OrderDate) = @Year
        GROUP BY soh.SalesOrderNumber,
            soh.TotalDue
    )
    SELECT rt.SalesOrderNumber,
        rt.TotalDue,
        rt.RunningTotal,
        (rt.RunningTotal / rt.GrandTotal) * 100 AS PercentTotal
    FROM RunningTotalCTE rt
    ORDER BY rt.SalesOrderNumber;
    RETURN 0;
END
GO
EXEC Sales.GetSalesRunningTotal @Year = 2014;
Figure 5-3. Partial results of the running total calculation for the year 2014
The SP in Listing 5-4 accepts a single int parameter indicating the year for which the calculation should
be performed:
CREATE PROCEDURE Sales.GetSalesRunningTotal (@Year int)
The SP uses a common table expression (CTE) to return the relevant data for the year specified,
including calculations for the running total via a simple scalar subquery and the grand total via a SUM
calculation with an OVER clause:
WITH RunningTotalCTE
AS
(
    SELECT soh.SalesOrderNumber,
        soh.TotalDue,
        (
            SELECT SUM(soh1.TotalDue)
            FROM Sales.SalesOrderHeader soh1
            WHERE soh1.SalesOrderNumber <= soh.SalesOrderNumber
        ) AS RunningTotal,
        SUM(soh.TotalDue) OVER () AS GrandTotal
    FROM Sales.SalesOrderHeader soh
    WHERE DATEPART(year, soh.OrderDate) = @Year
    GROUP BY soh.SalesOrderNumber,
        soh.TotalDue
)
The result set is returned by the CTE’s outer SELECT query, and the SP finishes with a RETURN statement
that sends a return code of 0 back to the caller:
SELECT rt.SalesOrderNumber,
    rt.TotalDue,
    rt.RunningTotal,
    (rt.RunningTotal / rt.GrandTotal) * 100 AS PercentTotal
FROM RunningTotalCTE rt
ORDER BY rt.SalesOrderNumber;
RETURN 0;
The running sum, or running total, is a very commonly used business reporting tool. A running sum
calculates totals as of certain points in time (usually dollar amounts, and often calculated over days,
months, quarters, or years—but not always). In Listing 5-4, the running sum is calculated per order, for
each day over the course of a given year.
The running sum generated in the sample gives you a total sales amount as of the date and time when
each order is placed. When the first order is placed, the running sum is equal to the amount of that order.
When the second order is placed, the running sum is equal to the amount of the first order plus the
amount of the second order, and so on. Another closely related and often used calculation is the running
average, which represents a calculated point-in-time average as opposed to a point-in-time sum.
As an interesting aside, the ISO SQL standard allows you to use the OVER clause with aggregate
functions like SUM and AVG, and it also allows an ORDER BY clause inside the aggregate function's OVER
clause, making for extremely efficient and compact running sum calculations. SQL Server supports this
option beginning with SQL Server 2012, so the correlated-subquery technique shown in Listing 5-4 is
necessary only on older releases.
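If your version of SQL Server supports an ORDER BY clause inside the aggregate OVER clause, the running total from Listing 5-4 can be sketched as a single windowed SUM with no correlated subquery (shown here with a literal year rather than the @Year parameter):

```sql
SELECT soh.SalesOrderNumber,
    soh.TotalDue,
    SUM(soh.TotalDue) OVER (ORDER BY soh.SalesOrderNumber
                            ROWS UNBOUNDED PRECEDING) AS RunningTotal,
    SUM(soh.TotalDue) OVER () AS GrandTotal
FROM Sales.SalesOrderHeader soh
WHERE DATEPART(year, soh.OrderDate) = 2014
ORDER BY soh.SalesOrderNumber;
```

The explicit ROWS UNBOUNDED PRECEDING frame states the running-sum intent directly and generally performs better than the default RANGE frame.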
For the next example, assume that AdventureWorks management has decided to add a database-
driven feature to its web site. The feature they want is a “recommended products list” that will appear when
customers add products to their online shopping carts. Of course, the first step to implementing any solution
is to clearly define the requirements. The details of the requirements-gathering process are beyond the
scope of this book, so you work under the assumption that the AdventureWorks business analysts have done
their due diligence and reported back the following business rules for this particular function:
•	The recommended products list should include additional items on orders that contain the product selected by the customer. As an example, if the product selected by the customer is product ID 773 (the silver Mountain-100 44-inch bike), then items previously bought by other customers in conjunction with this bike—like product ID 712 (the AWC logo cap)—should be recommended.
•	Products that are in the same category as the product the customer selected should not be recommended. As an example, if a customer has added a bicycle to an order, other bicycles should not be recommended.
•	The recommended product list should never contain more than ten items.
•	The default product ID should be 776, the black Mountain-100 42-inch bike.
•	The recommended products should be listed in descending order of the total quantity that has been ordered. In other words, the best-selling items will be listed in the recommendations list first.
Listing 5-5 shows the SP that implements all these business rules to return a list of recommended
products based on a given product ID.
Listing 5-5. Recommended Product List SP
CREATE PROCEDURE Production.GetProductRecommendations (@ProductID int = 776)
AS
BEGIN
    WITH RecommendedProducts (
        ProductID,
        ProductSubcategoryID,
        TotalQtyOrdered,
        TotalDollarsOrdered
    )
    AS
    (
        SELECT od2.ProductID,
            p1.ProductSubcategoryID,
            SUM(od2.OrderQty) AS TotalQtyOrdered,
            SUM(od2.UnitPrice * od2.OrderQty) AS TotalDollarsOrdered
        FROM Sales.SalesOrderDetail od1
        INNER JOIN Sales.SalesOrderDetail od2
            ON od1.SalesOrderID = od2.SalesOrderID
        INNER JOIN Production.Product p1
            ON od2.ProductID = p1.ProductID
        WHERE od1.ProductID = @ProductID
            AND od2.ProductID <> @ProductID
        GROUP BY od2.ProductID,
            p1.ProductSubcategoryID
    )
    SELECT TOP(10)
        ROW_NUMBER() OVER
        (
            ORDER BY rp.TotalQtyOrdered DESC
        ) AS Rank,
        p.Name AS ProductName,
        rp.ProductID,
        rp.TotalQtyOrdered,
        rp.TotalDollarsOrdered
    FROM RecommendedProducts rp
    INNER JOIN Production.Product p
        ON rp.ProductID = p.ProductID
    WHERE rp.ProductSubcategoryID <>
    (
        SELECT ProductSubcategoryID
        FROM Production.Product
        WHERE ProductID = @ProductID
    )
    ORDER BY TotalQtyOrdered DESC;
END
GO
The SP begins with a declaration that accepts a single parameter, @ProductID. The default @ProductID is
set to 776, per the AdventureWorks management team's rules:
CREATE PROCEDURE Production.GetProductRecommendations (@ProductID int = 776)
Next, the CTE that will return the TotalQtyOrdered, ProductID, TotalDollarsOrdered, and
ProductSubCategoryID for each product is defined:
WITH RecommendedProducts (
    ProductID,
    ProductSubcategoryID,
    TotalQtyOrdered,
    TotalDollarsOrdered
)
In the body of the CTE, the Sales.SalesOrderDetail table is joined to itself based on SalesOrderID.
A join to the Production.Product table is also included to get each product's ProductSubcategoryID.
The point of the self-join is to grab the total quantity ordered (OrderQty) and the total dollars ordered
(UnitPrice * OrderQty) for each product.
The query is designed to include only orders that contain the product passed in via @ProductID in the
WHERE clause, and it also eliminates results for @ProductID itself from the final results. All the results are
grouped by ProductID and ProductSubcategoryID:
SELECT od2.ProductID,
    p1.ProductSubcategoryID,
    SUM(od2.OrderQty) AS TotalQtyOrdered,
    SUM(od2.UnitPrice * od2.OrderQty) AS TotalDollarsOrdered
FROM Sales.SalesOrderDetail od1
INNER JOIN Sales.SalesOrderDetail od2
    ON od1.SalesOrderID = od2.SalesOrderID
INNER JOIN Production.Product p1
    ON od2.ProductID = p1.ProductID
WHERE od1.ProductID = @ProductID
    AND od2.ProductID <> @ProductID
GROUP BY od2.ProductID,
    p1.ProductSubcategoryID
The final part of the CTE excludes products that are in the same category as the item passed in by
@ProductID. It then limits the results to the top ten and numbers the results from highest to lowest by
TotalQtyOrdered. It also joins on the Production.Product table to get each product’s name:
SELECT TOP(10)
    ROW_NUMBER() OVER
    (
        ORDER BY rp.TotalQtyOrdered DESC
    ) AS Rank,
    p.Name AS ProductName,
    rp.ProductID,
    rp.TotalQtyOrdered,
    rp.TotalDollarsOrdered
FROM RecommendedProducts rp
INNER JOIN Production.Product p
    ON rp.ProductID = p.ProductID
WHERE rp.ProductSubcategoryID <>
(
    SELECT ProductSubcategoryID
    FROM Production.Product
    WHERE ProductID = @ProductID
)
ORDER BY TotalQtyOrdered DESC;
Figure 5-4 shows the result set of a recommended product list for people who bought a silver
Mountain-100 44-inch bike (ProductID = 773), as shown in Listing 5-6.
Figure 5-4. Recommended product list for ProductID 773
Listing 5-6. Getting a Recommended Product List
EXECUTE Production.GetProductRecommendations 773;
Implementing this business logic in an SP provides a layer of abstraction that makes it easier to use from
front-end applications. Front-end application programmers don’t need to worry about the details of which
tables need to be accessed, how they need to be joined, and so on. All your application developers need to
know to utilize this logic from the front end is that they need to pass the SP a ProductID number parameter,
and it will return the relevant information in a well-defined result set.
The same procedure promotes code reuse, and if you have business logic implemented with complex
code in an SP, the code doesn’t have to be written multiple times; instead you can simply call the SP to access
the code. Also, if you need to change the business logic, it can be done one time, in one place. Consider what
happens if the AdventureWorks management decides to make suggestions based on total dollars’ worth of a
product ordered instead of the total quantity ordered. You can change the ORDER BY clause from this
ORDER BY TotalQtyOrdered DESC;
to the following:
ORDER BY TotalDollarsOrdered DESC;
This simple change in the procedure does the trick. No additional changes to front-end code or logic
are required, and no recompilation and redeployment of code to web server farms is needed, because the
interface to the SP remains the same.
Recursion in Stored Procedures
Like UDFs, SPs can call themselves recursively. There is an SQL Server–imposed limit of 32 levels of
recursion. To demonstrate recursion, let’s solve a very old puzzle.
The Towers of Hanoi puzzle consists of three pegs and a specified number of discs of varying sizes that
slide onto the pegs. The puzzle begins with the discs stacked on top of one another, from smallest to largest,
all on one peg. The Towers of Hanoi puzzle’s starting position is shown in Figure 5-5.
Figure 5-5. The Towers of Hanoi puzzle’s starting position
The object of the puzzle is to move all the discs from the first tower to the third tower. The trick is that
you can only move one disc at a time, and no larger disc may be stacked on top of a smaller disc at any time.
You can temporarily place discs on the middle tower as necessary, and you can stack any smaller disc on top
of a larger disc on any tower. The Towers of Hanoi puzzle is often used as an exercise in computer science
courses to demonstrate recursion in procedural languages. This makes it a perfect candidate for a T-SQL
solution to demonstrate SP recursion.
The T-SQL implementation of the Towers of Hanoi puzzle uses five discs and displays each move as the
computer makes it. The complete solution is shown in Listing 5-7.
Listing 5-7. The Towers of Hanoi Puzzle
-- This stored procedure displays all the discs in the appropriate
-- towers.
CREATE PROCEDURE dbo.ShowTowers
AS
BEGIN
    -- Each disc is displayed like this "===3===" where the number is the disc
    -- and the width of the === signs on either side indicates the width of the
    -- disc.
    -- These CTEs are designed for displaying the discs in proper order on each
    -- tower.
    WITH FiveNumbers(Num) -- Recursive CTE generates table with numbers 1...5
    AS
    (
        SELECT 1
        UNION ALL
        SELECT Num + 1
        FROM FiveNumbers
        WHERE Num < 5
    ),
    GetTowerA (Disc) -- The discs for Tower A
    AS
    (
        SELECT a.Disc
        FROM FiveNumbers f
        LEFT OUTER JOIN #TowerA a
            ON f.Num = a.Disc
    ),
    GetTowerB (Disc) -- The discs for Tower B
    AS
    (
        SELECT b.Disc
        FROM FiveNumbers f
        LEFT OUTER JOIN #TowerB b
            ON f.Num = b.Disc
    ),
    GetTowerC (Disc) -- The discs for Tower C
    AS
    (
        SELECT c.Disc
        FROM FiveNumbers f
        LEFT OUTER JOIN #TowerC c
            ON f.Num = c.Disc
    )
    -- This SELECT query generates the text representation for all three towers
    -- and all five discs. FULL OUTER JOIN is used to represent the towers in a
    -- side-by-side format.
    SELECT CASE a.Disc
            WHEN 5 THEN ' =====5===== '
            WHEN 4 THEN '  ====4====  '
            WHEN 3 THEN '   ===3===   '
            WHEN 2 THEN '    ==2==    '
            WHEN 1 THEN '     =1=     '
            ELSE '      |      '
        END AS Tower_A,
        CASE b.Disc
            WHEN 5 THEN ' =====5===== '
            WHEN 4 THEN '  ====4====  '
            WHEN 3 THEN '   ===3===   '
            WHEN 2 THEN '    ==2==    '
            WHEN 1 THEN '     =1=     '
            ELSE '      |      '
        END AS Tower_B,
        CASE c.Disc
            WHEN 5 THEN ' =====5===== '
            WHEN 4 THEN '  ====4====  '
            WHEN 3 THEN '   ===3===   '
            WHEN 2 THEN '    ==2==    '
            WHEN 1 THEN '     =1=     '
            ELSE '      |      '
        END AS Tower_C
    FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY Disc) AS Num,
                COALESCE(Disc, -1) AS Disc
            FROM GetTowerA
        ) a
    FULL OUTER JOIN (
            SELECT ROW_NUMBER() OVER (ORDER BY Disc) AS Num,
                COALESCE(Disc, -1) AS Disc
            FROM GetTowerB
        ) b
        ON a.Num = b.Num
    FULL OUTER JOIN (
            SELECT ROW_NUMBER() OVER (ORDER BY Disc) AS Num,
                COALESCE(Disc, -1) AS Disc
            FROM GetTowerC
        ) c
        ON b.Num = c.Num
    ORDER BY a.Num;
END
GO

-- This SP moves a single disc from the specified source tower to the
-- specified destination tower.
CREATE PROCEDURE dbo.MoveOneDisc (@Source nchar(1),
    @Dest nchar(1))
AS
BEGIN
    -- @SmallestDisc is the smallest disc on the source tower
    DECLARE @SmallestDisc int = 0;

    -- IF ... ELSE conditional statement gets the smallest disc from the
    -- correct source tower
    IF @Source = N'A'
    BEGIN
        -- This gets the smallest disc from Tower A
        SELECT @SmallestDisc = MIN(Disc)
        FROM #TowerA;

        -- Then delete it from Tower A
        DELETE FROM #TowerA
        WHERE Disc = @SmallestDisc;
    END
    ELSE IF @Source = N'B'
    BEGIN
        -- This gets the smallest disc from Tower B
        SELECT @SmallestDisc = MIN(Disc)
        FROM #TowerB;

        -- Then delete it from Tower B
        DELETE FROM #TowerB
        WHERE Disc = @SmallestDisc;
    END
    ELSE IF @Source = N'C'
    BEGIN
        -- This gets the smallest disc from Tower C
        SELECT @SmallestDisc = MIN(Disc)
        FROM #TowerC;

        -- Then delete it from Tower C
        DELETE FROM #TowerC
        WHERE Disc = @SmallestDisc;
    END

    -- Show the disc move performed
    SELECT N'Moving Disc (' + CAST(COALESCE(@SmallestDisc, 0) AS nchar(1)) +
        N') from Tower ' + @Source + N' to Tower ' + @Dest + ':' AS Description;

    -- Perform the move - INSERT the disc from the source tower into the
    -- destination tower
    IF @Dest = N'A'
        INSERT INTO #TowerA (Disc) VALUES (@SmallestDisc);
    ELSE IF @Dest = N'B'
        INSERT INTO #TowerB (Disc) VALUES (@SmallestDisc);
    ELSE IF @Dest = N'C'
        INSERT INTO #TowerC (Disc) VALUES (@SmallestDisc);

    -- Show the towers
    EXECUTE dbo.ShowTowers;
END
GO

-- This SP moves multiple discs recursively
CREATE PROCEDURE dbo.MoveDiscs (@DiscNum int,
    @MoveNum int OUTPUT,
    @Source nchar(1) = N'A',
    @Dest nchar(1) = N'C',
    @Aux nchar(1) = N'B')
AS
BEGIN
    -- If the number of discs to move is 0, the solution has been found
    IF @DiscNum = 0
        PRINT N'Done';
    -- If the number of discs to move is 1, go ahead and move it
    ELSE IF @DiscNum = 1
    BEGIN
        -- Increase the move counter by 1
        SELECT @MoveNum += 1;

        -- And finally move one disc from source to destination
        EXEC dbo.MoveOneDisc @Source, @Dest;
    END
    ELSE
    BEGIN
        -- Determine number of discs to move from source to auxiliary tower
        DECLARE @n int = @DiscNum - 1;

        -- Move (@DiscNum - 1) discs from source to auxiliary tower
        EXEC dbo.MoveDiscs @n, @MoveNum OUTPUT, @Source, @Aux, @Dest;

        -- Move 1 disc from source to final destination tower
        EXEC dbo.MoveDiscs 1, @MoveNum OUTPUT, @Source, @Dest, @Aux;

        -- Move (@DiscNum - 1) discs from auxiliary to final destination tower
        EXEC dbo.MoveDiscs @n, @MoveNum OUTPUT, @Aux, @Dest, @Source;
    END
END
GO

-- This SP creates the three towers and populates Tower A with 5 discs
CREATE PROCEDURE dbo.SolveTowers
AS
BEGIN
    -- SET NOCOUNT ON to eliminate system messages that will clutter up
    -- the Message display
    SET NOCOUNT ON;

    -- Create the three towers: Tower A, Tower B, and Tower C
    CREATE TABLE #TowerA (Disc int NOT NULL PRIMARY KEY);
    CREATE TABLE #TowerB (Disc int NOT NULL PRIMARY KEY);
    CREATE TABLE #TowerC (Disc int NOT NULL PRIMARY KEY);

    -- Populate Tower A with all five discs
    INSERT INTO #TowerA (Disc)
    VALUES (1), (2), (3), (4), (5);

    -- Initialize the move number to 0
    DECLARE @MoveNum int = 0;

    -- Show the initial state of the towers
    EXECUTE dbo.ShowTowers;

    -- Solve the puzzle. Notice you don't need to specify the parameters
    -- with defaults
    EXECUTE dbo.MoveDiscs 5, @MoveNum OUTPUT;

    -- How many moves did it take?
    PRINT N'Solved in ' + CAST(@MoveNum AS nvarchar(10)) + N' moves.';

    -- Drop the temp tables to clean up - always a good idea.
    DROP TABLE #TowerC;
    DROP TABLE #TowerB;
    DROP TABLE #TowerA;

    -- SET NOCOUNT OFF before we exit
    SET NOCOUNT OFF;
END
GO
To solve the puzzle, just run the following statement:
-- Solve the puzzle
EXECUTE dbo.SolveTowers;
Figure 5-6 is a screenshot of the processing as the discs are moved from tower to tower.
Figure 5-6. Discs are moved from tower to tower.
Note The results of Listing 5-7 are best viewed in Results to Text mode. You can put SSMS in Results to
Text mode by pressing Ctrl+T while in the Query Editor window. To switch to Results to Grid mode, press Ctrl+D.
The main procedure you call to solve the puzzle is dbo.SolveTowers. This SP creates three temporary
tables named #TowerA, #TowerB, and #TowerC. It then populates #TowerA with five discs and initializes the
current move number to 0:
-- Create the three towers: Tower A, Tower B, and Tower C
CREATE TABLE #TowerA (Disc int NOT NULL PRIMARY KEY);
CREATE TABLE #TowerB (Disc int NOT NULL PRIMARY KEY);
CREATE TABLE #TowerC (Disc int NOT NULL PRIMARY KEY);

-- Populate Tower A with all five discs
INSERT INTO #TowerA (Disc)
VALUES (1), (2), (3), (4), (5);

-- Initialize the move number to 0
DECLARE @MoveNum int = 0;
Because this SP is the entry point for the entire puzzle-solving program, it displays the start position of
the towers and calls dbo.MoveDiscs to get the ball rolling:
-- Show the initial state of the towers
EXECUTE dbo.ShowTowers;
-- Solve the puzzle. Notice you don't need to specify the parameters
-- with defaults
EXECUTE dbo.MoveDiscs 5, @MoveNum OUTPUT;
When the puzzle is finally solved, control returns back from dbo.MoveDiscs to dbo.SolveTowers, which
displays the number of steps it took to complete the puzzle and performs some cleanup work, like dropping
the temporary tables:
-- How many moves did it take?
PRINT N'Solved in ' + CAST(@MoveNum AS nvarchar(10)) + N' moves.';

-- Drop the temp tables to clean up - always a good idea.
DROP TABLE #TowerC;
DROP TABLE #TowerB;
DROP TABLE #TowerA;

-- SET NOCOUNT OFF before we exit
SET NOCOUNT OFF;
Tip When an SP that created local temporary tables is no longer in scope, the local temporary tables are
automatically dropped. Because temporary tables are created in the tempdb system database, it's a good idea to
get in the habit of explicitly dropping temporary tables. By explicitly dropping temporary tables, you can guarantee
that they exist only as long as they're needed, which can help minimize contention in the tempdb database.
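As a quick sketch of the scoping behavior this tip relies on (the procedure and table names here are illustrative, not from the chapter's listings), a temp table created in an outer SP is visible to every SP it calls:

```sql
-- The inner procedure references a temp table it does not create;
-- the name is resolved at execution time against the caller's scope
CREATE PROCEDURE dbo.InnerProc
AS
    SELECT COUNT(*) AS RowsVisible
    FROM #Work;  -- sees the caller's #Work table
GO

CREATE PROCEDURE dbo.OuterProc
AS
BEGIN
    CREATE TABLE #Work (ID int NOT NULL);
    INSERT INTO #Work (ID) VALUES (1), (2);

    EXECUTE dbo.InnerProc;  -- the called SP can read #Work

    DROP TABLE #Work;  -- explicit cleanup, per the tip above
END
GO
```

Running dbo.OuterProc returns the row count from the inner procedure, and the explicit DROP TABLE releases the tempdb resources as soon as the work is done.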
The procedure responsible for moving discs from tower to tower recursively is dbo.MoveDiscs. This
procedure accepts several parameters, including the number of discs to move (@DiscNum); the number of the
current move (@MoveNum); and the names of the source, destination, and auxiliary/intermediate towers.
This procedure uses T-SQL procedural IF statements to determine which types of moves are required—
single-disc moves, recursive multiple-disc moves, or no more moves (when the solution is found). If the
solution has been found, the message Done is displayed, and control is subsequently passed back to the
calling procedure, dbo.SolveTowers:
-- If the number of discs to move is 0, the solution has been found
IF @DiscNum = 0
PRINT N'Done';
If there is only one disc to move, the move counter is incremented and dbo.MoveOneDisc is called to
perform the move:
-- If the number of discs to move is 1, go ahead and move it
IF @DiscNum = 1
-- Increase the move counter by 1
SELECT @MoveNum += 1;
-- And finally move one disc from source to destination
EXEC dbo.MoveOneDisc @Source, @Dest;
Finally, if there is more than one disc move required, dbo.MoveDiscs calls itself recursively until there
are either one or zero discs left to move:
-- Determine number of discs to move from source to auxiliary tower
DECLARE @n INT = @DiscNum - 1;
-- Move (@DiscNum - 1) discs from source to auxiliary tower
EXEC dbo.MoveDiscs @n, @MoveNum OUTPUT, @Source, @Aux, @Dest;
-- Move 1 disc from source to final destination tower
EXEC dbo.MoveDiscs 1, @MoveNum OUTPUT, @Source, @Dest, @Aux;
-- Move (@DiscNum - 1) discs from auxiliary to final destination tower
EXEC dbo.MoveDiscs @n, @MoveNum OUTPUT, @Aux, @Dest, @Source;
The basis of the Towers of Hanoi puzzle is the movement of a single disc at a time from tower to tower,
so the most basic procedure, dbo.MoveOneDisc, simply moves a disc from the specified source tower to the
specified destination tower. Given source and destination towers as inputs, this procedure first determines
the smallest (or top) disc on the source tower with a simple SELECT query, deletes it from the source table,
and then inserts it into the destination table:
-- @SmallestDisc is the smallest disc on the source tower
DECLARE @SmallestDisc int = 0;
-- IF ... ELSE conditional statement gets the smallest disc from the
-- correct source tower
IF @Source = N'A'
BEGIN
    -- This gets the smallest disc from Tower A
    SELECT @SmallestDisc = MIN(Disc)
    FROM #TowerA;

    -- Then delete it from Tower A
    DELETE FROM #TowerA
    WHERE Disc = @SmallestDisc;
END
Once the smallest disc of the source table is determined, dbo.MoveOneDisc displays the move it’s about
to perform and then performs the INSERT to place the disc in the destination tower. Finally, it calls the
dbo.ShowTowers procedure to show the current state of the towers and discs:
-- Show the disc move performed
SELECT N'Moving Disc (' + CAST(COALESCE(@SmallestDisc, 0) AS nchar(1)) +
    N') from Tower ' + @Source + N' to Tower ' + @Dest + ':' AS Description;
-- Perform the move - INSERT the disc from the source tower into the
-- destination tower
IF @Dest = N'A'
INSERT INTO #TowerA (Disc) VALUES (@SmallestDisc);
ELSE IF @Dest = N'B'
INSERT INTO #TowerB (Disc) VALUES (@SmallestDisc);
ELSE IF @Dest = N'C'
INSERT INTO #TowerC (Disc) VALUES (@SmallestDisc);
-- Show the towers
EXECUTE dbo.ShowTowers;
The dbo.ShowTowers procedure doesn’t affect processing; it’s included as a convenience to output a
reasonable representation of the towers and discs they contain at any given point during processing.
This implementation of a solver for the Towers of Hanoi puzzle demonstrates several aspects of SPs
introduced in this chapter, including the following:
•	SPs can call themselves recursively. This is demonstrated with the dbo.MoveDiscs procedure, which calls itself until the puzzle is solved.
•	When default values are assigned to parameters in an SP declaration, you don't have to specify values for them when you call the procedure. This concept is demonstrated in the dbo.SolveTowers procedure, which calls dbo.MoveDiscs without specifying values for the parameters that have defaults.
•	The scope of temporary tables created in an SP includes the procedure in which they're created, as well as any SPs it calls and any SPs they in turn call. This is demonstrated in dbo.SolveTowers, which creates three temporary tables and then calls other procedures that access those same temporary tables. The procedures called by dbo.SolveTowers and those called by those procedures (and so on) can also access these same temporary tables.
•	The dbo.MoveDiscs SP demonstrates output parameters. This procedure uses an output parameter to update the count of the total number of moves performed after each move.
Table-Valued Parameters
Beginning with SQL Server 2008, you can pass table-valued parameters to SPs and UDFs. Prior to SQL Server
2008, the primary methods of passing multiple rows of data to an SP included the following:
•	Converting multiple rows to an intermediate format like comma-delimited or XML. If you use this method, you have to parse the parameter into a temporary table, table variable, or subquery to extract the rows from the intermediate format. These conversions to and from the intermediate format can be costly, especially when large amounts of data are involved.
•	Placing rows in a permanent or temporary table and calling the procedure. This method eliminates conversions to and from the intermediate format, but it isn't without problems of its own. Managing multiple sets of input rows from multiple simultaneous users can introduce a lot of overhead and additional conversion code that must be managed.
•	Passing lots and lots of parameters to the SP. SQL Server SPs can accept up to 2,100 parameters. Conceivably, you could pass several rows of data using thousands of parameters and ignore those parameters you don't need. One big drawback to this method, however, is that it results in complex code that can be extremely difficult to maintain.
•	Calling procedures multiple times with a single row of data each time. This method is probably the simplest, resulting in code that is very easy to create and manage. The downside to this method is that querying and manipulating potentially tens of thousands of rows of data or more, one row at a time, can result in a big performance penalty.
A table-valued parameter allows you to pass rows of data to your T-SQL statement or SPs and UDFs in
tabular format. To create a table-valued parameter, you must first create a table type that defines your table
structure, as shown in Listing 5-8.
Listing 5-8. Creating a Table Type
CREATE TYPE HumanResources.LastNameTableType
AS TABLE (LastName nvarchar(50) NOT NULL PRIMARY KEY);
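To sketch how the type is then used (the procedure shown here is illustrative; the chapter's own examples follow), you declare an SP parameter of the table type, mark it READONLY as SQL Server requires for table-valued parameters, and pass in a variable of that type:

```sql
-- Hypothetical procedure accepting the table type; TVPs must be READONLY
CREATE PROCEDURE HumanResources.GetEmployeesByLastName
    (@LastNames HumanResources.LastNameTableType READONLY)
AS
    SELECT p.FirstName, p.LastName
    FROM Person.Person p
    INNER JOIN @LastNames ln
        ON p.LastName = ln.LastName;
GO

-- Populate a variable of the table type and pass it to the procedure
DECLARE @Names HumanResources.LastNameTableType;
INSERT INTO @Names (LastName)
VALUES (N'Walters'), (N'Anderson');

EXECUTE HumanResources.GetEmployeesByLastName @Names;
```

Because the rows arrive in tabular format, the procedure can join against them directly, with none of the parsing or conversion overhead of the older techniques listed above.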