TF V1400.4 Student Manual Teradata Factory

Teradata Factory

Course # 9038
Version 14.00.4

Student Guide

Notes

Module 0
Course Overview

Teradata Factory
Teradata Concepts
MPP System Architectures
Physical Design and Implementation
Application Utilities
Database Administration

Teradata Proprietary and Confidential

Course Introduction

Page 0-3

Tenth Edition
April, 2012

Trademarks
The following names are registered names or trademarks and are used throughout this
manual.
The product or products described in this book are licensed products of Teradata Corporation or its
affiliates.
Teradata, BYNET, DBC/1012, DecisionCast, DecisionFlow, DecisionPoint, Eye logo design, InfoWise,
Meta Warehouse, MyCommerce, SeeChain, SeeCommerce, SeeRisk, Teradata Decision Experts, Teradata
Source Experts, WebAnalyst, You’ve Never Seen Your Business Like This Before, and Raising
Intelligence are trademarks or registered trademarks of Teradata Corporation or its affiliates.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
BakBone and NetVault are trademarks or registered trademarks of BakBone Software, Inc.
EMC2, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC2 Corporation.
GoldenGate is a trademark of GoldenGate Software, a division of Oracle Corporation.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, RACF, Tivoli, z/OS, and z/VM are registered trademarks of International Business Machines
Corporation.
Linux is a registered trademark of Linus Torvalds.
Engenio is a registered trademark of NetApp Corporation.
Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of
Microsoft Corporation in the United States and other countries.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.
SPARC is a registered trademark of SPARC International, Inc.
Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or
its affiliates in the United States and other countries.
Unicode is a collective membership mark and a service mark of Unicode, Inc.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.

The materials included in this book are a licensed product of Teradata Corporation.
Copyright Teradata Corporation ©2010-2012
Miamisburg, Ohio, U.S.A.
All Rights Reserved.
Material developed by:

Teradata Learning

Page 0-4

Course Introduction

Table of Contents
Trademarks................................................................................................................................... 0-4
Course Materials .......................................................................................................................... 0-6
Options for Displaying PDF Files ................................................................................................ 0-8
Example of Left Page – Right Page Display.............................................................................. 0-10
View and search a PDF .......................................................................................................... 0-10
PDF Comment and Markup Tools ............................................................................................. 0-12
Example of Highlighter and Sticky Note Tools ......................................................................... 0-14
Example of Typewriter Tool ...................................................................................................... 0-16
Course Description ..................................................................................................................... 0-18
Who Should Attend .................................................................................................................... 0-20
Prerequisites ............................................................................................................................... 0-20
Class Format .............................................................................................................................. 0-22
Classroom Rules ........................................................................................................................ 0-22
Outline of the Two Weeks ......................................................................................................... 0-24
Teradata Certification Tests ....................................................................................................... 0-26

Course Introduction

Page 0-5

Course Materials
The Teradata Factory course materials that are provided on a USB flash drive are listed on
the facing page. These materials are provided at the beginning of the class.
The Teradata Factory Student Manual and the Lab Workbook have been created as PDF
files which can be viewed using Adobe® Reader®.
These PDF files were created using Adobe Acrobat® and commenting has been enabled for
both files. This allows you to use Adobe® Reader® Comment and Markup tools to place
your own notes and comments within the files.

Page 0-6

Course Introduction

Course Materials
Teradata Factory course materials include:

• Paper copy of TF Lab Workbook
• Electronic copy (PDF files) of Student Manual and Lab Workbook
Contents of the flash drive include:

• Teradata Factory Class Files
  – Class Files (these PDF files allow use of Comment and Markup tools)
    • TF v1400.4 Lab Workbook.pdf
    • TF v1400.4 Student Manual.pdf
  – Miscellaneous Software
    • Acrobat Reader
    • Microsoft .NET Packages
    • Putty – use for secure shell Linux connections
    • Secure FTP – use for secure FTP to Linux servers
  – TD 14.0 Reference Manuals
  – TD 14.0 TTU – Subset of tools and utilities (numbered in order of installation)
    • 01_piom__windows_i386.14.00.00.06.zip
    • 02_TeraGSS__windows_i386.14.00.00.01.zip
    • :
  – TD Demo Lab Setup (numbered in order of installation)

Course Introduction

Page 0-7

Options for Displaying PDF Files
Adobe® Reader® is a tool that you can use to open and view the Teradata Factory PDF
course files. You can also use the Adobe Reader to make comments or notes and save your
PDF file.
Since the Teradata Factory course materials have been created in a book format (left page –
right page), you may want to set options in Adobe Reader to view the materials in a book
format.



• The left page contains additional information about the right or slide page.
• The right page is a copy of the PPT slide that is used during the presentation.

To view the Teradata Factory Student Manual in a book format using Adobe Reader 9.2 or
before, use the View Menu > Page Display and set the following options.




• Two-Up Continuous
• Show Gaps Between Pages (normally checked or set by default)
• Show Cover Page During Two-Up

Page 0-8

Course Introduction

Options for Displaying PDF Files
The Teradata Factory course materials are created in a left page – right page format.

• Left page – contains additional information about the slide page
• Right page – copy of the PPT slide that is used during the presentation
To display PDF files in a book type (left page – right page) format, Adobe Reader options need to be set.

In Adobe Reader 9.2 and earlier versions, the
options are named:
• Two-Up Continuous
• Show Gaps Between Pages
• Show Cover Page During Two-Up

Course Introduction

Page 0-9

Example of Left Page – Right Page Display
The facing page illustrates an example of displaying the Teradata Factory Student Manual in
a left page – right page format.

View and search a PDF
In the Adobe Reader toolbar, use the Zoom tools and the Magnification menu to enlarge or
reduce the page. Use the options on the View menu to change the page display. There are
various options in the Tools menu to provide you with more ways to adjust the page for
better viewing (Tools > Select & Zoom).
This is an example of menus using Adobe Reader 9.2.

These Adobe Reader toolbars open by default:
A. File toolbar
B. Page Navigation toolbar
C. Select & Zoom toolbar
D. Page Display toolbar
E. Find toolbar

Page 0-10

Course Introduction

Example of Left Page – Right Page Display
After setting the Page Display options, the PDF file is displayed as below.

This PDF file has been created to allow the use of comment and markup tools.

Course Introduction

Page 0-11

PDF Comment and Markup Tools
The Teradata Factory course materials have "commenting" enabled. Therefore, you can
make comments in these files using the commenting and markup tools. Of the many
commenting and markup tools that are available, you may find it easier to use the following
tools (highlighted on the facing page).




• Add Sticky Note
• Highlight Text Tool
• Typewriter

Comments can include both notes and drawings (if you have the time during class). You
can enter a text message using the Sticky Note tool. You can use a drawing tool to add a
line, circle, or other shape and then type a note in the associated pop-up note.
You can enable the Comment & Markup Toolbar or you can simply select the tools using
the pull-down menus. The example below is for Adobe Reader 9.2.


• Enable the Comment & Markup Toolbar and select the tool to use.
• Use the menus (View > Toolbars > Comment & Markup) to add notes or comments.

Options on the Comment & Markup toolbar:
A. Sticky Note tool
B. Text Edits tool
C. Stamp tool and menu
D. Highlight Text tool
E. Callout tool
F. Text Box tool
G. Cloud tool
H. Arrow tool
I. Line tool
J. Rectangle tool
K. Oval tool
L. Pencil tool
M. Show menu

After you add a note or comment, it stays selected until you click elsewhere on the page. A
selected comment is highlighted by a blue halo to help you find the markup on the page.

Page 0-12

Course Introduction

PDF Comment and Markup Tools
Comment and markup tools that may be useful include:
• Sticky Note (Comment)
• Highlight Text Tool (Comment)
• Add a Text Box (Extended) or Typewriter

In Adobe Reader 9.2 and earlier versions, the
options are in the Tools Menu:
• Comment & Markup > Sticky Note
• Comment & Markup > Highlight Text Tool
• Typewriter

Course Introduction

Page 0-13

Example of Highlighter and Sticky Note Tools
The facing page illustrates an example of using the Highlighter and Sticky Note tools.
To select a commenting or markup tool, choose Tools > Comment & Markup > Highlighter or
Sticky Note (or another tool).
Note: After you make an initial comment, the tool changes back to the Select tool so that
you can move, resize, or edit your comment. (The Pencil, Highlight Text, and Line tools
stay selected.)
To keep a commenting tool selected so you can add multiple comments without reselecting
the tool, do the following:




• Select the tool you want to use (but don’t use it yet).
• Choose View > Toolbars > Properties Bar.
• Select Keep Tool Selected.

You can change the font of the text in a sticky note. Open the sticky note, choose View >
Toolbars > Properties Bar, select the text in a note, and then change the font size in the
Properties Bar.

Page 0-14

Course Introduction

Example of Highlighter and Sticky Note Tools
The left page illustrates the Highlighter tool and the right page illustrates the Sticky Note tool.

Course Introduction

Page 0-15

Example of Typewriter Tool
The facing page illustrates an example of using the Typewriter tool. This example also
illustrates that the Typewriter Toolbar is enabled.
The Typewriter Toolbar may be useful when completing review questions as shown on the
facing page. You already have the answer to one of hundreds of questions in this course.
After making notes and comments, save your changes. You may want to save your changes
to a different PDF file name in order to preserve the original PDF file.

Page 0-16

Course Introduction

Example of Typewriter Tool
The Typewriter tool can be used to add text at any location in the PDF file.

To enable the Typewriter Toolbar in Adobe 9.2 or before:

• Tools > Typewriter > Show Typewriter Toolbar

Course Introduction

Page 0-17

Course Description
This course provides information on the following major topics:







• Teradata Concepts
• System Architectures (e.g., 2650, 2690, 6650, and 6690 Systems)
• Teradata Physical Database Design
• Teradata SQL ANSI Differences for Version 2
• Teradata Application Utilities
• Teradata Database Administration

Page 0-18

Course Introduction

Course Description
Description
The primary focus of this ten-day course is to teach you about the design,
implementation, and administration of the Teradata Database.
The major topics in this course include:

• Teradata Database features and functions
• The parallelism of the Teradata Database
• How Teradata is implemented on MPP systems (e.g., 6690 systems)
• How to perform physical database design for Teradata Database
• Teradata SQL ANSI Differences
• How to load and export data using the Teradata application utilities
• How to perform common administrative functions for the Teradata Database

Course Introduction

Page 0-19

Who Should Attend
This class is a learning event for individuals experienced with relational databases who need to
learn the Teradata Database. This course is designed for Teradata practitioners who need to
get hands-on practice with the Teradata Database in a learning environment.

• Professional Services Consultants
• Channel Partners

Prerequisites
An understanding of relational databases, SQL, and the logical data model is necessary
before attending this course.
Experience with large systems, relational databases and SQL, and an understanding of the
UNIX operating system are useful, but not required before attending this course.
There are Web-Based Training classes that provide information about Teradata concepts and
SQL:

• Overview of Teradata
• Teradata SQL

Page 0-20

Course Introduction

Who Should Attend and Prerequisites
Who Should Attend
This course is designed for ...

• Teradata Professional Services Consultants
• Channel Partners

Prerequisites
Required:

• An understanding of the logical data model and of relational, SQL, and data processing
concepts.

Useful, but not required:

• Experience with relational databases and SQL
• Experience with large systems used with Teradata

Course Introduction

Page 0-21

Class Format
This ten-day class will be conducted as a series of lectures with classroom discussions,
review questions, and workshops.

Classroom Rules
The classroom rules are listed on the facing page.

Page 0-22

Course Introduction

Class Format and Rules
Class Format
This ten-day class consists of ...

• Instructor presentations
• Class discussions
• Workshop exercises

Classroom Rules
The classroom rules are …

• Turn off your cellular phones.
• During lecture, only use your laptop to follow the class materials.
• Come to class on time in the morning and after breaks.
• Enjoy the two weeks.

Course Introduction

Page 0-23

Outline of the Two Weeks
An outline of the two weeks is described on the following page. Major topic examples are
listed for each week.

Page 0-24

Course Introduction

Outline of the Two Weeks
1. Teradata Concepts
Teradata features and functions
Parallelism and Teradata

MPP System Architectures
Characteristics of MPP (e.g., 6690) systems – typical configurations
Disk Array subsystems and how Teradata utilizes disk arrays

Teradata Physical Database Design (continued in week #2)
Primary and secondary index selection; partitioned, NoPI, and columnar tables
How the Teradata database works
Collecting Statistics and Explains
SQL ANSI syntax & features; Teradata and ANSI transaction modes
Temporary tables, System Calendar, and Teradata System Limits

2. Teradata Application Utilities
Load utilities (e.g., BTEQ, FastLoad, MultiLoad, and TPump)
Export utilities (e.g., BTEQ and FastExport)

Teradata Database Administration
Dictionary tables and views; system hierarchy and space management
Users, Databases, Access Rights, Roles, and Profiles
Administrator and System Utilities – Teradata Administrator, Viewpoint, DBSControl
How to use the archive facility to do Archive, Restore, and Recovery procedures

Course Introduction

Page 0-25

Teradata Certification Tests
The facing page lists the various Teradata certification tests. Depending upon the tests that
are completed, you can earn various Teradata Certified designations such as Teradata
Certified Professional.
The Teradata 12 Certification tests require knowledge plus experience with Teradata. This
manual will help you prepare for these Teradata 12 tests, but many of the test questions are
scenario-based and Teradata experience is needed to answer these types of questions.
The Teradata V2R5 Certification tests were retired on March 31, 2010.

Page 0-26

Course Introduction

Teradata Certification Tests
Teradata 12.0 Certification Tests
 1 – Teradata 12 Basics
2 – Teradata 12 SQL
 3 – Teradata 12 Physical Design and Implementation
 4 – Teradata 12 Database Administration
5 – Teradata 12 Solutions Development
6 – Teradata 12 Enterprise Architecture
7 – Teradata 12 Comprehensive Mastery
By passing all seven Teradata 12 certification tests, you become a Teradata 12 Certified Master.

 This course (along with Teradata experience) will prepare you for these tests.
Options for Teradata V2R5 Certified Masters:

• The Teradata 12 Qualifying Exam is available as an alternative to taking tests 1 – 6.
• To achieve the Teradata 12 Master certification …
1. Pass the Teradata 12 Qualifying Exam OR pass each of the 6 tests
2. Pass the Teradata 12 Comprehensive Mastery exam

Course Introduction

Page 0-27

Notes

Page 0-28

Course Introduction

Module 1
Teradata Overview

After completing this module, you will be able to:
 Describe the purpose of the Teradata product
 Understand the history of the Teradata Corporation
 List major architectural features of the product

Teradata Proprietary and Confidential

Teradata Overview

Page 1-1

Notes

Page 1-2

Teradata Overview

Table of Contents
What is Teradata?......................................................................................................................... 1-4
How large is a Trillion and a Quadrillion? .............................................................................. 1-4
Teradata – A Brief History........................................................................................................... 1-6
What is a Data Warehouse? ......................................................................................................... 1-8
Data Marts ................................................................................................................................ 1-8
Independent Data Marts ....................................................................................................... 1-8
Logical Data Marts............................................................................................................... 1-8
Dependent Data Marts.......................................................................................................... 1-8
What is Active Data Warehousing? ........................................................................................... 1-10
What is a Relational Database? .................................................................................................. 1-12
Primary Key ........................................................................................................................... 1-12
Answering Questions with a Relational Database ..................................................................... 1-14
Foreign Key............................................................................................................................ 1-14
Teradata Database Competitive Advantages ............................................................................. 1-16
Module 1: Review Questions ..................................................................................................... 1-18

Teradata Overview

Page 1-3

What is Teradata?
Teradata is a Relational Database Management System (RDBMS) for the world’s largest
commercial databases. It is possible to have databases of over 100 terabytes in size.
This characteristic makes Teradata an obvious choice for large data warehousing
applications; however the Teradata system may also be as small as 100 gigabytes. With its
parallelism and scalability, Teradata allows you to start small with a single node and grow
large with many nodes through linear expandability.
Teradata is comparable to a large database server, with multiple client applications making
inquiries against it concurrently.
Teradata 14.0 was released on February 14, 2012.
The acronym SUSE comes from the German name "Software und System Entwicklung"
which means Software and Systems Development.
The ability to manage terabytes of data is accomplished using the concept of parallelism,
wherein many individual processors perform smaller tasks concurrently to accomplish an
operation against a huge repository of data. To date, only parallel architectures can handle
databases of this size.
Acronyms: SLES – SUSE Linux Enterprise Server
SUSE – Software und System Entwicklung (German name which means
Software and Systems Development)

How large is a Trillion and a Quadrillion?
The Teradata Database was the first commercial database system to support a trillion bytes
of data. It is hard to imagine the size of a trillion. To put it in perspective, the life span of
the average person is 2.5 gigaseconds (or said differently 2,500,000,000 seconds). A trillion
seconds is 31,688 years!
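
As a quick back-of-the-envelope check of that figure (illustrative only – any Teradata SQL
session, such as BTEQ, can run it), dividing a trillion by the number of seconds in a
365.25-day year gives roughly 31,688:

    SELECT 1E12 / (365.25 * 24 * 60 * 60) AS approx_years;   /* about 31,688 */
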
Teradata has customers with multiple petabytes of data. One petabyte is one quadrillion
bytes of data. A petabyte is effectively 1000 terabytes.
1 Kilobyte (KB) = 1024 bytes
1 Megabyte (MB) = 1024^2 ≈ 1,000,000 bytes
1 Gigabyte (GB) = 1024^3 ≈ 1,000,000,000 bytes
1 Terabyte (TB) = 1024^4 ≈ 1,000,000,000,000 bytes
1 Petabyte (PB) = 1024^5 ≈ 1,000,000,000,000,000 bytes
1 Exabyte       = 1024^6 ≈ 1,000,000,000,000,000,000 bytes
1 Zettabyte     = 1024^7 ≈ 1,000,000,000,000,000,000,000 bytes
1 Yottabyte     = 1024^8 ≈ 1,000,000,000,000,000,000,000,000 bytes

Page 1-4

Teradata Overview

What is Teradata?
The Teradata Database is a Relational Database Management System.
Designed to run the world’s largest commercial databases.

• Preferred solution for enterprise data warehousing
• Acts as a "database server" to client applications throughout the enterprise
• Uses parallelism to manage terabytes or petabytes of data
– A terabyte is a trillion bytes of data – 10^12.
– A petabyte is a quadrillion bytes of data – 10^15, effectively 1000 terabytes.
• Capable of supporting many concurrent users from various client platforms (over
TCP/IP or IBM channel connections).

• The latest Teradata release is 14.0 and executes as a SUSE Linux application.

(Diagram: the Teradata Database acting as a database server for Windows XP, Windows 7,
Linux, and mainframe clients.)

Teradata Overview

Page 1-5

Teradata – A Brief History
The Teradata Corporation was founded in 1979 in Los Angeles, California. The corporate
goal was the creation of a “database computer” which could handle billions of rows of data,
up to and beyond a terabyte of data storage. It took five years of development before a
product was shipped to a first customer in 1984. In 1982, the YNET technology was
patented as the enabling technology for the parallelism that was at the heart of the
architecture. The YNET was the interconnect which allowed hundreds of individual
processors to share the same bandwidth.
In 1987, Teradata went public with its first stock offering. In 1988, Teradata partnered with
the NCR Corporation to build the next generation of database computers (e.g., 3700).
Before either company could market its next generation product, NCR was purchased by
AT&T Corporation at the end of 1991. AT&T purchased Teradata and folded Teradata into
the NCR structure in January of 1992. The new division was named AT&T GIS (Global
Information Solutions).
In 1996, AT&T spun off three separate companies, one of which was NCR which then
returned to its old name. Teradata was a division of NCR from 1997 until 2001. In 1997,
Teradata (as part of NCR) had become the world leader in scalable data warehouse
solutions.
In 2007, NCR and Teradata separated as two corporations.

Page 1-6

Teradata Overview

Teradata – A Brief History
1979 – Teradata Corp founded in Los Angeles, California
– Development begins on a massively parallel computer
1982 – YNET technology is patented.
1984 – Teradata markets the first database computer DBC/1012
– First system purchased by Wells Fargo Bank of California
1989 – Teradata and NCR partner on next generation of DBC.
1992 – NCR Corporation is acquired by AT&T and Teradata is merged into NCR within
AT&T and named AT&T GIS (Global Information Solutions).
1996 – AT&T spins off NCR Corporation with Teradata; Teradata Version 2 is released.
1997 – The Teradata Database becomes the industry leader in data warehousing.
2000 – The first 100+ Terabyte system is put into production.
2002 – Teradata V2R5 released 12/2002; major release including features such as PPI,
roles and profiles, multi-value compression, and more.
2007 – NCR and Teradata become two separate corporations. Teradata 12.0 is released.
2010 – Teradata 13.10 is released as well as 2650/4600/5600/5650 systems.
2011 – Teradata releases 6650/6680/6690 systems.
– More than 20 customers with 1 PB or larger systems
2012 – Teradata 14.0 is released on February 14, 2012.

Teradata Overview

Page 1-7

What is a Data Warehouse?
A data warehouse is a central, enterprise-wide database that contains information extracted
from the operational data stores. Data warehouses have become more common in
corporations where enterprise-wide detail data may be used in on-line analytical processing
to make strategic and tactical business decisions. Warehouses often carry many years’ worth
of detail data so that historical trends may be analyzed using the full power of the data.
Many data warehouses get their data directly from operational systems so that the data is
timely and accurate. While data warehouses may begin somewhat small in scope and
purpose, they often grow quite large as their utility becomes more fully exploited by the
enterprise.
Data Warehousing is a process, not a product. It is a technique to properly assemble and
manage data from various sources to answer business questions not previously possible or
known.

Data Marts
A data mart is a special purpose subset of enterprise data used by a particular department,
function or application. Data marts may have both summary and detail data, however,
usually the data has been pre-aggregated or transformed in some way to better handle the
particular type of requests of a specific user community.

Independent Data Marts
Independent data marts are created directly from operational systems, just as is a data
warehouse. In the data mart, the data is usually transformed as part of the load process.
Data might be aggregated, dimensionalized or summarized historically, as the requirements
of the data mart dictate.

Logical Data Marts
Logical data marts are not separate physical structures but rather are an existing part of the
data warehouse. Because in theory the data warehouse contains the detail data of the entire
enterprise, a logical view of the warehouse might provide the specific information for a
given user community, much as a physical data mart would. Without the proper technology,
a logical data mart can be a slow and frustrating experience for end users. With the proper
technology, it removes the need for massive data loading and transforming, making a single
data store available for all user needs.

Dependent Data Marts
Dependent data marts are created from the detail data in the data warehouse. While having
many of the advantages of the logical data mart, this approach still requires the movement
and transformation of data but may provide a better vehicle for performance-critical user
queries.

Page 1-8

Teradata Overview

What is a Data Warehouse?
A Data Warehouse is a central, enterprise-wide database that contains information
extracted from Operational Data Stores (ODS).

• Based on enterprise-wide model
• Can begin small but may grow large rapidly
• Populated by extraction/loading data from operational systems
• Responds to end-user "what if" queries
• Can store detailed as well as summary data

(Diagram: operational data from sources such as ATM, PeopleSoft®, and Point of Service
(POS) systems flows into the Data Warehouse (Teradata Database); end users access it
through tools such as Teradata Warehouse Miner, Cognos®, and MicroStrategy® – examples
of access tools.)

Teradata Overview

Page 1-9

What is Active Data Warehousing?
The facing page provides a simple definition of Active Data Warehousing (ADW).
Examples of why ADW is important (possibly mission critical applications) to different
industries include:


• Airlines want an accurate view of customer value contribution so as to provide
  optimum customer service to the appropriate customer, whether or not they are
  frequent flyers.

• Health care organizations need to control costs, but not at the expense of
  jeopardizing quality of care. Proactive intervention programs where high-risk
  patients are identified and steered into case-management programs accomplish
  both.

• Financial institutions must fully understand a customer’s profitability
  characteristics to automate appropriate and timely communications for increased
  revenue opportunity and/or better customer service.

• Retailers need to have a single, integrated view of each customer across multiple
  channels of opportunity – web, in-store, and catalog – to provide the right offer
  through the right vehicle.

• Communications companies must manage a constantly changing competitive
  environment and offer products and services to reduce customer churn rates.

One of the capabilities of ADW is to execute tactical queries in a timely fashion. Tactical
queries are not the same as OLTP queries. Characteristics of a tactical query include:




• More read-oriented
• Focused on decision making
• More casual arrival rate than OLTP queries

Examples of tactical queries include determining the best offer for a customer or altering an
advertising campaign based on current demand and results.
Another example of utilizing Active Data Warehousing is in the “Rental Car Business”.
Assume a service provider has a limited (relatively) fixed inventory of cars. The goal is to
rent the maximum number of vehicles at the maximum price possible under the constraint
that all prices offered exceed the variable cost of the rental.

• Pricing can be determined by forecasting demand and price elasticity as it relates to
  demand.
• Differentiated pricing is the ultimate yield management strategy.

In order to do this, the business requires up-to-date, complete, and detailed data across the
entire company.

Page 1-10

Teradata Overview

What is Active Data Warehousing?
Data Warehousing … is the timely, integrated, logically consistent store of
detailed data available for analytic business decision making.

• Primarily batch feeds and updates
• Ad hoc (or decision support) queries to support strategic decisions that return in
minutes and maybe hours

Active Data Warehousing … is the timely, integrated, logically consistent store
of detailed data available for strategic, tactical driven business decisions.

• Timely updates – close to real time
• Short, tactical queries that return in seconds
• Event driven activity plus strategic queries
Business requirements for an ADW (Active Data Warehouse)?

• Performance – response within seconds
• Scalability – support for large data volumes, mixed workloads, and concurrent users
• Availability – 7 x 24 x 365
• Data Freshness – accurate, up-to-the-minute data

Teradata Overview

Page 1-11

What is a Relational Database?
A database is a collection of permanently stored data that is used by an application or
enterprise. A database contains logically related data. Basically, that means that the
database was created with a purpose in mind. A database supports shared access by many
users. A database also is protected to control access and managed to retain its value and
integrity.
The key to understanding relational databases is the concept of the table made up of rows
and columns.
A column always contains like data. In the example on the following page, the column
named LAST NAME contains last name, and never anything else. The position of the
column in the table is arbitrary.
A row is one instance of all the columns of a table. In our example, all of the information
about a single employee is in one row. The sequence of the rows in a table is arbitrary.
Specifically, in a Relational Database, tables are defined as a named collection of one or
more named columns with zero or more rows of related information.
Notice that each row of the table is about a person. There are no rows with data on two
people, nor are there rows with information on anything other than people. This may seem
obvious, but the concept underlying it is very important.
Each row represents an occurrence of an entity defined by the table. An entity is defined as
a person, place or thing about which the table contains information. In this case the entity is
the employee.

Primary Key
Tables, made up of rows and columns, represent entities or relationships. Entities are the
people, places, things, or events that the entity tables model. Each table holds only one
kind of row, and each row is uniquely identified within a table by a Primary Key (PK).
A Primary Key is required. A Primary Key can be more than one column. A Primary
Key uniquely identifies each row in a table. No duplicate values are allowed. Only one
Primary Key is allowed per table. The Primary Key for the EMPLOYEE table is the
Employee number. No two employees can have the same number.
Because it is used to identify, the Primary Key cannot be NULL. There must be something
in that field to uniquely identify each occurrence. Primary Key values cannot be changed.
Historical information as well as relationships with other entities may be lost if a PK value is
changed or re-used.
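
To make these rules concrete, here is a minimal, hypothetical sketch of how such a table
might be declared in SQL (the names and data types are illustrative only, not the course’s
lab definition); the PRIMARY KEY constraint enforces uniqueness and disallows NULLs in
Employee_Number:

    CREATE TABLE Employee
        (Employee_Number   INTEGER NOT NULL,     -- Primary Key: unique, never NULL
         Last_Name         CHAR(20),
         First_Name        VARCHAR(30),
         Salary_Amount     DECIMAL(10,2),
         CONSTRAINT emp_pk PRIMARY KEY (Employee_Number));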

Page 1-12

Teradata Overview

What is a Relational Database?
• A Relational Database consists of a set of logically related tables.
• A table is a two dimensional representation of data consisting of rows and columns.
• Each row in the table is uniquely identified by a Primary Key (PK) – 1 or more columns.
– A PK cannot have duplicate values and cannot be NULL; only one per table.
– PK values are considered “non-changing”.

• A table may optionally have 1 or more Foreign Keys (FK).
– A FK can be 1 or more columns, can have duplicate values, and allows NULLs
– Each FK value must exist somewhere as a PK value
Employee Table

EMPLOYEE  MANAGER   DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
PK        FK        FK      FK

1006      1019      301     312101  Stein     John     861015  631015  3945000
1008      1019      301     312102  Kanieski  Carol    870201  680517  3925000
1007      1005      ?       432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401     411100  Trader    James    860731  570619  4785000

This Employee table has 9 columns and 4 rows of sample data – one row per employee.
There is no prescribed order for the rows of the table.
There is only one row “format” for the entire table.
Missing data values are represented by “NULLs”.
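
The "?" in the DEPT NUMBER column above is how a missing value (NULL) is displayed. As a
small, hypothetical example using the names from this slide, such rows can be located with
an IS NULL test:

    SELECT Employee_Number, Last_Name
    FROM   Employee
    WHERE  Dept_Number IS NULL;     -- employees with no department assigned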

Teradata Overview

Page 1-13

Answering Questions with a Relational Database
A relational database is a collection of relational tables stored in a single installation of a
relational database management system (RDBMS). The words “management system”
indicate that not only is this a relational database but also there is underlying software to
provide additional functions that the industry expects. This includes transaction integrity,
security, journaling, and other features that are expected of databases in general. The
Teradata Database is a Relational Database Management System.
Relational databases do not use access paths to locate data, rather data connections are made
by data values. In other words, data connections are made by matching values in one
column with the values in a corresponding column in another table. This connection is
referred to as a JOIN in relational terminology.
The diagram on the facing page shows how the values in one table may be matched to values
in another. Both tables have a column named “Department Number”. That connection
allows the database to answer questions like, “What is the name of the department in which
an employee works?”
One reason relational databases are so powerful is that, unlike other databases, they are
based on a mathematical model developed by Dr. Edgar Codd and implement a query
language solidly founded in set theory.
To summarize, a relational database is a collection of tables. The data contained in the
tables can be associated using data values, specifically, columns with matching data
values.

Foreign Key
Relational Databases permit associations by data value across more than one table. Foreign
Keys (FKs) model the relationships between entities.
On the facing page you will see that the employee table has 3 FK columns, one of which
models the relationship between employees and their departments. A second one models the
relationship between employees and their job codes.
A third FK column is used to model the relationship between employees and each other.
This is called a “recursive” relationship.
Rules of Foreign Keys include:





• Duplicate values are allowed in a FK column.
• Missing values are allowed in a FK column.
• Values may be changed in a FK column.
• Each FK value must exist as a Primary Key.

Note that Dept_Number is the Primary Key for the DEPARTMENT table.
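
As a minimal sketch (the table and column names follow the example on the facing page and
are illustrative), the join described above – matching Dept_Number values in both tables –
answers the question “In which department does each employee work?”:

    SELECT  E.Last_Name,
            E.First_Name,
            D.Department_Name
    FROM    Employee E
    INNER JOIN Department D
            ON E.Dept_Number = D.Dept_Number;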

Page 1-14

Teradata Overview

Answering Questions with a Relational Database
Employee (partial listing)

EMPLOYEE  MANAGER   DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMPLOYEE  NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
          NUMBER
PK        FK        FK      FK

1006      1019      301     312101  Stein     John     861015  631015  3945000
1008      1019      301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801      403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003      401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005      403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801      401     411100  Trader    James    860731  570619  4785000

Department

DEPT    DEPARTMENT                BUDGET    MANAGER
NUMBER  NAME                      AMOUNT    EMPLOYEE
                                            NUMBER
PK                                          FK

501     marketing sales           80050000  1017
301     research and development  46560000  1019
403     education                 93200000  1005
402     software support          30800000  1011
401     customer support          98230000  1003

Questions:
1. Name the department in which James Trader works.
2. Who manages the Education Department?
3. Identify by name an employee who works for James Trader.

Teradata Overview

Page 1-15

Teradata Database Competitive Advantages
As technology has improved, a number of aspects of the decision support environment have
changed (improved). DSS systems are expected to:




• Store and efficiently process detailed data (reduces the need for summarized data).
• Process ad hoc queries in a timely fashion.
• Contain current (up-to-date) data.

Teradata meets these requirements. The facing page lists a number of the key competitive
advantages that Teradata provides. This course will look at these features in detail and
explain why these are competitive advantages.
Teradata provides a central, enterprise-wide database that contains information extracted
from operational data stores. It provides for a single version of the business (or truth).
Characteristics include:





• Based on enterprise-wide model – this type of model provides the ability to
  look/work across functional processes.
• Customers can begin small (right size), but may grow large rapidly.
• Populated by extraction/loading of data from operational systems.
• Allows end-users to submit “what if” queries.

Examples of applications that Teradata enables include:





• Customer Relationship Management (CRM)
• Campaign Management
• Yield Management
• Supply Chain Management

Some of the reasons that Teradata is the leader in data warehousing include:


• Scalable – supports a small (10 GB) to a massive (petabytes) database.

• Provides a query optimizer with approximately 30+ years of experience in large-table
  query planning.

• Does not require complex indexing schemes, complex data partitioning, or
  time-consuming reorganizations (re-orgs).

• Supports ad hoc querying against the detail data in the warehouse, not just
  summary data in the data mart.

• Designed and built with parallelism from day one (not a parallel retrofit).

Page 1-16

Teradata Overview

Teradata Database Competitive Advantages
• Unlimited, Proven Scalability – amount of data and number of users; allows
for an enterprise wide model of the data.

• Unlimited Parallelism – parallel access, sorts, and aggregations.
• Mature Optimizer – handles complex queries, up to 128 joins per query, ad hoc processing.

• Models the Business – normalized data (usually in 3NF), robust view
processing, & provides star schema capabilities.

• Provides a “single version of the business”.
• Low TCO (Total Cost of Ownership) – ease of setup, maintenance, &
administration; no re-orgs, lowest disk to data ratio, and robust expansion
utility (reconfig).

• High Availability – no single point of failure.
• Parallel Load and Unload utilities – robust, parallel, and scalable load and
unload utilities such as FastLoad, MultiLoad, TPump, and FastExport.

Teradata Overview

Page 1-17

Module 1: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 1-18

Teradata Overview

Module 1: Review Questions
1. Which feature allows the Teradata Database to process enormous volumes of data quickly? ____
a. High availability software and hardware components
b. High performance servers from Intel
c. Proven Scalability
d. Parallelism

2. The Teradata Database is primarily a ____ .
a. Client
b. Server
3. Which choice represents a quadrillion bytes or a Petabyte (PB) of data? ____
a. 10^9
b. 10^12
c. 10^15
d. 10^18

4. In a relational table, the set of columns that uniquely identifies a row is the _________ _________.

Teradata Overview

Page 1-19

Notes

Page 1-20

Teradata Overview

Module 2
Teradata Basics

After completing this module, you will be able to:
 List and describe the major components of the Teradata
architecture.
 Describe how the components interact to manage incoming
and outgoing data.
 List 5 types of Teradata database objects.

Teradata Proprietary and Confidential

Teradata Basics

Page 2-1

Notes

Page 2-2

Teradata Basics

Table of Contents
Major Components of Teradata ................................................................................................... 2-4
Teradata Storage Architecture...................................................................................................... 2-6
Teradata Retrieval Architecture ................................................................................................... 2-8
Multiple Tables on Multiple AMPs ........................................................................................... 2-10
Here's how it works: ........................................................................................................... 2-10
Linear Growth and Expandability .............................................................................................. 2-12
Teradata Objects......................................................................................................................... 2-14
Tables ................................................................................................................................. 2-14
Views ................................................................................................................................. 2-14
Macros ................................................................................................................................ 2-14
Triggers .............................................................................................................................. 2-14
Stored Procedures .............................................................................................................. 2-14
The Data Dictionary Directory (DD/D) ..................................................................................... 2-16
Structured Query Language (SQL) .............................................................................................. 2-18
Data Definition Language (DDL) .......................................................................................... 2-18
Data Manipulation Language (DML) .................................................................................... 2-18
Data Control Language (DCL) .............................................................................................. 2-18
User Assistance ...................................................................................................................... 2-18
CREATE TABLE – Example of DDL ...................................................................................... 2-20
Views ......................................................................................................................................... 2-22
Single-table View ................................................................................................................... 2-22
Multi-Table Views ..................................................................................................................... 2-24
Macros ........................................................................................................................................ 2-26
Features of Macros ................................................................................................................. 2-26
Benefits of Macros ................................................................................................................. 2-26
HELP Commands ...................................................................................................................... 2-28
SHOW Command ...................................................................................................................... 2-30
EXPLAIN Facility ..................................................................................................................... 2-32
Summary .................................................................................................................................... 2-34
Module 2: Review Questions ..................................................................................................... 2-36

Teradata Basics

Page 2-3

Major Components of Teradata
Up until now we have discussed relational databases in terms of how the user perceives
them – as a collection of tables that relate to one another. Now it's time to describe the
components of the system.
The major software components are the Parsing Engine (PE) and the Access Module
Processor (AMP).
The Parsing Engine is a component that interprets SQL requests, receives input records and
passes data. To do that it sends the messages through the Message Passing Layer to the
AMPs.
The Message Passing Layer (MPL) handles the internal communication of the Teradata
Database. The MPL is a combination of hardware and software (BYNET and PDE as we
will see later). All communication between PEs and AMPs is done via the Message Passing
Layer.
The Access Module Processor (AMP) is responsible for managing a portion of the
database. An AMP will control some portion of each table on the system. AMPs do all of
the physical work associated with generating an answer set including, sorting, aggregating,
formatting and converting.
A Virtual Disk is disk space associated with an AMP. Tables/data rows are stored in this
space. A virtual disk is usually assigned to two or more disk drives in a disk array. This
concept will be discussed in detail later in the course.

Page 2-4

Teradata Basics

Major Components of Teradata
(Diagram: SQL requests and answer set responses flow between the client, Parsing Engines,
the Message Passing Layer, and the AMPs with their Vdisks.)

Parsing Engines (PE)
• Manage sessions for users
• Parse, optimize, and send your request to the AMPs as execution steps
• Returns answer set response back to client

Message Passing Layer (MPL)
• Allows PEs and AMPs to communicate with each other

Access Module Processors (AMP)
• Owns and manages its storage
• Performs the steps sent by the PEs

Virtual Disks (Vdisk)
• Space owned by the AMP and is used to hold user data (rows within tables).
• Maps to physical space in a disk array.

AMPs store and retrieve rows to and from disk.

Teradata Basics

Page 2-5

Teradata Storage Architecture
On the facing page you will see a simplified view of how the physical components of a
Teradata database work to insert a row of data.
The PEs and AMPs are actually implemented as virtual processors (vprocs) in the system.
A vproc is effectively a group of processes that represents a Teradata software component.
The Parsing Engine interprets the SQL command and converts the data record from the
host into an AMP message.


The Parsing Engine is a component that interprets SQL requests, receives input
records and passes data. To do that it sends the messages through the Message
Passing Layer to the AMPs.

The Message Passing Layer distributes the row to the appropriate Access Module
Processor (AMP).


The Message Passing Layer is implemented as hardware and/or software,
depending on the platform used. It determines which vprocs should receive a
message.

The AMP formats the row and writes it to its associated disks (Vdisks) which are assigned
to physical disks in a disk array. The physical disk holds the row for subsequent access.
The Host or Client system supplies the records. These records are the raw data from which
the database will be constructed.
Think of the AMP (Access Module Processor) as an independent computer designed for and
dedicated to managing a portion of the entire database. It performs all the database
management functions – such as sorting, aggregating, and formatting the data. It receives
data from the PE, formats the rows, and distributes the rows to the disk storage units it
controls. It also retrieves the rows requested by the parsing engine.

Page 2-6

Teradata Basics

Teradata Storage Architecture
(Diagram: twelve records arrive from the client in random sequence – 2, 32, 67, 12, 90, 6,
54, 75, 18, 25, 80, 41 – pass through the Parsing Engine(s) and Message Passing Layer, and
are spread across AMPs 1 through 4.)

The Parsing Engine dispatches the request to insert a row.

The Message Passing Layer insures that a row gets to the appropriate AMP (Access Module
Processor).

The AMP stores the row on its associated (logical) disk.

An AMP manages a logical or virtual disk which is mapped to multiple physical disks in a
disk array.

Teradata Basics

Page 2-7

Teradata Retrieval Architecture
Retrieving data from the Teradata Database simply reverses the process of the storage
model. A request is made for data and is passed on to a Parsing Engine (PE). The PE
optimizes the request for efficient processing and creates tasks for the AMPs to perform,
which will result in the request being satisfied. These tasks are then dispatched to the AMPs
via the Message Passing Layer. Oftentimes all AMPs must participate in creating the
answer set, such as in returning all rows of a table. Other times, only one or a few AMPs
need participate, depending on the nature of the request. The PE will insure that only the
AMPs that are needed will be assigned tasks on behalf of this request.
Once the AMPs have been given their assignments, they will retrieve the desired rows from
their respective disks. If sorting, aggregating or formatting of any kind is needed, the AMPs
will also take care of that. The rows are then returned to the requesting PE via the Message
Passing Layer. The PE takes the returned answer set and returns it to the requesting client
application.

Page 2-8

Teradata Basics

Teradata Retrieval Architecture
(Diagram: the rows previously stored across AMPs 1 through 4 are retrieved from the table
and returned through the Message Passing Layer and Parsing Engine(s).)

The Parsing Engine dispatches a request to retrieve one or more rows.

The Message Passing Layer insures that the appropriate AMP(s) are activated.

The AMP(s) locate and retrieve desired row(s) in parallel access.

The Message Passing Layer returns the retrieved rows to the PE.

The PE returns row(s) to the requesting client application.

Teradata Basics

Page 2-9

Multiple Tables on Multiple AMPs
Logically, you might think that the Teradata Database would assign each table to a particular
AMP, and that the AMP would put that table on a single disk. However, as you see on the
diagram on the facing page, that’s not what will happen. The system takes the rows that
compose a table and divides those rows up among all available AMPs.

Here's how it works:
Tables are distributed across all AMPs. This distribution of rows should be even across all
AMPs. This way, a request to get the rows of a given table will result in the workload being
evenly distributed across the AMPs.


• Each table has some rows distributed to each AMP.

• Each AMP controls one logical storage unit (Vdisk) which may consist of several
  physical disks.

• Each AMP places, maintains, and manages the rows on its own disks.

• Large configurations may have hundreds of AMPs.

• Full table scans, operations that require looking at all the rows of a table, access all
  AMPs in parallel. That parallelism is what makes possible the accessing of
  enormous amounts of data.

Consider the following three tables: EMPLOYEE, DEPARTMENT, and JOB.
The Teradata Database takes the rows from each of the tables and divides them up among all
the AMPs. The AMPs divide the rows up among their disks. Notice that each AMP gets
part of each table. Dividing up the tables this way means that all the AMPs and their
associated disks will be activated in a full table scan, thus speeding up requests against these
tables.
In our example, if you assume four AMPs, each AMP would get approximately 25% of
each table. If, however, AMP #1 were to get 90% of the rows from the EMPLOYEE table
that would be called "lumpy" data distribution. Lumpy data distribution would slow the
system down because any request that required scanning all the rows of EMPLOYEE would
have three AMPs sitting idle while AMP #1 finished its work. It is better to divide all the
tables up evenly among all the available AMPs. You will see how this distribution is
controlled in a later chapter.
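
For reference, a query along the following lines is one common way to see how evenly a
table’s rows are spread across the AMPs. It is a sketch only: it assumes the EMPLOYEE table
from this example and that Employee_Number is the column controlling row distribution (how
that column is chosen is the topic of a later chapter). HASHROW, HASHBUCKET, and HASHAMP
are Teradata functions that map a column value to the AMP that owns the row.

    SELECT   HASHAMP(HASHBUCKET(HASHROW(Employee_Number))) AS AMP_Number,
             COUNT(*)                                       AS Row_Count
    FROM     Employee
    GROUP BY 1
    ORDER BY 2 DESC;     -- "lumpy" distribution shows up as very uneven counts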

Page 2-10

Teradata Basics

Multiple Tables on Multiple AMPs
(Diagram: rows from the EMPLOYEE, DEPARTMENT, and JOB tables pass through the Parsing
Engine and Message Passing Layer; each of AMP 1 through AMP 4 holds EMPLOYEE rows,
DEPARTMENT rows, and JOB rows.)

Rows from each table will usually be stored on each AMP.
Each AMP may have rows from all tables.

Ideally, each AMP will hold roughly the same amount of data.

Teradata Basics

Page 2-11

Linear Growth and Expandability
The Teradata DBS is the first commercial database system to offer true parallelism and the
performance increase that goes with it.
Think back to the example of how rows are divided up among AMPs that we just discussed.
Assume that our three tables, EMPLOYEE, DEPARTMENT, and JOB total 100,000 rows,
with a certain number of users, say 50.
What happens if you double the number of AMPs and the number of users stays the same?
Performance doubles. Each AMP can only work on half as many rows as they used to.
Now think of that system in a situation where the number of users is doubled, as well as the
number of AMPs. We now have 100 users, but we also have twice as many AMPs. What
happens to performance? It stays the same. There is no drop-off in the speed with which
requests are executed.
That's because the system is modular and the workload is easily partitioned into
independent pieces. In the last example, each AMP is still doing the same amount of work.
This characteristic, in which capacity and throughput grow in direct proportion to the size of
the system, is unique to the Teradata Database. Traditional databases show a sharp drop in
performance when the system approaches a critical size.
Look at the diagram on the facing page. As the number of Parsing Engines increases, the
number of SQL requests that can be supported increases.
As you add AMPs, the data is spread across more units of parallelism even as you add
processing power to handle that data.
As you add disks, you add space for each AMP to store and process more information. All
AMPs must have the same amount of disk storage space.
There are numerous advantages to having a system that has linear scalability. Two
advantages include:

•  Linear scalability allows for increased workload without decreased throughput.
•  Investment protection for application development

Page 2-12

Teradata Basics

Linear Growth and Expandability
• Teradata is a linearly expandable RDBMS.
• Components may be added as requirements grow.
• Linear scalability allows for increased workload without decreased throughput.
• Performance impact of adding components is shown below.

(Diagram: Parsing Engines support SESSIONS, AMPs provide PARALLEL PROCESSING, and Disks hold the DATA.)

USERS      AMPs       DATA       Performance
Same       Same       Same       Same
Double     Double     Same       Same
Same       Double     Double     Same
Same       Double     Same       Double

Teradata Basics

Page 2-13

Teradata Objects
A “database” or “user” in Teradata database systems is a collection of objects such as
tables, views, macros, triggers, stored procedures, user-defined functions, or indexes
(join and hash). Database objects are created and accessed using standard Structured
Query Language or SQL.
All database object definitions are stored in a system database called the Data
Dictionary/Directory (DD/D).
Databases provide a logical grouping for information. They are also the foundation for
space allocation and access control. A description of some of the objects follows.

Tables
A table is the logical structure of data in a relational database. It is a two-dimensional
structure made up of columns and rows. A user defines a table by giving it a table name
that refers to the type of data that will be stored in the table. A column represents attributes
of the table. Column names are given to each column of the table. All the information in a
column is the same type, for example, date of birth. Each occurrence of an entity is stored in
the table as a row. Entities are the people, things, or events that the table is about. Thus a
row would represent a particular person, thing, or event.

Views
A view is a pre-defined subset of one or more tables or other views. It does not exist as a
real table, but serves as a reference to existing tables or views. One way to think of a view
is as a virtual table. Views have definitions in the data dictionary, but do not contain any
physical rows. The database administrator can use views to control access to the underlying
tables. Views can be used to hide columns from users, to insulate applications from
database changes, and to simplify or standardize access techniques.

Macros
A macro is a predefined, stored set of one or more SQL commands and optionally, report
formatting commands. Macros are used to simplify the execution of frequently used SQL
commands.

Triggers
A trigger is a set of SQL statements usually associated with a table or a column; when the
data in that table or column changes, the trigger is fired, effectively executing the SQL
statements.
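As an illustration only, a sketch of the general form of a trigger follows. The name
Raise_Trigger echoes an example used later in this module; Salary_Log is a hypothetical audit
table, and exact trigger syntax varies by release.

-- Sketch only: Salary_Log is a hypothetical audit table.
CREATE TRIGGER Raise_Trigger
  AFTER UPDATE OF (salary_amount) ON Employee
  REFERENCING OLD AS OldRow NEW AS NewRow
  FOR EACH ROW
  WHEN (NewRow.salary_amount > OldRow.salary_amount)
  (INSERT INTO Salary_Log
   VALUES (NewRow.employee_number, OldRow.salary_amount, NewRow.salary_amount);) ;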

Stored Procedures
A stored procedure is a program that is stored within Teradata and executes within the
Teradata Database. A stored procedure uses permanent disk space.
A stored procedure is a pre-defined set of statements invoked through a single SQL CALL
statement. Stored procedures may contain both Teradata SQL statements and procedural
statements (in Teradata, referred to as Stored Procedure Language, or SPL).
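As a sketch, a minimal stored procedure and the CALL that invokes it might look like the
following. The procedure and parameter names are illustrative; it reads the Employee table
defined later in this module.

CREATE PROCEDURE GetLastName
   (IN  in_emp_no     INTEGER,
    OUT out_last_name CHAR(20))
BEGIN
   -- SPL: copy one column value into the output parameter
   SELECT last_name INTO :out_last_name
   FROM   Employee
   WHERE  employee_number = :in_emp_no;
END;

CALL GetLastName (1005, out_last_name);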
Page 2-14

Teradata Basics

Teradata Objects
Examples of objects within a Teradata database or user include:
Tables – rows and columns of data
Views – predefined subsets of existing tables
Macros – predefined, stored SQL statements
Triggers – SQL statements associated with a table
Stored Procedures – program stored within Teradata
User-Defined Function – function (C or Java program) to provide additional SQL functionality
Join and Hash Indexes – separate index structures stored as objects within a database
Permanent Journals – table used to store before and/or after images for recovery
DATABASE or USER can have a mix
of various objects.
* - require Permanent Space

These objects are created,
maintained, and deleted using SQL.
Object definitions are stored in the
DD/D.

TABLE 1 *

TABLE 2 *

TABLE 3 *

VIEW 1

VIEW 2

VIEW 3

MACRO 1

Stored Procedure 1 *

TRIGGER 1

UDF 1 *

Join/Hash Index 1 *
These aren't directly accessed by users.
Permanent Journal *

Teradata Basics

Page 2-15

The Data Dictionary Directory (DD/D)
The Data Dictionary/Directory is an integrated set of system tables which store database
object definitions and accumulate information about users, databases, resource usage,
data demographics, and security rules. It records specifications about tables, views, and
macros. It also contains information about ownership, space allocation, accounting, and
access rights (privileges) for these objects.
Data Dictionary/Directory information is updated automatically during the processing of
Teradata SQL data definition (DDL) statements. It is used by the Parser to obtain
information needed to process all Teradata SQL statements.
Users may access the DD/D through Teradata-supplied views, if permitted by the system
administrator.
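For example, a user who has been granted access to the DD/D views could list the objects in a
database with a query such as the following (using the DBC.TablesV view listed on the facing
page; Customer_Service is the sample database used later in this module):

SELECT   TableName, TableKind
FROM     DBC.TablesV
WHERE    DatabaseName = 'Customer_Service'
ORDER BY TableName;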

Page 2-16

Teradata Basics

The Data Dictionary Directory (DD/D)
The DD/D ...
– is an integrated set of system tables
– contains definitions of and information about all objects in the system
– is entirely maintained by the Teradata Database
– is “data about the data” or “metadata”
– is distributed across all AMPs like all tables
– may be queried by administrators or support staff
– is normally accessed via Teradata supplied views

Examples of DD/D views:
DBC.TablesV     – information about all tables
DBC.UsersV      – information about all users
DBC.AllRightsV  – information about access rights
DBC.AllSpaceV   – information about space utilization

Teradata Basics

Page 2-17

Structured Query Language (SQL)
Structured Query Language (SQL) is the language of relational databases. It is sometimes
referred to as a "Fourth Generation Language (4GL)" to differentiate it from "Third
Generation Languages" such as FORTRAN and COBOL, though it is quite different from
other 4GLs. It acts as an intermediary between the user and the database.
SQL is different in some very important ways from other computer languages. Its
statements resemble English-like structures. It provides powerful, set-oriented database
manipulation including structural modification, data retrieval, modification, and security
functions.
SQL is a non-procedural language. Because of its set orientation it does not require IF,
GOTO, DO, FOR NEXT or PERFORM statements.
We'll describe three important subsets of SQL – the Data Definition Language, the Data
Manipulation Language, and the Data Control Language.

Data Definition Language (DDL)
The DDL allows a user to define the database objects and the relationships that exist
among them. Examples of DDL uses are creating or modifying tables and views.

Data Manipulation Language (DML)
The DML consists of the statements that manipulate, change or retrieve the data rows
of the database. If the DDL defines the database, the DML lets the user change the
information contained in the database. The DML is the most commonly used subset of
SQL. It is used to select, update, delete, and insert rows.
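For illustration, the following statements use the sample Employee table defined later in this
module; the employee number 1025 and the name 'Garcia' are invented values.

SELECT   last_name, first_name
FROM     Employee
WHERE    dept_number = 403;

INSERT INTO Employee (employee_number, manager_emp_number, last_name)
VALUES (1025, 1003, 'Garcia');

UPDATE   Employee
SET      salary_amount = salary_amount * 1.05
WHERE    employee_number = 1025;

DELETE FROM Employee
WHERE    employee_number = 1025;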

Data Control Language (DCL)
The Data Control Language is used to restrict or permit a user's access in various ways. It
can selectively limit a user's ability to retrieve, add, or modify data. It is used to grant and
revoke access privileges on tables and views. An example is granting update privileges on a
table, or read privileges on a view to specified users.
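For example (User_C is the sample user created in a later module; the privileges shown are
only illustrative):

GRANT  SELECT, UPDATE ON Employee  TO User_C;
GRANT  SELECT         ON Emp403_v  TO User_C;
REVOKE UPDATE         ON Employee  FROM User_C;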

User Assistance
These commands allow you to list the objects in a database, display the characteristics of a
table, see how a query will execute, or show the details of your system. They vary widely
from vendor to vendor.

Page 2-18

Teradata Basics

Structured Query Language (SQL)
SQL is a query language for Relational Database Systems and is used to access Teradata.
– A fourth-generation language
– A set-oriented language
– A non-procedural language (e.g., doesn’t have IF, DO, FOR NEXT, etc. )
SQL consists of:
Data Definition Language (DDL)
– Defines database structures (tables, users, views, macros, triggers, etc.)
CREATE          DROP          ALTER

Data Manipulation Language (DML)
– Manipulates rows and data values
SELECT          INSERT        UPDATE        DELETE

Data Control Language (DCL)
– Grants and revokes access rights
GRANT           REVOKE

Teradata SQL also includes Teradata Extensions to SQL
HELP            SHOW          EXPLAIN       CREATE MACRO

Teradata Basics
Page 2-19

CREATE TABLE – Example of DDL
To create and store the table structure definition in the DD/D, you can execute the CREATE
TABLE DDL statement as shown on the facing page.
An example of the output from a SHOW TABLE command follows:
SHOW TABLE Employee;

CREATE SET TABLE Per_DB.Employee, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
employee_number INTEGER NOT NULL,
manager_emp_number INTEGER NOT NULL,
dept_number INTEGER COMPRESS,
job_code INTEGER COMPRESS ,
last_name CHAR(20) NOT CASESPECIFIC NOT NULL,
first_name VARCHAR(20) NOT CASESPECIFIC,
hire_date DATE FORMAT 'YYYY-MM-DD',
birth_date DATE FORMAT 'YYYY-MM-DD',
salary_amount DECIMAL(10,2) COMPRESS 0
)
UNIQUE PRIMARY INDEX (employee_number)
INDEX (dept_number);

You can create secondary indexes after a table has been created by executing the CREATE
INDEX command. An example of creating an index for the job_code column is shown on
the facing page.
Examples of the DROP INDEX and DROP TABLE commands are also shown on the facing
page.

Page 2-20

Teradata Basics

CREATE TABLE – Example of DDL
CREATE TABLE Employee
  (employee_number      INTEGER        NOT NULL
  ,manager_emp_number   INTEGER        COMPRESS
  ,dept_number          INTEGER        COMPRESS
  ,job_code             INTEGER        COMPRESS
  ,last_name            CHAR(20)       NOT NULL
  ,first_name           VARCHAR(20)
  ,hire_date            DATE           FORMAT 'YYYY-MM-DD'
  ,birth_date           DATE           FORMAT 'YYYY-MM-DD'
  ,salary_amount        DECIMAL(10,2)  COMPRESS 0 )
UNIQUE PRIMARY INDEX (employee_number)
INDEX (dept_number);

Other DDL Examples
CREATE INDEX (job_code) ON Employee ;
DROP INDEX (job_code) ON Employee ;
DROP TABLE Employee ;

Teradata Basics

Page 2-21

Views
A view is a pre-defined subset or filter of one or more tables. Views are used to control
access to the underlying tables and simplify access to data. Authorized users may use views
to read data specified in the view and/or to update data specified in the view.
Views are used to simplify query requests, to limit access to data, and to allow different
users to look at the same data from different perspectives.
A view is a window that accesses selected portions of a database. Views can show parts of
one table (single-table view), more than one table (multi-table view), or a combination of
tables and other views. To the user, views look just like tables.
Views are an alternate way of organizing and presenting information. A view, like a
table, has rows and columns. However, the rows and columns of a view are not stored
directly but are derived from the rows and columns of tables whenever the view is
referenced. A view looks like a table, but has no data of its own, and therefore takes up no
storage space except for its definition. One way to think of a view is as if it were a window
through which you can look at selected portions of a table or tables.

Single-table View
A single-table view takes specified columns and/or rows from a table and makes them
available in a fashion that looks like a table. An example might be an employee table from
which you select only certain columns for employees in a particular department number, for
example, department 403, and present them in a view.
Example of a CREATE VIEW statement:
CREATE VIEW Emp403_v AS
SELECT   employee_number
        ,department_number
        ,last_name
        ,first_name
        ,hire_date
FROM     Employee
WHERE    department_number = 403;

It is also possible to execute SHOW VIEW viewname;

Page 2-22

Teradata Basics

Views
Views are pre-defined filters of existing tables consisting of specified columns
and/or rows from the table(s).
A single table view:
– is a window into an underlying table
– allows users to read and update a subset of the underlying table
– has no data of its own
EMPLOYEE (Table)

EMPLOYEE  MANAGER     DEPT    JOB     LAST      FIRST    HIRE    BIRTH   SALARY
NUMBER    EMP NUMBER  NUMBER  CODE    NAME      NAME     DATE    DATE    AMOUNT
PK        FK          FK      FK
1006      1019        301     312101  Stein     John     861015  631015  3945000
1008      1019        301     312102  Kanieski  Carol    870201  680517  3925000
1005      0801        403     431100  Ryan      Loretta  861015  650910  4120000
1004      1003        401     412101  Johnson   Darlene  861015  560423  4630000
1007      1005        403     432101  Villegas  Arnando  870102  470131  5970000
1003      0801        401     411100  Trader    James    860731  570619  4785000

Emp403_v (View)

EMP NO  DEPT NO  LAST NAME  FIRST NAME  HIRE DATE
1007    403      Villegas   Arnando     870102
1005    403      Ryan       Loretta     861015

Teradata Basics

Page 2-23

Multi-Table Views
A multi-table view combines data from more than one table into one pre-defined view.
These views are also called “join views” because more than one table is involved.
An example might be a view that shows employees and the name of their department,
information that comes from two different tables.
Note: Multi-table Views are read only. The user cannot update the data via the view.
One might wish to create a view containing the last name and department name for all
employees.
A Join operation joins rows of multiple tables and creates rows in work space or spool.
These are rows that contain data from more than one table but are not maintained anywhere
in permanent storage. These rows in spool are created dynamically as part of a join
operation. Rows are matched up based on Primary and Foreign Key relationships.
Example of SQL to create a join view:
CREATE VIEW EmpDept_v AS
SELECT
Last_Name
,Department_Name
FROM
Employee E INNER JOIN Department D
ON
E.dept_number = D.dept_number ;

An example of reading via this view is:
SELECT   Last_Name
        ,Department_Name
FROM     EmpDept_v;

This example utilizes an alias name of E for the Employee table and D for the Department
table.

Page 2-24

Teradata Basics

Multi-Table Views
A multi-table view allows users to access data from multiple tables as if it were in a single
table. Multi-table views (i.e., join views) are used for reading only, not updating.
EMPLOYEE (Table)

EMPLOYEE  MANAGER     DEPT    JOB     LAST      FIRST
NUMBER    EMP NUMBER  NUMBER  CODE    NAME      NAME
PK        FK          FK      FK
1006      1019        301     312101  Stein     John
1008      1019        301     312102  Kanieski  Carol
1005      0801        403     431100  Ryan      Loretta
1004      1003        401     412101  Johnson   Darlene
1007      1005        403     432101  Villegas  Arnando
1003      0801        401     411100  Trader    James

DEPARTMENT (Table)

DEPT    DEPARTMENT              BUDGET    MANAGER
NUMBER  NAME                    AMOUNT    EMP NUMBER
PK                                        FK
501     Marketing Sales         80050000  1017
301     Research & Development  46560000  1019
302     Product Planning        22600000  1016
403     Education               93200000  1005
402     Software Support        30800000  1011
401     Customer Support        98230000  1003

Joined Together

Example of SQL to create a join view:

CREATE VIEW EmpDept_v AS
SELECT   Last_Name
        ,Department_Name
FROM     Employee E
INNER JOIN Department D
ON       E.dept_number = D.dept_number;

EmpDept_v (View)

Last_Name  Department_Name
Stein      Research & Development
Kanieski   Research & Development
Ryan       Education
Johnson    Customer Support
Villegas   Education
Trader     Customer Support

Teradata Basics

Page 2-25

Macros
The Macro facility allows you to define a sequence of Teradata SQL statements (and
optionally Teradata report formatting statements) so that they execute as a single transaction.
Macros reduce the number of keystrokes needed to perform a complex task. This saves you
time, reduces the chance of errors, reduces the communication volume to Teradata, and
allows efficiencies internal to Teradata. Macros are a Teradata SQL extension.

Features of Macros
•  Macros are source code stored on the DBC.
•  They can be modified and executed at will.
•  They are re-optimized at execution time.
•  They can be executed by interactive or batch applications.
•  They are executed by one EXECUTE command.
•  They can accept user-provided parameter values.

Benefits of Macros
•  Macros simplify and control access to the system.
•  They enhance system security.
•  They provide an easy way of installing referential integrity.
•  They reduce the amount of source code transmitted from the client application.
•  They are stored in the Teradata DD/D and are available to all connected hosts.

To create a macro:
CREATE MACRO Customer_List AS
(SELECT customer_name FROM Customer; );

To execute a macro:
EXEC Customer_List;

To replace a macro:
REPLACE MACRO Customer_List AS
(SELECT customer_name, customer_number FROM Customer; );

To drop a macro:
DROP MACRO Customer_List;
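Macros can also accept parameters. A sketch follows; Dept_Employees is an illustrative name,
and the macro reads the Employee table used elsewhere in this module:

CREATE MACRO Dept_Employees (dept INTEGER) AS
   (SELECT last_name, first_name
    FROM   Employee
    WHERE  dept_number = :dept; );

EXEC Dept_Employees (403);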

Page 2-26

Teradata Basics

Macros
A MACRO is a predefined set of SQL statements which is logically stored in a database.
Macros may be created for frequently occurring queries or sets of operations.
Macros have many features and benefits:

•  Simplify end-user access
•  Control which operations may be performed by users
•  May accept user-provided parameter values
•  Are stored in the Teradata Database, thus available to all clients
•  Reduce query size, thus reducing LAN/channel traffic
•  Are optimized at execution time
•  May contain multiple SQL statements

To create a macro:
CREATE MACRO Customer_List AS (SELECT customer_name FROM Customer;);
To execute a macro:
EXEC Customer_List;
To replace a macro:
REPLACE MACRO Customer_List AS
(SELECT customer_name, customer_number FROM Customer;);

Teradata Basics

Page 2-27

HELP Commands
HELP commands (a Teradata SQL extension) are available to display information on
database objects:
•  Databases and Users
•  Tables
•  Views
•  Macros
•  Triggers
•  Join Indexes
•  Hash Indexes
•  Stored Procedures
•  User-Defined Functions

The facing page contains an example of a HELP DATABASE command. This command
lists the tables, views, macros, triggers, etc. in the specified database.
The Kind (TableKind) column codes represent the following:
T – Table
O – Table without a Primary Index
V – View
M – Macro
G – Trigger
P – Stored Procedure
F – User-defined Function
I – Join Index
N – Hash Index
J – Permanent Journal
A – Aggregate Function
B – Combined aggregate and ordered analytical function
D – JAR
E – External Stored Procedure
H – Instance or Constructor Method
Q – Queue Table
U – User-defined data type
X – Authorization

Page 2-28
Teradata Basics

HELP Commands
Databases and Users
HELP DATABASE      Customer_Service;
HELP USER          Dave_Jones;

Tables, Views, Macros, etc.
HELP TABLE         Employee;
HELP VIEW          Emp_v;
HELP MACRO         Payroll_3;
HELP COLUMN        Employee.*;
HELP COLUMN        Employee.last_name;
HELP INDEX         Employee;
HELP TRIGGER       Raise_Trigger;
HELP STATISTICS    Employee;
HELP CONSTRAINT    Employee.over_21;
HELP JOIN INDEX    Cust_Order_JI;
HELP SESSION;

Example:
HELP DATABASE Customer_Service;
*** Help information returned. 15 rows.
*** Total elapsed time was 1 second.

Table/View/Macro name    Kind    Comment
Contact                  T       ?
Customer                 T       ?
Cust_Comp_Orders         V       ?
Cust_Order_JI            I       ?
Department               T       ?
:                        :       :
Orders                   T       ?
Orders_Temp              O       ?
Orders_HI                N       ?
Raise_Trigger            G       ?
Set_Ansidate_on          M       ?

This is not an inclusive list of HELP commands.

Teradata Basics

Page 2-29

SHOW Command
HELP commands display information about database objects (users/databases, tables, views,
macros, triggers, and stored procedures) and session characteristics.
SHOW commands (another Teradata extension) display the data definition (DDL)
associated with database objects (tables, views, macros, triggers, or stored procedures).
BTEQ contains a SHOW command, in addition to and separate from the SQL SHOW
command. The BTEQ SHOW provides information on the formatting and display settings
for the current BTEQ session, if applicable.

Page 2-30

Teradata Basics

SHOW Command
SHOW commands display how an object was created. Examples include:
Command                                Returns
SHOW TABLE        table_name;          CREATE TABLE statement …
SHOW VIEW         view_name;           CREATE VIEW ...
SHOW MACRO        macro_name;          CREATE MACRO ...
SHOW TRIGGER      trigger_name;        CREATE TRIGGER …
SHOW PROCEDURE    procedure_name;      CREATE PROCEDURE …
SHOW JOIN INDEX   join_index_name;     CREATE JOIN INDEX …

SHOW TABLE Employee;
CREATE SET TABLE PD.Employee, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
Employee_Number INTEGER NOT NULL,
Emp_Mgr_Number INTEGER COMPRESS,
Dept_Number INTEGER COMPRESS,
Job_Code INTEGER COMPRESS,
Last_Name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
First_Name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
Salary_Amount DECIMAL(10,2) COMPRESS 0)
UNIQUE PRIMARY INDEX ( Employee_Number )
INDEX ( Dept_Number );

Teradata Basics

Page 2-31

EXPLAIN Facility
The EXPLAIN facility (a very useful and robust Teradata extension) allows you to preview
how Teradata will execute a query you have requested. It returns a summary of the steps the
Teradata Database would perform to execute the request. EXPLAIN also discloses the
strategy and access method to be used, how many rows will be involved, and its “cost” in
minutes and seconds. You can use EXPLAIN to evaluate query performance and to
develop an alternative processing strategy that may be more efficient. EXPLAIN works on
any SQL request. The request is fully parsed and optimized, but it is not run. Instead, the
complete plan is returned to the user in readable English statements.
EXPLAIN also provides information about locking, sorting, row selection criteria, join
strategy and conditions, access method, and parallel step processing.
There are a lot of reasons for using EXPLAIN. The main ones we’ve already pointed out –
it lets you know how the system will do the job, what kind of results you will get back, and
the relative cost of the query. EXPLAIN is also useful for performance tuning, debugging,
pre-validation of requests, and for technical training.
The following is an example of an EXPLAIN on a very simple query doing a FTS (Full
Table Scan).
EXPLAIN SELECT * FROM Employee WHERE Dept_Number = 1018;

Explanation (full)
---------------------------------------------------------------------------
1) First, we lock a distinct PD."pseudo table" for read on a RowHash to prevent global
   deadlock for PD.Employee.
2) Next, we lock PD.Employee for read.
3) We do an all-AMPs RETRIEVE step from PD.Employee by way of an all-rows
scan with a condition of ("PD.Employee.Dept_Number = 1018") into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated
with high confidence to be 10 rows (730 bytes). The estimated time for this step is
0.14 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in
processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The
total estimated time is 0.14 seconds.

Page 2-32

Teradata Basics

EXPLAIN Facility
The EXPLAIN modifier in front of any SQL statement generates an English translation of
the Parser’s plan.
The request is fully parsed and optimized, but not actually executed.
EXPLAIN returns:

• Text showing how a statement will be processed (a plan)
• An estimate of how many rows will be involved
• A relative cost of the request (in units of time)
This information is useful for:

•
•
•
•

predicting row counts
predicting performance
testing queries before production
analyzing various approaches to a problem

EXPLAIN SELECT * FROM Employee WHERE Dept_Number = 1018;
:
3) We do an all-AMPs RETRIEVE step from PD.Employee by way of an all-rows scan with a condition of
("PD.Employee.Dept_Number = 1018") into Spool 1 (group_amps), which is built locally on the AMPs. The
size of Spool 1 is estimated with high confidence to be 10 rows (730 bytes). The estimated time for this
step is 0.14 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is
0.14 seconds.

Teradata Basics

Page 2-33

Summary
The Teradata system is a high-performance database system that permits the processing of
enormous quantities of detail data, quantities which are beyond the capability of
conventional systems.
The system is specifically designed for large relational databases. From the beginning
the Teradata system was created to do one thing: manage enormous amounts of data.
Over one thousand terabytes of on-line storage capacity is currently available, making it
an ideal solution for enterprise data warehouses or even smaller data marts.
Uniform data distribution across multiple processors facilitates parallel processing. The
system is designed in such a way that the component parts divide the work up into
approximately equal pieces. This keeps all the parts busy all the time, which enables the
system to accommodate a larger number of users and/or more data.
Open architecture adapts readily to new technology. As higher-performance industry
standard computer chips and disk drives are made available, they are easily incorporated
into the architecture.
As the configuration grows, performance increase is linear.
Structured Query Language (SQL) is the industry standard for communicating with
relational databases.
The Teradata Database currently runs as a database server on a variety of Linux, UNIX,
and Windows based hardware platforms.

Page 2-34

Teradata Basics

Summary
The major components of the Teradata Database are:
Parsing Engines (PE)

• Manage sessions for users
• Parse, optimize, and send your request to the AMPs as execution steps
• Returns answer set response back to client
Message Passing Layer (MPL)

• Allows PEs and AMPs to communicate with each other
Access Module Processors (AMP)

• Owns and manages its storage
• Performs the steps sent by the PEs
Virtual Disks (Vdisk)

• Space owned by the AMP and is used to hold user data (rows within tables).
• Maps to physical space in a disk array.

Teradata Basics

Page 2-35

Module 2: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 2-36

Teradata Basics

Module 2: Review Questions
1. What language is used to access a Teradata table?

2. What are five Teradata database objects?

3. What are four major components of the Teradata architecture?

4. What are views?

5. What are macros?

Teradata Basics

Page 2-37

Notes

Page 2-38

Teradata Basics

Module 3
Teradata Database Architecture

After completing this module, you will be able to:
 Describe the purpose of the PE and the AMP.
 Describe the overall Teradata Database parallel architecture.
 Describe the relationship of the Teradata Database to its
client side applications.

Teradata Proprietary and Confidential

Teradata Database Architecture

Page 3-1

Notes

Page 3-2

Teradata Database Architecture

Table of Contents
Teradata and MPP Systems .......................................................................................................... 3-4
Teradata Functional Overview ..................................................................................................... 3-6
Channel-Attached Client Software Overview.............................................................................. 3-8
Network-Attached Client Software Overview ........................................................................... 3-10
The Parsing Engine .................................................................................................................... 3-12
Message Passing Layer .............................................................................................................. 3-14
The Access Module Processor (AMP) ....................................................................................... 3-16
Teradata Parallelism ................................................................................................................... 3-18
Module 3: Review Questions ..................................................................................................... 3-20

Teradata Database Architecture

Page 3-3

Teradata and MPP Systems
Teradata is the software that makes a MPP system appear to be a single system to users and
administrators.
The BYNET (BanYan NETwork) is the software and hardware interconnect that provides
high performance networking capabilities to Teradata MPP (Massively Parallel Processing)
systems.
Using communication switching techniques, the BYNET allows for point-to-point, multicast, and broadcast communications among the nodes, thus supporting a monumental
increase in throughput in very large databases. This technology allows Teradata users to
grow massively parallel databases without fear of a communications bottleneck for any
database operations.
Although the BYNET software also supports the multicast protocol, Teradata software uses
the point-to-point protocol whenever possible. When an all-AMP operation is needed,
Teradata software uses the broadcast protocol to broadcast the request to the AMPs.
The BYNET is linearly scalable for point-to-point communications. For each new node
added to the system, an additional 960 MB (with BYNET Version 4) of bandwidth is added,
thus providing scalability as the system grows. Scalability comes from the fact that multiple
point-to-point circuits can be established concurrently. With the addition of another node,
more circuits can be established concurrently.

Page 3-4

Teradata Database Architecture

Teradata and MPP Systems
Teradata is the software that makes a MPP system appear to be a single system
to users and administrators.
(Diagram: a multi-node MPP system. BYNET 0 and BYNET 1 connect Node 0 through Node 3; each
node runs PDE on top of the operating system and hosts multiple PE and AMP vprocs.)

The major components of the Teradata Database are implemented as virtual processors (vproc):
• Parsing Engine (PE)
• Access Module Processor (AMP)

The Communication Layer or Message Passing Layer (MPL) consists of PDE and BYNET SW/HW and
connects multiple nodes together.

Teradata Database Architecture
Page 3-5

Teradata Functional Overview
The client may be a mainframe system (e.g., IBM) in which case it is channel-attached to
the Teradata Database. Also, a client may be a PC or UNIX-based system that is LAN or
network-attached.
The client application submits an SQL request to the Teradata Database, receives the
response, and submits the response to the user.
The Call Level Interface (CLI) is a library of routines that resides on the client side. Client
application programs use these routines to perform operations such as logging on and off,
submitting SQL queries and receiving responses which contain the answer set. These
routines are 98% the same in a network-attached environment as they are in a channel-attached
environment.

Page 3-6

Teradata Database Architecture

Teradata Functional Overview
Channel-Attached System                          Network-Attached System

(Diagram: a channel-attached client application uses CLI and the TDP to reach a Parsing Engine
across the channel; a network-attached client application uses ODBC, JDBC, or .NET, or CLI with
MTDP and MOSI, to reach a Parsing Engine across the LAN. The Parsing Engines communicate with
the AMPs through the Message Passing Layer.)

Teradata Database Architecture

Page 3-7

Channel-Attached Client Software Overview
In channel-attached systems, there are three major software components, which play
important roles in getting the requests to and from the Teradata Database.
The client application is either written by a programmer or is one of Teradata’s provided
utility programs. Many client applications are written as “front ends” for SQL submission,
but they also are written for file maintenance and report generation. Any client-supported
language may be used provided it can interface to the Call Level Interface (CLI).
For example, a user could write a COBOL application with “embedded SQL”. The
application developer would have to use the Teradata COBOL Preprocessor and COBOL
compiler programs to generate an object module and link this object module with the CLI.
The CLI application interface provides maximum control over Teradata connectivity and
access.
The Call Level Interface (CLI) is the lowest level interface to the Teradata Database. It
consists of system calls which create sessions, allocate request and response buffers, create
and de-block “parcels” of information, and fetch response information to the requesting
client.
The Teradata Director Program (TDP) is a Teradata-supplied program that must run on
any client system that will be channel-attached to the Teradata Database. The TDP manages
the session traffic between the Call-Level Interface and the Database. Its functions include
session initiation and termination, logging, verification, recovery, and restart, as well as
physical input to and output from the PEs, (including session balancing) and the
maintenance of queues. The TDP may also handle system security.
The Host Channel Adapter is a mainframe hardware component that allows the mainframe
to connect to an ESCON or Bus/Tag channel.
The PBSA (PCI Bus ESCON Adapter) is a PCI adapter card that allows a Teradata server to
connect to an ESCON channel.
The PBCA (PCI Bus Channel Adapter) is a PCI adapter card that allows a Teradata server to
connect to a Bus/Tag channel.

Page 3-8

Teradata Database Architecture

Channel-Attached Client Software Overview
Channel-Attached System

(Diagram: client applications use CLI and the TDP; requests travel from the mainframe's Host
Channel Adapter across an ESCON or FICON channel to a PBSA on the Teradata system and on to
the Parsing Engines.)

Client Application
– Your own application(s)
– Teradata utilities (BTEQ, etc.)

CLI (Call-Level Interface) Service Routines
– Request and Response Control
– Parcel creation and blocking/unblocking
– Buffer allocation and initialization

TDP (Teradata Director Program)
– Session balancing across multiple PEs
– Ensures proper message routing to/from the Teradata Database
– Failure notification (application failure, Teradata restart)

Teradata Database Architecture

Page 3-9


Network-Attached Client Software Overview
In a network-attached environment, the SMPs running Teradata will typically have 1 or
more Ethernet adapters that are used to connect to Teradata via a LAN connection. One of
the key reasons for having multiple Ethernet adapters in a node is redundancy.
In network-attached systems, there are four major software components that play
important roles in getting the requests to and from the Teradata Database.
The client application is written by the programmer using a client-supported language such
as “C”. The purpose of the application is usually to submit SQL statements to the Teradata
Database and perform processing on the result sets. The application developer can “embed”
SQL statements in the application and use the Teradata Preprocessor to interpret the
embedded SQL statements.
In a networked environment, the application developer can use either the CLI interface or
the ODBC driver to access Teradata.
The Teradata CLI application interface provides maximum control over Teradata
connectivity and access. The ODBC and JDBC drivers are a much more open standard and
are widely used with client applications.
The Teradata ODBC™ (Open Database Connectivity) or JDBC (Java) drivers use open
standards-based ODBC or JDBC interfaces to provide client applications access to Teradata
across LAN-based environments.
Note: ODBC 3.02.0 is the minimum certified version for Teradata V2R5.
The Micro Teradata Director Program (MTDP) is a Teradata-supplied program that must
be linked to any application that will be network-attached to the Teradata Database. The
MTDP performs many of the functions of the channel based TDP including session
management. The MTDP does not control session balancing across PEs. Connect and
Assign Servers that run on the Teradata system handle this activity.
The Micro Operating System Interface (MOSI) is a library of routines providing
operating system independence for clients accessing the Teradata Database. By using
MOSI, we only need one version of the MTDP to run on all network-attached platforms.
Teradata Gateway software executes on every node. Gateway software runs as a number
of tasks. Two of the key tasks are called "ycgastsk" (assign task) and "ycgcntsk" (connect
task). On a 4-node system with one gateway, only one node has the assign task (ycgastsk)
running on it and every node will have the connect task (ycgcntsk) running on it. Initial
session assignment is done by the assign task and will assign a user session to a PE and to
the connect task in the same node as the PE. The connect task on a node will handle
connections to the PEs on that node.

Page 3-10

Teradata Database Architecture

Network-Attached Client Software Overview
LAN-Attached Servers

(Diagram: client applications such as FastLoad, SQL Assistant, and BTEQ use CLI or ODBC with
MTDP and MOSI; requests travel across a TCP/IP LAN to an Ethernet adapter on the node, where
Gateway software (tgtw) passes the sessions to the Parsing Engines.)

CLI (Call Level Interface)
– Library of routines for blocking/unblocking requests and responses to/from the Teradata Database

ODBC™ (Open Database Connectivity), JDBC™ (Java), or .NET Drivers
– Use open standards-based ODBC, JDBC, or .NET interfaces to provide client applications access to Teradata.

MTDP (Micro Teradata Director Program)
– Library of session management routines

MOSI (Micro Operating System Interface)
– Library of routines providing OS independent interface

Teradata Database Architecture

Page 3-11

The Parsing Engine
Parsing Engines (PEs) are made up of the following software components: session control,
the Parser, the Optimizer, and the Dispatcher.
Once a valid session has been established, the PE is the component that manages the
dialogue between the client application and the Teradata Database.
The major functions performed by session control are logon and logoff. Logon takes a
textual request for session authorization, verifies it, and returns a yes or no answer. Logoff
terminates any ongoing activity and deletes the session’s context. When connected to an
EBCDIC host the PE converts incoming data to the internal 8-bit ASCII used by the
Teradata Database, thus allowing input values to be properly evaluated against the database
data.
When a PE receives an SQL request from a client application, the Parser interprets the
statement, checks it for proper SQL syntax and evaluates it semantically. The PE also must
consult the Data Dictionary/Directory to ensure that all objects and columns exist and that
the user has authority to access these objects.
The Optimizer’s role is to develop the least expensive plan to return the requested response
set. Processing alternatives are evaluated and the fastest alternative is chosen. This
alternative is converted to executable steps, to be performed by the AMPs, which are then
passed to the dispatcher.
The Dispatcher controls the sequence in which the steps are executed and passes the steps
on to the Message Passing Layer. It is composed of execution control and response control
tasks. Execution control receives the step definitions from the Parser, transmits the step
definitions to the appropriate AMP or AMPs for processing, receives status reports from the
AMPs as they process the steps, and passes the results on to response control once the AMPs
have completed processing. Response control returns the results to the user. The Dispatcher
sees that all AMPs have finished a step before the next step is dispatched.
Depending on the nature of the SQL request, the step will be sent to one AMP, a few AMPs,
or all AMPs.
Note: Teradata Gateway software can support up to 1200 sessions per processing node.
Therefore a maximum of 10 Parsing Engines can be defined for a node using the Gateway.

Page 3-12

Teradata Database Architecture

The Parsing Engine
SQL Request                              Answer Set Response

(Diagram: the Parsing Engine, containing the Parser, Optimizer, and Dispatcher, receives the
SQL request and communicates with the AMPs through the Message Passing Layer.)

The Parsing Engine is responsible for:
• Managing individual sessions (up to 120)
• Parsing and Optimizing your SQL requests
• Dispatching the optimized plan to the AMPs
• Input conversion (EBCDIC / ASCII) - if necessary
• Sending the answer set response back to the requesting client

Teradata Database Architecture
Page 3-13

Message Passing Layer
The Message Passing Layer (MPL) or Communications Layer handles the internal
communication of the Teradata Database. All communication between PEs and AMPs is
done via the Message Passing Layer.
When the PE dispatches the steps for the AMPs to perform, they are dispatched onto the
MPL. The messages are routed to the appropriate AMP(s) where results sets and status
information are generated. This response information is also routed back to the requesting
PE via the MPL.
The Message Passing Layer is a combination of the Teradata PDE software, the BYNET
software, and the BYNET interconnect itself.
PDE and BYNET software - used for multi-node MPP systems and single-node SMP
systems. With a single-node SMP, the BYNET device driver is used in conjunction with the
PDE even though a physical BYNET network is not present.
Depending on the nature of the dispatch request, the communication may be a:
Broadcast       - message is routed to all AMPs and PEs on the system
Multi-Cast      - message is routed to a group of AMPs
Point-to-Point  - message is routed to one specific AMP or PE on the system
The technology of the MPL is a key part of the system that makes the parallelism of the
Teradata Database possible.

Page 3-14

Teradata Database Architecture

Message Passing Layer
SQL Request                              Answer Set Response

(Diagram: the Message Passing Layer (PDE and BYNET) connects the Parsing Engine to the AMPs.)

The Message Passing Layer or Communications Layer is responsible for:
• Carrying messages between the AMPs and PEs
• Point-to-Point, Multi-Cast, and Broadcast communications
• Merging answer sets back to the PE
• Making Teradata parallelism possible

The Message Passing Layer or Communications Layer is a combination of:
• Parallel Database Extensions (PDE) Software
• BYNET Software
• BYNET Hardware for MPP systems

Teradata Database Architecture

Page 3-15

The Access Module Processor (AMP)
The Access Module Processor (AMP) is responsible for managing a portion of the
database. An AMP will control some portion of each table on the system. AMPs do all of
the physical work associated with generating an answer set including, sorting, aggregating,
formatting and converting.
An AMP responds to Parser/Optimizer steps transmitted across the MPL by selecting data
from or storing data to its disks. For some requests the AMPs may also redistribute a copy
of the data to other AMPs.
The Database Manager subsystem resides on each AMP. It receives the steps from the
Dispatcher and processes the steps. To do that it has the ability to lock databases and tables,
to create, modify, or delete definitions of tables, to insert, delete, or modify rows within
the tables, and to retrieve information from definitions and tables. It collects accounting
statistics, recording accesses by session so those users can be billed appropriately. Finally,
the Database manager returns responses to the Dispatcher.
Earlier in this course we discussed the logical organization of data into tables. The
Database Manager provides a bridge between that logical organization and the physical
organization of the data on disks. The Database Manager performs a space management
function that controls the use and allocation of space.
AMPs also perform output data conversion, checking the session and changing the
internal, 8-bit ASCII used by Teradata to the format of the requester. This is the reverse of
the process performed by the PE when it converts the incoming data into internal ASCII.

Page 3-16

Teradata Database Architecture

The Access Module Processor (AMP)
SQL Request                              Answer Set Response

(Diagram: the Parsing Engine sends steps through the Message Passing Layer to the AMPs;
AMPs store and retrieve rows to and from disk.)

The AMPs are responsible for:
• Accesses storage using Teradata's File System Software
• Lock management
• Sorting rows
• Aggregating columns
• Join processing
• Output conversion and formatting
• Creating answer set for client
• Disk space management
• Accounting
• Special utility protocols
• Recovery processing

Teradata File System Software:
• Translates DatabaseID/TableID/RowID into location on storage
• Controls a portion of physical storage
• Allocates storage space by “Cylinders”

Teradata Database Architecture
Page 3-17

Teradata Parallelism
Parallelism is at the very heart of the Teradata Database. There is virtually no part of the
system where parallelism has not been built in. Without the parallelism of the system,
managing enormous amounts of data would either not be possible or, best case, would be
prohibitively expensive and inefficient.
Each PE can support up to 120 user sessions in parallel. This could be 120 distinct users, or
a single user harnessing the power of all 120 sessions for a single application.
Each session may handle multiple requests concurrently. While only one request at a time
may be active on behalf of a session, the session itself can manage the activities of 16
requests and their associated answer sets.
The Message Passing Layer was designed such that it can never be a bottleneck for the
system. Because the MPL is implemented differently for different platforms, this means that
it will always be well within the needed bandwidth for each particular platform’s maximum
throughput.
Each AMP can perform up to 80 tasks in parallel. This means that AMPs are not dedicated
at any moment in time to the servicing of only one request, but rather are multi-threading
multiple requests concurrently. The value 80 represents the number of AMP Worker Tasks
and may be changed on some systems.
Because AMPs are designed to operate on only one portion of the database, they must
operate in parallel to accomplish their intended results.
In addition to this, the optimizer may direct the AMPs to perform certain steps in parallel if
there are no contingencies between the steps. This means that an AMP might be
concurrently performing more than one step on behalf of the same request.
A recently added feature called Parallel CLI allows for parallelizing the client application,
particularly useful for multi-session applications. This is accomplished by setting a few
environmental variables and requires no changes to the application code.
In truth, parallelism is built into the Teradata Database from the ground up!

Page 3-18

Teradata Database Architecture

Teradata Parallelism
(Diagram: three PEs each manage two sessions (Session A through Session F); the Message Passing
Layer connects them to AMP 0 through AMP 3, each of which is concurrently running several tasks
(Task 1 through Task 12).)

Parallelism is built into Teradata from the ground up!

Notes:
• Each PE can handle up to 120 sessions in parallel.
• Each Session can handle multiple REQUESTS.
• The Message Passing Layer can handle all message activity in parallel.
• Each AMP can perform up to 80 tasks in parallel.
• All AMPs can work together in parallel to service any request.
• Each AMP can work on several requests in parallel.

Teradata Database Architecture

Page 3-19

Module 3: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 3-20

Teradata Database Architecture

Module 3: Review Questions
1. What are the two software elements that accompany an application on all client side environments?
2. What is the purpose of the PE?
3. What is the purpose of the AMP?
4. How many sessions can a PE support?

Match Quiz
____ 1. CLI

a. Does Aggregating and Locking

____ 2. MTDP

b. Validates SQL syntax

____ 3. MOSI

c. Connects AMPs and PEs

____ 4. Parser

d. Balances sessions across PEs

____ 5. AMP

e. Provides Client side OS independence

____ 6. Message Passing Layer

f. Library of Session Management Routines

____ 7. TDP

g. PE S/W turns SQL into AMP steps

____ 8. Optimizer

h. PE S/W sends plan steps to AMP

____ 9. Dispatcher

i. Library of Teradata Service Routines

____10. Parallelism

j. Foundation of Teradata architecture

Teradata Database Architecture

Page 3-21

Notes

Page 3-22

Teradata Database Architecture

Module 4
Teradata Databases and Users

After completing this module, you will be able to:

•  Distinguish between a Teradata Database and Teradata User.
•  Define Perm Space and explain how it is used.
•  Define Spool Space and its use.
•  Visualize the hierarchy of objects in a Teradata system.

Teradata Proprietary and Confidential

Creating a Teradata Database

Page 4-1

Notes

Page 4-2

Creating a Teradata Database

Table of Contents
A Teradata Database .................................................................................................................... 4-4
Tables ................................................................................................................................... 4-4
Views ................................................................................................................................... 4-4
Macros .................................................................................................................................. 4-4
Triggers ................................................................................................................................ 4-4
A Teradata User ........................................................................................................................... 4-6
Database – User Comparison ....................................................................................................... 4-8
The Hierarchy of Databases and Users ...................................................................................... 4-10
Example of a System Hierarchy................................................................................................. 4-12
Permanent Space ........................................................................................................................ 4-14
Spool Space ................................................................................................................................ 4-16
Temporary Space ....................................................................................................................... 4-18
Creating Tables .......................................................................................................................... 4-20
Data Types ................................................................................................................................. 4-22
Access Rights and Privileges ..................................................................................................... 4-24
Module 4: Review Questions ..................................................................................................... 4-26

Creating a Teradata Database

Page 4-3

A Teradata Database
A Teradata database is a collection of tables, views, macros, triggers, stored procedures, join
indexes, hash indexes, UDFs, access rights and space limits used for administration and
security. All databases have a defined upper limit of permanent space. Permanent space is
used for storing the data rows of tables. Perm space is not pre-allocated. It represents a
maximum limit. All databases also have an upper limit of spool space. Spool space is
temporary space used to hold intermediate query results or formatted answer sets to queries.
Databases provide a logical grouping for information. They are also the foundation for
space allocation and access control. We'll review the definitions of tables, views, and
macros.

Tables
A table is the logical structure of data in a database. It is a two-dimensional structure
made up of columns and rows. A user defines a table by giving it a table name that refers
to the type of data that will be stored in the table.
A column represents attributes of the table. Attributes identify, describe, or qualify the
table. Column names are given to each column of the table. All the information in a
column is the same type, for example, date of birth.
Each occurrence of an entity is stored in the table as a row. Entities are the people, things,
or events that the table is about. Thus a row would represent a particular person, thing, or
event.

Views
A view is a pre-defined subset of one of more tables or other views. It does not exist as a
real table, but serves as a reference to existing tables or views. One way to think of a view
is as a virtual table. Views have definitions in the data dictionary, but do not contain any
physical rows. Views can be used by the database administrator to control access to the
underlying tables. Views can be used to hide columns from users, to insulate applications
from database changes, and to simplify or standardize access techniques.

Macros
A macro is a definition containing one or more SQL commands and report formatting
commands that is stored in the Data Dictionary/Directory. Macros are used to simplify the
execution of frequently-used SQL commands.

Triggers
A trigger consists of one or more SQL statements that are associated with a table and
are executed when the trigger is “fired”.

Page 4-4

Creating a Teradata Database

A Teradata Database
A Teradata database is a defined logical repository for:

•  Tables                 •  Join Indexes
•  Views                  •  Hash Indexes
•  Macros                 •  Permanent Journals
•  Triggers               •  User-defined Functions (UDF)
•  Stored Procedures

Attributes that may be specified for a database:

• Perm Space – max amount of space available for tables, stored procedures, and UDFs
• Spool Space – max amount of work space available for requests
• Temp Space – max amount of temporary table space
A Teradata database is created with the CREATE DATABASE command.
Example

CREATE DATABASE Database_2 FROM Sysdba
AS PERMANENT = 20E9, SPOOL = 500E6;
Notes:
"Database_2" is owned by "Sysdba".
A database is empty until objects are created within it.

Creating a Teradata Database

Page 4-5

A Teradata User
A user can also be thought of as a collection of tables, views, macros, triggers, stored
procedures, join indexes, hash indexes, UDFs, and access rights.
A user is almost the same as a database except that a user can actually log on to the DBS.
To accomplish this, a user must have a password. A user may or may not have perm space.
Even with no perm space, a user can access other databases depending on the privileges the
user has been granted.
Users are created with the SQL statement CREATE USER.

Page 4-6

Creating a Teradata Database

A Teradata User
A Teradata user is a database with an assigned password.
A Teradata user may logon to Teradata and access objects within:

• itself
• other databases for which it has access rights
Examples of attributes that may be specified for a user:

• Perm Space – max amount of space available for tables, stored procedures, and UDFs
• Spool Space – max amount of work space available for requests
• Temp Space – max amount of temporary table space
A user is an active repository while a database is a passive repository.
A user is created with the CREATE USER command.
Example

CREATE USER User_C FROM User_A
AS PERMANENT = 100E6
,SPOOL = 500E6
,TEMPORARY = 150E6
,PASSWORD = lucky_day ;
"User_C" is owned by "User_A".
A user is empty until objects are created within it.

Creating a Teradata Database

Page 4-7

Database – User Comparison
In Teradata, a Database and a User are essentially the same. Database/User names must be
unique within the entire system and represent the highest level of qualification in an SQL
statement.
A User represents a logon point within the hierarchy and Access Rights apply only to Users.
In many systems, end users do not have Perm space given to them. They are granted rights
to access database(s) containing views and macros, which in turn are granted rights to access
the corporate production tables.
At any time, another authorized User can change the Spool (workspace) limit assigned to a
User.
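For example, raising or lowering a user's Spool limit is a single statement (the user name and
new value here are only illustrative):

	MODIFY USER Mark AS SPOOL = 400E6 BYTES;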
Databases may be empty. They may or may not have any tables, views, macros, triggers, or
stored procedures. They may or may not have Perm Space allocated. The same is true for
Users. The only absolute requirement is that a User must have a password.
Once Perm Space is assigned, then and only then can tables be put into the database. Views,
macros, and triggers may be added at any time, with or without Perm Space.
Remember that databases and users are both repositories for database objects. The main
difference is the user ability to logon and acquire a session with the Teradata Database.
A row exists in DBC.Dbase for each User and Database.

Page 4-8

Creating a Teradata Database

Database – User Comparison
User
•  Unique Name
•  Password = Value
•  Define and use Perm space
•  Define and use Spool space
•  Define and use Temporary space
•  Set Fallback protection default
•  Set Permanent Journal defaults
•  Multiple Account strings
•  Logon and establish a session with a priority
•  May have a startup string
•  Default database, dateform, timezone, and default character set
•  Collation Sequence

Database
•  Unique Name
•  Define and use Perm space
•  Define Spool space
•  Define Temporary space
•  Set Fallback protection default
•  Set Permanent Journal defaults
•  One Account string
You can only LOGON as a known User to establish a session with Teradata.
Tables, Join/Hash Indexes, Stored Procedures, and UDFs require Perm Space.
Views, Macros, and Triggers are definitions in the DD/D and require NO Perm Space.
A database (or user) with zero Perm Space may have views, macros, and triggers, but
cannot have tables, join/hash indexes, stored procedures, or user-defined functions.

Creating a Teradata Database

Page 4-9

The Hierarchy of Databases and Users
As you define users and databases, a hierarchical relationship among them will evolve.
When you create new objects, you subtract permanent space from the assigned limit of an
existing database or user. A database or user that subtracts space from its own permanent
space to create a new object becomes the immediate owner of that new object.
An “owner” or “parent” is any object above you in the hierarchy. (Note that you can use the
terms owner and parent interchangeably.) A “child” is any object below you in the
hierarchy. An owner or parent can have many children.
The term “immediate parent” is sometimes used to describe a database or user just above
you in the hierarchy.

Page 4-10

Creating a Teradata Database

Hierarchy of Databases and Users
Maximum Perm Space – maximum available space for a user or database.
Current Perm Space – space that is currently allocated – contains tables, stored procedures, UDFs.
(In the diagram, an object drawn with no box has no Perm space.)

(Diagram: User DBC at the top of the hierarchy, with User SYSDBA beneath it; below SYSDBA are
User_A, Database_1, Database_2, User_D, Database_3, User_B, and User_C.)
• A new database or user must be created from an existing database or user.
• All Perm space specifications are subtracted from the immediate owner or parent.
• Perm space is a zero sum game – the total of all Perm Space for all databases and users
equals the total amount of disk space available to Teradata.

• Perm space is only used for tables, join/hash indexes, stored procedures, and UDFs.
• Perm space currently unused is available to be used as Spool or Temp space.

Creating a Teradata Database

Page 4-11

Example of a System Hierarchy
An example of a system structure for the Teradata database is shown on the facing page.

Page 4-12

Creating a Teradata Database

Example of a System Hierarchy
(Diagram: DBC at the top of the hierarchy, with CrashDumps, QCD, SysAdmin, SysDBA, SystemFE,
and Sys_Calendar beneath it. Customer_Service appears under SysDBA and owns CS_Users
(containing the users Mark, Tom, and Susan), CS_VM (containing View_1, View_2, Macro_1, and
Macro_2), and CS_Tables (containing Table_1, Table_2, Table_3, and Table_4).)

A User and/or a Database may be given PERM space. In this example, Mark and Tom have no
PERM space, but Susan does.

Users may use views and macros to access the actual tables.

Creating a Teradata Database

Page 4-13

Permanent Space
Permanent Space (Perm space) is the maximum amount of storage assigned to a user or
database for holding table rows, Fallback tables, secondary index subtables, stored
procedures, UDFs, and permanent journals.
Perm space is specified in the CREATE statement as illustrated below. Perm space is not
pre-allocated; it is consumed on demand as objects are created rather than being reserved
ahead of time. Perm space is deducted from the owner’s specified Perm space and is divided
equally among the AMPs. Perm space can be dynamically modified.
The total amount of Perm space assigned divided by the number of AMPs equals the per-AMP
limit. Whenever the per-AMP limit is exceeded on any AMP, a Database Full message is
generated.

CREATE DATABASE CS_Tables FROM Customer_Service AS
PERMANENT = 100000000000 BYTES … ;
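Because the Perm limit can be modified dynamically, it can later be raised or lowered without
recreating the database; a minimal sketch (the new value is arbitrary):

	MODIFY DATABASE CS_Tables AS PERMANENT = 120E9 BYTES;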

Page 4-14

Creating a Teradata Database

Permanent Space
CREATE DATABASE CS_Tables FROM Customer_Service
AS PERMANENT = 100E9 BYTES, ... ;
(Diagram: the 100 GB Perm limit divided across 10 AMPs gives a Perm Space Limit of 10 GB per AMP.)

• Table rows, index subtable rows, join indexes, hash indexes, stored procedures, and
  UDFs use Perm space.
• Fallback protection uses twice the Perm space of No Fallback.
• Perm space is deducted from the owner’s database space.
• Disk space is not reserved ahead of time, but is available on demand.
• Perm space is defined globally for a database.
• Perm space can be dynamically modified.
• The global limit divided by the number of AMPs is the per-AMP limit.
• The per-AMP limit cannot be exceeded.
• Good data distribution is crucial to space management.

Creating a Teradata Database

Page 4-15

Spool Space
Spool Space is work space acquired automatically by the system and used for work space
and answer sets for intermediate and final results of Teradata SQL statements (e.g.,
SELECT statements generally use Spool space to store the SELECTed data). When the
spool space is no longer needed by a query, it is released back to the system.
A Spool limit is specified in the CREATE statement shown below. This limit cannot
exceed the Spool limit of the owner. However, a single user can create multiple databases
or users, and each can have a Spool limit as large as the Spool limit of that owner.
The total amount of Spool space assigned divided by the number of AMPs equals the per
AMP limit. Whenever the per-AMP limit is exceeded on any AMP, an Insufficient Spool
message is generated to that client.

CREATE USER Susan FROM CS_Users AS
PERMANENT = 100000000 BYTES,
SPOOL = 500000000 BYTES,
PASSWORD = secret ... ;

Page 4-16

Creating a Teradata Database

Spool Space
CREATE USER Susan FROM CS_Users AS PERMANENT = 100E6 BYTES,
SPOOL = 500E6 BYTES, PASSWORD = secret … ;
(Diagram: the 500 MB Spool limit divided across 10 AMPs gives a Spool Space Limit of 50 MB per AMP.)

• Spool space is work space acquired automatically by the system for intermediate query
  results or answer sets.
  – SELECT statements generally use Spool space.
  – Only INSERT, UPDATE, and DELETE statements affect table contents.
• The Spool limit cannot exceed the Spool limit of the original owner.
• The Spool limit is divided by the number of AMPs in the system, giving a per-AMP limit
  that cannot be exceeded.
  – "Insufficient Spool" errors often result from poorly distributed data or joins on
    columns with large numbers of non-unique values.
  – Keeping Spool rows small and few in number reduces Spool I/O.

Creating a Teradata Database

Page 4-17

Temporary Space
Temporary (Temp) Space is temporary space acquired automatically by the system when
Global Temporary tables are materialized and used.
A Temporary limit is specified in the CREATE statement shown below. This limit cannot
exceed the Temporary limit of the owner. However, a single user can create multiple
databases or users, and each can have a Temporary limit as large as the Temporary limit of
that owner.
The total amount of Temporary space assigned divided by the number of AMPs equals the
per AMP limit. Whenever the per-AMP limit is exceeded on any AMP, an Insufficient
Temporary message is generated to that client.

CREATE USER Susan FROM CS_Users AS
PERMANENT = 100000000 BYTES,
SPOOL = 500000000 BYTES,
TEMPORARY = 150000000 BYTES,
PASSWORD = secret ...

Page 4-18

Creating a Teradata Database

Temporary Space
CREATE USER Susan FROM CS_Users AS PERMANENT = 100E6 BYTES,
SPOOL = 500E6 BYTES, TEMPORARY = 150E6 BYTES, PASSWORD = secret … ;
(Diagram: the 150 MB Temporary limit divided across 10 AMPs gives a Temporary Space Limit of 15 MB per AMP.)

• Temporary space is space acquired automatically by the system when a "Global Temporary"
  table is used and materialized.
• The Temporary limit cannot exceed the Temporary limit of the original owner.
• The Temporary limit is divided by the number of AMPs in the system, giving a per-AMP
  limit that cannot be exceeded.
  – "Insufficient Temporary" errors often result from poorly distributed data or joins
    on columns with large numbers of non-unique values.
• Note: Volatile Temporary tables and derived tables utilize Spool space.

Creating a Teradata Database

Page 4-19

Creating Tables
Creation of tables is done via the DDL portion of the SQL command vocabulary. The table
definition, once accepted, is stored in the DD/D.
Prior to Teradata 13.0, creating tables required the definition of at least one column and the
assignment of a Primary Index. With Teradata 13.0, it is possible to create tables without a
primary index. Columns are assigned data types, attributes and optionally may be assigned
constraints, such as a range constraint.
Tables, like views and macros, may be dropped when they are no longer needed. Dropping
a table both deletes the data from the table and removes the definition of the table from the
DD/D.
Secondary indexes may also optionally be assigned at table creation, or may be deferred
until after the table has been built. Secondary indexes may also be dropped, if they are no
longer needed. It is not uncommon to create secondary indexes to assist in the processing of
a specific job sequence, then to delete the index, and its associated overhead, once the job is
complete.
We will have more to say on indexes in general in future modules.
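As a sketch of that create-and-drop pattern using the Employee table defined on the facing page
(the column choice is only illustrative):

	CREATE INDEX (Department_Number) ON Employee;	/* secondary index to support a job sequence */
	DROP INDEX (Department_Number) ON Employee;	/* remove the index and its overhead when done */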

Page 4-20

Creating a Teradata Database

Creating Tables
Creating a table requires ...
– defining columns
– a primary index (Teradata 13.0 provides an option of a No Primary Index table)
– optional assignment of secondary indexes
CREATE TABLE Employee
      (Employee_Number	   INTEGER NOT NULL
      ,Last_Name	   CHAR(20) NOT NULL
      ,First_Name	   VARCHAR(20)
      ,Salary_Amount	   DECIMAL(10,2)
      ,Department_Number   SMALLINT
      ,Job_Code		   CHAR(3))
UNIQUE PRIMARY INDEX (Employee_Number)		 Primary
INDEX (Last_Name);				 Secondary

Database objects may be created or dropped as needed:
      CREATE / DROP – Tables, Views, Macros, Triggers, Procedures
      CREATE / DROP – INDEX (secondary only)

Secondary indexes may be
      – created at table creation
      – created after table creation
      – dropped after table creation

Creating a Teradata Database
Page 4-21

Data Types
When a table is created, a data type is specified for each column. Data types are divided
into three classes – numeric, byte, and character. The facing page shows data types.
DATE is a 32-bit integer that represents the date as YYYYMMDD. It supports century and
year 2000 and is implemented with calendar-based intelligence.
TIME WITH ZONE and TIMESTAMP WITH ZONE are ANSI standard data types that
allow support of clock and time zone based intelligence.
DECIMAL (n, m) is a number of n digits, with m digits to the right of the decimal point.
BYTEINT is an 8-bit signed binary whole number that may vary in range from -128 to
+127.
SMALLINT is a 16-bit signed binary whole number that may vary in range from -32,768 to
+32,767.
INTEGER is a 32-bit signed binary whole number that may vary in size from
-2,147,483,648 to +2,147,483,647.
BIGINT is a 64-bit (8 bytes) signed binary whole number that may vary in size
from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 or as (-263 to 263 - 1).
FLOAT, REAL, and DOUBLE PRECISION are all stored as a 64-bit IEEE floating point number.
BYTE (n) is a fixed-length binary string of n bytes. BYTE and VARBYTE are never
converted to a different internal format. They can also be used for digitized objects.
VARBYTE (n) is a variable-length binary string of n bytes.
BINARY LARGE OBJECT (n) is similar to a VARBYTE; however it may be as large as
2 GB. A BLOB may be used to store graphics, video clips and binary files.
CHAR (n) is a fixed-length character string of n characters.
VARCHAR (n) is a variable-length character string of n characters.
LONG VARCHAR is the longest variable-length character string. It is equivalent to
VARCHAR (64000).
GRAPHIC, VARGRAPHIC and LONG VARGRAPHIC are the equivalent character
types for multi-byte character sets such as Kanji.
CHARACTER LARGE OBJECT (n) is similar to a VARCHAR; however it may be as
large as 2 GB. A CLOB may be used to store simple text, HTML, or XML documents.
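As a brief illustration, a hypothetical table definition combining several of these types might
look like the following (the table and column names are invented for this sketch):

	CREATE TABLE Claim
	      (Claim_Id		INTEGER NOT NULL
	      ,Claim_Date	DATE
	      ,Claim_Amount	DECIMAL(10,2)
	      ,Status_Code	BYTEINT
	      ,Comment_Text	VARCHAR(200)
	      ,Scanned_Form	BLOB(10M))
	UNIQUE PRIMARY INDEX (Claim_Id);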

Page 4-22

Creating a Teradata Database

Data Types

TYPE	    Name				Bytes		Description
---------   ---------------------------------	-------------	------------------------------------------------
Date/Time   DATE				4		YYYYMMDD
	    TIME (WITH ZONE)			6 / 8		HHMMSSZZ
	    TIMESTAMP (WITH ZONE)		10 / 12		YYYYMMDDHHMMSSZZ

Numeric	    DECIMAL or NUMERIC (n, m)		2, 4, 8, or 16	+ or – (up to 18 digits V2R6.1 and prior;
								up to 38 digits is a V2R6.2 feature)
	    BYTEINT				1		-128 to +127
	    SMALLINT				2		-32,768 to +32,767
	    INTEGER				4		-2,147,483,648 to +2,147,483,647
	    BIGINT				8		-2^63 to +2^63 - 1 (+9,223,372,036,854,775,807)
	    FLOAT, REAL, DOUBLE PRECISION	8		IEEE floating point

Byte	    BYTE (n)				0 – 64,000
	    VARBYTE (n)				0 – 64,000
	    BLOB				0 – 2 GB	Binary Large Object (V2R5.1)

Character   CHAR (n)				0 – 64,000
	    VARCHAR (n)				0 – 64,000
	    LONG VARCHAR					same as VARCHAR(64,000)
	    GRAPHIC				0 – 32,000
	    VARGRAPHIC				0 – 32,000
	    LONG VARGRAPHIC					same as VARGRAPHIC(32,000)
	    CLOB				0 – 2 GB	Character Large Object (V2R5.1)

Creating a Teradata Database
Page 4-23

Access Rights and Privileges
The diagram on the facing page shows access rights and privileges as they might be defined
for the database administrator, a programmer, a user, a system operator, and an
administrative user.
The database administrator has the right to use all of the commands in the data definition
privileges, the data manipulation privileges, and the data control privileges.
The programmer has all of those except the ability to GRANT privileges to others.
A typical user is limited to data manipulation privileges, while the operator is limited to
data control privileges.
Finally, the administrative user is limited to a subset of data manipulation privileges,
SELECT and EXECUTE.
Each site should carefully consider the access rules that best meet their needs.
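For illustration, privileges of this kind are given and taken away with GRANT and REVOKE
statements; the user and database names below are hypothetical:

	GRANT SELECT, EXECUTE ON CS_VM TO Susan;		/* views and macros only */
	GRANT SELECT, INSERT, UPDATE, DELETE ON CS_Tables TO Dev_User;
	REVOKE DELETE ON CS_Tables FROM Dev_User;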

Page 4-24

Creating a Teradata Database

Access Rights and Privileges
A Sample Scenario

Data Definition Privileges
      Command		     Object
      CREATE / DROP	     Database and/or User
			     Table and/or View
			     Macro and/or Trigger
			     Stored Procedure
			     Role and/or Profile

Data Manipulation Privileges
      SELECT, INSERT,	     Table
      UPDATE, DELETE	     View
      EXECUTE		     Macro and/or Stored Procedure

Data Control Privileges
      DUMP, RESTORE,	     Database
      CHECKPOINT	     Table
			     Journal
      GRANT / REVOKE	     Privileges on Databases, Users, Objects

(Diagram: the sample scenario maps these privilege groups to a DBA, Programmers, a User, an
Operator, and an Admin user, as described on the facing page.)

Creating a Teradata Database
Page 4-25

Module 4: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 4-26

Creating a Teradata Database

Module 4: Review Questions
True or False

______ 1.  A database will always have tables.
______ 2.  A user will always have a password.
______ 3.  A user creating a subordinate user must give up some of his/her Perm Space.
______ 4.  Creating tables requires the definition of at least 1 column and a Primary Index.
______ 5.  The sum of all user and database Perm Space will equal the total space on the system.
______ 6.  The sum of all user and database Spool Space will equal the total space on the system.
______ 7.  Before a user can read a table, a database or table SELECT privilege must exist in the DD/D
	   for that user.
______ 8.  Deleting a macro from a database reclaims Perm Space for the database.

9.  Which statement is TRUE about PERM space? ____
    a.  PERM space cannot be dynamically modified.
    b.  The per-AMP limit of PERM space can be exceeded.
    c.  Tables, index subtables, and stored procedures use PERM space.
    d.  Maximum PERM space can be defined at the database or table level.

10. Which statement is TRUE about SPOOL space? ____
    a.  SPOOL space cannot be dynamically modified.
    b.  Maximum SPOOL space can be defined at the database or user level.
    c.  The SPOOL limit is dependent on the database limit where the table is located.
    d.  Maximum SPOOL space can be defined at a value greater than the immediate parent's value.

Creating a Teradata Database

Page 4-27

Notes

Page 4-28

Creating a Teradata Database

Module 5
PI Access and Mechanics

After completing this module, you will be able to:
•  Explain the purpose of the Primary Index
•  Distinguish between Primary Index and Primary Key
•  Explain the role of the hashing algorithm and the hash map in locating a row.
•  Explain the makeup of the Row ID and its role in row storage.
•  Describe the sequence of events for locating a row given its PI value.

Teradata Proprietary and Confidential

Storing and Accessing Data Rows

Page 5-1

Notes

Page 5-2

Storing and Accessing Data Rows

Table of Contents
Primary Keys and Primary Indexes ............................................................................................. 5-4
Distribution of Rows .................................................................................................................... 5-6
Specifying a Primary Index.......................................................................................................... 5-8
Primary Index Values................................................................................................................. 5-10
Accessing Via a Unique Primary Index ..................................................................................... 5-12
Accessing Via a Non-Unique Primary Index ............................................................................. 5-14
Row Distribution Using a Unique Primary Index (UPI) – Case 1 ............................................. 5-16
Row Distribution Using a Non-Unique Primary Index (NUPI) – Case 2 .................................. 5-18
Row Distribution Using a Highly Non-Unique Primary Index (NUPI) – Case 3...................... 5-20
Which AMP has the Row? ......................................................................................................... 5-22
Hashing Down to the AMPs ...................................................................................................... 5-24
A Hashing Example ................................................................................................................... 5-26
The Hash Map ............................................................................................................................ 5-28
Hash Maps for Different Systems .............................................................................................. 5-30
Identifying Rows ........................................................................................................................ 5-32
The Row ID ................................................................................................................................ 5-34
Storing Rows (1 of 2) ................................................................................................................. 5-36
Storing Rows (2 of 2) ............................................................................................................. 5-38
Locating a Row on an AMP Using a PI ..................................................................................... 5-40
Module 5: Review Questions ..................................................................................................... 5-42

Storing and Accessing Data Rows

Page 5-3

Primary Keys and Primary Indexes
While it is true that many tables use the same columns for both Primary Indexes and
Primary Keys, Indexes are conceptually different from Keys. The table on the facing
page summarizes those differences.
A Primary Key is a relational data modeling term that defines, in the logical model, the
columns that uniquely identify a row. A Primary Index is a physical database
implementation term that defines the actual columns used to distribute and access rows in a
table.
It is also true that a significant percentage of the tables in any database will use the same
column(s) for both the PI and the PK. However, one should expect that in any real-world
scenario there would be some tables that will not conform to this simplistic rule. Only
through a careful analysis of the type of processing that will take place can the tables be
properly evaluated for PI candidates. Remember, changing your mind about the columns
that comprise the PI means recreating (and reloading) the table.

Page 5-4

Storing and Accessing Data Rows

Primary Keys and Primary Indexes
• Indexes are conceptually different from keys.
• A PK is a relational modeling convention which allows each row to be uniquely identified.
• A PI is a Teradata convention which determines how the row will be stored and accessed.
• A significant percentage of tables may use the same columns for both the PK and the PI.
• A well-designed database will use a PI that is different from the PK for some tables.

Primary Key					Primary Index
Logical concept of data modeling		Physical mechanism for access and storage
Teradata doesn’t need to recognize		Each table can have (at most) one primary index
No limit on number of columns			64 column limit
Documented in data model			Defined in CREATE TABLE statement
  (Optional in CREATE TABLE)
Must be unique					May be unique or non-unique
Identifies each row				Identifies 1 (UPI) or multiple rows (NUPI)
Values should not change			Values may be changed (Delete + Insert)
May not be NULL – requires a value		May be NULL
Does not imply an access path			Defines most efficient access path
Chosen for logical correctness			Chosen for physical performance

Storing and Accessing Data Rows

Page 5-5

Distribution of Rows
Ideally, the rows of every table will be distributed among all of the AMPs. There may be
some circumstances where this is not true. What if there are fewer rows than AMPs?
Clearly in this case, at least some AMPs will hold no rows from that table. This should be
considered the exceptional situation, and not the rule. Each AMP is designed to hold a
portion of the rows of each table. The AMP is responsible for the storage, maintenance and
retrieval of the data under its control.
More ideally, the rows of each table will be evenly distributed across all of the AMPs. This
is desirable because in operations involving all rows of the table (such as a full table scan);
each AMP will have an equal portion of the work to do. When workloads are not evenly
distributed, the desired response will only be as fast as the slowest AMP.
Controlling the distribution of the rows of a table is done by the selection of the Primary
Index. The relative uniqueness of the Primary Index will determine the uniformity of
distribution of the rows of this table among the AMPs.

Page 5-6

Storing and Accessing Data Rows

Distribution of Rows
(Diagram: four AMPs, each holding a subset of Table A rows and a subset of Table B rows.)

• The rows of every table are distributed among all AMPs
• Each AMP is responsible for a subset of the rows of each table.
– Ideally, each table will be evenly distributed among all AMPs.
– Evenly distributed tables result in evenly distributed workloads.

• For tables with a Primary Index (majority of the tables), the uniformity of distribution of
the rows of a table depends on the choice of the Primary Index. The actual distribution
is determined by the hash value of the Primary Index.

• For tables without a Primary Index (Teradata 13.0 feature), the rows of a table are still
distributed between the AMPs based on random generator code within the PE or AMP.
– A small number of tables will typically be created as NoPI tables. Common uses for NoPI
tables are as staging/intermediate tables used in load operations or as column partitioned
tables.

Storing and Accessing Data Rows

Page 5-7

Specifying a Primary Index
Choosing a Primary Index for a table is perhaps the most critical decision a database
designer makes. The choice will affect the distribution of the rows of the table and,
consequently, the performance of the table in a production environment. Although many
tables used combined columns as the Primary Index choice, the examples used here are
single column indexes, mostly for the sake of simplicity.
Unique Primary Indexes (UPI’s) are desirable because they guarantee the uniform
distribution of the rows of that table.
Because it is not always feasible to pick a Unique Primary Index, it is sometimes necessary
to pick a column (or columns) which have non-unique values; that is there are duplicate
values. This type of index is called a Non-Unique Primary Index or NUPI. While not a
guarantor of uniform row distribution, the degree of uniqueness of the index will determine
the degree of uniformity of the distribution. Because all rows with the same PI value end up
on the same AMP, columns with a small number of distinct values which are repeated
frequently typically do not make good PI candidates.
The choosing of a Primary Index is not an exact science. It requires analysis and
thoughtfulness for some tables and will be completely self-evident on other tables.
The Primary Index is always designated as part of the CREATE TABLE statement. Once a
Primary Index choice has been designated for a table, it cannot be changed to something
else. If an alternate choice of column(s) is desired for the PI, it is necessary to drop and
recreate the table.
Teradata, adhering to the ANSI standard, permits duplicate rows by specifying that you wish
to create a MULTISET table. In Teradata transaction mode, the default, however, is a SET
table that does not permit duplicate rows.
Also, if MULTISET is enabled, it will be overridden by choosing a UPI as the Primary
Index or by having a unique index (e.g., unique secondary) on another column(s) on the
table. Doing this effectively disables the MULTISET.
Multiset tables will be covered in more detail later in the course.
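As a quick sketch of the syntax difference (the table is invented for this example), a
duplicate-row table is simply declared with the MULTISET keyword:

	CREATE MULTISET TABLE sample_4
	      (col_a	 INTEGER
	      ,col_b	 CHAR(10))
	PRIMARY INDEX (col_a);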
Starting with Teradata 13.0, the option of NO PRIMARY INDEX is also available.

Page 5-8

Storing and Accessing Data Rows

Specifying a Primary Index
• A Primary Index is defined at table creation.
• It may consist of a single column, or a combination of columns (up to 64 columns)
– With Teradata 13.0, an option of NO PRIMARY INDEX is available.
UPI
      CREATE TABLE sample_1
	    (col_a     INTEGER
	    ,col_b     CHAR(10)
	    ,col_c     DATE)
      UNIQUE PRIMARY INDEX (col_b);

      If the index choice of column(s) is unique, then this is referred to as a UPI
      (Unique Primary Index).
      A UPI choice will result in even distribution of the rows of the table across all AMPs.

NUPI
      CREATE TABLE sample_2
	    (col_m     INTEGER
	    ,col_n     CHAR(10)
	    ,col_o     DATE)
      PRIMARY INDEX (col_m);

      If the index choice of column(s) isn’t unique, then this is referred to as a NUPI
      (Non-Unique Primary Index).
      The distribution of the rows of the table is proportional to the degree of uniqueness
      of the index.

NoPI
      CREATE TABLE sample_3
	    (col_x     INTEGER
	    ,col_y     CHAR(10)
	    ,col_z     DATE)
      NO PRIMARY INDEX;

      A NoPI choice will result in distribution of the data between AMPs based on random
      generator code.

Note: Changing the choice of Primary Index requires dropping and recreating the table.

Storing and Accessing Data Rows

Page 5-9

Primary Index Values
Indexes are used to access rows from a table without having to search the entire table.
On Teradata, the Primary Index is the mechanism for assigning a data row to an AMP and
a location on the AMP’s disks. Prior to Teradata 13.0, when a table is created, a table must
have a Primary Index specified (either user-assigned or Teradata assigned). This cannot be
changed without dropping and creating the table.
Primary Indexes are very important because they have a powerful effect on the performance
of the database. The most important thing to remember is that a Primary Index is the
mechanism used to assign each row to an AMP and may be used to retrieve that row from
the AMP. Thus retrievals, updates and deletes that specify the Primary Index value will be
much faster than those queries that do not specify the PI value. Primary Index selection is
probably the most important factor in the efficiency of join processing.
Earlier we learned that the Primary Key was always unique and unchanging. This is based
on the logical model of the data. The Primary Index may be (and frequently is) different
from the Primary Key and may be non-unique; it is chosen for the physical performance of
the database.
There are three types of primary index selection – unique (UPI), non-unique (NUPI), or
NO PRIMARY INDEX.

Page 5-10

Storing and Accessing Data Rows

Primary Index Values
• The value of the Primary Index for a specific row determines the AMP assignment for
that row.

• This is done using a hashing algorithm.
(Diagram: the PE passes the PI value through the hashing algorithm; the result directs row
assignment and row access to one specific AMP.)

Other table access techniques:
• Secondary index access
• Full table scans

• Accessing the row by its Primary Index value is:
  – always a one-AMP operation
  – the most efficient way to access a row

Storing and Accessing Data Rows

Page 5-11

Accessing Via a Unique Primary Index
A Primary Index operation is always a one-AMP operation. In the case of a UPI, the one-AMP access can return, at most, one row. In the facing example, we are looking for the row
whose primary index value is 345. By specifying the PI value as part of our selection
criteria, we are guaranteed that only the AMP containing the specified row will need to be
searched.
The correct AMP is located by taking the PI value and passing it through a hashing
algorithm. The hashing takes place in the Parsing Engine. The output of the hashing
algorithm contains information that will point the request to a specific AMP. Once it has
isolated the appropriate AMP, finding the row is quick and efficient. How this happens we
will see in a future module.

Page 5-12

Storing and Accessing Data Rows

Accessing Via a Unique Primary Index
A UPI access is a one-AMP operation which may access at most a single row.
CREATE TABLE sample_1
      (col_a	 INTEGER
      ,col_b	 INTEGER
      ,col_c	 CHAR(4))
UNIQUE PRIMARY INDEX (col_b);

SELECT col_a, col_b, col_c
FROM   sample_1
WHERE  col_b = 345;

(Diagram: the PE passes the UPI value 345 through the hashing algorithm, which directs the
request to the single AMP holding that row; the other AMPs are not involved. That AMP returns
at most one row – the row whose col_b = 345.)

Storing and Accessing Data Rows

Page 5-13

Accessing Via a Non-Unique Primary Index
A Non-Unique Primary Index operation is also a one-AMP operation. In the case of a
NUPI, the one-AMP access can return zero to many rows. In the facing example, we are
looking for the rows whose primary index value is 25. By specifying the PI value as part of
our selection criteria, we are once again guaranteeing that only the AMP containing the
required rows will need to be searched.
As before, the correct AMP is located by taking the PI value and passing it through a
hashing algorithm executing in the Parsing Engine. The output of the hashing algorithm will
once again point to a specific AMP. Once it has isolated the appropriate AMP, it must now
find all rows that have the specified value. In this example, the AMP returns two rows.

Page 5-14

Storing and Accessing Data Rows

Accessing Via a Non-Unique Primary Index
A NUPI access is a one-AMP operation which may access multiple rows.
CREATE TABLE sample_2
      (col_x	 INTEGER
      ,col_y	 INTEGER
      ,col_z	 CHAR(4))
PRIMARY INDEX (col_x);

SELECT col_x, col_y, col_z
FROM   sample_2
WHERE  col_x = 25;

Both UPI and NUPI accesses are one-AMP operations.

(Diagram: the PE passes the NUPI value 25 through the hashing algorithm, which directs the
request to the single AMP holding rows with that value; that AMP returns the two rows whose
col_x = 25.)

Storing and Accessing Data Rows

Page 5-15

Row Distribution Using a Unique Primary Index (UPI) –
Case 1
At the heart of the Teradata database is a way of predictably distributing and retrieving rows
across AMPs. The same value stored in the same data type will always produce the same
hash value. If the Primary Index is unique, Teradata can distribute the rows evenly. If the
Primary Index is slightly non-unique (that is, there are only four or five rows per index
value), the table will still distribute evenly. But if there are hundreds or thousands of rows
for some index values, the distribution will probably be lumpy.
In this example, the Order_Number is used as a unique primary index. Since the primary
index value for Order_Number is unique, the distribution of rows among AMPs is very
uniform. This assures maximum efficiency because each AMP is doing approximately the
same amount of work. No AMPs sit idle waiting for another AMP to finish a task.
This way of storing the data provides for maximum efficiency and makes the best use of the
parallel features of the Teradata system.

Page 5-16

Storing and Accessing Data Rows

Row Distribution Using a UPI – Case 1
Orders (PK / UPI = Order_Number)

Order	 Customer   Order   Order
Number	 Number	    Date    Status
7325	 2	    4/13    O
7324	 3	    4/13    O
7415	 1	    4/13    C
7103	 1	    4/10    O
7225	 2	    4/15    C
7384	 1	    4/12    C
7402	 3	    4/16    C
7188	 1	    4/13    C
7202	 2	    4/09    C

Notes:
• Often, but not always, the PK column(s) will be used as a UPI.
  – Order_Number can be a UPI since all the values are unique.
• Teradata will distribute different UPI values evenly across all AMPs.
  – Resulting row distribution among AMPs is very uniform.
  – Assures maximum efficiency for parallel operations.

(Diagram: the nine order rows spread almost evenly across four AMPs – two or three rows per
AMP – when Order_Number is used as the UPI.)

Storing and Accessing Data Rows
Page 5-17

Row Distribution Using a Non-Unique Primary Index
(NUPI) – Case 2
In the example on the facing page Customer_Number has been used as a non-unique
Primary Index (NUPI). Note row distribution among AMPs is uneven. All rows with the
same primary index value (in other words, with the same customer number) are stored on
the same AMP.
Customer_Number has three possible values, so all the rows are hashed to three AMPs,
leaving the fourth AMP without rows from this table. While this distribution will work, it is
not as efficient as spreading all the rows among all the AMPs.
AMP 2 has a disproportionate number of rows and AMP 3 has none. In an all-AMP
operation AMP 2 will take longer than the other AMPs. The operation cannot complete until
AMP 2 completes its tasks. The overall operation time is increased and some of the AMPs
are under-utilized.
NUPIs can create irregular distributions, called “skewed distributions”. AMPs that have
more than an average number of rows will take longer for full table operations than the other
AMPs will. Because an operation is not complete until all AMPs have finished, a skewed
distribution slows the overall operation while leaving the lightly loaded AMPs underutilized.

Page 5-18

Storing and Accessing Data Rows

Row Distribution Using a NUPI – Case 2
Orders (PK = Order_Number, NUPI = Customer_Number)

Order	 Customer   Order   Order
Number	 Number	    Date    Status
7325	 2	    4/13    O
7324	 3	    4/13    O
7415	 1	    4/13    C
7103	 1	    4/10    O
7225	 2	    4/15    C
7384	 1	    4/12    C
7402	 3	    4/16    C
7188	 1	    4/13    C
7202	 2	    4/09    C

Notes:
• Customer_Number may be the preferred access column for this table, thus a good index
  candidate.
  – Since a customer can have multiple orders, Customer_Number will be a NUPI.
• Rows with the same PI value distribute to the same AMP.
  – Row distribution is less uniform or skewed.

(Diagram: with only three distinct Customer_Number values, the nine rows hash to just three of
the four AMPs – customer 1's four rows on one AMP, customer 2's three rows on another, customer
3's two rows on a third, and no rows on the fourth.)

Storing and Accessing Data Rows

Page 5-19

Row Distribution Using a Highly Non-Unique Primary
Index (NUPI) – Case 3
This example uses Order_Status as a NUPI. Order_Status is a poor choice because it
yields the most uneven distribution. Because there are only two possible values for
Order_Status, all of the rows are placed on just two AMPs.
Order_Status is an example of a highly non-unique Primary Index.
When choosing a Primary Index, you should never choose a column with such a severely
limited value set. Choose NUPIs that allow all AMPs to participate fairly equally;
the degree of uniqueness of a NUPI is critical to efficiency.

Page 5-20

Storing and Accessing Data Rows

Row Distribution Using a Highly Non-Unique
Primary Index (NUPI) – Case 3
Orders (PK = Order_Number, NUPI = Order_Status)

Order	 Customer   Order   Order
Number	 Number	    Date    Status
7325	 2	    4/13    O
7324	 3	    4/13    O
7415	 1	    4/13    C
7103	 1	    4/10    O
7225	 2	    4/15    C
7384	 1	    4/12    C
7402	 3	    4/16    C
7188	 1	    4/13    C
7202	 2	    4/09    C

Notes:
• Values for Order_Status are “highly” non-unique.
  – Order_Status would be a NUPI.
  – If only two values exist, then only two AMPs will be used for this table.
• Highly non-unique columns are generally poor PI choices.
  – The degree of uniqueness is critical to efficiency.

(Diagram: all six 'C' rows hash to one AMP and all three 'O' rows hash to another; the
remaining two AMPs hold no rows from this table.)

Storing and Accessing Data Rows

Page 5-21

Which AMP has the Row?
This discussion (rest of this module) will assume that a table has a primary index assigned
and is not using the NO PRIMARY INDEX option.
A hashing algorithm is a standard data processing technique that takes in a data value, like
last name or order number, and systematically mixes it up so that the incoming values are
converted to a number in a range from zero to the specified maximum value. A successful
hashing scheme scatters the input evenly over the range of possible output values.
It is predictable in that Smith will always hash to the same value, and Jones will always hash
to a different value. With a good hashing algorithm, any patterns in the input data should
disappear in the output data. If many names begin with “S”, they will not all hash to the same
group of hash values. If order numbers all have “00” in the hundreds and tens places, or if all
names are four letters long, we should still see the hash values spread fairly evenly over the
whole range.
Textbooks still say that this requires manually designing and tuning a hash algorithm for
each new type of data values. However, the Teradata algorithm works predictably well over
any data, typically loading each AMP with variations in the range of .1% to .5% between
AMPs. For extremely large systems, the variation can be as low as .001% between AMPs.
Teradata also uses hashing quite differently than other data storage systems. Other hashed
data storage systems equate a bucket with a physical location on disk. In Teradata, a bucket
is simply an entry in a hash map. Each hash map entry points to a single AMP. Therefore,
changing the number of AMPs does not require any adjustment to the hashing algorithm.
Teradata simply adjusts the hash maps and redistributes any affected rows.
The hash maps must always be available to the Message Passing Layer. For systems using a
16-bit hash bucket number, the hash map has 65,536 entries. For systems using a 20-bit
hash bucket number, the hash map has 1,048,576 entries (approximately 1 million entries).
20-bit hash bucket numbers are available starting with Teradata 12.0.
When the hash bucket has determined the destination AMP, the full 32-bit row hash plus the
Table-ID is used to assign the row to a cylinder and a data block on the AMP’s disk storage.
The 32-bit row hash can represent over 4 billion distinct row hash values.

Page 5-22

Storing and Accessing Data Rows

Which AMP has the Row?
(Diagram: the Parser passes SQL with primary index values and data to the hashing algorithm.
For example, the PI value 197190 hashes to the 32-bit row hash 000A1F4A. The Message Passing
Layer uses the Hash Bucket Number (HBN) portion – 000A1 – to look up an entry in the hash map;
bucket 000A1 contains the number of the AMP that owns this hash value, effectively the AMP
with this row. The Table ID, row hash, PI values, and data are then sent to that AMP, which
stores or retrieves the row. Row hash values range from x'00000000' to x'FFFFFFFF'; in this
example the row with row hash x'000A1F4A' and uniqueness value 0000 0001 is stored on the
target AMP.)

HBN – Hash Bucket Number

Storing and Accessing Data Rows

Page 5-23

Hashing Down to the AMPs
The rows of all tables are distributed across the AMPs according to their Primary Index
value. The Primary Index value goes into the hashing algorithm and the output is a 32-bit
Row Hash. The high order bits (16 or 20) are referred to as the “bucket number” and are
used to identify a hash map entry. This entry, in turn, is used to identify the AMP that will
be targeted. The remaining 12 or 16 bits are not used to locate the AMP.
The entire 32-bit Row Hash is used by the selected AMP to locate the row within its disk
space.
Hash maps are uniquely configured for each size of system; thus a 96 AMP system will
have a hash map different from a 64 AMP system, but two 64 AMP systems will have the
same map (if they have the same number of bits in their HBN).
Each hash map is simply an array that associates Hash Bucket Number (HBN) values or
bucket numbers with specific AMPs.
The Hash Bucket Number (prior to Teradata 12.0) has also been referred to as the DSW or
Destination Selection Word.
When a system grows, new AMPs are typically added. This requires a change to the hash
map to reflect the new total number of possible target AMPs.

Page 5-24

Storing and Accessing Data Rows

Hashing Down to the AMPs
Index value(s)  →  Hashing Algorithm  →  Row Hash (Hash Bucket Number)  →  Hash Map  →  AMP #

• The hashing algorithm is designed to ensure even distribution of unique values across
  all AMPs.
• Different hashing algorithms are used for different international character sets.
• A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
• The Hash Bucket Number is represented by the high order bits (usually 20 on newer
  systems) of the Row Hash.
• A Hash Map is uniquely configured for each system. It is an array of entries (buckets)
  which associates bucket numbers with specific AMPs.
• Two systems with the same number of AMPs will have the same Hash Map (if both have the
  same number of bits in their HBN).
• Changing the number of AMPs in a system requires a change to the Hash Map.

Storing and Accessing Data Rows

Page 5-25

A Hashing Example
The facing page shows an example of how the hashing algorithm would produce a 32-bit
row hash value on the primary index value of 197190.
The hash value is divided into two parts. The first 20 bits in this example are the Hash
Bucket Number. These bits are also simply referred to as the Hash Bucket. The hash
bucket points to a particular hash map entry, which in turn points to one AMP. The entire
Row Hash along with the Table ID references a particular logical location on that AMP.

Page 5-26

Storing and Accessing Data Rows

A Hashing Example
Orders table (UPI = Order_Number): the query

      SELECT * FROM Orders
      WHERE  order_number = 197190;

supplies the PI value 197190 to the hashing algorithm, which produces the 32-bit row hash
000A1F4A:

      Hash Bucket Number *		  Remaining 12 bits
      0000 0000 0000 1010 0001		  1111 0100 1010
      (x'000A1')			  (x'F4A')

      * Assumes 20-bit hash bucket numbers.

(The facing table lists orders 197185 through 197194 with their customer numbers, order dates,
and order statuses; order 197190 belongs to customer 2087, dated 2012-04-11, status C.)

Storing and Accessing Data Rows
Page 5-27

The Hash Map
A hash map is simply an array of entries where each entry is two bytes long. The hash map
is loaded into memory and is used by Teradata software. Each entry contains an AMP
number for the system on which Teradata is implemented. The hash bucket number (or
bucket number) is an offset into the hash map to locate a specific entry (or AMP).
For systems using a 16-bit hash bucket number, the hash map has 65,536 entries. For
systems using a 20-bit hash bucket number, the hash map has 1,048,576 entries
(approximately 1 million entries).
To determine the destination AMP for a Primary Index operation, the hash map is checked
by BYNET software using the row hash information. A message is placed on the BYNET
to be sent to the target AMP using point-to-point communication.
In the example, hash map entry 000A1 (hexadecimal) identifies AMP 13. AMP 13 will be the
recipient of the message from the Message Passing Layer.
The facing page identifies a portion of an actual primary hash map for a 26 AMP system.
An example of hash functions that can be used in SQL follows:
SELECT	HASHROW (197190)			      AS "Hash Value"
       ,HASHBUCKET (HASHROW (197190))		      AS "Bucket Num"
       ,HASHAMP (HASHBUCKET (HASHROW (197190)))	      AS "AMP Num"
       ,HASHBAKAMP (HASHBUCKET (HASHROW (197190)))    AS "AMP Fallback Num";

 *** Query completed. One row found. 4 columns returned.
 *** Total elapsed time was 1 second.

Hash Value	Bucket Num	AMP Num		AMP Fallback Num
 000A1F4A	       161	     13			       0

Page 5-28

Storing and Accessing Data Rows

The Hash Map
197190  →  Hashing Algorithm  →  32-bit Row Hash 000A1F4A

      Hash Bucket Number *		  Remaining 12 bits
      0000 0000 0000 1010 0001		  1111 0100 1010

      * With 20-bit hash bucket numbers, the hash map has 1,048,576 entries.
	With 16-bit hash bucket numbers, the hash map only has 65,536 entries.

(Diagram: a portion of an actual hash map, with 20-bit hash bucket numbers, for a 26 AMP
system; AMPs are shown in decimal format. The entry for bucket 000A1 contains AMP 13, so the
row 197190 | 2087 | 2012-04-11 | C is sent to AMP 13.)

Storing and Accessing Data Rows

Page 5-29

Hash Maps for Different Systems
The diagrams on the facing page show a graphical representation of a Primary Hash Map for
an 8 AMP system and a Primary Hash Map for a 16 AMP system.
A data value which hashes to “000028CF” will be directed to different AMPs on different
systems. For example, this hash value will be associated with AMP 7 on an 8 AMP system
and AMP 15 on a 16 AMP system.
Note: These are the actual partial hash maps for 8 and 16 AMP systems.

Page 5-30

Storing and Accessing Data Rows

Hash Maps for Different Systems
Row Hash (32 bits) = Hash Bucket Number + remaining bits

(Diagram: portions of the actual primary hash maps – each with 1,048,576 hash buckets – for an
8 AMP system and a 16 AMP system. The integer value 337772 hashes to 00002 8CF; hash bucket
00002 maps to AMP 07 on the 8 AMP system and to AMP 15 on the 16 AMP system.)

Storing and Accessing Data Rows

Page 5-31

Identifying Rows
Can two different PI values come out of the hashing algorithm with the same row hash
value? The answer is “Yes”. There are two ways that can happen.
First, two different primary index values may happen to hash identically. This is called a
hash synonym.
Secondly, if a non-unique primary index is used, duplicate NUPI values will produce the
same row hash.

Page 5-32

Storing and Accessing Data Rows

Identifying Rows
A row hash is not adequate to uniquely identify a row.

Consideration #1 – Hash Synonyms
      A Row Hash = 32 bits = 4.2 billion possible values.
      Because there is an infinite number of possible data values, some data values will have
      to share the same row hash. For example, the input values 1254 and 7769 both hash to
      40A70 3BE.

Consideration #2 – NUPI Duplicates
      A Primary Index may be non-unique (NUPI). Different rows will have the same PI value and
      thus the same row hash. For example, (John) 'Smith' and (Dave) 'Smith' both hash to
      2482A D73.

Conclusion
      A row hash is not adequate to uniquely identify a row.

Storing and Accessing Data Rows

Page 5-33

The Row ID
In order to differentiate each row in a table, every row is assigned a unique Row ID. The
Row ID is a combination of the row hash value plus a uniqueness value. The AMP
appends the uniqueness value to the row hash when it is inserted. The Uniqueness Value is
used to differentiate between PI values that generate identical row hashes.
The first row inserted with a particular row hash value is assigned a uniqueness value of 1.
Each new row with the same row hash is assigned an integer value one greater than the
current largest uniqueness value for this Row ID.
If a row is deleted or the primary index is modified, the uniqueness value can be reused.
Only the Row Hash portion is used in Primary Index operations. The entire Row ID is used
for Secondary Index support that is discussed in a later module.
In summary, Row Hash is a 32-bit value. Up to and including Teradata V2R6.2, the
Message Passing Layer looks at the high-order 16 bits (previously called “DSW” Destination Selection Word). This is used to index into the Hash Map to determine which
AMP gets the row or is used to retrieve a row. Once the AMP has been determined, the
entire 32-bits of the Row Hash are passed to the AMP. The AMP uses the entire 32-bit Row
Hash to store/retrieve the row.
Since there are only 4 billion permutations of Row Hash, you can get duplicates. NUPI
Duplicates also cause duplicate Row Hashes, therefore the Row Hash is not sufficient to
uniquely identify a row in a table. Therefore, the AMP adds another 32-bit number (called a
uniqueness value) to the Row Hash. This total 64-bit number (32-bit Row Hash + 32-bit
Uniqueness Value) is called the Row ID. This number uniquely identifies a row in a table.

Page 5-34

Storing and Accessing Data Rows

The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.

      Row ID  =  Row Hash (32 bits)  +  Uniqueness Id (32 bits)

• Each stored row has a Row ID as a prefix.
• Rows are logically maintained in Row ID sequence.

      Row Hash	   Unique ID	Emp_No	 Last_Name   First_Name
      3B11 5032	   0000 0001	1018	 Reynolds    Jane
      3B11 5032	   0000 0002	1020	 Davidson    Evan
      3B11 5032	   0000 0003	1031	 Green	     Jason
      3B11 5033	   0000 0001	1014	 Jacobs	     Paul
      3B11 5034	   0000 0001	1012	 Chevas	     Jose
      3B11 5034	   0000 0002	1021	 Carnet	     Jean
	  :	       :	  :	    :		:

Storing and Accessing Data Rows

Page 5-35

Storing Rows (1 of 2)
Rows are stored in a data block in Row ID sequence. As rows are added to a table with the
same row hash, the uniqueness value is incremented by one in order to provide a unique
Row ID.
Assume Last_Name is a NUPI and that all rows in this example hash to the same AMP.
The ‘John Smith’ row is assigned to AMP 3 based on the bucket number portion of the row
hash. Because it is the first row with this row hash, a uniqueness id of 1 is assigned.
The ‘Sam Adams’ row has a different row hash and thus is also assigned a uniqueness value
of 1. The bucket number, although different, also points to AMP 3 in the hash map.

Page 5-36

Storing and Accessing Data Rows

Storing Rows (1 of 2)
Assumptions:
      Last_Name is defined as a NUPI.
      All rows in this example hash to the same AMP.

Add a row for 'John Smith'
      'Smith'  →  Hash Algorithm  →  2482A D73	→  Hash Map  →	AMP #3

      Row Hash	   Unique ID	Last_Name   First_Name	 Etc.
      2482A D73	   0000 0001	Smith	    John

Add a row for 'Sam Adams'
      'Adams'  →  Hash Algorithm  →  782B7 E4D	→  Hash Map  →	AMP #3

      Row Hash	   Unique ID	Last_Name   First_Name	 Etc.
      2482A D73	   0000 0001	Smith	    John
      782B7 E4D	   0000 0001	Adams	    Sam

Storing and Accessing Data Rows

Page 5-37

Storing Rows (2 of 2)
The ‘Fred Smith’ row hashes to the same row hash as ‘John Smith’ because it is a NUPI
duplicate. It is therefore assigned a uniqueness id of 2.
The ‘Dan Jones’ row also hashes to the same row hash because it is a hash synonym. It is
thus assigned a uniqueness id of 3.
Note: In reality, the last names of Smith and Jones DO NOT hash to the same value. This is
simply an example that illustrates how the uniqueness ID is used when a hash synonym does
occur.

Page 5-38

Storing and Accessing Data Rows

Storing Rows (2 of 2)
Add a row for 'Fred Smith' – (NUPI Duplicate)
      'Smith'  →  Hash Algorithm  →  2482A D73	→  Hash Map  →	AMP #3

      Row Hash	   Unique ID	Last_Name   First_Name	 Etc.
      2482A D73	   0000 0001	Smith	    John
      2482A D73	   0000 0002	Smith	    Fred
      782B7 E4D	   0000 0001	Adams	    Sam

Add a row for 'Dan Jones' – (Hash Synonym)
      'Jones'  →  Hash Algorithm  →  2482A D73	→  Hash Map  →	AMP #3

      Row Hash	   Unique ID	Last_Name   First_Name	 Etc.
      2482A D73	   0000 0001	Smith	    John
      2482A D73	   0000 0002	Smith	    Fred
      2482A D73	   0000 0003	Jones	    Dan
      782B7 E4D	   0000 0001	Adams	    Sam

Given the row hash, what other information would be needed to find the 'Dan Jones' row?
… The 'Fred Smith' row?

Storing and Accessing Data Rows

Page 5-39

Locating a Row on an AMP Using a PI
To locate a row, the AMP file system searches through a memory-resident structure called
the Master Index. An entry in the Master Index will indicate that if a row with this Table ID
and row hash exists, then it must be on a specific disk cylinder.
The file system will use the cylinder number to locate the Cylinder Index and search through
the designated Cylinder Index. There it will find an entry that indicates that if a row with
this Table ID and row hash exists, it must be in one specific data block on that cylinder.
The file system then searches the data block until it locates the row(s) or returns a No Rows
Found condition code.

      Table-id + Row-hash   →	Master Index	 →   Cylinder
      Table-id + Row-hash   →	Cylinder Index	 →   Data Block
      Row-hash + PI Value   →	Data Block	 →   Data Row

Page 5-40

Storing and Accessing Data Rows

Locating a Row On An AMP Using a PI
Locating a row on an AMP requires three input elements:
      1. The Table ID
      2. The Row Hash of the PI
      3. The PI value itself

      START WITH:		   APPLY TO:		  FIND:
      Table Id + Row Hash	   Master Index		  Cylinder #
      Table Id + Row Hash	   Cylinder Index	  Data Block Address
      Row Hash + PI Value	   Data Block		  Data Row

(Diagram: on AMP #3, the memory-resident Master Index points to one of the Cylinder Indexes
(Cyl 1 through Cyl 7 in the example); the Cylinder Index points to the Data Block, which is
then searched for the data row.)

Storing and Accessing Data Rows

Page 5-41

Module 5: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 5-42

Storing and Accessing Data Rows

Module 5: Review Questions
Answer the following either as True or False as these apply to Primary Indexes:
True or False

1. UPI and NUPI equality value accesses are always a one-AMP operation.

True or False

2. UPI and NUPI indexes allow NULL in a primary index column.

True or False

3. UPI, NUPI, and NOPI tables allow duplicate rows in the table.

True or False

4. Only UPI can be used as a Primary Key implementation.

Fill in the Blanks
5. The output of the hashing algorithm is called the _____ _____.
6. To determine the target AMP, the Message Passing Layer must lookup an entry in the Hash Map
based on the _______ _______ _______.
7. A Row ID consists of a row hash plus a ____________ value.
8. A uniqueness value is required to produce a unique Row ID because of ______ ___________ and
________ ___________.
9. Once the target AMP has been determined for a PI search, the _______ ________ for that AMP is
accessed to determine the cylinder that may hold the row.
10. The Cylinder Index points us to the address and length of the data ________.

Storing and Accessing Data Rows

Page 5-43

Notes

Page 5-44

Storing and Accessing Data Rows

Module 6
Secondary Indexes and Table Scans

After completing this module, you will be able to:
• Define Secondary Indexes.
• Distinguish between the implementation of unique and non-unique secondary indexes.
• Define Full Table Scans and what causes them.
• Describe the operation of a Full Table Scan in a parallel environment.

Teradata Proprietary and Confidential

Secondary Indexes and Table Scans

Page 6-1

Notes

Page 6-2

Secondary Indexes and Table Scans

Table of Contents
Secondary Indexes ....................................................................................................................... 6-4
Choosing a Secondary Index........................................................................................................ 6-6
Unique Secondary Index (USI) Access........................................................................................ 6-8
Non-Unique Secondary Index (NUSI) Access .......................................................................... 6-10
Comparison of Primary and Secondary Indexes ........................................................................ 6-12
Full Table Scans ......................................................................................................................... 6-14
Module 6: Review Questions ..................................................................................................... 6-16

Secondary Indexes and Table Scans

Page 6-3

Secondary Indexes
A secondary index is an alternate path to the data. Secondary Indexes are used to improve
performance by allowing the user to avoid scanning the entire table. A Secondary Index is
like a Primary Index in that it allows the user to locate rows. It is unlike a Primary Index in
that it has no influence on the way rows are distributed among AMPs. A database designer
typically chooses a secondary index because it provides faster set selection.
Primary Index requests require the services of only one AMP to access rows, while
secondary indexes require at least two and possibly all AMPs, depending on the index and
the type of operation. A secondary index search will typically be less expensive than a full
table scan.
Secondary indexes add overhead to the table, both in terms of disk space and maintenance;
however they may be dropped when not needed, and recreated whenever they would be
helpful.
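For illustration, adding and removing a secondary index is a simple DDL operation; a minimal sketch, using illustrative table and column names consistent with later examples:

    -- Sketch: secondary indexes can be added and dropped as workload needs change.
    CREATE INDEX (Last_Name) ON Employee;   -- builds the NUSI and its subtable
    DROP   INDEX (Last_Name) ON Employee;   -- removes the index and deletes its subtable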

Page 6-4

Secondary Indexes and Table Scans

Secondary Indexes
There are 3 general ways to access a table:
  Primary Index access       (one AMP access)
  Secondary Index access     (two or all AMP access)
  Full Table Scan            (all AMP access)

• A secondary index provides an alternate path to the rows of a table.
• A secondary index can be used to maintain uniqueness within a column or set of
columns.

• A table can have from 0 to 32 secondary indexes.
• Secondary Indexes:
  – Do not affect table distribution.
  – Add overhead, both in terms of disk space and maintenance.
  – May be added or dropped dynamically as needed.
  – Are chosen to improve table performance.

Secondary Indexes and Table Scans

Page 6-5

Choosing a Secondary Index
Just as with primary indexes, there are two types of secondary indexes – unique (USI) and
non-unique (NUSI).
A secondary index may be specified at table creation or at any time during the life of the
table. It may consist of up to 64 columns; however, to get the full benefit of the index, a
query must specify a value for every column in the index.
Unique Secondary Indexes (USI) have two possible purposes. They can speed up access
to a row which otherwise might require a full table scan without having to rely on the
primary index. Additionally, they can be used to enforce uniqueness on a column or set of
columns. This is sometimes the case with a Primary Key which is not designated as the
Primary Index. Making it a USI has the effect of enforcing the uniqueness of the PK.
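For illustration, the following is a minimal sketch of that situation, using illustrative table and column names: the table is distributed on a NUPI, and a USI on the Primary Key column enforces its uniqueness.

    -- Sketch: enforcing a Primary Key that is not the Primary Index.
    CREATE TABLE Customer
        ( Cust_ID     INTEGER  NOT NULL     -- Primary Key, but not the Primary Index
        , Cust_Name   CHAR(20)
        , Cust_Phone  CHAR(12) )
    PRIMARY INDEX (Cust_Phone);             -- NUPI controls row distribution

    CREATE UNIQUE INDEX (Cust_ID) ON Customer;   -- USI rejects duplicate Cust_ID values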
Non-Unique Secondary Indexes (NUSI) are usually specified in order to prevent full table
scans. However, a NUSI does activate all AMPs – after all, the value being sought might
well exist on many different AMPs (only Primary Indexes have same values on same
AMPs). If the optimizer decides that the cost of using the secondary index is greater than a
full table scan would be, it opts for the table scan.
All secondary indexes cause an AMP local subtable to be built and maintained as column
values change. Secondary index subtables consist of rows which associate the secondary
index value with one or more rows in the base table. When the index is dropped, the
subtable is physically removed.

Page 6-6

Secondary Indexes and Table Scans

Choosing a Secondary Index

A Secondary Index may be defined ...
  – at table creation          (CREATE TABLE)
  – following table creation   (CREATE INDEX)
  – may be up to 64 columns

USI – Unique Secondary Index
  • If the index choice of column(s) is unique, it is called a USI.
  • Accessing a row via a USI is a 2-AMP operation.
  • Example:  CREATE UNIQUE INDEX (Employee_Number) ON Employee;

NUSI – Non-Unique Secondary Index
  • If the index choice of column(s) is non-unique, it is called a NUSI.
  • Accessing row(s) via a NUSI is an all-AMP operation.
  • Example:  CREATE INDEX (Last_Name) ON Employee;

Notes:

• Creating a Secondary Index causes an internal sub-table to be built.
• Dropping a Secondary Index causes the sub-table to be deleted.

Secondary Indexes and Table Scans

Page 6-7

Unique Secondary Index (USI) Access
The facing page shows the two AMP accesses necessary to retrieve a row via a Unique
Secondary Index access.
After the row hash of the secondary index value is calculated, the hash map points us to
AMP 1 as containing the subtable row for this USI value. After locating the subtable row in
AMP 1, we find the row-id of the base row we are seeking. This base row id (which
includes the row hash) again allows the hash map to point us to AMP 3 which contains the
base row.
Secondary index access uses the complete row-id to locate the row, unlike primary index
access, which only uses the row hash portion.
The Customer table below is the table used in the example. It is only a partial listing of the
rows.
Customer Table (partial listing)

  Cust      Name      Phone
  (USI)               (NUPI)
   37       White     555-4444
   98       Brown     333-9999
   74       Smith     555-6666
   95       Peters    555-7777
   27       Jones     222-8888
   56       Smith     555-7777
   45       Adams     444-6666

Page 6-8

Secondary Indexes and Table Scans

Unique Secondary Index (USI) Access

Create USI:      CREATE UNIQUE INDEX (Cust) ON Customer;
Access via USI:  SELECT * FROM Customer WHERE Cust = 56;

[Diagram: AMP 0 through AMP 3 each hold a USI subtable and a portion of the Customer base
table (RowID, Cust, Name, Phone). Each USI subtable row pairs a Cust value with the Row ID
of its base table row.]

  1. The PE hashes the USI value 56 for Table ID 100; the hashing algorithm returns
     row hash 602.
  2. The Message Passing Layer uses the hash map to send the request to AMP 1, which holds
     the USI subtable row for this value:  RowID 602, 1 -> Cust 56 -> base table RowID 778, 7.
  3. The base Row ID (row hash 778) is placed back on the Message Passing Layer, and the
     hash map points to AMP 3 as the owner of the base row.
  4. AMP 3 locates the base row:  RowID 778, 7 -> Cust 56, Name Smith, Phone 555-7777.

Secondary Indexes and Table Scans

Page 6-9

Non-Unique Secondary Index (NUSI) Access
The facing page shows an all-AMP access necessary to retrieve a row via a Non-Unique
Secondary Index access.
After the row hash of the secondary index value is calculated, the Message Passing Layer
will automatically activate all AMPs per instructions of the Parsing Engine. Each AMP
locates the subtable rows containing the qualifying value and row hash. These subtable
rows contain the row-id(s) for the base rows, which are guaranteed to be on the same AMP
as the subtable row. This reduces activity in the MPL and essentially makes the query an
AMP-local operation.
Because each AMP may have more than one qualifying row, it is possible for the subtable
row to have multiple row-ids for the base table rows.
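Whether the optimizer actually uses a NUSI depends on its selectivity, so collected statistics and an EXPLAIN are the usual way to verify the access path. A minimal sketch, using the Customer table shown below and its NUSI on Name (COLLECT STATISTICS and EXPLAIN are standard statements; the plan wording varies by release):

    -- Sketch: help the optimizer cost the NUSI and verify which access path it chooses.
    COLLECT STATISTICS ON Customer INDEX (Name);   -- demographics for the NUSI on Name

    EXPLAIN
    SELECT *
    FROM   Customer
    WHERE  Name = 'Adams';   -- plan shows either a NUSI retrieve or a full table scan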
The Customer table below is the table used in the example. It is only a partial listing of the
rows.

Customer Table (partial listing)

  Cust      Name      Phone
            (NUSI)    (NUPI)
   37       White     555-4444
   98       Brown     333-9999
   74       Smith     555-6666
   95       Peters    555-7777
   27       Jones     222-8888
   56       Smith     555-7777
   45       Adams     444-6666

Page 6-10
Secondary Indexes and Table Scans

Non-Unique Secondary Index (NUSI) Access

Create NUSI:      CREATE INDEX (Name) ON Customer;
Access via NUSI:  SELECT * FROM Customer WHERE Name = 'Adams';

[Diagram: AMP 0 through AMP 3 each hold a NUSI subtable and a portion of the Customer base
table (RowID, Cust, Name, Phone). Each NUSI subtable row pairs a Name value with the Row
ID(s) of the base rows for that value on the same AMP.]

  1. The PE hashes the NUSI value 'Adams' for Table ID 100; the hashing algorithm returns
     row hash 567.
  2. The Message Passing Layer activates all AMPs; each AMP searches its own NUSI subtable
     for row hash 567 and the value 'Adams'.
       AMP 0 – subtable row (567, 3) points to base RowID 638, 1
               (Cust 31, Adams, 111-2222)
       AMP 1 – subtable row (567, 2) points to base RowIDs 471, 1 and 717, 2
               (Cust 45, Adams, 444-6666 and Cust 72, Adams, 666-7777)
       AMP 2 and AMP 3 – no qualifying subtable row.
  3. Each AMP with qualifying rows retrieves its base rows locally; no additional Message
     Passing Layer activity is needed.

Secondary Indexes and Table Scans

Page 6-11

Comparison of Primary and Secondary Indexes
The table on the facing page compares and contrasts primary and secondary indexes:
Primary indexes are required; secondary indexes are optional. All tables must have a
method of distributing rows among AMPs -- the Primary Index.
A table can only have one primary index, but it can have up to 32 secondary indexes.
Both primary and secondary indexes can have up to 64 columns.
Secondary indexes, like primary indexes, can be either unique (USI) or non-unique (NUSI).
The secondary index does not affect the distribution of rows. Rows are only distributed
according to the Primary Index values.
Secondary indexes can be created and dropped dynamically. In other words, Secondary
Indexes can be added as needed. In fact, in some cases it is a good idea to wait and see how
the database is used and then add Secondary Indexes to facilitate that usage.
Both primary and secondary indexes affect system performance. However, Primary and
Secondary Indexes affect performance for different reasons. A poorly-chosen PI results in
“lumpy” data distribution which makes some AMPs do more work than others and slows the
system.
Secondary Indexes affect performance because they require subtables. Both indexes allow
rapid retrieval of specific rows.
Both primary and secondary indexes can be created using multiple data types.
Secondary indexes are stored in separate subtables; primary indexes are not.
Because secondary indexes require separate subtables, extra I/O is needed to maintain those
subtables.
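For illustration, the indexes actually defined on a table, and whether each is unique, can be listed from SQL; a minimal sketch against the Customer table used earlier (HELP INDEX and SHOW TABLE are standard statements; the exact output columns vary by release):

    -- Sketch: list the primary and secondary indexes defined on a table.
    HELP INDEX Customer;    -- roughly one row per index: unique?, primary or secondary, column names
    SHOW TABLE Customer;    -- returns the CREATE TABLE text, including the PRIMARY INDEX clause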

Page 6-12

Secondary Indexes and Table Scans

Comparison of Primary and Secondary Indexes

  Index Feature                    Primary      Secondary
  Required?                        Yes*         No
  Number per Table                 1            0 - 32
  Max Number of Columns            64           64
  Unique or Non-unique             Both         Both
  Affects Row Distribution         Yes          No
  Created/Dropped Dynamically      No           Yes
  Improves Access                  Yes          Yes
  Multiple Data Types              Yes          Yes
  Separate Physical Structure      No           Sub-table
  Extra Processing Overhead        No           Yes
  May be ordered by value          No           Yes (NUSI)
  May be Partitioned               Yes          No

  * Not required with NoPI table in Teradata 13.0

Secondary Indexes and Table Scans

Page 6-13

Full Table Scans
A full table scan is another way to access data without using any Primary or Secondary
Indexes.
In evaluating an SQL request, the Parser examines all possible access methods and chooses
the one it believes to be the most efficient. The coding of the SQL request along with the
demographics of the table and the availability of indexes all play a role in the decision of the
Parser. Some coding constructs, listed on the facing page, always cause a full table scan. In
other cases, it might be chosen because it is the most efficient method. In general, if the
number of physical reads exceeds the number of data blocks then the optimizer may decide
that a full-table scan is faster.
With a full table scan, each data block is found using the Master and Cylinder Indexes and
each data row is accessed only once.
As long as the choice of Primary Index has caused the table rows to distribute evenly across
all of the AMPs, the parallel processing of the AMPs can do the full table scan quickly. The
file system keeps each table on as few cylinders as practical to help reduce the cost of full
table scans.
While full table scans are impractical and even disallowed on some systems, the Teradata
Database routinely executes ad hoc queries with full table scans.
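For illustration, EXPLAIN makes it easy to see when the Parser has chosen a full table scan. A minimal sketch, using the Customer table from the facing page (the exact plan wording varies by release):

    -- Sketch: a non-indexed column in the WHERE clause typically forces a full table scan.
    EXPLAIN
    SELECT *
    FROM   Customer
    WHERE  Cust_Name = 'Koehler';
    -- The plan text reports an all-AMPs retrieve "by way of an all-rows scan".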

Page 6-14

Secondary Indexes and Table Scans

Full Table Scans
Every row of the table must be read.
All AMPs scan their portion of the table in parallel.

• Fast and efficient on Teradata due to parallelism.
Full table scans typically occur when either:
• An index is not used in the query
• An index is used in a non-equality test

Customer table:   Cust_ID (USI)    Cust_Name    Cust_Phone (NUPI)

Examples of Full Table Scans:
SELECT * FROM Customer WHERE Cust_Phone LIKE '858-485-_ _ _ _ ';
SELECT * FROM Customer WHERE Cust_Name = 'Koehler';
SELECT * FROM Customer WHERE Cust_ID > 1000;

Secondary Indexes and Table Scans

Page 6-15

Module 6: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 6-16

Secondary Indexes and Table Scans

Module 6: Review Questions

Fill each box with either Yes, No, or the appropriate number.

                                     USI Access    NUSI Access    FTS
  # AMPs                             _______       _______        _______
  # rows                             _______       _______        _______
  Parallel Operation                 _______       _______        _______
  Uses Hash Maps                     _______       _______        _______
  Uses Separate Sub-table            _______       _______        _______
  Reads all data blocks of table     _______       _______        _______

Secondary Indexes and Table Scans

Page 6-17

Notes

Page 6-18

Secondary Indexes and Table Scans

Module 7
Teradata System Architecture

After completing this module, you will be able to:
• Identify characteristics of various components.
• Specify the difference between a TPA and non-TPA node.

Teradata Proprietary and Confidential

Teradata System Architecture

Page 7-1

Notes

Page 7-2

Teradata System Architecture

Table of Contents
Teradata Database Releases ......................................................................................................... 7-4
Teradata Version 1 ................................................................................................................... 7-4
Teradata Version 2 ................................................................................................................... 7-4
Teradata Database Architecture ................................................................................................... 7-6
Teradata Database – Multiple Nodes ........................................................................................... 7-8
MPP Systems ............................................................................................................................. 7-10
Example of 3 Node Teradata Database System ......................................................................... 7-12
Example: 5650 and 6844 Disk Arrays ................................................................................... 7-12
Teradata Cliques ........................................................................................................................ 7-14
BYNET ...................................................................................................................................... 7-16
BYNET Communication Protocols ........................................................................................... 7-18
Vproc Inter-process Communication ......................................................................................... 7-20
Examples of Teradata Database Systems ................................................................................... 7-22
Example of 5650 Cabinets ......................................................................................................... 7-24
What makes Teradata’s MPP Platforms Special? ...................................................................... 7-26
Summary .................................................................................................................................... 7-28
Module 7: Review Exercises...................................................................................................... 7-30

Teradata System Architecture

Page 7-3

Teradata Database Releases
The facing page identifies various Teradata releases that have been available since 1984.
This page identifies some historical information about Teradata Version 1 systems.

Teradata Version 1
Teradata Database Version 1 platforms were first available to customers in 1984. This
first platform was the original Database Computer, DBC/1012 from Teradata. In 1991, the
NCR Corporation introduced the 3600. Both of these systems are older technologies and
both of these systems used a proprietary 16-bit operating system known as TOS (Teradata
Operating System). All AMPs and PEs were dedicated hardware processors that were
connected together using a message-passing layer known as the Ynet.
Both platforms supported channel-attached (Bus and Tag) and LAN-attached host systems.
DBC/1012 Architecture – this system was a dedicated relational database management
system. Two specific components of the DBC/1012 were the IFP and the COP, both of
which were effectively hardware Parsing Engines. The acronyms IFP and COP still appear
in the Data Dictionary even today.
Interface Processor (IFP) – the IFP was the Parsing Engine of the DBC/1012 for
channel-attached systems. For current systems (e.g., 5550), PEs that are assigned
to a channel are identified in the Data Dictionary with a type of IFP.
Communications Processor (COP) – the COP is the Parsing Engine of the DBC/1012
for network-attached systems. For current systems (e.g., 5550), PEs that are
assigned to a LAN (or network) are identified in the Data Dictionary with a type of
COP.
3600 Architecture – this system included hardware AMPs and PEs as well as multipurpose
Application Processors executing UNIX MP-RAS. Application Processors (APs) also
provided the channel and LAN connectivity. UNIX applications were executed on APs
while Teradata was executed on PEs and AMPs. All processing units were connected via
the Ynet.

Teradata Version 2
Starting with Teradata Database Version 2 Release 1 (available in January 1996), the
Teradata Database became an open database system. No longer was Teradata software
dependent on a proprietary Operating System (TOS) and proprietary hardware. Rather, it
became an application that initially ran under UNIX.
By porting the Teradata Database to a general-purpose operating system platform, a variety
of processing options against the Teradata database became possible, all within a single
system. OLTP (as well as OLCP and OLAP) applications became processing options in
addition to standard DSS.
Page 7-4

Teradata System Architecture

Teradata Database Releases

Teradata Releases
  Version 1    Release 1, 2, 3, 4, 5
  Version 2    Release 1, 2, 3, 4, 5, 6
  Teradata 12.0, 13.0, 13.10, 14.0

Version 1 was a combination of hardware and software. For example, if a customer needed
additional AMPs, the hardware and software components for an AMP had to be purchased,
installed, and configured.

  V1 Platforms      Year Available
  DBC/1012          1984
  3600              1991

Version 2 is an implementation of Teradata PEs and AMPs as software vprocs (virtual
processors). Teradata is effectively a database application that executes under an
operating system.

  Platforms                         Year Available
  5100                              1996 (UNIX MP-RAS only)
  5650 (requires 12.0 or later)     2010
  6650, 6680                        2011
  6690                              2012

Teradata System Architecture

Page 7-5

Teradata Database Architecture
Teradata is effectively an application that runs under an operating system (SUSE Linux,
UNIX MP-RAS, or Windows Server 2003).
PDE software provides the Teradata Database software with the capability to run under a
specific operating system. Parallel Database Extensions (PDE) software is an interface layer
on top of the operating system (Linux, UNIX MP-RAS, or Windows Server 2003). PDE provides
the Teradata Database with the ability to:

• Run the Teradata Database in a parallel environment
• Execute vprocs
• Apply a flexible priority scheduler to Teradata Database sessions
• Debug the operating system kernel and the Teradata Database using resident
  debugging facilities

AMPs and PEs are implemented as “virtual processors - vprocs”. They run under the
control of PDE and their number is software configurable.
AMPs are associated with “virtual disks – vdisks” which are associated with logical units
(LUNs) within a disk array.
The versatility of Teradata Database is based on virtual processors (vprocs) that eliminate
dependency on specialized physical processors. Vprocs are a set of software processes that
run on a node under Teradata Parallel Database Extensions (PDE) within the multitasking
environment of the operating system.

Page 7-6

Teradata System Architecture

Teradata Database Architecture

[Diagram: a Teradata processing node (e.g., a 6650 node) runs the operating system (e.g.,
SUSE Linux), PDE and BYNET software (MPL), and Teradata Gateway software (LANs). Multiple
PE vprocs and AMP vprocs execute in the node, and each AMP vproc manages its own Vdisk.]

• Teradata executes on a 64-bit operating system (e.g., SUSE Linux).
  – Utilizes general purpose SMP/MPP hardware.
  – Parallel Database Extensions (PDE) is unique per OS that Teradata is supported on.

• AMPs and PEs are implemented as virtual processors (Vprocs).

• “Shared Nothing” Architecture – each AMP has its own memory, manages its own disk
  space, and executes independently of other AMPs.

Teradata System Architecture

Page 7-7

Teradata Database – Multiple Nodes
A customer may choose to implement Teradata on a small, single node SMP system for
smaller database requirements and to eventually grow incrementally to a multiple terabyte
system. A single-node SMP platform may also be used as low cost development systems.
Under the single-node (SMP) version of Teradata, PE and AMP vproc still communicate
with each other via PDE and BYNET software. All vprocs share the resources of CPUs and
memory within the SMP node.
As a customer’s Teradata database needs grow, additional nodes will probably be needed. A
multi-node system running the Teradata Database is referred to as an MPP (Massively Parallel
Processing) system.
The Teradata Database application is considered a Trusted Parallel Application (TPA).
The Teradata Database is the only TPA application available at this time.
Nodes in a system configuration may or may not be connected to the BYNET. Examples of
nodes and their purpose include:

• TPA (Trusted Parallel Application) node – executes Teradata Database software.

• HSN (Hot Standby Node) – a spare node in the clique (not running Teradata) used in the
  event of a node failure.

• Non-TPA (NOTPA) node – an application node that does not execute Teradata Database
  software.

Hot standby nodes allow spare nodes to be incorporated into the production environment.
The Teradata Database can use spare nodes to improve availability and maintain
performance levels in the event of a node failure. A hot standby node is a node that:

• is a member of a clique
• does not normally participate in Teradata Database operations
• can be brought in to participate in Teradata Database operations to compensate for
  the loss of a node in the clique

Configuring a hot standby node can eliminate the system-wide performance degradation
associated with the loss of a node. A hot standby node is added to each clique in the system.
When a node fails, all AMPs and all LAN-attached PEs on the failed node migrate to the
node designated as the hot standby node. The hot standby node becomes a production node.
When the failed node returns to service, it becomes the new hot standby node.
Configuring hot standby nodes eliminates:

• Restarts that are required to bring a failed node back into service.
• Degraded service when vprocs have migrated to other nodes in a clique.

Page 7-8

Teradata System Architecture

Teradata Database – Multiple Nodes

[Diagram: two TPA nodes connected by the BYNET. Each node runs the operating system (e.g.,
Linux), PDE and BYNET software, and Gateway software, with multiple PE vprocs and AMP
vprocs; each AMP vproc has its own Vdisk.]

Teradata is a linearly expandable database – as your database grows, additional nodes may
be added – effectively becoming an MPP (Massively Parallel Processing) system.

• Teradata software makes a multi-node system look like a single-Teradata system.

Examples of types of nodes that connect to the BYNET:

• TPA (Trusted Parallel Application) node – executes Teradata Database software.
• HSN (Hot Standby Node) – spare node in the clique (not running Teradata) used in the
  event of a node failure.
• Non-TPA (NOTPA) node – application node that does not execute Teradata Database software.

Teradata System Architecture

Page 7-9

MPP Systems
When multiple SMP nodes (simply referred to as nodes) are connected together to form a
larger configuration, we refer to this as an MPP (Massively Parallel Processing) system.
The connecting layer (or system interconnect) is called the BYNET. The BYNET is a
combination of hardware and software that allows multiple vprocs on multiple nodes to
communicate with each other.
Because Teradata is a linearly expandable database system, as additional nodes and vprocs
are added to the system, the system capacity scales in a linear fashion.
The BYNET Version 1 can support up to 128 SMP nodes. The BYNET Version 2 can
support up to 512 nodes. The BYNET Version 3 can support up to 1024 nodes and BYNET
Version 4 can support up to 4096 nodes.
Acronyms that may appear in diagrams throughout this course:
PCI – Peripheral Component Interconnect
EISA – Extended Industry Standard Architecture
PBCA – PCI Bus Channel Adapter
PBSA – PCI Bus ESCON Adapter
EBCA – EISA Bus Channel Adapter

Page 7-10

Teradata System Architecture

MPP Systems

[Diagram: two redundant BYNET switches interconnect multiple nodes (TPA nodes and a Hot
Standby Node); the nodes of a clique connect to and share the same storage.]

• The BYNET consists of redundant switches that interconnect multiple nodes.
• Multiple nodes make up a Massively Parallel Processing (MPP) system.
• A clique is a group of nodes connected to and sharing the same storage.

Teradata System Architecture

Page 7-11

Example of 2+1 Node Teradata System
The facing page contains an illustration of a simple three-node (2+1) Teradata Database
system. Each node has its own Vprocs to manage, while communication among the Vprocs
takes place via the BYNETs. The PEs are not shown in this example.
Each node is an SMP from a configuration standpoint. Each node has its own CPUs,
memory, UNIX and PDE software, Teradata Database software, BYNET software, and
access to one or more disk arrays.
Nodes are the building blocks of MPP systems. A system size is typically expressed in
terms of number of nodes.
AMPs provide access to user data stored within tables that are physically stored on disk
arrays.
Each AMP is associated with a Vdisk. Each AMP sees its Vdisk as a single disk. Teradata
(AMP software) organizes its data on its disk space (Vdisk) using a Teradata “File System”
structure.
A Vdisk may be actually composed of multiple Pdisks - Physical disk. A Pdisk is assigned
to physical drives in a disk array.

Example: 6650 and Internal 6844 Disk Arrays
The facing page contains an example of a 3-node (2+1) clique sharing two 6844 disk arrays.
Each node has Fibre Channel adapters and Fibre Channel cables (point-to-point connections)
to connect to the disk arrays.

Page 7-12

Teradata System Architecture

Example of 2+1 Node Teradata System

[Diagram: a 2+1 node clique – SMP001-7 (AMPs 0 … 29), SMP002-6 (AMPs 30 … 59), and
SMP002-7 as a Hot Standby Node – sharing two disk arrays of 120 disks each (240 drives
total). AMP 0's Vdisk 0 is composed of Pdisk 0 and Pdisk 1 (600 GB each), giving
MaxPerm = 1.08 TB*.   * Actual space is approximately 90%.]

2+1 node clique sharing 240 drives; 30 AMPs/node; Linux system

Teradata System Architecture

Page 7-13

Teradata Cliques
A clique is a set of Teradata nodes that share a common set of disk arrays. In the event of
node failure, all vprocs can migrate to another available node in the clique. All nodes in the
clique must have access to the same disk arrays.
The illustration on the facing page shows a 6-node system consisting of two cliques, each
containing three nodes. Because all disk arrays are available to all nodes in the clique, the
AMP vprocs will still have access to the rows they are responsible for.

Page 7-14

Teradata System Architecture

Teradata Cliques

[Diagram: two BYNET switches interconnect two cliques; each clique contains two TPA nodes
and a Hot Standby Node (HSN) sharing a common set of disk arrays.]

• A clique is a defined set of nodes that share a common set of disk arrays.
• All nodes in a clique must be able to access all Vdisks for all AMPs in the clique.
• A clique provides protection from a node failure.
• If a node fails, all vprocs will migrate to the remaining nodes in the clique (Vproc
  Migration) or to a Hot Standby Node (HSN).

Teradata System Architecture

Page 7-15

BYNET
There are two physical BYNETs, BYNET 0 and BYNET 1. Both are fully operational and
provide fault tolerance in the event of a BYNET failure. The BYNETs automatically handle
load balancing and message routing. BYNET reconfiguration and message rerouting in the
event of a component failure is also handled transparently to the application.

Page 7-16

Teradata System Architecture

BYNET

[Diagram: BYNET 0 and BYNET 1 each connect to every node; the example shows three (2+1)
cliques, each with two TPA nodes and an HSN.]

The BYNET is a dual redundant, bi-directional interconnect network.

• All nodes are connected to both BYNETs.

BYNET Features:

• Enables multiple nodes to communicate with each other.
• Automatic load balancing of message traffic.
• Automatic reconfiguration after fault detection.
• Fully operational dual BYNETs provide fault tolerance.
• Scalable bandwidth as nodes are added.
• Even though there are two physical BYNETs to provide redundancy and bandwidth,
  the Teradata Database and TCP/IP software only see a single network.

Teradata System Architecture

Page 7-17

BYNET Communication Protocols
Using communication-switching techniques, the BYNET allows for point-to-point,
multicast, and broadcast communications among the nodes, thus supporting a monumental
increase in throughput in very large databases. This technology allows Teradata users to
grow massively parallel databases without fear of a communications bottleneck for any
database operations.
Although the BYNET software supports the multi-cast protocol, Teradata only uses this
protocol with Group AMPs operations. This is a Teradata feature starting with release
V2R5. Teradata software will use the point-to-point protocol whenever possible. When an
all-AMP operation is needed, Teradata software uses the broadcast protocol to send
messages to the different SMPs.
The BYNET is linearly scalable for point-to-point communications. For each new node
added to a system with BYNET V4, an additional 960 MB of bandwidth is added
to each BYNET, thus providing scalability as the system grows. Scalability comes from the
fact that multiple point-to-point circuits can be established concurrently. With the addition
of another node, more circuits can be established concurrently.
For broadcast and multicast operations with BYNET V4, the bandwidth is 960 MB per
second per BYNET.
BYNET V1 (old implementation) had a bandwidth of 10 MB per second per direction per
BYNET for a node.

Page 7-18

Teradata System Architecture

BYNET Communication Protocols

[Diagram: PEs and AMPs on each node, including the Hot Standby Node, communicate over
BYNET 0 and BYNET 1.]

Point-to-Point (one-to-one)
One vproc communicates with one vproc (e.g., 1 PE to 1 AMP). Scalable bandwidth:
• BYNET v2 – 60 MB x 2 (bi-directional) x 2 BYNETs = 240 MB per node
• BYNET v3 – 93.75 MB x 2 (bi-directional) x 2 BYNETs = 375 MB per node
• BYNET v4 – 240 MB x 2 (bi-directional) x 2 BYNETs = 960 MB per node

Multi-Cast (one-to-many)
One vproc communicates to a subset of vprocs (e.g., Group AMP operations).

Broadcast (one-to-all)
One vproc communicates to all vprocs (e.g., 1 PE to all AMPs). Not scalable.

Teradata System Architecture

Page 7-19

Vproc Inter-process Communication
The “message passing layer” is a combination of software and hardware: the PDE software,
the BYNET device drivers and software, and the BYNET hardware.
Communication among vprocs in an MPP system may be either inter-node or intra-node.
When vprocs within the same node communicate they do not require the physical transport
services of the BYNET. However, they do use the highest levels of the BYNET software
even though the messages themselves do not leave the node.
When vprocs must communicate across nodes, they must use the physical transport services
of the BYNET requiring movement of the data. Any broadcast messages, for example, will
go out to the BYNET, even for the AMPs and PEs that are in the same node.
Communication among vprocs in a single SMP system occurs with the PDE and BYNET
software, even though a physical BYNET does not exist in a single-node system.

Page 7-20

Teradata System Architecture

Vproc Inter-process Communication

[Diagram: in a single-node system, all Teradata Database vprocs communicate through the
PDE and BYNET software within the node. In an MPP system, the vprocs in each node (e.g.,
Node 1 and Node 2) communicate through that node's PDE and BYNET software, and the
physical BYNET carries messages between nodes.]

Teradata System Architecture

Page 7-21

Examples of Teradata Database Systems
The facing page identifies various SMP servers and MPP systems that are supported for the
Teradata Database.
The following dates indicate when these systems were generally available to customers
(GCA – General Customer Availability).
  Platform                      Year Available
  5100M                         January, 1996 (not described in this course)
  4700/5150                     January, 1998 (not described in this course)
  4800/5200                     April, 1999
  4850/5250                     June, 2000
  4851/4855/5251/5255           July, 2001
  4900/5300                     March, 2002
  4950/5350                     December, 2002
  4980/5380                     August, 2003
  5400E/5400H                   March, 2005
  5450E/5450H                   April, 2006
  5500E/5500C/5500H             March, 2007
  2500/5550H                    January, 2008
  2550                          October, 2008
  1550                          December, 2008
  2555/5555C/H                  March, 2009
  1600/5600C/H                  February, 2010
  2650/5650C/H                  July, 2010 (Internal release; Official release Oct 2010)
  6650C/H, 6680                 2011

The Teradata Database is also available on non-Teradata platforms. The Teradata Database
is available on the Intel-based mid-range platforms running Microsoft Windows 2003 or
Linux. For example, Dell provides processing nodes that are used in some of the Teradata
appliance systems.

Page 7-22

Teradata System Architecture

Examples of Teradata Database Systems

Examples of systems used with the Teradata Database include:

Active Enterprise Data Warehouse Systems
  5200/525x          – up to 2 nodes/cabinet
  5300/5350/5380     – up to 4 nodes/cabinet
  5400/5450          – up to 10 nodes/cabinet
  5500/555x/56xx     – up to 9 nodes/cabinet
  6650/6680/6690     – up to 4 nodes/cabinet with associated storage

The basic building block is the SMP (Symmetric Multi-Processing) node.

Common characteristics of these systems:
• MPP systems that use the BYNET interconnect (BYNET V4 – up to 4096 nodes)
• Single point of operational control – AWS or SWS
• Rack-based systems – each technology is encapsulated in its own chassis

Key differences:
• Speed and capacity of SMP nodes and systems
• Cabinet architecture
• BYNET interface cards, switches and speeds

Teradata System Architecture

Page 7-23

6650 Cabinets
The facing page contains two pictures of rack-based cabinets. This represents a two cabinet
3+1 6650 clique.
54xx, 55xx, and 56xx systems also used a rack-based cabinet. The rack was initially
designed for the 54xx systems and has been improved on with later systems such as 55xx
and 56xx systems.
This redesign allows for better cooling and maintenance and has a black and gray
appearance. This design is also used with the LSI disk array cabinets. The 56xx cabinet is a
different cabinet and is approximately 4” deeper than the 55xx cabinets.
An older style rack or cabinet is used for the 4700, 4800, 4850, 4851, 4855, 4900, 4950,
4980, 5200, 5250, 5251, 5255, 5300, 5350, and 5380 systems. This cabinet was similar in
size and almond in color.
The approximate external dimensions of this rack or cabinet are:
Height – 77”
Width – 24” (inside rails are 19” apart and this is often referred to as a 19” wide rack)
Depth – 40” (the 56xx/66xx cabinet is 44” deep)
This industry-standard rack is referred to as a 40U rack where a U is a unit of measure of
height of 1.75” or 4.445 cm.
The system or processor cabinet includes a Server Management (SM) chassis which is often
referred to as the CMIC (Chassis Management Interface Controller). This component is part
of the server management subsystem and interfaces with the AWS or SWS.

Page 7-24

Teradata System Architecture

6650 Cabinets

[Diagram: a 3+1 6650 clique housed in two cabinets. Each cabinet contains a secondary SM
switch, drive trays (16 HD each), 6844 array controllers, TPA nodes and the HSN (Hot
Standby Node), optional TMS nodes, BYNET switches (BYA32S-0 and BYA32S-1), an SM – CMIC
(1U), primary SM switches, and AC boxes. Each Teradata TPA node runs PE and AMP vprocs.]

Teradata uses industry standard rack-based cabinets.

Teradata System Architecture

Page 7-25

What makes Teradata’s MPP Platforms Special?
The facing page lists the major features of Teradata’s MPP systems.
Acronyms:
PUT – Parallel Upgrade Tool
AWS – Administration Workstation
SWS – Service Workstation – utilizes Server Management Web Services (SMWeb) for
the 56xx.

Page 7-26

Teradata System Architecture

What Makes Teradata’s MPP Platforms Special?
Key features of Teradata’s MPP systems include:

• Teradata Database software – allows the Teradata Database to execute on multiple
nodes and act as a single instance.

• Scalable BYNET Interconnect – as you add nodes, you add bandwidth.
• Operating system software (e.g., Linux) for a node is only aware of the resources
within the node and only has to manage those resources.

• AWS/SWS – single point of operational control and scalable server management.
• PUT (Parallel Upgrade Tool) – simplifies installation/upgrade of software across many
nodes.

• Redundant (availability) components. Examples include:
  – Hot Standby Nodes
  – Two BYNETs
  – Two Disk Array Controllers within a Disk Array
  – Dual AC capability for increased availability
  – N+1 Power Supplies within a processing node and disk arrays

Teradata System Architecture

Page 7-27

Summary
The facing page summarizes the key points and concepts discussed in this module.

Page 7-28

Teradata System Architecture

Summary
• Teradata Database is a software implementation of Teradata.
– AMPs and PEs are implemented as virtual processors (Vprocs).

• The Teradata Database utilizes a “Shared Nothing” Architecture – each AMP
has its own memory and manages its own disk space.

– Teradata is called a Trusted Parallel Application (TPA).

• Multiple nodes may be configured to provide a Massively Parallel Processing
(MPP) system.

• A clique is a defined set of nodes that share a common set of disk arrays.
• The Teradata Database is a linearly expandable RDBMS – as your database
grows, additional nodes may be added.

Teradata System Architecture

Page 7-29

Module 7: Review Exercises
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 7-30

Teradata System Architecture

Module 7: Review Questions
Complete the following.
1. Each AMP has its own memory and manages its own disk space and executes independently of
other AMPs. This is referred to as a _________ _________ architecture.
2. The software component that allows the Teradata Database to execute in different operating system
environments is the __________.
3. A physical message passing interconnect is called the _____________.
4. A clique provides protection from a _________ failure.
5. If a node fails, all vprocs will migrate to the remaining nodes in the clique. This feature is referred to
as ___________ _____________.
6. The _______ or _______ provides a single point of operational control for Teradata MPP systems.
7. A _________ node is part of a system configuration, is connected to the BYNET, and executes the
Teradata Database software.
8. A _________ node is part of a system configuration, connects to the BYNET, and is used to execute
application software other than Teradata Database software.
9. A _________ node is part of a system configuration, connects to the BYNET, and is used as a spare
node in the event of a node failure.

Teradata System Architecture

Page 7-31

Notes

Page 7-32

Teradata System Architecture

Module 8
Data Protection

After completing this module, you will be able to:
• Explain the concept of FALLBACK tables.
• List the types and levels of locking provided by Teradata.
• Describe the Recovery, Transient, and Permanent Journals and their function.
• List the utilities available for archive and recovery.

Teradata Proprietary and Confidential

Data Protection

Page 8-1

Notes

Page 8-2

Data Protection

Table of Contents
Data Protection Features .............................................................................................................. 8-4
Disk Arrays .................................................................................................................................. 8-6
RAID Technologies ..................................................................................................................... 8-8
RAID 1 – Mirroring ................................................................................................................... 8-10
RAID 10 – Striped Mirroring................................................................................................. 8-10
RAID 1 Summary ...................................................................................................................... 8-12
Cliques ....................................................................................................................................... 8-14
Large Cliques ......................................................................................................................... 8-14
Teradata Vproc Migration .......................................................................................................... 8-16
Hot Standby Nodes (HSN) ......................................................................................................... 8-18
Large Cliques ......................................................................................................................... 8-18
Performance Degradation with Node Failure ............................................................................ 8-20
Restarts ............................................................................................................................... 8-20
Fallback ...................................................................................................................................... 8-22
Fallback Clusters ........................................................................................................................ 8-24
Fallback and RAID Protection ................................................................................................... 8-26
Fallback and RAID 1 Example .................................................................................................. 8-28
Fallback and RAID 1 Example (cont.) ................................................................................... 8-30
Fallback and RAID 1 Example (cont.) ................................................................................... 8-32
Fallback and RAID 1 Example (cont.) ................................................................................... 8-34
Fallback and RAID 1 Example (cont.) ................................................................................... 8-36
Fallback vs. non-Fallback Tables Summary .............................................................................. 8-38
Clusters and Cliques................................................................................................................... 8-40
Locks .......................................................................................................................................... 8-42
Locking Modifier ....................................................................................................................... 8-44
ACCESS................................................................................................................................. 8-44
NOWAIT ............................................................................................................................... 8-44
Rules of Locking ........................................................................................................................ 8-46
Access Locks.............................................................................................................................. 8-48
Transient Journal ........................................................................................................................ 8-50
Recovery Journal for Down AMPs ............................................................................................ 8-52
Permanent Journal ...................................................................................................................... 8-54
Archiving and Recovering Data ................................................................................................. 8-56
Module 8: Review Questions ..................................................................................................... 8-58

Data Protection

Page 8-3

Data Protection Features
Disk Arrays – Disk arrays provide RAID 1, RAID 5, or RAID S data protection. If a disk
drive fails, the array subsystem provides continuous access to the data. Systems with disk
arrays are configured with redundant Fibre adapters, buses, and array controllers to provide
highly available access to the data.
Clique – a set of Teradata nodes that share a common set of disk arrays. In the event of
node failure, all vprocs can migrate to another available node in the clique. All nodes in the
clique must have access to the same disk arrays.
Locks – Locking prevents multiple users who are trying to change the same data at the same
time from violating the data's integrity. This concurrency control is implemented by locking
the desired data. Locks are automatically acquired during the processing of a request and
released at the termination of the request. In addition, users can specify locks. There are
four types of locks: Exclusive, Write, Read, and Access.
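Locks are normally applied automatically, but a request can downgrade or refuse to wait for a lock with a locking modifier. A minimal sketch (the table name is illustrative):

    -- Sketch: read through other users' Write locks with a dirty-read Access lock,
    -- or refuse to queue behind a conflicting lock instead of waiting.
    LOCKING TABLE Employee FOR ACCESS
    SELECT * FROM Employee;

    LOCKING TABLE Employee FOR READ NOWAIT
    SELECT * FROM Employee;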
Fallback – protects your data by storing a second copy of each row of a table on an
alternative “fallback AMP”. If an AMP fails, the system accesses the fallback rows to meet
requests. Fallback provides AMP fault tolerance at the table level. With Fallback tables, if
one AMP fails, all of the table data is still available. Users may continue to use Fallback
tables without any loss of available data.
Down-AMP Recovery Journal – started automatically when the system has a failed or
down AMP. Its purpose is to log any changes to rows which reside on the down AMP.
Transient Journal – exists to permit the successful rollback of a failed transaction.
Transactions are not committed to the database until an End Transaction request has been
received by the AMPs, either implicitly or explicitly. Until that time, there is always the
possibility that the transaction may fail in which case the participating table(s) must be
restored to their pre-transaction state.
Permanent Journal – provides selective or full database recovery to a specified point in
time by keeping either before-image or after-images of rows in a journal. It permits
recovery from unexpected hardware or software disasters.
ARC and NetVault/NetBackup – ARC command scripts provide the capability to backup
and restore the Teradata database. The NetVault and NetBackup utilities provide a GUI
based front-end for creation and execution of ARC command scripts.
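Fallback protection is requested in the table DDL; a minimal sketch with illustrative table and column names:

    -- Sketch: a table whose rows are also stored on a fallback AMP in the same cluster.
    CREATE TABLE Payroll_Tx, FALLBACK
        ( Tx_Id    INTEGER  NOT NULL
        , Emp_Nbr  INTEGER
        , Amount   DECIMAL(10,2) )
    PRIMARY INDEX (Tx_Id);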

Page 8-4

Data Protection

Data Protection Features

Facilities that provide system-level protection:

  Disk Arrays
    – RAID data protection (e.g., RAID 1)
    – Redundant SCSI and/or Fibre Channel buses and array controllers

  Cliques and Vproc Migration
    – SMP or O.S. failures – Vprocs can migrate to other nodes within the clique.

Facilities that provide Teradata DB protection:

  Fallback                   – provides data access with a “down” AMP
  Locks                      – provides data integrity
  Transient Journal          – automatic rollback of aborted transactions
  Down AMP Recovery Journal  – fast recovery of fallback rows for AMPs
  Permanent Journal          – optional before and after-image journaling
  ARC                        – Archive/Restore facility
  NetVault and NetBackup     – provide tape management and ARC script creation
                               and scheduling capabilities

Data Protection

Page 8-5

Disk Arrays
Disk arrays utilize a technology called RAID (Redundant Array of Independent Disks).
Spanning the entire spectrum from personal computers to mainframes, disk arrays (utilizing
RAID technology) offer significant improvements in availability, reliability and
maintainability of information storage, along with higher performance. Yet the concept
behind disk arrays is relatively simple.
A disk array subsystem consists of controller(s) which drive a set of disks. Typically, a disk
array is configured to represent a number of logical volumes (or disks), each of which
appears to be a physical disk to the user. A logical volume can be configured to reside on
multiple physical disks. The fact that a logical volume is located on 1 or more disks is
transparent to the user.
There is one immediate advantage of having the data spread across a number of individual
separate disks which arises from the redundant manner in which the data can be stored in the
disk array. The remarkable benefit of this feature is that if any single disk in the array fails,
the unit continues to function without loss of data. This is possible because redundancy
information is stored separate from the data. The redundancy information, as will be
explained, can be a copy of the data or other information that can be used to reconstruct any
data that was stored on a failed disk.
Secondly, performance increases for specific applications are possible as the effective seek
time for finding records on a given disk can potentially be reduced by allowing multiple
simultaneous accesses of different blocks on different disks. Alternatively, with a different
architecture, the rate at which data is transferred to and from the disk array can be increased
significantly over that of a single disk utilizing parallel reads and writes of the data spread
across the disks in the array. This function is referred to as “striping the data”.
Finally, disk array subsystem maintenance is typically simplified because it is possible to
replace (“hot swap”) individual disks and other components while the system continues to
function. You no longer have to bring down the system to replace a disk.

Page 8-6

Data Protection

Disk Arrays

[Diagram: applications and utilities run through the host operating system, which accesses
the disk array through redundant disk array controllers (DACs).]

Why Disk Arrays?

• High availability through data mirroring or data parity protection.
• Better I/O performance through implementation of RAID technology at the hardware
  level.
• Convenience – automatic disk recovery and data reconstruction when mirroring or
  data parity protection is used.

Data Protection

Page 8-7

RAID Technologies
RAID is an acronym for Redundant Array of Independent Disks. The term was coined in
1988 in a paper describing array configuration and application by researchers and authors
Patterson, Gibson and Katz of the University of California at Berkeley. The word redundant
implies that data, functions and/or components have been duplicated in the array’s
architecture. Duplication of data, functions, and hardware ensures that even in the event of a
failed drive or other components, data is not lost and is continuously available.
The industry currently has agreed upon six RAID configuration levels and designated them
as RAID 0 through RAID 5. The physical configuration is dictated to some extent by the
choice of RAID level; however, RAID conventions specify more precisely how data is
stored on disk.
  RAID 0    Data striping
  RAID 1    Disk mirroring
  RAID 2    Parallel array, hamming code
  RAID 3    Parallel array with parity
  RAID 4    Data parity protection, dedicated parity drive
  RAID 5    Data parity protection, interleaved parity

With Teradata, RAID 1 is most commonly used. RAID 5 (data parity protection) is also
available with some arrays.
There are other RAID technologies that are defined by specific vendors or are accepted in
the data processing industry. For example, RAID 10 or RAID 1+0 (or RAID 0+1) is
considered to be “striped mirroring”. RAID level classifications do not imply superiority of
one mode over another. Each mode has its rightful application. In fact, these modes of
operation can be combined within a single system configuration, within product limitations,
to obtain maximum flexibility and performance.
The advantages of RAID 1 (compared to RAID 5) include:

Superior Performance
• Mirroring provides the best read and write throughput.
• Maximizes the performance capabilities of controllers and disk drives.
• Best performance when a drive has failed.
• Less reconstruction impact when a drive has failed.

Superior Availability
• Less susceptible to a double disk failure in a RAID drive group.
• Faster reconstruction of a failed drive – shorter vulnerability period during
  reconstruction.

Superior Price/Performance – the performance advantage of RAID 1 outweighs the
additional cost for typical Teradata warehouses.
Page 8-8

Data Protection

RAID Technologies

RAID – Redundant Array of Independent Disks

RAID technology provides data protection at the disk drive level. With RAID 1 and RAID 5
technologies, access to the data is continuous even if a disk drive fails.

RAID technologies available with Teradata:

  RAID 1    Disk mirroring, used with NetApp (LSI Logic) and EMC2 Disk Arrays.
  RAID 5    Data parity protection, interleaved parity. RAID 5 provides more capacity,
            but less performance than RAID 1.

For Teradata:

  RAID 1    Most useful with typical Teradata data warehouses (e.g., Active Data
            Warehouses). Most frequently used RAID technology.
  RAID 5    Most useful when creating archival data warehouses that require less
            expensive storage and where performance is not as important.
            Not frequently used with Teradata systems (not covered in this class).

Data Protection

Page 8-9

RAID 1 – Mirroring
RAID 1 is data mirroring protection. The RAID 1 technology requires each primary data
disk to have a companion disk or mirror. The contents of the primary disk and the mirror
disk are identical.
When data is written on the primary disk, a write also occurs on the mirror disk. The
mirroring process is invisible to the user. For this reason, RAID 1 is also called transparent
mirroring.
With RAID solutions, mirroring is managed by the controller, which provides a higher level
of performance. Performance is improved because data can be read from either the primary
(data) drive or the mirror. The controller decides which read/write assembly (drive actuator)
is closest to the requested data.
If the primary data disk fails, the mirror disk can be accessed without data loss. There is a
minor performance penalty when a drive fails because the controller can no longer balance
reads across both drives of the pair. If either disk fails, the disk array controller can copy the
data from the remaining drive to a replacement drive while normal operations continue.

RAID 10 – Striped Mirroring
When user data is to be written to the array, the controller instructs the array to write a block
of data to one drive pair to the defined stripe depth. Subsequent data blocks are written
concurrently to contiguous sectors in the next drive pair to the defined stripe depth. In this
manner, data are striped across the array of drives, utilizing multiple drives and actuators.
With LSI Logic arrays, striped mirroring is automatic when you create a drive group (with
RAID 1 technology) that has multiple mirrored pairs of disks.
If an application (e.g., Teradata Database) uniformly distributes data, striped mirroring
(RAID 10 or 1+0) and mirroring (RAID 1) will have similar performance.
If an application (database) partitions data, striped mirroring (RAID 10) can lead to
performance gains over mirroring (RAID 1) because array controllers equally spread I/O’s
between channels in the array.

Striped Mirroring is NOT necessary with Teradata.

Page 8-10

Data Protection

RAID 1 – Mirroring
• 2 Drive Groups each with 1 mirrored pair of disks
• Operating system sees 2 logical disks (LUNs) or volumes
• If LUN 0 has more activity, more disk I/Os occur on the first two drives in the array.

[Diagram: a disk array controller presenting two LUNs. LUN 0 is a drive group of Disk 1 and Mirror 1; LUN 1 is a drive group of Disk 3 and Mirror 3. Each data block (A0–A3 on LUN 0, B0–B3 on LUN 1) is written to both the data disk and its mirror – 2 drive groups, each with 1 pair of mirrored disks.]

Notes:

• If the physical drives are 600 GB each, then each LUN or volume is effectively 600 GB.
• If both logical units (or volumes) are assigned to an AMP, then the AMP will have approximately
1.2* TB assigned to it.
* Actual MaxPerm space will be a little less.

Data Protection

Page 8-11

RAID 1 Summary
RAID 1 characteristics include:
•	Data is fully replicated
•	Easy to understand technology
•	Follows a traditional approach
•	Transparent to the operating system
•	Redundant drive is affected only by write operations

RAID 1 advantages include:
•	High I/O rate (small logical block size)
•	Maximum data availability
•	Minor performance penalty with single drive failure
•	No performance penalty in write intensive environments

RAID 1 disadvantage is:
•	Only 50% of total disk space is available for user data. Therefore, RAID 1 has
	50% overhead in disk space usage.

Summary
•	RAID 1 provides high data availability and performance, but storage costs are high.
•	Striped mirroring is not necessary with Teradata.

RAID 1 for Teradata – most useful with typical Teradata data warehouses (e.g., Active
Data Warehouses).
RAID 5 for Teradata – most useful when creating archival data warehouses that
require less expensive storage and where performance is not as important.

Page 8-12

Data Protection

RAID 1 Summary
Characteristics
• data is fully replicated
• striped mirroring is possible with multiple pairs of disks in a drive group
• transparent to operating system

Advantages (compared to RAID 5)
• Provides maximum data availability
• Mirroring provides the best read and write throughput
• Maximizes the performance capabilities of controllers and disk drives
• Minimal performance issues when a drive has failed
• Less reconstruction impact when a drive has failed

Disadvantage
• 50% of disk space is used for mirrored data

Summary
• RAID 1 provides best data availability and performance, but storage costs are higher.
• Striped Mirroring is NOT necessary with Teradata.

Data Protection

Page 8-13

Cliques
A clique is a set of Teradata nodes that share a common set of disk arrays. In the event of
node failure, all vprocs can migrate to available nodes in the clique. All nodes in the clique
must have access to the same disk arrays.
The illustration on the facing page shows a three-node clique. In this example, each node
has 24 AMP vprocs.
In the event of a node failing, the remaining nodes will attempt to absorb all vprocs from the
failed node.

Large Cliques
A large clique is usually a set of 8 Teradata nodes that share a common set of disk arrays
via a set of Fibre Channel switches. In the event of a node failure, AMP vprocs can migrate
to the other available nodes in the clique. In this case, work is distributed among 7 nodes
and the performance degradation is approximately 14%.
After the failed node is recovered/repaired and rebooted, a second restart of Teradata is
needed to reuse the node that had failed. The restart will redistribute the AMPs to the
recovered node.
Acronyms:
DAC – Disk Array Controller

Page 8-14

Data Protection

Cliques
Clique – a set of SMPs that share a common set of disk arrays.
[Diagram: a three-node clique – SMP001-2 (AMPs 0–23), SMP001-3 (AMPs 24–47), and SMP001-4 (AMPs 48–71) – with every node connected to a common set of disk arrays through redundant disk array controllers (DAC-A and DAC-B).]

Example of a 2650 clique (3 nodes, no HSN) – 24 AMPs/node.

Data Protection

Page 8-15

Teradata Vproc Migration
If a TPA node (running Teradata) fails, Teradata restarts and the AMP vprocs that were
executing on the failed node are started on other nodes within the clique.
PE vprocs that are assigned to channel connections do not migrate to another node. PE
vprocs that are assigned to gateway connections may or may not (depending on
configuration) migrate to another node within the clique.
If a node fails, the vprocs from the failed node are distributed between the remaining nodes
in the clique. The vconfig.out file determines the node on which vprocs will start if all of
the nodes in the clique are available.
The following is from a “Get Config” command following the failure of SMP001-4.
DBS LOGICAL CONFIGURATION
-----------------------------------------------------------------------------------------------
Vproc    Rel.     Node               Crash   Vproc     Config   Config   Cluster/   RcvJrnl/
Number   Vproc#   ID      Movable    Count   State     Status   Type     Host No.   Host Type
------   ------   ------  -------    -----   -------   ------   ------   --------   ---------
  0*        1     1-02    Yes        0       ONLINE    Online   AMP         0       On
  1         2     1-02    Yes        0       ONLINE    Online   AMP         1       On
  2         3     1-02    Yes        0       ONLINE    Online   AMP         2       On
  3         4     1-02    Yes        0       ONLINE    Online   AMP         3       On
  :         :     :       :          :       :         :        :           :       :
 22        23     1-02    Yes        0       ONLINE    Online   AMP        22       On
 23        24     1-02    Yes        0       ONLINE    Online   AMP        23       On
 24         1     1-03    Yes        0       ONLINE    Online   AMP        24       On
 25         2     1-03    Yes        0       ONLINE    Online   AMP        25       On
 26         3     1-03    Yes        0       ONLINE    Online   AMP        26       On
  :         :     :       :          :       :         :        :           :       :
 46        23     1-03    Yes        0       ONLINE    Online   AMP        46       On
 47        24     1-03    Yes        0       ONLINE    Online   AMP        47       On
 48        25     1-02    Yes        0       ONLINE    Online   AMP        48       On
 49        26     1-02    Yes        0       ONLINE    Online   AMP        49       On
 50        27     1-02    Yes        0       ONLINE    Online   AMP        50       On
  :         :     :       :          :       :         :        :           :       :
 59        36     1-02    Yes        0       ONLINE    Online   AMP        59       On
 60        25     1-03    Yes        0       ONLINE    Online   AMP        60       On
 61        26     1-03    Yes        0       ONLINE    Online   AMP        61       On
 62        27     1-03    Yes        0       ONLINE    Online   AMP        62       On
  :         :     :       :          :       :         :        :           :       :
 71        36     1-03    Yes        0       ONLINE    Online   AMP        71       On

Page 8-16

Data Protection

Teradata Vproc Migration
Clique – a set of SMPs that share a common set of disk arrays.
[Diagram: the same three-node clique (SMP001-2, SMP001-3, SMP001-4) with 24 AMPs per node sharing disk arrays through DAC-A and DAC-B. SMP001-4 fails, and its AMPs (48–71) are restarted on the two remaining nodes. This example illustrates vproc migration without the use of Hot Standby Nodes.]

• After vproc migration, the two remaining nodes each have 36 AMPs.
• After the failed node is repaired, a second restart is needed for the failed node to rejoin the
  configuration.

Data Protection

Page 8-17

Hot Standby Nodes (HSN)
A Hot Standby Node (HSN) is a node that is part of a clique but is not initially configured
to execute any Teradata vprocs. If a node in the clique fails, the AMPs from the failed node
move to the hot standby node and the performance degradation is 0%.
When the failed node is recovered/repaired and restarted, it becomes the new hot standby
node. A second restart of Teradata is not needed.
Characteristics of a hot standby node are:
•	A node that is a member of a clique.
•	Does not normally participate in the trusted parallel application (TPA).
•	Can be brought into the TPA to compensate for the loss of a node in the clique.

Hot Standby Nodes are positioned as a performance continuity feature.

Large Cliques
A large clique can also utilize a Hot Standby Node (HSN).
For example, an 8-node large clique with a Hot Standby Node would consist of 7 nodes
running Teradata and 1 Hot Standby Node. The performance degradation would be 0% for
an all-AMP operation when a node fails in the clique. This configuration is often referred to
as a 7+1 configuration.
Large Clique configurations have not been supported since the introduction of the 5500.

Page 8-18

Data Protection

Hot Standby Nodes (HSN)
[Diagram: a clique of TPA nodes (Node 1 through Node 8) plus a Hot Standby Node (HSN), all attached to the same disk arrays. Node 1 fails (X) and its AMPs are moved to the HSN; the AMPs on the other nodes are unaffected. This example illustrates vproc migration using a Hot Standby Node.]

1. Performance Degradation is 0% as AMPs are moved to the Hot Standby Node.
2. When Node 1 is recovered, it becomes the new Hot Standby Node.

Data Protection

Page 8-19

Performance Degradation with Node Failure
The facing page displays 2 examples of the performance degradation with all-AMP
operations that occur when a node fails. Note: WL - Workload
The top example illustrates two 3-node cliques and the performance degradation of 50% for
an all-AMP operation when a node fails in one of the cliques.
From a simple perspective, if you have 3 nodes in a clique and you lose a node, you would
logically think 33% performance degradation. In reality, the performance cost or
degradation is 50%. Assume 3 nodes, 72 AMPs, and you execute an all-AMPs query. This
query uses 240 CPU seconds per node to complete the query. The 3 nodes use a total of 720
CPU seconds to do the work. Another way to look at it is that each AMP needs 10 CPU
seconds, or 72 AMPs x 10 CPU seconds equals 720 CPU seconds of work to be done.
A node fails and now there are 2 nodes to complete the query. There are still 72 AMPs and
the query still needs 720 CPU seconds to complete, but now there are only 2
nodes. Each node will need about 360 CPU seconds to complete the query. Each node has
about 50% more work to do. This is why it is considered a 50% performance cost.
Another way of looking at a query is from the response time back to the user. From a user
perspective, let’s assume that the response time back to the user with all 3 nodes normally active
is 1 minute (60 seconds) of wall clock time. The wall clock response time with only 2 active
nodes is 90 seconds. (Since there are fewer nodes, the query is going to take longer to
complete.) From the user perspective, the response time is 50% longer (30/60).
It is true that if you take 67% of 90, you will get 60 and you may think that the degradation
is 33%. However, 90 seconds is not the normal response time. The normal response time is
60 seconds and the exception is 90 seconds; therefore the performance is worse by
50%. The percentage is calculated from the “normal”.
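Written as formulas, the same numbers give:

\[
\frac{720 \text{ CPU seconds}}{2 \text{ nodes}} = 360 \text{ CPU seconds per node}
\qquad \text{vs. } 240 \text{ CPU seconds per node normally}
\]
\[
\text{performance cost} \;=\; \frac{360-240}{240} \;=\; \frac{90-60}{60} \;=\; 50\%
\]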
The bottom example illustrates two 3-node cliques, each with a hot standby node (2 TPA
nodes) and the performance degradation of 0% for an all-AMP operation when a node fails
in the clique. This configuration is often referred to as a 2+1 configuration.

Restarts
In the first (top) example, after the failed node is recovered/repaired and rebooted, a second
restart of Teradata is needed to reuse the node that had failed. The restart will redistribute
the AMPs to the recovered node.
With a hot standby node, when the failed node is recovered/repaired and restarted, it
becomes the new hot standby node within the clique. A second restart of Teradata is not
needed.

Page 8-20

Data Protection

Performance Degradation with Node Failure
2 Cliques without HSN nodes (3 nodes each) – performance degradation of 50% with node failure.
   Before the failure: Workload = 6.0, Clique WL = 3.0, Node WL = 1.0 (Nodes 1–6, 1.0 each).
   After Node 1 fails (X): Workload = 6.0; Clique 1 WL = 3.0 is split across Nodes 2 and 3
   (Node WL = 1.5 each); Clique 2 Nodes 4–6 remain at 1.0.
   When a node fails, Teradata restarts.
   After the node is repaired, a second restart of Teradata is required to allow the
   node to rejoin the configuration.

2 Cliques each with a HSN (3+1 nodes) – performance degradation of 0% with node failure.
   Before the failure: Workload = 6.0, Clique WL = 3.0, Node WL = 1.0
   (three TPA nodes plus an HSN per clique).
   After Node 1 fails (X): the Clique 1 HSN takes over Node 1's AMPs, so every active node
   still has a WL of 1.0.
   When a node fails, Teradata restarts.
   After the node is repaired, it becomes the new Hot Standby Node. A second
   restart of Teradata is not required.

Data Protection

Page 8-21

Fallback
Fallback protects your data by storing a second copy of each row of a table on an alternative
“fallback AMP”. If an AMP fails, the system accesses the fallback rows to meet requests.
Fallback provides AMP fault tolerance at the table level. With Fallback tables, if one AMP
fails, all of the table data is still available. Users may continue to use Fallback tables
without any loss of available data.
When a table is created, or any time after its creation, the user may specify whether or not
the system should keep a fallback copy. If Fallback is specified, it is automatic and
transparent to the user.
Fallback guarantees that the two copies of a row will always be on different AMPs.
Therefore, if either AMP fails, the alternate row copy is still available on the other AMP.
Certainly there is a benefit to protecting your data. However, there are costs associated with
that benefit. They are: twice the disk space for storage and twice the I/O for Inserts,
Updates, and Deletes. (However, the Fallback option does not require any extra I/O for
SELECT operations and the fallback I/O will be performed in parallel with the primary I/O.)
The benefits of Fallback include protecting your data from hardware (disk) failure,
protecting your data from software (node) failure, automatic recovery and minimum
recovery time after repairs or fixes are complete.

A hardware (disk) or software (vproc) failure causes an AMP to be taken off-line
until the problem is corrected.
During this period, Fallback tables are fully available to users.
When the AMP is brought back on-line, the associated Vdisk is refreshed to
reflect any changes during the off-line period.
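As a sketch of how this is specified in SQL (the database, table, and column names below are
hypothetical, not objects from this course):

	CREATE TABLE Payroll.Employee, FALLBACK
	  (Employee_Number   INTEGER NOT NULL,
	   Last_Name         CHAR(20),
	   Salary_Amount     DECIMAL(10,2))
	UNIQUE PRIMARY INDEX (Employee_Number);

	ALTER TABLE Payroll.Employee, NO FALLBACK;   /* stop keeping fallback copies          */
	ALTER TABLE Payroll.Employee, FALLBACK;      /* resume; fallback rows are built again */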

Page 8-22

Data Protection

Fallback
A Fallback table is fully available in the event of an unavailable AMP.
A Fallback row is a copy of a “Primary row” which is stored on a different AMP.
[Diagram: four AMPs in two 2-AMP clusters. Each primary row is stored on one AMP, and its fallback copy is stored on the other AMP in the same cluster.]

Benefits of Fallback
• Permits access to table data during AMP off-line period.
• Adds a level of data protection beyond disk array RAID.
• Automatic restore of data changed during AMP off-line.
• Critical for high availability applications.

Cost of Fallback
• Twice the disk space for table storage.
• Twice the I/O for Inserts, Updates and Deletes.

Loss of any two AMPs in a cluster causes RDBMS to halt!

Data Protection

Page 8-23

Fallback Clusters
A cluster is a group of AMPs that act as a single fallback unit. Clustering has no effect on
the distribution of the Primary rows of a table. The Fallback row copy however, will always
go to a different AMP in the same cluster.
The cluster size is set when Teradata is configured, and the only choice for new systems is
2-AMP clusters. Years ago, AMP clusters ranged from 2 to 16 AMPs per cluster and were
commonly set as groups of 4 AMPs. Starting with 5450 systems, all clusters are defined as
2-AMP clusters.
Should an AMP fail, the primary and fallback row copies stored on that AMP cannot be
accessed. However, their alternate copies are available through the other AMPs in the same
cluster.
The loss of an AMP in a cluster has no effect upon other clusters. It is possible to lose one
AMP in each cluster and still have full access to all Fallback-protected table data. If both
AMPs fail in a cluster, then Teradata halts.
While an AMP is down, the remaining AMPs in the cluster must do their own work plus the
work of the down AMP.
A small cluster size (e.g., a 2-AMP cluster) reduces the chance of having 2 down AMPs in a
single cluster, which would cause a non-operational configuration. With today’s new
systems, a typical cluster size of 2 AMPs provides the best option to maximize availability.

Page 8-24

Data Protection

Fallback Clusters
• A Fallback cluster is a defined set of 2 AMPs across which fallback is implemented.
• Loss of one AMP in the cluster permits continued table access.
• Loss of two AMPs in the cluster causes the RDBMS to halt.
[Diagram: eight AMPs (AMP 0 – AMP 7) arranged as four 2-AMP clusters – Cluster 0: AMPs 0 and 4, Cluster 1: AMPs 1 and 5, Cluster 2: AMPs 2 and 6, Cluster 3: AMPs 3 and 7. Each AMP holds its own primary rows plus the fallback copies of the primary rows stored on the other AMP in its cluster.]

Data Protection

Page 8-25

Fallback and RAID Protection
RAID 1 mirroring and RAID 5 data parity protection provide protection in the event of a
disk drive failure.
Fallback provides another level of data protection beyond disk mirroring or data parity
protection.
Examples of other failures that Fallback provides protection against include:
•	Multiple drive failures in the same drive group
•	An array is not available (e.g., both disk array controllers fail in the disk array)
•	An AMP is not available (e.g., a software problem)

Page 8-26

Data Protection

Fallback and RAID Protection
• RAID 1 Mirroring or RAID 5 Data Parity Protection provides protection in the
event of disk drive failure.

– Provides protection at a hardware level
– Teradata is unaware of the RAID technology used

• Fallback provides an additional level of data protection and provides access
to data when an AMP is not available (not online).

• Additional types of failures that Fallback protects against include:
– Multiple drives fail in the same drive group,
– Disk array is not available
• Both disk array controllers fail in a disk array
• Two of the three power supplies fail in a disk array

– AMP is not available (e.g., software or data error)
• The combination of RAID 1 and Fallback provides the highest level of
availability.

Data Protection

Page 8-27

Fallback and RAID 1 Example
The next set of pages contains an example of how Fallback and RAID 1 Mirroring work
together.

Page 8-28

Data Protection

Fallback and RAID 1 Example
This example assumes that RAID 1 Mirroring is used and the table is fallback protected.
[Diagram: four AMPs (AMP 0 – AMP 3). Each AMP's Vdisk contains a set of primary rows and the fallback copies of rows whose primary copies reside on the other AMP in its cluster. Each Vdisk is mapped onto a RAID 1 mirrored pair of physical disk drives, so every primary and fallback row exists on two physical drives.]

Data Protection

Page 8-29

Fallback and RAID 1 Example (cont.)
The example of how Fallback and RAID 1 Mirroring work together is continued.
In this example, one disk drive has failed in the first drive group. Is Fallback needed? No.
As a matter of fact, Teradata doesn’t even realize that the drive has failed. The disk array
continues to provide access to the data directly from the second disk drive in the drive
group. The disk array controller will send a “fault” or error message to the AWS.

Page 8-30

Data Protection

Fallback and RAID 1 Example (cont.)
Assume one disk drive fails. Is Fallback needed in this example?
[Diagram: the same 4-AMP, RAID 1 mirrored-pair configuration as on page 8-29, with one physical drive failed in the first mirrored pair. The surviving mirror drive continues to service all reads and writes for that AMP.]

Data Protection

Page 8-31

Fallback and RAID 1 Example (cont.)
The example of how Fallback and RAID 1 Mirroring work together is continued.
In this example, assume two disk drives have failed – one in the first drive group and one in
the third drive group. Is Fallback needed? No. Like before, Teradata doesn’t even realize
that the drives have failed. The disk array continues to provide access to the data directly
from the second disk drive each of the drive groups. The disk array controller will send
“fault” or error messages to the AWS.

Page 8-32

Data Protection

Fallback and RAID 1 Example (cont.)
Assume two disk drives have failed. Is Fallback needed in this example?
[Diagram: the same 4-AMP, RAID 1 configuration with two failed drives – one in the first mirrored pair and one in the third. Each failure is in a different drive group, so the surviving mirror drives continue to provide all of the data.]

Data Protection

Page 8-33

Fallback and RAID 1 Example (cont.)
The example of how Fallback and RAID 1 Mirroring work together is continued.
In this example, assume two disk drives have failed – both failed drives are in the first drive
group. Is Fallback needed? Yes, if you need to access the data in this table. When multiple
disk drives fail in a drive group, the data (Vdisk) is not available and the AMP goes into a
FATAL state. At this point, Teradata does realize that an AMP is not available and Teradata
restarts. The disk array controller will send “fault” or error messages to the AWS.
The AWS will also get “fault” messages indicating that Teradata has restarted.

Page 8-34

Data Protection

Fallback and RAID 1 Example (cont.)
Assume two disk drives have failed in the same drive group. Is Fallback needed?
[Diagram: the same 4-AMP, RAID 1 configuration with both drives failed in the first mirrored pair. That AMP's Vdisk is unavailable, so its primary and fallback rows can only be reached through the Fallback copies on the other AMP in its cluster.]

Data Protection

Page 8-35

Fallback and RAID 1 Example (cont.)
The example of how Fallback and RAID 1 Mirroring work together is continued.
In this example, assume three disk drives have failed – two failed drives are in the first drive
group and one failed drive is in the third drive group. Is Fallback needed? Yes, if you need
to access the data in this table. When multiple disk drives fail in a drive group, the data
(Vdisk) is not available and the AMP goes into a FATAL state. However, the third AMP is
still operational and online.

Page 8-36

Data Protection

Fallback and RAID 1 Example (cont.)
Assume three disk drive failures. Is Fallback needed? Is the data still available?
[Diagram: the same 4-AMP, RAID 1 configuration with three failed drives – both drives in the first mirrored pair and one drive in the third pair. The first AMP is down and relies on Fallback; the third AMP remains online using its surviving mirror drive.]

Data Protection

Page 8-37

Fallback vs. non-Fallback Tables Summary
Fallback tables have a major advantage in terms of availability and recoverability. They
can withstand an AMP failure in each cluster and maintain full data availability. A second
AMP failure in any cluster results in a system halt. A manual restart of the system is
required in this circumstance.
Non-Fallback tables are affected by the loss of any one AMP. The table continues to be
accessible, but only for those AMPs that are still on-line. A one-AMP Primary Index access
is possible, but a full table scan is not.
Fallback tables are easily recovered after a failure due to the availability of Fallback rows.
Non-Fallback tables may only be restored from external medium in the event of a disaster.

Page 8-38

Data Protection

Fallback vs. non-Fallback Tables Summary
FALLBACK TABLES
• One AMP down – data fully available.
• Two or more AMPs down – if in different clusters, data fully available;
  if in the same cluster, Teradata halts.

Non-FALLBACK TABLES
• One AMP down – data partially available; queries that avoid the down AMP succeed.
• Two or more AMPs down – if in different clusters, data partially available
  (queries that avoid the down AMPs succeed); if in the same cluster, Teradata halts.

Data Protection

Page 8-39

Clusters and Cliques
As you know, a cluster is a group of AMPs that act as a single fallback unit. A clique is a
set of Teradata nodes that share a common set of disk arrays. Clusters provide data access
protection in the event of an AMP failure (usually because of a Vdisk failure). Cliques
provide protection from SMP node failures.
The best availability for Teradata is to spread clusters across different cliques. The “Default
Cluster” function of the CONFIG utility does this automatically.
The example on the facing page illustrates a 4+2 node system. Each clique consists of 3
nodes (2 TPA plus one Hot Standby Node – HSN) connected to a set of disk arrays with 240
disks. This example assumes each node is configured with 30 AMPs.

Page 8-40

Data Protection

Clusters and Cliques
[Diagram: a 4+2 node system. Clique 0 consists of SMP001-7 (AMPs 0–29), SMP002-6 (AMPs 30–59), and Hot-Standby Node SMP002-7, all attached to 240 disks in disk arrays. Clique 1 consists of SMP003-7 (AMPs 60–89), SMP004-6 (AMPs 90–119), and Hot-Standby Node SMP004-7, attached to another 240 disks in disk arrays.]

Cluster 0 – AMPs 0 and 60

Cluster 1 – AMPs 1 and 61

To provide the highest availability, the goal is to interleave clusters across cliques and
cabinets.

Data Protection

Page 8-41

Locks
Locking prevents multiple users who are trying to change the same data at the same time
from violating the data's integrity. This concurrency control is implemented by locking the
desired data. Locks are automatically acquired during the processing of a request and
released at the termination of the request. In addition, users can specify locks.
There are four types of locks: Exclusive, Write, Read, and Access.
Exclusive locks are only applied to databases or tables, never to rows. They are the most
restrictive type of lock; all other users are locked out. Exclusive locks are used rarely, most
often when structural changes are being made to the database.
Write locks enable users to modify data while locking out all other users except readers not
concerned about data consistency (Access lock readers). Until a Write lock is released, no
new read or write locks are allowed.
Read locks are used to ensure consistency during read operations. Several users may hold
concurrent read locks on the same data, during which no modification of the data is
permitted.
Access locks can be specified by users who are not concerned about data consistency. The
use of an access lock allows for reading data while modifications are in process. Access
locks are designed for decision support on large tables that are updated only by small single-row changes. Access locks are sometimes called “stale read” locks, i.e., you may get ‘stale
data’ that hasn’t been updated.
Three levels of database locking are provided:

	Database	– locks all objects in the database
	Table   	– locks all rows in the table or view
	Row Hash	– locks all rows with the same row hash

The type and level of locks are automatically chosen based on the type of SQL command
issued. The user has, in some cases, the ability to upgrade or downgrade the lock.
For example, if an SQL UPDATE command is executed without a WHERE clause, a
WRITE lock is placed on the table. If an SQL UPDATE command is executed with a
WHERE clause that specifies a Primary Index value, then a row hash lock is used.
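For example, assuming a hypothetical Employee table whose Primary Index is Employee_Number:

	SELECT * FROM Employee;                              /* table-level READ lock  */

	UPDATE Employee
	SET    Salary_Amount = Salary_Amount * 1.03;         /* table-level WRITE lock */

	UPDATE Employee
	SET    Salary_Amount = Salary_Amount * 1.03
	WHERE  Employee_Number = 1023;                       /* row hash WRITE lock    */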

Page 8-42

Data Protection

Locks
There are four types of locks:

	Exclusive	– prevents any other type of concurrent access
	Write    	– prevents other reads, writes, exclusives
	Read     	– prevents writes and exclusives
	Access   	– prevents exclusive only

Locks may be applied at three levels:

	Database  	– applies to all tables/views in the database
	Table/View	– applies to all rows in the table/views
	Row Hash  	– applies to all rows with same row hash

Lock types are automatically applied based on the SQL command:

	SELECT      	– applies a Read lock
	UPDATE      	– applies a Write lock
	CREATE TABLE	– applies an Exclusive lock

Data Protection

Page 8-43

Locking Modifier

This option precedes an SQL statement and locks a database, table, view, or row hash. The
locking modifier overrides the default usage lock that Teradata places on a database, table,
view, or row hash in response to a request.
Note: The DROP TABLE access right is required on the table in order to upgrade a
READ or WRITE LOCK to an EXCLUSIVE LOCK.

ACCESS
Access locks have many advantages. They allow quick access to data, even if other
requests are updating the data. They also have minimal effect on locking out others – when
you use an access lock, virtually all requests are compatible with your lock except exclusive
locks.

NOWAIT
If a resource is locked and an application does not want to wait for that lock to be released,
the Locking Modifier NOWAIT option can be used. The NOWAIT option indicates that if
the lock cannot be obtained, then the statement will be aborted.
This option is used in situations where it is not desirable to have a statement wait for
resources, possibly also tying up resources in the process of waiting.
Example:
LOCKING TABLE tablename FOR WRITE NOWAIT UPDATE ….. ;
*** Failure 7423 Object already locked and NOWAIT.
Transaction Aborted. Statement# 1, Info =0

The user is informed with a 7423 error status code that indicates the lock could not be placed
due to an existing, conflicting lock.
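A common way to put the access-lock modifier to work (a sketch; the view and table names are
hypothetical) is to build it into a view, so that reporting queries read through write locks instead
of queuing behind them:

	CREATE VIEW Payroll.Employee_V AS
	  LOCKING ROW FOR ACCESS
	  SELECT Employee_Number, Last_Name, Salary_Amount
	  FROM   Payroll.Employee;

Queries against Payroll.Employee_V then run with an access lock automatically, at the cost of
possible “stale read” results while updates are in flight.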

Page 8-44

Data Protection

Locking Modifier
The locking modifier overrides the default usage lock that Teradata places on a
database, table, view, or row hash in response to a request.
Certain locks can be upgraded or downgraded:
LOCKING ROW FOR ACCESS

SELECT * FROM Table_A;

An “Access Lock” allows the user to access (read) an object that has a READ or
WRITE lock associated with it.
In this example, even though an access row lock was requested, a table level
access lock will be issued because the SELECT causes a full table scan.
Note: A "Locking Row" request must be followed by a SELECT.
LOCKING TABLE Table_B FOR EXCLUSIVE

UPDATE Table_B SET A = 2011;

This request asks for an exclusive lock, effectively upgrading the lock.
LOCKING TABLE Table_C FOR WRITE NOWAIT UPDATE Table_C SET A = 2012;
The NOWAIT option is used if you do not want your transaction to wait in a queue.
NOWAIT effectively says to abort the transaction if the locking manager cannot
immediately place the necessary lock. Error code 7423 is returned if the lock
request is not granted.

Data Protection

Page 8-45

Rules of Locking
As the facing page illustrates, a new lock request must wait (queue) behind other
incompatible locks that are either in queue or in effect. The new Read lock must wait until
the write lock ahead of it is released before it goes into effect.
In the second example, the second Read lock request may occupy the same position in the
queue as the Read lock that was already there. When the current Write lock is released, both
requests may be given access concurrently. This only happens when locks are compatible.
When an SQL statement provides row hash information, a row hash lock will be used. If
multiple row hashes within the table are affected, a table lock is used.

Page 8-46

Data Protection

Rules of Locking
Rule: Lock requests are queued behind all outstanding incompatible lock requests for the same object.

LOCK                                LOCK LEVEL HELD
REQUEST      NONE       ACCESS     READ       WRITE      EXCLUSIVE
---------    -------    -------    -------    -------    ---------
ACCESS       Granted    Granted    Granted    Granted    Queued
READ         Granted    Granted    Granted    Queued     Queued
WRITE        Granted    Granted    Queued     Queued     Queued
EXCLUSIVE    Granted    Queued     Queued     Queued     Queued

Example 1 – New READ lock request goes to the end of the queue.
	Before: current lock = READ; lock queue = WRITE.
	The new READ request is queued behind the WRITE.
	After:  current lock = READ; lock queue = WRITE, then READ.

Example 2 – New READ lock request shares a slot in the queue.
	Before: current lock = WRITE; lock queue = READ.
	The new READ request shares the queued READ's slot (compatible locks can be granted together).
	After:  current lock = WRITE; lock queue = READ + READ (granted concurrently when the WRITE is released).

Data Protection

Page 8-47

Access Locks
Access locks have many advantages. They allow quick access to data, even if other requests
are updating the data. They also have minimal effect on locking out others – when you use
an access lock, virtually all requests are compatible with yours.
When doing large aggregations of numbers, it may be inconsequential if certain rows are
being updated during the summation, particularly if one is only looking for approximate
totals. Access locks are ideal for this situation.
Looking at Example 3, what happens to the Write lock request when the Read lock goes
away? Looking at the chart, it will be “Granted” since Write and Access are considered
compatible.
Another example not shown on the facing page:
Assume user1 is in ANSI mode and has updated a row, but hasn't entered COMMIT
yet. The actual row in the table is updated on disk; the before-image is located in the TJ
of the WAL log in case user1 decides to ROLLBACK.
If user2 accesses this row with an access lock, the updated row on disk is returned even though it is locked and not committed yet. Assume user1 issues a ROLLBACK,
then the before-image in the TJ is used to rollback the row on disk. If user2 selects the
row a second time, user2 will get the row (original) that is now on disk.
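A sketch of that sequence (the Accounts table and its column names are hypothetical):

	/* Session 1 (ANSI mode) */
	UPDATE Accounts SET Balance = Balance - 100
	WHERE  Account_Number = 1;          /* row changed on disk; before-image kept in the TJ */

	/* Session 2 */
	LOCKING ROW FOR ACCESS
	SELECT Balance FROM Accounts
	WHERE  Account_Number = 1;          /* may return the uncommitted, updated value */

	/* Session 1 */
	ROLLBACK;                           /* TJ before-image restores the original row */

	/* Session 2 – the same SELECT now returns the original value */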

Page 8-48

Data Protection

Access Locks
Rule: Lock requests are queued behind all outstanding incompatible lock requests for the same object.

LOCK                                LOCK LEVEL HELD
REQUEST      NONE       ACCESS     READ       WRITE      EXCLUSIVE
---------    -------    -------    -------    -------    ---------
ACCESS       Granted    Granted    Granted    Granted    Queued
READ         Granted    Granted    Granted    Queued     Queued
WRITE        Granted    Granted    Queued     Queued     Queued
EXCLUSIVE    Granted    Queued     Queued     Queued     Queued

Example 3 – New ACCESS lock request granted immediately.
	Before: current lock = READ; lock queue = WRITE.
	The new ACCESS request is compatible with the READ currently held, so it is granted
	immediately without waiting behind the queued WRITE.
	After:  current locks = READ and ACCESS; lock queue = WRITE.

Advantages of Access Locks
Permit quicker access to table in multi-user environment.
Have minimal ‘blocking’ effect on other queries.
Very useful for aggregating large numbers of rows.
Disadvantages of Access Locks
May produce erroneous results if used during table maintenance.

Data Protection

Page 8-49

Transient Journal
The Transient Journal exists to permit the successful rollback of a failed transaction.
Transactions are not committed to the database until an End Transaction request has been
received by the AMPs, either implicitly or explicitly. Until that time, there is always the
possibility that the transaction may fail in which case the participating table(s) must be
restored to their pre-transaction state.
The Transient Journal maintains a copy of all before images of all rows affected by the
transaction. In the event of transaction failure, the before images are reapplied to the
affected tables, the images are deleted from the journal, and the rollback operation is
completed. In the event of transaction success, at the point of transaction commit, the before
images for the transaction are discarded from the journal.
In summary, if a transaction fails (for whatever reason), the before images in the transient
journal are used to return the data (in the tables involved in the transaction) to its original
state.

Page 8-50

Data Protection

Transient Journal
Transient Journal – provides transaction integrity

•	A journal of transaction “before images” (UNDO rows) maintained within WAL.
•	Provides for automatic rollback in the event of TXN failure.
•	Is automatic and transparent.
•	“Before images” are reapplied to table if a transaction fails.
•	“Before images” are discarded upon transaction completion.

Successful TXN
BEGIN TRANSACTION
UPDATE Row A
– Before image Row A recorded (Add $100 to checking)
UPDATE Row B
– Before image Row B recorded (Subtract $100 from savings)
END TRANSACTION
– Discard before images

Failed TXN
BEGIN TRANSACTION
UPDATE Row A
UPDATE Row B
(Failure occurs)
(Rollback occurs)
(Terminate TXN)

Data Protection

– Before image Row A recorded
– Before image Row B recorded
– Reapply before images
– Discard before images

Page 8-51

Recovery Journal for Down AMPs
After the loss of any AMP, a Down-AMP Recovery Journal is started automatically. Its
purpose is to log any changes to rows which reside on the down AMP. Any inserts, updates,
or deletes affecting rows on the down AMP, are applied to the Fallback copy within the
cluster. The AMP that holds the Fallback copy logs the Row ID in its Recovery Journal.
This process continues until such time as the down AMP is brought back on-line. As part of
restart activity, the Recovery Journal is read and changed rows are applied to the recovered
AMP. When the journal has been exhausted, it is discarded and those tables that are
fallback-protected are fully recovered.

Page 8-52

Data Protection

Recovery Journal for Down AMPs
Recovery Journal is:

Automatically activated when an AMP is taken off-line.
Maintained by the other AMPs in a cluster.
Totally transparent to users of the system.

While AMP is off-line

Journal is active.
Table updates continue as normal.
Journal logs Row IDs of changed rows for down-AMP.

When AMP is back on-line

Restores rows on recovered AMP to current status.
Journal discarded when recovery complete.

[Diagram: four AMPs in two clusters with one AMP off-line. Inserts, updates, and deletes for rows on the down AMP are applied to the fallback copies in the cluster, and the other AMPs log the Table ID / Row IDs of the changed rows in the Recovery Journal (e.g., TableID/RowID – 62, TableID/RowID – 5).]

Data Protection

Page 8-53

Permanent Journal
The purpose of the Permanent Journal is to provide selective or full database recovery to a
specified point in time. It permits recovery from unexpected hardware or software disasters.
The Permanent Journal also has the effect of reducing the need for full table backups which
can be costly both in time and resources.
The Permanent Journal is an optional journal and its features must be customized to the
specific needs of the installation. The journal may capture before images (for rollback),
after images (for rollforward), or both. Additionally, the user must specify if single images
(default) or dual images (for fault-tolerance) are to be captured.
A Permanent Journal may be shared by multiple tables or multiple databases.
The journal captures images concurrently with standard table maintenance and query
activity. The cost in additional required disk space may be calculated in advance to ensure
adequate disk reserve.
The journal is periodically dumped to external media, thus reducing the need for full table
backups – in effect, only the changes are backed up.
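As a rough sketch of the DDL involved (the database name, size, and journal options below are
illustrative assumptions, not values from this course):

	CREATE DATABASE Sandbox FROM DBC AS
	  PERM = 20e9,
	  BEFORE JOURNAL,                            /* single before-image capture             */
	  AFTER JOURNAL,                             /* single after-image capture              */
	  DEFAULT JOURNAL TABLE = Sandbox.Sales_PJ;  /* journal table shared by tables in Sandbox */

Tables created in Sandbox then inherit these journal options unless a CREATE TABLE statement
overrides them (for example, with NO BEFORE JOURNAL or DUAL AFTER JOURNAL).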

Page 8-54

Data Protection

Permanent Journal
The Permanent Journal is an optional, user-specified, system-maintained journal
which is used for recovery of a database to a specified point in time.
The Permanent Journal:

•	Is used for recovery from unexpected hardware or software disasters.
•	May be specified for ...
	– One or more tables
	– One or more databases
•	Permits capture of Before Images for database rollback.
•	Permits capture of After Images for database rollforward.
•	Permits archiving change images during table maintenance.
•	Reduces need for full table backups.
•	Provides a means of recovering NO FALLBACK tables.
•	Requires additional disk space for change images.
•	Requires user intervention for archive and recovery activity.

Data Protection

Page 8-55

Archiving and Recovering Data
The purpose of the ARC utility is to allow for the archiving and restoring of database
objects which may have been damaged or lost. There are several scenarios where restoring
objects from external media may be necessary.
•	Restoring of non-Fallback tables after a disk failure.
•	Restoring of tables which have been corrupted by batch processes which may have
	left the data in an ‘uncertain’ state.
•	Restoring of tables, views or macros which have been accidentally dropped by the
	user.
•	Miscellaneous user errors resulting in damaged or lost database objects.

Teradata’s Backup and Recovery (BAR) architecture provides solutions from Teradata
Partners. Two examples are:

•	NetVault – from BakBone Software
•	NetBackup – from Symantec (Veritas NetBackup by Symantec)

The ASF2 utility is an older utility that provides an X Windows based front-end for creation
and execution of ARC command scripts. It is designed to run on UNIX MP-RAS.
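For reference, a minimal arcmain script has the following general shape (the logon string,
database name, and archive file name are placeholders, not values used in this course):

	LOGON tdpid/backup_user,password;
	ARCHIVE DATA TABLES (Payroll) ALL,
	  RELEASE LOCK,
	  FILE = ARCHIVE1;
	LOGOFF;

A matching RESTORE (or COPY) DATA TABLES statement reads the same archive file to bring the
objects back.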

Page 8-56

Data Protection

Archiving and Recovering Data
ARC
•	The Archive/Restore utility (arcmain)
•	Runs on IBM, UNIX MP-RAS, Windows 2003, and Linux systems
•	Archives and restores data from/to Teradata Database
•	Restores or copies data from archive media
•	Permits data recovery to a specified checkpoint using Permanent Journals

Backup and Recovery (BAR)
•	Example of BAR choices from different Teradata Partners
	– NetBackup – Veritas NetBackup by Symantec
	– Tivoli Storage Manager – utilizes TARA
•	Provides Windows front end for ARC
•	Easy creation of scripts for archive/recovery
•	Provides job scheduling and tape management functions
•	BAR was previously referred to as Open Teradata Backup (OTB)

Data Protection

Page 8-57

Module 8: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 8-58

Data Protection

Module 8: Review Questions
Match the item to a lettered description.
____ 1. Database locks
____ 2. Table locks
____ 3. Row Hash locks
____ 4. FALLBACK
____ 5. Cluster
____ 6. Recovery journal
____ 7. Transient journal
____ 8. ARC
____ 9. NetBackup/Tivoli
____ 10. Permanent journal
____ 11. Disk Array

Data Protection

a.
b.
c.
d.
e.
f.
g.
h.
i.
j.
k.

Provides for TXN rollback in case of failure
Teradata Backup and Recovery applications
Protects all rows of a table
Logs changed rows for down AMP
Provides for recovery to a point in time
Applies to all tables and views within
Multi-platform archive utility
Lowest level of protection granularity
Protects tables from AMP failure
Protects database from a physical drive failure
Group of AMPs used by Fallback

Page 8-59

Notes

Page 8-60

Data Protection

Module 9
Introduction to MPP Systems

After completing this module, you will be able to:
•	Specify a major difference between a 6650 and a 6690 system.
•	Specify a major difference between a 2650 and a 2690 system.
•	Define the purpose of the major subsystems that are part of an MPP system.
•	Specify the names of the Teradata (TPA) nodes in a 6690 cabinet.

Teradata Proprietary and Confidential

Introduction to MPP Systems

Page 9-1

Notes

Page 9-2

Introduction to MPP Systems

Table of Contents
Teradata Systems ......................................................................................................................... 9-4
SMP Architecture ......................................................................................................................... 9-6
Hyper-Threading and Multi-Core CPUs ...................................................................................... 9-8
Comparing Performance of Servers ........................................................................................... 9-10
Cabinet or Rack Pictures ............................................................................................................ 9-12
Teradata 6650 Systems .............................................................................................................. 9-14
Teradata 6650 Cabinets .............................................................................................................. 9-16
Adding SSD to a 6650 (Future) ................................................................................................. 9-18
Teradata 6650 Configuration Examples..................................................................................... 9-20
Teradata 6690 Systems .............................................................................................................. 9-22
Teradata 6690 Cabinets .............................................................................................................. 9-24
Teradata Extended Nodes .......................................................................................................... 9-26
Making Sense of the Different Platforms................................................................................... 9-28
Linux Coexistence Combinations .............................................................................................. 9-30
Teradata Appliance Introduction................................................................................................ 9-32
Teradata 2650/2690 Appliances ................................................................................................. 9-34
Teradata 2650/2690 Cabinets..................................................................................................... 9-36
Appliance Configuration Examples ........................................................................................... 9-38
What is the BYNET™? ............................................................................................................. 9-40
BYNET 32 Switches .................................................................................................................. 9-42
BYNET 64 Switches .................................................................................................................. 9-44
BYNET Expansion Switches ..................................................................................................... 9-46
BYNET Expansion to 1024 Nodes ............................................................................................ 9-46
Server Management with SWS .................................................................................................. 9-48
Node Naming Conventions ........................................................................................................ 9-50
Summary .................................................................................................................................... 9-52
Module 9: Review Questions ..................................................................................................... 9-54

Introduction to MPP Systems

Page 9-3

Teradata Systems
As the competitive needs of businesses change, the system architecture changes over time.
To be best-in-class, an information processing system in today's environment will typically
have the following characteristics.

•	Utilization of multiple processors in multiple nodes to achieve acceptable
	performance. Easily scalable in both processing power and data storage capacity
	with adherence to all industry-standard interfaces.

•	Be capable of handling very large databases, rapidly process complex queries,
	maintain data security, and be accessible to the total enterprise. Support on-line
	transaction processing as well as decision support applications.

•	In today’s global and highly competitive markets, computing systems (especially
	enterprise servers) need to be available to the world 24 hours a day.

TPerf (Traditional Performance) is a power metric that has been used in a rigorous and
consistent manner for each generation of the Teradata platform since the model 5100. It is a
metric for how fast a node can process data. TPerf is maximized when there is a balance
between CPU and I/O bandwidth. When used to compare different Teradata configurations,
the TPerf metric is similar to other throughput metrics, such as rows/second or
transactions/second that a node processes where actual data volumes in terms of bytes are
not reflected in the metric. Data capacity is not a function of a general or design center
TPerf used by sales and engineering to compare Teradata systems, that is, this metric
assumes there is a constant database volume in place when comparing one system to
another.
TPerf is a power metric that measures the throughput performance of the TPerf workload. It
is not a response time metric for specific queries and operations. Response time depends on
a number of factors in the Teradata architecture in addition to the ones that TPerf gauges,
i.e., CPU power and I/O performance. Other factors influencing response time include, but are
not limited to:
1. Parallelism provided by the number of AMPs
2. Concurrency (competition among queries)
3. Workload mix
4. Workload management

TPerf is analogous to the pulling power of a train locomotive. The “Load” is the work the
node operates on; the data space is analogous to the freight cars in a train. You would
need twice as big a locomotive to pull twice as many cars. Likewise, to have the same
performance with twice as much data and load on a system, you would need a system
with a TPerf that is twice (2x) as large.

Page 9-4

Introduction to MPP Systems

Teradata Systems
Examples of systems used with the Teradata Database include:
Teradata Systems
	5400/5450            – up to 10 nodes/cabinet
	5500/555x/56xx       – up to 9 nodes/cabinet
	6650/6680/6690       – up to 4 nodes/cabinet with associated storage
	15xx/16xx/25xx/26xx  – various Appliance systems

The basic building block is the SMP (Symmetric Multi-Processing) node.
The power of these nodes will be measured by TPerf – Traditional Performance.

• The Teradata metric for total power of a node or system.
• Determined by measuring system elements and calculating the performance with a
representative set of workloads.

Key differences:

• Speed and capacity of SMP nodes and systems
• Cabinet architecture
• BYNET interface cards, switches and speeds
*BYNET V4 – up to 4096 nodes

Introduction to MPP Systems

Page 9-5

SMP Architecture
The SMP or “processing node” is the basic building block for Teradata systems. The
processing node contains the primary processor logic (CPUs), memory, and I/O
functionality.
Teradata is supported on non-Teradata SMP servers with 4 or fewer physical CPU sockets.
A Teradata license can only be purchased for an SMP server with up to 4 physical CPUs.
The server might have 4 hyper-threading CPUs which look like 8 logical CPUs to the
operating system. The server may have two quad-core CPUs which appears to the operating
system as 8 CPUs.
Basic definitions of the CPUs used with Teradata servers:
•	Hyper-Threading CPUs – one physical CPU (chip) socket, but with 2 control
	(context) areas – makes 1 CPU look like 2 logical CPUs.
•	Dual-core CPUs – one physical CPU (chip) socket, but with two control (context)
	areas and 2 execution cores – makes 1 CPU look like 2 physical CPUs.
•	Quad-core CPUs – one physical CPU (chip) socket, but with four control (context)
	areas and 4 execution cores – makes 1 CPU look like 4 physical CPUs.
•	Quad-core CPUs with Hyper-Threading – one physical CPU (chip) socket, but with
	8 control (context) areas and 4 execution cores – makes 1 CPU look like 4 physical
	CPUs or 8 logical CPUs.
•	Six-core CPUs with Hyper-Threading – one physical CPU (chip) socket, but with
	12 control (context) areas and 6 execution cores – makes 1 CPU look like 6
	physical CPUs or 12 logical CPUs.

5400/5450 nodes have 2 physical chips using Hyper-Threading, effectively 4 logical CPUs.
5500H nodes have 2 dual-core chips, effectively 4 CPUs.
5555C nodes have 1 quad-core chip, effectively 4 CPUs.
5550H and 5555H nodes have 2 quad-core chips, effectively 8 CPUs.
5600H nodes have 2 quad-core chips using hyper-threading, effectively 16 CPUs per node.
2650, 2690, 5650H, 6650H, 6680, and 6690 nodes have 2 six-core chips using hyper-threading, effectively 24 CPUs per node.

Page 9-6

Introduction to MPP Systems

SMP Architecture
SMP (Symmetrical Multi-Processing) Node – basic building block of MPP systems.
• Hyper-Threading CPUs – one CPU socket (chip) with 1 execution core and 2 control (context) areas
•
•
•
•

– makes 1 CPU chip look like 2 logical CPUs.
Dual-core CPUs – one CPU socket with 2 execution cores – makes 1 chip look like 2 physical
CPUs.
Quad-core CPUs – one CPU socket with 4 execution cores – makes 1 chip look like 4 physical
CPUs.
Quad-core CPUs with Hyper-Threading – one chip socket with 4 execution cores each with 2
control areas – makes 1 CPU chip socket look like 8 logical CPUs
Six-core CPUs with Hyper-Threading – one chip socket with 6 execution cores each with 2 control
areas – makes 1 CPU chip socket look like 12 logical CPUs

Other names include node, compute node, processing node, 24-way node, etc.
Key hardware components of a node include:
CPUs and cache memory
Memory
System Bus
I/O Subsystem

[Diagram: an SMP node – CPUs with cache, memory modules, and an I/O subsystem interconnected by the system bus; a Fibre Channel adapter in the I/O subsystem connects the node to its disk storage over Fibre Channel.]

Introduction to MPP Systems

Page 9-7

Hyper-Threading and Multi-Core CPUs
The facing page illustrates the concept of Hyper-Threading and Multi-Core CPUs.
With Hyper-Threading, 2 physical CPUs appear to the Operating System as 4 logical or
virtual CPUs. With Dual-Core, 2 physical CPUs appear to the Operating System as 4
physical CPUs. The SMP’s BIOS automatically tells the Operating System that there are 4
CPUs. The Operating System will schedule work as though there are actually 4 CPUs in
either case.
The reason for a performance gain with Hyper-Threading is as follows. When one of the
logical processors (control unit) is setting up its data and instruction registers from cache or
memory, the execution unit can be executing instructions from the other logical processor.
In this way, the execution unit doesn’t have to wait for one of the control units to set up its
data and instruction registers – it is effectively kept busy a larger percentage of the time.
Some of the benefits of Hyper-Threading include:
•	No software changes required
•	Symmetric
•	Improved CPU efficiency

The reason for a performance gain with Dual-Core CPUs is that there are two control areas
and two execution units. One CPU socket is really two physical CPUs. Quad-Core CPUs
provide even more processing power with one CPU socket providing four physical CPUs.
With Quad-Core, 2 physical CPUs appear to the Operating System as 8 physical CPUs. The
SMP’s BIOS effectively tells the Operating System that there are 8 CPUs.
With Quad-Core and Hyper-Threading, 2 physical CPUs appear to the Operating System as
16 CPUs. The SMP’s BIOS effectively tells the Operating System that there are 16 CPUs.
Notes:
• The Operating System schedules work across logical or physical CPUs.
• The Windows Task Manager or UNIX “pinfo” command actually identifies the CPUs (e.g., 8 with quad-core) for which work can be scheduled.

Page 9-8

Introduction to MPP Systems

Hyper-Threading and Multi-Core CPUs
[Diagram: Control Unit (context area) – data registers and instruction registers; Execution Unit – physical execution of instructions. The panels show how many logical or physical CPUs the Operating System sees Without Hyper-Threading, With Hyper-Threading, With Dual-Core CPUs, With Quad-Core CPUs and H-T, and With Six-Core CPUs and H-T.]

Introduction to MPP Systems

Page 9-9

Comparing Performance of Servers
TPerf is a metric for the total power of a node or system.
• TPerf = Traditional Performance
• Analogous to the pulling power of a train locomotive.

The “Load” is the work the node operates on. The data space is analogous to the freight cars in a train. You would need twice as big a locomotive to pull twice as many cars. To have the same performance with twice as much data and load on a system, you would need a system with a TPerf that is twice (2x) as large.
Acronym: H-T is Hyper-Threading
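A minimal sketch of that proportionality (the numbers are hypothetical and only illustrate the 2x relationship, not published TPerf ratings):

current_tperf = 52.0      # TPerf of the existing configuration (example value)
growth_factor = 2.0       # data volume and workload are expected to double
required_tperf = current_tperf * growth_factor
print(required_tperf)     # 104.0 - a system with roughly twice the TPerf is needed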
Teradata’s Design Center establishes typical system configurations for different Teradata
system models. For example, one design center configuration for a 6650 system is cliques
of 3+1 nodes, 42 AMPs per node, and two 600 GB mirrored disks for each node. The
design center power rating is called TPerf-dc.
The process for deriving design center TPerf for a Teradata platform consists of five steps:
1) A diverse set of performance tests is executed on the design center configuration for a Teradata platform model.
2) The CPU and I/O resource usage and throughput are measured.
3) An analytical model is used to calculate the CPU and I/O resource usage of a weighted blend of workloads.
4) The blended workload is compared against the resource capabilities provided by the design center platform configuration.
5) The TPerf metric is then calculated.
This design center TPerf (TPerf-dc) represents system throughput potential, in other words,
how much work could be done in a given amount of time given a high concurrency
workload for that design center hardware configuration. Any system with the same
configuration and the same workload mix used in the model will deliver overall performance
that matches the level indicated by the TPerf-dc metric.
TPerf-dc usually does not describe the throughput potential for deployed configurations of
Teradata systems. The reality is that business demands require a wide variety of Teradata
system configurations to meet specific performance and pricing needs and no customer
workload is the same as that for the TPerf-dc model.
TPerf-dc plays only a small part in any attempt to estimate response time expectations for the
design center configuration and TPerf workload – all the other factors listed above must be
considered.

Page 9-10

Introduction to MPP Systems

Comparing Performance of Servers
[Bar chart: relative TPerf by node generation (H-T = Hyper-Threading) – 5100, 5250, 5255, 5300, 5350, 5380, 5400, 5450, 5500H, 5555H (32 GB), 5600H (96 GB), 6650H (96 GB), and 6690 (96 GB). The relative values grow from 1.00 for the 5100 to approximately 130 for the 6690; the newer systems run Linux.]

Introduction to MPP Systems

Page 9-11

Cabinet or Rack Pictures
The Rack Cabinet is an industry standard, 40U rack frame used to house Teradata
processing nodes and/or disk arrays.
Measurements

The “U” in the 40U rack term represents a unit of vertical measurement for placement of chassis in the rack. [1U = 4.445 cm (1.75 in.)] This diagram illustrates the depth of the older cabinet, which was 40 inches.

Teradata systems use an industry standard rack mount architecture and individual chassis
that conform to industry standards. Each chassis occupies a specific number of U spaces in
the rack. Examples of types of chassis that can be placed in a rack or cabinet include:
• Processing Node (54xx, 55xx, 56xx, and 66xx nodes) – 2U
• BYNET Switch (BYA32S) – 1U
• Server Management Chassis (CMIC) – 1U

The 55xx and 66xx systems use a rack that is 44” deep (4” deeper than the previous rack).
Older systems (e.g., 5650) used a separate Teradata SWS (Service Workstation) for
operational maintenance of the system. The last SWS was a Dell PowerEdge T710 Server
and was available as deskside or rack mount server.
Newer systems (e.g., 6690) utilize a VMS (Virtualized Management Server) which
consolidates CMIC, SWS, and Teradata Viewpoint functions into a single chassis.

Page 9-12

Introduction to MPP Systems

Cabinet or Rack Pictures
Notes:
• Cabinet Size = 24" W x 77" H x 44" D without doors and side panels
• Improved cable management
  – Larger exit hole in base
  – Supports inter-rack cabling

[Photos: Node Chassis; Processor/Storage Cabinet]

Introduction to MPP Systems

Page 9-13

Teradata 6650 Systems
The Teradata Active Enterprise Data Warehouse 6650 platform is scalable from one to
4,096 Teradata nodes, and can handle more than 15 petabytes of data to support the complex
workloads in an active warehouse environment.
The 6650 processing nodes are the newest release of Teradata Servers which support the Teradata Warehouse solution. These nodes are similar to the 5650 processing nodes, utilizing the Intel Westmere™ six-core CPUs with hyper-threading enabled.

The Teradata Active Enterprise Data Warehouse platform is made up of a combination of cabinet types, depending on the system configuration:
• Processing/storage cabinet
• BYNET cabinet
• Teradata Managed Server (TMS) cabinet

The 6650 provides high availability via the following features:
• Hot standby nodes (HSN): One node in a clique can be configured as a hot standby node. This eliminates the degradation of database performance in the event of a node failure in the clique. Tasks assigned to the failed node are completely redirected to the hot standby node.
• Hot spare disks: One or more disks per array can be configured as hot spare disks. In the event of a disk failure on a RAID mirrored pair, the contents of the failed disk are copied into a hot spare disk from the mirrored surviving disk to repair the RAID pair. When the failed drive is replaced, a copy back operation occurs to restore data to the replaced drive.
• Fallback: Data protection can be provided at the table level by automatically storing a copy of each permanent data row of a table on a different or “fallback” AMP. If an AMP fails, the Teradata Database can access the fallback copy and continue operation.

The design center recommendations specify a different number of AMPs, and the associated storage per AMP varies depending on the configuration:
• 1+1 clique – 48 AMPs/node; 192 disks per node
• 2+1 clique – 30 AMPs/node; 120 disks per node
• 3+1 clique – 42 AMPs/node; 84 disks per node

Page 9-14

Introduction to MPP Systems

Teradata 6650 Systems
Features of the 6650 system include:

• The Teradata 6650 platform is the first release of unified Node/Storage within a single
cabinet in the Active Enterprise Data Warehouse space.

• The 6650 is designed to reduce floor space utilization.
– The UPS/batteries are not used with the 6650
– In the event of site wide power loss, data integrity is provided by WAL.

• The 6650 utilizes up to two Intel® 2.93 GHz six-core CPUs
– Two models – 6650C and 6650H
• 6650C nodes utilize 1 socket with one six-core CPU and 48 GB of memory
• 6650H nodes utilize 2 sockets with two six-core CPUs and 96 GB of memory
• 6650C can be used to co-exist with previous generations and 6650H will co-exist with
future Active EDW Platform mixed storage offerings

• The 6650 can be configured in 1+1, 2+1, and 3+1 cliques.
– A 6650 clique consists of either one or two processing/storage cabinets. Each cabinet
contains processing nodes and a disk array.

• The 6650 can be upgraded to use SSD drives.
– 6650 is an SSD Ready platform and prepares for the introduction of Solid State Drives (SSD)
in the Active EDW space.

Introduction to MPP Systems

Page 9-15

Teradata 6650 Cabinets
The facing page illustrates various 6650 cabinet configurations.
The 66xx and later systems utilize an industry standard rack mount cabinet which provides excellent air flow and cooling. Similar to previous rack-based systems, this rack
contains individual subsystem chassis that are housed in standard rack frames. Subsystems
are self-contained, and their configurations — either internal or within a system — are
redundant. The design ensures overall system reliability, enhances its serviceability, and
enables time and cost efficient upgrades.
The key chassis in the rack/cabinet is the node chassis. The SMP node chassis is 2U in
height.
A Hot Standby Node is required with 6650 systems.


For 6650 systems, a clique has a maximum of three TPA nodes with one HSN
node.

Cabinet Build Conventions
The placement of the hardware components in a cabinet follows these general cabinet build
conventions:




• A 6650 clique consists of either one or two processing/storage cabinets. Each cabinet contains processing nodes and a disk array. The following clique configurations are available:
  – A two-cabinet 3+1 clique. The first cabinet contains two processing nodes and one disk array. The second cabinet contains one processing node, one hot standby node, and one disk array.
  – A two-cabinet 2+1 clique. The first cabinet contains one processing node and one disk array. The second cabinet contains one hot standby node, one processing node, and one disk array.
  – A two-cabinet 1+1 clique. The first cabinet contains one processing node and one disk array. The second cabinet contains one hot standby node and one disk array.
  – A one-cabinet 1+1 clique. The cabinet contains one processing node, one hot standby node, and one disk array.
• There is 1 CMIC in the first cabinet of each two-cabinet clique. If a system only has one clique, then there is a CMIC in the second cabinet.

Page 9-16

Introduction to MPP Systems

Teradata 6650 Cabinets
6650 Characteristics

• Integrated Cabinet with
nodes and arrays in same
cabinet.

• NetApp array with 2
controllers and 8 drive
trays.

– 300, 450, or 600 GB drives

• With 2+1 clique, each AMP
is typically assigned to 4
disks (2 mirrored pairs).

– Usually 30 AMPs/node

• With 3+1 clique, each AMP
is typically assigned to 2
disks (1 mirrored pair).

– Usually 42 AMPs/node

• No UPSs in cabinet.

[Diagram: A 6650H 3+1 clique across two processing/storage cabinets. Components shown include secondary and primary SM switches, drive trays (16 HD each), 6844 array controllers (4U), TPA nodes, a hot standby node (HSN), TMS nodes, BYA32S BYNET switches, SM – CMIC (1U) chassis, and PDUs.]

Introduction to MPP Systems

Page 9-17

Adding SSD to a 6650 (Future)
The facing page illustrates a future option to add Solid State Disks (SSD) to a 6650 cabinet.

Page 9-18

Introduction to MPP Systems

Adding SSD to a 6650 (Future)
SSD Upgrade Steps
• Place SSD arrays in positions 3, 4, and 5.
• Upgrade to 13.10 if not on 13.10.
• Enable TVS.
• Reconfig (no backup/restore required).

If TMS, Channel Servers and/or BYNET switches are installed, they can be moved to another cabinet to make room for the SSD storage.

SSD arrays use SAS based controllers and 400 GB SSDs. Each tray has its own controllers and SSD drives.

[Diagram: 6650H cabinet after the SSD upgrade – drive trays (16 HD each), 6844 array controllers (4U), TPA nodes, SSD arrays, TMS nodes, BYA32S BYNET switches, SM – CMIC (1U), primary and secondary SM switches, and PDUs.]

Introduction to MPP Systems

Page 9-19

Teradata 6650 Configuration Examples
The facing page includes two examples of 6650 cliques. Typically, a 6650 node in a 3+1
clique will be configured with 42 AMPs, 2 disks per AMP, and 96 GB of memory.
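As a rough cross-check of these AMP, disk, and capacity figures (and the Vdisk examples on the facing page), here is a small Python sketch; the 0.90 usable fraction is an assumption chosen to approximate the quoted Vdisk sizes, and actual MaxPerm space is further reduced to roughly 90% of that, per the note on the facing page:

def clique_capacity(tpa_nodes, amps_per_node, disks_per_amp, disk_gb, usable_fraction=0.90):
    # Illustrative arithmetic only, not a sizing tool.
    amps_per_clique = tpa_nodes * amps_per_node
    # RAID 1 mirroring: an AMP's Vdisk holds half of its disks' raw capacity.
    vdisk_gb = (disks_per_amp / 2) * disk_gb * usable_fraction
    clique_tb = amps_per_clique * vdisk_gb / 1000
    return amps_per_clique, vdisk_gb, clique_tb

# 6650H 3+1 clique: 3 TPA nodes x 42 AMPs, 2 disks per AMP, 600 GB drives
print(clique_capacity(3, 42, 2, 600))   # (126, 540.0, ~68 TB)
# 6650H 2+1 clique: 2 TPA nodes x 30 AMPs, 4 disks per AMP, 600 GB drives
print(clique_capacity(2, 30, 4, 600))   # (60, 1080.0, ~65 TB)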
Current configurations of the 6650 include:
6650H – Design Center
• 3+1 H clique – Drive options: 300, 450, or 600 GB; HDDs per node: 84; HDDs per clique: 252; CPU COD available: Yes; allows upgrade to SSD per node: 28; disks per AMP: 2; AMPs per node: 42
• 2+1 H clique – Drive options: 300, 450, or 600 GB; HDDs per node: 120; HDDs per clique: 240; CPU COD available: Yes; allows upgrade to SSD per node: 28; disks per AMP: 2; AMPs per node: 30
• 1+1 H clique – Drive options: 300, 450, or 600 GB; HDDs per node: 192; HDDs per clique: 192; CPU COD available: Yes; allows upgrade to SSD per node: 28; disks per AMP: 2; AMPs per node: 48

These configurations provide an effective future SSD upgrade path while maintaining
optimum AMPs per node for a 6650H.
6650C – Design Center
• 3+1 C clique – Drive options: 300, 450, or 600 GB; HDDs per node: 42; HDDs per clique: 126; CPU COD available: Yes; allows upgrade to SSD per node: 14; disks per AMP: 2; AMPs per node: 21
• 2+1 C clique – Drive options: 300, 450, or 600 GB; HDDs per node: 60; HDDs per clique: 120; CPU COD available: Yes; allows upgrade to SSD per node: 14; disks per AMP: 2; AMPs per node: 15
• 1+1 C clique – Drive options: 300, 450, or 600 GB; HDDs per node: 96; HDDs per clique: 96; CPU COD available: Yes; allows upgrade to SSD per node: 14; disks per AMP: 2; AMPs per node: 24

These configurations provide an effective future SSD upgrade path while maintaining
optimum AMPs per node for a 6650C.
For both 6650H and 6650C, if CPU Only Capacity on Demand is active, it should be removed to take full advantage of the increased I/O now available. Following the Optimum Performance Configurations will allow the customer to avoid a data reload and maintain their system's AMPs per node ratio, thereby reducing the impact of an upgrade.

Page 9-20

Introduction to MPP Systems

Teradata 6650 Configuration Examples
6650H (2+1 nodes/clique)
• 30 AMPs per node; 60 AMPs per clique
• 120 disks per node; 240 disks per clique
• Each Vdisk – 4 disks (RAID 1); each Vdisk – 1.08 TB*
• Clique – 60 AMPs x 1.08 TB = 65 TB*

6650H (3+1 nodes/clique)
• 42 AMPs per node; 126 AMPs per clique
• 84 disks per node; 252 disks per clique
• Each Vdisk – 2 disks (RAID 1); each Vdisk – 540 GB*
• Clique – 126 AMPs x 540 GB = 68 TB*

* Actual MaxPerm space is approximately 90%.

Note: Each disk array will typically have additional global hot spare drives.

[Diagram: The 3+1 and 2+1 cliques each span two 6650H cabinets with 600 GB disk arrays (126 or 120 disks per array), TPA nodes, hot standby nodes (HSN), and TMS nodes.]

Introduction to MPP Systems

Page 9-21

Teradata 6690 Systems
The Teradata 6690 platforms utilize Solid State Drives (SSD) and Hard Disk Drives (HDD)
within a single cabinet in the Active Enterprise Data Warehouse space.



• Requires Teradata Virtual Storage (TVS).
• SSD and HDD storage is maintained within the same drive tray.

The Teradata Active Enterprise Data Warehouse platform is made up of a combination of
cabinet types, depending on the system configuration:




Processing/storage cabinet
BYNET cabinet
Teradata Managed Server (TMS) cabinet

Note: A Service Workstation (SWS) is installed in one TMS cabinet. A system may have
additional TMS cabinets.
6690 nodes are based on the 6650 processing nodes. Characteristics include:
• Up to two Intel Westmere six-core CPUs
  – 12 MB L2 cache with Hyper-Threading
  – Small performance increase over 5650; 6680H (126 TPerf)
• 450 GB OS drives support 96 GB memory
• 300 GB dump drive for restart performance

Page 9-22

Introduction to MPP Systems

Teradata 6690 Systems
Features of the 6690 system include:

• The Teradata 6690 platforms utilize Solid State Drives (SSD) and Hard Disk Drives
(HDD) within a single cabinet in the Active Enterprise Data Warehouse space.

– Requires Teradata Virtual Storage (TVS).
– SSD and HDD Storage is maintained within the same drive tray.

• The 6690 is designed to reduce floor space utilization (similar to 6650).
– The UPS/batteries are not used with the 6690 cabinet.
– Data integrity in event of site wide power loss is provided by WAL.

• A 6690 node uses the Intel six-core Westmere CPUs with hyper-threading enabled.
The 6690 has a faster CPU (3.06 GHz versus 2.93 GHz) than the previous 6680 node.

– These systems can be configured in 1+1 or 2+1 cliques.
– A 6690 clique is contained within 1 processing/storage cabinet.

• No co-existence with Active Warehouse 5xxx and not planned for with 6650 systems.
– The 6690 is ideal for new customers and/or floor sweeps.
– The 6690 will co-exist with future Active EDW Platform mixed storage offerings.

Introduction to MPP Systems

Page 9-23

Teradata 6690 Cabinets
Each Teradata 6690 cabinet can be configured in a 1+1 or 2+1 clique configuration.


A processing/storage cabinet contains one clique.



A cabinet with a 2+1 clique contains two processing nodes, one hot standby node,
and four disk arrays.



A cabinet with a 1+1 clique contains one processing node, one hot standby node,
and four disk arrays.

Virtualized Management Server (VMS)
The VMS is available with the 2690 Appliance and the 6690 Enterprise Warehouse Server.
Characteristics of the VMS include:
•

1U Server that VIRTUALIZES system and cabinet management software onto a single
server

• Teradata System VMS – provides complete system management functionality
  – Cabinet Management Interface Controller (CMIC)
  – Service Workstation (SWS)
  – Teradata Viewpoint (single system only)
  – Automatically installed on base/first cabinet

• The VMS allows full rack solutions without an additional cabinet for traditional
Viewpoint and SWS
• Eliminates need for expansion racks reducing customers’ floor space and energy costs
• For multi-system monitoring and management traditional Teradata Viewpoint is
required.

Page 9-24

Introduction to MPP Systems

Teradata 6690 Cabinets
6690 Characteristics

• Integrated Cabinet with nodes and SSD and HDD
arrays in same cabinet.

• Each NetApp drive tray can hold up to 24 SSD
and/or HDD drives.

– SSD drives are 400 GB.
– HDD drives (10K RPM) are 600 GB.
– Possible maximum of 360 disks in the cabinet.

• One NetApp tray has 2 controllers and supports 2
additional expansion trays.

• 6690 feature – Virtualized Management Server (VMS)

Up to 24 SAS Drives
Up to 24 SAS Drives
Up to 24 SAS Drives
Up to 24 SAS Drives
Up to 24 SAS Drives

Expansion Tray
Controllers
Expansion Tray

* Expansion Tray
Controllers

Up to 24 SAS Drives

Expansion Tray

Up to 24 SAS Drives

Expansion Tray

Up to 24 SAS Drives

Controllers

VMS (1U)
HSN
TPA Node
TPA Node
Up to 24 SAS Drives

• No UPSs in cabinet.
• There is no room for BYNET switches in this

Up to 24 SAS Drives

2+1
Clique in
a single
cabinet

Up to 24 SAS Drives
Up to 24 SAS Drives

Expansion Tray

Up to 24 SAS Drives

Expansion Tray

Up to 24 SAS Drives

Controllers

PDU

PDU

6690

Introduction to MPP Systems

*

Up to 24 SAS Drives

– Consolidated CMIC, SWS, Teradata Viewpoint

cabinet. Therefore, BYNET switches are located in a
separate cabinet.

Expansion Tray

* Not present in
a 1+1
Configuration

Page 9-25

Teradata Extended Nodes
Additional specialized nodes are available to Teradata 55xx, 56xx, and 66xx systems. The
various type and possible uses are listed on the facing page.
General Notes:


All TPA nodes (Teradata Nodes running the Teradata Database) must execute the
same Operating System.



Non-TPA Nodes and/or Managed Servers, can execute the same or a different
Operating System; this is the "mixed OS support".



A Non-TPA Node is a Teradata Server (Node) that is BYNET connected, but does
not run the Teradata Database. A Non-TPA Node can communicate to the Teradata
Database through TCP/IP emulation across the BYNET.



A Managed Server is a Teradata Server (Node) that resides in the Teradata System
Cabinet (rack mounted) and is connected through a dedicated Ethernet network to
the Teradata Database Instance.



The purpose of both Non-TPA Nodes and Managed Server Nodes is flexibility.
These nodes can be used similar to external application servers for BAR,
ETL/ELT, BI, etc. Some of the advantages of Non-TPA or Managed Server nodes
include a single point of management/maintenance, "pre-built" dedicated network
to Teradata Database, and they can often be installed into existing Cabinets,
minimizing additional footprint in the data center.

Page 9-26

Introduction to MPP Systems

Teradata Extended Nodes
Examples of extended node types:

• Hot Standby Nodes (HSN)
  – BYNET connected
  – Spare node that is part of a clique and is used in the event of a node failure.
  – Located in same cabinet as other nodes; managed by SWS

• Channel Server – used as interface between Teradata and a mainframe (e.g., IBM)
  – BYNET connected
  – Maximum of 3 ESCON and/or FICON adapters – allows host channel connections
  – Node with 1 Quad-core CPU and 24 GB of memory – improves Teradata performance by offloading the channel workload
  – Located in same cabinet as other nodes; managed by SWS

• Teradata Managed Server (TMS) Nodes
  – Not BYNET connected
  – Dell server integrated in processor cabinet for use with Teradata applications
    • Can be utilized as a Viewpoint, SAS, BAR, Ethernet, TMSM, Data Mover, etc. node
  – Located in same cabinet as other nodes; managed by SWS

• Non-TPA Nodes
  – BYNET connected
  – Can be used to execute application software (e.g., ETL)
  – Located in same cabinet as other nodes; managed by SWS

Introduction to MPP Systems

Page 9-27

Making Sense of the Different Platforms
The facing page attempts to provide some perspective of the different platforms.
The 4400, 4800, 4850, 5200, and 5250 nodes are based on the Intel Eclipse chassis and
Aspen baseboard technology. These nodes are often referred to as Eclipse nodes.
The 4455, 4851, 4855, 5251, and 5255 nodes are based on the Intel Koa baseboard
technology. These nodes may be referred to as Koa nodes.
The 4470, 4900 and 5300 nodes are based on the INTEL Dodson baseboard technology and
may be referred to as Dodson nodes.
The 4475, 4950 and 5350 nodes are based on the INTEL Hodges baseboard technology and
may be referred to as Hodges nodes.
The 4480, 4980, and 5380 nodes are based on the INTEL Harlingen baseboard technology
and may be referred to as Harlingen nodes.
The 5400 and 5450 nodes are based on the INTEL Jarrell baseboard technology and may be
referred to as Jarrell nodes.
The 155x, 25xx, and 55xx nodes are based on the INTEL Alcolu baseboard technology and
may be referred to as Alcolu nodes.
The following dates indicate when these systems were generally available to customers
(GCA – General Customer Availability).
– 5100M – January, 1996 (not described in this course)
– 4700/5150 – January, 1998 (not described in this course)
– 4800/5200 – April, 1999
– 4850/5250 – June, 2000
– 4851/4855/5251/5255 – July, 2001
– 4900/5300 – March, 2002
– 4950/5350 – December, 2002
– 4980/5380 – August, 2003
– 5400E/5400H – March, 2005
– 5450E/5450H – April, 2006
– 5500E/5500C/5500H – March, 2007
– 2500/5550H – January, 2008
– 2550/2555/5555C/H – October, 2008 (2550) and March, 2009 (2555/5555)
– 1550 – December, 2008
– 1600/2580/5600C/H – March, 2010
– 5650C/H – July, 2010
– 6650C/H and 6680 – April, 2011
– 2690 – October, 2011
– 6690 – February, 2012

Page 9-28

Introduction to MPP Systems

Making Sense of the Different Platforms
Year        Model                            CPU                                             BYNET
2003–2004   5350/5380 (2 – 512 nodes)        Intel Xeon 2.8/3.06 GHz                         BYNET V2.1
2005–2006   5400/5450H (1–1024 nodes)        Intel Xeon 3.6/3.8 GHz                          BYNET V3.0
2007        5500H (1–1024 nodes)             Intel Dual-core Xeon CPUs 2.66 GHz              BYNET V3.1
2008–2009   5550/5555H (1–1024 nodes)        Two Intel Quad-core Xeon CPUs 2.33 GHz          BYNET V3.1/V3.2
2010        5600/5650H (1–4096 nodes)        Two Intel quad or six-core CPUs 2.66/2.93 GHz   BYNET V4.0
2011        6650H/6680/6690 (1–4096 nodes)   Two Intel six-core CPUs 2.93/3.06 GHz           BYNET V4.0

Introduction to MPP Systems

Page 9-29

Linux Coexistence Combinations
The facing page illustrates possible Linux coexistence combinations.

Page 9-30

Introduction to MPP Systems

Linux Coexistence Combinations
Coexistence systems contain a mixture of node and storage generations that operate as a single MPP system running the same software.

Coexistence generations:
• 5400E/5400H – Xeon 3.6 GHz
• 5450E/5450H – Xeon 3.8 GHz
• 5500C/H – 2/4 core Xeon 2.66 GHz
• 5550H – 8 core Xeon 2.66 GHz
• 5555C/H – 4/8 core Xeon 2.33 GHz
• 5600C/H – 4/8 core Nehalem 2.66 GHz
• 5650C/H – 6/12 core Westmere 2.93 GHz
• 6650C/H – 6/12 core Westmere 2.93 GHz
• 6680/6690 – 12 core Westmere 2.93/3.06 GHz
• 66xx systems can coexist with future systems.

Goal is to have Parallel Efficiency: utilization of one set of cliques at 100% and the other sets of cliques as close to 100% as possible. This is done by balancing the workload between the nodes. May need to leverage larger Linux memory.

Conversion to 64-bit Linux is required if the nodes are not already running 64-bit Linux.

Introduction to MPP Systems

Page 9-31

Teradata Appliance Introduction
A Teradata appliance is a Teradata server which is optimized specifically for high DSS
performance. The first Teradata appliance was the 2500 introduced in 1Q2008.
Characteristics of the Teradata appliances include:






• Delivered Ready to Run
  – Integrated system fully staged and tested
  – Includes a robust set of tools and utilities
• Rapid Time to Value
  – System live within hours
• Competitive Price Point
  – Capacity on Demand available if needed
• Easy Data and Application Migration to a Teradata EDW/ADW

What is an Appliance?
An appliance is an instrument or device designed for a particular use. The typical
characteristics of an appliance are:







• Combination of hardware and software designed for a specific function – for example, the 25xx hardware/software is optimized for fast table scans & “Deep Dive” Analytics.
• Fixed/limited function – designed specifically for Decision Support workloads; the hardware is not configured or optimized for ADW.
• Fixed capacity/configuration – appliances have a fixed configuration and limited upgrade paths.
• Ease of installation – fully staged, and the integrated design greatly reduces the number of cabinet interconnect cables.
• Simple to operate – appliances are Teradata systems! They have all the Server Management capabilities used in the MPP systems.

Teradata Load ‘N Go Services make it easy to quickly implement a new system:
• Load data from operational systems
• Five easy steps completed in about one month
  – Step 1 – Build the base database structure
  – Step 2 – Easy Set-Up Options
  – Step 3 – Build and test the load scripts using the TPT Wizard
  – Step 4 – Conduct the initial load
  – Step 5 – Document and turn load/reload process over to customer
• No transformations or consolidation into an enterprise data model
• Users have access to data quickly
• Enabling new business insights
The firmware in the disk array controllers for 25xx systems has been specifically optimized for scan-based workloads. The disk array controller pre-fetches the entire cylinder to cache when a cylinder index is accessed by Teradata.

Page 9-32

Introduction to MPP Systems

Introduction to Teradata Appliances
• What is an Appliance?
– An appliance is a device designed for a specific function.
– Fixed/limited function and fixed capacity/configuration.
– Easy to install and simple to operate.

• Data Warehouse Appliance
– Teradata nodes and storage is integrated into a single cabinet.
– Delivered ready to run with rapid time to value.
– System live within hours, fully staged and tested.

• Powerful
– Purpose-built for high analytical performance.
– Optimized for fast file scans and heavy “deep dive” analytics.

• Cost-Effective
– Competitive price point.
– Easy data and application migration to a Teradata Enterprise
Data Warehouse.

• Ideal for Entry Level Data Warehouses, Analytical
Sand Boxes, and Test and Development Systems.

Introduction to MPP Systems

Teradata 2500

Page 9-33

Teradata 2650/2690 Appliances
Teradata 2650 Appliance
The Data Warehouse Appliance 2650 can have up to 9 nodes in a cabinet. The nodes utilize
the Intel Westmere six-core CPU with hyper-threading and 96 GB of memory per node.
The Data Warehouse Appliance 2650 comes standard with the BYNET over Ethernet switch. For scalability requirements beyond 275 TB, you can configure BYNET V4, but special approval is required.

Teradata 2690 Appliance
The Data Warehouse Appliance 2690 can have up to 8 nodes in a cabinet. The nodes utilize
the Intel Westmere six-core CPU (3.06 GHz) with hyper-threading and 96 GB of memory
per node. Cliques consist of 2 nodes and no HSN. The Data Warehouse Appliance 2690
comes standard with the BYNET over Ethernet switch.

Page 9-34

Introduction to MPP Systems

Teradata 2650/2690 Appliances
Teradata appliances utilize a fully integrated cabinet design with nodes and disk
arrays in the same cabinet. Two examples of appliances are:
Teradata 2650 Systems
• Nodes use 2 Intel Six-core Westmere CPUs at 2.93 GHz; 96 GB of memory per node
• 24 AMPs per node
– 24 SAS 300 or 600 GB drives, or 12 SAS 2 TB drives per node
• A 2650 cabinet can house up to 9 nodes.
– Cliques are in 3 node configurations (no HSN); Cabinets can have 1, 2, or 3 cliques.

Teradata 2690 Systems
• Nodes use 2 Intel Six-core Westmere CPUs at 3.06 GHz; 96 GB of memory per node
• Each node has 2 hardware compression boards
• 24 AMPs per node
– 24 SAS 300, 600, or 900 GB drives per node (2.5" drives @ 10K RPM)
• A 2690 cabinet can house up to 8 nodes.
– Cliques are in 2 node configurations (no HSN); a cabinet can have between 1 and 4 cliques.
• Utilizes VMS (Virtualized Management Server)
– Consolidated CMIC, SWS, Teradata Viewpoint

Introduction to MPP Systems

Page 9-35

Teradata 2650/2690 Cabinets
With the 2650, you can have 3 cabinet type configurations: a 1/3 cabinet, a 2/3 cabinet, and a full cabinet. With a full cabinet you have 9 nodes. The disk drives that are supported are 300 GB or 600 GB drives, or 108 2 TB 3.5” drives. To also help improve loading, you can configure the system with 10 Gb Ethernet copper or fiber ports. Cliques are configured with 3 nodes and no HSN.
A 1/3 cabinet is designed for lower CPU/Node density per cabinet. A 2/3 cabinet is
designed for medium CPU/Node density per cabinet. It is a good solution for mid-size
capacity options and provides flexible solutions for adding an integrated SWS or TMS. A
fully populated 2650 cabinet is designed for high CPU/Node density per cabinet. It is a
good solution for a high capacity system driving high CPU utilization.
With the 2690, a cabinet can be configured with up to 4 cliques in 2 node clique
configurations (no HSN). A full cabinet will have 8 nodes. The disk drives that are
supported are 300GB, 600GB, or 900GB drives.
One important new feature of the 2690 is hardware compression.
• With automatic block level compression, customers can get as much as 3x the customer data space, so the amount of available storage has tripled.
• The system level scan rate (also known as the effective scan rate) has increased 3x as well, because 3x more data is scanned with compression.
• Also, hot cache memory, which is where frequently used results are stored until no longer needed, has tripled as well because the data/results being stored are compressed.

With compression, the system can be pushed higher because compression CPU work has
been moved out of the nodes, and that CPU is available for Teradata work.
The Teradata Virtualized Management Server is a standard feature on the Data Warehouse
Appliance 2690. This 1U managed server rack mounts in the appliance node cabinet and
essentially consolidates all Teradata management functionality into one server. The VMS
contains the following functionality:
• Teradata Viewpoint, single system: Teradata Viewpoint is the robust web portal that manages workloads, queries, and systems.
• SWS: The Teradata SWS is the software that monitors the physical cabinet. This includes the nodes, disks, and connectivity.
• CMIC: The CMIC monitors all the disk controllers and cabling.

The VMS is a key reason why full racks can be shipped without having to have a separate
expansion cabinet for this functionality. Some considerations include:
• Traditional Viewpoint is still available, but it is priced and licensed differently. Please see the Teradata Viewpoint OCI for more information. Also note that VMS Viewpoint can only monitor one system, not multiple systems.
• If more than one node cabinet is required, the expansion cabinet will also have a VMS, but it will only contain the CMIC software as the others aren't needed.
Page 9-36

Introduction to MPP Systems

Teradata 2650/2690 Cabinets
[Diagram: A fully loaded 2650 cabinet (nodes numbered 2–10) with three 3-node cliques, each clique sharing disk arrays with dual array controllers and 72 drives, plus a dual AC box; and a fully loaded 2690 cabinet (nodes numbered 2–9) with four 2-node cliques, each clique sharing a disk array with dual array controllers and 48 drives, plus a dual AC box.]

Introduction to MPP Systems

Page 9-37

Appliance Configuration Examples
The examples on the facing page show a typical AMP and Disk configurations for 2650 and
2690 systems.
Notes:
• 2650 systems utilize SAS disks (Serial Attached SCSI) – 300 GB and 600 GB disk drives
• 2650 systems can utilize 2 TB SATA disks (Serial Advanced Technology Attachment)
• 2690 systems can utilize 300, 600, or 900 GB SAS disk drives.
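The cabinet totals shown on the facing page follow directly from the per-node figures; a small illustrative calculation:

def cabinet_totals(nodes, amps_per_node=24, memory_gb_per_node=96, disks_per_node=24):
    # Returns (AMPs, GB of memory, disks) for a fully populated appliance cabinet.
    return nodes * amps_per_node, nodes * memory_gb_per_node, nodes * disks_per_node

print(cabinet_totals(9))   # 2650 full cabinet: (216, 864, 216)
print(cabinet_totals(8))   # 2690 full cabinet: (192, 768, 192)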

Page 9-38

Introduction to MPP Systems

Appliance Configuration Examples
2650 Clique
• 3 node cliques share 3 drive trays
• 96 GB memory / node
• 24 AMPs / node; 72 AMPs / clique
• 24 disks / node (RAID 1); 72 disks / clique

2650 Disk Options
• 300 or 600 GB SAS disks (2.5") – 216 in cabinet
• 2 TB disks (3.5") – 108 in cabinet

2650 Cabinet with 9 nodes (up to 9 nodes in a cabinet)
• 216 AMPs
• 864 GB memory in cabinet

2690 Clique
• 2 node cliques share 2 drive trays
• 96 GB memory / node
• 24 AMPs / node; 48 AMPs / clique
• 24 disks / node (RAID 1); 48 disks / clique
• Includes hardware compression

2690 Disk Options
• 300, 600, or 900 GB SAS disks (2.5") – 192 in cabinet

2690 Cabinet with 8 nodes (up to 8 nodes in a cabinet)
• 192 AMPs
• 768 GB memory in cabinet

[Diagram: 2650 cliques of three Westmere nodes sharing three 24-disk (600 GB) drive trays; 2690 cliques of two Westmere nodes sharing two 24-disk (600 GB) drive trays.]

Introduction to MPP Systems

Page 9-39

What is the BYNET™?
The BYNET (BanYan Network) provides high performance networking capabilities for
MPP systems. The BYNET is a dual-redundant, bi-directional, multi-staged network based
on a Banyan network topology. The BYNET enables multiple processing nodes (SMP
nodes) to communicate in a high speed, loosely-coupled fashion.
BYNET communication occurs in a point-to-point, multi-cast, or broadcast fashion. A
connection request contains an address or routing tag for the intended receiving node or
group of nodes. Once the connection is made, a circuit is established for the duration of the
connection. The BYNET works much like a telephone network where many callers can
establish connections, including conference calls.
The BYNET interconnect provides a peak bandwidth of x Megabytes (MB) per second for each node per direction connected to a network:
• V1 – 10 MB
• V2 – 60 MB
• V3 – 93.75 MB
• V4 – 240 MB

For example, a BYNET v4 network provides 240 MB x 2 (bi-directional) x 2 (BYNETs) =
960 MB/sec per node. A 10-node 5600 system with a dual BYNET network has the
potential raw capability of 9600 MB (or 9.6 GB) per second total bandwidth for point–to–
point connection. However, the total available broadcast bandwidth is 960 MB per second
for a dual network system of any size.
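A short sketch of this bandwidth arithmetic, using the per-direction figures listed above:

per_direction_mb = {"V1": 10, "V2": 60, "V3": 93.75, "V4": 240}

def node_bandwidth_mb(version):
    # 2 directions x 2 BYNETs (dual redundant network)
    return per_direction_mb[version] * 2 * 2

def point_to_point_system_mb(version, nodes):
    return node_bandwidth_mb(version) * nodes

print(node_bandwidth_mb("V4"))             # 960 MB/sec per node
print(point_to_point_system_mb("V4", 10))  # 9600 MB/sec for a 10-node system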
Other features of the BYNET network include:


Guaranteed delivery - a message from a node is guaranteed to be delivered without
error to the receiving node(s); multiple levels of error checking and
acknowledgment are used to ensure this.



Fault tolerant - multiple connection paths are available in each network; dual
network feature provides an active backup network should one network be lost.



Flexible network usage - nodes communicate in point-to-point or broadcast fashion.



Self-configuring - the BYNET automatically determines network topology at startup; enables ease of installation.



Self-diagnosis and automatic fault recovery - automatically detects and reports
errors; reconfigures routing of connections to avoid inoperable processing nodes.



Load balancing - traffic is automatically and dynamically distributed throughout
the networks.

Page 9-40

Introduction to MPP Systems

What is the BYNET?
What is the BYNET (BanYan NETwork)?

• High speed interconnect (network) for processing nodes in MPP systems. The BYNET
is a dual redundant network.
• BYNET works much like a telephone network where many callers (nodes) can
establish connections, including conference calls.
• BYNET Version 3 Switches – 375 MB/Sec per node

• BYNET Version 4 Switches – 960 MB/Sec per node
[Diagram: Each SMP contains a BIC (BYNET Interface Card) and open BYNET software, and connects to two BYNET switches (v1, v2, v3, or v4).]

BYNET Switch Examples
• BYNET 4 switch (v2.1 – 240 MB/sec)
• BYNET 32 switch (BYA32S) – can execute at v3 or v4 speed
• BYNET 64 switch (v3.0 – 12U switches)
• BYNET 64 switch (v4.0 – 5U switches)

BIC (BYNET Interface Card) Examples (these can run at v3 or v4 speeds)
• BIC2SX – used with 54xx nodes
• BIC2SE – used with 5500 nodes and later

Introduction to MPP Systems

Page 9-41

BYNET 32 Switches
The facing page contains an example of BYNET 32 switches. Examples of other BYNET switches are listed below. This is not an inclusive list.
BYNET 4 Switch Version 2 (BYA4G) – a PCI card designed to interconnect up to 4 SMPs.
This switch is a BYNET v2 switch (60 MB/sec.) designed for 485x systems. The BYA4G is
a PCI card that is placed into a PCI slot of an SMP.
BYNET 4 Version 2.1 Switch (BYA4M) – PCI card designed to interconnect up to 4
SMPs. This switch is a BYNET v2.1 switch (60 MB/sec.) designed for 4900 systems. The
BYA4M is a PCI card that is placed into a PCI slot of an SMP.
BYNET 4 Switch Version 2.1 (BYA4MS) – PCI card designed to interconnect up to 4
SMPs. This BYNET V2.1 switch (60 MB/sec.) was designed for 4980 systems. The
BYA4MS has a shorter form factor – S is for shorter.
BYNET 32 Switch (BYA32S) – this switch can run at v3 or v4 speeds depending on the
system and type of BICs. Up to 16 TPA nodes and 16 NOTPA nodes can be connected to
this switch. This 1U chassis switch resides in a Base or System Cabinet.


Includes an Ethernet Interface for BYNET status & error reporting and chassis
management.

Note on BYNET cables:


Page 9-42

There is a physical difference between BYNET v2 and BYNET v3/v4 cables. The
BYNET v3/v4 cables have a “Quick Disconnect” connector whereas the BYNET
v2 cables have a “Micro D” connector with 2 screws. The number of wires inside
the cables is the same.

Introduction to MPP Systems

BYNET 32 Switches
[Diagram: Two BYA32S switches – one for BYNET 0 and one for BYNET 1 – each connected to TPA nodes 1–16 and non-TPA (NOTPA) nodes 17–32.]

BYNET 32 switch (BYA32S) is a 1U chassis used in a processor rack.

• This 32-port switch can execute at v3 or v4 speeds.
• Up to 16 TPA nodes can be connected.
• An additional 16 HSN, Channel, or non-TPA nodes can be connected.

Introduction to MPP Systems

Page 9-43

BYNET 64 Switches
For configurations greater than 16 TPA/HSN nodes, BYNET 64 switches must be used.
BYNET 64 Node Switch Version 2 (BYA64GX chassis) – this switch is actually
composed of 8 BYA8X switch boards in the BYA64GX chassis. Each BYA8X switch
board allows up to 8 SMPs to interconnect (i.e., 8 switches x 8 SMPs each = 64 SMPs). The
BYA64GX is actually a backpanel that allows the 8 BYA8X switch boards to interconnect.
This 12U chassis resides in either the BYNET V2 64 Node Switch cabinet or the BYNET
V2 64/512 Node Expansion Cabinet.
Note: BYA8X switch board (in BYA64GX chassis): This is Stage A base switch board.
Each board supports 8 links to nodes. The BYA64GX chassis can contain a
maximum of 8 BYA8X switches, allowing for 64 links to nodes. In systems
greater than 64 nodes, the BYA8X switch boards also connect the BYA64GX
chassis to BYB64G chassis through X-port connectors, one on each BYA8X board.

BYNET Switch Cabinets
The BYNET switch cabinets are different for BYNET v2, v3, and v4; however, the basic purpose is the same – to house BYNET 64 switches.
The BYNET 64 Node Switch Cabinet (shown on facing page) can be used for
configurations from 2 through 64 nodes and must be used for configurations greater than 16
nodes. All nodes in the configuration are interconnected from the BYNET (V2 or V3) node
interface to the BYNET (V2 or V3) 64 Node Switch chassis (BYA64GX). Two BYNET
(V2 or V3) 64 Node Switch Cabinets are required for the base dual redundant BYNET V2
networks.
The BYNET 512 Node Expansion Cabinet Version 2 (or 3) (not shown) is used for configurations that began with 64 nodes or less and have expanded beyond the 64-node maximum configuration supported by the BYNET BYA64GX chassis (in the BYNET 64 Node Switch Cabinet). Above 64 nodes, the BYNET BYB64G chassis (effectively a 512 node switch chassis) is used to interconnect multiple BYNET 64 node switch chassis. The simple configuration rules are:


Each group of 2 to 64 nodes requires two BYNET V2 64 node switch chassis; a
minimum of two is required for dual redundancy.



For configurations with greater than 64 nodes, each BYNET V2 64 node switch chassis must have a complementary BYNET V2 512 node switch chassis.

Page 9-44

Introduction to MPP Systems

BYNET 64 Switches
A BYNET 64 Switch is a separate chassis located inside a BYNET rack or cabinet.
• BYNET v3 64 Switches (BYA64GX) – 12U in height
  – 375 MB/sec per node for both BYNET channels
• BYNET v4 64 Switches (BYA64S) – 5U in height
  – 960 MB/sec per node for both BYNET channels

Two BYNET switch racks are needed to house these two BYNET 64 switches.

[Diagram: Nodes 1 through 64 each connect to a BYA64S (v4) BYNET 64 Node Switch chassis (5U) in each of two BYNET switch racks – one for BYNET 0 and one for BYNET 1. Nodes connect to BYA switches.]

Introduction to MPP Systems

Page 9-45

BYNET Expansion Switches
With BYNET v3, the BYA64GX and BYC64G switches are physically identical. What
makes them different is the firmware that is loaded onto the BYNET switch and how they
are cabled together.


The base chassis is the same as the BYNET version 2 base chassis. This includes sheet metal and backpanel, power supplies, power control board, and fans.



The v3 BYA8QX switch is new within the BYA64GX and BYC64G switches.

BYNET V3 64-node Switch Chassis
The BYNET V3 64-node Switch Chassis are used in 5400H systems with greater than 16 nodes. Each switch chassis resides in its own cabinet or co-resides with a BYNET V3 1024-node Expansion Switch Chassis. Each BYNET V3 64-node Switch Chassis provides the BYNET switching for its own BYNET V3 fabric; therefore, for redundancy, two 64-node Switch Chassis are needed. In systems with greater than 64 nodes, two BYNET 64-node switches are needed for every 64 nodes.

BYNET V3 1024-node Expansion Switch Chassis
The BYNET V3 1024-node Expansion Switch Chassis (marketing name) is used in 5400H systems with greater than 64 nodes. The 1024-node switch resides in its own cabinet or co-resides with a BYNET 64-node switch.
The total number of 1024-node switch chassis needed in a system is a power of 2 based on the number of nodes (see the sketch after this list):


• For systems with 65–128 nodes, two 1024-node switches are needed per BYNET fabric (total of 4).
• For systems with 129–256 nodes, four 1024-node switches are needed per BYNET fabric (total of 8).
• For systems with 257–512 nodes, eight 1024-node switches are needed per BYNET fabric (total of 16).
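A small sketch of the doubling pattern implied by these three bullets (an inferred rule, not an official configuration formula):

import math

def expansion_switches_per_fabric(nodes):
    # 65-128 nodes -> 2, 129-256 -> 4, 257-512 -> 8 per BYNET fabric
    if nodes <= 64:
        return 0   # no 1024-node expansion chassis needed
    groups_of_64 = math.ceil(nodes / 64)
    return 2 ** math.ceil(math.log2(groups_of_64))

for n in (64, 100, 200, 400):
    per_fabric = expansion_switches_per_fabric(n)
    print(n, per_fabric, 2 * per_fabric)   # nodes, per fabric, total for both fabrics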

BYNET Expansion to 1024 Nodes
BYNET v3/v4 support configurations of up to 1024 nodes. For BYNET v3, in order to interconnect more than 512 nodes, additional BYOX and BYCLK hardware is needed.

Page 9-46

Introduction to MPP Systems

BYNET Expansion Switches
This example shows both BYNETs and connects 128 nodes.

[Diagram: For each BYNET (BYNET 0 and BYNET 1), nodes 1–64 and nodes 65–128 connect to BYA64S (v4) switches, and the BYA switches are interconnected through BYC64S (v4) expansion switches housed in two BYNET switch racks.]

• The BYNET v4 Expansion switch (BYC) is a separate 5U chassis located inside the BYNET rack or cabinet.
• The BYNET v3 Expansion switch (BYC – not shown) is a 12U chassis. To support 128 nodes with BYNET v3 switches, 4 BYNET switch racks are needed.

Introduction to MPP Systems

Page 9-47

Server Management with SWS
The SWS (Service Workstation) provides a single operational view for Teradata MPP
Systems and the environment to configure, monitor, and manage the system. The SWS
effectively is the central console for MPP systems.
The SWS is one part of the Server Management subsystem that provides monitoring and
management capabilities of MPP systems. Prior to the SWS, other server management
environments were:
1st Generation Server Management (3600) – Server Management (SM) processing,
storage and display occurred on AWS.
2nd Generation Server Management (5100, 48xx/52xx, 49xxx/53xx) – most SM
processing occurs on CMICs and Management Boards. The AWS still provides all the
storage and display.
3rd Generation Server Management (54xx systems and beyond) – most SM processing occurs on CMICs and Management Boards. The SWS provides all the storage and display. The Server Management subsystem uses industry standard parts, a Server Management Node and Ethernet switches, to implement an Ethernet based Server Management solution. This new Server Management is referred to as Third Generation Server Management (SM3G).
One of the reasons for the new Server Management subsystem is to better adhere to industry
standards. Ethernet-based management is now the industry standard for chassis vendors.

Virtualized Management Server (VMS)
The Teradata Virtualized Management Server is a standard feature on the Data Warehouse
Appliance 2690 and the 6690. This 1U managed server rack mounts in the appliance node
cabinet and essentially consolidates all Teradata management functionality into one server.
The VMS contains the following functionality:
• Teradata Viewpoint, single system: Teradata Viewpoint is the robust web portal that manages workloads, queries, and systems.
• SWS: The Teradata SWS is the software that monitors the physical cabinet. This includes the nodes, disks, and connectivity.
• CMIC: The CMIC monitors all the disk controllers and cabling.

The VMS is a key reason why full racks can be shipped without having to have a separate
expansion cabinet for this functionality. Some considerations include:
• Traditional Viewpoint is still available, but it is priced and licensed differently. Please see the Teradata Viewpoint OCI for more information. Also note that VMS Viewpoint can only monitor one system, not multiple systems.
• If more than one node cabinet is required, the expansion cabinet will also have a VMS, but it will only contain the CMIC software as the others aren't needed.

Page 9-48

Introduction to MPP Systems

Server Management with SWS
For 1600, 56xx, and 66xx systems:
• The SWS (Service Workstation) is a Linux workstation that is dedicated to system servicing and maintenance.
  – May be deskside or rack mounted
• Server Management WEB (SMWeb) services provide operational & maintenance control via Internet access.

Option for 1650, 2690, and 6690 systems:
• VMS (Virtualized Management Server) – consolidated CMIC, SWS, Teradata Viewpoint

SMWeb services provide the ability to:
• connect to AWS-type displays
• connect to nodes
• power on/off/reset
• manage alerts
• obtain h/w or s/w status information

[Diagram: The SWS connects over dual Ethernet LANs to Collective #1 and Collective #2; each collective contains BYNET switches, array controllers, SMP nodes, hot standby nodes (HSN), and an SM (CMIC).]

Introduction to MPP Systems

Page 9-49

Node Naming Conventions
The examples on the facing page show AWS naming conventions for cabinets or racks.
Each chassis consists of a number of internal components (processors, fans, power supplies,
management boards, etc.). The chassis numbering for 52xx/53xx cabinets starts at 1 from
the top of the cabinet to bottom of the cabinet. The chassis numbering for 54xx and 55xx
cabinets starts at 1 from the bottom of the cabinet to the top of the cabinet.

54xx/55xx Chassis Numbering Conventions
A standard chassis numbering convention is used for the 54xxE, 54xxH/LC, 55xxC/H,
Storage, and BYNET cabinets. The chassis numbers are not defined by hardware, but only
by convention and numbering defined in the CMIC configuration file. Chassis numbers
begin with one and go up to 22. Chassis numbers are assigned to the position; chassis
numbers begin for each type of chassis as defined below and are not skipped if a chassis is
not installed.
All Cabinets
In all cabinets, chassis 1 is the bottom UPS, the numbering continues upward until all
UPS(s) are assigned. Up to 5 UPS(s) can exist in the cabinet.
Node Cabinets
Chassis 6 - CMIC in the Node cabinets
Chassis 7 through 16 – Nodes; the bottom node chassis starts 7 and continues up to 16.
The chassis number is assigned to the position, if no node is installed the chassis
number is skipped. If only 8 TPA nodes in a rack, then nodes are numbered 9 to 16.
Chassis 17 and 18 – BYA32Gs
Chassis 19 through 22 – FC switches (if present)
Storage Cabinets
Chassis 4 – SM Chassis (CMIC) in a Storage cabinet (if present)
Chassis 5 and 6 – Disk Array Controller chassis; lower disk array is 5, the upper is 6.
Disk Array names are DAMCxxx-y-z where xxx is collective number, y is cabinet
number, and z is chassis number.
BYNET Cabinets
Chassis 4 and 5 – BYNET 64 switches (Chassis 4 - BYC64, Chassis 5 - BYA64)

54xx/55xx Collective Conventions
A collective is made up of the node and disk array cabinets that are part of the same server
management group (usually the same clique).





Include the first BYNET Cabinet to the first Node Cabinet Collective
Include the second BYNET Cabinet to the second Node Cabinet Collective
Include the third BYNET Cabinet to the third Node Cabinet Collective, etc
Remember, only one BYNET Cabinet may be configured in any 54xx Collective

The SM3G Collectives are defined in software using the CMIC Configuration Utility. The
CMIC Configuration Records (CMICConfig.xml) contain configuration information for all
the chassis in a CMIC’s collective. All SM3G chassis must reside on the same Primary and
Secondary management networks.
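A hypothetical helper illustrating these naming conventions (the SMP form is inferred from examples such as SMP001-15 on the facing page; the DAMC form follows the description above):

def node_name(cabinet, chassis):
    # e.g., node_name(1, 15) -> "SMP001-15"
    return f"SMP{cabinet:03d}-{chassis}"

def disk_array_name(collective, cabinet, chassis):
    # e.g., disk_array_name(1, 2, 5) -> "DAMC001-2-5"
    return f"DAMC{collective:03d}-{cabinet}-{chassis}"

print(node_name(1, 15))
print(disk_array_name(1, 2, 5))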

Page 9-50

Introduction to MPP Systems

Node Naming Conventions
[Diagram: Node naming examples for 6650, 6690, and 5650 cabinets. Chassis positions are numbered from the bottom of the cabinet up, and node names combine the cabinet number and the chassis position – for example SMP001-15, SMP001-8, SMP002-7, SMP003-6, and SMP004-9. The cabinets also show primary and secondary SM switches, drive trays, 6844 array controllers, BYA32S BYNET switches, SM – CMICs, TMS nodes, hot standby nodes (HSN), a VMS (1U), UPSs (5650 only), and PDUs.]

Introduction to MPP Systems

Page 9-51

Summary
The facing page summarizes the key points and concepts discussed in this module.

Page 9-52

Introduction to MPP Systems

Summary

• Data Mart Appliance – Purpose: Test/Development or Smaller Data Marts. Possible Uses: Departmental Analytics, Entry level EDW.
• Extreme Data Appliance – Purpose: Analytics on Extreme Data Volumes from New Data Types. Possible Uses: Analytical Archive, Deep Dive Analytics.
• Data Warehouse Appliance – Purpose: Data Warehouse or Departmental Data Marts. Possible Uses: Strategic Intelligence, Decision Support, Fast Scan.
• Extreme Performance Appliance – Purpose: Extreme Performance for Operational Analytics. Possible Uses: Operational Intelligence, Lower Volume, High Performance.
• Active Enterprise Data Warehouse – Purpose: Enterprise Scale for both Strategic and Operational Intelligence (EDW/ADW). Possible Uses: Active Workloads, Real Time Update, Tactical and Strategic response times.

Introduction to MPP Systems

Page 9-53

Module 9: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 9-54

Introduction to MPP Systems

Module 9: Review Questions
1. What is a major difference between a 6650 system as compared to a 6690 system?
_____________________________________________________________________
2. What is a major difference between a 2650 node and a 2690 node?
_____________________________________________________________________
3. What does the acronym represent and briefly define the purpose of the following subsystems?
BYNET _____________________________________________________________________________
SWS

_____________________________________________________________________________

4. Specify the names of the two TPA nodes in 6690 cabinet #2.
__________

____________

Play the numbers game – match the number to a definition.

1. 3      a. Typical # of AMPs per node in a 6650 3+1 clique
2. 8      b. Maximum number of nodes that can be in a 2690 cabinet
3. 24     c. Maximum number of drives in one NetApp 6844 disk array
4. 42     d. Number of nodes in a 2650 clique
5. 128    e. Large disk drive size (GB) for a 2690 disk array
6. 900    f. Typical # of AMPs in a 2690 node

Introduction to MPP Systems

Page 9-55

Notes

Page 9-56

Introduction to MPP Systems

Module 10
How Teradata uses MPP Systems

After completing this module, you will be able to:
 Identify items that are placed into FSG cache.
 Identify a purpose for the WAL Depot and the WAL Log.
 Describe the fundamental relationship between Linux, logical
units, and disk array controllers.
 Describe the fundamental relationship between Vdisks, Pdisks,
LUNs, and partitions.

Teradata Proprietary and Confidential

How Teradata uses MPP Systems

Page 10-1

Notes

Page 10-2

How Teradata uses MPP Systems

Table of Contents
Teradata and the Processing Node ............................................................................................. 10-4
FSG Cache ............................................................................................................................. 10-4
Memory and the Teradata Database ........................................................................................... 10-6
5555H Example...................................................................................................................... 10-6
SMP Memory – Summary ......................................................................................................... 10-8
Determining FSG Cache ........................................................................................................ 10-8
O.S. Managed Memory and FSG Cache .................................................................................. 10-10
WAL – Write Ahead Logic ...................................................................................................... 10-12
WAL Concepts ......................................................................................................................... 10-14
Linux Vproc Number Assignment ........................................................................................... 10-16
Disk Arrays from an O.S. Perspective ...................................................................... 10-18
Logical Units and Partitions ..................................................................................................... 10-20
EMC2 Notes ......................................................................................................................... 10-20
Teradata and Disk Arrays......................................................................................................... 10-22
Teradata 6650 (2+1) Logical View .......................................................................................... 10-24
Teradata 6650 (3+1) Logical View .......................................................................................... 10-26
Example of 1.2 TB Vdisk (pre-TVS) ....................................................................................... 10-28
Teradata File System Concepts ................................................................................................ 10-30
Teradata Vdisk Size Limits ...................................................................................................... 10-30
Teradata 13.10 Large Cylinder Support ................................................................................... 10-32
When to Use This Feature ................................................................................................ 10-32
Full Cylinder Read ................................................................................................................... 10-34
Summary .................................................................................................................................. 10-36
Module 10: Review Questions ................................................................................................. 10-38

How Teradata uses MPP Systems

Page 10-3

Teradata and the Processing Node
The example on the facing page illustrates a 5650H processing node running Linux and
Teradata.
Memory will initially be allocated for the operating system and Teradata vprocs. PDE will
calculate how much memory to allocate to itself for FSG (File Segment Cache) based on
memory not being used by the operating system and the Teradata vprocs. PDE software will
manage the FSG memory space.
Practical experience (for most environments) indicates that the operating system (e.g.,
Linux) may need more than this initial allocation during startup. For these reasons, PDE is
not assigned all of the remaining memory for FSG cache, but a percentage (e.g., 90%) of the
remaining memory.
Also note that LAN and channel adapters (PBSA) require memory for network and
channel activity. For example, each channel adapter uses memory buffers up to 500 MB in
size. For 56xx systems, LAN and channel adapters are not utilized within a TPA node; they
are implemented in “Extended Node Types”.

FSG Cache
FSG Cache is primarily used by the AMPs to access memory resident database segments.
When the Teradata Database needs to read a database block, it checks FSG Cache first.

Page 10-4

How Teradata uses MPP Systems

Teradata and the Processing Node

(Diagram of a TPA node running Linux and Teradata)

•  FSG (File Segment Cache) – managed by PDE
•  PE vprocs and AMP vprocs (Teradata TPA software)
•  PDE Vproc (Parallel Database Extensions)
•  GTW Vproc (Gateway)
•  TVS Vproc (Teradata Virtual Storage)
•  RSG Vproc (Relay Services Gateway) – optional
•  Linux – Process Control, Memory Mgmt., I/O Mgmt. (Device Drivers)
•  CPUs – two Pentium Westmere Six-Core, 3.06 GHz
•  I/O adapters – BIC2SE (BYNET), QFC (Quad Fibre Channel – Disk Arrays), Ethernet (LANs)

How Teradata uses MPP Systems

Page 10-5

Memory and the Teradata Database
The example on the facing page assumes a 6650H node with 96 GB of memory executing
the Teradata Database. This example assumes 42 AMPs, 2 PEs, PDE, GTW, RSG, and TVS
vprocs for a total of 48 vprocs in this node. This means that memory will have to be
allocated for the 48 vprocs.
The operating system, device drivers, and Teradata vprocs for a 6650H Linux node with 96
GB of memory will use approximately 18 GB of memory. PDE will use a FSG Cache
Percent (CTL parameter) to calculate how much memory to allocate to itself for FSG (File
Segment Cache) based on the available memory (96 GB – 18 GB).
Practical experience (for most environments) indicates that the operating system (e.g.,
Linux) may need more than this initial allocation during startup. Parsing Engines and AMPs
will typically use more than their initial allocation of memory (80 MB). For example,
redistribution buffers for an AMP may use an additional 130 MB of memory for a total of
210 MB of memory per AMP.
For these reasons, PDE is not assigned all of the remaining memory for FSG cache, but a
percentage of the remaining memory. The default of 90% for FSG Cache Percent works for
most 66xx systems. 90% of 78 GB (96-18) = 70.2 GB of FSG cache.
This can be verified with the ctl utility: using its hardware function, it can be determined
that each of the 42 AMPs has an average of 1.669 GB of FSG memory. 42 x 1.669 GB ≈ 70.1 GB
of FSG cache.
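
The memory split described above is simple arithmetic; the short sketch below is
illustrative only – the 96 GB, 18 GB, and 90% figures are just the values from this
example, not fixed constants.

    # FSG Cache Percent arithmetic for the example node above.
    total_memory_gb   = 96      # physical memory in the node
    os_and_vprocs_gb  = 18      # O.S., device drivers, and initial vproc allocations
    fsg_cache_percent = 0.90    # ctl "FSG Cache Percent" setting

    remaining_gb   = total_memory_gb - os_and_vprocs_gb   # 78 GB
    fsg_cache_gb   = remaining_gb * fsg_cache_percent     # ≈ 70.2 GB managed by PDE
    free_memory_gb = remaining_gb - fsg_cache_gb          # ≈ 7.8 GB of free memory

    print(fsg_cache_gb, free_memory_gb)   # ≈ 70.2 and ≈ 7.8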

5555H Example
Assume a 5555H node with 32 GB of memory executing the Teradata Database.
Assume that a typical 5555H node will have 25 AMPs, 2 PEs, PDE, GTW, RSG, and TVS
vprocs for a total of 31 vprocs. This means that memory will have to be allocated for the 31
vprocs.
The operating system, device drivers, and Teradata vprocs for a 5555H Linux node with 32
GB of memory may use as much as 5.8 GB of memory. PDE will use a FSG Cache Percent
(CTL parameter) to calculate how much memory to allocate to itself for FSG (File Segment
Cache) based on the available memory (32 GB – 5.8 GB).
The 5.8 GB is based on the Design Center recommendation for a 5555H node with 32 GB of
memory.
For these reasons, PDE is not assigned all of the remaining memory for FSG cache, but a
percentage of the remaining memory. The default of 80% for FSG Cache Percent works for
most 5555 systems.

Page 10-6

How Teradata uses MPP Systems

Memory and the Teradata Database
Example of 6650 (Linux) node
with 2 PEs and 42 AMPs and
96 GB of memory:

10% of remaining space – 8 GB available as free space

Memory
O.S., Device Drivers, and
space for vprocs ≈ 18 GB
96 GB
– 18 GB
78 GB
FSG Cache 90%

FSG Cache ≈ 70 GB
Free Memory ≈

8 GB

Examples of objects that are
memory resident:
Hash Maps
Configuration Maps
Master Indexes
RTS – Request-to-Steps Cache
D/D – Data Dictionary Cache

How Teradata uses MPP Systems

FSG (File Segment Cache)
(Examples of use – Data Blocks & Cylinder Indexes)
Managed by PDE Software

90% of remaining space – 70 GB available for FSG
PE
Vproc
RTS
D/D
Cache

PDE

...

AMP
Vproc

AMP
Vproc

.........

Master
Index
Hash Maps
Configuration Maps

Master
Index

GTW

VSS

RSG

Operating System and Device Drivers

Ex. 96 GB Memory

Page 10-7

SMP Memory – Summary
Practical experience (for most environments) indicates that Linux and Teradata vprocs need
more memory than initially allocated during normal processing periods. Prior to V2R5, it
was recommended that at least 20 MB to 40 MB of additional free memory be available for
each AMP. With 32-bit systems, it is recommended that a node have at least 60 – 80 MB of
free memory available for each AMP. With 64-bit systems, each AMP may use up to 210
MB of memory. This would be an additional 130 MB of memory per AMP.
This is accomplished by not giving 100% of the remaining memory to FSG. It is always
recommended that the FSG Cache Percent be set to a value less than 100%. The default of
90% for FSG Cache Percent works well for most 56xx and 66xx configurations. 80%
usually works well for 5555 configurations.

Determining FSG Cache
The “ctl” utility can be used to determine how much FSG cache memory is actually
available to a node.
Using the “ctl” utility, the hardware command will report the amount of FSG cache for each
AMP. The values below represent the average amount of FSG memory per AMP.
Examples are shown below.
For a 5555H node with 25 AMPs, the report will indicate 838,016 KB per AMP.
838,016 KB/AMP x 25 = 20,950,400 KB or approximately 21 GB of FSG cache.
For a 2555H node with 36 AMPs, the report will indicate 582,016 KB per AMP.
582,016 KB/AMP x 36 = 20,952,576 KB or approximately 21 GB of FSG cache.
For a 5600H node with 40 AMPs, the report will indicate 1,753,856 KB per AMP.
1,753,856 KB/AMP x 40 = 70,154,240 KB or approximately 70 GB of FSG cache.
For a 5650H node with 47 AMPs, the report will indicate 1,472,000 KB per AMP.
1,472,000 KB/AMP x 47 = 69,184,000 KB or approximately 69.2 GB of FSG cache.
For a 6650H node with 42 AMPs, the report will indicate 1,669,120 KB per AMP.
1,669,120 KB/AMP x 42 = 70,103,040 KB or approximately 70.1 GB of FSG cache.
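
Since these conversions are just the per-AMP KB figure multiplied by the number of AMPs,
they can be checked with a few lines of script. The sketch below is illustrative only; the
figures are the ones quoted above, and ctl itself is a Teradata support utility, not
something invoked from Python.

    # Per-AMP FSG cache (KB) reported by ctl's hardware command, and AMPs per node,
    # for the example nodes above.
    examples = {
        "5555H": (838_016, 25),
        "2555H": (582_016, 36),
        "5600H": (1_753_856, 40),
        "5650H": (1_472_000, 47),
        "6650H": (1_669_120, 42),
    }

    for node, (kb_per_amp, amps) in examples.items():
        total_kb = kb_per_amp * amps
        print(f"{node}: {total_kb:,} KB = approx. {total_kb / 1e6:.1f} GB of FSG cache")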

Page 10-8

How Teradata uses MPP Systems

SMP Memory – Summary

Based on the configuration and FSG Cache Percent value, PDE will determine the amount of
memory to allocate for FSG cache.

However, vprocs (especially AMPs) will use more than their initial memory allocations
during normal processing (e.g., redistribution buffers, aggregation buffers, hash join
buffers, etc.).

Some basic guidelines for AMPs are:
•  64-bit systems – assume 210 MB per AMP

Example – 96 GB of memory:
•  O.S., device drivers, and Teradata vprocs ≈ 18.0 GB – memory managed by the O.S.
•  Free memory ≈ 7.8 GB – memory managed by the O.S.
•  FSG Cache – managed by PDE FSG software
   – 90% ≈ 70.2 GB
   – 80% ≈ 62.4 GB

FSG – pool of memory managed by PDE and each AMP uses what it needs.
ctl Parameter – FSG Cache Percent – for 66xx, the design center recommendation is 90%
and this works for most configurations.

How Teradata uses MPP Systems

Page 10-9

O.S. Managed Memory and FSG Cache
The facing page lists examples of how Operating System managed memory (free memory)
and FSG cache is used.
Memory managed and used by the operating system and the vprocs is sometimes called
“free memory”. On a TPA node, the main consumers of free memory are the operating
system and the Teradata vprocs.
A brief description of Teradata vprocs:

•  AMP – Access module processors perform database functions, such as executing
   database queries. Each AMP owns a portion of the overall database storage.

•  GTW – Gateway vprocs provide a socket interface to Teradata Database on Windows
   and Linux systems. On MP-RAS systems, the same functionality is provided by
   gateway software running directly on the system nodes within the PDE vproc.

•  Node (or Base) PDE vproc – the node vproc handles PDE and operating system
   functions not directly related to AMP and PE work. Node vprocs cannot be
   externally manipulated, and do not appear in the output of the Vproc Manager
   utility.

•  PE – Parsing engines perform session control, query parsing, security validation,
   query optimization, and query dispatch.

•  RSG – Relay Services Gateway provides a socket interface for the replication agent,
   and for relaying dictionary changes to the Teradata Meta Data Services (MDS)
   utility.

•  TVS – Manages Teradata Database storage. AMPs acquire their portions of database
   storage through the TVS (previous releases named this VSS) vproc.

When Teradata needs to read a database block, it checks FSG Cache first.
Examples of how FSG Cache is used:

•  Permanent data blocks
•  Cylinder Indexes
•  Spool data blocks
•  Transient Journals
•  Permanent Journals
•  Synchronized scan (sync scan) data blocks

Page 10-10

How Teradata uses MPP Systems

O.S. Managed Memory and FSG Cache
Memory managed by the O.S. is referred to as “free memory”.

• Teradata Vprocs
  – AMP – includes AMP worker tasks
  – PE – Session control, Parser, Optimizer, Dispatcher
  – PDE (Parallel Database Extensions) – messaging, FSG space management, etc.
  – GTW (Gateway) – Logon Security, Session Context, Connection to Client
  – RSG (Relay Services Gateway) – Optional; Replication Gateway, MDS auto-update
  – TVS (Teradata Virtual Storage) – manages Teradata Virtual Storage

• Administrative and/or user programs such as:
  – kernel resources and administrative program text and data
  – message buffers (ex., TCP/IP)

Memory managed by PDE is called FSG cache. FSG cache is primarily used by
the AMPs to access memory resident database segments.

• When Teradata needs to read a database block, it checks FSG Cache first.
  – Permanent data blocks
  – Cylinder Indexes
  – Spool data blocks
  – Journal blocks; Transient Journal and/or Permanent Journals
  – Synchronized scan (sync scan) data blocks

How Teradata uses MPP Systems

Page 10-11

WAL – Write Ahead Logic
WAL (Write Ahead Logic) is a recoverability/reliability feature that can possibly provide
performance improvements in the area of database writes. In general, I/O increases with
WAL and, therefore, it may reduce throughput for I/O bound workloads. However, the
overall performance is expected to be better with WAL since the benefit of CPU
improvement outweighs the I/O cost. There is some additional CPU cost for maintaining the
WAL log, so WAL may reduce throughput for CPU-bound workloads, but this cost is minimal.
Simple example: Assume Teradata Mode, an implicit transaction, and you are doing an
UPDATE of a single row in a block that has 300 rows in it.
1. Data block is read into FSG Cache.
2. UNDO row is written to the WAL Log (effectively a before-image or TJ type row).
3. The data block in memory is changed and is marked as changed (not immediately written
   to disk – called a deferred write).
4. REDO row is written to the WAL Log (effectively an after-image) – writing a single
   REDO row is faster than writing a complete block.
5. The lock is released and the user gets a transaction completed message. Note the
   updated block is still in memory and hasn't been written to disk yet.

   Note: Other users might be doing updates on rows in the same block and there might be
   multiple updates to the same block in memory.

6. At some point (maybe a half-second later), the block needs to be written to disk.
   This is a deferred write and is done in the background.

6A. If the updated block has not changed size, then it can be written back-in-place. Before
physically writing the block back-in-place, the updated block is first written to the WAL
Depot. After the data block is successfully written to the WAL Depot, it is then physically
written back-in-place.
Why is the block effectively written twice back to disk? A write operation can fail (called
interrupted write) and this can corrupt a block on disk and potentially corrupt all 300
rows. This is a very rare occurrence, but it can happen. The WAL Log only holds the single
REDO row for the row that has changed. Therefore, by writing the block first to the WAL
Depot before writing back-in-place, Teradata ensures that a good copy of the entire
datablock is written back-to-disk. The WAL Depot is ONLY used for blocks that haven't
changed size - effectively write back-in-place operations. This is an extra internal I/O, but
it provides data integrity and protection from interrupted write operations.
6B. If the block has changed size in memory (e.g., block expands to an additional sector), then
the updated block is written to a new location on disk - it is not written to the WAL
Depot. If there is an interrupted write, the original block has not been touched and the
REDO rows along with the original data block can be used for recovery.

WAL can batch up modifications from multiple transactions and apply them with a single
disk I/O, thereby saving I/O operations. WAL will help improve throughput for I/O-bound
workloads. Obviously, Load utilities such as FastLoad and MultiLoad don't need to use
WAL. Other functions such as FastPath operations use the WAL subsystem differently.
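
The sequence above can be summarized in a short pseudocode sketch. This is only an
illustration of the ordering of UNDO/REDO logging, deferred writes, and the WAL Depot as
described in steps 1–6; the names (fsg_cache, wal_log, wal_depot, disk) are invented for
the sketch and are not Teradata interfaces.

    import copy

    fsg_cache = {}     # block_id -> in-memory copy of a data block
    wal_log   = []     # sequential UNDO / REDO records (WAL Log)
    wal_depot = []     # whole-block copies protecting write-in-place (WAL Depot)
    disk      = {1: {"rows": {10: "old value"}, "resized": False}}

    def update_row(block_id, row_id, new_value):
        block = fsg_cache.setdefault(block_id, copy.deepcopy(disk[block_id]))  # 1. read into FSG Cache
        wal_log.append(("UNDO", block_id, row_id, block["rows"][row_id]))      # 2. before-image row
        block["rows"][row_id] = new_value                                      # 3. change block in memory only
        wal_log.append(("REDO", block_id, row_id, new_value))                  # 4. after-image row
        # 5. lock released; the user sees "transaction complete" while the block is still only in memory

    def age_block_to_disk(block_id):
        block = fsg_cache[block_id]                              # 6. deferred background write
        if not block["resized"]:
            wal_depot.append((block_id, copy.deepcopy(block)))   # 6A. Depot copy first ...
            disk[block_id] = copy.deepcopy(block)                #     ... then write-in-place
        else:
            disk[max(disk) + 1] = copy.deepcopy(block)           # 6B. resized block goes to a new location

    update_row(1, 10, "new value")
    age_block_to_disk(1)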

Page 10-12

How Teradata uses MPP Systems

WAL – Write Ahead Logic
WAL – Write Ahead Logic

• Available with all Teradata systems – PDE (UNIX MP-RAS) and OpenPDE (Windows
and Linux)

• Replaced Buddy Backup in PDE (UNIX MP-RAS) Teradata systems
WAL is primarily an internal recoverability/reliability feature that also provides
performance improvements in the area of database writes.

• All modifications are represented in a log and the log is forced to disk at key times.
• Data Blocks updated in Memory, but not written immediately to disk
• In place of the data block written to disk, the before image (UNDO row) and after
image (REDO row) are written to a WAL buffer which is written to the WAL log on disk.

• WAL can batch up modifications from multiple transactions and apply them with a
single disk I/O, thereby saving I/O operations. WAL will help improve throughput for
I/O-bound workloads.

• Updated data blocks will be eventually aged out and written to disk.
Note: There are numerous DBS Control parameters to specify space allocations
for WAL.

How Teradata uses MPP Systems

Page 10-13

WAL Concepts
WAL has its own file system software and uses a fixed number of cylinders for the WAL
Depot (varies by vdisk size and DBSControl parameters) and a dynamic number of cylinders
for the WAL Log itself.
The WAL Depot consists of two types of slots:

•  Large Depot slots
•  Small Depot slots

The Large Depot slots are used by aging routines to write multiple blocks to the Depot area
with a single I/O. The Small Depot slots are used when individual blocks that require Depot
protection are written to the Depot area by foreground tasks.
The number of cylinders allocated to the Depot area is fixed at startup based on the settings
of several internal DBS Control flags.
The number of Depot area cylinders allocated is per pdisk, so their total number depends on
the number of Pdisks in your system. Sets of Pdisks belong to a subpool, and the system
assigns individual AMPs to those subpools.
Because it does not assign Pdisks to AMPs, the system calculates the average number of
Pdisks per AMP in the entire subpool from the vconfig GDO when it allocates Depot
cylinders, rounding up the calculated value if necessary. The result is then multiplied by the
specified values to obtain the total number of depot cylinders for each AMP. Using this
method, each AMP is assigned the same number of Depot cylinders.
The concept is to disperse the Depot cylinders fairly evenly across the system. This prevents
one pdisk from becoming overwhelmed by all the Depot writes for your system.
WAL (Write Ahead Logic) is a transaction logging scheme maintained by the File System
in which a write cache for disk writes of permanent data is maintained using log records
instead of writing the actual data blocks at the time a transaction is processed. Multiple log
records representing transaction updates can then be batched together and written to disk
with a single I/O thus achieving a large savings in I/O operations and enhancing system
performance as a result.
The amount of space used for the WAL Log is dynamic. WAL contains before-images (TJ)
and after-images (Redo) for transactions. For example, the number of TJ images is very
dependent on the type of transaction. Updating every row in a large table places a lot of TJ
images into WAL.
Note: Prior to the V2R6.2 release, Teradata systems running under UNIX MP-RAS utilized a
facility referred to as “buddy backup”.

Page 10-14

How Teradata uses MPP Systems

WAL Concepts

(Diagram: each AMP's cylinders are divided into WAL Depot cylinders, WAL Log cylinders, and
Data Cylinders for Perm, Spool, Temporary, and Permanent Journal data. Allocation of
cylinders is not contiguous.)

WAL Depot
•  Fixed number of cylinders allocated to each AMP.
•  Used for Write-in-Place operations.
•  Teradata first writes data to the WAL Depot and, if successful, then writes to disk.
•  WAL logic will attempt to group a number of blocks to write to the WAL Depot.

WAL Log
•  Dynamic number of cylinders used by each AMP.
•  Used for new block allocations on disk.
•  Contains before-images (UNDO) and after-images (REDO) for transactions – used with both
   Write-in-Place and new block allocations.
•  Updated data blocks will eventually be aged out and written to disk.

How Teradata uses MPP Systems

Page 10-15

Linux Vproc Number Assignment
The facing page describes how Vprocs are assigned numbers.
With OpenPDE systems, gateway software is implemented as a separate vproc (named
GTW) from the PDE vproc. With MP-RAS systems (PDE), gateway software is
incorporated into the PDE vproc.
Within a multi-node single clique system, it is possible for one of the nodes to have a second
TVS vproc. This may seem like an anomaly, but this is normal.
For example, assume a 3+1 single clique system:


In order for fallback to be most effective, a single clique is divided into two
subpools of storage and AMPs which reference that storage. Fallback is then set up
as a cluster size of two between the two subpools of AMPs. An Allocator (part of the
TVS vproc) only deals with a single subpool of storage. Since in this case we are
dividing two subpools across three nodes, one of the nodes has about half of its
storage in one subpool and half of its storage in the other subpool. Therefore, that
node needs to have two Allocator vprocs, one for each subpool of storage. Any
system with more than one clique has only one subpool per clique and this
anomaly goes away.

A single node system (which is of course a single clique) also has two subpools for
the same reason.

With Teradata 13.10 (and previous releases), vproc number ranges are:
AMPs – 0, 1, 2, …
PEs – 16383, 16382, 16381, …
GTW – 8192, 8193, 8194, …
VSS – 10238, 10237, 10236, …
PDE – 16384, 16385, 16386, …
RSG – 9215, 9216, 9217, … (Optional)
When a system is configured with PUT, the installer is presented with an option to choose
large vproc numbers if configuring a system with more than 8,192 AMPs. Therefore,
starting with Teradata 14.0, optional vproc number ranges are:
AMPs – 0, 1, 2, …
PEs – 30719, 30718, 30717, …
GTW – 22528, 22529, 22530, …
TVS – 28671, 28670, 28669, …
PDE – 30720, 30721, 30722, …
RSG – 26623, 26622, 26621, … (Optional)
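
The two numbering schemes above follow a simple start-value-plus-direction pattern, which
the sketch below illustrates. The table values are the ones listed on this page; the
function and dictionary names are invented for illustration and are not part of any
Teradata utility.

    # Vproc number ranges: (starting number, direction of assignment)
    RANGES_PRE_14 = {              # Teradata 13.10 and earlier
        "AMP": (0, +1),
        "PE":  (16383, -1),
        "GTW": (8192, +1),
        "VSS": (10238, -1),
        "PDE": (16384, +1),
        "RSG": (9215, +1),         # optional vproc
    }

    RANGES_14_LARGE = {            # optional large ranges starting with Teradata 14.0
        "AMP": (0, +1),
        "PE":  (30719, -1),
        "GTW": (22528, +1),
        "TVS": (28671, -1),
        "PDE": (30720, +1),
        "RSG": (26623, -1),        # optional vproc
    }

    def vproc_numbers(vproc_type, count, ranges):
        start, step = ranges[vproc_type]
        return [start + i * step for i in range(count)]

    print(vproc_numbers("PE", 3, RANGES_PRE_14))     # [16383, 16382, 16381]
    print(vproc_numbers("PE", 3, RANGES_14_LARGE))   # [30719, 30718, 30717]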

Page 10-16

How Teradata uses MPP Systems

Linux Vproc Number Assignment

Each Teradata Vproc is assigned a unique Vproc number in the system. For example:

Typical Vproc assignments:
   AMP Vproc #s (start at 0 and increment by 1)
   •  First AMP      0
   •  Second AMP     1
   •  Third AMP      2
   PE Vproc #s (start at 16383 and decrement by 1)
   •  First PE       16383
   •  Second PE      16382
   •  Third PE       16381

Optional Vproc assignments starting with Teradata 14.0:
   AMP Vproc #s (start at 0 and increment by 1)
   •  First AMP      0
   •  Second AMP     1
   •  Third AMP      2
   PE Vproc #s (start at 30719 and decrement by 1)
   •  First PE       30719
   •  Second PE      30718
   •  Third PE       30717

Note: These Vproc numbers appear in the DD/D and in utilities such as Teradata
Administrator, Viewpoint, etc.

How Teradata uses MPP Systems

Page 10-17

Disk Arrays from an O.S. Perspective
The Operating System is used to read and/or write data to/from an individual disk. Disk
arrays trick the operating system into thinking it is writing to a single disk. A disk array
LUN looks to the operating system like a single disk. When the operating system gets ready
to do a read or a write, the disk array controller steps in and says, “I’ll handle that for you”.
The operating system says, “I am writing to a single disk and its address is c10t0d0s1”.
The operating system does not directly read or write to a disk in a disk array environment.
The operating system communicates with the disk array controller. The operating system
actually reads or writes the data from a logical unit (often referred to as a LUN or a
Volume). A logical unit (LUN) or Volume is a logical disk and not a physical disk.
The operating system does not know (or care) if a LUN or Volume is RAID 0, RAID 1, or
RAID 5. The operating system does not know if the drive group is one disk, two disk, or
four disks. The operating system does not know if the data is spread across one disk or four
disks. The operating system simply sees the logical unit as a single disk.
The standard operating system utilities that are used to manage, configure, and utilize a
physical disk are also used to manage, configure, and utilize a logical disk or LUN. With
the Teradata Database, the PUT utility is used to configure the disk array.
The array controller performs the actual input/output operations to its disks. The array
controller is responsible for handling the different RAID technologies.

Page 10-18

How Teradata uses MPP Systems

Disk Arrays from an O.S. Perspective
A logical unit (LUN) or Volume is a single disk to the operating system.

– The operating system does not know or care about the specific RAID technology
being used for a LUN or Volume.

– The operating system uses LUNs to communicate with disk array controllers.
– It is possible to divide a LUN into one or more partitions (or slices for MP-RAS).

(Diagram: the operating system sees LUN 0, LUN 1, ..., LUN 59; each LUN maps to two disks
in the array – LUN 0 to Disks 1 and 2, LUN 1 to Disks 3 and 4, ..., LUN 59 to Disks 119
and 120.)

The operating system (e.g., Linux) thinks it is reading and writing to 60 logical disks.

How Teradata uses MPP Systems

Page 10-19

Logical Units and Partitions
A logical unit (just like a physical disk) can be divided into multiple partitions (or slices
with MP-RAS). A partition is a portion of a logical unit. A partition is typically used in one
of two ways.



•  Used to hold the Linux file system on SMP node internal disks.
•  Provides a raw data storage area (raw disk partition) that is used by Teradata.

EMC2 Notes
EMC2 DMX disk arrays are configured with 4-way Hyper (disk slice) Meta volumes which
are seen as LUNs at the host or SMP level.
Each drive is divided into 4 equal size pieces (effectively slices within the array). 4 slices
(across 4 disks) make a LUN that is presented to the operating system.
Meta volumes are used to reduce the number of LUNs and minimize registry entries in a
Windows system.
Acronym: FS – File System

Page 10-20

How Teradata uses MPP Systems

Logical Units and Partitions

With Linux, a logical unit (LUN) or Volume can be divided into one or more partitions.

•  With MP-RAS systems, the portions of a LUN are referred to as slices.

How are partitions typically used by Teradata?

•  Provides raw data storage area (raw disk partition) for Teradata.
•  A Pdisk (Teradata) is a name that is assigned to a partition (slice) within a LUN.

(Diagram: one LUN with a single partition of raw disk space corresponds to one Pdisk;
another LUN with multiple partitions – each raw disk space – corresponds to multiple
Pdisks, one per partition.)

How Teradata uses MPP Systems

Page 10-21

Teradata and Disk Arrays
The Teradata Database has long been recognized as one of the most effective database
platforms for the storage and management of very large relational databases.
The Teradata Database implementation executes as an application under the operating
system. The two key pieces of software that make up the Teradata Database are the PE
software and the AMP software.
Users access the Teradata Database by issuing SQL commands - usually from channel-attached hosts or LAN-attached workstations. The user request is handled by Channel
Driver or Gateway software and is passed to a Parsing Engine (PE) which processes the
SQL request. PE software manages the user session, interprets (parses) the SQL request,
creates an execution plan, and dispatches the steps of that plan to the AMP(s).
AMPs provide access to user data stored within tables that are physically stored on disk
arrays.
Each AMP is associated with a Vdisk. Each AMP sees its Vdisk as a single disk. Teradata
Database (AMP software) organizes its data on its disk space (Vdisk) using a Teradata
Database “File System” structure. A “master index” is used to locate “cylinder indexes”
which are used to locate data blocks that contain data rows.
A Vdisk is actually composed of multiple slices (also called Pdisks - Physical disk) that are
part of a LUN (Logical Unit) in a disk array. The operating system (e.g., Linux) and the
array controllers work at the LUN level.
A logical unit (just like a physical disk) can be divided into multiple slices. A slice is a
portion of a logical unit.
An AMP is assigned to a Vdisk. A Vdisk is composed of one or more Pdisks. In Linux, a
Pdisk is assigned to a partition within a LUN.
The PUT utility is used to define a Teradata Database configuration.
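
The AMP / Vdisk / Pdisk / LUN relationship described above can be pictured as a simple
containment hierarchy. The sketch below is purely illustrative (the class names are
invented for this example and are not Teradata structures): an AMP owns one Vdisk, the
Vdisk is made up of one or more Pdisks, and each Pdisk is a raw partition within a LUN
presented by the disk array.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Pdisk:
        lun: int          # LUN presented by the disk array to the operating system
        partition: int    # raw partition (slice) within that LUN

    @dataclass
    class Vdisk:
        pdisks: List[Pdisk] = field(default_factory=list)

    @dataclass
    class AMP:
        number: int
        vdisk: Vdisk      # each AMP sees its Vdisk as a single disk

    # AMP 0 sees one logical disk, even though its Vdisk spans partitions in two LUNs.
    amp0 = AMP(number=0, vdisk=Vdisk(pdisks=[Pdisk(lun=0, partition=1),
                                             Pdisk(lun=1, partition=1)]))
    print(len(amp0.vdisk.pdisks))   # 2 Pdisks behind one Vdisk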

Page 10-22

How Teradata uses MPP Systems

Teradata and Disk Arrays

Teradata Pdisk = Linux/Windows Partition

(Diagram: a user request flows through a PE to an AMP; the AMP's File System software sees
its Vdisk – made up of Pdisk 0 and Pdisk 1 – as a single disk; TVS, PDE, and the O.S. pass
the I/O to the disk array controller, which presents the logical disks LUN 0 and LUN 1 that
back Pdisk 0 and Pdisk 1.)

How Teradata uses MPP Systems

Page 10-23

Teradata 6650 (2+1) Logical View
The facing page illustrates a logical view of the Teradata Database using a 6650 2+1 clique.
The design center configuration for a 6650H (Linux) 2+1 clique is as follows:



4 Drives per AMP
30 AMPs per Node

Each virtual AMP is assigned to a virtual disk (Vdisk). AMP 0 is assigned to Vdisk 0 which
consists of 2 mirrored pairs of disks.
Each AMP has a Vdisk with 592,020 cylinders.
Note: The actual MaxPerm space that is available to an AMP is slightly less than the
physical disk space because of file system overhead. Approximately 90 - 91% of the
physical disk space is actually available as MaxPerm space.

Page 10-24

How Teradata uses MPP Systems

Teradata 6650 (2+1) Logical View

(Diagram: two 6650H nodes; the first node has AMPs 0 – 29 with Vdisks 0 – 29, the second
node has AMPs 30 – 59 with Vdisks 30 – 59; each node's AMPs are supported by 120 disks.)

Two Disk Arrays with 240 Disks – Logical View
Typical configuration is to assign each AMP with two mirrored pairs of disks.

How Teradata uses MPP Systems

Page 10-25

Teradata 6650 (3+1) Logical View
The facing page illustrates a logical view of the Teradata Database using a 6650 3+1 clique.
The design center configuration for a 6650H (Linux) 3+1 clique is as follows:



2 Drives per AMP
42 AMPs per Node

Each virtual AMP is assigned to a virtual disk (Vdisk). AMP 0 is assigned to Vdisk 0 which
consists of 1 mirrored pair of disks.
Each AMP has a Vdisk with 295,922 cylinders.
Note: The actual MaxPerm space that is available to an AMP is slightly less than the
physical disk space because of file system overhead. Approximately 90 - 91% of the
physical disk space is actually available as MaxPerm space.

Page 10-26

How Teradata uses MPP Systems

Teradata 6650 (3+1) Logical View

(Diagram: three 6650H nodes; the first node has AMPs 0 – 41 with Vdisks 0 – 41, the second
node has AMPs 42 – 83 with Vdisks 42 – 83, and the third node has AMPs 84 – 125 with
Vdisks 84 – 125; the clique's storage is two disk arrays of 126 disks each.)

Two Disk Arrays with 252 Disks – Logical View
Typical configuration is to assign each AMP with one mirrored pair of disks.

How Teradata uses MPP Systems

Page 10-27

Example of 1.2 TB Vdisk (pre-TVS)
A Vdisk effectively represents a set of disks in a disk array. In this example, a Vdisk
represents a rank of 4 disks in a disk array that is configured to use RAID 1 technology.
If the disk array has 600 GB disks and RAID 1 protection is used, then one rank of disks (4
disks) has 1.2 TB of available disk space.
4 disks x 600 GB x .50 (RAID 1 mirroring uses 50% of the raw capacity) = 1.2 TB
If the Vdisk is configured (assigned) with four 600 GB disks (RAID 1), then the associated
AMP has 1.2 TB of perm disk space available to it.
The facing page contains a typical example of a 1.2 TB Vdisk. It would contain 592,021
cylinders; each cylinder is 3872 sectors in size. A cylinder is approximately 1.9 MB in size
(3872 x 512 bytes).
With 600 GB disk drives, 592,021 cylinders are numbered from 0 to 592,020. Cylinder 0
contains control information used by the AMP and does not contain user data.
If 73 GB disk drives are used, the AMP's Vdisk will be as follows:
Total number of cylinders – 71,853
First Pdisk – 35,924 cylinders (numbered 0 through 35,923)
Second Pdisk – 35,929 cylinders (numbered 35,924 through 71,852)
If 146 GB drives are used, then the Vdisk will be as follows:
Total number of cylinders – 144,482
First Pdisk – 72,237 cylinders (numbered 0 through 72,236)
Second Pdisk – 72,245 cylinders (numbered 72,237 through 144,481)
If 300 GB drives are used, then the Vdisk will be as follows:
Total number of cylinders – 290,072
First Pdisk – 145,037 cylinders (numbered 0 through 145,036)
Second Pdisk – 145,035 cylinders (numbered 145,037 through 290,071)
The configuration of LUNs/partitions and the assignment of Pdisks/Vdisks to AMPs is done
through the PUT utility.
As mentioned previously, the actual space that is available to an AMP is slightly less than
the numbers used above because of file system overhead. The actual MaxPerm space is
approximately 90-91% of the physical disk space. In the example on the facing page, each
AMP will have approximately 1080 GB of MaxPerm space, not 1200 GB.
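
Because a cylinder is a fixed 3872 sectors of 512 bytes (about 1.9 MB), the cylinder counts
quoted above convert directly into raw Vdisk space, and roughly 90% of that is usable as
MaxPerm. The sketch below only reproduces that arithmetic; the cylinder counts are the
ones listed on this page and the function name is invented for illustration.

    SECTORS_PER_CYLINDER = 3872
    BYTES_PER_SECTOR = 512
    CYLINDER_BYTES = SECTORS_PER_CYLINDER * BYTES_PER_SECTOR   # 1,982,464 bytes ≈ 1.9 MB

    def vdisk_space(cylinders, maxperm_factor=0.90):
        """Raw Vdisk bytes for a given cylinder count, and the ~90% usable as MaxPerm."""
        raw_bytes = cylinders * CYLINDER_BYTES
        return raw_bytes, raw_bytes * maxperm_factor

    for drives, cylinders in [("73 GB", 71_853), ("146 GB", 144_482),
                              ("300 GB", 290_072), ("600 GB", 592_021)]:
        raw, maxperm = vdisk_space(cylinders)
        print(f"{drives} drives: {cylinders:,} cylinders "
              f"= approx. {raw / 1e12:.2f} TB raw, {maxperm / 1e12:.2f} TB MaxPerm")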

Page 10-28

How Teradata uses MPP Systems

Example of 1.2 TB Vdisk (pre-TVS)

Teradata’s File System software divides the Vdisk into logical cylinders. Typically, each
cylinder is 3872 sectors in size.

(Diagram: 600 GB physical disks, protected with RAID 1 mirroring, are presented as two LUNs;
the AMP's 1.2 TB Vdisk is made up of Pdisk 0 – cylinders 0 through 296,011 – and Pdisk 1 –
cylinders 296,012 through 592,020.)

How Teradata uses MPP Systems

Page 10-29

Teradata File System Concepts
Each AMP has its own disk space managed by the Teradata Database file system software.
The file system software groups physical blocks into logical cylinders.
One AMP can address/manage up to 700,000 cylinders. Each cylinder has a cylinder index
(CI). Although an AMP can address this large number of cylinders, typically an AMP will
only be responsible for a much smaller number of cylinders. For example, an AMP that
manages 292 GB of disk space will have 144,482 cylinders.
When an AMP is initialized (booted), it reads the Cylinder Indexes and creates an in-memory Master Index to the Cylinder Indexes.
Notes:

•  Teradata Database V2R5 to 13.0 – each Cylinder Index is 12 KB in size. The cylinder
   size is still 3872 sectors.
•  Teradata uses both of the cylinder indexes as alternating cylinder indexes for write
   (INSERT, UPDATE, and DELETE) operations for all of the supported operating systems.

Teradata Vdisk Size Limits
For Teradata releases up to Teradata 13.0, the maximum amount of space that one AMP can
access is based on the following calculation:
700,000 logical cylinders x 3872 sectors/cylinder x 512 bytes/sector
This equals 1,387,724,800,000 bytes, or approximately 1.26 TB where a TB is 1024^4 bytes.

Page 10-30

How Teradata uses MPP Systems

Teradata File System Concepts

The cylinder size is 3872 sectors. For a 1.2 TB Vdisk, there are 592,021 cylinders.

Note: However, the amount of actual MaxPerm space is approximately 90% of the actual disk
space because of overhead (cylinder indexes, etc.).
   MaxPerm per AMP: 1.2 TB x .90 ≈ 1.08 TB

Note: The maximum disk space an AMP can address is:
   700,000 cylinders x 3872 sectors/cylinder x 512 bytes/sector = 1.26 Terabytes

(Diagram: each data cylinder starts with its Cylinder Index – the Cylinder Index space is
24 KB – followed by data blocks containing rows; the AMP's in-memory Master Index holds one
entry per Cylinder Index, and the maximum number of cylinders is approximately 700,000.)

How Teradata uses MPP Systems

Page 10-31

Teradata 13.10 Large Cylinder Support
Prior to Teradata 13.10, the maximum space available to an AMP is approximately 1.2 TB.
This feature increases the maximum space available to an AMP to approximately 7 TB.
Benefits of this feature are listed on the facing page.
Only systems that are newly initialized (sysinit) with Teradata 13.10 will have large
cylinders enabled. Existing systems that are upgraded to 13.10 will have to be initialized
(sysinit) in order to utilize large cylinders.
A cylinder contains Perm, Spool, Temporary, Permanent Journal, or WAL data, but NOT a
combination. For an existing system, large cylinders result in fewer cylinders that are
available for different types of data. Fewer cylinders can result in low cylinder conditions
occurring more quickly and possibly more Mini-CylPacks.
If the larger cylinder size is used on an existing system where each AMP has much less
space than 1.2 TB, then the number of available cylinders will be much less. For example:


•  Assume a 5650 system with disk arrays populated with 600 GB drives. An AMP will
   typically have 4 drives assigned to it (2 sets of mirrored disks). Therefore, the AMP
   will have approximately 1200 GB of available space. This space is divided into
   approximately 592,000 cylinders.

   Note: The actual MAXPERM space available to an AMP in this example is approximately
   1080 GB (90% of 1200 GB).

•  If this system is configured with large cylinders, then the system will only have
   approximately 99,000 cylinders. Large cylinders consume more physical space, resulting
   in fewer overall cylinders (see the sketch below).
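
A rough sketch of that arithmetic follows. It is illustrative only: the sector counts are
the values quoted in this module, and the function name is invented for the example.

    BYTES_PER_SECTOR = 512
    STANDARD_CYLINDER_SECTORS = 3_872     # standard cylinder (~1.9 MB)
    LARGE_CYLINDER_SECTORS = 23_232       # large cylinder, six times larger (13.10+)

    def cylinders_for(amp_space_gb, sectors_per_cylinder):
        """Approximate number of cylinders an AMP's space divides into."""
        cylinder_bytes = sectors_per_cylinder * BYTES_PER_SECTOR
        return int(amp_space_gb * 1e9 // cylinder_bytes)

    # ~1200 GB per AMP, as in the 600 GB drive example above:
    print(cylinders_for(1200, STANDARD_CYLINDER_SECTORS))   # roughly 600,000 standard cylinders
    print(cylinders_for(1200, LARGE_CYLINDER_SECTORS))      # roughly 100,000 large cylinders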

When to Use This Feature
A customer should consider enabling Large Cylinders if:

•  The initial system will be sized above the current 1.2 TB per AMP limit.
•  It is possible that future expansion would cause the per AMP limit of 1.2 TB to be
   exceeded.
•  The customer anticipates the need to utilize larger row sizes (e.g., 1 MB rows) in a
   future release.

A customer should NOT enable Large Cylinders if:

•  AMPs on the system are sized considerably less than 1.2 TB with no plans to expand
   beyond that limit. Large cylinders consume more physical space, resulting in fewer
   overall cylinders.

Page 10-32

How Teradata uses MPP Systems

Teradata Large Cylinder Support

Starting with Teradata 13.10, this feature increases the maximum space available to an AMP
to approximately 7.2 TB.

Benefits of this feature are:

•  To utilize larger disk drives, AMPs must be able to address more storage space.
   – Example: 2650 systems utilizing 2 TB disk drives
•  Customers that require a large capacity of storage space have the option of increasing
   their storage per AMP, rather than increasing the number of AMPs.
•  The maximum row size will most likely increase in future releases. Larger cylinders are
   more space efficient for storing large rows.
   – The maximum row size (~64 KB) is unchanged in 14.0

If large cylinders are enabled for a Teradata 13.10 or 14.0 system, then the maximum space
that an AMP can access is 6 times greater, or approximately 7.2 TB.

   Max # of cylinders   x   # sectors in cylinder   x   sector size
   ~700,000             x   23,232                  x   512 bytes     = 7.2 TB

•  Each cylinder index has increased to 64 KB to accommodate more blocks in a large
   cylinder.

Only newly initialized (sysinit) systems can have large cylinders enabled.
•  Existing systems upgraded to 13.10 have to be initialized (sysinit) in order to utilize
   large cylinders.

How Teradata uses MPP Systems

Page 10-33

Full Cylinder Read
Full Cylinder Read allows retrieval operations to run more efficiently by reading a list of
cylinder-resident data blocks with a single I/O operation. This reduces I/O overhead from
once per data block to once per cylinder.
A data block is a disk-resident structure that contains one or more rows from the same table
and is the smallest I/O unit for the Teradata Database file system. Data blocks are stored in
physical disk sectors or segments, which are grouped in cylinders.
Full Cylinder Read improves the performance of systems with both fine-grained operations
and decision support workload. It eliminates the tradeoffs for short queries and concurrent
updates versus strategic queries.
Performance may benefit from Full Cylinder Read during operations such as:

•  Full-table scan operations under conditions such as:
   – Large select
   – Merge insert/select and Merge delete
   – Aggregation: Sum, Average, Minimum/Maximum, Count
•  Join operations that involve many data blocks, such as merge joins, product joins,
   and inner/outer joins

Starting with Teradata 13.10, this feature no longer needs to be tuned using a Cylinder
Slots/AMP setting. This allows for more extensive use of cylinder read without the need to
reserve portions of the FSG cache for Cylinder Read when Cylinder Read is not being used.
Prior to Teradata 13.10, it was necessary to specify the number of cylinder slots per AMP
that would be available. The default number of cylinder slots per AMP was:

•  6 on 32-bit systems with model numbers lower than 5380.
•  6 on 32-bit coexistence systems with “older nodes.” An “older node” is a node for a
   system with a model number lower than 5380.
•  8 on 32-bit systems with model numbers at 5380 or higher.
•  8 on 64-bit systems.

Teradata Database Customer Support sets the CR flag to ON and uses the ctl (control) utility
to modify the number of cylslots.
Memory allocated to cylinder slots can only be used for cylinder reads. The benefit of
cylinder reads is likely to outweigh the reduction in generic FSG cache.

Page 10-34

How Teradata uses MPP Systems

Full Cylinder Read

The Full Cylinder Read feature allows data to be retrieved with a single cylinder (large)
read, rather than individual reads of blocks.

(Diagram: a cylinder of ~1.9 MB holding several data blocks is read with one I/O.)

Enables efficient use of disk & CPU performance resources for the following table scan
operations under specific conditions. Examples include:

   – large selects and aggregates: sum, avg, min, max, count
   – joins: merge joins, product joins, inner/outer joins
   – merge delete, merge insert/select into empty or populated tables
   – full table update/deletes

With Teradata 13.10, it is no longer necessary to specify a number of cylinder slots to
make available for this feature.

   – This 13.10 enhancement allows for more extensive use of cylinder reads without the
     need to reserve portions of the FSG cache for the Cylinder Read feature.
   – Prior to 13.10, the number of cylinder slots was set using the ctl utility. The
     default was 8 for 64-bit operating systems.

How Teradata uses MPP Systems

Page 10-35

Summary
The facing page summarizes the key points and concepts discussed in this module.

Page 10-36

How Teradata uses MPP Systems

Summary
• Memory managed and used by the operating system and the vprocs is sometimes
called “free memory”.

• PDE software manages FSG Cache.
– FSG Cache is primarily used by the AMPs to access memory resident database
segments.

• The operating system and Teradata do not know or care about the RAID technology
being used.

• A LUN or Volume looks like a single disk to the operating system.
  – With Linux or Windows, a LUN or Volume is divided into one or more partitions, and
    a raw partition is assigned to a Teradata Pdisk.

How Teradata uses MPP Systems

Page 10-37

Module 10: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 10-38

How Teradata uses MPP Systems

Module 10: Review Questions
1. Which two are placed into FSG cache?
a. Hash maps
b. Master Index
c. Cylinder Indexes
d. Permanent data blocks
2. What is the WAL Depot used for?
a. UNDO Rows
b. New data blocks
c. Master Index updates
d. Write-in-place data blocks
3. Which two are placed into the WAL Log?
a. REDO Rows
b. UNDO Rows
c. New data blocks
d. Master Index updates
e. Write-in-place data blocks
4. Describe the fundamental relationship between Linux, logical units, and disk array controllers.
________________________________________________________________________________
5. Describe the fundamental relationship between AMPs, Vdisks, Pdisks, Partitions, and LUNs.
________________________________________________________________________________
________________________________________________________________________________

How Teradata uses MPP Systems

Page 10-39

Notes

Page 10-40

How Teradata uses MPP Systems

Module 11
Teradata Virtual Storage

After completing this module, you will be able to:
 List two benefits of Teradata Virtual Storage.
 List the two operational modes of TVS.
 Identify the difference between temperature and performance.
 Identify typical data that is identified as hot data.

Teradata Proprietary and Confidential

Teradata Virtual Storage

Page 11-1

Notes

Page 11-2

Teradata Virtual Storage

Table of Contents
Teradata Virtual Storage ............................................................................................................ 11-4
Teradata Virtual Storage Concepts ............................................................................................ 11-6
Allocation Map and Statistics Overhead ................................................................................ 11-6
TVAM .................................................................................................................................... 11-6
Teradata Virtual Storage Terminology ...................................................................................... 11-8
Teradata Virtual Storage Components ................................................................................... 11-8
TVS Operational Modes .......................................................................................................... 11-10
Expanding Data Storage Concepts ........................................................................................... 11-12
Multi-Temperature Concepts ................................................................................................... 11-14
Storage Performance vs. Data Temperature............................................................................. 11-16
Teradata with Hybrid Storage .................................................................................................. 11-18
What Goes Where? .................................................................................................................. 11-20
Multi-Temperature Data Example ........................................................................................... 11-22
Teradata 6690 Cabinets ............................................................................................................ 11-24
Virtualized Management Server (VMS) .............................................................................. 11-24
HHD to SSD Drive Configurations.......................................................................................... 11-26
Summary .................................................................................................................................. 11-28
Module 11: Review Questions ................................................................................................. 11-30

Teradata Virtual Storage

Page 11-3

Teradata Virtual Storage
Teradata Virtual Storage (TVS) is designed to allow the Teradata Database to make use of new
storage technologies. It will allow you to store data that is accessed more frequently on faster devices
and data that is accessed less frequently on slower devices. It will also allow Teradata to make use
of solid state drives (SSD), for example, whenever the technology is available at a competitive price.
Solid state refers to the use of semiconductor devices.

Teradata Virtual Storage is responsible for:

•  pooling clique storage and allocating cylinders from the storage pool to individual
   AMPs
•  tracking where data is stored on the physical media
•  maintaining statistics on the frequency of data access and on the performance of
   physical storage media

These capabilities allow Teradata Virtual Storage to provide the following benefits:

•  Storage optimization, data migration, and data evacuation

   Teradata Virtual Storage maintains statistics on frequency of data access (“data
   temperature”) and on the performance (“grade”) of physical media. This allows the
   Teradata Virtual Storage product to intelligently place more frequently accessed
   data on faster physical storage. As data access patterns change, Teradata Virtual
   Storage can move (“migrate”) storage cylinders to faster or slower physical media
   within each clique. This can improve system performance over time.

   Teradata Virtual Storage can migrate data away from a physical storage device in
   order to prepare for removal or replacement of the device. This process is called
   “evacuation.” Complete data evacuation requires a system restart, but Teradata
   Virtual Storage supports a “soft evacuation” feature that allows much of the data to
   be moved while the system remains online. This can minimize system down time
   when evacuations are necessary.

•  Lower Barriers to System Growth

   Device management features of Teradata Virtual Storage provide the ability to pool
   storage within each clique. Each storage device (pdisk) can be shared, if necessary,
   by all AMPs in the clique. If the number of storage devices is not a multiple of the
   number of AMPs in the clique, the extra storage will be shared. Consequently,
   storage can be added to the system in smaller increments, as needs and
   opportunities arise.

Page 11-4

Teradata Virtual Storage

Teradata Virtual Storage
What is Teradata Virtual Storage (TVS)?

• TVS (Teradata 13.0) is a change to the way in which Teradata accesses storage.
• Purpose is to manage a Multi-Temperature Warehouse.
• Pools all of the cylinders within a clique's disk space and allocates cylinders from this
storage pool to individual AMPs.

Advantages include:

• Simplifies adding storage to existing cliques.
– Improved control over storage growth. You can add storage to the clique-storage-pool
versus to every AMP.
– Allows sharing of storage devices among AMPs.

• Enables mixing drive sizes / speeds / technologies
– Enables the “mixing” of storage devices (e.g., spinning disks, Solid-State Disks – SSD).
• Enables non-intrusive migration of data.
– The most frequently accessed data (hot data cylinders) can migrate to the high
performing cylinders and infrequently accessed data (cold data cylinders) can migrate to
the lower performing cylinders.

Teradata Virtual Storage

Page 11-5

Teradata Virtual Storage Concepts
The facing page illustrates the conceptual differences with and without Teradata Virtual
Storage.
One of benefits of Teradata Virtual Storage is the ease of adding storage to an existing
system.
Before Teradata Virtual Storage:

•  Existing systems have an integral number of drives per AMP.
•  Adding storage requires an additional drive per AMP – meaning a 50% or 100% increase
   in capacity.

With Teradata Virtual Storage, you can add any number of drives.

•  Added drives are shared by all AMPs.
•  These new disks may have different capacities and/or performance than those disks
   which already reside in the system.

Cylinder IDs (with TVS) are unique in the system and are 8 bytes in length, as compared to
4 bytes in length before TVS (Teradata 12.0 and earlier).

Allocation Map and Statistics Overhead
The file system requires space on each pdisk for its allocation map and statistics areas. The
number of cylinders required depends on the pdisk size as specified in the vconfig GDO.

TVAM
TVAM is a support utility to control and monitor Teradata Virtual Storage. TVAM:

•  Includes an “-optimize” command to cause forced migration
•  Includes “evacuate” and “un-join” commands to enable removing a drive

Page 11-6

Teradata Virtual Storage

Teradata Virtual Storage Concepts

(Diagram: Pre-TVS, each AMP owns its storage directly and cylinders are addressed by
drive # and cylinder #. With TVS, the TVS Extent (Cylinder) Driver sits between the AMPs
and the Pdisks – AMPs don't know the physical location of a cylinder, and it can change.)

•  All of the cylinders in a clique are effectively in a pool that is managed by the TVS
   vproc.
•  Cylinders are assigned a unique cylinder id (virtual id) across all of the pdisks.

Teradata Virtual Storage

Page 11-7

Teradata Virtual Storage Terminology
The facing page lists and defines some of the key terms used with Teradata Virtual Storage
(TVS).
A subpool is a set of pdisks. There is typically one subpool per clique. Single clique
systems have 2 subpools, so we can spread the AMP clusters across the subpools to achieve
fallback. It is very important to understand that TVS is configured on a clique by clique
basis. For a multi-clique system each clique typically has one subpool. This is where we
configure the AMP clusters across the cliques. No two AMPs in the same cluster should be
configured in the same clique.
TVS will take all the cylinders it finds in the clique (actually the subpool), and will divide
that by the number of AMPs in the clique. This is the maximum that each AMP can allocate
and is communicated back to the AMP so that it can size its master index. Each AMP can
allocate cylinders as it needs cylinders up to that maximum. If some AMPs allocate more or
less than other AMPs at any given time, it does not cause a problem because the space is not
over-subscribed and no AMP can allocate more than its maximum.
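
A minimal sketch of that division follows (illustrative only; the function name and the
example numbers are invented, not taken from an actual configuration).

    def per_amp_cylinder_maximum(cylinders_in_subpool, amps_in_clique):
        """Each AMP may allocate up to this many cylinders from the pooled subpool."""
        return cylinders_in_subpool // amps_in_clique

    # e.g., a subpool of 4,740,000 pooled cylinders shared by 8 AMPs in the clique
    print(per_amp_cylinder_maximum(4_740_000, 8))   # 592,500 cylinders per AMP maximum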

Teradata Virtual Storage Components
1. The DBS to File System interface is unchanged.

2. The file system calls SAM (Storage Allocation Manager) at startup to obtain the list of
   allocated extents which can be used to rebuild or verify the MI (Master Index) and
   WMI (WAL Master Index). SAM also reports the maximum size of the vdisk for this
   AMP.

3. The file system makes calls on the SAM library interface in order to allocate and free
   the extents (cylinders) of storage. SAM provides an opaque handle for each allocated
   extent, virtualizing the storage from the file system's point of view.

4. SAM sends messages to this AMP's Allocator to allocate/free the extents requested.
   Usually this will be on the same node as the AMP, but may require a node hop if the
   AMP and Allocator (part of the VSS vproc) have migrated due to a node failure.

5. The Allocator keeps the Extent Driver apprised of all virtual to physical extent
   mappings. Sometimes this requires a message to the node agent on another node in case
   of vproc migration. The Allocator keeps a copy of its translation maps on physical
   storage in case of node crash. It performs these I/Os directly.

6. The file system uses extent handles when communicating FSGids to FSG.

7. FSG passes the extent handle as the disk address for disk I/O to the Extent Driver.

8. The Extent Driver translates the handle to a physical extent address before passing the
   request to the disk driver.

Page 11-8

Teradata Virtual Storage

Teradata Virtual Storage Terminology
TVS Software
• Consists of TVS (previously named VSS) vproc which includes Allocator and Migrator code
• Includes Extent Driver (in kernel) which interfaces between PDE and operating system
Cylinder (or Extent)
• Allocation unit of disk space (currently 3872 sectors) from TVS to an AMP
Pdisk
• Partition/slice of a physical disk; associated with a TVS vproc
Subpool
• Group of disks (effectively a set of pdisks) assigned to a TVS vproc
• Fallback clustering is across subpools; a single AMP is associated with a specific subpool
• 1 subpool/clique except for single-clique systems which have 2 subpools
Storage Performance
• TVS profiles the performance of all the disk drives (e.g., spinning disks versus SSD)
• With spinning disks, outer zones have a faster transfer rate than inner zones on a disk.
Temperature
• Frequency of access of a cylinder (Hot – Cold). TVS gathers metrics on data access
frequency.
Migration
• Movement of cylinders between disks or between locations within a disk.

Teradata Virtual Storage

Page 11-9

TVS Operational Modes
Teradata Virtual Storage operates in one of two modes:



Teradata Traditional
– Mimics prior Teradata releases
Intelligent Placement (a.k.a., 1D)
– Data temperature based placement

Teradata Traditional (TT) mode Characteristics:
When using configurations modeled with the standard interface, the TVS software is used in
Teradata Traditional (TT) Mode. TT mode is available for all operating systems.
In TT mode, TVS software uses similar placement algorithms as pre-TVS Teradata
software. There is no migration of hot data to fast locations in TT mode
Use Teradata Traditional mode when

No mixing of array models in a clique AND

No mixing of disk sizes in an array AND

All Pdisks are the same size (homogeneous storage) AND

Performance capability of all Pdisks is equal (within 5%) AND

Migration is not desired AND

The number of Pdisks is an integer multiple of the AMPs in the clique. This is not
a strict requirement. In this case, any fractional Pdisks will go unused in TT mode.
Intelligent Placement (1D – 1 Dimensional) mode Characteristics:
Intelligent Placement is only available for Linux 64-bit operating systems.
This mode is used when any of the following are true:
• Mixing of array models in a clique
• Mixing of disk sizes in an array
• Pdisks in a clique are different sizes
When TVS software is used in Intelligent Placement (1D) Mode:
• TVS software uses advanced data placement algorithms.
• Migration of hot data to fast locations and cold data to slower locations can be
  enabled in 1D mode.
Use Intelligent Placement (1D) when:
– Pdisks are multiple sizes OR
– Performance capability of any Pdisks differs (> 5%) OR
– The number of Pdisks of each size in the clique is not a multiple of the number
  of AMPs in the clique OR
– Migration is desired

Page 11-10

Teradata Virtual Storage

TVS Operational Modes
Teradata Virtual Storage (Storage Allocation) operates in one of two modes:

• Teradata Traditional – works like prior Teradata releases
• Intelligent Placement (a.k.a., 1D) – data temperature based placement

Operational Mode                 Teradata Traditional    Intelligent Placement
Operating System Support         All                     Linux
Mixed Disk and/or Mixed Array    No                      Yes
Small Growth Increments          No                      Yes
Data Migration                   No                      Yes
Disk Evacuation                  No                      Yes

Note:
Evacuation is used to migrate all allocated extents (cylinders) from the selected storage to different
devices. This may be done when a disk goes bad or if a disk is to be removed from the storage pool.

Teradata Virtual Storage

Page 11-11

Expanding Data Storage Concepts
When adding non-shared storage to a clique on a system with Teradata Virtual Storage Services
(TVS), the number of devices (Pdisks) added should be a multiple of the number of AMPs in the
clique. The allocation method can be either 1D migration or Teradata Traditional (TT).

When adding shared storage to a clique on a system with Teradata Virtual Storage Services
(TVS), the storage added will be shared among all AMPs. The allocation method is 1D
migration only.
In addition to utilizing the existing storage capacity more efficiently with temperature-based
placement, TVS simplifies the ability to alter the storage capacity of a Teradata system. As
previously mentioned, database generations prior to Teradata Database 13.0 typically
allocated the entire capacity of each drive in the system to a single AMP. That design
parameter, coupled with the need for each AMP to have identical amounts of available
storage, meant that system-wide storage capacity could only be increased by adding enough
new disks (of the same size) to provide each AMP in the system with an entire new drive.
Thus, a system with 100 AMPs would typically have a minimum growth increment of
100 drives (actually 200 drives using RAID-1) with performance and capacity identical
to the existing drives.
With TVS, storage capacity can be added in a much more flexible manner. Instead of
requiring each drive to be dedicated to a single AMP, TVS can subdivide a device into
equivalent groups of cylinders which can then be allocated to each AMP, allowing even
single drive pairs to be equally shared by all of the AMPs in a clique. This “fine grained
virtualization” enables the growth of storage capacity using devices with differing capacities
and speeds. For multi-clique or co-existence systems, new storage capacity would still have
to be added to each clique in an amount that is proportional to the number of AMPs in each
clique.

Page 11-12

Teradata Virtual Storage

Expanding Data Storage Concepts
Storage can be added to a clique and is shared between all the AMPs within the clique.
Expanded storage within a clique is treated as "cold storage".

[Diagram: AMP 0, AMP 1, …, AMP 24 access storage through the TVS Extent (Cylinder) Driver.
Pdisks 0–3, each on a mirrored physical disk pair, represent storage added to the clique; they sit
alongside the existing Pdisks 48–51, which are also each mapped to a mirrored physical disk pair.]

Teradata Virtual Storage

Page 11-13

Multi-Temperature Concepts
The facing page identifies two key areas associated with Multi-Temperature environments.
With today's disk technology, the user experiences faster data transfer with data that is
located on the outer zones of a disk drive. This is because more data is accessed per
disk revolution. Data located on the inner zones experiences slower data transfer because
more disk revolutions are needed to access the same amount of data.
Teradata Virtual Storage can track data temperature over time and can move data to the
appropriate region.
A Multi-Temperature Warehouse has the ability to prioritize the use of system resources
based on business rules while maximizing utilization of storage with ever-increasing
capacity.
Teradata Virtual Storage enhances performance with multi-temperature data
placement.

Page 11-14

Teradata Virtual Storage

Multi-Temperature Concepts
Two related concepts for Multi-Temperature Data:
• Performance of Storage
  – Differences between different drives on different controllers (spinning disk vs. SSD)
  – Differences between different areas within a drive
    – Outer zones on a disk have the fastest transfer rates.
• Data Access Pattern (Frequency) or Temperature is determined by:
  – Frequency of access
  – Frequency of updates
  – Data maintenance
  – Depth of history

[Diagram: outer zones of a spinning disk – Faster Data Transfer (more data per revolution);
inner zones – Slower Data Transfer (less data per revolution)]

Teradata Virtual Storage

Page 11-15

Storage Performance vs. Data Temperature
For the purposes of describing the performance characteristics of storage devices, we’ll use
terms like “fast”, “medium” and “slow” to describe the relative response times (grade) of the
physical cylinders that comprise each device. The important thing to keep in mind is that
temperatures (hot, warm, and cold) refer to data access, while grade (fast, medium, slow) refers
to the speed of physical devices.
PUT executes the TVS Profiler on one node per clique. This is done during initial install
while the system is essentially idle. The TVS Profiler measures and records the cylinder
response times of one disk of each size (i.e., 146 GB, 300 GB, 450 GB, 600 GB, etc.) and
type (i.e., Cheetah-5). These metrics are then used for all disks of the same size and type in
the clique throughout the life of the system. This can also be done using the TVAM utility.
Data temperature values are maintained by the TVS Allocator vproc. They are viewable at
an extent level via the new TVAM (Teradata Virtual Administration Menu) command and at
a system, AMP, and table level via the Ferret utility SHOW command.
The data temperatures are calculated as a function of both frequency of access and
“recency” of access and are measured relative to the temperatures of the rest of the user data
stored in the warehouse.
This concept of “recency” is important because it allows the data temperatures to cool off as
time passes so that even if a table or partition has a lot of historical access, the temperature
of that data appears lower than data that has the same amount of access during a more recent
time period. This trait of data becoming cooler over time is commonly referred to as data
aging. But just because data is older doesn’t imply that it will only continue to get cooler. In
fact, cooler data can become warm/hot again as access increases. For example, sales data
from this quarter may remain hot until several months in the future as current
quarter/previous quarter comparisons are run in different areas of the business. After 6
months, that sales data may cool off until nearly a year later when it becomes increasingly
queried (i.e. becomes hotter) by comparison against the new current quarter’s sales data.
Teradata Virtual Storage enhances performance with multi-temperature data
placement.

Page 11-16

Teradata Virtual Storage

Storage Performance vs. Data Temperature
Storage Performance relative response times – (e.g., fast, medium, slow).
• Profiles the performance of all the disk drives (e.g., SSD versus spinning disks)
• Identifies performance zones (usually 10) on each spinning disk drive
Data Access Frequency – referred to as "Data Temperature" (e.g., hot, warm, cold).
• TVS records information about data access (called Profiling and Metric Collection)
– How long it takes to access data (I/O response times)
– How often data is accessed (effectively the data temperature)
TVS places data for optimal access based upon storage performance, type of data (WAL,
Depot, Spool, etc.) and the results of metric collection.
• Initial Data Placement
• Migration of data based upon data temperature
Three types of Data Migration:
• Background Process During Queries – moves 10% of data in about one week
• Optimize Storage Command (Database Off-Hours) – moves 10% of data in about eight hours
  – Ignores other work – just runs "flat out"
• Anticipatory Migration to Make Room in Fast Reserve, Fast, or Warm Storage for Hotter Data
  (when needed)

Teradata Virtual Storage

Page 11-17

Teradata with Hybrid Storage
The facing page illustrates an example of a Teradata system with both HDD and SSD
drives.

Page 11-18

Teradata Virtual Storage

Teradata with Hybrid Storage
Mix of Fast SSD and HDD Spinning Drives

[Diagram: three cliques, each consisting of two TPA nodes (with AMPs) and a hot standby node (HSN),
all sharing two tiers of storage – SSD arrays (> 300 MB/Sec) and HDD spinning drive arrays (15 MB/Sec).]

Teradata Virtual Storage

Page 11-19

What Goes Where?
Virtualization is the Key to Managing Multi-Temperature Data.
• TVS "knows" the performance characteristics of the different storage devices.
• TVS virtualizes location information for the database and stores the translation
  information in its meta-data.
• TVS collects usage information and adds it to the meta-data.
• TVS uses these metrics to assign a temperature to each cylinder (extent) of data.
  – Each time a cylinder is accessed, it heats up.
  – Over time, all cylinders cool off.
• TVS migrates data from one performance zone to another as appropriate to match
  temperature to storage performance.

Initial Data Placement
• Based on several factors and controls.
• The File System indicates to TVS the expected temperature of the data and its use
  (permanent tables, spool, WAL, other temp tables).
• TVS allocates from the appropriate performance device:
  – SSDs for hot data
  – Inside cylinders of HDD for cold data
  – All the rest of HDD for all else (warm)

Initial Data Temperature for Permanent Data
• All new cylinders are assigned an initial temperature.
• Defaults for each type are specified in DBSControl.
• When loading into empty tables, if the default is not wanted, the temperature can be
  specified by the user via query banding the session doing the loading.
• When adding data to existing tables, new data assumes the temperature of the data
  already in the table at the location where it is inserted.
  – It is possible to forcibly change the temperature of an existing table, or part of a table,
    via Ferret – this is not a recommended management tool.
  – Changing temperature does not move data; it just makes it subject to normal migration.
  – Over time, temperature will return to ambient.

Page 11-20

Teradata Virtual Storage

What Goes Where?
Migration is done at the Cylinder level.
Depot, WAL, and Spool cylinders are allocated as HOT.

• 20% of Fast storage (SSD) is reserved for Depot, WAL, and Spool.
• This region is called the Fast Reserve.
  – This does not limit the total amount of WAL or Spool.
• When the Fast Reserve is full, Fast or even Medium storage is used for WAL and Spool allocations.
• These cylinder types are not subject to "cooling off"; their temperature is static.
Loading perm data into an empty table defaults to HOT.

• This is assumed to be a short-lived staging table.
• If not, this default can be changed with DBSControl.
• Another option is to specify the Initial Data Temperature with a Query Band.
  – SET QUERY_BAND = 'TVSTemperature=COLD;' UPDATE FOR SESSION;
Note: The UPDATE option is used so that this query band statement will not completely replace,
but rather supplement, any other query band name:value pairs already specified for the session.
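
For example, a session loading data that is expected to stay cold might set the query band before
running its load. This is a minimal sketch only; the database and table names (Staging_DB.Sales_History
and Staging_DB.Sales_History_Stg) are hypothetical, and only the TVSTemperature name:value pair
comes from this page.

  -- Hypothetical load session: tell TVS the incoming data is expected to be COLD.
  SET QUERY_BAND = 'TVSTemperature=COLD;' UPDATE FOR SESSION;

  -- Rows loaded by this session into an empty table start with the COLD
  -- initial temperature instead of the HOT default from DBSControl.
  INSERT INTO Staging_DB.Sales_History
  SELECT * FROM Staging_DB.Sales_History_Stg;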

Teradata Virtual Storage

Page 11-21

Multi-Temperature Data Example
The facing page illustrates an example of using a Multi-Temperature Warehouse.
Example of Multi-Temperature with a PPI Table (a DDL sketch follows this list):
• If the table is partitioned on a time-based column (e.g., DATE), then rows of the table are
  physically grouped by DATE and the groups are ordered by DATE, even though rows are
  hash ordered within the partition for each DATE value.
• Because the rows are logically grouped together, they reside in a set of cylinders.
• Based on usage patterns, all the cylinders of a partition will have the same temperature.
• As usage drops, the partition cools off; eventually its cylinders get migrated out of
  FAST to MEDIUM, then eventually to SLOW storage.
• A newly loaded partition will assume the temperature of the previous latest partition (probably HOT).
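
A PPI table of the kind described above might be declared roughly as follows. This is an
illustrative sketch only; the table name, columns, and date range are hypothetical and are not
part of the course database.

  -- Hypothetical sales history table partitioned by day; a partition's cylinders
  -- tend to share one temperature and cool off together as the partition ages.
  CREATE TABLE Sales_History
   ( Store_Id      INTEGER       NOT NULL,
     Item_Id       INTEGER       NOT NULL,
     Sales_Date    DATE          NOT NULL,
     Sales_Amount  DECIMAL(10,2) )
  PRIMARY INDEX (Store_Id, Item_Id)
  PARTITION BY RANGE_N (Sales_Date BETWEEN DATE '2011-01-01'
                                   AND     DATE '2012-12-31'
                                   EACH INTERVAL '1' DAY);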

While TVS can monitor data temperatures, it can’t change or manipulate the temperature of
your data because data temperatures are primarily dictated by the workloads that are run on
the warehouse. That is, the more queries that are run against a particular table (or tables) the
higher its temperature(s). The only way to change a table’s temperature is to alter the
number of queries that are run against it.
For technical accuracy, TVS temperature is measured at a cylinder level not a data level.
The facing page illustrates the result of data migration with Teradata Virtual Storage.
Teradata Virtual Storage enables you to more easily and more cost-effectively mix data with
different levels of importance on the same system.
Advantages of Teradata Virtual Storage:
• Allows incremental growth of storage
• Provides a lower cost method for adding "Cold" data to the Enterprise Warehouse without a
  performance penalty to the bread-and-butter workload
• Enhances multi-generation co-existence

Page 11-22

Teradata Virtual Storage

Multi-Temperature Data Example

[Diagram: Hybrid Storage Data Usage Model – workload types mapped to data temperature:
  Hot  – Tactical: Heavily Accessed, Operational Intelligence, Shallow History
  Warm – DSS Current
  Cool – DSS History: Regulatory Compliance, Trending Analysis, Deep History]

The closer this model fits your data, the more useful the Hybrid system will be.

Teradata Virtual Storage

Page 11-23

Teradata 6690 Cabinets
Each Teradata 6690 cabinet can be configured in a 1+1 or 2+1 clique configuration.
• A processing/storage cabinet contains one clique.
• A cabinet with a 2+1 clique contains two processing nodes, one hot standby node,
  and four disk arrays.
• A cabinet with a 1+1 clique contains one processing node, one hot standby node,
  and four disk arrays.

Virtualized Management Server (VMS)
The VMS is available with the 6690 Enterprise Warehouse Server.
Characteristics of the VMS include:
• 1U Server that VIRTUALIZES system and cabinet management software onto a single
  server
• Teradata System VMS – provides complete system management functionality
  – Cabinet Management Interface Controller (CMIC)
  – Service Workstation (SWS)
  – Teradata Viewpoint (single system only)
  – Automatically installed on base/first cabinet
• The VMS allows full rack solutions without an additional cabinet for traditional
  Viewpoint and SWS
• Eliminates the need for expansion racks, reducing customers' floor space and energy costs
• For multi-system monitoring and management, traditional Teradata Viewpoint is
  required.

Page 11-24

Teradata Virtual Storage

Teradata 6690 Cabinets
6690 Characteristics

• Integrated Cabinet with nodes and SSD and HDD arrays
in same cabinet.

• Each NetApp Drive Tray can hold up to 24 SSD and/or
HDD drives.

– SSD drives are 400 GB.
– HDD drives (10K RPM) are 600 GB.
– Possible maximum of 360 disks in the cabinet.

• The primary requirement for planning a 6690 system is
the completion of the Data Temperature assessment.

• There is a range of configurations to meet the
requirements of most customers’ data temperature
assessments.

[Cabinet diagram: a 2+1 clique in a single 6690 cabinet – a VMS (1U), an HSN, and two TPA nodes,
with 15 drive trays of up to 24 SAS drives each and two PDUs.]

Teradata Virtual Storage

Page 11-25

HDD to SSD Drive Configurations
The facing page lists possible hybrid system configurations.

Page 11-26

Teradata Virtual Storage

HDD to SSD Drive Configurations
• There are four preset HDD to SSD configurations (ratios of SSD:HDD per node)
  which vary slightly between 1+1 and 2+1 cliques.

1+1 Configuration
# of SSD per Clique    # of HDD per Clique    SSD:HDD per Node
16*                    60                     15:60
16*                    120                    15:120
18*                    160                    15:160
20                     80                     20:80

2+1 Configuration
# of SSD per Clique/per Node    # of HDD per Clique/per Node
30/15                           120/60
30/15                           240/120
30/15                           320/160
40/20                           160/80

• PUT requires specific and even SSD counts in a clique; this accounts for the difference
  between the 1+1 and 2+1 SSDs per node (16 or 18 vs. 15). The count of 18 includes GHS drives.

• 6690 nodes are typically configured with 30 AMPs per node.

Teradata Virtual Storage

Page 11-27

Summary
The facing page summarizes the key points and concepts discussed in this module.

Page 11-28

Teradata Virtual Storage

Summary
• TVS is a change to the way in which Teradata accesses storage.
• Advantages include:
– Simplifies adding storage to existing cliques
– Enables mixing drive sizes / speeds / technologies
– Enables non-intrusive migration of data
• Purpose is to manage a Multi-Temperature Warehouse.
• Two related concepts for Multi-Temperature Data
– Performance of Storage
– Data Access Pattern – Frequency
• Pools all of the cylinders within a clique's disk space and allocates cylinders from this
storage pool to individual AMPs.

Teradata Virtual Storage

Page 11-29

Module 11: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 11-30

Teradata Virtual Storage

Module 11: Review Questions
1. List two capabilities of using Teradata Virtual Storage.
   ____________________________________________________________
   ____________________________________________________________

2. List the two operational modes of Teradata Virtual Storage.
   _______________________________     _______________________________

3. Which choice is associated with data temperature?
   a. Skewed data
   b. Frequency of access
   c. Solid State Disk Drives
   d. Inner tracks versus outer tracks on a spinning disk

4. Which data level is migrated from hot to cold storage?
   a. Row
   b. Block
   c. Cylinder
   d. Subtable

5. Which two types of data are typically considered to be HOT data?
   a. WAL
   b. DBC tables
   c. Spool data
   d. History data

Teradata Virtual Storage

Page 11-31

Notes

Page 11-32

Teradata Virtual Storage

Module 12
Physical Database Design Overview

After completing this module, you should be able to:
 Understand the stages of database design.
 List and describe the input requirements for database design.
 List and describe the outputs and objectives for database
design.
 Describe the differences between a Logical, Extended, and
Physical Data Model.

Teradata Proprietary and Confidential

Physical Database Design Overview

Page 12-1

Notes

Page 12-2

Physical Database Design Overview

Table of Contents
The Stages of Database Development........................................................................................ 12-4
Example of Data Model – ER Diagram ..................................................................................... 12-6
Customer Service Logical Model............................................................................................... 12-8
Relational Terms Review ......................................................................................................... 12-10
Domains ................................................................................................................................... 12-12
Attributes .................................................................................................................................. 12-14
Entities and Relationships ........................................................................................................ 12-16
Decomposable Data ................................................................................................................. 12-18
Normal Forms .......................................................................................................................... 12-20
Normalization........................................................................................................................... 12-22
Normalization Example ........................................................................................................... 12-24
Denormalizations ..................................................................................................................... 12-34
Derived Data ............................................................................................................................ 12-36
Pre-Joins ................................................................................................................................... 12-38
Exercise 1: Choose Indexes ..................................................................................................... 12-40
Tables Index Selection ............................................................................................................. 12-42
Database Design Components.................................................................................................. 12-44
Extended Logical Data Model ................................................................................................. 12-46
Physical Data Model ................................................................................................................ 12-48
The Principles of Index Selection ............................................................................................ 12-50
Transactions and Parallel Processing ....................................................................................... 12-52
Module 12: Review Questions ................................................................................................. 12-54

Physical Database Design Overview

Page 12-3

The Stages of Database Development
Four core stages are identified as being relevant to any database design task. They are:
• Requirement Analysis involves eliciting the initial set of information and
  processing requirements from users.
• Logical Modeling determines the contents of a database independent of a
  particular physical implementation's exigencies.
  – Conceptual Modeling transforms the user requirements into a number of
    individual user views normally expressed as entity-relationship diagrams.
  – View Integration combines these individual user views into a single global
    schema expressed as key tables. The logical model is implemented by taking
    the conceptual model as input and transforming it into the data model
    supporting the target relational database management system (RDBMS). The
    result is the relational data model.
• Activity Modeling determines the volume, usage, frequency, and integrity analysis
  of a database. This process also consists of placing any constraints on domains and
  entities in addition to addressing any legal and ethical issues including referential
  integrity.
• Physical Modeling transforms the logical model into a definition of the physical
  model suitable for a specific software and hardware configuration. In relational
  terms, this is usually some schema expressed in a dialect of the data definition
  language of SQL.

Outputs from these stages are shown on the right of the facing page and are as follows:
• Business Information Model (BIM)
  – shows major entities and their relationships
  – also referred to as "Business Model"
  – BIM acronym – also used for "Business Impact Model"
• Logical Data Model (LDM) – should be in Third Normal Form (3NF)
  – BIM plus all tables, minor entities, PK – FK relationships
  – constraints and attributes (columns)
• Extended Logical Data Model (ELDM)
  – LDM plus demographics and frequencies
• Physical Data Model (PDM)
  – ELDM plus index selections and any denormalizations

Page 12-4

Page 12-4

Physical Database Design Overview

The Stages of Database Development
[Flow diagram – design stages on the left, activities in the middle, and the data model typically
output from each stage on the right:]

Project Initiation        Initial Training and Research; Project Analysis
Requirements Analysis
Logical Modeling          Logical Database Design – Conceptual Modeling and View Integration
                          Output: Business Information Model (BIM) and/or Logical Data Model (LDM)
Activity Modeling         Activity Modeling – Volume, Usage, Frequency, Integrity
                          Output: Extended Logical Data Model (ELDM)
Physical Modeling         Physical Database Design & Creation
                          Output: Physical Data Model (PDM)
                          Application Development and Testing
Production Release

Physical Database Design Overview

Page 12-5

Example of Data Model – ER Diagram
The Customer Service database is designed to handle information pertaining to phone calls
by customers to Customer Service employees. The CALL table is the central table in this
database. On the facing page is the Entity-Relationship (E-R) diagram of the Customer
Service database. This type of model depicts entities and the relationships between them.
The E-R diagram provides you with a high-level perspective.

ERD Convention Overview
The following conventions are generally used in ER diagramming. Symbols in this module
are consistent with the ERwin data modeling tool’s conventions.
The convention symbols themselves are not reproduced here; their meanings are:
• Independent entity. An independent entity does not depend on another entity for its
  identification. It should have a single-column PK. The PK attribute appears above the
  horizontal line.
• Dependent entity. A dependent entity depends on one or more other entities for its
  identification. It generally has multiple columns in its PK, one or more of which is also
  an FK. All PK attributes appear above the horizontal line.
• (FK) – A Foreign Key. An attribute in the entity that is the PK in another, closely related
  entity. FK columns are shown above or below the horizontal dividing line in all
  entities, depending on the nature of the relationship. For 1:1 and 1:M relationships,
  the FKs are below the horizontal line. For M:M relationships, the FKs participating in
  the PK are above the horizontal line.
• One-to-Zero, One, or Many occurrences (1:0-1-M). Solid lines indicate a relationship
  (join path) between two entities. The dot identifies the child end of a parent-child
  relationship between two entities. A dotted line indicates that the child does not depend
  on the parent for identification.
• One-to-At least One or More occurrences (1:1-M)
• One-to-Zero, or at most One occurrence (1:0-1)
• Zero or One-to-Zero, One, or Many occurrences (0-1:0-1-M). The diamond shape on
  the originating end indicates the relationship is optional. Physically, this means that a
  NULL value can exist for an occurrence of any row of the entity positioned at the
  terminating end (filled dot) of the relationship.
• Many-to-Many occurrences (M:M). A many-to-many relationship, also called a non-specific
  relationship, represents a situation where an instance in one entity relates to
  one or more instances in a second entity and an instance in the second entity also
  relates to one or more instances in the first entity.
• Indicates that each parent instance must participate in one and only one sub-type as
  shown in the LDM.
• Indicates that parent instances may or may not participate in one of the sub-types as
  shown in the LDM.

Page 12-6

Physical Database Design Overview

Example of Data Model – ER Diagram
LOCATION_EMPLOYEE
EMP# (FK)
LOC# (FK)

LOCATION
LOC#
LINE1_ADDR
LINE2_ADDR
LINE3_ADDR
CITY
STATE
ZIP_CODE
CNTRY

LOCATION_PHONE
LOC# (FK)
AREA_CODE
PHONE
DESCR

DEPARTMENT
DEPT#
MGR_EMP# (FK)
DEPT_NAME
BUDGET_AMOUNT

CUSTOMER
CUST#
SALES_EMP# (FK)
CUST_NAME
PARENT_CUST_#
PARENT_CUST# (FK)

EMPLOYEE
EMP#
DEPT# (FK)
JOB_CODE (FK)
LAST_NAME
FIRST_NAME
HIRE_DATE
SALARY_AMOUNT
SUPV_EMP# (FK)

JOB
JOB_CODE
DESCR
HOURLY_BILLING_RATE
HOURLY_COST_RATE

EMPLOYEE_PHONE
EMP# (FK)
AREA_CODE
PHONE
DESCR

CONTACT
CONT#
CONT_NAME
AREA_CODE
PHONE
EXT
LAST_CALL_DATE
COMMENT

SYSTEM
SYS#
LOC# (FK)
INSTALL_DATE
RECONFIG_DATE
COMMENT

CALL_TYPE
CALL_TYPE_CODE
DESCR
CALL_PRIORITY
CALL_PRIORITY_CODE
DESCR

PART_CATEGORY
PART_CAT
DRAW#
PRICE_AMOUNT
DESCR

SYSTEM_LOG
SYS# (FK)
ENTERED_DATE
ENTERED_TIME
ENTERED_BY_USERID
LINE#
COMMENT_LINE

CALL_DETAIL
CALL# (FK)
ENTERED_BY_USERID
ENTERED_DATE
ENTERED_TIME
LINE#
COMMENT_LINE

Physical Database Design Overview

CALL_STATUS
CALL_STATUS_CODE
DESCR
CALL
CALL#
PLACED_BY_EMP# (FK)
PLACED_BY_CONT# (FK)
CALL_PRIORITY_CODE (FK)
TAKEN _BY_EMP#
CUST# (FK)
CALL_DATE
CALL_TIME
CALL_STATUS_CODE (FK)
CALL_TYPE_CODE (FK)
CALLER_AREA_CODE
CALLER_PHONE
CALLER_EXT
SYS# (FK)
PART_CAT (FK)
ORIG_CALL# (FK)

CALL_EMPLOYEE
EMP# (FK)
CALL# (FK)
CALL_STATUS_CODE (FK)
ASSIGNED_DATE
ASSIGNED_TIME
FINISHED_DATE
FINISHED_TIME
LABOR_HOURS

Page 12-7

Customer Service Logical Model
While the E-R diagram (previous page) was very helpful, it lacked the detail necessary for
broad user acceptance. How many columns are in the CALL table? What is the Primary
Key (PK) of the CALL table?
The logical model of the Customer Service database is depicted on the facing page. It shows
many more table-level details than the E-R diagram does. You can see the individual
column names for every table. In addition, there are codes to indicate PKs and Foreign
Keys (FKs), as well as columns which are System Assigned (SA) or which allow No
NULLS (NN) or No Duplicates (ND). Sample data values are also depicted.
This is the type of model that comes about as a result of Relational Data Modeling. This
example most closely represents a “Logical Data Model” or LDM.

Page 12-8

Physical Database Design Overview

Customer Service Logical Model
(ERA Methodology Diagram)
[Diagram: the Customer Service logical model in ERA notation. Each table is shown with its
columns, PK/FK designations, the SA, NN, ND, and NC attribute codes, and a row of sample data
values. The tables shown are CALL, CALL DETAIL, CALL PRIORITY, CALL TYPE, CALL STATUS,
CALL EMPLOYEE, DEPARTMENT, JOB, EMPLOYEE, EMPLOYEE PHONE, LOCATION, LOCATION PHONE,
LOC EMP, CUSTOMER, CONTACT, SYSTEM, SYSTEM LOG, and PART CATEGORY.]

Physical Database Design Overview

Page 12-9

Relational Terms Review
Relational theory uses the terms Relations, Tuples, and Attributes. Most people are more
comfortable with the terms Tables, Rows, and Columns.
Additional Relational terminology (such as Domains) will be discussed more completely on
the following pages.
Acronyms:
PK – Primary Key
FK – Foreign Key
SA – System Assigned
UA – User Assigned
NN – No NULLS
ND – No Duplicates
NC – No Changes
It is very important that you start with a well-documented relational model in Third Normal
Form. This model is used as the basis for an ELDM.
Knowledge of the hardware and software environment is crucial to doing physical database
design for Teradata. The final PDM should be optimized for site-specific implementation.
It is also crucial that you think in terms of the large volume of data that is usually stored in
Teradata databases. When working with such large-scale databases, extraneous I/Os can
have a great impact on performance.
By understanding how the Teradata Database System works, you can make constructive
physical design decisions that have a positive impact on your system’s performance.

Page 12-10

Physical Database Design Overview

Relational Terms Review
Operational File Systems    Relational Theory    Logical Models & RDBMS Systems
File                        Relation             Entity or Table
Record                      Tuple                Row
Field                       Attribute            Column

Table     A two-dimensional representation of data composed of rows and columns.
Row       One occurrence in a relational table – a record.
Column    The smallest category of data in the model – a field or attribute.
Domain    The definition of a pool of valid values from which column values are drawn.

EMPLOYEE
EMP#        LAST NAME    FIRST NAME    MI    NETWORK ID
PK, SA      NN           NN                  FK, ND, NN
01029821    Smith        John          A     JS129101

Physical Database Design Overview

Page 12-11

Domains
The following statements are true for domains and their administration in relational database
management systems:
• A domain defines the SET of all possible valid values, which may appear in all
  columns based within that domain.
• A domain value is a fundamental non-decomposable unit of data.
• A domain must have a domain name and a domain data type. Valid domain data
  types are:
  INTEGER      Any integer value
  DECIMAL      Whole and fractional values
  CHARACTER    Alpha-numeric values
  DATE         Valid Gregorian calendar dates
  TIME         24 hour notation
  BIT STRING   Digitized data (e.g., photos, x-rays)

Domain Values
A domain defines the conceptual SET, or range, of all valid values that may appear in any
column based upon that domain.
Sometimes domains are restricted to specific values. For example:
• Would you ever want negative employee numbers?
• Has there ever been, or will there ever be, an employee with the employee number
  of ZERO?
Page 12-12

Physical Database Design Overview

Domains
Domain – the definition of a pool of valid values from which column values are drawn.

Employee_Number, INTEGER > 0
Dept_Number, INTEGER > 1000

[Diagram: sample values (e.g., 53912, 9127, 4095, 1023, 3718, 43156, 123, -123, -12308, 0,
123456, 3.14159) shown as falling inside or outside each domain's set of valid values.]

Question: Does an Employee_Number of 3718 and a Dept_Number of 3718 represent
the same value?

Physical Database Design Overview

Page 12-13

Attributes
Types of Attributes
The types of attributes include:
• Primary Key (PK): Uniquely identifies each row in a table.
• Foreign Key (FK): Identifies the relationship between tables.
• Non-Key Attributes: All other attributes that are not part of any key. They are
  descriptive only, and do not define uniqueness (PK) or relationship (FK).
• Derived Attributes: An attribute whose value can be calculated or otherwise
  derived from other existing attributes. Example: NetPay is derived by calculating
  GrossPay - TaxAmt.

Derived Attribute Issues
The attributes from which derived attributes are calculated are in the design, so carrying the
derived attribute in addition creates redundant data. Derived attributes may be identified and
defined in order to validate that the model can in fact deduce them, but they are not shown in
the ER Diagram, because carrying redundant data goes against relational design theory and
principles.
There are several good reasons to avoid carrying redundant data:
• The data must be maintained in two places, which involves extra work, time, and
  expense.
• There is a risk (likelihood) of the copies getting out of sync with each other,
  causing data inconsistency.
• It takes more physical storage.

Page 12-14

Physical Database Design Overview

Attributes
Types of Attributes

• Primary Key (PK): Uniquely identifies each row in a table
• Foreign Key (FK): Identifies the relationship between tables
• Non-Key Attributes: All other attributes that are not part of any key. They
  are descriptive only, and do not define uniqueness (PK) or relationship (FK).
• Derived Attributes: An attribute whose value can be calculated or otherwise
  derived from other existing attributes.
  Example: Count of current employees in a department. A SUM of the Employee table
  meets this requirement.

Derived Attribute Issues

• Carrying a derived attribute creates redundant data.
• Reasons to avoid carrying redundant data:
  – The data must be maintained in two places, which possibly causes data
    inconsistency.
  – It takes more physical storage

Physical Database Design Overview

Page 12-15

Entities and Relationships
The entities and their relationships are shown in table form on the facing page. The naming
convention used for the tables and columns makes it easy to find the PK of any FK.
Acronyms:
PK – Primary Key
FK – Foreign Key
SA – System Assigned
UA – User Assigned
NN – No NULLS
ND – No Duplicates
NC – No Changes

Relationship Descriptions
Many-to-many relationships are usually implemented by an associative table (e.g.,
Order_Part table).
Examples of relationships are shown below.

1:1 and 1:M Relationships
  (PK)                                  (FK)
  Country    Has                        LOCATIONs
  Customer   Has                        LOCATIONs
  Employee   Generates                  ORDERs
  Employee   Generates                  SHIPMENTs
  Employee   Receives                   SHIPMENTs
  Location   Generates                  ORDERs
  Location   Generates                  SHIPMENTs
  Location   Receives                   SHIPMENTs
  Location   Has individual             PARTs
  Order      Requisitions individual    PARTs

M:M Relationships
  Order/Part Category   Shows kinds of PARTs on an ORDER before it is filled (DIRECT).
  Order/Shipment        Shows which PARTs belong to which ORDERs and SHIPMENTs
                        after the ORDER is filled (INDIRECT).

Page 12-16
Physical Database Design Overview

Entities and Relationships
There are three types of relationships: 1:1, 1:M, and M:M.

1:1 Relationships are rare.
Ex. One employee has only one Network ID and a Network ID is only assigned to one Employee.

EMPLOYEE
EMPLOYEE    EMPLOYEE    NETWORK
NUMBER      L_NAME      ID
PK, SA                  FK, ND, NN
30547       SMITH       BS100421
21289       NOMAR       JN450824

NETWORK_USERS
NETWORK     VIRTUAL
ID          FLAG       SecurID
PK, UA                 ND
BS100421               231885
JN450824    Y          348145

1:M and M:M Relationships are common.
Examples:
1:M – A Customer can place many Orders.
M:M – An Order can have many parts on it. The same part can be on many Orders. An
"associative" table is used to resolve M:M relationships.

CUSTOMER
CUST     CUST       CUST
ID       NAME       ADDRESS
PK, SA
1001     MALONEY    100 Brown St.
1002     JONES      12 Main St.

ORDER
ORDER    ORDER         CUST
#        DATE          ID
PK, SA                 FK, NN
1        2005-12-24    1001
2        2006-01-23    1002
3        2006-02-07    1001

ORDER_ITEM
ORDER    ITEM     ITEM
#        ID       QTY
PK
FK       FK
1        6001     3
1        6200     1
2        6001     5

ITEM
ITEM     ITEM       RETAIL
ID       DESC       PRICE
PK
6001     Paper      15.00
6200     Printer    300.00

Physical Database Design Overview
Page 12-17

Decomposable Data
Data may be either decomposable or atomic. Decomposable data can be broken down into
finer, smaller units while atomic data is already at its finest level.
There is a Relational Rule that “Domains must not be decomposable.” If you normalize
your relational design and create your tables based on domains, you will have columns that
do not contain decomposable data.
In practice, you may have columns that contain decomposable data. This will not cause
excessive problems if those columns are not used for access. You should create a column
for any individual character or number that is used for access.
A good example of decomposable data is a person's name:
• Name can be broken down into last name and first name.
• Last name and first name are good examples of atomic data since they really can't
  be broken down into meaningful finer units.

There are several benefits to designing your system in this manner. You should get
increased performance because there will be fewer Full Table Scans due to partial value
index searches. Also, if the columns are NUSI’s, you will increase the chance of using
NUSI Bit Mapping. Finally, you will simplify the coding of your SQL queries.
Remember that storage and display are separate issues.
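
As a simple illustration of why atomic columns help (the table and column names below are
hypothetical, not from the course database), compare a search against a combined name column
with a search against an atomic last-name column:

  -- Combined column: finding a last name needs a partial-value (LIKE) predicate,
  -- which typically cannot use an index and tends to force a full table scan.
  SELECT Employee_Number
  FROM   Employee_Combined
  WHERE  Full_Name LIKE 'Smith%';

  -- Atomic column: an equality predicate on Last_Name can use a NUSI
  -- defined on that column and avoid the full table scan.
  SELECT Employee_Number
  FROM   Employee
  WHERE  Last_Name = 'Smith';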

Page 12-18

Physical Database Design Overview

Decomposable Data
RELATIONAL RULE: Domains must not be decomposable.

• Atomic level data should be defined.
• Continue to normalize through the lifetime of the system.
• Columns with multiple domains should be decomposed to the finest level of ANY access.
• Create a column for an individual character or number if it is used for access.
• Storage and display are separate issues.

The GOAL:

• Eliminate FTS (Full Table Scans) on partial value index searches.
• Simplify SQL coding.

Physical Database Design Overview

Page 12-19

Normal Forms
Normalization is a set of rules and a methodology for making sure that the attributes in a
design are carried in the correct entity to map accurately to reality, eliminate data
redundancy and minimize update anomalies.
Stated simply: One Fact, One Place!
• 1NF, 2NF and 3NF are progressively more refined and apply to non-key attributes
  regarding their dependency on PK attributes.
• 4NF and 5NF apply to dependencies between or among PK attributes.
For most models, normalizing to 3NF meets the business requirements.
Normalization provides a rigorous, relational theory based way to identify and eliminate
most data problems:
• Provides precise identification of unique data values
• Creates data structures which have no anomalies for access and maintenance functions
Later in the module, we will discuss the impact of denormalizing a model and the effect it
may have (good or bad) on performance.
By implementing a model that is in Third Normal Form (3NF), you might gain the following
Teradata advantages:
• Usually more tables – therefore, more primary index choices
  – Possibly fewer full table scans
  – More data control
• Fewer columns per row – usually smaller rows
  – Better user isolation from the data
  – Better application separation from the data
  – Better blocking
  – Less transient and permanent journaling space
These advantages will be discussed in the Physical Design and Implementation portion of this
course.

Page 12-20

Physical Database Design Overview

Normal Forms
Once you’ve identified the attributes, the question is which ones belong in which
entities?

• A non-key attribute should be placed in only one entity.
• This process of placing attributes in the correct entities is called normalization.
First Normal Form (1NF)

• Attributes must not repeat within a table. No repeating groups.
Second Normal Form (2NF)

• An attribute must relate to the entire Primary Key, not just a portion.
• Tables with a single column Primary Key (entities) are always in Second Normal form.
Third Normal Form (3NF)

• Attributes must relate to the Primary Key and not to each other.
• Cover up the PK and the remaining attributes must not describe each other.

Physical Database Design Overview

Page 12-21

Normalization
The facing page illustrates violations of First, Second and Third Normal Form.

First Normal Form (1NF)
The rule for 1NF is that attributes must not repeat within a table. 1NF also requires that
each row has a unique identifier (PK). In the violation example, there are six columns
representing sales amount.

Second Normal Form (2NF)
The rule for 2NF is that attributes must describe the entire Primary Key, not just a portion.
In the violation example, the ORDER DATE column describes only the ORDER portion of
the Primary Key.

Third Normal Form (3NF)
The rule for 3NF is that attributes must describe only the Primary Key and not each other.
In the violation example, the JOB DESCRIPTION column describes only the JOB CODE
column and not the EMPLOYEE NUMBER (Primary Key) column.

Fourth (4NF) and Fifth (5NF) Normal Forms
4NF and 5NF are covered here only for your information. The vast majority of models never
apply these levels. Essentially these Normal Forms are designed to impose the same level of
consistency within a PK composed of more than two columns as the first 3NFs impose on
attributes outside the PK.
Entities with more than two columns in the PK often contain no non-key attributes. If non-key
attributes do exist, 4NF and 5NF violations are unlikely because bringing the model into
3NF compliance precludes them.
Usually 4NF and 5NF violations occur when the definition of the information to be
represented is ambiguous (e.g. the user has either not really understood what they are asking
for, or they have failed to state it clearly enough for the designer to understand it). 4NF and
5NF really represent two flip sides of the same issue: The PK must contain the minimum
number of attributes that accurately describe all of the business rules.

Formal Definitions:
4NF: The entity’s PK represents a single multi-valued fact that requires all PK attributes be
present for proper representation. Attributes of a multi-valued dependency are functionally
dependent on each other.
5NF: The entity represents, in its key, a single multi-valued fact and has no unresolved
symmetric constraints. A 4NF entity is also in 5NF if no symmetric constraints exist.

Page 12-22

Physical Database Design Overview

Normalization
Normalization is a technique for placing non-key attributes in tables in order to:
– Minimize redundancy
– Provide optimum flexibility
– Eliminate update anomalies

First Normal Form (1NF) – attributes must not repeat within a table.

  SALES HISTORY (violates 1NF – sales figures for the last six months repeat)
  EMP NUMBER    SALES    SALES    SALES    SALES    SALES    SALES
  PK, SA
  2518          32389    21405    18200    27590    29785    35710

Second Normal Form (2NF) – attributes must describe the entire Primary Key, not just a portion.

  ORDER PART (violates 2NF – ORDER DATE describes only the ORDER portion of the PK)
  ORDER     PART      ORDER         QUANTITY
  NUMBER    NUMBER    DATE
  PK
  FK        FK
  100       1234      2005-02-15    200
  100       2537      2005-02-15    100

Third Normal Form (3NF) – attributes must describe only the Primary Key and not each other.

  EMPLOYEE (violates 3NF – JOB DESCRIPTION describes JOB CODE, not the PK)
  EMPLOYEE    EMPLOYEE    JOB     JOB
  NUMBER      NAME        CODE    DESCRIPTION
  PK, SA                  FK
  30547       SMITH       9038    INSTRUCTOR
  21289       NOMAR       9038    INSTRUCTOR

Physical Database Design Overview
Page 12-23

Normalization Example
The facing page contains an illustration of a simple order form that a customer may use.
It is possible to simply convert this data file into a relational table, but it would not be in
Third Normal Form.

Dr. Codd Mnemonic
Every non-key attribute in an entity must depend on:
  The KEY                    – 1st Normal Form (1NF)
  The WHOLE key              – 2nd Normal Form (2NF)
  And NOTHING BUT the Key    – 3rd Normal Form (3NF)
                                              -- E.F. Codd

Page 12-24

Physical Database Design Overview

Normalization Example
One of the order forms a customer uses is shown below.
Order # _______                                     Order Date ______
Customer ID       __________
Customer Name     __________________________
Customer Address  ____________________________________
Customer City     ____________ State _______ Zip _______

Item      Item                Item       Item        Item(s)
ID        Description         Price      Quantity    Total Price
______    _______________     _______    ______      ________    }
______    _______________     _______    ______      ________    }  Repeats
______    _______________     _______    ______      ________    }
______    _______________     _______    ______      ________    }

                                          Order Total ________

A listing of the fields is:
Order #
Order Date
Customer ID
Customer Name
Customer Address
Customer City
State
Zip
Item ID
Item Description
Item Price
Item Quantity
Item(s) Total Price
Order Total

Physical Database Design Overview
Page 12-25

Normalization Example (cont.)
The tables on the facing page represent the normalization to 1NF for the previous order form
example.
Recall that the rule for 1NF is that attributes must not repeat within a table.
Negative effects of violating 1NF include:
• Places artificial limits on the number of repeating items (attributes)
• Sorting on the attribute becomes very difficult
• Searching for a particular value of the attribute is more complex

Page 12-26

Physical Database Design Overview

Normalization Example (cont.)
A modeler chooses to remove the repeating groups and creates two tables as shown
below.
Order Table             Order-Item Table
Order #                 Order #
Order Date              Item ID
Customer ID             Item Description
Customer Name           Item Price
Customer Address        Item Quantity
Customer City           Item(s) Total Price
State
Zip
Order Total

This places the data in first normal form.

Physical Database Design Overview

Page 12-27

Normalization Example (cont.)
The tables on the facing page represent the normalization to 2NF for the previous order form
example.
Recall that the rule for 2NF is that attributes must describe the entire Primary Key, not just a
portion.
Negative effects of violating 2NF include:
• More disk space may be used
• Redundancy is introduced
• Updating is more difficult
• Can also compromise the integrity of the data model

Page 12-28

Physical Database Design Overview

Normalization Example (cont.)
A modeler checks that attributes describe the entire Primary Key.

Order Table             Order-Item Table         Item Table
Order #                 Order #                  Item ID
Order Date              Item ID                  Item Description
Customer ID             Item Price (sale)        Item Price (retail)
Customer Name           Item Quantity
Customer Address        Item(s) Total Price
Customer City
State
Zip
Order Total

This places the data in second normal form.
As an option, the item price may be kept at the Order-Item level in the event a discount or
different price is given for the order. The Item table may identify the retail price.
The Order Total and Item(s) Total Price are derived data and may or may not be included.

Physical Database Design Overview

Page 12-29

Normalization Example (cont.)
The tables on the facing page represent the normalization to 3NF for the previous order form
example.
Recall that the rule for 3NF is that attributes must describe only the Primary Key and not
each other.
Negative effects of violating 3NF include:
• More disk space may be used
• Redundancy is introduced
• Updating is more costly

Page 12-30

Physical Database Design Overview

Normalization Example (cont.)
A modeler checks that attributes only describe the Primary Key.
Order Table             Order-Item Table         Item Table
Order #                 Order #                  Item ID
Order Date              Item ID                  Item Description
Customer ID             Item Price (sale)        Item Price (retail)
Order Total             Item Quantity
                        Item(s) Total Price

Customer Table
Customer ID
Customer Name
Customer Address
Customer City
State
Zip
These tables are now in third normal form. If the item sale price is always the same as the
retail price, then the item price only needs to be kept in the item table.
The Order Total and Item(s) Total Price are derived data and may or may not be included.

Physical Database Design Overview

Page 12-31

Normalization Example (cont.)
The facing page completes this example and illustrates the tables in a logical format
showing PK-FK relationships.

Page 12-32

Physical Database Design Overview

Normalization Example (cont.)
The tables are shown below in 3NF with PK-FK relationships.
ORDER
ORDER    ORDER         CUSTOMER
#        DATE          ID
PK, SA                 FK
1        2005-02-27    1001
2        2005-04-24    1002

ORDER_ITEM
ORDER    ITEM     SALE      ITEM
#        ID       PRICE     QUANTITY
PK
FK       FK
1        5001     15.00     2
1        5002     300.00    1
2        5001     15.00     1

CUSTOMER
CUST     CUST       CUST             CUST         CUST     CUST
ID       NAME       ADDRESS          CITY         STATE    ZIP
PK, SA
1001     MALONEY    100 Brown St.    Dayton       OH       45479
1002     JONES      12 Main St.      San Diego    CA       92127

ITEM
ITEM     ITEM                                RETAIL
ID       DESCRIPTION                         PRICE
PK
5001     PS20 Electric Pencil Sharpener      15.00
5002     MFC140 Multi-Function Printer       300.00

Note that Items Total Price & Order_Total are not shown in this model.
How are Items Total Price & Order_Total handled?
Physical Database Design Overview

Page 12-33

Denormalizations
This course recommends that the corporate database tables that represent the company's
business be maintained in Third Normal Form (3NF). Due to the large volume of data
normally stored in a Teradata system, denormalization may be necessary to improve
performance. If you do denormalize, make sure that you are aware of all the trade-offs
involved.
It is also recommended that, whenever possible, you keep the normalized tables from the
Logical Model as an authoritative source and add additional denormalized tables to the
database. This module will cover the various types of denormalizations that you may
choose to use. They are:
• Derived Data
• Repeating Groups
• Pre-Joins
• Summary and/or Temporary Tables
• Partitioning (Horizontal or Vertical)

Complete the Logical Model before choosing to use these denormalizations.
There are a few costs in normalizing your data. Typically, the advantages of having a data
model in 3NF outweigh the costs of normalizing your data.
Costs of normalizing to 1NF include:
• you use more disk space
• you have to do more joins
Costs of normalizing to 2NF when already in 1NF include:
• you have to do more joins
Costs of normalizing to 3NF when already in 2NF include:
• you have to do more joins

A customer may choose to implement a semantic layer between the data tables and the end
users. The simplest definition of a semantic layer is as the view layer that uses business
terminology and does presentation.
The semantic layer can also be viewed as a logical construct to support a presentation layer
which may interface directly with some end-user access methodology. The "semantic layer"
may change column names, derive new column values, perform aggregation, or whatever
else the presentation layer needed to support the users.

Page 12-34

Physical Database Design Overview

Denormalizations
Denormalize only when all of the trade-offs of the following are known. Examples of
denormalizations are:
• Derived data
• Pre-Joins
• Repeating groups
• Partitioning (Horizontal or Vertical)
• Summary and/or Temporary tables

Make these choices AFTER completing the Logical Model.

• Keep the Logical Model pure.
• Keep the documentation of the physical model up-to-date.
Denormalization may increase or decrease system costs.

• It may be positive for some applications and negative for others.
• It generally makes new applications harder to implement.
• Any denormalization automatically reduces data flexibility.
• It introduces the potential for data problems (anomalies).
• It usually increases programming cost and complexity.
Note: Only a few denormalization examples are included in this module. Other techniques will be
discussed throughout the course.

Physical Database Design Overview

Page 12-35

Derived Data
Attributes whose values can be determined or calculated from other data are known as
Derived Data. Derived Data can be either integral or stand-alone, examples of which are
shown on the facing page. You should notice that integral Derived Data requires no
additional I/O and no denormalization. Stand-alone Derived Data, on the other hand,
requires additional I/O and may require denormalization.
Creating temporary tables to hold Derived Data is a good strategy when the Derived Data
will be used frequently and is stable.

Handling Derived Data
Storing Derived Data is a normalization violation that breaks the rule against redundant data.
Whenever you have stand-alone Derived Data, you must decide whether to calculate it or
store it. This decision should be based on the following demographics:




number of tables and rows involved
access frequency
column data value volatility and column data value change schedule

All above demographics are determined through Activity Modeling – also referred to as
Application and Transaction Modeling. The following table gives you guidelines on what
approach to take depending on the value of the demographics.
Guidelines apply when you have a large number of tables and rows. In cases where you
have a small number of tables and rows, calculate the Derived Data on demand.
Access      Change    Update       Recommended Approach
Frequency   Rating    Frequency

High        High      Dynamic      Denormalize the model or use Temporary Table
High        High      Scheduled    Use Temporary Table or produce batch report
High        Low       Dynamic      Use Temporary Table
High        Low       Scheduled    Use Temporary Table or produce batch report
Low         ?         ?            Calculate on demand

Note that, in general, using summary/temporary tables is preferable to denormalization.
The example on the facing page shows the use of a derived data column
(Employee Count) to identify the number of employees in a department.
This count can also be determined by counting the employees in the Employee table.
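
A minimal sketch of the two choices (the column names follow the example on the facing
page; the temporary-table form shown is only one of several possible implementations):

	-- Calculate on demand (no denormalization):
	SELECT	Dept_Num, COUNT(*) AS Employee_Count
	FROM	Employee
	GROUP BY Dept_Num;

	-- Store the derived data in a summary/temporary table instead:
	CREATE VOLATILE TABLE Dept_Emp_Count AS
	 (SELECT Dept_Num, COUNT(*) AS Employee_Count
	  FROM	 Employee
	  GROUP BY Dept_Num)
	WITH DATA
	ON COMMIT PRESERVE ROWS;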

Page 12-36

Physical Database Design Overview

Derived Data
Derived data is an attribute whose value can be determined or calculated from other
data. Storing a derived item is a denormalization (redundant data).

Normalized

	DEPARTMENT
	DEPT NUM          DEPT NAME
	PK, SA / UPI      NN, ND
	1001              ENGINEERING
	1002              EDUCATION

	EMPLOYEE
	EMPLOYEE NUMBER   EMPLOYEE NAME   DEPT NUM
	PK, SA / UPI      NN              FK
	22416             JONES           1002
	30547             SMITH           1001
	82455             NOMAR           1002
	17435             NECHES          1001
	23451             MILLER          1002

Carrying the count of the number of employees in a department is a normal forms violation. The
number of employees can be determined from the Employee table.

Denormalized

	DEPARTMENT
	DEPT NUM          DEPT NAME        EMPLOYEE COUNT
	PK, SA / UPI      NN, ND           Derived Data
	1001              ENGINEERING      2
	1002              EDUCATION        3

	EMPLOYEE
	EMPLOYEE NUMBER   EMPLOYEE NAME    DEPT NUM
	PK, SA / UPI      NN               FK
	22416             JONES            1002
	30547             SMITH            1001
	82455             NOMAR            1002
	17435             NECHES           1001
	23451             MILLER           1002

Physical Database Design Overview
Page 12-37

Pre-Joins
Pre-Joins can be created in order to eliminate Joins to small, static tables (Minor Entities).
The example on the facing page shows a Pre-Join table that contains columns from both the
JOB and EMPLOYEE tables above it.
Although this is a violation of Third Normal Form, there are several reasons that you may
want to use it:

• It is a good performance technique for the Teradata DBS, especially when there are
  known queries.
• It is a good way to handle situations where you have tables with fewer rows than
  AMPs.
• You still have your original Minor Entity to maintain data consistency and avoid
  anomalies.

Costs of pre-joins include:

• Additional space is required.
• More maintenance and I/Os are required.

Page 12-38

Physical Database Design Overview

Pre-Joins
To eliminate joins to a small (and possibly static) table, consider including its
attribute(s) in the parent table.
NORMALIZED

	JOB
	JOB CODE          JOB DESCRIPTION
	PK, SA / UPI      NN, ND
	1015              PROGRAMMER
	1023              ANALYST

	EMPLOYEE
	EMPLOYEE NUMBER   EMPLOYEE NAME    JOB CODE
	PK, SA / UPI                       FK
	22416             JONES            1023
	30547             SMITH            1015

DENORMALIZED

	EMPLOYEE
	EMPLOYEE NUMBER   EMPLOYEE NAME    JOB CODE   JOB DESCRIPTION
	PK, SA / UPI
	22416             JONES            1023       ANALYST
	30547             SMITH            1015       PROGRAMMER

Reasons you may want Pre-Joins:

• Performance technique when there are known queries.
• Option to handle situations where you have tables with fewer rows than AMPs.
A Join Index (Teradata feature covered later) provides a way of creating a “pre-join table”.
As the base tables are updated, the Join Index is updated automatically.
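
A minimal sketch of such a join index, using the JOB and EMPLOYEE columns from the example
above (the index name and exact column names are illustrative only):

	CREATE JOIN INDEX Emp_Job_JI AS
	SELECT	E.Employee_Number
		,E.Employee_Name
		,E.Job_Code
		,J.Job_Description
	FROM	Employee E
	INNER JOIN Job J
		ON E.Job_Code = J.Job_Code;

The optimizer can use the join index instead of joining the base tables, and Teradata
maintains the index automatically as the base tables change.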

Physical Database Design Overview

Page 12-39

Exercise 1: Choose Indexes
At right is the EMPLOYEE table from the CUSTOMER_SERVICE database. The legend
below explains the abbreviations you see below the column names. The following pages
contain fifteen more PTS tables.
Choose the best indexes for these tables. Remember, you must choose exactly one Primary
Index per table, but you may choose up to 32 Secondary Indexes.
Primary Keys do not have to be declared. Any Primary Key which is declared must have all
columns of the PK defined as NOT NULL, and will be implemented by Teradata as a
Unique index (UPI or USI).
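
For example (a sketch only – this is not one of the exercise tables), a declared Primary Key
with no explicit PRIMARY INDEX clause is implemented as a UPI:

	CREATE TABLE Department
	 (Dept_Num	INTEGER NOT NULL
	 ,Dept_Name	CHAR(30) NOT NULL
	 ,PRIMARY KEY (Dept_Num));

If a PRIMARY INDEX had also been specified, the declared Primary Key would instead be
implemented as a USI.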

REMEMBER

The Primary Key is the logical reference for the Logical Data
Model. The Primary Index is the physical access mechanism for the
Physical Data Model. They may be, but will not always be, the same.

Page 12-40

Physical Database Design Overview

Exercise 1: Choose Indexes
The next page contains a portion of the logical model of the PTS database.
Indicate the candidate index choices for all of the tables. An example is shown below.

The Teradata database supports four index types:

	UPI  (Unique Primary Index)         NUPI  (Non-Unique Primary Index)
	USI  (Unique Secondary Index)       NUSI  (Non-Unique Secondary Index)

EMPLOYEE (50,000 Rows)

	Column:   EMP#    SUPV EMP#   DEPT#   JOB CODE   LAST NAME   FIRST NAME   HIRE DATE   BIRTH DATE   SAL AMT
	PK/FK:    PK,SA   FK          FK      FK         NN          NN           NN          NN           NN
	PI/SI:    UPI     NUSI        NUSI    NUSI                                                         NUSI

LEGEND
	PK = Primary Key (implies NC, ND, NN)      FK = Foreign Key
	NC = No Change                             SA = System Assigned Value
	ND = No Duplicates                         UA = User Assigned Value
	NN = No Nulls

Physical Database Design Overview

Page 12-41

Tables Index Selection
On the facing page, you will find some of the tables in the PTS database.
Choose the best indexes for these tables. Remember that you must choose exactly one
Primary Index per table, but you may choose up to 32 Secondary Indexes.

Page 12-42

Physical Database Design Overview

Tables Index Selection
[Worksheet diagrams – each table is shown with its columns and their PK/FK designations,
and a blank PI/SI row to be completed as part of the exercise.]

	LOCATION:	LOC# (PK,SA), CUST# (FK,NN), LINE1 ADDR (NN), LINE2 ADDR, LINE3 ADDR,
			CITY (NN), STATE, ZIP, CNTRY

	ORDER:		ORD# (PK,SA), CUST# (FK,NN), LOC# (FK,NN), ORD DATE (NN), CLOSE DATE,
			UPD DATE, UPD TIME, UPD USER

	SHIPMENT:	SHIP#, ORD# (FK,NN), STAT

	PART:		PART# (PK,SA), PART CAT (FK,NN), SER# (FK,NN), LOC# (FK,NN), SYS#,
			UPD DATE

Physical Database Design Overview

Page 12-43

Database Design Components
Each System Development Phase adds to the design. As we mentioned earlier, they are:

• Logical Data Modeling
• Extended Data Modeling (also known as Application and Transaction Modeling;
  we will call it Activity Modeling)
• Physical Data Modeling

First and foremost, make sure the system is designed as a
function of business usage and not the reverse.
Let usage drive design.

Page 12-44

Physical Database Design Overview

Database Design Components

	Logical Data Model  +  Data Demographics  +  Application Knowledge (CURRENT and FUTURE)

• A good logical model reduces application workload.
• Thorough application knowledge produces dependable demographics.
• Proper demographics are needed to make sound index choices.
• Though you don’t know users’ access patterns, you will need that information in the
future. For example, management may want to know why there are two copies of data.

• For DSS, OLAP, and Data Warehouse systems, aim for even distribution and let
Teradata parallel architecture handle the changing access needs of the users.

Physical Database Design Overview

Page 12-45

Extended Logical Data Model
At right is the Extended Logical Data Model (ELDM), which includes data demographic
information pertaining to data distribution, sizing and access.
Information provided by the ELDM results from user input about transactions and
transaction rates.
The Delete Rules and Constraint Numbers (from a user-generated list) are provided as an aid
to application programmers, but have no effect on physical modeling.
The meaning and importance of the other ELDM data to physical database design will be
covered in coming modules of this course.

Page 12-46

Physical Database Design Overview

Extended Logical Data Model
EXTENDED LOGICAL DATA MODEL

• It provides demographics of data distribution, sizing and access.
• It is the main information source for creating the physical data model.
• It maps applications and transactions to the related tables, columns and row sets.

TABLE NAME: Employee                     TABLE TYPE: Entity
DESCRIPTION: Someone who works for our company and on payroll.
ROW COUNT: 50,000

EMPLOYEE              EMPLOYEE  SUPERVISOR  DEPARTMENT  JOB    LAST   FIRST  HIRE   BIRTH  SALARY
                      NUMBER    EMPLOYEE    NUMBER      CODE   NAME   NAME   DATE   DATE   AMOUNT
                                NUMBER
PK/FK                 PK, SA    FK          FK          FK     NN                          N
DEL RULES / CONSTR#   101       N           P           101
VALUE ACC FREQ        10K       0           8K          1K     200    0      0      0      0
JOIN ACC FREQ         17K       50          12K         6K     0      0      0      0      0
JOIN ACC ROWS         136K      10K         96K         50K    0      0      0      0      0
DISTINCT VALUES       50K       7K          2K          3K     40K    NA     NA     NA     NA
MAXIMUM ROWS/VAL      1         30          40          4K     2K     NA     NA     NA     NA
MAX ROWS NULL         0         1           18          40     0      NA     NA     NA     NA
TYPICAL ROWS/VAL      1         7           23          15     1      NA     NA     NA     NA
CHANGE RATING         0         3           2           4      1      NA     NA     NA     NA
SAMPLE DATA           8326      647         2431        18     OZ     WIZ

Physical Database Design Overview

Page 12-47

Physical Data Model
The model at right is the Physical Data Model (PDM), which contains the same
information as the ELDM except that index selections and other physical design choices
such as data protection mechanisms (e.g., Fallback) have been added.
A complete PDM will define all tables, indexes and views to be implemented. Due to
physical design considerations, the PDM may differ from the logical model. In general, the
more the PDM differs from the logical model, the less flexible it is and the more
programming it requires.

Page 12-48

Physical Database Design Overview

Physical Data Model
PHYSICAL DATA MODEL

• A collection of DBMS constructs that define the tables, indexes and views to be
  implemented.
• The main tables represent the entities of the business function.
• It may differ from the logical model due to implementation issues.
• The more it differs, the less flexible it is and the more programming it requires.

TABLE NAME: Employee                     TABLE TYPE: Entity
DESCRIPTION: Someone who works for our company and on payroll.
ROW COUNT: 50,000                        IMPLEMENTATION: 3NF
FALLBACK: YES                            PERM JRNL: NO

EMPLOYEE              EMPLOYEE  SUPERVISOR  DEPARTMENT  JOB    LAST   FIRST  HIRE   BIRTH  SALARY
                      NUMBER    EMPLOYEE    NUMBER      CODE   NAME   NAME   DATE   DATE   AMOUNT
                                NUMBER
PK/FK                 PK, SA    FK          FK          FK     NN                          N
DEL RULES / CONSTR#   101       N           P           101
VALUE ACC FREQ        10K       0           8K          1K     200    0      0      0      0
JOIN ACC FREQ         17K       50          12K         6K     0      0      0      0      0
JOIN ACC ROWS         136K      10K         96K         50K    0      0      0      0      0
DISTINCT VALUES       50K       7K          2K          3K     40K    NA     NA     NA     NA
MAXIMUM ROWS/VAL      1         30          40          4K     2K     NA     NA     NA     NA
MAX ROWS NULL         0         1           18          40     0      NA     NA     NA     NA
TYPICAL ROWS/VAL      1         7           23          15     1      NA     NA     NA     NA
CHANGE RATING         0         3           2           4      1      NA     NA     NA     NA
PI/SI                 UPI                   NUSI        NUSI
SAMPLE DATA           8326      647         2431        18     OZ     WIZ

Physical Database Design Overview

Page 12-49

The Principles of Index Selection
The right-hand page illustrates the many factors that impact Index selection. As you can
see, they represent all three of the Database Design Components (Logical Data Model, Data
Demographics and Application Knowledge).
Index selection can be summarized as follows:

• Start with a well-documented 3NF logical model.
• Develop demographics to create the ELDM.
• Make index selections based upon these demographics.

Page 12-50

Physical Database Design Overview

The Principles of Index Selection
There are many factors which guide the designer in choosing indexes:

– The way the system uses the index.
– The space the index requires.
– The table type.
– The number of rows in the table.
– The type of data protection.
– The column(s) most frequently used to access rows in the table.
– The number of distinct column values.
– The maximum rows per value.
– Whether the rows are accessed by values or through a Join.
– The primary use of the table data (Decision Support, Ad Hoc, Batch Reporting,
  Batch Maintenance, OLTP).
– The number of INSERTs and when they occur.
– The number of DELETEs and when they occur.
– The number of UPDATEs and when they occur.
– The way transactions are written.
– The way the transactions are parceled.
– The level and type of locking a transaction requires.
– How long a transaction holds locks.
– How normalized the data model is.

Through lecture and exercises, this course points out the importance and use of all these
factors.

Physical Database Design Overview

Page 12-51

Transactions and Parallel Processing
One additional goal of this course is to point out what causes all-AMP operations. In some
cases, they are accidental and can be changed into one- or two-AMP operations.
To maximize the number of transactions that need only one or two AMPs, you need a good
logical model (Third Normal Form), a good physical model (which you will learn about in
this course), and good SQL coding (we will provide some examples).
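
For example (a sketch only – column names are illustrative, assuming Employee_Number is the
primary index of the Employee table and Last_Name is not indexed):

	-- Primary Index access – a one-AMP operation:
	SELECT	Last_Name, First_Name
	FROM	Employee
	WHERE	Employee_Number = 22416;

	-- No index on Last_Name – an all-AMP full-table scan:
	SELECT	Employee_Number
	FROM	Employee
	WHERE	Last_Name = 'JONES';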

Page 12-52

Physical Database Design Overview

Transactions and Parallel Processing
Teradata does all-AMP processing very efficiently.
However, one-AMP and two-AMP processing is even more efficient. It allows
the existing configuration to support a greater workload.
All-AMP operations – ideal for Decision Support (DSS), Ad Hoc, Batch Processing, and some
Batch Maintenance operations.

One-AMP and two-AMP operations – best for OLTP, tactical transactions, and preferred for
many Batch Maintenance operations. Created by a good Logical Model AND a good Physical
Model AND good SQL coding.

[Diagram: a few all-AMP transactions (TXN1 – TXN4), each spanning all AMPs (AMP1 – AMP8),
contrasted with many one- and two-AMP transactions (TXN1 – TXN22), each handled by a
single AMP.]

This course points out the methods of maximizing the use of one-AMP and two-AMP
transactions and when all-AMP operations are needed.

Physical Database Design Overview

Page 12-53

Module 12: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 12-54

Physical Database Design Overview

Module 12: Review Questions
1. Which three are benefits to creating a data model in 3NF? ____ ____ ____
   a. Minimize redundancy
   b. To reduce update anomalies
   c. To improve distribution of data
   d. To improve flexibility of access
   e. To reduce number of I/Os to access data

2. Which data model would include the definition of a partitioned primary index? ____
   a. Logical data model
   b. Physical data model
   c. Business information model
   d. Extended logical data model

3. Which two factors should be considered when deciding to denormalize a table? ____ ____
   a. Volatility
   b. Performance
   c. Distribution of data
   d. Connectivity of users

4. Which is a benefit of implementing data types at the domain level? ____
   a. Reduce storage space
   b. Avoid data conversion
   c. Provides consistent display of data
   d. Reduce need for secondary indexes

Physical Database Design Overview

Page 12-55

Notes

Page 12-56

Physical Database Design Overview

Module 13
Data Distribution and Hashing

After completing this module, you will be able to:

• Describe the data distribution form and method.
• Describe Hashing.
• Describe Primary Index hash mapping.
• Describe the reconfiguration process.
• Describe a Block Layout.
• Describe File System Read Access.

Teradata Proprietary and Confidential

Data Distribution and Hashing

Page 13-1

Notes

Page 13-2

Data Distribution and Hashing

Table of Contents
Data Distribution ........................................................................................................................ 13-4
Hashing ...................................................................................................................................... 13-6
Enhanced Hashing Algorithm Starting with Teradata 13.10 ................................................. 13-6
Hash Related Expressions .......................................................................................................... 13-8
Hashing – Numeric Data Types ............................................................................................... 13-10
Multi-Column Hashing ............................................................................................................ 13-12
Multi-Column Hashing (cont.) ............................................................................................. 13-14
Additional Hash Examples....................................................................................................... 13-16
Using Hash Functions to View Distribution ............................................................................ 13-18
Identifying the Hash Buckets ............................................................................................... 13-18
Identifying the Primary AMPs ............................................................................................. 13-18
Primary Index Hash Mapping .................................................................................................. 13-20
Hash Maps................................................................................................................................ 13-22
Primary Hash Map ................................................................................................................... 13-24
Hash Maps for Different Systems ............................................................................................ 13-26
Fallback Hash Map .................................................................................................................. 13-28
Reconfiguration ........................................................................................................................ 13-30
Row Retrieval via PI Value – Overview .................................................................................. 13-32
Names and Object IDs ............................................................................................................. 13-34
Table ID ................................................................................................................................... 13-36
Spool File Table IDs ........................................................................................................ 13-36
Row ID ..................................................................................................................................... 13-38
AMP File System – Locating a Row via PI ............................................................................. 13-40
Teradata File System Overview ............................................................................................... 13-42
Master Index Format ................................................................................................................ 13-44
Cylinder Index Format ............................................................................................................. 13-46
Data Block Layout ................................................................................................................... 13-48
Example of Locating a Row – Master Index ........................................................................... 13-50
Example of Locating a Row – Cylinder Index......................................................................... 13-52
Example of Locating a Row – Data Block............................................................................... 13-54
Accessing the Row within the Data Block............................................................................... 13-56
AMP Read I/O Summary ......................................................................................................... 13-58
Module 13: Review Questions ................................................................................................. 13-60

Data Distribution and Hashing

Page 13-3

Data Distribution
Parsing Engines (PE) are assigned either to channel connections (e.g., IBM Mainframe) or
to LAN connections. Data is always stored by the AMPs in 8-bit ASCII. If the input is in
EBCDIC, the PE converts it to ASCII before any hashing and distribution takes place.
A USER may have a COLLATION = EBCDIC, ASCII, MULTINATIONAL, or HOST. If
the HOST is an EBCDIC host or COLLATION = EBCDIC, then the AMPs convert from
ASCII to EBCDIC before doing any comparisons or sorts. MULTINATIONAL allows sites
to create their own collation file. Otherwise, all comparisons and sorts use the ASCII
collating sequence.
Teradata has no concept of pre-allocated table space. The rows of all hashed tables are
distributed randomly across all AMPs and then randomly within the space available on the
selected AMP.

Page 13-4

Data Distribution and Hashing

Data Distribution
Records From Client (in random sequence):

	2   32   67   12   90   6   54   75   18   25   80   41

Data distribution is dependent on the hash value of the primary index.

[Diagram: rows arrive from the host in EBCDIC or ASCII; the Parsing Engine(s) convert
EBCDIC input to ASCII and hash it; the Message Passing Layer distributes each row to an
AMP (AMP 0 – AMP 3), where it is formatted and stored.]

Data Distribution and Hashing

Page 13-5

Hashing
Hashing is the mechanism by which Teradata utilizes the Primary Index to distribute rows
of data. The Hashing Algorithm acts like a mathematical “blender”. It takes up to 64
columns of mixed data as input and generates a single 32-bit binary value called a Row
Hash.


• The Row Hash is the logical storage locator of the row. A part of this value is used
  in determining the AMP to which the row is distributed.

• Teradata uses the Row Hash value for distribution, placement and retrieval of rows.

The Hashing Algorithm is random but consistent. Although consecutive PI values do not
normally produce consecutive hash values, identical Primary Index (PI) values always
generate the same Row Hash (assuming that the data types hash identically). Rows with the
same Row Hash are always distributed to the same AMP.
Different PI values rarely produce the same Row Hash. When this does occur, they are
known as Hash Synonyms or Hash Collisions.
Note: Upper and lower case values hash to the same hash value. For example, ‘Jones’ and
‘JONES’ generate the same hash value.
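
A quick way to verify this (a sketch; any character literal can be substituted, and the
result assumes the default NOT CASESPECIFIC handling of Teradata session mode):

	SELECT	HASHROW ('Jones')	AS "Hash 1"
		,HASHROW ('JONES')	AS "Hash 2";

Both columns are expected to return the same row hash value.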

Enhanced Hashing Algorithm Starting with Teradata 13.10
This enhancement is targeted to reduce the number of hash collisions for character data
stored as either Latin or Unicode, notably strings that contain primarily numeric data.
Reduction in hash collisions reduces access time per AMP and produces a more balanced
row distribution, which in turn improves parallelism. Reduced access time and increased
parallelism translate directly to better performance.
This capability is only available starting in TD 13.10. This feature is available to new
systems and requires a System Initialization (sysinit) for existing systems. It is anticipated
that typically this activity would be performed during technology refresh opportunities.

Page 13-6

Data Distribution and Hashing

Hashing
• The Hashing Algorithm creates a fixed length value from any length input string.
• Input to the algorithm is the Primary Index (PI) value of a row.
• The output from the algorithm is the Row Hash.
– A 32-bit binary value.
– Used to identify the AMP of the row and the logical storage location of the row in the AMP.
– Table ID + Row Hash is used to locate the Cylinder and Data Block.

• Row Hash uniqueness depends directly on PI uniqueness.
– Good data distribution depends directly on Row Hash uniqueness.

• The algorithm produces random, but consistent, Row Hashes.
– The same PI value and data type combination always hash identically.
– Rows with the same Row Hash will always go to the same AMP.

• Teradata has a new "Enhanced Hashing Algorithm" starting with Teradata 13.10 new
systems and fresh installs (sysinit).
– Solves the problem of too many hash synonyms when character columns contain numeric
data.

– Problem most commonly occurs with long strings of numeric data in CHAR or VARCHAR
columns as either Latin or Unicode.

Data Distribution and Hashing

Page 13-7

Hash Related Expressions
The Teradata Database includes extensions to Teradata SQL, known as hash functions,
which allow the user to extract statistical properties from the current index, evaluate those
properties for other columns to determine their suitability as a future primary index, or more
effectively design the primary index of rows. These statistics also help minimize hash
synonyms and enhance data distribution uniformity. Hash functions are valid within a
Teradata SQL statement where other functions (like SUBSTRING or INDEX) can occur.
HASHROW — this function returns the row hash value of a given sequence of expressions
in BYTE (4) data type. For example, the following statement returns the average number of
rows per row hash where C1 and C2 constitute an index (or potential index) of table TabX
SELECT COUNT(*) (FLOAT) / COUNT(DISTINCT(HASHROW (C1,C2)))
FROM TabX;

HASHBUCKET — this function returns the bucket number that corresponds to a hashrow.
The bucket number is an integer type. The following example returns the number of rows in
each hash bucket where C1 and C2 are an index (or potential index) of table TabX:
SELECT HASHBUCKET (HASHROW(C1,C2)), COUNT(*)
FROM TabX
GROUP BY 1 ORDER BY 1;

Query results can be treated as a histogram of table distribution among the hash buckets.
HASHAMP and HASHBAKAMP — these functions return the identification number of the
primary or fallback AMP corresponding to a hashbucket. With Teradata V2R6.2 (and
before), HASHAMP accepts only integer values between 0 and 65,535 as its argument. In
this example, HASHAMP is used to determine the number of primary rows on each AMP
where C1 and C2 are to be the primary index of table TabX:
SELECT HASHAMP (HASHBUCKET (HASHROW (C1, C2))), COUNT(*)
FROM TabX
GROUP BY 1 ORDER BY 1;

Query results can be treated as a histogram of the table distribution among the AMPs.
Further information on these functions and their uses can be found in the Teradata RDBMS
SQL Reference.
Note the examples on the facing page. This example was captured on a 26 AMP system
using a hash map with 1,048,576 entries.
The row hash of the literal 'Teradata' is the same with 16-bit or 20-bit hash bucket numbers.
However, the target AMP numbers are different for a system with 65,536 hash buckets as
compared to the same system with 1,048,576 hash buckets.
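
The relationship between the row hash and the hash bucket number can be checked directly
(a sketch; the values shown are the ones from the facing page):

	SELECT	HASHROW ('Teradata')			AS "Hash Value"
		,HASHBUCKET (HASHROW ('Teradata'))	AS "Bucket Num";

The hash value is F5C4BC93; its first 20 bits are x'F5C4B', which is 1,006,667 in decimal –
the bucket number returned on a system with 1,048,576 hash buckets.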

Page 13-8

Data Distribution and Hashing

Hash Related Expressions
• The SQL hash functions are:
HASHROW (column(s))
HASHAMP (hashbucket)

HASHBUCKET (hashrow)
HASHBAKAMP (hashbucket)

• Example 1:

	SELECT	HASHROW ('Teradata')				AS "Hash Value"
		,HASHBUCKET (HASHROW ('Teradata'))		AS "Bucket Num"
		,HASHAMP (HASHBUCKET (HASHROW ('Teradata')))	AS "AMP Num"
		,HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))	AS "AMP Fallback Num" ;

	Hash Value	Bucket Num	AMP Num		AMP Fallback Num
	F5C4BC93	1006667		12		25

	AMP Numbers based on 26-AMP system with 1,048,576 hash buckets.

• Example 2:

	SELECT	HASHROW ('Teradata')	AS "Hash Value 1"
		,HASHROW ('Teradata ')	AS "Hash Value 2"
		,HASHROW (' Teradata')	AS "Hash Value 3" ;

	Hash Value 1	Hash Value 2	Hash Value 3
	F5C4BC93	F5C4BC93	01989D47

	Note: Literals are converted to Unicode and then hashed.

Page 13-9

Hashing – Numeric Data Types
The hashing algorithm will hash the same numeric value in different data types to the same
value.
A DATE data type and an INTEGER data type hash to the same value. An example
follows:
	CREATE TABLE tableE
	 (c1_int	INTEGER
	 ,c2_date	DATE)
	UNIQUE PRIMARY INDEX (c1_int);

	INSERT INTO tableE (1010601, 1010601);
	INSERT INTO tableE (NULL, NULL);

	SELECT c1_int, HASHROW (c1_int), HASHROW (c2_date) FROM tableE;

	c1_int		HASHROW (c1_int)	HASHROW (c2_date)
	1010601		1213C458		1213C458
	?		00000000		00000000

A second example follows:

	CREATE TABLE tableF
	 (c1_int	INTEGER
	 ,c2_int	INTEGER
	 ,c3_char	CHAR(4)
	 ,c4_char	CHAR(4))
	UNIQUE PRIMARY INDEX (c1_int, c2_int);

	INSERT INTO tableF (0, NULL, '0', NULL);

	SELECT	HASHROW (c1_int)	AS "Hash c1"
		,HASHROW (c2_int)	AS "Hash c2"
		,HASHROW (c3_char)	AS "Hash c3"
		,HASHROW (c4_char)	AS "Hash c4"
	FROM	tableF;

	Hash c1		Hash c2		Hash c3		Hash c4
	00000000	00000000	2BB7F6D9	00000000

Note: The BTEQ commands .SET SIDETITLES and .SET FOLDLINE were used to display
the output on the bottom of the facing page.

Page 13-10

Data Distribution and Hashing

Hashing – Numeric Data Types
• The Hashing Algorithm hashes the following numeric data types to the same hash
value:
– BYTEINT, SMALLINT, INTEGER, BIGINT, DECIMAL(x,0), DATE
Example:

	CREATE TABLE tableA
	 (c1_bint	BYTEINT
	 ,c2_sint	SMALLINT
	 ,c3_int	INTEGER
	 ,c4_bigint	BIGINT
	 ,c5_dec	DECIMAL(8,0)
	 ,c6_dec2	DECIMAL(8,2)
	 ,c7_float	FLOAT
	 ,c8_char	CHAR(10))
	UNIQUE PRIMARY INDEX (c1_bint, c2_sint);

	INSERT INTO tableA (5, 5, 5, 5, 5, 5, 5, '5');

	SELECT	HASHROW (c1_bint)	AS "Hash Byteint"
		,HASHROW (c2_sint)	AS "Hash Smallint"
		,HASHROW (c3_int)	AS "Hash Integer"
		,HASHROW (c4_bigint)	AS "Hash BigInt"
		,HASHROW (c5_dec)	AS "Hash Dec80"
		,HASHROW (c6_dec2)	AS "Hash Dec82"
		,HASHROW (c7_float)	AS "Hash Float"
		,HASHROW (c8_char)	AS "Hash Char"
	FROM	tableA;

	Output from SELECT:

	Hash Byteint	609D1715
	Hash Smallint	609D1715
	Hash Integer	609D1715
	Hash BigInt	609D1715
	Hash Dec80	609D1715
	Hash Dec82	BD810459
	Hash Float	E40FE360
	Hash Char	551DCFDC

Data Distribution and Hashing

Page 13-11

Multi-Column Hashing
The hashing algorithm uses multiplication and addition as commutative operators for
handling a multi-column index.
If the data types hash the same, a multi-column index will hash the same for the same values
in different columns. Note the example on the facing page.
Note: The result would be the same if 3.0 and 5.0 were used as decimal values instead of 3
and 5.
INSERT INTO tableB (5, 3.0);
INSERT INTO tableB (3, 5.0);
	SELECT	c1_int			AS c1
		,c2_dec			AS c2
		,HASHROW (c1_int)	AS "Hash c1"
		,HASHROW (c2_dec)	AS "Hash c2"
		,HASHROW (c1_int, c2_dec) AS "Hash c1c2"
	FROM	tableB;

	c1	c2	Hash c1		Hash c2		Hash c1c2
	5	3	609D1715	6D27DAA6	6C964A82
	3	5	6D27DAA6	609D1715	6C964A82

Page 13-12

Data Distribution and Hashing

Multi-Column Hashing
• The Hashing Algorithm uses multiplication and addition to create the hash value for a
  multi-column index.

• Assume PI = (A, B):
	[Hash(A) * Hash(B)] + [Hash(A) + Hash(B)] = [Hash(B) * Hash(A)] + [Hash(B) + Hash(A)]

• Example: A PI of (3, 5) will hash the same as a PI of (5, 3) if both c1 & c2 are
  equivalent data types.

	CREATE TABLE tableB
	 (c1_int	INTEGER
	 ,c2_dec	DECIMAL(8,0))
	UNIQUE PRIMARY INDEX (c1_int, c2_dec);

	INSERT INTO tableB (5, 3);
	INSERT INTO tableB (3, 5);

	SELECT	c1_int			AS c1
		,c2_dec			AS c2
		,HASHROW (c1_int)	AS "Hash c1"
		,HASHROW (c2_dec)	AS "Hash c2"
		,HASHROW (c1_int, c2_dec) AS "Hash c1c2"
	FROM	tableB;

	*** Query completed. 2 rows found. 5 columns returned.

	c1	c2	Hash c1		Hash c2		Hash c1c2
	5	3	609D1715	6D27DAA6	6C964A82
	3	5	6D27DAA6	609D1715	6C964A82

	These two rows will hash the same and will produce a hash synonym.

Data Distribution and Hashing

Page 13-13

Multi-Column Hashing (cont.)
As mentioned before, the hashing algorithm uses multiplication and addition as
commutative operators for handling a multi-column index.
If the data types hash differently, then a multi-column index will hash differently for the
same values in different columns. Note the example on the facing page.

Page 13-14

Data Distribution and Hashing

Multi-Column Hashing (cont.)
• A PI of (3, 5) will hash differently than a PI of (5, 3) if column1 and column2 are data
  types that do not hash the same.

• Example:

	CREATE TABLE tableC
	 (c1_int	INTEGER
	 ,c2_dec	DECIMAL(8,2))
	UNIQUE PRIMARY INDEX (c1_int, c2_dec);

	INSERT INTO tableC (5, 3);
	INSERT INTO tableC (3, 5);

	SELECT	c1_int			AS c1
		,c2_dec			AS c2
		,HASHROW (c1_int)	AS "Hash c1"
		,HASHROW (c2_dec)	AS "Hash c2"
		,HASHROW (c1_int, c2_dec) AS "Hash c1c2"
	FROM	tableC;

	*** Query completed. 2 rows found. 5 columns returned.

	c1	c2	Hash c1		Hash c2		Hash c1c2
	5	3.00	609D1715	A4E56902	0E452DAE
	3	5.00	6D27DAA6	BD810459	336B8C96

	These two rows will not hash the same and probably will not produce a hash synonym.

Data Distribution and Hashing
Page 13-15

Additional Hash Examples
A numeric value of 0 hashes the same as a NULL. A character data type with a value of all
spaces also hashes the same as a NULL. However, a character value of ‘0’ hashes to a value
different than the hash of a NULL.
Upper and lower case characters hash the same.
The following example shows that different numeric types with a value of 0 all hash to the
same hash value.
	CREATE TABLE tableA
	 (c1_bint	BYTEINT
	 ,c2_sint	SMALLINT
	 ,c3_int	INTEGER
	 ,c4_dec	DECIMAL(8,0)
	 ,c5_dec2	DECIMAL(8,2)
	 ,c6_float	FLOAT
	 ,c7_char	CHAR(10))
	UNIQUE PRIMARY INDEX (c1_bint, c2_sint);

	.SET FOLDLINE
	.SET SIDETITLES

	INSERT INTO tableA (0, 0, 0, 0, 0, 0, '0');

	SELECT	HASHROW (c1_bint)	AS "Hash Byteint"
		,HASHROW (c2_sint)	AS "Hash Smallint"
		,HASHROW (c3_int)	AS "Hash Integer"
		,HASHROW (c4_dec)	AS "Hash Dec0"
		,HASHROW (c5_dec2)	AS "Hash Dec2"
		,HASHROW (c6_float)	AS "Hash Float"
		,HASHROW (c7_char)	AS "Hash Char"
	FROM	tableA;

	Hash Byteint	00000000
	Hash Smallint	00000000
	Hash Integer	00000000
	Hash Dec0	00000000
	Hash Dec2	00000000
	Hash Float	00000000
	Hash Char	2BB7F6D9

Note: An INTEGER value of 500 and a DECIMAL (8, 2) value of 5.00 will both have the
same hash value.
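
This can be checked directly (a sketch; the literals can be cast to any of the types above):

	SELECT	HASHROW (500)				AS "Hash Int 500"
		,HASHROW (CAST(5.00 AS DECIMAL(8,2)))	AS "Hash Dec 5.00";

Both columns are expected to return the same row hash.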

Page 13-16

Data Distribution and Hashing

Additional Hash Examples
• A NULL value for numeric data types is treated as 0.
• Upper and lower case characters hash the same.
Example:

	CREATE TABLE tableD
	 (c1_int	INTEGER
	 ,c2_int	INTEGER
	 ,c3_char	CHAR(4)
	 ,c4_char	CHAR(4))
	UNIQUE PRIMARY INDEX (c1_int, c2_int);

	INSERT INTO tableD ( 0, NULL, 'EDUC', 'Educ' );

	SELECT	HASHROW (c1_int)	AS "Hash c1"
		,HASHROW (c2_int)	AS "Hash c2"
		,HASHROW (c3_char)	AS "Hash c3"
		,HASHROW (c4_char)	AS "Hash c4"
	FROM	tableD;

	Result:

	Hash c1		Hash c2		Hash c3		Hash c4
	00000000	00000000	6ED679D5	6ED679D5
	Hash of 0	Hash of NULL	Hash of 'EDUC'	Hash of 'Educ'

Data Distribution and Hashing

Page 13-17

Using Hash Functions to View Distribution
The Hash Functions can be used to view the distribution of rows for a chosen Primary Index.
Notes:

• HashRow – returns the row hash value for a given value(s)
• HashBucket – the grouping for a specific hash value
• HashAMP – the AMP that is associated with the hash bucket
• HashBakAMP – the fallback AMP that is associated with the hash bucket

Identifying the Hash Buckets
If you suspect data skewing due to hash synonyms or NUPI duplicates, you can use the
HashBucket function to identify the number of rows in each hash bucket. The HashBucket
function requires the HashRow of the columns that make up the Primary Index or the
columns being considered for a Primary Index.

Identifying the Primary AMPs
The HASHAMP function can be used to determine data skewing and which AMP(s) have
the most rows.
The Customer table on the facing page consists of 7017 rows.

Page 13-18

Data Distribution and Hashing

Using Hash Functions to View Distribution
Hash Functions can be used to calculate the impact of NUPI duplicates
and synonyms for a PI.
	SELECT	HASHROW (Last_Name, First_Name)	AS "Hash Value"
		,COUNT(*)
	FROM	customer
	GROUP BY 1
	ORDER BY 2 DESC;

	Hash Value	Count(*)
	2D7975A8	12
	14840BD7	7
	(Output cut due to length)
	E7A4D910	1
	AAD4DC80	1

	The largest number of NUPI duplicates or synonyms is 12.

	SELECT	HASHAMP (HASHBUCKET
			(HASHROW (Last_Name, First_Name)))	AS "AMP #"
		,COUNT(*)
	FROM	customer
	GROUP BY 1
	ORDER BY 2 DESC;

	AMP #	Count(*)
	7	929
	6	916
	4	899
	5	891
	2	864
	3	864
	1	833
	0	821

	AMP #7 has the largest number of rows.

Data Distribution and Hashing

Page 13-19

Primary Index Hash Mapping
The diagram on the facing page gives you an overview of Primary Index Hash Mapping,
the process by which all data is distributed in the Teradata DBS.
The Primary Index value is fed into the Hashing Algorithm, which produces the Row Hash.
The row goes onto the Message Passing Layer. The Hash Maps in combination with the
Row Hash determines which AMP gets the row. The Hash Maps are part of the Message
Passing Layer interface.
Starting with Teradata Database 12.0, Teradata supports either 65,536 or 1,048,576 hash
buckets for a system. The larger number of buckets primarily benefits systems with
thousands of AMPs, but there is no disadvantage to using the larger number of buckets on
smaller systems.
The hash map is an array indexed by hash bucket number. Each entry of the array contains
the number of the AMP that processes the rows in the corresponding hash bucket.
The RowHash is a 32-bit result obtained by applying the hash function to the primary index
of the row. On systems with:

• 65,536 hash buckets, the system uses 16 bits of the 32-bit RowHash to index into
  the hash map.
• 1,048,576 hash buckets, the system uses 20 bits of the 32-bit RowHash as the
  index.

Page 13-20

Data Distribution and Hashing

Primary Index Hash Mapping
[Diagram: the Primary Index value for a row is fed into the Hashing Algorithm, which
produces a 32-bit Row Hash – a Hash Bucket Number (20 bits)* plus the remaining bits
(12 bits). The Hash Bucket Number indexes into the memory-resident Hash Map (1,048,576*
entries), and the Message Passing Layer (PDE and BYNET) delivers the row to the AMP
identified by that entry (AMP 0 through AMP 9 in this example).]

* Most newer systems have hash bucket numbers that are represented in the first 20 bits
  of the row hash.

• With a 20-bit hash bucket number, the hash map will have 1,048,576 hash buckets.
• The hash bucket number is effectively used to index into the hash map.
• Older systems (before TD 12.0) use the first 16 bits of the row hash for the hash
  bucket number. These systems have hash maps with 65,536 hash buckets.
• This course will assume 20 bits for the hash bucket number unless otherwise noted.

Data Distribution and Hashing

Page 13-21

Hash Maps
As you have seen, Hash Maps are the mechanisms that determine which AMP gets a row.
They are duplicated on every TPA node in the system. There are 4 Hash Maps:

• Current Configuration Primary (designates where rows are stored)
• Current Configuration Fallback (designates where copies of rows are stored)
• Reconfiguration Primary (designates where rows move during a system reconfiguration)
• Reconfiguration Fallback (designates where copies of rows move during a reconfiguration)

Hash Maps are also used whenever there is a PI or USI operation.
Hash maps are arrays of Hash Map entries. There are 65,536 or 1,048,576 Hash Map
entries. Each of these entries points to a single AMP in the system. The Row Hash
generated by the Hashing Algorithm contains information that designates a particular entry
on a particular Hash Map. This entry tells the system which AMP should be interrupted.


• Teradata Version 1 used a Hash Map with only 3643 hash buckets.

• Teradata Version 2 (prior to Teradata 12.0) used hash maps with 65,536 hash
  buckets. Starting with Teradata 12.0, the number of hash buckets in a hash map
  can be either 65,536 or 1,048,576. One of the important impacts of this change is
  that the increase provides for a more even distribution of data with large numbers
  of AMPs.

For systems upgraded to Teradata Database 12.0, the default number of hash buckets
remains unchanged at 65,536 buckets. For new systems or following a sysinit, the default is
1,048,576 buckets.
Note: The Hash Maps are stored in GDO (Globally Distributed Object) files on each SMP
and are loaded into the PDE memory space when PDE software is started – usually as
part of the UNIX MP-RAS, Windows 2003, or Linux startup process.

Page 13-22

Data Distribution and Hashing

Hash Maps
Hash Maps are the mechanism for determining which AMP gets a row.
• There are four (4) Hash Maps on every TPA node.
• By default, the two Current Hash Maps are loaded into PDE memory space of each TPA node
when PDE software boots.
Message Passing Layer
Current Configuration Primary

Reconfiguration Primary

Current Configuration Fallback

Reconfiguration Fallback

Hash Maps have either 65,536 or 1,048,576 entries. Each entry is 2 bytes in size.

• Starting with Teradata 12.0, for new systems (or systems that have had a sysinit), the
  default number of hash buckets is 1,048,576.
• The increased number of hash buckets provides for a more even distribution of data with
  large numbers of AMPs.
• For systems upgraded to Teradata Database 12.0, the default number of hash buckets
  remains unchanged at 65,536 buckets.

Data Distribution and Hashing

Page 13-23

Primary Hash Map
The diagram on the facing page is a graphical representation of a Primary Hash Map. (It
serves to illustrate the concept; they really don’t look like this.) The Hash Map utilized by
the system is the Current Configuration Primary Hash Map. The Fallback Hash Map IS
NOT an exact copy of the Primary Hash Map. The Primary Hash Map identifies which
AMP the first (Primary) copy of a row belongs to. The Fallback Hash Map is only used for
Fallback protected tables and identifies a different AMP in the same "cluster" for the second
(Fallback) row copy.
Note: On most systems (i.e., systems since the 5450), clusters typically consist of 2
AMPs.
That portion of the Row Hash that points to a particular Hash Map entry is called the Hash
Bucket Number (HBN). The hash bucket number is the first 16 or 20 bits of the Row Hash
depending on the size of the hash maps. The hash bucket number points to a single entry in
a Hash Map. As the diagram shows, the system looks at the particular Hash Map entry
specified by the hash bucket number to determine which AMP the row belongs to.
The Message Passing Layer (or Communications Layer) uses only the hash bucket number
portion of the Row Hash to determine which AMP gets the row when inserting a new row
into a table. The AMP uses the entire 32 bit Row Hash to determine logical disk storage
location of the row.
Teradata builds Hash Maps in a consistent fashion. The Primary Hash Map of systems with
the same number of AMP vprocs is identical assuming the same number of buckets in the
hash map (65,536 or 1,048,576 hash buckets). Fallback Hash Maps may differ due to
clustering differences at each site.
The hash bucket number (prior to Teradata 12.0) was commonly referred to as the
Destination Selection Word (DSW).

Page 13-24

Data Distribution and Hashing

Primary Hash Map
Row Hash (32 bits) = Hash Bucket Number (20 or 16 bits) + Remaining bits

PRIMARY HASH MAP – 14 AMP System

[Diagram: a portion of the primary hash map, shown as a grid of hash bucket entries (rows
0000 – 0005, columns 0 – F). Each entry contains the number of the AMP (00 – 13) that owns
the rows in that hash bucket.]

Note: This partial hash map (1,048,576 buckets) is associated with a 14 AMP System.

Assume the Hash Bucket Number is the first 20 bits of the Row Hash.
The Hash Bucket Number points to one entry within the map.
The referenced Hash Map entry identifies the AMP for the row hash.

Data Distribution and Hashing

Page 13-25

Hash Maps for Different Systems
The diagrams on the facing page show a graphical representation of a Primary Hash Map for
an 8 AMP system and a Primary Hash Map for a 16 AMP system. These examples assume
hash maps with 1,048,576 entries.
A data value which hashes to “00023 1AB” will be directed to different AMPs on different
systems. For example, this hash value will be associated with AMP 5 on an 8 AMP system
and AMP 14 on a 16 AMP system.

Page 13-26

Data Distribution and Hashing

Hash Maps for Different Systems
Row Hash (32 bits) = Hash Bucket Number + Remaining bits

PRIMARY HASH MAP – 8 AMP System

[Diagram: a portion of the 8 AMP primary hash map (rows 0000 – 0005, columns 0 – F); each
entry contains an AMP number 00 – 07.]

PRIMARY HASH MAP – 16 AMP System

[Diagram: the corresponding portion of the 16 AMP primary hash map; each entry contains an
AMP number 00 – 15.]

Portions of actual hash maps with 1,048,576 hash buckets.

Assume a row hash of 00023 1AB:
	8 AMP system – AMP 05
	16 AMP system – AMP 14

Data Distribution and Hashing

Page 13-27

Fallback Hash Map
The diagram on the facing page is a graphical representation of a Primary Hash Map and a
Fallback Hash Map.
The Fallback Hash Map is only used for Fallback protected tables and identifies a different
AMP in the same “cluster” for the second (Fallback) row copy.
Note: These are the actual partial primary and fallback hash maps for a 14 AMP system
with 1,048,576 hash buckets.

Page 13-28

Data Distribution and Hashing

Fallback Hash Map
Row Hash (32 bits) = Hash Bucket Number + Remaining bits

PRIMARY HASH MAP – 14 AMP System

[Diagram: a portion of the 14 AMP primary hash map (rows 0000 – 0005, columns 0 – F); each
entry contains an AMP number 00 – 13.]

FALLBACK HASH MAP – 14 AMP System

[Diagram: the corresponding portion of the 14 AMP fallback hash map. For any given hash
bucket, the fallback map identifies a different AMP in the same cluster than the primary
map does.]

Assume a row hash of 00023 1AB:
	Primary AMP – 05
	Fallback AMP – 12

Notes: 14 AMP System with 2 AMP clusters; hash maps with 1,048,576 buckets.

Data Distribution and Hashing

Page 13-29

Reconfiguration
Reconfiguration (Reconfig) is the process for changing the number of AMPs in a system
and is controlled by the Reconfiguration Hash Maps. The system constructs
Reconfiguration Hash Maps by reassigning Hash Map Entries to reflect a new configuration
of AMPs. This is done in a way that minimizes the number of rows (and Hash Map Entries)
reassigned to a new AMP. After rows are moved, the Reconfiguration Primary Hash Map
becomes the Current Configuration Primary Hash Map, and the Reconfiguration Fallback
Hash Map becomes the Current Fallback Hash Map.
The diagram on the right illustrates a 200 AMP to 300 AMP Reconfig for a system. The
1,048,576 Hash Map entries are distributed evenly across the 200 AMPs in the initial
configuration (top illustration), with approximately 5243 entries referencing each AMP.
Thus, there are 5243 Hash Map Entries pointing to AMP 1.
In a 300 AMP system, each AMP will have approximately 3496 entries referencing it. It is
necessary to change about 1748 (5243 – 3496) of those and divide them between the new AMPs
(AMP 200 through 299). The system does the same thing for the Hash Map Entries that
currently point to the other AMPs. This constitutes the Reconfiguration Primary Hash Map.
A similar process is done for the Reconfiguration Fallback Hash Map.
Once the new Hash Maps are ready, the system looks at every row on each AMP and checks
to see if the Hash Bucket Number points to one of the Hash Map Entries which was
changed. If so, then the row is moved to its new destination AMP.
The formula used to determine the percentage of rows migrating to new AMPs during a
Reconfig is shown at the bottom of the right-hand page. Divide the Number of New AMPs
by the Sum of the Old and New AMPs (the number of AMPs after the Reconfig). For
example, the above 200 to 300 AMP Reconfig causes 33.3% of the rows to migrate.

Page 13-30

Data Distribution and Hashing

Reconfiguration
If a 12.0 system (with 1,048,576 hash buckets) has 200 AMPs, then each of the 200 AMPs
will have approx. 5243 entries in the hash map.

If upgrading to 300 AMPs, then each of the 300 AMPs will have a similar number of entries
(approx. 3496) in the hash map.

[Diagram: before the Reconfig, the 1,048,576 hash map entries are spread over the existing
AMPs 0 – 199 (approx. 5243 each) while the new AMPs 200 – 299 are empty; after the
Reconfig, the entries are spread over all AMPs 0 – 299 (approx. 3496 each).]

• The system creates new Hash Maps to accommodate the new configuration.
• Old and new maps are compared – each AMP reads its rows, and moves only those that
  hash to a new AMP.

	Percentage of Rows Moved to new AMPs  =  Number of New AMPs / SUM of Old + New AMPs
	                                      =  100 / 300  =  1/3  =  33.3%

• It is not necessary to offload and reload data due to a reconfiguration.
• If the hash map size is changed (65,536 to 1,048,576), more data will be moved as part
  of a reconfiguration.

Data Distribution and Hashing

Page 13-31

Row Retrieval via PI Value – Overview
The facing page illustrates the step-by-step process involved in Primary Index retrieval. The
SELECT statement (shown on facing page) retrieves the row or rows where the PI is equal
to a particular column value (or column values in the case of a multi-column PI).
The PE parser always puts out a three-part message composed of the Table ID, Row Hash
and Primary Index value. The 48 bit Table ID is looked up in the Data Dictionary, the 32 bit
Row Hash value is generated by the Hashing Algorithm and the Primary Index value comes
from the SQL request.
The Message Passing Layer (a.k.a., Communications Layer) Interface uses the Hash Bucket
Number (first 16 or 20 bits of the Row Hash) to determine which AMP to interrupt and pass
on the message.
The AMP uses the Table ID and Row Hash to identify and locate the proper data block, then
uses the Row Hash and PI value to locate the specific row(s). The PI value is required to
distinguish between Hash Synonyms.

Page 13-32

Data Distribution and Hashing

Row Retrieval via PI Value – Overview
SELECT … FROM tablename
WHERE primaryindex = value(s);

[Diagram: the SQL request enters the Parsing Engine, where the Parser and the Hashing
Algorithm produce the three-part message – the 48-bit Table ID, the 32-bit Row Hash
(whose Hash Bucket Number is used by the Message Passing Layer to select the AMP), and
the Index Value. The AMP File System then uses a Logical Block Identifier and a Logical
Row Identifier to locate the Data Block and the row on the Vdisk.]

With a PI row retrieval, only the AMP (whose number appears in the referenced Hash Map)
is accessed by the system.

Data Distribution and Hashing

Page 13-33

Names and Object IDs
DBC.Next is a Data Dictionary table that consists of a single row with 9 columns as shown
below.
One of the counters is used to assign a globally unique numeric ID to every Database, User,
Role, and Profile. A different counter is used to assign a globally unique numeric ID to
every Table, View, Macro, Trigger, Stored Procedure, User-Defined Function, Join Index,
and Hash Index.
DBC.Next always contains the next value to be assigned to any of these. Think of these
columns as counters for ID values.
You may be interested in noting that DBC.Next only contains a single, short row but it
requires a Table Header on every AMP, as does any table.
Columns and Indexes are also assigned numeric IDs, which are unique within their
respective tables. However, column and index IDs are not assigned from DBC.Next.
	DBC.Next columns	Values		Data Type

	RowNum			1		CHAR(1)
	DatabaseID		numeric		BYTE(4)
	TableID			numeric		BYTE(4)
	ProcsRowLock		numeric		BYTE(4)
	EventNum		numeric		BYTE(4)
	LogonSequenceNo		numeric		BYTE(4)
	TempTableID		numeric		BYTE(4)
	StatsQueryID		number		BYTE(4)
	ReconfigID		number		INTEGER

Page 13-34

Data Distribution and Hashing

Names and Object IDs
DBC.Next (1 row):	NEXT DATABASE ID  |  NEXT TVM ID  |  6 Other Counters

• The DD keeps track of all SQL names and their numeric IDs.
• Each Database/User/Profile/Role – is assigned a globally unique numeric ID.
• Each Table, View, Macro, Trigger, Stored Procedure, User-defined Function,
  Join Index, and Hash Index – is assigned a globally unique numeric ID.
• Each Column – is assigned a numeric ID unique within its Table ID.
• Each Index – is assigned a numeric ID unique within its Table ID.
• The PE's RESOLVER uses the DD to verify names and convert them to IDs.
• The AMPs use the numeric IDs supplied by the RESOLVER.

Data Distribution and Hashing

Page 13-35

Table ID
The Table ID is the first part of the three-part message. It is a 48-bit number supplied by
the parser. There are two major components of the Table ID:

• The first component of the Table ID is the Unique Value. Every table, view and
  macro is assigned a 32-bit Unique Value, which is assigned by the system table
  called DBC.Next. In addition to specifying a particular table, this value also
  indicates whether the table is a normal data table, Permanent Journal table or Spool
  file table.

• The second component of the Table ID is known as the Subtable ID. Teradata
  stores various types of rows of a table in separate blocks. For example, Table
  Header rows (described later) are stored in different blocks than primary data rows,
  which are stored in different blocks than Fallback data rows, and so on (more
  examples are shown on the facing page). Each separate set of blocks is known as a
  subtable. The Subtable ID is a 16-bit value that tells the file system which type of
  blocks to search for.

The facing page lists subtable IDs in decimal value for 2-AMP clusters. The
SHOWBLOCKS utility will display the block allocations by subtable and uses
decimal values to represent each subtable. If a Reference Index subtable was
created, it would have subtable IDs of 1536 and 2560.
For convenience, Table ID examples throughout this course only refer to the Unique Value
and omit the Subtable ID.
The Table ID, together with the Row ID, gives Teradata a way to uniquely identify every
single row in the entire system.

Spool File Table IDs
Spool files are temporary work tables which are created and dropped as queries are
executed. When a query is complete, all of the spool files that it used will be dropped
automatically.
Like all tables, a spool file (essentially a temporary work table) requires a Table ID (or
tableid). There is a range of tableids exclusively reserved for spool files (C000 0001
through FFFF FFFF) and the system cycles through them. Eventually, the system will cycle
through all the tableids for spool files and reassign spool tableids starting at C000 0001.

Page 13-36

Data Distribution and Hashing

Table ID
The Table ID is a Unique Value for Tables, Views, Macros, Triggers, Stored Procedures,
Join Indexes, etc. that comes from the DBC.Next dictionary table.

	UNIQUE VALUE (32 Bits)  +  SUB-TABLE ID (16 Bits)

Unique Value also defines the type of table:
• Normal data table
• Permanent journal
• Global Temporary
• Spool file

Sub-table ID identifies the part of a table the system is looking at.

	Sub-table type			Primary ID	Fallback ID	(shown in decimal format)
	Table Header			0
	Data table			1024		2048
	1st Secondary index		1028		2052
	2nd Secondary index		1032		2056
	1st Reference index		1536		2560
	1st BLOB or CLOB		1792		2816
	2nd BLOB or CLOB		1794		2818
	Archive Online Subtable		18440		n/a

Table ID plus Row ID makes every row in the system unique.
Examples shown in this manual use the Unique Value to represent the entire Table ID.

Data Distribution and Hashing

Page 13-37

Row ID
The Row Hash is not sufficient to identify a specific row in a table. Since it is based on a
Primary Index value, multiple rows can have the same Row Hash. This is due either to Hash
Synonyms or NUPI Duplicates.
The Row ID makes every row within a table uniquely identifiable. For a non-partitioned
table, the Row ID consists of the Row Hash plus a Uniqueness Value. The Uniqueness
Value is a 32-bit numeric value, designed to identify specific rows within a single Row Hash
value. When there are multiple rows with the same Row Hash within a table, the first row is
assigned a Uniqueness Value of 1. Additional rows with the same Row Hash are assigned
ascending Uniqueness Values.
For Primary Index retrievals, only the Row Hash and Primary Index values are needed to
find the qualifying row(s). The Uniqueness Value is needed for Secondary Index support.
Since a Row ID is a unique identifier of a row within a table, Teradata uses Row IDs as
Secondary Index pointers.
Although Row IDs do identify every row in a table uniquely, they do not guarantee that the
data itself is unique. In order to avoid the problem of duplicate rows (permitted in Multiset
tables), the complete set of data values for a row (in a Set table) must also be unique.
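
For example (a sketch only; the table and column names are illustrative), the difference is
declared when the table is created:

	CREATE SET TABLE Employee_Set
	 (Employee_Number	INTEGER
	 ,Last_Name		CHAR(20))
	PRIMARY INDEX (Last_Name);		-- duplicate rows are rejected

	CREATE MULTISET TABLE Employee_Multi
	 (Employee_Number	INTEGER
	 ,Last_Name		CHAR(20))
	PRIMARY INDEX (Last_Name);		-- duplicate rows are permitted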
Summary

• For a non-partitioned table (NPPI), the Row ID consists of the Row Hash +
  Uniqueness Value, for a total of 8 bytes in length.

• For a partitioned table (PPI), the Row ID actually consists of the Partition Number
  + Row Hash + Uniqueness Value, for a total of 10 or 16 bytes in length.

Page 13-38

Data Distribution and Hashing

Row ID
On INSERT, Teradata stores both the data values and the Row ID.
ROW ID = ROW HASH and UNIQUENESS VALUE

Row Hash

• Row Hash is based on Primary Index value.
• Multiple rows in a table could have the same Row Hash.
• NUPI duplicates and hash synonyms have the same Row Hash.
Uniqueness Value

• The AMP creates a numeric 32-bit Uniqueness Value.
• The first row for a Row Hash has a Uniqueness Value of 1.
• Additional rows have ascending Uniqueness Values.
• Row IDs determine sort sequence within a Data Block.
• Row IDs support Secondary Index performance.
• The Row ID makes every row within a table uniquely identifiable.

Duplicate Rows

• Row ID uniqueness does not imply data uniqueness.
Note: The Row ID for a non-partitioned table is effectively 8 bytes long.

Data Distribution and Hashing

Page 13-39

AMP File System – Locating a Row via PI
The steps on the right-hand page outline the process that Teradata uses to locate a row. We
know that rows are distributed according to their Row Hash. More specifically, the Hash
Bucket Number points to a single entry in a Hash Map which designates a particular AMP.


• Once the correct AMP has been found, the Master Index for that AMP is used to identify which Cylinder Index should be referenced.

• The Cylinder Index then identifies the correct Data Block.

• A search of the Data Block locates the row or rows specified by the original three-part message.

• The system performs either linear or indexed searches.

The diagram at the bottom of the facing page illustrates these steps in a graphical fashion.
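The same sequence can be sketched in Python; the list-of-tuples structures and function names below are hypothetical simplifications of the Master Index and Cylinder Index, assuming the search key is the (Table ID, Partition #, Row Hash) three-part message:

# Each Master Index entry (CID) covers a non-overlapping key range and names a cylinder;
# each Cylinder Index entry (DBD) covers a key range and names a data block.
from bisect import bisect_right

def find_cylinder(master_index, key):
    """master_index: list of (lowest_key, cylinder_no), sorted by lowest_key."""
    i = bisect_right([lo for lo, _ in master_index], key) - 1
    return master_index[i][1] if i >= 0 else None

def find_block(cylinder_index, key):
    """cylinder_index: list of (lowest_key, start_sector), sorted by lowest_key."""
    i = bisect_right([lo for lo, _ in cylinder_index], key) - 1
    return cylinder_index[i][1] if i >= 0 else None

# Keys echo the facing-page example: (Table ID, Partition #, Row Hash).
master_index = [((100, 0, 773), 169), ((100, 0, 1361), 777)]
cylinder_169 = [((100, 0, 867), 1010), ((100, 0, 998), 789), ((100, 0, 1010), 525)]

cyl = find_cylinder(master_index, (100, 0, 1000))     # -> cylinder 169
sector = find_block(cylinder_169, (100, 0, 1000))     # -> block starting at sector 789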

Page 13-40

Data Distribution and Hashing

AMP File System – Locating a Row via PI
• The AMP accesses its Master Index (always memory-resident).
– An entry in the Master Index identifies a Cylinder # and the AMP accesses the Cylinder Index
(frequently memory-resident).

• An entry in the Cylinder Index identifies the Data Block.
– The Data Block is the physical I/O unit and may or may not be memory resident.
– A search of the Data Block locates the row(s).

The PE sends the request (Table ID, Row Hash, PI Value) to an AMP via the Message Passing Layer (PDE & BYNET).

[Diagram: in AMP memory, the Master Index points to the Cylinder Index (accessed in FSG Cache), which points to the Data Block (accessed in FSG Cache) containing the row on the Vdisk.]

Data Distribution and Hashing

Page 13-41

Teradata File System Overview
The Teradata File System software has these characteristics:





• part of AMP address space
• unaware of other AMP or File System instances
• AMP interface to disk services
• uses PDE FSG services

The Master Index contains an entry (CID) for each allocated cylinder. (CID – Cylinder
Index Descriptor)
On the facing page, SRD–A represents an SRD (Subtable Reference Descriptor) for table A.
DBD–A1 and DBD–A2 represent data blocks for table A. (DBD – Data Block Descriptor)
On the facing page, SRD–B represents an SRD for table B. DBD–B1, etc. represent data
blocks for table B.
There are actually two cylinder indexes allocated for each cylinder. Each cylinder index is
12 KB in size. Therefore, there is 24 KB (48 sectors) allocated for cylinder indexes at the
beginning of each cylinder.
Prior to Teradata 13.10 and Large Cylinder Support, cylinders are 3872 sectors.
Miscellaneous notes:

• Master Index entries are 72 bytes long.

• A cylinder index is 12 KB in size for 2 MB cylinders and 64 KB in size for 12 MB cylinders.

• Data rows for PPI tables require an additional 2 bytes to identify the partition number, and the spare byte is set to x'80' to identify the row as a PPI row.

• Secondary index subtable rows also carry the Part # + Row Hash + Uniqueness ID to identify data rows.

Page 13-42

Data Distribution and Hashing

Teradata File System Overview
Master Index (in AMP memory):  CID  CID  CID  CID  . . .

CID – Cylinder Index Descriptor
SRD – Subtable Reference Descriptor
DBD – Data Block Descriptor

[Diagram: on the VDisk, each cylinder (3872 sectors) begins with a Cylinder Index. The first Cylinder Index shown holds SRD-A with DBD-A1 and DBD-A2 (pointing to Data Blocks A1 and A2) and SRD-B with DBD-B1 and DBD-B2 (pointing to Data Blocks B1 and B2). A second Cylinder Index holds SRD-B with DBD-B3, DBD-B4, and DBD-B5 (pointing to Data Blocks B3, B4, and B5).]

Data Distribution and Hashing

Page 13-43

Master Index Format
The first cylinder in each Vdisk contains a number of control structures used by the AMP’s
File System software. Segment 0 (512 bytes) contains the Vdisk status and a number of
structure pointers for the AMP. Following Segment 0 is the FIB (File System Information
Block). The FIB contains global file system information – a key component is a status array
that shows the status of cylinders (used, free, bad, etc.), and the sorted list of CIDs that are
the descriptors for the cylinders currently in use. The FIB effectively contains the list of free
or available cylinders. Unlike the Master Index (MI), the FIB is written to disk when
cylinders are allocated, and it is read from disk when Teradata boots or when the MI needs
to be rebuilt in memory. If necessary, software will allocate additional cylinders for these
structures.
The Master Index is a memory resident structure that contains an entry for every allocated
data cylinder on that AMP. Entries in the Master Index are sorted by the lowest Table ID
and Row ID that can be found on the associated cylinder. The Master Index is used to
identify which cylinder a specific row can be found in.
The key elements of the Master Index are:


• Master Index Header – 32 bytes (not shown)

• Cylinder Index Descriptors (CID) – one per allocated cylinder – 72 bytes in length

• Cylinder Index Descriptor Reference Array (not shown) – set of 4-byte pointers to the CIDs; these entries are sorted in descending order.
  Note: This array is similar to the row reference array at the end of a data block.
Cylinders that contain no data are not listed in the Master Index. They appear in the Free
Cylinder List (which is part of the FIB – File System Information Block) for the associated
Vdisk. Entries in the Free Cylinder List are sorted by Cylinder Number.
Each Master Index entry (or CID) contains the following data:








• Lowest Table ID in the cylinder
• Lowest Part # / Row ID value in the cylinder (associated with the lowest Table ID)
• Highest Table ID in the cylinder
• Highest Part # / Row Hash (not Row ID) value in the cylinder (associated with the highest Table ID)
• Drive (Pdisk) and Cylinder Number
• Free sectors
• Flags

The maximum size of the Master Index is based on the number of cylinders available to the
AMP.

Page 13-44

Data Distribution and Hashing

Master Index Format
Characteristics

• Memory resident structure specific to each AMP.

• Contains Cylinder Index Descriptors (CID) – one for each allocated Cylinder (72 bytes long).

• Each CID identifies the lowest Table ID / Part# / Row ID and the highest Table ID / Part# / Row Hash for a cylinder.

• Range of Table ID / Part# / Row IDs does not overlap with any other cylinder.

• Sorted list of CIDs.

[Diagram: Cylinder 0 of the Vdisk holds Seg. 0, the Master Index (CID 1, CID 2, CID 3, ... CID n), and the FIB (contains the Free Cylinder List). Every other cylinder begins with its own Cylinder Index (CI).]

Notes:
• The Master index and Cylinder Index entries include the partition #’s to support partition
elimination for Partitioned Primary Index (PPI) tables.

• For non-partitioned tables, the partition number is 0 and the Master and Cylinder Index entries (for
NPPI tables) will use 0 as the partition number in the entry.

Data Distribution and Hashing

Page 13-45

Cylinder Index Format
Each cylinder has its own Cylinder Index (CI). The Cylinder Index contains a list of the
data blocks and free sectors that reside on the cylinder. The Cylinder Index is accessed to
determine which data block a row resides in.
The key elements of the Cylinder Index include:


• Cylinder Index Header (not shown)

• Subtable Reference Descriptors (SRD) contain:
  – Table ID
  – Range of DBDs (1st and count)

• Data Block Descriptors (DBD) contain:
  – First Part # / Row ID
  – Last Part # / Row Hash
  – Sector number and size
  – Flags
  – Row count

• Free Sector Entries (FSE) – identify free sectors in the cylinder. There is one FSE for each free sector range in the cylinder. The set of FSEs effectively makes up the "Free Block List", also known as the "Free Sector List".

• Subtable Reference Descriptor Array (not shown) – set of 2-byte pointers to the SRDs; these entries are sorted in descending order. Note: This array is similar to the row reference array at the end of a data block.

• Data Block Descriptor Array (not shown) – set of 2-byte pointers to the DBDs; these entries are sorted in descending order. Note: This array is similar to the row reference array at the end of a data block.

There are two cylinder indexes allocated for each cylinder. Each cylinder index is 12 KB in
size. Therefore, there is 24 KB (48 sectors) allocated for cylinder indexes at the beginning
of each cylinder.
The facing page illustrates a logical view of SRDs and DBDs and does not represent the
actual physical implementation. For example, the SRD and DBD reference arrays are not
shown.

Page 13-46

Data Distribution and Hashing

Cylinder Index Format
Characteristics

• Located at the beginning of each Cylinder.

• There is one SRD (Subtable Reference Descriptor) for each subtable that has data blocks on the cylinder.

• Each SRD references a set of DBD(s). A DBD is a Data Block Descriptor.

• One DBD per data block – identifies location and lowest Part# / Row ID and the highest Part # / Row Hash within a block.

• FSE – Free Segment (or Sector) Entry identifies free sectors.

• Note: Each Cylinder actually has two 12K Cylinder Indexes and the File System software alternates between them.

[Diagram: on the VDisk, the Cylinder Index holds SRD A (with DBD A1 and DBD A2), SRD B (with DBD B1 and DBD B2), and FSE entries. The cylinder itself holds Data Blocks A1, A2, B1, and B2 plus ranges of free sectors.]

Data Distribution and Hashing

Page 13-47

Data Block Layout
A Block is the physical I/O unit for Teradata. It contains one or more data rows, all of
which belong to the same table. They must fit entirely within the block.
The maximum block size is 255 sectors or 127.5 KB.
A Data Block consists of three major sections:




• The Data Block Header (DB Header)
• The Row Heap
• The Row Reference Array

Rows cannot be split between blocks. Each row in a DB is referenced by a separate index to
the row known as the Row Reference Array. The Row Reference Array is placed at the end
of the data block just before the Block Trailer.
With tables that are not partitioned (Non-Partitioned Primary Index – NPPI), each row has at
least 14 bytes of overhead in addition to the data values stored in that row. With tables that
are partitioned (PPI), each row has at least 16 bytes of overhead in addition to the data
values stored in that row. The partition number uses the additional two bytes.
There are also 2 bytes of space used in the Row Reference Array for a 2-byte Reference
Array Pointer. This 2-byte pointer identifies the offset of where the row starts within the
block. If a row is an odd number of bytes in length, the Row Length specifies its precise
length, but the system allocates whole words within the block for the row. Rows will start
on an even address boundary.


• Teradata truly supports variable length rows.

• The max amount of user data that you can define in a table row is 64,243 bytes because there is a minimum of 12 bytes of overhead within the row. This gives a total of 64,255 bytes for the data row plus an additional 2 bytes for the row offset within the row reference array.

Page 13-48

Data Distribution and Hashing

Data Block Layout
• A data block contains rows with same subtable ID.
– Contains rows within range of Row IDs of associated DBD entry and the range of
Row IDs does not overlap with any other data block.

– Logically sorted set of rows.

• The maximum block size is 255 sectors (127.5 KB).
– Blocks can vary in size from 1 sector to 255 sectors.

[Diagram: the data block contains a Header (72 bytes), rows in the row heap (Row 1, Row 3, Row 2, Row 4), the Row Reference Array (-3 -2 -1 0), and a Trailer (2 bytes).]

• A maximum row size is 64,255 bytes.

Data Distribution and Hashing

Page 13-49

Example of Locating a Row – Master Index
In the example on the facing page, you can see how Teradata would use the Master Index to
locate the data requested by a SELECT statement. The three-part message is Table ID=100,
Row Hash=1000 and EMPNO=3755. After identifying the appropriate AMP, Teradata uses
that AMP’s Master Index to locate which cylinder contains this Table ID and Row Hash.
By examining the Master Index, you can see that Cylinder Number 169 contains the
appropriate row, if it exists in the system.
Teradata’s File System software does a binary search of the CIDs based on Table ID / Part #
/ Row Hash or Table ID / Part # / Row ID to locate the cylinder number that has the row(s).
The CI for that cylinder is accessed to locate the data block.
A user request for a row based on a Primary Index value will only have the Table ID / Part #
/ Row Hash.
A user request for a row based on a Secondary Index (SI) will have the Table ID / Row Hash
for the SI value. The SI subtable row contains the Row ID(s) of the base table row(s).
Teradata software uses the Table ID / Row ID(s) to locate the base table row(s). If a table is
partitioned, the SI subtable row will have the Part # and the Row ID.
Free cylinders appear in the Free Cylinder List which is part of the FIB (File System
Information Block) for the associated Vdisk.
Summary









• There is only one entry for each cylinder on the AMP.
• Cylinders with data appear on the Master Index.
• Cylinders without data appear on the Free Cylinder List (which is located within the FIB – File System Information Block).
• Each index entry identifies its cylinder's lowest Table ID / Partition # / Row ID.
• Index entries are sorted by Table ID, Partition #, and Lowest Row ID.
• Multiple tables may have rows on the same cylinder.
• A table may have rows on many cylinders on different Pdisks on an AMP.
• The Free Cylinder List is sorted by Cylinder Number.

Page 13-50

Data Distribution and Hashing

Example of Locating a Row – Master Index
Table ID   Part #   Row Hash   empno
  100        0        1000      3755

SELECT *
FROM   employee
WHERE  empno = 3755;

Master Index
            Lowest                          Highest                    Pdisk and
Table ID  Part #   Row ID        Table ID  Part #   Row Hash     Cylinder Number
   :         :        :             :         :        :               :
  078        0    58234, 2         095        0      72194            204
  098        0    00107, 1         100        0      00676            037
  100        0    00773, 3         100        0      01361            169
  100        0    01361, 2         100        0      02884            777
  100        0    02937, 1         100        0      03602            802
  100        0    03662, 1         100        0      03999            117
  100        0    04123, 2         100        0      05888            888
  100        0    05974, 1         100        0      07328            753
  100        0    07353, 1         120        0      00469            477
  123        1    00343, 2         123        2      01864            529
  123        2    06923, 1         123        3      00231            943
   :         :        :             :         :        :               :

Free Cylinder List Pdisk 0:  ... 124, 125, 168, 170, 183, 189, 201, 217, 220, 347, 702 ...
Free Cylinder List Pdisk 1:  ... 761, 780, 895, 896, 914, 935, 941, 1012, 1234, 1375, 1520 ...

Part # - Partition Number

To CYLINDER INDEX

What cylinder would have Table ID = 100, Row Hash = 00598?

Data Distribution and Hashing

Page 13-51

Example of Locating a Row – Cylinder Index
Using the example on the facing page, the File System would determine that the data block
it needs is the six-sector block beginning at sector 0789. The Table ID and Row Hash we
are looking for (100 + 1000, n) falls between the lowest and highest entries of 100 + 00998,
1 and 100 + 01010.
The convention of 00998, 1 is as follows: 00998 is the Row Hash and 1 is the Uniqueness
Value.
Teradata’s File System software does a binary search of the SRDs based on Table ID and a
binary search of the DBDs based on Partition # and Row Hash (or Row ID) to identify the
data block(s) that contain the row(s).
A user request for a row based on a Primary Index value will include the Table ID / Part # /
Row Hash.
A user request for a row based on a Secondary Index (SI) will have the Table ID / Part # /
Row Hash for the SI value. The SI subtable row contains the Row ID(s) of the base table
row(s). Teradata software uses the Table ID / Part # / Row ID(s) to locate the base table
row(s) for secondary index accesses. If a table is partitioned, the SI subtable row will have
the Part # and the Row ID.
The example on the facing page illustrates a cylinder that only has one SRD. All of the data
blocks in this cylinder are associated with the same subtable.
Summary






• There is an entry (DBD) for each data block on this cylinder.
• These entries are sorted ascending on Table ID, Partition #, and Lowest Row ID.
• Only rows belonging to the same table and sub-table appear in a block.
• Blocks belonging to the same sub-table can vary in size.
• Blocks without data appear on the Free Sector List that is sorted ascending on sector number.

Page 13-52

Data Distribution and Hashing

Example of Locating a Row – Cylinder Index
Table ID   Part #   Row Hash   empno
  100        0        1000      3755

SELECT *
FROM   employee
WHERE  empno = 3755;

Cylinder Index - Cylinder #169

SRDs      Table ID   First DBD Offset   DBD Count
SRD #1      100           FFFF             12

DBDs      Part #   Lowest Row ID   Part #   Highest RowHash   Start Sector   Sector Count   Row Count
  :          :          :            :            :                :               :             :
DBD #4       0      00867, 2         0          00902             1010              4             5
DBD #5       0      00938, 1         0          00996             0093              7            10
DBD #6       0      00998, 1         0          01010             0789              6             8
DBD #7       0      01010, 3         0          01177             0525              3             4
DBD #8       0      01185, 2         0          01258             0056              5             6
DBD #9       0      01290, 1         0          01333             1138              5             6
  :          :          :            :            :                :               :             :

Free Block List – Free Sector Entries
Start Sector   Sector Count
     :              :
    0270            3
    0301            5
    0349            5
    0470            4
    0481            6
    0550            5
     :              :

This example assumes that only 1 table ID has rows on this cylinder and the table is not partitioned.

Part # - Partition Number

Data Distribution and Hashing

Page 13-53

Example of Locating a Row – Data Block
A Block is the physical I/O unit for Teradata. It contains one or more data rows, all of
which belong to the same table. They must fit entirely within the block.
The maximum block size is 255 sectors or 127.5 KB.
A Data Block consists of three major sections:




The Data Block Header (DB Header)
The Row Heap
The Row Reference Array

Rows cannot be split between blocks. Each row in a DB is referenced by a separate “offset
or pointer” to the row. These offsets are kept in the Row Reference Array. The Row
Reference Array is placed near the end of the DB just before the Block Trailer.
The DB Header contains control information for both the Row Reference Array and the
Row Heap. The DB Header is 72* bytes of information which contains the Table ID (6
bytes). It shows which table and subtable the rows in the block are from.
The Row Heap is where the rows reside in the DB. The rows may be in any physical order,
are aligned on an even address boundary, and therefore have an even number of bytes
allocated for them.
The Reference Array Pointers (2 bytes each), which point to the first byte of a row (Row
Length), are maintained in reverse Row ID sequence. The Reference Array pointers are
used to do both binary and sequential searches.
The Block Trailer (2 bytes) consists of a block version number which must match the block
version number in the Data Block Header.
* Notes on amount of space used by DB Headers.





• If the DB is on a 32-bit system and has never been updated, then the DB Header is only 36 bytes long.

• If the DB is on a 64-bit system and has never been updated, then the DB Header is only 40 bytes long.

• If a data block is new or has been updated (either a 32-bit or 64-bit system), then the DB Header is 72 bytes long.

• The length of the block header for a compressed block is 128 bytes. Note that, in a compressed block, the header is not compressed and neither is the block trailer. Only the row data within the block is compressed. The extended block header has the normal block header at the start and then 56 extra bytes that contain information specific to the compressed block plus some extra filler bytes to allow for later additions without requiring data conversion.

Page 13-54

Data Distribution and Hashing

Example of Locating a Row – Data Block
[Diagram: the 6-sector data block (sectors 789–794) contains a Header (72 bytes), Rows 1 through 8 in the Row Heap, the Row Reference Array, and a Trailer (2 bytes).]

• A block is the physical I/O unit.

• The block header contains the Table ID (6 bytes).

• Only rows for the same table reside in the same data block.
  – Rows are not split across block boundaries.

• Blocks within a table vary in size. The system adjusts block sizes dynamically.
  – Blocks may be from 1 sector (512 bytes) to 255 sectors (127.5 KB).

• Data blocks are not chained together.

• Row Reference Array pointers are stored (sorted) in reverse sequence based on Row ID within the block.

Data Distribution and Hashing

Page 13-55

Accessing the Row within the Data Block
Teradata’s File System software does a binary search of the Row Reference Array to locate
the rows that have a matching Row Hash. Since the Row Reference Array is sorted in
reverse sequence based on Row ID, the system can do a binary or linear search.
The first row with a matching Row Hash has its Primary Index value compared with the
Primary Index value in the request. The PI value must be checked to eliminate Hash
Synonyms. The matching rows are then put into spool. If no matches are made, a message
is returned that no rows are found.
In the case of a Unique Primary Index (UPI), the search ends with the first row found
matching the criteria. The row is then returned.
In the case of a Non-Unique Primary Index (NUPI), the matching rows (same PI value and
Row Hash) are put into spool. With a NUPI, the matching rows in spool are returned.
The example on the right-hand page illustrates how Teradata utilizes the Primary Index data
value to eliminate synonyms. This is the conclusion of the example that we have been
following throughout this module.
In earlier steps the Master Index was used to find that the desired row was in Cylinder 169.
Then the Cylinder Index was used to find that the desired row was in the 6-sector block
beginning in Sector Number 789. The diagram shows that block.
The objective is to find that row with Row Hash=1000 and Index Value=3755. When the
block is searched, the first row with Row Hash 1000 does not meet these criteria. Its Index
Value is 1006, which means that it is a Hash Synonym. The system must continue its search
to the next row, the only row that meets both criteria.
The diagram on the facing page shows the logical order of rows in the block with a binary
search.
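A rough Python sketch of this qualification step follows; the block structure and function name are hypothetical, and the sample values echo the facing-page example:

# Rows in a block are ordered by Row ID, so the Row Reference Array can be
# binary-searched for the first matching Row Hash; the PI value then disqualifies
# hash synonyms.
from bisect import bisect_left

def find_rows(block_rows, row_hash, pi_value):
    """block_rows: list of (row_hash, uniqueness, pi_value, data), sorted by (row_hash, uniqueness)."""
    hits = []
    i = bisect_left([(h, u) for h, u, _, _ in block_rows], (row_hash, 0))
    while i < len(block_rows) and block_rows[i][0] == row_hash:
        if block_rows[i][2] == pi_value:          # eliminate hash synonyms
            hits.append(block_rows[i])
        i += 1
    return hits

block = [(998, 1, 4219, '...'), (1000, 1, 1006, '...'), (1000, 2, 3755, '...'), (1002, 1, 6838, '...')]
print(find_rows(block, 1000, 3755))               # only the (1000, 2, 3755, ...) row qualifies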

Page 13-56

Data Distribution and Hashing

Accessing the Row within the Data Block
• Within the data block, the Row Reference Array is used to locate the first row with a
matching Row Hash value within the block.

• The Primary Index data value is used as a row qualifier to eliminate synonyms.
Value    Hash
3755     1000

SELECT  *
FROM    employee
WHERE   employee_number = 3755;

Data Block (Sectors 789 – 794)

 Hash    Uniq   Index Value    Data Columns
  998      1       4219         Row data
  999      1       2968         Row data
  999      2       6324         Row data
 1000      1       1006         Row data
 1000      2       3755         Row data
 1002      1       6838         Row data
 1008      1       8825         Row data
 1010      1       0250         Row data

Data Distribution and Hashing

Page 13-57

AMP Read I/O Summary
You have seen that a Primary Index Read requires that the Master Index, Cylinder Index and
Data Block all must be accessed. The number of I/Os involved in this process can vary.
The Master Index is always resident in memory. The Cylinder Index may or may not be
resident in memory and the Data Block may or may not be resident in memory.
Factors that affect the number of I/Os involved include AMP memory, cache size and
locality of reference. Often the Cylinder Index is memory resident so that a Unique Primary
Index retrieval requires only a single I/O.
Note that no matter how many rows are in the table and no matter how many inserts are
made, Primary Index access never gets any more complicated than Master Index to Cylinder
Index to Data Block.

Page 13-58

Data Distribution and Hashing

AMP Read I/O Summary
The Master Index is always memory resident.
The AMP reads the Cylinder Index if not memory resident.
The AMP reads the Data Block if not memory resident.

• The amount of FSG cache also has an impact if either of these steps requires physical I/O.

• The data block may or may not be memory resident depending on recent accesses of this data block.

• The Cylinder Index is usually memory resident and a Unique Primary Index retrieval requires only one I/O.

[Diagram: the Message Passing Layer delivers the Table ID, Row Hash, and PI Value to the AMP. In AMP memory, the Master Index points to the Cylinder Index (accessed in FSG Cache), which points to the Data Block (accessed in FSG Cache) holding the row on the Vdisk.]

Data Distribution and Hashing

Page 13-59

Module 13: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 13-60

Data Distribution and Hashing

Module 13: Review Questions
1. The Row Hash for a PI value of 824 is the same for the data types of INTEGER and DECIMAL(18,0).
True or False. _______
2. The first 16 or 20 bits of the Row Hash is referred to as the _________ _________ _________ .
3. The Hash Map consists of entries or buckets which identify an _____ number for the Row Hash.
4. The Current Configuration ___________ Hash Map is used to locate the AMP to locate/store a row
based on PI value.
5. The ____________ utility is used to redistribute rows to a new system configuration with more
AMPs.
6. When creating a new table, the Unique Value of a Table ID comes from the dictionary table named
DBC.________ .
7. The Row ID consists of the _______ ________ and the __________ _____ .
8. The _______ _______ contains a Cylinder Index Descriptor (CID) for each allocated Cylinder.
9. The _______ _______ contains an entry for each data block in the cylinder.
10. The ____ __________ ________ consists of a set of 2 byte pointers to the data rows in data block.
11. The maximum block size is approximately _______ and the maximum row size is approximately
_______ .
12. The Primary Index data value is used as a row qualifier to eliminate hash _____________ .

Data Distribution and Hashing

Page 13-61

Notes

Page 13-62

Data Distribution and Hashing

Module 14
File System Writes

After completing this module, you will be able to:
 Describe File System Write Access.
 Describe what happens when Teradata inserts a new row
into a table.
 Describe the impact of row inserts on block sizes.
 Describe how fragmentation affects performance.

Teradata Proprietary and Confidential

File System Writes

Page 14-1

Notes

Page 14-2

File System Writes

Table of Contents
AMP Write I/O........................................................................................................................... 14-4
New Row INSERT – Part 1 ....................................................................................................... 14-6
New Row INSERT – Part 2 ....................................................................................................... 14-8
New Row INSERT – Part 2 (cont.).......................................................................................... 14-10
New Row INSERT – Part 3 ..................................................................................................... 14-12
New Row INSERT – Part 4 ..................................................................................................... 14-14
Alternate Cylinder Index ...................................................................................................... 14-14
Blocking in Teradata ................................................................................................................ 14-16
Block Size and Filling Cylinders ............................................................................................. 14-18
Variable Block Sizes ................................................................................................................ 14-20
Block Splits (INSERT and UPDATE) ..................................................................................... 14-22
Space Fragmentation ................................................................................................................ 14-24
Cylinder Full ............................................................................................................................ 14-26
Mini-Cylpack ........................................................................................................................... 14-28
Space Utilization ...................................................................................................................... 14-30
Teradata 13.10 Auto Cylinder Pack Feature ........................................................................ 14-30
Merge Datablocks (13.10 Feature) ........................................................................................... 14-32
Merge Datablocks (Teradata 13.10) cont. ............................................................................ 14-34
How to use this Feature .................................................................................................... 14-34
File System Write Summary .................................................................................................... 14-36
Module 14: Review Questions ................................................................................................. 14-38
Module 14: Review Questions (cont.) ................................................................................. 14-40

File System Writes

Page 14-3

AMP Write I/O
The facing page illustrates how Teradata performs write operations and it outlines steps
required to perform an AMP Write operation.
WAL (Write Ahead Logging) is a recoverability/reliability feature that also provides
performance improvements in the area of database writes. WAL is a Teradata V2R6.2 (and
later) feature. WAL can batch up modifications from multiple transactions and apply them
with a single disk I/O, thereby saving I/O operations. WAL will help improve throughput for
I/O-bound workloads.
WAL is a log-based file system recovery scheme in which modifications to permanent data
are written to a log file, the WAL log. The log file contains change records (Redo records)
which represent the updates. At key moments, such as transaction commit, the WAL log is
forced to disk. In the case of a reset or crash, Redo records can be used to transform the old
copy of a permanent data block on disk into the version that existed at the time of the reset.
By maintaining the WAL log, the permanent data blocks that were modified no longer have
to be written to disk as each block is modified. Only the Redo records in the WAL log must
be written to disk. This allows a write cache of permanent data blocks to be maintained.
WAL protects all permanent tables and all system tables but is not used to protect either the
Transient Journal (TJ), since TJ records are stored in the WAL log, or any type of spool
tables, including global temporary tables.
The WAL log is maintained as a separate logical file system from the normal table area.
Whole cylinders are allocated to the WAL log, and it has its own index structure. The WAL
log data is a sequence of WAL log records and includes the following:



• Redo records, used for updating disk blocks and ensuring file system consistency during restarts.

• TJ records, used for transaction rollback.

There is some additional CPU cost for maintaining the WAL log so WAL may reduce
throughput for CPU-bound workloads. However, the overall performance is expected to be
better with WAL since the benefit of I/O improvement outweighs the much smaller
CPU cost.
If CHECKSUM = NONE and the New Block length = Old Block length, Teradata will
attempt to update-in-place for any INSERT, DELETE, or UPDATE operations.
If the CHECKSUM feature is enabled for a table, any INSERT, UPDATE, or DELETE
operation will cause a new data block to be allocated.
The FastLoad and MultiLoad utilities always allocate new data blocks for write operations.
TPump follows the same rules as an SQL INSERT, UPDATE, or DELETE.
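The write-path decision described above can be summarized in a small Python sketch; write_strategy is a hypothetical function, not actual file system code:

def write_strategy(checksum_enabled, new_block_len, old_block_len, utility=None):
    if utility in ('FastLoad', 'MultiLoad'):
        return 'allocate new data block'          # load utilities always allocate new blocks
    if checksum_enabled:
        return 'allocate new data block'          # CHECKSUM forces a new block allocation
    if new_block_len == old_block_len:
        return 'attempt update-in-place'
    return 'allocate new data block'

print(write_strategy(checksum_enabled=False, new_block_len=3, old_block_len=3))   # update-in-place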

Page 14-4

File System Writes

AMP Write I/O
For SQL writes, Teradata uses WAL logic to manage disk write operations.

• Read the Data Block if not in memory (Master Index > Cylinder Index > Data Block).
• Place appropriate entries (e.g., before-images) into the Transient Journal buffer
(actually a WAL buffer) and write it to the WAL log on disk.

• Data blocks are updated in Memory, but not written immediately to disk.
• The after-image or changed image (REDO row) is written to a WAL buffer which is
written to the WAL log on disk.

– WAL can batch up modifications from multiple transactions and apply them with a single disk
I/O, thereby saving I/O operations.

– Updated data blocks in memory will be eventually aged out and written to disk.

• Make the changes to the Data Block in memory and determine the new block’s length.
– If the New Block has changed size, always allocate a new Data Block.
– If the New Block length = Old Block length, Teradata will attempt to update-in-place for any INSERT, DELETE, or UPDATE operations.

These operations happen concurrently on the Fallback AMP.

File System Writes

Page 14-5

New Row INSERT – Part 1
The facing page illustrates what happens when Teradata INSERTs a new row into a table.
The three-part message is Table ID = 100, Partition # = 0, Row Hash = 1123, and PI Value = 7923.

• The AMP uses its Master Index to locate the proper cylinder for the new row. As you can see, Cylinder #169 is where a row with Table ID = 100, Partition # = 0, and Row Hash = 1123 should be inserted.

• The next step is to access the Cylinder Index for Cylinder #169, as illustrated on the facing page.

Teradata’s File System software does a binary search of the CIDs based on Table ID /
Partition # / Row Hash to locate the cylinder number in which to insert the row. The CI for
that cylinder is accessed to locate the data block.
Note: The Partition # (shown in the examples) does not exist in Teradata systems prior to
V2R5.

Page 14-6

File System Writes

New Row Insert – Part 1
INSERT INTO employee VALUES (7923, . . . . );

INSERT ROW:   Table ID   Part #   Row Hash   data column values
                100         0       1123      7923, ...

Master Index
            Lowest                          Highest                    Pdisk and
Table ID  Part #   Row ID        Table ID  Part #   Row Hash     Cylinder Number
   :         :        :             :         :        :               :
  078        0    58234, 2         095        0      72194            204
  098        0    00107, 1         100        0      00676            037
  100        0    00773, 3         100        0      01361            169
  100        0    01361, 2         100        0      02884            777
  100        0    02937, 1         100        0      03602            802
  100        0    03662, 1         100        0      03999            117
  100        0    04123, 2         100        0      05888            888
  100        0    05974, 1         100        0      07328            753
  100        0    07353, 1         120        0      00469            477
  123        1    00343, 2         123        2      01864            529
  123        2    06923, 1         123        3      00231            943
   :         :        :             :         :        :               :

Free Cylinder List Pdisk 0:  ... 124, 125, 168, 170, 183, 189, 201, 217, 220, 347, 702 ...
Free Cylinder List Pdisk 1:  ... 761, 780, 895, 896, 914, 935, 941, 1012, 1234, 1375, 1520 ...

Part # - Partition Number

To CYLINDER INDEX

File System Writes

Page 14-7

New Row INSERT – Part 2
The example on the facing page is a continuation from the previous page. Teradata has
determined that the new row must be INSERTed into Cylinder #169 in this example.

Page 14-8

File System Writes

New Row Insert – Part 2
INSERT INTO employee VALUES (7923, . . . . );

INSERT ROW:   Table ID   Part #   Row Hash   data column values
                100         0       1123      7923, ...

Cylinder Index - Cylinder #169

SRDs      Table ID   First DBD Offset   DBD Count
SRD #1      100           FFFF             12

DBDs      Part #   Lowest Row ID   Part #   Highest RowHash   Start Sector   Sector Count   Row Count
  :          :          :            :            :                :               :             :
DBD #4       0      00867, 2         0          00902             1010              4             5
DBD #5       0      00938, 1         0          00996             0093              7            10
DBD #6       0      00998, 1         0          01010             0789              6             8
DBD #7       0      01010, 3         0          01177             0525              3             4
DBD #8       0      01185, 2         0          01258             0056              5             6
DBD #9       0      01290, 1         0          01333             1138              5             6
  :          :          :            :            :                :               :             :

Free Block List – Free Sector Entries
Start Sector   Sector Count
     :              :
    0270            3
    0301            5
    0349            5
    0470            4
    0481            6
    0550            5
     :              :

Read the block into memory (FSG cache).

To Data Block

File System Writes

Page 14-9

New Row INSERT – Part 2 (cont.)
The example on the facing page is a continuation from the previous page. Teradata has
determined that the new row hash value falls with the range of the data block that starts at
sector 525 and is 3 sectors long.
If the block that has been read into memory (FSG Cache) has enough contiguous free bytes,
then the row is inserted into this space within the block. The row reference array and the
Cylinder Index are updated.
If the block that has been read into memory (FSG Cache) does not have enough contiguous
free bytes, but it does have enough free bytes within the entire block, the software will
defragment the block and insert the row. The row reference array and the Cylinder Index
are updated.
Note: The block header contains a field that indicates the total number of free bytes
within the block.
Also note that the Row Reference Array expands by 2 bytes to reflect the added row. If the
block now has 5 rows, the Row Reference Array will increase from 8 bytes to 10 bytes in
length.
Acronyms:
FS – Free Space
RRA – Row Reference Array
BT – Block Trailer
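The three cases on the facing page can be condensed into a hedged Python sketch; the block model (a contiguous-free-byte count and a total-free-byte count) is a simplification for illustration only:

def insert_row(row_len, contiguous_free, total_free):
    if row_len <= contiguous_free:
        return 'insert into contiguous free space; update CI'
    if row_len <= total_free:
        return 'defragment block, then insert; update CI'
    return 'expand block (allocate enough new sectors to hold the row)'

print(insert_row(row_len=400, contiguous_free=150, total_free=600))   # defragment first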

Page 14-10

File System Writes

New Row Insert – Part 2 (cont.)
Read the block into memory (FSG cache).

1. If the block has enough free contiguous bytes, then insert the row into the block and update the CI.
   [Diagram: the new row is inserted into the contiguous free space of the 3-sector block (sectors 525–527).]

2. If the block has enough free bytes, then defragment the block, insert the row into the block, and update the CI.
   [Diagram: the block (sectors 525–527) is defragmented so that its free space becomes contiguous, and the new row is inserted into that space.]

FS - Free Space; RRA - Row Reference Array; BT - Block Trailer

File System Writes

Page 14-11

New Row INSERT – Part 3
The File System then accesses the 3-sector block which starts at sector 525 and makes it
available in AMP memory.
The row is placed into the block, and the new block length is computed. In this example,
inserting the row has caused the block to expand from 3 sectors to 4 sectors.
Note that the Row Reference Array expands by 2 bytes to reflect the added row. If the block
now has 5 rows, the Row Reference Array will increase from 8 bytes to 10 bytes in length.
Acronyms:
FS – Free Space
RRA – Row Reference Array
BT – Block Trailer

Page 14-12

File System Writes

New Row Insert – Part 3
3. If the new row is larger than the total free space within the block, then the Insert expands the block by as many sectors as needed (in memory) to hold the row.
   In this example, the block is expanded by one sector in memory.
   [Diagram: the 3-sector block (sectors 525–527) is expanded in memory to 4 sectors to hold the new row.]

4. The next step is to locate the first block on the Free Block List equal to, or greater than, 4 sectors.

File System Writes

Page 14-13

New Row INSERT – Part 4
The File System searches the Free Sector (or Block) List looking for the first Free Block
whose size is equal to or greater than the new block’s requirement. It does not have to be an
exact match.


• Upon finding a 5-sector free block starting at sector 0301, the system allocates a new 4-sector block (sectors 301, 302, 303, 304) for the new data block, leaving a free block of one sector (305) remaining.

• The new data block is written to disk.

• The old, 3-sector data block is placed onto the Free Sector List (or Free Block List).

• The modified CI will be copied to the buddy node (FSG Cache) and the modified CI will be written back to disk (eventually).

If a transaction failure occurs (or the transaction is aborted), the Transient Journal is used to
undo the changes to both the data blocks and the Cylinder Indexes. Before images of data
rows are written to the Transient Journal. Before images of Cylinder Indexes are not written
to the Transient Journal because Teradata uses the Alternate Cylinder Index for the changes.
If a transaction fails, the before image in the Transient Journal is used to return the data
row(s) back to the state before the transaction.

Alternate Cylinder Index
Starting with V2R6.2 and with WAL, space for 2 Cylinder Indexes (2 x 12 KB = 24 KB) is
allocated at the beginning of every cylinder. Characteristics include:




• Two Cylinder Indexes are used – Teradata alternates between the two Cylinder Indexes.

• Changes are written to an "Alternate Cylinder Index".

• When a CI is changed, it is not updated in place. This provides for better I/O integrity.

Page 14-14

File System Writes

New Row INSERT – Part 4
Cylinder Index - Cylinder #169

SRDs      Table ID   First DBD Offset   DBD Count
SRD #1      100           FFFF             12

DBDs      Part #   Lowest Row ID   Part #   Highest RowHash   Start Sector   Sector Count   Row Count
  :          :          :            :            :                :               :             :
DBD #5       0      00938, 1         0          00996             0093              7            10
DBD #6       0      00998, 1         0          01010             0789              6             8
DBD #7       0      01010, 3         0          01177             0525              3             4
DBD #8       0      01185, 2         0          01258             0056              5             6
  :          :          :            :            :                :               :             :

Free Block List – Free Sector Entries
Start Sector   Sector Count
     :              :
    0270            3
    0301            5
    0349            5
    0470            4
    0481            6
    0550            5
     :              :

Alternate Cylinder Index - Cylinder #169

SRDs      Table ID   First DBD Offset   DBD Count
SRD #1      100           FFFF             12

DBDs      Part #   Lowest Row ID   Part #   Highest RowHash   Start Sector   Sector Count   Row Count
  :          :          :            :            :                :               :             :
DBD #5       0      00938, 1         0          00996             0093              7            10
DBD #6       0      00998, 1         0          01010             0789              6             8
DBD #7       0      01010, 3         0          01177             0301              4             5
DBD #8       0      01185, 2         0          01258             0056              5             6
  :          :          :            :            :                :               :             :

Free Block List – Free Sector Entries
Start Sector   Sector Count
     :              :
    0270            3
    0305            1
    0349            5
    0470            4
    0481            6
    0525            3
    0550            5
     :              :

(In the Alternate Cylinder Index, DBD #7 now points to the new 4-sector block starting at sector 0301 and holding 5 rows; the remaining sector of the old 5-sector free block appears as the 0305/1 entry, and the old 3-sector data block appears as free sectors 0525/3.)

File System Writes
Page 14-15

Blocking in Teradata
Tables supporting Data Warehouse and Decision Support users generally have their block
size set very large to accommodate more rows per block and reduce the number of block
I/Os needed to do full table scans. Tables involved in online applications and heavy data
maintenance generally have smaller block sizes.
Extremely large rows, called Oversized Rows, are very costly. Each Oversized row requires
its own block and costs one I/O every time it is touched. Oversized rows are common in
non-relational data models and appear in poor relational data models.

Page 14-16

File System Writes

Blocking in Teradata
Definitions

Largest Data Block Size

• The largest multi-row data block allowed. Impacts when a block split occurs.
• Determined by:
  – Table level attribute DATABLOCKSIZE
  – System default – PermDBSize parameter (DBS Control); default is 254 sectors (127 KB)

Large (or typical) Row

• The largest fixed length row that allows multiple rows/block.
• Defined as ((Largest Block – 74) / 2);
  – Block header is 72 bytes and trailer is 2 bytes.

Oversized Row

• A row that requires its own Data Block (one I/O per row):
• A fixed length row that is larger than Large Row.

Example:

• Assume DATABLOCKSIZE = 65,024 (127 sectors x 512 bytes)
  – Largest Block = 65,024 bytes
  – Large Row ≤ 32,475 bytes ((65,024 – 74) / 2)
  – Oversize row ≥ 32,476 bytes
File System Writes

Page 14-17

Block Size and Filling Cylinders
Teradata supports a maximum block size of 255 sectors. With newer, larger, and faster
systems, it typically makes sense to use a large block size for transactions that do full table
or partition scans. A large block may help to minimize the number of I/Os needed to access
a large amount of data.
Therefore, it may seem that using the largest possible block size of 255 sectors would be a
good choice. However, a maximum block size of 254 sectors is actually a better choice in
most situations. Why?
With 254 sector blocks, a cylinder can hold 15 blocks.
With 255 sector blocks, a cylinder can only hold 14 blocks.
Why?
A cylinder consists of 3872 sectors and 48 sectors are used for the cylinder indexes.
The available space for user data blocks is 3872 – 48 = 3824 sectors.
3824 ÷ 254 = 15.055 or 15 blocks
3824 ÷ 255 = 14.996 or 14 blocks
15 x 254 = 3810 sectors of a cylinder are utilized or 99.6%
14 x 255 = 3570 sectors of a cylinder are utilized or only 93.4%
Assume an empty staging table and using FastLoad to load data into the table. With 255
sector blocks, the table will use 6% more cylinders to hold the data.
By using a system default (PermDBSize) or data block size (DATABLOCKSIZE) of 254
sectors will effectively utilize the space in cylinders more efficiently than 255 sector blocks.
The same is true if you are considering 127 or 128 sector blocks.
127 sector blocks – cylinder can hold 30 blocks – utilize 99.6% of cylinder
128 sector blocks – cylinder can hold 29 blocks – utilize 97.1% of cylinder
Therefore, 127 or 254 sector blocks are typically better choices. A greater percentage of
cylinder space can be utilized with these choices.
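The same arithmetic, generalized to any block size, can be expressed as a short Python sketch (pre-13.10 cylinder geometry of 3872 sectors with 48 sectors reserved for the two cylinder indexes is assumed; the function name is hypothetical):

def cylinder_utilization(block_sectors, cylinder_sectors=3872, ci_sectors=48):
    usable = cylinder_sectors - ci_sectors          # 3824 sectors available for data blocks
    blocks = usable // block_sectors
    return blocks, blocks * block_sectors / usable

for size in (254, 255, 127, 128):
    blocks, util = cylinder_utilization(size)
    print(size, blocks, f"{util:.1%}")
# 254 -> 15 blocks, 99.6%    255 -> 14 blocks, 93.4%
# 127 -> 30 blocks, 99.6%    128 -> 29 blocks, 97.1%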

Page 14-18

File System Writes

Block Size & Filling Cylinders
What is the difference between choosing maximum data block size of 254 or 255 sectors?

• With 254 sector blocks, a cylinder can hold 15 blocks.
• With 255 sector blocks, a cylinder can only hold 14 blocks.
Why?

• A cylinder consists of 3872 sectors and 48 sectors are used for the cylinder indexes.
– 3824 ÷ 254 = 15.055 or 15 blocks
– 3824 ÷ 255 = 14.996 or 14 blocks
– 15 x 254 = 3810 sectors of a cylinder are utilized or 99.6%
– 14 x 255 = 3570 sectors of a cylinder are utilized or only 93.4%

• Assume an empty staging table and using FastLoad to load data into the table. With
255 sector blocks, the table will use 6% more cylinders.
What about 127 and 128 sector blocks?

• With 127 sector blocks, a cylinder can hold 30 blocks – utilize 99.6% of cylinder
• With 128 sector blocks, a cylinder can hold 29 blocks – utilize 97.1% of cylinder
Therefore, 127 or 254 sector blocks are typically better choices for PermDBSize and/or
data block sizes. A greater percentage of cylinder space can be utilized with these
choices.

File System Writes

Page 14-19

Variable Block Sizes
The Teradata RDBMS supports true variable block sizes. The illustration on the facing page
shows how blocks can expand to accommodate additional rows as they are INSERTed. As
rows are INSERTed, the Reference Array Pointers are placed into Row ID sequence.

REMEMBER
Large rows require more disk space for Transient Journal, Permanent Journal, and Spool files.

Page 14-20

File System Writes

Variable Block Sizes
• When inserting rows (ad hoc SQL or TPump), the block expands as needed to
accommodate them.

• The system maintains rows within the block in logical ROW ID sequence.
• Large rows take more disk space for Transient Journal, Permanent Journal, and Spool
files.

• Blocks are expanded until they reach “Largest Block Size”. At this point, a Block Split
is attempted.
[Diagram: as rows are inserted, a block grows from a 1-sector block holding a single row through 2-sector blocks to a 3-sector block holding many rows.]

Note: Rows do NOT have to be contiguous in a data block.

File System Writes

Page 14-21

Block Splits (INSERT and UPDATE)
Block splits occur during INSERT and UPDATE operations. Normally, when a data block
expands beyond the maximum multi-row block size (Largest Block), it splits into two
approximately equal-sized blocks. This is shown in the upper illustration on the facing
page.


• If an Oversize Row is INSERTed into a data block, it causes a three-way block split (as shown in the lower illustration). This type of block split may result in uneven block sizes.

• With Teradata, block splits cost only one additional I/O per extra block created. There is little impact on OLTP and OLCP performance.

• Block splits automatically reclaim any contiguous, unused space greater than 511 bytes.
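A minimal Python sketch of the two split cases follows (a hypothetical block model in which a block is simply a list of row lengths in Row ID order, split once it would exceed the Largest Block size):

def split_block(rows, large_row_bytes):
    """rows: row lengths in Row ID order; called when the block would exceed Largest Block."""
    for i, r in enumerate(rows):
        if r > large_row_bytes:                          # oversized row gets its own block
            return [rows[:i], [rows[i]], rows[i + 1:]]   # three-way split
    half = len(rows) // 2
    return [rows[:half], rows[half:]]                    # two-way split, roughly equal halves

print(split_block([5000, 6000, 7000, 8000], large_row_bytes=32475))   # two-way split
print(split_block([5000, 40000, 7000], large_row_bytes=32475))        # three-way split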

Page 14-22

File System Writes

Block Splits (INSERT and UPDATE)
Two-Way Block Splits

• When a Data Block expands beyond Largest Block, it splits into two, fairly-equal blocks.
• This is the normal case.
[Diagram: the new row causes the block to split into two roughly equal blocks.]

Three-Way Block Splits

• An oversize row gets its own Data Block. The existing Data Block splits at the row's logical point of existence.
• This may result in uneven block sizes.
[Diagram: the oversized row gets its own block and the original block splits into two blocks around it.]

Notes:
• Block splits automatically reclaim any unused space over 511 bytes.
• While it is not typical to increment blocks by one 512-byte sector, it is tunable as to how many sectors are acquired at a time for the system.

File System Writes

Page 14-23

Space Fragmentation
Space fragmentation is not an issue in the Teradata database because the system collects
free blocks as a normal part of routine table maintenance. If a block of sectors is freed up
and is adjacent to already free sectors in the cylinder, these are combined into one entry on
the free block list.
As previously described, when an actual data block has to grow, it does not grow into
adjacent free blocks – a new block is assigned from the free block list. The freed up data
block (set of sectors) is placed on the free block (or segment) list. If there is already an
entry on the free block list representing adjacent free blocks, then the freed up data block is
combined with adjacent free sectors and only one entry is placed on the free block list.
Using the example on the facing page, assume we are looking at a 40-sector portion of a
cylinder. These sectors are physically adjacent to each other. The free block list would
have 2 entries on it – one representing the 4 unused sectors and a second entry representing
the 6 unused sectors.
We will now consider 4 situations.
First case – If the first 10-sector data block is freed up, software will not place an entry on
the free block list for just these 10 sectors. Software will effectively combine these 10
sectors with the following adjacent free 4 sectors and place one entry representing the 14
free sectors on the free block list. For this 40-sector portion of a cylinder, there will be 2
entries on the free block list – one for the first 14 unused sectors and a second entry for the 6
unused sectors that are still there.
Second case – If the middle 12-sector data block is freed up, software will not place an entry
on the free block list for just these 12 sectors, but will effectively combine these 12 sectors
with the previous adjacent 4 free sectors and with the following 6 free adjacent sectors,
effectively represented by one entry for 22 free sectors. For this 40-sector portion of a
cylinder, there will be one entry on the free block list showing that 22 sectors that are free.
Third case – If the last 8-sector data block is freed up, software will not place an entry on the
free block list for just these 8 sectors, but will effectively combine these 8 sectors with the
previous adjacent 6 free sectors. One entry representing the 14 free sectors is placed on the
free block list. For this 40-sector portion of a cylinder, there will be 2 entries on the free
block list – one for the first 4 unused sectors and a second entry for the 14 unused sectors.
Fourth case – If there is no entry on the free block list large enough to meet a request for a
new block, Teradata’s file system software may choose to dynamically defragment the
cylinder. In this case, all free sectors are combined together at the end of a new cylinder and
one entry for the free space (sectors) is placed on the free block list. Defragmentation is
actually done in the new cylinder and the existing cylinder is placed in the free cylinder list.
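The coalescing behavior can be sketched in Python; free_block is a hypothetical helper operating on a simplified list of (start sector, sector count) free ranges:

def free_block(free_list, start, length):
    """free_list: list of (start_sector, sector_count), sorted and non-adjacent."""
    merged = [(start, length)]
    for s, c in free_list:
        ms, mc = merged[0]
        if s + c == ms:                  # existing free range ends where the freed block starts
            merged[0] = (s, c + mc)
        elif ms + mc == s:               # existing free range starts where the freed block ends
            merged[0] = (ms, mc + c)
        else:
            merged.append((s, c))
    return sorted(merged)

# 40-sector example from the text: 10-sector block, 4 free, 12-sector block, 6 free, 8-sector block.
free_list = [(10, 4), (26, 6)]
print(free_block(free_list, 0, 10))      # freeing the 1st block -> [(0, 14), (26, 6)]
print(free_block(free_list, 14, 12))     # freeing the 2nd block -> [(10, 22)]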

Page 14-24

File System Writes

Space Fragmentation
• The system collects free blocks as a normal part of table maintenance.
• Smaller Free Blocks become larger when adjacent blocks become free, or
when defragmentation is performed on the cylinder.
[Diagram: a 40-sector portion of a cylinder containing a 10-sector block, 4 unused sectors, a 12-sector block, 6 unused sectors, and an 8-sector block (30 sectors of data, 10 sectors unused). If the 1st used block is freed, the cylinder shows 14 unused sectors, the 12-sector block, 6 unused sectors, and the 8-sector block. If the 2nd used block is freed, the 10-sector block is followed by 22 unused sectors and the 8-sector block. If the 3rd used block is freed, the 10-sector block and 4 unused sectors are followed by the 12-sector block and 14 unused sectors. After defragmentation, the 10-, 12-, and 8-sector blocks are packed together, followed by 10 unused sectors.]

File System Writes

Page 14-25

Cylinder Full
A Cylinder Full condition occurs when there is no block on the Free Block List that has
enough sectors to accommodate additional data during an INSERT or UPDATE. If this
condition occurs, the File System goes through the steps outlined on the facing page which
results in a Cylinder Migrate to an existing adjacent cylinder or to a new cylinder. As part
of this process, the file system software may also choose to perform a Cylinder
Defragmentation or a Mini Cylinder Pack (Mini-Cylpack) operation.


A Mini-Cylpack is a background process that occurs automatically when the
number of free (or available) cylinders falls below a threshold. The mini-Cylpack
process is the mechanism that Teradata uses to rearrange data blocks to free
cylinders. This process involves moving data blocks from a data cylinder to the
logically preceding data cylinder until a whole cylinder becomes empty.



Mini-Cylpack is an indication that the system does not have enough free space to
handle its current workload.

In the example at the bottom of the facing page, if Cylinder 37 became full, the File System
would check Cylinder 204 and Cylinder 169 to see if they had enough room to perform a
Cylinder Migrate. These two cylinders are logically adjacent to Cylinder 37 in the Master
Index, but not necessarily physically adjacent on the disk.
During the Cylinder Migrate, if data blocks were moved to Cylinder 204, they would be
taken from the top of Cylinder 37. If they were moved to Cylinder 169, they would be taken
from the bottom of Cylinder 37.
Note:
Performance tests show that defragging can cause a significant performance hit.
Therefore, the default tuning parameters that control how often you do this are set to
only defragment cylinders if there are very few free cylinders left (<= 100) and the
cylinder has quite a bit of free space that isn’t usable (>= 25%). The latter indicates
that, although there is significant free space on the cylinder, the free space is apparently
so fragmented that a request for new sectors couldn’t be satisfied. Otherwise, it’s
assumed that the cylinder is full and the overhead of defragging it wouldn’t be worth it.
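The decision flow can be summarized in a hedged Python sketch; the function is hypothetical and the thresholds are simply the defaults quoted in this discussion:

def cylinder_full(adjacent_has_room, free_cylinders, cylinder_free_pct,
                  defrag_free_cyl_threshold=100, defrag_pct=25, cylpack_threshold=10):
    actions = []
    if adjacent_has_room:
        actions.append('migrate up to 10 data blocks to an adjacent cylinder')
    else:
        actions.append('allocate a free cylinder and migrate up to 10 data blocks to it')
    # Background work the file system may also start:
    if free_cylinders <= defrag_free_cyl_threshold and cylinder_free_pct >= defrag_pct:
        actions.append('defragment the cylinder')
    if free_cylinders < cylpack_threshold:
        actions.append('run a Mini-Cylpack to free a cylinder')
    return actions

print(cylinder_full(adjacent_has_room=False, free_cylinders=8, cylinder_free_pct=30))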

Page 14-26

File System Writes

Cylinder Full
Cylinder Full means there is no block big enough on the Free Block List. The File System does either of the following:

• Cylinder Migrate to an adjacent cylinder – checks logically adjacent cylinders for fullness. If it finds room, it moves a maximum of 10 data blocks from the full cylinder to an adjacent one.

• Cylinder Migrate to a new Cylinder – looks for a free cylinder, allocates one, and moves a maximum of 10 data blocks from the congested cylinder to a new one.

While performing a Cylinder Migrate operation, the File System software may also do the following operations in the background.

• Cylinder Defragmentation – if the total cylinder free space ≥ 25% of the cylinder size (25% is the default), then the cylinder is defragmented. Defragmentation collects all free sectors at the end of a new cylinder by moving all the data blocks to the top of the new cylinder.

• Mini-Cylpack – if the number of free cylinders falls below a threshold (default is 10), then a "Mini-Cylpack" is performed to pack data together to free up a cylinder and place it on the free cylinder list.
Master Index
            Lowest                          Highest                    Pdisk and
Table ID  Part #   Row ID        Table ID  Part #   Row Hash     Cylinder Number
   :         :        :             :         :        :               :
  078        0    58234, 2         095        0      72194            204
  098        0    00107, 1         100        0      00676            037
  100        0    00773, 3         100        0      01361            169
   :         :        :             :         :        :               :

Free Cylinder List Pdisk 0:  ... 124, 125, 168 ...
Free Cylinder List Pdisk 1:  ... 761, 780, 895 ...

File System Writes
Page 14-27

Mini-Cylpack
The Mini-Cylpack is the mechanism that Teradata uses to rearrange data blocks to free
cylinders. The process involves moving data blocks from a data cylinder to the logically
preceding data cylinder until a whole cylinder becomes empty.


• A Mini-Cylpack is an indication that the system does not have enough free space to handle its current workload.

• Excessive numbers of Mini-Cylpacks indicate too little disk space is available and/or too much spool is being utilized during data maintenance.

• Spool cylinders are never "Cylpacked".

Teradata has a Free Space (a percentage) parameter that can be set to control how much
free space is left in a cylinder during loading and the use of the Ferret PackDisk utility. This
parameter is not used with mini-cylpacks.


• This parameter should be set low (close to 0%) for systems which are used solely for Decision Support as there is no data maintenance involved.

• In cases where there is moderate data maintenance (batch or some OLTP), the Free Space parameter should be set at approximately 25%.

• If heavy data maintenance is to be done (OLTP), the Free Space parameter may have to be set at approximately 50% to prevent Cylpacks from affecting OLTP response times.

The Free Space parameter can be set at the system level, at a table level, and when
executing the Ferret PackDisk utility.




• DBSControl – FREESPACEPERCENT (0% is the default)
• CREATE TABLE – FREESPACE = integer [PERCENT] (0 – 75)
• FERRET PACKDISK – FREESPACEPERCENT (or FSP) integer
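
As a hedged illustration (the table name and column definitions are hypothetical, not from this
course), the table-level setting might be specified as follows to leave roughly 25% of each
cylinder free when this table is bulk loaded:

    CREATE TABLE Orders_History ,
         FREESPACE = 25 PERCENT            -- leave ~25% of each cylinder free during load operations
         ( Order_Id     INTEGER
         , Order_Date   DATE
         , Order_Total  DECIMAL(10,2) )
    PRIMARY INDEX ( Order_Id );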

The system administrator can specify a count of empty cylinders the system should attempt
to maintain. Whenever a Cylinder Migrate to a new cylinder occurs, the system checks to
see if the minimum number of empty cylinders still exists. If the system has dropped below
the minimum, it starts a background task that begins packing cylinders. The task stops when
either a cylinder is added to the Free Cylinder List or it has packed 10 cylinders. This
process continues with every Cylinder Migrate to a new cylinder until the minimum count of
empty cylinders is reached, or a full mini-cylpack is required.

Page 14-28

File System Writes

Mini-Cylpack
BEFORE

AFTER

A Mini-Cylpack moves data blocks from the data cylinder(s) to logically
preceding data cylinder(s) until a single cylinder is empty.

• Spool cylinders are never cylpacked.
• Mini-Cylpacks indicate that the system does not have enough space to handle its current
  workload.

• Excessive Cylpacks indicate too little disk space and/or excessive spool utilization during data
  maintenance.

The Free Space parameter impacts how full a cylinder is filled during data loading
and PackDisk.

• DBSControl – FREESPACEPERCENT
• CREATE TABLE – FREESPACE
• FERRET PACKDISK – FREESPACEPERCENT (FSP)

File System Writes

Page 14-29

Space Utilization
The Teradata Database can use any particular cylinder to either store data or hold Spool
files. A cylinder cannot be used for both data and Spool simultaneously. In sizing a system,
you must make certain that you have enough cylinders to accommodate both requirements.
Limiting the number of rows and columns per query helps keep Spool requirements under
control, as does keeping the number of columns per row to a minimum. Both can result
from proper normalization.

Teradata 13.10 Auto Cylinder Pack Feature
One new background task in Teradata 13.10 is called AutoCylPack which attempts to
combine adjacent, sparsely filled cylinders. These cylinder packs are typically executed
when the system is idle.
AutoCylPack is particularly useful if a customer is using temperature-based BLC, because it
cleans up post-compression cylinders that are no longer holding as much data. However,
this feature works with compressed as well as uncompressed cylinders. Sometimes the
activity of AutoCylPack can result in seeing a little bit of wait I/O (less than 5%).
File System Field 17 (DisableAutoCylPack) has a default value of FALSE, which means
AutoCylPack is on and active all the time, unless you change this setting.
General notes:
There are a number of background tasks running in the Teradata database and
AutoCylPack is another of these tasks. These tasks include deadlock detection, cylinder
defragmentation, transient journal purging, periodic cache flushes, etc.
These tasks generally consume a small amount of system resources. However, you will
tend to notice them more when the system is idle.

Page 14-30

File System Writes

Space Utilization
Space being used is managed via Master Index and Cylinder Indexes

Master
Index

Cylinder
Indexes

Free
Cylinder
Lists

Free
Block
Lists

Cylinders not being used are listed in Free Cylinder Lists
Free sectors within cylinders are listed in Free Block Lists

Cylinders contain Perm, Spool, Temporary, Permanent Journal, or WAL data,
but NOT a combination.
BE SURE THAT YOU HAVE ENOUGH SPACE OF EACH.
Limiting the rows and columns per query reduces spool use.

File System Writes

Page 14-31

Merge Datablocks (13.10 Feature)
This Teradata Database 13.10 feature automatically searches for “small” data blocks within
a table and will combine (merge) these small datablocks into a single larger block. Over
time, modifications to a table (especially with DELETEs of data rows) can result in a table
having blocks that are less than 50% of the maximum datablock size. This File System
feature combines these small blocks into a larger block.
The benefit is simply that future full table operations (full table scans and/or full table
updates) will perform faster because fewer I/Os are performed. By having larger blocks in
the Teradata file system, the selection of target rows can also be more efficient.


Blocks that are 50% or greater of the maximum multi-row datablock size (63.5 KB
in this example) are not considered to be small blocks. Small blocks are less than
50% of the maximum datablock size.

The merge of multiple small blocks into a larger block is limited by cylinder boundaries – it
does not occur between cylinders. A maximum of 7 logically adjacent preceding blocks can
be merged into a target block when the target block is updated. Therefore, a
maximum of 8 total blocks can be merged together.
Why are logically following blocks NOT merged together?

• The File System software does not know if following blocks are going to be
  immediately updated.
• This reduces the performance impact during dense sequential updates.

How does a table get to the point of having many small blocks? DELETEs from this table
can cause blocks to permanently shrink to a much smaller size unless a large amount of data
is added again.
How have customers resolved this problem before Teradata 13.10?


• The ALTER TABLE command can be used to re-block a table. This technique can
  be time consuming and requires an exclusive table lock. This technique is still
  available with Teradata 13.10.

  – ALTER TABLE <tablename> DATABLOCKSIZE = <value> IMMEDIATE

If this feature is enabled, the merge of small data blocks into a larger block runs
automatically during full table SQL write operations. This feature can merge datablocks for
the primary/fallback subtables and all of the index subtables. This feature runs automatically
when the following SQL functions are executed.

• INSERT-SELECT, UPDATE-WHERE
• DELETE-WHERE (used on both permanent table and permanent journal datablocks)
• During the DELETE phase of the Reconfig utility on source AMPs

Page 14-32
File System Writes

Merge Datablocks (Teradata 13.10)
This Teradata Database 13.10 feature automatically searches for “small” data
blocks within a table and will combine (merge) these small datablocks into a
single larger block.

• Over time, modifications to a table (especially with DELETEs of data rows) can result
in a table having blocks that are less than 50% of the maximum datablock size.

• Up to 8 datablocks can be merged together.
If enabled, the merge of small data blocks into a larger block runs automatically during full
table SQL write operations. This feature can merge datablocks for the primary/fallback
subtables and all of the index subtables.

• INSERT-SELECT
• UPDATE-WHERE
• DELETE-WHERE
How have customers resolved this problem before Teradata 13.10?

• The ALTER TABLE command can be used to re-block a table. This technique can be time
  consuming and requires an exclusive table lock. This technique is still available with Teradata
  13.10.

  ALTER TABLE <tablename> DATABLOCKSIZE = <value> IMMEDIATE;

File System Writes

Page 14-33

Merge Datablocks (Teradata 13.10) cont.
How to use this Feature
Defaults for this feature can be set at the system level via DBSControl settings and can be
overridden with table level attributes. The CREATE TABLE and ALTER TABLE
commands have options to enable or disable this feature for a specific table.
The key parameter that controls this feature is MergeBlockRatio. This parameter can be set
at the system level and also as a table level attribute.
MergeBlockRatio has the following characteristics:

• Limits the resulting size of a merged block.

• Reduces the chances that a merged block will split again soon after it is merged,
  defeating the feature’s purpose.

• Computed as a percentage of the maximum multi-row datablock size for the
  associated table.

• Candidate merged block must be smaller than this computed size after all target
  row updates are completed.

• Source blocks are counted up as eligible until the size limit is reached (zero to 8
  blocks can be merged together).

• The default system level percentage is 60% and can be changed.
CREATE TABLE or ALTER TABLE options

• DEFAULT MERGEBLOCKRATIO
  – Default option on all CREATE TABLE statements

• MERGEBLOCKRATIO = integer [PERCENT]
  – Fixed MergeBlockRatio used for full table modification operations
  – Overrides the system default value

• NO MERGEBLOCKRATIO
  – Disables merges completely for the table
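
As a hedged sketch (the table name and the 50% ratio are hypothetical, not values from this
course), these options might appear as follows:

    CREATE TABLE Sales_Detail ,
         MERGEBLOCKRATIO = 50 PERCENT      -- fixed ratio for this table; overrides the system default
         ( Sale_Id    INTEGER
         , Sale_Date  DATE
         , Amount     DECIMAL(10,2) )
    PRIMARY INDEX ( Sale_Id );

    ALTER TABLE Sales_Detail , NO MERGEBLOCKRATIO;        -- disable datablock merges for this table
    ALTER TABLE Sales_Detail , DEFAULT MERGEBLOCKRATIO;   -- return the table to the system default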
DBSControl FILESYS Group parameters
25. DisableMergeBlocks (TRUE/FALSE, default FALSE)
– Disables feature completely across the system, even for tables with a defined
MergeBlockRatio as a table level attribute.
– Effective immediately – does not require a Teradata restart (tpareset)
26. MergeBlockRatio (1-100%, default 60%)
– Default setting for any table – this can be overridden at the table level.
– Ignored when DisableMergeBlocks is TRUE (FILESYS Flag #25)
– This is not stored in or copied to table header
– Effective immediately without a tpareset

Page 14-34

File System Writes

Merge Datablocks (Teradata 13.10) cont.
This feature is automatically enabled for new Teradata 13.10 systems, but must be enabled
for existing systems upgraded to 13.10.
Defaults for this feature are set via DBSControl settings.

• System defaults will work well for most tables.

• The key parameter is MergeBlockRatio.

• The CREATE TABLE and ALTER TABLE commands have options to enable/disable this feature for
  a specific table or change the default ratio.

MergeBlockRatio has the following characteristics:

• The default system level percentage is 60%.
• Computed as a percentage of the maximum multi-row datablock size for the associated table.
• Candidate merged block must be smaller than this computed size after all target row updates are
  completed.

CREATE TABLE or ALTER TABLE options

• DEFAULT MERGEBLOCKRATIO
  – Default option on all CREATE TABLE statements

• MERGEBLOCKRATIO = integer [PERCENT]
  – Fixed MergeBlockRatio used for full table modification operations

• NO MERGEBLOCKRATIO
  – Disables merges completely for the table

File System Writes

Page 14-35

File System Write Summary
Regardless of how large your tables get, or how many SQL-based INSERTs, UPDATEs or
DELETEs are executed, the process is the same. This module has discussed in some detail
the sequence of steps that Teradata’s file system software will attempt in order to complete
the write operation.
The facing page summarizes some of the key topics discussed in this module.

Page 14-36

File System Writes

File System Write Summary
Teradata’s file system software automatically maintains the logical sequence of
data rows within an AMP.

• The logical sequence is based on tableid, partition #, and rowid.
For write (INSERT, UPDATE, or DELETE) operations:

• Read the Data Block if not present in memory.
• Place appropriate entries into the Transient Journal buffer (WAL buffer).
• Make the changes to the Data Block in memory and determine the new block’s length.
• If the new block has changed size, allocate a new Data Block.
Blocks will grow to the maximum data block size determined by the DATABLOCKSIZE
table attribute, and then be split into smaller blocks.

• Blocks will vary in size with Teradata.
• For a table that has been updated with "ad hoc" or TPump INSERTs, UPDATEs, or
DELETEs, a typical block size for the table will be approximately 75% of the maximum
data block size.
If the Write operation fails, the file system does a rollback using the Transient Journal.

File System Writes

Page 14-37

Module 14: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 14-38

File System Writes

Module 14: Review Questions
1. When Teradata INSERTs a new row into a table, it first goes to the _________ to locate the proper
cylinder for the new row.
a. Cylinder Index
b. Fallback AMP
c. Free Cylinder List
d. Master Index
2. When a new block is needed, the File System searches the Free Block List looking for the first Free
Block whose size is equal to, or greater than the new block’s requirement. It does not have to be an
exact match.
a. True
b. False
3. Name the condition which occurs when there is no block on the Free Block List with enough sectors
to accommodate the additional data during an INSERT or UPDATE.
a. Mini Cylinder Pack
b. Cylinder Migrate to a new cylinder
c. Cylinder Migrate to an adjacent cylinder
d. Cylinder Full
4. The ______________ parameter can be set to control how completely cylinders are filled during
loading and PackDisk.
a. Free Space Percent
b. DataBlockSize
c. PermDBSize
d. PermDBAllocUnit

File System Writes

Page 14-39

Module 14: Review Questions (cont.)
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 14-40

File System Writes

Module 14: Review Questions (cont.)
5. Number the following steps in sequence from 1 to 6 that the File System software will attempt to
perform in order to insert a new row into an existing data block.
____

Perform a Cylinder Migrate operation to an adjacent cylinder

____

Simply insert the row into data block if enough contiguous free bytes in the block

____

Perform a Block split

____

Perform a Cylinder Migrate operation to a new cylinder

____

Defragment the block and insert the row

____

Expand or grow the block to hold the row

6. As part of a cylinder full condition, if the number of free sectors within a cylinder is greater than 25%,
what operation will Teradata perform in the background? ___________________
7. If the number of free cylinders falls below a minimum threshold, what operation will Teradata
perform in the background? ___________________

File System Writes

Page 14-41

Notes

Page 14-42

File System Writes

Module 15
Teradata SQL Assistant

After completing this module, you will be able to:
• Define an ODBC data source for Teradata.
• Submit SQL using SQL Assistant.
• Utilize Explorer Tree to simplify creation of queries.
• Use SQL Assistant to import/export a LOB.

Teradata Proprietary and Confidential

Teradata SQL Assistant

Page 15-1

Notes

Page 15-2

Teradata SQL Assistant

Table of Contents
SQL Assistant ............................................................................................................................ 15-4
Defining a Data Source .............................................................................................................. 15-6
Compatibility ..................................................................................................................... 15-6
Defining a Teradata .Net data source ................................................................................. 15-6
Defining a Data Source (cont.) .................................................................................................. 15-8
Defining an ODBC Data Source ............................................................................................ 15-8
Defining a Data Source (cont.) ................................................................................................ 15-10
ODBC Driver Setup for LOBs ............................................................................................. 15-10
Connecting to a Data Source .................................................................................................... 15-12
Main Window........................................................................................................................... 15-14
Database Explorer Tree ............................................................................................................ 15-16
Creating and Executing a Query .............................................................................................. 15-18
Creating statements (single and multi-queries) ................................................................ 15-18
Dragging Object Names to the Query Window ....................................................................... 15-20
Dragging Multiple Objects............................................................................................... 15-20
Query Options .......................................................................................................................... 15-22
To submit any part of any query .......................................................................................... 15-22
Clearing the Query Window ................................................................................................ 15-22
Formatting a Query .............................................................................................................. 15-22
Viewing Query Results ............................................................................................................ 15-24
Sorting an Answerset Locally .............................................................................................. 15-24
Formatting Answersets ............................................................................................................ 15-26
Using Query Builder ................................................................................................................ 15-28
Description of the Options ............................................................................................... 15-28
History Window ....................................................................................................................... 15-30
General Options ....................................................................................................................... 15-32
Connecting to Multiple Data Sources ...................................................................................... 15-34
Additional Options ................................................................................................................... 15-36
Importing/Exporting Large Object Files .................................................................................. 15-38
Teradata SQL Assistant 12.0 Note ....................................................................................... 15-38
Importing/Exporting Large Object Files .................................................................................. 15-40
To Import a LOB into Teradata ............................................................................................... 15-40
Selecting from a Table with a LOB ......................................................................................... 15-42
Displaying a JPG within SQL Assistant .................................................................................. 15-44
Teradata SQL Assistant Summary ........................................................................................... 15-46
Module 15: Review Questions ................................................................................................. 15-48
Lab Exercise 15-1 .................................................................................................................... 15-50
Lab Exercise 15-1 (cont.) ..................................................................................................... 15-52
Lab Exercise 15-1 (cont.) ..................................................................................................... 15-56

Teradata SQL Assistant

Page 15-3

SQL Assistant
Teradata SQL Assistant is an information discovery tool designed for the Windows
operating system (e.g., Windows 7). Teradata SQL Assistant retrieves data from any
ODBC-compliant database server. The data can then be manipulated and stored on the
desktop PC.
Teradata SQL Assistant is a query tool written for relational database developers. It is
intended for SQL-proficient developers who know how to formulate queries for processing
on Teradata or other ODBC-compliant Databases. Used as a discovery tool, Teradata SQL
Assistant catalogs submitted instructions to arrive at a derived answer. Teradata SQL
Assistant stores the history of your SQL in a local Microsoft Access database table. This
history is available in future executions of Teradata SQL Assistant.
Teradata SQL Assistant accepts standard Teradata SQL, DDL, and DML. In addition,
Teradata SQL Assistant sends native SQL to any other database that provides an ODBC
driver. If the driver supports the statements, they are processed correctly.
Key features of SQL Assistant include:










• Create reports from any Relational Database that provides an ODBC interface
• Export data from the database to a file on a PC
• Import data from a PC file directly to the database
• Use an import file to create many similar reports (query results or Answer sets)
• Send queries to any supported database or the same query to many different databases
• Create a historical record of the submitted SQL with timings and status information
  such as success or failure
• Use the Database Explorer Tree to easily view database objects
• Use a procedure builder that gives you a list of valid statements for building the
  logic of a stored procedure
• Limit data returned to prevent runaway queries

Teradata SQL Assistant also benefits database administrators by allowing them to directly
issue SHOW statements to view text for CREATE or REPLACE commands. The DBA
copies the text to the Query window, uses the Replace function to change a database name,
and reissues the CREATE or REPLACE to define a new object with this new name. You
can also display the CREATE text by going to the shortcut menu of the Database Explorer
Tree and clicking Show Definition.
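
For example, a DBA might submit statements like the following (the object names are
hypothetical) from the Query window:

    SHOW TABLE Personnel.Employee;      -- returns the CREATE TABLE text for this table
    SHOW VIEW  Personnel.Emp_v;         -- returns the view definition (CREATE/REPLACE VIEW text)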

Page 15-4

Teradata SQL Assistant

SQL Assistant Features
SQL Assistant is a Windows-based utility for submitting SQL to Teradata.
SQL Assistant has the following properties:

• Windows-based
• Two providers are available for Teradata connections:
– Teradata ODBC Driver
– Teradata .Net Data Provider

• Can be used to access other supported ODBC-compliant databases.
• Permits retrieval of previously used queries (History).
– Saves information about previous query result sets.

• Supports DDL, DML and DCL commands.
– Query Builder feature allows for easy creation of SQL statements.

• Provides both import and export capabilities to files on a PC.
• Provides a Database Explorer Tree to easily view database objects.
• Does not support non-ODBC compliant syntax such as WITH ... BY and FORMAT.
• Teradata Studio Express is a newer name for SQL Assistant Java Edition.
  – Targeted to Java developers who are familiar with Eclipse

Teradata SQL Assistant

Page 15-5

Defining a Data Source
Before using Teradata SQL Assistant to access the Teradata database, you must first install
the Teradata ODBC driver on your PC and the .Net Data Provider for Teradata. When
connecting to a Teradata database, you can use either ODBC or the .Net Data Provider for
Teradata.
Connection to any other database must be made through an ODBC connection. In order to
use the ODBC connection, a vendor specific ODBC driver must be installed.
Before you can use Teradata SQL Assistant, you will need to define a “data source”, namely
the instance of the database you wish to work with.

Compatibility
Teradata SQL Assistant is certified to run with any Level 2 compliant 32-bit ODBC driver.
The product also works with Level 1 compliant drivers, but may not provide full
functionality. Consult the ODBC driver documentation to determine the driver's
conformance level. Most commercially available ODBC drivers conform to Level 2.

Defining a Teradata .Net data source
Use the Connection Information dialog to create, edit and delete data sources for .Net for
Teradata. This dialog box is also used to connect to a .Net data source.
To define a Teradata .Net data source
1. Open Teradata SQL Assistant.
2. Select Teradata .Net from the provider drop down list.
3. Click the Connect icon or go to Tools > Connect.
4. Use the Connection Information dialog to choose a .Net data source.
5. Create a new data source by entering the name and server and other applicable
information
Note: This module will illustrate the screens defining an ODBC data source. The specific
screens defining a Teradata .Net data source are not provided in this module, but are similar.

Page 15-6

Teradata SQL Assistant

Defining a Data Source
You can define an ODBC data source in these ways:
• SQL Assistant (select Connect icon)
• Select Tools > Define ODBC Data Source
or

• ODBC Data Source Administrator Program
SQL Assistant has 2 provider options:
• Teradata .Net Data Provider
• ODBC

Select the System DSN tab and
click on Add to create a new data
source.
If using ODBC Administrator (not
shown), select the Machine Data
Source tab and click on Add to
create a new data source.

Teradata SQL Assistant

Page 15-7

Defining a Data Source (cont.)
When connecting to the Teradata database, use either the ODBC or the Teradata .Net Data
Provider. Connection to any other database must be made through an ODBC connection.

Defining an ODBC Data Source
An ODBC-based application like Teradata SQL Assistant accesses the data in a database
through an ODBC data source.
After installing Teradata SQL Assistant on a workstation or PC, start Teradata SQL
Assistant. Next, define a data source for each database.
The Microsoft ODBC Data Source Administrator maintains ODBC data sources and drivers
and can be used to add, modify, or remove ODBC drivers and configure data sources. An
About Box for each installed ODBC driver provides author, version number, module size,
and release date.
To define an ODBC data source, do one of the following:


• From the Windows desktop, select
  Start > Control Panel > Administrative Tools > Data Sources (ODBC)

• From the Windows desktop, select
  Start > Programs > Teradata SQL Assistant
  After SQL Assistant launches, select Tools > Define Data Source

• Use the Connect icon from SQL Assistant and complete the dialog boxes.

In the “Define Data Source” dialog, decide what type of data source you wish to create:
Data Source Description                 Explanation

A User Data Source can be used only     An ODBC user data source stores information about how
by the current Windows user.            to connect to the indicated data provider. A user data
                                        source is only visible to you.

A System Data Source can be used by     An ODBC system data source stores information about
any user defined on your PC.            how to connect to the indicated data provider. A system
                                        data source is visible to all users on this machine,
                                        including NT services.

Page 15-8

Teradata SQL Assistant

Defining a Data Source (cont.)

If using ODBC Administrator, you will be
given the user/system data source screen as
shown to the left.
You will not get this display if defining your
ODBC data source via SQL Assistant.

Select Teradata as the driver
and click Finish on the
confirmation screen.

Teradata SQL Assistant

Page 15-9

Defining a Data Source (cont.)
A dialog box (specific to Teradata) is used to define the Teradata system you wish to access.
Select This Field...                    To...

Name

Enter a name that identifies this data source. You can also enter the name of the
system or the logon you will be using.

Description

Enter a description. This is solely a comment field to describe the data source
name you used.

Name(s) or IP
address(es)

Enter the name(s) or IP address(es) of the Teradata Server of your Teradata
system.
Identify the host by either name (alias) or IP address. The setup routine
automatically searches for other systems that have similar name aliases. Multiple
server names may be entered by putting the entries on separate lines within this
box.

Do not resolve
alias name to IP
address

When this option is checked, the setup routine does not attempt to resolve alias
names entered into the "Name(s) and IP address(es)" box at setup time.
Instead, they are resolved at connect time. When unchecked, the setup routine
automatically appends COPn (where n = 1, 2, 3, ..., 128) for each alias name you
enter.

Use Integrated
Security

Select this option if you will be logging on using integrated security measures.

Mechanism

Select from the list of mechanisms that automatically appear in this box. Leave
this field blank to use the default mechanism.
Parameter

The authentication parameter is a password required by the selected mechanism.

Username

Enter a user name.

Password

Enter a password to be used for the connection if you intend to use Teradata SQL
Assistant in an unattended (batch) mode. Entering a password here is not very
secure and is normally not recommended.

Default Database

Enter the default database you want this logon to use. If the Default Database is
not entered, the Username is used as the default.

Account String

You can optionally enter one of the accounts that is assigned to your Username.

Session Character Set

Use the drop-down menu to choose the character set. The default is ASCII.

ODBC Driver Setup for LOBs
When defining the ODBC Data Source, from the ODBC Driver Setup screen, use the
Options button to display the Teradata ODBC Driver Options screen and verify that the
option - Use Native Large Object Support – is checked.

Page 15-10

Teradata SQL Assistant

Defining a Data Source (cont.)

To access LOBs with SQL Assistant, …
1) Click on the Options button.
2) Verify that "Use Native Large Object Support" option box is checked.

Teradata SQL Assistant

Page 15-11

Connecting to a Data Source
Connecting to a data source is the equivalent of “logging on” with SQL Assistant. You may
choose from any previously defined data source.
When the connection is complete, the Connect icon is disabled and the Disconnect icon, to
its right, is enabled.
To connect to multiple data sources:
1. Go to the Tools > Options > General tab.
2. Click Allow connections to multiple data sources (Query windows).
3. Follow the procedure for connecting to a data source.

Each new data source appears in the Database Explorer Tree and opens a new query window
with the data source name. To disconnect from one data source, click the Query window
that is connected to the data source and click the disconnect icon.

Page 15-12

Teradata SQL Assistant

Connecting to a Data Source
1. Click on the Connection icon to connect to Teradata.
Provider options are Teradata .NET or ODBC.

2. Select a
data source.

3. Complete the logon dialog box.

Teradata SQL Assistant

Page 15-13

Main Window
The Query window is where you enter and execute a query. The results from your query are
placed into one or more Answerset windows.
The Answerset window is a table Teradata SQL Assistant uses to display the output of a
query.
The History window is a table that displays your past queries and related processing
attributes. The past queries and processing attributes are stored locally in a Microsoft Access
database. This gives you flexibility to work with previous SQL statements in the future.
The Database Explorer Tree displays on the left side of the main Teradata SQL Assistant
window. It displays an alphabetical listing of databases and objects in the connected
Teradata server. You can double-click on a database name to expand the tree display for
that database.
You can use the Database Explorer Tree to reduce the time required to build a query and
help reduce errors in object names. The Database Explorer Tree is optional so you can
display or hide this window.

Page 15-14

Teradata SQL Assistant

Main Window

Query
Window

Database
Explorer
Tree

Answerset
Window

History
Window

Teradata SQL Assistant

Page 15-15

Database Explorer Tree
The Database Explorer Tree feature of Teradata SQL Assistant displays an alphabetical
listing of databases and objects of the connected user. It further permits drilldown on
individual objects to view, column names, indexes and parameters as they apply. This is
simply done by double-clicking on a database name to expand the tree display for that
database.
The Database Explorer Tree displays on the left side of the main Teradata SQL Assistant
window. You can use the Database Explorer Tree to reduce the time required to build a
query and help reduce errors in object names. The Database Explorer Tree is optional so
you can display or hide this window.
Initially, the following Teradata databases are loaded into the Database Explorer Tree:




• The User ID that was used to connect to the database
• The user’s default database
• The database "DBC"

To add additional databases:
1. Do one of the following:
   – With the Database Explorer Tree active, press Insert.
   – Right-click anywhere in the Database Explorer Tree, then select Add Database.
2. Type the database name to be added.
3. If you want the database loaded only for the current session, clear the check box.
   By default, the check box is selected so the database will appear in the Database
   Explorer Tree in future sessions.

The Database Explorer Tree allows you to drill down to show:





• Columns and indexes of tables
• Columns of views
• Parameters of macros
• Parameters of stored procedures

Page 15-16

Teradata SQL Assistant

Explorer Tree Option
• The Database Explorer Tree displays an alphabetical listing of databases and objects
of the connected user.

– It is not a database hierarchy, but a list of databases and objects that the user needs to access.

• To refresh a database, right-click on the database/user name and select "Refresh".
To add another database to the Explorer Tree, right-click on the Explorer Tree.

To expand an item/object, click on the + sign or double-click on the object name.

Teradata SQL Assistant

Page 15-17

Creating and Executing a Query
Queries are created by simply typing in the query text into the query window. It is not
necessary to add a semi-colon at the end of a command, unless you are entering multiple
commands in a single window. The query may be executed by clicking on the ‘Execute’
icon in the toolbar. This icon looks like a pair of footprints.
“Execute” actually executes the statements in the query one statement after the other and
optionally stops if one of the statements returns an error. Function key F5 can also be used
to execute queries serially.
“Execute Parallel” executes all statements at the same time - and is only valid if all the
statements are Teradata SQL/DML statements. This submits the entire query as a single
request, allowing the database to execute all the statements in parallel. Multiple answer sets
are returned in the Answerset window. Function key F9 can also be used to execute queries
in parallel.

Creating statements (single and multi-queries)
To allow multiple queries:
1. Select Tools > Options.
2. Select the General option.
3. Select the option “Allow Multiple Queries”.

Once this option is selected, you may open additional tabs in the query window. Each tab
can contain a separate query, and any of these queries can be executed. However, only one
query can be executed at a time.
You can create queries consisting of one or more statements.
A semicolon is not required when you enter one statement at a time. However, a semicolon
between the statements is required for two or more statements.
Each statement in the query is submitted separately to the database; therefore, your query
may return more than one Answerset.
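
As a simple illustration (the table names are hypothetical), the query below contains two
statements separated by the required semicolon; when submitted with Execute, each statement
runs in turn and returns its own Answerset:

    SELECT Last_Name, First_Name
    FROM   Employee
    ORDER BY 1 ;

    SELECT COUNT(*)
    FROM   Department ;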

Page 15-18

Teradata SQL Assistant

Creating and Executing a Query
1. Create a query in the Query Window.
2. To execute a query use either the “execute” or the “execute parallel” buttons.
The “execute” button (or F5) serially executes all statements in the query window.

The “execute parallel” button (or F9) executes all statements in the query window
in a single multi-statement request. These queries are effectively executed in parallel.

Create query or queries
in Query Window.

Teradata SQL Assistant

Page 15-19

Dragging Object Names to the Query Window
You can drag object names from the Database Explorer tree to the Query pane.
Click and drag the object from the Explorer tree to the Query pane. The name of the object
appears in the Query window.
Teradata SQL Assistant includes an option (Tools > Options) that allows objects to
automatically be qualified when dragging or pasting names from the Database Tree into the
Query Window.
For example, if this option is checked, dragging the object "MyColumn" adds the parent
object "MyTable", and appears as "MyTable.MyColumn" in the Query Window.
Use the Ctrl key to add a comma after the object name when it is dragged to the Query
Window.

Dragging Multiple Objects
Use the Shift and Ctrl keys to select more than one object from the Database Explorer Tree
that can be dragged to the Query window.



Page 15-20

Use the Ctrl key to select additional objects.
Use the Shift key to select a range of objects.

Teradata SQL Assistant

Dragging Object Names to the Query Window
• Click and drag the object from the Database Explorer tree to the Query window. The
name of the object appears in the Query window.
– If the "Qualify names when dragged or pasted from the Database Tree" option (Tools >
Options) is checked, then the parent name is automatically included.
– Hold Ctrl key – causes a comma to be included after the object

• Selecting and dragging multiple objects
– The Shift and Ctrl keys can also be used to select multiple objects in the Database Explorer
tree for the purpose of dragging multiple objects to the Query Window.

Note: The order of selection becomes
the order of columns in the SELECT.

Teradata SQL Assistant

Page 15-21

Query Options
To submit any part of any query
1. Select Tools > Options.
2. Select the Query tab.
3. Check the option “Submit only the selected Query text, when highlighted”.
4. From the Query window, select the part of the query to submit by highlighting it.

Clearing the Query Window
The query window may be cleared using the “Clear Query” button on the tool bar.

Formatting a Query
The query formatting feature adds line breaks and indentation before certain keywords,
making SQL that comes from automatic code generators or other sources more readable.

To Format a Query
1. Ensure a statement exists in the Query window.
2. Do one of the following:
   • From the Tool Bar, click the Format Query button.
   • Right-click in the Query window, then click Format Query
   • Press Ctrl+Q
   • Select Edit > Format Query
Note: Some keywords will cause a line break and possibly cause the new line to be
indented. If a keyword is found to already be the first word on a line and it is already
prefixed by a tab character, then its indentation level will not change.

Indentation
When you press the Enter key, the new line will automatically indent to the same level as
the line above.
If you highlight one or more lines in the query and press the Tab key, those lines are
indented one level. If you press Shift-Tab, the highlighted lines are un-indented by one
level.
This indentation of lines will only apply if the selected text includes a line feed character.
For example, you must either select at least part of two lines, or if selecting only one line,
then the cursor must be at the beginning of the next line. (Note that this is always the case
when you use the margin to select a line.) If no line end is included in the selected text, or
no text is selected, then a tab character will simply be inserted.

Page 15-22

Teradata SQL Assistant

Query Options
To submit any part of a query:
1. Using Tools > Options > Query
Check the option “Submit only the selected Query text, when highlighted”.
2. Highlight the text in the query window and execute.

To clear the text in the query window, use the “Clear Query” button.
To format a query, click on the “Format Query” button.

Highlighted query in
Query Window.

Teradata SQL Assistant

Page 15-23

Viewing Query Results
The results of a query execution may be seen in the Answer Set window. Large answer sets
may be scrolled using the slide bars.
The Answerset window is a table that displays the results from a statement. You can sort the
output in a number of ways and print as bitmaps in spreadsheet format. Individual cells,
rows, columns, or blocks of columns may be formatted to change the background and
foreground color as well as the font style, name, and size. You can make other
modifications such as displaying or hiding gridlines and column headers.
The table may be resized by stretching the Answerset window using standard Windows
sizing techniques. Individual columns, groups of columns, rows, or groups of rows may
also be sized.
Output rows may be viewed as they are being retrieved from the database.

Sorting an Answerset Locally
There are two ways to sort an Answerset locally: quick sort or full sort. A quick sort sorts
on a single column; a full sort allows sorting by data in multiple columns.
To sort an Answerset using quick sort:


• Right-click any column heading to sort the data by that column only. The data is
  initially sorted in ascending order. Right-clicking the same column header again
  reverses the sort order.

Note: The output from certain statements (e.g., EXPLAIN) cannot be sorted this way.
To sort an Answerset using a full sort:


• Do one of the following: From the Tool Bar, click the sort button, right-click in
  the Answerset window and select Sort, or use the Edit > Sort menu.
  In the Sort Answerset dialog box, all columns in the active window are presented in
  the Available Columns list box.

• Select the column name in the Available Columns list box, or use the up or down
  arrow keys to highlight the column name and press Enter.
  This moves the column name to the Sort keys list box. By default, the sort
  direction for this new sort column is ascending (Asc). If you click a column in the
  Sort Keys list box, or select the item using the arrow keys or mouse and press
  Enter, it reverses to descending sort order (Dsc).

To remove a sort column from the list, double-click the column name, or use the arrow keys
to highlight the column and press Delete.

Page 15-24

Teradata SQL Assistant

Viewing Query Results
• The Answerset window is a table that displays the results from a statement.
• The output can be sorted in different ways:
– Quick sort (single column) – right click on the column heading
– Full sort (1 or more columns) – use Edit > Sort menu or Sort button

• Data can be filtered using the funnel option at the column level.

Result set in
Answerset Window.

Teradata SQL Assistant

Page 15-25

Formatting Answersets
You can format the colors, font name, font style, and font size of a block of cells, individual
cells, rows, columns, or the entire spreadsheet. You can also specify the number of decimal
places displayed and if commas are displayed to mark thousand separators in numeric
columns.
You can control the Answerset and the Answerset window by setting options. To set
Answerset options, select Tools > Options > Answerset tab.
For example, to display Alternate Answerset Rows in Color, check the first option in the
Answerset tab, and use the Choose button.

• Selecting this option makes it easier to see Answerset rows. The option applies the
  selected background color to alternating rows in the Answerset grid. The
  remaining rows use the standard ‘Window Background’ color.

• The Choose button displays the selected color. Clicking the Choose button allows
  you to change this color.

To format the colors, font name, font style, and font size of a block of cells, individual
cells, rows, or columns, right-click on the Answerset cells. Some options are
listed below.
To display commas:
1. Right-click in the Answerset cell you wish to change and select Format Cells.
2. Check Display 1000 separators.
3. Click OK.
To display decimal places:
1. Right-click in the Answerset cell you wish to change and select Decimal Places.
2. Select a number between 0 and 4.
To designate up to 14 decimal places:
a. Right-click to bring up the Shortcut menu.
b. Click Format Cells to bring up the Format Cells dialog.
c. Under Numerics, select the desired number of decimal places.

Page 15-26

Teradata SQL Assistant

Formatting Answersets
To set defaults for Answersets,
use the Tools > Options > Answerset tab.

Teradata SQL Assistant

To format specific cells,
right-click on a cell or use the
icon.

Page 15-27

Using Query Builder
Query Builder provides the user with the ability to use ‘templates’ for SQL commands,
which may then be modified by the user. This is a convenient way to create commands
whose syntax is complex or not easily remembered. Simply find the appropriate command,
then drag and drop it into the query window where it may then be customized.
The Query Builder window is a floating window you can leave open when you are working
within the main Teradata SQL Assistant window.
To access the Query Builder tool, do one of the following:




• Press F2.
• Select Help > Query Builder.
• Right-click in the Query window and select Query Builder from the shortcut
  menu.

From the drop-down list in the upper left corner, choose one of the following options.
SQL Statements
    Select a command from the statement list in the left pane to display an
    example of its syntax in the right pane.

Procedure Builder
    Select a stored procedure statement from the list in the left pane to
    display an example of its syntax in the right pane.

(user-defined name)
    If you create a custom.syn file, this option appears in the drop-down list.
    The name will be the name you specified in the first line of the
    custom.syn file. Select this option and the queries you defined in this file
    will display.

Description of the Options
SQL Statements
When you choose the SQL Statements option, the statement list in the left pane shows
each of the statement types available on the current data source. These syntax examples
reflect the SQL syntax of the data source you are currently connected. For example, the
Teradata syntax file is Teradata.syn.
Procedure Builder
When you choose the Procedure Builder option, the left pane shows a list of statements
that are valid only when used in a CREATE or REPLACE procedure statement.

You can create a user-defined syntax file using any text editor such as Notepad or
Microsoft Word. The name of the file must be custom.syn. The format of this file is the
same as the other syntax files except it has an additional line at the start of the file
containing the name you wish to see in the dropdown list in the Query Builder dialog.

Page 15-28

Teradata SQL Assistant

Query Builder
Query Builder provides the user
with the ability to use 'templates'
for SQL commands.
1. Select Query Builder from the
Help menu or use F2.

2. Double-click on
SQL statement to
place sample
query in Query
Window.

Teradata SQL Assistant

Page 15-29

History Window
The History window is a table that displays your past queries and related processing
attributes. The past queries and processing attributes are stored locally in a Microsoft Access
2000 database. This allows the flexibility to work with previous SQL statements in the
future.
Clicking any cell in the SQL Statement column in the History window copies the SQL to the
Query Window. It may then be optionally modified and then resubmitted.
You can display or hide the History window at any time.
With Teradata SQL Assistant 13, all history rows are now stored in a single History
database. The History Filter dialog allows you to specify a set of filters to be applied to the
history rows. The operators include >, <, =, and LIKE. The filter applies to the entire
history table. When you click in the fields or boxes in the Filter dialog, the possible
operators and proper format are displayed at the bottom of the dialog.
You can filter your history on the following options:









• Date
• Data source
• User Name
• Statement Type – for example, SELECT or CREATE TABLE
• Statement Count – show only those queries that contain this many statements
• Row Count
• Elapsed Time
• Show successful queries only

By default, Teradata SQL Assistant records all queries that are submitted. You may change
this option so Teradata SQL Assistant records only those statements that are successful, or
turn off history recording altogether.
The most recently executed statement appears as the first row in the History window. The
data may be sorted locally after it has been loaded into the History window. New entries are
added as the first row of history no matter what sort order has been applied.

Page 15-30

Teradata SQL Assistant

History Window
A history of recently submitted queries may be recalled by activating the ‘Show History’
feature. Key options available with the History window are:

• All history rows are now stored in a single History database. The History Filter dialog allows you
  to specify a set of filters to be applied to the history rows.

• You can choose to display all queries (successful or not), use a history filter to only display
  successful queries, or turn off history recording altogether.

Query is copied into Query Window.

Click on query in
History Window.

Teradata SQL Assistant

Page 15-31

General Options
To set general program preferences:
1. Select Tools > Options.
2. Click the General tab.
3. Choose from the following options:


• Allow multiple Queries - allows you to have multiple query windows open
  simultaneously. With this option selected, the New Query command opens a new
  tab in the Query window. The default for this setting is unchecked.

• Display this string for Null data fields - enter the string you want displayed in
  place of Null data fields in your reports and imported/exported files. The default
  for this setting is "?".

• Use a separate Answer window for
  – Each Resultset - opens a new Answer window for each new result set
  – Each Query - opens a new Answer window for each new query, but uses tabs
    within this window if the query returns multiple result sets. This is the default
    setting.
  – Never - directs all query results to display in a single tabbed Answer window

Page 15-32

Teradata SQL Assistant

General Options
General options (Tools > Options > General tab) that are available include:

• Allow connections to multiple data sources.
• Allow multiple queries per connection – allows you to have multiple query windows
open simultaneously. New Queries are opened in new tabs.

Data Format options include:

• Date format
• Display of NULL data values
• Decimal places to display

Teradata SQL Assistant

Page 15-33

Connecting to Multiple Data Sources
You can connect to multiple data sources. The “Allow connections to multiple data
sources” option must be checked with the General Options.
Each new data source appears in the Database Tree and opens a new query window with the
data source name. To disconnect from one data source, click the Query window that is
connected to the data source and click the disconnect icon.
The example on the facing page shows two connections to two different systems (tdt5-1 and
tdt6-1).

Page 15-34

Teradata SQL Assistant

Connecting to Multiple Data Sources
A separate query window is opened for each data source connection.

Connections have
been made to two
systems:

• tdt5-1
• tdt6-1

Multiple queries
for tdt5-1 are
shown via tabs.

History includes
the Source name
for queries.

Teradata SQL Assistant

Page 15-35

Additional Options
Teradata SQL Assistant provides many other tools and options, some of which are briefly
noted on the facing page.

Page 15-36

Teradata SQL Assistant

Additional Options
Additional Tools menu options include:

• Explain – runs an Explain function on the SQL
statements in the Query window and displays the
results in the Answerset window (see the example
after this list).

• List Tables – displays the Table List dialog box where
you can enter the name of the database and the
resulting list of tables or views displays in an
Answerset window.

• List Columns – displays the Column List dialog box
where you can list the columns in a particular
table/view and the resulting list of columns displays
in an Answerset window.

• Disconnect – disconnects from the current data
source.

• Change Password – change your Teradata password.
• Compact History – reclaim space that may have been
lost when history rows were deleted.

• Options – establish various options for queries,
answersets, import/export operations, etc.
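
As a hedged illustration of the Explain option (the table and column names are hypothetical),
placing the following query in the Query window and selecting Tools > Explain returns the
optimizer plan as text in the Answerset window:

    SELECT   D.Dept_Name, COUNT(*) AS Emp_Count
    FROM     Employee E
    JOIN     Department D ON E.Dept_Number = D.Dept_Number
    GROUP BY D.Dept_Name;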

Teradata SQL Assistant

Page 15-37

Importing/Exporting Large Object Files
To import and/or export LOB (Large Object) files with SQL Assistant, you need to first
make sure the “Use Native Large Object Support” option is set when defining ODBC
driver for Teradata. This option was discussed earlier in this module.
This option is automatically selected starting with SQL Assistant 14.0.

Teradata SQL Assistant 12.0 Note
The following information is not needed with Teradata SQL Assistant 13 and later.
With Teradata SQL Assistant 12.0 and prior versions, to import a file larger than 10 MB into
a BLOB or CLOB column, you need to enable this capability within SQL Assistant.
To enable importing of files larger than 10 MB into BLOB or CLOB columns:
1. Select Tools > Options, then select the Export/Import tab.
2. Select the Import tab.
3. Click in the Maximum Size of an Imported data file field.
4. Press the Esc key, then set the value to the size of the largest file you wish to load,
   up to a maximum of 9 digits.
   If you do not press the Esc key before entering the data, you will be limited to a
   maximum of 7 digits in this field.
5. Click OK.

Note: This will be a temporary change. The next time you click OK on the Options screen
the value will be reset to the first 7 digits of the number you had last set - for example, 50
MB (50,000,000) will become 5 MB (5,000,000).

Page 15-38

Teradata SQL Assistant

Importing/Exporting Large Object Files
To import and/or export LOB (Large Object) files with SQL Assistant, you need to first
make sure the “Use Native Large Object Support” option is set with the data source.
Teradata SQL Assistant supports Large Objects. Large objects come in two types:

• Binary – these columns may contain Pictures, Music, Word documents, PDF files, etc.

• Text – these columns contain text data such as Text, HTML, XML or Rich Text (RTF).

SQL Assistant > Tools > Options

• To import a LOB, create a data file that
contains the names of the Large Objects.

• Use the Export/Import Options dialog to
specify the field delimiter.

• The example in this module assumes the
fields in the imported file are TAB
separated.

Teradata SQL Assistant

Page 15-39

Importing/Exporting Large Object Files
To import and/or export LOB (Large Object) files with SQL Assistant, you need to first
make sure the “Use Native Large Object Support” option is set when defining ODBC
driver for Teradata. This option was discussed earlier in this module.

To Import a LOB into Teradata
First, create a data file that contains the names of the LOB(s) to be imported. By default, the
data file needs to be located in the same folder as the LOB.
Assume the data file to import from contains 4 fields that are TAB separated.
Second, select the IMPORT DATA function and execute an Insert statement.
Example: INSERT INTO TF VALUES (?,?,?,?B);
The parameter markers in this example are:
?   The data for this parameter is read from the Import file. It is always a character
    string, and will be converted to a numeric value if necessary.

?B The data for this parameter resides in a file that is in the same directory as the
Import file. The import file contains only the name of the file to be imported. The
contents of the file are loaded as a binary image (e.g., BLOB). You can also use ??
in place of ?B.
?C The data for this parameter resides in a file that is in the same directory as the
import file. The import file contains only the name of the file to be imported. Use
this marker to load a text file into a CHAR or CLOB column.
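
As a hedged sketch (the table and column names are hypothetical, not from the course labs), an
import INSERT could combine the marker types described above:

    -- Doc_Id and Doc_Title are read directly from the import file (character data),
    -- ?C loads the named text file into a CLOB column, and ?B loads the named file into a BLOB column.
    INSERT INTO MyDocs (Doc_Id, Doc_Title, Doc_Text, Doc_Image)
    VALUES (?, ?, ?C, ?B);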

Page 15-40

Teradata SQL Assistant

Importing a LOB into Teradata
1. Create a data file that contains the name(s) of the LOB(s). This data file needs to be
located in the same folder as the LOB.
TF Manual LOB.txt

1    TF Student Manual    PDF    TF v1400 Student Manual.pdf
2    TF Lab Workbook      PDF    TF v1400 Lab Workbook.pdf

The various fields are separated by tabs.

2. Within SQL Assistant, from the File menu, select the "Import Data" option to turn on the
Import Data function.
3. Enter an INSERT statement within the Query window.
INSERT INTO TF VALUES (?, ?, ?, ?B);
4. In the dialog box that is displayed, choose the name of the file to import.
For example, enter or choose "TF Manual LOB.txt".
5. From the File menu, select the "Import Data" option to turn off the Import Data function.

Teradata SQL Assistant

Page 15-41

Selecting from a Table with a LOB
To select from a table with a LOB, simply execute a SELECT statement. If a LOB column is
projected, then a dialog box is displayed to enter the file name for the LOB.
Note that multiple files that are exported will have sequential numbers added to the file
name.
In the example on the facing page, the file name was specified as TF_Manual. Therefore,
the two manuals that will be created are named:
TF_PDF001.pdf
TF_PDF002.pdf

Page 15-42

Teradata SQL Assistant

Selecting from a Table with a LOB
With SQL Assistant, enter the following query:
SELECT * FROM TF ORDER BY 1;
The following dialog box is displayed to represent the data files to export the LOBs into.
Also specify the "File Type" as a known Microsoft file type extension.

The answer set window will include
a link to exported data files.

Teradata SQL Assistant

Page 15-43

Displaying a JPG within SQL Assistant
The “Display as picture …” can be selected to display a JPG file within the answer set.
Optionally, the “Also save picture to a file” can be selected.
Note that large JPG files will display very large within the answer set window.

Page 15-44

Teradata SQL Assistant

Displaying a JPG within SQL Assistant
SELECT * FROM Photos ORDER BY 1;
Optionally, the "Display as picture …" can be
selected to display a JPG file within the answer set.

Teradata SQL Assistant

Page 15-45

Teradata SQL Assistant Summary
The Teradata SQL Assistant utility can be of great value to you. The facing page
summarizes some of the key features discussed in this module.

Page 15-46

Teradata SQL Assistant

Teradata SQL Assistant Summary
Characteristics of Teradata SQL Assistant include:

• Windows-based utility that can be used to submit SQL queries to the Teradata
database.

• Provides the retrieval of previously used queries (History).
• Saves information about previous query result sets.
• Supports DDL, DML and DCL commands.
– Query Builder feature allows for easy creation of SQL statements.
• Provides both import and export capabilities to files on a PC.
• Provides a Database Explorer Tree to easily view database objects.

Teradata SQL Assistant

Page 15-47

Module 15: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 15-48

Teradata SQL Assistant

Module 15: Review Questions
1. Which two data interfaces are available with Teradata SQL Assistant?
a. CLIv2
b. JDBC
c. ODBC
d. Teradata .Net

2. Separate history database files are needed to maintain queries for different data sources.
a. True
b. False
3. Which piece of query information is not available in the History Window?
a. User name
b. Query band
c. Elapsed time
d. Data source name
e. Number of rows returned

4. What are two techniques to execute multiple statements as a multi-statement request?
__________________________________
__________________________________

Teradata SQL Assistant

Page 15-49

Lab Exercise 15-1
Check your understanding of the concepts discussed in this module by completing the lab
exercise as directed by your instructor.

Page 15-50

Teradata SQL Assistant

Lab Exercise 15-1
Lab Exercise 15-1
Purpose
In this lab, you will use Teradata SQL Assistant to define a data source and execute some simple
SQL commands.
What you need
Teradata SQL Assistant installed on the laptop or PC
Tasks
1. Define either an ODBC data source or a .NET data source using the following instructions.
Complete the dialog box with the following information:
Name – TFClass
Description – Teradata Training for your name
Name or IP Address – ________________________ (supplied by instructor)
Username – ________________________ (supplied by instructor)
Password – do not fill in your password (initially needed for a .NET connection)
Verify the following options are set properly.
Session Character Set – ASCII
Options – Session Mode – System Default
Use Native Large Object Support option is checked (not needed with a .NET connection)

Teradata SQL Assistant

Page 15-51

Lab Exercise 15-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercise as directed by your instructor.

Page 15-52

Teradata SQL Assistant

Lab Exercise 15-1 (cont.)
2. Connect to the data source you just created (TFClass) and logon with your username and password.
3. Using the Tools > Options tabs, ensure the following options are set as indicated:
General – Check – Allow connections to multiple data sources
General – Check – Allow multiple queries per connection
Query – Check – Submit only the selected Query text, when highlighted
Answerset – Check – Display alternate Answerset rows in color – choose a color
Answerset – Check – Display Column Titles rather than Column Names
History – Check – Display SQL text on a single line
History – Check – Do not save duplicate queries in history
4. If the Explorer Tree pane is not visible, use the View > Explorer option to display the Explorer Tree.
Add the following databases to the Explorer Tree: AP, DS, PD, Collaterals
(Hint: Right-click on the Explorer Tree pane to use the "Add Database …" option.)
5. Using the Explorer Tree, view the table objects in your database.
6. Using the Query Window, execute the following query.
CREATE TABLE Old_Orders AS Orders WITH NO DATA;
Does the new table object appear in the table object list? _____ If not, "refresh" the database.

Teradata SQL Assistant

Page 15-53

Lab Exercise 15-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercise as directed by your instructor.
Use the following SQL to determine a count of rows in a table.
SELECT COUNT(*) FROM tablename;

Step 8 Hint: Your Old_Orders table should have 2400 rows. If not, check the dates you used
in your queries.

Page 15-54

Teradata SQL Assistant

Lab Exercise 15-1 (cont.)
7. Using the Query window, execute the following query.
INSERT INTO Old_Orders SELECT * FROM DS.Orders
WHERE orderdate BETWEEN '2008-07-01' AND '2008-09-30';
Use the "Format Query" option to format the query.
How many rows are in the Old_Orders table? _______
8. Using the History window, recall the query from step #7 and modify it to add orders from '2008-10-01'
through '2008-12-31'.
How many rows are in the Old_Orders table? _______
9. Execute the following query by using the drag and drop object feature of SQL Assistant.
SELECT custid, SUM (totalprice)
FROM Old_Orders
GROUP BY 1
ORDER BY 1;
Use the "Add Totals" feature to automatically generate a total sum for all of the orders.
What is the sum of the orders using this feature? ______________

Teradata SQL Assistant

Page 15-55

Lab Exercise 15-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercise as directed by your instructor.
Use the following SQL to create a view.
CREATE VIEW viewname AS
   SELECT column1, column2
   FROM   table_or_view_name
   [WHERE condition];

Use the following SQL to create a simple macro.
CREATE MACRO macroname AS
   (SELECT * FROM table_or_view_name;);

Use the following SQL to execute a simple macro.
EXEC macroname;

Page 15-56

Teradata SQL Assistant

Lab Exercise 15-1 (cont.)
10. Format only the cells containing the sum of the totalprice to be in italics and green.
11. Using the Query Builder feature, create a view named "Old_Orders_v" for the Old_Orders table that
includes the following columns and only includes orders for December, 2008.
orderid, custid, totalprice, orderdate
SELECT all of the rows from the view named "Old_Orders_v".
How many rows are displayed from this view? _______
12. Using the Query Builder feature, create a simple macro named "Old_Orders_m" which selects all of
the orders from the view named "Old_Orders_v".
Execute the macro "Old_Orders_m".
What is the sum of the orders for December using this "Add Totals" feature? ______________

13. (Optional) Use the Collaterals database to access the Photos table to display various JPG files.
Execute the following: SELECT * FROM Collaterals.Photos ORDER BY 1;
Note: Set the file type to JPG and check the option "Display as picture in Answerset".

Teradata SQL Assistant

Page 15-57

Notes

Page 15-58

Teradata SQL Assistant

Module 16
Analyze Primary Index Criteria

After completing this module, you will be able to:
 Identify Primary Index choice criteria.
 Describe uniqueness and how it affects space utilization.
 Explain row access, selection, and selectivity.
 Choose between single and multiple-column Primary Indexes.
 Describe why a table might be created without a primary index.
 Specify the syntax to create a table without a primary index.

Teradata Proprietary and Confidential

Analyze Primary Index Criteria

Page 16-1

Notes

Page 16-2

Analyze Primary Index Criteria

Table of Contents
Primary Index Choice Criteria ................................................................................................... 16-4
Primary Index Defaults .............................................................................................................. 16-6
CREATE TABLE – Indexing Rules .......................................................................................... 16-8
Order of Preference Exercise ................................................................................................... 16-10
Primary Index Characteristics .................................................................................................. 16-12
Multi-Column Primary Indexes ............................................................................................... 16-14
Primary Index Considerations .................................................................................................. 16-16
PKs and Duplicate Rows.......................................................................................................... 16-18
NUPI Duplicate Row Check .................................................................................................... 16-20
Primary Index Demographics .................................................................................................. 16-22
Column Distribution Demographics for a PI Candidate .......................................................... 16-24
SQL to View Data Demographics ........................................................................................... 16-26
Example of Using Data Demographic SQL ............................................................................. 16-28
TableSize View ........................................................................................................................ 16-32
SQL to View Data Distribution ............................................................................................... 16-34
E-R Diagram for Exercises ...................................................................................................... 16-36
Exercise 2 – Sample ................................................................................................................. 16-38
Exercise 2 – Choosing PI Candidates ...................................................................................... 16-40
What is a NoPI Table? ............................................................................................................. 16-52
Reasons to Consider Using NoPI Tables ................................................................................. 16-54
Creating a Table without a PI .................................................................................................. 16-56
How is a NoPI Table Implemented? ........................................................................................ 16-58
NoPI Random Generator .......................................................................................................... 16-60
The Row ID for a NoPI Table .................................................................................................. 16-62
Multiple NoPI Tables at the AMP Level ................................................................................. 16-66
Loading Data into a NoPI Table .............................................................................................. 16-68
NoPI Options............................................................................................................................ 16-70
Summary .................................................................................................................................. 16-72
Module 16: Review Questions ................................................................................................. 16-74
Module 16: Review Questions (cont.) ..................................................................................... 16-76
Lab Exercise 16-1 .................................................................................................................... 16-78
Lab Exercise 16-2 .................................................................................................................... 16-82

Analyze Primary Index Criteria

Page 16-3

Primary Index Choice Criteria
There are three Primary Index Choice Criteria: Access Demographics, Distribution
Demographics, and Volatility.


Access demographics are the first of three Primary Index Choice Criteria. Access
columns are those that would appear (with a value) in a WHERE clause in an SQL
statement. Choose the column most frequently used for access to maximize the
number of one-AMP operations.



Distribution demographics are the second of the Primary Index Choice Criteria.
The more unique the index, the better the distribution. Optimizing distribution
optimizes parallel processing.



In choosing a Primary Index, there is a trade-off between the issues of access and
distribution. The most desirable situation is to find a PI candidate that has good
access and good distribution. Many times, however, index candidates offer great
access and poor distribution or vice versa. When this occurs, the physical designer
must balance these two qualities to make the best choice for the index.



The third of the Primary Index Choice Criteria is volatility, or how often the data
values will change. The Primary Index should not be very volatile. Any changes to
Primary Index values may result in heavy I/O overhead, as the rows themselves
may have to be moved from one AMP to another. Choose a column with stable
data values.

Degree of Uniqueness and Space Utilization
The degree of uniqueness of a Primary Index has a direct influence on the space utilization.
The more unique the index, the better the space is used.

Fewer Distinct PI Values than Amps
For larger tables, it is not a good idea to choose a Primary Index with fewer distinct values
than the number of AMPs in the system when other columns are available. At best, each
distinct index value would hash to a different AMP and the remaining AMPs would carry no data.

Non-Unique PIs
Choosing a Non-Unique PI (NUPI) with some very non-unique values can cause “spikes” in
the distribution.

Unique (or Nearly-Unique) PIs
The designer should choose an index which is unique or nearly unique to optimize the use of
disk space. Remember that the PERM limit of a database (or user) is divided by the number
of AMPs in the system to yield a threshold that cannot be exceeded on any AMP.

Page 16-4

Analyze Primary Index Criteria

Primary Index Choice Criteria
ACCESS         Maximize one-AMP operations:
               Choose the column(s) most frequently used for access.
               Consider both join and value access.

DISTRIBUTION   Optimize parallel processing:
               Choose the column(s) that provides good distribution.

VOLATILITY     Reduce maintenance resource overhead (I/O):
               Choose the column(s) with stable data values.
Note: Data distribution has to be balanced with Access usage in choosing a PI.
General Notes:

• A good logical model identifies the Primary Key for each table or entity.
– Do not assume that the Primary Key will become the Primary Index.
– It is common for many tables in a database to have a Primary Index that is
different than the Primary Key.

– This module will first cover PI tables then cover details of the NO PRIMARY INDEX
option. The general assumption in this course is that tables will have a PI.

Analyze Primary Index Criteria

Page 16-5

Primary Index Defaults
1. If the NO PRIMARY INDEX clause is specified, then the table is created without a
   primary index. If this clause is used, you cannot specify a primary index for the table.
   There are a number of limitations associated with a NoPI table that will be listed later.

2. If the PRIMARY INDEX, NO PRIMARY INDEX, PRIMARY KEY, or UNIQUE
   options are NOT specified in the CREATE TABLE DDL, then whether the table is
   created with or without a primary index is based on a new DBSControl General flag,
   Primary Index Default. The default setting is "D", which effectively means the default
   is to create a table with the first column as a NUPI.

      D – This is the default setting. This setting works the same as the P setting.
      P – The first column in the table will be selected as the non-unique primary index.
          This setting works the same as in the past when PRIMARY INDEX was not specified.
      N – The table will be created without a primary index (NoPI table).

3. With the NoPI Table feature, the system default setting essentially remains the same as
   that in previous Teradata releases, where the first column was selected as the non-unique
   primary index when the user did not specify a PRIMARY INDEX, a PRIMARY KEY,
   or a UNIQUE constraint.

   Users can change the default setting for PrimaryIndexDefault to P or N and not rely on
   the system default setting, which might be changed in a future release.
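
As a simple sketch (the table and column names are hypothetical), consider a CREATE
TABLE statement that specifies none of these options:

   CREATE TABLE Sales_Stage
      ( store_id   INTEGER
      , sale_date  DATE
      , amount     DECIMAL(10,2) );

   -- With PrimaryIndexDefault = D or P, store_id (the first column) becomes a NUPI.
   -- With PrimaryIndexDefault = N, Sales_Stage is created as a NoPI table.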

Page 16-6

Analyze Primary Index Criteria

Primary Index Defaults
A Teradata 13.0 DBSControl flag determines if a PI or NoPI table is created when
a CREATE TABLE DDL does NOT have any of the following explicitly specified:

• PRIMARY INDEX clause
• NO PRIMARY INDEX clause
• PRIMARY KEY or UNIQUE constraints
Values for DBS Control General field #53 "Primary Index Default":
D – "Teradata Default" (effectively same as option P)
P – "First Column is NUPI" – create tables with first column as a NUPI
N – "No Primary Index" – create tables without a primary index (NoPI)

The PRIMARY INDEX and NO PRIMARY INDEX clauses have precedence over
PRIMARY KEY and UNIQUE constraints.
If the NO PRIMARY INDEX clause is specified AND if PRIMARY KEY or UNIQUE constraints
are also defined, these will be implemented as Unique Secondary Indexes.

• It may be unusual to create a NoPI table with these additional indexes.

Analyze Primary Index Criteria

Page 16-7

CREATE TABLE – Indexing Rules
The primary index may be explicitly specified at table create time. If not, a primary index
choice will be made based on other choices made. Primary key and uniqueness constraints
are always implemented by Teradata as unique indexes, either primary or secondary.
This chart assumes the system default is to create tables with a Primary Index.
The index implementation schedule is as follows:
Is a PI specified?  No
    PK specified?                                PK = UPI
    PK specified and UNIQUE constraints
    specified?                                   PK = UPI; UNIQUE constraints = USI(s)
    UNIQUE column level constraints only
    specified?                                   1st UNIQUE column level constraint = UPI;
                                                 other UNIQUE constraints = USI(s)
    UNIQUE column level constraints and table
    level UNIQUE constraints specified?          1st UNIQUE column level constraint = UPI;
                                                 other UNIQUE constraints = USI(s)
    UNIQUE table level constraints only
    specified?                                   1st UNIQUE table level constraint = UPI;
                                                 other table level UNIQUE constraints = USI(s)
    Neither specified?                           1st column = NUPI

Is a PI specified?  Yes
    PK specified?                                PK = USI
    PK specified and UNIQUE constraints
    specified?                                   PK = USI; UNIQUE constraints = USI(s)
    UNIQUE constraints only specified?           UNIQUE constraints = USI(s)

Page 16-8

Analyze Primary Index Criteria

CREATE TABLE – Indexing Rules
Unspecified Primary Index option – assuming system default is "Primary Index"
If    PRIMARY KEY specified                              PK = UPI
else  1st UNIQUE column level constraint specified       column = UPI
else  1st UNIQUE table level constraint specified        column(s) = UPI
else  1st column specified*                              column = NUPI

* If the system default is "No Primary Index" AND none of the following have been
  specified (Primary Index, PK, or UNIQUE), then the table is created as a NoPI table.

Specified PRIMARY INDEX or NO PRIMARY INDEX
If    PRIMARY KEY is also specified                      PK = USI
and   any UNIQUE constraint (column or table level)      column(s) = USI

Every PK or UNIQUE constraint is always implemented as a unique index.

Analyze Primary Index Criteria

Page 16-9

Order of Preference Exercise
Complete the exercise on the facing page. Answers will be provided by your instructor.
Some additional examples include:
If table_5 was created as follows:
CREATE TABLE table_5
(col1 INTEGER NOT NULL
,col2 INTEGER NOT NULL
,col3 INTEGER NOT NULL
,CONSTRAINT uniq1 UNIQUE (col1,col2)
,CONSTRAINT uniq2 UNIQUE (col3));

Then, the indexes are a UPI on (col1,col2) and a USI on (col3).
If table_5 was created as follows:
CREATE TABLE table_5
(col1 INTEGER NOT NULL
,col2 INTEGER NOT NULL
,col3 INTEGER NOT NULL
,CONSTRAINT uniq1 UNIQUE (col3)
,CONSTRAINT uniq2 UNIQUE (col1,col2));

Then, the indexes are a UPI on (col3) and a USI on (col1,col2).
Notes:
•  Recommendation: Specify the Primary Index when creating a table.
•  Table level constraints are typically used to specify a PK or UNIQUE constraint for
   multiple columns.

Page 16-10

Analyze Primary Index Criteria

Order of Preference Exercise
Assuming the system default is "Primary Index", show the indexes that are created as a result of the DDL.
CREATE TABLE table_1
(col1 INTEGER NOT NULL UNIQUE
,col2 INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE table_2
(col1 INTEGER NOT NULL PRIMARY KEY
,col2 INTEGER)
PRIMARY INDEX (col2);
CREATE TABLE table_3
(col1 INTEGER
,col2 INTEGER NOT NULL);

col1 =
col2 =

col1 =
col2 =

col1 =
col2 =

CREATE TABLE table_4
(col1 INTEGER NOT NULL
,col2 INTEGER NOT NULL
,col3 INTEGER NOT NULL UNIQUE
,CONSTRAINT pk1 PRIMARY KEY (col1,col2));

col1 =
col2 =
col3 =
(col1,col2) =

CREATE TABLE table_5
(col1 INTEGER NOT NULL
,col2 INTEGER NOT NULL
,col3 INTEGER NOT NULL UNIQUE
,CONSTRAINT uniq1 UNIQUE (col1,col2));

col1 =
col2 =
col3 =
(col1,col2) =

Analyze Primary Index Criteria

UPI  = Unique Primary Index
NUPI = Non-Unique Primary Index
USI  = Unique Secondary Index

Page 16-11

Primary Index Characteristics
Each table has one and only one Primary Index. A Primary Index may be different than a
Primary Key.

UPI = Best Performance, Best Distribution
UPIs offer the best performance possible for several reasons:
•  A Unique Primary Index involves a single base table row at most.
•  No spool file is ever required.
•  Single value access via the Primary Index is a one-AMP operation and uses only one I/O.

NUPI = Good Performance, Good Distribution
NUPI performance differs from UPI performance because:
•  Non-Unique Primary Indexes may involve multiple table rows.
•  Duplicate values go to the same AMP and the same data block, if possible.
•  Multiple I/Os are required if the rows do not fit in a single data block.
•  Spool files are used when necessary.
•  A duplicate row check is required on INSERT and UPDATE for a SET table.

Page 16-12

Analyze Primary Index Criteria

Primary Index Characteristics
Primary Indexes (UPI and NUPI)
• A Primary Index may be different than a Primary Key.
• Every table has only one Primary Index.
• A Primary Index may contain null(s).
• Single-value access uses ONE AMP and, typically, one I/O.
Unique Primary Index (UPI)
• Involves a single base table row at most.
• No spool file is ever required.
• The system automatically enforces uniqueness on the index value.
Non-Unique Primary Index (NUPI)
•  May involve multiple base table rows.
•  A spool file is created when needed.
•  Duplicate values go to the same AMP and the same data block.
•  Only one I/O is needed if all the rows fit in a single data block.
•  Duplicate row check is required for a Set table.

Analyze Primary Index Criteria

Page 16-13

Multi-Column Primary Indexes
In practice, Primary Indexes are sometimes composed of several columns. Such composite
indexes are known as multi-column Primary Indexes. They are used quite commonly and
you can probably think of several existing applications that utilize them.

Increased Uniqueness
There are both advantages and disadvantages to using multi-column PIs. Perhaps the most
important advantage is that by combining several columns, you can produce an index that is
much more unique than any of the component columns. This increased uniqueness will
result in better data distribution, among other benefits.
For example:
•  PI = Lastname
•  PI = Lastname + Firstname
•  PI = Lastname + Firstname + MI

The above example points out how better data distribution occurs. Notice that each
succeeding Primary Index is more unique than the one preceding it. That is, there are far
fewer individuals with identical last and first names than there are with the same last name,
and so on.
Increasing uniqueness means that as the number of columns increases:
•  The number of distinct values increases.
•  The number of rows per value decreases.
•  The selectivity increases.

Trade-off
The disadvantage involved with multi-column indexes is that as the number of columns
increases, the index becomes less usable.
A multi-column index can only be accessed when values for all columns are specified in the
SQL statement. If a single value is omitted, the Primary Index cannot be used.
It is important for the physical designer to balance these factors and use multi-column
indexes that have just enough columns. This will result in optimum uniqueness while
reducing unnecessary full table scans.
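
As a sketch of this trade-off (the table and data are hypothetical and are not part of the
exercises), consider a two-column NUPI:

   CREATE TABLE Customer_Names
      ( Last_name   CHAR(30) NOT NULL
      , First_name  CHAR(30) NOT NULL
      , City        CHAR(30) )
   PRIMARY INDEX (Last_name, First_name);

   -- Values for ALL PI columns are supplied: the row hash can be computed,
   -- so this is a one-AMP Primary Index access.
   SELECT * FROM Customer_Names
   WHERE  Last_name = 'Smith' AND First_name = 'Ann';

   -- Only part of the PI is supplied: the hash cannot be computed,
   -- so the Primary Index cannot be used (likely a full table scan).
   SELECT * FROM Customer_Names
   WHERE  Last_name = 'Smith';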

Page 16-14

Analyze Primary Index Criteria

Multi-Column Primary Indexes
Advantage
More columns = more uniqueness

• Number of distinct values increase.
• Rows/value decreases.
• Selectivity increases.
Disadvantage
More columns = less usability

• PI can only be used when values for all PI columns are provided in SQL
statement.

• Partial values cannot be hashed.

Analyze Primary Index Criteria

Page 16-15

Primary Index Considerations
The facing page summarizes the concepts you have seen throughout this module and
provides a list of the most important considerations when choosing a Primary Index.
The first three considerations summarize the three types of demographics: Access,
Distribution, and Volatility.
You should choose a column with good distribution to maximize parallel processing. A
good rule-of-thumb is to base your Primary Index on the column(s) most often used for
access (if you don't have too many rows per value) to maximize one-AMP operations.
Finally, Primary Index values should be stable to reduce maintenance resource overhead.
Make sure that the number of distinct values for a PI is greater than the number of AMPs in
the system, whenever possible, or some AMPs will have no rows.
Duplicate values hash to the same AMP and are stored in the same data block. If the index
is very non-unique, multiple data blocks are used and incur multiple I/Os.
Very non-unique PIs may skew space usage among AMPs and cause Database Full
conditions on AMPs where excessive numbers of rows are stored.

Page 16-16

Analyze Primary Index Criteria

Primary Index Considerations
• Base PI on the column(s) most often used for access, provided that the values
are unique or nearly unique.

• Choose a column (or columns) with good distribution and no spikes.
– NULLs and zero (for numeric data types) hash to binary zeroes and to the same
AMP.

• Distinct values distribute evenly across all AMPs.
– For large tables, the number of Distinct Primary Index values should be much
greater (at least 10X; 50X may be better guideline) than the number of AMPs.

• Duplicate values hash to the same AMP and are stored in the same data block
when possible.

– Very non-unique values use multiple data blocks and incur multiple I/Os.
– Very non-unique values may skew space usage among AMPs and cause premature
Database Full conditions.

– A large number of NUPI duplicate values on a SET table can cause expensive
duplicate row checks.

• Primary Index values should not be highly volatile.

Analyze Primary Index Criteria


Page 16-17

PKs and Duplicate Rows
Each row in table or entity in a good logical model will be uniquely identified by the table's
primary key.




Every table must have a Primary Key.
Primary Keys (PKs) must be unique.
Primary Keys cannot be changed.

In Set tables, the Teradata Database does not allow duplicate rows. When a table has a
Unique Primary Index (UPI), the UPI enforces uniqueness. When a table has a Non-Unique
Primary Index (NUPI), the matter can become more complicated.
In the case of a NUPI (without a USI defined), the file system must compare data values
byte-by-byte within a Row Hash in order to ensure uniqueness. Many NUPI duplicates
result in lots of duplicate row checks, which can be quite expensive in terms of system
resources.
The way to avoid such a situation is to define a USI on the table whenever you have a NUPI.
The USI does the job of enforcing uniqueness and thus saves you the cost of doing duplicate
row checks. Often, the best column(s) to use when defining such a USI is the PK.
Specifying a UNIQUE constraint on a column(s) other than the Primary Index also causes
the creation of a Unique Secondary Index.
An exception to the above is found when using the load utilities, such as FastLoad and
MultiLoad. These utilities do not allow the use of Secondary Indexes to enforce uniqueness.
Therefore, a full row comparison is still necessary.

Page 16-18

Analyze Primary Index Criteria

PKs and Duplicate Rows
Rule: Primary Keys Must be UNIQUE and NOT NULL.

• This rule of Relational Theory eliminates duplicate rows, which have
plagued the industry for decades.

• With Set tables (the default in Teradata transaction mode), the Teradata
database does not allow duplicate rows.

• With Multiset tables, the Teradata database allows duplicate rows.
– All indexes must be non-unique indexes (NUPI and NUSI) in order to allow
duplicate values.

– A unique index (UPI or USI) will prevent duplicate index values, and therefore,
duplicate rows (even if the table is created as Multiset).

• If no unique index exists for a SET table, the file system compares data
values byte by byte within a Row Hash to ensure row uniqueness in a table.

– Many NUPI duplicates result in expensive duplicate row checks.
– To avoid these duplicate row checks, use a Multiset table.

Analyze Primary Index Criteria

Page 16-19

NUPI Duplicate Row Check
Set tables (the default) do not allow duplicate rows. When a new row is inserted into a Set
table with a Non-Unique Primary Index, the system must perform a NUPI Duplicate Row
Check.
The table on the facing page illustrates the number of logical reads that must occur when
this happens. The middle column is the number of logical reads required before that one
row can be inserted. The right hand column shows how many cumulative logical reads
would be required to insert all the rows up to and including that one.
As you can see, when you have a NUPI with excessive rows per value, the number of logical
reads becomes prohibitively high; for N rows sharing the same NUPI value, the cumulative
number of logical row reads is N x (N-1) / 2. It is very important to limit the NUPI rows
per value whenever possible.
The best way to avoid NUPI Duplicate row checks is to create the table as a MULTISET
table.
Note: USIs should be used for access or uniqueness enforcement. They should not be
used just to avoid duplicate row checking, since sometimes they may be used and at
other times they will not be used. The overhead of a USI does not justify the cost of
trying to avoid the duplicate row check and they don't avoid the cost in most cases.
As a suggestion, keep the number of NUPI rows per value within the number of rows that
will fit into your largest block. This will allow the system to satisfy a single-value NUPI
access with one or two data block I/Os.
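
A minimal sketch (the table and column names are hypothetical) of avoiding the duplicate
row check by declaring the table as MULTISET:

   CREATE MULTISET TABLE Sales_History
      ( store_id   INTEGER NOT NULL
      , sale_date  DATE    NOT NULL
      , amount     DECIMAL(10,2) )
   PRIMARY INDEX (store_id);

   -- Because the table is MULTISET, SQL INSERTs and UPDATEs do not perform the
   -- NUPI duplicate row check, even when many rows share the same store_id value.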

Page 16-20

Analyze Primary Index Criteria

NUPI Duplicate Row Check
Limit NUPI rows per value to rows per block whenever possible.
To avoid NUPI duplicate row checks, create the table as a MULTISET table.

This chart illustrates the additional I/O overhead:

Row Number       Number of rows that must      Cumulative number of
to be inserted   be logically read first       logical row reads
       1                    0                            0
       2                    1                            1
       3                    2                            3
       4                    3                            6
       5                    4                           10
       6                    5                           15
       7                    6                           21
       8                    7                           28
       9                    8                           36
      10                    9                           45
      20                   19                          190
      50                   49                         1225
     100                   99                         4950
     200                  199                        19900
     500                  499                       124750
    1000                  999                       499500


Analyze Primary Index Criteria

Page 16-21

Primary Index Demographics
As you have seen, the three types of demographics important to choosing a Primary Index
are: Access demographics, Distribution demographics and Volatility demographics. To
make proper PI selections, you must have accurate demographics. Accurate demographics
serve to quantify all three index selection determinants.

Access Demographics
Access demographics identify index candidates that maximize one-AMP operations. Both
Value Access and Join Access are important to PI selection. The higher the value, the more
often the column is used for access.

Distribution Demographics
Distribution demographics identify index candidates that optimize parallel processing.
Choose the column(s) that provides the best distribution.

Volatility
Volatility demographics identify table columns that are UPDATEd. This item does not refer
to INSERT or DELETE operations.
Volatility demographics identify index candidates that reduce maintenance I/O. You want
to have columns with stable data values as your PI candidates.
In this module, you will see how to use Distribution demographics to select PI candidates.
Access and Volatility demographics will be presented in a later module.

Page 16-22

Analyze Primary Index Criteria

Primary Index Demographics
Access Demographics

• Identify index candidates that maximize one-AMP operations.
• Columns most frequently used for access (Value and Join).
Distribution Demographics

• Identify index candidates that optimize parallel processing.
• Columns that provide good distribution.
Volatility Demographics

• Identify index candidates with low maintenance I/O.

Without accurate demographics, index choices are unsubstantiated.
Demographics quantify all 3 index selection determinants.

Analyze Primary Index Criteria

Page 16-23

Column Distribution Demographics for a PI Candidate
Column Distribution demographics are expressed in four ways: Distinct Values, Maximum
Rows per Value, Maximum Rows NULL and Typical Rows per Value. These items are
defined below:


Distinct Values is the total number of different values a column contains. For PI
selection, the higher the Distinct Values (in comparison with the table row count),
the better. Distinct Values should be greater than the number of AMPs in the
system, whenever possible. We would prefer that all AMPs have rows from each
TABLE.



Maximum Rows per Value is the number of rows in the most common value for
the column or columns. When selecting a PI, the lower this number is, the better
the candidate. For a column or columns to qualify as a UPI, Maximum Rows per
Value must be 1.



Maximum Rows NULL should be treated the same as Maximum Rows Per Value
when being considered as a PI candidate.



Typical Rows per Value gives you an idea of the overall distribution which the
column or columns would give you. The lower this number is, the better the
candidate. Like Maximum Rows per Value, Typical Rows per Value should be
small enough to fit on one data block.

The illustration at the bottom of the facing page shows a distribution graph for a column
whose values are states. Note in the graph that 30K = Maximum Rows NULL, and 15K =
Maximum Rows per Value (CA). Typical Rows per Value is approximately 30.
You should monitor all demographics periodically as they change over time.

Page 16-24

Analyze Primary Index Criteria

Column Distribution Demographics for a PI Candidate
Distinct Values
•  The more the better (compared to table row count).
•  Should have enough values to allow for distribution to all AMPs.

Maximum Rows Per Value
•  The fewer the better.

Maximum Rows NULL
•  The fewer the better.
•  A very large number indicates a very large distribution spike.
•  Large spikes can cause serious space consumption problems.

Typical Rows Per Value
•  The fewer the better.
•  Monitor periodically as it changes over time.

[Distribution graph: rows per value for a state-code column. NULL has 30K rows (Maximum
Rows NULL), CA has 15K rows (Maximum Rows per Value), and the remaining states (AZ, GA,
HI, MI, MO, NV, NY, OH, OK, TX, VA, VT, WA) each have roughly 10 to 100 rows, with a
typical value of about 30 (Typical Rows per Value).]

Analyze Primary Index Criteria
Page 16-25

SQL to View Data Demographics
The facing page contains simple examples of SQL that can be used to determine data
demographics for a column.
The Average Rows per value and Typical Rows per value can be thought of as the Mean and
Median of a data set.

Page 16-26

Analyze Primary Index Criteria

SQL to View Data Demographics
# of Distinct Values for a column:
   SELECT COUNT(DISTINCT(column_name))
   FROM   tablename;

Max Rows per Value for all values in a column:
   SELECT   column_name, COUNT(*)
   FROM     tablename
   GROUP BY 1
   ORDER BY 2 DESC;

Max Rows per Value for 5 most frequent values:
   SELECT   TOP 5 t_colvalue, t_count
   FROM     (SELECT column_name, COUNT(*)
             FROM tablename
             GROUP BY 1) t1 (t_colvalue, t_count)
   ORDER BY t_count DESC;

Average Rows per Value for a column (mean value):
   SELECT COUNT(*) / COUNT(DISTINCT(col_name)) FROM tablename;

Typical Rows per Value for a column (median value):
   SELECT   t_count AS "Typical Rows per Value"
   FROM     (SELECT col_name, COUNT(*) FROM tablename GROUP BY 1)
               t1 (t_colvalue, t_count),
            (SELECT COUNT(DISTINCT(col_name)) FROM tablename)
               t2 (num_rows)
   QUALIFY ROW_NUMBER () OVER (ORDER BY t1.t_colvalue) = t2.num_rows / 2;

Analyze Primary Index Criteria

Page 16-27

Example of Using Data Demographic SQL
The facing page contains simple examples of SQL that can be used to determine data
demographics for a column.

Page 16-28

Analyze Primary Index Criteria

Example of Using Data Demographic SQL
# of Distinct Values for a column:
   SELECT COUNT(DISTINCT(Last_name)) AS "# Values"
   FROM   Customer;

   # Values
        464

Max Rows per Value for all values:
   SELECT   Last_name, COUNT(*)
   FROM     Customer
   GROUP BY 1
   ORDER BY 2 DESC;

   Last_name    Count(*)
   Smith              52
   Jones              41
   Wilson             38
   White              36
   Lee                36
   :                   :

Max Rows per Value for 3 most frequent values:
   SELECT   t_colvalue, t_count
   FROM     (SELECT Last_name, COUNT(*)
             FROM Customer GROUP BY 1) t_table (t_colvalue, t_count)
   QUALIFY RANK (t_count) <= 3;

   t_colvalue    t_count
   Smith              52
   Jones              41
   Wilson             38

Analyze Primary Index Criteria
Page 16-29

Example of Data Demographic SQL (cont.)
The facing page contains simple examples of SQL that can be used to determine data
demographics for a column.

Page 16-30

Analyze Primary Index Criteria

Example of Data Demographic SQL (cont.)
Average Rows per Value for a column (mean):
   SELECT   'Last_name' AS "Column Name"
           ,COUNT(*) / COUNT(DISTINCT(Last_name)) AS "Average Rows"
   FROM     Customer;

   Column Name    Average Rows
   Last_name                15

Typical Rows per Value for a column (median):
   SELECT   'Last_name' AS "Column Name"
           ,t_count     AS "Typical Rows"
   FROM     (SELECT Last_name, COUNT(*)
             FROM Customer GROUP BY 1) t_table (t_colvalue, t_count),
            (SELECT COUNT(DISTINCT(Last_name))
             FROM Customer) t_table2 (t_distinct_count)
   QUALIFY RANK (t_colvalue) = (t_distinct_count / 2);

   Column Name    Typical Rows
   Last_name                11

Analyze Primary Index Criteria

Page 16-31

TableSize View
The TableSize[V][X] views are Data Dictionary views that provide AMP Vproc
information about disk space usage at the table level, optionally for tables the current User
owns or has SELECT privileges on.

Example
The SELECT statement on the facing page looks for poorly distributed tables by displaying
the CurrentPerm figures for a single table on all AMP vprocs.
The result displays one table, table2, which is evenly distributed across all AMP vprocs in
the system. The CurrentPerm figure is nearly identical across all vprocs. The other table,
table2_nupi, is poorly distributed. The CurrentPerm figures range from 9,216 bytes to
71,680 bytes on different AMP vprocs.
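
One way to scan for poorly distributed tables (a sketch, not part of the facing page example)
is to compare the largest and the average per-AMP CurrentPerm for each table:

   SELECT   TableName
           ,MAX(CurrentPerm) AS MaxPerm
           ,AVG(CurrentPerm) AS AvgPerm
   FROM     DBC.TableSizeV
   WHERE    DatabaseName = USER
   GROUP BY 1
   ORDER BY 1;

   -- A MaxPerm value much larger than AvgPerm suggests the table is skewed.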

Page 16-32

Analyze Primary Index Criteria

TableSize View
Provides AMP Vproc disk space usage at the table level.

DBC.TableSize[V][X]
   Vproc    DatabaseName    AccountName    TableName    CurrentPerm    PeakPerm

Example: Display table distribution across AMPs.

   SELECT   Vproc
           ,CAST (TableName AS CHAR(20))
           ,CurrentPerm
           ,PeakPerm
   FROM     DBC.TableSizeV
   WHERE    DatabaseName = USER
   ORDER BY TableName, Vproc;

Result:

   Vproc   TableName      CurrentPerm   PeakPerm
     0     table2              41,472     53,760
     1     table2              41,472     53,760
     2     table2              40,960     52,736
     3     table2              40,960     52,736
     4     table2              40,960     53,760
     5     table2              40,960     53,760
     6     table2              40,960     54,272
     7     table2              40,960     54,272
     0     table2_nupi         22,528     22,528
     1     table2_nupi         22,528     22,528
     2     table2_nupi         71,680     71,680
     3     table2_nupi         71,680     71,680
     4     table2_nupi          9,216      9,216
     5     table2_nupi          9,216      9,216
     6     table2_nupi         59,392     59,392
     7     table2_nupi         59,392     59,392

Analyze Primary Index Criteria
Page 16-33

SQL to View Data Distribution
The facing page contains simple examples of SQL that can be used to determine actual data
distribution for a table.

Page 16-34

Analyze Primary Index Criteria

SQL to View Data Distribution
Ex: Display the distribution of Customer by AMP space usage.

   SELECT   Vproc
           ,TableName (CHAR(15))
           ,CurrentPerm
   FROM     DBC.TableSizeV
   WHERE    DatabaseName = DATABASE
   AND      TableName = 'Customer'
   ORDER BY 1;

   Vproc   TableName   CurrentPerm
     0     Customer         127488
     1     Customer         127488
     2     Customer         127488
     3     Customer         127488
     4     Customer         128000
     5     Customer         128000
     6     Customer         126976
     7     Customer         126976

Ex: Display the distribution of Customer by AMP row counts.

   SELECT   HASHAMP (HASHBUCKET (HASHROW (Customer_number))) AS "AMP #"
           ,COUNT(*)
   FROM     Customer
   GROUP BY 1
   ORDER BY 1;

   AMP #   Count(*)
     0          867
     1          886
     2          877
     3          870
     4          881
     5          878
     6          879
     7          862

The Row Hash functions can be used to predict the distribution of data rows for any
column in a table.

Analyze Primary Index Criteria
Page 16-35

E-R Diagram for Exercises
The E-R diagram on the facing page depicts the tables used in the exercises. Though the
names of the tables and their columns are generic, the model is properly normalized to Third
Normal Form (3NF).

Page 16-36

Analyze Primary Index Criteria

E-R Diagram for Exercises

[E-R diagram of the exercise tables: ENTITY 1, ENTITY 2, DEPENDENT, HISTORY,
ASSOCIATIVE 1, and ASSOCIATIVE 2.]

Note:
The exercise table and column names are generic so that index selections are not
influenced by names.

Analyze Primary Index Criteria

Page 16-37

Exercise 2 – Sample
The facing page has an example of how to use Distribution demographics to identify PI
candidates. On the following pages, you will be asked to identify PI candidates in a similar
manner.
Use the Primary Index Candidate Guidelines below to identify the PI candidates. Indicate
whether they are UPI or NUPI candidates. Indicate borderline candidates with a ?
In later exercises, you will make the final index choices for these tables.
Primary Index Candidate Guidelines:
•  ALL Unique Columns are PI candidates. These columns will be identified with the
   abbreviation ND for No Duplicates.
•  The Primary Key (PK) is a UPI candidate.
•  Any single column with high Distinct Values (maybe at least 10 times the number
   of AMPs), low Maximum Rows NULL, and with a Typical Rows per Value that is
   relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-38

Analyze Primary Index Criteria


Exercise 2 – Sample
On the following pages, there are sample tables with
distribution demographics.
• Indicate ALL possible Primary Index candidates
(UPI and NUPI).
• Later exercises will guide your final choices.

Example

Primary Index Candidate Guidelines:
•  PK and UNIQUE COLUMNS (ND)
•  Any single column with:
   – High Distinct values (at least 10X)
   – Low Maximums for NULLs or a Value
   – Typical Rows that is close to Max Rows

60,000,000 Rows            A       B      C       D      E      F       G       H
PK/FK                      PK,SA          FK,NN   NN,ND
Value Access               5K      2.6K   0       500K   0      0       0       52
Range Access               12      0      0       0      0      0       0       4K
Join Access                1M      0      1K      0      0      0       0       0
Join Rows                  50M     0      5K      0      0      0       0       0
Distinct Values            60M     7M     1.5M    60M    8      15M     15M     700
Max Rows/Value             1       12     500     1      8M     9       725K    90K
Max Rows/NULL              0       5      0       0      0      725K    5       10K
Typical Rows/Value         1       7      35      1      7M     3       3       80K
Change Rating              0       1      5       3      0      4       4       9
PI/SI                      UPI     NUPI   NUPI?   UPI
Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-39

Exercise 2 – Choosing PI Candidates
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-40

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates
ENTITY 1
100,000,000
Rows
PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

A

B

C

D

E

F

0
0
0
0
95M
2
0
1
3

0
0
0
0
300K
400
0
325
2

0
0
0
0
250K
350
0
300
1

0
0
0
0
40M
3
1.5M
2
1

0
0
0

PK,UA

50K
0
10M
10M
100M
1
0
1
0

1M
110
0
90
1

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-41

Exercise 2 – Choosing PI Candidates (cont.)
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-42

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates (cont.)
ENTITY 2
10,000,000
Rows

G

PK/FK

PK,SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

5K
12
100M
100M
10M
1
0
1
0

H

I

J

K

L

365
0
0
0
100K
200
0
100
0

12
0
0
0
9M
2
100K
1
9

12
0
0
0
12
1M
0
800K
1

0
0
0
0
50
240K
0
190K
2

0
260
0
180K
60
0
50
0

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-43

Exercise 2 – Choosing PI Candidates (cont.)
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-44

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates (cont.)
DEPENDENT
5,000,000
Rows

A

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

M

N

O

PK

P

Q

NN, ND

FK

SA

0
0
700K
1M
2M
4
0
1
0

0
0
0
0
50
200K
0
60K
0

0
0
0
0
90K
75
0
50
3

0
0
0
0
3M
2
390K
1
1

0
0
0
0
5M
1
0
1
0

0
0
0
0
2M
5
1M
1
1

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-45

Exercise 2 – Choosing PI Candidates (cont.)
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-46

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates (cont.)
ASSOCIATIVE 1
300,000,000
Rows

A

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

G

R

S

0
0
0
0
15K
21K
0
19K
0

0
0
0
0
800K
400
0
350
0

PK
FK

FK,SA

260
0
0
0
100M
5
0
3
0

0
0
8M
300M
10M
50
0
30
0

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-47

Exercise 2 – Choosing PI Candidates (cont.)
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-48

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates (cont.)
ASSOCIATIVE 2
100,000,000
Rows

A

M

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

G

T

U

0
0
0
0
560K
180
0
170
0

0
0
0
0
750
135K
0
100K
0

PK
FK

FK

0
0
7M
800M
50M
3
0
1
0

0
0
250K
20M
10M
150
0
8
0

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-49

Exercise 2 – Choosing PI Candidates (cont.)
Use the Primary Index Candidate Guidelines to identify the PI candidates. Indicate whether
they are UPI or NUPI candidates. Indicate borderline candidates with a question mark (?).
Primary Index Candidate Guidelines:


ALL Unique Columns are PI candidates and will be identified with the
abbreviation ND for No Duplicates.



The Primary Key (PK) is a UPI candidate.



Any single column with high Distinct Values (at least 100% greater than the
number of AMPs), low Maximum Rows NULL, and with a Typical Rows per
Value that is relatively close to the Maximum Rows per Value is a PI candidate.

Page 16-50

Analyze Primary Index Criteria

Exercise 2 – Choosing PI Candidates (cont.)
HISTORY
730,000,000
Rows

A

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

DATE

D

E

F

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

PK
FK

SA

10M
0
800M
2.4B
100M
18
0
3
0

5K
20K
0
0
730
1100K
0
900K
0

PI/SI

Collect Statistics (Y/N)

Analyze Primary Index Criteria

Page 16-51

What is a NoPI Table?
A NoPI Table is simply a table without a primary index.
Prior to Teradata Database 13.0, Teradata tables required a primary index. The primary
index was primarily used to hash and distribute rows to the AMPs according to hash
ownership. The objective was to divide data as evenly as possible among the AMPs to make
use of Teradata’s parallel processing. Each row stored in a table has a RowID which
includes the row hash that is generated by hashing the primary index value. For example,
the optimizer can choose an efficient single-AMP execution plan for SQL requests that
specify values for the columns of the primary index.
Starting with Teradata Database 13.0, a table can be defined without a primary index. This
feature is referred to as the NoPI Table feature. NoPI stands for No Primary Index.
Without a PI, the hash value as well as AMP ownership of a row is arbitrary. Within the
AMP, there are no row-ordering constraints and therefore rows can be appended to the end
of the table as if it were a spool table. Each row in a NoPI table has a hash bucket value that
is internally generated. A NoPI table is internally treated as a hashed table; it is just that
typically all the rows on one AMP will have the same hash bucket value.

Page 16-52

Analyze Primary Index Criteria

What is a NoPI Table?
What is a No Primary Index (NoPI) Table?

• It is simply a table without a primary index – a Teradata 13.0 feature.
• As rows are inserted into a NoPI table, rows are always appended at the end of the
table and never inserted in the middle of a hash sequence.

– Organizing/sorting rows based on row hash is therefore avoided.
Basic Concepts

• Rows will still be distributed between AMPs. New code (Random Generator) will
determine which AMP will receive rows or blocks of rows.

• Within an AMP, rows are simply appended to the end of the table. Rows will have a
unique RowID – the Uniqueness Value is incremented.

Benefits

• A NoPI table will reduce skew in intermediate ETL tables which have no natural
Primary Index.

• Loads (FastLoad and TPump Array Insert) into a NoPI staging table are faster.

Analyze Primary Index Criteria

Page 16-53

Reasons to Consider Using NoPI Tables
The facing page identifies various reasons to consider using NoPI tables.
Why is a NoPI table useful?


A NoPI can be very useful in those situations when the default primary index (first
column) causes skewing of data between AMPs and performance degradation.



This type of table provides a performance advantage in that data can be loaded and
stored quickly into a NoPI table using FastLoad or TPump Array INSERT.

Page 16-54

Analyze Primary Index Criteria

Reasons to Consider Using NoPI Tables
Reasons to consider using a NoPI Table

• Utilize NoPI tables instead of arbitrarily defaulting to first table column or creating an
unnatural Primary Index from many columns.

• Some ETL tools generate intermediate tables to store data without a known
distribution of values.
If the first column is used (the default) as the primary index (NUPI), this may lead to
skewed data and performance issues.

– The system default can be set to create tables without a primary index.
• As a staging table to be used with the mini-batch loading technique.
• A NoPI table can be used as a Sandbox table (or any table) where data can be inserted
until an appropriate indexing method is determined.

• A NoPI table can be used as a Log file.
• As a Column Partitioned (columnar) table – Teradata 14.0 feature.

Analyze Primary Index Criteria

Page 16-55

Creating a Table without a PI
The facing page identifies the syntax to create a table without a primary index.
If you attempt to include the key word SET (set table) and NO PRIMARY INDEX in the
same CREATE TABLE statement, you will receive a syntax error.

Page 16-56

Analyze Primary Index Criteria

Creating a Table without a PI
To create a NoPI table, specify the NO PRIMARY INDEX clause in the CREATE
TABLE statement.
CREATE TABLE tablename
   ( column_1  data_type,
     column_2  data_type,
     ... )
NO PRIMARY INDEX;
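
For example (the table and column names are hypothetical):

   CREATE TABLE Sales_Staging
      ( store_id   INTEGER
      , sale_date  DATE
      , amount     DECIMAL(10,2) )
   NO PRIMARY INDEX;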

Considerations:

– When a table is created with no primary index, the TableKind column in the DBC.TVM
table is set to 'O' instead of 'T'.

– If PRIMARY KEY or UNIQUE constraints are also defined, these will be implemented
as Unique Secondary Indexes.

– A NoPI table is automatically created as a MULTISET table.

Analyze Primary Index Criteria

Page 16-57

How is a NoPI Table Implemented?
The NoPI Table feature is another step toward extending or supporting Mini-Batch. By
allowing a table with no primary index to act as a staging table, data can be loaded into the
table more efficiently and therefore faster. All of the rows in a data request, after being
received by Teradata and converted into proper internal format, can be appended to a NoPI
table without having to be redistributed to their hash-owning AMPs. Rows in a NoPI table
are not hashed based on the primary index because there isn’t one.
The hash values are all internally controlled and generated and therefore the rows can be
stored in any particular order and in any AMP. That means sorting of the rows is avoided.
The performance advantage, especially for FastLoad, from using a NoPI table is most
significant for applications that currently load data into a staging table to be transformed or
standardized before being stored into another staging table or the target table. For those
applications, using a NoPI table can avoid the unnecessary row redistribution and sorting
work. Another advantage for FastLoad is that users can quickly load data into a NoPI table
and be done with the acquisition phase freeing up Client resources for other work.
For TPump, the performance advantage can be much bigger, especially for applications that
were not able to pack many rows into the same AMP step with a traditional PI table. With a
NoPI table, all rows in a data request are packed into the same AMP step independently of
the system configuration and the clustering of data. This will generally lead to large
reductions in CPU and I/O usage.

Page 16-58

Analyze Primary Index Criteria

How is a NoPI Table Implemented?
Rows are distributed between AMPs using a random generator. Within an AMP,
rows are simply added to a table in sequential order.

• The random generator is designed in such a way that data will be balanced out
between the AMPs.

• Although there is no primary index in a NoPI table, rows will still have a valid 64-bit
RowID.

The first part of the RowID is based on a hash bucket value (16 or 20 bits) that is
internally generated and controlled by the AMP.

• Typically, all the rows in a table on one AMP will have the same hash bucket value,
but will have different uniqueness values.

There are two separate steps used with a NoPI table.
1. A new internal function (e.g., random generator) is used to choose a hash bucket
which effectively determines which AMP the row(s) are sent to.
2. The AMP internally selects a hash bucket value that the AMP owns and uses it as the
first part (16 or 20 bits) of the RowID.

Analyze Primary Index Criteria

Page 16-59

NoPI Random Generator
For SQL-based functions, the PE uses the following technique for the random generator.
The DBQL Query ID is used by the random generator to select a random row hash. The
approach is to generate a random row hash in such a way that for a new request, data will
generally be sent to a different AMP from the one that the previous request sent data to. The
goal is to balance out the data as much as possible without the use of the primary index. The
DBQL Query ID is selected for this purpose because it uses the PE vproc ID in its high
digits and a counter-based value in its low digits.
There are two cases for INSERT; one is when only one single data row is processed and the
other is when multiple data rows are processed with an Array INSERT request. In the case
of an Array INSERT request, rows are sorted by their hash-owning AMPs so that the rows
going to the same AMP can easily be grouped together into the same step. This random row
hash will be generated once per request so that in the case of Array INSERT, the same
random row hash is used for all of the rows. This means they all will be sent to the same
AMP and usually in the same step.
FastLoad sends blocks of data to the AMPs. Each AMP (that receives blocks of data) uses
random generator code to distribute blocks of data between all of the AMPs in a round
robin fashion.

Page 16-60

Analyze Primary Index Criteria

NoPI Random Generator
How is the AMP selected that will receive the row (or block of rows)?

• The random generator can be executed at the PE or at the AMP level depending on the
type of request (e.g., SQL versus FastLoad).

For SQL-based functions, the PE uses the random generator.

• The DBQL Query ID is used by the random generator to select a random hash value.
– The approach is to generate a random hash bucket value in such a way that for a
new request, data will generally be sent to a different AMP from the one that the
previous request sent data to.

– In the case of an Array INSERT request, this random hash bucket value will be
generated once per request so that in the case of Array INSERT, the same
random hash bucket value is used for all of the rows.

For FastLoad-based functions, the AMP uses random generator code to
distribute blocks of data between the AMPs in a round robin fashion.

Analyze Primary Index Criteria

Page 16-61

The Row ID for a NoPI Table
For a NoPI table, the AMP will assign a RowID (64 bits) for a row or a set of rows using a
hash bucket that the AMP owns. For a NoPI table, the RowID will consist of a 20-bit hash
bucket followed by 44 bits that are used for the uniqueness part of the RowID. Only the
hash bucket (the first 20 bits) of the row hash portion is used.
As more rows are added to the table, the uniqueness value is sequentially incremented.
For systems using a 16-bit hash buckets, the RowID for a NoPI table will have 16 bits for
the hash bucket value and 48 bits for the uniqueness id.

Page 16-62

Analyze Primary Index Criteria

The Row ID for a NoPI Table
The RowID will still be 64 bits, but it is utilized a little differently in a NoPI table.

• The first 20 bits represent the hash bucket that is internally selected by the AMP.
• Remaining 44 bits are used for the uniqueness value of rows in a NoPI table.
• Note: Systems may be configured to use 16 bits for the hash bucket numbers – if
so, then the uniqueness value will utilize 48 bits of the RowID.
Row ID for NoPI table

   Hash Bucket          Uniqueness Value
   20 (or 16) bits      44 (or 48) bits

Each row still has a Row ID as a prefix.
Rows are logically maintained in Row ID sequence.

   Row ID                             Row Data
   Hash Bucket   Uniqueness           Cust_No    Last_Name    First_Name
   000E7         00000000001          001018     Reynolds     Jane
   000E7         00000000002          001020     Davidson     Evan
   000E7         00000000003          001031     Green        Jason
   000E7         00000000004          001014     Jacobs       Paul
   000E7         00000000005          001012     Garcia       Jose
   000E7         00000000006          001021     Carnet       Jean
     :                :                  :           :            :

Analyze Primary Index Criteria
Page 16-63

The Row ID for a NoPI Table (cont.)
For a NoPI table, the AMP will assign a RowID (64 bits) for a row or a set of rows using a
hash bucket that the AMP owns. This 64-bit RowID can be used by secondary and join
indexes.
What is different about the RowID for a NoPI table is that the uniqueness id is 44 bits long
instead of 32 bits. The additional 12 bits that would otherwise be part of the row hash are
added to the 32-bit uniqueness value, giving a total of 44 bits to use for the uniqueness part of
the RowID. For each hash bucket, there can be up to approximately 17 trillion rows per AMP
(2^44 = 17,592,186,044,416).
For systems using a 16-bit hash buckets, the RowID for a NoPI table will have 16 bits for
the hash bucket value and 48 bits for the uniqueness id.
The RowID is still 64 bits long and a unique identifier of a row within a table.

Page 16-64

Analyze Primary Index Criteria

The Row ID for a NoPI Table (cont.)
The RowID is 64 bits and can be referenced by secondary and join indexes.

• The first 20 (or 16) bits represent the hash bucket value which is internally chosen by
and controlled by the AMP.

• Remaining 44 (or 48) bits are used for the uniqueness value of rows in a NoPI table.
This module assumes that 20-bit hash bucket numbers are used.

– The uniqueness value starts from 1 and will be sequentially incremented.
– With 44 bits, there can be approximately 17 trillion rows on an AMP.
• Normally, all rows in a NoPI table on an AMP will have the same hash bucket value
(first 20 bits) and the 44-bit uniqueness value will start at 1 and be sequentially
incremented.

• Each row in a NoPI table will have a RowID with a hash bucket value that is actually
owned by the AMP storing the row.

Fallback and index maintenance work the same as if the table is a primary index
table.
As always, the RowID is transparent to the end-user.
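For example, a USI can be placed on a NoPI table in the usual way; the index subtable entries
reference the 64-bit RowIDs described above (a sketch using the hypothetical Stg_Orders table
from earlier in this module):

	CREATE UNIQUE INDEX (order_id) ON Stg_Orders;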

Analyze Primary Index Criteria

Page 16-65

Multiple NoPI Tables at the AMP Level
The facing page illustrates an example of two NoPI tables in a 27-AMP system.
Other NoPI considerations include:

Archive/Recovery Issues
Archive/Restore will be supported for NoPI tables. Archiving a table or a database and
restoring or copying it to the same system or a different system works with the existing
scheme for NoPI tables when no data redistribution takes place (same number of AMPs).
Data redistribution takes place when there is a difference in configuration or hash function
between the source system and the target system. In the case of a difference in
configuration, each row in a table will be examined, and if its hash bucket belongs to some
other AMP under the new configuration, that row will be redistributed to its hash-owning
AMP.
Since one hash bucket is normally enough to assign RowIDs to all of the rows on each
AMP, when data is restored or copied to a different configuration with more AMPs, there will
be AMPs that do not receive any data at all. This means that data in a NoPI table can be
skewed after a Restore or Copy.
This matters because permanent space is divided equally among the AMPs whether or not
they receive any data. Because some AMPs get no data from a Restore or Copy, other
AMPs will get more data than they had on the source system, and this will require
more space to be allocated overall.
However, a NoPI table is intended to be a staging table that does not stay around for long, so
it is not expected that many NoPI tables will be restored or copied.

Reconfig Issues
Reconfig will be supported for NoPI tables. The issue with Reconfig is very similar to that of
a Restore or Copy to a different configuration. Although rows in a NoPI table are not hashed
based on the primary index and the AMPs where they reside are arbitrary, each row does
have a RowID with a hash bucket that is owned by the AMP storing that row. Redistributing
rows in a NoPI table via Reconfig can be done by sending each row to the AMP that owns
the hash bucket in that row based on the new configuration map. As with Restore and Copy,
Reconfig can make a NoPI table skewed by going to a configuration with more AMPs.

Page 16-66

Analyze Primary Index Criteria

Multiple NoPI Tables at the AMP Level
AMP 0   ...   AMP 3   ...   AMP 17   ...   AMP 26
(detail shown below for AMP 3 and AMP 17)

AMP 3
                TableID          Row ID                      Row Data
                                 Hash     Uniq Value
NoPI Table1     00089A (Base)    000E7    00000000001        ...
                00089A (Base)    000E7    00000000002        ...
                00089A (Base)    000E7    00000000003        ...
                00089A (Base)    000E7    00000000004        ...
NoPI Table2     00089B (Base)    000E7    00000000001        ...
                00089B (Base)    000E7    00000000002        ...
                00089B (Base)    000E7    00000000003        ...
                00089B (Base)    000E7    00000000004        ...

AMP 17
                TableID          Row ID                      Row Data
                                 Hash     Uniq Value
NoPI Table1     00089A (Base)    0003F    00000000001        ...
                00089A (Base)    0003F    00000000002        ...
                00089A (Base)    0003F    00000000003        ...
                00089A (Base)    0003F    00000000004        ...
NoPI Table2     00089B (Base)    0003F    00000000001        ...
                00089B (Base)    0003F    00000000002        ...
                00089B (Base)    0003F    00000000003        ...
                00089B (Base)    0003F    00000000004        ...

Data within an AMP is logically stored in Table ID / Row ID sequence.

Analyze Primary Index Criteria

Page 16-67

Loading Data into a NoPI Table
The facing page summarizes various techniques of getting data inserted into a NoPI table.
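For example (a sketch using the hypothetical Stg_Orders table from earlier in this module and
an illustrative Orders_Source table):

	INSERT INTO Stg_Orders VALUES (1001, DATE '2012-01-05', 259.95);

	INSERT INTO Stg_Orders
	SELECT  order_id, order_date, amount
	FROM    Orders_Source;     -- source rows are appended locally on each AMP; no redistribution or sorting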

Page 16-68

Analyze Primary Index Criteria

Loading Data into a NoPI Table
Simple INSERTs

• For a simple INSERT, the PE selects a random AMP to which the row is sent. That AMP
then converts the row into the proper internal format and appends it to the end of the NoPI table.
INSERT–SELECT

• When inserting data from a source PI (or NoPI) table into a NoPI target table, data from the
source table will NOT be redistributed and will be locally appended into the target table.
INSERT-SELECT to a target NoPI table can result in a skewed NoPI table if the source table
is skewed.
FastLoad

• Blocks of data are sent to the AMP load sessions and the AMP random generator code
randomly distributes the blocks between the AMPs usually resulting in even distribution of
the data between AMPs.
TPump

• With TPump Array INSERT, rows are packed together in a request and distributed to an
AMP and then appended to the NoPI table on that AMP. Different requests are distributed to
different AMPs by the PE. This will usually result in even distribution of the data between the
AMPs.

Analyze Primary Index Criteria

Page 16-69

NoPI Options
The following options are available to a NoPI table:

•  FALLBACK
•  Secondary indexes – USI and NUSI
•  Join and reference indexes
•  Primary Key and Foreign Key constraints are allowed on a NoPI table.
•  LOBs are allowed on a NoPI table.
•  INSERT and DELETE trigger actions are allowed on a NoPI table.
   – UPDATE trigger actions will be allowed starting with Teradata 13.00.00.03.
•  NoPI table can be a Global Temporary or Volatile table.
•  COLLECT/DROP STATISTICS are allowed on a NoPI table.
•  FastLoad – note that duplicate rows are loaded and not deleted with a NoPI table.

The following limitations apply to a NoPI table:

•  SET is not allowed. Default is MULTISET for both Teradata and ANSI mode.
•  No columns are allowed to be specified for the primary index.
•  Partitioned primary index is not allowed.
•  Permanent journaling is not allowed.
•  Identity column is not allowed.
•  Cannot be created as a queue table or as an error table.
•  Hash index is not allowed on a NoPI table.
•  MultiLoad cannot be used to load a NoPI table.
•  UPDATE, UPSERT, and MERGE-INTO operations using the NoPI table as the
   target table are not allowed.
   – UPDATE will be available with Teradata 13.00.00.03.

Page 16-70

Analyze Primary Index Criteria

NoPI Table Options
Options available with NoPI tables

•  FALLBACK
•  Secondary indexes – USI and NUSI
•  Join and reference indexes
•  Primary Key and Foreign Key constraints are allowed.
•  LOBs are allowed on a NoPI table.
•  INSERT and DELETE trigger actions are allowed on a NoPI table.
   – UPDATE trigger actions will be allowed starting with Teradata 13.00.00.03.
•  Can be a Global Temporary or Volatile table.
•  COLLECT/DROP STATISTICS are allowed.
•  FastLoad – note that duplicate rows are loaded and not deleted with a NoPI table.

Limitations of NoPI tables

•  SET tables are not allowed.
•  Partitioned primary index is not allowed.
•  Permanent journaling is not allowed.
•  Identity column is not allowed.
•  Cannot be a queue table or an error table.
•  Hash index is not allowed on a NoPI table.
•  MultiLoad cannot be used on a NoPI table.
•  UPDATE, UPSERT, and MERGE-INTO operations using the NoPI table as the
   target table are not allowed.
   – UPDATE will be available with Teradata 13.00.00.03.

Analyze Primary Index Criteria

Page 16-71

Summary
The facing page summarizes some of the key concepts covered in this module.

Page 16-72

Analyze Primary Index Criteria

Summary
Tables with a Primary Index:

• Base PI on the column(s) most often used for access, provided that the values are
unique or nearly unique.

• Duplicate values hash to the same AMP and are stored in the same data block when
possible.

• PRIMARY KEY and/or UNIQUE constraints are always implemented as a unique index
(either a UPI or a USI).

Tables without a Primary Index:

• Although there is no primary index in a NoPI table, rows do have a valid row ID with
both hash and uniqueness.

– Hash value is internally selected in the AMP
• Rows in a NoPI table will be evenly distributed between the AMPs based upon new
code (i.e., the random generator).

Analyze Primary Index Criteria

Page 16-73

Module 16: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 16-74

Analyze Primary Index Criteria

Module 16: Review Questions
1. Which trade-off must be balanced to make the best choice for a primary index? ____
   a. Access and volatility
   b. Access and block size
   c. Block size and volatility
   d. Access and distribution

2. When volatility is considered as one of the Primary Index choice criteria, what is analyzed? ____
   a. Degree of uniqueness
   b. How often the data values will change
   c. How often the fixed length rows will change
   d. How frequently the column is used for access

3. To optimize the use of disk space, the designer should choose a primary index that ________.
   a. is non-unique
   b. consists of one column
   c. is unique or nearly unique
   d. consists of multiple columns
   e. has fewer distinct values than AMPs

Analyze Primary Index Criteria

Page 16-75

Module 16: Review Questions (cont.)
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 16-76

Analyze Primary Index Criteria

Module 16: Review Questions (cont.)
4. For NoPI tables, what are 2 ways in which the Random Generator is executed?
   a. At the AMP level with FastLoad
   b. At the PE level for ad hoc SQL requests
   c. At the TPump client level for array insert operations
   d. At the AMP level for INSERT-SELECT into an empty NoPI table

5. Assume DBSControl flag #53 (Primary Index Default) is set to N (No Primary Index), which two
   indexes are created for TableX given the following DDL command?

   CREATE TABLE TableX
     (col1   INTEGER    NOT NULL UNIQUE
     ,col2   CHAR(10)   NOT NULL PRIMARY KEY
     ,col3   CHAR(80));

   a. col1 will be a UPI
   b. col1 will be a USI
   c. col2 will be a UPI
   d. col2 will be a USI

6. Which two options are permitted for NoPI tables?
   a. Fallback
   b. MultiLoad
   c. Hash Index
   d. BLOBs and CLOBs

Analyze Primary Index Criteria

Page 16-77

Lab Exercise 16-1
Check your understanding of the concepts discussed in this module by completing the lab
exercise as directed by your instructor.

Page 16-78

Analyze Primary Index Criteria

Lab Exercise 16-1
Lab Exercise 16-1
Purpose
In this lab, you will use Teradata SQL Assistant to evaluate various columns of a table as primary index
candidates.
What you need
Populated PD.Employee table; your empty Employee table
Tasks
1. INSERT/SELECT all rows from the populated PD.Employee table to your “Employee” table. Verify
the number of rows in your table.
INSERT INTO Employee SELECT * FROM PD.Employee;
SELECT COUNT(*) FROM Employee;

Analyze Primary Index Criteria

Count = _________

Page 16-79

Lab Exercise 16-1 (cont.)
Use the following SQL to determine the column metrics for this Lab.

# of Distinct Values for a column:
   SELECT   COUNT(DISTINCT(column_name))
   FROM     tablename;

Max Rows per Value for all values in a column:
   SELECT   column_name, COUNT(*)
   FROM     tablename
   GROUP BY 1
   ORDER BY 2 DESC;

Max Rows with NULL in a column:
   SELECT   COUNT(*)
   FROM     tablename
   WHERE    column_name IS NULL;

Average Rows per Value for a column (mean value):
   SELECT   COUNT(*) / COUNT(DISTINCT(col_name))
   FROM     tablename;

Typical Rows per Value for a column (median value):
   SELECT   t_count AS "Typical Rows per Value"
   FROM     (SELECT col_name, COUNT(*)
             FROM tablename GROUP BY 1)   t1 (t_colvalue, t_count),
            (SELECT COUNT(DISTINCT(col_name))
             FROM tablename)              t2 (num_rows)
   QUALIFY  ROW_NUMBER () OVER (ORDER BY t1.t_colvalue) = t2.num_rows / 2;

Page 16-80

Analyze Primary Index Criteria

Lab Exercise 16-1 (cont.)
2. Collect column demographics for each of these columns in Employee and determine if the column
would be a primary index candidate or not.
By using the SHOW TABLE Employee command, you should be able to complete the
Employee_number information without executing any SQL.
                      Distinct    Max Rows      Max Rows    Avg Rows     Candidate
                      Values      for a Value   NULL        per Value    for PI (Y/N)

Employee_Number       _______     _______       _______     _______      _______
Dept_Number           _______     _______       _______     _______      _______
Job_Code              _______     _______       _______     _______      _______
Last_name             _______     _______       _______     _______      _______

Analyze Primary Index Criteria

Page 16-81

Lab Exercise 16-2
Distribution of table space by AMP:

   SELECT    Vproc, TableName (CHAR(15)), CurrentPerm
   FROM      DBC.TableSizeV
   WHERE     DatabaseName = DATABASE
   AND       TableName = 'tablename'
   ORDER BY  1;

Page 16-82

Analyze Primary Index Criteria

Lab Exercise 16-2
Lab Exercise 16-2
Purpose
In this lab, you will use the DBC.TableSizeV view to determine space distribution on a per AMP basis.
What you need
Your populated Employee table.
Tasks
1. Use SHOW TABLE command to determine which column is the Primary Index. PI = ______________
Determine the AMP space usage of your Employee table using DBC.TableSizeV.
AMP #_____ has the least amount of permanent space – amount __________
AMP #_____ has the greatest amount of permanent space – amount __________
2. Create a new table named Employee_2 with the same columns as Employee except specify
Last_name as the Primary Index.
Use INSERT/SELECT to populate Employee_2 from Employee.
Determine the AMP space usage of your Employee_2 table using DBC.TableSizeV.
AMP #_____ has the least amount of permanent space – amount __________
AMP #_____ has the greatest amount of permanent space – amount __________
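One possible way to complete task 2 is sketched below (verify the exact approach with your
instructor):

	CREATE TABLE Employee_2 AS (SELECT * FROM Employee) WITH NO DATA
	PRIMARY INDEX (Last_name);

	INSERT INTO Employee_2 SELECT * FROM Employee;

	SELECT    Vproc, TableName (CHAR(15)), CurrentPerm
	FROM      DBC.TableSizeV
	WHERE     DatabaseName = DATABASE
	AND       TableName = 'Employee_2'
	ORDER BY  1;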

Analyze Primary Index Criteria

Page 16-83

Notes

Page 16-84

Analyze Primary Index Criteria

Module 17
Partitioned Primary Indexes

After completing this module, you will be able to:

•  Describe the components that comprise a Row ID in a partitioned table.
•  List two advantages of partitioning a table.
•  List two potential disadvantages of partitioning a table.
•  Create single-level and multi-level partitioned tables.
•  Use the PARTITION key word to display partition information.

Teradata Proprietary and Confidential

Partitioned Primary Indexes

Page 17-1

Notes

Page 17-2

Partitioned Primary Indexes

Table of Contents
Partitioning a Table .................................................................................................................... 17-4
How is Partitioning Implemented?............................................................................................. 17-6
Logical Example of NPPI versus PPI ........................................................................................ 17-8
Primary Index Access (NPPI) .................................................................................................. 17-10
Primary Index Access (PPI) ..................................................................................................... 17-12
Why Partition a Table? ............................................................................................................ 17-14
Advantages/Disadvantages of Partitioning .............................................................................. 17-16
Disadvantages of Partitioning .............................................................................................. 17-16
PPI Considerations ................................................................................................................... 17-18
Access of Tables with a PPI ................................................................................................. 17-18
How to Define a PPI ................................................................................................................ 17-20
Partitioning with CASE_N and RANGE_N ............................................................................ 17-22
Partitioning with RANGE_N – Example 1 .............................................................................. 17-24
Access using Partitioned Data – Example 1 (cont.) ............................................................. 17-26
Access Using Primary Index – Example 1 (cont.) ............................................................... 17-28
Place a USI on NUPI – Example 1 (cont.) ........................................................................... 17-30
Place a NUSI on NUPI – Example 1 (cont.) ........................................................................ 17-32
Partitioning with RANGE_N – Example 2 .............................................................................. 17-34
Partitioning – Example 3.......................................................................................................... 17-36
Special Partitions with CASE_N and RANGE_N ................................................................... 17-38
Special Partition Examples ...................................................................................................... 17-40
Partitioning with CASE_N – Example 4 ................................................................................. 17-42
Additional examples: ........................................................................................................... 17-42
SQL Use of PARTITION Key Word ....................................................................................... 17-44
SQL Use of CASE_N .............................................................................................................. 17-46
Using ALTER TABLE with PPI Tables .................................................................................. 17-48
ALTER TABLE – Example 5 .................................................................................................. 17-50
ALTER TABLE – Example 5 (cont.) ...................................................................................... 17-52
ALTER TABLE TO CURRENT ............................................................................................. 17-54
ALTER TABLE TO CURRENT – Example 6 ........................................................................ 17-56
PPI Enhancements.................................................................................................................... 17-58
Multi-level PPI Concepts ......................................................................................................... 17-60
Multi-level PPI Concepts (cont.) ............................................................................................. 17-62
Multi-level Partitioning – Example 7....................................................................................... 17-64
Multi-level Partitioning – Example 7 (cont.) ........................................................................... 17-66
How is the MLPPI Partition # Calculated? .............................................................................. 17-68
Character PPI ........................................................................................................................... 17-70
Character PPI – Example 8 ...................................................................................................... 17-72
Summary .................................................................................................................................. 17-74
Module 17: Review Questions ................................................................................................. 17-76
Lab Exercise 17-1 .................................................................................................................... 17-80

Partitioned Primary Indexes

Page 17-3

Partitioning a Table
As part of implementing a physical design, Teradata provides numerous indexing options
that can improve performance for different types of queries and workloads. For example,
secondary indexes, join indexes, or hash indexes may be utilized to improve performance for
known queries. Teradata provides additional new indexing options to provide even more
flexibility in implementing a Teradata database. One of these new indexing options is the
Partitioned Primary Index (PPI). Key characteristics of Partitioned Primary Indexes are
listed on the facing page.
Primary indexes can be partitioned or non-partitioned. A non-partitioned primary index
(NPPI) is the traditional primary index by which rows are assigned to AMPs. Apart from
maintaining their storage in row hash order, no additional assignment processing of rows is
performed once they are hashed to an AMP.
A partitioned primary index (PPI) permits rows to be assigned to user-defined data partitions
on the AMPs, enabling enhanced performance for range queries that are predicated on
primary index values.
The Partitioned Primary Index (PPI) feature allows a class of queries to access a portion of a
large table, instead of the whole table. The traditional uses of the Primary Index (PI) for
data placement and rapid access of the data when the PI values are specified are retained.
Some common business queries generally require a full-table scan of a large table, even
though it’s predictable that a fairly small percentage of the rows will qualify. One example
of such a query is a trend analysis application that compares current month sales to the
previous month, or to the same month of the previous year, using a table with several years
of sales detail. Another example is an application that compares customer behavior in one
(fairly small) geographic region to another region.
Acronyms:
   PI    – Primary Index
   PPI   – Partitioned Primary Index
   NPPI  – Non-Partitioned Primary Index

Page 17-4

Partitioned Primary Indexes

Partitioning a Table
What is a “Partitioned Primary Index” or PPI?

• An indexing mechanism in Teradata for use in physical database design.
• Data rows are grouped into partitions at the AMP level – partitioning is simply an
ordering of the rows within a table on an AMP.

What advantages does partitioning provide?

• Increases the available options to improve the performance of certain types of queries
– specifically range-constrained queries.

• Only the rows of the qualified partitions in a query need to be accessed – avoid full
table scans.

How is a PPI created and managed?

• A PPI is easy to create and manage.
– The CREATE TABLE and ALTER TABLE statements contain options to create and/or alter
partitions.

• As always, data is distributed among AMPs and automatically placed within partitions.

Partitioned Primary Indexes

Page 17-5

How is Partitioning Implemented?
The PRIMARY INDEX clause (part of the CREATE TABLE statement) has been extended
to include a PARTITION BY clause. This new partition expression definition is the only
thing that needs to be done to create a partitioned table. Advantages to this approach are:

•  No separate partition layout
•  No disk layout for partitions
•  No definition of location in the system for partitions
•  No need to define/manage separate tables per segment of the table that needs to be
   accessed
•  Even data distribution and even processing of a logical partition is automatic due to
   the PI distribution of the rows

No query has to be modified to take advantage of a PPI table.
For tables with a PPI, Teradata utilizes a 3-level scheme to distribute and later locate the
data. The 3 levels are:

•  Rows are distributed across all AMPs (and accessed via the Primary Index) based
   upon the HBN (Hash Bucket Number) portion of the Row Hash.
•  At the AMP level, rows are first ordered by their partition number.
•  Within the partition, data rows are logically stored in Row ID sequence.

A new term is associated with PPI tables. The Row Key is a combination of the Partition #
and the Row Hash. The term Row Key will appear in EXPLAIN reports.

Page 17-6

Partitioned Primary Indexes

How is Partitioning Implemented?
Provides an additional level of data distribution and ordering.

• Rows are distributed across all AMPs (via Primary Index) based upon HBN portion of
the Row Hash.

• Rows are first ordered by their partition number within the AMP.
• Within the partition, data rows are logically stored in Row ID sequence.
If a table is partitioned, rows are placed into partitions.

• Teradata 13.10 (and before) – partitions are numbered 1 to 65,535.
• Teradata 14.0 – maximum combined partitions is increased to 9.223 Quintillion.
– If combined partitions is <= 65,535, then 2-byte partition numbers are used.
– If combined partitions is > 65,535, then 8-byte partition numbers are used.
In a partitioned table, each row is uniquely identified by the following:

• Row ID = Partition # + Row Hash + Uniqueness Value
• Row Key = Partition # + Row Hash (e.g., Row Key will appear in Explain plans)
– In a partitioned table, data rows will have the Partition # included as part of the data row.
To help understand how partitioning is implemented, this module will include examples of
data access using tables defined with NPPI and PPI.

Partitioned Primary Indexes

Page 17-7

Logical Example of NPPI versus PPI
The facing page provides a logical example of an Orders table implemented with a NPPI
(Non-Partitioned Primary Index) and the same table implemented with a PPI (Partitioned
Primary Index). Only the Order_Number and a portion (YY/MM) of the Order_Date are
shown in the example.
The column headings in this example represent the following:
RH – Row Hash – the two-digit row hash is used for simplification purposes. A true
table would contain a Row ID for each row (Row Hash + Uniqueness Value).
Note that, just as in a real implementation, two different order numbers happen to
hash to the same row hash value. Order numbers 1012 and 1043 on AMP 2 both
hash to ‘36’.
O_# – Order Number – this example assumes that Order Number is the Primary Index
and the data rows are hash distributed based on this value.
O_Date – Order Date – another column in the table. This example only contains orders
for 4 months – from January, 2012 through April, 2012. For example, an order
date, such as 12/01, represents January of 2012 (or 2012/01).
Important points to understand from this example:

•  All of the rows in the NPPI table are stored in logical Row ID sequence (row hash
   + uniqueness value) within each AMP.
•  The rows in the PPI table are first ordered by Partition Number, and then by Row
   Hash (actually Row ID) sequence within the Partition.
•  This example illustrates 4 partitions – one for each of the 4 months shown in the
   example.
•  A query that requests “order information” (with a WHERE condition that specifies
   a range of dates) will result in a full table scan of the NPPI table.
•  The same query will only have to access the required partitions in the PPI table.

Page 17-8

Partitioned Primary Indexes

Logical Example of NPPI versus PPI
4 AMPs with Orders Table defined with Non-Partitioned Primary Index (NPPI) –
one column group per AMP:

RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date
'01' 1028  12/03     '06' 1009  12/01     '04' 1008  12/01     '02' 1024  12/02
'03' 1016  12/02     '07' 1017  12/02     '05' 1048  12/04     '08' 1006  12/01
'12' 1031  12/03     '10' 1034  12/03     '09' 1018  12/02     '11' 1019  12/02
'14' 1001  12/01     '13' 1037  12/04     '15' 1042  12/04     '18' 1041  12/04
'17' 1013  12/02     '16' 1021  12/02     '19' 1025  12/03     '20' 1005  12/01
'23' 1040  12/04     '21' 1045  12/04     '24' 1004  12/01     '22' 1020  12/02
'28' 1032  12/03     '26' 1002  12/01     '27' 1014  12/02     '25' 1036  12/03
'30' 1038  12/04     '29' 1033  12/03     '32' 1003  12/01     '31' 1026  12/03
'35' 1007  12/01     '34' 1029  12/03     '33' 1039  12/04     '38' 1046  12/04
'39' 1011  12/01     '36' 1012  12/01     '40' 1035  12/03     '41' 1044  12/04
'42' 1047  12/04     '36' 1043  12/04     '44' 1022  12/02     '43' 1010  12/01
'48' 1023  12/02     '45' 1015  12/02     '47' 1027  12/03     '46' 1030  12/03

4 AMPs with Orders Table defined with PPI on O_Date:

SELECT …
WHERE O_Date BETWEEN '2012-03-01' AND '2012-03-31';

RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date    RH   O_#   O_Date
'14' 1001  12/01     '06' 1009  12/01     '04' 1008  12/01     '08' 1006  12/01
'35' 1007  12/01     '26' 1002  12/01     '24' 1004  12/01     '20' 1005  12/01
'39' 1011  12/01     '36' 1012  12/01     '32' 1003  12/01     '43' 1010  12/01
'03' 1016  12/02     '07' 1017  12/02     '09' 1018  12/02     '02' 1024  12/02
'17' 1013  12/02     '16' 1021  12/02     '27' 1014  12/02     '11' 1019  12/02
'48' 1023  12/02     '45' 1015  12/02     '44' 1022  12/02     '22' 1020  12/02
'01' 1028  12/03     '10' 1034  12/03     '19' 1025  12/03     '25' 1036  12/03
'12' 1031  12/03     '29' 1033  12/03     '40' 1035  12/03     '31' 1026  12/03
'28' 1032  12/03     '34' 1029  12/03     '47' 1027  12/03     '46' 1030  12/03
'23' 1040  12/04     '13' 1037  12/04     '05' 1048  12/04     '18' 1041  12/04
'30' 1038  12/04     '21' 1045  12/04     '15' 1042  12/04     '38' 1046  12/04
'42' 1047  12/04     '36' 1043  12/04     '33' 1039  12/04     '41' 1044  12/04

Partitioned Primary Indexes

Page 17-9

Primary Index Access (NPPI)
A non-partitioned table (NPPI) has a traditional primary index by which rows are assigned
to AMPs. Apart from maintaining their storage in row hash order, no additional assignment
processing of rows is performed once they are hashed to an AMP.
With a NPPI table, the PARSER will include Partition Number 0 in the request. For a table
with a NPPI, all of the rows are assumed to be part of one partition (Partition 0).
Assuming that an SQL statement (e.g., SELECT) provides equality value(s) to the column(s)
of a Primary Index, the TD Database software retrieves the row or rows from a single AMP
as described below.
The Parsing Engine (PE) creates a four-part message composed of the Table ID, Partition
#0, the Row Hash, and Primary Index value(s). The 48-bit Table ID is located via the Data
Dictionary, the 32 bit Row Hash value is generated by the Hashing Algorithm, and the
Primary Index value(s) come from the SQL request. The Parsing Engine (via the Data
Dictionary) knows if a table has a NPPI and sets the Partition Number to 0.
The Message Passing Layer uses a portion of the Row Hash to determine to which AMP to
send the request. The Message Passing Layer uses the HBN portion of the Row Hash (first
16 or 20 bits of the Row Hash) to locate a bucket in the Hash Map(s). This bucket identifies
to which AMP the PE will send the request. The Hash Maps are part of the Message
Passing Layer interface.
The AMP uses the Table ID and Row Hash to identify and locate the proper data block, then
uses the Row Hash and PI value to locate the specific row(s). The PI value is required to
distinguish between Hash Synonyms. The AMP implicitly assumes the rows are part of
partition #0.
Note: The Partition Number (effectively 0) is not stored within the data rows for a table
with a NPPI. The FLAG or SPARE byte (within the row overhead) has a bit set to zero
for a NPPI row and it is set to one for a PPI row.
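The hashing steps described above can be observed with SQL; for example (a sketch, using an
arbitrary integer primary index value):

	SELECT  HASHROW (1001)                          AS "Row Hash"
	       ,HASHBUCKET (HASHROW (1001))             AS "Hash Bucket"
	       ,HASHAMP (HASHBUCKET (HASHROW (1001)))   AS "Owning AMP";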
Acronyms:
   HBN   – Hash Bucket Number
   PPI   – Partitioned Primary Index
   NPPI  – Non-Partitioned Primary Index

Page 17-10

Partitioned Primary Indexes

Primary Index Access (NPPI)
SQL with primary index values and data.

PARSER (Hashing Algorithm) builds a four-part message:

   Base TableID (48 bits) | Partition # 0 | Row Hash | PI values and data

The Bucket # portion of the Row Hash is used by the Message Passing Layer (Hash Maps)
to select the AMP (AMP 0, AMP 1, ..., AMP n) that receives the request.

Data Table on the AMP (all rows in implicit Partition #0), for example:

   Row ID (Row Hash + Uniq Value)        Row Data
   x'068117A0'   0000 0001               ...
   x'068117A0'   0000 0002               ...
   x'068117A0'   0000 0003               ...
   (Row Hash values range from x'00000000' through x'FFFFFFFF'.)

Notes:
1. For tables with a NPPI, the rows are implicitly associated with Partition #0.
2. Partition #0 is not stored within each of the rows.
3. Rows are logically stored in Row ID sequence.

Partitioned Primary Indexes

Page 17-11

Primary Index Access (PPI)
The process to locate a data row(s) via a PPI is similar to the process in retrieving data rows
with a table defined with a NPPI – a process described earlier. If the SQL request provides
data about columns associated with the partitions, then the PARSER will include specific
partition information in the request.


•  The key to remember is that a specific Row Hash value can be found in different
   partitions on the AMP. The Partition Number, Row Hash, and Uniqueness Value
   are needed to uniquely identify a row in a PPI-based table.
•  A Row Hash and Uniqueness Value combination is only unique within a partition
   of a PPI table. The same Row Hash and Uniqueness Value combination can be
   present in different partitions (e.g., x'068117A0').

Assuming that an SQL statement (e.g., SELECT) provides equality value(s) to the Primary
Index, then Teradata software retrieves the row(s) from a single AMP.

•  If the SQL request also provides data for partition columns, then the AMP will only
   have to access the partition(s) identified in the request sent to it by the PE.
•  If the SQL request only provides Primary Index values and the partitioning
   columns are outside of the Primary Index (and partitioning information is not
   included in the SQL request), the AMP will check each of the Partitions for the
   associated Row Hash.

The Parsing Engine (PE) creates a four-part message composed of the Table ID, Partition
Information, the Row Hash, and Primary Index value(s). The 48-bit Table ID is located via
the Data Dictionary, the 32-bit Row Hash value is generated by the Hashing Algorithm, and
the Partition information and Primary Index value(s) come from the SQL request. The
Parsing Engine (via the Data Dictionary) knows if a table has a PPI and determines the
Partitions to include in the request based on the SQL request.
The Message Passing Layer uses a portion of the Row Hash to determine to which AMP to
send the request. The Message Passing Layer uses the DSW portion of the Row Hash (first
16 or 20 bits of the Row Hash) to locate a bucket in the Hash Map(s). This bucket
identifies to which AMP the PE will send the request.
The AMP uses the Table ID, Partition Number(s), and Row Hash to identify and locate the
proper data block(s). The AMP then uses the Row Hash and PI value to locate the specific
row(s). The PI value is required to distinguish between Hash Synonyms. Each data row
will have the Partition Number stored within it.
In the general case, there can be up to 65,535 partitions, numbered from one. As rows are
inserted into the table, the partitioning expression is evaluated to determine the proper
partition placement for that row. The two-byte partition number is embedded in the row, as
part of the row identifier, making PPI rows two bytes wider than they would be if the table
wasn’t partitioned.
Page 17-12

Partitioned Primary Indexes

Primary Index Access (PPI)
SQL with primary index values and data, or SQL expressions that include partition related
values.

PARSER (Hashing Algorithm) builds a four-part message:

   Base TableID (48 bits) | Part. # (1 or more) | Row Hash | PI values and data

The Bucket # portion of the Row Hash is used by the Message Passing Layer (Hash Maps)
to select the AMP (AMP 0, AMP 1, ..., AMP n) that receives the request.

Data Table on the AMP – each row carries its Partition # (P#) ahead of the Row Hash (RH):

   Part #   Row Hash        Uniq Value       Row Data
     1      x'068117A0'     0000 0001        ...
     1      x'068117A0'     0000 0002        ...
     2      x'068117A0'     0000 0001        ...
     3      ...
   (Within each partition, Row Hash values range from x'00000000' through x'FFFFFFFF'.)

Notes:
1. Within the AMP, rows are ordered first by their partition number.
2. Within each partition, rows are logically stored in row hash and uniqueness value
   sequence.

Partitioned Primary Indexes

Page 17-13

Why Partition a Table?
The decision to define a Partitioned Primary Index (PPI) for a table depends on how its rows
are most frequently accessed. PPI tables are designed to optimize range queries while also
providing efficient primary index join strategies. For range queries, only rows of the
qualified partitions need to be accessed.

•  One of the reasons to define a PPI on a table is to increase query efficiency by
   avoiding full table scans without the overhead and maintenance costs of secondary
   indexes.

The facing page provides one example using a sales data table that has 5 years of sales
history. A PPI is placed on this table which partitions the data into 60 partitions (one for
each month of the 5 years).
Queries that request a subset of the data (some number of months) only need to access the
required partitions instead of the entire table. For example, a query that requests two months
of sales data only needs to read 2 partitions of the data from each AMP. This is about 1/30
of the table. Without a PPI or any secondary indexes, this query has to perform a full table
scan. Even with a secondary index, a full table scan would probably be done for 1/30 or 3%
of the table.
The more partitions there are, the greater the potential benefit.
Some of the performance opportunities available by using the PPI feature include:

•  Get more efficiency in querying against a subset of large volumes of transactional
   detail data as well as to manage this data more effectively.
   –  Businesses have recognized the analytic value of detailed transactions and are
      storing larger and larger volumes of this data.
   –  Increase query efficiency by avoiding full table scans without the overhead and
      maintenance costs of secondary indexes.
   –  As the retention volume of detailed transactions increases, the percent of
      transactions that an “average” query requires for execution decreases.
•  Allow “instantaneous” dropping of “old” data and simple addition of “new” data.
   –  Support a “rolling n periods” methodology for transactional data.

The term “partition elimination” refers to an automatic optimization in which the optimizer
determines, based on query conditions, that some partitions can't contain qualifying rows,
and causes those partitions to be skipped. Partitions that are skipped for a particular query
are called excluded partitions. Generally, the greatest benefit of a PPI table is obtained from
partition elimination.

Page 17-14

Partitioned Primary Indexes

Why Partition a Table?
• Increase query efficiency by avoiding full table scans without the overhead
and maintenance costs of secondary indexes.

– Partition Elimination – the key advantage to partitioning a table is that the
optimizer can eliminate partitions for queries.

• For example, assume a sales data table has 5 years of sales history.
– A PPI is placed on this table which partitions the data into 60 partitions (one for
each month of the 5 years).

– Assume a query only needs to read 2 months of the data from each AMP.
  •  Only 1/30 (2 partitions) of the table has to be read.
  •  With a NPPI, this query has to perform a full table scan.

– A Value-Ordered NUSI may be used to help performance for this type of query.
  •  However, there is NUSI subtable permanent space and maintenance overhead.

• Deleting large volumes of rows in entire partitions can be extremely fast.
– ALTER TABLE … DROP RANGE … ;
– Disclaimer: Fast deletes assume that the table doesn't have a NO RANGE partition
defined and has no secondary indexes, join indexes, or hash indexes.
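As a sketch of the general form (the table name and dates are illustrative; ALTER TABLE
options for partitioned tables are covered later in this module):

	ALTER TABLE Sales_History
	MODIFY PRIMARY INDEX
	DROP RANGE BETWEEN DATE '2007-01-01' AND DATE '2007-12-31' EACH INTERVAL '1' MONTH
	WITH DELETE;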

Partitioned Primary Indexes

Page 17-15

Advantages/Disadvantages of Partitioning
The main advantage of a PPI table is the automatic optimization that occurs for queries that
specify a restrictive condition on the partitioning column. For example, a query which
examines two months of sales data in a table with two years of sales history can read about
one-twelfth of the table, instead of the entire table. The more partitions there are, the greater
the potential benefit.

Disadvantages of Partitioning
The two main potential disadvantages of using a PPI table occur with PI access and direct
PI-based joins. The PI access potential disadvantage occurs only when the partitioning
column is not part of the PI. In this situation, a query specifying a PI value, but no value for
the partitioning column, must look in each partition for that value, instead of positioning
directly to the first row for the PI value.
The direct join potential disadvantage occurs when another table with the same PI is joined
with an equality condition on every PI column. For two non-PPI tables, the rows of the two
tables will be ordered the same, and the join can be performed directly. If one of the tables
is partitioned, the rows won’t be ordered the same, and the task, in effect, becomes a set of
sub-joins, one for each partition of the PPI table.
In both of these situations, the disadvantage is proportional to the number of partitions, with
fewer partitions being better than more partitions.
With the Aligned Row Format (Linux 64-bit), the two-byte partition number is embedded in
the row, as part of the row identifier, plus an additional 2 bytes for a total of 4 additional
bytes per data row. With the Packed64 Row Format (Linux 64-bit 13.10 new install), the
overhead within a row for a PPI table is only 2 bytes for the partition number. Secondary
Indexes referencing PPI tables use the 10-byte row identifier, making those subtable rows 2
bytes wider as well. Join Indexes always use a 10-byte row identifier regardless if the base
tables are partitioned or not.
When the primary index is unique (but can’t be defined as unique because of the
partitioning), a USI or NUSI can be defined on the same columns as the primary index.
Access via the secondary index won’t be as fast as non-partitioned access via the primary
index, but is fast enough for most applications.
Why can't a Primary Index be defined as Unique unless the partitioning expression
columns are part of the PI column(s)?
It’s because of the difficulty of performing the duplicate PI check for inserts. If there
was already a row with that PI, it could be in any partition, so every partition would
have to be checked to determine whether the duplicate PI exists. There can be
thousands of partitions. An insert-select could take a very long time in such a situation.
It’s more efficient to check uniqueness (and it also provides an efficient access path) to
define a unique secondary index (USI) on the same columns as the PI in this case.

Page 17-16

Partitioned Primary Indexes

Advantages/Disadvantages of Partitioning
Advantages:

• The partition expression definition is the only thing that needs to be done by the DBA
or the database designer. No separate partition layout – no disk layout for partitions.

– For example, the last row in one partition and the first row in the next partition will usually be
in the same data block.

– No definition of location in the system for partitions.

• Even data distribution and even processing of a logical partition is automatic.
– Due to the PI distribution of the rows

• No modifications of queries required.
Potential disadvantages:

• PPI rows are 2 or 8 bytes longer. Table uses more PERM space.
– Secondary index subtable rows are also increased in size.

• A PI access may be degraded if the partitioning column is not part of the PI.
– A query specifying only a PI value must look in each partition for that value.

• Joins to non-partitioned tables with the same PI may be degraded.
• The PI can’t be defined as unique when the partitioning column is not part of the PI.

Partitioned Primary Indexes

Page 17-17

PPI Considerations
Starting with Teradata V2R6.1, base tables, global temporary tables, and volatile temporary
tables can be partitioned. This restriction doesn’t mean that a PPI table can’t have
secondary indexes, or can’t be referenced in the definition of a Join Index or Hash Index. It
merely means that the PARTITION BY clause is not available on a CREATE JOIN INDEX
or CREATE HASH INDEX statement.
In Teradata Database V2R6.2, Partitioned Primary Indexes (PPIs) are supported for non-compressed join indexes.
In the general case, there can be up to 65,535 partitions, numbered from one. The two-byte
partition number is embedded in the data row, as part of the row identifier. Secondary
Indexes and Join Indexes referencing PPI tables also use the wider row identifier. Except
for the embedded partition number, PPI rows have the same format as non-PPI rows. A data
block can contain rows from more than one partition. There are no new control structures
needed to implement the partitioning scheme.

Access of Tables with a PPI
Some of the issues associated with accessing a table that has a defined PPI are listed below:

•  If the SELECT statement does not provide values for any of the partitioning
   columns, then all of the partitions may be probed to find row(s) with the hash
   value.
•  If the SELECT statement provides values for some of the partitioning columns,
   then partition elimination may reduce the number of the partitions that will be
   probed to find row(s) with the hash value.
   A common situation is SQL that specifies a range of values for partitioning
   columns. This allows some partitions to be excluded.
•  If the SELECT statement provides values for all of the partitioning columns, then
   partition elimination will cause a single partition to be probed to find row(s) with
   the hash value.

In summary, a NUPI access of a PPI table will take longer when a query specifies the PI
column values, but doesn’t include the partitioning column(s). In this situation, each
partition must be probed for the appropriate PI value. In the worst case, the number of disk
reads could increase by a factor equal to the number of partitions. While probing a partition
is a fast operation, a table with thousands of partitions might not provide acceptable
performance for PI accesses for some applications.
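For example, using the Claim table defined later in this module (the claim_id value is
illustrative), the first query below can be limited to a single partition, while the second must
probe every partition for the row hash:

	SELECT  *
	FROM    Claim
	WHERE   claim_id   = 100039
	AND     claim_date = DATE '2011-07-15';

	SELECT  *
	FROM    Claim
	WHERE   claim_id   = 100039;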

Page 17-18

Partitioned Primary Indexes

PPI Considerations
PPI considerations include …

• Base tables are partitioned, secondary indexes are not.
• However, a PPI table can have secondary indexes which reference rows in a PPI table
via a RowID in the SI subtable.

– Global and Volatile Temporary Tables can also be partitioned.
– Non-Compressed Join Indexes can also be partitioned.
• A join or hash index can also reference rows in a PPI table.
A table has a max of 65,535 (or 9.223 Quintillion) partitions.

• Partitioning columns do not have to be columns in the primary index.
• There are numerous options for partitioning.
As rows are inserted into the table, the partitioning expression is evaluated to
determine the proper partition placement for that row.

Partitioned Primary Indexes

Page 17-19

How to Define a PPI
Primary indexes can be partitioned or non-partitioned. A primary index is defined as part of
the CREATE TABLE statement. The PRIMARY INDEX definition has a new option to
create partitioned primary indexes.
PARTITION BY <partitioning expression>

A partitioned primary index (PPI) permits rows to be assigned to user-defined data partitions
on the AMPs, enabling enhanced performance for range queries that are predicated on
partitioning column(s) values. The <partitioning expression> is evaluated and Teradata
determines the appropriate partition number or assignment.
The <partitioning expression> is a general expression, allowing wide flexibility in tailoring
the partitioning scheme to the unique characteristics of the table. Two functions, CASE_N
and RANGE_N, are provided to simplify the creation of common partitioning schemes.
You can write any valid SQL expression as a partitioning expression with a few exceptions.
The reference manual has details on SQL expressions that are not permitted in the
<partitioning expression>.
Limitations on the PARTITION BY option include:

•  Partitioning expression must be a scalar expression that is INTEGER or can be cast
   to INTEGER.
•  Multiple columns from the table may be specified in the expression.
   – These are called the partitioning columns.
•  Before Teradata 13.10, the expression must not require character/graphic comparison
   in order to be evaluated.
   – Expression must not contain aggregate/ordered-analytic/statistical functions,
     DATE, TIME, ACCOUNT, RANDOM, HASH, etc. functions.
•  PARTITION BY clause not allowed for global temporary tables, volatile tables,
   join indexes, hash indexes, and secondary indexes in the first release of PPI.
•  UNIQUE only allowed if all partitioning columns are included in the PI.
•  Partitioning expression limited to approximately 8100 characters.
   – Stored as an implicit check constraint in DBC.TableConstraints.

One or more columns can make up the partitioning expression, although it is anticipated that
for most tables one column will be specified. The partitioning column(s) can be part of the
primary index, but are not required to be. The result of the partitioning expression must be a
scalar value that is INTEGER or can be cast to INTEGER. Most deterministic functions can
be used within the expression. The expression must not require character or graphic
comparisons, although character or graphic columns can be used in some circumstances.

Page 17-20

Partitioned Primary Indexes

How to Define a PPI
The PRIMARY INDEX definition portion of a CREATE TABLE statement has an
optional PARTITION BY option.

CREATE TABLE …
  [UNIQUE] PRIMARY INDEX (col1, col2, …)
  PARTITION BY <partitioning expression>

Options for the <partitioning expression> include:

• Range partitioning
• Conditional partitioning, modulo partitioning, and general expression partitioning.
• Partitioning columns do not have to be columns in the primary index. If they aren't,
then the primary index cannot be unique.

Column(s) included in the partitioning expression are called the “partitioning
column(s)”.

• Two functions, CASE_N and RANGE_N, are provided to simplify the creation of
common partitioning schemes.

Partitioned Primary Indexes

Page 17-21

Partitioning with CASE_N and RANGE_N
For many tables, there is no suitable column that lends itself to direct usage as a partitioning
column. For these situations, the CASE_N and RANGE_N functions can be used to
concisely define partitioning expressions. When CASE_N or RANGE_N is used, two
partitions are reserved for specific uses, leaving a maximum of 65,533 user-defined
partitions. Note that the table still has a total of 65,535 available partitions.
The PARTITION BY phrase requires a partitioning expression that determines the partition
assignment of a row. You can use the CASE_N function to construct a partitioning
expression such that a row with any value or NULL for the partitioning column is assigned
to a partition.
The CASE_N function is patterned after the SQL CASE expression. It evaluates a list of
conditions and returns the position of the first condition that evaluates to TRUE, provided
that no prior condition in the list evaluates to UNKNOWN. The returned value will map
directly into a partition number.
Another option is to use the RANGE_N function to construct a partitioning expression with
a list of ranges such that a row with any value or NULL for the partitioning column is
assigned to a partition.
If CASE_N or RANGE_N is used in a partitioning expression in a CREATE TABLE or
ALTER TABLE statement, it:


•  Must not involve character or graphic comparisons.

•  Can specify a maximum of 65,533 user-defined partitions. The table can have a
   total of 65,535 partitions including the NO CASE (NO RANGE) and UNKNOWN
   partitions.

Page 17-22

Partitioned Primary Indexes

Partitioning with CASE_N and RANGE_N
The <partitioning expression> may use one of the following functions to help
define partitions.

• CASE_N
• RANGE_N
Use of CASE_N results in the following:

• Evaluates a list of conditions and returns the position of the first condition that
evaluates to TRUE.

• Result is the data row being placed into a partition associated with that condition.
• Note: Patterned after SQL CASE expression.
Use of RANGE_N results in the following:

• The expression is evaluated and is mapped into one of a list of specified ranges.
• Ranges are listed in increasing order and must not overlap with each other.
• Result is the data row being placed into a partition associated with that range.
NO CASE, NO RANGE, and UNKNOWN options are also available.

Partitioned Primary Indexes

Page 17-23

Partitioning with RANGE_N – Example 1
One of the most common partitioning expressions uses RANGE_N to partition
the table based on a group of dates (e.g., month partitions). A range is defined by a starting
boundary and an optional ending boundary. If an ending boundary is not specified, the
range is defined by its starting boundary, inclusively, up to but not including the starting
boundary of the next range.
The list of ranges must specify ranges in increasing order, where the ending boundary of a
range is less than the starting boundary of the next range.
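As a hedged illustration of the optional ending boundary (hypothetical yearly ranges; only the
last range specifies an explicit ending boundary, the others run up to the start of the next range):

    PARTITION BY RANGE_N (claim_date BETWEEN
        DATE '2010-01-01',                          /* partition 1: 2010-01-01 up to 2010-12-31 */
        DATE '2011-01-01',                          /* partition 2: 2011-01-01 up to 2011-12-31 */
        DATE '2012-01-01' AND DATE '2012-12-31')    /* partition 3: all of 2012                 */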
RANGE_N limitations include:

•  Multiple test values are not allowed in a RANGE_N function.
•  The test value in a RANGE_N function must be INTEGER, BYTEINT, SMALLINT, or DATE.
•  Range value and range size in a RANGE_N function must be constant.
•  Ascending ranges only, and ranges must not overlap with each other.

For example, the following CREATE TABLE statement can be used to establish the
monthly partitioning. This example does not have the NO RANGE partition defined.
CREATE SET TABLE Claim
    ( claim_id      INTEGER  NOT NULL
     ,cust_id       INTEGER  NOT NULL
     ,claim_date    DATE     NOT NULL
     :
PRIMARY INDEX (claim_id)
PARTITION BY RANGE_N (claim_date BETWEEN
    DATE '2003-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' MONTH);

To maintain uniqueness on the claim_id, you can include a USI on claim_id by including the
following option.
UNIQUE INDEX (claim_id)
If the claim_date column for an attempted INSERT or UPDATE has a date outside of the
partitioning range or NULL, then an error will be returned and the row won’t be inserted or
updated.
Notes:
•  A UPI is not allowed because the partitioning column is not included in the PI.
•  A Unique Secondary Index is allowed on the PI to enforce uniqueness.
The facing page contains examples of inserting data rows into a table partitioned by month
and how the date is evaluated into the appropriate partition.

Page 17-24

Partitioned Primary Indexes

Partitioning with RANGE_N – Example 1
For example, partition the Claim table by "Claim Date".
CREATE TABLE Claim
    ( claim_id      INTEGER  NOT NULL
     ,cust_id       INTEGER  NOT NULL
     ,claim_date    DATE     NOT NULL
     …)
PRIMARY INDEX (claim_id)
PARTITION BY RANGE_N
    (claim_date BETWEEN DATE '2003-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' MONTH,
     NO RANGE);

The following INSERTs place new rows into the Claim table. The date is evaluated and the
rows are placed into the appropriate partitions.
INSERT INTO Claim VALUES (100039, 1009, '2003-01-13', …);   -> placed in partition #1
INSERT INTO Claim VALUES (260221, 1020, '2012-01-07', …);   -> placed in partition #109
INSERT INTO Claim VALUES (350221, 1020, '2013-01-01', …);   -> placed in NO RANGE partition (#121)
INSERT INTO Claim VALUES (100039, 1009, NULL, …);           -> Error 3811 – NOT NULL violation

If the table did not have the NO RANGE partition defined, then the following error occurs:
INSERT INTO Claim VALUES (100039, 1009, '2013-01-01', …);   -> Error 5728 – Partitioning violation

Note: claim_id must be defined as a NUPI because claim_date is not part of PI.

Partitioned Primary Indexes

Page 17-25

Access using Partitioned Data – Example 1 (cont.)
The EXPLAIN text for these queries is shown below.
EXPLAIN
SELECT   *
FROM     Claim_PPI
WHERE    claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

1) First, we lock a distinct DS."pseudo table" for read on a RowHash to prevent global
deadlock for DS.Claim_PPI.
2) Next, we lock DS.Claim_PPI for read.
3) We do an all-AMPs RETRIEVE step from a single partition of DS.Claim_PPI with a
condition of ("(DS.Claim_PPI.claim_date <= DATE '2012-01-31') AND
(DS.Claim_PPI.claim_date >= DATE '2012-01-01')") into Spool 1 (group_amps),
which is built locally on the AMPs. The input table will not be cached in memory, but
it is eligible for synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 21,100 rows (2,869,600 bytes). The estimated time for this step is 0.44
seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing
the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 0.44 seconds.
The table named Claim_NPPI is similar to Claim_PPI except it does not have a Partitioned
Primary Index, but does have “claim_id” as a UPI.
EXPLAIN
SELECT   *
FROM     Claim_NPPI
WHERE    claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

1) First, we lock a distinct DS."pseudo table" for read on a RowHash to prevent global
deadlock for DS.Claim_NPPI.
2) Next, we lock DS.Claim_NPPI for read.
3) We do an all-AMPs RETRIEVE step from DS.Claim_NPPI by way of an all-rows
scan with a condition of ("(DS.Claim_NPPI.claim_date <= DATE '2012-01-31') AND
(DS.Claim_NPPI.claim_date >= DATE '2012-01-01')") into Spool 1 (group_amps),
which is built locally on the AMPs. The input table will not be cached in memory, but
it is eligible for synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 21,100 rows (2,827,400 bytes). The estimated time for this step is
49.10 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing
the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 49.10 seconds.
Note: Statistics were collected on the claim_id, cust_id, and claim_date of both tables. The
Claim table has 1,440,000 rows.
Page 17-26

Partitioned Primary Indexes

Access using Partitioned Data – Example 1
[Slide graphic: each AMP holds the Claim rows in partitions (Part 1 = Jan 2003, Part 2, …,
Part 109, …, Part n).]

QUERY – PPI
    SELECT *
    FROM   Claim_PPI
    WHERE  claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

PLAN – PPI
    All-AMPs – Single Partition Scan
    EXPLAIN estimated cost – 0.44 sec.

QUERY – NPPI
    SELECT *
    FROM   Claim_NPPI
    WHERE  claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

PLAN – NPPI
    All-AMPs – Full Table Scan
    EXPLAIN estimated cost – 49.10 sec.

Partitioned Primary Indexes

Page 17-27

Access Using Primary Index – Example 1 (cont.)
The EXPLAIN text for these queries is shown below.
EXPLAIN
SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221;

1) First, we do a single-AMP RETRIEVE step from all partitions of DS.Claim_PPI by
way of the primary index "DS.Claim_PPI.claim_id = 260221" with a residual condition
of ("DS.Claim_PPI.claim_id = 260221") into Spool 1 (one-amp), which is built locally
on that AMP. The input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 (136 bytes) is estimated with high
confidence to be 1 row. The estimated time for this step is 0.09 seconds.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 0.09 seconds.
The table named Claim_NPPI is similar to Claim_PPI except it does not have a Partitioned
Primary Index, but does have “claim_id” as a UPI.

EXPLAIN
SELECT   *
FROM     Claim_NPPI
WHERE    claim_id = 260221;

1) First, we do a single-AMP RETRIEVE step from DS.Claim_NPPI by way of the
unique primary index "DS.Claim_NPPI.claim_id = 260221" with no residual
conditions. The estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total
estimated time is 0.00 seconds.

Page 17-28

Partitioned Primary Indexes

Access Using Primary Index – Example 1 (cont.)
[Slide graphic: each AMP holds the Claim rows in partitions (Part 1 = Jan 2003, Part 2, …,
Part 109, …, Part n).]

QUERY – PPI
    SELECT *
    FROM   Claim_PPI
    WHERE  claim_id = 260221;

PLAN – PPI
    One AMP – All partitions are probed
    EXPLAIN estimated cost – 0.09 sec.

QUERY – NPPI
    SELECT *
    FROM   Claim_NPPI
    WHERE  claim_id = 260221;

PLAN – NPPI
    One AMP – UPI access (only one block has to be read to locate the row)
    EXPLAIN estimated cost – 0.00 sec.

Partitioned Primary Indexes

Page 17-29

Place a USI on NUPI – Example 1 (cont.)
If the partitioning columns are not part of the Primary Index, the Primary Index cannot be
unique (e.g., claim_date). To maintain uniqueness on the Primary Index, you can create a
USI on the PI (e.g., Claim ID or claim_id).
Reasons for this may include:

•  USI access to specific rows may be faster than scanning multiple partitions on a
   single AMP.
•  Establish the USI as a referenced parent in Referential Integrity.

CREATE UNIQUE INDEX (claim_id) ON Claim_PPI;

EXPLAIN
SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221;

1) First, we do a two-AMP RETRIEVE step from DS.Claim_PPI by way of unique
index # 4 "DS.Claim_PPI.claim_id = 260221" with no residual conditions. The
estimated time for this step is 0.00 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated
time is 0.00 seconds.
As an alternative, the SELECT can include the Primary Index values and the partitioning
information. This allows the PE to build a request that has the AMP scan a specific
partition. However, in this example, the user may not know the claim date in order to
include it in the query.
EXPLAIN
SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221
AND      claim_date = DATE '2012-01-11';

1) First, we do a single-AMP RETRIEVE step from DS.Claim_PPI by way of the primary
index "DS.Claim_PPI.claim_id = 260221, DS.Claim_PPI.claim_date = DATE '201201-11'" with a residual condition of ("(DS.Claim_PPI.claim_date = DATE '2012-01-11')
AND (DS.Claim_PPI.claim_id = 260221)") into Spool 1 (one-amp), which is built
locally on that AMP. The input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 (136 bytes) is estimated with high
confidence to be 1 row. The estimated time for this step is 0.00 seconds.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 0.00 seconds.

Page 17-30

Partitioned Primary Indexes

Place a USI on NUPI – Example 1 (cont.)
Notes:
• If the partitioning column(s) are not part of the Primary Index, the Primary Index
cannot be unique (e.g., Claim Date is not part of the PI).

• To maintain uniqueness on the Primary Index, you can create a USI on the PI (e.g.,
Claim ID). This is a two-AMP operation.
[Slide graphic: on each AMP, the USI subtable plus the Claim_PPI data rows in partitions
(Part 1, Part 2, …, Part 109, …, Part n). A USI subtable row specifies the part #, row hash,
and uniqueness value of the data row.]

CREATE UNIQUE INDEX (claim_id) ON Claim_PPI;

SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221;

USI Considerations:
•  Eliminate partition probing
•  Row-hash locks
•  2-AMP operation
•  Can only be used if values in PI column(s) are unique
•  Will maintain uniqueness
•  USI on NUPI only supported on PPI tables

Partitioned Primary Indexes

Page 17-31

Place a NUSI on NUPI – Example 1 (cont.)
If the partitioning columns are not part of the Primary Index, the Primary Index cannot be
unique (e.g., Claim ID). You can use a NUSI on the same columns that make up the PI and
actually get a single-AMP access operation. This feature only applies to a NUSI created on
the same columns as a PI on PPI table. Additionally, instead of table level locks (typical
NUSI), row hash locks will be used.
Reasons to choose a NUSI for your PI may include:

•  The primary index is non-unique (can't use a USI) and you need faster access than
   scanning or probing multiple partitions on a single AMP.

•  MultiLoad can be used to load a table with a NUSI, not a USI.

•  The access time for a USI and NUSI will be similar (each will access a subtable
   block) – however, the USI is a 2-AMP operation and requires BYNET message
   passing. The amount of space for a USI and NUSI subtable in this case will be
   similar. A typical NUSI with duplicate values will have multiple row ids (keys) in
   a subtable row and will save space per subtable row. However, a NUSI used as an
   index for columns with unique values will use approximately the same amount of
   subtable space as a USI. This is because each NUSI subtable row only contains 1
   row id.

CREATE INDEX (claim_id) ON Claim_PPI;

EXPLAIN
SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221;

1) First, we do a single-AMP RETRIEVE step from DS.Claim_PPI by way of index # 4
"DS.Claim_PPI.claim_id = 260221" with no residual conditions into Spool 1
(group_amps), which is built locally on that AMP. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The size of Spool 1 (136 bytes) is
estimated with high confidence to be 1 row. The estimated time for this step is 0.00
seconds.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total
estimated time is 0.00 seconds.

Page 17-32

Partitioned Primary Indexes

Place a NUSI on NUPI – Example 1 (cont.)
Notes:
• You can optionally create a NUSI on the same columns as the Primary Index (e.g.,
Claim ID). The PI may be unique or not.

• Optimizer generates a plan for a single-AMP NUSI access with row-hash locking
(instead of table-level locking).
[Slide graphic: on each AMP, the NUSI subtable plus the Claim_PPI data rows in partitions
(Part 1, Part 2, …, Part 109, …, Part n). A NUSI subtable row specifies the part #, row hash,
and uniqueness value of the data row.]

CREATE INDEX (claim_id) ON Claim_PPI;

SELECT   *
FROM     Claim_PPI
WHERE    claim_id = 260221;

NUSI Considerations:
•  Eliminate partition probing
•  Row-hash locks
•  1-AMP operation
•  Can be used with unique or non-unique PI columns
•  Must be equality condition
•  NUSI single-AMP operation only supported on PPI tables
•  Use MultiLoad to load table

Partitioned Primary Indexes

Page 17-33

Partitioning with RANGE_N – Example 2
This example illustrates that a table can be partitioned with different size intervals. The
current Sales data and Sales History data are placed in the same table. It typically is not
practical to create a partitioning expression as shown in example #2, but the example is
included to show the flexibility that you have with the partitioning expression.
For example, you may decide to partition the Sales History by month and the current sales
data by day. You may want to do this if users frequently access the Sales History data with
range constraints, resulting in full table scans. It may be that users access the current year
data frequently, looking at data for a specific day. The example on the facing page partitions
the years 2003 to 2011 by month and the year 2012 by day.
One option may be to partition by week as follows:
PARTITION BY RANGE_N (sales_date
BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' EACH INTERVAL '7' DAY,
DATE '2004-01-01' AND DATE '2004-12-31' EACH INTERVAL '7' DAY,
:
:
DATE '2012-01-01' AND DATE '2012-12-31' EACH INTERVAL '7' DAY);

One may think that a simple partitioning scheme to partition by week would be as follows:
PARTITION BY RANGE_N (sales_date
BETWEEN DATE '2003-01-01' AND DATE '2012-12-31' EACH INTERVAL '7' DAY);

This is a simpler PARTITION expression to initially code, but may require more work or
thought later. There is a minor drawback to partitioning by weeks because a 7-day partition
usually spans one year into the next. Assume that a year from now, you wish to ALTER this
table and DROP the partitions for the year 2003. The ALTER TABLE DROP RANGE
option has to specify a range of dates that actually represent a complete partition or
partitions in the table. With this scheme, the last complete 7-day partition in 2003 ends on 2003-12-30, not 2003-12-31. The
ALTER TABLE command will be described later in this module.
If daily partitions are desired for all of the years, the following partitioning expression can
be used to create a partitioned table with daily partitions.
PARTITION BY RANGE_N (
sales_date BETWEEN
DATE '2003-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' DAY);

Performance Note: Daily partitions for ten years create 3,653 partitions (10 x 365 plus three
leap days) and may not be useful in many situations. Try to avoid daily partitions over a long
period of time.

Page 17-34

Partitioned Primary Indexes

Partitioning with RANGE_N – Example 2
Notes:
• This example places current and history sales data into one table.
• Current year data is partitioned on a more granular basis (daily) while historical sales
data is placed into monthly partitions.
• Partitions of varying intervals can be created on the same PPI for a table.
CREATE TABLE Sales_and_SalesHistory
    ( store_id        INTEGER NOT NULL,
      item_id         INTEGER NOT NULL,
      sales_date      DATE FORMAT 'YYYY-MM-DD',
      total_revenue   DECIMAL(9,2),
      total_sold      INTEGER,
      note            VARCHAR(256))
PRIMARY INDEX (store_id, item_id)
PARTITION BY RANGE_N (
    sales_date BETWEEN
        DATE '2003-01-01' AND DATE '2011-12-31' EACH INTERVAL '1' MONTH,
        DATE '2012-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' DAY);

To partition by week, the following partitioning can be used.
PARTITION BY RANGE_N (sales_date BETWEEN
DATE '2003-01-01' AND DATE '2003-12-31' EACH INTERVAL '7' DAY,
DATE '2004-01-01' AND DATE '2004-12-31' EACH INTERVAL '7' DAY,
:
:

Partitioned Primary Indexes

Page 17-35

Partitioning – Example 3
This example partitions by Store Id (store number). Prior to Teradata 14.0, a table has a
maximum limit of 65,535 partitions. Therefore, the partitioning expression value from Store
Id or an expression involving Store Id must be between 1 and 65,535.
If a company had a small number of stores, you could use the RANGE_N expression to limit
the number of possible partitions. The alternative partitioning (that is shown on facing page)
expression allows for ten partitions instead of 65,535. The optimizer may be able to more
accurately cost join plans when the maximum number of partitions is known and small,
making this a better choice than using the column directly.
Assume that a company has 1000 stores, and the store numbers (store_id) are from 100001
to 101001. To utilize 1000 partitions, the following partitioning expression could be
defined.
... PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY store_id - 100000;

If a company has a small number of stores and a small number of products, another option
may be to partition by a combination of Store Id and Item Id.
Assume the following:
Store numbers – 100001 to 100065 - less than 65 stores
Item numbers – 5000 to 5999 - less than 1000 item ids
Basically, the item codes reduce to three digits (item_id - 5000 yields 0 to 999) and there are fewer than 65 stores.
This table could be partitioned as follows:
... PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY ((store_id - 100000) * 1000 + (item_id - 5000));

Assume that the store_id is 100009 and the item_id is 5025. This row would be placed in
partition # 9025.
If many queries specify both a Store Id and an Item Id, this might be a useful partitioning
scheme. Even if it wouldn’t be useful, it demonstrates that the physical designers and/or
database administrators have wide latitude in defining generalized partitioning schemes to
meet the needs of individual tables.

Page 17-36

Partitioned Primary Indexes

Partitioning – Example 3
Notes:
• The simplest partitioning expression uses one column from the row without
modification. Before Teradata 14.0, the column values must be between 1 and 65,535.
• Assume the store_id is a value between 100001 and 101001. Therefore, a simple
calculation can be performed.
• This example will partition the data by store_id and effectively utilize 1000 partitions.
CREATE TABLE Store_Sales
    ( store_id        INTEGER NOT NULL,
      item_id         INTEGER NOT NULL,
      sales_date      DATE FORMAT 'YYYY-MM-DD',
      total_revenue   DECIMAL(9,2),
      total_sold      INTEGER,
      note            VARCHAR(256))
UNIQUE PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY store_id - 100000;

Alternative Definition:
• Assume the customer wishes to group these 1000 stores into 100 partitions.
• The RANGE_N expression can be used to identify the number of partitions and group
multiple stores into the same partition.
PARTITION BY RANGE_N ( (store_id - 100000) BETWEEN 1 AND 1000 EACH 10);
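As a worked illustration of the alternative definition (hypothetical store numbers within the
assumed 100001 to 101000 range), each group of ten consecutive store numbers maps to one
partition:

    store_id 100009:  (100009 - 100000) =   9   ->  range   1..10    ->  partition 1
    store_id 100355:  (100355 - 100000) = 355   ->  range 351..360   ->  partition 36
    store_id 100995:  (100995 - 100000) = 995   ->  range 991..1000  ->  partition 100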

Partitioned Primary Indexes

Page 17-37

Special Partitions with CASE_N and RANGE_N
The keywords, NO CASE (or NO RANGE) [OR UNKNOWN] and UNKNOWN are used
to define the specific-use partitions.
Even if these options are not specified with the CASE_N (or RANGE_N) expressions, these
two specific-use partitions are still reserved in the event the ALTER TABLE command is
later used to add these options.
If it is necessary to test a CASE_N condition directly as NULL, it needs to be the first
condition listed. This following example is correct. NULLs will be placed in partition #1.
PARTITION BY CASE_N
(col3 IS NULL,
col3 < 10,
col3 < 100,
NO CASE OR UNKNOWN)
INSERT INTO PPI_TabA VALUES (1, 'A', NULL, DATE);
INSERT INTO PPI_TabA VALUES (2, 'B', 5, DATE);
INSERT INTO PPI_TabA VALUES (3, 'C', 50, DATE);
INSERT INTO PPI_TabA VALUES (4, 'D', 500, DATE);
INSERT INTO PPI_TabA VALUES (5, 'E', NULL, DATE);
SELECT PARTITION AS "Part #", COUNT(*) FROM PPI_TabA
GROUP BY 1 ORDER BY 1;

Part #    Count(*)
   1          2
   2          1
   3          1
   4          1

Although you can code an example as follows, it should not be coded this way because the
results may not be what you expect. Since a NULL causes the first condition (col3 < 10) to
evaluate to UNKNOWN, NULLs will be placed in partition #4 (NO CASE OR UNKNOWN), not
in the partition for the IS NULL condition.
PARTITION BY CASE_N
(col3 < 10,
col3 IS NULL,
col3 < 100,
NO CASE OR UNKNOWN)
SELECT PARTITION AS "Part #", COUNT(*) FROM PPI_TabA
GROUP BY 1 ORDER BY 1;

Part #    Count(*)
   1          1
   3          1
   4          3

Page 17-38

Partitioned Primary Indexes

Special Partitions with CASE_N and RANGE_N
The CASE_N and RANGE_N functions can place rows into specific-use partitions when ...

• the expression doesn’t meet any of the CASE and RANGE expressions.
• the expression evaluates to UNKNOWN.
• two partition numbers are reserved even if the above options are not used.
The PPI keywords used to define two specific-use partitions are:

• NO CASE (or NO RANGE) [OR UNKNOWN]
– If this option is used, then a specific-use partition is used when the expression isn't true for
any case (or is out of range).

– If OR UNKNOWN is included with the NO CASE (or NO RANGE), then UNKNOWN expressions
are also placed in this partition.

• UNKNOWN
– If this option is specified, a different specific-use partition is used for unknowns.

• NO CASE (or NO RANGE), UNKNOWN
– If this option is used, then two separate specific-use partitions are used: one when the expression
isn't true for any case (or is out of range) and a different one for unknowns (e.g., NULLs).

Partitioned Primary Indexes

Page 17-39

Special Partition Examples
This example assumes the following CASE_N expression.
PARTITION BY CASE_N (
col3 < 10 ,
col3 < 100 ,
col3 < 1000 ,
NO CASE OR UNKNOWN)

This statement creates four partitions, conceptually numbered (*Note) from one to four in
the order they are defined. The first partition is when col3 is less than 10, the second
partition is when col3 is at least 10 but less than 100, and the third partition is when col3 is
at least 100 but less than 1,000.
The NO CASE OR UNKNOWN partition is for any value which isn't true for any previous
CASE_N expression. In this case, it would be when col3 is equal to or greater than 1,000 or
when col3 is NULL.
This partition is also used for values for which it isn't possible to determine the truth of the
previous CASE_N expressions. Usually, this is a case where col3 is NULL or unknown.
Internally, UNKNOWN (option by itself) rows are assigned to partition #1. NO CASE (NO
RANGE) OR UNKNOWN rows are physically assigned to partition #2. Internally, the first
user-defined partition is actually partition #3.
The physical implementation in the file system is:

col3 < 10             – partition #1   (internally, rows placed in partition #3)
col3 < 100            – partition #2   (internally, rows placed in partition #4)
col3 < 1000           – partition #3   (internally, rows placed in partition #5)
NO CASE or UNKNOWN    – partition #4   (internally, rows placed in partition #2)

It is NOT syntactically possible to code a partitioning expression that has both NO CASE
OR UNKNOWN, and UNKNOWN in the same expression. UNKNOWN expressions will
either be placed in the partition with NO CASE or in a partition of their own. The following
SQL is NOT permitted.
PARTITION BY CASE_N (
    col3 < 10 ,
    :
    NO CASE OR UNKNOWN,
    UNKNOWN)               - causes an error

Page 17-40
Partitioned Primary Indexes

Special Partition Examples
The following examples illustrate the use of NO CASE and UNKNOWN options.
Ex. 1   PARTITION BY CASE_N (
            col3 < 10 ,
            col3 < 100 ,
            col3 < 1000 ,
            NO CASE OR UNKNOWN)

        If col3 = 5,      row is assigned to Partition #1.
        If col3 = 50,     row is assigned to Partition #2.
        If col3 = 500,    row is assigned to Partition #3.
        If col3 = 5000,   row is assigned to Partition #4.
        If col3 = NULL,   row is assigned to Partition #4.

In summary, NO CASE and UNKNOWN rows are placed into the same partition.

Ex. 2 PARTITION BY CASE_N (
col3 < 10 ,
col3 < 100 ,
col3 < 1000 ,
NO CASE,
UNKNOWN)

If col3 = 5,      row is placed in Partition #1.
If col3 = 50,     row is placed in Partition #2.
If col3 = 500,    row is placed in Partition #3.
If col3 = 5000,   row is placed in Partition #4.
If col3 = NULL,   row is placed in Partition #5.

In summary, NO CASE and UNKNOWN rows are placed into separate partitions.
Note: RANGE_N works in a similar manner.
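A comparable hedged sketch for RANGE_N (hypothetical column col3 and ranges):

    PARTITION BY RANGE_N (
        col3 BETWEEN 1 AND 1000 EACH 100,
        NO RANGE,
        UNKNOWN)

    If col3 = 250,   row is placed in a user-defined range partition (#3, values 201 to 300).
    If col3 = 5000,  row is placed in the NO RANGE partition.
    If col3 = NULL,  row is placed in the UNKNOWN partition.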

Partitioned Primary Indexes

Page 17-41

Partitioning with CASE_N – Example 4
This example illustrates the capability of partitioning based upon conditions (CASE_N).
For example, assume a table has a total revenue column, defined as decimal. The table
could be partitioned on that column, so that low revenue products are separated from high
revenue products. The partitioning expression could be written as shown on the facing page.
In this example, 8 partitions are defined for total revenue values up to 100,000. Two
additional partitions are defined – one for revenues greater than 100,000 and another for
unknown revenues (e.g., NULL).
Teradata 13.10 Note: Teradata 13.10 allows CURRENT_DATE and/or
CURRENT_TIMESTAMP with partitioning expressions. However, it is recommended to
NOT use these in a CASE expression for a partitioned primary index (PPI). Why? In this case,
all rows are scanned during reconciliation.

Additional examples:
The following examples illustrate the use of the NO CASE option by itself or the
UNKNOWN option by itself.
Ex.1

PARTITION BY CASE_N (
col3 < 10 ,
col3 < 100 ,
col3 < 1000 ,
NO CASE)
If col3 = 5, row is assigned to Partition #1.
If col3 = 50, row is assigned to Partition #2.
If col3 = 500, row is assigned to Partition #3.
If col3 = 5000, row is assigned to Partition #4.
If col3 = NULL, Error 5728
5728: Partitioning violation for table DBname.Tablename.

Ex. 2

PARTITION BY CASE_N (
col3 < 10 ,
col3 < 100 ,
col3 < 1000 ,
UNKNOWN)
If col3 = 5,      row is assigned to Partition #1.
If col3 = 50,     row is assigned to Partition #2.
If col3 = 500,    row is assigned to Partition #3.
If col3 = 5000,   Error 5728
If col3 = NULL,   row is assigned to Partition #4.

5728: Partitioning violation for table DBname.Tablename.

Page 17-42

Partitioned Primary Indexes

Partitioning with CASE_N – Example 4
Notes:

• Partition the data based on total revenue for the products.
• The NO CASE and UNKNOWN options allow for total_revenue >=100,000 or “unknown
revenue”.

• A UPI is NOT allowed because the partitioning columns are NOT part of the PI.
CREATE TABLE Sales_Revenue
    ( store_id        INTEGER NOT NULL,
      item_id         INTEGER NOT NULL,
      sales_date      DATE FORMAT 'YYYY-MM-DD',
      total_revenue   DECIMAL(9,2),
      total_sold      INTEGER,
      note            VARCHAR(256))
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
    ( total_revenue <    2000 ,
      total_revenue <    4000 ,
      total_revenue <    6000 ,
      total_revenue <    8000 ,
      total_revenue <   10000 ,
      total_revenue <   20000 ,
      total_revenue <   50000 ,
      total_revenue <  100000 ,
      NO CASE ,
      UNKNOWN );

Partitioned Primary Indexes

Page 17-43

SQL Use of PARTITION Key Word
The facing page contains an example of using the key word PARTITION to determine the
number of rows there are in physical partitions. This example is based on the
Sales_Revenue table defined on the previous page.
The following table shows the same result as the facing page, but also identifies the internal
partition #’s as allocated.
Part #   Row Count
  1        169690     internally mapped to partition #3
  2        163810     internally mapped to partition #4
  3         68440     internally mapped to partition #5
  4         33490     internally mapped to partition #6
  5         18640     internally mapped to partition #7
  6         27520     internally mapped to partition #8
  7          1760     internally mapped to partition #9

Note that this table does not have any rows with a total_revenue value greater than 50,000
and less than 100,000. Partition #8 was not assigned. Also, there are no rows with a
total_revenue >=100,000 or NULL because the NO CASE and UNKNOWN partitions are
not used.
Assume the following three SQL INSERT commands are executed:
INSERT INTO Sales_Revenue
VALUES (1003, 5051, CURRENT_DATE, 51000, 45, NULL);
INSERT INTO Sales_Revenue
VALUES (1003, 5052, CURRENT_DATE, 102000, 113, NULL);
INSERT INTO Sales_Revenue
VALUES (1003, 5053, CURRENT_DATE, NULL, NULL, NULL);

The result of executing the SQL statement again would now be as follows:
Part #   Row Count
  1        169690     internally mapped to partition #3
  2        163810     internally mapped to partition #4
  3         68440     internally mapped to partition #5
  4         33490     internally mapped to partition #6
  5         18640     internally mapped to partition #7
  6         27520     internally mapped to partition #8
  7          1760     internally mapped to partition #9
  8             1     internally mapped to partition #10
  9             1     internally mapped to partition #2 (NO CASE)
 10             1     internally mapped to partition #1 (UNKNOWN)

Page 17-44

Partitioned Primary Indexes

SQL Use of PARTITION Key Word
The PARTITION SQL key word can be used to return partition numbers that have rows and
a count of rows that are currently located in partitions of a table.
SQL:
SELECT    PARTITION AS "Part #",
          COUNT(*)  AS "Row Count"
FROM      Sales_Revenue
GROUP BY  1
ORDER BY  1;

Result:
Part #   Row Count
  1        169690     total_revenue <   2,000
  2        163810     total_revenue <   4,000
  3         68440     total_revenue <   6,000
  4         33490     total_revenue <   8,000
  5         18640     total_revenue <  10,000
  6         27520     total_revenue <  20,000
  7          1760     total_revenue <  50,000

SQL - insert two rows:
INSERT INTO Sales_Revenue VALUES (1003, 5052, CURRENT_DATE, 102000, 113, NULL);
INSERT INTO Sales_Revenue VALUES (1003, 5053, CURRENT_DATE, NULL, NULL, NULL);

SQL (same as above):
SELECT    PARTITION AS "Part #",
          COUNT(*)  AS "Row Count"
FROM      Sales_Revenue
GROUP BY  1
ORDER BY  1;

Result:
Part #   Row Count
  1        169690     total_revenue <   2,000
  2        163810     total_revenue <   4,000
  :             :          :
  7          1760     total_revenue <  50,000
  9             1     NO CASE
 10             1     UNKNOWN

Partitioned Primary Indexes

Page 17-45

SQL Use of CASE_N
The facing page contains an example of using the CASE_N expression with SQL. You may
wish to use this function to determine/forecast how rows will be mapped to various
partitions in a table. The Sales_Revenue table was created as follows:
CREATE TABLE Sales_Revenue
    ( store_id        INTEGER NOT NULL,
      item_id         INTEGER NOT NULL,
      sales_date      DATE FORMAT 'YYYY-MM-DD',
      total_revenue   DECIMAL(9,2),
      total_sold      INTEGER,
      note            VARCHAR(256))
PRIMARY INDEX (store_id, item_id, sales_date)
PARTITION BY CASE_N
    ( total_revenue <  2000,  total_revenue <   4000,
      total_revenue <  6000,  total_revenue <   8000,
      total_revenue < 10000,  total_revenue <  20000,
      total_revenue < 50000,  total_revenue < 100000,
      NO CASE, UNKNOWN);

The CASE_N expression in the query on the facing page is simply an SQL statement that
shows how the rows would be partitioned.

SQL Use of RANGE_N
An example of using the RANGE_N expression with SQL is:
SELECT    RANGE_N ( Calendar_Date BETWEEN
              DATE '2004-11-28' AND DATE '2004-12-31' EACH INTERVAL '7' DAY,
              DATE '2005-01-01' AND DATE '2005-01-09' EACH INTERVAL '7' DAY)
                                AS "Part #",
          MIN (Calendar_Date)   AS "Minimum Date",
          MAX (Calendar_Date)   AS "Maximum Date"
FROM      Sys_Calendar.Calendar
WHERE     Calendar_Date BETWEEN DATE '2004-11-28' AND DATE '2005-01-09'
GROUP BY  "Part #"
ORDER BY  "Part #";

Output from this SQL is:

Part #   Minimum Date   Maximum Date
  1      2004-11-28     2004-12-04
  2      2004-12-05     2004-12-11
  3      2004-12-12     2004-12-18
  4      2004-12-19     2004-12-25
  5      2004-12-26     2004-12-31
  6      2005-01-01     2005-01-07
  7      2005-01-08     2005-01-09

Page 17-46

Partitioned Primary Indexes

SQL Use of CASE_N
The CASE_N (and RANGE_N) expressions can be used with SQL to forecast the
number of rows that will be placed into partitions.
This example uses a different partitioning scheme than the table actually has to determine
how many rows would be placed into various partitions.
SELECT    CASE_N ( total_revenue <  1500 ,
                   total_revenue <  2000 ,
                   total_revenue <  3000 ,
                   total_revenue <  5000 ,
                   total_revenue <  8000 ,
                   total_revenue < 12000 ,
                   total_revenue < 20000 ,
                   total_revenue < 50000 ,
                   NO CASE,
                   UNKNOWN )   AS "Case #",
          COUNT(*)             AS "Row Count"
FROM      Sales_Revenue
GROUP BY  1
ORDER BY  1;
Result:
Case #   Row Count
  1         81540
  2         88150
  3         97640
  4        103230
  5         64870
  6         31290
  7         14870
  8          1760

Notes:

• Currently, in this table, there are no rows with total_revenue >= 50,000 or NULL.
• The Case # would become the Partition # if the table was partitioned in this way.

Partitioned Primary Indexes

Page 17-47

Using ALTER TABLE with PPI Tables
The ALTER TABLE statement has been extended in support of PPI. For empty tables, the
primary index and partitioning expression may be re-specified. For tables with rows, the
partitioning expression may be modified only in ways that don’t require existing rows to be
re-evaluated.
The permitted changes for populated tables are to drop ranges at the ends or to add ranges at
the ends. For example, a common use of this capability would be to drop ranges for the
oldest dates, and to prepare additional ranges for future dates, among other things.
Limitations with ALTER TABLE:

•  The Primary Index of a non-empty table may not be altered.
•  Partitioning of a non-empty table is generally limited to altering the “ends”.
•  If a table has Delete triggers, they must be disabled if the WITH DELETE option
   is specified.
•  If a save table has Insert triggers, they must be disabled if the WITH INSERT
   option is specified.

For empty tables with a PPI, the ALTER TABLE statement can be used to do the following:

•  Remove partitioning for a partitioned table
•  Establish partitions for a table (adds or replaces)
•  Change the columns that comprise the primary index
•  Change a unique primary index to non-unique
•  Change a non-unique primary index to unique

For empty or non-empty tables, the ALTER TABLE statement can also be used to name an
unnamed primary index or drop the name of a named primary index.


•  To name an unnamed primary index or change the existing name of a primary
   index to something else, specify
       … MODIFY PRIMARY INDEX index_name;

•  To drop the name of a named index, specify
       … MODIFY PRIMARY INDEX NOT NAMED;

Assume you have a populated data table (and the table is quite large) defined with a
“non-unique partitioned primary index” and all of the partitioning columns are part of the PI.
You realize that the table should have been defined with a “unique partitioned primary index”,
but the table is already loaded with data. Here is a technique to convert this NUPI into a
UPI without copying or reloading the data.

Page 17-48

•  CREATE a USI on the columns making up the PI. ALTER the table, effectively
   changing the NUPI to a UPI, and the software will automatically drop the USI.
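A hedged sketch of that technique (hypothetical table MyPPI whose PI columns col1 and col2
include all of the partitioning columns; verify the exact ALTER TABLE syntax against the
reference manual for your release):

    CREATE UNIQUE INDEX (col1, col2) ON MyPPI;      /* enforce uniqueness on the PI columns    */
    ALTER TABLE MyPPI MODIFY UNIQUE PRIMARY INDEX;  /* the NUPI becomes a UPI; the USI created */
                                                    /* above is dropped automatically          */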

Partitioned Primary Indexes

Using ALTER TABLE with PPI Tables
The ALTER TABLE statement has enhancements for a partitioned table to modify the
partitioning properties of the primary index for a table.
For populated tables, ...

• You are permitted to drop and/or add ranges at the “ends” of existing partitions on a
range-partitioned table.
– ALTER TABLE includes ADD / DROP RANGE options.
– You can also add or drop special partitions (NO RANGE or UNKNOWN).
– You cannot drop all the ranges.

• Possible use – drop ranges for the oldest dates and prepare additional ranges for
future dates.

• The set of primary index columns cannot be altered for a populated table.
Teradata 13.10 Feature

• ALTER TABLE has a new option to resolve partitioned table definitions with DATE,
CURRENT_DATE, and CURRENT_TIMESTAMP to their current values.
– This feature only applies to partitioned tables and join indexes.
To use ALTER TABLE for any purpose other than the above situations,
the table must be empty.

Partitioned Primary Indexes

Page 17-49

ALTER TABLE – Example 5
The DROP RANGE option is used to drop a range set from the RANGE_N function on
which the partitioning expression for the table is based. You can only drop ranges if the
partitioning expression for the table is derived only from a RANGE_N function. You can
drop empty partitions without specifying the WITH DELETE or WITH INSERT option.
Some of the ALTER TABLE statement options include:
DROP RANGE WHERE conditional_expression – a conditional partitioning expression used
   to drop a range set from the RANGE_N function on which the partitioning expression
   for the table is based.
   You can only drop ranges if the partitioning expression for the table is derived only
   from a RANGE_N function.
   You must base the conditional_partitioning_expression on the system-derived
   PARTITION column.

DROP RANGE BETWEEN … [NO RANGE [OR UNKNOWN]] – used to drop a set of
   ranges from the RANGE_N function on which the partitioning expression for the table
   is based.
   You can also drop NO RANGE OR UNKNOWN and UNKNOWN specifications from
   the definition for the RANGE_N function.
   You can only drop ranges if the partitioning expression for the table is derived
   exclusively from a RANGE_N function.
   Ranges must be specified in ascending order.

ADD RANGE BETWEEN … [NO RANGE [OR UNKNOWN]] – used to add a set of ranges
   to the RANGE_N function on which the partitioning expression for the table is based.
   You can also add NO RANGE OR UNKNOWN and UNKNOWN specifications to the
   definition for the RANGE_N function.
   You can only add ranges if the partitioning expression for the table is derived
   exclusively from a RANGE_N function.

DROP Does NOT Mean DELETE
If a table does not have the NO RANGE partition, then partitions are dropped from the table
without using the Transient Journal and the rows are either deleted or are copied (WITH
INSERT) into a user-specified table.
If a table has a NO RANGE partition, rows are copied from dropped partition into the NO
RANGE partition.

Page 17-50

Partitioned Primary Indexes

ALTER TABLE – Example 5
To drop/add partitions and NOT COPY the old data to another table:
ALTER TABLE Sales MODIFY PRIMARY INDEX
DROP RANGE BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' EACH INTERVAL '1' MONTH
ADD RANGE BETWEEN DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' MONTH
WITH DELETE;

To drop/add partitions and COPY the old data to another table:
ALTER TABLE Sales MODIFY PRIMARY INDEX
DROP RANGE BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' EACH INTERVAL '1' MONTH
ADD RANGE BETWEEN DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' MONTH
WITH INSERT INTO SalesHistory;

Notes:
• Ranges are dropped and/or added to the "ends".
• DROP does NOT necessarily mean DELETE!
– If a table has a NO RANGE partition, rows are moved from the dropped partitions into the NO
RANGE partition. This can be time consuming.

•  The SalesHistory table must exist before using the WITH INSERT option.
•  The Sales table was partitioned as follows:
       PARTITION BY RANGE_N (sales_date BETWEEN
           DATE '2003-01-01' AND DATE '2012-12-31' EACH INTERVAL '1' MONTH );

Partitioned Primary Indexes

Page 17-51

ALTER TABLE – Example 5 (cont.)
This page contains notes on the internal implementation. The important point is to
understand that dropping or adding partitions (to the “ends” of an already partitioned table
with data) does not cause changes to the internal partitioning numbers that are currently
implemented. The logical partition numbers change, but the internal partition numbers do
not. For this reason, dropping or adding partitions does not cause an undue amount of work.
The following table shows the same result as the facing page, but also identifies the internal
partition #’s as allocated.
PARTITION   Count(*)
     1        10850     internally mapped to partition #3
     2        10150     internally mapped to partition #4
     :          :                  :
    13        12400     internally mapped to partition #15
    14        11200     internally mapped to partition #16
     :          :                  :
   119        14800     internally mapped to partition #121
   120        14950     internally mapped to partition #122

In the example on the facing page, 12 partitions were dropped for the year 2003 and 12
partitions were added for the year 2013. The partitions for 2013 don’t appear because they
are empty.
The following table shows the same result as the facing page, but also identifies the internal
partition #’s as allocated after the partitions for the year 2003 were dropped.
PARTITION   Count(*)
     1        12400     internally mapped to partition #15
     2        11200     internally mapped to partition #16
     :          :                  :
   107        14800     internally mapped to partition #121
   108        14950     internally mapped to partition #122

You can add the NO RANGE and/or UNKNOWN partitions to an already partitioned table.
ALTER TABLE Sales MODIFY PRIMARY INDEX
ADD RANGE NO RANGE OR UNKNOWN;

If this table had NO RANGE partition defined and the 12 partitions were dropped (as in this
example), the data rows from the dropped partitions are moved to the NO RANGE partition.
To remove the special partitions and delete the data, use the following command:
ALTER TABLE Sales MODIFY PRIMARY INDEX
DROP RANGE NO RANGE OR UNKNOWN
WITH DELETE;

Page 17-52

Partitioned Primary Indexes

ALTER TABLE – Example 5 (cont.)
Partitions may only be dropped or added from/to the “ends” of a populated table.
SQL:
SELECT    PARTITION,
          COUNT(*)
FROM      Sales
GROUP BY  1
ORDER BY  1;

Result:   PARTITION   COUNT(*)
               1       10850      Part #1 - January 2003
               2       10150
               :         :
             119       14800
             120       14950      Part #120 - December 2012

ALTER TABLE Sales MODIFY PRIMARY INDEX
    DROP RANGE BETWEEN DATE '2003-01-01' AND DATE '2003-12-31' EACH INTERVAL '1' MONTH
    ADD RANGE  BETWEEN DATE '2013-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' MONTH
WITH DELETE;
SQL:
SELECT    PARTITION,
          COUNT(*)
FROM      Sales
GROUP BY  1
ORDER BY  1;

Result:   PARTITION   COUNT(*)
               1       12400      Part #1 - January 2004
               2       11200
               :         :
             107       14800
             108       14950      Part #108 - December 2012

Partitioned Primary Indexes

Page 17-53

ALTER TABLE TO CURRENT
Starting with Teradata 13.10, you can now specify CURRENT_DATE and
CURRENT_TIMESTAMP functions in a partitioned primary index for base tables and join
indexes.

Also starting with Teradata 13.10, Teradata provides a new option with the ALTER TABLE
statement to modify a partitioned table that has been defined with a moving
CURRENT_DATE (or DATE) or moving CURRENT_TIMESTAMP. This new option is
called ALTER TABLE TO CURRENT.
When you specify CURRENT_DATE and CURRENT_TIMESTAMP as part of a
partitioning expression for a partitioned table, these functions resolve to the date and
timestamp when you define the PPI. To partition on a new CURRENT_DATE or
CURRENT_TIMESTAMP value, submit an ALTER TABLE TO CURRENT request.

The ALTER TABLE TO CURRENT syntax is shown on the facing page.
The WITH DELETE option is used to delete any row whose partition number evaluates to a
value outside the valid range of partitions.
The WITH INSERT [INTO] save_table option is used to insert any row whose partition
number evaluates to a value outside the valid range of partitions into the table specified by
save_table.
The WITH DELETE or INSERT INTO save_table clause is sometimes referred to as a null
partition handler. You cannot specify a null partition handler for a join index.
Save_table and the table being altered must be different tables with different names.

Page 17-54

Partitioned Primary Indexes

ALTER TABLE TO CURRENT
This Teradata 13.10 option allows you to periodically resolve the CURRENT_DATE (or
DATE) and CURRENT_TIMESTAMP of a partitioned table to their current values.
Benefits include:

• You do not have to change the partitioning expression to update the value for
CURRENT_DATE or CURRENT_TIMESTAMP.

• To partition on a new CURRENT_DATE or CURRENT_TIMESTAMP value, simply
submit an ALTER TABLE TO CURRENT request.
Considerations:

• The ALTER TABLE TO CURRENT request causes the CURRENT_DATE and/or
CURRENT_TIMESTAMP to effectively repartition the rows in the table.

• If RANGE_N specifies CURRENT_DATE or CURRENT_TIMESTAMP in a partitioning
expression, you cannot use ALTER TABLE to add or drop ranges for the table. You
must use the ALTER TABLE TO CURRENT statement to achieve this function.

Syntax:

    ALTER TABLE  { table_name | join_index_name }  TO CURRENT
        [ WITH { DELETE | INSERT [INTO] save_table } ] ;

Partitioned Primary Indexes
Page 17-55

ALTER TABLE TO CURRENT – Example 6
The ALTER TABLE TO CURRENT option allows you to periodically modify the
partitioning. This option resolves the CURRENT_DATE (or DATE) and
CURRENT_TIMESTAMP to their current values.
The example on the facing page assumes partitioning begins on a year boundary. Using this
example, considerations for the two options are:

•  With hard-coded dates in the CREATE TABLE statement, you must compute the
   new dates and specify them explicitly in the ADD RANGE clause of the request.
   This requires manual intervention every year you submit the request.

•  With CURRENT_DATE in the CREATE TABLE statement, you can schedule the
   ALTER TABLE TO CURRENT request to be submitted annually or simply
   execute it the next year. This request rolls the partition window forward by
   efficiently dropping and adding partitions.
   As a result of executing the ALTER TABLE TO CURRENT WITH DELETE,
   Teradata deletes the rows from the table because they are no longer needed.

Considerations:

•  You should evaluate how a DATE, CURRENT_DATE, or
   CURRENT_TIMESTAMP function will require reconciliation in a partitioning
   expression before you define such expressions on a table or join index.

•  If you specify multiple ranges using a DATE or CURRENT_DATE function in one
   of the ranges, and then later reconcile the partitioning, the range specified using
   CURRENT_DATE might overlap one of the existing ranges. If so, reconciliation
   aborts the request and returns an error to the requestor. If this happens, you must
   recreate the table with a new partitioning expression based on DATE or
   CURRENT_DATE. Because of this, you should design a partitioning expression
   that uses a DATE or CURRENT_DATE function in one of its ranges with care.

DATE, CURRENT_DATE, and CURRENT_TIMESTAMP functions in a partitioning
expression are most appropriate when the data must be partitioned as one or more Current
partitions and one or more History partitions, where the terms Current and History are
defined with respect to the resolved DATE, CURRENT_DATE, or
CURRENT_TIMESTAMP values in the partitioning expression.
This enables you to reconcile a table or join index periodically to move older data from the
current partition into one or more history partitions using an ALTER TABLE TO
CURRENT request instead of redefining the partitioning using explicit dates that must be
determined each time you alter a table using ALTER TABLE requests to ADD or DROP
ranges.

Page 17-56

Partitioned Primary Indexes

ALTER TABLE TO CURRENT – Example 6
This example creates a partitioning expression to maintain the last 8 years of historical
data, data for the current year, and data for one future year for a total of 10 years.
CREATE TABLE Sales
    ( store_id     INTEGER NOT NULL,
      item_id      INTEGER NOT NULL,
      sales_date   DATE FORMAT 'YYYY-MM-DD',
      :
PRIMARY INDEX (store_id, item_id)
PARTITION BY RANGE_N
    (sales_date BETWEEN DATE '2004-01-01' AND DATE '2013-12-31' EACH INTERVAL '1' MONTH);

Assuming the current year is 2012, an equivalent definition using CURRENT_DATE is:
PRIMARY INDEX (store_id, item_id)
PARTITION BY RANGE_N
(sales_date BETWEEN
CAST(((EXTRACT(YEAR FROM CURRENT_DATE) - 8 - 1900) * 10000 + 0101) AS DATE) AND
CAST(((EXTRACT(YEAR FROM CURRENT_DATE) +1 - 1900) * 10000 + 1231) AS DATE)
EACH INTERVAL '1' MONTH);

In 2013, execute ALTER TABLE Sales TO CURRENT WITH DELETE;
• Teradata deletes the rows from 2004 because they are no longer needed.
• To view the date when the table was last resolved, DBC.IndexConstraintsV provides new
columns named "ResolvedCurrent_Date" and "ResolvedCurrent_TimeStamp".

Partitioned Primary Indexes

Page 17-57

PPI Enhancements
The facing page identifies various enhancements with different Teradata releases.

Page 17-58

Partitioned Primary Indexes

PPI Enhancements
Teradata V2R6.0
• Selected Partition Archive, Restore, and Copy
• Dynamic partition elimination for merge join
• Single-AMP NUSI access when NUSI on same columns as NUPI;
• Partition elimination on RowIDs referenced by NUSI
Teradata V2R6.1
• PPI for global temporary tables and volatile tables
• Collect statistics on system-derived column PARTITION
Teradata V2R6.2
• PPI for non-compressed join indexes
Teradata 12.0
• Multi-level partitioning
Teradata 13.10
• Tables and non-compressed join indexes can now include partitioning on a character column.
• PPI tables allow a test value (e.g., RANGE_N) to have a TIMESTAMP(n) data type.
• ALTER TABLE tablename TO CURRENT …;
Teradata 14.0
• Increased partition limit to 9.223 quintillion
• New data types for RANGE_N – BIGINT and TIMESTAMP
• ADD option for a partitioning level

Partitioned Primary Indexes

Page 17-59

Multi-level PPI Concepts
The facing page identifies the basic concepts of using a multi-level PPI.
Multi-level partitioning allows each partition at a given level to be further partitioned into
sub-partitions. Each partition for a level is sub-partitioned the same per a partitioning
expression defined for the next lower level. The system hash orders the rows within the
lowest partition levels. A multilevel PPI (MLPPI) undertakes efficient searches by using
partition elimination at the various levels or combinations of levels.
Notes associated with multilevel partitioning:


•  Note that the number of levels of partitioning cannot exceed 15. Each level must
   define at least two partitions. The number of levels of partitioning may be further
   restricted by other limits such as the maximum size of the table header, data
   dictionary entry sizes, etc.

•  The number of partitions in a table cannot exceed 65,535 partitions. The number of
   partitions in an MLPPI is determined by multiplying the number of partitions at the
   different levels (d1 * d2 * d3 * …).

•  The specification order of partitioning expressions can be important for multi-level
   partitioning. The system maps multi-level partitioning expressions into a single-level
   combined partitioning expression. It then maps the resulting combined
   partition number 1-to-1 to an internal partition number.

•  A usage implication – you can alter only the highest partition level, which by
   definition is always level 1, to change the number of partitions at that level when
   the table is populated with rows.

Page 17-60

Partitioned Primary Indexes

Multi-level PPI Concepts
• Allows multiple partitioning expressions instead of only one for a table or a non-compressed
join index.
• Multilevel partitioning allows each partition at a level to be sub-partitioned.
– Each partitioning level is defined independently using a RANGE_N or CASE_N
expression.

• A multi-level PPI allows efficient searches by using partition elimination at the various
levels or combination of levels.

• Allows more flexibility in which partitioning expression to use when there are multiple
choices for the partitioning expressions.

• Teradata 14 allows for a maximum of 9.223 quintillion partitions and 62 levels.
• Syntax:
      PARTITION BY  partitioning_expression
      PARTITION BY  ( partitioning_expression [, partitioning_expression (up to 14* more) ] )

      14* – Teradata 13.10 limit.

Partitioned Primary Indexes

Page 17-61

Multi-level PPI Concepts (cont.)
The facing page contains an example showing the benefit of using a multi-level PPI.
You can use a multilevel PPI to improve query performance via partition elimination, either
at each of the partition levels or by combining all of them. An MLPPI provides multiple
access paths to the rows in the base table. As with other indexes, the Optimizer determines
if the index is usable for a query and, if usable, whether its use provides the estimated least
costly plan for executing the query.
The following list describes the various access methods that are available when a multilevel
PPI is defined for a table:













•  If there is an equality constraint on the primary index and there are constraints on
   the partitioning columns such that access is limited to a single partition at each
   level, access is as efficient as with an NPPI. This is a single-AMP, single-hash
   access in a single sub-partition at the lowest level of the partition hierarchy.

•  With constraints defined on the partitioning columns, performance of a primary
   index access can approach the performance of an NPPI depending on the extent of
   partition elimination that can be achieved. This is a single-AMP, single-hash
   access in multiple (but not all) sub-partitions at the lowest level of the partition
   hierarchy.

•  Access by means of equality constraints on the primary index columns that does
   not also include all the partitioning columns, and without constraints defined on the
   partitioning columns, might not be as efficient as access with an NPPI. The
   efficiency of the access depends on the number of non-empty sub-partitions at the
   lowest level of the partition hierarchy. This is a single-AMP, single-hash access in
   all sub-partitions at the lowest level of the partition hierarchy.

•  With constraints on the partitioning columns of a partitioning expression such that
   access is limited to a subset of, say n percent, of the partitions for that level, the
   scan of the data is reduced to about n percent of the time required by a full-table
   scan. This is an all-AMP scan of only the non-eliminated partitions for that level.
   This allows multiple access paths to a subset of the data: one for each partitioning
   expression. If constraints are defined on partitioning columns for more than one of
   the partitioning expressions in the MLPPI definition, partition elimination can lead
   to even less of the data needing to be scanned.

Page 17-62

Partitioned Primary Indexes

Multi-level PPI Concepts (cont.)
Query – Compare District 25 revenue for Week 6 vs. the same period last year.

[Slide graphic comparing how much of the 2-year Sales history must be scanned:]

•  No Partitioning            – full file scan of the sales for 2 years
•  Single-Level Partitioning  – Week 6 sales only
•  Multi-Level Partitioning   – Week 6 sales for District 25 only

Partitioned Primary Indexes

Page 17-63

Multi-level Partitioning – Example 7
You create an MLPPI by specifying two or more partitioning expressions, where each
expression must be defined using either a RANGE_N function or a CASE_N function
exclusively. The system combines the individual partitioning expressions internally into a
single partitioning expression that defines how the data is partitioned on an AMP.
The first partitioning expression is the highest level partitioning. Within each of those
partitions, the second partitioning expression defines how each of the highest-level partitions
is sub-partitioned. Within each of those second-level partitions, the third-level partitioning
expression defines how each of the second level partitions is sub-partitioned. Within each of
these lowest level partitions, rows are ordered by the row hash value of their primary index
and their assigned uniqueness value.
You define the ordering of the partitioning expressions in your CREATE TABLE SQL text,
and that ordering determines the logical ordering by RowID. Because the partitions at each
level are distributed among the partitions of the next higher level in the hierarchy, scanning
a partition at a certain level requires skipping some internal partitions.
Partition expression order does not affect the ability to eliminate partitions, but does affect
the efficiency of a partition scan. As a general rule, this should not be a concern if there are
many rows, which implies multiple data blocks, in each of the partitions.
The facing page contains an example of creating a multi-level PPI.
There are two levels of partitioning defined in this example. The first level defines 120
partitions and the second defines 75 partitions. Therefore, the total number of partitions for
the combined partitioning expression is the product of 120 * 75, or 9000.

Page 17-64

Partitioned Primary Indexes

Multi-level Partitioning – Example 7
For example, partition Claim table by "Claim Date" and "State ID".
CREATE TABLE Claim
  (claim_id     INTEGER  NOT NULL
  ,cust_id      INTEGER  NOT NULL
  ,claim_date   DATE     NOT NULL
  ,state_id     BYTEINT  NOT NULL
  ,…)
PRIMARY INDEX (claim_id)
PARTITION BY (
   /* First level of partitioning */
   RANGE_N (claim_date BETWEEN DATE '2003-01-01' AND DATE '2012-12-31'
            EACH INTERVAL '1' MONTH),
   /* Second level of partitioning */
   RANGE_N (state_id BETWEEN 1 AND 75 EACH 1) )
UNIQUE INDEX (claim_id);

Notes:
• For multi-level PPI, the set of partitioning expressions must be enclosed in
parentheses.
• Each level must define at least two partitions for a multi-level PPI.
• The number of defined partitions in this example is (120 * 75) or 9000.

Partitioned Primary Indexes

Page 17-65

Multi-level Partitioning – Example 7 (cont.)
The facing page continues the example of using a multi-level PPI. This example assumes
that the query has conditions where only claims for a specific month and for a specific state
need to be returned. Teradata only needs to scan the data blocks associated with the
specified criteria.

Page 17-66

Partitioned Primary Indexes

Multi-level PPI Example 7 (cont.)
Assume

• Eliminating all but one month out of many years of claims history would facilitate
scanning less than 2% of the claims history.

• Similarly, eliminating all but the California claims out of the many states would
facilitate scanning less than 4% of the claims history.
Then, combining both of these predicates for partition elimination would facilitate
scanning less than 0.08% of the claims history for satisfying the following query.

SELECT …
FROM   Claim C, States S
WHERE  C.state_id = S.state_id
  AND  S.state_name = 'California'
  AND  C.claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

Partitioned Primary Indexes

Page 17-67

How is the MLPPI Partition # Calculated?
The facing page shows the calculation that is used to determine the partition number for a
MLPPI table.

Page 17-68

Partitioned Primary Indexes

How is the MLPPI Partition # Calculated?
Multilevel partitioning is rewritten internally to single-level partitioning to generate a
combined partition number as follows:
(p1 - 1) * dd1 + (p2 - 1) * dd2 + ... + (p(n-1) - 1) * dd(n-1) + pn

where   n    is the number of partitioning expressions
        pi   is the value of the partitioning expression for level i
        di   is the number of partitions for level i
        ddi  is the product of d(i+1) through dn (so ddn = 1)
        dd   = d1 * d2 * ... * dn <= 65,535
        dd   is the total number of combined partitions
Example:
Assume January, 2012 is the 109th first level partition and California is the 6th state
code for the second level partition. Also assume that the first level has 120 partitions
and the second level has 75 partitions.
(109 – 1) * 75 + 6 = 8106
is the logical partition number for claims in California for January of 2012.
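To check which combined partition a set of rows actually landed in, the system-derived PARTITION column (used again in the lab at the end of this module) can be queried directly. A minimal sketch against the Example 7 Claim table, assuming California is state_id 6 as above:

    SELECT   PARTITION, COUNT(*)
    FROM     Claim
    WHERE    claim_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31'
      AND    state_id = 6
    GROUP BY 1;

If rows exist for that month and state, this should return the single combined partition number 8106 computed above.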

Partitioned Primary Indexes

Page 17-69

Character PPI
This Teradata 13.10 feature extends the capabilities and options when defining a PPI for a
table or a non-compressed join index. Tables and non-compressed join indexes can now
include partitioning on a character column. This feature provides the improved query
performance benefits of partitioning when queries have conditions on columns with
character (alphanumeric) data.


Before Teradata 13.10, customers were limited to creating partitioning on tables
that did not involve comparison of character data. Partitioning expressions were
limited to numeric or date type data.

The Partitioned Primary Index (PPI) feature of Teradata has always allowed a class of
queries to access a portion of a large table, instead of the entire table. This capability has
simply been extended to include character data. The traditional uses of the Primary Index
(PI) for data placement and rapid access of the data when the PI values are specified are still
retained.
When creating a table or a join index, the PARTITION BY clause (part of PRIMARY
INDEX) can now include partitioning on a character column. This allows the comparison of
character data.
This feature allows a partitioning expression to involve comparison of character data
(CHAR, VARCHAR, GRAPHIC, VARGRAPHIC) types. A comparison may involve a
predicate (=, >, <, >=, <=, <>, BETWEEN, LIKE) or a string function.


The use of a character expression in a PPI table is referred to as CPPI (Character
PPI).

The most common partitioning expressions utilize RANGE_N or CASE_N expressions.
Prior to Teradata 13.10, neither the CASE_N nor the RANGE_N function allowed character
data in a PPI definition. This limited the useful partitioning that could be done using
character columns, because a standard ordering (collation) of the character data was not preserved.
Both the RANGE_N and CASE_N functions support the definition of character data in
Teradata 13.10. The term "character or char" will be used to represent CHAR, VARCHAR,
GRAPHIC, or VARGRAPHIC data types.
The test value of a RANGE_N function should be a simple column reference, involving no
other functions or expressions. For example, if SUBSTR is added, then static partition
elimination will not occur. Keep the partitioning expressions as simple as possible.
RANGE_N (SUBSTR (state_code, 1, 1) BETWEEN 'AK' and 'CA', …
This definition will not allow static partition elimination.
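By contrast, a partitioning expression whose test value is the bare column reference does allow static partition elimination. A minimal sketch follows (the table name and column list are illustrative only, mirroring the RANGE_N list syntax used in Example 8 on the following pages):

    CREATE TABLE Claim_CPPI
      (claim_id     INTEGER       NOT NULL
      ,state_code   CHAR(2)       NOT NULL
      ,claim_info   VARCHAR(256))
    PRIMARY INDEX (claim_id)
    PARTITION BY RANGE_N (state_code BETWEEN 'A', 'D', 'I', 'N', 'T' AND 'ZZ', NO RANGE)
    UNIQUE INDEX (claim_id);

With this definition, a predicate such as WHERE state_code = 'CA' can be resolved to a single partition at optimization time.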

Page 17-70

Partitioned Primary Indexes

Character PPI
Tables and non-compressed join indexes can now include partitioning on a character
column. This feature is referred to as CPPI (Character PPI).

• Prior to Teradata 13.10, partitioning expressions (RANGE_N and CASE_N) are limited
to numeric or date type data.
This feature allows a partitioning expression to involve comparison of character data
(CHAR, VARCHAR, GRAPHIC, VARGRAPHIC) types. A comparison may involve a predicate
(=, >, <, >=, <=, <>, BETWEEN, LIKE) or a string function.
Collation and case sensitivity considerations:

• The session collation in effect when the character PPI is created determines the
ordering of data used to evaluate the partitioning expression.

• The ascending order of ranges in a character PPI RANGE_N expression is defined by
the session collation in effect when the PPI is created or altered, as well as the case
sensitivity of the column or expression in the test value.

• The default case sensitivity of character data for the session transaction semantics in
effect when the PPI is created will also determine case sensitivity of comparison
unless overridden with an explicit CAST to a specific case sensitivity.

Partitioned Primary Indexes

Page 17-71

Character PPI – Example 8
In this example, the Claim table is first partitioned by claim_date (monthly intervals).
Claim_date is then sub-partitioned by state codes. State codes are then sub-partitioned by
the first two letters of a city name. The special partitions of NO RANGE and UNKNOWN
are defined at the claim_date, state_code, and city levels.
Why is the facing page partitioning example defined with intervals of 1 month for
claim_date?


Teradata 13.10 has a maximum limit of 65,535 partitions in a table, and defining 8
years of day partitioning with two levels of sub-partitioning would cause this limit to be
exceeded.
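As a rough check of that limit (the range counts below are derived from the Example 8 definition on the facing page): monthly partitioning of 2005 through 2012 yields 96 ranges plus NO RANGE (97 partitions), the state_code expression defines 5 ranges plus NO RANGE (6), and the city expression defines 12 ranges plus NO RANGE (13), so the combined total is 97 * 6 * 13 = 7,566 partitions. Day-level partitioning of the same eight years (roughly 2,923 first-level partitions) would instead combine to well over 200,000 partitions, far beyond the 65,535 limit.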

The following queries will benefit from this type of partitioning.
SELECT * FROM Claim_MLPPI2
WHERE state_code = 'GA' AND city LIKE 'a%';
SELECT * FROM Claim_MLPPI2
WHERE claim_date = '2012-08-24' AND city LIKE 'a%';

The session mode when these tables were created and when these queries were executed was
Teradata mode (BTET). Teradata mode defaults to "not case specific". The session
collation in effect when the character PPI is created determines the ordering of data used to
evaluate the partitioning expression.
The ascending order of ranges in a character PPI RANGE_N expression is defined by the
session collation in effect when the PPI is created or altered, as well as the case sensitivity of
the column or expression in the test value. The default case sensitivity of character data for
the session transaction semantics in effect when the PPI is created will also determine case
sensitivity of comparison unless overridden with an explicit CAST to a specific case
sensitivity.
The default case sensitivity in effect when the character PPI is created will also affect the
ordering of character data for the PPI.


Default case sensitivity of comparisons involving character constants is influenced
by the session mode. String literals have a different default CASESPECIFIC
attribute depending on the session mode.
–  Teradata Mode (BTET) is NOT CASESPECIFIC
–  ANSI mode is CASESPECIFIC

If any expression in the comparison is case specific, then the comparison is case
sensitive.

Page 17-72

Partitioned Primary Indexes

Character PPI – Example 8
In this example, 3 levels of partitioning are defined.
CREATE TABLE Claim_MLPPI2
  (claim_id     INTEGER       NOT NULL,
   cust_id      INTEGER       NOT NULL,
   claim_date   DATE          NOT NULL,
   city         VARCHAR(30)   NOT NULL,
   state_code   CHAR(2)       NOT NULL,
   claim_info   VARCHAR(256))
PRIMARY INDEX (claim_id)
PARTITION BY
  ( RANGE_N (claim_date  BETWEEN DATE '2005-01-01' AND DATE '2012-12-31'
                         EACH INTERVAL '1' MONTH, NO RANGE),
    RANGE_N (state_code  BETWEEN 'A', 'D', 'I', 'N', 'T' AND 'ZZ', NO RANGE),
    RANGE_N (city        BETWEEN 'A', 'C', 'E', 'G', 'I', 'K', 'M', 'O',
                                 'Q', 'S', 'U', 'W' AND 'ZZ', NO RANGE) )
UNIQUE INDEX (claim_id);

The following queries will benefit from this type of partitioning.
• SELECT * FROM Claim_MLPPI2 WHERE state_code = 'OH';
• SELECT * FROM Claim_MLPPI2 WHERE state_code = 'GA' AND city LIKE 'a%';
• SELECT * FROM Claim_MLPPI2 WHERE claim_date = DATE '2012-08-24' AND city LIKE 'a%';

Partitioned Primary Indexes

Page 17-73

Summary
The customer (e.g., DBA, Database Designer, etc.) has a flexible and powerful tool to
structure tables to allow automatic optimization of frequently used queries. This tool is the
Partitioned Primary Index (PPI) feature. This feature allows tables to be partitioned on
columns of interest, while retaining the traditional use of the primary index (PI) for data
distribution and efficient access when the PI values are specified in the query.
The facing page contains a summary of the key customer benefits that can be obtained by
using Partitioned Primary Indexes.
Whether and how to partition a table is a physical design choice.
A well-chosen partitioning scheme may be able to turn many frequently run queries into
partial-table scans instead of full-table scans, with much improved performance.
However, understand that there are trade-off considerations that must be understood and
carefully considered to get the most benefit from the PPI feature.

Page 17-74

Partitioned Primary Indexes

Summary
• Improves the performance of queries that use range constraints on the range
partitioning column(s) by allowing for range/partition elimination.

– Allows primary index access and range access without a secondary index.
• General Recommendations
– Collect statistics on the system-derived column PARTITION (see the example following this list).
– Do not define or name a column PARTITION in a PPI table – you won’t be able to reference the
system-derived column PARTITION for the table.

– If possible, avoid use of NO RANGE, NO RANGE OR UNKNOWN, or UNKNOWN options with
RANGE_N partitioning for DATE columns.

– Consider only having as many date ranges as needed currently plus some for the future –
helps the optimizer cost plans better, especially when partitioning column is not included in
the Primary Index.

• Note (as with all partitioning/indexing schemes) there are tradeoffs due to performance
impacts on table access, joins, maintenance, and other operations.
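As a minimal sketch of the first recommendation above, assuming the Claim table from Example 7, statistics on the system-derived PARTITION column can be collected with a statement of this general form:

    COLLECT STATISTICS ON Claim COLUMN PARTITION;

Recollecting these statistics after partition maintenance (for example, after adding or dropping ranges with ALTER TABLE) helps the optimizer cost partition elimination more accurately.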

Partitioned Primary Indexes

Page 17-75

Module 17: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 17-76

Partitioned Primary Indexes

Module 17: Review Questions
1. In a PPI table, every row is uniquely identified by its  Partition #  +  Row Hash  +  Uniqueness Value .

2. The Row Key consists of the  Partition #  +  Row Hash .

3. In an NPPI table, the partition number defaults to  0 .

4. True or False. For a PPI table, the partition number and the Row Hash are both used by the
   Message Passing Layer to determine which AMP(s) should receive the request.
   FALSE (it only cares about the first part of the row hash)

5. Which two options apply to the RANGE_N expression in a partitioning expression? ____ ____
   a. Ranges can be listed in descending order
   b. Allows use of NO RANGE OR UNKNOWN option
   c. Partitioning column must be part of the Primary Index
   d. Has a maximum of 65,535 partitions with Teradata Release 13.10

6. With a populated table, select 2 actions that are allowed with the ALTER TABLE command. ____ ____
   a. Drop all of the ranges
   b. Add or drop ranges from the partition "ends"
   c. Change the columns that comprise the primary index
   d. Add or drop special partitions (NO RANGE, UNKNOWN)

7. Which 2 choices are advantages of partitioning a table? ____ ____
   a. Fast delete of rows in partitions
   b. Fewer AMPs are involved when accessing data
   c. Faster access (than an NPPI table) if the table has a UPI
   d. Range queries can be executed without a secondary index

Partitioned Primary Indexes

Page 17-77

Module 17: Review Questions (cont.)
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 17-78

Partitioned Primary Indexes

Module 17: Review Questions (cont.)
Given this CREATE TABLE statement, answer the following questions.
CREATE TABLE Orders
  (Order_id        INTEGER NOT NULL,
   Cust_id         INTEGER NOT NULL,
   Order_status    CHAR(1),
   Total_price     DECIMAL(9,2) NOT NULL,
   Order_date      DATE FORMAT 'YYYY-MM-DD' NOT NULL,
   Order_priority  SMALLINT,
   Clerk           CHAR(16),
   Ship_priority   SMALLINT,
   Order_Comment   VARCHAR(80) )
PRIMARY INDEX (Order_id)
PARTITION BY RANGE_N (Order_date BETWEEN DATE '2003-01-01' AND DATE '2012-12-31'
                      EACH INTERVAL '1' MONTH)
UNIQUE INDEX (Order_id);

8. What is the name of the partitioning column?  Order_date

9. What is the time period for each partition?  1 Month

10. Why is there a Unique Secondary Index specified instead of defining Order_id as a UPI? _____
    a. This is a coding style choice.
    b. You cannot have a UPI when using a partitioned primary index.
    c. You cannot have a UPI if the partitioning column is not part of the primary index.
    d. This is a mistake. You cannot have a secondary and a primary index on the same column(s).

Partitioned Primary Indexes

Page 17-79

Lab Exercise 17-1
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;
SHOW TABLE table_name;

A count of rows in the Orders table is 31,200.
A count of rows in the Orders_2012 table is 12,000.

Page 17-80

Partitioned Primary Indexes

Lab Exercise 17-1
Lab Exercise 17-1
Purpose
In this lab, use Teradata SQL Assistant to create tables with primary indexes partitioned in various
ways.
What you need
Populated DS tables and empty tables in your database
Tasks
1. Using INSERT/SELECT, populate your Orders and Orders_2012 tables from the DS.Orders and
DS.Orders_2012 tables, respectively. Your Orders table will have data for the years 2003 to 2011 and
the Orders_2012 table will have data for 2012. Verify the number of rows in your tables.
SELECT COUNT(*) FROM Orders;
SELECT COUNT(*) FROM Orders_2012;
2.

Count = _________ (should be 31,200)
Count = _________ (should be 12,000)

Use the SHOW TABLE for Orders to help create a new, similar table (same column names and
definitions, etc.) named "Orders_PPI" that has a PPI based on the following:
Primary Index – orderid
Partitioning column – orderdate
– From '2003-01-01' through '2012-12-31', partition by month
– Include the NO RANGE option (the UNKNOWN option is not needed for orderdate)
– Do not create any secondary indexes for this table
How many partitions are logically defined for the Orders_PPI table? ______

Partitioned Primary Indexes

Page 17-81

Lab Exercise 17-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;

SELECT   PARTITION, COUNT(*)
FROM     table_name
GROUP BY 1
ORDER BY 1;

SELECT   PARTITION, COUNT(*)
FROM     table_name
WHERE    orderdate BETWEEN '2012-01-01' AND '2012-12-31'
GROUP BY 1
ORDER BY 1;

SELECT COUNT(DISTINCT(PARTITION)) FROM table_name;

Page 17-82

Partitioned Primary Indexes

Lab Exercise 17-1 (cont.)
3. INSERT/SELECT all of the rows from your Orders table into the Orders_PPI table. Verify the
number of rows in your table. Count = ________
How many partitions would you estimate have data at this time? ________

4. Use the PARTITION key word to list the partitions and number of rows in various partitions.
How many partitions actually have data? ________
How many rows are in each partition for the year 2003? ________
How many rows are in each partition for the year 2011? ________
5. Use INSERT/SELECT to add the rows from the Orders_2012 table to your Orders_PPI table. Verify
the number of rows in your table. Count = ________
Use the PARTITION key word to determine the number of partitions used and the number of rows in
various partitions.
How many partitions actually have data? ________
How many rows are in each partition for the year 2012? ________

Partitioned Primary Indexes

Page 17-83

Lab Exercise 17-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;

SELECT   COUNT(DISTINCT(PARTITION))
FROM     table_name
WHERE    orderdate … ;

SELECT   MAX(PARTITION)
FROM     table_name;

SELECT   PARTITION, COUNT(*)
FROM     table_name
GROUP BY 1
ORDER BY 1;

SELECT COUNT(DISTINCT(PARTITION)) FROM table_name;

The PARTITION key word only returns partition numbers of partitions that contain rows.
The following “canned” SQL can be used to return a list of partitions that are not used
between the first and last used partitions.
SELECT p + 1 AS "The missing partitions are:"
FROM  (SELECT p1 - p2 AS p,
              PARTITION AS p1,
              MDIFF(PARTITION, 1, PARTITION) AS p2
       FROM   table_name
       QUALIFY p2 > 1) AS temp;

Page 17-84

Partitioned Primary Indexes

Lab Exercise 17-1 (cont.)
6. INSERT the following row (using these values) into the Orders_PPI table.
(100000, 1000, 'C', 1000, '2000-12-31', 10, 'your name', 5, 20, 'old order')
How many partitions are now in Orders_PPI? ____
What is the partition number (highest partition #) of the NO RANGE OR UNKNOWN partition? ____

7. (Optional) Create a new table named "Orders_PPI_ML" that has a Multi-level PPI based on the
following:
Primary Index – orderid
First Level of Partitioning column – orderdate (use month ranges for all 10 years)
Include the NO RANGE option for orderdate
Second Level of Partitioning column – location (10 different order locations, 1 through 10)
Place the NO RANGE and UNKNOWN rows into the same special partition for location
Unique secondary index – orderid

8. (Optional) Populate the Orders_PPI_ML table from the Orders and Orders_2012 tables using
INSERT/SELECT. Verify the number of rows in Orders_PPI_ML. Count = ________

Partitioned Primary Indexes

Page 17-85

Lab Exercise 17-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 VALUES (value1, value2, … );
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;
SELECT COUNT(DISTINCT(PARTITION)) FROM table_name;

Page 17-86

Partitioned Primary Indexes

Lab Exercise 17-1 (cont.)
9. (Optional) For the Orders_PPI_ML table, use the PARTITION key word to answer the following
questions.
How many partitions actually have data? ________
What is the highest partition number? _________
What is the partition number for orders in January, 2012 and location 1? _____
What is the partition number for orders in February, 2012 and location 1? _____
Is there a difference of 11 partitions between these two months? _____
Why or why not? _________________________________________________________________
10. (Optional) Before altering the table, verify the number of rows in Orders_PPI. Count = _______
Use the ALTER TABLE command on Orders_PPI to do the following.
–
–

DROP RANGE (with DELETE) for year 2003
ADD RANGE for orders that will be placed in year 2013 with an interval of 1 month

Use SHOW TABLE on Orders_PPI to view the altered partitioning.
Use the PARTITION key word to list the partitions and the number of rows in various partitions.
How many partitions currently have data rows? _______
How many rows now exist in the table? _______
Has the row count changed? ___
If the row count did not change, why not? ____________________________________________

Partitioned Primary Indexes

Page 17-87

Notes

Page 17-88

Partitioned Primary Indexes

Module 18
Teradata Columnar
After completing this module, you will be able to:
 Describe the components that comprise a Row ID in a column partitioned
table.
 Identify two advantages of creating a column partitioned table.
 Identify two disadvantages of creating a column partitioned table.
 Identify the recommended way to populate a column partitioned table.
 Specify how rows are deleted in a column partitioned table.

Teradata Proprietary and Confidential

Column Partitioning

Page 18-1

Notes

Page 18-2

Column Partitioning

Table of Contents
Teradata Columnar ..................................................................................................................... 18-4
Teradata Columnar Benefits ...................................................................................................... 18-6
Columnar Join Indexes........................................................................................................... 18-6
No Primary Index Table DDL ................................................................................................... 18-8
The No Primary Index Table.................................................................................................... 18-10
Column Partition Table DDL (without Auto-Compression) ................................................... 18-12
Characteristics of a Columnar Table .................................................................................... 18-12
Column Partition Container (No Automatic Compression) ..................................................... 18-14
The Column Partition Table (without Auto-Compression) ..................................................... 18-16
CP Table Query #1 (without Auto-Compression) ................................................................... 18-18
CP Table Query #1 (without Auto-Compression) ................................................................... 18-20
Column Partition Table DDL (with Auto-Compression) ........................................................ 18-22
Auto-Compression for CP Tables ............................................................................................ 18-24
Auto-Compression Techniques for CP Tables......................................................................... 18-26
User-Defined Compression Techniques .................................................................................. 18-28
Column Partition Container (Automatic Compression)........................................................... 18-30
The Column Partition Table (with Auto-Compression)........................................................... 18-32
CP Table Query #2 (with Auto-Compression) ........................................................................ 18-34
CP Table with Row Partitioning DDL ..................................................................................... 18-36
Determining the Column Partition Level ............................................................................. 18-36
The Column Partition Table (with Row Partitioning) ............................................................. 18-38
CP Table with Multi-Column Container DDL ........................................................................ 18-40
The CP Table with Multi-Column Container........................................................................... 18-42
CP Table Hybrid Row & Column Store DDL ......................................................................... 18-44
COLUMN Format Considerations ....................................................................................... 18-44
ROW Format Considerations ............................................................................................... 18-44
The CP Table (with Hybrid Row & Column Store) ................................................................ 18-46
Populating a CP Table.............................................................................................................. 18-48
INSERT-SELECT ................................................................................................................ 18-48
Options ................................................................................................................................. 18-48
DELETE Considerations.......................................................................................................... 18-50
The Delete Column Partition ........................................................................................... 18-50
UPDATE Considerations ......................................................................................................... 18-52
USI Access ........................................................................................................................... 18-52
NUSI Access ........................................................................................................................ 18-52
CP Table Restrictions............................................................................................................... 18-54
Summary .................................................................................................................................. 18-56
Module 18: Review Questions ................................................................................................. 18-58
Lab Exercise 18-1 .................................................................................................................... 18-60

Column Partitioning

Page 18-3

Teradata Columnar
Teradata Columnar or Column Partitioning (CP) is a new physical database design
implementation option (starting with Teradata 14.0) that allows single columns or sets of
columns of a NoPI table to be stored in separate partitions. Column partitioning can also be
applied to join indexes.
Columnar is simply a new approach for organizing the data of a user-defined table or join
index on disk.
Teradata Columnar offers the ability to partition a table or join index by column. Teradata
Columnar can be used alone or in combination with row partitioning in multilevel
partitioning definitions. Column partitions may be stored using traditional ‘ROW’ storage
or alternatively stored using the new ‘COLUMN’ storage option. In either case, columnar
can automatically compress physical rows where appropriate.
The key benefit in defining row-partitioned (PPI) tables is when queries access a subset of
rows based on constraints on one or more partitioning columns. The major advantage of
using column partitioning is to improve the performance of queries that access a subset of
the columns from a table, either for predicates (e.g., WHERE clause) or projections (i.e.,
SELECTed columns).
Because sets of one or more columns can be stored in separate column partitions, only the
column partitions that contain the columns needed by the query need to be accessed. Just as
row-partitioning can eliminate rows that need not be read, column partitioning eliminates
columns that are not needed.
The advantages of both can be combined, meaning even less data moved and thus reduced
I/O. Fewer data blocks need to be read since more data of interest is packed together into
fewer data blocks.
Columnar makes more sense in CPU-rich environments because CPU cycles are needed to
“glue” columns back together into rows, for compression and for different join strategies
(mainly hash joins).

Page 18-4

Column Partitioning

Teradata Columnar
•

Description

– Columnar (or Column Partitioning) is a new physical database design
implementation option that allows sets of columns (including just a single column)
of a table or join index to be stored in separate partitions.

– This is effectively an I/O reduction feature to improve performance for suitable
classes of workloads.

– This allows the capability for a table or join index to be column (vertically)
partitioned, row (horizontally) partitioned or both by using the already existing
multilevel partitioning capability.

•

Considerations

– Note that column partitioning is a physical database design choice and may not be
suitable for all workloads using that table/join index.

– It is especially suitable if both a small number of rows are selected and a few
columns are projected.

– When individual rows are deleted, they are not physically deleted, but are marked
as deleted.

Column Partitioning

Page 18-5

Teradata Columnar Benefits
The facing page lists a number of Teradata Columnar benefits.

Columnar Join Indexes
A join index can also be created as column-partitioned for either a columnar table or a
non-columnar table. Conversely, a join index can be created as non-columnar for either type of
table as well.
Sometimes within a mixed workload, some queries perform better if the data is not column
partitioned and others perform better if it is. Or, perhaps some queries perform
better with one type of partitioning on a table, whereas other queries do better with another
type of partitioning. Join indexes allow creation of alternate physical layouts for the data,
with the optimizer automatically choosing whether to access the base table and/or one of its
join indexes.
A column-partitioned join index must be a single-table, non-aggregate, non-compressed,
join index with no primary index, and no value-ordering, and must include RowID of the
base table. A column-partitioned join index may optionally be row partitioned. It may also
be a sparse join index.
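As a sketch only (the index name and column choice are illustrative, not taken from this course), a column-partitioned single-table join index on the Super_Bowl table used later in this module might look like the following; note that it selects the ROWID of the base table and specifies no primary index:

    CREATE JOIN INDEX Super_Bowl_CJI AS
      SELECT ROWID, Winner, Game_Date
      FROM   Super_Bowl
    PARTITION BY COLUMN;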
This module will only describe and include examples of base tables that utilize column
partitioning.

Page 18-6

Column Partitioning

Teradata Columnar Benefits
Benefits of using the Teradata Columnar feature include:

• Improved query performance
Column partitioning can be used to improve query performance via column partition
elimination. Column partition elimination reduces the need to access all the data in a row
while row partition elimination reduces the need to access all the rows.

• Reduced disk space
The feature also allows for the possibility of using a new auto-compression capability which
allows data to be automatically (as applicable) compressed as physical rows are inserted
into a column-partitioned table or join index.

• Increased flexibility
Provides a new physical database design option to improve performance for suitable classes
of workloads.

• Reduced I/O
Allows fast and efficient access to selected data from column partitions, thus reducing query
I/O.

• Ease of use
Provides simple default syntax for the CREATE TABLE and CREATE JOIN INDEX
statements. No change is needed to queries.

Column Partitioning

Page 18-7

No Primary Index Table DDL
The facing page simply illustrates the DDL to create a NoPI table. This example will be used as the
basis for multiple examples of creating tables with various column partitioning options.

Page 18-8

Column Partitioning

No Primary Index Table DDL
CREATE TABLE Super_Bowl
  (Winner       CHAR(25)  NOT NULL
  ,Loser        CHAR(25)  NOT NULL
  ,Game_Date    DATE      NOT NULL
  ,Game_Score   CHAR(7)   NOT NULL
  ,Attendance   INTEGER)
NO PRIMARY INDEX;

In this module, we will use an example of Super Bowl history information to
simply demonstrate column partitioning.

Column Partitioning

Page 18-9

The No Primary Index Table
The No Primary Index table is shown on the facing page.

Page 18-10

Column Partitioning

The No Primary Index Table
Partition:               For NoPI tables this number is 0
HB:                      The lowest number hash bucket on this AMP
Row # (Uniqueness ID):   The row number of this row
P-Bits #:                Presence Bits for the nullable columns

Partition  HB  Row #  P-Bits  Winner               Loser                 Game_Date   Game_Score  Attendance
    0      n     1      0     Dallas Cowboys       Denver Broncos        01-15-1978    27-10      (null)
    0      n     2      1     Chicago Bears        New England Patriots  01-26-1986    46-10      73,818
    0      n     3      1     Pittsburgh Steelers  Arizona Cardinals     02-01-2009    27-23      70,774
    0      n     4      1     Pittsburgh Steelers  Minnesota Vikings     01-12-1975    16-6       80,997
    0      n     5      1     Pittsburgh Steelers  Seattle Seahawks      02-05-2006    21-10      68,206
    0      n     6      1     New York Jets        Baltimore Colts       01-12-1969    16-7       75,389
    0      n     7      0     Dallas Cowboys       Buffalo Bills         01-31-1993    52-17      (null)
    0      n     8      1     Oakland Raiders      Philadelphia Eagles   01-25-1981    27-10      76,135
    0      n     9      1     San Francisco 49ers  Cincinnati Bengals    01-24-1982    26-21      81,270

The Partition #, HB, and Row # (Uniqueness ID) are collectively known as the ROWID.

Column Partitioning

Page 18-11

Column Partition Table DDL (without Auto-Compression)
With column partitioning, each column or specified group of columns in the table can
become a partition containing the column partition values of that column partition. This is
the simplest partitioning approach since there is no need to define partitioning expressions,
as seen in the example on the facing page.
The clause PARTITION BY COLUMN specifies that the table has column partitioning.
Each column of this table will have its own partition and will be (by default) in column
storage since no explicit column grouping is specified.
Note that a primary index is not specified since this is NoPI table. A primary index may not
be specified if the table is column partitioned.

Characteristics of a Columnar Table
A table or join index that is partitioned by column has several key characteristics:


It does not have a primary index.



Each column partition can be composed of single or multiple columns.



Each column partition usually consists of multiple physical rows.



A new physical row format COLUMN may be utilized for a column partition. Such
a physical row is called a ‘container’ and it is used to implement columnar-storage
for a column partition.



Alternatively, a column partition may also have traditional physical rows with
ROW format. Such a physical row for columnar partitions is called a subrow. This
is used to implement row-storage for a column partition.



Note that in subsequent discussions, when ‘row storage’ or ‘row format’ is stated, it
is referring to columnar partitioning with the ROW storage option selected. This is
not to be confused with row-partitioning which we associate with a PPI table.

In a table with multiple levels of partitioning, only one level may be column partitioned. All
other levels must be row-partitioned (i.e., PPI).

Page 18-12

Column Partitioning

Column Partition Table DDL
(without Auto-Compression)
CREATE TABLE Super_Bowl
  (Winner       CHAR(25)  NOT NULL
  ,Loser        CHAR(25)  NOT NULL
  ,Game_Date    DATE      NOT NULL
  ,Game_Score   CHAR(7)   NOT NULL
  ,Attendance   INTEGER)
NO PRIMARY INDEX
PARTITION BY COLUMN
  (Winner       NO AUTO COMPRESS
  ,Loser        NO AUTO COMPRESS
  ,Game_Date    NO AUTO COMPRESS
  ,Game_Score   NO AUTO COMPRESS
  ,Attendance   NO AUTO COMPRESS);
Defaults for a column partitioned table.

• Single-column partitions; options include multicolumn partitions.
• Auto compression is on; NO AUTO COMPRESS turns off auto-compression for the
column.

• System-determined column-store for above column partitions; options include
column-store (COLUMN) or row-store (ROW).

Column Partitioning

Page 18-13

Column Partition Container (No Automatic
Compression)
In order to support columnar-storage for a column partition, a new format, referred to as a
COLUMN format in the syntax, is available for a physical row.
A physical row with this format is referred to as a container and each container holds a
series of column partition values for a column partition.
Each container is assigned a specific partition number which identifies the column or group
of columns whose column partition values are held in the container. When a column
partition is stored on disk, one or more containers may be needed to hold all the column
partition values of the column partition. Since a container is a physical row, the size of a
container is limited by the maximum physical row size.
The example on the facing page assumes that NO AUTO COMPRESS has been specified
for the column.
Containers hold multiple values for the same column (or columns) of a table. For purposes
of this explanation, the assumption is being made that each partition contains only a single
column so a column partition value is the same as a column value. Recall that each column
value belongs to a specific row and that each row is identified by a RowID consisting of a
row-hash and uniqueness value. Since all of the rows on a single AMP of a NoPI table share
the same row hash, the uniqueness value becomes the real differentiator. So the connection
between a specific column value for a particular row on a given AMP and its uniqueness
value is the key in locating the corresponding column value.
Assume that a given container holds 1000 values. The RowID of each container carries a
hash bucket and uniqueness which represents the first column value entry in the container.
The first value’s hash bucket and uniqueness is explicit while the other values’ hash bucket
and uniqueness are implicit and are understood based on their position in their container.
The exact location of a column partition value is known based on its relative position within
the container. For example, if the uniqueness value in the container’s RowID is 201 and a
column partition value with a uniqueness value of 205 needs to be located, the 5th entry in
that container is the corresponding column partition value.

Page 18-14

Column Partitioning

Column Partition Container
(No Automatic Compression)

Column Store RowID:   Partition | HB | Row # (starting row number)
1's & 0's:            Auto-Compression & NULL Bits

Column Data:          Dallas Cowboys
                      Chicago Bears
                      Pittsburgh Steelers
                      Pittsburgh Steelers
                      Pittsburgh Steelers
                      New York Jets
                      Dallas Cowboys
                      Oakland Raiders
                      San Francisco 49ers

Column Container is effectively a row in the partition.

Column Partitioning

Page 18-15

The Column Partition Table (without Auto-Compression)
The result of creating a column partitioned table (as shown previously) is shown on the
facing page with some sample data.
The clause PARTITION BY COLUMN specifies that the table has column partitioning.
Each column of this table will have its own partition and will be (by default) in column
storage since no explicit column grouping is specified.
The default of auto-compression is overridden for each of the columns.

Page 18-16

Column Partitioning

The Column Partition Table
(without Auto-Compression)
Column Containers

Winner                Loser                  Game_Date          Game_Score         Attendance
Part 1-HB-Row #1      Part 2-HB-Row #1       Part 3-HB-Row #1   Part 4-HB-Row #1   Part 5-HB-Row #1
1's & 0's             1's & 0's              1's & 0's          1's & 0's          1's & 0's
Dallas Cowboys        Denver Broncos         01-15-1978         27-10              (null)
Chicago Bears         New England Patriots   01-26-1986         46-10              73,818
Pittsburgh Steelers   Arizona Cardinals      02-01-2009         27-23              70,774
Pittsburgh Steelers   Minnesota Vikings      01-12-1975         16-6               80,997
Pittsburgh Steelers   Seattle Seahawks       02-05-2006         21-10              68,206
New York Jets         Baltimore Colts        01-12-1969         16-7               75,389
Dallas Cowboys        Buffalo Bills          01-31-1993         52-17              (null)
Oakland Raiders       Philadelphia Eagles    01-25-1981         27-10              76,135
San Francisco 49ers   Cincinnati Bengals     01-24-1982         26-21              81,270

Column containers are effectively separate rows in a NoPI table.

Column Partitioning

Page 18-17

CP Table Query #1 (without Auto-Compression)
One of the key advantages of column partitioning is opportunity for reduced I/O. This can
be realized if only a subset of the columns in a table are read and if those column values are
held in separate column partitions. Data is stored on disk by partition, so when partition
elimination takes place, data blocks in the eliminated partitions are simply not read.
There are three ways to initiate read access to data within a column-partitioned table:




•  A full column partition scan
•  Indexed access (using a secondary index, join index, or hash index)
•  A RowID join

Both unique and non-unique secondary indexes are allowed on column-partitioned tables, as
are join indexes and hash indexes.
Queries best suited for scanning a column-partitioned table are queries that:



•  Contain one or a few predicates that are very selective in combination.
•  Require a small enough number of columns to be accessed that the caches required
   to support their consolidation can be held in memory.

Page 18-18

Column Partitioning

CP Table Query #1
(without Auto-Compression)
Which teams have lost to the "Dallas Cowboys" in the Super Bowl?
Winner                Loser                  Game_Date          Game_Score         Attendance
Part 1-HB-Row #1      Part 2-HB-Row #1       Part 3-HB-Row #1   Part 4-HB-Row #1   Part 5-HB-Row #1
1's & 0's             1's & 0's              1's & 0's          1's & 0's          1's & 0's
Dallas Cowboys        Denver Broncos         01-15-1978         27-10              (null)
Chicago Bears         New England Patriots   01-26-1986         46-10              73,818
Pittsburgh Steelers   Arizona Cardinals      02-01-2009         27-23              70,774
Pittsburgh Steelers   Minnesota Vikings      01-12-1975         16-6               80,997
Pittsburgh Steelers   Seattle Seahawks       02-05-2006         21-10              68,206
New York Jets         Baltimore Colts        01-12-1969         16-7               75,389
Dallas Cowboys        Buffalo Bills          01-31-1993         52-17              (null)
Oakland Raiders       Philadelphia Eagles    01-25-1981         27-10              76,135
San Francisco 49ers   Cincinnati Bengals     01-24-1982         26-21              81,270

Only the accessed columns are needed.

Column Partitioning

Page 18-19

CP Table Query #1 (without Auto-Compression)
If indexing is not available, Teradata can access data in a CP table by scanning a column
partition on all the AMPs in parallel.
In the example on the facing page, the “Winner” column containers are scanned for “Dallas
Cowboys”.
The following describes the scanning of CP data:
1. Columns within the table definition that aren’t referenced in the query are ignored
due to partition elimination.
2. If there is a predicate column in the query, its column partition is read.
3. Values within the predicate column partition are examined and compared against
the value passed in the query WHERE clause.
4. Each time a qualifying value is located, the next step is building up a row for the
output spool.
5. All the column partition values for a logical row have the same RowID except for
the column partition number. The RowID associated with each predicate column
value that matches the constraint in the query becomes the link to other column
partition values of the same logical row by simply modifying the column partition
number of the RowID to the column partition number for each of these other
column partition values.
If there is more than one predicate column in the query that can be used to disqualify rows,
the column for one of these predicates is chosen and its column partition is scanned.
Statistics, as well as other heuristics, are used by the optimizer to pick the most selective and
least costly predicate. Once that decision has been made, only that single column partition
needs to be scanned.
If there are no useful predicate columns in the query (for instance, OR’ed predicates), one
column partition is chosen to be scanned and for each of its column partition values
additional corresponding column partition values are accessed until either predicate
evaluation disqualifies the logical row or all the projected column values have been retrieved
and brought together to form rows for the output spool.
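For reference, the facing-page example corresponds to a query of the following form; only the Winner column partition is scanned to evaluate the predicate, and the Loser partition is then accessed by relative row position to build the result rows:

    SELECT Loser
    FROM   Super_Bowl
    WHERE  Winner = 'Dallas Cowboys';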

Page 18-20

Column Partitioning

CP Table Query #1
(without Auto-Compression)
Which teams have lost to the "Dallas Cowboys" in the Super Bowl?

     Winner                       Loser
     Part 1-HB-Row #1             Part 2-HB-Row #1
     1's & 0's                    1's & 0's
(1)  Dallas Cowboys          (1)  Denver Broncos
     Chicago Bears                New England Patriots
     Pittsburgh Steelers          Arizona Cardinals
     Pittsburgh Steelers          Minnesota Vikings
     Pittsburgh Steelers          Seattle Seahawks
     New York Jets                Baltimore Colts
(7)  Dallas Cowboys          (7)  Buffalo Bills
     Oakland Raiders              Philadelphia Eagles
     San Francisco 49ers          Cincinnati Bengals

The relative row number in each container is used to access the data.

Column Partitioning

Page 18-21

Column Partition Table DDL (with Auto-Compression)
The DDL to create a column partitioned table with auto-compression is shown on the facing
page. Each column will be maintained in a separate partition.
The clause PARTITION BY COLUMN specifies that the table has column partitioning.
Each column of this table will have its own partition and will be (by default) in column
storage since no explicit column grouping is specified.

Page 18-22

Column Partitioning

Column Partition Table DDL
(with Auto-Compression)
CREATE TABLE Super_Bowl
  (Winner       CHAR(25)  NOT NULL
  ,Loser        CHAR(25)  NOT NULL
  ,Game_Date    DATE      NOT NULL
  ,Game_Score   CHAR(7)   NOT NULL
  ,Attendance   INTEGER)
NO PRIMARY INDEX
PARTITION BY COLUMN;

Note: Auto Compression is on by Default.

Column Partitioning

Page 18-23

Auto-Compression for CP Tables
Auto-compression is a completely transparent compression option for column partitions. It
is applied to a container once the container is full, after some number of column
partition values have been appended (uncompressed) by INSERT or UPDATE statements. Each
container is assessed separately to see how, and if, it can be compressed.
Several available compression techniques are considered for compressing a container but,
unless there is some size reduction, no compression is performed. If a container is
compressed, the needed data is automatically uncompressed as it is read.
Auto-compression happens automatically and is most effective when the column partition is
based on a single column only, and less effectively as more columns are included in the
column partition.
User-defined compression, such as multi-value or algorithmic compression that is already
defined by the user is honored and carried forward if it helps compress the container. If
block level compression is specified, it applies for data blocks holding the physical rows of
the table independent of whether auto-compression is applied or not.

Page 18-24

Column Partitioning

Auto-Compression for CP Tables
Auto Compression

• When a column partition is defined to have auto-compression (i.e., the NO AUTO
COMPRESS option is not specified), data is compressed by the system as physical
rows that are inserted into a column-partitioned table or join index.

• For some values, there is no applicable compression technique that reduces the size
of the physical row and the system will determine not to compress the values for that
physical row.

• The system decompresses any compressed column-partition values when they are
retrieved.

• Auto-compression is most effective for a column partition with a single column and
COLUMN format.

• There is overhead in determining whether or not a physical row is to be compressed
and, if it is to be compressed, what compression techniques are to be used.

• This overhead can be eliminated by specifying the NO AUTO COMPRESS option for
the column partition.

Column Partitioning

Page 18-25

Auto-Compression Techniques for CP Tables
The facing page lists and briefly describes each of the auto-compression techniques that
Teradata may utilize.

Page 18-26

Column Partitioning

Auto-Compression Techniques for CP Tables
•

Run-Length Encoding
Each series of one or more column-partition values that are the same are compressed by having the
column-partition value occur once with an associated count of the number of occurrences in the series.

•

Local Dictionary Compression
This is similar to a user-specified value-list compression for a column. Often occurring column-partition
values within a physical row are placed in a value-list dictionary local to the physical row.

•

Trim Compression
Trim high-order zero bytes of numeric values and trailing pad bytes of character and byte values with bits to
indicate how many bytes were trimmed or what the length is after trimming.

•

Null Compression
Similar to null compression (COMPRESS NULL) for a column except applied to a column-partition value. A
single-column or multicolumn-partition value is a candidate for null compression if all the column values in
the column-partition value are null (this also means all these columns must be nullable).

•

Delta on Mean Compression
Delta on Mean compression computes the mean/average of all the values in the column container. This
mean value is saved and stored in the container. After Delta on Mean compression, the value that is stored
for a row is the difference from the mean. For instance, if the average is 100 and the four values in
four different rows are 99, 98, 101, and 102, then the values stored are -1, -2, 1, and 2.

•

Unicode to UTF8 Compression
For a column defined with UNICODE character set but where the value consists of ASCII characters,
compress the Unicode representation (2 bytes per character) to UTF8 (1 byte per character).

Column Partitioning

Page 18-27

User-Defined Compression Techniques
User-defined compression, such as multi-value or algorithmic compression that is already
defined by the user is honored and carried forward if it helps compress the container.
If block level compression is specified, it applies for data blocks holding the physical rows
of the table independent of whether auto-compression is applied or not.
Note that auto-compression is applied locally to a container based on column partition
values (which may be multicolumn) while user-specified MVC and ALC are applied
globally for a column and are applicable to both containers and subrows.
Auto-compression is differentiated from block level compression in several key ways:








Auto-compression requires no parameter setting, but rather is completely
transparent to the user while block level compression is a result of the appropriate
settings of parameters.
Auto-compression acts on a container (a physical row) while block level
compression acts on a data block (which consists of one or more physical rows).
Decompressing a column partition value in a container has little overhead while
software-based block level compression incurs noticeable decompression overhead.
Only column partition values that are needed by the query are decompressed. BLC
has to decompress the entire data block even if only one or a few values are needed
from the data block.
Determining the auto-compression to use for a container, compressing a container,
and compressing additional values to be inserted into the container can cause an
increase in the CPU needed for appending values to column partitions.

You can expect additional CPU to be required when inserting rows into a CP table that uses
auto-compression. This is similarly to the increase in CPU if MVC or ALC compression is
added to the columns.
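As an illustrative sketch (the table name and the chosen compress values are hypothetical, not from this course), user-specified multi-value compression can be combined with column partitioning directly in the DDL; the bare COMPRESS on Attendance compresses NULLs:

    CREATE TABLE Super_Bowl_MVC
      (Winner       CHAR(25)  NOT NULL  COMPRESS ('Pittsburgh Steelers', 'Dallas Cowboys')
      ,Loser        CHAR(25)  NOT NULL
      ,Game_Date    DATE      NOT NULL
      ,Game_Score   CHAR(7)   NOT NULL
      ,Attendance   INTEGER   COMPRESS)
    NO PRIMARY INDEX
    PARTITION BY COLUMN;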

Page 18-28

Column Partitioning

User-Defined Compression Techniques
All the current compression techniques available in Teradata today, can be
leveraged and used for column partitioned tables.

• Dictionary-Based Compression:
Allows end-users to identify and target specific values that would be compressed
in a given column. Also known as, Multi-Value Compression.

• Algorithmic Compression:
Allows users the option of defining compression/decompression algorithms that
would be implemented as UDFs and that would be specified and applied to data at
the column level in a row. Teradata provides three compression/decompression
algorithms that offer compression for UNICODE and LATIN data columns.

• Block-Level Compression:
Feature provides the capability to perform compression on whole data blocks at
the file system level before the data blocks are actually written to storage.

Column Partitioning

Page 18-29

Column Partition Container (Automatic Compression)
In order to support columnar-storage for a column partition, a new format, referred to as a
COLUMN format in the syntax, is available for a physical row.
The example on the facing page assumes that automatic compression is on for the column.

Page 18-30

Column Partitioning

Column Partition Container
(Automatic Compression)

Column Store RowID:   Partition | HB | Row # (starting row number)
1's & 0's:            Auto-Compression & NULL Bits

Column Data:          Dallas Cowboys
                      Chicago Bears
                      Pittsburgh Steelers (3)
                      New York Jets
                      Dallas Cowboys
                      Oakland Raiders
                      San Francisco 49ers

Column (Local) Compression Dictionary

Column Container is effectively a row in the partition.

Column Partitioning

Page 18-31

The Column Partition Table (with Auto-Compression)
The result of creating a column partitioned table with auto-compression is shown on the
facing page.

Page 18-32

Column Partitioning

The Column Partition Table
(with Auto-Compression)

Winner (Part 1-HB-Row #1) – Run-Length Encoding:
  Dallas Cowboys, Chicago Bears, Pittsburgh Steelers (3), New York Jets,
  Dallas Cowboys, Oakland Raiders, San Francisco 49ers

Loser (Part 2-HB-Row #1) – Trim Trailing Spaces:
  Denver Broncos, New England Patriots, Arizona Cardinals, Minnesota Vikings,
  Seattle Seahawks, Baltimore Colts, Buffalo Bills, Philadelphia Eagles, Cincinnati Bengals

Game_Date (Part 3-HB-Row #1) – No Compression:
  01-15-1978, 01-26-1986, 02-01-2009, 01-12-1975, 02-05-2006,
  01-12-1969, 01-31-1993, 01-25-1981, 01-24-1982

Game_Score (Part 4-HB-Row #1) – Local Dictionary Compression (27-10 is compressed):
  27-10, 46-10, 27-23, 16-6, 21-10, 16-7, 52-17, 27-10, 26-21

Attendance (Part 5-HB-Row #1) – Null Compression:
  (null), 73,818, 70,774, 80,997, 68,206, 75,389, (null), 76,135, 81,270

Columnar compression is based on each Container. Therefore, each Container may
have different compression characteristics and even different compression methods.

Column Partitioning

Page 18-33

CP Table Query #2 (with Auto-Compression)
The Pittsburgh Steelers team was compressed, but effectively represented 3 values in the
container. These 3 values correspond to 3, 4, and 5 in the other container (Loser column).

Page 18-34

Column Partitioning

CP Table Query #2
(with Auto-Compression)
Which teams have lost to the "Pittsburgh Steelers" in the Super Bowl?

           Winner                            Loser
           Part 1-HB-Row #1                  Part 2-HB-Row #1
           1's & 0's                         1's & 0's
           Dallas Cowboys                         Denver Broncos
           Chicago Bears                          New England Patriots
(3, 4, 5)  Pittsburgh Steelers (3)            (3) Arizona Cardinals
           New York Jets                      (4) Minnesota Vikings
           Dallas Cowboys                     (5) Seattle Seahawks
           Oakland Raiders                        Baltimore Colts
           San Francisco 49ers                    Buffalo Bills
                                                  Philadelphia Eagles
                                                  Cincinnati Bengals

Column Partitioning

Page 18-35

CP Table with Row Partitioning DDL
Row partitioning can be combined with column partitioning on the same table. This allows
queries to read only non-eliminated combined partitions. Such partitions are defined by the
intersection of the columns referenced in the query and any partitioning column selection
criteria.
There is usually an advantage to putting the column partitioning at level one of the
combined partitioning scheme.
The DDL to create a column partitioned table with auto-compression and Row partitioning
is shown on the facing page.

Determining the Column Partition Level
It is initially recommended that column partitioning either be defined as the first level or, if
not as the first level, at least as the second level. When column partitioning is defined as the
first level it is easier for the file system to locate related data that is from the same logical
row of the table. When column partitioning is defined at a lower level, more boundary
checks have to be made, possibly causing an impact on performance.
If you are inserting a new table row, it takes more effort if the column partitioning is not the
first level. Values of columns from the newly-inserted table row need to be appended at the
end of each column partition. If column-partitioning is not first, it is necessary to read
through several combined partitions to find the correct container that represents the end
point.
On the other hand, placing row partitioning at the second or a lower level so that column
partitioning can be at the first level can be less efficient when row partition elimination based on
something like a date range is taking place.
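For contrast with the facing-page DDL (which places COLUMN at level one), the following is a
hedged sketch of the alternative ordering discussed above: row partitioning at level one and
column partitioning at level two. It simply reuses this module's Super_Bowl example; the table
name Super_Bowl_RP1 is an illustrative assumption.

CREATE TABLE Super_Bowl_RP1
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,City        CHAR(40))
NO PRIMARY INDEX
PARTITION BY
 (RANGE_N(Game_Date BETWEEN DATE '1960-01-01' AND DATE '2059-12-31'
          EACH INTERVAL '10' YEAR)       -- row partitioning at level one
 ,COLUMN);                               -- column partitioning at level two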

Page 18-36

Column Partitioning

CP Table with Row Partitioning DDL

CREATE TABLE Super_Bowl
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,City        CHAR(40))
NO PRIMARY INDEX
PARTITION BY
 (COLUMN
 ,RANGE_N(Game_Date BETWEEN DATE '1960-01-01' AND DATE '2059-12-31'
          EACH INTERVAL '10' YEAR));

Note: Auto Compression is on by default.

Column Partitioning

Page 18-37

The Column Partition Table (with Row Partitioning)
The result of creating a column partitioned table with auto-compression and row partitioning
is shown on the facing page.

Page 18-38

Column Partitioning

The Column Partition Table
(with Row Partitioning)

• In the 1970s, which teams won Super Bowls, who were the losing teams, and what was the date
  the game was played?

[Diagram: with combined partitioning, each row partition holds its own set of column partition
containers (Winner, Loser, Game_Date, Game_Score, Attendance/City). Three combined partitions
are shown:
• Parts 1–5: New York Jets beat the Baltimore Colts, 01-12-1969, 16-7, Miami, FL.
• Parts 11–15: Dallas Cowboys beat the Denver Broncos, 01-15-1978, 27-10, New Orleans, LA;
  Pittsburgh Steelers beat the Minnesota Vikings, 01-12-1975, 16-6.
• Parts 41–45: Pittsburgh Steelers beat the Seattle Seahawks, 02-05-2006, 21-10, attendance 68,206.
For the query above, only the non-eliminated 1970s row partitions and the referenced column
partitions need to be read.]

Column Partitioning

Page 18-39

CP Table with Multi-Column Container DDL
When a table is defined with column partitioning, by default each column becomes its own
column partition. However, it is possible to group multiple columns into a single partition.
This has the result of fewer column partitions with more data held within each column
partition.
Grouping columns into fewer column partitions may be appropriate in these situations:

• When the table has a large number of columns (having fewer column partitions may improve the
  performance of INSERT-SELECT and UPDATE statements).
• When access to the table often involves a large percentage of the columns and the access is not
  very selective.
• When a common subset of columns is frequently accessed together.
• When a multicolumn NUSI is created on a group of columns.
• When there are too few available column partition contexts to access all the needed column
  partitions for queries.

Note that auto-compression will probably be less effective if columns are grouped together
instead of being in their own column partitions.

Page 18-40

Column Partitioning

CP Table with Multi-Column Container DDL
CREATE TABLE Super_Bowl
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,Attendance  INTEGER)
NO PRIMARY INDEX
PARTITION BY COLUMN
 (Winner         NO AUTO COMPRESS
 ,Loser          NO AUTO COMPRESS
 ,(Game_Date
  ,Game_Score
  ,Attendance)   NO AUTO COMPRESS);

Recommendation: The group of multiple columns should be less than 256 bytes.

Note that this example is without Auto-Compression.

Watch the difference between 'Projection' and 'Predicate'.
If you are always projecting three columns, it may make sense to group these
columns into one Container. If, however, one of these columns is used in a WHERE
Predicate, then it may be better to place this column into its own Container.

Column Partitioning

Page 18-41

The CP Table with Multi-Column Container
The example on the facing page illustrates a CP table that has a multi-column container.

Page 18-42

Column Partitioning

The CP Table with Multi-Column Container
Single Column Containers: Winner (Part 1-HB-Row #1) and Loser (Part 2-HB-Row #1).
Multi-Column Container: Game_Date, Game_Score, and Attendance together (Part 3-HB-Row #1).

[Diagram: the Winner and Loser containers each hold one column's values, while the multi-column
container holds the Game_Date, Game_Score, and Attendance values for each table row side by side,
e.g., 01-15-1978, 27-10, (null) through 01-24-1982, 26-21, 81,270.]

General recommendations:
• If you have a lot of columns in a table, then multi-column containers may be needed.
• Multi-column containers will not compress as well as single-column containers.
• If you select any column in a multi-column container you will get all of the other columns.

Column Partitioning

Page 18-43

CP Table Hybrid Row & Column Store DDL
The example on the facing page illustrates the DDL to create a column partitioned table that
has a combination of row and column storage.

COLUMN Format Considerations
The COLUMN format packs column partition values into a physical row, referred to as a
container, up to a system-determined limit. Whether or not to change a column partition to
use ROW format depends on whether the benefits of row header compression and auto-compression
can be realized.
A row header occurs once per container, with the RowID of the first column partition value
becoming the RowID of the container itself. In a column-partitioned table, each column
partition value is assigned its own RowID, but in a container these RowIDs are implicit
except for the first one specified in the header. The uniqueness value can be determined
from the position of a column partition value relative to the first column partition value.
Thus the RowID for each value in the container is implicitly available, and an explicit RowID
does not need to be carried for each individual column value in the container.
If many column partition values can be packed into a container, this form of compression
(referred to as row header compression) can reduce the space needed for a column-partitioned
table compared to the table without column partitioning. If only a few column
partition values can be placed in a container (because they are wide), there can actually be
an increase in the space needed for the table compared to the table without column
partitioning. In this case, ROW format may be more appropriate.

ROW Format Considerations
A subrow, on the other hand, has the same format as a traditional row, except that it only
holds the values of a subset of the columns. Subrows are appropriate when column partition
values are very wide and you expect only one or a few values to fit in a columnar container.
In this case, auto-compression and row header compression using COLUMN format might
not be very effective. ROW format provides quicker access to specific values because no
search through a physical row is required to find a single value. Each column partition
value is in its own subrow with a row header. Subrows are not subject to auto-compression,
but may be in the future.

Page 18-44

Column Partitioning

CP Table Hybrid Row & Column Store DDL
CREATE TABLE Super_Bowl
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,Attendance  INTEGER
 ,City        CHAR(40))
NO PRIMARY INDEX
PARTITION BY COLUMN
 (Winner   NO AUTO COMPRESS
 ,Loser    NO AUTO COMPRESS
 ,ROW (Game_Date
      ,Game_Score
      ,Attendance
      ,City)  NO AUTO COMPRESS);

This example illustrates the syntax
to create a row store, but in reality
you would only define the row
format if the set of columns was
greater than 256 bytes.

General recommendation:
• A column (or set of columns) should be at least 256 bytes wide before considering ROW format.
• Row stores will take up more space, but act like a row in terms of retrieving data.
• Each row will have a row header and require more space.

Column Partitioning

Page 18-45

The CP Table (with Hybrid Row & Column Store)
As an alternative to COLUMN format, column partition values may be held in a physical
row using what is known in Teradata syntax as ROW format. This type of physical row
supports row storage for a column partition and is referred to as a subrow. Each subrow
holds one column partition value for a column partition. A subrow has the same format as a
regular row, except that it generally contains a subset of the columns for a table row instead of all
the columns. Just like a container, each subrow is assigned to a specific partition. One or
more subrows may be needed to hold the entire column partition. Since a subrow is a
physical row, the size of a subrow is limited by the maximum physical row size.
A column partition may have COLUMN format or ROW format, but not a mix of both.
However, different column partitions in a column-partitioned table may have different
formats.

Page 18-46

Column Partitioning

The CP Table
(with Hybrid Row & Column Store)
Column and Row Store in "one" table.
Column Store Containers: Winner (Part 1-HB-Row #1) and Loser (Part 2-HB-Row #1), each with its
own auto-compression and NULL bits ("1's & 0's").

Row Store: one subrow per table row, carrying Partition, HB, Row #, and the Game_Date,
Game_Score, Attendance, and City values.

[Diagram: sample subrows range from 01-15-1978, 27-10, (null), New Orleans, LA (row 1) through
01-24-1982, 26-21, 81,270, Pontiac, MI (row 9), matching the Winner and Loser values held in the
column store containers.]

Column Partitioning

Page 18-47

Populating a CP Table
INSERT-SELECT
INSERT-SELECT is the expected and most efficient method of loading data into a column-partitioned table. If the data originates from an external source, FastLoad can be used to
load it into a staging table from which the INSERT-SELECT can take place.
If the source was a SELECT that included several joins and as a result skewed data was
produced, options can be added to the INSERT-SELECT statement to avoid a skewed
column-partitioned table and improve the effectiveness of auto-compression:

Options
HASH BY (RANDOM or hash_spec_list):
The selected rows are redistributed by the hash value of the expressions in the
hash_spec_list. Alternatively, HASH BY RANDOM can be specified to have data blocks
redistributed randomly. It is important that a column or columns that distribute well be
selected if the HASH BY option is used.
LOCAL ORDER BY:
A local sort is done on each AMP before physically storing the rows. This can help auto-compression
be more effective by ensuring that like values of the sorting columns appear together.
During an INSERT-SELECT process, each source row is read, and its columns individually
appended to the column partitions to which they belong. As many column partition values
as can fit are built up simultaneously in memory, and written out to disk when the buffer is
filled.
If the column-partitioned table being loaded has a large number of columns, additional
passes of the source table may be required to append all of the columns to their respective
column partitions.
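A minimal sketch of an INSERT-SELECT using these options, based on the staging and
column-partitioned Super_Bowl tables shown on the facing page; the HASH BY and LOCAL ORDER BY
column choices here are illustrative assumptions, not recommendations.

INSERT INTO Super_Bowl
SELECT *
FROM   Super_Bowl_Staging
HASH BY Game_Date          -- redistribute by a column that distributes well (illustrative choice)
LOCAL ORDER BY Winner;     -- local sort so like values land together, helping auto-compression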

Page 18-48

Column Partitioning

Populating a CP Table
CREATE TABLE Super_Bowl_Staging
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,Attendance  INTEGER)
NO PRIMARY INDEX;

CREATE TABLE Super_Bowl
 (Winner      CHAR(25) NOT NULL
 ,Loser       CHAR(25) NOT NULL
 ,Game_Date   DATE     NOT NULL
 ,Game_Score  CHAR(7)  NOT NULL
 ,Attendance  INTEGER)
NO PRIMARY INDEX
PARTITION BY COLUMN;

1. Load data into staging table.
2. INSERT INTO Super_Bowl ….. SELECT * FROM Super_Bowl_Staging …

Column Partitioning

Page 18-49

DELETE Considerations
Rows can be deleted from a column-partitioned table using the DELETE ALL, or
selectively using DELETE. DELETE ALL uses the standard fast-path delete as would be
done on a primary-indexed table. If a column-partitioned table also happens to include row
partitioning, the same fast-path delete can be applied to one or more row partitions. Space is
immediately reclaimed.
The selective DELETE, in which only one or a few rows of the table are deleted, requires a
scan of a column partition or indexed access to the column-partitioned table. In this case,
the row being deleted is not physically removed, but only flagged as having been deleted.
The space taken by a row being deleted is scattered across multiple column partitions and is
not reclaimed at the time of the deletion. This form of delete should only be used to delete a
small percentage of rows.
During a delete operation, all large objects are immediately deleted, as are entries in
secondary indexes. Join indexes are updated to reflect the change as it happens.
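Hedged sketches of the two forms of delete, again using this module's Super_Bowl example; the
predicate values are illustrative only.

DELETE FROM Super_Bowl ALL;                    -- fast-path delete; space is reclaimed immediately

DELETE FROM Super_Bowl
WHERE  Game_Date < DATE '1970-01-01';          -- if the table is also row partitioned on Game_Date,
                                               -- whole row partitions can use the fast-path delete

DELETE FROM Super_Bowl
WHERE  Winner = 'Baltimore Colts';             -- selective delete; rows are only flagged in the delete
                                               -- column partition and space is not reclaimed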

The Delete Column Partition
Each column-partitioned table has one delete column partition, in addition to the user-specified
column partitions. It holds information about deleted rows so they do not get
included in an answer set. When a single row delete takes place in a column-partitioned
table, rather than removing each deleted value across all the column partitions, which would
involve multiple physical row updates, a single action is performed. One bit in the delete
column partition is set as an indication that the hash bucket and uniqueness associated with
the table row has been deleted.
This delete column partition is accessed any time a query is made against a column-partitioned
table without indexed access. At the time a column partition is scanned, the
delete column partition is checked to make sure a table row being requested by the query has
not been deleted (if it has, the value is skipped). This additional partition access can be
observed in the EXPLAIN text.

Page 18-50

Column Partitioning

DELETE Considerations
• DELETE ALL uses the standard fast-path delete as would be done on a primary-indexed
table.

– If a CP table also happens to include row partitioning, the same fast-path delete
can be applied to one or more row partitions. Space is immediately reclaimed.

• The selective DELETE, in which only one or a few rows of the table are deleted, requires
a scan of a column partition or indexed access to the column-partitioned table.

– In this case, the row being deleted is not physically removed, but only flagged as
having been deleted.

– This form of delete should only be used to delete a small percentage of rows.
• The Delete Column Partition - each column-partitioned table has one delete column
partition, in addition to the user-specified column partitions. It holds information about
deleted rows so they do not get included in an answer set.

– One bit in the delete column partition is set as an indication that the hash bucket
and uniqueness associated with the table row has been deleted.

Column Partitioning

Page 18-51

UPDATE Considerations
Updating rows in a column-partitioned table requires a delete and an insert operation. It involves
marking the appropriate bit in the delete column partition, and then re-inserting columns for the new,
updated version of the table row. The cost of this update is less severe than a Primary Index update
(also a delete plus insert) because in the column-partitioned table update, the deletion and re-insertion take place on the same AMP.
The part of the update that re-inserts a new table row is essentially a re-append. The highest
uniqueness value on that AMP is incremented by one, and all the column values for that updated row
are appended to their corresponding column partitions. Because multiple I/Os are performed in
doing this re-append, row-at-a-time updates on column-partitioned tables should be approached with
caution. The space that is being used by the old row is not reclaimed, but a delete bit is turned on in
the delete column partition, indicating that the old version of the row is obsolete.
An UPDATE statement should only be used to update a small percentage of rows.
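A hedged sketch of a row-at-a-time update on the Super_Bowl example (the new attendance figure is
an arbitrary placeholder): internally, the delete bit is set for the old row and the new column
values are appended to the end of each column partition on the same AMP.

UPDATE Super_Bowl
SET    Attendance = 80000                       -- placeholder value
WHERE  Winner    = 'Pittsburgh Steelers'
AND    Game_Date = DATE '2006-02-05';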

USI Access
For example, consider a unique secondary index (USI) access. The USI subtable provides the
specific RowID of the base table row. In the columnar case, the base table row is located on a
specific AMP, where it may be stored across multiple containers. As it is for PI or NoPI tables, the hash
bucket in the RowID carried in the USI is used to locate the AMP that contains the base table row.
The column values from the multiple containers are then located in the same way as when using a
RowID retrieved from a NUSI, which is described below.

NUSI Access
With non-unique secondary indexes (NUSIs), a RowID list is retrieved from the NUSI subtable. In
the case of a column-partitioned table, the table row has been decomposed into columns that are
located in different column partitions on disk. Several different internal partition numbers come into
play in reconstructing the table row.
Rather than relying on the column partition number, it is only the hash bucket and uniqueness that
are of importance in the NUSI subtable RowID list. The hash bucket and uniqueness identify the table
row on that AMP, while the column partition number plays no meaningful role.
Because column partition numbers in the NUSI subtable are not useful in the case of a column-partitioned
table, all RowIDs in a NUSI carry a column partition number of 1. The hash bucket and
uniqueness value are the important link between the NUSI subtable and the column values of interest
for the query. Once the hash bucket and uniqueness value are known, a RowID is formulated using
the internal partition number of the column of interest. This RowID is then used to read the
containing physical row/container. A relative position is determined using the uniqueness value,
which is then used to access the appropriate column value. This process is repeated to locate the
column value of each of the remaining columns for this row. These individual column values are
brought together to formulate the row. This process occurs for each RowID in the NUSI subtable
entry.

Page 18-52

Column Partitioning

UPDATE and USI/NUSI Considerations
UPDATE Considerations

• Updating rows in a column-partitioned table requires a delete and an insert operation.
• It involves marking the appropriate bit in the delete column partition, and then re-inserting
  columns for the new, updated version of the table row.

• An UPDATE statement should only be used to update a small percentage of rows.
USI/NUSI Considerations

• For a USI on a CP table, the base table row is located on a specific AMP which can be
stored in multiple containers. The hash bucket in the RowID carried in the USI is used
to locate the AMP that contains the base table row.

• For a NUSI on a CP table, the table row has been decomposed into columns that are
located in different column partitions on disk. Several different internal partition
numbers come into play in reconstructing the table row.

• Rather than relying on the column partition number, it is only the hash bucket and
uniqueness that is of importance in the NUSI subtable RowID list.

Column Partitioning

Page 18-53

CP Table Restrictions
The following limitations apply:


• Column partitioning for join indexes is restricted to single-table, non-aggregate, non-compressed
  join indexes with no PI and no ORDER BY clause.
  – The ROWID of the base table must be included in a CP join index.
• Column partitioning is not allowed for the following:
  – Primary index (PI) base tables
  – Global temporary, volatile, and queue tables
  – Secondary indexes
• Column partitioning is not applicable for the following:
  – Global temporary trace tables
  – Error tables
  – Compressed join indexes
• A NoPI table with only row partitioning is not allowed.
• A column cannot be specified to be in more than one column partition.
• Column grouping cannot be specified in both the column definition list of a CREATE TABLE
  statement and in the COLUMN clause.
• Column grouping cannot be specified in both the select list of a CREATE JOIN INDEX statement
  and in the COLUMN clause.

Page 18-54

Column Partitioning

CP Table Restrictions
• Column Partitioning is predicated on the NoPI table structure and as such the following
restrictions apply:

–
–
–
–
–
–
–
–
–

Set Tables
Queue tables
Global Temporary Tables
Volatile Tables
Derived Tables
Multi-table or Aggregate Join Index
Compressed Join Index
Hash Index
Secondary Indexes are not column partitioned

• Column Partitioned tables cannot be loaded with either the FastLoad or the MultiLoad
utilities.

• Merge-Into and UPSERT statements are not supported.
• Population of Column Partition tables will require an Insert-Select process after data has
been loaded into a staging table.

• No synchronized scanning with Columnar Tables.
• Since Columnar Tables are NoPI Tables, Teradata cannot use Full Cylinder Reads.

Column Partitioning

Page 18-55

Summary
The facing page contains a summary of key concepts from this module.

Page 18-56

Column Partitioning

Summary
When is column partitioning useful?

• Queries access varying subsets of the columns of the table, or
  queries of the table are selective (best if both occur for queries).

• For example, ad hoc, data analytics
• Data can be loaded with large INSERT-SELECTs
• There is no or little update/delete maintenance between refreshes or appends of the
data for the table or for row partitions.

Do NOT use this feature when:

• Queries need to be run on current data that is changing (deletes and updates).
• Performing tactical queries or OLTP queries.
• Workload is CPU bound such that a trade-off of reduced I/O with increased CPU does
not improve the throughput.

– Column partitioning is not intended to be a CPU savings feature.

Column Partitioning

Page 18-57

Module 18: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 18-58

Column Partitioning

Module 18: Review Questions
1. Which two choices apply to Column Partitioning?
a. SET table
b. NoPI table
c. Table with multi-level partitioning
d. Table with existing row partitioning
2. What are two benefits of Column Partitioning?
a. Reduced I/O
b. Reduced CPU
c. Reduced disk space usage
d. Reduced tactical query response times

3. True or False?

Deleting a row in a column partitioned table will reclaim table space.

4. True or False?

In a multi-level partitioned table, only one level may be column partitioned.

5. True or False?

The preferred way to load a columnar table is using INSERT/SELECT.

Column Partitioning

Page 18-59

Lab Exercise 18-1
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;
SHOW TABLE table_name;

A count of rows in the Orders table is 31,200.
A count of rows in the Orders_2012 table is 12,000.

Page 18-60

Column Partitioning

Lab Exercise 18-1
Lab Exercise 18-1
Purpose
In this lab, you will use Teradata SQL Assistant to create tables with column partitioning in various
ways.
What you need
Populated DS tables and Orders and Orders_2012 tables in your database
Tasks
1. Use the SHOW TABLE for Orders to help create a new, similar table (same column names and
definitions, etc.) that does NOT have a primary index and name this table "Orders_NoPI".

2. Populate the Orders_NoPI table (via INSERT/SELECT) with all of the rows from the DS.Orders and
DS.Orders_2012 tables.
Verify the number of rows in your table. Count = ________ (count should be 43,200)
3. Use the SHOW TABLE for Orders_NoPI to create a new column partitioned table named "Orders_CP"
based on the following:
– Each column is created as a separate partition
– Utilize auto compression for every column
Populate the Orders_CP table (via INSERT/SELECT) from the Orders_NoPI table.

Column Partitioning

Page 18-61

Lab Exercise 18-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;
SELECT COUNT(DISTINCT(PARTITION)) FROM table_name;

Page 18-62

Column Partitioning

Lab Exercise 18-1 (cont.)
4. Verify the number of rows in your table. Count = ________ (count should be 43,200)
How many partitions actually have data? ________
Note: The table only has 1 logical partition.

5. Use the SHOW TABLE for Orders_CP to create a new column partitioned table named
"Orders_CP_noAC" based on the following:
– Each column is created as a separate partition
– Turn off auto compression for every column
Populate the Orders_CP_noAC table (via INSERT/SELECT) from the Orders_NoPI table.

Column Partitioning

Page 18-63

Lab Exercise 18-1 (cont.)
Check your understanding of the concepts discussed in this module by completing the lab
exercises as directed by your instructor.
SQL hints:
INSERT INTO table_1 SELECT * FROM table_2;
SELECT COUNT(*) FROM table_name;
SELECT COUNT(DISTINCT(PARTITION)) FROM table_name;
SELECT    TableName, SUM(CurrentPerm)
FROM      DBC.TableSizeV
WHERE     DatabaseName = DATABASE
AND       TableName IN ('tablename1', 'tablename2', …)
GROUP BY  1
ORDER BY  1;

Page 18-64

Column Partitioning

Lab Exercise 18-1 (cont.)
6. (Optional) Use the SHOW TABLE for Orders_CP to create a new column partitioned table named
   "Orders_CP_TP" based on the following:
   – Each column is created as a separate partition (COLUMN partitioning is the first level)
   – Utilize auto compression for every column
   – Incorporate table partitioning with orderdate as the partitioning column
     • From '2003-01-01' through '2012-12-31', partition by month
     • Do not use the NO RANGE or UNKNOWN options.

Populate the Orders_CP_TP table (via INSERT/SELECT) from the Orders_NoPI table.
7. (Optional) Use the PARTITION key word to determine the number of partitions defined in the
Orders_CP_TP.
How many partitions actually have data? ________

8. (Optional) Determine the AMP space usage of the Orders_CP, Orders_CP_noAC, and
   Orders_CP_TP tables using DBC.TableSizeV.

   CurrentPerm of Orders_CP         ________________
   CurrentPerm of Orders_CP_noAC    ________________
   CurrentPerm of Orders_CP_TP      ________________

Column Partitioning

Page 18-65

Notes

Page 18-66

Column Partitioning

Module 19
Secondary Index Usage

After completing this module, you will be able to:
• Describe USI and NUSI implementations.
• Describe dual NUSI access.
• Describe NUSI bit mapping.
• Explain NUSI and Aggregate processing.
• Compare NUSI vs. full table scan (FTS).

Teradata Proprietary and Confidential

Secondary Index Usage

Page 19-1

Notes

Page 19-2

Secondary Index Usage

Table of Contents
Secondary Indexes ..................................................................................................................... 19-4
Defining Secondary Indexes ...................................................................................................... 19-6
Secondary Index Subtables ........................................................................................................ 19-8
Primary Indexes (UPIs and NUPIs) ....................................................................................... 19-8
Unique Secondary Indexes (USIs) ......................................................................................... 19-8
Non-Unique Secondary Indexes (NUSIs) .............................................................................. 19-8
USI Subtable General Row Layout .......................................................................................... 19-10
USI Change for PPI.............................................................................................................. 19-10
USI Hash Mapping................................................................................................................... 19-12
NUSI Subtable General Row Layout ....................................................................................... 19-14
NUSI Change for PPI ........................................................................................................... 19-14
NUSI Hash Mapping ................................................................................................................ 19-16
Table Access – A Complete Example...................................................................................... 19-18
Secondary Index Considerations .............................................................................................. 19-20
Single NUSI Access (Between, Less Than, or Greater Than) ................................................. 19-22
Dual NUSI Access ................................................................................................................... 19-24
AND with Equality Conditions ............................................................................................ 19-24
OR with Equality Conditions ............................................................................................... 19-24
NUSI Bit Mapping ................................................................................................................... 19-26
Example ............................................................................................................................... 19-26
Value-Ordered NUSIs .............................................................................................................. 19-28
Value-Ordered NUSIs (cont.) .................................................................................................. 19-30
Covering Indexes ..................................................................................................................... 19-32
Join Index Note: ............................................................................................................... 19-32
Example ........................................................................................................................... 19-32
Covering Indexes (cont.) .......................................................................................................... 19-34
NUSIs and Aggregate Processing ........................................................................................ 19-34
Example ............................................................................................................................... 19-34
NUSI vs. Full Table Scan (FTS) .............................................................................................. 19-36
Example ............................................................................................................................... 19-36
Full Table Scans – Sync Scans ................................................................................................ 19-38
Module 19: Review Questions ................................................................................................. 19-40

Secondary Index Usage

Page 19-3

Secondary Indexes
Secondary Indexes are generally defined to provide faster set selection. The Teradata
Database allows up to 32 Secondary Indexes per table. Teradata handles Unique Secondary
Indexes (USIs) and Non-Unique Secondary Indexes (NUSIs) very differently.
The diagram illustrates how Secondary Index values are stored in subtables. Secondary
Index values, like Primary Index values, are input to the Hashing Algorithm. As with
Primary Indexes, the Hashing Algorithm takes the Secondary Index value and outputs a
Row Hash. These Row Hash values point to a subtable which stores index rows containing
the base table SI column values and Row IDs which point to the row(s) in the base table
with the corresponding SI value.
The Teradata Database can determine the difference between a base table and a SI subtable
by checking the Subtable ID, which is part of the Table ID.
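As a hedged sketch only, the Customer example used later in this module (Cust as a USI, Name as a
NUSI, Phone as a NUPI) could be defined at CREATE TABLE time as follows; the data types shown
are assumptions, since the manual does not list them.

CREATE TABLE Customer
 (Cust    INTEGER
 ,Name    CHAR(20)
 ,Phone   CHAR(12))
PRIMARY INDEX (Phone)        -- NUPI
,UNIQUE INDEX (Cust)         -- USI
,INDEX        (Name);        -- NUSI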

Page 19-4

Secondary Index Usage

Secondary Indexes
Secondary indexes provide faster set selection.

•
•
•
•

They may be unique (USI) or non-unique (NUSI).
A USI may be used to maintain uniqueness on a column.
The system maintains a separate subtable for each secondary index.
A secondary index can consist of 1 to 64 columns.

Subtables keep base table secondary index row hash, column values, and RowID (which
point to the row(s) in the base table with that value).

• The implementation of a USI is different than a NUSI.
• Users cannot access subtables directly.
[Diagram: a Secondary Index value (column C of base table Table_X) is passed through the Hashing
Algorithm to produce a Row Hash; the SI subtable row built from it carries a Row ID that points back
to the base table row, which is itself distributed by hashing its Primary Index value (column A).]

Secondary Index Usage

Page 19-5

Defining Secondary Indexes
Use the CREATE INDEX statement to create a secondary index on an existing table or join
index. The index can be optionally named.
Notes on the ORDER BY option:

• If the ORDER BY option is not used, the default is to order by hash.
• If the ORDER BY option is specified and neither of the keywords (HASH or VALUES) is
  specified, then the default is to order by values.
• Recommendation: If the ORDER BY option is used, specify one of the keywords – HASH or
  VALUES.

Notes on the ALL option:

• The ALL option indicates that a NUSI should retain row ID pointers for each logical row of a
  join index (as opposed to only the compressed physical rows).
• ALL also ignores the NOT CASESPECIFIC attribute of data types so a NUSI can include
  case-specific values.
• ALL enables a NUSI to cover a join index, enhancing performance by eliminating the need to
  access a join index when all values needed by a query are in the secondary index. However, ALL
  might also require the use of additional index storage space.
• Use this keyword when a NUSI is being defined for a join index and you want to make it eligible
  for the Optimizer to select when covering reduces access plan cost. ALL can also be used for an
  index on a table, however.
• You cannot specify multiple indexes that differ only by the presence or absence of the ALL
  option.
• The use of the ALL option for a NUSI on a data table does not cause a syntax error.

Additional notes:

• column_name_2 specifies the sort order to be used. column_name_2 is a column name that must
  appear in the column_name_1 list.
• You can put two NUSI secondary indexes on the same column (or set of columns) if one of the
  indexes is ordered by hash and the other index is ordered by value.
• You cannot define a USI on a join index. Other secondary indexes are allowed.
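A hedged sketch of the note about two NUSIs on the same column, reusing the Daily_Sales example
from the facing page: one index ordered by hash and one ordered by value on sales_date.

CREATE INDEX (sales_date) ORDER BY HASH   (sales_date) ON Daily_Sales;
CREATE INDEX (sales_date) ORDER BY VALUES (sales_date) ON Daily_Sales;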

Page 19-6

Secondary Index Usage

Defining Secondary Indexes
Secondary indexes can be defined …

• when a table is created (CREATE TABLE).
• for an existing table (CREATE INDEX).
Syntax (simplified):

CREATE [UNIQUE] INDEX [index_name] [ALL]
       (col_name_1 [, …])
       [ORDER BY [VALUES | HASH] (col_name_2)]
ON [TEMPORARY] table_name | ON join_index_name;

Examples:

Unnamed USI:
CREATE UNIQUE INDEX (item_id, store_id, sales_date) ON Daily_Sales;

Named Value-Ordered NUSI:
CREATE INDEX ds_vo_nusi (sales_date) ORDER BY VALUES ON Daily_Sales;

Secondary Index Usage

Page 19-7

Secondary Index Subtables
The diagram on the facing page illustrates how the Teradata Database retrieves rows based
upon their index values. It compares and contrasts examples of Primary (UPIs and NUPIs),
Unique Secondary (USIs) and Non-Unique Secondary Indexes (NUSIs).

Primary Indexes (UPIs and NUPIs)
As you have seen previously, in the case of a Primary Index, the Teradata Database hashes
the value and uses the Row Hash to find the desired row. This is always a one-AMP
operation and is shown in the top diagram on the facing page.

Unique Secondary Indexes (USIs)
The middle diagram illustrates the process of a USI row retrieval. An index subtable
contains index rows, which in turn point to base table rows matching the supplied index
value. USI rows are globally hash-distributed across all AMPs, and are retrieved using the
same procedure as Primary Index data row retrieval. Since the USI row is hash-distributed
on different columns than the Primary Index of the base table, the USI row typically lands
on an AMP other than the one containing the data row. Once the USI row is located, it
“points” to the corresponding data row. This requires a second access and usually involves
a different AMP. In effect, a USI retrieval is like two PI retrievals:
Master Index - Cylinder Index - Index Block
Master Index - Cylinder Index - Data Block

Non-Unique Secondary Indexes (NUSIs)
NUSIs are implemented on an AMP-local basis. Each AMP is responsible for maintaining
only those NUSI subtable rows that correspond to base table rows located on that AMP.
Since NUSIs allow duplicate index values and are based on different columns than the PI,
data rows matching the supplied NUSI value could appear on any AMP.
In a NUSI retrieval (illustrated at the bottom of the facing page), a message is sent to all
AMPs to see if they have an index row for the supplied value. Those that do use the
“pointers” in the index row to access their corresponding base table rows. Any AMP that
does not have an index row for the NUSI value will not access the base table to extract rows.

Page 19-8

Secondary Index Usage

Secondary Index Subtables
One AMP Operation:
  Primary Index Value → Hashing Algorithm → Base Table

Two AMP Operation:
  Unique Secondary Index Value → Hashing Algorithm → USI Subtable → Base Table

All AMP Operation:
  Non-Unique Secondary Index Value → Hashing Algorithm → NUSI Subtable → Base Table

Secondary Index Usage

Page 19-9

USI Subtable General Row Layout
The layout of a USI subtable row is illustrated at the top of the facing page. It is composed
of several sections:


• The first two bytes designate the row length.
• The next 8 bytes contain the Row ID of the row. Within this Row ID, there are 4 bytes of Row
  Hash and 4 bytes of Uniqueness Value.
• The following 2 bytes are additional system bytes that will be explained later, as will be the 7
  bytes for row offsets.
• The next section contains the SI value. This is the value that was used by the Hashing Algorithm
  to generate the Row Hash for this row. This section varies in length depending on the index.
• Following the SI value are 8 bytes containing the Row ID of the base table row. The base table
  Row ID tells the system where the row corresponding to this particular USI value is located.
  If the table is partitioned, then the USI subtable row needs 10 or 16 bytes to identify the Row ID
  of the base table row. The Row ID (of the base table row) is a combination of the Partition
  Number, Row Hash, and Uniqueness Value.
• The last two bytes contain the reference array pointer at the end of the block.
The Teradata Database creates one index subtable row for each base table row.

USI Change for PPI
For tables defined with a PPI, a two-byte or optionally eight-byte (TD 14.0) partition
number is embedded in the data row. Therefore, the unique row identifier is comprised of
the Partition Number, the Row Hash, and the Uniqueness Value.
The USI subtable rows use the wider row identifier to identify the base table row, making
these subtable rows wider as well. Except for the embedded partition number, USI subtable
rows (for a PPI table) have the same format as non-PPI rows.
The facing page shows the row layout for USI subtable rows.

Page 19-10

Secondary Index Usage

USI Subtable General Row Layout
USI Subtable Row Layout (fields and byte counts):

Row Length | Row ID of USI (Row Hash 4, Uniq. Value 4) | Secondary Index Value |
Base Table Row Identifier (Part. # 2 or 8, Row Hash 4, Uniq. Value 4) | Ref. Array Pointer

Notes:

• USI subtable rows are distributed by the Row Hash, like any other row.
• The Row Hash is based on the unique secondary index value.
• The subtable row includes the secondary index value and a second Row ID which
identifies a single base table row (usually on a different AMP).

• There is one index subtable row for each base table row.
• For PPI tables, a two-byte (or optionally eight-byte with Teradata 14.0) partition
number is embedded in the base table row identifier.

– Therefore, the base table row identifier is comprised of the Partition Number,
Row Hash, and the Uniqueness Value.

Secondary Index Usage

Page 19-11

USI Hash Mapping
The example on the facing page shows the three-part message that is put onto the Message
Passing Layer for USI access.


• The only difference between this and the three-part message used in PI access (previously
  discussed) is that the Subtable ID portion of the Table ID references the USI subtable, not the
  data table. Using the DSW for the Row Hash, the Message Passing Layer (a.k.a., Communication
  Layer) directs the message to the correct AMP, which uses the Table ID and Row Hash as a logical
  index block identifier and the Row Hash and USI value as the logical index row identifier. If the
  AMP succeeds in locating the index row, it extracts the base table Row ID ("pointer"). The
  Subtable ID portion of the Table ID is then modified to refer to the base table and a new three-part
  message is put onto the Communications Layer.
• Once again, the Message Passing Layer uses the DSW to identify the correct AMP. That AMP
  uses the Table ID and Row ID to locate the correct data block and then uses the Row ID to locate
  the correct row.

Page 19-12

Secondary Index Usage

USI Hash Mapping
SELECT *
FROM   Table_Name
WHERE  USI_col = 'siv';

[Diagram: the Parser hashes the USI value 'siv' and builds a three-part message (USI Table ID, Row
Hash, 'siv'). The Message Passing Layer sends the request to the specific AMP holding the USI
subtable row; that row supplies the base table Row ID. A second three-part message is then sent to
the specific AMP holding the base table row, which is read using that Row ID.]
Secondary Index Usage

Page 19-13

NUSI Subtable General Row Layout
The layout of a NUSI subtable row is illustrated on the facing page. It is almost identical to
the layout of a USI subtable row. There are, however, two major differences:


• First, NUSI subtable rows are not distributed across the system via AMP number in the Hash
  Map. NUSI subtable rows are built from the base table rows found on that particular AMP and
  refer only to the base rows of that AMP.
• Second, NUSI rows may point to or reference more than one base table row. There can be many
  base table Row IDs (8, 10, or 16 bytes) in a NUSI subtable row. Because NUSIs are always
  AMP-local to the base table rows, it is possible to have the same NUSI value represented on
  multiple AMPs.

A NUSI subtable is just another table from the perspective of the file system.

NUSI Change for PPI
For tables defined with a PPI, the two-byte partition number is embedded in the data row.
Therefore, the unique row identifier is comprised of the Partition Number, Row Hash, and
Uniqueness Value. PPI data rows are two bytes wider than they would be if the table was
not partitioned.
If the base table is partitioned, then the NUSI subtable row needs 10 or 16 bytes for each
RowID entry to identify the Row ID of the base table row. The Row ID (of the base table
row) is combination of the Partition Number, Row Hash, and Uniqueness Value.
The NUSI subtable rows use the wider row identifier to identify the base table row, making
these subtable rows wider as well. Except for the embedded partition number, NUSI
subtable rows (for a PPI table) have the same format as non-PPI rows.
The facing page shows the row layout for NUSI subtable rows.

Page 19-14

Secondary Index Usage

NUSI Subtable General Row Layout
NUSI Subtable Row Layout (fields and byte counts):

Row Length | Row ID of NUSI (Row Hash 4, Uniq. Value 4) | Secondary Index Value |
Base Table Row ID List (each entry: Part. # 2/8, Row Hash 4, Uniq. Value 4) | Ref. Array Pointer

Notes:

• The Row Hash is based on the base table secondary index value.
• The NUSI subtable row contains Row IDs that identify the base table rows on this
AMP that carry the Secondary Index Value.

• The Row IDs reference (or "point") to base table rows on this AMP only.
• There are one (or more) subtable rows for each secondary index value on the AMP.
– One NUSI subtable row can hold approximately 4000 – 8000 Row IDs; assuming a NUSI data
type less than 200 characters (CHAR(200)).

– If an AMP has more than 4000 – 8000 rows with the same NUSI value, another NUSI subtable
row is created for the same NUSI value.

• The maximum size of a single NUSI row is 64 KB.

Secondary Index Usage

Page 19-15

NUSI Hash Mapping
The example on the facing page shows the standard, three-part Message Passing Layer row-access
message. Because NUSIs are AMP-local indexes, this message gets broadcast to all
AMPs. Each AMP uses the values to search the appropriate index block for a corresponding
NUSI row. Only those AMPs with one or more of the desired rows use the base table Row
IDs to access the proper data blocks and data rows.
In the example, the SELECT statement is designed to find those rows with a NUSI value of
‘siv’. Examination of the NUSI subtables on each AMP shows that AMPs 0, 2 and 3 (not
shown) all have a subtable index row, and, therefore, base table rows satisfying this
condition. These AMPs then participate in the base table access. The NUSI subtable on
AMP 1, on the other hand, shows that there are no rows with a NUSI value of ‘siv’ located
on this AMP. AMP 1 does not participate in the base table access process.
If the table is not partitioned, the subtable rows will identify the 8-byte Row IDs of the base
table rows.
If the table is partitioned with less than (or equal) 65,535 partitions, the subtable rows will
identify the 10-byte Row IDs of the base table rows. This Row ID includes the Partition
Number.
If the table is partitioned with more than 65,535 partitions, the subtable rows will identify
the 16-byte Row IDs of the base table rows. This Row ID includes the Partition Number.

Page 19-16

Secondary Index Usage

NUSI Hash Mapping
SELECT *
FROM   Table_Name
WHERE  NUSI_col = 'siv';

[Diagram: the Parser hashes the NUSI value 'siv' and the three-part message (NUSI Table ID, Row
Hash, 'siv') is broadcast to all AMPs. AMPs 0 and 2 find a NUSI subtable row for 'siv' and use its
base table Row ID list (e.g., RID1 and RID2 on AMP 0, RID3 on AMP 2) to read the qualifying base
table rows; AMP 1 has no subtable row for 'siv' and does not access the base table.]
Secondary Index Usage

Page 19-17

Table Access – A Complete Example
The example on the facing page shows a four-AMP configuration with Base Table Rows,
NUSI Subtable rows, and USI Subtable Rows. The table and index can be used to answer
the following queries without having to do a full table scan:
SELECT * FROM Customer WHERE Phone = '666-5555' ;
SELECT * FROM Customer WHERE Cust = 80;
SELECT * FROM Customer WHERE Name = 'Rice' ;

Page 19-18

Secondary Index Usage

Table Access – A Complete Example
CUSTOMER
Cust (USI)   Name (NUSI)   Phone (NUPI)
 37          White         555-4444
 98          Brown         333-9999
 74          Smith         555-6666
 95          Peters        555-7777
 27          Jones         222-8888
 56          Smith         555-7777
 45          Adams         444-6666
 31          Adams         111-2222
 40          Smith         222-3333
 72          Adams         666-7777
 80          Rice          666-5555
 49          Smith         111-6666
 12          Young         777-7777
 62          Black         444-5555
 77          Jones         777-6666
 51          Rice          888-2222

Example:
SELECT * FROM Customer WHERE Phone = '666-5555' ;
SELECT * FROM Customer WHERE Cust = 80;
SELECT * FROM Customer WHERE Name = 'Rice' ;

[Diagram: a four-AMP layout. Each AMP holds a USI subtable (Cust values, hash-distributed across
all AMPs), a NUSI subtable (Name values, AMP-local, whose Row IDs reference only base table rows
on the same AMP), and a portion of the CUSTOMER base table rows distributed by the NUPI (Phone).
The subtable Row IDs show how each of the three queries above is satisfied without a full table scan.]

Secondary Index Usage
Page 19-19

Secondary Index Considerations
As mentioned at the beginning of this module, a table may have up to 32 Secondary Indexes
that can be created and dropped dynamically. It is probably not a good idea to create 32 SIs
for each table just to speed up set selection because SIs consume the following extra
resources:

• SIs require additional storage to hold their subtables. In the case of a Fallback table, the SI
  subtables are Fallback also, so twice the additional storage space is required.
• SIs require additional I/O to maintain these subtables.

When deciding whether or not to define a NUSI, there are other considerations. The Optimizer
may choose to do a Full Table Scan rather than utilize the NUSI in two cases:

• When the NUSI is not selective enough.
• When no COLLECTed STATISTICS are available.

As a guideline, choose only those columns having frequent access as NUSI candidates. After
the table has been loaded, create the NUSI indexes, COLLECT STATISTICS on the
indexes, and then do an EXPLAIN referencing each NUSI. If the Parser chooses a Full
Table Scan over using the NUSI, drop the index.
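A minimal sketch of that recommended sequence, using the Customer table and Name NUSI from the
earlier example in this module; the exact statements are illustrative.

CREATE INDEX (Name) ON Customer;                  -- create the NUSI

COLLECT STATISTICS ON Customer INDEX (Name);      -- give the Optimizer demographics

EXPLAIN
SELECT * FROM Customer WHERE Name = 'Rice';       -- verify the NUSI is chosen

DROP INDEX (Name) ON Customer;                    -- drop it if the EXPLAIN shows a full table scan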

Page 19-20

Secondary Index Usage

Secondary Index Considerations
• A table may have up to 32 secondary indexes.
• Secondary indexes may be created and dropped dynamically.
  – They require additional storage space for their subtables.
  – They require additional I/Os to maintain their subtables.
• If the base table is Fallback, the secondary index subtable is Fallback as well.
• The Optimizer may, or may not, use a NUSI, depending on its selectivity.
• Without COLLECTed STATISTICS, the Optimizer often does a FTS.
• The following approach is recommended:
  – Create the index.
  – COLLECT STATISTICS on the index (or column).
  – Use EXPLAIN to see if the index is being used.

Secondary Index Usage

Page 19-21

Single NUSI Access (Between, Less Than, or Greater
Than)
The Teradata Database accesses data from a NUSI-defined column in three ways:


• If the NUSI is not ordered by value, utilize the NUSI and do a Full Table Scan (FTS) of the NUSI
  subtable. In this case, the Row IDs of the qualifying base table rows would be retrieved into spool.
  The Teradata Database would use those Row IDs in spool to access the base table rows themselves.
• If the NUSI is ordered by values, the NUSI subtable may be used to locate matching base table
  rows.
• Ignore the NUSI and do an FTS of the base table itself.

In order to make this decision, the Optimizer requires COLLECTed STATISTICS.

REMEMBER
The only way to determine for certain whether an index is being used
is to utilize the EXPLAIN facility.

Page 19-22

Secondary Index Usage

Single NUSI Access
(Between, Less Than, or Greater Than)
If the NUSI is not value-ordered, the system may do a FTS of the NUSI subtable.

• Retrieve Row IDs of qualifying base table rows into spool.
• Access the base table rows from the spooled Row IDs.
The Optimizer requires COLLECTed STATISTICS to make this choice.

CREATE INDEX (hire_date) ON Employee;

SELECT  last_name, first_name, hire_date
FROM    Employee
WHERE   hire_date BETWEEN DATE '2012-01-01' AND DATE '2012-12-31';

SELECT  last_name, first_name, hire_date
FROM    Employee
WHERE   hire_date < DATE '2012-01-01';

SELECT  last_name, first_name, hire_date
FROM    Employee
WHERE   hire_date > DATE '1999-12-31';

If the NUSI is ordered by values, the NUSI subtable is much more likely to be used
to locate matching base table rows.
Use EXPLAIN to see if, and how, indexes are being used.

Secondary Index Usage

Page 19-23

Dual NUSI Access
In the example on the facing page, two NUSIs are CREATEd on separate columns of the
EMPLOYEE TABLE. The Teradata Database decides how to use these NUSIs based on
their selectivity.

AND with Equality Conditions


• If one of the two indexes is strongly selective, the system uses it alone for access.
• If both indexes are weakly selective, but together they are strongly selective, the system does a
  bit-map intersection.
• If both indexes are weakly selective separately and together, the system does an FTS.

In any case, any conditions in the SQL statement not used for access (residual conditions)
become row qualifiers.

OR with Equality Conditions
When accessing data with two NUSI equality conditions joined by the OR operator (as
shown in the last example on the facing page), the Teradata Database may do one of the
following:

• Do a FTS of the base table.
• If each of the NUSIs is strongly selective, it may use each of the NUSIs to return the appropriate
  rows.
• Do an FTS of the two NUSI subtables and do the following steps:
  – Retrieve Row IDs of qualifying base table rows into two separate spools.
  – Eliminate duplicates from the two spools of Row IDs.
  – Access the base rows from the resulting spool of Row IDs.

If only one of the two columns joined by the OR is indexed, the Teradata Database always
does an FTS of the base tables.

Page 19-24

Secondary Index Usage

Dual NUSI Access
Each column is a separate NUSI:
CREATE INDEX (department_number) ON Employee;
CREATE INDEX (job_code)          ON Employee;

AND with Equality Conditions:
SELECT  last_name, first_name, …
FROM    Employee
WHERE   department_number = 500
AND     job_code = 2147;

OR with Equality Conditions:
SELECT  last_name, first_name, ...
FROM    Employee
WHERE   department_number = 500
OR      job_code = 2147;

Optimizer options with AND:
• Use one of the two indexes if it is strongly selective.
• If the two indexes together are strongly selective, optionally do a bit-map intersection.
• If both indexes are weakly selective separately and together, the system does a FTS.

Optimizer options with OR:
• Do a FTS of the base table.
• If each of the NUSIs is strongly selective, it may use each of the NUSIs to return the appropriate
  rows.
• Do a FTS of the two NUSI subtables, retrieve Row IDs of qualifying rows into spool, and
  eliminate duplicate Row IDs from spool.

Secondary Index Usage

Page 19-25

NUSI Bit Mapping
NUSI Bit Mapping is a process that determines common Row IDs between multiple NUSI
values by intersection:



• It is much faster than copying, sorting, and comparing the Row ID lists.
• It dramatically reduces the number of base table I/Os.

NUSI bit mapping can be used with conditions other than equality if all of the following
conditions are satisfied:




• All conditions must be linked by the AND operator.
• At least two NUSI equality conditions must be specified.
• The Optimizer is more likely to consider bit mapping if you have COLLECTed
  STATISTICS on the NUSIs.

Even when the above conditions are satisfied, the only way to be absolutely certain that
NUSI bit mapping is occurring is to use the EXPLAIN facility.

Example
The SQL statement and diagram on the facing page show how NUSI bit-map intersections
can narrow down the number of rows even though each condition is weakly selective.
In this example, the designer wants to access rows from the employee table. There are three
NUSIs defined: salary_amount, country_code, and job_code. All three of these NUSIs are
weakly selective. You can see that 7% of the employees earn more than $75,000 per year
(>75000), 40% of the employees are located in the USA, and 12% of the employees have a
job code of IT.
In this case, the bit map intersection of these three NUSIs has an aggregate selectivity of
0.3%. That is, only 0.3% of the employees satisfy all three conditions: earning over $75,000,
USA based, and working in IT.
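
(Assuming the three conditions are roughly independent, the combined selectivity is the
product of the individual selectivities: 0.07 × 0.40 × 0.12 ≈ 0.0034, or about 0.3% of the
rows.)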

Page 19-26

Secondary Index Usage

NUSI Bit Mapping
• Determines common Row IDs between multiple NUSI values.
• Faster than copying, sorting, and comparing the Row ID lists.
• Dramatically reduces the number of base table I/Os.
• All NUSI conditions must be linked by the AND operator.
• Requires at least 2 NUSI equality conditions.
• The Optimizer is much more likely to consider bit mapping if you COLLECT STATISTICS.
• Use EXPLAIN to see if bit mapping is being used.

SELECT   *
FROM     Employee
WHERE    salary_amount > 75000
AND      country_code = 'USA'
AND      job_code = 'IT';

(Diagram: the three weakly selective conditions, salary_amount > 75000 (7%),
country_code = 'USA' (40%), and job_code = 'IT' (12%), intersect to an aggregate
selectivity of 0.3%.)

Secondary Index Usage

Page 19-27

Value-Ordered NUSIs
NUSIs are maintained as separate subtables on each AMP. Their index entries point to base
table or Join Index rows residing on the same AMP as the index. The row hash for NUSI
rows is based on the secondary index column(s). Unlike row hash values for base table
rows, this row hash does not determine the distribution of subtable rows; only the local sort
order of each subtable.
Enhancements have been made to support a user-specified option of sorting the index rows
by data value rather than by hash code. This is referred to as a "value-ordered" index and is
specified through syntax options in the CREATE INDEX statement.
The typical use of a hash-ordered NUSI is with an equality condition on the secondary
index column(s). The specified secondary index value is hashed and then each NUSI
subtable is probed for rows with the same row hash. For each matching NUSI entry, the
corresponding Row IDs are used to access the base rows on the same AMP. Because the
NUSI rows are stored in row hash order, searching the NUSI subtable for a particular row
hash is very efficient.
Value-ordered NUSIs, on the other hand, are useful for processing range conditions and
conditions with an inequality on the secondary index column set.
Although hash-ordered NUSIs can be selected by the Optimizer to access rows for range
conditions, a far more common response is to specify a full table scan of the NUSI subtable
to find the matching secondary key values. Therefore, depending on the size of the NUSI
subtable, this might not be very efficient.
By sorting the NUSI rows by data value, it is possible to search only a portion of the index
subtable for a given range of key values. The major advantage of a value-ordered NUSI is
in the performance of range queries.
Value-ordered NUSIs have the following limitations:

• The sort key is limited to a single numeric column.
• The sort key column must be four or fewer bytes.
The following query is an example of the sort of SELECT statement for which value-ordered
NUSIs were designed.

SELECT   *
FROM     Orders
WHERE    orderdate
BETWEEN  DATE '2012-02-01' AND DATE '2012-02-29';

Page 19-28

Secondary Index Usage

Value-Ordered NUSIs
A Value-Ordered NUSI is limited to a single numeric sort column (4 bytes or less).
Some benefits of using value-ordered NUSIs:

• Index subtable rows are sorted (sequenced) by data value rather than hash value.
• Optimizer can search only a portion of the index subtable for a given range of values.
• Can provide major advantages in performance of range queries.
• Even with PPI, the Value-Ordered NUSI is still a valuable index selection for other
  columns in a table.

Example of creating a Value-Ordered NUSI:
CREATE INDEX (sales_date)
ORDER BY VALUES (sales_date)
ON Daily_Sales;

SELECT    sales_date
         ,SUM (sales)
FROM      Daily_Sales
WHERE     sales_date
BETWEEN   DATE '2012-02-09' AND DATE '2012-02-15'
GROUP BY  1
ORDER BY  1;

Secondary Index Usage

The optimizer may choose to traverse the NUSI using a range constraint
rather than do an FTS.

Page 19-29

Value-Ordered NUSIs (cont.)

column_1_name      The names of one or more columns whose field values are to be
                   indexed.

                   You can specify up to 64 columns for the new index. The index is
                   based on the combined values of each column. Unless you use the
                   ORDER BY clause, all columns are hash-ordered.

                   Multiple indexes can be defined on the same columns as long as each
                   index differs in its ordering option (VALUES versus HASH).

ORDER BY           Row ordering on each AMP by a single NUSI column: either value-
                   ordered or hash-ordered.

                   Rules for using an ORDER BY clause are shown in the following
                   table.

VALUES             Value-ordering for the ORDER BY column.

                   Select VALUES to optimize queries that return a contiguous range of
                   values, especially for a covered index or a nested join.

HASH               Hash-ordering for the ORDER BY column.

                   Select HASH to limit hash-ordering to one column, rather than all
                   columns (the default).

                   Hash-ordering a multi-column NUSI on one of its columns allows the
                   NUSI to participate in a nested join where join conditions involve only
                   that ordering column.

Note: A Value-Ordered NUSI actually reserves two subtable IDs and this counts as 2
secondary indexes in the maximum count of 32 for a table.

Page 19-30

Secondary Index Usage

Value-Ordered NUSIs (cont.)
• Option that increases the ability of a NUSI to “cover” SQL queries without
having to access the base table.

• Value-Ordered is sorted by the ‘ORDER BY VALUES’ clause and the sort
column is limited to a single numeric column that cannot exceed 4 bytes.

– Value-Ordered is useful for range constraint queries.

• The ‘ORDER BY HASH’ clause provides the ability to create a multi-valued
index, but have the NUSI hashed based on a single attribute within the index,
not the entire composite value.

– Hash-Ordered is useful for equality searches based on a single attribute.
– Example: A NUSI may contain 10 columns for covering purposes and a single
  'ORDER BY HASH' column for equality searches on that value (a syntax sketch follows below).

• Optimizer is much more likely to use a value-ordered NUSI if you have
collected statistics on the value-ordered NUSI.
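
As an illustrative sketch only (the table and column names below are hypothetical), a wide
covering NUSI can be hash-ordered on the one column used for equality searches:

     CREATE INDEX (last_name, first_name, department_number, job_code, salary_amount)
     ORDER BY HASH (last_name)
     ON Employee;

Queries that supply an equality value for last_name can probe the subtable efficiently,
while queries that reference only the indexed columns can still be covered without
touching the base table.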

Secondary Index Usage

Page 19-31

Covering Indexes
If the query references only columns of that table that are fully contained within a given
index, the index is said to "cover" the table in the query. In these cases, it is often more
efficient to access only the index subtable and avoid accessing the base table rows
altogether.
Covering will be considered for any table in the query that references only columns defined
in a given NUSI. These columns can be specified anywhere in the query including the:





SELECT list
WHERE clause
Aggregate functions
GROUP BY expressions

The presence of a WHERE condition on each indexed column is not a prerequisite for using
the index to cover the query. The optimizer will consider the legality and cost of covering
versus other alternative access paths and choose the optimal plan. Many of the potential
performance gains from index covering require no user intervention and will be transparent
except for the execution plan returned by EXPLAIN.

Join Index Note:
This course hasn’t covered Join Indexes to this point, but it is possible to create a NUSI on
top of a Join Index. The CREATE INDEX statement has a special option, ALL, which is required
if these columns will potentially be used for covering.
The class of indexed data that will require user intervention to take advantage of covering is
NUSIs, which may be defined on a Join Index. By default, a NUSI defined on a Join Index
will maintain RowID pointers to only physical rows. In order to use the NUSI to cover the
data stored in a Join Index, Row IDs must be kept for each associated logical row. As a
result, when defining a potential covering NUSI on top of a Join Index, users should specify
the ALL option to indicate the NUSI rows should point to logical rows.

Example
Defining a NUSI on top of a Join Index
CREATE JOIN INDEX OrdCustIdx AS
  SELECT (custkey, custname), (orderstatus, orderdate, ordercomment)
  FROM   Orders O LEFT JOIN Customer C ON O.custkey = C.custkey
  ORDER BY custkey
  PRIMARY INDEX (custname);

CREATE INDEX idx_name_stat ALL (custname, orderstatus) ON OrdCustIdx;

Page 19-32

Secondary Index Usage

Covering Indexes
• The optimizer considers using a NUSI subtable to “cover” any query that references
only columns defined in a given NUSI.

• These columns can be specified anywhere in the query including:
– SELECT list
– WHERE clause
– Aggregate functions
– GROUP BY clauses
– Expressions

• Presence of a WHERE condition on each indexed column is not a prerequisite for using
the index to cover the query.

• NUSIs (especially a covering NUSI) are considered by the optimizer in join plans and
can be joined to other tables in the system.
Query considered for index covering:

CREATE INDEX IdxOrd
   (orderkey, orderdate, totalprice)
ON Orders;

Query considered for index covering and ordering:

CREATE INDEX IdxOrd2
   (orderkey, orderdate, totalprice)
ORDER BY VALUES (orderkey)
ON Orders;

Query to access the table via the OrderKey:

SELECT    orderdate, AVG(totalprice)
FROM      Orders
WHERE     orderkey > 1000
GROUP BY  orderdate;

Secondary Index Usage

Page 19-33

Covering Indexes (cont.)
NUSIs and Aggregate Processing
When aggregation is performed on a NUSI column, the Optimizer can access the NUSI
subtable, which offers much better performance than accessing the base table rows. Better
performance is achieved because there should be fewer index blocks and rows in the
subtable than data blocks and rows in the base table, thus requiring less I/O.

Example
In the example on the facing page, there is a NUSI defined on the state column of the
location table. Aggregate processing of this NUSI column produces much faster results for
the SELECT statement, which counts the number of rows for each state.

Page 19-34

Secondary Index Usage

Covering Indexes (cont.)
• The Optimizer uses NUSI subtables for aggregation when possible.
• If the aggregated column is a NUSI, subtable access may be sufficient.
• The system counts Row ID List entries for each AMP for each value.
• Also referred to as a “Covered NUSI”.

SELECT    COUNT (*), state
FROM      Location
GROUP BY  state;

(Diagram: four NUSI subtables, one per AMP, each holding subtable Row ID entries for the
values NY, OH, GA, and CA; the counts are produced from these entries without reading the
base table.)

Secondary Index Usage

Page 19-35

NUSI vs. Full Table Scan (FTS)
The Optimizer generally chooses an FTS over a NUSI when one of the following occurs:




• Rows per value is greater than data blocks per AMP.
• It does not have COLLECTed STATISTICS on the NUSI.
• The index is too weakly selective. The Optimizer determines this by using
  COLLECTed STATISTICS.

Example
The table on the facing page shows how the access method chosen affects the number of
physical I/Os per AMP.
In the case of a NUSI, there is ONE I/O necessary to read the Index Block on each AMP,
plus 0 to ALL (where ALL = Number of Data Blocks) I/Os required to read the Data Blocks,
for a possible total ranging from the Number of AMPs to (Number of AMPs + ALL) I/Os.
In the case of a Full Table Scan, there are no I/Os required to read any Index Blocks, but the
system reads ALL Data Blocks.
The only way to tell whether or not a NUSI is being used is by using EXPLAIN.

COLLECT STATISTICS on all NUSIs.
Use EXPLAIN to see whether a NUSI is being used.
Do not define NUSIs that will not be used.
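
As a brief illustration (the table and column names are hypothetical), statistics can be
collected on a NUSI column or on the index itself, and the resulting plan can then be
checked with EXPLAIN:

     COLLECT STATISTICS ON Employee COLUMN (department_number);
     COLLECT STATISTICS ON Employee INDEX  (job_code);

     EXPLAIN
     SELECT   *
     FROM     Employee
     WHERE    job_code = 2147;

With current statistics, the Optimizer can compare the cost of using the NUSI subtable
against a full table scan and choose the cheaper plan.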

Page 19-36

Secondary Index Usage

NUSI vs. Full Table Scan (FTS)
The Optimizer generally chooses an FTS over a NUSI when:

• It does not have COLLECTed STATISTICS on the NUSI.
• The index is too weakly selective.
• The table is small.
Access Method       Physical I/Os per AMP

NUSI                1            Index Subtable Block(s)
                    0 – Many     Data Blocks

Full Table Scan     0            Index Subtable Blocks
                    ALL          Data Blocks

General Rules:

• COLLECT STATISTICS on all NUSIs.
• USE EXPLAIN to see whether a NUSI is being used.
• Do not define NUSIs that will not be used.

Secondary Index Usage

Page 19-37

Full Table Scans – Sync Scans
In the case of multiple users that access the same table at the same time, the system can do a
synchronized scan (sync scan) on the table.

Page 19-38

Secondary Index Usage

Full Table Scans – Sync Scans
In the case of multiple users that access the same table at the same time, the
system can do a synchronized scan (sync scan) on the table.

• Multiple simultaneous scans share reads – this is a sync scan at the block level.
• New query joins scan at the current scan point.

(Diagram: rows of a sample Employee table laid out in data blocks. Query 1 begins scanning
at the top of the table; Query 2 and Query 3 begin later and join the scan at the block being
read at that moment, sharing subsequent block reads.)

Secondary Index Usage

Page 19-39

Module 19: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 19-40

Secondary Index Usage

Module 19: Review Questions
1. Because the row is hash-distributed on different columns, the subtable row will typically land on an
AMP other than the one containing the data row. This index would be:
a. UPI or NUPI
b. USI
c. NUSI
d. None of the above
2. The Teradata DBS hashes the value and uses the Row Hash to find the desired rows. This is always
a one-AMP operation. This index would be:
a. UPI or NUPI
b. USI
c. NUSI
d. None of the above

3. ___________________ is a process that determines common Row IDs between multiple NUSI values
by a process of intersection.
a. NUSI Bit Mapping
b. Dual NUSI Access
c. Full Table Scan
d. NUSI Read
4. If aggregation is performed on a NUSI column, the Optimizer accesses the NUSI subtable and returns
the result without accessing the base table. This is referred to as:
a. NUSI bit mapping
b. Full table scan
c. Dual NUSI access
d. Covering Index

Secondary Index Usage

Page 19-41

Notes

Page 19-42

Secondary Index Usage

Module 20
Analyze Secondary Index Criteria

After completing this module, you will be able to:
• Describe Composite Secondary Indexes.
• Choose columns as candidate Secondary Indexes.
• Analyze Change Rating, Value Access, and Range Access.

Teradata Proprietary and Confidential

Analyze Secondary Index Criteria

Page 20-1

Notes

Page 20-2

Analyze Secondary Index Criteria

Table of Contents
Accessing Rows ......................................................................................................................... 20-4
Row Selection ............................................................................................................................ 20-6
Secondary Index Considerations ................................................................................................ 20-8
Secondary Index Usage ............................................................................................................ 20-10
Secondary Index Candidate Guidelines ................................................................................... 20-12
Exercise 3 – Sample ................................................................................................................. 20-14
Secondary Index Candidate Guidelines ............................................................................... 20-14
Exercise 3 – Choosing SI Candidates ...................................................................................... 20-16
Exercise 3 – Choosing SI Candidates (cont.) ....................................................................... 20-18
Exercise 3 – Choosing SI Candidates (cont.) ....................................................................... 20-20
Exercise 3 – Choosing SI Candidates (cont.) ....................................................................... 20-22
Exercise 3 – Choosing SI Candidates (cont.) ....................................................................... 20-24
Exercise 3 – Choosing SI Candidates (cont.) ....................................................................... 20-26
Change Rating .......................................................................................................................... 20-28
Value and Range Access .......................................................................................................... 20-30
Exercise 4 – Sample ................................................................................................................. 20-32
Exercise 4 – Eliminating Index Candidates ............................................................................. 20-34
Exercise 4 – Eliminating Index Candidates (cont.) .............................................................. 20-36
Exercise 4 – Eliminating Index Candidates (cont.) .............................................................. 20-38
Exercise 4 – Eliminating Index Candidates (cont.) .............................................................. 20-40
Exercise 4 – Eliminating Index Candidates (cont.) .............................................................. 20-42
Exercise 4 – Eliminating Index Candidates (cont.) .............................................................. 20-44
Module 20: Review Questions ................................................................................................. 20-46

Analyze Secondary Index Criteria

Page 20-3

Accessing Rows
Three SQL commands require that rows be physically read. They are SELECT, UPDATE,
and DELETE. Their syntax and use are described below:

SELECT [expression] FROM tablename ...
UPDATE tablename SET col_name = [expression] ...
DELETE FROM tablename ...



• The SELECT command returns the value(s) from the table(s) for display or
  processing. Many people confuse the SQL SELECT statement with a READ
  command (e.g., COBOL). SELECT simply asks for the column values expressed
  in the projection list to be returned for display or processing. The rows which have
  their values returned, deleted, or updated are identified by the WHERE clause
  (when present). It is the WHERE clause that controls File System reads.

• The UPDATE command changes one or more column values to new values.

• The DELETE command removes rows from a table.

Any of the three SQL statements can be modified with a WHERE clause. Values specified
in a WHERE clause tell Teradata which rows should be acted upon. Proper use of the
WHERE clause will improve throughput by limiting the number of rows that must be
handled.
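
As a simple illustration (the table and column names below are hypothetical), each of the
three statements uses a WHERE clause to limit the rows the File System must read:

     SELECT  last_name, first_name
     FROM    Employee
     WHERE   department_number = 500;

     UPDATE  Employee
     SET     job_code = 2147
     WHERE   employee_number = 100766;

     DELETE  FROM Employee
     WHERE   employee_number = 100766;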

Page 20-4

Analyze Secondary Index Criteria

Accessing Rows
SELECT {expression} FROM tablename…

• Returns the value(s) from the table(s) for display or processing.
• The row(s) must be physically read first.
UPDATE tablename SET columns = {expression}…

• Changes one or more column values to new values.
• The row(s) must be physically located (read) first.
DELETE FROM tablename…

• Removes rows from a table.
• The row(s) must be physically located (read) first.
Any of the above SQL statements can contain a WHERE clause.

• Values in the WHERE clause tell Teradata what set of rows to act on.
• Without a WHERE clause, all rows participate in the operation.
• Limiting the number of rows Teradata must handle improves throughput.

Analyze Secondary Index Criteria

Page 20-5

Row Selection
When Teradata processes an SQL statement with a WHERE clause, it examines the
clause and builds an execution plan and access method to satisfy the clause conditions.
Certain conditions contained in the WHERE clause take advantage of indexing (assuming
that the appropriate index is in place). These conditions are shown in the upper box on the
facing page. Notice that these conditions all ask the RDBMS to locate a specific value or set
of values. Application programmers should use these conditions whenever possible as they
offer the best performance.
Other WHERE clause conditions are not able to take advantage of indexing and will always
cause a Full Table Scan of either the Base Table or a SI subtable. Though they benefit from
the Teradata distributed architecture, they are less desirable from a performance standpoint.
These kinds of conditions are listed in the middle box on the facing page; they do not focus
on a specific value or set of values, thus forcing the system to do a Full Table Scan to find
all the values that satisfy them.
Note that poor relational models severely limit physical design choices and generally force
more Full Table Scans.
The maximum number of ORed conditions or IN list values per request cannot exceed 1,048,576.
There is really no fixed limit on the number of entries in an IN list; however, the maximum
SQL text size is 1 MB, and this places a request-specific upper bound on this number.

 NOTE 
The small box at the bottom of the facing page lists commands that
operate on the answer sets generated by previous conditions, such as
those shown in the boxes above.

Page 20-6

Analyze Secondary Index Criteria

Row Selection
WHERE clause conditions that may use indexing if available*:

   colname = value                             colname IN (explicit list of values)
   colname IS NULL                             t1.col_x = t1.col_y
   colname IN (subquery)                       t1.col_x = t2.col_x
   condition1 AND condition2                   condition1 OR condition2
   colname = ANY, SOME or ALL

   * Access methods for the above depend on whether the column(s) are indexed, type of
     index, and selectivity of the index.

WHERE clause conditions that typically cause a Full Table Scan:

   non-equality comparisons                    INDEX (colname)
   colname IS NOT NULL                         SUBSTRING (colname)
   colname NOT IN (explicit list of values)    col1 || col2 = value
   colname NOT IN (subquery)                   colname LIKE ...
   colname BETWEEN ... AND ...                 NOT (condition1)
   Join condition1 OR Join condition2          missing a WHERE clause
   t1.col_x [ computation ] = value
   t1.col_x [ computation ] = t1.col_y

The following affect output only, not base row selection:

   SUM, MIN, MAX, AVG, COUNT, DISTINCT, ANY, ALL
   GROUP BY, HAVING, WITH, WITH … BY ...
   ORDER BY, UNION, INTERSECT, EXCEPT

Poor relational models severely limit physical design choices and generally force more
Full Table Scans.

Page 20-7

Secondary Index Considerations
The facing page describes key considerations involved in decisions regarding the use of
Secondary Indexes. It is important to weigh the costs of Secondary Indexes against the
benefits.


• Some of these costs are increased use of disk space and increased I/O.

• The main benefit of Secondary Indexes is faster set selection. Choose them on
  frequently used set selections.

 REMEMBER 
Data demographics change over time.
Revisit all index choices regularly to make sure that
they remain appropriate and serve you well.

Page 20-8

Analyze Secondary Index Criteria

Secondary Index Considerations
• Secondary Indexes consume disk space for their subtables.
• INSERTs, DELETEs, and UPDATEs (sometimes) cost double the I/Os.
• Choose Secondary Indexes on frequently used set selections.
– Secondary Index use is typically based on an Equality search.
– A NUSI may have multiple rows per value.

• The Optimizer may not use a NUSI if it is too weakly selective.
• Avoid choosing Secondary Indexes with volatile data values.
• Weigh the impact on Batch Maintenance and OLTP applications.
• USI changes are Transient Journaled. NUSI changes are not.
• Remove or drop NUSIs that are not used.

Data demographics change over time. Revisit ALL index
(Primary and Secondary) choices regularly.
Make sure they are still serving you well.

Analyze Secondary Index Criteria

Page 20-9

Secondary Index Usage
The facing page lists common usage for a USI and a NUSI.

Page 20-10

Analyze Secondary Index Criteria

Secondary Index Usage
Unique Secondary Index (USI) Usage

• A USI is used to maintain uniqueness in a column or columns.
• Usage is determined by specifying the USI value in an equality condition in the
  WHERE clause or ON clause.
• Unique Secondary Indexes support …
– Nested Joins
– Row-hash locking

Non-unique Secondary Index (NUSI) Usage

• Usage is determined by specifying the NUSI value in an equality condition in the
WHERE clause or ON clause.

• Non-Unique Secondary Indexes support Nested Joins and Merge Joins
• Optimizer can choose to use bit-mapping for weakly selective (>10%) NUSIs which
can alleviate limitations associated with composite NUSIs.

• In some cases, it may be better to use multiple single-column NUSIs (City, State)
  instead of a single composite NUSI (see the sketch following this list).
  – User has to balance the overhead of multiple NUSIs as compared to a single composite NUSI.

• Can be used to “cover” a query, avoiding base table access.
• Can significantly reduce base table I/O during value and join operations.
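
As an illustrative sketch only (hypothetical table and columns), the two alternatives look
like this; which performs better depends on the demographics and the queries:

     -- Two single-column NUSIs
     CREATE INDEX (city)  ON Customer;
     CREATE INDEX (state) ON Customer;

     -- One composite NUSI
     CREATE INDEX (city, state) ON Customer;

A hash-ordered composite NUSI is typically usable only when equality values are supplied
for all of its columns, whereas separate NUSIs can each be used on their own (at the cost
of maintaining two subtables).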

Analyze Secondary Index Criteria

Page 20-11

Secondary Index Candidate Guidelines
All Primary Index candidates are Secondary Index candidates.
Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly selective. A
guideline to use in initially selecting NUSI candidates is the following:
The optimizer does not only look at the selectivity of a column to determine if an FTS or an
indexed access will be used in a given plan. The decision is made after comparing the
total cost of both approaches, considering multiple factors, including row size, block
size, number of rows in the table, and also the I/O and CPU cost (based on the current
hardware cost factors).
In this course, we are going to use 5% as a guideline for NUSI selectivity.
Example 1: Assume a table has 100M rows and a column has 50 distinct values that are
evenly distributed (each value has the same number of rows). Therefore, each value has
2M rows and effectively represents 2% of the rows. The NUSI would be used.
Example 2: Assume a table has 100M rows and a column has 20 distinct values that are
evenly distributed (each value has the same number of rows). Therefore, each value has
5M rows and effectively represents 5% of the rows. The NUSI would be used.
Example 3: Assume a table has 100M rows and a column has 10 distinct values that are
evenly distributed (each value has the same number of rows). Therefore, each value has
10M rows and effectively represents 10% of the rows. The NUSI would not be used.
The greater the discrepancy between typical rows per value and max rows per value, the
higher the probability the NUSI would not be used based on the max value used to qualify
the rows.

Page 20-12

Analyze Secondary Index Criteria

Secondary Index Candidate Guidelines
• All Primary Index (PI) candidates are Secondary Index candidates.
– A UPI is a USI candidate and a NUPI is a NUSI candidate.
• Columns that are not PI candidates should also be considered as NUSI candidates.
• A NUSI will be used depending on the percentage of table rows that will be accessed.
For example:

– If the number of rows accessed via a NUSI is ≤ 5%, the NUSI will be used.
– If the number of rows accessed via a NUSI is 5 – 10%, the NUSI may or may not be
used.

– If the number of rows accessed via a NUSI is > 10%, the NUSI will not be used.
• If 5% is used as the guideline, then any column with 20 or more distinct values is
considered as a NUSI candidate.

– The optimizer (based on statistics) will decide to use (or not) the NUSI for specific values.
– The greater the discrepancy between typical rows per value and max rows per value, the higher
the probability the NUSI would not be used based on the max value used to qualify the rows.

• These are only guidelines for candidate selection. Validate (via Explain and testing)
that the NUSI will be chosen AND that it will provide better performance.

Analyze Secondary Index Criteria

Page 20-13

Exercise 3 – Sample
In this exercise, you will work with the same tables you used to identify PI candidates in
Exercise 2 in Module 17.
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates. The table on the facing page provides you with an example of how to apply the
Secondary Index Candidate Guidelines.
You will make further index choices for these tables in following exercises.
Note: These exercises do not provide row sizes. Therefore, assume that the rows could be
as large as 960 bytes and assume a typical block size of 96 KB.

Secondary Index Candidate Guidelines
All Primary Index candidates are Secondary Index candidates.
Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly selective. A
guideline to use in initially selecting NUSI candidates is the following:
If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-14

Analyze Secondary Index Criteria

Exercise 3 – Sample
Secondary Index Guidelines
• All PI candidates are Secondary Index candidates.
• Other columns are NUSI candidates if typical
rows/value is ≤ 5% or # of distinct values ≥ 20.

On the following pages, there are sample tables with
typical rows per value demographics.
• Indicate ALL possible Secondary Index
candidates (USI and NUSI).
• Later exercises will guide your final choices.

Example

60,000,000
Rows
PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating
PI/SI

A

B

PK,SA
5K
12
1M
50M
60M
1
0
1
0
UPI
USI

2.6K
0
0
0
7M
12
5
7
1
NUPI
NUSI

C

D

FK,NN

NN,ND

0
0
1K
5K
1.5M
500
0
35
5
NUPI?
NUSI

500K
0
0
0
60M
1
0
1
3
UPI
USI

E

F

G

H

0
0
0
0
8
8M
0
7M
0

0
0
0
0
15M
9
725K
3
4

0
0
0
0
15M
725K
5
3
4

52
4K
0
0
700
90K
10K
80K
9

NUSI

NUSI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-15

Exercise 3 – Choosing SI Candidates
In this exercise, you will work with the same tables you used to identify PI candidates in
Exercise 2 in Module 17.
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-16

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates
ENTITY 1
100,000,000
Rows
PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

A

B

C

D

E

F

0
0
0
0
95M
2
0
1
3
NUPI

0
0
0
0
300K
400
0
325
2
NUPI

0
0
0
0
250K
350
0
300
1
NUPI

0
0
0
0
40M
3
1.5M
2
1

0
0
0
0
1M
110
0
90
1
NUPI

PK,UA

50K
0
10M
10M
100M
1
0
1
0
UPI

PI/SI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-17

Exercise 3 – Choosing SI Candidates (cont.)
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-18

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates (cont.)
ENTITY 2
10,000,000
Rows

G

PK/FK

PK,SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

5K
12
100M
100M
10M
1
0
1
0
UPI

H

I

J

K

L

365
0
0
0
100K
200
0
100
0
NUPI

12
0
0
0
9M
2
100K
1
9

12
0
0
0
12
1M
0
800K
1

0
0
0
0
50
240K
0
190K
2

0
260
0
0
180K
60
0
50
0
NUPI

PI/SI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-19

Exercise 3 – Choosing SI Candidates (cont.)
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-20

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates (cont.)
DEPENDENT
5,000,000
Rows

A

PK/FK

M

N

O

PK

P

Q

NN,ND

FK

SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

0
0
700K
1M
2M
4
0
1
0

0
0
0
0
50
200K
0
60K
0

PI/SI

NUPI

0
0
0
0
90K
75
0
50
3

UPI

0
0
0
0
3M
2
390K
1
1

0
0
0
0
5M
1
0
1
0
UPI

0
0
0
0
2M
5
1M
1
1

NUPI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-21

Exercise 3 – Choosing SI Candidates (cont.)
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-22

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates (cont.)
ASSOCIATIVE 1
300,000,000
Rows

A

PK/FK

G

R

S

PK
FK

FK,SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

260
0
0
0
100M
5
0
3
0

0
0
8M
300M
10M
50
0
30
0

0
0
0
0
15K
21K
0
19K
0

0
0
0
0
800K
400
0
350
0

PI/SI

NUPI

NUPI

NUPI?

NUPI

UPI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-23

Exercise 3 – Choosing SI Candidates (cont.)
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-24

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates (cont.)
ASSOCIATIVE 2
100,000,000
Rows

A

M

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

G

T

U

0
0
0
0
750
135K
0
100K
0

PK
FK

FK

0
0
7M
800M
50M
3
0
1
0

0
0
250K
20M
10M
150
0
8
0

0
0
0
0
560K
180
0
170
0

NUPI

NUPI

UPI
PI/SI

NUPI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-25

Exercise 3 – Choosing SI Candidates (cont.)
Use the Secondary Index Candidate Guidelines below to identify all USI and NUSI
candidates.


All Primary Index candidates are Secondary Index candidates.



Columns that are not Primary Index candidates have to also be considered as NUSI
candidates. A NUSI will be used by the Optimizer to select data if it is strongly
selective. A guideline to use in initially selecting NUSI candidates is the
following:

If the number of distinct values ≥ 20, then the column is a NUSI candidate.

Page 20-26

Analyze Secondary Index Criteria

Exercise 3 – Choosing SI Candidates (cont.)
HISTORY
730,000,000
Rows

A

PK/FK

DATE

D

E

F

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

PK
FK

SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

10M
0
800M
2.4B
100M
18
0
3
0

5K
20K
0
0
730
1100K
0
900K
0

PI/SI

NUPI

UPI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-27

Change Rating
Change Rating is a number that comes from Application & Transaction Modeling
(ATM).


• Change Rating indicates how often the values in a column, or columns, are
  updated.

• It is a value from 0 to 10, with 0 describing those columns which never change and
  10 describing those columns which change with every write operation.

• The Change Rating values of various types of columns are shown on the facing
  page.

Change Rating has nothing to do with the SQL INSERT or DELETE statements. A table
may be subject to frequent INSERTs and/or DELETEs, but the Change Ratings of columns
will be low as long as the values within those columns remain stable throughout the lifetime
of the row.
Change Rating is dependent only on the SQL UPDATE statement. Change Rating is
affected when column values are UPDATEd.
Utilize Change Rating when choosing indexes. Primary Indexes must be based on columns
with very stable data values. PI columns should never have Change Ratings higher than 1.
Secondary Indexes should be based on columns with at least fairly stable data values. You
should not choose columns with Change Ratings higher than 3 for SIs.

Page 20-28

Analyze Secondary Index Criteria

Change Rating
Change Rating indicates how often values in a column are UPDATEd:

• 0 = column values never change.
• 10 = column changes with every write operation.
PK columns are always 0.
Historical data columns are always 0.
Data that does not normally change = 1.
Update tracking columns = 10.
All other columns are rated 2 - 9.

Base Primary Index choices on columns with very stable data values:

• A change rating of 0 - 1 is reasonable.
Base Secondary Index choices on columns with fairly stable data values:

• A change rating of 0 - 3 is reasonable.

Analyze Secondary Index Criteria

Page 20-29

Value and Range Access
Value Access Frequency is a numeric rating which tells you how many times all known
transactions access the table in a given time interval (e.g., a one-year period). It measures
how frequently a column, or columns, is accessed by SQL statements containing an equality
value.
Range Access Frequency is a numeric rating which tells you how many times all known
transactions access the table in a given time interval (e.g., a one-year period). It measures
how frequently a column, or columns, is accessed by SQL statements that access a range of
values such as a DATE range. These types of queries may contain inequality or BETWEEN
expressions.
A Value Access or Range Access of 0 implies that there is no need to access the table
through that column. Since NUSIs require system resources to maintain them (INSERTs
and DELETEs require additional I/O to update the SI subtables), there is no point in having
a NUSI if it is not used for access. All NUSI candidates with very low Value Access or
Range Access Frequency should be eliminated.
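
As a small illustration (hypothetical table and column), an existing NUSI that EXPLAINs
show is never used for access can simply be dropped to save subtable space and
maintenance I/O:

     DROP INDEX (job_code) ON Employee;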

Page 20-30

Analyze Secondary Index Criteria

Value and Range Access
Value Access:

• How often a column appears with an equality value. For example:
WHERE column_name = hardcoded_value or substitutable_value

Range Access:

• How often a column is used to access a range of data values (e.g., range of dates).
For example:
WHERE column_name BETWEEN value AND value or WHERE column_name > value

Value Access or Range Access Frequency:

• How often in a given time interval (e.g., annually) all known transactions access rows
from the table through this column either with an equality value or with a range of
values.

Notes:

• The above demographics result from Activity Modeling.
• Low Value Access or Range Access Frequency:
– Secondary Index overhead may cost more than doing the FTS.
• NUSIs may be considered by the optimizer for joins. In the following exercises, we
are going to eliminate NUSIs with a value access of 0, but we may need to reconsider
the NUSI as an index choice depending on join access (when given join metrics).
• EXPLAINs indicate if the index choices are utilized or not.

Analyze Secondary Index Criteria

Page 20-31

Exercise 4 – Sample
In this exercise, you will again work with the same tables that you used in Exercises 2 and 3.
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates
that do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.
The table on the facing page provides you with an example of how to apply these guidelines.
You will make final index choices for these tables in Exercise 5 (later module).

Page 20-32

Analyze Secondary Index Criteria

Exercise 4 – Sample
Change Rating Guidelines:
• PI – change rating 0 - 1.
• SI – change rating 0 - 3.
Value Access Guideline:
• NUSI Value Access > 0
• VONUSI Range Access > 0

On the following pages, there are sample tables with change rating and
value access demographics.
• Eliminate Index candidates based on change rating and value
access.
• Identify any VONUSI candidates with a Range Access > 0
• Later exercises will guide your final choices.

Example

60,000,000
Rows
PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating
PI/SI

A

B

PK,SA
5K
12
1M
50M
60M
1
0
1
0
UPI
USI

2.6K
0
0
0
7M
12
5
7
1
NUPI
NUSI

C

D

FK,NN

NN,ND

0
0
1K
5K
1.5M
500
0
35
5
NUPI?
NUSI

500K
0
0
0
60M
1
0
1
3
UPI
USI

E

F

G

H

0
0
0
0
8
8M
0
7M
0

0
0
0
0
15M
9
725K
3
4

0
0
0
0
15M
725K
5
3
4

52
4K
0
0
700
90K
10K
80K
9

NUSI

NUSI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-33

Exercise 4 – Eliminating Index Candidates
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-34

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates
ENTITY 1
100,000,000
Rows
PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating
PI/SI

A

B

C

D

E

F

0
0
0
0
95M
2
0
1
3
NUPI
NUSI

0
0
0
0
300K
400
0
325
1
NUPI
NUSI

0
0
0
0
250K
350
0
300
1
NUPI
NUSI

0
0
0
0
40M
3
1.5M
2
1

0
0
0
0
1M
110
0
90
1
NUPI
NUSI

PK,UA

50K
0
10M
10M
100M
1
0
1
0
UPI
USI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-35

Exercise 4 – Eliminating Index Candidates (cont.)
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-36

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates (cont.)
ENTITY 2
10,000,000
Rows

G

PK/FK

PK,SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

5K
12
100M
100M
10M
1
0
1
0
UPI
USI

PI/SI

H

I

J

K

L

365
0
0
0
100K
200
0
100
0
NUPI
NUSI

12
0
0
0
9M
2
100K
1
9

12
0
0
0
12
1M
0
800K
1

0
0
0
0
50
240K
0
190K
2

0
260
0
0
180K
60
0
50
0
NUPI
NUSI

NUSI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-37

Exercise 4 – Eliminating Index Candidates (cont.)
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-38

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates (cont.)
DEPENDENT
5,000,000
Rows

A

PK/FK

M

N

O

PK

Q

NN,ND

FK

SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

0
0
700K
1M
2M
4
0
1
0

0
0
0
0
50
200K
0
60K
0

PI/SI

NUPI

0
0
0
0
90K
75
0
50
3

0
0
0
0
3M
2
390K
1
1

UPI

0
0
0
0
5M
1
0
1
0
UPI

0
0
0
0
2M
5
1M
1
1

NUPI
USI

USI
NUSI

P

NUSI

NUSI

NUSI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-39

Exercise 4 – Eliminating Index Candidates (cont.)
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-40

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates (cont.)
ASSOCIATIVE 1
300,000,000
Rows

A

PK/FK

G

R

S

PK
FK

FK,SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

260
0
0
0
100M
5
0
3
0

0
0
8M
300M
10M
50
0
30
0

0
0
0
0
15K
21K
0
19K
0

0
0
0
0
800K
400
0
350
0

PI/SI

NUPI

NUPI

NUPI?

NUPI

NUSI

NUSI

NUSI

UPI
USI
NUSI
Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-41

Exercise 4 – Eliminating Index Candidates (cont.)
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-42

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates (cont.)
ASSOCIATIVE 2
100,000,000
Rows

A

M

PK/FK

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

G

T

U

0
0
0
0
750
135K
0
100K
0

PK
FK

FK

0
0
7M
800M
50M
3
0
1
0

0
0
250K
20M
10M
150
0
8
0

0
0
0
0
560K
180
0
170
0

NUPI

NUPI

NUSI

NUSI

UPI
PI/SI

NUPI
USI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

NUSI

NUSI

Page 20-43

Exercise 4 – Eliminating Index Candidates (cont.)
In this exercise, you will look at three additional demographics to eliminate potential index
candidates and to possibly choose Value-Ordered NUSI candidates. The three additional
data demographics that you will look at are:




Change Rating
Value Access
Range Access

Use the following Change Rating demographics guidelines to eliminate those candidates that
do not fit the guidelines.



PI candidates should have Change Ratings from 0 - 1.
SI candidates should have Change Ratings from 0 - 3.

Also, eliminate those NUSI candidates which have Value Access = 0 and Range Access = 0.
If a Range Access is greater than 0, then consider the column as a possible Value-Ordered
NUSI (VONUSI) candidate.

Page 20-44

Analyze Secondary Index Criteria

Exercise 4 – Eliminating Index Candidates (cont.)
HISTORY
730,000,000
Rows

A

PK/FK

DATE

D

E

F

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

0
0
0
0
N/A
N/A
N/A
N/A
N/A

PK
FK

SA

Value Access
Range Access
Join Access
Join Rows
Distinct Values
Max Rows/Value
Max Rows/NULL
Typical Rows/Value
Change Rating

10M
0
800M
2.4B
100M
18
0
3
0

5K
20K
0
0
730
1100K
0
900K
0

PI/SI

NUPI

UPI
USI
NUSI

NUSI

Collect Statistics (Y/N)

Analyze Secondary Index Criteria

Page 20-45

Module 20: Review Questions
Check your understanding of the concepts discussed in this module by completing the
review questions as directed by your instructor.

Page 20-46

Analyze Secondary Index Criteria

Module 20: Review Questions
1. With a NUPI, a technique to avoid a duplicate row check is to ________.
a. use set tables
b. use the NOT NULL constraint on the column
c. create the table as a MULTISET table
d. compare data values byte-by-byte within a Row Hash in order to ensure uniqueness

2. Which type of usage normally applies to a USI? ____
a. Range access
b. NOT condition
c. Equality value access
d. Inequality value access

3. Which two types of usage normally apply to a composite NUSI that is hash-ordered? ____ ____
a. Covering index
b. Equality value access
c. Inequality value access
d. Non-covering range access

Analyze Secondary Index Criteria

Page 20-47

Notes

Page 20-48

Analyze Secondary Index Criteria

Module 21
Access Considerations and Constraints

After completing this module, you will be able to:
• Analyze Optimizer Access scenarios.
• Explain partial value searches and data conversions.
• Identify the effects of conflicting data types.
• Determine the cost of I/Os.
• Identify column level attributes and constraints.
• Identify table level attributes and constraints.
• Add, modify and drop constraints from tables.
• Explain how the Identity column allocates new numbers.

Teradata Proprietary and Confidential

Access Considerations and Constraints

Page 21-1

Notes

Page 21-2

Access Considerations and Constraints

Table of Contents
Access Method Comparison ...................................................................................................... 21-4
Unique Primary Index (UPI) .................................................................................................. 21-4
Non-Unique Primary Index (NUPI) ....................................................................................... 21-4
Unique Secondary Index (USI) .............................................................................................. 21-4
Non-Unique Secondary Index (NUSI) ................................................................................... 21-4
Full-Table Scan (FTS)............................................................................................................ 21-4
Optimizer Access Scenarios ....................................................................................................... 21-6
Data Conversions ....................................................................................................................... 21-8
Storing Numeric Data .............................................................................................................. 21-10
Data Conversion Example........................................................................................................ 21-12
Matching Data Types ............................................................................................................... 21-14
Counting I/O Operations .......................................................................................................... 21-16
Additional I/O ...................................................................................................................... 21-16
Transient Journal I/O ............................................................................................................... 21-18
INSERT and DELETE Operations .......................................................................................... 21-20
UPDATE Operations ............................................................................................................... 21-22
Primary Index Value UPDATE ............................................................................................... 21-24
Table Level Attributes ............................................................................................................. 21-26
Example of Column and Table Level Constraints ................................................................... 21-28
Table Level Constraints ....................................................................................................... 21-28
Example (13.0) – SHOW Department Table ........................................................................... 21-30
Example (13.10) – SHOW Department Table ......................................................................... 21-32
Altering Table Constraints ....................................................................................................... 21-34
Identity Column – Overview.................................................................................................... 21-36
Business Value ..................................................................................................................... 21-36
Business Usage .................................................................................................................... 21-36
Identity Column – Implementation .......................................................................................... 21-38
Performance ......................................................................................................................... 21-38
Process for Generating Identity Column Numbers .............................................................. 21-38
Identity Column – Example 1 .................................................................................................. 21-40
Identity Column – Example 2 .................................................................................................. 21-42
Identity Column – Considerations ........................................................................................... 21-44
Limited to DECIMAL(18,0) ................................................................................................ 21-44
Restrictions........................................................................................................................... 21-44
Module 21: Review Questions ................................................................................................. 21-46

Access Considerations and Constraints

Page 21-3

Access Method Comparison
We have seen in preceding modules that Teradata can access data through indexes, partition
scans, or full table scans. The facing page illustrates these various access methods in
order of the number of AMPs affected.

Unique Primary Index (UPI)
The UPI is the most efficient way to access data. Accessing data through a UPI is a one-AMP
operation that leads directly to the single row with the desired UPI value. The system
does not have to create a spool file during a UPI access.

Non-Unique Primary Index (NUPI)
Accessing data through a NUPI is a one-AMP operation that may lead to multiple rows with
the desired NUPI value. The system creates a spool file during a NUPI access, if needed.
NUPI access is efficient if the number of physical block reads is small.

Unique Secondary Index (USI)
A USI is a very efficient way to access data. Data access through a USI is usually a two-AMP
operation, which leads directly to the single row with the desired USI value. The
system does not have to create a spool file during a USI access.
There are cases where a USI is actually more efficient than a NUPI. In these cases, the
optimizer decides on a case-by-case basis which method is more efficient. Remember: the
optimizer can only make informed decisions if it is provided with statistics.

Non-Unique Secondary Index (NUSI)
As we have seen, the non-unique secondary index (NUSI) is efficient only if the number of
rows accessed is a small percentage of the total data rows in the table. NUSI access is an
all-AMPs operation since the NUSI subtables must be scanned on each AMP. It is a
multiple rows operation since there can be many rows per NUSI value. A spool file will be
created if needed.

Full-Table Scan (FTS)
The Full-Table Scan is efficient in that each row is scanned only once. Although index
access is generally preferred to a FTS, there are cases where it is the best way to access
the data.
Like the situation with NUPIs and USIs, Full Table Scans can sometimes be more efficient
than a NUSI. The optimizer decides on a case-by-case basis which is more efficient
(assuming that it has been provided with statistics).
The Optimizer chooses what it thinks is the fastest access method.
COLLECT STATISTICS to help the Optimizer make good decisions.
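As a simple illustration of collecting statistics (the table and column names below are hypothetical, not part of this course's sample database), statistics would typically be collected on the columns and indexes the Optimizer must choose between:

  COLLECT STATISTICS ON Employee COLUMN (department_number);
  COLLECT STATISTICS ON Employee INDEX  (employee_number);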

Page 21-4

Access Considerations and Constraints

Access Method Comparison
Unique Primary Index
• Very efficient
• One AMP, one row
• No spool file

Non-Unique Primary Index
• Efficient if the number of rows per value is reasonable and there are no severe spikes.
• One AMP, multiple rows
• Spool file if needed

Unique Secondary Index
• Very efficient
• Two AMPs, one row
• No spool file

Non-Unique Secondary Index
• Efficient only if the number of rows accessed is a small percentage of the total data rows in the table.
• All AMPs, multiple rows
• Spool file if needed

No Primary Index
• Access is a full table scan without secondary indexes.

Partition Scan
• Efficient because of partition elimination.
• All AMPs; all rows in specific partitions

Full-Table Scan
• Efficient since each row is touched only once.
• All AMPs, all rows
• Spool file may equal the table in size

The Optimizer chooses the fastest access method.
COLLECT STATISTICS to help the Optimizer make good decisions.

Access Considerations and Constraints

Page 21-5

Optimizer Access Scenarios
Given the SQL WHERE clause on the facing page, the Optimizer decides which column it
will use to access the data. This decision is based upon what indexes have been defined on
the two columns (Col_1 and Col_2).
When you examine the table, you can see that the optimizer chooses the most efficient
access method depending on the situation. Interesting cases to note are as follows:
If Col_1 is a NUPI and Col_2 is a USI, the Optimizer chooses Col_1 (NUPI) if its
selectivity is close to a UPI (nearly unique). Otherwise, it accesses via Col_2
(USI) since only one row is involved, even though it is a two-AMP operation.
If both columns are NUSIs, the Optimizer must determine how selective each of
them is. Depending on the relative selectivity, the Optimizer may choose to access
via Col_1, Col_2, NUSI Bit Mapping, or a FTS.
If one of the columns is a NUSI and the other column is not indexed, the Optimizer
determines the selectivity of the NUSI. Depending on this selectivity, it chooses
either to utilize the NUSI or to do a FTS.
Whenever one of the columns is used to access the data, the remaining condition is used as a
row qualifier. This is known as a residual condition.
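To see which access path the Optimizer actually selected, the request can be EXPLAINed. The following is a minimal sketch (the literal values are placeholders only); the EXPLAIN output names the index used for access, and the other condition appears as a residual condition:

  EXPLAIN
  SELECT  *
  FROM    Table_1
  WHERE   Table_1.Col_1 = 1005
  AND     Table_1.Col_2 = 'A37';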

Page 21-6

Access Considerations and Constraints

Optimizer Access Scenarios
SINGLE TABLE CASE

WHERE   Table_1.Col_1 = :value_1
AND     Table_1.Col_2 = :value_2 ;

Column the Optimizer uses for access:

Col_1 \ Col_2      USI                 NUSI                        NOT INDEXED
-------------      ----------------    ------------------------    ---------------
UPI                UPI                 UPI                         UPI
NUPI               NUPI or USI (1)     NUPI                        NUPI
USI                USI                 USI                         USI
NUSI               USI                 Either, Both, or FTS (2)    NUSI or FTS (3)
NOT INDEXED        USI                 NUSI or FTS (3)             FTS

Notes:
1. The Optimizer prefers Primary Indexes over Secondary Indexes. It chooses the NUPI if only one
I/O (block) is accessed.
The Optimizer prefers Unique indexes over non-unique indexes. Only one row is involved with
USI even though it is a two-AMP operation.
2. Depending on relative selectivity, the Optimizer may use either NUSI, may use both with NUSI Bit
Mapping, or may do a FTS.
3. It depends on the selectivity of the index.

Access Considerations and Constraints

Page 21-7

Data Conversions
Operands in an SQL statement must be of the same data type to be compared. If operands
differ, internal data conversion is performed.
Data conversion is expensive in terms of system overhead and adversely affects
performance. The physical designer should make every effort to minimize the need for data
conversion. The best way to do this is to implement data types at the Domain level which
should eliminate comparisons across data type. If data values come from the same Domain,
they must be of the same data type and therefore, can be compared without conversion.
Columns used in addition, subtraction, comparison, and join operations should always be
from the same domain. Multiplication and division operations involve columns from two or
three domains.
In the Teradata Database the Byte data types can only be compared to a column with the
Byte data type or a character string of XB'_ _ _ _...'
For example, the system converts a CHARACTER value to a DATE value using the DATE
conversion. On the other hand, converting from BYTE to NUMERIC is not possible
(indicated by "ERROR").
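A minimal sketch of the domain-level approach follows (the tables and columns are hypothetical). Because cust_number is defined with the same data type everywhere it appears, the join comparison requires no conversion:

  CREATE TABLE Customer
    (cust_number   INTEGER NOT NULL,
     cust_name     CHAR(30))
  UNIQUE PRIMARY INDEX (cust_number);

  CREATE TABLE Invoice
    (invoice_number   INTEGER NOT NULL,
     cust_number      INTEGER NOT NULL)
  UNIQUE PRIMARY INDEX (invoice_number);

  SELECT  I.invoice_number, C.cust_name
  FROM    Invoice I, Customer C
  WHERE   I.cust_number = C.cust_number;      /* same domain, same data type - no conversion */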

Page 21-8

Access Considerations and Constraints

Data Conversions
• Columns (or values) must be of the same data type to be compared without
conversion.

• Character data is compared using the host’s collating sequence.
– Unequal-length character strings are converted by right-padding the shorter one
with blanks.

• If column (or value) types differ, internal conversion is performed.
– Numeric values are converted to the same underlying representation.
– Character to numeric comparison requires the character value to be converted to a
numeric value.

• Data conversion is expensive and generally unnecessary.
• Implement data types at the Domain level.
– Comparison across data types may indicate that Domain definitions are not clearly
understood.

Access Considerations and Constraints

Page 21-9

Storing Numeric Data
You should always store numeric data in numeric data types. Teradata will always convert
character data to numeric data prior to doing a comparison.
When Teradata is asked to do a comparison, it will always apply the following rules:
To compare 2 columns, they must be of the same data type.
Character data types will always be converted to numeric.
The example on the facing page demonstrates the potential performance hit that can occur
when you store numeric data as a character data type.
In Case 1 (Numeric values stored as Character Data Type):
Statement 1 uses a character literal – Teradata will do a PI access (no data
conversion required) to perform the comparison.
Statement 2 uses a numeric value – Teradata will do a Full Table Scan (FTS)
against the EMP1 table converting Emp_no to a numeric value and then do the
comparison.
In Case 2 (Numeric values stored as Numeric Data Type):
Statement 1 uses a numeric value – Teradata will do a PI access (no data
conversion required) to perform the comparison.
Statement 2 uses a character literal – Teradata will convert the character literal to a
numeric value, then do a PI access to perform the comparison.

Page 21-10

Access Considerations and Constraints

Storing Numeric Data
When comparing character data to numeric, Teradata will always convert
character to numeric, then do the comparison.
Comparison Rules:
• To compare columns, they must be of the same data type.
• Character data types will always be converted to numeric (when comparing character to numeric).

Bottom Line:
• Always store numeric data in numeric data types to avoid unnecessary and costly data conversions.

Case 1 (numeric values stored as a character data type):

  CREATE TABLE Emp1
    (Emp_no     CHAR(6),
     Emp_name   CHAR(20))
  UNIQUE PRIMARY INDEX (Emp_no);

  Statement 1
  SELECT  *
  FROM    Emp1
  WHERE   Emp_no = '1234';

  Statement 2
  SELECT  *
  FROM    Emp1
  WHERE   Emp_no = 1234;            Results in Full Table Scan

Case 2 (numeric values stored as a numeric data type):

  CREATE TABLE Emp2
    (Emp_no     INTEGER,
     Emp_name   CHAR(20))
  UNIQUE PRIMARY INDEX (Emp_no);

  Statement 1
  SELECT  *
  FROM    Emp2
  WHERE   Emp_no = 1234;

  Statement 2
  SELECT  *
  FROM    Emp2
  WHERE   Emp_no = '1234';          Results in unnecessary conversion

Access Considerations and Constraints

Page 21-11

Data Conversion Example
The example on the facing page illustrates how data conversion adversely affects system
performance.
You can see the results of the first EXPLAIN. Note that the total estimated time to perform
this SELECT is minimal. The system can process this request quickly because the
data type of the literal value matches the column type. A character column value
(col1) is being compared to a character literal ('8'), which allows Teradata to
use the UPI defined on col1 for access and for maximum efficiency. The query
executes as a UPI SELECT.
In the second SELECT statement, the character column value (col1) is compared with a
numeric value (8). You should notice that the total “cost” for this SELECT is
nearly 30 times the estimate for the preceding SELECT. The system must do a
Full Table Scan and convert the character values in col1 to numeric to compare
them against the numeric literal (8).

If the column was numeric and the literal value was character,
the literal would convert to numeric and the result could be hashed,
allowing UPI access.

Page 21-12

Access Considerations and Constraints

Data Conversion Example
CREATE SET TABLE TFACT01.Table1
(col1 CHAR(12) NOT NULL)
UNIQUE PRIMARY INDEX (col1);
EXPLAIN SELECT * FROM Table1 WHERE col1 = '8';
1) First, we do a single-AMP RETRIEVE step from TFACT01.Table1 by way of the unique primary index
"TFACT01.Table1.col1 = '8' " with no residual conditions. The estimated time for this step is 0.00
seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.00
seconds.

EXPLAIN SELECT * FROM Table1 WHERE col1 = 8;
1) First, we lock a distinct TFACT01."pseudo table" for read on a RowHash to prevent global deadlock
for TFACT01.Table1.
2) Next, we lock TFACT01.Table1 for read.
3) We do an all-AMPs RETRIEVE step from TFACT01.Table1 by way of an all-rows scan with a condition
of ("(TFACT01.Table1.col1 (FLOAT, FORMAT '-9.99999999999999E-999')UNICODE)=
8.00000000000000E 000") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is
estimated with no confidence to be 1,001 rows. The estimated time for this step is 0.28 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated
time is 0.28 seconds.

Access Considerations and Constraints

Page 21-13

Matching Data Types
There are a few data types that the hashing algorithm treats identically.
The best way to make sure that you don't run into this problem is to administer the data type
assignments at the Domain level. Designing a system around domains helps ensure that you
give matching Primary Indexes across tables the same data type.
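One way to observe this equivalence (a simple sketch; the literal value is arbitrary) is to compare the row hash produced for the same value under two of the data types listed on the facing page:

  SELECT  HASHROW(CAST(12345 AS INTEGER))        AS int_hash
         ,HASHROW(CAST(12345 AS DECIMAL(9,0)))   AS dec_hash;

Both expressions return the same row hash, which is why matching Primary Index columns defined with any of the equivalent data types still co-locate equal values on the same AMP.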

Page 21-14

Access Considerations and Constraints

Matching Data Types
The following data types are identical to the hashing algorithm:
BYTEINT = SMALLINT = INTEGER = BIGINT = DATE = DECIMAL (x,0)
CHAR = VARCHAR = LONG VARCHAR
BYTE = VARBYTE
GRAPHIC = VARGRAPHIC

Administer data type assignments at the domain level.
Give matching Primary Indexes across tables the same data type.

Access Considerations and Constraints

Page 21-15

Counting I/O Operations
Understanding the cost of various Teradata transactions in terms of I/O will help you avoid
unnecessary I/O overhead when doing your physical design.
Many factors can influence the number of physical I/Os in a transaction. Some are listed on
the facing page.
The main concept of the next few pages is to help you understand the relative cost of doing
INSERT, DELETE, and UPDATE operations. This understanding enables you to detect
subsequent problems when doing performance analysis on a troublesome application. When
counting I/O, it is important to remember that all such calculations give you a relative – not
the absolute – cost of the transaction. Any given I/O operation may or may not cause any
actual physical I/O.
Normally, when making a change to a table (INSERT, UPDATE, and DELETE), not
only does the actual table have to be updated, but before-images have to be written in the
Transient Journal to maintain transaction integrity. Transient Journal space is automatically
allocated and is integrated with the WAL (Write-Ahead-Logic) Log, which has its own
cylinders and file system.

Additional I/O
A table may also have Join Indexes, Hash indexes, or a Permanent Journal associated with
it. Join Indexes can also have secondary indexes.
In addition to the number of I/Os for changes to a table, these options will result in additional
I/Os.
Whenever Permanent Journaling is used, additional I/O is incurred. The amount of this I/O
varies according to whether you are using Before Imaging, After Imaging, or both, and
whether the imaging is single or dual. The table on the facing page shows how many I/O
operations are involved in writing the Permanent Journal block and the Cylinder Index.
To calculate the Total Permanent Journal I/O for PJ INSERTs, DELETEs and UPDATEs,
you apply the appropriate formula shown on the facing page.
Permanent Journal I/O is in addition to any I/O incurred during the operation itself. In order
to calculate the TOTAL I/O for an operation, you must sum the I/Os from the operation with
the Total PJ I/O corresponding to that operation.

Page 21-16

Access Considerations and Constraints

Counting I/O Operations
• Many factors influence the number of physical I/Os in a transaction:
–
–
–
–
–

Cache hits
Rows per block
Cylinder migrates
Mini-Cylpacks
Number of spool files and spool file sizes

• I/Os may be done serially or in parallel.
• Data and index block I/O may or may not require Cylinder Index I/O.
• Changes to data rows and USI rows require before-images (undo rows) and
after-images (redo rows) to be written to the WAL log.

• Logical I/O counts indicate the relative cost of a transaction.
– A given I/O operation may not cause any actual physical I/O.

• A table may also have Secondary, Join/Hash indexes, or a Permanent Journal
associated with it. Join Indexes can also have secondary indexes.

– In addition to the number of I/Os for changes to a table, these options will result
in additional I/O.

Access Considerations and Constraints

Page 21-17

Transient Journal I/O
The Transient Journal (TJ) exists to permit the successful rollback of a failed transaction.
Transactions are not committed to the database until an End Transaction request has been
received by the AMPs, either implicitly or explicitly. Until that time, there is always the
possibility that the transaction may fail in which case the participating table(s) must be
restored to their pre-transaction state.
The Transient Journal maintains a copy of all before-images of all rows affected by the
transaction. In the event of transaction failure, the before-images are reapplied to the
affected tables, the images are deleted from the journal and a rollback operation is
completed. When the transaction completes (assume successfully), at the point of
transaction commit, the before-images for the transaction are discarded from the journal.
Normally, when making a change to a table (INSERT, UPDATE, and DELETE), not only
does the actual table have to be updated, but before-images have to be written in the TJ to
maintain transaction integrity.
The preservation of the before-change row images for a transaction is the task of the
Write Ahead Logic (WAL) component of the Teradata database management software. The
system maintains a separate TJ (undo records) entry in the WAL log for each individual
database transaction, whether it runs in ANSI or Teradata session mode.

The WAL Log includes the following:
– Before-image or undo records used for transaction rollback.
– After-image or redo records for updating disk blocks and ensuring file system
  consistency during restarts, based on operations performed in cache during normal
  operation.

The WAL Log is conceptually similar to a table, but the log has a simpler structure than
a table. Log data is a sequence of WAL records, different from normal row structure and
not accessible via SQL.
When are transient journal rows actually written to the WAL log? This occurs BEFORE the
modification is made to the base table row.
Some situations where Transient Journal is not used when updating a table include:
INSERT / SELECT into an empty table
DELETE FROM tablename ALL;
Utilities such as FastLoad and MultiLoad
When a DELETE ALL is done, the master index and the cylinder indexes are updated. An
entry is actually placed in the Transient Journal indicating that a “DELETE ALL” has been
issued. Before-images of the individual deleted rows are not stored in the TJ. In the event a
node happens to fail in the middle of a DELETE ALL, the TJ is checked for the deferred
action that indicates a DELETE ALL was issued. The system checks to ensure that the
DELETE ALL has completed totally as part of the restart process.

Page 21-18

Access Considerations and Constraints

Transient Journal I/O
The Transient Journal (TJ) is …

• A journal of transaction before-images (or undo records) maintained in the WAL log.
• Provides for automatic rollback in the event of TXN failure.
• Provides “Transaction Integrity”.
• Is automatic and transparent.
• TJ images are maintained in the WAL Log. The WAL Log includes the following:
  – Before-images or undo records used for transaction rollback.
  – After-images or redo records for updating disk blocks and ensuring file system consistency
    during restarts, based on operations performed in cache (FSG) during normal operation.

Therefore, when modifying a table, there are I/O's for the data table and the WAL log (undo
and redo records).

Some situations where Transient Journal is not used include:

• INSERT / SELECT into an empty table
• DELETE tablename; (Deletes all the rows in a table)
• Utilities such as FastLoad and MultiLoad
• ALTER TABLE

Access Considerations and Constraints

Page 21-19

INSERT and DELETE Operations
To calculate the number of I/Os required to INSERT a new data row or DELETE an existing
row, it is necessary to do three subsidiary calculations. They are:
Number of I/Os required to INSERT or DELETE the row itself = five.
Number of I/Os required for each Unique Secondary Index (USI) = five.
Number of I/Os required for each Non-Unique Secondary Index (NUSI) = three.
The overall formula for counting I/Os for INSERT and DELETE operations is shown at the
bottom of the facing page. The number of I/Os must be doubled if Fallback is used.
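As a quick worked example of the formula (the index counts are chosen purely for illustration), inserting or deleting a row in a table that has one USI and two NUSIs costs:

  5 + [ 5 * 1 ] + [ 3 * 2 ] = 16 I/O operations per row, or 32 with Fallback.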

Page 21-20

Access Considerations and Constraints

INSERT and DELETE Operations
INSERT INTO tablename . . . ;          DELETE FROM tablename . . . ;          (* is an I/O operation)

DATA ROW
  *  READ Data Block
  *  WRITE Transient Journal record (UNDO row) to WAL Log
  *  INSERT or DELETE the data row, and WRITE REDO row (after-image) to WAL Log
  *  WRITE new Data Block
  *  WRITE Cylinder Index

For each USI
  *  READ USI subtable block
  *  WRITE Transient Journal record (UNDO index row) to WAL Log
  *  INSERT or DELETE the new USI subtable row, and WRITE REDO row (after-image)
     to WAL Log for the USI subtable row
  *  WRITE new USI subtable block
  *  WRITE Cylinder Index

For each NUSI
  *  READ NUSI subtable block
     ADD or DELETE the ROWID on the ROWID LIST or ADD or DELETE the NUSI subtable row
  *  WRITE new NUSI subtable block
  *  WRITE Cylinder Index

I/O operations per row = 5 + [ 5 * (#USIs) ] + [ 3 * (#NUSIs) ]
Double for FALLBACK

Access Considerations and Constraints

Page 21-21

UPDATE Operations
To calculate the number of I/Os required when updating a data column, it is necessary to
perform three subsidiary calculations. They are:
The number of I/Os required to UPDATE the column in the data row itself = five.
The number of I/Os required to change any USI subtable containing the particular
column which was updated = ten (five to remove the old subtable row and five to
add the new subtable row).
The number of I/Os required to change the subtable of any NUSI containing the
particular column which was updated = six (three to remove the old Row ID or
subtable row and three to add the new Row ID or subtable row).
The overall formula for counting I/Os for UPDATE operations is shown at the bottom of the
facing page.
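For example (index counts chosen for illustration only), updating a column that participates in one USI and one NUSI costs:

  5 + [ 10 * 1 ] + [ 6 * 1 ] = 21 I/O operations per row, or 42 with Fallback.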

 REMEMBER 
You are simply estimating the relative cost of a transaction.

Page 21-22

Access Considerations and Constraints

UPDATE Operations
UPDATE tablename SET colname = exp . . . (other than PI column)          (* = I/O Operations)

DATA ROW
  *  READ Data Block
  *  WRITE Transient Journal record (UNDO row) to WAL Log
  *  UPDATE the data row, and WRITE REDO row (after-image) to WAL Log
  *  WRITE new Data Block
  *  WRITE Cylinder Index

If colname = USI column
  *  READ current USI subtable block
  *  WRITE TJ record (UNDO row) into WAL Log
  *  DELETE USI subtable row, and WRITE REDO row (after-image) to WAL Log
  *  WRITE USI subtable block
  *  WRITE Cylinder Index
  *  READ new USI subtable block
  *  WRITE TJ record (UNDO row) into WAL Log
  *  INSERT new Index Subtable row, and WRITE REDO row (after-image) to WAL Log
  *  WRITE new USI subtable block
  *  WRITE Cylinder Index

If colname = NUSI column
  *  READ current NUSI subtable block
     REMOVE data row's RowID from RowID list or REMOVE NUSI subtable row if last RowID
  *  WRITE NUSI subtable block
  *  WRITE Cylinder Index
  *  READ new NUSI subtable block
     ADD data row's RowID to RowID list or ADD new NUSI subtable row
  *  WRITE new NUSI subtable block
  *  WRITE Cylinder Index

I/O operations per row = 5 + [ 10 * (#USIs) ] + [ 6 * (#NUSIs) ]
Double for FALLBACK

Access Considerations and Constraints

Page 21-23

Primary Index Value UPDATE
Updating the Primary Index Value is the most I/O intensive operation of all. This is due to
the fact that any change to the PI invalidates all existing secondary index “pointers.”
To calculate the number of I/Os required to UPDATE a PI column, it is necessary to
perform three subsidiary calculations:
The number of I/Os required to UPDATE the PI column in the data row itself.
The number of I/Os required to change any USI subtable
The number of I/Os required to change any NUSI subtable
Study the steps on the facing page. Notice that updating a PI value is handled much like
first deleting the row and then inserting it. Changing the PI value involves actually moving
the row to the location determined by the new hash value, so the data row work is exactly
double that of a single INSERT or DELETE. Each USI subtable row only needs to be updated
with the new RowID, and each NUSI requires changes to both the old and the new subtable
blocks.
The formula for calculating the number of I/Os involved in a PI value update (shown at the
bottom of the facing page) is:

Formula for PI Value Update  =  10 + (5 * #USIs) + (6 * #NUSIs)

Remember to double the number if Fallback is used.
Note: If the USI changes, then the number of I/O’s for each changed USI is 8 in the
preceding formula.
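As a worked example (index counts chosen for illustration only, and assuming only the PI value changes), updating the PI value of a row in a table with one USI and two NUSIs costs:

  10 + [ 5 * 1 ] + [ 6 * 2 ] = 27 I/O operations per row, or 54 with Fallback.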

Page 21-24

Access Considerations and Constraints

Primary Index Value Update
UPDATE tablename SET PI_column = new_value . . . ;          (* = I/O Operations)

Note: Assume only the PI value is changed – all Secondary Index subtable rows are updated.

DATA ROW
  **  READ current Data Block, WRITE TJ record (UNDO row) to WAL Log
  *   DELETE the Data Row, and WRITE REDO row (after-image) to WAL Log
  **  WRITE new Data Block, WRITE Cylinder Index
  **  READ new Data Block, WRITE TJ record (UNDO row) to WAL Log
  *   INSERT the DATA ROW, and WRITE REDO row (after-image) to WAL Log
  **  WRITE new Data Block, WRITE Cylinder Index

For each USI
  *  READ USI subtable block
  *  WRITE TJ record (UNDO row) into WAL Log
  *  UPDATE the USI subtable row with the new RowID, and WRITE REDO row (after-image) to WAL Log
  *  WRITE new USI subtable block
  *  WRITE Cylinder Index

For each NUSI
  *   Read NUSI subtable block on AMP for current PI value
  *   Read NUSI subtable block on AMP for new PI value
      UPDATE the RowID list for both of the subtable blocks
  **  WRITE new NUSI subtable blocks
  **  WRITE Cylinder Indexes

I/O operations per row = 10 + [ 5 * (#USIs) ] + [ 6 * (#NUSIs) ]
Double for FALLBACK

Access Considerations and Constraints

Page 21-25

Table Level Attributes
Because ANSI permits the possibility of duplicate rows in a table, a table level attribute
(SET, MULTISET) specifies whether or not to allow duplicates. Maximum data block
sizes can now be specified as part of a table creation, thus allowing for smaller or larger
blocks depending on the needs of the processing environment. Typically, decision support
applications prefer larger block sizes while on-line transaction processing applications
generally use smaller block sizes.
Additionally, a parameter may be set to allow for a particular cylinder fill factor during table
loading (FREESPACE). This factor may be set high for high subsequent file maintenance
activity, or low for relatively static tables.
The Checksum parameter (table level attribute not listed on facing page) feature improves
Teradata’s ability to detect data corruption in user data at the earliest occurrence. The
higher levels of checksums cause more sampling of data and more performance impact. The
default system value is normally NONE which has no performance impact. The
CHECKSUM is a calculated value (XOR logic) and is stored separately from the data
segment; it is stored in the Cylinder Index. This option is not as necessary with the latest
Disk Array Controllers' DAP-3 protection.
When a CHECKSUM value other than NONE is used, the data rows (in blocks) are not
updated in place. These “safe” writes ensure that the system can recover from an
interrupted write corruption error.
Options for this parameter are:
DEFAULT   Calculate (or not) checksums based on system defaults as specified with the DBS
          Control utility and the Checksum fields.

NONE      Do not calculate checksums.

LOW       Calculate checksums by sampling a low percentage of the disk block. Default is to
          sample 2% of the disk block, but this value is determined by the value in the DBS
          Control Checksum definitions.

MEDIUM    Calculate checksums by sampling a medium percentage of the disk block. Default is
          to sample 33% of the disk block, but this value is determined by the value in the
          DBS Control Checksum definitions.

HIGH      Calculate checksums by sampling a high percentage of the disk block. Default is to
          sample 67% of the disk block, but this value is determined by the value in the DBS
          Control Checksum definitions.

ALL       Calculate checksums using the entire disk block (sample 100% of the disk block to
          generate a checksum).
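A minimal sketch of specifying this option at table creation follows (the table is hypothetical); the CHECKSUM clause appears with the other table-level options:

  CREATE TABLE Sales_History, FALLBACK, CHECKSUM = MEDIUM
    (sale_id     INTEGER NOT NULL,
     sale_date   DATE)
  UNIQUE PRIMARY INDEX (sale_id);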

Page 21-26

Access Considerations and Constraints

Table Level Attributes
CREATE MULTISET TABLE Table_1, FALLBACK, DATABLOCKSIZE = 64 KBYTES,
  FREESPACE = 15, MERGEBLOCKRATIO = 60
  (column1   INTEGER   NOT NULL,
   column2   CHAR(5)   NOT NULL,
   CONSTRAINT table_constraint CHECK (column1 > 0)
  )
PRIMARY INDEX (column1)
INDEX (column2);

SET                                     Don't allow duplicate rows
MULTISET                                Allow duplicate rows (ANSI default)

DATABLOCKSIZE = BYTES or KBYTES         Maximum multi-row block size for table in:
                                          BYTES                   Rounded to nearest sector (512)
                                          KILOBYTES (or KBYTES)   Increments of 1024
MINIMUM DATABLOCKSIZE                   (7168)
MAXIMUM DATABLOCKSIZE                   (130,560)
IMMEDIATE                               May be used to immediately re-block the data with ALTER.

FREESPACE = integer [PERCENT]           Percent of freespace to keep on cylinder during load
                                        operations (0 - 75%).

DEFAULT MERGEBLOCKRATIO                 The merge block ratio to be used for this table when
MERGEBLOCKRATIO = integer [PERCENT]     Teradata combines smaller data blocks into a single
NO MERGEBLOCKRATIO                      larger data block (13.10). Typical system default is 60%.

Access Considerations and Constraints

Page 21-27

Example of Column and Table Level Constraints
Constraints can be placed at the column or the table level. Constraints may be named or
unnamed.
PRIMARY KEY

May only be defined on NOT NULL columns; guarantees
uniqueness.

UNIQUE

May only be defined on NOT NULL columns; guarantees
uniqueness.

CHECK

Allows range or value constraints to be placed on the column.

REFERENCES

Requires values to be reference-checked before being allowed.

Note: Columns with a REFERENCES constraint must refer to a column that has been
defined either with a PRIMARY KEY or UNIQUE constraint.
With Teradata, attributes and/or constraints can be assigned at the column level when the table is
created (CREATE TABLE) or altered (ALTER TABLE). Some examples of
attributes/constraints that can be implemented include:
No Nulls – e.g., NOT NULL
No duplicates – e.g., UNIQUE
Data type – e.g., INTEGER
Size – e.g., VARCHAR(30)
Check – e.g., CHECK (col2 > 0)
Default – e.g., DEFAULT CURRENT_DATE
References – e.g., REFERENCES parent(col4)

Table Level Constraints
Constraints may also be specified at the table level. This is the only way to implement
constraints that involve more than one column. Table level constraints follow all column
level definitions. As previously, constraints may be either named or unnamed.
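The following is a brief sketch of a table level constraint that spans two columns (the table is hypothetical); a constraint like this can only be expressed at the table level because it references more than one column:

  CREATE TABLE Project
    (project_id   INTEGER NOT NULL,
     start_date   DATE NOT NULL,
     end_date     DATE,
     CONSTRAINT date_order CHECK (end_date IS NULL OR end_date >= start_date)
    )
  UNIQUE PRIMARY INDEX (project_id);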

Page 21-28

Access Considerations and Constraints

Example of Column and Table Level Constraints
There are four types of constraints.
PRIMARY KEY     No Nulls, No Duplicates
UNIQUE          No Nulls, No Duplicates
CHECK           Verify values or range
REFERENCES      Relates to other columns

Constraints can be defined at the column or table level.
Notes for the following example:
• Some constraints are named, some are not.
• Some constraints are at column level, some are defined at the table level.
• The SHOW TABLE command will display this table differently for 13.0 and 13.10.
CREATE TABLE Department
  ( dept_number       INTEGER          NOT NULL CONSTRAINT primary_1 PRIMARY KEY
   ,dept_name         CHAR(20)         NOT NULL UNIQUE
   ,dept_mgr_number   INTEGER
   ,budget_amount     DECIMAL (10,2)   COMPRESS 0
   ,CONSTRAINT refer_1      FOREIGN KEY (dept_mgr_number)
                            REFERENCES Employee (employee_number)
   ,CONSTRAINT dn_gt_1000   CHECK (dept_number > 1000)
  );

Access Considerations and Constraints

Page 21-29

Example (13.0) – SHOW Department Table
The SHOW TABLE command shows a definition that is slightly altered from the original
script.
Note:
The PRIMARY KEY is implemented as a unique primary index.
The UNIQUE constraint is implemented as a unique secondary index.
The REFERENCES constraint is implemented as a FOREIGN KEY at the table level.
The CHECK constraint is implemented at the table level.
Additional notes: Since this table was created in Teradata mode, the following also applies:
The table is created as a SET table.
The character field is implemented with a NOTCASESPECIFIC attribute.
It is advisable to keep original scripts for documentation, as the original coding will
otherwise be lost.

Page 21-30

Access Considerations and Constraints

Example (13.0) – SHOW Department Table
This is an example of SHOW TABLE with Teradata 13.0.

SHOW TABLE Department;
CREATE SET TABLE PD.Department, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
dept_number INTEGER NOT NULL,
dept_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
dept_mgr_number INTEGER,
budget_amount DECIMAL(10,2) COMPRESS 0,
CONSTRAINT dn_gt_1000 CHECK ( dept_number > 1000 ),
CONSTRAINT refer_1 FOREIGN KEY ( dept_mgr_number ) REFERENCES
PD.EMPLOYEE ( EMPLOYEE_NUMBER ))
UNIQUE PRIMARY INDEX primary_1 ( dept_number )
UNIQUE INDEX ( dept_name );
Notes:
• In Teradata 13.0, the SHOW TABLE command does not show the Primary Key and Unique
constraints.
• Since Primary Key and Unique constraints are implemented as unique indexes, the Show
Table command shows these constraints as indexes.
• All constraints are specified at table level with SHOW TABLE.

Access Considerations and Constraints

Page 21-31

Example (13.10) – SHOW Department Table
An example of the same SHOW TABLE with Teradata 13.10 follows:
SHOW TABLE Department;
CREATE SET TABLE PD.Department , FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
dept_number INTEGER NOT NULL,
dept_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
dept_mgr_number INTEGER,
budget_amount DECIMAL(10,2) COMPRESS 0,
CONSTRAINT dn_gt_1000 CHECK ( dept_number > 1000 ),
CONSTRAINT primary_1 PRIMARY KEY ( dept_number ),
UNIQUE ( dept_name ),
CONSTRAINT refer_1 FOREIGN KEY ( dept_mgr_number )
REFERENCES PD.EMPLOYEE ( EMPLOYEE_NUMBER )) ;

The SHOW TABLE command again shows a definition that is slightly altered from the
original script; however the Teradata 13.10 version shows PRIMARY KEY and UNIQUE
constraints as originally specified.
Note:
The PRIMARY KEY is implemented as a unique primary index.
The UNIQUE constraint is implemented as a unique secondary index.
The REFERENCES constraint is implemented as a FOREIGN KEY at the table level.
The CHECK constraint is implemented at the table level.
Additional notes: Since this table was created in Teradata mode, the following also applies:
The table is created as a SET table.
The character field is implemented with a NOTCASESPECIFIC attribute.
As before, it is advisable to keep the original scripts for documentation, as the original
coding will otherwise be lost.

Page 21-32

Access Considerations and Constraints

Example (13.10) – SHOW Department Table
This is an example of SHOW TABLE with Teradata 13.10.

SHOW TABLE Department;
CREATE SET TABLE PD.Department, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
dept_number INTEGER NOT NULL,
dept_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
dept_mgr_number INTEGER,
budget_amount DECIMAL(10,2) COMPRESS 0,
CONSTRAINT dn_gt_1000 CHECK ( dept_number > 1000 ),
CONSTRAINT primary_1 PRIMARY KEY ( dept_number ),
UNIQUE ( dept_name ),
CONSTRAINT refer_1 FOREIGN KEY ( dept_mgr_number )
REFERENCES PD.EMPLOYEE ( EMPLOYEE_NUMBER )) ;
Notes:
• In Teradata 13.10, the SHOW TABLE command does show the Primary Key and Unique
constraints.
• As always, Primary Key and Unique constraints are implemented as unique indexes.
• All constraints are specified at table level with SHOW TABLE.

Access Considerations and Constraints

Page 21-33

Altering Table Constraints
Once a table has been created, constraints may be added, dropped and in some cases,
modified. The ALTER TABLE command can also be used to add new columns (up to
2048) to an existing table.

UNIQUE Constraints
Uniqueness constraints may also be added or dropped as needed. They may apply to one
or more columns. Columns must be defined as NOT NULL before a uniqueness constraint
may be applied to them. Uniqueness constraints are physically implemented by Teradata as
unique indexes, either primary or secondary. If the specified columns do not contain data
that is unique, the constraint will be rejected and an error will be returned.
Unique constraints may be dropped either by referencing their name, or by dropping the
index on the specified columns.
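A minimal sketch follows (the table and column are hypothetical; the column is assumed to be NOT NULL and to contain only unique values). The constraint is added by name and can later be dropped by the same name:

  ALTER TABLE Employee_Phone ADD CONSTRAINT phone_unique UNIQUE (phone_number);
  ALTER TABLE Employee_Phone DROP CONSTRAINT phone_unique;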

PRIMARY KEY Constraints
Adding a primary key constraint to a table via ALTER TABLE will always result in the
primary key being implemented as a unique secondary index (USI). This can only be done
if there has not already been a primary key defined on the table.
Dropping a primary key constraint may be done either by dropping the named constraint
or by dropping the associated index. It is not possible to drop a primary key constraint that
is implemented as a primary index.

FOREIGN KEY Constraints
Foreign key constraints may be n