SDTMIG V3.1.2 SDTM Implementation Guide

Page Count: 298

CDISC SDTM Implementation Guide (Version 3.1.2)
© 2008 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Page 1
Final November 12, 2008
Study Data Tabulation Model
Implementation Guide:
Human Clinical Trials
Prepared by the
CDISC Submission Data Standards Team
Notes to Readers
This is the implementation guide for Human Clinical Trials corresponding to Version 1.2 of the CDISC
Study Data Tabulation Model.
This Implementation Guide comprises version 3.1.2 (V3.1.2) of the CDISC Submission Data Standards
and domain models.
Revision History
Date        Version       Summary of Changes
2008-11-12  3.1.2 Final   Released version reflecting all changes and corrections identified during comment period.
2007-07-25  3.1.2 Draft   Draft for comment.
2005-08-26  3.1.1 Final   Released version reflecting all changes and corrections identified during comment period.
2004-07-14  3.1           Released version reflecting all changes and corrections identified during comment periods.
Note: Please see Appendix F for Representations and Warranties, Limitations of Liability, and Disclaimers.
CONTENTS
1 INTRODUCTION ... 7
1.1 PURPOSE ... 7
1.2 ORGANIZATION OF THIS DOCUMENT ... 7
1.3 RELATIONSHIP TO PRIOR CDISC DOCUMENTS ... 8
1.4 HOW TO READ THIS IMPLEMENTATION GUIDE ... 9
1.5 SUBMITTING COMMENTS ... 9
2 FUNDAMENTALS OF THE SDTM ... 10
2.1 OBSERVATIONS AND VARIABLES ... 10
2.2 DATASETS AND DOMAINS ... 11
2.3 SPECIAL-PURPOSE DATASETS ... 12
2.4 THE GENERAL OBSERVATION CLASSES ... 12
2.5 THE SDTM STANDARD DOMAIN MODELS ... 13
2.6 CREATING A NEW DOMAIN ... 14
3 SUBMITTING DATA IN STANDARD FORMAT ... 16
3.1 STANDARD METADATA FOR DATASET CONTENTS AND ATTRIBUTES ... 16
3.2 USING THE CDISC DOMAIN MODELS IN REGULATORY SUBMISSIONS DATASET METADATA ... 17
3.2.1.1 Primary Keys ... 19
3.2.1.2 CDISC Submission Value-Level Metadata ... 20
3.2.2 Conformance ... 20
4 ASSUMPTIONS FOR DOMAIN MODELS ... 21
4.1 GENERAL ASSUMPTIONS FOR ALL DOMAINS ... 21
4.1.1 General Domain Assumptions ... 21
4.1.1.1 Review Study Data Tabulation and Implementation Guide ... 21
4.1.1.2 Relationship to Analysis Datasets ... 21
4.1.1.3 Additional Timing Variables ... 21
4.1.1.4 Order of the Variables ... 21
4.1.1.5 CDISC Core Variables ... 21
4.1.1.6 Additional Guidance on Dataset Naming ... 22
4.1.1.7 Splitting Domains ... 22
4.1.1.8 Origin Metadata ... 25
4.1.1.9 Assigning Natural Keys in the Metadata ... 26
4.1.2 General Variable Assumptions ... 28
4.1.2.1 Variable-Naming Conventions ... 28
4.1.2.2 Two-Character Domain Identifier ... 28
4.1.2.3 Use of "Subject" and USUBJID ... 29
4.1.2.4 Case Use of Text in Submitted Data ... 29
4.1.2.5 Convention for Missing Values ... 29
4.1.2.6 Grouping Variables and Categorization ... 29
4.1.2.7 Submitting Free Text from the CRF ... 31
4.1.2.8 Multiple Values for a Variable ... 33
4.1.3 Coding and Controlled Terminology Assumptions ... 35
4.1.3.1 Types of Controlled Terminology ... 35
4.1.3.2 Controlled Terminology Text Case ... 35
4.1.3.3 Controlled Terminology Values ... 35
4.1.3.4 Use of Controlled Terminology and Arbitrary Number Codes ... 36
4.1.3.5 Storing Controlled Terminology for Synonym Qualifier Variables ... 36
4.1.3.6 Storing Topic Variables for General Domain Models ... 36
4.1.3.7 Use of "Yes" and "No" Values ... 36
4.1.4 Actual and Relative Time Assumptions ... 37
4.1.4.1 Formats for Date/Time Variables ... 37
4.1.4.2 Date/Time Precision ... 38
4.1.4.3 Intervals of Time and Use of Duration for --DUR Variables ... 39
4.1.4.4 Use of the "Study Day" Variables ... 40
4.1.4.5 Clinical Encounters and Visits ... 41
4.1.4.6 Representing Additional Study Days ... 41
4.1.4.7 Use of Relative Timing Variables ... 42
4.1.4.8 Date and Time Reported in a Domain Based on Findings ... 44
4.1.4.9 Use of Dates as Result Variables ... 44
4.1.4.10 Representing Time Points ... 44
4.1.5 Other Assumptions ... 47
4.1.5.1 Original and Standardized Results of Findings and Tests Not Done ... 47
4.1.5.2 Linking of Multiple Observations ... 50
4.1.5.3 Text Strings That Exceed the Maximum Length for General-Observation-Class Domain Variables ... 50
4.1.5.4 Evaluators in the Interventions and Events Observation Classes ... 51
4.1.5.5 Clinical Significance for Findings Observation Class Data ... 52
4.1.5.6 Supplemental Reason Variables ... 52
4.1.5.7 Presence or Absence of Pre-Specified Interventions and Events ... 52
5 MODELS FOR SPECIAL-PURPOSE DOMAINS ... 54
5.1 DEMOGRAPHICS ... 54
5.1.1 Demographics DM ... 54
5.1.1.1 Assumptions for Demographics Domain Model ... 56
5.1.1.2 Examples for Demographics Domain Model ... 57
5.2 COMMENTS ... 64
5.2.1 Comments CO ... 64
5.2.1.1 Assumptions for Comments Domain Model ... 65
5.2.1.2 Examples for Comments Domain Model ... 66
5.3 SUBJECT ELEMENTS AND VISITS ... 67
5.3.1 Subject Elements SE ... 67
5.3.1.1 Assumptions for Subject Elements Domain Model ... 68
5.3.1.2 Examples for Subject Elements Domain Model ... 70
5.3.2 Subject Visits SV ... 72
5.3.2.1 Assumptions for Subject Visits Domain Model ... 73
5.3.2.2 Examples for Subject Visits Domain Model ... 74
6 DOMAIN MODELS BASED ON THE GENERAL OBSERVATION CLASSES ... 75
6.1 INTERVENTIONS ... 75
6.1.1 Concomitant Medications CM ... 75
6.1.1.1 Assumptions for Concomitant Medications Domain Model ... 78
6.1.1.2 Examples for Concomitant Medications Domain Model ... 80
6.1.2 Exposure EX ... 82
6.1.2.1 Assumptions for Exposure Domain Model ... 84
6.1.2.2 Examples for Exposure Domain Model ... 85
6.1.3 Substance Use SU ... 89
6.1.3.1 Assumptions for Substance Use Domain Model ... 92
6.1.3.2 Example for Substance Use Domain Model ... 93
6.2 EVENTS ... 94
6.2.1 Adverse Events AE ... 94
6.2.1.1 Assumptions for Adverse Event Domain Model ... 97
6.2.1.2 Examples for Adverse Events Domain Model ... 100
6.2.2 Disposition DS ... 103
6.2.2.1 Assumptions for Disposition Domain Model ... 104
6.2.2.2 Examples for Disposition Domain Model ... 106
6.2.3 Medical History MH ... 110
6.2.3.1 Assumptions for Medical History Domain Model ... 112
6.2.3.2 Examples for Medical History Domain Model ... 114
6.2.4 Protocol Deviations DV ... 117
6.2.4.1 Assumptions for Protocol Deviations Domain Model ... 118
6.2.4.2 Examples for Protocol Deviations Domain Model ... 118
6.2.5 Clinical Events CE ... 119
6.2.5.1 Assumptions for Clinical Events Domain Model ... 121
6.2.5.2 Examples for Clinical Events Domain Model ... 122
6.3 FINDINGS ... 124
6.3.1 ECG Test Results EG ... 124
6.3.1.1 Assumptions for ECG Test Results Domain Model ... 127
6.3.1.2 Examples for ECG Test Results Domain Model ... 127
6.3.2 Inclusion/Exclusion Criteria Not Met IE ... 130
6.3.2.1 Assumptions for Inclusion/Exclusion Criteria Not Met Domain Model ... 131
6.3.2.2 Examples for Inclusion/Exclusion Not Met Domain Model ... 132
6.3.3 Laboratory Test Results LB ... 133
6.3.3.1 Assumptions for Laboratory Test Results Domain Model ... 137
6.3.3.2 Examples for Laboratory Test Results Domain Model ... 137
6.3.4 Physical Examination PE ... 140
6.3.4.1 Assumptions for Physical Examination Domain Model ... 142
6.3.4.2 Examples for Physical Examination Domain Model ... 143
6.3.5 Questionnaire QS ... 144
6.3.5.1 Assumptions for Questionnaire Domain Model ... 147
6.3.5.2 Examples for Questionnaire Domain Model ... 148
6.3.6 Subject Characteristics SC ... 150
6.3.6.1 Assumptions for Subject Characteristics Domain Model ... 151
6.3.6.2 Example for Subject Characteristics Domain Model ... 152
6.3.7 Vital Signs VS ... 153
6.3.7.1 Assumptions for Vital Signs Domain Model ... 156
6.3.7.2 Example for Vital Signs Domain Model ... 156
6.3.8 Drug Accountability DA ... 158
6.3.8.1 Assumptions for Drug Accountability Domain Model ... 159
6.3.8.2 Examples for Drug Accountability Domain Model ... 160
6.3.9 Microbiology Domains MB and MS ... 161
6.3.9.1 Microbiology Specimen (MB) Domain Model ... 161
6.3.9.2 Assumptions for Microbiology Specimen (MB) Domain Model ... 164
Microbiology Susceptibility (MS) Domain Model ... 165
6.3.9.3 Assumptions for Microbiology Susceptibility (MS) Domain Model ... 168
6.3.9.4 Examples for MB and MS Domain Models ... 169
6.3.10 Pharmacokinetics Domains PC and PP ... 172
6.3.10.1 Assumptions for Pharmacokinetic Concentrations (PC) Domain Model ... 176
6.3.10.2 Examples for Pharmacokinetic Concentrations (PC) Domain Model ... 176
6.3.10.3 Assumptions for Pharmacokinetic Parameters (PP) Domain Model ... 179
6.3.10.4 Example for Pharmacokinetic Parameters (PP) Domain Model ... 179
6.3.10.5 Relating PP Records to PC Records ... 181
6.3.10.6 Conclusions ... 193
6.3.10.7 Suggestions for Implementing RELREC in the Submission of PK Data ... 193
6.4 FINDINGS ABOUT EVENTS OR INTERVENTIONS ... 194
6.4.1 When to Use Findings About ... 194
6.4.2 Naming Findings About Domains ... 195
6.4.3 Variables Unique to Findings About ... 195
6.4.4 Findings About (FA) Domain Model ... 196
6.4.5 Assumptions for Findings About Domain Model ... 198
6.4.6 Findings About Examples ... 199
7 TRIAL DESIGN DATASETS ... 211
7.1 INTRODUCTION ... 211
7.1.1 Purpose of Trial Design Model ... 211
7.1.2 Definitions of Trial Design Concepts ... 211
7.1.3 Current and Future Contents of the Trial Design Model ... 213
7.2 TRIAL ARMS ... 214
7.2.1 Trial Arms Dataset TA ... 214
7.2.2 Assumptions for TA Dataset ... 214
7.2.3 Trial Arms Examples ... 215
7.2.3.1 Example Trial 1, a Parallel Trial ... 216
7.2.3.2 Example Trial 2, a Crossover Trial ... 219
7.2.3.3 Example Trial 3, a Trial with Multiple Branch Points ... 223
7.2.3.4 Example Trial 4, Cycles of Chemotherapy ... 226
7.2.3.5 Example Trial 5, Cycles with Different Treatment Durations ... 230
7.2.3.6 Example Trial 6, Chemotherapy Trial with Cycles of Different Lengths ... 232
7.2.3.7 Example Trial 7, Trial with Disparate Arms ... 235
7.2.4 Issues in Trial Arms Datasets ... 238
7.2.4.1 Distinguishing between Branches and Transitions ... 238
7.2.4.2 Subjects not Assigned to an Arm ... 238
7.2.4.3 Defining Epochs ... 238
7.2.4.4 Rule Variables ... 238
7.3 TRIAL ELEMENTS ... 239
7.3.1 Trial Elements Dataset TE ... 239
7.3.2 Assumptions for TE Dataset ... 240
7.3.3 Trial Elements Examples ... 241
7.3.4 Trial Elements Issues ... 242
7.3.4.1 Granularity of Trial Elements ... 242
7.3.4.2 Distinguishing Elements, Study Cells, and Epochs ... 242
7.3.4.3 Transitions between Elements ... 243
7.4 TRIAL VISITS ... 244
7.4.1 Trial Visits Dataset TV ... 244
7.4.2 Assumptions for TV Dataset ... 244
7.4.3 Trial Visits Examples ... 245
7.4.4 Trial Visits Issues ... 246
7.4.4.1 Identifying Trial Visits ... 246
7.4.4.2 Trial Visit Rules ... 246
7.4.4.3 Visit Schedules Expressed with Ranges ... 247
7.4.4.4 Contingent Visits ... 247
7.5 TRIAL INCLUSION/EXCLUSION CRITERIA ... 248
7.5.1 Trial Inclusion/Exclusion Criteria Dataset TI ... 248
7.5.2 Assumptions for TI Dataset ... 248
7.5.3 Examples for Trial Inclusion/Exclusion Dataset Model ... 249
7.6 TRIAL SUMMARY INFORMATION ... 249
7.6.1 Trial Summary Dataset TS ... 249
7.6.2 Assumptions for Trial Summary Dataset Model ... 250
7.6.3 Examples for Trial Summary Dataset Model ... 251
7.7 HOW TO MODEL THE DESIGN OF A CLINICAL TRIAL ... 254
8 REPRESENTING RELATIONSHIPS AND DATA ... 255
8.1 RELATING GROUPS OF RECORDS WITHIN A DOMAIN USING THE --GRPID VARIABLE ... 256
8.1.1 --GRPID Example ... 256
8.2 RELATING PEER RECORDS ... 257
8.2.1 RELREC Dataset ... 257
8.2.2 RELREC Dataset Examples ... 258
8.3 RELATING DATASETS ... 259
8.3.1 RELREC Dataset Relationship Example ... 259
8.4 RELATING NON-STANDARD VARIABLES VALUES TO A PARENT DOMAIN ... 260
8.4.1 Supplemental Qualifiers: SUPPQUAL or SUPP-- Datasets ... 261
8.4.2 Submitting Supplemental Qualifiers in Separate Datasets ... 262
8.4.3 SUPP-- Examples ... 262
8.4.4 When Not to Use Supplemental Qualifiers ... 264
8.5 RELATING COMMENTS TO A PARENT DOMAIN ... 265
8.6 HOW TO DETERMINE WHERE DATA BELONG IN THE SDTM ... 265
8.6.1 Guidelines for Determining the General Observation Class ... 265
8.6.2 Guidelines for Forming New Domains ... 266
8.6.3 Guidelines for Differentiating between Events, Findings, and Findings about Events ... 266
APPENDICES ... 269
APPENDIX A: CDISC SDS TEAM* ... 269
APPENDIX B: GLOSSARY AND ABBREVIATIONS ... 270
APPENDIX C: CONTROLLED TERMINOLOGY ... 271
Appendix C1: Controlled Terms or Format for SDTM Variables (see also Appendix C3: Trial Summary Codes) ... 271
Appendix C2: Reserved Domain Codes ... 274
Appendix C2a: Reserved Domain Codes under Discussion ... 277
Appendix C3: Trial Summary Codes ... 279
Appendix C4: Drug Accountability Test Codes ... 283
Appendix C5: Supplemental Qualifiers Name Codes ... 283
APPENDIX D: CDISC VARIABLE-NAMING FRAGMENTS ... 284
APPENDIX E: REVISION HISTORY ... 286
APPENDIX F: REPRESENTATIONS AND WARRANTIES, LIMITATIONS OF LIABILITY, AND DISCLAIMERS ... 298
1 Introduction
1.1 PURPOSE
This document comprises the CDISC Version 3.1.2 (V3.1.2) Study Data Tabulation Model Implementation Guide
for Human Clinical Trials (SDTMIG), which has been prepared by the Submission Data Standards (SDS) team of
the Clinical Data Interchange Standards Consortium (CDISC). Like its predecessors, V3.1.2 is intended to guide the
organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority
such as the US Food and Drug Administration (FDA). V3.1.2 supersedes all prior versions of the CDISC
Submission Data Standards.
The SDTMIG should be used in close concert with the current version of the CDISC Study Data Tabulation Model
(SDTM, available at http://www.cdisc.org/standards), which describes the general conceptual model for representing
clinical study data submitted to regulatory authorities. The SDTM should be read before the SDTMIG.
V3.1.2 provides specific domain models, assumptions, business rules, and examples for preparing standard
tabulation datasets that are based on the SDTM.
Tabulation datasets, which are electronic listings of individual observations for a subject that comprise the essential
data reported from a clinical trial, are one of four types of data currently submitted to the FDA along with patient
profiles, listings, and analysis files. By submitting tabulations that conform to the standard structure, sponsors may
benefit by no longer having to submit separate patient profiles or listings with a product marketing application.
SDTM datasets are not intended to fully meet the needs supported by analysis datasets, which will continue to be
submitted separately in addition to the tabulations. Since July 2004, the FDA has referenced use of the SDTM in the
Study Data Specifications for the Electronic Common Technical Document, available at
http://www.fda.gov/cder/regulatory/ersr/Studydata-v1.2.pdf.
The availability of standard submission data will provide many benefits to regulatory reviewers. Reviewers can be
trained in the principles of standardized datasets and the use of standard software tools, and thus be able to work
with the data more effectively with less preparation time. Another benefit of the standardized datasets is that they
will support 1) the FDA's efforts to develop a repository for all submitted trial data, and 2) a suite of standard review
tools to access, manipulate, and view the tabulations. Use of these data standards is also expected to benefit industry
by streamlining the flow of data from collection through submission, and facilitating data interchange between
partners and providers. Note that the SDTM represents an interchange standard, rather than a presentation format. It
is assumed that tabulation data will be transformed by software tools to better support viewing and analysis.
This document is intended for companies and individuals involved in the collection, preparation, and analysis of
clinical data that will be submitted to regulatory authorities.
1.2 ORGANIZATION OF THIS DOCUMENT
This document is organized into the following sections:
• Section 1, Introduction, provides an overall introduction to the V3.1.2 models and describes changes from
prior versions.
• Section 2, Fundamentals of the SDTM, recaps the basic concepts of the SDTM and describes how this
implementation guide should be used in concert with the SDTM.
• Section 3, Submitting Data in Standard Format, explains how to describe metadata for regulatory
submissions and how to assess conformance with the standards.
• Section 4, Assumptions for Domain Models, describes basic concepts, business rules, and assumptions that
should be taken into consideration before applying the domain models.
• Section 5, Models for Special-Purpose Domains, describes special-purpose domains, including
Demographics, Comments, Subject Visits, and Subject Elements.
• Section 6, Domain Models Based on the General Observation Classes, provides specific metadata models
based on the three general observation classes, along with assumptions and example data.
• Section 7, Trial Design Datasets, provides specific metadata models, assumptions, and examples.
• Section 8, Representing Relationships and Data, describes how to represent relationships between separate
domains, datasets, and/or records, and provides information to help sponsors determine where data belong in the
SDTM.
• The Appendices provide additional background material and describe other supplemental material relevant to
implementation.
1.3 RELATIONSHIP TO PRIOR CDISC DOCUMENTS
This document, together with the SDTM, represents the most recent version of the CDISC Submission Data Domain
Models. Since all updates are intended to be backward compatible, the term "V3.x" is used to refer to Version 3.1
and all subsequent versions. The most significant changes since the prior version, V3.1.1, include:
• New domain models for Clinical Events and Findings About Events and Interventions (formerly Clinical
Findings in the V3.1.2 Draft), and inclusion of previously posted domain models for Protocol Deviations, Drug
Accountability, pharmacokinetic data, and microbiology.
• Additional assumptions and rules for representing common data scenarios and naming of datasets in
Section 4, including guidance on the use of keys and representing data with multiple values for a single
question.
• Corrections and clarifications regarding the use of ISO 8601 date formats in Section 4.1.4.
• Additional guidance about how to address Findings data collected as a result of Events or Interventions,
and data submitted for pre-specified Findings and Events.
• The use of new SDTM variables (Section 6.2 of the SDTM).
• Implementation advice on the use of new timing variables, --STRTPT, --ENRTPT, --STTPT, and --ENTPT
(Section 4.1.4.7), and the new variable --OBJ (Section 6.4.3).
• Listing of Qualifier variables from the same general observation class that would not generally be used in
the standard domains.
• Several changes to the organization of the document, including the reclassification of Subject Elements
(SE) and Subject Visits (SV) as special-purpose domain datasets in Section 5 (these were formerly included
as part of Trial Design), and moving data examples from a separate section (former Section 9) to locations
immediately following each domain model in Section 5 and Section 6.
• Changes to the method for representing multiple RACE values in DM and SUPPDM, with examples.
• Removal of the Origin column from domain models based on the three general classes, since origins will need
to be defined by the sponsor in most cases. Definitions of origin metadata have been added.
A detailed list of changes between versions is provided in Appendix E.
V3.1 was the first fully implementation-ready version of the CDISC Submission Data Standards that was directly
referenced by the FDA for use in human clinical studies involving drug products. However, future improvements
and enhancements such as V3.1.2 will continue to be made as sponsors gain more experience submitting data in this
format. Therefore, CDISC will be preparing regular updates to the implementation guide to provide corrections,
clarifications, additional domain models, examples, business rules, and conventions for using the standard domain
models. CDISC will produce further documentation for controlled terminology as separate publications, so sponsors
are encouraged to check the CDISC website (www.cdisc.org/standards/) frequently for additional information. See
Section 4.1.3 for the most up-to-date information on applying Controlled Terminology.
1.4 HOW TO READ THIS IMPLEMENTATION GUIDE
This SDTM Implementation Guide (SDTMIG) is best read online, so the reader can benefit from the many
hyperlinks included to both internal and external references. The following guidelines may be helpful in reading this
document:
1. First, read the SDTM to gain a general understanding of SDTM concepts.
2. Next, read Sections 1-3 of this document to review the key concepts for preparing domains and submitting
data to regulatory authorities. Refer to the Glossary in Appendix B as necessary.
3. Read the General Assumptions for all Domains in Section 4.
4. Review Section 5 and Section 6 in detail, referring back to Assumptions as directed (hyperlinks are
provided). Note the implementation examples for each domain to gain an understanding of how to apply
the domain models for specific types of data.
5. Read Section 7 to understand the fundamentals of the Trial Design Model and consider how to apply the
concepts for typical protocols. New extensions to the trial design model will be published separately on the
CDISC website.
6. Review Section 8 to learn advanced concepts of how to express relationships between datasets, records, and
additional variables not specifically defined in the models.
7. Finally, review the Appendices as appropriate.
1.5 SUBMITTING COMMENTS
Comments on this document can be submitted through the CDISC Discussion Board.
2 Fundamentals of the SDTM
2.1 OBSERVATIONS AND VARIABLES
The V3.x Submission Data Standards are based on the SDTM's general framework for organizing clinical trials
information that is to be submitted to the FDA. The SDTM is built around the concept of observations collected
about subjects who participated in a clinical study. Each observation, corresponding to a row in a dataset or table,
can be described by a series of variables. Each variable can be classified according to its Role. A Role determines
the type of information conveyed by the variable about each distinct observation and how it can be used. Variables
can be classified into five major roles:
• Identifier variables, such as those that identify the study, subject, domain, and sequence number of the record
• Topic variables, which specify the focus of the observation (such as the name of a lab test)
• Timing variables, which describe the timing of the observation (such as start date and end date)
• Qualifier variables, which include additional illustrative text or numeric values that describe the results or
additional traits of the observation (such as units or descriptive adjectives)
• Rule variables, which express an algorithm or executable method to define start, end, and branching or looping
conditions in the Trial Design model
The set of Qualifier variables can be further categorized into five sub-classes:
• Grouping Qualifiers are used to group together a collection of observations within the same domain. Examples
include --CAT and --SCAT.
• Result Qualifiers describe the specific results associated with the topic variable in a Findings dataset. They
answer the question raised by the topic variable. Result Qualifiers are --ORRES, --STRESC, and --STRESN.
• Synonym Qualifiers specify an alternative name for a particular variable in an observation. Examples include
--MODIFY and --DECOD, which are equivalent terms for a --TRT or --TERM topic variable, and --TEST and
--LOINC, which are equivalent terms for a --TESTCD.
• Record Qualifiers define additional attributes of the observation record as a whole (rather than describing a
particular variable within a record). Examples include --REASND, AESLIFE, and all other SAE flag variables
in the AE domain; AGE, SEX, and RACE in the DM domain; and --BLFL, --POS, --LOC, --SPEC, and --NAM
in a Findings domain.
• Variable Qualifiers are used to further modify or describe a specific variable within an observation and are only
meaningful in the context of the variable they qualify. Examples include --ORRESU, --ORNRHI, and
--ORNRLO, all of which are Variable Qualifiers of --ORRES; and --DOSU, which is a Variable Qualifier of
--DOSE.
For example, in the observation "Subject 101 had mild nausea starting on Study Day 6," the Topic variable value is
the term for the adverse event, "NAUSEA". The Identifier variable is the subject identifier, "101". The Timing
variable is the study day of the start of the event, which captures the information "starting on Study Day 6", while
an example of a Record Qualifier is the severity, the value for which is "MILD". Additional Timing and Qualifier
variables could be included to provide the necessary detail to adequately describe an observation.
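The role breakdown in this example can be sketched as a small data structure. This is a minimal Python illustration, not part of the standard. The `VariableValue` class and `by_role` helper are hypothetical. The AE variable names USUBJID, AETERM, AESTDY, and AESEV serve only to label the four values in the example sentence.

```python
from dataclasses import dataclass

# The five roles of Section 2.1 (Rule variables occur only in the Trial Design model).
ROLES = {"Identifier", "Topic", "Timing", "Qualifier", "Rule"}


@dataclass(frozen=True)
class VariableValue:
    name: str      # SDTM variable name
    role: str      # one of ROLES
    value: object  # the collected or derived value


# "Subject 101 had mild nausea starting on Study Day 6" as one AE observation.
# Illustrative only: a real AE record also carries STUDYID, DOMAIN, AESEQ, etc.
nausea_record = [
    VariableValue("USUBJID", "Identifier", "101"),
    VariableValue("AETERM", "Topic", "NAUSEA"),
    VariableValue("AESTDY", "Timing", 6),
    VariableValue("AESEV", "Qualifier", "MILD"),  # a Record Qualifier
]


def by_role(record, role):
    """Return {name: value} for the variables in `record` that play `role`."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return {v.name: v.value for v in record if v.role == role}
```

For example, `by_role(nausea_record, "Topic")` returns `{"AETERM": "NAUSEA"}`.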
2.2 DATASETS AND DOMAINS
Observations about study subjects are normally collected for all subjects in a series of domains. A domain is defined
as a collection of logically related observations with a common topic. The logic of the relationship may pertain to the
scientific subject matter of the data or to its role in the trial. Each domain is represented by a single dataset.
Each domain dataset is distinguished by a unique, two-character code that should be used consistently throughout
the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in four ways: as the
dataset name, as the value of the DOMAIN variable in that dataset, as a prefix for most variable names in that dataset,
and as a value in the RDOMAIN variable in relationship tables (Section 8).
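Two of these uses lend themselves to a minimal Python sketch. The sketch assumes that a domain code is two uppercase letters, as is true of every code listed in this guide. The helper names `apply_domain_prefix` and `check_domain_column` are hypothetical. The first replaces the generic `--` prefix used in the models with the domain code. The second confirms that the DOMAIN values in a dataset match the dataset name.

```python
import re


def apply_domain_prefix(domain: str, variables: list[str]) -> list[str]:
    """Replace the generic '--' prefix in model variable names with `domain`."""
    # Assumption: domain codes are two uppercase letters, as in this guide.
    if not re.fullmatch(r"[A-Z]{2}", domain):
        raise ValueError(f"not a two-letter domain code: {domain!r}")
    return [domain + v[2:] if v.startswith("--") else v for v in variables]


def check_domain_column(dataset_name: str, domain_values: list[str]) -> bool:
    """True if every DOMAIN value equals the (upper-cased) dataset name."""
    expected = dataset_name.upper()
    return all(value == expected for value in domain_values)
```

For example, `apply_domain_prefix("VS", ["STUDYID", "DOMAIN", "--SEQ", "--TESTCD"])` returns `["STUDYID", "DOMAIN", "VSSEQ", "VSTESTCD"]`. Identifiers such as STUDYID carry no `--` prefix in the models, which is why only some names change.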
All datasets are structured as flat files with rows representing observations and columns representing variables. Each
dataset is described by metadata definitions that provide information about the variables used in the dataset. The
metadata are described in a data definition document named "define" that is submitted with the data to regulatory
authorities. (See the Case Report Tabulation Data Definition Specification [define.xml], available at www.CDISC.org.)
Define.xml specifies seven distinct metadata attributes to describe SDTM data:
• The Variable Name (limited to 8 characters for compatibility with the SAS Transport format)
• A descriptive Variable Label, using up to 40 characters, which should be unique for each variable in the dataset
• The data Type (e.g., whether the variable value is a character or numeric)
• The set of controlled terminology for the value or the presentation format of the variable (Controlled Terms or Format)
• The Origin of each variable (see Section 4.1.1.8)
• The Role of the variable, which determines how the variable is used in the dataset. For the V3.x domain models,
Roles are used to represent the categories of variables such as Identifier, Topic, Timing, or the five types of
Qualifiers.
• Comments or other relevant information about the variable or its data included by the sponsor as necessary to
communicate information about the variable or its contents to a regulatory agency.
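The seven attributes can be held in one record per variable. The sketch below is hypothetical Python, not the define.xml schema itself. The `VariableMetadata` field names are invented for illustration. The `metadata_problems` helper checks only three rules stated above: the 8-character name limit, the 40-character label limit, and label uniqueness.

```python
from dataclasses import dataclass


@dataclass
class VariableMetadata:
    """Hypothetical record holding the seven define.xml attributes of one variable."""
    name: str              # Variable Name, at most 8 characters
    label: str             # Variable Label, at most 40 characters, unique in the dataset
    data_type: str         # Type, e.g. character or numeric
    controlled_terms: str  # Controlled Terms or Format
    origin: str            # Origin
    role: str              # Role, e.g. "Topic"
    comment: str = ""      # Comments


def metadata_problems(variables: list[VariableMetadata]) -> list[str]:
    """List violations of the name-length, label-length, and label-uniqueness rules."""
    problems = []
    seen_labels = set()
    for var in variables:
        if len(var.name) > 8:
            problems.append(f"{var.name}: name longer than 8 characters")
        if len(var.label) > 40:
            problems.append(f"{var.name}: label longer than 40 characters")
        if var.label in seen_labels:
            problems.append(f"{var.name}: duplicate label {var.label!r}")
        seen_labels.add(var.label)
    return problems
```

An empty list means the three checks passed. Other define.xml requirements are outside the scope of this sketch.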
Data stored in SDTM datasets include both raw (as originally collected) and derived values (e.g., converted into
standard units, or computed on the basis of multiple values, such as an average). The SDTM lists only the name,
label, and type, with a set of brief CDISC guidelines that provide a general description for each variable used for a
general observation class.
The domain dataset models included in Section 5 and Section 6 of this document provide additional information
about Controlled Terms or Format, notes on proper usage, and examples. Controlled terminology (CT) is now
represented in one of four ways:
• A single asterisk when there is no specific CT available at the current time, but the SDS Team expects that sponsors
may have their own CT and/or the CDISC Controlled Terminology Team may be developing CT.
• A list of controlled terms for the variable when values are not yet maintained externally.
• The name of an external codelist whose values can be found via the hyperlinks in either the domain or Appendix C1.
• A common format such as ISO 8601.
The CDISC Controlled Terminology team will be publishing additional guidance on use of controlled terminology
separately.
2.3 SPECIAL-PURPOSE DATASETS
The SDTM includes three types of special-purpose datasets:
• Domain datasets, consisting of Demographics (DM), Comments (CO), Subject Elements (SE), and Subject
Visits (SV)¹, all of which include subject-level data that do not conform to one of the three general
observation classes. These are described in Section 5.
• Trial Design Model (TDM) datasets, such as Trial Arms (TA) and Trial Elements (TE), which represent
information about the study design but do not contain subject data. These are described in Section 7.
• Relationship datasets, which include the RELREC and SUPP-- datasets described in Section 8.
2.4 THE GENERAL OBSERVATION CLASSES
Most subject-level observations collected during the study should be represented according to one of the three
SDTM general observation classes: Interventions, Events, or Findings. The lists of variables allowed to be used in
each of these can be found in the SDTM.
• The Interventions class captures investigational, therapeutic and other treatments that are administered to the
subject (with some actual or expected physiological effect) either as specified by the study protocol (e.g.,
exposure to study drug), coincident with the study assessment period (e.g., concomitant medications), or
self-administered by the subject (such as use of alcohol, tobacco, or caffeine).
• The Events class captures planned protocol milestones such as randomization and study completion, and
occurrences, conditions, or incidents independent of planned study evaluations occurring during the trial (e.g.,
adverse events) or prior to the trial (e.g., medical history).
• The Findings class captures the observations resulting from planned evaluations to address specific tests or
questions such as laboratory tests, ECG testing, and questions listed on questionnaires.
In most cases, the choice of observation class appropriate to a specific collection of data can be easily determined
according to the descriptions provided above. The majority of data, which typically consists of measurements or
responses to questions usually at specific visits or time points, will fit the Findings general observation class.
Additional guidance on choosing the appropriate general observation class is provided in Section 8.6.1.
General assumptions for use with all domain models and custom domains based on the general observation classes
are described in Section 4 of this document; specific assumptions for individual domains are included with the
domain models.
1 SE and SV were included as part of the Trial Design Model in earlier versions of the SDTMIG.
2.5 THE SDTM STANDARD DOMAIN MODELS
The following standard domains with their respective domain codes have been defined or referenced by the CDISC SDS
Team in this document. Note that other domain models may be posted separately for comment after this publication.
Special-Purpose Domains (defined in Section 5):
• Demographics (DM)
• Comments (CO)
• Subject Elements (SE)
• Subject Visits (SV)
Interventions General Observation Class (defined in Section 6.1):
• Concomitant Medications (CM)
• Exposure (EX)
• Substance Use (SU)
Events General Observation Class (defined in Section 6.2):
• Adverse Events (AE)
• Disposition (DS)
• Medical History (MH)
• Protocol Deviations (DV)
• Clinical Events (CE)
Findings General Observation Class (defined in Section 6.3):
• ECG Test Results (EG)
• Inclusion/Exclusion Criterion Not Met (IE)
• Laboratory Test Results (LB)
• Physical Examination (PE)
• Questionnaires (QS)
• Subject Characteristics (SC)
• Vital Signs (VS)
• Drug Accountability (DA)
• Microbiology Specimen (MB)
• Microbiology Susceptibility Test (MS)
• PK Concentrations (PC)
• PK Parameters (PP)
Findings About (defined in Section 6.4):
• Findings About (FA)
Trial Design Domains (defined in Section 7):
• Trial Arms (TA)
• Trial Elements (TE)
• Trial Visits (TV)
• Trial Inclusion/Exclusion Criteria (TI)
• Trial Summary (TS)
Relationship Datasets (defined in Section 8):
• Supplemental Qualifiers (SUPPQUAL or multiple SUPP-- datasets)
• Related Records (RELREC)
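For tooling, the domain list can be captured as a lookup table. This Python sketch is illustrative only. The class labels mirror the group headings of the list, and `domain_class` is a hypothetical helper. Treating any code that begins with SUPP as a Supplemental Qualifiers dataset is an assumption based on the SUPP-- naming pattern.

```python
# Standard domain codes grouped as in the Section 2.5 list.
DOMAIN_CLASS = {
    **dict.fromkeys(["DM", "CO", "SE", "SV"], "Special-Purpose"),
    **dict.fromkeys(["CM", "EX", "SU"], "Interventions"),
    **dict.fromkeys(["AE", "DS", "MH", "DV", "CE"], "Events"),
    **dict.fromkeys(["EG", "IE", "LB", "PE", "QS", "SC", "VS",
                     "DA", "MB", "MS", "PC", "PP"], "Findings"),
    "FA": "Findings About",
    **dict.fromkeys(["TA", "TE", "TV", "TI", "TS"], "Trial Design"),
    "RELREC": "Relationship",
}


def domain_class(code: str) -> str:
    """Return the group a dataset code belongs to, per the Section 2.5 list."""
    code = code.upper()
    # Assumption: SUPPQUAL and SUPP-- datasets (e.g. SUPPAE) share the SUPP prefix.
    if code.startswith("SUPP"):
        return "Relationship"
    return DOMAIN_CLASS.get(code, "Custom or not listed")
```

A code that is not found, such as a sponsor-defined custom domain, falls through to "Custom or not listed". It should not be treated as an error.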
A sponsor should only submit domain datasets that were actually collected (or directly derived from the collected
data) for a given study. Decisions on what data to collect should be based on the scientific objectives of the study,
rather than the SDTM. Note that any data that was collected and will be submitted in an analysis dataset must also
appear in a tabulation dataset.
The collected data for a given study may use some or all of the SDS standard domains as well as additional custom
domains based on the three general observation classes. A list of standard domain codes for many commonly used
domains is provided in . Additional standard domain models will be published by CDISC as they are developed, and
sponsors are encouraged to check the CDISC website for updates.
These general rules apply when determining which variables to include in a domain:
• The Identifier variables STUDYID, USUBJID, DOMAIN, and --SEQ are required in all domains based on the
general observation classes. Other Identifiers may be added as needed.
• Any Timing variables are permissible for use in any submission dataset based on a general observation class
except where restricted by specific domain assumptions.
• Any additional Qualifier variables from the same general observation class may be added to a domain model
except where restricted by specific domain assumptions.
• Sponsors may not add any variables other than those described in the preceding three bullets. The addition of
non-standard variables will compromise the FDA's ability to populate the data repository and to use standard
tools. The SDTM allows for the inclusion of the sponsor's non-SDTM variables using the Supplemental
Qualifiers special-purpose dataset structure, described in Section 8.4. As the SDTM continues to evolve over
time, certain additional standard variables may be added to the general observation classes. Sponsors
wishing to nominate such variables for future consideration should provide a rationale and description of the
proposed variable(s) along with representative examples to the CDISC Public Discussion Forum.
• Standard variables must not be renamed or modified for novel usage. Their metadata should not be changed.
• As long as no data were collected for Permissible variables, a sponsor is free to drop them and the corresponding
descriptions from define.xml.
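The inclusion rules can be checked mechanically once the permitted variables of the chosen class are known. The following Python sketch is hypothetical. The caller supplies the class's Topic and Qualifier variables and the Timing variables in generic `--` form, for example copied from the SDTM tables. The sketch does not check domain-specific restrictions from the assumptions.

```python
def check_domain_variables(domain, variables, class_variables,
                           timing_variables, extra_identifiers=()):
    """Apply the Section 2.5 inclusion rules to one domain's variable list.

    class_variables: Topic and Qualifier variables of the domain's general
    observation class, in generic '--' form (supplied by the caller).
    timing_variables / extra_identifiers: also supplied in '--' form.
    Returns (missing required identifiers, variables not permitted).
    """
    def resolve(names):
        # Replace the generic '--' prefix with the domain code.
        return {domain + n[2:] if n.startswith("--") else n for n in names}

    required = resolve(["STUDYID", "USUBJID", "DOMAIN", "--SEQ"])
    allowed = (required | resolve(extra_identifiers)
               | resolve(class_variables) | resolve(timing_variables))
    present = set(variables)
    return sorted(required - present), sorted(present - allowed)
```

A non-empty second list flags variables that, under the rules above, would need to be represented through the Supplemental Qualifiers structure rather than added to the domain.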
2.6 CREATING A NEW DOMAIN
This section describes the overall process for creating a custom domain, which must be based on one of the three
SDTM general observation classes. The number of domains submitted should be based on the specific requirements
of the study. Follow the process below to create a custom domain:
1. Confirm that none of the existing published domains will fit the need. A custom domain may only be
created if the data are different in nature and do not fit into an existing published domain.
• Establish a domain based on a common topic (i.e., where the nature of the data is the same), rather than on
a specific method of collection (e.g., electrocardiogram, EG). Group and separate data within the
domain using --CAT, --SCAT, --METHOD, --SPEC, --LOC, etc., as appropriate. Examples of
different topics are microbiology, tumor measurements, pathology/histology, vital signs, and
physical exam results.
• Do not create separate domains based on time; rather, represent both prior and current observations
in a domain (e.g., CM for all non-study medications). Note that AE and MH are an exception to this
best practice because of regulatory reporting needs.
• How collected data are used (e.g., to support analyses and/or efficacy endpoints) must not result in
the creation of a custom domain. For example, if blood pressure measurements are endpoints in a
hypertension study, they must still be represented in the VS (Vital Signs) domain as opposed to a
custom "efficacy" domain. Similarly, if liver function test results are of special interest, they must
still be represented in the LB (Laboratory Tests) domain.
• Data that were collected on separate CRF modules or pages may fit into an existing domain (such as
separate questionnaires into the QS domain, or prior and concomitant medications in the CM domain).
• If it is necessary to represent relationships between data that are hierarchical in nature (e.g., a parent
record must be observed before child records), then establish a domain pair (e.g., MB/MS, PC/PP).
Note that domain pairs have been modeled for microbiology data (MB/MS domains) and PK data
(PC/PP domains) to enable dataset-level relationships to be described using RELREC. The domain
pair uses DOMAIN as an Identifier to separate parent records (e.g., MB) from child records (e.g., MS)
and enables a dataset-level relationship to be described in RELREC. Without using DOMAIN to
facilitate description of the data relationships, RELREC, as currently defined, could not be used
without introducing a variable that would group data like DOMAIN.
2. Check the Submission Data Standards area of the CDISC website (http://www.cdisc.org/) for models added
after the last publication of the SDTMIG.
3. Look for an existing, relevant domain model to serve as a prototype. If no existing model seems
appropriate, choose the general observation class (Interventions, Events, or Findings) that best fits the data
by considering the topic of the observation. The general approach for selecting variables for a custom
domain is as follows (also see Figure 2.6 below):
a. Select and include the required Identifier variables (e.g., STUDYID, DOMAIN, USUBJID, --SEQ)
and any permissible Identifier variables from SDTM Table 2.2.4.
b. Include the Topic variable from the identified general observation class (e.g., --TESTCD for
Findings) (SDTM Table 2.2.1, SDTM Table 2.2.2, or SDTM Table 2.2.3).
c. Select and include the relevant Qualifier variables from the identified general observation class
(SDTM Table 2.2.1, SDTM Table 2.2.2, or SDTM Table 2.2.3). Variables belonging to other general
observation classes must not be added.
d. Select and include the applicable Timing variables (SDTM Table 2.2.5). Determine the domain
code. Check Appendix C2 and Appendix C2A for reserved two-character domain identifiers or
abbreviations. If one has not been assigned by CDISC, then the sponsor may select the unique
two-character domain code to be used consistently throughout the submission.
e. Apply the two-character domain code to the appropriate variables in the domain. Replace all
variable prefixes (shown in the models as two hyphens "--") with the domain code. If no domain
code exists in Appendix C2 or Appendix C2A for these data, and if it is desired to have this domain code
as part of CDISC controlled terminology, then submit a request to add the new domain via the
CDISC website. Requests for new domain codes must include:
1) Two-letter domain code and description
2) Rationale for domain code
3) Domain model with assumptions
4) Examples
Upon receipt, the SDS Domain Code Subteam will review the package. If accepted, then the
proposal will be submitted to the SDS Team for review. Upon approval, a response will be sent to
the requestor and package processing will begin (i.e., preparing for inclusion in the next release of the
SDTM and SDTMIG, mapping concepts to BRIDG, and posting an update to the CDISC website). If
declined, then the Domain Code Subteam will draft a response for SDS Team review. Upon
agreement, the response will be sent to the requestor and also posted to the CDISC website.
f. Set the order of variables consistent with the order defined in SDTM Tables 2.2.1, 2.2.2, or 2.2.3,
depending upon the general observation class the custom domain is based on.
g. Adjust the labels of the variables only as appropriate to properly convey the meaning in the context
of the data being submitted in the newly created domain. Use title case for all labels (title case
means to capitalize the first letter of every word except for articles, prepositions, and conjunctions).
h. Ensure that appropriate standard variables are being properly applied by comparing the use of
variables in standard domains.
i. Describe the dataset within the define.xml document (see Section 3.2).
j. Place any non-standard (SDTM) variables in a Supplemental Qualifier dataset. Mechanisms for
representing additional non-standard Qualifier variables not described in the general observation
classes and for defining relationships between separate datasets or records are described in Section 8.4
of this document.
Figure 2.6. Creating a New Domain
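Steps e and f lend themselves to a small sketch. The Python function below is hypothetical and not part of the SDTMIG: it assumes a domain code is an uppercase letter followed by an uppercase letter or digit, and it enforces the eight-character variable-name limit of the SAS V5 transport format (an assumption drawn from that file format, not a rule stated in this section). The custom domain code XR and the template variable list are illustrative only.

```python
import re

# Hypothetical helper: substitute a domain code for the "--" placeholder
# used in the general observation class tables.
NAME_LIMIT = 8  # assumption: SAS V5 transport variable names are at most 8 characters


def apply_domain_prefix(template_vars, domain):
    # Assumption: an uppercase letter followed by an uppercase letter or digit.
    if not re.fullmatch(r"[A-Z][A-Z0-9]", domain):
        raise ValueError(f"not a two-character domain code: {domain!r}")
    named = []
    for var in template_vars:
        # Only "--" variables take the prefix; STUDYID, USUBJID, etc. pass through.
        name = domain + var[2:] if var.startswith("--") else var
        if len(name) > NAME_LIMIT:
            raise ValueError(f"{name} exceeds {NAME_LIMIT} characters")
        named.append(name)
    return named


# Findings-class template for a hypothetical custom domain "XR".
print(apply_domain_prefix(
    ["STUDYID", "DOMAIN", "USUBJID", "--SEQ", "--TESTCD", "--TEST", "--ORRES", "--DTC"],
    "XR",
))
# ['STUDYID', 'DOMAIN', 'USUBJID', 'XRSEQ', 'XRTESTCD', 'XRTEST', 'XRORRES', 'XRDTC']
```

Checking the name length at the same time as the prefix substitution catches template variables that would become too long once a two-character code replaces the two hyphens.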
3 Submitting Data in Standard Format
3.1 STANDARD METADATA FOR DATASET CONTENTS AND ATTRIBUTES
The SDTMIG provides standard descriptions of some of the most commonly used data domains, with metadata
attributes. The descriptive metadata attributes that should be included in a define.xml as applied in the domain
models are:
• The SDTMIG-standard variable name (standardized for all submissions, even though sponsors may be
using other variable names internally in their operational database)
• The SDTMIG-standard variable label
• Expected data types (the SDTMIG uses character or numeric to conform to the data types consistent with
SAS V5 transport file format, but define.xml allows for more descriptive data types, such as integer or
float)
• The actual controlled terms and formats used by the sponsor (do not include the asterisk (*) included in the
CDISC domain models to indicate when controlled terminology applies)
• The origin or source of the data (e.g., CRF, derived; see definitions in Section 4.1.1.8)
• The role of the variable in the dataset corresponding to the role in the SDTM, if desired. Since these roles
are predefined for all standard domains that follow the general observation classes, they do not need to be
specified by sponsors in their define.xml for these domains.
• Any comments provided by the sponsor that may be useful to the Reviewer in understanding the variable
or the data in it.
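One way to keep these attributes together while preparing define.xml is a simple record type. The sketch below is a hypothetical Python container, not part of define.xml or any CDISC tool. Its checks assume the SAS V5 transport limits of eight-character variable names and 40-character labels, use "Char" and "Num" as stand-in spellings for character and numeric, and flag the asterisk that should not be copied from the domain models.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMetadata:
    """Hypothetical record holding the define.xml attributes listed above."""
    name: str
    label: str
    data_type: str                 # assumption: "Char" or "Num" (SAS V5 terms)
    origin: str                    # e.g., "CRF", "Derived"
    controlled_terms: list = field(default_factory=list)
    role: str = ""                 # optional for standard domains
    comment: str = ""

    def problems(self):
        """Return a list of rule violations (empty when none are found)."""
        issues = []
        if len(self.name) > 8:
            issues.append("name longer than 8 characters")
        if len(self.label) > 40:
            issues.append("label longer than 40 characters")
        if self.data_type not in ("Char", "Num"):
            issues.append("data type is neither Char nor Num")
        if any("*" in term for term in self.controlled_terms):
            issues.append("controlled terms contain the model's asterisk")
        return issues


ok = VariableMetadata("VSTESTCD", "Vital Signs Test Short Name", "Char", "CRF",
                      ["SYSBP", "DIABP"])
print(ok.problems())  # []
```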
In addition to these metadata attributes, the CDISC domain models include three other shaded columns that are not
sent to the FDA. These columns assist sponsors in preparing their datasets:
• "CDISC Notes" provides notes to the sponsor regarding the use of each variable.
• "Core" indicates how a variable is classified as a CDISC Core Variable (see Section 4.1.1.5).
• "References" provides references to relevant sections of the SDTM or the SDTMIG.
The domain models in Section 6 illustrate how to apply the SDTM when creating a specific domain dataset. In
particular, these models illustrate the selection of a subset of the variables offered in one of the general observation
classes along with applicable timing variables. The models also show how a standard variable from a general
observation class should be adjusted to meet the specific content needs of a particular domain, including making the
label more meaningful, specifying controlled terminology, and creating domain-specific notes and examples. Thus
the domain models demonstrate not only how to apply the model for the most common domains, but also give
insight on how to apply general model concepts to other domains not yet defined by CDISC.
3.2 USING THE CDISC DOMAIN MODELS IN REGULATORY SUBMISSIONS: DATASET METADATA
The define.xml that accompanies a submission should also describe each dataset that is included in the submission
and describe the natural key structure of each dataset. While most studies will include DM and a set of safety
domains based on the three general observation classes (typically including EX, CM, AE, DS, MH, IE, LB, and VS),
the actual choice of which data to submit will depend on the protocol and the needs of the regulatory reviewer.
Dataset definition metadata should include dataset filenames, descriptions, locations, structures, class, purpose, keys,
and comments as described below in Table 3.2.1.
In the event that no records are present in a dataset (e.g., a small PK study where no subjects took concomitant
medications), the empty dataset should not be submitted and should not be described in the define.xml document.
The annotated CRF will show the data that would have been submitted had data been received; it need not be re-
annotated to indicate that no records exist.
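The rule for empty datasets can be applied when assembling the dataset list. This is a minimal sketch assuming each dataset is held as a Python list of records keyed by domain code; the function name and the sample study are hypothetical, and file locations simply follow the lowercase-code-plus-.xpt pattern used in Table 3.2.1.

```python
def dataset_definitions(datasets):
    """List (dataset, location) pairs for datasets that have at least one record.

    `datasets` maps a domain code to its records (a hypothetical layout);
    empty datasets are skipped because they are neither submitted nor
    described in define.xml.
    """
    return [(code, code.lower() + ".xpt") for code, rows in datasets.items() if rows]


study = {
    "DM": [{"USUBJID": "PK01.001"}],
    "CM": [],                       # no subject took concomitant medications
    "PC": [{"USUBJID": "PK01.001"}],
}
print(dataset_definitions(study))  # [('DM', 'dm.xpt'), ('PC', 'pc.xpt')]
```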
Table 3.2.1. SDTM Submission Dataset-Definition Metadata Example
Dataset | Description | Class | Structure | Purpose | Keys* | Location
DM | Demographics | Special Purpose Domains | One record per subject | Tabulation | STUDYID, USUBJID | dm.xpt
CO | Comments | Special Purpose Domains | One record per comment per subject | Tabulation | STUDYID, USUBJID, COSEQ | co.xpt
SE | Subject Elements | Special Purpose Domains | One record per actual Element per subject | Tabulation | STUDYID, USUBJID, ETCD, SESTDTC | se.xpt
SV | Subject Visits | Special Purpose Domains | One record per actual visit per subject | Tabulation | STUDYID, USUBJID, VISITNUM | sv.xpt
CM | Concomitant Medications | Interventions | One record per recorded medication occurrence or constant-dosing interval per subject | Tabulation | STUDYID, USUBJID, CMTRT, CMSTDTC | cm.xpt
EX | Exposure | Interventions | One record per constant dosing interval per subject | Tabulation | STUDYID, USUBJID, EXTRT, EXSTDTC | ex.xpt
SU | Substance Use | Interventions | One record per substance type per reported occurrence per subject | Tabulation | STUDYID, USUBJID, SUTRT, SUSTDTC | su.xpt
AE | Adverse Events | Events | One record per adverse event per subject | Tabulation | STUDYID, USUBJID, AEDECOD, AESTDTC | ae.xpt
DS | Disposition | Events | One record per disposition status or protocol milestone per subject | Tabulation | STUDYID, USUBJID, DSDECOD, DSSTDTC | ds.xpt
MH | Medical History | Events | One record per medical history event per subject | Tabulation | STUDYID, USUBJID, MHDECOD | mh.xpt
DV | Protocol Deviations | Events | One record per protocol deviation per subject | Tabulation | STUDYID, USUBJID, DVTERM, DVSTDTC | dv.xpt
CE | Clinical Events | Events | One record per event per subject | Tabulation | STUDYID, USUBJID, CETERM, CESTDTC | ce.xpt
EG | ECG Test Results | Findings | One record per ECG observation per time point per visit per subject | Tabulation | STUDYID, USUBJID, EGTESTCD, VISITNUM, EGTPTREF, EGTPTNUM | eg.xpt
IE | Inclusion/Exclusion Criteria Not Met | Findings | One record per inclusion/exclusion criterion not met per subject | Tabulation | STUDYID, USUBJID, IETESTCD | ie.xpt
LB | Laboratory Tests Results | Findings | One record per analyte per planned time point number per time point reference per visit per subject | Tabulation | STUDYID, USUBJID, LBTESTCD, LBSPEC, VISITNUM, LBTPTREF, LBTPTNUM | lb.xpt
PE | Physical Examination | Findings | One record per body system or abnormality per visit per subject | Tabulation | STUDYID, USUBJID, PETESTCD, VISITNUM | pe.xpt
QS | Questionnaires | Findings | One record per questionnaire per question per time point per visit per subject | Tabulation | STUDYID, USUBJID, QSCAT, QSTESTCD, VISITNUM, QSTPTREF, QSTPTNUM | qs.xpt
SC | Subject Characteristics | Findings | One record per characteristic per subject | Tabulation | STUDYID, USUBJID, SCTESTCD | sc.xpt
VS | Vital Signs | Findings | One record per vital sign measurement per time point per visit per subject | Tabulation | STUDYID, USUBJID, VSTESTCD, VISITNUM, VSTPTREF, VSTPTNUM | vs.xpt
DA | Drug Accountability | Findings | One record per drug accountability finding per subject | Tabulation | STUDYID, USUBJID, DATESTCD, DADTC | da.xpt
MB | Microbiology Specimen | Findings | One record per microbiology specimen finding per time point per visit per subject | Tabulation | STUDYID, USUBJID, MBTESTCD, VISITNUM, MBTPTREF, MBTPTNUM | mb.xpt
MS | Microbiology Susceptibility Test | Findings | One record per microbiology susceptibility test (or other organism-related finding) per organism found in MB | Tabulation | STUDYID, USUBJID, MSTESTCD, VISITNUM, MSTPTREF, MSTPTNUM | ms.xpt
PC | Pharmacokinetic Concentrations | Findings | One record per analyte per planned time point number per time point reference per visit per subject | Tabulation | STUDYID, USUBJID, PCTESTCD, VISITNUM, PCTPTREF, PCTPTNUM | pc.xpt
PP | Pharmacokinetic Parameters | Findings | One record per PK parameter per time-concentration profile per modeling method per subject | Tabulation | STUDYID, USUBJID, PPTESTCD, PPCAT, VISITNUM, PPTPTREF | pp.xpt
FA | Findings About Events or Interventions | Findings | One record per finding per object per time point per time point reference per visit per subject | Tabulation | STUDYID, USUBJID, FATESTCD, FAOBJ, VISITNUM, FATPTREF, FATPTNUM | fa.xpt
TA | Trial Arms | Trial Design | One record per planned Element per Arm | Tabulation | STUDYID, ARMCD, TAETORD | ta.xpt
TE | Trial Elements | Trial Design | One record per planned Element | Tabulation | STUDYID, ETCD | te.xpt
TV | Trial Visits | Trial Design | One record per planned Visit per Arm | Tabulation | STUDYID, VISITNUM, ARMCD | tv.xpt
TI | Trial Inclusion/Exclusion Criteria | Trial Design | One record per I/E criterion | Tabulation | STUDYID, IETESTCD | ti.xpt
TS | Trial Summary | Trial Design | One record per trial summary parameter value | Tabulation | STUDYID, TSPARMCD, TSSEQ | ts.xpt
RELREC | Related Records | Special Purpose Datasets | One record per related record, group of records or datasets | Tabulation | STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, RELID | relrec.xpt
SUPP--** | Supplemental Qualifiers for [domain name] | Special-Purpose Datasets | One record per IDVAR, IDVARVAL, and QNAM value per subject | Tabulation | STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL, QNAM | supp--.xpt or suppqual.xpt
* Note that the key variables shown in this table are examples only. A sponsor's actual key structure may be
different.
** Separate Supplemental Qualifier datasets of the form supp--.xpt are recommended. See Section 8.4.
3.2.1.1 PRIMARY KEYS
Table 3.2.1 above shows examples of what a sponsor might submit as variables that comprise the primary key for
SDTM datasets. Since the purpose of this column is to aid reviewers in understanding the structure of a dataset,
sponsors should list all of the natural keys (see definition below) for the dataset. These keys should define uniqueness
for records within a dataset, and may define a record sort order. The naming of these keys should be consistent with
the description of the structure in the Structure column. For all the general-observation-class domains (and for some
special-purpose domains), the --SEQ variable was created so that a unique record could be identified consistently
across all of these domains via its use, along with STUDYID, USUBJID, and DOMAIN. In most domains, --SEQ will be
a surrogate key (see definition below) for a set of variables which comprise the natural key. In certain instances, a
Supplemental Qualifier (SUPP--) variable might also contribute to the natural key of a record for a particular domain.
See Assumption 4.1.1.9 for how this should be represented, and for additional information on keys.
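Whether a proposed set of key variables actually defines uniqueness can be tested by counting key combinations. The sketch below is a hypothetical helper run on made-up Vital Signs records; it uses a shortened version of the VS keys from Table 3.2.1 (VSTPTREF is omitted) to show how dropping a time-point variable leaves duplicate keys.

```python
from collections import Counter


def duplicate_keys(records, key_vars):
    """Return the key values that appear on more than one record.

    An empty result means `key_vars` uniquely identifies every record.
    """
    counts = Counter(tuple(rec.get(var) for var in key_vars) for rec in records)
    return [key for key, n in counts.items() if n > 1]


# Made-up records: one test measured at two time points in the same visit.
vs = [
    {"STUDYID": "ABC", "USUBJID": "ABC.001", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSTPTNUM": 1},
    {"STUDYID": "ABC", "USUBJID": "ABC.001", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSTPTNUM": 2},
]
# Without the time point the candidate key is not unique ...
print(duplicate_keys(vs, ["STUDYID", "USUBJID", "VSTESTCD", "VISITNUM"]))
# ... with it, every record is distinguished.
print(duplicate_keys(vs, ["STUDYID", "USUBJID", "VSTESTCD", "VISITNUM", "VSTPTNUM"]))
```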
A natural key is a piece of data (one or more columns of an entity) that uniquely identifies that entity and distinguishes
it from any other row in the table. The advantage of natural keys is that they exist already, and one does not need to
introduce a new "unnatural" value to the data schema. One of the difficulties in choosing a natural key is that just
about any natural key one can think of has the potential to change. Because they have business meaning, natural
keys are effectively coupled to the business, and they may need to be reworked when business requirements change.
An example of such a change in clinical trials data would be the addition of a position or location that becomes a
key in a new study, but wasn't collected in previous studies.
A surrogate key is a single-part, artificially established identifier for a record. Surrogate key assignment is a special
case of derived data, one where a portion of the primary key is derived. A surrogate key is immune to changes in
business needs. In addition, the key depends on only one field, so it's compact. A common way of deriving surrogate
key values is to assign integer values sequentially. The --SEQ variable in the SDTM datasets is an example of a
surrogate key for most datasets; in some instances, however, --SEQ might be a part of a natural key as a replacement
for what might have been a key (e.g., a repeat sequence number) in the sponsor's database.
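Deriving --SEQ by sequential integer assignment can be sketched as follows. This is an illustration under stated assumptions, not an algorithm prescribed by the SDTMIG: numbering restarts within each USUBJID (a common convention assumed here), and records are first sorted by a sponsor-chosen natural key (here AEDECOD and AESTDTC, following the AE keys in Table 3.2.1). The function name and records are hypothetical.

```python
from itertools import groupby


def assign_seq(records, domain, natural_key):
    """Return copies of `records` with --SEQ numbered 1, 2, ... within USUBJID.

    Records are ordered by USUBJID and then by the natural-key variables, so
    the resulting --SEQ acts as a surrogate for that natural key.
    """
    seq_var = domain + "SEQ"
    ordered = sorted(records, key=lambda r: (r["USUBJID"], *(r[v] for v in natural_key)))
    numbered = []
    # groupby works here because `ordered` is already sorted by USUBJID.
    for _, subject_rows in groupby(ordered, key=lambda r: r["USUBJID"]):
        for seq, row in enumerate(subject_rows, start=1):
            numbered.append({**row, seq_var: seq})
    return numbered


ae = [
    {"USUBJID": "ABC.002", "AEDECOD": "Headache", "AESTDTC": "2003-05-01"},
    {"USUBJID": "ABC.001", "AEDECOD": "Nausea", "AESTDTC": "2003-04-20"},
    {"USUBJID": "ABC.001", "AEDECOD": "Headache", "AESTDTC": "2003-04-22"},
]
for row in assign_seq(ae, "AE", ["AEDECOD", "AESTDTC"]):
    print(row["USUBJID"], row["AESEQ"], row["AEDECOD"])
```

Because the input records are copied rather than modified, the same source data can be renumbered if the chosen natural key changes.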
3.2.1.2 CDISC SUBMISSION VALUE-LEVEL METADATA
In general, the CDISC V3.x Findings data models are closely related to normalized, relational data models in a
vertical structure of one record per observation. Since the V3.x data structures are fixed, sometimes information that
might have appeared as columns in a more horizontal (denormalized) structure in presentations and reports will
instead be represented as rows in an SDTM Findings structure. Because many different types of observations are all
presented in the same structure, there is a need to provide additional metadata to describe the expected differences
that differentiate, for example, hematology lab results from serum chemistry lab results in terms of data type,
standard units and other attributes.
For example, the Vital Signs data domain could contain subject records related to diastolic and systolic blood
pressure, height, weight, and body mass index (BMI). These data are all submitted in the normalized SDTM
Findings structure of one row per vital signs measurement. This means that there could be five records per subject
(one for each test or measurement) for a single visit or time point, with the parameter names stored in the Test
Code/Name variables, and the parameter values stored in result variables. Since the unique Test Code/Names could
have different attributes (i.e., different origins, roles, and definitions) there would be a need to provide value-level
metadata for this information.
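The horizontal-to-vertical reshaping described in the example above can be sketched in Python. The test codes, names, units, and values below are illustrative assumptions, not quoted controlled terminology, and the helper function is hypothetical. Each resulting row carries a different test and unit in the same columns, which is why value-level metadata is needed to describe them.

```python
# Hypothetical test dictionary: test code -> (test name, original units).
VS_TESTS = {
    "SYSBP": ("Systolic Blood Pressure", "mmHg"),
    "DIABP": ("Diastolic Blood Pressure", "mmHg"),
    "HEIGHT": ("Height", "cm"),
    "WEIGHT": ("Weight", "kg"),
    "BMI": ("Body Mass Index", "kg/m2"),
}


def to_findings_rows(wide, tests, domain="VS"):
    """Turn one horizontal record (one column per test) into one row per test."""
    rows = []
    for testcd, (test_name, unit) in tests.items():
        value = wide.get(testcd)
        if value is None:
            continue  # test not performed at this visit: no record
        rows.append({
            "USUBJID": wide["USUBJID"],
            "VISITNUM": wide["VISITNUM"],
            domain + "TESTCD": testcd,
            domain + "TEST": test_name,
            domain + "ORRES": str(value),  # original results are character
            domain + "ORRESU": unit,
        })
    return rows


wide = {"USUBJID": "ABC.001", "VISITNUM": 1,
        "SYSBP": 120, "DIABP": 80, "HEIGHT": 170, "WEIGHT": 70, "BMI": 24.2}
for row in to_findings_rows(wide, VS_TESTS):
    print(row["VSTESTCD"], row["VSORRES"], row["VSORRESU"])
```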
The value-level metadata should be provided as a separate section of the Case Report Tabulation Data Definition
Specification (CRT-DDS). This information, which historically has been submitted as a PDF document named
"define.pdf", should henceforth be submitted in an XML format. For details on the CDISC specification for
submitting define.xml, see www.cdisc.org/standards/.
3.2.2 CONFORMANCE
Conformance with the SDTMIG Domain Models is minimally indicated by:
• Following the complete metadata structure for data domains
• Following SDTMIG domain models wherever applicable
• Using SDTM-specified standard domain names and prefixes where applicable
• Using SDTM-specified standard variable names
• Using SDTM-specified variable labels for all standard domains
• Using SDTM-specified data types for all variables
• Following SDTM-specified controlled terminology and format guidelines for variables, when provided
• Including all collected and relevant derived data in one of the standard domains, special-purpose datasets, or
general-observation-class structures
• Including all Required and Expected variables as columns in standard domains, and ensuring that all
Required variables are populated
• Ensuring that each record in a dataset includes the appropriate Identifier and Timing variables, as well as a
Topic variable
• Conforming to all business rules described in the CDISC Notes column and general and domain-specific
assumptions.
4 Assumptions for Domain Models
4.1 GENERAL ASSUMPTIONS FOR ALL DOMAINS
4.1.1 GENERAL DOMAIN ASSUMPTIONS
4.1.1.1 REVIEW STUDY DATA TABULATION AND IMPLEMENTATION GUIDE
Review the Study Data Tabulation Model as well as this Implementation Guide before attempting to use any of the
individual domain models. See the Case Report Tabulation Data Definition Specification (define.xml), available on
the CDISC website, for information about an xml representation of the define.xml document.
4.1.1.2 RELATIONSHIP TO ANALYSIS DATASETS
Specific guidance on preparing analysis datasets can be found in the CDISC Analysis Dataset Model General
Considerations document, available at www.cdisc.org/standards/.
4.1.1.3 ADDITIONAL TIMING VARIABLES
Additional Timing variables can be added as needed to a standard domain model based on the three general
observation classes except where discouraged in Assumption 4.1.4.8 and specific domain assumptions. Timing
variables can be added to special-purpose domains only where specified in the SDTMIG domain model
assumptions. Timing variables cannot be added to SUPPQUAL datasets or to RELREC (described in Section 8).
4.1.1.4 ORDER OF THE VARIABLES
The order of variables in the define.xml should reflect the order of variables in the dataset. The order of variables in
the CDISC domain models has been chosen to facilitate the review of the models and application of the models.
Variables for the three general observation classes should be ordered with Identifiers first, followed by the Topic,
Qualifier, and Timing variables. Within each role, variables are ordered as shown in Tables 2.2.1, 2.2.2, 2.2.3,
2.2.3.1, 2.2.4, and 2.2.5 of the SDTM.
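The ordering rule can be expressed as a two-level sort: first by role, then by the variable's position within the SDTM table for that role. In this hypothetical sketch the position numbers are placeholders chosen for the example, not the actual row numbers of the SDTM tables.

```python
ROLE_RANK = {"Identifier": 0, "Topic": 1, "Qualifier": 2, "Timing": 3}


def order_variables(variables):
    """Order (name, role, position) tuples by role, then by model position.

    `position` stands for the variable's row within its role's SDTM table
    (placeholder values in the example below).
    """
    ranked = sorted(variables, key=lambda v: (ROLE_RANK[v[1]], v[2]))
    return [name for name, _, _ in ranked]


vs_vars = [
    ("VSDTC", "Timing", 5),
    ("VSORRES", "Qualifier", 8),
    ("USUBJID", "Identifier", 3),
    ("VSTESTCD", "Topic", 1),
    ("STUDYID", "Identifier", 1),
    ("VSTEST", "Qualifier", 1),
]
print(order_variables(vs_vars))
# ['STUDYID', 'USUBJID', 'VSTESTCD', 'VSTEST', 'VSORRES', 'VSDTC']
```

The same ordered list can then drive both the dataset's column order and the variable order in define.xml, which keeps the two consistent as this section requires.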
4.1.1.5 CDISC CORE VARIABLES
The concept of core variable is used both as a measure of compliance, and to provide general guidance to sponsors.
Three categories of variables are specified in the "Core" column in the domain models:
• A Required variable is any variable that is basic to the identification of a data record (i.e., essential key
variables and a topic variable) or is necessary to make the record meaningful. Required variables must always
be included in the dataset and cannot be null for any record.
• An Expected variable is any variable necessary to make a record useful in the context of a specific domain.
Expected variables may contain some null values, but in most cases will not contain null values for every record.
When no data has been collected for an expected variable, however, a null column should still be included in the
dataset, and a comment should be included in the define.xml to state that data was not collected.
• A Permissible variable should be used in a domain as appropriate when collected or derived. Except where
restricted by specific domain assumptions, any SDTM Timing and Identifier variables, and any Qualifier
variables from the same general observation class are permissible for use in a domain based on that general
observation class. The sponsor can decide whether a Permissible variable should be included as a column
when all values for that variable are null. The sponsor does not have the discretion to not submit permissible
variables when they contain data.
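The Required and Expected rules lend themselves to an automated check. The sketch below is hypothetical; the Core assignments in the usage example are illustrative and are not quoted from the AE domain model. Permissible variables are not checked because they may be omitted when entirely null.

```python
def core_problems(columns, records, core):
    """Check the Required/Expected rules for one dataset.

    `core` maps each variable to "Req", "Exp", or "Perm" (hypothetical codes).
    """
    problems = []
    for var, category in core.items():
        if category in ("Req", "Exp") and var not in columns:
            problems.append(f"{var}: {category} variable missing as a column")
        elif category == "Req" and any(rec.get(var) in (None, "") for rec in records):
            problems.append(f"{var}: Required variable has null values")
    return problems


# Illustrative Core assignments only.
core = {"STUDYID": "Req", "USUBJID": "Req", "AETERM": "Req",
        "AESER": "Exp", "AESEV": "Perm"}
columns = ["STUDYID", "USUBJID", "AETERM"]
records = [{"STUDYID": "ABC", "USUBJID": "ABC.001", "AETERM": ""}]
print(core_problems(columns, records, core))
# ['AETERM: Required variable has null values', 'AESER: Exp variable missing as a column']
```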
4.1.1.6 ADDITIONAL GUIDANCE ON DATASET NAMING
SDTM datasets are normally named to be consistent with the domain code; for example, the Demographics dataset
(DM) is named dm.xpt (see Appendix C2 for a list of standard and reserved domain codes). Exceptions to this rule
are described in Section 4.1.1.7 for general-observation-class datasets and in Section 8 for the RELREC and SUPP--
datasets.
In some cases, sponsors may need to define new custom domains other than those represented in the SDTMIG or
listed in Appendix C2, and may be concerned that CDISC domain codes defined in the future will conflict with
those they choose to use. To eliminate any risk of a sponsor using a name that CDISC later determines to have a
different meaning, domain codes beginning with the letters X, Y, or Z have been reserved for the creation of custom
domains. Any letter or number may be used in the second position. Note the use of codes beginning with X, Y, or Z
is optional, and not required for custom domains.
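A sponsor choosing a custom domain code might screen it as sketched below. The function is hypothetical; `cdisc_codes` stands for the standard and reserved codes the sponsor has taken from Appendix C2, and the sketch assumes a code is two ASCII letters or digits. It reports whether the code falls in the X/Y/Z range that CDISC has reserved for custom domains.

```python
def custom_domain_filename(code, cdisc_codes):
    """Return (dataset file name, future_safe) for a sponsor-defined domain code.

    future_safe is True for codes starting with X, Y, or Z, which cannot
    collide with domain codes CDISC defines later.
    """
    code = code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalnum()):
        raise ValueError(f"domain code must be two letters or digits: {code!r}")
    if code in cdisc_codes:
        raise ValueError(f"{code} is already a CDISC domain code")
    return code.lower() + ".xpt", code[0] in "XYZ"


known = {"DM", "AE", "VS"}  # stand-in for the Appendix C2 list
print(custom_domain_filename("xr", known))  # ('xr.xpt', True)
print(custom_domain_filename("KQ", known))  # ('kq.xpt', False)
```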
4.1.1.7 SPLITTING DOMAINS
Sponsors may choose to split a domain of topically related information into physically separate datasets. In such
cases, one of two approaches should be implemented:
1) For a domain based on a general observation class, splitting should be according to values in --CAT (which
must not be null).
2) The Findings About (FA) domain (Section 6.4) can be split either by --CAT values (per item 1 above) or
relative to the parent domain of the value in --OBJ. For example, FACM would store Findings About CM
records. See Section 6.4.2 for more details.
The following rules must be adhered to when splitting a domain into separate datasets to ensure they can be
appended back into one domain dataset:
1) The value of DOMAIN must be consistent across the separate datasets as it would have been if they had not
been split (e.g., QS, FA).
2) All variables that require a domain prefix (e.g., --TESTCD, --LOC) must use the value of DOMAIN as the
prefix value (e.g., QS, FA).
3) --SEQ must be unique within USUBJID for all records across all the split datasets. If there are 1000 records
for a USUBJID across the separate datasets, all 1000 records need unique values for --SEQ.
4) When relationship datasets (e.g., SUPPxx, FAxx, CO, RELREC) relate back to split parent domains, IDVAR
should generally be --SEQ. When IDVAR is a value other than --SEQ (e.g., --GRPID, --REFID, --SPID),
care should be taken to ensure that the parent records across the split datasets have unique values for the
variable specified in IDVAR, so that related child records do not accidentally join back to incorrect parent
records.
5) Variables of the same name in separate datasets should have the same SAS Length attribute to avoid any
difficulties if the sponsor or FDA should decide to append datasets together.
6) Permissible variables included in one split dataset need not be included in all split datasets. Should the
datasets be appended in SAS, permissible variables not used in some split datasets will have null values in
the appended datasets. Care is advised, however, when considering variable order. Should a permissible
variable used in one (or more) split datasets not be included in the first dataset used in a SAS Set statement,
the order of variables could be compromised.
7) Split dataset names can be up to four characters in length. For example, if splitting by --CAT, then dataset
names would be the domain name plus up to two additional characters (e.g., QS36 for SF-36). If splitting
Findings About by parent domain, then the dataset name would be the domain name plus the two-character
code of the parent domain (e.g., FACM). The four-character dataset-name limitation
allows the use of a Supplemental Qualifier dataset associated with the split dataset.
8) Supplemental Qualifier datasets for split domains would also be split. The nomenclature would include the
additional one-to-two characters used to identify the split dataset (e.g., SUPPQS36, SUPPFACM). The value
of RDOMAIN in the SUPP-- datasets would be the two-character domain code (e.g., QS, FA).
9) In RELREC, if a dataset-level relationship is defined for a split Findings About domain, then RDOMAIN
may contain the four-character dataset name, as shown in the following example.
STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | RELTYPE | RELID
ABC | CM | | CMSPID | | ONE | 1
ABC | FACM | | FASPID | | MANY | 1
10) See the SDTM Metadata Implementation Guide for guidance on how to represent the metadata for a set of
split domain datasets in the define.xml.
Note that submission of split SDTM domains may be subject to additional dataset splitting conventions as defined
by regulators via technical specifications (e.g., Study Data Specifications) and/or as negotiated with regulatory
reviewers.
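Rules 7 and 8 reduce to simple string construction. The hypothetical helper below returns the split dataset name, its Supplemental Qualifier dataset name, and the RDOMAIN value. That SUPP plus a four-character name fits the eight-character SAS V5 dataset-name limit is offered here as the likely motivation for rule 7, an assumption rather than a quotation.

```python
def split_names(domain, suffix):
    """Return (split dataset name, SUPP-- dataset name, RDOMAIN) for a split domain."""
    domain, suffix = domain.upper(), suffix.upper()
    if len(domain) != 2 or not 1 <= len(suffix) <= 2:
        raise ValueError("expected a 2-character domain and a 1-2 character suffix")
    name = domain + suffix          # at most four characters (rule 7)
    supp = "SUPP" + name            # e.g., SUPPQS36 (rule 8)
    return name, supp, domain       # RDOMAIN keeps the 2-character code (rule 8)


print(split_names("QS", "36"))  # ('QS36', 'SUPPQS36', 'QS')
print(split_names("FA", "CM"))  # ('FACM', 'SUPPFACM', 'FA')
```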
4.1.1.7.1 EXAMPLE OF SPLITTING QUESTIONNAIRES
This example shows the QS domain data split into three datasets: Clinical Global Impressions (QSCG), Cornell Scale
for Depression in Dementia (QSCS), and Mini Mental State Examination (QSMM). Each dataset represents a subset
of the QS domain data and has only one value of QSCAT.
QS Domains
qscg.xpt (Clinical Global Impressions)
Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT
1 | CDISC01 | QS | CDISC01.100008 | 1 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions
2 | CDISC01 | QS | CDISC01.100008 | 2 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions
3 | CDISC01 | QS | CDISC01.100014 | 1 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions
4 | CDISC01 | QS | CDISC01.100014 | 2 | CGI-CGI-I | CGIGLOB | Global Improvement | Clinical Global Impressions

Row | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY
1 (cont) | No change | 4 | 4 | | 3 | WEEK 2 | 15 | 2003-05-13 | 15
2 (cont) | Much Improved | 2 | 2 | | 10 | WEEK 24 | 169 | 2003-10-13 | 168
3 (cont) | Minimally Improved | 3 | 3 | | 3 | WEEK 2 | 15 | 2003-10-31 | 17
4 (cont) | Minimally Improved | 3 | 3 | | 10 | WEEK 24 | 169 | 2004-03-30 | 168

qscs.xpt (Cornell Scale for Depression in Dementia)
Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT
1 | CDISC01 | QS | CDISC01.100008 | 3 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia
2 | CDISC01 | QS | CDISC01.100008 | 23 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia
3 | CDISC01 | QS | CDISC01.100014 | 3 | CSDD-01 | CSDD01 | Anxiety | Cornell Scale for Depression in Dementia
4 | CDISC01 | QS | CDISC01.100014 | 28 | CSDD-06 | CSDD06 | Retardation | Cornell Scale for Depression in Dementia

Row | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY
1 (cont) | Severe | 2 | 2 | | 1 | SCREEN | -13 | 2003-04-15 | -14
2 (cont) | Severe | 2 | 2 | Y | 2 | BASELINE | 1 | 2003-04-29 | 1
3 (cont) | Severe | 2 | 2 | | 1 | SCREEN | -13 | 2003-10-06 | -9
4 (cont) | Mild | 1 | 1 | Y | 2 | BASELINE | 1 | 2003-10-15 | 1

qsmm.xpt (Mini Mental State Examination)
Row | STUDYID | DOMAIN | USUBJID | QSSEQ | QSSPID | QSTESTCD | QSTEST | QSCAT
1 | CDISC01 | QS | CDISC01.100008 | 81 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination
2 | CDISC01 | QS | CDISC01.100008 | 88 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination
3 | CDISC01 | QS | CDISC01.100014 | 81 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination
4 | CDISC01 | QS | CDISC01.100014 | 88 | MMSE-A.1 | MMSEA1 | Orientation Time Score | Mini Mental State Examination

Row | QSORRES | QSSTRESC | QSSTRESN | QSBLFL | VISITNUM | VISIT | VISITDY | QSDTC | QSDY
1 (cont) | 4 | 4 | 4 | | 1 | SCREEN | -13 | 2003-04-15 | -14
2 (cont) | 3 | 3 | 3 | Y | 2 | BASELINE | 1 | 2003-04-29 | 1
3 (cont) | 2 | 2 | 2 | | 1 | SCREEN | -13 | 2003-10-06 | -9
4 (cont) | 2 | 2 | 2 | Y | 2 | BASELINE | 1 | 2003-10-15 | 1
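Rule 3 and the one-QSCAT-per-dataset property of this example can be checked mechanically. The sketch below transcribes USUBJID, QSSEQ, and QSCAT from the three datasets above and confirms that no (USUBJID, QSSEQ) pair repeats across them; the helper function itself is hypothetical.

```python
from collections import Counter

CG, CS, MM = ("Clinical Global Impressions",
              "Cornell Scale for Depression in Dementia",
              "Mini Mental State Examination")
S8, S14 = "CDISC01.100008", "CDISC01.100014"

# (USUBJID, QSSEQ, QSCAT) transcribed from the three example datasets above.
split_qs = {
    "QSCG": [(S8, 1, CG), (S8, 2, CG), (S14, 1, CG), (S14, 2, CG)],
    "QSCS": [(S8, 3, CS), (S8, 23, CS), (S14, 3, CS), (S14, 28, CS)],
    "QSMM": [(S8, 81, MM), (S8, 88, MM), (S14, 81, MM), (S14, 88, MM)],
}


def check_split(parts):
    """Return (USUBJID, QSSEQ) pairs used more than once across all parts,
    and the names of parts holding more than one QSCAT value."""
    seen = Counter((subj, seq) for rows in parts.values() for subj, seq, _ in rows)
    duplicates = sorted(pair for pair, n in seen.items() if n > 1)
    mixed = sorted(name for name, rows in parts.items()
                   if len({cat for _, _, cat in rows}) > 1)
    return duplicates, mixed


print(check_split(split_qs))  # ([], [])
```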
SUPPQS Domains
suppqscg.xpt: Supplemental Qualifiers for QSCG
Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL
1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Clinical Global Impressions | QSLANG | Questionnaire Language | GERMAN | CRF |
2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Clinical Global Impressions | QSLANG | Questionnaire Language | FRENCH | CRF |

suppqscs.xpt: Supplemental Qualifiers for QSCS
Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL
1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Cornell Scale for Depression in Dementia | QSLANG | Questionnaire Language | GERMAN | CRF |
2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Cornell Scale for Depression in Dementia | QSLANG | Questionnaire Language | FRENCH | CRF |

suppqsmm.xpt: Supplemental Qualifiers for QSMM
Row | STUDYID | RDOMAIN | USUBJID | IDVAR | IDVARVAL | QNAM | QLABEL | QVAL | QORIG | QEVAL
1 | CDISC01 | QS | CDISC01.100008 | QSCAT | Mini Mental State Examination | QSLANG | Questionnaire Language | GERMAN | CRF |
2 | CDISC01 | QS | CDISC01.100014 | QSCAT | Mini Mental State Examination | QSLANG | Questionnaire Language | FRENCH | CRF |
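To show how these records relate back to their parent, the sketch below attaches QSLANG to the QSCG records using IDVAR and IDVARVAL. It is a simplified, hypothetical helper based on a plain reading of the IDVAR/IDVARVAL mechanism (Section 8.4 gives the full rules): it assumes RDOMAIN and STUDYID have already been matched and compares values as strings.

```python
def attach_supp(parent, supp):
    """Return copies of parent records with each SUPP-- QVAL added as a column.

    A SUPP-- record applies to every parent record of the same USUBJID whose
    value of the IDVAR variable equals IDVARVAL (compared as strings).
    """
    merged = [dict(rec) for rec in parent]
    for s in supp:
        for rec in merged:
            if rec["USUBJID"] == s["USUBJID"] and str(rec.get(s["IDVAR"])) == s["IDVARVAL"]:
                rec[s["QNAM"]] = s["QVAL"]
    return merged


CG = "Clinical Global Impressions"
qscg = [{"USUBJID": u, "QSSEQ": n, "QSCAT": CG}
        for u, n in [("CDISC01.100008", 1), ("CDISC01.100008", 2),
                     ("CDISC01.100014", 1), ("CDISC01.100014", 2)]]
suppqscg = [
    {"USUBJID": "CDISC01.100008", "IDVAR": "QSCAT", "IDVARVAL": CG,
     "QNAM": "QSLANG", "QVAL": "GERMAN"},
    {"USUBJID": "CDISC01.100014", "IDVAR": "QSCAT", "IDVARVAL": CG,
     "QNAM": "QSLANG", "QVAL": "FRENCH"},
]
for rec in attach_supp(qscg, suppqscg):
    print(rec["USUBJID"], rec["QSSEQ"], rec["QSLANG"])
```

Because IDVAR is QSCAT rather than QSSEQ, one SUPPQS record qualifies every QSCG record for that subject, which is how this example records a single questionnaire language per subject and questionnaire.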
4.1.1.8 ORIGIN METADATA
4.1.1.8.1 ORIGIN METADATA FOR VARIABLES
The Origin column of the define.xml is used to indicate where the data originated. Its purpose is to unambiguously
communicate to the reviewer whether data was collected on a CRF (and thus should be traceable to an annotated
CRF), derived (and thus traceable to some derivation algorithm), or assigned by some subjective process (and thus
traceable to some external evaluator). The SDTMIG defines the following controlled terms for specifying Origin:
CRF: The designation of "CRF" (along with a reference) as an origin in the define.xml means that data was
collected as part of a CRF and that there is an annotated CRF associated with the variable. Sponsors may specify
additional details about the origin that may be helpful to the Reviewer (e.g., electronic diary) in the Comments
section of the define.xml. An origin of "CRF" includes information that is preprinted on the CRF (e.g.,
"RESPIRATORY SYSTEM DISORDERS" for MHCAT).
eDT: The designation of "eDT" as an origin in the define.xml means that the data are received via an electronic Data
Transfer (eDT) and usually do not have associated annotations. An origin of eDT refers to data collected via data
streams such as laboratory, ECG, or IVRS. Sponsors may specify additional details about the origin that may be
helpful to the Reviewer in the Comments section of the define.xml.
Derived: Derived data are not directly collected on the CRF but are calculated by an algorithm or reproducible rule
that depends on other data values. This algorithm is applied across all values and may reference other
SDTM datasets. The derivation is assumed to be performed by the Sponsor. This does not apply to derived lab test
results produced directly by labs (or by devices).
Examples illustrating the distinction between collected and derived values include the following:
A value derived by an eCRF system from other entered fields has an origin of "Derived," since the sponsor controls the derivation.
A value derived from collected data by the sponsor, or by a CRO working on the sponsor's behalf, has an origin of "Derived."
A value derived by an investigator and written/entered on a CRF has an origin of "CRF" (along with a reference) rather than "Derived."
A value derived by a vendor (e.g., a central lab) according to its own procedures is considered collected rather than derived, and would have an origin of "eDT."
Assigned: A value that is determined by individual judgment (by an evaluator other than the subject or investigator),
rather than collected as part of the CRF or derived based on an algorithm. This may include third party attributions
by an adjudicator. Coded terms that are supplied as part of a coding process (as in --DECOD) are considered to have
an Origin of "Assigned". Values that are set independently of any subject-related data values in order to complete
SDTM fields such as DOMAIN and --TESTCD are considered to have an Origin of "Assigned".
Protocol: A value that is defined as part of the Trial Design preparation (see Section 7). An example would be
VSPOS (Vital Signs Position), which may be specified only in the protocol and not appear on a CRF.
The term "Sponsor Defined" was used in earlier versions of the SDTMIG to advise the Sponsor to supply the
appropriate Origin value in the metadata. The text "Sponsor Defined" was not intended to be used in the define.xml
and is no longer used in V3.1.2 and later.
4.1.1.8.2 ORIGIN METADATA FOR RECORDS
Sponsors are cautioned to recognize that an Origin of "Derived" means that all values for that variable were derived,
and that "CRF" (along with a reference) means that all were collected. In some cases, both collected and derived
values may be reported in the same field. For example, some records in a Findings dataset such as QS contain values
collected from the CRF and other records may contain derived values such as a total score. When both derived and
collected values are reported in a field, the value-level metadata origin will indicate at the test level if the value is
"Derived" or "CRF" and the variable-level metadata origin will list all types for that variable separated by commas
(e.g., "Derived, CRF").
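The variable-level rule above can be sketched in Python. This is an illustrative sketch only: it assumes the value-level metadata for one variable is available as a simple mapping from test code to Origin, and the function name and test codes (QSTOT, ITEM01, ITEM02) are hypothetical. The text does not prescribe an order for the listed types; the sketch keeps the order in which they are first seen.

```python
def variable_level_origin(value_level: dict[str, str]) -> str:
    """Combine value-level Origins into one variable-level Origin string.

    Each distinct Origin is listed once, in first-seen order, separated
    by commas (e.g. "Derived, CRF").
    """
    distinct: list[str] = []
    for origin in value_level.values():
        if origin not in distinct:
            distinct.append(origin)
    return ", ".join(distinct)


# Hypothetical value-level metadata for QSORRES: a derived total score
# alongside item scores collected on the CRF.
qsorres_origins = {
    "QSTOT": "Derived",
    "ITEM01": "CRF",
    "ITEM02": "CRF",
}
print(variable_level_origin(qsorres_origins))  # Derived, CRF
```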
4.1.1.9 ASSIGNING NATURAL KEYS IN THE METADATA
Section 3.2 indicates that a sponsor should include in the metadata the variables that contribute to the natural key for
a domain. The following examples are illustrations of how to do this, and include a case where a Supplemental
Qualifier variable is referenced because it forms part of the natural key.
Physical Examination (PE) domain example:
Sponsor A chooses the following natural key for the PE domain:
STUDYID, USUBJID, VISITNUM, PETESTCD
Sponsor B collects data in such a way that the location (PELOC) and method (PEMETHOD) variables need to be
included in the natural key to identify a unique row, but they do not collect a visit variable; instead they use the visit
date (PEDTC) to sequence the data. Sponsor B then defines the following natural key for the PE domain.
STUDYID, USUBJID, PEDTC, PETESTCD, PELOC, PEMETHOD
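A natural key is only useful if it actually identifies each row. A minimal Python check, assuming records are plain dictionaries (the PE values below are invented for illustration, and the helper name is hypothetical), counts how often each key combination occurs. Dropping PELOC and PEMETHOD from Sponsor B's key exposes rows that are no longer distinguishable.

```python
from collections import Counter


def duplicate_keys(records: list[dict], key_vars: list[str]) -> list[tuple]:
    """Return natural-key values that appear on more than one record.

    An empty result means the chosen variables identify every row uniquely.
    """
    counts = Counter(tuple(rec.get(var) for var in key_vars) for rec in records)
    return [key for key, n in counts.items() if n > 1]


SPONSOR_B_KEY = ["STUDYID", "USUBJID", "PEDTC", "PETESTCD", "PELOC", "PEMETHOD"]

# Two invented PE records that differ only in location.
pe = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "PEDTC": "2008-01-10",
     "PETESTCD": "SKIN", "PELOC": "ARM", "PEMETHOD": "PALPATION"},
    {"STUDYID": "S1", "USUBJID": "S1-001", "PEDTC": "2008-01-10",
     "PETESTCD": "SKIN", "PELOC": "LEG", "PEMETHOD": "PALPATION"},
]
print(duplicate_keys(pe, SPONSOR_B_KEY))      # []
print(duplicate_keys(pe, SPONSOR_B_KEY[:4]))  # [('S1', 'S1-001', '2008-01-10', 'SKIN')]
```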
In certain instances a Supplemental Qualifier variable (i.e., a QNAM value, see Section 8.4) might also contribute to
the natural key of a record, and therefore needs to be referenced as part of the natural key for a domain. The
important concept here is that a domain is not limited by physical structure. A domain may consist of more
than one physical dataset, for example the main domain dataset and its associated Supplemental Qualifiers dataset.
Supplemental Qualifier variables should be referenced in the natural key by using a two-part name. The word
QNAM must be used as the first part of the name to indicate that the contributing variable exists in a Supplemental
Qualifiers dataset (either a domain-specific SUPP-- dataset or the general SUPPQUAL dataset). The second part is the
value of QNAM, which ultimately becomes a column reference when the SUPPQUAL records are joined onto the main
domain dataset (e.g., QNAM.XVAR when the SUPP-- record has a QNAM of "XVAR").
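The two-part QNAM.<value> naming can be made concrete with a small Python sketch that transposes SUPP-- records onto their parent rows. It is hypothetical and deliberately narrow: it assumes a single study, links records only through IDVAR=--SEQ (matching IDVARVAL against the string form of the sequence number), and ignores other IDVAR values. The PE and SUPPPE rows are invented.

```python
def merge_supp(parent: list[dict], supp: list[dict], seq_var: str) -> list[dict]:
    """Attach SUPP-- qualifiers to parent records as "QNAM.<value>" columns.

    Assumes a single study and that IDVAR names the parent's --SEQ variable;
    other IDVAR values (e.g. --GRPID, or a blank IDVAR for subject-level
    qualifiers) are ignored in this sketch.
    """
    merged = [dict(rec) for rec in parent]  # copy so the inputs stay unchanged
    # IDVARVAL is character, so compare against the string form of --SEQ.
    by_key = {(rec["USUBJID"], str(rec[seq_var])): rec for rec in merged}
    for row in supp:
        if row["IDVAR"] != seq_var:
            continue
        target = by_key.get((row["USUBJID"], row["IDVARVAL"]))
        if target is not None:
            target["QNAM." + row["QNAM"]] = row["QVAL"]
    return merged


pe = [{"USUBJID": "S1-001", "PESEQ": 1, "PETESTCD": "EROSION",
       "PEMETHOD": "ULTRASOUND"}]
supppe = [
    {"USUBJID": "S1-001", "IDVAR": "PESEQ", "IDVARVAL": "1",
     "QNAM": "PEMAKE", "QVAL": "ACME"},
    {"USUBJID": "S1-001", "IDVAR": "PESEQ", "IDVARVAL": "1",
     "QNAM": "PEMODEL", "QVAL": "U 2.1"},
]
print(merge_supp(pe, supppe, "PESEQ")[0])
```

After the merge, QNAM.PEMAKE and QNAM.PEMODEL behave like ordinary columns and can be listed in the natural key alongside the domain variables.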
Continuing with the PE domain example above, Sponsor B might have used ultrasound as a method of measurement
and might have collected additional information such as the makes and models of ultrasound equipment employed.
The sponsor considers the make and model information to be essential data that contributes to the uniqueness of the
test result, and so creates Supplemental Qualifier variables for make (QNAM=PEMAKE) and model
(QNAM=PEMODEL). The natural key is then defined as follows:
STUDYID, USUBJID, PEDTC, PETESTCD, PELOC, PEMETHOD, QNAM.PEMAKE, QNAM.PEMODEL
This approach becomes very useful in a Findings domain when a sponsor might choose to employ generic
--TESTCD values rather than compound --TESTCD values. The use of generic test codes helps to create distinct
lists of manageable controlled terminology for --TESTCD. In studies where multiple repetitive tests or
measurements are being made, for example in a rheumatoid arthritis study where repetitive measurements of bone
erosion in the hands and wrists might be made using both X-ray and MRI equipment, one approach to recording this
data might be to create an individual --TESTCD value for each measurement. Taking just the phalanges, a sponsor
might want to express the following in a test code in order to make it unique:
Left or Right hand
Phalange position (proximal / distal / middle)
Rotation of the hand
Method of measurement (X-ray / MRI)
Machine Make
Machine Model
Trying to encapsulate all of this information within a unique value of --TESTCD is not a recommended approach
for the following reasons:
It results in the creation of a potentially large number of test codes.
The eight-character values of --TESTCD become less intuitively meaningful.
Multiple test codes essentially represent the same test or measurement, simply to accommodate attributes of a test within the --TESTCD value itself (e.g., to represent a body location at which a measurement was taken).
As a result, the preferred approach would be to use a generic (or simple) test code that requires associated qualifier
variables to fully express the test detail. Using this approach in the above example, a generic --TESTCD value might
be "EROSION" and the additional components of the compound test codes discussed above would be represented in
a number of distinct qualifier variables. These may include domain variables (--LOC, --METHOD, etc.) and
Supplemental Qualifier variables (QNAM.MAKE, QNAM.MODEL, etc.). Expressing the natural key becomes very
important in this situation in order to communicate the variables that contribute to the uniqueness of a test.
If a generic --TESTCD is used, the following variables would be used to fully describe the test. The test is
"EROSION", the location is "Left MCP I", the method of measurement is "Ultrasound", the make of the ultrasound
machine is "ACME", and the model of the ultrasound machine is "u 2.1". This domain includes both domain
variables and Supplemental Qualifier variables that contribute to the natural key of each row and describe the
uniqueness of the test.
--TESTCD | --TEST | --LOC | --METHOD | QNAM.MAKE | QNAM.MODEL
EROSION | EROSION | LEFT MCP I | ULTRASOUND | ACME | U 2.1
4.1.2 GENERAL VARIABLE ASSUMPTIONS
4.1.2.1 VARIABLE-NAMING CONVENTIONS
SDTM variables are named according to a set of conventions, using fragment names (defined in Appendix D).
Variables with names ending in "CD" are "short" versions of associated variables that do not include the "CD"
suffix (e.g., --TESTCD is the short version of --TEST).
Values of --TESTCD must be limited to 8 characters, and cannot start with a number, nor can they contain characters
other than letters, numbers, or underscores. This is to avoid possible incompatibility with SAS V5 Transport files.
This limitation will be in effect until the use of other formats (such as XML) becomes acceptable to regulatory
authorities.
QNAM serves the same purpose as --TESTCD within supplemental qualifier datasets, and so values of QNAM are
subject to the same restrictions as values of --TESTCD.
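The --TESTCD/QNAM restrictions translate directly into a pattern check. This Python sketch assumes ASCII letters only and, because the text forbids only a leading number, allows a leading underscore; the pattern and function names are hypothetical.

```python
import re

# At most 8 characters, not starting with a digit, and only letters,
# digits, or underscores (the SAS V5 transport constraints described above).
TESTCD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_testcd(value: str) -> bool:
    """Return True if value is acceptable as a --TESTCD or QNAM value."""
    return TESTCD_PATTERN.fullmatch(value) is not None


for code in ["EROSION", "AELOC1", "1STDOSE", "BP-SYS", "TOOLONGCODE"]:
    print(code, is_valid_testcd(code))
```

For these sample values, EROSION and AELOC1 pass, while 1STDOSE (leading digit), BP-SYS (hyphen), and TOOLONGCODE (11 characters) fail.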
Values of other "CD" variables are not subject to the same restrictions as --TESTCD.
ETCD (the companion to ELEMENT) and TSPARMCD (the companion to TSPARM) are limited to 8
characters and do not have special character restrictions. These values should be short for ease of use in
programming, but it is not expected that they will need to serve as variable names.
ARMCD is limited to 20 characters and does not have special character restrictions. The maximum length
of ARMCD is longer than for other "short" variables to accommodate the kind of values that are likely to
be needed for crossover trials. For example, if ARMCD values for a seven-period crossover were
constructed using two-character abbreviations for each treatment and separating hyphens, the length of
ARMCD values would be 20.
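The 20-character figure for ARMCD follows from simple arithmetic: seven two-character codes plus six separating hyphens. A short Python illustration, with invented treatment abbreviations and a hypothetical helper name (hyphen-joined codes are only the construction described in the example, not a required format):

```python
def crossover_armcd(period_codes: list[str]) -> str:
    """Join per-period treatment abbreviations with hyphens."""
    return "-".join(period_codes)


# Seven periods, each with a hypothetical two-character treatment code.
armcd = crossover_armcd(["PL", "A1", "A2", "B1", "B2", "C1", "C2"])
print(armcd, len(armcd))  # PL-A1-A2-B1-B2-C1-C2 20
# 7 codes x 2 characters + 6 separating hyphens = 20, the ARMCD maximum.
assert len(armcd) == 7 * 2 + 6 == 20
```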
Variable descriptive names (labels), up to 40 characters, should be provided as data variable labels.
Use of variable names (other than domain prefixes), formats, decodes, terminology, and data types for the same type
of data (even for custom domains and Supplemental Qualifiers) should be consistent within and across studies
within a submission. Sponsors must use the predefined SDTM-standard labels in all standard domains.
4.1.2.2 TWO-CHARACTER DOMAIN IDENTIFIER
In order to minimize the risk of difficulty when merging/joining domains for reporting purposes, the two-character
Domain Identifier is used as a prefix in most variable names.
Special-Purpose domains (see Section 5), Standard domains (see Section 6), Trial Design domains (see Section 7)
and Relationship datasets (see Section 8) already specify the complete variable names, so no action is required.
When creating custom domains based on the General Observation Classes, sponsors must replace the -- (two
hyphens) prefix in the General Observation Class, Timing, and Identifier variables with the two-character Domain
Identifier (DOMAIN) variable value for that domain/dataset. The first character of the domain code must be a letter
(A-Z), and the second may be a letter (A-Z) or a digit (0-9). No special characters are allowed, for compatibility with
SAS version 5 transport files and with file naming for the Electronic Common Technical Document (eCTD).
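The prefix replacement and the domain-code rule can be sketched together in Python. "XR" is a hypothetical custom domain code, and the helper assumes the "--" placeholder appears only at the start of a variable name; variables without the placeholder (such as the Required Identifiers and VISITNUM) pass through unchanged.

```python
import re

# First character A-Z, second character A-Z or 0-9 (no special characters).
DOMAIN_CODE = re.compile(r"[A-Z][A-Z0-9]")


def prefix_variables(domain: str, variables: list[str]) -> list[str]:
    """Replace the "--" placeholder prefix with a custom domain's code."""
    if DOMAIN_CODE.fullmatch(domain) is None:
        raise ValueError(f"invalid two-character domain code: {domain!r}")
    return [domain + var[2:] if var.startswith("--") else var
            for var in variables]


# "XR" is a hypothetical custom Findings domain code.
print(prefix_variables("XR", ["STUDYID", "DOMAIN", "USUBJID", "--SEQ",
                              "--TESTCD", "--ORRES", "VISITNUM"]))
# ['STUDYID', 'DOMAIN', 'USUBJID', 'XRSEQ', 'XRTESTCD', 'XRORRES', 'VISITNUM']
```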
The philosophy applied to determine which variable names use a prefix was that all variable names are prefixed with
the Domain Identifier in which they originate except the following:
a. Required Identifiers (STUDYID, DOMAIN, USUBJID)
b. Commonly used grouping and merge keys (VISIT, VISITNUM, VISITDY), and many of the variables
in trial design (such as ELEMENT and ARM)
c. All Demographics domain (DM) variables other than DMDTC and DMDY
d. All variables in RELREC and SUPPQUAL, and some variables in Comments and Trial Design datasets.
Required Identifiers are not prefixed because they are usually used as keys when merging/joining observations. The
--SEQ and the optional Identifiers --GRPID and --REFID are prefixed because they may be used as keys when
relating observations across domains.
4.1.2.3 USE OF “SUBJECT” AND USUBJID
"Subject" should be used where applicable to generically refer to both "patients" and "healthy volunteers" in order
to be consistent with the recommendation in FDA guidance. The term "Subject" should be used consistently in all
labels and comments. To identify a subject uniquely across all studies for all applications or submissions involving
the product, a unique identifier (USUBJID) should be assigned and included in all datasets.
The unique subject identifier (USUBJID) is required in all datasets containing subject-level data. USUBJID values
must be unique for each trial participant (subject) across all trials in the submission. This means that no two (or
more) subjects, across all trials in the submission, may have the same USUBJID. Additionally, the same person who
participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.
The following sample rows from individual study dm.xpt files are for the same subject, who participates first in
study ACME01 and then in study ACME14. Note that this is only one example of the possible values for USUBJID.
CDISC does not recommend any specific format for the values of USUBJID, only that the values need to be unique
for all subjects in the submission, and across multiple submissions for the same compound. Many sponsors
concatenate values for the Study, Site and Subject into USUBJID, but this is not a requirement. It is acceptable to
use any format for USUBJID, as long as the values are unique across all subjects per FDA guidance.
Study ACME01 dm.xpt
STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | INVNAM
ACME01 | DM | ACME01-05-001 | 001 | 05 | John Doe
Study ACME14 dm.xpt
STUDYID | DOMAIN | USUBJID | SUBJID | SITEID | INVNAM
ACME14 | DM | ACME01-05-001 | 017 | 14 | Mary Smith
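The following Python sketch reproduces the ACME01/ACME14 example above. It assumes the sponsor already knows, through its own processes, when a subject has participated in an earlier trial, and passes that earlier USUBJID in as the hypothetical `prior_usubjid` argument; the STUDYID-SITEID-SUBJID fallback is only one convention. The second (also hypothetical) helper flags a USUBJID that appears on more than one DM row within the same study.

```python
from collections import Counter
from typing import Optional


def assign_usubjid(studyid: str, siteid: str, subjid: str,
                   prior_usubjid: Optional[str] = None) -> str:
    """Return a USUBJID, reusing one already assigned to the same person.

    The STUDYID-SITEID-SUBJID concatenation is only one common convention;
    CDISC does not require any particular format.
    """
    if prior_usubjid:
        return prior_usubjid  # known repeat participant keeps the same ID
    return f"{studyid}-{siteid}-{subjid}"


def repeated_usubjids(dm_rows: list[dict]) -> list[str]:
    """Return USUBJIDs that occur on more than one DM row of the same study."""
    counts = Counter((row["STUDYID"], row["USUBJID"]) for row in dm_rows)
    return [usubjid for (_, usubjid), n in counts.items() if n > 1]


first = assign_usubjid("ACME01", "05", "001")
second = assign_usubjid("ACME14", "14", "017", prior_usubjid=first)
print(first, second)  # ACME01-05-001 ACME01-05-001
dm = [{"STUDYID": "ACME01", "USUBJID": first},
      {"STUDYID": "ACME14", "USUBJID": second}]
print(repeated_usubjids(dm))  # []
```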
4.1.2.4 CASE USE OF TEXT IN SUBMITTED DATA
It is recommended that text data be submitted in upper case. Exceptions may include long text data (such as
comment text); values of --TEST in Findings datasets (which may be more readable in title case if used as labels in
transposed views); and certain controlled terminology (see Section 4.1.3.2) that is already in mixed case. The
sponsor's define.xml may indicate, as a general note or assumption, whether case sensitivity applies to text data for
any or all variables in the dataset.
4.1.2.5 CONVENTION FOR MISSING VALUES
Missing values for individual data items should be represented by nulls. This is a change from previous versions of
the SDTMIG, which allowed sponsors to define their conventions for missing values. Conventions for using the SDTM
--STAT and --REASND variables to represent observations not done are addressed in Section 4.1.5.1.2 and in the
individual domain models.
4.1.2.6 GROUPING VARIABLES AND CATEGORIZATION
Grouping variables are Identifiers and Qualifiers that group records in the SDTM domains/datasets such as the
--CAT (Category) and --SCAT (Subcategory) variables assigned by sponsors to categorize topic-variable values. For
example, a lab record with LBTEST = "SODIUM" might have LBCAT = "CHEMISTRY" and LBSCAT =
"ELECTROLYTES". Values for --CAT and --SCAT should not be redundant with the domain name or dictionary
classification provided by --DECOD and --BODSYS.
1. Hierarchy of Grouping Variables
STUDYID
DOMAIN
--CAT
--SCAT
USUBJID
--GRPID
2. How Grouping Variables Group Data
A. For the subject
1. All records with the same USUBJID value are a group of records that describe that subject.
B. Across subjects (records with different USUBJID values)
1. All records with the same STUDYID value are a group of records that describe that study
2. All records with the same DOMAIN value are a group of records that describe that domain
3. --CAT (Category) and --SCAT (Sub-category) values further subset groups within the domain.
Generally, --CAT/--SCAT values have meaning within a particular domain. However, it is
possible to use the same values for --CAT/--SCAT in related domains (e.g., MH and AE). When
values are used across domains, the meanings should be the same. Examples of where
--CAT/--SCAT may have meaning across domains/datasets:
a. Some limited cases where they will have meaning across domains within the same
general observation class, because those domains contain similar conceptual information.
Adverse Events (AE), Medical History (MH) and Clinical Events (CE), for example, are
conceptually the same data, the only differences being when the event started relative to
the study start and whether the event is considered a regulatory reportable adverse event
in the study. In oncology trials, neurotoxicities collected on separate Medical History CRFs (MH
domain) and Adverse Event CRFs (AE domain) could both identify/collect "Paresthesia of the left
Arm." In both domains, the --CAT variable could have the value of NEUROTOXICITY.
b. Cases where multiple datasets are necessary to capture data in the same domain. As an
example, perhaps the existence, start and stop date of "Paresthesia of the left Arm" is
reported as an Adverse Event (AE domain), but the severity of the event is captured at
multiple visits and recorded as Findings About (FA dataset). In both cases the --CAT
variable could have a value of NEUROTOXICITY.
c. Cases where multiple domains are necessary to capture data that was collected together and
have an implicit relationship, perhaps identified in the Related Records (RELREC) special
purpose dataset. Stress Test data collection, for example, may capture the following:
i. Information about the occurrence, start, stop, and duration of the test in an
Events or Interventions custom general observation class dataset
ii. Vital Signs recorded during the stress test (VS domain)
iii. Treatments (e.g., oxygen) administered during the stress test (in an Interventions
domain).
In such cases, the data collected during the stress test and recorded in three separate domains
may all have the --CAT/--SCAT value STRESS TEST, which identifies them as having been collected
during the stress test.
C. Within subjects (records with the same USUBJID values)
1. --GRPID values further group (subset) records within USUBJID. All records in the same domain with the
same --GRPID value are a group of records within USUBJID. Unlike --CAT and --SCAT, --GRPID values
are not intended to have any meaning across subjects and are usually assigned during or after data collection.
2. Although --SPID and --REFID are Identifier variables, usually not considered to be grouping variables, they
may have meaning across domains.
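Because --GRPID is meaningful only within a subject, any grouping by --GRPID has to include USUBJID in the key. A minimal Python sketch with invented CM records, in which two subjects happen to use the same CMGRPID value for unrelated groups; the helper name is hypothetical.

```python
from collections import defaultdict


def grpid_groups(records: list[dict], prefix: str) -> dict[tuple, list]:
    """Group --SEQ values by (USUBJID, --GRPID).

    --GRPID has no meaning across subjects, so USUBJID is always part of
    the grouping key; records with a null --GRPID are left ungrouped.
    """
    groups = defaultdict(list)
    for rec in records:
        grpid = rec.get(prefix + "GRPID")
        if grpid:
            groups[(rec["USUBJID"], grpid)].append(rec[prefix + "SEQ"])
    return dict(groups)


# Invented CM records: both subjects use CMGRPID "1", but the two
# groups are unrelated.
cm = [
    {"USUBJID": "001", "CMSEQ": 1, "CMGRPID": "1"},
    {"USUBJID": "001", "CMSEQ": 2, "CMGRPID": "1"},
    {"USUBJID": "002", "CMSEQ": 1, "CMGRPID": "1"},
    {"USUBJID": "002", "CMSEQ": 2, "CMGRPID": None},
]
print(grpid_groups(cm, "CM"))  # {('001', '1'): [1, 2], ('002', '1'): [1]}
```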
3. Differences between Grouping Variables
A. The primary distinctions between --CAT/--SCAT and --GRPID are:
--CAT/--SCAT are known (identified) about the data before it is collected
--CAT/--SCAT values group data across subjects
--CAT/--SCAT may have some controlled terminology
--GRPID is usually assigned during or after data collection at the discretion of the Sponsor
--GRPID groups data only within a subject
--GRPID values are sponsor-defined, and will not be subject to controlled terminology.
Therefore, data that would be the same across subjects is usually more appropriate in --CAT/--SCAT, and
data that would vary across subjects is usually more appropriate in --GRPID. For example, a Concomitant
Medication administered as part of a known combination therapy for all subjects (Mayo Clinic Regimen
for example) would more appropriately use --CAT/--SCAT to identify the medication as part of that
regimen. Groups of medications taken to treat an SAE, which are recorded with the SAE collection and may
form a different grouping of medications for each subject, would more appropriately use --GRPID.
In domains based on the Findings general observation class, the --RESCAT variable can be used to categorize results
after the fact. --CAT and --SCAT, by contrast, are generally pre-defined by the Sponsor or used by the investigator at
the point of collection, not after assessing the value of Findings results.
4.1.2.7 SUBMITTING FREE TEXT FROM THE CRF
Sponsors often collect free text data on a CRF to supplement a standard field. This often occurs as part of a list of
choices accompanied by "Other, specify." The manner in which these data are submitted will vary based on their
role.
4.1.2.7.1 “SPECIFY” VALUES FOR NON-RESULT QUALIFIER VARIABLES
When free-text information is collected to supplement a standard non-result Qualifier field, the free-text value
should be placed in the SUPP-- dataset described in Section 8.4. When applicable, controlled terminology should be
used for SUPP-- field names (QNAM) and their associated labels (QLABEL) (see Section 8.4 and Appendix C5).
For example, when a description of "Other Medically Important Serious Adverse Event" category is collected on a
CRF, the free text description should be stored in the SUPPAE dataset.
AESMIE=Y
SUPPAE QNAM=AESOSP, QLABEL=Other Medically Important SAE, QVAL=HIGH RISK FOR ADDITIONAL THROMBOSIS
Another example is a CRF that collects reason for dose adjustment with additional free-text description:
Reason for Dose Adjustment (EXADJ)      Describe
Adverse event                           _____________________
Insufficient response                   _____________________
Non-medical reason                      _____________________
The free text description should be stored in the SUPPEX dataset.
EXADJ=NONMEDICAL REASON
SUPPEX QNAM=EXADJDSC, QLABEL=Reason For Dose Adjustment, QVAL=PATIENT MISUNDERSTOOD INSTRUCTIONS
Note that QNAM references the "parent" variable name with the addition of "OTH," one of the standard variable-naming
fragments for "Other" (see Appendix D). Likewise, the label is a modification of the parent variable label.
When the CRF includes a list of values for a qualifier field that includes "Other" and the "Other" is supplemented
with a "Specify" free text field, then the manner in which the free text "Specify" value is submitted will vary based
on the sponsor's coding practice and analysis requirements. For example, consider a CRF that collects the
anatomical location of administration (EXLOC) of a study drug given as an injection:
Location of Injection
Right Arm Left Arm
Right Thigh Left Thigh
Other, Specify: _________________
An investigator has selected "OTHER" and specified "UPPER RIGHT ABDOMEN". Several options are available
for submission of this data:
1) If the sponsor wishes to maintain controlled terminology for the EXLOC field and limit the terminology to the 5
pre-specified choices, then the free text is placed in SUPPEX.
EXLOC=OTHER
SUPPEX QNAM=EXLOCOTH, QLABEL=Other Location of Dose Administration, QVAL=UPPER RIGHT ABDOMEN
2) If the sponsor wishes to maintain controlled terminology for EXLOC but will expand the terminology based on
values seen in the specify field, then the value of EXLOC will reflect the sponsor's coding decision and
SUPPEX could be used to store the verbatim text.
EXLOC=ABDOMEN
SUPPEX QNAM=EXLOCOTH, QLABEL=Other Location of Dose Administration, QVAL=UPPER RIGHT ABDOMEN
Note that the sponsor might choose a different value for EXLOC (e.g., UPPER ABDOMEN, TORSO)
depending on the sponsor's coding practice and analysis requirements.
3) If the sponsor does not require that controlled terminology be maintained and wishes for all responses to be
stored in a single variable, then EXLOC will be used and SUPPEX is not required.
EXLOC=UPPER RIGHT ABDOMEN
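The three options for the injection-location example can be summarized as a small Python function. This is an illustrative sketch, not a prescribed algorithm: the function name is hypothetical, and the coded term for option 2 (ABDOMEN here) stands in for whatever the sponsor's coding practice produces.

```python
from typing import Optional


def submit_other_location(verbatim: str, option: int,
                          coded: Optional[str] = None):
    """Return (EXLOC, SUPPEX record or None) for an "Other, Specify" location.

    option 1: keep the 5 pre-specified terms; EXLOC=OTHER, verbatim in SUPPEX.
    option 2: expand the terminology; EXLOC=coded term, verbatim in SUPPEX
              (the text says SUPPEX could be used, so it is optional).
    option 3: no controlled terminology; the verbatim text goes into EXLOC.
    """
    supp = {"QNAM": "EXLOCOTH",
            "QLABEL": "Other Location of Dose Administration",
            "QVAL": verbatim}
    if option == 1:
        return "OTHER", supp
    if option == 2:
        if coded is None:
            raise ValueError("option 2 needs the sponsor's coded term")
        return coded, supp
    if option == 3:
        return verbatim, None
    raise ValueError("option must be 1, 2 or 3")


for opt, code in [(1, None), (2, "ABDOMEN"), (3, None)]:
    exloc, supp = submit_other_location("UPPER RIGHT ABDOMEN", opt, code)
    print(opt, exloc, supp["QVAL"] if supp else "(no SUPPEX)")
```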
4.1.2.7.2 “SPECIFY” VALUES FOR RESULT QUALIFIER VARIABLES
When the CRF includes a list of values for a result field that includes "Other" and the "Other" is supplemented with
a "Specify" free text field, then the manner in which the free text "Specify" value is submitted will vary based on the
sponsor's coding practice and analysis requirements. For example, consider a CRF where the sponsor requests the
subject's eye color:
Eye Color
Brown Black
Blue Green
Other, specify: ________
An investigator has selected "OTHER" and specified "BLUEISH GRAY." As in the above discussion for non-result
Qualifier values, the sponsor has several options for submission:
1) If the sponsor wishes to maintain controlled terminology in the standard result field and limit the terminology to
the 5 pre-specified choices, then the free text is placed in --ORRES and the controlled terminology in
--STRESC.
SCTEST=Eye Color, SCORRES=BLUEISH GRAY, SCSTRESC=OTHER
2) If the sponsor wishes to maintain controlled terminology in the standard result field, but will expand the
terminology based on values seen in the specify field, then the free text is placed in --ORRES and the value of
--STRESC will reflect the sponsor's coding decision.
SCTEST=Eye Color, SCORRES=BLUEISH GRAY, SCSTRESC=GRAY
3) If the sponsor does not require that controlled terminology be maintained, the verbatim value will be copied to
--STRESC.
SCTEST=Eye Color, SCORRES=BLUEISH GRAY, SCSTRESC=BLUEISH GRAY
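The same three options for a result field, expressed as a hypothetical Python helper that returns the --ORRES and --STRESC pair; the coded value for option 2 is supplied by the sponsor's coding decision and is an assumption of the example.

```python
from typing import Optional


def specify_result(verbatim: str, option: int,
                   coded: Optional[str] = None) -> tuple[str, str]:
    """Return (--ORRES, --STRESC) for an "Other, specify" result value."""
    if option == 1:
        return verbatim, "OTHER"    # CT limited to the pre-specified list
    if option == 2:
        if coded is None:
            raise ValueError("option 2 needs the sponsor's coded term")
        return verbatim, coded      # CT expanded by the sponsor's coding
    if option == 3:
        return verbatim, verbatim   # no CT maintained; copy the verbatim text
    raise ValueError("option must be 1, 2 or 3")


print(specify_result("BLUEISH GRAY", 1))          # ('BLUEISH GRAY', 'OTHER')
print(specify_result("BLUEISH GRAY", 2, "GRAY"))  # ('BLUEISH GRAY', 'GRAY')
print(specify_result("BLUEISH GRAY", 3))          # ('BLUEISH GRAY', 'BLUEISH GRAY')
```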
Note that the rules for the use of "Other, Specify" for the Result Qualifier variable --OBJ are discussed in Section 6.4.3.
4.1.2.7.3 “SPECIFY” VALUES FOR TOPIC VARIABLES
Interventions: If a list of specific treatments is provided along with "Other, Specify", --TRT should be populated with
the name of the treatment found in the specified text. If the sponsor wishes to distinguish between the pre-specified
list of treatments and those recorded under "Other, Specify," the --PRESP variable could be used. For example:
Indicate which of the following concomitant medications was used to treat the subject's headaches:
Acetaminophen
Aspirin
Ibuprofen
Naproxen
Other: ______
If ibuprofen and diclofenac were reported, the CM dataset would include the following:
CMTRT=IBUPROFEN, CMPRESP=Y
CMTRT=DICLOFENAC, CMPRESP is null.
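The CMPRESP convention can be sketched in Python as follows. The helper name is hypothetical; it assumes checked boxes arrive as the pre-specified terms in upper case and represents a null CMPRESP as Python's None. Coding the free text to a standard drug name is outside the sketch.

```python
from typing import Optional

# The pre-specified choices printed on the CRF in the example above.
PRESPECIFIED = {"ACETAMINOPHEN", "ASPIRIN", "IBUPROFEN", "NAPROXEN"}


def cm_records(checked: list[str], other_text: Optional[str] = None) -> list[dict]:
    """Build CM records; CMPRESP=Y marks pre-specified treatments, and
    None (null) marks treatments written in under "Other"."""
    unknown = set(checked) - PRESPECIFIED
    if unknown:
        raise ValueError(f"not on the pre-specified list: {sorted(unknown)}")
    records = [{"CMTRT": trt, "CMPRESP": "Y"} for trt in checked]
    if other_text:
        # Upper-cased per the text-case recommendation; no coding is done here.
        records.append({"CMTRT": other_text.strip().upper(), "CMPRESP": None})
    return records


print(cm_records(["IBUPROFEN"], "Diclofenac"))
# [{'CMTRT': 'IBUPROFEN', 'CMPRESP': 'Y'}, {'CMTRT': 'DICLOFENAC', 'CMPRESP': None}]
```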
Events: "Other, Specify" for Events may be handled similarly to Interventions. --TERM should be populated with
the description of the event found in the specified text and --PRESP could be used to distinguish between
pre-specified and free text responses.
Findings: "Other, Specify" for tests may be handled similarly to Interventions. --TESTCD and --TEST should be
populated with the code and description of the test found in the specified text. If specific tests are not prespecified on
the CRF and the investigator has the option of writing free text for tests, then the name of the test would have to be
coded to ensure that all --TESTCD and --TEST values are controlled terminology and are not free text. For example,
a lab CRF has tests of Hemoglobin, Hematocrit and "Other, specify". The value the investigator wrote for "Other,
specify" is Prothrombin time with an associated result and units. The sponsor would submit the controlled
terminology for this test which is LBTESTCD = PT and LBTEST = Prothrombin Time.
4.1.2.8 MULTIPLE VALUES FOR A VARIABLE
4.1.2.8.1 MULTIPLE VALUES FOR AN INTERVENTION OR EVENT TOPIC VARIABLE
If multiple values are reported for a topic variable (i.e., --TRT in an Interventions general-observation-class dataset or
--TERM in an Events general-observation-class dataset), it is assumed that the sponsor will split the values into
multiple records or otherwise resolve the multiplicity as per the sponsor's standard data management procedures. For
example, if an adverse event term of "Headache and Nausea" or a concomitant medication of "Tylenol and Benadryl" is
reported, sponsors will often split the original report into separate records and/or query the site for clarification. By the
time of submission, the datasets should be in conformance with the record structures described in the SDTMIG. Note
that the Disposition dataset (DS) is an exception to the general rule of splitting multiple topic values into separate
records. For DS, one record for each disposition or protocol milestone is permitted according to the domain structure.
For cases of multiple reasons for discontinuation, see Section 6.2.2.1, Assumption 5.
4.1.2.8.2 MULTIPLE VALUES FOR A FINDINGS RESULT VARIABLE
If multiple result values (--ORRES) are reported for a test in a Findings class dataset, multiple records should be
submitted for that --TESTCD. Example:
EGTESTCD=RHYRATE, EGTEST=Rhythm and Rate, EGORRES=ATRIAL FIBRILLATION
EGTESTCD=RHYRATE, EGTEST=Rhythm and Rate, EGORRES=ATRIAL FLUTTER
Note that in this case, the sponsor's operational database may have a result-sequence variable as part of the natural key.
Some sponsors may elect to keep this variable in a Supplemental Qualifier record, while others may decide to use
--SPID or --SEQ to replace it. Dependent variables such as result Qualifiers should never be part of the natural key.
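Splitting multiple results into separate records can be sketched in Python as below. Storing the result position in EGSPID is just one of the options the text mentions (a Supplemental Qualifier or --SEQ are others); the helper name is hypothetical.

```python
def split_results(base: dict, results: list[str]) -> list[dict]:
    """Create one EG record per reported result for the same test.

    The position of each result is kept in EGSPID, standing in for the
    operational result-sequence variable.
    """
    records = []
    for position, result in enumerate(results, start=1):
        rec = dict(base)  # copy the shared test identifiers
        rec["EGORRES"] = result
        rec["EGSPID"] = str(position)
        records.append(rec)
    return records


base = {"EGTESTCD": "RHYRATE", "EGTEST": "Rhythm and Rate"}
for rec in split_results(base, ["ATRIAL FIBRILLATION", "ATRIAL FLUTTER"]):
    print(rec)
```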
4.1.2.8.3 MULTIPLE VALUES FOR A NON-RESULT QUALIFIER VARIABLE
The SDTM permits one value for each Qualifier variable per record. If multiple values exist (e.g., due to a "Check all that
apply" instruction on a CRF), then the value for the Qualifier variable should be "MULTIPLE" and SUPP-- should be
used to store the individual responses. It is recommended that the SUPP-- QNAM value reference the corresponding
standard domain variable with an appended number or letter. In some cases, the standard variable name will be shortened
to meet the 8-character variable name requirement, or it may be clearer to append a meaningful character string, as shown in
the second AE example below, where the first three characters of the drug name are appended. Likewise, the QLABEL value
should be similar to the standard label. The values stored in QVAL should be consistent with the controlled terminology
associated with the standard variable. See Section 8.4 for additional guidance on maintaining appropriately unique QNAM
values. The following example includes selected variables from the ae.xpt and suppae.xpt datasets for a rash whose
locations are the face, neck, and chest.
AE Dataset
AETERM | AELOC
RASH | MULTIPLE
SUPPAE Dataset
QNAM | QLABEL | QVAL
AELOC1 | Location of the Reaction 1 | FACE
AELOC2 | Location of the Reaction 2 | NECK
AELOC3 | Location of the Reaction 3 | CHEST
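The MULTIPLE/SUPP-- pattern can be sketched in Python as follows. The helper is hypothetical; it numbers QNAM values by appending a digit to the parent variable name, as in the AELOC example, and raises an error rather than guessing a shortened name when the result would exceed 8 characters.

```python
from typing import Optional


def multiple_qualifier(parent_var: str, label: str,
                       values: list[str]) -> tuple[Optional[str], list[dict]]:
    """Return (parent value, SUPP-- records) for a "check all that apply" field.

    One value goes straight into the parent variable; two or more set the
    parent to MULTIPLE and become numbered SUPP-- records.
    """
    if not values:
        return None, []  # nothing checked: null
    if len(values) == 1:
        return values[0], []
    supp = []
    for i, value in enumerate(values, start=1):
        qnam = f"{parent_var}{i}"
        if len(qnam) > 8:
            raise ValueError(f"QNAM {qnam!r} exceeds 8 characters; shorten it")
        supp.append({"QNAM": qnam, "QLABEL": f"{label} {i}", "QVAL": value})
    return "MULTIPLE", supp


aeloc, suppae = multiple_qualifier("AELOC", "Location of the Reaction",
                                   ["FACE", "NECK", "CHEST"])
print(aeloc)
for rec in suppae:
    print(rec)
```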
In some cases, values for QNAM and QLABEL more specific than those above may be needed. For example, a
sponsor might conduct a study with two study drugs (e.g., an open-label study of Abcicin + Xyzamin), and may require
the investigator to assess causality and describe the action taken with each drug for the rash:
AE Dataset
AETERM | AEREL | AEACN
RASH | MULTIPLE | MULTIPLE
SUPPAE Dataset
QNAM | QLABEL | QVAL
AERELABC | Causality of Abcicin | POSSIBLY RELATED
AERELXYZ | Causality of Xyzamin | UNLIKELY RELATED
AEACNABC | Action Taken with Abcicin | DOSE REDUCED
AEACNXYZ | Action Taken with Xyzamin | DOSE NOT CHANGED
In each of the above examples, the use of SUPPAE should be documented in the metadata and the annotated CRF.
The controlled terminology used should be documented as part of value-level metadata.
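The splitting rule above can be sketched in Python. This is a minimal, hypothetical helper (`split_multiple_values` is not part of any CDISC tool): it builds only the QNAM, QLABEL, and QVAL columns shown in the examples and omits the other SUPP-- key variables (STUDYID, RDOMAIN, USUBJID, IDVAR, IDVARVAL). It assumes the individual responses have already been mapped to the controlled terminology of the standard variable.

```python
def split_multiple_values(parent, variable, values, label):
    """Return (parent_record, supp_records) for a possibly multi-valued Qualifier.

    parent   -- dict holding the parent-domain record, e.g. {"AETERM": "RASH"}
    variable -- standard Qualifier variable name, e.g. "AELOC"
    values   -- individual responses, e.g. ["FACE", "NECK", "CHEST"]
    label    -- standard variable label, e.g. "Location of the Reaction"
    """
    record = dict(parent)
    if not values:
        return record, []  # nothing collected: leave the Qualifier unpopulated
    if len(values) == 1:
        # A single response goes directly into the standard variable.
        record[variable] = values[0]
        return record, []
    record[variable] = "MULTIPLE"
    supp = []
    for i, value in enumerate(values, start=1):
        qnam = f"{variable}{i}"  # standard name plus an appended number
        if len(qnam) > 8:
            # Variable names are limited to 8 characters; shortening the
            # stem is left to the sponsor rather than automated here.
            raise ValueError(f"QNAM {qnam!r} exceeds 8 characters")
        supp.append({"QNAM": qnam, "QLABEL": f"{label} {i}", "QVAL": value})
    return record, supp


ae, suppae = split_multiple_values(
    {"AETERM": "RASH"}, "AELOC", ["FACE", "NECK", "CHEST"],
    "Location of the Reaction")
```

With the rash example, `ae` holds AELOC = "MULTIPLE" and `suppae` holds the three AELOC1 to AELOC3 records shown in the first SUPPAE table.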
If the sponsor has clearly documented that one response is of primary interest (e.g., in the CRF, protocol, or analysis
plan), the standard domain variable may be populated with the primary response and SUPP-- may be used to store
the secondary response(s). For example, if Abcicin is designated as the primary study drug in the example above:
AE Dataset
AETERM  AEREL             AEACN
RASH    POSSIBLY RELATED  DOSE REDUCED

SUPPAE Dataset
QNAM    QLABEL                     QVAL
AERELX  Causality of Xyzamin       UNLIKELY RELATED
AEACNX  Action Taken with Xyzamin  DOSE NOT CHANGED
Note that in the latter case the label for standard variables AEREL and AEACN will have no indication that they
pertain to Abcicin. This association must be clearly documented in the metadata and annotated CRF.
4.1.3 CODING AND CONTROLLED TERMINOLOGY ASSUMPTIONS
PLEASE NOTE: Examples provided in the column “CDISC Notes” are only examples and not intended to imply
controlled terminology. Please check current controlled terminology at this link:
http://www.cancer.gov/cancertopics/terminologyresources/CDISC
4.1.3.1 TYPES OF CONTROLLED TERMINOLOGY
For SDTMIG V3.1.1, the presence of a single asterisk (*) or a double asterisk (**) in the “Controlled Terms or
Format” column indicated that a discrete set of values (controlled terminology) was expected for the variable. This
set of values was sponsor-defined in cases where standard vocabularies had not yet been defined (represented by *)
or from an external published source such as MedDRA (represented by **). For V3.1.2, controlled terminology is
now represented in one of three ways:
• A single asterisk, when there is no specific controlled terminology (CT) available at the current time, but the
SDS Team expects that sponsors may have their own CT and/or the CDISC Controlled Terminology Team may be
developing CT.
• A list of controlled terms for the variable, when values are not yet maintained externally.
• The name of an external codelist whose values can be found via the hyperlinks in either the domain or
Appendix C.
In addition, the “Controlled Terms or Format” column has been used to indicate a common format such as ISO 8601.
4.1.3.2 CONTROLLED TERMINOLOGY TEXT CASE
It is recommended that controlled terminology be submitted in upper case text for all cases other than those
described as exceptions below. Deviations from this rule should be described in the define.xml.
a. If the external reference for the controlled terminology is not in upper case then the data should
conform to the case prescribed in the external reference (e.g., MedDRA and LOINC).
b. Units, which are considered symbols rather than abbreviated text (e.g., mg/dL).
4.1.3.3 CONTROLLED TERMINOLOGY VALUES
The controlled terminology or a link to the controlled terminology should be included in the define.xml wherever
applicable. All values in the permissible value set for the study should be included, whether they are represented in
the submitted data or not. Note that a null value should not be included in the permissible value set. A null value is
implied for any list of controlled terms unless the variable is “Required” (see Section 4.1.1.5).
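A minimal sketch of the permissible-value rule, assuming the study's value set has already been read from define.xml into a Python set (the function name `check_ct_value` is hypothetical). The null value is handled through the Required flag rather than by listing it in the set, and the comparison is case-sensitive, so a lower-case submission of an upper-case term is rejected.

```python
def check_ct_value(value, permissible, required=False):
    """Return True if `value` is acceptable for a controlled-terminology variable.

    permissible -- set of permissible values for the study (null not included)
    required    -- True if the variable is "Required" (nulls not allowed)
    """
    if value is None or value == "":
        # A null value is implied for every codelist unless the variable is Required.
        return not required
    # Case-sensitive membership: "y" does not match the controlled term "Y".
    return value in permissible
```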
4.1.3.4 USE OF CONTROLLED TERMINOLOGY AND ARBITRARY NUMBER CODES
Controlled terminology or decoded text should be used instead of arbitrary number codes in order to reduce
ambiguity for submission reviewers. For example, for concomitant medications, the verbatim term and/or dictionary
term should be presented, rather than numeric codes. Separate code values may be submitted as Supplemental
Qualifiers and may be necessary in analysis datasets.
4.1.3.5 STORING CONTROLLED TERMINOLOGY FOR SYNONYM QUALIFIER VARIABLES
• For events such as AEs and Medical History, populate --DECOD with the dictionary's preferred term and
populate --BODSYS with the preferred body system name. If a dictionary is multi-axial, the value in
--BODSYS should represent the system organ class (SOC) used for the sponsor's analysis and summary
tables, which may not necessarily be the primary SOC.
• For concomitant medications, populate CMDECOD with the drug's generic name and populate CMCLAS
with the drug class used for the sponsor's analysis and summary tables. If coding to multiple classes, follow
assumption 4.1.2.8.1 or omit CMCLAS.
In either case, no other intermediate levels (e.g., MedDRA LLT, HLT, HLGT) or relationships should be stored in
the dataset. These may be provided in a Supplemental Qualifiers dataset (see Section 8.4 and Appendix C5 for more
information). By knowing the dictionary and version used, the reviewer will be able to obtain intermediate levels in
a hierarchy (as in MedDRA), or a drug's ATC codes (as in WHO Drug). The sponsor is expected to provide the
dictionary name and version used to map the terms by utilizing the define.xml external codelist attributes.
4.1.3.6 STORING TOPIC VARIABLES FOR GENERAL DOMAIN MODELS
The topic variable for the Interventions and Events general-observation-class models is often stored as verbatim text.
For an Events domain, the topic variable is --TERM. For an Interventions domain, the topic variable is --TRT. For a
Findings domain, the topic variable, --TESTCD, should use Controlled Terminology (e.g., SYSBP for Systolic
Blood Pressure). If CDISC standard controlled terminology exists, it should be used; otherwise sponsors should
define their own controlled list of terms. If the verbatim topic variable in an Interventions or Events domain is
modified to facilitate coding, the modified text is stored in --MODIFY. In most cases (other than PE), the
dictionary-coded text is derived into --DECOD. Because for PE it is the PEORRES variable, rather than the topic
variable, that is modified, the dictionary-derived text is placed in PESTRESC. The variables used in each of the
defined domains are:
Domain  Original Verbatim  Modified Verbatim  Standardized Value
AE      AETERM             AEMODIFY           AEDECOD
DS      DSTERM                                DSDECOD
CM      CMTRT              CMMODIFY           CMDECOD
MH      MHTERM             MHMODIFY           MHDECOD
PE      PEORRES            PEMODIFY           PESTRESC
4.1.3.7 USE OF “YES” AND “NO” VALUES
Variables where the response is “Yes” or “No” (“Y” or “N”) should normally be populated for both “Y” and “N”
responses. This eliminates confusion regarding whether a blank response indicates “N” or is a missing value.
However, some variables are collected or derived in a manner that allows only one response, such as when a single
check box indicates “Yes”. In situations such as these, where it is unambiguous to populate only the response of
interest, it is permissible to populate only one value (“Y” or “N”) and leave the alternate value blank. An example of
when it would be acceptable to use only a value of “Y” would be for Baseline Flag (--BLFL) variables, where “N” is
not necessary to indicate that a value is not a baseline value.
Note: Permissible values for variables with controlled terms of “Y” or “N” may be extended to include “U” or “NA” if
it is the sponsor's practice to explicitly collect or derive values indicating “Unknown” or “Not Applicable” for that
variable.
4.1.4 ACTUAL AND RELATIVE TIME ASSUMPTIONS
Timing variables (Table 2.2.5 of the SDTM) are an essential component of all SDTM subject-level domain datasets.
In general, all domains based on the three general observation classes should have at least one Timing variable. In
the Events or Interventions general observation class this could be the start date of the event or intervention. In the
Findings observation class where data are usually collected at multiple visits, at least one Timing variable must be
used.
The SDTMIG requires dates and times of day to be stored according to the international standard ISO 8601
(http://www.iso.org). ISO 8601 provides a text-based representation of dates and/or times, intervals of time, and
durations of time.
4.1.4.1 FORMATS FOR DATE/TIME VARIABLES
An SDTM DTC variable may include data that is represented in ISO 8601 format as a complete date/time, a partial
date/time, or an incomplete date/time.
The SDTMIG template uses ISO 8601 for calendar dates and times of day, which are expressed as follows:
ο YYYY-MM-DDThh:mm:ss
where:
ο [YYYY] = four-digit year
ο [MM] = two-digit representation of the month (01-12, 01=January, etc.)
ο [DD] = two-digit day of the month (01 through 31)
ο [T] = (time designator) indicates time information follows
ο [hh] = two digits of hour (00 through 23) (am/pm is NOT allowed)
ο [mm] = two digits of minute (00 through 59)
ο [ss] = two digits of second (00 through 59)
Other characters defined for use within the ISO 8601 standard are:
ο [-] (hyphen): to separate the time elements "year" from "month" and "month" from "day" and to represent
missing date components.
ο [:] (colon): to separate the time elements "hour" from "minute" and "minute" from "second"
ο [/] (solidus): to separate components in the representation of date/time intervals
ο [P] (duration designator): precedes the components that represent the duration
ο NOTE: Spaces are not allowed in any ISO 8601 representations
Key aspects of the ISO 8601 standard are as follows:
• ISO 8601 represents dates as a text string using the notation YYYY-MM-DD.
• ISO 8601 represents times as a text string using the notation hh:mm:ss.
• The SDTM and SDTMIG require use of the ISO 8601 Extended format, which requires hyphen delimiters
for date components and colon delimiters for time components. The ISO 8601 basic format, which does not
require delimiters, should not be used in SDTM datasets.
• When a date is stored with a time in the same variable (as a date/time), the date is written in front of the
time and the time is preceded with “T” using the notation YYYY-MM-DDThh:mm:ss
(e.g., 2001-12-26T00:00:01).
Implementation of the ISO 8601 standard means that date/time variables are character/text data types. The SDS
fragment employed for date/time character variables is DTC.
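The Extended-format requirement can be checked with a regular expression. The sketch below is a deliberate simplification and its function name is hypothetical: it accepts complete and right-truncated date/time values only, and does not handle omitted intermediate components, intervals, durations, fractional seconds, or time-zone offsets, nor does it check that the day exists in the given month.

```python
import re

# Extended format only: hyphens between date parts, colons between time parts.
# Each later component is optional, so right-truncated values are accepted.
_DTC = re.compile(
    r"\d{4}"                        # YYYY
    r"(-(0[1-9]|1[0-2])"            # -MM
    r"(-(0[1-9]|[12]\d|3[01])"      # -DD
    r"(T([01]\d|2[0-3])"            # Thh (24-hour clock, no am/pm)
    r"(:[0-5]\d"                    # :mm
    r"(:[0-5]\d)?"                  # :ss
    r")?)?)?)?",
    re.ASCII,
)


def is_iso8601_dtc(value):
    """True if value is a complete or right-truncated extended-format DTC."""
    return _DTC.fullmatch(value) is not None
```

Basic-format values such as 20011226, and values containing spaces, do not match.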
4.1.4.2 DATE/TIME PRECISION
The concept of representing date/time precision is handled through use of the ISO 8601 standard. According to ISO
8601, precision (also referred to by ISO 8601 as "completeness" or "representations with reduced accuracy") can be
inferred from the presence or absence of components in the date and/or time values. Missing components are
represented by right truncation or a hyphen (for intermediate components that are missing). If the date and time
values are completely missing, the SDTM date field should be null. Years are represented as four digits and every
other component as two digits; one-digit numbers are always padded with a leading zero.
The table below provides examples of ISO 8601 representations of complete and truncated date/time values, using
ISO 8601 "appropriate right truncations" for incomplete date/time representations. Note that if no time component is
represented, the [T] time designator (in addition to the missing time) must be omitted in ISO 8601 representation.
   Date and Time as Originally Recorded  Precision                     ISO 8601 Date/Time
1  December 15, 2003 13:14:17            Complete date/time            2003-12-15T13:14:17
2  December 15, 2003 13:14               Unknown seconds               2003-12-15T13:14
3  December 15, 2003 13                  Unknown minutes and seconds   2003-12-15T13
4  December 15, 2003                     Unknown time                  2003-12-15
5  December, 2003                        Unknown day and time          2003-12
6  2003                                  Unknown month, day, and time  2003
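Because precision is implied by which components are present, it can be recovered mechanically from a right-truncated value. The helper below is hypothetical and assumes the input uses only right truncation; values with omitted intermediate components (described below) are rejected rather than interpreted.

```python
# Finest-to-coarsest labels for the six ISO 8601 date/time components.
_LEVELS = ["year", "month", "day", "hour", "minute", "second"]


def dtc_precision(dtc):
    """Return the finest known component of a right-truncated DTC value."""
    if not dtc:
        return None  # completely missing date/time: the DTC field is null
    date, sep, time = dtc.partition("T")
    parts = date.split("-") + (time.split(":") if sep else [])
    # Reject empty components (e.g. "2003---15" or a trailing "T") and a time
    # portion attached to an incomplete date.
    if "" in parts or len(parts) > 6 or (sep and len(date.split("-")) != 3):
        raise ValueError(f"not a right-truncated DTC value: {dtc!r}")
    return _LEVELS[len(parts) - 1]
```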
This date and date/time model also provides for imprecise or estimated dates, such as those commonly seen in
Medical History. To represent such an interval of uncertainty while applying the ISO 8601 standard, it is recommended
that the sponsor concatenate the date/time values (using the most complete representation of the date/time known) that
describe the beginning and the end of the interval and separate them with a solidus, as shown in the table below:
   Interval of Uncertainty                                                ISO 8601 Date/Time
1  Between 10:00 and 10:30 on the Morning of December 15, 2003            2003-12-15T10:00/2003-12-15T10:30
2  Between the first of this year (2003) until "now" (February 15, 2003)  2003-01-01/2003-02-15
3  Between the first and the tenth of December, 2003                      2003-12-01/2003-12-10
4  Sometime in the first half of 2003                                     2003-01-01/2003-06-30
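When the uncertainty spans whole calendar months, as in row 4, the end of the interval depends on the length of the last month. A small sketch using the Python standard library's `calendar.monthrange`, with a hypothetical function name and assuming both months fall in the same year:

```python
import calendar


def month_span_interval(year, first_month, last_month):
    """Return 'YYYY-MM-01/YYYY-MM-LL' covering first_month..last_month of one year."""
    if not 1 <= first_month <= last_month <= 12:
        raise ValueError("months must satisfy 1 <= first_month <= last_month <= 12")
    last_day = calendar.monthrange(year, last_month)[1]  # number of days in last_month
    start = f"{year:04d}-{first_month:02d}-01"
    end = f"{year:04d}-{last_month:02d}-{last_day:02d}"
    return f"{start}/{end}"  # the solidus separates the two bounds
```

For example, `month_span_interval(2003, 1, 6)` gives the row 4 value, 2003-01-01/2003-06-30.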
Other uncertainty intervals may be represented by the omission of components of the date when these components
are unknown or missing. As mentioned above, ISO 8601 represents missing intermediate components through the
use of a hyphen where the missing component would normally be represented. This may be used in addition to
"appropriate right truncations" for incomplete date/time representations. When components are omitted, the
expected delimiters must still be kept in place and only a single hyphen is to be used to indicate an omitted
component. Examples of this method of omitted component representation are shown in the table below:
   Date and Time as Originally Recorded                          Level of Uncertainty                                 ISO 8601 Date/Time
1  December 15, 2003 13:15:17                                    Complete date/time                                   2003-12-15T13:15:17
2  December 15, 2003 ??:15                                       Unknown hour with known minutes                      2003-12-15T-:15
3  December 15, 2003 13:??:17                                    Unknown minutes with known date, hours, and seconds  2003-12-15T13:-:17
4  The 15th of some month in 2003, time not collected            Unknown month and time with known year and day       2003---15
5  December 15, but can't remember the year, time not collected  Unknown year with known month and day                --12-15
6  7:15 of some unknown date                                     Unknown date with known hour and minute              -----T07:15
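The two rules (right truncation for trailing unknown components; a single hyphen for each omitted intermediate component, with the expected delimiters kept in place) can be combined in one formatter. This is an illustrative sketch with a hypothetical function name; components are passed as integers, and None means unknown. It reproduces the ISO 8601 values in the table above.

```python
def format_dtc(year=None, month=None, day=None, hour=None, minute=None, second=None):
    """Build an ISO 8601 DTC string from the known date/time components."""
    comps = [year, month, day, hour, minute, second]
    widths = [4, 2, 2, 2, 2, 2]
    known = [i for i, v in enumerate(comps) if v is not None]
    if not known:
        return ""  # completely missing: the DTC variable is null
    last = known[-1]  # components after this one are right-truncated

    def text(i):
        # An unknown intermediate component becomes a single hyphen.
        return "-" if comps[i] is None else f"{comps[i]:0{widths[i]}d}"

    if last <= 2:
        # No time component: the T designator is omitted as well.
        return "-".join(text(i) for i in range(last + 1))
    date = "-".join(text(i) for i in range(3))
    time = ":".join(text(i) for i in range(3, last + 1))
    return f"{date}T{time}"
```

For example, `format_dtc(2003, day=15)` yields 2003---15 (row 4) and `format_dtc(hour=7, minute=15)` yields -----T07:15 (row 6).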
Note that Row 6 above, where a time is reported with no date information, represents a very unusual situation. Since
most data are collected as part of a visit, when only a time appears on a CRF, it is expected that the date of the visit
would usually be used as the date of collection.
Using a character-based data type to implement the ISO 8601 date/time standard will ensure that the date/time
information will be machine and human readable without the need for further manipulation, and will be platform
and software independent.
4.1.4.3 INTERVALS OF TIME AND USE OF DURATION FOR --DUR VARIABLES
4.1.4.3.1 INTERVALS OF TIME AND USE OF DURATION FOR --DUR VARIABLES
As defined by ISO 8601, an interval of time is the part of a time axis, limited by two time "instants" such as the
times represented in SDTM by the variables --STDTC and --ENDTC. These variables represent the two instants that
bound an interval of time, while the duration is the quantity of time that is equal to the difference between these time
points.
ISO 8601 allows an interval to be represented in multiple ways. One representation, shown below, uses two dates in
the format:
YYYY-MM-DDThh:mm:ss/YYYY-MM-DDThh:mm:ss
While the above would represent the interval (by providing the start date/time and end date/time to "bound" the
interval of time), it does not provide the value of the duration (the quantity of time).
Duration is frequently used during a review; however, the duration timing variable (--DUR) should generally be
used in a domain if it was collected in lieu of a start date/time (--STDTC) and end date/time (--ENDTC). If both
--STDTC and --ENDTC are collected, durations can be calculated by the difference in these two values, and need
not be in the submission dataset.
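When both --STDTC and --ENDTC are complete, the duration can be derived rather than stored in --DUR. A minimal sketch, assuming complete date or date/time values that Python's `datetime.fromisoformat` accepts, no time zones, and whole seconds (the function name is hypothetical). It deliberately expresses the result in days, hours, minutes, and seconds only, because converting to months or years would depend on calendar conventions this section does not specify.

```python
from datetime import datetime


def duration_from_dtc(stdtc, endtc):
    """ISO 8601 duration between two complete DTC values, e.g. 'P1DT2H30M'."""
    delta = datetime.fromisoformat(endtc) - datetime.fromisoformat(stdtc)
    if delta.total_seconds() < 0:
        raise ValueError("end date/time precedes start date/time")
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{delta.days}D" if delta.days else ""
    time_part = "".join(
        f"{n}{unit}" for n, unit in ((hours, "H"), (minutes, "M"), (seconds, "S")) if n
    )
    if not date_part and not time_part:
        return "PT0S"  # zero-length interval
    # T is included only when a time component follows it.
    return "P" + date_part + ("T" + time_part if time_part else "")
```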
Both duration and duration units can be provided in the single --DUR variable, in accordance with the ISO 8601
standard. The values provided in --DUR should follow one of the following ISO 8601 duration formats:
PnYnMnDTnHnMnS or PnW
where:
• [P] (duration designator): precedes the alphanumeric text string that represents the duration. NOTE: The
use of the character P is based on the historical use of the term "period" for duration.
• [n] represents a positive number or zero.
• [W] is used as the week designator, following the number of weeks in the duration (e.g., P6W represents
6 weeks of calendar time).
The letter "P" must precede other values in the ISO 8601 representation of duration. The “n” preceding each letter
represents the number of Years, Months, Days, Hours, Minutes, Seconds, or the number of Weeks. As with the
date/time format, “T” is used to separate the date components from time components.
Note that weeks cannot be mixed with any other date/time components such as days or months in duration expressions.
As is the case with the date/time representation in --DTC, --STDTC, or --ENDTC, only the components of duration
that are known or collected need to be represented. Also, as with the date/time representation, if no time
component is represented, the [T] time designator (in addition to the missing time) must be omitted in ISO 8601
representation.
ISO 8601 also allows the lowest-order component of a duration to be represented in decimal format. This may be
useful if data are collected in formats such as "one and one-half years", "two and one-half weeks", "one-half week",
or "one quarter of an hour" and the sponsor wishes to represent this "precision" (or
lack of precision) in ISO 8601 representation. Remember that this is ONLY allowed in the lowest-order (right-most)
component in any duration representation.
The table below provides some examples of ISO-8601-compliant representations of durations:
Duration as originally recorded  ISO 8601 Duration
2 Years                          P2Y
10 weeks                         P10W
3 Months 14 days                 P3M14D
3 Days                           P3D
6 Months 17 Days 3 Hours         P6M17DT3H
14 Days 7 Hours 57 Minutes       P14DT7H57M
42 Minutes 18 Seconds            PT42M18S
One-half hour                    PT0.5H
5 Days 12¼ Hours                 P5DT12.25H
4 ½ Weeks                        P4.5W
Note that a leading zero is required with decimal values less than one.
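The format rules above (weeks standing alone, T present only when a time component follows, a decimal only in the right-most component and always with a leading digit) can be expressed as one regular expression plus two checks. This sketch assumes a period as the decimal separator, as in the examples, and its function name is hypothetical.

```python
import re

_NUM = r"\d+(?:\.\d+)?"  # digits, with an optional decimal part after a leading digit
_DURATION = re.compile(
    rf"P(?:(?P<W>{_NUM})W"                                    # PnW on its own
    rf"|(?:(?P<Y>{_NUM})Y)?(?:(?P<Mo>{_NUM})M)?(?:(?P<D>{_NUM})D)?"
    rf"(?:T(?:(?P<H>{_NUM})H)?(?:(?P<Mi>{_NUM})M)?(?:(?P<S>{_NUM})S)?)?)",
    re.ASCII,
)


def is_iso8601_duration(value):
    """True if value follows PnYnMnDTnHnMnS or PnW as described above."""
    m = _DURATION.fullmatch(value)
    if m is None:
        return False
    if m.group("W") is not None:
        return True  # weeks cannot be mixed with any other component
    present = [m.group(k) for k in ("Y", "Mo", "D", "H", "Mi", "S") if m.group(k) is not None]
    if not present or value.endswith("T"):
        return False  # "P"/"PT" alone, or a T with no time component after it
    # A decimal is allowed only in the lowest-order (right-most) component.
    return all("." not in v for v in present[:-1])
```

Every value in the table above passes; values such as P1W2D, PT.5H, or P1.5DT2H are rejected.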
4.1.4.3.2 INTERVAL WITH UNCERTAINTY
When an interval of time is an amount of time (duration) following an event whose start date/time is recorded (with
some level of precision, i.e., when one knows the start date/time and the duration following it), the
correct ISO 8601 usage to represent this interval is as follows:
YYYY-MM-DDThh:mm:ss/PnYnMnDTnHnMnS
where the start date/time is represented before the solidus [/], the "Pn…" following the solidus represents a
“duration”, and the entire representation is known as an “interval”. NOTE: This is the recommended representation
of elapsed time, given a start date/time and the duration elapsed.
When an interval of time is an amount of time (duration) measured prior to an event whose date/time is
recorded (with some level of precision, i.e., when one knows the end date/time and the duration preceding it),
the syntax is:
PnYnMnDTnHnMnS/YYYY-MM-DDThh:mm:ss
where the duration, "Pn…", is represented before the solidus [/], the end date/time is represented following the
solidus, and the entire representation is known as an “interval”.
4.1.4.4 USE OF THE “STUDY DAY” VARIABLES
The permissible Study Day variables (--DY, --STDY, and --ENDY) describe the relative day of the observation
starting with the reference date as Day 1. They are determined by comparing the date portion of the respective
date/time variables (--DTC, --STDTC, and --ENDTC) to the date portion of the Subject Reference Start Date
(RFSTDTC from the Demographics domain).
The Subject Reference Start Date (RFSTDTC) is designated as Study Day 1. The Study Day value is incremented by
1 for each date following RFSTDTC. Dates prior to RFSTDTC are decremented by 1, with the date preceding
RFSTDTC designated as Study Day -1 (there is no Study Day 0). This algorithm for determining Study Day is
consistent with how people typically describe sequential days relative to a fixed reference point, but creates
problems if used for mathematical calculations because it does not allow for a Day 0. As such, Study Day is not
suited for use in subsequent numerical computations, such as calculating duration. The raw date values should be
used rather than Study Day in those calculations.
All Study Day values are integers. Thus, to calculate Study Day:
--DY = (date portion of --DTC) - (date portion of RFSTDTC) + 1 if --DTC is on or after RFSTDTC
--DY = (date portion of --DTC) - (date portion of RFSTDTC) if --DTC precedes RFSTDTC
This algorithm should be used across all domains.
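The Study Day rule translates directly into code. A minimal sketch with a hypothetical function name, assuming both values carry at least a complete date portion (YYYY-MM-DD); when either date portion is incomplete, the helper returns None rather than imputing a day, which is an assumption and not a rule stated in this section.

```python
from datetime import date


def _date_part(dtc):
    """Date portion of a DTC value, or None if it is not a complete date."""
    try:
        return date.fromisoformat(dtc[:10])
    except (TypeError, ValueError):
        return None


def study_day(dtc, rfstdtc):
    """--DY relative to RFSTDTC: RFSTDTC is Day 1 and there is no Day 0."""
    obs, ref = _date_part(dtc), _date_part(rfstdtc)
    if obs is None or ref is None:
        return None
    diff = (obs - ref).days
    # +1 on or after the reference date; unchanged (negative) before it.
    return diff + 1 if diff >= 0 else diff
```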
4.1.4.5 CLINICAL ENCOUNTERS AND VISITS
All domains based on the three general observation classes should have at least one timing variable. For domains in
the Events or Interventions observation classes, and for domains in the Findings observation class for which data
are collected only once during the study, the most appropriate timing variable may be a date (e.g., --DTC, --STDTC)
or some other timing variable. For studies that are designed with a prospectively defined schedule of visit-based
activities, domains for data that are to be collected more than once per subject (e.g., Labs, ECG, Vital Signs) are
expected to include VISITNUM as a timing variable.
Clinical encounters are described by the CDISC Visit variables. For planned visits, values of VISIT, VISITNUM,
and VISITDY must be those defined in the Trial Visits dataset (see Section 7.4). For planned visits:
• Values of VISITNUM are used for sorting and should, wherever possible, match the planned chronological
order of visits. Occasionally, a protocol will define a planned visit whose timing is unpredictable (e.g., one
planned in response to an adverse event, a threshold test value, or a disease event), and completely
chronological values of VISITNUM may not be possible in such a case.
• There should be a one-to-one relationship between values of VISIT and VISITNUM.
• For visits that may last more than one calendar day, VISITDY should be the planned day of the start of the visit.
Practices for populating visit variables for unplanned visits may vary across sponsors.
• VISITNUM should generally be populated, even for unplanned visits, as it is expected in many Findings
domains, as described above. The easiest method of populating VISITNUM for unplanned visits is to assign
the same value (e.g., 99) to all unplanned visits, but this method provides no differentiation between the
unplanned visits and does not provide chronological sorting. Methods that provide a one-to-one relationship
between visits and values of VISITNUM, that are consistent across domains, and that assign VISITNUM
values that sort chronologically require more work and must be applied after all of a subject's unplanned visits
are known.
• VISIT may be left null or may be populated with a generic value (e.g., "Unscheduled") for all unplanned
visits, or individual values may be assigned to different unplanned visits.
• VISITDY should not be populated for unplanned visits, since VISITDY is, by definition, the planned study
day of visit, and since the actual study day of an unplanned visit belongs in a --DY variable.
The following table shows an example of how the visit identifiers might be used for lab data:
USUBJID  VISIT               VISITNUM  VISITDY  LBDY
001      Week 1              2         7        7
001      Week 2              3         14       13
001      Week 2 Unscheduled  3.1                17
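One way to meet the goals above for unplanned visits (one-to-one, chronological, consistent across domains) is to number each unplanned visit as a decimal increment of the preceding planned visit, as with VISITNUM 3.1 in the table. The sketch below is a hypothetical sponsor convention, not a requirement of this guide. It must run after all of a subject's unplanned visits are known, and it assumes complete YYYY-MM-DD dates, integer planned VISITNUM values, and at most nine unplanned visits between planned visits, so that a decimal never collides with the next planned VISITNUM.

```python
from bisect import bisect_right


def assign_unplanned_visitnum(planned, unplanned_dates):
    """Return [(date, VISITNUM)] for one subject's unplanned visits.

    planned         -- list of (VISITNUM, "YYYY-MM-DD") for planned visits
    unplanned_dates -- "YYYY-MM-DD" dates of the subject's unplanned visits
    """
    planned = sorted(planned, key=lambda p: p[1])
    planned_dates = [d for _, d in planned]
    counts = {}  # unplanned visits assigned so far after each planned VISITNUM
    result = []
    for d in sorted(unplanned_dates):
        # Index of the last planned visit on or before this date.
        i = bisect_right(planned_dates, d) - 1
        if i < 0:
            raise ValueError(f"unplanned visit on {d} precedes all planned visits")
        base = planned[i][0]
        counts[base] = counts.get(base, 0) + 1
        if counts[base] > 9:
            raise ValueError(f"more than 9 unplanned visits after VISITNUM {base}")
        # First unplanned visit after VISITNUM 3 -> 3.1, second -> 3.2, ...
        result.append((d, round(base + counts[base] / 10, 1)))
    return result
```

Applying the same mapping to every domain keeps VISITNUM consistent across the subject's datasets.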
4.1.4.6 REPRESENTING ADDITIONAL STUDY DAYS
The SDTM allows for --DTC values to be represented as study days (--DY) relative to the RFSTDTC reference start
date variable in the DM dataset, as described above in Section 4.1.4.4. The calculation of additional study days
within subdivisions of time in a clinical trial may be based on one or more sponsor-defined reference dates not
represented by RFSTDTC. In such cases, the sponsor may define Supplemental Qualifier variables, and the
define.xml should reflect the reference dates used to calculate such study days. If the sponsor wishes to define “day
within element” or “day within epoch,” the reference date/time will be an element start date/time in the Subject
Elements dataset (Section 5.3.1).
4.1.4.7 USE OF RELATIVE TIMING VARIABLES
--STRF and --ENRF
The variables --STRF and --ENRF represent the timing of an observation relative to the sponsor-defined reference period
when information such as “BEFORE”, “PRIOR”, “ONGOING”, or “CONTINUING” is collected in lieu of a date and this
collected information is in relation to the sponsor-defined reference period. The sponsor-defined reference period is the
continuous period of time defined by the discrete starting point (RFSTDTC) and the discrete ending point (RFENDTC) for
each subject in the Demographics dataset.
• --STRF is used to identify the start of an observation relative to the sponsor-defined reference period.
• --ENRF is used to identify the end of an observation relative to the sponsor-defined reference period.
Allowable values for --STRF and --ENRF are “BEFORE”, “DURING”, “DURING/AFTER”, “AFTER”, and “U” (for
unknown).
As an example, a CRF checkbox that identifies concomitant medication use that began prior to the study treatment
period would translate into CMSTRF = “BEFORE” if selected. Note that in this example, the information collected
is with respect to the start of the concomitant medication use only and therefore the collected data corresponds to
variable CMSTRF, not CMENRF. Note also that the information collected is relative to the study treatment period,
which meets the definition of CMSTRF.
Some sponsors may wish to derive --STRF and --ENRF for analysis or reporting purposes even when dates are
collected. Sponsors are cautioned that doing so in conjunction with directly collecting or mapping data such as
“BEFORE”, “PRIOR”, etc. to --STRF and --ENRF will blur the distinction between collected and derived values
within the domain. Sponsors wishing to do such derivations are instead encouraged to use supplemental variables or
analysis datasets for this derived data.
In general, sponsors are cautioned that representing information using variables --STRF and --ENRF may not be as
precise as other methods, particularly because information is often collected relative to a point in time or to a period
of time other than the one defined as the study reference period. SDTMIG V3.1.2 has attempted to address these
limitations by the addition of four new relative timing variables, which are described in the following paragraph.
Sponsors should use the set of variables that allows for accurate representation of the collected data. In many cases,
this will mean using these new relative timing variables in place of --STRF and --ENRF.
--STRTPT, --STTPT, --ENRTPT, and --ENTPT
While the variables --STRF and --ENRF are useful when relative timing assessments are made coincident
with the start and end of the study reference period, they may not be suitable for expressing relative timing
assessments such as “Prior” or “Ongoing” that are collected at other times of the study. As a result, four new timing
variables have been added in V3.1.2 to express a similar concept at any point in time. The variables --STRTPT and
--ENRTPT contain values similar to --STRF and --ENRF, but may be anchored with any timing description or
date/time value expressed in the respective --STTPT and --ENTPT variables, and are not limited to the study reference
period. Unlike the variables --STRF and --ENRF, which for all domains are defined relative to one study reference
period, the timing variables --STRTPT, --STTPT, --ENRTPT, and --ENTPT are defined to be unique within a domain
only. Allowable values for --STRTPT and --ENRTPT are as follows:
• If the reference time point corresponds to the date of collection or assessment:
  ο Start values: an observation can start BEFORE that time point, can start COINCIDENT with that time
  point, or it is unknown (U) when it started.
  ο End values: an observation can end BEFORE that time point, can end COINCIDENT with that time point, can
  be known not to have ended but to be ONGOING, or it may be unknown (U) when it ended or whether it was ongoing.
  AFTER is not a valid value in this case because it would represent an event after the date of collection.
• If the reference time point is prior to the date of collection or assessment:
  ο Start values: an observation can start BEFORE the reference point, can start COINCIDENT with the
  reference point, can start AFTER the reference point, or it may not be known (U) when it started.
  ο End values: an observation can end BEFORE the reference point, can end COINCIDENT with the
  reference point, can end AFTER the reference point, can be known not to have ended but to be ONGOING, or
  it may be unknown (U) when it ended or whether it was ongoing.
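The permissible values listed above can be captured in a small lookup, for example to validate --STRTPT and --ENRTPT during data checks. The function name and its boolean parameter are hypothetical; the value sets follow the two cases described in this section.

```python
def allowed_relative_values(kind, reference_is_collection_date):
    """Permissible --STRTPT ("start") or --ENRTPT ("end") values.

    reference_is_collection_date -- True if the reference time point (--STTPT or
    --ENTPT) is the date of collection or assessment; False if it is earlier.
    """
    if kind == "start":
        values = {"BEFORE", "COINCIDENT", "U"}
    elif kind == "end":
        values = {"BEFORE", "COINCIDENT", "ONGOING", "U"}
    else:
        raise ValueError("kind must be 'start' or 'end'")
    if not reference_is_collection_date:
        # AFTER is possible only when the reference point precedes collection.
        values.add("AFTER")
    return values
```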
Examples of --STRTPT, --STTPT, --ENRTPT, and --ENTPT
1. Medical History
Assumptions:
CRF contains "Year Started" and check box for "Active"
"Date of Assessment" is collected
Example when "Active" is checked:
MHDTC = date of assessment value, e.g., "2006-11-02"
MHSTDTC = year of condition start, e.g., "2002"
MHENRTPT = "ONGOING"
MHENTPT = date of assessment value, e.g., "2006-11-02"
[Figure 4.1.4.7 Example of --ENRTPT and --ENTPT for Medical History: a timeline running from 2002 (MHSTDTC = 2002)
to the assessment date and reference time point of 2006-11-02 (MHDTC = 2006-11-02, MHENTPT = 2006-11-02), with
MHENRTPT = ONGOING. The medical event began in 2002 and was ongoing at the reference time point of 2006-11-02;
it may or may not have ended at any time after that.]
2. Prior and Concomitant Medications
Assumptions:
CRF contains "Start Date", "Stop Date", and check boxes for "Prior" if unknown or uncollected Start Date, and
"Continuing" if no Stop Date was collected. Prior refers to the screening visit and Continuing refers to the final study visit.
Example when both "Prior" and "Continuing" are checked:
CMSTDTC = [null]
CMENDTC = [null]
CMSTRTPT = "BEFORE"
CMSTTPT = screening date, e.g., "2006-10-21"
CMENRTPT = "ONGOING"
CMENTPT = final study visit date, e.g., "2006-11-02"
3. Adverse Events
Assumptions:
CRF contains "Start Date", "Stop Date", and "Outcome" with check boxes including "Continuing" and
"Unknown" (Continuing and Unknown are asked at the end of the subject's study participation)
No assessment date or visit information is collected
Example when "Unknown" is checked:
AESTDTC = start date, e.g., "2006-10-01"
AEENDTC = [null]
AEENRTPT = "U"
AEENTPT = final subject contact date, e.g., "2006-11-02"
4.1.4.8 DATE AND TIME REPORTED IN A DOMAIN BASED ON FINDINGS
When the date/time of collection is reported in any domain, the date/time should go into the --DTC field (e.g., EGDTC for
Date/Time of ECG). For any domain based on the Findings general observation class, such as lab tests which are based on a
specimen, the collection date is likely to be tied to when the specimen or source of the finding was captured, not necessarily
when the data was recorded. In order to ensure that the critical timing information is always represented in the same variable,
the --DTC variable is used to represent the time of specimen collection. For example, in the LB domain the LBDTC variable
would be used for all single-point blood collections or spot urine collections. For timed lab collections (e.g., 24-hour urine
collections) the LBDTC variable would be used for the start date/time of the collection and LBENDTC for the end date/time
of the collection. This approach will allow the single-point and interval collections to use the same date/time variables
consistently across all datasets for the Findings general observation class. The table below illustrates the proper use of these
variables. Note that --STDTC is not used for collection dates over an interval, so it is blank in the following table.
Collection Type | --DTC | --STDTC | --ENDTC
Single-Point Collection | X | |
Interval Collection | X | | X
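The rule in the table can be expressed as a small sketch. The helper name is hypothetical. It assumes complete ISO 8601 date/times, because `datetime.fromisoformat` does not accept partial dates. The sketch populates --DTC for every collection and --ENDTC only for interval collections, and it never sets --STDTC.

```python
from datetime import datetime
from typing import Optional


def findings_collection_timing(prefix: str, start: str,
                               end: Optional[str] = None) -> dict:
    """Hypothetical helper: --DTC always; --ENDTC only for interval collections."""
    record = {f"{prefix}DTC": start}
    if end is not None:
        # Sanity check (assumption): an interval must not end before it starts
        if datetime.fromisoformat(end) < datetime.fromisoformat(start):
            raise ValueError("end of collection precedes start")
        record[f"{prefix}ENDTC"] = end
    return record


# Spot urine (single point) vs. 24-hour urine (interval), illustrative times
print(findings_collection_timing("LB", "2006-08-01T08:00"))
print(findings_collection_timing("LB", "2006-08-01T08:00", "2006-08-02T08:00"))
```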
4.1.4.9 USE OF DATES AS RESULT VARIABLES
Dates are generally used only as timing variables to describe the timing of an event, intervention, or collection activity, but occasionally it may be preferable to model a date as a result (--ORRES) in a Findings dataset. Using a date as the result of a Findings question is atypical and should be approached with caution. It may be appropriate when a) a group of questions, each of which has a date response, is asked and analyzed together; or b) the events and interventions in question are not medically significant (often the case when they are included in questionnaires). Consider the following cases:
Calculated due date
Date of last day on the job
Date of high school graduation
One approach to modeling these data would be to place the text of the question in --TEST and the response to the
question, a date represented in ISO 8601 format, in --ORRES and --STRESC as long as these date results do not
contain the dates of medically significant events or interventions.
Again, use extreme caution when storing dates as the results of Findings. Remember, in most cases, these dates
should be timing variables associated with a record in an Intervention or Events dataset.
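A minimal sketch of the approach just described, for illustration. The helper and the "QS" prefix are hypothetical. The regular expression accepts only complete or reduced-precision ISO 8601 dates (YYYY, YYYY-MM, YYYY-MM-DD) and does not check day-of-month validity. --STRESN is left unset because a date is not a numeric result.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD (does not validate days per month)
ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def date_result_record(prefix: str, test: str, answer: str) -> dict:
    """Hypothetical helper: question text in --TEST, ISO 8601 date in --ORRES/--STRESC."""
    if not ISO_DATE.match(answer):
        raise ValueError(f"not an ISO 8601 date: {answer!r}")
    return {f"{prefix}TEST": test, f"{prefix}ORRES": answer, f"{prefix}STRESC": answer}


print(date_result_record("QS", "Date of high school graduation", "1998-06"))
```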
4.1.4.10 REPRESENTING TIME POINTS
Time points can be represented using the time point variables, --TPT, --TPTNUM, --ELTM, and the time point
anchors, --TPTREF (text description) and --RFTDTC (the date/time). Note that time-point data will usually have an
associated --DTC value. The interrelationship of these variables is shown in Figure 4.1.4.10 below.
Figure 4.1.4.10
Values for these variables for Vital Signs measurements taken at 30, 60, and 90 minutes after dosing would look like
the following.
VSTPTNUM | VSTPT | VSELTM | VSTPTREF | VSRFTDTC | VSDTC
1 | 30 MIN | PT30M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T08:30
2 | 60 MIN | PT1H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T09:01
3 | 90 MIN | PT1H30M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T09:32
Note that the actual elapsed time is not an SDTM variable, but it can be derived as VSDTC minus VSRFTDTC.
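The derivation just described can be sketched as follows, using the values from the table above. The helper names are hypothetical. The formatter expresses durations in hours, minutes, and seconds only (no day component), which is sufficient for these examples.

```python
from datetime import datetime, timedelta


def iso8601_duration(delta: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration using H, M, S designators only."""
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else ""
    hours, rest = divmod(abs(total), 3600)
    minutes, seconds = divmod(rest, 60)
    body = ""
    if hours:
        body += f"{hours}H"
    if minutes:
        body += f"{minutes}M"
    if seconds or not body:
        body += f"{seconds}S"
    return f"{sign}PT{body}"


def actual_elapsed(rftdtc: str, dtc: str) -> str:
    """Actual elapsed time = --DTC minus --RFTDTC."""
    return iso8601_duration(datetime.fromisoformat(dtc) - datetime.fromisoformat(rftdtc))


# Planned VSELTM vs. actual elapsed time for the three rows above
for planned, dtc in [("PT30M", "2006-08-01T08:30"),
                     ("PT1H", "2006-08-01T09:01"),
                     ("PT1H30M", "2006-08-01T09:32")]:
    print(planned, actual_elapsed("2006-08-01T08:00", dtc))
```

For the second and third rows, this prints actual elapsed times of PT1H1M and PT1H32M, which differ slightly from the planned VSELTM values.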
Values for these variables for Urine Collections taken pre-dose, and from 0-12 hours and 12-24 hours after dosing
would look like the following.
LBTPTNUM | LBTPT | LBELTM | LBTPTREF | LBRFTDTC | LBDTC
1 | 15 MIN PRE-DOSE | -PT15M | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T08:30
2 | 0-12 HOURS | PT12H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-01T20:35
3 | 12-24 HOURS | PT24H | DOSE ADMINISTRATION | 2006-08-01T08:00 | 2006-08-02T08:40
Note that, for interval collections, the value in LBELTM represents the elapsed time, relative to the reference time point, at which the collection interval ends.
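Because --ELTM is anchored to --RFTDTC, the planned date/time of each time point can be recovered by adding the two. For interval collections, that value is the planned end of the interval. This is a minimal sketch with hypothetical helper names. The parser handles only the signed "PT…H…M…S" forms used for --ELTM in the examples above, not the full ISO 8601 duration grammar.

```python
import re
from datetime import datetime, timedelta

# Optional leading "-", then "PT" with hour, minute, and/or second components
_ELTM = re.compile(r"^(-)?PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_eltm(eltm: str) -> timedelta:
    """Parse a signed ISO 8601 time duration such as '-PT15M' or 'PT1H30M'."""
    m = _ELTM.match(eltm)
    if not m or not any(m.group(i) for i in (2, 3, 4)):
        raise ValueError(f"unsupported --ELTM value: {eltm!r}")
    hours, minutes, seconds = (int(m.group(i) or 0) for i in (2, 3, 4))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return -delta if m.group(1) else delta


def planned_time(rftdtc: str, eltm: str) -> datetime:
    """Planned date/time of the time point (for intervals, the planned end)."""
    return datetime.fromisoformat(rftdtc) + parse_eltm(eltm)


for eltm in ("-PT15M", "PT12H", "PT24H"):
    print(eltm, planned_time("2006-08-01T08:00", eltm).isoformat(timespec="minutes"))
```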
When time points are used, --TPTNUM is expected. Time points may or may not have an associated --TPTREF.
Sometimes, --TPTNUM may be used as a key for multiple values collected for the same test within a visit; as such,
there is no dependence upon an anchor such as --TPTREF, but there will be a dependency upon the VISITNUM. In
such cases, VISITNUM will be required to confer uniqueness to values of --TPTNUM.
If the protocol describes the scheduling of a dose using a reference intervention or assessment, then --TPTREF
should be populated, even if it does not contribute to uniqueness. The fact that time points are related to a reference
time point, and what that reference time point is, are important for interpreting the data collected at the time point.
Not all time points will require all three variables (VISITNUM, --TPTREF, and --TPTNUM) to provide uniqueness. In fact, in some cases a time point may be uniquely identified without the use of VISITNUM, or without the use of --TPTREF, or, rarely, without the use of either one. For instance:
A trial might have time points only within one visit, so that the contribution of VISITNUM to uniqueness is
trivial.
A trial might have time points that do not relate to any visit, such as time points relative to a dose of drug
self-administered by the subject at home.
A trial may have only one reference time point per visit, and all reference time points may be similar, so that
only one value of --TPTREF (e.g., "DOSE") is needed.
A trial may have time points not related to a reference time point. For instance, --TPTNUM values could be
used to distinguish first, second, and third repeats of a measurement scheduled without any relationship to
dosing.
For trials with many time points, the requirement to provide uniqueness using only VISITNUM, --TPTREF, and
--TPTNUM may lead to a scheme where multiple natural keys are combined into the values of one of these variables.
For instance, in a crossover trial with multiple doses on multiple days within each period, either of the following options could be used. In Option 1, VISITNUM might be used to designate period, --TPTREF might be used to designate the day and the dose, and --TPTNUM might be used to designate the timing relative to the reference time point. In Option 2, VISITNUM might be used to designate period and day within period, --TPTREF might be used to designate the dose within the day, and --TPTNUM might be used to designate the timing relative to the reference time point.
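Option 1
VISIT | VISITNUM | --TPT | --TPTNUM | --TPTREF
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 1, AM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 1, AM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 1, AM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 1, PM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 1, PM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 1, PM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 5, AM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 5, AM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 5, AM DOSE
PERIOD 1 | 3 | PRE-DOSE | 1 | DAY 5, PM DOSE
PERIOD 1 | 3 | 1H | 2 | DAY 5, PM DOSE
PERIOD 1 | 3 | 4H | 3 | DAY 5, PM DOSE
PERIOD 2 | 4 | PRE-DOSE | 1 | DAY 1, AM DOSE
PERIOD 2 | 4 | 1H | 2 | DAY 1, AM DOSE
PERIOD 2 | 4 | 4H | 3 | DAY 1, AM DOSE
PERIOD 2 | 4 | PRE-DOSE | 1 | DAY 1, PM DOSE
PERIOD 2 | 4 | 1H | 2 | DAY 1, PM DOSE
PERIOD 2 | 4 | 4H | 3 | DAY 1, PM DOSE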
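Option 2
VISIT | VISITNUM | --TPT | --TPTNUM | --TPTREF
PERIOD 1, DAY 1 | 3 | PRE-DOSE | 1 | AM DOSE
PERIOD 1, DAY 1 | 3 | 1H | 2 | AM DOSE
PERIOD 1, DAY 1 | 3 | 4H | 3 | AM DOSE
PERIOD 1, DAY 1 | 3 | PRE-DOSE | 1 | PM DOSE
PERIOD 1, DAY 1 | 3 | 1H | 2 | PM DOSE
PERIOD 1, DAY 1 | 3 | 4H | 3 | PM DOSE
PERIOD 1, DAY 5 | 4 | PRE-DOSE | 1 | AM DOSE
PERIOD 1, DAY 5 | 4 | 1H | 2 | AM DOSE
PERIOD 1, DAY 5 | 4 | 4H | 3 | AM DOSE
PERIOD 1, DAY 5 | 4 | PRE-DOSE | 1 | PM DOSE
PERIOD 1, DAY 5 | 4 | 1H | 2 | PM DOSE
PERIOD 1, DAY 5 | 4 | 4H | 3 | PM DOSE
PERIOD 2, DAY 1 | 5 | PRE-DOSE | 1 | AM DOSE
PERIOD 2, DAY 1 | 5 | 1H | 2 | AM DOSE
PERIOD 2, DAY 1 | 5 | 4H | 3 | AM DOSE
PERIOD 2, DAY 1 | 5 | PRE-DOSE | 1 | PM DOSE
PERIOD 2, DAY 1 | 5 | 1H | 2 | PM DOSE
PERIOD 2, DAY 1 | 5 | 4H | 3 | PM DOSE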
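One way to confirm that a chosen scheme confers uniqueness is to count the key combinations. This sketch rebuilds the Option 1 rows for one subject in one domain and checks the (VISITNUM, --TPTREF, --TPTNUM) key. It also shows that collapsing --TPTREF to a single value makes keys collide. The names and the tuple layout are illustrative assumptions.

```python
from collections import Counter

# Timing within each reference time point: (--TPT, --TPTNUM)
TIMINGS = [("PRE-DOSE", 1), ("1H", 2), ("4H", 3)]
# Option 1: VISITNUM designates the period; --TPTREF designates day and dose
REFERENCES = {
    3: ["DAY 1, AM DOSE", "DAY 1, PM DOSE", "DAY 5, AM DOSE", "DAY 5, PM DOSE"],
    4: ["DAY 1, AM DOSE", "DAY 1, PM DOSE"],
}
option1 = [
    (visitnum, tptref, tpt, tptnum)
    for visitnum, tptrefs in REFERENCES.items()
    for tptref in tptrefs
    for tpt, tptnum in TIMINGS
]


def duplicate_keys(rows):
    """Return (VISITNUM, --TPTREF, --TPTNUM) keys that occur more than once."""
    counts = Counter((visitnum, tptref, tptnum) for visitnum, tptref, _tpt, tptnum in rows)
    return [key for key, n in counts.items() if n > 1]


print(len(option1), duplicate_keys(option1))   # 18 []
# Collapsing --TPTREF to one value (e.g., "DOSE") breaks uniqueness
collapsed = [(v, "DOSE", tpt, n) for v, _ref, tpt, n in option1]
print(len(duplicate_keys(collapsed)))          # 6
```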
Within the context that defines uniqueness for a time point, which may include domain, visit, and reference time point, there must be a one-to-one relationship between values of --TPT and --TPTNUM. In other words, if domain, visit, and reference time point define that context, then records for any two subjects that have the same values of DOMAIN, VISITNUM, --TPTREF, and --TPTNUM must not have different time point descriptions in --TPT.
Within the context that defines uniqueness for a time point, there is likely to be a one-to-one relationship between most values of --TPT and --ELTM. However, since --ELTM can only be populated with ISO 8601 periods of time (as described in Section 4.1.4.3), --ELTM may not be populated for all time points. For example, --ELTM is likely to be null for time points described by text such as "pre-dose" or "before breakfast." When --ELTM is populated, if two
subjects have records with the same values of DOMAIN, VISITNUM, --TPTREF, and --TPTNUM, then these records must not have different values in --ELTM.
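The two consistency rules above can be checked across subjects with a short sketch. It groups records by (DOMAIN, VISITNUM, --TPTREF, --TPTNUM) and flags any group with more than one --TPT value or more than one non-null --ELTM value. It checks only the direction spelled out in the text. The record layout, which drops the domain prefix from variable names, is a hypothetical simplification.

```python
from collections import defaultdict


def inconsistent_timepoints(records):
    """Flag keys whose --TPT (or non-null --ELTM) values differ across records."""
    seen = defaultdict(lambda: {"TPT": set(), "ELTM": set()})
    for rec in records:
        key = (rec["DOMAIN"], rec["VISITNUM"], rec["TPTREF"], rec["TPTNUM"])
        seen[key]["TPT"].add(rec["TPT"])
        if rec.get("ELTM") is not None:  # --ELTM may legitimately be null
            seen[key]["ELTM"].add(rec["ELTM"])
    return {key: values for key, values in seen.items()
            if len(values["TPT"]) > 1 or len(values["ELTM"]) > 1}


records = [
    {"DOMAIN": "VS", "USUBJID": "001", "VISITNUM": 2, "TPTREF": "DOSE ADMINISTRATION",
     "TPTNUM": 1, "TPT": "30 MIN", "ELTM": "PT30M"},
    {"DOMAIN": "VS", "USUBJID": "002", "VISITNUM": 2, "TPTREF": "DOSE ADMINISTRATION",
     "TPTNUM": 1, "TPT": "30 MINUTES", "ELTM": "PT30M"},
]
# Same key, different --TPT descriptions -> flagged
print(list(inconsistent_timepoints(records)))  # [('VS', 2, 'DOSE ADMINISTRATION', 1)]
```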
When the protocol describes a time point with text such as "4-6 hours after dose" or "12 hours +/- 2 hours after
dose" the sponsor may choose whether and how to populate --ELTM. For example, a time point described as "4-6
hours after dose" might be associated with an --ELTM value of PT4H. A time point described as "12 hours +/- 2
hours after dose" might be associated with an --ELTM value of PT12H. Conventions for populating --ELTM should
be consistent (the examples just given would probably not both be used in the same trial). It would be good practice
to indicate the range of intended timings by some convention in the values used to populate --TPT.
Sponsors may, of course, use more stringent requirements for populating --TPTNUM, --TPT, and --ELTM. For
instance, a sponsor could decide that all time points with a particular --ELTM value would have the same values of
--TPTNUM and --TPT, across all visits, reference time points, and domains.
4.1.5 OTHER ASSUMPTIONS
4.1.5.1 ORIGINAL AND STANDARDIZED RESULTS OF FINDINGS AND TESTS NOT DONE
4.1.5.1.1 ORIGINAL AND STANDARDIZED RESULTS
The --ORRES variable contains the result of the measurement or finding as originally received or collected.
--ORRES is an expected variable and should always be populated, with two exceptions:
When --STAT = "NOT DONE"
--ORRES should generally not be populated for derived records
Derived records are flagged with the --DRVFL variable. When a derived record comes from more than one visit, the sponsor must define the value of VISITNUM, addressing the correct temporal sequence. If a new record is derived for a dataset and the source is not eDT, then that new record should be flagged as derived. For example, in ECG data, if QTc intervals are derived in-house by the sponsor, then the derived flag is set to "Y". If the QTc intervals are received from a vendor, the derived flag is not populated.
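The distinction between sponsor-derived and vendor-supplied values can be sketched as follows. The helper is hypothetical, and the value "412" is purely illustrative. A sponsor-derived record gets --DRVFL = "Y" and a null --ORRES. A vendor-supplied value is kept as the original result, and the flag is left empty. Units, --STRESN, and VISITNUM handling are omitted.

```python
def findings_result_record(prefix: str, base: dict, value: str,
                           derived_in_house: bool) -> dict:
    """Hypothetical helper: set --ORRES/--DRVFL depending on who derived the value."""
    record = dict(base)
    if derived_in_house:
        # Sponsor-derived record: flag it; --ORRES is generally not populated
        record[f"{prefix}ORRES"] = None
        record[f"{prefix}DRVFL"] = "Y"
    else:
        # Received from a vendor (e.g., via eDT): original result, no derived flag
        record[f"{prefix}ORRES"] = value
        record[f"{prefix}DRVFL"] = None
    record[f"{prefix}STRESC"] = value
    return record


print(findings_result_record("EG", {"USUBJID": "001"}, "412", derived_in_house=True))
```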
When --ORRES is populated, --STRESC must also be populated, regardless of whether the data values are character or numeric. The --STRESC variable is derived either by converting values in --ORRES to values in standard units or by assigning the value of --ORRES (as in the PE domain, where --STRESC could contain a dictionary-derived term). A further step is necessary when --STRESC contains numeric values: these are converted to numeric type and written to --STRESN. Because --STRESC may contain a mixture of numeric and character values, --STRESN may contain null values, as shown in the flowchart below.
--ORRES (all original values) -> --STRESC (derive or copy all results) -> --STRESN (numeric results only)
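The --ORRES to --STRESC to --STRESN flow can be sketched as a single function. This is a minimal sketch under stated assumptions. `factor` is a hypothetical unit-conversion multiplier supplied by the caller, and numeric detection relies on Python's `float()`. Character results are copied to --STRESC and leave --STRESN null. Converted values are rendered with the `"g"` format, which keeps at most six significant digits.

```python
from typing import Optional, Tuple


def standardize(orres: Optional[str],
                factor: Optional[float] = None) -> Tuple[Optional[str], Optional[float]]:
    """Derive (--STRESC, --STRESN) from --ORRES; factor is a hypothetical unit multiplier."""
    if orres is None:                 # e.g., --STAT = "NOT DONE"
        return None, None
    try:
        value = float(orres)
    except ValueError:
        return orres, None            # character result: copy to --STRESC only
    if factor is None:
        return orres, value           # already in standard units: copy as-is
    converted = value * factor
    return format(converted, "g"), converted


print(standardize("NEGATIVE"), standardize("5.2"), standardize("2", factor=10))
```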
When the original measurement or finding is a selection from a defined codelist, the --ORRES and --STRESC variables generally contain results in decoded format, that is, the textual interpretation of whichever code was selected from the codelist. In some cases, the code values in the codelist are statistically meaningful standardized values or scores, defined by sponsors or by valid methodologies such as SF-36 questionnaires. In those cases, --ORRES will contain the decoded format, whereas --STRESC and --STRESN will contain the standardized values or scores.
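For the scored case just described, a sketch might look like this. The codelist, decoded texts, and scores are hypothetical and illustrative only. They are not taken from SF-36 or any published scoring algorithm. The decoded text goes to --ORRES, and the standardized score goes to --STRESC and --STRESN. The domain prefix is dropped for brevity.

```python
# Hypothetical item codelist: code -> (decoded text, standardized score).
# Values are illustrative only, not a published scoring algorithm.
CODELIST = {
    1: ("EXCELLENT", 100),
    2: ("VERY GOOD", 75),
    3: ("GOOD", 50),
    4: ("FAIR", 25),
    5: ("POOR", 0),
}


def coded_result(code: int) -> dict:
    """Decoded text in --ORRES; standardized score in --STRESC and --STRESN."""
    text, score = CODELIST[code]
    return {"ORRES": text, "STRESC": str(score), "STRESN": float(score)}


print(coded_result(2))
```

Without meaningful scores, --STRESC would simply repeat the decoded text, as in the general case.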
Occasionally data that are intended to be numeric are collected with characters attached that cause the character-to-numeric
conversion to fail. For example, numeric cell counts in the source data may be sp