[R] Stata Base Reference Manual, Release 13


STATA BASE REFERENCE MANUAL
RELEASE 13
A Stata Press Publication
StataCorp LP
College Station, Texas
Copyright © 1985–2013 StataCorp LP
All rights reserved
Version 13
Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in TeX
ISBN-10: 1-59718-116-1
ISBN-13: 978-1-59718-116-7
This manual is protected by copyright. All rights are reserved. No part of this manual may be reproduced, stored
in a retrieval system, or transcribed, in any form or by any means (electronic, mechanical, photocopy, recording, or
otherwise) without the prior written permission of StataCorp LP unless permitted subject to the terms and conditions
of a license granted to you by StataCorp LP to use the software and documentation. No license, express or implied,
by estoppel or otherwise, to any intellectual property rights is granted by this document.
StataCorp provides this manual “as is” without warranty of any kind, either expressed or implied, including, but
not limited to, the implied warranties of merchantability and fitness for a particular purpose. StataCorp may make
improvements and/or changes in the product(s) and the program(s) described in this manual at any time and without
notice.
The software described in this manual is furnished under a license agreement or nondisclosure agreement. The software
may be copied only in accordance with the terms of the agreement. It is against the law to copy the software onto
DVD, CD, disk, diskette, tape, or any other medium for any purpose other than backup or archival purposes.
The automobile dataset appearing on the accompanying media is Copyright © 1979 by Consumers Union of U.S.,
Inc., Yonkers, NY 10703-1057 and is reproduced by permission from CONSUMER REPORTS, April 1979.
Stata, Stata Press, Mata, and NetCourse are registered trademarks of StataCorp LP.
Stata and Stata Press are registered trademarks with the World Intellectual Property Organization of the United Nations.
NetCourseNow is a trademark of StataCorp LP.
Other brand and product names are registered trademarks or trademarks of their respective companies.
For copyright information about the software, type help copyright within Stata.
The suggested citation for this software is
StataCorp. 2013. Stata: Release 13. Statistical Software. College Station, TX: StataCorp LP.
Contents
intro .......... Introduction to base reference manual 1
about .......... Display information about your Stata 7
adoupdate .......... Update user-written ado-files 8
ameans .......... Arithmetic, geometric, and harmonic means 12
anova .......... Analysis of variance and covariance 16
anova postestimation .......... Postestimation tools for anova 57
areg .......... Linear regression with a large dummy-variable set 74
areg postestimation .......... Postestimation tools for areg 80
asclogit .......... Alternative-specific conditional logit (McFadden’s choice) model 84
asclogit postestimation .......... Postestimation tools for asclogit 94
asmprobit .......... Alternative-specific multinomial probit regression 101
asmprobit postestimation .......... Postestimation tools for asmprobit 126
asroprobit .......... Alternative-specific rank-ordered probit regression 136
asroprobit postestimation .......... Postestimation tools for asroprobit 149
BIC note .......... Calculating and interpreting BIC 157
binreg .......... Generalized linear models: Extensions to the binomial family 162
binreg postestimation .......... Postestimation tools for binreg 175
biprobit .......... Bivariate probit regression 178
biprobit postestimation .......... Postestimation tools for biprobit 185
bitest .......... Binomial probability test 188
bootstrap .......... Bootstrap sampling and estimation 193
bootstrap postestimation .......... Postestimation tools for bootstrap 215
boxcox .......... Box–Cox regression models 219
boxcox postestimation .......... Postestimation tools for boxcox 230
brier .......... Brier score decomposition 235
bsample .......... Sampling with replacement 241
bstat .......... Report bootstrap results 249
centile .......... Report centile and confidence interval 256
ci .......... Confidence intervals for means, proportions, and counts 262
clogit .......... Conditional (fixed-effects) logistic regression 274
clogit postestimation .......... Postestimation tools for clogit 290
cloglog .......... Complementary log-log regression 295
cloglog postestimation .......... Postestimation tools for cloglog 304
cls .......... Clear Results window 307
cnsreg .......... Constrained linear regression 308
cnsreg postestimation .......... Postestimation tools for cnsreg 314
constraint .......... Define and list constraints 317
contrast .......... Contrasts and linear hypothesis tests after estimation 320
contrast postestimation .......... Postestimation tools for contrast 383
copyright .......... Display copyright information 385
copyright apache .......... Apache copyright notification 386
copyright boost .......... Boost copyright notification 390
copyright freetype .......... FreeType copyright notification 391
copyright icu .......... ICU copyright notification 394
copyright jagpdf .......... JagPDF copyright notification 395
copyright lapack .......... LAPACK copyright notification 396
copyright libpng .......... libpng copyright notification 397
copyright miglayout .......... MiG Layout copyright notification 399
copyright scintilla .......... Scintilla copyright notification 400
copyright ttf2pt1 .......... ttf2pt1 copyright notification 401
copyright zlib .......... zlib copyright notification 403
correlate .......... Correlations (covariances) of variables or coefficients 404
cumul .......... Cumulative distribution 412
cusum .......... Cusum plots and tests for binary variables 416
db .......... Launch dialog 420
diagnostic plots .......... Distributional diagnostic plots 422
display .......... Substitute for a hand calculator 434
do .......... Execute commands from a file 435
doedit .......... Edit do-files and other text files 436
dotplot .......... Comparative scatterplots 437
dstdize .......... Direct and indirect standardization 444
dydx .......... Calculate numeric derivatives and integrals 463
eform_option .......... Displaying exponentiated coefficients 469
eivreg .......... Errors-in-variables regression 471
eivreg postestimation .......... Postestimation tools for eivreg 476
error messages .......... Error messages and return codes 478
esize .......... Effect size based on mean comparison 479
estat .......... Postestimation statistics 490
estat classification .......... Classification statistics and table 491
estat gof .......... Pearson or Hosmer–Lemeshow goodness-of-fit test 494
estat ic .......... Display information criteria 503
estat summarize .......... Summarize estimation sample 507
estat vce .......... Display covariance matrix estimates 510
estimates .......... Save and manipulate estimation results 513
estimates describe .......... Describe estimation results 517
estimates for .......... Repeat postestimation command across models 519
estimates notes .......... Add notes to estimation results 521
estimates replay .......... Redisplay estimation results 523
estimates save .......... Save and use estimation results 526
estimates stats .......... Model-selection statistics 530
estimates store .......... Store and restore estimation results 532
estimates table .......... Compare estimation results 535
estimates title .......... Set title for estimation results 541
estimation options .......... Estimation options 542
exit .......... Exit Stata 545
exlogistic .......... Exact logistic regression 546
exlogistic postestimation .......... Postestimation tools for exlogistic 564
expoisson .......... Exact Poisson regression 569
expoisson postestimation .......... Postestimation tools for expoisson 581
fp .......... Fractional polynomial regression 583
fp postestimation .......... Postestimation tools for fp 607
frontier .......... Stochastic frontier models 616
frontier postestimation .......... Postestimation tools for frontier 631
fvrevar .......... Factor-variables operator programming command 635
fvset .......... Declare factor-variable settings 638
gllamm .......... Generalized linear and latent mixed models 643
glm .......... Generalized linear models 645
glm postestimation .......... Postestimation tools for glm 679
glogit .......... Logit and probit regression for grouped data 685
glogit postestimation .......... Postestimation tools for glogit, gprobit, blogit, and bprobit 696
gmm .......... Generalized method of moments estimation 698
gmm postestimation .......... Postestimation tools for gmm 760
grmeanby .......... Graph means and medians by categorical variables 764
hausman .......... Hausman specification test 767
heckman .......... Heckman selection model 776
heckman postestimation .......... Postestimation tools for heckman 794
heckoprobit .......... Ordered probit model with sample selection 800
heckoprobit postestimation .......... Postestimation tools for heckoprobit 809
heckprobit .......... Probit model with sample selection 814
heckprobit postestimation .......... Postestimation tools for heckprobit 822
help .......... Display help in Stata 827
hetprobit .......... Heteroskedastic probit model 829
hetprobit postestimation .......... Postestimation tools for hetprobit 836
histogram .......... Histograms for continuous and categorical variables 839
icc .......... Intraclass correlation coefficients 850
inequality .......... Inequality measures 872
intreg .......... Interval regression 875
intreg postestimation .......... Postestimation tools for intreg 885
ivpoisson .......... Poisson regression with endogenous regressors 890
ivpoisson postestimation .......... Postestimation tools for ivpoisson 905
ivprobit .......... Probit model with continuous endogenous regressors 910
ivprobit postestimation .......... Postestimation tools for ivprobit 923
ivregress .......... Single-equation instrumental-variables regression 927
ivregress postestimation .......... Postestimation tools for ivregress 943
ivtobit .......... Tobit model with continuous endogenous regressors 961
ivtobit postestimation .......... Postestimation tools for ivtobit 971
jackknife .......... Jackknife estimation 975
jackknife postestimation .......... Postestimation tools for jackknife 987
kappa .......... Interrater agreement 988
kdensity .......... Univariate kernel density estimation 1002
ksmirnov .......... Kolmogorov–Smirnov equality-of-distributions test 1012
kwallis .......... Kruskal–Wallis equality-of-populations rank test 1016
ladder .......... Ladder of powers 1019
level .......... Set default confidence level 1026
limits .......... Quick reference for limits 1028
lincom .......... Linear combinations of estimators 1033
linktest .......... Specification link test for single-equation models 1041
lnskew0 .......... Find zero-skewness log or Box–Cox transform 1047
log .......... Echo copy of session to file 1051
logistic .......... Logistic regression, reporting odds ratios 1055
logistic postestimation .......... Postestimation tools for logistic 1067
logit .......... Logistic regression, reporting coefficients 1077
logit postestimation .......... Postestimation tools for logit 1090
loneway .......... Large one-way ANOVA, random effects, and reliability 1096
lowess .......... Lowess smoothing 1102
lpoly .......... Kernel-weighted local polynomial smoothing 1108
lroc .......... Compute area under ROC curve and graph the curve 1118
lrtest .......... Likelihood-ratio test after estimation 1124
lsens .......... Graph sensitivity and specificity versus probability cutoff 1134
lv .......... Letter-value displays 1139
margins .......... Marginal means, predictive margins, and marginal effects 1145
margins postestimation .......... Postestimation tools for margins 1200
margins, contrast .......... Contrasts of margins 1202
margins, pwcompare .......... Pairwise comparisons of margins 1219
marginsplot .......... Graph results from margins (profile plots, etc.) 1224
matsize .......... Set the maximum number of variables in a model 1259
maximize .......... Details of iterative maximization 1261
mean .......... Estimate means 1268
mean postestimation .......... Postestimation tools for mean 1279
meta .......... Meta-analysis 1281
mfp .......... Multivariable fractional polynomial models 1283
mfp postestimation .......... Postestimation tools for mfp 1295
misstable .......... Tabulate missing values 1300
mkspline .......... Linear and restricted cubic spline construction 1308
ml .......... Maximum likelihood estimation 1314
mlexp .......... Maximum likelihood estimation of user-specified expressions 1341
mlexp postestimation .......... Postestimation tools for mlexp 1353
mlogit .......... Multinomial (polytomous) logistic regression 1355
mlogit postestimation .......... Postestimation tools for mlogit 1369
more .......... The --more-- message 1379
mprobit .......... Multinomial probit regression 1381
mprobit postestimation .......... Postestimation tools for mprobit 1388
nbreg .......... Negative binomial regression 1391
nbreg postestimation .......... Postestimation tools for nbreg and gnbreg 1403
nestreg .......... Nested model statistics 1407
net .......... Install and manage user-written additions from the Internet 1413
net search .......... Search the Internet for installable packages 1431
netio .......... Control Internet connections 1435
news .......... Report Stata news 1438
nl .......... Nonlinear least-squares estimation 1440
nl postestimation .......... Postestimation tools for nl 1460
nlcom .......... Nonlinear combinations of estimators 1464
nlogit .......... Nested logit regression 1475
nlogit postestimation .......... Postestimation tools for nlogit 1497
nlsur .......... Estimation of nonlinear systems of equations 1502
nlsur postestimation .......... Postestimation tools for nlsur 1524
nptrend .......... Test for trend across ordered groups 1527
ologit .......... Ordered logistic regression 1531
ologit postestimation .......... Postestimation tools for ologit 1540
oneway .......... One-way analysis of variance 1544
oprobit .......... Ordered probit regression 1555
oprobit postestimation .......... Postestimation tools for oprobit 1560
orthog .......... Orthogonalize variables and compute orthogonal polynomials 1564
pcorr .......... Partial and semipartial correlation coefficients 1570
permute .......... Monte Carlo permutation tests 1573
pk .......... Pharmacokinetic (biopharmaceutical) data 1583
pkcollapse .......... Generate pharmacokinetic measurement dataset 1591
pkcross .......... Analyze crossover experiments 1594
pkequiv .......... Perform bioequivalence tests 1603
pkexamine .......... Calculate pharmacokinetic measures 1610
pkshape .......... Reshape (pharmacokinetic) Latin-square data 1616
pksumm .......... Summarize pharmacokinetic data 1624
poisson .......... Poisson regression 1629
poisson postestimation .......... Postestimation tools for poisson 1639
predict .......... Obtain predictions, residuals, etc., after estimation 1645
predictnl .......... Obtain nonlinear predictions, standard errors, etc., after estimation 1656
probit .......... Probit regression 1668
probit postestimation .......... Postestimation tools for probit 1681
proportion .......... Estimate proportions 1685
proportion postestimation .......... Postestimation tools for proportion 1691
prtest .......... Tests of proportions 1693
pwcompare .......... Pairwise comparisons 1698
pwcompare postestimation .......... Postestimation tools for pwcompare 1730
pwmean .......... Pairwise comparisons of means 1732
pwmean postestimation .......... Postestimation tools for pwmean 1744
qc .......... Quality control charts 1746
qreg .......... Quantile regression 1761
qreg postestimation .......... Postestimation tools for qreg, iqreg, sqreg, and bsqreg 1791
query .......... Display system parameters 1795
ranksum .......... Equality tests on unmatched data 1802
ratio .......... Estimate ratios 1809
ratio postestimation .......... Postestimation tools for ratio 1818
reg3 .......... Three-stage estimation for systems of simultaneous equations 1819
reg3 postestimation .......... Postestimation tools for reg3 1840
regress .......... Linear regression 1845
regress postestimation .......... Postestimation tools for regress 1870
regress postestimation diagnostic plots .......... Postestimation plots for regress 1905
regress postestimation time series .......... Postestimation tools for regress with time series 1924
#review .......... Review previous commands 1934
roc .......... Receiver operating characteristic (ROC) analysis 1935
roccomp .......... Tests of equality of ROC areas 1937
rocfit .......... Parametric ROC models 1949
rocfit postestimation .......... Postestimation tools for rocfit 1956
rocreg .......... Receiver operating characteristic (ROC) regression 1960
rocreg postestimation .......... Postestimation tools for rocreg 2013
rocregplot .......... Plot marginal and covariate-specific ROC curves after rocreg 2028
roctab .......... Nonparametric ROC analysis 2048
rologit .......... Rank-ordered logistic regression 2058
rologit postestimation .......... Postestimation tools for rologit 2075
rreg .......... Robust regression 2077
rreg postestimation .......... Postestimation tools for rreg 2084
runtest .......... Test for random order 2086
scobit .......... Skewed logistic regression 2092
scobit postestimation .......... Postestimation tools for scobit 2101
sdtest .......... Variance-comparison tests 2104
search .......... Search Stata documentation and other resources 2110
serrbar .......... Graph standard error bar chart 2116
set .......... Overview of system parameters 2119
set cformat .......... Format settings for coefficient tables 2131
set defaults .......... Reset system parameters to original Stata defaults 2134
set emptycells .......... Set what to do with empty cells in interactions 2136
set seed .......... Specify initial value of random-number seed 2137
set showbaselevels .......... Display settings for coefficient tables 2142
signrank .......... Equality tests on matched data 2151
simulate .......... Monte Carlo simulations 2157
sj .......... Stata Journal and STB installation instructions 2164
sktest .......... Skewness and kurtosis test for normality 2167
slogit .......... Stereotype logistic regression 2172
slogit postestimation .......... Postestimation tools for slogit 2185
smooth .......... Robust nonlinear smoother 2189
spearman .......... Spearman’s and Kendall’s correlations 2197
spikeplot .......... Spike plots and rootograms 2206
ssc .......... Install and uninstall packages from SSC 2210
stem .......... Stem-and-leaf displays 2218
stepwise .......... Stepwise estimation 2222
stored results .......... Stored results 2232
suest .......... Seemingly unrelated estimation 2237
summarize .......... Summary statistics 2255
sunflower .......... Density-distribution sunflower plots 2265
sureg .......... Zellner’s seemingly unrelated regression 2271
sureg postestimation .......... Postestimation tools for sureg 2279
swilk .......... Shapiro–Wilk and Shapiro–Francia tests for normality 2282
symmetry .......... Symmetry and marginal homogeneity tests 2286
table .......... Flexible table of summary statistics 2294
tabstat .......... Compact table of summary statistics 2305
tabulate oneway .......... One-way table of frequencies 2310
tabulate twoway .......... Two-way table of frequencies 2318
tabulate, summarize() .......... One- and two-way tables of summary statistics 2335
test .......... Test linear hypotheses after estimation 2340
testnl .......... Test nonlinear hypotheses after estimation 2359
tetrachoric .......... Tetrachoric correlations for binary variables 2368
tnbreg .......... Truncated negative binomial regression 2378
tnbreg postestimation .......... Postestimation tools for tnbreg 2387
tobit .......... Tobit regression 2391
tobit postestimation .......... Postestimation tools for tobit 2398
total .......... Estimate totals 2403
total postestimation .......... Postestimation tools for total 2409
tpoisson .......... Truncated Poisson regression 2410
tpoisson postestimation .......... Postestimation tools for tpoisson 2418
translate .......... Print and translate logs 2421
truncreg .......... Truncated regression 2431
truncreg postestimation .......... Postestimation tools for truncreg 2438
ttest .......... t tests (mean-comparison tests) 2441
update .......... Check for official updates 2451
vce_option .......... Variance estimators 2454
view .......... View files and logs 2459
vwls .......... Variance-weighted least squares 2462
vwls postestimation .......... Postestimation tools for vwls 2468
which .......... Display location and version for an ado-file 2470
xi .......... Interaction expansion 2472
zinb .......... Zero-inflated negative binomial regression 2482
zinb postestimation .......... Postestimation tools for zinb 2489
zip .......... Zero-inflated Poisson regression 2492
zip postestimation .......... Postestimation tools for zip 2499
Author index .......... 2503
Subject index .......... 2519
Cross-referencing the documentation
When reading this manual, you will find references to other Stata manuals. For example,
[U] 26 Overview of Stata estimation commands
[XT] xtabond
[D] reshape
The first example is a reference to chapter 26, Overview of Stata estimation commands, in the User’s
Guide; the second is a reference to the xtabond entry in the Longitudinal-Data/Panel-Data Reference
Manual; and the third is a reference to the reshape entry in the Data Management Reference Manual.
All the manuals in the Stata Documentation have a shorthand notation:
[GSM]    Getting Started with Stata for Mac
[GSU]    Getting Started with Stata for Unix
[GSW]    Getting Started with Stata for Windows
[U]      Stata User’s Guide
[R]      Stata Base Reference Manual
[D]      Stata Data Management Reference Manual
[G]      Stata Graphics Reference Manual
[XT]     Stata Longitudinal-Data/Panel-Data Reference Manual
[ME]     Stata Multilevel Mixed-Effects Reference Manual
[MI]     Stata Multiple-Imputation Reference Manual
[MV]     Stata Multivariate Statistics Reference Manual
[PSS]    Stata Power and Sample-Size Reference Manual
[P]      Stata Programming Reference Manual
[SEM]    Stata Structural Equation Modeling Reference Manual
[SVY]    Stata Survey Data Reference Manual
[ST]     Stata Survival Analysis and Epidemiological Tables Reference Manual
[TS]     Stata Time-Series Reference Manual
[TE]     Stata Treatment-Effects Reference Manual:
           Potential Outcomes/Counterfactual Outcomes
[I]      Stata Glossary and Index
[M]      Mata Reference Manual
Title
intro — Introduction to base reference manual
Description Remarks and examples Also see
Description
This entry describes the organization of the reference manuals.
Remarks and examples
The complete list of reference manuals is as follows:
[R]      Stata Base Reference Manual
[D]      Stata Data Management Reference Manual
[G]      Stata Graphics Reference Manual
[XT]     Stata Longitudinal-Data/Panel-Data Reference Manual
[ME]     Stata Multilevel Mixed-Effects Reference Manual
[MI]     Stata Multiple-Imputation Reference Manual
[MV]     Stata Multivariate Statistics Reference Manual
[PSS]    Stata Power and Sample-Size Reference Manual
[P]      Stata Programming Reference Manual
[SEM]    Stata Structural Equation Modeling Reference Manual
[SVY]    Stata Survey Data Reference Manual
[ST]     Stata Survival Analysis and Epidemiological Tables Reference Manual
[TS]     Stata Time-Series Reference Manual
[TE]     Stata Treatment-Effects Reference Manual:
           Potential Outcomes/Counterfactual Outcomes
[I]      Stata Glossary and Index
[M]      Mata Reference Manual
When we refer to “reference manuals”, we mean all manuals listed above.
When we refer to the specialty manuals, we mean all the manuals listed above except [R] and [I].
Arrangement of the reference manuals
Each manual contains the following sections:
Contents.
A table of contents can be found at the beginning of each manual.
Cross-referencing the documentation.
This entry lists all the manuals and explains how they are cross-referenced.
Introduction.
This entry, usually called intro, provides an overview of the manual. In the specialty manuals,
this introduction suggests entries that you might want to read first and provides information about
new features.
Each specialty manual contains an overview of the commands described in it.
Entries.
Entries are arranged in alphabetical order. Most entries describe Stata commands, but some entries
discuss concepts, and others provide overviews.
Entries that describe estimation commands are followed by an entry discussing postestimation
commands that are available for use after the estimation command. For example, the xtlogit entry
in the [XT] manual is followed by the xtlogit postestimation entry.
Index.
An index can be found at the end of each manual.
The Glossary and Index, [I], contains a subject table of contents for all the reference manuals and
the User’s Guide, a combined acronym glossary, a combined glossary, a vignette index, a combined
author index, and a combined subject index for all the manuals.
To find information and commands quickly, use Stata’s search command; see [R] search (the search
entry in this manual).
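For instance, to locate documentation on a topic or command, you might type something like the
following (the keywords here are illustrative only):

        . search hausman
        . search panel data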
Arrangement of each entry
Entries in the Stata reference manuals, except the [M] and [SEM] manuals, generally contain the
following sections, which are explained below:
Syntax
Menu
Description
Options
Remarks and examples
Stored results
Methods and formulas
References
Also see
Syntax
A command’s syntax diagram shows how to type the command, indicates all possible options, and
gives the minimal allowed abbreviations for all the items in the command. For instance, the syntax
diagram for the summarize command is
        summarize [varlist] [if] [in] [weight] [, options]

    options            Description
    Main
      detail           display additional statistics
      meanonly         suppress the display; calculate only the mean; programmer’s option
      format           use variable’s display format
      separator(#)     draw separator line after every # variables; default is separator(5)
      display_options  control spacing and base and empty cells

    varlist may contain factor variables; see [U] 11.4.3 Factor variables.
    varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
    by is allowed; see [D] by.
    aweights, fweights, and iweights are allowed. However, iweights may not be used with the
      detail option; see [U] 11.1.6 weight.
Items in the typewriter-style font should be typed exactly as they appear in the diagram,
although they may be abbreviated. Underlining indicates the shortest abbreviations where abbre-
viations are allowed. For instance, summarize may be abbreviated su, sum, summ, etc., or it may be
spelled out completely. Items in the typewriter font that are not underlined may not be abbreviated.
Square brackets denote optional items. In the syntax diagram above, varlist, if, in, weight, and the
options are optional.
The options are listed in a table immediately following the diagram, along with a brief description
of each.
Items typed in italics represent arguments for which you are to substitute variable names, observation
numbers, and the like.
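As a small sketch of these conventions, the following commands are all equivalent: the optional parts
of the diagram are simply omitted, and the command name is abbreviated down to its underlined
minimum (mpg and weight are variables from the auto dataset shipped with Stata):

        . summarize mpg weight
        . sum mpg weight
        . su mpg weight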
The diagrams use the following symbols:
#          Indicates a literal number, for example, 5; see [U] 12.2 Numbers.
[ ]        Anything enclosed in brackets is optional.
{ }        At least one of the items enclosed in braces must appear.
|          The vertical bar separates alternatives.
%fmt       Any Stata format, for example, %8.2f; see [U] 12.5 Formats: Controlling how data
             are displayed.
depvar     The dependent variable in an estimation command; see [U] 20 Estimation and
             postestimation commands.
exp        Any algebraic expression, for example, (5+myvar)/2; see [U] 13 Functions and
             expressions.
filename   Any filename; see [U] 11.6 Filenaming conventions.
indepvars  The independent variables in an estimation command; see [U] 20 Estimation and
             postestimation commands.
newvar     A variable that will be created by the current command; see [U] 11.4.2 Lists of
             new variables.
numlist    A list of numbers; see [U] 11.1.8 numlist.
oldvar     A previously created variable; see [U] 11.4.1 Lists of existing variables.
options    A list of options; see [U] 11.1.7 options.
range      An observation range, for example, 5/20; see [U] 11.1.4 in range.
"string"   Any string of characters enclosed in double quotes; see [U] 12.4 Strings.
varlist    A list of variable names; see [U] 11.4 varlists. If varlist allows factor variables, a
             note to that effect will be shown below the syntax diagram; see [U] 11.4.3 Factor
             variables. If varlist allows time-series operators, a note to that effect will be shown
             below the syntax diagram; see [U] 11.4.4 Time-series varlists.
varname    A variable name; see [U] 11.3 Naming conventions.
weight     A [wgttype=exp] modifier; see [U] 11.1.6 weight and [U] 20.23 Weighted estimation.
xvar       The variable to be displayed on the horizontal axis.
yvar       The variable to be displayed on the vertical axis.
The Syntax section will indicate whether factor variables or time-series operators may be used
with a command. summarize allows factor variables and time-series operators.
If a command allows prefix commands, this will be indicated immediately following the table of
options. summarize allows by.
If a command allows weights, the types of weights allowed will be specified, with the default
weight listed first. summarize allows aweights, fweights, and iweights, and if the type of weight
is not specified, the default is aweights.
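Putting those notes together, the lines below sketch a few legal summarize calls that exercise the if
qualifier, the by prefix, and weights; they assume the auto dataset shipped with Stata:

        . sysuse auto, clear
        . summarize mpg if foreign==1, detail
        . by foreign, sort: summarize mpg
        . summarize mpg [aweight = displacement]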
Menu
A menu indicates how the dialog box for the command may be accessed using the menu system.
Description
Following the syntax diagram is a brief description of the purpose of the command.
Options
If the command allows any options, they are explained here, and for dialog users the location of
the options in the dialog is indicated. For instance, in the logistic entry in this manual, the Options
section looks like this:
 
Model
. . .
 
SE/Robust
. . .
 
Reporting
. . .
 
Maximization
. . .
Remarks and examples
The explanations under Description and Options are exceedingly brief and technical; they are
designed to provide a quick summary. The remarks explain in English what the preceding technical
jargon means. Examples are used to illustrate the command.
Stored results
Commands are classified as e-class, r-class, s-class, or n-class, according to whether they store
calculated results in e(), r(), s(), or not at all. These results can then be used in subroutines by
other programs (ado-files). Such stored results are documented here; see [U] 18.8 Accessing results
calculated by other programs and [U] 18.9 Accessing results calculated by estimation commands.
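As a brief illustration (using the auto dataset shipped with Stata), summarize is r-class and regress
is e-class, so their stored results are picked up from r() and e(), respectively:

        . sysuse auto, clear
        . summarize mpg
        . display r(mean)             // arithmetic mean just computed by summarize
        . regress mpg weight
        . display e(r2)               // R-squared stored by regress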
Methods and formulas
The techniques and formulas used in obtaining the results are described here as tersely and
technically as possible.
References
Published sources are listed that either were directly referenced in the preceding text or might be
of interest.
Also see
Other manual entries relating to this entry are listed that might also interest you.
 
Elizabeth L. (“Betty”) Scott (1917–1988) was an astronomer and mathematician trained at the
University of California at Berkeley. She published her first paper when she was just 22 years
old, and her work was focused on comets for much of her early academic career.
During World War II, Scott began working at the statistical laboratory at Berkeley, which
had recently been established by Jerzy Neyman, sparking what would be a long and fruitful
collaboration with him. After the war, she shifted her focus toward mathematics and statistics,
partly because of limited career opportunities as an astronomer, though she still applied her
research to astronomical topics. For example, in 1949 she published a paper using statistical
techniques to analyze the distribution of binary star systems. She also published papers examining
the distribution of galaxies, and she is the name behind the “Scott effect”, which helps determine
the distances to galaxies. Later in her career, Scott applied her statistical knowledge to problems
associated with ozone depletion and its effects on the incidence of skin cancer as well as weather
modification. She was also a champion of equality for women graduate students and faculty.
Among Scott’s many awards and accomplishments, she was elected an honorary fellow of the
Royal Statistical Society and was a fellow of the American Association for the Advancement of
Science. In 1992, the Committee of Presidents of Statistical Societies established the Elizabeth
L. Scott Award, a biannual award to recognize those who have strived to enhance the status of
women within the statistics profession.
 
Also see
[U] 1.1 Getting Started with Stata
Title
about — Display information about your Stata
Syntax Menu Description Remarks and examples Also see
Syntax
about
Menu
Help > About Stata
Description
about displays information about your version of Stata.
Remarks and examples
If you are running Stata for Windows, information about memory is also displayed:
. about
Stata/MP 13 for Windows (64-bit x86-64)
Revision date
Copyright 1985-2013 StataCorp LP
Total physical memory: 8388608 KB
Available physical memory: 937932 KB
10-user 32-core Stata network perpetual license:
Serial number: 5013041234
Licensed to: Alan R. Riley
StataCorp
Also see
[R] which   Display location and version for an ado-file
[U] 3 Resources for learning and using Stata
[U] 5 Flavors of Stata
Title
adoupdate — Update user-written ado-files
Syntax Description Options Remarks and examples
Stored results Also see
Syntax
adoupdate [pkglist] [, options]
    options        Description
      update       perform update; default is to list packages that have updates, but not to
                     update them
      all          include packages that might have updates; default is to list or update
                     only packages that are known to have updates
      ssconly      check only packages obtained from SSC; default is to check all installed
                     packages
      dir(dir)     check packages installed in dir; default is to check those installed in PLUS
      verbose      provide output to assist in debugging network problems
Description
User-written additions to Stata are called packages. These packages can add remarkable abilities
to Stata. Packages are found and installed by using ssc, search, and net; see [R] ssc, [R] search,
and [R] net.
User-written packages are updated by their developers, just as official Stata software is updated
by StataCorp.
To determine whether your official Stata software is up to date, and to update it if it is not, you
use update; see [R] update.
To determine whether your user-written additions are up to date, and to update them if they are
not, you use adoupdate.
Options
update specifies that packages with updates be updated. The default is simply to list the packages
that could be updated without actually performing the update.
The first time you adoupdate, do not specify this option. Once you see adoupdate work, you
will be more comfortable with it. Then type
. adoupdate, update
The packages that can be updated will be listed and updated.
all is rarely specified. Sometimes, adoupdate cannot determine whether a package you previously
installed has been updated. adoupdate can determine that the package is still available over the
web but is unsure whether the package has changed. Usually, the package has not changed, but
if you want to be certain that you are using the latest version, reinstall from the source.
Specifying all does this. Typing
. adoupdate, all
adds such packages to the displayed list as needing updating but does not update them. Typing
. adoupdate, update all
lists such packages and updates them.
ssconly is a popular option. Many packages are available from the Statistical Software Components
(SSC) archive, often called the Boston College Archive, which is provided at http://repec.org.
Many users find most of what they want there. See [R] ssc for more information on the SSC.
ssconly specifies that adoupdate check only packages obtained from that source. Specifying
this option is popular because SSC always provides distribution dates, and so adoupdate can be
certain whether an update exists.
dir(dir) specifies which installed packages are to be checked. The default is dir(PLUS), and that is
probably correct. If you are responsible for maintaining a large system, however, you may have
previously installed packages in dir(SITE), where they are shared across users. See [P] sysdir
for an explanation of these directory codewords. You may also specify an actual directory name,
such as C:\mydir.
verbose is specified when you suspect network problems. It provides more detailed output that may
help you diagnose the problem.
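The options may be combined as needed; the following calls are illustrative only:

        . adoupdate, ssconly                  // report on SSC packages only
        . adoupdate, update dir(SITE)         // update packages installed in the SITE directory
        . adoupdate, update verbose           // update and show extra output for network debugging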
Remarks and examples
Do not confuse adoupdate with update. Use adoupdate to update user-written files. Use update
to update the components (including ado-files) of the official Stata software. To use either command,
you must be connected to the Internet.
Remarks are presented under the following headings:
Using adoupdate
Possible problem the first time you run adoupdate and the solution
Notes for developers
Using adoupdate
The first time you try adoupdate, type
. adoupdate
That is, do not specify the update option. adoupdate without update produces a report but does
not update any files. The first time you run adoupdate, you may see messages such as
. adoupdate
(note: package utx was installed more than once; older copy removed)
(remaining output omitted)
Having the same packages installed multiple times is common; adoupdate cleans that up.
The second time you run adoupdate, pick one package to update. Suppose that the report indicates
that package st0008 has an update available. Type
. adoupdate st0008, update
You can specify one or many packages after the adoupdate command. You can even use wildcards
such as st* to mean all packages that start with st or st*8 to mean all packages that start with st
and end with 8. You can do that with or without the update option.
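For example, assuming you have installed packages whose names begin with st, you could type

        . adoupdate st*                       // report on all installed packages starting with st
        . adoupdate st*8, update              // update those starting with st and ending with 8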
Finally, you can let adoupdate update all your user-written additions:
. adoupdate, update
Possible problem the first time you run adoupdate and the solution
The first time you run adoupdate, you might get many duplicate messages:
. adoupdate
(note: package ___ installed more than once; older copy removed)
(note: package ___ installed more than once; older copy removed)
(note: package ___ installed more than once; older copy removed)
...
(note: package ___ installed more than once; older copy removed)
(remaining output omitted)
Some users have hundreds of duplicates. You might even see the same package name repeated
more than once:
(note: package stylus installed more than once; older copy removed)
(note: package stylus installed more than once; older copy removed)
That means that the package was duplicated twice.
Stata tolerates duplicates, and you did nothing wrong when you previously installed and updated
packages. adoupdate, however, needs the duplicates removed, mainly so that it does not keep
checking the same files.
The solution is to just let adoupdate run. adoupdate will run faster next time, when there are
no (or just a few) duplicates.
Notes for developers
adoupdate reports whether an installed package is up to date by comparing its distribution date
with that of the package available over the web.
If you are distributing software, include the line
d Distribution-Date: date
somewhere in your .pkg file. The capitalization of Distribution-Date does not matter, but include
the hyphen and the colon as shown. Code the date in either of two formats:
all numeric: yyyymmdd, for example, 20120701
Stata standard: ddMONyyyy, for example, 01jul2012
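For instance, the relevant lines of a hypothetical mypkg.pkg file might look like this (the package
name, description text, and date are placeholders):

        d mypkg: assorted utilities
        d Distribution-Date: 20130601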
Stored results
adoupdate stores the following in r():
Macros
r(pkglist) a space-separated list of package names that need updating (update not specified) or that
were updated (update specified)
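Because the list is returned in r(), it can be picked up in a do-file; a minimal sketch:

        . adoupdate
        . local needsupdate `r(pkglist)'
        . display "packages with updates: `needsupdate'"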
Also see
[R] net   Install and manage user-written additions from the Internet
[R] search   Search Stata documentation and other resources
[R] ssc   Install and uninstall packages from SSC
[R] update   Check for official updates
Title
ameans — Arithmetic, geometric, and harmonic means
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
ameans [varlist] [if] [in] [weight] [, options]
    options        Description
    Main
      add(#)       add # to each variable in varlist
      only         add # only to variables with nonpositive values
      level(#)     set confidence level; default is level(95)
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Arith./geometric/harmonic means
Description
ameans computes the arithmetic, geometric, and harmonic means, with their corresponding
confidence intervals, for each variable in varlist or for all the variables in the data if varlist is
not specified. gmeans and hmeans are synonyms for ameans.
If you simply want arithmetic means and corresponding confidence intervals, see [R] ci.
Options
 
Main
add(#)adds the value #to each variable in varlist before computing the means and confidence
intervals. This option is useful when analyzing variables with nonpositive values.
only modifies the action of the add(#)option so that it adds #only to variables with at least one
nonpositive value.
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
Remarks and examples
Example 1
We have a dataset containing 8 observations on a variable named x. The eight values are 5, 4,
-4, -5, 0, 0, missing, and 7.
        . ameans x

            Variable |    Type          Obs        Mean     [95% Conf. Interval]
        -------------+-----------------------------------------------------------
                   x | Arithmetic         7           1     -3.204405    5.204405
                     | Geometric          3    5.192494       2.57899    10.45448
                     | Harmonic           3    5.060241      3.023008     15.5179

        . ameans x, add(5)

            Variable |    Type          Obs        Mean     [95% Conf. Interval]
        -------------+-----------------------------------------------------------
                   x | Arithmetic         7           6      1.795595     10.2044  *
                     | Geometric          6    5.477226        2.1096    14.22071  *
                     | Harmonic           6    3.540984             .           .  *

        (*) 5 was added to the variables prior to calculating the results.

        Missing values in confidence intervals for harmonic mean indicate
        that confidence interval is undefined for corresponding variables.
        Consult Reference Manual for details.
The number of observations displayed for the arithmetic mean is the number of nonmissing observations.
The number of observations displayed for the geometric and harmonic means is the number of
nonmissing, positive observations. Specifying the add(5) option produces 3 more positive observations.
The confidence interval for the harmonic mean is not reported; see Methods and formulas below.
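To see what only does, imagine a second variable that is strictly positive; in the hypothetical
command below, 5 would be added to x (which contains nonpositive values) but not to y:

        . ameans x y, add(5) only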
Video example
Descriptive statistics in Stata
Stored results
ameans stores the following in r():
Scalars
  r(N)         number of nonmissing observations; used for arithmetic mean
  r(N_pos)     number of nonmissing positive observations; used for geometric and harmonic means
  r(mean)      arithmetic mean
  r(lb)        lower bound of confidence interval for arithmetic mean
  r(ub)        upper bound of confidence interval for arithmetic mean
  r(Var)       variance of untransformed data
  r(mean_g)    geometric mean
  r(lb_g)      lower bound of confidence interval for geometric mean
  r(ub_g)      upper bound of confidence interval for geometric mean
  r(Var_g)     variance of ln x_i
  r(mean_h)    harmonic mean
  r(lb_h)      lower bound of confidence interval for harmonic mean
  r(ub_h)      upper bound of confidence interval for harmonic mean
  r(Var_h)     variance of 1/x_i
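Any of these results can be retrieved immediately after the command; for example, assuming the
stored-result names listed above and continuing with the variable x:

        . ameans x
        . display r(mean_g)           // geometric mean
        . display r(lb_h)             // lower confidence bound for the harmonic mean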
Methods and formulas
See Armitage, Berry, and Matthews (2002) or Snedecor and Cochran (1989). For a history of the
concept of the mean, see Plackett (1958).
When restricted to the same set of values (that is, to positive values), the arithmetic mean (x) is
greater than or equal to the geometric mean, which in turn is greater than or equal to the harmonic
mean. Equality holds only if all values within a sample are equal to a positive constant.
The arithmetic mean and its confidence interval are identical to those provided by ci; see [R]ci.
To compute the geometric mean, ameans first creates uj=lnxjfor all positive xj. The arithmetic
mean of the ujand its confidence interval are then computed as in ci. Let ube the resulting mean,
and let [L, U ]be the corresponding confidence interval. The geometric mean is then exp(u), and
its confidence interval is [ exp(L),exp(U) ].
The same procedure is followed for the harmonic mean, except that then uj=1/xj. The harmonic
mean is then 1/u, and its confidence interval is [1/U, 1/L ]if Lis greater than zero. If Lis not
greater than zero, this confidence interval is not defined, and missing values are reported.
When weights are specified, ameans applies the weights to the transformed values, u_j = ln x_j
and u_j = 1/x_j, respectively, when computing the geometric and harmonic means. For details on
how the weights are used to compute the mean and variance of the u_j, see [R] summarize. Without
weights, the formula for the geometric mean reduces to

    exp{ (1/n) Σ_j ln(x_j) }

Without weights, the formula for the harmonic mean is

    n / Σ_j (1/x_j)
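The log-transform procedure for the geometric mean can be reproduced by hand with ci (a minimal sketch, assuming an unweighted variable x with at least one positive value):
. generate double u = ln(x) if x > 0
. ci u
. display exp(r(mean)) "  " exp(r(lb)) "  " exp(r(ub))
The displayed values match the geometric mean and its confidence interval reported by ameans. Replacing ln(x) with 1/x and exponentiation with reciprocals (noting that the bounds swap) gives the harmonic-mean analog, provided the lower confidence bound for the mean of 1/x is positive.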
Acknowledgments
This improved version of ameans is based on the gmci command (Carlin, Vidmar, and
Ramalheira 1998) and was written by John Carlin of the Murdoch Children’s Research Institute and the
University of Melbourne; Suzanna Vidmar of the University of Melbourne; and Carlos Ramalheira
of Coimbra University Hospital, Portugal.
References
Armitage, P., G. Berry, and J. N. S. Matthews. 2002. Statistical Methods in Medical Research. 4th ed. Oxford:
Blackwell.
Carlin, J. B., S. Vidmar, and C. Ramalheira. 1998. sg75: Geometric means and confidence intervals. Stata Technical
Bulletin 41: 23–25. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 197–199. College Station, TX: Stata
Press.
Keynes, J. M. 1911. The principal averages and the laws of error which lead to them. Journal of the Royal Statistical
Society 74: 322–331.
Plackett, R. L. 1958. Studies in the history of probability and statistics: VII. The principle of the arithmetic mean.
Biometrika 45: 130–135.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Stigler, S. M. 1985. Arithmetic means. In Vol. 1 of Encyclopedia of Statistical Sciences, ed. S. Kotz and N. L.
Johnson, 126–129. New York: Wiley.
Also see
[R] ci            Confidence intervals for means, proportions, and counts
[R] mean          Estimate means
[R] summarize     Summary statistics
[SVY] svy estimation     Estimation commands for survey data
Title
anova — Analysis of variance and covariance
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
anova varname [termlist] [if] [in] [weight] [, options]
where termlist is a factor-variable list (see [U] 11.4.3 Factor variables) with the following additional
features:
Variables are assumed to be categorical; use the c. factor-variable operator to override this.
The | symbol (indicating nesting) may be used in place of the # symbol (indicating interaction).
The / symbol is allowed after a term and indicates that the following term is the error term
for the preceding terms.
options Description
Model
repeated(varlist)      variables in terms that are repeated-measures variables
partial                use partial (or marginal) sums of squares
sequential             use sequential sums of squares
noconstant             suppress constant term
dropemptycells         drop empty cells from the design matrix
Adv. model
bse(term)              between-subjects error term in repeated-measures ANOVA
bseunit(varname)       variable representing lowest unit in the between-subjects error term
grouping(varname)      grouping variable for computing pooled covariance matrix
bootstrap, by, fp, jackknife, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights and fweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >ANOVA/MANOVA >Analysis of variance and covariance
Description
The anova command fits analysis-of-variance (ANOVA) and analysis-of-covariance (ANCOVA) models
for balanced and unbalanced designs, including designs with missing cells; for repeated-measures
ANOVA; and for factorial, nested, or mixed designs.
The regress command (see [R] regress) will display the coefficients, standard errors, etc., of the
regression model underlying the last run of anova.
If you want to fit one-way ANOVA models, you may find the oneway or loneway command more
convenient; see [R] oneway and [R] loneway. If you are interested in MANOVA or MANCOVA, see
[MV] manova.
Options
 
Model
repeated(varlist) indicates the names of the categorical variables in the terms that are to be treated
as repeated-measures variables in a repeated-measures ANOVA or ANCOVA.
partial presents the ANOVA table using partial (or marginal) sums of squares. This setting is the
default. Also see the sequential option.
sequential presents the ANOVA table using sequential sums of squares.
noconstant suppresses the constant term (intercept) from the ANOVA or regression model.
dropemptycells drops empty cells from the design matrix. If c(emptycells) is set to keep (see
[R] set emptycells), this option temporarily resets it to drop before running the ANOVA model. If
c(emptycells) is already set to drop, this option does nothing.
 
Adv. model
bse(term) indicates the between-subjects error term in a repeated-measures ANOVA. This option
is needed only in the rare case when the anova command cannot automatically determine the
between-subjects error term.
bseunit(varname) indicates the variable representing the lowest unit in the between-subjects error
term in a repeated-measures ANOVA. This option is rarely needed because the anova command
automatically selects the first variable listed in the between-subjects error term as the default for
this option.
grouping(varname) indicates a variable that determines which observations are grouped together in
computing the covariance matrices that will be pooled and used in a repeated-measures ANOVA.
This option is rarely needed because the anova command automatically selects the combination
of all variables except the first (or as specified in the bseunit() option) in the between-subjects
error term as the default for grouping observations.
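These three options come into play only for repeated-measures models. As a minimal sketch with hypothetical variables, a design with one within-subject factor might be fit as
. anova score id trt, repeated(trt)
where id identifies the subjects and trt is the within-subject (repeated) factor; bse(), bseunit(), and grouping() would be added only for designs whose error structure anova cannot work out on its own.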
Remarks and examples
Remarks are presented under the following headings:
Introduction
One-way ANOVA
Two-way ANOVA
N-way ANOVA
Weighted data
ANCOVA
Nested designs
Mixed designs
Latin-square designs
Repeated-measures ANOVA
Video examples
Introduction
anova uses least squares to fit the linear models known as ANOVA or ANCOVA (henceforth referred
to simply as ANOVA models).
If your interest is in one-way ANOVA, you may find the oneway command to be more convenient;
see [R] oneway.
Structural equation modeling provides a more general framework for fitting ANOVA models; see
the Stata Structural Equation Modeling Reference Manual.
ANOVA was pioneered by Fisher. It features prominently in his texts on statistical methods and his
design of experiments (1925, 1935). Many books discuss ANOVA; see, for instance, Altman (1991); van
Belle et al. (2004); Cobb (1998); Snedecor and Cochran (1989); or Winer, Brown, and Michels (1991).
For a classic source, see Scheffé (1959). Kennedy and Gentle (1980) discuss ANOVA's computing
problems. Edwards (1985) is concerned primarily with the relationship between multiple regression
and ANOVA. Acock (2014, chap. 9) illustrates his discussion with Stata output. Repeated-measures
ANOVA is discussed in Winer, Brown, and Michels (1991); Kuehl (2000); and Milliken and
Johnson (2009). Pioneering work in repeated-measures ANOVA can be found in Box (1954); Geisser and
Greenhouse (1958); Huynh and Feldt (1976); and Huynh (1978). For a Stata-specific discussion of
ANOVA contrasts, see Mitchell (2012, chap. 7–9).
One-way ANOVA
anova, entered without options, performs and reports standard ANOVA. For instance, to perform a
one-way layout of a variable called endog on exog, you would type anova endog exog.
Example 1: One-way ANOVA
We run an experiment varying the amount of fertilizer used in growing apple trees. We test four
concentrations, using each concentration in three groves of 12 trees each. Later in the year, we
measure the average weight of the fruit.
If all had gone well, we would have had 3 observations on the average weight for each of the
four concentrations. Instead, two of the groves were mistakenly leveled by a confused man on a large
bulldozer. We are left with the following data:
. use http://www.stata-press.com/data/r13/apple
(Apple trees)
. list, abbrev(10) sepby(treatment)
treatment weight
1. 1 117.5
2. 1 113.8
3. 1 104.4
4. 2 48.9
5. 2 50.4
6. 2 58.9
7. 3 70.4
8. 3 86.9
9. 4 87.7
10. 4 67.3
To obtain one-way ANOVA results, we type
. anova weight treatment
Number of obs = 10 R-squared = 0.9147
Root MSE = 9.07002 Adj R-squared = 0.8721
Source Partial SS df MS F Prob > F
Model 5295.54433 3 1765.18144 21.46 0.0013
treatment 5295.54433 3 1765.18144 21.46 0.0013
Residual 493.591667 6 82.2652778
Total 5789.136 9 643.237333
We find significant (at better than the 1% level) differences among the four concentrations.
Although the output is a usual ANOVA table, let’s run through it anyway. Above the table is a
summary of the underlying regression. The model was fit on 10 observations, and the root mean
squared error (Root MSE) is 9.07. The R2 for the model is 0.9147, and the adjusted R2 is 0.8721.
The first line of the table summarizes the model. The sum of squares (Partial SS) for the model is
5295.5 with 3 degrees of freedom (df). This line results in a mean square (MS) of 5295.5/3 ≈ 1765.2.
The corresponding F statistic is 21.46 and has a significance level of 0.0013. Thus the model appears
to be significant at the 0.13% level.
The next line summarizes the first (and only) term in the model, treatment. Because there is
only one term, the line is identical to that for the overall model.
The third line summarizes the residual. The residual sum of squares is 493.59 with 6 degrees of
freedom, resulting in a mean squared error of 82.27. The square root of this latter number is reported
as the Root MSE.
The model plus the residual sum of squares equals the total sum of squares, which is reported as
5789.1 in the last line of the table. This is the total sum of squares of weight after removal of the
mean. Similarly, the model plus the residual degrees of freedom sum to the total degrees of freedom,
9. Remember that there are 10 observations. Subtracting 1 for the mean, we are left with 9 total
degrees of freedom.
Technical note
Rather than using the anova command, we could have performed this analysis by using the
oneway command. Example 1 in [R]oneway repeats this same analysis. You may wish to compare
the output.
Type regress to see the underlying regression model corresponding to an ANOVA model fit using
the anova command.
Example 2: Regression table from a one-way ANOVA
Returning to the apple tree experiment, we found that the fertilizer concentration appears to
significantly affect the average weight of the fruit. Although that finding is interesting, we next want
to know which concentration appears to grow the heaviest fruit. One way to find out is by examining
the underlying regression coefficients.
. regress, baselevels
Source SS df MS Number of obs = 10
F( 3, 6) = 21.46
Model 5295.54433 3 1765.18144 Prob > F = 0.0013
Residual 493.591667 6 82.2652778 R-squared = 0.9147
Adj R-squared = 0.8721
Total 5789.136 9 643.237333 Root MSE = 9.07
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
treatment
1 0 (base)
2 -59.16667 7.405641 -7.99 0.000 -77.28762 -41.04572
3 -33.25 8.279758 -4.02 0.007 -53.50984 -12.99016
4 -34.4 8.279758 -4.15 0.006 -54.65984 -14.14016
_cons 111.9 5.236579 21.37 0.000 99.08655 124.7134
See [R]regress for an explanation of how to read this table. The baselevels option of regress
displays a row indicating the base category for our categorical variable, treatment. In summary,
we find that concentration 1, the base (omitted) group, produces significantly heavier fruits than
concentration 2, 3, and 4; concentration 2 produces the lightest fruits; and concentrations 3 and 4
appear to be roughly equivalent.
Example 3: ANOVA replay
We previously typed anova weight treatment to produce and display the ANOVA table for our
apple tree experiment. Typing regress displays the regression coefficients. We can redisplay the
ANOVA table by typing anova without arguments:
. anova
Number of obs = 10 R-squared = 0.9147
Root MSE = 9.07002 Adj R-squared = 0.8721
Source Partial SS df MS F Prob > F
Model 5295.54433 3 1765.18144 21.46 0.0013
treatment 5295.54433 3 1765.18144 21.46 0.0013
Residual 493.591667 6 82.2652778
Total 5789.136 9 643.237333
Two-way ANOVA
You can include multiple explanatory variables with the anova command, and you can specify
interactions by placing ‘#’ between the variable names. For instance, typing anova y a b performs a
two-way layout of y on a and b. Typing anova y a b a#b performs a full two-way factorial layout.
The shorthand anova y a##b does the same.
With the default partial sums of squares, when you specify interacted terms, the order of the terms
does not matter. Typing anova y a b a#b is the same as typing anova y b a b#a.
Example 4: Two-way factorial ANOVA
The classic two-way factorial ANOVA problem, at least as far as computer manuals are concerned,
is a two-way ANOVA design from Afifi and Azen (1979).
Fifty-eight patients, each suffering from one of three different diseases, were randomly assigned
to one of four different drug treatments, and the change in their systolic blood pressure was recorded.
Here are the data:
            Disease 1                 Disease 2                 Disease 3
Drug 1      42, 44, 36, 13, 19, 22    33, 26, 33, 21            31, -3, 25, 25, 24
Drug 2      28, 23, 34, 42, 13        34, 33, 31, 36            3, 26, 28, 32, 4, 16
Drug 3      1, 29, 19                 11, 9, 7, 1, -6           21, 1, 9, 3
Drug 4      24, 9, 22, -2, 15         27, 12, 12, -5, 16, 15    22, 7, 25, 5, 12
Let’s assume that we have entered these data into Stata and stored the data as systolic.dta.
Below we use the data, list the first 10 observations, summarize the variables, and tabulate the
control variables:
. use http://www.stata-press.com/data/r13/systolic
(Systolic Blood Pressure Data)
. list in 1/10
drug disease systolic
1. 1 1 42
2. 1 1 44
3. 1 1 36
4. 1 1 13
5. 1 1 19
6. 1 1 22
7. 1 2 33
8. 1 2 26
9. 1 2 33
10. 1 2 21
. summarize
Variable Obs Mean Std. Dev. Min Max
drug 58 2.5 1.158493 1 4
disease 58 2.017241 .8269873 1 3
systolic 58 18.87931 12.80087 -6 44
. tabulate drug disease
Patient’s Disease
Drug Used 1 2 3 Total
1 6 4 5 15
2 5 4 6 15
3 3 5 4 12
4 5 6 5 16
Total 19 19 20 58
Each observation in our data corresponds to one patient, and for each patient we record drug,
disease, and the increase in the systolic blood pressure, systolic. The tabulation reveals that the
data are not balanced: there are not equal numbers of patients in each drug–disease cell. Stata
does not require that the data be balanced. We can perform a two-way factorial ANOVA by typing
. anova systolic drug disease drug#disease
Number of obs = 58 R-squared = 0.4560
Root MSE = 10.5096 Adj R-squared = 0.3259
Source Partial SS df MS F Prob > F
Model 4259.33851 11 387.212591 3.51 0.0013
drug 2997.47186 3 999.157287 9.05 0.0001
disease 415.873046 2 207.936523 1.88 0.1637
drug#disease 707.266259 6 117.87771 1.07 0.3958
Residual 5080.81667 46 110.452536
Total 9340.15517 57 163.862371
Although Stata’s table command does not perform ANOVA, it can produce useful summary tables
of your data (see [R] table):
. table drug disease, c(mean systolic) row col f(%8.2f)
Patient’s Disease
Drug Used 1 2 3 Total
1 29.33 28.25 20.40 26.07
2 28.00 33.50 18.17 25.53
3 16.33 4.40 8.50 8.75
4 13.60 12.83 14.20 13.50
Total 22.79 18.21 15.80 18.88
These are simple means and are not influenced by our anova model. More useful is the margins
command (see [R] margins) that provides marginal means and adjusted predictions. Because drug
is the only significant factor in our ANOVA, we now examine the adjusted marginal means for drug.
. margins drug, asbalanced
Adjusted predictions Number of obs = 58
Expression : Linear prediction, predict()
at : drug (asbalanced)
disease (asbalanced)
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
drug
1 25.99444 2.751008 9.45 0.000 20.45695 31.53194
2 26.55556 2.751008 9.65 0.000 21.01806 32.09305
3 9.744444 3.100558 3.14 0.003 3.503344 15.98554
4 13.54444 2.637123 5.14 0.000 8.236191 18.8527
These adjusted marginal predictions are not equal to the simple drug means (see the total column from
the table command); they are based upon predictions from our ANOVA model. The asbalanced
option of margins corresponds with the interpretation of the F statistic produced by ANOVA: each
cell is given equal weight regardless of its sample size (see the following three technical notes). You
can omit the asbalanced option and obtain predictive margins that take into account the unequal
sample sizes of the cells.
. margins drug
Predictive margins Number of obs = 58
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
drug
1 25.89799 2.750533 9.42 0.000 20.36145 31.43452
2 26.41092 2.742762 9.63 0.000 20.89003 31.93181
3 9.722989 3.099185 3.14 0.003 3.484652 15.96132
4 13.55575 2.640602 5.13 0.000 8.24049 18.871
Technical note
How do you interpret the significance of terms like drug and disease in unbalanced data? If you
are familiar with SAS, the sums of squares and the F statistic reported by Stata correspond to SAS
type III sums of squares. (Stata can also calculate sequential sums of squares, but we will postpone
that topic for now.)
Let’s think in terms of the following table:
            Disease 1    Disease 2    Disease 3
Drug 1      µ11          µ12          µ13          µ1·
Drug 2      µ21          µ22          µ23          µ2·
Drug 3      µ31          µ32          µ33          µ3·
Drug 4      µ41          µ42          µ43          µ4·
            µ·1          µ·2          µ·3          µ··
In this table, µij is the mean increase in systolic blood pressure associated with drug i and disease
j, while µi· is the mean for drug i, µ·j is the mean for disease j, and µ·· is the overall mean.
If the data are balanced, meaning that there are equal numbers of observations going into the
calculation of each mean µij, the row means, µi·, are given by

    µi· = (µi1 + µi2 + µi3)/3

In our case, the data are not balanced, but we define the µi· according to that formula anyway. The
test for the main effect of drug is the test that

    µ1· = µ2· = µ3· = µ4·
To be absolutely clear, the F test of the term drug, called the main effect of drug, is formally
equivalent to the test of the three constraints:
    (µ11 + µ12 + µ13)/3 = (µ21 + µ22 + µ23)/3
    (µ11 + µ12 + µ13)/3 = (µ31 + µ32 + µ33)/3
    (µ11 + µ12 + µ13)/3 = (µ41 + µ42 + µ43)/3
In our data, we obtain a significant F statistic of 9.05 and thus reject those constraints.
Technical note
Stata can display the symbolic form underlying the test statistics it presents, as well as display other
test statistics and their symbolic forms; see Obtaining symbolic forms in [R] anova postestimation.
Here is the result of requesting the symbolic form for the main effect of drug in our data:
. test drug, symbolic
drug
1 -(r2+r3+r4)
2 r2
3 r3
4 r4
disease
1 0
2 0
3 0
drug#disease
1 1 -1/3 (r2+r3+r4)
1 2 -1/3 (r2+r3+r4)
1 3 -1/3 (r2+r3+r4)
2 1 1/3 r2
2 2 1/3 r2
2 3 1/3 r2
3 1 1/3 r3
3 2 1/3 r3
3 3 1/3 r3
4 1 1/3 r4
4 2 1/3 r4
4 3 1/3 r4
_cons 0
This says exactly what we said in the previous technical note.
Technical note
Saying that there is no main effect of a variable is not the same as saying that it has no effect at
all. Stata’s ability to perform ANOVA on unbalanced data can easily be put to ill use.
For example, consider the following table of the probability of surviving a bout with one of two
diseases according to the drug administered to you:
Disease 1 Disease 2
Drug 1 1 0
Drug 2 0 1
If you have disease 1 and are administered drug 1, you live. If you have disease 2 and are
administered drug 2, you live. In all other cases, you die.
This table has no main effects of either drug or disease, although there is a large interaction effect.
You might now be tempted to reason that because there is only an interaction effect, you would
be indifferent between the two drugs in the absence of knowledge about which disease infects you.
Given an equal chance of having either disease, you reason that it does not matter which drug is
administered to you; either way, your chances of surviving are 0.5.
You may not, however, have an equal chance of having either disease. If you knew that disease 1
was 100 times more likely to occur in the population, and if you knew that you had one of the two
diseases, you would express a strong preference for receiving drug 1.
When you calculate the significance of main effects on unbalanced data, you must ask yourself
why the data are unbalanced. If the data are unbalanced for random reasons and you are making
predictions for a balanced population, the test of the main effect makes perfect sense. If, however,
the data are unbalanced because the underlying populations are unbalanced and you are making
predictions for such unbalanced populations, the test of the main effect may be practically, if not
statistically, meaningless.
Example 5: ANOVA with missing cells
Stata can perform ANOVA not only on unbalanced populations, but also on populations that are
so unbalanced that entire cells are missing. For instance, using our systolic blood pressure data, let’s
refit the model eliminating the drug 1–disease 1 cell. Because anova follows the same syntax as all
other Stata commands, we can explicitly specify the data to be used by typing the if qualifier at the
end of the anova command. Here we want to use the data that are not for drug 1 and disease 1:
. anova systolic drug##disease if !(drug==1 & disease==1)
Number of obs = 52 R-squared = 0.4545
Root MSE = 10.1615 Adj R-squared = 0.3215
Source Partial SS df MS F Prob > F
Model 3527.95897 10 352.795897 3.42 0.0025
drug 2686.57832 3 895.526107 8.67 0.0001
disease 327.792598 2 163.896299 1.59 0.2168
drug#disease 703.007602 5 140.60152 1.36 0.2586
Residual 4233.48333 41 103.255691
Total 7761.44231 51 152.185143
Here we used drug##disease as a shorthand for drug disease drug#disease.
Technical note
The test of the main effect of drug in the presence of missing cells is more complicated than that
for unbalanced data. Our underlying tableau now has the following form:
            Disease 1    Disease 2    Disease 3
Drug 1                   µ12          µ13
Drug 2      µ21          µ22          µ23          µ2·
Drug 3      µ31          µ32          µ33          µ3·
Drug 4      µ41          µ42          µ43          µ4·
                         µ·2          µ·3
The hole in the drug 1–disease 1 cell indicates that the mean is unobserved. Considering the main
effect of drug, the test is unchanged for the rows in which all the cells are defined:

    µ2· = µ3· = µ4·
The first row, however, requires special attention. Here we want the average outcome for drug 1,
which is averaged only over diseases 2 and 3, to be equal to the average values of all other drugs
averaged over those same two diseases:
    (µ12 + µ13)/2 = { (µ22 + µ23)/2 + (µ32 + µ33)/2 + (µ42 + µ43)/2 } / 3
Thus the test contains three constraints:
    (µ21 + µ22 + µ23)/3 = (µ31 + µ32 + µ33)/3
    (µ21 + µ22 + µ23)/3 = (µ41 + µ42 + µ43)/3
    (µ12 + µ13)/2 = (µ22 + µ23 + µ32 + µ33 + µ42 + µ43)/6
Stata can calculate two types of sums of squares, partial and sequential. If you do not specify
which sums of squares to calculate, Stata calculates partial sums of squares. The technical notes
above have gone into great detail about the definition and use of partial sums of squares. Use the
sequential option to obtain sequential sums of squares.
Technical note
Before we illustrate sequential sums of squares, consider one more feature of the partial sums. If
you know how such things are calculated, you may worry that the terms must be specified in some
particular order, that Stata would balk or, even worse, produce different results if you typed, say,
anova drug#disease drug disease rather than anova drug disease drug#disease. We assure
you that is not the case.
When you type a model, Stata internally reorganizes the terms, forms the cross-product matrix,
inverts it, converts the result to an upper-Hermite form, and then performs the hypothesis tests. As a
final touch, Stata reports the results in the same order that you typed the terms.
Example 6: Sequential sums of squares
We wish to estimate the effects on systolic blood pressure of drug and disease by using sequential
sums of squares. We want to introduce disease first, then drug, and finally, the interaction of drug
and disease:
. anova systolic disease drug disease#drug, sequential
Number of obs = 58 R-squared = 0.4560
Root MSE = 10.5096 Adj R-squared = 0.3259
Source Seq. SS df MS F Prob > F
Model 4259.33851 11 387.212591 3.51 0.0013
disease 488.639383 2 244.319691 2.21 0.1210
drug 3063.43286 3 1021.14429 9.25 0.0001
disease#drug 707.266259 6 117.87771 1.07 0.3958
Residual 5080.81667 46 110.452536
Total 9340.15517 57 163.862371
The F statistic on disease is now 2.21. When we fit this same model by using partial sums of
squares, the statistic was 1.88.
N-way ANOVA
You may include high-order interaction terms, such as a third-order interaction between the variables
A,B, and C, by typing A#B#C.
Example 7: Three-way factorial ANOVA
We wish to determine the operating conditions that maximize yield for a manufacturing process.
There are three temperature settings, two chemical supply companies, and two mixing methods under
investigation. Three observations are obtained for each combination of these three factors.
. use http://www.stata-press.com/data/r13/manuf
(manufacturing process data)
. describe
Contains data from http://www.stata-press.com/data/r13/manuf.dta
obs: 36 manufacturing process data
vars: 4 2 Jan 2013 13:28
size: 144
storage display value
variable name type format label variable label
temperature byte %9.0g temp machine temperature setting
chemical byte %9.0g supplier chemical supplier
method byte %9.0g meth mixing method
yield byte %9.0g product yield
Sorted by:
We wish to perform a three-way factorial ANOVA. We could type
. anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#meth
but prefer to use the ## factor-variable operator for brevity.
. anova yield temp##chem##meth
Number of obs = 36 R-squared = 0.5474
Root MSE = 2.62996 Adj R-squared = 0.3399
Source Partial SS df MS F Prob > F
Model 200.75 11 18.25 2.64 0.0227
temperature 30.5 2 15.25 2.20 0.1321
chemical 12.25 1 12.25 1.77 0.1958
temperature#chemical 24.5 2 12.25 1.77 0.1917
method 42.25 1 42.25 6.11 0.0209
temperature#method 87.5 2 43.75 6.33 0.0062
chemical#method .25 1 .25 0.04 0.8508
temperature#chemical#
method 3.5 2 1.75 0.25 0.7785
Residual 166 24 6.91666667
Total 366.75 35 10.4785714
The interaction between temperature and method appears to be the important story in these data.
A table of means for this interaction is given below.
. table method temp, c(mean yield) row col f(%8.2f)
mixing machine temperature setting
method low medium high Total
stir 7.50 6.00 6.00 6.50
fold 5.50 9.00 11.50 8.67
Total 6.50 7.50 8.75 7.58
Here our ANOVA is balanced (each cell has the same number of observations), and we obtain the
same values as in the table above (but with additional information such as confidence intervals) by
using the margins command. Because our ANOVA is balanced, using the asbalanced option with
margins would not produce different results. We request the predictive margins for the two terms
that appear significant in our ANOVA: temperature#method and method.
. margins temperature#method method
Predictive margins Number of obs = 36
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
temperature#
method
low#stir 7.5 1.073675 6.99 0.000 5.284044 9.715956
low#fold 5.5 1.073675 5.12 0.000 3.284044 7.715956
medium#stir 6 1.073675 5.59 0.000 3.784044 8.215956
medium#fold 9 1.073675 8.38 0.000 6.784044 11.21596
high#stir 6 1.073675 5.59 0.000 3.784044 8.215956
high#fold 11.5 1.073675 10.71 0.000 9.284044 13.71596
method
stir 6.5 .6198865 10.49 0.000 5.220617 7.779383
fold 8.666667 .6198865 13.98 0.000 7.387284 9.946049
We decide to use the folding method of mixing and a high temperature in our manufacturing
process.
Weighted data
Like all estimation commands, anova can produce estimates on weighted data. See [U] 11.1.6 weight
for details on specifying the weight.
Example 8: Three-way factorial ANOVA on grouped data
We wish to investigate the prevalence of byssinosis, a form of pneumoconiosis that can afflict
workers exposed to cotton dust. We have data on 5,419 workers in a large cotton mill. We know
whether each worker smokes, his or her race, and the dustiness of the work area. The variables are
smokes smoker or nonsmoker in the last five years
race white or other
workplace 1 (most dusty), 2 (less dusty), 3 (least dusty)
We wish to fit an ANOVA model explaining the prevalence of byssinosis according to a full factorial
model of smokes, race, and workplace.
The data are unbalanced. Moreover, although we have data on 5,419 workers, the data are grouped
according to the explanatory variables, along with some other variables, resulting in 72 observations.
For each observation, we know the number of workers in the group (pop), the prevalence of byssinosis
(prob), and the values of the three explanatory variables. Thus we wish to fit a three-way factorial
model on grouped data.
We begin by showing a bit of the data, which are from Higgins and Koch (1977).
. use http://www.stata-press.com/data/r13/byssin
(Byssinosis incidence)
. describe
Contains data from http://www.stata-press.com/data/r13/byssin.dta
obs: 72 Byssinosis incidence
vars: 5 19 Dec 2012 07:04
size: 864
storage display value
variable name type format label variable label
smokes int %8.0g smokes Smokes
race int %8.0g race Race
workplace int %8.0g workplace
Dustiness of workplace
pop int %8.0g Population size
prob float %9.0g Prevalence of byssinosis
Sorted by:
. list in 1/5, abbrev(10) divider
smokes race workplace pop prob
1. yes white most 40 .075
2. yes white less 74 0
3. yes white least 260 .0076923
4. yes other most 164 .152439
5. yes other less 88 0
The first observation in the data represents a group of 40 white workers who smoke and work
in a “most” dusty work area. Of those 40 workers, 7.5% have byssinosis. The second observation
represents a group of 74 white workers who also smoke but who work in a “less” dusty environment.
None of those workers has byssinosis.
Almost every Stata command allows weights. Here we want to weight the data by pop. We can,
for instance, make a table of the number of workers by their smoking status and race:
. tabulate smokes race [fw=pop]
Race
Smokes other white Total
no 799 1,431 2,230
yes 1,104 2,085 3,189
Total 1,903 3,516 5,419
The [fw=pop] at the end of the tabulate command tells Stata to count each observation as representing
pop persons. When making the tally, tabulate treats the first observation as representing 40 workers,
the second as representing 74 workers, and so on.
Similarly, we can make a table of the dustiness of the workplace:
. tabulate workplace [fw=pop]
Dustiness
of
workplace Freq. Percent Cum.
least 3,450 63.66 63.66
less 1,300 23.99 87.65
most 669 12.35 100.00
Total 5,419 100.00
We can discover the average incidence of byssinosis among these workers by typing
. summarize prob [fw=pop]
Variable Obs Mean Std. Dev. Min Max
prob 5419 .0304484 .0567373 0 .287037
We discover that 3.04% of these workers have byssinosis. Across all cells, the byssinosis rates vary
from 0 to 28.7%. Just to prove that there might be something here, let’s obtain the average incidence
rates according to the dustiness of the workplace:
. table workplace smokes race [fw=pop], c(mean prob)
Dustiness Race and Smokes
of other white
workplace no yes no yes
least .0107527 .0101523 .0081549 .0162774
less .02 .0081633 .0136612 .0143149
most .0820896 .1679105 .0833333 .2295082
Let’s now fit the ANOVA model.
. anova prob workplace smokes race workplace#smokes workplace#race smokes#race
> workplace#smokes#race [aweight=pop]
(sum of wgt is 5.4190e+03)
Number of obs = 65 R-squared = 0.8300
Root MSE = .025902 Adj R-squared = 0.7948
Source Partial SS df MS F Prob > F
Model .173646538 11 .015786049 23.53 0.0000
workplace .097625175 2 .048812588 72.76 0.0000
smokes .013030812 1 .013030812 19.42 0.0001
race .001094723 1 .001094723 1.63 0.2070
workplace#smokes .019690342 2 .009845171 14.67 0.0000
workplace#race .001352516 2 .000676258 1.01 0.3718
smokes#race .001662874 1 .001662874 2.48 0.1214
workplace#smokes#race .000950841 2 .00047542 0.71 0.4969
Residual .035557766 53 .000670901
Total .209204304 64 .003268817
Of course, if we want to see the underlying regression, we could type regress.
Above we examined simple means of the cells of workplace#smokes#race. Our ANOVA shows
workplace, smokes, and their interaction as being the only significant factors in our model. We now
examine the predictive marginal mean byssinosis rates for these terms.
. margins workplace#smokes workplace smokes
Predictive margins Number of obs = 65
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
workplace#
smokes
least#no .0090672 .0062319 1.45 0.152 -.0034323 .0215667
least#yes .0141264 .0053231 2.65 0.010 .0034497 .0248032
less#no .0158872 .009941 1.60 0.116 -.0040518 .0358263
less#yes .0121546 .0087353 1.39 0.170 -.0053662 .0296755
most#no .0828966 .0182151 4.55 0.000 .0463617 .1194314
most#yes .2078768 .012426 16.73 0.000 .1829533 .2328003
workplace
least .0120701 .0040471 2.98 0.004 .0039526 .0201875
less .0137273 .0065685 2.09 0.041 .0005526 .0269019
most .1566225 .0104602 14.97 0.000 .1356419 .177603
smokes
no .0196915 .0050298 3.91 0.000 .0096029 .02978
yes .0358626 .0041949 8.55 0.000 .0274488 .0442765
Smoking combined with the most dusty workplace produces the highest byssinosis rates.
 
Ronald Aylmer Fisher (1890–1962) (Sir Ronald from 1952) studied mathematics at Cambridge.
Even before he finished his studies, he had published on statistics. He worked as a statistician at
Rothamsted Experimental Station (1919–1933), as professor of eugenics at University College
London (1933–1943), as professor of genetics at Cambridge (1943–1957), and in retirement at
the CSIRO Division of Mathematical Statistics in Adelaide. His many fundamental and applied
contributions to statistics and genetics, including original work on tests of significance, distribution
theory, theory of estimation, fiducial inference, and design of experiments, mark him as one of the
greatest statisticians of all time.
 
ANCOVA
You can include multiple explanatory variables with the anova command, but unless you explicitly
state otherwise by using the c. factor-variable operator, all the variables are interpreted as categorical
variables. Using the c. operator, you can designate variables as continuous and thus perform ANCOVA.
Example 9: ANCOVA (ANOVA with a continuous covariate)
We have census data recording the death rate (drate) and median age (age) for each state. The
dataset also includes the region of the country in which each state is located (region):
. use http://www.stata-press.com/data/r13/census2
(1980 Census data by state)
. summarize drate age region
Variable Obs Mean Std. Dev. Min Max
drate 50 84.3 13.07318 40 107
age 50 29.5 1.752549 24 35
region 50 2.66 1.061574 1 4
age is coded in integral years from 24 to 35, and region is coded from 1 to 4, with 1 standing for
the Northeast, 2 for the North Central, 3 for the South, and 4 for the West.
When we examine the data more closely, we discover large differences in the death rate across
regions of the country:
. tabulate region, summarize(drate)
Census Summary of Death Rate
region Mean Std. Dev. Freq.
NE 93.444444 7.0553368 9
N Cntrl 88.916667 5.5833899 12
South 88.3125 8.5457104 16
West 68.769231 13.342625 13
Total 84.3 13.073185 50
Naturally, we wonder if these differences might not be explained by differences in the median ages
of the populations. To find out, we fit a regression model (via anova) of drate on region and age.
In the anova example below, we treat age as a categorical variable.
. anova drate region age
Number of obs = 50 R-squared = 0.7927
Root MSE = 6.7583 Adj R-squared = 0.7328
Source Partial SS df MS F Prob > F
Model 6638.86529 11 603.533208 13.21 0.0000
region 1320.00973 3 440.003244 9.63 0.0001
age 2237.24937 8 279.656171 6.12 0.0000
Residual 1735.63471 38 45.6745977
Total 8374.5 49 170.908163
We have the answer to our question: differences in median ages do not eliminate the differences in
death rates across the four regions. The ANOVA table summarizes the two terms in the model, region
and age. The region term contains 3 degrees of freedom, and the age term contains 8 degrees of
freedom. Both are significant at better than the 1% level.
The age term contains 8 degrees of freedom. Because we did not explicitly indicate that age was
to be treated as a continuous variable, it was treated as categorical, meaning that unique coefficients
were estimated for each level of age. The only clue of this labeling is that the number of degrees of
freedom associated with the age term exceeds 1. The labeling becomes more obvious if we review
the regression coefficients:
. regress, baselevels
Source SS df MS Number of obs = 50
F( 11, 38) = 13.21
Model 6638.86529 11 603.533208 Prob > F = 0.0000
Residual 1735.63471 38 45.6745977 R-squared = 0.7927
Adj R-squared = 0.7328
Total 8374.5 49 170.908163 Root MSE = 6.7583
drate Coef. Std. Err. t P>|t| [95% Conf. Interval]
region
NE 0 (base)
N Cntrl .4428387 3.983664 0.11 0.912 -7.621668 8.507345
South -.2964637 3.934766 -0.08 0.940 -8.261981 7.669054
West -13.37147 4.195344 -3.19 0.003 -21.8645 -4.878439
age
24 0 (base)
26 -15 9.557677 -1.57 0.125 -34.34851 4.348506
27 14.30833 7.857378 1.82 0.076 -1.598099 30.21476
28 12.66011 7.495513 1.69 0.099 -2.51376 27.83399
29 18.861 7.28918 2.59 0.014 4.104825 33.61717
30 20.87003 7.210148 2.89 0.006 6.273847 35.46621
31 29.91307 8.242741 3.63 0.001 13.22652 46.59963
32 27.02853 8.509432 3.18 0.003 9.802089 44.25498
35 38.925 9.944825 3.91 0.000 18.79275 59.05724
_cons 68.37147 7.95459 8.60 0.000 52.26824 84.47469
The regress command displayed the anova model as a regression table. We used the baselevels
option to display the dropped level (or base) for each term.
If we want to treat age as a continuous variable, we must prepend c. to age in our anova.
. anova drate region c.age
Number of obs = 50 R-squared = 0.7203
Root MSE = 7.21483 Adj R-squared = 0.6954
Source Partial SS df MS F Prob > F
Model 6032.08254 4 1508.02064 28.97 0.0000
region 1645.66228 3 548.554092 10.54 0.0000
age 1630.46662 1 1630.46662 31.32 0.0000
Residual 2342.41746 45 52.0537213
Total 8374.5 49 170.908163
The age term now has 1 degree of freedom. The regression coefficients are
. regress, baselevels
Source SS df MS Number of obs = 50
F( 4, 45) = 28.97
Model 6032.08254 4 1508.02064 Prob > F = 0.0000
Residual 2342.41746 45 52.0537213 R-squared = 0.7203
Adj R-squared = 0.6954
Total 8374.5 49 170.908163 Root MSE = 7.2148
drate Coef. Std. Err. t P>|t| [95% Conf. Interval]
region
NE 0 (base)
N Cntrl 1.792526 3.375925 0.53 0.598 -5.006935 8.591988
South .6979912 3.18154 0.22 0.827 -5.70996 7.105942
West -13.37578 3.723447 -3.59 0.001 -20.87519 -5.876377
age 3.922947 .7009425 5.60 0.000 2.511177 5.334718
_cons -28.60281 21.93931 -1.30 0.199 -72.79085 15.58524
Although we started analyzing these data to explain the regional differences in death rate, let’s focus
on the effect of age for a moment. In our first model, each level of age had a unique death rate
associated with it. For instance, the predicted death rate in a north central state with a median age
of 28 was
    0.44 + 12.66 + 68.37 ≈ 81.47
whereas the predicted death rate from our current model is
    1.79 + 3.92 × 28 - 28.60 ≈ 82.95
Our previous model had an R2 of 0.7927, whereas our current model has an R2 of 0.7203. This
“small” loss of predictive power accompanies a gain of 7 degrees of freedom, so we suspect that the
continuous-age model is as good as the discrete-age model.
Technical note
There is enough information in the two ANOVA tables to attach a statistical significance to our
suspicion that the loss of predictive power is offset by the savings in degrees of freedom. Because
the continuous-age model is nested within the discrete-age model, we can perform a standard Chow
test. For those of us who know such formulas off the top of our heads, the F statistic is

    F = {(2342.41746 - 1735.63471)/7} / 45.6745977 = 1.90
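You can let Stata do the arithmetic and attach a p-value with display and the Ftail() function, using the sums of squares and degrees of freedom shown in the two tables above:
. display ((2342.41746 - 1735.63471)/7)/45.6745977
. display Ftail(7, 38, ((2342.41746 - 1735.63471)/7)/45.6745977)
These reproduce the F statistic of 1.90 and the 9.7% significance level reported below.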
There is, however, a better way.
We can find out whether our continuous model is as good as our discrete model by putting age
in the model twice: once as a continuous variable and once as a categorical variable. The categorical
variable will then measure deviations around the straight line implied by the continuous variable, and
the F test for the significance of the categorical variable will test whether those deviations are jointly
zero.
. anova drate region c.age age
Number of obs = 50 R-squared = 0.7927
Root MSE = 6.7583 Adj R-squared = 0.7328
Source Partial SS df MS F Prob > F
Model 6638.86529 11 603.533208 13.21 0.0000
region 1320.00973 3 440.003244 9.63 0.0001
age 699.74137 1 699.74137 15.32 0.0004
age 606.782747 7 86.6832496 1.90 0.0970
Residual 1735.63471 38 45.6745977
Total 8374.5 49 170.908163
We find that the F test for the significance of the (categorical) age variable is 1.90, just as we
calculated above. It is significant at the 9.7% level. If we hold to a 5% significance level, we cannot
reject the null hypothesis that the effect of age is linear.
Example 10: Interaction of continuous and categorical variables
In our census data, we still find significant differences across the regions after controlling for the
median age of the population. We might now wonder whether the regional differences are differences
in level, independent of age, or are instead differences in the regional effects of age. Just as we
can interact categorical variables with other categorical variables, we can interact categorical variables
with continuous variables.
. anova drate region c.age region#c.age
Number of obs = 50 R-squared = 0.7365
Root MSE = 7.24852 Adj R-squared = 0.6926
Source Partial SS df MS F Prob > F
Model 6167.7737 7 881.110529 16.77 0.0000
region 188.713602 3 62.9045339 1.20 0.3225
age 873.425599 1 873.425599 16.62 0.0002
region#age 135.691162 3 45.2303874 0.86 0.4689
Residual 2206.7263 42 52.5411023
Total 8374.5 49 170.908163
The region#c.age term in our model measures the differences in slopes across the regions. We cannot
reject the null hypothesis that there are no such differences. The region effect is now “insignificant”.
This status does not mean that there are no regional differences in death rates because each test is a
marginal or partial test. Here, with region#c.age included in the model, region is being tested at
the point where age is zero. Apart from this value not existing in the dataset, it is also a long way
from the mean value of age, so the test of region at this point is meaningless (although it is valid
if you acknowledge what is being tested).
To obtain a more sensible test of region, we can subtract the mean from the age variable and
use this in the model.
. quietly summarize age
. generate mage = age - r(mean)
. anova drate region c.mage region#c.mage
Number of obs = 50 R-squared = 0.7365
Root MSE = 7.24852 Adj R-squared = 0.6926
Source Partial SS df MS F Prob > F
Model 6167.7737 7 881.110529 16.77 0.0000
region 1166.14735 3 388.715783 7.40 0.0004
mage 873.425599 1 873.425599 16.62 0.0002
region#mage 135.691162 3 45.2303874 0.86 0.4689
Residual 2206.7263 42 52.5411023
Total 8374.5 49 170.908163
region is significant when tested at the mean of the age variable.
Remember that we can specify interactions by typing varname#varname. We have seen examples
of interacting categorical variables with categorical variables and, in the examples above, a categorical
variable (region) with a continuous variable (age or mage).
We can also interact continuous variables with continuous variables. To include an age^2 term
in our model, we could type c.age#c.age. If we also wanted to interact the categorical variable
region with the age^2 term, we could type region#c.age#c.age (or even c.age#region#c.age).
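For instance (an illustrative command only, not run here), a model allowing a quadratic effect of age whose curvature differs by region could be specified as
. anova drate region c.age region#c.age c.age#c.age region#c.age#c.age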
Nested designs
In addition to specifying interaction terms, nested terms can also be specified in an ANOVA. A
vertical bar is used to indicate nesting: A|B is read as A nested within B. A|B|C is read as A nested
within B, which is nested within C. A|B#C is read as A is nested within the interaction of B and C.
A#B|C is read as the interaction of A and B, which is nested within C.
Different error terms can be specified for different parts of the model. The forward slash is used
to indicate that the next term in the model is the error term for what precedes it. For instance,
anova y A / B|A indicates that the F test for A is to be tested by using the mean square from B|A
in the denominator. Error terms (terms following the slash) are generally not tested unless they are
themselves followed by a slash. Residual error is the default error term.
For example, consider A/B/C, where A, B, and C may be arbitrarily complex terms. Then
anova will report A tested by B and B tested by C. If we add one more slash on the end to form
A/B/C/, then anova will also report C tested by the residual error.
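Schematically, with placeholder names, that last remark corresponds to a command of the form
. anova y A / B / C /
where y, A, B, and C stand for a response variable and possibly complex terms such as B|A; compare examples 11 and 12 below, which use this pattern with real variables.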
Example 11: Simple nested ANOVA
We have collected data from a manufacturer that is evaluating which of five different brands
of machinery to buy to perform a particular function in an assembly line. Twenty assembly-line
employees were selected at random for training on these machines, with four employees assigned
to learn a particular machine. The output from each employee (operator) on the brand of machine
for which he trained was measured during four trial periods. In this example, the operator is nested
within machine. Because of sickness and employee resignations, the final data are not balanced. The
following table gives the mean output and sample size for each machine and operator combination.
. use http://www.stata-press.com/data/r13/machine, clear
(machine data)
. table machine operator, c(mean output n output) col f(%8.2f)
five
brands of operator nested in machine
machine 1 2 3 4 Total
1 9.15 9.48 8.27 8.20 8.75
2 4 3 4 13
2 15.03 11.55 11.45 11.52 12.47
3 2 2 4 11
3 11.27 10.13 11.13 10.84
3 3 3 9
4 16.10 18.97 15.35 16.60 16.65
3 3 4 3 13
5 15.30 14.35 10.43 13.63
4 4 3 11
Assuming that operator is random (that is, we wish to infer to the larger population of possible
operators) and machine is fixed (that is, only these five machines are of interest), the typical test for
machine uses operator nested within machine as the error term. operator nested within machine
can be tested by residual error. Our earlier warning concerning designs with either unplanned missing
cells or unbalanced cell sizes, or both, also applies to interpreting the ANOVA results from this
unbalanced nested example.
. anova output machine / operator|machine /
Number of obs = 57 R-squared = 0.8661
Root MSE = 1.47089 Adj R-squared = 0.8077
Source Partial SS df MS F Prob > F
Model 545.822288 17 32.1071934 14.84 0.0000
machine 430.980792 4 107.745198 13.82 0.0001
operator|machine 101.353804 13 7.79644648
operator|machine 101.353804 13 7.79644648 3.60 0.0009
Residual 84.3766582 39 2.16350406
Total 630.198947 56 11.2535526
operator|machine is preceded by a slash, indicating that it is the error term for the terms before
it (here machine). operator|machine is also followed by a slash that indicates it should be tested
with residual error. The output lists the operator|machine term twice, once as the error term for
machine, and again as a term tested by residual error. A line is placed in the ANOVA table to separate
the two. In general, a dividing line is placed in the output to separate the terms into groups that are
tested with the same error term. The overall model is tested by residual error and is separated from
the rest of the table by a blank line at the top of the table.
The results indicate that the machines are not all equal and that there are significant differences
between operators.
Example 12: ANOVA with multiple levels of nesting
Your company builds and operates sewage treatment facilities. You want to compare two particulate
solutions during the particulate reduction step of the sewage treatment process. For each solution,
two area managers are randomly selected to implement and oversee the change to the new treatment
process in two of their randomly chosen facilities. Two workers at each of these facilities are trained
to operate the new process. A measure of particulate reduction is recorded at various times during
the month at each facility for each worker. The data are described below.
. use http://www.stata-press.com/data/r13/sewage
(Sewage treatment)
. describe
Contains data from http://www.stata-press.com/data/r13/sewage.dta
obs: 64 Sewage treatment
vars: 5 9 May 2013 12:43
size: 320
storage display value
variable name type format label variable label
particulate byte %9.0g particulate reduction
solution byte %9.0g 2 particulate solutions
manager byte %9.0g 2 managers per solution
facility byte %9.0g 2 facilities per manager
worker byte %9.0g 2 workers per facility
Sorted by: solution manager facility worker
You want to determine if the two particulate solutions provide significantly different particulate
reduction. You would also like to know if manager, facility, and worker are significant effects.
solution is a fixed factor, whereas manager, facility, and worker are random factors.
In the following anova command, we use abbreviations for the variable names, which can sometimes
make long ANOVA model statements easier to read.
. anova particulate s / m|s / f|m|s / w|f|m|s /, dropemptycells
Number of obs = 64 R-squared = 0.6338
Root MSE = 12.7445 Adj R-squared = 0.5194
Source Partial SS df MS F Prob > F
Model 13493.6094 15 899.573958 5.54 0.0000
solution 7203.76563 1 7203.76563 17.19 0.0536
manager|solution 838.28125 2 419.140625
manager|solution 838.28125 2 419.140625 0.55 0.6166
facility|manager|
solution 3064.9375 4 766.234375
facility|manager|
solution 3064.9375 4 766.234375 2.57 0.1193
worker|facility|
manager|solution 2386.625 8 298.328125
worker|facility|
manager|solution 2386.625 8 298.328125 1.84 0.0931
Residual 7796.25 48 162.421875
Total 21289.8594 63 337.934276
While solution is not declared significant at the 5% significance level, it is near enough to
that threshold to warrant further investigation (see example 3 in [R] anova postestimation for a
continuation of the analysis of these data).
Technical note
Why did we use the dropemptycells option with the previous anova? By default, Stata retains
empty cells when building the design matrix and currently treats | and # the same in how it
determines the possible number of cells. Retaining empty cells in an ANOVA with nested terms can
cause your design matrix to become too large. In example 12, there are 1,024 = 2 × 4 × 8 × 16
cells that are considered possible for the worker|facility|manager|solution term because the
worker, facility, and manager variables are uniquely numbered. With the dropemptycells
option, the worker|facility|manager|solution term requires just 16 columns in the design
matrix (corresponding to the 16 unique workers).
Why did we not use the dropemptycells option in example 11, where operator is nested in
machine? If you look at the table presented at the beginning of that example, you will see that
operator is compactly instead of uniquely numbered (you need both operator number and machine
number to determine the operator). Here the dropemptycells option would have only reduced
our design matrix from 26 columns down to 24 columns (because there were only 3 operators instead
of 4 for machines 3 and 5).
We suggest that you specify dropemptycells when there are nested terms in your ANOVA. You
could also use the set emptycells drop command to accomplish the same thing; see [R] set.
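For example, rather than specifying the option on each anova command, you could change the default once and restore the keep default afterward if desired (a sketch equivalent to example 12):
. set emptycells drop
. anova particulate s / m|s / f|m|s / w|f|m|s /
. set emptycells keep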
Mixed designs
An ANOVA can consist of both nested and crossed terms. A split-plot ANOVA design provides an
example.
Example 13: Split-plot ANOVA
Two reading programs and three skill-enhancement techniques are under investigation. Ten classes
of first-grade students were randomly assigned so that five classes were taught with one reading
program and another five classes were taught with the other. The 30 students in each class were
divided into six groups with 5 students each. Within each class, the six groups were divided randomly
so that each of the three skill-enhancement techniques was taught to two of the groups within each
class. At the end of the school year, a reading assessment test was administered to all the students.
In this split-plot ANOVA, the whole-plot treatment is the two reading programs, and the split-plot
treatment is the three skill-enhancement techniques.
. use http://www.stata-press.com/data/r13/reading
(Reading experiment data)
. describe
Contains data from http://www.stata-press.com/data/r13/reading.dta
obs: 300 Reading experiment data
vars: 5 9 Mar 2013 18:57
size: 1,500 (_dta has notes)
storage display value
variable name type format label variable label
score byte %9.0g reading score
program byte %9.0g reading program
class byte %9.0g class nested in program
skill byte %9.0g skill enhancement technique
group byte %9.0g group nested in class and skill
Sorted by:
In this split-plot ANOVA, the error term for program is class nested within program. The error
term for skill and the program by skill interaction is the class by skill interaction nested
within program. Other terms are also involved in the model and can be seen below.
Our anova command is too long to fit on one line of this manual. Where we have chosen to break
the command into multiple lines is arbitrary. If we were typing this command into Stata, we would
just type along and let Stata automatically wrap across lines, as necessary.
. anova score prog / class|prog skill prog#skill / class#skill|prog /
> group|class#skill|prog /, dropemptycells
Number of obs = 300 R-squared = 0.3738
Root MSE = 14.6268 Adj R-squared = 0.2199
Source Partial SS df MS F Prob > F
Model 30656.5167 59 519.601977 2.43 0.0000
program 4493.07 1 4493.07 8.73 0.0183
class|program 4116.61333 8 514.576667
skill 1122.64667 2 561.323333 1.54 0.2450
program#skill 5694.62 2 2847.31 7.80 0.0043
class#skill|program 5841.46667 16 365.091667
class#skill|program 5841.46667 16 365.091667 1.17 0.3463
group|class#skill|
program 9388.1 30 312.936667
group|class#skill|
program 9388.1 30 312.936667 1.46 0.0636
Residual 51346.4 240 213.943333
Total 82002.9167 299 274.257246
The program#skill term is significant, as is the program term. Let’s look at the predictive margins
for these two terms and at a marginsplot for the first term.
. margins, within(program skill)
Predictive margins Number of obs = 300
Expression : Linear prediction, predict()
within : program skill
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
program#skill
1 1 68.16 2.068542 32.95 0.000 64.08518 72.23482
1 2 52.86 2.068542 25.55 0.000 48.78518 56.93482
1 3 61.54 2.068542 29.75 0.000 57.46518 65.61482
2 1 50.7 2.068542 24.51 0.000 46.62518 54.77482
2 2 56.54 2.068542 27.33 0.000 52.46518 60.61482
2 3 52.1 2.068542 25.19 0.000 48.02518 56.17482
. marginsplot, plot2opts(lp(dash) m(D)) plot3opts(lp(dot) m(T))
Variables that uniquely identify margins: program skill
(figure omitted: "Predictive Margins with 95% CIs", plotting the linear prediction against reading program, with separate lines for skill=1, skill=2, and skill=3)
. margins, within(program)
Predictive margins Number of obs = 300
Expression : Linear prediction, predict()
within : program
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
program
1 60.85333 1.194273 50.95 0.000 58.50074 63.20593
2 53.11333 1.194273 44.47 0.000 50.76074 55.46593
Because our ANOVA involves nested terms, we used the within() option of margins; see
[R] margins.
skill 2 produces a low score when combined with program 1 and a high score when combined
with program 2, demonstrating the interaction between the reading program and the skill-enhancement
technique. You might conclude that the first reading program and the first skill-enhancement technique
perform best when combined. However, notice the overlapping confidence interval for the first reading
program and the third skill-enhancement technique.
Technical note
There are several valid ways to write complicated anova terms. In the reading experiment
example (example 13), we had a term group|class#skill|program. This term can be read
as group nested within both class and skill and further nested within program. You can
also write this term as group|class#skill#program or group|program#class#skill or
group|skill#class|program, etc. All variations will produce the same result. Some people prefer
having only one ‘|’ in a term and would use group|class#skill#program, which is read as group
nested within class, skill, and program.
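For instance, with the reading dataset from example 13 in memory, the following two commands sketch this equivalence (the second merely respells the nested group term); both should reproduce the ANOVA table shown earlier:
. anova score prog / class|prog skill prog#skill / class#skill|prog /
>     group|class#skill|prog /, dropemptycells
. anova score prog / class|prog skill prog#skill / class#skill|prog /
>     group|class#skill#prog /, dropemptycells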
 
Gertrude Mary Cox (1900–1978) was born on a farm near Dayton, Iowa. Initially intending to
become superintendent of an orphanage, she enrolled at Iowa State College. There she majored
in mathematics and attained the college’s first Master’s degree in statistics. After working on
her PhD in psychological statistics for two years at the University of California–Berkeley, she
decided to go back to Iowa State to work with George W. Snedecor. There she pursued her
interest in and taught a course in design of experiments. That work led to her collaboration with
W. G. Cochran, which produced a classic text. In 1940, when Snedecor shared with her his list
of men he was nominating to head the statistics department at North Carolina State College, she
wanted to know why she had not been included. He added her name, she won the position, and
she built an outstanding department at North Carolina State. Cox retired early so she could work
at the Research Triangle Institute in North Carolina. She consulted widely, served as editor of
Biometrics, and was elected to the National Academy of Sciences.
 
Latin-square designs
You can use anova to analyze a Latin-square design. Consider the following example, published
in Snedecor and Cochran (1989).
Example 14: Latin-square ANOVA
Data from a Latin-square design are as follows:
Row Column 1 Column 2 Column 3 Column 4 Column 5
1 257(B) 230(E) 279(A) 287(C) 202(D)
2 245(D) 283(A) 245(E) 280(B) 260(C)
3 182(E) 252(B) 280(C) 246(D) 250(A)
4 203(A) 204(C) 227(D) 193(E) 259(B)
5 231(C) 271(D) 266(B) 334(A) 338(E)
In Stata, the data might appear as follows:
. use http://www.stata-press.com/data/r13/latinsq
. list
row c1 c2 c3 c4 c5
1. 1 257 230 279 287 202
2. 2 245 283 245 280 260
3. 3 182 252 280 246 250
4. 4 203 204 227 193 259
5. 5 231 271 266 334 338
Before anova can be used on these data, the data must be organized so that the outcome
measurement is in one column. reshape is inadequate for this task because there is information
about the treatments in the sequence of these observations. pkshape is designed to reshape this type
of data; see [R] pkshape.
. pkshape row row c1-c5, order(beacd daebc ebcda acdeb cdbae)
. list
sequence outcome treat carry period
1. 1 257 1 0 1
2. 2 245 5 0 1
3. 3 182 2 0 1
4. 4 203 3 0 1
5. 5 231 4 0 1
6. 1 230 2 1 2
7. 2 283 3 5 2
8. 3 252 1 2 2
9. 4 204 4 3 2
10. 5 271 5 4 2
11. 1 279 3 2 3
12. 2 245 2 3 3
13. 3 280 4 1 3
14. 4 227 5 4 3
15. 5 266 1 5 3
16. 1 287 4 3 4
17. 2 280 1 2 4
18. 3 246 5 4 4
19. 4 193 2 5 4
20. 5 334 3 1 4
21. 1 202 5 4 5
22. 2 260 4 1 5
23. 3 250 3 5 5
24. 4 259 1 2 5
25. 5 338 2 3 5
. anova outcome sequence period treat
Number of obs = 25 R-squared = 0.6536
Root MSE = 32.4901 Adj R-squared = 0.3073
Source Partial SS df MS F Prob > F
Model 23904.08 12 1992.00667 1.89 0.1426
sequence 13601.36 4 3400.34 3.22 0.0516
period 6146.16 4 1536.54 1.46 0.2758
treat 4156.56 4 1039.14 0.98 0.4523
Residual 12667.28 12 1055.60667
Total 36571.36 24 1523.80667
These methods will work with any type of Latin-square design, including those with replicated
measurements. For more information, see [R] pk, [R] pkcross, and [R] pkshape.
Repeated-measures ANOVA
One approach for analyzing repeated-measures data is to use multivariate ANOVA (MANOVA); see
[MV] manova. In this approach, the data are placed in wide form (see [D] reshape), and the repeated
measures enter the MANOVA as dependent variables.
A second approach for analyzing repeated measures is to use anova. However, one of the underlying
assumptions for the F tests in ANOVA is independence of observations. In a repeated-measures design,
this assumption is almost certainly violated or is at least suspect. In a repeated-measures ANOVA,
the subjects (or whatever the experimental units are called) are observed for each level of one or
more of the other categorical variables in the model. These variables are called the repeated-measure
variables. Observations from the same subject are likely to be correlated.
The approach used in repeated-measures ANOVA to correct for this lack of independence is to
apply a correction to the degrees of freedom of the F test for terms in the model that involve
repeated measures. This correction factor, ε, lies between the reciprocal of the degrees of freedom
for the repeated term and 1. Box (1954) provided the pioneering work in this area. Milliken and
Johnson (2009) refer to the lower bound of this correction factor as Box's conservative correction
factor. Winer, Brown, and Michels (1991) call it simply the conservative correction factor.
Geisser and Greenhouse (1958) provide an estimate for the correction factor called the Greenhouse–
Geisser ε. This value is estimated from the data. Huynh and Feldt (1976) show that the Greenhouse–
Geisser ε tends to be conservatively biased. They provide a revised correction factor called the
Huynh–Feldt ε. When the Huynh–Feldt ε exceeds 1, it is set to 1. Thus there is a natural ordering
for these correction factors:

    Box's conservative ε ≤ Greenhouse–Geisser ε ≤ Huynh–Feldt ε ≤ 1

A correction factor of 1 is the same as no correction.
anova with the repeated() option computes these correction factors and displays the revised
test results in a table that follows the standard ANOVA table. In the resulting table, H-F stands for
Huynh–Feldt, G-G stands for Greenhouse–Geisser, and Box stands for Box's conservative ε.
Example 15: Repeated-measures ANOVA
This example is taken from table 4.3 of Winer, Brown, and Michels (1991). The reaction time for
five subjects each tested with four drugs was recorded in the variable score. Here is a table of the
data (see [P] tabdisp if you are unfamiliar with tabdisp):
. use http://www.stata-press.com/data/r13/t43, clear
(T4.3 -- Winer, Brown, Michels)
. tabdisp person drug, cellvar(score)
drug
person 1 2 3 4
1 30 28 16 34
2 14 18 10 22
3 24 20 18 30
4 38 34 20 44
5 26 28 14 30
drug is the repeated variable in this simple repeated-measures ANOVA example. The ANOVA is
specified as follows:
. anova score person drug, repeated(drug)
Number of obs = 20 R-squared = 0.9244
Root MSE = 3.06594 Adj R-squared = 0.8803
Source Partial SS df MS F Prob > F
Model 1379 7 197 20.96 0.0000
person 680.8 4 170.2 18.11 0.0001
drug 698.2 3 232.733333 24.76 0.0000
Residual 112.8 12 9.4
Total 1491.8 19 78.5157895
Between-subjects error term: person
Levels: 5 (4 df)
Lowest b.s.e. variable: person
Repeated variable: drug
Huynh-Feldt epsilon = 1.0789
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.6049
Box’s conservative epsilon = 0.3333
Prob > F
Source df F Regular H-F G-G Box
drug 3 24.76 0.0000 0.0000 0.0006 0.0076
Residual 12
Here the Huynh–Feldt ε is 1.0789, which is larger than 1. It is reset to 1, which is the same as making
no adjustment to the standard test computed in the main ANOVA table. The Greenhouse–Geisser ε is
0.6049, and its associated p-value is computed from an F ratio of 24.76 using 1.8147 (= ε × 3) and
7.2588 (= ε × 12) degrees of freedom. Box's conservative ε is set equal to the reciprocal of the degrees
of freedom for the repeated term. Here it is 1/3, so Box's conservative test is computed using 1 and
4 degrees of freedom for the observed F ratio of 24.76.
Even for Box's conservative ε, drug is significant with a p-value of 0.0076. The following table
gives the predictive marginal mean score (that is, response time) for each of the four drugs:
. margins drug
Predictive margins Number of obs = 20
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
drug
1 26.4 1.371131 19.25 0.000 23.41256 29.38744
2 25.6 1.371131 18.67 0.000 22.61256 28.58744
3 15.6 1.371131 11.38 0.000 12.61256 18.58744
4 32 1.371131 23.34 0.000 29.01256 34.98744
The ANOVA table for this example provides an F test for person, but you should ignore it. An
appropriate test for person would require replication (that is, multiple measurements for person
and drug combinations). Also, without replication there is no test available for investigating the
interaction between person and drug.
Example 16: Repeated-measures ANOVA with nesting
Table 7.7 of Winer, Brown, and Michels (1991) provides another repeated-measures ANOVA example.
There are four dial shapes and two methods for calibrating dials. Subjects are nested within calibration
method, and an accuracy score is obtained. The data are shown below.
. use http://www.stata-press.com/data/r13/t77
(T7.7 -- Winer, Brown, Michels)
. tabdisp shape subject calib, cell(score)
2 methods for calibrating dials and
subject nested in calib
4 dial 1 2
shapes 1 2 3 1 2 3
1 0 3 4 4 5 7
2 0 1 3 2 4 5
3 5 5 6 7 6 8
4 3 4 2 8 6 9
The calibration method and dial shapes are fixed factors, whereas subjects are random. The
appropriate test for calibration method uses the nested subject term as the error term. Both the dial
shape and the interaction between dial shape and calibration method are tested with the dial shape
by subject interaction nested within calibration method. Here we drop this term from the anova
command, and it becomes residual error. The dial shape is the repeated variable because each subject
is tested with all four dial shapes. Here is the anova command that produces the desired results:
. anova score calib / subject|calib shape calib#shape, repeated(shape)
Number of obs = 24 R-squared = 0.8925
Root MSE = 1.11181 Adj R-squared = 0.7939
Source Partial SS df MS F Prob > F
Model 123.125 11 11.1931818 9.06 0.0003
calib 51.0416667 1 51.0416667 11.89 0.0261
subject|calib 17.1666667 4 4.29166667
shape 47.4583333 3 15.8194444 12.80 0.0005
calib#shape 7.45833333 3 2.48611111 2.01 0.1662
Residual 14.8333333 12 1.23611111
Total 137.958333 23 5.99818841
Between-subjects error term: subject|calib
Levels: 6 (4 df)
Lowest b.s.e. variable: subject
Covariance pooled over: calib (for repeated variable)
Repeated variable: shape
Huynh-Feldt epsilon = 0.8483
Greenhouse-Geisser epsilon = 0.4751
Box’s conservative epsilon = 0.3333
Prob > F
Source df F Regular H-F G-G Box
shape 3 12.80 0.0005 0.0011 0.0099 0.0232
calib#shape 3 2.01 0.1662 0.1791 0.2152 0.2291
Residual 12
The repeated-measure corrections are applied to any terms that are tested in the main ANOVA
table and have the repeated variable in the term. These corrections are given in a table below the
main ANOVA table. Here the repeated-measures tests for shape and calib#shape are presented.
Calibration method is significant, as is dial shape. The interaction between calibration method and
dial shape is not significant. The repeated-measure corrections do not change these conclusions, but
they do change the significance level for the tests on shape and calib#shape. Here, though, unlike
in example 15, the Huynh–Feldt ε is less than 1.
Here are the predictive marginal mean scores for calibration method and dial shapes. Because the
interaction was not significant, we request only the calib and shape predictive margins.
. margins, within(calib)
Predictive margins Number of obs = 24
Expression : Linear prediction, predict()
within : calib
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
calib
1 3 .3209506 9.35 0.000 2.300709 3.699291
2 5.916667 .3209506 18.43 0.000 5.217375 6.615958
. margins, within(shape)
Predictive margins Number of obs = 24
Expression : Linear prediction, predict()
within : shape
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
shape
1 3.833333 .4538926 8.45 0.000 2.844386 4.82228
2 2.5 .4538926 5.51 0.000 1.511053 3.488947
3 6.166667 .4538926 13.59 0.000 5.17772 7.155614
4 5.333333 .4538926 11.75 0.000 4.344386 6.32228
Technical note
The computation of the Greenhouse–Geisser and Huynh–Feldt epsilons in a repeated-measures
ANOVA requires the number of levels and degrees of freedom for the between-subjects error term, as
well as a value computed from a pooled covariance matrix. The observations are grouped based on
all but the lowest-level variable in the between-subjects error term. The covariance over the repeated
variables is computed for each resulting group, and then these covariance matrices are pooled. The
dimension of the pooled covariance matrix is the number of levels of the repeated variable (or
combination of levels for multiple repeated variables). In example 16, there are four levels of the
repeated variable (shape), so the resulting covariance matrix is 4 × 4.
The anova command automatically attempts to determine the between-subjects error term and the
lowest-level variable in the between-subjects error term to group the observations for computation of
the pooled covariance matrix. anova issues an error message indicating that the bse() or bseunit()
option is required when anova cannot determine them. You may override the default selections of
anova by specifying the bse(), bseunit(), or grouping() option. The term specified in the bse()
option must be a term in the ANOVA model.
The default selection for the between-subjects error term (the bse() option) is the interaction of the
nonrepeated categorical variables in the ANOVA model. The first variable listed in the between-subjects
error term is automatically selected as the lowest-level variable in the between-subjects error term
but can be overridden with the bseunit(varname) option. varname is often a term, such as subject
or subsample within subject, and is most often listed first in the term because of the nesting notation
of ANOVA. This term makes sense in most repeated-measures ANOVA designs when the terms of
the model are written in standard form. For instance, in example 16, there were three categorical
variables (subject, calib, and shape), with shape being the repeated variable. Here anova looked
for a term involving only subject and calib to determine the between-subjects error term. It found
subject|calib as the term with six levels and 4 degrees of freedom. anova then picked subject
as the default for the bseunit() option (the lowest variable in the between-subjects error term)
because it was listed first in the term.
The grouping of observations proceeds, based on the different combinations of values of the
variables in the between-subjects error term, excluding the lowest-level variable (as found by default
or as specified with the bseunit() option). You may specify the grouping() option to change the
default grouping used in computing the pooled covariance matrix.
The between-subjects error term, number of levels, degrees of freedom, lowest variable in the
term, and grouping information are presented after the main ANOVA table and before the rest of the
repeated-measures output.
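As a sketch of overriding these defaults (assuming grouping() takes the pooling variable reported in the output, here calib), the anova command from example 16 could be retyped with the between-subjects error term, its lowest-level variable, and the grouping made explicit; because these match the defaults anova chose, the results should be unchanged:
. anova score calib / subject|calib shape calib#shape, repeated(shape)
>     bse(subject|calib) bseunit(subject) grouping(calib)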
Example 17: Repeated-measures ANOVA with two repeated variables
Data with two repeated variables are given in table 7.13 of Winer, Brown, and Michels (1991).
The accuracy scores of subjects making adjustments to three dials during three different periods are
recorded. Three subjects are exposed to a certain noise background level, whereas a different set of
three subjects is exposed to a different noise background level. Here is a table of accuracy scores for
the noise, subject, period, and dial variables:
. use http://www.stata-press.com/data/r13/t713
(T7.13 -- Winer, Brown, Michels)
. tabdisp subject dial period, by(noise) cell(score) stubwidth(11)
noise
background
and subject 10 minute time periods and dial
nested in 1 2 3
noise 1 2 3 1 2 3 1 2 3
1
1 45 53 60 40 52 57 28 37 46
2 35 41 50 30 37 47 25 32 41
3 60 65 75 58 54 70 40 47 50
2
1 50 48 61 25 34 51 16 23 35
2 42 45 55 30 37 43 22 27 37
3 56 60 77 40 39 57 31 29 46
noise, period, and dial are fixed, whereas subject is random. Both period and dial are
repeated variables. The ANOVA for this example is specified next.
. anova score noise / subject|noise period noise#period /
> period#subject|noise dial noise#dial /
> dial#subject|noise period#dial noise#period#dial, repeated(period dial)
Number of obs = 54 R-squared = 0.9872
Root MSE = 2.81859 Adj R-squared = 0.9576
Source Partial SS df MS F Prob > F
Model 9797.72222 37 264.803303 33.33 0.0000
noise 468.166667 1 468.166667 0.75 0.4348
subject|noise 2491.11111 4 622.777778
period 3722.33333 2 1861.16667 63.39 0.0000
noise#period 333 2 166.5 5.67 0.0293
period#subject|noise 234.888889 8 29.3611111
dial 2370.33333 2 1185.16667 89.82 0.0000
noise#dial 50.3333333 2 25.1666667 1.91 0.2102
dial#subject|noise 105.555556 8 13.1944444
period#dial 10.6666667 4 2.66666667 0.34 0.8499
noise#period#dial 11.3333333 4 2.83333333 0.36 0.8357
Residual 127.111111 16 7.94444444
Total 9924.83333 53 187.261006
Between-subjects error term: subject|noise
Levels: 6 (4 df)
Lowest b.s.e. variable: subject
Covariance pooled over: noise (for repeated variables)
Repeated variable: period
Huynh-Feldt epsilon = 1.0668
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.6476
Box’s conservative epsilon = 0.5000
Prob > F
Source df F Regular H-F G-G Box
period 2 63.39 0.0000 0.0000 0.0003 0.0013
noise#period 2 5.67 0.0293 0.0293 0.0569 0.0759
period#subject|noise 8
Repeated variable: dial
Huynh-Feldt epsilon = 2.0788
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.9171
Box’s conservative epsilon = 0.5000
Prob > F
Source df F Regular H-F G-G Box
dial 2 89.82 0.0000 0.0000 0.0000 0.0007
noise#dial 2 1.91 0.2102 0.2102 0.2152 0.2394
dial#subject|noise 8
Repeated variables: period#dial
Huynh-Feldt epsilon = 1.3258
*Huynh-Feldt epsilon reset to 1.0000
Greenhouse-Geisser epsilon = 0.5134
Box’s conservative epsilon = 0.2500
Prob > F
Source df F Regular H-F G-G Box
period#dial 4 0.34 0.8499 0.8499 0.7295 0.5934
noise#period#dial 4 0.36 0.8357 0.8357 0.7156 0.5825
Residual 16
For each repeated variable and for each combination of interactions of repeated variables, there are
different correction values. The anova command produces tables for each applicable combination.
The two most significant factors in this model appear to be dial and period. The noise by
period interaction may also be significant, depending on the correction factor you use. Below is a
table of predictive margins for the accuracy score for dial, period, and noise by period.
. margins, within(dial)
Predictive margins Number of obs = 54
Expression : Linear prediction, predict()
within : dial
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
dial
1 37.38889 .6643478 56.28 0.000 35.98053 38.79724
2 42.22222 .6643478 63.55 0.000 40.81387 43.63058
3 53.22222 .6643478 80.11 0.000 51.81387 54.63058
. margins, within(period)
Predictive margins Number of obs = 54
Expression : Linear prediction, predict()
within : period
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
period
1 54.33333 .6643478 81.78 0.000 52.92498 55.74169
2 44.5 .6643478 66.98 0.000 43.09165 45.90835
3 34 .6643478 51.18 0.000 32.59165 35.40835
. margins, within(noise period)
Predictive margins Number of obs = 54
Expression : Linear prediction, predict()
within : noise period
Empty cells : reweight
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
noise#period
1 1 53.77778 .9395297 57.24 0.000 51.78606 55.76949
1 2 49.44444 .9395297 52.63 0.000 47.45273 51.43616
1 3 38.44444 .9395297 40.92 0.000 36.45273 40.43616
2 1 54.88889 .9395297 58.42 0.000 52.89717 56.8806
2 2 39.55556 .9395297 42.10 0.000 37.56384 41.54727
2 3 29.55556 .9395297 31.46 0.000 27.56384 31.54727
Dial shape 3 produces the highest score, and scores decrease over the periods.
Example 17 had two repeated-measurement variables. Up to four repeated-measurement variables
may be specified in the anova command.
Video examples
Analysis of covariance in Stata
Two-way ANOVA in Stata
Stored results
anova stores the following in e():
Scalars
    e(N)               number of observations
    e(mss)             model sum of squares
    e(df_m)            model degrees of freedom
    e(rss)             residual sum of squares
    e(df_r)            residual degrees of freedom
    e(r2)              R-squared
    e(r2_a)            adjusted R-squared
    e(F)               F statistic
    e(rmse)            root mean squared error
    e(ll)              log likelihood
    e(ll_0)            log likelihood, constant-only model
    e(ss_#)            sum of squares for term #
    e(df_#)            numerator degrees of freedom for term #
    e(ssdenom_#)       denominator sum of squares for term # (when using nonresidual error)
    e(dfdenom_#)       denominator degrees of freedom for term # (when using nonresidual error)
    e(F_#)             F statistic for term # (if computed)
    e(N_bse)           number of levels of the between-subjects error term
    e(df_bse)          degrees of freedom for the between-subjects error term
    e(box#)            Box's conservative epsilon for a particular combination of repeated variables (repeated() only)
    e(gg#)             Greenhouse–Geisser epsilon for a particular combination of repeated variables (repeated() only)
    e(hf#)             Huynh–Feldt epsilon for a particular combination of repeated variables (repeated() only)
    e(rank)            rank of e(V)
Macros
    e(cmd)             anova
    e(cmdline)         command as typed
    e(depvar)          name of dependent variable
    e(varnames)        names of the right-hand-side variables
    e(term_#)          term #
    e(errorterm_#)     error term for term # (when using nonresidual error)
    e(sstype)          type of sum of squares; sequential or partial
    e(repvars)         names of repeated variables (repeated() only)
    e(repvar#)         names of repeated variables for a particular combination (repeated() only)
    e(model)           ols
    e(wtype)           weight type
    e(wexp)            weight expression
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(asbalanced)      factor variables fvset as asbalanced
    e(asobserved)      factor variables fvset as asobserved
Matrices
    e(b)               coefficient vector
    e(V)               variance–covariance matrix of the estimators
    e(Srep)            covariance matrix based on repeated measures (repeated() only)
Functions
e(sample) marks estimation sample
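For instance, immediately after the repeated-measures fit in example 15, a brief sketch of retrieving a few of these results is shown below; e(F) holds the overall F statistic, e(df_r) the residual degrees of freedom, and e(b) the underlying regression coefficients:
. display e(F)
. display e(df_r)
. matrix list e(b)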
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Afifi, A. A., and S. P. Azen. 1979. Statistical Analysis: A Computer Oriented Approach. 2nd ed. New York: Academic
Press.
Altman, D. G. 1991. Practical Statistics for Medical Research. London: Chapman & Hall/CRC.
Anderson, R. L. 1990. Gertrude Mary Cox 1900–1978. Biographical Memoirs, National Academy of Sciences 59:
116–132.
Box, G. E. P. 1954. Some theorems on quadratic forms applied in the study of analysis of variance problems, I.
Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics 25: 290–302.
Box, J. F. 1978. R. A. Fisher: The Life of a Scientist. New York: Wiley.
Chatfield, M., and A. P. Mander. 2009. The Skillings–Mack test (Friedman test when there are missing data). Stata
Journal 9: 299–305.
Cobb, G. W. 1998. Introduction to Design and Analysis of Experiments. New York: Springer.
Edwards, A. L. 1985. Multiple Regression and the Analysis of Variance and Covariance. 2nd ed. New York: Freeman.
Fisher, R. A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd.
———. 1935. The Design of Experiments. Edinburgh: Oliver & Boyd.
———. 1990. Statistical Methods, Experimental Design, and Scientific Inference. Oxford: Oxford University Press.
Geisser, S., and S. W. Greenhouse. 1958. An extension of Box’s results on the use of the F distribution in multivariate
analysis. Annals of Mathematical Statistics 29: 885–891.
Gleason, J. R. 1999. sg103: Within subjects (repeated measures) ANOVA, including between subjects factors. Stata
Technical Bulletin 47: 40–45. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 236–243. College Station,
TX: Stata Press.
———. 2000. sg132: Analysis of variance from summary statistics. Stata Technical Bulletin 54: 42–46. Reprinted in
Stata Technical Bulletin Reprints, vol. 9, pp. 328–332. College Station, TX: Stata Press.
Hall, N. S. 2010. Ronald Fisher and Gertrude Cox: Two statistical pioneers sometimes cooperate and sometimes
collide. American Statistician 64: 212–220.
Higgins, J. E., and G. G. Koch. 1977. Variable selection and generalized chi-square analysis of categorical data
applied to a large cross-sectional occupational health survey. International Statistical Review 45: 51–62.
Huynh, H. 1978. Some approximate tests for repeated measurement designs. Psychometrika 43: 161–175.
Huynh, H., and L. S. Feldt. 1976. Estimation of the Box correction for degrees of freedom from sample data in
randomized block and split-plot designs. Journal of Educational Statistics 1: 69–82.
Kennedy, W. J., Jr., and J. E. Gentle. 1980. Statistical Computing. New York: Dekker.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis. 2nd ed. Belmont,
CA: Duxbury.
Marchenko, Y. V. 2006. Estimating variance components in Stata. Stata Journal 6: 1–21.
Milliken, G. A., and D. E. Johnson. 2009. Analysis of Messy Data, Volume 1: Designed Experiments. 2nd ed. Boca
Raton, FL: CRC Press.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Scheffé, H. 1959. The Analysis of Variance. New York: Wiley.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
van Belle, G., L. D. Fisher, P. J. Heagerty, and T. S. Lumley. 2004. Biostatistics: A Methodology for the Health
Sciences. 2nd ed. New York: Wiley.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New
York: McGraw–Hill.
Also see
[R] anova postestimation — Postestimation tools for anova
[R] contrast — Contrasts and linear hypothesis tests after estimation
[R] icc — Intraclass correlation coefficients
[R] loneway — Large one-way ANOVA, random effects, and reliability
[R] oneway — One-way analysis of variance
[R] regress — Linear regression
[MV] manova — Multivariate analysis of variance and covariance
[PSS] power oneway — Power analysis for one-way analysis of variance
[PSS] power repeated — Power analysis for repeated-measures analysis of variance
[PSS] power twoway — Power analysis for two-way analysis of variance
Stata Structural Equation Modeling Reference Manual
Title
anova postestimation — Postestimation tools for anova
Description Syntax for predict Syntax for test after anova
Menu for test after anova Options for test after anova Remarks and examples
References Also see
Description
The following postestimation commands are of special interest after anova:
Command Description
dfbeta DFBETA influence statistics
estat hettest tests for heteroskedasticity
estat imtest information matrix test
estat ovtest Ramsey regression specification-error test for omitted variables
estat szroeter Szroeter’s rank test for heteroskedasticity
estat vif variance inflation factors for the independent variables
estat esize η² and ω² effect sizes
rvfplot residual-versus-fitted plot
avplot added-variable plot
avplots all added-variables plots in one image
cprplot component-plus-residual plot
acprplot augmented component-plus-residual plot
rvpplot residual-versus-predictor plot
lvr2plot leverage-versus-squared-residual plot
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
In addition to the common estat commands (see [R] estat), estat hettest, estat imtest,
estat ovtest, estat szroeter, and estat vif are also available. dfbeta is also available.
The syntax for dfbeta and these estat commands is the same as after regress; see [R] regress
postestimation.
For information on the plot commands, see [R] regress postestimation diagnostic plots.
In addition to the standard syntax of test (see [R]test), test after anova has three additionally
allowed syntaxes; see below. test performs Wald tests of expressions involving the coefficients of
the underlying regression model. Simple and composite linear hypotheses are possible.
Syntax for predict
predict after anova follows the same syntax as predict after regress and can provide
predictions, residuals, standardized residuals, Studentized residuals, the standard error of the residuals,
the standard error of the prediction, the diagonal elements of the projection (hat) matrix, and Cook’s D.
See [R]regress postestimation for details.
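For example, with one of the anova fits above as the current estimation results, a short sketch of generating fitted values, residuals, and Cook's D is (the new variable names yhat, res, and cook are arbitrary):
. predict yhat
. predict res, residuals
. predict cook, cooksd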
Syntax for test after anova
In addition to the standard syntax of test (see [R]test), test after anova also allows the
following:
test, test(matname) [mtest[(opt)] matvlc(matname)]                       syntax a
test, showorder                                                          syntax b
test [term [term [...]]] [/ term [term [...]]] [, symbolic]              syntax c
syntax a test expression involving the coefficients of the underlying regression model;
you provide information as a matrix
syntax b show underlying order of design matrix, which is useful when constructing
matname argument of the test() option
syntax c test effects and show symbolic forms
Menu for test after anova
Statistics > Linear models and related > ANOVA/MANOVA > Test linear hypotheses after anova
Options for test after anova
test(matname) is required with syntax a of test. The rows of matname specify linear combinations
of the underlying design matrix of the ANOVA that are to be jointly tested. The columns correspond
to the underlying design matrix (including the constant if it has not been suppressed). The column
and row names of matname are ignored.
A listing of the constraints imposed by the test() option is presented before the table containing
the tests. You should examine this table to verify that you have applied the linear combinations
you desired. Typing test, showorder allows you to examine the ordering of the columns for
the design matrix from the ANOVA.
mtest(opt) specifies that tests are performed for each condition separately. opt specifies the method
for adjusting p-values for multiple testing. Valid values for opt are

    bonferroni      Bonferroni's method
    holm            Holm's method
    sidak           Šidák's method
    noadjust        no adjustment is to be made

Specifying mtest with no argument is equivalent to mtest(noadjust).
matvlc(matname), a programmer's option, saves the variance–covariance matrix of the linear
combinations involved in the suite of tests. For the test Lb = c, what is returned in matname is
LVL′, where V is the estimated variance–covariance matrix of b.
showorder causes test to list the definition of each column in the design matrix. showorder is
not allowed with any other option.
symbolic requests the symbolic form of the test rather than the test statistic. When this option
is specified with no terms (test, symbolic), the symbolic form of the estimable functions is
displayed.
Remarks and examples
Remarks are presented under the following headings:
Testing effects
Obtaining symbolic forms
Testing coefficients and contrasts of margins
Video example
See examples 4, 7, 8, 13, 15, 16, and 17 in [R] anova for examples that use the margins command.
Testing effects
After fitting a model using anova, you can test for the significance of effects in the ANOVA table,
as well as for effects that are not reported in the ANOVA table, by using the test or contrast
command. You follow test or contrast by the list of effects that you wish to test. By default, these
commands use the residual mean squared error in the denominator of the F ratio. You can specify
other error terms by using the slash notation, just as you would with anova. See [R] contrast for
details on this command.
Example 1: Testing effects
Recall our byssinosis example (example 8) in [R] anova:
. anova prob workplace smokes race workplace#smokes workplace#race smokes#race
> workplace#smokes#race [aweight=pop]
(sum of wgt is 5.4190e+03)
Number of obs = 65 R-squared = 0.8300
Root MSE = .025902 Adj R-squared = 0.7948
Source Partial SS df MS F Prob > F
Model .173646538 11 .015786049 23.53 0.0000
workplace .097625175 2 .048812588 72.76 0.0000
smokes .013030812 1 .013030812 19.42 0.0001
race .001094723 1 .001094723 1.63 0.2070
workplace#smokes .019690342 2 .009845171 14.67 0.0000
workplace#race .001352516 2 .000676258 1.01 0.3718
smokes#race .001662874 1 .001662874 2.48 0.1214
workplace#smokes#race .000950841 2 .00047542 0.71 0.4969
Residual .035557766 53 .000670901
Total .209204304 64 .003268817
We can easily obtain a test on a particular term from the ANOVA table. Here are two examples:
. test smokes
Source Partial SS df MS F Prob > F
smokes .013030812 1 .013030812 19.42 0.0001
Residual .035557766 53 .000670901
. test smokes#race
Source Partial SS df MS F Prob > F
smokes#race .001662874 1 .001662874 2.48 0.1214
Residual .035557766 53 .000670901
Both of these tests use residual error by default and agree with the ANOVA table produced earlier.
We could have performed these same tests with contrast:
. contrast smokes
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
smokes 1 19.42 0.0001
Denominator 53
. contrast smokes#race
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
smokes#race 1 2.48 0.1214
Denominator 53
Technical note
After anova, you can use the ‘/’ syntax in test or contrast to perform tests with a variety of
non-σ²I error structures. However, in most unbalanced models, the mean squares are not independent
and do not have equal expectations under the null hypothesis. Also, be warned that you assume
responsibility for the validity of the test statistic.
Example 2: Testing effects with different error terms
We return to the nested ANOVA example (example 11) in [R] anova, where five brands of machinery
were compared in an assembly line. We can obtain appropriate tests for the nested terms using test,
even if we had run the anova command without initially indicating the proper error terms.
. use http://www.stata-press.com/data/r13/machine
(machine data)
. anova output machine / operator|machine /
Number of obs = 57 R-squared = 0.8661
Root MSE = 1.47089 Adj R-squared = 0.8077
Source Partial SS df MS F Prob > F
Model 545.822288 17 32.1071934 14.84 0.0000
machine 430.980792 4 107.745198 13.82 0.0001
operator|machine 101.353804 13 7.79644648
operator|machine 101.353804 13 7.79644648 3.60 0.0009
Residual 84.3766582 39 2.16350406
Total 630.198947 56 11.2535526
In this ANOVA table, machine is tested with residual error. With this particular nested design, the
appropriate error term for testing machine is operator nested within machine, which is easily
obtained from test.
. test machine / operator|machine
Source Partial SS df MS F Prob > F
machine 430.980792 4 107.745198 13.82 0.0001
operator|machine 101.353804 13 7.79644648
This result from test matches what we obtained from our anova command.
Example 3: Pooling terms when testing effects
The other nested ANOVA example (example 12) in [R] anova was based on the sewage data. The
ANOVA table is presented here again. As before, we will use abbreviations of variable names in typing
the commands.
. use http://www.stata-press.com/data/r13/sewage
(Sewage treatment)
. anova particulate s / m|s / f|m|s / w|f|m|s /, dropemptycells
Number of obs = 64 R-squared = 0.6338
Root MSE = 12.7445 Adj R-squared = 0.5194
Source Partial SS df MS F Prob > F
Model 13493.6094 15 899.573958 5.54 0.0000
solution 7203.76563 1 7203.76563 17.19 0.0536
manager|solution 838.28125 2 419.140625
manager|solution 838.28125 2 419.140625 0.55 0.6166
facility|manager|
solution 3064.9375 4 766.234375
facility|manager|
solution 3064.9375 4 766.234375 2.57 0.1193
worker|facility|
manager|solution 2386.625 8 298.328125
worker|facility|
manager|solution 2386.625 8 298.328125 1.84 0.0931
Residual 7796.25 48 162.421875
Total 21289.8594 63 337.934276
In practice, it is often beneficial to pool nonsignificant nested terms to increase the power of
tests on remaining terms. One rule of thumb is to allow the pooling of a term whose p-value is
larger than 0.25. In this sewage example, the p-value for the test of manager is 0.6166. This value
indicates that the manager effect is negligible and might be ignored. Currently, solution is tested by
manager|solution, which has only 2 degrees of freedom. If we pool the manager and facility
terms and use this pooled estimate as the error term for solution, we would have a term with 6
degrees of freedom.
Below are two tests: a test of solution with the pooled manager and facility terms and a
test of this pooled term by worker.
. test s / m|s f|m|s
Source Partial SS df MS F Prob > F
solution 7203.76563 1 7203.76563 11.07 0.0159
manager|solution
facility|manager|
solution 3903.21875 6 650.536458
. test m|s f|m|s / w|f|m|s
Source Partial SS df MS F Prob > F
manager|solution
facility|manager|
solution 3903.21875 6 650.536458 2.18 0.1520
worker|facility|manager|
solution 2386.625 8 298.328125
In the first test, we included two terms after the forward slash (m|s and f|m|s). test after anova
allows multiple terms both before and after the slash. The terms before the slash are combined and
are then tested by the combined terms that follow the slash (or residual error if no slash is present).
The p-value for solution using the pooled term is 0.0159. Originally, it was 0.0536. The increase
in the power of the test is due to the increase in degrees of freedom for the pooled error term.
We can get identical results if we drop manager from the anova model. (This dataset has unique
numbers for each facility so that there is no confusion of facilities when manager is dropped.)
. anova particulate s / f|s / w|f|s /, dropemptycells
Number of obs = 64 R-squared = 0.6338
Root MSE = 12.7445 Adj R-squared = 0.5194
Source Partial SS df MS F Prob > F
Model 13493.6094 15 899.573958 5.54 0.0000
solution 7203.76563 1 7203.76563 11.07 0.0159
facility|solution 3903.21875 6 650.536458
facility|solution 3903.21875 6 650.536458 2.18 0.1520
worker|facility|
solution 2386.625 8 298.328125
worker|facility|
solution 2386.625 8 298.328125 1.84 0.0931
Residual 7796.25 48 162.421875
Total 21289.8594 63 337.934276
This output agrees with our earlier test results.
In the following example, two terms from the anova are jointly tested (pooled).
Example 4: Obtaining overall significance of a term using contrast
In example 10 of [R] anova, we fit the model anova drate region c.mage region#c.mage.
Now we use the contrast command to test for the overall significance of region.
. contrast region region#c.mage, overall
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
region 3 7.40 0.0004
region#c.mage 3 0.86 0.4689
Overall 6 5.65 0.0002
Denominator 42
The overall F statistic associated with the region and region#c.mage terms is 5.65, and it is
significant at the 0.02% level.
In the ANOVA output, the region term, by itself, had a sum of squares of 1166.15, which, based
on 3 degrees of freedom, yielded an F statistic of 7.40 and a significance level of 0.0004. This is
the same test that is reported by contrast in the row labeled region. Likewise, the test from the
ANOVA output for the region#c.mage term is reproduced in the second row of the contrast output.
Obtaining symbolic forms
test can produce the symbolic form of the estimable functions and symbolic forms for particular
tests.
Example 5: Symbolic form of the estimable functions
After fitting an ANOVA model, we type test, symbolic to obtain the symbolic form of the
estimable functions. For instance, returning to our blood pressure data introduced in example 4 of
[R] anova, let's begin by reestimating systolic on drug, disease, and drug#disease:
. use http://www.stata-press.com/data/r13/systolic, clear
(Systolic Blood Pressure Data)
. anova systolic drug disease drug#disease
Number of obs = 58 R-squared = 0.4560
Root MSE = 10.5096 Adj R-squared = 0.3259
Source Partial SS df MS F Prob > F
Model 4259.33851 11 387.212591 3.51 0.0013
drug 2997.47186 3 999.157287 9.05 0.0001
disease 415.873046 2 207.936523 1.88 0.1637
drug#disease 707.266259 6 117.87771 1.07 0.3958
Residual 5080.81667 46 110.452536
Total 9340.15517 57 163.862371
To obtain the symbolic form of the estimable functions, type
. test, symbolic
drug
1 -(r2+r3+r4-r0)
2 r2
3 r3
4 r4
disease
1 -(r6+r7-r0)
2 r6
3 r7
drug#disease
1 1 -(r2+r3+r4+r6+r7-r12-r13-r15-r16-r18-r19-r0)
1 2 r6 - (r12+r15+r18)
1 3 r7 - (r13+r16+r19)
2 1 r2 - (r12+r13)
2 2 r12
2 3 r13
3 1 r3 - (r15+r16)
3 2 r15
3 3 r16
4 1 r4 - (r18+r19)
4 2 r18
4 3 r19
_cons r0
Example 6: Symbolic form for a particular test
To obtain the symbolic form for a particular test, we type test term [term . . . ], symbolic. For
instance, the symbolic form for the test of the main effect of drug is
. test drug, symbolic
drug
1 -(r2+r3+r4)
2 r2
3 r3
4 r4
disease
1 0
2 0
3 0
drug#disease
1 1 -1/3 (r2+r3+r4)
1 2 -1/3 (r2+r3+r4)
1 3 -1/3 (r2+r3+r4)
2 1 1/3 r2
2 2 1/3 r2
2 3 1/3 r2
3 1 1/3 r3
3 2 1/3 r3
3 3 1/3 r3
4 1 1/3 r4
4 2 1/3 r4
4 3 1/3 r4
_cons 0
If we omit the symbolic option, we instead see the result of the test:
. test drug
Source Partial SS df MS F Prob > F
drug 2997.47186 3 999.157287 9.05 0.0001
Residual 5080.81667 46 110.452536
Testing coefficients and contrasts of margins
The test command allows you to perform tests directly on the coefficients of the underlying
regression model. For instance, the coefficient on the third drug and the second disease
is referred to as 3.drug#2.disease. This could also be written as i3.drug#i2.disease, or
_b[3.drug#2.disease], or even _coef[i3.drug#i2.disease]; see [U] 13.5 Accessing coefficients
and standard errors.
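For instance, assuming the model from example 5 is still the current estimation result, a quick sketch of displaying that coefficient directly is
. display _b[3.drug#2.disease]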
Example 7: Testing linear combinations of coefficients
Let’s begin by testing whether the coefficient on the third drug is equal to the coefficient on the
fourth in our blood pressure data. We have already fit the model anova systolic drug##disease
(equivalent to anova systolic drug disease drug#disease), and you can see the results of that
estimation in example 5. Even though we have performed many tasks since we fit the model, Stata
still remembers, and we can perform tests at any time.
. test 3.drug = 4.drug
( 1) 3.drug - 4.drug = 0
F( 1, 46) = 0.13
Prob > F = 0.7234
We find that the two coefficients are not significantly different, at least at any significance level smaller
than 73%.
For more complex tests, the contrast command often provides a more concise way to specify
the test we are interested in and prevents us from having to write the tests in terms of the regression
coefficients. With contrast, we instead specify our tests in terms of differences in the marginal
means for the levels of a particular factor. For example, if we want to compare the third and fourth
drugs, we can test the difference in the mean impact on systolic blood pressure separately for each
disease using the @ operator. We also use the reverse adjacent operator, ar., to compare the fourth
level of drug with the previous level.
. contrast ar4.drug@disease
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
drug@disease
(4 vs 3) 1 1 0.13 0.7234
(4 vs 3) 2 1 1.76 0.1917
(4 vs 3) 3 1 0.65 0.4230
Joint 3 0.85 0.4761
Denominator 46
Contrast Std. Err. [95% Conf. Interval]
drug@disease
(4 vs 3) 1 -2.733333 7.675156 -18.18262 12.71595
(4 vs 3) 2 8.433333 6.363903 -4.376539 21.24321
(4 vs 3) 3 5.7 7.050081 -8.491077 19.89108
None of the individual contrasts shows significant differences between the third drug and the
fourth drug. Likewise, the overall F statistic is 0.85, which is hardly significant. We cannot reject
the hypothesis that the third drug has the same effect as the fourth drug.
Technical note
Alternatively, we could have specified these tests based on the coefficients of the underlying
regression model using the test command. We would have needed to perform tests on the coefficients
for drug and for the coefficients on drug interacted with disease in order to test for differences in
the means mentioned above. To do this, we start with our previous test command:
. test 3.drug = 4.drug
Notice that the F statistic for this test is equivalent to the test labeled (4 vs 3) 1 in the contrast
output. Let’s now add the constraint that the coefficient on the third drug interacted with the third
disease is equal to the coefficient on the fourth drug, again interacted with the third disease. We do
that by typing the new constraint and adding the accumulate option:
. test 3.drug#3.disease = 4.drug#3.disease, accumulate
( 1) 3.drug - 4.drug = 0
( 2) 3.drug#3.disease - 4.drug#3.disease = 0
F( 2, 46) = 0.39
Prob > F = 0.6791
So far, our test includes the equality of the two drug coefficients, along with the equality of the
two drug coefficients when interacted with the third disease. Now we add two more equations, one
for each of the remaining two diseases:
. test 3.drug#2.disease = 4.drug#2.disease, accumulate
( 1) 3.drug - 4.drug = 0
( 2) 3.drug#3.disease - 4.drug#3.disease = 0
( 3) 3.drug#2.disease - 4.drug#2.disease = 0
F( 3, 46) = 0.85
Prob > F = 0.4761
. test 3.drug#1.disease = 4.drug#1.disease, accumulate
( 1) 3.drug - 4.drug = 0
( 2) 3.drug#3.disease - 4.drug#3.disease = 0
( 3) 3.drug#2.disease - 4.drug#2.disease = 0
( 4) 3o.drug#1b.disease - 4o.drug#1b.disease = 0
Constraint 4 dropped
F( 3, 46) = 0.85
Prob > F = 0.4761
The overall F statistic reproduces the one from the joint test in the contrast output.
You may notice that we also got the message “Constraint 4 dropped”. For the technically inclined,
this constraint was unnecessary, given the normalization of the model. If we specify all the constraints
involved in our test or use contrast, we need not worry about the normalization because Stata
handles this automatically.
The test() option of test provides another alternative for testing coefficients. Instead of spelling
out each coefficient involved in the test, a matrix representing the test provides the needed information.
test, showorder shows the order of the terms in the ANOVA corresponding to the order of the
columns for the matrix argument of test().
Example 8: Another way to test linear combinations of coefficients
We repeat the last test of example 7 above with the test() option. First, we view the definition
and order of the columns underlying the ANOVA performed on the systolic data.
. test, showorder
Order of columns in the design matrix
1: (drug==1)
2: (drug==2)
3: (drug==3)
4: (drug==4)
5: (disease==1)
6: (disease==2)
7: (disease==3)
8: (drug==1)*(disease==1)
9: (drug==1)*(disease==2)
10: (drug==1)*(disease==3)
11: (drug==2)*(disease==1)
12: (drug==2)*(disease==2)
13: (drug==2)*(disease==3)
14: (drug==3)*(disease==1)
15: (drug==3)*(disease==2)
16: (drug==3)*(disease==3)
17: (drug==4)*(disease==1)
18: (drug==4)*(disease==2)
19: (drug==4)*(disease==3)
20: _cons
Columns 1–4 correspond to the four levels of drug. Columns 5–7 correspond to the three levels
of disease. Columns 8–19 correspond to the interaction of drug and disease. The last column
corresponds to _cons, the constant in the model.
We construct the matrix dr3vs4 with the same four constraints as the last test shown in example 7
and then use the test(dr3vs4) option to perform the test.
. matrix dr3vs4 = (0,0,1,-1, 0,0,0, 0,0,0,0,0,0,0,0,0, 0, 0, 0, 0 \
> 0,0,0, 0, 0,0,0, 0,0,0,0,0,0,0,0,1, 0, 0,-1, 0 \
> 0,0,0, 0, 0,0,0, 0,0,0,0,0,0,0,1,0, 0,-1, 0, 0 \
> 0,0,0, 0, 0,0,0, 0,0,0,0,0,0,1,0,0,-1, 0, 0, 0)
. test, test(dr3vs4)
( 1) 3.drug - 4.drug = 0
( 2) 3.drug#3.disease - 4.drug#3.disease = 0
( 3) 3.drug#2.disease - 4.drug#2.disease = 0
( 4) 3o.drug#1b.disease - 4o.drug#1b.disease = 0
Constraint 4 dropped
F( 3, 46) = 0.85
Prob > F = 0.4761
Here the effort involved with spelling out the coefficients is similar to that of constructing a matrix
and using it in the test() option. When the test involving coefficients is more complicated, the
test() option may be more convenient than specifying the coefficients directly in test. However,
as previously demonstrated, contrast may provide an even simpler method for testing the same
hypothesis.
After fitting an ANOVA model, various contrasts (1-degree-of-freedom tests comparing different
levels of a categorical variable) are often of interest. contrast can perform each 1-degree-of-freedom
test in addition to the combined test, even in cases in which the contrasts do not correspond to one
of the contrast operators.
Example 9: Testing particular contrasts of interest
Rencher and Schaalje (2008) illustrate 1-degree-of-freedom contrasts for an ANOVA comparing the
net weight of cans filled by five machines (labeled A–E). The data were originally obtained from
Ostle and Mensing (1975). Rencher and Schaalje use a cell-means ANOVA model approach for this
problem. We could do the same by using the noconstant option of anova; see [R] anova. Instead,
we obtain the same results by using the standard overparameterized ANOVA approach (that is, we
keep the constant in the model).
. use http://www.stata-press.com/data/r13/canfill
(Can Fill Data)
. list, sepby(machine)
machine weight
1. A 11.95
2. A 12.00
3. A 12.25
4. A 12.10
5. B 12.18
6. B 12.11
7. C 12.16
8. C 12.15
9. C 12.08
10. D 12.25
11. D 12.30
12. D 12.10
13. E 12.10
14. E 12.04
15. E 12.02
16. E 12.02
. anova weight machine
Number of obs = 16 R-squared = 0.4123
Root MSE = .087758 Adj R-squared = 0.1986
Source Partial SS df MS F Prob > F
Model .059426993 4 .014856748 1.93 0.1757
machine .059426993 4 .014856748 1.93 0.1757
Residual .084716701 11 .007701518
Total .144143694 15 .00960958
The four 1-degree-of-freedom tests of interest among the five machines are A and D versus B, C,
and E; B and E versus C; A versus D; and B versus E. We can specify these tests as user-defined
contrasts by placing the corresponding contrast coefficients into positions related to the five levels of
machine as described in User-defined contrasts of [R] contrast.
. contrast {machine 3 -2 -2 3 -2}
> {machine 0 1 -2 0 1}
> {machine 1 0 0 -1 0}
> {machine 0 1 0 0 -1}, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
machine
(1) 1 0.75 0.4055
(2) 1 0.31 0.5916
(3) 1 4.47 0.0582
(4) 1 1.73 0.2150
Joint 4 1.93 0.1757
Denominator 11
contrast produces a 1-degree-of-freedom test for each of the specified contrasts as well as a
joint test. We included the noeffects option so that the table displaying the values of the individual
contrasts with their confidence intervals was suppressed.
The significance values above are not adjusted for multiple comparisons. We could have produced
the Bonferroni-adjusted significance values by using the mcompare(bonferroni) option.
. contrast {machine 3 -2 -2 3 -2}
> {machine 0 1 -2 0 1}
> {machine 1 0 0 -1 0}
> {machine 0 1 0 0 -1}, mcompare(bonferroni) noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
Bonferroni
df F P>F P>F
machine
(1) 1 0.75 0.4055 1.0000
(2) 1 0.31 0.5916 1.0000
(3) 1 4.47 0.0582 0.2329
(4) 1 1.73 0.2150 0.8601
Joint 4 1.93 0.1757
Denominator 11
Note: Bonferroni-adjusted p-values are reported for tests
on individual contrasts only.
Example 10: Linear and quadratic contrasts
Here there are two factors, A and B, each with three levels. The levels are quantitative so that
linear and quadratic contrasts are of interest.
. use http://www.stata-press.com/data/r13/twowaytrend
. anova Y A B A#B
Number of obs = 36 R-squared = 0.9304
Root MSE = 2.6736 Adj R-squared = 0.9097
Source Partial SS df MS F Prob > F
Model 2578.55556 8 322.319444 45.09 0.0000
A 2026.72222 2 1013.36111 141.77 0.0000
B 383.722222 2 191.861111 26.84 0.0000
A#B 168.111111 4 42.0277778 5.88 0.0015
Residual 193 27 7.14814815
Total 2771.55556 35 79.1873016
We can use the p. contrast operator to obtain the 1-degree-of-freedom tests for the linear and
quadratic effects of A and B.
. contrast p.A p.B, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
A
(linear) 1 212.65 0.0000
(quadratic) 1 70.88 0.0000
Joint 2 141.77 0.0000
B
(linear) 1 26.17 0.0000
(quadratic) 1 27.51 0.0000
Joint 2 26.84 0.0000
Denominator 27
All the above tests appear to be significant. In addition to presenting the 1-degree-of-freedom tests,
the combined tests for A and B are produced and agree with the original ANOVA results.
Now we explore the interaction between A and B.
. contrast p.A#p1.B, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
A#B
(linear) (linear) 1 17.71 0.0003
(quadratic) (linear) 1 0.07 0.7893
Joint 2 8.89 0.0011
Denominator 27
The 2-degrees-of-freedom test of the interaction of A with the linear components of B is significant
at the 0.0011 level. But, when we examine the two 1-degree-of-freedom tests that compose this result,
the significance is due to the linear A by linear B contrast (significance level of 0.0003). A significance
value of 0.7893 for the quadratic A by linear B indicates that this factor is not significant for these
data.
. contrast p.A#p2.B, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
A#B
(linear) (quadratic) 1 2.80 0.1058
(quadratic) (quadratic) 1 2.94 0.0979
Joint 2 2.87 0.0741
Denominator 27
The test of A with the quadratic components of B does not fall below the 0.05 significance level.
Video example
Introduction to contrasts in Stata: One-way ANOVA
References
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Ostle, B., and R. W. Mensing. 1975. Statistics in Research. 3rd ed. Ames, IA: Iowa State University Press.
Rencher, A. C., and G. B. Schaalje. 2008. Linear Models in Statistics. 2nd ed. New York: Wiley.
Also see
[R] anova — Analysis of variance and covariance
[R] regress postestimation — Postestimation tools for regress
[R] regress postestimation diagnostic plots — Postestimation plots for regress
[U] 20 Estimation and postestimation commands
Title
areg — Linear regression with a large dummy-variable set
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
areg depvar [indepvars] [if] [in] [weight], absorb(varname) [options]
options Description
Model
absorb(varname)categorical variable to be absorbed
SE/Robust
vce(vcetype)vcetype may be ols,robust,cluster clustvar,bootstrap,
or jackknife
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
absorb(varname) is required.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mi estimate, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights, fweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Other > Linear regression absorbing one cat. variable
Description
areg fits a linear regression absorbing one categorical factor. areg is designed for datasets with
many groups, but not a number of groups that increases with the sample size. See the xtreg, fe
command in [XT] xtreg for an estimator that handles the case in which the number of groups increases
with the sample size.
Options
 
Model
absorb(varname) specifies the categorical variable, which is to be included in the regression as if
it were specified by dummy variables. absorb() is required.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (ols), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
Exercise caution when using the vce(cluster clustvar) option with areg. The effective number
of degrees of freedom for the robust variance estimator is n_g - 1, where n_g is the number of
clusters. Thus the number of levels of the absorb() variable should not exceed the number of
clusters.
 
Reporting
level(#); see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following option is available with areg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Suppose that you have a regression model that includes among the explanatory variables a large
number, k, of mutually exclusive and exhaustive dummies:

        y = Xβ + d_1γ_1 + d_2γ_2 + ··· + d_kγ_k + ε

For instance, the dummy variables, d_i, might indicate countries in the world or states of the United
States. One solution would be to fit the model with regress, but this solution is possible only if k
is small enough so that the total number of variables (the number of columns of X plus the number
of d_i's plus one for y) is sufficiently small, meaning less than matsize (see [R] matsize). For
problems with more variables than the largest possible value of matsize (100 for Small Stata, 800
for Stata/IC, and 11,000 for Stata/SE and Stata/MP), regress will not work. areg provides a way
of obtaining estimates of β, but not the γ_i's, in these cases. The effects of the dummy variables
are said to be absorbed.
Example 1
So that we can compare the results produced by areg with Stata's other regression commands,
we will fit a model in which k is small. areg's real use, however, is when k is large.
In our automobile data, we have a variable called rep78 that is coded 1, 2, 3, 4, and 5, where 1
means poor and 5 means excellent. Let's assume that we wish to fit a regression of mpg on weight,
gear_ratio, and rep78 (parameterized as a set of dummies).
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. regress mpg weight gear_ratio b5.rep78
Source SS df MS Number of obs = 69
F( 6, 62) = 21.31
Model 1575.97621 6 262.662702 Prob > F = 0.0000
Residual 764.226686 62 12.3262369 R-squared = 0.6734
Adj R-squared = 0.6418
Total 2340.2029 68 34.4147485 Root MSE = 3.5109
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0051031 .0009206 -5.54 0.000 -.0069433 -.003263
gear_ratio .901478 1.565552 0.58 0.567 -2.228015 4.030971
rep78
Poor -2.036937 2.740728 -0.74 0.460 -7.515574 3.4417
Fair -2.419822 1.764338 -1.37 0.175 -5.946682 1.107039
Average -2.557432 1.370912 -1.87 0.067 -5.297846 .1829814
Good -2.788389 1.395259 -2.00 0.050 -5.577473 .0006939
_cons 36.23782 7.01057 5.17 0.000 22.22389 50.25175
To fit the areg equivalent, we type
. areg mpg weight gear_ratio, absorb(rep78)
Linear regression, absorbing indicators Number of obs = 69
F( 2, 62) = 41.64
Prob > F = 0.0000
R-squared = 0.6734
Adj R-squared = 0.6418
Root MSE = 3.5109
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0051031 .0009206 -5.54 0.000 -.0069433 -.003263
gear_ratio .901478 1.565552 0.58 0.567 -2.228015 4.030971
_cons 34.05889 7.056383 4.83 0.000 19.95338 48.1644
rep78 F(4, 62) = 1.117 0.356 (5 categories)
Both regress and areg display the same R-squared values, root mean squared error, and, for weight
and gear_ratio, the same parameter estimates, standard errors, t statistics, significance levels, and
confidence intervals. areg, however, does not report the coefficients for rep78, and, in fact, they
are not even calculated. This computational trick makes the problem manageable when k is large.
areg reports a test that the coefficients associated with rep78 are jointly zero. Here this test has a
significance level of 35.6%. This F test for rep78 is the same that we would obtain after regress
if we were to specify test 1.rep78 2.rep78 3.rep78 4.rep78; see [R] test.
The model F tests reported by regress and areg also differ. The regress command reports a
test that all coefficients except that of the constant are equal to zero; thus, the dummies are included
in this test. The areg output shows a test that all coefficients excluding the dummies and the constant
are equal to zero. This is the same test that can be obtained after regress by typing test weight
gear_ratio.
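For readers who want to verify this correspondence, a minimal sketch (reusing the regress specification shown above):
. regress mpg weight gear_ratio b5.rep78
. test 1.rep78 2.rep78 3.rep78 4.rep78
. test weight gear_ratio
The first test should reproduce the F(4, 62) = 1.117 statistic that areg reports for rep78, and the second should reproduce areg's model test of F(2, 62) = 41.64.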
Technical note
areg is designed for datasets with many groups, but not a number that grows with the sample
size. Consider two different samples from the U.S. population. In the first sample, we have 10,000
individuals and we want to include an indicator for each of the 50 states, whereas in the second
sample we have 3 observations on each of 10,000 individuals and we want to include an indicator for
each individual. areg was designed for datasets similar to the first sample in which we have a fixed
number of groups, the 50 states. In the second sample, the number of groups, which is the number of
individuals, grows as we include more individuals in the sample. For an estimator designed to handle
the case in which the number of groups grows with the sample size, see the xtreg, fe command
in [XT] xtreg.
Although the point estimates produced by areg and xtreg, fe are the same, the estimated VCEs
differ when vce(cluster clustvar) is specified because the commands make different assumptions
about whether the number of groups increases with the sample size.
Technical note
The intercept reported by areg deserves some explanation because, given k mutually exclusive
and exhaustive dummies, it is arbitrary. areg identifies the model by choosing the intercept that
makes the prediction calculated at the means of the independent variables equal to the mean of the
dependent variable: ȳ = x̄β̂.
. predict yhat
(option xb assumed; fitted values)
. summarize mpg yhat if rep78 != .
Variable Obs Mean Std. Dev. Min Max
mpg 69 21.28986 5.866408 12 41
yhat 69 21.28986 4.383224 11.58643 28.07367
We had to include if rep78 != . in our summarize command because we have missing values in
our data. areg automatically dropped those missing values (as it should) in forming the estimates,
but predict with the xb option will make predictions for cases with missing rep78 because it does
not know that rep78 is really part of our model.
These predicted values do not include the absorbed effects (that is, the d_iγ_i). For predicted values
that include these effects, use the xbd option of predict (see [R] areg postestimation) or see
[XT] xtreg.
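For instance, continuing with the fit above, a minimal sketch (yhatd is a variable name we make up here):
. predict yhatd, xbd
. summarize mpg yhat yhatd if rep78 != .
Unlike yhat, the yhatd predictions add each car's estimated rep78 effect to x_j b.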
Example 2
areg, vce(robust) is a Huberized version of areg; see [P] _robust. Just as areg is equivalent to
using regress with dummies, areg, vce(robust) is equivalent to using regress, vce(robust)
with dummies. You can use areg, vce(robust) when you expect heteroskedastic or nonnormal
errors. areg, vce(robust), like ordinary regression, assumes that the observations are independent,
unless the vce(cluster clustvar) option is specified. If the vce(cluster clustvar) option is
specified, this independence assumption is relaxed and only the clusters identified by equal values of
clustvar are assumed to be independent.
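As a quick check of the regress equivalence just described, a sketch using the automobile data from example 1:
. use http://www.stata-press.com/data/r13/auto2, clear
. regress mpg weight gear_ratio b5.rep78, vce(robust)
. areg mpg weight gear_ratio, absorb(rep78) vce(robust)
The coefficients and robust standard errors reported for weight and gear_ratio should agree across the two commands.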
Assume that we were to collect data by randomly sampling 10,000 doctors (from 100 hospitals)
and then sampling 10 patients of each doctor, yielding a total dataset of 100,000 patients in a cluster
sample. If in some regression we wished to include effects of the hospitals to which the doctors
belonged, we would want to include a dummy variable for each hospital, adding 100 variables to our
model. areg could fit this model by
. areg depvar patient_vars, absorb(hospital) vce(cluster doctor)
Stored results
areg stores the following in e():
Scalars
    e(N)             number of observations
    e(tss)           total sum of squares
    e(df_m)          model degrees of freedom
    e(rss)           residual sum of squares
    e(df_r)          residual degrees of freedom
    e(r2)            R-squared
    e(r2_a)          adjusted R-squared
    e(df_a)          degrees of freedom for absorbed effect
    e(rmse)          root mean squared error
    e(ll)            log likelihood
    e(ll_0)          log likelihood, constant-only model
    e(N_clust)       number of clusters
    e(F)             F statistic
    e(F_absorb)      F statistic for absorbed effect (when vce(robust) is not specified)
    e(rank)          rank of e(V)
Macros
e(cmd) areg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(absvar) name of absorb variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(datasignature) the checksum
e(datasignaturevars) variables used in calculation of checksum
e(properties) b V
e(predict) program used to implement predict
e(footnote) program used to implement the footnote display
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
    e(V)             variance–covariance matrix of the estimators
    e(V_modelbased)  model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
areg begins by recalculating depvar and indepvars to have mean 0 within the groups specified
by absorb(). The overall mean of each variable is then added back in. The adjusted depvar is then
regressed on the adjusted indepvars with regress, yielding the coefficient estimates. The degrees
of freedom of the variance–covariance matrix of the coefficients is then adjusted to account for the
absorbed variables; this calculation yields the same results (up to numerical roundoff error) as if the
matrix had been calculated directly by the formulas given in [R] regress.
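To see the idea at work, here is a sketch (not areg's actual code) that reproduces the slope estimates from example 1 by demeaning within rep78 groups and adding back the overall means; the *_grp and *_w variables are created only for this illustration:
. use http://www.stata-press.com/data/r13/auto2, clear
. keep if !missing(rep78)
. foreach v of varlist mpg weight gear_ratio {
        quietly summarize `v', meanonly
        local m = r(mean)
        egen double `v'_grp = mean(`v'), by(rep78)
        generate double `v'_w = `v' - `v'_grp + `m'
  }
. regress mpg_w weight_w gear_ratio_w
The coefficients on weight_w and gear_ratio_w match those reported by areg, but the standard errors from this plain regress call do not, because they lack the degrees-of-freedom adjustment just described.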
areg with vce(robust) or vce(cluster clustvar) works similarly, calling _robust after
regress to produce the Huber/White/sandwich estimator of the variance or its clustered version. See
[P] _robust, particularly Introduction and Methods and formulas. The model F test uses the robust
variance estimates. There is, however, no simple computational means of obtaining a robust test of the
absorbed dummies; thus this test is not displayed when the vce(robust) or vce(cluster clustvar)
option is specified.
The number of groups specified in absorb() is included in the degrees of freedom used in
the finite-sample adjustment of the cluster–robust VCE estimator. This statement is valid only if the
number of groups is small relative to the sample size. (Technically, the number of groups must remain
fixed as the sample size grows.) For an estimator that allows the number of groups to grow with the
sample size, see the xtreg, fe command in [XT] xtreg.
References
Blackwell, J. L., III. 2005. Estimation and testing of fixed-effect panel-data systems. Stata Journal 5: 202–207.
McCaffrey, D. F., K. Mihaly, J. R. Lockwood, and T. R. Sass. 2012. A review of Stata commands for fixed-effects
estimation in normal linear models. Stata Journal 12: 406–432.
Also see
[R] areg postestimation — Postestimation tools for areg
[R] regress — Linear regression
[MI] estimation — Estimation commands for use with mi estimate
[XT] xtreg — Fixed-, between-, and random-effects and population-averaged linear models
[U] 20 Estimation and postestimation commands
Title
areg postestimation — Postestimation tools for areg
Description Syntax for predict Menu for predict Options for predict
Remarks and examples References Also see
Description
The following postestimation commands are available after areg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast (1)   dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi estimation results.
Syntax for predict
        predict [type] newvar [if] [in] [, statistic]

 where y_j = x_j b + d_absorbvar + e_j and statistic is

 statistic       Description

 Main
   xb            x_j b, fitted values; the default
   stdp          standard error of the prediction
   dresiduals    d_absorbvar + e_j = y_j - x_j b
   xbd           x_j b + d_absorbvar
   d             d_absorbvar
   residuals     residual
   score         score; equivalent to residuals
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the prediction of x_j b, the fitted values, by using the average effect of the
absorbed variable. Also see xbd below.
stdp calculates the standard error of x_j b.
dresiduals calculates y_j - x_j b, which are the residuals plus the effect of the absorbed variable.
xbd calculates x_j b + d_absorbvar, which are the fitted values including the individual effects of the
absorbed variable.
d calculates d_absorbvar, the individual coefficients for the absorbed variable.
residuals calculates the residuals, that is, y_j - (x_j b + d_absorbvar).
score is a synonym for residuals.
Remarks and examples
Example 1
Continuing with example 1 of [R] areg, we refit the model with robust standard errors and then
obtain linear predictions and standard errors for those linear predictions.
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. areg mpg weight gear_ratio, absorb(rep78) vce(robust)
(output omitted )
. predict xb_ar
(option xb assumed; fitted values)
. predict stdp_ar, stdp
We can obtain the same linear predictions by fitting the model with xtreg, fe, but we would
first need to specify the panel structure by using xtset.
. xtset rep78
panel variable: rep78 (unbalanced)
. xtreg mpg weight gear_ratio, fe vce(robust)
(output omitted )
. predict xb_xt
(option xb assumed; fitted values)
. predict stdp_xt, stdp
. summarize xb_ar xb_xt stdp*
Variable Obs Mean Std. Dev. Min Max
xb_ar 74 21.36805 4.286788 11.58643 28.07367
xb_xt 74 21.36805 4.286788 11.58643 28.07367
stdp_ar 74 .7105649 .1933936 .4270821 1.245179
stdp_xt 74 .8155919 .4826332 .0826999 1.709786
The predicted xb values above are the same for areg and xtreg, fe, but the standard errors
for those linear predictions are different. The assumptions for these two estimators lead to different
formulations for their standard errors. The robust variance estimates with areg are equivalent to the
robust variance estimates using regress, including the panel dummies. The consistent robust variance
estimates with xtreg are equivalent to those obtained by specifying vce(cluster panelvar) with that
estimation command. For a theoretical discussion, see Wooldridge (2013), Stock and Watson (2008),
and Arellano (2003); also see the technical note after example 3 of [XT] xtreg.
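A sketch of the xtreg equivalence just mentioned, using the same model (rep78 is the panel variable here):
. xtset rep78
. xtreg mpg weight gear_ratio, fe vce(robust)
. xtreg mpg weight gear_ratio, fe vce(cluster rep78)
The two xtreg calls should report identical standard errors.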
Example 2
We would like to use linktest to check whether the dependent variable for our model is correctly
specified:
. use http://www.stata-press.com/data/r13/auto2, clear
(1978 Automobile Data)
. areg mpg weight gear_ratio, absorb(rep78)
(output omitted )
. linktest, absorb(rep78)
Linear regression, absorbing indicators Number of obs = 69
F( 2, 62) = 46.50
Prob > F = 0.0000
R-squared = 0.6939
Adj R-squared = 0.6643
Root MSE = 3.3990
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
_hat -.9305602 .9537856 -0.98 0.333 -2.83715 .9760302
_hatsq .0462785 .0227219 2.04 0.046 .0008582 .0916989
_cons 19.24899 9.725618 1.98 0.052 -.1922457 38.69022
rep78 F(4, 62) = 1.278 0.288 (5 categories)
The squared prediction, _hatsq, is significant in the regression of mpg on the linear prediction and its square;
therefore, the test indicates that our dependent variable does not seem to be well specified. Let's
transform the dependent variable into energy consumption, gallons per mile, fit the alternative model,
and check the link test again.
. generate gpm = 1/mpg
. areg gpm weight gear_ratio, absorb(rep78)
(output omitted )
. linktest, absorb(rep78)
Linear regression, absorbing indicators Number of obs = 69
F( 2, 62) = 72.60
Prob > F = 0.0000
R-squared = 0.7436
Adj R-squared = 0.7187
Root MSE = 0.0068
gpm Coef. Std. Err. t P>|t| [95% Conf. Interval]
_hat .2842582 .7109124 0.40 0.691 -1.136835 1.705352
_hatsq 6.956965 6.862439 1.01 0.315 -6.760855 20.67478
_cons .0175457 .0178251 0.98 0.329 -.0180862 .0531777
rep78 F(4, 62) = 0.065 0.992 (5 categories)
The link test supports the use of the transformed dependent variable.
References
Arellano, M. 2003. Panel Data Econometrics. Oxford: Oxford University Press.
Stock, J. H., and M. W. Watson. 2008. Heteroskedasticity-robust standard errors for fixed effects panel data regression.
Econometrica 76: 155–174.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Also see
[R] areg — Linear regression with a large dummy-variable set
[U] 20 Estimation and postestimation commands
Title
asclogit — Alternative-specific conditional logit (McFadden’s choice) model
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
        asclogit depvar [indepvars] [if] [in] [weight] , case(varname)
              alternatives(varname) [options]

 options                          Description

 Model
   case(varname)                  use varname to identify cases
   alternatives(varname)          use varname to identify the alternatives available for each case
   casevars(varlist)              case-specific variables
   basealternative(# | lbl | str)   alternative to normalize location
   noconstant                     suppress alternative-specific constant terms
   altwise                        use alternativewise deletion instead of casewise deletion
   offset(varname)                include varname in model with coefficient constrained to 1
   constraints(constraints)       apply specified linear constraints
   collinear                      keep collinear variables

 SE/Robust
   vce(vcetype)                   vcetype may be oim, robust, cluster clustvar, bootstrap,
                                    or jackknife

 Reporting
   level(#)                       set confidence level; default is level(95)
   or                             report odds ratios
   noheader                       do not display the header on the coefficient table
   nocnsreport                    do not display constraints
   display_options                control column formats and line width

 Maximization
   maximize_options               control the maximization process; seldom used

   coeflegend                     display legend instead of statistics
case(varname) and alternatives(varname) are required.
bootstrap, by, fp, jackknife, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed (see [U] 11.1.6 weight), but they are interpreted to apply to cases
as a whole, not to individual observations. See Use of weights in [R] clogit.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Categorical outcomes > Alternative-specific conditional logit
Description
asclogit fits McFadden’s choice model, which is a specific case of the more general conditional
logistic regression model (McFadden 1974). asclogit requires multiple observations for each case
(individual or decision), where each observation represents an alternative that may be chosen. The cases
are identified by the variable specified in the case() option, whereas the alternatives are identified by
the variable specified in the alternatives() option. The outcome or chosen alternative is identified
by a value of 1 in depvar, whereas zeros indicate the alternatives that were not chosen. There can be
multiple alternatives chosen for each case.
asclogit allows two types of independent variables: alternative-specific variables and case-specific
variables. Alternative-specific variables vary across both cases and alternatives and are specified in
indepvars. Case-specific variables vary only across cases and are specified in the casevars() option.
See [R] clogit for a more general application of conditional logistic regression. For example,
clogit would be used when you have grouped data where each observation in a group may be
a different individual, but all individuals in a group have a common characteristic. You may use
clogit to obtain the same estimates as asclogit by specifying the case() variable as the group()
variable in clogit and generating variables that interact the casevars() in asclogit with each
alternative (in the form of an indicator variable), excluding the interaction variable associated with the
base alternative. asclogit takes care of this data management burden for you. Also, for clogit,
each record (row in your data) is an observation, whereas in asclogit each case, consisting of
several records (the alternatives) in your data, is an observation. This last point is important because
asclogit will drop observations, by default, in a casewise fashion. That is, if there is at least one
missing value in any of the variables for each record of a case, the entire case is dropped from
estimation. To use alternativewise deletion, specify the altwise option and only the records with
missing values will be dropped from estimation.
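As a concrete illustration of the clogit equivalence described above, here is a sketch using the choice dataset from example 1 below; the generated variable names are ours:
. use http://www.stata-press.com/data/r13/choice, clear
. generate japan   = (car == 2)
. generate europe  = (car == 3)
. generate japXsex = japan*sex
. generate japXinc = japan*income
. generate eurXsex = europe*sex
. generate eurXinc = europe*income
. clogit choice dealer japan europe japXsex japXinc eurXsex eurXinc, group(id)
Here japan and europe act as the alternative-specific constants, the interaction variables carry the case-specific coefficients, and American (the most frequent alternative) serves as the base, so the estimates reproduce those from asclogit in example 1.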
Options
 
Model
case(varname) specifies the numeric variable that identifies each case. case() is required and must
be integer valued.
alternatives(varname) specifies the variable that identifies the alternatives for each case. The
number of alternatives can vary with each case; the maximum number of alternatives cannot exceed
the limits of tabulate oneway; see [R] tabulate oneway. alternatives() is required and may
be a numeric or a string variable.
casevars(varlist) specifies the case-specific numeric variables. These are variables that are constant
for each case. If there are a maximum of J alternatives, there will be J - 1 sets of coefficients
associated with the casevars().
basealternative(# | lbl | str) specifies the alternative that normalizes the latent-variable location
(the level of utility). The base alternative may be specified as a number, label, or string depending
on the storage type of the variable indicating alternatives. The default is the alternative with the
highest frequency.
If vce(bootstrap) or vce(jackknife) is specified, you must specify the base alternative. This
is to ensure that the same model is fit with each call to asclogit.
noconstant suppresses the J - 1 alternative-specific constant terms.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion; that is, the entire group of
observations making up a case is deleted if any missing values are encountered. This option does
not apply to observations that are marked out by the if or in qualifier or the by prefix.
offset(varname), constraints(numlist | matname), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
or reports the estimated coefficients transformed to odds ratios, that is, e^b rather than b. Standard errors
and confidence intervals are similarly transformed. This option affects how results are displayed,
not how they are estimated. or may be specified at estimation or when replaying previously
estimated results.
noheader prevents the coefficient table header from being displayed.
nocnsreport; see [R] estimation options.
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
technique(bhhh) is not allowed.
The initial estimates must be specified as from(matname [, copy]), where matname is the
matrix containing the initial estimates and the copy option specifies that only the position of each
element in matname is relevant. If copy is not specified, the column stripe of matname identifies
the estimates.
The following option is available with asclogit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
asclogit fits McFadden’s choice model (McFadden [1974]; for a brief introduction, see Greene
[2012, sec. 18.2] or Cameron and Trivedi [2010, sec. 15.5]). In this model, we have a set of unordered
alternatives indexed by 1,2, . . . , J. Let yij ,j=1, . . . , J, be an indicator variable for the alternative
actually chosen by the ith individual (case). That is, yij = 1 if individual ichose alternative j
and yij =0 otherwise. The independent variables come in two forms: alternative specific and case
asclogit — Alternative-specific conditional logit (McFadden’s choice) model 87
specific. Alternative-specific variables vary among the alternatives (as well as cases), and case-specific
variables vary only among cases. Assume that we have palternative-specific variables so that for
case iwe have a J×pmatrix, Xi. Further, assume that we have qcase-specific variables so that
we have a 1 ×qvector zifor case i. Our random-utility model can then be expressed as
ui=Xiβ+ (ziA)0+i
Here βis a p×1 vector of alternative-specific regression coefficients and A= (α1,...,αJ)is a q×J
matrix of case-specific regression coefficients. The elements of the J×1 vector iare independent
Type I (Gumbel-type) extreme-value random variables with mean γ(the EulerMascheroni constant,
approximately 0.577) and variance π2/6. We must fix one of the αjto the constant vector to normalize
the location. We set αk=0, where kis specified by the basealternative() option. The vector
uiquantifies the utility that the individual gains from the Jalternatives. The alternative chosen by
individual iis the one that maximizes utility.
Example 1
We have data on 295 consumers and their choice of automobile. Each consumer chose among an
American, Japanese, or European car; the variable car indicates the nationality of the car for each
alternative. We want to explore the relationship of the choice of car to the consumer's sex
(variable sex) and income (variable income, in thousands of dollars). We also have information on
the number of dealerships of each nationality in the consumer's city in the variable dealer that we
want to include as a regressor. We assume that consumers' preferences are influenced by the number
of dealerships in an area but that the number of dealerships is not influenced by consumer preferences
(which we admit is a rather strong assumption). The variable dealer is an alternative-specific variable
(X_i is a 3 × 1 vector in our previous notation), and sex and income are case-specific variables (z_i
is a 1 × 2 vector). Each consumer's chosen car is indicated by the variable choice.
Let’s list some of the data.
. use http://www.stata-press.com/data/r13/choice
. list id car choice dealer sex income in 1/12, sepby(id)
id car choice dealer sex income
1. 1 American 0 18 male 46.7
2. 1 Japan 0 8 male 46.7
3. 1 Europe 1 5 male 46.7
4. 2 American 1 17 male 26.1
5. 2 Japan 0 6 male 26.1
6. 2 Europe 0 2 male 26.1
7. 3 American 1 12 male 32.7
8. 3 Japan 0 6 male 32.7
9. 3 Europe 0 2 male 32.7
10. 4 American 0 18 female 49.2
11. 4 Japan 1 7 female 49.2
12. 4 Europe 0 4 female 49.2
We see, for example, that the first consumer, a male earning $46,700 per year, chose to purchase a
European car even though there are more American and Japanese car dealers in his area. The fourth
consumer, a female earning $49,200 per year, purchased a Japanese car.
We now fit our model.
. asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
Iteration 0: log likelihood = -273.55685
Iteration 1: log likelihood = -252.75109
Iteration 2: log likelihood = -250.78555
Iteration 3: log likelihood = -250.7794
Iteration 4: log likelihood = -250.7794
Alternative-specific conditional logit Number of obs = 885
Case variable: id Number of cases = 295
Alternative variable: car Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(5) = 15.86
Log likelihood = -250.7794 Prob > chi2 = 0.0072
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
car
dealer .0680938 .0344465 1.98 0.048 .00058 .1356076
American (base alternative)
Japan
sex -.5346039 .3141564 -1.70 0.089 -1.150339 .0811314
income .0325318 .012824 2.54 0.011 .0073973 .0576663
_cons -1.352189 .6911829 -1.96 0.050 -2.706882 .0025049
Europe
sex .5704109 .4540247 1.26 0.209 -.3194612 1.460283
income .032042 .0138676 2.31 0.021 .004862 .0592219
_cons -2.355249 .8526681 -2.76 0.006 -4.026448 -.6840501
Displaying the results as odds ratios makes interpretation easier.
. asclogit, or noheader
choice Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
car
dealer 1.070466 .0368737 1.98 0.048 1.00058 1.145232
American (base alternative)
Japan
sex .5859013 .1840647 -1.70 0.089 .3165294 1.084513
income 1.033067 .013248 2.54 0.011 1.007425 1.059361
_cons .2586735 .1787907 -1.96 0.050 .0667446 1.002508
Europe
sex 1.768994 .8031669 1.26 0.209 .7265404 4.307178
income 1.032561 .0143191 2.31 0.021 1.004874 1.061011
_cons .0948699 .0808925 -2.76 0.006 .0178376 .5045693
These results indicate that men (sex =1) are less likely to pick a Japanese car over an American
car than women (odds ratio 0.59) but that men are more likely to choose a European car over an
American car (odds ratio 1.77). Raising a person’s income increases the likelihood that he or she
purchases a Japanese or European car; interestingly, the effect of higher income is about the same
for these two types of cars.
 
Daniel Little McFadden was born in 1937 in North Carolina. He studied physics, psychology,
and economics at the University of Minnesota and has taught economics at Pittsburgh, Berkeley,
MIT, and the University of Southern California. His contributions to logit models were triggered
by a student’s project on freeway routing decisions, and his work consistently links economic
theory and applied problems. In 2000, he shared the Nobel Prize in Economics with James J.
Heckman.
 
Technical note
McFadden’s choice model is related to multinomial logistic regression (see [R] mlogit). If all the
independent variables are case specific, then the two models are identical. We verify this supposition
by running the previous example without the alternative-specific variable, dealer.
. asclogit choice, case(id) alternatives(car) casevars(sex income) nolog
Alternative-specific conditional logit Number of obs = 885
Case variable: id Number of cases = 295
Alternative variable: car Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(4) = 12.53
Log likelihood = -252.72012 Prob > chi2 = 0.0138
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
American (base alternative)
Japan
sex -.4694799 .3114939 -1.51 0.132 -1.079997 .141037
income .0276854 .0123666 2.24 0.025 .0034472 .0519236
_cons -1.962652 .6216804 -3.16 0.002 -3.181123 -.7441807
Europe
sex .5388441 .4525279 1.19 0.234 -.3480942 1.425782
income .0273669 .013787 1.98 0.047 .000345 .0543889
_cons -3.180029 .7546837 -4.21 0.000 -4.659182 -1.700876
To run mlogit, we must rearrange the dataset. mlogit requires a dependent variable that indicates
the choice (1, 2, or 3) for each individual. We will use car as our dependent variable for those
observations that represent the alternative actually chosen.
. keep if choice == 1
(590 observations deleted)
. mlogit car sex income
Iteration 0: log likelihood = -259.1712
Iteration 1: log likelihood = -252.81165
Iteration 2: log likelihood = -252.72014
Iteration 3: log likelihood = -252.72012
Multinomial logistic regression Number of obs = 295
LR chi2(4) = 12.90
Prob > chi2 = 0.0118
Log likelihood = -252.72012 Pseudo R2 = 0.0249
car Coef. Std. Err. z P>|z| [95% Conf. Interval]
American (base outcome)
Japan
sex -.4694798 .3114939 -1.51 0.132 -1.079997 .1410371
income .0276854 .0123666 2.24 0.025 .0034472 .0519236
_cons -1.962651 .6216803 -3.16 0.002 -3.181122 -.7441801
Europe
sex .5388443 .4525278 1.19 0.234 -.348094 1.425783
income .027367 .013787 1.98 0.047 .000345 .0543889
_cons -3.18003 .7546837 -4.21 0.000 -4.659182 -1.700877
The results are the same except for the model statistic: asclogit uses a Wald test and mlogit
uses a likelihood-ratio test. If you prefer the likelihood-ratio test, you can fit the constant-only model
for asclogit followed by the full model and use [R] lrtest. The following example will carry this
out.
. use http://www.stata-press.com/data/r13/choice, clear
. asclogit choice, case(id) alternatives(car)
. estimates store null
. asclogit choice, case(id) alternatives(car) casevars(sex income)
. lrtest null .
Technical note
We force you to explicitly identify the case-specific variables in the casevars() option to ensure
that the program behaves as you expect. For example, an if or in qualifier may drop observations in
such a way that (what was expected to be) an alternative-specific variable turns into a case-specific
variable. Here you would probably want asclogit to terminate instead of interacting the variable with
the alternative indicators. This situation could also occur if asclogit drops cases, or observations
if you use the altwise option, because of missing values.
Stored results
asclogit stores the following in e():
Scalars
    e(N)               number of observations
    e(N_case)          number of cases
    e(k)               number of parameters
    e(k_alt)           number of alternatives
    e(k_indvars)       number of alternative-specific variables
    e(k_casevars)      number of case-specific variables
    e(k_eq)            number of equations in e(b)
    e(k_eq_model)      number of equations in overall model test
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(N_clust)         number of clusters
    e(const)           constant indicator
    e(i_base)          base alternative index
    e(chi2)            χ²
    e(F)               F statistic
    e(p)               significance
    e(alt_min)         minimum number of alternatives
    e(alt_avg)         average number of alternatives
    e(alt_max)         maximum number of alternatives
    e(rank)            rank of e(V)
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise
Macros
e(cmd) asclogit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(indvars) alternative-specific independent variable
e(casevars) case-specific variables
e(case) variable defining cases
e(altvar) variable defining alternatives
e(alteqs) alternative equation names
    e(alt#)            alternative labels
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
    e(chi2type)        Wald, type of model χ² test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
    e(ml_method)       type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(datasignature) the checksum
e(datasignaturevars) variables used in calculation of checksum
e(properties) b V
    e(estat_cmd)       program used to implement estat
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
    e(b)               coefficient vector
    e(stats)           alternative statistics
    e(altvals)         alternative values
    e(altfreq)         alternative frequencies
    e(alt_casevars)    indicators for estimated case-specific coefficients; e(k_alt) × e(k_casevars)
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(V)               variance–covariance matrix of the estimators
    e(V_modelbased)    model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
In this model, we have a set of unordered alternatives indexed by 1, 2, ..., J. Let y_ij, j = 1, ..., J,
be an indicator variable for the alternative actually chosen by the ith individual (case). That is, y_ij = 1
if individual i chose alternative j and y_ij = 0 otherwise. The independent variables come in two
forms: alternative specific and case specific. Alternative-specific variables vary among the alternatives
(as well as cases), and case-specific variables vary only among cases. Assume that we have p
alternative-specific variables so that for case i we have a J × p matrix, X_i. Further, assume that
we have q case-specific variables so that we have a 1 × q vector z_i for case i. The deterministic
component of the random-utility model can then be expressed as

        η_i = X_i β + (z_i A)′
            = X_i β + (z_i ⊗ I_J) vec(A′)
            = (X_i, z_i ⊗ I_J) (β′, vec(A′)′)′
            = X*_i β*

As before, β is a p × 1 vector of alternative-specific regression coefficients, and A = (α_1, ..., α_J)
is a q × J matrix of case-specific regression coefficients; remember that we must fix one of the α_j
to the constant vector to normalize the location. Here I_J is the J × J identity matrix, vec() is the
vector function that creates a vector from a matrix by placing each column of the matrix on top of
the other (see [M-5] vec( )), and ⊗ is the Kronecker product (see [M-2] op_kronecker).
We have rewritten the linear equation so that it is in a form that can be used by clogit, namely,
X*_i β*, where

        X*_i = (X_i, z_i ⊗ I_J)
        β*   = (β′, vec(A′)′)′
With this in mind, see Methods and formulas in [R] clogit for the computational details of the
conditional logit model.
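To make the construction of X*_i concrete, here is a small Mata sketch (not asclogit's internal code) for a case with J = 3 alternatives, p = 1 alternative-specific variable, and q = 2 case-specific variables; the numbers are the dealer, sex, and income values for the first case listed in example 1:
. mata:
: Xi = (18 \ 8 \ 5)            // J x p matrix of alternative-specific values
: zi = (1, 46.7)               // 1 x q vector of case-specific values
: Xstar = (Xi, zi # I(3))      // J x (p + qJ) matrix (X_i, z_i kron I_J)
: Xstar
: end
The coefficients multiplying the columns that correspond to the base alternative are the ones set to zero by the normalization α_k = 0, which is why only J - 1 sets of case-specific coefficients are reported.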
This command supports the clustered version of the Huber/White/sandwich estimator of the
variance using vce(robust) and vce(cluster clustvar). See [P] _robust, particularly Maximum
likelihood estimators and Methods and formulas. Specifying vce(robust) is equivalent to specifying
vce(cluster casevar), where casevar is the variable that identifies the cases.
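In terms of the running example, the following two commands therefore produce identical standard errors:
. asclogit choice dealer, case(id) alternatives(car) casevars(sex income) vce(robust)
. asclogit choice dealer, case(id) alternatives(car) casevars(sex income) vce(cluster id)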
References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
McFadden, D. L. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed.
P. Zarembka, 105–142. New York: Academic Press.
Also see
[R] asclogit postestimation — Postestimation tools for asclogit
[R] asmprobit — Alternative-specific multinomial probit regression
[R] asroprobit — Alternative-specific rank-ordered probit regression
[R] clogit — Conditional (fixed-effects) logistic regression
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[R] nlogit — Nested logit regression
[R] ologit — Ordered logistic regression
[U] 20 Estimation and postestimation commands
Title
asclogit postestimation — Postestimation tools for asclogit
Description Syntax for predict Menu for predict
Options for predict Syntax for estat Menu for estat
Options for estat mfx Remarks and examples Stored results
Methods and formulas Also see
Description
The following postestimation commands are of special interest after asclogit:
Commands Description
estat alternatives alternative summary statistics
estat mfx marginal effects
The following standard postestimation commands are also available:
Commands Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predicted probabilities, estimated linear predictor and its standard error
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
estat alternatives displays summary statistics about the alternatives in the estimation sample.
estat mfx computes probability marginal effects.
Syntax for predict
        predict [type] newvar [if] [in] [, statistic options]

        predict [type] {stub*|newvarlist} [if] [in], scores

 statistic         Description

 Main
   pr              probability that each alternative is chosen; the default
   xb              linear prediction
   stdp            standard error of the linear prediction

 options           Description

 Main
   k(#|observed)   condition on # alternatives per case or on observed number of alternatives
   altwise         use alternativewise deletion instead of casewise deletion when computing
                     probabilities
   nooffset        ignore the offset() variable specified in asclogit

 k(#|observed) may be used only with pr.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pr computes the probability of choosing each alternative conditioned on each case choosing k()
alternatives. This is the default statistic with default k(1); one alternative per case is chosen.
xb computes the linear prediction.
stdp computes the standard error of the linear prediction.
k(#|observed) conditions the probability on # alternatives per case or on the observed number of
alternatives. The default is k(1). This option may be used only with the pr option.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion. The xb and stdp options always
use alternativewise deletion.
nooffset is relevant only if you specified offset(varname) for asclogit. It modifies the
calculations made by predict so that they ignore the offset variable; the linear prediction is treated as
xβ rather than as xβ + offset.
scores calculates the scores for each coefficient in e(b). This option requires a new variable list of
length equal to the number of columns in e(b). Otherwise, use the stub* option to have predict
generate enumerated variables with prefix stub.
Syntax for estat
Alternative summary statistics
estat alternatives
Marginal effects
        estat mfx [if] [in] [, options]

 options                                Description

 Main
   varlist(varlist)                     display marginal effects for varlist
   at(mean [atlist] | median [atlist])  calculate marginal effects at these values
   k(#)                                 condition on the number of alternatives chosen to be #

 Options
   level(#)                             set confidence interval level; default is level(95)
   nodiscrete                           treat indicator variables as continuous
   noesample                            do not restrict calculation of means and medians to the
                                          estimation sample
   nowght                               ignore weights when calculating means and medians
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat mfx
 
Main
varlist(varlist) specifies the variables for which to display marginal effects. The default is all
variables.
at(mean [atlist] | median [atlist]) specifies the values at which the marginal effects are to be
calculated. atlist is

        alternative:variable = #   variable = #   alternative:offset = #   . . .
The default is to calculate the marginal effects at the means of the independent variables by using
the estimation sample, at(mean). If offset() is used during estimation, the means of the offsets
(by alternative) are computed by default.
After specifying the summary statistic, you can specify a series of specific values for variables.
You can specify values for alternative-specific variables by alternative, or you can specify one
value for all alternatives. You can specify only one value for case-specific variables. You specify
values for the offset() variable (if present) the same way as for alternative-specific variables. For
example, in the choice dataset (car choice), income is a case-specific variable, whereas dealer
is an alternative-specific variable. The following would be a legal syntax for estat mfx:
. estat mfx, at(mean American:dealer=18 income=40)
When nodiscrete is not specified, at(mean [atlist]) or at(median [atlist]) has no effect on
computing marginal effects for indicator variables, which are calculated as the discrete change in
the simulated probability as the indicator variable changes from 0 to 1.
The mean and median computations respect any if or in qualifiers, so you can restrict the data over
which the statistic is computed. You can even restrict the values to a specific case, for example,
. estat mfx if case==21
k(#) computes the probabilities conditioned on # alternatives chosen. The default is one alternative
chosen.
 
Options
level(#) sets the confidence level; default is level(95).
nodiscrete specifies that indicator variables be treated as continuous variables. An indicator variable
is one that takes on the value 0 or 1 in the estimation sample. By default, the discrete change in
the simulated probability is computed as the indicator variable changes from 0 to 1.
noesample specifies that the whole dataset be considered instead of only those marked in the
e(sample) defined by the asclogit command.
nowght specifies that weights be ignored when calculating the medians.
Remarks and examples
Remarks are presented under the following headings:
Predicted probabilities
Obtaining estimation statistics
Predicted probabilities
After fitting a McFadden’s choice model with alternative-specific conditional logistic regression,
you can use predict to obtain the estimated probability of alternative choices given case profiles.
Example 1
In example 1 of [R] asclogit, we fit a model of consumer choice of automobile. The alternatives are
nationality of the automobile manufacturer: American, Japanese, or European. There is one alternative-
specific variable in the model, dealer, which contains the number of dealerships of each nationality
in the consumer’s city. The case-specific variables are sex, the consumer’s sex, and income, the
consumer’s income in thousands of dollars.
. use http://www.stata-press.com/data/r13/choice
. asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
(output omitted )
. predict p
(option pr assumed; Pr(car))
. predict p2, k(2)
(option pr assumed; Pr(car))
. format p p2 %6.4f
. list car choice dealer sex income p p2 in 1/9, sepby(id)
car choice dealer sex income p p2
1. American 0 18 male 46.7 0.6025 0.8589
2. Japan 0 8 male 46.7 0.2112 0.5974
3. Europe 1 5 male 46.7 0.1863 0.5437
4. American 1 17 male 26.1 0.7651 0.9293
5. Japan 0 6 male 26.1 0.1282 0.5778
6. Europe 0 2 male 26.1 0.1067 0.4929
7. American 1 12 male 32.7 0.6519 0.8831
8. Japan 0 6 male 32.7 0.1902 0.5995
9. Europe 0 2 male 32.7 0.1579 0.5174
Obtaining estimation statistics
Here we will demonstrate the specialized estat subcommands after asclogit. Use estat
alternatives to obtain a table of alternative statistics. The table will contain the alternative values,
labels (if any), the number of cases in which each alternative is present, the frequency that the
alternative is selected, and the percent selected.
Use estat mfx to obtain marginal effects after asclogit.
Example 2
We will continue with the automobile choice example, where we first list the alternative statistics
and then compute the marginal effects at the mean income in our sample, assuming that there are
five automobile dealers for each nationality. We will evaluate the probabilities for females because
sex is coded 0 for females, and we will be obtaining the discrete change from 0 to 1.
. estat alternatives
Alternatives summary for car
Alternative Cases Frequency Percent
index value label present selected selected
1 1 American 295 192 65.08
2 2 Japan 295 64 21.69
3 3 Europe 295 39 13.22
. estat mfx, at(dealer=0 sex=0) varlist(sex income)
Pr(choice = American|1 selected) = .41964329
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
casevars
sex* .026238 .068311 0.38 0.701 -.107649 .160124 0
income -.007891 .002674 -2.95 0.003 -.013132 -.00265 42.097
(*) dp/dx is for discrete change of indicator variable from 0 to 1
Pr(choice = Japan|1 selected) = .42696187
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
casevars
sex* -.161164 .079238 -2.03 0.042 -.316468 -.005859 0
income .005861 .002997 1.96 0.051 -.000014 .011735 42.097
(*) dp/dx is for discrete change of indicator variable from 0 to 1
Pr(choice = Europe|1 selected) = .15339484
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
casevars
sex* .134926 .076556 1.76 0.078 -.015122 .284973 0
income .00203 .001785 1.14 0.255 -.001469 .00553 42.097
(*) dp/dx is for discrete change of indicator variable from 0 to 1
The marginal effect of income indicates that there is a lower chance for a consumer to buy American
automobiles with an increase in income. There is an indication that men have a higher preference
for European automobiles than women but a lower preference for Japanese automobiles. We did not
include the marginal effects for dealer because we view these as nuisance parameters, so we adjusted
the probabilities by fixing dealer to a constant, 0.
Stored results
estat mfx stores the following in r():
Scalars
r(pr_alt)   scalars containing the computed probability of each alternative evaluated at the value that is
labeled X in the table output. Here alt are the labels in the macro e(alteqs).
Matrices
r(alt)      matrices containing the computed marginal effects and associated statistics. There is one matrix
for each alternative, where alt are the labels in the macro e(alteqs). Column 1 of each
matrix contains the marginal effects; column 2, their standard errors; column 3, their z
statistics; and columns 4 and 5, the confidence intervals. Column 6 contains the values
of the independent variables used to compute the probabilities r(pr alt).
Methods and formulas
The deterministic component of the random-utility model can be expressed as

        η = Xβ + (zA)′
          = Xβ + (z ⊗ I_J) vec(A′)
          = (X, z ⊗ I_J) (β′, vec(A′)′)′
          = X* β*

where X is the J × p matrix containing the alternative-specific covariates, z is a 1 × q vector
of case-specific variables, β is a p × 1 vector of alternative-specific regression coefficients, and
A = (α_1, ..., α_J) is a q × J matrix of case-specific regression coefficients (with one of the α_j
fixed to the constant). Here I_J is the J × J identity matrix, vec() is the vector function that creates
a vector from a matrix by placing each column of the matrix on top of the other (see [M-5] vec( )),
and ⊗ is the Kronecker product (see [M-2] op_kronecker).
We have rewritten the linear equation so that it is in a form that we all recognize, namely, η = X* β*,
where

        X* = (X, z ⊗ I_J)
        β* = (β′, vec(A′)′)′

To compute the marginal effects, we use the derivative of the log likelihood ∂ℓ(y|η)/∂η, where
ℓ(y|η) = log Pr(y|η) is the log of the probability of the choice indicator vector y given the linear
predictor vector η. Namely,

        ∂Pr(y|η)/∂vec(X*)′ = Pr(y|η) ∂ℓ(y|η)/∂η′ · ∂η/∂vec(X*)′
                           = Pr(y|η) ∂ℓ(y|η)/∂η′ (β*′ ⊗ I_J)

The standard errors of the marginal effects are computed using the delta method.
Also see
[R] asclogit — Alternative-specific conditional logit (McFadden’s choice) model
[U] 20 Estimation and postestimation commands
Title
asmprobit — Alternative-specific multinomial probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
        asmprobit depvar [indepvars] [if] [in] [weight] , case(varname)
              alternatives(varname) [options]

 options                           Description

 Model
   case(varname)                   use varname to identify cases
   alternatives(varname)           use varname to identify the alternatives available for each case
   casevars(varlist)               case-specific variables
   constraints(constraints)        apply specified linear constraints
   collinear                       keep collinear variables

 Model 2
   correlation(correlation)        correlation structure of the latent-variable errors
   stddev(stddev)                  variance structure of the latent-variable errors
   structural                      use the structural covariance parameterization; default is the
                                     differenced covariance parameterization
   factor(#)                       use the factor covariance structure with dimension #
   noconstant                      suppress the alternative-specific constant terms
   basealternative(# | lbl | str)    alternative used for normalizing location
   scalealternative(# | lbl | str)   alternative used for normalizing scale
   altwise                         use alternativewise deletion instead of casewise deletion

 SE/Robust
   vce(vcetype)                    vcetype may be oim, robust, cluster clustvar, opg,
                                     bootstrap, or jackknife

 Reporting
   level(#)                        set confidence level; default is level(95)
   notransform                     do not transform variance–covariance estimates to the standard
                                     deviation and correlation metric
   nocnsreport                     do not display constraints
   display_options                 control column formats and line width
 Integration
   intmethod(seqtype)              type of quasi- or pseudouniform point set
   intpoints(#)                    number of points in each sequence
   intburn(#)                      starting index in the Hammersley or Halton sequence
   intseed(code | #)               pseudouniform random-number seed
   antithetics                     use antithetic draws
   nopivot                         do not use integration interval pivoting
   initbhhh(#)                     use the BHHH optimization algorithm for the first # iterations
   favor(speed | space)            favor speed or space when generating integration points

 Maximization
   maximize_options                control the maximization process

   coeflegend                      display legend instead of statistics
correlation Description
unstructured one correlation parameter for each pair of alternatives; correlations
with the basealternative() are zero; the default
exchangeable one correlation parameter common to all pairs of alternatives;
correlations with the basealternative() are zero
independent constrain all correlation parameters to zero
pattern matname user-specified matrix identifying the correlation pattern
fixed matname user-specified matrix identifying the fixed and free correlation
parameters
stddev Description
heteroskedastic estimate standard deviation for each alternative; standard deviations
for basealternative() and scalealternative() set to one
homoskedastic all standard deviations are one
pattern matname user-specified matrix identifying the standard deviation pattern
fixed matname user-specified matrix identifying the fixed and free standard
deviations
seqtype Description
hammersley Hammersley point set
halton Halton point set
random uniform pseudorandom point set
case(varname)and alternatives(varname)are required.
bootstrap,by,jackknife,statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
asmprobit — Alternative-specific multinomial probit regression 103
Menu
Statistics >Categorical outcomes >Alternative-specific multinomial probit
Description
asmprobit fits multinomial probit (MNP) models by using maximum simulated likelihood (MSL)
implemented by the GewekeHajivassiliouKeane (GHK) algorithm. By estimating the variance
covariance parameters of the latent-variable errors, the model allows you to relax the independence
of irrelevant alternatives (IIA) property that is characteristic of the multinomial logistic model.
asmprobit requires multiple observations for each case (decision), where each observation rep-
resents an alternative that may be chosen. The cases are identified by the variable specified in the
case() option, whereas the alternatives are identified by the variable specified in the alternative()
option. The outcome (chosen alternative) is identified by a value of 1 in depvar, with 0 indicating
the alternatives that were not chosen; only one alternative may be chosen for each case.
asmprobit allows two types of independent variables: alternative-specific variables and case-
specific variables. Alternative-specific variables vary across both cases and alternatives and are specified
in indepvars. Case-specific variables vary only across cases and are specified in the casevars()
option.
Options
 
Model
case(varname)specifies the variable that identifies each case. This variable identifies the individuals
or entities making a choice. case() is required.
alternatives(varname)specifies the variable that identifies the alternatives for each case. The
number of alternatives can vary with each case; the maximum number of alternatives is 20.
alternatives() is required.
casevars(varlist)specifies the case-specific variables that are constant for each case(). If there are
a maximum of Jalternatives, there will be J1 sets of coefficients associated with casevars().
constraints(constraints),collinear; see [R]estimation options.
 
Model 2
correlation(correlation)specifies the correlation structure of the latent-variable errors.
correlation(unstructured) is the most general and has J(J3)/2+1 unique correlation
parameters. This is the default unless stdev() or structural are specified.
correlation(exchangeable) provides for one correlation coefficient common to all latent
variables, except the latent variable associated with the basealternative() option.
correlation(independent) assumes that all correlations are zero.
correlation(pattern matname)and correlation(fixed matname)give you more flexibil-
ity in defining the correlation structure. See Variance structures later in this entry for more
information.
stddev(stddev)specifies the variance structure of the latent-variable errors.
104 asmprobit — Alternative-specific multinomial probit regression
stddev(heteroskedastic) is the most general and has J2 estimable parameters. The standard
deviations of the latent-variable errors for the alternatives specified in basealternative()
and scalealternative() are fixed to one.
stddev(homoskedastic) constrains all the standard deviations to equal one.
stddev(pattern matname)and stddev(fixed matname)give you added flexibility in defining
the standard deviation parameters. See Variance structures later in this entry for more information.
structural requests the J×Jstructural covariance parameterization instead of the default J1×J1
differenced covariance parameterization (the covariance of the latent errors differenced with that
of the base alternative). The differenced covariance parameterization will achieve the same MSL
regardless of the choice of basealternative() and scalealternative(). On the other hand,
the structural covariance parameterization imposes more normalizations that may bound the model
away from its maximum likelihood and thus prevent convergence with some datasets or choices
of basealternative() and scalealternative().
factor(#)requests that the factor covariance structure of dimension #be used. The factor() option
can be used with the structural option but cannot be used with stddev() or correlation().
A#×J(or #×J1)matrix, C, is used to factor the covariance matrix as I+C0C, where
Iis the identity matrix of dimension J(or J1). The column dimension of Cdepends on
whether the covariance is structural or differenced. The row dimension of C,#, must be less than
or equal to floor((J(J1)/21)/(J2)), because there are only J(J1)/21 identifiable
variancecovariance parameters. This covariance parameterization may be useful for reducing the
number of covariance parameters that need to be estimated.
If the covariance is structural, the column of Ccorresponding to the base alternative contains zeros.
The column corresponding to the scale alternative has a one in the first row and zeros elsewhere.
If the covariance is differenced, the column corresponding to the scale alternative (differenced with
the base) has a one in the first row and zeros elsewhere.
noconstant suppresses the J1 alternative-specific constant terms.
basealternative(#|lbl |str)specifies the alternative used to normalize the latent-variable location
(also referred to as the level of utility). The base alternative may be specified as a number, label,
or string. The standard deviation for the latent-variable error associated with the base alternative
is fixed to one, and its correlations with all other latent-variable errors are set to zero. The default
is the first alternative when sorted. If a fixed or pattern matrix is given in the stddev()
and correlation() options, the basealternative() will be implied by the fixed standard
deviations and correlations in the matrix specifications. basealternative() cannot be equal to
scalealternative().
scalealternative(#|lbl |str)specifies the alternative used to normalize the latent-variable scale
(also referred to as the scale of utility). The scale alternative may be specified as a number,
label, or string. The default is to use the second alternative when sorted. If a fixed or pattern
matrix is given in the stddev() option, the scalealternative() will be implied by the
fixed standard deviations in the matrix specification. scalealternative() cannot be equal to
basealternative().
If a fixed or pattern matrix is given for the stddev() option, the base alternative and scale
alternative are implied by the standard deviations and correlations in the matrix specifications, and
they need not be specified in the basealternative() and scalealternative() options.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion; that is, the entire group of
observations making up a case is deleted if any missing values are encountered. This option does
not apply to observations that are marked out by the if or in qualifier or the by prefix.
asmprobit — Alternative-specific multinomial probit regression 105
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
If specifying vce(bootstrap) or vce(jackknife), you must also specify basealternative()
and scalealternative().
 
Reporting
level(#); see [R]estimation options.
notransform prevents retransforming the Cholesky-factored variancecovariance estimates to the
correlation and standard deviation metric.
This option has no effect if structural is not specified because the default differenced variance
covariance estimates have no interesting interpretation as correlations and standard deviations.
notransform also has no effect if the correlation() and stddev() options are specified with
anything other than their default values. Here it is generally not possible to factor the variance
covariance matrix, so optimization is already performed using the standard deviation and correlation
representations.
nocnsreport; see [R]estimation options.
display options:cformat(%fmt),pformat(% fmt),sformat(% fmt), and nolstretch; see [R]es-
timation options.
 
Integration
intmethod(hammersley |halton |random) specifies the method of generating the point sets used in
the quasiMonte Carlo integration of the multivariate normal density. intmethod(hammersley),
the default, uses the Hammersley sequence; intmethod(halton) uses the Halton sequence; and
intmethod(random) uses a sequence of uniform random numbers.
intpoints(#)specifies the number of points to use in the quasiMonte Carlo integration. If
this option is not specified, the number of points is 50 ×Jif intmethod(hammersley) or
intmethod(halton) is used and 100 ×Jif intmethod(random) is used. Larger values of
intpoints() provide better approximations of the log likelihood, but at the cost of added
computation time.
intburn(#)specifies where in the Hammersley or Halton sequence to start, which helps reduce the
correlation between the sequences of each dimension. The default is 0. This option may not be
specified with intmethod(random).
intseed(code |#)specifies the seed to use for generating the uniform pseudorandom sequence. This
option may be specified only with intmethod(random).code refers to a string that records the
state of the random-number generator runiform(); see [R]set seed. An integer value #may
be used also. The default is to use the current seed value from Stata’s uniform random-number
generator, which can be obtained from c(seed).
antithetics specifies that antithetic draws be used. The antithetic draw for the J1 vector
uniform-random variables, x, is 1 x.
nopivot turns off integration interval pivoting. By default, asmprobit will pivot the wider intervals
of integration to the interior of the multivariate integration. This improves the accuracy of the
quadrature estimate. However, discontinuities may result in the computation of numerical second-
order derivatives using finite differencing (for the NewtonRaphson optimize technique, tech(nr))
106 asmprobit — Alternative-specific multinomial probit regression
when few simulation points are used, resulting in a nonpositive-definite Hessian. asmprobit
uses the BroydenFletcherGoldfarbShanno optimization algorithm, by default, which does not
require computing the Hessian numerically using finite differencing.
initbhhh(#)specifies that the BerndtHallHallHausman (BHHH) algorithm be used for the initial
#optimization steps. This option is the only way to use the BHHH algorithm along with other
optimization techniques. The algorithm switching feature of mls technique() option cannot
include bhhh.
favor(speed |space) instructs asmprobit to favor either speed or space when generating the
integration points. favor(speed) is the default. When favoring speed, the integration points are
generated once and stored in memory, thus increasing the speed of evaluating the likelihood. This
speed increase can be seen when there are many cases or when the user specifies a large number
of integration points, intpoints(#). When favoring space, the integration points are generated
repeatedly with each likelihood evaluation.
For unbalanced data, where the number of alternatives varies with each case, the estimates computed
using intmethod(random) will vary slightly between favor(speed) and favor(space). This
is because the uniform sequences will not be identical, even when initiating the sequences using the
same uniform seed, intseed(code |#). For favor(speed),ncase blocks of intpoints(#)×
J2 uniform points are generated, where Jis the maximum number of alternatives. For
favor(space), the column dimension of the matrices of points varies with the number of
alternatives that each case has.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize.
The following options may be particularly useful in obtaining convergence with asmprobit:
difficult,technique(algorithm spec),nrtolerance(#),nonrtolerance, and
from(init specs).
If technique() contains more than one algorithm specification, bhhh cannot be one of them. To
use the BHHH algorithm with another algorithm, use the initbhhh() option and specify the other
algorithm in technique().
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with asmprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Variance structures
asmprobit — Alternative-specific multinomial probit regression 107
Introduction
The MNP model is used with discrete dependent variables that take on more than two outcomes
that do not have a natural ordering. The stochastic error terms are assumed to have a multivariate
normal distribution that is heteroskedastic and correlated. Say that you have a set of Junordered
alternatives that are modeled by a regression of both case-specific and alternative-specific covariates.
A “case” refers to the information on one decision maker. Underlying the model is the set of Jlatent
variables (utilities),
ηij =xij β+ziαj+ξij (1)
where idenotes cases and jdenotes alternatives. xij is a 1 ×pvector of alternative-specific variables,
βis a p×1 vector of parameters, ziis a 1×qvector of case-specific variables, αjis a q×1vector
of parameters for the jth alternative, and ξi= (ξi1, . . . , ξiJ )is distributed multivariate normal with
mean zero and covariance matrix . The decision maker selects the alternative whose latent variable
is highest.
Because the MNP model allows for a general covariance structure in ξij , it does not impose the
IIA property inherent in multinomial logistic and conditional logistic models. That is, the MNP model
permits the odds of choosing one alternative over another to depend on the remaining alternatives. For
example, consider the choice of travel mode between two cities: air, train, bus, or car, as a function
of the travel mode cost, travel time (alternative-specific variables), and an individual’s income (a
case-specific variable). The odds of choosing air travel over a bus may not be independent of the train
alternative because both bus and train travel are public ground transportation. That is, the probability
of choosing air travel is Pr(ηair > ηbus, ηair > ηtrain, ηair > ηcar), and the two events ηair > ηbus
and ηair > ηtrain may be correlated.
An alternative to MNP that will allow a nested correlation structure in ξij is the nested logit model
(see [R]nlogit).
The added flexibility of the MNP model does impose a significant computation burden because of
the need to evaluate probabilities from the multivariate normal distribution. These probabilities are
evaluated using simulation techniques because a closed-form solution does not exist. See Methods
and formulas for more information.
Not all the Jsets of regression coefficients αjare identifiable, nor are all J(J+1)/2 elements
of the variancecovariance matrix . As described by Train (2009, sec. 2.5), the model requires
normalization because both the location (level) and scale of the latent variable are irrelevant. Increasing
the latent variables by a constant does not change which ηij is the maximum for decision maker i,
nor does multiplying them by a constant. To normalize location, we choose an alternative, indexed
by k, say, and take the difference between the latent variable kand the J1 others,
vijk =ηij ηik
= (xij xik)β+zi(αjαk) + ξij ξik
=δij0β+ziγj0+ij0
=λij0+ij0
(2)
where j0=jif j < k and j0=j1if j > k, so that j0= 1, . . . , J 1. One can now work with
the (J1)×(J1)covariance matrix Σ(k)for 0
i= (i1, . . . , i,J1). The kth alternative here
is the basealternative() in asmprobit. From (2), the probability that decision maker ichooses
alternative k, for example, is
Pr(ichooses k) = Pr(vi1k0, . . . , vi,J1,k 0)
= Pr(i1≤ −λi1, . . . , i,J1≤ −λi,J1)
108 asmprobit — Alternative-specific multinomial probit regression
To normalize for scale, one of the diagonal elements of Σ(k)must be fixed to a constant. In
asmprobit, this is the error variance for the alternative specified by scalealternative(). Thus
there are a total of, at most, J(J1)/21 identifiable variance–covariance parameters. See Variance
structures below for more on this issue.
In fact, the model is slightly more general in that not all cases need to have faced all Jalternatives.
The model allows for situations in which some cases chose among all possible alternatives, whereas
other cases were given a choice among a subset of them, and perhaps other cases were given a
choice among a different subset. The number of observations for each case is equal to the number
of alternatives faced.
The MNP model is often motivated using a random-utility consumer-choice framework. Equation
(1) represents the utility that consumer ireceives from good j. The consumer purchases the good for
which the utility is highest. Because utility is ordinal, all that matters is the ranking of the utilities
from the alternatives. Thus one must normalize for location and scale.
Example 1
Application of MNP models is common in the analysis of transportation data. Greene (2012,
sec. 18.2.9) uses travel-mode choice data between Sydney and Melbourne to demonstrate estimating
parameters of various discrete-choice models. The data contain information on 210 individuals’
choices of travel mode. The four alternatives are air, train, bus, and car, with indices 1, 2, 3, and 4,
respectively. One alternative-specific variable is travelcost, a measure of generalized cost of travel
that is equal to the sum of in-vehicle cost and a wagelike measure times the amount of time spent
traveling. A second alternative-specific variable is the terminal time, termtime, which is zero for car
transportation. Household income, income, is a case-specific variable.
. use http://www.stata-press.com/data/r13/travel
. list id mode choice travelcost termtime income in 1/12, sepby(id)
id mode choice travel~t termtime income
1. 1 air 0 70 69 35
2. 1 train 0 71 34 35
3. 1 bus 0 70 35 35
4. 1 car 1 30 0 35
5. 2 air 0 68 64 30
6. 2 train 0 84 44 30
7. 2 bus 0 85 53 30
8. 2 car 1 50 0 30
9. 3 air 0 129 69 40
10. 3 train 0 195 34 40
11. 3 bus 0 149 35 40
12. 3 car 1 101 0 40
The model of travel choice is
ηij =β1travelcostij +β2termtimeij +α1jincomei+α0j+ξij
The alternatives can be grouped as air and ground travel. With this in mind, we set the air alternative
to be the basealternative() and choose train as the scaling alternative. Because these are the
first and second alternatives in the mode variable, they are also the defaults.
asmprobit — Alternative-specific multinomial probit regression 109
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income)
(output omitted )
Alternative-specific multinomial probit Number of obs = 840
Case variable: id Number of cases = 210
Alternative variable: mode Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(5) = 32.05
Log simulated-likelihood = -190.09418 Prob > chi2 = 0.0000
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
mode
travelcost -.00977 .0027834 -3.51 0.000 -.0152253 -.0043146
termtime -.0377095 .0094088 -4.01 0.000 -.0561504 -.0192686
air (base alternative)
train
income -.0291971 .0089246 -3.27 0.001 -.046689 -.0117052
_cons .5616376 .3946551 1.42 0.155 -.2118721 1.335147
bus
income -.0127503 .0079267 -1.61 0.108 -.0282863 .0027857
_cons -.0571364 .4791861 -0.12 0.905 -.9963239 .882051
car
income -.0049086 .0077486 -0.63 0.526 -.0200957 .0102784
_cons -1.833393 .8186156 -2.24 0.025 -3.43785 -.2289357
/lnl2_2 -.5502039 .3905204 -1.41 0.159 -1.31561 .2152021
/lnl3_3 -.6005552 .3353292 -1.79 0.073 -1.257788 .0566779
/l2_1 1.131518 .2124817 5.33 0.000 .7150612 1.547974
/l3_1 .9720669 .2352116 4.13 0.000 .5110606 1.433073
/l3_2 .5197214 .2861552 1.82 0.069 -.0411325 1.080575
(mode=air is the alternative normalizing location)
(mode=train is the alternative normalizing scale)
. estimates store full
By default, the differenced covariance parameterization is used, so the covariance matrix for this
model is 3 ×3. There are two free variances to estimate and three correlations. To help ensure that the
covariance matrix remains positive definite, asmprobit uses the square root transformation, where it
optimizes on the Cholesky-factored variancecovariance. To ensure that the diagonal elements of the
Cholesky estimates remain positive, we use the log transformation. The estimates labeled /lnl2 2
and /lnl3 3 in the coefficient table are the log-transformed diagonal elements of the Cholesky
matrix. The estimates labeled /l2 1,/l3 1, and /l3 2 are the off-diagonal entries for elements
(2,1),(3,1), and (3,2)of the Cholesky matrix.
Although the transformed parameters of the differenced covariance parameterization are difficult
to interpret, you can view them untransformed by using the estat command. Typing
110 asmprobit — Alternative-specific multinomial probit regression
. estat correlation
train bus car
train 1.0000
bus 0.8909 1.0000
car 0.7895 0.8951 1.0000
Note: correlations are for alternatives differenced with air
gives the correlations, and typing
. estat covariance
train bus car
train 2
bus 1.600208 1.613068
car 1.37471 1.399703 1.515884
Note: covariances are for alternatives differenced with air
gives the (co)variances.
We can reduce the number of covariance parameters in the model by using the factor model by
Cameron and Trivedi (2005). For large models with many alternatives, the parameter reduction can
be dramatic, but for our example we will use factor(1), a one-dimension factor model, to reduce
by 3 the number of parameters associated with the covariance matrix.
asmprobit — Alternative-specific multinomial probit regression 111
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income) factor(1)
(output omitted )
Alternative-specific multinomial probit Number of obs = 840
Case variable: id Number of cases = 210
Alternative variable: mode Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(5) = 107.85
Log simulated-likelihood = -196.85094 Prob > chi2 = 0.0000
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
mode
travelcost -.0093696 .0036329 -2.58 0.010 -.01649 -.0022492
termtime -.0593173 .0064585 -9.18 0.000 -.0719757 -.0466589
air (base alternative)
train
income -.0373511 .0098219 -3.80 0.000 -.0566018 -.0181004
_cons .1092322 .3949529 0.28 0.782 -.6648613 .8833257
bus
income -.0158793 .0112239 -1.41 0.157 -.0378777 .0061191
_cons -1.082181 .4678732 -2.31 0.021 -1.999196 -.1651666
car
income .0042677 .0092601 0.46 0.645 -.0138817 .0224171
_cons -3.765445 .5540636 -6.80 0.000 -4.851389 -2.6795
/c1_2 1.182805 .3060299 3.86 0.000 .5829972 1.782612
/c1_3 1.227705 .3401237 3.61 0.000 .5610747 1.894335
(mode=air is the alternative normalizing location)
(mode=train is the alternative normalizing scale)
The estimates labeled /c1 2 and /c1 3 in the coefficient table are the factor loadings. These factor
loadings produce the following differenced covariance estimates:
. estat covariance
train bus car
train 2
bus 1.182805 2.399027
car 1.227705 1.452135 2.507259
Note: covariances are for alternatives differenced with air
Variance structures
The matrix has J(J+1)/2 distinct elements because it is symmetric. Selecting a base alternative,
normalizing its error variance to one, and constraining the correlations between its error and the other
errors reduces the number of estimable parameters by J. Moreover, selecting a scale alternative and
normalizing its error variance to one reduces the number by one, as well. Hence, there are at most
m=J(J1)/21 estimable parameters in .
112 asmprobit — Alternative-specific multinomial probit regression
In practice, estimating all mparameters can be difficult, so one must often place more restrictions on
the parameters. The asmprobit command provides the correlation() option to specify restrictions
on the J(J3)/2+1 correlation parameters not already restricted as a result of choosing the base
alternatives, and it provides stddev() to specify restrictions on the J2 standard deviations not
already restricted as a result of choosing the base and scale alternatives.
When the structural option is used, asmprobit fits the model by assuming that all m
parameters can be estimated, which is equivalent to specifying correlation(unstructured) and
stddev(heteroskedastic). The unstructured correlation structure means that all J(J3)/2+1
of the remaining correlation parameters will be estimated, and the heteroskedastic specification means
that all J2 standard deviations will be estimated. With these default settings, the log likelihood is
maximized with respect to the Cholesky decomposition of , and then the parameters are transformed
to the standard deviation and correlation form.
The correlation(exchangeable) option forces the J(J3)/2+1 correlation parameters
to be equal, and correlation(independent) forces all the correlations to be zero. Using the
stddev(homoskedastic) option forces all Jstandard deviations to be one. These options may help
in obtaining convergence for a model if the default options do not produce satisfactory results. In
fact, when fitting a complex model, it may be advantageous to first fit a simple one and then proceed
with removing the restrictions one at a time.
Advanced users may wish to specify alternative variance structures of their own choosing, and the
next few paragraphs explain how to do so.
correlation(pattern matname)allows you to give the name of a J×Jmatrix that identifies
a correlation structure. Sequential positive integers starting at 1 are used to identify each correlation
parameter: if there are three correlation parameters, they are identified by 1, 2, and 3. The integers
can be repeated to indicate that correlations with the same number should be constrained to be equal.
A zero or a missing value (.) indicates that the correlation is to be set to zero. asmprobit considers
only the elements of the matrix below the main diagonal.
Suppose that you have a model with four alternatives, numbered 14, and alternative 1 is the
base. The unstructured and exchangeable correlation structures identified in the 4 ×4 lower triangular
matrices are unstructured exchangeable
1234
1·
2 0 ·
3 0 1 ·
4 0 2 3 ·
1234
1·
2 0 ·
3 0 1 ·
4 0 1 1 ·
asmprobit labels these correlation structures unstructured and exchangeable, even though the correla-
tions corresponding to the base alternative are set to zero. More formally: these terms are appropriate
when considering the (J1)×(J1)submatrix Σ(k)defined in the Introduction above.
You can also use the correlation(fixed matname)option to specify a matrix that specifies
fixed and free parameters. Here the free parameters (those that are to be estimated) are identified by
a missing value, and nonmissing values represent correlations that are to be taken as given. Below
is a correlation structure that would set the correlations of alternative 1 to be 0.5:
1 2 3 4
1·
2 0.5 ·
3 0.5 · ·
4 0.5 ···
asmprobit — Alternative-specific multinomial probit regression 113
The order of the elements of the pattern or fixed matrices must be the same as the numeric
order of the alternative levels.
To specify the structure of the standard deviationsthe diagonal elements of you can use the
stddev(pattern matname)option, where matname is a 1 ×Jmatrix. Sequential positive integers
starting at 1 are used to identify each standard deviation parameter. The integers can be repeated to
indicate that standard deviations with the same number are to be constrained to be equal. A missing
value indicates that the corresponding standard deviation is to be set to one. In the four-alternative
example mentioned above, suppose that you wish to set the first and second standard deviations to
one and that you wish to constrain the third and fourth standard deviations to be equal; the following
pattern matrix will do that:
(
1234
1· · 1 1 )
Using the stddev(fixed matname)option allows you to identify the fixed and free standard
deviations. Fixed standard deviations are entered as positive real numbers, and free parameters are
identified with missing values. For example, to constrain the first and second standard deviations to
equal one and to allow the third and fourth to be estimated, you would use this fixed matrix:
(
1234
1 1 1 · · )
When supplying either the pattern or the fixed matrices, you must ensure that the model is
properly scaled. At least two standard deviations must be constant for the model to be scaled. A
warning is issued if asmprobit detects that the model is not scaled.
The order of the elements of the pattern or fixed matrices must be the same as the numeric
order of the alternative levels.
Example 2
In example 1, we used the differenced covariance parameterization, the default. We now use
the structural option to view the J2 standard deviation estimates and the (J1)(J2)/2
correlation estimates. Here we will fix the standard deviations for the air and train alternatives to
1 and the correlations between air and the rest of the alternatives to 0.
114 asmprobit — Alternative-specific multinomial probit regression
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income) structural
(output omitted )
Alternative-specific multinomial probit Number of obs = 840
Case variable: id Number of cases = 210
Alternative variable: mode Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(5) = 32.05
Log simulated-likelihood = -190.09418 Prob > chi2 = 0.0000
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
mode
travelcost -.0097703 .0027834 -3.51 0.000 -.0152257 -.0043149
termtime -.0377103 .0094092 -4.01 0.000 -.056152 -.0192687
air (base alternative)
train
income -.0291975 .0089246 -3.27 0.001 -.0466895 -.0117055
_cons .5616448 .3946529 1.42 0.155 -.2118607 1.33515
bus
income -.01275 .0079266 -1.61 0.108 -.0282858 .0027858
_cons -.0571664 .4791996 -0.12 0.905 -.9963803 .8820476
car
income -.0049085 .0077486 -0.63 0.526 -.0200955 .0102785
_cons -1.833444 .8186343 -2.24 0.025 -3.437938 -.22895
/lnsigma3 -.2447428 .4953363 -0.49 0.621 -1.215584 .7260985
/lnsigma4 -.3309429 .6494493 -0.51 0.610 -1.60384 .9419543
/atanhr3_2 1.01193 .3890994 2.60 0.009 .249309 1.774551
/atanhr4_2 .5786576 .3940461 1.47 0.142 -.1936586 1.350974
/atanhr4_3 .8885204 .5600561 1.59 0.113 -.2091693 1.98621
sigma1 1 (base alternative)
sigma2 1 (scale alternative)
sigma3 .7829059 .3878017 .2965368 2.067
sigma4 .7182462 .4664645 .2011227 2.564989
rho3_2 .766559 .1604596 .244269 .9441061
rho4_2 .5216891 .2868027 -.1912734 .874283
rho4_3 .7106622 .277205 -.2061713 .9630403
(mode=air is the alternative normalizing location)
(mode=train is the alternative normalizing scale)
When comparing this output to that of example 1, we see that we have achieved the same log
likelihood. That is, the structural parameterization using air as the base alternative and train as
the scale alternative applied no restrictions on the model. This will not always be the case. We leave
it up to you to try different base and scale alternatives, and you will see that not all the different
combinations will achieve the same log likelihood. This is not true for the differenced covariance
parameterization: it will always achieve the same log likelihood (and the maximum possible likelihood)
regardless of the base and scale alternatives. This is why it is the default parameterization.
asmprobit — Alternative-specific multinomial probit regression 115
For an exercise, we can compute the differenced covariance displayed in example 1 by using the
following ado-code.
. estat covariance
air train bus car
air 1
train 0 1
bus 0 .6001436 .6129416
car 0 .3747012 .399619 .5158776
. return list
matrices:
r(cov) : 4 x 4
. matrix cov = r(cov)
. matrix M = (1,-1,0,0 \ 1,0,-1,0 \ 1,0,0,-1)
. matrix cov1 = M*cov*M’
. matrix list cov1
symmetric cov1[3,3]
r1 r2 r3
r1 2
r2 1.6001436 1.6129416
r3 1.3747012 1.399619 1.5158776
The slight difference in the regression coefficients between the example 1 and example 2 coefficient
tables reflects the accuracy of the [M-5]ghk( ) algorithm using 200 points from the Hammersley
sequence.
We now fit the model using the exchangeable correlation matrix and compare the models with a
likelihood-ratio test.
116 asmprobit — Alternative-specific multinomial probit regression
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income) correlation(exchangeable)
(output omitted )
Alternative-specific multinomial probit Number of obs = 840
Case variable: id Number of cases = 210
Alternative variable: mode Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(5) = 53.60
Log simulated-likelihood = -190.4679 Prob > chi2 = 0.0000
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
mode
travelcost -.0084636 .0020452 -4.14 0.000 -.012472 -.0044551
termtime -.0345394 .0072812 -4.74 0.000 -.0488103 -.0202684
air (base alternative)
train
income -.0290357 .0083226 -3.49 0.000 -.0453477 -.0127237
_cons .5517445 .3719913 1.48 0.138 -.177345 1.280834
bus
income -.0132562 .0074133 -1.79 0.074 -.0277859 .0012735
_cons -.0052517 .4337932 -0.01 0.990 -.8554708 .8449673
car
income -.0060878 .006638 -0.92 0.359 -.0190981 .0069224
_cons -1.565918 .6633007 -2.36 0.018 -2.865964 -.265873
/lnsigmaP1 -.3557589 .1972809 -1.80 0.071 -.7424222 .0309045
/lnsigmaP2 -1.308596 .8872957 -1.47 0.140 -3.047663 .4304719
/atanhrP1 1.116589 .3765488 2.97 0.003 .3785667 1.854611
sigma1 1 (base alternative)
sigma2 1 (scale alternative)
sigma3 .7006416 .1382232 .4759596 1.031387
sigma4 .2701992 .2397466 .0474697 1.537983
rho3_2 .8063791 .131699 .3614621 .9521783
rho4_2 .8063791 .131699 .3614621 .9521783
rho4_3 .8063791 .131699 .3614621 .9521783
(mode=air is the alternative normalizing location)
(mode=train is the alternative normalizing scale)
. lrtest full .
Likelihood-ratio test LR chi2(2) = 0.75
(Assumption: . nested in full) Prob > chi2 = 0.6882
The likelihood-ratio test suggests that a common correlation is a plausible hypothesis, but this could
be an artifact of the small sample size. The labeling of the standard deviation and correlation estimates
has changed from /lnsigma and /atanhr, in the previous example, to /lnsigmaP and /atanhrP.
The “P” identifies the parameter’s index in the pattern matrices used by asmprobit. The pattern
matrices are stored in e(stdpattern) and e(corpattern).
asmprobit — Alternative-specific multinomial probit regression 117
Technical note
Another way to fit the model with the exchangeable correlation structure in example 2 is to use
the constraint command to define the constraints on the rho parameters manually and then apply
those.
. constraint 1 [atanhr3_2]_cons = [atanhr4_2]_cons
. constraint 2 [atanhr3_2]_cons = [atanhr4_3]_cons
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income) constraints(1 2) structural
With this method, however, we must keep track of what parameterization of the rhos is used in
estimation, and that depends on the options specified.
Example 3
In the last example, we used the correlation(exchangeable) option, reducing the number
of correlation parameters from three to one. We can explore a twocorrelation parameter model
by specifying a pattern matrix in the correlation() option. Suppose that we wish to have the
correlation between train and bus be equal to the correlation between bus and car and to have the
standard deviations for the bus and car equations be equal. We will use air as the base category and
train as the scale category.
118 asmprobit — Alternative-specific multinomial probit regression
. matrix define corpat = J(4, 4, .)
. matrix corpat[3,2] = 1
. matrix corpat[4,3] = 1
. matrix corpat[4,2] = 2
. matrix define stdpat = J(1, 4, .)
. matrix stdpat[1,3] = 1
. matrix stdpat[1,4] = 1
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income) correlation(pattern corpat) stddev(pattern stdpat)
(output omitted )
Alternative-specific multinomial probit Number of obs = 840
Case variable: id Number of cases = 210
Alternative variable: mode Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(5) = 41.67
Log simulated-likelihood = -190.12871 Prob > chi2 = 0.0000
choice Coef. Std. Err. z P>|z| [95% Conf. Interval]
mode
travelcost -.0100335 .0026203 -3.83 0.000 -.0151692 -.0048979
termtime -.0385731 .008608 -4.48 0.000 -.0554445 -.0217018
air (base alternative)
train
income -.029271 .0089739 -3.26 0.001 -.0468595 -.0116824
_cons .56528 .4008037 1.41 0.158 -.2202809 1.350841
bus
income -.0124658 .0080043 -1.56 0.119 -.0281539 .0032223
_cons -.0741685 .4763422 -0.16 0.876 -1.007782 .859445
car
income -.0046905 .0079934 -0.59 0.557 -.0203573 .0109763
_cons -1.897931 .7912106 -2.40 0.016 -3.448675 -.3471867
/lnsigmaP1 -.197697 .2751269 -0.72 0.472 -.7369359 .3415418
/atanhrP1 .9704403 .3286981 2.95 0.003 .3262038 1.614677
/atanhrP2 .5830923 .3690419 1.58 0.114 -.1402165 1.306401
sigma1 1 (base alternative)
sigma2 1 (scale alternative)
sigma3 .8206185 .2257742 .4785781 1.407115
sigma4 .8206185 .2257742 .4785781 1.407115
rho3_2 .7488977 .1443485 .3151056 .9238482
rho4_2 .5249094 .2673598 -.1393048 .863362
rho4_3 .7488977 .1443485 .3151056 .9238482
(mode=air is the alternative normalizing location)
(mode=train is the alternative normalizing scale)
In the call to asmprobit, we did not need to specify the basealternative() and scalealter-
native() options because they are implied by the specifications of the pattern matrices.
asmprobit — Alternative-specific multinomial probit regression 119
Technical note
If you experience convergence problems, try specifying nopivot, increasing intpoints(),
specifying antithetics, specifying technique(nr) with difficult, or specifying a switching
algorithm in the technique() option. As a last resort, you can use the nrtolerance() and
showtolerance options. Changing the base and scale alternative in the model specification can also
affect convergence if the structural option is used.
Because simulation methods are used to obtain multivariate normal probabilities, the estimates
obtained have a limited degree of precision. Moreover, the solutions are particularly sensitive to the
starting values used. Experimenting with different starting values may help in obtaining convergence,
and doing so is a good way to verify previous results.
If you wish to use the BHHH algorithm along with another maximization algorithm, you must
specify the initbhhh(#)option, where #is the number of BHHH iterations to use before switching
to the algorithm specified in technique(). The BHHH algorithm uses an outer-product-of-gradients
approximation for the Hessian, and asmprobit must perform the gradient calculations differently
than for the other algorithms.
Technical note
If there are no alternative-specific variables in your model, the variancecovariance matrix pa-
rameters are not identifiable. For such a model to converge, you would therefore need to use cor-
relation(independent) and stddev(homoskedastic). A better alternative is to use mprobit,
which is geared specifically toward models with only case-specific variables. See [R]mprobit.
120 asmprobit — Alternative-specific multinomial probit regression
Stored results
asmprobit stores the following in e():
Scalars
e(N) number of observations
e(N case) number of cases
e(k) number of parameters
e(k alt) number of alternatives
e(k indvars) number of alternative-specific variables
e(k casevars) number of case-specific variables
e(k sigma) number of variance estimates
e(k rho) number of correlation estimates
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(df m) model degrees of freedom
e(ll) log simulated-likelihood
e(N clust) number of clusters
e(const) constant indicator
e(i base) base alternative index
e(i scale) scale alternative index
e(mc points) number of Monte Carlo replications
e(mc burn) starting sequence index
e(mc antithetics) antithetics indicator
e(chi2) χ2
e(p) significance
e(fullcov) unstructured covariance indicator
e(structcov) 1 if structured covariance; 0otherwise
e(cholesky) Cholesky-factored covariance indicator
e(alt min) minimum number of alternatives
e(alt avg) average number of alternatives
e(alt max) maximum number of alternatives
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0otherwise
asmprobit — Alternative-specific multinomial probit regression 121
Macros
e(cmd) asmprobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(indvars) alternative-specific independent variable
e(casevars) case-specific variables
e(case) variable defining cases
e(altvar) variable defining alternatives
e(alteqs) alternative equation names
e(alt#)alternative labels
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(correlation) correlation structure
e(stddev) variance structure
e(cov class) class of the covariance structure
e(chi2type) Wald, type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(mc method) technique used to generate sequences
e(mc seed) random-number generator seed
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(datasignature) the checksum
e(datasignaturevars) variables used in calculation of checksum
e(properties) b V
e(estat cmd) program used to implement estat
e(mfx dlg) program used to implement estat mfx dialog
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(stats) alternative statistics
e(stdpattern) variance pattern
e(stdfixed) fixed and free standard deviations
e(altvals) alternative values
e(altfreq) alternative frequencies
e(alt casevars) indicators for estimated case-specific coefficientse(k alt)×e(k casevars)
e(corpattern) correlation structure
e(corfixed) fixed and free correlations
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variancecovariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The simulated maximum likelihood estimates for the MNP are obtained using ml; see [R]ml.
The likelihood evaluator implements the GHK algorithm to approximate the multivariate distribution
function (Geweke 1989;Hajivassiliou and McFadden 1998;Keane and Wolpin 1994). The technique
is also described in detail by Genz (1992), but Genz describes a more general algorithm where both
122 asmprobit — Alternative-specific multinomial probit regression
lower and upper bounds of integration are finite. We briefly describe the GHK simulator and refer you
to Bolduc (1999) for the score computations.
As discussed earlier, the latent variables for a J-alternative model are ηij =xij β+ziαj+ξij ,
for j=1, . . . , J,i=1, . . . , n, and ξ0
i= (ξi,1, . . . , ξi,J)MVN(0,). The experimenter observes
alternative kfor the ith observation if k=arg max(ηij , j =1, . . . , J). Let
vij0=ηij ηik
= (xij xik)β+zi(αjαk) + ξij ξik
=δij0β+ziγj0+ij0
where j0=jif j < k and j0=j1 if j > k, so that j0=1, . . . , J 1. Further, i=
(i1, . . . , i,J1)MVN(0,Σ(k)).Σis indexed by kbecause it depends on the choice made. We
denote the deterministic part of the model as λij0=δij0β+zjγj0, and the probability of this event
is
Pr(yi=k) = Pr(vi10, . . . , vi,J10)
= Pr(i1≤ −λi1, . . . , i,J1≤ −λi,J1)
= (2π)(J1)/2|Σ(k)|1/2Zλi1
−∞ ···Zλi,J1
−∞
exp 1
2z0Σ1
(k)zdz
(3)
Simulated likelihood
For clarity in the discussion that follows, we drop the index denoting case so that for an arbitrary
observation υ0= (v1, . . . , vJ1),λ0= (λ1, . . . , λJ1), and 0= (1, . . . , J1).
The Cholesky-factored variancecovariance, Σ=LL0, is lower triangular,
L=
l11 0. . . 0
l21 l22 . . . 0
.
.
..
.
..
.
.
lJ1,1lJ1,2. . . lJ1,J1
and the correlated latent-variable errors can be expressed as linear functions of uncorrelated normal
variates, =Lζ, where ζ0= (ζ1, . . . , ζJ1)and ζjiid N(0,1). We now have υ=λ+Lζ, and
by defining
zj=
λ1
l11
for j=1
λj+Pj1
i=1 ljiζi
ljj
for j=2, . . . , J 1
(4)
we can express the probability statement (3) as the product of conditional probabilities
Pr(yi=k) = Pr (ζ1z1) Pr (ζ2z2|ζ1z1)···
Pr (ζJ1zJ1|ζ1z1, . . . , ζJ2zJ2)
asmprobit — Alternative-specific multinomial probit regression 123
because
Pr(v10) = Pr(λ1+l11ζ10)
= Pr ζ1≤ −λ1
l11
Pr(v20) = Pr(λ2+l21ζ1+l22ζ20)
= Pr ζ2≤ −λ2+l21ζ1
l22 |ζ1≤ −λ1
l11
. . .
The Monte Carlo algorithm then must make draws from the truncated standard normal distribution.
It does so by generating J1 uniform variates, δj, j =1, . . . , J 1, and computing
e
ζj=
Φ1δ1Φλ1
l11  for j=1
Φ1(δjΦ λjPj1
i=1 lji e
ζi
ljj !) for j=2, . . . , J 1
Define ezjby replacing e
ζifor ζiin (4) so that the simulated probability for the lth draw is
pl=
J1
Y
j=1
Φ(ezj)
To increase accuracy, the bounds of integration, λj, are ordered so that the largest integration intervals
are on the inside. The rows and columns of the variancecovariance matrix are pivoted accordingly
(Genz 1992).
For a more detailed description of the GHK algorithm in Stata, see Gates (2006).
Repeated draws are made, say, N, and the simulated likelihood for the ith case, denoted b
Li, is
computed as
b
Li=1
N
N
X
l=1
pl
The overall simulated log likelihood is Pilog b
Li.
If the true likelihood is Li, the error bound on the approximation can be expressed as
|b
LiLi| ≤ V(Li)DN{(δi)}
where V(Li)is the total variation of Liand DNis the discrepancy, or nonuniformity, of the set of ab-
scissas. For the uniform pseudorandom sequence, δi, the discrepancy is of order O{(log log N/N)1/2}.
The order of discrepancy can be improved by using quasirandom sequences.
QuasiMonte Carlo integration is carried out by asmprobit by replacing the uniform deviates
with either the Halton or the Hammersley sequences. These sequences spread the points more evenly
than the uniform random sequence and have a smaller order of discrepancy, O{(log N)J1}/N
and O{(log N)J2}/N, respectively. The Halton sequence of dimension J1 is generated from
the first J1primes, pk, so that on draw lwe have hl={rp1(l), rp2(l), . . . , rpJ1(l)}, where
124 asmprobit — Alternative-specific multinomial probit regression
rpk(l) =
q
X
j=0
bjk(l)pj1
k(0,1)
is the radical inverse function of lwith base pkso that Pq
j=0 bjk(l)pj
k=l, where pq
kl < pq+1
k
(Fang and Wang 1994).
This function is demonstrated with base p3=5 and l=33, which generates r5(33). Here q=2,
b0,3(33) = 3, b1,5(33) = 1, and b2,5(33) = 1, so that r5(33) = 3/5+1/25 +1/625.
The Hammersley sequence uses an evenly spaced set of points with the first J2 components
of the Halton sequence
hl=2l1
2N, rp1(l), rp2(l), . . . , rpJ2(l)
for l=1, . . . , N.
For a more detailed description of the Halton and Hammersley sequences, see Drukker and
Gates (2006).
Computations for the derivatives of the simulated likelihood are taken from Bolduc (1999). Bolduc
gives the analytical first-order derivatives for the log of the simulated likelihood with respect to
the regression coefficients and the parameters of the Cholesky-factored variancecovariance matrix.
asmprobit uses these analytical first-order derivatives and numerical second-order derivatives.
This command supports the clustered version of the Huber/White/sandwich estimator of the
variance using vce(robust) and vce(cluster clustvar). See [P]robust, particularly Maximum
likelihood estimators and Methods and formulas. Specifying vce(robust) is equivalent to specifying
vce(cluster casevar), where casevar is the variable that identifies the cases.
References
Bolduc, D. 1999. A practical technique to estimate multinomial probit models in transportation. Transportation Research
Part B 33: 63–79.
Bunch, D. S. 1991. Estimability of the multinomial probit model. Transportation Research Part B 25: 1–12.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cappellari, L., and S. P. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood.Stata
Journal 3: 278–294.
Drukker, D. M., and R. Gates. 2006. Generating Halton sequences using Mata.Stata Journal 6: 214–228.
Fang, K.-T., and Y. Wang. 1994. Number-theoretic Methods in Statistics. London: Chapman & Hall.
Gates, R. 2006. A Mata Geweke–Hajivassiliou–Keane multivariate normal simulator.Stata Journal 6: 190–213.
Genz, A. 1992. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical
Statistics 1: 141–149.
Geweke, J. 1989. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57:
1317–1339.
Geweke, J., and M. P. Keane. 2001. Computationally intensive methods for integration in econometrics. In Vol. 5 of
Handbook of Econometrics, ed. J. Heckman and E. Leamer, 3463–3568. Amsterdam: Elsevier.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Haan, P., and A. Uhlendorff. 2006. Estimation of multinomial logit models with unobserved heterogeneity using
maximum simulated likelihood.Stata Journal 6: 229–245.
asmprobit — Alternative-specific multinomial probit regression 125
Hajivassiliou, V. A., and D. L. McFadden. 1998. The method of simulated scores for the estimation of LDV models.
Econometrica 66: 863–896.
Hole, A. R. 2007. Fitting mixed logit models by using maximum simulated likelihood.Stata Journal 7: 388–401.
Keane, M. P., and K. I. Wolpin. 1994. The solution and estimation of discrete choice dynamic programming models
by simulation and interpolation: Monte Carlo evidence. Review of Economics and Statistics 76: 648–672.
Train, K. E. 2009. Discrete Choice Methods with Simulation. 2nd ed. New York: Cambridge University Press.
Also see
[R]asmprobit postestimation Postestimation tools for asmprobit
[R]asclogit Alternative-specific conditional logit (McFadden’s choice) model
[R]asroprobit Alternative-specific rank-ordered probit regression
[R]mlogit Multinomial (polytomous) logistic regression
[R]mprobit Multinomial probit regression
[U] 20 Estimation and postestimation commands
Title
asmprobit postestimation — Postestimation tools for asmprobit
Description Syntax for predict Menu for predict Options for predict
Syntax for estat Menu for estat Options for estat Remarks and examples
Stored results Methods and formulas Also see
Description
The following postestimation commands are of special interest after asmprobit:
Command Description
estat alternatives alternative summary statistics
estat covariance covariance matrix of the latent-variable errors for the alternatives
estat correlation correlation matrix of the latent-variable errors for the alternatives
estat facweights covariance factor weights matrix
estat mfx marginal effects
The following standard postestimation commands are also available:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predicted probabilities, estimated linear predictor and its standard error
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
estat alternatives displays summary statistics about the alternatives in the estimation sample
and provides a mapping between the index numbers that label the covariance parameters of the model
and their associated values and labels for the alternative variable.
estat covariance computes the estimated variancecovariance matrix of the latent-variable
errors for the alternatives. The estimates are displayed, and the variancecovariance matrix is stored
in r(cov).
126
asmprobit postestimation — Postestimation tools for asmprobit 127
estat correlation computes the estimated correlation matrix of the latent-variable errors for
the alternatives. The estimates are displayed, and the correlation matrix is stored in r(cor).
estat facweights displays the covariance factor weights matrix and stores it in r(C).
estat mfx computes the simulated probability marginal effects.
Syntax for predict
predict type newvar if  in  ,statistic altwise
predict type  stub*|newvarlist  if  in , scores
statistic Description
Main
pr probability alternative is chosen; the default
xb linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability that alternative jis chosen in case i.
xb calculates the linear prediction xij β+ziαjfor alternative jand case i.
stdp calculates the standard error of the linear predictor.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion. The xb and stdp options always
use alternativewise deletion.
scores calculates the scores for each coefficient in e(b). This option requires a new variable list of
length equal to the number of columns in e(b). Otherwise, use the stub*option to have predict
generate enumerated variables with prefix stub.
128 asmprobit postestimation — Postestimation tools for asmprobit
Syntax for estat
Alternative summary statistics
estat alternatives
Covariance matrix of the latent-variable errors for the alternatives
estat covariance , format(%fmt) border(bspec) left(#)
Correlation matrix of the latent-variable errors for the alternatives
estat correlation , format(%fmt) border(bspec) left(#)
Covariance factor weights matrix
estat facweights , format(%fmt) border(bspec) left(#)
Marginal effects
estat mfx [if] [in] [, estat mfx options]
estat mfx options                      Description
Main
varlist(varlist)                       display marginal effects for varlist
at(mean [atlist] | median [atlist])    calculate marginal effects at these values
Options
level(#)                               set confidence interval level; default is level(95)
nodiscrete                             treat indicator variables as continuous
noesample                              do not restrict calculation of means and medians to the
                                         estimation sample
nowght                                 ignore weights when calculating means and medians
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat
Options for estat are presented under the following headings:
Options for estat covariance, estat correlation, and estat facweights
Options for estat mfx
Options for estat covariance, estat correlation, and estat facweights
format(%fmt) sets the matrix display format. The default for estat covariance and estat
facweights is format(%9.0g); the default for estat correlation is format(%9.4f).
border(bspec) sets the matrix display border style. The default is border(all). See [P] matlist.
left(#) sets the matrix display left indent. The default is left(2). See [P] matlist.
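As a quick sketch of these display options (the particular format and indent values shown are arbitrary):
. estat covariance, format(%8.3f) left(4)
. estat correlation, border(none)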
Options for estat mfx
Main
varlist(varlist) specifies the variables for which to display marginal effects. The default is all
variables.
at(mean [atlist] | median [atlist]) specifies the values at which the marginal effects are to be
calculated. atlist is
   alternative:variable = # [variable = # ...]
The default is to calculate the marginal effects at the means of the independent variables over the
estimation sample, at(mean).
After specifying the summary statistic, you can specify a series of specific values for variables.
You can specify values for alternative-specific variables by alternative, or you can specify one
value for all alternatives. You can specify only one value for case-specific variables. For example,
in travel.dta, income is a case-specific variable, whereas termtime and travelcost are
alternative-specific variables. The following would be a legal syntax for estat mfx:
. estat mfx, at(mean air:termtime=50 travelcost=100 income=60)
When nodiscrete is not specified, at(mean [atlist]) or at(median [atlist]) has no effect on
computing marginal effects for indicator variables, which are calculated as the discrete change in
the simulated probability as the indicator variable changes from 0 to 1.
The mean and median computations respect any if and in qualifiers, so you can restrict the data
over which the means or medians are computed. You can even restrict the values to a specific
case; for example,
. estat mfx if case==21
Options
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
nodiscrete specifies that indicator variables be treated as continuous variables. An indicator variable
is one that takes on the value 0 or 1 in the estimation sample. By default, the discrete change in
the simulated probability is computed as the indicator variable changes from 0 to 1.
noesample specifies that the whole dataset be considered instead of only those marked in the
e(sample) defined by the asmprobit command.
nowght specifies that weights be ignored when calculating the means or medians.
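A brief sketch combining several of these options, following the travel-mode example used below (the particular variable and option choices are illustrative only):
. estat mfx, at(median) varlist(termtime) level(90)
. estat mfx, nodiscrete varlist(income)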
Remarks and examples
Remarks are presented under the following headings:
Predicted probabilities
Obtaining estimation statistics
Obtaining marginal effects
Predicted probabilities
After fitting an alternative-specific multinomial probit model, you can use predict to obtain the
simulated probabilities that an individual will choose each of the alternatives. When evaluating the
multivariate normal probabilities via Monte Carlo simulation, predict uses the same method to
generate the random sequence of numbers as the previous call to asmprobit. For example, if you
specified intmethod(halton) when fitting the model, predict also uses the Halton sequence.
Example 1
In example 1 of [R] asmprobit, we fit a model of individuals’ travel-mode choices. We can obtain
the simulated probabilities that an individual chooses each alternative by using predict:
. use http://www.stata-press.com/data/r13/travel
. asmprobit choice travelcost termtime, case(id) alternatives(mode)
> casevars(income)
(output omitted )
. predict prob
(option pr assumed; Pr(mode))
. list id mode prob choice in 1/12, sepby(id)
id mode prob choice
1. 1 air .1494137 0
2. 1 train .329167 0
3. 1 bus .1320298 0
4. 1 car .3898562 1
5. 2 air .2565875 0
6. 2 train .2761054 0
7. 2 bus .0116135 0
8. 2 car .4556921 1
9. 3 air .2098406 0
10. 3 train .1081824 0
11. 3 bus .1671841 0
12. 3 car .5147822 1
Obtaining estimation statistics
Once you have fit a multinomial probit model, you can obtain the estimated variance or correlation
matrices for the model alternatives by using the estat command.
Example 2
To display the correlations of the errors in the latent-variable equations, we type
. estat correlation
train bus car
train 1.0000
bus 0.8909 1.0000
car 0.7895 0.8951 1.0000
Note: correlations are for alternatives differenced with air
The covariance matrix can be displayed by typing
. estat covariance
train bus car
train 2
bus 1.600208 1.613068
car 1.37471 1.399703 1.515884
Note: covariances are for alternatives differenced with air
Obtaining marginal effects
The marginal effects are computed as the derivative of the simulated probability for an alternative
with respect to an independent variable. A table of marginal effects is displayed for each alternative,
with the table containing the marginal effect for each case-specific variable and the alternative for
each alternative-specific variable.
By default, the marginal effects are computed at the means of each continuous independent variable
over the estimation sample. For indicator variables, the difference in the simulated probability evaluated
at 0 and 1 is computed by default. Indicator variables will be treated as continuous variables if the
nodiscrete option is used.
Example 3
Continuing with our model from example 1, we obtain the marginal effects for alternatives air,
train, bus, and car evaluated at the mean values of each independent variable. Recall that the
travelcost and termtime variables are alternative specific, taking on different values for each
alternative, so they have a separate marginal effect for each alternative.
. estat mfx
Pr(choice = air) = .29434926
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
travelcost
air -.002688 .000677 -3.97 0.000 -.004015 -.001362 102.65
train .0009 .000436 2.07 0.039 .000046 .001755 130.2
bus .000376 .000271 1.39 0.166 -.000155 .000908 115.26
car .001412 .00051 2.77 0.006 .000412 .002412 95.414
termtime
air -.010376 .002711 -3.83 0.000 -.015689 -.005063 61.01
train .003475 .001639 2.12 0.034 .000264 .006687 35.69
bus .001452 .001008 1.44 0.150 -.000523 .003427 41.657
car .005449 .002164 2.52 0.012 .001209 .00969 0
casevars
income .003891 .001847 2.11 0.035 .000271 .007511 34.548
Pr(choice = train) = .29531182
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
travelcost
air .000899 .000436 2.06 0.039 .000045 .001753 102.65
train -.004081 .001466 -2.78 0.005 -.006953 -.001208 130.2
bus .001278 .00063 2.03 0.042 .000043 .002513 115.26
car .001904 .000887 2.15 0.032 .000166 .003641 95.414
termtime
air .003469 .001638 2.12 0.034 .000258 .00668 61.01
train -.01575 .00247 -6.38 0.000 -.020591 -.010909 35.69
bus .004934 .001593 3.10 0.002 .001812 .008056 41.657
car .007348 .002228 3.30 0.001 .00298 .011715 0
casevars
income -.00957 .002223 -4.31 0.000 -.013927 -.005214 34.548
Pr(choice = bus) = .08880039
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
travelcost
air .00038 .000274 1.39 0.165 -.000157 .000916 102.65
train .001279 .00063 2.03 0.042 .000044 .002514 130.2
bus -.003182 .001175 -2.71 0.007 -.005485 -.00088 115.26
car .001523 .000675 2.26 0.024 .0002 .002847 95.414
termtime
air .001466 .001017 1.44 0.149 -.000526 .003459 61.01
train .004937 .001591 3.10 0.002 .001819 .008055 35.69
bus -.012283 .002804 -4.38 0.000 -.017778 -.006788 41.657
car .00588 .002255 2.61 0.009 .001461 .010299 0
casevars
income .000435 .001461 0.30 0.766 -.002428 .003298 34.548
Pr(choice = car) = .32168607
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
travelcost
air .00141 .000509 2.77 0.006 .000411 .002408 102.65
train .001903 .000886 2.15 0.032 .000166 .003641 130.2
bus .001523 .000675 2.25 0.024 .000199 .002847 115.26
car -.004836 .001539 -3.14 0.002 -.007853 -.001819 95.414
termtime
air .005441 .002161 2.52 0.012 .001205 .009677 61.01
train .007346 .002228 3.30 0.001 .00298 .011713 35.69
bus .005879 .002256 2.61 0.009 .001456 .010301 41.657
car -.018666 .003938 -4.74 0.000 -.026385 -.010948 0
casevars
income .005246 .002166 2.42 0.015 .001002 .00949 34.548
First, we note that there is a separate marginal effects table for each alternative and that table
begins by reporting the overall probability of choosing the alternative, for example, 0.2944 for air
travel. We see in the first table that a unit increase in terminal time for air travel from 61.01 minutes
will result in a decrease in probability of choosing air travel (when the probability is evaluated at the
mean of all variables) by approximately 0.01, with a 95% confidence interval of about −0.016 to
−0.005. Travel cost has a smaller (less negative) effect on choosing air travel (at the average cost of 102.65).
Alternatively, an increase in terminal time and travel cost for train, bus, or car from these mean values
will increase the chance for air travel to be chosen. Also, with an increase in income from 34.5, it
would appear that an individual would be more likely to choose air or automobile travel over bus or
train. (While the marginal effect for bus travel is positive, it is not significant.)
Example 4
Plotting the simulated probability marginal effect evaluated over a range of values for an independent
variable may be more revealing than a table of values. Below are the commands for generating the
simulated probability marginal effect of air travel for increasing air travel terminal time. We fix all
other independent variables at their medians.
. qui gen meff = .
. qui gen tt = .
. qui gen lb = .
. qui gen ub = .
. forvalues i=0/19 {
2. local termtime = 5+5*`i'
3. qui replace tt = `termtime' if _n == `i'+1
4. qui estat mfx, at(median air:termtime=`termtime') var(termtime)
5. mat air = r(air)
6. qui replace meff = air[1,1] if _n == `i'+1
7. qui replace lb = air[1,5] if _n == `i'+1
8. qui replace ub = air[1,6] if _n == `i'+1
9. qui replace prob = r(pr_air) if _n == `i'+1
10. }
. label variable tt "terminal time"
. twoway (rarea lb ub tt, pstyle(ci)) (line meff tt, lpattern(solid)), name(meff)
> legend(off) title(" marginal effect of air travel" "terminal time and"
> "95% confidence interval", position(3))
. twoway line prob tt, name(prob) title(" probability of choosing" "air travel",
> position(3)) graphregion(margin(r+9)) ytitle("") xtitle("")
. graph combine prob meff, cols(1) graphregion(margin(l+5 r+5))
From the graphs, we see that the simulated probability of choosing air travel decreases in a
sigmoid fashion. The marginal effects display the rate of change in the simulated probability as a
function of the air travel terminal time. The rate of change in the probability of choosing air travel
decreases until the air travel terminal time reaches about 45; thereafter, it increases.
Stored results
estat mfx stores the following in r():
Scalars
r(pr_alt)  scalars containing the computed probability of each alternative evaluated at the value that is
           labeled X in the table output. Here alt are the labels in the macro e(alteqs).
Matrices
r(alt)     matrices containing the computed marginal effects and associated statistics. There is one matrix
           for each alternative, where alt are the labels in the macro e(alteqs). Column 1 of each
           matrix contains the marginal effects; column 2, their standard errors; columns 3 and 4,
           their z statistics and the p-values for the z statistics; and columns 5 and 6, the confidence
           intervals. Column 7 contains the values of the independent variables used to compute the
           probabilities r(pr_alt).
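For instance, after the travel-mode example above, these stored results might be retrieved as follows (a sketch only; effects is an arbitrary matrix name):
. estat mfx
. display r(pr_air)          // probability of choosing air at the point labeled X
. matrix list r(air)         // marginal effects and statistics for the air equation
. matrix effects = r(air)
. display effects[1,1]       // the first marginal effect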
Methods and formulas
Marginal effects
The marginal effects are computed as the derivative of the simulated probability with respect to each
independent variable. A set of marginal effects is computed for each alternative; thus, for Jalternatives,
there will be Jtables. Moreover, the alternative-specific variables will have Jentries, one for each
alternative in each table. The details of computing the effects are different for alternative-specific
variables and case-specific variables, as well as for continuous and indicator variables.
We use the latent-variable notation of asmprobit (see [R] asmprobit) for a J-alternative model
and, for notational convenience, we will drop any subscripts involving observations. We then have
the latent variables $\eta_j = x_j\beta + z\alpha_j + \xi_j$, for $j = 1, \dots, J$. Let $k$ index the alternative of
interest, and then
$$v_{j'} = \eta_j - \eta_k = (x_j - x_k)\beta + z(\alpha_j - \alpha_k) + \epsilon_{j'}$$
where $j' = j$ if $j < k$ and $j' = j - 1$ if $j > k$, so that $j' = 1, \dots, J-1$ and $\epsilon_{j'} \sim \mathrm{MVN}(0, \Sigma)$.
Denote $p_k = \Pr(v_1 \le 0, \dots, v_{J-1} \le 0)$ as the simulated probability of choosing alternative $k$
given profile $x_k$ and $z$. The marginal effects are then $\partial p_k/\partial x_k$, $\partial p_k/\partial x_j$, and $\partial p_k/\partial z$, where
$k = 1, \dots, J$, $j \ne k$. asmprobit analytically computes the first-order derivatives of the simulated
probability with respect to the $v$s, and the marginal effects for the $x$s and $z$ are obtained via the chain
rule. The standard errors for the marginal effects are computed using the delta method.
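Schematically, the chain-rule step can be written as follows (a sketch of the relationship only, not the exact expressions computed internally):
$$\frac{\partial p_k}{\partial x_j} = \sum_{m=1}^{J-1} \frac{\partial p_k}{\partial v_m}\,\frac{\partial v_m}{\partial x_j}$$
where, because each $v_m$ is linear in the $x$s and $z$, the inner derivatives are simply the relevant coefficient vectors (for example, $\partial v_m/\partial x_j = \beta$ when $m$ is the index corresponding to alternative $j$, and $\partial v_m/\partial x_k = -\beta$ for every $m$).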
Also see
[R] asmprobit    Alternative-specific multinomial probit regression
[U] 20 Estimation and postestimation commands
Title
asroprobit — Alternative-specific rank-ordered probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Reference
Also see
Syntax
asroprobit depvar [indepvars] [if] [in] [weight], case(varname)
        alternatives(varname) [options]
options                        Description
Model
case(varname)                  use varname to identify cases
alternatives(varname)          use varname to identify the alternatives available for each case
casevars(varlist)              case-specific variables
constraints(constraints)       apply specified linear constraints
collinear                      keep collinear variables
Model 2
correlation(correlation)       correlation structure of the latent-variable errors
stddev(stddev)                 variance structure of the latent-variable errors
structural                     use the structural covariance parameterization; default is the
                                 differenced covariance parameterization
factor(#)                      use the factor covariance structure with dimension #
noconstant                     suppress the alternative-specific constant terms
basealternative(#|lbl|str)     alternative used for normalizing location
scalealternative(#|lbl|str)    alternative used for normalizing scale
altwise                        use alternativewise deletion instead of casewise deletion
reverse                        interpret the lowest rank in depvar as the best; the default is that
                                 the highest rank is the best
SE/Robust
vce(vcetype)                   vcetype may be oim, robust, cluster clustvar, opg,
                                 bootstrap, or jackknife
Reporting
level(#)                       set confidence level; default is level(95)
notransform                    do not transform variance–covariance estimates to the standard
                                 deviation and correlation metric
nocnsreport                    do not display constraints
display options                control column formats and line width
Integration
intmethod(seqtype)             type of quasi- or pseudouniform sequence
intpoints(#)                   number of points in each sequence
intburn(#)                     starting index in the Hammersley or Halton sequence
intseed(code|#)                pseudouniform random-number seed
antithetics                    use antithetic draws
nopivot                        do not use integration interval pivoting
initbhhh(#)                    use the BHHH optimization algorithm for the first # iterations
favor(speed|space)             favor speed or space when generating integration points
Maximization
maximize options               control the maximization process
coeflegend                     display legend instead of statistics
correlation Description
unstructured one correlation parameter for each pair of alternatives; correlations
with the basealternative() are zero; the default
exchangeable one correlation parameter common to all pairs of alternatives;
correlations with the basealternative() are zero
independent constrain all correlation parameters to zero
pattern matname user-specified matrix identifying the correlation pattern
fixed matname user-specified matrix identifying the fixed and free correlation
parameters
stddev Description
heteroskedastic estimate standard deviation for each alternative; standard deviations
for basealternative() and scalealternative() set to one
homoskedastic all standard deviations are one
pattern matname user-specified matrix identifying the standard deviation pattern
fixed matname user-specified matrix identifying the fixed and free standard
deviations
seqtype Description
hammersley Hammersley point set
halton Halton point set
random uniform pseudorandom point set
case(varname) and alternatives(varname) are required.
bootstrap, by, jackknife, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Ordinal outcomes > Rank-ordered probit regression
Description
asroprobit fits rank-ordered probit (ROP) models by using maximum simulated likelihood (MSL).
The model allows you to relax the independence of irrelevant alternatives (IIA) property that is
characteristic of the rank-ordered logistic model by estimating the variance–covariance parameters
of the latent-variable errors. Each unique identifier in the case() variable has multiple alternatives
identified in the alternatives() variable, and depvar contains the ranked alternatives made by each
case. Only the order in the ranks, not the magnitude of their differences, is assumed to be relevant.
By default, the largest rank indicates the more desirable alternative. Use the reverse option if the
lowest rank should be interpreted as the more desirable alternative. Tied ranks are allowed, but they
increase the computation time because all permutations of the tied ranks are used in computing the
likelihood for each case. asroprobit allows two types of independent variables: alternative-specific
variables, in which the values of each variable vary with each alternative, and case-specific variables,
which vary with each case.
The estimation technique of asroprobit is nearly identical to that of asmprobit, and the two
routines share many of the same options; see [R] asmprobit.
Options
Model
case(varname) specifies the variable that identifies each case. This variable identifies the individuals
or entities making a choice. case() is required.
alternatives(varname) specifies the variable that identifies the alternatives available for each case.
The number of alternatives can vary with each case; the maximum number of alternatives is 20.
alternatives() is required.
casevars(varlist) specifies the case-specific variables that are constant for each case(). If there are
a maximum of J alternatives, there will be J−1 sets of coefficients associated with casevars().
constraints(constraints), collinear; see [R] estimation options.
Model 2
correlation(correlation) specifies the correlation structure of the latent-variable errors.
correlation(unstructured) is the most general and has J(J−3)/2 + 1 unique correlation
parameters. This is the default unless stddev() or structural are specified.
correlation(exchangeable) provides for one correlation coefficient common to all latent
variables, except the latent variable associated with the basealternative().
correlation(independent) assumes that all correlations are zero.
correlation(pattern matname) and correlation(fixed matname) give you more flexibility
in defining the correlation structure. See Variance structures in [R] asmprobit for more
information.
stddev(stddev) specifies the variance structure of the latent-variable errors.
stddev(heteroskedastic) is the most general and has J−2 estimable parameters. The standard
deviations of the latent-variable errors for the alternatives specified in basealternative()
and scalealternative() are fixed to one.
stddev(homoskedastic) constrains all the standard deviations to equal one.
stddev(pattern matname) and stddev(fixed matname) give you added flexibility in defining
the standard deviation parameters. See Variance structures in [R] asmprobit for more information.
structural requests the J × J structural covariance parameterization instead of the default
(J−1) × (J−1) differenced covariance parameterization (the covariance of the latent errors differenced with that of
the base alternative). The differenced covariance parameterization will achieve the same maximum
simulated likelihood regardless of the choice of basealternative() and scalealternative().
On the other hand, the structural covariance parameterization imposes more normalizations that
may bound the model away from its maximum likelihood and thus prevent convergence with some
datasets or choices of basealternative() and scalealternative().
factor(#) requests that the factor covariance structure of dimension # be used. The factor() option
can be used with the structural option but cannot be used with stddev() or correlation().
A # × J (or # × (J−1)) matrix, C, is used to factor the covariance matrix as I + C'C, where
I is the identity matrix of dimension J (or J−1). The column dimension of C depends on
whether the covariance is structural or differenced. The row dimension of C, #, must be less than
or equal to floor((J(J−1)/2 − 1)/(J−2)), because there are only J(J−1)/2 − 1 identifiable
variance–covariance parameters. This covariance parameterization may be useful for reducing the
number of covariance parameters that need to be estimated.
If the covariance is structural, the column of C corresponding to the base alternative contains zeros.
The column corresponding to the scale alternative has a one in the first row and zeros elsewhere.
If the covariance is differenced, the column corresponding to the scale alternative (differenced with
the base) has a one in the first row and zeros elsewhere.
noconstant suppresses the J−1 alternative-specific constant terms.
basealternative(#|lbl|str) specifies the alternative used to normalize the latent-variable location
(also referred to as the level of utility). The base alternative may be specified as a number, label,
or string. The standard deviation for the latent-variable error associated with the base alternative
is fixed to one, and its correlations with all other latent-variable errors are set to zero. The default
is the first alternative when sorted. If a fixed or pattern matrix is given in the stddev()
and correlation() options, the basealternative() will be implied by the fixed standard
deviations and correlations in the matrix specifications. basealternative() cannot be equal to
scalealternative().
scalealternative(#|lbl|str) specifies the alternative used to normalize the latent-variable scale
(also referred to as the scale of utility). The scale alternative may be specified as a number,
label, or string. The default is to use the second alternative when sorted. If a fixed or pattern
matrix is given in the stddev() option, the scalealternative() will be implied by the
fixed standard deviations in the matrix specification. scalealternative() cannot be equal to
basealternative().
If a fixed or pattern matrix is given for the stddev() option, the base alternative and scale
alternative are implied by the standard deviations and correlations in the matrix specifications, and
they need not be specified in the basealternative() and scalealternative() options.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion; that is, the entire group of
observations making up a case is deleted if any missing values are encountered. This option does
not apply to observations that are marked out by the if or in qualifier or the by prefix.
reverse directs asroprobit to interpret the rank in depvar that is smallest in value as the preferred
alternative. By default, the rank that is largest in value is the favored alternative.
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
If specifying vce(bootstrap) or vce(jackknife), you must also specify basealternative()
and scalealternative().
Reporting
level(#); see [R] estimation options.
notransform prevents retransforming the Cholesky-factored variance–covariance estimates to the
correlation and standard deviation metric.
This option has no effect if structural is not specified because the default differenced variance–
covariance estimates have no interesting interpretation as correlations and standard deviations.
notransform also has no effect if the correlation() and stddev() options are specified with
anything other than their default values. Here it is generally not possible to factor the variance–
covariance matrix, so optimization is already performed using the standard deviation and correlation
representations.
nocnsreport; see [R] estimation options.
display options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
Integration
intmethod(hammersley|halton|random) specifies the method of generating the point sets used in
the quasi–Monte Carlo integration of the multivariate normal density. intmethod(hammersley),
the default, uses the Hammersley sequence; intmethod(halton) uses the Halton sequence; and
intmethod(random) uses a sequence of uniform random numbers.
intpoints(#) specifies the number of points to use in the quasi–Monte Carlo integration. If
this option is not specified, the number of points is 50 × J if intmethod(hammersley) or
intmethod(halton) is used and 100 × J if intmethod(random) is used. Larger values of
intpoints() provide better approximations of the log likelihood, but at the cost of added
computation time.
intburn(#) specifies where in the Hammersley or Halton sequence to start, which helps reduce the
correlation between the sequences of each dimension. The default is 0. This option may not be
specified with intmethod(random).
intseed(code|#) specifies the seed to use for generating the uniform pseudorandom sequence. This
option may be specified only with intmethod(random). code refers to a string that records the
state of the random-number generator runiform(); see [R] set seed. An integer value # may
be used also. The default is to use the current seed value from Stata’s uniform random-number
generator, which can be obtained from c(seed).
antithetics specifies that antithetic draws be used. The antithetic draw for the J−1 vector of
uniform-random variables, x, is 1 − x.
nopivot turns off integration interval pivoting. By default, asroprobit will pivot the wider intervals
of integration to the interior of the multivariate integration. This improves the accuracy of the
quadrature estimate. However, discontinuities may result in the computation of numerical second-
order derivatives using finite differencing (for the Newton–Raphson optimize technique, tech(nr))
when few simulation points are used, resulting in a nonpositive-definite Hessian. asroprobit
uses the Broyden–Fletcher–Goldfarb–Shanno optimization algorithm, by default, which does not
require computing the Hessian numerically using finite differencing.
initbhhh(#) specifies that the Berndt–Hall–Hall–Hausman (BHHH) algorithm be used for the initial
# optimization steps. This option is the only way to use the BHHH algorithm along with other
optimization techniques. The algorithm-switching feature of ml's technique() option cannot
include bhhh.
favor(speed|space) instructs asroprobit to favor either speed or space when generating the
integration points. favor(speed) is the default. When favoring speed, the integration points are
generated once and stored in memory, thus increasing the speed of evaluating the likelihood. This
speed increase can be seen when there are many cases or when the user specifies a large number
of integration points, intpoints(#). When favoring space, the integration points are generated
repeatedly with each likelihood evaluation.
For unbalanced data, where the number of alternatives varies with each case, the estimates computed
using intmethod(random) will vary slightly between favor(speed) and favor(space). This
is because the uniform sequences will not be identical, even when initiating the sequences using the
same uniform seed, intseed(code|#). For favor(speed), ncase blocks of intpoints(#) ×
(J−2) uniform points are generated, where J is the maximum number of alternatives. For
favor(space), the column dimension of the matrices of points varies with the number of
alternatives that each case has.
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize.
The following options may be particularly useful in obtaining convergence with asroprobit:
difficult, technique(algorithm spec), nrtolerance(#), nonrtolerance, and
from(init specs).
If technique() contains more than one algorithm specification, bhhh cannot be one of them. To
use the BHHH algorithm with another algorithm, use the initbhhh() option and specify the other
algorithm in technique().
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
When specifying from(matname [, copy]), the values in matname associated with the latent-
variable error variances must be for the log-transformed standard deviations and inverse-hyperbolic
tangent-transformed correlations. This option makes using the coefficient vector from a previously
fitted asroprobit model convenient as a starting point.
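For example, a previously fitted model can supply starting values when refitting with different settings (a sketch; the matrix name b0 and the intpoints() value are arbitrary):
. asroprobit rank high low if noties, case(id) alternatives(jobchar)
> casevars(female score) reverse
. matrix b0 = e(b)
. asroprobit rank high low if noties, case(id) alternatives(jobchar)
> casevars(female score) reverse intpoints(500) from(b0, copy)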
The following option is available with asroprobit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
The mathematical description and numerical computations of the rank-ordered probit model are
similar to that of the multinomial probit model. The only difference is that the dependent variable
of the rank-ordered probit model is ordinal, showing preferences among alternatives, as opposed to
the binary dependent variable of the multinomial probit model, indicating a chosen alternative. We
will describe how the likelihood of a ranking is computed using the latent-variable framework here,
but for details of the latent-variable parameterization of these models and the method of maximum
simulated likelihood, see [R] asmprobit.
Consider the latent-variable parameterization of a J-alternative rank-ordered probit model. Using
the notation from asmprobit, we have variables $\eta_{ij}$, $j = 1, \dots, J$, such that
$$\eta_{ij} = x_{ij}\beta + z_i\alpha_j + \xi_{ij}$$
Here the $x_{ij}$ are the alternative-specific independent variables, the $z_i$ are the case-specific variables,
and the $\xi_{ij}$ are multivariate normal with mean zero and covariance $\Omega$. Without loss of generality,
assume that individual $i$ ranks the alternatives in the order of the alternative indices $j = 1, 2, \dots, J$,
so that alternative $J$ is the preferred alternative and alternative 1 is the least preferred alternative.
The probability of this ranking given $\beta$ and $\alpha_j$ is the probability that $\eta_{i,J-1} - \eta_{i,J} \le 0$ and
$\eta_{i,J-2} - \eta_{i,J-1} \le 0, \dots,$ and $\eta_{i,1} - \eta_{i,2} \le 0$.
Example 1
Long and Freese (2014, 477) provide an example of a rank-ordered logit model with alternative-
specific variables. We use this dataset to demonstrate asroprobit. The data come from the Wisconsin
Longitudinal Study. This is a study of 1957 Wisconsin high school graduates who were asked to rate
their relative preference of four job characteristics: esteem, a job other people regard highly; variety,
a job that is not repetitive and allows you to do a variety of things; autonomy, a job where your
supervisor does not check on you frequently; and security, a job with a low risk of being laid off. The
case-specific covariates are gender, female, an indicator variable for females, and score, a score
on a general mental ability test measured in standard deviations. The alternative-specific variables
are high and low, which indicate whether the respondent’s current job is high or low in esteem,
variety, autonomy, or security. This approach provides three states for a respondent’s current job
status for each alternative, (1,0),(0,1), and (0,0), using the notation (high,low). The score (1,1)
is omitted because the respondent’s current job cannot be considered both high and low in one of the
job characteristics. The (0,0)score would indicate that the respondent’s current job does not rank
high or low (is neutral) in a job characteristic. The alternatives are ranked such that 1 is the preferred
alternative and 4 is the least preferred.
. use http://www.stata-press.com/data/r13/wlsrank
(1992 Wisconsin Longitudinal Study data on job values)
. list id jobchar rank female score high low in 1/12, sepby(id)
id jobchar rank female score high low
1. 1 security 1 1 .0492111 0 0
2. 1 autonomy 4 1 .0492111 0 0
3. 1 variety 1 1 .0492111 0 0
4. 1 esteem 3 1 .0492111 0 0
5. 5 security 2 1 2.115012 1 0
6. 5 variety 2 1 2.115012 1 0
7. 5 esteem 2 1 2.115012 1 0
8. 5 autonomy 1 1 2.115012 0 0
9. 7 autonomy 1 0 1.701852 1 0
10. 7 variety 1 0 1.701852 0 1
11. 7 esteem 4 0 1.701852 0 0
12. 7 security 1 0 1.701852 0 0
The three cases listed have tied ranks. asroprobit will allow ties, but at the cost of increased
computation time. To evaluate the likelihood of the first observation, asroprobit must compute
Pr(esteem = 3, variety = 1, autonomy = 4, security = 2) +
Pr(esteem = 3, variety = 2, autonomy = 4, security = 1)
and both of these probabilities are estimated using simulation. In fact, the full dataset contains 7,237
tied ranks and asroprobit takes a great deal of time to estimate the parameters. For exposition, we
estimate the rank-ordered probit model by using the cases without ties. These cases are marked in
the variable noties.
The model of job preference is
$$\eta_{ij} = \beta_1 \mathtt{high}_{ij} + \beta_2 \mathtt{low}_{ij} + \alpha_{1j}\mathtt{female}_i + \alpha_{2j}\mathtt{score}_i + \alpha_{0j} + \xi_{ij}$$
for $j = 1, 2, 3, 4$. The base alternative will be esteem, so $\alpha_{01} = \alpha_{11} = \alpha_{21} = 0$.
. asroprobit rank high low if noties, case(id) alternatives(jobchar)
> casevars(female score) reverse
note: variable high has 107 cases that are not alternative-specific: there is
no within-case variability
note: variable low has 193 cases that are not alternative-specific: there is
no within-case variability
Iteration 0: log simulated-likelihood = -1103.2768
Iteration 1: log simulated-likelihood = -1089.3361 (backed up)
(output omitted )
Alternative-specific rank-ordered probit Number of obs = 1660
Case variable: id Number of cases = 415
Alternative variable: jobchar Alts per case: min = 4
avg = 4.0
max = 4
Integration sequence: Hammersley
Integration points: 200 Wald chi2(8) = 34.01
Log simulated-likelihood = -1080.2206 Prob > chi2 = 0.0000
rank Coef. Std. Err. z P>|z| [95% Conf. Interval]
jobchar
high .3741029 .0925685 4.04 0.000 .192672 .5555337
low -.0697443 .1093317 -0.64 0.524 -.2840305 .1445419
esteem (base alternative)
variety
female .1351487 .1843088 0.73 0.463 -.2260899 .4963873
score .1405482 .0977567 1.44 0.151 -.0510515 .3321479
_cons 1.735016 .1451343 11.95 0.000 1.450558 2.019474
autonomy
female .2561828 .1679565 1.53 0.127 -.0730059 .5853715
score .1898853 .0875668 2.17 0.030 .0182575 .361513
_cons .7009797 .1227336 5.71 0.000 .4604262 .9415333
security
female .232622 .2057547 1.13 0.258 -.1706497 .6358938
score -.1780076 .1102115 -1.62 0.106 -.3940181 .038003
_cons 1.343766 .1600059 8.40 0.000 1.030161 1.657372
/lnl2_2 .1805151 .0757296 2.38 0.017 .0320878 .3289424
/lnl3_3 .4843091 .0793343 6.10 0.000 .3288168 .6398014
/l2_1 .6062037 .1169368 5.18 0.000 .3770117 .8353957
/l3_1 .4509217 .1431183 3.15 0.002 .1704151 .7314283
/l3_2 .2289447 .1226081 1.87 0.062 -.0113627 .4692521
(jobchar=esteem is the alternative normalizing location)
(jobchar=variety is the alternative normalizing scale)
We specified the reverse option because a rank of 1 is the highest preference. The variance–
covariance estimates are for the Cholesky-factored variance–covariance for the latent-variable errors
differenced with that of alternative esteem. We can view the estimated correlations by entering
. estat correlation
variety autonomy security
variety 1.0000
autonomy 0.4516 1.0000
security 0.2652 0.2399 1.0000
Note: correlations are for alternatives differenced with esteem
and typing
. estat covariance
variety autonomy security
variety 2
autonomy .8573015 1.80229
security .6376996 .5475882 2.890048
Note: covariances are for alternatives differenced with esteem
gives the (co)variances. [R] mprobit explains that if the latent-variable errors are independent, then
the correlations in the differenced parameterization should be 0.5 and the variances should be 2.0,
which seems to be the case here.
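The reasoning behind these reference values is worth sketching (see [R] mprobit for the full discussion): if the undifferenced latent-variable errors are independent with unit variance, then for the errors differenced with the base alternative,
$$\mathrm{Var}(\xi_j - \xi_1) = 2, \qquad \mathrm{Cov}(\xi_j - \xi_1,\ \xi_k - \xi_1) = 1 \quad (j \ne k),$$
so each pairwise correlation is $1/2$ and each variance is 2.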
The coefficient estimates for the probit models can be difficult to interpret because of the
normalization for location and scale. The regression estimates for the case-specific variables will be
relative to the base alternative and the regression estimates for both the case-specific and alternative-
specific variables are affected by the scale normalization. The more pronounced the heteroskedasticity
and correlations, the more pronounced the resulting estimate differences when choosing alternatives
to normalize for location and scale. However, when using the differenced covariance structure, you
will obtain the same model likelihood regardless of which alternatives you choose as the base and
scale alternatives. For model interpretation, you can examine the estimated probabilities and marginal
effects by using postestimation routines predict and estat mfx. See [R] asroprobit postestimation.
Stored results
asroprobit stores the following in e():
Scalars
e(N) number of observations
e(N case) number of cases
e(N ties) number of ties
e(k) number of parameters
e(k alt) number of alternatives
e(k indvars) number of alternative-specific variables
e(k casevars) number of case-specific variables
e(k sigma) number of variance estimates
e(k rho) number of correlation estimates
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(df m) model degrees of freedom
e(ll) log simulated-likelihood
e(N clust) number of clusters
e(const) constant indicator
e(i base) base alternative index
e(i scale) scale alternative index
e(mc points) number of Monte Carlo replications
e(mc burn) starting sequence index
e(mc antithetics) antithetics indicator
e(reverse) 1 if minimum rank is best, 0 if maximum rank is best
e(chi2) χ2
e(p) significance
e(fullcov) unstructured covariance indicator
e(structcov) 1 if structured covariance; 0 otherwise
e(cholesky) Cholesky-factored covariance indicator
e(alt min) minimum number of alternatives
e(alt avg) average number of alternatives
e(alt max) maximum number of alternatives
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) asroprobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(indvars) alternative-specific independent variable
e(casevars) case-specific variables
e(case) variable defining cases
e(altvar) variable defining alternatives
e(alteqs) alternative equation names
e(alt#)alternative labels
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(correlation) correlation structure
e(stddev) variance structure
e(chi2type) Wald, type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(mc method) Hammersley,Halton, or uniform random; technique to generate sequences
e(mc seed) random-number generator seed
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(datasignature) the checksum
e(datasignaturevars) variables used in calculation of checksum
e(properties) b V
e(estat cmd) program used to implement estat
e(mfx dlg) program used to implement estat mfx dialog
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(stats) alternative statistics
e(stdpattern) variance pattern
e(stdfixed) fixed and free standard deviations
e(altvals) alternative values
e(altfreq) alternative frequencies
e(alt casevars) indicators for estimated case-specific coefficients; e(k alt) × e(k casevars)
e(corpattern) correlation structure
e(corfixed) fixed and free correlations
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
From a computational perspective, asroprobit is similar to asmprobit and the two programs
share many numerical tools. Therefore, we will use the notation from Methods and formulas in
[R] asmprobit to discuss the rank-ordered probit probability model.
The latent variables for a J-alternative model are $\eta_{ij} = x_{ij}\beta + z_i\alpha_j + \xi_{ij}$, for $j = 1, \dots, J$,
$i = 1, \dots, n$, and $\xi_i' = (\xi_{i,1}, \dots, \xi_{i,J}) \sim \mathrm{MVN}(0, \Omega)$. Without loss of generality, assume for
the $i$th observation that an individual ranks the alternatives in the order of their numeric indices,
$y_i = (J, J-1, \dots, 1)$, so the first alternative is the most preferred and the last alternative is the
least preferred. We can then difference the latent variables such that
$$
\begin{aligned}
v_{ik} &= \eta_{i,k+1} - \eta_{i,k} \\
       &= (x_{i,k+1} - x_{i,k})\beta + z_i(\alpha_{k+1} - \alpha_k) + \xi_{i,k+1} - \xi_{i,k} \\
       &= \delta_{ik}\beta + z_i\gamma_k + \epsilon_{ik}
\end{aligned}
$$
for $k = 1, \dots, J-1$ and where $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{i,J-1}) \sim \mathrm{MVN}(0, \Sigma_{(i)})$. $\Sigma$ is indexed by $i$ because
it is specific to the ranking of individual $i$. We denote the deterministic part of the model as
$\lambda_{ik} = \delta_{ik}\beta + z_i\gamma_k$, and the probability of this event is
$$
\begin{aligned}
\Pr(y_i) &= \Pr(v_{i1} \le 0, \dots, v_{i,J-1} \le 0) \\
         &= \Pr(\epsilon_{i1} \le -\lambda_{i1}, \dots, \epsilon_{i,J-1} \le -\lambda_{i,J-1}) \\
         &= (2\pi)^{-(J-1)/2} \left|\Sigma_{(i)}\right|^{-1/2} \int_{-\infty}^{-\lambda_{i1}} \cdots \int_{-\infty}^{-\lambda_{i,J-1}} \exp\!\left(-\tfrac{1}{2}\, z'\Sigma_{(i)}^{-1} z\right) dz
\end{aligned}
$$
The integral has the same form as (3) of Methods and formulas in [R] asmprobit. See [R] asmprobit
for details on evaluating this integral numerically by using simulation.
asroprobit handles tied ranks by enumeration. For $k$ tied ranks, it will generate $k!$ rankings,
where $!$ is the factorial operator, $k! = k(k-1)(k-2)\cdots(2)(1)$. For two sets of tied ranks of size $k_1$
and $k_2$, asroprobit will generate $k_1!\,k_2!$ rankings. The total probability is the sum of the probability
of each ranking. For example, if there are two tied ranks such that $y_i = (J, J, J-2, \dots, 1)$, then
asroprobit will evaluate $\Pr(y_i) = \Pr(y_i^{(1)}) + \Pr(y_i^{(2)})$, where $y_i^{(1)} = (J, J-1, J-2, \dots, 1)$
and $y_i^{(2)} = (J-1, J, J-2, \dots, 1)$.
This command supports the clustered version of the Huber/White/sandwich estimator of the
variance using vce(robust) and vce(cluster clustvar). See [P] robust, particularly Maximum
likelihood estimators and Methods and formulas. Specifying vce(robust) is equivalent to specifying
vce(cluster casevar), where casevar is the variable that identifies the cases.
Reference
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Also see
[R] asroprobit postestimation    Postestimation tools for asroprobit
[R] asmprobit    Alternative-specific multinomial probit regression
[R] mlogit    Multinomial (polytomous) logistic regression
[R] mprobit    Multinomial probit regression
[R] oprobit    Ordered probit regression
[U] 20 Estimation and postestimation commands
Title
asroprobit postestimation — Postestimation tools for asroprobit
Description Syntax for predict Menu for predict Options for predict
Syntax for estat Menu for estat Options for estat Remarks and examples
Stored results Also see
Description
The following postestimation commands are of special interest after asroprobit:
Command Description
estat alternatives alternative summary statistics
estat covariance covariance matrix of the latent-variable errors for the alternatives
estat correlation correlation matrix of the latent-variable errors for the alternatives
estat facweights covariance factor weights matrix
estat mfx marginal effects
The following standard postestimation commands are also available:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predicted probabilities, estimated linear predictor and its standard error
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
estat alternatives displays summary statistics about the alternatives in the estimation sample.
The command also provides a mapping between the index numbers that label the covariance parameters
of the model and their associated values and labels for the alternative variable.
estat covariance computes the estimated variance–covariance matrix of the latent-variable
errors for the alternatives. The estimates are displayed, and the variance–covariance matrix is stored
in r(cov).
estat correlation computes the estimated correlation matrix of the latent-variable errors for
the alternatives. The estimates are displayed, and the correlation matrix is stored in r(cor).
estat facweights displays the covariance factor weights matrix and stores it in r(C).
estat mfx computes marginal effects of a simulated probability of a set of ranked alternatives.
The probability is stored in r(pr), the matrix of rankings is stored in r(ranks), and the matrix of
marginal-effect statistics is stored in r(mfx).
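For instance, after estat mfx (see the examples below), the stored results might be inspected as follows (a sketch only):
. display r(pr)             // simulated probability of the specified ranking
. matrix list r(ranks)      // the ranks used for each alternative
. matrix list r(mfx)        // marginal effects and associated statistics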
Syntax for predict
predict [type] newvar [if] [in] [, statistic altwise]
predict [type] {stub*|newvarlist} [if] [in], scores
statistic Description
Main
pr probability of each ranking, by case; the default
pr1 probability that each alternative is preferred
xb linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
Main
pr, the default, calculates the probability of each ranking. For each case, one probability is computed
for the ranks in e(depvar).
pr1 calculates the probability that each alternative is preferred.
xb calculates the linear prediction $x_{ij}\beta + z_i\alpha_j$ for alternative j and case i.
stdp calculates the standard error of the linear predictor.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion. The xb and stdp options always
use alternativewise deletion.
scores calculates the scores for each coefficient in e(b). This option requires a new variable list of
length equal to the number of columns in e(b). Otherwise, use the stub* option to have predict
generate enumerated variables with prefix stub.
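For instance (as in example 1 below), both probability statistics can be obtained right after estimation; the new variable names here are arbitrary:
. predict prank             // probability of the observed ranking (the default, pr)
. predict pbest, pr1        // probability that each alternative is preferred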
Syntax for estat
Alternative summary statistics
estat alternatives
Covariance matrix of the latent-variable errors for the alternatives
estat covariance [, format(%fmt) border(bspec) left(#)]
Correlation matrix of the latent-variable errors for the alternatives
estat correlation [, format(%fmt) border(bspec) left(#)]
Covariance factor weights matrix
estat facweights [, format(%fmt) border(bspec) left(#)]
Marginal effects
estat mfx [if] [in] [, estat mfx options]
estat mfx options          Description
Main
varlist(varlist)           display marginal effects for varlist
at(median [atlist])        calculate marginal effects at these values
rank(ranklist)             calculate marginal effects for the simulated probability of these ranked
                             alternatives
Options
level(#)                   set confidence interval level; default is level(95)
nodiscrete                 treat indicator variables as continuous
noesample                  do not restrict calculation of the medians to the estimation sample
nowght                     ignore weights when calculating medians
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat
Options for estat are presented under the following headings:
Options for estat covariance, estat correlation, and estat facweights
Options for estat mfx
Options for estat covariance, estat correlation, and estat facweights
format(%fmt) sets the matrix display format. The default for estat covariance and estat
facweights is format(%9.0g). The default for estat correlation is format(%9.4f).
border(bspec) sets the matrix display border style. The default is border(all). See [P] matlist.
left(#) sets the matrix display left indent. The default is left(2). See [P] matlist.
Options for estat mfx
Main
varlist(varlist) specifies the variables for which to display marginal effects. The default is all
variables.
at(median [atlist]) specifies the values at which the marginal effects are to be calculated. atlist is
   alternative:variable = # [variable = # ...]
The marginal effects are calculated at the medians of the independent variables.
After specifying the summary statistic, you can specify specific values for variables. You can
specify values for alternative-specific variables by alternative, or you can specify one value for
all alternatives. You can specify only one value for case-specific variables. For example, in the
wlsrank dataset, female and score are case-specific variables, whereas high and low are
alternative-specific variables. The following would be a legal syntax for estat mfx:
. estat mfx, at(median high=0 esteem:high=1 low=0 security:low=1 female=1)
When nodiscrete is not specified, at(median [atlist]) has no effect on computing marginal
effects for indicator variables, which are calculated as the discrete change in the simulated probability
as the indicator variable changes from 0 to 1.
The median computations respect any if or in qualifiers, so you can restrict the data over which
the medians are computed. You can even restrict the values to a specific case, for example,
. estat mfx if case==13
rank(ranklist) specifies the ranks for the alternatives. ranklist is
   alternative = # [alternative = # ...]
The default is to rank the calculated latent variables. Alternatives excluded from rank() are
omitted from the analysis. You must therefore specify at least two alternatives in rank(). You
may have tied ranks in the rank specification. Only the order in the ranks is relevant.
Options
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
nodiscrete specifies that indicator variables be treated as continuous variables. An indicator variable
is one that takes on the value 0 or 1 in the estimation sample. By default, the discrete change in
the simulated probability is computed as the indicator variable changes from 0 to 1.
noesample specifies that the whole dataset be considered instead of only those marked in the
e(sample) defined by the asroprobit command.
nowght specifies that weights be ignored when calculating the medians.
Remarks and examples
Remarks are presented under the following headings:
Predicted probabilities
Obtaining estimation statistics
Predicted probabilities
After fitting an alternative-specific rank-ordered probit model, you can use predict to obtain the
probabilities of alternative rankings or the probabilities of each alternative being preferred. When
evaluating the multivariate normal probabilities via (quasi) Monte Carlo, predict uses the same
method to generate the (quasi) random sequence of numbers as the previous call to asroprobit. For
example, if you specified intmethod(halton) when fitting the model, predict also uses Halton
sequences.
Example 1
In example 1 of [R]asroprobit, we fit a model of job characteristic preferences. This is a study
of 1957 Wisconsin high school graduates who were asked to rate their relative preference of four
job characteristics: esteem, a job other people regard highly; variety, a job that is not repetitive and
allows you to do a variety of things; autonomy, a job where your supervisor does not check on you
frequently; and security, a job with a low risk of being laid off. The case-specific covariates are
gender, female, an indicator variable for females, and score, a score on a general mental ability test
measured in standard deviations. The alternative-specific variables are high and low, which indicate
whether the respondent’s current job is high or low in esteem, variety, autonomy, or security. This
approach provides three states for a respondent’s current job status for each alternative, (1,0),(0,1),
and (0,0), using the notation (high,low). The score (1,1)is omitted because the respondent’s
current job cannot be considered both high and low in one of the job characteristics. The (0,0)
score would indicate that the respondent’s current job does not rank high or low (is neutral) in a job
characteristic. The alternatives are ranked such that 1 is the preferred alternative and 4 is the least
preferred.
We can obtain the probabilities of the observed alternative rankings, the pr option, and the
probability of each alternative being preferred, the pr1 option, by using predict:
. use http://www.stata-press.com/data/r13/wlsrank
(1992 Wisconsin Longitudinal Study data on job values)
. asroprobit rank high low if noties, case(id) alternatives(jobchar)
> casevars(female score) reverse
(output omitted )
. keep if e(sample)
(11244 observations deleted)
. predict prob, pr
. predict prob1, pr1
. list id jobchar prob prob1 rank female score high low in 1/12
id jobchar prob prob1 rank female score high low
1. 13 security .0421807 .2784269 3 0 .3246512 0 1
2. 13 autonomy .0421807 .1029036 1 0 .3246512 0 0
3. 13 variety .0421807 .6026725 2 0 .3246512 1 0
4. 13 esteem .0421807 .0160111 4 0 .3246512 0 1
5. 19 autonomy .0942025 .1232488 4 1 .0492111 0 0
6. 19 esteem .0942025 .0140261 3 1 .0492111 0 0
7. 19 security .0942025 .4601368 1 1 .0492111 1 0
8. 19 variety .0942025 .4025715 2 1 .0492111 0 0
9. 22 esteem .1414177 .0255264 4 1 1.426412 1 0
10. 22 variety .1414177 .4549441 1 1 1.426412 0 0
11. 22 security .1414177 .2629494 3 1 1.426412 0 0
12. 22 autonomy .1414177 .2566032 2 1 1.426412 1 0
The prob variable is constant for each case because it contains the probability of the ranking in
the rank variable. On the other hand, the prob1 variable contains the estimated probability of each
alternative being preferred. For each case, the sum of the values in prob1 will be approximately 1.0.
They do not add up to exactly 1.0 because of approximations due to the GHK algorithm.
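One way to verify this is to total prob1 within each case (a sketch; the variable name sum1 is arbitrary):
. egen sum1 = total(prob1), by(id)
. summarize sum1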
Obtaining estimation statistics
For examples of the specialized estat subcommands covariance and correlation, see
[R] asmprobit postestimation. The entry also has a good example of computing marginal effects after
asmprobit that is applicable to asroprobit. Below we will elaborate further on marginal effects after
asroprobit where we manipulate the rank() option.
Example 2
We will continue with the preferred job characteristics example where we first compute the marginal
effects for case id = 13.
. estat mfx if id==13, rank(security=3 autonomy=1 variety=2 esteem=4)
Pr(esteem=4 variety=2 autonomy=1 security=3) = .04218068
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
high*
esteem -.008713 .001964 -4.44 0.000 -.012562 -.004864 0
variety -.009102 .003127 -2.91 0.004 -.015231 -.002973 1
autonomy .025535 .007029 3.63 0.000 .011758 .039313 0
security -.003745 .001394 -2.69 0.007 -.006477 -.001013 0
low*
esteem .001614 .002646 0.61 0.542 -.003572 .0068 1
variety .001809 .003012 0.60 0.548 -.004094 .007712 0
autonomy -.003849 .006104 -0.63 0.528 -.015813 .008115 0
security .000582 .000985 0.59 0.554 -.001348 .002513 1
casevars
female* .009767 .009064 1.08 0.281 -.007998 .027533 0
score .008587 .004488 1.91 0.056 -.00021 .017384 .32465
(*) dp/dx is for discrete change of indicator variable from 0 to 1
Next we compute the marginal effects for the probability that autonomy is preferred given the profile
of case id = 13.
. estat mfx if id==13, rank(security=2 autonomy=1 variety=2 esteem=2)
Pr(esteem=3 variety=4 autonomy=1 security=2) +
Pr(esteem=4 variety=3 autonomy=1 security=2) +
Pr(esteem=2 variety=4 autonomy=1 security=3) +
Pr(esteem=4 variety=2 autonomy=1 security=3) +
Pr(esteem=2 variety=3 autonomy=1 security=4) +
Pr(esteem=3 variety=2 autonomy=1 security=4) = .10276103
variable dp/dx Std. Err. z P>|z| [ 95% C.I. ] X
high*
esteem -.003524 .001258 -2.80 0.005 -.005989 -.001059 0
variety -.036203 .00894 -4.05 0.000 -.053724 -.018681 1
autonomy .057279 .013801 4.15 0.000 .030231 .084328 0
security -.0128 .002665 -4.80 0.000 -.018024 -.007576 0
low*
esteem .000518 .000833 0.62 0.534 -.001116 .002151 1
variety .006409 .010588 0.61 0.545 -.014343 .027161 0
autonomy -.008818 .013766 -0.64 0.522 -.035799 .018163 0
security .002314 .003697 0.63 0.531 -.004932 .009561 1
casevars
female* .013839 .021607 0.64 0.522 -.028509 .056188 0
score .017917 .011062 1.62 0.105 -.003764 .039598 .32465
(*) dp/dx is for discrete change of indicator variable from 0 to 1
The probability computed by estat mfx matches the probability computed by predict, pr1 only
within three digits. This outcome is because of how the computation is carried out and the numeric
inaccuracy of the GHK simulator using a Hammersley point set of length 200. The computation
carried out by estat mfx literally computes all six probabilities listed in the header of the MFX
table and sums them. The computation by predict, pr1 is the same as predict after asmprobit
(multinomial probit): it computes the probability that autonomy is chosen, thus requiring only one
call to the GHK simulator. Hence, there is a difference in the reported values even though the two
probability statements are equivalent.
Stored results
estat mfx stores the following in r():
Scalars
r(pr) scalar containing the computed probability of the ranked alternatives.
Matrices
r(ranks) column vector containing the alternative ranks. The rownames identify the alternatives.
r(mfx) matrix containing the computed marginal effects and associated statistics. Column 1 of the
matrix contains the marginal effects; column 2, their standard errors; column 3, their z
statistics; and columns 4 and 5, the confidence intervals. Column 6 contains the values of
the independent variables used to compute the probabilities r(pr).
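For instance, immediately after the estat mfx call in example 2 above, these stored results can be inspected directly (a minimal illustration; the commands below are not part of the original example, and no new estimation is involved):
. display r(pr)
. matrix list r(ranks)
. matrix list r(mfx)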
Also see
[R] asroprobit — Alternative-specific rank-ordered probit regression
[R] asmprobit — Alternative-specific multinomial probit regression
[U] 20 Estimation and postestimation commands
Title
BIC note — Calculating and interpreting BIC
Description Remarks and examples Methods and formulas References
Also see
Description
This entry discusses a statistical issue that arises when using the Bayesian information criterion
(BIC) to compare models.
Stata calculates BIC assuming N = e(N) (we will explain), but sometimes it would be better if
a different N were used. Commands that calculate BIC have an n() option, allowing you to specify
the N to be used.
In summary,
1. If you are comparing results estimated by the same estimation command, using the default
BIC calculation is probably fine. There is an issue, but most researchers would ignore it.
2. If you are comparing results estimated by different estimation commands, you need to be
on your guard.
a. If the different estimation commands share the same definitions of observations,
independence, and the like, you are back in case 1.
b. If they differ in these regards, you need to think about the value of N that should
be used. For example, logit and xtlogit differ in that the former assumes
independent observations and the latter, independent panels.
c. If estimation commands differ in the events being used over which the likelihood
function is calculated, the information criteria may not be comparable at all. We
say information criteria because this would apply equally to the Akaike information
criterion (AIC), as well as to BIC. For instance, streg and stcox produce such
incomparable results. The events used by streg are the actual survival times,
whereas the events used by stcox are failures within risk pools, conditional on
the times at which failures occurred.
Remarks and examples
Remarks are presented under the following headings:
Background
The problem of determining N
The problem of conformable likelihoods
The first problem does not arise with AIC; the second problem does
Calculating BIC correctly
Background
The AIC and the BIC are two popular measures for comparing maximum likelihood models. AIC
and BIC are defined as
AIC = −2 × ln(likelihood) + 2 × k
BIC = −2 × ln(likelihood) + ln(N) × k
where
k = number of parameters estimated
N = number of observations
We are going to discuss AIC along with BIC because AIC has some of the problems that BIC has,
but not all.
AIC and BIC can be viewed as measures that combine fit and complexity. Fit is measured negatively
by −2 × ln(likelihood); the larger the value, the worse the fit. Complexity is measured positively,
either by 2 × k (AIC) or ln(N) × k (BIC).
Given two models fit on the same data, the model with the smaller value of the information
criterion is considered to be better.
There is substantial literature on these measures: see Akaike (1974); Raftery (1995); Sakamoto,
Ishiguro, and Kitagawa (1986); and Schwarz (1978).
When Stata calculates the above measures, it uses the rank of e(V) for k and it uses e(N) for
N. e(V) and e(N) are Stata notation for results stored by the estimation command. e(V) is the
variance–covariance matrix of the estimated parameters, and e(N) is the number of observations in
the dataset used in calculating the result.
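As an illustration (a sketch that is not part of the original text; the auto dataset and logit model are used only for demonstration), the default calculations can be reproduced by hand from the stored results after any estimation command:
. sysuse auto, clear
. quietly logit foreign weight mpg
. display -2*e(ll) + 2*e(rank)          // AIC, with k = rank of e(V)
. display -2*e(ll) + ln(e(N))*e(rank)   // BIC, with N = e(N)
. estat ic                              // reports the same two values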
The problem of determining N
The difference between AIC and BIC is that AIC uses the constant 2 to weight k, whereas BIC uses
ln(N).
Determining what value of N should be used is problematic. Despite appearances, the definition
“N is the number of observations” is not easy to make operational. N does not appear in the likelihood
function itself, N is not the output of a standard statistical formula, and what is an observation is
often subjective.
Example 1
Often what is meant by N is obvious. Consider a simple logit model. What is meant by N is the
number of observations that are statistically independent, and that corresponds to M, the number of
observations in the dataset used in the calculation. We will write N = M.
But now assume that the same dataset has a grouping variable and the data are thought to be
clustered within group. To keep the problem simple, let’s pretend that there are G groups and m
observations within group, so that M = G × m. Because you are worried about intragroup correlation,
you fit your model with xtlogit, grouping on the grouping variable. Now you wish to calculate
BIC. What is the N that should be used? N = M or N = G?
That is a deep question. If the observations really are independent, then you should use N = M.
If the observations within group are not just correlated but are duplicates of one another, and they
had to be so, then you should use N = G. Between those two extremes, you should probably
use a number between G and M, but determining what that number should be from measured
correlations is difficult. Using N = M is conservative in that, if anything, it overweights complexity.
Conservativeness, however, is subjective, too: using N = G could be considered more conservative
in that fewer constraints are being placed on the data.
When the estimated correlation is high, our reaction would be that using N = G is probably more
reasonable. Our first reaction, however, would be that using BIC to compare models is probably a
misuse of the measure.
Stata uses N = M. An informal survey of web-based literature suggests that N = M is the
popular choice.
There is another reason, not so good, to choose N = M. It makes across-model comparisons more
likely to be valid when performed without thinking about the issue. Say that you wish to compare
the logit and xtlogit results. Thus you need to calculate
BICp = −2 × ln(likelihoodp) + ln(Np) × k
BICx = −2 × ln(likelihoodx) + ln(Nx) × k
Whatever N you use, you must use the same N in both formulas. Stata’s choice of N = M at
least meets that test.
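A sketch of such a comparison follows (the variable names y, x1, x2, and group are hypothetical; the essential point is that one n() value is applied to both sets of stored results):
. quietly logit y x1 x2
. estimates store pooled
. xtset group
. quietly xtlogit y x1 x2
. estimates store panel
. estimates stats pooled panel, n(`=_N')   // here N = M; substitute whatever N you decide on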
Example 2
In the above example, using N = M is reasonable. Now let’s look at when using N = M is
wrong, even if popular.
Consider a model fit by stcox. Using N = M is certainly wrong if for no other reason than
M is not even a well-defined number. The same data can be represented by different datasets with
different numbers of observations. For example, in one dataset, there might be 1 observation per
subject. In another, the same subjects could have two records each, the first recording the first half
of the time at risk and the second recording the remaining part. All statistics calculated by Stata on
either dataset would be the same, but M would be different.
Deciding on the right definition, however, is difficult. Viewed one way, N in the Cox regression
case should be the number of risk pools, R, because the Cox regression calculation is made on the
basis of the independent risk pools. Viewed another way, N should be the number of subjects, Nsubj,
because, even though the likelihood function is based on risk pools, the parameters estimated are at
the subject level.
You can decide which argument you prefer.
For parametric survival models, in single-record data, N = M is unambiguously correct. For
multirecord data, there is an argument for N = M and for N = Nsubj.
The problem of conformable likelihoods
The problem of conformable likelihoods does not concern N. Researchers sometimes use
information criteria such as BIC and AIC to make comparisons across models. For that to be valid, the
likelihoods must be conformable; that is, the likelihoods must all measure the same thing.
It is common to think of the likelihood function as the Pr(data | parameters), but in fact, the
likelihood is
Pr(particular events in the data | parameters)
You must ensure that the events are the same.
For instance, they are not the same in the semiparametric Cox regression and the various parametric
survival models. In Cox regression, the events are, at each failure time, that the subjects observed to
fail in fact failed, given that failures occurred at those times. In the parametric models, the events
are that each subject failed exactly when the subject was observed to fail.
The formula for AIC and BIC is
measure = −2 × ln(likelihood) + complexity
When you are comparing models, if the likelihoods are measuring different events, even if the
models obtain estimates of the same parameters, differences in the information measures are irrelevant.
The first problem does not arise with AIC; the second problem does
Regardless of model, the problem of defining N never arises with AIC because N is not used in
the AIC calculation. AIC uses a constant 2 to weight complexity as measured by k, rather than ln(N).
For both AIC and BIC, however, the likelihood functions must be conformable; that is, they must
be measuring the same event.
Calculating BIC correctly
When using BIC to compare results, and especially when using BIC to compare results from different
models, you should think carefully about how N should be defined. Then specify that number by
using the n() option:
. estimates stats full sub, n(74)
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
full 74 -45.03321 -20.59083 4 49.18167 58.39793
sub 74 -45.03321 -27.17516 3 60.35031 67.26251
Note: N=74 used in calculating BIC
Both estimates stats and estat ic allow the n() option; see [R] estimates stats and
[R] estat ic.
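For example, estat ic accepts the same choice of N for the active estimation results (a minimal sketch reusing the N = 74 from the table above):
. estat ic, n(74)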
Methods and formulas
AIC and BIC are defined as
AIC = −2 × ln(likelihood) + 2 × k
BIC = −2 × ln(likelihood) + ln(N) × k
where k is the model degrees of freedom calculated as the rank of the variance–covariance matrix of
the parameters, e(V), and N is the number of observations used in estimation or, more precisely, the
number of independent terms in the likelihood. Operationally, N is defined as e(N) unless the n()
option is specified.
References
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:
716–723.
Raftery, A. E. 1995. Bayesian model selection in social research. In Vol. 25 of Sociological Methodology, ed. P. V.
Marsden, 111–163. Oxford: Blackwell.
Sakamoto, Y., M. Ishiguro, and G. Kitagawa. 1986. Akaike Information Criterion Statistics. Dordrecht, The Netherlands:
Reidel.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461–464.
Also see
[R] estat ic — Display information criteria
[R] estimates stats — Model-selection statistics
Title
binreg — Generalized linear models: Extensions to the binomial family
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
binreg depvar [indepvars] [if] [in] [weight] [, options]
options                     Description

Model
  noconstant                suppress constant term
  or                        use logit link and report odds ratios
  rr                        use log link and report risk ratios
  hr                        use log-complement link and report health ratios
  rd                        use identity link and report risk differences
  n(# | varname)            use # or varname for number of trials
  exposure(varname)         include ln(varname) in model with coefficient constrained to 1
  offset(varname)           include varname in model with coefficient constrained to 1
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables
  mu(varname)               use varname as the initial estimate for the mean of depvar
  init(varname)             synonym for mu(varname)

SE/Robust
  vce(vcetype)              vcetype may be eim, robust, cluster clustvar, oim, opg,
                              bootstrap, jackknife, hac kernel, jackknife1, or unbiased
  t(varname)                variable name corresponding to time
  vfactor(#)                multiply variance matrix by scalar #
  disp(#)                   quasi-likelihood multiplier
  scale(x2 | dev | #)       set the scale parameter; default is scale(1)

Reporting
  level(#)                  set confidence level; default is level(95)
  coefficients              report nonexponentiated coefficients
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling
Maximization
  irls                      use iterated, reweighted least-squares optimization; the default
  ml                        use maximum likelihood optimization
  maximize_options          control the maximization process; seldom used
  fisher(#)                 Fisher scoring steps
  search                    search for good starting values

  coeflegend                display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mi estimate, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap), vce(jackknife), and vce(jackknife1) are not allowed with the mi estimate prefix; see
[MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Generalized linear models > GLM for the binomial family
Description
binreg fits generalized linear models for the binomial family. It estimates odds ratios, risk ratios,
health ratios, and risk differences. The available links are
  Option   Implied link     Parameter
  or       logit            odds ratios = exp(β)
  rr       log              risk ratios = exp(β)
  hr       log complement   health ratios = exp(β)
  rd       identity         risk differences = β
Estimates of odds, risk, and health ratios are obtained by exponentiating the appropriate coefficients.
The or option produces the same results as Stata’s logistic command, and or coefficients
yields the same results as the logit command. When no link is specified, or is assumed.
Options
 
Model
noconstant; see [R] estimation options.
or requests the logit link and results in odds ratios if coefficients is not specified.
rr requests the log link and results in risk ratios if coefficients is not specified.
hr requests the log-complement link and results in health ratios if coefficients is not specified.
rd requests the identity link and results in risk differences.
n(# | varname) specifies either a constant integer to use as the denominator for the binomial family
or a variable that holds the denominator for each observation.
exposure(varname), offset(varname), constraints(constraints), collinear; see [R] estimation
options. constraints(constraints) and collinear are not allowed with irls.
mu(varname) specifies varname containing an initial estimate for the mean of depvar. This option
can be useful if you encounter convergence difficulties. init(varname) is a synonym.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar),
that are derived from asymptotic theory (oim, opg), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
vce(eim), the default, uses the expected information matrix (EIM) for the variance estimator.
binreg also allows the following:
vce(hac kernel #) specifies that a heteroskedasticity- and autocorrelation-consistent (HAC)
variance estimate be used. HAC refers to the general form for combining weighted matrices to
form the variance estimate. There are three kernels built into binreg. kernel is a user-written
program or one of
nwest | gallant | anderson
If # is not specified, N − 2 is assumed.
vce(jackknife1) specifies that the one-step jackknife estimate of variance be used.
vce(unbiased) specifies that the unbiased sandwich estimate of variance be used.
t(varname) specifies the variable name corresponding to time; see [TS] tsset. binreg does not
always need to know t(), though it does if vce(hac ...) is specified. Then you can either
specify the time variable with t(), or you can tsset your data before calling binreg. When the
time variable is required, binreg assumes that the observations are spaced equally over time.
vfactor(#) specifies a scalar by which to multiply the resulting variance matrix. This option
allows users to match output with other packages, which may apply degrees of freedom or other
small-sample corrections to estimates of variance.
disp(#) multiplies the variance of depvar by # and divides the deviance by #. The resulting
distributions are members of the quasilikelihood family.
scale(x2 | dev | #) overrides the default scale parameter. This option is allowed only with Hessian
(information matrix) variance estimates.
By default, scale(1) is assumed for the discrete distributions (binomial, Poisson, and negative
binomial), and scale(x2) is assumed for the continuous distributions (Gaussian, gamma, and
inverse Gaussian).
scale(x2) specifies that the scale parameter be set to the Pearson chi-squared (or generalized
chi-squared) statistic divided by the residual degrees of freedom, which was recommended by
McCullagh and Nelder (1989) as a good general choice for continuous distributions.
scale(dev) sets the scale parameter to the deviance divided by the residual degrees of freedom.
This option provides an alternative to scale(x2) for continuous distributions and overdispersed
or underdispersed discrete distributions.
scale(#) sets the scale parameter to #.
 
Reporting
level(#), noconstant; see [R] estimation options.
coefficients displays the nonexponentiated coefficients and corresponding standard errors and
confidence intervals. This option has no effect when the rd option is specified, because it always
presents the nonexponentiated coefficients.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
irls requests iterated, reweighted least-squares (IRLS) optimization of the deviance instead of
Newton–Raphson optimization of the log likelihood. This option is the default.
ml requests that optimization be carried out by using Stata’s ml command; see [R] ml.
maximize_options: technique(algorithm_spec), nolog, trace, gradient, showstep, hessian,
showtolerance, difficult, iterate(#), tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization method to ml, with technique() set to something other than BHHH,
changes the vcetype to vce(oim). Specifying technique(bhhh) changes vcetype to vce(opg).
fisher(#) specifies the number of Newton–Raphson steps that should use the Fisher scoring Hessian
or EIM before switching to the observed information matrix (OIM). This option is available only
if ml is specified and is useful only for Newton–Raphson optimization.
search specifies that the command search for good starting values. This option is available only if
ml is specified and is useful only for Newton–Raphson optimization.
The following option is available with binreg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
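As a brief sketch of the SE/Robust machinery described above (the variables year, deaths, smoke, temp, and pop are hypothetical; the data must either be tsset or have the time variable supplied through t()), a HAC variance with the Newey–West kernel and two lags might be requested as
. tsset year
. binreg deaths smoke temp, n(pop) rr vce(hac nwest 2)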
Remarks and examples
Wacholder (1986) suggests methods for estimating risk ratios and risk differences from prospective
binomial data. These estimates are obtained by selecting the proper link functions in the generalized
linear-model framework. (See Methods and formulas for details; also see [R] glm.)
Example 1
Wacholder (1986) presents an example, using data from Wright et al. (1983), of an investigation
of the relationship between alcohol consumption and the risk of a low-birthweight baby. Covariates
examined included whether the mother smoked (yes or no), mother’s social class (three levels), and
drinking frequency (light, moderate, or heavy). The data for the 18 possible categories determined
by the covariates are illustrated below.
Let’s first describe the data and list a few observations.
. use http://www.stata-press.com/data/r13/binreg
. list
category n_lbw_~s n_women alcohol smokes social
1. 1 11 84 heavy nonsmoker 1
2. 2 5 79 moderate nonsmoker 1
3. 3 11 169 light nonsmoker 1
4. 4 6 28 heavy smoker 1
5. 5 3 13 moderate smoker 1
6. 6 1 26 light smoker 1
7. 7 4 22 heavy nonsmoker 2
8. 8 3 25 moderate nonsmoker 2
9. 9 12 162 light nonsmoker 2
10. 10 4 17 heavy smoker 2
11. 11 2 7 moderate smoker 2
12. 12 6 38 light smoker 2
13. 13 0 14 heavy nonsmoker 3
14. 14 1 18 moderate nonsmoker 3
15. 15 12 91 light nonsmoker 3
16. 16 7 19 heavy smoker 3
17. 17 2 18 moderate smoker 3
18. 18 8 70 light smoker 3
Each observation corresponds to one of the 18 covariate structures. The number of low-birthweight
babies from the n_women women in each category is given by the n_lbw_babies variable.
We begin by estimating risk ratios:
. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr
Iteration 1: deviance = 14.2879
Iteration 2: deviance = 13.607
Iteration 3: deviance = 13.60503
Iteration 4: deviance = 13.60503
Generalized linear models No. of obs = 18
Optimization : MQL Fisher scoring Residual df = 12
(IRLS EIM) Scale parameter = 1
Deviance = 13.6050268 (1/df) Deviance = 1.133752
Pearson = 11.51517095 (1/df) Pearson = .9595976
Variance function: V(u) = u*(1-u/n_women) [Binomial]
Link function : g(u) = ln(u/n_women) [Log]
BIC = -21.07943
EIM
n_lbw_babies Risk Ratio Std. Err. z P>|z| [95% Conf. Interval]
social
2 1.340001 .3127382 1.25 0.210 .848098 2.11721
3 1.349487 .3291488 1.23 0.219 .8366715 2.176619
alcohol
moderate 1.191157 .3265354 0.64 0.523 .6960276 2.038503
heavy 1.974078 .4261751 3.15 0.002 1.293011 3.013884
smokes
smoker 1.648444 .332875 2.48 0.013 1.109657 2.448836
_cons .0630341 .0128061 -13.61 0.000 .0423297 .0938656
By default, Stata reports the risk ratios (the exponentiated regression coefficients) estimated by the
model. We can see that the risk ratio comparing heavy drinkers with light drinkers, after adjusting
for smoking and social class, is 1.974078. That is, mothers who drink heavily during their pregnancy
have approximately twice the risk of delivering low-birthweight babies as mothers who are light
drinkers.
The nonexponentiated coefficients can be obtained with the coefficients option:
. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rr coefficients
Iteration 1: deviance = 14.2879
Iteration 2: deviance = 13.607
Iteration 3: deviance = 13.60503
Iteration 4: deviance = 13.60503
Generalized linear models No. of obs = 18
Optimization : MQL Fisher scoring Residual df = 12
(IRLS EIM) Scale parameter = 1
Deviance = 13.6050268 (1/df) Deviance = 1.133752
Pearson = 11.51517095 (1/df) Pearson = .9595976
Variance function: V(u) = u*(1-u/n_women) [Binomial]
Link function : g(u) = ln(u/n_women) [Log]
BIC = -21.07943
EIM
n_lbw_babies Coef. Std. Err. z P>|z| [95% Conf. Interval]
social
2 .2926702 .2333866 1.25 0.210 -.1647591 .7500994
3 .2997244 .2439066 1.23 0.219 -.1783238 .7777726
alcohol
moderate .1749248 .274133 0.64 0.523 -.362366 .7122156
heavy .6801017 .2158856 3.15 0.002 .2569737 1.10323
smokes
smoker .4998317 .2019329 2.48 0.013 .1040505 .8956129
_cons -2.764079 .2031606 -13.61 0.000 -3.162266 -2.365891
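As a quick arithmetic check (not part of the original output), exponentiating the coefficient on heavy drinking reproduces the risk ratio reported in the previous table, up to rounding of the displayed coefficient:
. display exp(.6801017)
The result is approximately 1.974078, the risk ratio for heavy drinkers.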
Risk differences are obtained with the rd option:
. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) rd
Iteration 1: deviance = 18.67277
Iteration 2: deviance = 14.94364
Iteration 3: deviance = 14.9185
Iteration 4: deviance = 14.91762
Iteration 5: deviance = 14.91758
Iteration 6: deviance = 14.91758
Iteration 7: deviance = 14.91758
Generalized linear models No. of obs = 18
Optimization : MQL Fisher scoring Residual df = 12
(IRLS EIM) Scale parameter = 1
Deviance = 14.91758277 (1/df) Deviance = 1.243132
Pearson = 12.60353235 (1/df) Pearson = 1.050294
Variance function: V(u) = u*(1-u/n_women) [Binomial]
Link function : g(u) = u/n_women [Identity]
BIC = -19.76688
EIM
n_lbw_babies Risk Diff. Std. Err. z P>|z| [95% Conf. Interval]
social
2 .0263817 .0232124 1.14 0.256 -.0191137 .0718771
3 .0365553 .0268668 1.36 0.174 -.0161026 .0892132
alcohol
moderate .0122539 .0257713 0.48 0.634 -.0382569 .0627647
heavy .0801291 .0302878 2.65 0.008 .020766 .1394921
smokes
smoker .0542415 .0270838 2.00 0.045 .0011582 .1073248
_cons .059028 .0160693 3.67 0.000 .0275327 .0905232
The risk difference between heavy drinkers and light drinkers is 0.0801291. Because the risk differences
are obtained directly from the coefficients estimated by using the identity link, the coefficients
option has no effect here.
Health ratios are obtained with the hr option. The health ratios (exponentiated coefficients for the
log-complement link) are reported directly.
. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) hr
Iteration 1: deviance = 21.15233
Iteration 2: deviance = 15.16467
Iteration 3: deviance = 15.13205
Iteration 4: deviance = 15.13114
Iteration 5: deviance = 15.13111
Iteration 6: deviance = 15.13111
Iteration 7: deviance = 15.13111
Generalized linear models No. of obs = 18
Optimization : MQL Fisher scoring Residual df = 12
(IRLS EIM) Scale parameter = 1
Deviance = 15.13110545 (1/df) Deviance = 1.260925
Pearson = 12.84203917 (1/df) Pearson = 1.07017
Variance function: V(u) = u*(1-u/n_women) [Binomial]
Link function : g(u) = ln(1-u/n_women) [Log complement]
BIC = -19.55336
EIM
n_lbw_babies HR Std. Err. z P>|z| [95% Conf. Interval]
social
2 .9720541 .024858 -1.11 0.268 .9245342 1.022017
3 .9597182 .0290412 -1.36 0.174 .9044535 1.01836
alcohol
moderate .9871517 .0278852 -0.46 0.647 .9339831 1.043347
heavy .9134243 .0325726 -2.54 0.011 .8517631 .9795493
smokes
smoker .9409983 .0296125 -1.93 0.053 .8847125 1.000865
_cons .9409945 .0163084 -3.51 0.000 .9095674 .9735075
(HR) Health ratios
To see the nonexponentiated coefficients, we can specify the coefficients option.
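That is, we could repeat the previous command with the coefficients option added (output omitted):
. binreg n_lbw_babies i.soc i.alc i.smo, n(n_women) hr coefficients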
Stored results
binreg, irls stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq model) number of equations in overall model test
e(df m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) model scale parameter
e(disp) dispersion parameter
e(bic) model BIC
e(N clust) number of clusters
e(deviance) deviance
e(deviance s) scaled deviance
e(deviance p) Pearson deviance
e(deviance ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers s) scaled dispersion
e(dispers p) Pearson dispersion
e(dispers ps) scaled Pearson dispersion
e(vf) factor set by vfactor(); 1 if not set
e(rank) rank of e(V)
e(rc) return code
Macros
e(cmd) binreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(eform) eform() option implied by or, rr, hr, or rd
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(title_fl) family–link title
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(cons) noconstant or not set
e(hac kernel) HAC kernel
e(hac lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V) variancecovariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
binreg, ml stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) model scale parameter
e(aic) model AIC, if ml
e(bic) model BIC
e(ll) log likelihood, if ml
e(N clust) number of clusters
e(chi2) χ2
e(p) significance of model test
e(deviance) deviance
e(deviance s) scaled deviance
e(deviance p) Pearson deviance
e(deviance ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers s) scaled dispersion
e(dispers p) Pearson dispersion
e(dispers ps) scaled Pearson dispersion
e(vf) factor set by vfactor(); 1 if not set
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) binreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(eform) eform() option implied by or, rr, hr, or rd
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(title_fl) family–link title
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(cons) noconstant or not set
e(hac kernel) HAC kernel
e(hac lag) HAC lag
e(chi2type) Wald; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(opt1) optimization title, line 1
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variancecovariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Let πi be the probability of success for the ith observation, i = 1, . . . , N, and let Xβ be the linear
predictor. The link function relates the covariates of each observation to its respective probability
through the linear predictor.
In logistic regression, the logit link is used:
ln{π/(1 − π)} = Xβ
The regression coefficient βk represents the change in the logarithm of the odds associated with a
one-unit change in the value of the Xk covariate; thus exp(βk) is the ratio of the odds associated
with a change of one unit in Xk.
For risk differences, the identity link π = Xβ is used. The regression coefficient βk represents
the risk difference associated with a change of one unit in Xk. When using the identity link, you can
obtain fitted probabilities outside the interval (0,1). As suggested by Wacholder, at each iteration,
fitted probabilities are checked for range conditions (and put back in range if necessary). For example,
if the identity link results in a fitted probability that is smaller than 1e–4, the probability is replaced
with 1e–4 before the link function is calculated.
A similar adjustment is made for the logarithmic link, which is used for estimating the risk ratio,
ln(π) = Xβ, where exp(βk) is the risk ratio associated with a change of one unit in Xk, and for
the log-complement link used to estimate the probability of no disease or health, where exp(βk)
represents the “health ratio” associated with a change of one unit in Xk.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
References
Cummings, P. 2009. Methods for estimating adjusted risk ratios. Stata Journal 9: 175–196.
Hardin, J. W., and M. A. Cleves. 1999. sbe29: Generalized linear models: Extensions to the binomial family. Stata
Technical Bulletin 50: 21–25. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 140–146. College Station,
TX: Stata Press.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
Wacholder, S. 1986. Binomial regression in GLIM: Estimating risk ratios and risk differences. American Journal of
Epidemiology 123: 174–184.
Wright, J. T., I. G. Barrison, I. G. Lewis, K. D. MacRae, E. J. Waterson, P. J. Toplis, M. G. Gordon, N. F. Morris,
and I. M. Murray-Lyon. 1983. Alcohol consumption, pregnancy and low birthweight. Lancet 1: 663–665.
Also see
[R] binreg postestimation — Postestimation tools for binreg
[R] glm — Generalized linear models
[ME] mecloglog — Multilevel mixed-effects complementary log-log regression
[ME] meglm — Multilevel mixed-effects generalized linear model
[ME] melogit — Multilevel mixed-effects logistic regression
[ME] meprobit — Multilevel mixed-effects probit regression
[MI] estimation — Estimation commands for use with mi estimate
[U] 20 Estimation and postestimation commands
Title
binreg postestimation — Postestimation tools for binreg
Description Syntax for predict Menu for predict Options for predict
References Also see
Description
The following postestimation commands are available after binreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast may not be used with mi estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic options]
statistic Description
Main
mu expected value of y; the default
xb linear prediction η = xβ
eta synonym for xb
stdp standard error of the linear prediction
anscombe Anscombe (1953) residuals
cooksd Cook’s distance
deviance deviance residuals
hat diagonals of the “hat” matrix
likelihood weighted average of the standardized deviance and standardized Pearson residuals
pearson Pearson residuals
response differences between the observed and fitted outcomes
score first derivative of the log likelihood with respect to xjβ
working working residuals
options Description
Options
nooffset modify calculations to ignore the offset variable
adjusted adjust deviance residual to speed up convergence
standardized multiply residual by the factor (1 − h)^(-1/2)
studentized multiply residual by one over the square root of the estimated scale parameter
modified modify denominator of residual to be a reasonable estimate of the variance of
depvar
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
mu, the default, specifies that predict calculate the expected value of y, equal to g−1(xβ)
[n g−1(xβ) for the binomial family].
xb calculates the linear prediction η = xβ.
eta is a synonym for xb.
stdp calculates the standard error of the linear prediction.
anscombe calculates the Anscombe (1953) residuals to produce residuals that closely follow a normal
distribution.
cooksd calculates Cook’s distance, which measures the aggregate change in the estimated coefficients
when each observation is left out of the estimation.
deviance calculates the deviance residuals, which are recommended by McCullagh and Nelder (1989)
and others as having the best properties for examining goodness of fit of a GLM. They are
approximately normally distributed if the model is correct and may be plotted against the fitted
values or against a covariate to inspect the model’s fit. Also see the pearson option below.
hat calculates the diagonals of the “hat” matrix, analogous to linear regression.
likelihood calculates a weighted average of the standardized deviance and standardized Pearson
(described below) residuals.
pearson calculates the Pearson residuals, which often have markedly skewed distributions for
nonnormal family distributions. Also see the deviance option above.
response calculates the differences between the observed and fitted outcomes.
score calculates the equation-level score, ∂ lnL/∂(xjβ).
working calculates the working residuals, which are response residuals weighted according to the
derivative of the link function.
 
Options
nooffset is relevant only if you specified offset(varname) for binreg. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as xjb
rather than as xjb + offsetj.
adjusted adjusts the deviance residual to make the convergence to the limiting normal distribution
faster. The adjustment deals with adding to the deviance residual a higher-order term depending
on the variance function family. This option is allowed only when deviance is specified.
standardized requests that the residual be multiplied by the factor (1 − h)^(-1/2), where h is the
diagonal of the hat matrix. This step is done to take into account the correlation between depvar
and its predicted value.
studentized requests that the residual be multiplied by one over the square root of the estimated
scale parameter.
modified requests that the denominator of the residual be modified to be a reasonable estimate
of the variance of depvar. The base residual is multiplied by the factor (k/w)^(1/2), where k is
either one or the user-specified dispersion parameter and w is the specified weight (or one if left
unspecified).
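For instance, after the binreg fit in the main entry, fitted means and standardized deviance residuals might be obtained as follows (the new variable names are arbitrary; this is a sketch, not part of the original entry):
. predict mu_hat                          // expected value of the outcome (option mu, the default)
. predict dres, deviance standardized     // standardized deviance residuals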
References
Anscombe, F. J. 1953. Contribution of discussion paper by H. Hotelling “New light on the correlation coefficient and
its transforms”. Journal of the Royal Statistical Society, Series B 15: 229–230.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
Also see
[R] binreg — Generalized linear models: Extensions to the binomial family
[U] 20 Estimation and postestimation commands
Title
biprobit — Bivariate probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Bivariate probit regression
biprobit depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]
Seemingly unrelated bivariate probit regression
biprobit equation1 equation2 [if] [in] [weight] [, su_options]
where equation1 and equation2 are specified as
([eqname:] depvar = [indepvars] [, noconstant offset(varname)])
options                     Description

Model
  noconstant                suppress constant term
  partial                   fit partial observability model
  offset1(varname)          offset variable for first equation
  offset2(varname)          offset variable for second equation
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                              or jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  noskip                    perform likelihood-ratio test
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling

Maximization
  maximize_options          control the maximization process; seldom used

  coeflegend                display legend instead of statistics
su_options                  Description

Model
  partial                   fit partial observability model
  constraints(constraints)  apply specified linear constraints
  collinear                 keep collinear variables

SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                              or jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  noskip                    perform likelihood-ratio test
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling

Maximization
  maximize_options          control the maximization process; seldom used

  coeflegend                display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar1, depvar2, indepvars, and depvar may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), noskip, and weights are not allowed with the svy prefix; see [SVY] svy.
pweights, fweights, and iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
biprobit
Statistics > Binary outcomes > Bivariate probit regression
seemingly unrelated biprobit
Statistics > Binary outcomes > Seemingly unrelated bivariate probit regression
Description
biprobit fits maximum-likelihood two-equation probit models: either a bivariate probit or a
seemingly unrelated probit (limited to two equations).
Options
 
Model
noconstant; see [R] estimation options.
partial specifies that the partial observability model be fit. This particular model commonly has
poor convergence properties, so we recommend that you use the difficult option if you want
to fit the Poirier partial observability model; see [R] maximize.
This model computes the product of the two dependent variables so that you do not have to replace
each with the product.
offset1(varname), offset2(varname), constraints(constraints), collinear; see [R] estimation
options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
noskip specifies that a full maximum-likelihood model with only a constant for the regression equation
be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test
for the model test statistic displayed in the estimation header. By default, the overall model test
statistic is an asymptotically equivalent Wald test of all the parameters in the regression equation
being zero (except the constant). For many models, this option can substantially increase estimation
time.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with biprobit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
For a good introduction to the bivariate probit models, see Greene (2012, 738–752) and Pindyck
and Rubinfeld (1998). Poirier (1980) explains the partial observability model. Van de Ven and Van
Praag (1981) explain the probit model with sample selection; see [R] heckprobit for details.
Example 1
We use the data from Pindyck and Rubinfeld (1998, 332). In this dataset, the variables are
whether children attend private school (private), number of years the family has been at the present
residence (years), log of property tax (logptax), log of income (loginc), and whether the head of
the household voted for an increase in property taxes (vote).
We wish to model the bivariate outcomes of whether children attend private school and whether
the head of the household voted for an increase in property tax based on the other covariates.
. use http://www.stata-press.com/data/r13/school
. biprobit private vote years logptax loginc
Fitting comparison equation 1:
Iteration 0: log likelihood = -31.967097
Iteration 1: log likelihood = -31.452424
Iteration 2: log likelihood = -31.448958
Iteration 3: log likelihood = -31.448958
Fitting comparison equation 2:
Iteration 0: log likelihood = -63.036914
Iteration 1: log likelihood = -58.534843
Iteration 2: log likelihood = -58.497292
Iteration 3: log likelihood = -58.497288
Comparison: log likelihood = -89.946246
Fitting full model:
Iteration 0: log likelihood = -89.946246
Iteration 1: log likelihood = -89.258897
Iteration 2: log likelihood = -89.254028
Iteration 3: log likelihood = -89.254028
Bivariate probit regression Number of obs = 95
Wald chi2(6) = 9.59
Log likelihood = -89.254028 Prob > chi2 = 0.1431
Coef. Std. Err. z P>|z| [95% Conf. Interval]
private
years -.0118884 .0256778 -0.46 0.643 -.0622159 .0384391
logptax -.1066962 .6669782 -0.16 0.873 -1.413949 1.200557
loginc .3762037 .5306484 0.71 0.478 -.663848 1.416255
_cons -4.184694 4.837817 -0.86 0.387 -13.66664 5.297253
vote
years -.0168561 .0147834 -1.14 0.254 -.0458309 .0121188
logptax -1.288707 .5752266 -2.24 0.025 -2.416131 -.1612839
loginc .998286 .4403565 2.27 0.023 .1352031 1.861369
_cons -.5360573 4.068509 -0.13 0.895 -8.510188 7.438073
/athrho -.2764525 .2412099 -1.15 0.252 -.7492153 .1963102
rho -.2696186 .2236753 -.6346806 .1938267
Likelihood-ratio test of rho=0: chi2(1) = 1.38444 Prob > chi2 = 0.2393
The output shows several iteration logs. The first iteration log corresponds to running the univariate
probit model for the first equation, and the second log corresponds to running the univariate probit
for the second model. If ρ=0, the sum of the log likelihoods from these two models will equal the
log likelihood of the bivariate probit model; this sum is printed in the iteration log as the comparison
log likelihood.
The final iteration log is for fitting the full bivariate probit model. A likelihood-ratio test of the
log likelihood for this model and the comparison log likelihood is presented at the end of the output.
If we had specified the vce(robust) option, this test would be presented as a Wald test instead of
as a likelihood-ratio test.
We could have fit the same model by using the seemingly unrelated syntax as
. biprobit (private=years logptax loginc) (vote=years logptax loginc)
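As an aside (not part of the original example), the comparison likelihood-ratio statistic reported above can be recomputed by hand from the stored results; it should reproduce the chi2(1) = 1.38 shown in the last line of the output:
. display 2*(e(ll) - e(ll_c))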
Stored results
biprobit stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k aux) number of auxiliary parameters
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model (noskip only)
e(ll_c) log likelihood, comparison model
e(N clust) number of clusters
e(chi2) χ2
e(chi2_c) χ2 for comparison test
e(p) significance
e(rho) ρ
e(rank) rank of e(V)
e(rank0) rank of e(V) for constant-only model
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) biprobit
e(cmdline) command as typed
e(depvar) names of dependent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset for first equation
e(offset2) offset for second equation
e(chi2type) Wald or LR; type of model χ2 test
e(chi2_ct) Wald or LR; type of model χ2 test corresponding to e(chi2_c)
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variancecovariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The log likelihood, lnL, is given by

    ξ^β_j = x_j β + offset^β_j
    ξ^γ_j = z_j γ + offset^γ_j

    q_1j = 1 if y_1j ≠ 0, −1 otherwise
    q_2j = 1 if y_2j ≠ 0, −1 otherwise

    ρ*_j = q_1j q_2j ρ

    lnL = Σ_{j=1}^{n} w_j ln Φ2(q_1j ξ^β_j, q_2j ξ^γ_j, ρ*_j)

where Φ2() is the cumulative bivariate normal distribution function (with mean [0 0]′) and w_j is
an optional weight for observation j. This derivation assumes that

    y*_1j = x_j β + ε_1j + offset^β_j
    y*_2j = z_j γ + ε_2j + offset^γ_j

    E(ε_1) = E(ε_2) = 0
    Var(ε_1) = Var(ε_2) = 1
    Cov(ε_1, ε_2) = ρ

where y*_1j and y*_2j are the unobserved latent variables; instead, we observe only y_ij = 1 if
y*_ij > 0 and y_ij = 0 otherwise (for i = 1, 2).

In the maximum likelihood estimation, ρ is not directly estimated, but atanh ρ is:

    atanh ρ = (1/2) ln{(1 + ρ)/(1 − ρ)}

From the form of the likelihood, if ρ = 0, then the log likelihood for the bivariate probit models
is equal to the sum of the log likelihoods of the two univariate probit models. A likelihood-ratio test
may therefore be performed by comparing the likelihood of the full bivariate model with the sum of
the log likelihoods for the univariate probit models.
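A quick sketch of recovering ρ from the fitted model, assuming the auxiliary parameter is stored in an equation named athrho (as the /athrho line of the output in example 1 suggests):
. display tanh(_b[athrho:_cons])   // back-transform the estimated atanh ρ
. display e(rho)                   // the same value, stored directly by biprobit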
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
biprobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
De Luca, G. 2008. SNP and SML estimation of univariate and bivariate binary-choice models. Stata Journal 8:
190–220.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hardin, J. W. 1996. sg61: Bivariate probit models. Stata Technical Bulletin 33: 15–20. Reprinted in Stata Technical
Bulletin Reprints, vol. 6, pp. 152–158. College Station, TX: Stata Press.
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
Lokshin, M., and Z. Sajaia. 2011. Impact of interventions on discrete outcomes: Maximum likelihood estimation of
the binary choice models with binary endogenous regressors. Stata Journal 11: 368–385.
Pindyck, R. S., and D. L. Rubinfeld. 1998. Econometric Models and Economic Forecasts. 4th ed. New York:
McGraw–Hill.
Poirier, D. J. 1980. Partial observability in bivariate probit models. Journal of Econometrics 12: 209–217.
Van de Ven, W. P. M. M., and B. M. S. Van Praag. 1981. The demand for deductibles in private health insurance:
A probit model with sample selection. Journal of Econometrics 17: 229–252.
Also see
[R] biprobit postestimation — Postestimation tools for biprobit
[R] mprobit — Multinomial probit regression
[R] probit — Probit regression
[SVY] svy estimation — Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
biprobit postestimation — Postestimation tools for biprobit
Description Syntax for predict Menu for predict Options for predict Also see
Description
The following postestimation commands are available after biprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest¹ likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvar_eq1 newvar_eq2 newvar_athrho} [if] [in], scores
statistic Description
Main
p11 Φ2(xjb, zjg, ρ), predicted probability Pr(y1j = 1, y2j = 1); the default
p10 Φ2(xjb, −zjg, −ρ), predicted probability Pr(y1j = 1, y2j = 0)
p01 Φ2(−xjb, zjg, −ρ), predicted probability Pr(y1j = 0, y2j = 1)
p00 Φ2(−xjb, −zjg, ρ), predicted probability Pr(y1j = 0, y2j = 0)
pmarg1 Φ(xjb), marginal success probability for equation 1
pmarg2 Φ(zjg), marginal success probability for equation 2
pcond1 Φ2(xjb,zjg, ρ)/Φ(zjg), conditional probability of success for equation 1
pcond2 Φ2(xjb,zjg, ρ)/Φ(xjb), conditional probability of success for equation 2
xb1 xjb, linear prediction for equation 1
xb2 zjg, linear prediction for equation 2
stdp1 standard error of the linear prediction for equation 1
stdp2 standard error of the linear prediction for equation 2
where Φ() is the standard normal-distribution function and Φ2() is the bivariate standard
normal-distribution function.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
p11, the default, calculates the bivariate predicted probability Pr(y1j= 1, y2j=1).
p10 calculates the bivariate predicted probability Pr(y1j= 1, y2j=0).
p01 calculates the bivariate predicted probability Pr(y1j= 0, y2j=1).
p00 calculates the bivariate predicted probability Pr(y1j= 0, y2j=0).
pmarg1 calculates the univariate (marginal) predicted probability of success Pr(y1j=1).
pmarg2 calculates the univariate (marginal) predicted probability of success Pr(y2j=1).
pcond1 calculates the conditional (on success in equation 2) predicted probability of success
Pr(y1j=1, y2j=1)/Pr(y2j=1).
pcond2 calculates the conditional (on success in equation 1) predicted probability of success
Pr(y1j=1, y2j=1)/Pr(y1j=1).
xb1 calculates the probit linear prediction xjb.
xb2 calculates the probit linear prediction zjg.
stdp1 calculates the standard error of the linear prediction for equation 1.
stdp2 calculates the standard error of the linear prediction for equation 2.
nooffset is relevant only if you specified offset1(varname) or offset2(varname) for biprobit.
It modifies the calculations made by predict so that they ignore the offset variables; the linear
predictions are treated as xjb rather than as xjb + offset1j and as zjg rather than as zjg + offset2j.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(xjβ).
The second new variable will contain ∂lnL/∂(zjγ).
The third new variable will contain ∂lnL/∂(atanh ρ).
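For example, after fitting a model with biprobit, the statistics above might be requested as follows;
this is a minimal sketch, and the new variable names are arbitrary.

* a sketch of predict after biprobit (new variable names are arbitrary)
predict p11hat                       // Pr(y1=1, y2=1), the default statistic p11
predict pm1, pmarg1                  // marginal probability Pr(y1=1)
predict pc1, pcond1                  // conditional probability Pr(y1=1 | y2=1)
predict xbeq1, xb1                   // linear prediction for equation 1
predict sc1 sc2 sc3, scores          // the three equation-level score variables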
Also see
[R] biprobit — Bivariate probit regression
[U] 20 Estimation and postestimation commands
Title
bitest — Binomial probability test
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Binomial probability test
bitest varname == #p [if] [in] [weight] [, detail]
Immediate form of binomial probability test
bitesti #N #succ #p [, detail]
by is allowed with bitest; see [D] by.
fweights are allowed with bitest; see [U] 11.1.6 weight.
Menu
bitest
Statistics >Summaries, tables, and tests >Classical tests of hypotheses >Binomial probability test
bitesti
Statistics >Summaries, tables, and tests >Classical tests of hypotheses >Binomial probability test calculator
Description
bitest performs exact hypothesis tests for binomial random variables. The null hypothesis is that
the probability of a success on a trial is #p. The total number of trials is the number of nonmissing
values of varname (in bitest) or #N(in bitesti). The number of observed successes is the number
of 1s in varname (in bitest) or #succ (in bitesti). varname must contain only 0s, 1s, and missing.
bitesti is the immediate form of bitest; see [U] 19 Immediate commands for a general
introduction to immediate commands.
Option
 
Advanced
detail shows the probability of the observed number of successes, kobs; the probability of the
number of successes on the opposite tail of the distribution that is used to compute the two-sided
p-value, kopp; and the probability of the point next to kopp. This information can be safely ignored.
See the technical note below for details.
Remarks and examples
Remarks are presented under the following headings:
bitest
bitesti
bitest
Example 1
We test 15 university students for high levels of one measure of visual quickness which, from
other evidence, we believe is present in 30% of the nonuniversity population. Included in our data is
quick, taking on the values 1 (“success”) or 0 (“failure”) depending on the outcome of the test.
. use http://www.stata-press.com/data/r13/quick
. bitest quick == 0.3
Variable N Observed k Expected k Assumed p Observed p
quick 15 7 4.5 0.30000 0.46667
Pr(k >= 7) = 0.131143 (one-sided test)
Pr(k <= 7) = 0.949987 (one-sided test)
Pr(k <= 1 or k >= 7) = 0.166410 (two-sided test)
The first part of the output reveals that, assuming a true probability of success of 0.3, the expected
number of successes is 4.5 and that we observed seven. Said differently, the assumed frequency under
the null hypothesis H0is 0.3, and the observed frequency is 0.47.
The first line under the table is a one-sided test; it is the probability of observing seven or
more successes conditional on p=0.3. It is a test of H0:p=0.3 versus the alternative hypothesis
HA:p > 0.3. Said in English, the alternative hypothesis is that more than 30% of university students
score at high levels on this test of visual quickness. The p-value for this hypothesis test is 0.13.
The second line under the table is a one-sided test of H0versus the opposite alternative hypothesis
HA:p < 0.3.
The third line is the two-sided test. It is a test of H0 versus the alternative hypothesis HA: p ≠ 0.3.
Technical note
The p-value of a hypothesis test is the probability (calculated assuming H0 is true) of observing
any outcome as extreme or more extreme than the observed outcome, with extreme meaning in the
direction of the alternative hypothesis. In example 1, the outcomes k = 8, 9, ..., 15 are clearly
“more extreme” than the observed outcome kobs = 7 when considering the alternative hypothesis
HA: p ≠ 0.3. However, outcomes with only a few successes are also in the direction of this alternative
hypothesis. For two-sided hypotheses, outcomes with k successes are considered “as extreme or more
extreme” than the observed outcome kobs if Pr(k) ≤ Pr(kobs). Here Pr(k = 0) and Pr(k = 1) are
both less than Pr(k = 7), so they are included in the two-sided p-value.
The detail option allows you to see the probability (assuming that H0is true) of the observed
successes (k=7) and the probability of the boundary point (k=1) of the opposite tail used for the
two-sided p-value.
. bitest quick == 0.3, detail
Variable N Observed k Expected k Assumed p Observed p
quick 15 7 4.5 0.30000 0.46667
Pr(k >= 7) = 0.131143 (one-sided test)
Pr(k <= 7) = 0.949987 (one-sided test)
Pr(k <= 1 or k >= 7) = 0.166410 (two-sided test)
Pr(k == 7) = 0.081130 (observed)
Pr(k == 2) = 0.091560
Pr(k == 1) = 0.030520 (opposite extreme)
Also shown is the probability of the point next to the boundary point. This probability, namely,
Pr(k=2) = 0.092, is certainly close to the probability of the observed outcome Pr(k=7) = 0.081,
so some people might argue that k=2 should be included in the two-sided p-value. Statisticians
(at least some we know) would reply that the p-value is a precisely defined concept and that this
is an arbitrary “fuzzification” of its definition. When you compute exact p-values according to the
precise definition of a p-value, your type I error is never more than what you say it is so no one
can criticize you for being anticonservative. Including the point k=2 is being overly conservative
because it makes the p-value larger yet. But it is your choice; being overly conservative, at least in
statistics, is always safe. Know that bitest and bitesti always keep to the precise definition of
a p-value, so if you wish to include this extra point, you must do so by hand or by using the r()
stored results; see Stored results below.
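For readers who do want to include that extra point, the stored results make the adjustment a one-line
calculation. The following is a minimal sketch based on the detail run shown above.

* add the probability of the point next to the opposite boundary (here k = 2)
* to the reported two-sided p-value (a sketch using the r() results below)
quietly bitest quick == 0.3, detail
display "two-sided p-value as reported     = " r(p)
display "including the point next to k_opp = " r(p) + r(P_noppk)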
bitesti
Example 2
The binomial test is a function of two statistics and one parameter: N, the number of observations;
kobs, the number of observed successes; and p, the assumed probability of a success on a trial. For
instance, in a city of N=2,500,000, we observe kobs =36 cases of a particular disease when the
population rate for the disease is p=0.00001.
. bitesti 2500000 36 .00001
N Observed k Expected k Assumed p Observed p
2500000 36 25 0.00001 0.00001
Pr(k >= 36) = 0.022458 (one-sided test)
Pr(k <= 36) = 0.985448 (one-sided test)
Pr(k <= 14 or k >= 36) = 0.034859 (two-sided test)
Example 3
Boice and Monson (1977) present data on breast cancer cases and person-years of observations
for women with tuberculosis who were repeatedly exposed to multiple x-ray fluoroscopies and for
women with tuberculosis who were not. The data are
Exposed Not exposed Total
Breast cancer 41 15 56
Person-years 28,010 19,017 47,027
We can thus test whether x-ray fluoroscopic examinations are associated with breast cancer; the
assumed rate of exposure is p=28010/47027.
. bitesti 56 41 28010/47027
N Observed k Expected k Assumed p Observed p
56 41 33.35446 0.59562 0.73214
Pr(k >= 41) = 0.023830 (one-sided test)
Pr(k <= 41) = 0.988373 (one-sided test)
Pr(k <= 25 or k >= 41) = 0.040852 (two-sided test)
Stored results
bitest and bitesti store the following in r():
Scalars
r(N)          number N of trials
r(P_p)        assumed probability p of success
r(k)          observed number k of successes
r(p_l)        lower one-sided p-value
r(p_u)        upper one-sided p-value
r(p)          two-sided p-value
r(k_opp)      opposite extreme k
r(P_k)        probability of observed k (detail only)
r(P_oppk)     probability of opposite extreme k (detail only)
r(k_nopp)     k next to opposite extreme (detail only)
r(P_noppk)    probability of k next to opposite extreme (detail only)
Methods and formulas
Let N, kobs, and p be, respectively, the number of observations, the observed number of successes,
and the assumed probability of success on a trial. The expected number of successes is Np, and the
observed probability of success on a trial is kobs/N.
bitest and bitesti compute exact p-values based on the binomial distribution. The upper
one-sided p-value is

\Pr(k \ge k_{\rm obs}) = \sum_{m=k_{\rm obs}}^{N} \binom{N}{m} p^m (1-p)^{N-m}

The lower one-sided p-value is

\Pr(k \le k_{\rm obs}) = \sum_{m=0}^{k_{\rm obs}} \binom{N}{m} p^m (1-p)^{N-m}

If kobs ≥ Np, the two-sided p-value is

\Pr(k \le k_{\rm opp} \text{ or } k \ge k_{\rm obs})

where kopp is the largest number ≤ Np such that Pr(k = kopp) ≤ Pr(k = kobs). If kobs < Np,
the two-sided p-value is

\Pr(k \le k_{\rm obs} \text{ or } k \ge k_{\rm opp})

where kopp is the smallest number ≥ Np such that Pr(k = kopp) ≤ Pr(k = kobs).
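These tail probabilities can be verified directly with Stata's binomial distribution functions; the sketch
below reproduces the p-values of example 1 (N = 15, kobs = 7, p = 0.3).

* reproduce the example 1 p-values from the formulas above (a sketch)
display "upper one-sided: " binomialtail(15, 7, 0.3)                      // Pr(k >= 7)
display "lower one-sided: " binomial(15, 7, 0.3)                          // Pr(k <= 7)
display "two-sided:       " binomial(15, 1, 0.3) + binomialtail(15, 7, 0.3)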
References
Boice, J. D., Jr., and R. R. Monson. 1977. Breast cancer in women after repeated fluoroscopic examinations of the
chest. Journal of the National Cancer Institute 59: 823–832.
Hoel, P. G. 1984. Introduction to Mathematical Statistics. 5th ed. New York: Wiley.
Also see
[R] ci — Confidence intervals for means, proportions, and counts
[R] prtest — Tests of proportions
Title
bootstrap — Bootstrap sampling and estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
bootstrap exp list [, options eform option] : command
options Description
Main
reps(#)perform #bootstrap replications; default is reps(50)
Options
strata(varlist)variables identifying strata
size(#)draw samples of size #; default is N
cluster(varlist)variables identifying resampling clusters
idcluster(newvar)create new cluster ID variable
saving( filename,. . .)save results to filename; save statistics in double precision;
save results to filename every #replications
bca compute acceleration for BCaconfidence intervals
ties adjust BC/BCa confidence intervals for ties
mse use MSE formula for variance estimation
Reporting
level(#)set confidence level; default is level(95)
notable suppress table of results
noheader suppress table header
nolegend suppress table legend
verbose display the full table legend
nodots suppress replication dots
noisily display any output from command
trace trace command
title(text)use text as title for bootstrap results
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
eform option display coefficient table in exponentiated form
Advanced
nodrop do not drop observations
nowarn do not warn when e(sample) is not set
force do not check for weights or svy commands; seldom used
reject(exp)identify invalid results
seed(#)set random-number seed to #
group(varname)ID variable for groups within cluster()
jackknifeopts(jkopts)options for jackknife; see [R]jackknife
coeflegend display legend instead of statistics
weights are not allowed in command.
group(),jackknifeopts(), and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
exp list contains    (name: elist)
                     elist
                     eexp
elist contains       newvar = (exp)
                     (exp)
eexp is              specname
                     [eqno]specname
specname is          _b
                     _b[]
                     _se
                     _se[]
eqno is              ##
                     name
exp is a standard Stata expression; see [U] 13 Functions and expressions.
Distinguish between [ ], which are to be typed (as in _b[ ]), and the brackets that indicate optional arguments.
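For instance, all the following are valid forms of exp list; this is a minimal sketch, and the commands
and variables shown are arbitrary examples.

* a sketch of exp list forms (commands and variables are arbitrary examples)
bootstrap _b, reps(100) seed(1): regress mpg weight foreign
bootstrap mean=r(mean) sd=r(sd), reps(100) seed(1): summarize mpg
bootstrap tratio=(_b[weight]/_se[weight]), reps(100) seed(1): regress mpg weight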
Menu
Statistics >Resampling >Bootstrap estimation
Description
bootstrap performs bootstrap estimation. Typing
. bootstrap exp list, reps(#): command
executes command multiple times, bootstrapping the statistics in exp list by resampling observations
(with replacement) from the data in memory #times. This method is commonly referred to as the
nonparametric bootstrap.
command defines the statistical command to be executed. Most Stata commands and user-written
programs can be used with bootstrap, as long as they follow standard Stata syntax; see
[U] 11 Language syntax. If the bca option is supplied, command must also work with jackknife; see
[R]jackknife. The by prefix may not be part of command.
exp list specifies the statistics to be collected from the execution of command. If command changes
the contents in e(b), exp list is optional and defaults to _b.
Because bootstrapping is a random process, if you want to be able to reproduce results, set the
random-number seed by specifying the seed(#)option or by typing
. set seed #
where #is a seed of your choosing, before running bootstrap; see [R]set seed.
Many estimation commands allow the vce(bootstrap) option. For those commands, we rec-
ommend using vce(bootstrap) over bootstrap because the estimation command already handles
clustering and other model-specific details for you. The bootstrap prefix command is intended
for use with nonestimation commands, such as summarize, user-written commands, or functions of
coefficients.
bs and bstrap are synonyms for bootstrap.
Options
 
Main
reps(#) specifies the number of bootstrap replications to be performed. The default is 50. A total of
50–200 replications are generally adequate for estimates of standard error and thus are adequate
for normal-approximation confidence intervals; see Mooney and Duval (1993, 11). Estimates of
confidence intervals using the percentile or bias-corrected methods typically require 1,000 or more
replications.
 
Options
strata(varlist)specifies the variables that identify strata. If this option is specified, bootstrap samples
are taken independently within each stratum.
size(#)specifies the size of the samples to be drawn. The default is N, meaning to draw samples of
the same size as the data. If specified, #must be less than or equal to the number of observations
within strata().
If cluster() is specified, the default size is the number of clusters in the original dataset. For
unbalanced clusters, resulting sample sizes will differ from replication to replication. For cluster
sampling, #must be less than or equal to the number of clusters within strata().
cluster(varlist)specifies the variables that identify resampling clusters. If this option is specified,
the sample drawn during each replication is a bootstrap sample of clusters.
idcluster(newvar)creates a new variable containing a unique identifier for each resampled cluster.
This option requires that cluster() also be specified.
saving( filename,suboptions )creates a Stata data file (.dta file) consisting of (for each statistic
in exp list) a variable containing the replicates.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals. This option may be used without
the saving() option to compute the variance estimates by using double precision.
every(#)specifies that results be written to disk every #th replication. every() should be specified
only in conjunction with saving() when command takes a long time for each replication. This
option will allow recovery of partial results should some other software crash your computer.
See [P]postfile.
replace specifies that filename be overwritten if it exists. This option does not appear in the
dialog box.
bca specifies that bootstrap estimate the acceleration of each statistic in exp list. This estimate
is used to construct BCaconfidence intervals. Type estat bootstrap, bca to display the BCa
confidence interval generated by the bootstrap command.
ties specifies that bootstrap adjust for ties in the replicate values when computing the median
bias used to construct BC and BCa confidence intervals.
mse specifies that bootstrap compute the variance by using deviations of the replicates from the
observed value of the statistics based on the entire dataset. By default, bootstrap computes the
variance by using deviations from the average of the replicates.
 
Reporting
level(#); see [R]estimation options.
notable suppresses the display of the table of results.
noheader suppresses the display of the table header. This option implies nolegend. This option
may also be specified when replaying estimation results.
nolegend suppresses the display of the table legend. This option may also be specified when replaying
estimation results.
verbose specifies that the full table legend be displayed. By default, coefficients and standard errors
are not displayed. This option may also be specified when replaying estimation results.
nodots suppresses display of the replication dots. By default, one dot character is displayed for each
successful replication. A red ‘x’ is displayed if command returns an error or if one of the values
in exp list is missing.
noisily specifies that any output from command be displayed. This option implies the nodots
option.
trace causes a trace of the execution of command to be displayed. This option implies the noisily
option.
title(text)specifies a title to be displayed above the table of bootstrap results. The default title is the
title stored in e(title) by an estimation command, or if e(title) is not filled in, Bootstrap
results is used. title() may also be specified when replaying estimation results.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
eform option causes the coefficient table to be displayed in exponentiated form; see [R]eform option.
command determines which of the following are allowed (eform(string)and eform are always
allowed):
eform option Description
eform(string)use string for the column title
eform exponentiated coefficient, string is exp(b)
hr hazard ratio, string is Haz. Ratio
shr subhazard ratio, string is SHR
irr incidence-rate ratio, string is IRR
or odds ratio, string is Odds Ratio
rrr relative-risk ratio, string is RRR
 
Advanced
nodrop prevents observations outside e(sample) and the if and in qualifiers from being dropped
before the data are resampled.
nowarn suppresses the display of a warning message when command does not set e(sample).
force suppresses the restriction that command not specify weights or be a svy command. This is a
rarely used option. Use it only if you know what you are doing.
reject(exp)identifies an expression that indicates when results should be rejected. When exp is
true, the resulting values are reset to missing values.
seed(#)sets the random-number seed. Specifying this option is equivalent to typing the following
command prior to calling bootstrap:
. set seed #
The following options are available with bootstrap but are not shown in the dialog box:
group(varname)re-creates varname containing a unique identifier for each group across the resampled
clusters. This option requires that idcluster() also be specified.
This option is useful for maintaining unique group identifiers when sampling clusters with replace-
ment. Suppose that cluster 1 contains 3 groups. If the idcluster(newclid) option is specified
and cluster 1 is sampled multiple times, newclid uniquely identifies each copy of cluster 1. If
group(newgroupid) is also specified, newgroupid uniquely identifies each copy of each group.
jackknifeopts(jkopts)identifies options that are to be passed to jackknife when it computes the
acceleration values for the BCaconfidence intervals; see [R]jackknife. This option requires the
bca option and is mostly used for passing the eclass,rclass, or n(#)option to jackknife.
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Regression coefficients
Expressions
Combining bootstrap datasets
A note about macros
Achieved significance level
Bootstrapping a ratio
Warning messages and e(sample)
Bootstrapping statistics from data with a complex structure
Introduction
With few assumptions, bootstrapping provides a way of estimating standard errors and other measures
of statistical precision (Efron 1979; Efron and Stein 1981; Efron 1982; Efron and Tibshirani 1986;
Efron and Tibshirani 1993; also see Davison and Hinkley [1997]; Guan [2003]; Mooney and Duval
[1993]; Poi [2004]; and Stine [1990]). It provides a way to obtain such measures when no formula
is otherwise available or when available formulas make inappropriate assumptions. Cameron and
Trivedi (2010, chap. 13) discuss many bootstrapping topics and demonstrate how to do them in Stata.
To illustrate bootstrapping, suppose that you have a dataset containing Nobservations and an
estimator that, when applied to the data, produces certain statistics. You draw, with replacement, N
observations from the N-observation dataset. In this random drawing, some of the original observations
will appear once, some more than once, and some not at all. Using the resampled dataset, you apply
the estimator and collect the statistics. This process is repeated many times; each time, a new random
sample is drawn and the statistics are recalculated.
This process builds a dataset of replicated statistics. From these data, you can calculate the standard
error by using the standard formula for the sample standard deviation
\widehat{\mathrm{se}} = \left\{ \frac{1}{k-1} \sum_{i=1}^{k} \left(\hat\theta_i - \bar\theta\right)^2 \right\}^{1/2}

where θ̂_i is the statistic calculated using the ith bootstrap sample and k is the number of replications.
This formula gives an estimate of the standard error of the statistic, according to Hall and Wilson (1991).
Although the average, θ̄, of the bootstrapped estimates is used in calculating the standard deviation,
it is not used as the estimated value of the statistic itself. Instead, the original observed value of the
statistic, θ̂, is used, meaning the value of the statistic computed using the original N observations.
You might think that θ̄ is a better estimate of the parameter than θ̂, but it is not. If the statistic is
biased, bootstrapping exaggerates the bias. In fact, the bias can be estimated as θ̄ − θ̂ (Efron 1982, 33).
Knowing this, you might be tempted to subtract this estimate of bias from θ̂ to produce an unbiased
statistic. The bootstrap bias estimate has an indeterminate amount of random error, so this unbiased
estimator may have greater mean squared error than the biased estimator (Mooney and Duval 1993;
Hinkley 1978). Thus θ̂ is the best point estimate of the statistic.
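As a by-hand illustration of this resample-and-recalculate process and of the standard-error formula
above, the following minimal sketch bootstraps the mean of a hypothetical variable x with bsample and
post; in practice, you would let bootstrap do this work for you.

* a by-hand sketch of the resampling process (x and myreps.dta are hypothetical)
set seed 1
tempname sim
postfile `sim' meanhat using myreps, replace
forvalues i = 1/200 {
        preserve
        bsample                          // draw N observations with replacement
        quietly summarize x
        post `sim' (r(mean))             // collect the statistic for this replicate
        restore
}
postclose `sim'
use myreps, clear
summarize meanhat                        // the Std. Dev. estimates the standard error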
The logic behind the bootstrap is that all measures of precision come from a statistic’s sampling
distribution. When the statistic is estimated on a sample of size Nfrom some population, the sampling
distribution tells you the relative frequencies of the values of the statistic. The sampling distribution,
in turn, is determined by the distribution of the population and the formula used to estimate the
statistic.
Sometimes the sampling distribution can be derived analytically. For instance, if the underlying
population is distributed normally and you calculate means, the sampling distribution for the mean is
also normal but has a smaller variance than that of the population. In other cases, deriving the sampling
distribution is difficult, as when means are calculated from nonnormal populations. Sometimes, as in
the case of means, it is not too difficult to derive the sampling distribution as the sample size goes
to infinity (N→ ∞). However, such asymptotic distributions may not perform well when applied to
finite samples.
If you knew the population distribution, you could obtain the sampling distribution by simulation:
you could draw random samples of size N, calculate the statistic, and make a tally. Bootstrapping
does precisely this, but it uses the observed distribution of the sample in place of the true population
distribution. Thus the bootstrap procedure hinges on the assumption that the observed distribution
is a good estimate of the underlying population distribution. In return, the bootstrap produces an
estimate, called the bootstrap distribution, of the sampling distribution. From this, you can estimate
the standard error of the statistic, produce confidence intervals, etc.
The accuracy with which the bootstrap distribution estimates the sampling distribution depends on
the number of observations in the original sample and the number of replications in the bootstrap. A
crudely estimated sampling distribution is adequate if you are only going to extract, say, a standard
error. A better estimate is needed if you want to use the 2.5th and 97.5th percentiles of the distribution
to produce a 95% confidence interval. To extract many features simultaneously about the distribution,
an even better estimate is needed. Generally, replications on the order of 1,000 produce very good
estimates, but only 50–200 replications are needed for estimates of standard errors. See Poi (2004)
for a method to choose the number of bootstrap replications.
Regression coefficients
Example 1
Let’s say that we wish to compute bootstrap estimates for the standard errors of the coefficients
from the following regression:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight gear foreign
Source SS df MS Number of obs = 74
F( 3, 70) = 46.73
Model 1629.67805 3 543.226016 Prob > F = 0.0000
Residual 813.781411 70 11.6254487 R-squared = 0.6670
Adj R-squared = 0.6527
Total 2443.45946 73 33.4720474 Root MSE = 3.4096
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.006139 .0007949 -7.72 0.000 -.0077245 -.0045536
gear_ratio 1.457113 1.541286 0.95 0.348 -1.616884 4.53111
foreign -2.221682 1.234961 -1.80 0.076 -4.684735 .2413715
_cons 36.10135 6.285984 5.74 0.000 23.56435 48.63835
To run the bootstrap, we simply prefix the above regression command with the bootstrap command
(specifying its options before the colon separator). We must set the random-number seed before calling
bootstrap.
. bootstrap, reps(100) seed(1): regress mpg weight gear foreign
(running regress on estimation sample)
Bootstrap replications (100)
12345
.................................................. 50
.................................................. 100
Linear regression Number of obs = 74
Replications = 100
Wald chi2(3) = 111.96
Prob > chi2 = 0.0000
R-squared = 0.6670
Adj R-squared = 0.6527
Root MSE = 3.4096
Observed Bootstrap Normal-based
mpg Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.006139 .0006498 -9.45 0.000 -.0074127 -.0048654
gear_ratio 1.457113 1.297786 1.12 0.262 -1.086501 4.000727
foreign -2.221682 1.162728 -1.91 0.056 -4.500587 .0572236
_cons 36.10135 4.71779 7.65 0.000 26.85465 45.34805
The displayed confidence interval is based on the assumption that the sampling (and hence bootstrap)
distribution is approximately normal (see Methods and formulas below). Because this confidence
interval is based on the standard error, it is a reasonable estimate if normality is approximately true,
even for a few replications. Other types of confidence intervals are available after bootstrap; see
[R]bootstrap postestimation.
We could instead supply names to our expressions when we run bootstrap. For example,
. bootstrap diff=(_b[weight]-_b[gear]): regress mpg weight gear foreign
would bootstrap a statistic, named diff, equal to the difference between the coefficients on weight
and gear ratio.
Expressions
Example 2
When we use bootstrap, the list of statistics can contain complex expressions, as long as each
expression is enclosed in parentheses. For example, to bootstrap the range of a variable x, we could
type
. bootstrap range=(r(max)-r(min)), reps(1000): summarize x
Of course, we could also bootstrap the minimum and maximum and later compute the range.
. bootstrap max=r(max) min=r(min), reps(1000) saving(mybs): summarize x
. use mybs, clear
(bootstrap: summarize)
. generate range = max - min
. bstat range, stat(19.5637501)
The difference between the maximum and minimum of xin the sample is 19.5637501.
The stat() option to bstat specifies the observed value of the statistic (range) to be summarized.
This option is useful when, as shown above, the statistic of ultimate interest is not specified directly
to bootstrap but instead is calculated by other means.
Here the observed values of r(max) and r(min) are stored as characteristics of the dataset created
by bootstrap and are thus available for retrieval by bstat; see [R]bstat. The observed range,
however, is unknown to bstat, so it must be specified.
Combining bootstrap datasets
You can combine two datasets from separate runs of bootstrap by using append (see [D]append)
and then get the bootstrap statistics for the combined datasets by running bstat. The runs must
have been performed independently (having different starting random-number seeds), and the original
dataset, command, and bootstrap statistics must have been all the same.
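A minimal sketch of this workflow, continuing the range statistic from example 2 (the filenames and
seeds shown are arbitrary):

* two independent runs, then combine and summarize (a sketch)
bootstrap range=(r(max)-r(min)), reps(500) seed(1) saving(bs1): summarize x
bootstrap range=(r(max)-r(min)), reps(500) seed(2) saving(bs2): summarize x
use bs1, clear
append using bs2
bstat                    // bootstrap statistics from the combined 1,000 replicates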
A note about macros
In example 2, we executed the command
. bootstrap max=r(max) min=r(min), reps(1000) saving(mybs): summarize x
We did not enclose r(max) and r(min) in single quotes, as we would in most other contexts, because
it would not produce what was intended:
. bootstrap ‘r(max)’ ‘r(min)’, reps(1000) saving(mybs): summarize x
To understand why, note that ‘r(max)’, like any reference to a local macro, will evaluate to a literal
string containing the contents of r(max) before bootstrap is even executed. Typing the command
above would appear to Stata as if we had typed
. bootstrap 14.5441234 33.4393293, reps(1000) saving(mybs): summarize x
Even worse, the current contents of r(min) and r(max) could be empty, producing an even more
confusing result. To avoid this outcome, refer to statistics by name (for example, r(max)) and not
by value (for example, ‘r(max)’).
Achieved significance level
Example 3
Suppose that we wish to estimate the achieved significance level (ASL) of a test statistic by using
the bootstrap. ASL is another name for p-value. An example is
\mathrm{ASL} = \Pr\!\left(\hat\theta^* \ge \hat\theta \mid H_0\right)

for an upper-tailed alternative hypothesis, where H0 denotes the null hypothesis, θ̂ is the observed
value of the test statistic, and θ̂* is the random variable corresponding to the test statistic, assuming
that H0 is true.
Here we will compare the mean miles per gallon (mpg) between foreign and domestic cars by
using the two-sample ttest with unequal variances. The following results indicate the p-value to be
0.0034 for the two-sided test using Satterthwaite’s approximation. Thus assuming that mean mpg is
the same for foreign and domestic cars, we would expect to observe a tstatistic more extreme (in
absolute value) than 3.1797 in about 0.3% of all possible samples of the type that we observed.
Thus we have evidence to reject the null hypothesis that the means are equal.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ttest mpg, by(foreign) unequal
Two-sample t test with unequal variances
Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
Domestic 52 19.82692 .657777 4.743297 18.50638 21.14747
Foreign 22 24.77273 1.40951 6.611187 21.84149 27.70396
combined 74 21.2973 .6725511 5.785503 19.9569 22.63769
diff -4.945804 1.555438 -8.120053 -1.771556
diff = mean(Domestic) - mean(Foreign) t = -3.1797
Ho: diff = 0 Satterthwaite’s degrees of freedom = 30.5463
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0017 Pr(|T| > |t|) = 0.0034 Pr(T > t) = 0.9983
We also place the value of the test statistic in a scalar for later use.
. scalar tobs = r(t)
Efron and Tibshirani (1993, 224) describe an alternative to Satterthwaite’s approximation that
estimates the ASL by bootstrapping the statistic from the test of equal means. Their idea is to recenter
the two samples to the combined sample mean so that the data now conform to the null hypothesis
but that the variances within the samples remain unchanged.
. summarize mpg, meanonly
. scalar omean = r(mean)
. summarize mpg if foreign==0, meanonly
. replace mpg = mpg - r(mean) + scalar(omean) if foreign==0
mpg was int now float
(52 real changes made)
. summarize mpg if foreign==1, meanonly
. replace mpg = mpg - r(mean) + scalar(omean) if foreign==1
(22 real changes made)
. sort foreign
. by foreign: summarize mpg
-> foreign = Domestic
Variable Obs Mean Std. Dev. Min Max
mpg 52 21.2973 4.743297 13.47037 35.47038
-> foreign = Foreign
Variable Obs Mean Std. Dev. Min Max
mpg 22 21.2973 6.611187 10.52457 37.52457
Each sample (foreign and domestic) is a stratum, so the bootstrapped samples must have the same
number of foreign and domestic cars as the original dataset. This requirement is facilitated by the
strata() option to bootstrap. By typing the following, we bootstrap the test statistic using the
modified dataset and save the values in bsauto2.dta:
. keep mpg foreign
. set seed 1
. bootstrap t=r(t), rep(1000) strata(foreign) saving(bsauto2) nodots: ttest mpg,
> by(foreign) unequal
Warning: Because ttest is not an estimation command or does not set
e(sample), bootstrap has no way to determine which observations are
used in calculating the statistics and so assumes that all
observations are used. This means that no observations will be
excluded from the resampling because of missing values or other
reasons.
If the assumption is not true, press Break, save the data, and drop
the observations that are to be excluded. Be sure that the dataset
in memory contains only the relevant data.
Bootstrap results
Number of strata = 2 Number of obs = 74
Replications = 1000
command: ttest mpg, by(foreign) unequal
t: r(t)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
t 1.75e-07 1.036437 0.00 1.000 -2.031379 2.031379
We can use the data in bsauto2.dta to estimate ASL via the fraction of bootstrap test statistics
that are more extreme than 3.1797.
. use bsauto2, clear
(bootstrap: ttest)
. generate indicator = abs(t)>=abs(scalar(tobs))
. summarize indicator, meanonly
. display "ASLboot = " r(mean)
ASLboot = .005
The result is ASLboot =0.005. Assuming that the mean mpg is the same between foreign and
domestic cars, we would expect to observe a t statistic more extreme (in absolute value) than 3.1797
in about 0.5% of all possible samples of the type we observed. This finding is still strong evidence
to reject the hypothesis that the means are equal.
Bootstrapping a ratio
Example 4
Suppose that we wish to produce a bootstrap estimate of the ratio of two means. Because summarize
stores results for only one variable, we must call summarize twice to compute the means. Actually,
we could use collapse to compute the means in one call, but calling summarize twice is much
faster. Thus we will have to write a small program that will return the results we want.
We write the program below and save it to a file called ratio.ado (see [U] 17 Ado-files). Our
program takes two variable names as input and saves them in the local macros `y' (first variable)
and `x' (second variable). It then computes one statistic: the mean of `y' divided by the mean of
`x'. This value is returned as a scalar in r(ratio). The program also returns the number of
observations used to compute the mean for each variable.
program myratio, rclass
        version 13
        args y x
        confirm var `y'
        confirm var `x'
        tempname ymean yn
        summarize `y', meanonly
        scalar `ymean' = r(mean)
        return scalar n_`y' = r(N)
        summarize `x', meanonly
        return scalar n_`x' = r(N)
        return scalar ratio = `ymean'/r(mean)
end
Remember to test any newly written commands before using them with bootstrap.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. summarize price
Variable Obs Mean Std. Dev. Min Max
price 74 6165.257 2949.496 3291 15906
. scalar mean1=r(mean)
. summarize weight
Variable Obs Mean Std. Dev. Min Max
weight 74 3019.459 777.1936 1760 4840
. scalar mean2=r(mean)
. di scalar(mean1)/scalar(mean2)
2.0418412
. myratio price weight
. return list
scalars:
r(ratio) = 2.041841210168278
r(n_weight) = 74
r(n_price) = 74
The results of running bootstrap on our program are
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. set seed 1
. bootstrap ratio=r(ratio), reps(1000) nowarn nodots: myratio price weight
Bootstrap results Number of obs = 74
Replications = 1000
command: myratio price weight
ratio: r(ratio)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio 2.041841 .0942932 21.65 0.000 1.85703 2.226652
As mentioned previously, we should specify the saving() option if we wish to save the bootstrap
dataset.
Warning messages and e(sample)
bootstrap is not meant to be used with weighted calculations. bootstrap determines the presence
of weights by parsing the prefixed command with standard syntax. However, commands like stcox
and streg require that weights be specified in stset, and some user commands may allow weights
to be specified by using an option instead of the standard syntax. Both cases pose a problem for
bootstrap because it cannot determine the presence of weights under these circumstances. In these
cases, we can only assume that you know what you are doing.
bootstrap does not know which variables of the dataset in memory matter to the calculation at
hand. You can speed their execution by dropping unnecessary variables because, otherwise, they are
included in each bootstrap sample.
You should thus drop observations with missing values. Leaving in missing values causes no
problem in one sense because all Stata commands deal with missing values gracefully. It does,
however, cause a statistical problem. Bootstrap sampling is defined as drawing, with replacement,
samples of size Nfrom a set of Nobservations. bootstrap determines Nby counting the number
of observations in memory, not counting the number of nonmissing values on the relevant variables.
The result is that too many observations are resampled; the resulting bootstrap samples, because they
are drawn from a population with missing values, are of unequal sizes.
If the number of missing values relative to the sample size is small, this will make little difference.
If you have many missing values, however, you should first drop the observations that contain them.
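For the ratio example above, that cleanup might look like the following sketch:

* keep only the variables used and drop observations with missing values (a sketch)
keep price weight
drop if missing(price, weight)
bootstrap ratio=r(ratio), reps(1000) seed(1) nodots: myratio price weight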
Example 5
To illustrate, we use the previous example but replace some of the values of price with missing
values. The number of values of price used to compute the mean for each bootstrap is not constant.
This is the purpose of the Warning message.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. replace price = . if inlist(_n,1,3,5,7)
(4 real changes made, 4 to missing)
. set seed 1
. bootstrap ratio=r(ratio) np=r(n_price) nw=r(n_weight), reps(100) nodots:
> myratio price weight
Warning: Because myratio is not an estimation command or does not set
e(sample), bootstrap has no way to determine which observations are
used in calculating the statistics and so assumes that all
observations are used. This means that no observations will be
excluded from the resampling because of missing values or other
reasons.
If the assumption is not true, press Break, save the data, and drop
the observations that are to be excluded. Be sure that the dataset
in memory contains only the relevant data.
Bootstrap results Number of obs = 74
Replications = 100
command: myratio price weight
ratio: r(ratio)
np: r(n_price)
nw: r(n_weight)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio 2.063051 .0893669 23.09 0.000 1.887896 2.238207
np 70 1.872178 37.39 0.000 66.3306 73.6694
nw 74 . . . . .
Bootstrapping statistics from data with a complex structure
Here we describe how to bootstrap statistics from data with a complex structure, for example,
longitudinal or panel data, or matched data. bootstrap, however, is not designed to work with
complex survey data. It is important to include all necessary information about the structure of the
data in the bootstrap syntax to obtain correct bootstrap estimates for standard errors and confidence
intervals.
bootstrap offers several options identifying the specifics of the data. These options are strata(),
cluster(),idcluster(), and group(). The usage of strata() was described in example 3 above.
Below we demonstrate several examples that require specifying the other three options.
Example 6
Suppose that the auto data in example 1 above are clustered by rep78. We want to obtain
bootstrap estimates for the standard errors of the difference between the coefficients on weight and
gear ratio, taking into account clustering.
We supply the cluster(rep78) option to bootstrap to request resampling from clusters rather
than from observations in the dataset.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. keep if rep78<.
(5 observations deleted)
. bootstrap diff=(_b[weight]-_b[gear]), seed(1) cluster(rep78): regress mpg
> weight gear foreign
(running regress on estimation sample)
Bootstrap replications (50)
12345
.................................................. 50
Linear regression Number of obs = 69
Replications = 50
command: regress mpg weight gear foreign
diff: _b[weight]-_b[gear]
(Replications based on 5 clusters in rep78)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
diff -1.910396 1.876778 -1.02 0.309 -5.588812 1.768021
We drop missing values in rep78 before issuing the command because bootstrap does not allow
missing values in cluster(). See the section above about using bootstrap when variables contain
missing values.
We can also obtain these same results by using the following syntax:
. bootstrap diff=(_b[weight]-_b[gear]), seed(1): regress mpg weight gear foreign,
> vce(cluster rep78)
When only clustered information is provided to the command, bootstrap can pick up the
vce(cluster clustvar)option from the main command and use it to resample from clusters.
Example 7
Suppose now that we have matched data and want to use bootstrap to obtain estimates of the
standard errors of the exponentiated difference between two coefficients (or, equivalently, the ratio
of two odds ratios) estimated by clogit. Consider the example of matched case–control data on
birthweight of infants described in example 2 of [R] clogit.
The infants are paired by being matched on mother’s age. All groups, defined by the pairid
variable, have 1:1 matching. clogit requires that the matching information, pairid, be supplied to
the group() (or, equivalently, strata()) option to be used in computing the parameter estimates.
Because the data are matched, we need to resample from groups rather than from the whole
dataset. However, simply supplying the grouping variable pairid in cluster() is not enough with
bootstrap, as it is with clustered data.
. use http://www.stata-press.com/data/r13/lowbirth2, clear
(Applied Logistic Regression, Hosmer & Lemeshow)
. bootstrap ratio=exp(_b[smoke]-_b[ptd]), seed(1) cluster(pairid): clogit low
> lwt smoke ptd ht ui i.race, group(pairid)
(running clogit on estimation sample)
Bootstrap replications (50)
12345
.................................................. 50
Bootstrap results Number of obs = 112
Replications = 50
command: clogit low lwt smoke ptd ht ui i.race, group(pairid)
ratio: exp(_b[smoke]-_b[ptd])
(Replications based on 56 clusters in pairid)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio .6654095 17.71791 0.04 0.970 -34.06106 35.39187
For the syntax above, imagine that the first pair was sampled twice during a replication. Then the
bootstrap sample has four subjects with pairid equal to one, which clearly violates the original 1:1
matching design. As a result, the estimates of the coefficients obtained from this bootstrap sample
will be incorrect.
Therefore, in addition to resampling from groups, we need to ensure that resampled groups are
uniquely identified in each of the bootstrap samples. The idcluster(newcluster)option is designed
for this. It requests that at each replication bootstrap create the new variable, newcluster, containing
unique identifiers for all resampled groups. Thus, to make sure that the correct matching is preserved
during each replication, we need to specify the grouping variable in cluster(), supply a variable
name to idcluster(), and use this variable as the grouping variable with clogit, as we demonstrate
below.
. bootstrap ratio=exp(_b[smoke]-_b[ptd]), seed(1) cluster(pairid)
> idcluster(newpairid): clogit low lwt smoke ptd ht ui i.race, group(newpairid)
(running clogit on estimation sample)
Bootstrap replications (50)
12345
.................................................. 50
Bootstrap results Number of obs = 112
Replications = 50
command: clogit low lwt smoke ptd ht ui i.race, group(newpairid)
ratio: exp(_b[smoke]-_b[ptd])
(Replications based on 56 clusters in pairid)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio .6654095 7.919441 0.08 0.933 -14.85641 16.18723
Note the difference between the estimates of the bootstrap standard error for the two specifications
of the bootstrap syntax.
Technical note
Similarly, when you have panel (longitudinal) data, all resampled panels must be unique
in each of the bootstrap samples to obtain correct bootstrap estimates of statistics. Therefore,
both cluster(panelvar)and idcluster(newpanelvar)must be specified with bootstrap, and
i(newpanelvar)must be used with the main command. Moreover, you must clear the current xtset
settings by typing xtset, clear before calling bootstrap.
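A minimal sketch of that setup for hypothetical panel data with panel variable id follows; the model
is arbitrary and assumes that the estimation command accepts the i() option, as noted above.

* a sketch for panel data (id, y, and x are hypothetical placeholders)
xtset, clear
bootstrap _b, reps(100) seed(1) cluster(id) idcluster(newid): xtreg y x, i(newid) fe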
Example 8
Continuing with our birthweight data, suppose that we have more information about doctors
supervising women’s pregnancies. We believe that the data on the pairs of infants from the same
doctor may be correlated and want to adjust standard errors for possible correlation among the pairs.
clogit offers the vce(cluster clustvar)option to do this.
Let’s add a cluster variable to our dataset. One thing to keep in mind is that to use vce(cluster
clustvar), groups in group() must be nested within clusters.
. use http://www.stata-press.com/data/r13/lowbirth2, clear
(Applied Logistic Regression, Hosmer & Lemeshow)
. set seed 12345
. by pairid, sort: egen byte doctor = total(int(2*runiform()+1)*(_n == 1))
. clogit low lwt smoke ptd ht ui i.race, group(pairid) vce(cluster doctor)
Iteration 0: log pseudolikelihood = -26.768693
Iteration 1: log pseudolikelihood = -25.810476
Iteration 2: log pseudolikelihood = -25.794296
Iteration 3: log pseudolikelihood = -25.794271
Iteration 4: log pseudolikelihood = -25.794271
Conditional (fixed-effects) logistic regression Number of obs = 112
Wald chi2(1) = .
Prob > chi2 = .
Log pseudolikelihood = -25.794271 Pseudo R2 = 0.3355
(Std. Err. adjusted for 2 clusters in doctor)
Robust
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
lwt -.0183757 .0217802 -0.84 0.399 -.0610641 .0243128
smoke 1.400656 .0085545 163.73 0.000 1.38389 1.417423
ptd 1.808009 .938173 1.93 0.054 -.0307765 3.646794
ht 2.361152 1.587013 1.49 0.137 -.7493362 5.47164
ui 1.401929 .8568119 1.64 0.102 -.2773913 3.08125
race
black .5713643 .0672593 8.49 0.000 .4395385 .7031902
other -.0253148 .9149785 -0.03 0.978 -1.81864 1.76801
To obtain correct bootstrap standard errors of the exponentiated difference between the two
coefficients in this example, we need to make sure that both resampled clusters and groups within
resampled clusters are unique in each of the bootstrap samples. To achieve this, bootstrap needs
the information about clusters in cluster(), the variable name of the new identifier for clusters
in idcluster(), and the information about groups in group(). We demonstrate the corresponding
syntax of bootstrap below.
. bootstrap ratio=exp(_b[smoke]-_b[ptd]), seed(1) cluster(doctor)
> idcluster(uidoctor) group(pairid): clogit low lwt smoke ptd ht ui i.race,
> group(pairid)
(running clogit on estimation sample)
Bootstrap replications (50)
12345
.................................................. 50
Bootstrap results Number of obs = 112
Replications = 50
command: clogit low lwt smoke ptd ht ui i.race, group(pairid)
ratio: exp(_b[smoke]-_b[ptd])
(Replications based on 2 clusters in doctor)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio .6654095 .3156251 2.11 0.035 .0467956 1.284023
In the above syntax, although we specify group(pairid) with clogit, it is not the group identifiers
of the original pairid variable that are used to compute parameter estimates from bootstrap samples.
The way bootstrap works is that, at each replication, the clusters defined by doctor are resampled
and the new variable, uidoctor, uniquely identifying resampled clusters is created. After that, another
new variable uniquely identifying the (uidoctor,group) combination is created and renamed to
have the same name as the grouping variable, pairid. This newly defined grouping variable is then
used by clogit to obtain the parameter estimates from this bootstrap sample of clusters. After all
replications are performed, the original values of the grouping variable are restored.
Technical note
The same logic must be used when running bootstrap with commands designed for panel (longi-
tudinal) data that allow specifying the cluster(clustervar)option. To ensure that the combination of
(clustervar,panelvar) values are unique in each of the bootstrap samples, cluster(clustervar),id-
cluster(newclustervar), and group(panelvar)must be specified with bootstrap, and i(panelvar)
must be used with the main command.
 
Bradley Efron was born in 1938 in Minnesota and studied mathematics and statistics at Caltech
and Stanford; he has lived in northern California since 1960. He has worked on empirical Bayes,
survival analysis, exponential families, bootstrap and jackknife methods, and confidence intervals,
in conjunction with applied work in biostatistics, astronomy, and physics.
Efron is a member of the U.S. National Academy of Sciences and was awarded the U.S. National
Medal of Science in 2005. He is by any standards one of the world’s leading statisticians:
his work ranges from deep and elegant contributions in theoretical statistics to pathbreaking
involvement in a variety of practical applications.
 
Stored results
bootstrap stores the following in e():
Scalars
e(N)                sample size
e(N_reps)           number of complete replications
e(N_misreps)        number of incomplete replications
e(N_strata)         number of strata
e(N_clust)          number of clusters
e(k_eq)             number of equations in e(b)
e(k_exp)            number of standard expressions
e(k_eexp)           number of extended expressions (i.e., _b)
e(k_extra)          number of extra equations beyond the original ones from e(b)
e(level)            confidence level for bootstrap CIs
e(bs_version)       version for bootstrap results
e(rank)             rank of e(V)
Macros
e(cmdname)          command name from command
e(cmd)              same as e(cmdname) or bootstrap
e(command)          command
e(cmdline)          command as typed
e(prefix)           bootstrap
e(title)            title in estimation output
e(strata)           strata variables
e(cluster)          cluster variables
e(seed)             initial random-number seed
e(size)             from the size(#) option
e(exp#)             expression for the #th statistic
e(ties)             ties, if specified
e(mse)              mse, if specified
e(vce)              bootstrap
e(vcetype)          title used to label Std. Err.
e(properties)       b V
Matrices
e(b)                observed statistics
e(b_bs)             bootstrap estimates
e(reps)             number of nonmissing results
e(bias)             estimated biases
e(se)               estimated standard errors
e(z0)               median biases
e(accel)            estimated accelerations
e(ci_normal)        normal-approximation CIs
e(ci_percentile)    percentile CIs
e(ci_bc)            bias-corrected CIs
e(ci_bca)           bias-corrected and accelerated CIs
e(V)                bootstrap variance–covariance matrix
e(V_modelbased)     model-based variance
When exp list is _b, bootstrap will also carry forward most of the results already in e() from
command.
Methods and formulas
Let $\hat\theta$ be the observed value of the statistic, that is, the value of the statistic calculated with the original dataset. Let $i = 1, 2, \ldots, k$ denote the bootstrap samples, and let $\hat\theta_i$ be the value of the statistic from the $i$th bootstrap sample.

When the mse option is specified, the standard error is estimated as
\[
\widehat{\rm se}_{\rm MSE} = \left\{ \frac{1}{k} \sum_{i=1}^{k} (\hat\theta_i - \hat\theta)^2 \right\}^{1/2}
\]
Otherwise, the standard error is estimated as
\[
\widehat{\rm se} = \left\{ \frac{1}{k-1} \sum_{i=1}^{k} (\hat\theta_i - \bar\theta)^2 \right\}^{1/2}
\]
where
\[
\bar\theta = \frac{1}{k} \sum_{i=1}^{k} \hat\theta_i
\]
The variance–covariance matrix is similarly computed. The bias is estimated as
\[
\widehat{\rm bias} = \bar\theta - \hat\theta
\]
Confidence intervals with nominal coverage rates $1-\alpha$ are calculated according to the following formulas. The normal-approximation method yields the confidence intervals
\[
\left[\, \hat\theta - z_{1-\alpha/2}\,\widehat{\rm se},\ \ \hat\theta + z_{1-\alpha/2}\,\widehat{\rm se} \,\right]
\]
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. If the mse option is specified, bootstrap will report the normal confidence interval using $\widehat{\rm se}_{\rm MSE}$ instead of $\widehat{\rm se}$. estat bootstrap only uses $\widehat{\rm se}$ in the normal confidence interval.

The percentile method yields the confidence intervals
\[
\left[\, \theta^*_{\alpha/2},\ \theta^*_{1-\alpha/2} \,\right]
\]
where $\theta^*_p$ is the $p$th quantile (the $100p$th percentile) of the bootstrap distribution $(\hat\theta_1, \ldots, \hat\theta_k)$.

Let
\[
z_0 = \Phi^{-1}\{\, \#(\hat\theta_i \le \hat\theta)/k \,\}
\]
where $\#(\hat\theta_i \le \hat\theta)$ is the number of elements of the bootstrap distribution that are less than or equal to the observed statistic and $\Phi$ is the standard cumulative normal. $z_0$ is known as the median bias of $\hat\theta$. When the ties option is specified, $z_0$ is estimated using the count $\#(\hat\theta_i < \hat\theta) + \#(\hat\theta_i = \hat\theta)/2$, which is the number of elements of the bootstrap distribution that are less than the observed statistic plus half the number of elements that are equal to the observed statistic.

Let
\[
a = \frac{\sum_{i=1}^{n} \bigl(\bar\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^3}
         {6\left\{\sum_{i=1}^{n} \bigl(\bar\theta_{(\cdot)} - \hat\theta_{(i)}\bigr)^2\right\}^{3/2}}
\]
where $\hat\theta_{(i)}$ are the leave-one-out (jackknife) estimates of $\hat\theta$ and $\bar\theta_{(\cdot)}$ is their mean. This expression is known as the jackknife estimate of acceleration for $\hat\theta$. Let
\[
p_1 = \Phi\left\{ z_0 + \frac{z_0 - z_{1-\alpha/2}}{1 - a(z_0 - z_{1-\alpha/2})} \right\}
\qquad
p_2 = \Phi\left\{ z_0 + \frac{z_0 + z_{1-\alpha/2}}{1 - a(z_0 + z_{1-\alpha/2})} \right\}
\]
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the normal distribution. The bias-corrected and accelerated (BCa) method yields confidence intervals
\[
\left[\, \theta^*_{p_1},\ \theta^*_{p_2} \,\right]
\]
where $\theta^*_p$ is the $p$th quantile of the bootstrap distribution as defined previously. The bias-corrected (but not accelerated) method is a special case of BCa with $a = 0$.
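As a concrete check of the standard-error and normal-approximation formulas above, the following do-file fragment (an illustrative sketch, not part of the manual's examples; it mirrors the auto-dataset command used in the postestimation example) recomputes both quantities from a saved bootstrap dataset.
use http://www.stata-press.com/data/r13/auto, clear
set seed 1
bootstrap _b, reps(1000) saving(bsauto, replace): regress mpg weight gear foreign
scalar theta_obs = _b[weight]            // observed statistic, theta-hat
use bsauto, clear
summarize _b_weight                      // bootstrap replicates theta-hat_i
scalar se_boot = r(sd)                   // matches the 1/(k-1) formula above
display "normal 95% CI: " theta_obs - invnormal(.975)*se_boot ///
        "  to  " theta_obs + invnormal(.975)*se_boot
The standard deviation of the replicates is exactly the bootstrap standard error, whereas the normal-approximation interval is centered at the observed statistic, not at the bootstrap mean.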
References
Ångquist, L. 2010. Stata tip 92: Manual implementation of permutations and bootstraps. Stata Journal 10: 686–688.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
Efron, B. 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics 7: 1–26.
———. 1982. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B., and C. Stein. 1981. The jackknife estimate of variance. Annals of Statistics 9: 586–596.
Efron, B., and R. J. Tibshirani. 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1: 54–77.
———. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC.
Field, C. A., and A. H. Welsh. 2007. Bootstrapping clustered data. Journal of the Royal Statistical Society, Series B 69: 369–390.
Gleason, J. R. 1997. ip18: A command for randomly resampling a dataset. Stata Technical Bulletin 37: 17–22. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 77–83. College Station, TX: Stata Press.
———. 1999. ip18.1: Update to resample. Stata Technical Bulletin 52: 9–10. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 119. College Station, TX: Stata Press.
Gould, W. W. 1994. ssi6.2: Faster and easier bootstrap estimation. Stata Technical Bulletin 21: 24–33. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 211–223. College Station, TX: Stata Press.
Guan, W. 2003. From the help desk: Bootstrapped standard errors. Stata Journal 3: 71–80.
Hall, P., and S. R. Wilson. 1991. Two guidelines for bootstrap hypothesis testing. Biometrics 47: 757–762.
Hamilton, L. C. 1991. ssi2: Bootstrap programming. Stata Technical Bulletin 4: 18–27. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 208–220. College Station, TX: Stata Press.
———. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
———. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hinkley, D. V. 1978. Improving the jackknife with special reference to correlation estimation. Biometrika 65: 13–22.
Holmes, S., C. Morris, and R. J. Tibshirani. 2003. Bradley Efron: A conversation with good friends. Statistical Science 18: 268–281.
Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury Park, CA: Sage.
Ng, E. S.-W., R. Grieve, and J. R. Carpenter. 2013. Two-stage nonparametric bootstrap sampling with shrinkage correction for clustered data. Stata Journal 13: 141–164.
Poi, B. P. 2004. From the help desk: Some bootstrapping techniques. Stata Journal 4: 312–328.
Royston, P., and W. Sauerbrei. 2009. Bootstrap assessment of the stability of multivariable models. Stata Journal 9: 547–570.
Stine, R. 1990. An introduction to bootstrap methods: Examples and ideas. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long, 353–373. Newbury Park, CA: Sage.
Also see
[R] bootstrap postestimation — Postestimation tools for bootstrap
[R] jackknife — Jackknife estimation
[R] permute — Monte Carlo permutation tests
[R] simulate — Monte Carlo simulations
[SVY] svy bootstrap — Bootstrap for survey data
[U] 13.5 Accessing coefficients and standard errors
[U] 13.6 Accessing results from Stata commands
[U] 20 Estimation and postestimation commands
Title
bootstrap postestimation — Postestimation tools for bootstrap
Description Syntax for predict Syntax for estat bootstrap
Menu for estat Options for estat bootstrap Remarks and examples
Also see
Description
The following postestimation command is of special interest after bootstrap:
Command Description
estat bootstrap percentile-based and bias-corrected CI tables
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
This postestimation command is allowed if it may be used after command.
Special-interest postestimation command
estat bootstrap displays a table of confidence intervals for each statistic from a bootstrap
analysis.
Syntax for predict
The syntax of predict (and even if predict is allowed) following bootstrap depends upon
the command used with bootstrap. If predict is not allowed, neither is predictnl.
Syntax for estat bootstrap
estat bootstrap [, options]

options          Description
  bc             bias-corrected CIs; the default
  bca            bias-corrected and accelerated (BCa) CIs
  normal         normal-based CIs
  percentile     percentile CIs
  all            all available CIs
  noheader       suppress table header
  nolegend       suppress table legend
  verbose        display the full table legend

bc, bca, normal, and percentile may be used together.
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat bootstrap
bc is the default and displays bias-corrected confidence intervals.
bca displays bias-corrected and accelerated confidence intervals. This option assumes that you also
specified the bca option on the bootstrap prefix command.
normal displays normal approximation confidence intervals.
percentile displays percentile confidence intervals.
all displays all available confidence intervals.
noheader suppresses display of the table header. This option implies nolegend.
nolegend suppresses display of the table legend, which identifies the rows of the table with the
expressions they represent.
verbose requests that the full table legend be displayed.
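For instance, after a bootstrap-prefixed estimation such as the one in example 1 below, one might request only the percentile and normal-based tables; the line below is an illustrative sketch, and any subset of the CI options may be combined.
. estat bootstrap, percentile normal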
Remarks and examples
Example 1
The estat bootstrap postestimation command produces a table containing the observed value
of the statistic, an estimate of its bias, the bootstrap standard error, and up to four different confidence
intervals.
If we were interested merely in getting bootstrap standard errors for the model coefficients, we
could use the bootstrap prefix with our estimation command. If we were interested in performing
a thorough bootstrap analysis of the model coefficients, we could use the estat bootstrap
postestimation command after fitting the model with the bootstrap prefix.
Using example 1 from [R] bootstrap, we need many more replications for the confidence interval types other than the normal based, so let's rerun the estimation command. We will reset the random-number seed (in case we wish to reproduce the results), increase the number of replications, and save the bootstrap distribution as a dataset called bsauto.dta.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. set seed 1
. bootstrap _b, reps(1000) saving(bsauto) bca: regress mpg weight gear foreign
(output omitted )
. estat bootstrap, all
Linear regression Number of obs = 74
Replications = 1000
Observed Bootstrap
mpg Coef. Bias Std. Err. [95% Conf. Interval]
weight -.00613903 .0000567 .000628 -.0073699 -.0049082 (N)
-.0073044 -.0048548 (P)
-.0074355 -.004928 (BC)
-.0075282 -.0050258 (BCa)
gear_ratio 1.4571134 .1051696 1.4554785 -1.395572 4.309799 (N)
-1.262111 4.585372 (P)
-1.523927 4.174376 (BC)
-1.492223 4.231356 (BCa)
foreign -2.2216815 -.0196361 1.2023286 -4.578202 .1348393 (N)
-4.442199 .2677989 (P)
-4.155504 .6170642 (BC)
-4.216531 .5743973 (BCa)
_cons 36.101353 -.502281 5.4089441 25.50002 46.70269 (N)
24.48569 46.07086 (P)
25.59799 46.63227 (BC)
25.85658 47.02108 (BCa)
(N) normal confidence interval
(P) percentile confidence interval
(BC) bias-corrected confidence interval
(BCa) bias-corrected and accelerated confidence interval
The estimated standard errors here differ from our previous estimates using only 100 replications by, respectively, 8%, 3%, 11%, and 6%; see example 1 of [R] bootstrap. So much for our advice that 50–200 replications are good enough to estimate standard errors. Well, the more replications the better; that advice you should believe.
Which of the methods to compute confidence intervals should we use? If the statistic is unbiased,
the percentile (P) and bias-corrected (BC) methods should give similar results. The bias-corrected
confidence interval will be the same as the percentile confidence interval when the observed value of
the statistic is equal to the median of the bootstrap distribution. Thus, for unbiased statistics, the two
methods should give similar results as the number of replications becomes large. For biased statistics,
the bias-corrected method should yield confidence intervals with better coverage probability (closer
to the nominal value of 95% or whatever was specified) than the percentile method. For statistics
with variances that vary as a function of the parameter of interest, the bias-corrected and accelerated
method (BCa) will typically have better coverage probability than the others.
When the bootstrap distribution is approximately normal, all of these methods should give similar
confidence intervals as the number of replications becomes large. If we examine the normality of
these bootstrap distributions using, say, the pnorm command (see [R] diagnostic plots), we see that
they closely follow a normal distribution. Thus here, the normal approximation would also be a valid
choice. The chief advantage of the normal-approximation method is that it (supposedly) requires fewer
replications than the other methods. Of course, it should be used only when the bootstrap distribution
exhibits normality.
We can load bsauto.dta containing the bootstrap distributions for these coefficients:
. use bsauto
(bootstrap: regress)
. describe *
storage display value
variable name type format label variable label
_b_weight float %9.0g _b[weight]
_b_gear_ratio float %9.0g _b[gear_ratio]
_b_foreign float %9.0g _b[foreign]
_b_cons float %9.0g _b[_cons]
We can now run other commands, such as pnorm, on the bootstrap distributions. As with all standard estimation commands, we can use the bootstrap command to replay its output table. The default variable names assigned to the statistics in exp_list are _bs_1, _bs_2, . . . , and each variable is labeled with the associated expression. The naming convention for the extended expressions _b and _se is to prepend _b_ and _se_, respectively, onto the name of each element of the coefficient vector. Here the first coefficient is _b[weight], so bootstrap named it _b_weight.
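For example, a quick normality check of the replicates for the weight coefficient might look like the line below (an illustrative sketch; pnorm is documented in [R] diagnostic plots):
. pnorm _b_weight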
Also see
[R] bootstrap — Bootstrap sampling and estimation
[U] 20 Estimation and postestimation commands
Title
boxcox — Box–Cox regression models
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
boxcox depvar [indepvars] [if] [in] [weight] [, options]
options                  Description

Model
  noconstant             suppress constant term
  model(lhsonly)         left-hand-side Box–Cox model; the default
  model(rhsonly)         right-hand-side Box–Cox model
  model(lambda)          both-sides Box–Cox model with same parameter
  model(theta)           both-sides Box–Cox model with different parameters
  notrans(varlist)       nontransformed independent variables

Reporting
  level(#)               set confidence level; default is level(95)
  lrtest                 perform likelihood-ratio test

Maximization
  nolog                  suppress full-model iteration log
  nologlr                suppress restricted-model lrtest iteration log
  maximize_options       control the maximization process; seldom used
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights and iweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Box-Cox regression
Description
boxcox finds the maximum likelihood estimates of the parameters of the Box–Cox transform, the
coefficients on the independent variables, and the standard deviation of the normally distributed errors
for a model in which depvar is regressed on indepvars. You can fit the following models:
Option               Estimates
lhsonly              $y_j^{(\theta)} = \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + \epsilon_j$
rhsonly              $y_j = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \epsilon_j$
rhsonly notrans()    $y_j = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \gamma_1 z_{1j} + \cdots + \gamma_l z_{lj} + \epsilon_j$
lambda               $y_j^{(\lambda)} = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \epsilon_j$
lambda notrans()     $y_j^{(\lambda)} = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \gamma_1 z_{1j} + \cdots + \gamma_l z_{lj} + \epsilon_j$
theta                $y_j^{(\theta)} = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \epsilon_j$
theta notrans()      $y_j^{(\theta)} = \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \gamma_1 z_{1j} + \cdots + \gamma_l z_{lj} + \epsilon_j$
Any variable to be transformed must be strictly positive.
Options
 
Model
noconstant; see [R] estimation options.
model(lhsonly | rhsonly | lambda | theta) specifies which of the four models to fit.
model(lhsonly) applies the Box–Cox transform to depvar only. model(lhsonly) is the default.
model(rhsonly) applies the transform to the indepvars only.
model(lambda) applies the transform to both depvar and indepvars, and they are transformed by
the same parameter.
model(theta) applies the transform to both depvar and indepvars, but this time, each side is
transformed by a separate parameter.
notrans(varlist) specifies that the variables in varlist be included as nontransformed independent
variables.
 
Reporting
level(#); see [R] estimation options.
lrtest specifies that a likelihood-ratio test of significance be performed and reported for each
independent variable.
 
Maximization
nolog suppresses the iteration log when fitting the full model.
nologlr suppresses the iteration log when fitting the restricted models required by the lrtest option.
maximize_options: iterate(#) and from(init_specs); see [R] maximize.

Model        Initial value specification
lhsonly      from(θ0, copy)
rhsonly      from(λ0, copy)
lambda       from(λ0, copy)
theta        from(λ0 θ0, copy)
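For instance, to supply a starting value of 0.5 for θ in an lhsonly fit, one could type something like the following (an illustrative sketch; it assumes the nhanes2 dataset used in example 2 below is in memory, and the starting value is arbitrary):
. boxcox bpdiast bmi tcresult age sex, model(lhsonly) from(0.5, copy)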
Remarks and examples
Remarks are presented under the following headings:
Introduction
Theta model
Lambda model
Left-hand-side-only model
Right-hand-side-only model
Introduction
The Box–Cox transform
\[
y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}
\]
has been widely used in applied data analysis. Box and Cox (1964) developed the transformation and argued that the transformation could make the residuals more closely normal and less heteroskedastic. Cook and Weisberg (1982) discuss the transform in this light. Because the transform embeds several popular functional forms, it has received some attention as a method for testing functional forms, in particular,
\[
y^{(\lambda)} =
\begin{cases}
y - 1 & \text{if } \lambda = 1 \\
\ln(y) & \text{if } \lambda = 0 \\
1 - 1/y & \text{if } \lambda = -1
\end{cases}
\]
Davidson and MacKinnon (1993) discuss this use of the transform. Atkinson (1985) also gives a good
general treatment.
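As a small illustration of the transform and its λ → 0 limit, the following do-file fragment (an illustrative sketch; y stands for any strictly positive variable in memory and is not from the manual's datasets) compares the transform at a small λ with ln(y):
scalar lambda = 0.01
generate double y_bc = (y^lambda - 1)/lambda   // Box-Cox transform of y
generate double y_ln = ln(y)                   // the limit as lambda -> 0
summarize y_bc y_ln                            // the two should be nearly identical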
Theta model
boxcox obtains the maximum likelihood estimates of the parameters for four different models.
The most general of the models, the theta model, is
\[
y_j^{(\theta)} = \beta_0 + \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \gamma_1 z_{1j} + \gamma_2 z_{2j} + \cdots + \gamma_l z_{lj} + \epsilon_j
\]
where $\epsilon \sim N(0, \sigma^2)$. Here the dependent variable, y, is subject to a Box–Cox transform with parameter θ. Each of the indepvars, $x_1, x_2, \ldots, x_k$, is transformed by a Box–Cox transform with parameter λ. The $z_1, z_2, \ldots, z_l$ specified in the notrans() option are independent variables that are not transformed.

Box and Cox (1964) argued that this transformation would leave behind residuals that more closely follow a normal distribution than those produced by a simple linear regression model. Bear in mind that the normality of ε is assumed and that boxcox obtains maximum likelihood estimates of the k + l + 4 parameters under this assumption. boxcox does not choose λ and θ so that the residuals are approximately normally distributed. If you are interested in this type of transformation to normality, see the official Stata commands lnskew0 and bcskew0 in [R] lnskew0. However, those commands work on a more restrictive model in which none of the independent variables is transformed.
Example 1
Below we fit a theta model to a nonrepresentative extract of the Second National Health and
Nutrition Examination Survey (NHANES II) dataset discussed in McDowell et al. (1981).
We model individual-level diastolic blood pressure (bpdiast) as a function of the transformed
variables body mass index (bmi) and cholesterol level (tcresult) and of the untransformed variables
age (age) and sex (sex).
. use http://www.stata-press.com/data/r13/nhanes2
. boxcox bpdiast bmi tcresult, notrans(age sex) model(theta) lrtest
Fitting comparison model
Iteration 0: log likelihood = -41178.61
Iteration 1: log likelihood = -41032.51
Iteration 2: log likelihood = -41032.488
Iteration 3: log likelihood = -41032.488
Fitting full model
Iteration 0: log likelihood = -39928.606
Iteration 1: log likelihood = -39775.026
Iteration 2: log likelihood = -39774.987
Iteration 3: log likelihood = -39774.987
Fitting comparison models for LR tests
Iteration 0: log likelihood = -39947.144
Iteration 1: log likelihood = -39934.55
Iteration 2: log likelihood = -39934.516
Iteration 3: log likelihood = -39934.516
Iteration 0: log likelihood = -39906.96
Iteration 1: log likelihood = -39896.63
Iteration 2: log likelihood = -39896.629
Iteration 0: log likelihood = -40464.599
Iteration 1: log likelihood = -40459.752
Iteration 2: log likelihood = -40459.604
Iteration 3: log likelihood = -40459.604
Iteration 0: log likelihood = -39829.859
Iteration 1: log likelihood = -39815.576
Iteration 2: log likelihood = -39815.575
Number of obs = 10351
LR chi2(5) = 2515.00
Log likelihood = -39774.987 Prob > chi2 = 0.000
bpdiast Coef. Std. Err. z P>|z| [95% Conf. Interval]
/lambda .6383286 .1577601 4.05 0.000 .3291245 .9475327
/theta .1988197 .0454088 4.38 0.000 .1098201 .2878193
Estimates of scale-variant parameters
Coef. chi2(df) P>chi2(df) df of chi2
Notrans
age .003811 319.060 0.000 1
sex -.1054887 243.284 0.000 1
_cons 5.835555
Trans
bmi .0872041 1369.235 0.000 1
tcresult .004734 81.177 0.000 1
/sigma .3348267
Test Restricted
H0: log likelihood chi2 Prob > chi2
theta=lambda = -1 -40162.898 775.82 0.000
theta=lambda = 0 -39790.945 31.92 0.000
theta=lambda = 1 -39928.606 307.24 0.000
The output is composed of the iteration logs and three distinct tables. The first table contains
a standard header for a maximum likelihood estimator and a standard output table for the Box–Cox transform parameters. The second table contains the estimates of the scale-variant parameters.
The third table contains the output from likelihood-ratio tests on three standard functional form
specifications.
The right-hand-side and the left-hand-side transformations each add to the regression fit at the 1%
significance level and are both positive but less than 1. All the variables have significant impacts on
diastolic blood pressure, bpdiast. As expected, the transformed variables (the body mass index, bmi, and cholesterol level, tcresult) contribute to higher blood pressure. The last output table
shows that the linear, multiplicative inverse, and log specifications are strongly rejected.
Technical note
Spitzer (1984) showed that the Wald tests of the joint significance of the coefficients of the
right-hand-side variables, either transformed or untransformed, are not invariant to changes in the
scale of the transformed dependent variable. Davidson and MacKinnon (1993) also discuss this point.
This problem demonstrates that Wald statistics can be manipulated in nonlinear models. Lafontaine
and White (1986) analyze this problem numerically, and Phillips and Park (1988) analyze it by using
Edgeworth expansions. See Drukker (2000b) for a more detailed discussion of this issue. Because the
parameter estimates and their Wald tests are not scale invariant, no Wald tests or confidence intervals
are reported for these parameters. However, when the lrtest option is specified, likelihood-ratio
tests are performed and reported. Schlesselman (1971) showed that, if a constant is included in the
model, the parameter estimates of the Box–Cox transforms are scale invariant. For this reason, we
strongly recommend that you not use the noconstant option.
The lrtest option does not perform a likelihood-ratio test on the constant, so no value for this
statistic is reported. Unless the data are properly scaled, the restricted model does not often converge.
For this reason, no likelihood-ratio test on the constant is performed by the lrtest option. However,
if you have a special interest in performing this test, you can do so by fitting the constrained model
separately. If problems with convergence are encountered, rescaling the data by their means may help.
Lambda model
A less general model than the one above is called the lambda model. It specifies that the same
parameter be used in both the left-hand-side and right-hand-side transformations. Specifically,
\[
y_j^{(\lambda)} = \beta_0 + \beta_1 x_{1j}^{(\lambda)} + \beta_2 x_{2j}^{(\lambda)} + \cdots + \beta_k x_{kj}^{(\lambda)} + \gamma_1 z_{1j} + \gamma_2 z_{2j} + \cdots + \gamma_l z_{lj} + \epsilon_j
\]
where $\epsilon \sim N(0, \sigma^2)$. Here the depvar variable, y, and each of the indepvars, $x_1, x_2, \ldots, x_k$, is transformed by a Box–Cox transform with the common parameter λ. Again the $z_1, z_2, \ldots, z_l$ are independent variables that are not transformed.
Left-hand-side-only model
Even more restrictive than a common transformation parameter is transforming the dependent
variable only. Because the dependent variable is on the left-hand side of the equation, this model is
known as the lhsonly model. Here you are estimating the parameters of the model
\[
y_j^{(\theta)} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + \epsilon_j
\]
where $\epsilon \sim N(0, \sigma^2)$. Here only the depvar, y, is transformed by a Box–Cox transform with the parameter θ.
Example 2
In this example, we model the transform of diastolic blood pressure as a linear combination of
the untransformed body mass index, cholesterol level, age, and sex.
. boxcox bpdiast bmi tcresult age sex, model(lhsonly) lrtest nolog nologlr
Fitting comparison model
Fitting full model
Fitting comparison models for LR tests
Number of obs = 10351
LR chi2(4) = 2509.56
Log likelihood = -39777.709 Prob > chi2 = 0.000
bpdiast Coef. Std. Err. z P>|z| [95% Conf. Interval]
/theta .2073268 .0452895 4.58 0.000 .1185611 .2960926
Estimates of scale-variant parameters
Coef. chi2(df) P>chi2(df) df of chi2
Notrans
bmi .0272628 1375.841 0.000 1
tcresult .0006929 82.380 0.000 1
age .0040141 334.117 0.000 1
sex -.1122274 263.219 0.000 1
_cons 6.302855
/sigma .3476615
Test Restricted LR statistic P-value
H0: log likelihood chi2 Prob > chi2
theta = -1 -40146.678 737.94 0.000
theta = 0 -39788.241 21.06 0.000
theta = 1 -39928.606 301.79 0.000
The maximum likelihood estimate of the transformation parameter for this model is positive and
significant. Once again, all the scale-variant parameters are significant, and we find a positive impact
of body mass index (bmi) and cholesterol levels (tcresult) on the transformed diastolic blood
pressure (bpdiast). This model rejects the linear, multiplicative inverse, and log specifications.
Right-hand-side-only model
The fourth model leaves the depvar alone and transforms a subset of the indepvars using the
parameter λ. This is the rhsonly model. In this model, the depvar,y, is given by
yj=β0+β1x(λ)
1j+β2x(λ)
2j+··· +βkx(λ)
kj +γ1z1j+γ2z2j+··· +γlzlj +j
where N(0, σ2). Here each of the indepvars,x1, x2, . . . , xk, is transformed by a BoxCox
transform with the parameter λ. Again the z1, z2, . . . , zlare independent variables that are not
transformed.
Example 3
Now we consider a rhsonly model in which the regressors sex and age are not transformed.
. boxcox bpdiast bmi tcresult, notrans(sex age) model(rhsonly) lrtest nolog
> nologlr
Fitting full model
Fitting comparison models for LR tests
Number of obs = 10351
LR chi2(5) = 2500.79
Log likelihood = -39928.212 Prob > chi2 = 0.000
bpdiast Coef. Std. Err. z P>|z| [95% Conf. Interval]
/lambda .8658841 .1522387 5.69 0.000 .5675018 1.164266
Estimates of scale-variant parameters
Coef. chi2(df) P>chi2(df) df of chi2
Notrans
sex -3.544042 235.020 0.000 1
age .128809 311.754 0.000 1
_cons 50.01498
Trans
bmi 1.418215 1396.709 0.000 1
tcresult .0462964 78.500 0.000 1
/sigma 11.4557
Test Restricted LR statistic P-value
H0: log likelihood chi2 Prob > chi2
lambda = -1 -39989.331 122.24 0.000
lambda = 0 -39942.945 29.47 0.000
lambda = 1 -39928.606 0.79 0.375
The maximum likelihood estimate of the transformation parameter in this model is positive and
significant at the 1% level. The transformed bmi coefficient behaves as expected, and the remaining
scale-variant parameters are significant at the 1% level. This model rejects the multiplicative inverse
and log specifications strongly. However, we cannot reject the hypothesis that the model is linear.
Stored results
boxcox stores the following in e():
Scalars
  e(N)             number of observations
  e(ll)            log likelihood
  e(chi2)          LR statistic of full vs. comparison
  e(df_m)          full model degrees of freedom
  e(ll0)           log likelihood of the restricted model
  e(df_r)          restricted model degrees of freedom
  e(ll_t1)         log likelihood of model λ = θ = 1
  e(chi2_t1)       LR of λ = θ = 1 vs. full model
  e(p_t1)          p-value of λ = θ = 1 vs. full model
  e(ll_tm1)        log likelihood of model λ = θ = −1
  e(chi2_tm1)      LR of λ = θ = −1 vs. full model
  e(p_tm1)         p-value of λ = θ = −1 vs. full model
  e(ll_t0)         log likelihood of model λ = θ = 0
  e(chi2_t0)       LR of λ = θ = 0 vs. full model
  e(p_t0)          p-value of λ = θ = 0 vs. full model
  e(rank)          rank of e(V)
  e(ic)            number of iterations
  e(rc)            return code
Macros
  e(cmd)           boxcox
  e(cmdline)       command as typed
  e(depvar)        name of dependent variable
  e(model)         lhsonly, rhsonly, lambda, or theta
  e(wtype)         weight type
  e(wexp)          weight expression
  e(ntrans)        yes if nontransformed indepvars
  e(chi2type)      LR; type of model χ² test
  e(lrtest)        lrtest, if requested
  e(properties)    b V
  e(predict)       program used to implement predict
  e(marginsnotok)  predictions disallowed by margins
Matrices
  e(b)             coefficient vector
  e(V)             variance–covariance matrix of the estimators (see note below)
  e(pm)            p-values for LR tests on indepvars
  e(df)            degrees of freedom of LR tests on indepvars
  e(chi2m)         LR statistics for tests on indepvars
Functions
  e(sample)        marks estimation sample

e(V) contains all zeros, except for the elements that correspond to the parameters of the Box–Cox transform.
Methods and formulas
In the internal computations,
\[
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda} & \text{if } |\lambda| > 10^{-10} \\[1ex]
\ln(y) & \text{otherwise}
\end{cases}
\]
The unconcentrated log likelihood for the theta model is
\[
\ln L = -\frac{N}{2}\left\{\ln(2\pi) + \ln(\sigma^2)\right\} + (\theta - 1)\sum_{i=1}^{N}\ln(y_i) - \frac{1}{2\sigma^2}\,{\rm SSR}
\]
where
\[
{\rm SSR} = \sum_{i=1}^{N}\left( y_i^{(\theta)} - \beta_0 - \beta_1 x_{i1}^{(\lambda)} - \beta_2 x_{i2}^{(\lambda)} - \cdots - \beta_k x_{ik}^{(\lambda)} - \gamma_1 z_{i1} - \gamma_2 z_{i2} - \cdots - \gamma_l z_{il} \right)^2
\]
Writing the SSR in matrix form,
\[
{\rm SSR} = \left(y^{(\theta)} - X^{(\lambda)}b' - Zg'\right)'\left(y^{(\theta)} - X^{(\lambda)}b' - Zg'\right)
\]
where $y^{(\theta)}$ is an $N \times 1$ vector of elementwise transformed data, $X^{(\lambda)}$ is an $N \times k$ matrix of elementwise transformed data, $Z$ is an $N \times l$ matrix of untransformed data, $b$ is a $1 \times k$ vector of coefficients, and $g$ is a $1 \times l$ vector of coefficients. Letting
\[
W_{\lambda} = \left(X^{(\lambda)},\ Z\right)
\]
be the horizontal concatenation of $X^{(\lambda)}$ and $Z$ and
\[
d' = \begin{pmatrix} b' \\ g' \end{pmatrix}
\]
be the vertical concatenation of the coefficients yields
\[
{\rm SSR} = \left(y^{(\theta)} - W_{\lambda}d'\right)'\left(y^{(\theta)} - W_{\lambda}d'\right)
\]
For given values of λ and θ, the solutions for $d'$ and $\sigma^2$ are
\[
\hat d' = \left(W_{\lambda}'W_{\lambda}\right)^{-1}W_{\lambda}'\,y^{(\theta)}
\]
and
\[
\hat\sigma^2 = \frac{1}{N}\left(y^{(\theta)} - W_{\lambda}\hat d'\right)'\left(y^{(\theta)} - W_{\lambda}\hat d'\right)
\]
Substituting these solutions into the log-likelihood function yields the concentrated log-likelihood function
\[
\ln L_c = -\frac{N}{2}\left\{\ln(2\pi) + 1 + \ln(\hat\sigma^2)\right\} + (\theta - 1)\sum_{i=1}^{N}\ln(y_i)
\]
Similar calculations yield the concentrated log-likelihood function for the lambda model,
\[
\ln L_c = -\frac{N}{2}\left\{\ln(2\pi) + 1 + \ln(\hat\sigma^2)\right\} + (\lambda - 1)\sum_{i=1}^{N}\ln(y_i)
\]
the lhsonly model,
\[
\ln L_c = -\frac{N}{2}\left\{\ln(2\pi) + 1 + \ln(\hat\sigma^2)\right\} + (\theta - 1)\sum_{i=1}^{N}\ln(y_i)
\]
and the rhsonly model,
\[
\ln L_c = -\frac{N}{2}\left\{\ln(2\pi) + 1 + \ln(\hat\sigma^2)\right\}
\]
where $\hat\sigma^2$ is specific to each model and is defined analogously to that in the theta model.
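To make the lhsonly formula concrete, the following do-file fragment is a sketch, not part of the manual: it plugs in the /theta estimate reported in example 2 and, under the assumption that the regression sample matches, should come close to the reported log likelihood of -39777.709.
use http://www.stata-press.com/data/r13/nhanes2, clear
scalar th = 0.2073268                          // /theta from example 2
generate double ybc = (bpdiast^th - 1)/th      // transformed dependent variable
quietly regress ybc bmi tcresult age sex
scalar sig2 = e(rss)/e(N)                      // sigma-hat^2 = SSR/N
generate double lny = ln(bpdiast)
quietly summarize lny
display -0.5*e(N)*(ln(2*_pi) + 1 + ln(sig2)) + (th - 1)*r(sum)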
References
Atkinson, A. C. 1985. Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic
Regression Analysis. Oxford: Oxford University Press.
Box, G. E. P., and D. R. Cox. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series B 26: 211–252.
Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman & Hall.
Cook, R. D., and S. Weisberg. 1982. Residuals and Influence in Regression. New York: Chapman & Hall/CRC.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Drukker, D. M. 2000a. sg130: Box–Cox regression models. Stata Technical Bulletin 54: 27–36. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 307–319. College Station, TX: Stata Press.
———. 2000b. sg131: On the manipulability of Wald tests in Box–Cox regression models. Stata Technical Bulletin 54: 36–42. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 319–327. College Station, TX: Stata Press.
Lafontaine, F., and K. J. White. 1986. Obtaining any Wald statistic you want. Economics Letters 21: 35–40.
Lindsey, C., and S. J. Sheather. 2010a. Power transformation via multivariate Box–Cox. Stata Journal 10: 69–81.
———. 2010b. Optimal power transformation via inverse response plots. Stata Journal 10: 200–214.
McDowell, A., A. Engel, J. T. Massey, and K. Maurer. 1981. Plan and operation of the Second National Health and
Nutrition Examination Survey, 1976–1980. Vital and Health Statistics 1(15): 1–144.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Schlesselman, J. J. 1971. Power families: A note on the Box and Cox transformation. Journal of the Royal Statistical
Society, Series B 33: 307–311.
Spitzer, J. J. 1984. Variance estimates in models with the Box–Cox transformation: Implications for estimation and
hypothesis testing. Review of Economics and Statistics 66: 645–652.
Also see
[R] boxcox postestimation — Postestimation tools for boxcox
[R] lnskew0 — Find zero-skewness log or Box–Cox transform
[R] regress — Linear regression
[U] 20 Estimation and postestimation commands
Title
boxcox postestimation — Postestimation tools for boxcox
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas References Also see
Description
The following postestimation commands are available after boxcox:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Inference is valid only for hypotheses concerning λ and θ.
Syntax for predict
predict [type] newvar [if] [in] [, statistic options]
statistic Description
Main
yhat predicted value of y; the default
residuals residuals
options Description
Options
smearing compute statistic using smearing method; the default
btransform compute statistic using back-transform method
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
yhat, the default, calculates the predicted value of the dependent variable.
residuals calculates the residuals, that is, the observed value minus the predicted value.
 
Options
smearing calculates the statistics yhat and residuals using the smearing method proposed by
Duan (1983) (see Methods and formulas for a description of this method). smearing is the default.
btransform calculates the statistics yhat and residuals using the back-transform method (see
Methods and formulas for a description of this method).
Remarks and examples
Below we present two examples that illustrate how to use the smearing and btransform options.
Example 1: Predictions with the smearing option
In this example, we calculate the predicted values of diastolic blood pressure, bpdiast, that arise
from the theta model calculated in example 1 of [R] boxcox.
. use http://www.stata-press.com/data/r13/nhanes2
. boxcox bpdiast bmi tcresult, notrans(age sex) model(theta) lrtest
(output omitted )
. predict yhat
(statistic and option are assumed)
In the expression above, yhat is the name we gave to the estimates of the conditional expectation.
Given that we did not specify any statistic or option, the corresponding defaults yhat and smearing
were assumed.
As the summary table below illustrates, the mean of the dependent variable is close to the mean
of the predicted value yhat. This indicates that the theta model does a good job approximating the
true value of diastolic blood pressure, bpdiast.
. summarize bpdiast yhat
Variable Obs Mean Std. Dev. Min Max
bpdiast 10351 81.715 12.92722 35 150
yhat 10351 81.71406 5.983486 66.93709 110.5283
Similarly, we could have asked that residuals be calculated. Here we again use the default smearing
option:
. predict resid, residuals
(option assumed to compute residuals)
Example 2: Predictions with the btransform option
In this example, we illustrate the tradeoffs involved by using the btransform option as opposed
to the default smearing option. Continuing with example 1, we compute the predicted values using
the back-transform method.
. predict yhatb, btransform
(statistic assumed)
We now compute the predicted values using the smearing option and summarize both computations.
. predict yhats
(statistic and option are assumed)
. summarize bpdiast yhats yhatb
Variable Obs Mean Std. Dev. Min Max
bpdiast 10351 81.715 12.92722 35 150
yhats 10351 81.71406 5.983486 66.93709 110.5283
yhatb 10351 81.08018 5.95549 66.37479 109.7671
As can be seen from the mean and the standard deviation of the summary table, the predicted
values using the back-transform method give biased estimates but are less variable than those coming
from the smearing method. However, the efficiency loss is small compared with the bias reduction.
Technical note
boxcox estimates variances only for the λ and θ parameters (see the technical note in [R] boxcox), so the extent to which postestimation commands can be used following boxcox is limited. Formulas used in lincom, nlcom, test, and testnl are dependent on the estimated variances. Therefore, the use of these commands is limited and generally applicable only to inferences on the λ and θ coefficients.
Methods and formulas
The computation of the expected value of the dependent variable conditional on the regressors for the Box–Cox model does not follow the logic of the standard linear regression model because the random disturbance does not vanish from the conditional expectation and must be accounted for. To show this, we will revisit the lhsonly model described by
\[
y_j^{(\lambda)} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_{(k-1)} x_{(k-1)j} + \epsilon_j
\]
where
\[
y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}
\]
and
\[
y^{(\lambda)} =
\begin{cases}
y - 1 & \text{if } \lambda = 1 \\
\ln(y) & \text{if } \lambda = 0 \\
1 - 1/y & \text{if } \lambda = -1
\end{cases}
\]
For the presentation below, let $y^{(\lambda)}$ be an $N \times 1$ vector of elementwise transformed data, $X$ be an $N \times k$ matrix of regressors, $\beta$ be a $k \times 1$ vector of parameters, and $\iota$ be an $n \times 1$ vector of ones.

If we were interested in $E(y^{(\lambda)} \mid X)$, then the conventional logic would follow, and we would obtain predictions as $\hat y^{(\lambda)} = X\hat\beta$, where $\hat\beta$ is the estimate of $\beta$. However, to estimate the conditional expectation of $y$, we need to isolate it on the left-hand side of the model. In the case of the lhsonly model, this yields
\[
y = \left\{ \lambda(X\hat\beta + \epsilon) + \iota \right\}^{1/\lambda}
\]
The conditional expectation is then defined by
\[
E(y \mid X) = \int \left\{ \lambda(X\beta + \epsilon) + \iota \right\}^{1/\lambda}\, dF(\epsilon \mid X)
\]
In the expression above, $dF(\epsilon \mid X)$ corresponds to the cdf of $\epsilon$ conditional on the regressors. It is also clear that the random disturbance does not vanish.

To address this issue, the default methodology used by predict computes this integral using the smearing method proposed by Duan (1983) to implement a two-step estimator, as was suggested by Abrevaya (2002).

In the first step, we get an estimate for $\epsilon$ defined as
\[
\hat\epsilon = y^{(\hat\lambda)} - X\hat\beta
\]
In the second step, for each $j$ we compute our predicted values as the sum:
\[
\hat y_j = \frac{1}{N} \sum_{i=1}^{N} \left\{ \hat\lambda\left(x_j\hat\beta + \hat\epsilon_i\right) + 1 \right\}^{1/\hat\lambda}
\]
In the expression above, $x_j$ is the $j$th row of the matrix $X$ (in other words, the values of the covariates for individual $j$), and $\hat\epsilon_i$ is the residual for individual $i$. The result of this summation gives us the conditional expectation of the dependent variable for individual $j$. Given that this operation is performed for each individual $j$, the methodology is computationally intensive.

The back-transform method can be understood as a naïve estimate that disregards the random disturbance. The predictions using this approach are given by
\[
\hat y_j = \left( \hat\lambda\, x_j\hat\beta + 1 \right)^{1/\hat\lambda}
\]
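As an illustration of the back-transform formula (the smearing estimate would additionally average the expression over the first-step residuals), the following do-file fragment is a sketch, not the command's internal code: it recomputes naive predictions for the lhsonly fit of example 2 of [R] boxcox, reusing the reported /theta estimate; ybc, xb, and yhat_bt are hypothetical names.
use http://www.stata-press.com/data/r13/nhanes2, clear
scalar th = 0.2073268                         // /theta reported for the lhsonly model
generate double ybc = (bpdiast^th - 1)/th     // first step: transform y
quietly regress ybc bmi tcresult age sex
predict double xb, xb                         // x_j * beta-hat
generate double yhat_bt = (th*xb + 1)^(1/th)  // back-transform prediction
summarize bpdiast yhat_bt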
References
Abrevaya, J. 2002. Computing marginal effects in the Box–Cox model. Econometric Reviews 21: 383–393.
Duan, N. 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical
Association 78: 605–610.
Also see
[R] boxcox — Box–Cox regression models
[U] 20 Estimation and postestimation commands
Title
brier — Brier score decomposition
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
brier outcomevar forecastvar [if] [in] [, group(#)]
by is allowed; see [D] by.
Menu
Statistics > Epidemiology and related > Other > Brier score decomposition
Description
brier computes the Yates, Sanders, and Murphy decompositions of the Brier Mean Probability
Score. outcomevar contains 0/1 values reflecting the actual outcome of the experiment, and forecastvar
contains the corresponding probabilities as predicted by, say, logit, probit, or a human forecaster.
Option
 
Main
group(#)specifies the number of groups that will be used to compute the decomposition. group(10)
is the default.
Remarks and examples
You have a binary (0/1) response and a formula that predicts the corresponding probabilities of
having observed a positive outcome (1). If the probabilities were obtained from logistic regression,
there are many methods that assess goodness of fit (see, for instance, [R] estat gof). However, the
probabilities might be computed from a published formula or from a model fit on another sample,
both completely unrelated to the data at hand, or perhaps the forecasts are not from a formula at
all. In any case, you now have a test dataset consisting of the forecast probabilities and observed
outcomes. Your test dataset might, for instance, record predictions made by a meteorologist on the
probability of rain along with a variable recording whether it actually rained.
The Brier score is an aggregate measure of disagreement between the observed outcome and a prediction, the average squared error difference. The Brier score decomposition is a partition of the
Brier score into components that suggest reasons for discrepancy. These reasons fall roughly into
three groups: 1) lack of overall calibration between the average predicted probability and the actual
probability of the event in your data, 2) misfit of the data in groups defined within your sample, and
3) inability to match actual 0 and 1 responses.
Problem 1 refers to simply overstating or understating the probabilities.
Problem 2 refers to what is standardly called a goodness-of-fit test: the data are grouped, and the
predictions for the group are compared with the outcomes.
Problem 3 refers to an individual-level measure of fit. Imagine that the grouped outcomes are predicted
on average correctly but that within the group, the outcomes are poorly predicted.
Using logit or probit analysis to fit your data will guarantee that there is no lack of fit due to problem
1, and a good model fitter will be able to avoid problem 2. Problem 3 is inherent in any prediction
exercise.
Example 1
We have data on the outcomes of 20 basketball games (win) and the probability of victory predicted
by a local pundit (for).
. use http://www.stata-press.com/data/r13/bball
. summarize win for
Variable Obs Mean Std. Dev. Min Max
win 20 .65 .4893605 0 1
for 20 .4785 .2147526 .15 .9
. brier win for, group(5)
Mean probability of outcome 0.6500
of forecast 0.4785
Correlation 0.5907
ROC area 0.8791 p = 0.0030
Brier score 0.1828
Spiegelhalter’s z-statistic -0.6339 p = 0.7369
Sanders-modified Brier score 0.1861
Sanders resolution 0.1400
Outcome index variance 0.2275
Murphy resolution 0.0875
Reliability-in-the-small 0.0461
Forecast variance 0.0438
Excess forecast variance 0.0285
Minimum forecast variance 0.0153
Reliability-in-the-large 0.0294
2*Forecast-Outcome-Covar 0.1179
The mean probabilities of forecast and outcome are simply the mean of the predicted probabilities
and the actual outcomes (wins/losses). The correlation is the product-moment correlation between
them.
The Brier score measures the total difference between the event (winning) and the forecast
probability of that event as an average squared difference. As a benchmark, a perfect forecaster would
have a Brier score of 0, a perfect misforecaster (predicts probability of win is 1 when loses and 0
when wins) would have a Brier score of 1, and a fence-sitter (forecasts every game as 50/50) would
have a Brier score of 0.25. Our pundit is doing reasonably well.
Spiegelhalter’s z statistic is a standard normal test statistic for testing whether an individual Brier
score is extreme. The ROC area is the area under the receiver operating curve, and the associated test
is a test of whether it is greater than 0.5. The more accurate the forecast probabilities, the larger the
ROC area.
The Sanders-modified Brier score measures the difference between a grouped forecast measure
and the event, where the data are grouped by sorting the sample on the forecast and dividing it into
approximately equally sized groups. The difference between the modified and the unmodified score is typically minimal. For this and the other statistics that require grouping (the Sanders and Murphy resolutions and reliability-in-the-small) to be well-defined, group boundaries are chosen so as not to allocate observations with the same forecast probability to different groups. This task is done by grouping on the forecast using xtile, n(#), with # being the number of groups; see [D] pctile.
The Sanders resolution measures error that arises from statistical considerations in evaluating
the forecast for a group. A group with all positive or all negative outcomes would have a Sanders
resolution of 0; it would most certainly be feasible to predict exactly what happened to each member
of the group. If the group had 40% positive responses, on the other hand, a forecast that assigned
p = 0.4 to each member of the group would be a good one, and yet, there would be "errors" in the squared difference sense. The "error" would be (1 − 0.4)² or (0 − 0.4)² for each member. The
Sanders resolution is the average across groups of such “expected” errors. The 0.1400 value in our
data from an overall Brier score of 0.1828 or 0.1861 suggests that a substantial portion of the “error”
in our data is inherent.
Outcome index variance is just the variance of the outcome variable. This is the expected value of
the Brier score if all the forecast probabilities were merely the average observed outcome. Remember
that a fence-sitter has an expected Brier score of 0.25; a smarter fence sitter (who would guess
p = 0.65 for these data) would have a Brier score of 0.2275.
The Murphy resolution measures the variation in the average outcomes across groups. If all groups
have the same frequency of positive outcomes, little information in any forecast is possible, and the
Murphy resolution is 0. If groups differ markedly, the Murphy resolution is as large as 0.25. The
0.0875 means that there is some variation but not a lot, and 0.0875 is probably higher than in most
real cases. If you had groups in your data that varied between 40% and 60% positive outcomes, the
Murphy resolution would be 0.01; between 30% and 70%, it would be 0.04.
Reliability-in-the-small measures the error that comes from the average forecast within group not measuring the average outcome within group, a classical goodness-of-fit measure, with 0 meaning a perfect fit and 1 meaning a complete lack of fit. The calculated value of 0.0461 shows some amount of lack of fit. Remember, the number is squared, and we are saying that probabilities could be just more than √0.0461 = 0.215, or 21.5%, off.
Forecast variance measures the amount of discrimination being attempted, that is, the variation in the forecasted probabilities. A small number indicates a fence-sitter making constant predictions. If the forecasts were from a logistic regression model, forecast variance would tend to increase with the amount of information available. Our pundit shows considerable forecast variance of 0.0438 (standard deviation √0.0438 = 0.2093), which is in line with the reliability-in-the-small, suggesting that the forecaster is attempting as much variation as is available in these data.
Excess forecast variance is the amount of actual forecast variance over a theoretical minimum. The theoretical minimum (called the minimum forecast variance) corresponds to forecasts of p0 for observations ultimately observed to be negative responses and p1 for observations ultimately observed to be positive outcomes. Moreover, p0 and p1 are set to the average forecasts made for the ultimate negative and positive outcomes. These predictions would be just as good as the predictions the forecaster did make, and any variation in the actual forecast probabilities above this is useless. If this number is large, above 1%–2%, then the forecaster may be attempting more than is possible. The 0.0285 in our data suggests this possibility.
Reliability-in-the-large measures the discrepancy between the mean forecast and the observed
fraction of positive outcomes. This discrepancy will be 0 for forecasts made by most statistical
models (at least when measured on the same sample used for estimation) because they, by design, reproduce sample means. For our human pundit, the 0.0294 says that there is a √0.0294, or 17-percentage-point, difference. (This difference can also be found by calculating the difference in the averages of the observed outcomes and forecast probabilities: 0.65 − 0.4785 = 0.17.) That difference, however, is not significant, as we would see if we typed ttest win=for; see [R] ttest. If these data were larger and the bias persisted, this difference would be a critical shortcoming of the forecast.
Twice the forecast-outcome covariance is a measure of how accurately the forecast corresponds to
the outcome. The concept is similar to that of R-squared in linear regression.
Stored results
brier stores the following in r():
Scalars
  r(p_roc)       significance of ROC area
  r(roc_area)    ROC area
  r(z)           Spiegelhalter’s z statistic
  r(p)           significance of z statistic
  r(brier)       Brier score
  r(brier_s)     Sanders-modified Brier score
  r(sanders)     Sanders resolution
  r(oiv)         outcome index variance
  r(murphy)      Murphy resolution
  r(relinsm)     reliability-in-the-small
  r(Var_f)       forecast variance
  r(Var_fex)     excess forecast variance
  r(Var_fmin)    minimum forecast variance
  r(relinla)     reliability-in-the-large
  r(cov_2f)      2 × forecast-outcome covariance
Methods and formulas
See Wilks (2011) or Schmidt and Griffith (2005) for a discussion of the Brier score.

Let $d_j$, $j = 1, \ldots, N$, be the observed outcomes with $d_j = 0$ or $d_j = 1$, and let $f_j$ be the corresponding forecasted probabilities that $d_j$ is 1, $0 \le f_j \le 1$. Assume that the data are ordered so that $f_{j+1} \ge f_j$ (brier sorts the data to obtain this order). Divide the data into $K$ nearly equally sized groups, with group 1 containing observations 1 through $j_2 - 1$, group 2 containing observations $j_2$ through $j_3 - 1$, and so on.

Define
\[
\begin{aligned}
\bar f_0 &= \text{average } f_j \text{ among } d_j = 0 \\
\bar f_1 &= \text{average } f_j \text{ among } d_j = 1 \\
\bar f   &= \text{average } f_j \\
\bar d   &= \text{average } d_j \\
\tilde f_k &= \text{average } f_j \text{ in group } k \\
\tilde d_k &= \text{average } d_j \text{ in group } k \\
\tilde n_k &= \text{number of observations in group } k
\end{aligned}
\]
The Brier score is $\sum_j (d_j - f_j)^2 / N$.

The Sanders-modified Brier score is $\sum_j (d_j - \tilde f_{k(j)})^2 / N$.

Let $p_j$ denote the true but unknown probability that $d_j = 1$. Under the null hypothesis that $p_j = f_j$ for all $j$, Spiegelhalter (1986) determined that the expectation and variance of the Brier score is given by the following:
\[
E(\text{Brier}) = \frac{1}{N}\sum_{j=1}^{N} f_j(1 - f_j)
\]
\[
\text{Var}(\text{Brier}) = \frac{1}{N^2}\sum_{j=1}^{N} f_j(1 - f_j)(1 - 2f_j)^2
\]
Denoting the observed value of the Brier score by $O(\text{Brier})$, Spiegelhalter’s $z$ statistic is given by
\[
Z = \frac{O(\text{Brier}) - E(\text{Brier})}{\sqrt{\text{Var}(\text{Brier})}}
\]
The corresponding p-value is given by the upper-tail probability of $Z$ under the standard normal distribution.

The area under the ROC curve is estimated by applying the trapezoidal rule to the empirical ROC curve. This area is Wilcoxon’s test statistic, so the corresponding p-value is just that of a one-sided Wilcoxon test of the null hypothesis that the distribution of predictions is constant across the two outcomes.

The Sanders resolution is $\sum_k \tilde n_k \{\tilde d_k(1 - \tilde d_k)\}/N$.

The outcome index variance is $\bar d(1 - \bar d)$.

The Murphy resolution is $\sum_k \tilde n_k (\tilde d_k - \bar d)^2 / N$.

Reliability-in-the-small is $\sum_k \tilde n_k (\tilde d_k - \tilde f_k)^2 / N$.

The forecast variance is $\sum_j (f_j - \bar f)^2 / N$.

The minimum forecast variance is $\bigl\{\sum_{j \in F} (f_j - \bar f_0)^2 + \sum_{j \in S} (f_j - \bar f_1)^2\bigr\}/N$, where $F$ is the set of observations for which $d_j = 0$ and $S$ is the complement.

The excess forecast variance is the difference between the forecast variance and the minimum forecast variance.

Reliability-in-the-large is $(\bar f - \bar d)^2$.

Twice the outcome covariance is $2(\bar f_1 - \bar f_0)\,\bar d(1 - \bar d)$.
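As a concrete check of the first formula, the following do-file fragment (an illustrative sketch using the basketball data from example 1) recomputes the Brier score by hand and compares it with the stored result r(brier), which example 1 reported as about 0.1828.
use http://www.stata-press.com/data/r13/bball, clear
quietly brier win for, group(5)
scalar brier_stored = r(brier)             // stored by brier
generate double sqdiff = (win - for)^2     // squared forecast errors
quietly summarize sqdiff
display r(mean) "  versus  " brier_stored  // both should agree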
 
Glenn Wilson Brier (1913–1998) was an American meteorological statistician who, after obtaining
degrees in physics and statistics, was for many years head of meteorological statistics at the
U.S. Weather Bureau in Washington, DC. In the latter part of his career, he was associated with
Colorado State University. Brier worked especially on verification and evaluation of predictions
and forecasts, statistical decision making, the statistical theory of turbulence, the analysis of
weather modification experiments, and the application of permutation techniques.
 
Acknowledgment
We thank Richard Goldstein for his contributions to this improved version of brier.
References
Brier, G. W. 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78: 1–3.
Goldstein, R. 1996. sg55: Extensions to the brier command. Stata Technical Bulletin 32: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 133–134. College Station, TX: Stata Press.
Hadorn, D. C., E. B. Keeler, W. H. Rogers, and R. H. Brook. 1993. Assessing the Performance of Mortality Prediction
Models. Santa Monica, CA: Rand.
Holloway, L., and P. W. Mielke, Jr. 1998. Glenn Wilson Brier 1913–1998. Bulletin of the American Meteorological
Society 79: 1438–1439.
Jolliffe, I. T., and D. B. Stephenson, ed. 2012. Forecast Verification: A Practitioner’s Guide in Atmospheric Science.
2nd ed. Chichester, UK: Wiley.
Murphy, A. H. 1973. A new vector partition of the probability score. Journal of Applied Meteorology 12: 595–600.
———. 1997. Forecast verification. In Economic Value of Weather and Climate Forecasts, ed. R. W. Katz and A. H. Murphy, 19–74. Cambridge: Cambridge University Press.
Redelmeier, D. A., D. A. Bloch, and D. H. Hickam. 1991. Assessing predictive accuracy: How to compare Brier
scores. Journal of Clinical Epidemiology 44: 1141–1146.
Rogers, W. H. 1992. sbe9: Brier score decomposition. Stata Technical Bulletin 10: 20–22. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 92–94. College Station, TX: Stata Press.
Sanders, F. 1963. On subjective probability forecasting. Journal of Applied Meteorology 2: 191–201.
Schmidt, C. H., and J. L. Griffith. 2005. Multivariate classification rules: Calibration and discrimination. In Vol. 2 of
Encyclopedia of Biostatistics, ed. P. Armitage and T. Colton, 3492–3494. Chichester, UK: Wiley.
Spiegelhalter, D. J. 1986. Probabilistic prediction in patient management and clinical trials. Statistics in Medicine 5:
421–433.
Von Storch, H., and F. W. Zwiers. 1999. Statistical Analysis in Climate Research. Cambridge: Cambridge University
Press.
Wilks, D. S. 2011. Statistical Methods in the Atmospheric Sciences. 3rd ed. Waltham, MA: Academic Press.
Yates, J. F. 1982. External correspondence: Decompositions of the mean probability score. Organizational Behavior
and Human Performance 30: 132–156.
Also see
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[R] probit — Probit regression
Title
bsample — Sampling with replacement
Syntax Menu Description Options
Remarks and examples References Also see
Syntax
bsample [exp] [if] [in] [, options]
where exp is a standard Stata expression; see [U] 13 Functions and expressions.
options Description
strata(varlist)       variables identifying strata
cluster(varlist)      variables identifying resampling clusters
idcluster(newvar)     create new cluster ID variable
weight(varname)       replace varname with frequency weights
Menu
Statistics > Resampling > Draw bootstrap sample
Description
bsample draws bootstrap samples (random samples with replacement) from the data in memory.
exp specifies the size of the sample, which must be less than or equal to the number of sampling
units in the data. The observed number of units is the default when exp is not specified.
For bootstrap sampling of the observations, exp must be less than or equal to _N (the number of observations in the data; see [U] 13.4 System variables (_variables)).

For stratified bootstrap sampling, exp must be less than or equal to _N within the strata identified by the strata() option.

For clustered bootstrap sampling, exp must be less than or equal to N_c (the number of clusters identified by the cluster() option).

For stratified bootstrap sampling of clusters, exp must be less than or equal to N_c within the strata identified by the strata() option.
Observations that do not meet the optional if and in criteria are dropped (not sampled).
Options
strata(varlist) specifies the variables identifying strata. If strata() is specified, bootstrap samples are selected within each stratum.

cluster(varlist) specifies the variables identifying resampling clusters. If cluster() is specified, the sample drawn during each replication is a bootstrap sample of clusters.
idcluster(newvar) creates a new variable containing a unique identifier for each resampled cluster.

weight(varname) specifies a variable in which the sampling frequencies will be placed. varname must be an existing variable, which will be replaced. After bsample, varname can be used as an fweight in any Stata command that accepts fweights, which can speed up resampling for commands like regress and summarize. This option cannot be combined with idcluster().
By default, bsample replaces the data in memory with the sampled observations; however,
specifying the weight() option causes only the specified varname to be changed.
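As a minimal sketch of this frequency-weight workflow (the variable name fw and the closing summarize call are only illustrative):

    use http://www.stata-press.com/data/r13/bsample1, clear
    generate fw = .                    // container for the sampling frequencies
    bsample 200, weight(fw)            // data in memory are left unchanged
    summarize female [fweight=fw]      // any command that accepts fweights will do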
Remarks and examples
Below is a series of examples illustrating how bsample is used with various sampling schemes.
Example 1: Bootstrap sampling
We have data on the characteristics of hospital patients and wish to draw a bootstrap sample of
200 patients. We type
. use http://www.stata-press.com/data/r13/bsample1
. bsample 200
. count
200
Example 2: Stratified samples with equal sizes
Among the variables in our dataset is female, an indicator for the female patients. To get a
bootstrap sample of 200 female patients and 200 male patients, we type
. use http://www.stata-press.com/data/r13/bsample1, clear
. bsample 200, strata(female)
. tabulate female
female Freq. Percent Cum.
male 200 50.00 50.00
female 200 50.00 100.00
Total 400 100.00
Example 3: Stratified samples with unequal sizes
To sample 300 females and 200 males, we must generate a variable that is 300 for females and
200 for males and then use this variable in exp when we call bsample.
. use http://www.stata-press.com/data/r13/bsample1, clear
. generate nsamp = cond(female,300,200)
. bsample nsamp, strata(female)
. tabulate female
female Freq. Percent Cum.
male 200 40.00 40.00
female 300 60.00 100.00
Total 500 100.00
Example 4: Stratified samples with proportional sizes
Our original dataset has 2,392 males and 3,418 females.
. use http://www.stata-press.com/data/r13/bsample1, clear
. tabulate female
female Freq. Percent Cum.
male 2,392 41.17 41.17
female 3,418 58.83 100.00
Total 5,810 100.00
To sample 10% from females and males, we type
. bsample round(0.1*_N), strata(female)
bsample requires that the specified size of the sample be an integer, so we use the round()
function to obtain the nearest integer to 0.1 × 2392 and 0.1 × 3418. Our sample now has 239 males
and 342 females:
. tabulate female
female Freq. Percent Cum.
male 239 41.14 41.14
female 342 58.86 100.00
Total 581 100.00
Example 5: Samples satisfying a condition
For a bootstrap sample of 200 female patients, we type
. use http://www.stata-press.com/data/r13/bsample1, clear
. bsample 200 if female
. tabulate female
female Freq. Percent Cum.
female 200 100.00 100.00
Total 200 100.00
Example 6: Generating frequency weights
To identify the sampled observations using frequency weights instead of dropping unsampled
observations, we use the weight() option (we will need to supply it an existing variable name) and
type
. use http://www.stata-press.com/data/r13/bsample1, clear
. set seed 1234
. generate fw = .
(5810 missing values generated)
. bsample 200 if female, weight(fw)
. tabulate fw female
female
fw male female Total
0 2,392 3,221 5,613
1 0 194 194
2 0 3 3
Total 2,392 3,418 5,810
Note that (194 × 1) + (3 × 2) = 200.
Example 7: Oversampling observations
bsample requires the expression in exp to evaluate to a number that is less than or equal to the
number of observations. To sample twice as many male and female patients as there are already in
memory, we must expand the data before using bsample. For example,
. use http://www.stata-press.com/data/r13/bsample1, clear
. set seed 1234
. expand 2
(5810 observations created)
. bsample, strata(female)
. tabulate female
female Freq. Percent Cum.
male 4,784 41.17 41.17
female 6,836 58.83 100.00
Total 11,620 100.00
Example 8: Stratified oversampling with unequal sizes
To sample twice as many female patients as male patients, we must expand the records for the
female patients because there are fewer than twice as many of them as there are male patients. But first, we put the number of observed male patients in a local macro. After expanding the female records, we
generate a variable that contains the number of observations to sample within the two groups.
. use http://www.stata-press.com/data/r13/bsample1, clear
. set seed 1234
. count if !female
2392
. local nmale = r(N)
. expand 2 if female
(3418 observations created)
. generate nsamp = cond(female,2*`nmale',`nmale')
. bsample nsamp, strata(female)
. tabulate female
female Freq. Percent Cum.
male 2,392 33.33 33.33
female 4,784 66.67 100.00
Total 7,176 100.00
Example 9: Oversampling of clusters
For clustered data, sampling more clusters than are present in the original dataset requires more
than just expanding the data. To illustrate, suppose we wanted a bootstrap sample of eight clusters
from a dataset consisting of five clusters of observations.
. use http://www.stata-press.com/data/r13/bsample2, clear
. tabstat x, stat(n mean) by(group)
Summary for variables: x
by categories of: group
group N mean
A 15 -.3073028
B 10 -.00984
C 11 .0810985
D 11 -.1989179
E 29 -.095203
Total 76 -.1153269
bsample will complain if we simply expand the dataset.
. use http://www.stata-press.com/data/r13/bsample2
. expand 3
(152 observations created)
. bsample 8, cluster(group)
resample size must not be greater than number of clusters
r(498);
Expanding the data will only partly solve the problem. We also need a new variable that uniquely
identifies the copied clusters. We use the expandcl command to accomplish both these tasks; see
[D] expandcl.
. use http://www.stata-press.com/data/r13/bsample2, clear
. set seed 1234
. expandcl 2, generate(expgroup) cluster(group)
(76 observations created)
. tabstat x, stat(n mean) by(expgroup)
Summary for variables: x
by categories of: expgroup
expgroup N mean
1 15 -.3073028
2 15 -.3073028
3 10 -.00984
4 10 -.00984
5 11 .0810985
6 11 .0810985
7 11 -.1989179
8 11 -.1989179
9 29 -.095203
10 29 -.095203
Total 152 -.1153269
. generate fw = .
(152 missing values generated)
. bsample 8, cluster(expgroup) weight(fw)
. tabulate fw group
group
fw A B C D E Total
0 15 10 0 0 29 54
1 15 10 22 22 0 69
2 0 0 0 0 29 29
Total 30 20 22 22 58 152
The results from tabulate on the generated frequency weight variable versus the original cluster ID
(group) show us that the bootstrap sample contains one copy of cluster A, one copy of cluster B, two
copies of cluster C, two copies of cluster D, and two copies of cluster E (1 + 1 + 2 + 2 + 2 = 8).
Example 10: Stratified oversampling of clusters
Suppose that we have a dataset containing two strata with five clusters in each stratum, but the
cluster identifiers are not unique between the strata. To get a stratified bootstrap sample with eight
clusters in each stratum, we first use expandcl to expand the data and get a new cluster ID variable.
We use cluster(strid group) in the call to expandcl; this action will uniquely identify the
2 × 5 = 10 clusters across the strata.
. use http://www.stata-press.com/data/r13/bsample2, clear
. set seed 1234
. tabulate group strid
strid
group 1 2 Total
A 7 8 15
B 5 5 10
C 5 6 11
D 5 6 11
E 14 15 29
Total 36 40 76
. expandcl 2, generate(expgroup) cluster(strid group)
(76 observations created)
Now we can use bsample with the expanded data, stratum ID variable, and new cluster ID variable.
. generate fw = .
(152 missing values generated)
. bsample 8, cluster(expgroup) str(strid) weight(fw)
. by strid, sort: tabulate fw group
-> strid = 1
group
fw A B C D E Total
0 0 5 0 5 14 24
1 14 5 10 5 0 34
2 0 0 0 0 14 14
Total 14 10 10 10 28 72
-> strid = 2
group
fw A B C D E Total
0 8 10 0 6 0 24
1 8 0 6 6 15 35
2 0 0 6 0 15 21
Total 16 10 12 12 30 80
The results from by strid: tabulate on the generated frequency weight variable versus the original
cluster ID (group) show us how many times each cluster was sampled for each stratum. For stratum
1, the bootstrap sample contains two copies of cluster A, one copy of cluster B, two copies of cluster C, one copy of cluster D, and two copies of cluster E (2 + 1 + 2 + 1 + 2 = 8). For stratum 2, the bootstrap sample contains one copy of cluster A, zero copies of cluster B, three copies of cluster C, one copy of cluster D, and three copies of cluster E (1 + 0 + 3 + 1 + 3 = 8).
References
Gould, W. W. 2012a. Using Stata’s random-number generators, part 2: Drawing without replacement. The Stata Blog:
Not Elsewhere Classified.
http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/.
. 2012b. Using Stata’s random-number generators, part 3: Drawing with replacement. The Stata Blog:
Not Elsewhere Classified. http://blog.stata.com/2012/08/29/using-statas-random-number-generators-part-3-drawing-
with-replacement/.
Also see
[R] bootstrap — Bootstrap sampling and estimation
[R] bstat — Report bootstrap results
[R] simulate — Monte Carlo simulations
[D] sample — Draw random sample
Title
bstat — Report bootstrap results
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
Bootstrap statistics from variables
bstat [varlist] [if] [in] [, options]
Bootstrap statistics from file
bstat [namelist] using filename [if] [in] [, options]
options              Description
Main
stat(vector)         observed values for each statistic
accel(vector)        acceleration values for each statistic
ties                 adjust BC/BCa confidence intervals for ties
mse                  use MSE formula for variance estimation
Reporting
level(#)             set confidence level; default is level(95)
n(#)                 # of observations from which bootstrap samples were taken
notable              suppress table of results
noheader             suppress table header
nolegend             suppress table legend
verbose              display the full table legend
title(text)          use text as title for bootstrap results
display_options      control column formats and line width
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Resampling > Report bootstrap results
Description
bstat is a programmer’s command that computes and displays estimation results from bootstrap
statistics.
For each variable in varlist (the default is all variables), bstat computes a covariance matrix, estimates bias, and constructs several different confidence intervals (CIs). The following CIs are constructed by bstat:
1. Normal CIs (using the normal approximation)
2. Percentile CIs
3. Bias-corrected (BC) CIs
4. Bias-corrected and accelerated (BCa) CIs (optional)
estat bootstrap displays a table of one or more of the above confidence intervals; see [R] bootstrap postestimation.
If there are bootstrap estimation results in e(), bstat replays them. If given the using modifier,
bstat uses the data in filename to compute the bootstrap statistics while preserving the data currently
in memory. Otherwise, bstat uses the data in memory to compute the bootstrap statistics.
The following options may be used to replay estimation results from bstat:
level(#) notable noheader nolegend verbose title(text)
For all other options and the qualifiers using, if, and in, bstat requires a bootstrap dataset.
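For example (a sketch; the filename mybs is arbitrary), a bootstrap dataset saved with bootstrap's saving() option can typically be replayed later with bstat:

    use http://www.stata-press.com/data/r13/auto, clear
    bootstrap _b, reps(200) seed(1) saving(mybs, replace): regress mpg weight length
    * later, possibly with different data in memory:
    bstat using mybs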
Options
 
Main
stat(vector) specifies the observed value of each statistic (that is, the value of the statistic using the original dataset).
accel(vector) specifies the acceleration of each statistic, which is used to construct BCa CIs.
ties specifies that bstat adjust for ties in the replicate values when computing the median bias used to construct BC and BCa CIs.
mse specifies that bstat compute the variance by using deviations of the replicates from the observed
value of the statistics. By default, bstat computes the variance by using deviations from the
average of the replicates.
 
Reporting
level(#); see [R] estimation options.
n(#) specifies the number of observations from which bootstrap samples were taken. This value is used in no calculations but improves the table header when this information is not saved in the bootstrap dataset.
notable suppresses the display of the output table.
noheader suppresses the display of the table header. This option implies nolegend.
nolegend suppresses the display of the table legend.
verbose specifies that the full table legend be displayed. By default, coefficients and standard errors
are not displayed.
title(text) specifies a title to be displayed above the table of bootstrap results; the default title is Bootstrap results.
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Bootstrap datasets
Creating a bootstrap dataset
Bootstrap datasets
Although bstat allows you to specify the observed value and acceleration of each bootstrap
statistic via the stat() and accel() options, programmers may be interested in what bstat uses
when these options are not supplied.
When working from a bootstrap dataset, bstat first checks the data characteristics (see [P] char)
that it understands:
_dta[bs_version] identifies the version of the bootstrap dataset. This characteristic may be empty (not defined), 2, or 3; otherwise, bstat will quit and display an error message. This version tells bstat which other characteristics to look for in the bootstrap dataset.
bstat uses the following characteristics from version 3 bootstrap datasets:
_dta[N]
_dta[N_strata]
_dta[N_cluster]
_dta[command]
varname[observed]
varname[acceleration]
varname[expression]
bstat uses the following characteristics from version 2 bootstrap datasets:
_dta[N]
_dta[N_strata]
_dta[N_cluster]
varname[observed]
varname[acceleration]
An empty bootstrap dataset version implies that the dataset was created by the bstrap command in a version of Stata earlier than Stata 8. Here bstat expects varname[bstrap] to contain the observed value of the statistic identified by varname (varname[observed] in version 2). All other characteristics are ignored.
_dta[N] is the number of observations in the observed dataset. This characteristic may be overruled by specifying the n() option.
_dta[N_strata] is the number of strata in the observed dataset.
_dta[N_cluster] is the number of clusters in the observed dataset.
_dta[command] is the command used to compute the observed values of the statistics.
varname[observed] is the observed value of the statistic identified by varname. To specify a different value, use the stat() option.
varname[acceleration] is the estimate of acceleration for the statistic identified by varname. To specify a different value, use the accel() option.
varname[expression] is the expression or label that describes the statistic identified by varname.
Creating a bootstrap dataset
Suppose that we are interested in obtaining bootstrap statistics by resampling the residuals from
a regression (which is not possible with the bootstrap command). After loading some data, we
run a regression, save some results relevant to the bstat command, and save the residuals in a new
variable, res.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight length
Source SS df MS Number of obs = 74
F( 2, 71) = 69.34
Model 1616.08062 2 808.040312 Prob > F = 0.0000
Residual 827.378835 71 11.653223 R-squared = 0.6614
Adj R-squared = 0.6519
Total 2443.45946 73 33.4720474 Root MSE = 3.4137
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0038515 .001586 -2.43 0.018 -.0070138 -.0006891
length -.0795935 .0553577 -1.44 0.155 -.1899736 .0307867
_cons 47.88487 6.08787 7.87 0.000 35.746 60.02374
. matrix b = e(b)
. local n = e(N)
. predict res, residuals
We can resample the residual values in res by generating a random observation ID (rid), generating a new response variable (y), and running the original regression with the new response variable.
. set seed 54321
. generate rid = int(_N*runiform())+1
. matrix score double y = b
. replace y = y + res[rid]
(74 real changes made)
. regress y weight length
Source SS df MS Number of obs = 74
F( 2, 71) = 103.41
Model 1773.23548 2 886.617741 Prob > F = 0.0000
Residual 608.747732 71 8.57391172 R-squared = 0.7444
Adj R-squared = 0.7372
Total 2381.98321 73 32.629907 Root MSE = 2.9281
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0059938 .0013604 -4.41 0.000 -.0087064 -.0032813
length -.0127875 .0474837 -0.27 0.788 -.1074673 .0818924
_cons 42.23195 5.22194 8.09 0.000 31.8197 52.6442
Instead of programming this resampling inside a loop, it is much more convenient to write a short program and use the simulate command; see [R] simulate. In the following, mysim_r requires the user to specify a coefficient vector and a residual variable. mysim_r then retrieves the list of predictor variables (removing _cons from the list), generates a new temporary response variable with the resampled residuals, and regresses the new response variable on the predictors.
program mysim_r
        version 13
        syntax name(name=bvector), res(varname)
        tempvar y rid
        local xvars : colnames `bvector'
        local cons _cons
        local xvars : list xvars - cons
        matrix score double `y' = `bvector'
        gen long `rid' = int(_N*runiform()) + 1
        replace `y' = `y' + `res'[`rid']
        regress `y' `xvars'
end
We can now give mysim_r a test run, but we first set the random-number seed (to reproduce
results).
. set seed 54321
. mysim_r b, res(res)
(74 real changes made)
Source SS df MS Number of obs = 74
F( 2, 71) = 103.41
Model 1773.23548 2 886.617741 Prob > F = 0.0000
Residual 608.747732 71 8.57391172 R-squared = 0.7444
Adj R-squared = 0.7372
Total 2381.98321 73 32.629907 Root MSE = 2.9281
__000000 Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0059938 .0013604 -4.41 0.000 -.0087064 -.0032813
length -.0127875 .0474837 -0.27 0.788 -.1074673 .0818924
_cons 42.23195 5.22194 8.09 0.000 31.8197 52.6442
Now that we have a program that will compute the results we want, we can use simulate to
generate a bootstrap dataset and bstat to display the results.
. set seed 54321
. simulate, reps(200) nodots: mysim_r b, res(res)
command: mysim_r b, res(res)
. bstat, stat(b) n(`n')
Bootstrap results Number of obs = 74
Replications = 200
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
_b_weight -.0038515 .0015715 -2.45 0.014 -.0069316 -.0007713
_b_length -.0795935 .0552415 -1.44 0.150 -.1878649 .0286779
_b_cons 47.88487 6.150069 7.79 0.000 35.83096 59.93879
Finally, we see that simulate created some of the data characteristics recognized by bstat. All
we need to do is correctly specify the version of the bootstrap dataset, and bstat will automatically
use the relevant data characteristics.
. char list
_dta[seed]: X681014b5c43f462544a474abacbdd93d00042842
_dta[command]: mysim_r b, res(res)
_b_weight[is_eexp]: 1
_b_weight[colname]: weight
_b_weight[coleq]: _
_b_weight[expression]: _b[weight]
_b_length[is_eexp]: 1
_b_length[colname]: length
_b_length[coleq]: _
_b_length[expression]: _b[length]
_b_cons[is_eexp]: 1
_b_cons[colname]: _cons
_b_cons[coleq]: _
_b_cons[expression]: _b[_cons]
. char _dta[bs_version] 3
. bstat, stat(b) n(`n')
Bootstrap results Number of obs = 74
Replications = 200
command: mysim_r b, res(res)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0038515 .0015715 -2.45 0.014 -.0069316 -.0007713
length -.0795935 .0552415 -1.44 0.150 -.1878649 .0286779
_cons 47.88487 6.150069 7.79 0.000 35.83096 59.93879
See Poi (2004) for another example of residual resampling.
Stored results
bstat stores the following in e():
Scalars
e(N)                sample size
e(N_reps)           number of complete replications
e(N_misreps)        number of incomplete replications
e(N_strata)         number of strata
e(N_clust)          number of clusters
e(k_aux)            number of auxiliary parameters
e(k_eq)             number of equations in e(b)
e(k_exp)            number of standard expressions
e(k_eexp)           number of extended expressions (i.e., _b)
e(k_extra)          number of extra equations beyond the original ones from e(b)
e(level)            confidence level for bootstrap CIs
e(bs_version)       version for bootstrap results
e(rank)             rank of e(V)
Macros
e(cmd)              bstat
e(command)          from _dta[command]
e(cmdline)          command as typed
e(title)            title in estimation output
e(exp#)             expression for the #th statistic
e(prefix)           bootstrap
e(ties)             ties, if specified
e(mse)              mse, if specified
e(vce)              bootstrap
e(vcetype)          title used to label Std. Err.
e(properties)       b V
Matrices
e(b)                observed statistics
e(b_bs)             bootstrap estimates
e(reps)             number of nonmissing results
e(bias)             estimated biases
e(se)               estimated standard errors
e(z0)               median biases
e(accel)            estimated accelerations
e(ci_normal)        normal-approximation CIs
e(ci_percentile)    percentile CIs
e(ci_bc)            bias-corrected CIs
e(ci_bca)           bias-corrected and accelerated CIs
e(V)                bootstrap variance–covariance matrix
References
Ng, E. S.-W., R. Grieve, and J. R. Carpenter. 2013. Two-stage nonparametric bootstrap sampling with shrinkage
correction for clustered data. Stata Journal 13: 141–164.
Poi, B. P. 2004. From the help desk: Some bootstrapping techniques. Stata Journal 4: 312–328.
Also see
[R] bootstrap postestimation — Postestimation tools for bootstrap
[R] bootstrap — Bootstrap sampling and estimation
[R] bsample — Sampling with replacement
Title
centile — Report centile and confidence interval
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
centile [varlist] [if] [in] [, options]
options Description
Main
centile(numlist)     report specified centiles; default is centile(50)
Options
cci binomial exact; conservative confidence interval
normal normal, based on observed centiles
meansd normal, based on mean and standard deviation
level(#)             set confidence level; default is level(95)
by is allowed; see [D] by.
Menu
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Centiles with CIs
Description
centile estimates specified centiles and calculates confidence intervals. If no varlist is specified,
centile calculates centiles for all the variables in the dataset. If centile() is not specified, medians
(centile(50)) are reported.
By default, centile uses a binomial method for obtaining confidence intervals that makes no
assumptions about the underlying distribution of the variable.
Options
 
Main
centile(numlist) specifies the centiles to be reported. The default is to display the 50th centile.
Specifying centile(5) requests that the fifth centile be reported. Specifying centile(5 50
95) requests that the 5th, 50th, and 95th centiles be reported. Specifying centile(10(10)90)
requests that the 10th, 20th, . . . , 90th centiles be reported; see [U] 11.1.8 numlist.
 
Options
cci (conservative confidence interval) forces the confidence limits to fall exactly on sample values.
Confidence intervals displayed with the cci option are slightly wider than those with the default
(nocci) option.
normal causes the confidence interval to be calculated by using a formula for the standard error
of a normal-distribution quantile given by Kendall and Stuart (1969, 237). The normal option is
useful when you want empirical centiles (that is, centiles based on sample order statistics rather than on the mean and standard deviation) and are willing to assume normality.
meansd causes the centile and confidence interval to be calculated based on the sample mean and
standard deviation, and it assumes normality.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [R] level.
Remarks and examples
The qth centile of a continuous random variable, X, is defined as the value of C_q, which fulfills the condition Pr(X ≤ C_q) = q/100. The value of q must be in the range 0 < q < 100, though q is not necessarily an integer. By default, centile estimates C_q for the variables in varlist and for the values of q given in centile(numlist). It makes no assumptions about the distribution of X, and, if necessary, uses linear interpolation between neighboring sample values. Extreme centiles (for example, the 99th centile in samples smaller than 100) are fixed at the minimum or maximum sample value. An “exact” confidence interval for C_q is also given, using the binomial-based method described below in Methods and formulas and in Conover (1999, 143–148). Again, linear interpolation is used to improve the accuracy of the estimated confidence limits, but extremes are fixed at the minimum or maximum sample value.
You can prevent centile from interpolating when calculating binomial-based confidence intervals
by specifying cci. The resulting intervals are generally wider than with the default; that is, the
coverage (confidence level) tends to be greater than the nominal value (given as usual by level(#),
by default 95%).
If the data are believed to be normally distributed (a common case), there are two alternative methods for estimating centiles. If normal is specified, C_q is calculated, as just described, but its confidence interval is based on a formula for the standard error (se) of a normal-distribution quantile given by Kendall and Stuart (1969, 237). If meansd is alternatively specified, C_q is estimated as x̄ + z_q × s, where x̄ and s are the sample mean and standard deviation, and z_q is the qth centile of the standard normal distribution (for example, z_95 = 1.645). The confidence interval is derived from the se of the estimate of C_q.
Example 1
Using auto.dta, we estimate the 5th, 50th, and 95th centiles of the price variable:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. format price %8.2fc
. centile price, centile(5 50 95)
Binom. Interp.
Variable Obs Percentile Centile [95% Conf. Interval]
price 74 5 3,727.75 3,291.23 3,914.16
50 5,006.50 4,593.57 5,717.90
95 13,498.00 11,061.53 15,865.30
summarize produces somewhat different results from centile; see Methods and formulas.
. summarize price, detail
Price
Percentiles Smallest
1% 3291 3291
5% 3748 3299
10% 3895 3667 Obs 74
25% 4195 3748 Sum of Wgt. 74
50% 5006.5 Mean 6165.257
Largest Std. Dev. 2949.496
75% 6342 13466
90% 11385 13594 Variance 8699526
95% 13466 14500 Skewness 1.653434
99% 15906 15906 Kurtosis 4.819188
The confidence limits produced by using the cci option are slightly wider than those produced
without this option:
. centile price, c(5 50 95) cci
Binomial Exact
Variable Obs Percentile Centile [95% Conf. Interval]
price 74 5 3,727.75 3,291.00 3,955.00
50 5,006.50 4,589.00 5,719.00
95 13,498.00 10,372.00 15,906.00
If we are willing to assume that price is normally distributed, we could include either the normal
or the meansd option:
. centile price, c(5 50 95) normal
Normal, based on observed centiles
Variable Obs Percentile Centile [95% Conf. Interval]
price 74 5 3,727.75 3,211.19 4,244.31
50 5,006.50 4,096.68 5,916.32
95 13,498.00 5,426.81 21,569.19
. centile price, c(5 50 95) meansd
Normal, based on mean and std. dev.
Variable Obs Percentile Centile [95% Conf. Interval]
price 74 5 1,313.77 278.93 2,348.61
50 6,165.26 5,493.24 6,837.27
95 11,016.75 9,981.90 12,051.59
With the normal option, the centile estimates are, by definition, the same as before. The confidence
intervals for the 5th and 50th centiles are similar to the previous ones, but the interval for the
95th centile is different. The results using the meansd option also differ from both previous sets of
estimates.
We can use sktest (see [R] sktest) to check the correctness of the normality assumption:
. sktest price
Skewness/Kurtosis tests for Normality
joint
Variable Obs Pr(Skewness) Pr(Kurtosis) adj chi2(2) Prob>chi2
price 74 0.0000 0.0127 21.77 0.0000
sktest reveals that price is definitely not normally distributed, so the normal assumption is not
reasonable, and the normal and meansd options are not appropriate for these data. We should rely
on the results from the default choice, which does not assume normality. If the data are normally
distributed, however, the precision of the estimated centiles and their confidence intervals will be
ordered (best) meansd > normal > [default] (worst). The normal option is useful when we really
do want empirical centiles (that is, centiles based on sample order statistics rather than on the mean
and standard deviation) but are willing to assume normality.
Stored results
centile stores the following in r():
Scalars
r(N)           number of observations
r(n_cent)      number of centiles requested
r(c_#)         value of # centile
r(lb_#)        #-requested centile lower confidence bound
r(ub_#)        #-requested centile upper confidence bound
Macros
r(centiles) centiles requested
Methods and formulas
Methods and formulas are presented under the following headings:
Default case
Normal case
meansd case
Default case
The calculation is based on the method of Mood and Graybill (1963, 408). Let x_1 ≤ x_2 ≤ ··· ≤ x_n be a sample of size n arranged in ascending order. Denote the estimated qth centile of the x's as c_q. We require that 0 < q < 100. Let R = (n + 1)q/100 have integer part r and fractional part f; that is, r = int(R) and f = R − r. (If R is itself an integer, then r = R and f = 0.) Note that 0 ≤ r ≤ n. For convenience, define x_0 = x_1 and x_{n+1} = x_n. C_q is estimated by

c_q = x_r + f × (x_{r+1} − x_r)

that is, c_q is a weighted average of x_r and x_{r+1}. Loosely speaking, a (conservative) p% confidence interval for C_q involves finding the observations ranked t and u, which correspond, respectively, to the α = (100 − p)/200 and 1 − α quantiles of a binomial distribution with parameters n and q/100, that is, B(n, q/100). More precisely, define the ith value (i = 0, ..., n) of the cumulative binomial distribution function as F_i = Pr(S ≤ i), where S has distribution B(n, q/100). For convenience, let F_{−1} = 0 and F_{n+1} = 1. t is found such that F_t ≤ α and F_{t+1} > α, and u is found such that 1 − F_u ≤ α and 1 − F_{u−1} > α.

With the cci option in force, the (conservative) confidence interval is (x_{t+1}, x_{u+1}), and its actual coverage probability is F_u − F_t.

The default case uses linear interpolation on the F_i as follows. Let

g = (α − F_t)/(F_{t+1} − F_t)
h = {α − (1 − F_u)}/{(1 − F_{u−1}) − (1 − F_u)}
  = (α − 1 + F_u)/(F_u − F_{u−1})

The interpolated lower and upper confidence limits (c_{qL}, c_{qU}) for C_q are

c_{qL} = x_{t+1} + g × (x_{t+2} − x_{t+1})
c_{qU} = x_{u+1} − h × (x_{u+1} − x_u)

Suppose that we want a 95% confidence interval for the median of a sample of size 13. Then n = 13, q = 50, p = 95, α = 0.025, R = 14 × 50/100 = 7, and f = 0. Therefore, the median is the 7th observation. Some example data, x_i, and the values of F_i are as follows:

 i     F_i    1 − F_i    x_i        i     F_i    1 − F_i    x_i
 0   0.0001   0.9999               7   0.7095   0.2905     33
 1   0.0017   0.9983      5        8   0.8666   0.1334     37
 2   0.0112   0.9888      7        9   0.9539   0.0461     45
 3   0.0461   0.9539     10       10   0.9888   0.0112     59
 4   0.1334   0.8666     15       11   0.9983   0.0017     77
 5   0.2905   0.7095     23       12   0.9999   0.0001    104
 6   0.5000   0.5000     28       13   1.0000   0.0000    211

The median is x_7 = 33. Also, F_2 ≤ 0.025 and F_3 > 0.025, so t = 2; 1 − F_10 ≤ 0.025 and 1 − F_9 > 0.025, so u = 10. The conservative confidence interval is therefore

(c_{50L}, c_{50U}) = (x_{t+1}, x_{u+1}) = (x_3, x_11) = (10, 77)

with actual coverage F_10 − F_2 = 0.9888 − 0.0112 = 0.9776 (97.8% confidence). For the interpolation calculation, we have

g = (0.025 − 0.0112)/(0.0461 − 0.0112) = 0.395
h = (0.025 − 1 + 0.9888)/(0.9888 − 0.9539) = 0.395

So,

c_{50L} = x_3 + 0.395 × (x_4 − x_3) = 10 + 0.395 × 5 = 11.98
c_{50U} = x_11 − 0.395 × (x_11 − x_10) = 77 − 0.395 × 18 = 69.89
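The following sketch reproduces this worked example in Stata; the variable name x is arbitrary, and the displayed limits should correspond to the interpolated values of about 11.98 and 69.89 derived above (or 10 and 77 with cci).

    * the 13 ordered sample values from the worked example
    clear
    input x
      5
      7
     10
     15
     23
     28
     33
     37
     45
     59
     77
    104
    211
    end
    centile x, centile(50)        // default binomial-based interval with interpolation
    centile x, centile(50) cci    // conservative interval with limits on sample values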
Normal case
The value of c_q is as above. Its se is given by the formula

s_q = √{q(100 − q)} / {100 √n Z(c_q; x̄, s)}

where x̄ and s are the mean and standard deviation of the x_i, and

Z(Y; μ, σ) = (1/√(2πσ²)) e^{−(Y−μ)²/(2σ²)}

is the density function of a normally distributed variable Y with mean μ and standard deviation σ. The confidence interval for C_q is (c_q − z_{100(1−α)} s_q, c_q + z_{100(1−α)} s_q).
meansd case
The value of c_q is x̄ + z_q × s. Its se is given by the formula

s*_q = s √{1/n + z_q²/(2n − 2)}

The confidence interval for C_q is (c_q − z_{100(1−α)} × s*_q, c_q + z_{100(1−α)} × s*_q).
Acknowledgment
centile was written by Patrick Royston of the MRC Clinical Trials Unit, London, and coauthor
of the Stata Press book Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model.
References
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
Kendall, M. G., and A. Stuart. 1969. The Advanced Theory of Statistics, Vol. 1: Distribution Theory. 3rd ed. London:
Griffin.
Mood, A. M., and F. A. Graybill. 1963. Introduction to the Theory of Statistics. 2nd ed. New York: McGraw–Hill.
Newson, R. B. 2000. snp16: Robust confidence intervals for median and other percentile differences between two
groups. Stata Technical Bulletin 58: 30–35. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 324–331.
College Station, TX: Stata Press.
Royston, P. 1992. sg7: Centile estimation command. Stata Technical Bulletin 8: 12–15. Reprinted in Stata Technical
Bulletin Reprints, vol. 2, pp. 122–125. College Station, TX: Stata Press.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Also see
[R] ci — Confidence intervals for means, proportions, and counts
[R] summarize — Summary statistics
[D] pctile — Create variable containing percentiles
Title
ci — Confidence intervals for means, proportions, and counts
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
Syntax for ci
ci [varlist] [if] [in] [weight] [, options]
Immediate command for variable distributed as normal
cii #obs #mean #sd [, ciin option]
Immediate command for variable distributed as binomial
cii #obs #succ [, ciib options]
Immediate command for variable distributed as Poisson
cii #exposure #events, poisson [ciip options]
options              Description
Main
binomial             binomial 0/1 variables; compute exact confidence intervals
poisson              Poisson variables; compute exact confidence intervals
exposure(varname)    exposure variable; implies poisson
exact                calculate exact confidence intervals; the default
wald                 calculate Wald confidence intervals
wilson               calculate Wilson confidence intervals
agresti              calculate Agresti–Coull confidence intervals
jeffreys             calculate Jeffreys confidence intervals
total                add output for all groups combined (for use with by only)
separator(#)         draw separator line after every # variables; default is separator(5)
level(#)             set confidence level; default is level(95)
by is allowed with ci; see [D] by.
aweights and fweights are allowed, but aweights may not be specified with the binomial or poisson options;
see [U] 11.1.6 weight.
ciin option          Description
level(#)             set confidence level; default is level(95)
ciib options         Description
level(#)             set confidence level; default is level(95)
exact                calculate exact confidence intervals; the default
wald                 calculate Wald confidence intervals
wilson               calculate Wilson confidence intervals
agresti              calculate Agresti–Coull confidence intervals
jeffreys             calculate Jeffreys confidence intervals
ciip options         Description
poisson              numbers are Poisson-distributed counts
level(#)             set confidence level; default is level(95)
poisson is required.
Menu
ci
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Confidence intervals
cii for variable distributed as normal
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Normal CI calculator
cii for variable distributed as binomial
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Binomial CI calculator
cii for variable distributed as Poisson
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Poisson CI calculator
Description
ci computes standard errors and confidence intervals for each of the variables in varlist.
cii is the immediate form of ci; see [U] 19 Immediate commands for a general discussion of
immediate commands.
In the binomial and Poisson variants of cii, the second number specified (#succ or #events) must
be an integer or between 0 and 1. If the number is between 0 and 1, Stata interprets it as the fraction
of successes or events and converts it to an integer number representing the number of successes or
events. The computation then proceeds as if two integers had been specified.
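For instance (a small sketch), because 0.1 × 20 = 2, the following two commands request the same binomial confidence interval:

    cii 20 2         // 2 successes out of 20 trials
    cii 20 0.1       // the same, specified as a fraction of successes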
Options
 
Main
binomial tells ci that the variables are 0/1 variables and that binomial confidence intervals will be
calculated. (cii produces binomial confidence intervals when only two numbers are specified.)
poisson specifies that the variables (or numbers for cii) are Poisson-distributed counts; exact Poisson
confidence intervals will be calculated.
exposure(varname) is used only with poisson. You do not need to specify poisson if you specify exposure(); poisson is assumed. varname contains the total exposure (typically a time or an area) during which the number of events recorded in varlist were observed.
exact, wald, wilson, agresti, and jeffreys specify that variables are 0/1 and specify how binomial confidence intervals are to be calculated.
exact is the default and specifies exact (also known in the literature as Clopper–Pearson [1934]) binomial confidence intervals.
wald specifies calculation of Wald confidence intervals.
wilson specifies calculation of Wilson confidence intervals.
agresti specifies calculation of Agresti–Coull confidence intervals.
jeffreys specifies calculation of Jeffreys confidence intervals.
See Brown, Cai, and DasGupta (2001) for a discussion and comparison of the different binomial
confidence intervals.
total is for use with the by prefix. It requests that, in addition to output for each by-group, output
be added for all groups combined.
separator(#) specifies how often separation lines should be inserted into the output. The default is separator(5), meaning that a line is drawn after every five variables. separator(10) would draw the line after every 10 variables. separator(0) suppresses the separation line.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level; see [R] level.
Remarks and examples
Remarks are presented under the following headings:
Ordinary confidence intervals
Binomial confidence intervals
Poisson confidence intervals
Immediate form
Video examples
Ordinary confidence intervals
Example 1
Without the binomial or poisson options, ci produces “ordinary” confidence intervals, meaning
those that are correct if the variable is distributed normally, and asymptotically correct for all other
distributions satisfying the conditions of the central limit theorem.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ci mpg price
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.2973 .6725511 19.9569 22.63769
price 74 6165.257 342.8719 5481.914 6848.6
The standard error of the mean of mpg is 0.67, and the 95% confidence interval is [19.96, 22.64].
We can obtain wider confidence intervals, 99%, by typing
. ci mpg price, level(99)
Variable Obs Mean Std. Err. [99% Conf. Interval]
mpg 74 21.2973 .6725511 19.51849 23.07611
price 74 6165.257 342.8719 5258.405 7072.108
Example 2
by() breaks out the confidence intervals according to by-group; total adds an overall summary.
For instance,
. ci mpg, by(foreign) total
-> foreign = Domestic
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 52 19.82692 .657777 18.50638 21.14747
-> foreign = Foreign
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 22 24.77273 1.40951 21.84149 27.70396
-> Total
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.2973 .6725511 19.9569 22.63769
Technical note
You can control the formatting of the numbers in the output by specifying a display format for
the variable; see [U] 12.5 Formats: Controlling how data are displayed. For instance,
. format mpg %9.2f
. ci mpg
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.30 0.67 19.96 22.64
Binomial confidence intervals
Example 3
We have data on employees, including a variable marking whether the employee was promoted
last year.
. use http://www.stata-press.com/data/r13/promo
. ci promoted, binomial
Binomial Exact
Variable Obs Mean Std. Err. [95% Conf. Interval]
promoted 20 .1 .067082 .0123485 .3169827
The above interval is the default for binomial data, known equivalently as both the exact binomial and the Clopper–Pearson interval.
Nominally, the interpretation of a 95% confidence interval is that under repeated samples or
experiments, 95% of the resultant intervals would contain the unknown parameter in question.
However, for binomial data, the actual coverage probability, regardless of method, usually differs from
that interpretation. This result occurs because of the discreteness of the binomial distribution, which
produces only a finite set of outcomes, meaning that coverage probabilities are subject to discrete
jumps and the exact nominal level cannot always be achieved. Therefore, the term exact confidence
interval refers to its being derived from the binomial distribution, the distribution exactly generating
the data, rather than resulting in exactly the nominal coverage.
For the Clopper–Pearson interval, the actual coverage probability is guaranteed to be greater than or equal to the nominal confidence level, here 95%. Because of the way it is calculated (see Methods and formulas), it may also be interpreted as follows: If the true probability of being promoted were 0.012, the chances of observing a result as extreme or more extreme than the result observed (20 × 0.1 = 2 or more promotions) would be 2.5%. If the true probability of being promoted were 0.317, the chances of observing a result as extreme or more extreme than the result observed (two or fewer promotions) would be 2.5%.
Example 4
The Clopper–Pearson interval is desirable because it guarantees nominal coverage; however, by
dropping this restriction, you may obtain accurate intervals that are not as conservative. In this vein,
you might opt for the Wilson (1927) interval,
. ci promoted, binomial wilson
Wilson
Variable Obs Mean Std. Err. [95% Conf. Interval]
promoted 20 .1 .067082 .0278665 .3010336
the Agresti–Coull (1998) interval,
. ci promoted, binomial agresti
Agresti-Coull
Variable Obs Mean Std. Err. [95% Conf. Interval]
promoted 20 .1 .067082 .0156562 .3132439
or the Bayesian-derived Jeffreys interval (Brown, Cai, and DasGupta 2001),
. ci promoted, binomial jeffreys
Jeffreys
Variable Obs Mean Std. Err. [95% Conf. Interval]
promoted 20 .1 .067082 .0213725 .2838533
Picking the best interval is a matter of balancing accuracy (coverage) against precision (average interval length) and depends on sample size and success probability. Brown, Cai, and DasGupta (2001) recommend the Wilson or Jeffreys interval for small sample sizes (n ≤ 40) yet favor the Agresti–Coull interval for its simplicity, decent performance for sample sizes less than or equal to 40, and performance comparable to Wilson/Jeffreys for sample sizes greater than 40. They also deem the Clopper–Pearson interval to be “wastefully conservative and [. . . ] not a good choice for practical use”, unless of course one requires, at a minimum, the nominal coverage level.
Finally, the binomial Wald confidence interval is obtained by specifying the binomial and wald options. The Wald interval is the one taught in most introductory statistics courses and, for the above, is simply, for level 1 − α, Mean ± z_α (Std. Err.), where z_α is the 1 − α/2 quantile of the standard normal. Because its overall poor performance makes it impractical, the Wald interval is available mainly for pedagogical purposes. The binomial Wald interval is also similar to the interval produced by treating binary data as normal data and using ci without the binomial option, with two exceptions. First, when binomial is specified, the calculation of the standard error uses denominator n rather than n − 1, which is used for normal data. Second, confidence intervals for normal data are based on the t distribution rather than the standard normal. Of course, both discrepancies vanish as sample size increases.
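Both discrepancies are easy to see by hand. A sketch for the promotion data of example 3 (k = 2 successes in n = 20; the scalar name phat is arbitrary); the first displayed value should match the Std. Err. reported there:

    scalar phat = 2/20
    * binomial: standard error uses denominator n
    display "binomial s.e.:     " sqrt(phat*(1 - phat)/20)
    * normal data: the unbiased variance uses denominator n - 1
    display "normal-data s.e.:  " sqrt(phat*(1 - phat)/19)
    * and the interval uses a t rather than a standard normal quantile
    display "z quantile:        " invnormal(0.975)
    display "t quantile, 19 df: " invttail(19, 0.025)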
Technical note
Let’s repeat example 3, but this time with data in which there are no promotions over the observed
period:
. use http://www.stata-press.com/data/r13/promonone
. ci promoted, binomial
Binomial Exact
Variable Obs Mean Std. Err. [95% Conf. Interval]
promoted 20 0 0 0 .1684335*
(*) one-sided, 97.5% confidence interval
The confidence interval is [0, 0.168], and this is the confidence interval that most books publish. It
is not, however, a true 95% confidence interval because the lower tail has vanished. As Stata notes,
it is a one-sided, 97.5% confidence interval. If you wanted to put 5% in the right tail, you could type
ci promoted, binomial level(90).
Technical note
ci with the binomial option ignores any variables that do not take on the values 0 and 1
exclusively. For instance, with our automobile dataset,
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ci mpg foreign, binomial
Binomial Exact
Variable Obs Mean Std. Err. [95% Conf. Interval]
foreign 74 .2972973 .0531331 .196584 .4148353
We also requested the confidence interval for mpg, but Stata ignored us. It does that so you can type
ci, binomial and obtain correct confidence intervals for all the variables that are 0/1 in your data.
Poisson confidence intervals
Example 5
We have data on the number of bacterial colonies on a Petri dish. The dish has been divided into
36 small squares, and the number of colonies in each square has been counted. Each observation in
our dataset represents a square on the dish. The variable count records the number of colonies in
each square counted, which varies from 0 to 5.
. use http://www.stata-press.com/data/r13/petri
. ci count, poisson
Poisson Exact
Variable Exposure Mean Std. Err. [95% Conf. Interval]
count 36 2.333333 .2545875 1.861158 2.888825
ci reports that the average number of colonies per square is 2.33. If the expected number of colonies
per square were as low as 1.86, the probability of observing 2.33 or more colonies per square would
be 2.5%. If the expected number were as large as 2.89, the probability of observing 2.33 or fewer
colonies per square would be 2.5%.
Technical note
The number of “observations” (how finely the Petri dish is divided) makes no difference. The Poisson distribution is a function only of the count. In example 5, we observed a total of 2.33 × 36 = 84 colonies and a confidence interval of [1.86 × 36, 2.89 × 36] = [67, 104]. We would obtain the same [67, 104] confidence interval if our dish were divided into, say, 49 squares, rather than 36.
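Equivalently (a sketch), supplying the total count and the total exposure to the immediate command should reproduce the interval reported by ci count, poisson above:

    cii 36 84, poisson      // 84 colonies over a total exposure of 36 squares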
For the counts, it is not even important that all the squares be of the same size. For rates, however,
such differences do matter, but in an easy-to-calculate way. Rates are obtained from counts by dividing
by exposure, which is typically a number multiplied by either time or an area. For our Petri dishes,
we divide by an area to obtain a rate, but if our example were cast in terms of being infected by a
disease, we might divide by person-years to obtain the rate. Rates are convenient because they are
easier to compare: we might have 2.3 colonies per square inch or 0.0005 infections per person-year.
So, let’s assume that we wish to obtain the number of colonies per square inch, and, moreover,
that not all the “squares” on our dish are of equal size. We have a variable called area that records
the area of each “square”:
. ci count, exposure(area)
Poisson Exact
Variable Exposure Mean Std. Err. [95% Conf. Interval]
count 3 28 3.055051 22.3339 34.66591
The rates are now in more familiar terms. In our sample, there are 28 colonies per square inch and
the 95% confidence interval is [22.3, 34.7]. When we did not specify exposure(), ci assumed that each observation contributed 1 to exposure.
Technical note
As with the binomial option, if there were no colonies on our dish, ci would calculate a one-sided
confidence interval:
. use http://www.stata-press.com/data/r13/petrinone
. ci count, poisson
Poisson Exact
Variable Exposure Mean Std. Err. [95% Conf. Interval]
count 36 0 0 0 .1024689*
(*) one-sided, 97.5% confidence interval
Immediate form
Example 6
We are reading a soon-to-be-published paper by a colleague. In it is a table showing the number of
observations, mean, and standard deviation of 1980 median family income for the Northeast and West.
We correctly think that the paper would be much improved if it included the confidence intervals.
The paper claims that for 166 cities in the Northeast, the average of median family income is $19,509
with a standard deviation of $4,379:
For the Northeast:
. cii 166 19509 4379
Variable Obs Mean Std. Err. [95% Conf. Interval]
166 19509 339.8763 18837.93 20180.07
For the West:
. cii 256 22557 5003
Variable Obs Mean Std. Err. [95% Conf. Interval]
256 22557 312.6875 21941.22 23172.78
Example 7
We flip a coin 10 times, and it comes up heads only once. We are shocked and decide to obtain
a 99% confidence interval for this coin:
. cii 10 1, level(99)
Binomial Exact
Variable Obs Mean Std. Err. [99% Conf. Interval]
10 .1 .0948683 .0005011 .5442871
Example 8
The number of reported traffic accidents in Santa Monica over a 24-hour period is 27. We need
know nothing else:
. cii 1 27, poisson
Poisson Exact
Variable Exposure Mean Std. Err. [95% Conf. Interval]
1 27 5.196152 17.79317 39.28358
Video examples
Immediate commands in Stata: Confidence intervals for Poisson data
Immediate commands in Stata: Confidence intervals for binomial data
Immediate commands in Stata: Confidence intervals for normal data
Stored results
ci and cii store the following in r():
Scalars
r(N)           number of observations or exposure
r(mean)        mean
r(se)          estimate of standard error
r(lb)          lower bound of confidence interval
r(ub)          upper bound of confidence interval
Methods and formulas
Methods and formulas are presented under the following headings:
Ordinary
Binomial
Poisson
Ordinary
Define n, x̄, and s² as, respectively, the number of observations, (weighted) average, and (unbiased) estimated variance of the variable in question; see [R] summarize.

The standard error of the mean, s_μ, is defined as √(s²/n).

Let α be 1 − l/100, where l is the confidence level specified by the user. Define t_α as the two-sided t statistic corresponding to a significance level of α with n − 1 degrees of freedom; t_α is obtained from Stata as invttail(n-1, 0.5*α). The lower and upper confidence bounds are, respectively, x̄ − s_μ t_α and x̄ + s_μ t_α.
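As a sketch of these formulas (the scalar names are arbitrary; compare with the ci mpg output in example 1 above):

    use http://www.stata-press.com/data/r13/auto, clear
    quietly summarize mpg
    scalar se = sqrt(r(Var)/r(N))                    // standard error of the mean
    scalar ta = invttail(r(N) - 1, 0.5*(1 - 95/100)) // two-sided t for a 95% level
    display "mean:     " r(mean)
    display "lower CI: " r(mean) - se*ta
    display "upper CI: " r(mean) + se*ta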
Binomial
Given k successes of n trials, the estimated probability is p̂ = k/n with standard error √{p̂(1 − p̂)/n}. ci calculates the exact (Clopper–Pearson) confidence interval [p_1, p_2] such that

Pr(K ≥ k | p = p_1) = α/2

and

Pr(K ≤ k | p = p_2) = α/2

where K is distributed as binomial(n, p). The endpoints may be obtained directly by using Stata's invbinomial() function. If k = 0 or k = n, the calculation of the appropriate tail is skipped.

The Wald interval is p̂ ± z_α √{p̂(1 − p̂)/n}, where z_α is the 1 − α/2 quantile of the standard normal. The interval is obtained by inverting the acceptance region of the large-sample Wald test of H_0: p = p_0 versus the two-sided alternative. That is, the confidence interval is the set of all p_0 such that

| (p̂ − p_0) / √{n^{−1} p̂(1 − p̂)} | ≤ z_α

The Wilson interval is a variation on the Wald interval, using the null standard error √{n^{−1} p_0(1 − p_0)} in place of the estimated standard error √{n^{−1} p̂(1 − p̂)} in the above expression. Inverting this acceptance region is more complicated yet results in the closed form

(k + z_α²/2)/(n + z_α²) ± {z_α n^{1/2}/(n + z_α²)} {p̂(1 − p̂) + z_α²/(4n)}^{1/2}

The Agresti–Coull interval is basically a Wald interval that borrows its center from the Wilson interval. Defining k̃ = k + z_α²/2, ñ = n + z_α², and (hence) p̃ = k̃/ñ, the Agresti–Coull interval is

p̃ ± z_α √{p̃(1 − p̃)/ñ}

When α = 0.05, z_α is near enough to 2 that p̃ can be thought of as a typical estimate of proportion where two successes and two failures have been added to the sample (Agresti and Coull 1998). This typical estimate of proportion makes the Agresti–Coull interval an easy-to-present alternative for introductory statistics students.

The Jeffreys interval is a Bayesian interval and is based on the Jeffreys prior, which is the Beta(1/2, 1/2) distribution. Assigning this prior to p results in a posterior distribution for p that is Beta with parameters k + 1/2 and n − k + 1/2. The Jeffreys interval is then taken to be the 1 − α central posterior probability interval, namely, the α/2 and 1 − α/2 quantiles of the Beta(k + 1/2, n − k + 1/2) distribution. These quantiles may be obtained directly by using Stata's invibeta() function.
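To illustrate, the following sketch recomputes the Jeffreys and Agresti–Coull limits for the promotion data of examples 3 and 4 (k = 2, n = 20); compare with the ci output shown there. The scalar names are arbitrary.

    scalar k  = 2
    scalar n  = 20
    scalar za = invnormal(0.975)                    // z_alpha for a 95% interval
    * Jeffreys: alpha/2 and 1-alpha/2 quantiles of Beta(k+1/2, n-k+1/2)
    display "Jeffreys lower:      " invibeta(k + 0.5, n - k + 0.5, 0.025)
    display "Jeffreys upper:      " invibeta(k + 0.5, n - k + 0.5, 0.975)
    * Agresti-Coull: Wald-type interval centered at the Wilson center
    scalar pt = (k + za^2/2)/(n + za^2)
    scalar st = sqrt(pt*(1 - pt)/(n + za^2))
    display "Agresti-Coull lower: " pt - za*st
    display "Agresti-Coull upper: " pt + za*st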
Poisson
Given the total cases, k, the estimate of the expected count λ is k, and its standard error is √k. ci calculates the exact confidence interval [λ_1, λ_2] such that

Pr(K ≥ k | λ = λ_1) = α/2

and

Pr(K ≤ k | λ = λ_2) = α/2

where K is Poisson with mean λ. Solution is by Newton's method. If k = 0, the calculation of λ_1 is skipped. All values are then reported as rates, which are the above numbers divided by the total exposure.
exposure.
 
Harold Jeffreys (1891–1989) was born near Durham, England, and spent more than 75 years
studying and working at the University of Cambridge, principally on theoretical and observational
problems in geophysics, astronomy, mathematics, and statistics. He developed a systematic
Bayesian approach to inference in his monograph Theory of Probability.
Edwin Bidwell (E. B.) Wilson (1879–1964) majored in mathematics at Harvard and studied and
taught at Yale and MIT before returning to Harvard in 1922. He worked in mathematics, physics,
and statistics. His method for binomial intervals can be considered a precursor, for a particular
problem, of Neyman’s concept of confidence intervals.
Jerzy Neyman (1894–1981) was born in Bendery, Russia, now Moldavia. He studied and then
taught at Kharkov University, moving from physics to mathematics. In 1921, Neyman moved
to Poland, where he worked in statistics at Bydgoszcz and then Warsaw. Neyman received a
Rockefeller Fellowship to work with Karl Pearson at University College London. There, he
collaborated with Egon Pearson, Karl’s son, on the theory of hypothesis testing. Life in Poland
became progressively more difficult, and Neyman returned to UCL to work there from 1934 to 1938.
At this time, he published on the theory of confidence intervals. He then was offered a post in
California at Berkeley, where he settled. Neyman established an outstanding statistics department
and remained highly active in research, including applications in astronomy, meteorology, and
medicine. He was one of the great statisticians of the 20th century.
 
Acknowledgment
We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor
of the Stata Journal for his assistance with the jeffreys and wilson options.
References
Agresti, A., and B. A. Coull. 1998. Approximate is better than “exact” for interval estimation of binomial proportions.
American Statistician 52: 119–126.
Brown, L. D., T. T. Cai, and A. DasGupta. 2001. Interval estimation for a binomial proportion. Statistical Science
16: 101–133.
Campbell, M. J., D. Machin, and S. J. Walters. 2007. Medical Statistics: A Textbook for the Health Sciences. 4th
ed. Chichester, UK: Wiley.
Clopper, C. J., and E. S. Pearson. 1934. The use of confidence or fiducial limits illustrated in the case of the binomial.
Biometrika 26: 404–413.
Cook, A. 1990. Sir Harold Jeffreys. 2 April 1891–18 March 1989. Biographical Memoirs of Fellows of the Royal
Society 36: 303–333.
Gleason, J. R. 1999. sg119: Improved confidence intervals for binomial proportions. Stata Technical Bulletin 52:
16–18. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 208–211. College Station, TX: Stata Press.
Jeffreys, H. 1946. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society
of London, Series A 186: 453–461.
Lindley, D. V. 2001. Harold Jeffreys. In Statisticians of the Centuries, ed. C. C. Heyde and E. Seneta, 402–405. New
York: Springer.
Reid, C. 1982. Neyman—from Life. New York: Springer.
Rothman, K. J., S. Greenland, and T. L. Lash. 2008. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams
& Wilkins.
Seed, P. T. 2001. sg159: Confidence intervals for correlations. Stata Technical Bulletin 59: 27–28. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 267–269. College Station, TX: Stata Press.
Stigler, S. M. 1997. Wilson, Edwin Bidwell. In Leading Personalities in Statistical Sciences: From the Seventeenth
Century to the Present, ed. N. L. Johnson and S. Kotz, 344–346. New York: Wiley.
Utts, J. M. 2005. Seeing Through Statistics. 3rd ed. Belmont, CA: Brooks/Cole.
Wang, D. 2000. sg154: Confidence intervals for the ratio of two binomial proportions by Koopman’s method. Stata
Technical Bulletin 58: 16–19. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 244–247. College Station,
TX: Stata Press.
Wilson, E. B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American
Statistical Association 22: 209–212.
Also see
[R] ameans — Arithmetic, geometric, and harmonic means
[R] bitest — Binomial probability test
[R] centile — Report centile and confidence interval
[D] pctile — Create variable containing percentiles
[R] prtest — Tests of proportions
[R] summarize — Summary statistics
[R] ttest — t tests (mean-comparison tests)
Title
clogit — Conditional (fixed-effects) logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
clogit depvar [indepvars] [if] [in] [weight], group(varname) [options]
options                       Description

Model
  group(varname)              matched group variable
  offset(varname)             include varname in model with coefficient constrained to 1
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables

SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                                or jackknife
  nonest                      do not check that panels are nested within clusters

Reporting
  level(#)                    set confidence level; default is level(95)
  or                          report odds ratios
  nocnsreport                 do not display constraints
  display options             control column formats, row spacing, line width, display of omitted
                                variables and base and empty cells, and factor-variable labeling

Maximization
  maximize options            control the maximization process; seldom used

  coeflegend                  display legend instead of statistics

group(varname) is required.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
  see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), nonest, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed (see [U] 11.1.6 weight), but they are interpreted to apply to groups
  as a whole, not to individual observations. See Use of weights below.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Categorical outcomes > Conditional logistic regression
Description
clogit fits what biostatisticians and epidemiologists call conditional logistic regression for matched
case–control groups (see, for example, Hosmer, Lemeshow, and Sturdivant [2013, chap. 7]) and what
economists and other social scientists call fixed-effects logit for panel data (see, for example,
Chamberlain [1980]). Computationally, these models are the same. depvar equal to nonzero and
nonmissing (typically depvar equal to one) indicates a positive outcome, whereas depvar equal to
zero indicates a negative outcome.
See [R] asclogit if you want to fit McFadden’s choice model (McFadden 1974). Also see [R] logistic
for a list of related estimation commands.
Options
 
Model
group(varname) is required; it specifies an identifier variable (numeric or string) for the matched
groups. strata(varname) is a synonym for group().
offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
nonest, available only with vce(cluster clustvar), prevents checking that matched groups are
nested within clusters. It is the user’s responsibility to verify that the standard errors are theoretically
correct.
 
Reporting
level(#); see [R] estimation options.
or reports the estimated coefficients transformed to odds ratios, that is, e^b rather than b. Standard errors
and confidence intervals are similarly transformed. This option affects how results are displayed,
not how they are estimated. or may be specified at estimation or when replaying previously
estimated results.
nocnsreport; see [R] estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with clogit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Matched case–control data
Use of weights
Fixed-effects logit
Introduction
clogit fits maximum likelihood models with a dichotomous dependent variable coded as 0/1
(more precisely, clogit interprets 0 and not 0 to indicate the dichotomy). Conditional logistic analysis
differs from regular logistic regression in that the data are grouped and the likelihood is calculated
relative to each group; that is, a conditional likelihood is used. See Methods and formulas at the end
of this entry.
Biostatisticians and epidemiologists fit these models when analyzing matched case–control studies
with 1:1 matching, 1:k2i matching, or k1i:k2i matching, where i denotes the ith matched group
for i = 1, 2, ..., n, where n is the total number of groups. clogit fits a model appropriate for
all of these matching schemes or for any mix of the schemes because the matching k1i:k2i can
vary from group to group. clogit always uses the true conditional likelihood, not an approximation.
Biostatisticians and epidemiologists sometimes refer to the matched groups as “strata”, but we will
stick to the more generic term “group”.
Economists and other social scientists fitting fixed-effects logit models have data that look exactly
like the data biostatisticians and epidemiologists call k1i:k2i matched case–control data. In terms
of how the data are arranged, k1i:k2i matching means that in the ith group, the dependent variable
is 1 a total of k1i times and 0 a total of k2i times. There are a total of Ti = k1i + k2i observations
for the ith group. This data arrangement is what economists and other social scientists call “panel
data”, “longitudinal data”, or “cross-sectional time-series data”.
So no matter what terminology you use, the computation and the use of the clogit command are
the same. The following example shows how your data should be arranged to use clogit.
Example 1
Suppose that we have grouped data with the variable id containing a unique identifier for each
group. Our outcome variable, y, contains 0s and 1s. If we were biostatisticians, y = 1 would indicate
a case, y = 0 would be a control, and id would be an identifier variable that indicates the groups of
matched case–control subjects.
If we were economists, y = 1 might indicate that a person was unemployed at any time during
a year and y = 0, that a person was employed all year, and id would be an identifier variable for
persons.
If we list the first few observations of this dataset, it looks like
. use http://www.stata-press.com/data/r13/clogitid
. list y x1 x2 id in 1/11
y x1 x2 id
1. 0 0 4 1014
2. 0 1 4 1014
3. 0 1 6 1014
4. 1 1 8 1014
5. 0 0 1 1017
6. 0 0 7 1017
7. 1 1 10 1017
8. 0 0 1 1019
9. 0 1 7 1019
10. 1 1 7 1019
11. 1 1 9 1019
Pretending that we are biostatisticians, we describe our data as follows. The first group (id = 1014)
consists of 4 matched persons: 1 case (y = 1) and 3 controls (y = 0), that is, 1:3 matching.
The second group has 1:2 matching, and the third 2:2.
Pretending that we are economists, we describe our data as follows. The first group consists of
4 observations (one per year) for person 1014. This person had a period of unemployment during 1
of the 4 years. The second person had a period of unemployment during 1 of 3 years, and the third
during 2 of 4 years.
Our independent variables are x1 and x2. To fit the conditional (fixed-effects) logistic model, we
type
. clogit y x1 x2, group(id)
note: multiple positive outcomes within groups encountered.
Iteration 0: log likelihood = -123.42828
Iteration 1: log likelihood = -123.41386
Iteration 2: log likelihood = -123.41386
Conditional (fixed-effects) logistic regression Number of obs = 369
LR chi2(2) = 9.07
Prob > chi2 = 0.0107
Log likelihood = -123.41386 Pseudo R2 = 0.0355
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 .653363 .2875215 2.27 0.023 .0898312 1.216895
x2 .0659169 .0449555 1.47 0.143 -.0221943 .1540281
Technical note
The message “note: multiple positive outcomes within groups encountered” at the top of the
clogit output for the previous example merely informs us that we have k1i:k2i matching with
k1i > 1 for at least one group. If your data should be 1:k2i matched, this message tells you that
there is an error in the data somewhere.
We can see the distribution of k1i and Ti = k1i + k2i for the data of example 1 by using the
following steps:
. by id, sort: gen k1 = sum(y)
. by id: replace k1 = . if _n < _N
(303 real changes made, 303 to missing)
. by id: gen T = sum(y<.)
. by id: replace T = . if _n < _N
(303 real changes made, 303 to missing)
. tabulate k1
k1 Freq. Percent Cum.
1 48 72.73 72.73
2 12 18.18 90.91
3 4 6.06 96.97
4 2 3.03 100.00
Total 66 100.00
. tabulate T
T Freq. Percent Cum.
2 5 7.58 7.58
3 5 7.58 15.15
4 12 18.18 33.33
5 11 16.67 50.00
6 13 19.70 69.70
7 8 12.12 81.82
8 3 4.55 86.36
9 7 10.61 96.97
10 2 3.03 100.00
Total 66 100.00
We see that k1i ranges from 1 to 4 and Ti ranges from 2 to 10 for these data.
Technical note
For k1i:k2i matching (and hence in the general case of fixed-effects logit), clogit uses a recursive
algorithm to compute the likelihood, which means that there are no limits on the size of Ti. However,
computation time is proportional to ∑i Ti min(k1i, k2i), so clogit will take roughly 10 times longer
to fit a model with 10:10 matching than one with 1:10 matching. But clogit is fast, so computation
time becomes an issue only when min(k1i, k2i) is around 100 or more. See Methods and formulas
for details.
Matched case–control data
Here we give a more detailed example of matched case–control data.
Example 2
Hosmer, Lemeshow, and Sturdivant (2013, 24) present data on matched pairs of infants, each pair
having one with low birthweight and another with regular birthweight. The data are matched on age
of the mother. Several possible maternal exposures are considered: race (three categories), smoking
status, presence of hypertension, presence of uterine irritability, previous preterm delivery, and weight
at the last menstrual period.
. use http://www.stata-press.com/data/r13/lowbirth2, clear
(Applied Logistic Regression, Hosmer & Lemeshow)
. describe
Contains data from http://www.stata-press.com/data/r13/lowbirth2.dta
obs: 112 Applied Logistic Regression,
Hosmer & Lemeshow
vars: 9 30 Jan 2013 08:46
size: 1,120
storage display value
variable name type format label variable label
pairid byte %8.0g Case-control pair ID
low byte %8.0g Baby has low birthweight
age byte %8.0g Age of mother
lwt int %8.0g Mother’s last menstrual weight
smoke byte %8.0g Mother smoked during pregnancy
ptd byte %8.0g Mother had previous preterm baby
ht byte %8.0g Mother has hypertension
ui byte %8.0g Uterine irritability
race byte %9.0g race race of mother: 1=white, 2=black,
3=other
Sorted by:
We list the case–control indicator variable, low; the match identifier variable, pairid; and two of
the covariates, lwt and smoke, for the first 10 observations.
. list low lwt smoke pairid in 1/10
low lwt smoke pairid
1. 0 135 0 1
2. 1 101 1 1
3. 0 98 0 2
4. 1 115 0 2
5. 0 95 0 3
6. 1 130 0 3
7. 0 103 0 4
8. 1 130 1 4
9. 0 122 1 5
10. 1 110 1 5
We fit a conditional logistic model of low birthweight on mother’s weight, race, smoking behavior,
and history.
. clogit low lwt smoke ptd ht ui i.race, group(pairid) nolog
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(7) = 26.04
Prob > chi2 = 0.0005
Log likelihood = -25.794271 Pseudo R2 = 0.3355
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
lwt -.0183757 .0100806 -1.82 0.068 -.0381333 .0013819
smoke 1.400656 .6278396 2.23 0.026 .1701131 2.631199
ptd 1.808009 .7886502 2.29 0.022 .2622828 3.353735
ht 2.361152 1.086128 2.17 0.030 .2323796 4.489924
ui 1.401929 .6961585 2.01 0.044 .0374836 2.766375
race
black .5713643 .689645 0.83 0.407 -.7803149 1.923044
other -.0253148 .6992044 -0.04 0.971 -1.39573 1.345101
We might prefer to see results presented as odds ratios. We could have specified the or option when
we first fit the model, or we can now redisplay results and specify or:
. clogit, or
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(7) = 26.04
Prob > chi2 = 0.0005
Log likelihood = -25.794271 Pseudo R2 = 0.3355
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
lwt .9817921 .009897 -1.82 0.068 .9625847 1.001383
smoke 4.057862 2.547686 2.23 0.026 1.185439 13.89042
ptd 6.098293 4.80942 2.29 0.022 1.299894 28.60938
ht 10.60316 11.51639 2.17 0.030 1.261599 89.11467
ui 4.06303 2.828513 2.01 0.044 1.038195 15.90088
race
black 1.770681 1.221141 0.83 0.407 .4582617 6.84175
other .975003 .6817263 -0.04 0.971 .2476522 3.838573
Smoking, previous preterm delivery, hypertension, uterine irritability, and possibly the mother’s
weight all contribute to low birthweight. The black and other race categories are statistically
insignificant when compared with the omitted category, white, although the effect for black is large.
We can test the joint statistical significance of race being black (2.race) and race being other (3.race)
by using test:
. test 2.race 3.race
( 1) [low]2.race = 0
( 2) [low]3.race = 0
chi2( 2) = 0.88
Prob > chi2 = 0.6436
For a more complete description of test, see [R] test. test presents results in coefficients rather
than odds ratios. Jointly testing that the coefficients on 2.race and 3.race are 0 is equivalent to
jointly testing that the odds ratios are 1.
Here one case was matched to one control, that is, 1:1 matching. From clogit’s point of view,
that was not important; k1 cases could have been matched to k2 controls (k1:k2 matching), and
we would have fit the model in the same way. Furthermore, the matching can change from group
to group, which we have denoted as k1i:k2i matching, where i denotes the group. clogit does
not care. To fit the conditional logistic regression model, we specified the group(varname) option,
group(pairid). The case and control are stored in separate observations. clogit knew that they
were linked (in the same group) because the related observations share the same value of pairid.
Technical note
clogit provides a way to extend McNemar’s test to multiple controls per case (1:k2i matching)
and to multiple controls matched with multiple cases (k1i:k2i matching).
In Stata, McNemar’s test is calculated by the mcc command; see [ST] epitab. The mcc command,
however, requires that the matched case and control appear in one observation, so the data will need to
be manipulated from 1 to 2 observations per stratum before using clogit. Alternatively, if you begin
with clogit’s 2-observations-per-group organization, you will have to change it to 1 observation
per group if you wish to use mcc. In either case, reshape provides an easy way to change the
organization of the data. We will demonstrate its use below, but we direct you to [D] reshape for a
more thorough discussion.
In example 2, we used clogit to analyze the relationship between low birthweight and various
characteristics of the mother. Assume that we now want to assess the relationship between low
birthweight and smoking, ignoring the mother’s other characteristics. Using clogit, we obtain the
following results:
. clogit low smoke, group(pairid) or
Iteration 0: log likelihood = -35.425931
Iteration 1: log likelihood = -35.419283
Iteration 2: log likelihood = -35.419282
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(1) = 6.79
Prob > chi2 = 0.0091
Log likelihood = -35.419282 Pseudo R2 = 0.0875
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
smoke 2.75 1.135369 2.45 0.014 1.224347 6.176763
Let’s compare our estimated odds ratio and 95% confidence interval with that produced by mcc.
We begin by reshaping the data:
. keep low smoke pairid
. reshape wide smoke, i(pairid) j(low 0 1)
Data long -> wide
Number of obs. 112 -> 56
Number of variables 3 -> 3
j variable (2 values) low -> (dropped)
xij variables:
smoke -> smoke0 smoke1
We now have the variables smoke0 (formed from smoke and low = 0), recording 1 if the control
mother smoked and 0 otherwise; and smoke1 (formed from smoke and low = 1), recording 1 if the
case mother smoked and 0 otherwise. We can now use mcc:
. mcc smoke1 smoke0
Controls
Cases Exposed Unexposed Total
Exposed 8 22 30
Unexposed 8 18 26
Total 16 40 56
McNemar’s chi2(1) = 6.53 Prob > chi2 = 0.0106
Exact McNemar significance probability = 0.0161
Proportion with factor
Cases .5357143
Controls .2857143 [95% Conf. Interval]
difference .25 .0519726 .4480274
ratio 1.875 1.148685 3.060565
rel. diff. .35 .1336258 .5663742
odds ratio 2.75 1.179154 7.143667 (exact)
Both methods estimated the same odds ratio, and the 95% confidence intervals are similar. clogit
produced a confidence interval of [1.22, 6.18], whereas mcc produced a confidence interval of
[1.18, 7.14].
Use of weights
With clogit, weights apply to groups as a whole, not to individual observations. For example,
if there is a group in your dataset with a frequency weight of 3, there are a total of three groups
in your sample with the same values of the dependent and independent variables as this one group.
Weights must have the same value for all observations belonging to the same group; otherwise, an
error message will be displayed.
Example 3
We use the example from the above discussion of the mcc command. Here we have a total of 56
matched case–control groups, each with one case matched to one control. We have 8 matched pairs
in which both the case and the control are exposed, 22 pairs in which the case is exposed and the
control is unexposed, 8 pairs in which the case is unexposed and the control is exposed, and 18 pairs
in which both are unexposed.
With weights, it is easy to enter these data into Stata and run clogit.
. clear
. input id case exposed weight
id case exposed weight
1. 1 1 1 8
2. 1 0 1 8
3. 2 1 1 22
4. 2 0 0 22
5. 3 1 0 8
6. 3 0 1 8
7. 4 1 0 18
8. 4 0 0 18
9. end
. clogit case exposed [w=weight], group(id) or
(frequency weights assumed)
Iteration 0: log likelihood = -35.425931
Iteration 1: log likelihood = -35.419283
Iteration 2: log likelihood = -35.419282
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(1) = 6.79
Prob > chi2 = 0.0091
Log likelihood = -35.419282 Pseudo R2 = 0.0875
case Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
exposed 2.75 1.135369 2.45 0.014 1.224347 6.176763
Fixed-effects logit
The fixed-effects logit model can be written as

$$\Pr(y_{it} = 1 \mid \mathbf{x}_{it}) = F(\alpha_i + \mathbf{x}_{it}\boldsymbol{\beta})$$

where F is the cumulative logistic distribution

$$F(z) = \frac{\exp(z)}{1 + \exp(z)}$$

i = 1, 2, ..., n denotes the independent units (called “groups” by clogit), and t = 1, 2, ..., Ti
denotes the observations for the ith unit (group).
Fitting this model by using a full maximum-likelihood approach leads to difficulties, however.
When Ti is fixed, the maximum likelihood estimates for αi and β are inconsistent (Andersen 1970;
Chamberlain 1980). This difficulty can be circumvented by looking at the probability of yi =
(yi1, ..., yiTi) conditional on ∑t yit. This conditional probability does not involve the αi, so they
are never estimated when the resulting conditional likelihood is used. See Hamerle and Ronning (1995)
for a succinct and lucid development. See Methods and formulas for the estimation equation.
Example 4
We are studying unionization of women in the United States by using the union dataset; see
[XT]xt. We fit the fixed-effects logit model:
. use http://www.stata-press.com/data/r13/union, clear
(NLS Women 14-24 in 1968)
. clogit union age grade not_smsa south black, group(idcode)
note: multiple positive outcomes within groups encountered.
note: 2744 groups (14165 obs) dropped because of all positive or
all negative outcomes.
note: black omitted because of no within-group variance.
Iteration 0: log likelihood = -4521.3385
Iteration 1: log likelihood = -4516.1404
Iteration 2: log likelihood = -4516.1385
Iteration 3: log likelihood = -4516.1385
Conditional (fixed-effects) logistic regression Number of obs = 12035
LR chi2(4) = 68.09
Prob > chi2 = 0.0000
Log likelihood = -4516.1385 Pseudo R2 = 0.0075
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0170301 .004146 4.11 0.000 .0089042 .0251561
grade .0853572 .0418781 2.04 0.042 .0032777 .1674368
not_smsa .0083678 .1127963 0.07 0.941 -.2127088 .2294445
south -.748023 .1251752 -5.98 0.000 -.9933619 -.5026842
black 0 (omitted)
We received three messages at the top of the output. The first one, “multiple positive outcomes within
groups encountered”, we expected. Our data do indeed have multiple positive outcomes (union = 1)
in many groups. (Here a group consists of all the observations for a particular individual.)
The second message tells us that 2,744 groups were “dropped” by clogit. When either union = 0
or union = 1 for all observations for an individual, this individual’s contribution to the log-likelihood
is zero. Although these are perfectly valid observations in every sense, they have no effect on the
estimation, so they are not included in the total “Number of obs”. Hence, the reported “Number of
obs” gives the effective sample size of the estimation. Here it is 12,035 observations, only 46% of
the total 26,200.
We can easily check that there are indeed 2,744 groups with union either all 0 or all 1. We will
generate a variable that contains, for each individual, the fraction of observations with union = 1.
. by idcode, sort: generate fraction = sum(union)/sum(union < .)
. by idcode: replace fraction = . if _n < _N
(21766 real changes made, 21766 to missing)
. tabulate fraction
fraction Freq. Percent Cum.
0 2,481 55.95 55.95
.0833333 30 0.68 56.63
.0909091 33 0.74 57.37
.1 53 1.20 58.57
(output omitted )
.9 10 0.23 93.59
.9090909 11 0.25 93.84
.9166667 10 0.23 94.07
1 263 5.93 100.00
Total 4,434 100.00
Because 2,481 + 263 = 2,744, we confirm what clogit did.
The third warning message from clogit said “black omitted because of no within-group variance”.
Obviously, race stays constant for an individual across time. Any such variables are collinear with
the αi (that is, the fixed effects), and just as the αi drop out of the conditional likelihood, so do
all variables that are unchanging within groups. Thus they cannot be estimated with the conditional
fixed-effects model.
There are several other estimators implemented in Stata that we could use with these data:
cloglog . . . , vce(cluster idcode)
logit . . . , vce(cluster idcode)
probit . . . , vce(cluster idcode)
scobit . . . , vce(cluster idcode)
xtcloglog . . .
xtgee . . . , family(binomial) link(logit) corr(exchangeable)
xtlogit . . .
xtprobit . . .
See [R] cloglog, [R] logit, [R] probit, [R] scobit, [XT] xtcloglog, [XT] xtgee, [XT] xtlogit, and
[XT] xtprobit for details.
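For instance, the pooled logit with cluster–robust standard errors listed above can be spelled out as in the following sketch (ours, not part of the original entry), assuming the union data from example 4 are still in memory:
. * Sketch: pooled logit, clustering on idcode to allow for repeated observations per woman
. logit union age grade not_smsa south black, vce(cluster idcode)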
Stored results
clogit stores the following in e():
Scalars
  e(N)                number of observations
  e(N_drop)           number of observations dropped because of all positive or all negative outcomes
  e(N_group_drop)     number of groups dropped because of all positive or all negative outcomes
  e(k)                number of parameters
  e(k_eq)             number of equations in e(b)
  e(k_eq_model)       number of equations in overall model test
  e(k_dv)             number of dependent variables
  e(df_m)             model degrees of freedom
  e(r2_p)             pseudo-R-squared
  e(ll)               log likelihood
  e(ll_0)             log likelihood, constant-only model
  e(N_clust)          number of clusters
  e(chi2)             χ²
  e(p)                significance
  e(rank)             rank of e(V)
  e(ic)               number of iterations
  e(rc)               return code
  e(converged)        1 if converged, 0 otherwise
Macros
  e(cmd)              clogit
  e(cmdline)          command as typed
  e(depvar)           name of dependent variable
  e(group)            name of group() variable
  e(multiple)         multiple if multiple positive outcomes within group
  e(wtype)            weight type
  e(wexp)             weight expression
  e(title)            title in estimation output
  e(clustvar)         name of cluster variable
  e(offset)           linear offset variable
  e(chi2type)         Wald or LR; type of model χ² test
  e(vce)              vcetype specified in vce()
  e(vcetype)          title used to label Std. Err.
  e(opt)              type of optimization
  e(which)            max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)        type of ml method
  e(user)             name of likelihood-evaluator program
  e(technique)        maximization technique
  e(properties)       b V
  e(predict)          program used to implement predict
  e(marginsok)        predictions allowed by margins
  e(marginsnotok)     predictions disallowed by margins
  e(asbalanced)       factor variables fvset as asbalanced
  e(asobserved)       factor variables fvset as asobserved
Matrices
  e(b)                coefficient vector
  e(Cns)              constraints matrix
  e(ilog)             iteration log (up to 20 iterations)
  e(gradient)         gradient vector
  e(V)                variance–covariance matrix of the estimators
  e(V_modelbased)     model-based variance
Functions
  e(sample)           marks estimation sample
Methods and formulas
Breslow and Day (1980, 247–279), Collett (2003, 251–267), and Hosmer, Lemeshow, and Sturdivant
(2013, 243–268) provide a biostatistical point of view on conditional logistic regression. Hamerle
and Ronning (1995) give a succinct and lucid review of fixed-effects logit; Chamberlain (1980) is
a standard reference for this model. Greene (2012, chap. 17) provides a straightforward textbook
description of conditional logistic regression from an economist’s point of view, as well as a brief
description of choice models.
Let i = 1, 2, ..., n denote the groups and let t = 1, 2, ..., Ti denote the observations for the ith
group. Let yit be the dependent variable taking on values 0 or 1. Let yi = (yi1, ..., yiTi) be the
outcomes for the ith group as a whole. Let xit be a row vector of covariates. Let

$$k_{1i} = \sum_{t=1}^{T_i} y_{it}$$

be the observed number of ones for the dependent variable in the ith group. Biostatisticians would
say that there are k1i cases matched to k2i = Ti − k1i controls in the ith group.
We consider the probability of a possible value of yi conditional on ∑t yit = k1i (Hamerle
and Ronning 1995, eq. 8.33; Hosmer, Lemeshow, and Sturdivant 2013, eq. 7.4),

$$\Pr\Bigl(\mathbf{y}_i \Bigm| \textstyle\sum_{t=1}^{T_i} y_{it} = k_{1i}\Bigr)
  = \frac{\exp\bigl(\sum_{t=1}^{T_i} y_{it}\mathbf{x}_{it}\boldsymbol{\beta}\bigr)}
         {\sum_{\mathbf{d}_i \in S_i} \exp\bigl(\sum_{t=1}^{T_i} d_{it}\mathbf{x}_{it}\boldsymbol{\beta}\bigr)}$$

where dit is equal to 0 or 1 with ∑t dit = k1i, and Si is the set of all possible combinations of
k1i ones and k2i zeros. Clearly, there are $\binom{T_i}{k_{1i}}$ such combinations, but we need not
count all of these combinations to compute the denominator of the above equation. It can be computed
recursively.
Denote the denominator by

$$f_i(T_i, k_{1i}) = \sum_{\mathbf{d}_i \in S_i} \exp\Bigl(\sum_{t=1}^{T_i} d_{it}\mathbf{x}_{it}\boldsymbol{\beta}\Bigr)$$

Consider, computationally, how fi changes as we go from a total of 1 observation in the group to 2
observations to 3, etc. Doing this, we derive the recursive formula

$$f_i(T, k) = f_i(T-1, k) + f_i(T-1, k-1)\exp(\mathbf{x}_{iT}\boldsymbol{\beta})$$

where we define fi(T, k) = 0 if T < k and fi(T, 0) = 1.
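The recursion maps directly to a small dynamic-programming routine. The following Mata function is a sketch of ours, not StataCorp’s implementation; the function name clogit_denom and its arguments are assumptions, with xb holding one group’s Ti × 1 vector of linear predictions xit*b and k the group’s number of positive outcomes k1i.

mata:
// Sketch: denominator f(T, k) of the conditional likelihood for one group,
// using the recursion f(t, j) = f(t-1, j) + f(t-1, j-1)*exp(xb[t])
real scalar clogit_denom(real colvector xb, real scalar k)
{
    real scalar T, t, j
    real matrix f

    T = rows(xb)
    f = J(T+1, k+1, 0)         // f[t+1, j+1] holds f(t, j); f(t, j) = 0 when t < j
    f[., 1] = J(T+1, 1, 1)     // boundary condition f(t, 0) = 1
    for (t = 1; t <= T; t++) {
        for (j = 1; j <= min((t, k)); j++) {
            f[t+1, j+1] = f[t, j+1] + f[t, j]*exp(xb[t])
        }
    }
    return(f[T+1, k+1])
}
end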
The conditional log-likelihood is

$$\ln L = \sum_{i=1}^{n} \Biggl\{\, \sum_{t=1}^{T_i} y_{it}\mathbf{x}_{it}\boldsymbol{\beta} - \log f_i(T_i, k_{1i}) \Biggr\}$$

The derivatives of the conditional log-likelihood can also be computed recursively by taking derivatives
of the recursive formula for fi.
Computation time is roughly proportional to

$$p^2 \sum_{i=1}^{n} T_i \min(k_{1i}, k_{2i})$$

where p is the number of independent variables in the model. If min(k1i, k2i) is small, computation
time is not an issue. But if it is large, say, 100 or more, patience may be required.
If Ti is large for all groups, the bias of the unconditional fixed-effects estimator is not a concern,
and we can confidently use logit with an indicator variable for each group (provided, of course,
that the number of groups does not exceed matsize; see [R] matsize).
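As an illustration only (not part of the original entry), the unconditional approach amounts to adding one indicator per group, along the lines of the sketch below, where depvar, indepvars, and groupvar are placeholders:
. * Sketch: unconditional fixed-effects logit with one indicator per group;
. *   sensible only when every group has many observations
. logit depvar indepvars i.groupvar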
This command supports the clustered version of the Huber/White/sandwich estimator of the
variance using vce(robust) and vce(cluster clustvar). See [P] _robust, particularly Maximum
likelihood estimators and Methods and formulas. Specifying vce(robust) is equivalent to specifying
vce(cluster groupvar), where groupvar is the variable for the matched groups.
clogit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Andersen, E. B. 1970. Asymptotic properties of conditional maximum likelihood estimators. Journal of the Royal
Statistical Society, Series B 32: 283–301.
Breslow, N. E., and N. E. Day. 1980. Statistical Methods in Cancer Research: Vol. 1—The Analysis of Case–Control
Studies. Lyon: IARC.
Chamberlain, G. 1980. Analysis of covariance with qualitative data. Review of Economic Studies 47: 225–238.
Collett, D. 2003. Modelling Binary Data. 2nd ed. London: Chapman & Hall/CRC.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hamerle, A., and G. Ronning. 1995. Panel analysis for qualitative variables. In Handbook of Statistical Modeling for
the Social and Behavioral Sciences, ed. G. Arminger, C. C. Clogg, and M. E. Sobel, 401–451. New York: Plenum.
Hole, A. R. 2007. Fitting mixed logit models by using maximum simulated likelihood. Stata Journal 7: 388–401.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
McFadden, D. L. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed.
P. Zarembka, 105–142. New York: Academic Press.
Also see
[R] clogit postestimation — Postestimation tools for clogit
[R] asclogit — Alternative-specific conditional logit (McFadden’s choice) model
[R] logistic — Logistic regression, reporting odds ratios
[R] mlogit — Multinomial (polytomous) logistic regression
[R] nlogit — Nested logit regression
[R] ologit — Ordered logistic regression
[R] scobit — Skewed logistic regression
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtgee — Fit population-averaged panel-data models by using GEE
[XT] xtlogit — Fixed-effects, random-effects, and population-averaged logit models
[U] 20 Estimation and postestimation commands
Title
clogit postestimation — Postestimation tools for clogit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Reference Also see
Description
The following standard postestimation commands are available after clogit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest¹             likelihood-ratio test
margins²            marginal means, predictive margins, marginal effects, and average marginal
                      effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ lrtest is not appropriate with svy estimation results.
² The default prediction statistic pc1 cannot be correctly handled by margins; however, margins can be used
after clogit with options predict(pu0) and predict(xb).
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
statistic Description
Main
pc1 probability of a positive outcome; the default
pu0 probability of a positive outcome, assuming fixed effect is zero
xb linear prediction
stdp standard error of the linear prediction
dbeta           Delta-β influence statistic
dx2             Delta-χ² lack-of-fit statistic
gdbeta          Delta-β influence statistic for each group
gdx2            Delta-χ² lack-of-fit statistic for each group
hat Hosmer and Lemeshow leverage
residuals Pearson residuals
rstandard standardized Pearson residuals
score first derivative of the log likelihood with respect to xjβ
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
Starred statistics are available for multiple controls per case-matching design only. They are not available if vce(robust),
vce(cluster clustvar), or pweights were specified with clogit.
dbeta, dx2, gdbeta, gdx2, hat, and rstandard are not available if constraints() was specified with clogit.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pc1, the default, calculates the probability of a positive outcome conditional on one positive outcome
within group.
pu0 calculates the probability of a positive outcome, assuming that the fixed effect is zero.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
dbeta calculates the Delta-β influence statistic, a standardized measure of the difference in the
coefficient vector that is due to deletion of the observation.
dx2 calculates the Delta-χ² influence statistic, reflecting the decrease in the Pearson chi-squared that
is due to deletion of the observation.
gdbeta calculates the approximation to the Pregibon stratum-specific Delta-β influence statistic, a
standardized measure of the difference in the coefficient vector that is due to deletion of the entire
stratum.
gdx2 calculates the approximation to the Pregibon stratum-specific Delta-χ² influence statistic,
reflecting the decrease in the Pearson chi-squared that is due to deletion of the entire stratum.
hat calculates the Hosmer and Lemeshow leverage or the diagonal element of the hat matrix.
residuals calculates the Pearson residuals.
rstandard calculates the standardized Pearson residuals.
score calculates the equation-level score, ∂lnL/∂(xitβ).
nooffset is relevant only if you specified offset(varname) for clogit. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as xjb
rather than as xjb + offsetj. This option cannot be specified with dbeta, dx2, gdbeta, gdx2,
hat, and rstandard.
Remarks and examples
predict may be used after clogit to obtain predicted values of the index xitβ. Predicted
probabilities for conditional logistic regression must be interpreted carefully. Probabilities are estimated
for each group as a whole, not for individual observations. Furthermore, the probabilities are conditional
on the number of positive outcomes in the group (that is, the number of cases and the number of
controls), or it is assumed that the fixed effect is zero. predict may also be used to obtain influence
and lack-of-fit statistics for an individual observation and for the whole group, and to compute Pearson
residuals, standardized Pearson residuals, and leverage values.
predict may be used for both within-sample and out-of-sample predictions.
Example 1
Suppose that we have 1:k2i matched data and that we have previously fit the following model:
. use http://www.stata-press.com/data/r13/clogitid
. clogit y x1 x2, group(id)
(output omitted )
To obtain the predicted values of the index, we could type predict idx, xb to create a new
variable called idx. From idx, we could then calculate the predicted probabilities. Easier, however,
would be to type
. predict phat
(option pc1 assumed; probability of success given one success within group)
phat would then contain the predicted probabilities.
As noted previously, the predicted probabilities are really predicted probabilities for the group as
a whole (that is, they are the predicted probability of observing yit = 1 and yit′ = 0 for all t′ ≠ t).
Thus, if we want to obtain the predicted probabilities for the estimation sample, it is important that,
when we make the calculation, predictions be restricted to the same sample on which we fit the
model. We cannot predict the probabilities and then just keep the relevant ones because the entire
sample determines each probability. Thus, assuming that we are not attempting to make out-of-sample
predictions, we type
. predict phat2 if e(sample)
(option pc1 assumed; probability of success given one success within group)
Methods and formulas
Recall that i = 1, ..., n denote the groups and t = 1, ..., Ti denote the observations for the ith
group.
predict produces probabilities of a positive outcome within group conditional on there being one
positive outcome (pc1),

$$\Pr\Bigl(y_{it} = 1 \Bigm| \textstyle\sum_{t=1}^{T_i} y_{it} = 1\Bigr)
  = \frac{\exp(\mathbf{x}_{it}\boldsymbol{\beta})}{\sum_{t=1}^{T_i} \exp(\mathbf{x}_{it}\boldsymbol{\beta})}$$
or predict calculates the unconditional pu0:

$$\Pr(y_{it} = 1) = \frac{\exp(\mathbf{x}_{it}\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_{it}\boldsymbol{\beta})}$$
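As a check of the pc1 formula (a sketch of ours, not part of the original entry), the conditional probabilities can be recomputed by hand from the linear predictions after the model fit in example 1; the variable names xbhat, pc1hat, denom, and pc1check are assumptions:
. predict double xbhat, xb
. predict double pc1hat
. by id, sort: generate double denom = sum(exp(xbhat))
. by id: generate double pc1check = exp(xbhat)/denom[_N]
. assert reldif(pc1hat, pc1check) < 1e-6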
Let N = ∑j Tj denote the total number of observations, p denote the number of covariates,
and θ̂it denote the conditional predicted probabilities of a positive outcome (pc1).
For the multiple control per case (1:k2i) matching, Hosmer, Lemeshow, and Sturdivant (2013,
248–251) propose the following diagnostics:
The Pearson residual is

$$r_{it} = \frac{y_{it} - \hat\theta_{it}}{\sqrt{\hat\theta_{it}}}$$

The leverage (hat) value is defined as

$$h_{it} = \hat\theta_{it}\,\widetilde{\mathbf{x}}_{it}^{T}\,(\widetilde{\mathbf{X}}^{T}\mathbf{U}\widetilde{\mathbf{X}})^{-1}\,\widetilde{\mathbf{x}}_{it}$$

where $\widetilde{\mathbf{x}}_{it} = \mathbf{x}_{it} - \sum_{j=1}^{T_i}\mathbf{x}_{ij}\hat\theta_{ij}$ is the 1 × p row vector of covariate values centered by a weighted stratum-specific
mean, $\mathbf{U}_{N} = \mathrm{diag}\{\hat\theta_{it}\}$, and the rows of $\widetilde{\mathbf{X}}_{N \times p}$ are composed of the $\widetilde{\mathbf{x}}_{it}$ values.
The standardized Pearson residual is

$$r_{s_{it}} = \frac{r_{it}}{\sqrt{1 - h_{it}}}$$

The lack of fit and influence diagnostics for an individual observation are (respectively) computed
as

$$\Delta\chi^{2}_{it} = r^{2}_{s_{it}}$$

and

$$\Delta\hat\beta_{it} = \Delta\chi^{2}_{it}\,\frac{h_{it}}{1 - h_{it}}$$

The lack of fit and influence diagnostics for the groups are the group-specific totals of the respective
individual diagnostics shown above.
Reference
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Also see
[R] clogit — Conditional (fixed-effects) logistic regression
[U] 20 Estimation and postestimation commands
Title
cloglog — Complementary log-log regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
cloglog depvar [indepvars] [if] [in] [weight] [, options]
options                       Description

Model
  noconstant                  suppress constant term
  offset(varname)             include varname in model with coefficient constrained to 1
  asis                        retain perfect predictor variables
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables

SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                                or jackknife

Reporting
  level(#)                    set confidence level; default is level(95)
  eform                       report exponentiated coefficients
  nocnsreport                 do not display constraints
  display options             control column formats, row spacing, line width, display of omitted
                                variables and base and empty cells, and factor-variable labeling

Maximization
  maximize options            control the maximization process; seldom used

  coeflegend                  display legend instead of statistics

indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see
  [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Binary outcomes > Complementary log-log regression
Description
cloglog fits maximum-likelihood complementary log-log models.
See [R] logistic for a list of related estimation commands.
Options
 
Model
noconstant, offset(varname); see [R] estimation options.
asis forces retention of perfect predictor variables and their associated perfectly predicted observations
and may produce instabilities in maximization; see [R] probit.
constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
eform displays the exponentiated coefficients and corresponding standard errors and confidence
intervals.
nocnsreport; see [R] estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with cloglog but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction to complementary log-log regression
Robust standard errors
Introduction to complementary log-log regression
cloglog fits maximum likelihood models with dichotomous dependent variables coded as 0/1 (or,
more precisely, coded as 0 and not 0).
Example 1
We have data on the make, weight, and mileage rating of 22 foreign and 52 domestic automobiles.
We wish to fit a model explaining whether a car is foreign based on its weight and mileage. Here is
an overview of our data:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. keep make mpg weight foreign
. describe
Contains data from http://www.stata-press.com/data/r13/auto.dta
obs: 74 1978 Automobile Data
vars: 4 13 Apr 2013 17:45
size: 1,702 (_dta has notes)
storage display value
variable name type format label variable label
make str18 %-18s Make and Model
mpg int %8.0g Mileage (mpg)
weight int %8.0gc Weight (lbs.)
foreign byte %8.0g origin Car type
Sorted by: foreign
Note: dataset has changed since last saved
. inspect foreign
foreign: Car type Number of Observations
Total Integers Nonintegers
# Negative - - -
# Zero 52 52 -
# Positive 22 22 -
#
# # Total 74 74 -
# # Missing -
0 1 74
(2 unique values)
foreign is labeled and all values are documented in the label.
The variable foreign takes on two unique values, 0 and 1. The value 0 denotes a domestic car,
and 1 denotes a foreign car.
The model that we wish to fit is

$$\Pr(\texttt{foreign} = 1) = F(\beta_0 + \beta_1\,\texttt{weight} + \beta_2\,\texttt{mpg})$$

where F(z) = 1 − exp{−exp(z)}.
To fit this model, we type
. cloglog foreign weight mpg
Iteration 0: log likelihood = -34.054593
Iteration 1: log likelihood = -27.869915
Iteration 2: log likelihood = -27.742997
Iteration 3: log likelihood = -27.742769
Iteration 4: log likelihood = -27.742769
Complementary log-log regression Number of obs = 74
Zero outcomes = 52
Nonzero outcomes = 22
LR chi2(2) = 34.58
Log likelihood = -27.742769 Prob > chi2 = 0.0000
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0029153 .0006974 -4.18 0.000 -.0042823 -.0015483
mpg -.1422911 .076387 -1.86 0.062 -.2920069 .0074247
_cons 10.09694 3.351841 3.01 0.003 3.527448 16.66642
We find that heavier cars are less likely to be foreign and that cars yielding better gas mileage are
also less likely to be foreign, at least when holding the weight of the car constant.
See [R] maximize for an explanation of the output.
Technical note
Stata interprets a value of 0 as a negative outcome (failure) and treats all other values (except
missing) as positive outcomes (successes). Thus, if your dependent variable takes on the values 0 and
1, 0 is interpreted as failure and 1 as success. If your dependent variable takes on the values 0, 1,
and 2, 0 is still interpreted as failure, but both 1 and 2 are treated as successes.
If you prefer a more formal mathematical statement, when you type cloglog y x, Stata fits the
model

$$\Pr(y_j \neq 0 \mid \mathbf{x}_j) = 1 - \exp\{-\exp(\mathbf{x}_j\boldsymbol{\beta})\}$$
Robust standard errors
If you specify the vce(robust) option, cloglog reports robust standard errors, as described in
[U] 20.21 Obtaining robust variance estimates. For the model of foreign on weight and mpg, the
robust calculation increases the standard error of the coefficient on mpg by 44%:
. cloglog foreign weight mpg, vce(robust)
Iteration 0: log pseudolikelihood = -34.054593
Iteration 1: log pseudolikelihood = -27.869915
Iteration 2: log pseudolikelihood = -27.742997
Iteration 3: log pseudolikelihood = -27.742769
Iteration 4: log pseudolikelihood = -27.742769
Complementary log-log regression Number of obs = 74
Zero outcomes = 52
Nonzero outcomes = 22
Wald chi2(2) = 29.74
Log pseudolikelihood = -27.742769 Prob > chi2 = 0.0000
Robust
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0029153 .0007484 -3.90 0.000 -.0043822 -.0014484
mpg -.1422911 .1102466 -1.29 0.197 -.3583704 .0737882
_cons 10.09694 4.317305 2.34 0.019 1.635174 18.5587
Without vce(robust), the standard error for the coefficient on mpg was reported to be 0.076, with
a resulting confidence interval of [−0.29, 0.01].
The vce(cluster clustvar) option can relax the independence assumption required by the
complementary log-log estimator to being just independence between clusters. To demonstrate this
ability, we will switch to a different dataset.
We are studying unionization of women in the United States by using the union dataset; see
[XT] xt. We fit the following model, ignoring that women are observed an average of 5.9 times each
in this dataset:
. use http://www.stata-press.com/data/r13/union, clear
(NLS Women 14-24 in 1968)
. cloglog union age grade not_smsa south##c.year
Iteration 0: log likelihood = -13606.373
Iteration 1: log likelihood = -13540.726
Iteration 2: log likelihood = -13540.607
Iteration 3: log likelihood = -13540.607
Complementary log-log regression Number of obs = 26200
Zero outcomes = 20389
Nonzero outcomes = 5811
LR chi2(6) = 647.24
Log likelihood = -13540.607 Prob > chi2 = 0.0000
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0185346 .0043616 4.25 0.000 .009986 .0270833
grade .0452772 .0057125 7.93 0.000 .0340809 .0564736
not_smsa -.1886592 .0317801 -5.94 0.000 -.2509471 -.1263712
1.south -1.422292 .3949381 -3.60 0.000 -2.196356 -.648227
year -.0133007 .0049576 -2.68 0.007 -.0230174 -.0035839
south#c.year
1 .0105659 .0049234 2.15 0.032 .0009161 .0202157
_cons -1.219801 .2952374 -4.13 0.000 -1.798455 -.6411462
The reported standard errors in this model are probably meaningless. Women are observed repeatedly,
and so the observations are not independent. Looking at the coefficients, we find a large southern
effect against unionization and a different time trend for the south. The vce(cluster clustvar)
option provides a way to fit this model and obtain correct standard errors:
. cloglog union age grade not_smsa south##c.year, vce(cluster id) nolog
Complementary log-log regression Number of obs = 26200
Zero outcomes = 20389
Nonzero outcomes = 5811
Wald chi2(6) = 160.76
Log pseudolikelihood = -13540.607 Prob > chi2 = 0.0000
(Std. Err. adjusted for 4434 clusters in idcode)
Robust
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0185346 .0084873 2.18 0.029 .0018999 .0351694
grade .0452772 .0125776 3.60 0.000 .0206255 .069929
not_smsa -.1886592 .0642068 -2.94 0.003 -.3145021 -.0628162
1.south -1.422292 .506517 -2.81 0.005 -2.415047 -.4295365
year -.0133007 .0090628 -1.47 0.142 -.0310633 .004462
south#c.year
1 .0105659 .0063175 1.67 0.094 -.0018162 .022948
_cons -1.219801 .5175129 -2.36 0.018 -2.234107 -.2054942
These standard errors are larger than those reported by the inappropriate conventional calculation.
By comparison, another way we could fit this model is with an equal-correlation population-averaged
complementary log-log model:
. xtcloglog union age grade not_smsa south##c.year, pa nolog
GEE population-averaged model Number of obs = 26200
Group variable: idcode Number of groups = 4434
Link: cloglog Obs per group: min = 1
Family: binomial avg = 5.9
Correlation: exchangeable max = 12
Wald chi2(6) = 234.66
Scale parameter: 1 Prob > chi2 = 0.0000
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0153737 .0081156 1.89 0.058 -.0005326 .03128
grade .0549518 .0095093 5.78 0.000 .0363139 .0735897
not_smsa -.1045232 .0431082 -2.42 0.015 -.1890138 -.0200326
1.south -1.714868 .3384558 -5.07 0.000 -2.378229 -1.051507
year -.0115881 .0084125 -1.38 0.168 -.0280763 .0049001
south#c.year
1 .0149796 .0041687 3.59 0.000 .0068091 .0231501
_cons -1.488278 .4468005 -3.33 0.001 -2.363991 -.6125652
The coefficient estimates are similar, but these standard errors are smaller than those produced by
cloglog, vce(cluster clustvar). This finding is as we would expect. If the within-panel correlation
assumptions are valid, the population-averaged estimator should be more efficient.
In addition to this estimator, we may use the xtgee command to fit a panel estimator (with
complementary log-log link) and any number of assumptions on the within-idcode correlation.
cloglog, vce(cluster clustvar) is robust to assumptions about within-cluster correlation. That
is, it inefficiently sums within cluster for the standard-error calculation rather than attempting to exploit
what might be assumed about the within-cluster correlation (as do the xtgee population-averaged
models).
Stored results
cloglog stores the following in e():
Scalars
  e(N)                number of observations
  e(k)                number of parameters
  e(k_eq)             number of equations in e(b)
  e(k_eq_model)       number of equations in overall model test
  e(k_dv)             number of dependent variables
  e(N_f)              number of zero outcomes
  e(N_s)              number of nonzero outcomes
  e(df_m)             model degrees of freedom
  e(ll)               log likelihood
  e(ll_0)             log likelihood, constant-only model
  e(N_clust)          number of clusters
  e(chi2)             χ²
  e(p)                significance
  e(rank)             rank of e(V)
  e(ic)               number of iterations
  e(rc)               return code
  e(converged)        1 if converged, 0 otherwise
Macros
  e(cmd)              cloglog
  e(cmdline)          command as typed
  e(depvar)           name of dependent variable
  e(wtype)            weight type
  e(wexp)             weight expression
  e(title)            title in estimation output
  e(clustvar)         name of cluster variable
  e(offset)           linear offset variable
  e(chi2type)         Wald or LR; type of model χ² test
  e(vce)              vcetype specified in vce()
  e(vcetype)          title used to label Std. Err.
  e(opt)              type of optimization
  e(which)            max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)        type of ml method
  e(user)             name of likelihood-evaluator program
  e(technique)        maximization technique
  e(properties)       b V
  e(predict)          program used to implement predict
  e(asbalanced)       factor variables fvset as asbalanced
  e(asobserved)       factor variables fvset as asobserved
Matrices
  e(b)                coefficient vector
  e(Cns)              constraints matrix
  e(ilog)             iteration log (up to 20 iterations)
  e(gradient)         gradient vector
  e(V)                variance–covariance matrix of the estimators
  e(V_modelbased)     model-based variance
Functions
  e(sample)           marks estimation sample
Methods and formulas
Complementary log-log analysis (related to the gompit model, so named because of its relationship
to the Gompertz distribution) is an alternative to logit and probit analysis, but it is unlike these other
estimators in that the transformation is not symmetric. Typically, this model is used when the positive
(or negative) outcome is rare.
The log-likelihood function for complementary log-log is

$$\ln L = \sum_{j \in S} w_j \ln F(\mathbf{x}_j\mathbf{b}) + \sum_{j \notin S} w_j \ln\bigl\{1 - F(\mathbf{x}_j\mathbf{b})\bigr\}$$

where S is the set of all observations j such that yj ≠ 0, F(z) = 1 − exp{−exp(z)}, and wj
denotes the optional weights. lnL is maximized as described in [R] maximize.
We can fit a gompit model by reversing the success–failure sense of the dependent variable and
using cloglog.
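For instance, with the automobile data from example 1, a sketch of ours (not part of the original entry) of that reversal is as follows; the variable name domestic is an assumption:
. generate byte domestic = (foreign == 0)
. cloglog domestic weight mpg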
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly
Maximum likelihood estimators and Methods and formulas. The scores are calculated as
uj = [exp(xjb) exp{−exp(xjb)}/F(xjb)] xj for the positive outcomes and {−exp(xjb)} xj for the
negative outcomes.
cloglog also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
Acknowledgment
We thank Joseph Hilbe of Arizona State University for providing the inspiration for the cloglog
command (Hilbe 1996, 1998).
References
Clayton, D. G., and M. Hills. 1993. Statistical Models in Epidemiology. Oxford: Oxford University Press.
Hilbe, J. M. 1996. sg53: Maximum-likelihood complementary log-log regression. Stata Technical Bulletin 32: 19–20.
Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 129–131. College Station, TX: Stata Press.
. 1998. sg53.2: Stata-like commands for complementary log-log regression. Stata Technical Bulletin 41: 23.
Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 166–167. College Station, TX: Stata Press.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical
outcomes. Stata Journal 5: 537–559.
Also see
[R] cloglog postestimation — Postestimation tools for cloglog
[R] clogit — Conditional (fixed-effects) logistic regression
[R] glm — Generalized linear models
[R] logistic — Logistic regression, reporting odds ratios
[R] scobit — Skewed logistic regression
[ME] mecloglog — Multilevel mixed-effects complementary log-log regression
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtcloglog — Random-effects and population-averaged cloglog models
[U] 20 Estimation and postestimation commands
Title
cloglog postestimation — Postestimation tools for cloglog
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after cloglog:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
lrtest² likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with mi or svy estimation results.
² lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the linear prediction
score first derivative of the log likelihood with respect to xjβ
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
score calculates the equation-level score, $\partial\ln L/\partial(\mathbf{x}_j\boldsymbol{\beta})$.
nooffset is relevant only if you specified offset(varname) for cloglog. It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as $\mathbf{x}_j\mathbf{b}$ rather than as $\mathbf{x}_j\mathbf{b} + \mathrm{offset}_j$.
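A minimal sketch of the difference the nooffset option makes (y, x, and logexposure are hypothetical placeholder names, not from the manual's examples):

. cloglog y x, offset(logexposure)     // y, x, logexposure are placeholders
. predict xb1, xb                      // linear prediction including logexposure
. predict xb2, xb nooffset             // linear prediction with the offset ignored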
Remarks and examples
Once you have fit a model, you can obtain the predicted probabilities by using the predict
command for both the estimation sample and other samples; see [U] 20 Estimation and postestimation
commands and [R]predict. Here we will make only a few comments.
predict without arguments calculates the predicted probability of a positive outcome. With the xb option, it calculates the linear combination $\mathbf{x}_j\mathbf{b}$, where $\mathbf{x}_j$ are the independent variables in the $j$th observation and $\mathbf{b}$ is the estimated parameter vector.
With the stdp option, predict calculates the standard error of the linear prediction, which is not
adjusted for replicated covariate patterns in the data.
Example 1
In example 1 in [R]cloglog, we fit the complementary log-log model cloglog foreign weight
mpg. To obtain predicted probabilities,
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. cloglog foreign weight mpg
(output omitted )
. predict p
(option pr assumed; Pr(foreign))
. summarize foreign p
Variable Obs Mean Std. Dev. Min Max
foreign 74 .2972973 .4601885 0 1
p 74 .2928348 .29732 .0032726 .9446067
Also see
[R]cloglog Complementary log-log regression
[U] 20 Estimation and postestimation commands
Title
cls — Clear Results window
Syntax Description
Syntax
cls
Description
cls clears the Results window, causing all text to be removed. This operation cannot be undone.
Title
cnsreg — Constrained linear regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
cnsreg depvar indepvars [if] [in] [weight], constraints(constraints) [options]
options Description
Model
  constraints(constraints)   apply specified linear constraints
  collinear                  keep collinear variables
  noconstant                 suppress constant term
SE/Robust
  vce(vcetype)               vcetype may be ols, robust, cluster clustvar, bootstrap, or jackknife
Reporting
  level(#)                   set confidence level; default is level(95)
  nocnsreport                do not display constraints
  display_options            control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
  mse1                       force MSE to be 1
  coeflegend                 display legend instead of statistics
constraints(constraints) is required.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,mi estimate,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix
commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
With the fp prefix (see [R] fp), constraints cannot be specified for the variable containing fractional polynomial terms.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce(),mse1, and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
mse1 and coeflegend do not appear in the dialog.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Constrained linear regression
Description
cnsreg fits constrained linear regression models.
Options
 
Model
constraints(constraints),collinear,noconstant; see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (ols), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
 
Reporting
level(#); see [R]estimation options.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
The following options are available with cnsreg but are not shown in the dialog box:
mse1 is used only in programs and ado-files that use cnsreg to fit models other than constrained linear regression. mse1 sets the mean squared error to 1, thus forcing the variance–covariance matrix of the estimators to be $(\mathbf{X}'\mathbf{D}\mathbf{X})^{-1}$ (see Methods and formulas in [R] regress) and affecting calculated standard errors. Degrees of freedom for $t$ statistics are calculated as $n$ rather than $n - p + c$, where $p$ is the total number of parameters (prior to restrictions and including the constant) and $c$ is the number of constraints.
mse1 is not allowed with the svy prefix.
coeflegend; see [R]estimation options.
Remarks and examples
For a discussion of constrained linear regression, see Greene (2012, 121–122); Hill, Griffiths, and
Lim (2011, 231–233); or Davidson and MacKinnon (1993, 17).
Example 1: One constraint
In principle, we can obtain constrained linear regression estimates by modifying the list of
independent variables. For instance, if we wanted to fit the model
$$\text{mpg} = \beta_0 + \beta_1\,\text{price} + \beta_2\,\text{weight} + u$$
and constrain $\beta_1 = \beta_2$, we could write
$$\text{mpg} = \beta_0 + \beta_1(\text{price} + \text{weight}) + u$$
and run a regression of mpg on price + weight. The estimated coefficient on the sum would be the constrained estimate of $\beta_1$ and $\beta_2$. Using cnsreg, however, is easier:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. constraint 1 price = weight
. cnsreg mpg price weight, constraint(1)
Constrained linear regression Number of obs = 74
F( 1, 72) = 37.59
Prob > F = 0.0000
Root MSE = 4.7220
( 1) price - weight = 0
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
price -.0009875 .0001611 -6.13 0.000 -.0013086 -.0006664
weight -.0009875 .0001611 -6.13 0.000 -.0013086 -.0006664
_cons 30.36718 1.577958 19.24 0.000 27.22158 33.51278
We define constraints by using the constraint command; see [R]constraint. We fit the model with
cnsreg and specify the constraint number or numbers in the constraints() option.
Just to show that the results above are correct, here is the result of applying the constraint by hand:
. generate x = price + weight
. regress mpg x
Source SS df MS Number of obs = 74
F( 1, 72) = 37.59
Model 838.065767 1 838.065767 Prob > F = 0.0000
Residual 1605.39369 72 22.2971346 R-squared = 0.3430
Adj R-squared = 0.3339
Total 2443.45946 73 33.4720474 Root MSE = 4.722
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
x -.0009875 .0001611 -6.13 0.000 -.0013086 -.0006664
_cons 30.36718 1.577958 19.24 0.000 27.22158 33.51278
Example 2: Multiple constraints
Models can be fit subject to multiple simultaneous constraints. We simply define the constraints
and then include the constraint numbers in the constraints() option. For instance, say that we
wish to fit the model
$$\text{mpg} = \beta_0 + \beta_1\,\text{price} + \beta_2\,\text{weight} + \beta_3\,\text{displ} + \beta_4\,\text{gear\_ratio} + \beta_5\,\text{foreign} + \beta_6\,\text{length} + u$$
subject to the constraints
$$\beta_1 = \beta_2 = \beta_3 = \beta_6$$
$$\beta_4 = -\beta_5 = \beta_0/20$$
(This model, like the one in example 1, is admittedly senseless.) We fit the model by typing
. constraint 1 price=weight
. constraint 2 displ=weight
. constraint 3 length=weight
. constraint 5 gear_ratio = -foreign
. constraint 6 gear_ratio = _cons/20
. cnsreg mpg price weight displ gear_ratio foreign length, c(1-3,5-6)
Constrained linear regression Number of obs = 74
F( 2, 72) = 785.20
Prob > F = 0.0000
Root MSE = 4.6823
( 1) price - weight = 0
( 2) - weight + displacement = 0
( 3) - weight + length = 0
( 4) gear_ratio + foreign = 0
( 5) gear_ratio - .05*_cons = 0
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
price -.000923 .0001534 -6.02 0.000 -.0012288 -.0006172
weight -.000923 .0001534 -6.02 0.000 -.0012288 -.0006172
displacement -.000923 .0001534 -6.02 0.000 -.0012288 -.0006172
gear_ratio 1.326114 .0687589 19.29 0.000 1.189046 1.463183
foreign -1.326114 .0687589 -19.29 0.000 -1.463183 -1.189046
length -.000923 .0001534 -6.02 0.000 -.0012288 -.0006172
_cons 26.52229 1.375178 19.29 0.000 23.78092 29.26365
There are many ways we could have specified the constraints() option (which we abbreviated
c() above). We typed c(1-3,5-6), meaning that we want constraints 1 through 3 and 5 and 6; those
numbers correspond to the constraints we defined. The only reason we did not use the number 4
was to emphasize that constraints do not have to be consecutively numbered. We typed c(1-3,5-6),
but we could have typed c(1,2,3,5,6) or c(1-3,5,6) or c(1-2,3,5,6) or even c(1-6), which
would have worked as long as constraint 4 was not defined. If we had previously defined a constraint
4, then c(1-6) would have included it.
Stored results
cnsreg stores the following in e():
Scalars
e(N) number of observations
e(df_m) model degrees of freedom
e(df_r) residual degrees of freedom
e(F) F statistic
e(rmse) root mean squared error
e(ll) log likelihood
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) cnsreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Let $n$ be the number of observations, $p$ be the total number of parameters (prior to restrictions and including the constant), and $c$ be the number of constraints. The coefficients are calculated as
$$\mathbf{b}' = \mathbf{T}\bigl(\mathbf{T}'\mathbf{X}'\mathbf{W}\mathbf{X}\mathbf{T}\bigr)^{-1}\bigl(\mathbf{T}'\mathbf{X}'\mathbf{W}\mathbf{y} - \mathbf{T}'\mathbf{X}'\mathbf{W}\mathbf{X}\mathbf{a}'\bigr) + \mathbf{a}'$$
where $\mathbf{T}$ and $\mathbf{a}$ are as defined in [P] makecns. $\mathbf{W} = \mathbf{I}$ if no weights are specified. If weights are specified, let $\mathbf{v}\colon 1\times n$ be the specified weights. If fweight frequency weights are specified, $\mathbf{W} = \mathrm{diag}(\mathbf{v})$. If aweight analytic weights are specified, then $\mathbf{W} = \mathrm{diag}\bigl[\{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})\bigr]$, meaning that the weights are normalized to sum to the number of observations.

The mean squared error is $s^2 = (\mathbf{y}'\mathbf{W}\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{W}\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{W}\mathbf{X}\mathbf{b})/(n - p + c)$. The variance–covariance matrix is $s^2\,\mathbf{T}(\mathbf{T}'\mathbf{X}'\mathbf{W}\mathbf{X}\mathbf{T})^{-1}\mathbf{T}'$.
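As an unofficial illustration of the coefficient formula, the following Mata sketch reproduces the constrained point estimates from example 1 (one constraint, no weights, so $\mathbf{W} = \mathbf{I}$); the matrices T and a are built by hand here rather than obtained from [P] makecns:

. use http://www.stata-press.com/data/r13/auto
. constraint 1 price = weight
. quietly cnsreg mpg price weight, constraints(1)
. mata:
X = st_data(., ("price", "weight")), J(st_nobs(), 1, 1)   // columns: price, weight, constant
y = st_data(., "mpg")
T = (1, 0 \ 1, 0 \ 0, 1)       // price and weight share one free coefficient
a = J(1, 3, 0)                 // this constraint has no constant part
b = T*invsym(T'*X'*X*T)*(T'*X'*y - T'*X'*X*a') + a'
b'                             // should match e(b) from the cnsreg fit above
end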
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly
Introduction and Methods and formulas.
cnsreg also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
References
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hill, R. C., W. E. Griffiths, and G. C. Lim. 2011. Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Also see
[R]cnsreg postestimation Postestimation tools for cnsreg
[R]regress Linear regression
[MI]estimation Estimation commands for use with mi estimate
[SVY]svy estimation Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
cnsreg postestimation — Postestimation tools for cnsreg
Description Syntax for predict Menu for predict Options for predict Also see
Description
The following postestimation commands are available after cnsreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest² likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with mi or svy estimation results.
² lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic]
statistic Description
Main
xb linear prediction; the default
residuals residuals
stdp standard error of the prediction
stdf standard error of the forecast
pr(a,b)      $\Pr(a < y_j < b)$
e(a,b)       $E(y_j \mid a < y_j < b)$
ystar(a,b)   $E(y_j^*)$, where $y_j^* = \max\{a, \min(y_j, b)\}$
score        equivalent to residuals
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
where $a$ and $b$ may be numbers or variables; $a$ missing ($a \geq .$) means $-\infty$, and $b$ missing ($b \geq .$) means $+\infty$; see [U] 12.2.1 Missing values.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
residuals calculates the residuals, that is, $y_j - \mathbf{x}_j\mathbf{b}$.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.
pr(a,b) calculates $\Pr(a < \mathbf{x}_j\mathbf{b} + u_j < b)$, the probability that $y_j|\mathbf{x}_j$ would be observed in the interval ($a$, $b$).

    $a$ and $b$ may be specified as numbers or variable names; lb and ub are variable names;
    pr(20,30) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < 30)$;
    pr(lb,ub) calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_j < ub)$; and
    pr(20,ub) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < ub)$.

    $a$ missing ($a \geq .$) means $-\infty$; pr(.,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_j < 30)$;
    pr(lb,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_j < 30)$ in observations for which $lb \geq .$
    and calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_j < 30)$ elsewhere.

    $b$ missing ($b \geq .$) means $+\infty$; pr(20,.) calculates $\Pr(\mathbf{x}_j\mathbf{b} + u_j > 20)$;
    pr(20,ub) calculates $\Pr(\mathbf{x}_j\mathbf{b} + u_j > 20)$ in observations for which $ub \geq .$
    and calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < ub)$ elsewhere.

e(a,b) calculates $E(\mathbf{x}_j\mathbf{b} + u_j \mid a < \mathbf{x}_j\mathbf{b} + u_j < b)$, the expected value of $y_j|\mathbf{x}_j$ conditional on $y_j|\mathbf{x}_j$ being in the interval ($a$, $b$), meaning that $y_j|\mathbf{x}_j$ is truncated. $a$ and $b$ are specified as they are for pr().

ystar(a,b) calculates $E(y_j^*)$, where $y_j^* = a$ if $\mathbf{x}_j\mathbf{b} + u_j \leq a$, $y_j^* = b$ if $\mathbf{x}_j\mathbf{b} + u_j \geq b$, and $y_j^* = \mathbf{x}_j\mathbf{b} + u_j$ otherwise, meaning that $y_j^*$ is censored. $a$ and $b$ are specified as they are for pr().
score is equivalent to residuals for linear regression models.
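A brief sketch tying several of the statistics above together, continuing the constrained model from example 1 in [R] cnsreg (the new variable names are arbitrary, chosen only for this illustration):

. use http://www.stata-press.com/data/r13/auto
. constraint 1 price = weight
. quietly cnsreg mpg price weight, constraints(1)
. predict fit, xb                   // linear prediction (the default statistic)
. predict se_mean, stdp             // standard error of the predicted mean
. predict se_fcast, stdf            // standard error of the forecast
. assert se_fcast >= se_mean        // stdf is never smaller than stdp, as noted above
. predict pr2030, pr(20,30)         // Pr(20 < mpg < 30) given each observation's covariates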
Also see
[R]cnsreg Constrained linear regression
[U] 20 Estimation and postestimation commands
Title
constraint — Define and list constraints
Syntax Menu Description Remarks and examples
References Also see
Syntax
Define constraints
constraint [define] # [exp = exp | coeflist]

List constraints

constraint dir [numlist | _all]
constraint list [numlist | _all]

Drop constraints

constraint drop {numlist | _all}

Programmer's commands

constraint get #
constraint free

where coeflist is as defined in [R] test and # is restricted to the range 1 to 1,999, inclusive.
Menu
Statistics >Other >Manage constraints
Description
constraint defines, lists, and drops linear constraints. Constraints are for use by models that
allow constrained estimation.
Constraints are defined by the constraint command. The currently defined constraints can be
listed by either constraint list or constraint dir; both do the same thing. Existing constraints
can be eliminated by constraint drop.
constraint get and constraint free are programmer's commands. constraint get returns the contents of the specified constraint in macro r(contents) and returns in scalar r(defined) 0 or 1, with 1 being returned if the constraint was defined. constraint free returns the number of a free (unused) constraint in macro r(free).
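For instance, a minimal sketch of how these might be used (the constraint numbers and contents here are arbitrary):

. constraint 3 price = weight
. constraint get 3
. return list                      // r(contents) holds the constraint text; r(defined) is 1
. constraint get 4
. return list                      // r(defined) is 0 because constraint 4 is not defined
. constraint free
. return list                      // r(free) holds the number of an unused constraint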
Remarks and examples
Using constraints is discussed in [R]cnsreg,[R]mlogit, and [R]reg3; this entry is concerned only
with practical aspects of defining and manipulating constraints.
Example 1
Constraints are numbered from 1 to 1,999, and we assign the number when we define the constraint:
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. constraint 2 [indemnity]2.site = 0
The currently defined constraints can be listed by constraint list:
. constraint list
2: [indemnity]2.site = 0
constraint drop drops constraints:
. constraint drop 2
. constraint list
The empty list after constraint list indicates that no constraints are defined. Below we demonstrate
the various syntaxes allowed by constraint:
. constraint 1 [Indemnity]
. constraint 10 [Indemnity]: 1.site 2.site
. constraint 11 [Indemnity]: 3.site
. constraint 21 [Prepaid=Uninsure]: nonwhite
. constraint 30 [Prepaid]
. constraint 31 [Insure]
. constraint list
1: [Indemnity]
10: [Indemnity]: 1.site 2.site
11: [Indemnity]: 3.site
21: [Prepaid=Uninsure]: nonwhite
30: [Prepaid]
31: [Insure]
. constraint drop 21-25, 31
. constraint list
1: [Indemnity]
10: [Indemnity]: 1.site 2.site
11: [Indemnity]: 3.site
30: [Prepaid]
. constraint drop _all
. constraint list
Technical note
The constraint command does not check the syntax of the constraint itself because a constraint
can be interpreted only in the context of a model. Thus constraint is willing to define constraints
that later will not make sense. Any errors in the constraints will be detected and mentioned at the
time of estimation.
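For example, a minimal sketch of that behavior (the constraint and model are arbitrary choices of ours, not the manual's):

. use http://www.stata-press.com/data/r13/auto
. constraint 15 gear_ratio = 1               // accepted now, even though it may not apply later
. cnsreg mpg price weight, constraints(15)   // any note or error about constraint 15 appears only here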
References
Buis, M. L. 2012. Stata tip 108: On adding and constraining. Stata Journal 12: 342–344.
Weesie, J. 1999. sg100: Two-stage linear constrained estimation. Stata Technical Bulletin 47: 24–30. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 217–225. College Station, TX: Stata Press.
Also see
[R]cnsreg Constrained linear regression
Title
contrast — Contrasts and linear hypothesis tests after estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
contrast termlist [, options]
where termlist is a list of factor variables or interactions that appear in the current estimation results.
The variables may be typed with or without contrast operators, and you may use any factor-variable
syntax:
See the operators (op.) table below for the list of contrast operators.
options Description
Main
overall add a joint hypothesis test for all specified contrasts
asobserved treat all factor variables as observed
lincom treat user-defined contrasts as linear combinations
Equations
equation(eqspec)  perform contrasts in termlist for equation eqspec
atequations  perform contrasts in termlist within each equation
Advanced
emptycells(empspec)  treatment of empty cells for balanced factors
noestimcheck  suppress estimability checks
Reporting
level(#)  confidence level; default is level(95)
mcompare(method)  adjust for multiple comparisons; default is mcompare(noadjust)
noeffects suppress table of individual contrasts
cieffects show effects table with confidence intervals
pveffects show effects table with p-values
effects show effects table with confidence intervals and p-values
nowald suppress table of Wald tests
noatlevels report only the overall Wald test for terms that use the within @
or nested |operator
nosvyadjust compute unadjusted Wald tests for survey results
sort sort the individual contrast values in each term
post post contrasts and their VCEs as estimation results
display options control column formats, row spacing, line width, and factor-variable labeling
eform_option  report exponentiated contrasts
df(#)  use t distribution with # degrees of freedom for computing p-values and confidence intervals
df(#) does not appear in the dialog box.
Term  Description
Main effects
  A            joint test of the main effects of A
  r.A          individual contrasts that decompose A using r.
Interaction effects
  A#B          joint test of the two-way interaction effects of A and B
  A#B#C        joint test of the three-way interaction effects of A, B, and C
  r.A#g.B      individual contrasts for each interaction of A and B defined by r. and g.
Partial interaction effects
  r.A#B        joint tests of interactions of A and B within each contrast defined by r.A
  A#r.B        joint tests of interactions of A and B within each contrast defined by r.B
Simple effects
  A@B          joint tests of the effects of A within each level of B
  A@B#C        joint tests of the effects of A within each combination of the levels of B and C
  r.A@B        individual contrasts of A that decompose A@B using r.
  r.A@B#C      individual contrasts of A that decompose A@B#C using r.
Other conditional effects
  A#B@C        joint tests of the interaction effects of A and B within each level of C
  A#B@C#D      joint tests of the interaction effects of A and B within each combination of the levels of C and D
  r.A#g.B@C    individual contrasts for each interaction of A and B that decompose A#B@C using r. and g.
Nested effects
  A|B          joint tests of the effects of A nested in each level of B
  A|B#C        joint tests of the effects of A nested in each combination of the levels of B and C
  A#B|C        joint tests of the interaction effects of A and B nested in each level of C
  A#B|C#D      joint tests of the interaction effects of A and B nested in each combination of the levels of C and D
  r.A|B        individual contrasts of A that decompose A|B using r.
  r.A|B#C      individual contrasts of A that decompose A|B#C using r.
  r.A#g.B|C    individual contrasts for each interaction of A and B defined by r. and g. nested in each level of C
Slope effects
  A#c.x        joint test of the effects of A on the slopes of x
  A#c.x#c.y    joint test of the effects of A on the slopes of the product (interaction) of x and y
  A#B#c.x      joint test of the interaction effects of A and B on the slopes of x
  A#B#c.x#c.y  joint test of the interaction effects of A and B on the slopes of the product (interaction) of x and y
  r.A#c.x      individual contrasts of A's effects on the slopes of x using r.
Denominators
  ... / term2  use term2 as the denominator in the F tests of the preceding terms
  ... /        use the residual as the denominator in the F tests of the preceding terms (the default if no other /s are specified)
A, B, C, and D represent any factor variable in the current estimation results.
x and y represent any continuous variable in the current estimation results.
r. and g. represent any contrast operator. See the table below.
c. specifies that a variable be treated as continuous; see [U] 11.4.3 Factor variables.
Operators are allowed on any factor variable that does not appear to the right of @ or |. Operators decompose the effects of the associated factor variable into one-degree-of-freedom effects (contrasts).
Higher-level interactions are allowed anywhere an interaction operator (#) appears in the table.
Time-series operators are allowed if they were used in the estimation.
eqns designates the equations in manova, mlogit, mprobit, and mvreg and can be specified anywhere a factor variable appears.
/ is allowed only after anova, cnsreg, manova, mvreg, or regress.
operators (op.) Description
r. differences from the reference (base) level; the default
a. differences from the next level (adjacent contrasts)
ar. differences from the previous level (reverse adjacent contrasts)
As-balanced operators
g. differences from the balanced grand mean
h. differences from the balanced mean of subsequent levels (Helmert contrasts)
j. differences from the balanced mean of previous levels (reverse Helmert
contrasts)
p. orthogonal polynomial in the level values
q. orthogonal polynomial in the level sequence
As-observed operators
gw. differences from the observation-weighted grand mean
hw. differences from the observation-weighted mean of subsequent levels
jw. differences from the observation-weighted mean of previous levels
pw. observation-weighted orthogonal polynomial in the level values
qw. observation-weighted orthogonal polynomial in the level sequence
One or more individual contrasts may be selected by using the op#. or op(numlist). syntax. For
example, a3.A selects the adjacent contrast for level 3 of A, and p(1/2).B selects the linear and
quadratic effects of B. Also see Orthogonal polynomial contrasts and Beyond linear models.
Custom contrasts Description
{A numlist}    user-defined contrast on the levels of factor A
{A#B numlist}  user-defined contrast on the levels of the interaction between A and B
Custom contrasts may be part of a term, such as {A numlist}#B, {A numlist}@B, {A numlist}|B, {A#B numlist}, and {A numlist}#{B numlist}. The same is true of higher-order custom contrasts, such as {A#B numlist}@C, {A#B numlist}#r.C, and {A#B numlist}#c.x.
Higher-order interactions with at most eight factor variables are allowed with custom contrasts.
method Description
noadjust do not adjust for multiple comparisons; the default
bonferroni adjustall  Bonferroni's method; adjust across all terms
sidak adjustall       Šidák's method; adjust across all terms
scheffe               Scheffé's method
Menu
Statistics >Postestimation >Contrasts
Description
contrast tests linear hypotheses and forms contrasts involving factor variables and their interactions
from the most recently fit model. The tests include ANOVA-style tests of main effects, simple effects,
interactions, and nested effects. contrast can use named contrasts to decompose these effects into
comparisons against reference categories, comparisons of adjacent levels, comparisons against the
grand mean, orthogonal polynomials, and such. Custom contrasts may also be specified.
contrast can be used with svy estimation results; see [SVY]svy postestimation.
Contrasts can also be computed for margins of linear and nonlinear responses; see [R]margins,
contrast.
Options
 
Main
overall specifies that a joint hypothesis test over all terms be performed.
asobserved specifies that factor covariates be evaluated using the cell frequencies observed in the
estimation sample. The default is to treat all factor covariates as though there were an equal number
of observations in each level.
lincom specifies that user-defined contrasts be treated as linear combinations. The default is to require
that all user-defined contrasts sum to zero. (Summing to zero is part of the definition of a contrast.)
 
Equations
equation(eqspec)specifies the equation from which contrasts are to be computed. The default is
to compute contrasts from the first equation.
atequations specifies that the contrasts be computed within each equation.
 
Advanced
emptycells(empspec)specifies how empty cells are handled in interactions involving factor variables
that are being treated as balanced.
emptycells(strict) is the default; it specifies that contrasts involving empty cells be treated
as not estimable.
emptycells(reweight) specifies that the effects of the observed cells be increased to accommodate
any missing cells. This makes the contrast estimable but changes its interpretation.
noestimcheck specifies that contrast not check for estimability. By default, the requested contrasts
are checked and those found not estimable are reported as such. Nonestimability is usually caused
by empty cells. If noestimcheck is specified, estimates are computed in the usual way and
reported even though the resulting estimates are manipulable, which is to say they can differ across
equivalent models having different parameterizations.
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
mcompare(method)specifies the method for computing p-values and confidence intervals that account
for multiple comparisons within a factor-variable term.
Most methods adjust the comparisonwise error rate, αc, to achieve a prespecified experimentwise
error rate, αe.
mcompare(noadjust) is the default; it specifies no adjustment.
αc=αe
mcompare(bonferroni) adjusts the comparisonwise error rate based on the upper limit of the Bonferroni inequality
$$\alpha_e \leq m\alpha_c$$
where $m$ is the number of comparisons within the term.
The adjusted comparisonwise error rate is
$$\alpha_c = \alpha_e/m$$
mcompare(sidak) adjusts the comparisonwise error rate based on the upper limit of the probability inequality
$$\alpha_e \leq 1 - (1 - \alpha_c)^m$$
where $m$ is the number of comparisons within the term.
The adjusted comparisonwise error rate is
$$\alpha_c = 1 - (1 - \alpha_e)^{1/m}$$
This adjustment is exact when the $m$ comparisons are independent.
mcompare(scheffe) controls the experimentwise error rate using the $F$ or $\chi^2$ distribution with degrees of freedom equal to the rank of the term.
mcompare(method adjustall) specifies that the multiple-comparison adjustments count all
comparisons across all terms rather than performing multiple comparisons term by term. This
leads to more conservative adjustments when multiple variables or terms are specified in
marginslist. This option is compatible only with the bonferroni and sidak methods.
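As a quick numeric check of the Bonferroni and Šidák formulas above, assuming an experimentwise rate of 0.05 and m = 4 comparisons (numbers chosen purely for illustration):

. display 0.05/4                    // Bonferroni comparisonwise rate: .0125
. display 1 - (1 - 0.05)^(1/4)      // Sidak comparisonwise rate: roughly .0127, slightly less conservative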
noeffects suppresses the table of individual contrasts with confidence intervals. This table is
produced by default when the mcompare() option is specified or when a term in termlist implies
all individual contrasts.
cieffects specifies that a table containing a confidence interval for each individual contrast be
reported.
pveffects specifies that a table containing a p-value for each individual contrast be reported.
effects specifies that a single table containing a confidence interval and p-value for each individual
contrast be reported.
nowald suppresses the table of Wald tests.
noatlevels indicates that only the overall Wald test be reported for each term containing within or
nested (@or |) operators.
nosvyadjust is for use with svy estimation commands. It specifies that the Wald test be carried out without the default adjustment for the design degrees of freedom. That is to say, the test is carried out as $W/k \sim F(k, d)$ rather than as $(d - k + 1)W/(kd) \sim F(k, d - k + 1)$, where $k$ is the dimension of the test and $d$ is the total number of sampled PSUs minus the total number of strata.
sort specifies that the table of individual contrasts be sorted by the contrast values within each term.
post causes contrast to behave like a Stata estimation (e-class) command. contrast posts the
vector of estimated contrasts along with the estimated variance–covariance matrix to e(), so you
can treat the estimated contrasts just as you would results from any other estimation command.
For example, you could use test to perform simultaneous tests of hypotheses on the contrasts,
or you could use lincom to create linear combinations.
display_options: vsquish, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch.
vsquish specifies that the blank space separating factor-variable terms or time-series–operated
variables from other variables in the model be suppressed.
nofvlabel displays factor-variable level values rather than attached value labels. This option
overrides the fvlabel setting; see [R]set showbaselevels.
fvwrap(#)specifies how many lines to allow when long value labels must be wrapped. Labels
requiring more than #lines are truncated. This option overrides the fvwrap setting; see [R]set
showbaselevels.
fvwrapon(style)specifies whether value labels that wrap will break at word boundaries or break
based on available space.
fvwrapon(word), the default, specifies that value labels break at word boundaries.
fvwrapon(width) specifies that value labels break based on available space.
This option overrides the fvwrapon setting; see [R]set showbaselevels.
cformat(% fmt)specifies how to format contrasts, standard errors, and confidence limits in the
table of estimated contrasts.
pformat(% fmt)specifies how to format p-values in the table of estimated contrasts.
sformat(% fmt)specifies how to format test statistics in the table of estimated contrasts.
nolstretch specifies that the width of the table of estimated contrasts not be automatically
widened to accommodate longer variable names. The default, lstretch, is to automatically
widen the table of estimated contrasts up to the width of the Results window. To change the
default, use set lstretch off.nolstretch is not shown in the dialog box.
eform_option specifies that the contrasts table be displayed in exponentiated form. e^contrast is displayed rather than contrast. Standard errors and confidence intervals are also transformed. See [R] eform_option for the list of available options.
The following option is available with contrast but is not shown in the dialog box:
df(#) specifies that the t distribution with # degrees of freedom be used for computing p-values and confidence intervals. The default is to use e(df_r) degrees of freedom or the standard normal distribution if e(df_r) is missing.
Remarks and examples
Remarks are presented under the following headings:
Introduction
One-way models
Estimated cell means
Testing equality of cell means
Reference category contrasts
Reverse adjacent contrasts
Orthogonal polynomial contrasts
Two-way models
Estimated interaction cell means
Simple effects
Interaction effects
Main effects
Partial interaction effects
Three-way and higher-order models
Contrast operators
Differences from a reference level (r.)
Differences from the next level (a.)
Differences from the previous level (ar.)
Differences from the grand mean (g.)
Differences from the mean of subsequent levels (h.)
Differences from the mean of previous levels (j.)
Orthogonal polynomials (p. and q.)
User-defined contrasts
Empty cells
Empty cells, ANOVA style
Nested effects
Multiple comparisons
Unbalanced data
Using observed cell frequencies
Weighted contrast operators
Testing factor effects on slopes
Chow tests
Beyond linear models
Multiple equations
Video example
Introduction
contrast performs ANOVA-style tests of main effects, interactions, simple effects, and nested
effects. It can easily decompose these tests into constituent contrasts using either named contrasts
(codings) or user-specified contrasts. Comparing levels of factor variables, whether as main effects, interactions, or simple effects, is as easy as adding a contrast operator to the variable. The operators
can compare each level with the previous level, each level with a reference level, each level with the
mean of previous levels, and more.
contrast tests and estimates contrasts. A contrast of the parameters $\mu_1, \mu_2, \ldots, \mu_p$ is a linear combination $\sum_i c_i\mu_i$ whose $c_i$ sum to zero. A difference of population means such as $\mu_1 - \mu_2$ is a contrast, as are most other comparisons of population or model quantities (Coster 2005). Some contrasts may be estimated with lincom, but contrast is much more powerful. contrast can handle multiple contrasts simultaneously, and the command's contrast operators make it easy to specify complicated linear combinations.
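As an illustration of that overlap (a sketch under our own choice of data and comparison, not one of the manual's examples), the same reference-category comparison can be obtained either way after a factor-variable regression:

. use http://www.stata-press.com/data/r13/auto
. regress price i.rep78
. lincom 2.rep78                    // level 2 versus the base level, via lincom
. contrast {rep78 -1 1 0 0 0}       // the same comparison written as a custom contrast

Either command reproduces this single comparison; contrast becomes the more convenient tool once several such comparisons, or comparisons involving interactions, are wanted at once.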
Both the contrast operation and the creation of the margins for comparison can be performed as
though the data were balanced (typical for experimental designs) or using the observed frequencies
in the estimation sample (typical for observational studies). contrast can perform these analyses on
the results of almost all of Stata’s estimators, not just the linear-models estimators.
Most of contrast's computations can be considered comparisons of estimated cell means from a model fit. Tests of interactions are tests of whether the cell means for the interaction are all equal. Tests of main effects are tests of whether the marginal cell means for the factor are all equal. More focused comparisons of cell means (for example, is level 2 equal to level 1) are specified using contrast operators. More formally, all of contrast's computations are comparisons of conditional expectations; cell means are one type of conditional expectation.
All contrasts can also easily be graphed; see [R]marginsplot.
For a discussion of contrasts and testing for linear models, see Searle (1971) and Searle (1997).
For discussions specifically related to experimental design, see Kuehl (2000), Winer, Brown, and
Michels (1991), and Milliken and Johnson (2009). Rosenthal, Rosnow, and Rubin (2000) focus on
contrasts with applications in behavioral sciences. Mitchell (2012) focuses on contrasts in Stata.
contrast is a flexible tool for understanding the effects of categorical covariates. If your model
contains categorical covariates, and especially if it contains interactions, you will want to use contrast.
One-way models
Suppose we have collected data on cholesterol levels for individuals from five age groups. To study
the effect of age group on cholesterol, we can begin by fitting a one-way model using regress:
. use http://www.stata-press.com/data/r13/cholesterol
(Artificial cholesterol data)
. label list ages
ages:
1 10-19
2 20-29
3 30-39
4 40-59
5 60-79
. regress chol i.agegrp
Source SS df MS Number of obs = 75
F( 4, 70) = 35.02
Model 14943.3997 4 3735.84993 Prob > F = 0.0000
Residual 7468.21971 70 106.688853 R-squared = 0.6668
Adj R-squared = 0.6477
Total 22411.6194 74 302.859722 Root MSE = 10.329
chol Coef. Std. Err. t P>|t| [95% Conf. Interval]
agegrp
20-29 8.203575 3.771628 2.18 0.033 .6812991 15.72585
30-39 21.54105 3.771628 5.71 0.000 14.01878 29.06333
40-59 30.15067 3.771628 7.99 0.000 22.6284 37.67295
60-79 38.76221 3.771628 10.28 0.000 31.23993 46.28448
_cons 180.5198 2.666944 67.69 0.000 175.2007 185.8388
Estimated cell means
margins will show us the estimated cell means for each age group based on our fitted model:
. margins agegrp
Adjusted predictions Number of obs = 75
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
agegrp
10-19 180.5198 2.666944 67.69 0.000 175.2007 185.8388
20-29 188.7233 2.666944 70.76 0.000 183.4043 194.0424
30-39 202.0608 2.666944 75.76 0.000 196.7418 207.3799
40-59 210.6704 2.666944 78.99 0.000 205.3514 215.9895
60-79 219.282 2.666944 82.22 0.000 213.9629 224.601
We can graph those means with marginsplot:
. marginsplot
Variables that uniquely identify margins: agegrp
[Graph: Adjusted Predictions of agegrp with 95% CIs; linear prediction plotted against agegrp]
Testing equality of cell means
Are all the means equal? That is to say is there an effect of age group on cholesterol level? We can
answer that by asking contrast to test whether the means of the age groups are identical.
. contrast agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp 4 35.02 0.0000
Denominator 70
The means are clearly different. We could have obtained this same test directly had we fit our model
using anova rather than regress.
. anova chol agegrp
Number of obs = 75 R-squared = 0.6668
Root MSE = 10.329 Adj R-squared = 0.6477
Source Partial SS df MS F Prob > F
Model 14943.3997 4 3735.84993 35.02 0.0000
agegrp 14943.3997 4 3735.84993 35.02 0.0000
Residual 7468.21971 70 106.688853
Total 22411.6194 74 302.859722
Achieving a more direct test result is why we recommend using anova instead of regress for
models where our focus is on the categorical covariates. The models fit by anova and regress are
identical; they merely parameterize the effects differently. The results of contrast will be identical
regardless of which command is used to fit the model. If, however, we were fitting models whose
responses are nonlinear functions of the covariates, such as logistic regression, then there would be
no analogue to anova, and we would appreciate contrast's ability to quickly test main effects and
interactions.
Reference category contrasts
Now that we know that the overall effect of age group is statistically significant, we can explore
the effects of each age group. One way to do that is to use the reference category operator, r.:
. contrast r.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(20-29 vs 10-19) 1 4.73 0.0330
(30-39 vs 10-19) 1 32.62 0.0000
(40-59 vs 10-19) 1 63.91 0.0000
(60-79 vs 10-19) 1 105.62 0.0000
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(20-29 vs 10-19) 8.203575 3.771628 .6812991 15.72585
(30-39 vs 10-19) 21.54105 3.771628 14.01878 29.06333
(40-59 vs 10-19) 30.15067 3.771628 22.6284 37.67295
(60-79 vs 10-19) 38.76221 3.771628 31.23993 46.28448
The cell mean of each age group is compared against the base age group (ages 10–19). The first
table shows that each difference is significant. The second table gives an estimate and confidence
interval for each contrast. These are the comparisons that linear regression gives with a factor covariate
and no interactions. The contrasts are identical to the coefficients from our linear regression.
Reverse adjacent contrasts
We have far more flexibility with contrast. Age group is ordinal, so it is interesting to compare
each age group with the preceding age group (rather than against one reference group). We specify
that analysis by using the reverse adjacent operator, ar.:
. contrast ar.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(20-29 vs 10-19) 1 4.73 0.0330
(30-39 vs 20-29) 1 12.51 0.0007
(40-59 vs 30-39) 1 5.21 0.0255
(60-79 vs 40-59) 1 5.21 0.0255
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(20-29 vs 10-19) 8.203575 3.771628 .6812991 15.72585
(30-39 vs 20-29) 13.33748 3.771628 5.815204 20.85976
(40-59 vs 30-39) 8.60962 3.771628 1.087345 16.1319
(60-79 vs 40-59) 8.611533 3.771628 1.089257 16.13381
The 20–29 age group’s cholesterol level is 8.2 points higher than the 10–19 age group’s cholesterol
level; the 30–39 age group’s level is 13.3 points higher than the 20–29 age group’s level; and so on.
Each age group is statistically different from the preceding age group at the 5% level.
Orthogonal polynomial contrasts
The relationship between age group and cholesterol level looked almost linear in our graph. We
can examine that relationship further by using the orthogonal polynomial operator, p.:
. contrast p.agegrp, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(linear) 1 139.11 0.0000
(quadratic) 1 0.15 0.6962
(cubic) 1 0.37 0.5448
(quartic) 1 0.43 0.5153
Joint 4 35.02 0.0000
Denominator 70
Only the linear effect is statistically significant.
We can even perform the joint test that all effects beyond linear are zero. We do that by selecting all polynomial contrasts above linear, that is, polynomial contrasts 2, 3, and 4.
. contrast p(2 3 4).agegrp, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(quadratic) 1 0.15 0.6962
(cubic) 1 0.37 0.5448
(quartic) 1 0.43 0.5153
Joint 3 0.32 0.8129
Denominator 70
The joint test has three degrees of freedom and is clearly insignificant. A linear effect of age group
seems adequate for this model.
Two-way models
Suppose we are investigating the effects of different dosages of a blood pressure medication and
believe that the effects may be different for men and women. We can fit the following ANOVA model
for bpchange, the change in diastolic blood pressure. Change is defined as the after measurement
minus the before measurement, so that negative values of bpchange correspond to decreases in blood
pressure.
. use http://www.stata-press.com/data/r13/bpchange
(Artificial blood pressure data)
. label list gender
gender:
1 male
2 female
. anova bpchange dose##gender
Number of obs = 30 R-squared = 0.9647
Root MSE = 1.4677 Adj R-squared = 0.9573
Source Partial SS df MS F Prob > F
Model 1411.9087 5 282.381741 131.09 0.0000
dose 963.481795 2 481.740897 223.64 0.0000
gender 355.118817 1 355.118817 164.85 0.0000
dose#gender 93.3080926 2 46.6540463 21.66 0.0000
Residual 51.699253 24 2.15413554
Total 1463.60796 29 50.4692399
Estimated interaction cell means
Everything is significant, including the interaction. So increasing dosage is effective and differs by
gender. Let’s explore the effects. First, let’s look at the estimated cell mean of blood pressure change
for each combination of gender and dosage.
. margins dose#gender
Adjusted predictions Number of obs = 30
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
dose#gender
250#male -7.35384 .6563742 -11.20 0.000 -8.708529 -5.99915
250#female 3.706567 .6563742 5.65 0.000 2.351877 5.061257
500#male -13.73386 .6563742 -20.92 0.000 -15.08855 -12.37917
500#female -6.584167 .6563742 -10.03 0.000 -7.938857 -5.229477
750#male -16.82108 .6563742 -25.63 0.000 -18.17576 -15.46639
750#female -14.38795 .6563742 -21.92 0.000 -15.74264 -13.03326
Our data are balanced, so these results will not be affected by the many different ways that
margins can compute cell means. Moreover, because our model consists of only dose and gender,
these are also the point estimates for each combination.
We can graph the results:
. marginsplot
Variables that uniquely identify margins: dose gender
[Graph: Adjusted Predictions of dose#gender with 95% CIs; linear prediction plotted against dosage in milligrams per day, with separate lines for male and female]
The lines are not parallel, which we expected because the interaction term is significant. Males
experience a greater decline in blood pressure at every dosage level, but the effect of increasing
dosage is greater for females. In fact, it is not clear if we can tell the difference between male and
female response at the maximum dosage.
Simple effects
We can contrast the male and female responses within dosage to see the simple effects of gender.
Because there are only two levels in gender, the choice of contrast operator is largely irrelevant.
Aside from orthogonal polynomials, all operators produce the same estimates, although the effects
can change signs.
. contrast r.gender@dose
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
gender@dose
(female vs male) 250 1 141.97 0.0000
(female vs male) 500 1 59.33 0.0000
(female vs male) 750 1 6.87 0.0150
Joint 3 69.39 0.0000
Denominator 24
Contrast Std. Err. [95% Conf. Interval]
gender@dose
(female vs male) 250 11.06041 .9282533 9.144586 12.97623
(female vs male) 500 7.149691 .9282533 5.23387 9.065512
(female vs male) 750 2.433124 .9282533 .5173031 4.348944
The effect for males is about 11 points higher than for females at a dosage of 250, and that shrinks
to 2.4 points higher at the maximum dosage of 750.
We can form the simple effects the other way by contrasting the effect of dose at each level of
gender:
. contrast ar.dose@gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose@gender
(500 vs 250) male 1 47.24 0.0000
(500 vs 250) female 1 122.90 0.0000
(750 vs 500) male 1 11.06 0.0028
(750 vs 500) female 1 70.68 0.0000
Joint 4 122.65 0.0000
Denominator 24
Contrast Std. Err. [95% Conf. Interval]
dose@gender
(500 vs 250) male -6.380018 .9282533 -8.295839 -4.464198
(500 vs 250) female -10.29073 .9282533 -12.20655 -8.374914
(750 vs 500) male -3.087217 .9282533 -5.003038 -1.171396
(750 vs 500) female -7.803784 .9282533 -9.719605 -5.887963
Here we use the ar. reverse adjacent contrast operator so that first we are comparing a dosage of 500 with a dosage of 250, and then we are comparing 750 with 500. We see that increasing the dosage has a larger effect on females: 10.3 points when going from 250 to 500 compared with 6.4 points for males, and 7.8 points when going from 500 to 750 versus 3.1 points for males.
Interaction effects
By specifying contrast operators on both factors, we can decompose the interaction effect into
separate interaction contrasts.
. contrast ar.dose#r.gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose#gender
(500 vs 250) (female vs male) 1 8.87 0.0065
(750 vs 500) (female vs male) 1 12.91 0.0015
Joint 2 21.66 0.0000
Denominator 24
Contrast Std. Err. [95% Conf. Interval]
dose#gender
(500 vs 250)
(female vs male) -3.910716 1.312748 -6.620095 -1.201336
(750 vs 500)
(female vs male) -4.716567 1.312748 -7.425947 -2.007187
Look for departures from zero to indicate an interaction effect between dose and gender. Both
contrasts are significantly different from zero. Of course, we already knew the overall interaction
was significant from our ANOVA results. The effect of increasing dose from 250 to 500 is 3.9 points
greater in females than in males, and the effect of increasing dose from 500 to 750 is 4.7 points
greater in females than in males. The confidence intervals for both estimates easily exclude zero,
meaning that there is an interaction effect.
The joint test of these two interaction effects reproduces the test of interaction effects in the anova
output. We can see that the Fstatistic of 21.66 matches the statistic from our original ANOVA results.
Main effects
We can perform tests of the main effects by listing each variable individually in contrast.
. contrast dose gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose 2 223.64 0.0000
gender 1 164.85 0.0000
Denominator 24
The F tests are equivalent to the tests of main effects in the anova output. This is true only for linear models. contrast provides an easy way to obtain main effects and other ANOVA-style tests for models whose responses are not linear in the parameters: logistic, probit, glm, etc.
If we include contrast operators on the variables, we can also decompose the main effects into
individual contrasts:
. contrast ar.dose r.gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose
(500 vs 250) 1 161.27 0.0000
(750 vs 500) 1 68.83 0.0000
Joint 2 223.64 0.0000
gender 1 164.85 0.0000
Denominator 24
Contrast Std. Err. [95% Conf. Interval]
dose
(500 vs 250) -8.335376 .6563742 -9.690066 -6.980687
(750 vs 500) -5.4455 .6563742 -6.80019 -4.090811
gender
(female vs male) 6.881074 .5359273 5.774974 7.987173
By specifying the ar. operator on dose, we decompose the main effect for dose into two one-degree-
of-freedom contrasts, comparing the marginal mean of blood pressure change for each dosage level
with that of the previous level. Because gender has only two levels, we cannot decompose this main
effect any further. However, specifying a contrast operator on gender allowed us to calculate the
difference in the marginal means for women and men.
Partial interaction effects
At this point, we have looked at the total interaction effects and at the main effects of each variable.
The partial interaction effects are a midpoint between these two types of effects where we collect the
individual interaction effects along the levels of one of the variables and perform a joint test of those
interactions. If we think of the interaction effects as forming a table, with the levels of one factor
variable forming the rows and the levels of the other forming the columns, partial interaction effects
are joint tests of the interactions in a row or a column. To perform these tests, we specify a contrast
operator on only one of the variables in our interaction. For this particular model, these are not very
interesting because our variables have only two and three levels. Therefore, the tests of the partial
interaction effects reproduce the tests that we obtained for the total interaction effects. We specify a
contrast operator only on dose to decompose the overall test for interaction effects into joint tests
for each ar.dose contrast:
. contrast ar.dose#gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose#gender
(500 vs 250) (joint) 1 8.87 0.0065
(750 vs 500) (joint) 1 12.91 0.0015
Joint 2 21.66 0.0000
Denominator 24
The first row is a joint test of all the interaction effects involving the (500 vs 250) comparison
of dosages. The second row is a joint test of all the interaction effects involving the (750 vs 500)
comparison. If we look back at our output in Interaction effects, we can see that there was only one of
each of these interaction effects. Therefore, each test labeled (joint) has only one degree-of-freedom.
We could have instead included a contrast operator on gender to compute the partial interaction
effects along the other dimension:
. contrast dose#r.gender
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
dose#gender 2 21.66 0.0000
Denominator 24
Here we obtain a joint test of all the interaction effects involving the (female vs male) comparison
for gender. Because gender has only two levels, the (female vs male) contrast is the only reference
category contrast possible. Therefore, we obtain a single joint test of all the interaction effects.
Clearly, the partial interaction effects are not interesting for this particular model. However, if our
factors had more levels, the partial interaction effects would produce tests that are not available in
the total interaction effects. For example, if our model included factors for four dosage levels and
three races, then typing
. contrast ar.dose#race
would produce three joint tests, one for each of the reverse adjacent contrasts for dosage. Each of
these tests would be a two-degree-of-freedom test because race has three levels.
Three-way and higher-order models
All the contrasts and tests that we reviewed above for two-way models can be used with models
that have more terms. For instance, we could fit a three-way full factorial model by using the anova
command:
. use http://www.stata-press.com/data/r13/cont3way
. anova y race##sex##group
We could then test the simple effects of race within each level of the interaction between sex
and group:
. contrast race@sex#group
To see the reference category contrasts that decompose these simple effects, type
. contrast r.race@sex#group
We could test the three-way interaction effects by typing
. contrast race#sex#group
or the interaction effects for the interaction of race and sex by typing
. contrast race#sex
To see the individual reference category contrasts that decompose this interaction effect, type
. contrast r.race#r.sex
We could even obtain joint tests for the interaction of race and sex within each level of group
by typing
. contrast race#sex@group
For tests of the main effects of each factor, we can type
. contrast race sex group
We can calculate the individual reference category contrasts that decompose these main effects:
. contrast r.race r.sex r.group
For the partial interaction effects, we could type
. contrast r.race#group
to obtain a joint test of the two-way interaction effects of race and group for each of the individual
r.race contrasts.
We could type
. contrast r.race#sex#group
to obtain a joint test of all the three-way interaction terms for each of the individual r.race contrasts.
Contrast operators
contrast recognizes a set of contrast operators that are used to specify commonly used contrasts.
When these operators are used, contrast will report a test for each individual contrast in addition
to the joint test for the term. We have already seen a few of these, like r. and ar., in the previous
examples. Here we will take a closer look at each of the unweighted operators.
Here we use the cholesterol dataset and the one-way ANOVA model from the example in One-way
models:
. use http://www.stata-press.com/data/r13/cholesterol
(Artificial cholesterol data)
. anova chol agegrp
(output omitted )
The margins command reports the estimated cell means, $\hat\mu_1, \ldots, \hat\mu_5$, for each of the five age
groups.
. margins agegrp
Adjusted predictions Number of obs = 75
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
agegrp
10-19 180.5198 2.666944 67.69 0.000 175.2007 185.8388
20-29 188.7233 2.666944 70.76 0.000 183.4043 194.0424
30-39 202.0608 2.666944 75.76 0.000 196.7418 207.3799
40-59 210.6704 2.666944 78.99 0.000 205.3514 215.9895
60-79 219.282 2.666944 82.22 0.000 213.9629 224.601
Contrast operators provide an easy way to make certain types of comparisons of these cell means.
We use the ordinal factor agegrp to demonstrate these operators because some types of contrasts are
only meaningful when the levels of the factor have a natural ordering. We demonstrate these contrast
operators using a one-way model; however, they are equally applicable to main effects, simple effects,
and interactions for more complicated models.
Differences from a reference level (r.)
The r. operator specifies that each level of the attached factor variable be compared with a
reference level. These are referred to as reference-level or reference-category contrasts (or effects),
and r. is the reference-level operator.
In the following, we use the r. operator to test the effect of each category of age group when
that category is compared with a reference category.
. contrast r.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(20-29 vs 10-19) 1 4.73 0.0330
(30-39 vs 10-19) 1 32.62 0.0000
(40-59 vs 10-19) 1 63.91 0.0000
(60-79 vs 10-19) 1 105.62 0.0000
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(20-29 vs 10-19) 8.203575 3.771628 .6812991 15.72585
(30-39 vs 10-19) 21.54105 3.771628 14.01878 29.06333
(40-59 vs 10-19) 30.15067 3.771628 22.6284 37.67295
(60-79 vs 10-19) 38.76221 3.771628 31.23993 46.28448
In the first table, the row labeled (20-29 vs 10-19) is a test of µ2 = µ1, a test that the mean
cholesterol levels for the 10-19 age group and the 20-29 age group are equal. The tests in the
next three rows are defined similarly. The row labeled Joint provides the joint test for these four
hypotheses, which is just the test of the main effects of age group.
The second table provides the contrasts of each category with the reference category along with
confidence intervals. The contrast in the row labeled (20-29 vs 10-19) is the difference in the cell
means of the second age group and the first age group, $\hat\mu_2 - \hat\mu_1$.
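As a quick check, we could type
. contrast agegrp
(output omitted )
and the resulting 4-degree-of-freedom test of agegrp should reproduce the F statistic reported in the
Joint row above.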
The first level of a factor is the default reference level, but we can specify a different reference
level by using the b. operator; see [U] 11.4.3.2 Base levels. Here we use the last age group, (60-79),
instead of the first as the reference category. We also include the nowald option so that only the
table of contrasts and their confidence intervals is produced.
. contrast rb5.agegrp, nowald
Contrasts of marginal linear predictions
Margins : asbalanced
Contrast Std. Err. [95% Conf. Interval]
agegrp
(10-19 vs 60-79) -38.76221 3.771628 -46.28448 -31.23993
(20-29 vs 60-79) -30.55863 3.771628 -38.08091 -23.03636
(30-39 vs 60-79) -17.22115 3.771628 -24.74343 -9.698877
(40-59 vs 60-79) -8.611533 3.771628 -16.13381 -1.089257
Now the first row is labeled (10-19 vs 60-79) and is the difference in the cell means of the first
and fifth age groups.
Differences from the next level (a.)
The a. operator specifies that each level of the attached factor variable be compared with the next
level. These are referred to as adjacent contrasts (or effects), and a. is the adjacent operator. This
operator is only meaningful with factor variables that have a natural ordering in the levels.
We can use the a. operator to perform tests that each level of age group differs from the next
adjacent level.
. contrast a.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(10-19 vs 20-29) 1 4.73 0.0330
(20-29 vs 30-39) 1 12.51 0.0007
(30-39 vs 40-59) 1 5.21 0.0255
(40-59 vs 60-79) 1 5.21 0.0255
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(10-19 vs 20-29) -8.203575 3.771628 -15.72585 -.6812991
(20-29 vs 30-39) -13.33748 3.771628 -20.85976 -5.815204
(30-39 vs 40-59) -8.60962 3.771628 -16.1319 -1.087345
(40-59 vs 60-79) -8.611533 3.771628 -16.13381 -1.089257
In the first table, the row labeled (10-19 vs 20-29) tests the effect of belonging to the 10-19 age
group instead of the 20-29 age group. Likewise, the rows labeled (20-29 vs 30-39), (30-39 vs
40-59), and (40-59 vs 60-79) are tests for the effects of being in the younger of the two age
groups instead of the older one.
In the second table, the contrast in the row labeled (10-19 vs 20-29) is the difference in the
cell means of the first and second age groups, $\hat\mu_1 - \hat\mu_2$. The contrasts in the other rows are defined
similarly.
Differences from the previous level (ar.)
The ar. operator specifies that each level of the attached factor variable be compared with the
previous level. These are referred to as reverse adjacent contrasts (or effects), and ar. is the reverse
adjacent operator. As with the a. operator, this operator is only meaningful with factor variables that
have a natural ordering in the levels.
In the following, we use the ar. operator to report tests for the individual reverse adjacent effects
of agegrp.
. contrast ar.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(20-29 vs 10-19) 1 4.73 0.0330
(30-39 vs 20-29) 1 12.51 0.0007
(40-59 vs 30-39) 1 5.21 0.0255
(60-79 vs 40-59) 1 5.21 0.0255
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(20-29 vs 10-19) 8.203575 3.771628 .6812991 15.72585
(30-39 vs 20-29) 13.33748 3.771628 5.815204 20.85976
(40-59 vs 30-39) 8.60962 3.771628 1.087345 16.1319
(60-79 vs 40-59) 8.611533 3.771628 1.089257 16.13381
Here the Wald tests in the first table for the individual reverse adjacent effects are equivalent to the
tests for the adjacent effects in the previous example. However, if we compare values of the contrasts
in the bottom tables, we see the difference between the a. and the ar. operators. This time, the
contrast in the first row is labeled (20-29 vs 10-19) and is the difference in the cell means of the
second and first age groups, $\hat\mu_2 - \hat\mu_1$. This is the estimated effect of belonging to the 20-29 age
group instead of the 10-19 age group. The remaining rows make similar comparisons to the previous
level.
Differences from the grand mean (g.)
The g. operator specifies that each level of a factor variable be compared with the grand mean of
all levels. For this operator, the grand mean is computed using a simple average of the cell means.
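That is, with the five levels of agegrp in this model, the contrast reported for the ith level is
$$ \mu_i - \tfrac{1}{5}(\mu_1 + \mu_2 + \mu_3 + \mu_4 + \mu_5) $$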
Here are the grand mean effects of agegrp:
. contrast g.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(10-19 vs mean) 1 68.42 0.0000
(20-29 vs mean) 1 23.36 0.0000
(30-39 vs mean) 1 0.58 0.4506
(40-59 vs mean) 1 19.08 0.0000
(60-79 vs mean) 1 63.65 0.0000
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(10-19 vs mean) -19.7315 2.385387 -24.48901 -14.974
(20-29 vs mean) -11.52793 2.385387 -16.28543 -6.770423
(30-39 vs mean) 1.809552 2.385387 -2.947953 6.567057
(40-59 vs mean) 10.41917 2.385387 5.661668 15.17668
(60-79 vs mean) 19.0307 2.385387 14.2732 23.78821
There are five age groups in our estimation sample. Thus the row labeled (10-19 vs mean) tests
µ1 = (µ1 + µ2 + µ3 + µ4 + µ5)/5. The row labeled (20-29 vs mean) tests µ2 = (µ1 + µ2 + µ3 + µ4 + µ5)/5.
The remaining rows perform similar tests for the third, fourth, and fifth age groups. In our example,
the means for all age groups except the 30-39 age group are statistically different from the grand
mean.
Differences from the mean of subsequent levels (h.)
The h. operator specifies that each level of the attached factor variable be compared with the mean
of subsequent levels. These are referred to as Helmert contrasts (or effects), and h. is the Helmert
operator. For this operator, the mean is computed using a simple average of the cell means. This
operator is only meaningful with factor variables that have a natural ordering in the levels.
Here are the Helmert contrasts for agegrp:
. contrast h.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(10-19 vs >10-19) 1 68.42 0.0000
(20-29 vs >20-29) 1 50.79 0.0000
(30-39 vs >30-39) 1 15.63 0.0002
(40-59 vs 60-79) 1 5.21 0.0255
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(10-19 vs >10-19) -24.66438 2.981734 -30.61126 -18.7175
(20-29 vs >20-29) -21.94774 3.079522 -28.08965 -15.80583
(30-39 vs >30-39) -12.91539 3.266326 -19.42987 -6.400905
(40-59 vs 60-79) -8.611533 3.771628 -16.13381 -1.089257
The row labeled (10-19 vs >10-19) tests µ1 = (µ2 + µ3 + µ4 + µ5)/4, that is, that the cell mean
for the youngest age group is equal to the average of the cell means for the older age groups. The
row labeled (20-29 vs >20-29) tests µ2 = (µ3 + µ4 + µ5)/3. The tests in the other rows are
defined similarly.
Differences from the mean of previous levels (j.)
The j. operator specifies that each level of the attached factor variable be compared with the
mean of the previous levels. These are referred to as reverse Helmert contrasts (or effects), and j.
is the reverse Helmert operator. For this operator, the mean is computed using a simple average of
the cell means. This operator is only meaningful with factor variables that have a natural ordering in
the levels.
Here are the reverse Helmert contrasts of agegrp:
. contrast j.agegrp
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(20-29 vs 10-19) 1 4.73 0.0330
(30-39 vs <30-39) 1 28.51 0.0000
(40-59 vs <40-59) 1 43.18 0.0000
(60-79 vs <60-79) 1 63.65 0.0000
Joint 4 35.02 0.0000
Denominator 70
Contrast Std. Err. [95% Conf. Interval]
agegrp
(20-29 vs 10-19) 8.203575 3.771628 .6812991 15.72585
(30-39 vs <30-39) 17.43927 3.266326 10.92479 23.95375
(40-59 vs <40-59) 20.2358 3.079522 14.09389 26.37771
(60-79 vs <60-79) 23.78838 2.981734 17.8415 29.73526
The row labeled (20-29 vs 10-19) tests µ2 = µ1, that is, that the cell means for the 20-29 and the
10-19 age groups are equal. The row labeled (30-39 vs <30-39) tests µ3 = (µ1 + µ2)/2, that is,
that the cell mean for the 30-39 age group is equal to the average of the cell means for the 10-19
and 20-29 age groups. The tests in the remaining rows are defined similarly.
Orthogonal polynomials (p. and q.)
The p. and q. operators specify that orthogonal polynomials be applied to the attached factor
variable. Orthogonal polynomial contrasts allow us to partition the effects of a factor variable into
linear, quadratic, cubic, and higher-order polynomial components. The p. operator applies orthogonal
polynomials using the values of the factor variable. The q. operator applies orthogonal polynomials
using the level indices. If the level values of the factor variable are equally spaced, as with our agegrp
variable, then the p. and q. operators yield the same result. These operators are only meaningful
with factor variables that have a natural ordering in the levels.
Because agegrp has five levels, contrast can test the linear, quadratic, cubic, and quartic effects
of agegrp.
. contrast p.agegrp, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(linear) 1 139.11 0.0000
(quadratic) 1 0.15 0.6962
(cubic) 1 0.37 0.5448
(quartic) 1 0.43 0.5153
Joint 4 35.02 0.0000
Denominator 70
The row labeled (linear) tests the linear effect of agegrp, the only effect that appears to be
significant in this case.
The labels for our agegrp variable show the age ranges that correspond to each level.
. label list ages
ages:
1 10-19
2 20-29
3 30-39
4 40-59
5 60-79
Notice that these groups do not have equal widths. Now let’s refit our model using the agemidpt
variable. The values of agemidpt indicate the midpoint of each age group that was defined by the
agegrp variable and are, therefore, not equally spaced.
. anova chol agemidpt
Number of obs = 75 R-squared = 0.6668
Root MSE = 10.329 Adj R-squared = 0.6477
Source Partial SS df MS F Prob > F
Model 14943.3997 4 3735.84993 35.02 0.0000
agemidpt 14943.3997 4 3735.84993 35.02 0.0000
Residual 7468.21971 70 106.688853
Total 22411.6194 74 302.859722
Now if we use the q. operator, we will obtain the same results as above because the level indices
of agemidpt are equivalent to the values of agegrp.
. contrast q.agemidpt, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agemidpt
(linear) 1 139.11 0.0000
(quadratic) 1 0.15 0.6962
(cubic) 1 0.37 0.5448
(quartic) 1 0.43 0.5153
Joint 4 35.02 0.0000
Denominator 70
However, if we use the p. operator, we will instead fit an orthogonal polynomial to the midpoint
values.
. contrast p.agemidpt, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agemidpt
(linear) 1 133.45 0.0000
(quadratic) 1 5.40 0.0230
(cubic) 1 0.05 0.8198
(quartic) 1 1.16 0.2850
Joint 4 35.02 0.0000
Denominator 70
Using the values of the midpoints, the quadratic effect is also significant at the 5% level.
Technical note
We used the noeffects option when working with orthogonal polynomial contrasts. Apart from
perhaps the sign of the contrast, the values of the individual contrasts are not meaningful for orthogonal
polynomial contrasts. In addition, many textbooks provide tables with contrast coefficients that can be
used to compute orthogonal polynomial contrasts where the levels of a factor are equally spaced. If
we use these coefficients and calculate the contrasts manually with user-defined contrasts, as described
below, the Wald tests for the polynomial terms will be equivalent, but the values of the individual
contrasts will not necessarily match those that we obtain when using the polynomial contrast operator.
When we use one of these contrast operators, an algorithm is used to calculate the coefficients of the
polynomial contrast that will allow for unequal spacing in the levels of the factor as well as in the
weights for the cell frequencies (when using pw. or qw.), as described in Methods and formulas.
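For instance, a common textbook table gives (-2, -1, 0, 1, 2) as the linear coefficients for five equally
spaced levels. As a minimal sketch (output omitted), we could compute this contrast manually by typing
. contrast {agegrp -2 -1 0 1 2}, noeffects
The one-degree-of-freedom Wald test from this user-defined contrast should match the (linear) row
reported by p.agegrp above, although the value of the contrast itself would differ by a scale factor.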
User-defined contrasts
In the previous examples, we performed tests using contrast operators. When there is not a contrast
operator available to calculate the contrast in which we are interested, we can specify custom contrasts.
Here we fit a one-way model for cholesterol on the factor race, which has three levels:
. label list race
race:
1 black
2 white
3 other
. anova chol race
Number of obs = 75 R-squared = 0.0299
Root MSE = 17.3775 Adj R-squared = 0.0029
Source Partial SS df MS F Prob > F
Model 669.278235 2 334.639117 1.11 0.3357
race 669.278235 2 334.639117 1.11 0.3357
Residual 21742.3412 72 301.976961
Total 22411.6194 74 302.859722
margins calculates the estimated cell mean cholesterol level for each race:
. margins race
Adjusted predictions Number of obs = 75
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
race
black 204.4279 3.475497 58.82 0.000 197.4996 211.3562
white 197.6132 3.475497 56.86 0.000 190.6849 204.5415
other 198.7127 3.475497 57.18 0.000 191.7844 205.6409
Suppose we want to test the following linear combination:
$$ \sum_{i=1}^{3} c_i \mu_i $$
where µi is the cell mean of chol when race is equal to its ith level (the means estimated using
margins above). Assuming the ci elements sum to zero, this linear combination is a contrast. We
can specify this type of custom contrast by using the following syntax:
{race c1 c2 c3}
The null hypothesis for the test of the main effects of race is
$$ H_0^{\text{race}}\colon\ \mu_1 = \mu_2 = \mu_3 $$
Although $H_0^{\text{race}}$ can be tested using any of several different contrasts on the cell means, we will test
it by comparing the second and third cell means with the first. To test that the cell means for blacks
and whites are equal, µ1 = µ2, we can specify the contrast
{race -1 1 0}
To test that the cell means for blacks and other races are equal, µ1 = µ3, we can specify the contrast
{race -1 0 1}
We can use both in a single call to contrast.
. contrast {race -1 1 0} {race -1 0 1}
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(1) 1 1.92 0.1699
(2) 1 1.35 0.2488
Joint 2 1.11 0.3357
Denominator 72
Contrast Std. Err. [95% Conf. Interval]
race
(1) -6.814717 4.915095 -16.61278 2.983345
(2) -5.715261 4.915095 -15.51332 4.082801
The row labeled (1) is the test for µ1 = µ2, the first specified contrast. The row labeled (2) is the
test for µ1 = µ3, the second specified contrast. The row labeled Joint is the overall test for the
main effects of race.
Now let’s fit a model with two factors, race and age group:
. anova chol race##agegrp
Number of obs = 75 R-squared = 0.7524
Root MSE = 9.61785 Adj R-squared = 0.6946
Source Partial SS df MS F Prob > F
Model 16861.438 14 1204.38843 13.02 0.0000
race 669.278235 2 334.639117 3.62 0.0329
agegrp 14943.3997 4 3735.84993 40.39 0.0000
race#agegrp 1248.76005 8 156.095006 1.69 0.1201
Residual 5550.18143 60 92.5030238
Total 22411.6194 74 302.859722
The null hypothesis for the test of the main effects of race is now
$$ H_0^{\text{race}}\colon\ \mu_{1\cdot} = \mu_{2\cdot} = \mu_{3\cdot} $$
where µi· is the marginal mean of chol when race is equal to its ith level.
We can use the same syntax as above to perform this test by specifying contrasts on the marginal
means of race:
. contrast {race -1 1 0} {race -1 0 1}
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(1) 1 6.28 0.0150
(2) 1 4.41 0.0399
Joint 2 3.62 0.0329
Denominator 60
Contrast Std. Err. [95% Conf. Interval]
race
(1) -6.814717 2.720339 -12.2562 -1.37323
(2) -5.715261 2.720339 -11.15675 -.2737739
Custom contrasts may be specified on the cell means of interactions, too. Here we use margins
to calculate the mean of chol for each cell in the interaction of race and agegrp:
. margins race#agegrp
Adjusted predictions Number of obs = 75
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
race#agegrp
black#10-19 179.2309 4.301233 41.67 0.000 170.6271 187.8346
black#20-29 196.4777 4.301233 45.68 0.000 187.874 205.0814
black#30-39 210.6694 4.301233 48.98 0.000 202.0656 219.2731
black#40-59 214.097 4.301233 49.78 0.000 205.4933 222.7008
black#60-79 221.6646 4.301233 51.54 0.000 213.0609 230.2684
white#10-19 186.0727 4.301233 43.26 0.000 177.469 194.6765
white#20-29 184.6714 4.301233 42.93 0.000 176.0676 193.2751
white#30-39 196.2633 4.301233 45.63 0.000 187.6595 204.867
white#40-59 209.9953 4.301233 48.82 0.000 201.3916 218.5991
white#60-79 211.0633 4.301233 49.07 0.000 202.4595 219.667
other#10-19 176.2556 4.301233 40.98 0.000 167.6519 184.8594
other#20-29 185.0209 4.301233 43.02 0.000 176.4172 193.6247
other#30-39 199.2498 4.301233 46.32 0.000 190.646 207.8535
other#40-59 207.9189 4.301233 48.34 0.000 199.3152 216.5227
other#60-79 225.118 4.301233 52.34 0.000 216.5143 233.7218
Now we are interested in testing the following linear combination of these cell means:
$$ \sum_{i=1}^{3} \sum_{j=1}^{5} c_{ij}\,\mu_{ij} $$
We can specify this type of custom contrast using the following syntax:
{race#agegrp c11 c12 . . . c15 c21 c22 . . . c25 c31 c32 . . . c35}
Because the marginal means of chol for each level of race are linear combinations of the cell
means, we can compose the test for the main effects of race in terms of the cell means directly.
The constraint that the marginal means for blacks and whites are equal, µ1· = µ2·, translates to the
following constraint on the cell means:
$$ \tfrac{1}{5}(\mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15}) = \tfrac{1}{5}(\mu_{21}+\mu_{22}+\mu_{23}+\mu_{24}+\mu_{25}) $$
Ignoring the common factor, we can specify this contrast as
{race#agegrp -1 -1 -1 -1 -1 1 1 1 1 1 0 0 0 0 0}
contrast will fill in the trailing zeros for us if we neglect to specify them, so
{race#agegrp -1 -1 -1 -1 -1 1 1 1 1 1}
is also allowed. The other constraint, µ1· = µ3·, translates to
$$ \tfrac{1}{5}(\mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15}) = \tfrac{1}{5}(\mu_{31}+\mu_{32}+\mu_{33}+\mu_{34}+\mu_{35}) $$
This can be specified to contrast as
{race#agegrp -1 -1 -1 -1 -1 0 0 0 0 0 1 1 1 1 1}
The following call to contrast yields the same test results as above.
. contrast {race#agegrp -1 -1 -1 -1 -1 1 1 1 1 1}
> {race#agegrp -1 -1 -1 -1 -1 0 0 0 0 0 1 1 1 1 1}, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race#agegrp
(1) (1) 1 6.28 0.0150
(2) (2) 1 4.41 0.0399
Joint 2 3.62 0.0329
Denominator 60
The row labeled (1) (1) is the test for
$$ \mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15} = \mu_{21}+\mu_{22}+\mu_{23}+\mu_{24}+\mu_{25} $$
It was the first specified contrast. The row labeled (2) (2) is the test for
$$ \mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15} = \mu_{31}+\mu_{32}+\mu_{33}+\mu_{34}+\mu_{35} $$
It was the second specified contrast. The row labeled Joint tests (1) (1) and (2) (2) simultaneously.
We used the noeffects option above to suppress the table of contrasts. We can omit the 1/5
from the equations for µ1· = µ2· and µ1· = µ3· and still obtain the appropriate tests. However, if
we want to calculate the differences in the marginal means, we must include the 1/5 = 0.2 on each
of the contrast coefficients as follows:
. contrast {race#agegrp -0.2 -0.2 -0.2 -0.2 -0.2
0.2 0.2 0.2 0.2 0.2}
{race#agegrp -0.2 -0.2 -0.2 -0.2 -0.2
0 0 0 0 0
0.2 0.2 0.2 0.2 0.2}
So far, we have reproduced the reference category contrasts by specifying user-defined contrasts
on the marginal means and then on the cell means. For this test, it would have been easier to use the
r. contrast operator:
. contrast r.race, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(white vs black) 1 6.28 0.0150
(other vs black) 1 4.41 0.0399
Joint 2 3.62 0.0329
Denominator 60
In most cases, we can use contrast operators to perform tests. However, if we want to compare,
for instance, the second and third age groups with the fourth and fifth age groups with the test
$$ \tfrac{1}{2}(\mu_{\cdot 2}+\mu_{\cdot 3}) = \tfrac{1}{2}(\mu_{\cdot 4}+\mu_{\cdot 5}) $$
there is not a contrast operator that corresponds to this particular contrast. A custom contrast is
necessary.
. contrast {agegrp 0 -0.5 -0.5 0.5 0.5}
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp 1 62.19 0.0000
Denominator 60
Contrast Std. Err. [95% Conf. Interval]
agegrp
(1) 19.58413 2.483318 14.61675 24.5515
Empty cells
An empty cell is a combination of the levels of factor variables that is not observed in the estimation
sample. In the previous examples, we have seen data with three levels of race, five levels of agegrp,
and all level combinations of race and agegrp present. Suppose there are no observations for white
individuals in the second age group (ages 20-29).
. use http://www.stata-press.com/data/r13/cholesterol2
(Artificial cholesterol data, empty cells)
. label list
ages:
1 10-19
2 20-29
3 30-39
4 40-59
5 60-79
race:
1 black
2 white
3 other
. regress chol race##agegrp
note: 2.race#2.agegrp identifies no observations in the sample
Source SS df MS Number of obs = 70
F( 13, 56) = 13.51
Model 15751.6113 13 1211.66241 Prob > F = 0.0000
Residual 5022.71559 56 89.6913498 R-squared = 0.7582
Adj R-squared = 0.7021
Total 20774.3269 69 301.077201 Root MSE = 9.4706
chol Coef. Std. Err. t P>|t| [95% Conf. Interval]
race
white 12.84185 5.989703 2.14 0.036 .8430383 24.84067
other -.167627 5.989703 -0.03 0.978 -12.16644 11.83119
agegrp
20-29 17.24681 5.989703 2.88 0.006 5.247991 29.24562
30-39 31.43847 5.989703 5.25 0.000 19.43966 43.43729
40-59 34.86613 5.989703 5.82 0.000 22.86732 46.86495
60-79 44.43374 5.989703 7.42 0.000 32.43492 56.43256
race#agegrp
white#20-29 0 (empty)
white#30-39 -22.83983 8.470719 -2.70 0.009 -39.80872 -5.870939
white#40-59 -14.67558 8.470719 -1.73 0.089 -31.64447 2.293306
white#60-79 -10.51115 8.470719 -1.24 0.220 -27.48004 6.457735
other#20-29 -6.054425 8.470719 -0.71 0.478 -23.02331 10.91446
other#30-39 -11.48083 8.470719 -1.36 0.181 -28.44971 5.488063
other#40-59 -.6796112 8.470719 -0.08 0.936 -17.6485 16.28928
other#60-79 -1.578052 8.470719 -0.19 0.853 -18.54694 15.39084
_cons 175.2309 4.235359 41.37 0.000 166.7464 183.7153
Now let’s use contrast to test the main effects of race:
. contrast race
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race (not testable)
Denominator 56
By “not testable”, contrast means that it cannot form a test for the main effects of race based
on estimable functions of the model coefficients. agegrp has five levels, so contrast constructs an
estimate of the ith margin for race as
$$ \hat\mu_{i\cdot} \;=\; \frac{1}{5}\sum_{j=1}^{5}\hat\mu_{ij} \;=\; \hat\mu_0 + \hat\alpha_i + \frac{1}{5}\sum_{j=1}^{5}\bigl\{\hat\beta_j + (\widehat{\alpha\beta})_{ij}\bigr\} $$
but $(\widehat{\alpha\beta})_{22}$ was constrained to zero because of the empty cell, so $\hat\mu_{2\cdot}$ is not an estimable function
of the model coefficients.
See Estimable functions in Methods and formulas of [R] margins for a technical description of
estimable functions. The emptycells(reweight) option causes contrast to estimate µ2· by
$$ \hat\mu_{2\cdot} = \frac{\hat\mu_{21}+\hat\mu_{23}+\hat\mu_{24}+\hat\mu_{25}}{4} $$
which is an estimable function of the model coefficients.
. contrast race, emptycells(reweight)
Contrasts of marginal linear predictions
Margins : asbalanced
Empty cells : reweight
df F P>F
race 2 3.17 0.0498
Denominator 56
We can reconstruct the effect of the emptycells(reweight) option by using custom contrasts.
. contrast {race#agegrp -4 -4 -4 -4 -4 5 0 5 5 5}
> {race#agegrp -1 -1 -1 -1 -1 0 0 0 0 0 1 1 1 1 1}, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race#agegrp
(1) (1) 1 1.06 0.3080
(2) (2) 1 2.37 0.1291
Joint 2 3.17 0.0498
Denominator 56
The row labeled (1) (1) is the test for
$$ \tfrac{1}{5}(\mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15}) = \tfrac{1}{4}(\mu_{21}+\mu_{23}+\mu_{24}+\mu_{25}) $$
It was the first specified contrast. The row labeled (2) (2) is the test for
$$ \mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15} = \mu_{31}+\mu_{32}+\mu_{33}+\mu_{34}+\mu_{35} $$
It was the second specified contrast. The row labeled Joint is the overall test of the main effects of
race.
Empty cells, ANOVA style
Let’s refit the linear model from the previous example with anova to compare with contrast’s
test for the main effects of race.
. anova chol race##agegrp
Number of obs = 70 R-squared = 0.7582
Root MSE = 9.47055 Adj R-squared = 0.7021
Source Partial SS df MS F Prob > F
Model 15751.6113 13 1211.66241 13.51 0.0000
race 305.49046 2 152.74523 1.70 0.1914
agegrp 14387.8559 4 3596.96397 40.10 0.0000
race#agegrp 795.807574 7 113.686796 1.27 0.2831
Residual 5022.71559 56 89.6913498
Total 20774.3269 69 301.077201
contrast and anova handled the empty cell differently; the F statistic reported by contrast
was 3.17, but anova reported 1.70. To see how they differ, consider the following table of the cell
means and margins for our situation.
                          agegrp
               1      2      3      4      5
         1   µ11    µ12    µ13    µ14    µ15    µ1·
  race   2   µ21           µ23    µ24    µ25
         3   µ31    µ32    µ33    µ34    µ35    µ3·
             µ·1           µ·3    µ·4    µ·5
For testing the main effects of race, we know that we will be testing the equality of the marginal
means for rows 1 and 3, that is, µ1· = µ3·. This translates into the following constraint:
$$ \mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15} = \mu_{31}+\mu_{32}+\mu_{33}+\mu_{34}+\mu_{35} $$
Because row 2 contains an empty cell in column 2, anova dropped column 2 and tested the equality
of the marginal mean for row 2 with the average of the marginal means from rows 1 and 3, using
only the remaining cell means. This translates into the following constraint:
$$ 2(\mu_{21}+\mu_{23}+\mu_{24}+\mu_{25}) = \mu_{11}+\mu_{13}+\mu_{14}+\mu_{15}+\mu_{31}+\mu_{33}+\mu_{34}+\mu_{35} \tag{1} $$
Now that we know the constraints that anova used to test for the main effects of race, we can use
custom contrasts to reproduce the anova test result.
. contrast {race#agegrp -1 -1 -1 -1 -1 0 0 0 0 0 1 1 1 1 1}
> {race#agegrp 1 0 1 1 1 -2 0 -2 -2 -2 1 0 1 1 1}, noeffects
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race#agegrp
(1) (1) 1 2.37 0.1291
(2) (2) 1 1.03 0.3138
Joint 2 1.70 0.1914
Denominator 56
The row labeled (1) (1) is the test for µ1· = µ3·; it was the first specified contrast. The row labeled
(2) (2) is the test for the constraint in (1); it was the second specified contrast. The row labeled
Joint is an overall test for the main effects of race.
Nested effects
contrast has the | operator for computing simple effects when the levels of one factor are nested
within the levels of another. Here is a fictional example where we are interested in the effect of
five methods of teaching algebra on students’ scores for the math portion of the SAT. Suppose three
algebra classes are randomly sampled from classes using each of the five methods so that class is
nested in method as demonstrated in the following tabulation.
. use http://www.stata-press.com/data/r13/sat
(Artificial SAT data)
. tabulate class method
method
class 1 2 3 4 5 Total
1 5 0 0 0 0 5
2 5 0 0 0 0 5
3 5 0 0 0 0 5
4 0 5 0 0 0 5
5 0 5 0 0 0 5
6 0 5 0 0 0 5
7 0 0 5 0 0 5
8 0 0 5 0 0 5
9 0 0 5 0 0 5
10 0 0 0 5 0 5
11 0 0 0 5 0 5
12 0 0 0 5 0 5
13 0 0 0 0 5 5
14 0 0 0 0 5 5
15 0 0 0 0 5 5
Total 15 15 15 15 15 75
We will consider method as fixed and class nested in method as random. To use class nested
in method as the error term for method, we can specify the following anova model:
. anova score method / class|method /
Number of obs = 75 R-squared = 0.7599
Root MSE = 71.8517 Adj R-squared = 0.7039
Source Partial SS df MS F Prob > F
Model 980312 14 70022.2857 13.56 0.0000
method 905872 4 226468 30.42 0.0000
class|method 74440 10 7444
class|method 74440 10 7444 1.44 0.1845
Residual 309760 60 5162.66667
Total 1290072 74 17433.4054
Like anova, contrast allows the | operator, which specifies that one variable is nested in the
levels of another. We can use contrast to test the main effects of method and the simple effects
of class within method.
. contrast method class|method
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
method (not testable)
class|method
1 2 2.80 0.0687
2 2 0.91 0.4089
3 2 1.10 0.3390
4 2 0.22 0.8025
5 2 2.18 0.1221
Joint 10 1.44 0.1845
Denominator 60
Although contrast was able to perform the individual tests for the simple effects of class within
method, empty cells in the interaction between method and class prevented contrast from testing
for a main effect of method. Here we add the emptycells(reweight) option so that contrast
can take the empty cells into account when computing the marginal means for method.
. contrast method class|method, emptycells(reweight)
Contrasts of marginal linear predictions
Margins : asbalanced
Empty cells : reweight
df F P>F
method 4 43.87 0.0000
class|method
1 2 2.80 0.0687
2 2 0.91 0.4089
3 2 1.10 0.3390
4 2 0.22 0.8025
5 2 2.18 0.1221
Joint 10 1.44 0.1845
Denominator 60
Now contrast does report a test for the main effects of method. However, if we compare this with
the anova results, we will see that the results are different. They are different because contrast
uses the residual error term to compute the F test by default. Using notation similar to anova, we
can use the / operator to specify a different error term for the test. Therefore, we can reproduce the
test of main effects from our anova command by typing
. contrast method / class|method /, emptycells(reweight)
Contrasts of marginal linear predictions
Margins : asbalanced
Empty cells : reweight
df F P>F
method 4 30.42 0.0000
class|method 10 (denominator)
class|method
1 2 2.80 0.0687
2 2 0.91 0.4089
3 2 1.10 0.3390
4 2 0.22 0.8025
5 2 2.18 0.1221
Joint 10 1.44 0.1845
Denominator 60
Multiple comparisons
We have seen that contrast can report the individual linear combinations that make up the
requested effects. Depending upon the specified option, contrast will report confidence intervals,
p-values, or both in the effects table. By default, the reported confidence intervals and p-values are
not adjusted for multiple comparisons. Use the mcompare() option to adjust the confidence intervals
and p-values for multiple comparisons of the individual effects.
Let’s compute the grand mean effects of race using the g. operator. We also specify the
mcompare(bonferroni) option to compute p-values and confidence intervals using Bonferroni’s
adjustment.
. use http://www.stata-press.com/data/r13/cholesterol
(Artificial cholesterol data)
. anova chol race##agegrp
(output omitted )
. contrast g.race, mcompare(bonferroni)
Contrasts of marginal linear predictions
Margins : asbalanced
Bonferroni
df F P>F P>F
race
(black vs mean) 1 7.07 0.0100 0.0301
(white vs mean) 1 2.82 0.0982 0.2947
(other vs mean) 1 0.96 0.3312 0.9936
Joint 2 3.62 0.0329
Denominator 60
Note: Bonferroni-adjusted p-values are reported for tests on
individual contrasts only.
Number of
Comparisons
race 3
Bonferroni
Contrast Std. Err. [95% Conf. Interval]
race
(black vs mean) 4.17666 1.570588 .3083743 8.044945
(white vs mean) -2.638058 1.570588 -6.506343 1.230227
(other vs mean) -1.538602 1.570588 -5.406887 2.329684
The last table reports a Bonferroni-adjusted confidence interval for each individual contrast. (Use
the effects option to add p-values to the last table.) The first table includes a Bonferroni-adjusted
p-value for each test that is not a joint test.
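For instance, a minimal sketch of the command that adds Bonferroni-adjusted p-values to the table
of contrasts is
. contrast g.race, mcompare(bonferroni) effects
(output omitted )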
Joint tests are never adjusted for multiple comparisons. For example,
. contrast race@agegrp, mcompare(bonferroni)
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race@agegrp
10-19 2 1.37 0.2620
20-29 2 2.44 0.0958
30-39 2 3.12 0.0512
40-59 2 0.53 0.5889
60-79 2 2.90 0.0628
Joint 10 2.07 0.0409
Denominator 60
Note: Bonferroni-adjusted p-values are reported
for tests on individual contrasts only.
Number of
Comparisons
race@agegrp 10
Bonferroni
Contrast Std. Err. [95% Conf. Interval]
race@agegrp
(white vs base) 10-19 6.841855 6.082862 -10.88697 24.57068
(white vs base) 20-29 -11.80631 6.082862 -29.53513 5.922513
(white vs base) 30-39 -14.40607 6.082862 -32.13489 3.322751
(white vs base) 40-59 -4.101691 6.082862 -21.83051 13.62713
(white vs base) 60-79 -10.60137 6.082862 -28.33019 7.127448
(other vs base) 10-19 -2.975244 6.082862 -20.70407 14.75358
(other vs base) 20-29 -11.45679 6.082862 -29.18561 6.272031
(other vs base) 30-39 -11.41958 6.082862 -29.1484 6.309244
(other vs base) 40-59 -6.17807 6.082862 -23.90689 11.55075
(other vs base) 60-79 3.453375 6.082862 -14.27545 21.1822
Here we have five tests of simple effects with two degrees of freedom each. No Bonferroni-adjusted
p-values are available for these tests, but the confidence intervals for the individual contrasts are
adjusted.
Unbalanced data
By default, contrast treats all factors as balanced when computing marginal means. By balanced,
we mean that contrast assumes an equal number of observations in each level of each factor and
an equal number of observations in each cell of each interaction. If our data are balanced, there
is no issue. If, however, our data are not balanced, we might prefer that contrast use the actual
cell frequencies from our data in computing marginal means. We instruct contrast to use observed
frequencies by adding the asobserved option.
Even if our data are unbalanced, we might still want contrast to compute balanced marginal
means. It depends on what we want to test and what our data represent. If we have data from a designed
experiment that started with an equal number of males and females but the data became unbalanced
because the data from a few males were unusable, we might still want our margins computed as
though the data were balanced. If, however, we have a representative sample of individuals from Los
Angeles with 40% of European descent, 34% African-American, 25% Hispanic, and 1% Australian,
we probably want our margins computed using these representative frequencies. We do not want
Australians receiving the same weight as Europeans.
The following examples will use an unbalanced version of our dataset.
. use http://www.stata-press.com/data/r13/cholesterol3
(Artificial cholesterol data, unbalanced)
. tab race agegrp
agegrp
race 10-19 20-29 30-39 40-59 60-79 Total
black 1 5 5 4 3 18
white 4 5 7 4 4 24
other 3 7 6 5 4 25
Total 8 17 18 13 11 67
The row labeled Total gives observed cell frequencies for age group. These can be obtained
by summing frequencies from the cells in the corresponding column. In this respect, we can also
refer to them as marginal frequencies. We use the terms marginal frequencies and cell frequencies
interchangeably below.
We begin by fitting the two-factor model with an interaction.
. anova chol race##agegrp
Number of obs = 67 R-squared = 0.8179
Root MSE = 8.37496 Adj R-squared = 0.7689
Source Partial SS df MS F Prob > F
Model 16379.9926 14 1169.99947 16.68 0.0000
race 230.754396 2 115.377198 1.64 0.2029
agegrp 13857.9877 4 3464.49693 49.39 0.0000
race#agegrp 857.815209 8 107.226901 1.53 0.1701
Residual 3647.2774 52 70.13995
Total 20027.27 66 303.443485
Using observed cell frequencies
Recall that the marginal means are computed from the cell means. Treating the factors as balanced
yields the following marginal means for race:
$$ \eta_{1\cdot} = \tfrac{1}{5}(\mu_{11}+\mu_{12}+\mu_{13}+\mu_{14}+\mu_{15}) $$
$$ \eta_{2\cdot} = \tfrac{1}{5}(\mu_{21}+\mu_{22}+\mu_{23}+\mu_{24}+\mu_{25}) $$
$$ \eta_{3\cdot} = \tfrac{1}{5}(\mu_{31}+\mu_{32}+\mu_{33}+\mu_{34}+\mu_{35}) $$
If we have a fixed population and unbalanced cells, then the ηi· do not represent population means. If,
however, our data are representative of the population, we can use the frequencies from our estimation
sample to estimate the population marginal means, denoted µi·.
Here are the results of testing for a main effect of race, treating all the factors as balanced.
. contrast r.race
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(white vs black) 1 3.28 0.0757
(other vs black) 1 1.50 0.2263
Joint 2 1.64 0.2029
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race
(white vs black) -5.324254 2.93778 -11.21934 .5708338
(other vs black) -3.596867 2.93778 -9.491955 2.298221
The row labeled (white vs black) is the test for η2· = η1·. The row labeled (other vs black)
is the test for η3· = η1·.
If the observed marginal frequencies are representative of the distribution of the levels of agegrp,
we can use them to form the marginal means of chol for each of the levels of race from the cell
means.
$$ \mu_{1\cdot} = \tfrac{1}{67}(8\mu_{11} + 17\mu_{12} + 18\mu_{13} + 13\mu_{14} + 11\mu_{15}) $$
$$ \mu_{2\cdot} = \tfrac{1}{67}(8\mu_{21} + 17\mu_{22} + 18\mu_{23} + 13\mu_{24} + 11\mu_{25}) $$
$$ \mu_{3\cdot} = \tfrac{1}{67}(8\mu_{31} + 17\mu_{32} + 18\mu_{33} + 13\mu_{34} + 11\mu_{35}) $$
Here are the results of testing for the main effects of race, using the observed marginal frequencies:
. contrast r.race, asobserved
Contrasts of marginal linear predictions
Margins : asobserved
df F P>F
race
(white vs black) 1 7.25 0.0095
(other vs black) 1 3.89 0.0538
Joint 2 3.74 0.0304
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race
(white vs black) -7.232433 2.686089 -12.62246 -1.842402
(other vs black) -5.231198 2.651203 -10.55123 .0888295
The row labeled (white vs black) is the test for µ2· = µ1·. The row labeled (other vs black)
is the test for µ3· = µ1·. Both tests were insignificant when we tested the cell means resulting from
balanced frequencies; however, when we tested the cell means from observed frequencies, the first
test is significant beyond the 5% level (and the second test is nearly so).
Here we reproduce the results of the asobserved option with custom contrasts. Because we are
modifying the way that the marginal means are constructed from the cell means, we will specify the
contrasts on the predicted cell means. We use macro expansion, `=exp', to evaluate the fractions instead
of approximating them with decimals. Macro expansion guarantees that the contrast coefficients sum
to zero. For more information, see Macro expansion operators and function in [P] macro.
. contrast {race#agegrp -`=8/67' -`=17/67' -`=18/67' -`=13/67' -`=11/67'
>                         `=8/67'  `=17/67'  `=18/67'  `=13/67'  `=11/67'}
>          {race#agegrp -`=8/67' -`=17/67' -`=18/67' -`=13/67' -`=11/67'
>                         0 0 0 0 0
>                         `=8/67'  `=17/67'  `=18/67'  `=13/67'  `=11/67'}
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race#agegrp
(1) (1) 1 7.25 0.0095
(2) (2) 1 3.89 0.0538
Joint 2 3.74 0.0304
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race#agegrp
(1) (1) -7.232433 2.686089 -12.62246 -1.842402
(2) (2) -5.231198 2.651203 -10.55123 .0888295
Weighted contrast operators
contrast provides observation-weighted versions of five of the contrast operators: gw., hw.,
jw., pw., and qw. The first three of these operators perform comparisons of means across cells, and
like the marginal means just discussed, these means can be computed in two ways: 1) as though the
cell frequencies were equal or 2) using the observed cell frequencies from the estimation sample. The
weighted operators provide versions of the standard (as balanced) operators that weight these means
by their cell frequencies. The two orthogonal polynomial operators involve similar adjustments for
weighting.
Let’s examine what this means by using the gw. operator. The gw. operator is a weighted version
of the g. operator. The gw. operator computes the grand mean using the cell frequencies for race
obtained from the model fit.
Here we test the effects of race, comparing each level with the weighted grand mean but otherwise
treating the factors as balanced in the marginal mean calculations.
. contrast gw.race
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(black vs mean) 1 2.78 0.1014
(white vs mean) 1 2.06 0.1573
(other vs mean) 1 0.06 0.8068
Joint 2 1.64 0.2029
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race
(black vs mean) 3.24931 1.948468 -.6605779 7.159198
(white vs mean) -2.074944 1.44618 -4.976915 .8270276
(other vs mean) -.347557 1.414182 -3.18532 2.490206
The observed marginal frequencies of race are 18, 24, and 25. Thus the row labeled (black vs
mean) tests η1· = (18η1· + 24η2· + 25η3·)/67; the row labeled (white vs mean) tests η2· = (18η1· +
24η2· + 25η3·)/67; and the row labeled (other vs mean) tests η3· = (18η1· + 24η2· + 25η3·)/67.
Now we reproduce the above results using custom contrasts. We are weighting the calculation
of the grand mean from the marginal means for each of the races, but we are not weighting the
calculation of the marginal means themselves. Therefore, we can specify the custom contrast on the
marginal means for race instead of on the cell means.
. contrast {race  `=49/67' -`=24/67' -`=25/67'}
>          {race -`=18/67'  `=43/67' -`=25/67'}
>          {race -`=18/67' -`=24/67'  `=42/67'}
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
race
(1) 1 2.78 0.1014
(2) 1 2.06 0.1573
(3) 1 0.06 0.8068
Joint 2 1.64 0.2029
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race
(1) 3.24931 1.948468 -.6605779 7.159198
(2) -2.074944 1.44618 -4.976915 .8270276
(3) -.347557 1.414182 -3.18532 2.490206
Now we will test for each race the difference between the marginal mean and the weighted grand
mean, treating the factors as observed in the marginal mean calculations.
. contrast gw.race, asobserved wald ci
Contrasts of marginal linear predictions
Margins : asobserved
df F P>F
race
(black vs mean) 1 6.81 0.0118
(white vs mean) 1 3.74 0.0587
(other vs mean) 1 0.26 0.6099
Joint 2 3.74 0.0304
Denominator 52
Contrast Std. Err. [95% Conf. Interval]
race
(black vs mean) 4.542662 1.740331 1.050432 8.034891
(white vs mean) -2.689771 1.39142 -5.481859 .1023172
(other vs mean) -.6885363 1.341261 -3.379973 2.002901
The row labeled (black vs mean) tests µ1· = (18µ1· + 24µ2· + 25µ3·)/67; the row labeled (white
vs mean) tests µ2· = (18µ1· + 24µ2· + 25µ3·)/67; and the row labeled (other vs mean) tests
µ3· = (18µ1· + 24µ2· + 25µ3·)/67.
Here we use a custom contrast to reproduce the above result testing µ1· = (18µ1· + 24µ2· +
25µ3·)/67. Because both the calculation of the marginal means and the calculation of the grand mean
are adjusted, we specify the custom contrast on the cell means.
. contrast {race#agegrp  `=49/67*8/67'  `=49/67*17/67'  `=49/67*18/67'
>                        `=49/67*13/67' `=49/67*11/67'
>                       -`=24/67*8/67' -`=24/67*17/67' -`=24/67*18/67'
>                       -`=24/67*13/67' -`=24/67*11/67'
>                       -`=25/67*8/67' -`=25/67*17/67' -`=25/67*18/67'
>                       -`=25/67*13/67' -`=25/67*11/67'}, nowald
Contrasts of marginal linear predictions
Margins : asbalanced
Contrast Std. Err. [95% Conf. Interval]
race#agegrp
(1) (1) 4.542662 1.740331 1.050432 8.034891
The Helmert and reverse Helmert contrasts also involve calculating averages of the marginal means;
therefore, weighted versions of these operators are available as well. The hw. operator is a weighted
version of the h. operator that computes the mean of the subsequent levels using the cell frequencies
obtained from the model fit. The jw. operator is a weighted version of the j. operator that computes
the mean of the previous levels using the cell frequencies obtained from the model fit.
For orthogonal polynomials, we can use the pw. and qw. operators, which are the weighted
versions of the p. and q. operators. In this case, the cell frequencies from the model fit are used in
the calculation of the orthogonal polynomial contrast coefficients.
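As a sketch of the syntax (output omitted), we could compare each age group with the weighted mean
of the subsequent age groups, or with the weighted mean of the previous age groups, by typing
. contrast hw.agegrp
. contrast jw.agegrp
Here the weights come from the observed frequencies of the agegrp levels in the estimation sample;
the marginal means themselves are still treated as balanced unless we also specify the asobserved
option.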
Testing factor effects on slopes
For linear models where the independent variables are all factor variables, the linear prediction
at fixed levels of the factor variables turns out to be a cell mean. With these models, contrast
computes and tests the effects of the factor variables on the expected mean of the dependent variable.
When factor variables are interacted with continuous variables, contrast distinguishes factor effects
on the intercept from factor effects on the slope.
Here we have 1980 census data including information on the birth rate (brate), the median age
(medage), and the region of the country (region) for each of the 50 states. We can fit an ANCOVA
model for brate using main effects of the factor variable region and the continuous variable medage.
. use http://www.stata-press.com/data/r13/census3
(1980 Census data by state)
. label list cenreg
cenreg:
1 NE
2 NCentral
3 South
4 West
. anova brate i.region c.medage
Number of obs = 50 R-squared = 0.8264
Root MSE = 12.7575 Adj R-squared = 0.8110
Source Partial SS df MS F Prob > F
Model 34872.8589 4 8718.21473 53.57 0.0000
region 2197.75453 3 732.584844 4.50 0.0076
medage 15327.423 1 15327.423 94.18 0.0000
Residual 7323.96108 45 162.754691
Total 42196.82 49 861.159592
For those more comfortable with linear regression, this is equivalent to the regression model
. regress brate i.region c.medage
You may use either.
We can use contrast to compute reference category effects for region. These contrasts compare
the adjusted means of NCentral, South, and West regions with the adjusted mean of the NE region.
. contrast r.region
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
region
(NCentral vs NE) 1 2.24 0.1417
(South vs NE) 1 0.78 0.3805
(West vs NE) 1 10.33 0.0024
Joint 3 4.50 0.0076
Denominator 45
Contrast Std. Err. [95% Conf. Interval]
region
(NCentral vs NE) 9.061063 6.057484 -3.139337 21.26146
(South vs NE) 5.06991 5.72396 -6.458738 16.59856
(West vs NE) 21.71328 6.755616 8.106774 35.31979
Let’s add the interaction between region and medage to the model.
. anova brate region##c.medage
Number of obs = 50 R-squared = 0.9000
Root MSE = 10.0244 Adj R-squared = 0.8833
Source Partial SS df MS F Prob > F
Model 37976.3149 7 5425.18784 53.99 0.0000
region 3405.07044 3 1135.02348 11.30 0.0000
medage 5279.71448 1 5279.71448 52.54 0.0000
region#medage 3103.45597 3 1034.48532 10.29 0.0000
Residual 4220.5051 42 100.488217
Total 42196.82 49 861.159592
The parameterization for the expected value of brate as a function of region and medage is given
by
$$ E(\texttt{brate} \mid \texttt{region}=i,\ \texttt{medage}) = \alpha_0 + \alpha_i + \beta_0\,\texttt{medage} + \beta_i\,\texttt{medage} $$
where α0 is the intercept and β0 is the slope of medage. We are modeling the effects of region
in two different ways. The αi parameters measure the effect of region on the intercept, and the βi
parameters measure the effect of region on the slope of medage.
contrast computes and tests effects on slopes separately from effects on intercepts. First, we
will compute the reference category effects of region on the intercept:
. contrast r.region
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
region
(NCentral vs NE) 1 0.09 0.7691
(South vs NE) 1 0.01 0.9389
(West vs NE) 1 8.50 0.0057
Joint 3 11.30 0.0000
Denominator 42
Contrast Std. Err. [95% Conf. Interval]
region
(NCentral vs NE) -49.38396 167.1281 -386.6622 287.8942
(South vs NE) -9.058983 117.424 -246.0302 227.9123
(West vs NE) 343.0024 117.6547 105.5656 580.4393
Now we will compute the reference category effects of region on the slope of medage:
. contrast r.region#c.medage
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
region#c.medage
(NCentral vs NE) 1 0.16 0.6917
(South vs NE) 1 0.03 0.8558
(West vs NE) 1 8.18 0.0066
Joint 3 10.29 0.0000
Denominator 42
Contrast Std. Err. [95% Conf. Interval]
region#c.medage
(NCentral vs NE) 2.208539 5.530981 -8.953432 13.37051
(South vs NE) .6928008 3.788735 -6.953175 8.338777
(West vs NE) -10.94649 3.827357 -18.67041 -3.22257
At the 5% level, the slope of medage for the West region differs from that of the NE region, but at
that level of significance, we cannot say that the slope for the NCentral or the South region differs
from that of the NE region.
This model is simple enough that the reference category contrasts reproduce the coefficients for
region and for the interactions in an equivalent model fit by regress.
. regress brate region##c.medage
Source SS df MS Number of obs = 50
F( 7, 42) = 53.99
Model 37976.3149 7 5425.18784 Prob > F = 0.0000
Residual 4220.5051 42 100.488217 R-squared = 0.9000
Adj R-squared = 0.8833
Total 42196.82 49 861.159592 Root MSE = 10.024
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
region
NCentral -49.38396 167.1281 -0.30 0.769 -386.6622 287.8942
South -9.058983 117.424 -0.08 0.939 -246.0302 227.9123
West 343.0024 117.6547 2.92 0.006 105.5656 580.4393
medage -8.802707 3.462865 -2.54 0.015 -15.79105 -1.814362
region#
c.medage
NCentral 2.208539 5.530981 0.40 0.692 -8.953432 13.37051
South .6928008 3.788735 0.18 0.856 -6.953175 8.338777
West -10.94649 3.827357 -2.86 0.007 -18.67041 -3.22257
_cons 411.8268 108.2084 3.81 0.000 193.4533 630.2002
This will not be the case for models that are more complicated.
Chow tests
Now let’s suppose we are fitting a model for birth rates on median age and marriage rate. We are
also interested in whether the regression coefficients differ for states in the east versus states in the
west. We use census divisions to create a new variable, west, that indicates which states are in the
western half of the United States.
. generate west = inlist(division, 4, 7, 8, 9)
We fit a model that includes a separate intercept for west as well as an interaction between west
and each of the other variables in our model.
. regress brate i.west##c.medage i.west##c.mrgrate
Source SS df MS Number of obs = 50
F( 5, 44) = 92.09
Model 38516.2172 5 7703.24344 Prob > F = 0.0000
Residual 3680.60281 44 83.6500639 R-squared = 0.9128
Adj R-squared = 0.9029
Total 42196.82 49 861.159592 Root MSE = 9.146
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
1.west 327.8733 58.71793 5.58 0.000 209.5351 446.2115
medage -7.532304 1.387624 -5.43 0.000 -10.32888 -4.735731
west#
c.medage
1 -10.11443 1.849103 -5.47 0.000 -13.84105 -6.387808
mrgrate 828.6813 643.3443 1.29 0.204 -467.8939 2125.257
west#
c.mrgrate
1 -800.8036 645.488 -1.24 0.221 -2101.699 500.092
_cons 366.5325 47.08904 7.78 0.000 271.6308 461.4343
We can test the effects of west on the intercept and on the slopes of medage and mrgrate. We will
specify all of these effects in a single contrast command and include the overall option to obtain
a joint test of effects, that is, a test that the coefficients for eastern states and for western states are
equal.
. contrast west west#c.medage west#c.mrgrate, overall
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
west 1 31.18 0.0000
west#c.medage 1 29.92 0.0000
west#c.mrgrate 1 1.54 0.2213
Overall 3 22.82 0.0000
Denominator 44
This overall test is referred to as a Chow test in econometrics (Chow 1960).
Beyond linear models
contrast may be used after almost any estimation command, with the added benefit that contrast
provides direct support for testing main and interaction effects that is not available in most estimation
commands. To illustrate, we will use contrast with results from a logistic regression. Stata’s logit
command fits logistic regression models, reporting the fitted regression coefficients. The logistic
command fits the same models but reports odds ratios. Although contrast can report odds ratios for
the computed effects, the tests are all computed from linear combinations of the model coefficients
regardless of which estimation command we used.
Suppose we have data on patient satisfaction for three hospitals in a city. Let’s begin by fitting a
model for satisfied, whether the patient was satisfied with his or her treatment, using the main
effects of hospital:
. use http://www.stata-press.com/data/r13/hospital, clear
(Artificial hospital satisfaction data)
. logit satisfied i.hospital
Iteration 0: log likelihood = -393.72216
Iteration 1: log likelihood = -387.55736
Iteration 2: log likelihood = -387.4768
Iteration 3: log likelihood = -387.47679
Logistic regression Number of obs = 802
LR chi2(2) = 12.49
Prob > chi2 = 0.0019
Log likelihood = -387.47679 Pseudo R2 = 0.0159
satisfied Coef. Std. Err. z P>|z| [95% Conf. Interval]
hospital
2 .5348129 .2136021 2.50 0.012 .1161604 .9534654
3 .7354519 .2221929 3.31 0.001 .2999618 1.170942
_cons 1.034708 .1391469 7.44 0.000 .7619855 1.307431
Because there are no other independent variables in this model, the reference category effects of
hospital computed by contrast will match the fitted model coefficients, assuming a common
reference level.
. contrast r.hospital
Contrasts of marginal linear predictions
Margins : asbalanced
df chi2 P>chi2
hospital
(2 vs 1) 1 6.27 0.0123
(3 vs 1) 1 10.96 0.0009
Joint 2 12.55 0.0019
Contrast Std. Err. [95% Conf. Interval]
hospital
(2 vs 1) .5348129 .2136021 .1161604 .9534654
(3 vs 1) .7354519 .2221929 .2999618 1.170942
We see that the reference category effects are equal to the fitted coefficients. They also have the same
interpretation, the difference in log odds from the reference category. The top table also provides a
joint test of these effects, a test of the main effects of hospital.
We also have information on the condition for which each patient is being treated in the variable
illness. Here we fit a logistic regression using a two-way crossed model of hospital and illness.
. label list illness
illness:
1 heart attack
2 stroke
3 pneumonia
4 lung disease
5 kidney failure
. logistic satisfied hospital##illness
Logistic regression Number of obs = 802
LR chi2(14) = 38.51
Prob > chi2 = 0.0004
Log likelihood = -374.46865 Pseudo R2 = 0.0489
satisfied Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
hospital
2 1.226496 .5492177 0.46 0.648 .509921 2.950049
3 1.711111 .8061016 1.14 0.254 .6796395 4.308021
illness
stroke 1.328704 .6044214 0.62 0.532 .544779 3.240678
pneumonia .7993827 .3408305 -0.53 0.599 .3466015 1.843653
lung dise.. 1.231481 .5627958 0.46 0.649 .5028318 3.016012
kidney fa.. 1.25 .5489438 0.51 0.611 .5285676 2.956102
hospital#
illness
2#stroke 2.434061 1.768427 1.22 0.221 .5860099 10.11016
2#pneumonia 4.045805 2.868559 1.97 0.049 1.008058 16.23769
2 #
lung dise.. .54713 .3469342 -0.95 0.342 .1578866 1.89599
2 #
kidney fa.. 1.594425 1.081104 0.69 0.491 .4221288 6.022312
3#stroke .5416535 .3590089 -0.93 0.355 .1477555 1.985635
3#pneumonia 1.579502 1.042504 0.69 0.489 .4332209 5.758783
3 #
lung dise.. 3.137388 2.595748 1.38 0.167 .6198955 15.87881
3 #
kidney fa.. 1.672727 1.226149 0.70 0.483 .3976256 7.036812
_cons 2.571429 .8099239 3.00 0.003 1.386983 4.767358
Using contrast, we can obtain an ANOVA-style table of tests for the main effects and interaction
effects of hospital and illness.
. contrast hospital##illness
Contrasts of marginal linear predictions
Margins : asbalanced
df chi2 P>chi2
hospital 2 14.92 0.0006
illness 4 4.09 0.3937
hospital#illness 8 20.45 0.0088
Our interaction effect is significant, so we decide to evaluate the simple reference category effects of
hospital within illness. We are particularly interested in patient satisfaction when being treated
for a heart attack or stroke, so we will use the i. operator to limit our output to simple effects within
the first two illnesses.
. contrast r.hospital@i(1 2).illness, nowald
Contrasts of marginal linear predictions
Margins : asbalanced
Contrast Std. Err. [95% Conf. Interval]
hospital@illness
(2 vs 1) heart attack .2041611 .4477942 -.6734995 1.081822
(2 vs 1) stroke 1.093722 .5721288 -.0276296 2.215074
(3 vs 1) heart attack .5371429 .4710983 -.3861928 1.460479
(3 vs 1) stroke -.0759859 .4662325 -.9897847 .8378129
The row labeled (2 vs 1) heart attack estimates simple effects on the log odds when comparing
hospital 2 with hospital 1 for patients having heart attacks. These effects are differences in the cell
means of the linear predictions.
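As a cross-check, this simple effect can be recovered directly from the fitted coefficients: the (2 vs 1) contrast at the stroke level of illness is the hospital 2 main-effect coefficient plus the hospital 2 by stroke interaction coefficient. A minimal sketch, assuming the logistic fit above is still the active estimation result:
. display _b[2.hospital] + _b[2.hospital#2.illness]
which should reproduce the (2 vs 1) stroke contrast of about 1.094.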
We can add the or option to report an odds ratio for each of these simple effects:
. contrast r.hospital@i(1 2).illness, nowald or
Contrasts of marginal linear predictions
Margins : asbalanced
Odds Ratio Std. Err. [95% Conf. Interval]
hospital@illness
(2 vs 1) heart attack 1.226496 .5492177 .509921 2.950049
(2 vs 1) stroke 2.985366 1.708014 .9727486 9.162089
(3 vs 1) heart attack 1.711111 .8061016 .6796395 4.308021
(3 vs 1) stroke .9268293 .4321179 .3716567 2.311306
These odds ratios are just the exponentiated version of the contrasts in the previous table.
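For example, the relationship can be verified by hand from the previous table:
. display exp(.2041611)
. display exp(1.093722)
which should return roughly 1.2265 and 2.9854, the first two odds ratios shown above.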
For contrasts of the margins of nonlinear predictions, such as predicted probabilities, see [R] margins, contrast.
Multiple equations
contrast works with models containing multiple equations. Commands such as intreg and
gnbreg allow their ancillary parameters to be modeled as functions of independent variables, and
contrast can compute and test effects within these equations. In addition, contrast allows a special
pseudofactor for equation, called _eqns, when working with results from manova, mvreg, mlogit, and mprobit.
In example 4 of [MV] manova, we fit a two-way MANOVA model using data from Woodard (1931).
Here we will fit this model using mvreg. The data represent patients with jaw fractures. y1 is the
patient’s age, y2 is blood lymphocytes, and y3 is blood polymorphonuclears. Two factor variables,
gender and fracture, are used as independent variables.
. use http://www.stata-press.com/data/r13/jaw
(Table 4.6 Two-Way Unbalanced Data for Fractures of the Jaw -- Rencher (1998))
. mvreg y1 y2 y3 = gender##fracture, vsquish nofvlabel
Equation Obs Parms RMSE "R-sq" F P
y1 27 6 10.21777 0.4086 2.902124 0.0382
y2 27 6 5.268768 0.4743 3.78967 0.0133
y3 27 6 4.993647 0.4518 3.460938 0.0195
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
2.gender -17.5 11.03645 -1.59 0.128 -40.45156 5.451555
fracture
2 -12.625 5.518225 -2.29 0.033 -24.10078 -1.149222
3 5.666667 5.899231 0.96 0.348 -6.601456 17.93479
gender#
fracture
2 2 21.375 12.68678 1.68 0.107 -5.008595 47.75859
2 3 8.833333 13.83492 0.64 0.530 -19.93796 37.60463
_cons 39.5 4.171386 9.47 0.000 30.82513 48.17487
y2
2.gender 20.5 5.69092 3.60 0.002 8.665083 32.33492
fracture
2 -3.125 2.84546 -1.10 0.285 -9.042458 2.792458
3 .6666667 3.041925 0.22 0.829 -5.659362 6.992696
gender#
fracture
2 2 -19.625 6.541907 -3.00 0.007 -33.22964 -6.02036
2 3 -23.66667 7.133946 -3.32 0.003 -38.50252 -8.830813
_cons 35.5 2.150966 16.50 0.000 31.02682 39.97318
y3
2.gender -18.16667 5.393755 -3.37 0.003 -29.38359 -6.949739
fracture
2 1.083333 2.696877 0.40 0.692 -4.52513 6.691797
3 -3 2.883083 -1.04 0.310 -8.9957 2.9957
gender#
fracture
2 2 19.91667 6.200305 3.21 0.004 7.022426 32.81091
2 3 23.5 6.76143 3.48 0.002 9.438837 37.56116
_cons 61.16667 2.038648 30.00 0.000 56.92707 65.40627
contrast computes Wald tests using the coefficients from the first equation by default.
. contrast gender##fracture
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
y1
gender 1 2.16 0.1569
fracture 2 2.74 0.0880
gender#fracture 2 1.69 0.2085
Denominator 21
Here we use the equation() option to compute the Wald tests in the y2 equation:
. contrast gender##fracture, equation(y2)
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
y2
gender 1 5.41 0.0301
fracture 2 7.97 0.0027
gender#fracture 2 5.97 0.0088
Denominator 21
Here we use the equation index to compute the Wald tests in the third equation:
. contrast gender##fracture, equation(#3)
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
y3
gender 1 2.23 0.1502
fracture 2 6.36 0.0069
gender#fracture 2 6.66 0.0058
Denominator 21
Here we use the atequations option to compute Wald tests for each equation in the model. We
also use the vsquish option to suppress the extra blank lines between terms.
. contrast gender##fracture, atequations vsquish
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
y1
gender 1 2.16 0.1569
fracture 2 2.74 0.0880
gender#fracture 2 1.69 0.2085
y2
gender 1 5.41 0.0301
fracture 2 7.97 0.0027
gender#fracture 2 5.97 0.0088
y3
gender 1 2.23 0.1502
fracture 2 6.36 0.0069
gender#fracture 2 6.66 0.0058
Denominator 21
Because we are investigating the results from mvreg, we can use the special _eqns factor to test
for a marginal effect on the means among the dependent variables:
. contrast _eqns
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
_eqns 2 49.19 0.0000
Denominator 21
Here we test whether the main effects of gender differ among the dependent variables:
. contrast gender#_eqns
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
gender#_eqns 2 3.61 0.0448
Denominator 21
Although it is not terribly interesting in this case, we can even calculate contrasts across equations:
. contrast gender#r._eqns
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
gender#_eqns
(joint) (2 vs 1) 1 5.82 0.0251
(joint) (3 vs 1) 1 0.40 0.5352
Joint 2 3.61 0.0448
Denominator 21
Video example
Introduction to contrasts in Stata: One-way ANOVA
Stored results
contrast stores the following in r():
Scalars
r(df_r) variance degrees of freedom
r(k_terms) number of terms in termlist
r(level) confidence level of confidence intervals
Macros
r(cmd) contrast
r(cmdline) command as typed
r(est_cmd) e(cmd) from original estimation results
r(est_cmdline) e(cmdline) from original estimation results
r(title) title in output
r(overall) overall or empty
r(emptycells) empspec from emptycells()
r(mcmethod) method from mcompare()
r(mctitle) title for method from mcompare()
r(mcadjustall) adjustall or empty
r(margin_method) asbalanced or asobserved
Matrices
r(b) contrast estimates
r(V) variance–covariance matrix of the contrast estimates
r(error) contrast estimability codes; 0 means estimable, 8 means not estimable
r(L) matrix of contrasts applied to the model coefficients
r(table) matrix containing the contrasts with their standard errors, test statistics, p-values, and confidence intervals
r(F) vector of F statistics; r(df_r) present
r(chi2) vector of χ2 statistics; r(df_r) not present
r(p) vector of p-values corresponding to r(F) or r(chi2)
r(df) vector of degrees of freedom corresponding to r(p)
r(df2) vector of denominator degrees of freedom corresponding to r(F)
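For example, the stored matrices can be inspected after running contrast; a minimal sketch, reusing the hospital logit example from above:
. contrast r.hospital
. matrix list r(L)
. matrix list r(table)
The first matrix shows the contrast weights applied to the model coefficients; the second contains the estimated contrasts together with their standard errors, test statistics, p-values, and confidence intervals.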
contrast with the post option stores the following in e():
Scalars
e(df_r) variance degrees of freedom
e(k_terms) number of terms in termlist
Macros
e(cmd) contrast
e(cmdline) command as typed
e(est_cmd) e(cmd) from original estimation results
e(est_cmdline) e(cmdline) from original estimation results
e(title) title in output
e(overall) overall or empty
e(emptycells) empspec from emptycells()
e(margin_method) asbalanced or asobserved
e(properties) b V
Matrices
e(b) contrast estimates
e(V) variance–covariance matrix of the contrast estimates
e(error) contrast estimability codes; 0 means estimable, 8 means not estimable
e(L) matrix of contrasts applied to the model coefficients
e(F) vector of unadjusted F statistics; e(df_r) present
e(chi2) vector of χ2 statistics; e(df_r) not present
e(p) vector of unadjusted p-values corresponding to e(F) or e(chi2)
e(df) vector of degrees of freedom corresponding to e(p)
e(df2) vector of denominator degrees of freedom corresponding to e(F)
Methods and formulas
Methods and formulas are presented under the following headings:
Marginal linear predictions
Contrast operators
Reference level contrasts
Adjacent contrasts
Grand mean contrasts
Helmert contrasts
Reverse Helmert contrasts
Orthogonal polynomial contrasts
Contrasts within interactions
Multiple comparisons
Marginal linear predictions
contrast treats intercept effects separately from slope effects. To illustrate, consider the following
parameterization for a quadratic regression of $y$ on $x$ that also models the effects of two factor variables
$A$ and $B$, where the levels of $A$ are indexed by $i = 1, \ldots, k_a$ and the levels of $B$ are indexed by
$j = 1, \ldots, k_b$.

$$
E(y \mid A = i, B = j, x) = \eta_{0ij} + \eta_{1ij}\, x + \eta_{2ij}\, x^2
$$
$$
\eta_{0ij} = \eta_0 + \alpha_{0i} + \beta_{0j} + (\alpha\beta)_{0ij}
$$
$$
\eta_{1ij} = \eta_1 + \alpha_{1i} + \beta_{1j} + (\alpha\beta)_{1ij}
$$
$$
\eta_{2ij} = \eta_2 + \alpha_{2i} + \beta_{2j} + (\alpha\beta)_{2ij}
$$

We have partitioned the coefficients into three groups of parameters: $\eta_{0ij}$ is a cell prediction for the
intercept, $\eta_{1ij}$ is a cell prediction for the slope on $x$, and $\eta_{2ij}$ is a cell prediction for the slope on
$x^2$. For the intercept parameters, $\eta_0$ is the intercept, $\alpha_{0i}$ represents a main effect for factor $A$ at its
$i$th level, $\beta_{0j}$ represents a main effect for factor $B$ at its $j$th level, and $(\alpha\beta)_{0ij}$ represents an effect
for the interaction of $A$ and $B$ at the $ij$th level. The individual coefficients in $\eta_{1ij}$ and $\eta_{2ij}$ have
similar interpretations, but the effects are on the slopes of $x$ and $x^2$, respectively.
The marginal intercepts for $A$ are given by

$$
\eta_{0i\cdot} = \sum_{j=1}^{k_b} f_{ij}\, \eta_{0ij}
$$

where $f_{ij}$ is a marginal relative frequency of the $j$th level of $B$ and is controlled by the asobserved
and emptycells(reweight) options according to

$$
f_{ij} =
\begin{cases}
1/k_b, & \text{default} \\
w_{\cdot j}/w_{\cdot\cdot}, & \text{asobserved} \\
1/(k_b - e_{i\cdot}), & \text{emptycells(reweight)} \\
w_{ij}/w_{i\cdot}, & \text{emptycells(reweight) and asobserved}
\end{cases}
$$

Above, $w_{ij}$ is the number of individuals with $A$ at its $i$th level and $B$ at its $j$th,

$$
w_{i\cdot} = \sum_{j=1}^{k_b} w_{ij}, \qquad
w_{\cdot j} = \sum_{i=1}^{k_a} w_{ij}, \qquad
w_{\cdot\cdot} = \sum_{i=1}^{k_a} \sum_{j=1}^{k_b} w_{ij}
$$

and $e_{i\cdot}$ is the number of empty cells where $A$ is at its $i$th level. The marginal intercepts for $B$ and
marginal slopes on $x$ and $x^2$ are similarly defined.
Estimates for the cell intercepts and slopes are computed using the corresponding linear combination
of the coefficients from the fitted model. For example, the estimated cell intercepts are computed
using
$$
\widehat{\eta}_{0ij} = \widehat{\eta}_0 + \widehat{\alpha}_{0i} + \widehat{\beta}_{0j} + \widehat{(\alpha\beta)}_{0ij}
$$

and the estimated marginal intercepts for $A$ are computed as

$$
\widehat{\eta}_{0i\cdot} = \sum_{j=1}^{k_b} f_{ij}\, \widehat{\eta}_{0ij}
$$
Contrast operators
contrast performs Wald tests using linear combinations of marginal linear predictions. For
example, the following linear combination can be used to test for a specific effect of factor $A$ on the
marginal intercepts.

$$
\sum_{i=1}^{k_a} c_i\, \eta_{0i\cdot}
$$

If the $c_i$ elements sum to zero, the linear combination is called a contrast. If the factor $A$ is represented
by a variable named A, then we specify this contrast using the following syntax:

{A c1 c2 ... cka}

Similarly, the following linear combination can be used to test for a specific interaction effect of
factors $A$ and $B$ on the marginal slope of $x$.

$$
\sum_{i=1}^{k_a} \sum_{j=1}^{k_b} c_{ij}\, \eta_{1ij}
$$

If the factor $B$ is represented by a variable named B, then we specify this contrast using the following
syntax:

{A#B c11 c12 ... c1kb c21 ... ckakb}
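For instance, with the three-level hospital factor from the logistic example above, the (3 vs 1) reference-category comparison could be requested as a user-specified contrast; a minimal sketch, assuming the main-effects logit model for satisfied is the active estimation result:
. contrast {hospital -1 0 1}
The three weights correspond to the three levels of hospital and sum to zero, so the linear combination is a contrast; it should reproduce the (3 vs 1) effect obtained with r.hospital.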
contrast has variable operators for several commonly used contrasts. Each contrast operator
specifies a matrix of linear combinations that yield the requested set of contrasts to be applied to the
marginal linear predictions associated with the attached factor variable.
Reference level contrasts
The r. operator compares each level with a reference level. Let $R$ be the corresponding contrast
matrix for factor $A$; then $R$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
R_{ij} =
\begin{cases}
-1, & \text{if } j \text{ is the reference level} \\
1, & \text{if } i = j \text{ and } j \text{ is less than the reference level} \\
1, & \text{if } i + 1 = j \text{ and } j \text{ is greater than the reference level} \\
0, & \text{otherwise}
\end{cases}
$$

If $k_a = 5$ and the reference level is the third level of $A$ (specified as rb(#3).A), then

$$
R =
\begin{pmatrix}
1 & 0 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 \\
0 & 0 & -1 & 1 & 0 \\
0 & 0 & -1 & 0 & 1
\end{pmatrix}
$$
Adjacent contrasts
The a. operator compares each level with the next level. Let $A$ be the corresponding contrast
matrix for factor $A$; then $A$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
A_{ij} =
\begin{cases}
1, & \text{if } i = j \\
-1, & \text{if } i + 1 = j \\
0, & \text{otherwise}
\end{cases}
$$

If $k_a = 5$, then

$$
A =
\begin{pmatrix}
1 & -1 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 1 & -1
\end{pmatrix}
$$

The ar. operator compares each level with the previous level. If $A$ is the contrast matrix for the
a. operator, then $-A$ is the corresponding contrast matrix for the ar. operator.
Grand mean contrasts
The g. operator compares each level with the mean of all the levels. Let $G$ be the corresponding
contrast matrix for factor $A$; then $G$ is a $k_a \times k_a$ matrix with elements

$$
G_{ij} =
\begin{cases}
1 - 1/k_a, & \text{if } i = j \\
-1/k_a, & \text{if } i \neq j
\end{cases}
$$

If $k_a = 5$, then

$$
G =
\begin{pmatrix}
4/5 & -1/5 & -1/5 & -1/5 & -1/5 \\
-1/5 & 4/5 & -1/5 & -1/5 & -1/5 \\
-1/5 & -1/5 & 4/5 & -1/5 & -1/5 \\
-1/5 & -1/5 & -1/5 & 4/5 & -1/5 \\
-1/5 & -1/5 & -1/5 & -1/5 & 4/5
\end{pmatrix}
$$

The gw. operator compares each level with the weighted mean of all the levels. The weights are
taken from the observed weighted cell frequencies in the estimation sample of the fitted model. Let
$G_w$ be the corresponding contrast matrix for factor $A$; then $G_w$ is a $k_a \times k_a$ matrix with elements

$$
(G_w)_{ij} =
\begin{cases}
1 - w_i/w_\cdot, & \text{if } i = j \\
-w_j/w_\cdot, & \text{if } i \neq j
\end{cases}
$$

where $w_i$ is a marginal weight representing the number of individuals with $A$ at its $i$th level and
$w_\cdot = \sum_i w_i$.
Helmert contrasts
The h. operator compares each level with the mean of the subsequent levels. Let $H$ be the
corresponding contrast matrix for factor $A$; then $H$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
H_{ij} =
\begin{cases}
1, & \text{if } i = j \\
-1/(k_a - i), & \text{if } i < j \\
0, & \text{otherwise}
\end{cases}
$$

If $k_a = 5$, then

$$
H =
\begin{pmatrix}
1 & -1/4 & -1/4 & -1/4 & -1/4 \\
0 & 1 & -1/3 & -1/3 & -1/3 \\
0 & 0 & 1 & -1/2 & -1/2 \\
0 & 0 & 0 & 1 & -1
\end{pmatrix}
$$

The hw. operator compares each level with the weighted mean of the subsequent levels. Let $H_w$
be the corresponding contrast matrix for factor $A$; then $H_w$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
(H_w)_{ij} =
\begin{cases}
1, & \text{if } i = j \\
-w_j \big/ \sum_{l=j}^{k_a} w_l, & \text{if } i < j \\
0, & \text{otherwise}
\end{cases}
$$
Reverse Helmert contrasts
The j. operator compares each level with the mean of the previous levels. Let $J$ be the corresponding
contrast matrix for factor $A$; then $J$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
J_{ij} =
\begin{cases}
1, & \text{if } i + 1 = j \\
-1/i, & \text{if } j \leq i \\
0, & \text{otherwise}
\end{cases}
$$

If $k_a = 5$, then

$$
J =
\begin{pmatrix}
-1 & 1 & 0 & 0 & 0 \\
-1/2 & -1/2 & 1 & 0 & 0 \\
-1/3 & -1/3 & -1/3 & 1 & 0 \\
-1/4 & -1/4 & -1/4 & -1/4 & 1
\end{pmatrix}
$$

The jw. operator compares each level with the weighted mean of the previous levels. Let $J_w$ be
the corresponding contrast matrix for factor $A$; then $J_w$ is a $(k_a - 1) \times k_a$ matrix with elements

$$
(J_w)_{ij} =
\begin{cases}
1, & \text{if } i + 1 = j \\
-w_j \big/ \sum_{l=1}^{i} w_l, & \text{if } j \leq i \\
0, & \text{otherwise}
\end{cases}
$$
Orthogonal polynomial contrasts
The p. operator applies orthogonal polynomial contrasts using the level values of the attached
factor variable. The q. operator applies orthogonal polynomial contrasts using the level indices of
the attached factor variable. These two operators are equivalent when the level values of the attached
factor are equally spaced. The pw. and qw. operators are weighted versions of p. and q., where
the weights are taken from the observed weighted cell frequencies in the estimation sample of the
fitted model. contrast uses the Christoffel–Darboux recurrence formula for computing orthogonal
polynomial contrasts (Abramowitz and Stegun 1972). The elements of the contrasts are normalized
such that

$$
Q' W Q = \frac{1}{w_\cdot}\, I
$$

where $W$ is a diagonal matrix of the marginal cell weights $w_1, w_2, \ldots, w_k$ of the attached factor
variable (all 1 for p. and q.), and $w_\cdot$ is the sum of the weights (the number of levels $k$ for p. and q.).
Contrasts within interactions
Contrast operators are allowed to be specified on factor variables participating in interactions. In
such cases, contrast applies the proper matrix product of the contrast matrices to the cell margins
of the interacted factor variables.
For example, consider the contrasts implied by specifying r.A#h.B. Let $M$ be the matrix of
estimated cell margins for the levels of $A$ and $B$, where the rows of $M$ are indexed by the levels of
$A$ and the columns are indexed by the levels of $B$. contrast puts the estimated cell margins in the
following vector form:

$$
v = \operatorname{vec}(M') =
\begin{pmatrix}
M_{11} \\
M_{12} \\
\vdots \\
M_{1k_b} \\
M_{21} \\
M_{22} \\
\vdots \\
M_{2k_b} \\
\vdots \\
M_{k_a k_b}
\end{pmatrix}
$$

The individual contrasts are then given by the elements of

$$
(R \otimes H)\, v
$$

where $\otimes$ denotes the Kronecker direct product.
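The same computation is easy to mimic for small examples in Mata, Stata's matrix language; the following sketch (purely illustrative, not part of contrast itself) builds a reference-level matrix R and a Helmert matrix H, each for a 3-level factor, and forms the Kronecker product that would be applied to v:
. mata:
: R = (-1, 1, 0 \ -1, 0, 1)
: H = (1, -1/2, -1/2 \ 0, 1, -1)
: C = R # H
: C
: end
Each row of C is one contrast on the vector of cell margins v.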
Multiple comparisons
See [R] pwcompare for details on the methods and formulas used to adjust p-values and confidence
intervals for multiple comparisons. The formulas for Bonferroni's method and Šidák's method are
presented with $m = k(k-1)/2$, the number of pairwise comparisons for a factor term with $k$
levels. For contrasts, $m$ is instead the number of contrasts being performed on the factor term; often,
$m = k - 1$ for a term with $k$ levels.
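For example, such an adjustment is requested with the mcompare() option; a minimal sketch, reusing the earlier hospital model (illustrative only):
. contrast r.hospital, mcompare(bonferroni)
With two reference-category contrasts on a three-level factor, m = 2 here, and the reported p-values and confidence intervals are adjusted accordingly.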
References
Abramowitz, M., and I. A. Stegun, ed. 1972. Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables. 10th ed. Washington, DC: National Bureau of Standards.
Chow, G. C. 1960. Tests of equality between sets of coefficients in two linear regressions. Econometrica 28: 591–605.
Coster, D. 2005. Contrasts. In Vol. 2 of Encyclopedia of Biostatistics, ed. P. Armitage and T. Colton, 1153–1157.
Chichester, UK: Wiley.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis. 2nd ed. Belmont,
CA: Duxbury.
Milliken, G. A., and D. E. Johnson. 2009. Analysis of Messy Data, Volume 1: Designed Experiments. 2nd ed. Boca
Raton, FL: CRC Press.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Rosenthal, R., R. L. Rosnow, and D. B. Rubin. 2000. Contrasts and Effect Sizes in Behavioral Research: A Correlational
Approach. Cambridge: Cambridge University Press.
Searle, S. R. 1971. Linear Models. New York: Wiley.
. 1997. Linear Models for Unbalanced Data. New York: Wiley.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New
York: McGraw–Hill.
Woodard, D. E. 1931. Healing time of fractures of the jaw in relation to delay before reduction, infection, syphilis
and blood calcium and phosphorus content. Journal of the American Dental Association 18: 419–442.
Also see
[R] contrast postestimation — Postestimation tools for contrast
[R] lincom — Linear combinations of estimators
[R] margins — Marginal means, predictive margins, and marginal effects
[R] margins, contrast — Contrasts of margins
[R] pwcompare — Pairwise comparisons
[R] test — Test linear hypotheses after estimation
[U] 20 Estimation and postestimation commands
Title
contrast postestimation — Postestimation tools for contrast
Description Remarks and examples Also see
Description
The following postestimation commands are available after contrast, post:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
In Orthogonal polynomial contrasts in [R] contrast, we used the p. operator to test the orthogonal
polynomial effects of age group.
. contrast p.agegrp, noeffects
We then used a second contrast command,
. contrast p(2 3 4).agegrp, noeffects
selecting levels to test whether the quadratic, cubic, and quartic contrasts were jointly significant.
We can perform the same joint test by using the test command after specifying the post option
with our first contrast command.
. use http://www.stata-press.com/data/r13/cholesterol
(Artificial cholesterol data)
. anova chol agegrp
(output omitted )
. contrast p.agegrp, noeffects post
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
agegrp
(linear) 1 139.11 0.0000
(quadratic) 1 0.15 0.6962
(cubic) 1 0.37 0.5448
(quartic) 1 0.43 0.5153
Joint 4 35.02 0.0000
Denominator 70
. test p2.agegrp p3.agegrp p4.agegrp
( 1) p2.agegrp = 0
( 2) p3.agegrp = 0
( 3) p4.agegrp = 0
F( 3, 70) = 0.32
Prob > F = 0.8129
Also see
[R] contrast — Contrasts and linear hypothesis tests after estimation
[U] 20 Estimation and postestimation commands
Title
copyright — Display copyright information
Syntax Description Remarks and examples Also see
Syntax
copyright
Description
copyright presents copyright notifications concerning tools, libraries, etc., used in the construction
of Stata.
Remarks and examples
The correct form for a copyright notice is
Copyright dates by author/owner
The word “Copyright” is spelled out. You can use the © symbol, but “(C)” has never been given
legal recognition. The phrase “All Rights Reserved” was historically required but is no longer needed.
Currently, most works are copyrighted from the moment they are written, and no copyright notice
is required. Copyright concerns the protection of the expression and structure of facts and ideas, not
the facts and ideas themselves. Copyright concerns the ownership of the expression and not the name
given to the expression, which is covered under trademark law.
Copyright law as it exists today began in England in 1710 with the Statute of Anne, An Act for
the Encouragement of Learning, by Vesting the Copies of Printed Books in the Authors or Purchases
of Such Copies, during the Times therein mentioned. In 1672, Massachusetts introduced the first
copyright law in what was to become the United States. After the Revolutionary War, copyright was
introduced into the U.S. Constitution in 1787 and went into effect on May 31, 1790. On June 9,
1790, the first copyright in the United States was registered for The Philadelphia Spelling Book by
John Barry.
There are significant differences in the understanding of copyright in the English- and non-English-
speaking world. The Napoleonic or Civil Code, the dominant legal system in the non-English-speaking
world, splits the rights into two classes: the author's economic rights and the author's moral rights.
Moral rights are available only to “natural persons”. Legal persons (corporations) have economic
rights but not moral rights.
Also see
Copyright page of this book
Title
copyright apache — Apache copyright notification
Description Also see
Description
Stata uses portions of the Apache Commons Java components library, Apache log4j Java library,
and the docx4j Java library with the express permission of the authors under the Apache License,
version 2.0, pursuant to the following notice:
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
“License” shall mean the terms and conditions for use, reproduction, and distribution as
defined by Sections 1 through 9 of this document.
“Licensor” shall mean the copyright owner or entity authorized by the copyright owner
that is granting the License.
“Legal Entity” shall mean the union of the acting entity and all other entities that control,
are controlled by, or are under common control with that entity. For the purposes of this
definition, “control” means (i) the power, direct or indirect, to cause the direction or
management of such entity, whether by contract or otherwise, or (ii) ownership of fifty
percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such
entity.
“You” (or “Your”) shall mean an individual or Legal Entity exercising permissions granted
by this License.
“Source” form shall mean the preferred form for making modifications, including but not
limited to software source code, documentation source, and configuration files.
“Object” form shall mean any form resulting from mechanical transformation or trans-
lation of a Source form, including but not limited to compiled object code, generated
documentation, and conversions to other media types.
“Work” shall mean the work of authorship, whether in Source or Object form, made
available under the License, as indicated by a copyright notice that is included in or
attached to the work (an example is provided in the Appendix below).
“Derivative Works” shall mean any work, whether in Source or Object form, that is
based on (or derived from) the Work and for which the editorial revisions, annotations,
elaborations, or other modifications represent, as a whole, an original work of authorship.
For the purposes of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of, the Work and
Derivative Works thereof.
“Contribution” shall mean any work of authorship, including the original version of the
Work and any modifications or additions to that Work or Derivative Works thereof, that
is intentionally submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of the copyright
owner. For the purposes of this definition, “submitted” means any form of electronic,
verbal, or written communication sent to the Licensor or its representatives, including but
not limited to communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the Licensor for the
purpose of discussing and improving the Work, but excluding communication that is
conspicuously marked or otherwise designated in writing by the copyright owner as “Not
a Contribution.”
“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom
a Contribution has been received by Licensor and subsequently incorporated within the
Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each
Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-
free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly
display, publicly perform, sublicense, and distribute the Work and such Derivative Works
in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each
Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to make, have
made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license
applies only to those patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their Contribution(s) with
the Work to which such Contribution(s) was submitted. If You institute patent litigation
against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the
Work or a Contribution incorporated within the Work constitutes direct or contributory
patent infringement, then any patent licenses granted to You under this License for that
Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative
Works thereof in any medium, with or without modifications, and in Source or Object
form, provided that You meet the following conditions:
You must give any other recipients of the Work or Derivative Works a copy of this
License; and
You must cause any modified files to carry prominent notices stating that You changed
the files; and
You must retain, in the Source form of any Derivative Works that You distribute, all
copyright, patent, trademark, and attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of the Derivative Works; and
If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative
Works that You distribute must include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not pertain to any part of the
Derivative Works, in at least one of the following places: within a NOTICE text file
distributed as part of the Derivative Works; within the Source form or documentation, if
provided along with the Derivative Works; or, within a display generated by the Derivative
Works, if and wherever such third-party notices normally appear. The contents of the
NOTICE file are for informational purposes only and do not modify the License. You may
add Your own attribution notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided that such additional
attribution notices cannot be construed as modifying the License. You may add Your own
copyright statement to Your modifications and may provide additional or different license
terms and conditions for use, reproduction, or distribution of Your modifications, or for
any such Derivative Works as a whole, provided Your use, reproduction, and distribution
of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution
intentionally submitted for inclusion in the Work by You to the Licensor shall be under the
terms and conditions of this License, without any additional terms or conditions. Notwith-
standing the above, nothing herein shall supersede or modify the terms of any separate
license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks,
service marks, or product names of the Licensor, except as required for reasonable and
customary use in describing the origin of the Work and reproducing the content of the
NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing,
Licensor provides the Work (and each Contributor provides its Contributions) on an
“AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied, including, without limitation, any warranties or conditions of TITLE,
NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR
PURPOSE. You are solely responsible for determining the appropriateness of using or
redistributing the Work and assume any risks associated with Your exercise of permissions
under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including
negligence), contract, or otherwise, unless required by applicable law (such as deliberate
and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You
for damages, including any direct, indirect, special, incidental, or consequential damages
of any character arising as a result of this License or out of the use or inability to use
the Work (including but not limited to damages for loss of goodwill, work stoppage,
computer failure or malfunction, or any and all other commercial damages or losses),
even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative
Works thereof, You may choose to offer, and charge a fee for, acceptance of support,
warranty, indemnity, or other liability obligations and/or rights consistent with this License.
However, in accepting such obligations, You may act only on Your own behalf and on
Your sole responsibility, not on behalf of any other Contributor, and only if You agree to
indemnify, defend, and hold each Contributor harmless for any liability incurred by, or
claims asserted against, such Contributor by reason of your accepting any such warranty
or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work
To apply the Apache License to your work, attach the following boilerplate notice, with
the fields enclosed by brackets “[]” replaced with your own identifying information.
(Don’t include the brackets!) The text should be enclosed in the appropriate comment
syntax for the file format. We also recommend that a file or class name and description
of purpose be included on the same “printed page” as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS
OF ANY KIND, either express or implied. See the License for the specific language governing
permissions and limitations under the License.
Also see
[R] copyright — Display copyright information
Title
copyright boost — Boost copyright notification
Description Also see
Description
Stata uses portions of Boost with the express permission of the authors pursuant to the following
notice:
Boost Software License - Version 1.0 - August 17, 2003
Permission is hereby granted, free of charge, to any person or organization obtaining
a copy of the software and accompanying documentation covered by this license (the
“Software”) to use, reproduce, display, distribute, execute, and transmit the Software,
and to prepare derivative works of the Software, and to permit third-parties to whom
the Software is furnished to do so, all subject to the following:
The copyright notices in the Software and this entire statement, including the above
license grant, this restriction and the following disclaimer, must be included in all
copies of the Software, in whole or in part, and all derivative works of the Software,
unless such copies or derivative works are solely in the form of machine-executable
object code generated by a source language processor.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND
NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR
ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES
OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Also see
[R] copyright — Display copyright information
Title
copyright freetype — FreeType copyright notification
Description Legal Terms Also see
Description
Stata uses portions of FreeType, a library used by JagPDF, which helps create PDF files, with the
express permission of the authors.
StataCorp thanks and acknowledges the authors of FreeType for producing FreeType and allowing
its use in Stata and other software.
For more information about FreeType, visit http://www.freetype.org/.
The full FreeType copyright notice is
Legal Terms
0. Definitions
Throughout this license, the terms ‘package’, ‘FreeType Project’, and ‘FreeType archive’
refer to the set of files originally distributed by the authors (David Turner, Robert Wilhelm,
and Werner Lemberg) as the ‘FreeType Project’, be they named as alpha, beta or final
release.
‘You’ refers to the licensee, or person using the project, where ‘using’ is a generic term
including compiling the project’s source code as well as linking it to form a ‘program’
or ‘executable’. This program is referred to as ‘a program using the FreeType engine’.
This license applies to all files distributed in the original FreeType Project, including all
source code, binaries and documentation, unless otherwise stated in the file in its original,
unmodified form as distributed in the original archive. If you are unsure whether or not
a particular file is covered by this license, you must contact us to verify this.
The FreeType Project is copyright © 1996–2000 by David Turner, Robert Wilhelm, and
Werner Lemberg. All rights reserved except as specified below.
1. No Warranty
THE FREETYPE PROJECT IS PROVIDED ‘AS IS’ WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WAR-
RANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
IN NO EVENT WILL ANY OF THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY DAMAGES CAUSED BY THE USE OR THE INABILITY TO
USE, OF THE FREETYPE PROJECT.
2. Redistribution
This license grants a worldwide, royalty-free, perpetual and irrevocable right and license
to use, execute, perform, compile, display, copy, create derivative works of, distribute and
sublicense the FreeType Project (in both source and object code forms) and derivative
works thereof for any purpose; and to authorize others to exercise some or all of the
rights granted herein, subject to the following conditions:
Redistribution of source code must retain this license file (‘FTL.TXT’) unaltered;
any additions, deletions or changes to the original files must be clearly indicated in
accompanying documentation. The copyright notices of the unaltered, original files
must be preserved in all copies of source files.
Redistribution in binary form must provide a disclaimer that states that the software is
based in part of the work of the FreeType Team, in the distribution documentation. We
also encourage you to put an URL to the FreeType web page in your documentation,
though this isn’t mandatory.
These conditions apply to any software derived from or based on the FreeType Project,
not just the unmodified files. If you use our work, you must acknowledge us. However,
no fee need be paid to us.
3. Advertising
Neither the FreeType authors and contributors nor you shall use the name of the other for
commercial, advertising, or promotional purposes without specific prior written permission.
We suggest, but do not require, that you use one or more of the following phrases to
refer to this software in your documentation or advertising materials: ‘FreeType Project’,
‘FreeType Engine’, ‘FreeType library’, or ‘FreeType Distribution’.
As you have not signed this license, you are not required to accept it. However, as the
FreeType Project is copyrighted material, only this license, or another one contracted with
the authors, grants you the right to use, distribute, and modify it. Therefore, by using,
distributing, or modifying the FreeType Project, you indicate that you understand and
accept all the terms of this license.
4. Contacts
There are two mailing lists related to FreeType:
freetype@nongnu.org
Discusses general use and applications of FreeType, as well as future and wanted
additions to the library and distribution. If you are looking for support, start in this
list if you haven’t found anything to help you in the documentation.
freetype-devel@nongnu.org
Discusses bugs, as well as engine internals, design issues, specific licenses, porting,
etc.
Our home page can be found at
http://www.freetype.org
Also see
[R] copyright — Display copyright information
Title
copyright icu — ICU copyright notification
Description Also see
Description
Stata uses portions of ICU, a library used by JagPDF, which helps create PDF files, with the express
permission of the authors pursuant to the following notice:
COPYRIGHT AND PERMISSION NOTICE
Copyright © 1995–2011 International Business Machines Corporation and others
All Rights Reserved
Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the “Software”), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, and/or sell copies of the Software, and to permit persons to whom
the Software is furnished to do so, provided that the above copyright notice(s) and
this permission notice appear in all copies of the Software and that both the above
copyright notice(s) and this permission notice appear in supporting documentation.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WAR-
RANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL
THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LI-
ABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLI-
GENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall not be used
in advertising or otherwise to promote the sale, use or other dealings in this Software
without prior written authorization of the copyright holder.
All trademarks and registered trademarks mentioned herein are the property of their
respective owners.
Also see
[R] copyright — Display copyright information
Title
copyright jagpdf — JagPDF copyright notification
Description Also see
Description
Stata uses portions of JagPDF, a library for creating PDF files, with the express permission of the
author pursuant to the following notice:
The JagPDF Library is Copyright © 2005–2009 Jaroslav Grešula
Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the ”Software”), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies
or substantial portions of the Software.
THE SOFTWARE IS PROVIDED ”AS IS”, WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WAR-
RANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPY-
RIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LI-
ABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Also see
[R] copyright — Display copyright information
Title
copyright lapack — LAPACK copyright notification
Description Also see
Description
Stata uses portions of LAPACK, a linear algebra package, with the express permission of the authors
pursuant to the following notice:
Copyright © 1992–2008 The University of Tennessee. All rights reserved.
Redistributions of source code must retain the above copyright notice, this list of
conditions, and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of
conditions, and the following disclaimer, listed in this license in the documentation
or other materials provided with the distribution or both.
Neither the names of the copyright holders nor the names of its contributors may
be used to endorse or promote products derived from this software without specific
prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CON-
TRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, IN-
CLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DIS-
CLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIB-
UTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIM-
ITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
Also see
[R] copyright — Display copyright information
Title
copyright libpng — libpng copyright notification
Description Also see
Description
Stata uses portions of libpng, a library used by JagPDF, which helps create PDF files, with the
express permission of the authors.
For the purposes of this acknowledgment, “Contributing Authors” is as defined by the copyright
notice below.
StataCorp thanks and acknowledges the Contributing Authors of libpng and Group 42, Inc. for
producing libpng and allowing its use in Stata and other software.
For more information about libpng, visit http://www.libpng.org/.
The full libpng copyright notice is
COPYRIGHT NOTICE, DISCLAIMER, and LICENSE:
If you modify libpng you may insert additional notices immediately following this
sentence.
This code is released under the libpng license.
libpng versions 1.2.6, August 15, 2004, through 1.5.2, March 31, 2011, are Copyright
© 2004, 2006–2011 Glenn Randers-Pehrson, and are distributed according to the
same disclaimer and license as libpng-1.2.5 with the following individual added to the
list of Contributing Authors
Cosmin Truta
libpng versions 1.0.7, July 1, 2000, through 1.2.5 - October 3, 2002, are Copyright
© 2000–2002 Glenn Randers-Pehrson, and are distributed according to the same
disclaimer and license as libpng-1.0.6 with the following individuals added to the list
of Contributing Authors
Simon-Pierre Cadieux
Eric S. Raymond
Gilles Vollant
and with the following additions to the disclaimer:
There is no warranty against interference with your enjoyment of the library or against
infringement. There is no warranty that our efforts or the library will fulfill any of
your particular purposes or needs. This library is provided with all faults, and the
entire risk of satisfactory quality, performance, accuracy, and effort is with the user.
libpng versions 0.97, January 1998, through 1.0.6, March 20, 2000, are Copyright
© 1998, 1999 Glenn Randers-Pehrson, and are distributed according to the same
disclaimer and license as libpng-0.96, with the following individuals added to the list
of Contributing Authors:
Tom Lane
Glenn Randers-Pehrson
Willem van Schaik
libpng versions 0.89, June 1996, through 0.96, May 1997, are Copyright © 1996,
1997 Andreas Dilger, distributed according to the same disclaimer and license as
libpng-0.88, with the following individuals added to the list of Contributing Authors:
John Bowler
Kevin Bracey
Sam Bushell
Magnus Holmgren
Greg Roelofs
Tom Tanner
libpng versions 0.5, May 1995, through 0.88, January 1996, are Copyright © 1995,
1996 Guy Eric Schalnat, Group 42, Inc.
For the purposes of this copyright and license, “Contributing Authors” is defined as
the following set of individuals:
Andreas Dilger
Dave Martindale
Guy Eric Schalnat
Paul Schmidt
Tim Wegner
The PNG Reference Library is supplied “AS IS”. The Contributing Authors and Group 42,
Inc. disclaim all warranties, expressed or implied, including, without limitation, the
warranties of merchantability and of fitness for any purpose. The Contributing Authors
and Group 42, Inc. assume no liability for direct, indirect, incidental, special, exemplary,
or consequential damages, which may result from the use of the PNG Reference Library,
even if advised of the possibility of such damage.
Permission is hereby granted to use, copy, modify, and distribute this source code, or
portions hereof, for any purpose, without fee, subject to the following restrictions:
1. The origin of this source code must not be misrepresented.
2. Altered versions must be plainly marked as such and must not be misrepresented
as being the original source.
3. This Copyright notice may not be removed or altered from any source or altered
source distribution.
The Contributing Authors and Group 42, Inc. specifically permit, without fee, and
encourage the use of this source code as a component to supporting the PNG file format
in commercial products. If you use this source code in a product, acknowledgment is
not required but would be appreciated.
Also see
[R] copyright — Display copyright information
Title
copyright miglayout — MiG Layout copyright notification
Description Also see
Description
Stata uses portions of MiG Layout with the express permission of the author, pursuant to the
following notice:
Copyright (c) 2004, Mikael Grev, MiG InfoCom AB. (miglayout (at) miginfocom (dot)
com) All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met: Redistributions of source
code must retain the above copyright notice, this list of conditions and the following
disclaimer. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution. Neither the name of the MiG InfoCom AB
nor the names of its contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CON-
TRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, IN-
CLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DIS-
CLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIB-
UTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIM-
ITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
Also see
[R] copyright — Display copyright information
Title
copyright scintilla — Scintilla copyright notification
Description Also see
Description
Stata uses portions of Scintilla with the express permission of the author, pursuant to the following
notice:
Copyright © 1998–2002 by Neil Hodgson <neilh@scintilla.org>
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its documentation
for any purpose and without fee is hereby granted, provided that the above copyright
notice appear in all copies and that both that copyright notice and this permission
notice appear in supporting documentation.
NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABIL-
ITY AND FITNESS, IN NO EVENT SHALL NEIL HODGSON BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAM-
AGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TOR-
TIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
Also see
[R] copyright — Display copyright information
Title
copyright ttf2pt1 — ttf2pt1 copyright notification
Description Also see
Description
Stata uses portions of ttf2pt1 to convert TrueType fonts to PostScript fonts, with express permission
of the authors, pursuant to the following notice:
Copyright © 1997–2003 by the AUTHORS:
Andrew Weeks <ccsaw@bath.ac.uk>
Frank M. Siegert <fms@this.net>
Mark Heath <mheath@netspace.net.au>
Thomas Henlich <thenlich@rcs.urz.tu-dresden.de>
Sergey Babkin <babkin@users.sourceforge.net>,<sab123@hotmail.com>
Turgut Uyar <uyar@cs.itu.edu.tr>
Rihardas Hepas <rch@WriteMe.Com>
Szalay Tamas <tomek@elender.hu>
Johan Vromans <jvromans@squirrel.nl>
Petr Titera <P.Titera@sh.cvut.cz>
Lei Wang <lwang@amath8.amt.ac.cn>
Chen Xiangyang <chenxy@sun.ihep.ac.cn>
Zvezdan Petkovic <z.petkovic@computer.org>
Rigel <rigel863@yahoo.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided
that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and
the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions
and the following disclaimer in the documentation and/or other materials provided with the
distribution.
3. All advertising materials mentioning features or use of this software must display the following
acknowledgment: This product includes software developed by the TTF2PT1 Project and its
contributors.
THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS “AS IS” AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IM-
PLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOW-
EVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
Also see
[R]copyright Display copyright information
Title
copyright zlib — zlib copyright notification
Description Also see
Description
Stata uses portions of zlib, a library used by JagPDF, which helps create PDF files, with the express
permission of the authors.
StataCorp thanks and acknowledges the authors of zlib, Jean-loup Gailly and Mark Adler, for
producing zlib and allowing its use in Stata and other software.
For more information about zlib, visit http://www.zlib.net/.
The full zlib copyright notice is
Copyright © 1995–2013 Jean-loup Gailly and Mark Adler
This software is provided ’as-is’, without any express or implied warranty. In no event
will the authors be held liable for any damages arising from the use of this software.
Permission is granted to anyone to use this software for any purpose, including
commercial applications, and to alter it and redistribute it freely, subject to the
following restrictions:
1. The origin of this software must not be misrepresented; you must not claim
that you wrote the original software. If you use this software in a product, an
acknowledgment in the product documentation would be appreciated but is not
required.
2. Altered source versions must be plainly marked as such, and must not be misrep-
resented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Jean-loup Gailly
Mark Adler
Also see
[R]copyright Display copyright information
Title
correlate — Correlations (covariances) of variables or coefficients
Syntax Menu Description Options for correlate
Options for pwcorr Remarks and examples Stored results Methods and formulas
References Also see
Syntax
Display correlation matrix or covariance matrix
correlate [varlist] [if] [in] [weight] [, correlate_options]
Display all pairwise correlation coefficients
pwcorr [varlist] [if] [in] [weight] [, pwcorr_options]
correlate options Description
Options
means display means, standard deviations, minimums, and maximums with matrix
noformat ignore display format associated with variables
covariance display covariances
wrap allow wide matrices to wrap
pwcorr options Description
Main
obs print number of observations for each entry
sig print significance level for each entry
listwise use listwise deletion to handle missing values
casewise synonym for listwise
print(#)   significance level for displaying coefficients
star(#)   significance level for displaying with a star
bonferroni   use Bonferroni-adjusted significance level
sidak   use Šidák-adjusted significance level
varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by is allowed with correlate and pwcorr; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
correlate
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Correlations and covariances
pwcorr
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Pairwise correlations
Description
The correlate command displays the correlation matrix or covariance matrix for a group of
variables. If varlist is not specified, the matrix is displayed for all variables in the dataset. Also see
the estat vce command in [R]estat vce.
pwcorr displays all the pairwise correlation coefficients between the variables in varlist or, if
varlist is not specified, all the variables in the dataset.
Options for correlate
 
Options
means displays summary statistics (means, standard deviations, minimums, and maximums) with the
matrix.
noformat displays the summary statistics requested by the means option in g format, regardless of
the display formats associated with the variables.
covariance displays the covariances rather than the correlation coefficients.
wrap requests that no action be taken on wide correlation matrices to make them readable. It prevents
Stata from breaking wide matrices into pieces to enhance readability. You might want to specify
this option if you are displaying results in a window wider than 80 characters. Then you may need
to set linesize to however many characters you can display across a line; see [R]log.
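For instance, a minimal sketch, assuming a dataset with many variables is already in memory (the linesize value of 120 is purely illustrative):
. set linesize 120
. correlate, wrap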
Options for pwcorr
 
Main
obs adds a line to each row of the matrix reporting the number of observations used to calculate the
correlation coefficient.
sig adds a line to each row of the matrix reporting the significance level of each correlation coefficient.
listwise handles missing values through listwise deletion, meaning that the entire observation is
omitted from the estimation sample if any of the variables in varlist is missing for that observation.
By default, pwcorr handles missing values by pairwise deletion; all available observations are
used to calculate each pairwise correlation without regard to whether variables outside that pair
are missing.
correlate uses listwise deletion. Thus listwise allows users of pwcorr to mimic correlate's
treatment of missing values while retaining access to pwcorr's features.
casewise is a synonym for listwise.
print(#)specifies the significance level of correlation coefficients to be printed. Correlation coeffi-
cients with larger significance levels are left blank in the matrix. Typing pwcorr, print(.10)
would list only correlation coefficients significant at the 10% level or better.
star(#)specifies the significance level of correlation coefficients to be starred. Typing pwcorr,
star(.05) would star all correlation coefficients significant at the 5% level or better.
bonferroni makes the Bonferroni adjustment to calculated significance levels. This option affects
printed significance levels and the print() and star() options. Thus pwcorr, print(.05)
bonferroni prints coefficients with Bonferroni-adjusted significance levels of 0.05 or less.
sidak makes the Šidák adjustment to calculated significance levels. This option affects printed
significance levels and the print() and star() options. Thus pwcorr, print(.05) sidak
prints coefficients with Šidák-adjusted significance levels of 0.05 or less.
Remarks and examples
Remarks are presented under the following headings:
correlate
pwcorr
Video example
correlate
Typing correlate by itself produces a correlation matrix for all variables in the dataset. If you
specify the varlist, a correlation matrix for just those variables is displayed.
Example 1
We have state data on demographic characteristics of the population. To obtain a correlation matrix,
we type
. use http://www.stata-press.com/data/r13/census13
(1980 Census data by state)
. correlate
(obs=50)
state brate pop medage division region mrgrate
state 1.0000
brate 0.0208 1.0000
pop -0.0540 -0.2830 1.0000
medage -0.0624 -0.8800 0.3294 1.0000
division -0.1345 0.6356 -0.1081 -0.5207 1.0000
region -0.1339 0.6086 -0.1515 -0.5292 0.9688 1.0000
mrgrate 0.0509 0.0677 -0.1502 -0.0177 0.2280 0.2490 1.0000
dvcrate -0.0655 0.3508 -0.2064 -0.2229 0.5522 0.5682 0.7700
medagesq -0.0621 -0.8609 0.3324 0.9984 -0.5162 -0.5239 -0.0202
dvcrate medagesq
dvcrate 1.0000
medagesq -0.2192 1.0000
Because we did not specify the wrap option, Stata did its best to make the result readable by breaking
the table into two parts.
To obtain the correlations between mrgrate,dvcrate, and medage, we type
. correlate mrgrate dvcrate medage
(obs=50)
mrgrate dvcrate medage
mrgrate 1.0000
dvcrate 0.7700 1.0000
medage -0.0177 -0.2229 1.0000
Example 2
The pop variable in example 1 represents the total population of the state. Thus, to obtain
population-weighted correlations among mrgrate,dvcrate, and medage, we type
. correlate mrgrate dvcrate medage [w=pop]
(analytic weights assumed)
(sum of wgt is 2.2591e+08)
(obs=50)
mrgrate dvcrate medage
mrgrate 1.0000
dvcrate 0.5854 1.0000
medage -0.1316 -0.2833 1.0000
With the covariance option, correlate can be used to obtain covariance matrices, as well as
correlation matrices, for both weighted and unweighted data.
Example 3
To obtain the matrix of covariances between mrgrate,dvcrate, and medage, we type correlate
mrgrate dvcrate medage, covariance:
. correlate mrgrate dvcrate medage, covariance
(obs=50)
mrgrate dvcrate medage
mrgrate .000662
dvcrate .000063 1.0e-05
medage -.000769 -.001191 2.86775
We could have obtained the pop-weighted covariance matrix by typing correlate mrgrate
dvcrate medage [w=pop], covariance.
pwcorr
correlate calculates correlation coefficients by using casewise deletion; when you request
correlations of variables x1, x2, ..., xk, any observation for which any of x1, x2, ..., xk is missing
is not used. Thus if x3 and x4 have no missing values, but x2 is missing for half the data, the
correlation between x3 and x4 is calculated using only the half of the data for which x2 is not
missing. Of course, you can obtain the correlation between x3 and x4 by using all the data by typing
correlate x3 x4.
pwcorr makes obtaining such pairwise correlation coefficients easier.
Example 4
Using auto.dta, we investigate the correlation between several of the variables.
. use http://www.stata-press.com/data/r13/auto1
(Automobile Models)
. pwcorr mpg price rep78 foreign, obs sig
mpg price rep78 foreign
mpg 1.0000
74
price -0.4594 1.0000
0.0000
74 74
rep78 0.3739 0.0066 1.0000
0.0016 0.9574
69 69 69
foreign 0.3613 0.0487 0.5922 1.0000
0.0016 0.6802 0.0000
74 74 69 74
. pwcorr mpg price headroom rear_seat trunk rep78 foreign, print(.05) star(.01)
mpg price headroom rear_s~t trunk rep78 foreign
mpg 1.0000
price -0.4594* 1.0000
headroom -0.4220* 1.0000
rear_seat -0.5213* 0.4194* 0.5238* 1.0000
trunk -0.5703* 0.3143* 0.6620* 0.6480* 1.0000
rep78 0.3739* 1.0000
foreign 0.3613* -0.2939 -0.2409 -0.3594* 0.5922* 1.0000
. pwcorr mpg price headroom rear_seat trunk rep78 foreign, print(.05) bon
mpg price headroom rear_s~t trunk rep78 foreign
mpg 1.0000
price -0.4594 1.0000
headroom -0.4220 1.0000
rear_seat -0.5213 0.4194 0.5238 1.0000
trunk -0.5703 0.6620 0.6480 1.0000
rep78 0.3739 1.0000
foreign 0.3613 -0.3594 0.5922 1.0000
Technical note
The correlate command will report the correlation matrix of the data, but there are occasions
when you need the matrix stored as a Stata matrix so that you can further manipulate it. You can
obtain the matrix by typing
. matrix accum R = varlist, noconstant deviations
. matrix R = corr(R)
The first line places the cross-product matrix of the data in matrix R. The second line converts that
to a correlation matrix. Also see [P]matrix define and [P]matrix accum.
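As a minimal sketch of that recipe, assuming the automobile data and three illustrative variables:
. use http://www.stata-press.com/data/r13/auto, clear
. matrix accum R = mpg weight price, noconstant deviations
. matrix R = corr(R)
. matrix list R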
Video example
Pearson’s correlation coefficient in Stata
Stored results
correlate stores the following in r():
Scalars
r(N) number of observations
r(rho)      ρ (first and second variables)
r(cov_12)   covariance (covariance only)
r(Var_1)    variance of first variable (covariance only)
r(Var_2)    variance of second variable (covariance only)
Matrices
r(C) correlation or covariance matrix
pwcorr will leave in its wake only the results of the last call that it makes internally to correlate
for the correlation between the last variable and itself. Only rarely is this feature useful.
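A short sketch of retrieving these results for later use (dataset and variable choices are illustrative):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly correlate mpg weight
. display "rho = " r(rho)
. matrix C = r(C)
. matrix list C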
Methods and formulas
For a discussion of correlation, see, for instance, Snedecor and Cochran (1989, 177–195); for an
introductory explanation using Stata examples, see Acock (2014, 200–206).
According to Snedecor and Cochran (1989, 180), the term “co-relation” was first proposed by
Galton (1888). The product-moment correlation coefficient is often called the Pearson product-moment
correlation coefficient because Pearson (1896) and Pearson and Filon (1898) were partially responsible
for popularizing its use. See Stigler (1986) for information on the history of correlation.
The estimate of the product-moment correlation coefficient, $\rho$, is
\[
\widehat{\rho} = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum_{i=1}^{n} w_i (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} w_i (y_i - \bar{y})^2}}
\]
where $w_i$ are the weights, if specified, or $w_i = 1$ if weights are not specified. $\bar{x} = (\sum w_i x_i)/(\sum w_i)$ is the mean of $x$, and $\bar{y}$ is similarly defined.
The unadjusted significance level is calculated by pwcorr as
\[
p = 2 \times \mathrm{ttail}\left(n-2,\ |\widehat{\rho}\,|\sqrt{n-2}\big/\sqrt{1-\widehat{\rho}^{\,2}}\right)
\]
Let $v$ be the number of variables specified so that $k = v(v-1)/2$ correlation coefficients are to be
estimated. If bonferroni is specified, the adjusted significance level is $p' = \min(1, kp)$. If sidak
is specified, $p' = \min\bigl\{1,\ 1-(1-p)^{k}\bigr\}$. In both cases, see Methods and formulas in [R] oneway
for a more complete description of the logic behind these adjustments.
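As a quick numeric check of these adjustments, suppose $v = 3$ (so $k = 3$) and an unadjusted $p = 0.0123$; both values are purely illustrative:
. display min(1, 3*0.0123)
. display min(1, 1 - (1 - 0.0123)^3)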
 
Carlo Emilio Bonferroni (1892–1960) studied in Turin and taught there and in Bari and Florence.
He published on actuarial mathematics, probability, statistics, analysis, geometry, and mechanics.
His work on probability inequalities has been applied to simultaneous statistical inference, although
the method known as Bonferroni adjustment usually relies only on an inequality established
earlier by Boole.
Florence Nightingale David (1909–1993) was born in Ivington, England, to parents who were
friends with Florence Nightingale, David’s namesake. She began her studies in statistics under
the direction of Karl Pearson at University College London and continued her studies under the
direction of Jerzy Neyman. After receiving her doctorate in statistics in 1938, David became a
senior statistician for various departments within the British military. She developed statistical
models to forecast the toll on life and infrastructure that would occur if a large city were bombed.
In 1938, she also published her book Tables of the Correlation Coefficient, dealing with the
distributions of correlation coefficients. After the war, she returned to University College London,
serving as a lecturer until her promotion to professor in 1962. In 1967, David joined the University
of California–Riverside, eventually becoming chair of the Department of Statistics. One of her
most well-known works is the book Games, Gods and Gambling: The Origins and History
of Probability and Statistical Ideas from the Earliest Times to the Newtonian Era, a history
of statistics. David published over 100 papers on topics including combinatorics, symmetric
functions, the history of statistics, and applications of statistics, including ecological diversity.
She published under the name F. N. David to avoid revealing her gender in a male-dominated
profession.
Karl Pearson (1857–1936) studied mathematics at Cambridge. He was professor of applied
mathematics (1884–1911) and eugenics (1911–1933) at University College London. His publications
include literary, historical, philosophical, and religious topics. Statistics became his main interest
in the early 1890s after he learned about its application to biological problems. His work centered
on distribution theory, the method of moments, correlation, and regression. Pearson introduced
the chi-squared test and the terms coefficient of variation, contingency table, heteroskedastic,
histogram, homoskedastic, kurtosis, mode, random sampling, random walk, skewness, standard
deviation, and truncation. Despite many strong qualities, he also fell into prolonged disagreements
with others, most notably, William Bateson and R. A. Fisher.
Zbyněk Šidák (1933–1999) was a notable Czech statistician and probabilist. He worked on
Markov chains, rank tests, multivariate distribution theory and multiple-comparison methods, and
he served as the chief editor of Applications of Mathematics.
 
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Dewey, M. E., and E. Seneta. 2001. Carlo Emilio Bonferroni. In Statisticians of the Centuries, ed. C. C. Heyde and
E. Seneta, 411–414. New York: Springer.
Eisenhart, C. 1974. Pearson, Karl. In Vol. 10 of Dictionary of Scientific Biography, ed. C. C. Gillispie, 447–473.
New York: Charles Scribner’s Sons.
Galton, F. 1888. Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal
Society of London 45: 135–145.
Gleason, J. R. 1996. sg51: Inference about correlations using the Fisher z-transform.Stata Technical Bulletin 32:
13–18. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 121–128. College Station, TX: Stata Press.
Goldstein, R. 1996. sg52: Testing dependent correlation coefficients.Stata Technical Bulletin 32: 18. Reprinted in
Stata Technical Bulletin Reprints, vol. 6, pp. 128–129. College Station, TX: Stata Press.
Pearson, K. 1896. Mathematical contributions to the theory of evolution—III. Regression, heredity, and panmixia.
Philosophical Transactions of the Royal Society of London, Series A 187: 253–318.
Pearson, K., and L. N. G. Filon. 1898. Mathematical contributions to the theory of evolution. IV. On the probable
errors of frequency constants and on the influence of random selection on variation and correlation. Philosophical
Transactions of the Royal Society of London, Series A 191: 229–311.
Porter, T. M. 2004. Karl Pearson: The Scientific Life in a Statistical Age. Princeton, NJ: Princeton University Press.
Rodgers, J. L., and W. A. Nicewander. 1988. Thirteen ways to look at the correlation coefficient. American Statistician
42: 59–66.
Rovine, M. J., and A. von Eye. 1997. A 14th way to look at the correlation coefficient: Correlation as the proportion
of matches. American Statistician 51: 42–46.
Seed, P. T. 2001. sg159: Confidence intervals for correlations.Stata Technical Bulletin 59: 27–28. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 267–269. College Station, TX: Stata Press.
Seidler, J., J. Vondráček, and I. Saxl. 2000. The life and work of Zbyněk Šidák (1933–1999). Applications of
Mathematics 45: 321–336.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Stigler, S. M. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Belknap
Press.
Verardi, V., and C. Dehon. 2010. Multivariate outlier detection in Stata.Stata Journal 10: 259–266.
Weber, S. 2010. bacon: An effective way to detect outliers in multivariate data using Stata (and Mata).Stata Journal
10: 331–338.
Wolfe, F. 1997. sg64: pwcorrs: Enhanced correlation display.Stata Technical Bulletin 35: 22–25. Reprinted in Stata
Technical Bulletin Reprints, vol. 6, pp. 163–167. College Station, TX: Stata Press.
. 1999. sg64.1: Update to pwcorrs.Stata Technical Bulletin 49: 17. Reprinted in Stata Technical Bulletin Reprints,
vol. 9, p. 159. College Station, TX: Stata Press.
Also see
[R]esize Effect size based on mean comparison
[R]icc Intraclass correlation coefficients
[R]pcorr Partial and semipartial correlation coefficients
[R]spearman Spearman’s and Kendall’s correlations
[R]summarize Summary statistics
[R]tetrachoric Tetrachoric correlations for binary variables
Title
cumul — Cumulative distribution
Syntax Menu Description Options
Remarks and examples Acknowledgment References Also see
Syntax
cumul varname [if] [in] [weight], generate(newvar) [options]
options Description
Main
generate(newvar)create variable newvar
freq use frequency units for cumulative
equal generate equal cumulatives for tied values
generate(newvar)is required.
by is allowed; see [D] by.
fweights and aweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Distributional plots and tests >Generate cumulative distribution
Description
cumul creates newvar, defined as the empirical cumulative distribution function of varname.
Options
 
Main
generate(newvar)is required. It specifies the name of the new variable to be created.
freq specifies that the cumulative be in frequency units; otherwise, it is normalized so that newvar
is 1 for the largest value of varname.
equal requests that observations with equal values in varname get the same cumulative value in
newvar.
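To see roughly what cumul computes by default, here is a hand-rolled sketch using the median family income data of example 1 below; it assumes faminc has no missing values and ignores the refinements offered by the freq and equal options:
. use http://www.stata-press.com/data/r13/hsng, clear
. sort faminc
. generate double cum_by_hand = _n/_N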
Remarks and examples
Example 1
cumul is most often used with graph to graph the empirical cumulative distribution. For instance,
we have data on the median family income of 957 U.S. cities:
. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. cumul faminc, gen(cum)
. sort cum
. line cum faminc, ylab(, grid) ytitle("") xlab(, grid)
> title("Cumulative of median family income")
> subtitle("1980 Census, 957 U.S. Cities")
[Graph: Cumulative of median family income; 1980 Census, 957 U.S. Cities; x axis: Median family inc., 1979]
It would have been enough to type line cum faminc, but we wanted to make the graph look better;
see [G-2]graph twoway line.
If we had wanted a weighted cumulative, we would have typed cumul faminc [w=pop] at the
first step.
Example 2
To graph two (or more) cumulatives on the same graph, use cumul and stack; see [D]stack. For
instance, we have data on the average January and July temperatures of 956 U.S. cities:
. use http://www.stata-press.com/data/r13/citytemp, clear
(City Temperature Data)
. cumul tempjan, gen(cjan)
. cumul tempjuly, gen(cjuly)
. stack cjan tempjan cjuly tempjuly, into(c temp) wide clear
. line cjan cjuly temp, sort ylab(, grid) ytitle("") xlab(, grid)
> xtitle("Temperature (F)")
> title("Cumulatives:" "Average January and July Temperatures")
> subtitle("956 U.S. Cities")
[Graph: Cumulatives: Average January and July Temperatures; 956 U.S. Cities; x axis: Temperature (F); curves: cjan, cjuly]
As before, it would have been enough to type line cjan cjuly temp, sort. See [D]stack for an
explanation of how the stack command works.
Technical note
According to Beniger and Robyn (1978), Fourier (1821) published the first graph of a cumulative
frequency distribution, which was later given the name “ogive” by Galton (1875).
 
Jean Baptiste Joseph Fourier (1768–1830) was born in Auxerre in France. As a young man,
Fourier became entangled in the complications of the French Revolution. As a result, he was
arrested and put into prison, where he feared he might meet his end at the guillotine. When
he was not in prison, he was studying, researching, and teaching mathematics. Later, he served
Napoleon's army in Egypt as a scientific adviser. Upon his return to France in 1801, he was
appointed Prefect of the Department of Isère. While prefect, Fourier worked on the mathematical
basis of the theory of heat, which is based on what are now called Fourier series. This work
was published in 1822, despite the skepticism of Lagrange, Laplace, Legendre, and others (who
found the work lacking in generality and even rigor) and disagreements of both priority and
substance with Biot and Poisson.
 
Acknowledgment
The equal option was added by Nicholas J. Cox of the Department of Geography at Durham
University, UK, and coeditor of the Stata Journal.
References
Beniger, J. R., and D. L. Robyn. 1978. Quantitative graphics in statistics: A brief history. American Statistician 32:
1–11.
Clayton, D. G., and M. Hills. 1999. gr37: Cumulative distribution function plots.Stata Technical Bulletin 49: 10–12.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 96–98. College Station, TX: Stata Press.
Cox, N. J. 1999. gr41: Distribution function plots.Stata Technical Bulletin 51: 12–16. Reprinted in Stata Technical
Bulletin Reprints, vol. 9, pp. 108–112. College Station, TX: Stata Press.
Fourier, J. B. J. 1821. Notions générales, sur la population. Recherches Statistiques sur la Ville de Paris et le
Département de la Seine 1: 1–70.
Galton, F. 1875. Statistics by intercomparison, with remarks on the law of frequency of error. Philosophical Magazine
49: 33–46.
Wilk, M. B., and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. Biometrika 55: 1–17.
Also see
[R]diagnostic plots Distributional diagnostic plots
[R]kdensity Univariate kernel density estimation
[D]stack Stack data
Title
cusum — Cusum plots and tests for binary variables
Syntax Menu Description Options
Remarks and examples Stored results Acknowledgment References
Also see
Syntax
cusum yvar xvar [if] [in] [, options]
options Description
Main
generate(newvar)save cumulative sum in newvar
yfit(fitvar)calculate cumulative sum against fitvar
nograph suppress the plot
nocalc suppress cusum test statistics
Cusum plot
connect options affect the rendition of the plotted line
Add plots
addplot(plot)add plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu
Statistics >Other >Quality control >Cusum plots and tests for binary variables
Description
cusum graphs the cumulative sum (cusum) of a binary (0/1) variable, yvar, against a (usually)
continuous variable, xvar.
Options
 
Main
generate(newvar)saves the cusum in newvar.
yfit(fitvar) calculates a cusum against fitvar, that is, the running sums of the “residuals” fitvar
minus yvar. Typically, fitvar is the predicted probability of a positive outcome obtained from a
logistic regression analysis; a short sketch appears at the end of example 1 below.
nograph suppresses the plot.
nocalc suppresses calculation of the cusum test statistics.
 
Cusum plot
connect options affect the rendition of the plotted line; see [G-3]connect options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
The cusum is the running sum of the proportion of ones in the sample, a constant number, minus yvar,
\[
c_j = \sum_{k=1}^{j} \left( f - \mathit{yvar}_{(k)} \right), \qquad 1 \le j \le N
\]
where $f = (\sum \mathit{yvar})/N$ and $\mathit{yvar}_{(k)}$ refers to the corresponding value of yvar when xvar is placed in
ascending order: $\mathit{xvar}_{(k+1)} \ge \mathit{xvar}_{(k)}$. Tied values of xvar are broken at random. If you want them
broken the same way in two runs, you must set the random-number seed to the same value before
giving the cusum command; see [R] set seed.
A U-shaped or inverted U-shaped cusum indicates, respectively, a negative or a positive trend of
yvar with xvar. A sinusoidal shape is evidence of a nonmonotonic (for example, quadratic) trend.
cusum displays the maximum absolute cusum for monotonic and nonmonotonic trends of yvar on
xvar. These are nonparametric tests of departure from randomness of yvar with respect to xvar.
Approximate values for the tests are given.
Example 1
For the automobile dataset, auto.dta, we wish to investigate the relationship between foreign
(0 =domestic, 1 =foreign) and car weight as follows:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. cusum foreign weight
Variable Obs Pr(1) CusumL zL Pr>zL CusumQ zQ Pr>zQ
foreign 74 0.2973 10.30 3.963 0.000 3.32 0.469 0.320
[Graph: Cusum (Car type) versus Weight (lbs.)]
The resulting plot, which is U-shaped, suggests a negative monotonic relationship. The trend is
confirmed by a highly significant linear cusum statistic, labeled CusumL in the output above.
Some 29.73% of the cars are foreign (coded 1). The proportion of foreign cars diminishes with
increasing weight. The domestic cars are crudely heavier than the foreign ones. We could have
discovered that by typing table foreign, contents(mean weight), but such an approach does not
give the full picture of the relationship. The quadratic cusum (CusumQ) is not significant, so we
do not suspect any tendency for the very heavy cars to be foreign rather than domestic. A slightly
enhanced version of the plot shows the preponderance of domestic (coded 0) cars at the heavy end
of the weight axis:
. label values foreign
. cusum foreign weight, s(none) recast(scatter) mlabel(foreign) mlabp(0)
Variable Obs Pr(1) CusumL zL Pr>zL CusumQ zQ Pr>zQ
foreign 74 0.2973 10.30 3.963 0.000 2.92 0.064 0.475
[Graph: Cusum (Car type) versus Weight (lbs.), with each point labeled 0 (domestic) or 1 (foreign)]
The example is, of course, artificial, because we would not really try to model the probability of a
car being foreign given its weight.
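The yfit() option fits into this same (admittedly artificial) setting: rather than cumulating deviations from the overall proportion of foreign cars, we can cumulate residuals from a fitted model. A minimal sketch, in which phat is an assumed variable name for the predicted probabilities:
. logit foreign weight
. predict phat
. cusum foreign weight, yfit(phat)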
Stored results
cusum stores the following in r():
Scalars
r(N)        number of observations
r(prop1)    proportion of positive outcomes
r(cusuml)   cusum
r(zl)       test (linear)
r(P_zl)     p-value for test (linear)
r(cusumq)   quadratic cusum
r(zq)       test (quadratic)
r(P_zq)     p-value for test (quadratic)
Acknowledgment
cusum was written by Patrick Royston of the MRC Clinical Trials Unit, London, and coauthor of
the Stata Press book Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model.
References
Royston, P. 1992. The use of cusums and other techniques in modelling continuous covariates in logistic regression.
Statistics in Medicine 11: 1115–1129.
. 1993. sqv7: Cusum plots and tests for binary variables.Stata Technical Bulletin 12: 16–17. Reprinted in Stata
Technical Bulletin Reprints, vol. 2, pp. 175–177. College Station, TX: Stata Press.
Also see
[R]logistic Logistic regression, reporting odds ratios
[R]logit Logistic regression, reporting coefficients
[R]probit Probit regression
Title
db — Launch dialog
Syntax Description Options Remarks and examples Also see
Syntax
Syntax for db
db commandname
For programmers
db commandname [, message(string) debug dryrun]
Set system parameter
set maxdb # [, permanently]
where # must be between 5 and 1,000.
Description
db is the command-line way to launch a dialog for a Stata command.
The second syntax (which is the same but includes options) is for use by programmers.
If you wish to allow the launching of dialogs from a help file, see [P]smcl for information on the
dialog SMCL directive.
set maxdb sets the maximum number of dialog boxes whose contents are remembered from one
invocation to the next during a session. The default value of maxdb is 50.
Options
message(string)specifies that string be passed to the dialog box, where it can be referred to from
the MESSAGE STRING property.
debug specifies that the underlying dialog box be loaded with debug messaging turned on.
dryrun specifies that, rather than launching the dialog, db show the commands it would issue to
launch the dialog.
permanently specifies that, in addition to making the change right now, the maxdb setting be
remembered and become the default setting when you invoke Stata.
Remarks and examples
The usual way to launch a dialog is to open the Data,Graphics, or Statistics menu and to make
your selection from there. When you know the name of the command that you want to run, however,
db provides a way to invoke the dialog from the command line.
db follows the same abbreviation rules that Stata’s command-line interface follows. So, to launch
the dialog for regress, you can type
. db regress
or
. db reg
Say that you use the dialog box for regress, either by selecting
Statistics > Linear models and related > Linear regression
or by typing
. db regress
You fit a regression.
Much later during the session, you return to the regress dialog box. It will have the contents
as you left them if 1) you have not typed clear all between the first and second invocations; 2)
you have not typed discard between the two invocations; and 3) you have not used more than 50
different dialog boxes (regardless of how many times you have used each) between the first and
second invocations of regress. If you use 51 or more, the contents of the regress dialog box will
be forgotten.
set maxdb determines how many different dialog boxes are remembered. A dialog box takes, on
average, about 20 KB of memory, so the 50 default corresponds to allowing dialog boxes to consume
about 1 MB of memory.
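For example, to allow 100 dialog boxes to be remembered and to make that setting the default in future sessions (the value 100 is purely illustrative), you could type
. set maxdb 100, permanently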
Also see
[R]query Display system parameters
Title
diagnostic plots — Distributional diagnostic plots
Syntax Menu
Description Options for symplot, quantile, and qqplot
Options for qnorm and pnorm Options for qchi and pchi
Remarks and examples Methods and formulas
Acknowledgments References
Also see
Syntax
Symmetry plot
symplot varname [if] [in] [, options1]
Ordered values of varname against quantiles of uniform distribution
quantile varname [if] [in] [, options1]
Quantiles of varname1 against quantiles of varname2
qqplot varname1 varname2 [if] [in] [, options1]
Quantiles of varname against quantiles of normal distribution
qnorm varname [if] [in] [, options2]
Standardized normal probability plot
pnorm varname [if] [in] [, options2]
Quantiles of varname against quantiles of χ² distribution
qchi varname [if] [in] [, options3]
χ² probability plot
pchi varname [if] [in] [, options3]
options1Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
options2Description
Main
grid add grid lines
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
options3Description
Main
grid add grid lines
df(#)   degrees of freedom of χ² distribution; default is df(1)
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu
symplot
Statistics >Summaries, tables, and tests >Distributional plots and tests >Symmetry plot
quantile
Statistics >Summaries, tables, and tests >Distributional plots and tests >Quantiles plot
qqplot
Statistics >Summaries, tables, and tests >Distributional plots and tests >Quantile-quantile plot
qnorm
Statistics >Summaries, tables, and tests >Distributional plots and tests >Normal quantile plot
pnorm
Statistics >Summaries, tables, and tests >Distributional plots and tests >Normal probability plot, standardized
qchi
Statistics >Summaries, tables, and tests >Distributional plots and tests >Chi-squared quantile plot
pchi
Statistics >Summaries, tables, and tests >Distributional plots and tests >Chi-squared probability plot
Description
symplot graphs a symmetry plot of varname.
quantile plots the ordered values of varname against the quantiles of a uniform distribution.
qqplot plots the quantiles of varname1 against the quantiles of varname2 (Q–Q plot).
qnorm plots the quantiles of varname against the quantiles of the normal distribution (Q–Q plot).
pnorm graphs a standardized normal probability plot (P–P plot).
qchi plots the quantiles of varname against the quantiles of a χ² distribution (Q–Q plot).
pchi graphs a χ² probability plot (P–P plot).
See [R]regress postestimation diagnostic plots for regression diagnostic plots and [R]logistic
postestimation for logistic regression diagnostic plots.
Options for symplot, quantile, and qqplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affect the rendition of the reference line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Options for qnorm and pnorm
 
Main
grid adds grid lines at the 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, and 0.95 quantiles when specified with
qnorm. With pnorm,grid is equivalent to yline(.25,.5,.75) xline(.25,.5,.75).
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affect the rendition of the reference line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Options for qchi and pchi
 
Main
grid adds grid lines at the 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, and .95 quantiles when specified with
qchi. With pchi,grid is equivalent to yline(.25,.5,.75) xline(.25,.5,.75).
df(#) specifies the degrees of freedom of the χ² distribution. The default is df(1).
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affect the rendition of the reference line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
Remarks are presented under the following headings:
symplot
quantile
qqplot
qnorm
pnorm
qchi
pchi
symplot
Example 1
We have data on 74 automobiles. To make a symmetry plot of the variable price, we type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. symplot price
[Graph: Symmetry plot of Price; Distance above median versus Distance below median]
All points would lie along the reference line (defined as y=x) if car prices were symmetrically
distributed. The points in this plot lie above the reference line, indicating that the distribution of car
prices is skewed to the right: the most expensive cars are far more expensive than the least expensive
cars are inexpensive.
The logic works as follows: a variable, $z$, is distributed symmetrically if
\[
\mathrm{median} - z_{(i)} = z_{(N+1-i)} - \mathrm{median}
\]
where $z_{(i)}$ indicates the $i$th-order statistic of $z$. symplot graphs $y_i = \mathrm{median} - z_{(i)}$ versus
$x_i = z_{(N+1-i)} - \mathrm{median}$.
For instance, consider the largest and smallest values of price in the example above. The most
expensive car costs $15,906 and the least expensive, $3,291. Let’s compare these two cars with the
typical car in the data and see how much more it costs to buy the most expensive car, and compare
that with how much less it costs to buy the least expensive car. If the automobile price distribution
is symmetric, the price differences would be the same.
Before we can make this comparison, we must agree on a definition for the word “typical”. Let’s
agree that “typical” means median. The price of the median car is $5,006.50, so the most expensive
car costs $10,899.50 more than the median car, and the least expensive car costs $1,715.50 less than
the median car. We now have one piece of evidence that the car price distribution is not symmetric.
We can repeat the experiment for the second-most-expensive car and the second-least-expensive car.
We find that the second-most-expensive car costs $9,494.50 more than the median car, and the
second-least-expensive car costs $1,707.50 less than the median car. We now have more evidence.
We can continue doing this with the third most expensive and the third least expensive, and so on.
Once we have all of these numbers, we want to compare each pair and ask how similar, on average,
they are. The easiest way to do that is to plot all the pairs.
quantile
Example 2
We have data on the prices of 74 automobiles. To make a quantile plot of price, we type
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. quantile price, rlopts(clpattern(dash))
[Graph: Quantiles of Price versus Fraction of the data]
We changed the pattern of the reference line by specifying rlopts(clpattern(dash)).
In a quantile plot, each value of the variable is plotted against the fraction of the data that
have values less than that fraction. The diagonal line is a reference line. If automobile prices were
rectangularly distributed, all the data would be plotted along the line. Because all the points are below
the reference line, we know that the price distribution is skewed right.
qqplot
Example 3
We have data on the weight and country of manufacture of 74 automobiles. We wish to compare
the distributions of weights for domestic and foreign automobiles:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate weightd=weight if !foreign
(22 missing values generated)
. generate weightf=weight if foreign
(52 missing values generated)
. qqplot weightd weightf
[Graph: Quantile–Quantile Plot of weightd versus weightf]
qnorm
Example 4
Continuing with our price data on 74 automobiles, we now wish to compare the distribution of
price with the normal distribution:
. qnorm price, grid ylabel(, angle(horizontal) axis(1))
> ylabel(, angle(horizontal) axis(2))
[Graph: quantile–normal plot of Price versus Inverse Normal; grid lines are 5, 10, 25, 50, 75, 90, and 95 percentiles]
The result shows that the distributions are different.
Technical note
The idea behind qnorm is recommended strongly by Miller (1997): he calls it probit plotting. His
recommendations from much practical experience should interest many users. “My recommendation
for detecting nonnormality is probit plotting” (Miller 1997, 10). “If a deviation from normality cannot
be spotted by eye on probit paper, it is not worth worrying about. I never use the Kolmogorov–Smirnov
test (or one of its cousins) or the χ² test as a preliminary test of normality. They do not tell you how
the sample is differing from normality, and I have a feeling they are more likely to detect irregularities
in the middle of the distribution than in the tails” (Miller 1997, 13–14).
pnorm
Example 5
Quantile–normal plots emphasize the tails of the distribution. Normal probability plots put the
focus on the center of the distribution:
. pnorm price, grid
[Graph: Normal F[(price−m)/s] versus Empirical P[i] = i/(N+1)]
qchi
Example 6
Suppose that we want to examine the distribution of the sum of squares of price and mpg,
standardized for their variances.
. egen c1 = std(price)
. egen c2 = std(mpg)
. generate ch = c1^2 + c2^2
. qchi ch, df(2) grid ylabel(, alt axis(2)) xlabel(, alt axis(2))
[Graph: ch versus Expected χ², d.f. = 2; grid lines are 5, 10, 25, 50, 75, 90, and 95 percentiles]
The quadratic form is clearly not χ² with 2 degrees of freedom.
pchi
Example 7
We can focus on the center of the distribution by doing a probability plot:
. pchi ch, df(2) grid
[Graph: χ²(ch), d.f. = 2, versus Empirical P[i] = i/(N+1)]
Methods and formulas
Let $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ be the data sorted in ascending order.
If a continuous variable, $x$, has a cumulative distribution function $F(x) = P(X \le x) = p$, the
quantiles $x_{p_i}$ are such that $F(x_{p_i}) = p_i$. For example, if $p_i = 0.5$, then $x_{0.5}$ is the median. When
we plot data, the probabilities, $p_i$, are often referred to as plotting positions. There are many different
conventions for choice of plotting positions, given $x_{(1)} \le \cdots \le x_{(N)}$. Most belong to the family
$(i-a)/(N-2a+1)$; $a = 0.5$ (suggested by Hazen) and $a = 0$ (suggested by Weibull) are popular
choices.
For a wider discussion of the calculation of plotting positions, see Cox (2002).
symplot plots $\mathrm{median} - x_{(i)}$ versus $x_{(N+1-i)} - \mathrm{median}$.
quantile plots $x_{(i)}$ versus $(i-0.5)/N$ (the Hazen position).
qnorm plots $x_{(i)}$ against $q_i$, where $q_i = \Phi^{-1}(p_i)$, $\Phi$ is the cumulative normal distribution, and
$p_i = i/(N+1)$ (the Weibull position).
pnorm plots $\Phi\{(x_i - \widehat{\mu})/\widehat{\sigma}\}$ versus $p_i = i/(N+1)$, where $\widehat{\mu}$ is the mean of the data and $\widehat{\sigma}$ is
the standard deviation.
qchi and pchi are similar to qnorm and pnorm; the cumulative $\chi^2$ distribution is used in place
of the cumulative normal distribution.
qqplot is just a two-way scatterplot of one variable against the other after both variables have been
sorted into ascending order, and both variables have the same number of nonmissing observations. If
the variables have unequal numbers of nonmissing observations, interpolated values of the variable
with more data are plotted against the variable with fewer data.
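As a rough sketch of how these plotting positions can be computed by hand for one variable, assume the automobile data used in the examples above are in memory and that price has no missing values; qnorm, pnorm, and quantile handle these details internally:
. sort price
. generate pw = _n/(_N + 1)      // Weibull positions, as used by qnorm and pnorm
. generate ph = (_n - 0.5)/_N    // Hazen positions, as used by quantile
. generate q = invnormal(pw)     // normal quantiles against which qnorm plots price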
 
Ramanathan Gnanadesikan (1932– ) was born in Madras. He obtained degrees from the
Universities of Madras and North Carolina. He worked in industry at Procter & Gamble, Bell Labs, and
Bellcore, as well as in universities, retiring from Rutgers in 1998. Among many contributions
to statistics he is especially well known for work on probability plotting, robustness, outlier
detection, clustering, classification, and pattern recognition.
Martin Bradbury Wilk (1922–2013) was born in Montreal. He obtained degrees in chemical
engineering and statistics from McGill and Iowa State Universities. After holding several statistics-
related posts in industry and at universities (including periods at Princeton, Bell Labs, and Rutgers),
Wilk was appointed Chief Statistician at Statistics Canada (1980–1986). He is especially well
known for his work with Gnanadesikan on probability plotting and with Shapiro on tests for
normality.
 
Acknowledgments
We thank Peter A. Lachenbruch of the Department of Public Health at Oregon State University
for writing the original versions of qchi and pchi. Patrick Royston of the MRC Clinical Trials
Unit, London, and coauthor of the Stata Press book Flexible Parametric Survival Analysis Using
Stata: Beyond the Cox Model also published a similar command in the Stata Technical Bulletin
(Royston 1996).
References
Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983. Graphical Methods for Data Analysis. Belmont,
CA: Wadsworth.
Cox, N. J. 1999. gr42: Quantile plots, generalized.Stata Technical Bulletin 51: 16–18. Reprinted in Stata Technical
Bulletin Reprints, vol. 9, pp. 113–116. College Station, TX: Stata Press.
. 2001. gr42.1: Quantile plots, generalized: Update to Stata 7.Stata Technical Bulletin 61: 10. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 55–56. College Station, TX: Stata Press.
. 2002. Speaking Stata: On getting functions to do the work.Stata Journal 2: 411–427.
. 2004a. Speaking Stata: Graphing distributions.Stata Journal 4: 66–88.
. 2004b. gr42 2: Software update: Quantile plots, generalized.Stata Journal 4: 97.
. 2005a. Speaking Stata: Density probability plots.Stata Journal 5: 259–273.
. 2005b. Speaking Stata: The protean quantile plot.Stata Journal 5: 442–460.
. 2005c. Speaking Stata: Smoothing in various directions.Stata Journal 5: 574–593.
. 2007. Stata tip 47: Quantile–quantile plots without programming.Stata Journal 7: 275–279.
. 2012. Speaking Stata: Axis practice, or what goes where on a graph.Stata Journal 12: 549–561.
Daniel, C., and F. S. Wood. 1980. Fitting Equations to Data: Computer Analysis of Multifactor Data. 2nd ed. New
York: Wiley.
Gan, F. F., K. J. Koehler, and J. C. Thompson. 1991. Probability plots and distribution curves for assessing the fit
of probability models. American Statistician 45: 14–21.
Genest, C., and G. J. Brackstone. 2013. Obituary: Martin B. Wilk, 1922–2013. IMS Bulletin 42(4): 7–8.
Hamilton, L. C. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hoaglin, D. C. 1985. Using quantiles to study shape. In Exploring Data Tables, Trends, and Shapes, ed. D. C.
Hoaglin, C. F. Mosteller, and J. W. Tukey, 417–460. New York: Wiley.
Kettenring, J. R. 2001. A conversation with Ramanathan Gnanadesikan. Statistical Science 16: 295–309.
Miller, R. G., Jr. 1997. Beyond ANOVA: Basics of Applied Statistics. London: Chapman & Hall.
Nolan, D., and T. Speed. 2000. Stat Labs: Mathematical Statistics Through Applications. New York: Springer.
Royston, P. 1996. sg47: A plot and a test for the χ2distribution.Stata Technical Bulletin 29: 26–27. Reprinted in
Stata Technical Bulletin Reprints, vol. 5, pp. 142–144. College Station, TX: Stata Press.
Scotto, M. G. 2000. sg140: The Gumbel quantile plot and a test for choice of extreme models.Stata Technical Bulletin
55: 23–25. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 156–159. College Station, TX: Stata Press.
Wilk, M. B., and R. Gnanadesikan. 1968. Probability plotting methods for the analysis of data. Biometrika 55: 1–17.
Also see
[R]cumul Cumulative distribution
[R]kdensity Univariate kernel density estimation
[R]logistic postestimation Postestimation tools for logistic
[R]lv Letter-value displays
[R]regress postestimation diagnostic plots Postestimation plots for regress
Title
display — Substitute for a hand calculator
Syntax Description Remarks and examples Also see
Syntax
display exp
Description
display displays strings and values of scalar expressions.
display really has many more features and a more complex syntax diagram, but the diagram
shown above is adequate for interactive use. For a full discussion of displays capabilities, see
[P]display.
Remarks and examples
display can be used as a substitute for a hand calculator.
Example 1
display 2+2 produces the output 4. Stata variables may also appear in the expression, such as in
display myvar/2. Because display works only with scalars, the resulting calculation is performed
only for the first observation. You could type display myvar[10]/2 to display the calculation for
the 10th observation. Here are more examples:
. display sqrt(2)/2
.70710678
. display normal(-1.1)
.13566606
. di (57.2-3)/(12-2)
5.42
. display myvar/10
7
. display myvar[10]/2
3.5
Also see
[P]display Display strings and values of scalar expressions
[U] 13 Functions and expressions
Title
do — Execute commands from a file
Syntax Menu Description Option
Remarks and examples Reference Also see
Syntax
{ do | run } filename [arguments] [, nostop]
Menu
File >Do...
Description
do and run cause Stata to execute the commands stored in filename just as if they were entered
from the keyboard. do echoes the commands as it executes them, whereas run is silent. If filename
is specified without an extension, .do is assumed.
Option
nostop allows the do-file to continue executing even if an error occurs. Normally, Stata stops executing
the do-file when it detects an error (nonzero return code).
Remarks and examples
You can create filename (called a do-file) using Stata’s Do-file Editor; see [R]doedit. This file
will be a standard ASCII (text) file. A complete discussion of do-files can be found in [U] 16 Do-files.
You can also create filename by using a non-Stata text editor; see [D]shell for a way to invoke
your favorite editor from inside Stata. Make sure that you save the file in ASCII format.
If the path or filename contains spaces, it should be enclosed in double quotes.
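For instance (the filenames and argument below are purely illustrative):
. do "weekly report.do"
. run setup
. do myanalysis 2013
Arguments such as 2013 are available inside the do-file as the local macros `1', `2', and so on.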
Reference
Jenkins, S. P. 2006. Stata tip 32: Do not stop.Stata Journal 6: 281.
Also see
[R]doedit Edit do-files and other text files
[P]include Include commands from file
[GSM] 13 Using the Do-file Editor—automating Stata
[GSU] 13 Using the Do-file Editor—automating Stata
[GSW] 13 Using the Do-file Editor—automating Stata
[U] 15 Saving and printing output—log files
[U] 16 Do-files
Title
doedit — Edit do-files and other text files
Syntax Menu Description Remarks and examples Also see
Syntax
doedit filename
Menu
Window >Do-file Editor
Description
doedit opens a text editor that lets you edit do-files and other text files.
The Do-file Editor lets you submit several commands to Stata at once.
Remarks and examples
Clicking on the Do-file Editor button is equivalent to typing doedit.
doedit, typed by itself, invokes the Editor with an empty document. If you specify filename, that
file is displayed in the Editor.
You may have more than one Do-file Editor open at once. Each time you submit the doedit
command, a new window will be opened.
A tutorial discussion of doedit can be found in the Getting Started with Stata manual. Read
[U] 16 Do-files for an explanation of do-files, and then read [GSW] 13 Using the Do-file Editor—
automating Stata to learn how to use the Do-file Editor to create and execute do-files.
Also see
[GSM] 13 Using the Do-file Editor—automating Stata
[GSU] 13 Using the Do-file Editor—automating Stata
[GSW] 13 Using the Do-file Editor—automating Stata
[U] 16 Do-files
Title
dotplot — Comparative scatterplots
Syntax Menu Description Options
Remarks and examples Stored results Acknowledgments References
Syntax
Dotplot of varname, with one column per value of groupvar
dotplot varname [if] [in] [, options]
Dotplot for each variable in varlist, with one column per variable
dotplot varlist [if] [in] [, options]
options Description
Options
over(groupvar)display one columnar dotplot for each value of groupvar
nx(#)horizontal dot density; default is nx(0)
ny(#)vertical dot density; default is ny(35)
incr(#)label every #group; default is incr(1)
mean |median plot a horizontal line of pluses at the mean or median
bounded use minimum and maximum as boundaries
bar plot horizontal dashed lines at shoulders of each group
nogroup use the actual values of yvar
center center the dot for each column
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu
Graphics >Distributional graphs >Distribution dotplot
Description
A dotplot is a scatterplot with values grouped together vertically (“binning”, as in a histogram)
and with plotted points separated horizontally. The aim is to display all the data for several variables
or groups in one compact graphic.
437
438 dotplot — Comparative scatterplots
In the first syntax, dotplot produces a columnar dotplot of varname, with one column per value
of groupvar. In the second syntax, dotplot produces a columnar dotplot for each variable in varlist,
with one column per variable; over(groupvar)is not allowed. In each case, the “dots” are plotted
as small circles to increase readability.
Options
 
Options
over(groupvar)identifies the variable for which dotplot will display one columnar dotplot for
each value of groupvar.
nx(#)sets the horizontal dot density. A larger value of #will increase the dot density, reducing the
horizontal separation between dots. This option will increase the separation between columns if
two or more groups or variables are used.
ny(#)sets the vertical dot density (number of “bins” on the yaxis). A larger value of #will result
in more bins and a plot that is less spread out horizontally. #should be determined in conjunction
with nx() to give the most pleasing appearance.
incr(#)specifies how the xaxis is to be labeled. incr(1), the default, labels all groups. incr(2)
labels every second group.
mean |median plots a horizontal line of pluses at the mean or median of each group.
bounded forces the minimum and maximum of the variable to be used as boundaries of the smallest
and largest bins. It should be used with one variable whose support is not the whole of the real
line and whose density does not tend to zero at the ends of its support, for example, a uniform
random variable or an exponential random variable.
bar plots horizontal dashed lines at the “shoulders” of each group. The shoulders are taken to be
the upper and lower quartiles unless mean has been specified; here they will be the mean plus or
minus the standard deviation.
nogroup uses the actual values of yvar rather than grouping them (the default). This option may be
useful if yvar takes on only a few values.
center centers the dots for each column on a hidden vertical line.
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
dotplot — Comparative scatterplots 439
Remarks and examples
dotplot produces a figure that has elements of a boxplot, a histogram, and a scatterplot. Like a
boxplot, it is most useful for comparing the distributions of several variables or the distribution of 1
variable in several groups. Like a histogram, the figure provides a crude estimate of the density, and,
as with a scatterplot, each symbol (dot) represents 1 observation.
Example 1
dotplot may be used as an alternative to Stata’s histogram graph for displaying the distribution
of one variable.
. set seed 123456789
. set obs 1000
. generate norm = rnormal()
. dotplot norm, title("Normal distribution, sample size 1000")
−4 −2 0 2 4
norm
0 20 40 60 80
Frequency
Normal distribution, sample size 1000
Example 2
The over() option lets us use dotplot to compare the distribution of one variable within different
levels of a grouping variable. The center,median, and bar options create a graph that may be
compared with Stata’s boxplot; see [G-2]graph box. The next graph illustrates this option with Stata’s
automobile dataset.
440 dotplot — Comparative scatterplots
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. dotplot mpg, over(foreign) nx(25) ny(10) center median bar
− −− − − − − − − − − − −− − − − − − − − − − − − − − − − −− − − − − − − − − − −− − − −− − − −− −
− − − −− −− − − − − − − −− −− −− −
− −− − − − − − − − − − −− − − − − − − − − − − − − − − − −− − − − − − − − − − −− − − −− − − −− −
− − − −− −− − − − − − − −− −− −− −
10 20 30 40
Mileage (mpg)
Domestic Foreign
Car type
Example 3
The second version of dotplot lets us compare the distribution of several variables. In the next
graph, all 10 variables contain measurements on tumor volume.
. use http://www.stata-press.com/data/r13/dotgr
. dotplot g1r1-g1r10, ytitle("Tumor volume, cu mm")
0 200 400 600 800 1000
Tumor volume, cu mm
g1r1 g1r2 g1r3 g1r4 g1r5 g1r6 g1r7 g1r8 g1r9 g1r10
dotplot — Comparative scatterplots 441
Example 4
When using the first form with the over() option, we can encode a third dimension in a dotplot
by using a different plotting symbol for different groups. The third dimension cannot be encoded
with a varlist. The example is of a hypothetical matched casecontrol study. The next graph shows
the exposure of each individual in each matched stratum. Cases are marked by the letter ‘x’, and
controls are marked by the letter ‘o’.
. use http://www.stata-press.com/data/r13/dotdose
. label define symbol 0 "o" 1 "x"
. label values case symbol
. dotplot dose, over(strata) m(none) mlabel(case) mlabp(0) center
o
oooo
oo
ox
o
ooo
o
oo
oo
oox
o
oo
oo
xo
oo
x
o
o
oo
o
xo
o
o
xxo
o
o
o
o
o
o
o
o
o
o
xoo
o
o
xooo
oo
oo
x
o
ox
ox
oxo
oo
oo
oo
o
o
o
o
o
oo
oo
ooo
o
ooo
ox
ooo
o
ooo
oo
o
o
o
ox
o
o
o
oo
o
ooo
o
oo
ooo
o
0 10 20 30 40 50
dose
0 1 2 3 4 5 6 7 8 9 10 11 12
strata
Example 5
dotplot can also be used with two virtually continuous variables as an alternative to jittering the
data to distinguish ties. We must use the xlabel() option, because otherwise dotplot will attempt
to label too many points on the xaxis. It is often useful in such instances to use a value of nx that
is smaller than the default. That was not necessary in this example, partly because of our choice of
symbols.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate byte hi_price = (price>10000) if price < .
. label define symbol 0 "|" 1 "o"
. label values hi_price symbol
442 dotplot — Comparative scatterplots
. dotplot weight, over(gear_ratio) m(none) mlabel(hi_price) mlabp(0) center
> xlabel(#5)
o
o
|
o
o||
|
||
o
o
o
||
|
|
||
|
|
|
|||
|
|
|
|
||||
||
o
|
|
o
|
|
|
|
||
|
|
||
|
|
|
|
|
|
|| |
|
|
|
|
o
|
|
|
|
|
|
|
|
| |
|
2,000 3,000 4,000 5,000
Weight (lbs.)
2 2.5 3 3.5 4
Gear Ratio
Example 6
The following figure is included mostly for aesthetic reasons. It also demonstrates dotplots
ability to cope with even very large datasets. The sample size for each variable is 10,000, so it may
take a long time to print.
. clear all
. set seed 123456789
. set obs 10000
. gen norm0 = rnormal()
. gen norm1 = rnormal() + 1
. gen norm2 = rnormal() + 2
. label variable norm0 "N(0,1)"
. label variable norm1 "N(1,1)"
. label variable norm2 "N(2,1)"
. dotplot norm0 norm1 norm2
−4 −2 0 2 4 6
N(0,1) N(1,1) N(2,1)
dotplot — Comparative scatterplots 443
Stored results
dotplot stores the following in r():
Scalars
r(nx) horizontal dot density
r(ny) vertical dot density
Acknowledgments
dotplot was written by Peter Sasieni of the Wolfson Institute of Preventive Medicine, London,
and Patrick Royston of the MRC Clinical Trials Unit, London, and coauthor of the Stata Press book
Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model.
References
Sasieni, P. D., and P. Royston. 1994. gr14: dotplot: Comparative scatterplots.Stata Technical Bulletin 19: 8–10.
Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 50–54. College Station, TX: Stata Press.
. 1996. Dotplots. Applied Statistics 45: 219–234.
Title
dstdize — Direct and indirect standardization
Syntax Menu Description Options for dstdize
Options for istdize Remarks and examples Stored results Methods and formulas
Acknowledgments References Also see
Syntax
Direct standardization
dstdize charvar popvar stratavars if  in , by(groupvars)dstdize options
Indirect standardization
istdize casevarspopvarsstratavars if  in using filename,
popvars(casevarppopvarp)|rate(ratevarp#|crudevarp)
istdize options
dstdize options Description
Main
by(groupvars)study populations
using( filename)use standard population from Stata dataset
base(#|string)use standard population from a value of grouping variable
level(#)set confidence level; default is level(95)
Options
saving( filename)save computed standard population distribution as a Stata dataset
format(%fmt)final summary table display format; default is %10.0g
print include table summary of standard population in output
nores suppress storing results in r()
by(groupvars)is required.
istdize options Description
Main
popvars(casevarppopvarp)for standard population, casevarpis number of cases and
popvarpis number of individuals
rate(ratevarp#|crudevarp)ratevarpis stratum-specific rates and #or crudevarpis the
crude case rate value or variable
level(#)set confidence level; default is level(95)
Options
by(groupvars)variables identifying study populations
format(%fmt)final summary table display format; default is %10.0g
print include table summary of standard population in output
Either popvars(casevarppopvarp)or rate(ratevarp{#|crudevarp})must be specified.
444
dstdize — Direct and indirect standardization 445
Menu
dstdize
Statistics >Epidemiology and related >Other >Direct standardization
istdize
Statistics >Epidemiology and related >Other >Indirect standardization
Description
dstdize produces standardized rates for charvar, which are defined as a weighted average of the
stratum-specific rates. These rates can be used to compare the characteristic charvar across different
populations identified by groupvars. Weights used in the standardization are given by popvar; the
strata across which the weights are to be averaged are defined by stratavars.
istdize produces indirectly standardized rates for a study population based on a standard popu-
lation. This standardization method is appropriate when the stratum-specific rates for the population
being studied are either unavailable or based on small samples and thus are unreliable. The standard-
ization uses the stratum-specific rates of a standard population to calculate the expected number of
cases in the study population(s), sums them, and then compares them with the actual number of cases
observed. The standard population is in another Stata data file specified by using filename, and it
must contain popvar and stratavars.
In addition to calculating rates, the indirect standardization command produces point estimates and
exact confidence intervals of the study population’s standardized mortality ratio (SMR), if death is the
event of interest, or the standardized incidence ratio (SIR) for studies of incidence. Here we refer to
both ratios as SMR.
casevarsis the variable name for the study population’s number of cases (usually deaths). It must
contain integers, and for each group, defined by groupvar, each subpopulation identified by stratavars
must have the same values or missing.
popvarsidentifies the number of subjects represented by each observation in the study population.
stratavars define the strata.
Options for dstdize
 
Main
by(groupvars)is required for the dstdize command; it specifies the variables identifying the study
populations. If base() is also specified, there must be only one variable in the by() group. If
you do not have a variable for this option, you can generate one by using something like gen
newvar=1 and then use newvar as the argument to this option.
using(filename)or base(#|string)may be used to specify the standard population. You may not
specify both options. using( filename)supplies the name of a .dta file containing the standard
population. The standard population must contain the popvar and the stratavars. If using() is
not specified, the standard population distribution will be obtained from the data. base(#|string)
lets you specify one of the values of groupvareither a numeric value or a stringto be used
as the standard population. If neither base() nor using() is specified, the entire dataset is used
to determine an estimate of the standard population.
446 dstdize — Direct and indirect standardization
level(#)specifies the confidence level, as a percentage, for a confidence interval of the adjusted
rate. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of
confidence intervals.
 
Options
saving( filename)saves the computed standard population distribution as a Stata dataset that can be
used in further analyses.
format(%fmt)specifies the format in which to display the final summary table. The default is
%10.0g.
print includes a table summary of the standard population before displaying the study population
results.
nores suppresses storing results in r(). This option is seldom specified. Some results are stored
in matrices. If there are more groups than matsize,dstdize will report “matsize too small”.
Then you can either increase matsize or specify nores. The nores option does not change how
results are calculated but specifies that results need not be left behind for use by other programs.
Options for istdize
 
Main
popvars(casevarppopvarp)or rate(ratevarp#|ratevarpcrudevarp)must be specified with ist-
dize. Only one of these two options is allowed. These options are used to describe the standard
population’s data.
With popvars(casevarppopvarp),casevarprecords the number of cases (deaths) for each stratum
in the standard population, and popvarprecords the total number of individuals in each stratum
(individuals at risk).
With rate(ratevarp#|crudevarp),ratevarpcontains the stratum-specific rates. #|crudevarp
specifies the crude case rate either by a variable name or by the crude case rate value. If a crude
rate variable is used, it must be the same for all observations, although it could be missing for
some.
level(#)specifies the confidence level, as a percentage, for a confidence interval of the adjusted
rate. The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of
confidence intervals.
 
Options
by(groupvars)specifies variables identifying study populations when more than one exists in the
data. If this option is not specified, the entire study population is treated as one group.
format(%fmt)specifies the format in which to display the final summary table. The default is
%10.0g.
print outputs a table summary of the standard population before displaying the study population
results.
dstdize — Direct and indirect standardization 447
Remarks and examples
Remarks are presented under the following headings:
Direct standardization
Indirect standardization
In epidemiology and other fields, you will often need to compare rates for some characteristic
across different populations. These populations often differ on factors associated with the characteristic
under study; thus directly comparing overall rates may be misleading.
See van Belle et al. (2004, 642684), Fleiss, Levin, and Paik (2003, chap. 19), or Kirkwood and
Sterne (2003, chap. 25) for a discussion of direct and indirect standardization.
Direct standardization
The direct method of adjusting for differences among populations involves computing the overall
rates that would result if, instead of having different distributions, all populations had the same
standard distribution. The standardized rate is defined as a weighted average of the stratum-specific
rates, with the weights taken from the standard distribution. Direct standardization may be applied
only when the specific rates for a given population are available.
dstdize generates adjusted summary measures of occurrence, which can be used to compare
prevalence, incidence, or mortality rates between populations that may differ on certain characteristics
(for example, age, gender, race). These underlying differences may affect the crude prevalence,
mortality, or incidence rates.
Example 1
We have data (Rothman 1986, 42) on mortality rates for Sweden and Panama for 1962, and we
wish to compare mortality in these two countries:
. use http://www.stata-press.com/data/r13/mortality
(1962 Mortality, Sweden & Panama)
. describe
Contains data from http://www.stata-press.com/data/r13/mortality.dta
obs: 6 1962 Mortality, Sweden & Panama
vars: 4 14 Apr 2013 16:18
size: 90
storage display value
variable name type format label variable label
nation str6 %9s Nation
age_category byte %9.0g age_lbl Age Category
population float %10.0gc Population in Age Category
deaths float %9.0gc Deaths in Age Category
Sorted by:
448 dstdize — Direct and indirect standardization
. list, sepby(nation) abbrev(12) divider
nation age_category population deaths
1. Sweden 0 - 29 3145000 3,523
2. Sweden 30 - 59 3057000 10,928
3. Sweden 60+ 1294000 59,104
4. Panama 0 - 29 741,000 3,904
5. Panama 30 - 59 275,000 1,421
6. Panama 60+ 59,000 2,456
We divide the total number of cases in the population by the population to obtain the crude rate:
. collapse (sum) pop deaths, by(nation)
. list, abbrev(10) divider
nation population deaths
1. Panama 1075000 7,781
2. Sweden 7496000 73,555
. generate crude = deaths/pop
. list, abbrev(10) divider
nation population deaths crude
1. Panama 1075000 7,781 .0072381
2. Sweden 7496000 73,555 .0098126
If we examine the total number of deaths in the two nations, the total crude mortality rate in
Sweden is higher than that in Panama. From the original data, we see one possible explanation:
Swedes are older than Panamanians, making direct comparison of the mortality rates difficult.
Direct standardization lets us remove the distortion caused by the different age distributions. The
adjusted rate is defined as the weighted sum of the crude rates, where the weights are given by the
standard distribution. Suppose that we wish to standardize these mortality rates to the following age
distribution:
. use http://www.stata-press.com/data/r13/1962, clear
(Standard Population Distribution)
. list, abbrev(12) divider
age_category population
1. 0 - 29 .35
2. 30 - 59 .35
3. 60+ .3
. save 1962
file 1962.dta saved
If we multiply the above weights for the age strata by the crude rate for the corresponding age
category, the sum gives us the standardized rate.
dstdize — Direct and indirect standardization 449
. use http://www.stata-press.com/data/r13/mortality
(1962 Mortality, Sweden & Panama)
. generate crude=deaths/pop
. drop pop
. merge m:1 age_cat using 1962
age_category was byte now float
Result # of obs.
not matched 0
matched 6 (_merge==3)
. list, sepby(age_category) abbrev(12)
nation age_category deaths crude population _merge
1. Sweden 0 - 29 3,523 .0011202 .35 matched (3)
2. Panama 0 - 29 3,904 .0052686 .35 matched (3)
3. Panama 30 - 59 1,421 .0051673 .35 matched (3)
4. Sweden 30 - 59 10,928 .0035747 .35 matched (3)
5. Panama 60+ 2,456 .0416271 .3 matched (3)
6. Sweden 60+ 59,104 .0456754 .3 matched (3)
. generate product = crude*pop
. by nation, sort: egen adj_rate = sum(product)
. drop _merge
. list, sepby(nation)
nation age_ca~y deaths crude popula~n product adj_rate
1. Panama 0 - 29 3,904 .0052686 .35 .001844 .0161407
2. Panama 30 - 59 1,421 .0051673 .35 .0018085 .0161407
3. Panama 60+ 2,456 .0416271 .3 .0124881 .0161407
4. Sweden 60+ 59,104 .0456754 .3 .0137026 .0153459
5. Sweden 30 - 59 10,928 .0035747 .35 .0012512 .0153459
6. Sweden 0 - 29 3,523 .0011202 .35 .0003921 .0153459
Comparing the standardized rates indicates that the Swedes have a slightly lower mortality rate.
450 dstdize — Direct and indirect standardization
To perform the above analysis with dstdize, type
. use http://www.stata-press.com/data/r13/mortality, clear
(1962 Mortality, Sweden & Panama)
. dstdize deaths pop age_cat, by(nation) using(1962)
-> nation= Panama
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
0 - 29 741000 3904 0.689 0.0053 0.350 0.0018
30 - 59 275000 1421 0.256 0.0052 0.350 0.0018
60+ 59000 2456 0.055 0.0416 0.300 0.0125
Totals: 1075000 7781 Adjusted Cases: 17351.2
Crude Rate: 0.0072
Adjusted Rate: 0.0161
95% Conf. Interval: [0.0156, 0.0166]
-> nation= Sweden
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
0 - 29 3145000 3523 0.420 0.0011 0.350 0.0004
30 - 59 3057000 10928 0.408 0.0036 0.350 0.0013
60+ 1294000 59104 0.173 0.0457 0.300 0.0137
Totals: 7496000 73555 Adjusted Cases: 115032.5
Crude Rate: 0.0098
Adjusted Rate: 0.0153
95% Conf. Interval: [0.0152, 0.0155]
Summary of Study Populations:
nation N Crude Adj_Rate Confidence Interval
Panama 1075000 0.007238 0.016141 [ 0.015645, 0.016637]
Sweden 7496000 0.009813 0.015346 [ 0.015235, 0.015457]
The summary table above lets us make a quick inspection of the results within the study populations,
and the detail tables give the behavior among the strata within the study populations.
Example 2
We have individual-level data on persons in four cities over several years. Included in the data is
a variable indicating whether the person has high blood pressure, together with information on the
person’s age, sex, and race. We wish to obtain standardized high blood pressure rates for each city
for 1990 and 1992, using, as the standard, the age, sex, and race distribution of the four cities and
two years combined.
dstdize — Direct and indirect standardization 451
Our dataset contains
. use http://www.stata-press.com/data/r13/hbp
. describe
Contains data from http://www.stata-press.com/data/r13/hbp.dta
obs: 1,130
vars: 7 21 Feb 2013 06:42
size: 19,210
storage display value
variable name type format label variable label
id str10 %10s Record identification number
city byte %8.0g
year int %8.0g
sex byte %8.0g sexfmt
age_group byte %8.0g agefmt
race byte %8.0g racefmt
hbp byte %8.0g yn high blood pressure
Sorted by:
The dstdize command is designed to work with aggregate data but will work with individual-
level data only if we create a variable recording the population represented by each observation. For
individual-level data, this is one:
. generate pop = 1
On the next page, we specify print to obtain a listing of the standard population and level(90)
to request 90% rather than 95% confidence intervals. Typing if year==1990 | year==1992 restricts
the data to the two years for both summary tables and the standard population.
452 dstdize — Direct and indirect standardization
. dstdize hbp pop age race sex if year==1990 | year==1992, by(city year) print
> level(90)
Standard Population
Stratum Pop. Dist.
15 - 19 Black Female 35 0.077
15 - 19 Black Male 44 0.097
15 - 19 Hispanic Female 5 0.011
15 - 19 Hispanic Male 10 0.022
15 - 19 White Female 7 0.015
15 - 19 White Male 5 0.011
20 - 24 Black Female 43 0.095
20 - 24 Black Male 67 0.147
20 - 24 Hispanic Female 14 0.031
20 - 24 Hispanic Male 13 0.029
20 - 24 White Female 4 0.009
20 - 24 White Male 21 0.046
25 - 29 Black Female 17 0.037
25 - 29 Black Male 44 0.097
25 - 29 Hispanic Female 7 0.015
25 - 29 Hispanic Male 13 0.029
25 - 29 White Female 9 0.020
25 - 29 White Male 16 0.035
30 - 34 Black Female 16 0.035
30 - 34 Black Male 32 0.070
30 - 34 Hispanic Female 2 0.004
30 - 34 Hispanic Male 3 0.007
30 - 34 White Female 5 0.011
30 - 34 White Male 23 0.051
Total: 455
(6 observations excluded because of missing values)
-> city year= 1 1990
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 6 2 0.128 0.3333 0.077 0.0256
15 - 19 Black Male 6 0 0.128 0.0000 0.097 0.0000
15 - 19 Hispanic Male 1 0 0.021 0.0000 0.022 0.0000
20 - 24 Black Female 3 0 0.064 0.0000 0.095 0.0000
20 - 24 Black Male 11 0 0.234 0.0000 0.147 0.0000
25 - 29 Black Female 4 0 0.085 0.0000 0.037 0.0000
25 - 29 Black Male 6 1 0.128 0.1667 0.097 0.0161
25 - 29 Hispanic Female 2 0 0.043 0.0000 0.015 0.0000
25 - 29 White Female 1 0 0.021 0.0000 0.020 0.0000
30 - 34 Black Female 1 0 0.021 0.0000 0.035 0.0000
30 - 34 Black Male 6 0 0.128 0.0000 0.070 0.0000
Totals: 47 3 Adjusted Cases: 2.0
Crude Rate: 0.0638
Adjusted Rate: 0.0418
90% Conf. Interval: [0.0074, 0.0761]
dstdize — Direct and indirect standardization 453
-> city year= 1 1992
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 3 0 0.054 0.0000 0.077 0.0000
15 - 19 Black Male 9 0 0.161 0.0000 0.097 0.0000
15 - 19 Hispanic Male 1 0 0.018 0.0000 0.022 0.0000
20 - 24 Black Female 7 0 0.125 0.0000 0.095 0.0000
20 - 24 Black Male 9 0 0.161 0.0000 0.147 0.0000
20 - 24 Hispanic Female 1 0 0.018 0.0000 0.031 0.0000
25 - 29 Black Female 2 0 0.036 0.0000 0.037 0.0000
25 - 29 Black Male 11 1 0.196 0.0909 0.097 0.0088
25 - 29 Hispanic Male 1 0 0.018 0.0000 0.029 0.0000
30 - 34 Black Female 7 0 0.125 0.0000 0.035 0.0000
30 - 34 Black Male 4 0 0.071 0.0000 0.070 0.0000
30 - 34 White Female 1 0 0.018 0.0000 0.011 0.0000
Totals: 56 1 Adjusted Cases: 0.5
Crude Rate: 0.0179
Adjusted Rate: 0.0088
90% Conf. Interval: [0.0000, 0.0226]
-> city year= 2 1990
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 5 0 0.078 0.0000 0.077 0.0000
15 - 19 Black Male 7 1 0.109 0.1429 0.097 0.0138
15 - 19 Hispanic Male 1 0 0.016 0.0000 0.022 0.0000
20 - 24 Black Female 7 1 0.109 0.1429 0.095 0.0135
20 - 24 Black Male 8 0 0.125 0.0000 0.147 0.0000
20 - 24 Hispanic Female 5 0 0.078 0.0000 0.031 0.0000
20 - 24 Hispanic Male 2 0 0.031 0.0000 0.029 0.0000
20 - 24 White Male 2 0 0.031 0.0000 0.046 0.0000
25 - 29 Black Female 3 0 0.047 0.0000 0.037 0.0000
25 - 29 Black Male 9 0 0.141 0.0000 0.097 0.0000
25 - 29 Hispanic Female 2 0 0.031 0.0000 0.015 0.0000
25 - 29 White Female 1 0 0.016 0.0000 0.020 0.0000
25 - 29 White Male 2 1 0.031 0.5000 0.035 0.0176
30 - 34 Black Female 1 0 0.016 0.0000 0.035 0.0000
30 - 34 Black Male 5 0 0.078 0.0000 0.070 0.0000
30 - 34 Hispanic Female 2 0 0.031 0.0000 0.004 0.0000
30 - 34 White Female 1 0 0.016 0.0000 0.011 0.0000
30 - 34 White Male 1 0 0.016 0.0000 0.051 0.0000
Totals: 64 3 Adjusted Cases: 2.9
Crude Rate: 0.0469
Adjusted Rate: 0.0449
90% Conf. Interval: [0.0091, 0.0807]
454 dstdize — Direct and indirect standardization
-> city year= 2 1992
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 1 0 0.015 0.0000 0.077 0.0000
15 - 19 Black Male 5 0 0.075 0.0000 0.097 0.0000
15 - 19 Hispanic Female 3 0 0.045 0.0000 0.011 0.0000
15 - 19 Hispanic Male 1 0 0.015 0.0000 0.022 0.0000
15 - 19 White Male 1 0 0.015 0.0000 0.011 0.0000
20 - 24 Black Female 8 0 0.119 0.0000 0.095 0.0000
20 - 24 Black Male 11 0 0.164 0.0000 0.147 0.0000
20 - 24 Hispanic Female 6 0 0.090 0.0000 0.031 0.0000
20 - 24 Hispanic Male 4 2 0.060 0.5000 0.029 0.0143
20 - 24 White Female 1 0 0.015 0.0000 0.009 0.0000
20 - 24 White Male 2 0 0.030 0.0000 0.046 0.0000
25 - 29 Black Female 2 0 0.030 0.0000 0.037 0.0000
25 - 29 Black Male 3 0 0.045 0.0000 0.097 0.0000
25 - 29 Hispanic Female 2 0 0.030 0.0000 0.015 0.0000
25 - 29 Hispanic Male 4 0 0.060 0.0000 0.029 0.0000
25 - 29 White Female 4 0 0.060 0.0000 0.020 0.0000
25 - 29 White Male 2 0 0.030 0.0000 0.035 0.0000
30 - 34 Black Female 1 0 0.015 0.0000 0.035 0.0000
30 - 34 Black Male 2 0 0.030 0.0000 0.070 0.0000
30 - 34 Hispanic Male 1 0 0.015 0.0000 0.007 0.0000
30 - 34 White Female 2 0 0.030 0.0000 0.011 0.0000
30 - 34 White Male 1 0 0.015 0.0000 0.051 0.0000
Totals: 67 2 Adjusted Cases: 1.0
Crude Rate: 0.0299
Adjusted Rate: 0.0143
90% Conf. Interval: [0.0025, 0.0260]
-> city year= 3 1990
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 3 0 0.043 0.0000 0.077 0.0000
15 - 19 Black Male 1 0 0.014 0.0000 0.097 0.0000
15 - 19 Hispanic Female 1 0 0.014 0.0000 0.011 0.0000
15 - 19 White Female 3 0 0.043 0.0000 0.015 0.0000
15 - 19 White Male 1 0 0.014 0.0000 0.011 0.0000
20 - 24 Black Female 1 0 0.014 0.0000 0.095 0.0000
20 - 24 Black Male 9 0 0.130 0.0000 0.147 0.0000
20 - 24 Hispanic Male 3 0 0.043 0.0000 0.029 0.0000
20 - 24 White Female 2 0 0.029 0.0000 0.009 0.0000
20 - 24 White Male 8 1 0.116 0.1250 0.046 0.0058
25 - 29 Black Female 1 0 0.014 0.0000 0.037 0.0000
25 - 29 Black Male 8 3 0.116 0.3750 0.097 0.0363
25 - 29 Hispanic Male 4 0 0.058 0.0000 0.029 0.0000
25 - 29 White Female 1 0 0.014 0.0000 0.020 0.0000
25 - 29 White Male 6 0 0.087 0.0000 0.035 0.0000
30 - 34 Black Male 6 2 0.087 0.3333 0.070 0.0234
30 - 34 White Male 11 5 0.159 0.4545 0.051 0.0230
Totals: 69 11 Adjusted Cases: 6.1
Crude Rate: 0.1594
Adjusted Rate: 0.0885
90% Conf. Interval: [0.0501, 0.1268]
dstdize — Direct and indirect standardization 455
-> city year= 3 1992
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 2 0 0.054 0.0000 0.077 0.0000
15 - 19 Hispanic Male 3 0 0.081 0.0000 0.022 0.0000
15 - 19 White Female 2 0 0.054 0.0000 0.015 0.0000
15 - 19 White Male 1 0 0.027 0.0000 0.011 0.0000
20 - 24 Black Male 3 0 0.081 0.0000 0.147 0.0000
20 - 24 Hispanic Female 1 0 0.027 0.0000 0.031 0.0000
20 - 24 Hispanic Male 3 0 0.081 0.0000 0.029 0.0000
20 - 24 White Female 1 0 0.027 0.0000 0.009 0.0000
20 - 24 White Male 6 1 0.162 0.1667 0.046 0.0077
25 - 29 Hispanic Male 1 0 0.027 0.0000 0.029 0.0000
25 - 29 White Male 5 1 0.135 0.2000 0.035 0.0070
30 - 34 Black Male 1 0 0.027 0.0000 0.070 0.0000
30 - 34 White Male 8 5 0.216 0.6250 0.051 0.0316
Totals: 37 7 Adjusted Cases: 1.7
Crude Rate: 0.1892
Adjusted Rate: 0.0463
90% Conf. Interval: [0.0253, 0.0674]
-> city year= 5 1990
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 9 0 0.196 0.0000 0.077 0.0000
15 - 19 Black Male 7 0 0.152 0.0000 0.097 0.0000
15 - 19 Hispanic Male 1 0 0.022 0.0000 0.022 0.0000
15 - 19 White Male 1 0 0.022 0.0000 0.011 0.0000
20 - 24 Black Female 4 0 0.087 0.0000 0.095 0.0000
20 - 24 Black Male 6 0 0.130 0.0000 0.147 0.0000
20 - 24 Hispanic Female 1 0 0.022 0.0000 0.031 0.0000
25 - 29 Black Female 3 1 0.065 0.3333 0.037 0.0125
25 - 29 Black Male 5 0 0.109 0.0000 0.097 0.0000
25 - 29 Hispanic Female 1 0 0.022 0.0000 0.015 0.0000
25 - 29 White Female 2 1 0.043 0.5000 0.020 0.0099
30 - 34 Black Female 2 0 0.043 0.0000 0.035 0.0000
30 - 34 Black Male 3 0 0.065 0.0000 0.070 0.0000
30 - 34 White Male 1 0 0.022 0.0000 0.051 0.0000
Totals: 46 2 Adjusted Cases: 1.0
Crude Rate: 0.0435
Adjusted Rate: 0.0223
90% Conf. Interval: [0.0020, 0.0426]
456 dstdize — Direct and indirect standardization
-> city year= 5 1992
Unadjusted Std.
Pop. Stratum Pop.
Stratum Pop. Cases Dist. Rate[s] Dst[P] s*P
15 - 19 Black Female 6 0 0.087 0.0000 0.077 0.0000
15 - 19 Black Male 9 0 0.130 0.0000 0.097 0.0000
15 - 19 Hispanic Female 1 0 0.014 0.0000 0.011 0.0000
15 - 19 Hispanic Male 2 0 0.029 0.0000 0.022 0.0000
15 - 19 White Female 2 0 0.029 0.0000 0.015 0.0000
15 - 19 White Male 1 0 0.014 0.0000 0.011 0.0000
20 - 24 Black Female 13 0 0.188 0.0000 0.095 0.0000
20 - 24 Black Male 10 0 0.145 0.0000 0.147 0.0000
20 - 24 Hispanic Male 1 0 0.014 0.0000 0.029 0.0000
20 - 24 White Male 3 0 0.043 0.0000 0.046 0.0000
25 - 29 Black Female 2 0 0.029 0.0000 0.037 0.0000
25 - 29 Black Male 2 0 0.029 0.0000 0.097 0.0000
25 - 29 Hispanic Male 3 0 0.043 0.0000 0.029 0.0000
25 - 29 White Male 1 0 0.014 0.0000 0.035 0.0000
30 - 34 Black Female 4 0 0.058 0.0000 0.035 0.0000
30 - 34 Black Male 5 0 0.072 0.0000 0.070 0.0000
30 - 34 Hispanic Male 2 0 0.029 0.0000 0.007 0.0000
30 - 34 White Female 1 0 0.014 0.0000 0.011 0.0000
30 - 34 White Male 1 1 0.014 1.0000 0.051 0.0505
Totals: 69 1 Adjusted Cases: 3.5
Crude Rate: 0.0145
Adjusted Rate: 0.0505
90% Conf. Interval: [0.0505, 0.0505]
Summary of Study Populations:
city
year N Crude Adj_Rate Confidence Interval
1
1990 47 0.063830 0.041758 [ 0.007427, 0.076089]
1
1992 56 0.017857 0.008791 [ 0.000000, 0.022579]
2
1990 64 0.046875 0.044898 [ 0.009072, 0.080724]
2
1992 67 0.029851 0.014286 [ 0.002537, 0.026035]
3
1990 69 0.159420 0.088453 [ 0.050093, 0.126813]
3
1992 37 0.189189 0.046319 [ 0.025271, 0.067366]
5
1990 46 0.043478 0.022344 [ 0.002044, 0.042644]
5
1992 69 0.014493 0.050549 [ 0.050549, 0.050549]
dstdize — Direct and indirect standardization 457
Indirect standardization
Standardization of rates can be performed via the indirect method whenever the stratum-specific
rates are either unknown or unreliable. If the stratum-specific rates are known, the direct standardization
method is preferred.
To apply the indirect method, you must have the following information:
The observed number of cases in each population to be standardized, O. For example, if death
rates in two states are being standardized using the U.S. death rate for the same period, you must
know the total number of deaths in each state.
The distribution across the various strata for the population being studied, n1, . . . , nk. If you are
standardizing the death rate in the two states, adjusting for age, you must know the number of
individuals in each of the kage groups.
The stratum-specific rates for the standard population, p1, . . . , pk. For example, you must have
the U.S. death rate for each stratum (age group).
The crude rate of the standard population, C. For example, you must have the U.S. mortality rate
for the year.
The indirect adjusted rate is then
Rindirect =CO
E
where Eis the expected number of cases (deaths) in each population. See Methods and formulas for
a more detailed description of calculations.
Example 3
This example is borrowed from Kahn and Sempos (1989, 95–105). We want to compare 1970
mortality rates in California and Maine, adjusting for age. Although we have age-specific population
counts for the two states, we lack age-specific death rates. Direct standardization is not feasible here.
We can use the U.S. population census data for the same year to produce indirectly standardized rates
for these two states.
From the U.S. census, the standard population for this example was entered into Stata and saved
in popkahn.dta.
. use http://www.stata-press.com/data/r13/popkahn, clear
. list age pop deaths rate, sep(4)
age population deaths rate
1. <15 57,900,000 103,062 .00178
2. 15-24 35,441,000 45,261 .00128
3. 25-34 24,907,000 39,193 .00157
4. 35-44 23,088,000 72,617 .00315
5. 45-54 23,220,000 169,517 .0073
6. 55-64 18,590,000 308,373 .01659
7. 65-74 12,436,000 445,531 .03583
8. 75+ 7,630,000 736,758 .09656
458 dstdize — Direct and indirect standardization
The standard population contains for each age stratum the total number of individuals (pop) and
both the age-specific mortality rate (rate) and the number of deaths. The standard population need
not contain all three. If we have only the age-specific mortality rate, we can use the rate(ratevarp
crudevarp)or rate(ratevarp#)option, where crudevarprefers to the variable containing the total
population’s crude death rate or #is the total population’s crude death rate.
Now let’s look at the states’ data (study population):
. use http://www.stata-press.com/data/r13/kahn
. list, sep(4)
state age populat~n death st death_~e
1. California <15 5,524,000 166,285 1 .0016
2. California 15-24 3,558,000 166,285 1 .0013
3. California 25-34 2,677,000 166,285 1 .0015
4. California 35-44 2,359,000 166,285 1 .0028
5. California 45-54 2,330,000 166,285 1 .0067
6. California 55-64 1,704,000 166,285 1 .0154
7. California 65-74 1,105,000 166,285 1 .0328
8. California 75+ 696,000 166,285 1 .0917
9. Maine <15 286,000 11,051 2 .0019
10. Maine 15-24 168,000 . 2 .0011
11. Maine 25-34 110,000 . 2 .0014
12. Maine 35-44 109,000 . 2 .0029
13. Maine 45-54 110,000 . 2 .0069
14. Maine 55-64 94,000 . 2 .0173
15. Maine 65-74 69,000 . 2 .039
16. Maine 75+ 46,000 . 2 .1041
For each state, the number of individuals in each stratum (age group) is contained in the pop variable.
The death variable is the total number of deaths observed in the state during the year. It must have
the same value for all observations in the group, as for California, or it could be missing in all but
one observation per group, as for Maine.
To match these two datasets, the strata variables must have the same name in both datasets and
ideally the same levels. If a level is missing from either dataset, that level will not be included in the
standardization.
With kahn.dta in memory, we now execute the command. We will use the print option to
obtain the standard population’s summary table, and because we have both the standard population’s
age-specific count and deaths, we will specify the popvars(casevarppopvarp)option. Or, we could
specify the rate(rate 0.00945) option because we know that 0.00945 is the U.S. crude death rate
for 1970.
dstdize — Direct and indirect standardization 459
. istdize death pop age using http://www.stata-press.com/data/r13/popkahn,
> by(state) pop(deaths pop) print
Standard Population
Stratum Rate
<15 0.00178
15-24 0.00128
25-34 0.00157
35-44 0.00315
45-54 0.00730
55-64 0.01659
65-74 0.03583
75+ 0.09656
Standard population’s crude rate: 0.00945
-> state= California
Indirect Standardization
Standard
Population Observed Cases
Stratum Rate Population Expected
<15 0.0018 5524000 9832.72
15-24 0.0013 3558000 4543.85
25-34 0.0016 2677000 4212.46
35-44 0.0031 2359000 7419.59
45-54 0.0073 2330000 17010.10
55-64 0.0166 1704000 28266.14
65-74 0.0358 1105000 39587.63
75+ 0.0966 696000 67206.23
Totals: 19953000 178078.73
Observed Cases: 166285
SMR (Obs/Exp): 0.93
SMR exact 95% Conf. Interval: [0.9293, 0.9383]
Crude Rate: 0.0083
Adjusted Rate: 0.0088
95% Conf. Interval: [0.0088, 0.0089]
-> state= Maine
Indirect Standardization
Standard
Population Observed Cases
Stratum Rate Population Expected
<15 0.0018 286000 509.08
15-24 0.0013 168000 214.55
25-34 0.0016 110000 173.09
35-44 0.0031 109000 342.83
45-54 0.0073 110000 803.05
55-64 0.0166 94000 1559.28
65-74 0.0358 69000 2471.99
75+ 0.0966 46000 4441.79
Totals: 992000 10515.67
Observed Cases: 11051
SMR (Obs/Exp): 1.05
SMR exact 95% Conf. Interval: [1.0314, 1.0707]
Crude Rate: 0.0111
Adjusted Rate: 0.0099
95% Conf. Interval: [0.0097, 0.0101]
460 dstdize — Direct and indirect standardization
Summary of Study Populations (Rates):
Cases
state Observed Crude Adj_Rate Confidence Interval
California 166285 0.008334 0.008824 [0.008782, 0.008866]
Maine 11051 0.011140 0.009931 [0.009747, 0.010118]
Summary of Study Populations (SMR):
Cases Cases Exact
state Observed Expected SMR Confidence Interval
California 166285 178078.73 0.934 [0.929290, 0.938271]
Maine 11051 10515.67 1.051 [1.031405, 1.070688]
Stored results
dstdize stores the following in r():
Scalars
r(k) number of populations
Macros
r(by) variable names specified in by()
r(c#)values of r(by) for #th group
Matrices
r(se) 1×kvector of standard errors of adjusted rates
r(ub) 1×kvector of upper bounds of confidence intervals for adjusted rates
r(lb) 1×kvector of lower bounds of confidence intervals for adjusted rates
r(Nobs) 1×kvector of number of observations
r(crude) 1×kvector of crude rates (*)
r(adj) 1×kvector of adjusted rates (*)
(*) If, in a group, the number of observations is 0, then 9 is stored for the corresponding crude and adjusted rates.
istdize stores the following in r():
Scalars
r(k) number of populations
Macros
r(by) variable names specified in by()
r(c#)values of r(by) for #th group
Matrices
r(cases obs) 1×kvector of number of observed cases
r(cases exp) 1×kvector of number of expected cases
r(ub adj) 1×kvector of upper bounds of confidence intervals for adjusted rates
r(lb adj) 1×kvector of lower bounds of confidence intervals for adjusted rates
r(crude) 1×kvector of crude rates
r(adj) 1×kvector of adjusted rates
r(smr) 1×kvector of SMRs
r(ub smr) 1×kvector of upper bounds of confidence intervals for SMRs
r(lb smr) 1×kvector of lower bounds of confidence intervals for SMRs
dstdize — Direct and indirect standardization 461
Methods and formulas
The directly standardized rate, SR, is defined by
SR=
k
X
i=1
wiRi
k
X
i=1
wi
(Rothman 1986, 44), where Riis the stratum-specific rate in stratum iand wiis the weight for
stratum iderived from the standard population.
If niis the population of stratum i, the standard error, se(SR), in stratified sampling for proportions
(ignoring the finite population correction) is
se(SR) = 1
Pwiv
u
u
tk
X
i=1
wi2Ri(1 Ri)
ni
(Cochran 1977, 108), from which the confidence intervals are calculated.
For indirect standardization, define Oas the observed number of cases in each population to be
standardized; n1, . . . , nkas the distribution across the various strata for the population being studied;
R1, . . . , Rkas the stratum-specific rates for the standard population; and Cas the crude rate of the
standard population. The expected number of cases (deaths), E, in each population is obtained by
applying the standard population stratum-specific rates, R1, . . . , Rk, to the study populations:
E=
k
X
i=1
niRi
The indirectly adjusted rate is then
Rindirect =CO
E
and O/E is the study population’s SMR if death is the event of interest or the SIR for studies of
disease (or other) incidence.
The exact confidence interval is calculated for each estimated SMR by assuming a Poisson process
as described in Breslow and Day (1987, 69–71). These intervals are obtained by first calculating
the upper and lower bounds for the confidence interval of the Poisson-distributed observed events,
Osay, Land U, respectivelyand then computing SMRL=L/E and SMRU=U/E.
Acknowledgments
We gratefully acknowledge the collaboration of Dr. Joel A. Harrison, consultant; Dr. Jos´
e Maria
Pacheco of the Departamento de Epidemiologia, Faculdade de Sa´
ude P´
ublica/USP, Sao Paulo, Brazil;
and Dr John L. Moran of the Queen Elizabeth Hospital, Woodville, Australia.
462 dstdize — Direct and indirect standardization
References
Breslow, N. E., and N. E. Day. 1987. Statistical Methods in Cancer Research: Vol. 2—The Design and Analysis of
Cohort Studies. Lyon: IARC.
Cleves, M. A. 1998. sg80: Indirect standardization.Stata Technical Bulletin 42: 43–47. Reprinted in Stata Technical
Bulletin Reprints, vol. 7, pp. 224–228. College Station, TX: Stata Press.
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Consonni, D. 2012. A command to calculate age-standardized rates with efficient interval estimation.Stata Journal
12: 688–701.
Fleiss, J. L., B. Levin, and M. C. Paik. 2003. Statistical Methods for Rates and Proportions. 3rd ed. New York:
Wiley.
Forthofer, R. N., and E. S. Lee. 1995. Introduction to Biostatistics: A Guide to Design, Analysis, and Discovery.
New York: Academic Press.
Juul, S., and M. Frydenberg. 2014. An Introduction to Stata for Health Researchers. 4th ed. College Station, TX:
Stata Press.
Kahn, H. A., and C. T. Sempos. 1989. Statistical Methods in Epidemiology. New York: Oxford University Press.
Kirkwood, B. R., and J. A. C. Sterne. 2003. Essential Medical Statistics. 2nd ed. Malden, MA: Blackwell.
McGuire, T. J., and J. A. Harrison. 1994. sbe11: Direct standardization.Stata Technical Bulletin 21: 5–9. Reprinted
in Stata Technical Bulletin Reprints, vol. 4, pp. 88–94. College Station, TX: Stata Press.
Pagano, M., and K. Gauvreau. 2000. Principles of Biostatistics. 2nd ed. Belmont, CA: Duxbury.
Rothman, K. J. 1986. Modern Epidemiology. Boston: Little, Brown.
van Belle, G., L. D. Fisher, P. J. Heagerty, and T. S. Lumley. 2004. Biostatistics: A Methodology for the Health
Sciences. 2nd ed. New York: Wiley.
Wang, D. 2000. sbe40: Modeling mortality data using the Lee–Carter model.Stata Technical Bulletin 57: 15–17.
Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 118–121. College Station, TX: Stata Press.
Also see
[ST]epitab Tables for epidemiologists
[SVY]direct standardization Direct standardization of means, proportions, and ratios
Title
dydx — Calculate numeric derivatives and integrals
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
Derivatives of numeric functions
dydx yvar xvar if  in , generate(newvar)dydx options
Integrals of numeric functions
integ yvar xvar if  in  ,integ options
dydx options Description
Main
generate(newvar)create variable named newvar
replace overwrite the existing variable
generate(newvar)is required.
integ options Description
Main
generate(newvar)create variable named newvar
trapezoid use trapezoidal rule to compute integrals; default is cubic splines
initial(#)initial value of integral; default is initial(0)
replace overwrite the existing variable
by is allowed with dydx and integ; see [D] by.
Menu
dydx
Data >Create or change data >Other variable-creation commands >Calculate numerical derivatives
integ
Data >Create or change data >Other variable-creation commands >Calculate numeric integrals
463
464 dydx — Calculate numeric derivatives and integrals
Description
dydx and integ calculate derivatives and integrals of numeric “functions”.
Options
 
Main
generate(newvar)specifies the name of the new variable to be created. It must be specified with
dydx.
trapezoid requests that the trapezoidal rule [the sum of (xixi1)(yi+yi1)/2be used to
compute integrals. The default is cubic splines, which give superior results for most smooth
functions; for irregular functions, trapezoid may give better results.
initial(#)specifies the initial condition for calculating definite integrals; see Methods and formulas
below. The default is initial(0).
replace specifies that if an existing variable is specified for generate(), it should be overwritten.
Remarks and examples
dydx and integ lets you extend Stata’s graphics capabilities beyond data analysis and into
mathematics. (See Gould [1993] for another command that draws functions.)
Example 1
We graph y=ex/6sin(x)over the interval [0,12.56 ]:
. range x 0 12.56 100
obs was 0, now 100
. generate y = exp(-x/6)*sin(x)
. label variable y "exp(-x/6)*sin(x)"
. twoway connected y x, connect(i) yline(0)
−.5 0 .5 1
exp(−x/6)*sin(x)
0 5 10 15
x
dydx — Calculate numeric derivatives and integrals 465
We estimate the derivative by using dydx and compute the relative difference between this estimate
and the true derivative.
. dydx y x, gen(dy)
. generate dytrue = exp(-x/6)*(cos(x) - sin(x)/6)
. generate error = abs(dy - dytrue)/dytrue
The error is greatest at the endpoints, as we would expect. The error is approximately 0.5% at each
endpoint, but the error quickly falls to less than 0.01%.
. label variable error "Error in derivative estimate"
. twoway line error x, ylabel(0(.002).006)
0 .002 .004 .006
Error in derivative estimate
0 5 10 15
x
We now estimate the integral by using integ:
. integ y x, gen(iy)
number of points = 100
integral = .85316396
. generate iytrue = (36/37)*(1 - exp(-x/6)*(cos(x) + sin(x)/6))
. display iytrue[_N]
.85315901
. display abs(r(integral) - iytrue[_N])/iytrue[_N]
5.799e-06
. generate diff = iy - iytrue
The relative difference between the estimate [stored in r(integral)] and the true value of the
integral is about 6 ×106. A graph of the absolute difference (diff) is shown below. Here error is
cumulative. Again most of the error is due to a relatively poorer fit near the endpoints.
. label variable diff "Error in integral estimate"
. twoway line diff x, ylabel(0(5.00e-06).00001)
466 dydx — Calculate numeric derivatives and integrals
0 5.00e−06 .00001
Error in integral estimate
0 5 10 15
x
Stored results
dydx stores the following in r():
Macros
r(y) name of yvar
integ stores the following in r():
Scalars
r(N points) number of unique xpoints
r(integral) estimate of the integral
Methods and formulas
Consider a set of data points, (x1, y1),. . . ,(xn, yn), generated by a function y=f(x).dydx and
integ first fit these points with a cubic spline, which is then analytically differentiated (integrated)
to give an approximation for the derivative (integral) of f.
The cubic spline (see, for example, Press et al. [2007]) consists of n1 cubic polynomials Pi(x),
with the ith one defined on the interval [xi, xi+1],
Pi(x) = yiai(x) + yi+1bi(x) + y00
ici(x) + y00
i+1di(x)
where
ai(x) = xi+1 x
xi+1 xi
ci(x) = 1
6(xi+1 xi)2ai(x)[{ai(x)}21]
bi(x) = xxi
xi+1 xi
di(x) = 1
6(xi+1 xi)2bi(x)[{bi(x)}21]
and y00
iand y00
i+1 are constants whose values will be determined as described below. The notation for
these constants is justified because P00
i(xi) = y00
iand P00
i(xi+1) = y00
i+1.
dydx — Calculate numeric derivatives and integrals 467
Because ai(xi) = 1, ai(xi+1) = 0, bi(xi) = 0, and bi(xi+1) = 1. Therefore, Pi(xi) = yi, and
Pi(xi+1) = yi+1. Thus the Pijointly define a function that is continuous at the interval boundaries.
The first derivative should be continuous at the interval boundaries; that is,
P0
i(xi+1) = P0
i+1(xi+1)
The above n2 equations (one equation for each point except the two endpoints) and the values of
the first derivative at the endpoints, P0
1(x1)and P0
n1(xn), determine the nconstants y00
i.
The value of the first derivative at an endpoint is set to the value of the derivative obtained by
fitting a quadratic to the endpoint and the two adjacent points; namely, we use
P0
1(x1) = y1y2
x1x2
+y1y3
x1x3y2y3
x2x3
and a similar formula for the upper endpoint.
dydx approximates f0(xi)by using P0
i(xi).
integ approximates F(xi) = F(x1) + Rxi
x1f(x)dx by using
I0+
i1
X
k=1 Zxk+1
xk
Pk(x)dx
where I0(an estimate of F(x1)) is the value specified by the initial(#)option. If the trapezoid
option is specified, integ approximates the integral by using the trapezoidal rule:
I0+
i1
X
k=1
1
2(xk+1 xk)(yk+1 +yk)
If there are ties among the xi, the mean of yiis computed at each set of ties and the cubic spline
is fit to these values.
Acknowledgment
The present versions of dydx and integ were inspired by the dydx2 command written by Patrick
Royston of the MRC Clinical Trials Unit, London, and coauthor of the Stata Press book Flexible
Parametric Survival Analysis Using Stata: Beyond the Cox Model.
References
Gould, W. W. 1993. ssi5.1: Graphing functions.Stata Technical Bulletin 16: 23–26. Reprinted in Stata Technical
Bulletin Reprints, vol. 3, pp. 188–193. College Station, TX: Stata Press.
. 1997. crc46: Better numerical derivatives and integrals.Stata Technical Bulletin 35: 3–5. Reprinted in Stata
Technical Bulletin Reprints, vol. 6, pp. 8–12. College Station, TX: Stata Press.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific
Computing. 3rd ed. New York: Cambridge University Press.
468 dydx — Calculate numeric derivatives and integrals
Also see
[D]obs Increase the number of observations in a dataset
[D]range Generate numerical range
Title
eform option — Displaying exponentiated coefficients
Description Remarks and examples Reference Also see
Description
An eform option causes the coefficient table to be displayed in exponentiated form: for each
coefficient, ebrather than bis displayed. Standard errors and confidence intervals (CIs) are also
transformed.
An eform option is one of the following:
eform option Description
eform(string)use string for the column title
eform exponentiated coefficient, string is exp(b)
hr hazard ratio, string is Haz. Ratio
shr subhazard ratio, string is SHR
irr incidence-rate ratio, string is IRR
or odds ratio, string is Odds Ratio
rrr relative-risk ratio, string is RRR
Remarks and examples
Example 1
Here is a simple example of the or option with svy: logit. The CI for the odds ratio is computed
by transforming (by exponentiating) the endpoints of the CI for the corresponding coefficient.
. use http://www.stata-press.com/data/r13/nhanes2d
. svy, or: logit highbp female black
(running logit on estimation sample)
(output omitted )
Linearized
highbp Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
female .6107011 .0326159 -9.23 0.000 .5476753 .6809798
black 1.384865 .1336054 3.37 0.002 1.137507 1.686011
_cons .7249332 .0551062 -4.23 0.000 .6208222 .8465035
We also could have specified the following command and received the same results as above:
. svy: logit highbp female black, or
469
470 eform option — Displaying exponentiated coefficients
Reference
Buis, M. L. 2012. Stata tip 107: The baseline is now reported.Stata Journal 12: 165–166.
Also see
[R]ml Maximum likelihood estimation
Title
eivreg — Errors-in-variables regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
eivreg depvar indepvars if in weight ,options
options Description
Model
reliab(indepvar #indepvar # . . . )
specify measurement reliability for each indepvar measured with error
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap,by,jackknife,rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights and fweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Errors-in-variables regression
Description
eivreg fits errors-in-variables regression models.
Options
 
Model
reliab(indepvar #indepvar # . . . )specifies the measurement reliability for each independent
variable measured with error. Reliabilities are specified as pairs consisting of an independent
variable name (a name that appears in indepvars) and the corresponding reliability r, 0 < r 1.
Independent variables for which no reliability is specified are assumed to have reliability 1. If the
option is not specified, all variables are assumed to have reliability 1, and the result is thus the
same as that produced by regress (the ordinary least-squares results).
471
472 eivreg — Errors-in-variables regression
 
Reporting
level(#); see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
The following option is available with eivreg but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
For an introduction to errors-in-variables regression, see Draper and Smith (1998, 89–91) or
Kmenta (1997, 352–357). Treiman (2009, 258–261) compares the results of errors-in-variables re-
gression with conventional regression.
Errors-in-variables regression models are useful when one or more of the independent variables are
measured with additive noise. Standard regression (as performed by regress) would underestimate
the effect of the variable, and the other coefficients in the model can be biased to the extent that
they are correlated with the poorly measured variable. You can adjust for the biases if you know the
reliability:
r= 1 noise variance
total variance
That is, given the model y=Xβ+u, for some variable xiin X, the xiis observed with error,
xi=x
i+e, and the noise variance is the variance of e. The total variance is the variance of xi.
Example 1
Say that in our automobile data, the weight of cars was measured with error, and the reliability
of our measured weight is 0.85. The result of this would be to underestimate the effect of weight
in a regression of, say, price on weight and foreign, and it would also bias the estimate of the
coefficient on foreign (because being of foreign manufacture is correlated with the weight of cars).
We would ignore all of this if we fit the model with regress:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 35.35
Model 316859273 2 158429637 Prob > F = 0.0000
Residual 318206123 71 4481776.38 R-squared = 0.4989
Adj R-squared = 0.4848
Total 635065396 73 8699525.97 Root MSE = 2117
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 3.320737 .3958784 8.39 0.000 2.531378 4.110096
foreign 3637.001 668.583 5.44 0.000 2303.885 4970.118
_cons -4942.844 1345.591 -3.67 0.000 -7625.876 -2259.812
eivreg — Errors-in-variables regression 473
With eivreg, we can account for our measurement error:
. eivreg price weight foreign, r(weight .85)
assumed Errors-in-variables regression
variable reliability
Number of obs = 74
weight 0.8500 F( 2, 71) = 50.37
* 1.0000 Prob > F = 0.0000
R-squared = 0.6483
Root MSE = 1773.54
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 4.31985 .431431 10.01 0.000 3.459601 5.180099
foreign 4637.32 624.5362 7.43 0.000 3392.03 5882.609
_cons -8257.017 1452.086 -5.69 0.000 -11152.39 -5361.639
The effect of weight is increased, as we knew it would be, and here the effect of foreign manufacture
is also increased. A priori, we knew only that the estimate of foreign might be biased; we did not
know the direction.
Technical note
Swept under the rug in our example is how we would determine the reliability, r. We can easily
see that a variable is measured with error, but we may not know the reliability because the ingredients
for calculating r depend on the unobserved noise.
For our example, we made up a value for r, and in fact we do not believe that weight is measured
with error at all, so the reported eivreg results have no validity. The regress results were the
statistically correct results here.
But let’s say that we do suspect that weight is measured with error and that we do not know r.
We could then experiment with various values of r to describe the sensitivity of our estimates to
possible error levels. We may not know r, but r does have a simple interpretation, and we could
probably produce a sensible range for r by thinking about how the data were collected.
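For instance, a minimal sketch of such a sensitivity analysis in a do-file, refitting the model over an arbitrary grid of reliabilities (the grid values are ours, chosen only for illustration):

    foreach r in 0.70 0.75 0.80 0.85 0.90 0.95 {
        quietly eivreg price weight foreign, r(weight `r')
        display "reliability = `r'   coefficient on weight = " %9.4f _b[weight]
    }

Each pass refits the model with a different assumed reliability for weight and reports how the coefficient moves.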
If the reliability, r, is less than the R² from a regression of the poorly measured variable on all
the other variables, including the dependent variable, the information might as well not have been
collected; no adjustment to the final results is possible. For our automobile data, running a regression
of weight on foreign and price would result in an R² of 0.6743. Thus the reliability must be at
least 0.6743 here. If we specify a reliability that is too small, eivreg will inform us and refuse to
fit the model:
. eivreg price weight foreign, r(weight .6742)
reliability r() too small
r(399);
Returning to our problem of how to estimate r, too small or not, if the measurements are summaries
of scaled items, the reliability may be estimated using the alpha command; see [MV] alpha. If the
score is computed from factor analysis and the data are scored using predict’s default options (see
[MV] factor postestimation), the square of the standard deviation of the score is an estimate of the
reliability.
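A minimal sketch of both approaches, assuming the score is built from hypothetical item variables item1-item5 (the variable names are ours and do not correspond to any shipped dataset):

    * reliability of a summative scale: alpha reports the scale reliability coefficient
    alpha item1-item5, item

    * reliability of a score from factor analysis, using predict's default scoring
    factor item1-item5
    predict f1
    quietly summarize f1
    display "estimated reliability = " r(sd)^2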
Technical note
Consider a model with more than one variable measured with error. For instance, say that our
model is that price is a function of weight,foreign, and mpg and that both weight and mpg are
measured with error.
. eivreg price weight foreign mpg, r(weight .85 mpg .9)
assumed Errors-in-variables regression
variable reliability
Number of obs = 74
weight 0.8500 F( 3, 70) = 429.14
mpg 0.9000 Prob > F = 0.0000
* 1.0000 R-squared = 0.9728
Root MSE = 496.41
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 12.88302 .6820532 18.89 0.000 11.52271 14.24333
foreign 8268.951 352.8719 23.43 0.000 7565.17 8972.732
mpg 999.2043 73.60037 13.58 0.000 852.413 1145.996
_cons -56473.19 3710.015 -15.22 0.000 -63872.58 -49073.8
Stored results
eivreg stores the following in e():
Scalars
e(N) number of observations
e(df_m) model degrees of freedom
e(df_r) residual degrees of freedom
e(r2) R-squared
e(F) Fstatistic
e(rmse) root mean squared error
e(rank) rank of e(V)
Macros
e(cmd) eivreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(rellist) indepvars and associated reliabilities
e(wtype) weight type
e(wexp) weight expression
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Let the model to be fit be
y = X*β + e
X = X* + U
where X* are the true values and X are the observed values. Let W be the user-specified weights. If
no weights are specified, W = I. If weights are specified, let v be the specified weights. If fweight
frequency weights are specified, then W = diag(v). If aweight analytic weights are specified,
then W = diag{v(1'1)/(1'v)}, meaning that the weights are normalized to sum to the number of
observations.
The estimates b of β are obtained as A⁻¹X'Wy, where A = X'WX − S. S is a diagonal
matrix with elements N(1 − r_i)s_i². N is the number of observations, r_i is the user-specified reliability
coefficient for the ith explanatory variable or 1 if not specified, and s_i² is the (appropriately weighted)
variance of the variable.
The variance–covariance matrix of the estimators is obtained as s²A⁻¹X'WXA⁻¹, where
s² = (y'Wy − bAb')/(N − p), the square of the root mean squared error, and p is the number of
estimated parameters.
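As a check on these formulas, the following sketch reproduces the unweighted point estimates of example 1 by hand with Stata's matrix commands. It assumes that r(Var) from summarize is an acceptable estimate of the variance of weight; the divisor used internally by eivreg is not spelled out above, so small discrepancies are possible.

    use http://www.stata-press.com/data/r13/auto, clear
    matrix accum XX = weight foreign             // X'WX with W = I; constant added as last column
    matrix vecaccum yX = price weight foreign    // y'X
    quietly summarize weight
    matrix S = J(3, 3, 0)
    matrix S[1, 1] = r(N)*(1 - 0.85)*r(Var)      // N(1 - r)s^2 for weight; 0 for the other columns
    matrix A = XX - S
    matrix b = invsym(A)*yX'
    matrix list b                                // compare with the eivreg coefficients in example 1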
References
Draper, N., and H. Smith. 1998. Applied Regression Analysis. 3rd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Treiman, D. J. 2009. Quantitative Data Analysis: Doing Social Research to Test Ideas. San Francisco: Jossey-Bass.
Also see
[R] eivreg postestimation — Postestimation tools for eivreg
[R] regress — Linear regression
[SEM] example 24 — Reliability
[U] 20 Estimation and postestimation commands
Title
eivreg postestimation — Postestimation tools for eivreg
Description Syntax for predict Menu for predict Options for predict
Also see
Description
The following postestimation commands are available after eivreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Syntax for predict
predict [type] newvar [if] [in] [, statistic]

statistic      Description
Main
  xb           linear prediction; the default
  residuals    residuals
  stdp         standard error of the prediction
  stdf         standard error of the forecast
  pr(a,b)      Pr(a < y_j < b)
  e(a,b)       E(y_j | a < y_j < b)
  ystar(a,b)   E(y*_j), where y*_j = max{a, min(y_j, b)}
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .)
means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
residuals calculates the residuals, that is, y_j − x_j b.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation and is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.
pr(a,b) calculates Pr(a < x_j b + u_j < b), the probability that y_j | x_j would be observed in the
interval (a, b).
a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < x_j b + u_j < 30);
pr(lb,ub) calculates Pr(lb < x_j b + u_j < ub); and
pr(20,ub) calculates Pr(20 < x_j b + u_j < ub).
a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < x_j b + u_j < 30);
pr(lb,30) calculates Pr(−∞ < x_j b + u_j < 30) in observations for which lb ≥ .
and calculates Pr(lb < x_j b + u_j < 30) elsewhere.
b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > x_j b + u_j > 20);
pr(20,ub) calculates Pr(+∞ > x_j b + u_j > 20) in observations for which ub ≥ .
and calculates Pr(20 < x_j b + u_j < ub) elsewhere.
e(a,b) calculates E(x_j b + u_j | a < x_j b + u_j < b), the expected value of y_j | x_j conditional on
y_j | x_j being in the interval (a, b), meaning that y_j | x_j is truncated. a and b are specified as they
are for pr().
ystar(a,b) calculates E(y*_j), where y*_j = a if x_j b + u_j ≤ a, y*_j = b if x_j b + u_j ≥ b, and
y*_j = x_j b + u_j otherwise, meaning that y*_j is censored. a and b are specified as they are for pr().
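For instance, a brief sketch of these options in use after eivreg (the new variable names are ours):

    use http://www.stata-press.com/data/r13/auto, clear
    eivreg price weight foreign, r(weight .85)
    predict xbhat                       // linear prediction, the default
    predict res, residuals              // residuals
    predict sehat, stdp                 // standard error of the prediction
    predict pmid, pr(5000,10000)        // Pr(5,000 < price < 10,000) given the covariates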
Also see
[R] eivreg — Errors-in-variables regression
[U] 20 Estimation and postestimation commands
Title
error messages — Error messages and return codes
Description Also see
Description
Whenever Stata detects that something is wrong (that what you typed is uninterpretable, that you
are trying to do something you should not be trying to do, or that you requested the impossible), Stata
responds by typing a message describing the problem, together with a return code. For instance,
. lsit
unrecognized command: lsit
r(199);
. list myvar
variable myvar not found
r(111);
. test a=b
last estimates not found
r(301);
In each case, the message is probably sufficient to guide you to a solution. When we typed
lsit, Stata responded with “unrecognized command”. We meant to type list. When we typed
list myvar, Stata responded with “variable myvar not found”. There is no variable named myvar
in our data. When we typed test a=b, Stata responded with “last estimates not found”. test tests
hypotheses about previously fit models, and we have not yet fit a model.
The numbers in parentheses in the r(199), r(111), and r(301) messages are called the return
codes. To find out more about these messages, type search rc #, where # is the number returned
in the parentheses.
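In a do-file or program, a return code can also be inspected directly: the capture prefix suppresses the error message and leaves the return code in _rc. A minimal sketch, assuming as above that no variable named myvar exists:

    capture list myvar     // the error message is suppressed
    display _rc            // 111, the return code for "variable not found"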
Example 1
. search rc 301
[P] error messages . . . . . . . . . . . . . . . . . . . . Return code 301
        last estimates not found;
        You typed an estimation command such as regress without arguments
        or attempted to perform a test or typed predict, but there were no
        previous estimation results.
Programmers should see [P] error for details on programming error messages.
Also see
[R] search — Search Stata documentation and other resources
Title
esize — Effect size based on mean comparison
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Effect sizes for two independent samples using groups
esize twosample varname [if] [in] , by(groupvar) [options]
Effect sizes for two independent samples using variables
esize unpaired varname1 == varname2 [if] [in] [, options]
Immediate form of effect sizes for two independent samples
esizei #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options]
Immediate form of effect sizes for F tests after an ANOVA
esizei #df1 #df2 #F [, level(#)]
options Description
Main
cohensd         report Cohen’s d (1988)
hedgesg         report Hedges’s g (1981)
glassdelta      report Glass’s Δ (Smith and Glass 1977) using each group’s standard deviation
pbcorr          report the point-biserial correlation coefficient (Pearson 1909)
all             report all estimates of effect size
unequal         use unequal variances
welch           use Welch’s (1947) approximation
level(#)        set confidence level; default is level(95)
by is allowed with esize; see [D] by.
Menu
esize
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Effect size based on mean comparison
esizei
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Effect-size calculator
Description
esize calculates effect sizes for comparing the difference between the means of a continuous
variable for two groups. In the first form, esize calculates effect sizes for the difference between the
mean of varname for two groups defined by groupvar. In the second form, esize calculates effect
sizes for the difference between varname1 and varname2, assuming unpaired data.
esizei is the immediate form of esize; see [U] 19 Immediate commands. In the first form,
esizei calculates the effect size for comparing the difference between the means of two groups. In
the second form, esizei calculates the effect size for an F test after an ANOVA.
Options
 
Main
by(groupvar)specifies the groupvar that defines the two groups that esize will use to estimate the
effect sizes. Do not confuse the by() option with the by prefix; you can specify both.
cohensd specifies that Cohen’s d (1988) be reported.
hedgesg specifies that Hedges’s g (1981) be reported.
glassdelta specifies that Glass’s Δ (Smith and Glass 1977) be reported.
pbcorr specifies that the point-biserial correlation coefficient (Pearson 1909) be reported.
all specifies that all estimates of effect size be reported. The default is Cohen’s d and Hedges’s g.
unequal specifies that the data not be assumed to have equal variances.
welch specifies that the approximate degrees of freedom for the test be obtained from Welch’s formula
(1947) rather than from Satterthwaite’s approximation formula (1946), which is the default when
unequal is specified. Specifying welch implies unequal.
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
Remarks and examples
Whereas p-values are used to assess the statistical significance of a result, measures of effect size
are used to assess the practical significance of a result. Effect sizes can be broadly categorized as
“measures of group differences” (the d family) and “measures of association” (the r family); see
Ellis (2010, table 1.1). The d family includes estimators such as Cohen’s d, Hedges’s g, and Glass’s Δ.
The r family includes estimators such as the point-biserial correlation coefficient, ω², and η² (also see
estat esize in [R] regress postestimation). For an introduction to the concepts and calculation of
effect sizes, see Kline (2013) and Thompson (2006). For a more detailed discussion, see Kirk (1996),
Ellis (2010), Cumming (2012), Grissom and Kim (2012), and Kelley and Preacher (2012).
It should be noted that there is much variation in the definitions of measures of effect size
(Kline 2013). As Ellis (2010, 27) cautions, “However, beware the inconsistent terminology. What is
labeled here as g was labeled by Hedges and Olkin as d and vice versa. For these authors writing in
the early 1980s, g was the mainstream effect-size index developed by Cohen and refined by Glass
(hence g for Glass). However, since then g has become synonymous with Hedges’s equation (not
Glass’s) and the reason it is called Hedges’s g and not Hedges’s h is because it was originally named
after Glass—even though it was developed by Larry Hedges. Confused?”
To avoid confusion, esize and esizei closely follow the notation of Hedges (1981), Smithson (2001),
Kline (2013), and Ellis (2010).
Example 1: Effect size for two independent samples using by()
Suppose we are interested in question 1 from the fictitious depression.dta: “My statistical
software makes me feel sad”. We might have conducted a t test to test the null hypothesis that there is
no difference in response by sex. We could then compute various measures of effect size to describe
the magnitude of the effect of sex.
. use http://www.stata-press.com/data/r13/depression
(Fictitious Depression Inventory data based on the Beck Depression Inventory)
. esize twosample qu1, by(sex) all
Effect size based on mean comparison
Obs per group:
Female = 712
Male = 288
Effect Size Estimate [95% Conf. Interval]
Cohen’s d-.0512417 -.1881184 .0856607
Hedges’s g-.0512032 -.187977 .0855963
Glass’s Delta 1 -.0517793 -.1886587 .0851364
Glass’s Delta 2 -.0499786 -.1868673 .086997
Point-Biserial r -.0232208 -.0849629 .0387995
Cohen’s d, Hedges’s g, and both estimates of Glass’s Δ indicate that the score for females is 0.05
standard deviations lower than the score for males. The point-biserial correlation coefficient indicates
that there is a small, negative correlation between the scores for females and males.
Technical note
Glass’s Δ has traditionally been estimated for experimental studies using the control-group standard
deviation rather than the pooled standard deviation. Kline (2013) notes that the choice of group becomes
arbitrary for data arising from observational studies and recommends reporting Glass’s Δ using
each group’s standard deviation.
Example 2: Effect size for two independent samples by a third variable
If we are interested in the same effect sizes from example 1 stratified by race, we could use the
by prefix with the sort option to accomplish this task.
. by race, sort: esize twosample qu1, by(sex)
-> race = Hispanic
Effect size based on mean comparison
Obs per group:
Female = 88
Male = 45
Effect Size Estimate [95% Conf. Interval]
Cohen’s d-.1042883 -.463503 .2553235
Hedges’s g-.1036899 -.4608434 .2538584
-> race = Black
Effect size based on mean comparison
Obs per group:
Female = 259
Male = 95
Effect Size Estimate [95% Conf. Interval]
Cohen’s d-.1720681 -.4073814 .063489
Hedges’s g-.1717012 -.4065128 .0633536
-> race = White
Effect size based on mean comparison
Obs per group:
Female = 365
Male = 148
Effect Size Estimate [95% Conf. Interval]
Cohen’s d.0479511 -.1430932 .2389486
Hedges’s g.0478807 -.1428831 .2385977
Example 3: Bootstrap confidence intervals for effect sizes
Simulation studies have shown that bootstrap confidence intervals may be preferable to confidence
intervals based on the noncentral t distribution when the variable of interest does not have a normal
distribution (Kelley 2005; Algina, Keselman, and Penfield 2006). Bootstrap confidence intervals can
be easily estimated for effect sizes using the bootstrap prefix.
. use http://www.stata-press.com/data/r13/depression
(Fictitious Depression Inventory data based on the Beck Depression Inventory)
. set seed 12345
. bootstrap r(d) r(g), reps(1000) nodots nowarn: esize twosample qu1, by(sex)
Bootstrap results Number of obs = 1000
Replications = 1000
command: esize twosample qu1, by(sex)
_bs_1: r(d)
_bs_2: r(g)
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
_bs_1 -.0512417 .07169 -0.71 0.475 -.1917515 .0892682
_bs_2 -.0512032 .0716361 -0.71 0.475 -.1916074 .0892011
Example 4: Effect sizes for two independent samples using variables
Sometimes, the data of interest are stored in two separate variables. We can calculate effect sizes
for the two groups by using the unpaired version of esize.
. use http://www.stata-press.com/data/r13/fuel
. esize unpaired mpg1==mpg2
Effect size based on mean comparison
Number of obs = 24
Effect Size Estimate [95% Conf. Interval]
Cohen’s d-.5829654 -1.394934 .2416105
Hedges’s g-.5628243 -1.34674 .2332631
Example 5: Immediate form for effect sizes for two means
Often we do not have access to raw data, but we are given summary statistics in a report or
manuscript. To calculate the effect sizes from summary statistics, we can use the immediate command
esizei. For example, Kline (2013) in table 4.2 shows summary statistics for a hypothetical sample
where mean1=13, sd1=2.74, mean2=11, and sd2=2.24; there are 30 people in each group.
We can estimate the effect sizes from these summary data using esizei:
. esizei 30 13 2.74 30 11 2.24
Effect size based on mean comparison
Obs per group:
Group 1 = 30
Group 2 = 30
Effect Size Estimate [95% Conf. Interval]
Cohen’s d.7991948 .2695509 1.322465
Hedges’s g.7888081 .2660477 1.305277
Example 6: Immediate form for effect sizes for F tests after an ANOVA
esizei can also be used to compute η² and ω² for F tests after an ANOVA. The following
example from Smithson (2001, 623) illustrates the use of esizei for df_num = 4, df_den = 50, and
F = 4.2317.
. esizei 4 50 4.2317, level(90)
Effect sizes for linear models
Effect Size Estimate [90% Conf. Interval]
Eta-Squared .2529151 .0521585 .3603621
Omega-Squared .1931483 0 .309191
Stored results
esize and esizei for comparing two means store the following in r():
Scalars
r(d) Cohen’s d
r(lb_d) lower confidence bound for Cohen’s d
r(ub_d) upper confidence bound for Cohen’s d
r(g) Hedges’s g
r(lb_g) lower confidence bound for Hedges’s g
r(ub_g) upper confidence bound for Hedges’s g
r(delta1) Glass’s Δ for group 1
r(lb_delta1) lower confidence bound for Glass’s Δ for group 1
r(ub_delta1) upper confidence bound for Glass’s Δ for group 1
r(delta2) Glass’s Δ for group 2
r(lb_delta2) lower confidence bound for Glass’s Δ for group 2
r(ub_delta2) upper confidence bound for Glass’s Δ for group 2
r(r_pb) point-biserial correlation coefficient
r(lb_r_pb) lower confidence bound for the point-biserial correlation coefficient
r(ub_r_pb) upper confidence bound for the point-biserial correlation coefficient
r(N_1) sample size n1
r(N_2) sample size n2
r(df_t) degrees of freedom
r(level) confidence level
esizei for F tests after ANOVA stores the following in r():
Scalars
r(eta2) η²
r(lb_eta2) lower confidence bound for η²
r(ub_eta2) upper confidence bound for η²
r(omega2) ω²
r(lb_omega2) lower confidence bound for ω²
r(ub_omega2) upper confidence bound for ω²
r(level) confidence level
Methods and formulas
For the d family, the effect-size parameter of interest is the scaled difference between the means,
given by

    δ = (μ1 − μ2)/σ

One of the most popular estimators of effect size is Cohen’s d, given by

    Cohen’s d = (x̄1 − x̄2)/s

where

    s = sqrt{ [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) }
Hedges (1981) showed that Cohen’s d is biased and proposed the unbiased estimator

    Hedges’s g = Cohen’s d × c(m)

where m = n1 + n2 − 2 and

    c(m) = Γ(m/2) / { sqrt(m/2) Γ((m − 1)/2) }

Glass (Smith and Glass 1977) proposed an estimator for δ in the context of designed experiments,

    Glass’s Δ = (x̄_treated − x̄_control)/s_control

where s_control is the standard deviation for the control group.
As noted above, esize and esizei report two estimates of Glass’s Δ: one using the standard
deviation for group 1 and the other using the standard deviation for group 2:

    Glass’s Δ1 = (x̄1 − x̄2)/s1
and
    Glass’s Δ2 = (x̄1 − x̄2)/s2
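As a check on these formulas, the following sketch reproduces by hand the Cohen's d and Hedges's g reported in example 5 from the summary statistics given there; lngamma() is used to evaluate c(m):

    scalar n1 = 30
    scalar n2 = 30
    scalar s  = sqrt(((n1-1)*2.74^2 + (n2-1)*2.24^2)/(n1 + n2 - 2))
    scalar d  = (13 - 11)/s                                        // Cohen's d, about 0.799
    scalar m  = n1 + n2 - 2
    scalar cm = exp(lngamma(m/2))/(sqrt(m/2)*exp(lngamma((m-1)/2)))
    scalar g  = d*cm                                               // Hedges's g, about 0.789
    display "d = " d "   g = " g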
For the r family, the effect-size parameter of interest is the ratio of the variance attributable to an
effect and the total variance:

    η² = σ²_effect / σ²_total

A popular estimator of η when there are two groups is the point-biserial correlation coefficient,

    r_PB = t / sqrt(t² + df)

where t is the t statistic for the difference between the means of the two groups, and df is the
corresponding degrees of freedom. Satterthwaite’s or Welch’s adjustment (see [R] ttest for details) to
the degrees of freedom can be used to calculate r_PB by specifying the unequal or welch option,
respectively.
When more than two means are being compared, as in the case of an ANOVA with p groups, a
popular estimator of effect size is the correlation ratio denoted η² (Fisher 1925; Kerlinger 1964). η²
can be computed directly as the ratio of the SS_effect and the SS_total or as a function of the F statistic
with numerator degrees of freedom equal to df_num and denominator degrees of freedom equal to
df_den:

    η̂² = (F × df_num) / {(F × df_num) + df_den}

Like its equivalent estimator R², η² has an upward bias. The less biased (though not unbiased)
estimator ω² (Hays 1963) is equivalent to the adjusted R² and can be estimated directly from the
sums of squares, the F statistic, or as a function of η²; that is,

    ω̂² = {SS_between − (p − 1)MS_within} / (SS_total + MS_within)
or
    ω̂² = (p − 1)(F − 1) / {(p − 1)(F − 1) + (p)(n)}
or
    ω̂² = η̂² − (df_num/df_den) × (1 − η̂²)

To calculate η̂² and ω̂² directly after anova or regress, see estat esize in [R] regress
postestimation.
Cohen’s d, Hedges’s g, and Glass’s Δ have been shown to have a noncentral t distribution
(Hedges 1981) with noncentrality parameter equal to

    λ = δ sqrt{ n1 n2 / (n1 + n2) }

Confidence intervals are calculated by finding the noncentrality parameters λ_lower and λ_upper that
correspond to

    Pr(df, δ, λ_lower) = 1 − α/2
and
    Pr(df, δ, λ_upper) = α/2

using the function npnt(df,t,p). The noncentrality parameters are then transformed back to the
effect-size scale:

    δ_lower = λ_lower sqrt{ (n1 + n2) / (n1 n2) }
and
    δ_upper = λ_upper sqrt{ (n1 + n2) / (n1 n2) }

(see Venables [1975]; Steiger and Fouladi [1997]; Cumming and Finch [2001]; Smithson [2001]).
Confidence intervals for the point-biserial correlation coefficient are calculated similarly and
transformed back to the effect-size scale as

    r_lower = λ_lower / sqrt(λ_lower² + df)
and
    r_upper = λ_upper / sqrt(λ_upper² + df)

Following Smithson’s (2001) notation, the F statistic is written as

    F_{df_num, df_den} = f² (df_den/df_num)

This equation has a noncentral F distribution with noncentrality parameter

    λ = f² (df_num + df_den + 1)

where f² = R²/(1 − R²).
Confidence intervals for η̂² and ω̂² are calculated by finding the noncentrality parameters λ_lower
and λ_upper for a noncentral F distribution that correspond to

    Pr(df_num, df_den, F, λ_lower) = 1 − α/2
and
    Pr(df_num, df_den, F, λ_upper) = α/2

using the function npnF(df1,df2,f,p). The noncentrality parameters are transformed back to the
η̂² scale as

    η̂²_lower = max{ 0, λ_lower / (λ_lower + df_num + df_den + 1) }
and
    η̂²_upper = min{ 1, λ_upper / (λ_upper + df_num + df_den + 1) }

The confidence limits for ω̂² are then calculated as a function of η̂²:

    ω̂²_lower = η̂²_lower − (df_num/df_den) × (1 − η̂²_lower)
and
    ω̂²_upper = η̂²_upper − (df_num/df_den) × (1 − η̂²_upper)

See Smithson (2001) for further details.
 
Fred Nichols Kerlinger (1910–1991) was born in New York City. He studied music at New
York University and graduated magna cum laude with a degree in education and philosophy.
After graduation, he joined the U.S. Army and served as a counterintelligence officer in Japan
in 1946. Kerlinger earned an MA and a PhD in educational psychology from the University of
Michigan and held faculty appointments at several universities, including New York University.
He was president of the American Educational Research Association and is best known for
his popular and influential book Foundations of Behavioral Research (1964), which introduced
Fisher’s (1925) η² statistic to behavioral researchers.
William Lee Hays (1926–1995) was born in Clarksville, Texas. He studied mathematics and
psychology at Paris Junior College in Paris, Texas, and at East Texas State College. He earned BS
and MS degrees from North Texas State University. Upon completion of his PhD in psychology
at the University of Michigan, he joined the faculty, where he eventually became associate vice
president for academic affairs. In 1977, Hays accepted an appointment as vice president for
academic affairs at the University of Texas at Austin, where he remained until his death in 1995.
Hays is best known for his book Statistics for Psychologists (1963), which introduced the ω²
statistic.
 
References
Algina, J., H. J. Keselman, and R. D. Penfield. 2006. Confidence interval coverage for Cohen’s effect size statistic.
Educational and Psychological Measurement 66: 945–960.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum.
Cumming, G. 2012. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New
York: Taylor & Francis.
Cumming, G., and S. Finch. 2001. A primer on the understanding, use, and calculation of confidence intervals that
are based on central and noncentral distributions. Educational and Psychological Measurement 61: 532–574.
Ellis, P. D. 2010. The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of
Research Results. Cambridge: Cambridge University Press.
Fisher, R. A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd.
Grissom, R. J., and J. J. Kim. 2012. Effect Sizes for Research: Univariate and Multivariate Applications. 2nd ed.
New York: Taylor & Francis.
Hays, W. L. 1963. Statistics for Psychologists. New York: Holt, Rinehart & Winston.
Hedges, L. V. 1981. Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational
Statistics 6: 107–128.
Huber, C. 2013. Measures of effect size in Stata 13. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2013/09/05/measures-of-effect-size-in-stata-13/.
Kelley, K. 2005. The effects of nonnormal distributions on confidence intervals around the standardized mean difference:
Bootstrap and parametric confidence intervals. Educational and Psychological Measurement 65: 51–69.
Kelley, K., and K. J. Preacher. 2012. On effect size. Psychological Methods 17: 137–152.
Kerlinger, F. N. 1964. Foundations of Behavioral Research. New York: Holt, Rinehart & Winston.
Kirk, R. E. 1996. Practical significance: A concept whose time has come. Educational and Psychological Measurement
56: 746–759.
Kline, R. B. 2013. Beyond Significance Testing: Statistics Reform in the Behavioral Sciences. 2nd ed. Washington,
DC: American Psychological Association.
Pearson, K. 1909. On a new method of determining correlation between a measured character A, and a character B,
of which only the percentage of cases wherein B exceeds (or falls short of) a given intensity is recorded for each
grade of A. Biometrika 7: 96–105.
Satterthwaite, F. E. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2:
110–114.
Smith, M. L., and G. V. Glass. 1977. Meta-analysis of psychotherapy outcome studies. American Psychologist 32:
752–760.
Smithson, M. 2001. Correct confidence intervals for various regression effect sizes and parameters: The importance
of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605–632.
Steiger, J. H., and R. T. Fouladi. 1997. Noncentrality interval estimation and the evaluation of statistical models. In
What If There Were No Significance Tests?, ed. L. L. Harlow, S. A. Mulaik, and J. H. Steiger, 221–257. Mahwah,
NJ: Erlbaum.
Thompson, B. 2006. Foundations of Behavioral Statistics: An Insight-Based Approach. New York: Guilford Press.
Venables, W. 1975. Calculation of confidence intervals for noncentrality parameters. Journal of the Royal Statistical
Society, Series B 37: 406–412.
Welch, B. L. 1947. The generalization of ‘student’s’ problem when several different population variances are involved.
Biometrika 34: 28–35.
Also see
[R] bitest — Binomial probability test
[R] ci — Confidence intervals for means, proportions, and counts
[R] mean — Estimate means
[R] oneway — One-way analysis of variance
[R] prtest — Tests of proportions
[R] sdtest — Variance-comparison tests
[R] ttest — t tests (mean-comparison tests)
Title
estat — Postestimation statistics
Syntax Description
Syntax
Command Reference
Display information criteria
estat ic [, n(#)]                                       [R] estat ic
Summarize estimation sample
estat summarize [eqlist] [, estat_summ_options]         [R] estat summarize
Display covariance matrix estimates
estat vce [, estat_vce_options]                         [R] estat vce
Command-specific
estat subcommand1 [, options1]
Description
estat displays scalar- and matrix-valued statistics after estimation; it complements predict,
which calculates variables after estimation. Exactly what statistics estat can calculate depends on
the previous estimation command.
Three sets of statistics are so commonly used that they are available after all estimation commands
that store the model log likelihood. estat ic displays Akaike’s and Schwarz’s Bayesian information
criteria. estat summarize summarizes the variables used by the command and automatically restricts
the sample to e(sample); it also summarizes the weight variable and cluster structure, if specified.
estat vce displays the covariance or correlation matrix of the parameter estimates of the previous
model.
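For instance, a minimal sketch of the three general-purpose subcommands after an estimation command (regress and the automobile data are used here only for illustration; any command that stores the log likelihood would do):

    use http://www.stata-press.com/data/r13/auto, clear
    quietly regress price weight foreign
    estat ic           // Akaike's and Schwarz's Bayesian information criteria
    estat summarize    // summary statistics for the estimation sample
    estat vce          // variance-covariance matrix of the estimates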
Title
estat classification — Classification statistics and table
Syntax Menu for estat Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
estat classification [if] [in] [weight] [, options]

options        Description
Main
  all           display summary statistics for all observations in the data
  cutoff(#)     positive outcome threshold; default is cutoff(0.5)
fweights are allowed; see [U] 11.1.6 weight.
estat classification is not appropriate after the svy prefix.
Menu for estat
Statistics > Postestimation > Reports and statistics
Description
estat classification reports various summary statistics, including the classification table.
estat classification requires that the current estimation results be from logistic, logit,
probit, or ivprobit; see [R] logistic, [R] logit, [R] probit, or [R] ivprobit.
Options
 
Main
all requests that the statistic be computed for all observations in the data, ignoring any if or in
restrictions specified by the estimation command.
cutoff(#) specifies the value for determining whether an observation has a predicted positive
outcome. An observation is classified as positive if its predicted probability is ≥ #. The default
is 0.5.
Remarks and examples
estat classification presents the classification statistics and classification table after logistic,
logit, probit, or ivprobit.
Statistics are produced either for the estimation sample (the default) or for any set of observations.
When weights, if, or in is used with the estimation command, it is not necessary to repeat the
qualifier when you want statistics computed for the estimation sample. Specify if,in, or the all
option only when you want statistics computed for a set of observations other than the estimation
sample. Specify weights only when you want to use a different set of weights.
Example 1
We illustrate estat classification after logistic; see [R]logistic.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. logistic low age lwt i.race smoke ptl ht ui
(output omitted )
. estat classification
Logistic model for low
True
Classified D ~D Total
+ 21 12 33
- 38 118 156
Total 59 130 189
Classified + if predicted Pr(D) >= .5
True D defined as low != 0
Sensitivity Pr( +| D) 35.59%
Specificity Pr( -|~D) 90.77%
Positive predictive value Pr( D| +) 63.64%
Negative predictive value Pr(~D| -) 75.64%
False + rate for true ~D Pr( +|~D) 9.23%
False - rate for true D Pr( -| D) 64.41%
False + rate for classified + Pr(~D| +) 36.36%
False - rate for classified - Pr( D| -) 24.36%
Correctly classified 73.54%
The overall rate of correct classification is estimated to be 73.54%, with 90.77% of the normal
weight group correctly classified (specificity) and only 35.59% of the low weight group correctly
classified (sensitivity). Classification is sensitive to the relative sizes of each component group, and
always favors classification into the larger group. This phenomenon is evident here.
By default, estat classification uses a cutoff of 0.5, although you can vary this with the
cutoff() option. You can use the lsens command to review the potential cutoffs; see [R]lsens.
Stored results
estat classification stores the following in r():
Scalars
r(P_corr) percent correctly classified
r(P_p1) sensitivity
r(P_n0) specificity
r(P_p0) false-positive rate given true negative
r(P_n1) false-negative rate given true positive
r(P_1p) positive predictive value
r(P_0n) negative predictive value
r(P_0p) false-positive rate given classified positive
r(P_1n) false-negative rate given classified negative
Methods and formulas
Let j index observations. Define c as the cutoff() specified by the user or, if not specified, as
0.5. Let p_j be the predicted probability of a positive outcome and y_j be the actual outcome, which
we will treat as 0 or 1, although Stata treats it as 0 and non-0, excluding missing observations.
A prediction is classified as positive if p_j ≥ c and otherwise is classified as negative. The
classification is correct if it is positive and y_j = 1 or if it is negative and y_j = 0.
Sensitivity is the fraction of y_j = 1 observations that are correctly classified. Specificity is the
percentage of y_j = 0 observations that are correctly classified.
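A minimal sketch of this calculation done by hand for the model in example 1; the variable names p and pos are ours:

    use http://www.stata-press.com/data/r13/lbw, clear
    quietly logistic low age lwt i.race smoke ptl ht ui
    predict p                               // predicted probability of a positive outcome
    generate byte pos = p >= .5 if p < .    // classify as positive when p >= cutoff
    tabulate pos low                        // reproduces the counts in the classification table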
References
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kohler, U., and F. Kreuter. 2012. Data Analysis Using Stata. 3rd ed. College Station, TX: Stata Press.
Also see
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[R] probit — Probit regression
[R] ivprobit — Probit model with continuous endogenous regressors
[R] lroc — Compute area under ROC curve and graph the curve
[R] lsens — Graph sensitivity and specificity versus probability cutoff
[R] estat gof — Pearson or Hosmer–Lemeshow goodness-of-fit test
[R] roc — Receiver operating characteristic (ROC) analysis
[U] 20 Estimation and postestimation commands
Title
estat gof — Pearson or Hosmer–Lemeshow goodness-of-fit test
Syntax Menu for estat Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
estat gof [if] [in] [weight] [, options]

options        Description
Main
  group(#)      perform Hosmer–Lemeshow goodness-of-fit test using # quantiles
  all           execute test for all observations in the data
  outsample     adjust degrees of freedom for samples outside estimation sample
  table         display table of groups used for test
fweights are allowed; see [U] 11.1.6 weight.
For information on using estat gof with survey data, see [SVY] estat.
Menu for estat
Statistics > Postestimation > Reports and statistics
Description
estat gof reports the Pearson goodness-of-fit test or the Hosmer–Lemeshow goodness-of-fit test.
estat gof requires that the current estimation results be from logistic, logit, or probit; see
[R] logistic, [R] logit, or [R] probit. For estat gof after poisson, see [R] poisson postestimation.
For estat gof after sem, see [SEM] estat gof.
Options
 
Main
group(#) specifies the number of quantiles to be used to group the data for the Hosmer–Lemeshow
goodness-of-fit test. group(10) is typically specified. If this option is not given, the Pearson
goodness-of-fit test is computed using the covariate patterns in the data as groups.
all requests that the statistic be computed for all observations in the data, ignoring any if or in
restrictions specified by the estimation command.
outsample adjusts the degrees of freedom for the Pearson and Hosmer–Lemeshow goodness-of-fit
tests for samples outside the estimation sample. See Samples other than the estimation sample
later in this entry.
table displays a table of the groups used for the Hosmer–Lemeshow or Pearson goodness-of-fit test
with predicted probabilities, observed and expected counts for both outcomes, and totals for each
group.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Samples other than the estimation sample
Introduction
estat gof computes goodness-of-fit tests: either the Pearson χ² test or the Hosmer–Lemeshow
test.
By default, estat gof computes statistics for the estimation sample by using the last model fit by
logistic, logit, or probit. However, samples other than the estimation sample can be specified;
see Samples other than the estimation sample later in this entry.
Example 1
estat gof, typed without options, presents the Pearson χ² goodness-of-fit test for the fitted model.
The Pearson χ² goodness-of-fit test is a test of the observed against expected number of responses
using cells defined by the covariate patterns; see predict with the number option in [R] logistic
postestimation for the definition of covariate patterns.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. logistic low age lwt i.race smoke ptl ht ui
(output omitted )
. estat gof
number of observations = 189
number of covariate patterns = 182
Pearson chi2(173) = 179.24
Prob > chi2 = 0.3567
Our model fits reasonably well. However, the number of covariate patterns is close to the number
of observations, making the applicability of the Pearson χ² test questionable but not necessarily
inappropriate. Hosmer, Lemeshow, and Sturdivant (2013, 157–160) suggest regrouping the data by
ordering on the predicted probabilities and then forming, say, 10 nearly equal-sized groups. estat
gof with the group() option does this:
. estat gof, group(10)
(Table collapsed on quantiles of estimated probabilities)
number of observations = 189
number of groups = 10
Hosmer-Lemeshow chi2(8) = 9.65
Prob > chi2 = 0.2904
Again we cannot reject our model. If we specify the table option, estat gof displays the groups
along with the expected and observed number of positive responses (low-birthweight babies):
. estat gof, group(10) table
(Table collapsed on quantiles of estimated probabilities)
Group Prob Obs_1 Exp_1 Obs_0 Exp_0 Total
1 0.0827 0 1.2 19 17.8 19
2 0.1276 2 2.0 17 17.0 19
3 0.2015 6 3.2 13 15.8 19
4 0.2432 1 4.3 18 14.7 19
5 0.2792 7 4.9 12 14.1 19
6 0.3138 7 5.6 12 13.4 19
7 0.3872 6 6.5 13 12.5 19
8 0.4828 7 8.2 12 10.8 19
9 0.5941 10 10.3 9 8.7 19
10 0.8391 13 12.8 5 5.2 18
number of observations = 189
number of groups = 10
Hosmer-Lemeshow chi2(8) = 9.65
Prob > chi2 = 0.2904
Technical note
estat gof with the group() option puts all observations with the same predicted probabilities
into the same group. If, as in the previous example, we request 10 groups, the groups that estat
gof makes are [p0, p10], (p10, p20], (p20, p30], . . . , (p90, p100], where pk is the kth percentile of the
predicted probabilities, with p0 the minimum and p100 the maximum.
If there are many ties at the quantile boundaries, as will often happen if all independent variables
are categorical and there are only a few of them, the sizes of the groups will be uneven. If the totals
in some of the groups are small, the χ² statistic for the Hosmer–Lemeshow test may be unreliable.
In this case, fewer groups should be specified, or the Pearson goodness-of-fit test may be a better
choice.
Example 2
The table option can be used without the group() option. We would not want to specify this
for our current model because there were 182 covariate patterns in the data, caused by including the
two continuous variables, age and lwt, in the model. As an aside, we fit a simpler model and specify
table with estat gof:
. logistic low i.race smoke ui
Logistic regression Number of obs = 189
LR chi2(4) = 18.80
Prob > chi2 = 0.0009
Log likelihood = -107.93404 Pseudo R2 = 0.0801
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
race
black 3.052746 1.498087 2.27 0.023 1.166747 7.987382
other 2.922593 1.189229 2.64 0.008 1.316457 6.488285
smoke 2.945742 1.101838 2.89 0.004 1.415167 6.131715
ui 2.419131 1.047359 2.04 0.041 1.035459 5.651788
_cons .1402209 .0512295 -5.38 0.000 .0685216 .2869447
. estat gof, table
Group Prob Obs_1 Exp_1 Obs_0 Exp_0 Total
1 0.1230 3 4.9 37 35.1 40
2 0.2533 1 1.0 3 3.0 4
3 0.2907 16 13.7 31 33.3 47
4 0.2923 15 12.6 28 30.4 43
5 0.2997 3 3.9 10 9.1 13
6 0.4978 4 4.0 4 4.0 8
7 0.4998 4 4.5 5 4.5 9
8 0.5087 2 1.5 1 1.5 3
9 0.5469 2 4.4 6 3.6 8
10 0.5577 6 5.6 4 4.4 10
11 0.7449 3 3.0 1 1.0 4
Group Prob race smoke ui
1 0.1230 white nonsmoker 0
2 0.2533 white nonsmoker 1
3 0.2907 other nonsmoker 0
4 0.2923 white smoker 0
5 0.2997 black nonsmoker 0
6 0.4978 other nonsmoker 1
7 0.4998 white smoker 1
8 0.5087 black nonsmoker 1
9 0.5469 other smoker 0
10 0.5577 black smoker 0
11 0.7449 other smoker 1
number of observations = 189
number of covariate patterns = 11
Pearson chi2(6) = 5.71
Prob > chi2 = 0.4569
Technical note
logistic, logit, or probit and estat gof keep track of the estimation sample. If you type,
for instance, logistic . . . if x==1, then when you type estat gof, the statistics will be calculated
on the x==1 subsample of the data automatically.
You should specify if or in with estat gof only when you wish to calculate statistics for a set
of observations other than the estimation sample. See Samples other than the estimation sample later
in this entry.
If the logistic model was fit with fweights, estat gof properly accounts for the weights in its
calculations. (estat gof only allows fweights.) You do not have to specify the weights when you
run estat gof. Weights should be specified with estat gof only when you wish to use a different
set of weights.
Samples other than the estimation sample
estat gof can be used with samples other than the estimation sample. By default, estat gof
remembers the estimation sample used with the last logistic, logit, or probit command. To
override this, simply use an if or in restriction to select another set of observations, or specify the
all option to force the command to use all the observations in the dataset.
If you use estat gof with a sample that is completely different from the estimation sample (that
is, no overlap), you should also specify the outsample option so that the χ² statistic properly adjusts
the degrees of freedom upward. For an overlapping sample, the conservative thing to do is to leave
the degrees of freedom the same as they are for the estimation sample.
Example 3
We want to develop a model for predicting low-birthweight babies. One approach would be to
divide our data into two groups, a developmental sample and a validation sample. See Lemeshow and
Gall (1994) and Tilford, Roberson, and Fiser (1995) for more information on developing prediction
models and severity-scoring systems.
We will do this with the low-birthweight data that we considered previously. First, we randomly
divide the data into two samples.
. use http://www.stata-press.com/data/r13/lbw, clear
(Hosmer & Lemeshow data)
. set seed 1
. generate r = runiform()
. sort r
. generate group = 1 if _n <= _N/2
(95 missing values generated)
. replace group = 2 if group==.
(95 real changes made)
Then we fit a model using the first sample (group =1), which is our developmental sample.
. logistic low age lwt i.race smoke ptl ht ui if group==1
Logistic regression Number of obs = 94
LR chi2(8) = 29.14
Prob > chi2 = 0.0003
Log likelihood = -44.293342 Pseudo R2 = 0.2475
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .91542 .0553937 -1.46 0.144 .8130414 1.03069
lwt .9744276 .0112295 -2.25 0.025 .9526649 .9966874
race
black 5.063678 3.78442 2.17 0.030 1.170327 21.90913
other 2.606209 1.657608 1.51 0.132 .7492483 9.065522
smoke .909912 .5252898 -0.16 0.870 .2934966 2.820953
ptl 3.033543 1.507048 2.23 0.025 1.145718 8.03198
ht 21.07656 22.64788 2.84 0.005 2.565304 173.1652
ui .988479 .6699458 -0.02 0.986 .2618557 3.731409
_cons 30.73641 56.82168 1.85 0.064 .8204589 1151.462
To test calibration in the developmental sample, we calculate the Hosmer–Lemeshow goodness-of-fit
test by using estat gof.
. estat gof, group(10)
(Table collapsed on quantiles of estimated probabilities)
number of observations = 94
number of groups = 10
Hosmer-Lemeshow chi2(8) = 6.67
Prob > chi2 = 0.5721
We did not specify an if statement with estat gof because we wanted to use the estimation sample.
Because the test is not significant, we are satisfied with the fit of our model.
Running lroc (see [R]lroc) gives a measure of the discrimination:
. lroc, nograph
Logistic model for low
number of observations = 94
area under ROC curve = 0.8156
Now we test the calibration of our model by performing a goodness-of-fit test on the validation
sample. We specify the outsample option so that the number of degrees of freedom is 10 rather
than 8.
. estat gof if group==2, group(10) table outsample
(Table collapsed on quantiles of estimated probabilities)
Group Prob Obs_1 Exp_1 Obs_0 Exp_0 Total
1 0.0725 1 0.4 9 9.6 10
2 0.1202 4 0.8 5 8.2 9
3 0.1549 3 1.3 7 8.7 10
4 0.1888 1 1.5 8 7.5 9
5 0.2609 3 2.2 7 7.8 10
6 0.3258 4 2.7 5 6.3 9
7 0.4217 2 3.7 8 6.3 10
8 0.4915 3 4.1 6 4.9 9
9 0.6265 4 5.5 6 4.5 10
10 0.9737 4 7.1 5 1.9 9
number of observations = 95
number of groups = 10
Hosmer-Lemeshow chi2(10) = 28.03
Prob > chi2 = 0.0018
We must acknowledge that our model does not fit well on the validation sample. The model’s
discrimination in the validation sample is appreciably lower, as well.
. lroc if group==2, nograph
Logistic model for low
number of observations = 95
area under ROC curve = 0.5839
Stored results
estat gof stores the following in r():
Scalars
r(N) number of observations
r(m) number of covariate patterns or groups
r(df) degrees of freedom
r(chi2) χ2
Methods and formulas
Let M be the total number of covariate patterns among the N observations. View the data as
collapsed on covariate patterns j = 1, 2, . . . , M, and define m_j as the total number of observations
having covariate pattern j and y_j as the total number of positive responses among observations with
covariate pattern j. Define p_j as the predicted probability of a positive outcome in covariate pattern
j.
The Pearson χ² goodness-of-fit statistic is

    χ² = Σ_{j=1}^{M} (y_j − m_j p_j)² / { m_j p_j (1 − p_j) }
This χ² statistic has approximately M − k degrees of freedom for the estimation sample, where k
is the number of independent variables, including the constant. For a sample outside the estimation
sample, the statistic has M degrees of freedom.
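For instance, a minimal sketch that recomputes the Pearson statistic of example 1 by hand; the variable names p, pat, y, m, and term are ours:

    use http://www.stata-press.com/data/r13/lbw, clear
    quietly logistic low age lwt i.race smoke ptl ht ui
    predict p                           // predicted probabilities
    predict pat, number                 // covariate-pattern identifier
    collapse (sum) y=low (count) m=low (mean) p, by(pat)
    generate double term = (y - m*p)^2/(m*p*(1 - p))
    quietly summarize term
    display "Pearson chi2 = " %6.2f r(sum)    // compare with chi2(173) = 179.24 in example 1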
The Hosmer–Lemeshow goodness-of-fit χ² (Hosmer and Lemeshow 1980; Lemeshow and Hosmer 1982;
Hosmer, Lemeshow, and Klar 1988) is calculated similarly, except that rather than using
the M covariate patterns as the group definition, the quantiles of the predicted probabilities are used
to form groups. Let G = # be the number of quantiles requested with group(#). The smallest index
1 ≤ q(i) ≤ M such that

    W_q(i) = Σ_{j=1}^{q(i)} m_j ≥ N/G

gives p_q(i) as the upper boundary of the ith quantile for i = 1, 2, . . . , G. Let q(0) = 1 denote the
first index.
The groups are then

    [p_q(0), p_q(1)], (p_q(1), p_q(2)], . . . , (p_q(G−1), p_q(G)]

If the table option is given, the upper boundaries p_q(1), . . . , p_q(G) of the groups appear next to the
group number on the output.
The resulting χ² statistic has approximately G − 2 degrees of freedom for the estimation sample.
For a sample outside the estimation sample, the statistic has G degrees of freedom.
References
Archer, K. J., and S. A. Lemeshow. 2006. Goodness-of-fit test for a logistic regression model fitted using survey
sample data. Stata Journal 6: 97–105.
Fagerland, M. W., and D. W. Hosmer, Jr. 2012. A generalized Hosmer–Lemeshow goodness-of-fit test for multinomial
logistic regression models. Stata Journal 12: 447–453.
Hosmer, D. W., Jr., and S. A. Lemeshow. 1980. Goodness of fit tests for the multiple logistic regression model.
Communications in Statistics—Theory and Methods 9: 1043–1069.
Hosmer, D. W., Jr., S. A. Lemeshow, and J. Klar. 1988. Goodness-of-fit testing for the logistic regression model
when the estimated probabilities are small. Biometrical Journal 30: 911–924.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Lemeshow, S. A., and J.-R. L. Gall. 1994. Modeling the severity of illness of ICU patients: A systems update. Journal
of the American Medical Association 272: 1049–1055.
Lemeshow, S. A., and D. W. Hosmer, Jr. 1982. A review of goodness of fit statistics for the use in the development
of logistic regression models. American Journal of Epidemiology 115: 92–106.
Tilford, J. M., P. K. Roberson, and D. H. Fiser. 1995. sbe12: Using lfit and lroc to evaluate mortality prediction
models. Stata Technical Bulletin 28: 14–18. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 77–81.
College Station, TX: Stata Press.
Also see
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[R] probit — Probit regression
[R] estat classification — Classification statistics and table
[R] lroc — Compute area under ROC curve and graph the curve
[R] lsens — Graph sensitivity and specificity versus probability cutoff
[U] 20 Estimation and postestimation commands
Title
estat ic — Display information criteria
Syntax Menu for estat Description Option
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
estat ic [, n(#)]
Menu for estat
Statistics > Postestimation > Reports and statistics
Description
estat ic displays Akaike’s and Schwarz’s Bayesian information criteria.
Option
n(#) specifies the N to be used in calculating BIC; see [R] BIC note.
Remarks and examples
estat ic calculates two information criteria used to compare models. Unlike likelihood-ratio,
Wald, and similar testing procedures, the models need not be nested to compare the information
criteria. Because they are based on the log-likelihood function, information criteria are available only
after commands that report the log likelihood.
In general, “smaller is better”: given two models, the one with the smaller AIC fits the data better
than the one with the larger AIC. As with the AIC, a smaller BIC indicates a better-fitting model. For
AIC and BIC formulas, see Methods and formulas.
Example 1
In [R]mlogit, we fit a model explaining the type of insurance a person has on the basis of age,
gender, race, and site of study. Here we refit the model with and without the site dummies and
compare the models.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite
(output omitted )
. estat ic
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
. 615 -555.8545 -545.5833 8 1107.167 1142.54
Note: N=Obs used in calculating BIC; see [R] BIC note
. mlogit insure age male nonwhite i.site
(output omitted )
. estat ic
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
. 615 -555.8545 -534.3616 12 1092.723 1145.783
Note: N=Obs used in calculating BIC; see [R] BIC note
The AIC indicates that the model including the site dummies fits the data better, whereas the BIC
indicates the opposite. As is often the case, different model-selection criteria have led to conflicting
conclusions.
Technical note
glm and binreg, ml report a slightly different version of AIC and BIC; see [R]glm for the
formulas used. That version is commonly used within the GLM literature; see, for example, Hardin
and Hilbe (2012). The literature on information criteria is vast; see, among others, Akaike (1973),
Sawa (1978), and Raftery (1995). Judge et al. (1985) contains a discussion of using information criteria
in econometrics. Royston and Sauerbrei (2008, chap. 2) examine the use of information criteria as
an alternative to stepwise procedures for selecting model variables.
Stored results
estat ic stores the following in r():
Matrices
r(S) 1×6 matrix of results:
1. sample size
2. log likelihood of null model
3. log likelihood of full model
4. degrees of freedom
5. AIC
6. BIC
Methods and formulas
Akaike’s (1974) information criterion is defined as

    AIC = −2 lnL + 2k

where lnL is the maximized log-likelihood of the model and k is the number of parameters estimated.
Some authors define the AIC as the expression above divided by the sample size.
Schwarz’s (1978) Bayesian information criterion is another measure of fit, defined as

    BIC = −2 lnL + k lnN

where N is the sample size. See [R] BIC note for additional information on calculating and interpreting
BIC.
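A minimal by-hand check of these formulas, using the first model from example 1; e(rank) is taken here as the count of estimated parameters, which matches the df that estat ic reports for that model:

    use http://www.stata-press.com/data/r13/sysdsn1, clear
    quietly mlogit insure age male nonwhite
    display "AIC = " %9.3f (-2*e(ll) + 2*e(rank))            // about 1107.167
    display "BIC = " %9.3f (-2*e(ll) + e(rank)*ln(e(N)))     // about 1142.54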
 
Hirotugu Akaike (1927–2009) was born in Fujinomiya City, Shizuoka Prefecture, Japan. He was
the son of a silkworm farmer. He gained BA and DSc degrees from the University of Tokyo.
Akaike’s career from 1952 at the Institute of Statistical Mathematics in Japan culminated in
service as Director General; after 1994, he was Professor Emeritus. His best known work in a
prolific career is on what is now known as the Akaike information criterion (AIC), which was
formulated to help selection of the most appropriate model from a number of candidates.
Gideon E. Schwarz (1933–2007) was a professor of Statistics at the Hebrew University, Jerusalem.
He was born in Salzburg, Austria, and obtained an MSc in 1956 from the Hebrew University and
a PhD in 1961 from Columbia University. His interests included stochastic processes, sequential
analysis, probability, and geometry. He is best known for the Bayesian information criterion
(BIC).
 
References
Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. In Second International
Symposium on Information Theory, ed. B. N. Petrov and F. Csáki, 267–281. Budapest: Akadémiai Kiadó.
. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716–723.
Findley, D. F., and E. Parzen. 1995. A conversation with Hirotugu Akaike. Statistical Science 10: 104–117.
Hardin, J. W., and J. M. Hilbe. 2012. Generalized Linear Models and Extensions. 3rd ed. College Station, TX: Stata
Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Raftery, A. E. 1995. Bayesian model selection in social research. In Vol. 25 of Sociological Methodology, ed. P. V.
Marsden, 111–163. Oxford: Blackwell.
Royston, P., and W. Sauerbrei. 2008. Multivariable Model-building: A Pragmatic Approach to Regression Analysis
Based on Fractional Polynomials for Modelling Continuous Variables. Chichester, UK: Wiley.
Sawa, T. 1978. Information criteria for discriminating among alternative regression models. Econometrica 46: 1273–
1291.
Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461–464.
Tong, H. 2010. Professor Hirotugu Akaike, 1927–2009. Journal of the Royal Statistical Society, Series A 173:
451–454.
Also see
[R]estat Postestimation statistics
[R]estat summarize Summarize estimation sample
[R]estat vce Display covariance matrix estimates
Title
estat summarize — Summarize estimation sample
Syntax Menu for estat Description Options
Remarks and examples Stored results Also see
Syntax
estat summarize [eqlist] [, estat summ options]
estat summ options Description
equation display summary by equation
group display summary by group; only after sem
labels display variable labels
noheader suppress the header
noweights ignore weights
display options control row spacing, line width, display of omitted variables
and base and empty cells, and factor-variable labeling
eqlist is rarely used and specifies the variables, with optional equation name, to be summarized. eqlist may be
varlist or (eqname1: varlist) [(eqname2: varlist) . . . ]. varlist may contain time-series operators; see
[U] 11.4.4 Time-series varlists.
Menu for estat
Statistics >Postestimation >Reports and statistics
Description
estat summarize summarizes the variables used by the command and automatically restricts the
sample to e(sample); it also summarizes the weight variable and cluster structure, if specified.
Options
equation requests that the dependent variables and the independent variables in the equations be
displayed in the equation-style format of estimation commands, repeating the summary information
about variables entered in more than one equation.
group displays summary information separately for each group. group is only allowed after sem
with a group() variable specified.
labels displays variable labels.
noheader suppresses the header.
noweights ignores the weights, if any, from the previous estimation command. The default when
weights are present is to perform a weighted summarize on all variables except the weight variable
itself. An unweighted summarize is performed on the weight variable.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), and fvwrapon(style); see [R] estimation options.
Remarks and examples
Often when fitting a model, you will also be interested in obtaining summary statistics, such as
the sample means and standard deviations of the variables in the model. estat summarize makes
this process simple. The output displayed is similar to that obtained by typing
. summarize varlist if e(sample)
without the need to type the varlist containing the dependent and independent variables.
Example 1
Continuing with the example in [R]estat ic, here we summarize the variables by using estat
summarize.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite i.site
(output omitted )
. estat summarize, noomitted
Estimation sample mlogit Number of obs = 615
Variable Mean Std. Dev. Min Max
insure 1.596748 .6225846 1 3
age 44.46832 14.18523 18.1109 86.0725
male .2504065 .4335998 0 1
nonwhite .196748 .3978638 0 1
site
2 .3707317 .4833939 0 1
3 .3138211 .4644224 0 1
The output in the previous example contains all the variables in one table, though mlogit presents
its results in a multiple-equation format. For models in which the same variables appear in all
equations, that is fine; but for other multiple-equation models, we may prefer to have the variables
separated by the equation in which they appear. The equation option makes this possible.
Example 2
Systems of simultaneous equations typically have different variables in each equation, and the
equation option of estat summarize is helpful in such situations. In example 2 of [R]reg3, we
have a model of supply and demand. We first refit the model and then call estat summarize.
. use http://www.stata-press.com/data/r13/supDem
. reg3 (Demand:quantity price pcompete income) (Supply:quantity price praw),
> endog(price)
(output omitted )
. estat summarize, equation
Estimation sample reg3 Number of obs = 49
Variable Mean Std. Dev. Min Max
depvar
quantity 12.61818 2.774952 7.71069 20.0477
quantity 12.61818 2.774952 7.71069 20.0477
Demand
price 32.70944 2.882684 26.3819 38.4769
pcompete 5.929975 3.508264 .207647 11.5549
income 7.811735 4.18859 .570417 14.0077
Supply
price 32.70944 2.882684 26.3819 38.4769
praw 4.740891 2.962565 .151028 9.79881
The first block of the table contains statistics on the dependent (or, more accurately, left-hand-side)
variables, and because we specified quantity as the left-hand-side variable in both equations, it is
listed twice. The second block refers to the variables in the first equation we specified, which we
labeled “Demand” in our call to reg3; and the final block refers to the supply equation.
Stored results
estat summarize stores the following in r():
Scalars
r(N groups) number of groups (group only)
Matrices
r(stats) k×4 matrix of means, standard deviations, minimums, and maximums
r(stats#)k×4 matrix of means, standard deviations, minimums, and maximums for group #(group
only)
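The r(stats) matrix can be picked up for further computation. A minimal sketch, assuming the mlogit fit from example 1 is still the active estimation result, so that row 2 of the matrix corresponds to age, the second variable summarized:
. estat summarize
(output omitted )
. matrix S = r(stats)
. * column 1 of r(stats) holds the means; row 2 is age in this example
. display "mean of age = " S[2,1]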
Also see
[R]estat Postestimation statistics
[R]estat ic Display information criteria
[R]estat vce Display covariance matrix estimates
Title
estat vce — Display covariance matrix estimates
Syntax Menu for estat Description Options
Remarks and examples Stored results Reference Also see
Syntax
estat vce [, estat vce options]
estat vce options Description
covariance display as covariance matrix; the default
correlation display as correlation matrix
equation(spec)display only specified equations
block display submatrices by equation
diag display submatrices by equation; diagonal blocks only
format(%fmt)display format for covariances and correlations
nolines suppress lines between equations
display options control display of omitted variables and base and empty cells
Menu for estat
Statistics >Postestimation >Reports and statistics
Description
estat vce displays the covariance or correlation matrix of the parameter estimates of the previous
model.
Options
covariance displays the matrix as a variance–covariance matrix; this is the default.
correlation displays the matrix as a correlation matrix rather than a variance–covariance matrix.
rho is a synonym.
equation(spec)selects part of the VCE to be displayed. If spec is eqlist, the VCE for the listed
equations is displayed. If spec is eqlist1 \eqlist2, the part of the VCE associated with the equations
in eqlist1 (rowwise) and eqlist2 (columnwise) is displayed. If spec is *, all equations are displayed.
equation() implies block if diag is not specified.
block displays the submatrices pertaining to distinct equations separately.
diag displays the diagonal submatrices pertaining to distinct equations separately.
format(%fmt)specifies the number format for displaying the elements of the matrix. The default is
format(%10.0g) for covariances and format(%8.4f) for correlations. See [U] 12.5 Formats:
Controlling how data are displayed for more information.
nolines suppresses lines between equations.
display options: noomitted, noemptycells, baselevels, allbaselevels; see [R] estimation options.
Remarks and examples
estat vce allows you to display the VCE of the parameters of the previously fit model, as either
a covariance matrix or a correlation matrix.
Example 1
Returning to the example in [R]estat ic, here we display the covariance matrix of the parameters
of the mlogit model by using estat vce.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite
(output omitted )
. estat vce, block
Covariance matrix of coefficients of mlogit model
covariances of equation Indemnity
o. o. o. o.
age male nonwhite _cons
o.age 0
o.male 0 0
o.nonwhite 0 0 0
o._cons 0 0 0 0
covariances of equation Prepaid (row) by equation Indemnity (column)
o. o. o. o.
age male nonwhite _cons
age 0
male 0 0
nonwhite 0 0 0
_cons 0 0 0 0
covariances of equation Prepaid
age male nonwhite _cons
age .00003711
male -.00015303 .0402091
nonwhite -.00008948 .00470608 .04795135
_cons -.00159095 -.00398961 -.00628886 .08000462
covariances of equation Uninsure (row) by equation Indemnity (column)
o. o. o. o.
age male nonwhite _cons
age 0
male 0 0
nonwhite 0 0 0
_cons 0 0 0 0
covariances of equation Uninsure (row) by equation Prepaid (column)
age male nonwhite _cons
age .00001753 -.00007926 -.00004564 -.00076886
male -.00007544 .02188398 .0023186 -.00145923
nonwhite -.00004577 .00250588 .02813553 -.00263872
_cons -.00077045 -.00130535 -.00257593 .03888032
covariances of equation Uninsure
age male nonwhite _cons
age .00013022
male -.00050406 .13248095
nonwhite -.00026145 .01505449 .16861327
_cons -.00562159 -.01686629 -.02474852 .28607591
The block option is particularly useful for multiple-equation estimators. The first block of output
here corresponds to the VCE of the estimated parameters for the first equation; the square roots of
the diagonal elements of this matrix are equal to the standard errors of the first equation’s parameters.
Similarly, the final block corresponds to the VCE of the parameters for the second equation. The middle
block shows the covariances between the estimated parameters of the first and second equations.
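You can confirm this relationship from the stored matrix. A minimal sketch, assuming the mlogit fit above is still active; row and column 5 of r(V) correspond to the age coefficient in the Prepaid equation because the four omitted Indemnity parameters come first:
. quietly estat vce
. matrix V = r(V)
. * the square root of a diagonal element reproduces a reported standard error
. display sqrt(V[5,5])
The result should match the standard error reported for the age coefficient in the Prepaid equation of the mlogit output.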
Stored results
estat vce stores the following in r():
Matrices
r(V) VCE or correlation matrix
Reference
Hamilton, L. C. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Also see
[R]estat Postestimation statistics
[R]estat ic Display information criteria
[R]estat summarize Summarize estimation sample
Title
estimates — Save and manipulate estimation results
Syntax Description Remarks and examples Also see
Syntax
Command Reference
Save and use results from disk
estimates save filename [R]estimates save
estimates use filename [R]estimates save
estimates describe using filename [R]estimates describe
estimates esample: . . . [R]estimates save
Store and restore estimates in memory
estimates store name [R]estimates store
estimates restore name [R]estimates store
estimates query [R]estimates store
estimates dir [R]estimates store
estimates drop namelist [R]estimates store
estimates clear [R]estimates store
Set titles and notes
estimates title: text [R]estimates title
estimates title [R]estimates title
estimates notes: text [R]estimates notes
estimates notes [R]estimates notes
estimates notes list . . . [R]estimates notes
estimates notes drop . . . [R]estimates notes
Report
estimates describe name [R]estimates describe
estimates replay namelist [R]estimates replay
Tables and statistics
estimates table namelist [R]estimates table
estimates stats namelist [R]estimates stats
estimates for namelist:. . . [R]estimates for
Description
estimates allows you to store and manipulate estimation results:
You can save estimation results in a file for use in later sessions.
You can store estimation results in memory so that you can
a. switch among separate estimation results and
b. form tables combining separate estimation results.
Remarks and examples
estimates is for use after you have fit a model, be it with regress,logistic, etc. You can
use estimates after any estimation command, whether it be an official estimation command of Stata
or a user-written one.
estimates has three separate but related capabilities:
1. You can save estimation results in a file on disk so that you can use them later, even in a
different Stata session.
2. You can store up to 300 estimation results in memory so that they are at your fingertips.
3. You can make tables comparing any results you have stored in memory.
Remarks are presented under the following headings:
Saving and using estimation results
Storing and restoring estimation results
Comparing estimation results
Jargon
Saving and using estimation results
After you have fit a model, say, with regress, type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ foreign
(output omitted )
You can save the results in a file:
. estimates save basemodel
(file basemodel.ster saved)
Later, say, in a different session, you can reload those results:
. estimates use basemodel
The situation is now nearly identical to what it was immediately after you fit the model. You can
replay estimation results:
. regress
(output omitted )
You can perform tests:
. test foreign==0
(output omitted )
And you can use any postestimation command or postestimation capability of Stata. The only difference
is that Stata no longer knows what the estimation sample, e(sample) in Stata jargon, was. When
you reload the estimation results, you might not even have the original data in memory. That is okay.
Stata will know to refuse to calculate anything that can be calculated only on the original estimation
sample.
If it is important that you use a postestimation command that can be used only on the original
estimation sample, there is a way you can do that. You use the original data and then use estimates
esample: to tell Stata what the original sample was.
See [R]estimates save for details.
Storing and restoring estimation results
Storing and restoring estimation results in memory is much like saving them to disk. You type
. estimates store base
to save the current estimation results under the name base, and you type
. estimates restore base
to get them back later. You can find out what you have stored by typing
. estimates dir
Saving estimation results to disk is more permanent than storing them in memory, so why would
you want merely to store them? The answer is that, once they are stored, you can use other estimates
commands to produce tables and reports from them.
See [R]estimates store for details about the estimates store and restore commands.
Comparing estimation results
Let’s say that you have done the following:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ
(output omitted )
. estimates store base
. regress mpg weight displ foreign
(output omitted )
. estimates store alt
You can now get a table comparing the coefficients:
. estimates table base alt
Variable base alt
weight -.00656711 -.00677449
displacement .00528078 .00192865
foreign -1.6006312
_cons 40.084522 41.847949
estimates table can do much more; see [R]estimates table. Also see [R]estimates stats.
estimates stats works similarly to estimates table but produces model comparisons in terms
of BIC and AIC.
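For instance, continuing with the base and alt results stored above, a quick sketch:
. estimates stats base alt
(output omitted )
The resulting table lists, for each stored model, the number of observations, the log likelihoods, the degrees of freedom, and the AIC and BIC.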
Jargon
You know that if you fit a model, say, by typing
. regress mpg weight displacement
then you can later replay the results by typing
. regress
and you can do tests and calculate other postestimation statistics by typing
. test displacement==0
. estat vif
. predict mpghat
As a result, we often refer to the estimation results or the current estimation results or the most
recent estimation results or the last estimation results or the estimation results in memory.
With estimates store and estimates restore, you can have many estimation results in
memory. One set of those, the set most recently estimated, or the set most recently restored, are the
current or active estimation results, which you can replay, which you can test, or from which you
can calculate postestimation statistics.
Current and active are the two words we will use interchangeably from now on.
Also see
[P]estimates Manage estimation results
Title
estimates describe — Describe estimation results
Syntax Menu Description Option
Remarks and examples Stored results Also see
Syntax
estimates describe
estimates describe name
estimates describe using filename [, number(#)]
Menu
Statistics >Postestimation >Manage estimation results >Describe results
Description
estimates describe describes the current (active) estimates. Reported are the command line
that produced the estimates, any title that was set by estimates title (see [R]estimates title), and
any notes that were added by estimates notes (see [R]estimates notes).
estimates describe name does the same but reports results for estimates stored by estimates
store (see [R]estimates store).
estimates describe using filename does the same but reports results for estimates saved by
estimates save (see [R]estimates save). If filename contains multiple sets of estimates (saved in
it by estimates save, append), the number of sets of estimates is also reported. If filename is
specified without an extension, .ster is assumed.
Option
number(#)specifies that the #th set of estimation results from filename be described. This assumes
that multiple sets of estimation results have been saved in filename by estimates save, append.
The default is number(1).
Remarks and examples
estimates describe can be used to describe the estimation results currently in memory,
. estimates describe
Estimation results produced by
. regress mpg weight displ if foreign
or to describe results saved by estimates save in a .ster file:
. estimates describe using final
Estimation results "Final results" saved on 12apr2013 14:20, produced by
. logistic myopic age sex drug1 drug2 if complete==1
Notes:
1. Used file patient.dta
2. "datasignature myopic age sex drug1 drug2 if complete==1"
reports 148:5(58763):2252897466:3722318443
3. must be reviewed by rgg
Example 1
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ if foreign
(output omitted )
. estimates notes: file ‘c(filename)’
. datasignature
74:12(71728):3831085005:1395876116
. estimates notes: datasignature report ‘r(datasignature)’
. estimates save foreign
file foreign.ster saved
. regress mpg weight displ if !foreign
(output omitted )
. estimates describe using foreign
Estimation results saved on 02may2013 10:33, produced by
. regress mpg weight displ if foreign
Notes:
1. file http://www.stata-press.com/data/r13/auto.dta
2. datasignature report 74:12(71728):3831085005:1395876116
Stored results
estimates describe and estimates describe name store the following in r():
Macros
r(title) title
r(cmdline) original command line
estimates describe using filename stores the above and the following in r():
Scalars
r(datetime) %tc value of date/time file saved
r(nestresults) number of sets of estimation results in file
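These stored results can be used programmatically. A minimal sketch, assuming estimation results are active:
. estimates describe
(output omitted )
. * r(cmdline) holds the command line that produced the estimates
. display "`r(cmdline)'"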
Also see
[R]estimates Save and manipulate estimation results
Title
estimates for — Repeat postestimation command across models
Syntax Description Options Remarks and examples Also see
Syntax
estimates for namelist ,options :postestimation command
where namelist is a name, a list of names, all, or *. A name may be ., meaning the current (active)
estimates. all and *mean the same thing.
options Description
noheader do not display title
nostop do not stop if command fails
Description
estimates for performs postestimation command on each estimation result specified.
Options
noheader suppresses the display of the header as postestimation command is executed each time.
nostop specifies that execution of postestimation command is to be performed on the remaining
models even if it fails on some.
Remarks and examples
In the example that follows, we fit a model two different ways, store the results, and then use
estimates for to perform the same test on both of them:
Example 1
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate gpm = 1/mpg
. regress gpm i.foreign i.foreign#c.weight displ
(output omitted )
. estimates store reg
. qreg gpm i.foreign i.foreign#c.weight displ
(output omitted )
. estimates store qreg
. estimates for reg qreg: test 0.foreign#c.weight==1.foreign#c.weight
Model
( 1) 0b.foreign#c.weight - 1.foreign#c.weight = 0
F( 1, 69) = 4.87
Prob > F = 0.0307
Model
( 1) 0b.foreign#c.weight - 1.foreign#c.weight = 0
F( 1, 69) = 0.03
Prob > F = 0.8554
Also see
[R]estimates Save and manipulate estimation results
Title
estimates notes — Add notes to estimation results
Syntax Description Remarks and examples Also see
Syntax
estimates notes: text
estimates notes
estimates notes list in noterange
estimates notes drop in noterange
where noterange is # or #/# and where # may be a number, the letter f (meaning first), or the letter
l (meaning last).
Description
estimates notes: text adds a note to the current (active) estimation results.
estimates notes and estimates notes list list the current notes.
estimates notes drop in noterange eliminates the specified notes.
Remarks and examples
After adding or removing notes, if estimates have been stored, do not forget to store them again.
If estimates have been saved, do not forget to save them again.
Notes are most useful when you intend to save estimation results in a file; see [R]estimates save.
For instance, after fitting a model, you might type
. estimates note: I think these are final
. estimates save lock2
and then, later when going through your files, you could type
. estimates use lock2
. estimates notes
1. I think these are final
Up to 9,999 notes can be attached to estimation results. If estimation results are important, we
recommend that you add a note identifying the .dta dataset you used. The best way to do that is to
type
. estimates notes: file ‘c(filename)’
because ‘c(filename)’ will expand to include not just the name of the file but also its full path;
see [P]creturn.
If estimation results took a long time to estimate (say, they were produced by asmprobit or
gllamm; see [R] asmprobit and http://www.gllamm.org), it is also a good idea to add a data signature.
A data signature takes less time to compute than reestimation when you need proof that you really
have the right dataset. The easy way to do that is to type
. datasignature
74:12(71728):3831085005:1395876116
. estimates notes: datasignature reports ‘r(datasignature)’
Now when you ask to see the notes, you will see
. estimates notes
1. I think these are final
2. file C:\project\one\pat4.dta
3. datasignature reports 74:12(71728):3831085005:1395876116
See [D]datasignature.
Notes need not be positive. You might set a note to be, “I need to check that age is defined
correctly.”
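The f and l markers are handy when you do not remember how many notes there are. A small sketch, assuming at least two notes have already been added:
. estimates notes list in f/2
. estimates notes drop in 2/l
The first command lists the first two notes; the second drops everything from note 2 through the last note.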
Example 1
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ if foreign
(output omitted )
. estimates notes: file ‘c(filename)’
. datasignature
74:12(71728):3831085005:1395876116
. estimates notes: datasignature report ‘r(datasignature)’
. estimates save foreign
file foreign.ster saved
. estimates notes list in 1/2
1. file http://www.stata-press.com/data/r13/auto.dta
2. datasignature report 74:12(71728):3831085005:1395876116
. estimates notes drop in 2
(1 note dropped)
. estimates notes
1. file http://www.stata-press.com/data/r13/auto.dta
Also see
[R]estimates Save and manipulate estimation results
Title
estimates replay — Redisplay estimation results
Syntax Menu Description Remarks and examples Also see
Syntax
estimates replay
estimates replay namelist
where namelist is a name, a list of names, all, or *. A name may be ., meaning the current (active)
estimates. all and *mean the same thing.
Menu
Statistics >Postestimation >Manage estimation results >Redisplay estimation output
Description
estimates replay redisplays the current (active) estimation results, just as typing the name of
the estimation command would do.
estimates replay namelist redisplays each specified estimation result. The active estimation
results are left unchanged.
Remarks and examples
In the example that follows, we fit a model two different ways, store the results, use estimates
for to perform the same test on both of them, and then replay the results:
Example 1
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate gpm = 1/mpg
. regress gpm i.foreign i.foreign#c.weight displ
(output omitted )
. estimates store reg
. qreg gpm i.foreign i.foreign#c.weight displ
(output omitted )
. estimates store qreg
. estimates for reg qreg: test 0.foreign#c.weight==1.foreign#c.weight
Model
( 1) 0b.foreign#c.weight - 1.foreign#c.weight = 0
F( 1, 69) = 4.87
Prob > F = 0.0307
Model
( 1) 0b.foreign#c.weight - 1.foreign#c.weight = 0
F( 1, 69) = 0.03
Prob > F = 0.8554
. estimates replay
Model
Median regression Number of obs = 74
Raw sum of deviations .7555689 (about .05)
Min sum of deviations .3201479 Pseudo R2 = 0.5763
gpm Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign
Foreign .0065352 .0109777 0.60 0.554 -.0153647 .0284351
foreign#
c.weight
Domestic .0000147 2.93e-06 5.00 0.000 8.81e-06 .0000205
Foreign .0000155 4.17e-06 3.71 0.000 7.16e-06 .0000238
displacement .0000179 .0000239 0.75 0.457 -.0000298 .0000656
_cons .0003134 .0059612 0.05 0.958 -.0115789 .0122056
. estimates replay reg
Model
Source SS df MS Number of obs = 74
F( 4, 69) = 61.62
Model .009342436 4 .002335609 Prob > F = 0.0000
Residual .002615192 69 .000037901 R-squared = 0.7813
Adj R-squared = 0.7686
Total .011957628 73 .000163803 Root MSE = .00616
gpm Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign
Foreign -.0117756 .0086088 -1.37 0.176 -.0289497 .0053986
foreign#
c.weight
Domestic .0000123 2.30e-06 5.36 0.000 7.75e-06 .0000169
Foreign .00002 3.27e-06 6.12 0.000 .0000135 .0000265
displacement .0000296 .0000187 1.58 0.119 -7.81e-06 .000067
_cons .0053352 .0046748 1.14 0.258 -.0039909 .0146612
Also see
[R]estimates Save and manipulate estimation results
Title
estimates save — Save and use estimation results
Syntax Menu Description Options
Remarks and examples Stored results Also see
Syntax
estimates save filename [, append replace]
estimates use filename [, number(#)]
estimates esample: [varlist] [if] [in] [weight]
[, replace stringvars(varlist) zeroweight]
estimates esample
Menu
estimates save
Statistics >Postestimation >Manage estimation results >Save to disk
estimates use
Statistics >Postestimation >Manage estimation results >Load from disk
Description
estimates save filename saves the current (active) estimation results in filename.
estimates use filename loads the results saved in filename into the current (active) estimation
results.
In both cases, if filename is specified without an extension, .ster is assumed.
estimates esample: (note the colon) resets e(sample). After estimates use filename,
e(sample) is set to contain 0, meaning that none of the observations currently in memory was used
in obtaining the estimates.
estimates esample (without a colon) displays how e(sample) is currently set.
Options
append, used with estimates save, specifies that results be appended to an existing file. If the file
does not already exist, a new file is created.
replace, used with estimates save, specifies that filename can be replaced if it already exists.
number(#), used with estimates use, specifies that the #th set of estimation results from filename
be loaded. This assumes that multiple sets of estimation results have been saved in filename by
estimates save, append. The default is number(1).
replace, used with estimates esample:, specifies that e(sample) can be replaced even if it is
already set.
stringvars(varlist), used with estimates esample:, specifies string variables. Observations
containing variables that contain "" will be omitted from e(sample).
zeroweight, used with estimates esample:, specifies that observations with zero weights are to
be included in e(sample).
Remarks and examples
See [R]estimates for an overview of the estimates commands.
For a description of estimates save and estimates use, see Saving and using estimation
results in [R]estimates.
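One detail worth illustrating here is that several sets of results can be kept in a single file with append and then retrieved with number(). A minimal sketch; the filename mymodels is only an illustration:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ
(output omitted )
. estimates save mymodels
. regress mpg weight displ foreign
(output omitted )
. estimates save mymodels, append
. * load the second set of results saved in the file
. estimates use mymodels, number(2)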
The rest of this entry concerns e(sample).
Remarks are presented under the following headings:
Setting e(sample)
Resetting e(sample)
Determining who set e(sample)
Setting e(sample)
After estimates use filename, the situation is nearly identical to what it was immediately after
you fit the model. The one difference is that e(sample) is set to 0.
e(sample) is Stata’s function to mark which observations among those currently in memory were
used in producing the estimates. For instance, you might type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ if foreign
(output omitted )
. summarize mpg if e(sample)
(output omitted )
and summarize would report the summary statistics for the observations regress in fact used, which
would exclude not only observations for which foreign = 0 but also any observations for which
mpg, weight, or displ was missing.
If you saved the above estimation results and then reloaded them, however, summarize mpg if
e(sample) would produce
. summarize mpg if e(sample)
Variable Obs Mean Std. Dev. Min Max
mpg 0
Stata thinks that none of these observations was used in producing the estimates currently loaded.
What else could Stata think? When you estimates use filename, you do not have to have the
original data in memory. Even if you do have data in memory that look like the original data, they
might not be. Setting e(sample) to 0 is the safe thing to do. There are some postestimation statistics,
for instance, that are appropriate only when calculated on the estimation sample. Setting e(sample)
to 0 ensures that if you ask for one of them, you will get back a null result.
We recommend that you leave e(sample) set to 0. But what if you really need to calculate
that postestimation statistic? Well, you can get it, but you are going to be responsible for setting
e(sample) correctly. Here we just happen to know that all the observations with foreign = 1 were
used, so we can type
. estimates esample: if foreign
If all the observations had been used, we could simply type
. estimates esample:
The safe thing to do, however, is to look at the estimation command (estimates describe will
show it to you) and then type
. estimates esample: mpg weight displ if foreign
This treats as the estimation sample all observations with foreign = 1 that have no missing values
in mpg, weight, or displ.
Resetting e(sample)
estimates esample: will allow you to not only set but also reset e(sample). If e(sample)
has already been set (say that you just fit the model) and you try to set it, you will see
. estimates esample: mpg weight displ if foreign
no; e(sample) already set
r(322);
Here you can specify the replace option:
. estimates esample: mpg weight displ if foreign, replace
We do not recommend resetting e(sample), but the situation can arise where you need to. Imagine
that you estimates use filename, you set e(sample), and then you realize that you set it wrong.
Here you would want to reset it.
Determining who set e(sample)
estimates esample without a colon will report whether and how e(sample) was set. You might
see
. estimates esample
e(sample) set by estimation command
or
. estimates esample
e(sample) set by user
or
. estimates esample
e(sample) not set (0 assumed)
Stored results
estimates esample without the colon saves macro r(who), which will contain cmd,user, or
zero’d.
Also see
[R]estimates Save and manipulate estimation results
Title
estimates stats — Model-selection statistics
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas Also see
Syntax
estimates stats [namelist] [, n(#)]
where namelist is a name, a list of names, all, or *. A name may be ., meaning the current (active)
estimates. all and *mean the same thing.
Menu
Statistics >Postestimation >Manage estimation results >Table of fit statistics
Description
estimates stats reports model-selection statistics, including the Akaike information criterion
(AIC) and the Bayesian information criterion (BIC). These measures are appropriate for maximum
likelihood models.
If estimates stats is used for a nonlikelihood-based model, such as qreg, missing values are
reported.
Option
n(#) specifies the N to be used in calculating BIC; see [R] BIC note.
Remarks and examples
If you type estimates stats without arguments, a table for the most recent estimation results
will be shown:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. logistic foreign mpg weight displ
(output omitted )
. estimates stats
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
. 74 -45.03321 -20.59083 4 49.18167 58.39793
Note: N=Obs used in calculating BIC; see [R] BIC note.
Regarding the note at the bottom of the table, N is an ingredient in the calculation of BIC; see
[R] BIC note. The note changes if you specify the n() option, which tells estimates stats what
N to use. N=Obs is the default.
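For example, to see how BIC changes under a different choice of N, you can supply the value directly; the 500 below is an arbitrary illustration, not a recommendation:
. estimates stats, n(500)
(output omitted )
The note below the table then reports the N that you specified.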
Regarding the table itself, ll(null) is the log likelihood for the constant-only model, ll(model)
is the log likelihood for the model, df is the number of degrees of freedom, and AIC and BIC are
the Akaike and Bayesian information criteria.
Models with smaller values of an information criterion are considered preferable.
estimates stats can compare estimation results:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. logistic foreign mpg weight displ
(output omitted )
. estimates store full
. logistic foreign mpg weight
(output omitted )
. estimates store sub
. estimates stats full sub
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
full 74 -45.03321 -20.59083 4 49.18167 58.39793
sub 74 -45.03321 -27.17516 3 60.35031 67.26251
Note: N=Obs used in calculating BIC; see [R] BIC note.
Stored results
estimates stats stores the following in r():
Matrices
r(S) matrix with 6 columns (N, ll0, ll, df, AIC, and BIC) and rows corresponding to models in table
Methods and formulas
See [R]BIC note.
Also see
[R]estimates Save and manipulate estimation results
Title
estimates store — Store and restore estimation results
Syntax Menu Description Option
Remarks and examples Stored results References Also see
Syntax
estimates store name [, nocopy]
estimates restore name
estimates query
estimates dir namelist
estimates drop namelist
estimates clear
where namelist is a name, a list of names, all, or *.all and *mean the same thing.
Menu
estimates store
Statistics >Postestimation >Manage estimation results >Store in memory
estimates restore
Statistics >Postestimation >Manage estimation results >Restore from memory
estimates dir
Statistics >Postestimation >Manage estimation results >List results stored in memory
estimates drop
Statistics >Postestimation >Manage estimation results >Drop from memory
Description
estimates store name stores the current (active) estimation results under the name name.
estimates restore name loads the results stored under name into the current (active) estimation
results.
estimates query tells you whether the current (active) estimates have been stored and, if so,
the name.
estimates dir displays a list of the stored estimates.
estimates drop namelist drops the specified stored estimation results.
estimates clear drops all stored estimation results.
estimates clear,estimates drop all, and estimates drop * do the same thing. estimates
drop and estimates clear do not eliminate the current (active) estimation results.
Option
nocopy, used with estimates store, specifies that the current (active) estimation results are to be
moved into name rather than copied. Typing
. estimates store hold, nocopy
is the same as typing
. estimates store hold
. ereturn clear
except that the former is faster. The nocopy option is sometimes used by programmers.
Remarks and examples
estimates store stores estimation results in memory so that you can access them later.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ
(output omitted )
. estimates store myreg
. ... you do other things, including fitting other models ...
. estimates restore myreg
. regress
(same output shown again)
After estimates restore myreg, things are once again just as they were, estimationwise, just
after you typed regress mpg weight displ.
estimates store stores results in memory. When you exit Stata, those stored results vanish. If
you wish to make a permanent copy of your estimation results, see [R]estimates save.
The purpose of making copies in memory is 1) so that you can quickly switch between them and
2) so that you can make tables comparing estimation results. Concerning the latter, see [R]estimates
table and [R]estimates stats.
Stored results
estimates dir stores the following in r():
Macros
r(names) names of stored results
References
Jann, B. 2005. Making regression tables from stored estimates.Stata Journal 5: 288–308.
. 2007. Making regression tables simplified.Stata Journal 7: 227–244.
Also see
[R]estimates Save and manipulate estimation results
Title
estimates table — Compare estimation results
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
estimates table [namelist] [, options]
where namelist is a name, a list of names, all, or *. A name may be ., meaning the current (active)
estimates. all and *mean the same thing.
options Description
Main
stats(scalarlist)report scalarlist in table
star(#1 #2 #3)use stars to denote significance levels
Options
keep(coeflist)report coefficients in order specified
drop(coeflist)omit specified coefficients from table
equations(matchlist)match equations of models as specified
Numerical formats
b(%fmt)how to format coefficients, which are always reported
se(%fmt)report standard errors and use optional format
t(%fmt)report tor zand use optional format
p(%fmt)report p-values and use optional format
stfmt(%fmt)how to format scalar statistics
General format
varwidth(#)use #characters to display variable names and statistics
modelwidth(#)use #characters to display model names
eform display coefficients in exponentiated form
varlabel display variable labels rather than variable names
newpanel display statistics in separate table from coefficients
style(oneline) put vertical line after variable names; the default
style(columns) put vertical line separating every column
style(noline) suppress all vertical lines
coded display compact table
Reporting
display options control row spacing, line width, and display of omitted variables and
base and empty cells
title(string)title for table
title() does not appear in the dialog box.
where
A scalarlist is a list of any or all of the names of scalars stored in e(), plus aic, bic, and
rank.
#1 #2 #3 are three numbers such as .05 .01 .001.
A coeflist is a list of coefficient names, each name of which may be simple (for example,
price), an equation name followed by a colon (for example, mean:), or a full name (for
example, mean:price). Names are separated by blanks.
A matchlist specifies how equations from different estimation results are to be matched. If
you need to specify a matchlist, the solution is usually 1, as in equations(1). The full
syntax is
matchlist := term [, term . . . ]
term := [eqname =] #:#:. . .:#
[eqname =] #
See equations() under Options below.
Menu
Statistics >Postestimation >Manage estimation results >Table of estimation results
Description
estimates table displays a table of coefficients and statistics for one or more sets of estimation
results.
Options
 
Main
stats(scalarlist)specifies one or more scalar statistics to be displayed in the table. scalarlist may
contain
aic Akaike’s information criterion
bic Schwarz’s Bayesian information criterion
rank rank of e(V) (#of free parameters in model)
along with the names of any scalars stored in e(). The specified statistics do not have to be
available for all estimation results being displayed.
For example, stats(N ll chi2 aic) specifies that e(N),e(ll),e(chi2), and AIC be included.
In Stata, e(N) records the number of observations; e(ll), the log likelihood; and e(chi2), the
chi-squared test that all coefficients in the first equation of the model are equal to zero.
star and star(#1 #2 #3)specify that stars (asterisks) are to be used to mark significance. The
second syntax specifies the significance levels for one, two, and three stars. If you specify simply
star, that is equivalent to specifying star(.05 .01 .001), which means one star (*) if p < 0.05,
two stars (**) if p < 0.01, and three stars (***) if p < 0.001.
The star and star() options may not be combined with the se, t, or p option.
 
Options
keep(coeflist)and drop(coeflist)are alternatives; they specify coefficients to be included or omitted
from the table. The default is to display all coefficients.
If keep() is specified, it specifies not only the coefficients to be included but also the order in
which they appear.
A coeflist is a list of coefficient names, each name of which may be simple (for example, price),
an equation name followed by a colon (for example, mean:), or a full name (for example,
mean:price). Names are separated from each other by blanks.
When full names are not specified, all coefficients that match the partial specification are included.
For instance, drop(_cons) would omit _cons for all equations.
equations(matchlist)specifies how the equations of the models in namelist are to be matched. The
default is to match equations by name. Matching by name usually works well when all results were
fit by the same estimation command. When you are comparing results from different estimation
commands, however, specifying equations() may be necessary.
The most common usage is equations(1), which indicates that all first equations are to be
matched into one equation named #1.
matchlist has the syntax
term ,term . . .
where term is
[eqname =] #:#:. . .:# (syntax 1)
[eqname =] # (syntax 2)
In syntax 1, each # is a number or a period (.). If a number, it specifies the position of the equation
in the corresponding model; 1:3:1 would indicate that equation 1 in the first model matches
equation 3 in the second, which matches equation 1 in the third. A period indicates that there
is no corresponding equation in the model; 1:.:1 indicates that equation 1 in the first matches
equation 1 in the third.
In syntax 2, you specify just one number, say, 1 or 2, and that is shorthand for 1:1:. . .:1 or
2:2:. . .:2, meaning that equation 1 matches across all models specified or that equation 2 matches
across all models specified.
Now that you can specify a term, you can put that together into a matchlist by separating one term
from the other by commas. In what follows, we will assume that three names were specified,
. estimates table alpha beta gamma, ...
equations(1) is equivalent to equations(1:1:1); we would be saying that the first equations
match across the board.
equations(1:.:1) would specify that equation 1 matches in models alpha and gamma but that
there is nothing corresponding in model beta.
equations(1,2) is equivalent to equations(1:1:1, 2:2:2). We would be saying that the first
equations match across the board and so do the second equations.
equations(1, 2:.:2) would specify that the first equations match across the board, that the
second equations match for models alpha and gamma, and that there is nothing equivalent to
equation 2 in model beta.
If equations() is specified, equations not matched by position are matched by name.
 
Numerical formats
b(%fmt) specifies how the coefficients are to be displayed. You might specify b(%9.2f) to make
decimal points line up. There is also a b option, which specifies that coefficients are to be displayed,
but that is just included for consistency with the se, t, and p options. Coefficients are always
displayed.
se, t, and p specify that standard errors, t or z statistics, and significance levels are to be displayed.
The default is not to display them. se(%fmt), t(%fmt), and p(%fmt) specify that each is to be
displayed and specify the display format to be used to format them.
stfmt(%fmt)specifies the format for displaying the scalar statistics included by the stats() option.
 
General format
varwidth(#)specifies the number of character positions used to display the names of the variables
and statistics. The default is 12.
modelwidth(#)specifies the number of character positions used to display the names of the models.
The default is 12.
eform displays coefficients in exponentiated form. For each coefficient, exp(β) rather than β is
displayed, and standard errors are transformed appropriately. Display of the intercept, if any, is
suppressed.
varlabel specifies that variable labels be displayed instead of variable names.
newpanel specifies that the statistics be displayed in a table separated by a blank line from the table
with coefficients rather than in the style of another equation in the table of coefficients.
style(stylespec)specifies the style of the coefficient table.
style(oneline) specifies that a vertical line be displayed after the variables but not between
the models. This is the default.
style(columns) specifies that vertical lines be displayed after each column.
style(noline) specifies that no vertical lines be displayed.
coded specifies that a compact table be displayed. This format is especially useful for comparing
variables that are included in a large collection of models.
 
Reporting
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), and fvwrapon(style); see [R] estimation options.
The following option is available with estimates table but is not shown in the dialog box:
title(string)specifies the title to appear above the table.
Remarks and examples
If you type estimates table without arguments, a table of the most recent estimation results
will be shown:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ
(output omitted )
. estimates table
Variable active
weight -.00656711
displacement .00528078
_cons 40.084522
The real use of estimates table, however, is for comparing estimation results, and that requires
using it after estimates store:
. regress mpg weight displ
(output omitted )
. estimates store base
. regress mpg weight displ foreign
(output omitted )
. estimates store alt
. qreg mpg weight displ foreign
(output omitted )
. estimates store qreg
. estimates table base alt qreg, stats(r2)
Variable base alt qreg
weight -.00656711 -.00677449 -.00595056
displacement .00528078 .00192865 .00018552
foreign -1.6006312 -2.1326004
_cons 40.084522 41.847949 39.213348
r2 .6529307 .66287957
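The reporting options can be combined. A sketch continuing with the models stored above; the formats chosen are arbitrary, and star() is shown in a separate call because it may not be combined with se:
. estimates table base alt qreg, se b(%9.4f) stats(N r2 aic bic) stfmt(%9.2f)
(output omitted )
. estimates table base alt, star(.05 .01 .001) varlabel
(output omitted )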
Stored results
estimates table stores the following in r():
Macros
r(names) names of results used
Matrices
r(coef) matrix M: n × 2m
M[i, 2j-1] = ith parameter estimate for model j;
M[i, 2j] = variance of M[i, 2j-1]; i = 1, . . . , n; j = 1, . . . , m
r(stats) matrix S: k × m (if option stats() specified)
S[i, j] = ith statistic for model j; i = 1, . . . , k; j = 1, . . . , m
References
Gallup, J. L. 2012. A new system for formatting estimation tables.Stata Journal 12: 3–28.
Weiss, M. 2010. Stata tip 90: Displaying partial results.Stata Journal 10: 500–502.
Also see
[R]estimates Save and manipulate estimation results
Title
estimates title — Set title for estimation results
Syntax Menu Description Remarks and examples Also see
Syntax
estimates title: text
estimates title
Menu
Statistics >Postestimation >Manage estimation results >Title/retitle results
Description
estimates title: (note the colon) sets or clears the title for the current estimation results. The
title is used by estimates table and estimates stats (see [R]estimates table and [R]estimates
stats).
estimates title without the colon displays the current title.
Remarks and examples
After setting the title, if estimates have been stored, do not forget to store them again:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg gear turn
(output omitted )
. estimates store reg
Now let’s add a title:
. estimates title: "My regression"
. estimates store reg
Also see
[R]estimates Save and manipulate estimation results
Title
estimation options — Estimation options
Syntax Description Options Also see
Syntax
estimation cmd . . . [, options]
options Description
Model
noconstant suppress constant term
offset(varnameo)include varnameoin model with coefficient constrained to 1
exposure(varnamee)include ln(varnamee) in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
Reporting
level(#)set confidence level; default is level(95)
noskip perform overall model test as a likelihood-ratio test
nocnsreport do not display constraints
noomitted do not display omitted collinear variables
vsquish suppress blank space separating factor variables or time-series variables
noemptycells do not display empty interaction cells of factor variables
baselevels report base levels whose bases cannot be inferred
allbaselevels display all base levels for factor variables and interactions
nofvlabel display factor-variable level values rather than value labels
fvwrap(#)allow #lines when wrapping long value labels
fvwrapon(style)apply style for wrapping long value labels;
style may be word or width
cformat(%fmt)format for coefficients, standard errors, and confidence limits
pformat(%fmt)format for p-values
sformat(%fmt)format for test statistics
nolstretch do not automatically widen coefficient table for long variable names
Integration
intmethod(intmethod)integration method for random-effects models
intpoints(#)use #integration (quadrature) points
coeflegend display legend instead of statistics
Description
This entry describes the options common to many estimation commands. Not all the options
documented here work with all estimation commands. See the documentation for the particular
estimation command; if an option is listed there, it is applicable.
Options
 
Model
noconstant suppresses the constant term (intercept) in the model.
offset(varnameo) specifies that varnameo be included in the model with the coefficient constrained
to be 1.
exposure(varnamee) specifies a variable that reflects the amount of exposure over which the depvar
events were observed for each observation; ln(varnamee) with coefficient constrained to be 1 is
entered into the log-link function.
constraints(numlist |matname)specifies the linear constraints to be applied during estimation.
The default is to perform unconstrained estimation. See [R]reg3 for the use of constraints in
multiple-equation contexts.
constraints(numlist)specifies the constraints by number after they have been defined by using
the constraint command; see [R]constraint. Some commands (for example, slogit) allow
only constraints(numlist).
constraints(matname)specifies a matrix containing the constraints; see [P]makecns.
constraints(clist)is used by some estimation commands, such as mlogit, where clist has the
form #-#,#-#. . . .
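For example, constraints defined with equation names carry over to multiple-equation commands such as mlogit. A sketch using the insurance data from [R] estat ic; the particular constraint is only an illustration:
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. constraint define 1 [Prepaid]male = [Uninsure]male
. * impose the constraint during estimation
. mlogit insure age male nonwhite, constraints(1)
(output omitted )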
collinear specifies that the estimation command not omit collinear variables. Usually, there is no
reason to leave collinear variables in place, and, in fact, doing so usually causes the estimation
to fail because of the matrix singularity caused by the collinearity. However, with certain models,
the variables may be collinear, yet the model is fully identified because of constraints or other
features of the model. In such cases, using the collinear option allows the estimation to take
place, leaving the equations with collinear variables intact. This option is seldom used.
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
noskip specifies that a full maximum-likelihood model with only a constant for the regression equation
be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test
for the model test statistic displayed in the estimation header. By default, the overall model test
statistic is an asymptotically equivalent Wald test of all the parameters in the regression equation
being zero (except the constant). For many models, this option can substantially increase estimation
time.
nocnsreport specifies that no constraints be reported. The default is to display user-specified
constraints above the coefficient table.
noomitted specifies that variables that were omitted because of collinearity not be displayed. The
default is to include in the table any variables omitted because of collinearity and to label them
as “(omitted)”.
vsquish specifies that the blank space separating factor-variable terms or time-series–operated variables
from other variables in the model be suppressed.
noemptycells specifies that empty cells for interactions of factor variables not be displayed. The
default is to include in the table interaction cells that do not occur in the estimation sample and
to label them as “(empty)”.
baselevels and allbaselevels control whether the base levels of factor variables and interactions
are displayed. The default is to exclude from the table all base categories.
baselevels specifies that base levels be reported for factor variables and for interactions whose
bases cannot be inferred from their component factor variables.
allbaselevels specifies that all base levels of factor variables and interactions be reported.
nofvlabel displays factor-variable level values rather than attached value labels. This option overrides
the fvlabel setting; see [R]set showbaselevels.
fvwrap(#)specifies how many lines to allow when long value labels must be wrapped. Labels
requiring more than #lines are truncated. This option overrides the fvwrap setting; see [R]set
showbaselevels.
fvwrapon(style)specifies whether value labels that wrap will break at word boundaries or break
based on available space.
fvwrapon(word), the default, specifies that value labels break at word boundaries.
fvwrapon(width) specifies that value labels break based on available space.
This option overrides the fvwrapon setting; see [R]set showbaselevels.
cformat(%fmt)specifies how to format coefficients, standard errors, and confidence limits in the
coefficient table. The maximum format width is 9.
pformat(%fmt)specifies how to format p-values in the coefficient table. The maximum format width
is 5.
sformat(%fmt)specifies how to format test statistics in the coefficient table. The maximum format
width is 8.
nolstretch specifies that the width of the coefficient table not be automatically widened to accom-
modate longer variable names. The default, lstretch, is to automatically widen the coefficient
table up to the width of the Results window. To change the default, use set lstretch off.
nolstretch is not shown in the dialog box.
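As a simple illustration of the reporting options (the confidence level and format below are arbitrary choices):
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displacement, level(90) cformat(%9.3f)
(output omitted )
Here level(90) requests 90% confidence intervals, and cformat(%9.3f) displays coefficients, standard errors, and confidence limits with three decimal places.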
 
Integration
intmethod(intmethod)specifies the integration method to be used for the random-effects model.
It accepts one of four arguments: mvaghermite, the default for all but a crossed random-effects
model, performs mean and variance adaptive Gauss–Hermite quadrature; mcaghermite performs
mode and curvature adaptive Gauss–Hermite quadrature; ghermite performs nonadaptive
Gauss–Hermite quadrature; and laplace, the default for crossed random-effects models, performs the
Laplacian approximation.
intpoints(#)specifies the number of integration points to use for integration by quadrature. The
default is intpoints(12); the maximum is intpoints(195). Increasing this value improves
the accuracy but also increases computation time. Computation time is roughly proportional to its
value.
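As a rough sketch (the panel identifier id, outcome y, and covariates x1 and x2 are hypothetical names, and the example assumes a random-effects estimator such as xtlogit that accepts this integration option), raising the number of quadrature points from the default looks like:
. xtset id
. xtlogit y x1 x2, re intpoints(30)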
The following option is not shown in the dialog box:
coeflegend specifies that the legend of the coefficients and how to specify them in an expression
be displayed rather than displaying the statistics for the coefficients.
Also see
[U] 20 Estimation and postestimation commands
Title
exit — Exit Stata
Syntax Description Option Remarks and examples Also see
Syntax
exit [, clear]
Description
Typing exit causes Stata to stop processing and return control to the operating system. If the
dataset in memory has changed since the last save command, you must specify the clear option
before Stata will let you exit.
exit may also be used for exiting do-files or programs; see [P]exit.
Stata for Windows users may also exit Stata by clicking on the Close button or by pressing Alt+F4.
Stata for Mac users may also exit Stata by pressing Command+Q.
Stata(GUI) users may also exit Stata by clicking on the Close button.
Option
clear permits you to exit, even if the current dataset has not been saved.
Remarks and examples
Type exit to leave Stata and return to the operating system. If the dataset in memory has changed
since the last time it was saved, however, Stata will refuse. At that point, you can either save the
dataset and then type exit, or type exit, clear:
. exit
no; data in memory would be lost
r(4);
. exit, clear
Also see
[P] exit — Exit from a program or do-file
Title
exlogistic — Exact logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
exlogistic depvar indepvars [if] [in] [weight] [, options]

options                      Description
Model
  condvars(varlist)          condition on variables in varlist
  group(varname)             groups/strata are stratified by unique values of varname
  binomial(varname | #)      data are in binomial form and the number of trials is contained in varname or in #
  estconstant                estimate constant term; do not condition on the number of successes
  noconstant                 suppress constant term
Terms
  terms(termsdef)            terms definition
Options
  memory(#[b|k|m|g])         set limit on memory usage; default is memory(10m)
  saving(filename)           save the joint conditional distribution to filename
Reporting
  level(#)                   set confidence level; default is level(95)
  coef                       report estimated coefficients
  test(testopt)              report significance of observed sufficient statistic, conditional scores test, or conditional probabilities test
  mue(varlist)               compute the median unbiased estimates for varlist
  midp                       use the mid-p-value rule
  nolog                      do not display the enumeration log

by, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
fweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Exact statistics > Exact logistic regression
Description
exlogistic fits an exact logistic regression model of depvar on indepvars.
exlogistic is an alternative to logistic, the standard maximum-likelihood–based logistic regression estimator; see [R] logistic. exlogistic produces more-accurate inference in small samples because it does not depend on asymptotic results, and exlogistic can better deal with one-way causation, such as the case where all females are observed to have a positive outcome.
exlogistic with the group(varname) option is an alternative to clogit, the conditional logistic regression estimator; see [R] clogit. Like clogit, exlogistic conditions on the number of positive outcomes within stratum.
depvar can be specified in two ways. It can be zero/nonzero, with zero indicating failure and nonzero representing positive outcomes (successes), or, if you specify the binomial(varname | #) option, depvar may contain the number of positive outcomes within each trial.
exlogistic is computationally intensive. Unlike most estimators, which calculate coefficients for all independent variables at once, exlogistic calculates the results for each independent variable separately, with the other independent variables temporarily conditioned out. You can save considerable computer time by skipping the parameter calculations for variables that are not of direct interest. Specify such variables in the condvars() option rather than among the indepvars; see condvars() below.
Unlike Stata's other estimation commands, you may not use test, lincom, or other postestimation commands after exlogistic. Given the method used to calculate estimates, hypothesis tests must be performed during estimation by using exlogistic's terms() option; see terms() below.
Options
 
Model
condvars(varlist) specifies variables whose parameter estimates are not of interest to you. You can save substantial computer time and memory by moving such variables from indepvars to condvars(). Understand that you will get the same results for x1 and x3 whether you type
. exlogistic y x1 x2 x3 x4
or
. exlogistic y x1 x3, condvars(x2 x4)
group(varname) specifies the variable defining the strata, if any. A constant term is assumed for each stratum identified in varname, and the sufficient statistics for indepvars are conditioned on the observed number of successes within each group. This makes the model estimated equivalent to that estimated by clogit, Stata's conditional logistic regression command (see [R] clogit). group() may not be specified with noconstant or estconstant.
binomial(varname | #) indicates that the data are in binomial form and depvar contains the number of successes. varname contains the number of trials for each observation. If all observations have the same number of trials, you can instead specify the number as an integer. The number of trials must be a positive integer at least as great as the number of successes. If binomial() is not specified, the data are assumed to be Bernoulli, meaning that depvar equaling zero or nonzero records one failure or success.
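For example (a sketch of the syntax only; successes, trials, x1, and x2 are hypothetical variable names), if each observation records the number of successes out of the trials recorded in trials, the binomial-form call would be:
. exlogistic successes x1 x2, binomial(trials)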
estconstant estimates the constant term. By default, the models are assumed to have an intercept
(constant), but the value of the intercept is not calculated. That is, the conditional distribution of
the sufficient statistics for the indepvars is computed given the number of successes in depvar,
thus conditioning out the constant term of the model. Use estconstant if you want the estimate
of the intercept reported. estconstant may not be specified with group().
noconstant; see [R] estimation options. noconstant may not be specified with group().
 
Terms
terms(termname = variable [variable ...] [, termname = variable [variable ...] ...]) defines additional terms of the model on which you want exlogistic to perform joint-significance hypothesis tests. By default, exlogistic reports tests individually on each variable in indepvars. For instance, if variables x1 and x3 are in indepvars, and you want to jointly test their significance, specify terms(t1=x1 x3). To also test the joint significance of x2 and x4, specify terms(t1=x1 x3, t2=x2 x4). Each variable can be assigned to only one term.
Joint tests are computed only for the conditional scores tests and the conditional probabilities tests.
See the test() option below.
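Putting the pieces together (hypothetical outcome y and covariates x1–x4, shown only to illustrate the syntax described above), a fit that jointly tests the two terms with the conditional scores test would be specified as:
. exlogistic y x1 x2 x3 x4, terms(t1=x1 x3, t2=x2 x4) test(score)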
 
Options
memory(#[b|k|m|g]) sets a limit on the amount of memory exlogistic can use when computing the conditional distribution of the parameter sufficient statistics. The default is memory(10m), where m stands for megabyte, or 1,048,576 bytes. The following are also available: b stands for byte; k stands for kilobyte, which is equal to 1,024 bytes; and g stands for gigabyte, which is equal to 1,024 megabytes. The minimum setting allowed is 1m and the maximum is 2048m or 2g, but do not attempt to use more memory than is available on your computer. Also see the first technical note under example 4 on counting the conditional distribution.
saving(filename [, replace]) saves the joint conditional distribution to filename. This distribution is conditioned on those variables specified in condvars(). Use replace to replace an existing file with filename. A Stata data file is created containing all the feasible values of the parameter sufficient statistics. The variable names are the same as those in indepvars, in addition to a variable named _f_ containing the feasible value frequencies (sometimes referred to as the condition numbers).
 
Reporting
level(#); see [R] estimation options. The level(#) option will not work on replay because confidence intervals are based on estimator-specific enumerations. To change the confidence level, you must refit the model.
coef reports the estimated coefficients rather than odds ratios (exponentiated coefficients). coef may
be specified when the model is fit or upon replay. coef affects only how results are displayed and
not how they are estimated.
test(sufficient | score | probability) reports the significance level of the observed sufficient statistics, the conditional scores tests, or the conditional probabilities tests, respectively. The default is test(sufficient). If terms() is included in the specification, the conditional scores test and the conditional probabilities test are applied to each term, providing conditional inference for several parameters simultaneously. All the statistics are computed at estimation time regardless of which is specified. Each statistic may thus also be displayed postestimation without having to refit the model; see [R] exlogistic postestimation.
mue(varlist) specifies that median unbiased estimates (MUEs) be reported for the variables in varlist. By default, the conditional maximum likelihood estimates (CMLEs) are reported, except for those parameters for which the CMLEs are infinite. Specify mue(_all) if you want MUEs for all the indepvars.
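For instance (hypothetical variable names, shown only to illustrate the option), requesting MUEs for every regressor would look like:
. exlogistic y x1 x2, mue(_all)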
midp instructs exlogistic to use the mid-p-value rule when computing the MUEs, significance
levels, and confidence intervals. This adjustment is for the discreteness of the distribution and
halves the value of the discrete probability of the observed statistic before adding it to the p-value.
The mid-p-value rule cannot be applied to MUEs whose corresponding parameter CMLE is infinite.
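To state the adjustment explicitly (a restatement of the rule just described using the notation of Methods and formulas, not an additional result from the manual), the one-sided mid-p-value replaces Pr(T_1 ≥ t_1) with

\[ \Pr(T_1 > t_1 \mid \cdot) + \tfrac{1}{2}\Pr(T_1 = t_1 \mid \cdot) \]

so only half of the probability mass of the observed sufficient statistic counts against the null hypothesis.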
nolog prevents the display of the enumeration log. By default, the enumeration log is displayed,
showing the progress of computing the conditional distribution of the sufficient statistics.
Remarks and examples
Exact logistic regression is the estimation of the logistic model parameters by using the conditional
distribution of the parameter sufficient statistics. The estimates are referred to as the conditional
maximum likelihood estimates (CMLEs). This technique was first introduced by Cox and Snell (1989)
as an alternative to using maximum likelihood estimation, which can perform poorly for small sample
sizes. For stratified data, exact logistic regression is a small-sample alternative to conditional logistic
regression. See [R] logit, [R] logistic, and [R] clogit to obtain maximum likelihood estimates (MLEs)
for the logistic model and the conditional logistic model. For a comprehensive overview of exact
logistic regression, see Mehta and Patel (1995).
Let Y_i denote a Bernoulli random variable where we observe the outcome Y_i = y_i, i = 1, ..., n. Associated with each independent observation is a 1 × p vector of covariates, x_i. We will denote π_i = Pr(Y_i | x_i) and let the logit function model the relationship between Y_i and x_i,

\[ \log\left(\frac{\pi_i}{1-\pi_i}\right) = \theta + \mathbf{x}_i\boldsymbol{\beta} \]

where the constant term θ and the 1 × p vector of regression parameters β are unknown. The probability of observing Y_i = y_i, i = 1, ..., n, is

\[ \Pr(\mathbf{Y}=\mathbf{y}) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i} \]

where Y = (Y_1, ..., Y_n) and y = (y_1, ..., y_n). The MLEs for θ and β maximize the log of this function.

The sufficient statistics for θ and β_j, j = 1, ..., p, are M = Σ_{i=1}^n Y_i and T_j = Σ_{i=1}^n Y_i x_ij, respectively, and we observe M = m and T_j = t_j. By default, exlogistic tallies the conditional distribution of T = (T_1, ..., T_p) given M = m. This distribution will have a size of $\binom{n}{m}$. (It would have a size of 2^n without conditioning on M = m.) Denote one of these vectors T^{(k)} = (t_1^{(k)}, ..., t_p^{(k)}), k = 1, ..., N, with combinatorial coefficient (frequency) c_k, $\sum_{k=1}^{N} c_k = \binom{n}{m}$.

For each independent variable x_j, j = 1, ..., p, we reduce the conditional distribution further by conditioning on all other observed sufficient statistics T_l = t_l, l ≠ j. The conditional probability of observing T_j = t_j has the form

\[ \Pr(T_j = t_j \mid T_l = t_l,\ l \neq j,\ M = m) = \frac{c\, e^{t_j \beta_j}}{\sum_k c_k\, e^{t_j^{(k)} \beta_j}} \]
where the sum is over the subset of T vectors such that (T_1^{(k)} = t_1, ..., T_j^{(k)} = t_j^{(k)}, ..., T_p^{(k)} = t_p) and c is the combinatorial coefficient associated with the observed t. The CMLE for β_j maximizes the log of this function.
Specifying nuisance variables in condvars() will reduce the size of the conditional distribution by
conditioning on their observed sufficient statistics as well as conditioning on M=m. This reduces
the amount of memory consumed at the cost of not obtaining regression estimates for those variables
specified in condvars().
Inferences from MLEs rely on asymptotics, and if your sample size is small, these inferences may
not be valid. On the other hand, inferences from the CMLEs are exact in the sense that they use the
conditional distribution of the sufficient statistics outlined above.
For small datasets, it is common for the dependent variable to be completely determined by the
data. Here the MLEs and the CMLEs are unbounded. exlogistic will instead compute the MUE,
the regression estimate that places the observed sufficient statistic at the median of the conditional
distribution.
Example 1
One example presented by Mehta and Patel (1995) is data from a prospective study of perinatal
infection and human immunodeficiency virus type 1 (HIV-1). We use a variation of this dataset. There
was an investigation (Hutto et al. 1991) into whether the blood serum levels of glycoproteins CD4 and CD8 measured in infants at 6 months of age might predict their development of HIV infection.
The blood serum levels are coded as ordinal values 0, 1, and 2.
. use http://www.stata-press.com/data/r13/hiv1
(prospective study of perinatal infection of HIV-1)
. list in 1/5
hiv cd4 cd8
1. 1 0 0
2. 0 0 0
3. 1 0 2
4. 1 1 0
5. 0 1 0
We first obtain the MLEs from logistic so that we can compare the estimates and associated statistics
with the CMLEs from exlogistic.
. logistic hiv cd4 cd8, coef
Logistic regression Number of obs = 47
LR chi2(2) = 15.75
Prob > chi2 = 0.0004
Log likelihood = -20.751687 Pseudo R2 = 0.2751
hiv Coef. Std. Err. z P>|z| [95% Conf. Interval]
cd4 -2.541669 .8392231 -3.03 0.002 -4.186517 -.8968223
cd8 1.658586 .821113 2.02 0.043 .0492344 3.267938
_cons .5132389 .6809007 0.75 0.451 -.8213019 1.84778
. exlogistic hiv cd4 cd8, coef
Enumerating sample-space combinations:
observation 1: enumerations = 2
observation 2: enumerations = 3
(output omitted )
observation 46: enumerations = 601
observation 47: enumerations = 326
Exact logistic regression Number of obs = 47
Model score = 13.34655
Pr >= score = 0.0006
hiv Coef. Suff. 2*Pr(Suff.) [95% Conf. Interval]
cd4 -2.387632 10 0.0004 -4.699633 -.8221807
cd8 1.592366 12 0.0528 -.0137905 3.907876
exlogistic produced a log showing how many records are generated as it processes each observation.
The primary purpose of the log is to provide feedback because generating the distribution can be time
consuming, but we also see from the last entry that the joint distribution for the sufficient statistics
for cd4 and cd8 conditioned on the total number of successes has 326 unique values (but a size of $\binom{47}{14}$ = 341,643,774,795).
The statistics for logistic are based on asymptotics: for a large sample size, each Z statistic
will be approximately normally distributed (with a mean of zero and a standard deviation of one)
if the associated regression parameter is zero. The question is whether a sample size of 47 is large
enough.
On the other hand, the p-values computed by exlogistic are from the conditional distributions
of the sufficient statistics for each parameter given the sufficient statistics for all other parameters.
In this sense, these p-values are exact. By default, exlogistic reports the sufficient statistics for
the regression parameters and the probability of observing a more extreme value. These are single-parameter tests for H0: β_cd4 = 0 and H0: β_cd8 = 0 versus the two-sided alternatives. The conditional scores test, located in the coefficient table header, is testing that both H0: β_cd4 = 0 and H0: β_cd8 = 0.
We find these p-values to be in fair agreement with the Wald and likelihood-ratio tests from logistic.
The confidence intervals for exlogistic are computed from the exact conditional distributions.
The exact confidence intervals are asymmetrical about the estimate and are wider than the normal-based
confidence intervals from logistic.
Both estimation techniques indicate that the incidence of HIV infection decreases with increasing
CD4 blood serum levels and increases with increasing CD8 blood serum levels. The constant term is
missing from the exact logistic coefficient table because we conditioned out its observed sufficient
statistic when tallying the joint distribution of the sufficient statistics for the cd4 and cd8 parameters.
The test() option provides two other test statistics used in exact logistic: the conditional scores
test, test(score), and the conditional probabilities test, test(probability). For comparison, we
display the individual parameter conditional scores tests.
. exlogistic, test(score) coef
Exact logistic regression Number of obs = 47
Model score = 13.34655
Pr >= score = 0.0006
hiv Coef. Score Pr>=Score [95% Conf. Interval]
cd4 -2.387632 12.88022 0.0003 -4.699633 -.8221807
cd8 1.592366 4.604816 0.0410 -.0137905 3.907876
For the probabilities test, the probability statistic is computed from (1) in Methods and formulas
with β=0. For this example, the significance of the probabilities tests matches the scores tests so
they are not displayed here.
Technical note
Typically, the value of θ, the constant term, is of little interest, as well as perhaps some of the
parameters in β, but we need to include all parameters in the model to correctly specify it. By
conditioning out the nuisance parameters, we can reduce the size of the joint conditional distribution
that is used to estimate the regression parameters of interest. The condvars() option allows you to
specify a varlist of nuisance variables. By default, exlogistic conditions on the sufficient statistic
of θ, which is the number of successes. You can save computation time and computer memory by
using the condvars() option because infeasible values of the sufficient statistics associated with the
variables in condvars() can be dropped from consideration before all n observations are processed.
Specifying some of your independent variables in condvars() will not change the estimated
regression coefficients of the remaining independent variables. For instance, in example 1, if we
instead type
. exlogistic hiv cd4, condvars(cd8) coef
the regression coefficient for cd4 (as well as all associated inference) will be identical.
One reason to have multiple variables in indepvars is to make conditional inference of several
parameters simultaneously by using the terms() option. If you do not wish to test several parameters
simultaneously, it may be more efficient to obtain estimates for individual variables by calling
exlogistic multiple times with one variable in indepvars and all other variables listed in condvars().
The estimates will be the same as those with all variables in indepvars.
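As a sketch of that strategy (hypothetical outcome y and covariates x1–x3, shown only to illustrate the pattern), the following calls recover, one variable at a time, the same estimates that a single call with all three variables in indepvars would produce:
. exlogistic y x1, condvars(x2 x3)
. exlogistic y x2, condvars(x1 x3)
. exlogistic y x3, condvars(x1 x2)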
Technical note
If you fit a clogit (see [R]clogit) model to the HIV data from example 1, you will find that
the estimates differ from those with exlogistic. (To fit the clogit model, you will have to create
a group variable that includes all observations.) The regression estimates will be different because
clogit conditions on the constant term only, whereas the estimates from exlogistic condition on
the sufficient statistic of the other regression parameter as well as the constant term.
Example 2
The HIV data presented in table IV of Mehta and Patel (1995) are in a binomial form, where the
variable hiv contains the HIV cases that tested positive and the variable n contains the number of
individuals with the same CD4 and CD8 levels, the binomial number-of-trials parameter. Here depvar
is hiv, and we use the binomial(n) option to identify the number-of-trials variable.
. use http://www.stata-press.com/data/r13/hiv_n
(prospective study of perinatal infection of HIV-1; binomial form)
. list
cd4 cd8 hiv n
1. 0 2 1 1
2. 1 2 2 2
3. 0 0 4 7
4. 1 1 4 12
5. 2 2 1 3
6. 1 0 2 7
7. 2 0 0 2
8. 2 1 0 13
Further, the cd4 and cd8 variables of the hiv dataset are actually factor variables, where each has
the ordered levels of (0,1,2). Another approach to the analysis is to use indicator variables, and
following Mehta and Patel (1995), we used a 0–1 coding scheme that will give us the odds ratio of
level 0 versus 2 and level 1 versus 2.
. generate byte cd4_0 = (cd4==0)
. generate byte cd4_1 = (cd4==1)
. generate byte cd8_0 = (cd8==0)
. generate byte cd8_1 = (cd8==1)
. exlogistic hiv cd4_0 cd4_1 cd8_0 cd8_1, terms(cd4=cd4_0 cd4_1,
> cd8=cd8_0 cd8_1) binomial(n) test(probability) saving(dist, replace) nolog
note: saving distribution to file dist.dta
note: CMLE estimate for cd4_0 is +inf; computing MUE
note: CMLE estimate for cd4_1 is +inf; computing MUE
note: CMLE estimate for cd8_0 is -inf; computing MUE
note: CMLE estimate for cd8_1 is -inf; computing MUE
Exact logistic regression Number of obs = 47
Binomial variable: n Model prob. = 3.19e-06
Pr <= prob. = 0.0011
hiv Odds Ratio Prob. Pr<=Prob. [95% Conf. Interval]
cd4 .0007183 0.0055
cd4_0 18.82831* .007238 0.0072 1.714079 +Inf
cd4_1 11.53732* .0063701 0.0105 1.575285 +Inf
cd8 .0053212 0.0323
cd8_0 .1056887* .0289948 0.0290 0 1.072531
cd8_1 .0983388* .0241503 0.0242 0 .9837203
(*) median unbiased estimates (MUE)
. matrix list e(sufficient)
e(sufficient)[1,4]
cd4_0 cd4_1 cd8_0 cd8_1
r1 5 8 6 4
. display e(n_possible)
1091475
Here we used terms() to specify two terms in the model, cd4 and cd8, that make up the cd4 and cd8
indicator variables. By doing so, we obtained a conditional probabilities test for cd4, simultaneously
testing both cd4 0 and cd4 1, and for cd8, simultaneously testing both cd8 0 and cd8 1. The
significance levels for the two terms are 0.0055 and 0.0323, respectively.
This example also illustrates instances where the dependent variable is completely determined by
the independent variables and CMLEs are infinite. If we try to obtain MLEs, logistic will drop each
variable and then terminate with a no-data error, error number 2000.
. use http://www.stata-press.com/data/r13/hiv_n, clear
(prospective study of perinatal infection of HIV-1; binomial form)
. generate byte cd4_0 = (cd4==0)
. generate byte cd4_1 = (cd4==1)
. generate byte cd8_0 = (cd8==0)
. generate byte cd8_1 = (cd8==1)
. expand n
(39 observations created)
. logistic hiv cd4_0 cd4_1 cd8_0 cd8_1
note: cd4_0 != 0 predicts success perfectly
cd4_0 dropped and 8 obs not used
note: cd4_1 != 0 predicts success perfectly
cd4_1 dropped and 21 obs not used
note: cd8_0 != 0 predicts failure perfectly
cd8_0 dropped and 2 obs not used
outcome = cd8_1 <= 0 predicts data perfectly
r(2000);
In example 2, exlogistic generated the joint conditional distribution of T_cd4_0, T_cd4_1, T_cd8_0, and T_cd8_1 given M = 14 (the number of individuals that tested positive), and for reference, we listed the observed sufficient statistics that are stored in the matrix e(sufficient). Below we take that distribution and further condition on T_cd4_1 = 8, T_cd8_0 = 6, and T_cd8_1 = 4, giving the conditional distribution of T_cd4_0. Here we see that the observed sufficient statistic T_cd4_0 = 5 is last in the sorted listing or, equivalently, T_cd4_0 is at the domain boundary of the conditional probability distribution. When this occurs, the conditional probability distribution is monotonically increasing in β_cd4_0 and a maximum does not exist.
. use dist, clear
. keep if cd4_1==8 & cd8_0==6 & cd8_1==4
(4139 observations deleted)
. list, sep(0)
_f_ cd4_0 cd4_1 cd8_0 cd8_1
1. 1668667 0 8 6 4
2. 18945542 1 8 6 4
3. 55801053 2 8 6 4
4. 55867350 3 8 6 4
5. 17423175 4 8 6 4
6. 1091475 5 8 6 4
When the CMLEs are infinite, the MUEs are computed (Hirji, Tsiatis, and Mehta 1989). For the cd4_0 estimate, we compute the value $\bar{\beta}_{cd4\_0}$ such that

\[ \Pr(T_{cd4\_0} \geq 5 \mid \beta_{cd4\_0} = \bar{\beta}_{cd4\_0},\ T_{cd4\_1} = 8,\ T_{cd8\_0} = 6,\ T_{cd8\_1} = 4,\ M = 14) = 1/2 \]

using (1) in Methods and formulas.
The output is in agreement with example 1: there is an increase in risk of HIV infection for a CD4
blood serum level of 0 relative to a level of 2 and for a level of 1 relative to a level of 2; there is a
decrease in risk of HIV infection for a CD8 blood serum level of 0 relative to a level of 2 and for a
level of 1 relative to a level of 2.
We also displayed e(n_possible). This is the combinatorial coefficient associated with the observed sufficient statistics. The same value is found in the _f_ variable of the conditional distribution dataset listed above. The size of the distribution is $\binom{47}{14}$ = 341,643,774,795. This can be verified by summing the _f_ variable of the generated conditional distribution dataset.
. use dist, clear
. summarize _f_, meanonly
. di %15.1f r(sum)
341643774795.0
Example 3
One can think of exact logistic regression as a covariate-adjusted exact binomial. To demonstrate
this point, we will use exlogistic to compute a binomial confidence interval for m successes of n trials, by fitting the constant-only model, and we will compare it with the confidence interval computed by ci (see [R] ci). We will use the saving() option to retain the dataset containing the feasible values for the constant term sufficient statistic, namely, the number of successes, m, given n trials and their associated combinatorial coefficients $\binom{n}{m}$, m = 0, 1, ..., n.
. input y
y
1. 1
2. 0
3. 1
4. 0
5. 1
6. 1
7. end
. ci y, binomial
Binomial Exact
Variable Obs Mean Std. Err. [95% Conf. Interval]
y 6 .6666667 .1924501 .2227781 .9567281
. exlogistic y, estconstant nolog coef saving(binom)
note: saving distribution to file binom.dta
Exact logistic regression
Number of obs = 6
y Coef. Suff. 2*Pr(Suff.) [95% Conf. Interval]
_cons .6931472 4 0.6875 -1.24955 3.096017
We use the postestimation program estat predict to transform the estimated constant term and its confidence bounds by using the inverse logit function, invlogit() (see [D] functions). The standard error for the estimated probability is computed using the delta method.
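If you want to check the transformation by hand (a quick arithmetic check of the results reported above and below, not part of the original example), applying invlogit() to the estimated constant and its bounds reproduces the probability-scale results:
. display invlogit(.6931472)
. display invlogit(-1.24955)
. display invlogit(3.096017)
These return approximately 0.6667, 0.2228, and 0.9567, matching the prediction and exact confidence bounds shown by estat predict below.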
. estat predict
y Predicted Std. Err. [95% Conf. Interval]
Probability 0.6667 0.1925 0.2228 0.9567
. use binom, replace
. list, sep(0)
_f_ _cons_
1. 1 0
2. 6 1
3. 15 2
4. 20 3
5. 15 4
6. 6 5
7. 1 6
Examining the listing of the generated data, the values contained in the variable _cons_ are the feasible values of M, and the values contained in the variable _f_ are the binomial coefficients $\binom{6}{m}$ with total $\sum_{m=0}^{6}\binom{6}{m} = 2^6 = 64$. In the coefficient table, the sufficient statistic for the constant term, labeled Suff., is m = 4. This value is located at record 5 of the dataset. Therefore, the two-tailed probability of the sufficient statistic is computed as 0.6875 = 2(15 + 6 + 1)/64.
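You can verify this arithmetic directly (a one-line check of the calculation above, not part of the original example):
. display 2*(15 + 6 + 1)/64
which displays .6875.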
The constant term is the value of θ that maximizes the probability of observing M = 4; see (1) of Methods and formulas:

\[ \Pr(M = 4 \mid \theta) = \frac{15e^{4\theta}}{1 + 6e^{\theta} + 15e^{2\theta} + 20e^{3\theta} + 15e^{4\theta} + 6e^{5\theta} + e^{6\theta}} \]

The maximum is at the value θ = log 2, which is demonstrated in the figure below.

[Figure: Pr(M = 4 | θ) plotted against the constant term over [−2, 4]; the curve peaks at (log(2), 0.33).]
The lower and upper confidence bounds are the values of θ such that Pr(M ≥ 4 | θ) = 0.025 and Pr(M ≤ 4 | θ) = 0.025, respectively. These probabilities are plotted in the figure below for θ ∈ [−2, 4].

[Figure: cumulative probabilities Pr(M >= 4) and Pr(M <= 4) plotted against the constant term; the curves cross the 0.025 level at the confidence bounds, (−1.25, .025) and (3.1, .025).]
Example 4
This example demonstrates the group() option, which allows the analysis of stratified data. Here
the logistic model is
\[ \log\left(\frac{\pi_{ik}}{1-\pi_{ik}}\right) = \theta_k + \mathbf{x}_{ki}\boldsymbol{\beta} \]

where k indexes the s strata, k = 1, ..., s, and θ_k is the strata-specific constant term whose sufficient statistic is M_k = Σ_{i=1}^{n_k} Y_{ki}.
Mehta and Patel (1995) use a case–control study to demonstrate this model, which is useful in
comparing the estimates from exlogistic and clogit. This study was intended to determine the role
of birth complications in people with schizophrenia (Garsd 1988). Siblings from seven families took
part in the study, and each individual was classified as normal or schizophrenic. A birth complication
index is recorded for each individual that ranges from 0, an uncomplicated birth, to 15, a very
complicated birth. Some of the frequencies contained in variable f are greater than 1, and these count
different births at different times where the individual has the same birth complications index, found
in variable BCindex.
. use http://www.stata-press.com/data/r13/schizophrenia, clear
(case-control study on birth complications for people with schizophrenia)
. list, sepby(family)
family BCindex schizo f
1. 1 6 0 1
2. 1 7 0 1
3. 1 3 0 2
4. 1 2 0 3
5. 1 5 0 1
6. 1 0 0 1
7. 1 15 1 1
8. 2 2 1 1
9. 2 0 0 1
10. 3 2 0 1
11. 3 9 1 1
12. 3 1 0 1
13. 4 2 1 1
14. 4 0 0 4
15. 5 3 1 1
16. 5 6 0 1
17. 5 0 1 1
18. 6 3 0 1
19. 6 0 1 1
20. 6 0 0 2
21. 7 2 0 1
22. 7 6 1 1
. exlogistic schizo BCindex [fw=f], group(family) test(score) coef
Enumerating sample-space combinations:
observation 1: enumerations = 2
observation 2: enumerations = 3
observation 3: enumerations = 4
observation 4: enumerations = 5
observation 5: enumerations = 6
observation 6: enumerations = 7
(output omitted )
observation 21: enumerations = 72
observation 22: enumerations = 40
Exact logistic regression Number of obs = 29
Group variable: family Number of groups = 7
Obs per group: min = 2
avg = 4.1
max = 10
Model score = 6.32803
Pr >= score = 0.0167
schizo Coef. Score Pr>=Score [95% Conf. Interval]
BCindex .3251178 6.328033 0.0167 .0223423 .7408832
The asymptotic alternative for this model can be estimated using clogit (equivalently, xtlogit,
fe) and is listed below for comparison. We must expand the data because clogit will not accept
frequency weights if they are not constant within the groups.
. expand f
(7 observations created)
. clogit schizo BCindex, group(family) nolog
note: multiple positive outcomes within groups encountered.
Conditional (fixed-effects) logistic regression Number of obs = 29
LR chi2(1) = 5.20
Prob > chi2 = 0.0226
Log likelihood = -6.2819819 Pseudo R2 = 0.2927
schizo Coef. Std. Err. z P>|z| [95% Conf. Interval]
BCindex .3251178 .1678981 1.94 0.053 -.0039565 .654192
Both techniques compute the same regression estimate for the BCindex, which might not be too
surprising because both estimation techniques condition on the total number of successes in each group.
The difference lies in the p-values and confidence intervals. The p-value testing H0: β_BCindex = 0
is approximately 0.0167 for the exact conditional scores test and 0.053 for the asymptotic Wald test.
Moreover, the exact confidence interval is asymmetric about the estimate and does not contain zero.
Technical note
The memory(#) option limits the amount of memory that exlogistic will consume when computing the conditional distribution of the parameter sufficient statistics. memory() is independent of the data maximum memory setting (see set max_memory in [D] memory), and it is possible for exlogistic to exceed the memory limit specified in set max_memory without terminating.
By default, a log is provided that displays the number of enumerations (the size of the conditional
distribution) after processing each observation. Typically, you will see the number of enumerations
increase, and then at some point they will decrease as the multivariate shift algorithm (Hirji, Mehta,
and Patel 1987) determines that some of the enumerations cannot achieve the observed sufficient
statistics of the conditioning variables. When the algorithm is complete, however, it is necessary
to store the conditional distribution of the parameter sufficient statistics as a dataset. It is possible,
therefore, to get a memory error when the algorithm has completed if there is not enough memory
to store the conditional distribution.
Technical note
Computing the conditional distributions and reported statistics requires data sorting and numerical
comparisons. If there is at least one single-precision variable specified in the model, exlogistic
will make comparisons with a relative precision of 2^−5. Otherwise, a relative precision of 2^−11 is used. Be careful if you use recast to promote a single-precision variable to double precision (see [D] recast). You might try listing the data in full precision (maybe %20.15g; see [D] format) to make sure that this is really what you want. See [D] data types for information on precision of numeric storage types.
Stored results
exlogistic stores the following in e():
Scalars
  e(N)                   number of observations
  e(k_groups)            number of groups
  e(n_possible)          number of distinct possible outcomes where sum(sufficient) equals observed e(sufficient)
  e(n_trials)            binomial number-of-trials parameter
  e(sum_y)               sum of depvar
  e(k_indvars)           number of independent variables
  e(k_terms)             number of model terms
  e(k_condvars)          number of conditioning variables
  e(condcons)            conditioned on the constant(s) indicator
  e(midp)                mid-p-value rule indicator
  e(eps)                 relative difference tolerance
Macros
  e(cmd)                 exlogistic
  e(cmdline)             command as typed
  e(title)               title in estimation output
  e(depvar)              name of dependent variable
  e(indvars)             independent variables
  e(condvars)            conditional variables
  e(groupvar)            group variable
  e(binomial)            binomial number-of-trials variable
  e(level)               confidence level
  e(wtype)               weight type
  e(wexp)                weight expression
  e(datasignature)       the checksum
  e(datasignaturevars)   variables used in calculation of checksum
  e(properties)          b
  e(estat_cmd)           program used to implement estat
  e(marginsnotok)        predictions disallowed by margins
Matrices
  e(b)                   coefficient vector
  e(mue_indicators)      indicator for elements of e(b) estimated using MUE instead of CMLE
  e(se)                  e(b) standard errors (CMLEs only)
  e(ci)                  matrix of e(level) confidence intervals for e(b)
  e(sum_y_groups)        sum of e(depvar) for each group
  e(N_g)                 number of observations in each group
  e(sufficient)          sufficient statistics for e(b)
  e(p_sufficient)        p-value for e(sufficient)
  e(scoretest)           conditional scores tests for indepvars
  e(p_scoretest)         p-values for e(scoretest)
  e(probtest)            conditional probabilities tests for indepvars
  e(p_probtest)          p-value for e(probtest)
  e(scoretest_m)         conditional scores tests for model terms
  e(p_scoretest_m)       p-value for e(scoretest_m)
  e(probtest_m)          conditional probabilities tests for model terms
  e(p_probtest_m)        p-value for e(probtest_m)
Functions
  e(sample)              marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Sufficient statistics
Conditional distribution and CMLE
Median unbiased estimates and exact CI
Conditional hypothesis tests
Sufficient-statistic p-value
Sufficient statistics
Let {Y_1, Y_2, ..., Y_n} be a set of n independent Bernoulli random variables, each of which can realize two outcomes, {0, 1}. For each i = 1, ..., n, we observe Y_i = y_i, and associated with each observation is the covariate row vector of length p, x_i = (x_i1, ..., x_ip). Denote β = (β_1, ..., β_p)^T to be the column vector of regression parameters and θ to be the constant. The sufficient statistic for β_j is T_j = Σ_{i=1}^n Y_i x_ij, j = 1, ..., p, and for θ is M = Σ_{i=1}^n Y_i. We observe T_j = t_j, t_j = Σ_{i=1}^n y_i x_ij, and M = m, m = Σ_{i=1}^n y_i. The probability of observing (Y_1 = y_1, Y_2 = y_2, ..., Y_n = y_n) is

\[ \Pr(Y_1 = y_1, \ldots, Y_n = y_n \mid \boldsymbol{\beta}, \mathbf{X}) = \frac{\exp(m\theta + \mathbf{t}\boldsymbol{\beta})}{\prod_{i=1}^{n}\{1 + \exp(\theta + \mathbf{x}_i\boldsymbol{\beta})\}} \]

where t = (t_1, ..., t_p) and X = (x_1^T, ..., x_n^T)^T.

The joint distribution of the sufficient statistics T is obtained by summing over all possible binary sequences Y_1, ..., Y_n such that T = t and M = m. This probability function is

\[ \Pr(T_1 = t_1, \ldots, T_p = t_p, M = m \mid \boldsymbol{\beta}, \mathbf{X}) = c(\mathbf{t}, m)\,\frac{\exp(m\theta + \mathbf{t}\boldsymbol{\beta})}{\prod_{i=1}^{n}\{1 + \exp(\theta + \mathbf{x}_i\boldsymbol{\beta})\}} \]

where c(t, m) is the combinatorial coefficient of (t, m) or the number of distinct binary sequences Y_1, ..., Y_n such that T = t and M = m (Cox and Snell 1989).
Conditional distribution and CMLE
Without loss of generality, we will restrict our discussion to computing the CMLE of β_1. If we condition on observing M = m and T_2 = t_2, ..., T_p = t_p, the probability function of (T_1 | β_1, T_2 = t_2, ..., T_p = t_p, M = m) is

\[ \Pr(T_1 = t_1 \mid \beta_1, T_2 = t_2, \ldots, T_p = t_p, M = m) = \frac{c(\mathbf{t}, m)\, e^{t_1\beta_1}}{\sum_{u} c(u, t_2, \ldots, t_p, m)\, e^{u\beta_1}} \tag{1} \]

where the sum in the denominator is over all possible values of T_1 such that M = m and T_2 = t_2, ..., T_p = t_p and c(u, t_2, ..., t_p, m) is the combinatorial coefficient of (u, t_2, ..., t_p, m) (Cox and Snell 1989). The CMLE for β_1 is the value $\hat{\beta}_1$ that maximizes the log of (1). This optimization task is carried out by ml, using the conditional frequency distribution of (T_1 | T_2 = t_2, ..., T_p = t_p, M = m) as a dataset. Generating the joint conditional distribution is efficiently computed using the multivariate shift algorithm described by Hirji, Mehta, and Patel (1987).

Difficulties in computing $\hat{\beta}_1$ arise if the observed (T_1 = t_1, ..., T_p = t_p, M = m) lies on the boundaries of the distribution of (T_1 | T_2 = t_2, ..., T_p = t_p, M = m), where the conditional probability function is monotonically increasing (or decreasing) in β_1. Here the CMLE is plus infinity if it is on the upper boundary, Pr(T_1 ≤ t_1 | T_2 = t_2, ..., T_p = t_p, M = m) = 1, and is minus infinity if it is on the lower boundary of the distribution, Pr(T_1 ≥ t_1 | T_2 = t_2, ..., T_p = t_p, M = m) = 1. This concept is demonstrated in example 2. When infinite CMLEs occur, the MUE is computed.
Median unbiased estimates and exact CI
The MUE is computed using the technique outlined by Hirji, Tsiatis, and Mehta (1989). First, we find the values of β_1^{(u)} and β_1^{(l)} such that

\[ \Pr(T_1 \leq t_1 \mid \beta_1 = \beta_1^{(u)}, T_2 = t_2, \ldots, T_p = t_p, M = m) = \Pr(T_1 \geq t_1 \mid \beta_1 = \beta_1^{(l)}, T_2 = t_2, \ldots, T_p = t_p, M = m) = 1/2 \tag{2} \]

The MUE is then $\bar{\beta}_1 = \left(\beta_1^{(l)} + \beta_1^{(u)}\right)/2$. However, if T_1 is equal to the minimum of the domain of the conditional distribution, β^{(l)} does not exist and $\bar{\beta}_1 = \beta^{(u)}$. If T_1 is equal to the maximum of the domain of the conditional distribution, β^{(u)} does not exist and $\bar{\beta}_1 = \beta^{(l)}$.

Confidence bounds for β are computed similarly, except that we substitute α/2 for 1/2 in (2), where 1 − α is the confidence level. Here β_1^{(l)} would then be the lower confidence bound and β_1^{(u)} would be the upper confidence bound (see example 3).
Conditional hypothesis tests
To test H0: β_1 = 0 versus H1: β_1 ≠ 0, we obtain the exact p-value from $\sum_{u \in E} f_1(u) - f_1(t_1)/2$ if the mid-p-value rule is used and $\sum_{u \in E} f_1(u)$ otherwise. Here E is a critical region, and we define f_1(u) = Pr(T_1 = u | β_1 = 0, T_2 = t_2, ..., T_p = t_p, M = m) for ease of notation. There are two popular ways to define the critical region: the conditional probabilities test and the conditional scores test (Mehta and Patel 1995). The critical region when using the conditional probabilities test is all values of the sufficient statistic for β_1 that have a probability less than or equal to that of the observed t_1, E_p = {u : f_1(u) ≤ f_1(t_1)}. The critical region of the conditional scores test is defined as all values of the sufficient statistic for β_1 such that its score is greater than or equal to that of t_1,

\[ E_s = \left\{ u : (u - \mu_1)^2/\sigma_1^2 \geq (t_1 - \mu_1)^2/\sigma_1^2 \right\} \]

Here μ_1 and σ_1^2 are the mean and variance of (T_1 | β_1 = 0, T_2 = t_2, ..., T_p = t_p, M = m).

The score statistic is defined as

\[ \left\{ \frac{\partial \ell(\beta)}{\partial \beta} \right\}^{2} \left\{ -E\!\left( \frac{\partial^2 \ell(\beta)}{\partial \beta^2} \right) \right\}^{-1} \]

evaluated at H0: β = 0, where ℓ is the log of (1). The score test simplifies to (t − E[T | β])^2 / var(T | β) (Hirji 2006), where the mean and variance are computed from the conditional distribution of the sufficient statistic with β = 0 and t is the observed sufficient statistic.
Sufficient-statistic p-value
The p-value for testing H0: β_1 = 0 versus the two-sided alternative when (T_1 = t_1 | T_2 = t_2, ..., T_p = t_p) is computed as 2 × min(p_l, p_u), where

\[ p_l = \frac{\sum_{u \leq t_1} c(u, t_2, \ldots, t_p, m)}{\sum_{u} c(u, t_2, \ldots, t_p, m)} \qquad\qquad p_u = \frac{\sum_{u \geq t_1} c(u, t_2, \ldots, t_p, m)}{\sum_{u} c(u, t_2, \ldots, t_p, m)} \]

It is the probability of observing a more extreme T_1.
References
Cox, D. R., and E. J. Snell. 1989. Analysis of Binary Data. 2nd ed. London: Chapman & Hall.
Garsd, A. 1988. Schizophrenia and birth complications. Unpublished manuscript.
Hirji, K. F. 2006. Exact Analysis of Discrete Data. Boca Raton: Chapman & Hall/CRC.
Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing distributions for exact logistic regression. Journal of the
American Statistical Association 82: 1110–1117.
Hirji, K. F., A. A. Tsiatis, and C. R. Mehta. 1989. Median unbiased estimation for binary data. American Statistician
43: 7–11.
Hutto, C., W. P. Parks, S. Lai, M. T. Mastrucci, C. Mitchell, J. Muñoz, E. Trapido, I. M. Master, and G. B. Scott.
1991. A hospital-based prospective study of perinatal infection with human immunodeficiency virus type 1. Journal
of Pediatrics 118: 347–353.
Mehta, C. R., and N. R. Patel. 1995. Exact logistic regression: Theory and examples. Statistics in Medicine 14:
2143–2160.
Also see
[R] exlogistic postestimation — Postestimation tools for exlogistic
[R] binreg — Generalized linear models: Extensions to the binomial family
[R] clogit — Conditional (fixed-effects) logistic regression
[R] expoisson — Exact Poisson regression
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[U] 20 Estimation and postestimation commands
Title
exlogistic postestimation — Postestimation tools for exlogistic
Description Syntax for estat Menu for estat Options for estat predict
Option for estat se Remarks and examples Stored results Reference
Also see
Description
The following postestimation commands are of special interest after exlogistic:
Command Description
estat predict single-observation prediction
estat se report ORs or coefficients and their asymptotic standard errors
The following standard postestimation command is also available:
Command Description
estat summarize summary statistics for the estimation sample
estat summarize is not allowed if the binomial() option was specified in exlogistic.
See [R]estat summarize for details.
Special-interest postestimation commands
estat predict computes a predicted probability (or linear predictor), its asymptotic standard
error, and its exact confidence interval for 1 observation. Predictions are carried out by estimating the
constant coefficient after shifting the independent variables and conditioned variables by the values
specified in the at() option or by their medians. Therefore, predictions must be done with the
estimation sample in memory. If a different dataset is used or if the dataset is modified, then an error
will result.
estat se reports odds ratio or coefficients and their asymptotic standard errors. The estimates are
stored in the matrix r(estimates).
Syntax for estat
Single-observation prediction
estat predict [, options]

Report ORs or coefficients and their asymptotic standard errors

estat se [, coef]

options                      Description
  pr                         probability; the default
  xb                         linear effect
  at(atspec)                 use the specified values for the indepvars and condvars()
  level(#)                   set confidence level for the predicted value; default is level(95)
  memory(#[b|k|m|g])         set limit on memory usage; default is memory(10m)
  nolog                      do not display the enumeration log
These statistics are available only for the estimation sample.
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat predict
pr, the default, calculates the probability.
xb calculates the linear effect.
at(varname=# [varname=# ...]) specifies values to use in computing the predicted value. Here varname is one of the independent variables, indepvars, or the conditioned variables, condvars(). The default is to use the median of each independent and conditioned variable.
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
memory(#[b|k|m|g]) sets a limit on the amount of memory estat predict can use when generating the conditional distribution of the constant parameter sufficient statistic. The default is memory(10m), where m stands for megabyte, or 1,048,576 bytes. The following are also available: b stands for byte; k stands for kilobyte, which is equal to 1,024 bytes; and g stands for gigabyte, which is equal to 1,024 megabytes. The minimum setting allowed is 1m and the maximum is 512m or 0.5g, but do not attempt to use more memory than is available on your computer. Also see Remarks and examples in [R] exlogistic for details on enumerating the conditional distribution.
nolog prevents the display of the enumeration log. By default, the enumeration log is displayed
showing the progress of enumerating the distribution of the observed successes conditioned on the
independent variables shifted by the values specified in at() (or by their medians). See Methods
and formulas in [R]exlogistic for details of the computations.
Option for estat se
coef requests that the estimated coefficients and their asymptotic standard errors be reported. The
default is to report the odds ratios and their asymptotic standard errors.
Remarks and examples
Predictions must be done using the estimation sample. This is because the prediction is really an
estimated constant coefficient (the intercept) after shifting the independent variables and conditioned
variables by the values specified in at() or by their medians. The justification for this approach can
be seen by rewriting the model as
\[ \log\left(\frac{\pi_i}{1-\pi_i}\right) = (\alpha + \mathbf{x}_0\boldsymbol{\beta}) + (\mathbf{x}_i - \mathbf{x}_0)\boldsymbol{\beta} \]

where x_0 are the specified values for the indepvars (Mehta and Patel 1995). Because the estimation
of the constant term is required, this technique is not appropriate for stratified models that used the
group() option.
Example 1
To demonstrate, we return to example 2 in [R] exlogistic using data from a prospective study
of perinatal infection and HIV-1. Here there was an investigation into whether the blood serum levels
of CD4 and CD8 measured in infants at 6 months of age might predict their development of HIV
infection. The blood serum levels are coded as ordinal values 0, 1, and 2. These data are used by
Mehta and Patel (1995) as an exposition of exact logistic.
. use http://www.stata-press.com/data/r13/hiv_n
(prospective study of perinatal infection of HIV-1; binomial form)
. generate byte cd4_0 = (cd4==0)
. generate byte cd4_1 = (cd4==1)
. generate byte cd8_0 = (cd8==0)
. generate byte cd8_1 = (cd8==1)
. exlogistic hiv cd4_0 cd4_1 cd8_0 cd8_1, terms(cd4=cd4_0 cd4_1,
> cd8=cd8_0 cd8_1) binomial(n) test(probability) saving(dist)
(output omitted )
. estat predict
Enumerating sample-space combinations:
observation 1: enumerations = 3
observation 2: enumerations = 12
observation 3: enumerations = 5
observation 4: enumerations = 5
observation 5: enumerations = 5
observation 6: enumerations = 35
observation 7: enumerations = 15
observation 8: enumerations = 15
observation 9: enumerations = 9
observation 10: enumerations = 9
observation 11: enumerations = 5
observation 12: enumerations = 18
note: CMLE estimate for _cons is -inf; computing MUE
Predicted value at cd4_0 = 0, cd4_1 = 0, cd8_0 = 0, cd8_1 = 1
hiv Predicted Std. Err. [95% Conf. Interval]
Probability 0.0390* N/A 0.0000 0.1962
(*) identifies median unbiased estimates (MUE); because an MUE
is computed, there is no SE estimate
Because we did not specify values by using the at() option, the median values of the indepvars
are used for the prediction. By default, medians are used instead of means because we want to use
values that are observed in the dataset. If the means of the binary variables cd4_0–cd8_1 were
used, we would have created floating point variables in (0,1) that not only do not properly represent
the indicator variables but also would be a source of computational inefficiency in generating the
conditional distribution. Because the MUE is computed for the predicted value, there is no standard-error
estimate.
From the example discussions in [R]exlogistic, the infants at highest risk are those with a CD4
level of 0 and a CD8 level of 2. Below we use the at() option to make a prediction at these blood
serum levels.
. estat predict, at(cd4_0=1 cd4_1=0 cd8_0=0 cd8_1=0) nolog
note: CMLE estimate for _cons is +inf; computing MUE
Predicted value at cd4_0 = 1, cd4_1 = 0, cd8_0 = 0, cd8_1 = 0
hiv Predicted Std. Err. [95% Conf. Interval]
Probability 0.9063* N/A 0.4637 1.0000
(*) identifies median unbiased estimates (MUE); because an MUE
is computed, there is no SE estimate
Stored results
estat predict stores the following in r():
Scalars
r(imue)        1 if r(pred) is an MUE and 0 if a CMLE
r(pred) estimated probability or the linear effect
r(se) asymptotic standard error of r(pred)
Macros
r(estimate) prediction type: pr or xb
r(level) confidence level
Matrices
r(ci) confidence interval
r(x) indepvars and condvars() values
Reference
Mehta, C. R., and N. R. Patel. 1995. Exact logistic regression: Theory and examples. Statistics in Medicine 14:
2143–2160.
Also see
[R] exlogistic — Exact logistic regression
[U] 20 Estimation and postestimation commands
Title
expoisson — Exact Poisson regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
expoisson depvar indepvars [if] [in] [weight] [, options]

options                      Description
Model
  condvars(varlist)          condition on variables in varlist
  group(varname)             groups/strata are stratified by unique values of varname
  exposure(varname_e)        include ln(varname_e) in model with coefficient constrained to 1
  offset(varname_o)          include varname_o in model with coefficient constrained to 1
Options
  memory(#[b|k|m|g])         set limit on memory usage; default is memory(25m)
  saving(filename)           save the joint conditional distribution to filename
Reporting
  level(#)                   set confidence level; default is level(95)
  irr                        report incidence-rate ratios
  test(testopt)              report significance of observed sufficient statistic, conditional scores test, or conditional probabilities test
  mue(varlist)               compute the median unbiased estimates for varlist
  midp                       use the mid-p-value rule
  nolog                      do not display the enumeration log

by, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
fweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Exact statistics > Exact Poisson regression
Description
expoisson fits an exact Poisson regression model of depvar on indepvars. Exact Poisson regression is an alternative to standard maximum-likelihood–based Poisson regression (see [R] poisson) that offers more accurate inference in small samples because it does not depend on asymptotic results. For stratified data, expoisson is an alternative to fixed-effects Poisson regression (see xtpoisson, fe in [XT] xtpoisson); like fixed-effects Poisson regression, exact Poisson regression conditions on the number of events in each stratum.
Exact Poisson regression is computationally intensive, so if you have regressors whose parameter
estimates are not of interest (that is, nuisance parameters), you should specify those variables in the
condvars() option instead of in indepvars.
Options
 
Model
condvars(varlist) specifies variables whose parameter estimates are not of interest to you. You
can save substantial computer time and memory by moving such variables from indepvars to
condvars(). Understand that you will get the same results for x1 and x3 whether you type
. expoisson y x1 x2 x3 x4
or
. expoisson y x1 x3, condvars(x2 x4)
group(varname) specifies the variable defining the strata, if any. A constant term is assumed for
each stratum identified in varname, and the sufficient statistics for indepvars are conditioned on
the observed number of successes within each group (as well as other variables in the model).
The group variable must be integer valued.
exposure(varname_e), offset(varname_o); see [R] estimation options.
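As a sketch of the exposure() option (the count outcome deaths, covariate smokes, and person-time variable pyears are hypothetical names used only to illustrate the syntax), an exposure-adjusted fit reporting incidence-rate ratios would be:
. expoisson deaths smokes, exposure(pyears) irr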
 
Options
memory(#[b|k|m|g]) sets a limit on the amount of memory expoisson can use when computing the conditional distribution of the parameter sufficient statistics. The default is memory(25m), where m stands for megabyte, or 1,048,576 bytes. The following are also available: b stands for byte; k stands for kilobyte, which is equal to 1,024 bytes; and g stands for gigabyte, which is equal to 1,024 megabytes. The minimum setting allowed is 1m and the maximum is 2048m or 2g, but do not attempt to use more memory than is available on your computer. Also see the first technical note under example 3 on counting the conditional distribution.
saving(filename [, replace]) saves the joint conditional distribution for each independent variable specified in indepvars. There is one file for each variable, and it is named using the prefix filename with the variable name appended. For example, saving(mydata) with an independent variable named X would generate a data file named mydata_X.dta. Use replace to replace an existing file. Each file contains the conditional distribution for one of the independent variables specified in indepvars conditioned on all other indepvars and those variables specified in condvars(). There are two variables in each data file: the feasible sufficient statistics for the variable's parameter and their associated weights. The weights variable is named _w_.
 
Reporting
level(#); see [R] estimation options. The level(#) option will not work on replay because
confidence intervals are based on estimator-specific enumerations. To change the confidence level,
you must refit the model.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, exp(β) rather than β.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. irr may be specified at estimation or when
replaying previously estimated results.
test(sufficient | score | probability) reports the significance level of the observed sufficient statistic, the conditional scores test, or the conditional probabilities test. The default is test(sufficient). All the statistics are computed at estimation time, and each statistic may be displayed postestimation; see [R] expoisson postestimation.
mue(varlist) specifies that median unbiased estimates (MUEs) be reported for the variables in varlist. By default, the conditional maximum likelihood estimates (CMLEs) are reported, except for those parameters for which the CMLEs are infinite. Specify mue(_all) if you want MUEs for all the indepvars.
midp instructs expoisson to use the mid-p-value rule when computing the MUEs, significance levels, and confidence intervals. This adjustment is for the discreteness of the distribution and halves the value of the discrete probability of the observed statistic before adding it to the p-value. The mid-p-value rule cannot be applied to MUEs whose corresponding parameter CMLE is infinite.
nolog prevents the display of the enumeration log. By default, the enumeration log is displayed,
showing the progress of computing the conditional distribution of the sufficient statistics.
Remarks and examples
Exact Poisson regression estimates the model parameters by using the conditional distributions
of the parameters’ sufficient statistics, and the resulting parameter estimates are known as CMLEs.
Exact Poisson regression is a small-sample alternative to the maximum-likelihood (ML) Poisson model. See [R] poisson and [XT] xtpoisson to obtain maximum likelihood estimates (MLEs) for the Poisson
model and the fixed-effects Poisson model.
Let Y_i denote a Poisson random variable where we observe the outcome Y_i = y_i, i = 1, ..., n.
Associated with each independent observation is a 1 × p vector of covariates, x_i. We will denote
µ_i = E[Y_i | x_i] and use the log-linear model to model the relationship between Y_i and x_i,

\log(\mu_i) = \theta + x_i \beta

where the constant term, θ, and the p × 1 vector of regression parameters, β, are unknown. The
probability of observing Y_i = y_i, i = 1, ..., n, is

\Pr(Y = y) = \prod_{i=1}^{n} \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}

where Y = (Y_1, ..., Y_n) and y = (y_1, ..., y_n). The MLEs for θ and β maximize the log of this
function.
The sufficient statistics for θ and β_j, j = 1, ..., p, are M = Σ_{i=1}^n Y_i and T_j = Σ_{i=1}^n Y_i x_ij,
respectively, and we observe M = m and T_j = t_j. expoisson tallies the conditional distribution
for each T_j, given the other sufficient statistics T_l = t_l, l ≠ j, and M = m. Denote one of these
values to be t_j^(k), k = 1, ..., N, with weight w_k that accounts for all the generated Y vectors that
give rise to t_j^(k). The conditional probability of observing T_j = t_j has the form

\Pr(T_j = t_j \mid T_l = t_l,\ l \neq j,\ M = m) = \frac{w\, e^{t_j \beta_j}}{\sum_k w_k\, e^{t_j^{(k)} \beta_j}}    (1)

where the sum is over the subset of T vectors such that (T_1^(k) = t_1, ..., T_j^(k) = t_j^(k), ..., T_p^(k) = t_p)
and w is the weight associated with the observed t. The CMLE for β_j maximizes the log of this
function.
Specifying nuisance variables in condvars() prevents expoisson from estimating their associated
regression coefficients. These variables are still conditional variables when tallying the conditional
distribution for the variables in indepvars.
Inferences from MLEs rely on asymptotics, and if your sample size is small, these inferences may
not be valid. On the other hand, inferences from the CMLEs are exact in that they use the conditional
distribution of the sufficient statistics outlined above.
For small datasets, the dependent variable can be completely determined by the data. Here the MLEs
and the CMLEs are unbounded. When this occurs, expoisson will compute the MUE, the regression
estimate that places the observed sufficient statistic at the median of the conditional distribution.
See [R] exlogistic for a more thorough discussion of exact estimation and related statistics.
Example 1
Armitage, Berry, and Matthews (2002, 499–501) fit a log-linear model to data containing the
number of cerebrovascular accidents experienced by 41 men during a fixed period, each of whom
had recovered from a previous cerebrovascular accident and was hypertensive. Sixteen men received
treatment, and in the original data, there are three age groups (40–49, 50–59, ≥60), but we pool the
first two age groups to simplify the example. Armitage, Berry, and Matthews point out that this was
not a controlled trial, but the data are useful to inquire whether there is evidence of fewer accidents
for the treatment group and whether age may be an important factor. The dependent variable count contains
the number of accidents, variable treat is an indicator for the treatment group (1 = treatment, 0 =
control), and variable age is an indicator for the age group (0 = 40–59; 1 = ≥60).
First, we load the data, list it, and tabulate the cerebrovascular accident counts by treatment and
age group.
. use http://www.stata-press.com/data/r13/cerebacc
(cerebrovascular accidents in hypotensive-treated and control groups)
. list
treat count age
1. control 0 40/59
2. control 0 >=60
3. control 1 40/59
4. control 1 >=60
5. control 2 40/59
6. control 2 >=60
7. control 3 40/59
(output omitted )
35. treatment 0 40/59
36. treatment 0 40/59
37. treatment 0 40/59
38. treatment 0 40/59
39. treatment 1 40/59
40. treatment 1 40/59
41. treatment 1 40/59
. tabulate treat age [fw=count]
hypotensiv
e drug age group
treatment 40/59 >=60 Total
control 15 10 25
treatment 4 0 4
Total 19 10 29
Next we estimate the CMLE with expoisson and, for comparison, the MLE with poisson.
. expoisson count treat age
Estimating: treat
Enumerating sample-space combinations:
observation 1: enumerations = 11
observation 2: enumerations = 11
observation 3: enumerations = 11
(output omitted )
observation 39: enumerations = 410
observation 40: enumerations = 410
observation 41: enumerations = 30
Estimating: age
Enumerating sample-space combinations:
observation 1: enumerations = 5
observation 2: enumerations = 15
observation 3: enumerations = 15
(output omitted )
observation 39: enumerations = 455
observation 40: enumerations = 455
observation 41: enumerations = 30
Exact Poisson regression
Number of obs = 41
count Coef. Suff. 2*Pr(Suff.) [95% Conf. Interval]
treat -1.594306 4 0.0026 -3.005089 -.4701708
age -.5112067 10 0.2794 -1.416179 .3429232
. poisson count treat age, nolog
Poisson regression Number of obs = 41
LR chi2(2) = 10.64
Prob > chi2 = 0.0049
Log likelihood = -38.97981 Pseudo R2 = 0.1201
count Coef. Std. Err. z P>|z| [95% Conf. Interval]
treat -1.594306 .5573614 -2.86 0.004 -2.686714 -.5018975
age -.5112067 .4043525 -1.26 0.206 -1.303723 .2813096
_cons .233344 .2556594 0.91 0.361 -.2677391 .7344271
expoisson generates an enumeration log for each independent variable in indepvars. The conditional
distribution of the parameter sufficient statistic is tallied for each independent variable. The
conditional distribution for treat, for example, has 30 records containing the weights, w_k, and
feasible sufficient statistics, t_treat^(k). In essence, the set of points (w_k, t_treat^(k)), k = 1, ..., 30, tallied by
expoisson now becomes the data used to estimate the regression coefficient for treat, using (1) as the
likelihood. Remember that one of the 30 points (w_k, t_treat^(k)) must contain the observed sufficient statistic,
t_treat = Σ_{i=1}^{41} treat_i × count_i = 4, and its relative position in the set of points sorted by
t_treat^(k) is how the sufficient-statistic significance is computed. This algorithm is repeated for the age
variable.
The regression coefficients for treat and age are numerically identical for both Poisson models.
Both models indicate that the treatment significantly reduces the rate of cerebrovascular accidents,
e^−1.59 ≈ 0.204, or a reduction of about 80%. There is no significant age effect.
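The reduction can be verified directly from the displayed coefficient; exp(−1.594306) returns about 0.203, that is, roughly an 80% reduction:
. display exp(-1.594306)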
The p-value for the treatment regression-coefficient sufficient statistic indicates that the treatment
effect is a bit more significant than for the corresponding asymptotic Z statistic from poisson.
However, the exact confidence intervals are wider than their asymptotic counterparts.
Example 2
Agresti (2013, 129) used the data from Laird and Olivier (1981) to demonstrate the Poisson model
for modeling rates. The data consist of patient survival after heart valve replacement operations. The
sample consists of 109 patients who are classified by type of heart valve (aortic, mitral) and by age
(<55, ≥55). Follow-up observations cover lengths from 3 to 97 months, and the time at risk, or
exposure, is stored in the variable TAR. The response is whether the subject died. First, we take a
look at the data and then estimate the incidence rates (IRs) with expoisson and poisson.
. use http://www.stata-press.com/data/r13/heartvalve
(heart valve replacement data)
. list
age valve deaths TAR
1. < 55 aortic 4 1259
2. < 55 mitral 1 2082
3. >= 55 aortic 7 1417
4. >= 55 mitral 9 1647
The age variable is coded 0 for age <55 and 1 for age ≥55, and the valve variable is coded 0 for
the aortic valve and 1 for the mitral valve. The total number of deaths, M = 21, is small enough that
enumerating the conditional distributions for age and valve type is feasible, and asymptotic inferences
associated with standard ML Poisson regression may be questionable.
. expoisson deaths age valve, exposure(TAR) irr
Estimating: age
Enumerating sample-space combinations:
observation 1: enumerations = 11
observation 2: enumerations = 11
observation 3: enumerations = 132
observation 4: enumerations = 22
Estimating: valve
Enumerating sample-space combinations:
observation 1: enumerations = 17
observation 2: enumerations = 17
observation 3: enumerations = 102
observation 4: enumerations = 22
Exact Poisson regression
Number of obs = 4
deaths IRR Suff. 2*Pr(Suff.) [95% Conf. Interval]
age 3.390401 16 0.0194 1.182297 11.86935
valve .7190197 10 0.5889 .2729881 1.870068
ln(TAR) 1 (exposure)
. poisson deaths age valve, exposure(TAR) irr nolog
Poisson regression Number of obs = 4
LR chi2(2) = 7.62
Prob > chi2 = 0.0222
Log likelihood = -8.1747285 Pseudo R2 = 0.3178
deaths IRR Std. Err. z P>|z| [95% Conf. Interval]
age 3.390401 1.741967 2.38 0.017 1.238537 9.280965
valve .7190197 .3150492 -0.75 0.452 .3046311 1.6971
_cons .0018142 .0009191 -12.46 0.000 .0006722 .0048968
ln(TAR) 1 (exposure)
The CMLE and the MLE are numerically identical. The death rate for the older age group is about
3.4 times higher than the younger age group, and this difference is significant at the 5% level. This
means that for every death in the younger group each month, we would expect about three deaths
in the older group. The IR estimate for valve type is approximately 0.72, but it is not significantly
different from one. The exact Poisson confidence intervals are a bit wider than the asymptotic CIs.
You can use ir (see [ST] epitab) to estimate IRs and exact CIs for one covariate, and we compare
these CIs with those from expoisson, where we estimate the incidence rate by using age only.
. ir deaths age TAR
age of patient
Exposed Unexposed Total
number of deaths 16 5 21
time at risk 3064 3341 6405
Incidence rate .0052219 .0014966 .0032787
Point estimate [95% Conf. Interval]
Inc. rate diff. .0037254 .00085 .0066007
Inc. rate ratio 3.489295 1.221441 12.17875 (exact)
Attr. frac. ex. .7134092 .1812948 .9178898 (exact)
Attr. frac. pop .5435498
(midp) Pr(k>=16) = 0.0049 (exact)
(midp) 2*Pr(k>=16) = 0.0099 (exact)
. expoisson deaths age, exposure(TAR) irr midp nolog
Exact Poisson regression
Number of obs = 4
deaths IRR Suff. 2*Pr(Suff.) [95% Conf. Interval]
age 3.489295 16 0.0099 1.324926 10.64922
ln(TAR) 1 (exposure)
mid-p-value computed for the probabilities and CIs
Both ir and expoisson give identical IRs and p-values. Both report the two-sided exact significance
by using the mid-p-value rule that accounts for the discreteness in the distribution by subtracting
p_{1/2} = Pr(T = t)/2 from p_l = Pr(T ≤ t) and p_g = Pr(T ≥ t), computing 2 × min(p_l − p_{1/2}, p_g − p_{1/2}).
By default, expoisson will not use the mid-p-value rule (when you exclude the midp option), and
here the two-sided exact significance would be 2 × min(p_l, p_g) = 0.0158. The confidence intervals
differ because expoisson uses the mid-p-value rule when computing the confidence intervals, yet
ir does not. You can verify this by executing expoisson without the midp option for this example;
you will get the same CIs as ir.
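For example, rerunning the model without midp yields the same CIs as ir:
. expoisson deaths age, exposure(TAR) irr nolog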
You can replay expoisson to view the conditional scores test or the conditional probabilities test
by using the test() option.
. expoisson, test(score) irr
Exact Poisson regression
Number of obs = 4
deaths IRR Score Pr>=Score [95% Conf. Interval]
age 3.489295 6.76528 0.0113 1.324926 10.64922
ln(TAR) 1 (exposure)
mid-p-value computed for the probabilities and CIs
All the statistics for expoisson are defined in Methods and formulas of [R] exlogistic. Apart
from enumerating the conditional distributions for the logistic and Poisson sufficient statistics, com-
putationally, the primary difference between exlogistic and expoisson is the weighting values in
the likelihood for the parameter sufficient statistics.
Example 3
In this example, we fabricate data that will demonstrate the difference between the CMLE and
the MUE when the CMLE is not infinite. A difference in these estimates will be more pronounced
when the probability of the coefficient sufficient statistic is skewed when plotted as a function of the
regression coefficient.
. clear
. input y x
y x
1. 0 2
2. 1 1
3. 1 0
4. 0 0
5. 0 .5
6. 1 .5
7. 2 .01
8. 3 .001
9. 4 .0001
10. end
. expoisson y x, test(score)
Enumerating sample-space combinations:
observation 1: enumerations = 13
observation 2: enumerations = 91
observation 3: enumerations = 169
observation 4: enumerations = 169
observation 5: enumerations = 313
observation 6: enumerations = 313
observation 7: enumerations = 1469
observation 8: enumerations = 5525
observation 9: enumerations = 5479
Exact Poisson regression
Number of obs = 9
y Coef. Score Pr>=Score [95% Conf. Interval]
x -1.534468 2.955316 0.0810 -3.761718 .0485548
. expoisson y x, test(score) mue(x) nolog
Exact Poisson regression
Number of obs = 9
y Coef. Score Pr>=Score [95% Conf. Interval]
x -1.309268* 2.955316 0.0810 -3.761718 .0485548
(*) median unbiased estimates (MUE)
We observe (x_i, y_i), i = 1, ..., 9. If we condition on m = Σ_{i=1}^9 y_i = 12, the conditional
distribution of T_x = Σ_i Y_i x_i has a size of 5,479 elements. For each entry in this enumeration,
a realization of Y_i = y_i^(k), k = 1, ..., 5,479, is generated such that Σ_i y_i^(k) = 12. One of these
realizations produces the observed t_x = Σ_i y_i x_i = 1.5234.
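This observed value can be checked by hand from the listed data:
. display 0*2 + 1*1 + 1*0 + 0*0 + 0*.5 + 1*.5 + 2*.01 + 3*.001 + 4*.0001
1.5234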
Below is a graphical display comparing the CMLE with the MUE. We plot Pr(T_x = t_x | M = 12, β_x)
versus β_x, −6 ≤ β_x ≤ 1, in the upper panel and the cumulative probabilities, Pr(T_x ≤ t_x | M = 12, β_x)
and Pr(T_x ≥ t_x | M = 12, β_x), in the lower panel.
(figure omitted: the upper panel plots the conditional probability against the x coefficient with the CMLE and MUE marked; the lower panel plots the two cumulative probability profiles)
The location of the CMLE, indicated by the dashed line, is at the mode of the probability profile, and
the MUE, indicated by the dotted line, is to the right of the mode. If we solve for the β_x^(u) and β_x^(l)
such that Pr(T_x ≤ t_x | M = 12, β_x^(u)) = 1/2 and Pr(T_x ≥ t_x | M = 12, β_x^(l)) = 1/2, the MUE is
(β_x^(u) + β_x^(l))/2. As you can see in the lower panel, the MUE cuts through the intersection of these
cumulative probability profiles.
Technical note
The memory(#) option limits the amount of memory that expoisson will consume when computing
the conditional distribution of the parameter sufficient statistics. memory() is independent of the data
maximum memory setting (see set max_memory in [D] memory), and it is possible for expoisson
to exceed the memory limit specified in set max_memory without terminating. By default, a log
is provided that displays the number of enumerations (the size of the conditional distribution)
after processing each observation. Typically, you will see the number of enumerations increase,
and then at some point they will decrease as the multivariate shift algorithm (Hirji, Mehta, and
Patel 1987) determines that some of the enumerations cannot achieve the observed sufficient statistics
of the conditioning variables. When the algorithm is complete, however, it is necessary to store the
conditional distribution of the parameter sufficient statistics as a dataset. It is possible, therefore, to
get a memory error when the algorithm has completed if there is not enough memory to store the
conditional distribution.
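A minimal sketch of raising both limits before a large enumeration (the values 1g and 500m are arbitrary, and the example 1 data are assumed to be in memory):
. set max_memory 1g
. expoisson count treat age, memory(500m) nolog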
Technical note
Computing the conditional distributions and reported statistics requires data sorting and numerical
comparisons. If there is at least one single-precision variable specified in the model, expoisson
will make comparisons with a relative precision of 2^−5. Otherwise, a relative precision of 2^−11 is
used. Be careful if you use recast to promote a single-precision variable to double precision (see
[D] recast). You might try listing the data in full precision (maybe %20.15g; see [D] format) to make
sure that this is really what you want. See [D] data types for information on precision of numeric
storage types.
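A sketch of the suggested checks, where x stands for a hypothetical single-precision variable in the model:
. format x %20.15g
. list x
. recast double x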
Stored results
expoisson stores the following in e():
Scalars
e(N) number of observations
e(k_groups) number of groups
e(relative_weight) relative weight for the observed e(sufficient) and e(condvars)
e(sum_y) sum of depvar
e(k_indvars) number of independent variables
e(k_condvars) number of conditioning variables
e(midp) mid-p-value rule indicator
e(eps) relative difference tolerance
Macros
e(cmd) expoisson
e(cmdline) command as typed
e(title) title in estimation output
e(depvar) name of dependent variable
e(indvars) independent variables
e(condvars) conditional variables
e(groupvar) group variable
e(exposure) exposure variable
e(offset) linear offset variable
e(level) confidence level
e(wtype) weight type
e(wexp) weight expression
e(datasignature) the checksum
e(datasignaturevars) variables used in calculation of checksum
e(properties) b V
e(estat_cmd) program used to implement estat
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(mue_indicators) indicator for elements of e(b) estimated using MUE instead of CMLE
e(se) e(b) standard errors (CMLEs only)
e(ci) matrix of e(level) confidence intervals for e(b)
e(sum_y_groups) sum of e(depvar) for each group
e(N_g) number of observations in each group
e(sufficient) sufficient statistics for e(b)
e(p_sufficient) p-value for e(sufficient)
e(scoretest) conditional scores tests for indepvars
e(p_scoretest) p-values for e(scoretest)
e(probtest) conditional probability tests for indepvars
e(p_probtest) p-value for e(probtest)
Functions
e(sample) marks estimation sample
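The stored results can be inspected in the usual way after estimation; for example:
. ereturn list
. matrix list e(ci)
. matrix list e(sufficient)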
Methods and formulas
Let {Y_1, Y_2, ..., Y_n} be a set of n independent Poisson random variables. For each i = 1, ..., n,
we observe Y_i = y_i ≥ 0, and associated with each observation is the covariate row vector of length
p, x_i = (x_i1, ..., x_ip). Denote β = (β_1, ..., β_p)^T to be the column vector of regression parameters
and θ to be the constant. The sufficient statistic for β_j is T_j = Σ_{i=1}^n Y_i x_ij, j = 1, ..., p, and for θ is
M = Σ_{i=1}^n Y_i. We observe T_j = t_j, t_j = Σ_{i=1}^n y_i x_ij, and M = m, m = Σ_{i=1}^n y_i. Let κ_i be the
exposure for the ith observation. Then the probability of observing (Y_1 = y_1, Y_2 = y_2, ..., Y_n = y_n)
is

\Pr(Y_1 = y_1, \ldots, Y_n = y_n \mid \beta, X, \kappa) = \frac{\exp(m\theta + t\beta)}{\exp\left\{\sum_{i=1}^{n} \kappa_i \exp(\theta + x_i \beta)\right\}} \prod_{i=1}^{n} \frac{\kappa_i^{y_i}}{y_i!}

where t = (t_1, ..., t_p), X = (x_1^T, ..., x_n^T)^T, and κ = (κ_1, ..., κ_n)^T.
The joint distribution of the sufficient statistics (T, M) is obtained by summing over all possible
sequences Y_1 ≥ 0, ..., Y_n ≥ 0 such that T = t and M = m. This probability function is

\Pr(T_1 = t_1, \ldots, T_p = t_p, M = m \mid \beta, X, \kappa) = \frac{\exp(m\theta + t\beta)}{\exp\left\{\sum_{i=1}^{n} \kappa_i \exp(\theta + x_i \beta)\right\}} \left(\sum_{u} \prod_{i=1}^{n} \frac{\kappa_i^{u_i}}{u_i!}\right)

where the sum Σ_u is over all nonnegative vectors u of length n such that Σ_{i=1}^n u_i = m and
Σ_{i=1}^n u_i x_i = t.
Conditional distribution
Without loss of generality, we will restrict our discussion to the conditional distribution of the
sufficient statistic for β_1, T_1. If we condition on observing M = m and T_2 = t_2, ..., T_p = t_p, the
probability function of (T_1 | β_1, T_2 = t_2, ..., T_p = t_p, M = m) is

\Pr(T_1 = t_1 \mid \beta_1, T_2 = t_2, \ldots, T_p = t_p, M = m) = \frac{\left(\sum_{u} \prod_{i=1}^{n} \kappa_i^{u_i}/u_i!\right) e^{t_1 \beta_1}}{\sum_{v} \left(\prod_{i=1}^{n} \kappa_i^{v_i}/v_i!\right) e^{\beta_1 \sum_i v_i x_{i1}}}    (2)

where the sum Σ_u is over all nonnegative vectors u of length n such that Σ_{i=1}^n u_i = m and
Σ_{i=1}^n u_i x_i = t, and the sum Σ_v is over all nonnegative vectors v of length n such that Σ_{i=1}^n v_i = m,
Σ_{i=1}^n v_i x_i2 = t_2, ..., Σ_{i=1}^n v_i x_ip = t_p. The CMLE for β_1 is the value that maximizes the log of
(1). This optimization task is carried out by ml (see [R] ml), using the conditional distribution of
(T_1 | T_2 = t_2, ..., T_p = t_p, M = m) as a dataset. This dataset consists of the feasible values and
weights for T_1,

\left\{\left(s_1,\ \prod_{i=1}^{n} \frac{\kappa_i^{v_i}}{v_i!}\right) :\ \sum_{i=1}^{n} v_i = m,\ \sum_{i=1}^{n} v_i x_{i1} = s_1,\ \sum_{i=1}^{n} v_i x_{i2} = t_2,\ \ldots,\ \sum_{i=1}^{n} v_i x_{ip} = t_p\right\}
Computing the CMLE, MUE, confidence intervals, conditional hypothesis tests, and sufficient-statistic
p-values is discussed in Methods and formulas of [R] exlogistic. The only difference between the
two techniques is the use of the weights; that is, the weights for exact logistic are the combinatorial
coefficients, c(t, m), in (1) of Methods and formulas in [R] exlogistic. expoisson and exlogistic
use the same ml likelihood evaluator to compute the CMLEs as well as the same ado-programs and
Mata functions to compute the MUEs and estimate statistics.
References
Agresti, A. 2013. Categorical Data Analysis. 3rd ed. Hoboken, NJ: Wiley.
Armitage, P., G. Berry, and J. N. S. Matthews. 2002. Statistical Methods in Medical Research. 4th ed. Oxford:
Blackwell.
Cox, D. R., and E. J. Snell. 1989. Analysis of Binary Data. 2nd ed. London: Chapman & Hall.
Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing distributions for exact logistic regression. Journal of the
American Statistical Association 82: 1110–1117.
Laird, N. M., and D. Olivier. 1981. Covariance analysis of censored survival data using log-linear analysis techniques.
Journal of the American Statistical Association 76: 231–240.
Also see
[R] expoisson postestimation — Postestimation tools for expoisson
[R] poisson — Poisson regression
[XT] xtpoisson — Fixed-effects, random-effects, and population-averaged Poisson models
[U] 20 Estimation and postestimation commands
Title
expoisson postestimation — Postestimation tools for expoisson
Description Syntax for estat se Menu for estat Option for estat se
Remarks and examples Also see
Description
The following postestimation command is of special interest after expoisson:
Command Description
estat se report coefficients or IRRs and their asymptotic standard errors
The following standard postestimation command is also available:
Command Description
estat summarize summary statistics for the estimation sample
See [R] estat summarize for details.
Special-interest postestimation command
estat se reports regression coefficients or incidence-rate ratios and their asymptotic standard errors.
The estimates are stored in the matrix r(estimates).
Syntax for estat se
estat se [, irr]
Menu for estat
Statistics > Postestimation > Reports and statistics
Option for estat se
irr requests that the incidence-rate ratios and their asymptotic standard errors be reported. The default
is to report the coefficients and their asymptotic standard errors.
Remarks and examples
Example 1
To demonstrate estat se after expoisson, we use the British physicians smoking data.
. use http://www.stata-press.com/data/r13/smokes
(cigarette smoking and lung cancer among British physicians (45-49 years))
. expoisson cases smokes, exposure(peryrs) irr nolog
Exact Poisson regression
Number of obs = 7
cases IRR Suff. 2*Pr(Suff.) [95% Conf. Interval]
smokes 1.077718 797.4 0.0000 1.04552 1.111866
ln(peryrs) 1 (exposure)
. estat se, irr
cases IRR Std. Err.
smokes 1.077718 .0168547
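Because the displayed estimates are stored in r(estimates), they can be retrieved afterward; for example:
. matrix list r(estimates)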
Also see
[R] expoisson — Exact Poisson regression
[U] 20 Estimation and postestimation commands
Title
fp — Fractional polynomial regression
Syntax Menu Description
Options for fp Options for fp generate Remarks and examples
Stored results Methods and formulas Acknowledgment
References Also see
Syntax
Estimation
fp <term> [, est_options] : est_cmd

est_cmd may be almost any estimation command that stores the e(ll) result. To confirm
whether fp works with a specific est_cmd, see the documentation for that est_cmd.

Instances of <term> (with the angle brackets) that occur within est_cmd are replaced in
est_cmd by a varlist containing the fractional powers of the variable term. These variables
will be named term_1, term_2, ....

fp performs est_cmd with this substitution, fitting a fractional polynomial regression in term.

est_cmd in either this or the following syntax may not contain other prefix commands; see
[U] 11.1.10 Prefix commands.

Estimation (alternate syntax)

fp <term>(varname) [, est_options] : est_cmd

Use this syntax to specify that fractional powers of varname are to be calculated. The
fractional polynomial power variables will still be named term_1, term_2, ....

Replay estimation results

fp [, replay_options]

Create specified fractional polynomial power variables

fp generate [type] [newvar =] varname^(numlist) [if] [in] [, gen_options]
est_options             Description
Main
  Search
  powers(# # ... #)     powers to be searched; default is powers(-2 -1 -.5 0 .5 1 2 3)
  dimension(#)          maximum degree of fractional polynomial; default is dimension(2)
  Or specify
  fp(# # ... #)         use specified fractional polynomial
  And then specify any of these options
Options
  classic               perform automatic scaling and centering and omit comparison table
  replace               replace existing fractional polynomial power variables named term_1, term_2, ...
  all                   generate term_1, term_2, ... in all observations; default is in observations if e(sample)
  scale(# a # b)        use (term+a)/b; default is to use variable term as is
  scale                 specify a and b automatically
  center(# c)           report centered-on-c results; default is uncentered results
  center                specify c to be the mean of (scaled) term
  zero                  set term_1, term_2, ... to zero if scaled term ≤ 0; default is to issue an error message
  catzero               same as zero and include term_0 = (term ≤ 0) among the fractional polynomial power variables
Reporting
  replay_options        specify how results are displayed

replay_options          Description
Reporting
  nocompare             do not display model-comparison test results
  reporting_options     any options allowed by est_cmd for replaying estimation results

gen_options             Description
Main
  replace               replace existing fractional polynomial power variables named term_1, term_2, ...
  scale(# a # b)        use (term+a)/b; default is to use variable term as is
  scale                 specify a and b automatically
  center(# c)           report centered-on-c results; default is uncentered results
  center                specify c to be the mean of (scaled) term
  zero                  set term_1, term_2, ... to zero if scaled term ≤ 0; default is to issue an error message
  catzero               same as zero and include term_0 = (term ≤ 0) among the fractional polynomial power variables
Menu
fp
Statistics > Linear models and related > Fractional polynomials > Fractional polynomial regression
fp generate
Statistics > Linear models and related > Fractional polynomials > Create fractional polynomial variables
Description
fp <term>: est_cmd fits models with the "best"-fitting fractional polynomial substituted for
<term> wherever it appears in est_cmd. fp <weight>: regress mpg <weight> foreign would fit
a regression model of mpg on a fractional polynomial in weight and (linear) foreign.
By specifying option fp(), you may set the exact powers to be used. Otherwise, a search through
all possible fractional polynomials up to the degree set by dimension() with the powers set by
powers() is performed.
fp without arguments redisplays the previous estimation results, just as typing est_cmd would.
You can type either one. fp will include a fractional polynomial comparison table.
fp generate creates fractional polynomial power variables for a given set of powers. For
instance, fp <weight>: regress mpg <weight> foreign might produce the fractional polynomial
weight^(-2,-1) and store weight^(-2) in weight_1 and weight^(-1) in weight_2. Typing fp generate
weight^(-2 -1) would allow you to create the same variables in another dataset.
See [R] mfp for multivariable fractional polynomial models.
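For example, a minimal sketch of fp generate with the automobile data used later in this entry:
. use http://www.stata-press.com/data/r13/auto, clear
. fp generate weight^(-2 -1)
. describe weight_1 weight_2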
Options for fp
 
Main
powers(# # ... #) specifies that a search be performed and details about the search provided.
powers() works with the dimension() option; see below. The default is powers(-2 -1 -.5 0
.5 1 2 3).
dimension(#) specifies the maximum degree of the fractional polynomial to be searched. The default
is dimension(2).
If the defaults for both powers() and dimension() are used, then the fractional polynomial
could be any of the following 44 possibilities:
term^(-2)
term^(-1)
  .
  .
  .
term^(3)
term^(-2,-2)
term^(-2,-1)
  .
  .
  .
term^(-2,3)
term^(-1,-1)
  .
  .
  .
term^(3,3)
(With the 8 default powers, there are 8 degree-1 models and 8 × 9/2 = 36 degree-2 models, counting repeated powers, for 44 models in all.)
fp(# # ... #) specifies that no search be performed and that the fractional polynomial specified be
used. fp() is an alternative to powers() and dimension().
 
Options
classic performs automatic scaling and centering and omits the comparison table. Specifying
classic is equivalent to specifying scale, center, and nocompare.
replace replaces existing fractional polynomial power variables named term_1, term_2, ....
all specifies that term_1, term_2, ... be filled in for all observations in the dataset rather than just
for those in e(sample).
scale(# a # b) specifies that term be scaled in the way specified, namely, that (term+a)/b be
calculated. All values of scaled term are required to be greater than zero unless you specify options
zero or catzero. Values should not be too large or too close to zero, because by default, cubic
powers and squared reciprocal powers will be considered. When scale(a b) is specified, values
in the variable term are not modified; fp merely remembers to scale the values whenever powers
are calculated.
You will probably not use scale(a b) for values of a and b that you create yourself, although
you could. It is usually easier just to generate a scaled variable. For instance, if term is age,
and age in your data is required to be greater than or equal to 20, you might generate an age5
variable, for use as term:
. generate age5 = (age-19)/5
scale(a b) is useful when you previously fit a model using automatic scaling (option scale) in
one dataset and now want to create the fractional polynomials in another. In the first dataset, fp
with scale added notes to the dataset concerning the values of a and b. You can see them by
typing
. notes
You can then use fp generate, scale(a b) in the second dataset.
The default is to use term as it is in calculating fractional powers; thus term's values are
required to be greater than zero unless you specify options zero or catzero. Values should not
be too large, because by default, cubic powers will be considered.
scale specifies that term be scaled to be greater than zero and not too large in calculating fractional
powers. See Scaling for more details. When scale is specified, values in the variable term are
not modified; fp merely remembers to scale the values whenever powers are calculated.
center(# c) reports results for the fractional polynomial in (scaled) term, centered on c. The default
is to perform no centering.
term^(p1,p2,...,pm) − c^(p1,p2,...,pm) is reported. This makes the constant coefficient (intercept) easier
to interpret. See Centering for more details.
center performs center(c), where c is the mean of (scaled) term.
zero and catzero specify how nonpositive values of term are to be handled. By default, nonpositive
values of term are not allowed, because we will be calculating natural logarithms and fractional
powers of term. Thus an error message is issued.
zero sets the fractional polynomial value to zero for nonpositive values of (scaled) term.
catzero sets the fractional polynomial value to zero for nonpositive values of (scaled) term and
includes a dummy variable indicating where nonpositive values of (scaled) term appear in the
model.
 
Reporting
nocompare suppresses display of the comparison tests.
reporting options are any options allowed by est cmd for replaying estimation results.
Options for fp generate
 
Main
replace replaces existing fractional polynomial power variables named term_1, term_2, ....
scale(# a # b) specifies that term be scaled in the way specified, namely, that (term+a)/b be
calculated. All values of scaled term are required to be greater than zero unless you specify options
zero or catzero. Values should not be too large or too close to zero, because by default, cubic
powers and squared reciprocal powers will be considered. When scale(a b) is specified, values
in the variable term are not modified; fp merely remembers to scale the values whenever powers
are calculated.
You will probably not use scale(a b) for values of a and b that you create yourself, although
you could. It is usually easier just to generate a scaled variable. For instance, if term is age,
and age in your data is required to be greater than or equal to 20, you might generate an age5
variable, for use as term:
. generate age5 = (age-19)/5
scale(a b) is useful when you previously fit a model using automatic scaling (option scale) in
one dataset and now want to create the fractional polynomials in another. In the first dataset, fp
with scale added notes to the dataset concerning the values of a and b. You can see them by
typing
. notes
You can then use fp generate, scale(a b) in the second dataset.
The default is to use term as it is in calculating fractional powers; thus term's values are
required to be greater than zero unless you specify options zero or catzero. Values should not
be too large, because by default, cubic powers will be considered.
scale specifies that term be scaled to be greater than zero and not too large in calculating fractional
powers. See Scaling for more details. When scale is specified, values in the variable term are
not modified; fp merely remembers to scale the values whenever powers are calculated.
center(# c) reports results for the fractional polynomial in (scaled) term, centered on c. The default
is to perform no centering.
term^(p1,p2,...,pm) − c^(p1,p2,...,pm) is reported. This makes the constant coefficient (intercept) easier
to interpret. See Centering for more details.
center performs center(c), where c is the mean of (scaled) term.
zero and catzero specify how nonpositive values of term are to be handled. By default, nonpositive
values of term are not allowed, because we will be calculating natural logarithms and fractional
powers of term. Thus an error message is issued.
zero sets the fractional polynomial value to zero for nonpositive values of (scaled) term.
catzero sets the fractional polynomial value to zero for nonpositive values of (scaled) term and
includes a dummy variable indicating where nonpositive values of (scaled) term appear in the
model.
Remarks and examples
Remarks are presented under the following headings:
Fractional polynomial regression
Scaling
Centering
Examples
Fractional polynomial regression
Regression models based on fractional polynomial functions of a continuous covariate are described
by Royston and Altman (1994).
Fractional polynomials increase the flexibility afforded by the family of conventional polynomial
models. Although polynomials are popular in data analysis, linear and quadratic functions are limited
in their range of curve shapes, whereas cubic and higher-order curves often produce undesirable
artifacts such as edge effects and waves.
Fractional polynomials differ from regular polynomials in that 1) they allow logarithms, 2) they
allow noninteger powers, and 3) they allow powers to be repeated.
We will write a fractional polynomial in x as

x^(p1,p2,...,pm)′β

We will write x^(p) to mean a regular power except that x^(0) is to be interpreted as meaning ln(x)
rather than x^(0) = 1.
Then if there are no repeated powers in (p1, p2, ..., pm),

x^(p1,p2,...,pm)′β = β0 + β1 x^(p1) + β2 x^(p2) + ··· + βm x^(pm)

Powers are allowed to repeat in fractional polynomials. Each time a power repeats, it is multiplied
by another ln(x). As an extreme case, consider the fractional polynomial with all-repeated powers,
say, m of them,

x^(p,p,...,p)′β = β0 + β1 x^(p) + β2 x^(p) ln(x) + ··· + βm x^(p) {ln(x)}^(m−1)
Thus the fractional polynomial x^(0,0,2)′β would be

x^(0,0,2)′β = β0 + β1 x^(0) + β2 x^(0) ln(x) + β3 x^(2)
            = β0 + β1 ln(x) + β2 {ln(x)}^2 + β3 x^2
With this definition, we can obtain a much wider range of shapes than can be obtained with regular
polynomials. The following graphs appeared in Royston and Sauerbrei (2008, sec. 4.5). The first
graph shows the shapes of differing fractional polynomials.
(figure omitted: fractional polynomial curve shapes for powers −2, −1, −0.5, 0, 0.5, 1, 2, and 3)
The second graph shows some of the curve shapes available with different βs for the degree-2
fractional polynomial, x(2,2).
In modeling a fractional polynomial, Royston and Sauerbrei (2008) recommend choosing powers
from among {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. By default, fp chooses powers from this set, but other
powers can be explicitly specified in the powers() option.
fp <term>: est_cmd fits models with the terms of the best-fitting fractional polynomial substituted
for <term> wherever it appears in est_cmd. We will demonstrate with auto.dta, which contains
repair records and other information about a variety of vehicles in 1978.
We use fp to find the best fractional polynomial in automobile weight (lbs.) (weight) for the
linear regression of miles per gallon (mpg) on weight and an indicator of whether the vehicle is
foreign (foreign).
By default, fp will fit degree-2 fractional polynomial (FP2) models and choose the fractional powers
from the set {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. Because car weight is measured in pounds and will have
a cubic transformation applied to it, we shrink it to a smaller scale before estimation by dividing by
1,000.
We modify the existing weight variable for conciseness and to facilitate the comparison of tables.
When applying a data transformation in practice, rather than modifying the existing variables, you
should create new variables that hold the transformed values.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. replace weight = weight/1000
weight was int now float
(74 real changes made)
. fp <weight>: regress mpg <weight> foreign
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
weight df Deviance Res. s.d. Dev. dif. P(*) Powers
omitted 0 456.347 5.356 75.216 0.000
linear 1 388.366 3.407 7.236 0.082 1
m = 1 2 381.806 3.259 0.675 0.733 -.5
m = 2 4 381.131 3.268 0.000 -- -2 -2
(*) P = sig. level of model with m = 2 based on F with 68 denominator
dof.
Source SS df MS Number of obs = 74
F( 3, 70) = 52.95
Model 1696.05949 3 565.353163 Prob > F = 0.0000
Residual 747.399969 70 10.6771424 R-squared = 0.6941
Adj R-squared = 0.6810
Total 2443.45946 73 33.4720474 Root MSE = 3.2676
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight_1 15.88527 20.60329 0.77 0.443 -25.20669 56.97724
weight_2 127.9349 47.53106 2.69 0.009 33.13723 222.7326
foreign -2.222515 1.053782 -2.11 0.039 -4.324218 -.1208131
_cons 3.705981 3.367949 1.10 0.275 -3.011182 10.42314
fp begins by showing the model-comparison table. This table shows the best models of each
examined degree, obtained by searching through all possible power combinations. The fractional
powers of the models are shown in the Powers column. A separate row is provided for the linear
fractional polynomial because it is often the default used when including a predictor in the model.
The null model does not include any fractional polynomial terms for weight. The df column shows
the count of the additional parameters used in each model beyond the quantity of parameters used in
the null model. The model deviance, which we define as twice the negative log likelihood, is given
in the Deviance column. The difference of the model deviance from the deviance of the model with
the lowest deviance is given in the Dev. dif. column.
The p-value for the partial F test comparing the models and the lowest deviance model is given
in the P(*) column. An estimate of the residual standard error is given in the Res. s.d. column.
Under linear regression, a partial F test is used in the model-comparison table. In other settings, a
likelihood-ratio test is performed. Then a χ2 statistic is reported.
Under robust variance estimation and some other cases (see [R] lrtest), the likelihood-ratio test
cannot be performed. When the likelihood-ratio test cannot be performed on the model specified in
est_cmd, fp still reports the model-comparison table, but the comparison tests are not performed.
fp reports the “best” model as the model with the lowest deviance; however, users may choose a
more efficient model based on the comparison table. They may choose the lowest degree model that
the partial F test (or likelihood-ratio test) fails to reject in favor of the lowest deviance model.
After the comparison table, the results of the estimation command for the lowest deviance model
are shown. Here the best model has terms weight^(-2,-2). However, based on the model-comparison
table, we can reject the model without weight and the linear model at the 0.1 significance level. We
fail to reject the m = 1 model at any reasonable level. We will choose the FP1 model, which includes
weight^(-.5).
We use fp again to estimate the parameters for this model. We use the fp() option to specify
what powers we want to use; this option specifies that we do not want to perform a search for
the best powers. We also specify the replace option to overwrite the previously created fractional
polynomial power variables.
. fp <weight>, fp(-.5) replace: regress mpg <weight> foreign
->
Source SS df MS Number of obs = 74
F( 2, 71) = 79.51
Model 1689.20865 2 844.604325 Prob > F = 0.0000
Residual 754.25081 71 10.6232508 R-squared = 0.6913
Adj R-squared = 0.6826
Total 2443.45946 73 33.4720474 Root MSE = 3.2593
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight_1 66.89665 6.021749 11.11 0.000 54.88963 78.90368
foreign -2.095622 1.043513 -2.01 0.048 -4.176329 -.0149157
_cons -17.58651 3.397992 -5.18 0.000 -24.36192 -10.81111
Alternatively, we can use fp generate to create the fractional polynomial variable corresponding
to weight^(-.5) and then use regress. We store weight^(-.5) in the new variable wgt_nsqrt.
. fp generate wgt_nsqrt=weight^(-.5)
. regress mpg wgt_nsqrt foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 79.51
Model 1689.20874 2 844.604371 Prob > F = 0.0000
Residual 754.250718 71 10.6232495 R-squared = 0.6913
Adj R-squared = 0.6826
Total 2443.45946 73 33.4720474 Root MSE = 3.2593
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt_nsqrt_1 66.89665 6.021748 11.11 0.000 54.88963 78.90368
foreign -2.095622 1.043513 -2.01 0.048 -4.176328 -.0149155
_cons -17.58651 3.397991 -5.18 0.000 -24.36191 -10.81111
Scaling
Fractional polynomials are only defined for positive term variables. By default, fp will assume
that the variable x is positive and attempt to compute fractional powers of x. If the positive value
assumption is incorrect, an error will be reported and estimation will not be performed.
If the values of the variable are too large or too small, the reported results of fp may be difficult
to interpret. By default, cubic powers and squared reciprocal powers will be considered in the search
for the best fractional polynomial in term.
We can scale the variable x to 1) make it positive and 2) ensure its magnitude is not too large or
too small.
Suppose you have data on hospital patients with age as a fractional polynomial variable of interest.
age is required to be greater than or equal to 20, so you might generate an age5 variable by typing
. generate age5 = (age-19)/5
A unit change in age5 is equivalent to a five-year change in age, and the minimum value of age5
is 1/5 instead of 20.
In the automobile example of Fractional polynomial regression, our term variable was automobile
weight (lbs.). Cars weigh in the thousands of pounds, so cubing their weight figures results in large
numbers. We prevented this from being a problem by shrinking the weight by 1,000; that is, we typed
. replace weight = weight/1000
Calendar year is another type of variable that can have a problematically large magnitude. We can
shrink this by dividing by 10, making a unit change correspond to a decade.
. generate decade = calendar_year/10
You may also have a variable that measures deviation from zero. Perhaps x has already been
demeaned and is distributed symmetrically about zero. The fractional polynomial in x will be undefined for half
of its domain. We can shift the location of x, making it positive by subtracting its minimum and
adding a small number to it. Suppose x ranges from −4 to 4; we could use
. generate newx = x+5
Rescaling ourselves provides easily communicated results. We can tell exactly how the scaling
was performed and how it should be performed in similar applications.
Alternatively, fp can scale the fractional polynomial variable so that its values are positive and the
magnitude of the values are not too large. This can be done automatically or by directly specifying
the scaling values.
Scaling can be automatically performed with fp by specifying the scale option. If term has
nonpositive values, the minimum value of term is subtracted from each observation of term. In this
case, the counting interval, the minimum distance between the sorted values of term, is also added
to each observation of term.
After adjusting the location of term so that its minimum value is positive, creating term*, automatic
scaling will divide each observation of term* by a power of ten. The exponent of this scaling factor
is given by

p = log10{max(term*) − min(term*)}
p = sign(p) floor(|p|)
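A sketch of this computation for the automobile data (reproducing the divisor of 1,000 used earlier; the scalar name p is arbitrary):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly summarize weight
. scalar p = log10(r(max) - r(min))
. display 10^(sign(p)*floor(abs(p)))
1000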
Rather than letting fp automatically choose the scaling of term, you may specify adjustment and
scale factors a and b by using the scale(a b) option. Fractional powers are then calculated using
the (term+a)/b values.
When scale or scale(a b) is specified, values in the variable term are not modified; fp merely
remembers to scale the values whenever powers are calculated.
In addition to fp, both scale and scale(a b) may be used with fp generate.
You will probably not use scale(a b) with fp for values of a and b that you create yourself,
although you could. As we demonstrated earlier, it is usually easier just to generate a scaled variable.
scale(a b) is useful when you previously fit a model using scale in one dataset and now want
to create the fractional polynomials in another. In the first dataset, fp with scale added notes to
the dataset concerning the values of a and b. You can see them by typing
. notes
You can then use fp generate, scale(a b) in the second dataset.
When you apply the scaling rules of a previously fit model to new data with the scale(a b)
option, it is possible that the scaled term may have nonpositive values. fp will be unable to calculate
the fractional powers of the term in this case and will issue an error.
The options zero and catzero cause fp and fp generate to output zero values for each fractional
polynomial variable when the input (scaled) fractional polynomial variable is nonpositive. Specifying
catzero causes a dummy variable indicating nonpositive values of the (scaled) fractional polynomial
variable to be included in the model. A detailed example of the use of catzero and zero is shown
in example 3 below.
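A minimal sketch of catzero with fp generate, assuming the leg-ulcer data used in example 2 below (mthson records some zero values):
. use http://www.stata-press.com/data/r13/legulcer1, clear
. fp generate mthson^(-.5), catzero
. describe mthson_*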
Using the scaling options, we can fit our previous model again using the auto.dta. We specify
scale(0 1000) so that fp will shrink the magnitude of weight in estimating the regression. This is
done for demonstration purposes because our scaling rule is simple. As mentioned before, in practice,
you would probably only use scale(a b)when applying the scaling rules from a previous analysis.
Allowing fp to scale does have the advantage of not altering the original variable, weight.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. fp <weight>, fp(-.5) scale(0 1000): regress mpg <weight> foreign
->
Source SS df MS Number of obs = 74
F( 2, 71) = 79.51
Model 1689.20861 2 844.604307 Prob > F = 0.0000
Residual 754.250846 71 10.6232514 R-squared = 0.6913
Adj R-squared = 0.6826
Total 2443.45946 73 33.4720474 Root MSE = 3.2593
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight_1 66.89665 6.021749 11.11 0.000 54.88963 78.90368
foreign -2.095622 1.043513 -2.01 0.048 -4.176329 -.0149159
_cons -17.58651 3.397992 -5.18 0.000 -24.36192 -10.81111
The scaling is clearly indicated in the variable notes for the generated variable weight 1.
. notes weight_1
weight_1:
1. fp term 1 of x^(-.5), where x is weight scaled.
2. Scaling was user specified: x = (weight+a)/b where a=0 and b=1000
3. Fractional polynomial variables created by
4. To re-create the fractional polynomial variables, for instance, in another
dataset, type
Centering
The fractional polynomial of term, centered on c, is

{term^(p1,...,pm) − c^(p1,...,pm)}′β
The intercept of a centered fractional polynomial can be interpreted as the effect at zero for all the
covariates. When we center the fractional polynomial terms using c, the intercept is now interpreted
as the effect at term =cand zero values for the other covariates.
Suppose we wanted to center the fractional polynomial of x with powers (0, 0, 2) at x = c.

{x^(0,0,2) − c^(0,0,2)}′β = β0 + β1 {x^(0) − c^(0)} + β2 {x^(0) ln(x) − c^(0) ln(c)} + β3 {x^(2) − c^(2)}
                          = β0 + β1 {ln(x) − ln(c)} + β2 [{ln(x)}^2 − {ln(c)}^2] + β3 (x^2 − c^2)
When center is specified, fp centers based on the sample mean of (scaled) term. A previously
chosen value for centering, c, may also be specified in center(c). This would be done when applying
the results of a previous model fitting to a new dataset.
The center and center(c)options may be used in fp or fp generate.
Returning to the model of mileage per gallon based on automobile weight and foreign origin, we
refit the model with the fractional polynomial of weight centered at its scaled mean.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. fp <weight>, fp(-.5) scale(0 1000) center: regress mpg <weight> foreign
->
Source SS df MS Number of obs = 74
F( 2, 71) = 79.51
Model 1689.20861 2 844.604307 Prob > F = 0.0000
Residual 754.250846 71 10.6232514 R-squared = 0.6913
Adj R-squared = 0.6826
Total 2443.45946 73 33.4720474 Root MSE = 3.2593
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight_1 66.89665 6.021749 11.11 0.000 54.88963 78.90368
foreign -2.095622 1.043513 -2.01 0.048 -4.176329 -.0149159
_cons 20.91163 .4624143 45.22 0.000 19.9896 21.83366
Note that the coefficients for weight_1 and foreign do not change. Only the intercept _cons
changes. It can be interpreted as the estimated average miles per gallon of an American-made car of
average weight.
Examples
Example 1: Linear regression
Consider the serum immunoglobulin G (IgG) dataset from Isaacs et al. (1983), which consists of
298 independent observations in young children. The dependent variable sqrtigg is the square root
of the IgG concentration, and the independent variable age is the age of each child. (Preliminary
Box-Cox analysis shows that a square root transformation removes the skewness in IgG.)
The aim is to find a model that accurately predicts the mean of sqrtigg given age. We use fp
to find the best FP2 model (the default option). We specify center for automatic centering. The age
of each child is small in magnitude and positive, so we do not use the scaling options of fp or scale
ourselves.
. use http://www.stata-press.com/data/r13/igg, clear
(Immunoglobulin in children)
. fp <age>, scale center: regress sqrtigg <age>
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
age df Deviance Res. s.d. Dev. dif. P(*) Powers
omitted 0 427.539 0.497 108.090 0.000
linear 1 337.561 0.428 18.113 0.000 1
m = 1 2 327.436 0.421 7.987 0.020 0
m = 2 4 319.448 0.416 0.000 -- -2 2
(*) P = sig. level of model with m = 2 based on F with 293
denominator dof.
Source SS df MS Number of obs = 298
F( 2, 295) = 64.49
Model 22.2846976 2 11.1423488 Prob > F = 0.0000
Residual 50.9676492 295 .172771692 R-squared = 0.3042
Adj R-squared = 0.2995
Total 73.2523469 297 .246640898 Root MSE = .41566
sqrtigg Coef. Std. Err. t P>|t| [95% Conf. Interval]
age_1 -.1562156 .027416 -5.70 0.000 -.2101713 -.10226
age_2 .0148405 .0027767 5.34 0.000 .0093757 .0203052
_cons 2.283145 .0305739 74.68 0.000 2.222974 2.343315
The new variables created by fp contain the best-fitting fractional polynomial powers of age, as
centered by fp. For example, age_1 is centered by subtracting the mean of age raised to the power
−2.
The variables created by fp and fp generate are centered or scaled as specified by the user, which
is reflected in the estimated regression coefficients and intercept. Centering does have its advantages
(see Centering earlier in this entry). By default, fp will not perform scaling or centering. For a more
detailed discussion, see Royston and Sauerbrei (2008, sec. 4.11).
The fitted curve has an asymmetric S shape. The best model has powers (−2,2) and deviance
319.448. We reject lesser degree models: the null, linear, and natural log power models at the 0.05
level. As many as 44 models have been fit in the search for the best powers. Now let’s look at
models of degree 4. The highest allowed degree is specified in dimension(). We overwrite the
previously generated fractional polynomial power variables by including replace.
. fp <age>, dimension(4) center replace: regress sqrtigg <age>
(fitting 494 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
age df Deviance Res. s.d. Dev. dif. P(*) Powers
omitted 0 427.539 0.497 109.795 0.000
linear 1 337.561 0.428 19.818 0.007 1
m = 1 2 327.436 0.421 9.692 0.149 0
m = 2 4 319.448 0.416 1.705 0.798 -2 2
m = 3 6 319.275 0.416 1.532 0.476 -2 1 1
m = 4 8 317.744 0.416 0.000 -- 0 3 3 3
(*) P = sig. level of model with m = 4 based on F with 289
denominator dof.
Source SS df MS Number of obs = 298
F( 4, 293) = 32.63
Model 22.5754541 4 5.64386353 Prob > F = 0.0000
Residual 50.6768927 293 .172958678 R-squared = 0.3082
Adj R-squared = 0.2987
Total 73.2523469 297 .246640898 Root MSE = .41588
sqrtigg Coef. Std. Err. t P>|t| [95% Conf. Interval]
age_1 .8761824 .1898721 4.61 0.000 .5024962 1.249869
age_2 -.1922029 .0684934 -2.81 0.005 -.3270044 -.0574015
age_3 .2043794 .074947 2.73 0.007 .0568767 .3518821
age_4 -.0560067 .0212969 -2.63 0.009 -.097921 -.0140924
_cons 2.238735 .0482705 46.38 0.000 2.143734 2.333736
It appears that the FP4 model is not significantly different from the other fractional polynomial models
(at the 0.05 level).
Let’s compare the curve shape from the m = 2 model with that from a conventional quartic
polynomial whose fit turns out to be significantly better than a cubic (not shown). We use the ability
of fp both to generate the required powers of age, namely, (1,2,3,4) for the quartic and (−2,2)
for the second-degree fractional polynomial, and to fit the model. The fp() option is used to specify
the powers. We use predict to obtain the fitted values of each regression. We fit both models with
fp and graph the resulting curves with twoway scatter.
. fp <age>, center fp(1 2 3 4) replace: regress sqrtigg <age>
->
Source SS df MS Number of obs = 298
F( 4, 293) = 32.65
Model 22.5835458 4 5.64588646 Prob > F = 0.0000
Residual 50.668801 293 .172931061 R-squared = 0.3083
Adj R-squared = 0.2989
Total 73.2523469 297 .246640898 Root MSE = .41585
sqrtigg Coef. Std. Err. t P>|t| [95% Conf. Interval]
age_1 2.047831 .4595962 4.46 0.000 1.143302 2.952359
age_2 -1.058902 .2822803 -3.75 0.000 -1.614456 -.5033479
age_3 .2284917 .0667591 3.42 0.001 .0971037 .3598798
age_4 -.0168534 .0053321 -3.16 0.002 -.0273475 -.0063594
_cons 2.240012 .0480157 46.65 0.000 2.145512 2.334511
. predict fit1
(option xb assumed; fitted values)
. label variable fit1 "Quartic"
. fp <age>, center fp(-2 2) replace: regress sqrtigg <age>
->
Source SS df MS Number of obs = 298
F( 2, 295) = 64.49
Model 22.2846976 2 11.1423488 Prob > F = 0.0000
Residual 50.9676492 295 .172771692 R-squared = 0.3042
Adj R-squared = 0.2995
Total 73.2523469 297 .246640898 Root MSE = .41566
sqrtigg Coef. Std. Err. t P>|t| [95% Conf. Interval]
age_1 -.1562156 .027416 -5.70 0.000 -.2101713 -.10226
age_2 .0148405 .0027767 5.34 0.000 .0093757 .0203052
_cons 2.283145 .0305739 74.68 0.000 2.222974 2.343315
. predict fit2
(option xb assumed; fitted values)
. label variable fit2 "FP 2"
. scatter sqrtigg fit1 fit2 age, c(. l l) m(o i i) msize(small)
> lpattern(. -_.) ytitle("Square root of IgG") xtitle("Age, years")
(figure omitted: scatterplot of the square root of IgG against age in years, with the fitted quartic and FP2 curves overlaid)
The quartic curve has an unsatisfactory wavy appearance that is implausible for the known behavior
of IgG, the serum level of which increases throughout early life. The fractional polynomial curve
(FP2) increases monotonically and is therefore biologically the more plausible curve. The two models
have approximately the same deviance.
Example 2: Cox regression
Data from Smith et al. (1992) contain times to complete healing of leg ulcers in a randomized,
controlled clinical trial of two treatments in 192 elderly patients. Several covariates were available,
of which an important one is mthson, the number of months since the recorded onset of the ulcer.
This time is recorded in whole months, not fractions of a month; therefore, some zero values are
recorded.
Because the response variable is time to an event of interest and some (in fact, about one-half) of
the times are censored, using Cox regression to analyze the data is appropriate. We consider fractional
polynomials in mthson, adjusting for four other covariates: age; ulcarea, the area of tissue initially
affected by the ulcer; deepppg, a binary variable indicating the presence or absence of deep vein
involvement; and treat, a binary variable indicating treatment type.
We fit fractional polynomials of degrees 1 and 2 with fp. We specify scale to perform automatic
scaling on mthson. This makes it positive and ensures that its magnitude is not too large. (See Scaling
for more details.) The display option nohr is specified before the colon so that the coefficients and
not the hazard ratios are displayed.
The center option is specified to obtain automatic centering. age and ulcarea are also demeaned
by using summarize and then subtracting the returned result r(mean).
In Cox regression, there is no constant term, so we cannot see the effects of centering in the
table of regression estimates. The effects would be present if we were to graph the baseline hazard
or survival function because these functions are defined with all predictors set equal to 0.
In these graphs, we will see the estimated baseline hazard or survival function under no deep vein
involvement or treatment and under mean age, ulcer area, and number of months since the recorded
onset of the ulcer.
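One way to draw such a graph, once the m = 1 model below has been fit, is with stcurve, forcing each centered covariate to 0. A minimal sketch in do-file form (an illustration only; mthson_1 is the single fractional polynomial term created by that fit):

* survivor function with every centered covariate held at 0, that is,
* at mean age and ulcer area and with no deep vein involvement or treatment
stcurve, survival at(mthson_1=0 age=0 ulcarea=0 deepppg=0 treat=0)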
. use http://www.stata-press.com/data/r13/legulcer1, clear
(Leg ulcer clinical trial)
. stset ttevent, fail(cens)
failure event: censored != 0 & censored < .
obs. time interval: (0, ttevent]
exit on or before: failure
192 total observations
0 exclusions
192 observations remaining, representing
92 failures in single-record/single-failure data
13825 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 206
. qui sum age
. replace age = age - r(mean)
age was byte now float
(192 real changes made)
. qui sum ulcarea
. replace ulcarea = ulcarea - r(mean)
ulcarea was int now float
(192 real changes made)
. fp <mthson>, center scale nohr: stcox <mthson> age ulcarea deepppg treat
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
mthson df Deviance Dev. dif. P(*) Powers
omitted 0 754.345 17.636 0.001
linear 1 751.680 14.971 0.002 1
m = 1 2 738.969 2.260 0.323 -.5
m = 2 4 736.709 0.000 -- .5 .5
(*) P = sig. level of model with m = 2 based on chi^2 of dev. dif.
Cox regression -- Breslow method for ties
No. of subjects = 192 Number of obs = 192
No. of failures = 92
Time at risk = 13825
LR chi2(6) = 108.59
Log likelihood = -368.35446 Prob > chi2 = 0.0000
_t Coef. Std. Err. z P>|z| [95% Conf. Interval]
mthson_1 -2.81425 .6996385 -4.02 0.000 -4.185516 -1.442984
mthson_2 1.541451 .4703143 3.28 0.001 .6196521 2.46325
age -.0261111 .0087983 -2.97 0.003 -.0433556 -.0088667
ulcarea -.0017491 .000359 -4.87 0.000 -.0024527 -.0010455
deepppg -.5850499 .2163173 -2.70 0.007 -1.009024 -.1610758
treat -.1624663 .2171048 -0.75 0.454 -.5879838 .2630513
The best-fitting fractional polynomial of degree 2 has powers (0.5, 0.5) and deviance 736.709. However,
this model does not fit significantly better than the fractional polynomial of degree 1 (at the 0.05
level), which has power −0.5 and deviance 738.969. We prefer the model with m = 1.
. fp <mthson>, replace center scale nohr fp(-.5): stcox <mthson> age ulcarea
> deepppg treat
->
Cox regression -- Breslow method for ties
No. of subjects = 192 Number of obs = 192
No. of failures = 92
Time at risk = 13825
LR chi2(5) = 106.33
Log likelihood = -369.48426 Prob > chi2 = 0.0000
_t Coef. Std. Err. z P>|z| [95% Conf. Interval]
mthson_1 .1985592 .0493922 4.02 0.000 .1017523 .2953662
age -.02691 .0087875 -3.06 0.002 -.0441331 -.0096868
ulcarea -.0017416 .0003482 -5.00 0.000 -.0024241 -.0010591
deepppg -.5740759 .2185134 -2.63 0.009 -1.002354 -.1457975
treat -.1798575 .2175726 -0.83 0.408 -.6062921 .246577
The hazard for healing is much higher for patients whose ulcer is of recent onset than for those who
have had an ulcer for many months.
A more appropriate analysis of this dataset, if one wanted to model all the predictors, possibly
with fractional polynomial functions, would be to use mfp; see [R] mfp.
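For instance, a minimal sketch in do-file form (an illustration only; as above, mthson has zero values and so needs to be shifted and scaled first, and the selection level used here is an arbitrary choice):

* make mthson strictly positive (fp's scale option does something similar)
generate double mthson_s = (mthson + 1)/10
* let mfp search for fractional polynomial transformations of each predictor
mfp, alpha(0.05): stcox mthson_s age ulcarea deepppg treat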
Example 3: Logistic regression
The zero option permits fitting a fractional polynomial model to the positive values of a covariate,
taking nonpositive values as zero. An application is the assessment of the effect of cigarette smoking
as a risk factor. Whitehall 1 is an epidemiological study, which was examined in Royston and
Sauerbrei (2008), of 18,403 male British Civil Servants employed in London. We examine the data
collected in Whitehall 1 and use logistic regression to model the odds of death based on a fractional
polynomial in the number of cigarettes smoked.
Nonsmokers may be qualitatively different from smokers, so the effect of smoking (regarded as a
continuous variable) may not be continuous between zero cigarettes and one cigarette. To allow for
this possibility, we model the risk as a constant for the nonsmokers and as a fractional polynomial
function of the number of cigarettes for the smokers, adjusted for age.
The dependent variable all10 is an indicator of whether the individual passed away in the 10 years
under study. cigs is the number of cigarettes consumed per day. After loading the data, we demean
age and create a dummy variable, nonsmoker. We then use fp to fit the model.
. use http://www.stata-press.com/data/r13/smoking, clear
(Smoking and mortality data)
. qui sum age
. replace age = age - r(mean)
age was byte now float
(17260 real changes made)
. generate byte nonsmoker = cond(cigs==0, 1, 0) if cigs < .
. fp <cigs>, zero: logit all10 <cigs> nonsmoker age
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
cigs df Deviance Dev. dif. P(*) Powers
omitted 0 9990.804 46.096 0.000
linear 1 9958.801 14.093 0.003 1
m = 1 2 9946.603 1.895 0.388 0
m = 2 4 9944.708 0.000 -- -1 -1
(*) P = sig. level of model with m = 2 based on chi^2 of dev. dif.
Logistic regression Number of obs = 17260
LR chi2(4) = 1029.03
Prob > chi2 = 0.0000
Log likelihood = -4972.3539 Pseudo R2 = 0.0938
all10 Coef. Std. Err. z P>|z| [95% Conf. Interval]
cigs_1 -1.285867 .3358483 -3.83 0.000 -1.944117 -.6276162
cigs_2 -1.982424 .572109 -3.47 0.001 -3.103736 -.8611106
nonsmoker -1.223749 .1119583 -10.93 0.000 -1.443183 -1.004315
age .1194541 .0045818 26.07 0.000 .1104739 .1284343
_cons -1.591489 .1052078 -15.13 0.000 -1.797693 -1.385286
Omission of the zero option would cause fp to halt with an error message because nonpositive
covariate values (for example, values of cigs) are invalid unless the scale option is specified.
A closely related approach involves the catzero option. Here we no longer need to have nonsmoker
in the model, because fp creates its own dummy variable cigs_0 to indicate whether the individual
does not smoke.
. fp <cigs>, catzero replace: logit all10 <cigs> age
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)
Fractional polynomial comparisons:
cigs df Deviance Dev. dif. P(*) Powers
omitted 0 10175.75 231.047 0.000
linear 2 9958.80 14.093 0.003 1
m = 1 3 9946.60 1.895 0.388 0
m = 2 5 9944.71 0.000 -- -1 -1
(*) P = sig. level of model with m = 2 based on chi^2 of dev. dif.
Logistic regression Number of obs = 17260
LR chi2(4) = 1029.03
Prob > chi2 = 0.0000
Log likelihood = -4972.3539 Pseudo R2 = 0.0938
all10 Coef. Std. Err. z P>|z| [95% Conf. Interval]
cigs_0 -1.223749 .1119583 -10.93 0.000 -1.443183 -1.004315
cigs_1 -1.285867 .3358483 -3.83 0.000 -1.944117 -.6276162
cigs_2 -1.982424 .572109 -3.47 0.001 -3.103736 -.8611106
age .1194541 .0045818 26.07 0.000 .1104739 .1284343
_cons -1.591489 .1052078 -15.13 0.000 -1.797693 -1.385286
Under both approaches, the comparison table suggests that we can accept the FP1 model instead
of the FP2 model. We estimate the parameters of the accepted model, that is, the one that uses the
natural logarithm of cigs, with fp.
. fp <cigs>, catzero replace fp(0): logit all10 <cigs> age
->
Logistic regression Number of obs = 17260
LR chi2(3) = 1027.13
Prob > chi2 = 0.0000
Log likelihood = -4973.3016 Pseudo R2 = 0.0936
all10 Coef. Std. Err. z P>|z| [95% Conf. Interval]
cigs_0 .1883732 .1553093 1.21 0.225 -.1160274 .4927738
cigs_1 .3469842 .0543552 6.38 0.000 .2404499 .4535185
age .1194976 .0045818 26.08 0.000 .1105174 .1284778
_cons -3.003767 .1514909 -19.83 0.000 -3.300683 -2.70685
The high p-value for cigs_0 in the output indicates that we cannot reject that there is no extra
effect at zero for nonsmokers.
Stored results
In addition to the results that est_cmd stores, fp stores the following in e():
Scalars
  e(fp_dimension)        degree of fractional polynomial
  e(fp_center_mean)      value used for centering or .
  e(fp_scale_a)          value used for scaling or .
  e(fp_scale_b)          value used for scaling or .
  e(fp_compare_df2)      denominator degrees of freedom in F test
Macros
  e(fp_cmd)              fp, search(): or fp, powers():
  e(fp_cmdline)          full fp command as typed
  e(fp_variable)         fractional polynomial variable
  e(fp_terms)            generated fp variables
  e(fp_gen_cmdline)      fp generate command to re-create e(fp_terms) variables
  e(fp_catzero)          catzero, if specified
  e(fp_zero)             zero, if specified
  e(fp_compare_type)     F or chi2
Matrices
  e(fp_fp)               powers used in fractional polynomial
  e(fp_compare)          results of model comparisons
  e(fp_compare_stat)     F test statistics
  e(fp_compare_df1)      numerator degrees of freedom in F test
  e(fp_compare_fp)       powers of comparison models
  e(fp_compare_length)   encoded string for display of row titles
  e(fp_powers)           powers that are searched
fp generate stores the following in r():
Scalars
  r(fp_center_mean)      value used for centering or .
  r(fp_scale_a)          value used for scaling or .
  r(fp_scale_b)          value used for scaling or .
Macros
  r(fp_cmdline)          full fp generate command as typed
  r(fp_variable)         fractional polynomial variable
  r(fp_terms)            generated fp variables
  r(fp_catzero)          catzero, if specified
  r(fp_zero)             zero, if specified
Matrices
  r(fp_fp)               powers used in fractional polynomial
Methods and formulas
The general definition of a fractional polynomial, accommodating possible repeated powers, may
be written for functions $H_1(x), \ldots, H_m(x)$ of $x > 0$ as
$$\beta_0 + \sum_{j=1}^{m} \beta_j H_j(x)$$
where $H_1(x) = x^{(p_1)}$ and for $j = 2, \ldots, m$,
$$H_j(x) = \begin{cases} x^{(p_j)} & \text{if } p_j \neq p_{j-1} \\ H_{j-1}(x)\ln(x) & \text{if } p_j = p_{j-1} \end{cases}$$
For example, a fractional polynomial of degree 3 with powers (1, 3, 3) has $H_1(x) = x$, $H_2(x) = x^3$,
and $H_3(x) = x^3\ln(x)$ and equals $\beta_0 + \beta_1 x + \beta_2 x^3 + \beta_3 x^3\ln(x)$.
We can express a fractional polynomial in vector notation by using $H(x) = \{H_1(x), \ldots, H_m(x)\}'$.
We define $x^{(p_1, p_2, \ldots, p_m)} = \{H(x)', 1\}'$. Under this notation, we can write
$$x^{(1,3,3)\,\prime}\boldsymbol{\beta} = \beta_0 + \beta_1 x + \beta_2 x^3 + \beta_3 x^3\ln(x)$$
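To make the definition concrete, here is a minimal sketch in do-file form that builds the three terms of this degree-3 example by hand for a hypothetical positive variable x; the repeated power 3 is what produces the ln(x) term:

* H1, H2, H3 for powers (1, 3, 3); x is assumed to be strictly positive
generate double H1 = x            // p1 = 1
generate double H2 = x^3          // p2 = 3
generate double H3 = x^3*ln(x)    // p3 = p2, so the previous term times ln(x)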
The fractional polynomial may be centered so that the intercept can be more easily interpreted.
When centering the fractional polynomial of $x$ at $c$, we subtract $c^{(p_1, p_2, \ldots, p_m)}$ from $x^{(p_1, p_2, \ldots, p_m)}$,
where $c^{(p_1, p_2, \ldots, p_m)} = \{H(c)', 0\}'$. The centered fractional polynomial is
$$\left\{x^{(p_1, \ldots, p_m)} - c^{(p_1, \ldots, p_m)}\right\}'\boldsymbol{\beta}$$
The definition may be extended to allow $x \le 0$ values. For these values, the fractional polynomial
is equal to the intercept $\beta_0$ or equal to a zero-offset term $\alpha_0$ plus the intercept $\beta_0$.
A fractional polynomial model of degree $m$ is taken to have $2m + 1$ degrees of freedom (df): one
for $\beta_0$ and one for each $\beta_j$ and its associated power. Because the powers in a fractional polynomial
are chosen from a finite set rather than from the entire real line, the df defined in this way are
approximate.
The deviance $D$ of a model is defined as $-2$ times its maximized log likelihood. For normal-errors
models, we use the formula
$$D = n\left\{1 - \bar{l} + \ln\frac{2\pi\,\mathrm{RSS}}{n}\right\}$$
where $n$ is the sample size, $\bar{l}$ is the mean of the log normalized weights ($\bar{l} = 0$ if the weights are all
equal), and RSS is the residual sum of squares as fit by regress.
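With unweighted data (so that $\bar{l} = 0$), this deviance can be computed directly from the results stored by regress. A minimal sketch in do-file form, assuming the centered terms age_1 and age_2 from the earlier FP2 fit are still in memory:

* deviance of a normal-errors model: D = n{1 + ln(2*pi*RSS/n)}
quietly regress sqrtigg age_1 age_2
display e(N)*(1 + ln(2*_pi*e(rss)/e(N)))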
fp reports a table comparing fractional polynomial models of degree k < m with the degree m
fractional polynomial model, which will have the lowest deviance.
The $p$-values reported by fp are calculated differently for normal and nonnormal regressions. Let
$D_k$ and $D_m$ be the deviances of the models with degrees $k$ and $m$, respectively. For normal-errors
models, a variance ratio $F$ is calculated as
$$F = \frac{n_2}{n_1}\left\{\exp\left(\frac{D_k - D_m}{n}\right) - 1\right\}$$
where $n_1$ is the numerator df, the number of additional parameters that the degree-$m$ model has
over the degree-$k$ model. $n_2$ is the denominator df and equals the residual degrees of freedom of the
degree-$m$ model, minus the number of powers estimated, $m$. The $p$-value is obtained by referring $F$
to an $F$ distribution on $(n_1, n_2)$ df.
For nonnormal models, the $p$-value is obtained by referring $D_k - D_m$ to a $\chi^2$ distribution on
$2m - 2k$ df. These $p$-values for comparing models are approximate and are typically somewhat
conservative (Royston and Altman 1994).
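A minimal sketch in do-file form of both calculations, using hypothetical deviances and degrees of freedom purely for illustration:

* normal-errors case: variance ratio F and its p-value
scalar Dk = 745.2
scalar Dm = 740.1
scalar F  = (291/2)*(exp((Dk - Dm)/298) - 1)
display F, Ftail(2, 291, F)
* nonnormal case: chi-squared p-value for a degree-1 versus degree-2 comparison
display chi2tail(2, Dk - Dm)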
Acknowledgment
We thank Patrick Royston of the MRC Clinical Trials Unit, London, and coauthor of the Stata
Press book Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model for writing
fracpoly and fracgen, the commands on which fp and fp generate are based. We also thank
Professor Royston for his advice on and review of the new fp commands.
References
Becketti, S. 1995. sg26.2: Calculating and graphing fractional polynomials. Stata Technical Bulletin 24: 14–16. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 129–132. College Station, TX: Stata Press.
Isaacs, D., D. G. Altman, C. E. Tidmarsh, H. B. Valman, and A. D. Webster. 1983. Serum immunoglobulin concentrations in preschool children measured by laser nephelometry: Reference ranges for IgG, IgA, IgM. Journal of Clinical Pathology 36: 1193–1196.
Libois, F., and V. Verardi. 2013. Semiparametric fixed-effects estimator. Stata Journal 13: 329–336.
Royston, P. 1995. sg26.3: Fractional polynomial utilities. Stata Technical Bulletin 25: 9–13. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 82–87. College Station, TX: Stata Press.
Royston, P., and D. G. Altman. 1994. Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. Applied Statistics 43: 429–467.
Royston, P., and G. Ambler. 1999a. sg112: Nonlinear regression models involving power or exponential functions of covariates. Stata Technical Bulletin 49: 25–30. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 173–179. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999b. sg81.1: Multivariable fractional polynomials: Update. Stata Technical Bulletin 49: 17–23. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 161–168. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999c. sg112.1: Nonlinear regression models involving power or exponential functions of covariates: Update. Stata Technical Bulletin 50: 26. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 180. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999d. sg81.2: Multivariable fractional polynomials: Update. Stata Technical Bulletin 50: 25. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 168. College Station, TX: Stata Press.
Royston, P., and W. Sauerbrei. 2008. Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Chichester, UK: Wiley.
Smith, J. M., C. J. Dore, A. Charlett, and J. D. Lewis. 1992. A randomized trial of Biofilm dressing for venous leg ulcers. Phlebology 7: 108–113.
Also see
[R] fp postestimation — Postestimation tools for fp
[R] mfp — Multivariable fractional polynomial models
[U] 20 Estimation and postestimation commands
Title
fp postestimation — Postestimation tools for fp
Description Syntax for predict Syntax for fp plot and fp predict
Menu for fp plot and fp predict Options for fp plot Options for fp predict
Remarks and examples Methods and formulas Acknowledgment
References Also see
Description
The following postestimation commands are of special interest after fp:
Command Description
fp plot component-plus-residual plot from most recently fit fractional polynomial model
fp predict create variable containing prediction or SEs of fractional polynomials
The following standard postestimation commands are also available if available after est cmd:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
fp plot produces a component-plus-residual plot. The fractional polynomial comprises the
component, and the residual is specified by the user in residuals(). The component-plus-residuals are
plotted against the fractional polynomial variable. If you only want to plot the component fit, without
residuals, you would specify residuals(none).
fp predict generates the fractional polynomial or the standard error of the fractional polynomial.
The fractional polynomial prediction is equivalent to the fitted values prediction given by predict,
xb, with the covariates other than the fractional polynomial variable set to zero. The standard error
may be quite large if the range of the other covariates is far from zero. In this situation, the covariates
should be centered so that their ranges include, or come close to including, zero.
These postestimation commands can be used only when the fractional polynomial variables do
not interact with other variables in the specification of est cmd. See [U] 11.4.3 Factor variables for
more information about interactions.
Syntax for predict
The behavior of predict following fp is determined by est cmd. See the corresponding est cmd
postestimation entry for available predict options.
Also see information on fp predict below.
Syntax for fp plot and fp predict
Component-plus-residual plot for most recently fit fractional polynomial model
fp plot [if] [in], residuals(res_option) [graph_options]
Create variable containing the prediction or SEs of fractional polynomials
fp predict [type] newvar [if] [in] [, predict_options]
graph_options                Description
Main
  residuals(res_option)      residual option name to use in predict after est_cmd, or
                               residuals(none) if residuals are not to be graphed
  equation(eqno)             specify equation
  level(#)                   set confidence level; default is level(95)
Plot
  plotopts(scatter_options)  affect rendition of the component-plus-residual scatter points
Fitted line
  lineopts(cline_options)    affect rendition of the fitted line
CI plot
  ciopts(area_options)       affect rendition of the confidence bands
Add plots
  addplot(plot)              add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
  twoway_options             any options other than by() documented in [G-3] twoway_options
residuals(res_option) is required.

predict_options              Description
Main
  fp                         calculate the fractional polynomial; the default
  stdp                       calculate the standard error of the fractional polynomial
  equation(eqno)             specify equation
Menu for fp plot and fp predict
fp plot
Statistics >Linear models and related >Fractional polynomials >Component-plus-residual plot
fp predict
Statistics >Linear models and related >Fractional polynomials >Fractional polynomial prediction
Options for fp plot
 
Main
residuals(res_option) specifies what type of residuals to plot in the component-plus-residual plot.
res_option is the same option that would be specified to predict after est_cmd. Residuals can
be omitted from the plot by specifying residuals(none). residuals() is required.
equation(eqno) is relevant only when you have previously fit a multiple-equation model in est_cmd.
It specifies the equation to which you are referring.
equation(#1) would mean that the calculation is to be made for the first equation, equation(#2)
would mean the second, and so on. You could also refer to the equations by their names:
equation(income) would refer to the equation name income, and equation(hours) would
refer to the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
level(#); see [R]estimation options.
 
Plot
plotopts(scatter_options) affects the rendition of the component-plus-residual scatter points; see
[G-2] graph twoway scatter.
 
Fitted line
lineopts(cline_options) affects the rendition of the fitted line; see [G-3] cline_options.
 
CI plot
ciopts(area_options) affects the rendition of the confidence bands; see [G-3] area_options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot_option.
 
Y axis, X axis, Titles, Legend, Overall
twoway_options are any of the options documented in [G-3] twoway_options, excluding by(). These
include options for titling the graph (see [G-3] title_options) and for saving the graph to disk (see
[G-3] saving_option).
Options for fp predict
 
Main
fp calculates the fractional polynomial, the linear prediction with other variables set to zero. This is
the default.
stdp calculates the standard error of the fractional polynomial.
equation(eqno) is relevant only when you have previously fit a multiple-equation model in est_cmd.
It specifies the equation to which you are referring.
equation(#1) would mean that the calculation is to be made for the first equation, equation(#2)
would mean the second, and so on. You could also refer to the equations by their names:
equation(income) would refer to the equation name income, and equation(hours) would
refer to the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
Remarks and examples
After a model is fit using fp, the estimated fractional polynomial may be of interest. This is the
linear combination of the fractional polynomial terms and the constant intercept using the model
coefficients estimated by fp. It is equivalent to the fitted values prediction given by predict, xb,
with the covariates other than the fractional polynomial variable set to zero. When these other covariates
have been centered, the prediction is made at the centering values of the covariates.
A component-plus-residual plot is generated by fp plot. The fractional polynomial comprises
the component, and the residual is specified by the user in residuals(). The residuals() option
takes the same argument that would be supplied to predict after est cmd to obtain the desired
type of residuals. If you only want to plot the component fit, without residuals, you would specify
residuals(none).
fp predict generates the fractional polynomial. If the stdp option is specified, the standard error
of the fractional polynomial is generated instead. This standard error may be quite large if the range
of the other covariates is far from zero. In this situation, the covariates should be centered so that
their ranges include, or come close to including, zero.
These postestimation commands can be used only when the fractional polynomial terms do not
interact with other variables in the specification of est cmd. See [U] 11.4.3 Factor variables for more
information about interactions.
Examples
Example 1: fp plot after linear regression
In example 1 of [R]fp, we modeled the mean of the square root of a child’s serum immunoglobulin
G (IgG) level as a fractional polynomial function of the child’s age. An FP2 model with powers
(−2, 2) is chosen.
We load the data and then fit the model with fp. Then we use fp plot to draw the component-
plus-residual plot. A 95% confidence interval is produced for the fractional polynomial in age (the
component). The residuals prediction option for regress is specified in the residuals() option
in fp plot so that the residuals are rendered.
. use http://www.stata-press.com/data/r13/igg
(Immunoglobulin in children)
. fp <age>, scale center: regress sqrtigg <age>
(output omitted )
. fp plot, residuals(residuals)
(graph omitted: component-plus-residual of sqrtigg versus age in years, with the fitted fractional polynomial and its 95% confidence band)
Example 2: fp plot after Cox regression
In example 2 of [R]fp, we modeled the time to complete healing of leg ulcers for 192 elderly
patients using a Cox regression. A one-degree fractional polynomial in mthson, the number of months
since the onset of the ulcer, is used as a predictor in the regression. The power −0.5 is used for
mthson. Other covariates are age (age), ulcer area (ulcarea), treatment type, and a binary indicator
of deep vein involvement (deepppg).
We load the data and then demean ulcer area and age. Then we fit the model with fp and draw
the component-plus-residual plot with fp plot. mgale is specified in the residuals() option to
obtain martingale residuals. See [ST]stcox postestimation for more details.
. use http://www.stata-press.com/data/r13/legulcer1, clear
(Leg ulcer clinical trial)
. quietly stset ttevent, failure(cens)
. quietly summarize age
. replace age = age - r(mean)
age was byte now float
(192 real changes made)
. quietly summarize ulcarea
. replace ulcarea = ulcarea - r(mean)
ulcarea was int now float
(192 real changes made)
. fp <mthson>, replace center scale nohr fp(-.5): stcox <mthson> age ulcarea
> deepppg treat
(output omitted )
. fp plot, residuals(mgale)
(graph omitted: component-plus-residual of _t versus months since onset, with the fitted fractional polynomial and its 95% confidence band)
Example 3: fp plot and fp predict after logistic regression
In example 3 of [R]fp, we used logistic regression to model the odds of death for male civil
servants in Britain conditional on cigarette consumption. The dependent variable all10 is an indicator
of whether the individual passed away in the 10 years under study.
Nonsmokers may be qualitatively different from smokers, so the effect of smoking (regarded as
a continuous variable) may not be continuous between zero cigarettes and one cigarette. To allow
for this possibility, we model the risk as a constant intercept for the nonsmokers and as a fractional
polynomial function of the number of cigarettes for the smokers, cigs, adjusted for age. An FP1
model with power 0 is chosen.
We load the data and demean age. Then we fit the model using fp and graph the fit of the
model and 95% confidence interval using fp plot. Only the component fit is graphed by specifying
residuals(none).
. use http://www.stata-press.com/data/r13/smoking, clear
(Smoking and mortality data)
. quietly summarize age
. replace age = age - r(mean)
age was byte now float
(17260 real changes made)
. fp <cigs>, catzero replace fp(0): logit all10 <cigs> age
->
Logistic regression Number of obs = 17260
LR chi2(3) = 1027.13
Prob > chi2 = 0.0000
Log likelihood = -4973.3016 Pseudo R2 = 0.0936
all10 Coef. Std. Err. z P>|z| [95% Conf. Interval]
cigs_0 .1883732 .1553093 1.21 0.225 -.1160274 .4927738
cigs_1 .3469842 .0543552 6.38 0.000 .2404499 .4535185
age .1194976 .0045818 26.08 0.000 .1105174 .1284778
_cons -3.003767 .1514909 -19.83 0.000 -3.300683 -2.70685
. fp plot, residuals(none)
(graph omitted: the fitted fractional polynomial component and its 95% confidence band versus daily cigarette consumption)
We see a small spike at zero for cigs because of the effect of cigs_0 on the fractional polynomial;
however, the high p-value for cigs_0 in the model output indicates that we cannot reject that there
is no extra effect at zero for nonsmokers.
We can also use fp predict to predict the fractional polynomial for nonsmokers and the mean
of age. This is the value at the spike. We store the result in fp0. We see it is equivalent to the sum
of the constant intercept estimate and the estimate of the cigs_0 coefficient.
. fp predict fp0 if cigs == 0
(7157 missing values generated)
. summarize fp0
Variable Obs Mean Std. Dev. Min Max
fp0 10103 -2.815393 0 -2.815393 -2.815393
. display _b[cigs_0]+_b[_cons]
-2.8153935
Methods and formulas
Let the data consist of triplets $(y_i, x_i, \mathbf{z}_i)$, $i = 1, \ldots, n$, where $\mathbf{z}_i$ is the vector of covariates for
the $i$th observation and $x_i$ is the fractional polynomial variable.
fp predict calculates the fractional polynomial at the centering value $x_0$:
$$\widehat{\eta}_i = \left\{x_i^{(p_1, \ldots, p_m)} - x_0^{(p_1, \ldots, p_m)}\right\}'\widehat{\boldsymbol{\beta}}$$
This is equivalent to the linear predictor of the model at $\mathbf{z}_i = \mathbf{0}$. The standard error
is calculated from the variance–covariance matrix of $\widehat{\boldsymbol{\beta}}$, ignoring estimation of the powers. When
$x_i \le 0$, $H(x_i)$, and thus $x_i^{(p_1, \ldots, p_m)}$, is either undefined or zero. A zero-offset term, $\alpha_0$, may be
added to $\widehat{\eta}_i$ for these nonpositive $x_i$ values.
The values $\widehat{\eta}_i$ represent the behavior of the fractional polynomial model for $x$ at fixed values
$\mathbf{z} = \mathbf{0}$ of the (centered) covariates. The $i$th component-plus-residual is defined as $\widehat{\eta}_i + d_i$, where
$d_i$ is the residual for the $i$th observation. The definition of $d_i$ will change according to the type of
model used and the preference of the user. fp plot plots $\widehat{\eta}_i + d_i$ versus $x_i$, overlaying $\widehat{\eta}_i$ and its
confidence interval.
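These quantities can be reproduced by hand. A minimal sketch in do-file form, assuming the example 1 linear regression of sqrtigg on the centered fractional polynomial terms of age is the current fit:

* component (the fractional polynomial), residual, and their sum
fp predict double eta             // eta_i, the component
predict double res, residuals     // d_i for a linear regression
generate double cpr = eta + res   // component-plus-residual
twoway (scatter cpr age, msize(small)) (line eta age, sort)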
Acknowledgment
We thank Patrick Royston of the MRC Clinical Trials Unit, London, and coauthor of the Stata
Press book Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model for writing
fracplot and fracpred, the commands on which fp plot and fp predict are based. We also
thank Professor Royston for his advice on and review of fp plot and fp predict.
References
Becketti, S. 1995. sg26.2: Calculating and graphing fractional polynomials. Stata Technical Bulletin 24: 14–16. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 129–132. College Station, TX: Stata Press.
Royston, P. 1995. sg26.3: Fractional polynomial utilities. Stata Technical Bulletin 25: 9–13. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 82–87. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999a. sg112: Nonlinear regression models involving power or exponential functions of covariates. Stata Technical Bulletin 49: 25–30. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 173–179. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999b. sg81.1: Multivariable fractional polynomials: Update. Stata Technical Bulletin 49: 17–23. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 161–168. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999c. sg112.1: Nonlinear regression models involving power or exponential functions of covariates: Update. Stata Technical Bulletin 50: 26. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 180. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1999d. sg81.2: Multivariable fractional polynomials: Update. Stata Technical Bulletin 50: 25. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 168. College Station, TX: Stata Press.
Also see
[R] fp — Fractional polynomial regression
[U] 20 Estimation and postestimation commands
Title
frontier — Stochastic frontier models
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
frontier depvar [indepvars] [if] [in] [weight] [, options]

options                           Description
Model
  noconstant                      suppress constant term
  distribution(hnormal)           half-normal distribution for the inefficiency term
  distribution(exponential)       exponential distribution for the inefficiency term
  distribution(tnormal)           truncated-normal distribution for the inefficiency term
  ufrom(matrix)                   specify untransformed log likelihood; only with d(tnormal)
  cm(varlist[, noconstant])       fit conditional mean model; only with d(tnormal); use
                                    noconstant to suppress constant term
Model 2
  constraints(constraints)        apply specified linear constraints
  collinear                       keep collinear variables
  uhet(varlist[, noconstant])     explanatory variables for technical inefficiency variance
                                    function; use noconstant to suppress constant term
  vhet(varlist[, noconstant])     explanatory variables for idiosyncratic error variance
                                    function; use noconstant to suppress constant term
  cost                            fit cost frontier model; default is production frontier model
SE
  vce(vcetype)                    vcetype may be oim, opg, bootstrap, or jackknife
Reporting
  level(#)                        set confidence level; default is level(95)
  nocnsreport                     do not display constraints
  display_options                 control column formats, row spacing, line width, display of omitted
                                    variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options                control the maximization process; seldom used
  coeflegend                      display legend instead of statistics

indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Frontier models
Description
frontier fits stochastic production or cost frontier models; the default is a production frontier
model. It provides estimators for the parameters of a linear model with a disturbance that is assumed
to be a mixture of two components: one with a strictly nonnegative distribution and one with a
symmetric distribution. frontier can fit models in which the nonnegative distribution component (a
measurement of inefficiency) is assumed to be from a half-normal, exponential, or truncated-normal
distribution. See Kumbhakar and Lovell (2000) for a detailed introduction to frontier analysis.
Options
 
Model
noconstant; see [R]estimation options.
distribution(distname) specifies the distribution for the inefficiency term as half-normal (hnormal),
exponential, or truncated-normal (tnormal). The default is hnormal.
ufrom(matrix) specifies a 1 × K matrix of untransformed starting values when the distribution is
truncated-normal (tnormal). frontier can estimate the parameters of the model by maximizing
either the log likelihood or a transformed log likelihood (see Methods and formulas). frontier
automatically transforms the starting values before passing them on to the transformed log likelihood.
The matrix must have the same number of columns as there are parameters to estimate.
cm(varlist[, noconstant]) may be used only with distribution(tnormal). Here frontier
will fit a conditional mean model in which the mean of the truncated-normal distribution is modeled
as a linear function of the set of covariates specified in varlist. Specifying noconstant suppresses
the constant in the mean function.
 
Model 2
constraints(constraints), collinear; see [R] estimation options.
By default, when fitting the truncated-normal model or the conditional mean model, frontier
maximizes a transformed log likelihood. When constraints are applied, frontier will maximize
the untransformed log likelihood with constraints defined in the untransformed metric.
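As an illustration, here is a minimal sketch in do-file form of a constrained truncated-normal fit, borrowing the variable names from example 1 below:

* impose constant returns to scale as a linear constraint
constraint 1 lnk + lnl = 1
frontier lnv lnk lnl, distribution(tnormal) constraints(1)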
uhet(varlist[, noconstant]) specifies that the technical inefficiency component is heteroskedastic,
with the variance function depending on a linear combination of varlist_u. Specifying noconstant
suppresses the constant term from the variance function. This option may not be specified with
distribution(tnormal).
vhet(varlist[, noconstant]) specifies that the idiosyncratic error component is heteroskedastic,
with the variance function depending on a linear combination of varlist_v. Specifying noconstant
suppresses the constant term from the variance function. This option may not be specified with
distribution(tnormal).
cost specifies that frontier fit a cost frontier model.
 
SE
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim, opg) and that use bootstrap or jackknife methods (bootstrap,
jackknife); see [R] vce_option.
 
Reporting
level(#); see [R]estimation options.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with frontier but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Stochastic production frontier models were introduced by Aigner, Lovell, and Schmidt (1977) and
Meeusen and van den Broeck (1977). Since then, stochastic frontier models have become a popular
subfield in econometrics. Kumbhakar and Lovell (2000) provide a good introduction.
frontier fits three stochastic frontier models with distinct parameterizations of the inefficiency
term and can fit stochastic production or cost frontier models.
Let's review the nature of the stochastic frontier problem. Suppose that a producer has a production
function $f(\mathbf{z}_i, \boldsymbol{\beta})$. In a world without error or inefficiency, the $i$th firm would produce
$$q_i = f(\mathbf{z}_i, \boldsymbol{\beta})$$
Stochastic frontier analysis assumes that each firm potentially produces less than it might due to
a degree of inefficiency. Specifically,
$$q_i = f(\mathbf{z}_i, \boldsymbol{\beta})\,\xi_i$$
where $\xi_i$ is the level of efficiency for firm $i$; $\xi_i$ must be in the interval $(0, 1]$. If $\xi_i = 1$, the firm
is achieving the optimal output with the technology embodied in the production function $f(\mathbf{z}_i, \boldsymbol{\beta})$.
When $\xi_i < 1$, the firm is not making the most of the inputs $\mathbf{z}_i$ given the technology embodied in the
production function $f(\mathbf{z}_i, \boldsymbol{\beta})$. Because the output is assumed to be strictly positive (that is, $q_i > 0$),
the degree of technical efficiency is assumed to be strictly positive (that is, $\xi_i > 0$).
Output is also assumed to be subject to random shocks, implying that
$$q_i = f(\mathbf{z}_i, \boldsymbol{\beta})\,\xi_i\exp(v_i)$$
Taking the natural log of both sides yields
$$\ln(q_i) = \ln\{f(\mathbf{z}_i, \boldsymbol{\beta})\} + \ln(\xi_i) + v_i$$
Assuming that there are $k$ inputs and that the production function is linear in logs, defining
$u_i = -\ln(\xi_i)$ yields
$$\ln(q_i) = \beta_0 + \sum_{j=1}^{k}\beta_j\ln(z_{ji}) + v_i - u_i \tag{1}$$
Because $u_i$ is subtracted from $\ln(q_i)$, restricting $u_i \ge 0$ implies that $0 < \xi_i \le 1$, as specified above.
Kumbhakar and Lovell (2000) provide a detailed version of the above derivation, and they show
that performing an analogous derivation in the dual cost function problem allows us to specify the
problem as
$$\ln(c_i) = \beta_0 + \beta_q\ln(q_i) + \sum_{j=1}^{k}\beta_j\ln(p_{ji}) + v_i + u_i \tag{2}$$
where $q_i$ is output, $z_{ji}$ are input quantities, $c_i$ is cost, and the $p_{ji}$ are input prices.
Intuitively, the inefficiency effect is required to lower output or raise expenditure, depending on the
specification.
Technical note
The model that frontier actually fits is of the form
$$y_i = \beta_0 + \sum_{j=1}^{k}\beta_j x_{ji} + v_i - s\,u_i$$
where
$$s = \begin{cases} \phantom{-}1 & \text{for production functions} \\ -1 & \text{for cost functions} \end{cases}$$
so, in the context of the discussion above, $y_i = \ln(q_i)$ and $x_{ji} = \ln(z_{ji})$ for a production function;
and for a cost function, $y_i = \ln(c_i)$, and the $x_{ji}$ are the $\ln(p_{ji})$ and $\ln(q_i)$. You must take the
natural logarithm of the data before fitting a stochastic frontier production or cost model. frontier
performs no transformations on the data.
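For example, a minimal sketch in do-file form, with hypothetical variable names, of preparing production data before calling frontier:

* frontier expects the data already in logs
generate double lnq = ln(output)
generate double lnk = ln(capital)
generate double lnl = ln(labor)
frontier lnq lnk lnl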
Different specifications of the $u_i$ and the $v_i$ terms give rise to distinct models. frontier provides
estimators for the parameters of three basic models in which the idiosyncratic component, $v_i$, is
assumed to be independently $N(0, \sigma_v^2)$ distributed over the observations. The basic models differ in
their specification of the inefficiency term, $u_i$, as follows:
exponential: the $u_i$ are independently exponentially distributed with variance $\sigma_u^2$
hnormal: the $u_i$ are independently half-normally $N^+(0, \sigma_u^2)$ distributed
tnormal: the $u_i$ are independently $N^+(\mu, \sigma_u^2)$ distributed with truncation point at 0
For half-normal or exponential distributions, frontier can fit models with heteroskedastic error
components, conditional on a set of covariates. For a truncated-normal distribution, frontier can
also fit a conditional mean model in which the mean is modeled as a linear function of a set of
covariates.
Example 1: The half-normal and the exponential models
For our first example, we demonstrate the half-normal and exponential models by reproducing a
study found in Greene (2003, 505), which uses data originally published in Zellner and Revankar (1969).
In this study of the transportation-equipment manufacturing industry, observations on value added,
capital, and labor are used to estimate a Cobb–Douglas production function. The variable lnv is the
log-transformed value added, lnk is the log-transformed capital, and lnl is the log-transformed labor.
OLS estimates are compared with those from stochastic frontier models using both the half-normal
and exponential distribution for the inefficiency term.
. use http://www.stata-press.com/data/r13/greene9
. regress lnv lnk lnl
Source SS df MS Number of obs = 25
F( 2, 22) = 397.54
Model 44.1727741 2 22.086387 Prob > F = 0.0000
Residual 1.22225984 22 .055557265 R-squared = 0.9731
Adj R-squared = 0.9706
Total 45.3950339 24 1.89145975 Root MSE = .23571
lnv Coef. Std. Err. t P>|t| [95% Conf. Interval]
lnk .2454281 .1068574 2.30 0.032 .0238193 .4670368
lnl .805183 .1263336 6.37 0.000 .5431831 1.067183
_cons 1.844416 .2335928 7.90 0.000 1.359974 2.328858
. frontier lnv lnk lnl
Iteration 0: log likelihood = 2.3357572
Iteration 1: log likelihood = 2.4673009
Iteration 2: log likelihood = 2.4695125
Iteration 3: log likelihood = 2.4695222
Iteration 4: log likelihood = 2.4695222
Stoc. frontier normal/half-normal model Number of obs = 25
Wald chi2(2) = 743.71
Log likelihood = 2.4695222 Prob > chi2 = 0.0000
lnv Coef. Std. Err. z P>|z| [95% Conf. Interval]
lnk .2585478 .098764 2.62 0.009 .0649738 .4521218
lnl .7802451 .1199399 6.51 0.000 .5451672 1.015323
_cons 2.081135 .281641 7.39 0.000 1.529128 2.633141
/lnsig2v -3.48401 .6195353 -5.62 0.000 -4.698277 -2.269743
/lnsig2u -3.014599 1.11694 -2.70 0.007 -5.203761 -.8254368
sigma_v .1751688 .0542616 .0954514 .3214633
sigma_u .2215073 .1237052 .074134 .6618486
sigma2 .0797496 .0426989 -.0039388 .163438
lambda 1.264536 .1678684 .9355204 1.593552
Likelihood-ratio test of sigma_u=0: chibar2(01) = 0.43 Prob>=chibar2 = 0.256
. predict double u_h, u
. frontier lnv lnk lnl, distribution(exponential)
Iteration 0: log likelihood = 2.7270659
Iteration 1: log likelihood = 2.8551532
Iteration 2: log likelihood = 2.8604815
Iteration 3: log likelihood = 2.8604897
Iteration 4: log likelihood = 2.8604897
Stoc. frontier normal/exponential model Number of obs = 25
Wald chi2(2) = 845.68
Log likelihood = 2.8604897 Prob > chi2 = 0.0000
lnv Coef. Std. Err. z P>|z| [95% Conf. Interval]
lnk .2624859 .0919988 2.85 0.004 .0821717 .4428002
lnl .7703795 .1109569 6.94 0.000 .5529079 .9878511
_cons 2.069242 .2356159 8.78 0.000 1.607444 2.531041
/lnsig2v -3.527598 .4486176 -7.86 0.000 -4.406873 -2.648324
/lnsig2u -4.002457 .9274575 -4.32 0.000 -5.820241 -2.184674
sigma_v .1713925 .0384448 .1104231 .2660258
sigma_u .1351691 .0626818 .0544692 .3354317
sigma2 .0476461 .0157921 .016694 .0785981
lambda .7886525 .087684 .616795 .9605101
Likelihood-ratio test of sigma_u=0: chibar2(01) = 1.21 Prob>=chibar2 = 0.135
. predict double u_e, u
. list state u_h u_e
state u_h u_e
1. Alabama .2011338 .14592865
2. California .14480966 .0972165
3. Connecticut .1903485 .13478797
4. Florida .51753139 .5903303
5. Georgia .10397912 .07140994
6. Illinois .12126696 .0830415
7. Indiana .21128212 .15450664
8. Iowa .24933153 .20073081
9. Kansas .10099517 .06857629
10. Kentucky .05626919 .04152443
11. Louisiana .20332731 .15066405
12. Maine .22263164 .17245793
13. Maryland .13534062 .09245501
14. Massachusetts .15636999 .10932923
15. Michigan .15809566 .10756915
16. Missouri .10288047 .0704146
17. NewJersey .09584337 .06587986
18. NewYork .27787793 .22249416
19. Ohio .22914231 .16981857
20. Pennsylvania .1500667 .10302905
21. Texas .20297875 .14552218
22. Virginia .14000132 .09676078
23. Washington .11047581 .07533251
24. WestVirginia .15561392 .11236153
25. Wisconsin .14067066 .0970861
The parameter estimates and the estimates of the inefficiency terms closely match those published in
Greene (2003, 505), but the standard errors of the parameter estimates are estimated differently (see
the technical note below).
The output from frontier includes estimates of the standard deviations of the two error components,
$\sigma_v$ and $\sigma_u$, which are labeled sigma_v and sigma_u, respectively. In the log likelihood, they are
parameterized as $\ln\sigma_v^2$ and $\ln\sigma_u^2$, and these estimates are labeled /lnsig2v and /lnsig2u in the
output. frontier also reports two other useful parameterizations. The estimate of the total error
variance, $\sigma_S^2 = \sigma_v^2 + \sigma_u^2$, is labeled sigma2, and the estimate of the ratio of the standard deviation
of the inefficiency component to the standard deviation of the idiosyncratic component, $\lambda = \sigma_u/\sigma_v$,
is labeled lambda.
At the bottom of the output, frontier reports the results of a test that there is no technical
inefficiency component in the model. This is a test of the null hypothesis $H_0\colon \sigma_u^2 = 0$ against
the alternative hypothesis $H_1\colon \sigma_u^2 > 0$. If the null hypothesis is true, the stochastic frontier model
reduces to an OLS model with normal errors. However, because the test lies on the boundary of the
parameter space of $\sigma_u^2$, the standard likelihood-ratio test is not valid, and a one-sided generalized
likelihood-ratio test must be constructed; see Gutierrez, Carter, and Drukker (2001). For this example,
the output shows LR = 0.43 with a p-value of 0.256 for the half-normal model and LR = 1.21 with
a p-value of 0.135 for the exponential model. There are several possible reasons for the failure to
reject the null hypothesis, but the fact that the test is based on an asymptotic distribution and the
sample size was 25 is certainly a leading candidate among those possibilities.
Technical note
frontier maximizes the log-likelihood function of a stochastic frontier model by using the
Newton–Raphson method, and the estimated variance–covariance matrix is calculated as the inverse
of the negative Hessian (matrix of second partial derivatives); see [R] ml. When comparing the results
with those published using other software, be aware of the difference in the optimization methods,
which may result in different, yet asymptotically equivalent, variance estimates.
Example 2: Models with heteroskedasticity
Often the error terms may not have constant variance. frontier allows you to model heteroskedasticity
in either error term as a linear function of a set of covariates. The variance of either the technical
inefficiency or the idiosyncratic component may be modeled as
$$\sigma_i^2 = \exp(\mathbf{w}_i\boldsymbol{\delta})$$
The default constant included in $\mathbf{w}_i$ may be suppressed by appending a noconstant option to the
list of covariates. Also, you can simultaneously specify covariates for both $\sigma_{u_i}$ and $\sigma_{v_i}$, as sketched
below.
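A minimal sketch in do-file form (reusing the size variable from the example below purely as an illustration) of modeling heteroskedasticity in both components at once:

* half-normal model with both variance functions depending on firm size
frontier lnoutput lnlabor lncapital, uhet(size) vhet(size)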
In this example, we use a sample of 756 observations of fictional firms producing a manufactured
good by using capital and labor. The firms are hypothesized to use a constant returns-to-scale technology,
but the sizes of the firms differ. Believing that this size variation will introduce heteroskedasticity
into the idiosyncratic error term, we estimate the parameters of a Cobb–Douglas production function.
To do this, we use a conditional heteroskedastic half-normal model, with the size of the firm as an
explanatory variable in the variance function for the idiosyncratic error. We also perform a test of the
hypothesis that the firms use a constant returns-to-scale technology.
. use http://www.stata-press.com/data/r13/frontier1, clear
. frontier lnoutput lnlabor lncapital, vhet(size)
Iteration 0: log likelihood = -1508.3692
Iteration 1: log likelihood = -1501.583
Iteration 2: log likelihood = -1500.3942
Iteration 3: log likelihood = -1500.3794
Iteration 4: log likelihood = -1500.3794
Stoc. frontier normal/half-normal model Number of obs = 756
Wald chi2(2) = 9.68
Log likelihood = -1500.3794 Prob > chi2 = 0.0079
lnoutput Coef. Std. Err. z P>|z| [95% Conf. Interval]
lnoutput
lnlabor .7090933 .2349374 3.02 0.003 .2486244 1.169562
lncapital .3931345 .5422173 0.73 0.468 -.6695919 1.455861
_cons 1.252199 3.14656 0.40 0.691 -4.914946 7.419344
lnsig2v
size -.0016951 .0004748 -3.57 0.000 -.0026256 -.0007645
_cons 3.156091 .9265826 3.41 0.001 1.340023 4.97216
lnsig2u
_cons 1.947487 .1017653 19.14 0.000 1.748031 2.146943
sigma_u 2.647838 .134729 2.396514 2.925518
. test _b[lnlabor] + _b[lncapital] = 1
( 1) [lnoutput]lnlabor + [lnoutput]lncapital = 1
chi2( 1) = 0.03
Prob > chi2 = 0.8622
The output above indicates that the variance of the idiosyncratic error term is a function of firm size.
Also, we failed to reject the hypothesis that the firms use a constant returns-to-scale technology.
Technical note
In small samples, the conditional heteroskedastic estimators will lack precision for the variance
parameters and may fail to converge altogether.
Example 3: The truncated-normal model
Let’s turn our attention to the truncated-normal model. Once again, we will use fictional data. For
this example, we have 1,231 observations on the quantity of output, the total cost of production for
each firm, the prices that each firm paid for labor and capital services, and a categorical variable
measuring the quality of each firm's management. After taking the natural logarithm of the costs
(lncost), prices (lnp_k and lnp_l), and output (lnout), we fit a stochastic cost frontier model
and specify the distribution for the inefficiency term to be truncated normal.
. use http://www.stata-press.com/data/r13/frontier2
. frontier lncost lnp_k lnp_l lnout, distribution(tnormal) cost
Iteration 0: log likelihood = -2386.9523
Iteration 1: log likelihood = -2386.5146
Iteration 2: log likelihood = -2386.2704
Iteration 3: log likelihood = -2386.2504
Iteration 4: log likelihood = -2386.2493
Iteration 5: log likelihood = -2386.2493
Stoc. frontier normal/truncated-normal model Number of obs = 1231
Wald chi2(3) = 8.82
Log likelihood = -2386.2493 Prob > chi2 = 0.0318
lncost Coef. Std. Err. z P>|z| [95% Conf. Interval]
lnp_k .3410717 .2363861 1.44 0.149 -.1222366 .80438
lnp_l .6608628 .4951499 1.33 0.182 -.3096131 1.631339
lnout .7528653 .3468968 2.17 0.030 .0729601 1.432771
_cons 2.602609 1.083004 2.40 0.016 .4799595 4.725259
/mu 1.095705 .881517 1.24 0.214 -.632037 2.823446
/lnsigma2 1.5534 .1873464 8.29 0.000 1.186208 1.920592
/ilgtgamma 1.257862 .2589522 4.86 0.000 .7503255 1.765399
sigma2 4.727518 .8856833 3.274641 6.825001
gamma .7786579 .0446303 .6792496 .8538846
sigma_u2 3.681119 .7503408 2.210478 5.15176
sigma_v2 1.046399 .2660035 .5250413 1.567756
H0: No inefficiency component: z = 5.595 Prob>=z = 0.000
In addition to the coefficients, the output reports estimates for several parameters. sigma_v2 is the
estimate of $\sigma_v^2$. sigma_u2 is the estimate of $\sigma_u^2$. gamma is the estimate of $\gamma = \sigma_u^2/\sigma_S^2$. sigma2 is the
estimate of $\sigma_S^2 = \sigma_v^2 + \sigma_u^2$. Because $\gamma$ must be between 0 and 1, the optimization is parameterized
in terms of the inverse logit of $\gamma$, and this estimate is reported as ilgtgamma. Because $\sigma_S^2$ must
be positive, the optimization is parameterized in terms of $\ln(\sigma_S^2)$, whose estimate is reported as
lnsigma2. Finally, mu is the estimate of $\mu$, the mean of the truncated-normal distribution.
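As a quick check, gamma and sigma2 can be recovered from the reported transforms; a sketch in do-file form, assuming the ancillary equations are named ilgtgamma and lnsigma2 as in the output above:

* back out gamma and sigma2 from their transformed parameters
display invlogit([ilgtgamma]_b[_cons])    // gamma
display exp([lnsigma2]_b[_cons])          // sigma2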
In the output above, the generalized log-likelihood test for the presence of the inefficiency term
has been replaced with a test based on the third moment of the OLS residuals. When $\mu = 0$ and
$\sigma_u = 0$, the truncated-normal model reduces to a linear regression model with normally distributed
errors. However, the distribution of the test statistic under the null hypothesis is not well established,
because it becomes impossible to evaluate the log likelihood as $\sigma_u$ approaches zero, prohibiting the
use of the likelihood-ratio test.
However, Coelli (1995) noted that the presence of an inefficiency term would negatively skew the
residuals from an OLS regression. By identifying negative skewness in the residuals with the presence
of an inefficiency term, Coelli derived a one-sided test for the presence of the inefficiency term. The
results of this test are given at the bottom of the output. For this example, the null hypothesis of no
inefficiency component is rejected.
In the example below, we fit a truncated model and detect a statistically significant inefficiency
term in the model. We might question whether the inefficiency term is identically distributed over
all firms or whether there might be heterogeneity across firms. frontier provides an extension
to the truncated normal model by allowing the mean of the inefficiency term to be modeled as a
linear function of a set of covariates. In our dataset, we have a categorical variable that measures the
quality of a firm’s management. We refit the model, including the cm() option, specifying a set of
binary indicator variables representing the different categories of the quality-measurement variable as
covariates.
. frontier lncost lnp_k lnp_l lnout, distribution(tnormal) cm(i.quality) cost
Iteration 0: log likelihood = -2386.9523
Iteration 1: log likelihood = -2384.936
Iteration 2: log likelihood = -2382.3942
Iteration 3: log likelihood = -2382.324
Iteration 4: log likelihood = -2382.3233
Iteration 5: log likelihood = -2382.3233
Stoc. frontier normal/truncated-normal model Number of obs = 1231
Wald chi2(3) = 9.31
Log likelihood = -2382.3233 Prob > chi2 = 0.0254
lncost Coef. Std. Err. z P>|z| [95% Conf. Interval]
lncost
lnp_k .3611204 .2359749 1.53 0.126 -.1013819 .8236227
lnp_l .680446 .4934935 1.38 0.168 -.2867835 1.647675
lnout .7605533 .3466102 2.19 0.028 .0812098 1.439897
_cons 2.550769 1.078911 2.36 0.018 .4361417 4.665396
mu
quality
2 .5056067 .3382907 1.49 0.135 -.1574309 1.168644
3 .783223 .376807 2.08 0.038 .0446947 1.521751
4 .5577511 .3355061 1.66 0.096 -.0998288 1.215331
5 .6792882 .3428073 1.98 0.048 .0073981 1.351178
_cons .6014025 .990167 0.61 0.544 -1.339289 2.542094
/lnsigma2 1.541784 .1790926 8.61 0.000 1.190769 1.892799
/ilgtgamma 1.242302 .2588968 4.80 0.000 .734874 1.749731
sigma2 4.67292 .8368852 3.289611 6.637923
gamma .7759645 .0450075 .6758739 .8519189
sigma_u2 3.62602 .7139576 2.226689 5.025351
sigma_v2 1.0469 .2583469 .5405491 1.553251
The conditional mean model was developed in the context of panel-data estimators, and we can
apply frontier's conditional mean model to panel data.
Stored results
frontier stores the following in e():
Scalars
  e(N)               number of observations
  e(df_m)            model degrees of freedom
  e(k)               number of parameters
  e(k_eq)            number of equations in e(b)
  e(k_eq_model)      number of equations in overall model test
  e(k_dv)            number of dependent variables
  e(chi2)            χ²
  e(ll)              log likelihood
  e(ll_c)            log likelihood for H0: σ_u = 0
  e(z)               test for negative skewness of OLS residuals
  e(sigma_u)         standard deviation of technical inefficiency
  e(sigma_v)         standard deviation of v_i
  e(p)               significance
  e(chi2_c)          LR test statistic
  e(p_z)             p-value for z
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise
Macros
  e(cmd)             frontier
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(function)        production or cost
  e(wtype)           weight type
  e(wexp)            weight expression
  e(title)           title in estimation output
  e(chi2type)        Wald; type of model χ² test
  e(dist)            distribution assumption for u_i
  e(het)             heteroskedastic components
  e(u_hetvar)        varlist in uhet()
  e(v_hetvar)        varlist in vhet()
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(opt)             type of optimization
  e(which)           max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)       type of ml method
  e(user)            name of likelihood-evaluator program
  e(technique)       maximization technique
  e(properties)      b V
  e(predict)         program used to implement predict
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Consider an equation of the form

    y_i = x_i\beta + v_i - s u_i

where y_i is the dependent variable, x_i is a 1 \times k vector of observations on the independent variables
included as covariates, \beta is a k \times 1 vector of coefficients, and

    s = \begin{cases} \phantom{-}1 & \text{for production functions} \\ -1 & \text{for cost functions} \end{cases}
The log-likelihood functions are as follows.
Normal/half-normal model:

    \ln L = \sum_{i=1}^{N} \left\{ \frac{1}{2}\ln\frac{2}{\pi} - \ln\sigma_S + \ln\Phi\left(\frac{-s\epsilon_i\lambda}{\sigma_S}\right) - \frac{\epsilon_i^2}{2\sigma_S^2} \right\}

Normal/exponential model:

    \ln L = \sum_{i=1}^{N} \left\{ -\ln\sigma_u + \frac{\sigma_v^2}{2\sigma_u^2} + \ln\Phi\left(\frac{-s\epsilon_i - \sigma_v^2/\sigma_u}{\sigma_v}\right) + \frac{s\epsilon_i}{\sigma_u} \right\}

Normal/truncated-normal model:

    \ln L = \sum_{i=1}^{N} \left[ -\frac{1}{2}\ln(2\pi) - \ln\sigma_S - \ln\Phi\left(\frac{\mu}{\sigma_S\sqrt{\gamma}}\right) + \ln\Phi\left\{\frac{(1-\gamma)\mu - s\gamma\epsilon_i}{\{\sigma_S^2\,\gamma(1-\gamma)\}^{1/2}}\right\} - \frac{1}{2}\left(\frac{\epsilon_i + s\mu}{\sigma_S}\right)^2 \right]

where \sigma_S = (\sigma_u^2 + \sigma_v^2)^{1/2}, \lambda = \sigma_u/\sigma_v, \gamma = \sigma_u^2/\sigma_S^2, \epsilon_i = y_i - x_i\beta, and \Phi(\cdot) is the cumulative
distribution function of the standard normal distribution.
To obtain estimates of u_i, you can use either the mean or the mode of the conditional distribution
f(u_i|\epsilon_i):

    E(u_i \mid \epsilon_i) = \mu_i^* + \sigma_* \frac{\phi(-\mu_i^*/\sigma_*)}{\Phi(\mu_i^*/\sigma_*)}

    M(u_i \mid \epsilon_i) = \begin{cases} \mu_i^* & \text{if } \mu_i^* \geq 0 \\ 0 & \text{otherwise} \end{cases}

Then the technical efficiency (s = 1) or cost efficiency (s = -1) will be estimated by

    E_i = E\{\exp(-s u_i) \mid \epsilon_i\} = \frac{1 - \Phi(\sigma_* - \mu_i^*/\sigma_*)}{1 - \Phi(-\mu_i^*/\sigma_*)} \exp\left(-s\mu_i^* + \frac{1}{2}\sigma_*^2\right)

where \mu_i^* and \sigma_* are defined for the normal/half-normal model as

    \mu_i^* = \frac{-s\epsilon_i\sigma_u^2}{\sigma_S^2}, \qquad \sigma_* = \frac{\sigma_u\sigma_v}{\sigma_S}

for the normal/exponential model as

    \mu_i^* = -s\epsilon_i - \frac{\sigma_v^2}{\sigma_u}, \qquad \sigma_* = \sigma_v

and for the normal/truncated-normal model as

    \mu_i^* = \frac{-s\epsilon_i\sigma_u^2 + \mu\sigma_v^2}{\sigma_S^2}, \qquad \sigma_* = \frac{\sigma_u\sigma_v}{\sigma_S}
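In practice, these conditional estimates are obtained with the u, m, and te options of predict after frontier (see [R] frontier postestimation). A minimal sketch, with arbitrary new-variable names:

. predict double u_hat, u
. predict double m_hat, m
. predict double te_hat, te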
In the half-normal and exponential models, when heteroskedasticity is assumed, the standard
deviations, \sigma_u or \sigma_v, will be replaced in the above equations by

    \sigma_i^2 = \exp(w_i\delta)

where w_i is the vector of explanatory variables in the variance function.

In the conditional mean model, the mean parameter of the truncated normal distribution, \mu, is
modeled as a linear combination of the set of covariates, w:

    \mu_i = w_i\delta
Therefore, the log-likelihood function can be rewritten as

    \ln L = \sum_{i=1}^{N} \left[ -\frac{1}{2}\ln(2\pi) - \ln\sigma_S - \ln\Phi\left(\frac{w_i\delta}{\sqrt{\sigma_S^2\,\gamma}}\right) + \ln\Phi\left\{\frac{(1-\gamma)w_i\delta - s\gamma\epsilon_i}{\sqrt{\sigma_S^2\,\gamma(1-\gamma)}}\right\} - \frac{1}{2}\left(\frac{\epsilon_i + s w_i\delta}{\sigma_S}\right)^2 \right]
The z test reported in the output of the truncated-normal model is a third-moment test developed by
Coelli (1995) as an extension of a test previously developed by Pagan and Hall (1983). Coelli shows
that under the null of normally distributed errors, the statistic

    z = \frac{m_3}{\left(6 m_2^3 / N\right)^{1/2}}

has a standard normal distribution, where m_3 is the third moment and m_2 is the second moment of the
OLS residuals. Because the residuals are either negatively skewed (production function) or positively
skewed (cost function), a one-sided p-value is used.
References
Aigner, D. J., C. A. K. Lovell, and P. Schmidt. 1977. Formulation and estimation of stochastic frontier production
function models. Journal of Econometrics 6: 21–37.
Belotti, F., S. Daidone, G. Ilardi, and V. Atella. 2013. Stochastic frontier analysis using Stata. Stata Journal 13:
719–758.
Caudill, S. B., J. M. Ford, and D. M. Gropper. 1995. Frontier estimation and firm-specific inefficiency measures in
the presence of heteroscedasticity. Journal of Business and Economic Statistics 13: 105–111.
Coelli, T. J. 1995. Estimators and hypothesis tests for a stochastic frontier function: A Monte Carlo analysis. Journal
of Productivity Analysis 6: 247–268.
Gould, W. W., J. S. Pitblado, and B. P. Poi. 2010. Maximum Likelihood Estimation with Stata. 4th ed. College
Station, TX: Stata Press.
Greene, W. H. 2003. Econometric Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall.
Gutierrez, R. G., S. L. Carter, and D. M. Drukker. 2001. sg160: On boundary-value likelihood-ratio tests. Stata
Technical Bulletin 60: 15–18. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 269–273. College Station,
TX: Stata Press.
Kumbhakar, S. C., and C. A. K. Lovell. 2000. Stochastic Frontier Analysis. Cambridge: Cambridge University Press.
Meeusen, W., and J. van den Broeck. 1977. Efficiency estimation from Cobb–Douglas production functions with
composed error. International Economic Review 18: 435–444.
Pagan, A. R., and A. D. Hall. 1983. Diagnostic tests as residual analysis. Econometric Reviews 2: 159–218.
Petrin, A. K., B. P. Poi, and J. A. Levinsohn. 2004. Production function estimation in Stata using inputs to control
for unobservables. Stata Journal 4: 113–123.
Stevenson, R. E. 1980. Likelihood functions for generalized stochastic frontier estimation. Journal of Econometrics
13: 57–66.
Tauchmann, H. 2012. Partial frontier efficiency analysis. Stata Journal 12: 461–478.
Zellner, A., and N. S. Revankar. 1969. Generalized production functions. Review of Economic Studies 36: 241–250.
Also see
[R] frontier postestimation — Postestimation tools for frontier
[R] regress — Linear regression
[XT] xtfrontier — Stochastic frontier models for panel data
[U] 20 Estimation and postestimation commands
Title
frontier postestimation — Postestimation tools for frontier
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Reference Also see
Description
The following postestimation commands are available after frontier:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Syntax for predict
predict [type] newvar [if] [in] [, statistic]

predict [type] {stub* | newvar_xb newvar_v newvar_u} [if] [in], scores
statistic Description
Main
xb linear prediction; the default
stdp standard error of the prediction
u       estimates of minus the natural log of the technical efficiency via E(u_i|ε_i)
m       estimates of minus the natural log of the technical efficiency via M(u_i|ε_i)
te      estimates of the technical efficiency via E{exp(−s u_i)|ε_i}
where s = 1 for production functions and s = −1 for cost functions.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
u produces estimates of minus the natural log of the technical efficiency via E(u_i|ε_i).
m produces estimates of minus the natural log of the technical efficiency via M(u_i|ε_i).
te produces estimates of the technical efficiency via E{exp(−s u_i)|ε_i}.
scores calculates equation-level score variables.
The first new variable will contain ∂ln L/∂(x_iβ).
The second new variable will contain ∂ln L/∂(lnsig2v).
The third new variable will contain ∂ln L/∂(lnsig2u).
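For example, the three equation-level scores might be generated and inspected as follows (a sketch; the new-variable names are arbitrary):

. predict double sc_xb sc_lns2v sc_lns2u, scores
. summarize sc_xb sc_lns2v sc_lns2u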
Remarks and examples
Example 1
In example 2 of [R] frontier, we modeled heteroskedasticity by specifying the vhet() option. We
would like to compare the predicted efficiency from that model with the predictions from a specification
that does not account for heteroskedasticity in the error term. Kumbhakar and Lovell (2000,
117) show that failing to account for heteroskedasticity associated with firm size may lead to bias in
the estimation of technical efficiency. By incorrectly assuming homoskedasticity, the estimates for
relatively small firms would be biased upward, while the estimates for relatively large firms would
be biased downward. Let's refit the model and use the te option of predict:
. use http://www.stata-press.com/data/r13/frontier1
. frontier lnoutput lnlabor lncapital, vhet(size)
(output omitted )
. predict te_vhet, te
Next we fit the model assuming homoskedasticity and then again predict the technical efficiency
with the te option of predict:
. frontier lnoutput lnlabor lncapital
(output omitted )
. predict te, te
The graph below shows the estimates of technical efficiency for the smaller and larger firms.
Technical efficiency tends to be smaller for smaller firms when the model specification accounts for
heteroskedasticity, whereas the predicted technical efficiency tends to be smaller for larger firms
when homoskedasticity is assumed. These results agree with the theoretical statement in
Kumbhakar and Lovell (2000) because firm size was indeed relevant for modeling heteroskedasticity
in the idiosyncratic component of the error term.
[Figure omitted: two scatterplots of predicted technical efficiency (0 to .8) against firm size (1500 to 2500). Left panel, "Modeling heteroskedasticity", plots te_vhet; right panel, "Assuming homoskedasticity", plots te. Caption: Predicted technical efficiency for smaller and larger firms.]
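A graph along these lines can be drawn with two scatterplots of the predictions against firm size; a sketch, assuming the firm-size variable is named size, as in the vhet(size) specification above:

. twoway scatter te_vhet size, xtitle("firm size") name(het, replace)
. twoway scatter te size, xtitle("firm size") name(homosk, replace)
. graph combine het homosk, title("Predicted technical efficiency for smaller and larger firms")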
Example 2
We also tested in example 2 of [R] frontier whether the firms use constant returns to scale. We can
use lincom as an alternative way to perform an equivalent test based on the normal distribution.
. use http://www.stata-press.com/data/r13/frontier1, clear
. frontier lnoutput lnlabor lncapital, vhet(size)
(output omitted )
. lincom _b[lnlabor] + _b[lncapital]-1
( 1) [lnoutput]lnlabor + [lnoutput]lncapital = 1
lnoutput Coef. Std. Err. z P>|z| [95% Conf. Interval]
(1) .1022278 .5888511 0.17 0.862 -1.051899 1.256355
The p-value is exactly the same as the one we obtained with the test command in example 2 of
[R] frontier. However, notice that by using lincom, we obtained an estimate of the deviation from
the constant returns-to-scale assumption, which is not significantly different from zero in this case.
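For reference, the Wald version of the same hypothesis could also be requested directly with test; a sketch of the equivalent command:

. test _b[lnlabor] + _b[lncapital] = 1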
Reference
Kumbhakar, S. C., and C. A. K. Lovell. 2000. Stochastic Frontier Analysis. Cambridge: Cambridge University Press.
Also see
[R] frontier — Stochastic frontier models
[U] 20 Estimation and postestimation commands
Title
fvrevar — Factor-variables operator programming command
Syntax Description Options Remarks and examples
Stored results Also see
Syntax
fvrevar varlist [if] [in] [, substitute tsonly list stub(stub)]
You must tsset your data before using fvrevar if varlist contains time-series operators; see [TS] tsset.
Description
fvrevar creates an equivalent, temporary variable list for a varlist that might contain factor
variables, interactions, or time-series–operated variables so that the resulting variable list can be used
by commands that do not otherwise support factor variables or time-series–operated variables. The
resulting list also could be used in a program to speed execution at the cost of using more memory.
Options
substitute specifies that equivalent, temporary variables be substituted for any factor variables,
interactions, or time-series–operated variables in varlist. substitute is the default action taken
by fvrevar; you do not need to specify the option.
tsonly specifies that equivalent, temporary variables be substituted for only the time-series–operated
variables in varlist.
list specifies that all factor-variable operators and time-series operators be removed from varlist
and the resulting list of base variables be returned in r(varlist). No new variables are created
with this option.
stub(stub) specifies that fvrevar generate named variables instead of temporary variables. The
new variables will be named stub#.
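A brief sketch of stub(); the generated names follow the stub# pattern described above, so the names shown here are only illustrative:

. fvrevar i.rep78, stub(rep_)
. describe rep_*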
Remarks and examples
fvrevar might create no new variables, one new variable, or many new variables, depending on
the number of factor variables, interactions, and time-series operators appearing in varlist. Any new
variables created are temporary. The new, equivalent varlist is returned in r(varlist). The new
varlist corresponds one to one with the original varlist.
Example 1
Typing
. use http://www.stata-press.com/data/r13/auto2
. fvrevar i.rep78 mpg turn
creates five temporary variables corresponding to the levels of rep78. No new variables are created
for variables mpg and turn because they do not contain factor-variable or time-series operators.
The resulting variable list is
. display "`r(varlist)'"
__000000 __000001 __000002 __000003 __000004 mpg turn
(Your temporary variable names may be different, but that is of no consequence.)
Temporary variables automatically vanish when the program concludes.
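A typical use is inside a program whose worker commands do not themselves accept factor variables; a minimal sketch (the program name and its contents are illustrative only):

program define mysumm
    syntax varlist(fv)
    fvrevar `varlist'
    summarize `r(varlist)'
end

. mysumm i.rep78 mpg turn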
Example 2
Suppose we want to create temporary variables for specific levels of a factor variable. To do this,
we can use the parenthesis notation of factor-variable syntax.
. fvrevar i(2,3)bn.rep78 mpg
creates two temporary variables corresponding to levels 2 and 3 of rep78. Notice that we specified
that neither level 2 nor 3 be set as the base level by using the bn notation. If we did not specify bn,
level 2 would have been treated as the base level.
The resulting variable list is
. display "`r(varlist)'"
__000005 __000002 mpg
We can see the results by listing the new variables alongside the original value of rep78.
. list rep78 `r(varlist)' in 1/5
rep78 __000005 __000002 mpg
1. Average 0 1 22
2. Average 0 1 17
3. . . . 22
4. Average 0 1 20
5. Good 0 0 15
If we had needed only the base-variable names, we could have specified
. fvrevar i(2,3)bn.rep78 mpg, list
. display "`r(varlist)'"
mpg rep78
The order of the list will probably differ from that of the original list; base variables are listed only
once.
Example 3
Now let’s assume we have a varlist containing both an interaction and time-series–operated variables.
If we want to create temporary variables for the entire equivalent varlist, we can specify fvrevar
with no options.
. generate t = _n
. tsset t
time variable: t, 1 to 74
delta: 1 unit
. fvrevar c.turn#i(2,3).rep78 L.mpg
The resulting variable list is
. display "`r(varlist)'"
__000006 __000007 __000008
If we want to create temporary variables only for the time-series–operated variables, we can specify
the tsonly option.
. fvrevar c.turn#i(2,3).rep78 L.mpg, tsonly
The resulting variable list is
. display "`r(varlist)'"
c.turn#2b.rep78 c.turn#3.rep78 __000008
Notice that fvrevar returned the expanded factor-variable list with the tsonly option.
Technical note
fvrevar, substitute avoids creating duplicate variables. Consider
. fvrevar i.rep78 turn mpg i.rep78
i.rep78 appears twice in the varlist. fvrevar will create only one set of new variables for the
five levels of rep78 and will use these new variables once in the resulting r(varlist). Moreover,
fvrevar will do this even across multiple calls:
. fvrevar i.rep78 turn mpg
. fvrevar i.rep78
i.rep78 appears in two separate calls. At the first call, fvrevar creates five temporary variables
corresponding to the five levels of rep78. At the second call, fvrevar remembers what it has done
and uses the same temporary variables for i.rep78.
Stored results
fvrevar stores the following in r():
Macros
r(varlist) the modified variable list or list of base-variable names
Also see
[TS] tsrevar — Time-series operator programming command
[P] syntax — Parse Stata syntax
[P] unab — Unabbreviate variable list
[U] 11 Language syntax
[U] 11.4.4 Time-series varlists
[U] 18 Programming Stata
Title
fvset — Declare factor-variable settings
Syntax Description Options Remarks and examples Stored results
Syntax
Declare base settings
    fvset base base_spec varlist

Declare design settings
    fvset design design_spec varlist

Clear the current settings
    fvset clear varlist

Report the current settings
    fvset report [varlist] [, base(base_spec) design(design_spec)]
base_spec     Description
default       default base
first         lowest level value; the default
last          highest level value
frequent      most frequent level value
none          no base
#             nonnegative integer value

design_spec   Description
default       default design
asbalanced    accumulate using 1/k, k = number of levels
asobserved    accumulate using observed relative frequencies; the default
Description
fvset declares factor-variable settings. Factor-variable settings identify the base level and how to
accumulate statistics over levels.
fvset base specifies the base level for each variable in varlist. The default for factor variables
without a declared base level is first.
fvset design specifies how to accumulate over the levels of a factor variable. The margins
command is the only command aware of this setting; see [R]margins. By default, margins assumes
that factor variables are asobserved, meaning that they are accumulated by weighting by the number
of observations or the sum of the weights if weights have been specified.
fvset clear removes factor-variable settings for each variable in varlist. fvset clear all
removes all factor-variable settings from all variables.
fvset report reports the current factor-variable settings for each variable in varlist. fvset
without arguments is a synonym for fvset report.
Options
base(base_spec) restricts fvset report to report only the factor-variable settings for variables with
the specified base_spec.
design(design_spec) restricts fvset report to report only the factor-variable settings for variables
with the specified design_spec.
Remarks and examples
Example 1
Using auto2.dta, we include factor variable i.rep78 in a regression:
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. regress mpg i.rep78, baselevels
Source SS df MS Number of obs = 69
F( 4, 64) = 4.91
Model 549.415777 4 137.353944 Prob > F = 0.0016
Residual 1790.78712 64 27.9810488 R-squared = 0.2348
Adj R-squared = 0.1869
Total 2340.2029 68 34.4147485 Root MSE = 5.2897
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
rep78
Poor 0 (base)
Fair -1.875 4.181884 -0.45 0.655 -10.22927 6.479274
Average -1.566667 3.863059 -0.41 0.686 -9.284014 6.150681
Good .6666667 3.942718 0.17 0.866 -7.209818 8.543152
Excellent 6.363636 4.066234 1.56 0.123 -1.759599 14.48687
_cons 21 3.740391 5.61 0.000 13.52771 28.47229
We specified the baselevels option so that the base level would be included in the output. By
default, the first level is the base level. We can change the base level to 2:
. fvset base 2 rep78
. regress mpg i.rep78, baselevels
Source SS df MS Number of obs = 69
F( 4, 64) = 4.91
Model 549.415777 4 137.353944 Prob > F = 0.0016
Residual 1790.78712 64 27.9810488 R-squared = 0.2348
Adj R-squared = 0.1869
Total 2340.2029 68 34.4147485 Root MSE = 5.2897
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
rep78
Poor 1.875 4.181884 0.45 0.655 -6.479274 10.22927
Fair 0 (base)
Average .3083333 2.104836 0.15 0.884 -3.896559 4.513226
Good 2.541667 2.247695 1.13 0.262 -1.948621 7.031954
Excellent 8.238636 2.457918 3.35 0.001 3.32838 13.14889
_cons 19.125 1.870195 10.23 0.000 15.38886 22.86114
Let’s set rep78 to have no base level and fit a cell-means regression:
. fvset base none rep78
. regress mpg i.rep78, noconstant
Source SS df MS Number of obs = 69
F( 5, 64) = 227.47
Model 31824.2129 5 6364.84258 Prob > F = 0.0000
Residual 1790.78712 64 27.9810488 R-squared = 0.9467
Adj R-squared = 0.9426
Total 33615 69 487.173913 Root MSE = 5.2897
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
rep78
Poor 21 3.740391 5.61 0.000 13.52771 28.47229
Fair 19.125 1.870195 10.23 0.000 15.38886 22.86114
Average 19.43333 .9657648 20.12 0.000 17.504 21.36267
Good 21.66667 1.246797 17.38 0.000 19.1759 24.15743
Excellent 27.36364 1.594908 17.16 0.000 24.17744 30.54983
Example 2
By default, margins accumulates a margin by using the observed relative frequencies of the factor
levels.
. regress mpg i.foreign
Source SS df MS Number of obs = 74
F( 1, 72) = 13.18
Model 378.153515 1 378.153515 Prob > F = 0.0005
Residual 2065.30594 72 28.6848048 R-squared = 0.1548
Adj R-squared = 0.1430
Total 2443.45946 73 33.4720474 Root MSE = 5.3558
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign
Foreign 4.945804 1.362162 3.63 0.001 2.230384 7.661225
_cons 19.82692 .7427186 26.70 0.000 18.34634 21.30751
. margins
Predictive margins Number of obs = 74
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 21.2973 .6226014 34.21 0.000 20.05616 22.53843
Let’s set foreign to always accumulate using equal relative frequencies:
. fvset design asbalanced foreign
. regress mpg i.foreign
Source SS df MS Number of obs = 74
F( 1, 72) = 13.18
Model 378.153515 1 378.153515 Prob > F = 0.0005
Residual 2065.30594 72 28.6848048 R-squared = 0.1548
Adj R-squared = 0.1430
Total 2443.45946 73 33.4720474 Root MSE = 5.3558
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign
Foreign 4.945804 1.362162 3.63 0.001 2.230384 7.661225
_cons 19.82692 .7427186 26.70 0.000 18.34634 21.30751
. margins
Adjusted predictions Number of obs = 74
Model VCE : OLS
Expression : Linear prediction, predict()
at : foreign (asbalanced)
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 22.29983 .6810811 32.74 0.000 20.94211 23.65754
Suppose that we issued the fvset design command earlier in our session and that we cannot
remember which variables we set as asbalanced. We can retrieve this information by using the
fvset report command:
. fvset report, design(asbalanced)
Variable Base Design
foreign asbalanced
Technical note
margins is aware of a factor variable’s design setting only through the estimation results it is
working with. The design setting is stored by the estimation command; thus changing the design
setting between the estimation command and margins will have no effect. For example, the output
from the following two calls to margins yields the same results:
. fvset clear foreign
. regress mpg i.foreign
Source SS df MS Number of obs = 74
F( 1, 72) = 13.18
Model 378.153515 1 378.153515 Prob > F = 0.0005
Residual 2065.30594 72 28.6848048 R-squared = 0.1548
Adj R-squared = 0.1430
Total 2443.45946 73 33.4720474 Root MSE = 5.3558
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign
Foreign 4.945804 1.362162 3.63 0.001 2.230384 7.661225
_cons 19.82692 .7427186 26.70 0.000 18.34634 21.30751
. margins
Predictive margins Number of obs = 74
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 21.2973 .6226014 34.21 0.000 20.05616 22.53843
. fvset design asbalanced foreign
. margins
Predictive margins Number of obs = 74
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 21.2973 .6226014 34.21 0.000 20.05616 22.53843
Stored results
fvset stores the following in r():
Macros
r(varlist) varlist
r(baselist) base setting for each variable in varlist
r(designlist) design setting for each variable in varlist
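For example, the stored macros can be inspected after a report (a quick sketch):

. fvset report foreign
. display "`r(varlist)'"
. display "`r(baselist)'"
. display "`r(designlist)'"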
Title
gllamm — Generalized linear and latent mixed models
Description Remarks and examples References Also see
Description
GLLAMM stands for generalized linear latent and mixed models, and gllamm is a Stata command
for fitting such models written by Sophia Rabe-Hesketh (University of California–Berkeley) as part of
joint work with Anders Skrondal (Norwegian Institute of Public Health) and Andrew Pickles (King’s
College London).
Remarks and examples
Generalized linear latent and mixed models are a class of multilevel latent variable models, where
a latent variable is a factor or a random effect (intercept or coefficient), or a disturbance (residual).
The gllamm command for fitting such models is not an official command of Stata; it has been
independently developed by highly regarded authors and is itself highly regarded. You can learn more
about gllamm by visiting http://www.gllamm.org.
gllamm is available from the Statistical Software Components (SSC) archive. To install, type
. ssc describe gllamm
. ssc install gllamm
If you later wish to uninstall gllamm, type ado uninstall gllamm.
References
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Rabe-Hesketh, S., and B. S. Everitt. 2007. A Handbook of Statistical Analyses Using Stata. 4th ed. Boca Raton, FL:
Chapman & Hall/CRC.
Rabe-Hesketh, S., A. Pickles, and C. Taylor. 2000. sg129: Generalized linear latent and mixed models. Stata Technical
Bulletin 53: 47–57. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 293–307. College Station, TX: Stata
Press.
Rabe-Hesketh, S., and A. Skrondal. 2012. Multilevel and Longitudinal Modeling Using Stata. 3rd ed. College Station,
TX: Stata Press.
Rabe-Hesketh, S., A. Skrondal, and A. Pickles. 2002. Reliable estimation of generalized linear mixed models using
adaptive quadrature. Stata Journal 2: 1–21.
———. 2003. Maximum likelihood estimation of generalized linear models with covariate measurement error. Stata
Journal 3: 386–411.
Skrondal, A., and S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and
Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC.
Zheng, X., and S. Rabe-Hesketh. 2007. Estimating parameters of dichotomous and ordinal item response models with
gllamm. Stata Journal 7: 313–333.
The references above are restricted to works by the primary authors of gllamm. There are many other
books and articles that use or discuss gllamm; see http://www.gllamm.org/pub.html for a list.
Also see
[ME] meglm — Multilevel mixed-effects generalized linear model
[ME] mixed — Multilevel mixed-effects linear regression
[SEM] intro 2 — Learning the language: Path diagrams and command language
[SEM] intro 5 — Tour of models
Title
glm — Generalized linear models
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
glm depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
family(familyname)distribution of depvar; default is family(gaussian)
link(linkname)link function; default is canonical link for family() specified
Model 2
noconstant suppress constant term
exposure(varname)include ln(varname) in model with coefficient constrained to 1
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
asis retain perfect predictor variables
mu(varname)use varname as the initial estimate for the mean of depvar
init(varname)synonym for mu(varname)
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,eim,opg,
bootstrap,jackknife,hac kernel,jackknife1, or unbiased
vfactor(#)multiply variance matrix by scalar #
disp(#)quasilikelihood multiplier
scale(x2 |dev |#)set the scale parameter
Reporting
level(#)set confidence level; default is level(95)
eform report exponentiated coefficients
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
ml use maximum likelihood optimization; the default
irls use iterated, reweighted least-squares optimization of the deviance
maximize options control the maximization process; seldom used
fisher(#)use the Fisher scoring Hessian or expected information matrix (EIM)
search search for good starting values
noheader suppress header table from above coefficient table
notable suppress coefficient table
nodisplay suppress the output; iteration log is still displayed
coeflegend display legend instead of statistics
familyname Description
gaussian Gaussian (normal)
igaussian inverse Gaussian
binomial [varname_N | #_N]   Bernoulli/binomial
poisson                      Poisson
nbinomial [#_k | ml]         negative binomial
gamma gamma
linkname Description
identity identity
log log
logit logit
probit probit
cloglog cloglog
power #                      power
opower #                     odds power
nbinomial negative binomial
loglog log-log
logc log-complement
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap), vce(jackknife), and vce(jackknife1) are not allowed with the mi estimate prefix; see
[MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce(), vfactor(), disp(), scale(), irls, fisher(), noheader, notable, nodisplay, and weights are not
allowed with the svy prefix; see [SVY] svy.
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
noheader, notable, nodisplay, and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Generalized linear models >Generalized linear models (GLM)
Description
glm fits generalized linear models. It can fit models by using either IRLS (maximum quasilikelihood)
or Newton–Raphson (maximum likelihood) optimization, which is the default.
See [U] 26 Overview of Stata estimation commands for a description of all of Stata’s estimation
commands, several of which fit models that can also be fit using glm.
Options
 
Model
family(familyname) specifies the distribution of depvar; family(gaussian) is the default.
link(linkname) specifies the link function; the default is the canonical link for the family()
specified (except for family(nbinomial)).
 
Model 2
noconstant, exposure(varname), offset(varname), constraints(constraints), collinear;
see [R] estimation options. constraints(constraints) and collinear are not allowed with
irls.
asis forces retention of perfect predictor variables and their associated, perfectly predicted observations
and may produce instabilities in maximization; see [R] probit. This option is allowed only with
option family(binomial) with a denominator of 1.
mu(varname) specifies varname as the initial estimate for the mean of depvar. This option can be
useful with models that experience convergence difficulties, such as family(binomial) models
with power or odds-power links. init(varname) is a synonym.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
In addition to the standard vcetypes, glm allows the following alternatives:
vce(eim) specifies that the EIM estimate of variance be used.
vce(jackknife1) specifies that the one-step jackknife estimate of variance be used.
vce(hac kernel #) specifies that a heteroskedasticity- and autocorrelation-consistent (HAC)
variance estimate be used. HAC refers to the general form for combining weighted matrices
to form the variance estimate. There are three kernels built into glm. kernel is a user-written
program or one of
    nwest | gallant | anderson
# specifies the number of lags. If # is not specified, N − 2 is assumed. If you wish to specify
vce(hac . . . ), you must tsset your data before calling glm.
vce(unbiased) specifies that the unbiased sandwich estimate of variance be used.
vfactor(#) specifies a scalar by which to multiply the resulting variance matrix. This option allows
you to match output with other packages, which may apply degrees of freedom or other small-sample
corrections to estimates of variance.
disp(#) multiplies the variance of depvar by # and divides the deviance by #. The resulting
distributions are members of the quasilikelihood family.
scale(x2 | dev | #) overrides the default scale parameter. This option is allowed only with Hessian
(information matrix) variance estimates.
By default, scale(1) is assumed for the discrete distributions (binomial, Poisson, and negative
binomial), and scale(x2) is assumed for the continuous distributions (Gaussian, gamma, and
inverse Gaussian).
scale(x2) specifies that the scale parameter be set to the Pearson chi-squared (or generalized chi-
squared) statistic divided by the residual degrees of freedom, which is recommended by McCullagh
and Nelder (1989) as a good general choice for continuous distributions.
scale(dev) sets the scale parameter to the deviance divided by the residual degrees of freedom.
This option provides an alternative to scale(x2) for continuous distributions and overdispersed
or underdispersed discrete distributions.
scale(#) sets the scale parameter to #. For example, using scale(1) in family(gamma)
models results in exponential-errors regression. Additional use of link(log) rather than the
default link(power -1) for family(gamma) essentially reproduces Stata's streg, dist(exp)
nohr command (see [ST] streg) if all the observations are uncensored.
 
Reporting
level(#); see [R] estimation options.
eform displays the exponentiated coefficients and corresponding standard errors and confidence
intervals. For family(binomial) link(logit) (that is, logistic regression), exponentiation
results are odds ratios; for family(nbinomial) link(log) (that is, negative binomial regression)
and for family(poisson) link(log) (that is, Poisson regression), exponentiated coefficients
are incidence-rate ratios.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
ml requests that optimization be carried out using Stata’s ml commands and is the default.
irls requests iterated, reweighted least-squares (IRLS) optimization of the deviance instead of
Newton–Raphson optimization of the log likelihood. If the irls option is not specified, the optimization
is carried out using Stata's ml commands, in which case all options of ml maximize are also
available.
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
fisher(#) specifies the number of Newton–Raphson steps that should use the Fisher scoring Hessian
or EIM before switching to the observed information matrix (OIM). This option is useful only for
Newton–Raphson optimization (and not when using irls).
search specifies that the command search for good starting values. This option is useful only for
Newton–Raphson optimization (and not when using irls).
The following options are available with glm but are not shown in the dialog box:
noheader suppresses the header information from the output. The coefficient table is still displayed.
notable suppresses the table of coefficients from the output. The header information is still displayed.
nodisplay suppresses the output. The iteration log is still displayed.
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
General use
Variance estimators
User-defined functions
General use
glm fits generalized linear models of y with covariates x:

    g{E(y)} = xβ,    y ~ F

g(·) is called the link function, and F is the distributional family. Substituting various definitions
for g(·) and F results in a surprising array of models. For instance, if y is distributed as Gaussian
(normal) and g(·) is the identity function, we have

    E(y) = xβ,    y ~ Normal

or linear regression. If g(·) is the logit function and y is distributed as Bernoulli, we have

    logit{E(y)} = xβ,    y ~ Bernoulli

or logistic regression. If g(·) is the natural log function and y is distributed as Poisson, we have

    ln{E(y)} = xβ,    y ~ Poisson

or Poisson regression, also known as the log-linear model. Other combinations are possible.
Although glm can be used to perform linear regression (and, in fact, does so by default), this
regression should be viewed as an instructional feature; regress produces such estimates more
quickly, and many postestimation commands are available to explore the adequacy of the fit; see
[R] regress and [R] regress postestimation.
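For instance, both of the following commands fit the same linear regression of mpg on weight (a quick sketch using the automobile data):

. use http://www.stata-press.com/data/r13/auto2, clear
. glm mpg weight, family(gaussian) link(identity)
. regress mpg weight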
In any case, you specify the link function by using the link() option and specify the distributional
family by using family(). The available link functions are
Link function glm option
identity link(identity)
log link(log)
logit link(logit)
probit link(probit)
complementary log-log link(cloglog)
odds power link(opower #)
power link(power #)
negative binomial link(nbinomial)
log-log link(loglog)
log-complement link(logc)
Define µ = E(y) and η = g(µ), meaning that g(·) maps E(y) to η = xβ + offset.
Link functions are defined as follows:
identity is defined as η = g(µ) = µ.
log is defined as η = ln(µ).
logit is defined as η = ln{µ/(1 − µ)}, the natural log of the odds.
probit is defined as η = Φ⁻¹(µ), where Φ⁻¹(·) is the inverse Gaussian cumulative.
cloglog is defined as η = ln{−ln(1 − µ)}.
opower is defined as η = [{µ/(1 − µ)}^n − 1]/n, the power of the odds. The function is
generalized so that link(opower 0) is equivalent to link(logit), the natural log of the odds.
power is defined as η = µ^n. Specifying link(power 1) is equivalent to specifying
link(identity). The power function is generalized so that µ^0 is ln(µ). Thus link(power
0) is equivalent to link(log). Negative powers are, of course, allowed.
nbinomial is defined as η = ln{µ/(µ + k)}, where k = 1 if family(nbinomial) is specified,
k = #_k if family(nbinomial #_k) is specified, and k is estimated via maximum likelihood if
family(nbinomial ml) is specified.
loglog is defined as η = −ln{−ln(µ)}.
logc is defined as η = ln(1 − µ).
The available distributional families are
Family glm option
Gaussian (normal) family(gaussian)
inverse Gaussian family(igaussian)
Bernoulli/binomial family(binomial)
Poisson family(poisson)
negative binomial family(nbinomial)
gamma family(gamma)
family(normal) is a synonym for family(gaussian).
The binomial distribution can be specified as 1) family(binomial), 2) family(binomial #_N),
or 3) family(binomial varname_N). In case 2, #_N is the value of the binomial denominator N, the
number of trials. Specifying family(binomial 1) is the same as specifying family(binomial).
In case 3, varname_N is the variable containing the binomial denominator, allowing the number of
trials to vary across observations.
The negative binomial distribution can be specified as 1) family(nbinomial), 2) fam-
ily(nbinomial #_k), or 3) family(nbinomial ml). Omitting #_k is equivalent to specifying
family(nbinomial 1). In case 3, the value of #_k is estimated via maximum likelihood. The value
#_k enters the variance and deviance functions. Typical values range between 0.01 and 2; see the
technical note below.
You do not have to specify both family() and link(); the default link() is the canonical link
for the specified family() (except for nbinomial):
Family Default link
family(gaussian) link(identity)
family(igaussian) link(power -2)
family(binomial) link(logit)
family(poisson) link(log)
family(nbinomial) link(log)
family(gamma) link(power -1)
If you specify both family() and link(), not all combinations make sense. You may choose from
the following combinations:
identity log logit probit cloglog power opower nbinomial loglog logc
Gaussian x x x
inverse Gaussian x x x
binomial x x x x x x x x x
Poisson x x x
negative binomial x x x x
gamma x x x
Technical note
Some family() and link() combinations result in models already fit by Stata. These are
family() link() Options Equivalent Stata command
gaussian identity nothing |irls |irls vce(oim) regress
gaussian identity t(var) vce(hac nwest #) newey, t(var) lag(#)(see note 1)
vfactor(#v)
binomial cloglog nothing |irls vce(oim) cloglog (see note 2)
binomial probit nothing |irls vce(oim) probit (see note 2)
binomial logit nothing |irls |irls vce(oim) logit or logistic (see note 3)
poisson log nothing |irls |irls vce(oim) poisson (see note 3)
nbinomial log nothing |irls vce(oim) nbreg (see note 4)
gamma log scale(1) streg, dist(exp) nohr (see note 5)
Notes:
1. The variance factor #_v should be set to n/(n − k), where n is the number of observations and
k the number of regressors. If the number of regressors is not specified, the estimated standard
errors will, as a result, differ by this factor.
2. Because the link is not the canonical link for the binomial family, you must specify the vce(oim)
option if using irls to get equivalent standard errors. If irls is used without vce(oim),
the regression coefficients will be the same but the standard errors will be only asymptotically
equivalent. If no options are specified (nothing), glm will optimize using Newton–Raphson, making
it equivalent to the other Stata command.
See [R] cloglog and [R] probit for more details about these commands.
3. Because the canonical link is being used, the standard errors will be equivalent whether the EIM
or the OIM estimator of variance is used.
4. Family negative binomial, log-link models (also known as negative binomial regression
models) are used for data with an overdispersed Poisson distribution. Although glm can be
used to fit such models, using Stata's maximum likelihood nbreg command is probably better. In
the GLM approach, you specify family(nbinomial #_k) and then search for a #_k that results in
the deviance-based dispersion being 1. You can also specify family(nbinomial ml) to estimate
#_k via maximum likelihood, which will report the same value returned from nbreg. However,
nbreg also reports a confidence interval for it; see [R] nbreg and Rogers (1993). Of course, glm
allows links other than log, and for those links, including the canonical nbinomial link, you will
need to use glm.
5. glm can be used to estimate parameters from exponential regressions, but this method requires
specifying scale(1). However, censoring is not available. Censored exponential regression may
be modeled using glm with family(poisson). The log of the original response is entered into
a Poisson model as an offset, whereas the new response is the censor variable. The result of such
modeling is identical to the log relative hazard parameterization of streg, dist(exp) nohr. See
[ST] streg for details about the streg command.
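A sketch of that approach, using hypothetical variable names (failtime for the original response and failed as the censoring indicator, with failed = 1 for observed failures):

. generate double ln_t = ln(failtime)
. glm failed x1 x2, family(poisson) link(log) offset(ln_t)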
In general, where there is overlap between a capability of glm and that of some other Stata
command, we recommend using the other Stata command. Our recommendation is not because of
some inferiority of the GLM approach. Rather, those other commands, by being specialized, provide
options and ancillary commands that are missing in the broader glm framework. Nevertheless, glm
does produce the same answers where it should.
Special note. When equivalence is expected, for some datasets, you may still see very slight differences
in the results, most often only in the later digits of the standard errors. When you compare glm
output to an equivalent Stata command, these tiny discrepancies arise for many reasons:
a. glm uses a general methodology for starting values, whereas the equivalent Stata command may
be more specialized in its treatment of starting values.
b. When using a canonical link, glm, irls should be equivalent to the maximum likelihood method
of the equivalent Stata command, yet the convergence criterion is different (one is for deviance,
the other for log likelihood). These discrepancies are easily resolved by adjusting one convergence
criterion to correspond to the other.
c. When both glm and the equivalent Stata command use Newton–Raphson, small differences may
still occur if the Stata command has a different default convergence criterion from that of glm.
Adjusting the convergence criterion will resolve the difference. See [R] ml and [R] maximize for
more details.
Example 1
In example 1 of [R] logistic, we fit a model based on data from a study of risk factors associated
with low birthweight (Hosmer, Lemeshow, and Sturdivant 2013, 24). We can replicate the estimation
by using glm:
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. glm low age lwt i.race smoke ptl ht ui, family(binomial) link(logit)
Iteration 0: log likelihood = -101.0213
Iteration 1: log likelihood = -100.72519
Iteration 2: log likelihood = -100.724
Iteration 3: log likelihood = -100.724
Generalized linear models No. of obs = 189
Optimization : ML Residual df = 180
Scale parameter = 1
Deviance = 201.4479911 (1/df) Deviance = 1.119156
Pearson = 182.0233425 (1/df) Pearson = 1.011241
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC = 1.1611
Log likelihood = -100.7239956 BIC = -742.0665
OIM
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.0271003 .0364504 -0.74 0.457 -.0985418 .0443412
lwt -.0151508 .0069259 -2.19 0.029 -.0287253 -.0015763
race
black 1.262647 .5264101 2.40 0.016 .2309024 2.294392
other .8620792 .4391532 1.96 0.050 .0013548 1.722804
smoke .9233448 .4008266 2.30 0.021 .137739 1.708951
ptl .5418366 .346249 1.56 0.118 -.136799 1.220472
ht 1.832518 .6916292 2.65 0.008 .4769494 3.188086
ui .7585135 .4593768 1.65 0.099 -.1418484 1.658875
_cons .4612239 1.20459 0.38 0.702 -1.899729 2.822176
glm, by default, presents coefficient estimates, whereas logistic presents the exponentiated
coefficients, the odds ratios. glm's eform option reports exponentiated coefficients, and glm, like
Stata's other estimation commands, replays results.
. glm, eform
Generalized linear models No. of obs = 189
Optimization : ML Residual df = 180
Scale parameter = 1
Deviance = 201.4479911 (1/df) Deviance = 1.119156
Pearson = 182.0233425 (1/df) Pearson = 1.011241
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC = 1.1611
Log likelihood = -100.7239956 BIC = -742.0665
OIM
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0354759 -0.74 0.457 .9061578 1.045339
lwt .9849634 .0068217 -2.19 0.029 .9716834 .9984249
race
black 3.534767 1.860737 2.40 0.016 1.259736 9.918406
other 2.368079 1.039949 1.96 0.050 1.001356 5.600207
smoke 2.517698 1.00916 2.30 0.021 1.147676 5.523162
ptl 1.719161 .5952579 1.56 0.118 .8721455 3.388787
ht 6.249602 4.322408 2.65 0.008 1.611152 24.24199
ui 2.1351 .9808153 1.65 0.099 .8677528 5.2534
_cons 1.586014 1.910496 0.38 0.702 .1496092 16.8134
These results are the same as those reported in example 1 of [R] logistic.
Included in the output header are values for the Akaike (1973) information criterion (AIC) and the
Bayesian information criterion (BIC) (Raftery 1995). Both are measures of model fit adjusted for the
number of parameters that can be compared across models. In both cases, a smaller value generally
indicates a better model fit. AIC is based on the log likelihood and thus is available only when
Newton–Raphson optimization is used. BIC is based on the deviance and thus is always available.
Technical note
The values for AIC and BIC reported in the output after glm are different from those reported by
estat ic:
. estat ic
Akaike’s information criterion and Bayesian information criterion
Model Obs ll(null) ll(model) df AIC BIC
. 189 . -100.724 9 219.448 248.6237
Note: N=Obs used in calculating BIC; see [R] BIC note.
There are various definitions of these information criteria (IC) in the literature; glm and estat ic
use different definitions. glm bases its computation of the BIC on deviance, whereas estat ic uses
the likelihood. Both glm and estat ic use the likelihood to compute the AIC; however, the AIC from
estat ic is equal to N, the number of observations, times the AIC from glm. Refer to Methods and
formulas in this entry and [R] estat ic for the references and formulas used by glm and estat ic,
respectively, to compute AIC and BIC. Inferences based on comparison of IC values reported by glm
for different GLM models will be equivalent to those based on comparison of IC values reported by
estat ic after glm.
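As a quick check using the values shown above, multiplying glm's per-observation AIC by the number of observations essentially recovers the estat ic value (up to rounding of the displayed AIC):

. display 189 * 1.1611
219.4479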
Example 2
We use data from an early insecticide experiment, given in Pregibon (1980). The variables are
ldose, the log dose of insecticide; n, the number of flour beetles subjected to each dose; and r, the
number killed.
. use http://www.stata-press.com/data/r13/ldose
. list, sep(4)
ldose n r
1. 1.6907 59 6
2. 1.7242 60 13
3. 1.7552 62 18
4. 1.7842 56 28
5. 1.8113 63 52
6. 1.8369 59 53
7. 1.861 62 61
8. 1.8839 60 60
The aim of the analysis is to estimate a dose–response relationship between p, the proportion
killed, and X, the log dose.
As a first attempt, we will formulate the model as a linear logistic regression of p on ldose; that
is, we will take the logit of p and represent the dose–response curve as a straight line in X:

    ln{p/(1 − p)} = β₀ + β₁X
Because the data are grouped, we cannot use Stata’s logistic command to fit the model. Stata does,
however, already have a command for performing logistic regression on data organized in this way,
so we could type
. blogit r n ldose
Instead, we will fit the model by using glm:
. glm r ldose, family(binomial n) link(logit)
Iteration 0: log likelihood = -18.824848
Iteration 1: log likelihood = -18.715271
Iteration 2: log likelihood = -18.715123
Iteration 3: log likelihood = -18.715123
Generalized linear models No. of obs = 8
Optimization : ML Residual df = 6
Scale parameter = 1
Deviance = 11.23220702 (1/df) Deviance = 1.872035
Pearson = 10.0267936 (1/df) Pearson = 1.671132
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(u/(n-u)) [Logit]
AIC = 5.178781
Log likelihood = -18.71512262 BIC = -1.244442
OIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
ldose 34.27034 2.912141 11.77 0.000 28.56265 39.97803
_cons -60.71747 5.180713 -11.72 0.000 -70.87149 -50.56346
The only difference between blogit and glm here is how they obtain the answer. blogit expands
the data to contain 481 observations (the sum of n) so that it can run Stata’s standard, individual-level
logistic command. glm, on the other hand, uses the information on the binomial denominator directly.
We specified family(binomial n), meaning that variable n contains the denominator. Parameter
estimates and standard errors from the two approaches do not differ.
An alternative model, which gives asymmetric sigmoid curves for p, involves the complementary
log-log, or cloglog, function:

    ln{−ln(1 − p)} = β₀ + β₁X
We fit this model by using glm:
. glm r ldose, family(binomial n) link(cloglog)
Iteration 0: log likelihood = -14.883594
Iteration 1: log likelihood = -14.822264
Iteration 2: log likelihood = -14.822228
Iteration 3: log likelihood = -14.822228
Generalized linear models No. of obs = 8
Optimization : ML Residual df = 6
Scale parameter = 1
Deviance = 3.446418004 (1/df) Deviance = .574403
Pearson = 3.294675153 (1/df) Pearson = .5491125
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(-ln(1-u/n)) [Complementary log-log]
AIC = 4.205557
Log likelihood = -14.82222811 BIC = -9.030231
OIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
ldose 22.04118 1.793089 12.29 0.000 18.52679 25.55557
_cons -39.57232 3.229047 -12.26 0.000 -45.90114 -33.24351
The complementary log-log model is preferred; the deviance for the logistic model, 11.23, is much
higher than the deviance for the cloglog model, 3.45. This change also is evident by comparing log
likelihoods, or equivalently, AIC values.
This example also shows the advantage of the glm command: we can vary assumptions easily.
Note the minor difference in what we typed to obtain the logistic and cloglog models:
. glm r ldose, family(binomial n) link(logit)
. glm r ldose, family(binomial n) link(cloglog)
If we were performing this work for ourselves, we would have typed the commands in a more
abbreviated form:
. glm r ldose, f(b n) l(l)
. glm r ldose, f(b n) l(cl)
Technical note
Factor variables may be used with glm. Say that, in the example above, we had ldose, the
log dose of insecticide; n, the number of flour beetles subjected to each dose; and r, the number
killed, all as before, except that now we have results for three different kinds of beetles. Our
hypothetical data include beetle, which contains the values 1 (“Destructive flour”), 2 (“Red flour”),
and 3 (“Mealworm”).
. use http://www.stata-press.com/data/r13/beetle
. list, sep(0)
beetle ldose n r
1. 1 1.6907 59 6
2. 1 1.7242 60 13
3. 1 1.7552 62 18
4. 1 1.7842 56 28
5. 1 1.8113 63 52
(output omitted )
23. 3 1.861 64 23
24. 3 1.8839 58 22
Let’s assume that, at first, we wish merely to add a shift factor for the type of beetle. We could type
. glm r i.beetle ldose, family(bin n) link(cloglog)
Iteration 0: log likelihood = -79.012269
Iteration 1: log likelihood = -76.94951
Iteration 2: log likelihood = -76.945645
Iteration 3: log likelihood = -76.945645
Generalized linear models No. of obs = 24
Optimization : ML Residual df = 20
Scale parameter = 1
Deviance = 73.76505595 (1/df) Deviance = 3.688253
Pearson = 71.8901173 (1/df) Pearson = 3.594506
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(-ln(1-u/n)) [Complementary log-log]
AIC = 6.74547
Log likelihood = -76.94564525 BIC = 10.20398
OIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.0910396 .1076132 -0.85 0.398 -.3019576 .1198783
Mealworm -1.836058 .1307125 -14.05 0.000 -2.09225 -1.579867
ldose 19.41558 .9954265 19.50 0.000 17.46458 21.36658
_cons -34.84602 1.79333 -19.43 0.000 -38.36089 -31.33116
We find strong evidence that the insecticide works differently on the mealworm. We now check
whether the curve is merely shifted or also differently sloped:
. glm r beetle##c.ldose, family(bin n) link(cloglog)
Iteration 0: log likelihood = -67.270188
Iteration 1: log likelihood = -65.149316
Iteration 2: log likelihood = -65.147978
Iteration 3: log likelihood = -65.147978
Generalized linear models No. of obs = 24
Optimization : ML Residual df = 18
Scale parameter = 1
Deviance = 50.16972096 (1/df) Deviance = 2.787207
Pearson = 49.28422567 (1/df) Pearson = 2.738013
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(-ln(1-u/n)) [Complementary log-log]
AIC = 5.928998
Log likelihood = -65.14797776 BIC = -7.035248
OIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 4.470882 -0.18 0.858 -9.562098 7.963438
Mealworm 17.78741 4.586429 3.88 0.000 8.798172 26.77664
ldose 22.04118 1.793089 12.29 0.000 18.52679 25.55557
beetle#
c.ldose
Red flour .3838708 2.478477 0.15 0.877 -4.473855 5.241596
Mealworm -10.726 2.526412 -4.25 0.000 -15.67768 -5.774321
_cons -39.57232 3.229047 -12.26 0.000 -45.90114 -33.24351
We find that the (complementary log-log) dose–response curve for the mealworm has roughly half
the slope of that for the destructive flour beetle.
See [U] 25 Working with categorical data and factor variables; what is said there concerning
linear regression is applicable to any GLM model.
Variance estimators
glm offers many variance options and gives different types of standard errors when used in various
combinations. We highlight some of them here, but for a full explanation, see Hardin and Hilbe (2012).
Example 3
Continuing with our flour beetle data, we rerun the most recently displayed model, this time
requesting estimation via IRLS.
. use http://www.stata-press.com/data/r13/beetle
. glm r beetle##c.ldose, f(bin n) l(cloglog) ltol(1e-13) irls
Iteration 1: deviance = 54.41414
Iteration 2: deviance = 50.19424
Iteration 3: deviance = 50.16973
(output omitted )
Generalized linear models No. of obs = 24
Optimization : MQL Fisher scoring Residual df = 18
(IRLS EIM) Scale parameter = 1
Deviance = 50.16972096 (1/df) Deviance = 2.787207
Pearson = 49.28422567 (1/df) Pearson = 2.738013
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(-ln(1-u/n)) [Complementary log-log]
BIC = -7.035248
EIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 4.586649 -0.17 0.862 -9.788997 8.190337
Mealworm 17.78741 4.624834 3.85 0.000 8.7229 26.85192
ldose 22.04118 1.799356 12.25 0.000 18.5145 25.56785
beetle#
c.ldose
Red flour .3838708 2.544068 0.15 0.880 -4.602411 5.370152
Mealworm -10.726 2.548176 -4.21 0.000 -15.72033 -5.731665
_cons -39.57232 3.240274 -12.21 0.000 -45.92314 -33.2215
Note our use of the ltol() option, which, although unrelated to our discussion on variance estimation,
was used so that the regression coefficients would match those of the previous Newton–Raphson (NR)
fit.
Because IRLS uses the EIM for optimization, the variance estimate is also based on EIM. If we want
optimization via IRLS but the variance estimate based on OIM, we specify glm, irls vce(oim):
. glm r beetle##c.ldose, f(b n) l(cl) ltol(1e-15) irls vce(oim) noheader nolog
OIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 4.470882 -0.18 0.858 -9.562098 7.963438
Mealworm 17.78741 4.586429 3.88 0.000 8.798172 26.77664
ldose 22.04118 1.793089 12.29 0.000 18.52679 25.55557
beetle#
c.ldose
Red flour .3838708 2.478477 0.15 0.877 -4.473855 5.241596
Mealworm -10.726 2.526412 -4.25 0.000 -15.67768 -5.774321
_cons -39.57232 3.229047 -12.26 0.000 -45.90114 -33.24351
This approach is identical to NR except for the convergence path. Because the cloglog link is not
the canonical link for the binomial family, EIM and OIM produce different results. Both estimators,
however, are asymptotically equivalent.
Going back to NR, we can also specify vce(robust) to get the Huber/White/sandwich estimator
of variance:
. glm r beetle##c.ldose, f(b n) l(cl) vce(robust) noheader nolog
Robust
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 5.733049 -0.14 0.889 -12.0359 10.43724
Mealworm 17.78741 5.158477 3.45 0.001 7.676977 27.89784
ldose 22.04118 .8998551 24.49 0.000 20.27749 23.80486
beetle#
c.ldose
Red flour .3838708 3.174427 0.12 0.904 -5.837892 6.605633
Mealworm -10.726 2.800606 -3.83 0.000 -16.21508 -5.236912
_cons -39.57232 1.621306 -24.41 0.000 -42.75003 -36.39462
The sandwich estimator gets its name from the form of the calculation: it is the multiplication
of three matrices, with the outer two matrices (the “bread”) set to the OIM variance matrix. When
irls is used along with vce(robust), the EIM variance matrix is instead used as the bread. Using
a result from McCullagh and Nelder (1989), Newson (1999) points out that the EIM and OIM variance
matrices are equivalent under the canonical link. Thus if irls is specified with the canonical link,
the resulting variance is labeled “Robust”. When the noncanonical link for the family is used, which
is the case in the example below, the EIM and OIM variance matrices differ, so the resulting variance
is labeled “Semirobust”.
. glm r beetle##c.ldose, f(b n) l(cl) irls ltol(1e-15) vce(robust) noheader
> nolog
Semirobust
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 6.288963 -0.13 0.899 -13.12547 11.52681
Mealworm 17.78741 5.255307 3.38 0.001 7.487194 28.08762
ldose 22.04118 .9061566 24.32 0.000 20.26514 23.81721
beetle#
c.ldose
Red flour .3838708 3.489723 0.11 0.912 -6.455861 7.223603
Mealworm -10.726 2.855897 -3.76 0.000 -16.32345 -5.128542
_cons -39.57232 1.632544 -24.24 0.000 -42.77205 -36.3726
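When irls is instead combined with vce(robust) under the canonical link (logit for the binomial family), the variance is labeled "Robust". A minimal sketch of such a fit, with the output not reproduced here:
. glm r beetle##c.ldose, f(bin n) l(logit) irls vce(robust) noheader nolog
  (output omitted )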
The outer product of the gradient (OPG) estimate of variance is one that avoids the calculation of
second derivatives. It is equivalent to the “middle” part of the sandwich estimate of variance and can
be specified by using glm, vce(opg), regardless of whether NR or IRLS optimization is used.
. glm r beetle##c.ldose, f(b n) l(cl) vce(opg) noheader nolog
OPG
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
beetle
Red flour -.79933 6.664045 -0.12 0.905 -13.86062 12.26196
Mealworm 17.78741 6.838505 2.60 0.009 4.384183 31.19063
ldose 22.04118 3.572983 6.17 0.000 15.03826 29.0441
beetle#
c.ldose
Red flour .3838708 3.700192 0.10 0.917 -6.868372 7.636114
Mealworm -10.726 3.796448 -2.83 0.005 -18.1669 -3.285097
_cons -39.57232 6.433101 -6.15 0.000 -52.18097 -26.96368
The OPG estimate of variance is a component of the BHHH (Berndt et al. 1974) optimization
technique. This method of optimization is also available with glm with the technique() option;
however, the technique() option is not allowed with the irls option.
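For instance, a BHHH fit of the flour beetle model might be requested as follows; this is a sketch under the assumption that technique(bhhh) is accepted with this model, and the output is omitted:
. glm r beetle##c.ldose, f(b n) l(cl) technique(bhhh)
  (output omitted )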
Example 4
The Newey–West (1987) estimator of variance is a sandwich estimator with the “middle” of
the sandwich modified to take into account possible autocorrelation between the observations. These
estimators are a generalization of those given by the Stata command newey for linear regression. See
[TS] newey for more details.
For example, consider the dataset given in [TS] newey, which has time-series measurements on
usr and idle. We want to perform a linear regression with Newey–West standard errors.
. use http://www.stata-press.com/data/r13/idle2
. list usr idle time
usr idle time
1. 0 100 1
2. 0 100 2
3. 0 97 3
4. 1 98 4
5. 2 94 5
(output omitted )
29. 1 98 29
30. 1 98 30
Examining Methods and formulas of [TS] newey, we see that the variance estimate is multiplied
by a correction factor of n/(n − k), where k is the number of regressors. glm, vce(hac ...) does
not make this correction, so to get the same standard errors, we must use the vfactor() option
within glm to make the correction manually.
. display 30/28
1.0714286
. tsset time
time variable: time, 1 to 30
delta: 1 unit
. glm usr idle, vce(hac nwest 3) vfactor(1.0714286)
Iteration 0: log likelihood = -71.743396
Generalized linear models No. of obs = 30
Optimization : ML Residual df = 28
Scale parameter = 7.493297
Deviance = 209.8123165 (1/df) Deviance = 7.493297
Pearson = 209.8123165 (1/df) Pearson = 7.493297
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
HAC kernel (lags): Newey-West (3)
AIC = 4.916226
Log likelihood = -71.74339627 BIC = 114.5788
HAC
usr Coef. Std. Err. z P>|z| [95% Conf. Interval]
idle -.2281501 .0690928 -3.30 0.001 -.3635694 -.0927307
_cons 23.13483 6.327033 3.66 0.000 10.73407 35.53558
The glm command above reproduces the results given in [TS] newey. We may now generalize this
output to models other than simple linear regression and to different kernel weights.
. glm usr idle, fam(gamma) link(log) vce(hac gallant 3)
Iteration 0: log likelihood = -61.76593
Iteration 1: log likelihood = -60.963233
Iteration 2: log likelihood = -60.95097
Iteration 3: log likelihood = -60.950965
Generalized linear models No. of obs = 30
Optimization : ML Residual df = 28
Scale parameter = .431296
Deviance = 9.908506707 (1/df) Deviance = .3538752
Pearson = 12.07628677 (1/df) Pearson = .431296
Variance function: V(u) = u^2 [Gamma]
Link function : g(u) = ln(u) [Log]
HAC kernel (lags): Gallant (3)
AIC = 4.196731
Log likelihood = -60.95096484 BIC = -85.32502
HAC
usr Coef. Std. Err. z P>|z| [95% Conf. Interval]
idle -.0796609 .0184647 -4.31 0.000 -.115851 -.0434708
_cons 7.771011 1.510198 5.15 0.000 4.811078 10.73094
glm also offers variance estimators based on the bootstrap (resampling your data with replacement)
and the jackknife (refitting the model with each observation left out in succession). Also included is
the one-step jackknife estimate, which, instead of performing full reestimation when each observation
is omitted, calculates a one-step NR estimate, with the full data regression coefficients as starting
values.
. set seed 1
. glm usr idle, fam(gamma) link(log) vce(bootstrap, reps(100) nodots)
Generalized linear models No. of obs = 30
Optimization : ML Residual df = 28
Scale parameter = .431296
Deviance = 9.908506707 (1/df) Deviance = .3538752
Pearson = 12.07628677 (1/df) Pearson = .431296
Variance function: V(u) = u^2 [Gamma]
Link function : g(u) = ln(u) [Log]
AIC = 4.196731
Log likelihood = -60.95096484 BIC = -85.32502
Observed Bootstrap Normal-based
usr Coef. Std. Err. z P>|z| [95% Conf. Interval]
idle -.0796609 .0216591 -3.68 0.000 -.1221119 -.0372099
_cons 7.771011 1.80278 4.31 0.000 4.237627 11.3044
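A jackknife fit of the same model could be requested similarly; a sketch only, with the output omitted:
. glm usr idle, fam(gamma) link(log) vce(jackknife, nodots)
  (output omitted )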
See Hardin and Hilbe (2012) for a full discussion of the variance options that go with glm and,
in particular, of how the different variance estimators are modified when vce(cluster clustvar)is
specified. Finally, not all variance options are supported with all types of weights. See help glm for
a current table of the variance options that are supported with the different weights.
User-defined functions
glm may be called with a user-written link function, variance (family) function, Newey–West
kernel-weight function, or any combination of the three.
Syntax of link functions
program progname
        version 13
        args todo eta mu return
        if `todo' == -1 {
                /* Set global macros for output */
                global SGLM_lt "title for link function"
                global SGLM_lf "subtitle showing link definition"
                exit
        }
        if `todo' == 0 {
                /* set η = g(µ) */
                /* Intermediate calculations go here */
                generate double `eta' = ...
                exit
        }
        if `todo' == 1 {
                /* set µ = g^(-1)(η) */
                /* Intermediate calculations go here */
                generate double `mu' = ...
                exit
        }
        if `todo' == 2 {
                /* set return = ∂µ/∂η */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 3 {
                /* set return = ∂²µ/∂η² */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        display as error "Unknown call to glm link function"
        exit 198
end
Syntax of variance functions
program progname
        version 13
        args todo eta mu return
        if `todo' == -1 {
                /* Set global macros for output */
                /* Also check that depvar is in proper range */
                /* Note: for this call, eta contains an indicator for whether each obs. is in the est. sample */
                global SGLM_vt "title for variance function"
                global SGLM_vf "subtitle showing function definition"
                global SGLM_mu "program to call to enforce boundary conditions on µ"
                exit
        }
        if `todo' == 0 {
                /* set η to its initial value */
                /* Intermediate calculations go here */
                generate double `eta' = ...
                exit
        }
        if `todo' == 1 {
                /* set return = V(µ) */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 2 {
                /* set return = ∂V(µ)/∂µ */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 3 {
                /* set return = squared deviance (per observation) */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 4 {
                /* set return = Anscombe residual */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 5 {
                /* set return = log likelihood */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        if `todo' == 6 {
                /* set return = adjustment for deviance residuals */
                /* Intermediate calculations go here */
                generate double `return' = ...
                exit
        }
        display as error "Unknown call to glm variance function"
        exit 198
end
Syntax of Newey–West kernel-weight functions
program progname, rclass
version 13
args G j
/* G is the maximum lag */
/* j is the current lag */
/* Intermediate calculations go here */
return scalar wt = computed weight
return local setype "Newey-West"
return local sewtype "name of kernel"
end
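As an illustration of this template (not part of the original entry), the sketch below defines a truncated (uniform) kernel that assigns weight 1 to every lag up to the maximum; the program name mykern is arbitrary, and we believe it could then be requested with an option of the form vce(hac mykern #).
program mykern, rclass
        version 13
        args G j
        /* G is the maximum lag; j is the current lag */
        /* truncated (uniform) kernel: every lag up to G receives weight 1 */
        return scalar wt = 1
        return local setype "Newey-West"
        return local sewtype "truncated"
end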
Global macros available for user-written programs
Global macro Description
SGLM_V program name of variance (family) evaluator
SGLM_L program name of link evaluator
SGLM_y dependent variable name
SGLM_m binomial denominator
SGLM_a negative binomial k
SGLM_p power if power() or opower() is used, or
an argument from a user-specified link function
SGLM_s1 indicator; set to one if scale is equal to one
SGLM_ph value of scale parameter
Example 5
Suppose that we wish to perform Poisson regression with a log-link function. Although this
regression is already possible with standard glm, we will write our own version for illustrative
purposes.
Because we want a log link, η = g(µ) = ln(µ), and for a Poisson family the variance function
is V(µ) = µ.
The Poisson density is given by

$$f(y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}$$

resulting in a log likelihood of

$$L = \sum_{i=1}^{n} \left\{ -\mu_i + y_i \ln(\mu_i) - \ln(y_i!) \right\}$$

The squared deviance of the ith observation for the Poisson family is given by

$$d_i^2 = \begin{cases} 2\widehat{\mu}_i & \text{if } y_i = 0 \\ 2\left\{ y_i \ln(y_i/\widehat{\mu}_i) - (y_i - \widehat{\mu}_i) \right\} & \text{otherwise} \end{cases}$$
We now have enough information to write our own Poisson-log glm module. We create the file
mylog.ado, which contains
program mylog
        version 13
        args todo eta mu return
        if `todo' == -1 {
                global SGLM_lt "My Log"              // Titles for output
                global SGLM_lf "ln(u)"
                exit
        }
        if `todo' == 0 {
                gen double `eta' = ln(`mu')          // η = ln(µ)
                exit
        }
        if `todo' == 1 {
                gen double `mu' = exp(`eta')         // µ = exp(η)
                exit
        }
        if `todo' == 2 {
                gen double `return' = `mu'           // ∂µ/∂η = exp(η) = µ
                exit
        }
        if `todo' == 3 {
                gen double `return' = `mu'           // ∂²µ/∂η² = exp(η) = µ
                exit
        }
        di as error "Unknown call to glm link function"
        exit 198
end
and we create the file mypois.ado, which contains
program mypois
        version 13
        args todo eta mu return
        if `todo' == -1 {
                local y "$SGLM_y"
                local touse "`eta'"                  // `eta' marks estimation sample here
                capture assert `y'>=0 if `touse'     // check range of y
                if _rc {
                        di as error `"dependent variable `y' has negative values"'
                        exit 499
                }
                global SGLM_vt "My Poisson"          // Titles for output
                global SGLM_vf "u"
                global SGLM_mu "glim_mu 0 ."         // see note 1
                exit
        }
        if `todo' == 0 {                             // Initialization of η; see note 2
                gen double `eta' = ln(`mu')
                exit
        }
        if `todo' == 1 {
                gen double `return' = `mu'           // V(µ) = µ
                exit
        }
        if `todo' == 2 {                             // ∂V(µ)/∂µ
                gen byte `return' = 1
                exit
        }
        if `todo' == 3 {                             // squared deviance, defined above
                local y "$SGLM_y"
                if "`y'" == "" {
                        local y "`e(depvar)'"
                }
                gen double `return' = cond(`y'==0, 2*`mu', /*
                        */ 2*(`y'*ln(`y'/`mu')-(`y'-`mu')))
                exit
        }
        if `todo' == 4 {                             // Anscombe residual; see note 3
                local y "$SGLM_y"
                if "`y'" == "" {
                        local y "`e(depvar)'"
                }
                gen double `return' = 1.5*(`y'^(2/3)-`mu'^(2/3)) / `mu'^(1/6)
                exit
        }
        if `todo' == 5 {                             // log likelihood; see note 4
                local y "$SGLM_y"
                if "`y'" == "" {
                        local y "`e(depvar)'"
                }
                gen double `return' = -`mu'+`y'*ln(`mu')-lngamma(`y'+1)
                exit
        }
        if `todo' == 6 {                             // adjustment to residual; see note 5
                gen double `return' = 1/(6*sqrt(`mu'))
                exit
        }
        di as error "Unknown call to glm variance function"
        error 198
end
Notes:
1. glim_mu is a Stata program that will, at each iteration, bring the fitted µ back into its plausible
range, should it stray out of it. Here glim_mu is called with the arguments zero and missing,
meaning that zero is the lower bound of the fitted µ and there exists no upper bound; such is the
case for Poisson models.
2. Here the initial value of η is easy because we intend to fit this model with our user-defined
log link. In general, however, the initialization may need to vary according to the link to obtain
convergence. If so, the global macro SGLM_L is used to determine which link is being utilized;
a sketch of such a branch appears after these notes.
3. The Anscombe formula is given here because we know it. If we were not interested in Anscombe
residuals, we could merely set `return' to missing. Also, the local macro y is set either to
SGLM_y if it is in current estimation or to e(depvar) if this function is being accessed by predict.
4. If we were not interested in ML estimation, we could omit this code entirely and just leave an
exit statement in its place. Similarly, if we were not interested in deviance or IRLS optimization,
we could set `return' in the deviance portion of the code (`todo'==3) to missing.
5. This code defines the term to be added to the predicted residuals if the adjusted option is
specified. Again, if we were not interested, we could set `return' to missing.
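As an illustrative sketch of what note 2 describes (this fragment is not part of the original module), the initialization block of a variance program could branch on the link program named in SGLM_L; the link name mylog and the fallback starting value are assumptions:
        if `todo' == 0 {                             // initialization of η
                if "$SGLM_L" == "mylog" {            // our user-written log link
                        gen double `eta' = ln(`mu')
                }
                else {
                        gen double `eta' = `mu'      // hypothetical default start for other links
                }
                exit
        }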
We can now test our Poisson-log module by running it on the airline data presented in [R] poisson.
. use http://www.stata-press.com/data/r13/airline
. list airline injuries n XYZowned
airline injuries n XYZowned
1. 1 11 0.0950 1
2. 2 7 0.1920 0
3. 3 7 0.0750 0
4. 4 19 0.2078 0
5. 5 9 0.1382 0
6. 6 4 0.0540 1
7. 7 3 0.1292 0
8. 8 1 0.0503 0
9. 9 3 0.0629 1
. generate lnN=ln(n)
. glm injuries XYZowned lnN, f(mypois) l(mylog) scale(1)
Iteration 0: log likelihood = -22.557572
Iteration 1: log likelihood = -22.332861
Iteration 2: log likelihood = -22.332276
Iteration 3: log likelihood = -22.332276
Generalized linear models No. of obs = 9
Optimization : ML Residual df = 6
Scale parameter = 1
Deviance = 12.70432823 (1/df) Deviance = 2.117388
Pearson = 12.7695081 (1/df) Pearson = 2.128251
Variance function: V(u) = u [My Poisson]
Link function : g(u) = ln(u) [My Log]
AIC = 5.629395
Log likelihood = -22.33227605 BIC = -.4790192
OIM
injuries Coef. Std. Err. z P>|z| [95% Conf. Interval]
XYZowned .6840668 .3895877 1.76 0.079 -.0795111 1.447645
lnN 1.424169 .3725155 3.82 0.000 .6940517 2.154286
_cons 4.863891 .7090501 6.86 0.000 3.474178 6.253603
(Standard errors scaled using dispersion equal to square root of 1.)
These are precisely the results given in [R] poisson and are those that would have been given had
we run glm, family(poisson) link(log). The only minor adjustment we needed to make was
to specify the scale(1) option. If scale() is left unspecified, glm assumes scale(1) for discrete
distributions and scale(x2) for continuous ones. By default, glm assumes that any user-defined
family is continuous because it has no way of checking. Thus we needed to specify scale(1) because
our model is discrete.
Because we were careful in defining the squared deviance, we could have fit this model with IRLS.
Because log is the canonical link for the Poisson family, we would not only get the same regression
coefficients but also the same standard errors.
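As a quick check of that claim, the same model could be refit via IRLS; a sketch, with the output omitted:
. glm injuries XYZowned lnN, f(mypois) l(mylog) scale(1) irls
  (output omitted )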
Example 6
Suppose now that we wish to use our log link (mylog.ado) with glm's binomial family. This task
requires some modification because our current function is not equipped to deal with the binomial
denominator, which we are allowed to specify. This denominator is accessible to our link function
through the global macro SGLM_m. We now make the modifications and store them in mylog2.ado.
program mylog2                                       // <-- changed
        version 13
        args todo eta mu return
        if `todo' == -1 {
                global SGLM_lt "My Log, Version 2"   // <-- changed
                if "$SGLM_m" == "1" {                // <-- changed
                        global SGLM_lf "ln(u)"       // <-- changed
                }                                    // <-- changed
                else global SGLM_lf "ln(u/$SGLM_m)"  // <-- changed
                exit
        }
        if `todo' == 0 {
                gen double `eta' = ln(`mu'/$SGLM_m)  // <-- changed
                exit
        }
        if `todo' == 1 {
                gen double `mu' = $SGLM_m*exp(`eta') // <-- changed
                exit
        }
        if `todo' == 2 {
                gen double `return' = `mu'
                exit
        }
        if `todo' == 3 {
                gen double `return' = `mu'
                exit
        }
        di as error "Unknown call to glm link function"
        exit 198
end
We can now run our new log link with glm's binomial family. Using the flour beetle data from
earlier, we have
. use http://www.stata-press.com/data/r13/beetle, clear
. glm r ldose, f(bin n) l(mylog2) irls
Iteration 1: deviance = 2212.108
Iteration 2: deviance = 452.9352
Iteration 3: deviance = 429.95
Iteration 4: deviance = 429.2745
Iteration 5: deviance = 429.2192
Iteration 6: deviance = 429.2082
Iteration 7: deviance = 429.2061
Iteration 8: deviance = 429.2057
Iteration 9: deviance = 429.2056
Iteration 10: deviance = 429.2056
Iteration 11: deviance = 429.2056
Iteration 12: deviance = 429.2056
Generalized linear models No. of obs = 24
Optimization : MQL Fisher scoring Residual df = 22
(IRLS EIM) Scale parameter = 1
Deviance = 429.205599 (1/df) Deviance = 19.50935
Pearson = 413.088142 (1/df) Pearson = 18.77673
Variance function: V(u) = u*(1-u/n) [Binomial]
Link function : g(u) = ln(u/n) [My Log, Version 2]
BIC = 359.2884
EIM
r Coef. Std. Err. z P>|z| [95% Conf. Interval]
ldose 8.478908 .4702808 18.03 0.000 7.557175 9.400642
_cons -16.11006 .8723167 -18.47 0.000 -17.81977 -14.40035
For a more detailed discussion on user-defined functions, and for an example of a user-defined
Newey–West kernel weight, see Hardin and Hilbe (2012).
 
John Ashworth Nelder (1924–2010) was born in Somerset, England. He studied mathematics
and statistics at Cambridge and worked as a statistician at the National Vegetable Research
Station and then Rothamsted Experimental Station. In retirement, he was actively affiliated with
Imperial College London. Nelder was especially well known for his contributions to the theory
of linear models and to statistical computing. He was the principal architect of generalized and
hierarchical generalized linear models and of the programs GenStat and GLIM.
Robert William Maclagan Wedderburn (1947–1975) was born in Edinburgh and studied mathe-
matics and statistics at Cambridge. At Rothamsted Experimental Station, he developed the theory
of generalized linear models with Nelder and originated the concept of quasilikelihood. He died
of anaphylactic shock from an insect bite on a canal holiday.
 
Stored results
glm, ml stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(aic) model AIC
e(bic) model BIC
e(ll) log likelihood, if NR
e(N_clust) number of clusters
e(chi2) χ²
e(p) significance
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via ML, 0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by power() or opower()
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald; type of model χ² test
e(cons) set if noconstant specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
glm, irls stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq_model) number of equations in overall model test
e(df_m) model degrees of freedom
e(df) residual degrees of freedom
e(phi) scale parameter
e(disp) dispersion parameter
e(bic) model BIC
e(N_clust) number of clusters
e(deviance) deviance
e(deviance_s) scaled deviance
e(deviance_p) Pearson deviance
e(deviance_ps) scaled Pearson deviance
e(dispers) dispersion
e(dispers_s) scaled dispersion
e(dispers_p) Pearson dispersion
e(dispers_ps) scaled Pearson dispersion
e(nbml) 1 if negative binomial parameter estimated via ML, 0 otherwise
e(vf) factor set by vfactor(), 1 if not set
e(power) power set by power() or opower()
e(rank) rank of e(V)
e(rc) return code
Macros
e(cmd) glm
e(cmdline) command as typed
e(depvar) name of dependent variable
e(varfunc) program to calculate variance function
e(varfunct) variance title
e(varfuncf) variance function
e(link) program to calculate link function
e(linkt) link title
e(linkf) link function
e(m) number of binomial trials
e(wtype) weight type
e(wexp) weight expression
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(cons) set if noconstant specified
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) ml or irls
e(opt1) optimization title, line 1
e(opt2) optimization title, line 2
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The canonical reference on GLM is McCullagh and Nelder (1989). The term “generalized linear
model” is from Nelder and Wedderburn (1972). Many people use the acronym GLIM for GLM models
because of the classic GLM software tool GLIM, by Baker and Nelder (1985). See Dobson and
Barnett (2008) for a concise introduction and overview. See Rabe-Hesketh and Everitt (2007) for
more examples of GLM using Stata. Hoffmann (2004) focuses on applying generalized linear models,
using real-world datasets, along with interpreting computer output, which for the most part is obtained
using Stata.
This discussion highlights the details of parameter estimation and predicted statistics. For a more
detailed treatment, and for information on variance estimation, see Hardin and Hilbe (2012). glm
supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance
estimation.
glm obtains results by IRLS, as described in McCullagh and Nelder (1989), or by maximum
likelihood using Newton–Raphson. The implementation here, however, allows user-specified weights,
which we denote as $v_j$ for the $j$th observation. Let $M$ be the number of “observations” ignoring
weights. Define

$$w_j = \begin{cases} 1 & \text{if no weights are specified} \\ v_j & \text{if fweights or iweights are specified} \\ M v_j \big/ \left(\textstyle\sum_k v_k\right) & \text{if aweights or pweights are specified} \end{cases}$$

The number of observations is then $N = \sum_j w_j$ if fweights are specified and $N = M$ otherwise.
Each IRLS step is performed by regress using $w_j$ as the weights.
Let $d_j^2$ denote the squared deviance residual for the $j$th observation:

For the Gaussian family, $d_j^2 = (y_j - \widehat{\mu}_j)^2$.

For the Bernoulli family (binomial with denominator 1),

$$d_j^2 = \begin{cases} -2\ln(1-\widehat{\mu}_j) & \text{if } y_j = 0 \\ -2\ln(\widehat{\mu}_j) & \text{otherwise} \end{cases}$$

For the binomial family with denominator $m_j$,

$$d_j^2 = \begin{cases} 2 y_j \ln(y_j/\widehat{\mu}_j) + 2(m_j - y_j)\ln\{(m_j - y_j)/(m_j - \widehat{\mu}_j)\} & \text{if } 0 < y_j < m_j \\ 2 m_j \ln\{m_j/(m_j - \widehat{\mu}_j)\} & \text{if } y_j = 0 \\ 2 y_j \ln(y_j/\widehat{\mu}_j) & \text{if } y_j = m_j \end{cases}$$

For the Poisson family,

$$d_j^2 = \begin{cases} 2\widehat{\mu}_j & \text{if } y_j = 0 \\ 2\{y_j \ln(y_j/\widehat{\mu}_j) - (y_j - \widehat{\mu}_j)\} & \text{otherwise} \end{cases}$$

For the gamma family, $d_j^2 = -2\{\ln(y_j/\widehat{\mu}_j) - (y_j - \widehat{\mu}_j)/\widehat{\mu}_j\}$.

For the inverse Gaussian, $d_j^2 = (y_j - \widehat{\mu}_j)^2/(\widehat{\mu}_j^2\, y_j)$.
For the negative binomial,

$$d_j^2 = \begin{cases} 2\ln(1 + k\widehat{\mu}_j)/k & \text{if } y_j = 0 \\ 2 y_j \ln(y_j/\widehat{\mu}_j) - 2\{(1 + k y_j)/k\}\ln\{(1 + k y_j)/(1 + k\widehat{\mu}_j)\} & \text{otherwise} \end{cases}$$
Let $\phi = 1$ if the scale parameter is set to one; otherwise, define $\phi = \widehat{\phi}_0(n-k)/n$, where $\widehat{\phi}_0$ is the
estimated scale parameter and $k$ is the number of covariates in the model (including intercept).
Let $\ln L_j$ denote the log likelihood for the $j$th observation:

For the Gaussian family,

$$\ln L_j = -\frac{1}{2}\left\{\frac{(y_j - \widehat{\mu}_j)^2}{\phi} + \ln(2\pi\phi)\right\}$$

For the binomial family with denominator $m_j$ (Bernoulli if all $m_j = 1$),

$$\ln L_j = \phi\times\begin{cases} \ln\{\Gamma(m_j+1)\} - \ln\{\Gamma(y_j+1)\} - \ln\{\Gamma(m_j-y_j+1)\} & \\ \quad{}+ (m_j - y_j)\ln(1 - \widehat{\mu}_j/m_j) + y_j\ln(\widehat{\mu}_j/m_j) & \text{if } 0 < y_j < m_j \\ m_j\ln(1 - \widehat{\mu}_j/m_j) & \text{if } y_j = 0 \\ m_j\ln(\widehat{\mu}_j/m_j) & \text{if } y_j = m_j \end{cases}$$

For the Poisson family,

$$\ln L_j = \phi\left[y_j\ln(\widehat{\mu}_j) - \widehat{\mu}_j - \ln\{\Gamma(y_j+1)\}\right]$$

For the gamma family, $\ln L_j = -y_j/\widehat{\mu}_j + \ln(1/\widehat{\mu}_j)$.

For the inverse Gaussian,

$$\ln L_j = -\frac{1}{2}\left\{\frac{(y_j - \widehat{\mu}_j)^2}{y_j\,\widehat{\mu}_j^2} + 3\ln(y_j) + \ln(2\pi)\right\}$$

For the negative binomial (let $m = 1/k$),

$$\ln L_j = \phi\left[\ln\{\Gamma(m+y_j)\} - \ln\{\Gamma(y_j+1)\} - \ln\{\Gamma(m)\} - m\ln(1 + \widehat{\mu}_j/m) + y_j\ln\{\widehat{\mu}_j/(\widehat{\mu}_j + m)\}\right]$$
The overall deviance reported by glm is $D^2 = \sum_j w_j d_j^2$. The dispersion of the deviance is $D^2$
divided by the residual degrees of freedom.

The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are given by

$$\text{AIC} = \frac{-2\ln L + 2k}{N} \qquad\qquad \text{BIC} = D^2 - (N-k)\ln(N)$$

where $\ln L = \sum_j w_j \ln L_j$ is the overall log likelihood.
The Pearson deviance reported by glm is $\sum_j w_j r_j^2$. The corresponding Pearson dispersion is the
Pearson deviance divided by the residual degrees of freedom. glm also calculates the scaled versions
of all these quantities by dividing by the estimated scale parameter.
Acknowledgments
glm was written by James Hardin of the Arnold School of Public Health at the University of
South Carolina and Joseph Hilbe of Arizona State University, the coauthors of the Stata Press book
Generalized Linear Models and Extensions. The previous version of this routine was written by Patrick
Royston of the MRC Clinical Trials Unit, London, and coauthor of the Stata Press book Flexible
Parametric Survival Analysis Using Stata: Beyond the Cox Model. The original version of this routine
was published in Royston (1994). Royston’s work, in turn, was based on a prior implementation
by Joseph Hilbe, first published in Hilbe (1993). Roger Newson wrote an early implementation
(Newson 1999) of robust variance estimates for GLM. Parts of this entry are excerpts from Hardin
and Hilbe (2012).
References
Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle. In Second International
Symposium on Information Theory, ed. B. N. Petrov and F. Csaki, 267–281. Budapest: Akailseoniai–Kiudo.
Anscombe, F. J. 1953. Contribution of discussion paper by H. Hotelling “New light on the correlation coefficient and
its transforms”. Journal of the Royal Statistical Society, Series B 15: 229–230.
Baker, R. J., and J. A. Nelder. 1985. The Generalized Linear Interactive Modelling System, Release 3.77. Oxford:
Numerical Algorithms Group.
Basu, A. 2005. Extended generalized linear models: Simultaneous estimation of flexible link and variance functions.
Stata Journal 5: 501–516.
Berndt, E. K., B. H. Hall, R. E. Hall, and J. A. Hausman. 1974. Estimation and inference in nonlinear structural
models. Annals of Economic and Social Measurement 3/4: 653–665.
Cummings, P. 2009. Methods for estimating adjusted risk ratios. Stata Journal 9: 175–196.
Dobson, A. J., and A. G. Barnett. 2008. An Introduction to Generalized Linear Models. 3rd ed. Boca Raton, FL:
Chapman & Hall/CRC.
Hardin, J. W., and J. M. Hilbe. 2012. Generalized Linear Models and Extensions. 3rd ed. College Station, TX: Stata
Press.
Hilbe, J. M. 1993. sg16: Generalized linear models. Stata Technical Bulletin 11: 20–28. Reprinted in Stata Technical
Bulletin Reprints, vol. 2, pp. 149–159. College Station, TX: Stata Press.
. 2000. sg126: Two-parameter log-gamma and log-inverse Gaussian models. Stata Technical Bulletin 53: 31–32.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 273–275. College Station, TX: Stata Press.
. 2009. Logistic Regression Models. Boca Raton, FL: Chapman & Hall/CRC.
Hoffmann, J. P. 2004. Generalized Linear Models: An Applied Approach. Boston: Pearson.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
Nelder, J. A. 1975. Robert William MacLagan Wedderburn, 1947–1975. Journal of the Royal Statistical Society, Series
A 138: 587.
Nelder, J. A., and R. W. M. Wedderburn. 1972. Generalized linear models. Journal of the Royal Statistical Society,
Series A 135: 370–384.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703–708.
Newson, R. B. 1999. sg114: rglm—Robust variance estimates for generalized linear models. Stata Technical Bulletin
50: 27–33. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 181–190. College Station, TX: Stata Press.
. 2004. Generalized power calculations for generalized linear models and more. Stata Journal 4: 379–401.
Orsini, N., R. Bellocco, and P. C. Sjölander. 2013. Doubly robust estimation in generalized linear models. Stata
Journal 13: 185–205.
Parner, E. T., and P. K. Andersen. 2010. Regression analysis of censored data using pseudo-observations. Stata Journal
10: 408–422.
Pregibon, D. 1980. Goodness of link tests for generalized linear models. Applied Statistics 29: 15–24.
Rabe-Hesketh, S., and B. S. Everitt. 2007. A Handbook of Statistical Analyses Using Stata. 4th ed. Boca Raton, FL:
Chapman & Hall/CRC.
Rabe-Hesketh, S., A. Pickles, and C. Taylor. 2000. sg129: Generalized linear latent and mixed models. Stata Technical
Bulletin 53: 47–57. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 293–307. College Station, TX: Stata
Press.
Rabe-Hesketh, S., A. Skrondal, and A. Pickles. 2002. Reliable estimation of generalized linear mixed models using
adaptive quadrature. Stata Journal 2: 1–21.
Raftery, A. E. 1995. Bayesian model selection in social research. In Vol. 25 of Sociological Methodology, ed. P. V.
Marsden, 111–163. Oxford: Blackwell.
Rogers, W. H. 1993. sg16.4: Comparison of nbreg and glm for negative binomial. Stata Technical Bulletin 16: 7.
Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 82–84. College Station, TX: Stata Press.
Royston, P. 1994. sg22: Generalized linear models: Revision of glm. Stata Technical Bulletin 18: 6–11. Reprinted in
Stata Technical Bulletin Reprints, vol. 3, pp. 112–121. College Station, TX: Stata Press.
Sasieni, P. D. 2012. Age–period–cohort models in Stata. Stata Journal 12: 45–60.
Schonlau, M. 2005. Boosted regression (boosting): An introductory tutorial and a Stata plugin. Stata Journal 5:
330–354.
Senn, S. J. 2003. A conversation with John Nelder. Statistical Science 18: 118–131.
Williams, R. 2010. Fitting heterogeneous choice models with oglm. Stata Journal 10: 540–567.
Also see
[R] glm postestimation — Postestimation tools for glm
[R] cloglog — Complementary log-log regression
[R] logistic — Logistic regression, reporting odds ratios
[R] nbreg — Negative binomial regression
[R] poisson — Poisson regression
[R] regress — Linear regression
[ME] meglm — Multilevel mixed-effects generalized linear model
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtgee — Fit population-averaged panel-data models by using GEE
[U] 20 Estimation and postestimation commands
Title
glm postestimation — Postestimation tools for glm
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas References Also see
Description
The following postestimation commands are available after glm:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1) dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest (2) likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic options]
statistic Description
Main
mu expected value of y; the default
xb linear prediction η = xb
eta synonym of xb
stdp standard error of the linear prediction
anscombe Anscombe (1953) residuals
cooksd Cook’s distance
deviance deviance residuals
hat diagonals of the “hat” matrix
likelihood a weighted average of standardized deviance and standardized Pearson residuals
pearson Pearson residuals
response differences between the observed and fitted outcomes
score first derivative of the log likelihood with respect to xjβ
working working residuals
options Description
Options
nooffset modify calculations to ignore offset variable
adjusted adjust deviance residual to speed up convergence
standardized multiply residual by the factor (1 − h)^(−1/2)
studentized multiply residual by one over the square root of the estimated scale parameter
modified modify denominator of residual to be a reasonable estimate of the variance of
depvar
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
mu,xb,stdp, and score are the only statistics allowed with svy estimation results.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
mu, the default, specifies that predict calculate the expected value of y, equal to g^(−1)(xb)
[ng^(−1)(xb) for the binomial family].
xb calculates the linear prediction η = xb.
eta is a synonym for xb.
stdp calculates the standard error of the linear prediction.
anscombe calculates the Anscombe (1953) residuals to produce residuals that closely follow a normal
distribution.
cooksd calculates Cook’s distance, which measures the aggregate change in the estimated coefficients
when each observation is left out of the estimation.
deviance calculates the deviance residuals. Deviance residuals are recommended by McCullagh and
Nelder (1989) and by others as having the best properties for examining the goodness of fit of a
GLM. They are approximately normally distributed if the model is correct. They may be plotted
against the fitted values or against a covariate to inspect the model’s fit. Also see the pearson
option below.
hat calculates the diagonals of the “hat” matrix, analogous to linear regression.
likelihood calculates a weighted average of standardized deviance and standardized Pearson residuals.
pearson calculates the Pearson residuals. Pearson residuals often have markedly skewed distributions
for nonnormal family distributions. Also see the deviance option above.
response calculates the differences between the observed and fitted outcomes.
score calculates the equation-level score, ∂ ln L/∂(xjβ).
working calculates the working residuals, which are response residuals weighted according to the
derivative of the link function.
 
Options
nooffset is relevant only if you specified offset(varname)for glm. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as xjb
rather than as xjb+offsetj.
adjusted adjusts the deviance residual to speed up the convergence to the limiting normal distribution.
The adjustment deals with adding to the deviance residual a higher-order term that depends on the
variance function family. This option is allowed only when deviance is specified.
standardized requests that the residual be multiplied by the factor (1 − h)^(−1/2), where h is the
diagonal of the hat matrix. This operation is done to account for the correlation between depvar
and its predicted value.
studentized requests that the residual be multiplied by one over the square root of the estimated
scale parameter.
modified requests that the denominator of the residual be modified to be a reasonable estimate
of the variance of depvar. The base residual is multiplied by the factor (k/w)^(1/2), where k is
either one or the user-specified dispersion parameter and w is the specified weight (or one if left
unspecified).
Remarks and examples
Remarks are presented under the following headings:
Predictions
Other postestimation commands
Predictions
Example 1
After glm estimation, predict may be used to obtain various predictions based on the model.
In example 2 of [R] glm, we mentioned that the complementary log-log link seemed to fit the data
better than the logit link. Now we go back and obtain the fitted values and deviance residuals:
. use http://www.stata-press.com/data/r13/ldose
. glm r ldose, family(binomial n) link(logit)
(output omitted )
. predict mu_logit
(option mu assumed; predicted mean r)
. predict dr_logit, deviance
. quietly glm r ldose, f(binomial n) l(cloglog)
. predict mu_cl
(option mu assumed; predicted mean r)
. predict dr_cl, d
. format mu_logit dr_logit mu_cl dr_cl %9.5f
. list r mu_logit dr_logit mu_cl dr_cl, sep(4)
r mu_logit dr_logit mu_cl dr_cl
1. 6 3.45746 1.28368 5.58945 0.18057
2. 13 9.84167 1.05969 11.28067 0.55773
3. 18 22.45139 -1.19611 20.95422 -0.80330
4. 28 33.89761 -1.59412 30.36942 -0.63439
5. 52 50.09584 0.60614 47.77644 1.28883
6. 53 53.29092 -0.12716 54.14273 -0.52366
7. 61 59.22216 1.25107 61.11331 -0.11878
8. 60 58.74297 1.59398 59.94723 0.32495
In six of the eight cases, |dr_logit| > |dr_cl|. The above represents only one of the many available
options for predict. See Hardin and Hilbe (2012) for a more in-depth examination.
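For instance, after the complementary log-log fit above, a few of the other documented statistics could be obtained as follows (a sketch; the new variable names are ours):
. predict pr_cl, pearson
. predict h_cl, hat
. predict cd_cl, cooksd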
Other postestimation commands
Technical note
After glm estimation, you may perform any of the postestimation commands that you would
perform after any other kind of estimation in Stata; see [U] 20 Estimation and postestimation
commands. Below we test the joint significance of all the interaction terms.
. use http://www.stata-press.com/data/r13/beetle, clear
. glm r beetle##c.ldose, family(binomial n) link(cloglog)
(output omitted )
. testparm i.beetle beetle#c.ldose
( 1) [r]2.beetle = 0
( 2) [r]3.beetle = 0
( 3) [r]2.beetle#c.ldose = 0
( 4) [r]3.beetle#c.ldose = 0
chi2( 4) = 249.69
Prob > chi2 = 0.0000
If you wanted to print the variance–covariance matrix of the estimators, you would type estat
vce.
If you use the linktest postestimation command, you must also specify the family() and
link() options; see [R] linktest.
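For example, after the cloglog fit of the beetle data above, the link test could be requested with the family and link restated; a sketch, with the output omitted:
. linktest, family(binomial n) link(cloglog)
  (output omitted )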
Methods and formulas
We follow the terminology used in Methods and formulas of [R] glm.
The deviance residual calculated by predict following glm is $r_j^D = \mathrm{sign}(y_j - \widehat{\mu}_j)\sqrt{d_j^2}$.
The Pearson residual calculated by predict following glm is

$$r_j^P = \frac{y_j - \widehat{\mu}_j}{\sqrt{V(\widehat{\mu}_j)}}$$

where $V(\widehat{\mu}_j)$ is the family-specific variance function.

$$V(\widehat{\mu}_j) = \begin{cases} \widehat{\mu}_j(1 - \widehat{\mu}_j/m_j) & \text{if binomial or Bernoulli } (m_j = 1) \\ \widehat{\mu}_j^2 & \text{if gamma} \\ 1 & \text{if Gaussian} \\ \widehat{\mu}_j^3 & \text{if inverse Gaussian} \\ \widehat{\mu}_j + k\widehat{\mu}_j^2 & \text{if negative binomial} \\ \widehat{\mu}_j & \text{if Poisson} \end{cases}$$
The response residuals are given by $r_i^R = y_i - \widehat{\mu}_i$. The working residuals are

$$r_i^W = (y_i - \widehat{\mu}_i)\left(\frac{\partial\eta}{\partial\mu}\right)_i$$

and the score residuals are

$$r_i^S = \frac{y_i - \widehat{\mu}_i}{V(\widehat{\mu}_i)}\left(\frac{\partial\eta}{\partial\mu}\right)_i^{-1}$$
Define $\widehat{W} = V(\widehat{\mu})$ and $X$ to be the covariate matrix. $h_i$, then, is the $i$th diagonal of the hat matrix
given by

$$\widehat{H} = \widehat{W}^{1/2} X (X^T \widehat{W} X)^{-1} X^T \widehat{W}^{1/2}$$
As a result, the likelihood residuals are given by

$$r_i^L = \mathrm{sign}(y_i - \widehat{\mu}_i)\left\{ h_i\,(r^P_{i'})^2 + (1 - h_i)(r^D_{i'})^2 \right\}^{1/2}$$

where $r^P_{i'}$ and $r^D_{i'}$ are the standardized Pearson and standardized deviance residuals, respectively.
By standardized, we mean that the residual is divided by $\{1 - h_i\}^{1/2}$.
Cook’s distance is an overall measure of the change in the regression coefficients caused by
omitting the $i$th observation from the analysis. Computationally, Cook’s distance is obtained as

$$C_i = \frac{(r^P_{i'})^2\, h_i}{k\,(1 - h_i)}$$

where $k$ is the number of regressors, including the constant.
Anscombe residuals are given by

$$r_i^A = \frac{A(y_i) - A(\widehat{\mu}_i)}{A'(\widehat{\mu}_i)\,\{V(\widehat{\mu}_i)\}^{1/2}}$$

where

$$A(\cdot) = \int \frac{d\mu}{V^{1/3}(\mu)}$$
Deviance residuals may be adjusted (predict, adjusted) to make the following correction:

$$r_{ia}^D = r_i^D + \frac{1}{6}\,\rho_3(\theta)$$

where $\rho_3(\theta)$ is a family-specific correction. See Hardin and Hilbe (2012) for the exact forms of $\rho_3(\theta)$
for each family.
References
Anscombe, F. J. 1953. Contribution of discussion paper by H. Hotelling “New light on the correlation coefficient and
its transforms”. Journal of the Royal Statistical Society, Series B 15: 229–230.
Hardin, J. W., and J. M. Hilbe. 2012. Generalized Linear Models and Extensions. 3rd ed. College Station, TX: Stata
Press.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
Newson, R. B. 2013. Attributable and unattributable risks and fractions and other scenario comparisons. Stata Journal
13: 672–698.
Also see
[R] glm — Generalized linear models
[R] regress postestimation — Postestimation tools for regress
[U] 20 Estimation and postestimation commands
Title
glogit — Logit and probit regression for grouped data
Syntax Menu Description
Options for blogit and bprobit Options for glogit and gprobit Remarks and examples
Stored results Methods and formulas References
Also see
Syntax
Logistic regression for grouped data
        blogit pos_var pop_var [indepvars] [if] [in] [, blogit_options]
Probit regression for grouped data
        bprobit pos_var pop_var [indepvars] [if] [in] [, bprobit_options]
Weighted least-squares logistic regression for grouped data
        glogit pos_var pop_var [indepvars] [if] [in] [, glogit_options]
Weighted least-squares probit regression for grouped data
        gprobit pos_var pop_var [indepvars] [if] [in] [, gprobit_options]
blogit_options Description
Model
noconstant suppress constant term
asis retain perfect predictor variables
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,bootstrap, or
jackknife
Reporting
level(#)set confidence level; default is level(95)
or report odds ratios
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
nocoef do not display coefficient table; seldom used
coeflegend display legend instead of statistics
bprobit_options Description
Model
noconstant suppress constant term
asis retain perfect predictor variables
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,bootstrap, or
jackknife
Reporting
level(#)set confidence level; default is level(95)
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
nocoef do not display coefficient table; seldom used
coeflegend display legend instead of statistics
glogit_options Description
SE
vce(vcetype)vcetype may be ols,bootstrap, or jackknife
Reporting
level(#)set confidence level; default is level(95)
or report odds ratios
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
gprobit_options Description
SE
vce(vcetype)vcetype may be ols,bootstrap, or jackknife
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands. fp is allowed
with blogit and bprobit.
nocoef and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
blogit
Statistics >Binary outcomes >Grouped data >Logit regression for grouped data
bprobit
Statistics >Binary outcomes >Grouped data >Probit regression for grouped data
glogit
Statistics >Binary outcomes >Grouped data >Weighted least-squares logit regression
gprobit
Statistics >Binary outcomes >Grouped data >Weighted least-squares probit regression
Description
blogit and bprobit produce maximum-likelihood logit and probit estimates on grouped
(“blocked”) data; glogit and gprobit produce weighted least-squares estimates. In the syntax
diagrams above, pos_var and pop_var refer to variables containing the total number of positive
responses and the total population.
See [R] logistic for a list of related estimation commands.
Options for blogit and bprobit
 
Model
noconstant; see [R]estimation options.
asis forces retention of perfect predictor variables and their associated perfectly predicted observations
and may produce instabilities in maximization; see [R]probit.
offset(varname),constraints(constraints),collinear; see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
or (blogit only) reports the estimated coefficients transformed to odds ratios, that is, ebrather than b.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated. or may be specified at estimation or when replaying
previously estimated results.
nocnsreport; see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
The following options are available with blogit and bprobit but are not shown in the dialog box:
nocoef specifies that the coefficient table not be displayed. This option is sometimes used by program
writers but is useless interactively.
coeflegend; see [R]estimation options.
Options for glogit and gprobit
 
SE
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (ols) and that use bootstrap or jackknife methods (bootstrap,jackknife);
see [R]vce option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
 
Reporting
level(#); see [R]estimation options.
or (glogit only) reports the estimated coefficients transformed to odds ratios, that is, ebrather than b.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated. or may be specified at estimation or when replaying
previously estimated results.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
The following option is available with glogit and gprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Maximum likelihood estimates
Weighted least-squares estimates
Maximum likelihood estimates
blogit produces the same results as logit and logistic, and bprobit produces the same
results as probit, but the “blocked” commands accept data in a slightly different “shape”. Consider
the following two datasets:
. use http://www.stata-press.com/data/r13/xmpl1
. list, sepby(agecat)
agecat exposed died pop
1. 0 0 0 115
2. 0 0 1 5
3. 0 1 0 98
4. 0 1 1 8
5. 1 0 0 69
6. 1 0 1 16
7. 1 1 0 76
8. 1 1 1 22
. use http://www.stata-press.com/data/r13/xmpl2
. list
agecat exposed deaths pop
1. 0 0 5 120
2. 0 1 8 106
3. 1 0 16 85
4. 1 1 22 98
These two datasets contain the same information; observations 1 and 2 of xmpl1 correspond to
observation 1 of xmpl2, observations 3 and 4 of xmpl1 correspond to observation 2 of xmpl2, and
so on.
The first observation of xmpl1 says that for agecat==0 and exposed==0, 115 subjects did not
die (died==0). The second observation says that for the same agecat and exposed groups, five
subjects did die (died==1). In xmpl2, the first observation says that there were five deaths of a
population of 120 in agecat==0 and exposed==0. These are two different ways of saying the same
thing. Both datasets are transcriptions from the following table, reprinted in Rothman, Greenland,
and Lash (2008, 260), for age-specific deaths from all causes for tolbutamide and placebo treatment
groups (University Group Diabetes Program 1970):
Age through 54 Age 55 and above
Tolbutamide Placebo Tolbutamide Placebo
Dead 8 5 22 16
Surviving 98 115 76 79
The data in xmpl1 are said to be “fully relational”, which is computer jargon meaning that each
observation corresponds to one cell of the table. Stata typically prefers data in this format. The second
form of storing these data in xmpl2 is said to be “folded”, which is computer jargon for something
less than fully relational.
blogit and bprobit deal with “folded” data and produce the same results that logit and probit
would have if the data had been stored in the “fully relational” representation.
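To see the correspondence concretely, the fully relational dataset can be collapsed into the folded form. This is an illustrative sketch, not part of the original entry; the resulting dataset should match xmpl2:
. use http://www.stata-press.com/data/r13/xmpl1, clear
. generate deaths = cond(died==1, pop, 0)
. collapse (sum) deaths (sum) pop, by(agecat exposed)
. list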
Example 1
For the tolbutamide data, the fully relational representation is preferred. We could then use
logistic, logit, and any of the epidemiological table commands; see [R] logistic, [R] logit, and
[ST] epitab. Nevertheless, there are occasions when the folded representation seems more natural.
With blogit and bprobit, we avoid the tedium of having to unfold the data:
. use http://www.stata-press.com/data/r13/xmpl2
. blogit deaths pop agecat exposed, or
Logistic regression for grouped data Number of obs = 409
LR chi2(2) = 22.47
Prob > chi2 = 0.0000
Log likelihood = -142.6212 Pseudo R2 = 0.0730
_outcome Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
agecat 4.216299 1.431519 4.24 0.000 2.167361 8.202223
exposed 1.404674 .4374454 1.09 0.275 .7629451 2.586175
_cons .0513818 .0170762 -8.93 0.000 .0267868 .0985593
If we had not specified the or option, results would have been presented as coefficients instead of as
odds ratios. The estimated odds ratio of death for tolbutamide exposure is 1.40, although the 95%
confidence interval includes 1. (By comparison, these data, in fully relational form and analyzed using
the cs command [see [ST] epitab], produce a Mantel–Haenszel weighted odds ratio of 1.40 with a
95% confidence interval of 0.76 to 2.59.)
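The stratified comparison mentioned in the parentheses could be reproduced from the fully relational dataset roughly as follows; this is a sketch only, and the exact cs options should be checked against [ST] epitab:
. preserve
. use http://www.stata-press.com/data/r13/xmpl1, clear
. cs died exposed [fweight=pop], by(agecat) or
  (output omitted )
. restore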
We can see the underlying coefficients by replaying the estimation results and not specifying the
or option:
. blogit
Logistic regression for grouped data Number of obs = 409
LR chi2(2) = 22.47
Prob > chi2 = 0.0000
Log likelihood = -142.6212 Pseudo R2 = 0.0730
_outcome Coef. Std. Err. z P>|z| [95% Conf. Interval]
agecat 1.438958 .3395203 4.24 0.000 .7735101 2.104405
exposed .3398053 .3114213 1.09 0.275 -.2705692 .9501798
_cons -2.968471 .33234 -8.93 0.000 -3.619846 -2.317097
Example 2
bprobit works like blogit, substituting the probit for the logit-likelihood function.
. bprobit deaths pop agecat exposed
Probit regression for grouped data Number of obs = 409
LR chi2(2) = 22.58
Prob > chi2 = 0.0000
Log likelihood = -142.56478 Pseudo R2 = 0.0734
_outcome Coef. Std. Err. z P>|z| [95% Conf. Interval]
agecat .7542049 .1709692 4.41 0.000 .4191114 1.089298
exposed .1906236 .1666059 1.14 0.253 -.1359179 .5171651
_cons -1.673973 .1619594 -10.34 0.000 -1.991408 -1.356539
Weighted least-squares estimates
Example 3
We have state data for the United States on the number of marriages (marriage), the total
population aged 18 years or more (pop18p), and the median age (medage). The dataset excludes
Nevada, so it has 49 observations. We now wish to estimate a logit equation for the marriage rate.
We will include age squared by specifying the term c.medage#c.medage:
. use http://www.stata-press.com/data/r13/census7
(1980 Census data by state)
. glogit marriage pop18p medage c.medage#c.medage
Weighted LS logistic regression for grouped data
Source SS df MS Number of obs = 49
F( 2, 46) = 12.89
Model .71598314 2 .35799157 Prob > F = 0.0000
Residual 1.27772858 46 .027776708 R-squared = 0.3591
Adj R-squared = 0.3313
Total 1.99371172 48 .041535661 Root MSE = .16666
Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -.6459349 .2828381 -2.28 0.027 -1.215258 -.0766114
c.medage#
c.medage .0095414 .0046608 2.05 0.046 .0001598 .0189231
_cons 6.503833 4.288977 1.52 0.136 -2.129431 15.1371
Example 4
We could just as easily have fit a grouped-probit model by typing gprobit rather than glogit:
. gprobit marriage pop18p medage c.medage#c.medage
Weighted LS probit regression for grouped data
Source SS df MS Number of obs = 49
F( 2, 46) = 12.94
Model .108222962 2 .054111481 Prob > F = 0.0000
Residual .192322476 46 .004180923 R-squared = 0.3601
Adj R-squared = 0.3323
Total .300545438 48 .006261363 Root MSE = .06466
Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -.2755007 .1121042 -2.46 0.018 -.5011548 -.0498466
c.medage#
c.medage .0041082 .0018422 2.23 0.031 .0004001 .0078163
_cons 2.357708 1.704446 1.38 0.173 -1.073164 5.788579
Stored results
blogit and bprobit store the following in e():
Scalars
e(N) number of observations
e(N_cds) number of completely determined successes
e(N_cdf) number of completely determined failures
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(r2_p) pseudo-R-squared
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(chi2) χ2
e(p) significance of model test
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) blogit or bprobit
e(cmdline) command as typed
e(depvar) variable containing number of positive responses and variable containing population size
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(mns) vector of means of the independent variables
e(rules) information about perfect predictors
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
glogit and gprobit store the following in e():
Scalars
e(N) number of observations
e(mss) model sum of squares
e(df_m) model degrees of freedom
e(rss) residual sum of squares
e(df_r) residual degrees of freedom
e(r2) R-squared
e(r2_a) adjusted R-squared
e(F) F statistic
e(rmse) root mean squared error
e(rank) rank of e(V)
Macros
e(cmd) glogit or gprobit
e(cmdline) command as typed
e(depvar) variable containing number of positive responses and variable containing population size
e(model) ols
e(title) title in estimation output
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Maximum likelihood estimates
Weighted least-squares estimates
Maximum likelihood estimates
The results reported by blogit and bprobit are obtained by maximizing a weighted logit- or
probit-likelihood function. Let F(·) denote the normal or logistic distribution function. The likelihood
of observing each observation in the data is then

F(x\beta)^s \{1 - F(x\beta)\}^{t-s}

where s is the number of successes and t is the population. The term above is counted as contributing
s + (t - s) = t degrees of freedom. All of this follows directly from the definitions of logit and
probit.
blogit and bprobit support the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
Weighted least-squares estimates
The logit function is defined as the log of the odds. If there is one explanatory variable, the
model can be written as
\log\left\{\frac{p_j}{1 - p_j}\right\} = \beta_0 + \beta_1 x_j + \epsilon_j    (1)

where p_j represents successes divided by population for the jth observation. (If there is more than
one explanatory variable, we simply interpret β_1 as a row vector and x_j as a column vector.) The
large-sample expectation of ε_j is zero, and its variance is

\sigma_j^2 = \frac{1}{n_j\, p_j(1 - p_j)}

where n_j represents the population for observation j. We can thus apply weighted least squares to
the observations, with weights proportional to n_j p_j(1 - p_j).
As in any feasible generalized least-squares problem, estimation proceeds in two steps. First, we
fit (1) by OLS and compute the predicted probabilities as

\hat{p}_j = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_j)}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_j)}

In the second step, we fit (1) by using analytic weights equal to n_j p̂_j(1 - p̂_j).
For gprobit, write Φ(·) for the cumulative normal distribution, and define z_j implicitly by
Φ(z_j) = p_j, where p_j is the fraction of successes for observation j. The probit model for one
explanatory variable can be written as

\Phi^{-1}(p_j) = \beta_0 + \beta_1 x_j + \epsilon_j

(If there is more than one explanatory variable, we simply interpret β_1 as a row vector and x_j as a
column vector.)
The expectation of ε_j is zero, and its variance is given by

\sigma_j^2 = \frac{p_j(1 - p_j)}{n_j\, \phi^2\{\Phi^{-1}(p_j)\}}

where φ(·) represents the normal density (Amemiya 1981, 1498). We can thus apply weighted least
squares to the observations with weights proportional to 1/σ_j^2. As for grouped logit, we use a two-step
estimator to obtain the weighted least-squares estimates.
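As a rough illustration of the two-step procedure just described, the grouped-logit weighted least-squares fit could be reproduced by hand with regress. This is only a sketch: the variables success (number of successes), pop (group size), and x are hypothetical, and it assumes no group has a success proportion of exactly 0 or 1.
. generate double p = success/pop
. generate double lgt = ln(p/(1 - p))
. regress lgt x                              // step 1: OLS on the empirical logits
. predict double xbhat, xb
. generate double phat = exp(xbhat)/(1 + exp(xbhat))
. generate double w = pop*phat*(1 - phat)    // weights proportional to n_j p_j(1 - p_j)
. regress lgt x [aweight=w]                  // step 2: weighted least squares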
References
Amemiya, T. 1981. Qualitative response models: A survey. Journal of Economic Literature 19: 1483–1536.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Rothman, K. J., S. Greenland, and T. L. Lash. 2008. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams
& Wilkins.
University Group Diabetes Program. 1970. A study of the effects of hypoglycemic agents on vascular complications
in patients with adult-onset diabetes, II: Mortality results. Diabetes 19, supplement 2: 789–830.
Also see
[R] glogit postestimation    Postestimation tools for glogit, gprobit, blogit, and bprobit
[R] logistic    Logistic regression, reporting odds ratios
[R] logit    Logistic regression, reporting coefficients
[R] probit    Probit regression
[R] scobit    Skewed logistic regression
[U] 20 Estimation and postestimation commands
Title
glogit postestimation — Postestimation tools for glogit, gprobit, blogit, and bprobit
Description Syntax for predict Menu for predict Options for predict
Also see
Description
The following postestimation commands are available after glogit, gprobit, blogit, and bprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
estat ic and lrtest are not appropriate after glogit and gprobit.
Syntax for predict
predict [type] newvar [if] [in] [, statistic]
statistic Description
Main
n        predicted count; the default
pr probability of a positive outcome
xb linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the expected count, that is, the estimated probability times pop var, which
is the total population.
pr calculates the predicted probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
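For example, after the blogit fit shown in example 1 of [R] glogit, the expected number of deaths in each group could be obtained as follows; this is only a sketch, and the new variable name nhat is arbitrary:
. predict double nhat, n
. list agecat exposed pop deaths nhat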
Also see
[R] glogit    Logit and probit regression for grouped data
[U] 20 Estimation and postestimation commands
Title
gmm — Generalized method of moments estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Interactive version
gmm ([eqname1:] <mexp1>) ([eqname2:] <mexp2>) . . . [if] [in] [weight] [, options]
Moment-evaluator program version
gmm moment prog [if] [in] [weight], {equations(namelist) | nequations(#)}
{parameters(namelist) | nparameters(#)} [options] [program options]
where
mexp_j is the substitutable expression for the jth moment equation and
moment prog is a moment-evaluator program.
options Description
Model
derivative(<dexp_mn>)    specify derivative of mexp_m with respect to parameter n; can be
specified more than once (interactive version only)
twostep use two-step GMM estimator; the default
onestep use one-step GMM estimator
igmm use iterative GMM estimator
Instruments
instruments(<eqlist>:varlist, noconstant)
specify instruments; can be specified more than once
xtinstruments(<eqlist>:varlist, lags(#1/#2))
specify panel-style instruments; can be specified more than once
Weight matrix
wmatrix(wmtype, independent)
specify weight matrix; wmtype may be robust,cluster clustvar,
hac kernel lags , or unadjusted
center center moments in weight-matrix computation
winitial(iwtype, independent)
specify initial weight matrix; iwtype may be identity,
unadjusted,xt xtspec, or the name of a Stata matrix
Options
variables(varlist)specify variables in model
nocommonesample do not restrict estimation sample to be the same for all equations
SE/Robust
vce(vcetype, independent)
vcetype may be robust,cluster clustvar,bootstrap,
jackknife,hac kernel lags, or unadjusted
quickderivatives use alternative method of computing numerical derivatives
for VCE
Reporting
level(#)set confidence level; default is level(95)
title(string)display string as title above the table of parameter estimates
title2(string)display string as subtitle
display options control column formats and line width
Optimization
from(initial values)specify initial values for parameters
igmmiterate(#)specify maximum number of iterations for iterated GMM estimator
igmmeps(#)specify #for iterated GMM parameter convergence criterion;
default is igmmeps(1e-6)
igmmweps(#)specify #for iterated GMM weight-matrix convergence criterion;
default is igmmweps(1e-6)
optimization options control the optimization process; seldom used
coeflegend display legend instead of statistics
You can specify at most one of these options.
These options may be specified only when igmm is specified.
program options Description
Model
evaluator options additional options to be passed to the moment-evaluator program
hasderivatives moment-evaluator program can calculate parameter-level derivatives
haslfderivatives moment-evaluator program can calculate linear-form derivatives
equations(namelist)specify moment-equation names
nequations(#)specify number of moment equations
parameters(namelist)specify parameter names
nparameters(#)specify number of parameters
You may not specify both hasderivatives and haslfderivatives.
You must specify equations(namelist)or nequations(#); you may specify both.
You must specify parameters(namelist)or nparameters(#); you may specify both.
bootstrap,by,jackknife,rolling,statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
<mexp_j> and <dexp_mn> are extensions of valid Stata expressions that also contain parameters
to be estimated. The parameters are enclosed in curly braces and must otherwise satisfy the naming
requirements for variables; {beta} is an example of a parameter. Also allowed is a notation of the
form {<eqname>:varlist} for linear combinations of multiple covariates and their parameters. For
example, {xb: mpg price turn} defines a linear combination of the variables mpg, price, and
turn. See Substitutable expressions under Remarks and examples below.
Menu
Statistics > Endogenous covariates > Generalized method of moments estimation
Description
gmm performs generalized method of moments (GMM) estimation. With the interactive version of
the command, you enter the moment equations directly into the dialog box or on the command line
using substitutable expressions. The moment-evaluator program version gives you greater flexibility
in exchange for increased complexity; with this version, you write a program in an ado-file that
calculates the moments based on a vector of parameters passed to it.
gmm can fit both single- and multiple-equation models, and it allows moment conditions of the
form E{z_i u_i(β)} = 0, where z_i is a vector of instruments and u_i(β) is often an additive regression
error term, as well as more general moment conditions of the form E{h_i(z_i; β)} = 0. gmm works
with cross-sectional, time-series, and longitudinal (panel) data.
Options
 
Model
derivative([eqname|#]/name = <dexp_mn>) specifies the derivative of moment equation eqname
or # with respect to parameter name. If eqname or # is not specified, gmm assumes that the derivative
applies to the first moment equation.
For a moment equation of the form E{z_mi u_mi(β)} = 0, derivative(m/β_j = <dexp_mn>) is
to contain a substitutable expression for ∂u_mi/∂β_j.
For a moment equation of the form E{h_mi(z_i; β)} = 0, derivative(m/β_j = <dexp_mn>) is
to contain a substitutable expression for ∂h_mi/∂β_j.
<dexp_mn> uses the same substitutable expression syntax as is used to specify moment equations.
If you declare a linear combination in a moment equation, you provide the derivative for the linear
combination; gmm then applies the chain rule for you. See Specifying derivatives under Remarks
and examples below for examples.
If you do not specify the derivative() option, gmm calculates derivatives numerically. You must
either specify no derivatives or specify all the derivatives that are not identically zero; you cannot
specify some analytic derivatives and have gmm compute the rest numerically.
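As a brief sketch (the variables y and x and the parameter name b are hypothetical), for the moment equation u = y - exp({b}*x), the derivative of u with respect to b is -x*exp({b}*x), so the analytic derivative could be supplied as
. gmm (y - exp({b}*x)), instruments(x) derivative(/b = -x*exp({b}*x))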
twostep,onestep, and igmm specify which estimator is to be used. You can specify at most one
of these options. twostep is the default.
twostep requests the two-step GMM estimator. gmm obtains parameter estimates based on the initial
weight matrix, computes a new weight matrix based on those estimates, and then reestimates the
parameters based on that weight matrix.
onestep requests the one-step GMM estimator. The parameters are estimated based on an initial
weight matrix, and no updating of the weight matrix is performed except when calculating the
appropriate variance–covariance (VCE) matrix.
igmm requests the iterative GMM estimator. gmm obtains parameter estimates based on the initial
weight matrix, computes a new weight matrix based on those estimates, reestimates the parameters
based on that weight matrix, computes a new weight matrix, and so on, to convergence. Convergence
is declared when the relative change in the parameter vector is less than igmmeps(), the relative
change in the weight matrix is less than igmmweps(), or igmmiterate() iterations have been
completed. Hall (2005, sec. 2.4 and 3.6) mentions that there may be gains to finite-sample efficiency
from using the iterative estimator.
 
Instruments
instruments(<eqlist>:varlist, noconstant)specifies a list of instrumental variables to be
used. If you specify a single moment equation, then you do not need to specify the equations to
which the instruments apply; you can omit the eqlist and simply specify instruments(varlist).
By default, a constant term is included in varlist; to omit the constant term, use the noconstant
suboption: instruments(varlist, noconstant).
If you specify a model with multiple moment conditions of the form

E\left\{\begin{matrix} z_{1i}\, u_{1i}(\beta) \\ \vdots \\ z_{qi}\, u_{qi}(\beta) \end{matrix}\right\} = 0
then you can specify the equations to indicate the moment equations for which the list of variables
is to be used as instruments if you do not want that list applied to all the moment equations. For
example, you might type
gmm (main:<mexp1>) (<mexp2>) (<mexp3>), instruments(z1 z2) ///
instruments(2: z3) instruments(main 3: z4)
Variables z1 and z2 will be used as instruments for all three equations, z3 will be used as an
instrument for the second equation, and z4 will be used as an instrument for the first and third
equations. Notice that we chose to supply a name for the first moment equation but not the second
two.
varlist may contain factor variables and time-series operators; see [U] 11.4.3 Factor variables and
[U] 11.4.4 Time-series varlists, respectively.
xtinstruments(<eqlist>:varlist, lags(#1/#2)) is for use with panel-data models in which the
set of available instruments depends on the time period. As with instruments(), you can prefix
the list of variables with equation names or numbers to target instruments to specific equations.
Unlike with instruments(), a constant term is not included in varlist. You must xtset your
data before using this option; see [XT]xtset.
If you specify
gmm . . ., xtinstruments(x, lags(1/.)) . . .
then for panel i and period t, gmm uses as instruments x_{i,t-1}, x_{i,t-2}, . . . , x_{i1}. More generally,
specifying xtinstruments(x, lags(#1/#2)) uses as instruments x_{i,t-#1}, . . . , x_{i,t-#2}; setting
#2 = . requests all available lags. #1 and #2 must be zero or positive integers.
gmm automatically excludes observations for which no valid instruments are available. It does,
however, include observations for which only a subset of the lags is available. For example, if you
request that lags one through three be used, then gmm will include the observations for the second
and third time periods even though fewer than three lags are available as instruments.
 
Weight matrix
wmatrix(wmtype[, independent]) specifies the type of weight matrix to be used in conjunction
with the two-step and iterated GMM estimators.
Specifying wmatrix(robust) requests a weight matrix that is appropriate when the errors are
independent but not necessarily identically distributed. wmatrix(robust) is the default.
Specifying wmatrix(cluster clustvar)requests a weight matrix that accounts for arbitrary
correlation among observations within clusters identified by clustvar.
Specifying wmatrix(hac kernel #)requests a heteroskedasticity- and autocorrelation-consistent
(HAC) weight matrix using the specified kernel (see below) with #lags. The bandwidth of a kernel
is equal to the number of lags plus one.
Specifying wmatrix(hac kernel opt) requests an HAC weight matrix using the specified kernel,
and the lag order is selected using Newey and West’s (1994) optimal lag-selection algorithm.
Specifying wmatrix(hac kernel) requests an HAC weight matrix using the specified kernel and
N - 2 lags, where N is the sample size.
There are three kernels available for HAC weight matrices, and you may request each one by using
the name used by statisticians or the name perhaps more familiar to economists:
bartlett or nwest requests the Bartlett (Newey–West) kernel;
parzen or gallant requests the Parzen (Gallant) kernel; and
quadraticspectral or andrews requests the quadratic spectral (Andrews) kernel.
Specifying wmatrix(unadjusted) requests a weight matrix that is suitable when the errors are
homoskedastic. In some applications, the GMM estimator so constructed is known as the (nonlinear)
two-stage least-squares (2SLS) estimator.
Including the independent suboption creates a weight matrix that assumes moment equations are
independent. This suboption is often used to replicate other models that can be motivated outside
the GMM framework, such as the estimation of a system of equations by system-wide 2SLS. This
suboption has no effect if only one moment equation is specified.
wmatrix() has no effect if onestep is also specified.
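For instance, a heteroskedasticity- and autocorrelation-consistent weight matrix based on the Bartlett (Newey–West) kernel with four lags could be requested as follows; the moment equation and the instruments z1 and z2 are placeholders:
. gmm ..., instruments(z1 z2) wmatrix(hac nwest 4)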
center requests that the sample moments be centered (demeaned) when computing GMM weight
matrices. By default, centering is not done.
winitial(iwtype[, independent]) specifies the weight matrix to use to obtain the first-step
parameter estimates.
Specifying winitial(unadjusted) requests a weight matrix that assumes the moment equations
are independent and identically distributed. This matrix is of the form (Z'Z)^{-1}, where Z represents
all the instruments specified in the instruments() option. To avoid a singular weight matrix,
you should specify at least q - 1 moment equations of the form E{z_hi u_hi(β)} = 0, where q is
the number of moment equations, or you should specify the independent suboption.
Including the independent suboption creates a weight matrix that assumes moment equations are
independent. Elements of the weight matrix corresponding to covariances between two moment
equations are set equal to zero. This suboption has no effect if only one moment equation is
specified.
winitial(unadjusted) is the default.
winitial(xt xtspec)is for use with dynamic panel-data models in which one of the moment
equations is specified in first-differences form. xtspec is a string consisting of the letters “L” and
“D”, the length of which is equal to the number of moment equations in the model. You specify
“L” for a moment equation if that moment equation is written in levels, and you specify “D” for a
moment equation if it is written in first-differences; xtspec is not case sensitive. When you specify
this option, you can specify at most one moment equation in levels and one moment equation
in first-differences. See the examples listed in Dynamic panel-data models under Remarks and
examples below.
winitial(identity) requests that the identity matrix be used.
winitial(matname) requests that Stata matrix matname be used. You cannot specify the independent
suboption if you specify winitial(matname).
 
Options
variables(varlist)specifies the variables in the model. gmm ignores observations for which any of
these variables has a missing value. If you do not specify variables(), then gmm assumes all the
observations are valid and issues an error message with return code 480 if any moment equations
evaluate to missing for any observations at the initial value of the parameter vector.
nocommonesample requests that gmm not restrict the estimation sample to be the same for all
equations. By default, gmm will restrict the estimation sample to observations that are available
for all equations in the model, mirroring the behavior of other multiple-equation estimators such
as nlsur,sureg, or reg3. For certain models, however, different equations can have different
numbers of observations. For these models, you should specify nocommonesample. See Dynamic
panel-data models below for one application of this option. You cannot specify weights if you
specify nocommonesample.
 
SE/Robust
vce(vcetype[, independent]) specifies the type of standard error reported, which includes types
that are robust to some kinds of misspecification (robust), that allow for intragroup correlation
(cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see
[R] vce_option.
vce(unadjusted) specifies that an unadjusted (nonrobust) VCE matrix be used; this, along with
the twostep option, results in the “optimal two-step GMM” estimates often discussed in textbooks.
The default vcetype is based on the wmtype specified in the wmatrix() option. If wmatrix()
is specified but vce() is not, then vcetype is set equal to wmtype. To override this behavior and
obtain an unadjusted (nonrobust) VCE matrix, specify vce(unadjusted).
Specifying vce(bootstrap) or vce(jackknife) results in standard errors based on the bootstrap
or jackknife, respectively. See [R]vce option,[R]bootstrap, and [R]jackknife for more information
on these VCEs.
The syntax for vcetypes other than bootstrap and jackknife is identical to those for wmatrix().
quickderivatives requests that an alternative method be used to compute the numerical derivatives
for the VCE. This option has no effect if you specify the derivative(), hasderivatives, or
haslfderivatives option.
The VCE depends on a matrix of partial derivatives that gmm must compute numerically unless you
supply analytic derivatives. This Jacobian matrix will be especially large if your model has many
instruments, moment equations, or parameters.
By default, gmm computes each element of the Jacobian matrix individually, searching for an optimal
step size each time. Although this procedure results in accurate derivatives, it is computationally
taxing: gmm may have to evaluate the moments of your model five or more times for each element
of the Jacobian matrix.
When you specify the quickderivatives option, gmm computes all derivatives corresponding to
a parameter at once, using a fixed step size proportional to the parameter’s value. This method
requires just two evaluations of the model’s moments to compute an entire column of the Jacobian
matrix and therefore has the most impact when you specify many instruments or moment equations.
Most of the time, the two methods produce virtually identical results, but the quickderivatives
method may fail if a moment equation is highly nonlinear or if instruments differ by orders of
magnitude. In the rare case where you specify quickderivatives and obtain suspiciously large
or small standard errors, try refitting your model without this option.
 
Reporting
level(#); see [R]estimation options.
title(string)specifies an optional title that will be displayed just above the table of parameter
estimates.
title2(string)specifies an optional subtitle that will be displayed between the title specified in
title() and the table of parameter estimates. If title2() is specified but title() is not,
title2() has the same effect as title().
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
 
Optimization
from(initial values)specifies the initial values to begin the estimation. You can specify a 1 ×k
matrix, where kis the number of parameters in the model, or you can specify a parameter name,
its initial value, another parameter name, its initial value, and so on. For example, to initialize
alpha to 1.23 and delta to 4.57, you would type
gmm ..., from(alpha 1.23 delta 4.57) ...
Initial values declared using this option override any that are declared within substitutable expres-
sions. If you specify a parameter that does not appear in your model, gmm exits with error code
480. If you specify a matrix, the values must be in the same order in which the parameters are
declared in your model. gmm ignores the row and column names of the matrix.
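For the matrix form of from(), a minimal sketch (reusing the hypothetical parameter names alpha and delta from above and leaving the moment equations as placeholders) is
. matrix init = (1.23, 4.57)
. gmm ..., from(init) ...
The values in init are matched to the parameters in the order in which the parameters are declared in the model.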
igmmiterate(#),igmmeps(#), and igmmweps(#)control the iterative process for the iterative
GMM estimator. These options can be specified only if you also specify igmm.
igmmiterate(#) specifies the maximum number of iterations to perform with the iterative GMM
estimator. The default is the number set using set maxiter (see [R] maximize), which is
16,000 by default.
igmmeps(#)specifies the convergence criterion used for successive parameter estimates when the
iterative GMM estimator is used. The default is igmmeps(1e-6). Convergence is declared when
the relative difference between successive parameter estimates is less than igmmeps() and the
relative difference between successive estimates of the weight matrix is less than igmmweps().
igmmweps(#)specifies the convergence criterion used for successive estimates of the weight matrix
when the iterative GMM estimator is used. The default is igmmweps(1e-6). Convergence is
declared when the relative difference between successive parameter estimates is less than
igmmeps() and the relative difference between successive estimates of the weight matrix is
less than igmmweps().
optimization_options: technique(), conv_maxiter(), conv_ptol(), conv_vtol(),
conv_nrtol(), tracelevel(). technique() specifies the optimization technique to use; gn
(the default), nr, dfp, and bfgs are allowed. conv_maxiter() specifies the maximum number
of iterations; conv_ptol(), conv_vtol(), and conv_nrtol() specify the convergence criteria
for the parameters, gradient, and scaled Hessian, respectively. tracelevel() allows you to obtain
additional details during the iterative process. See [M-5] optimize( ).
The following options pertain only to the moment-evaluator program version of gmm.
 
Model
evaluator options refer to any options allowed by your moment prog.
hasderivatives and haslfderivatives indicate that you have written your moment-evaluator
program to compute derivatives. You may specify one or the other but not both. If you do not
specify either of these options, gmm computes the derivatives numerically.
hasderivatives indicates that your moment-evaluator program computes parameter-level deriva-
tives.
haslfderivatives indicates that your moment-evaluator program computes equation-level
derivatives and is useful only when you specify the parameters of your model using the
<eqname>:<varname>syntax of the parameters() option.
See Details of moment-evaluator programs below for more information.
equations(namelist)specifies the names of the moment equations in the model. If you specify both
equations() and nequations(), the number of names in the former must match the number
specified in the latter.
nequations(#)specifies the number of moment equations in the model. If you do not specify
names with the equations() option, gmm numbers the moment equations 1, 2, 3, . . . . If you
specify both equations() and nequations(), the number of names in the former must match
the number specified in the latter.
parameters(namelist)specifies the names of the parameters in the model. The names of the
parameters must adhere to the naming conventions of Stata’s variables;
see [U] 11.3 Naming conventions.
Alternatively, you may specify a list of names in which each item in the list is of the form
<eqname>:<varname>, where eqname is an equation name used to group parameters, and
varname is the name of an existing variable or _cons to indicate a constant term. When you
use this syntax, gmm adorns the parameter vector passed to your evaluator program with these
names so that you can use matrix score (see [P]matrix score) to compute linear combinations
of parameters. These equation names are not related to the names you may give to the moment
equations.
If you specify both parameters() and nparameters(), the number of names in the former
must match the number specified in the latter.
nparameters(#)specifies the number of parameters in the model. If you do not specify names with
the parameters() option, gmm names them b1,b2,. . . ,b#. If you specify both parameters()
and nparameters(), the number of names in the former must match the number specified in the
latter.
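As a hedged sketch of the two declaration styles (the evaluator name mymoments and all variable and parameter names are hypothetical, and mymoments must already be defined as a moment-evaluator program):
. gmm mymoments, nequations(1) parameters(b1 b2 b3) instruments(x1 x2)
. gmm mymoments, nequations(1) parameters(xb:x1 xb:x2 xb:_cons) instruments(x1 x2)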
The following option is available with gmm but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Substitutable expressions
The weight matrix and two-step estimation
Obtaining standard errors
Exponential (Poisson) regression models
Specifying derivatives
Exponential regression models with panel data
Rational-expectations models
System estimators
Dynamic panel-data models
Details of moment-evaluator programs
Introduction
The generalized method of moments (GMM) estimator is a workhorse of modern econometrics and
is discussed in all the leading textbooks, including Cameron and Trivedi (2005, 2010), Davidson and
MacKinnon (1993, 2004), Greene (2012, 468–506), Ruud (2000), Hayashi (2000), Wooldridge (2010),
Hamilton (1994), and Baum (2006). An excellent treatise on GMM with a focus on time-series
applications is Hall (2005). The collection of papers by Mátyás (1999) provides both theoretical and
applied aspects of GMM. Here we give a brief introduction to the methodology and emphasize how
the various options of gmm are used.
The starting point for the generalized method of moments (GMM) estimator is the analogy principle,
which says we can estimate a parameter by replacing a population moment condition with its sample
analogue. For example, the mean of an independent and identically distributed (i.i.d.) population is
defined as the value µ such that the first (central) population moment is zero; that is, µ solves
E(y - µ) = 0, where y is a random draw from the population. The analogy principle tells us that to
obtain an estimate, µ̂, of µ, we replace the population-expectations operator with its sample analogue
(Manski 1988; Wooldridge 2010):

E(y - \mu) = 0 \;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat\mu) = 0 \;\Rightarrow\; \hat\mu = \frac{1}{N}\sum_{i=1}^{N} y_i

where N denotes the sample size and y_i represents the ith observation of y in our dataset. The estimator
µ̂ is known as the method of moments (MM) estimator, because we started with a population moment
condition and then applied the analogy principle to obtain an estimator that depends on the observed
data.
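As a tiny Stata illustration of this idea (the variable y and its dataset are hypothetical), gmm with only its default constant instrument imposes exactly this sample moment condition and reproduces the sample mean:
. gmm (y - {mu}), onestep
. summarize y    // the estimate of {mu} equals the mean reported here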
Ordinary least-squares (OLS) regression can also be viewed as an MM estimator. In the model
y = x'\beta + u

we assume that u has mean zero conditional on x: E(u|x) = 0. This conditional expectation implies
the unconditional expectation E(xu) = 0 because, using the law of iterated expectations,

E(xu) = E_x\{E(xu|x)\} = E_x\{x\,E(u|x)\} = 0

(Using the law of iterated expectations to derive unconditional expectations based on conditional
expectations, perhaps motivated by subject theory, is extremely common in GMM estimation.) Continuing,

E(xu) = E\{x(y - x'\beta)\} = 0

Applying the analogy principle,

E\{x(y - x'\beta)\} = 0 \;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N} x_i(y_i - x_i'\beta) = 0

so that

\hat\beta = \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i y_i

which is just the more familiar formula \hat\beta = (X'X)^{-1}X'y written using summation notation.
In both the previous examples, the number of parameters we were estimating equaled the number
of moment conditions. In the first example, we estimated one parameter, µ, and had one moment
condition E(y - µ) = 0. In the second example, the parameter vector β had k elements, as did
the vector of regressors x, yielding k moment conditions. Ignoring peculiar cases, a model of m
equations in m unknowns has a unique solution, and because the moment equations in these examples
were linear, we were able to solve for the parameters analytically. Had the moment conditions been
nonlinear, we would have had to use numerical techniques to solve for the parameters, but that is not
a significant limitation with modern computers.
What if we have more moment conditions than parameters? Say we have l moment conditions
and k parameters. A model of l > k equations in k unknowns does not have a unique solution.
Any size-k subset of the moment conditions would yield a consistent parameter estimate, though the
parameter estimate so obtained would in general be different based on which k moment conditions
we used.
For concreteness, let’s return to our regression model,
y = x'\beta + u

but we no longer wish to assume that E(xu) = 0; we suspect that the error term u affects one
or more elements of x. As a result, we can no longer use the OLS estimator. Suppose we have a
vector z with the properties that E(zu) = 0, that the rank of E(z'z) equals l, and that the rank
of E(z'x) = k. The first assumption simply states that z is not correlated with the error term. The
second assumption rules out perfect collinearity among the elements of z. The third assumption,
known as the rank condition in econometrics, ensures that z is sufficiently correlated with x and
that the estimator is feasible. If some elements of x are not correlated with u, then they should also
appear in z.
If l < k, then the rank of E(z'x) < k, violating the rank condition.
If l = k, then we can use the simpler MM estimator we already discussed; we would obtain what
is sometimes called the simple instrumental-variables estimator β̂ = (Σ_i z_i x_i')^{-1} Σ_i z_i y_i. The rank
condition ensures that Σ_i z_i x_i' is invertible, at least in the population.
If l > k, the GMM estimator chooses the value, β̂, that minimizes a quadratic function of the
moment conditions. We could define

\hat\beta \equiv \arg\min_{\beta} \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}' \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}    (1)

where for our linear regression example u_i(β) = y_i - x_i'β. This estimator tries to make the moment
conditions as close to zero as possible. This simple estimator, however, applies equal weight to each
of the moment conditions; and as we shall see later, we can obtain more efficient estimators by
choosing to weight some moment conditions more highly than others.
Consider the quadratic function

Q(\beta) = \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}' W \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}

where W is a symmetric positive-definite matrix known as a weight matrix. Then we define the GMM
estimator as

\hat\beta \equiv \arg\min_{\beta} Q(\beta)    (2)
Continuing with our regression model example, if we choose

W = \left(\frac{1}{N}\sum_i z_i z_i'\right)^{-1}    (3)

then we obtain

\hat\beta = \left\{\left(\frac{1}{N}\sum_i x_i z_i'\right)\left(\frac{1}{N}\sum_i z_i z_i'\right)^{-1}\left(\frac{1}{N}\sum_i z_i x_i'\right)\right\}^{-1} \left(\frac{1}{N}\sum_i x_i z_i'\right)\left(\frac{1}{N}\sum_i z_i z_i'\right)^{-1}\left(\frac{1}{N}\sum_i z_i y_i\right)

which is the well-known two-stage least-squares (2SLS) estimator. Our choice of weight matrix here
was based on the assumption that u was homoskedastic. A feature of GMM estimation is that by
selecting different weight matrices, we can obtain estimators that can tolerate heteroskedasticity,
clustering, autocorrelation, and other features of u. See [R] ivregress for more information about the
2SLS and linear GMM estimators.
Returning to the case where the model is “just identified”, meaning that l = k, if we apply the
GMM estimator, we will obtain the same estimate, β̂, regardless of our choice of W. Because l = k,
if a unique solution exists, it will set all the sample moment conditions to zero jointly, so W has no
impact on the value of β that minimizes the objective function.
We will highlight other features of the GMM estimator and the gmm command as we proceed
through examples. First, though, we discuss how to specify moment equations by using substitutable
expressions.
Substitutable expressions
To use the interactive version of gmm, you define the moment equations by using substitutable
expressions. In most applications, your moment conditions are of the form E{z_i u_i(β)}, where u_i(β)
is a residual term that depends on the parameter vector β as well as variables in your dataset, though
we suppress expressing the variables for notational simplicity; we refer to u_i(β) as the moment
equation to differentiate it from the moment conditions E{z_i u_i(β)} = 0.
Substitutable expressions in gmm work much like those used in nl and nlsur, though with one
important difference. For the latter two commands, you type the name of the dependent variable,
an equal sign, and then the regression function. For example, in nl, if you want to fit the function
y = f(x; β) + u, you would type
nl (y = <expression for f(x;β)>), ...
On the other hand, gmm requires you to write a substitutable expression for u; in this example,
u = y - f(x; β), so you would type
gmm (y - <expression for f(x;β)>), ...
The advantage of writing the substitutable expression directly in terms of uis that you are not
restricted to fitting models with additive error terms as you are with nl and nlsur.
You specify substitutable expressions just like any other mathematical expression involving scalars
and variables, such as those you would use with Stata’s generate command, except that the
parameters to be estimated are bound in braces. See [U] 13.2 Operators and [U] 13.3 Functions
for more information on expressions. Parameter names must follow the same conventions as variable
names. See [U] 11.3 Naming conventions.
For example, say that the tth observation on a sample moment is

u_t = 1 - \beta\,(1 + r_{t+1})(c_{t+1}/c_t)^{-\gamma}

where t denotes the time period, β and γ are the parameters to be estimated, and r and c are variables
in your dataset. Then you would type
gmm (1 - {beta}*((1 + F.r)*(F.c/c)^(-1*{gamma}))), ...
Because β and γ are parameters, we enclose them in braces. Also notice our use of the forward
operator to refer to the values of r and c one period ahead; time-series operators are allowed
in substitutable expressions as long as you have previously tsset (see [TS] tsset) your data. See
[U] 13.9 Time-series operators for more information on time-series operators.
To specify initial values for some parameters, you can include an equal sign and the initial value
after a parameter:
gmm (1 - {beta}*((1 + F.r)*(F.c/c)^(-1*{gamma=1}))), ...
would initialize γ to be one. If you do not specify an initial value for a parameter, it is initialized to
zero.
Frequently, even nonlinear functions contain linear combinations of variables. As an example,
suppose you have this moment equation:
u = \{y - \exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)\}/\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)
Instead of typing
gmm ((y - exp({beta1}*x1 + {beta2}*x2 + {beta3}*x3)) / ///
exp({beta1}*x1 + {beta2}*x2 + {beta3}*x3)) ...
you can type
gmm ((y - exp({xb:x1 x2 x3})) / exp({xb:})) ...
The notation {xb:x1 x2 x3} tells gmm that you want a linear combination of the variables x1, x2,
and x3. We named this linear combination xb, so gmm will name the three parameters corresponding
to the three variables xb_x1, xb_x2, and xb_x3. You can name the linear combination anything
you wish (subject to Stata's naming conventions for variable names); gmm then names the parameter
corresponding to variable x as lc_x, where lc is the name of your linear combination. You cannot use
the same name for both an individual parameter and a linear combination. You can, however, refer to
one parameter in a linear combination after it has been declared as you would any other parameter
by using the notation {lc_x}. Linear combinations do not include a constant term.
Once we have declared the variables in the linear combination xb, we can subsequently refer
to the linear combination in our substitutable expression by using the notation {xb:}. The colon is
not optional; it tells gmm that you are referring to a previously declared linear combination, not an
individual parameter. This shorthand notation is also handy when specifying derivatives, as we will
show later.
In general, there are three rules to follow when defining substitutable expressions:
1. Parameters of the model are bound in braces: {b0}, {param}, etc.
2. Initial values for parameters are given by including an equal sign and the initial value inside
the braces: {b0=1}, {param=3.571}, etc.
3. Linear combinations of variables can be included using the notation {eqname:varlist}: {xb:
mpg price weight}, {score: w x z}, etc. Parameters of linear combinations are initialized
to zero.
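A one-line sketch combining these three rules (all variable names here are hypothetical): {b0=0.5} is a single parameter given an initial value, and {xb: x1 x2} declares a linear combination whose parameters start at zero.
. gmm (y - {b0=0.5} - exp({xb: x1 x2})), instruments(x1 x2)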
If you specify initial values by using the from() option, they override whatever initial values
are given within the substitutable expression. Substitutable expressions are so named because, once
values are assigned to the parameters, the resulting expressions can be handled by generate and
replace.
Example 1: OLS regression
In Introduction, we stated that OLS is an MM estimator. Say that we want to fit the model
mpg = β_0 + β_1 weight + β_2 length + u
where uis an i.i.d. error term. We type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. gmm (mpg - {b1}*weight - {b2}*length - {b0}), instruments(weight length)
Step 1
Iteration 0: GMM criterion Q(b) = 475.4138
Iteration 1: GMM criterion Q(b) = 3.305e-20
Iteration 2: GMM criterion Q(b) = 3.795e-27
Step 2
Iteration 0: GMM criterion Q(b) = 7.401e-28
Iteration 1: GMM criterion Q(b) = 3.771e-31
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 74
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 -.0038515 .0019472 -1.98 0.048 -.0076678 -.0000351
/b2 -.0795935 .0677528 -1.17 0.240 -.2123864 .0531995
/b0 47.88487 7.505985 6.38 0.000 33.17341 62.59633
Instruments for equation 1: weight length _cons
Recall that the moment condition for OLS regression is E(xu) = 0, where x, the list of instruments, is
the same as the list of regressors in the model. In our command, we defined the residual term, u, inside
parentheses by using a substitutable expression; because linear combinations declared in substitutable
expressions do not include a constant term, we included our own ({b0}). Inside the instruments()
option, we listed our instruments; by default, gmm includes a constant term among the instrument list.
Because the number of moments equals the number of parameters we are estimating, the model
is said to be “just identified” or “exactly identified”. Therefore, the choice of weight matrix has no
impact on the solution to (2), and the criterion function Q(β) achieves its minimum value at zero.
The OLS estimator is a one-step GMM estimator, but we did not bother to specify the onestep
option because the model is just identified. Doing a second step of GMM estimation affects neither
the point estimates nor the standard errors, so to keep the syntax as simple as possible, we did not
include the onestep option. The first step of estimation resulted in Q(β) = 0 as expected, and the
second step of estimation did not change the minimized value of Q(β). (4 × 10^{-27} and 3 × 10^{-31}
are both zero for all practical purposes.)
When you do not specify either the wmatrix() or the vce() option, gmm reports heteroskedasticity-
robust standard errors. The parameter estimates reported here match those that we would obtain from
the command
. regress mpg weight length, vce(robust)
The standard errors reported by that regress command would be larger than those reported by gmm by
a factor of sqrt(74/71) because regress makes a small-sample adjustment to the estimated variance
matrix while gmm does not. Likewise, had we specified the vce(unadjusted) option with our gmm
command, then our standard errors would differ by a factor of sqrt(74/71) from those reported by
regress without the vce(robust) option.
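The adjustment factor itself is easy to verify directly (a quick check, not part of the original example):
. display sqrt(74/71)    // approximately 1.021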
Using the notation for linear combinations of parameters, we could have typed
. gmm (mpg - {xb: weight length} - {b0}), instruments(weight length)
and obtained identical results. Instead of having parameters b1 and b2, with this syntax we would
have parameters xb_weight and xb_length.
Example 2: Instrumental-variables regression
In Introduction, we mentioned that 2SLS can be viewed as a GMM estimator. In example 1 of
[R] ivregress, we fit by 2SLS a model of rental rates (rent) as a function of the value of owner-occupied
housing (hsngval) and the percentage of the population living in urban areas (pcturban):

rent = β_0 + β_1 hsngval + β_2 pcturban + u

We argued that random shocks that affect rental rates likely also affect housing values, so
we treated hsngval as an endogenous variable. As additional instruments, we used family income,
faminc, and three regional dummies (reg2–reg4).
To replicate the results of ivregress 2sls by using gmm, we type
. use http://www.stata-press.com/data/r13/hsng2
(1980 Census housing data)
. gmm (rent - {xb:hsngval pcturban} - {b0}),
> instruments(pcturban faminc reg2-reg4) vce(unadjusted) onestep
Step 1
Iteration 0: GMM criterion Q(b) = 56115.03
Iteration 1: GMM criterion Q(b) = 110.91583
Iteration 2: GMM criterion Q(b) = 110.91583
GMM estimation
Number of parameters = 3
Number of moments = 6
Initial weight matrix: Unadjusted Number of obs = 50
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_hsngval .0022398 .0003284 6.82 0.000 .0015961 .0028836
/xb_pcturban .081516 .2987652 0.27 0.785 -.504053 .667085
/b0 120.7065 15.22839 7.93 0.000 90.85942 150.5536
Instruments for equation 1: pcturban faminc reg2 reg3 reg4 _cons
We specified vce(unadjusted) so that we would obtain an unadjusted VCE matrix and our
standard errors would match those reported in [R]ivregress.
Pay attention to how we specified the instruments() option. In Introduction, we mentioned
that the moment conditions for the 2SLS estimator are E(zu) = 0, and we mentioned that if some
elements of x(the regressors) are not endogenous, then they should also appear in z. In this model,
we assume the regressor pcturban is exogenous, so we included it in the list of instrumental
variables. Commands like ivregress,ivprobit, and ivtobit accept standard varlists, so they can
deduce the exogenous regressors in the model. Because gmm accepts arbitrary functions in the form
of substitutable expressions, it has no way of discerning the exogenous variables of the model on its
own.
Also notice that we specified the onestep option. The 2SLS estimator is a one-step GMM estimator
that is based on a weight matrix that assumes the error terms are i.i.d. Unlike the previous example,
here we had more instruments than parameters, so the minimized value of Q(β)is nonzero. We
discuss the weight matrix and its relationship to two-step estimation next.
The weight matrix and two-step estimation
Recall our definition of the GMM estimator given in (2). The estimator, β̂, depends on the choice
of the weight matrix, W. Under relatively mild assumptions, our estimator, β̂, is consistent regardless
of the choice of W, so how are we to decide what W to use? The most common solution is to use
the two-step estimator, which we now describe.
A key result in Hansen’s (1982) seminal paper is that if we denote by S the covariance matrix of
the moment conditions, then the optimal (in a way we make precise later) GMM estimator is the one
that uses a weight matrix equal to the inverse of the moment covariance matrix. That is, if we let
S = Cov(zu), then we want to use W = S^{-1}. But how do we obtain S in the first place?
If we assume that the errors are i.i.d., then

\mathrm{Cov}(zu) = E(u^2 zz') = \sigma^2 E(zz')

where σ² is the variance of u. Because σ² is a positive scalar, we can ignore it when solving (2).
Thus we compute

\widehat{W}_1 = \left(\frac{1}{N}\sum_i z_i z_i'\right)^{-1}    (4)

which does not depend on any unknown model parameters. (Notice that Ŵ_1 is the same weight
matrix used in 2SLS.) Given Ŵ_1, we can solve (2) to obtain an initial estimate, say, β̂_1.
Our estimate, β̂_1, is consistent, so by Slutsky’s theorem, the sample residuals û computed at
this value of β will also be consistent. Using virtually the same arguments used to justify the
Huber/Eicker/White heteroskedasticity-robust VCE, if we assume that the residuals are independent
though not identically distributed, we can estimate S as

\widehat{S} = \frac{1}{N}\sum_i \hat{u}_i^2\, z_i z_i'

Then, in the second step, we re-solve (2), using Ŵ_2 = Ŝ^{-1}, yielding the two-step GMM estimate
β̂_2. If the residuals exhibit clustering, you can specify wmatrix(cluster varname) so that gmm
computes a weight matrix that does not assume the u_i's are independent within clusters identified by
varname. You can specify wmatrix(hac ...) to obtain weight matrices that are suitable for when
the u_i's exhibit autocorrelation as well as heteroskedasticity.
We could take the point estimates from the second round of estimation and use them to compute
yet another weight matrix, Ŵ_3, say, re-solve (2) yet again, and so on, stopping when the parameters
or weight matrix do not change much from one iteration to the next. This procedure is known as
the iterative GMM estimator and is obtained with the igmm option. Asymptotically, the two-step and
iterative GMM estimators have the same distribution. However, Hall (2005, 90) suggests that the
iterative estimator may have better finite-sample properties.
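Continuing the rental-rate example, the iterated estimator could be requested simply by adding the igmm option to the command used in example 3 (a sketch; output not shown):
. gmm (rent - {xb:hsngval pcturban} - {b0}), inst(pcturban faminc reg2-reg4) igmm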
Instead of computing Ŵ_1 as in (4), we could simply choose Ŵ_1 = I, the identity matrix.
The initial estimate, β̂_1, would still be consistent. You can request this behavior by specifying
the winitial(identity) option. However, if you specify all your moment equations of the form
E(zu) = 0, we recommend using the default winitial(unadjusted) instead; the rescaling of
the moment conditions implied by using a homoskedastic initial weight matrix makes the numerical
routines used to solve (2) more stable.
If you fit a model with more than one of the moment equations of the form E{h(z; β)} = 0, then
you must use winitial(identity) or winitial(unadjusted, independent). With moment
equations of that form, you do not specify a list of instruments, and gmm cannot evaluate (4)—the
matrix expression in parentheses would necessarily be singular, so it cannot be inverted.
Example 3: Two-step linear GMM estimator
From the previous discussion and the comments in Introduction, we see that the linear 2SLS
estimator is a one-step GMM estimator where we use the weight matrix defined in (4) that assumes
the errors are i.i.d. If we use the 2SLS estimate of β to obtain the sample residuals, compute a new
weight matrix based on those residuals, and then do a second step of GMM estimation, we obtain the
linear two-step GMM estimator as implemented by ivregress gmm.
In example 3 of [R]ivregress, we fit the model of rental rates as discussed in example 2 above.
We now allow the residuals to be heteroskedastic, though we will maintain our assumption that they
are independent. We type
. gmm (rent - {xb:hsngval pcturban} - {b0}), inst(pcturban faminc reg2-reg4)
Step 1
Iteration 0: GMM criterion Q(b) = 56115.03
Iteration 1: GMM criterion Q(b) = 110.91583
Iteration 2: GMM criterion Q(b) = 110.91583
Step 2
Iteration 0: GMM criterion Q(b) = .2406087
Iteration 1: GMM criterion Q(b) = .13672801
Iteration 2: GMM criterion Q(b) = .13672801 (backed up)
GMM estimation
Number of parameters = 3
Number of moments = 6
Initial weight matrix: Unadjusted Number of obs = 50
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_hsngval .0014643 .0004473 3.27 0.001 .0005877 .002341
/xb_pcturban .7615482 .2895105 2.63 0.009 .1941181 1.328978
/b0 112.1227 10.80234 10.38 0.000 90.95052 133.2949
Instruments for equation 1: pcturban faminc reg2 reg3 reg4 _cons
By default, gmm computes a heteroskedasticity-robust weight matrix before the second step of
estimation, though we could have specified wmatrix(robust) if we wanted to be explicit. Because
we did not specify the vce() option, gmm used a heteroskedasticity-robust one. Our results match
those in example 3 of [R]ivregress. Moreover, the only substantive difference between this example
and example 2 is that here we did not specify the onestep option, so we obtain the two-step
estimates.
gmm — Generalized method of moments estimation 715
Obtaining standard errors
This section is a bit more theoretical and can be skipped on first reading. However, the information
is sufficiently important that you should return to this section at some point.
So far in our discussion, we have focused on point estimation without much mention of how we
obtain the standard errors of the estimates. We also mentioned that if we choose Wto be the inverse
of the covariance matrix of the moment conditions, then we obtain the “optimal” GMM estimator. We
elaborate those points now.
Using mostly standard statistical arguments, we can show that for the GMM estimator defined in
(2), the variance of b
βis given by
Var(b
β) = 1
NnG(b
β)0WG(b
β)o1
G(b
β)0WSWG(b
β)nG(b
β)0WG(b
β)o1(5)
where
G(b
β) = 1
NXizi
ui
ββ=b
β
or G(b
β) = 1
NXi
hi
ββ=b
β
as the case may be and S=E(zuu0z0).
Assuming the vce(unadjusted) option is not specified, gmm reports standard errors based on the
robust variance matrix defined in (5). For the two-step estimator, Wis the weight matrix requested
using the wmatrix() option, and it is calculated based on the residuals obtained after the first
estimation step. The second-step point estimates and residuals are obtained, and Sis calculated based
on the specification of the vce() option. For the iterated estimator, Wis calculated based on the
second-to-last round of estimation, while Sis based on the residuals obtained after the last round of
estimation. Computation of the covariance matrix for the one-step estimator is, perhaps surprisingly,
more involved; we discuss the covariance matrix with the one-step estimator in the technical note at
the end of this section.
If we choose the weight matrix to be the inverse of the covariance matrix of the moment conditions
so that W=S1, then (5) simplifies substantially:
Var(b
β) = 1
NnG(b
β)0WG(b
β)o1(6)
The GMM estimator constructed using this choice of weight matrix along with the covariance matrix
in (6) is known as the “optimal” GMM estimator. One can show that if in fact W=S1, then the
variance in (6) is smaller than the variance in (5) of any other GMM estimator based on the same
moment conditions but with a different choice of weight matrix. Thus the optimal GMM estimator
is also known as the efficient GMM estimator, because it has the smallest variance of any estimator
based on the given moment conditions.
To obtain standard errors from gmm based on the optimal GMM estimator, you specify the
vce(unadjusted) option. We call that VCE unadjusted because we do not recompute the residuals
after estimation to obtain the matrix Srequired in (5) or allow for the fact that those residuals may
not be i.i.d. Some statistical packages by default report standard errors based on (6) and offer standard
errors based on (5) only as an option or not at all. While the optimal GMM estimator is theoretically
appealing, Cameron and Trivedi (2005, 177) suggest that in finite samples it need not perform better
than the GMM estimator that uses (5) to obtain standard errors.
716 gmm — Generalized method of moments estimation
Technical note
Computing the covariance matrix of the parameters after using the one-step estimator is actually a
bit more complex than after using the two-step or iterative estimators. We can illustrate most of the
intricacies by using linear regression with moment conditions of the form E{x(yx0β)}=0.
If you specify winitial(unadjusted) and vce(unadjusted), then the initial weight matrix
will be computed as
c
W1= 1
NX
i
xix0
i!1
(7)
Moreover, for linear regression, we can show that
G(b
β) = 1
NX
i
xix0
i
so that (6) becomes
Var(b
β) = 1
N
1
NX
i
xix0
i! 1
NX
i
xix0
i!1 1
NX
i
xix0
i!
1
= X
i
xix0
i!1
= (X0X)1(8)
However, we know that the nonrobust covariance matrix for the OLS estimator is actually bσ2(X0X)1.
What is missing from (8) is the scalar bσ2, the estimated variance of the residuals. When you use the
one-step estimator and specify winitial(unadjusted), the weight matrix (7) does not include the
bσ2term because gmm does not have a consistent estimate of βfrom which it can then estimate σ2.
The point estimates are still correct, because multiplying the weight matrix by a scalar factor does
not affect the solution to the minimization problem.
To circumvent this issue, if you specify winitial(unadjusted) and vce(unadjusted),gmm
uses the estimated b
β(which is consistent) to obtain a new unadjusted weight matrix that does include
the term bσ2so that evaluating (6) will yield correct standard errors.
If you use the two-step or iterated GMM estimators, this extra effort is not needed to obtain standard
errors because the first-step (and subsequent steps’) estimate of βis consistent and can be used to
estimate σ2or some other weight matrix based on the wmatrix() option. Straightforward algebra
shows that this extra effort is also not needed if you request any type of adjusted (robust) covariance
matrix with the one-step estimator.
A similar issue arises when you specify winitial(identity) and vce(unadjusted) with the
one-step estimator. Again the solution is to compute an unadjusted weight matrix after obtaining b
β
so that (6) provides the correct standard errors.
We have illustrated the problem and solution using a single-equation linear model. However, the
problem arises whenever you use the one-step estimator with an unadjusted VCE, regardless of the
number of equations; and gmm handles all the details automatically. Computation of Hansen’s J
statistic presents an identical issue, and gmm takes care of that as well.
gmm — Generalized method of moments estimation 717
If you supply your own initial weight matrix by using winitial(matname), then the standard
errors (as well as the Jstatistic reported by estat overid) are based on that weight matrix. You
should verify that the weight matrix you provide will yield appropriate statistics.
Exponential (Poisson) regression models
Exponential regression models are frequently encountered in applied work. For example, they can
be used as alternatives to linear regression models on log-transformed dependent variables, obviating
the need for post-hoc transformations to obtain predicted values in the original metric of the dependent
variable. When the dependent variable represents a discrete count variable, they are also known as
Poisson regression models; see Cameron and Trivedi (2013).
For now, we consider models of the form
y=exp(x0β) + u(9)
where uis a zero-mean additive error term so that E(y) = exp(x0β). Because the error term is
additive, if xrepresents strictly exogenous regressors, then we have the population moment condition
E[x{yexp(x0β)}] = 0(10)
Moreover, because the number of parameters in the model is equal to the number of instruments,
there is no point to using the two-step GMM estimator.
Example 4: Exponential regression
Cameron and Trivedi (2010, 323) fit a model of the number of doctor visits based on whether the
patient has private insurance, whether the patient has a chronic disease, gender, and income. Here
we fit that model by using gmm. To allow for potential excess dispersion, we will obtain a robust VCE
matrix, which is the default for gmm anyway. We type
. use http://www.stata-press.com/data/r13/docvisits
. gmm (docvis - exp({xb:private chronic female income}+{b0})),
> instruments(private chronic female income) onestep
Step 1
Iteration 0: GMM criterion Q(b) = 16.853973
Iteration 1: GMM criterion Q(b) = 2.2706472
Iteration 2: GMM criterion Q(b) = .19088097
Iteration 3: GMM criterion Q(b) = .00041101
Iteration 4: GMM criterion Q(b) = 3.939e-09
Iteration 5: GMM criterion Q(b) = 6.572e-19
GMM estimation
Number of parameters = 5
Number of moments = 5
Initial weight matrix: Unadjusted Number of obs = 4412
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_private .7986654 .1089891 7.33 0.000 .5850507 1.01228
/xb_chronic 1.091865 .0559888 19.50 0.000 .9821291 1.201601
/xb_female .4925481 .0585298 8.42 0.000 .3778317 .6072644
/xb_income .003557 .0010824 3.29 0.001 .0014356 .0056784
/b0 -.2297263 .1108607 -2.07 0.038 -.4470093 -.0124434
Instruments for equation 1: private chronic female income _cons
718 gmm — Generalized method of moments estimation
Our point estimates agree with those reported by Cameron and Trivedi to at least six significant digits;
the small discrepancies are attributable to different optimization techniques and convergence criteria
being used by gmm and poisson. The standard errors differ by a factor of sqrt(4412/4411)because
gmm uses Nin the denominator of the formula for the robust covariance matrix, while the robust
covariance matrix estimator used by poisson uses N1.
Technical note
That the GMM and maximum likelihood estimators of the exponential regression model coincide is
not a general property of these two classes of estimators. The maximum likelihood estimator solves
the score equations
1
N
N
X
i=1
ln `i
β=0
where liis the likelihood for the ith observation. These score equations can be viewed as the sample
analogues of the population moment conditions
Eln `i
β=0
establishing that maximum likelihood estimators represent a subset of the class of GMM estimators.
For the Poisson model,
ln `i=exp(x0
iβ) + yix0
iβln yi!
so the score equations are
1
N
N
X
i=1
xi{yiexp(x0
iβ)}=0
which are just the sample moment conditions implied by (10) that we used in the previous example.
That is why our results using gmm match Cameron and Trivedi’s results using poisson.
On the other hand, an intuitive set of moment conditions to consider for GMM estimation of a
probit model is
E[x{yΦ(x0β)}] = 0
where Φ() is the standard normal cumulative distribution function. Differentiating the likelihood
function for the maximum likelihood probit estimator, we can show that the corresponding score
equations are
1
N
N
X
i=1 xiyi
φ(x0
iβ)
Φ(x0
iβ)(1 yi)φ(x0
iβ)
1Φ(x0
iβ)=0
where φ() is the standard normal density function. These two moment conditions are not equivalent,
so the maximum likelihood and GMM probit estimators are distinct.
gmm — Generalized method of moments estimation 719
Example 5: Comparison of GMM and maximum likelihood
Using the automobile dataset, here we fit a probit model of foreign on gear ratio,length,
and headroom using first the score equations and then the intuitive set of GMM equations. We type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. global xb "{b1}*gear_ratio + {b2}*length + {b3}*headroom + {b0}"
. global phi "normalden($xb)"
. global Phi "normal($xb)"
. gmm (foreign*$phi/$Phi - (1-foreign)*$phi/(1-$Phi)),
> instruments(gear_ratio length headroom) onestep
(output omitted )
. estimates store ml
. gmm (foreign - $Phi), instruments(gear_ratio length headroom) onestep
(output omitted )
. estimates store gmm
. estimates table ml gmm, b se
Variable ml gmm
b1
_cons 2.9586277 2.8489213
.64042341 .63570247
b2
_cons -.02148933 -.02056033
.01382043 .01396954
b3
_cons .01136927 .02240761
.27278528 .2849891
b0
_cons -6.0222289 -5.8595615
3.5594588 3.5188028
legend: b/se
The coefficients on gear ratio and length are close for the two estimators. The GMM estimate of
the coefficient on headroom is twice that of the maximum likelihood estimate, though the relatively
large standard errors imply that this difference is not significant. You can verify that the coefficients
in the column marked “ml” match those you would obtain using probit. We have not discussed the
differences among standard errors based on the various GMM and maximum-likelihood covariance
matrix estimators to avoid tedious algebra, though you can verify that the robust covariance matrix
after one-step GMM estimation differs by only a finite-sample adjustment factor of (N/N 1)from
the robust covariance matrix reported by probit. Both the maximum likelihood and GMM probit
estimators require the normality assumption, and the maximum likelihood estimator is efficient if that
normality assumption is correct; therefore, in this particular example, there is no reason to prefer the
GMM estimator.
We can modify (10) easily to allow for endogenous regressors. Suppose that xjis endogenous in
the sense that E(u|xj)6= 0. Then (10) is no longer a valid moment condition. However, suppose we
have some variables other than xsuch that E(u|z) = 0. We can instead use the moment conditions
720 gmm — Generalized method of moments estimation
E(zu) = E[z{yexp(x0β)}] = 0(11)
As usual, if some elements of xare exogenous, then they should appear in zas well.
Example 6: Exponential regression with endogenous regressors
Returning to the model discussed in example 4, here we treat income as endogenous; unobservable
factors that determine a person’s income may also affect the number of times a person visits a doctor.
We use a person’s age and race as instruments. These are valid instruments if we believe that age
and race influence a person’s income but do not have a direct impact on the number of doctor visits.
(Whether this belief is justified is another matter; we test that belief in [R]gmm postestimation.)
Because we have more instruments (seven) than parameters (five), we have an overidentified model.
Therefore, the choice of weight matrix does matter. We will utilize the default two-step GMM estimator.
In the first step, we will use a weight matrix that assumes the errors are i.i.d. In the second step, we
will use a weight matrix that assumes heteroskedasticity. When you specify twostep, these are the
defaults for the first- and second-step weight matrices, so we do not have to use the winitial() or
wmatrix() options. We will again obtain a robust VCE, which is also the default. We type
. use http://www.stata-press.com/data/r13/docvisits
. gmm (docvis - exp({xb:private chronic female income}+{b0})),
> instruments(private chronic female age black hispanic) twostep
Step 1
Iteration 0: GMM criterion Q(b) = 16.910173
Iteration 1: GMM criterion Q(b) = .82276104
Iteration 2: GMM criterion Q(b) = .21832032
Iteration 3: GMM criterion Q(b) = .12685935
Iteration 4: GMM criterion Q(b) = .12672369
Iteration 5: GMM criterion Q(b) = .12672365
Step 2
Iteration 0: GMM criterion Q(b) = .00234641
Iteration 1: GMM criterion Q(b) = .00215957
Iteration 2: GMM criterion Q(b) = .00215911
Iteration 3: GMM criterion Q(b) = .00215911
GMM estimation
Number of parameters = 5
Number of moments = 7
Initial weight matrix: Unadjusted Number of obs = 4412
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_private .535335 .1599039 3.35 0.001 .2219291 .8487409
/xb_chronic 1.090126 .0617659 17.65 0.000 .9690668 1.211185
/xb_female .6636579 .0959884 6.91 0.000 .4755241 .8517918
/xb_income .0142855 .0027162 5.26 0.000 .0089618 .0196092
/b0 -.5983477 .138433 -4.32 0.000 -.8696713 -.327024
Instruments for equation 1: private chronic female age black hispanic _cons
Once we control for the endogeneity of income, we find that its coefficient has quadrupled in size.
Additionally, access to private insurance has less of an impact on the number of doctor visits and
gender has more of an impact.
gmm — Generalized method of moments estimation 721
Technical note
Although perhaps at first tempting, unlike the Poisson model, you cannot simply replace xin
the moment conditions for the probit (or logit) model with a vector of instruments, z, if you have
endogenous regressors. See Wilde (2008).
Mullahy (1997) considers a slightly more complicated version of the exponential regression model
that incorporates nonadditive unobserved heterogeneity. His model can be written as
yi=exp(x0
iβ)ηi+i
where ηi>0is an unobserved heterogeneity term that may be correlated with xi. One result from
his paper is that instead of using the additive moment condition (10), we can use the multiplicative
moment condition
Ezyexp(x0β)
exp(x0β)=E[z{yexp(x0β)1}] = 0(12)
Windmeijer and Santos Silva (1997) discuss the use of additive versus multiplicative moment conditions
with endogenous regressors and note that a set of instruments that satisfies the additive moment
conditions will not also satisfy the multiplicative moment conditions. They remark that which to
use is an empirical issue that can at least partially be settled by using the test of overidentifying
restrictions that is implemented by estat overid after gmm to ascertain whether the instruments for
a given model are valid. See [R]gmm postestimation for information on the test of overidentifying
restrictions.
Specifying derivatives
By default, gmm calculates derivatives numerically, and the method used produces accurate results
for the vast majority of applications. However, if you refit the same model repeatedly or else have
the derivatives available, then gmm will run more quickly if you supply it with analytic derivatives.
When you use the interactive version of gmm, you specify derivatives using substitutable expressions
in much the same way you specify the moment equations. There are three rules you must follow:
1. As with the substitutable expressions that define residual equations, you bind parameters of
the model in braces: {b0},{param}, etc.
2. You must specify a derivative for each parameter that appears in each moment equation. If
a parameter does not appear in a moment equation, then you do not specify a derivative for
that parameter in that moment equation.
3. If you declare a linear combination in an equation, then you specify a derivative with respect
to that linear combination. gmm applies the chain rule to obtain the derivatives with respect
to the individual parameters encompassed by that linear combination.
We illustrate with several examples.
Example 7: Derivatives for a single-equation model
Consider a simple exponential regression model with one exogenous regressor and a constant term.
We have
ui=yiexp(β0+β1xi)
Now ui
β0
=exp(β0+β1xi)and ui
β1
=xiexp(β0+β1xi)
722 gmm — Generalized method of moments estimation
In Stata, we type
. gmm (docvis - exp({b0} + {b1}*income)), instruments(income)
> deriv(/b0 = -1*exp({b0} + {b1}*income))
> deriv(/b1 = -1*income*exp({b0}+{b1}*income)) onestep
Step 1
Iteration 0: GMM criterion Q(b) = 9.1548611
Iteration 1: GMM criterion Q(b) = 3.5146131
Iteration 2: GMM criterion Q(b) = .01344695
Iteration 3: GMM criterion Q(b) = 3.690e-06
Iteration 4: GMM criterion Q(b) = 4.606e-13
Iteration 5: GMM criterion Q(b) = 1.501e-26
GMM estimation
Number of parameters = 2
Number of moments = 2
Initial weight matrix: Unadjusted Number of obs = 4412
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b0 1.204888 .0462355 26.06 0.000 1.114268 1.295507
/b1 .0046702 .0009715 4.81 0.000 .0027662 .0065743
Instruments for equation 1: income _cons
Notice how we specified the derivative() option for each parameter. We simply specified a slash,
the name of the parameter, an equal sign, then a substitutable expression that represents the derivative.
Because our model has only one residual equation, we do not need to specify equation numbers in
the derivative() options.
When you specify a linear combination of variables, your derivative should be with respect to the
entire linear combination. For example, say we have the residual equation
ui=yexp(x0
iβ+β0)
for which we would type
. gmm (y - exp({xb: x1 x2 x3} + {b0}) ...
Then in addition to the derivative ui/∂β0, we are to compute and specify
ui
x0
iβ=exp(x0
iβ+β0)
Using the chain rule, ui/∂βj=ui/∂(x0
iβ)×(x0
iβ)/∂βj=xij exp(x0
iβ+β0). Stata does this
last calculation automatically. It knows the variables in the linear combination, so all it needs is the
derivative of the residual function with respect to the linear combination. This allows you to change
the variables in your linear combination without having to change the derivatives.
Example 8: Derivatives with a linear combination
We refit the model described in the example illustrating exponential regression with endogenous
regressors, now providing analytic derivatives. We type
gmm — Generalized method of moments estimation 723
. gmm (docvis - exp({xb:private chronic female income}+{b0})),
> instruments(private chronic female age black hispanic)
> derivative(/xb = -1*exp({xb:} + {b0}))
> derivative(/b0 = -1*exp({xb:} + {b0}))
Step 1
Iteration 0: GMM criterion Q(b) = 16.910173
Iteration 1: GMM criterion Q(b) = .82270871
Iteration 2: GMM criterion Q(b) = .21831995
Iteration 3: GMM criterion Q(b) = .12685934
Iteration 4: GMM criterion Q(b) = .12672369
Iteration 5: GMM criterion Q(b) = .12672365
Step 2
Iteration 0: GMM criterion Q(b) = .00234641
Iteration 1: GMM criterion Q(b) = .00215957
Iteration 2: GMM criterion Q(b) = .00215911
Iteration 3: GMM criterion Q(b) = .00215911
GMM estimation
Number of parameters = 5
Number of moments = 7
Initial weight matrix: Unadjusted Number of obs = 4412
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_private .535335 .159904 3.35 0.001 .221929 .848741
/xb_chronic 1.090126 .0617659 17.65 0.000 .9690668 1.211185
/xb_female .6636579 .0959885 6.91 0.000 .475524 .8517918
/xb_income .0142855 .0027162 5.26 0.000 .0089618 .0196092
/b0 -.5983477 .138433 -4.32 0.000 -.8696714 -.327024
Instruments for equation 1: private chronic female age black hispanic _cons
In the first derivative() option, we specified the name of the linear combination, xb, instead of
an individual parameter’s name. We already declared the variables of our linear combination in the
substitutable expression for the residual equation, so in our substitutable expressions for the derivatives,
we can use the shorthand notation {xb:} to refer to it.
Our point estimates are identical to those we obtained earlier. The standard errors and confidence
intervals differ by only trivial amounts.
Exponential regression models with panel data
In addition to supporting cross-sectional and time-series data, gmm also works with panel-data
models. Here we illustrate gmms panel-data capabilities by expanding our discussion of exponential
regression models to allow for panel data. This also provides us the opportunity to demonstrate
the moment-evaluator program version of gmm. Our discussion is based on Blundell, Griffith, and
Windmeijer (2002). Also see Wooldridge (1999) for further discussion of nonlinear panel-data models.
First, we expand (9) for panel data. With individual heterogeneity term ηi, we have
E(yit|xit, ηi) = exp(x0
itβ+ηi) = µitνi
where µit =exp(x0
itβ)and νi=exp(ηi). Note that there is no constant term in this model because
its effect cannot be disentangled from νi. With an additive idiosyncratic error term, we have the
regression model
yit =µitνi+it
724 gmm — Generalized method of moments estimation
We do not impose the assumption E(xitηi) = 0, so ηican be considered a fixed effect in the sense
that it may be correlated with the regressors.
As discussed by Blundell, Griffith, and Windmeijer (2002), if xit is strictly exogenous, meaning
E(xitis) = 0for all tand s, then we can estimate the parameters of the model by using the sample
moment conditions X
iX
t
xit yit µit
yi
µi=0(13)
where yiand µiare the means of yit and µit for panel i, respectively. Because µidepends on
the parameters of the model, it must be recomputed each time gmm needs to evaluate the residual
equation. Therefore, we cannot use the substitutable expression version of gmm. Instead, we must use
the moment-evaluator program version.
The moment-evaluator program version of gmm functions much like the function-evaluator program
versions of nl and nlsur. The program you write is passed one or more variables to be filled in with
the residuals evaluated at the parameter values specified in an option passed to your program. For the
fixed-effects Poisson model with strictly exogenous regressors, our first crack at a function-evaluator
program is
program gmm_poi
version 13
syntax varlist if, at(name)
quietly {
tempvar mu mubar ybar
gen double ‘mu’ = exp(x1*‘at’[1,1] + x2*‘at’[1,2] ///
+ x3*‘at’[1,3]) ‘if’
egen double ‘mubar’ = mean(‘mu’) ‘if’, by(id)
egen double ‘ybar’ = mean(y) ‘if’, by(id)
replace ‘varlist’ = y - ‘mu’*‘ybar’/‘mubar’ ‘if’
}
end
You can save your program in an ado-file named name.ado, where name is the name you use for
your program; here we would save the program in the ado-file gmm poi.ado. Alternatively, if you
are working from within a do-file, you can simply define the program before calling gmm. The syntax
statement declares that we are expecting to receive a varlist, containing the names of variables whose
values we are to replace with the values of the residual equations, and an if expression that will
mark the estimation sample; because our model has one residual equation, varlist will consist of one
variable. at() is a required option to our program, and it will contain the name of a matrix containing
the parameter values at which we are to evaluate the residual equation. All moment-evaluator programs
must accept the varlist,if condition, and at() option.
The first part of our program computes µit. In the model we will fit shortly, we have three
regressors, named x1,x2, and x3. The ‘at’ vector will have three elements, one for each of those
variables. Notice that we included ‘if’ at the end of each statement that affects variables to restrict
the computations to the relevant estimation sample. The two egen statements compute µiand yi;
in the example dataset we will use shortly, the panel variable is named id, and for simplicity we
hardcoded that variable into our program as well. Finally, we compute the residual equation, which
is the portion of (13) bound in parentheses.
gmm — Generalized method of moments estimation 725
Example 9: Panel Poisson with strictly exogenous regressors
To fit our model, we type
. use http://www.stata-press.com/data/r13/poisson1
. gmm gmm_poi, nequations(1) parameters(b1 b2 b3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep
Step 1
Iteration 0: GMM criterion Q(b) = 51.99142
Iteration 1: GMM criterion Q(b) = .04345191
Iteration 2: GMM criterion Q(b) = 8.720e-06
Iteration 3: GMM criterion Q(b) = 7.115e-13
Iteration 4: GMM criterion Q(b) = 5.130e-27
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 409
(Std. Err. adjusted for 45 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 1.94866 .1000265 19.48 0.000 1.752612 2.144709
/b2 -2.966119 .0923592 -32.12 0.000 -3.14714 -2.785099
/b3 1.008634 .1156561 8.72 0.000 .781952 1.235315
Instruments for equation 1: x1 x2 x3
All three of our regressors are strictly exogenous, so they can serve as their own regressors.
There is no constant term in the model (it would be unidentified), so we exclude a constant term
from our list of instruments. We have one residual equation as indicated by nequations(1), and
we have three parameters, named b1,b2, and b3. The order in which you declare parameters in
the parameters() option determines the order in which they appear in the ‘at’ vector in the
moment-evaluator program. We specified vce(cluster id) to obtain standard errors that allow for
correlation among observations within each panel.
The program we just wrote is sufficient to fit the model to the poisson1 dataset, but if we want
to fit that model to other datasets, we would need to change the variable names and perhaps account
for having a different number of parameters as well. Despite those limitations, if you just want to fit
a single model, that program is adequate.
Next we take advantage of the ability to specify full equation names in the parameters() option
and rewrite our evaluator program so that we can more easily change the variables in our model.
This approach is particularly useful if some of the moment equations are linear in the parameters,
because then we can use matrix score (see [P]matrix score) to evaluate those moments.
726 gmm — Generalized method of moments estimation
Our new evaluator program is
program gmm_poieq
version 13
syntax varlist if, at(name)
quietly {
tempvar mu mubar ybar
matrix score double ‘mu’ = ‘at’ ‘if’, eq(#1)
replace ‘mu’ = exp(‘mu’)
egen double ‘mubar’ = mean(‘mu’) ‘if’, by(id)
egen double ‘ybar’ = mean(y) ‘if’, by(id)
replace ‘varlist’ = y - ‘mu’*‘ybar’/‘mubar’ ‘if’
}
end
Rather than using generate to compute the temporary variable ‘mu’, we used matrix score to
obtain the linear combination x0
itβand then called replace to compute exp(x0
itβ).
Example 10: Panel Poisson using matrix score
To fit our model, we type
. use http://www.stata-press.com/data/r13/poisson1
. gmm gmm_poieq, nequations(1) parameters(y:x1 y:x2 y:x3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep
Step 1
Iteration 0: GMM criterion Q(b) = 51.99142
Iteration 1: GMM criterion Q(b) = .04345191
Iteration 2: GMM criterion Q(b) = 8.720e-06
Iteration 3: GMM criterion Q(b) = 7.115e-13
Iteration 4: GMM criterion Q(b) = 5.106e-27
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 409
(Std. Err. adjusted for 45 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 1.94866 .1000265 19.48 0.000 1.752612 2.144709
x2 -2.966119 .0923592 -32.12 0.000 -3.14714 -2.785099
x3 1.008634 .1156561 8.72 0.000 .781952 1.235315
Instruments for equation 1: x1 x2 x3
Instead of specifying simple parameter names in the parameters() option, we specified an
equation name and the variables associated with that equation. We named our equation y, but you
could use any valid Stata name. When we use this syntax, the rows of the coefficient table are grouped
by the equation names.
Say we wanted to refit our model using just x1 and x3 as regressors. We do not need to make
any changes to gmm poieq. We just change the specification of the parameters() option:
. gmm gmm_poieq, nequations(1) parameters(y:x1 y:x3) ///
> instruments(x1 x3, noconstant) vce(cluster id) onestep
gmm — Generalized method of moments estimation 727
In this evaluator program, we have still hard-coded the name of the dependent variable. The next two
examples include methods to tackle that shortcoming.
Technical note
Say we specify the parameters() option like this:
. gmm . . ., parameters(y1:x1 y1:x2 y1:_cons y2:_cons y3:x1 y3:_cons)
Then the ‘at’ vector passed to our program will have the following column names attached to it:
‘at’[1,6]
y1: y1: y1: y2: y3: y3:
x1 x2 _cons _cons x1 _cons
Typing
. matrix score double eq1 = ‘at’, eq(#1)
is equivalent to typing
. generate double eq1 = x1*‘at’[1,1] + x2*‘at’[1,2] + ‘at’[1,3]
with one important difference. If we change some of the variables in the parameters() option when
we call gmm,matrix score will compute the correct linear combination. If we were to use the
generate statement instead, then every time we wanted to change the variables in our model, we
would have to modify that statement as well.
The command
. matrix score double alpha = ‘at’, eq(#2) scalar
is equivalent to
. scalar alpha = ‘at’[1,4]
Thus even if you specify equation and variable names in the parameters() option, you can still
have scalar parameters in your model.
When past values of the idiosyncratic error term affect the value of a regressor, we say that regressor
is predetermined. When one or more regressors are predetermined, sample moment condition (10) is
no longer valid. However, Chamberlain (1992) shows that a simple alternative is to consider moment
conditions of the form
X
i
T
X
t=2
xi,t1yi,t1µi,t1
yit
µit =0(14)
Also see Wooldridge (1997) and Windmeijer (2000) for other moment conditions that can be used
with predetermined regressors.
728 gmm — Generalized method of moments estimation
Example 11: Panel Poisson with predetermined regressors
Here we refit the previous model, treating all the regressors as predetermined and using the moment
conditions in (14). Our moment-evaluator program is
program gmm_poipre
version 13
syntax varlist if, at(name) mylhs(varlist)
quietly {
tempvar mu
matrix score double ‘mu’ = ‘at’ ‘if’, eq(#1)
replace ‘mu’ = exp(‘mu’)
replace ‘varlist’ = L.‘mylhs’ - L.‘mu’*‘mylhs’/‘mu’ ‘if’
}
end
To compute the moment equation, we used lag-operator notation so that Stata properly handles gaps
in our panel dataset. We also made our program accept an additional option that we will use to pass
in the dependent variable. When we specify this option in our gmm statement, it will get passed to
our evaluator program because gmm will not recognize the option as one of its own. Equation (14)
shows that we are to use the first lags of the regressors as instruments, so we type
. gmm gmm_poipre, mylhs(y) nequations(1) vce(cluster id) onestep
> parameters(y:x1 y:x2 y:x3) instruments(L.(x1 x2 x3), noconstant)
warning: 45 missing values returned for equation 1 at initial values
Step 1
Iteration 0: GMM criterion Q(b) = 52.288048
Iteration 1: GMM criterion Q(b) = 2.3599714
Iteration 2: GMM criterion Q(b) = .16951739
Iteration 3: GMM criterion Q(b) = .00020399
Iteration 4: GMM criterion Q(b) = 3.392e-10
Iteration 5: GMM criterion Q(b) = 9.230e-22
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 319
(Std. Err. adjusted for 45 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 2.025956 .2777156 7.30 0.000 1.481644 2.570269
x2 -2.909646 .2577577 -11.29 0.000 -3.414842 -2.404451
x3 1.202926 .1873571 6.42 0.000 .8357131 1.570139
Instruments for equation 1: L.x1 L.x2 L.x3
Here, like earlier with strictly exogenous regressors, the number of instruments equals the number of
parameters, so there is no gain to using the two-step or iterated estimators. However, if you do have
more instruments than parameters, you will most likely want to use one of those other estimators
instead.
Instead of making our program accept the mylhs() option, we could have used Stata’s coleq
extended macro function to determine the dependent variable based on the column names attached to
the ‘at’ vector; see [P]macro. Then we could refit our model with a different dependent variable by
changing the eqname used in the parameters() option. In the next example, we take this approach.
gmm — Generalized method of moments estimation 729
In the previous example, we used xi,t1as instruments. A more efficient GMM estimator would
also use xi,t2,xi,t3,...,xi,1as instruments in period tas well. gmms xtinstruments() option
allows you to specify instrument lists that grow as tincreases. Later we discuss the xtinstruments()
option in detail in the context of linear dynamic panel-data models.
When a regressor is contemporaneously correlated with the idiosyncratic error term, we say that
regressor is endogenous. Windmeijer (2000) shows that here we can use the moment condition
X
i
T
X
t=3
xi,t2yit
µit yi,t1
µi,t1
Here we use the second lag of the endogenous regressor as an instrument. If a variable is strictly
exogenous, it can of course serve as its own instrument.
Example 12: Panel Poisson with endogenous regressors
Here we refit the model, treating x3 as endogenous and x1 and x2 as strictly exogenous. Our
moment-evaluator program is
program gmm_poiend
version 13
syntax varlist if, at(name)
quietly {
tempvar mu
matrix score double ‘mu’ = ‘at’ ‘if’, eq(#1)
replace ‘mu’ = exp(‘mu’)
local mylhs : coleq ‘at’
local mylhs : word 1 of ‘mylhs’
replace ‘varlist’ = ‘mylhs’/‘mu’ - L.‘mylhs’/L.‘mu’ ‘if’
}
end
Now we call gmm using x1,x2, and L2.x3 as instruments:
730 gmm — Generalized method of moments estimation
. use http://www.stata-press.com/data/r13/poisson2
. gmm gmm_poiend, nequations(1) vce(cluster id) onestep
> parameters(y:x1 y:x2 y:x3) instruments(x1 x2 L2.x3, noconstant)
warning: 500 missing values returned for equation 1 at initial values
Step 1
Iteration 0: GMM criterion Q(b) = 43.799922
Iteration 1: GMM criterion Q(b) = .06998898
Iteration 2: GMM criterion Q(b) = .04165161
Iteration 3: GMM criterion Q(b) = .03573502
Iteration 4: GMM criterion Q(b) = .00001981
Iteration 5: GMM criterion Q(b) = 3.168e-12
Iteration 6: GMM criterion Q(b) = 1.529e-23
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 3266
(Std. Err. adjusted for 500 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 1.857766 .2631454 7.06 0.000 1.34201 2.373521
x2 -2.865858 .2151529 -13.32 0.000 -3.28755 -2.444167
x3 4.961867 14.91462 0.33 0.739 -24.27025 34.19399
Instruments for equation 1: x1 x2 L2.x3
The warning at the top of the output indicates that we have 500 panels in our dataset. Our moment
equation includes lagged terms and therefore cannot be evaluated for the first time period within each
panel. Warning messages like that can be ignored once you know why they occurred. If you receive
a warning message that you were not expecting, you should first investigate the cause of the warning
before trusting the results. As in the previous example, instead of using just xi,t2as an instrument,
we could use all further lags of xit as instruments as well.
Rational-expectations models
Macroeconomic models typically assume that agents’ expectations about the future are formed
rationally. By rational expectations, we mean that agents use all information available when forming
their forecasts, so the forecast error is uncorrelated with the information available when the forecast was
made. Say that at time t, people make a forecast, byt+1, of variable yin the next period. If tdenotes
all available information at time t, then rational expectations implies that E{(byt+1 yt+1)|t}= 0.
If tdenotes observable variables such as interest rates or prices, then this conditional expectation
can serve as the basis of a moment condition for GMM estimation.
Example 13: Fitting a Euler equation
In a well-known article, Hansen and Singleton (1982) consider a model of portfolio decision
making and discuss parameter estimation using GMM. We will consider a simple example with one
asset in which the agent can invest. A consumer wants to maximize the present value of his lifetime
utility derived from consuming the good. On the one hand, the consumer is impatient, so he would
rather consume today than wait until tomorrow. On the other hand, if he consumes less today, he can
invest more of his money, earning more interest that he can then use to consume more of the good
tomorrow. Thus there is a tradeoff between having his cake today or sacrificing a bit today to have
more cake tomorrow.
gmm — Generalized method of moments estimation 731
If we assume a specific form for the agent’s utility function, known as the constant relative-risk
aversion utility function, we can show that the Euler equation is
Ezt1β(1 + rt+1)(ct+1/ct)γ=0
where βand γare the parameters to estimate, rtis the return to the financial asset, and ctis
consumption in period t.βmeasures the agent’s discount factor. If βis near one, the agent is patient
and is more willing to forgo consumption this period. If βis close to zero, the agent is less patient
and prefers to consume more now. The parameter γcharacterizes the agent’s utility function. If γ
equals zero, the utility function is linear. As γtends toward one, the utility function tends toward
u=log(c).
We have data on 3-month Treasury bills (rt) and consumption expenditures (ct). As instruments,
we will use lagged rates of return and past growth rates of consumption. We will use the two-step
estimator and a weight matrix that allows for heteroskedasticity and autocorrelation up to four lags
with the Bartlett kernel. In Stata, we type
. use http://www.stata-press.com/data/r13/cr
. generate cgrowth = c / L.c
(1 missing value generated)
. gmm (1 - {b=1}*(1+F.r)*(F.c/c)^(-1*{gamma=1})),
> inst(L.r L2.r cgrowth L.cgrowth) wmat(hac nw 4) twostep
warning: 1 missing value returned for equation 1 at initial values
Step 1
Iteration 0: GMM criterion Q(b) = .00226482
Iteration 1: GMM criterion Q(b) = .00054369
Iteration 2: GMM criterion Q(b) = .00053904
Iteration 3: GMM criterion Q(b) = .00053904
Step 2
Iteration 0: GMM criterion Q(b) = .0600729
Iteration 1: GMM criterion Q(b) = .0596369
Iteration 2: GMM criterion Q(b) = .0596369
GMM estimation
Number of parameters = 2
Number of moments = 5
Initial weight matrix: Unadjusted Number of obs = 239
GMM weight matrix: HAC Bartlett 4
HAC
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b .9204617 .0134646 68.36 0.000 .8940716 .9468518
/gamma -4.222361 1.473895 -2.86 0.004 -7.111143 -1.333579
HAC standard errors based on Bartlett kernel with 4 lags.
Instruments for equation 1: L.r L2.r cgrowth L.cgrowth _cons
The warning message at the top of the output appears because the forward operator in our substitutable
expression says that residuals can be computed only for 239 observations; our dataset contains 240
observations. Our estimate of βis near one, in line with expectations and published results. However,
our estimate of γimplies risk-loving behavior and therefore a poorly specified model.
732 gmm — Generalized method of moments estimation
System estimators
In many economic models, two or more variables are determined jointly through a system of
simultaneous equations. Indeed, some of the earliest work in econometrics, including that of the
Cowles Commission, was centered around estimation of the parameters of simultaneous equations.
The 2SLS and IV estimators we have already discussed are used in some circumstances to estimate
such parameters. Here we focus on the joint estimation of all the parameters of systems of equations,
and we begin with the well-known three-stage least-squares (3SLS) estimator.
Recall that the 2SLS estimator is based on the moment conditions E(zu) = 0. The 2SLS estimator
can be used to estimate the parameters of one equation of a system of structural equations. Moreover,
with the 2SLS estimator, we do not even need to specify the structural relationship among all the
endogenous variables; we need to specify only the equation on which interest focuses and simply
assume reduced-form relationships among the endogenous regressors of the equation of interest and
the exogenous variables of the model. If we are willing to specify the complete system of structural
equations, then assuming our model is correctly specified, by estimating all the equations jointly, we
can obtain estimates that are more efficient than equation-by-equation 2SLS.
In [R]reg3, we fit a simple two-equation macroeconomic model:
consump =β0+β1wagepriv +β2wagegovt +1(15)
wagepriv =β3+β4consump +β5govt +β6capital1 +2(16)
where consump represents aggregate consumption; wagepriv and wagegovt are total wages paid
by the private and government sectors, respectively; govt is government spending; and capital1 is
the previous period’s capital stock. We are not willing to assume that 1and 2are independent, so
we must treat both consump and wagepriv as endogenous. Suppose that a random shock makes 2
positive. Then by (16), wagepriv will be higher than it otherwise would. Moreover, 1will either
be higher or lower, depending on the correlation between it and 2. The shock to 2has made both
wagepriv and 1move, implying that in (15) wagepriv is an endogenous regressor. A similar
argument shows that consump is an endogenous regressor in the second equation. In our model,
wagegovt,govt, and capital1 are all exogenous variables.
Let z1and z2denote the instruments for the first and second equations, respectively; we will
discuss what comprises them shortly. We have two sets of moment conditions:
Ez1(consump β0β1wagepriv β2wagegovt)
z2(wagepriv β3β4consump β5govt β6capital1)=0(17)
One of the defining characteristics of 3SLS is that the errors are homoskedastic conditional on the
instrumental variables. Using this assumption, we have
Ez11
z22{z0
11z0
22}=σ11E(z1z0
1)σ12E(z1z0
2)
σ21E(z2z0
1)σ22E(z2z0
2)(18)
where σij =cov(i, j). Let Σdenote the 2 ×2 matrix with typical element σij .
The second defining characteristic of the 3SLS estimator is that it uses all the exogenous variables
as instruments for all equations; here z1=z2= (wagegovt,govt,capital1,1), where the 1
indicates a constant term. From our discussion on the weight matrix and two-step estimation, we
want to use the sample analogue of the matrix inverse of the right-hand side of (18) as our weight
matrix.
To implement the 3SLS estimator, we apparently need to know Σor at least have a consistent
estimator of it. The solution is to fit (15) and (16) by 2SLS, use the sample residuals b1and b2to
estimate Σ, then estimate the parameters of (17) via GMM by using the weight matrix just discussed.
gmm — Generalized method of moments estimation 733
Example 14: 3SLS estimation
3SLS is easier to do using gmm than it sounds. The 3SLS estimator is a two-step GMM estimator. In
the first step, we do the equivalent of 2SLS on each equation, and then we compute a weight matrix
based on (18). Finally, we perform a second step of GMM with this weight matrix.
In Stata, we type
. use http://www.stata-press.com/data/r13/klein, clear
. gmm (eq1: consump - {b0} - {xb: wagepriv wagegovt})
> (eq2: wagepriv - {c0} - {xc: consump govt capital1}),
> instruments(eq1: wagegovt govt capital1)
> instruments(eq2: wagegovt govt capital1)
> winitial(unadjusted, independent) wmatrix(unadjusted) twostep
Step 1
Iteration 0: GMM criterion Q(b) = 4195.4487
Iteration 1: GMM criterion Q(b) = .22175631
Iteration 2: GMM criterion Q(b) = .22175631 (backed up)
Step 2
Iteration 0: GMM criterion Q(b) = .09716589
Iteration 1: GMM criterion Q(b) = .07028208
Iteration 2: GMM criterion Q(b) = .07028208
GMM estimation
Number of parameters = 7
Number of moments = 8
Initial weight matrix: Unadjusted Number of obs = 22
GMM weight matrix: Unadjusted
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b0 19.3559 3.583772 5.40 0.000 12.33184 26.37996
/xb_wagepriv .8012754 .1279329 6.26 0.000 .5505314 1.052019
/xb_wagegovt 1.029531 .3048424 3.38 0.001 .432051 1.627011
/c0 14.63026 10.26693 1.42 0.154 -5.492552 34.75306
/xc_consump .4026076 .2567312 1.57 0.117 -.1005764 .9057916
/xc_govt 1.177792 .5421253 2.17 0.030 .1152461 2.240338
/xc_capital1 -.0281145 .0572111 -0.49 0.623 -.1402462 .0840173
Instruments for equation 1: wagegovt govt capital1 _cons
Instruments for equation 2: wagegovt govt capital1 _cons
The independent suboption of the winitial() option tells gmm to assume that the residuals
are independent across equations; this suboption sets σ21 =σ12 = 0 in (18). Assuming both
homoskedasticity and cross-equation independence is equivalent to fitting the two equations of our
model independently by 2SLS. The wmatrix() option controls how the weight matrix is computed
based on the first-step parameter estimates before the second step of estimation; here we request a
weight matrix that assumes conditional homoskedasticity but that does not impose the cross-equation
independence like the initial weight matrix we used. In this example, we also illustrated how to
name equations and how equation names can be used in the instruments() option. Our results are
identical to those in [R]reg3.
We could have specified our instruments with the syntax
instruments(wagegovt govt capital1)
because gmm uses the variables listed in the instruments() option for all equations unless you
specify which equations the list of instruments is to be used with. However, we wanted to emphasize
that the same instruments are being used for both equations; in a moment, we will discuss an estimator
that does not use the same instruments in all equations.
734 gmm — Generalized method of moments estimation
In the previous example, if we omit the twostep option, the resulting coefficients will be
equivalent to equation-by-equation 2SLS, which Wooldridge (2010, 216) calls the “system 2SLS
estimator”. Eliminating the twostep option makes the wmatrix() option irrelevant, so that option
can be eliminated as well.
So far, we have developed the traditional 3SLS estimator. Wooldridge (2010, chap. 8) discusses the
GMM 3SLS” estimator that extends the traditional 3SLS estimator by allowing for heteroskedasticity
and different instruments for different equations.
Generalizing (18) to an arbitrary number of equations, we have
E(Z00Z) = E(Z0ΣZ) (19)
where
Z=
z10··· 0
0 z2··· 0
.
.
..
.
.....
.
.
0 0 ··· zm
and Σis now m×m. Equation (19) is the multivariate analogue of a homoskedasticity assumption;
for each equation, the error variance is constant for all observations, as is the covariance between
any two equations’ errors.
We can relax this homoskedasticity assumption by considering different weight matrices. For
example, if we continue to assume that observations are independent but not necessarily identically
distributed, then by specifying wmatrix(robust), we would obtain a weight matrix that allows for
heteroskedasticity:
c
W=1
NX
i
Z0
ibib0
iZi
This is the weight matrix in Wooldridge’s (2010, 218) Procedure 8.1, “GMM with Optimal Weighting
Matrix”. By default, gmm would report standard errors based on his covariance matrix (8.27); specifying
vce(unadjusted) would provide the optimal GMM standard errors. If you have multiple observations
for each individual or firm in your dataset, you could specify wmatrix(cluster id), where id
identifies individuals or firms. This would allow arbitrary within-individual correlation, though it does
not account for an individual-specific fixed or random effect. In both cases, we would continue to
use winitial(unadjusted, independent) so that the first-step estimates are the system 2SLS
estimates.
Wooldridge (2010, sec. 9.6) discusses instances where it is necessary to use different instruments
in different equations. The GMM 3SLS estimator with different instruments in different equations but
with conditional homoskedasticity is what Hayashi (2000, 275) calls the “full-information instrumental
variables efficient” (FIVE) estimator. Implementing the FIVE estimator is easy with gmm. For example,
say we have a two-equation system, where kids,age,income, and education are all valid
instruments for the first equation; but education is not a valid instrument for the second equation.
Then our syntax would take the form
gmm (<mexp 1>) (<mexp 2>), instruments(1:kids age income education)
instruments(2:kids age income)
The following syntax is equivalent:
gmm (<mexp 1>) (<mexp 2>), instruments(kids age income)
instruments(1:education)
gmm — Generalized method of moments estimation 735
Because we did not specify a list of equations in the second example’s first instruments() option,
those variables are used as instruments in both equations. You can use whichever syntax you prefer.
The first requires a bit more typing but is arguably more transparent.
If all the regressors in the model are exogenous, then the traditional 3SLS estimator is the seemingly
unrelated regression (SUR) estimator. Here you would specify all the regressors as instruments.
Dynamic panel-data models
Commands in Stata that work with panel data expect the data to be in the “long” format, meaning
that each row of the dataset consists of one subobservation that is a member of a logical observation
(represented by the panel identifier variable). See [D]reshape for a discussion of the long versus
“wide” data forms. gmm is no exception in this respect when used with panel data. From a theoretical
perspective, however, it is sometimes easier to view GMM estimators for panel data as system estimators
in which we have Nobservations on a system of Tequations, where Nand Tare the number of
observations and panels, respectively, rather than a single-equation estimator with NT observations.
Usually, each of the Tequations will in fact be the same, though we will want to specify different
instruments for each of these equations.
In a dynamic panel-data model, lagged values of the dependent variable are included as regressors.
Here we consider a simple model with one lag of the dependent variable yas a regressor and a vector
of strictly exogenous regressors, xit:
yit =ρyi,t1+x0
itβ+ui+it (20)
uican be either a fixed- or a random-effect term, in the sense that we do not require xit to be
independent of it. Even with the assumption that it is i.i.d., the presence of both yi,t1and uiin
(20) renders both the standard fixed- and random-effects estimators to be inconsistent because of the
well-known Nickell (1981) bias. OLS regression of yit on yi,t1and xit also produces inconsistent
estimates, because yi,t1will be correlated with the error term.
Technical note
Stata has the xtabond,xtdpd, and xtdpdsys commands (see [XT]xtabond,[XT]xtdpd, and
[XT]xtdpdsys) to fit equations like (20), and for everyday use those commands are preferred because
they offer features such as Windmeijer (2005) bias-corrected standard errors to account for the bias
of traditional two-step GMM standard errors seen in dynamic panel-data models and, being linear
estimators, only require you to specify variable names instead of complete equations. However, using
gmm has several pedagogical advantages, including the ability to tie those model-specific commands
into a more general framework, a clear illustration of how certain types of instrument matrices for
panel-data models are formed, and demonstrations of several advanced features of gmm.
First-differencing (20) removes the panel-specific uiterm:
yit yi,t1=ρ(yi,t1yi,t2)+(xit xi,t1)0β+ (it i,t1) (21)
However, now (yi,t1yi,t2)is correlated with (it i,t1). Thus we need an instrument that is
correlated with the former but not the latter. The lagged variables in (21) mean that equation is not
estimable for t < 3, so consider when t=3. We have
yi3yi2=ρ(yi2yi1)+(xi3xi2)0β+ (i3i2) (22)
736 gmm — Generalized method of moments estimation
In the Arellano–Bond (1991) estimator, lagged levels of the dependent variable are used as instruments.
With our assumption that the it are i.i.d., (20) intimates that yi1can serve as an instrumental variable
when we fit (22).
Next consider (21) when t=4. We have
yi4yi3=ρ(yi3yi2)+(xi4xi3)0β+ (i4i3)
Now (20) shows that both yi1and yi2are uncorrelated with the error term (i4i3), so we have
two instruments available. For t=5, you can show that yi1,yi2, and yi3can serve as instruments.
As may now be apparent, one of the key features of these dynamic panel-data models is that the
available instruments depend on the time period, t, as was the case for some of the panel Poisson
models we considered earlier. Because the xit are strictly exogenous by assumption, they can serve
as their own instruments.
The initial weight matrix that is appropriate for the GMM dynamic panel-data estimator is slightly
more involved than the unadjusted matrix we have used in most of our previous examples that assumes
the errors are i.i.d. First, rewrite (21) for panel ias
yiyL
i=ρ(yL
iyLL
i)+(XiXL
i)β+ (iL
i)
where yi= (yi3, . . . , yiT )and yL
i= (yi2, . . . , yi,T 1),yLL
i= (yi1, . . . , yi,T 2), and Xi,XL
i,i,
and L
iare defined analogously. Let Zdenote the full matrix of instruments for panel i, including the
variables specified in both the instruments() and xtinstruments() options; the exact structure
is detailed in Methods and formulas.
By assumption, it is i.i.d., so the first-difference (it i,t1)is necessarily autocorrelated
with correlation 0.5. Therefore, we should not use a weight matrix that assumes the errors are
independent. For dynamic panel-data models, we can show that the appropriate initial weight matrix
is
c
W=1
NXiZ0
iHDZi1
where
HD=
10.5 0 . . . 0 0
0.5 1 0.5. . . 0 0
.
.
..
.
..
.
.....
.
..
.
.
0 0 0 . . . 10.5
0 0 0 . . . 0.5 1
We can obtain this initial weight matrix by specifying winitial(xt D). The letter Dindicates that
the equation we are estimating is specified in first-differences.
Example 15: Arellano–Bond estimator
Say we want to fit the model
nit =ρni,t1+β1wit +β2wi,t1+β3kit +β4ki,t1+ui+it (23)
where we assume that wit and kit are strictly exogenous. First-differencing, our residual equation is
it = (it i,t1) =nit ni,t1ρ(ni,t1ni,t2)β1(wit wi,t1)
β2(wi,t1wi,t2)β3(kit ki,t1)β4(ki,t1ki,t2) (24)
gmm — Generalized method of moments estimation 737
In Stata, we type
. use http://www.stata-press.com/data/r13/abdata
. gmm (D.n - {rho}*LD.n - {xb:D.w LD.w D.k LD.k}),
> xtinstruments(n, lags(2/.)) instruments(D.w LD.w D.k LD.k, noconstant)
> deriv(/rho = -1*LD.n) deriv(/xb = -1) winitial(xt D) onestep
Step 1
Iteration 0: GMM criterion Q(b) = .0011455
Iteration 1: GMM criterion Q(b) = .00009103
Iteration 2: GMM criterion Q(b) = .00009103 (backed up)
GMM estimation
Number of parameters = 5
Number of moments = 32
Initial weight matrix: XT D Number of obs = 751
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/rho .8041712 .1199819 6.70 0.000 .5690111 1.039331
/xb_D_w -.5600476 .1619472 -3.46 0.001 -.8774583 -.242637
/xb_LD_w .3946699 .1092229 3.61 0.000 .1805969 .6087429
/xb_D_k .3520286 .0536546 6.56 0.000 .2468676 .4571897
/xb_LD_k -.2160435 .0679689 -3.18 0.001 -.3492601 -.0828269
Instruments for equation 1:
XT-style: L(2/.).n
Standard: D.w LD.w D.k LD.k
Because w and k are strictly exogenous, we specified the variants of them that appear in (24) in the
instruments() option; because there is no constant term in the model, we specified noconstant
to omit the constant from the instrument list.
We specified xtinstruments(n, lags(2/.)) to tell gmm what instruments to use for the lagged
dependent variable included as a regressor in (23). Based on our previous discussion, lags two and
higher of $n_{it}$ can serve as instruments. The lags(2/.) suboption tells gmm that the first available
instrument for $n_{it}$ is the lag-two value $n_{i,t-2}$. The "." tells gmm to use all further lags of $n_{it}$ as
instruments as well. The instrument matrices in dynamic panel-data models can become large if the
dataset has many time periods per panel. In those cases, you could specify, for example, lags(2/4)
to use just lags two through four instead of using all available lags.
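For instance, a minimal variant of the command above that caps the instrument list at lags two through four (shown only to illustrate the lags() suboption; the estimates will generally differ slightly because fewer instruments are used) would be
. gmm (D.n - {rho}*LD.n - {xb:D.w LD.w D.k LD.k}),
> xtinstruments(n, lags(2/4)) instruments(D.w LD.w D.k LD.k, noconstant)
> deriv(/rho = -1*LD.n) deriv(/xb = -1) winitial(xt D) onestep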
Our results are identical to those we would obtain using xtabond with the syntax
xtabond n L(0/1).w L(0/1).k, lags(1) noconstant vce(robust)
Had we left off the vce(robust) option in our call to xtabond, we would have had to specify
vce(unadjusted) in our call to gmm to obtain the same standard errors.
Technical note
gmm automatically excludes observations for which there are no valid observations for the panel-
style instruments. However, it keeps in the estimation sample those observations for which fewer than
the maximum number of instruments you requested are available. For example, if you specify the
lags(2/4) suboption, you have requested three instruments, but gmm will keep observations even if
only one or two instruments are available.
Example 16: Two-step Arellano–Bond estimator
Here we refit the model from example 15, using the two-step GMM estimator.
. gmm (D.n - {rho}*LD.n - {xb:D.w LD.w D.k LD.k}),
> xtinstruments(n, lags(2/.)) instruments(D.w LD.w D.k LD.k, noconstant)
> deriv(/rho = -1*LD.n) deriv(/xb = -1) winitial(xt D) wmatrix(robust)
> vce(unadjusted)
Step 1
Iteration 0: GMM criterion Q(b) = .0011455
Iteration 1: GMM criterion Q(b) = .00009103
Iteration 2: GMM criterion Q(b) = .00009103 (backed up)
Step 2
Iteration 0: GMM criterion Q(b) = .44107941
Iteration 1: GMM criterion Q(b) = .4236729
Iteration 2: GMM criterion Q(b) = .4236729
GMM estimation
Number of parameters = 5
Number of moments = 32
Initial weight matrix: XT D Number of obs = 751
GMM weight matrix: Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/rho .8044783 .0534763 15.04 0.000 .6996667 .90929
/xb_D_w -.5154978 .0335506 -15.36 0.000 -.5812557 -.4497399
/xb_LD_w .4059309 .0637294 6.37 0.000 .2810235 .5308384
/xb_D_k .3556204 .0390892 9.10 0.000 .2790071 .4322337
/xb_LD_k -.2204521 .046439 -4.75 0.000 -.3114709 -.1294332
Instruments for equation 1:
XT-style: L(2/.).n
Standard: D.w LD.w D.k LD.k
Our results match those you would obtain with the command
xtabond n L(0/1).(w k), lags(1) noconstant twostep
Technical note
Had we specified vce(robust) in our call to gmm, we would have obtained the traditional
sandwich-based robust covariance matrix, but our standard errors would not match those we would
obtain by specifying vce(robust) with the xtabond command. The xtabond, xtdpd, and xtdpdsys
commands implement a bias-corrected robust VCE for the two-step GMM dynamic panel-data estimator.
Traditional VCEs computed after the two-step dynamic panel-data estimator have been shown to exhibit
often-severe bias; see Windmeijer (2005).
Neither of the two dynamic panel-data examples (15 and 16) we have fit so far includes a constant
term. When a constant term is included, the dynamic panel-data estimator is in fact a two-equation
system estimator. For notational simplicity, consider a simple model containing just a constant term
and one lag of the dependent variable:
$$y_{it} = \alpha + \rho\, y_{i,t-1} + u_i + \epsilon_{it}$$
First-differencing to remove the $u_i$ term, we have
$$y_{it} - y_{i,t-1} = \rho(y_{i,t-1} - y_{i,t-2}) + (\epsilon_{it} - \epsilon_{i,t-1}) \tag{25}$$
This has also eliminated the constant term. If we assume $E(u_i) = 0$, which is reasonable if a constant
term is included in the model, then we can recover $\alpha$ by including the moment condition
$$y_{it} = \alpha + \rho\, y_{i,t-1} + \epsilon'_{it} \tag{26}$$
where $\epsilon'_{it} = u_i + \epsilon_{it}$. The parameter $\rho$ continues to be identified by (25), so the only instrument we
use with (26) is a constant term. As before, the error term $(\epsilon_{it} - \epsilon_{i,t-1})$ is necessarily autocorrelated
with correlation coefficient $-0.5$, though the error term $\epsilon'_{it}$ is white noise. Therefore, our initial weight
matrix should be
$$\widehat{W} = \left\{\frac{1}{N}\sum_i Z_i' H Z_i\right\}^{-1}$$
where
$$H = \begin{pmatrix} H_D & 0 \\ 0 & I \end{pmatrix}$$
and $I$ is a conformable identity matrix.
One complication arises concerning the relevant estimation sample. Looking at (25), we apparently
lose the first two observations from each panel because of the presence of yi,t2, but in (26) we need
only to sacrifice one observation, for yi,t1. For most multiple-equation models, we need to use the
same estimation sample for all equations. However, in dynamic panel-data models, we can use more
observations to fit the equation in level form [(26) here] than the equation in first-differences [equation
(25)]. To request this behavior, we specify the nocommonesample option to gmm. That option tells
gmm to use as many observations as possible for each equation, ignoring the loss of observations due
to lagging or differencing.
Example 17: Arellano–Bond estimator with constant term
Here we fit the model
$$n_{it} = \alpha + \rho\, n_{i,t-1} + u_i + \epsilon_{it}$$
Without specifying derivatives, our command would be
. gmm (D.n - {rho}*LD.n) (n - {alpha} - {rho}*L.n),
> xtinstruments(1: n, lags(2/.)) instruments(1:, noconstant) onestep
> winitial(xt DL) vce(unadj) nocommonesample
We would specify winitial(xt DL) to obtain the required initial weight matrix. The notation DL
indicates that our first moment equation is in first-differences and the second moment equation is
in levels (not first-differenced). We exclude a constant in the instrument list for the first equation,
because first-differencing removed the constant term. Because we do not specify the instruments()
option for the second moment equation, a constant is used by default.
This example also provides us the opportunity to illustrate how to specify derivatives for multiple-
equation GMM models. Within the derivative() option, instead of specifying just the parameter
name, now you must specify the equation name or number, a slash, and the parameter name to which
the derivative applies. In Stata, we type
. gmm (D.n - {rho}*LD.n) (n - {alpha} - {rho}*L.n),
> xtinstruments(1: n, lags(2/.)) instruments(1:, noconstant)
> derivative(1/rho = -1*LD.n) derivative(2/alpha = -1)
> derivative(2/rho = -1*L.n) winitial(xt DL) vce(unadj)
> nocommonesample onestep
Step 1
Iteration 0: GMM criterion Q(b) = .09894466
Iteration 1: GMM criterion Q(b) = .00023508
Iteration 2: GMM criterion Q(b) = .00023508
GMM estimation
Number of parameters = 2
Number of moments = 29
Initial weight matrix: XT DL Number of obs = *
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/rho 1.023349 .0608293 16.82 0.000 .9041259 1.142572
/alpha -.0690864 .0660343 -1.05 0.295 -.1985112 .0603384
* Number of observations for equation 1: 751
Number of observations for equation 2: 891
Instruments for equation 1:
XT-style: L(2/.).n
Instruments for equation 2:
Standard: _cons
These results are identical to those we would obtain by typing
xtabond n, lags(1)
Because we specified nocommonesample, gmm did not report the number of observations used in
the header of the output. In this dataset, there are in fact 1,031 observations on 140 panels. In the
second equation, the presence of the lagged value of n reduces the sample size for that equation to
$1031 - 140 = 891$. In the first equation, we lose the first two observations per panel due to lagging
and differencing, leading to 751 usable observations. These tallies are listed after the coefficient table
in the output.
Technical note
Specifying
xtinstruments(x1 x2 x3, lags(1/3))
differs from
instruments(L(1/3).(x1 x2 x3))
in how observations are excluded from the estimation sample. When you use the latter syntax, gmm
must exclude the first three observations from each panel when computing the moment equation: you
requested three lags of each regressor be used as instruments, so the first residual that could be interacted
with those instruments is the one for t = 4. On the other hand, when you use xtinstruments(), you
are telling gmm that you would like to use up to the first three lags of x1, x2, and x3 as instruments
but that using just one lag is acceptable. Because most panel datasets have a relatively modest number
of observations per panel, dynamic instrument lists are typically used so that the number of usable
observations is maximized. Dynamic instrument lists also accommodate the fact that there are more
valid instruments for later time periods than earlier time periods.
Specifying panel-style instruments using the xtinstruments() option also affects how the standard
instruments specified in the instruments() option are treated. To illustrate, suppose we have a
balanced panel dataset with T = 5 observations per panel and we specify
. gmm . . ., xtinstruments(w, lags(1/2)) instruments(x)
We will lose the first observation because we need at least one lag of w to serve as an instrument.
Our instrument matrix for panel i will therefore be
$$Z_i = \begin{pmatrix}
w_{i1} & 0 & 0 & 0 \\
0 & w_{i1} & 0 & 0 \\
0 & w_{i2} & 0 & 0 \\
0 & 0 & w_{i2} & 0 \\
0 & 0 & w_{i3} & 0 \\
0 & 0 & 0 & w_{i3} \\
0 & 0 & 0 & w_{i4} \\
x_{i2} & x_{i3} & x_{i4} & x_{i5} \\
1 & 1 & 1 & 1
\end{pmatrix} \tag{27}$$
The vector of ones in the final row represents the constant term implied by the instruments()
option. Because we lost the first observation, the residual vector $u_i$ will be $4 \times 1$. Thus our moment
conditions for the ith panel can be written in matrix notation as
$$E\{Z_i u_i(\beta)\} = E\left\{ Z_i \begin{pmatrix} u_{i2}(\beta) \\ u_{i3}(\beta) \\ u_{i4}(\beta) \\ u_{i5}(\beta) \end{pmatrix} \right\} = 0$$
The moment conditions corresponding to the final two rows of (27) say that
$$E\left\{\sum_{t=2}^{5} x_{it}\, u_{it}(\beta)\right\} = 0 \qquad\text{and}\qquad E\left\{\sum_{t=2}^{5} u_{it}(\beta)\right\} = 0$$
Because we specified panel-style instruments with the xtinstruments() option, gmm no longer
uses moment conditions for strictly exogenous variables of the form $E\{x_{it} u_{it}(\beta)\} = 0$ for each t.
Instead, the moment conditions now stipulate that the average (over t) of $x_{it} u_{it}(\beta)$ has expectation
zero. This corresponds to the approach proposed by Arellano and Bond (1991, 280) and others.
When you request panel-style instruments with the xtinstruments() option, the number of
instruments in the $Z_i$ matrix increases quadratically in the number of periods. The dynamic panel-data
estimators we have discussed in this section are designed for datasets that contain a large number
of panels and a modest number of time periods. When the number of time periods is large, estimators
that use standard (nonpanel-style) instruments are more appropriate.
We have focused on the Arellano–Bond dynamic panel-data estimator because of its relative
simplicity. gmm can additionally fit any models that can be formulated using the xtdpd and xtdpdsys
commands; see [XT] xtdpd and [XT] xtdpdsys. The key is to determine the appropriate instruments
to use for the level and difference equations. You may find it useful to fit a version of your model
with those commands to determine what instruments and XT-style instruments to use. We conclude
this section with an example using the Arellano–Bover/Blundell–Bond estimator.
Example 18: Arellano–Bover/Blundell–Bond estimator
We fit a small model that includes one lag of the dependent variable n as a regressor as well as
the contemporaneous and first lag of w, which we assume are strictly exogenous. We could fit our
model with xtdpdsys by using the syntax
xtdpdsys n L(0/1).w, lags(1) twostep
Applying virtually all the syntax issues we have discussed so far, the equivalent gmm command is
. gmm (n - {rho}*L.n - {w}*w - {lagw}*L.w - {c})
> (D.n - {rho}*LD.n - {w}*D.w - {lagw}*LD.w),
> xtinst(1: D.n, lags(1/1)) xtinst(2: n, lags(2/.))
> inst(2: D.w LD.w, noconstant)
> deriv(1/rho = -1*L.n) deriv(1/w = -1*w)
> deriv(1/lagw = -1*L.w) deriv(1/c = -1)
> deriv(2/rho = -1*LD.n) deriv(2/w = -1*D.w)
> deriv(2/lagw = -1*LD.w)
> winit(xt LD) wmatrix(robust) vce(unadjusted)
> nocommonesample
Step 1
Iteration 0: GMM criterion Q(b) = .10170339
Iteration 1: GMM criterion Q(b) = .00022772
Iteration 2: GMM criterion Q(b) = .00022772
Step 2
Iteration 0: GMM criterion Q(b) = .59965014
Iteration 1: GMM criterion Q(b) = .56578186
Iteration 2: GMM criterion Q(b) = .56578186
GMM estimation
Number of parameters = 4
Number of moments = 39
Initial weight matrix: XT LD Number of obs = *
GMM weight matrix: Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/rho 1.122738 .0206512 54.37 0.000 1.082263 1.163214
/w -.6719909 .0246148 -27.30 0.000 -.7202351 -.6237468
/lagw .571274 .0403243 14.17 0.000 .4922398 .6503083
/c .154309 .17241 0.90 0.371 -.1836084 .4922263
* Number of observations for equation 1: 891
Number of observations for equation 2: 751
Instruments for equation 1:
XT-style: LD.n
Standard: _cons
Instruments for equation 2:
XT-style: L(2/.).n
Standard: D.w LD.w
Details of moment-evaluator programs
In examples 9, 10, 11, and 12, we used moment-evaluator programs to evaluate moment conditions
that could not be specified using the interactive version of gmm. In example 11, we also showed how to
pass additional information to an evaluator program. Here we discuss how to make moment-evaluator
programs provide derivatives and accept weights.
The complete specification for a moment-evaluator program’s syntax statement is
syntax varlist if [weight], at(name) options [derivatives(varlist)]
The macro ‘varlist’ contains the list of variables that we are to fill in with the values of our
residual equations. The macro ‘if’ represents an if condition that restricts the estimation sample.
The macro ‘at’ represents a vector containing the parameter values at which we are to evaluate our
residual equations. options represent other options that you specify in your call to gmm and want to
have passed to your moment-evaluator programs. In example 11, we included the mylhs() option
so that we could pass the name of the dependent variable to our evaluator program.
Two new elements of the syntax statement allow for weights and derivatives. weight specifies
the types of weights your program allows. The interactive version of gmm allows for fweights,
aweights, and pweights. However, unless you explicitly allow your moment evaluator program to
accept weights, you cannot specify weights in your call to gmm with the moment-evaluator program
version.
The derivatives() option is used to pass to your program a set of variables that you are to fill
in with the derivatives of your residual equations with respect to the parameters.
To indicate that your program can calculate derivatives, you specify either the hasderivatives
or the haslfderivatives option to gmm. The hasderivatives option indicates that your program
calculates parameter-level derivatives; that method requires more work but can be applied to any GMM
problem. The haslfderivatives option requires less work but can be used only when the model’s
residual equations satisfy certain restrictions and you use the <eqname>:<varname> syntax with
the parameters() option.
We first consider how to write the derivative computation logic to work with the hasderivatives
option and provide an example; then we do the same for the haslfderivatives option.
Say you specify k parameters in the nparameters() or parameters() option and q equations in the
nequations() or equations() option, and you specify hasderivatives. Then `derivatives'
will contain $k \times q$ variables. The first k variables are for the derivatives of the first residual equation
with respect to the k parameters, the second k variables are for the derivatives of the second residual
equation, and so on.
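As a minimal sketch of that ordering (the local macros k, m, and j below are illustrative assumptions introduced here, not options or names used by gmm), an evaluator program could locate the variable to fill with the derivative of residual equation m with respect to parameter j like this:
        // inside a moment-evaluator program, after the syntax statement has run
        local k 3                          // number of parameters (assumed for illustration)
        local m 2                          // residual equation of interest
        local j 1                          // parameter of interest
        local pos = (`m' - 1)*`k' + `j'    // position within `derivatives'
        local dvar : word `pos' of `derivatives'
        // `dvar' now names the variable to fill with that derivative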
Example 19: Specifying derivatives with simple parameter names
To focus on how to specify derivatives, we return to the simple moment-evaluator program we
used in example 9, in which we had three regressors, and extend it to supply derivatives. The residual
equation corresponding to moment condition (13) is
$$u_{it}(\beta) = y_{it} - \mu_{it}\,\frac{\bar{y}_i}{\bar{\mu}_i}$$
where $\mu_{it}$, $\bar{\mu}_i$, and $\bar{y}_i$ were defined previously. Now
$$\frac{\partial}{\partial \beta_j}\, u_{it}(\beta) = -\,\frac{\mu_{it}\,\bar{y}_i}{\bar{\mu}_i^2}\left( x^{(j)}_{it}\,\bar{\mu}_i - \frac{1}{T}\sum_{l=1}^{T} x^{(j)}_{il}\,\mu_{il} \right) \tag{28}$$
where $x^{(j)}_{it}$ represents the jth element of $x_{it}$.
Our moment-evaluator program is
program gmm_poideriv
        version 13
        syntax varlist if, at(name) [derivatives(varlist)]
        quietly {
                // Calculate residuals as before
                tempvar mu mubar ybar
                gen double `mu' = exp(x1*`at'[1,1] + x2*`at'[1,2] ///
                                + x3*`at'[1,3]) `if'
                egen double `mubar' = mean(`mu') `if', by(id)
                egen double `ybar' = mean(y) `if', by(id)
                replace `varlist' = y - `mu'*`ybar'/`mubar' `if'
                // Did -gmm- request derivatives?
                if "`derivatives'" == "" {
                        exit            // no, so we are done
                }
                // Calculate derivatives
                // We need the panel means of x1*mu, x2*mu, and x3*mu
                tempvar work x1mubar x2mubar x3mubar
                generate double `work' = x1*`mu' `if'
                egen double `x1mubar' = mean(`work') `if', by(id)
                replace `work' = x2*`mu' `if'
                egen double `x2mubar' = mean(`work') `if', by(id)
                replace `work' = x3*`mu' `if'
                egen double `x3mubar' = mean(`work') `if', by(id)
                local d1: word 1 of `derivatives'
                local d2: word 2 of `derivatives'
                local d3: word 3 of `derivatives'
                replace `d1' = -1*`mu'*`ybar'/`mubar'^2*(x1*`mubar' - `x1mubar')
                replace `d2' = -1*`mu'*`ybar'/`mubar'^2*(x2*`mubar' - `x2mubar')
                replace `d3' = -1*`mu'*`ybar'/`mubar'^2*(x3*`mubar' - `x3mubar')
        }
end
The derivatives() option is made optional in the syntax statement by placing it in square brackets.
If gmm needs to evaluate your moment equations but does not need derivatives at that time, then the
derivatives() option will be empty. In our program, we check to see if that is the case, and, if
so, exit without calculating derivatives. As is often the case with [R] ml as well, the portion of our
program devoted to derivatives is longer than the code to compute the objective function.
The first part of our derivative code computes the term
$$\frac{1}{T}\sum_{l=1}^{T} x^{(j)}_{il}\,\mu_{il} \tag{29}$$
for $x^{(j)}_{it} =$ x1, x2, and x3. The `derivatives' macro contains three variable names, corresponding
to the three parameters of the `at' matrix. We extract those names into local macros `d1', `d2',
and `d3' and then fill in the variables those macros represent with the derivatives shown in (28).
With our program written, we fit our model by typing
. use http://www.stata-press.com/data/r13/poisson1
. gmm gmm_poideriv, nequations(1) parameters(b1 b2 b3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep hasderivatives
Step 1
Iteration 0: GMM criterion Q(b) = 51.99142
Iteration 1: GMM criterion Q(b) = .04345191
Iteration 2: GMM criterion Q(b) = 8.720e-06
Iteration 3: GMM criterion Q(b) = 7.115e-13
Iteration 4: GMM criterion Q(b) = 5.129e-27
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 409
(Std. Err. adjusted for 45 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 1.94866 .1000265 19.48 0.000 1.752612 2.144709
/b2 -2.966119 .0923592 -32.12 0.000 -3.14714 -2.785099
/b3 1.008634 .1156561 8.72 0.000 .781952 1.235315
Instruments for equation 1: x1 x2 x3
Our results are identical to those in example 9. Another way to verify that our program calculates
derivatives correctly would be to type
. gmm gmm_poideriv, nequations(1) parameters(b1 b2 b3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep
Without the hasderivatives or haslfderivatives option, gmm will not request derivatives from
your program, even if it contains code to compute them. If you have trouble obtaining convergence
with the hasderivatives or haslfderivatives option but do not have trouble without specifying
one of them, then you need to recheck your derivatives.
After example 9, we remarked that the evaluator program would have to be changed to accommodate
different regressors. We then showed how you can specify parameters using the syntax
<eqname>:<varname> and then use matrix score to compute linear combinations of variables.
To specify derivatives when you specify parameters using this equation-name syntax, ensure that your
residual equations satisfy the "linear-form restriction" analogous to the restrictions of linear-form
evaluators used by ml. See [R] ml and Gould, Pitblado, and Poi (2010) for more information about
linear-form evaluators.
A GMM residual equation satisfies the linear-form restriction if the equation can be written in
terms of a single observation in the dataset and if the equation for observation i does not depend on
any observations $j \neq i$. Cross-sectional models satisfy the linear-form restriction. Time-series models
satisfy the linear-form restriction only when no lags or leads are used.
Panel-data models often do not satisfy the linear-form restriction. For example, recall moment
condition (13) for a panel Poisson model. That residual equation included panel-level mean terms $\bar{y}_i$
and $\bar{\mu}_i$, so the residual equation for an individual observation depends on all the observations in the
same panel.
When a residual equation does not satisfy the linear-form restriction, neither will its derivatives. To
apply the chain rule, we need a way to multiply the eqname-level derivative by each of the variables
in the equation to obtain parameter-level derivatives. In (28), for example, there is no way to factor
out each $x^{(j)}_{it}$ variable and obtain an eqname-level derivative that we then multiply by each of the
$x^{(j)}_{it}$s.
Suppose we do have a model with q = 2 moment equations, both of which do satisfy the linear-form
restriction, and we specify the parameters() option like this:
. gmm . . ., parameters(eq1:x1 eq1:x2 eq1:_cons eq2:_cons eq3:x1 eq3:x2 eq3:_cons)
We have specified n = 3 eqnames in the parameters() option: eq1, eq2, and eq3. When we specify
the haslfderivatives option, gmm will pass $n \times q = 3 \times 2 = 6$ variables in the derivatives()
option. The first three variables are to be filled with
$$\frac{\partial}{\partial\,\texttt{eq1}}\, u_{1i}(\beta) \qquad \frac{\partial}{\partial\,\texttt{eq2}}\, u_{1i}(\beta) \qquad\text{and}\qquad \frac{\partial}{\partial\,\texttt{eq3}}\, u_{1i}(\beta)$$
where $u_{1i}(\beta)$ is the ith observation for the first moment equation. Then the second three variables
are to be filled with
$$\frac{\partial}{\partial\,\texttt{eq1}}\, u_{2i}(\beta) \qquad \frac{\partial}{\partial\,\texttt{eq2}}\, u_{2i}(\beta) \qquad\text{and}\qquad \frac{\partial}{\partial\,\texttt{eq3}}\, u_{2i}(\beta)$$
where $u_{2i}(\beta)$ is the ith observation for the second moment equation. In this example, we filled in a total of six variables with
derivatives. If we instead used the hasderivatives option, we would have filled $k \times q = 7 \times 2 = 14$
variables; moreover, if we wanted to change the number of variables in our model, we would have
to modify our evaluator program.
Example 20: Specifying derivatives with linear-form residual equations
In examples 7and 8, we showed how to specify derivatives with an exponential regression model
when using the interactive version of gmm. Here we show how to write a moment-evaluator program
for the exponential regression model, including derivatives.
The residual equation for observation i is
$$u_i = y_i - \exp(x_i'\beta)$$
where $x_i$ may include a constant term. The derivative with respect to the linear combination $x_i'\beta$ is
$$\frac{\partial u_i}{\partial (x_i'\beta)} = -\exp(x_i'\beta)$$
To verify this residual equation satisfies the linear-form restriction, we see that for the jth element
of $\beta$, we have
$$\frac{\partial u_i}{\partial \beta_j} = -x_{ij}\exp(x_i'\beta) = \frac{\partial u_i}{\partial (x_i'\beta)} \times x_{ij}$$
so that given $\partial u_i/\partial (x_i'\beta)$, gmm can apply the chain rule to obtain the derivatives with respect to the
individual parameters.
Our moment-evaluator program is
program gmm_poideriv2
        version 13
        syntax varlist if, at(name) [derivatives(varlist)]
        quietly {
                tempvar mu
                matrix score double `mu' = `at' `if', eq(#1)
                replace `mu' = exp(`mu')
                local depvar : coleq `at'
                local depvar : word 1 of `depvar'
                replace `varlist' = `depvar' - `mu' `if'
                // Did -gmm- request derivatives?
                if "`derivatives'" == "" {
                        exit            // no, so we are done
                }
                // Calculate derivatives
                // The derivatives macro only has one variable
                // for this model.
                replace `derivatives' = -1*`mu' `if'
        }
end
To fit our model of doctor visits treating income as an endogenous regressor, we type
. use http://www.stata-press.com/data/r13/docvisits
. gmm gmm_poideriv2, nequations(1)
> instruments(private chronic female age black hispanic)
> parameters(docvis:private docvis:chronic
> docvis:female docvis:income docvis:_cons) haslfderivatives
Step 1
Iteration 0: GMM criterion Q(b) = 16.910173
Iteration 1: GMM criterion Q(b) = .82270871
Iteration 2: GMM criterion Q(b) = .21831995
Iteration 3: GMM criterion Q(b) = .12685934
Iteration 4: GMM criterion Q(b) = .12672369
Iteration 5: GMM criterion Q(b) = .12672365
Step 2
Iteration 0: GMM criterion Q(b) = .00234641
Iteration 1: GMM criterion Q(b) = .00215957
Iteration 2: GMM criterion Q(b) = .00215911
Iteration 3: GMM criterion Q(b) = .00215911
GMM estimation
Number of parameters = 5
Number of moments = 7
Initial weight matrix: Unadjusted Number of obs = 4412
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
private .535335 .159904 3.35 0.001 .221929 .848741
chronic 1.090126 .0617659 17.65 0.000 .9690668 1.211185
female .6636579 .0959885 6.91 0.000 .475524 .8517918
income .0142855 .0027162 5.26 0.000 .0089618 .0196092
_cons -.5983477 .138433 -4.32 0.000 -.8696714 -.327024
Instruments for equation 1: private chronic female age black hispanic _cons
Our results match those shown in example 8.
We can change the variables in our model just by changing the parameters() and instruments()
options; we do not need to make any changes to the moment-evaluator program, because we used
linear-form derivatives.
Depending on your model, allowing your moment-evaluator program to accept weights may be as
easy as modifying the syntax command to allow them, or it may require significantly more work.
If your program uses only commands like generate and replace, then just modifying the syntax
command is all you need to do; gmm takes care of applying the weights to the observation-level residuals
when computing the sample moments, derivatives, and weight matrices. On the other hand, if your
moment-evaluator program computes residuals using statistics that depend on multiple observations,
then you must apply the weights passed to your program when computing those statistics.
In our examples of panel Poisson with strictly exogenous regressors (9 and 19), we used the
statistics $\bar{\mu}_i$ and $\bar{y}_i$ when computing the residuals. If we are to allow weights with our moment-evaluator
program, then we must incorporate those weights when computing $\bar{\mu}_i$ and $\bar{y}_i$. Moreover,
looking at the derivative in (28), the term highlighted in (29) is in fact a sample mean, so we must
incorporate weights when computing it.
Example 21: Panel Poisson with derivatives and weights
Here we modify the program in example 19 to accept frequency weights. One complication
immediately arises: we had been using egen to compute $\bar{\mu}_i$ and $\bar{y}_i$. egen does not accept weights, so
we must compute $\bar{\mu}_i$ and $\bar{y}_i$ ourselves, incorporating any weights the user may specify. Our program
is
program gmm_poiderivfw
        version 13
        syntax varlist if [fweight/], at(name) [derivatives(varlist)]
        quietly {
                if "`exp'" == "" {              // no weights
                        local exp 1             // weight each observation equally
                }
                // Calculate residuals as before
                tempvar mu mubar ybar sumwt
                gen double `mu' = exp(x1*`at'[1,1] + x2*`at'[1,2] ///
                                + x3*`at'[1,3]) `if'
                bysort id: gen double `sumwt' = sum(`exp')
                by id: gen double `mubar' = sum(`mu'*`exp')
                by id: gen double `ybar' = sum(y*`exp')
                by id: replace `mubar' = `mubar'[_N] / `sumwt'[_N]
                by id: replace `ybar' = `ybar'[_N] / `sumwt'[_N]
                replace `varlist' = y - `mu'*`ybar'/`mubar' `if'
                // Did -gmm- request derivatives?
                if "`derivatives'" == "" {
                        exit            // no, so we are done
                }
                // Calculate derivatives
                // We need the panel means of x1*mu, x2*mu, and x3*mu
                tempvar work x1mubar x2mubar x3mubar
                generate double `work' = x1*`mu' `if'
                by id: generate double `x1mubar' = sum(`work'*`exp')
                by id: replace `x1mubar' = `x1mubar'[_N] / `sumwt'[_N]
                replace `work' = x2*`mu' `if'
                by id: generate double `x2mubar' = sum(`work'*`exp')
                by id: replace `x2mubar' = `x2mubar'[_N] / `sumwt'[_N]
                replace `work' = x3*`mu' `if'
                by id: generate double `x3mubar' = sum(`work'*`exp')
                by id: replace `x3mubar' = `x3mubar'[_N] / `sumwt'[_N]
                local d1: word 1 of `derivatives'
                local d2: word 2 of `derivatives'
                local d3: word 3 of `derivatives'
                replace `d1' = -1*`mu'*`ybar'/`mubar'^2*(x1*`mubar' - `x1mubar')
                replace `d2' = -1*`mu'*`ybar'/`mubar'^2*(x2*`mubar' - `x2mubar')
                replace `d3' = -1*`mu'*`ybar'/`mubar'^2*(x3*`mubar' - `x3mubar')
        }
end
Our syntax command now indicates that fweights are allowed. The first part of our code looks
at the macro `exp'. If it is empty, then the user did not specify weights in the call to gmm, and
we set the macro equal to 1 so that we weight each observation equally. After we compute $\mu_{it}$, we
calculate $\bar{\mu}_i$ and $\bar{y}_i$, taking into account weights. To compute frequency-weighted means for each
panel, we just multiply each observation by its respective weight, sum over all observations in the
panel, and then divide by the sum of the weights for the panel. (See [U] 20.23 Weighted estimation for
information on how to handle aweights and pweights.) We use the same procedure to compute the
frequency-weighted variant of expression (29) in the derivative calculations.
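In symbols, the frequency-weighted panel means computed by the code above are (writing $f_{it}$ for the frequency weight, a label introduced here only for this formula)
$$\bar{\mu}_i = \frac{\sum_t f_{it}\,\mu_{it}}{\sum_t f_{it}} \qquad\text{and}\qquad \bar{y}_i = \frac{\sum_t f_{it}\, y_{it}}{\sum_t f_{it}}$$
with the unweighted means recovered when every $f_{it} = 1$.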
To use our program, we type
. use http://www.stata-press.com/data/r13/poissonwts
. gmm gmm_poiderivfw [fw=fwt], nequations(1) parameters(b1 b2 b3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep hasderivatives
(sum of wgt is 819)
Step 1
Iteration 0: GMM criterion Q(b) = 49.8292
Iteration 1: GMM criterion Q(b) = .11136736
Iteration 2: GMM criterion Q(b) = .00008519
Iteration 3: GMM criterion Q(b) = 7.110e-11
Iteration 4: GMM criterion Q(b) = 5.596e-23
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 819
(Std. Err. adjusted for 45 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 1.967766 .111795 17.60 0.000 1.748652 2.186881
/b2 -3.060838 .0935561 -32.72 0.000 -3.244205 -2.877472
/b3 1.037594 .1184227 8.76 0.000 .80549 1.269698
Instruments for equation 1: x1 x2 x3
Testing whether our program works correctly with frequency weights is easy. A frequency-weighted
dataset is just a compact form of a larger dataset in which identical observations are omitted and a
frequency-weight variable is included to tell us how many times each observation in the smaller dataset
appears in the larger dataset. Therefore, we can expand our smaller dataset by the frequency-weight
variable and then refit our model without specifying frequency weights. If we obtain the same results,
our program works correctly. When we type
. expand fw
. gmm gmm_poiderivfw, nequations(1) parameters(b1 b2 b3)
> instruments(x1 x2 x3, noconstant) vce(cluster id) onestep
we obtain the same results as before.
Stored results
gmm stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_aux) number of auxiliary parameters
e(n_moments) number of moments
e(n_eq) number of equations in moment-evaluator program
e(Q) criterion function
e(J) Hansen's J $\chi^2$ statistic
e(J_df) J statistic degrees of freedom
e(k_i) number of parameters in equation i
e(has_xtinst) 1 if panel-style instruments specified, 0 otherwise
e(N_clust) number of clusters
e(type) 1 if interactive version, 2 if moment-evaluator program version
e(rank) rank of e(V)
e(ic) number of iterations used by iterative GMM estimator
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) gmm
e(cmdline) command as typed
e(title) title specified in title()
e(title_2) title specified in title2()
e(clustvar) name of cluster variable
e(inst_i) equation i instruments
e(eqnames) equation names
e(winit) initial weight matrix used
e(winitname) name of user-supplied initial weight matrix
e(estimator) onestep,twostep, or igmm
e(rhs) variables specified in variables()
e(params_i) equation i parameters
e(wmatrix) wmtype specified in wmatrix()
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(params) parameter names
e(sexp_i) substitutable expression for equation i
e(evalprog) moment-evaluator program
e(evalopts) options passed to moment-evaluator program
e(nocommonesample) nocommonesample, if specified
e(technique) optimization technique
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(init) initial values of the estimators
e(Wuser) user-supplied initial weight matrix
e(W) weight matrix used for final round of estimation
e(S) moment covariance matrix used in robust VCE computations
e(N_byequation) number of observations per equation, if nocommonesample specified
e(V) variance–covariance matrix
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Let q denote the number of moment equations. For observation i, i = 1, ..., N, write the jth
moment equation as $z_{ij}\, u_{ij}(\beta_j)$ for j = 1, ..., q. $z_{ij}$ is a $1 \times m_j$ vector, where $m_j$ is the number
of instruments specified for equation j. Let $m = m_1 + \cdots + m_q$.
Our notation can incorporate moment conditions of the form $h_{ij}(w_{ij}; \beta_j)$ with instruments $w_{ij}$
by defining $z_{ij} = 1$ and $u_{ij}(\beta_j) = h_{ij}(w_{ij}; \beta_j)$, so except when necessary we do not distinguish
between the two types of moment conditions. We could instead use notation so that all our moment
conditions are of the form $h_{ij}(w_{ij}; \beta_j)$, or we could adopt notation that explicitly combines both
forms of moment equations. However, because moment conditions of the form $z_{ij}'\, u_{ij}(\beta_j)$ are arguably
more common, we use that notation.
Let $\beta$ denote a $k \times 1$ vector of parameters, consisting of all the unique parameters of $\beta_1, \dots, \beta_q$.
Then we can stack the moment conditions and write them more compactly as $Z_i' u_i(\beta)$, where
$$Z_i = \begin{pmatrix}
z_{i1} & 0 & \cdots & 0 \\
0 & z_{i2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & z_{iq}
\end{pmatrix}
\qquad\text{and}\qquad
u_i(\beta) = \begin{pmatrix}
u_{i1}(\beta_1) \\
u_{i2}(\beta_2) \\
\vdots \\
u_{iq}(\beta_q)
\end{pmatrix}$$
The GMM estimator $\widehat{\beta}$ is the value of $\beta$ that minimizes
$$Q(\beta) = \left\{\frac{1}{N}\sum_{i=1}^{N} Z_i' u_i(\beta)\right\}' W \left\{\frac{1}{N}\sum_{i=1}^{N} Z_i' u_i(\beta)\right\} \tag{A1}$$
for $m \times m$ weight matrix $W$.
By default, gmm minimizes (A1) using the Gauss–Newton method. See Hayashi (2000, 498) for
a derivation. This technique is typically faster than quasi-Newton methods and does not require
second-order derivatives.
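As a rough sketch of what such a step looks like (this is the textbook Gauss–Newton update written in the notation above, not necessarily the exact sequence of computations gmm performs), at iteration s the parameter vector is updated as
$$\beta_{s+1} = \beta_s - \left\{G(\beta_s)' W\, G(\beta_s)\right\}^{-1} G(\beta_s)' W \left\{\frac{1}{N}\sum_{i=1}^{N} Z_i' u_i(\beta_s)\right\}$$
where $G(\beta_s)$ is the Jacobian matrix defined in (A3) below, evaluated at $\beta_s$; only first-order derivatives of the residual equations are required.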
Methods and formulas are presented under the following headings:
Initial weight matrix
Weight matrix
Variance–covariance matrix
Hansen’s J statistic
Panel-style instruments
Initial weight matrix
If you specify winitial(identity), then we set W=Iq.
If you specify winitial(unadjusted), then we create matrix Λwith typical submatrix
Λrs =N1
N
X
i=1
z0
irzis
for r=1, . . . , q and s=1, . . . , q. If you include the independent suboption, then we set Λrs =0
for r6=s. The weight matrix Wequals Λ1.
If you specify winitial(matname), then we set Wequal to Stata matrix matname.
If you specify winitial(xt xtspec), then you must specify one or two items in xtspec, one
for each equation. gmm allows you to specify at most two moment equations when you specify
winitial(xt xtspec), one in first-differences and one in levels. We create the block-diagonal matrix
$H$ with typical block $H_j$. If the jth element of xtspec is "L", then $H_j$ is the identity matrix of
suitable dimension. If the jth element of xtspec is "D", then
$$H_j = \begin{pmatrix}
 1 & -0.5 & 0 & \cdots & 0 & 0 \\
-0.5 & 1 & -0.5 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
 0 & 0 & 0 & \cdots & 1 & -0.5 \\
 0 & 0 & 0 & \cdots & -0.5 & 1
\end{pmatrix}$$
Then
$$\Lambda_H = \frac{1}{N_G}\sum_{g=1}^{N_G} Z_g' H Z_g$$
where g indexes panels in the dataset, $N_G$ is the number of panels, $Z_g$ is the full instrument matrix
for panel g, and $W = \Lambda_H^{-1}$. See Panel-style instruments below for a discussion of how $Z_g$ is formed.
Weight matrix
Specification of the weight matrix applies only to the two-step and iterative estimators. When you
use the onestep option, the wmatrix() option is ignored.
We first evaluate (A1) using the initial weight matrix described above and then compute $u_i(\widehat{\beta})$.
In all cases, $W = \Lambda^{-1}$. If you specify wmatrix(unadjusted), then we create $\Lambda$ to have typical
submatrix
$$\Lambda_{rs} = \sigma_{rs}\,\frac{1}{N}\sum_{i=1}^{N} z_{ir}' z_{is}$$
where
$$\sigma_{rs} = \frac{1}{N}\sum_{i=1}^{N} u_{ir}(\widehat{\beta})\, u_{is}(\widehat{\beta})$$
and r and s index moment equations. For all types of weight matrices, if the independent suboption
is specified, then $\Lambda_{rs} = 0$ for $r \neq s$, where $\Lambda_{rs}$ measures the covariance between moment conditions
for equations r and s.
If you specify wmatrix(robust), then
$$\Lambda = \frac{1}{N}\sum_{i=1}^{N} Z_i'\, u_i(\widehat{\beta})\, u_i'(\widehat{\beta})\, Z_i$$
If you specify wmatrix(cluster clustvar), then
$$\Lambda = \frac{1}{N}\sum_{c=1}^{N_C} q_c\, q_c'$$
where c indexes clusters, $N_C$ is the number of clusters, and
$$q_c = \sum_{i \in c} Z_i'\, u_i(\widehat{\beta})$$
If you specify wmatrix(hac kernel #), then
$$\Lambda = \frac{1}{N}\sum_{i=1}^{N} Z_i'\, u_i(\widehat{\beta})\, u_i'(\widehat{\beta})\, Z_i
 + \frac{1}{N}\sum_{l=1}^{N-1}\sum_{i=l+1}^{N} K(l, m)\left\{ Z_i'\, u_i(\widehat{\beta})\, u_{i-l}'(\widehat{\beta})\, Z_{i-l} + Z_{i-l}'\, u_{i-l}(\widehat{\beta})\, u_i'(\widehat{\beta})\, Z_i \right\}$$
where m = # if # is specified and m = N - 2 otherwise. Define z = l/(m + 1). If kernel is
bartlett or nwest, then
$$K(l, m) = \begin{cases} 1 - z & 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If kernel is parzen or gallant, then
$$K(l, m) = \begin{cases} 1 - 6z^2 + 6z^3 & 0 \le z \le 0.5 \\ 2(1 - z)^3 & 0.5 < z \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If kernel is quadraticspectral or andrews, then
$$K(l, m) = \begin{cases} 1 & z = 0 \\ 3\left\{\sin(\theta)/\theta - \cos(\theta)\right\}/\theta^2 & \text{otherwise} \end{cases}$$
where $\theta = 6\pi z/5$.
If wmatrix(hac kernel opt) is specified, then gmm uses Newey and West's (1994) automatic
lag-selection algorithm, which proceeds as follows. Define $h$ to be an $m \times 1$ vector of ones. Note
that this definition of $h$ is slightly different than the one used by ivregress. There, the element of
$h$ corresponding to the constant term equals zero, effectively ignoring the effect of the constant in
determining the optimal lag length. Here we include the effect of the constant term. Now define
$$f_i = \left\{Z_i'\, u_i(\beta)\right\}' h$$
$$\widehat{\sigma}_j = \frac{1}{N}\sum_{i=j+1}^{N} f_i\, f_{i-j} \qquad j = 0, \dots, m$$
$$\widehat{s}^{(q)} = 2\sum_{j=1}^{m} \widehat{\sigma}_j\, j^{q}$$
$$\widehat{s}^{(0)} = \widehat{\sigma}_0 + 2\sum_{j=1}^{m} \widehat{\sigma}_j$$
$$\widehat{\gamma} = c_\gamma\left\{\left(\frac{\widehat{s}^{(q)}}{\widehat{s}^{(0)}}\right)^{2}\right\}^{1/(2q+1)}$$
$$m^{*} = \widehat{\gamma}\, N^{1/(2q+1)}$$
where q, m, and $c_\gamma$ depend on the kernel specified:

Kernel                        q    m                          $c_\gamma$
Bartlett/Newey–West           1    int{20(T/100)^{2/9}}       1.1447
Parzen/Gallant                2    int{20(T/100)^{4/25}}      2.6614
Quadratic spectral/Andrews    2    int{20(T/100)^{2/25}}      1.3221

where int(x) denotes the integer obtained by truncating x toward zero. For the Bartlett and Parzen
kernels, the optimal lag is min{int($m^{*}$), m}. For the quadratic spectral kernel, the optimal lag is
min{$m^{*}$, m}.
Variance–covariance matrix
If you specify vce(unadjusted), then the VCE matrix is computed as
$$\mathrm{Var}(\widehat{\beta}) = \frac{1}{N}\left\{G(\widehat{\beta})' W G(\widehat{\beta})\right\}^{-1} \tag{A2}$$
where
$$G(\widehat{\beta}) = \frac{1}{N}\sum_{i=1}^{N} Z_i'\left.\frac{\partial u_i(\beta)}{\partial \beta'}\right|_{\beta=\widehat{\beta}} \tag{A3}$$
For the two-step and iterated estimators, we use the weight matrix $W$ that was used to compute the
final-round estimate $\widehat{\beta}$.
When you do not specify analytic derivatives, gmm must compute the Jacobian matrix (A3)
numerically. By default, gmm computes each element of the matrix individually by using the Mata
deriv() function; see [M-5] deriv( ). This procedure results in accurate derivatives but can be slow
if your model has many instruments or parameters.
When you specify the quickderivatives option, gmm computes all derivatives corresponding
to parameter $\beta_j$, j = 1, ..., k, at once, using two-sided derivatives with a step size of $|\beta_j|\,\epsilon^{1/3}$,
where $\epsilon$ is the machine precision of a double-precision number (approximately $2.22045 \times 10^{-16}$).
This method requires just two evaluations of the model's moments to compute an entire column of
(A3) and therefore has the most impact when you specify many instruments or moment equations so
that (A3) has many rows.
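Written out, the two-sided (central) difference this describes is, for the jth parameter (a standard finite-difference formula, stated here only to make the role of the step size explicit),
$$\frac{\partial u_i(\beta)}{\partial \beta_j} \approx \frac{u_i(\beta + h_j e_j) - u_i(\beta - h_j e_j)}{2 h_j}, \qquad h_j = |\beta_j|\,\epsilon^{1/3}$$
where $e_j$ is the jth unit vector.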
For the one-step estimator, how the unadjusted VCE is computed depends on the type of initial weight
matrix requested and the form of the moment equations. If you specify two or more moment equations
of the form $h_{ij}(w_{ij}; \beta_j)$, then gmm issues a warning message and computes a heteroskedasticity-robust
VCE because here the matrix $Z'Z$ is necessarily singular; moreover, here you must use the
identity matrix as the initial weight matrix. Otherwise, if you specify winitial(identity) or
winitial(unadjusted), then gmm first computes an unadjusted weight matrix based on $\widehat{\beta}$ before
evaluating (A2). If you specify winitial(matname), then (A2) is evaluated based on matname; the
user is responsible for verifying that the VCE and other statistics so produced are appropriate.
All types of robust VCEs computed by gmm take the form
$$\mathrm{Var}(\widehat{\beta}) = \frac{1}{N}\left\{G(\widehat{\beta})' W G(\widehat{\beta})\right\}^{-1} G(\widehat{\beta})' W S W G(\widehat{\beta}) \left\{G(\widehat{\beta})' W G(\widehat{\beta})\right\}^{-1}$$
For the one-step estimator, $W$ represents the initial weight matrix requested using the winitial()
option, and $S$ is computed based on the specification of the vce() option. The formulas for the $S$
matrix are identical to the ones that define the $\Lambda$ matrix in Weight matrix above, except that $S$ is
computed after the moment equations are reevaluated using the final estimate of $\widehat{\beta}$. For the two-step
and iterated GMM estimators, computation of $W$ is controlled by the wmatrix() option based on
the penultimate estimate of $\widehat{\beta}$.
For details on computation of the VCE matrix with dynamic panel-data models, see Panel-style
instruments below.
Hansen’s J statistic
Hansen's (1982) J test of overidentifying restrictions is $J = N \times Q(\widehat{\beta})$. $J \sim \chi^2(m - k)$. If
$m < k$, gmm issues an error message without estimating the parameters. If $m = k$, the model is
just-identified and $J$ is saved as missing ("."). For the two-step and iterated GMM estimators, the
$J$ statistic is based on the last-computed weight matrix as determined by the wmatrix() option.
For the one-step estimator, gmm recomputes a weight matrix as described in the second paragraph
of Variance–covariance matrix above. To obtain Hansen's $J$ statistic, you use estat overid; see
[R] gmm postestimation.
Panel-style instruments
Here we discuss several issues that arise only when you specify panel-style instruments by using
the xtinstruments() option. When you specify the xtinstruments() option, we can no longer
consider the instruments for one observation in isolation; instead, we must consider the instrument
matrix for an entire panel at once. In the following discussion, we let Tdenote the number of
time periods in a panel. To accommodate unbalanced datasets, conceptually we simply use zeros as
instruments and residuals for time periods that are missing in a panel.
We consider the case where you specify both an equation in levels and an equation in differences,
yielding two residual equations. Let $u^L_{pt}(\beta)$ denote the residual for the level equation for panel p in
period t, and let $u^D_{pt}(\beta)$ denote the residual for the corresponding difference equation. Now define
the $(2T - 1) \times 1$ vector $u_p(\beta)$ as
$$u_p(\beta) = \{u^L_{p1}(\beta),\, u^L_{p2}(\beta),\, \dots,\, u^L_{pT}(\beta),\, u^D_{p2}(\beta),\, u^D_{p3}(\beta),\, \dots,\, u^D_{pT}(\beta)\}$$
The $(T + 1)$th element of $u_p$ is $u^D_{p2}(\beta)$ since we lose the first observation of the difference equation
because of differencing.
We write the moment conditions for the pth panel as $Z_p u_p(\beta)$. To see how $Z_p$ is defined, let
$w^L_{pt}$ and $w^D_{pt}$ denote the vectors of panel-style instruments for the level and difference equations,
respectively, and let time be denoted by t; we discuss their dimensions momentarily. Also let $x^L_{pt}$
and $x^D_{pt}$ denote the vectors of instruments specified in instruments() for the level and difference
equations at time t. Without loss of generality, for our discussion we assume that you specify the
level equation first. Then $Z_p$ has the form
$$Z_p = \begin{pmatrix}
w^L_1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & w^L_2 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & w^L_T & 0 & 0 & \cdots & 0 \\
x^L_1 & x^L_2 & \cdots & x^L_T & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & w^D_2 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & w^D_3 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & w^D_T \\
0 & 0 & \cdots & 0 & x^D_2 & x^D_3 & \cdots & x^D_T
\end{pmatrix} \tag{A4}$$
To see how the w vectors are formed, suppose you specify
xtinstruments(eq(1): d, lags(a/b))
Then $w^L_t$ will be a $(b - a + 1) \times 1$ vector consisting of $d_{t-a}, \dots, d_{t-b}$. If $(t - a) \le 0$, then instead
we set $w^L_t = 0$. If $(t - a) > 0$ but $(t - b) \le 0$, then we create $w^L_t$ to consist of $d_{t-a}, \dots, d_1$. With
this definition, $(b - a + 1)$ defines the maximum number of lags of d used, but gmm will proceed
with fewer lags if all $(b - a + 1)$ lags are not available. If you specify two panel-style instruments,
d and e, say, then $w^L_t$ will consist of $d_{t-a}, \dots, d_{t-b}, e_{t-a}, \dots, e_{t-b}$. $w^D_t$ is handled analogously.
The $x^L_t$ vectors are simply $j \times 1$ vectors, where j is the number of regular instruments specified
with the instruments() option; these vectors include a "1" unless you specify the noconstant
suboption.
Looking carefully at (A4), you will notice that for dynamic panel-data models, moment conditions
corresponding to the instruments $x^L_{pt}$ take the form
$$E\left\{\sum_{t=1}^{T} x^L_{pt}\, u^L_{pt}(\beta)\right\} = 0$$
and likewise for $x^D_{pt}$. Instead of having separate moment conditions for each time period, there is one
moment condition equal to the average of individual periods' moments. See Arellano and Bond (1991,
280). To include separate moment conditions for each time period, instead of specifying, say,
instruments(1: x)
you could instead first generate a variable called one equal to unity for all observations and specify
xtinstruments(1: x one)
(Creating the variable one is necessary because a constant is not automatically included in variable
lists specified in xtinstruments().)
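A minimal sketch of that setup step (following the same schematic command style used above; the rest of the gmm call is whatever your model requires) is
. generate byte one = 1
. gmm . . ., xtinstruments(1: x one)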
Unbalanced panels are essentially handled by including zero rows and columns of $Z_p$ and $u_p(\beta)$
corresponding to missing time periods. However, the numbers of instruments and moment conditions
reported by gmm do not reflect this trickery and instead reflect the numbers of instruments and moment
conditions that are not manipulated in this way. Moreover, gmm includes code to work through these
situations efficiently without actually having to fill in zeros.
When you specify winitial(xt . . .), the one-step unadjusted VCE is computed as
$$\mathrm{Var}(\widehat{\beta}) = \widehat{\sigma}^2_1\, \Lambda_H$$
where $\Lambda_H$ was defined previously,
$$\widehat{\sigma}^2_1 = \frac{1}{N - k}\sum_{p=1}^{P} u^D_p(\widehat{\beta})'\, u^D_p(\widehat{\beta})$$
and $u^D_p(\widehat{\beta}) = \{u^D_{p2}(\widehat{\beta}), \dots, u^D_{pT}(\widehat{\beta})\}'$. Here we use $(N - k)^{-1}$ instead of $N^{-1}$ to match xtdpd.
References
Arellano, M., and S. Bond. 1991. Some tests of specification for panel data: Monte Carlo evidence and an application
to employment equations. Review of Economic Studies 58: 277–297.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Blundell, R., R. Griffith, and F. Windmeijer. 2002. Individual effects and dynamics in count data models. Journal of
Econometrics 108: 113–131.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.
Chamberlain, G. 1992. Comment: Sequential moment restrictions in panel data. Journal of Business and Economic
Statistics 10: 20–26.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Doris, A., D. O'Neill, and O. Sweetman. 2011. GMM estimation of the covariance structure of longitudinal data on
earnings. Stata Journal 11: 439–459.
Flynn, Z. L., and L. M. Magnusson. 2013. Parametric inference using structural break tests. Stata Journal 13: 836–861.
Gould, W. W., J. S. Pitblado, and B. P. Poi. 2010. Maximum Likelihood Estimation with Stata. 4th ed. College
Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50:
1029–1054.
Hansen, L. P., and K. J. Singleton. 1982. Generalized instrumental variables estimation of nonlinear rational expectations
models. Econometrica 50: 1269–1286.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Manski, C. F. 1988. Analog Estimation Methods in Econometrics. New York: Chapman & Hall/CRC.
Mátyás, L. 1999. Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking
behavior. Review of Economics and Statistics 79: 586–593.
Newey, W. K., and K. D. West. 1994. Automatic lag selection in covariance matrix estimation. Review of Economic
Studies 61: 631–653.
Nickell, S. J. 1981. Biases in dynamic models with fixed effects. Econometrica 49: 1417–1426.
Ruud, P. A. 2000. An Introduction to Classical Econometric Theory. New York: Oxford University Press.
Wilde, J. 2008. A note on GMM estimation of probit models with endogenous regressors. Statistical Papers 49:
471–484.
Windmeijer, F. 2000. Moment conditions for fixed effects count data models with endogenous regressors. Economics
Letters 68: 21–24.
Windmeijer, F. 2005. A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of
Econometrics 126: 25–51.
Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for
health care. Journal of Applied Econometrics 12: 281–294.
Wooldridge, J. M. 1997. Multiplicative panel data models without the strict exogeneity assumption. Econometric
Theory 13: 667–678.
Wooldridge, J. M. 1999. Distribution-free estimation of some nonlinear panel data models. Journal of Econometrics 90: 77–97.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R] gmm postestimation — Postestimation tools for gmm
[R] ivregress — Single-equation instrumental-variables regression
[R] ml — Maximum likelihood estimation
[R] mlexp — Maximum likelihood estimation of user-specified expressions
[R] nl — Nonlinear least-squares estimation
[R] nlsur — Estimation of nonlinear systems of equations
[XT] xtabond — Arellano–Bond linear dynamic panel-data estimation
[XT] xtdpd — Linear dynamic panel-data estimation
[XT] xtdpdsys — Arellano–Bover/Blundell–Bond linear dynamic panel-data estimation
[U] 20 Estimation and postestimation commands
gmm postestimation — Postestimation tools for gmm
Description Syntax for predict Menu for predict Option for predict
Syntax for estat overid Menu for estat Remarks and examples Stored results
Reference Also see
Description
The following postestimation command is of special interest after gmm:
Command Description
estat overid perform test of overidentifying restrictions
The following standard postestimation commands are also available:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict residuals
predictnl point estimates, standard errors, testing, and inference for generalized predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation command
estat overid reports Hansen's J statistic, which is used to determine the validity of the
overidentifying restrictions in a GMM model. If the model is correctly specified in the sense that
$E\{z_i u_i(\beta)\} = 0$, then the sample analog to that condition should hold at the estimated value of $\beta$.
Hansen's J statistic is valid only if the weight matrix is optimal, meaning that it equals the inverse of
the covariance matrix of the moment conditions. Therefore, estat overid only reports Hansen's J
statistic after two-step or iterated estimation, or if you specified winitial(matname) when calling
gmm. In the latter case, it is your responsibility to determine the validity of the J statistic.
Syntax for predict
predict [type] newvar [if] [in] [, equation(#eqno | eqname)]
predict [type] {stub* | newvar1 ... newvarq} [if] [in]
Residuals are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for the
estimation sample.
You specify one new variable and (optionally) equation(), or you specify stub* or q new variables, where q is the
number of moment equations.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Option for predict
 
Main
equation(#eqno | eqname) specifies the equation for which residuals are desired. Specifying
equation(#1) indicates that the calculation is to be made for the first moment equation. Specifying
equation(demand) would indicate that the calculation is to be made for the moment equation
named demand, assuming there is an equation named demand in the model.
If you specify one new variable name and omit equation(), results are the same as if you had
specified equation(#1).
For more information on using predict after multiple-equation estimation commands, see [R] predict.
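For instance, a minimal usage sketch after fitting a two-equation gmm model (the variable names uhat1 and uhat2 are placeholders chosen here for illustration) might be
. predict double uhat1 if e(sample), equation(#1)
. predict double uhat2 if e(sample), equation(#2)
The if e(sample) restriction keeps the residuals to the estimation sample, as noted above.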
Syntax for estat overid
estat overid
Menu for estat
Statistics >Postestimation >Reports and statistics
Remarks and examples
As we noted in Introduction of [R] gmm, underlying generalized method of moments (GMM)
estimators is a set of l moment conditions, $E\{z_i u_i(\beta)\} = 0$. When l is greater than the number
of parameters, k, any size-k subset of the moment conditions would yield a consistent parameter
estimate. We remarked that the parameter estimates we would obtain would in general depend on
which k moment conditions we used. However, if all our moment conditions are indeed valid, then
the parameter estimates should not differ too much regardless of which k moment conditions we
used to estimate the parameters. The test of overidentifying restrictions is a model specification test
based on this observation. The test of overidentifying restrictions requires that the number of moment
conditions be greater than the number of parameters in the model.
Recall that the GMM criterion function is
$$Q = \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}' W \left\{\frac{1}{N}\sum_i z_i u_i(\beta)\right\}$$
The test of overidentifying restrictions is remarkably simple. If $W$ is an optimal weight matrix, under
the null hypothesis $H_0\colon E\{z_i u_i(\beta)\} = 0$, the test statistic $J = N \times Q \sim \chi^2(l - k)$. A large test
statistic casts doubt on the null hypothesis.
For the test to be valid, $W$ must be optimal, meaning that $W$ must be the inverse of the covariance
matrix of the moment conditions:
$$W^{-1} = E\{z_i u_i(\beta)\, u_i'(\beta)\, z_i'\}$$
Therefore, estat overid works only after the two-step and iterated estimators, or if you supplied
your own initial weight matrix by using the winitial(matname) option to gmm and used the one-step
estimator.
Often the overidentifying restrictions test is interpreted as a test of the validity of the instruments
z. However, other forms of model misspecification can sometimes lead to a significant test statistic.
See Hall (2005, sec. 5.1) for a discussion of the overidentifying restrictions test and its behavior in
correctly and misspecified models.
Example 1
In example 6 of [R]gmm, we fit an exponential regression model of the number of doctor visits
based on the person’s gender, income, possession of private health insurance, and presence of a
chronic disease. We argued that the variable income may be endogenous; we used the person’s age
and race as additional instrumental variables. Here we refit the model and test the specification of
the model. We type
. use http://www.stata-press.com/data/r13/docvisits
. gmm (docvis - exp({xb:private chronic female income} + {b0})),
> instruments(private chronic female age black hispanic)
(output omitted )
. estat overid
Test of overidentifying restriction:
Hansen’s J chi2(2) = 9.52598 (p = 0.0085)
The Jstatistic is significant even at the 1% significance level, so we conclude that our model is
misspecified. One possibility is that age and race directly affect the number of doctor visits, so we
are not justified in excluding them from the model.
A simple technique to explore whether any of the instruments is invalid is to examine the statistics
$$r_j = W_{jj}^{1/2}\left\{\frac{1}{N}\sum_{i=1}^{N} z_{ij}\, u_i(\widehat{\beta})\right\}$$
for j = 1, ..., k, where $W_{jj}$ denotes the jth diagonal element of $W$, $u_i(\widehat{\beta})$ denotes the sample
residuals, and k is the number of instruments. If all the instruments are valid, then the scaled sample
moments should at least be on the same order of magnitude. If one (or more) instrument's $r_j$ is large
in absolute value relative to the others, then that could be an indication that the instrument is not valid.
In Stata, we type
. predict double r if e(sample)         // obtain residual from the model
. matrix W = e(W)                       // retrieve weight matrix
. local i 1
. // loop over each instrument and compute r_j
. foreach var of varlist private chronic female age black hispanic {
  2.         generate double r`var' = r*`var'*sqrt(W[`i', `i'])
  3.         local ++i
  4. }
. summarize r*
Variable Obs Mean Std. Dev. Min Max
r 4412 .0344373 8.26176 -151.1847 113.059
rprivate 4412 .007988 3.824118 -72.66254 54.33852
rchronic 4412 .0026947 2.0707 -43.7311 32.703
rfemale 4412 .0028168 1.566397 -12.7388 24.43621
rage 4412 .0360978 4.752986 -89.74112 55.58143
rblack 4412 -.0379317 1.062027 -24.39747 27.34512
rhispanic 4412 -.017435 1.08567 -5.509386 31.53512
We notice that the r_j statistics for age, black, and hispanic are larger than those for the other
instruments in our model, supporting our suspicion that age and race may have a direct impact on
the number of doctor visits.
Stored results
estat overid stores the following in r():
Scalars
r(J) Hansen's J statistic
r(J_df) J statistic degrees of freedom
r(J_p) J statistic p-value
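These stored results can be used directly in later commands. A minimal sketch, assuming estat overid has just been run after a gmm model such as the one above:
. estat overid
. display "Hansen's J = " %8.4f r(J) " with " r(J_df) " df (p = " %6.4f r(J_p) ")"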
Reference
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Also see
[R]gmm Generalized method of moments estimation
[U] 20 Estimation and postestimation commands
Title
grmeanby — Graph means and medians by categorical variables
Syntax Menu Description Options
Remarks and examples References
Syntax
grmeanby varlist [if] [in] [weight] , summarize(varname) [options]
options Description
Main
summarize(varname) graph mean (or median) of varname
median graph medians; default is to graph means
Plot
cline options change the look of the lines
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in
[G-3]twoway options
summarize(varname) is required.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Graph means/medians by groups
Description
grmeanby graphs the (optionally weighted) means or medians of varname according to the values
of the variables in varlist. The variables in varlist may be string or numeric and, if numeric, may be
labeled.
Options
 
Main
summarize(varname) is required; it specifies the name of the variable whose mean or median is to
be graphed.
median specifies that the graph is to be of medians, not means.
 
Plot
cline options affect the rendition of the lines through the markers, including their color, pattern, and
width; see [G-3]cline options.
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
The idea of graphing means of categorical variables was shown in Chambers and Hastie (1992,
3). Because this was shown in the context of an S function for making such graphs, it doubtless has
roots going back further than that. grmeanby is, in any case, another implementation of what we
will assume is their idea.
Example 1
Using a variation of our auto dataset, we graph the mean of mpg by foreign,rep77,rep78, and
make:
. use http://www.stata-press.com/data/r13/auto1
(Automobile Models)
. grmeanby foreign rep77 rep78 make, sum(mpg)
(figure omitted: means of mpg, Mileage (mpg), graphed by foreign, rep77, rep78, and make)
If we had wanted a graph of medians rather than means, we could have typed
. grmeanby foreign rep77 rep78 make, sum(mpg) median
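Because grmeanby accepts most twoway options, we could also title the graph and save it to disk; a minimal sketch (the filename mygraph is our own):
. grmeanby foreign rep77 rep78 make, sum(mpg) title("Mean mileage by group") saving(mygraph, replace)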
References
Chambers, J. M., and T. J. Hastie, ed. 1992. Statistical Models in S. Pacific Grove, CA: Wadsworth and Brooks/Cole.
Gould, W. W. 1993. gr12: Graphs of means and medians by categorical variables. Stata Technical Bulletin 12: 13.
Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 44–45. College Station, TX: Stata Press.
Title
hausman — Hausman specification test
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
hausman name-consistent [name-efficient] [, options]
options Description
Main
constant include estimated intercepts in comparison; default is to exclude
alleqs use all equations to perform test; default is first equation only
skipeqs(eqlist) skip specified equations when performing test
equations(matchlist) associate/compare the specified (by number) pairs of equations
force force performance of test, even though assumptions are not met
df(#) use # degrees of freedom
sigmamore base both (co)variance matrices on disturbance variance
estimate from efficient estimator
sigmaless base both (co)variance matrices on disturbance variance
estimate from consistent estimator
Advanced
tconsistent(string) consistent estimator column header
tefficient(string) efficient estimator column header
where name-consistent and name-efficient are names under which estimation results were stored via
estimates store; see [R]estimates store.
A period (.) may be used to refer to the last estimation results, even if these were not already stored.
Not specifying name-efficient is equivalent to specifying the last estimation results as “.”.
Menu
Statistics >Postestimation >Tests >Hausman specification test
Description
hausman performs Hausman’s (1978) specification test.
Options
 
Main
constant specifies that the estimated intercept(s) be included in the model comparison; by default,
they are excluded. The default behavior is appropriate for models in which the constant does not
have a common interpretation across the two models.
alleqs specifies that all the equations in the models be used to perform the Hausman test; by default,
only the first equation is used.
skipeqs(eqlist)specifies in eqlist the names of equations to be excluded from the test. Equation
numbers are not allowed in this context, because the equation names, along with the variable
names, are used to identify common coefficients.
equations(matchlist) specifies, by number, the pairs of equations that are to be compared.
The matchlist in equations() should follow the syntax

    #c:#e [, #c:#e [, ...]]

where #c (#e) is an equation number of the always-consistent (efficient under H0) estimator. For
instance, equations(1:1), equations(1:1, 2:2), or equations(1:2).
If equations() is not specified, then equations are matched on equation names.
equations() handles the situation in which one estimator uses equation names and the other
does not. For instance, equations(1:2) means that equation 1 of the always-consistent estimator
is to be tested against equation 2 of the efficient estimator. equations(1:1, 2:2) means that
equation 1 is to be tested against equation 1 and that equation 2 is to be tested against equation 2.
If equations() is specified, the alleqs and skipeqs options are ignored.
force specifies that the Hausman test be performed, even though the assumptions of the Hausman
test seem not to be met, for example, because the estimators were pweighted or the data were
clustered.
df(#)specifies the degrees of freedom for the Hausman test. The default is the matrix rank of the
variance of the difference between the coefficients of the two estimators.
sigmamore and sigmaless specify that the two covariance matrices used in the test be based on a
common estimate of disturbance variance (σ²).
sigmamore specifies that the covariance matrices be based on the estimated disturbance variance
from the efficient estimator. This option provides a proper estimate of the contrast variance for
so-called tests of exogeneity and overidentification in instrumental-variables regression.
sigmaless specifies that the covariance matrices be based on the estimated disturbance variance
from the consistent estimator.
These options can be specified only when both estimators store e(sigma) or e(rmse), or with
the xtreg command. e(sigma_e) is stored after the xtreg command with the fe or mle option.
e(rmse) is stored after the xtreg command with the re option.
sigmamore or sigmaless are recommended when comparing fixed-effects and random-effects
linear regression because they are much less likely to produce a nonpositive-definite-differenced
covariance matrix (although the tests are asymptotically equivalent whether or not one of the
options is specified).
 
Advanced
tconsistent(string)and tefficient(string)are formatting options. They allow you to specify
the headers of the columns of coefficients that default to the names of the models. These options
will be of interest primarily to programmers.
Remarks and examples
hausman is a general implementation of Hausman's (1978) specification test, which compares an
estimator θ̂1 that is known to be consistent with an estimator θ̂2 that is efficient under the assumption
being tested. The null hypothesis is that the estimator θ̂2 is indeed an efficient (and consistent)
estimator of the true parameters. If this is the case, there should be no systematic difference between
the two estimators. If there exists a systematic difference in the estimates, you have reason to doubt
the assumptions on which the efficient estimator is based.
The assumption of efficiency is violated if the estimator is pweighted or the data are clustered,
so hausman cannot be used. The test can be forced by specifying the force option with hausman.
For an alternative to using hausman in these cases, see [R]suest.
To use hausman, you
. (compute the always-consistent estimator)
. estimates store name-consistent
. (compute the estimator that is efficient under H 0)
. hausman name-consistent .
Alternatively, you can turn this around:
. (compute the estimator that is efficient under H 0)
. estimates store name-efficient
. (fit the less-efficient model)
. (compute the always-consistent estimator)
. hausman . name-efficient
You can, of course, also compute and store both the always-consistent and efficient-under-H0
estimators and perform the Hausman test with
. hausman name-consistent name-efficient
Example 1
We are studying the factors that affect the wages of young women in the United States between
1970 and 1988, and we have a panel-data sample of individual women over that time span.
. use http://www.stata-press.com/data/r13/nlswork4
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. describe
Contains data from http://www.stata-press.com/data/r13/nlswork4.dta
obs: 28,534 National Longitudinal Survey.
Young Women 14-26 years of age
in 1968
vars: 6 29 Jan 2013 16:35
size: 370,942
storage display value
variable name type format label variable label
idcode int %8.0g NLS ID
year byte %8.0g interview year
age byte %8.0g age in current year
msp byte %8.0g 1 if married, spouse present
ttl_exp float %9.0g total work experience
ln_wage float %9.0g ln(wage/GNP deflator)
Sorted by: idcode year
We believe that a random-effects specification is appropriate for individual-level effects in our model.
We fit a fixed-effects model that will capture all temporally constant individual-level effects.
. xtreg ln_wage age msp ttl_exp, fe
Fixed-effects (within) regression Number of obs = 28494
Group variable: idcode Number of groups = 4710
R-sq: within = 0.1373 Obs per group: min = 1
between = 0.2571 avg = 6.0
overall = 0.1800 max = 15
F(3,23781) = 1262.01
corr(u_i, Xb) = 0.1476 Prob > F = 0.0000
ln_wage Coef. Std. Err. t P>|t| [95% Conf. Interval]
age -.005485 .000837 -6.55 0.000 -.0071256 -.0038443
msp .0033427 .0054868 0.61 0.542 -.0074118 .0140971
ttl_exp .0383604 .0012416 30.90 0.000 .0359268 .0407941
_cons 1.593953 .0177538 89.78 0.000 1.559154 1.628752
sigma_u .37674223
sigma_e .29751014
rho .61591044 (fraction of variance due to u_i)
F test that all u_i=0: F(4709, 23781) = 7.76 Prob > F = 0.0000
We assume that this model is consistent for the true parameters and store the results by using
estimates store under a name, fixed:
. estimates store fixed
Now we fit a random-effects model as a fully efficient specification of the individual effects
under the assumption that they are random and follow a normal distribution. We then compare these
estimates with the previously stored results by using the hausman command.
. xtreg ln_wage age msp ttl_exp, re
Random-effects GLS regression Number of obs = 28494
Group variable: idcode Number of groups = 4710
R-sq: within = 0.1373 Obs per group: min = 1
between = 0.2552 avg = 6.0
overall = 0.1797 max = 15
Wald chi2(3) = 5100.33
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
ln_wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.0069749 .0006882 -10.13 0.000 -.0083238 -.0056259
msp .0046594 .0051012 0.91 0.361 -.0053387 .0146575
ttl_exp .0429635 .0010169 42.25 0.000 .0409704 .0449567
_cons 1.609916 .0159176 101.14 0.000 1.578718 1.641114
sigma_u .32648519
sigma_e .29751014
rho .54633481 (fraction of variance due to u_i)
. hausman fixed ., sigmamore
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
fixed . Difference S.E.
age -.005485 -.0069749 .0014899 .0004803
msp .0033427 .0046594 -.0013167 .0020596
ttl_exp .0383604 .0429635 -.0046031 .0007181
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 260.40
Prob>chi2 = 0.0000
Under the current specification, our initial hypothesis that the individual-level effects are adequately
modeled by a random-effects model is resoundingly rejected. This result is based on the rest of our
model specification, and random effects might be appropriate for some alternate model of wages.
 
Jerry Allen Hausman was born in West Virginia in 1946. He studied economics at Brown and
Oxford, has been at MIT since 1972, and has made many outstanding contributions to econometrics
and applied microeconomics.
 
Example 2
A stringent assumption of multinomial and conditional logit models is that outcome categories
for the model have the property of independence of irrelevant alternatives (IIA). Stated simply, this
assumption requires that the inclusion or exclusion of categories does not affect the relative risks
associated with the regressors in the remaining categories.
One classic example of a situation in which this assumption would be violated involves the choice
of transportation mode; see McFadden (1974). For simplicity, postulate a transportation model with
the four possible outcomes: rides a train to work, takes a bus to work, drives the Ford to work, and
drives the Chevrolet to work. Clearly, “drives the Ford” is a closer substitute to “drives the Chevrolet”
than it is to “rides a train” (at least for most people). This means that excluding “drives the Ford”
from the model could be expected to affect the relative risks of the remaining options and that the
model would not obey the IIA assumption.
Using the data presented in [R]mlogit, we will use a simplified model to test for IIA. The choice
of insurance type among indemnity, prepaid, and uninsured is modeled as a function of age and
gender. The indemnity category is allowed to be the base category, and the model including all three
outcomes is fit. The results are then stored under the name allcats.
. use http://www.stata-press.com/data/r13/sysdsn3
(Health insurance data)
. mlogit insure age male
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -551.32973
Iteration 2: log likelihood = -551.32802
Iteration 3: log likelihood = -551.32802
Multinomial logistic regression Number of obs = 615
LR chi2(4) = 9.05
Prob > chi2 = 0.0598
Log likelihood = -551.32802 Pseudo R2 = 0.0081
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.0100251 .0060181 -1.67 0.096 -.0218204 .0017702
male .5095747 .1977893 2.58 0.010 .1219147 .8972346
_cons .2633838 .2787575 0.94 0.345 -.2829708 .8097383
Uninsure
age -.0051925 .0113821 -0.46 0.648 -.0275011 .0171161
male .4748547 .3618462 1.31 0.189 -.2343508 1.18406
_cons -1.756843 .5309602 -3.31 0.001 -2.797506 -.7161803
. estimates store allcats
Under the IIA assumption, we would expect no systematic change in the coefficients if we excluded
one of the outcomes from the model. (For an extensive discussion, see Hausman and McFadden
[1984].) We reestimate the parameters, excluding the uninsured outcome, and perform a Hausman
test against the fully efficient full model.
. mlogit insure age male if insure != "Uninsure":insure
Iteration 0: log likelihood = -394.8693
Iteration 1: log likelihood = -390.4871
Iteration 2: log likelihood = -390.48643
Iteration 3: log likelihood = -390.48643
Multinomial logistic regression Number of obs = 570
LR chi2(2) = 8.77
Prob > chi2 = 0.0125
Log likelihood = -390.48643 Pseudo R2 = 0.0111
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.0101521 .0060049 -1.69 0.091 -.0219214 .0016173
male .5144003 .1981735 2.60 0.009 .1259874 .9028133
_cons .2678043 .2775563 0.96 0.335 -.276196 .8118046
. hausman . allcats, alleqs constant
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
. allcats Difference S.E.
age -.0101521 -.0100251 -.0001269 .
male .5144003 .5095747 .0048256 .0123338
_cons .2678043 .2633838 .0044205 .
b = consistent under Ho and Ha; obtained from mlogit
B = inconsistent under Ha, efficient under Ho; obtained from mlogit
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 0.08
Prob>chi2 = 0.9944
(V_b-V_B is not positive definite)
The syntax of the if condition on the mlogit command simply identified the "Uninsured"
category with the insure value label; see [U] 12.6.3 Value labels. On examining the output from
hausman, we see that there is no evidence that the IIA assumption has been violated.
Because the Hausman test is a standardized comparison of model coefficients, using it with
mlogit requires that the base outcome be the same in both competing models. In particular, if the
most-frequent category (the default base outcome) is being removed to test for IIA, you must use the
baseoutcome() option in mlogit to manually set the base outcome to something else. Or you can
use the equations() option of the hausman command to align the equations of the two models.
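For instance, to test IIA by removing the Indemnity outcome (the default base outcome in these data), we would first set a common base outcome in both models; a minimal sketch, assuming Prepaid is coded as 2 in insure and using a storage name of our own:
. mlogit insure age male, baseoutcome(2)
. estimates store allcats2
. mlogit insure age male if insure != "Indemnity":insure, baseoutcome(2)
. hausman . allcats2, alleqs constant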
Having the missing values for the square root of the diagonal of the covariance matrix of the
differences is not comforting, but it is also not surprising. This covariance matrix is guaranteed to be
positive definite only asymptotically (it is a consequence of the assumption that one of the estimators
is efficient), and assurances are not made about the diagonal elements. Negative values along the
diagonal are possible, and the fourth column of the table is provided mainly for descriptive use.
We can also perform the Hausman IIA test against the remaining alternative in the model:
. mlogit insure age male if insure != "Prepaid":insure
Iteration 0: log likelihood = -132.59913
Iteration 1: log likelihood = -131.78009
Iteration 2: log likelihood = -131.76808
Iteration 3: log likelihood = -131.76807
Multinomial logistic regression Number of obs = 338
LR chi2(2) = 1.66
Prob > chi2 = 0.4356
Log likelihood = -131.76807 Pseudo R2 = 0.0063
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Uninsure
age -.0041055 .0115807 -0.35 0.723 -.0268033 .0185923
male .4591074 .3595663 1.28 0.202 -.2456296 1.163844
_cons -1.801774 .5474476 -3.29 0.001 -2.874752 -.7287968
. hausman . allcats, alleqs constant
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
. allcats Difference S.E.
age -.0041055 -.0051925 .001087 .0021355
male .4591074 .4748547 -.0157473 .
_cons -1.801774 -1.756843 -.0449311 .1333421
b = consistent under Ho and Ha; obtained from mlogit
B = inconsistent under Ha, efficient under Ho; obtained from mlogit
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= -0.18 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;
see suest for a generalized test
Here the χ² statistic is actually negative. We might interpret this result as strong evidence that
we cannot reject the null hypothesis. Such a result is not an unusual outcome for the Hausman test,
particularly when the sample is relatively small; there are only 45 uninsured individuals in this
dataset.
Are we surprised by the results of the Hausman test in this example? Not really. Judging from
the z statistics on the original multinomial logit model, we were struggling to identify any structure
in the data with the current specification. Even when we were willing to assume IIA and computed
the efficient estimator under this assumption, few of the effects could be identified as statistically
different from those on the base category. Trying to base a Hausman test on a contrast (difference)
between two poor estimates is just asking too much of the existing data.
In example 2, we encountered a case in which the Hausman test was not well defined. Unfortunately,
in our experience this happens fairly often. Stata provides an alternative to the Hausman test that
overcomes this problem through an alternative estimator of the variance of the difference between
the two estimators. This other estimator is guaranteed to be positive semidefinite. This alternative
estimator also widens the scope of problems to which Hausman-type tests can be applied by relaxing
the assumption that one of the estimators is efficient. For instance, you can perform Hausman-type
tests with clustered observations and survey estimators. See [R] suest for details.
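A minimal sketch of that route for the IIA comparison above; full and noUnins are storage names of our own, and the combined equation names used in the test command (full_Prepaid, noUnins_Prepaid) are assumptions to be checked against the suest output:
. quietly mlogit insure age male
. estimates store full
. quietly mlogit insure age male if insure != "Uninsure":insure
. estimates store noUnins
. suest full noUnins
. test [full_Prepaid = noUnins_Prepaid], cons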
Stored results
hausman stores the following in r():
Scalars
r(chi2) χ²
r(df) degrees of freedom for the statistic
r(p) p-value for the χ²
r(rank) rank of (V_b-V_B)^(-1)
Methods and formulas
The Hausman statistic is distributed as χ² and is computed as

    H = (β_c − β_e)′ (V_c − V_e)⁻¹ (β_c − β_e)

where
    β_c is the coefficient vector from the consistent estimator
    β_e is the coefficient vector from the efficient estimator
    V_c is the covariance matrix of the consistent estimator
    V_e is the covariance matrix of the efficient estimator

When the difference in the variance matrices is not positive definite, a Moore-Penrose generalized
inverse is used. As noted in Gourieroux and Monfort (1995, 125–128), the choice of generalized
inverse is not important asymptotically.
The number of degrees of freedom for the statistic is the rank of the difference in the variance
matrices. When the difference is positive definite, this is the number of common coefficients in the
models being compared.
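To make the formula concrete, the statistic can be reproduced by hand from stored results. A minimal sketch using the fixed- and random-effects example above (without the sigmamore adjustment, so it corresponds to hausman fixed . rather than hausman fixed ., sigmamore); the matrix names are our own:
. quietly xtreg ln_wage age msp ttl_exp, fe
. matrix b_c = e(b)
. matrix V_c = e(V)
. quietly xtreg ln_wage age msp ttl_exp, re
. matrix b_e = e(b)
. matrix V_e = e(V)
. matrix diff = b_c - b_e
. matrix d = diff[1, 1..3]             // drop _cons, as hausman does by default
. matrix Vdiff = V_c - V_e
. matrix Vd = Vdiff[1..3, 1..3]
. matrix H = d * invsym(Vd) * d'
. display "chi2(" colsof(d) ") = " %8.2f H[1,1]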
Acknowledgment
Portions of hausman are based on an early implementation by Jeroen Weesie of the Department
of Sociology at Utrecht University, The Netherlands.
References
Baltagi, B. H. 2011. Econometrics. 5th ed. Berlin: Springer.
Gourieroux, C. S., and A. Monfort. 1995. Statistics and Econometric Models, Vol 2: Testing, Confidence Regions,
Model Selection, and Asymptotic Theory. Trans. Q. Vuong. Cambridge: Cambridge University Press.
Hausman, J. A. 1978. Specification tests in econometrics. Econometrica 46: 1251–1271.
Hausman, J. A., and D. L. McFadden. 1984. Specification tests for the multinomial logit model. Econometrica 52:
1219–1240.
McFadden, D. L. 1974. Measurement of urban travel demand. Journal of Public Economics 3: 303–328.
Also see
[R]lrtest Likelihood-ratio test after estimation
[R]suest Seemingly unrelated estimation
[R]test Test linear hypotheses after estimation
[XT]xtreg Fixed-, between-, and random-effects and population-averaged linear models
Title
heckman — Heckman selection model
Syntax Menu
Description Options for Heckman selection model (ML)
Options for Heckman selection model (two-step) Remarks and examples
Stored results Methods and formulas
References Also see
Syntax
Basic syntax
    heckman depvar [indepvars], select(varlist_s) [twostep]
or
    heckman depvar [indepvars], select(depvar_s = varlist_s) [twostep]
Full syntax for maximum likelihood estimates only
    heckman depvar [indepvars] [if] [in] [weight],
        select([depvar_s =] varlist_s [, noconstant offset(varname_o)])
        [heckman_ml_options]
Full syntax for Heckman's two-step consistent estimates only
    heckman depvar [indepvars] [if] [in], twostep
        select([depvar_s =] varlist_s [, noconstant]) [heckman_ts_options]
heckman ml options Description
Model
select() specify selection equation: dependent and independent
variables; whether to have constant term and offset variable
noconstant suppress constant term
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,opg,bootstrap,
or jackknife
Reporting
level(#)set confidence level; default is level(95)
first report first-step probit estimates
noskip perform likelihood-ratio test
nshazard(newvar)generate nonselection hazard variable
mills(newvar)synonym for nshazard()
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
select() is required.
The full specification is select([depvar_s =] varlist_s [, noconstant offset(varname_o)]).
heckman ts options Description
Model
select() specify selection equation: dependent and independent
variables; whether to have constant term
twostep produce two-step consistent estimate
noconstant suppress constant term
rhosigma truncate ρ to [−1, 1] with consistent σ
rhotrunc truncate ρ to [−1, 1]
rholimited truncate ρ in limited cases
rhoforce do not truncate ρ
SE
vce(vcetype)vcetype may be conventional,bootstrap, or jackknife
Reporting
level(#)set confidence level; default is level(95)
first report first-step probit estimates
nshazard(newvar)generate nonselection hazard variable
mills(newvar)synonym for nshazard()
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
select() and twostep are required.
The full specification is select([depvar_s =] varlist_s [, noconstant]).
indepvars and varlist_s may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, varlist_s, and depvar_s may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
twostep,vce(),first,noskip, and weights are not allowed with the svy prefix; see [SVY] svy.
pweights, aweights, fweights, and iweights are allowed with maximum likelihood estimation; see [U] 11.1.6 weight.
No weights are allowed if twostep is specified.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
heckman for maximum likelihood estimates
Statistics >Sample-selection models >Heckman selection model (ML)
heckman for two-step consistent estimates
Statistics >Sample-selection models >Heckman selection model (two-step)
Description
heckman fits regression models with selection by using either Heckman’s two-step consistent
estimator or full maximum likelihood.
Options for Heckman selection model (ML)
 
Model
select([depvar_s =] varlist_s [, noconstant offset(varname_o)]) specifies the variables and
options for the selection equation. It is an integral part of specifying a Heckman model and is
required. The selection equation should contain at least one variable that is not in the outcome
equation.
If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected
and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar
is not missing are assumed selected, and those for which depvar is missing are assumed not
selected.
noconstant suppresses the selection constant term (intercept).
offset(varname_o) specifies that selection offset varname_o be included in the model with the
coefficient constrained to be 1.
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first specifies that the first-step probit estimates of the selection equation be displayed before
estimation.
noskip specifies that a full maximum-likelihood model with only a constant for the regression equation
be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test
for the model test statistic displayed in the estimation header. By default, the overall model test
statistic is an asymptotically equivalent Wald test that all the parameters in the regression equation
are zero (except the constant). For many models, this option can substantially increase estimation
time.
nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing
the nonselection hazard (what Heckman (1979) referred to as the inverse of the Mills' ratio) from
the selection equation. The nonselection hazard is computed from the estimated parameters of the
selection equation.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with heckman but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Options for Heckman selection model (two-step)
 
Model
select([depvar_s =] varlist_s [, noconstant]) specifies the variables and options for the selection
equation. It is an integral part of specifying a Heckman model and is required. The selection equation
should contain at least one variable that is not in the outcome equation.
If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected
and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar
is not missing are assumed selected, and those for which depvar is missing are assumed not
selected.
noconstant suppresses the selection constant term (intercept).
twostep specifies that Heckman’s (1979) two-step efficient estimates of the parameters, standard
errors, and covariance matrix be produced.
noconstant; see [R]estimation options.
rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options to specify how the
two-step estimator (option twostep) handles unusual cases in which the two-step estimate of ρ is
outside the admissible range for a correlation, [−1, 1]. When abs(ρ) > 1, the two-step estimate of
the coefficient variance-covariance matrix may not be positive definite and thus may be unusable
for testing. The default is rhosigma.
rhosigma specifies that ρ be truncated, as with the rhotrunc option, and that the estimate of σ be
made consistent with ρ̂, the truncated estimate of ρ. So, σ̂ = β_m/ρ̂; see Methods and formulas for
the definition of β_m. Both the truncated ρ and the new estimate of σ̂ are used in all computations
to estimate the two-step covariance matrix.
rhotrunc specifies that ρ be truncated to lie in the range [−1, 1]. If the two-step estimate is less
than −1, ρ is set to −1; if the two-step estimate is greater than 1, ρ is set to 1. This truncated
value of ρ is used in all computations to estimate the two-step covariance matrix.
rholimited specifies that ρ be truncated only in computing the diagonal matrix D as it enters
V_twostep and Q; see Methods and formulas. In all other computations, the untruncated estimate
of ρ is used.
rhoforce specifies that the two-step estimate of ρ be retained, even if it is outside the admissible
range for a correlation. This option may, in rare cases, lead to a nonpositive-definite covariance
matrix.
These options have no effect when estimation is by maximum likelihood, the default. They also
have no effect when the two-step estimate of ρ is in the range [−1, 1].
 
SE
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (conventional) and that use bootstrap or jackknife methods (bootstrap,
jackknife); see [R]vce option.
vce(conventional), the default, uses the two-step variance estimator derived by Heckman.
 
Reporting
level(#); see [R]estimation options.
first specifies that the first-step probit estimates of the selection equation be displayed before
estimation.
nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing
the nonselection hazard (what Heckman (1979) referred to as the inverse of the Mills' ratio) from
the selection equation. The nonselection hazard is computed from the estimated parameters of the
selection equation.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following option is available with heckman but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
The Heckman selection model (Gronau 1974; Lewis 1974; Heckman 1976) assumes that there
exists an underlying regression relationship,

    y_j = x_j β + u_1j        (regression equation)

The dependent variable, however, is not always observed. Rather, the dependent variable for
observation j is observed if

    z_j γ + u_2j > 0          (selection equation)

where

    u_1 ~ N(0, σ)
    u_2 ~ N(0, 1)
    corr(u_1, u_2) = ρ

When ρ ≠ 0, standard regression techniques applied to the first equation yield biased results. heckman
provides consistent, asymptotically efficient estimates for all the parameters in such models.
In one classic example, the first equation describes the wages of women. Women choose whether
to work, and thus, from our point of view as researchers, whether we observe their wages in our
data. If women made this decision randomly, we could ignore that not all wages are observed and
use ordinary regression to fit a wage model. Such an assumption of random participation, however,
is unlikely to be true; women who would have low wages may be unlikely to choose to work, and
thus the sample of observed wages is biased upward. In the jargon of economics, women choose not
to work when their personal reservation wage is greater than the wage offered by employers. Thus
women who choose not to work might have even higher offer wages than those who do workthey
may have high offer wages, but they have even higher reservation wages. We could tell a story that
competency is related to wages, but competency is rewarded more at home than in the labor force.
In any case, in this problem (which is the paradigm for most such problems), a solution can
be found if there are some variables that strongly affect the chances for observation (the reservation
wage) but not the outcome under study (the offer wage). Such a variable might be the number of
children in the home. (Theoretically, we do not need such identifying variables, but without them,
we depend on functional form to identify the model. It would be difficult for anyone to take such
results seriously because the functional form assumptions have no firm basis in theory.)
Example 1
In the syntax for heckman,depvar and indepvars are the dependent variable and regressors for the
underlying regression model to be fit (y=Xβ), and varlistsare the variables (Z) thought to determine
whether depvar is observed or unobserved (selected or not selected). In our female wage example,
the number of children at home would be included in the second list. By default, heckman assumes
that missing values (see [U] 12.2.1 Missing values) of depvar imply that the dependent variable is
unobserved (not selected). With some datasets, it is more convenient to specify a binary variable
(depvars) that identifies the observations for which the dependent is observed/selected (depvars6=0)
or not observed (depvars=0); heckman will accommodate either type of data.
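For example, once the womenwk dataset used below is loaded, an equivalent way to fit the wage model would be to construct our own 0/1 selection indicator (the name worked is ours) and pass it to select(); a minimal sketch:
. generate byte worked = !missing(wage)
. heckman wage educ age, select(worked = married children educ age)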
We have a (fictional) dataset on 2,000 women, 1,343 of whom work:
. use http://www.stata-press.com/data/r13/womenwk
. summarize age educ married children wage
Variable Obs Mean Std. Dev. Min Max
age 2000 36.208 8.28656 20 59
education 2000 13.084 3.045912 10 20
married 2000 .6705 .4701492 0 1
children 2000 1.6445 1.398963 0 5
wage 1343 23.69217 6.305374 5.88497 45.80979
We will assume that the hourly wage is a function of education and age, whereas the likelihood
of working (the likelihood of the wage being observed) is a function of marital status, the number of
children at home, and (implicitly) the wage (via the inclusion of age and education, which we think
determine the wage):
. heckman wage educ age, select(married children educ age)
Iteration 0: log likelihood = -5178.7009
Iteration 1: log likelihood = -5178.3049
Iteration 2: log likelihood = -5178.3045
Heckman selection model Number of obs = 2000
(regression model with sample selection) Censored obs = 657
Uncensored obs = 1343
Wald chi2(2) = 508.44
Log likelihood = -5178.304 Prob > chi2 = 0.0000
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
wage
education .9899537 .0532565 18.59 0.000 .8855729 1.094334
age .2131294 .0206031 10.34 0.000 .1727481 .2535108
_cons .4857752 1.077037 0.45 0.652 -1.625179 2.59673
select
married .4451721 .0673954 6.61 0.000 .3130794 .5772647
children .4387068 .0277828 15.79 0.000 .3842534 .4931601
education .0557318 .0107349 5.19 0.000 .0346917 .0767718
age .0365098 .0041533 8.79 0.000 .0283694 .0446502
_cons -2.491015 .1893402 -13.16 0.000 -2.862115 -2.119915
/athrho .8742086 .1014225 8.62 0.000 .6754241 1.072993
/lnsigma 1.792559 .027598 64.95 0.000 1.738468 1.84665
rho .7035061 .0512264 .5885365 .7905862
sigma 6.004797 .1657202 5.68862 6.338548
lambda 4.224412 .3992265 3.441942 5.006881
LR test of indep. eqns. (rho = 0): chi2(1) = 61.20 Prob > chi2 = 0.0000
heckman assumes that wage is the dependent variable and that the first variable list (educ and age)
are the determinants of wage. The variables specified in the select() option (married, children,
educ, and age) are assumed to determine whether the dependent variable is observed (the selection
equation). Thus we fit the model

    wage = β0 + β1 educ + β2 age + u_1

and we assumed that wage is observed if

    γ0 + γ1 married + γ2 children + γ3 educ + γ4 age + u_2 > 0

where u_1 and u_2 have correlation ρ.
The reported results for the wage equation are interpreted exactly as though we observed wage
data for all women in the sample; the coefficients on age and education level represent the estimated
marginal effects of the regressors in the underlying regression equation. The results for the two
ancillary parameters require some explanation. heckman does not directly estimate ρ; to constrain
ρ within its valid limits, and for numerical stability during optimization, it estimates the inverse
hyperbolic tangent of ρ:

    atanh ρ = (1/2) ln{(1 + ρ)/(1 − ρ)}

This estimate is reported as /athrho. In the bottom panel of the output, heckman undoes this
transformation for you: the estimated value of ρ is 0.7035061. The standard error for ρ is computed
using the delta method, and its confidence intervals are the transformed intervals of /athrho.
Similarly, σ, the standard error of the residual in the wage equation, is not directly estimated; for
numerical stability, heckman instead estimates ln σ. The untransformed sigma is reported at the end
of the output: 6.004797.
Finally, some researchers (especially economists) are used to the selectivity effect summarized
not by ρ but by λ = ρσ. heckman reports this, too, along with an estimate of the standard error and
confidence interval.
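These derived quantities can also be recovered by hand with nlcom, which applies the same delta method to functions of the estimated /athrho and /lnsigma. A minimal sketch, assuming the underlying equation names are athrho and lnsigma (confirm with heckman, coeflegend):
. nlcom (rho: tanh([athrho]_cons)) (lambda: tanh([athrho]_cons)*exp([lnsigma]_cons))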
Technical note
If each of the equations in the model had contained many regressors, the heckman command could
have become long. An alternate way of specifying our wage model would be to use Stata’s global
macros. The following lines are an equivalent way of specifying our model:
. global wageeq "wage educ age"
. global seleq "married children educ age"
. heckman $wageeq, select($seleq)
(output omitted )
Technical note
The reported model χ² test is a Wald test that all coefficients in the regression model (except
the constant) are 0. heckman is an estimation command, so you can use test, testnl, or lrtest
to perform tests against whatever nested alternate model you choose; see [R] test, [R] testnl, and
[R] lrtest.
The estimation of ρ and σ in the forms atanh ρ and ln σ extends the range of these parameters to
infinity in both directions, thus avoiding boundary problems during the maximization. Tests of ρ must
be made in the transformed units. However, because atanh(0) = 0, the reported test for atanh ρ = 0
is equivalent to the test for ρ = 0.
The likelihood-ratio test reported at the bottom of the output is an equivalent test for ρ = 0 and is
computationally the comparison of the joint likelihood of an independent probit model for the selection
equation and a regression model on the observed wage data against the Heckman model likelihood.
Because χ² = 61.20, this clearly justifies the Heckman selection equation with these data.
Example 2
heckman supports the Huber/White/sandwich estimator of variance under the vce(robust) and
vce(cluster clustvar)options or when pweights are used for population-weighted data; see
[U] 20.21 Obtaining robust variance estimates. We can obtain robust standard errors for our wage
model by specifying clustering on county of residence (the county variable).
. heckman wage educ age, select(married children educ age) vce(cluster county)
Iteration 0: log pseudolikelihood = -5178.7009
Iteration 1: log pseudolikelihood = -5178.3049
Iteration 2: log pseudolikelihood = -5178.3045
Heckman selection model Number of obs = 2000
(regression model with sample selection) Censored obs = 657
Uncensored obs = 1343
Wald chi2(1) = .
Log pseudolikelihood = -5178.304 Prob > chi2 = .
(Std. Err. adjusted for 10 clusters in county)
Robust
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
wage
education .9899537 .0600061 16.50 0.000 .8723438 1.107564
age .2131294 .020995 10.15 0.000 .17198 .2542789
_cons .4857752 1.302103 0.37 0.709 -2.066299 3.03785
select
married .4451721 .0731472 6.09 0.000 .3018062 .5885379
children .4387068 .0312386 14.04 0.000 .3774802 .4999333
education .0557318 .0110039 5.06 0.000 .0341645 .0772991
age .0365098 .004038 9.04 0.000 .0285954 .0444242
_cons -2.491015 .1153305 -21.60 0.000 -2.717059 -2.264972
/athrho .8742086 .1403337 6.23 0.000 .5991596 1.149258
/lnsigma 1.792559 .0258458 69.36 0.000 1.741902 1.843216
rho .7035061 .0708796 .5364513 .817508
sigma 6.004797 .155199 5.708189 6.316818
lambda 4.224412 .5186709 3.207835 5.240988
Wald test of indep. eqns. (rho = 0): chi2(1) = 38.81 Prob > chi2 = 0.0000
The robust standard errors tend to be a bit larger, but we notice no systematic differences. This finding
is not surprising because the data were not constructed to have any county-specific correlations or
any other characteristics that would deviate from the assumptions of the Heckman model.
Example 3
Stata also produces Heckman’s (1979) two-step efficient estimator of the model with the twostep
option. Maximum likelihood estimation of the parameters can be time consuming with large datasets,
and the two-step estimates may provide a good alternative in such cases. Continuing with the women’s
wage model, we can obtain the two-step estimates with Heckman’s consistent covariance estimates
by typing
. heckman wage educ age, select(married children educ age) twostep
Heckman selection model -- two-step estimates Number of obs = 2000
(regression model with sample selection) Censored obs = 657
Uncensored obs = 1343
Wald chi2(2) = 442.54
Prob > chi2 = 0.0000
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
wage
education .9825259 .0538821 18.23 0.000 .8769189 1.088133
age .2118695 .0220511 9.61 0.000 .1686502 .2550888
_cons .7340391 1.248331 0.59 0.557 -1.712645 3.180723
select
married .4308575 .074208 5.81 0.000 .2854125 .5763025
children .4473249 .0287417 15.56 0.000 .3909922 .5036576
education .0583645 .0109742 5.32 0.000 .0368555 .0798735
age .0347211 .0042293 8.21 0.000 .0264318 .0430105
_cons -2.467365 .1925635 -12.81 0.000 -2.844782 -2.089948
mills
lambda 4.001615 .6065388 6.60 0.000 2.812821 5.19041
rho 0.67284
sigma 5.9473529
Technical note
The Heckman selection model depends strongly on the model being correct, much more so than
ordinary regression. Running a separate probit or logit for sample inclusion followed by a regression,
referred to in the literature as the two-part model (Manning, Duan, and Rogers 1987)not to be
confused with Heckman’s two-step procedureis an especially attractive alternative if the regression
part of the model arose because of taking a logarithm of zero values. When the goal is to analyze an
underlying regression model or to predict the value of the dependent variable that would be observed
in the absence of selection, however, the Heckman model is more appropriate. When the goal is to
predict an actual response, the two-part model is usually the better choice.
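A minimal sketch of the two-part model for the wage data, reusing the 0/1 participation indicator worked defined earlier (worked = !missing(wage)); participation and the observed wage are modeled in two unrelated steps:
. probit worked married children educ age
. regress wage educ age if worked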
The Heckman selection model can be unstable when the model is not properly specified or if a
specific dataset simply does not support the model’s assumptions. For example, let’s examine the
solution to another simulated problem.
. use http://www.stata-press.com/data/r13/twopart
. heckman yt x1 x2 x3, select(z1 z2) nonrtol
Iteration 0: log likelihood = -111.94996
Iteration 1: log likelihood = -110.82258
Iteration 2: log likelihood = -110.17707
Iteration 3: log likelihood = -107.70663 (not concave)
Iteration 4: log likelihood = -107.07729 (not concave)
(output omitted )
Iteration 33: log likelihood = -104.0825 (not concave)
Iteration 34: log likelihood = -104.0825
Heckman selection model Number of obs = 150
(regression model with sample selection) Censored obs = 87
Uncensored obs = 63
Wald chi2(3) = 8.64e+08
Log likelihood = -104.0825 Prob > chi2 = 0.0000
yt Coef. Std. Err. z P>|z| [95% Conf. Interval]
yt
x1 .8974192 .0002247 3994.69 0.000 .8969789 .8978595
x2 -2.525303 .0001472 -1.7e+04 0.000 -2.525591 -2.525014
x3 2.855786 .0004181 6829.86 0.000 2.854966 2.856605
_cons .6975442 .0920515 7.58 0.000 .5171265 .8779619
select
z1 -.6825988 .0900159 -7.58 0.000 -.8590267 -.5061709
z2 1.003605 .132347 7.58 0.000 .7442097 1.263
_cons -.3604652 .1232778 -2.92 0.003 -.6020852 -.1188452
/athrho 16.19193 280.9822 0.06 0.954 -534.523 566.9069
/lnsigma -.5396153 .1318714 -4.09 0.000 -.7980786 -.2811521
rho 1 9.73e-12 -1 1
sigma .5829725 .0768774 .4501931 .7549135
lambda .5829725 .0768774 .4322955 .7336494
LR test of indep. eqns. (rho = 0): chi2(1) = 25.67 Prob > chi2 = 0.0000
The model has converged to a value of ρ that is 1.0, within machine-rounding tolerances. Given
the form of the likelihood for the Heckman selection model, this implies a division by zero, and it
is surprising that the model solution turns out as well as it does. Reparameterizing ρ has allowed
the estimation to converge, but we clearly have problems with the estimates. Moreover, if this had
occurred in a large dataset, waiting for convergence might take considerable time.
This dataset was not intentionally developed to cause problems. It is actually generated by a
“Heckman process” and when generated starting from different random values can be easily estimated.
The luck of the draw here merely led to data that, despite the source, did not support the assumptions
of the Heckman model.
The two-step model is generally more stable when the data are problematic. It even tolerates
estimates of ρ less than −1 and greater than 1. For these reasons, the two-step model may be
preferred when exploring a large dataset. Still, if the maximum likelihood estimates cannot converge,
or converge to a value of ρ that is at the boundary of acceptable values, there is scant support for
fitting a Heckman selection model on the data. Heckman (1979) discusses the implications of ρ being
exactly 1 or 0, together with the implications of other possible covariance relationships among the
model's determinants.
 
James Joseph Heckman was born in Chicago in 1944 and studied mathematics at Colorado
College and economics at Princeton. He has taught economics at Columbia and (since 1973) at
the University of Chicago. He has worked on developing a scientific basis for economic policy
evaluation, with emphasis on models of individuals or disaggregated groups and the problems and
possibilities created by heterogeneity, diversity, and unobserved counterfactual states. In 2000,
he shared the Nobel Prize in Economics with Daniel L. McFadden.
 
Stored results
heckman (maximum likelihood) stores the following in e():
Scalars
e(N) number of observations
e(N_cens) number of censored observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_aux) number of auxiliary parameters
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(lambda) λ
e(selambda) standard error of λ
e(sigma) sigma
e(chi2) χ²
e(chi2_c) χ² for comparison test
e(p_c) p-value for comparison test
e(p) significance of comparison test
e(rho) ρ
e(rank) rank of e(V)
e(rank0) rank of e(V) for constant-only model
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) heckman
e(cmdline) command as typed
e(depvar) names of dependent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(title2) secondary title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset for regression equation
e(offset2) offset for selection equation
e(mills) variable containing nonselection hazard (inverse of Mills’)
e(chi2type) Wald or LR; type of model χ² test
e(chi2_ct) Wald or LR; type of model χ² test corresponding to e(chi2_c)
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(method) ml
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance-covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
heckman (two-step) stores the following in e():
Scalars
e(N) number of observations
e(N_cens) number of censored observations
e(df_m) model degrees of freedom
e(lambda) λ
e(selambda) standard error of λ
e(sigma) sigma
e(chi2) χ²
e(p) significance of comparison test
e(rho) ρ
e(rank) rank of e(V)
Macros
e(cmd) heckman
e(cmdline) command as typed
e(depvar) names of dependent variables
e(title) title in estimation output
e(title2) secondary title in estimation output
e(mills) variable containing nonselection hazard (inverse of Mills’)
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(rhometh) rhosigma,rhotrunc,rholimited, or rhoforce
e(method) twostep
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V) variance-covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Cameron and Trivedi (2010, 556–562) and Greene (2012, 873–880) provide good introductions to
the Heckman selection model. Adkins and Hill (2011, sec. 16.8) describe the two-step estimator with
an application using Stata. Jones (2007, 35–40) illustrates Heckman estimation with an application
to health economics.
Regression estimates using the nonselection hazard (Heckman 1979) provide starting values for
maximum likelihood estimation.
The regression equation is

    y_j = x_j β + u_1j

The selection equation is

    z_j γ + u_2j > 0
where

    u_1 ~ N(0, σ)
    u_2 ~ N(0, 1)
    corr(u_1, u_2) = ρ

The log likelihood for observation j, ln L_j = l_j, is

    l_j = w_j ln Φ{ (z_j γ + (y_j − x_j β)ρ/σ) / sqrt(1 − ρ²) }
            − (w_j/2) {(y_j − x_j β)/σ}² − w_j ln(sqrt(2π) σ)      if y_j observed

    l_j = w_j ln Φ(−z_j γ)                                         if y_j not observed

where Φ(·) is the standard cumulative normal and w_j is an optional weight for observation j.
In the maximum likelihood estimation, σ and ρ are not directly estimated. Directly estimated are
ln σ and atanh ρ:

    atanh ρ = (1/2) ln{(1 + ρ)/(1 − ρ)}

The standard error of λ = ρσ is approximated through the propagation of error (delta) method; that
is,

    Var(λ) ≈ D Var{(atanh ρ, ln σ)} D′

where D is the Jacobian of λ with respect to atanh ρ and ln σ.
With maximum likelihood estimation, this command supports the Huber/White/sandwich estimator
of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively.
See [P]robust, particularly Maximum likelihood estimators and Methods and formulas.
The maximum likelihood version of heckman also supports estimation with survey data. For details
on VCEs with survey data, see [SVY]variance estimation.
The two-step estimates are computed using Heckman’s (1979) procedure.
Probit estimates of the selection equation

    Pr(y_j observed | z_j) = Φ(z_j γ)

are obtained. From these estimates, the nonselection hazard (what Heckman (1979) referred to as
the inverse of the Mills' ratio), m_j, for each observation j is computed as

    m_j = φ(z_j γ̂) / Φ(z_j γ̂)

where φ is the normal density. We also define

    δ_j = m_j (m_j + γ̂ z_j)
A consistent estimate of the regression disturbance variance is obtained using the residuals from
the augmented regression and the parameter estimate on the nonselection hazard,
bσ2=e0e+β2
mPN
j=1 δj
N
The two-step estimate of ρis then
bρ=βm
bσ
Heckman derived consistent estimates of the coefficient covariance matrix on the basis of the
augmented regression.
Let W = [X m] and let R be a square, diagonal matrix of dimension N with (1 − ρ̂²δ_j) as the
diagonal elements. The conventional VCE is
$$V_{\text{twostep}} = \widehat{\sigma}^{2}(W'W)^{-1}(W'RW + Q)(W'W)^{-1}$$
where
$$Q = \widehat{\rho}^{2}(W'DZ)\,V_p\,(Z'DW)$$
where D is the square, diagonal matrix of dimension N with δ_j as the diagonal elements; Z is the
data matrix of selection-equation covariates; and V_p is the variance–covariance estimate from the
probit estimation of the selection equation.
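The two-step point estimates can be traced by hand. The following do-file sketch, using the womenwk data from the examples in [R] heckman, carries out the probit step, constructs the nonselection hazard, and fits the augmented regression; it reproduces the point estimates but not the corrected VCE above, and the names selected, zg, and m are arbitrary:

    use http://www.stata-press.com/data/r13/womenwk, clear
    // step 1: probit for whether wage is observed, then the nonselection hazard
    generate byte selected = !missing(wage)
    probit selected married children educ age
    predict zg, xb
    generate m = normalden(zg)/normal(zg)
    // step 2: augment the wage regression with the hazard; the coefficient on m is beta_m
    regress wage educ age m if selected
    display "lambda (rho*sigma) estimate = " _b[m]
    // the supported estimator, for comparison
    heckman wage educ age, select(married children educ age) twostep

The standard errors from the augmented regress are not the corrected ones; heckman, twostep reports the VCE given above.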
References
Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Chiburis, R., and M. Lokshin. 2007. Maximum likelihood and two-step estimation of an ordered-probit selection
model. Stata Journal 7: 167–182.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Gronau, R. 1974. Wage comparisons: A selectivity bias. Journal of Political Economy 82: 1119–1143.
Heckman, J. 1976. The common structure of statistical models of truncation, sample selection and limited dependent
variables and a simple estimator for such models. Annals of Economic and Social Measurement 5: 475–492.
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
Jones, A. 2007. Applied Econometrics for Health Economists: A Practical Guide. 2nd ed. Abingdon, UK: Radcliffe.
Lewis, H. G. 1974. Comments on selectivity biases in wage comparisons. Journal of Political Economy 82: 1145–1155.
Manning, W. G., N. Duan, and W. H. Rogers. 1987. Monte Carlo evidence on the choice between sample selection
and two-part models. Journal of Econometrics 35: 59–82.
Also see
[R]heckman postestimation Postestimation tools for heckman
[R]heckoprobit Ordered probit model with sample selection
[R]heckprobit Probit model with sample selection
[R]regress Linear regression
[R]tobit Tobit regression
[SVY]svy estimation Estimation commands for survey data
[TE]etregress Linear regression with endogenous treatment effects
[U] 20 Estimation and postestimation commands
Title
heckman postestimation — Postestimation tools for heckman
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Reference Also see
Description
The following postestimation commands are available after heckman:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic¹ Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest² likelihood-ratio test; not available with two-step estimator
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest¹ seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ estat ic and suest are not appropriate after heckman, twostep.
² lrtest is not appropriate with svy estimation results.
Syntax for predict
After ML or twostep
    predict [type] newvar [if] [in] [, statistic nooffset]
After ML
    predict [type] {stub* | newvar_reg newvar_sel newvar_athrho newvar_lnsigma} [if] [in], scores
statistic Description
Main
xb linear prediction; the default
stdp standard error of the prediction
stdf standard error of the forecast
xbsel linear prediction for selection equation
stdpsel standard error of the linear prediction for selection equation
pr(a,b) Pr(a < y_j < b)
e(a,b) E(y_j | a < y_j < b)
ystar(a,b) E(y*_j), y*_j = max{a, min(y_j, b)}
ycond E(y_j | y_j observed)
yexpected E(y*_j), y_j taken to be 0 where unobserved
nshazard or mills nonselection hazard (also called the inverse of Mills’ ratio)
psel Pr(y_j observed)
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .)
means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction x_j b.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.
xbsel calculates the linear prediction for the selection equation.
stdpsel calculates the standard error of the linear prediction for the selection equation.
pr(a,b) calculates Pr(a < x_j b + u_1 < b), the probability that y_j | x_j would be observed in the
interval (a, b); a short sketch using this family of statistics appears at the end of this section.
a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < x_j b + u_1 < 30); pr(lb,ub) calculates Pr(lb < x_j b + u_1 < ub);
and pr(20,ub) calculates Pr(20 < x_j b + u_1 < ub).
a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < x_j b + u_j < 30);
pr(lb,30) calculates Pr(−∞ < x_j b + u_j < 30) in observations for which lb ≥ .
and calculates Pr(lb < x_j b + u_j < 30) elsewhere.
b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > x_j b + u_j > 20);
pr(20,ub) calculates Pr(+∞ > x_j b + u_j > 20) in observations for which ub ≥ .
and calculates Pr(20 < x_j b + u_j < ub) elsewhere.
e(a,b) calculates E(x_j b + u_1 | a < x_j b + u_1 < b), the expected value of y_j | x_j conditional on
y_j | x_j being in the interval (a, b), meaning that y_j | x_j is truncated.
a and b are specified as they are for pr().
ystar(a,b) calculates E(y*_j), where y*_j = a if x_j b + u_j ≤ a, y*_j = b if x_j b + u_j ≥ b, and
y*_j = x_j b + u_j otherwise, meaning that y*_j is censored. a and b are specified as they are for pr().
ycond calculates the expected value of the dependent variable conditional on the dependent variable
being observed, that is, selected; E(y_j | y_j observed).
yexpected calculates the expected value of the dependent variable (y*_j), where that value is taken
to be 0 when it is expected to be unobserved; y*_j = Pr(y_j observed) E(y_j | y_j observed).
The assumption of 0 is valid for many cases where nonselection implies nonparticipation (for
example, unobserved wage levels, insurance claims from those who are uninsured) but may be
inappropriate for some problems (for example, unobserved disease incidence).
nshazard and mills are synonyms; both calculate the nonselection hazard (what Heckman [1979]
referred to as the inverse of the Mills’ ratio) from the selection equation.
psel calculates the probability of selection (or being observed):
Pr(y_j observed) = Pr(z_j γ + u_{2j} > 0).
nooffset is relevant when you specify offset(varname) for heckman. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b
rather than as x_j b + offset_j.
scores, not available with twostep, calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(x_j β).
The second new variable will contain ∂lnL/∂(z_j γ).
The third new variable will contain ∂lnL/∂(atanh ρ).
The fourth new variable will contain ∂lnL/∂(ln σ).
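As a brief illustration of the interval statistics pr(), e(), and ystar(), the following do-file sketch computes a few of them after fitting the wage model of [R] heckman; the bounds 10 and 30 and the generated variable names are arbitrary choices for illustration:

    use http://www.stata-press.com/data/r13/womenwk, clear
    heckman wage educ age, select(married children educ age)
    predict p10_30, pr(10,30)        // Pr(10 < xb + u1 < 30)
    predict e10_30, e(10,30)         // truncated mean, E(y | 10 < y < 30)
    predict ys10_30, ystar(10,30)    // censored mean, y* = max{10, min(y,30)}
    generate lb = 10
    generate ub = .                  // a missing bound is treated as +infinity
    predict pvar, pr(lb,ub)
    summarize p10_30 e10_30 ys10_30 pvar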
Remarks and examples
Example 1
The default statistic produced by predict after heckman is the expected value of the dependent
variable from the underlying distribution of the regression model. In the wage model of [R]heckman,
this is the expected wage rate among all women, regardless of whether they were observed to
participate in the labor force:
. use http://www.stata-press.com/data/r13/womenwk
. heckman wage educ age, select(married children educ age) vce(cluster county)
(output omitted )
. predict heckwage
(option xb assumed; fitted values)
It is instructive to compare these predicted wage values from the Heckman model with those from an
ordinary regression model, that is, a model without the selection adjustment:
. regress wage educ age
Source SS df MS Number of obs = 1343
F( 2, 1340) = 227.49
Model 13524.0337 2 6762.01687 Prob > F = 0.0000
Residual 39830.8609 1340 29.7245231 R-squared = 0.2535
Adj R-squared = 0.2524
Total 53354.8946 1342 39.7577456 Root MSE = 5.452
wage Coef. Std. Err. t P>|t| [95% Conf. Interval]
education .8965829 .0498061 18.00 0.000 .7988765 .9942893
age .1465739 .0187135 7.83 0.000 .109863 .1832848
_cons 6.084875 .8896182 6.84 0.000 4.339679 7.830071
. predict regwage
(option xb assumed; fitted values)
. summarize heckwage regwage
Variable Obs Mean Std. Dev. Min Max
heckwage 2000 21.15532 3.83965 14.6479 32.85949
regwage 2000 23.12291 3.241911 17.98218 32.66439
Since this dataset was concocted, we know the true coefficients of the wage regression equation to
be 1, 0.2, and 1, respectively. We can compute the true mean wage for our sample.
. generate truewage = 1 + .2*age + 1*educ
. summarize truewage
Variable Obs Mean Std. Dev. Min Max
truewage 2000 21.3256 3.797904 15 32.8
Whereas the mean of the predictions from heckman is within 18 cents of the true mean wage,
ordinary regression yields predictions that are on average about $1.80 per hour too high because of
the selection effect. The regression predictions also show somewhat less variation than the true wages.
The coefficients from heckman are so close to the true values that they are not worth testing.
Conversely, the regression equation is significantly off but seems to give the right sense. Would we
be led far astray if we relied on the OLS coefficients? The effect of age is off by more than 5 cents
per year of age, and the coefficient on education level is off by about 10%. We can test the OLS
coefficient on education level against the true value by using test.
. test educ = 1
( 1) education = 1
F( 1, 1340) = 4.31
Prob > F = 0.0380
Not only is the OLS coefficient on education substantially lower than the true parameter, but the
difference from the true parameter is also statistically significant beyond the 5% level. We can perform
a similar test for the OLS age coefficient:
. test age = .2
( 1) age = .2
F( 1, 1340) = 8.15
Prob > F = 0.0044
We find even stronger evidence that the OLS regression results are biased away from the true
parameters.
Example 2
Several other interesting aspects of the Heckman model can be explored with predict. Continuing
with our wage model, we can obtain the expected wages for women conditional on participating in
the labor force with the ycond option. Let’s get these predictions and compare them with actual
wages for women participating in the labor force.
. use http://www.stata-press.com/data/r13/womenwk, clear
. heckman wage educ age, select(married children educ age)
(output omitted )
. predict hcndwage, ycond
. summarize wage hcndwage if wage != .
Variable Obs Mean Std. Dev. Min Max
wage 1343 23.69217 6.305374 5.88497 45.80979
hcndwage 1343 23.68239 3.335087 16.18337 33.7567
We see that the average predictions from heckman are close to the observed levels but do not have
the same mean. These conditional wage predictions are available for all observations in the dataset
but can be directly compared only with observed wages, where individuals are participating in the
labor force.
What if we were interested in making predictions about mean wages for all women? Here the
expected wage is 0 for those who are not expected to participate in the labor force, with expected
participation determined by the selection equation. These values can be obtained with the yexpected
option of predict. For comparison, a variable can be generated where the wage is set to 0 for
nonparticipants.
. predict hexpwage, yexpected
. generate wage0 = wage
(657 missing values generated)
. replace wage0 = 0 if wage == .
(657 real changes made)
. summarize hexpwage wage0
Variable Obs Mean Std. Dev. Min Max
hexpwage 2000 15.92511 5.979336 2.492469 32.45858
wage0 2000 15.90929 12.27081 0 45.80979
Again we note that the predictions from heckman are close to the observed mean hourly wage
rate for all women. Why aren’t the predictions using ycond and yexpected equal to their observed
sample equivalents? For the Heckman model, unlike linear regression, the sample moments implied
by the optimal solution to the model likelihood do not require that these predictions match observed
data. Properly accounting for the additional variation from the selection equation requires that the
model use more information than just the sample moments of the observed wages.
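The definitions of ycond and yexpected can be checked directly from the fit above. The following do-file sketch assumes the heckman results from this example (including the stored scalars e(rho) and e(sigma)) are still in memory; it uses the relations ycond = xb + ρσ × (nonselection hazard), which follows from the selection model, and yexpected = psel × ycond, as described under Options for predict. The generated variable names are arbitrary.

    predict xbwage, xb
    predict m, nshazard
    predict pobs, psel
    generate ycondchk = xbwage + e(rho)*e(sigma)*m   // should reproduce hcndwage
    generate yexpchk  = pobs*ycondchk                // should reproduce hexpwage
    summarize hcndwage ycondchk hexpwage yexpchk

Each check variable should agree with its predicted counterpart up to rounding.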
Reference
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
Also see
[R]heckman Heckman selection model
[U] 20 Estimation and postestimation commands
Title
heckoprobit — Ordered probit model with sample selection
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
    heckoprobit depvar indepvars [if] [in] [weight],
        select(depvar_s = varlist_s [, noconstant offset(varname_o)]) [options]
options Description
Model
select() specify selection equation: dependent and independent
variables; whether to have constant term and offset variable
offset(varname) include varname in model with coefficient constrained to 1
constraints(constraints) apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
or jackknife
Reporting
level(#) set confidence level; default is level(95)
first report first-step probit estimates
noheader do not display header above parameter table
nofootnote do not display footnotes below parameter table
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
select() is required.
The full specification is select(depvar_s = varlist_s [, noconstant offset(varname_o)]).
indepvars and varlist_s may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, depvar_s, and varlist_s may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), first, and weights are not allowed with the svy prefix; see [SVY] svy.
pweights, fweights, and iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Sample-selection models >Ordered probit model with selection
Description
heckoprobit fits maximum-likelihood ordered probit models with sample selection.
Options
 
Model
select(depvar_s = varlist_s [, noconstant offset(varname_o)]) specifies the variables and
options for the selection equation. It is an integral part of specifying a selection model and is
required. The selection equation should contain at least one variable that is not in the outcome
equation.
If depvar_s is specified, it should be coded as 0 or 1, 0 indicating an observation not selected and 1
indicating a selected observation. If depvar_s is not specified, observations for which depvar is not
missing are assumed selected, and those for which depvar is missing are assumed not selected.
noconstant suppresses the selection constant term (intercept).
offset(varname_o) specifies that selection offset varname_o be included in the model with the
coefficient constrained to be 1.
offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first specifies that the first-step probit estimates of the selection equation be displayed before
estimation.
noheader suppresses the header above the parameter table, the display that reports the final log-
likelihood value, number of observations, etc.
nofootnote suppresses the footnotes displayed below the parameter table.
nocnsreport; see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with heckoprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
heckoprobit estimates the parameters of a regression model for an ordered categorical outcome
from a nonrandom sample known as a selected sample. Selected samples suffer from “selection on
unobservables” because the errors that determine whether a case is missing are correlated with the
errors that determine the outcome.
For ordered categorical regression from samples that do not suffer from selection on unobservables,
see [R]oprobit or [R]ologit. For regression of a continuous outcome variable from a selected sample,
see [R]heckman.
Even though we are interested in modeling a single ordinal outcome, there are two dependent
variables in the ordered probit sample-selection model because we must also model the sample-
selection process. First, there is the ordinal outcome yj. Second, there is a binary variable that
indicates whether each case in the sample is observed or unobserved. To handle the sample-selection
problem, we model both dependent variables jointly. Both variables are categorical. Their categorical
values are determined by the values of linear combinations of covariates and normally distributed error
terms relative to certain cutpoints that partition the real line. The error terms used in the determination
of selection and the ordinal outcome value may be correlated.
The probability that the ordinal outcome y_j is equal to the value v_h is given by the probability
that x_jβ + u_1j falls within the cutpoints κ_{h−1} and κ_h,
$$\Pr(y_j = v_h) = \Pr(\kappa_{h-1} < x_j\beta + u_{1j} \le \kappa_h)$$
where x_j is the outcome covariates, β is the coefficients, and u_1j is a random-error term. The
observed outcome values v_1, ..., v_H are integers such that v_i < v_m for i < m. κ_0 is taken as −∞
and κ_H is taken as +∞.
We model the selection process for the outcome by
$$s_j = 1(z_j\gamma + u_{2j} > 0)$$
where s_j = 1 if we observed y_j and 0 otherwise, z_j is the covariates used to model the selection
process, γ is the coefficients for the selection process, 1(·) denotes the indicator function, and u_2j
is a random-error term.
(u_1j, u_2j) have bivariate normal distribution with mean zero and variance matrix
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
When ρ ≠ 0, standard ordered probit techniques applied to the outcome equation yield inconsistent
results. heckoprobit provides consistent, asymptotically efficient estimates for all the parameters in
such models.
De Luca and Perotti (2011) describe the maximum likelihood estimator used in heckoprobit.
Example 1
We have a simulated dataset containing a sample of 5,000 women, 3,480 of whom work. The
outcome of interest is a woman’s job satisfaction, and we suspect that unobservables that determine job
satisfaction and the unobservables that increase the likelihood of employment are correlated. Women
may make a decision to work based on how satisfying their job would be. We estimate the parameters
of an ordered probit sample-selection model for the outcome of job satisfaction (satisfaction)
with selection on employment (work). Age (age) and years of education (education) are used as
outcome covariates, and we also expect that they affect selection. Additional covariates for selection
are marital status (married) and the number of children at home (children).
Here we estimate the parameters of the model with heckoprobit. We use the factorial interaction
of married and children in select(). This specifies that the number of children and marital
status affect selection, and it allows the effect of the number of children to differ among married
and nonmarried women. The factorial interaction is specified using factor-variable notation, which is
described in [U] 11.4.3 Factor variables.
. use http://www.stata-press.com/data/r13/womensat
(Job satisfaction, female)
. heckoprobit satisfaction education age,
> select(work=education age i.married##c.children)
Fitting oprobit model:
Iteration 0: log likelihood = -3934.1474
Iteration 1: log likelihood = -3571.886
Iteration 2: log likelihood = -3570.2616
Iteration 3: log likelihood = -3570.2616
Fitting selection model:
Iteration 0: log likelihood = -3071.0775
Iteration 1: log likelihood = -2565.5092
Iteration 2: log likelihood = -2556.8369
Iteration 3: log likelihood = -2556.8237
Iteration 4: log likelihood = -2556.8237
Comparison: log likelihood = -6127.0853
Fitting full model:
Iteration 0: log likelihood = -6127.0853
Iteration 1: log likelihood = -6093.8868
Iteration 2: log likelihood = -6083.215
Iteration 3: log likelihood = -6083.0376
Iteration 4: log likelihood = -6083.0372
Ordered probit model with sample selection Number of obs = 5000
Censored obs = 1520
Uncensored obs = 3480
Wald chi2(2) = 842.42
Log likelihood = -6083.037 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
satisfaction
education .1536381 .0068266 22.51 0.000 .1402583 .1670179
age .0334463 .0024049 13.91 0.000 .0287329 .0381598
work
education .0512494 .0068095 7.53 0.000 .037903 .0645958
age .0288084 .0026528 10.86 0.000 .023609 .0340078
1.married .6120876 .0700055 8.74 0.000 .4748794 .7492958
children .5140995 .0288529 17.82 0.000 .4575489 .5706501
married#
c.children
1 -.1337573 .035126 -3.81 0.000 -.202603 -.0649117
_cons -2.203036 .125772 -17.52 0.000 -2.449545 -1.956528
/cut1 1.728757 .1232063 14.03 0.000 1.487277 1.970237
/cut2 2.64357 .116586 22.67 0.000 2.415066 2.872075
/cut3 3.642911 .1178174 30.92 0.000 3.411993 3.873829
/athrho .7430919 .0780998 9.51 0.000 .5900191 .8961646
rho .6310096 .0470026 .5299093 .7144252
LR test of indep. eqns. (rho = 0): chi2(1) = 88.10 Prob > chi2 = 0.0000
The output shows several iteration logs. The first iteration log corresponds to running the ordered
probit model for those observations in the sample where we have observed the outcome. The second
iteration log corresponds to running the selection probit model, which models whether we observe
our outcome of interest. If ρ=0, the sum of the log likelihoods from these two models will equal
the log likelihood of the ordered probit sample-selection model; this sum is printed in the iteration
log as the comparison log likelihood. The final iteration log is for fitting the full ordered probit
sample-selection model.
The Wald test in the header is highly significant, indicating a good model fit. All the covariates
are statistically significant. The likelihood-ratio test in the footer indicates that we can reject the null
hypothesis that the errors for outcome and selection are uncorrelated. This means that we should use
the ordered probit sample-selection model instead of the simple ordered probit model.
The positive estimate of 0.63 for ρ indicates that unobservables that increase job satisfaction tend
to occur with unobservables that increase the chance of having a job.
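The comparison log likelihood and the likelihood-ratio test in this output can be reproduced from their pieces. A minimal do-file sketch; it uses the stored results e(ll) and e(ll_c) and then fits the ordered probit and the selection probit separately (the scalar name ll_o is arbitrary):

    use http://www.stata-press.com/data/r13/womensat, clear
    heckoprobit satisfaction education age, ///
        select(work=education age i.married##c.children)
    display "LR chi2(1) = " 2*(e(ll) - e(ll_c))            // 88.10, as in the footer
    oprobit satisfaction education age if work == 1
    scalar ll_o = e(ll)
    probit work education age i.married##c.children
    display "comparison log likelihood = " ll_o + e(ll)    // about -6127.09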
Stored results
heckoprobit stores the following in e():
Scalars
e(N) number of observations
e(N_cens) number of censored observations
e(N_cd) number of completely determined observations
e(k_cat) number of categories
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_aux) number of auxiliary parameters
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(ll) log likelihood
e(ll_c) log likelihood, comparison model
e(N_clust) number of clusters
e(chi2) χ2
e(chi2_c) χ2 for comparison test
e(p_c) p-value for comparison test
e(p) significance of comparison test
e(rho) ρ
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) heckoprobit
e(cmdline) command as typed
e(depvar) names of dependent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(title2) secondary title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset for regression equation
e(offset2) offset for selection equation
e(chi2type) Wald or LR; type of model χ2test
e(chi2 ct) type of comparison χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(method) ml
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(cat) category values
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
De Luca and Perotti (2011) provide an introduction to this model.
The ordinal outcome equation is
$$y_j = \sum_{h=1}^{H} v_h\, 1(\kappa_{h-1} < x_j\beta + u_{1j} \le \kappa_h)$$
where x_j is the outcome covariates, β is the coefficients, and u_1j is a random-error term. The
observed outcome values v_1, ..., v_H are integers such that v_i < v_m for i < m. κ_1, ..., κ_{H−1} are
real numbers such that κ_i < κ_m for i < m. κ_0 is taken as −∞ and κ_H is taken as +∞.
The selection equation is
$$s_j = 1(z_j\gamma + u_{2j} > 0)$$
where s_j = 1 if we observed y_j and 0 otherwise, z_j is the covariates used to model the selection
process, γ is the coefficients for the selection process, and u_2j is a random-error term.
(u_1j, u_2j) have bivariate normal distribution with mean zero and variance matrix
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
Let a_j = z_jγ + offset^γ_j and b_j = x_jβ + offset^β_j. This yields the log likelihood
$$
\ln L = \sum_{j\notin S} w_j \ln\{\Phi(-a_j)\}
 + \sum_{h=1}^{H}\ \sum_{\substack{j\in S \\ y_j = v_h}} w_j \ln\bigl\{\Phi_2(a_j,\ \kappa_h - b_j,\ -\rho) - \Phi_2(a_j,\ \kappa_{h-1} - b_j,\ -\rho)\bigr\}
$$
where S is the set of observations for which y_j is observed, Φ2(·) is the cumulative bivariate normal
distribution function (with mean [0 0]′), Φ(·) is the standard cumulative normal, and w_j is an
optional weight for observation j.
In the maximum likelihood estimation, ρ is not directly estimated. Directly estimated is atanh ρ:
$$\operatorname{atanh}\rho = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$$
From the form of the likelihood, it is clear that if ρ = 0, the log likelihood for the ordered probit
sample-selection model is equal to the sum of the log likelihoods of the ordered probit model for the
outcome y and the selection model. We can perform a likelihood-ratio test by comparing the log
likelihood of the full model with the sum of the log likelihoods for the ordered probit and selection models.
References
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Chiburis, R., and M. Lokshin. 2007. Maximum likelihood and two-step estimation of an ordered-probit selection
model. Stata Journal 7: 167–182.
De Luca, G., and V. Perotti. 2011. Estimation of ordered response models with sample selection. Stata Journal 11:
213–239.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Muro, J., C. Suárez, and M. Zamora. 2010. Computing Murphy–Topel-corrected variances in a heckprobit model with
endogeneity. Stata Journal 10: 252–258.
Van de Ven, W. P. M. M., and B. M. S. Van Praag. 1981. The demand for deductibles in private health insurance:
A probit model with sample selection. Journal of Econometrics 17: 229–252.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R]heckoprobit postestimation Postestimation tools for heckoprobit
[R]heckman Heckman selection model
[R]heckprobit Probit model with sample selection
[R]oprobit Ordered probit regression
[R]probit Probit regression
[R]regress Linear regression
[R]tobit Tobit regression
[SVY]svy estimation Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
heckoprobit postestimation — Postestimation tools for heckoprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are available after heckoprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest¹ likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, standard errors, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ lrtest is not appropriate with svy estimation results.
Syntax for predict
    predict [type] {stub* | newvar | newvarlist} [if] [in] [, statistic outcome(outcome) nooffset]

    predict [type] {stub* | newvar_reg newvar_sel newvar_1 ... newvar_h newvar_athrho} [if] [in], scores
statistic Description
Main
pmargin marginal probabilities; the default
p1 bivariate probabilities of levels with selection
p0 bivariate probabilities of levels with no selection
pcond1 probabilities of levels conditional on selection
pcond0 probabilities of levels conditional on no selection
psel selection probability
xb linear prediction
stdp standard error of the linear prediction
xbsel linear prediction for selection equation
stdpsel standard error of the linear prediction for selection equation
If you do not specify outcome(), pmargin (with one new variable specified) assumes outcome(#1).
You specify one or k new variables with pmargin, where k is the number of outcomes.
You specify one new variable with psel, xb, stdp, xbsel, and stdpsel.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pmargin, the default, calculates the predicted marginal probabilities.
You specify one or k new variables, where k is the number of categories of the outcome variable
y_j. If you specify the outcome() option, you must specify one new variable. If you specify one
new variable and do not specify outcome(), outcome(#1) is assumed.
When outcome() is specified, the marginal probability that y_j is equal to the level outcome() is
calculated. When outcome() is not specified, the marginal probabilities for each outcome level
are calculated.
p1 calculates the predicted bivariate probabilities of outcome levels with selection.
You specify one or k new variables, where k is the number of categories of the outcome variable
y_j. If you specify the outcome() option, you must specify one new variable. If you specify one
new variable and do not specify outcome(), outcome(#1) is assumed.
When outcome() is specified, the bivariate probability that y_j is equal to the level outcome()
and that y_j is observed is calculated. When outcome() is not specified, the bivariate probabilities
for each outcome level and selection are calculated.
p0 calculates the predicted bivariate probabilities of outcome levels with no selection.
You specify one or k new variables, where k is the number of categories of the outcome variable
y_j. If you specify the outcome() option, you must specify one new variable. If you specify one
new variable and do not specify outcome(), outcome(#1) is assumed.
When outcome() is specified, the bivariate probability that y_j is equal to the level outcome() and
that y_j is not observed is calculated. When outcome() is not specified, the bivariate probabilities
for each outcome level and no selection are calculated.
pcond1 calculates the predicted probabilities of outcome levels conditional on selection.
You specify one or k new variables, where k is the number of categories of the outcome variable
y_j. If you specify the outcome() option, you must specify one new variable. If you specify one
new variable and do not specify outcome(), outcome(#1) is assumed.
When outcome() is specified, the probability that y_j is equal to the level outcome() given that
y_j is observed is calculated. When outcome() is not specified, the probabilities for each outcome
level conditional on selection are calculated.
pcond0 calculates the predicted probabilities of outcome levels conditional on no selection.
You specify one or k new variables, where k is the number of categories of the outcome variable
y_j. If you specify the outcome() option, you must specify one new variable. If you specify one
new variable and do not specify outcome(), outcome(#1) is assumed.
When outcome() is specified, the probability that y_j is equal to the level outcome() given that
y_j is not observed is calculated. When outcome() is not specified, the probabilities for each
outcome level conditional on no selection are calculated.
psel calculates the predicted univariate (marginal) probability of selection.
xb calculates the linear prediction for the outcome variable, which is x_jβ if offset() was not specified
and x_jβ + offset^β_j if offset() was specified.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
xbsel calculates the linear prediction for the selection equation, which is z_jγ if offset() was not
specified in select() and z_jγ + offset^γ_j if offset() was specified in select().
stdpsel calculates the standard error of the linear prediction for the selection equation.
outcome(outcome)specifies for which outcome the predicted probabilities are to be calculated.
outcome() should contain either one value of the dependent variable or one of #1,#2,. . . , with
#1 meaning the first category of the dependent variable, #2 meaning the second category, etc.
nooffset is relevant only if you specified offset(varname) for heckoprobit. It modifies the
calculations made by predict so that they ignore the offset variable; the linear prediction is
treated as x_j b rather than as x_j b + offset_j.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(x_j β).
The second new variable will contain ∂lnL/∂(z_j γ).
When the dependent variable takes k different values, new variables three through k+1 will
contain ∂lnL/∂(κ_{j−2}).
The last new variable will contain ∂lnL/∂(atanh ρ).
Remarks and examples
Example 1
In example 1 of [R]heckoprobit, we examined a simulated dataset of 5,000 women, 3,480 of whom
work and can thus report job satisfaction. Using job satisfaction (satisfaction) as the outcome
variable and employment (work) as the selection variable, we estimated the parameters of an ordered
probit sample-selection model. Covariates age (age), years of education (education), number of
children (children), and marital status (married) are expected to affect selection. The outcome,
job satisfaction, is affected by age (age) and education (education).
We first reestimate the parameters of the regression, but this time we request a robust variance
estimator:
. use http://www.stata-press.com/data/r13/womensat
(Job satisfaction, female)
. heckoprobit satisfaction education age,
> select(work=education age i.married##c.children) vce(robust)
(output omitted )
We then use margins (see [R]margins) to estimate the average marginal effect of education on
the probability of having low job satisfaction.
. margins, dydx(education) vce(unconditional)
Average marginal effects Number of obs = 5000
Expression : Pr(satisfaction=1), predict()
dy/dx w.r.t. : education
Unconditional
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
education -.0234776 .0019176 -12.24 0.000 -.027236 -.0197192
The estimated average marginal effect of education on the probability of having low job satisfaction
is approximately 0.023.
Methods and formulas
The ordinal outcome equation is
$$y_j = \sum_{h=1}^{H} v_h\, 1(\kappa_{h-1} < x_j\beta + u_{1j} \le \kappa_h)$$
where x_j is the outcome covariates, β is the coefficients, and u_1j is a random-error term. The
observed outcome values v_1, ..., v_H are integers such that v_i < v_m for i < m. κ_1, ..., κ_{H−1} are
real numbers such that κ_i < κ_m for i < m. κ_0 is taken as −∞ and κ_H is taken as +∞.
The selection equation is
$$s_j = 1(z_j\gamma + u_{2j} > 0)$$
where s_j = 1 if we observed y_j and 0 otherwise, z_j is the covariates used to model the selection
process, γ is the coefficients for the selection process, and u_2j is a random-error term.
(u_1j, u_2j) have bivariate normal distribution with mean zero and variance matrix
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
The probability of selection is
$$\Pr(s_j = 1) = \Phi(z_j\gamma + \text{offset}^{\gamma}_j)$$
Φ(·) is the standard cumulative normal distribution function.
The probability of selection and the outcome y_j = v_h is
$$\Pr(y_j = v_h,\ s_j = 1) = \Phi_2\bigl(z_j\gamma + \text{offset}^{\gamma}_j,\ \kappa_h - x_j\beta - \text{offset}^{\beta}_j,\ -\rho\bigr) - \Phi_2\bigl(z_j\gamma + \text{offset}^{\gamma}_j,\ \kappa_{h-1} - x_j\beta - \text{offset}^{\beta}_j,\ -\rho\bigr)$$
Φ2(·) is the cumulative bivariate normal distribution function (with mean [0 0]′).
The probability of y_j not being selected and the outcome y_j = v_h is
$$\Pr(y_j = v_h,\ s_j = 0) = \Phi_2\bigl(-z_j\gamma - \text{offset}^{\gamma}_j,\ \kappa_h - x_j\beta - \text{offset}^{\beta}_j,\ \rho\bigr) - \Phi_2\bigl(-z_j\gamma - \text{offset}^{\gamma}_j,\ \kappa_{h-1} - x_j\beta - \text{offset}^{\beta}_j,\ \rho\bigr)$$
The probability of outcome y_j = v_h given selection is
$$\Pr(y_j = v_h \mid s_j = 1) = \frac{\Pr(y_j = v_h,\ s_j = 1)}{\Pr(s_j = 1)}$$
The probability of outcome y_j = v_h given that y_j is not selected is
$$\Pr(y_j = v_h \mid s_j = 0) = \frac{\Pr(y_j = v_h,\ s_j = 0)}{\Pr(s_j = 0)}$$
The marginal probabilities of the outcome y_j are
$$\Pr(y_j = v_1) = \Phi(\kappa_1 - x_j\beta - \text{offset}^{\beta}_j)$$
$$\Pr(y_j = v_H) = 1 - \Phi(\kappa_{H-1} - x_j\beta - \text{offset}^{\beta}_j)$$
$$\Pr(y_j = v_h) = \Phi(\kappa_h - x_j\beta - \text{offset}^{\beta}_j) - \Phi(\kappa_{h-1} - x_j\beta - \text{offset}^{\beta}_j)$$
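These identities can be verified numerically with predict. A minimal do-file sketch, run after the heckoprobit fit in example 1 (the generated variable names are arbitrary):

    predict ps, psel                  // Pr(s_j = 1)
    predict pj1, p1 outcome(#1)       // Pr(y_j = v_1, s_j = 1)
    predict pc1, pcond1 outcome(#1)   // Pr(y_j = v_1 | s_j = 1)
    generate pc1chk = pj1/ps          // should reproduce pc1
    summarize pc1 pc1chk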
Also see
[R]heckoprobit Ordered probit model with sample selection
[U] 20 Estimation and postestimation commands
Title
heckprobit — Probit model with sample selection
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
    heckprobit depvar indepvars [if] [in] [weight],
        select(depvar_s = varlist_s [, noconstant offset(varname_o)]) [options]
options Description
Model
select() specify selection equation: dependent and independent
variables; whether to have constant term and offset variable
noconstant suppress constant term
offset(varname) include varname in model with coefficient constrained to 1
constraints(constraints) apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
or jackknife
Reporting
level(#) set confidence level; default is level(95)
first report first-step probit estimates
noskip perform likelihood-ratio test
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
select() is required.
The full specification is select(depvar_s = varlist_s [, noconstant offset(varname_o)]).
indepvars and varlist_s may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, depvar_s, and varlist_s may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), first, noskip, and weights are not allowed with the svy prefix; see [SVY] svy.
pweights, fweights, and iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Sample-selection models >Probit model with selection
Description
heckprobit fits maximum-likelihood probit models with sample selection.
heckprob is a synonym for heckprobit.
Options
 
Model
select(depvar_s = varlist_s [, noconstant offset(varname_o)]) specifies the variables and
options for the selection equation. It is an integral part of specifying a selection model and is
required. The selection equation should contain at least one variable that is not in the outcome
equation.
If depvar_s is specified, it should be coded as 0 or 1, 0 indicating an observation not selected and 1
indicating a selected observation. If depvar_s is not specified, observations for which depvar is not
missing are assumed selected, and those for which depvar is missing are assumed not selected.
noconstant suppresses the selection constant term (intercept).
offset(varname_o) specifies that selection offset varname_o be included in the model with the
coefficient constrained to be 1.
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first specifies that the first-step probit estimates of the selection equation be displayed before
estimation.
noskip specifies that a full maximum-likelihood model with only a constant for the regression equation
be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test
for the model test statistic displayed in the estimation header. By default, the overall model test
statistic is an asymptotically equivalent Wald test that all the parameters in the regression equation
are zero (except the constant). For many models, this option can substantially increase estimation
time.
nocnsreport; see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with heckprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
The probit model with sample selection (Van de Ven and Van Praag 1981) assumes that there
exists an underlying relationship
$$y_j^{*} = x_j\beta + u_{1j} \qquad \text{latent equation}$$
such that we observe only the binary outcome
$$y_j^{\text{probit}} = (y_j^{*} > 0) \qquad \text{probit equation}$$
The dependent variable, however, is not always observed. Rather, the dependent variable for observation
j is observed if
$$y_j^{\text{select}} = (z_j\gamma + u_{2j} > 0) \qquad \text{selection equation}$$
where
$$u_1 \sim N(0,1)$$
$$u_2 \sim N(0,1)$$
$$\operatorname{corr}(u_1, u_2) = \rho$$
When ρ ≠ 0, standard probit techniques applied to the first equation yield biased results. heckprobit
provides consistent, asymptotically efficient estimates for all the parameters in such models.
For the model to be well identified, the selection equation should have at least one variable that
is not in the probit equation. Otherwise, the model is identified only by functional form, and the
coefficients have no structural interpretation.
Example 1
We use the data from Pindyck and Rubinfeld (1998). In this dataset, the variables are whether
children attend private school (private), number of years the family has been at the present residence
(years), log of property tax (logptax), log of income (loginc), and whether one voted for an
increase in property taxes (vote).
In this example, we alter the meaning of the data. Here we assume that we observe whether children
attend private school only if the family votes for increasing the property taxes. This assumption is
not true in the dataset, and we make it only to illustrate the use of this command.
We observe whether children attend private school only if the head of household voted for an
increase in property taxes. We assume that the vote is affected by the number of years in residence,
the current property taxes paid, and the household income. We wish to model whether children are
sent to private school on the basis of the number of years spent in the current residence and the
current property taxes paid.
. use http://www.stata-press.com/data/r13/school
. heckprob private years logptax, select(vote=years loginc logptax)
Fitting probit model:
Iteration 0: log likelihood = -17.122381
Iteration 1: log likelihood = -16.243974
(output omitted )
Iteration 5: log likelihood = -15.883655
Fitting selection model:
Iteration 0: log likelihood = -63.036914
Iteration 1: log likelihood = -58.534843
Iteration 2: log likelihood = -58.497292
Iteration 3: log likelihood = -58.497288
Comparison: log likelihood = -74.380943
Fitting starting values:
Iteration 0: log likelihood = -40.895684
Iteration 1: log likelihood = -16.654497
(output omitted )
Iteration 6: log likelihood = -15.753765
Fitting full model:
Iteration 0: log likelihood = -75.010619 (not concave)
Iteration 1: log likelihood = -74.287786
Iteration 2: log likelihood = -74.250137
Iteration 3: log likelihood = -74.245088
Iteration 4: log likelihood = -74.244973
Iteration 5: log likelihood = -74.244973
Probit model with sample selection Number of obs = 95
Censored obs = 36
Uncensored obs = 59
Wald chi2(2) = 1.04
Log likelihood = -74.24497 Prob > chi2 = 0.5935
Coef. Std. Err. z P>|z| [95% Conf. Interval]
private
years -.1142597 .1461717 -0.78 0.434 -.400751 .1722317
logptax .3516098 1.016485 0.35 0.729 -1.640665 2.343884
_cons -2.780665 6.905838 -0.40 0.687 -16.31586 10.75453
vote
years -.0167511 .0147735 -1.13 0.257 -.0457067 .0122045
loginc .9923024 .4430009 2.24 0.025 .1240366 1.860568
logptax -1.278783 .5717545 -2.24 0.025 -2.399401 -.1581647
_cons -.545821 4.070418 -0.13 0.893 -8.523694 7.432052
/athrho -.8663156 1.450028 -0.60 0.550 -3.708318 1.975687
rho -.6994973 .7405343 -.9987984 .962269
LR test of indep. eqns. (rho = 0): chi2(1) = 0.27 Prob > chi2 = 0.6020
The output shows several iteration logs. The first iteration log corresponds to running the probit model
for those observations in the sample where we have observed the outcome. The second iteration log
corresponds to running the selection probit model, which models whether we observe our outcome of
interest. If ρ=0, the sum of the log likelihoods from these two models will equal the log likelihood
of the probit model with sample selection; this sum is printed in the iteration log as the comparison
log likelihood. The third iteration log shows starting values for the iterations.
The final iteration log is for fitting the full probit model with sample selection. A likelihood-ratio
test of the log likelihood for this model and the comparison log likelihood is presented at the end of
the output. If we had specified the vce(robust) option, this test would be presented as a Wald test
instead of as a likelihood-ratio test.
Example 2
In example 1, we could have obtained robust standard errors by specifying the vce(robust)
option. We do this here and also eliminate the iteration logs by using the nolog option:
. heckprob private years logptax, sel(vote=years loginc logptax) vce(robust) nolog
Probit model with sample selection Number of obs = 95
Censored obs = 36
Uncensored obs = 59
Wald chi2(2) = 2.55
Log pseudolikelihood = -74.24497 Prob > chi2 = 0.2798
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
private
years -.1142597 .1113977 -1.03 0.305 -.3325951 .1040758
logptax .3516098 .7358265 0.48 0.633 -1.090584 1.793803
_cons -2.780665 4.786678 -0.58 0.561 -12.16238 6.601051
vote
years -.0167511 .0173344 -0.97 0.334 -.0507259 .0172237
loginc .9923024 .4228044 2.35 0.019 .1636209 1.820984
logptax -1.278783 .5095156 -2.51 0.012 -2.277415 -.2801508
_cons -.545821 4.543892 -0.12 0.904 -9.451686 8.360044
/athrho -.8663156 1.630643 -0.53 0.595 -4.062318 2.329687
rho -.6994973 .8327753 -.9994079 .981233
Wald test of indep. eqns. (rho = 0): chi2(1) = 0.28 Prob > chi2 = 0.5952
Regardless of whether we specify the vce(robust) option, the outcome is not significantly different
from the outcome obtained by fitting the probit and selection models separately. This result is not
surprising because the selection mechanism estimated was invented for the example rather than borne
from any economic theory.
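The separate fits referred to here are easy to produce. A minimal do-file sketch using the school data (the scalar name ll_p is arbitrary):

    use http://www.stata-press.com/data/r13/school, clear
    probit private years logptax if vote == 1     // outcome probit on the selected observations
    scalar ll_p = e(ll)
    probit vote years loginc logptax              // selection probit
    display "comparison log likelihood = " ll_p + e(ll)   // about -74.38, as in example 1

The sum of these two log likelihoods is the comparison log likelihood shown in the iteration log of example 1, and the likelihood-ratio test in that output compares the full model against it.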
Stored results
heckprobit stores the following in e():
Scalars
e(N) number of observations
e(N_cens) number of censored observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_aux) number of auxiliary parameters
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(ll_c) log likelihood, comparison model
e(N_clust) number of clusters
e(chi2) χ2
e(chi2_c) χ2 for comparison test
e(p_c) p-value for comparison test
e(p) significance of comparison test
e(rho) ρ
e(rank) rank of e(V)
e(rank0) rank of e(V) for constant-only model
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) heckprobit
e(cmdline) command as typed
e(depvar) names of dependent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset for regression equation
e(offset2) offset for selection equation
e(chi2type) Wald or LR; type of model χ2test
e(chi2 ct) type of comparison χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Van de Ven and Van Praag (1981) provide an introduction and an explanation of this model.
The probit equation is
$$y_j = (x_j\beta + u_{1j} > 0)$$
The selection equation is
$$z_j\gamma + u_{2j} > 0$$
where
$$u_1 \sim N(0,1)$$
$$u_2 \sim N(0,1)$$
$$\operatorname{corr}(u_1, u_2) = \rho$$
The log likelihood is
$$
\ln L = \sum_{\substack{j\in S \\ y_j \ne 0}} w_j \ln\Bigl\{\Phi_2\bigl(x_j\beta + \text{offset}^{\beta}_j,\ z_j\gamma + \text{offset}^{\gamma}_j,\ \rho\bigr)\Bigr\}
 + \sum_{\substack{j\in S \\ y_j = 0}} w_j \ln\Bigl\{\Phi_2\bigl(-x_j\beta - \text{offset}^{\beta}_j,\ z_j\gamma + \text{offset}^{\gamma}_j,\ -\rho\bigr)\Bigr\}
 + \sum_{j\notin S} w_j \ln\Bigl\{1 - \Phi\bigl(z_j\gamma + \text{offset}^{\gamma}_j\bigr)\Bigr\}
$$
where S is the set of observations for which y_j is observed, Φ2(·) is the cumulative bivariate normal
distribution function (with mean [0 0]′), Φ(·) is the standard cumulative normal, and w_j is an
optional weight for observation j.
In the maximum likelihood estimation, ρ is not directly estimated. Directly estimated is atanh ρ:
$$\operatorname{atanh}\rho = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$$
From the form of the likelihood, it is clear that if ρ = 0, the log likelihood for the probit model
with sample selection is equal to the sum of the log likelihoods of the probit model for the outcome y
and the selection model. We can perform a likelihood-ratio test by comparing the log likelihood of
the full model with the sum of the log likelihoods for the probit and selection models.
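After estimation, this likelihood-ratio statistic can be computed from the stored results e(ll) and e(ll_c). A minimal do-file sketch using the school data from example 1:

    use http://www.stata-press.com/data/r13/school, clear
    quietly heckprob private years logptax, select(vote=years loginc logptax)
    display "LR chi2(1) = " 2*(e(ll) - e(ll_c))    // 0.27, as reported in example 1

As noted in example 2, with vce(robust) this test is instead reported as a Wald test.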
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
heckprobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Chiburis, R., and M. Lokshin. 2007. Maximum likelihood and two-step estimation of an ordered-probit selection
model. Stata Journal 7: 167–182.
De Luca, G. 2008. SNP and SML estimation of univariate and bivariate binary-choice models. Stata Journal 8:
190–220.
De Luca, G., and V. Perotti. 2011. Estimation of ordered response models with sample selection. Stata Journal 11:
213–239.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
Lokshin, M., and Z. Sajaia. 2011. Impact of interventions on discrete outcomes: Maximum likelihood estimation of
the binary choice models with binary endogenous regressors. Stata Journal 11: 368–385.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Muro, J., C. Suárez, and M. Zamora. 2010. Computing Murphy–Topel-corrected variances in a heckprobit model with
endogeneity. Stata Journal 10: 252–258.
Pindyck, R. S., and D. L. Rubinfeld. 1998. Econometric Models and Economic Forecasts. 4th ed. New York:
McGraw–Hill.
Van de Ven, W. P. M. M., and B. M. S. Van Praag. 1981. The demand for deductibles in private health insurance:
A probit model with sample selection. Journal of Econometrics 17: 229–252.
Also see
[R]heckprobit postestimation Postestimation tools for heckprobit
[R]heckman Heckman selection model
[R]heckoprobit Ordered probit model with sample selection
[R]probit Probit regression
[SVY]svy estimation Estimation commands for survey data
[TE]etregress Linear regression with endogenous treatment effects
[U] 20 Estimation and postestimation commands
Title
heckprobit postestimation — Postestimation tools for heckprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after heckprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest¹ likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ lrtest is not appropriate with svy estimation results.
Syntax for predict

    predict [type] newvar [if] [in] [, statistic nooffset]

    predict [type] {stub* | newvar_reg newvar_sel newvar_athrho} [if] [in], scores

    statistic    Description
    Main
      pmargin    Φ(x_j b), success probability; the default
      p11        Φ2(x_j b, z_j g, ρ), predicted probability Pr(y_j^probit = 1, y_j^select = 1)
      p10        Φ2(x_j b, −z_j g, −ρ), predicted probability Pr(y_j^probit = 1, y_j^select = 0)
      p01        Φ2(−x_j b, z_j g, −ρ), predicted probability Pr(y_j^probit = 0, y_j^select = 1)
      p00        Φ2(−x_j b, −z_j g, ρ), predicted probability Pr(y_j^probit = 0, y_j^select = 0)
      psel       Φ(z_j g), selection probability
      pcond      Φ2(x_j b, z_j g, ρ)/Φ(z_j g), probability of success conditional on selection
      xb         linear prediction
      stdp       standard error of the linear prediction
      xbsel      linear prediction for selection equation
      stdpsel    standard error of the linear prediction for selection equation

where Φ(·) is the standard normal distribution function and Φ2(·) is the bivariate normal distribution
function.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pmargin, the default, calculates the univariate (marginal) predicted probability of success
    Pr(y_j^probit = 1).
p11 calculates the bivariate predicted probability Pr(y_j^probit = 1, y_j^select = 1).
p10 calculates the bivariate predicted probability Pr(y_j^probit = 1, y_j^select = 0).
p01 calculates the bivariate predicted probability Pr(y_j^probit = 0, y_j^select = 1).
p00 calculates the bivariate predicted probability Pr(y_j^probit = 0, y_j^select = 0).
psel calculates the univariate (marginal) predicted probability of selection Pr(y_j^select = 1).
pcond calculates the conditional (on selection) predicted probability of success
    Pr(y_j^probit = 1, y_j^select = 1) / Pr(y_j^select = 1).
xb calculates the probit linear prediction x_j b.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
xbsel calculates the linear prediction for the selection equation.
stdpsel calculates the standard error of the linear prediction for the selection equation.
nooffset is relevant only if you specified offset(varname) for heckprobit. It modifies the
calculations made by predict so that they ignore the offset variable; the linear prediction is
treated as x_j b rather than as x_j b + offset_j.
scores calculates equation-level score variables.
The first new variable will contain ∂ ln L/∂(x_j β).
The second new variable will contain ∂ ln L/∂(z_j γ).
The third new variable will contain ∂ ln L/∂(atanh ρ).
Remarks and examples
Example 1
It is instructive to compare the marginal predicted probabilities with the predicted probabilities
that we would obtain by ignoring the selection mechanism. To compare the two approaches, we will
synthesize data so that we know the “true” predicted probabilities.
First, we need to generate correlated error terms, which we can do using a standard Cholesky
decomposition approach. For our example, we will clear any data from memory and then generate
errors that have a correlation of 0.5 by using the following commands. We set the seed so that
interested readers can type in these same commands and obtain the same results.
. set seed 12309
. set obs 5000
obs was 0, now 5000
. gen c1 = rnormal()
. gen c2 = rnormal()
. matrix P = (1,.5\.5,1)
. matrix A = cholesky(P)
. local fac1 = A[2,1]
. local fac2 = A[2,2]
. gen u1 = c1
. gen u2 = ‘fac1’*c1 + ‘fac2’*c2
We can check that the errors have the correct correlation by using the correlate command. We
will also normalize the errors so that they have a standard deviation of one, so we can generate a
bivariate probit model with known coefficients. We do that with the following commands:
. correlate u1 u2
(obs=5000)
u1 u2
u1 1.0000
u2 0.5020 1.0000
. summarize u1
(output omitted )
. replace u1 = u1/r(sd)
(5000 real changes made)
. summarize u2
(output omitted )
. replace u2 = u2/r(sd)
(5000 real changes made)
. drop c1 c2
. gen x1 = runiform()-.5
. gen x2 = runiform()+1/3
. gen y1s = 0.5 + 4*x1 + u1
. gen y2s = 3 - 3*x2 + .5*x1 + u2
. gen y1 = (y1s>0)
. gen y2 = (y2s>0)
We have now created two dependent variables, y1 and y2, which are defined by our specified
coefficients. We also included error terms for each equation, and the error terms are correlated. We
run heckprobit to verify that the data have been correctly generated according to the model
        y1 = .5 + 4 x1 + u1
        y2 = 3 + .5 x1 − 3 x2 + u2

where we assume that y1 is observed only if y2 = 1.
. heckprobit y1 x1, sel(y2 = x1 x2) nolog
Probit model with sample selection Number of obs = 5000
Censored obs = 1762
Uncensored obs = 3238
Wald chi2(1) = 953.71
Log likelihood = -3679.5 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
y1
x1 3.784705 .1225532 30.88 0.000 3.544505 4.024905
_cons .4630922 .0453952 10.20 0.000 .3741192 .5520653
y2
x1 .3693052 .0721694 5.12 0.000 .2278558 .5107547
x2 -3.05069 .0832424 -36.65 0.000 -3.213842 -2.887538
_cons 3.037696 .0777733 39.06 0.000 2.885263 3.190128
/athrho .5186232 .083546 6.21 0.000 .354876 .6823705
rho .4766367 .0645658 .3406927 .5930583
LR test of indep. eqns. (rho = 0): chi2(1) = 40.43 Prob > chi2 = 0.0000
Now that we have verified that we have generated data according to a known model, we can obtain
and then compare predicted probabilities from the probit model with sample selection and a (usual)
probit model.
. predict pmarg
(option pmargin assumed; Pr(y1=1))
. probit y1 x1 if y2==1
(output omitted )
. predict phat
(option pr assumed; Pr(y1))
Using the (marginal) predicted probabilities from the probit model with sample selection (pmarg)
and the predicted probabilities from the (usual) probit model (phat), we can also generate the “true”
predicted probabilities from the synthesized y1s variable and then compare the predicted probabilities:
. gen ptrue = normal(y1s)
. summarize pmarg ptrue phat
Variable Obs Mean Std. Dev. Min Max
pmarg 5000 .6071226 .3147861 .0766334 .9907113
ptrue 5000 .5974195 .348396 5.53e-06 .9999999
phat 5000 .6568175 .3025085 .1059824 .9954919
Here we see that ignoring the selection mechanism (comparing the phat variable with the true
ptrue variable) results in predicted probabilities that are much higher than the true values. Looking
at the marginal predicted probabilities from the model with sample selection, however, results in more
accurate predictions.
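As a further check, the pcond statistic should equal p11 divided by psel, as in the table of predict statistics above. A do-file-style sketch continuing this example (the variable names created below are ours; heckprobit is refit first because the ordinary probit was the last model estimated):

    * make the heckprobit results active again
    quietly heckprobit y1 x1, sel(y2 = x1 x2)
    * joint, selection, and conditional probabilities
    predict p11hat, p11
    predict pselhat, psel
    predict pcondhat, pcond
    * pcond should match p11/psel
    generate double ratio = p11hat/pselhat
    summarize pcondhat ratio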
Also see
[R]heckprobit Probit model with sample selection
[U] 20 Estimation and postestimation commands
Title
help — Display help in Stata
Syntax Menu Description Options
Remarks and examples Also see
Syntax
help [command or topic name] [, nonew name(viewername) marker(markername)]
Menu
Help >Stata Command...
Description
The help command displays help information about the specified command or topic.
Stata for Mac, Stata for Unix(GUI), and Stata for Windows:
help launches a new Viewer to display help for the specified command or topic. If help is
not followed by a command or a topic name, Stata launches the Viewer and displays help
advice, advice for using the help system and documentation.
Help may be accessed either by selecting Help > Stata Command... and filling in the desired
command name or by typing help followed by a command or topic name.
Stata for Unix(console):
Typing help followed by a command name or a topic name will display help on the console.
If help is not followed by a command or a topic name, a description of how to use the help
system is displayed.
Options
nonew specifies that a new Viewer window not be opened for the help topic if a Viewer window is
already open. The default is for a new Viewer window to be opened each time help is typed so
that multiple help files may be viewed at once. nonew causes the help file to be displayed in the
topmost open Viewer.
name(viewername)specifies that help be displayed in a Viewer window named viewername. If the
named window already exists, its contents will be replaced. If the named window does not exist,
it will be created.
marker(markername)specifies that the help file be opened to the position of markername within
the help file.
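For example, a do-file-style sketch of the window-management options above (the Viewer name myviewer is arbitrary):

    * open help for regress in a Viewer window named myviewer
    help regress, name(myviewer)
    * display another topic without opening yet another Viewer window
    help logistic, nonew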
Remarks and examples
To obtain help for any Stata command, type help command or select Help > Stata Command...
and fill in command.
help is best explained by examples.
To obtain help for ...                   type
regress                                  help regress
postestimation tools for regress         help regress postestimation  or  help regress post
graph option xlabel()                    help graph xlabel()
Stata function strpos()                  help strpos()
Mata function optimize()                 help mata optimize()
Tips:
help displays advice for using the help system and documentation.
help guide displays a table of contents for basic Stata concepts.
help estimation commands displays an alphabetical listing of all Stata estimation commands.
help functions displays help on Stata functions by category.
help mata functions displays a subject table of contents for Mata’s functions.
help ts glossary displays the glossary for the time-series manual, and similarly for the
other Stata specialty manuals.
If you type help topic and help for topic is not found, Stata will automatically perform a search
for topic.
For instance, try typing help forecasting. A forecasting help file is not found, so Stata executes
search forecasting and displays the results in the Viewer.
See [U] 4 Stata’s help and search facilities for a complete description of how to use help.
Technical note
When you type help topic, Stata first looks along the adopath for topic.sthlp; see [U] 17.5 Where
does Stata look for ado-files?.
Video examples
Quick help in Stata
Also see
[R]net search Search the Internet for installable packages
[R]search Search Stata documentation and other resources
[GSM] 4 Getting help
[GSW] 4 Getting help
[GSU] 4 Getting help
[U] 4 Stata’s help and search facilities
Title
hetprobit — Heteroskedastic probit model
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
hetprobit depvar [indepvars] [if] [in] [weight],
        het(varlist[, offset(varname_o)]) [options]
options Description
Model
het(varlist ...)          independent variables to model the variance and possible
                          offset variable
noconstant suppress constant term
offset(varname)include varname in model with coefficient constrained to 1
asis retain perfect predictor variables
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,opg,bootstrap,
or jackknife
Reporting
level(#)set confidence level; default is level(95)
noskip perform likelihood-ratio test
nolrtest perform Wald test on variance
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
het() is required. The full specification is het(varlist [, offset(varname_o)]).
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvar,indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(),noskip, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Binary outcomes >Heteroskedastic probit regression
Description
hetprobit fits a maximum-likelihood heteroskedastic probit model.
hetprob is a synonym for hetprobit.
See [R]logistic for a list of related estimation commands.
Options
 
Model
het(varlist [, offset(varname_o)]) specifies the independent variables and the offset variable, if
there is one, in the variance function. het() is required.
offset(varname_o) specifies that selection offset varname_o be included in the model with the
coefficient constrained to be 1.
noconstant,offset(varname); see [R]estimation options.
asis forces the retention of perfect predictor variables and their associated perfectly predicted
observations and may produce instabilities in maximization; see [R]probit.
constraints(constraints),collinear; see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
noskip requests fitting of the constant-only model and calculation of the corresponding likelihood-ratio
χ2statistic for testing significance of the full model. By default, a Wald χ2statistic is computed
for testing the significance of the full model.
nolrtest specifies that a Wald test of whether lnsigma2 =0 be performed instead of the LR test.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with hetprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Robust standard errors
Introduction
hetprobit fits a maximum-likelihood heteroskedastic probit model, which is a generalization of
the probit model. Let y_j, j = 1, ..., N, be a binary outcome variable taking on the value 0 (failure)
or 1 (success). In the probit model, the probability that y_j takes on the value 1 is modeled as a
nonlinear function of a linear combination of the k independent variables x_j = (x_1j, x_2j, ..., x_kj),

        Pr(y_j = 1) = Φ(x_j b)

in which Φ() is the cumulative distribution function (CDF) of a standard normal random variable, that is,
a normally distributed (Gaussian) random variable with mean 0 and variance 1. The linear combination
of the independent variables, x_j b, is commonly called the index function, or index. Heteroskedastic
probit generalizes the probit model by generalizing Φ() to a normal CDF with a variance that is
no longer fixed at 1 but can vary as a function of the independent variables. hetprobit models
the variance as a multiplicative function of these m variables z_j = (z_1j, z_2j, ..., z_mj), following
Harvey (1976):

        σ_j² = {exp(z_j γ)}²

Thus the probability of success as a function of all the independent variables is

        Pr(y_j = 1) = Φ{ x_j b / exp(z_j γ) }

From this expression, it is clear that, unlike the index x_j b, no constant term can be present in z_j γ
if the model is to be identifiable.
Suppose that the binary outcomes y_j are generated by thresholding an unobserved random variable,
w, which is normally distributed with mean x_j b and variance 1 such that

        y_j = 1  if  w_j > 0
        y_j = 0  if  w_j ≤ 0

This process gives the probit model:

        Pr(y_j = 1) = Pr(w_j > 0) = Φ(x_j b)

Now suppose that the unobserved w_j are heteroskedastic with variance

        σ_j² = {exp(z_j γ)}²

Relaxing the homoskedastic assumption of the probit model in this manner yields our multiplicative
heteroskedastic probit model:

        Pr(y_j = 1) = Φ{ x_j b / exp(z_j γ) }
Example 1
For this example, we generate simulated data for a simple heteroskedastic probit model and then
estimate the coefficients with hetprobit:
. set obs 1000
obs was 0, now 1000
. set seed 1234567
. gen x = 1-2*runiform()
. gen xhet = runiform()
. gen sigma = exp(1.5*xhet)
. gen p = normal((0.3+2*x)/sigma)
. gen y = cond(runiform()<=p,1,0)
. hetprob y x, het(xhet)
Fitting probit model:
Iteration 0: log likelihood = -688.53208
Iteration 1: log likelihood = -591.59895
Iteration 2: log likelihood = -591.50674
Iteration 3: log likelihood = -591.50674
Fitting full model:
Iteration 0: log likelihood = -591.50674
Iteration 1: log likelihood = -572.12219
Iteration 2: log likelihood = -570.7742
Iteration 3: log likelihood = -569.48921
Iteration 4: log likelihood = -569.47828
Iteration 5: log likelihood = -569.47827
Heteroskedastic probit model Number of obs = 1000
Zero outcomes = 452
Nonzero outcomes = 548
Wald chi2(1) = 78.66
Log likelihood = -569.4783 Prob > chi2 = 0.0000
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
x 2.228031 .2512073 8.87 0.000 1.735673 2.720388
_cons .2493822 .0862833 2.89 0.004 .08027 .4184943
lnsigma2
xhet 1.602537 .2640131 6.07 0.000 1.085081 2.119993
Likelihood-ratio test of lnsigma2=0: chi2(1) = 44.06 Prob > chi2 = 0.0000
Above we created two variables, x and xhet, and then simulated the model

        Pr(y = 1) = F{ (β0 + β1 x) / exp(γ1 xhet) }

for β0 = 0.3, β1 = 2, and γ1 = 1.5. According to hetprobit's output, all coefficients are significant,
and, as we would expect, the Wald test of the full model versus the constant-only model (that is,
the index consisting of β0 + β1 x versus that of just β0) is significant with χ²(1) = 79. Likewise, the
likelihood-ratio test of heteroskedasticity, which tests the full model with heteroskedasticity against
the full model without, is significant with χ²(1) = 44. See [R] maximize for more explanation of
the output. For this simple model, hetprobit took five iterations to converge. As stated elsewhere
(Greene 2012, 714), this is a difficult model to fit, and it is not uncommon for it to require many
iterations or for the optimizer to print out warnings and informative messages during the optimization.
Slow convergence is especially common for models in which one or more of the independent variables
appear in both the index and variance functions.
Technical note
Stata interprets a value of 0 as a negative outcome (failure) and treats all other values (except
missing) as positive outcomes (successes). Thus if your dependent variable takes on the values 0 and
1, then 0 is interpreted as failure and 1 as success. If your dependent variable takes on the values 0,
1, and 2, then 0 is still interpreted as failure, but both 1 and 2 are treated as successes.
Robust standard errors
If you specify the vce(robust) option, hetprobit reports robust standard errors as described
in [U] 20.21 Obtaining robust variance estimates. To illustrate the effect of this option, we will
reestimate our coefficients by using the same model and data in our example, this time adding
vce(robust) to our hetprobit command.
Example 2
. hetprob y x, het(xhet) vce(robust) nolog
Heteroskedastic probit model Number of obs = 1000
Zero outcomes = 452
Nonzero outcomes = 548
Wald chi2(1) = 65.23
Log pseudolikelihood = -569.4783 Prob > chi2 = 0.0000
Robust
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
y
x 2.22803 .2758597 8.08 0.000 1.687355 2.768705
_cons .2493821 .0843367 2.96 0.003 .0840853 .4146789
lnsigma2
xhet 1.602537 .2671326 6.00 0.000 1.078967 2.126107
Wald test of lnsigma2=0: chi2(1) = 35.99 Prob > chi2 = 0.0000
The vce(robust) standard errors for two of the three parameters are larger than the previously
reported conventional standard errors. This is to be expected, even though (by construction) we
have perfect model specification, because this option trades off efficient estimation of the coefficient
variance–covariance matrix for robustness against misspecification.
Specifying the vce(cluster clustvar)option relaxes the usual assumption of independence
between observations to the weaker assumption of independence just between clusters; that is,
hetprobit, vce(cluster clustvar)is robust with respect to within-cluster correlation. This option
is less efficient than the xtgee population-averaged models because hetprobit inefficiently sums
within cluster for the standard-error calculation rather than attempting to exploit what might be
assumed about the within-cluster correlation.
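A do-file-style sketch of the clustered option, continuing the simulated example above; the grouping variable region is artificial and is created here only for illustration:

    * create an arbitrary grouping variable with 10 groups of 100 observations
    generate region = ceil(_n/100)
    * refit the model, clustering the VCE on the artificial groups
    hetprobit y x, het(xhet) vce(cluster region) nolog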
Stored results
hetprobit stores the following in e():
Scalars
e(N)              number of observations
e(N_f)            number of zero outcomes
e(N_s)            number of nonzero outcomes
e(k)              number of parameters
e(k_eq)           number of equations in e(b)
e(k_eq_model)     number of equations in overall model test
e(k_dv)           number of dependent variables
e(df_m)           model degrees of freedom
e(ll)             log likelihood
e(ll_0)           log likelihood, constant-only model
e(ll_c)           log likelihood, comparison model
e(N_clust)        number of clusters
e(chi2)           χ2
e(chi2_c)         χ2 for heteroskedasticity LR test
e(p_c)            p-value for heteroskedasticity LR test
e(df_m_c)         degrees of freedom for heteroskedasticity LR test
e(p)              significance
e(rank)           rank of e(V)
e(rank0)          rank of e(V) for constant-only model
e(ic)             number of iterations
e(rc)             return code
e(converged)      1 if converged, 0 otherwise
Macros
e(cmd)            hetprobit
e(cmdline)        command as typed
e(depvar)         name of dependent variable
e(wtype)          weight type
e(wexp)           weight expression
e(title)          title in estimation output
e(clustvar)       name of cluster variable
e(offset1)        offset for probit equation
e(offset2)        offset for variance equation
e(chi2type)       Wald or LR; type of model χ2 test
e(chi2_ct)        Wald or LR; type of model χ2 test corresponding to e(chi2_c)
e(vce)            vcetype specified in vce()
e(vcetype)        title used to label Std. Err.
e(opt)            type of optimization
e(which)          max or min; whether optimizer is to perform maximization or minimization
e(method)         requested estimation method
e(ml_method)      type of ml method
e(user)           name of likelihood-evaluator
e(technique)      maximization technique
e(properties)     b V
e(predict)        program used to implement predict
e(asbalanced)     factor variables fvset as asbalanced
e(asobserved)     factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V)              variance–covariance matrix of the estimators
e(V_modelbased)   model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The heteroskedastic probit model is a generalization of the probit model because it allows the scale
of the inverse link function to vary from observation to observation as a function of the independent
variables.
The log-likelihood function for the heteroskedastic probit model is
        ln L = Σ_{j∈S} w_j ln Φ{ x_j β / exp(z_j γ) }  +  Σ_{j∉S} w_j ln [ 1 − Φ{ x_j β / exp(z_j γ) } ]

where S is the set of all observations j such that y_j ≠ 0 and w_j denotes the optional weights. ln L
is maximized as described in [R] maximize.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
hetprobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
References
Blevins, J. R., and S. Khan. 2013. Distribution-free estimation of heteroskedastic binary response models in Stata.
Stata Journal 13: 588–602.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Harvey, A. C. 1976. Estimating regression models with multiplicative heteroscedasticity. Econometrica 44: 461–465.
Also see
[R]hetprobit postestimation Postestimation tools for hetprobit
[R]logistic Logistic regression, reporting odds ratios
[R]probit Probit regression
[SVY]svy estimation Estimation commands for survey data
[XT]xtprobit Random-effects and population-averaged probit models
[U] 20 Estimation and postestimation commands
Title
hetprobit postestimation — Postestimation tools for hetprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after hetprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest² likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with svy estimation results.
² lrtest is not appropriate with svy estimation results.
Syntax for predict
    predict [type] newvar [if] [in] [, statistic nooffset]

    predict [type] {stub* | newvar_reg newvar_lnsigma2} [if] [in], scores

    statistic    Description
    Main
      pr         probability of a positive outcome; the default
      xb         linear prediction
      sigma      standard deviation of the error term

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the
estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
sigma calculates the standard deviation of the error term.
nooffset is relevant only if you specified offset(varname) for hetprobit. It modifies the
calculations made by predict so that they ignore the offset variable; the linear prediction is
treated as x_j b rather than as x_j b + offset_j.
scores calculates equation-level score variables.
The first new variable will contain ∂ ln L/∂(x_j β).
The second new variable will contain ∂ ln L/∂(z_j γ).
Remarks and examples
Once you have fit a model, you can use the predict command to obtain the predicted probabilities
for both the estimation sample and other samples; see [U] 20 Estimation and postestimation
commands and [R] predict. predict without arguments calculates the predicted probability of a
positive outcome. With the xb option, predict calculates the index function combination, x_j b,
where x_j are the independent variables in the jth observation and b is the estimated parameter vector.
With the sigma option, predict calculates the predicted standard deviation, σ_j = exp(z_j γ).
Example 1
We use predict to compute the predicted probabilities and standard deviations based on the
model in example 2 in [R]hetprobit to compare these with the actual values:
. predict phat
(option pr assumed; Pr(y))
. gen diff_p = phat - p
. summarize diff_p
Variable Obs Mean Std. Dev. Min Max
diff_p 1000 -.0107081 .0131869 -.0466331 .010482
. predict sigmahat, sigma
. gen diff_s = sigmahat - sigma
. summarize diff_s
Variable Obs Mean Std. Dev. Min Max
diff_s 1000 .1558882 .1363698 .0000417 .4819107
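Because pr is computed as Φ{x_j b/exp(z_j γ)} and sigma as exp(z_j γ), the default prediction can be reconstructed from xb and sigma. A do-file-style sketch continuing the example above (the new variable names are ours):

    * linear prediction from the index equation
    predict xbhat, xb
    * rebuild the probability of a positive outcome by hand
    generate p_check = normal(xbhat/sigmahat)
    * p_check should essentially reproduce phat
    summarize phat p_check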
Also see
[R]hetprobit Heteroskedastic probit model
[U] 20 Estimation and postestimation commands
Title
histogram — Histograms for continuous and categorical variables
Syntax Menu
Description Options for use in the continuous case
Options for use in the discrete case Options for use in the continuous and discrete cases
Remarks and examples References
Also see
Syntax
histogram varname [if] [in] [weight] [, [continuous_opts | discrete_opts] options]
continuous opts Description
Main
bin(#)set number of bins to #
width(#)set width of bins to #
start(#)set lower limit of first bin to #
discrete opts Description
Main
discrete specify that data are discrete
width(#)set width of bins to #
start(#)set theoretical minimum value to #
options Description
Main
density draw as density; the default
fraction draw as fractions
frequency draw as frequencies
percent draw as percentages
bar options rendition of bars
addlabels add height labels to bars
addlabopts(marker label options)affect rendition of labels
Density plots
normal add a normal density to the graph
normopts(line options)affect rendition of normal density
kdensity add a kernel density estimate to the graph
kdenopts(kdensity options)affect rendition of kernel density
Add plots
addplot(plot)add other plots to the histogram
Y axis, X axis, Titles, Legend, Overall, By
twoway options any options documented in [G-3]twoway options
fweights are allowed; see [U] 11.1.6 weight.
Menu
Graphics >Histogram
Description
histogram draws histograms of varname, which is assumed to be the name of a continuous
variable unless the discrete option is specified.
Options for use in the continuous case
 
Main
bin(#) and width(#) are alternatives. They specify how the data are to be aggregated into bins:
bin() by specifying the number of bins (from which the width can be derived) and width() by
specifying the bin width (from which the number of bins can be derived).
If neither option is specified, results are the same as if bin(k) had been specified, where

        k = min{ sqrt(N), 10 ln(N)/ln(10) }

and where N is the (weighted) number of observations.
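For instance, with the 248 observations used in the examples below, this default can be computed directly (a sketch; the bin=15 reported in the output there is consistent with rounding this value down):

    * default bin count for N = 248: min of sqrt(N) and 10*log10(N)
    display min(sqrt(248), 10*ln(248)/ln(10))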
start(#) specifies the theoretical minimum of varname. The default is start(m), where m is the
observed minimum value of varname.
Specify start() when you are concerned about sparse data, for instance, if you know that varname
can have a value of 0, but you are concerned that 0 may not be observed.
start(#), if specified, must be less than or equal to m, or else an error will be issued.
Options for use in the discrete case
 
Main
discrete specifies that varname is discrete and that you want each unique value of varname to have
its own bin (bar of histogram).
width(#) is rarely specified in the discrete case; it specifies the width of the bins. The default is
width(d), where d is the observed minimum difference between the unique values of varname.
Specify width() if you are concerned that your data are sparse. For example, in theory varname
could take on the values, say, 1, 2, 3, . . . , 9, but because of the sparseness, perhaps only the
values 2, 4, 7, and 8 are observed. Here the default width calculation would produce width(2),
and you would want to specify width(1).
start(#) is also rarely specified in the discrete case; it specifies the theoretical minimum value of
varname. The default is start(m), where m is the observed minimum value.
As with width(), specify start(#) if you are concerned that your data are sparse. In the previous
example, you might also want to specify start(1). start() does nothing more than add white
space to the left side of the graph.
The value of # in start() must be less than or equal to m, or an error will be issued.
Options for use in the continuous and discrete cases
 
Main
density, fraction, frequency, and percent specify whether you want the histogram scaled to
density units, fractional units, frequencies, or percentages. density is the default.
density scales the height of the bars so that the sum of their areas equals 1.
fraction scales the height of the bars so that the sum of their heights equals 1.
frequency scales the height of the bars so that each bar’s height is equal to the number
of observations in the category. Thus the sum of the heights is equal to the total number of
observations.
percent scales the height of the bars so that the sum of their heights equals 100.
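Assuming equal-width bins of width w and N observations, these four scalings are related as follows (a restatement in LaTeX, not quoted from this entry):

    \text{density}_i = \frac{\text{frequency}_i}{N\,w}, \qquad
    \text{fraction}_i = \frac{\text{frequency}_i}{N}, \qquad
    \text{percent}_i = 100 \times \text{fraction}_i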
bar options are any of the options allowed by graph twoway bar; see [G-2] graph twoway bar.
One of the most useful bar options is barwidth(#), which specifies the width of the bars in
varname units. By default, histogram draws the bars so that adjacent bars just touch. If you want
gaps between the bars, do not specify histogram's width() option (which would change how
the histogram is calculated) but instead specify the bar option barwidth() or the histogram option
gap, both of which affect only how the bar is rendered.
The bar option horizontal cannot be used with the addlabels option.
addlabels specifies that the top of each bar be labeled with the density, fraction, or frequency, as
determined by the density,fraction, and frequency options.
addlabopts(marker label options)specifies how to render the labels atop the bars. See
[G-3]marker label options. Do not specify the marker label option mlabel(varname), which
specifies the variable to be used; this is specified for you by histogram.
addlabopts() will accept more options than those documented in [G-3]marker label options.
All options allowed by twoway scatter are also allowed by addlabopts(); see [G-2]graph
twoway scatter. One particularly useful option is yvarformat(); see [G-3]advanced options.
 
Density plots
normal specifies that the histogram be overlaid with an appropriately scaled normal density. The
normal will have the same mean and standard deviation as the data.
normopts(line options)specifies details about the rendition of the normal curve, such as the color
and style of line used. See [G-2]graph twoway line.
kdensity specifies that the histogram be overlaid with an appropriately scaled kernel density estimate
of the density. By default, the estimate will be produced using the Epanechnikov kernel with an
“optimal” half-width. This default corresponds to the default of kdensity; see [R]kdensity. How
the estimate is produced can be controlled using the kdenopts() option described below.
kdenopts(kdensity options)specifies details about how the kernel density estimate is to be produced
along with details about the rendition of the resulting curve, such as the color and style of line
used. The kernel density estimate is described in [G-2]graph twoway kdensity. As an example,
if you wanted to produce kernel density estimates by using the Gaussian kernel with optimal
half-width, you would specify kdenopts(gauss) and if you also wanted a half-width of 5, you
would specify kdenopts(gauss width(5)).
 
Add plots
addplot(plot)allows adding more graph twoway plots to the graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall, By
twoway options are any of the options documented in [G-3]twoway options. This includes, most
importantly, options for titling the graph (see [G-3]title options), options for saving the graph to
disk (see [G-3]saving option), and the by() option, which will allow you to simultaneously graph
histograms for different subsets of the data (see [G-3]by option).
Remarks and examples
Remarks are presented under the following headings:
Histograms of continuous variables
Overlaying normal and kernel density estimates
Histograms of discrete variables
Use with by()
Video example
For an example of editing a histogram with the Graph Editor, see Pollock (2011, 29–31).
Histograms of continuous variables
histogram assumes that the variable is continuous, so you need type only histogram followed
by the variable name:
. use http://www.stata-press.com/data/r13/sp500
(S&P 500)
. histogram volume
(bin=15, start=4103, width=1280.3533)
(figure: histogram of volume, density scale; x axis: Volume (thousands))
The small values reported for density on the y axis are correct; if you added up the area of the bars,
you would get 1. Nevertheless, many people are used to seeing histograms scaled so that the bar
heights sum to 1,
. histogram volume, fraction
(bin=15, start=4103, width=1280.3533)
(figure: histogram of volume, fraction scale; x axis: Volume (thousands))
and others are used to seeing histograms so that the bar height reflects the number of observations,
. histogram volume, frequency
(bin=15, start=4103, width=1280.3533)
(figure: histogram of volume, frequency scale; x axis: Volume (thousands))
Regardless of the scale you prefer, you can specify other options to make the graph look more
impressive:
. summarize volume
Variable Obs Mean Std. Dev. Min Max
volume 248 12320.68 2585.929 4103 23308.3
. histogram volume, freq
> xaxis(1 2)
> ylabel(0(10)60, grid)
> xlabel(12321 "mean"
> 9735 "-1 s.d."
> 14907 "+1 s.d."
> 7149 "-2 s.d."
> 17493 "+2 s.d."
> 20078 "+3 s.d."
> 22664 "+4 s.d."
> , axis(2) grid gmax)
> xtitle("", axis(2))
> subtitle("S&P 500, January 2001 - December 2001")
> note("Source: Yahoo! Finance and Commodity Systems, Inc.")
(bin=15, start=4103, width=1280.3533)
(figure: frequency histogram of volume; upper x axis labels the mean and −2, −1, +1, +2, +3, +4 s.d.; subtitle: S&P 500, January 2001 − December 2001; note: Source: Yahoo! Finance and Commodity Systems, Inc.)
For an explanation of the xaxis() option (it created the upper and lower x axes),
see [G-3] axis choice options. For an explanation of the ylabel() and xlabel() options,
see [G-3] axis label options. For an explanation of the subtitle() and note() options, see
[G-3] title options.
Overlaying normal and kernel density estimates
Specifying normal will overlay a normal density over the histogram. It would be enough to type
. histogram volume, normal
but we will add the option to our more impressive rendition:
. summarize volume
Variable Obs Mean Std. Dev. Min Max
volume 248 12320.68 2585.929 4103 23308.3
. histogram volume, freq normal
> xaxis(1 2)
> ylabel(0(10)60, grid)
> xlabel(12321 "mean"
> 9735 "-1 s.d."
> 14907 "+1 s.d."
> 7149 "-2 s.d."
> 17493 "+2 s.d."
> 20078 "+3 s.d."
> 22664 "+4 s.d."
> , axis(2) grid gmax)
> xtitle("", axis(2))
> subtitle("S&P 500, January 2001 - December 2001")
> note("Source: Yahoo! Finance and Commodity Systems, Inc.")
(bin=15, start=4103, width=1280.3533)
(figure: frequency histogram of volume with an overlaid normal density; subtitle: S&P 500, January 2001 − December 2001; note: Source: Yahoo! Finance and Commodity Systems, Inc.)
If we instead wanted to overlay a kernel density estimate, we could specify kdensity in place of
normal.
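For instance, a minimal version of that command, dropping the axis and labeling options used above (a sketch):

    * frequency histogram of volume with a kernel density overlay
    histogram volume, frequency kdensity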
Histograms of discrete variables
Specify histogram's discrete option when you wish to treat the data as discrete, that is, when you wish
each unique value of the variable to be assigned its own bin. For instance, in the automobile data,
mpg is a continuous variable, but the mileage ratings have been measured to integer precision. If we
were to type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. histogram mpg
(bin=8, start=12, width=3.625)
mpg would be treated as continuous and categorized into eight bins by the default number-of-bins
calculation, which is based on the number of observations, 74.
Adding the discrete option makes a histogram with a bin for each of the 21 unique values.
. histogram mpg, discrete
(start=12, width=1)
(figure: histogram of mpg with one bin per unique value, density scale; x axis: Mileage (mpg))
Just as in the continuous case, the y axis was reported in density, and we could specify the fraction
or frequency options if we wanted it to be reported differently. Below we specify frequency, we
specify addlabels to add a report of frequencies printed above the bars, we specify ylabel(,grid)
to add horizontal grid lines, and we specify xlabel(12(2)42) to label the values 12, 14, ..., 42
on the x axis:
. histogram mpg, discrete freq addlabels ylabel(,grid) xlabel(12(2)42)
(start=12, width=1)
(figure: frequency histogram of mpg with bar-height labels, horizontal grid lines, and x-axis labels 12(2)42; x axis: Mileage (mpg))
Use with by()
histogram may be used with graph twoways by(); for example,
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. histogram mpg, discrete by(foreign)
(figure: density histograms of mpg by car type, Domestic and Foreign panels side by side)
Here results would be easier to compare if the graphs were presented in one column:
. histogram mpg, discrete by(foreign, col(1))
(figure: density histograms of mpg by car type, Domestic and Foreign panels in one column)
col(1) is a by() suboption (see [G-3] by option), and there are other useful suboptions, such
as total, which will add an overall total histogram. total is a suboption of by(), not an option
of histogram, so you would type
. histogram mpg, discrete by(foreign, total)
and not histogram mpg, discrete by(foreign) total.
As another example, Lipset (1993) reprinted data from the New York Times (November 5, 1992)
collected by the Voter Research and Surveys based on questionnaires completed by 15,490 U.S.
presidential voters from 300 polling places on election day in 1992.
. use http://www.stata-press.com/data/r13/voter
. histogram candi [freq=pop], discrete fraction by(inc, total)
> gap(40) xlabel(2 3 4, valuelabel)
(figure: fraction histograms of candidate voted for, 1992 (Clinton, Bush, Perot), by family income group plus a total panel)
We specified gap(40) to reduce the width of the bars by 40%. We also used xlabel()'s
valuelabel suboption, which caused our bars to be labeled “Clinton”, “Bush”, and “Perot”, rather
than 2, 3, and 4; see [G-3] axis label options.
Video example
Histograms in Stata
References
Cox, N. J. 2004. Speaking Stata: Graphing distributions. Stata Journal 4: 66–88.
———. 2005. Speaking Stata: Density probability plots. Stata Journal 5: 259–273.
Harrison, D. A. 2005. Stata tip 20: Generating histogram bin variables. Stata Journal 5: 280–281.
Lipset, S. M. 1993. The significance of the 1992 election. PS: Political Science and Politics 26: 7–16.
Pollock, P. H., III. 2011. A Stata Companion to Political Analysis. 2nd ed. Washington, DC: CQ Press.
Also see
[R]kdensity Univariate kernel density estimation
[R]spikeplot Spike plots and rootograms
[G-2]graph twoway histogram Histogram plots
Title
icc — Intraclass correlation coefficients
Syntax Menu
Description Options for one-way RE model
Options for two-way RE and ME models Remarks and examples
Stored results Methods and formulas
References Also see
Syntax
Calculate intraclass correlations for one-way random-effects model
    icc depvar target [if] [in] [, oneway_options]

Calculate intraclass correlations for two-way random-effects model
    icc depvar target rater [if] [in] [, twoway_re_options]

Calculate intraclass correlations for two-way mixed-effects model
    icc depvar target rater [if] [in], mixed [twoway_me_options]
oneway options Description
Main
absolute estimate absolute agreement; the default
testvalue(#)test whether intraclass correlations equal #;
default is testvalue(0)
Reporting
level(#)set confidence level; default is level(95)
format(%fmt)display format for statistics and confidence intervals;
default is format(%9.0g)
twoway re options Description
Main
absolute estimate absolute agreement; the default
consistency estimate consistency of agreement
testvalue(#)test whether intraclass correlations equal #;
default is testvalue(0)
Reporting
level(#)set confidence level; default is level(95)
format(%fmt)display format for statistics and confidence intervals;
default is format(%9.0g)
twoway me options Description
Main
mixed estimate intraclass correlations for a mixed-effects model
consistency estimate consistency of agreement; the default
absolute estimate absolute agreement
testvalue(#)test whether intraclass correlations equal #;
default is testvalue(0)
Reporting
level(#)set confidence level; default is level(95)
format(%fmt)display format for statistics and confidence intervals;
default is format(%9.0g)
mixed is required.
bootstrap,by,jackknife, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Intraclass correlations
Description
icc estimates intraclass correlations for one-way random-effects models, two-way random-effects
models, or two-way mixed-effects models for both individual and average measurements. Intraclass
correlations measuring consistency of agreement or absolute agreement of the measurements may be
estimated.
Options for one-way RE model
 
Main
absolute specifies that intraclass correlations measuring absolute agreement of the measurements
be estimated. This is the default for random-effects models.
testvalue(#)tests whether intraclass correlations equal #. The default is testvalue(0).
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [R]level.
format(%fmt)specifies how the intraclass correlation estimates and confidence intervals are to be
formatted. The default is format(%9.0g).
Options for two-way RE and ME models
 
Main
mixed is required to calculate two-way mixed-effects models. mixed specifies that intraclass corre-
lations for a mixed-effects model be estimated.
absolute specifies that intraclass correlations measuring absolute agreement of the measurements be
estimated. This is the default for random-effects models. Only one of absolute or consistency
may be specified.
consistency specifies that intraclass correlations measuring consistency of agreement of the mea-
surements be estimated. This is the default for mixed-effects models. Only one of absolute or
consistency may be specified.
testvalue(#)tests whether intraclass correlations equal #. The default is testvalue(0).
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [R]level.
format(%fmt)specifies how the intraclass correlation estimates and confidence intervals are to be
formatted. The default is format(%9.0g).
Remarks and examples
Remarks are presented under the following headings:
Introduction
One-way random effects
Two-way random effects
Two-way mixed effects
Adoption study
Relationship between ICCs
Tests against nonzero values
Introduction
In some disciplines, such as psychology and sociology, data are often measured with error that
can seriously affect statistical interpretation of the results. Thus it is important to assess the amount
of measurement error by evaluating the consistency or reliability of measurements. The intraclass
correlation coefficient (ICC) is often used to measure the consistency or homogeneity of measurements.
Several versions of ICCs are introduced in the literature depending on the experimental design and
goals of the study (see, for example, Shrout and Fleiss [1979] and McGraw and Wong [1996a]).
Following Shrout and Fleiss (1979), we describe various forms of ICCs in the context of a reliability
study of ratings of different targets (or objects of measurements) by several raters.
Consider n targets (for example, students, patients, athletes) that are randomly sampled from a
population of interest. Each target is rated independently by a set of k raters (for example, teachers,
doctors, judges). One rating per target and rater is obtained. It is of interest to determine the extent
of the agreement of the ratings.
As noted by Shrout and Fleiss (1979) and McGraw and Wong (1996a), you need to answer several
questions to decide what version of ICC is appropriate to measure the agreement in your study:
1. Is a one-way or two-way analysis-of-variance model appropriate for your study?
2. Are differences between raters’ mean ratings relevant to the reliability of interest?
3. Is the unit of analysis an individual rating or the mean rating over several raters?
4. Is the consistency of agreement or the absolute agreement of ratings of interest?
Three types of analysis-of-variance models are considered for the reliability study: one-way random
effects, two-way random effects, and two-way mixed effects. Mixed models contain both fixed effects
and random effects. In the one-way random-effects model, each target is rated by a different set of
k independent raters, who are randomly drawn from the population of raters. The target is the only
random effect in this model; the effects due to raters and possibly due to rater-and-target interaction
cannot be separated from random error. In the two-way random-effects model, each target is rated
by the same set of k independent raters, who are randomly drawn from the population of raters.
The random effects in this model are target and rater and possibly their interaction, although in the
absence of repeated measurements for each rater on each target, the effect of an interaction cannot
be separated from random error. In the two-way mixed-effects model, each target is rated by the
same set of k independent raters. Because they are the only raters of interest, rater is a fixed effect.
The random effects are target and possibly target-and-rater interaction, but again the interaction effect
cannot be separated from random error without repeated measurements for each rater and target. The
definition of ICC depends on the chosen random-effects model; see Methods and formulas for details.
In summary, use a one-way model if there are no systematic differences in measurements due to
raters and use a two-way model otherwise. If you want to generalize your results to a population
of raters from which the observed raters are sampled, use a two-way random-effects model, treating
raters as random. If you are only interested in the effects of the observed k raters, use a two-way
mixed-effects model, treating raters as fixed. For example, suppose you compare judges’ ratings of
targets from different groups. If you use the combined data from k judges to compare the groups,
the random-effects model is appropriate. If you compare groups separately for each judge and then
pool the differences, the mixed-effects model is appropriate.
The definition of ICC also depends on the unit of analysis in a study: whether the agreement is
measured between individual ratings (individual ICC) or between the averages of ratings over several
raters (average ICC). The data on individual ratings are more common. The data on average ratings
are typically used when individual ratings are deemed unreliable. The average ICC can also be used
when teams of raters are used to rate a target. For example, the ratings of teams of physicians may
be evaluated in this manner. When the unit of analysis is an average rating, you should remember
that the interpretation of ICC pertains to average ratings and not individual ratings.
Finally, depending on whether consistency of agreement or absolute agreement is of interest, two
types of ICC are used: consistency-of-agreement ICC (CA-ICC) and absolute-agreement ICC (AA-ICC).
Under consistency of agreement, the scores are considered consistent if the scores from any two raters
differ by the same constant value for all targets. This implies that raters give the same ranking to
all targets. Under absolute agreement, the scores are considered in absolute agreement if the scores
from all raters match exactly.
For example, suppose we observe three targets and two raters. The ratings are (2,4), (4,6), and
(6,8), with rater 1 giving the scores (2,4,6) and rater 2 giving the scores (4,6,8), two points higher
than rater 1. The CA-ICC between individual ratings is 1 because the scores from rater 1 and rater 2
differ by a constant value (two points) for all targets. That rater 1 gives lower scores than rater 2 is
deemed irrelevant under the consistency measure of agreement. The raters have the same difference
of opinion on every target, and the variation between raters that is caused by this difference is not
relevant. On the other hand, the AA-ICC between individual ratings is 8/12 =0.67, where 8 is the
estimated between-target variance and 12 is the estimated total variance of ratings.
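Entered in long form, the same three-target, two-rater illustration could be passed to icc directly. A do-file-style sketch (no output values are asserted here):

    * toy data: 3 targets, 2 raters, one rating per target and rater
    clear
    input target rater rating
    1 1 2
    1 2 4
    2 1 4
    2 2 6
    3 1 6
    3 2 8
    end
    * two-way random-effects ICCs; absolute agreement is the default
    icc rating target rater
    * consistency-of-agreement ICCs
    icc rating target rater, consistency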
Either CA-ICC or AA-ICC can serve as a useful measure of agreement depending on whether rater
variability is relevant for determining the degree of agreement. As McGraw and Wong (1996a) point
out, CA-ICC is useful when comparative judgments are made about objects of measurement. The
CA-ICC represents correlation when the rater is fixed; the AA-ICC represents correlation when the rater
is random.
See Shrout and Fleiss (1979) and McGraw and Wong (1996a) for more detailed guidelines about
the choice of appropriate ICC.
Shrout and Fleiss (1979) and McGraw and Wong (1996a) describe 10 versions of ICCs based on
the concepts above: individual and average AA-ICCs for a one-way model (consistency of agreement is
not defined for this model); individual and average AA-ICCs and CA-ICCs for a two-way random-effects
model; and individual and average AA-ICCs and CA-ICCs for a two-way mixed-effects model. Although
each of these ICCs has its own definition and interpretation, the estimators for some are identical,
leading to the same estimates of those ICCs; see Relationship between ICCs and Methods and formulas
for details.
The icc command calculates ICCs for each of the three analysis-of-variance models. You can use
option absolute to compute AA-ICCs or option consistency to compute CA-ICCs. By default, icc
computes ICCs corresponding to the correlation between ratings and between average ratings made
on the same target: AA-ICC for a random-effects model and CA-ICC for a mixed-effects model. As
pointed out by Shrout and Fleiss (1979), although the data on average ratings might be needed for
reliability, the generalization of interest might be individuals. For this reason, icc reports ICCs for
both units, individual and average, for each model.
In addition to estimates of ICCs, icc provides confidence intervals and one-sided F tests. The F
test of H0: ρ = 0 versus Ha: ρ > 0 is the same for the individual and average ICCs, so icc reports
one test. This is not true, however, for nonzero null hypotheses (see Tests against nonzero values for
details), so icc reports a separate test in this case.
The icc command requires data in long form; see [D] reshape for how to convert data in wide form
to long form. The data must also be balanced and contain one observation per target and rater. For
unbalanced data, icc omits all targets with fewer than k ratings from computation. Under one-way
models, k is determined as the largest number of observed ratings for a target. Under two-way models,
k is the number of unique raters. If multiple observations per target and rater are detected, icc issues
an error.
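A quick way to confirm that a dataset satisfies this layout is to check that the target and rater variables uniquely identify the observations. A minimal sketch, assuming variables named target and judge as in the examples that follow:

. isid target judge                 // errors out if any target-rater pair appears more than once
. duplicates report target judge    // tallies any surplus observations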
We demonstrate the use of icc using datasets from Shrout and Fleiss (1979) and McGraw and
Wong (1996a). In the next three sections, we use an example from table 2 of Shrout and Fleiss (1979)
with six targets and four judges. For instructional purposes, we analyze these data under each of the
three different models: one-way random effects, two-way random effects, and two-way mixed effects.
One-way random effects
In the one-way random-effects model, we assume that the n targets being rated are randomly
selected from the population of potential targets. Each is rated by a different set of k raters randomly
drawn from the population of potential raters. McGraw and Wong (1996a) describe an example of
this setting, where behavioral genetics data are used to assess familial resemblance. Family units can
be viewed as “targets”, and children can be viewed as “raters”. By taking a measurement on a child
of the family unit, we obtain the “rating” of the family unit by the “child-rater”. In this case, we can
use ICC to assess similarity between children within a family or, in other words, assess if there is a
family effect in these data.
As we mentioned in the introduction, only AA-ICC is defined for a one-way model. The consistency
of agreement is not defined in this case, as each target is evaluated by a different set of raters. Thus
there is no between-rater variability in this model.
In a one-way model, the AA-ICC corresponds to the correlation coefficient between ratings within
a target. It is also a ratio of the between-target variance of ratings to the total variance of ratings, the
sum of the between-target and error variances.
Example 1: One-way random-effects ICCs
Consider data from table 2 of Shrout and Fleiss (1979) stored in judges.dta. The data contain
24 ratings of n=6 targets by k=4 judges. We list the first eight observations:
. use http://www.stata-press.com/data/r13/judges
(Ratings of targets by judges)
. list in 1/8, sepby(target)
rating target judge
1. 9 1 1
2. 2 1 2
3. 5 1 3
4. 8 1 4
5. 6 2 1
6. 1 2 2
7. 3 2 3
8. 2 2 4
For a moment, let’s ignore that targets are rated by the same set of judges. Instead, we assume that
a different set of four judges is used to rate each target. In this case, the only systematic variation in
the data is due to targets, so the one-way random-effects model is appropriate.
We use icc to estimate the intraclass correlations for these data. To compute ICCs for a one-way
model, we specify the dependent variable rating followed by the target variable target:
. icc rating target
Intraclass correlations
One-way random-effects model
Absolute agreement
Random effects: target Number of targets = 6
Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .1657418 -.1329323 .7225601
Average .4427971 -.8844422 .9124154
F test that
ICC=0.00: F(5.0, 18.0) = 1.79 Prob > F = 0.165
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
icc reports the AA-ICCs for both individual and average ratings. The individual AA-ICC corresponds
to ICC(1) in McGraw and Wong (1996a) or ICC(1,1) in Shrout and Fleiss (1979). The average AA-ICC
corresponds to ICC(k) in McGraw and Wong (1996a) or ICC(1,k) in Shrout and Fleiss (1979).
The estimated correlation between individual ratings is 0.17, indicating little similarity between
ratings within a target, low reliability of individual target ratings, or no target effect. The estimated
intraclass correlation between ratings averaged over k=4 judges is higher, 0.44. (The average ICC
will typically be higher than the individual ICC.) The estimated intraclass correlation measures the
similarity or reliability of mean ratings from groups of four judges. We do not have statistical evidence
that either ICC is different from zero based on reported confidence intervals and the one-sided F test.
Note that although the estimates of ICCs cannot be negative in this setting, the lower bound of the
computed confidence interval may be negative. A common ad-hoc way of handling this is to truncate
the lower bound at zero.
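If you want to apply that truncation programmatically, the lower bounds are available in the stored results after icc; a minimal sketch, assuming the names r(icc_i_lb) and r(icc_avg_lb) listed under Stored results below:

. quietly icc rating target
. display max(0, r(icc_i_lb))      // truncated lower bound, individual ICC
. display max(0, r(icc_avg_lb))    // truncated lower bound, average ICC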
The estimates of both the individual and the average AA-ICC are also computed by the loneway
command (see [R] loneway), which performs a one-way analysis of variance.
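For instance, the following one-line sketch cross-checks the one-way estimates on the same data; loneway reports the individual intraclass correlation and the reliability of a target mean:

. loneway rating target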
Technical note
Mean rating is commonly used when individual rating is unreliable because the reliability of a
mean rating is always higher than the reliability of the individual rating when the individual reliability
is positive.
In the previous example, we estimated low reliability of the individual ratings of a target, 0.17.
The reliability increased to 0.44 for the ratings averaged over four judges. What if we had more
judges?
We can use the Spearman-Brown formula (Spearman 1910; Brown 1910) to compute the m-average
ICC based on the individual ICC:

ICC(m) = \frac{m \, ICC(1)}{1 + (m-1) \, ICC(1)}

Using this formula for the previous example, we find that the mean reliability over, say, 10 judges
is 10 × 0.17/(1 + 9 × 0.17) = 0.67.

Alternatively, we can invert the Spearman-Brown formula to determine the number of judges (or
the number of ratings of a target) we need to achieve the desired reliability. Suppose we would like
an average reliability of 0.9; then

m = \frac{ICC(m)\{1 - ICC(1)\}}{ICC(1)\{1 - ICC(m)\}} = \frac{0.9(1 - 0.17)}{0.17(1 - 0.9)} = 44
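These back-of-the-envelope calculations are easy to reproduce at the Stata prompt; a minimal sketch using display as a calculator:

. display 10*0.17/(1 + 9*0.17)                 // m-average ICC for m = 10: about 0.67
. display (0.9*(1 - 0.17))/(0.17*(1 - 0.9))    // judges needed for average reliability 0.9: about 44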
See, for example, Bliese (2000) for other examples.
Two-way random effects
As before, we assume that the targets being rated are randomly selected from the population of
potential targets. We now also assume that each target is evaluated by the same set of k raters, who
have been randomly sampled from the population of raters. In this scenario, we want to generalize
our findings to the population of raters from which the observed kraters were sampled. For example,
suppose we want to estimate the reliability of doctors’ evaluations of patients with a certain condition.
Unless the reliability at a specific hospital is of interest, the doctors may be interchanged with others
in the relevant population of doctors.
As for a one-way model, the AA-ICC corresponds to the correlation between measurements on the
same target and is also a ratio of the between-target variance to the total variance of measurements in a
two-way random-effects model. The total variance is now the sum of the between-target, between-rater,
and error variances. Unlike a one-way model, the CA-ICC can be computed for a two-way random-
effects model when the consistency of agreement is of interest rather than the absolute agreement.
The CA-ICC is also the ratio of the between-target variance to the total variance, but the total variance
does not include the between-rater variance because the between-rater variability is irrelevant for the
consistency of agreement.
Again, the two versions, individual and average, are available for each ICC.
Example 2: Two-way random-effects ICCs
Continuing with example 1, recall that we previously ignored that each target is rated by the same
set of four judges and instead assumed different sets of judges. We return to the original data setting.
We want to evaluate the agreement between judges’ ratings of targets in a population represented by
the observed set of four judges.
In a two-way model, we must specify both the target and the rater variables. In icc, we now
additionally specify the rater variable judge following the target variable target; the random-effects
model is assumed by default.
. icc rating target judge
Intraclass correlations
Two-way random-effects model
Absolute agreement
Random effects: target Number of targets = 6
Random effects: judge Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .2897638 .0187865 .7610844
Average .6200505 .0711368 .927232
F test that
ICC=0.00: F(5.0, 15.0) = 11.03 Prob > F = 0.000
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
As for a one-way random-effects model, icc by default reports AA-ICCs that correspond to the
correlation between ratings on a target. Notice that both individual and average ICCs are larger in
the two-way random-effects model than in the previous one-way model: 0.29 versus 0.17 and 0.62
versus 0.44, respectively. Based on the confidence intervals and the F test, we now also have statistical
evidence that both the individual and the average ICC differ from zero. If a one-way model is used
when a two-way model is appropriate, the true ICC will generally be underestimated.
The individual AA-ICC corresponds to ICC(A,1) in McGraw and Wong (1996a) or ICC(2,1) in Shrout
and Fleiss (1979). The average AA-ICC corresponds to ICC(A,k) in McGraw and Wong (1996a) or
ICC(2,k) in Shrout and Fleiss (1979).
Instead of the absolute agreement, we can also assess the consistency of agreement. The individual
and average CA-ICCs are considered in McGraw and Wong (1996a) and denoted as ICC(C,1) and
ICC(C,k), respectively. These ICCs are not considered in Shrout and Fleiss (1979) because they are not
correlations in the strict sense. Although CA-ICCs do not estimate correlation, they can provide useful
information about the reliability of the raters. McGraw and Wong (1996a) note that the practical
value of the individual and average CA-ICCs in the two-way random-effects model setting is well
documented in measurement theory, citing Hartmann (1982) and Suen (1988).
To estimate the individual and average CA-ICCs, we specify the consistency option:
. icc rating target judge, consistency
Intraclass correlations
Two-way random-effects model
Consistency of agreement
Random effects: target Number of targets = 6
Random effects: judge Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .7148407 .3424648 .9458583
Average .9093155 .6756747 .9858917
F test that
ICC=0.00: F(5.0, 15.0) = 11.03 Prob > F = 0.000
We estimate that the consistency of agreement of ratings in the considered population of raters is
high, 0.71, based on the individual CA-ICC. On the other hand, the absolute agreement of ratings is
low, 0.29, based on the individual AA-ICC from the previous output.
The measure of consistency of agreement among means, the average CA-ICC, is equivalent to
Cronbach's alpha (Cronbach 1951); see [MV] alpha. The individual CA-ICC can also be equivalent to
Pearson's correlation coefficient between raters when k = 2; see McGraw and Wong (1996a) for
details.
In the next example, we will see that the actual estimates of the individual and average AA-ICCs
and CA-ICCs are the same whether we examine a random-effects model or a mixed-effects model.
The differences between these ICCs are in their definitions and interpretations.
Two-way mixed effects
As in a two-way random-effects model, we assume that the targets are randomly selected from the
population of potential targets and that each is evaluated by the same set of k raters. In a mixed-effects
model, however, we assume that these raters are the only raters of interest. So as before, the targets
are random, but now the raters are fixed.
In the two-way mixed-effects model, the fixed effect of the rater does not contribute a between-rater
random variance component to the total variance. As such, the definitions and interpretations of
ICCs are different in a mixed-effects model than in a random-effects model. However, the estimates
of ICCs as well as test statistics and confidence intervals are the same. The only exceptions are
average AA-ICCs and CA-ICCs. These are not estimable in a two-way mixed-effects model including an
interaction term between target and rater; see Relationship between ICCs and Methods and formulas
for details.
In a two-way mixed-effects model, the CA-ICC corresponds to the correlation between measurements
on the same target. As pointed out by Shrout and Fleiss (1979), when the rater variance is ignored, the
correlation coefficient is interpreted in terms of rater consistency rather than rater absolute agreement.
Formally, the CA-ICC is the ratio of the covariance between measurements on the target to the total
variance of the measurements. The AA-ICC corresponds to the same ratio, but includes a variance of
the fixed factor, rater, in its denominator.
Example 3: Two-way mixed-effects ICCs
Continuing with example 2, suppose that we are now interested in assessing the agreement of
ratings from only the observed four judges. The judges are now fixed effects, and the appropriate
model is a two-way mixed-effects model.
To estimate ICCs for a two-way mixed-effects model, we specify the mixed option with icc:
. icc rating target judge, mixed
Intraclass correlations
Two-way mixed-effects model
Consistency of agreement
Random effects: target Number of targets = 6
Fixed effects: judge Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .7148407 .3424648 .9458583
Average .9093155 .6756747 .9858917
F test that
ICC=0.00: F(5.0, 15.0) = 11.03 Prob > F = 0.000
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
As we described in the introduction, icc by default reports ICCs corresponding to the correlations.
So, for a mixed-effects model, icc reports CA-ICCs by default. The individual and average CA-ICCs
are denoted as ICC(3,1) and ICC(3,k) in Shrout and Fleiss (1979) and ICC(C,1) and ICC(C,k) in McGraw
and Wong (1996a).
Our estimates of the individual and average CA-ICCs are identical to the CA-ICC estimates obtained
under the two-way random-effects model in example 2, but our interpretation of the results is different.
Under a mixed-effects model, 0.71 and 0.91 are the estimates, respectively, of the correlation between
individual measurements and the correlation between average measurements made on the same target.
We can also estimate the AA-ICCs in this setting by specifying the absolute option:
. icc rating target judge, mixed absolute
Intraclass correlations
Two-way mixed-effects model
Absolute agreement
Random effects: target Number of targets = 6
Fixed effects: judge Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .2897638 .0187865 .7610844
Average .6200505 .0711368 .927232
F test that
ICC=0.00: F(5.0, 15.0) = 11.03 Prob > F = 0.000
The intraclass correlation estimates match the individual and average AA-ICCs obtained under the
two-way random-effects model in example 2; but in a mixed-effects model, they do not represent
correlations. We demonstrate the use of an individual AA-ICC in a mixed-effects setting in the next
example.
The AA-ICCs under a mixed-effects model are not considered by Shrout and Fleiss (1979). They
are denoted as ICC(A,1) and ICC(A,k) in McGraw and Wong (1996a).
Adoption study
In this section, we consider the adoption study described in McGraw and Wong (1996a). Adoption
studies commonly include two effects of interest. One is the mean difference between the adopted
child and its biological parents. It is used to determine if characteristics of adopted children differ on
average from those of their biological parents. Another effect of interest is the correlation between
genetically paired individuals and genetically unrelated individuals who live together. This effect is
used to evaluate the impact of genetic differences on individual differences.
As discussed in McGraw and Wong (1996a), a consistent finding from adoption research using IQ
as a trait characteristic is that while adopted children typically have higher IQs than their biological
parents, their IQs correlate better with those of their biological parents than with those of their adoptive
parents. Both effects are important, and there is additional need to reconcile the two findings. McGraw
and Wong (1996a) propose to use the individual AA-ICC for this purpose.
Example 4: Absolute-agreement ICC in a mixed-effects model
The adoption.dta dataset contains the data from table 6 of McGraw and Wong (1996a) on IQ
scores:
. use http://www.stata-press.com/data/r13/adoption
(Biological mother and adopted child IQ scores)
. describe
Contains data from http://www.stata-press.com/data/r13/adoption.dta
obs: 20 Biological mother and adopted
child IQ scores
vars: 5 15 May 2013 13:50
size: 160 (_dta has notes)
storage display value
variable name type format label variable label
family byte %9.0g Adoptive family ID
mc byte %9.0g mcvalues 1=Mother, 2=Child
iq3 int %9.0g IQ scores, mother-child
difference of 3 pts
iq9 int %9.0g IQ scores, mother-child
difference of 9 pts
iq15 int %9.0g IQ scores, mother-child
difference of 15 pts
Sorted by:
The family variable contains adoptive family identifiers, the mc variable records a mother or a child,
and the iq3,iq9, and iq15 variables record IQ scores with differences between mother and child
mean IQ scores of 3, 9, and 15 points, respectively.
. by mc, sort: summarize iq*
-> mc = Mother
Variable Obs Mean Std. Dev. Min Max
iq3 10 97 15.0037 62 116
iq9 10 91 15.0037 56 110
iq15 10 85 15.0037 50 104
-> mc = Child
Variable Obs Mean Std. Dev. Min Max
iq3 10 100 15.0037 65 119
iq9 10 100 15.0037 65 119
iq15 10 100 15.0037 65 119
The variances of the mother and child IQ scores are the same.
Children are fixed effects, so the mixed-effects model is appropriate for these data. We want to
compare individual CA-ICC with individual AA-ICC for each of the three IQ variables. We could issue a
separate icc command for each of the three IQ variables to obtain the intraclass correlations. Instead,
we use reshape to convert our data to long form with one iq variable and the new diff variable
recording mean differences:
. reshape long iq, i(family mc) j(diff)
(note: j = 3 9 15)
Data wide -> long
Number of obs. 20 -> 60
Number of variables 5 -> 4
j variable (3 values) -> diff
xij variables:
iq3 iq9 iq15 -> iq
We can now use the by prefix with icc to estimate intraclass correlations for the three groups of
interest:
. by diff, sort: icc iq family mc, mixed
-> diff = 3
Intraclass correlations
Two-way mixed-effects model
Consistency of agreement
Random effects: family Number of targets = 10
Fixed effects: mc Number of raters = 2
iq ICC [95% Conf. Interval]
Individual .7142152 .1967504 .920474
Average .8332853 .3288078 .9585904
F test that
ICC=0.00: F(9.0, 9.0) = 6.00 Prob > F = 0.007
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
-> diff = 9
Intraclass correlations
Two-way mixed-effects model
Consistency of agreement
Random effects: family Number of targets = 10
Fixed effects: mc Number of raters = 2
iq ICC [95% Conf. Interval]
Individual .7142152 .1967504 .920474
Average .8332853 .3288078 .9585904
F test that
ICC=0.00: F(9.0, 9.0) = 6.00 Prob > F = 0.007
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
-> diff = 15
(output omitted )
The estimated CA-ICCs are the same in all three groups and are equal to the corresponding estimates
of Pearson's correlation coefficients because mothers' and children's IQ scores have the same
variability. The scores differ only in means, and mean differences are irrelevant when measuring the
consistency of agreement.
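This equivalence is easy to verify by correlating the mother and child scores directly. A minimal sketch, reloading the original dataset and reshaping on mc (the variable names iq31 and iq32 are the ones created by reshape wide from the iq3 stub):

. use http://www.stata-press.com/data/r13/adoption, clear
. reshape wide iq3 iq9 iq15, i(family) j(mc)
. correlate iq31 iq32     // Pearson correlation between mother and child scores, about 0.71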
The AA-ICCs, however, differ across the three groups:
. by diff, sort: icc iq family mc, mixed absolute
-> diff = 3
Intraclass correlations
Two-way mixed-effects model
Absolute agreement
Random effects: family Number of targets = 10
Fixed effects: mc Number of raters = 2
iq ICC [95% Conf. Interval]
Individual .7204023 .2275148 .9217029
Average .8374812 .3706917 .9592564
F test that
ICC=0.00: F(9.0, 9.0) = 6.00 Prob > F = 0.007
-> diff = 9
Intraclass correlations
Two-way mixed-effects model
Absolute agreement
Random effects: family Number of targets = 10
Fixed effects: mc Number of raters = 2
iq ICC [95% Conf. Interval]
Individual .6203378 .0293932 .8905025
Average .7656895 .0571077 .9420802
F test that
ICC=0.00: F(9.0, 9.0) = 6.00 Prob > F = 0.007
-> diff = 15
Intraclass correlations
Two-way mixed-effects model
Absolute agreement
Random effects: family Number of targets = 10
Fixed effects: mc Number of raters = 2
iq ICC [95% Conf. Interval]
Individual .4854727 -.1194157 .8466905
Average .6536272 -.2712191 .9169815
F test that
ICC=0.00: F(9.0, 9.0) = 6.00 Prob > F = 0.007
As the mean differences increase, the AA-ICCs decrease. Their attenuation reflects the difference in
means between biological mother and child IQs while still measuring their agreement. Notice that for
small mean differences, the estimates of AA-ICCs and CA-ICCs are very similar.
Note that our estimates match those given in McGraw and Wong (1996b), who correct the original
table 6 of McGraw and Wong (1996a).
Relationship between ICCs
In examples 2 and 3, we saw that the estimates of AA-ICCs and CA-ICCs are the same for two-
way random-effects and two-way mixed-effects models. In this section, we consider the relationship
between various forms of ICCs in more detail; also see Methods and formulas.
There are 10 different versions of ICCs, but only six different estimators are needed to compute
them. These estimators include the two estimators for the individual and average AA-ICCs in a one-way
model, the two estimators for the individual and average AA-ICCs in two-way models, and the two
estimators for the individual and average CA-ICCs in two-way models.
Only individual and average AA-ICCs are defined for the one-way model. The estimates of AA-ICCs
based on the one-way model will typically be smaller than individual and average estimates of AA-ICCs
and CA-ICCs based on two-way models. The estimates of individual and average CA-ICCs will typically
be larger than the estimates of individual and average AA-ICCs.
Although AA-ICCs and CA-ICCs have the same respective estimators in two-way random-effects
and mixed-effects models, their definitions and interpretations are different. The AA-ICCs based on
a random-effects model contain the between-rater variance component in the denominator of the
variance ratio. The AA-ICCs based on a mixed-effects model contain the variance of the fixed-factor
rater instead of the random between-rater variability. The AA-ICCs in a random-effects model represent
correlations between any two measurements made on a target. The AA-ICCs in a mixed-effects model
measure absolute agreement of measurements treating raters as fixed. The CA-ICCs for random-effects
and mixed-effects models have the same definition but different interpretations. The CA-ICCs represent
correlations between any two measurements made on a target in a mixed-effects model but estimate the
degree of consistency among measurements treating raters as random in a random-effects model. The
difference in the definitions of AA-ICCs and CA-ICCs is that CA-ICCs do not contain the between-rater
variance in the denominator of the variance ratio.
For two-way models, the definitions and interpretations (but not the estimators) of ICCs also
depend on whether the model contains an interaction between target and rater. For two-way models
with interaction, ICCs include an additional variance component for the target-rater interaction in the
denominator of the variance ratio. This component cannot be separated from random error because
there is only one observation per target and rater.
Also, under a two-way mixed-effects model including interaction, the interaction components are
not mutually independent, as they are in a two-way random-effects model. The considered version
of the mixed-effects model places a constraint on the interaction effects: the sum of the interaction
effects over levels of the fixed factor is zero; see, for example, chapter 7 in Kuehl (2000) for an
introductory discussion of mixed models. In this version of the model, there is a correlation between
the interaction effects. Specifically, the two interaction effects for the same target and two different
raters are negatively correlated. As a result, the estimated intraclass correlation can be negative under
a two-way mixed-effects model with interaction. Also, average AA-ICC and average CA-ICC cannot
be estimated in a two-way mixed-effects model including interaction; see Methods and formulas and
McGraw and Wong (1996a) for details.
Tests against nonzero values
It may be of interest to test whether the intraclass correlation is equal to a value other than zero.
icc supports testing against positive values through the use of the testvalue() option. Specifying
testvalue(#) provides a one-sided hypothesis test of H0: ρ = # versus Ha: ρ > #. The test is
provided separately for both individual and average ICCs.
Example 5: Testing ICC against a nonzero value
We return to the two-way random-effects model for the judge and target data from Shrout and
Fleiss (1979). Suppose we want to test whether the individual and average AA-ICCs are each equal
to 0.2. We specify the testvalue(0.2) option with icc:
. use http://www.stata-press.com/data/r13/judges, clear
(Ratings of targets by judges)
. icc rating target judge, testvalue(0.2)
Intraclass correlations
Two-way random-effects model
Absolute agreement
Random effects: target Number of targets = 6
Random effects: judge Number of raters = 4
rating ICC [95% Conf. Interval]
Individual .2897638 .0187865 .7610844
Average .6200505 .0711368 .927232
F test that
ICC(1)=0.20: F(5.0, 5.3) = 1.54 Prob > F = 0.317
ICC(k)=0.20: F(5.0, 9.4) = 4.35 Prob > F = 0.026
Note: ICCs estimate correlations between individual measurements
and between average measurements made on the same target.
We reject the null hypothesis that the average AA-ICC, labeled as ICC(k) in the output, is equal to
0.2, but we do not have statistical evidence to reject the null hypothesis that the individual AA-ICC,
labeled as ICC(1), is equal to 0.2.
Stored results
icc stores the following in r():
Scalars
  r(N_target)       number of targets
  r(N_rater)        number of raters
  r(icc_i)          intraclass correlation for individual measurements
  r(icc_i_F)        F test statistic for individual ICC
  r(icc_i_df1)      numerator degrees of freedom for r(icc_i_F)
  r(icc_i_df2)      denominator degrees of freedom for r(icc_i_F)
  r(icc_i_p)        p-value for F test of individual ICC
  r(icc_i_lb)       lower endpoint for confidence intervals of individual ICC
  r(icc_i_ub)       upper endpoint for confidence intervals of individual ICC
  r(icc_avg)        intraclass correlation for average measurements
  r(icc_avg_F)      F test statistic for average ICC
  r(icc_avg_df1)    numerator degrees of freedom for r(icc_avg_F)
  r(icc_avg_df2)    denominator degrees of freedom for r(icc_avg_F)
  r(icc_avg_p)      p-value for F test of average ICC
  r(icc_avg_lb)     lower endpoint for confidence intervals of average ICC
  r(icc_avg_ub)     upper endpoint for confidence intervals of average ICC
  r(testvalue)      null hypothesis value
  r(level)          confidence level
Macros
  r(model)          analysis-of-variance model
  r(depvar)         name of dependent variable
  r(target)         target variable
  r(rater)          rater variable
  r(type)           type of ICC estimated (absolute or consistency)
Methods and formulas
We observe y_ij, where i = 1, ..., n and j = 1, ..., k; y_ij is the jth rating on the ith target. Let
α = 1 − l/100, where l is the confidence level specified by the user.
Methods and formulas are presented under the following headings:
Mean squares
One-way random effects
Two-way random effects
Two-way mixed effects
Mean squares
The mean squares within targets are

WMS = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{i.})^2}{n(k-1)}

where \bar{y}_{i.} = \sum_j y_{ij}/k.

The mean squares between targets are

BMS = \frac{\sum_i k(\bar{y}_{i.} - \bar{y}_{..})^2}{n-1}

where \bar{y}_{..} = \sum_i \bar{y}_{i.}/n.

These are the only mean squares needed to estimate ICC in the one-way random-effects model.
For the two-way models, we need two additional mean squares.

The mean squares between raters are

JMS = \frac{\sum_j n(\bar{y}_{.j} - \bar{y}_{..})^2}{k-1}

where \bar{y}_{.j} = \sum_i y_{ij}/n and \bar{y}_{..} = \sum_j \bar{y}_{.j}/k.

The residual or error mean square is

EMS = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{..})^2 - (k-1)JMS - (n-1)BMS}{(n-1)(k-1)}
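If you want to see these quantities for a concrete dataset, they correspond to the mean squares reported by a standard analysis of variance; a minimal sketch using the judges data from the earlier examples (anova is documented in [R] anova):

. use http://www.stata-press.com/data/r13/judges, clear
. anova rating target judge    // two-way ANOVA: target MS = BMS, judge MS = JMS, residual MS = EMS
. anova rating target          // one-way ANOVA: residual MS = WMS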
One-way random effects
Under the one-way random-effects model, we observe

y_{ij} = \mu + r_i + \epsilon_{ij}    (M1)

where μ is the mean rating, r_i is the target random effect, and ε_ij is random error. The r_i's are
i.i.d. N(0, σ_r^2); the ε_ij's are i.i.d. N(0, σ_ε^2) and are independent of the r_i's. There is no rater
effect separate from the residual error because each target is evaluated by a different set of raters.

The individual AA-ICC is the correlation between individual measurements on the same target:

\rho_1 = ICC(1) = Corr(y_{ij}, y_{ij'}) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_\epsilon^2}

The average AA-ICC is the correlation between average measurements of size k made on the same
target:

\rho_k = ICC(k) = Corr(\bar{y}_{i.}, \bar{y}'_{i.}) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_\epsilon^2/k}

They are estimated by

\hat{\rho}_1 = \widehat{ICC}(1) = \frac{BMS - WMS}{BMS + (k-1)WMS}

\hat{\rho}_k = \widehat{ICC}(k) = \frac{BMS - WMS}{BMS}

Confidence intervals. Let F_obs = BMS/WMS, let F_l be the (1 − α/2) × 100th percentile of the
F_{n-1, n(k-1)} distribution, and let F_u be the (1 − α/2) × 100th percentile of the F_{n(k-1), n-1}
distribution. Let F_L = F_obs/F_l and F_U = F_obs × F_u.

A (1 − α) × 100% confidence interval for ρ_1 is

\left( \frac{F_L - 1}{F_L + k - 1}, \; \frac{F_U - 1}{F_U + k - 1} \right)    (1)

A (1 − α) × 100% confidence interval for ρ_k is

\left( 1 - \frac{1}{F_L}, \; 1 - \frac{1}{F_U} \right)    (2)

Hypothesis tests. Consider a one-sided hypothesis test of H0: ICC = ρ_0 versus Ha: ICC > ρ_0.

The test statistic for ρ_1 is

F_{\rho_1} = \frac{BMS}{WMS} \, \frac{1 - \rho_0}{1 + (k-1)\rho_0}    (3)

The test statistic for ρ_k is

F_{\rho_k} = \frac{BMS}{WMS} (1 - \rho_0)    (4)

Under the null hypothesis, both F_{\rho_1} and F_{\rho_k} have the F_{n-1, n(k-1)} distribution. When
ρ_0 = 0, the two test statistics coincide.
Two-way random effects
In this setting, the target is evaluated by the same set of raters, who are randomly drawn from the
population of raters. The underlying models with and without interaction are
y_{ij} = \mu + r_i + c_j + (rc)_{ij} + \epsilon_{ij}    (M2)

y_{ij} = \mu + r_i + c_j + \epsilon_{ij}    (M2A)

where y_ij is the rating of the ith target by the jth rater, μ is the mean rating, r_i is the target
random effect, c_j is the rater random effect, (rc)_ij is the target-rater random effect, and ε_ij is
random error. The r_i's are i.i.d. N(0, σ_r^2), the c_j's are i.i.d. N(0, σ_c^2), the (rc)_ij's are
i.i.d. N(0, σ_rc^2), and the ε_ij's are i.i.d. N(0, σ_ε^2). Each effect is mutually independent of the others.

Below we provide formulas for ICCs for model (M2). The corresponding ICCs for model (M2A)
can be obtained by setting σ_rc^2 = 0.

The individual AA-ICC is the correlation between individual measurements on the same target:

\rho_{A,1} = ICC(A,1) = Corr(y_{ij}, y_{ij'}) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_c^2 + (\sigma_{rc}^2 + \sigma_\epsilon^2)}

The average AA-ICC is the correlation between average measurements of size k made on the same
target:

\rho_{A,k} = ICC(A,k) = Corr(\bar{y}_{i.}, \bar{y}'_{i.}) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_c^2 + \sigma_{rc}^2 + \sigma_\epsilon^2)/k}

The consistency-of-agreement intraclass correlation for individual measurements, individual CA-ICC,
is

\rho_{C,1} = ICC(C,1) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_\epsilon^2)}

The consistency-of-agreement intraclass correlation for average measurements of size k, average
CA-ICC, is

\rho_{C,k} = ICC(C,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_\epsilon^2)/k}

With one observation per target and rater, σ_rc^2 and σ_ε^2 cannot be estimated separately.
The estimators of intraclass correlations, confidence intervals, and test statistics are the same for
models (M2) and (M2A). The estimators of ICCs are
\hat{\rho}_{A,1} = \widehat{ICC}(A,1) = \frac{BMS - EMS}{BMS + (k-1)EMS + \frac{k}{n}(JMS - EMS)}

\hat{\rho}_{A,k} = \widehat{ICC}(A,k) = \frac{BMS - EMS}{BMS + \frac{1}{n}(JMS - EMS)}

\hat{\rho}_{C,1} = \widehat{ICC}(C,1) = \frac{BMS - EMS}{BMS + (k-1)EMS}

\hat{\rho}_{C,k} = \widehat{ICC}(C,k) = \frac{BMS - EMS}{BMS}

Confidence intervals. Let a = k\hat{\rho}_{A,1}/\{n(1 - \hat{\rho}_{A,1})\}, b = 1 + k\hat{\rho}_{A,1}(n-1)/\{n(1 - \hat{\rho}_{A,1})\}, and

v = \frac{(a\,JMS + b\,EMS)^2}{\dfrac{a^2\,JMS^2}{k-1} + \dfrac{b^2\,EMS^2}{(n-1)(k-1)}}    (5)

Let F_l be the (1 − α/2) × 100th percentile of the F_{n-1, v} distribution and F_u be the (1 − α/2) × 100th
percentile of the F_{v, n-1} distribution.

A (1 − α) × 100% confidence interval for ρ_{A,1} is given by (L, U), where

L = \frac{n(BMS - F_l\,EMS)}{F_l\{k\,JMS + (kn - k - n)EMS\} + n\,BMS}

U = \frac{n(F_u\,BMS - EMS)}{k\,JMS + (kn - k - n)EMS + n\,F_u\,BMS}    (6)

A (1 − α) × 100% confidence interval for ρ_{A,k} is a special case of (6) with k = 1, where
a = \hat{\rho}_{A,k}/\{n(1 - \hat{\rho}_{A,k})\}, b = 1 + \hat{\rho}_{A,k}(n-1)/\{n(1 - \hat{\rho}_{A,k})\}, and v is defined in (5).

To define confidence intervals for ρ_{C,1} and ρ_{C,k}, let F_obs = BMS/EMS, F_l be the (1 − α/2) × 100th
percentile of the F_{n-1, (n-1)(k-1)} distribution, and F_u be the (1 − α/2) × 100th percentile of the
F_{(n-1)(k-1), n-1} distribution. Let F_L = F_obs/F_l and F_U = F_obs × F_u.

(1 − α) × 100% confidence intervals for ρ_{C,1} and ρ_{C,k} are then as given by (1) and (2) for
model (M1).

Hypothesis tests. Consider a one-sided hypothesis test of H0: ICC = ρ_0 versus Ha: ICC > ρ_0. Let
a = kρ_0/\{n(1 - ρ_0)\} and b = 1 + kρ_0(n-1)/\{n(1 - ρ_0)\}.

The test statistic for ρ_{A,1} is

F_{\rho_{A,1}} = \frac{BMS}{a\,JMS + b\,EMS}

Under the null hypothesis, F_{\rho_{A,1}} has the F_{n-1, v} distribution, where v is defined in (5).

The test statistic for ρ_{A,k} is defined similarly, except a = ρ_0/\{n(1 - ρ_0)\} and
b = 1 + ρ_0(n-1)/\{n(1 - ρ_0)\}. Under the null hypothesis, F_{\rho_{A,k}} has the F_{n-1, v} distribution,
where v is defined in (5). When ρ_0 = 0, then a = 0, b = 1, and the two test statistics coincide.

The test statistics for ρ_{C,1} and ρ_{C,k} are defined by (3) and (4), respectively, with WMS replaced by
EMS. Under the null hypothesis, both F_{\rho_{C,1}} and F_{\rho_{C,k}} have the F_{n-1, (n-1)(k-1)} distribution. They
also both have the same value when ρ_0 = 0.
Two-way mixed effects
In this setting, every target is evaluated by the same set of judges, who are the only judges of
interest. The underlying models with and without interaction are
y_{ij} = \mu + r_i + c_j + (rc)_{ij} + \epsilon_{ij}    (M3)

y_{ij} = \mu + r_i + c_j + \epsilon_{ij}    (M3A)

where y_ij is the rating of the ith target by the jth rater, μ is the mean rating, r_i is the target random
effect, c_j is the fixed rater effect, (rc)_ij is an interaction effect between target and rater, and ε_ij is
random error. The r_i's are i.i.d. N(0, σ_r^2), the (rc)_ij's are N(0, σ_rc^2), and the ε_ij's are i.i.d. N(0, σ_ε^2).
Each random effect is mutually independent of the others. The c_j's are fixed such that \sum_j c_j = 0. The
variance of the c_j's is θ_c^2 = \sum_j c_j^2/(k-1).

In the presence of an interaction, two versions of a mixed-effects model may be considered.
One assumes that the (rc)_ij's are i.i.d. N(0, σ_rc^2). Another assumes that the (rc)_ij's are N(0, σ_rc^2) with
an additional constraint that \sum_j (rc)_{ij} = 0 (for example, Kuehl [2000]), so only interaction terms
involving different targets are independent. The latter model is considered here.

We now define the intraclass correlations for individual measurements for model (M3).

The individual CA-ICC, the correlation between individual measurements on the same target, is

\rho_{C,1} = ICC(C,1) = Corr(y_{ij}, y_{ij'}) = \frac{\sigma_r^2 - \sigma_{rc}^2/(k-1)}{\sigma_r^2 + (\sigma_{rc}^2 + \sigma_\epsilon^2)}

The absolute-agreement intraclass correlation for individual measurements, individual AA-ICC, is

\rho_{A,1} = ICC(A,1) = \frac{\sigma_r^2 - \sigma_{rc}^2/(k-1)}{\sigma_r^2 + \theta_c^2 + (\sigma_{rc}^2 + \sigma_\epsilon^2)}

Shrout and Fleiss (1979) show that the individual ICC could be negative in this case, a phenomenon
first pointed out by Sitgreaves (1960). This can happen when the interaction term has a high variance
relative to the targets and there are not many raters.

The individual intraclass correlations for model (M3A) have similar definitions with σ_rc^2 = 0. The
individual CA-ICC is the correlation between individual measurements on the same target, Corr(y_ij, y_ij').

We now discuss the intraclass correlations that correspond to average measurements. Neither the
average AA-ICC, ρ_{A,k}, nor the average CA-ICC, ρ_{C,k}, can be estimated under model (M3) (Shrout and
Fleiss 1979; McGraw and Wong 1996a). The problem is that in this model, σ_r^2, which is the covariance
between two means based on k raters, cannot be estimated.

Specifically, the parameter σ_r^2 appears only in the expectation of the between-target mean squares
BMS. Under the restriction \sum_j (rc)_{ij} = 0,

E(BMS) = k\sigma_r^2 + \sigma_\epsilon^2

Note that σ_rc^2 does not appear in the expectation of the between-target mean squares. With one
observation per target and rater, σ_rc^2 and σ_ε^2 cannot be estimated separately (only their sum
σ_rc^2 + σ_ε^2 can be estimated), so BMS alone cannot be used to estimate σ_r^2.
Under model (M3A), however, there is no interaction (and thus no interaction variance component
σ_rc^2), so ρ_{A,k} or ρ_{C,k} can be estimated.

The average AA-ICC, the absolute-agreement intraclass correlation for average measurements of
size k, is

\rho_{A,k} = ICC(A,k) = \frac{\sigma_r^2}{\sigma_r^2 + (\theta_c^2 + \sigma_\epsilon^2)/k}

The average CA-ICC, the correlation between average measurements of size k made on the same
target, is

\rho_{C,k} = ICC(C,k) = Corr(\bar{y}_{i.}, \bar{y}'_{i.}) = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_\epsilon^2/k}

The estimators of ICCs, their confidence intervals, and hypothesis tests are as described for two-way
random-effects models, except ρ_{A,k} and ρ_{C,k} are not defined under model (M3).
References
Bliese, P. D. 2000. Within-group agreement, non-independence, and reliability: Implications for data aggregation
and analysis. In Multilevel Theory, Research, and Methods in Organizations: Foundations, Extensions, and New
Directions, ed. K. J. Klein and S. W. J. Kozlowski, 349–381. San Francisco: Jossey-Bass.
Brown, W. 1910. Some experimental results in the correlation of mental abilities. British Journal of Psychology 3:
296–322.
Cronbach, L. J. 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16: 297–334.
Hartmann, D. P. 1982. Assessing the dependability of observational data. In Using Observers to Study Behavior,
51–65. San Francisco: Jossey-Bass.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis. 2nd ed. Belmont,
CA: Duxbury.
McGraw, K. O., and S. P. Wong. 1996a. Forming inferences about some intraclass correlation coefficients. Psychological
Methods 1: 30–46.
. 1996b. Forming inferences about some intraclass correlation coefficients: Correction. Psychological Methods 1:
390.
Shrout, P. E., and J. L. Fleiss. 1979. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin
86: 420–428.
Sitgreaves, R. 1960. Book reviews: Intraclass Correlation and the Analysis of Variance, Ernest A. Haggard. Journal
of the American Statistical Association 55: 384–385.
Spearman, C. E. 1910. Correlation calculated from faulty data. British Journal of Psychology 3: 271–295.
Suen, H. K. 1988. Agreement, reliability, accuracy, and validity: Toward a clarification. Behavioral Assessment 10:
343–366.
Also see
[R] anova    Analysis of variance and covariance
[R] correlate    Correlations (covariances) of variables or coefficients
[R] loneway    Large one-way ANOVA, random effects, and reliability
[MV] alpha    Compute interitem correlations (covariances) and Cronbach's alpha
Title
inequality — Inequality measures
Remarks and examples References
Remarks and examples
Stata does not have commands for inequality measures, except roctab has an option to report
Gini and Pietra indices; see [R]roctab. Stata users, however, have developed an excellent suite of
commands, many of which have been published in the Stata Journal (SJ) and in the Stata Technical
Bulletin (STB).
Issue    Insert     Author(s)                          Command(s)                                              Description
SJ-12-3  st0266     I. Almås, T. Havnes, M. Mogstad    adgini                                                  Adjusting for age effects in cross-sectional distributions
STB-48   gr35       N. J. Cox                          psm, qsm, pdagum, qdagum                                Diagnostic plots for assessing Singh-Maddala and Dagum distributions fit by MLE
SJ-11-3  st0237     A. Doris, D. O'Neill, O. Sweetman  gmmcovearn                                              GMM estimation of the covariance structure of longitudinal data
STB-23   sg31       R. Goldstein                       rspread                                                 Measures of diversity: Absolute and relative
STB-48   sg104      S. P. Jenkins                      sumdist, xfrac, ineqdeco, geivars, ineqfac, povdeco     Analysis of income distributions
STB-48   sg106      S. P. Jenkins                      smfit, dagumfit                                         Fitting Singh-Maddala and Dagum distributions by maximum likelihood
STB-51   sg115      D. Jolliffe, B. Krushelnytskyy     ineqerr                                                 Bootstrap standard errors for indices of inequality
STB-51   sg117      D. Jolliffe, A. Semykina           sepov                                                   Robust standard errors for the Foster-Greer-Thorbecke class of poverty indices
SJ-8-4   st0100_1   A. López-Feldman                   descogini                                               Decomposing inequality and obtaining marginal effects
SJ-6-4   snp15_7    R. Newson                          somersd                                                 Gini coefficient is a special case of Somers' D
SJ-7-2   gr0001_3   S. P. Jenkins, P. Van Kerm         glcurve                                                 Generalized Lorenz curves and related graphs
STB-48   sg108      P. Van Kerm                        poverty                                                 Computing poverty indices
STB-23   sg30       E. Whitehouse                      lorenz, inequal, atkinson, relsgini                     Measures of inequality in Stata
More commands may be available; enter Stata and type search inequality measure, historical.
 
Max Otto Lorenz (1876–1959) was born in Iowa and studied at the Universities of Iowa and
Wisconsin. He proposed what is now known as the Lorenz curve in 1905. Lorenz worked for
the Interstate Commerce Commission between 1911 and 1944, mainly with transportation data.
His hobbies included calendar reform and Interlingua, a proposed international language.
 
To download and install the Jenkins and Van Kerm glcurve command from the Internet, for
instance, you could
1. Select Help > SJ and User-written Programs.
2. Click on Stata Journal.
3. Click on sj7-2.
4. Click on gr0001_3.
5. Click on click here to install.
or you could instead do the following:
1. Navigate to the appropriate SJ issue:
a. Type net from http://www.stata-journal.com/software
Type net cd sj7-2
or
b. Type net from http://www.stata-journal.com/software/sj7-2
2. Type net describe gr0001_3
3. Type net install gr0001_3
To download and install the Jenkins sumdist command from the Internet, for instance, you could
1. Select Help > SJ and User-written Programs.
2. Click on STB.
3. Click on stb48.
4. Click on sg104.
5. Click on click here to install.
or you could instead do the following:
1. Navigate to the appropriate STB issue:
a. Type net from http://www.stata.com
Type net cd stb
Type net cd stb48
or
b. Type net from http://www.stata.com/stb/stb48
2. Type net describe sg104
3. Type net install sg104
References
Almås, I., T. Havnes, and M. Mogstad. 2012. Adjusting for age effects in cross-sectional distributions. Stata Journal 12: 393–405.
Cox, N. J. 1999. gr35: Diagnostic plots for assessing Singh–Maddala and Dagum distributions fitted by MLE. Stata Technical Bulletin 48: 2–4. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 72–74. College Station, TX: Stata Press.
Doris, A., D. O'Neill, and O. Sweetman. 2011. GMM estimation of the covariance structure of longitudinal data on earnings. Stata Journal 11: 439–459.
Goldstein, R. 1995. sg31: Measures of diversity: Absolute and relative. Stata Technical Bulletin 23: 23–26. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 150–154. College Station, TX: Stata Press.
Haughton, J. H., and S. R. Khandker. 2009. Handbook on Poverty + Inequality. Washington, DC: World Bank.
Jenkins, S. P. 1999a. sg104: Analysis of income distributions. Stata Technical Bulletin 48: 4–18. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 243–260. College Station, TX: Stata Press.
. 1999b. sg106: Fitting Singh–Maddala and Dagum distributions by maximum likelihood. Stata Technical Bulletin 48: 19–25. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 261–268. College Station, TX: Stata Press.
Jenkins, S. P., and P. Van Kerm. 1999a. sg107: Generalized Lorenz curves and related graphs. Stata Technical Bulletin 48: 25–29. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 269–274. College Station, TX: Stata Press.
. 1999b. sg107.1: Generalized Lorenz curves and related graphs. Stata Technical Bulletin 49: 23. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 171. College Station, TX: Stata Press.
. 2001. Generalized Lorenz curves and related graphs: An update for Stata 7. Stata Journal 1: 107–112.
. 2004. gr0001_1: Software Updates: Generalized Lorenz curves and related graphs. Stata Journal 4: 490.
. 2006. gr0001_2: Software Updates: Generalized Lorenz curves and related graphs. Stata Journal 6: 597.
. 2007. gr0001_3: Software Updates: Generalized Lorenz curves and related graphs. Stata Journal 7: 280.
Jolliffe, D., and B. Krushelnytskyy. 1999. sg115: Bootstrap standard errors for indices of inequality. Stata Technical Bulletin 51: 28–32. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 191–196. College Station, TX: Stata Press.
Jolliffe, D., and A. Semykina. 1999. sg117: Robust standard errors for the Foster–Greer–Thorbecke class of poverty indices. Stata Technical Bulletin 51: 34–36. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 200–203. College Station, TX: Stata Press.
Kleiber, C., and S. Kotz. 2003. Statistical Size Distributions in Economics and Actuarial Sciences. Hoboken, NJ: Wiley.
López-Feldman, A. 2006. Decomposing inequality and obtaining marginal effects. Stata Journal 6: 106–111.
. 2008. Software Updates: Decomposing inequality and obtaining marginal effects. Stata Journal 8: 594.
Lorenz, M. O. 1905. Methods of measuring the concentration of wealth. American Statistical Association 9: 209–219.
Newson, R. B. 2006. Confidence intervals for rank statistics: Percentile slopes, differences, and ratios. Stata Journal 6: 497–520.
Van Kerm, P. 1999. sg108: Computing poverty indices. Stata Technical Bulletin 48: 29–33. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 274–278. College Station, TX: Stata Press.
Whitehouse, E. 1995. sg30: Measures of inequality in Stata. Stata Technical Bulletin 23: 20–23. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 146–150. College Station, TX: Stata Press.
Title
intreg — Interval regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
intreg depvar1 depvar2 indepvars [if] [in] [weight] [, options]
options                       Description
Model
  noconstant                  suppress constant term
  het(varlist[, noconstant])  independent variables to model the variance; use noconstant
                                to suppress constant term
  offset(varname)             include varname in model with coefficient constrained to 1
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables
SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg,
                                bootstrap, or jackknife
Reporting
  level(#)                    set confidence level; default is level(95)
  nocnsreport                 do not display constraints
  display_options             control column formats, row spacing, line width, display of omitted
                                variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options            control the maximization process; seldom used
  coeflegend                  display legend instead of statistics
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvar1, depvar2, indepvars, and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, nestreg, rolling, statsby, stepwise, and svy are allowed; see
[U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Censored regression > Interval regression
Description
intreg fits a model of y = [depvar1, depvar2] on indepvars, where y for each observation is
point data, interval data, left-censored data, or right-censored data.
depvar1 and depvar2 should have the following form:

Type of data                        depvar1   depvar2
point data            a = [a, a]       a         a
interval data         [a, b]           a         b
left-censored data    (−∞, b]          .         b
right-censored data   [a, +∞)          a         .
Options
 
Model
noconstant; see [R] estimation options.
het(varlist[, noconstant]) specifies that varlist be included in the specification of the conditional
variance. This varlist enters the variance specification collectively as multiplicative heteroskedasticity.
offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with intreg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
intreg is a generalization of the models fit by tobit. Cameron and Trivedi (2010, 548–550)
discuss the differences among censored, truncated, and interval data. If you know that the value for
the jth individual is somewhere in the interval [y_1j, y_2j], then the likelihood contribution from this
individual is simply Pr(y_1j ≤ Y_j ≤ y_2j). For censored data, their likelihoods contain terms of the
form Pr(Y_j ≤ y_j) for left-censored data and Pr(Y_j ≥ y_j) for right-censored data, where y_j is the
observed censoring value and Y_j denotes the random variable representing the dependent variable in
the model.
Hence, intreg can fit models for data where each observation represents interval data, left-censored
data, right-censored data, or point data. Regardless of the type of observation, the data should be
stored in the dataset as interval data; that is, two dependent variables, depvar1 and depvar2, are used
to hold the endpoints of the interval. If the data are left-censored, the lower endpoint is −∞ and is
represented by a missing value, '.', or an extended missing value, '.a, .b, ..., .z', in depvar1. If
the data are right-censored, the upper endpoint is +∞ and is represented by a missing value, '.' (or
an extended missing value), in depvar2. Point data are represented by the two endpoints being equal.
Type of data                        depvar1   depvar2
point data            a = [a, a]       a         a
interval data         [a, b]           a         b
left-censored data    (−∞, b]          .         b
right-censored data   [a, +∞)          a         .
Truly missing values of the dependent variable must be represented by missing values in both depvar1
and depvar2.
Interval data arise naturally in many contexts, such as wage data. Often you know only that, for
example, a person’s salary is between $30,000 and $40,000. Below we give an example for wage
data and show how to set up depvar1 and depvar2.
Example 1
We have a dataset that contains the yearly wages of working women. Women were asked via a
questionnaire to indicate a category for their yearly income from employment. The categories were
less than 5,000, 5,001–10,000, . . . , 25,001–30,000, 30,001–40,000, 40,001–50,000, and more than
50,000. The wage categories are stored in the wagecat variable.
. use http://www.stata-press.com/data/r13/womenwage
(Wages of women)
. tabulate wagecat
Wage
category
($1000s) Freq. Percent Cum.
5 14 2.87 2.87
10 83 17.01 19.88
15 158 32.38 52.25
20 107 21.93 74.18
25 57 11.68 85.86
30 30 6.15 92.01
40 19 3.89 95.90
50 14 2.87 98.77
51 6 1.23 100.00
Total 488 100.00
A value of 5 for wagecat represents the category less than 5,000, a value of 10 represents 5,001–10,000,
. . . , and a value of 51 represents greater than 50,000.
To use intreg, we must create two variables, wage1 and wage2, containing the lower and upper
endpoints of the wage categories. Here is one way to do it. We first create a dataset containing the
nine wage categories, lag the wage categories into wage1, and match-merge this dataset with nine
observations back into the main one.
. by wagecat: keep if _n==1
(479 observations deleted)
. generate wage1 = wagecat[_n-1]
(1 missing value generated)
. keep wagecat wage1
. save lagwage
file lagwage.dta saved
. use http://www.stata-press.com/data/r13/womenwage
(Wages of women)
. merge m:1 wagecat using lagwage
Result # of obs.
not matched 0
matched 488 (_merge==3)
Now we create the upper endpoint and list the new variables:
. generate wage2 = wagecat
. replace wage2 = . if wagecat == 51
(6 real changes made, 6 to missing)
. sort age, stable
. list wage1 wage2 in 1/10
wage1 wage2
1. . 5
2. 5 10
3. 5 10
4. 10 15
5. . 5
6. . 5
7. . 5
8. 5 10
9. 5 10
10. 5 10
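An equivalent way to build the two endpoint variables, without creating and merging a separate dataset, is to map each category to its lower bound directly. The following is a minimal sketch using recode; it reproduces the wage1 and wage2 variables constructed above but is not the method shown in the example:

. use http://www.stata-press.com/data/r13/womenwage, clear
. recode wagecat (5=.) (10=5) (15=10) (20=15) (25=20) (30=25) (40=30) (50=40) (51=50), generate(wage1)
. generate wage2 = cond(wagecat == 51, ., wagecat)    // upper endpoint; missing for the open top category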
We can now run intreg:
. intreg wage1 wage2 age c.age#c.age nev_mar rural school tenure
Fitting constant-only model:
Iteration 0: log likelihood = -967.24956
Iteration 1: log likelihood = -967.1368
Iteration 2: log likelihood = -967.1368
Fitting full model:
Iteration 0: log likelihood = -856.65324
Iteration 1: log likelihood = -856.33294
Iteration 2: log likelihood = -856.33293
Interval regression Number of obs = 488
LR chi2(6) = 221.61
Log likelihood = -856.33293 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .7914438 .4433604 1.79 0.074 -.0775265 1.660414
c.age#c.age -.0132624 .0073028 -1.82 0.069 -.0275757 .0010509
nev_mar -.2075022 .8119581 -0.26 0.798 -1.798911 1.383906
rural -3.043044 .7757324 -3.92 0.000 -4.563452 -1.522637
school 1.334721 .1357873 9.83 0.000 1.068583 1.600859
tenure .8000664 .1045077 7.66 0.000 .5952351 1.004898
_cons -12.70238 6.367117 -1.99 0.046 -25.1817 -.2230583
/lnsigma 1.987823 .0346543 57.36 0.000 1.919902 2.055744
sigma 7.299626 .2529634 6.82029 7.81265
Observation summary: 14 left-censored observations
0 uncensored observations
6 right-censored observations
468 interval observations
We could also model these data by using an ordered probit model with oprobit (see [R]oprobit):
. oprobit wagecat age c.age#c.age nev_mar rural school tenure
Iteration 0: log likelihood = -881.1491
Iteration 1: log likelihood = -764.31729
Iteration 2: log likelihood = -763.31191
Iteration 3: log likelihood = -763.31049
Iteration 4: log likelihood = -763.31049
Ordered probit regression Number of obs = 488
LR chi2(6) = 235.68
Prob > chi2 = 0.0000
Log likelihood = -763.31049 Pseudo R2 = 0.1337
wagecat Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .1674519 .0620333 2.70 0.007 .0458689 .289035
c.age#c.age -.0027983 .0010214 -2.74 0.006 -.0048001 -.0007964
nev_mar -.0046417 .1126737 -0.04 0.967 -.225478 .2161946
rural -.5270036 .1100449 -4.79 0.000 -.7426875 -.3113196
school .2010587 .0201189 9.99 0.000 .1616263 .2404911
tenure .0989916 .0147887 6.69 0.000 .0700063 .127977
/cut1 2.650637 .8957245 .8950495 4.406225
/cut2 3.941018 .8979167 2.181134 5.700903
/cut3 5.085205 .9056582 3.310148 6.860263
/cut4 5.875534 .9120933 4.087864 7.663204
/cut5 6.468723 .918117 4.669247 8.268199
/cut6 6.922726 .9215455 5.11653 8.728922
/cut7 7.34471 .9237628 5.534168 9.155252
/cut8 7.963441 .9338881 6.133054 9.793828
We can directly compare the log likelihoods for the intreg and oprobit models because both
likelihoods are discrete. If we had point data in our intreg estimation, the likelihood would be a
mixture of discrete and continuous terms, and we could not compare it directly with the oprobit
likelihood.
Here the oprobit log likelihood is significantly larger (that is, less negative), so it fits better than
the intreg model. The intreg model assumes normality, but the distribution of wages is skewed
and definitely nonnormal. Normality is more closely approximated if we model the log of wages.
. generate logwage1 = log(wage1)
(14 missing values generated)
. generate logwage2 = log(wage2)
(6 missing values generated)
. intreg logwage1 logwage2 age c.age#c.age nev_mar rural school tenure
Fitting constant-only model:
Iteration 0: log likelihood = -889.23647
Iteration 1: log likelihood = -889.06346
Iteration 2: log likelihood = -889.06346
Fitting full model:
Iteration 0: log likelihood = -773.81968
Iteration 1: log likelihood = -773.36566
Iteration 2: log likelihood = -773.36563
Interval regression Number of obs = 488
LR chi2(6) = 231.40
Log likelihood = -773.36563 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0645589 .0249954 2.58 0.010 .0155689 .1135489
c.age#c.age -.0010812 .0004115 -2.63 0.009 -.0018878 -.0002746
nev_mar -.0058151 .0454867 -0.13 0.898 -.0949674 .0833371
rural -.2098361 .0439454 -4.77 0.000 -.2959675 -.1237047
school .0804832 .0076783 10.48 0.000 .0654341 .0955323
tenure .0397144 .0058001 6.85 0.000 .0283464 .0510825
_cons .7084023 .3593193 1.97 0.049 .0041495 1.412655
/lnsigma -.906989 .0356265 -25.46 0.000 -.9768157 -.8371623
sigma .4037381 .0143838 .3765081 .4329373
Observation summary: 14 left-censored observations
0 uncensored observations
6 right-censored observations
468 interval observations
The log likelihood of this intreg model is close to the oprobit log likelihood, and the z statistics
for both models are similar.
Technical note
intreg has two parameterizations for the log-likelihood function: the transformed parameterization (β/σ, 1/σ) and the untransformed parameterization (β, ln σ). By default, the log likelihood for intreg is parameterized in the transformed parameter space. This parameterization tends to be more convergent, but it requires that any starting values and constraints have the same parameterization, and it prevents estimation with multiplicative heteroskedasticity. Therefore, when the het() option is specified, intreg switches to the untransformed log likelihood for the fit of the conditional-variance model. Similarly, specifying from() or constraints() causes the optimization to be done in the untransformed parameter space, so that starting values for, and constraints on, the coefficients on the covariates can be given without reference to σ.
The estimation results are all stored in the (β, ln σ) metric.
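For illustration, a minimal sketch of a fit with multiplicative heteroskedasticity on the same wage data might be

. intreg wage1 wage2 age c.age#c.age nev_mar rural school tenure, het(rural)

The choice of rural as the variance variable here is purely illustrative; because het() is specified, the conditional-variance model is fit with the untransformed log likelihood, as described above.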
Stored results
intreg stores the following in e():
Scalars
e(N) number of observations
e(N unc) number of uncensored observations
e(N lc) number of left-censored observations
e(N rc) number of right-censored observations
e(N int) number of interval observations
e(k) number of parameters
e(k aux) number of auxiliary parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(ll) log likelihood
e(ll 0) log likelihood, constant-only model
e(ll c) log likelihood, comparison model
e(N clust) number of clusters
e(chi2) χ2
e(p) p-value for model χ2 test
e(sigma) sigma
e(se sigma) standard error of sigma
e(rank) rank of e(V)
e(rank0) rank of e(V) for constant-only model
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) intreg
e(cmdline) command as typed
e(depvar) names of dependent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(het) heteroskedasticity, if het() specified
e(ml score) program used to implement scores
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(footnote) program and arguments to display footnote
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
See Wooldridge (2013, sec. 17.4) or Davidson and MacKinnon (2004, sec. 11.6) for an introduction
to censored and truncated regression models.
The likelihood for intreg subsumes that of the tobit models.
Let y = Xβ + ε be the model. y represents continuous outcomes, either observed or not observed. Our model assumes ε ∼ N(0, σ²I).
For observations j ∈ C, we observe y_j, that is, point data. Observations j ∈ L are left-censored; we know only that the unobserved y_j is less than or equal to y_Lj, a censoring value that we do know. Similarly, observations j ∈ R are right-censored; we know only that the unobserved y_j is greater than or equal to y_Rj. Observations j ∈ I are intervals; we know only that the unobserved y_j is in the interval [y_1j, y_2j].
The log likelihood is

\ln L = -\frac{1}{2}\sum_{j\in C} w_j\left\{\left(\frac{y_j - \mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right)^2 + \log 2\pi\sigma^2\right\}
      + \sum_{j\in L} w_j \log\Phi\left(\frac{y_{Lj} - \mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right)
      + \sum_{j\in R} w_j \log\left\{1 - \Phi\left(\frac{y_{Rj} - \mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right)\right\}
      + \sum_{j\in I} w_j \log\left\{\Phi\left(\frac{y_{2j} - \mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right) - \Phi\left(\frac{y_{1j} - \mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right)\right\}

where Φ() is the standard cumulative normal and w_j is the weight for the jth observation. If no weights are specified, w_j = 1. If aweights are specified, w_j = 1, and σ is replaced by σ/a_j in the above, where a_j are the aweights normalized to sum to N.
Maximization is as described in [R] maximize; the estimate reported as sigma is σ̂.
See Amemiya (1973) for a generalization of the tobit model to variable, but known, cutoffs.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.
intreg also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
References
Amemiya, T. 1973. Regression analysis when the dependent variable is truncated normal. Econometrica 41: 997–1016.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Conroy, R. M. 2005. Stings in the tails: Detecting and dealing with censored data. Stata Journal 5: 395–404.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Goldberger, A. S. 1983. Abnormal selection bias. In Studies in Econometrics, Time Series, and Multivariate Statistics,
ed. S. Karlin, T. Amemiya, and L. A. Goodman, 67–84. New York: Academic Press.
Hurd, M. 1979. Estimation in truncated samples when there is heteroscedasticity. Journal of Econometrics 11: 247–258.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Stewart, M. B. 1983. On least squares estimation when the dependent variable is grouped. Review of Economic
Studies 50: 737–753.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Also see
[R] intreg postestimation    Postestimation tools for intreg
[R] regress    Linear regression
[R] tobit    Tobit regression
[SVY] svy estimation    Estimation commands for survey data
[XT] xtintreg    Random-effects interval-data regression models
[XT] xttobit    Random-effects tobit models
[U] 20 Estimation and postestimation commands
Title
intreg postestimation — Postestimation tools for intreg
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after intreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest (1)    likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvarreg newvarlnsigma} [if] [in], scores
statistic Description
Main
xb linear prediction; the default
stdp standard error of the prediction
stdf standard error of the forecast
pr(a,b)       Pr(a < yj < b)
e(a,b)        E(yj | a < yj < b)
ystar(a,b)    E(yj*), yj* = max{a, min(yj, b)}
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
stdf is not allowed with svy postestimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .) means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.
pr(a,b) calculates Pr(a < xjb + uj < b), the probability that yj|xj would be observed in the interval (a, b).
a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < xjb + uj < 30);
pr(lb,ub) calculates Pr(lb < xjb + uj < ub); and
pr(20,ub) calculates Pr(20 < xjb + uj < ub).
a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < xjb + uj < 30);
pr(lb,30) calculates Pr(−∞ < xjb + uj < 30) in observations for which lb ≥ . and calculates Pr(lb < xjb + uj < 30) elsewhere.
b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > xjb + uj > 20);
pr(20,ub) calculates Pr(+∞ > xjb + uj > 20) in observations for which ub ≥ . and calculates Pr(20 < xjb + uj < ub) elsewhere.
e(a,b) calculates E(xjb + uj | a < xjb + uj < b), the expected value of yj|xj conditional on yj|xj being in the interval (a, b), meaning that yj|xj is truncated. a and b are specified as they are for pr().
ystar(a,b) calculates E(yj*), where yj* = a if xjb + uj ≤ a, yj* = b if xjb + uj ≥ b, and yj* = xjb + uj otherwise, meaning that yj* is censored. a and b are specified as they are for pr().
nooffset is relevant only if you specified offset(varname). It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as xjb rather than as xjb + offsetj.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(xjβ).
The second new variable will contain ∂lnL/∂ lnσ.
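As a brief sketch (the stub sc is an arbitrary name), both score variables can be created in one call:

. predict double sc*, scores

This creates sc1, holding ∂lnL/∂(xjβ), and sc2, holding ∂lnL/∂ lnσ.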
Remarks and examples
Example 1
We continue with example 1 of [R] intreg.
. use http://www.stata-press.com/data/r13/intregxmpl
. intreg wage1 wage2 age c.age#c.age nev_mar rural school tenure
(output omitted )
By default, the predict command produces the linear prediction, which in this case is the expected
wage for each individual.
. predict w1
(option xb assumed; fitted values)
We can use the e(a,b) option to compute the expected wage, conditional on it being larger than
$5,000:
. predict w2, e(5,.)
The probability of earning more than $5,000 might vary with age. We can use margins to compute
the marginal means for those probabilities for different ages.
. margins, predict(pr(5,.)) at(age=(20(5)50))
Predictive margins Number of obs = 488
Model VCE : OIM
Expression : Pr(y>5), predict(pr(5,.))
1._at : age = 20
2._at : age = 25
3._at : age = 30
4._at : age = 35
5._at : age = 40
6._at : age = 45
7._at : age = 50
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
_at
1 .8912598 .0151773 58.72 0.000 .8615127 .9210068
2 .9104568 .0103467 87.99 0.000 .8901775 .930736
3 .9160005 .0120025 76.32 0.000 .892476 .9395251
4 .9096667 .0136693 66.55 0.000 .8828753 .9364581
5 .8894289 .0206992 42.97 0.000 .8488593 .9299985
6 .8491103 .0447429 18.98 0.000 .7614159 .9368048
7 .7781644 .0970557 8.02 0.000 .5879387 .9683902
We can visualize these results by using marginsplot:
. qui margins, predict(pr(5,.)) at(age=(20(5)50))
. marginsplot
Variables that uniquely identify margins: age
[Graph omitted: Predictive Margins with 95% CIs, plotting Pr(y>5) against age in current year]
The probability increases until age 30, and it decreases after that age.
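One way to quantify that pattern, sketched here as an optional follow-up rather than as part of the original example, is to ask margins for the marginal effect of age on the same probability:

. margins, dydx(age) at(age=(20(5)50)) predict(pr(5,.))

Because age also enters the model through c.age#c.age, margins accounts for both terms when computing the derivative.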
Also see
[R] intreg    Interval regression
[U] 20 Estimation and postestimation commands
Title
ivpoisson — Poisson regression with endogenous regressors
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Generalized method of moments estimator
ivpoisson gmm depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight] [, reg_err_opt options]
Control-function estimator
ivpoisson cfunction depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight] [, options]
reg_err_opt    Description
Model
additive add regression errors to the conditional mean term; the default
multiplicative multiply regression errors by the conditional mean term
options    Description
Model
noconstant    suppress constant term
exposure(varname_e)    include ln(varname_e) in model with coefficient constrained to 1
offset(varname_o)    include varname_o in model with coefficient constrained to 1
twostep    use two-step GMM estimator; the default for ivpoisson gmm
onestep    use one-step GMM estimator; the default for ivpoisson cfunction
igmm    use iterative GMM estimator
Weight matrix
wmatrix(wmtype)    specify weight matrix; wmtype may be robust, cluster clustvar, or unadjusted
center    center moments in weight-matrix computation
winitial(iwtype [, independent])    specify initial weight matrix; iwtype may be unadjusted, identity, or the name of a Stata matrix (independent may not be specified with ivpoisson gmm)
SE/Robust
vce(vcetype)    vcetype may be robust, cluster clustvar, bootstrap, jackknife, or unadjusted
Reporting
level(#)    set confidence level; default is level(95)
irr    report incidence-rate ratios
display_options    control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Optimization
from(initial_values)    specify initial values for parameters
igmmiterate(#)    specify maximum number of iterations for iterated GMM estimator
igmmeps(#)    specify # for iterated GMM parameter convergence criterion; default is igmmeps(1e-6)
igmmweps(#)    specify # for iterated GMM weight-matrix convergence criterion; default is igmmweps(1e-6)
optimization_options    control the optimization process; seldom used
You can specify at most one of twostep, onestep, and igmm.
igmmiterate(#), igmmeps(#), and igmmweps(#) may be specified only when igmm is specified.
varlist1 and varlistiv may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, varlist1, varlist2, and varlistiv may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Endogenous covariates >Poisson regression with endogenous regressors
Description
ivpoisson estimates the parameters of a Poisson regression model in which some of the regressors
are endogenous. The model is also known as an exponential conditional mean model in which some of
the regressors are endogenous. The model may be specified using either additive or multiplicative error
terms. The model is frequently used to model count outcomes and is also used to model nonnegative
outcome variables.
Options
 
Model
noconstant, exposure(varname_e), offset(varname_o); see [R] estimation options.
additive, the default, specifies that the regression errors be added to the conditional mean term and
have mean 0.
multiplicative specifies that the regression errors be multiplied by the conditional mean term and
have mean 1.
twostep,onestep, and igmm specify which estimator is to be used.
twostep requests the two-step GMM estimator. gmm obtains parameter estimates based on the initial
weight matrix, computes a new weight matrix based on those estimates, and then reestimates the
parameters based on that weight matrix. twostep is the default for ivpoisson gmm.
onestep requests the one-step GMM estimator. The parameters are estimated based on an initial
weight matrix, and no updating of the weight matrix is performed except when calculating the
appropriate variance–covariance (VCE) matrix. onestep is the default for ivpoisson cfunction.
igmm requests the iterative GMM estimator. gmm obtains parameter estimates based on the initial
weight matrix, computes a new weight matrix based on those estimates, reestimates the parameters
based on that weight matrix, computes a new weight matrix, and so on, to convergence. Convergence
is declared when the relative change in the parameter vector is less than igmmeps(), the relative
change in the weight matrix is less than igmmweps(), or igmmiterate() iterations have been
completed. Hall (2005, sec. 2.4 and 3.6) mentions that there may be gains to finite-sample efficiency
from using the iterative estimator.
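For illustration, a minimal sketch of requesting the iterative estimator, with a tighter parameter-convergence criterion, on the website data used in example 1 below would be

. ivpoisson gmm visits ad female (time = phone frfam), igmm igmmeps(1e-8)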
 
Weight matrix
wmatrix(wmtype)specifies the type of weight matrix to be used in conjunction with the two-step
and iterated GMM estimators.
Specifying wmatrix(robust) requests a weight matrix that is appropriate when the errors are
independent but not necessarily identically distributed. wmatrix(robust) is the default.
Specifying wmatrix(cluster clustvar) requests a weight matrix that accounts for arbitrary
correlation among observations within clusters identified by clustvar.
Specifying wmatrix(unadjusted) requests a weight matrix that is suitable when the errors are
homoskedastic.
wmatrix() cannot be specified if onestep is also specified.
center requests that the sample moments be centered (demeaned) when computing GMM weight
matrices. By default, centering is not done.
winitial(wmtype [, independent]) specifies the weight matrix to use to obtain the first-step parameter estimates.
Specifying winitial(unadjusted) requests a weighting matrix that assumes the error functions are independent and identically distributed. This matrix is of the form (Z'Z)^(-1), where Z represents all the exogenous and instrumental variables.
winitial(identity) requests that the identity matrix be used.
winitial(matname) requests that Stata matrix matname be used.
Including the independent suboption creates a weight matrix that assumes error functions are
independent. Elements of the weight matrix corresponding to covariances between any two error
functions are set equal to zero. This suboption only applies to ivpoisson cfunction.
winitial(unadjusted) is the default for ivpoisson gmm.
winitial(unadjusted, independent) is the default for ivpoisson cfunction.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
vce(unadjusted) specifies that an unadjusted (nonrobust) VCE matrix be used; this, along with
the twostep option, results in the “optimal two-step GMM” estimates often discussed in textbooks.
vce(unadjusted) may not be set in ivpoisson cfunction.
The default vcetype is based on the wmtype specified in the wmatrix() option. If wmatrix()
is specified but vce() is not, then vcetype is set equal to wmtype. To override this behavior in
ivpoisson gmm and obtain an unadjusted (nonrobust) VCE matrix, specify vce(unadjusted).
The default vcetype for ivpoisson cfunction is robust.
Specifying vce(bootstrap) or vce(jackknife) results in standard errors based on the bootstrap
or jackknife, respectively. See [R] vce_option, [R] bootstrap, and [R] jackknife for more information
on these VCEs.
The syntax for vcetypes is identical to those for wmatrix().
 
Reporting
level(#); see [R] estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, eβirather than βi.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. irr may be specified at estimation or when
replaying previously estimated results. irr is not allowed with additive.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Optimization
from(initial_values) specifies the initial values to begin the estimation. You can specify a 1 × k matrix, where k is the number of parameters in the model, or you can specify a parameter name,
its initial value, another parameter name, its initial value, and so on. For example, to initialize the coefficient for male to 1.23 and the constant _cons to 4.57, you would type
ivpoisson ..., from(male 1.23 _cons 4.57) ...
Initial values declared using this option override any that are declared within substitutable expres-
sions. If you specify a parameter that does not appear in your model, ivpoisson exits with error
code 480. If you specify a matrix, the values must be in the same order in which the parameters
are declared in your model. ivpoisson ignores the row and column names of the matrix.
igmmiterate(#),igmmeps(#), and igmmweps(#)control the iterative process for the iterative
GMM estimator for ivpoisson. These options can be specified only if you also specify igmm.
igmmiterate(#) specifies the maximum number of iterations to perform with the iterative GMM estimator. The default is the number set using set maxiter (see [R] maximize), which is
16,000 by default.
igmmeps(#) specifies the convergence criterion used for successive parameter estimates when the
iterative GMM estimator is used. The default is igmmeps(1e-6). Convergence is declared when
the relative difference between successive parameter estimates is less than igmmeps() and the
relative difference between successive estimates of the weight matrix is less than igmmweps().
igmmweps(#) specifies the convergence criterion used for successive estimates of the weight matrix
when the iterative GMM estimator is used. The default is igmmweps(1e-6). Convergence is
declared when the relative difference between successive parameter estimates is less than
igmmeps() and the relative difference between successive estimates of the weight matrix is
less than igmmweps().
optimization_options: technique(), conv_maxiter(), conv_ptol(), conv_vtol(), conv_nrtol(), and tracelevel(). technique() specifies the optimization technique to use; gn (the default), nr, dfp, and bfgs are allowed. conv_maxiter() specifies the maximum number of iterations; conv_ptol(), conv_vtol(), and conv_nrtol() specify the convergence criteria for the parameters, gradient, and scaled Hessian, respectively. tracelevel() allows you to obtain additional details during the iterative process. See [M-5] optimize().
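As a sketch, an alternative to the default gn technique could be requested like this (using the model from example 1 below):

. ivpoisson gmm visits ad female (time = phone frfam), technique(bfgs)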
Remarks and examples
ivpoisson estimates the parameters of a Poisson regression model in which some of the regressors
are endogenous. A regressor is endogenous if it is related to the unobserved error term. The model is
also known as an exponential conditional mean model in which some of the regressors are endogenous.
The model may be specified using either additive or multiplicative error terms.
The model is frequently used to model count outcomes and is also used to model nonnegative
outcome variables. Poisson regression is a special exponential conditional mean model. See [R] poisson for more information on Poisson regression.
The exponential conditional mean model has an error form representation in which the dependent variable y is a function of the exogenous regressors x, endogenous regressors y_2, and an error ε. The regressors x are independent of ε, while y_2 are not.
ivpoisson allows ε to enter either additively,

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2) + \varepsilon_i

or multiplicatively,

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2)\,\varepsilon_i
Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and
Wooldridge (2010) discuss the generalized method of moments (GMM) estimators implemented
in ivpoisson. GMM is frequently used in modern econometrics. Many econometric and statistical
models can be expressed as conditions on the population moments. The parameter estimates produced
by GMM estimators make the sample-moment conditions as true as possible given the data. See
[R] gmm for further information on GMM estimation and how Stata performs it.
The rest of the discussion is presented under the following headings:
GMM estimator for additive model
GMM estimator for multiplicative model
CF estimator for multiplicative model
GMM estimator for additive model
The GMM estimator uses additional variables, known as instruments and denoted by zi, to specify
moment conditions that hold in the population. The GMM parameter estimates make the sample versions
of these population-moment conditions as close to true as possible. The instrumental variables are
assumed to be correlated with the endogenous regressors y2,i but independent of the errors i.
Under additive errors, the dependent variable y_i is determined by exogenous regressors x_i, endogenous regressors y_2,i, and zero-mean error ε_i as

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2) + \varepsilon_i

This leads to the following error function:

u(y_i, x_i, y_{2,i}, \beta_1, \beta_2) = y_i - \exp(x_i'\beta_1 + y_{2,i}'\beta_2)

The population-moment conditions for GMM estimation are E{z̃_i u(y_i, x_i, y_2,i, β_1, β_2)} = 0, where the vector z̃_i is partitioned as (x_i', z_i'). The sample-moment conditions are formed by replacing
the expectation with the corresponding sample mean. The GMM estimator solves a minimization problem
to make the sample-moment conditions as close to zero as possible. Details on how estimation is
performed are given in Methods and formulas.
Now we will demonstrate how ivpoisson gmm works in the additive error setting with an example.
Example 1: ivpoisson gmm with additive errors
This example uses simulated data based on the following story. A news website randomly samples
500 young adults in a major city. The website wants to model the number of times the sampled
individuals visit its website (visits) based on their overall time spent on the Internet (time) and
the number of times they receive an ad for the website through email or viewing another website
(ad). The website also suspects the gender of the individual may matter, so an exogenous dummy
variable, female, is included in the model.
We suspect time spent on the Internet is correlated with unobserved factors that additively affect the
number of times an individual visits the website. So we treat time as an endogenous regressor. Two
instruments are used for this variable. The time spent on the phone (phone) is one instrument. The
other instrument is the time spent interacting with friends and family that live out of town (frfam).
We model the number of visits the website receives using an exponential conditional mean model
with additive errors and use ivpoisson gmm to estimate the parameters of the regression in the
output below. To allow for heteroskedasticity of the errors, we use robust standard errors, which is
the default; see Obtaining standard errors in [R] gmm for a discussion of why robust standard errors
is the default.
. use http://www.stata-press.com/data/r13/website
(Visits to website)
. ivpoisson gmm visits ad female (time = phone frfam)
Step 1
Iteration 0: GMM criterion Q(b) = .33829416
Iteration 1: GMM criterion Q(b) = .00362656
Iteration 2: GMM criterion Q(b) = .00131886
Iteration 3: GMM criterion Q(b) = .00131876
Step 2
Iteration 0: GMM criterion Q(b) = .00027102
Iteration 1: GMM criterion Q(b) = .00025811
Iteration 2: GMM criterion Q(b) = .00025811
Exponential mean model with endogenous regressors
Number of parameters = 4 Number of obs = 500
Number of moments = 5
Initial weight matrix: Unadjusted
GMM weight matrix: Robust
Robust
visits Coef. Std. Err. z P>|z| [95% Conf. Interval]
time .0589294 .0107942 5.46 0.000 .0377732 .0800857
ad .137344 .010157 13.52 0.000 .1174366 .1572515
female -.0247707 .0376218 -0.66 0.510 -.098508 .0489666
_cons 1.041505 .0385848 26.99 0.000 .9658807 1.11713
Instrumented: time
Instruments: ad female phone frfam
We find significant coefficients for all regressors but female. At fixed values of the other regressors,
increased time spent on the Internet will raise the expected number of website visits. Receiving
additional advertisements will also cause an increase in the expected number of website visits.
GMM estimator for multiplicative model
Under multiplicative errors, the dependent variable y_i is determined by exogenous regressors x_i, endogenous regressors y_2,i, and unit-mean errors ε_i as

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2)\,\varepsilon_i

This setting yields a different error function than the additive error case. This ratio formulation is

u(y_i, x_i, y_{2,i}, \beta_1, \beta_2) = y_i / \exp(x_i'\beta_1 + y_{2,i}'\beta_2) - 1

Given the instrumental variables z, the population-moment conditions for GMM estimation are E{z̃_i u(y_i, x_i, y_2,i, β_1, β_2)} = 0. The vector z̃_i is partitioned as (x_i', z_i'). As above, the sample-
moment conditions are the sample analogs of the population-moment conditions, and the GMM
estimator solves a minimization problem to make the sample-moment conditions as close to zero as
possible. Details on how estimation is performed are given in Methods and formulas.
Example 2: ivpoisson gmm with multiplicative errors
In this example, we observe a simulated random sample of 5,000 households. We model the
number of trips taken by members of the household in the 24-hour period immediately prior to the
interview time by using an exponential conditional mean model with multiplicative errors. Exogenous
regressors include the distance to the central business district from the household (cbd), the distance
from the household to a public transit node (ptn), whether there is a full-time worker in the
household (worker), and whether the examined period is on a weekend (weekend). We suspect
that the endogenous regressor, the transportation cost of the household in the prior week (tcost),
is correlated with unobserved factors that affect the number of trips taken. This transportation cost
includes gasoline and bus, train tickets, etc.
The ratio of the cost of a public transit day pass in the sampled area to the national average cost
of such a pass (pt) is also observed. This is used as an instrument for transportation cost.
In the output below, we estimate the parameters of the regression with ivpoisson gmm. To allow
for heteroskedasticity of the errors, we use robust standard errors, which is the default.
. use http://www.stata-press.com/data/r13/trip
(Household trips)
. ivpoisson gmm trips cbd ptn worker weekend (tcost = pt), multiplicative
Step 1
Iteration 0: GMM criterion Q(b) = .04949852
Iteration 1: GMM criterion Q(b) = .00011194
Iteration 2: GMM criterion Q(b) = 1.563e-08
Iteration 3: GMM criterion Q(b) = 3.685e-16
Step 2
Iteration 0: GMM criterion Q(b) = 2.287e-16
Iteration 1: GMM criterion Q(b) = 1.413e-31
Exponential mean model with endogenous regressors
Number of parameters = 6 Number of obs = 5000
Number of moments = 6
Initial weight matrix: Unadjusted
GMM weight matrix: Robust
Robust
trips Coef. Std. Err. z P>|z| [95% Conf. Interval]
tcost .0352185 .0098182 3.59 0.000 .0159752 .0544617
cbd -.008398 .0020172 -4.16 0.000 -.0123517 -.0044444
ptn -.0113146 .0021819 -5.19 0.000 -.015591 -.0070383
worker .6623018 .0519909 12.74 0.000 .5604015 .764202
weekend .3009323 .0362682 8.30 0.000 .2298479 .3720167
_cons .2654423 .1550127 1.71 0.087 -.0383769 .5692616
Instrumented: tcost
Instruments: cbd ptn worker weekend pt
We find that all coefficients are significant. At fixed values of the other regressors, we see that
additional mileage from the central business district and public transit nodes reduces the expected
number of trips taken. Individuals who live farther away from the central business district may still
be out of the house the same amount of time, but they will take fewer trips because the transit time
has increased. The situation is similar for those who live farther from public transit.
To interpret the other parameters, we will look at the partial effects of their respective independent
variables. The partial effects of a change in an independent variable on the modeled conditional expec-
tation function vary over the data because the model is nonlinear. However, under the multiplicative
error model, the ratio of the new value to the old value after a discrete change in an independent
variable is constant over the data.
Let w = (x', y_2')'. If we add 1 to the jth independent variable in w, the functional form of the model implies that

\frac{E(y \mid w, w_j + 1, \varepsilon)}{E(y \mid w, w_j, \varepsilon)} = \frac{E(y \mid w_1, \ldots, w_j + 1, \ldots, w_k, \varepsilon)}{E(y \mid w_1, \ldots, w_j, \ldots, w_k, \varepsilon)} = e^{\beta_j}

When y is a count variable, this normalized effect is called the incidence-rate ratio (IRR) for a one-unit change in w_j.
More generally, the IRR for a Δw_j change in w_j is e^{β_j Δw_j} under a multiplicative-error exponential conditional mean model. We can calculate incidence-rate ratios for different changes in the regressors by using lincom; see [R] lincom.
Here we replay the ivpoisson results by typing the command name and we specify the irr
option to get the incidence-rate ratios. Each significance test for a coefficient equaling zero becomes
a test for the incidence-rate ratio equaling one.
. ivpoisson, irr
Exponential mean model with endogenous regressors
Number of parameters = 6 Number of obs = 5000
Number of moments = 6
Initial weight matrix: Unadjusted
GMM weight matrix: Robust
Robust
trips IRR Std. Err. z P>|z| [95% Conf. Interval]
tcost 1.035846 .0101701 3.59 0.000 1.016103 1.055972
cbd .9916371 .0020003 -4.16 0.000 .9877243 .9955655
ptn .9887491 .0021573 -5.19 0.000 .9845299 .9929864
worker 1.939251 .1008234 12.74 0.000 1.751376 2.14728
weekend 1.351118 .0490026 8.30 0.000 1.258409 1.450657
_cons 1.304008 .2021377 1.71 0.087 .9623501 1.766962
Instrumented: tcost
Instruments: cbd ptn worker weekend pt
Holding other regressors and the error constant, the expected number of trips made from houses
with a full-time worker is nearly twice that of those houses without a full-time worker. Similarly,
the expected number of trips made during a weekend day is close to 35% higher than the expected
number of trips made on other days. For each additional dollar of weekly transportation cost, the
expected number of household trips is increased by approximately 3.6%.
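For instance, a minimal sketch of the IRR for a $5 increase in weekly transportation cost, that is, e^{5β_tcost}, is

. lincom 5*tcost, irr

lincom reports the exponentiated estimate together with a confidence interval.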
CF estimator for multiplicative model
Control-function (CF) estimators can be used to account for endogenous regressors. As
Wooldridge (2010, sec. 18.5) describes, CF estimators assume a certain structural relationship between
the endogenous regressors and the exogenous regressors and use functions of first-stage parameter
estimates to control for the endogeneity in the second stage.
Wooldridge (2010, sec. 18.5) notes that the VCE of the second-stage estimator must be adjusted to
account for estimates from the first stage. ivpoisson cfunction solves this problem by stacking the
moment conditions that define each stage and applying a single GMM estimator. See Newey (1984)
and Wooldridge (2010, sec. 14.2) for a description of this technique. No adjustment to the VCE is
necessary because there is only one stage.
The CF estimator augments the original multiplicative model with an estimated term that controls
for the endogeneity of y2,i. When y2,i is exogenous, the coefficient on this control term is zero. Let
z be instrumental variables, and the vector z̃_i be (x_i', z_i').
The augmented model is

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2 + v_i'\rho + c_i)

where

y_{2,i} = B\tilde{z}_i' + v_i

The term v_i'ρ controls for the endogeneity of y_2,i, and we normalize E{exp(c_i)} = 1. The coefficient vector ρ measures the strength of the endogeneity of y_2,i; y_2,i is exogenous when ρ = 0.
ivpoisson cfunction estimates β_1 and β_2 and the auxiliary parameters ρ and B by GMM; see Methods and formulas for details.
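The logic can be sketched by hand (vhat below is an arbitrary variable name): regress the endogenous regressor on the exogenous variables and the instrument, save the residuals, and add them as a control in a Poisson regression. Unlike ivpoisson cfunction, this shortcut does not adjust the standard errors for the first-stage estimation, so it is shown only to illustrate the idea.

. regress tcost cbd ptn worker weekend pt
. predict double vhat, residuals
. poisson trips cbd ptn worker weekend tcost vhat, vce(robust)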
Example 3: Control-function estimator
We return to the previous example, where we estimated the parameters of an exponential conditional
mean model for the number of trips taken by a household in a 24-hour period. We will estimate
the parameters of the regression with the CF estimator method and compare our results with those
obtained with the GMM estimator in example 2.
In the output below, we estimate the parameters of the regression with the ivpoisson cfunction
command.
. ivpoisson cfunction trips cbd ptn worker weekend (tcost = pt)
Step 1
Iteration 0: GMM criterion Q(b) = .00056156
Iteration 1: GMM criterion Q(b) = 2.366e-07
Iteration 2: GMM criterion Q(b) = 5.552e-14
Iteration 3: GMM criterion Q(b) = 9.772e-27
Exponential mean model with endogenous regressors
Number of parameters = 13 Number of obs = 5000
Number of moments = 13
Initial weight matrix: Unadjusted
GMM weight matrix: Robust
Robust
trips Coef. Std. Err. z P>|z| [95% Conf. Interval]
trips
cbd -.0082567 .0020005 -4.13 0.000 -.0121777 -.0043357
ptn -.0113719 .0021625 -5.26 0.000 -.0156102 -.0071335
worker .6903044 .0521642 13.23 0.000 .5880645 .7925444
weekend .2978149 .0356474 8.35 0.000 .2279472 .3676825
tcost .0320718 .0092738 3.46 0.001 .0138955 .0502481
_cons .2145986 .1359327 1.58 0.114 -.0518246 .4810218
tcost
cbd .0165466 .0043693 3.79 0.000 .0079829 .0251102
ptn -.040652 .0045946 -8.85 0.000 -.0496573 -.0316467
worker 1.550985 .0996496 15.56 0.000 1.355675 1.746294
weekend .0423009 .0779101 0.54 0.587 -.1104002 .1950019
pt .7739176 .0150072 51.57 0.000 .7445041 .8033312
_cons 12.13934 .1123471 108.05 0.000 11.91915 12.35954
/c_tcost .1599984 .0111752 14.32 0.000 .1380954 .1819014
Instrumented: tcost
Instruments: cbd ptn worker weekend pt
The output table presents results for the estimated coefficients in each of three equations. First, in
the trips equation, we see the results for the estimated coefficients in the equation for the dependent
variable trips. Second, in the tcost equation, we see the estimated coefficients in the regression
of tcost on the instrumental and exogenous variables. Third, the /c_tcost ancillary parameter
corresponds to the estimate of ρ, the coefficient on the residual variable included to control for the
endogeneity of tcost.
We find that all coefficients are significant in the exponential conditional mean equation, trips.
The coefficient estimates in the trips equation are similar to the estimates obtained by the GMM
estimator in example 2. That the estimated coefficient on the tcost control variable is significantly
different from zero suggests that tcost is endogenous.
Stored results
ivpoisson stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq) number of equations
e(k aux) number of auxiliary parameters
e(k dv) number of dependent variables
e(Q) criterion function
e(J) Hansen J χ2 statistic
e(J df) J statistic degrees of freedom
e(N clust) number of clusters
e(rank) rank of e(V)
e(ic) number of iterations used by iterative GMM estimator
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) ivpoisson
e(cmdline) command as typed
e(depvar) dependent variable of regression
e(instd) instrumented variable
e(insts) instruments
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset variable for first equation
e(winit) initial weight matrix used
e(winitname) name of user-supplied initial weight matrix
e(estimator) gmm or cfunction
e(additive) additive if additive errors specified
e(multiplicative) multiplicative if multiplicative errors specified
e(gmmestimator) onestep,twostep, or igmm
e(wmatrix) wmtype specified in wmatrix()
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(technique) optimization technique
e(properties) b V
e(estat cmd) program used to implement estat
e(predict) program used to implement predict
e(footnote) program used to implement footnote display
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix
e(init) initial values of the estimators
e(Wuser) user-supplied initial weight matrix
e(W) weight matrix used for final round of estimation
e(S) moment covariance matrix used in robust VCE computations
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The estimators in ivpoisson are GMM estimators that can be expressed in terms of error functions
and the instruments that are used to form the moment conditions. When offsets o_j^β are used in the outcome variable equation, the following formulas apply with x_j'β_1 changed to x_j'β_1 + o_j^β.
The error functions for the GMM estimators are given in the text.
Here we provide some details about the form of the error function used by the CF estimator.
Recall that the multiplicative model is

y_i = \exp(x_i'\beta_1 + y_{2,i}'\beta_2)\,\varepsilon_i

We parameterize the endogenous variables in the form

y_{2,i} = B\tilde{z}_i' + v_i

This allows us to decompose ε_i as

\varepsilon_i = \exp(v_i'\rho + c_i)

Given this setup, we obtain the following conditional mean:

E(y \mid x_i, z_i, y_{2,i}, v_i) = \exp(x_i'\beta_1 + y_{2,i}'\beta_2 + v_i'\rho)

We estimate v_i as the residuals of the linear regression of y_2,i on z̃_i. The estimates of v_i are used as additional regressors in the exponential conditional mean model for y to estimate β_1, β_2, and ρ. In essence, the estimates of v_i control for the endogeneity.
The error functions for the endogenous regressors are defined as

u_{en,i}(y_{2,i}, \tilde{z}_i, B) = y_{2,i} - B\tilde{z}_i'

Now we define the error function for the dependent variable as

u_y(y_i, x_i, y_{2,i}, u_{en,i}, \beta_1, \beta_2, \rho) = y_i / \exp(x_i'\beta_1 + y_{2,i}'\beta_2 + u_{en,i}'\rho) - 1

u_en,i will be vector valued if we have multiple endogenous regressors y_2,i. Call the dimension of y_2,i g. u_en,i and u_y,i define g + 1 separate error functions. We will use variables z̃_i to instrument each error function in u_en,i. So for error function j = 1, ..., g, we have the error function u_en,i,j and the population-moment conditions E(z̃_i u_en,i,j) = 0.
We calculate v̂_oi previous to estimation as the residuals of the linear regression of y_2,i on z̃_i. We use variables x_i, y_2,i, and v̂_oi to instrument the error function u_y. This leads to the population-moment conditions E{(x_i', y_2,i', v̂_oi')' u_y,i} = 0.
Details of GMM estimation can be found in Methods and formulas of [R] gmm. Determination of the weight matrix W_N is discussed there.
Under the GMM estimation, the GMM estimators β̂_1 and β̂_2 are the values of β_1 and β_2 that minimize

Q(\beta_1,\beta_2) = \left\{\frac{1}{N}\sum_i \tilde{z}_i u_i(y_i,x_i,y_{2,i},\beta_1,\beta_2)\right\}' W_N \left\{\frac{1}{N}\sum_i \tilde{z}_i u_i(y_i,x_i,y_{2,i},\beta_1,\beta_2)\right\}    (1)

for q × q weight matrix W_N, where q is the dimension of z̃_i. The error functions u_i were defined in the text.
In the CF method, we have multiple error functions as defined above. We can stack the moment conditions and write them more compactly as Z_i' u_i(B, β_1, β_2, ρ), where

Z_i = \begin{pmatrix} x_i' & 0 & 0 & \cdots & 0 \\ 0 & y_{2,i}' & 0 & \cdots & 0 \\ 0 & 0 & \tilde{z}_i & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \tilde{z}_i \end{pmatrix}

and

u_i(B, \beta_1, \beta_2, \rho) = \begin{pmatrix} u_y(y_i, x_i, y_{2,i}, u_{en,i}, \beta_1, \beta_2, \rho) \\ u_{en}(y_{2,i}, \tilde{z}_i, B) \end{pmatrix}

The matrix Z_i has g + 1 rows and k + gz columns, where k is the number of regressors for y_i and z is the number of exogenous regressors in z̃_i.
The GMM estimators B̂, β̂_1, β̂_2, and ρ̂ are the values of B, β_1, β_2, and ρ that minimize

Q(B,\beta_1,\beta_2,\rho) = \left\{N^{-1}\sum_{i=1}^{N} Z_i' u_i(B,\beta_1,\beta_2,\rho)\right\}' W_N \left\{N^{-1}\sum_{i=1}^{N} Z_i' u_i(B,\beta_1,\beta_2,\rho)\right\}    (2)

for (k + gz) × (k + gz) weight matrix W_N.
By default, ivpoisson minimizes (1) and (2) using the Gauss–Newton method. See Hayashi (2000,
498) for a derivation. This technique is typically faster than quasi-Newton methods and does not
require second-order derivatives.
References
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge
University Press.
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking
behavior. Review of Economics and Statistics 79: 586–593.
Newey, W. K. 1984. A method of moments interpretation of sequential estimators. Economics Letters 14: 201–206.
Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for
health care. Journal of Applied Econometrics 12: 281–294.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R] ivpoisson postestimation    Postestimation tools for ivpoisson
[R] gmm    Generalized method of moments estimation
[R] ivprobit    Probit model with continuous endogenous regressors
[R] ivregress    Single-equation instrumental-variables regression
[R] ivtobit    Tobit model with continuous endogenous regressors
[R] nl    Nonlinear least-squares estimation
[R] nlsur    Estimation of nonlinear systems of equations
[R] poisson    Poisson regression
[R] regress    Linear regression
[U] 20 Estimation and postestimation commands
Title
ivpoisson postestimation — Postestimation tools for ivpoisson
Description Syntax for predict Menu for predict
Options for predict Syntax for estat overid Menu for estat
Remarks and examples Stored results Methods and formulas
Reference Also see
Description
The following postestimation command is of special interest after ivpoisson:
Command Description
estat overid perform test of overidentifying restrictions
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions and probabilities
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation command
estat overid reports Hansen’s J statistic, which is used to determine the validity of the overidentifying restrictions in a GMM model. ivpoisson gmm uses GMM estimation to obtain parameter estimates. Under additive and multiplicative errors, Hansen’s J statistic can be accurately reported when more instruments than endogenous regressors are specified. It is not appropriate to report the J statistic after ivpoisson cfunction, because a just-identified model is fit.
If the model is correctly specified in the sense that E{z̃_i u(y_i, x_i, y_2,i, β)} = 0, then the sample analog to that condition should hold at the estimated value of β_1 and β_2. The z̃_i variables are the exogenous regressors x_i and instrumental variables z_i used in ivpoisson gmm. The y_2,i are the endogenous regressors. The u function is the error function, which will have a different form for multiplicative and additive errors in the regression.
Hansen’s J statistic is valid only if the weight matrix is optimal, meaning that it equals the inverse of the covariance matrix of the moment conditions. Therefore, estat overid only reports Hansen’s J statistic after two-step or iterated estimation or if you specified winitial(matname) when calling ivpoisson gmm. In the latter case, it is your responsibility to determine the validity of the J statistic.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

statistic    Description
Main
n            number of events; the default
xbtotal      linear prediction, using residual estimates for ivpoisson cfunction
xb           linear prediction
residuals    residuals
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events via the exponential-form estimate. This is exp(x_j'β_1 + y_2,j'β_2) if neither offset() nor exposure() was specified, exp(x_j'β_1 + y_2,j'β_2 + offset_j) if offset() was specified, or exp(x_j'β_1 + y_2,j'β_2) × exposure_j if exposure() was specified.
After generalized method of moments estimation, the exponential-form estimate is not a consistent estimate of the conditional mean of y_j, because it is not corrected for E(ε_j | y_2,j). More details are found in Methods and formulas.
After control-function estimation, we correct the exponential-form estimate for E(ε_j | y_2,j) by using the estimated residuals of y_2,j and the c_* auxiliary parameters. This supplements the direct effect of y_2,j and x_j through β_1 and β_2 with the indirect effects of y_2,j, x_j, and the instruments z_j through the endogenous error ε_j. Thus the exponential-form estimate consistently estimates the conditional mean of y_j.
xbtotal calculates the linear prediction, which is x_j'β_1 + y_2,j'β_2 if neither offset() nor exposure() was specified, x_j'β_1 + y_2,j'β_2 + offset_j if offset() was specified, or x_j'β_1 + y_2,j'β_2 + ln(exposure_j) if exposure() was specified.
After control-function estimation, the estimate of the linear form x_j'β_1 includes the estimated residuals of the endogenous regressors with coefficients from the c_* auxiliary parameters.
xb calculates the linear prediction, which is x_j'β_1 + y_2,j'β_2 if neither offset() nor exposure() was specified, x_j'β_1 + y_2,j'β_2 + offset_j if offset() was specified, or x_j'β_1 + y_2,j'β_2 + ln(exposure_j) if exposure() was specified.
residuals calculates the residuals. Under additive errors, these are calculated as y_j − exp(x_j'β_1 + y_2,j'β_2). Under multiplicative errors, they are calculated as y_j / exp(x_j'β_1 + y_2,j'β_2) − 1.
When offset() or exposure() is specified, x_j'β_1 is not used directly in the residuals. x_j'β_1 + offset_j is used if offset() was specified. x_j'β_1 + ln(exposure_j) is used if exposure() was specified. See nooffset below.
After control-function estimation, the estimate of the linear form x_j'β_1 includes the estimated residuals of the endogenous regressors with coefficients from the c_* auxiliary parameters.
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It
modifies the calculations made by predict so that they ignore the offset or exposure variable.
nooffset removes the offset from calculations involving both the treat() equation and the
dependent count variable.
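As a brief sketch after the control-function fit from example 3 of [R] ivpoisson (nhat and xbtot are arbitrary names):

. predict nhat
. predict xbtot, xbtotal

nhat holds the exponential-form estimate of the number of trips, corrected for the endogeneity of tcost as described above, and xbtot holds the corresponding linear prediction including the control-function term.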
Syntax for estat overid
estat overid
Menu for estat
Statistics >Postestimation >Reports and statistics
Remarks and examples
estat overid reports Hansen’s J statistic, which is used to determine the validity of the
overidentifying restrictions in a GMM model. It is not appropriate to use it after ivpoisson cfunction,
because a just-identified model is fit.
Recall that the GMM criterion function is

Q(\beta) = \left\{\frac{1}{N}\sum_i \tilde{z}_i u(y_i,x_i,y_{2,i},\beta_1,\beta_2)\right\}' W_N \left\{\frac{1}{N}\sum_i \tilde{z}_i u(y_i,x_i,y_{2,i},\beta_1,\beta_2)\right\}    (A1)

Our u function within this formula will change depending on whether we use additive or multiplicative errors. The z̃ vector contains the exogenous regressors and instrumental variables used. ivpoisson gmm estimates regression coefficients to minimize Q.
Let l be the dimension of z̃ and k the number of regressors. If W_N is an optimal weight matrix, under the null hypothesis H_0: E{z̃_i u(y_i, x_i, y_2,i, β_1, β_2)} = 0, the test statistic J = N × Q ∼ χ²(l − k).
A large test statistic casts doubt on the null hypothesis.
Because the weight matrix W_N must be optimal, estat overid works only after the two-step and
iterated estimation or if you supplied your own initial weight matrix by using the winitial(matname)
option of ivpoisson gmm and used the one-step estimator.
Often the overidentifying restrictions test is interpreted as a test of the validity of the instruments
z. However, other forms of model misspecification can sometimes lead to a significant test statistic.
See Hall (2005, sec. 5.1) for a discussion of the overidentifying restrictions test and its behavior in
correctly specified and misspecified models.
Note that ivpoisson gmm defaults to the two-step estimator when other options are not specified
to override the default. Thus it is appropriate to perform the J test after the regression of example 1 in [R] ivpoisson.
Example 1: Specification test
Recall example 1 of [R]ivpoisson. We estimated the parameters of an exponential conditional
mean model for the number of visits to a website. Additive errors were used. Exogenous regressors
included the gender of an individual and the number of ads received from the website.
An endogenous regressor, time spent on the Internet, was also included in the model. Two
instruments were used. One of the instruments measured the time spent interacting with friends and
out-of-town family. The other measured the time spent on the phone.
We will reestimate the parameters of the regression here and then test the specification.
. use http://www.stata-press.com/data/r13/website
(Visits to website)
. ivpoisson gmm visits ad female (time = phone frfam)
(output omitted )
. estat overid
Test of overidentifying restriction:
Hansen’s J chi2(1) = .129055 (p = 0.7194)
We have two instruments for one endogenous variable, so the J statistic has one degree of freedom. The J statistic is not significant. We fail to reject the null hypothesis that the model is correctly
specified.
Stored results
estat overid stores the following in r():
Scalars
r(J) Hansen’s J statistic
r(J df) J statistic degrees of freedom
r(J p) J statistic p-value
Methods and formulas
The vector x_i contains the exogenous regressors, and z_i the instruments. The vector z̃_i is partitioned as (x_i, z_i). The vector y_2,i contains the endogenous regressors.
Under multiplicative errors, the conditional mean of y_i is

E(y_i \mid y_{2,i}, \tilde{z}_i) = E\{E(y_i \mid x_i, y_{2,i}, \varepsilon_i) \mid y_{2,i}, \tilde{z}_i\}
                                 = E\{\exp(x_i'\beta_1 + y_{2,i}'\beta_2)\,\varepsilon_i \mid y_{2,i}, \tilde{z}_i\}
                                 = \exp(x_i'\beta_1 + y_{2,i}'\beta_2)\,E(\varepsilon_i \mid y_{2,i}, \tilde{z}_i)

Under the CF estimator,

E(\varepsilon_i \mid y_{2,i}, \tilde{z}_i) = E\{E(\varepsilon_i \mid v_i, c_i) \mid y_{2,i}, \tilde{z}_i\}
                                           = E\{\exp(v_i'\rho + c_i) \mid y_{2,i}, \tilde{z}_i\}
                                           = \exp\{(y_{2,i} - B\tilde{z}_i')'\rho\}\,E\{\exp(c_i) \mid y_{2,i}, \tilde{z}_i\}
                                           = \exp\{(y_{2,i} - B\tilde{z}_i')'\rho\}

Thus under the CF estimator, we estimate the conditional mean of y_i as

E(y_i \mid y_{2,i}, \tilde{z}_i) = \exp\{x_i'\beta_1 + y_{2,i}'\beta_2 + (y_{2,i} - B\tilde{z}_i')'\rho\}

The CF estimator explicitly models the functional form of the endogeneity of y_2,i and ε_i with the instruments and exogenous regressors z̃_i. This allows it to correct the exponential-form estimator for the E(ε_i | y_2,i, z̃_i) term.
In contrast, the GMM estimator does not model the functional form of the endogeneity of y_2,i and ε_i. Therefore, E(ε_i | y_2,i, z̃_i) is not estimated, and the exponential-form estimator under GMM estimation simply ignores this term. Noting that because z̃_i and ε_i are independent, E(ε_i | y_2,i, z̃_i) = E(ε_i | y_2,i), we can obviously see that ignoring the term will lead to inconsistent estimation of the conditional mean of y_i. y_2,i and ε_i are not independent, so E(ε_i | y_2,i) may vary based on y_2,i.
In the additive errors setting, a similar derivation will show that the exponential-form estimator obtained from GMM estimation is inconsistent for the conditional mean of y_i.
Reference
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Also see
[R]ivpoisson Poisson regression with endogenous regressors
[U] 20 Estimation and postestimation commands
Title
ivprobit — Probit model with continuous endogenous regressors
Syntax Menu Description
Options for ML estimator Options for two-step estimator Remarks and examples
Stored results Methods and formulas Acknowledgments
References Also see
Syntax
Maximum likelihood estimator
    ivprobit depvar varlist1 (varlist2 = varlistiv) [if] [in] [weight] [, mle_options]
Two-step estimator
    ivprobit depvar varlist1 (varlist2 = varlistiv) [if] [in] [weight], twostep [tse_options]
mle_options                 Description
Model
  mle                       use conditional maximum-likelihood estimator; the default
  asis                      retain perfect predictor variables
  constraints(constraints)  apply specified linear constraints
SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
  level(#)                  set confidence level; default is level(95)
  first                     report first-stage regression
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options          control the maximization process
  coeflegend                display legend instead of statistics
tse_options         Description
Model
  twostep           use Newey's two-step estimator; the default is mle
  asis              retain perfect predictor variables
SE
  vce(vcetype)      vcetype may be twostep, bootstrap, or jackknife
Reporting
  level(#)          set confidence level; default is level(95)
  first             report first-stage regression
  display_options   control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
  coeflegend        display legend instead of statistics
twostep is required.
varlist1 and varlistiv may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, varlist1, varlist2, and varlistiv may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands. fp is allowed with the maximum likelihood estimator.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), first, twostep, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed with the maximum likelihood estimator. fweights are allowed with Newey's two-step estimator. See [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Endogenous covariates > Probit model with endogenous covariates
Description
ivprobit fits probit models where one or more of the regressors are endogenously determined.
By default, ivprobit uses maximum likelihood estimation. Alternatively, Newey’s (1987) minimum
chi-squared estimator can be invoked with the twostep option. Both estimators assume that the
endogenous regressors are continuous and are not appropriate for use with discrete endogenous
regressors. See [R] ivtobit for tobit estimation with endogenous regressors and [R] probit for probit estimation when the model contains no endogenous regressors.
Options for ML estimator
 
Model
mle requests that the conditional maximum-likelihood estimator be used. This is the default.
asis requests that all specified variables and observations be retained in the maximization process.
This option is typically not used and may introduce numerical instability. Normally, ivprobit
drops any endogenous or exogenous variables that perfectly predict success or failure in the
dependent variable. The associated observations are also dropped. For more information, see
Model identification in [R]probit.
constraints(constraints); see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first requests that the parameters for the reduced-form equations showing the relationships between
the endogenous variables and instruments be displayed. For the two-step estimator, first shows
the first-stage regressions. For the maximum likelihood estimator, these parameters are estimated
jointly with the parameters of the probit equation. The default is not to show these parameter
estimates.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. This model's likelihood function can be difficult to maximize, especially with multiple endogenous variables. The difficult and technique(bfgs) options may be helpful in achieving convergence. Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with ivprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Options for two-step estimator
 
Model
twostep is required and requests that Newey’s (1987) efficient two-step estimator be used to obtain
the coefficient estimates.
asis requests that all specified variables and observations be retained in the maximization process.
This option is typically not used and may introduce numerical instability. Normally, ivprobit
drops any endogenous or exogenous variables that perfectly predict success or failure in the
dependent variable. The associated observations are also dropped. For more information, see
Model identification in [R]probit.
 
SE
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (twostep) and that use bootstrap or jackknife methods (bootstrap,
jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first requests that the parameters for the reduced-form equations showing the relationships between
the endogenous variables and instruments be displayed. For the two-step estimator, first shows
the first-stage regressions. For the maximum likelihood estimator, these parameters are estimated
jointly with the parameters of the probit equation. The default is not to show these parameter
estimates.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
The following option is available with ivprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Model setup
Model identification
Model setup
ivprobit fits models with dichotomous dependent variables and endogenous regressors. You can
use it to fit a probit model when you suspect that one or more of the regressors are correlated with
the error term. ivprobit is to probit modeling what ivregress is to linear regression analysis; see
[R]ivregress for more information.
Formally, the model is

$$
\begin{aligned}
y_{1i}^* &= \mathbf y_{2i}\boldsymbol\beta + \mathbf x_{1i}\boldsymbol\gamma + u_i \\
\mathbf y_{2i} &= \mathbf x_{1i}\boldsymbol\Pi_1 + \mathbf x_{2i}\boldsymbol\Pi_2 + \mathbf v_i
\end{aligned}
$$

where i = 1, ..., N; $\mathbf y_{2i}$ is a 1 × p vector of endogenous variables; $\mathbf x_{1i}$ is a 1 × k_1 vector of exogenous variables; $\mathbf x_{2i}$ is a 1 × k_2 vector of additional instruments; and the equation for $\mathbf y_{2i}$ is written in reduced form. By assumption, $(u_i, \mathbf v_i) \sim N(\mathbf 0, \boldsymbol\Sigma)$, where σ_11 is normalized to one to identify the model. β and γ are vectors of structural parameters, and Π_1 and Π_2 are matrices of reduced-form parameters. This is a recursive model: $\mathbf y_{2i}$ appears in the equation for $y_{1i}^*$, but $y_{1i}^*$ does not appear in the equation for $\mathbf y_{2i}$. We do not observe $y_{1i}^*$; instead, we observe

$$
y_{1i} = \begin{cases} 0 & y_{1i}^* < 0 \\ 1 & y_{1i}^* \ge 0 \end{cases}
$$

The order condition for identification of the structural parameters requires that k_2 ≥ p. Presumably, Σ is not block diagonal between u_i and $\mathbf v_i$; otherwise, $\mathbf y_{2i}$ would not be endogenous.
Technical note
This model is derived under the assumption that (u_i, v_i) is independent and identically distributed multivariate normal for all i. The vce(cluster clustvar) option can be used to control for a lack of independence. As with most probit models, if u_i is heteroskedastic, point estimates will be inconsistent.
Example 1
We have hypothetical data on 500 two-parent households, and we wish to model whether the woman is employed. We have a variable, fem_work, that is equal to one if she has a job and zero otherwise. Her decision to work is a function of the number of children at home (kids), number of years of schooling completed (fem_educ), and other household income measured in thousands of dollars (other_inc). We suspect that unobservable shocks affecting the woman's decision to hold a job also affect the household's other income. Therefore, we treat other_inc as endogenous. As an instrument, we use the number of years of schooling completed by the man (male_educ).

The syntax for specifying the exogenous, endogenous, and instrumental variables is identical to that used in ivregress; see [R] ivregress for details.
. use http://www.stata-press.com/data/r13/laborsup
. ivprobit fem_work fem_educ kids (other_inc = male_educ)
Fitting exogenous probit model
Iteration 0: log likelihood = -344.63508
Iteration 1: log likelihood = -255.36855
Iteration 2: log likelihood = -255.31444
Iteration 3: log likelihood = -255.31444
Fitting full model
Iteration 0: log likelihood = -2371.4753
Iteration 1: log likelihood = -2369.3178
Iteration 2: log likelihood = -2368.2198
Iteration 3: log likelihood = -2368.2062
Iteration 4: log likelihood = -2368.2062
Probit model with endogenous regressors Number of obs = 500
Wald chi2(3) = 163.88
Log likelihood = -2368.2062 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.0542756 .0060854 -8.92 0.000 -.0662027 -.0423485
fem_educ .211111 .0268648 7.86 0.000 .1584569 .2637651
kids -.1820929 .0478267 -3.81 0.000 -.2758316 -.0883543
_cons .3672083 .4480724 0.82 0.412 -.5109975 1.245414
/athrho .3907858 .1509443 2.59 0.010 .0949403 .6866313
/lnsigma 2.813383 .0316228 88.97 0.000 2.751404 2.875363
rho .3720374 .1300519 .0946561 .5958135
sigma 16.66621 .5270318 15.66461 17.73186
Instrumented: other_inc
Instruments: fem_educ kids male_educ
Wald test of exogeneity (/athrho = 0): chi2(1) = 6.70 Prob > chi2 = 0.0096
Because we did not specify mle or twostep, ivprobit used the maximum likelihood estimator by default. At the top of the output, we see the iteration log. ivprobit fits a probit model ignoring endogeneity to obtain starting values for the endogenous model. The header of the output contains the sample size as well as a Wald statistic and p-value for the test of the hypothesis that all the slope coefficients are jointly zero. Below the table of coefficients, Stata reminds us that the endogenous variable is other_inc and that fem_educ, kids, and male_educ were used as instruments.

At the bottom of the output is a Wald test of the exogeneity of the instrumented variables. We reject the null hypothesis of no endogeneity. However, if the test statistic is not significant, there is not sufficient information in the sample to reject the null, so a regular probit regression may be appropriate. The point estimates from ivprobit are still consistent, though those from probit (see [R] probit) are likely to have smaller standard errors.
Various two-step estimators have also been proposed for the endogenous probit model, and Newey’s
(1987) minimum chi-squared estimator is available with the twostep option.
Example 2
Refitting our labor-supply model with the two-step estimator yields
. ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep
Checking reduced-form model...
Two-step probit with endogenous regressors Number of obs = 500
Wald chi2(3) = 93.97
Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.058473 .0093364 -6.26 0.000 -.0767719 -.040174
fem_educ .227437 .0281628 8.08 0.000 .1722389 .282635
kids -.1961748 .0496323 -3.95 0.000 -.2934522 -.0988973
_cons .3956061 .4982649 0.79 0.427 -.5809752 1.372187
Instrumented: other_inc
Instruments: fem_educ kids male_educ
Wald test of exogeneity: chi2(1) = 6.50 Prob > chi2 = 0.0108
All the coefficients have the same signs as their counterparts in the maximum likelihood model. The
Wald test at the bottom of the output confirms our earlier finding of endogeneity.
Technical note
In a standard probit model, the error term is assumed to have a variance of one. In the probit model with endogenous regressors, we assume that (u_i, v_i) is multivariate normal with covariance matrix

$$
\operatorname{Var}(u_i, \mathbf v_i) = \boldsymbol\Sigma = \begin{bmatrix} 1 & \boldsymbol\Sigma_{21}' \\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} \end{bmatrix}
$$

With the properties of the multivariate normal distribution, $\operatorname{Var}(u_i \mid \mathbf v_i) = 1 - \boldsymbol\Sigma_{21}'\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}$. As a result, Newey's estimator and other two-step probit estimators do not yield estimates of β and γ but rather β/σ and γ/σ, where σ is the square root of Var(u_i | v_i). Hence, we cannot directly compare the estimates obtained from Newey's estimator with those obtained via maximum likelihood or with those obtained from probit. See Wooldridge (2010, 585–594) for a discussion of Rivers and Vuong's (1988) two-step estimator. The issues raised pertaining to the interpretation of the coefficients of that estimator are identical to those that arise with Newey's estimator. Wooldridge also discusses ways to obtain marginal effects from two-step estimators.
Despite the coefficients not being directly comparable to their maximum likelihood counterparts,
the two-step estimator is nevertheless useful. The maximum likelihood estimator may have difficulty
converging, especially with multiple endogenous variables. The two-step estimator, consisting of
nothing more complicated than a probit regression, will almost certainly converge. Moreover, although
the coefficients from the two models are not directly comparable, the two-step estimates can still be
used to test for statistically significant relationships.
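For instance, after the two-step fit, joint hypotheses about the structural coefficients can be tested with the standard test command; a minimal sketch using the labor-supply model of the examples above:

. ivprobit fem_work fem_educ kids (other_inc = male_educ), twostep
. test fem_educ kids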
Model identification
As in the linear simultaneous-equation model, the order condition for identification requires that
the number of excluded exogenous variables (that is, the additional instruments) be at least as great
as the number of included endogenous variables. ivprobit checks this for you and issues an error
message if the order condition is not met.
Like probit, logit, and logistic, ivprobit checks the exogenous and endogenous variables
to see if any of them predict the outcome variable perfectly. It will then drop offending variables
and observations and fit the model on the remaining data. Instruments that are perfect predictors
do not affect estimation, so they are not checked. See Model identification in [R]probit for more
information.
ivprobit will also occasionally display messages such as
Note: 4 failures and 0 successes completely determined.
For an explanation of this message, see [R] logit.
Stored results
ivprobit, mle stores the following in e():
Scalars
  e(N)             number of observations
  e(N_cds)         number of completely determined successes
  e(N_cdf)         number of completely determined failures
  e(k)             number of parameters
  e(k_eq)          number of equations in e(b)
  e(k_eq_model)    number of equations in overall model test
  e(k_aux)         number of auxiliary parameters
  e(k_dv)          number of dependent variables
  e(df_m)          model degrees of freedom
  e(ll)            log likelihood
  e(N_clust)       number of clusters
  e(endog_ct)      number of endogenous regressors
  e(p)             model Wald p-value
  e(p_exog)        exogeneity test Wald p-value
  e(chi2)          model Wald χ²
  e(chi2_exog)     Wald χ² test of exogeneity
  e(rank)          rank of e(V)
  e(ic)            number of iterations
  e(rc)            return code
  e(converged)     1 if converged, 0 otherwise
Macros
  e(cmd)            ivprobit
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(instd)          instrumented variables
  e(insts)          instruments
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output
  e(clustvar)       name of cluster variable
  e(chi2type)       Wald; type of model χ² test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(asis)           asis, if specified
  e(method)         ml
  e(opt)            type of optimization
  e(which)          max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)      type of ml method
  e(user)           name of likelihood-evaluator program
  e(technique)      maximization technique
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(footnote)       program used to implement the footnote display
  e(marginsok)      predictions allowed by margins
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved
Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(rules)          information about perfect predictors
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(Sigma)          estimated Σ
  e(V)              variance–covariance matrix of the estimators
  e(V_modelbased)   model-based variance
Functions
  e(sample)         marks estimation sample
ivprobit, twostep stores the following in e():
Scalars
  e(N)             number of observations
  e(N_cds)         number of completely determined successes
  e(N_cdf)         number of completely determined failures
  e(df_m)          model degrees of freedom
  e(df_exog)       degrees of freedom for χ² test of exogeneity
  e(p)             model Wald p-value
  e(p_exog)        exogeneity test Wald p-value
  e(chi2)          model Wald χ²
  e(chi2_exog)     Wald χ² test of exogeneity
  e(rank)          rank of e(V)
Macros
  e(cmd)           ivprobit
  e(cmdline)       command as typed
  e(depvar)        name of dependent variable
  e(instd)         instrumented variables
  e(insts)         instruments
  e(wtype)         weight type
  e(wexp)          weight expression
  e(chi2type)      Wald; type of model χ² test
  e(vce)           vcetype specified in vce()
  e(vcetype)       title used to label Std. Err.
  e(asis)          asis, if specified
  e(method)        twostep
  e(properties)    b V
  e(estat_cmd)     program used to implement estat
  e(predict)       program used to implement predict
  e(footnote)      program used to implement the footnote display
  e(marginsok)     predictions allowed by margins
  e(asbalanced)    factor variables fvset as asbalanced
  e(asobserved)    factor variables fvset as asobserved
Matrices
  e(b)             coefficient vector
  e(Cns)           constraints matrix
  e(rules)         information about perfect predictors
  e(V)             variance–covariance matrix of the estimators
Functions
  e(sample)        marks estimation sample
Methods and formulas
Fitting limited-dependent variable models with endogenous regressors has received considerable
attention in the econometrics literature. Building on the results of Amemiya (1978,1979), Newey (1987)
developed an efficient method of estimation that encompasses both Rivers and Vuong’s (1988)
simultaneous-equations probit model and Smith and Blundell’s (1986) simultaneous-equations tobit
model. With modern computers, maximum likelihood estimation is feasible as well. For compactness,
we write the model as
y
1i=ziδ+ui(1a)
y2i=xiΠ+vi(1b)
where zi= (y2i,x1i),xi= (x1i,x2i),δ= (β0,γ0)0, and Π= (Π0
1,Π0
2)0.
Deriving the likelihood function is straightforward because we can write the joint density
f(y1i,y2i|xi)as f(y1i|y2i,xi)f(y2i|xi). When there is an endogenous regressor, the log likelihood
for observation iis
ivprobit — Probit model with continuous endogenous regressors 919
lnLi=wiy1ilnΦ (mi) + (1 y1i)ln {1Φ (mi)}+lnφy2ixiΠ
σlnσ
where
mi=ziδ+ρ(y2ixiΠ)
(1 ρ2)1
2
Φ(·)and φ(·)are the standard normal distribution and density functions, respectively; σis the standard
deviation of vi;ρis the correlation coefficient between uiand vi; and wiis the weight for observation
ior one if no weights were specified. Instead of estimating σand ρ, we estimate lnσand atanh ρ,
where
atanh ρ=1
2ln 1 + ρ
1ρ
For multiple endogenous regressors, let

$$
\operatorname{Var}(u_i, \mathbf v_i) = \boldsymbol\Sigma = \begin{bmatrix} 1 & \boldsymbol\Sigma_{21}' \\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} \end{bmatrix}
$$

As in any probit model, we have imposed the normalization Var(u_i) = 1 to identify the model. The log likelihood for observation i is

$$
\ln L_i = w_i\left[ y_{1i}\ln\Phi(m_i) + (1 - y_{1i})\ln\{1 - \Phi(m_i)\} + \ln f(\mathbf y_{2i} \mid \mathbf x_i) \right]
$$

where

$$
\ln f(\mathbf y_{2i} \mid \mathbf x_i) = -\frac{p}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol\Sigma_{22}| - \frac{1}{2}(\mathbf y_{2i} - \mathbf x_i\boldsymbol\Pi)\boldsymbol\Sigma_{22}^{-1}(\mathbf y_{2i} - \mathbf x_i\boldsymbol\Pi)'
$$

and

$$
m_i = \left(1 - \boldsymbol\Sigma_{21}'\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}\right)^{-1/2}\left\{\mathbf z_i\boldsymbol\delta + (\mathbf y_{2i} - \mathbf x_i\boldsymbol\Pi)\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}\right\}
$$

Instead of maximizing the log-likelihood function with respect to Σ, we maximize with respect to the Cholesky decomposition S of Σ; that is, there exists a lower triangular matrix, S, such that SS′ = Σ. This maximization ensures that Σ is positive definite, as a covariance matrix must be. Let

$$
\mathbf S = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
s_{21} & s_{22} & 0 & \cdots & 0 \\
s_{31} & s_{32} & s_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{p+1,1} & s_{p+1,2} & s_{p+1,3} & \cdots & s_{p+1,p+1}
\end{bmatrix}
$$
With maximum likelihood estimation, this command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.

The maximum likelihood version of ivprobit also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
The two-step estimates are obtained using Newey's (1987) minimum chi-squared estimator. The reduced-form equation for $y_{1i}^*$ is

$$
\begin{aligned}
y_{1i}^* &= (\mathbf x_i\boldsymbol\Pi + \mathbf v_i)\boldsymbol\beta + \mathbf x_{1i}\boldsymbol\gamma + u_i \\
&= \mathbf x_i\boldsymbol\alpha + \mathbf v_i\boldsymbol\beta + u_i \\
&= \mathbf x_i\boldsymbol\alpha + \nu_i
\end{aligned}
$$

where $\nu_i = \mathbf v_i\boldsymbol\beta + u_i$. Because u_i and v_i are jointly normal, ν_i is also normal. Note that

$$
\boldsymbol\alpha = \begin{bmatrix} \boldsymbol\Pi_1 \\ \boldsymbol\Pi_2 \end{bmatrix}\boldsymbol\beta + \begin{bmatrix} \mathbf I \\ \mathbf 0 \end{bmatrix}\boldsymbol\gamma = D(\boldsymbol\Pi)\boldsymbol\delta
$$

where $D(\boldsymbol\Pi) = (\boldsymbol\Pi, \mathbf I_1)$ and $\mathbf I_1$ is defined such that $\mathbf x_i\mathbf I_1 = \mathbf x_{1i}$. Letting $\widehat{\mathbf z}_i = (\mathbf x_i\widehat{\boldsymbol\Pi}, \mathbf x_{1i})$, $\widehat{\mathbf z}_i\boldsymbol\delta = \mathbf x_i D(\widehat{\boldsymbol\Pi})\boldsymbol\delta$, where $D(\widehat{\boldsymbol\Pi}) = (\widehat{\boldsymbol\Pi}, \mathbf I_1)$. Thus one estimator of α is $D(\widehat{\boldsymbol\Pi})\boldsymbol\delta$; denote this estimator by $\widehat{\boldsymbol\alpha}$.

α could also be estimated directly as the solution to

$$
\max_{\boldsymbol\alpha,\boldsymbol\lambda}\;\sum_{i=1}^N l(y_{1i},\ \mathbf x_i\boldsymbol\alpha + \widehat{\mathbf v}_i\boldsymbol\lambda) \qquad (2)
$$

where l(·) is the log likelihood for probit. Denote this estimator by $\widetilde{\boldsymbol\alpha}$. The inclusion of the $\widehat{\mathbf v}_i\boldsymbol\lambda$ term follows because the multivariate normality of (u_i, v_i) implies that, conditional on y_{2i}, the expected value of u_i is nonzero. Because v_i is unobservable, the least-squares residuals from fitting (1b) are used.
Amemiya (1978) shows that the estimator of δ defined by

$$
\max_{\boldsymbol\delta}\; -(\widetilde{\boldsymbol\alpha} - \widehat{\boldsymbol\alpha})'\,\widehat{\boldsymbol\Omega}^{-1}(\widetilde{\boldsymbol\alpha} - \widehat{\boldsymbol\alpha})
$$

where $\widehat{\boldsymbol\Omega}$ is a consistent estimator of the covariance of $\sqrt N(\widetilde{\boldsymbol\alpha} - \widehat{\boldsymbol\alpha})$, is asymptotically efficient relative to all other estimators that minimize the distance between $\widetilde{\boldsymbol\alpha}$ and $D(\widehat{\boldsymbol\Pi})\boldsymbol\delta$. Thus an efficient estimator of δ is

$$
\widehat{\boldsymbol\delta} = (\widehat{\mathbf D}'\widehat{\boldsymbol\Omega}^{-1}\widehat{\mathbf D})^{-1}\widehat{\mathbf D}'\widehat{\boldsymbol\Omega}^{-1}\widetilde{\boldsymbol\alpha} \qquad (3)
$$

and

$$
\operatorname{Var}(\widehat{\boldsymbol\delta}) = (\widehat{\mathbf D}'\widehat{\boldsymbol\Omega}^{-1}\widehat{\mathbf D})^{-1} \qquad (4)
$$

To implement this estimator, we need $\widehat{\boldsymbol\Omega}^{-1}$.

Consider the two-step maximum likelihood estimator that results from first fitting (1b) by OLS and computing the residuals $\widehat{\mathbf v}_i = \mathbf y_{2i} - \mathbf x_i\widehat{\boldsymbol\Pi}$. The estimator is then obtained by solving

$$
\max_{\boldsymbol\delta,\boldsymbol\lambda}\;\sum_{i=1}^N l(y_{1i},\ \mathbf z_i\boldsymbol\delta + \widehat{\mathbf v}_i\boldsymbol\lambda)
$$
This is the two-step instrumental variables (2SIV) estimator proposed by Rivers and Vuong (1988),
and its role will become apparent shortly.
From Proposition 5 of Newey (1987), $\sqrt N(\widetilde{\boldsymbol\alpha} - \widehat{\boldsymbol\alpha}) \xrightarrow{d} N(\mathbf 0, \boldsymbol\Omega)$, where

$$
\boldsymbol\Omega = \mathbf J_{\alpha\alpha}^{-1} + (\boldsymbol\lambda - \boldsymbol\beta)'\boldsymbol\Sigma_{22}(\boldsymbol\lambda - \boldsymbol\beta)\,\mathbf Q^{-1}
$$

and $\boldsymbol\Sigma_{22} = E\{\mathbf v_i'\mathbf v_i\}$. $\mathbf J_{\alpha\alpha}^{-1}$ is simply the covariance matrix of $\widetilde{\boldsymbol\alpha}$, ignoring that $\widehat{\boldsymbol\Pi}$ is an estimated parameter matrix. Moreover, Newey shows that the covariance matrix from an OLS regression of $\mathbf y_{2i}(\widehat{\boldsymbol\lambda} - \widehat{\boldsymbol\beta})$ on $\mathbf x_i$ is a consistent estimator of the second term. $\widehat{\boldsymbol\lambda}$ can be obtained from solving (2), and the 2SIV estimator yields a consistent estimate, $\widehat{\boldsymbol\beta}$.
Mechanically, estimation proceeds in several steps.

1. Each of the endogenous right-hand-side variables is regressed on all the exogenous variables, and the fitted values and residuals are calculated. The matrix $\widehat{\mathbf D} = D(\widehat{\boldsymbol\Pi})$ is assembled from the estimated coefficients.

2. probit is used to solve (2) and obtain $\widetilde{\boldsymbol\alpha}$ and $\widehat{\boldsymbol\lambda}$. The portion of the covariance matrix corresponding to α, $\mathbf J_{\alpha\alpha}^{-1}$, is also saved.

3. The 2SIV estimator is evaluated, and the parameters $\widehat{\boldsymbol\beta}$ corresponding to $\mathbf y_{2i}$ are collected.

4. $\mathbf y_{2i}(\widehat{\boldsymbol\lambda} - \widehat{\boldsymbol\beta})$ is regressed on $\mathbf x_i$. The covariance matrix of the parameters from this regression is added to $\mathbf J_{\alpha\alpha}^{-1}$, yielding $\widehat{\boldsymbol\Omega}$.

5. Evaluating (3) and (4) yields the estimates $\widehat{\boldsymbol\delta}$ and $\operatorname{Var}(\widehat{\boldsymbol\delta})$.

6. A Wald test of the null hypothesis H_0: λ = 0, using the 2SIV estimates, serves as our test of exogeneity.
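The first-stage regression, the 2SIV probit that includes the first-stage residuals, and the Wald test of exogeneity in step 6 can be sketched interactively with standard commands. This is only an illustration of the control-function logic underlying 2SIV, not the efficient minimum chi-squared estimator that ivprobit, twostep actually computes; it reuses the labor-supply data from the examples:

. use http://www.stata-press.com/data/r13/laborsup
. regress other_inc fem_educ kids male_educ      // reduced form for the endogenous regressor (step 1)
. predict double vhat, residuals                 // first-stage residuals
. probit fem_work fem_educ kids other_inc vhat   // probit including the residuals
. test vhat                                      // Wald test of H0: lambda = 0 (exogeneity, step 6)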
The two-step estimates are not directly comparable to those obtained from the maximum likelihood estimator or from probit. The argument is the same for Newey's efficient estimator as for Rivers and Vuong's (1988) 2SIV estimator, so we consider the simpler 2SIV estimator. From the properties of the normal distribution,

$$
E(u_i \mid \mathbf v_i) = \mathbf v_i\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}
\qquad\text{and}\qquad
\operatorname{Var}(u_i \mid \mathbf v_i) = 1 - \boldsymbol\Sigma_{21}'\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}
$$

We write u_i as $u_i = \mathbf v_i\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21} + e_i = \mathbf v_i\boldsymbol\lambda + e_i$, where $e_i \sim N(0, 1 - \rho^2)$, $\rho^2 = \boldsymbol\Sigma_{21}'\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}$, and e_i is independent of v_i. In the second stage of 2SIV, we use a probit regression to estimate the parameters of

$$
y_{1i} = \mathbf z_i\boldsymbol\delta + \mathbf v_i\boldsymbol\lambda + e_i
$$

Because v_i is unobservable, we use the sample residuals from the first-stage regressions.

$$
\Pr(y_{1i} = 1 \mid \mathbf z_i, \mathbf v_i) = \Pr(\mathbf z_i\boldsymbol\delta + \mathbf v_i\boldsymbol\lambda + e_i > 0 \mid \mathbf z_i, \mathbf v_i) = \Phi\left\{(1 - \rho^2)^{-1/2}(\mathbf z_i\boldsymbol\delta + \mathbf v_i\boldsymbol\lambda)\right\}
$$

Hence, as mentioned previously, 2SIV and Newey's estimator do not estimate δ and λ but rather

$$
\boldsymbol\delta_\rho = \frac{1}{(1 - \rho^2)^{1/2}}\,\boldsymbol\delta
\qquad\text{and}\qquad
\boldsymbol\lambda_\rho = \frac{1}{(1 - \rho^2)^{1/2}}\,\boldsymbol\lambda
$$
Acknowledgments
The two-step estimator is based on the probitiv command written by Jonah Gelbach of the
Department of Economics at Yale University and the ivprob command written by Joe Harkness of
the Institute of Policy Studies at Johns Hopkins University.
References
Amemiya, T. 1978. The estimation of a simultaneous equation generalized probit model. Econometrica 46: 1193–1205.
. 1979. The estimation of a simultaneous-equation tobit model. International Economic Review 20: 169–181.
Finlay, K., and L. M. Magnusson. 2009. Implementing weak-instrument robust tests for a general class of instrumental-variables models. Stata Journal 9: 398–421.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Newey, W. K. 1987. Efficient estimation of limited dependent variable models with endogenous explanatory variables.
Journal of Econometrics 36: 231–250.
Rivers, D., and Q. H. Vuong. 1988. Limited information estimators and exogeneity tests for simultaneous probit
models. Journal of Econometrics 39: 347–366.
Smith, R. J., and R. Blundell. 1986. An exogeneity test for the simultaneous equation tobit model with an application
to labor supply. Econometrica 54: 679–685.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R]ivprobit postestimation Postestimation tools for ivprobit
[R]gmm Generalized method of moments estimation
[R]ivregress Single-equation instrumental-variables regression
[R]ivtobit Tobit model with continuous endogenous regressors
[R]probit Probit regression
[SVY]svy estimation Estimation commands for survey data
[XT]xtprobit Random-effects and population-averaged probit models
[U] 20 Estimation and postestimation commands
Title
ivprobit postestimation — Postestimation tools for ivprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are of special interest after ivprobit:
Command Description
estat classification report various summary statistics, including the classification table
lroc compute area under ROC curve and graph the curve
lsens graph sensitivity and specificity versus probability cutoff
These commands are not appropriate after the two-step estimator or the svy prefix.
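For example, after refitting the labor-supply model of example 1 in [R] ivprobit by maximum likelihood, these commands can be issued directly; a minimal sketch:

. use http://www.stata-press.com/data/r13/laborsup
. ivprobit fem_work fem_educ kids (other_inc = male_educ)
. estat classification
. lroc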
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic¹ Akaike's and Schwarz's Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast² dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest³ likelihood-ratio test; not available with two-step estimator
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest¹ seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ estat ic and suest are not appropriate after ivprobit, twostep.
² forecast is not appropriate with svy estimation results or after ivprobit, twostep.
³ lrtest is not appropriate with svy estimation results.
Syntax for predict
After ML or twostep

    predict [type] newvar [if] [in] [, statistic rules asif]

After ML

    predict [type] {stub* | newvarlist} [if] [in], scores
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
pr probability of a positive outcome; not available with two-step estimator
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
pr calculates the probability of a positive outcome. pr is not available with the two-step estimator.
rules requests that Stata use any rules that were used to identify the model when making the
prediction. By default, Stata calculates missing for excluded observations. rules is not available
with the two-step estimator.
asif requests that Stata ignore the rules and the exclusion criteria and calculate predictions for all
observations possible using the estimated parameters from the model. asif is not available with
the two-step estimator.
scores, not available with twostep, calculates equation-level score variables.

For models with one endogenous regressor, four new variables are created.

The first new variable will contain ∂lnL/∂(z_i δ).
The second new variable will contain ∂lnL/∂(x_i Π).
The third new variable will contain ∂lnL/∂ atanh ρ.
The fourth new variable will contain ∂lnL/∂ lnσ.

For models with p endogenous regressors, p + {(p + 1)(p + 2)}/2 new variables are created.

The first new variable will contain ∂lnL/∂(z_i δ).
The second through (p + 1)th new variables will contain ∂lnL/∂(x_i Π_k), k = 1, ..., p, where Π_k is the kth column of Π.
The remaining score variables will contain the partial derivatives of lnL with respect to s_21, s_31, ..., s_{p+1,1}, s_22, ..., s_{p+1,2}, ..., s_{p+1,p+1}, where s_{m,n} denotes the (m, n) element of the Cholesky decomposition of the error covariance matrix.
Remarks and examples
Remarks are presented under the following headings:
Marginal effects
Obtaining predicted values
Marginal effects
Example 1
We can obtain marginal effects by using the margins command after ivprobit. We will calculate
average marginal effects by using the labor-supply model of example 1 in [R]ivprobit.
. use http://www.stata-press.com/data/r13/laborsup
. ivprobit fem_work fem_educ kids (other_inc = male_educ)
(output omitted )
. margins, dydx(*) predict(pr)
Average marginal effects Number of obs = 500
Model VCE : OIM
Expression : Probability of positive outcome, predict(pr)
dy/dx w.r.t. : other_inc fem_educ kids male_educ
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.014015 .0009836 -14.25 0.000 -.0159428 -.0120872
fem_educ .0545129 .0066007 8.26 0.000 .0415758 .06745
kids -.0470199 .0123397 -3.81 0.000 -.0712052 -.0228346
male_educ 0 (omitted)
Here we see that a $1,000 increase in other_inc leads to an average decrease of 0.014 in the probability that the woman has a job. male_educ has no effect because it appears only as an instrument.
Obtaining predicted values
After fitting your model with ivprobit, you can obtain the linear prediction and its standard
error for both the estimation sample and other samples by using the predict command; see
[U] 20 Estimation and postestimation commands and [R]predict. If you had used the maximum
likelihood estimator, you could also obtain the probability of a positive outcome.
predict's pr option calculates the probability of a positive outcome, remembering any rules used to identify the model, and calculates missing for excluded observations. predict's rules option uses the rules in predicting probabilities, whereas predict's asif option ignores both the rules and the exclusion criteria and calculates probabilities for all possible observations by using the estimated parameters from the model. See Obtaining predicted values in [R] probit postestimation for an example.
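A minimal sketch, reusing the maximum likelihood fit from the marginal-effects example above (the new variable names xbhat and phat are arbitrary):

. ivprobit fem_work fem_educ kids (other_inc = male_educ)
. predict xbhat, xb      // linear prediction
. predict phat, pr       // probability of a positive outcome (ML estimator only)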
Methods and formulas
The linear prediction is calculated as $\mathbf z_i\widehat{\boldsymbol\delta}$, where $\widehat{\boldsymbol\delta}$ is the estimated value of δ, and z_i and δ are defined in (1a) of [R] ivprobit. The probability of a positive outcome is $\Phi(\mathbf z_i\widehat{\boldsymbol\delta})$, where Φ(·) is the standard normal distribution function.
Also see
[R]ivprobit Probit model with continuous endogenous regressors
[R]estat classification Classification statistics and table
[R]lroc Compute area under ROC curve and graph the curve
[R]lsens Graph sensitivity and specificity versus probability cutoff
[U] 20 Estimation and postestimation commands
Title
ivregress — Single-equation instrumental-variables regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
    ivregress estimator depvar varlist1 (varlist2 = varlistiv) [if] [in] [weight] [, options]
estimator Description
2sls two-stage least squares (2SLS)
liml limited-information maximum likelihood (LIML)
gmm generalized method of moments (GMM)
options                  Description
Model
  noconstant             suppress constant term
  hascons                has user-supplied constant
GMM¹
  wmatrix(wmtype)        wmtype may be robust, cluster clustvar, hac kernel, or unadjusted
  center                 center moments in weight matrix computation
  igmm                   use iterative instead of two-step GMM estimator
  eps(#)²                specify # for parameter convergence criterion; default is eps(1e-6)
  weps(#)²               specify # for weight matrix convergence criterion; default is weps(1e-6)
  optimization_options²  control the optimization process; seldom used
SE/Robust
  vce(vcetype)           vcetype may be unadjusted, robust, cluster clustvar, bootstrap, jackknife, or hac kernel
Reporting
  level(#)               set confidence level; default is level(95)
  first                  report first-stage regression
  small                  make degrees-of-freedom adjustments and report small-sample statistics
  noheader               display only the coefficient table
  depname(depname)       substitute dependent variable name
  eform(string)          report exponentiated coefficients and use string to label them
  display_options        control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
  perfect                do not check for collinearity between endogenous regressors and excluded instruments
  coeflegend             display legend instead of statistics
¹ These options may be specified only when gmm is specified.
² These options may be specified only when igmm is specified.
varlist1, varlist2, and varlistiv may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, varlist1, varlist2, and varlistiv may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
hascons, vce(), noheader, depname(), and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
perfect and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Endogenous covariates > Single-equation instrumental-variables regression
Description
ivregress fits a linear regression of depvar on varlist1 and varlist2, using varlistiv (along with varlist1) as instruments for varlist2. ivregress supports estimation via two-stage least squares (2SLS), limited-information maximum likelihood (LIML), and generalized method of moments (GMM).

In the language of instrumental variables, varlist1 and varlistiv are the exogenous variables, and varlist2 are the endogenous variables.
Options
 
Model
noconstant; see [R]estimation options.
hascons indicates that a user-defined constant or its equivalent is specified among the independent
variables.
 
GMM
wmatrix(wmtype)specifies the type of weighting matrix to be used in conjunction with the GMM
estimator.
Specifying wmatrix(robust) requests a weighting matrix that is optimal when the error term is
heteroskedastic. wmatrix(robust) is the default.
Specifying wmatrix(cluster clustvar)requests a weighting matrix that accounts for arbitrary
correlation among observations within clusters identified by clustvar.
Specifying wmatrix(hac kernel #) requests a heteroskedasticity- and autocorrelation-consistent (HAC) weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a kernel is equal to # + 1.

Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel, and the lag order is selected using Newey and West's (1994) optimal lag-selection algorithm.

Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and N − 2 lags, where N is the sample size.
There are three kernels available for HAC weighting matrices, and you may request each one by
using the name used by statisticians or the name perhaps more familiar to economists:
bartlett or nwest requests the Bartlett (Newey–West) kernel;
parzen or gallant requests the Parzen (Gallant 1987) kernel; and
quadraticspectral or andrews requests the quadratic spectral (Andrews 1991) kernel.
Specifying wmatrix(unadjusted) requests a weighting matrix that is suitable when the errors are
homoskedastic. The GMM estimator with this weighting matrix is equivalent to the 2SLS estimator.
center requests that the sample moments be centered (demeaned) when computing GMM weight
matrices. By default, centering is not done.
igmm requests that the iterative GMM estimator be used instead of the default two-step GMM estimator.
Convergence is declared when the relative change in the parameter vector from one iteration to
the next is less than eps() or the relative change in the weight matrix is less than weps().
eps(#)specifies the convergence criterion for successive parameter estimates when the iterative GMM
estimator is used. The default is eps(1e-6). Convergence is declared when the relative difference
between successive parameter estimates is less than eps() and the relative difference between
successive estimates of the weighting matrix is less than weps().
weps(#)specifies the convergence criterion for successive estimates of the weighting matrix when
the iterative GMM estimator is used. The default is weps(1e-6). Convergence is declared when
the relative difference between successive parameter estimates is less than eps() and the relative
difference between successive estimates of the weighting matrix is less than weps().
optimization_options: iterate(#), nolog. iterate() specifies the maximum number of iterations to perform in conjunction with the iterative GMM estimator. The default is 16,000 or the number set using set maxiter (see [R] maximize). log/nolog specifies whether to show the iteration log. These options are seldom used.
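For instance, the iterative estimator with a tighter parameter convergence criterion could be requested as follows; a minimal sketch using the housing data from the examples later in this entry:

. ivregress gmm rent pcturban (hsngval = faminc i.region), igmm eps(1e-8) wmatrix(robust)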
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar),
and that use bootstrap or jackknife methods (bootstrap,jackknife); see [R]vce option.
vce(unadjusted), the default for 2sls and liml, specifies that an unadjusted (nonrobust) VCE
matrix be used. The default for gmm is based on the wmtype specified in the wmatrix() option;
see wmatrix(wmtype)above. If wmatrix() is specified with gmm but vce() is not, then vcetype
is set equal to wmtype. To override this behavior and obtain an unadjusted (nonrobust) VCE matrix,
specify vce(unadjusted).
ivregress also allows the following:
vce(hac kernel #|opt )specifies that an HAC covariance matrix be used. The syntax used
with vce(hac kernel . . .)is identical to that used with wmatrix(hac kernel . . .); see
wmatrix(wmtype)above.
 
Reporting
level(#); see [R]estimation options.
first requests that the first-stage regression results be displayed.
small requests that the degrees-of-freedom adjustment N/(N − k) be made to the variance–covariance matrix of parameters and that small-sample F and t statistics be reported, where N is the sample size and k is the number of parameters estimated. By default, no degrees-of-freedom adjustment is made, and Wald and z statistics are reported. Even with this option, no degrees-of-freedom adjustment is made to the weighting matrix when the GMM estimator is used.
noheader suppresses the display of the summary statistics at the top of the output, displaying only
the coefficient table.
depname(depname) is used only in programs and ado-files that use ivregress to fit models other than instrumental-variables regression. depname() may be specified only at estimation time. depname is recorded as the identity of the dependent variable, even though the estimates are calculated using depvar. This method affects the labeling of the output, not the results calculated, but could affect later calculations made by predict, where the residual would be calculated as deviations from depname rather than depvar. depname() is most typically used when depvar is a temporary variable (see [P] macro) used as a proxy for depname.
eform(string)is used only in programs and ado-files that use ivregress to fit models other
than instrumental-variables regression. eform() specifies that the coefficient table be displayed in
“exponentiated form”, as defined in [R]maximize, and that string be used to label the exponentiated
coefficients in the table.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
The following options are available with ivregress but are not shown in the dialog box:
perfect requests that ivregress not check for collinearity between the endogenous regressors and
excluded instruments, allowing one to specify “perfect” instruments. This option cannot be used
with the LIML estimator. This option may be required when using ivregress to implement other
estimators.
coeflegend; see [R]estimation options.
Remarks and examples
ivregress performs instrumental-variables regression and weighted instrumental-variables regression. For a general discussion of instrumental variables, see Baum (2006), Cameron and Trivedi (2005; 2010, chap. 6), Davidson and MacKinnon (1993, 2004), Greene (2012, chap. 8), and Wooldridge (2010, 2013). See Hall (2005) for a lucid presentation of GMM estimation. Angrist and Pischke (2009, chap. 4) offer a casual yet thorough introduction to instrumental-variables estimators, including their use in estimating treatment effects. Some of the earliest work on simultaneous systems can be found in the Cowles Commission monographs, Koopmans and Marschak (1950) and Koopmans and Hood (1953), with the first developments of 2SLS appearing in Theil (1953) and Basmann (1957). However, Stock and Watson (2011, 422–424) present an example of the method of instrumental variables that was first published in 1928 by Philip Wright.
The syntax for ivregress assumes that you want to fit one equation from a system of equations
or an equation for which you do not want to specify the functional form for the remaining equations
of the system. To fit a full system of equations, using either 2SLS equation-by-equation or three-stage
least squares, see [R]reg3. An advantage of ivregress is that you can fit one equation of a
multiple-equation system without specifying the functional form of the remaining equations.
Formally, the model fit by ivregress is

$$
\begin{aligned}
y_i &= \mathbf y_i\boldsymbol\beta_1 + \mathbf x_{1i}\boldsymbol\beta_2 + u_i \qquad\quad (1)\\
\mathbf y_i &= \mathbf x_{1i}\boldsymbol\Pi_1 + \mathbf x_{2i}\boldsymbol\Pi_2 + \mathbf v_i \qquad (2)
\end{aligned}
$$

Here y_i is the dependent variable for the ith observation, $\mathbf y_i$ represents the endogenous regressors (varlist2 in the syntax diagram), $\mathbf x_{1i}$ represents the included exogenous regressors (varlist1 in the syntax diagram), and $\mathbf x_{2i}$ represents the excluded exogenous regressors (varlistiv in the syntax diagram). $\mathbf x_{1i}$ and $\mathbf x_{2i}$ are collectively called the instruments. u_i and v_i are zero-mean error terms, and the correlations between u_i and the elements of v_i are presumably nonzero.
The rest of the discussion is presented under the following headings:
2SLS and LIML estimators
GMM estimator
2SLS and LIML estimators
The most common instrumental-variables estimator is 2SLS.
Example 1: 2SLS estimator
We have state data from the 1980 census on the median dollar value of owner-occupied housing (hsngval) and the median monthly gross rent (rent). We want to model rent as a function of hsngval and the percentage of the population living in urban areas (pcturban):

$$
\text{rent}_i = \beta_0 + \beta_1\,\text{hsngval}_i + \beta_2\,\text{pcturban}_i + u_i
$$

where i indexes states and u_i is an error term.

Because random shocks that affect rental rates in a state probably also affect housing values, we treat hsngval as endogenous. We believe that the correlation between hsngval and u is not equal to zero. On the other hand, we have no reason to believe that the correlation between pcturban and u is nonzero, so we assume that pcturban is exogenous.
Because we are treating hsngval as an endogenous regressor, we must have one or more additional
variables available that are correlated with hsngval but uncorrelated with u. Moreover, these excluded
exogenous variables must not affect rent directly, because if they do then they should be included
in the regression equation we specified above. In our dataset, we have a variable for family income
(faminc) and for region of the country (region) that we believe are correlated with hsngval but
not the error term. Together, pcturban, faminc, and factor variables 2.region, 3.region, and 4.region constitute our set of instruments.
To fit the equation in Stata, we specify the dependent variable and the list of included exogenous
variables. In parentheses, we specify the endogenous regressors, an equal sign, and the excluded
exogenous variables. Only the additional exogenous variables must be specified to the right of the
equal sign; the exogenous variables that appear in the regression equation are automatically included
as instruments.
Here we fit our model with the 2SLS estimator:
. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
Instrumental variables (2SLS) regression Number of obs = 50
Wald chi2(2) = 90.76
Prob > chi2 = 0.0000
R-squared = 0.5989
Root MSE = 22.166
rent Coef. Std. Err. z P>|z| [95% Conf. Interval]
hsngval .0022398 .0003284 6.82 0.000 .0015961 .0028836
pcturban .081516 .2987652 0.27 0.785 -.504053 .667085
_cons 120.7065 15.22839 7.93 0.000 90.85942 150.5536
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
As we would expect, states with higher housing values have higher rental rates. The proportion
of a state’s population that is urban does not have a significant effect on rents.
Technical note
In a simultaneous-equations framework, we could write the model we just fit as

$$
\begin{aligned}
\text{hsngval}_i &= \pi_0 + \pi_1\,\text{faminc}_i + \pi_2\,\text{2.region}_i + \pi_3\,\text{3.region}_i + \pi_4\,\text{4.region}_i + v_i \\
\text{rent}_i &= \beta_0 + \beta_1\,\text{hsngval}_i + \beta_2\,\text{pcturban}_i + u_i
\end{aligned}
$$

which here happens to be recursive (triangular), because hsngval appears in the equation for rent but rent does not appear in the equation for hsngval. In general, however, systems of simultaneous equations are not recursive. Because this system is recursive, we could fit the two equations individually via OLS if we were willing to assume that u and v were independent. For a more detailed discussion of triangular systems, see Kmenta (1997, 719–720).
Historically, instrumental-variables estimation and systems of simultaneous equations were taught
concurrently, and older textbooks describe instrumental-variables estimation solely in the context of
simultaneous equations. However, in recent decades, the treatment of endogeneity and instrumental-
variables estimation has taken on a much broader scope, while interest in the specification of
complete systems of simultaneous equations has waned. Most recent textbooks, such as Cameron
and Trivedi (2005), Davidson and MacKinnon (1993,2004), and Wooldridge (2010,2013), treat
instrumental-variables estimation as an integral part of the modern economists’ toolkit and introduce
it long before shorter discussions on simultaneous equations.
In addition to the 2SLS member of the κ-class estimators, ivregress implements the LIML
estimator. Both theoretical and Monte Carlo exercises indicate that the LIML estimator may yield less
bias and confidence intervals with better coverage rates than the 2SLS estimator. See Poi (2006) and
Stock, Wright, and Yogo (2002) (and the papers cited therein) for Monte Carlo evidence.
Example 2: LIML estimator
Here we refit our model with the LIML estimator:
. ivregress liml rent pcturban (hsngval = faminc i.region)
Instrumental variables (LIML) regression Number of obs = 50
Wald chi2(2) = 75.71
Prob > chi2 = 0.0000
R-squared = 0.4901
Root MSE = 24.992
rent Coef. Std. Err. z P>|z| [95% Conf. Interval]
hsngval .0026686 .0004173 6.39 0.000 .0018507 .0034865
pcturban -.1827391 .3571132 -0.51 0.609 -.8826681 .5171899
_cons 117.6087 17.22625 6.83 0.000 83.84587 151.3715
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
These results are qualitatively similar to the 2SLS results, although the coefficient on hsngval is
about 19% higher.
GMM estimator
Since the celebrated paper of Hansen (1982), the GMM has been a popular method of estimation in economics and finance, and it lends itself well to instrumental-variables estimation. The basic principle is that we have some moment or orthogonality conditions of the form

$$
E(\mathbf z_i'u_i) = \mathbf 0 \qquad (3)
$$

From (1), we have $u_i = y_i - \mathbf y_i\boldsymbol\beta_1 - \mathbf x_{1i}\boldsymbol\beta_2$. What are the elements of the instrument vector $\mathbf z_i$? By assumption, $\mathbf x_{1i}$ is uncorrelated with u_i, as are the excluded exogenous variables $\mathbf x_{2i}$, and so we use $\mathbf z_i = [\mathbf x_{1i}\ \mathbf x_{2i}]$. The moment conditions are simply the mathematical representation of the assumption that the instruments are exogenous, that is, the instruments are orthogonal to (uncorrelated with) u_i.

If the number of elements in $\mathbf z_i$ is just equal to the number of unknown parameters, then we can apply the analogy principle to (3) and solve

$$
\frac{1}{N}\sum_i \mathbf z_i'u_i = \frac{1}{N}\sum_i \mathbf z_i'(y_i - \mathbf y_i\boldsymbol\beta_1 - \mathbf x_{1i}\boldsymbol\beta_2) = \mathbf 0 \qquad (4)
$$

This equation is known as the method of moments estimator. Here, where the number of instruments equals the number of parameters, the method of moments estimator coincides with the 2SLS estimator, which also coincides with what has historically been called the indirect least-squares estimator (Judge et al. 1985, 595).

The "generalized" in GMM addresses the case in which the number of instruments (columns of $\mathbf z_i$) exceeds the number of parameters to be estimated. Here there is no unique solution to the population moment conditions defined in (3), so we cannot use (4). Instead, we define the objective function

$$
Q(\boldsymbol\beta_1, \boldsymbol\beta_2) = \left(\frac{1}{N}\sum_i \mathbf z_i'u_i\right)' \mathbf W \left(\frac{1}{N}\sum_i \mathbf z_i'u_i\right) \qquad (5)
$$
where W is a positive-definite matrix with the same number of rows and columns as the number of columns of z_i. W is known as the weighting matrix, and we specify its structure with the wmatrix() option. The GMM estimator of (β_1, β_2) minimizes Q(β_1, β_2); that is, the GMM estimator chooses β_1 and β_2 to make the moment conditions as close to zero as possible for a given W. For a more general GMM estimator, see [R] gmm. gmm does not restrict you to fitting a single linear equation, though the syntax is more complex.
A well-known result is that if we define the matrix $\mathbf S_0$ to be the covariance of $\mathbf z_i'u_i$ and set $\mathbf W = \mathbf S_0^{-1}$, then we obtain the optimal two-step GMM estimator, where by optimal estimator we mean the one that results in the smallest variance given the moment conditions defined in (3).

Suppose that the errors u_i are heteroskedastic but independent among observations. Then

$$
\mathbf S_0 = E(\mathbf z_i'u_iu_i\mathbf z_i) = E(u_i^2\,\mathbf z_i'\mathbf z_i)
$$

and the sample analogue is

$$
\widehat{\mathbf S} = \frac{1}{N}\sum_i \widehat u_i^2\,\mathbf z_i'\mathbf z_i \qquad (6)
$$

To implement this estimator, we need estimates of the sample residuals $\widehat u_i$. ivregress gmm obtains the residuals by estimating β_1 and β_2 by 2SLS and then evaluates (6) and sets $\mathbf W = \widehat{\mathbf S}^{-1}$. Equation (6) is the same as the center term of the "sandwich" robust covariance matrix available from most Stata estimation commands through the vce(robust) option.
Example 3: GMM estimator
Here we refit our model of rents by using the GMM estimator, allowing for heteroskedasticity in
ui:
. ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust)
Instrumental variables (GMM) regression Number of obs = 50
Wald chi2(2) = 112.09
Prob > chi2 = 0.0000
R-squared = 0.6616
GMM weight matrix: Robust Root MSE = 20.358
Robust
rent Coef. Std. Err. z P>|z| [95% Conf. Interval]
hsngval .0014643 .0004473 3.27 0.001 .0005877 .002341
pcturban .7615482 .2895105 2.63 0.009 .1941181 1.328978
_cons 112.1227 10.80234 10.38 0.000 90.95052 133.2949
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
Because we requested that a heteroskedasticity-consistent weighting matrix be used during estimation but did not specify the vce() option, ivregress reported standard errors that are robust to heteroskedasticity. Had we specified vce(unadjusted), we would have obtained standard errors that would be correct only if the weighting matrix $\mathbf W$ does in fact converge to $\mathbf S_0^{-1}$.
Technical note
Many software packages that implement GMM estimation use the same heteroskedasticity-consistent
weighting matrix we used in the previous example to obtain the optimal two-step estimates but do not use
a heteroskedasticity-consistent VCE, even though they may label the standard errors as being “robust”.
To replicate results obtained from other packages, you may have to use the vce(unadjusted) option.
See Methods and formulas below for a discussion of robust covariance matrix estimation in the GMM
framework.
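For instance, to reproduce the point estimates of example 3 while reporting an unadjusted VCE, as some other packages do, one could type the following; a minimal sketch:

. ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust) vce(unadjusted)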
By changing our definition of S0, we can obtain GMM estimators suitable for use with other types
of data that violate the assumption that the errors are independent and identically distributed. For
example, you may have a dataset that consists of multiple observations for each person in a sample.
The observations that correspond to the same person are likely to be correlated, and the estimation
technique should account for that lack of independence. Say that in your dataset, people are identified
by the variable personid and you type
. ivregress gmm ..., wmatrix(cluster personid)
Here ivregress estimates $\mathbf{S}_0$ as
$$
\widehat{\mathbf{S}}=\frac{1}{N}\sum_{c\in C}\mathbf{q}_c\mathbf{q}_c'
$$
where $C$ denotes the set of clusters and
$$
\mathbf{q}_c=\sum_{i\in c_j}\widehat{u}_i\mathbf{z}_i
$$
where $c_j$ denotes the $j$th cluster. This weighting matrix accounts for the within-person correlation among observations, so the GMM estimator that uses this version of $\mathbf{S}_0$ will be more efficient than the estimator that ignores this correlation.
Example 4: GMM estimator with clustering
We have data from the National Longitudinal Survey on young women’s wages as reported in a
series of interviews from 1968 through 1988, and we want to fit a model of wages as a function of
each woman’s age and age squared, job tenure, birth year, and level of education. We believe that
random shocks that affect a woman’s wage also affect her job tenure, so we treat tenure as endogenous.
As additional instruments, we use her union status, number of weeks worked in the past year, and a
dummy indicating whether she lives in a metropolitan area. Because we have several observations for
each woman (corresponding to interviews done over several years), we want to control for clustering
on each person.
. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivregress gmm ln_wage age c.age#c.age birth_yr grade
> (tenure = union wks_work msp), wmatrix(cluster idcode)
Instrumental variables (GMM) regression Number of obs = 18625
Wald chi2(5) = 1807.17
Prob > chi2 = 0.0000
R-squared = .
GMM weight matrix: Cluster (idcode) Root MSE = .46951
(Std. Err. adjusted for 4110 clusters in idcode)
Robust
ln_wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
tenure .099221 .0037764 26.27 0.000 .0918194 .1066227
age .0171146 .0066895 2.56 0.011 .0040034 .0302259
c.age#c.age -.0005191 .000111 -4.68 0.000 -.0007366 -.0003016
birth_yr -.0085994 .0021932 -3.92 0.000 -.012898 -.0043008
grade .071574 .0029938 23.91 0.000 .0657062 .0774417
_cons .8575071 .1616274 5.31 0.000 .5407231 1.174291
Instrumented: tenure
Instruments: age c.age#c.age birth_yr grade union wks_work msp
Both job tenure and years of schooling have significant positive effects on wages.
Time-series data are often plagued by serial correlation. In these cases, we can construct a weighting matrix to account for the fact that the error in period $t$ is probably correlated with the errors in periods $t-1$, $t-2$, and so on. An HAC weighting matrix can be used to account for both serial correlation and potential heteroskedasticity.
To request an HAC weighting matrix, you specify the wmatrix(hac kernel [# | opt]) option. kernel specifies which of three kernels to use: bartlett, parzen, or quadraticspectral. kernel determines the amount of weight given to lagged values when computing the HAC matrix, and # denotes the maximum number of lags to use. Many texts refer to the bandwidth of the kernel instead of the number of lags; the bandwidth is equal to the number of lags plus one. If neither opt nor # is specified, then N - 2 lags are used, where N is the sample size.
If you specify wmatrix(hac kernel opt), then ivregress uses Newey and West’s (1994)
algorithm for automatically selecting the number of lags to use. Although the authors’ Monte Carlo
simulations do show that the procedure may result in size distortions of hypothesis tests, the procedure
is still useful when little other information is available to help choose the number of lags.
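For example, with time-series data one might request a Bartlett kernel with four lags, or let ivregress choose the number of lags automatically. The command skeletons below are sketches of the documented option syntax rather than a worked example:
. ivregress gmm ..., wmatrix(hac bartlett 4)
. ivregress gmm ..., wmatrix(hac bartlett opt)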
For more on GMM estimation, see Baum (2006); Baum, Schaffer, and Stillman (2003, 2007); Cameron and Trivedi (2005); Davidson and MacKinnon (1993, 2004); Hayashi (2000); or Wooldridge (2010). See Newey and West (1987) and Wang and Wu (2012) for an introduction to HAC covariance matrix estimation.
Stored results
ivregress stores the following in e():
Scalars
  e(N)              number of observations
  e(mss)            model sum of squares
  e(df_m)           model degrees of freedom
  e(rss)            residual sum of squares
  e(df_r)           residual degrees of freedom
  e(r2)             R-squared
  e(r2_a)           adjusted R-squared
  e(F)              F statistic
  e(rmse)           root mean squared error
  e(N_clust)        number of clusters
  e(chi2)           χ²
  e(kappa)          κ used in LIML estimator
  e(J)              value of GMM objective function
  e(wlagopt)        lags used in HAC weight matrix (if Newey–West algorithm used)
  e(vcelagopt)      lags used in HAC VCE matrix (if Newey–West algorithm used)
  e(rank)           rank of e(V)
  e(iterations)     number of GMM iterations (0 if not applicable)
Macros
  e(cmd)            ivregress
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(instd)          instrumented variable
  e(insts)          instruments
  e(constant)       noconstant or hasconstant if specified
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output
  e(clustvar)       name of cluster variable
  e(hac_kernel)     HAC kernel
  e(hac_lag)        HAC lag
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(estimator)      2sls, liml, or gmm
  e(exogr)          exogenous regressors
  e(wmatrix)        wmtype specified in wmatrix()
  e(moments)        centered if center specified
  e(small)          small if small-sample statistics
  e(depname)        depname if depname(depname) specified; otherwise same as e(depvar)
  e(properties)     b V
  e(estat_cmd)      program used to implement estat
  e(predict)        program used to implement predict
  e(footnote)       program used to implement footnote display
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved
Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(W)              weight matrix used to compute GMM estimates
  e(S)              moment covariance matrix used to compute GMM variance–covariance matrix
  e(V)              variance–covariance matrix of the estimators
  e(V_modelbased)   model-based variance
Functions
  e(sample)         marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Notation
2SLS and LIML estimators
GMM estimator
Notation
Items printed in lowercase and italicized (for example, x) are scalars. Items printed in lowercase
and boldfaced (for example, x) are vectors. Items printed in uppercase and boldfaced (for example,
X) are matrices.
The model is
$$
\mathbf{y}=\mathbf{Y}\boldsymbol{\beta}_1+\mathbf{X}_1\boldsymbol{\beta}_2+\mathbf{u}=\mathbf{X}\boldsymbol{\beta}+\mathbf{u}
$$
$$
\mathbf{Y}=\mathbf{X}_1\boldsymbol{\Pi}_1+\mathbf{X}_2\boldsymbol{\Pi}_2+\mathbf{V}=\mathbf{Z}\boldsymbol{\Pi}+\mathbf{V}
$$
where $\mathbf{y}$ is an $N\times 1$ vector of the left-hand-side variable; $N$ is the sample size; $\mathbf{Y}$ is an $N\times p$ matrix of $p$ endogenous regressors; $\mathbf{X}_1$ is an $N\times k_1$ matrix of $k_1$ included exogenous regressors; $\mathbf{X}_2$ is an $N\times k_2$ matrix of $k_2$ excluded exogenous variables; $\mathbf{X}=[\mathbf{Y}\ \mathbf{X}_1]$, $\mathbf{Z}=[\mathbf{X}_1\ \mathbf{X}_2]$; $\mathbf{u}$ is an $N\times 1$ vector of errors; $\mathbf{V}$ is an $N\times p$ matrix of errors; $\boldsymbol{\beta}=[\boldsymbol{\beta}_1\ \boldsymbol{\beta}_2]$ is a $k=(p+k_1)\times 1$ vector of parameters; and $\boldsymbol{\Pi}$ is a $(k_1+k_2)\times p$ matrix of parameters. If a constant term is included in the model, then one column of $\mathbf{X}_1$ contains all ones.
Let $\mathbf{v}$ be a column vector of weights specified by the user. If no weights are specified, $\mathbf{v}=\mathbf{1}$. Let $\mathbf{w}$ be a column vector of normalized weights. If no weights are specified or if the user specified fweights or iweights, $\mathbf{w}=\mathbf{v}$; otherwise, $\mathbf{w}=\{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})$. Let $\mathbf{D}$ denote the $N\times N$ matrix with $\mathbf{w}$ on the main diagonal and zeros elsewhere. If no weights are specified, $\mathbf{D}$ is the identity matrix.
The weighted number of observations $n$ is defined as $\mathbf{1}'\mathbf{w}$. For iweights, this is truncated to an integer. The sum of the weights is $\mathbf{1}'\mathbf{v}$. Define $c=1$ if there is a constant in the regression and zero otherwise.
The order condition for identification requires that $k_2\ge p$: the number of excluded exogenous variables must be at least as great as the number of endogenous regressors.
In the following formulas, if weights are specified, $\mathbf{X}_1'\mathbf{X}_1$, $\mathbf{X}'\mathbf{X}$, $\mathbf{X}'\mathbf{y}$, $\mathbf{y}'\mathbf{y}$, $\mathbf{Z}'\mathbf{Z}$, $\mathbf{Z}'\mathbf{X}$, and $\mathbf{Z}'\mathbf{y}$ are replaced with $\mathbf{X}_1'\mathbf{D}\mathbf{X}_1$, $\mathbf{X}'\mathbf{D}\mathbf{X}$, $\mathbf{X}'\mathbf{D}\mathbf{y}$, $\mathbf{y}'\mathbf{D}\mathbf{y}$, $\mathbf{Z}'\mathbf{D}\mathbf{Z}$, $\mathbf{Z}'\mathbf{D}\mathbf{X}$, and $\mathbf{Z}'\mathbf{D}\mathbf{y}$, respectively. We suppress the $\mathbf{D}$ below to simplify the notation.
2SLS and LIML estimators
Define the κ-class estimator of $\boldsymbol{\beta}$ as
$$
\mathbf{b}=\{\mathbf{X}'(\mathbf{I}-\kappa\mathbf{M}_Z)\mathbf{X}\}^{-1}\mathbf{X}'(\mathbf{I}-\kappa\mathbf{M}_Z)\mathbf{y}
$$
where $\mathbf{M}_Z=\mathbf{I}-\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$. The 2SLS estimator results from setting $\kappa=1$. The LIML estimator results from selecting $\kappa$ to be the minimum eigenvalue of $(\mathbf{Y}'\mathbf{M}_Z\mathbf{Y})^{-1/2}\mathbf{Y}'\mathbf{M}_{X_1}\mathbf{Y}(\mathbf{Y}'\mathbf{M}_Z\mathbf{Y})^{-1/2}$, where $\mathbf{M}_{X_1}=\mathbf{I}-\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$.
The total sum of squares (TSS) equals $\mathbf{y}'\mathbf{y}$ if there is no intercept and $\mathbf{y}'\mathbf{y}-(\mathbf{1}'\mathbf{y})^2/n$ otherwise. Its degrees of freedom is $n-c$. The error sum of squares (ESS) is defined as $\mathbf{y}'\mathbf{y}-2\mathbf{b}'\mathbf{X}'\mathbf{y}+\mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$. The model sum of squares (MSS) equals TSS $-$ ESS. Its degrees of freedom is $k-c$.
The mean squared error, $s^2$, is defined as $\mathrm{ESS}/(n-k)$ if small is specified and $\mathrm{ESS}/n$ otherwise. The root mean squared error is $s$, its square root.
If $c=1$ and small is not specified, a Wald statistic, $W$, of the joint significance of the $k-1$ parameters of $\boldsymbol{\beta}$ except the constant term is calculated; $W\sim\chi^2(k-1)$. If $c=1$ and small is specified, then an $F$ statistic is calculated as $F=W/(k-1)$; $F\sim F(k-1,\,n-k)$.
The R-squared is defined as $R^2=1-\mathrm{ESS/TSS}$.
The adjusted R-squared is $R^2_a=1-(1-R^2)(n-c)/(n-k)$.
If robust is not specified, then $\mathrm{Var}(\mathbf{b})=s^2\{\mathbf{X}'(\mathbf{I}-\kappa\mathbf{M}_Z)\mathbf{X}\}^{-1}$. For a discussion of robust variance estimates in regression and regression with instrumental variables, see Methods and formulas in [R] regress. If small is not specified, then $k=0$ in the formulas given there.
This command also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
GMM estimator
We obtain an initial consistent estimate of $\boldsymbol{\beta}$ by using the 2SLS estimator; see above. Using this estimate of $\boldsymbol{\beta}$, we compute the weighting matrix $\mathbf{W}$ and calculate the GMM estimator
$$
\mathbf{b}_{\mathrm{GMM}}=(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y}
$$
The variance of $\mathbf{b}_{\mathrm{GMM}}$ is
$$
\mathrm{Var}(\mathbf{b}_{\mathrm{GMM}})=n(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\widehat{\mathbf{S}}\mathbf{W}\mathbf{Z}'\mathbf{X}(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}
$$
$\mathrm{Var}(\mathbf{b}_{\mathrm{GMM}})$ is of the sandwich form DMD; see [P] _robust. If the user specifies the small option, ivregress implements a small-sample adjustment by multiplying the VCE by $N/(N-k)$.
If vce(unadjusted) is specified, then we set $\widehat{\mathbf{S}}=\mathbf{W}^{-1}$ and the VCE reduces to the "optimal" GMM variance estimator
$$
\mathrm{Var}(\boldsymbol{\beta}_{\mathrm{GMM}})=n(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}
$$
However, if $\mathbf{W}^{-1}$ is not a good estimator of $E(\mathbf{z}_i u_i u_i \mathbf{z}_i')$, then the optimal GMM estimator is inefficient, and inference based on the optimal variance estimator could be misleading.
$\mathbf{W}$ is calculated using the residuals from the initial 2SLS estimates, whereas $\mathbf{S}$ is estimated using the residuals based on $\mathbf{b}_{\mathrm{GMM}}$. The wmatrix() option affects the form of $\mathbf{W}$, whereas the vce() option affects the form of $\mathbf{S}$. Except for the different residuals being used, the formulas for $\mathbf{W}^{-1}$ and $\mathbf{S}$ are identical, so we focus on estimating $\mathbf{W}^{-1}$.
If wmatrix(unadjusted) is specified, then
$$
\mathbf{W}^{-1}=\frac{s^2}{n}\sum_i \mathbf{z}_i\mathbf{z}_i'
$$
where $s^2=\sum_i u_i^2/n$. This weight matrix is appropriate if the errors are homoskedastic.
If wmatrix(robust) is specified, then
$$
\mathbf{W}^{-1}=\frac{1}{n}\sum_i u_i^2\,\mathbf{z}_i\mathbf{z}_i'
$$
which is appropriate if the errors are heteroskedastic.
If wmatrix(cluster clustvar) is specified, then
$$
\mathbf{W}^{-1}=\frac{1}{n}\sum_c \mathbf{q}_c\mathbf{q}_c'
$$
where $c$ indexes clusters,
$$
\mathbf{q}_c=\sum_{i\in c_j} u_i\mathbf{z}_i
$$
and $c_j$ denotes the $j$th cluster.
If wmatrix(hac kernel #) is specified, then
$$
\mathbf{W}^{-1}=\frac{1}{n}\sum_i u_i^2\,\mathbf{z}_i\mathbf{z}_i'
+\frac{1}{n}\sum_{l=1}^{n-1}\sum_{i=l+1}^{n} K(l,m)\,u_i u_{i-l}\left(\mathbf{z}_i\mathbf{z}_{i-l}'+\mathbf{z}_{i-l}\mathbf{z}_i'\right)
$$
where $m=\#$ if $\#$ is specified and $m=n-2$ otherwise. Define $z=l/(m+1)$. If kernel is nwest, then
$$
K(l,m)=\begin{cases}1-z & 0\le z\le 1\\ 0 & \text{otherwise}\end{cases}
$$
If kernel is gallant, then
$$
K(l,m)=\begin{cases}1-6z^2+6z^3 & 0\le z\le 0.5\\ 2(1-z)^3 & 0.5< z\le 1\\ 0 & \text{otherwise}\end{cases}
$$
If kernel is quadraticspectral, then
$$
K(l,m)=\begin{cases}1 & z=0\\ 3\left\{\sin(\theta)/\theta-\cos(\theta)\right\}/\theta^2 & \text{otherwise}\end{cases}
$$
where $\theta=6\pi z/5$.
If wmatrix(hac kernel opt) is specified, then ivregress uses Newey and West's (1994) automatic lag-selection algorithm, which proceeds as follows. Define $\mathbf{h}$ to be a $(k_1+k_2)\times 1$ vector containing ones in all rows except for the row corresponding to the constant term (if present); that row contains a zero. Define
$$
f_i=(\widehat{u}_i\mathbf{z}_i)\mathbf{h}
$$
$$
\widehat{\sigma}_j=\frac{1}{n}\sum_{i=j+1}^{n} f_i f_{i-j}\qquad j=0,\ldots,m^{*}
$$
$$
\widehat{s}^{(q)}=2\sum_{j=1}^{m^{*}}\widehat{\sigma}_j\,j^{q}
$$
$$
\widehat{s}^{(0)}=\widehat{\sigma}_0+2\sum_{j=1}^{m^{*}}\widehat{\sigma}_j
$$
$$
\widehat{\gamma}=c_{\gamma}\left\{\left(\frac{\widehat{s}^{(q)}}{\widehat{s}^{(0)}}\right)^{2}\right\}^{1/(2q+1)}
$$
$$
m=\widehat{\gamma}\,n^{1/(2q+1)}
$$
where $q$, $m^{*}$, and $c_{\gamma}$ depend on the kernel specified:

Kernel                 $q$    $m^{*}$                            $c_{\gamma}$
Bartlett               1      int$\{20(T/100)^{2/9}\}$           1.1447
Parzen                 2      int$\{20(T/100)^{4/25}\}$          2.6614
Quadratic spectral     2      int$\{20(T/100)^{2/25}\}$          1.3221

where int$(x)$ denotes the integer obtained by truncating $x$ toward zero. For the Bartlett and Parzen kernels, the optimal lag is $\min\{\mathrm{int}(m),m^{*}\}$. For the quadratic spectral kernel, the optimal lag is $\min\{m,m^{*}\}$.
If center is specified, then when computing weighting matrices ivregress replaces the term $u_i\mathbf{z}_i$ in the formulas above with $u_i\mathbf{z}_i-\overline{\mathbf{uz}}$, where $\overline{\mathbf{uz}}=\sum_i u_i\mathbf{z}_i/N$.
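For example, to center the sample moments while using a heteroskedasticity-consistent weight matrix, one might type something along the following lines; this is a sketch of the documented center option rather than a worked example:
. ivregress gmm ..., wmatrix(robust) center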
References
Andrews, D. W. K. 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica
59: 817–858.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Basmann, R. L. 1957. A generalized classical method of linear estimation of coefficients in a structural equation.
Econometrica 25: 77–83.
Bauldry, S. 2014. miivfind: A command for identifying model-implied instrumental variables for structural equation models in Stata. Stata Journal 14: 60–75.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata Journal 3: 1–31.
. 2007. Enhanced routines for instrumental variables/generalized method of moments estimation and testing. Stata Journal 7: 465–506.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Desbordes, R., and V. Verardi. 2012. A robust instrumental-variables estimator. Stata Journal 12: 169–181.
Finlay, K., and L. M. Magnusson. 2009. Implementing weak-instrument robust tests for a general class of instrumental-variables models. Stata Journal 9: 398–421.
Gallant, A. R. 1987. Nonlinear Statistical Models. New York: Wiley.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50:
1029–1054.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Koopmans, T. C., and W. C. Hood. 1953. Studies in Econometric Method. New York: Wiley.
Koopmans, T. C., and J. Marschak. 1950. Statistical Inference in Dynamic Economic Models. New York: Wiley.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703–708.
. 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61: 631–653.
Nichols, A. 2007. Causal inference with observational data. Stata Journal 7: 507–541.
Palmer, T. M., V. Didelez, R. R. Ramsahai, and N. A. Sheehan. 2011. Nonparametric bounds for the causal effect in a binary instrumental-variable model. Stata Journal 11: 345–367.
Poi, B. P. 2006. Jackknife instrumental variables estimation in Stata. Stata Journal 6: 364–376.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.
Stock, J. H., J. H. Wright, and M. Yogo. 2002. A survey of weak instruments and weak identification in generalized
method of moments. Journal of Business and Economic Statistics 20: 518–529.
Theil, H. 1953. Repeated Least Squares Applied to Complete Equation Systems. Mimeograph from the Central
Planning Bureau, The Hague.
Wang, Q., and N. Wu. 2012. Long-run covariance and its applications in cointegration regression. Stata Journal 12: 515–542.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Wright, P. G. 1928. The Tariff on Animal and Vegetable Oils. New York: Macmillan.
Also see
[R] ivregress postestimation      Postestimation tools for ivregress
[R] gmm                           Generalized method of moments estimation
[R] ivprobit                      Probit model with continuous endogenous regressors
[R] ivtobit                       Tobit model with continuous endogenous regressors
[R] reg3                          Three-stage estimation for systems of simultaneous equations
[R] regress                       Linear regression
[SEM] intro 5                     Tour of models
[SVY] svy estimation              Estimation commands for survey data
[TS] forecast                     Econometric model forecasting
[XT] xtivreg                      Instrumental variables and two-stage least squares for panel-data models
[U] 20                            Estimation and postestimation commands
Title
ivregress postestimation — Postestimation tools for ivregress
Description Syntax for predict Menu for predict Options for predict
Syntax for estat Menu for estat Options for estat Remarks and examples
Stored results Methods and formulas References Also see
Description
The following postestimation commands are of special interest after ivregress:
Command Description
estat endogenous perform tests of endogeneity
estat firststage report “first-stage” regression statistics
estat overid perform tests of overidentifying restrictions
These commands are not appropriate after the svy prefix.
The following postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast1        dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
1 forecast is not appropriate with svy estimation results.
Special-interest postestimation commands
estat endogenous performs tests to determine whether endogenous regressors in the model are
in fact exogenous. After GMM estimation, the C (difference-in-Sargan) statistic is reported. After 2SLS estimation with an unadjusted VCE, the Durbin (1954) and Wu–Hausman (Wu 1974; Hausman 1978) statistics are reported. After 2SLS estimation with a robust VCE, Wooldridge's (1995) robust score test
and a robust regression-based test are reported. In all cases, if the test statistic is significant, then the
variables being tested must be treated as endogenous. estat endogenous is not available after LIML
estimation.
estat firststage reports various statistics that measure the relevance of the excluded exogenous
variables. By default, whether the equation has one or more than one endogenous regressor determines
what statistics are reported.
estat overid performs tests of overidentifying restrictions. If the 2SLS estimator was used,
Sargan's (1958) and Basmann's (1960) χ² tests are reported, as is Wooldridge's (1995) robust score test; if the LIML estimator was used, Anderson and Rubin's (1950) χ² test and Basmann's F test are reported; and if the GMM estimator was used, Hansen's (1982) J statistic χ² test is reported. A statistically significant test statistic always indicates that the instruments may not be valid.
Syntax for predict
predict [type] newvar [if] [in] [, statistic]
predict [type] {stub* | newvarlist} [if] [in], scores

statistic           Description
Main
  xb                linear prediction; the default
  residuals         residuals
  stdp              standard error of the prediction
  stdf              standard error of the forecast
  pr(a,b)           Pr(a < y_j < b)
  e(a,b)            E(y_j | a < y_j < b)
  ystar(a,b)        E(y*_j), y*_j = max{a, min(y_j, b)}

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .) means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
residuals calculates the residuals, that is, y_j − x_j b. These are based on the estimated equation when the observed values of the endogenous variables are used, not the projections of the instruments onto the endogenous variables.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. This is also referred
to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R] regress postestimation.
pr(a,b) calculates Pr(a < x_j b + u_j < b), the probability that y_j|x_j would be observed in the interval (a, b).
    a and b may be specified as numbers or variable names; lb and ub are variable names;
    pr(20,30) calculates Pr(20 < x_j b + u_j < 30);
    pr(lb,ub) calculates Pr(lb < x_j b + u_j < ub); and
    pr(20,ub) calculates Pr(20 < x_j b + u_j < ub).
    a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < x_j b + u_j < 30);
    pr(lb,30) calculates Pr(−∞ < x_j b + u_j < 30) in observations for which lb ≥ . and calculates Pr(lb < x_j b + u_j < 30) elsewhere.
    b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > x_j b + u_j > 20);
    pr(20,ub) calculates Pr(+∞ > x_j b + u_j > 20) in observations for which ub ≥ . and calculates Pr(20 < x_j b + u_j < ub) elsewhere.
e(a,b) calculates E(x_j b + u_j | a < x_j b + u_j < b), the expected value of y_j|x_j conditional on y_j|x_j being in the interval (a, b), meaning that y_j|x_j is truncated. a and b are specified as they are for pr().
ystar(a,b) calculates E(y*_j), where y*_j = a if x_j b + u_j ≤ a, y*_j = b if x_j b + u_j ≥ b, and y*_j = x_j b + u_j otherwise, meaning that y*_j is censored. a and b are specified as they are for pr().
scores calculates the scores for the model. A new score variable is created for each endogenous
regressor, as well as an equation-level score that applies to all exogenous variables and constant
term (if present).
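For instance, after fitting the rent model by 2SLS, one could obtain fitted values, residuals, and the standard error of the prediction as sketched below; the new variable names are arbitrary:
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
. predict double rent_hat, xb
. predict double uhat, residuals
. predict double sehat, stdp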
Syntax for estat
Perform tests of endogeneity
estat endogenous [varlist] [, lags(#) forceweights forcenonrobust]
Report “first-stage” regression statistics
estat firststage [, all forcenonrobust]
Perform tests of overidentifying restrictions
estat overid [, lags(#) forceweights forcenonrobust]
Menu for estat
Statistics >Postestimation >Reports and statistics
Options for estat
Options for estat are presented under the following headings:
Options for estat endogenous
Options for estat firststage
Options for estat overid
Options for estat endogenous
lags(#)specifies the number of lags to use for prewhitening when computing the heteroskedasticity-
and autocorrelation-consistent (HAC) version of the score test of endogeneity. Specifying lags(0)
requests no prewhitening. This option is valid only when the model was fit via 2SLS and an HAC
covariance matrix was requested when the model was fit. The default is lags(1).
forceweights requests that the tests of endogeneity be computed even though aweights, pweights,
or iweights were used in the previous estimation. By default, these tests are conducted only after
unweighted or frequency-weighted estimation. The reported critical values may be inappropriate
for weighted data, so the user must determine whether the critical values are appropriate for a
given application.
forcenonrobust requests that the Durbin and Wu–Hausman tests be performed after 2SLS estimation
even though a robust VCE was used at estimation time. This option is available only if the model
was fit by 2SLS.
Options for estat firststage
all requests that all first-stage goodness-of-fit statistics be reported regardless of whether the model
contains one or more endogenous regressors. By default, if the model contains one endogenous
regressor, then the first-stage R2, adjusted R2, partial R2, and Fstatistics are reported, whereas
if the model contains multiple endogenous regressors, then Shea’s partial R2and adjusted partial
R2are reported instead.
forcenonrobust requests that the minimum eigenvalue statistic and its critical values be reported
even though a robust VCE was used at estimation time. The reported critical values assume that
the errors are independent and identically distributed (i.i.d.) normal, so the user must determine
whether the critical values are appropriate for a given application.
Options for estat overid
lags(#)specifies the number of lags to use for prewhitening when computing the heteroskedasticity-
and autocorrelation-consistent (HAC) version of the score test of overidentifying restrictions.
Specifying lags(0) requests no prewhitening. This option is valid only when the model was fit
via 2SLS and an HAC covariance matrix was requested when the model was fit. The default is
lags(1).
forceweights requests that the tests of overidentifying restrictions be computed even though
aweights, pweights, or iweights were used in the previous estimation. By default, these tests
are conducted only after unweighted or frequency-weighted estimation. The reported critical values
may be inappropriate for weighted data, so the user must determine whether the critical values are
appropriate for a given application.
forcenonrobust requests that the Sargan and Basmann tests of overidentifying restrictions be
performed after 2SLS or LIML estimation even though a robust VCE was used at estimation time.
These tests assume that the errors are i.i.d. normal, so the user must determine whether the critical
values are appropriate for a given application.
Remarks and examples
Remarks are presented under the following headings:
estat endogenous
estat firststage
estat overid
estat endogenous
A natural question to ask is whether a variable presumed to be endogenous in the previously fit
model could instead be treated as exogenous. If the endogenous regressors are in fact exogenous,
then the OLS estimator is more efficient; and depending on the strength of the instruments and other
factors, the sacrifice in efficiency by using an instrumental-variables estimator can be significant.
Thus, unless an instrumental-variables estimator is really needed, OLS should be used instead. estat
endogenous provides several tests of endogeneity after 2SLS and GMM estimation.
Example 1
In example 1 of [R] ivregress, we fit a model of the average rental rate for housing in a state as
a function of the percentage of the population living in urban areas and the average value of houses.
We treated hsngval as endogenous because unanticipated shocks that affect rental rates probably
affect house prices as well. We used family income and region dummies as additional instruments
for hsngval. Here we test whether we could treat hsngval as exogenous.
. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
(output omitted )
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Durbin (score) chi2(1) = 12.8473 (p = 0.0003)
Wu-Hausman F(1,46) = 15.9067 (p = 0.0002)
Because we did not specify any variable names after the estat endogenous command, Stata by
default tested all the endogenous regressors (namely, hsngval) in our model. The null hypothesis
of the Durbin and Wu–Hausman tests is that the variable under consideration can be treated as
exogenous. Here both test statistics are highly significant, so we reject the null of exogeneity; we
must continue to treat hsngval as endogenous.
The difference between the Durbin and Wu–Hausman tests of endogeneity is that the former uses an estimate of the error term's variance based on the model assuming the variables being tested are exogenous, while the latter uses an estimate of the error variance based on the model assuming the variables being tested are endogenous. Under the null hypothesis that the variables being tested are exogenous, both estimates of the error variance are consistent. What we label the Wu–Hausman statistic is Wu's (1974) "T2" statistic, which Hausman (1978) showed can be calculated very easily via linear regression. Baum, Schaffer, and Stillman (2003, 2007) provide a lucid discussion of these tests.
When you fit a model with multiple endogenous regressors, you can test the exogeneity of a subset
of the regressors while continuing to treat the others as endogenous. For example, say you have three
endogenous regressors, y1,y2, and y3, and you fit your model by typing
. ivregress depvar . . . (y1 y2 y3 = . . .)
Suppose you are confident that y1 must be treated as endogenous, but you are undecided about y2
and y3. To test whether y2 and y3 can be treated as exogenous, you would type
. estat endogenous y2 y3
The Durbin and Wu–Hausman tests assume that the error term is i.i.d. Therefore, if you requested
a robust VCE at estimation time, estat endogenous will instead report Wooldridge’s (1995) score
test and a regression-based test of exogeneity. Both these tests can tolerate heteroskedastic and
autocorrelated errors, while only the regression-based test is amenable to clustering.
Example 2
We refit our housing model, requesting robust standard errors, and then test the exogeneity of
hsngval:
. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
(output omitted )
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous
Robust score chi2(1) = 2.10428 (p = 0.1469)
Robust regression F(1,46) = 4.31101 (p = 0.0435)
Wooldridge’s score test does not reject the null hypothesis that hsngval is exogenous at conventional
significance levels (p=0.1469). However, the regression-based test does reject the null hypothesis at
the 5% significance level (p=0.0435). Typically, these two tests yield the same conclusion; the fact
that our dataset has only 50 observations could be contributing to the discrepancy. Here we would
be inclined to continue to treat hsngval as endogenous. Even if hsngval is exogenous, the 2SLS
estimates are still consistent. On the other hand, if hsngval is in fact endogenous, the OLS estimates
would not be consistent. Moreover, as we will see in our discussion of the estat overid command,
our additional instruments may be invalid. To test whether an endogenous variable can be treated as
exogenous, we must have a valid set of instruments to use to fit the model in the first place!
Unlike the Durbin and Wu–Hausman tests, Wooldridge's score and the regression-based tests do not allow you to test a subset of the endogenous regressors in the model; you can test only whether all the endogenous regressors are in fact exogenous.
After GMM estimation, estat endogenous calculates what Hayashi (2000, 220) calls the C statistic, also known as the difference-in-Sargan statistic. The C statistic can be made robust to heteroskedasticity, autocorrelation, and clustering; the version reported by estat endogenous is determined by the weight matrix requested via the wmatrix() option used when fitting the model with ivregress. Additionally, the test can be used to determine the exogeneity of a subset of the endogenous regressors, regardless of the type of weight matrix used.
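For example, after the GMM fit of example 3 of [R] ivregress, the C statistic version of the endogeneity test could be requested as sketched below; with no varlist, all instrumented variables (here hsngval) are tested:
. ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust)
. estat endogenous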
If you fit your model using the LIML estimator, you can use the hausman command to carry out
a traditional Hausman (1978) test between the OLS and LIML estimates.
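A sketch of that comparison for the rent model follows; the names used with estimates store are arbitrary, and the estimator that is consistent under the alternative (here LIML) is listed first in hausman:
. ivregress liml rent pcturban (hsngval = faminc i.region)
. estimates store liml
. regress rent hsngval pcturban
. estimates store ols
. hausman liml ols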
estat firststage
For an excluded exogenous variable to be a valid instrument, it must be sufficiently correlated with
the included endogenous regressors but uncorrelated with the error term. In recent decades, researchers
have paid considerable attention to the issue of instruments that are only weakly correlated with the
endogenous regressors. In such cases, the usual 2SLS, GMM, and LIML estimators are biased toward the
OLS estimator, and inference based on the standard errors reported by, for example, ivregress can be
severely misleading. For more information on the theory behind instrumental-variables estimation with
weak instruments, see Nelson and Startz (1990); Staiger and Stock (1997); Hahn and Hausman (2003);
the survey article by Stock, Wright, and Yogo (2002); and Angrist and Pischke (2009, chap. 4).
When the instruments are only weakly correlated with the endogenous regressors, some Monte
Carlo evidence suggests that the LIML estimator performs better than the 2SLS and GMM estimators;
see, for example, Poi (2006) and Stock, Wright, and Yogo (2002) (and the papers cited therein). On
the other hand, the LIML estimator often results in confidence intervals that are somewhat larger than
those from the 2SLS estimator.
Moreover, using more instruments is not a solution, because the biases of instrumental-variables
estimators increase with the number of instruments. See Hahn and Hausman (2003).
estat firststage produces several statistics for judging the explanatory power of the instruments
and is most easily explained with examples.
Example 3
Again building on the model fit in example 1 of [R] ivregress, we now explore the degree of correlation between the additional instruments faminc, 2.region, 3.region, and 4.region and the endogenous regressor hsngval:
. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
(output omitted )
. estat firststage
First-stage regression summary statistics
Adjusted Partial
Variable R-sq. R-sq. R-sq. F(4,44) Prob > F
hsngval 0.6908 0.6557 0.5473 13.2978 0.0000
Minimum eigenvalue statistic = 13.2978
Critical Values # of endogenous regressors: 1
Ho: Instruments are weak # of excluded instruments: 4
5% 10% 20% 30%
2SLS relative bias 16.85 10.27 6.71 5.34
10% 15% 20% 25%
2SLS Size of nominal 5% Wald test 24.58 13.96 10.26 8.31
LIML Size of nominal 5% Wald test 5.44 3.87 3.30 2.98
To understand these results, recall that the first-stage regression is
$$
\texttt{hsngval}_i=\pi_0+\pi_1\,\texttt{pcturban}_i+\pi_2\,\texttt{faminc}_i+\pi_3\,\texttt{2.region}_i+\pi_4\,\texttt{3.region}_i+\pi_5\,\texttt{4.region}_i+v_i
$$
where $v_i$ is an error term. The column marked "R-sq." is the simple R² from fitting the first-stage regression by OLS, and the column marked "Adjusted R-sq." is the adjusted R² from that regression.
Looking at just the R² and adjusted R² can be misleading, however. If hsngval were strongly correlated with the included exogenous variable pcturban but only weakly correlated with the additional instruments, then these statistics could be large even though a weak-instrument problem is present.
The partial R² statistic measures the correlation between hsngval and the additional instruments after partialling out the effect of pcturban. Unlike the R² and adjusted R² statistics, the partial R² statistic will not be inflated because of strong correlation between hsngval and pcturban. Bound, Jaeger, and Baker (1995) and others have promoted using this statistic.
The column marked "F(4, 44)" is an F statistic for the joint significance of π2, π3, π4, and π5, the coefficients on the additional instruments. Its p-value is listed in the column marked "Prob > F". If the F statistic is not significant, then the additional instruments have no significant explanatory power for hsngval after controlling for the effect of pcturban. However, Hall, Rudebusch, and Wilcox (1996) used Monte Carlo simulation to show that simply having an F statistic that is significant at the typical 5% or 10% level is not sufficient. Stock, Wright, and Yogo (2002) suggest that the F statistic should exceed 10 for inference based on the 2SLS estimator to be reliable when there is one endogenous regressor.
estat firststage also presents the Cragg and Donald (1993) minimum eigenvalue statistic as
a further test of weak instruments. Stock and Yogo (2005) discuss two characterizations of weak
instruments: first, weak instruments cause instrumental-variables estimators to be biased; second,
hypothesis tests of parameters estimated by instrumental-variables estimators may suffer from severe
size distortions. The test statistic in our example is 13.30, which is identical to the F statistic just discussed because our model contains one endogenous regressor.
The null hypothesis of each of Stock and Yogo’s tests is that the set of instruments is weak. To
perform these tests, we must first choose either the largest relative bias of the 2SLS estimator we are
willing to tolerate or the largest rejection rate of a nominal 5% Wald test we are willing to tolerate.
If the test statistic exceeds the critical value, we can conclude that our instruments are not weak.
The row marked “2SLS relative bias” contains critical values for the test that the instruments are
weak based on the bias of the 2SLS estimator relative to the bias of the OLS estimator. For example,
from past experience we might know that the OLS estimate of a parameter β may be 50% too high.
Saying that we are willing to tolerate a 10% relative bias means that we are willing to tolerate a
bias of the 2SLS estimator no greater than 5% (that is, 10% of 50%). In our rental rate model, if we
are willing to tolerate a 10% relative bias, then we can conclude that our instruments are not weak because the test statistic of 13.30 exceeds the critical value of 10.27. However, if we were willing to tolerate a relative bias of only 5%, we would conclude that our instruments are weak because 13.30 < 16.85.
The rows marked “2SLS Size of nominal 5% Wald test” and “LIML Size of nominal 5% Wald
test” contain critical values pertaining to Stock and Yogo’s (2005) second characterization of weak
instruments. This characterization defines a set of instruments to be weak if a Wald test at the 5% level
can have an actual rejection rate of no more than 10%, 15%, 20%, or 25%. Using the current example,
suppose that we are willing to accept a rejection rate of at most 10%. Because 13.30 < 24.58, we cannot reject the null hypothesis of weak instruments. On the other hand, if we use the LIML estimator instead, then we can reject the null hypothesis because 13.30 > 5.44.
Technical note
Stock and Yogo (2005) tabulated critical values for 2SLS relative biases of 5%, 10%, 20%, and
30% for models with 1, 2, or 3 endogenous regressors and between 3 and 30 excluded exogenous
variables (instruments). They also provide critical values for worst-case rejection rates of 10%, 15%, 20%, and 25% for nominal 5% Wald tests of the endogenous regressors with 1 or 2 endogenous
regressors and between 1 and 30 instruments. If the model previously fit by ivregress has more
instruments or endogenous regressors than these limits, the critical values are not shown. Stock and
Yogo did not consider GMM estimators.
When the model being fit contains more than one endogenous regressor, the R² and F statistics described above can overstate the relevance of the excluded instruments. Suppose that there are two endogenous regressors, Y1 and Y2, and that there are two additional instruments, z1 and z2. Say that z1 is highly correlated with both Y1 and Y2 but z2 is not correlated with either Y1 or Y2. Then the first-stage regression of Y1 on z1 and z2 (along with the included exogenous variables) will produce large R² and F statistics, as will the regression of Y2 on z1, z2, and the included exogenous variables. Nevertheless, the lack of correlation between z2 and Y1 and Y2 is problematic. Here, although the order condition indicates that the model is just identified (the number of excluded instruments equals the number of endogenous regressors), the irrelevance of z2 implies that the model is in fact not identified. Even if the model is overidentified, including irrelevant instruments can adversely affect the properties of instrumental-variables estimators, because their biases increase as the number of instruments increases.
Example 4
estat firststage presents different statistics when the model contains multiple endogenous
regressors. For illustration, we refit our model of rental rates, assuming that both hsngval and faminc
are endogenously determined. We use i.region along with popden, a measure of population density,
as additional instruments.
. ivregress 2sls rent pcturban (hsngval faminc = i.region popden)
(output omitted )
. estat firststage
Shea’s partial R-squared
Shea’s Shea’s
Variable Partial R-sq. Adj. Partial R-sq.
hsngval 0.3477 0.2735
faminc 0.1893 0.0972
Minimum eigenvalue statistic = 2.51666
Critical Values # of endogenous regressors: 2
Ho: Instruments are weak # of excluded instruments: 4
5% 10% 20% 30%
2SLS relative bias 11.04 7.56 5.57 4.73
10% 15% 20% 25%
2SLS Size of nominal 5% Wald test 16.87 9.93 7.54 6.28
LIML Size of nominal 5% Wald test 4.72 3.39 2.99 2.79
Consider the endogenous regressor hsngval. Part of its variation is attributable to its correlation with the other regressors pcturban and faminc. The other component of hsngval's variation is peculiar to it and orthogonal to the variation in the other regressors. Similarly, we can think of the
instruments as predicting the variation in hsngval in two ways, one stemming from the fact that
the predicted values of hsngval are correlated with the predicted values of the other regressors and
one from the variation in the predicted values of hsngval that is orthogonal to the variation in the
predicted values of the other regressors.
What really matters for instrumental-variables estimation is whether the component of hsngval
that is orthogonal to the other regressors can be explained by the component of the predicted value of
hsngval that is orthogonal to the predicted values of the other regressors in the model. Shea's (1997) partial R² statistic measures this correlation. Because the bias of instrumental-variables estimators increases as more instruments are used, Shea's adjusted partial R² statistic is often used instead, as it makes a degrees-of-freedom adjustment for the number of instruments, analogous to the adjusted R² measure used in OLS regression. Although what constitutes a "low" value for Shea's partial R² depends on the specifics of the model being fit and the data used, these results, taken in isolation, do not strike us as being a particular cause for concern.
However, with this specification the minimum eigenvalue statistic is low. We cannot reject the null
hypothesis of weak instruments for either of the characterizations we have discussed.
By default, estat firststage determines which statistics to present based on the number of
endogenous regressors in the model previously fit. However, you can specify the all option to obtain
all the statistics.
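For example, after fitting the two-endogenous-regressor specification above, the per-regressor R², partial R², and F statistics could be requested in addition to Shea's statistics; the following is a sketch using the documented all option:
. ivregress 2sls rent pcturban (hsngval faminc = i.region popden)
. estat firststage, all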
Technical note
If the previous estimation was conducted using aweights, pweights, or iweights, then the
first-stage regression summary statistics are computed using those weights. However, in these cases
the minimum eigenvalue statistic and its critical values are not available.
If the previous estimation included a robust VCE, then the first-stage F statistic is based on a robust VCE as well; for example, if you fit your model with an HAC VCE using the Bartlett kernel and four lags, then the F statistic reported is based on regression results using an HAC VCE with the
Bartlett kernel and four lags. By default, the minimum eigenvalue statistic and its critical values are
not displayed. You can use the forcenonrobust option to obtain them in these cases; the minimum
eigenvalue statistic is computed using the weights, though the critical values reported may not be
appropriate.
estat overid
In addition to the requirement that instrumental variables be correlated with the endogenous
regressors, the instruments must also be uncorrelated with the structural error term. If the model is
overidentified, meaning that the number of additional instruments exceeds the number of endogenous
regressors, then we can test whether the instruments are uncorrelated with the error term. If the model
is just identified, then we cannot perform a test of overidentifying restrictions.
The estimator you used to fit the model determines which tests of overidentifying restrictions
estat overid reports. If you used the 2SLS estimator without a robust VCE, estat overid reports Sargan's (1958) and Basmann's (1960) χ² tests. If you used the 2SLS estimator and requested a robust VCE, Wooldridge's robust score test of overidentifying restrictions is performed instead; without a robust VCE, Wooldridge's test statistic is identical to Sargan's test statistic. If you used the LIML estimator, estat overid reports the Anderson–Rubin (1950) likelihood-ratio test and Basmann's (1960) F test. estat overid reports Hansen's (1982) J statistic if you used the GMM estimator.
Davidson and MacKinnon (1993, 235–236) give a particularly clear explanation of the intuition behind
tests of overidentifying restrictions. Also see Judge et al. (1985, 614–616) for a summary of tests of
overidentifying restrictions for the 2SLS and LIML estimators.
Tests of overidentifying restrictions actually test two different things simultaneously. One, as we have discussed, is whether the instruments are uncorrelated with the error term. The other is whether the equation is misspecified, with one or more of the excluded exogenous variables in fact belonging in the structural equation. Thus a significant test statistic could represent either an invalid instrument or an incorrectly specified structural equation.
Example 5
Here we refit the model that treated just hsngval as endogenous using 2SLS, and then we perform
tests of overidentifying restrictions:
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
(output omitted )
. estat overid
Tests of overidentifying restrictions:
Sargan (score) chi2(3) = 11.2877 (p = 0.0103)
Basmann chi2(3) = 12.8294 (p = 0.0050)
Both test statistics are significant at the 5% level, which means that either one or more of our
instruments are invalid or that our structural model is specified incorrectly.
One possibility is that the error term in our structural model is heteroskedastic. Both Sargan’s and
Basmann’s tests assume that the errors are i.i.d.; if the errors are not i.i.d., then these tests are not
valid. Here we refit the model by requesting heteroskedasticity-robust standard errors, and then we
use estat overid to obtain Wooldridge’s score test of overidentifying restrictions, which is robust
to heteroskedasticity.
. ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
(output omitted )
. estat overid
Test of overidentifying restrictions:
Score chi2(3) = 6.8364 (p = 0.0773)
Here we no longer reject the null hypothesis that our instruments are valid at the 5% significance
level, though we do reject the null at the 10% level. You can verify that the robust standard error
on the coefficient for hsngval is more than twice as large as its nonrobust counterpart and that the
robust standard error for pcturban is nearly 50% larger.
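One way to make that comparison concrete is to store both sets of results and list them side by side; the stored-results names below are arbitrary:
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
. estimates store iv
. ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
. estimates store iv_robust
. estimates table iv iv_robust, se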
Technical note
The test statistic for the test of overidentifying restrictions performed after GMM estimation is simply the sample size times the value of the objective function Q(β1, β2) defined in (5) of [R] ivregress, evaluated at the GMM parameter estimates. If the weighting matrix W is optimal, meaning that W = {Var(z_i u_i)}^{-1}, then the test statistic is asymptotically distributed χ²(q), where q is the number of overidentifying restrictions. However, if the estimated W is not optimal, then the test statistic will not have an asymptotic χ² distribution.
Like the Sargan and Basmann tests of overidentifying restrictions for the 2SLS estimator, the Anderson–Rubin and Basmann tests after LIML estimation are predicated on the errors' being i.i.d. If the previous LIML results were reported with robust standard errors, then estat overid by default issues an error message and refuses to report the Anderson–Rubin and Basmann test statistics. You can use the forcenonrobust option to override this behavior. You can also use forcenonrobust to obtain the Sargan and Basmann test statistics after 2SLS estimation with robust standard errors.
By default, estat overid issues an error message if the previous estimation was conducted using aweights, pweights, or iweights. You can use the forceweights option to override this behavior, though the test statistics may no longer have the expected χ² distributions.
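For instance, after a LIML fit with robust standard errors, the i.i.d.-based tests could be requested anyway with something like the following sketch, keeping in mind the caveats above:
. ivregress liml rent pcturban (hsngval = faminc i.region), vce(robust)
. estat overid, forcenonrobust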
Stored results
After 2SLS estimation, estat endogenous stores the following in r():
Scalars
  r(durbin)         Durbin χ² statistic
  r(p_durbin)       p-value for Durbin χ² statistic
  r(wu)             Wu–Hausman F statistic
  r(p_wu)           p-value for Wu–Hausman F statistic
  r(df)             degrees of freedom
  r(wudf_r)         denominator degrees of freedom for Wu–Hausman F
  r(r_score)        robust score statistic
  r(p_r_score)      p-value for robust score statistic
  r(hac_score)      HAC score statistic
  r(p_hac_score)    p-value for HAC score statistic
  r(lags)           lags used in prewhitening
  r(regF)           regression-based F statistic
  r(p_regF)         p-value for regression-based F statistic
  r(regFdf_n)       regression-based F numerator degrees of freedom
  r(regFdf_r)       regression-based F denominator degrees of freedom
After GMM estimation, estat endogenous stores the following in r():
Scalars
  r(C)              C χ² statistic
  r(p_C)            p-value for C χ² statistic
  r(df)             degrees of freedom
estat firststage stores the following in r():
Scalars
  r(mineig)         minimum eigenvalue statistic
Matrices
  r(mineigcv)       critical values for minimum eigenvalue statistic
  r(multiresults)   Shea's partial R² statistics
  r(singleresults)  first-stage R² and F statistics
After 2SLS estimation, estat overid stores the following in r():
Scalars
  r(lags)           lags used in prewhitening
  r(df)             χ² degrees of freedom
  r(score)          score χ² statistic
  r(p_score)        p-value for score χ² statistic
  r(basmann)        Basmann χ² statistic
  r(p_basmann)      p-value for Basmann χ² statistic
  r(sargan)         Sargan χ² statistic
  r(p_sargan)       p-value for Sargan χ² statistic
After LIML estimation, estat overid stores the following in r():
Scalars
  r(ar)             Anderson–Rubin χ² statistic
  r(p_ar)           p-value for Anderson–Rubin χ² statistic
  r(ar_df)          χ² degrees of freedom
  r(basmann)        Basmann F statistic
  r(p_basmann)      p-value for Basmann F statistic
  r(basmann_df_n)   F numerator degrees of freedom
  r(basmann_df_d)   F denominator degrees of freedom
After GMM estimation, estat overid stores the following in r():
Scalars
  r(HansenJ)        Hansen's J χ² statistic
  r(p_HansenJ)      p-value for Hansen's J χ² statistic
  r(J_df)           χ² degrees of freedom
Methods and formulas
Methods and formulas are presented under the following headings:
Notation
estat endogenous
estat firststage
estat overid
Notation
Recall from [R] ivregress that the model is
$$
\mathbf{y}=\mathbf{Y}\boldsymbol{\beta}_1+\mathbf{X}_1\boldsymbol{\beta}_2+\mathbf{u}=\mathbf{X}\boldsymbol{\beta}+\mathbf{u}
$$
$$
\mathbf{Y}=\mathbf{X}_1\boldsymbol{\Pi}_1+\mathbf{X}_2\boldsymbol{\Pi}_2+\mathbf{V}=\mathbf{Z}\boldsymbol{\Pi}+\mathbf{V}
$$
where $\mathbf{y}$ is an $N\times 1$ vector of the left-hand-side variable, $N$ is the sample size, $\mathbf{Y}$ is an $N\times p$ matrix of $p$ endogenous regressors, $\mathbf{X}_1$ is an $N\times k_1$ matrix of $k_1$ included exogenous regressors, $\mathbf{X}_2$ is an $N\times k_2$ matrix of $k_2$ excluded exogenous variables, $\mathbf{X}=[\mathbf{Y}\ \mathbf{X}_1]$, $\mathbf{Z}=[\mathbf{X}_1\ \mathbf{X}_2]$, $\mathbf{u}$ is an $N\times 1$ vector of errors, $\mathbf{V}$ is an $N\times p$ matrix of errors, $\boldsymbol{\beta}=[\boldsymbol{\beta}_1\ \boldsymbol{\beta}_2]$ is a $k=(p+k_1)\times 1$ vector of parameters, and $\boldsymbol{\Pi}$ is a $(k_1+k_2)\times p$ matrix of parameters. If a constant term is included in the model, then one column of $\mathbf{X}_1$ contains all ones.
estat endogenous
Partition $\mathbf{Y}$ as $\mathbf{Y}=[\mathbf{Y}_1\ \mathbf{Y}_2]$, where $\mathbf{Y}_1$ represents the $p_1$ endogenous regressors whose endogeneity is being tested and $\mathbf{Y}_2$ represents the $p_2$ endogenous regressors whose endogeneity is not being tested. If the endogeneity of all endogenous regressors is being tested, $\mathbf{Y}=\mathbf{Y}_1$ and $p_2=0$. After GMM estimation, estat endogenous refits the model treating $\mathbf{Y}_1$ as exogenous using the same type of weight matrix as requested at estimation time with the wmatrix() option; denote the Sargan statistic from this model by $J_e$ and the estimated weight matrix by $\mathbf{W}_e$. Let $\mathbf{S}_e=\mathbf{W}_e^{-1}$. estat endogenous removes from $\mathbf{S}_e$ the rows and columns corresponding to the variables represented by $\mathbf{Y}_1$ and inverts the resulting matrix. Next estat endogenous fits the model treating both $\mathbf{Y}_1$ and $\mathbf{Y}_2$ as endogenous, using that inverted submatrix as the weight matrix; denote the Sargan statistic from this model by $J_c$. Then $C=(J_e-J_c)\sim\chi^2(p_1)$. If one simply used the $J$ statistic from the original model fit by ivregress in place of $J_c$, then in finite samples $J_e-J$ might be negative. The procedure used by estat endogenous is guaranteed to yield $C\ge 0$; see Hayashi (2000, 220).
Let $\widehat{\mathbf{u}}_c$ denote the residuals from the model treating both $\mathbf{Y}_1$ and $\mathbf{Y}_2$ as endogenous, and let $\widehat{\mathbf{u}}_e$ denote the residuals from the model treating only $\mathbf{Y}_2$ as endogenous. Then Durbin's (1954) statistic is
$$
D=\frac{\widehat{\mathbf{u}}_e'\mathbf{P}_{ZY_1}\widehat{\mathbf{u}}_e-\widehat{\mathbf{u}}_c'\mathbf{P}_Z\widehat{\mathbf{u}}_c}{\widehat{\mathbf{u}}_e'\widehat{\mathbf{u}}_e/N}
$$
where $\mathbf{P}_Z=\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$ and $\mathbf{P}_{ZY_1}=[\mathbf{Z}\ \mathbf{Y}_1]\left([\mathbf{Z}\ \mathbf{Y}_1]'[\mathbf{Z}\ \mathbf{Y}_1]\right)^{-1}[\mathbf{Z}\ \mathbf{Y}_1]'$. $D\sim\chi^2(p_1)$. The Wu–Hausman (Wu 1974; Hausman 1978) statistic is
$$
WH=\frac{\left(\widehat{\mathbf{u}}_e'\mathbf{P}_{ZY_1}\widehat{\mathbf{u}}_e-\widehat{\mathbf{u}}_c'\mathbf{P}_Z\widehat{\mathbf{u}}_c\right)/p_1}
{\left\{\widehat{\mathbf{u}}_e'\widehat{\mathbf{u}}_e-\left(\widehat{\mathbf{u}}_e'\mathbf{P}_{ZY_1}\widehat{\mathbf{u}}_e-\widehat{\mathbf{u}}_c'\mathbf{P}_Z\widehat{\mathbf{u}}_c\right)\right\}/(N-k_1-p-p_1)}
$$
$WH\sim F(p_1,\ N-k_1-p-p_1)$. Baum, Schaffer, and Stillman (2003, 2007) discuss these tests in more detail.
Next we describe Wooldridge's (1995) score test. The nonrobust version of Wooldridge's test is identical to Durbin's test. Suppose a robust covariance matrix was used at estimation time. Let $\widehat{\mathbf{e}}$ denote the sample residuals obtained by fitting the model via OLS, treating $\mathbf{Y}$ as exogenous. We then regress each variable represented in $\mathbf{Y}$ on $\mathbf{Z}$; call the residuals for the $j$th regression $\widehat{\mathbf{r}}_j$, $j=1,\ldots,p$. Define $\widehat{k}_{ij}=\widehat{e}_i\widehat{r}_{ij}$, $i=1,\ldots,N$. We then run the regression
$$
\mathbf{1}=\theta_1\widehat{\mathbf{k}}_1+\cdots+\theta_p\widehat{\mathbf{k}}_p+\boldsymbol{\epsilon}
$$
where $\mathbf{1}$ is an $N\times 1$ vector of ones and $\boldsymbol{\epsilon}$ is a regression error term. $N-\mathrm{RSS}\sim\chi^2(p)$, where RSS is the residual sum of squares from the regression just described. If instead an HAC VCE was used at estimation time, then before running the final regression we prewhiten the $\widehat{\mathbf{k}}_j$ series by using a VAR(q) model, where $q$ is the number of lags specified with the lags() option.
The regression-based test proceeds as follows. Following Hausman (1978, 1259), we regress $\mathbf{Y}$ on $\mathbf{Z}$ and obtain the residuals $\widehat{\mathbf{V}}$. Next we fit the augmented regression
$$
\mathbf{y}=\mathbf{Y}\boldsymbol{\beta}_1+\mathbf{X}_1\boldsymbol{\beta}_2+\widehat{\mathbf{V}}\boldsymbol{\gamma}+\boldsymbol{\epsilon}
$$
by OLS regression, where $\boldsymbol{\epsilon}$ is a regression error term. A test of the exogeneity of $\mathbf{Y}$ is equivalent to a test of $\boldsymbol{\gamma}=\mathbf{0}$. As Cameron and Trivedi (2005, 276) suggest, this test can be made robust to heteroskedasticity, autocorrelation, or clustering by using the appropriate robust VCE when testing $\boldsymbol{\gamma}=\mathbf{0}$. When a nonrobust VCE is used, this test is equivalent to the Wu–Hausman test described earlier. One cannot simply fit this augmented regression via 2SLS to test the endogeneity of a subset of the endogenous regressors; Davidson and MacKinnon (1993, 229–231) discuss a test of $\boldsymbol{\gamma}=\mathbf{0}$ for the homoskedastic version of the augmented regression fit by 2SLS, but an appropriate robust test is not apparent.
estat firststage
When the structural equation includes one endogenous regressor, estat firststage fits the regression
$$
\mathbf{Y}=\mathbf{X}_1\boldsymbol{\pi}_1+\mathbf{X}_2\boldsymbol{\pi}_2+\mathbf{v}
$$
via OLS. The $R^2$ and adjusted $R^2$ from that regression are reported in the output, as well as the $F$ statistic from the Wald test of $H_0\colon\boldsymbol{\pi}_2=\mathbf{0}$. To obtain the partial $R^2$ statistic, estat firststage fits the regression
$$
\mathbf{M}_{X_1}\mathbf{y}=\mathbf{M}_{X_1}\mathbf{X}_2\boldsymbol{\xi}+\boldsymbol{\epsilon}
$$
by OLS, where $\boldsymbol{\epsilon}$ is a regression error term, $\boldsymbol{\xi}$ is a $k_2\times 1$ parameter vector, and $\mathbf{M}_{X_1}=\mathbf{I}-\mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$; that is, the partial $R^2$ is the $R^2$ between $\mathbf{y}$ and $\mathbf{X}_2$ after eliminating the effects of $\mathbf{X}_1$. If the model contains multiple endogenous regressors and the all option is specified, these statistics are calculated for each endogenous regressor in turn.
To calculate Shea's partial $R^2$, let $y_1$ denote the endogenous regressor whose statistic is being calculated and $Y_0$ denote the other endogenous regressors. Define $\widetilde{y}_1$ as the residuals obtained from regressing $y_1$ on $Y_0$ and $X_1$. Let $\widehat{y}_1$ denote the fitted values obtained from regressing $y_1$ on $X_1$ and $X_2$; that is, $\widehat{y}_1$ are the fitted values from the first-stage regression for $y_1$, and define the columns of $\widehat{Y}_0$ analogously. Finally, let $\widetilde{\widehat{y}}_1$ denote the residuals from regressing $\widehat{y}_1$ on $\widehat{Y}_0$ and $X_1$. Shea's partial $R^2$ is the simple $R^2$ from the regression of $\widetilde{y}_1$ on $\widetilde{\widehat{y}}_1$; denote this as $R^2_S$. Shea's adjusted partial $R^2$ is equal to $1 - (1 - R^2_S)(N-1)/(N - k_Z + 1)$ if a constant term is included and $1 - (1 - R^2_S)(N-1)/(N - k_Z)$ if there is no constant term included in the model, where $k_Z = k_1 + k_2$. For one endogenous regressor, one instrument, no exogenous regressors, and a constant term, $R^2_S$ equals the adjusted $R^2_S$.
The Stock and Yogo minimum eigenvalue statistic, first proposed by Cragg and Donald (1993) as a test for underidentification, is the minimum eigenvalue of the matrix
$$
G = \frac{1}{k_Z}\,\widehat{\Sigma}_{VV}^{-1/2}\, Y' M_{X_1}' X_2\, (X_2' M_{X_1} X_2)^{-1}\, X_2' M_{X_1} Y\, \widehat{\Sigma}_{VV}^{-1/2}
$$
where
$$
\widehat{\Sigma}_{VV} = \frac{1}{N - k_Z}\, Y' M_Z Y
$$
$M_Z = I - Z(Z'Z)^{-1}Z'$, and $Z = [X_1\; X_2]$. Critical values are obtained from the tables in Stock and Yogo (2005).
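These first-stage summary statistics, including the minimum eigenvalue statistic and the Stock–Yogo critical values, are reported by estat firststage after estimation; for instance, assuming the same hsng2 model as above:

. ivregress 2sls rent pcturban (hsngval = faminc i.region)
. estat firststage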
estat overid
The Sargan (1958) and Basmann (1960) $\chi^2$ statistics are calculated by running the auxiliary regression
$$
\widehat{u} = Z\delta + e
$$
where $\widehat{u}$ are the sample residuals from the model and $e$ is an error term. Then Sargan's statistic is
$$
S = N\left(1 - \frac{\widehat{e}'\widehat{e}}{\widehat{u}'\widehat{u}}\right)
$$
where $\widehat{e}$ are the residuals from that auxiliary regression. Basmann's statistic is calculated as
$$
B = S\,\frac{N - k_Z}{N - S}
$$
Both $S$ and $B$ are distributed $\chi^2(m)$, where $m$, the number of overidentifying restrictions, is equal to $k_Z - k$, where $k$ is the total number of regressors in the structural equation (the endogenous regressors plus the included exogenous regressors).
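Because the residuals from a 2SLS fit that includes a constant have mean zero, Sargan's statistic can equivalently be computed as $N$ times the $R^2$ from the auxiliary regression of the residuals on the full instrument set. A sketch, again assuming the hsng2 model (estat overid reports the official statistics):

. ivregress 2sls rent pcturban (hsngval = faminc i.region)
. estat overid
. predict double uhat, residuals
. quietly regress uhat pcturban faminc i.region
. display "Sargan chi2 = " e(N)*e(r2)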
Wooldridge’s (1995) score test of overidentifying restrictions is identical to Sargan’s (1958) statistic
under the assumption of i.i.d. and therefore is not recomputed unless a robust VCE was used at estimation
time. If a heteroskedasticity-robust VCE was used, Wooldridge’s test proceeds as follows. Let b
Ydenote
the N×kmatrix of fitted values obtained by regressing the endogenous regressors on X1and X2.
Let Qdenote an N×mmatrix of excluded exogenous variables; the test statistic to be calculated is
invariant to whichever mof the k2excluded exogenous variables is chosen. Define the ith element
of b
kj,i=1, . . . , N,j=1, . . . , m, as
kij =bqij ui
where bqij is the ith element of b
qj, the fitted values from regressing the jth column of Qon b
Yand
X1. Finally, fit the regression
1=θ1b
k1+··· +θmb
km+
where 1is an N×1 vector of ones and is a regression error term, and calculate the residual sum
of squares, RSS. Then the test statistic is W=NRSS.Wχ2(m). If an HAC VCE was used at
estimation, then the b
kjare prewhitened using a VAR(p) model, where pis specified using the lags()
option.
The Anderson–Rubin (1950), AR, test of overidentifying restrictions for use after the LIML estimator is calculated as $AR = N(\kappa - 1)$, where $\kappa$ is the minimal eigenvalue of a certain matrix defined in Methods and formulas of [R] ivregress. $AR \sim \chi^2(m)$. (Some texts define this statistic as $N\ln(\kappa)$ because $\ln(x) \approx x - 1$ for $x$ near one.) Basmann's $F$ statistic for use after the LIML estimator is calculated as $BF = (\kappa - 1)(N - k_Z)/m$. $BF \sim F(m,\, N - k_Z)$.
Hansen's $J$ statistic is simply the sample size times the value of the GMM objective function defined in (5) of [R] ivregress, evaluated at the estimated parameter values. Under the null hypothesis that the overidentifying restrictions are valid, $J \sim \chi^2(m)$.
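estat overid reports the statistics appropriate to the estimator used: the Anderson–Rubin and Basmann $F$ tests after LIML and Hansen's $J$ after GMM. A brief sketch with the same hsng2 model:

. ivregress liml rent pcturban (hsngval = faminc i.region)
. estat overid
. ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust)
. estat overid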
References
Anderson, T. W., and H. Rubin. 1950. The asymptotic properties of estimates of the parameters of a single equation
in a complete system of stochastic equations. Annals of Mathematical Statistics 21: 570–582.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Basmann, R. L. 1960. On finite sample distributions of generalized classical linear identifiability test statistics. Journal
of the American Statistical Association 55: 650–659.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing.Stata
Journal 3: 1–31.
. 2007. Enhanced routines for instrumental variables/generalized method of moments estimation and testing.Stata
Journal 7: 465–506.
Bound, J., D. A. Jaeger, and R. M. Baker. 1995. Problems with instrumental variables estimation when the correlation
between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical
Association 90: 443–450.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cragg, J. G., and S. G. Donald. 1993. Testing identifiability and specification in instrumental variable models.
Econometric Theory 9: 222–240.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Durbin, J. 1954. Errors in variables. Review of the International Statistical Institute 22: 23–32.
Hahn, J., and J. A. Hausman. 2003. Weak instruments: Diagnosis and cures in empirical econometrics. American
Economic Review Papers and Proceedings 93: 118–125.
Hall, A. R., G. D. Rudebusch, and D. W. Wilcox. 1996. Judging instrument relevance in instrumental variables
estimation. International Economic Review 37: 283–298.
Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50:
1029–1054.
Hausman, J. A. 1978. Specification tests in econometrics. Econometrica 46: 1251–1271.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
Nelson, C. R., and R. Startz. 1990. The distribution of the instrumental variable estimator and its t ratio when the
instrument is a poor one. Journal of Business 63: S125–S140.
Poi, B. P. 2006. Jackknife instrumental variables estimation in Stata.Stata Journal 6: 364–376.
Sargan, J. D. 1958. The estimation of economic relationships using instrumental variables. Econometrica 26: 393–415.
Shea, J. S. 1997. Instrument relevance in multivariate linear models: A simple measure. Review of Economics and
Statistics 79: 348–352.
Staiger, D. O., and J. H. Stock. 1997. Instrumental variables regression with weak instruments. Econometrica 65:
557–586.
Stock, J. H., J. H. Wright, and M. Yogo. 2002. A survey of weak instruments and weak identification in generalized
method of moments. Journal of Business and Economic Statistics 20: 518–529.
Stock, J. H., and M. Yogo. 2005. Testing for weak instruments in linear IV regression. In Identification and Inference
for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. D. W. K. Andrews and J. H. Stock, 80–108.
New York: Cambridge University Press.
Wooldridge, J. M. 1995. Score diagnostics for linear models estimated by two stage least squares. In Advances in
Econometrics and Quantitative Economics: Essays in Honor of Professor C. R. Rao, ed. G. S. Maddala, P. C. B.
Phillips, and T. N. Srinivasan, 66–87. Oxford: Blackwell.
Wu, D.-M. 1974. Alternative tests of independence between stochastic regressors and disturbances: Finite sample
results. Econometrica 42: 529–546.
Also see
[R]ivregress Single-equation instrumental-variables regression
[U] 20 Estimation and postestimation commands
Title
ivtobit — Tobit model with continuous endogenous regressors
Syntax Menu Description
Options for ML estimator Options for two-step estimator Remarks and examples
Stored results Methods and formulas Acknowledgments
References Also see
Syntax
Maximum likelihood estimator

    ivtobit depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight] [, ll(#) ul(#) mle_options]

Two-step estimator

    ivtobit depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight], twostep [ll(#) ul(#) tse_options]
mle_options                 Description
Model
  ll(#)                     lower limit for left censoring
  ul(#)                     upper limit for right censoring
  mle                       use conditional maximum-likelihood estimator; the default
  constraints(constraints)  apply specified linear constraints
SE/Robust
  vce(vcetype)              vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
  level(#)                  set confidence level; default is level(95)
  first                     report first-stage regression
  nocnsreport               do not display constraints
  display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options          control the maximization process
  coeflegend                display legend instead of statistics

You must specify at least one of ll(#) and ul(#).
tse_options                 Description
Model
  twostep                   use Newey's two-step estimator; the default is mle
  ll(#)                     lower limit for left censoring
  ul(#)                     upper limit for right censoring
SE
  vce(vcetype)              vcetype may be twostep, bootstrap, or jackknife
Reporting
  level(#)                  set confidence level; default is level(95)
  first                     report first-stage regression
  display_options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling
  coeflegend                display legend instead of statistics

twostep is required. You must specify at least one of ll(#) and ul(#).
varlist1and varlistiv may contain factor variables; see [U] 11.4.3 Factor variables.
depvar,varlist1,varlist2, and varlistiv may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,jackknife,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.fp is
allowed with the maximum likelihood estimator.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(),first,twostep, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed with the maximum likelihood estimator. fweights are allowed with
Newey’s two-step estimator. See [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Endogenous covariates > Tobit model with endogenous covariates
Description
ivtobit fits tobit models where one or more of the regressors is endogenously determined.
By default, ivtobit uses maximum likelihood estimation. Alternatively, Newey’s (1987) minimum
chi-squared estimator can be invoked with the twostep option. Both estimators assume that the
endogenous regressors are continuous and so are not appropriate for use with discrete endogenous
regressors. See [R]ivprobit for probit estimation with endogenous regressors and [R]tobit for tobit
estimation when the model contains no endogenous regressors.
Options for ML estimator
 
Model
ll(#) and ul(#) indicate the lower and upper limits for censoring, respectively. You may specify one or both. Observations with depvar ≤ ll() are left-censored; observations with depvar ≥ ul() are right-censored; and remaining observations are not censored. You do not have to specify the censoring values at all. It is enough to type ll, ul, or both. When you do not specify a censoring value, ivtobit assumes that the lower limit is the minimum observed in the data (if ll is specified) and that the upper limit is the maximum (if ul is specified).
mle requests that the conditional maximum-likelihood estimator be used. This is the default.
constraints(constraints); see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first requests that the parameters for the reduced-form equations showing the relationships between
the endogenous variables and instruments be displayed. For the two-step estimator, first shows
the first-stage regressions. For the maximum likelihood estimator, these parameters are estimated
jointly with the parameters of the tobit equation. The default is not to show these parameter
estimates.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. This model’s
likelihood function can be difficult to maximize, especially with multiple endogenous variables.
The difficult and technique(bfgs) options may be helpful in achieving convergence.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with ivtobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Options for two-step estimator
 
Model
twostep is required and requests that Newey’s (1987) efficient two-step estimator be used to obtain
the coefficient estimates.
ll(#) and ul(#) indicate the lower and upper limits for censoring, respectively. You may specify one or both. Observations with depvar ≤ ll() are left-censored; observations with depvar ≥ ul() are right-censored; and remaining observations are not censored. You do not have to specify the censoring values at all. It is enough to type ll, ul, or both. When you do not specify a censoring value, ivtobit assumes that the lower limit is the minimum observed in the data (if ll is specified) and that the upper limit is the maximum (if ul is specified).
 
SE
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (twostep) and that use bootstrap or jackknife methods (bootstrap,
jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
first requests that the parameters for the reduced-form equations showing the relationships between
the endogenous variables and instruments be displayed. For the two-step estimator, first shows
the first-stage regressions. For the maximum likelihood estimator, these parameters are estimated
jointly with the parameters of the tobit equation. The default is not to show these parameter
estimates.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
The following option is available with ivtobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
ivtobit fits models with censored dependent variables and endogenous regressors. You can use
it to fit a tobit model when you suspect that one or more of the regressors is correlated with the
error term. ivtobit is to tobit what ivregress is to linear regression analysis; see [R]ivregress
for more information.
Formally, the model is
$$
\begin{aligned}
y_{1i}^* &= y_{2i}\beta + x_{1i}\gamma + u_i \\
y_{2i} &= x_{1i}\Pi_1 + x_{2i}\Pi_2 + v_i
\end{aligned}
$$
where $i = 1, \ldots, N$; $y_{2i}$ is a $1 \times p$ vector of endogenous variables; $x_{1i}$ is a $1 \times k_1$ vector of exogenous variables; $x_{2i}$ is a $1 \times k_2$ vector of additional instruments; and the equation for $y_{2i}$ is written in reduced form. By assumption, $(u_i, v_i) \sim N(0, \Sigma)$. $\beta$ and $\gamma$ are vectors of structural parameters, and $\Pi_1$ and $\Pi_2$ are matrices of reduced-form parameters. We do not observe $y_{1i}^*$; instead, we observe
$$
y_{1i} =
\begin{cases}
a & y_{1i}^* < a \\
y_{1i}^* & a \le y_{1i}^* \le b \\
b & y_{1i}^* > b
\end{cases}
$$
The order condition for identification of the structural parameters is that $k_2 \ge p$. Presumably, $\Sigma$ is not block diagonal between $u_i$ and $v_i$; otherwise, $y_{2i}$ would not be endogenous.
Technical note
This model is derived under the assumption that $(u_i, v_i)$ is independent and identically distributed multivariate normal for all $i$. The vce(cluster clustvar) option can be used to control for a lack of independence. As with the standard tobit model without endogeneity, if $u_i$ is heteroskedastic, point estimates will be inconsistent.
Example 1
Using the same dataset as in [R]ivprobit, we now want to estimate women’s incomes. In our
hypothetical world, all women who choose not to work receive $10,000 in welfare and child-support
payments. Therefore, we never observe incomes under $10,000: a woman offered a job with an
annual wage less than that would not accept and instead would collect the welfare payment. We
model income as a function of the number of years of schooling completed, the number of children
at home, and other household income. We again believe that other_inc is endogenous, so we use male_educ as an instrument.
. use http://www.stata-press.com/data/r13/laborsup
. ivtobit fem_inc fem_educ kids (other_inc = male_educ), ll
Fitting exogenous tobit model
Fitting full model
Iteration 0: log likelihood = -3228.4224
Iteration 1: log likelihood = -3226.2882
Iteration 2: log likelihood = -3226.085
Iteration 3: log likelihood = -3226.0845
Iteration 4: log likelihood = -3226.0845
Tobit model with endogenous regressors Number of obs = 500
Wald chi2(3) = 117.42
Log likelihood = -3226.0845 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.9045399 .1329762 -6.80 0.000 -1.165168 -.6439114
fem_educ 3.272391 .3968708 8.25 0.000 2.494538 4.050243
kids -3.312357 .7218628 -4.59 0.000 -4.727182 -1.897532
_cons 19.24735 7.372391 2.61 0.009 4.797725 33.69697
/alpha .2907654 .1379965 2.11 0.035 .0202972 .5612336
/lns 2.874031 .0506672 56.72 0.000 2.774725 2.973337
/lnv 2.813383 .0316228 88.97 0.000 2.751404 2.875363
s 17.70826 .897228 16.03422 19.55707
v 16.66621 .5270318 15.66461 17.73186
Instrumented: other_inc
Instruments: fem_educ kids male_educ
Wald test of exogeneity (/alpha = 0): chi2(1) = 4.44 Prob > chi2 = 0.0351
Obs. summary: 272 left-censored observations at fem_inc<=10
228 uncensored observations
0 right-censored observations
Because we did not specify mle or twostep,ivtobit used the maximum likelihood estimator by
default. ivtobit fits a tobit model, ignoring endogeneity, to get starting values for the full model.
The header of the output contains the maximized log likelihood, the number of observations, and a
Wald statistic and p-value for the test of the hypothesis that all the slope coefficients are jointly zero.
At the end of the output, we see a count of the censored and uncensored observations.
Near the bottom of the output is a Wald test of the exogeneity of the instrumented variables. If
the test statistic is not significant, there is not sufficient information in the sample to reject the null
hypothesis of no endogeneity. Then the point estimates from ivtobit are consistent, although those
from tobit are likely to have smaller standard errors.
Various two-step estimators have also been proposed for the endogenous tobit model, and Newey’s
(1987) minimum chi-squared estimator is available with the twostep option.
Example 2
Refitting our labor-supply model with the two-step estimator yields
. ivtobit fem_inc fem_educ kids (other_inc = male_educ), ll twostep
Two-step tobit with endogenous regressors Number of obs = 500
Wald chi2(3) = 117.38
Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.9045397 .1330015 -6.80 0.000 -1.165218 -.6438616
fem_educ 3.27239 .3969399 8.24 0.000 2.494402 4.050378
kids -3.312356 .7220066 -4.59 0.000 -4.727463 -1.897249
_cons 19.24735 7.37392 2.61 0.009 4.794728 33.69997
Instrumented: other_inc
Instruments: fem_educ kids male_educ
Wald test of exogeneity: chi2(1) = 4.64 Prob > chi2 = 0.0312
Obs. summary: 272 left-censored observations at fem_inc<=10
228 uncensored observations
0 right-censored observations
All the coefficients have the same signs as their counterparts in the maximum likelihood model. The
Wald test at the bottom of the output confirms our earlier finding of endogeneity.
Technical note
In the tobit model with endogenous regressors, we assume that $(u_i, v_i)$ is multivariate normal with covariance matrix
$$
\mathrm{Var}(u_i, v_i) = \Sigma =
\begin{bmatrix}
\sigma_u^2 & \Sigma_{21}' \\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}
$$
Using the properties of the multivariate normal distribution, $\mathrm{Var}(u_i \mid v_i) \equiv \sigma_{u|v}^2 = \sigma_u^2 - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$. Calculating the marginal effects on the conditional expected values of the observed and latent dependent variables and on the probability of censoring requires an estimate of $\sigma_u^2$. The two-step estimator identifies only $\sigma_{u|v}^2$, not $\sigma_u^2$, so only the linear prediction and its standard error are available after you have used the twostep option. However, unlike the two-step probit estimator described in [R] ivprobit, the two-step tobit estimator does identify $\beta$ and $\gamma$. See Wooldridge (2010, 683–684) for more information.
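For instance, after refitting the two-step model from example 2, predict accepts only the xb and stdp statistics; a minimal sketch (the new variable names are arbitrary):

. use http://www.stata-press.com/data/r13/laborsup
. ivtobit fem_inc fem_educ kids (other_inc = male_educ), ll twostep
. predict double xbhat, xb
. predict double xbse, stdp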
Stored results
ivtobit, mle stores the following in e():
Scalars
e(N)               number of observations
e(N_unc)           number of uncensored observations
e(N_lc)            number of left-censored observations
e(N_rc)            number of right-censored observations
e(llopt)           contents of ll()
e(ulopt)           contents of ul()
e(k)               number of parameters
e(k_eq)            number of equations in e(b)
e(k_eq_model)      number of equations in overall model test
e(k_aux)           number of auxiliary parameters
e(k_dv)            number of dependent variables
e(df_m)            model degrees of freedom
e(ll)              log likelihood
e(N_clust)         number of clusters
e(endog_ct)        number of endogenous regressors
e(p)               model Wald p-value
e(p_exog)          exogeneity test Wald p-value
e(chi2)            model Wald χ²
e(chi2_exog)       Wald χ² test of exogeneity
e(rank)            rank of e(V)
e(ic)              number of iterations
e(rc)              return code
e(converged)       1 if converged, 0 otherwise
Macros
e(cmd) ivtobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(instd) instrumented variables
e(insts) instruments
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(chi2type) Wald; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(method) ml
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method)       type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(footnote) program used to implement the footnote display
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(Sigma)           Σ̂
e(V)               variance–covariance matrix of the estimators
e(V_modelbased)    model-based variance
Functions
e(sample) marks estimation sample
ivtobit, twostep stores the following in e():
Scalars
e(N)               number of observations
e(N_unc)           number of uncensored observations
e(N_lc)            number of left-censored observations
e(N_rc)            number of right-censored observations
e(llopt)           contents of ll()
e(ulopt)           contents of ul()
e(df_m)            model degrees of freedom
e(df_exog)         degrees of freedom for χ² test of exogeneity
e(p)               model Wald p-value
e(p_exog)          exogeneity test Wald p-value
e(chi2)            model Wald χ²
e(chi2_exog)       Wald χ² test of exogeneity
e(rank)            rank of e(V)
Macros
e(cmd) ivtobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(instd) instrumented variables
e(insts) instruments
e(wtype) weight type
e(wexp) weight expression
e(chi2type) Wald; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(method) twostep
e(properties) b V
e(predict) program used to implement predict
e(footnote) program used to implement the footnote display
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(V)               variance–covariance matrix of the estimators
e(V_modelbased)    model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The estimation procedure used by ivtobit is similar to that used by ivprobit. For compactness, we write the model as
$$
\begin{aligned}
y_{1i}^* &= z_i\delta + u_i & (1a)\\
y_{2i} &= x_i\Pi + v_i & (1b)
\end{aligned}
$$
where $z_i = (y_{2i}, x_{1i})$, $x_i = (x_{1i}, x_{2i})$, $\delta = (\beta', \gamma')'$, and $\Pi = (\Pi_1', \Pi_2')'$. We do not observe $y_{1i}^*$; instead, we observe
$$
y_{1i} =
\begin{cases}
a & y_{1i}^* < a \\
y_{1i}^* & a \le y_{1i}^* \le b \\
b & y_{1i}^* > b
\end{cases}
$$
$(u_i, v_i)$ is distributed multivariate normal with mean zero and covariance matrix
$$
\Sigma =
\begin{bmatrix}
\sigma_u^2 & \Sigma_{21}' \\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}
$$
Using the properties of the multivariate normal distribution, we can write $u_i = v_i'\alpha + \epsilon_i$, where $\alpha = \Sigma_{22}^{-1}\Sigma_{21}$; $\epsilon_i \sim N(0, \sigma_{u|v}^2)$, where $\sigma_{u|v}^2 = \sigma_u^2 - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}$; and $\epsilon_i$ is independent of $v_i$, $z_i$, and $x_i$.
The likelihood function is straightforward to derive because we can write the joint density $f(y_{1i}, y_{2i} \mid x_i)$ as $f(y_{1i} \mid y_{2i}, x_i)\,f(y_{2i} \mid x_i)$. With one endogenous regressor,
$$
\ln f(y_{2i} \mid x_i) = -\frac{1}{2}\left\{ \ln 2\pi + \ln\sigma_v^2 + \frac{(y_{2i} - x_i\Pi)^2}{\sigma_v^2} \right\}
$$
and
$$
\ln f(y_{1i} \mid y_{2i}, x_i) =
\begin{cases}
\ln\left\{1 - \Phi\!\left(\dfrac{m_i - a}{\sigma_{u|v}}\right)\right\} & y_{1i} = a \\[2ex]
-\dfrac{1}{2}\left\{ \ln 2\pi + \ln\sigma_{u|v}^2 + \dfrac{(y_{1i} - m_i)^2}{\sigma_{u|v}^2} \right\} & a < y_{1i} < b \\[2ex]
\ln\Phi\!\left(\dfrac{m_i - b}{\sigma_{u|v}}\right) & y_{1i} = b
\end{cases}
$$
where
$$
m_i = z_i\delta + \alpha(y_{2i} - x_i\Pi)
$$
and $\Phi(\cdot)$ is the normal distribution function so that the log likelihood for observation $i$ is
$$
\ln L_i = w_i\{\ln f(y_{1i} \mid y_{2i}, x_i) + \ln f(y_{2i} \mid x_i)\}
$$
where $w_i$ is the weight for observation $i$ or one if no weights were specified. Instead of estimating $\sigma_{u|v}$ and $\sigma_v$ directly, we estimate $\ln\sigma_{u|v}$ and $\ln\sigma_v$.
For multiple endogenous regressors, we have
$$
\ln f(y_{2i} \mid x_i) = -\frac{1}{2}\left\{ p\ln 2\pi + \ln|\Sigma_{22}| + v_i\Sigma_{22}^{-1}v_i' \right\}
$$
and $\ln f(y_{1i} \mid y_{2i}, x_i)$ is the same as before, except that now
$$
m_i = z_i\delta + (y_{2i} - x_i\Pi)\Sigma_{22}^{-1}\Sigma_{21}
$$
Instead of maximizing the log-likelihood function with respect to $\Sigma$, we maximize with respect to the Cholesky decomposition $S$ of $\Sigma$; that is, there exists a lower triangular matrix $S$ such that $SS' = \Sigma$. This maximization ensures that $\Sigma$ is positive definite, as a covariance matrix must be. Let
$$
S =
\begin{bmatrix}
s_{11} & 0 & 0 & \cdots & 0 \\
s_{21} & s_{22} & 0 & \cdots & 0 \\
s_{31} & s_{32} & s_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{p+1,1} & s_{p+1,2} & s_{p+1,3} & \cdots & s_{p+1,p+1}
\end{bmatrix}
$$
With maximum likelihood estimation, this command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas.
The maximum likelihood version of ivtobit also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
The two-step estimates are obtained using Newey's (1987) minimum chi-squared estimator. The procedure is identical to the one described in [R] ivprobit, except that tobit is used instead of probit.
Acknowledgments
The two-step estimator is based on the tobitiv command written by Jonah Gelbach of the
Department of Economics at Yale University and the ivtobit command written by Joe Harkness of
the Institute of Policy Studies at Johns Hopkins University.
References
Finlay, K., and L. M. Magnusson. 2009. Implementing weak-instrument robust tests for a general class of instrumental-
variables models.Stata Journal 9: 398–421.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables.Stata Journal 6: 285–308.
Newey, W. K. 1987. Efficient estimation of limited dependent variable models with endogenous explanatory variables.
Journal of Econometrics 36: 231–250.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R]ivtobit postestimation Postestimation tools for ivtobit
[R]gmm Generalized method of moments estimation
[R]ivprobit Probit model with continuous endogenous regressors
[R]ivregress Single-equation instrumental-variables regression
[R]regress Linear regression
[R]tobit Tobit regression
[SVY]svy estimation Estimation commands for survey data
[XT]xtintreg Random-effects interval-data regression models
[XT]xttobit Random-effects tobit models
[U] 20 Estimation and postestimation commands
Title
ivtobit postestimation — Postestimation tools for ivtobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are available after ivtobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic1Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast2dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest3likelihood-ratio test; not available with two-step estimator
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest1seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
1estat ic and suest are not appropriate after ivtobit, twostep.
2forecast is not appropriate with svy estimation results or after ivtobit, twostep.
3lrtest is not appropriate with svy estimation results.
Syntax for predict
After ML or twostep

    predict [type] newvar [if] [in] [, statistic]

After ML

    predict [type] {stub* | newvarlist} [if] [in], scores

statistic            Description
Main
  xb                 linear prediction; the default
  stdp               standard error of the linear prediction
  stdf               standard error of the forecast; not available with two-step estimator
  pr(a,b)            Pr(a < y_j < b); not available with two-step estimator
  e(a,b)             E(y_j | a < y_j < b); not available with two-step estimator
  ystar(a,b)         E(y*_j), y*_j = max{a, min(y_j, b)}; not available with two-step estimator

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .) means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction. It can be thought of as the standard error
of the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.stdf is not available with the
two-step estimator.
pr(a,b) calculates Pr(a < xjb + uj < b), the probability that yj|xj would be observed in the interval (a, b).

  a and b may be specified as numbers or variable names; lb and ub are variable names;
  pr(20,30) calculates Pr(20 < xjb + uj < 30);
  pr(lb,ub) calculates Pr(lb < xjb + uj < ub); and
  pr(20,ub) calculates Pr(20 < xjb + uj < ub).

  a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < xjb + uj < 30);
  pr(lb,30) calculates Pr(−∞ < xjb + uj < 30) in observations for which lb ≥ . and calculates Pr(lb < xjb + uj < 30) elsewhere.

  b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > xjb + uj > 20);
  pr(20,ub) calculates Pr(+∞ > xjb + uj > 20) in observations for which ub ≥ . and calculates Pr(20 < xjb + uj < ub) elsewhere.

  pr(a,b) is not available with the two-step estimator.

e(a,b) calculates E(xjb + uj | a < xjb + uj < b), the expected value of yj|xj conditional on yj|xj being in the interval (a, b), meaning that yj|xj is truncated. a and b are specified as they are for pr(). e(a,b) is not available with the two-step estimator.

ystar(a,b) calculates E(y*_j), where y*_j = a if xjb + uj ≤ a, y*_j = b if xjb + uj ≥ b, and y*_j = xjb + uj otherwise, meaning that y*_j is censored. a and b are specified as they are for pr(). ystar(a,b) is not available with the two-step estimator.
scores, not available with twostep, calculates equation-level score variables.
For models with one endogenous regressor, five new variables are created.
The first new variable will contain ∂lnL/∂(z_iδ).
The second new variable will contain ∂lnL/∂(x_iΠ).
The third new variable will contain ∂lnL/∂α.
The fourth new variable will contain ∂lnL/∂lnσ_{u|v}.
The fifth new variable will contain ∂lnL/∂lnσ_v.
For models with p endogenous regressors, p + {(p+1)(p+2)}/2 + 1 new variables are created.
The first new variable will contain ∂lnL/∂(z_iδ).
The second through (p+1)th new score variables will contain ∂lnL/∂(x_iΠ_k), k = 1, ..., p, where Π_k is the kth column of Π.
The remaining score variables will contain the partial derivatives of lnL with respect to s_11, s_21, ..., s_{p+1,1}, s_22, ..., s_{p+1,2}, ..., s_{p+1,p+1}, where s_{m,n} denotes the (m, n) element of the Cholesky decomposition of the error covariance matrix.
Remarks and examples
Remarks are presented under the following headings:
Marginal effects
Obtaining predicted values
Marginal effects
Example 1
We can obtain average marginal effects by using the margins command after ivtobit. For the
labor-supply model of example 1 in [R]ivtobit, suppose that we wanted to know the average marginal
effects on the woman’s expected income, conditional on her income being greater than $10,000.
. use http://www.stata-press.com/data/r13/laborsup
. ivtobit fem_inc fem_educ kids (other_inc = male_educ), ll
(output omitted )
. margins, dydx(*) predict(e(10, .))
Average marginal effects Number of obs = 500
Model VCE : OIM
Expression : E(fem_inc|fem_inc>10), predict(e(10, .))
dy/dx w.r.t. : other_inc fem_educ kids male_educ
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
other_inc -.3420189 .0553591 -6.18 0.000 -.4505208 -.233517
fem_educ 1.237336 .1534025 8.07 0.000 .9366723 1.537999
kids -1.252447 .2725166 -4.60 0.000 -1.78657 -.7183246
male_educ 0 (omitted)
In our sample, increasing the number of children in the family by one decreases the expected wage by $1,252 on average (wages in our dataset are measured in thousands of dollars). male_educ has no effect because it appears only as an instrument.
Obtaining predicted values
After fitting your model using ivtobit, you can obtain the linear prediction and its standard
error for both the estimation sample and other samples using the predict command. If you used
the maximum likelihood estimator, you can also obtain conditional expected values of the observed
and latent dependent variables, the standard error of the forecast, and the probability of observing
the dependent variable in a specified interval. See [U] 20 Estimation and postestimation commands
and [R]predict.
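For example, after the maximum likelihood fit from example 1 in [R] ivtobit, one might compute the linear prediction, the probability that income is uncensored, and the conditional and censored expected values; a sketch (the new variable names are arbitrary):

. use http://www.stata-press.com/data/r13/laborsup
. ivtobit fem_inc fem_educ kids (other_inc = male_educ), ll
. predict double xbhat, xb
. predict double pabove, pr(10, .)
. predict double econd, e(10, .)
. predict double ycens, ystar(10, .)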
Methods and formulas
The linear prediction is calculated as $z_i\widehat{\delta}$, where $\widehat{\delta}$ is the estimated value of $\delta$, and $z_i$ and $\delta$ are defined in (1a) of [R] ivtobit. Expected values and probabilities are calculated using the same formulas as those used by the standard exogenous tobit model.
Also see
[R]ivtobit Tobit model with continuous endogenous regressors
[U] 20 Estimation and postestimation commands
Title
jackknife — Jackknife estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
    jackknife exp_list [, options eform_option] : command
options                 Description
Main
  eclass                number of observations used is stored in e(N)
  rclass                number of observations used is stored in r(N)
  n(exp)                specify exp that evaluates to the number of observations used
Options
  cluster(varlist)      variables identifying sample clusters
  idcluster(newvar)     create new cluster ID variable
  saving(filename, ...) save results to filename; save statistics in double precision;
                          save results to filename every # replications
  keep                  keep pseudovalues
  mse                   use MSE formula for variance estimation
Reporting
  level(#)              set confidence level; default is level(95)
  notable               suppress table of results
  noheader              suppress table header
  nolegend              suppress table legend
  verbose               display the full table legend
  nodots                suppress replication dots
  noisily               display any output from command
  trace                 trace command
  title(text)           use text as title for jackknife results
  display_options       control column formats, row spacing, line width, display of omitted
                          variables and base and empty cells, and factor-variable labeling
  eform_option          display coefficient table in exponentiated form
Advanced
  nodrop                do not drop observations
  reject(exp)           identify invalid results
  coeflegend            display legend instead of statistics
svy is allowed; see [SVY] svy jackknife.
All weight types supported by command are allowed except aweights; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
exp_list contains     (name: elist)
                      elist
                      eexp
elist contains        newvar = (exp)
                      (exp)
eexp is               specname
                      [eqno]specname
specname is           _b
                      _b[]
                      _se
                      _se[]
eqno is               ##
                      name
exp is a standard Stata expression; see [U] 13 Functions and expressions.

Distinguish between [ ], which are to be typed, and the square brackets in the syntax diagram, which indicate optional arguments.
Menu
Statistics > Resampling > Jackknife estimation
Description
jackknife performs jackknife estimation. Typing
. jackknife exp_list: command
executes command once for each observation in the dataset, leaving the associated observation out of
the calculations that make up exp_list.
command defines the statistical command to be executed. Most Stata commands and user-written
programs can be used with jackknife, as long as they follow standard Stata syntax and allow the
if qualifier; see [U] 11 Language syntax. The by prefix may not be part of command.
exp_list specifies the statistics to be collected from the execution of command. If command changes the contents in e(b), exp_list is optional and defaults to _b.
Many estimation commands allow the vce(jackknife) option. For those commands, we rec-
ommend using vce(jackknife) over jackknife because the estimation command already handles
clustering and other model-specific details for you. The jackknife prefix command is intended
for use with nonestimation commands, such as summarize, user-written commands, or functions of
coefficients.
jknife is a synonym for jackknife.
Options
 
Main
eclass,rclass, and n(exp)specify where command stores the number of observations on which
it based the calculated results. We strongly advise you to specify one of these options.
eclass specifies that command store the number of observations in e(N).
rclass specifies that command store the number of observations in r(N).
n(exp)specifies an expression that evaluates to the number of observations used. Specifying
n(r(N)) is equivalent to specifying the rclass option. Specifying n(e(N)) is equivalent to
specifying the eclass option. If command stores the number of observations in r(N1), specify
n(r(N1)).
If you specify no options, jackknife will assume eclass or rclass, depending on which of
e(N) and r(N) is not missing (in that order). If both e(N) and r(N) are missing, jackknife
assumes that all observations in the dataset contribute to the calculated result. If that assumption
is incorrect, the reported standard errors will be incorrect. For instance, say that you specify
. jackknife coef=_b[x2]: myreg y x1 x2 x3
where myreg uses e(n) instead of e(N) to identify the number of observations used in calculations.
Further assume that observation 42 in the dataset has x3 equal to missing. The 42nd observation
plays no role in obtaining the estimates, but jackknife has no way of knowing that and will use
the wrong N. If, on the other hand, you specify
. jackknife coef=_b[x2], n(e(n)): myreg y x1 x2 x3
jackknife will notice that observation 42 plays no role. The n(e(n)) option is specified because
myreg is an estimation command but it stores the number of observations used in e(n) (instead
of the standard e(N)). When jackknife runs the regression omitting the 42nd observation,
jackknife will observe that e(n) has the same value as when jackknife previously ran the
regression using all the observations. Thus jackknife will know that myreg did not use the
observation.
 
Options
cluster(varlist)specifies the variables identifying sample clusters. If cluster() is specified, one
cluster is left out of each call to command, instead of 1 observation.
idcluster(newvar)creates a new variable containing a unique integer identifier for each resampled
cluster, starting at 1and leading up to the number of clusters. This option may be specified only
when the cluster() option is specified. idcluster() helps identify the cluster to which a
pseudovalue belongs.
saving( filename,suboptions )creates a Stata data file (.dta file) consisting of (for each statistic
in exp list) a variable containing the replicates.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals. This option may be used without
the saving() option to compute the variance estimates by using double precision.
every(#)specifies that results be written to disk every #th replication. every() should be specified
only in conjunction with saving() when command takes a long time for each replication. This
option will allow recovery of partial results should some other software crash your computer.
See [P]postfile.
replace specifies that filename be overwritten if it exists. This option does not appear in the
dialog box.
keep specifies that new variables be added to the dataset containing the pseudovalues of the requested
statistics. For instance, if you typed
. jackknife coef=_b[x2], eclass keep: regress y x1 x2 x3
new variable coef would be added to the dataset containing the pseudovalues for _b[x2]. Let $b$ be the value of _b[x2] when all observations are used to fit the model, and let $b_{(j)}$ be the value when the $j$th observation is omitted. The pseudovalues are defined as
$$
\text{pseudovalue}_j = N\{b - b_{(j)}\} + b_{(j)}
$$
where $N$ is the number of observations used to produce $b$.
When the cluster() option is specified, each cluster is given at most one nonmissing pseudovalue.
The keep option implies the nodrop option.
mse specifies that jackknife compute the variance by using deviations of the replicates from the
observed value of the statistics based on the entire dataset. By default, jackknife computes the
variance by using deviations of the pseudovalues from their mean.
 
Reporting
level(#); see [R]estimation options.
notable suppresses the display of the table of results.
noheader suppresses the display of the table header. This option implies nolegend.
nolegend suppresses the display of the table legend. The table legend identifies the rows of the table
with the expressions they represent.
verbose specifies that the full table legend be displayed. By default, coefficients and standard errors
are not displayed.
nodots suppresses display of the replication dots. By default, one dot character is displayed for each
successful replication. A red ‘x’ is displayed if command returns an error or if one of the values
in exp list is missing.
noisily specifies that any output from command be displayed. This option implies the nodots
option.
trace causes a trace of the execution of command to be displayed. This option implies the noisily
option.
title(text)specifies a title to be displayed above the table of jackknife results; the default title is
Jackknife results or what is produced in e(title) by an estimation command.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
eform_option causes the coefficient table to be displayed in exponentiated form; see [R] eform_option. command determines which eform_option is allowed (eform(string) and eform are always allowed).

command determines which of the following are allowed (eform(string) and eform are always allowed):

eform_option          Description
  eform(string)       use string for the column title
  eform               exponentiated coefficient, string is exp(b)
  hr                  hazard ratio, string is Haz. Ratio
  shr                 subhazard ratio, string is SHR
  irr                 incidence-rate ratio, string is IRR
  or                  odds ratio, string is Odds Ratio
  rrr                 relative-risk ratio, string is RRR
 
Advanced
nodrop prevents observations outside e(sample) and the if and in qualifiers from being dropped
before the data are resampled.
reject(exp)identifies an expression that indicates when results should be rejected. When exp is
true, the resulting values are reset to missing values.
The following option is available with jackknife but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Jackknifed standard deviation
Collecting multiple statistics
Collecting coefficients
Introduction
Although the jackknife, developed in the late 1940s and early 1950s, is of largely historical interest today, it is still useful in searching for overly influential observations. This feature is often forgotten. In any case, the jackknife is
• an alternative, first-order unbiased estimator for a statistic;
• a data-dependent way to calculate the standard error of the statistic and to obtain significance levels and confidence intervals; and
• a way of producing measures called pseudovalues for each observation, reflecting the observation's influence on the overall statistic.
The idea behind the simplest form of the jackknife, the one implemented here, is to repeatedly calculate the statistic in question, each time omitting just one of the dataset's observations. Assume that our statistic of interest is the sample mean. Let $y_j$ be the $j$th observation of our data on some measurement $y$, where $j = 1, \ldots, N$ and $N$ is the sample size. If $\overline{y}$ is the sample mean of $y$ using the entire dataset and $\overline{y}_{(j)}$ is the mean when the $j$th observation is omitted, then
$$
\overline{y} = \frac{(N-1)\,\overline{y}_{(j)} + y_j}{N}
$$
Solving for $y_j$, we obtain
$$
y_j = N\overline{y} - (N-1)\,\overline{y}_{(j)}
$$
These are the pseudovalues that jackknife calculates. To move this discussion beyond the sample mean, let $\widehat{\theta}$ be the value of our statistic (not necessarily the sample mean) using the entire dataset, and let $\widehat{\theta}_{(j)}$ be the computed value of our statistic with the $j$th observation omitted. The pseudovalue for the $j$th observation is
$$
\widehat{\theta}_j^* = N\widehat{\theta} - (N-1)\,\widehat{\theta}_{(j)}
$$
The mean of the pseudovalues is the alternative, first-order unbiased estimator mentioned above, and the standard error of the mean of the pseudovalues is an estimator for the standard error of $\widehat{\theta}$ (Tukey 1958).
When the cluster() option is given, clusters are omitted instead of observations, and Nis the
number of clusters instead of the sample size.
The jackknife estimate of variance has been largely replaced by the bootstrap estimate (see
[R]bootstrap), which is widely viewed as more efficient and robust. The use of jackknife pseudovalues
to detect outliers is too often forgotten and is something the bootstrap does not provide. See Mosteller
and Tukey (1977, 133–163) and Mooney and Duval (1993, 22–27) for more information.
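The pseudovalue formula is easy to verify by hand. The following do-file sketch (our own illustration, not one of the official examples) recomputes the leave-one-out means of mpg in the auto dataset and constructs the pseudovalues directly; their mean reproduces the sample mean, and the standard error of that mean matches the jackknife standard error shown in example 1 below.

    use http://www.stata-press.com/data/r13/auto, clear
    quietly summarize mpg
    local N    = r(N)                        // sample size
    local ybar = r(mean)                     // full-sample mean
    generate double pseudo = .
    forvalues j = 1/`N' {
        quietly summarize mpg if _n != `j'   // leave-one-out mean
        local mj = r(mean)
        quietly replace pseudo = `N'*`ybar' - (`N'-1)*`mj' in `j'
    }
    ci pseudo                                // mean and SE of the pseudovalues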
Example 1
As our first example, we will show that the jackknife standard error of the sample mean is
equivalent to the standard error of the sample mean computed using the classical formula in the ci
command. We use the double option to compute the standard errors with the same precision as the
ci command.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. jackknife r(mean), double: summarize mpg
(running summarize on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Jackknife results Number of obs = 74
Replications = 74
command: summarize mpg
_jk_1: r(mean)
n(): r(N)
Jackknife
Coef. Std. Err. t P>|t| [95% Conf. Interval]
_jk_1 21.2973 .6725511 31.67 0.000 19.9569 22.63769
. ci mpg
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.2973 .6725511 19.9569 22.63769
Jackknifed standard deviation
Example 2
Mosteller and Tukey (1977, 139–140) request a 95% confidence interval for the standard deviation
of the 11 values:
0.1, 0.1, 0.1, 0.4, 0.5, 1.0, 1.1, 1.3, 1.9, 1.9, 4.7
Stata’s summarize command calculates the mean and standard deviation and stores them as r(mean)
and r(sd). To obtain the jackknifed standard deviation of the 11 values and save the pseudovalues
as a new variable, sd, we would type
. clear
. input x
x
1. 0.1
2. 0.1
3. 0.1
4. 0.4
5. 0.5
6. 1.0
7. 1.1
8. 1.3
9. 1.9
10. 1.9
11. 4.7
12. end
. jackknife sd=r(sd), rclass keep: summarize x
(running summarize on estimation sample)
Jackknife replications (11)
12345
...........
Jackknife results Number of obs = 11
Replications = 11
command: summarize x
sd: r(sd)
n(): r(N)
Jackknife
Coef. Std. Err. t P>|t| [95% Conf. Interval]
sd 1.343469 .624405 2.15 0.057 -.047792 2.73473
Interpreting the output, the standard deviation of x reported by summarize is 1.34. The jackknife standard error is 0.62. The 95% confidence interval for the standard deviation is −0.048 to 2.73.
By specifying keep,jackknife creates in our dataset a new variable, sd, for the pseudovalues.
. list, sep(4)
x sd
1. .1 1.139977
2. .1 1.139977
3. .1 1.139977
4. .4 .8893147
5. .5 .824267
6. 1 .632489
7. 1.1 .6203189
8. 1.3 .6218889
9. 1.9 .835419
10. 1.9 .835419
11. 4.7 7.703949
The jackknife estimate is the average of the sd variable, so sd contains the individual values of our
statistic. We can see that the last observation is substantially larger than the others. The last observation
is certainly an outlier, but whether that reflects the considerable information it contains or indicates
that it should be excluded from analysis depends on the context of the problem. Here Mosteller
and Tukey created the dataset by sampling from an exponential distribution, so the observation is
informative.
Example 3
Let’s repeat the example above using the automobile dataset, obtaining the standard error of the
standard deviation of mpg.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. jackknife sd=r(sd), rclass keep: summarize mpg
(running summarize on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Jackknife results Number of obs = 74
Replications = 74
command: summarize mpg
sd: r(sd)
n(): r(N)
Jackknife
Coef. Std. Err. t P>|t| [95% Conf. Interval]
sd 5.785503 .6072509 9.53 0.000 4.575254 6.995753
Let’s look at sd more carefully:
. summarize sd, detail
pseudovalues: r(sd)
Percentiles Smallest
1% 2.870471 2.870471
5% 2.870471 2.870471
10% 2.906255 2.870471 Obs 74
25% 3.328489 2.870471 Sum of Wgt. 74
50% 3.948335 Mean 5.817374
Largest Std. Dev. 5.22377
75% 6.844418 17.34316
90% 9.597018 19.7617 Variance 27.28777
95% 17.34316 19.7617 Skewness 4.07202
99% 38.60905 38.60905 Kurtosis 23.37823
. list make mpg sd if sd > 30
make mpg sd
71. VW Diesel 41 38.60905
Here the VW Diesel is the only diesel car in our dataset.
Collecting multiple statistics
Example 4
jackknife is not limited to collecting just one statistic. For instance, we can use summarize,
detail and then obtain the jackknife estimate of the standard deviation and skewness. summarize,
detail stores the standard deviation in r(sd) and the skewness in r(skewness), so we might type
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. jackknife sd=r(sd) skew=r(skewness), rclass: summarize mpg, detail
(running summarize on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Jackknife results Number of obs = 74
Replications = 74
command: summarize mpg, detail
sd: r(sd)
skew: r(skewness)
n(): r(N)
Jackknife
Coef. Std. Err. t P>|t| [95% Conf. Interval]
sd 5.785503 .6072509 9.53 0.000 4.575254 6.995753
skew .9487176 .3367242 2.82 0.006 .2776272 1.619808
Collecting coefficients
Example 5
jackknife can also collect coefficients from estimation commands. For instance, using auto.dta, we might wish to obtain the jackknife standard errors of the coefficients from a regression in which we model the mileage of a car by its weight and trunk space. To do this, we could refer to the coefficients as _b[weight], _b[trunk], _se[weight], and _se[trunk] in the exp_list, or we could simply use the extended expression _b. In fact, jackknife assumes _b by default when used with estimation commands.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. jackknife: regress mpg weight trunk
(running regress on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Linear regression Number of obs = 74
Replications = 74
F( 2, 73) = 78.10
Prob > F = 0.0000
R-squared = 0.6543
Adj R-squared = 0.6446
Root MSE = 3.4492
Jackknife
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0056527 .0010216 -5.53 0.000 -.0076887 -.0036167
trunk -.096229 .1486236 -0.65 0.519 -.3924354 .1999773
_cons 39.68913 1.873324 21.19 0.000 35.9556 43.42266
If you are going to use jackknife to estimate standard errors of model coefficients, we recommend
using the vce(jackknife) option when it is allowed with the estimation command; see [R]vce option.
. regress mpg weight trunk, vce(jackknife, nodots)
Linear regression Number of obs = 74
Replications = 74
F( 2, 73) = 78.10
Prob > F = 0.0000
R-squared = 0.6543
Adj R-squared = 0.6446
Root MSE = 3.4492
Jackknife
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0056527 .0010216 -5.53 0.000 -.0076887 -.0036167
trunk -.096229 .1486236 -0.65 0.519 -.3924354 .1999773
_cons 39.68913 1.873324 21.19 0.000 35.9556 43.42266
 
John Wilder Tukey (1915–2000) was born in Massachusetts. He studied chemistry at Brown
and mathematics at Princeton and afterward worked at both Princeton and Bell Labs, as well as
being involved in a great many government projects, consultancies, and committees. He made
outstanding contributions to several areas of statistics, including time series, multiple comparisons,
robust statistics, and exploratory data analysis. Tukey was extraordinarily energetic and inventive,
not least in his use of terminology: he is credited with inventing the terms bit and software, in
addition to ANOVA, boxplot, data analysis, hat matrix, jackknife, stem-and-leaf plot, trimming,
and winsorizing, among many others. Tukey’s direct and indirect impacts mark him as one of
the greatest statisticians of all time.
 
Stored results
jknife stores the following in e():
Scalars
e(N)               sample size
e(N_reps)          number of complete replications
e(N_misreps)       number of incomplete replications
e(N_clust)         number of clusters
e(k_eq)            number of equations in e(b)
e(k_extra)         number of extra equations
e(k_exp)           number of expressions
e(k_eexp)          number of extended expressions (_b or _se)
e(df_r)            degrees of freedom
Macros
e(cmdname) command name from command
e(cmd) same as e(cmdname) or jackknife
e(command) command
e(cmdline) command as typed
e(prefix) jackknife
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(cluster) cluster variables
e(pseudo) new variables containing pseudovalues
e(nfunction) e(N), r(N), n() option, or empty
e(exp#) expression for the #th statistic
e(mse) from mse option
e(vce) jackknife
e(vcetype) title used to label Std. Err.
e(properties) b V
Matrices
e(b) observed statistics
e(b_jk) jackknife estimates
e(V) jackknife variance–covariance matrix
e(V modelbased) model-based variance
When exp_list is _b, jackknife will also carry forward most of the results already in e() from
command.
Methods and formulas
Let $\hat\theta$ be the observed value of the statistic, that is, the value of the statistic calculated using the
original dataset. Let $\hat\theta_{(j)}$ be the value of the statistic computed by leaving out the $j$th observation
(or cluster); thus $j = 1, 2, \dots, N$ identifies an individual observation (or cluster), and $N$ is the total
number of observations (or clusters). The $j$th pseudovalue is given by
\[ \hat\theta_j = \hat\theta_{(j)} + N\bigl\{\hat\theta - \hat\theta_{(j)}\bigr\} \]
When the mse option is specified, the standard error is estimated as
\[ \widehat{\mathrm{se}} = \left\{ \frac{N-1}{N} \sum_{j=1}^{N} \bigl(\hat\theta_{(j)} - \hat\theta\bigr)^2 \right\}^{1/2} \]
and the jackknife estimate is
\[ \bar\theta_{(\cdot)} = \frac{1}{N} \sum_{j=1}^{N} \hat\theta_{(j)} \]
Otherwise, the standard error is estimated as
\[ \widehat{\mathrm{se}} = \left\{ \frac{1}{N(N-1)} \sum_{j=1}^{N} \bigl(\hat\theta_j - \bar\theta\bigr)^2 \right\}^{1/2},
\qquad \bar\theta = \frac{1}{N} \sum_{j=1}^{N} \hat\theta_j \]
where $\bar\theta$ is the jackknife estimate. The variance–covariance matrix is similarly computed.
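The following is a minimal sketch, assuming auto.dta as in the examples above, that computes the (non-mse) jackknife standard error of the sample mean of mpg directly from the pseudovalues just defined; it should agree with what jackknife r(mean): summarize mpg reports.

* minimal sketch: jackknife SE of the mean of mpg from the pseudovalues
use http://www.stata-press.com/data/r13/auto, clear
quietly summarize mpg
local N = r(N)
local theta = r(mean)                       // observed statistic, theta-hat
tempvar pv
quietly generate double `pv' = .
forvalues j = 1/`N' {
    quietly summarize mpg if _n != `j'      // leave-one-out estimate, theta-hat(j)
    quietly replace `pv' = r(mean) + `N'*(`theta' - r(mean)) in `j'
}
quietly summarize `pv'
display "jackknife standard error of the mean = " r(sd)/sqrt(`N')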
References
Brillinger, D. R. 2002. John W. Tukey: His life and professional contributions. Annals of Statistics 30: 1535–1575.
Gould, W. W. 1995. sg34: Jackknife estimation. Stata Technical Bulletin 24: 25–29. Reprinted in Stata Technical
Bulletin Reprints, vol. 4, pp. 165–170. College Station, TX: Stata Press.
Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury
Park, CA: Sage.
Mosteller, C. F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA:
Addison–Wesley.
Tukey, J. W. 1958. Bias and confidence in not-quite large samples. Abstract in Annals of Mathematical Statistics 29:
614.
Also see
[R] jackknife postestimation — Postestimation tools for jackknife
[R] bootstrap — Bootstrap sampling and estimation
[R] permute — Monte Carlo permutation tests
[R] simulate — Monte Carlo simulations
[SVY] svy jackknife — Jackknife estimation for survey data
[U] 13.5 Accessing coefficients and standard errors
[U] 13.6 Accessing results from Stata commands
[U] 20 Estimation and postestimation commands
Title
jackknife postestimation — Postestimation tools for jackknife
Description Syntax for predict Also see
Description
The following postestimation commands are available after jackknife:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
This postestimation command is allowed only if it may be used after command.
Syntax for predict
The syntax of predict (and whether predict is even allowed) following jackknife depends
on the command used with jackknife.
Also see
[R] jackknife — Jackknife estimation
[U] 20 Estimation and postestimation commands
Title
kappa — Interrater agreement
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Syntax
Interrater agreement, two unique raters
kap varname1 varname2 [if] [in] [weight] [, options]
Weights for weighting disagreements
kapwgt wgtid [1 \ # 1 [\ # # 1 ...]]
Interrater agreement, nonunique raters, variables record ratings for each rater
kap varname1 varname2 varname3 [...] [if] [in] [weight]
Interrater agreement, nonunique raters, variables record frequency of ratings
kappa varlist [if] [in]
options Description
Main
tab display table of assessments
wgt(wgtid) specify how to weight disagreements; see Options for alternatives
absolute treat rating categories as absolute
fweights are allowed; see [U] 11.1.6 weight.
Menu
kap: two unique raters
Statistics >Epidemiology and related >Other >Interrater agreement, two unique raters
kapwgt
Statistics >Epidemiology and related >Other >Define weights for the above (kap)
kap: nonunique raters
Statistics >Epidemiology and related >Other >Interrater agreement, nonunique raters
kappa
Statistics >Epidemiology and related >Other >Interrater agreement, nonunique raters with frequencies
Description
kap (first syntax) calculates the kappa-statistic measure of interrater agreement when there are two
unique raters and two or more ratings.
kapwgt defines weights for use by kap in measuring the importance of disagreements.
kap (second syntax) and kappa calculate the kappa-statistic measure when there are two or more
(nonunique) raters and two outcomes, more than two outcomes when the number of raters is fixed,
and more than two outcomes when the number of raters varies. kap (second syntax) and kappa
produce the same results; they merely differ in how they expect the data to be organized.
kap assumes that each observation is a subject. varname1 contains the ratings by the first rater,
varname2 by the second rater, and so on.
kappa also assumes that each observation is a subject. The variables, however, record the frequencies
with which ratings were assigned. The first variable records the number of times the first rating was
assigned, the second variable records the number of times the second rating was assigned, and so on.
Options
 
Main
tab displays a tabulation of the assessments by the two raters.
wgt(wgtid) specifies that wgtid be used to weight disagreements. You can define your own weights
by using kapwgt; wgt() then specifies the name of the user-defined matrix. For instance, you
might define
. kapwgt mine 1 \ .8 1 \ 0 .8 1 \ 0 0 .8 1
and then
. kap rata ratb, wgt(mine)
Also, two prerecorded weights are available.
wgt(w) specifies weights 1 − |i − j|/(k − 1), where i and j index the rows and columns of the
ratings by the two raters and k is the maximum number of possible ratings.
wgt(w2) specifies weights 1 − {(i − j)/(k − 1)}².
absolute is relevant only if wgt() is also specified. The absolute option modifies how i, j, and
k are defined and how corresponding entries are found in a user-defined weighting matrix. When
absolute is not specified, i and j refer to the row and column index, not to the ratings themselves.
Say that the ratings are recorded as {0, 1, 1.5, 2}. There are four ratings; k = 4, and i and j are
still 1, 2, 3, and 4 in the formulas above. Index 3, for instance, corresponds to rating = 1.5. This
system is convenient but can, with some data, lead to difficulties.
When absolute is specified, all ratings must be integers, and they must be coded from the set
{1,2,3, . . .}. Not all values need be used; integer values that do not occur are simply assumed to
be unobserved.
Remarks and examples
Remarks are presented under the following headings:
Two raters
More than two raters
The kappa-statistic measure of agreement is scaled to be 0 when the amount of agreement is what
would be expected to be observed by chance and 1 when there is perfect agreement. For intermediate
values, Landis and Koch (1977a, 165) suggest the following interpretations:
below 0.0 Poor
0.00 – 0.20 Slight
0.21 – 0.40 Fair
0.41 – 0.60 Moderate
0.61 – 0.80 Substantial
0.81 – 1.00 Almost perfect
Two raters
Example 1
Consider the classification by two radiologists of 85 xeromammograms as normal, benign disease,
suspicion of cancer, or cancer (a subset of the data from Boyd et al. [1982] and discussed in the
context of kappa in Altman [1991, 403–405]).
. use http://www.stata-press.com/data/r13/rate2
(Altman p. 403)
. tabulate rada radb
Radiologist
A’s Radiologist B’s assessment
assessment normal benign suspect cancer Total
normal 21 12 0 0 33
benign 4 17 1 0 22
suspect 3 9 15 2 29
cancer 0 0 0 1 1
Total 28 38 16 3 85
Our dataset contains two variables: rada, radiologist A’s assessment, and radb, radiologist B’s
assessment. Each observation is a patient.
We can obtain the kappa measure of interrater agreement by typing
. kap rada radb
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
63.53% 30.82% 0.4728 0.0694 6.81 0.0000
If each radiologist had made his determination randomly (but with probabilities equal to the overall
proportions), we would expect the two radiologists to agree on 30.8% of the patients. In fact, they
agreed on 63.5% of the patients, or 47.3% of the way between random agreement and perfect
agreement. The amount of agreement indicates that we can reject the hypothesis that they are making
their determinations randomly.
Example 2: Weighted kappa, prerecorded weight w
There is a difference between two radiologists disagreeing about whether a xeromammogram
indicates cancer or the suspicion of cancer and disagreeing about whether it indicates cancer or is
normal. The weighted kappa attempts to deal with this. kap provides two “prerecorded” weights, w
and w2:
. kap rada radb, wgt(w)
Ratings weighted by:
1.0000 0.6667 0.3333 0.0000
0.6667 1.0000 0.6667 0.3333
0.3333 0.6667 1.0000 0.6667
0.0000 0.3333 0.6667 1.0000
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
86.67% 69.11% 0.5684 0.0788 7.22 0.0000
The w weights are given by 1 − |i − j|/(k − 1), where i and j index the rows and columns of the
ratings by the two raters and k is the maximum number of possible ratings. The weighting matrix
is printed above the table. Here the rows and columns of the 4 × 4 matrix correspond to the ratings
normal, benign, suspicious, and cancerous.
A weight of 1 indicates that an observation should count as perfect agreement. The matrix has
1s down the diagonal: when both radiologists make the same assessment, they are in agreement.
A weight of, say, 0.6667 means that they are in two-thirds agreement. In our matrix, they get that
score if they are “one apart”: one radiologist assesses cancer and the other is merely suspicious, or
one is suspicious and the other says benign, and so on. An entry of 0.3333 means that they are in
one-third agreement, or, if you prefer, two-thirds disagreement. That is the score attached when they
are “two apart”. Finally, they are in complete disagreement when the weight is zero, which happens
only when they are three apart: one says cancer and the other says normal.
Example 3: Weighted kappa, prerecorded weight w2
The other prerecorded weight is w2, where the weights are given by 1 − {(i − j)/(k − 1)}²:
. kap rada radb, wgt(w2)
Ratings weighted by:
1.0000 0.8889 0.5556 0.0000
0.8889 1.0000 0.8889 0.5556
0.5556 0.8889 1.0000 0.8889
0.0000 0.5556 0.8889 1.0000
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
94.77% 84.09% 0.6714 0.1079 6.22 0.0000
The w2 weight makes the categories even more alike and is probably inappropriate here.
Example 4: Weighted kappa, user-defined weights
In addition to using prerecorded weights, we can define our own weights with the kapwgt
command. For instance, we might feel that suspicious and cancerous are reasonably similar, that
benign and normal are reasonably similar, but that the suspicious/cancerous group is nothing like the
benign/normal group:
. kapwgt xm 1 \ .8 1 \ 0 0 1 \ 0 0 .8 1
. kapwgt xm
1.0000
0.8000 1.0000
0.0000 0.0000 1.0000
0.0000 0.0000 0.8000 1.0000
We name the weights xm, and after the weight name, we enter the lower triangle of the weighting
matrix, using \ to separate rows. We have four outcomes, so we continued entering numbers until
we had defined the fourth row of the weighting matrix. If we type kapwgt followed by a name and
nothing else, it shows us the weights recorded under that name. Satisfied that we have entered them
correctly, we now use the weights to recalculate kappa:
. kap rada radb, wgt(xm)
Ratings weighted by:
1.0000 0.8000 0.0000 0.0000
0.8000 1.0000 0.0000 0.0000
0.0000 0.0000 1.0000 0.8000
0.0000 0.0000 0.8000 1.0000
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
80.47% 52.67% 0.5874 0.0865 6.79 0.0000
Technical note
In addition to using weights for weighting the differences in categories, you can specify Stata’s
traditional weights for weighting the data. In the examples above, we have 85 observations in our
dataset, one for each patient. If we only knew the table of outcomes (that there were 21 patients
rated normal by both radiologists, etc.), it would be easier to enter the table into Stata and work
from it. The easiest way to enter the data is with tabi; see [R] tabulate twoway.
. tabi 21 12 0 0 \ 4 17 1 0 \ 3 9 15 2 \ 0 0 0 1, replace
col
row 1 2 3 4 Total
1 21 12 0 0 33
2 4 17 1 0 22
3 3 9 15 2 29
4 0 0 0 1 1
Total 28 38 16 3 85
Pearson chi2(9) = 77.8111 Pr = 0.000
tabi reported the Pearson χ² for this table, but we do not care about it. The important thing is that,
with the replace option, tabi left the table in memory:
. list in 1/5
row col pop
1. 1 1 21
2. 1 2 12
3. 1 3 0
4. 1 4 0
5. 2 1 4
The variable row is radiologist A’s assessment, col is radiologist B’s assessment, and pop is the
number so assessed by both. Thus
. kap row col [freq=pop]
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
63.53% 30.82% 0.4728 0.0694 6.81 0.0000
If we are going to keep these data, the names row and col are not indicative of what the data reflect.
We could type (see [U] 12.6 Dataset, variable, and value labels)
. rename row rada
. rename col radb
. label var rada "Radiologist A’s assessment"
. label var radb "Radiologist B’s assessment"
. label define assess 1 normal 2 benign 3 suspect 4 cancer
. label values rada assess
. label values radb assess
. label data "Altman p. 403"
kap’s tab option, which can be used with or without weighted data, shows the table of assessments:
. kap rada radb [freq=pop], tab
Radiologist
A’s Radiologist B’s assessment
assessment normal benign suspect cancer Total
normal 21 12 0 0 33
benign 4 17 1 0 22
suspect 3 9 15 2 29
cancer 0 0 0 1 1
Total 28 38 16 3 85
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
63.53% 30.82% 0.4728 0.0694 6.81 0.0000
Technical note
You have data on individual patients. There are two raters, and the possible ratings are 1, 2, 3,
and 4, but neither rater ever used rating 3:
. use http://www.stata-press.com/data/r13/rate2no3, clear
. tabulate ratera raterb
raterb
ratera 1 2 4 Total
1 6 4 3 13
2 5 3 3 11
4 1 1 26 28
Total 12 8 32 52
Here kap would determine that the ratings are from the set {1, 2, 4} because those were the only
values observed. kap would expect a user-defined weighting matrix to be 3 × 3, and if it were not,
kap would issue an error message. In the formula-based weights, the calculation would be based on
i, j = 1, 2, 3 corresponding to the three observed ratings {1, 2, 4}.
Specifying the absolute option would clarify that the ratings are 1, 2, 3, and 4; it just so happens
that rating 3 was never assigned. If a user-defined weighting matrix were also specified, kap would
expect it to be 4 × 4 or larger (larger because we can think of the ratings being 1, 2, 3, 4, 5, . . . and
it just so happens that ratings 5, 6, . . . were never observed, just as rating 3 was not observed). In
the formula-based weights, the calculation would be based on i, j = 1, 2, 4.
. kap ratera raterb, wgt(w)
Ratings weighted by:
1.0000 0.5000 0.0000
0.5000 1.0000 0.5000
0.0000 0.5000 1.0000
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
79.81% 57.17% 0.5285 0.1169 4.52 0.0000
. kap ratera raterb, wgt(w) absolute
Ratings weighted by:
1.0000 0.6667 0.0000
0.6667 1.0000 0.3333
0.0000 0.3333 1.0000
Expected
Agreement Agreement Kappa Std. Err. Z Prob>Z
81.41% 55.08% 0.5862 0.1209 4.85 0.0000
If all conceivable ratings are observed in the data, specifying absolute makes no difference.
For instance, if rater A assigns ratings {1, 2, 4} and rater B assigns {1, 2, 3, 4}, the complete set of
assigned ratings is {1, 2, 3, 4}, the same that absolute would specify. Without absolute, it makes
no difference whether the ratings are coded {1, 2, 3, 4}, {0, 1, 2, 3}, {1, 7, 9, 100}, {0, 1, 1.5, 2.0}, or
otherwise.
More than two raters
For more than two raters, the mathematics are such that the raters are not considered unique.
For instance, if there are three raters, there is no assumption that the three raters who rate the first
subject are the same as the three raters who rate the second. Although we call this the “more than
two raters” case, it can be used with two raters when the raters’ identities vary.
The nonunique rater case can be usefully broken down into three subcases: 1) there are two possible
ratings, which we will call positive and negative; 2) there are more than two possible ratings, but the
number of raters per subject is the same for all subjects; and 3) there are more than two possible
ratings, and the number of raters per subject varies. kappa handles all these cases. To emphasize that
there is no assumption of constant identity of raters across subjects, the variables specified contain
counts of the number of raters rating the subject into a particular category.
 
Jacob Cohen (1923–1998) was born in New York City. After studying psychology at City College
of New York and New York University, he worked as a medical psychologist until 1959 when he
became a full professor in the Department of Psychology at New York University. He made many
contributions to research methods, including the kappa measure. He persistently emphasized the
value of multiple regression and the importance of power and of measuring effects rather than
testing significance.
 
Example 5: Two ratings
Fleiss, Levin, and Paik (2003, 612) offers the following hypothetical ratings by different sets of
raters on 25 subjects:
No. of No. of No. of No. of
Subject raters pos. ratings Subject raters pos. ratings
1 2 2 14 4 3
2 2 0 15 2 0
3 3 2 16 2 2
4 4 3 17 3 1
5 3 3 18 2 1
6 4 1 19 4 1
7 3 0 20 5 4
8 5 0 21 3 2
9 2 0 22 4 0
10 4 4 23 3 0
11 5 5 24 3 3
12 3 3 25 2 2
13 4 4
We have entered these data into Stata, and the variables are called subject, raters, and pos.
kappa, however, requires that we specify variables containing the number of positive ratings and
negative ratings, that is, pos and raters-pos:
. use http://www.stata-press.com/data/r13/p612
. gen neg = raters-pos
. kappa pos neg
Two-outcomes, multiple raters:
Kappa Z Prob>Z
0.5415 5.28 0.0000
We would have obtained the same results if we had typed kappa neg pos.
Example 6: More than two ratings, constant number of raters, kappa
Each of 10 subjects is rated into one of three categories by five raters (Fleiss, Levin, and Paik 2003,
615):
. use http://www.stata-press.com/data/r13/p615, clear
. list
subject cat1 cat2 cat3
1. 1 1 4 0
2. 2 2 0 3
3. 3 0 0 5
4. 4 4 0 1
5. 5 3 0 2
6. 6 1 4 0
7. 7 5 0 0
8. 8 0 4 1
9. 9 1 0 4
10. 10 3 0 2
We obtain the kappa statistic:
. kappa cat1-cat3
Outcome Kappa Z Prob>Z
cat1 0.2917 2.92 0.0018
cat2 0.6711 6.71 0.0000
cat3 0.3490 3.49 0.0002
combined 0.4179 5.83 0.0000
The first part of the output shows the results of calculating kappa for each of the categories separately
against an amalgam of the remaining categories. For instance, the cat1 line is the two-rating kappa,
where positive is cat1 and negative is cat2 or cat3. The test statistic, however, is calculated
differently (see Methods and formulas). The combined kappa is the appropriately weighted average
of the individual kappas. There is considerably less agreement about the rating of subjects into the
first category than there is for the second.
Example 7: More than two ratings, constant number of raters, kap
Now suppose that we have the same data as in the previous example but that the data are organized
differently:
. use http://www.stata-press.com/data/r13/p615b
. list
subject rater1 rater2 rater3 rater4 rater5
1. 1 1 2 2 2 2
2. 2 1 1 3 3 3
3. 3 3 3 3 3 3
4. 4 1 1 1 1 3
5. 5 1 1 1 3 3
6. 6 1 2 2 2 2
7. 7 1 1 1 1 1
8. 8 2 2 2 2 3
9. 9 1 3 3 3 3
10. 10 1 1 1 3 3
Here we would use kap rather than kappa because the variables record ratings for each rater.
. kap rater1 rater2 rater3 rater4 rater5
There are 5 raters per subject:
Outcome Kappa Z Prob>Z
1 0.2917 2.92 0.0018
2 0.6711 6.71 0.0000
3 0.3490 3.49 0.0002
combined 0.4179 5.83 0.0000
It does not matter which rater is which when there are more than two raters.
Example 8: More than two ratings, varying number of raters, kappa
In this unfortunate case, kappa can be calculated, but there is no test statistic for testing against
κ > 0. We do nothing differentlykappa calculates the total number of raters for each subject, and,
if it is not a constant, kappa suppresses the calculation of test statistics.
. use http://www.stata-press.com/data/r13/rvary
. list
subject cat1 cat2 cat3
1. 1 1 3 0
2. 2 2 0 3
3. 3 0 0 5
4. 4 4 0 1
5. 5 3 0 2
6. 6 1 4 0
7. 7 5 0 0
8. 8 0 4 1
9. 9 1 0 2
10. 10 3 0 2
. kappa cat1-cat3
Outcome Kappa Z Prob>Z
cat1 0.2685 . .
cat2 0.6457 . .
cat3 0.2938 . .
combined 0.3816 . .
Note: number of ratings per subject vary; cannot calculate test
statistics.
Example 9: More than two ratings, varying number of raters, kap
This case is similar to the previous example, but the data are organized differently:
. use http://www.stata-press.com/data/r13/rvary2
. list
subject rater1 rater2 rater3 rater4 rater5
1. 1 1 2 2 . 2
2. 2 1 1 3 3 3
3. 3 3 3 3 3 3
4. 4 1 1 1 1 3
5. 5 1 1 1 3 3
6. 6 1 2 2 2 2
7. 7 1 1 1 1 1
8. 8 2 2 2 2 3
9. 9 1 3 . . 3
10. 10 1 1 1 3 3
Here we specify kap instead of kappa because the variables record ratings for each rater.
. kap rater1-rater5
There are between 3 and 5 (median = 5.00) raters per subject:
Outcome Kappa Z Prob>Z
1 0.2685 . .
2 0.6457 . .
3 0.2938 . .
combined 0.3816 . .
Note: number of ratings per subject vary; cannot calculate test
statistics.
Stored results
kap and kappa store the following in r():
Scalars
r(N) number of subjects (kap only)
r(prop_o) observed proportion of agreement (kap only)
r(prop_e) expected proportion of agreement (kap only)
r(kappa) kappa
r(z) z statistic
r(se) standard error for kappa statistic
Methods and formulas
The kappa statistic was first proposed by Cohen (1960). The generalization for weights reflecting
the relative seriousness of each possible disagreement is due to Cohen (1968). The analysis-of-variance
approach for k = 2 and m ≥ 2 is due to Landis and Koch (1977b). See Altman (1991, 403–409)
or Dunn (2000, chap. 2) for an introductory treatment and Fleiss, Levin, and Paik (2003, chap. 18)
for a more detailed treatment. All formulas below are as presented in Fleiss, Levin, and Paik (2003).
Let m be the number of raters, and let k be the number of rating outcomes.
Methods and formulas are presented under the following headings:
kap: m = 2
kappa: m > 2, k = 2
kappa: m > 2, k > 2
kap: m = 2
Define $w_{ij}$ ($i = 1, \dots, k$ and $j = 1, \dots, k$) as the weights for agreement and disagreement
(wgt()), or, if the data are not weighted, define $w_{ii} = 1$ and $w_{ij} = 0$ for $i \neq j$. If wgt(w) is
specified, $w_{ij} = 1 - |i - j|/(k - 1)$. If wgt(w2) is specified, $w_{ij} = 1 - \{(i - j)/(k - 1)\}^2$.
The observed proportion of agreement is
\[ p_o = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij}\, p_{ij} \]
where $p_{ij}$ is the fraction of ratings $i$ by the first rater and $j$ by the second. The expected proportion
of agreement is
\[ p_e = \sum_{i=1}^{k} \sum_{j=1}^{k} w_{ij}\, p_{i\cdot}\, p_{\cdot j} \]
where $p_{i\cdot} = \sum_j p_{ij}$ and $p_{\cdot j} = \sum_i p_{ij}$.
Kappa is given by $\hat\kappa = (p_o - p_e)/(1 - p_e)$.
The standard error of $\hat\kappa$ for testing against 0 is
\[ \hat s_0 = \frac{1}{(1 - p_e)\sqrt{n}} \left[ \sum_i \sum_j p_{i\cdot}\, p_{\cdot j} \bigl\{ w_{ij} - (\bar w_{i\cdot} + \bar w_{\cdot j}) \bigr\}^2 - p_e^2 \right]^{1/2} \]
where $n$ is the number of subjects being rated, $\bar w_{i\cdot} = \sum_j p_{\cdot j} w_{ij}$, and $\bar w_{\cdot j} = \sum_i p_{i\cdot} w_{ij}$. The test
statistic $Z = \hat\kappa/\hat s_0$ is assumed to be distributed $N(0, 1)$.
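As a check on these formulas, here is a minimal sketch, assuming the rate2.dta data of example 1, that computes the unweighted kappa (w_ii = 1, w_ij = 0) directly from the two-way table of ratings; it should reproduce the 0.4728 shown there.

* minimal sketch: unweighted kappa computed from the table of ratings
use http://www.stata-press.com/data/r13/rate2, clear
quietly tabulate rada radb, matcell(F)
local N = r(N)
matrix P = F/`N'                        // cell proportions p_ij
local k = rowsof(P)
scalar po = 0
scalar pe = 0
forvalues i = 1/`k' {
    scalar po = po + P[`i',`i']         // observed agreement
    scalar ri = 0                       // row marginal p_i.
    scalar ci = 0                       // column marginal p_.i
    forvalues j = 1/`k' {
        scalar ri = ri + P[`i',`j']
        scalar ci = ci + P[`j',`i']
    }
    scalar pe = pe + ri*ci              // expected agreement
}
display "kappa = " (po - pe)/(1 - pe)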
kappa: m > 2, k = 2
Each subject $i$, $i = 1, \dots, n$, is found by $x_i$ of $m_i$ raters to be positive (the choice as to what is
labeled positive is arbitrary).
The overall proportion of positive ratings is $\bar p = \sum_i x_i/(n\bar m)$, where $\bar m = \sum_i m_i/n$. The
between-subjects mean square is (approximately)
\[ B = \frac{1}{n} \sum_i \frac{(x_i - m_i \bar p)^2}{m_i} \]
and the within-subject mean square is
\[ W = \frac{1}{n(\bar m - 1)} \sum_i \frac{x_i (m_i - x_i)}{m_i} \]
Kappa is then defined as
\[ \hat\kappa = \frac{B - W}{B + (\bar m - 1) W} \]
The standard error for testing against 0 (Fleiss and Cuzick 1979) is approximately equal to and is
calculated as
\[ \hat s_0 = \frac{1}{(\bar m - 1)\sqrt{n \bar m_H}} \left\{ 2(\bar m_H - 1) + \frac{(\bar m - \bar m_H)(1 - 4\bar p \bar q)}{\bar m\, \bar p\, \bar q} \right\}^{1/2} \]
where $\bar m_H$ is the harmonic mean of $m_i$ and $\bar q = 1 - \bar p$.
The test statistic $Z = \hat\kappa/\hat s_0$ is assumed to be distributed $N(0, 1)$.
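A minimal sketch, assuming the p612.dta data of example 5, that computes B, W, and kappa directly from these formulas; the result should match the 0.5415 reported there.

* minimal sketch: two-outcome kappa with a varying number of raters
use http://www.stata-press.com/data/r13/p612, clear
quietly summarize raters
local n = r(N)
local mbar = r(mean)                            // average number of raters, m-bar
quietly summarize pos
local pbar = r(sum)/(`n'*`mbar')                // overall positive proportion, p-bar
generate double bterm = (pos - raters*`pbar')^2/raters
generate double wterm = pos*(raters - pos)/raters
quietly summarize bterm
scalar B = r(sum)/`n'
quietly summarize wterm
scalar W = r(sum)/(`n'*(`mbar' - 1))
display "kappa = " (B - W)/(B + (`mbar' - 1)*W)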
kappa: m > 2, k > 2
Let $x_{ij}$ be the number of ratings on subject $i$, $i = 1, \dots, n$, into category $j$, $j = 1, \dots, k$. Define
$\bar p_j$ as the overall proportion of ratings in category $j$, $\bar q_j = 1 - \bar p_j$, and let $\hat\kappa_j$ be the kappa statistic
given above for $k = 2$ when category $j$ is compared with the amalgam of all other categories. Kappa
is
\[ \bar\kappa = \frac{\sum_j \bar p_j \bar q_j \hat\kappa_j}{\sum_j \bar p_j \bar q_j} \]
(Landis and Koch 1977b). In the case where the number of raters per subject, $\sum_j x_{ij}$, is a constant $m$
for all $i$, Fleiss, Nee, and Landis (1979) derived the following formulas for the approximate standard
errors. The standard error for testing $\hat\kappa_j$ against 0 is
\[ \hat s_j = \left\{ \frac{2}{n m (m - 1)} \right\}^{1/2} \]
and the standard error for testing $\bar\kappa$ is
\[ \bar s = \frac{\sqrt{2}}{\sum_j \bar p_j \bar q_j \sqrt{n m (m-1)}} \left\{ \Bigl( \sum_j \bar p_j \bar q_j \Bigr)^{2} - \sum_j \bar p_j \bar q_j (\bar q_j - \bar p_j) \right\}^{1/2} \]
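To see how the pieces fit together, here is a minimal sketch, assuming the p615.dta data of example 6 (m = 5 raters for every subject), that forms each per-category kappa from the k = 2 formulas above and combines them with the weights $\bar p_j \bar q_j$; it should reproduce the combined kappa of 0.4179.

* minimal sketch: per-category kappas combined with weights pbar_j*qbar_j
use http://www.stata-press.com/data/r13/p615, clear
local m = 5
quietly count
local n = r(N)
scalar num = 0
scalar den = 0
forvalues j = 1/3 {
    quietly summarize cat`j'
    local pbar = r(sum)/(`n'*`m')               // pbar_j
    local qbar = 1 - `pbar'                     // qbar_j
    generate double b`j' = (cat`j' - `m'*`pbar')^2/`m'
    generate double w`j' = cat`j'*(`m' - cat`j')/`m'
    quietly summarize b`j'
    scalar B = r(sum)/`n'
    quietly summarize w`j'
    scalar W = r(sum)/(`n'*(`m' - 1))
    scalar kj = (B - W)/(B + (`m' - 1)*W)       // kappa_j, the k = 2 case
    scalar num = num + `pbar'*`qbar'*kj
    scalar den = den + `pbar'*`qbar'
}
display "combined kappa = " num/den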
References
Abramson, J. H., and Z. H. Abramson. 2001. Making Sense of Data: A Self-Instruction Manual on the Interpretation
of Epidemiological Data. 3rd ed. New York: Oxford University Press.
Altman, D. G. 1991. Practical Statistics for Medical Research. London: Chapman & Hall/CRC.
Boyd, N. F., C. Wolfson, M. Moskowitz, T. Carlile, M. Petitclerc, H. A. Ferri, E. Fishell, A. Gregoire, M. Kiernan,
J. D. Longley, I. S. Simor, and A. B. Miller. 1982. Observer variation in the interpretation of xeromammograms.
Journal of the National Cancer Institute 68: 357–363.
Campbell, M. J., D. Machin, and S. J. Walters. 2007. Medical Statistics: A Textbook for the Health Sciences. 4th
ed. Chichester, UK: Wiley.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:
37–46.
. 1968. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit.
Psychological Bulletin 70: 213–220.
Cox, N. J. 2006. Assessing agreement of measurements and predictions in geomorphology. Geomorphology 76:
332–346.
Dunn, G. 2000. Statistics in Psychiatry. London: Arnold.
Fleiss, J. L., and J. Cuzick. 1979. The reliability of dichotomous judgments: Unequal numbers of judges per subject.
Applied Psychological Measurement 3: 537–542.
Fleiss, J. L., B. Levin, and M. C. Paik. 2003. Statistical Methods for Rates and Proportions. 3rd ed. New York:
Wiley.
Fleiss, J. L., J. C. M. Nee, and J. R. Landis. 1979. Large sample variance of kappa in the case of different sets of
raters. Psychological Bulletin 86: 974–977.
Gould, W. W. 1997. stata49: Interrater agreement. Stata Technical Bulletin 40: 2–8. Reprinted in Stata Technical
Bulletin Reprints, vol. 7, pp. 20–28. College Station, TX: Stata Press.
Landis, J. R., and G. G. Koch. 1977a. The measurement of observer agreement for categorical data. Biometrics 33:
159–174.
. 1977b. A one-way components of variance model for categorical data. Biometrics 33: 671–679.
Reichenheim, M. E. 2000. sxd3: Sample size for the kappa-statistic of interrater agreement. Stata Technical Bulletin
58: 41–45. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 382–387. College Station, TX: Stata Press.
. 2004. Confidence intervals for the kappa statistic. Stata Journal 4: 421–428.
Shrout, P. E. 2001. Jacob Cohen (1923–1998). American Psychologist 56: 166.
Steichen, T. J., and N. J. Cox. 1998a. sg84: Concordance correlation coefficient. Stata Technical Bulletin 43: 35–39.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 137–143. College Station, TX: Stata Press.
. 1998b. sg84.1: Concordance correlation coefficient, revisited. Stata Technical Bulletin 45: 21–23. Reprinted in
Stata Technical Bulletin Reprints, vol. 8, pp. 143–145. College Station, TX: Stata Press.
. 2000a. sg84.3: Concordance correlation coefficient: Minor corrections. Stata Technical Bulletin 58: 9. Reprinted
in Stata Technical Bulletin Reprints, vol. 10, p. 137. College Station, TX: Stata Press.
. 2000b. sg84.2: Concordance correlation coefficient: Update for Stata 6. Stata Technical Bulletin 54: 25–26.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 169–170. College Station, TX: Stata Press.
. 2002. A note on the concordance correlation coefficient. Stata Journal 2: 183–189.
Title
kdensity — Univariate kernel density estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
kdensity varname [if] [in] [weight] [, options]
options Description
Main
kernel(kernel)  specify kernel function; default is kernel(epanechnikov)
bwidth(#)  half-width of kernel
generate(newvar_x newvar_d)  store the estimation points in newvar_x and the density
  estimate in newvar_d
n(#)  estimate density using # points; default is min(N, 50)
at(var_x)  estimate density using the values specified by var_x
nograph  suppress graph
Kernel plot
cline_options  affect rendition of the plotted kernel density estimate
Density plots
normal  add normal density to the graph
normopts(cline_options)  affect rendition of normal density
student(#)  add Student’s t density with # degrees of freedom to the graph
stopts(cline_options)  affect rendition of the Student’s t density
Add plots
addplot(plot)  add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway_options  any options other than by() documented in [G-3] twoway options
kernel Description
epanechnikov Epanechnikov kernel function; the default
epan2 alternative Epanechnikov kernel function
biweight biweight kernel function
cosine cosine trace kernel function
gaussian Gaussian kernel function
parzen Parzen kernel function
rectangle rectangle kernel function
triangle triangle kernel function
fweights, aweights, and iweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Nonparametric analysis >Kernel density estimation
Description
kdensity produces kernel density estimates and graphs the result.
Options
 
Main
kernel(kernel) specifies the kernel function for use in calculating the kernel density estimate. The
default kernel is the Epanechnikov kernel (epanechnikov).
bwidth(#) specifies the half-width of the kernel, the width of the density window around each point.
If bwidth() is not specified, the “optimal” width is calculated and used. The optimal width is
the width that would minimize the mean integrated squared error if the data were Gaussian and a
Gaussian kernel were used, so it is not optimal in any global sense. In fact, for multimodal and highly
skewed densities, this width is usually too wide and oversmooths the density (Silverman 1992).
generate(newvar_x newvar_d) stores the results of the estimation. newvar_x will contain the points
at which the density is estimated. newvar_d will contain the density estimate.
n(#) specifies the number of points at which the density estimate is to be evaluated. The default is
min(N, 50), where N is the number of observations in memory.
at(var_x) specifies a variable that contains the values at which the density should be estimated.
This option allows you to more easily obtain density estimates for different variables or different
subsamples of a variable and then overlay the estimated densities for comparison.
nograph suppresses the graph. This option is often used with the generate() option.
 
Kernel plot
cline_options affect the rendition of the plotted kernel density estimate. See [G-3] cline options.
 
Density plots
normal requests that a normal density be overlaid on the density estimate for comparison.
normopts(cline_options) specifies details about the rendition of the normal curve, such as the color
and style of line used. See [G-3] cline options.
student(#) specifies that a Student’s t density with # degrees of freedom be overlaid on the density
estimate for comparison.
stopts(cline_options) affects the rendition of the Student’s t density. See [G-3] cline options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway_options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Remarks and examples
Kernel density estimators approximate the density f(x) from observations on x. Histograms do
this, too, and the histogram itself is a kind of kernel density estimate. The data are divided into
nonoverlapping intervals, and counts are made of the number of data points within each interval.
Histograms are bar graphs that depict these frequency counts; the bar is centered at the midpoint of
each interval, and its height reflects the average number of data points in the interval.
In more general kernel density estimates, the range is still divided into intervals, and estimates of
the density at the center of intervals are produced. One difference is that the intervals are allowed
to overlap. We can think of sliding the interval, called a window, along the range of the data
and collecting the center-point density estimates. The second difference is that, rather than merely
counting the number of observations in a window, a kernel density estimator assigns a weight between
0 and 1, based on the distance from the center of the window, and sums the weighted values. The
function that determines these weights is called the kernel.
Kernel density estimates have the advantages of being smooth and of being independent of the
choice of origin (corresponding to the location of the bins in a histogram).
See Salgado-Ugarte, Shimizu, and Taniuchi (1993) and Fox (1990) for discussions of kernel density
estimators that stress their use as exploratory data-analysis tools.
Cox (2007) gives a lucid introductory tutorial on kernel density estimation with several Stata-produced
examples. He provides tips and tricks for working with skewed or bounded distributions
and applying the same techniques to estimate the intensity function of a point process.
Example 1: Histogram and kernel density estimate
Goeden (1978) reports data consisting of 316 length observations of coral trout. We wish to
investigate the underlying density of the lengths. To begin on familiar ground, we might draw a
histogram. In [R] histogram, we suggest setting the bins to min(√n, 10 · log10 n), which for n = 316
is roughly 18:
. use http://www.stata-press.com/data/r13/trocolen
. histogram length, bin(18)
(bin=18, start=226, width=19.777778)
(histogram of length, 18 bins; y axis Density, x axis length)
The kernel density estimate, on the other hand, is smooth.
. kdensity length
(kernel density estimate of length; kernel = epanechnikov, bandwidth = 20.1510)
Kernel density estimators are, however, sensitive to an assumption, just as are histograms. In histograms,
we specify a number of bins. For kernel density estimators, we specify a width. In the graph above,
we used the default width. kdensity is smarter than twoway histogram in that its default width
is not a fixed constant. Even so, the default width is not necessarily best.
kdensity stores the width in the returned scalar bwidth, so typing display r(bwidth) reveals
it. Doing this, we discover that the width is approximately 20.
Widths are similar to the inverse of the number of bins in a histogram in that smaller widths
provide more detail. The units of the width are the units of x, the variable being analyzed. The width
is specified as a half-width, meaning that the kernel density estimator with half-width 20 corresponds
to sliding a window of size 40 across the data.
We can specify half-widths for ourselves by using the bwidth() option. Smaller widths do not
smooth the density as much:
. kdensity length, bwidth(10)
(kernel density estimate of length; kernel = epanechnikov, bandwidth = 10.0000)
. kdensity length, bwidth(15)
(kernel density estimate of length; kernel = epanechnikov, bandwidth = 15.0000)
Example 2: Different kernels can produce different results
When widths are held constant, different kernels can produce surprisingly different results. This
is really an attribute of the kernel and width combination; for a given width, some kernels are more
sensitive than others at identifying peaks in the density estimate.
We can see this when using a dataset with lots of peaks. In the automobile dataset, we characterize
the density of weight, the weight of the vehicles. Below we compare the Epanechnikov and Parzen
kernels.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. kdensity weight, kernel(epanechnikov) nograph generate(x epan)
. kdensity weight, kernel(parzen) nograph generate(x2 parzen)
. label var epan "Epanechnikov density estimate"
. label var parzen "Parzen density estimate"
. line epan parzen x, sort ytitle(Density) legend(cols(1))
(overlaid Epanechnikov and Parzen density estimates of weight)
We did not specify a width, so we obtained the default width. That width is not a function of the
selected kernel, but of the data. See Methods and formulas for the calculation of the optimal width.
Example 3: Density with overlaid normal density
In examining the density estimates, we may wish to overlay a normal density or a Student’s t
density for comparison. Using automobile weights, we can get an idea of the distance from normality
by using the normal option.
. kdensity weight, kernel(epanechnikov) normal
(kernel density estimate of weight with normal density overlaid; kernel = epanechnikov, bandwidth = 295.7504)
Example 4: Compare two densities
We also may want to compare two or more densities. In this example, we will compare the density
estimates of the weights for the foreign and domestic cars.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. kdensity weight, nograph generate(x fx)
. kdensity weight if foreign==0, nograph generate(fx0) at(x)
. kdensity weight if foreign==1, nograph generate(fx1) at(x)
. label var fx0 "Domestic cars"
. label var fx1 "Foreign cars"
. line fx0 fx1 x, sort ytitle(Density)
(overlaid density estimates of weight; legend: Domestic cars, Foreign cars)
Technical note
Although all the examples we included had densities of less than 1, the density may exceed 1.
The probability density f(x) of a continuous variable, x, has the units and dimensions of the
reciprocal of x. If x is measured in meters, f(x) has units 1/meter. Thus the density is not measured
on a probability scale, so it is possible for f(x) to exceed 1.
To see this, think of a uniform density on the interval 0 to 1. The area under the density curve is
1: this is the product of the density, which is constant at 1, and the range, which is 1. If the variable
is then transformed by doubling, the area under the curve remains 1 and is the product of the density,
constant at 0.5, and the range, which is 2. Conversely, if the variable is transformed by halving, the
area under the curve also remains at 1 and is the product of the density, constant at 2, and the range,
which is 0.5. (Strictly, the range is measured in certain units, and the density is measured in the
reciprocal of those units, so the units cancel on multiplication.)
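To make the point concrete, here is a minimal sketch (the seed and sample size are arbitrary): a variable concentrated on a short interval has estimated density well above 1, even though the area under the curve is still 1.

* minimal sketch: a variable roughly uniform on (0, 0.25) has true density 4
clear
set obs 1000
set seed 12345
generate double u = runiform()/4
kdensity u, nograph generate(gx gd)
quietly summarize gd
display "maximum estimated density = " r(max)      // well above 1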
Stored results
kdensity stores the following in r():
Scalars
r(bwidth) kernel bandwidth
r(n) number of points at which the estimate was evaluated
r(scale) density bin width
Macros
r(kernel) name of kernel
Methods and formulas
A kernel density estimate is formed by summing the weighted values calculated with the kernel
function $K$, as in
\[ \hat f_K(x) = \frac{1}{qh} \sum_{i=1}^{n} w_i\, K\!\left(\frac{x - X_i}{h}\right) \]
where $q = \sum_i w_i$ if weights are frequency weights (fweight) or analytic weights (aweight), and
$q = 1$ if weights are importance weights (iweights). Analytic weights are rescaled so that $\sum_i w_i = n$
(see [U] 11 Language syntax). If weights are not used, then $w_i = 1$, for $i = 1, \dots, n$. kdensity
includes seven different kernel functions. The Epanechnikov is the default function if no other kernel
is specified and is the most efficient in minimizing the mean integrated squared error.
Kernel          Formula
Biweight        $K[z] = \frac{15}{16}(1 - z^2)^2$ if $|z| < 1$; 0 otherwise
Cosine          $K[z] = 1 + \cos(2\pi z)$ if $|z| < 1/2$; 0 otherwise
Epanechnikov    $K[z] = \frac{3}{4}(1 - z^2/5)/\sqrt{5}$ if $|z| < \sqrt{5}$; 0 otherwise
Epan2           $K[z] = \frac{3}{4}(1 - z^2)$ if $|z| < 1$; 0 otherwise
Gaussian        $K[z] = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
Parzen          $K[z] = \frac{4}{3} - 8z^2 + 8|z|^3$ if $|z| \le 1/2$; $8(1 - |z|)^3/3$ if $1/2 < |z| \le 1$; 0 otherwise
Rectangular     $K[z] = 1/2$ if $|z| < 1$; 0 otherwise
Triangular      $K[z] = 1 - |z|$ if $|z| < 1$; 0 otherwise
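To connect the formula with kdensity’s output, here is a minimal sketch, assuming the trocolen.dta data of example 1, that evaluates the sum by hand at a single, arbitrarily chosen point (x0 = 400) with the default Epanechnikov kernel and default bandwidth.

* minimal sketch: the estimator above evaluated by hand at one point
use http://www.stata-press.com/data/r13/trocolen, clear
quietly kdensity length, nograph                  // obtain the default bandwidth
local h = r(bwidth)
local x0 = 400                                    // arbitrary evaluation point
generate double z = (`x0' - length)/`h'
generate double kz = cond(abs(z) < sqrt(5), 0.75*(1 - z^2/5)/sqrt(5), 0)
quietly summarize kz
display "estimated density at `x0' = " r(sum)/(r(N)*`h')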
From the definitions given in the table, we can see that the choice of $h$ will drive how many
values are included in estimating the density at each point. This value is called the window width or
bandwidth. If the window width is not specified, it is determined as
\[ m = \min\!\left( \sqrt{\mathrm{variance}_x},\ \frac{\mathrm{interquartile\ range}_x}{1.349} \right) \]
\[ h = \frac{0.9\, m}{n^{1/5}} \]
where $x$ is the variable for which we wish to estimate the kernel and $n$ is the number of observations.
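As a check, the following minimal sketch, again assuming trocolen.dta, computes the default half-width from this formula; it should be close to the bandwidth of about 20.15 that kdensity reported in example 1.

* minimal sketch: the default half-width computed from the formula above
use http://www.stata-press.com/data/r13/trocolen, clear
quietly summarize length, detail
local m = min(r(sd), (r(p75) - r(p25))/1.349)
local h = 0.9*`m'/(r(N)^(1/5))
quietly kdensity length, nograph
display "hand-computed h = " `h' "    r(bwidth) = " r(bwidth)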
Most researchers agree that the choice of kernel is not as important as the choice of bandwidth.
There is a great deal of literature on choosing bandwidths under various conditions; see, for example,
Parzen (1962) or Tapia and Thompson (1978). Also see Newton (1988) for a comparison with sample
spectral density estimation in time-series applications.
Acknowledgments
We gratefully acknowledge the previous work by Isaías H. Salgado-Ugarte of Universidad Nacional
Autónoma de México, and Makoto Shimizu and Toru Taniuchi of the University of Tokyo; see
Salgado-Ugarte, Shimizu, and Taniuchi (1993). Their article provides a good overview of the subject
of univariate kernel density estimation and presents arguments for its use in exploratory data analysis.
References
Cox, N. J. 2005. Speaking Stata: Density probability plots. Stata Journal 5: 259–273.
. 2007. Kernel estimation as a basic tool for geomorphological data analysis. Earth Surface Processes and
Landforms 32: 1902–1912.
Fiorio, C. V. 2004. Confidence intervals for kernel density estimation. Stata Journal 4: 168–179.
Fox, J. 1990. Describing univariate distributions. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long,
58–125. Newbury Park, CA: Sage.
Goeden, G. B. 1978. A monograph of the coral trout, Plectropomus leopardus (Lacépède). Queensland Fisheries
Services Research Bulletin 1: 1–42.
Services Research Bulletin 1: 1–42.
Kohler, U., and F. Kreuter. 2012. Data Analysis Using Stata. 3rd ed. College Station, TX: Stata Press.
Newton, H. J. 1988. TIMESLAB: A Time Series Analysis Laboratory. Belmont, CA: Wadsworth.
Parzen, E. 1962. On estimation of a probability density function and mode. Annals of Mathematical Statistics 33:
1065–1076.
Royston, P., and N. J. Cox. 2005. A multivariable scatterplot smoother. Stata Journal 5: 405–412.
Salgado-Ugarte, I. H., and M. A. Pérez-Hernández. 2003. Exploring the use of variable bandwidth kernel density
estimators. Stata Journal 3: 133–147.
Salgado-Ugarte, I. H., M. Shimizu, and T. Taniuchi. 1993. snp6: Exploring the shape of univariate data using kernel
density estimators. Stata Technical Bulletin 16: 8–19. Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp.
155–173. College Station, TX: Stata Press.
155–173. College Station, TX: Stata Press.
. 1995a. snp6.1: ASH, WARPing, and kernel density estimation for univariate data. Stata Technical Bulletin 26:
23–31. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 161–172. College Station, TX: Stata Press.
. 1995b. snp6.2: Practical rules for bandwidth selection in univariate density estimation. Stata Technical Bulletin
27: 5–19. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 172–190. College Station, TX: Stata Press.
. 1997. snp13: Nonparametric assessment of multimodality for univariate data. Stata Technical Bulletin 38: 27–35.
Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 232–243. College Station, TX: Stata Press.
Scott, D. W. 1992. Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley.
Silverman, B. W. 1992. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Simonoff, J. S. 1996. Smoothing Methods in Statistics. New York: Springer.
Steichen, T. J. 1998. gr33: Violin plots. Stata Technical Bulletin 46: 13–18. Reprinted in Stata Technical Bulletin
Reprints, vol. 8, pp. 57–65. College Station, TX: Stata Press.
Tapia, R. A., and J. R. Thompson. 1978. Nonparametric Probability Density Estimation. Baltimore: Johns Hopkins
University Press.
Van Kerm, P. 2003. Adaptive kernel density estimation. Stata Journal 3: 148–156.
. 2012. Kernel-smoothed cumulative distribution function estimation with akdensity. Stata Journal 12: 543–548.
Wand, M. P., and M. C. Jones. 1995. Kernel Smoothing. London: Chapman & Hall.
Also see
[R] histogram — Histograms for continuous and categorical variables
Title
ksmirnov — Kolmogorov–Smirnov equality-of-distributions test
Syntax Menu Description
Options for two-sample test Remarks and examples Stored results
Methods and formulas References Also see
Syntax
One-sample Kolmogorov–Smirnov test
ksmirnov varname = exp [if] [in]
Two-sample Kolmogorov–Smirnov test
ksmirnov varname [if] [in] , by(groupvar) [exact]
Menu
Statistics >Nonparametric analysis >Tests of hypotheses >Kolmogorov-Smirnov test
Description
ksmirnov performs one- and two-sample Kolmogorov–Smirnov tests of the equality of distributions.
In the first syntax, varname is the variable whose distribution is being tested, and exp must evaluate to
the corresponding (theoretical) cumulative. In the second syntax, groupvar must take on two distinct
values. The distribution of varname for the first value of groupvar is compared with that of the second
value.
When testing for normality, please see [R] sktest and [R] swilk.
Options for two-sample test
 
Main
by(groupvar)is required. It specifies a binary variable that identifies the two groups.
exact specifies that the exact p-value be computed. This may take a long time if n > 50.
Remarks and examples
Example 1: Two-sample test
Say that we have data on x that resulted from two different experiments, labeled as group==1
and group==2. Our data contain
. use http://www.stata-press.com/data/r13/ksxmpl
. list
group x
1. 2 2
2. 1 0
3. 2 3
4. 1 4
5. 1 5
6. 2 8
7. 2 10
We wish to use the two-sample Kolmogorov–Smirnov test to determine if there are any differences
in the distribution of x for these two groups:
. ksmirnov x, by(group)
Two-sample Kolmogorov-Smirnov test for equality of distribution functions
Smaller group D P-value Corrected
1: 0.5000 0.424
2: -0.1667 0.909
Combined K-S: 0.5000 0.785 0.735
The first line tests the hypothesis that x for group 1 contains smaller values than for group 2. The
largest difference between the distribution functions is 0.5. The approximate p-value for this is 0.424,
which is not significant.
The second line tests the hypothesis that x for group 1 contains larger values than for group 2.
The largest difference between the distribution functions in this direction is 0.1667. The approximate
p-value for this small difference is 0.909.
Finally, the approximate p-value for the combined test is 0.785, corrected to 0.735. The p-values
ksmirnov calculates are based on the asymptotic distributions derived by Smirnov (1933). These
approximations are not good for small samples (n < 50). They are too conservative; real p-values
tend to be substantially smaller. We have also included a less conservative approximation for the
nondirectional hypothesis based on an empirical continuity correction, the 0.735 reported in the third
column.
That number, too, is only an approximation. An exact value can be calculated using the exact
option:
. ksmirnov x, by(group) exact
Two-sample Kolmogorov-Smirnov test for equality of distribution functions
Smaller group D P-value Exact
1: 0.5000 0.424
2: -0.1667 0.909
Combined K-S: 0.5000 0.785 0.657
Example 2: One-sample test
Let’s now test whether x in the example above is distributed normally. Kolmogorov–Smirnov is
not a particularly powerful test in testing for normality, and we do not endorse such use of it; see
[R] sktest and [R] swilk for better tests.
In any case, we will test against a normal distribution with the same mean and standard deviation:
. summarize x
Variable Obs Mean Std. Dev. Min Max
x 7 4.571429 3.457222 0 10
. ksmirnov x = normal((x-4.571429)/3.457222)
One-sample Kolmogorov-Smirnov test against theoretical distribution
normal((x-4.571429)/3.457222)
Smaller group D P-value Corrected
x: 0.1650 0.683
Cumulative: -0.1250 0.803
Combined K-S: 0.1650 0.991 0.978
Because Stata has no way of knowing that we based this calculation on the calculated mean and standard
deviation of x, the test statistics will be slightly conservative in addition to being approximations.
Nevertheless, they clearly indicate that the data cannot be distinguished from normally distributed
data.
Stored results
ksmirnov stores the following in r():
Scalars
r(D_1) D from line 1
r(p_1) p-value from line 1
r(D_2) D from line 2
r(p_2) p-value from line 2
r(D) combined D
r(p) combined p-value
r(p_cor) corrected combined p-value
r(p_exact) exact combined p-value
Macros
r(group1) name of group from line 1
r(group2) name of group from line 2
Methods and formulas
In general, the Kolmogorov–Smirnov test (Kolmogorov 1933; Smirnov 1933; also see Conover
[1999], 428–465) is not very powerful against differences in the tails of distributions. In return for
this, it is fairly powerful for alternative hypotheses that involve lumpiness or clustering in the data.
The directional hypotheses are evaluated with the statistics
\[ D^{+} = \max_x \bigl\{ F(x) - G(x) \bigr\} \]
\[ D^{-} = \min_x \bigl\{ F(x) - G(x) \bigr\} \]
where $F(x)$ and $G(x)$ are the empirical distribution functions for the samples being compared. The
combined statistic is
\[ D = \max\bigl( |D^{+}|,\ |D^{-}| \bigr) \]
The p-value for this statistic may be obtained by evaluating the asymptotic limiting distribution. Let
$m$ be the sample size for the first sample, and let $n$ be the sample size for the second sample.
Smirnov (1933) shows that
\[ \lim_{m,n \to \infty} \Pr\bigl\{ \sqrt{mn/(m+n)}\, D_{m,n} \le z \bigr\} = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} \exp\bigl(-2 i^2 z^2\bigr) \]
The first five terms form the approximation $P_a$ used by Stata. The exact p-value is calculated by a
counting algorithm; see Gibbons and Chakraborti (2011, 236–238). A corrected p-value was obtained
by modifying the asymptotic p-value by using a numerical approximation technique:
\[ Z = \Phi^{-1}(P_a) + \frac{1.04}{\min(m, n)} + \frac{2.09}{\max(m, n)} - \frac{1.35}{\sqrt{mn/(m+n)}} \]
\[ \text{p-value} = \Phi(Z) \]
where $\Phi(\cdot)$ is the cumulative normal distribution.
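As an illustration, the series can be evaluated directly for the combined statistic of example 1 (D = 0.5 with group sizes 3 and 4); the minimal sketch below sums the first five terms and should give a value close to the 0.785 reported there.

* minimal sketch: first five terms of the asymptotic series for example 1
local D = 0.5
local m = 3
local n = 4
local z = sqrt(`m'*`n'/(`m' + `n'))*`D'
local Pa = 0
forvalues i = 1/5 {
    local Pa = `Pa' + 2*(-1)^(`i' - 1)*exp(-2*`i'^2*`z'^2)
}
display "asymptotic p-value = " `Pa'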
 
Andrei Nikolayevich Kolmogorov (1903–1987), of Russia, was one of the great mathematicians
of the twentieth century, making outstanding contributions in many different branches, including
set theory, measure theory, probability and statistics, approximation theory, functional analysis,
classical dynamics, and theory of turbulence. He was a faculty member at Moscow State University
for more than 60 years.
Nikolai Vasilyevich Smirnov (1900–1966) was a Russian statistician whose work included
contributions in nonparametric statistics, order statistics, and goodness of fit. After army service
and the study of philosophy and philology, he turned to mathematics and eventually rose to be
head of mathematical statistics at the Steklov Mathematical Institute in Moscow.
 
References
Aivazian, S. A. 1997. Smirnov, Nikolai Vasil’yevich. In Leading Personalities in Statistical Sciences: From the
Seventeenth Century to the Present, ed. N. L. Johnson and S. Kotz, 208–210. New York: Wiley.
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
Gibbons, J. D., and S. Chakraborti. 2011. Nonparametric Statistical Inference. 5th ed. Boca Raton, FL: Chapman &
Hall/CRC.
Goerg, S. J., and J. Kaiser. 2009. Nonparametric testing of distributions—the Epps–Singleton two-sample test using
the empirical characteristic function. Stata Journal 9: 454–465.
Jann, B. 2008. Multinomial goodness-of-fit: Large-sample tests with survey design correction and exact tests for small
samples. Stata Journal 8: 147–169.
Johnson, N. L., and S. Kotz. 1997. Kolmogorov, Andrei Nikolayevich. In Leading Personalities in Statistical Sciences:
From the Seventeenth Century to the Present, ed. N. L. Johnson and S. Kotz, 255–256. New York: Wiley.
Kolmogorov, A. N. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’ Istituto Italiano
degli Attuari 4: 83–91.
Riffenburgh, R. H. 2012. Statistics in Medicine. 3rd ed. San Diego, CA: Academic Press.
Smirnov, N. V. 1933. Estimate of deviation between empirical distribution functions in two independent samples.
Bulletin Moscow University 2: 3–16.
Also see
[R] runtest — Test for random order
[R] sktest — Skewness and kurtosis test for normality
[R] swilk — Shapiro–Wilk and Shapiro–Francia tests for normality
Title
kwallis — Kruskal–Wallis equality-of-populations rank test
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
kwallis varname [if] [in] , by(groupvar)
Menu
Statistics >Nonparametric analysis >Tests of hypotheses >Kruskal-Wallis rank test
Description
kwallis tests the hypothesis that several samples are from the same population. In the syntax
diagram above, varname refers to the variable recording the outcome, and groupvar refers to the
variable denoting the population. by() is required.
Option
by(groupvar)is required. It specifies a variable that identifies the groups.
Remarks and examples
Example 1
We have data on the 50 states. The data contain the median age of the population, medage, and
the region of the country, region, for each state. We wish to test for the equality of the median age
distribution across all four regions simultaneously:
. use http://www.stata-press.com/data/r13/census
(1980 Census data by state)
. kwallis medage, by(region)
Kruskal-Wallis equality-of-populations rank test
region Obs Rank Sum
NE 9 376.50
N Cntrl 12 294.00
South 16 398.00
West 13 206.50
chi-squared = 17.041 with 3 d.f.
probability = 0.0007
chi-squared with ties = 17.062 with 3 d.f.
probability = 0.0007
From the output, we see that we can reject the hypothesis that the populations are the same at any
level below 0.07%.
Stored results
kwallis stores the following in r():
Scalars
r(df) degrees of freedom
r(chi2) χ2
r(chi2 adj) χ2adjusted for ties
Methods and formulas
The Kruskal–Wallis test (Kruskal and Wallis 1952, 1953; also see Altman [1991, 213–215];
Conover [1999, 288–297]; and Riffenburgh [2012, sec. 11.6]) is a multiple-sample generalization of
the two-sample Wilcoxon (also called Mann–Whitney) rank sum test (Wilcoxon 1945; Mann and
Whitney 1947). Samples of sizes $n_j$, $j = 1, \dots, m$, are combined and ranked in ascending order of
magnitude. Tied values are assigned the average ranks. Let $n$ denote the overall sample size, and let
$R_j = \sum_{i=1}^{n_j} R(X_{ji})$ denote the sum of the ranks for the $j$th sample. The Kruskal–Wallis one-way
analysis-of-variance test, $H$, is defined as
\[ H = \frac{1}{S^2} \left\{ \sum_{j=1}^{m} \frac{R_j^2}{n_j} - \frac{n(n+1)^2}{4} \right\} \]
where
\[ S^2 = \frac{1}{n - 1} \left\{ \sum_{\text{all ranks}} R(X_{ji})^2 - \frac{n(n+1)^2}{4} \right\} \]
If there are no ties, this equation simplifies to
\[ H = \frac{12}{n(n+1)} \sum_{j=1}^{m} \frac{R_j^2}{n_j} - 3(n+1) \]
The sampling distribution of $H$ is approximately $\chi^2$ with $m - 1$ degrees of freedom.
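A minimal sketch, assuming census.dta as in example 1, that computes the tie-corrected H directly from these formulas; it should be close to the 17.062 reported there.

* minimal sketch: H with ties computed from average ranks
use http://www.stata-press.com/data/r13/census, clear
egen double rk = rank(medage)             // average ranks for tied values
quietly count
local n = r(N)
generate double rk2 = rk^2
quietly summarize rk2
scalar S2 = (r(sum) - `n'*(`n' + 1)^2/4)/(`n' - 1)
scalar H = 0
quietly levelsof region, local(regions)
foreach g of local regions {
    quietly summarize rk if region == `g'
    scalar H = H + r(sum)^2/r(N)
}
scalar H = (H - `n'*(`n' + 1)^2/4)/S2
display "chi-squared with ties = " H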
 
William Henry Kruskal (1919–2005) was born in New York City. He studied mathematics and
statistics at Antioch College, Harvard, and Columbia, and joined the University of Chicago
in 1951. He made many outstanding contributions to linear models, nonparametric statistics,
government statistics, and the history and methodology of statistics.
Wilson Allen Wallis (19121998) was born in Philadelphia. He studied psychology and economics
at the Universities of Minnesota and Chicago and at Columbia. He taught at Yale, Stanford, and
Chicago, before moving as president (later chancellor) to the University of Rochester in 1962. He
also served in several Republican administrations. Wallis served as editor of the Journal of the
American Statistical Association, coauthored a popular introduction to statistics, and contributed
to nonparametric statistics.
 
References
Altman, D. G. 1991. Practical Statistics for Medical Research. London: Chapman & Hall/CRC.
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
Fienberg, S. E., S. M. Stigler, and J. M. Tanur. 2007. The William Kruskal Legacy: 1919–2005. Statistical Science
22: 255–261.
Kruskal, W. H., and W. A. Wallis. 1952. Use of ranks in one-criterion variance analysis. Journal of the American
Statistical Association 47: 583–621.
. 1953. Errata: Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association
48: 907–911.
Mann, H. B., and D. R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger
than the other. Annals of Mathematical Statistics 18: 50–60.
Newson, R. B. 2006. Confidence intervals for rank statistics: Somers’ D and extensions. Stata Journal 6: 309–334.
Olkin, I. 1991. A conversation with W. Allen Wallis. Statistical Science 6: 121–140.
Riffenburgh, R. H. 2012. Statistics in Medicine. 3rd ed. San Diego, CA: Academic Press.
Wilcoxon, F. 1945. Individual comparisons by ranking methods. Biometrics 1: 80–83.
Zabell, S. L. 1994. A conversation with William Kruskal. Statistical Science 9: 285–303.
Also see
[R] nptrend – Test for trend across ordered groups
[R] oneway – One-way analysis of variance
[R] sdtest – Variance-comparison tests
[R] signrank – Equality tests on matched data
Title
ladder — Ladder of powers
Syntax Menu Description
Options for ladder Options for gladder Options for qladder
Remarks and examples Stored results Methods and formulas
Acknowledgment References Also see
Syntax
Ladder of powers
ladder varname [if] [in] [, generate(newvar) noadjust]
Ladder-of-powers histograms
gladder varname [if] [in] [, histogram options combine options]
Ladder-of-powers quantile–normal plots
qladder varname [if] [in] [, qnorm options combine options]
by is allowed with ladder; see [D] by.
Menu
ladder
Statistics > Summaries, tables, and tests > Distributional plots and tests > Ladder of powers
gladder
Statistics > Summaries, tables, and tests > Distributional plots and tests > Ladder-of-powers histograms
qladder
Statistics > Summaries, tables, and tests > Distributional plots and tests > Ladder-of-powers quantile-normal plots
Description
ladder searches a subset of the ladder of powers (Tukey 1977) for a transform that converts
varname into a normally distributed variable. sktest tests for normality; see [R] sktest. Also see
[R] boxcox.
gladder displays nine histograms of transforms of varname according to the ladder of powers.
gladder is useful pedagogically, but we do not advise looking at histograms for research work;
ladder or qnorm (see [R] diagnostic plots) is preferred.
qladder displays the quantiles of transforms of varname according to the ladder of powers against
the quantiles of a normal distribution.
1019
1020 ladder — Ladder of powers
Options for ladder
 
Main
generate(newvar)saves the transformed values corresponding to the minimum chi-squared value
from the table. We do not recommend using generate() because it is literal in interpreting the
minimum, thus ignoring nearly equal but perhaps more interpretable transforms.
noadjust is the noadjust option to sktest; see [R] sktest.
Options for gladder
histogram options affect the rendition of the histograms across all relevant transformations; see
[R] histogram. Here the normal option is assumed, so you must supply the nonormal option
to suppress the overlaid normal density. Also, gladder does not allow the width(#)option of
histogram.
combine options are any of the options documented in [G-2] graph combine. These include options for
titling the graph (see [G-3] title options) and for saving the graph to disk (see [G-3] saving option).
Options for qladder
qnorm options affect the rendition of the quantile–normal plots across all relevant transformations.
See [R] diagnostic plots.
combine options are any of the options documented in [G-2] graph combine. These include options for
titling the graph (see [G-3] title options) and for saving the graph to disk (see [G-3] saving option).
Remarks and examples
Example 1: ladder
We have data on the mileage rating of 74 automobiles and wish to find a transform that makes
the variable normally distributed:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ladder mpg
Transformation formula chi2(2) P(chi2)
cubic mpg^3 43.59 0.000
square mpg^2 27.03 0.000
identity mpg 10.95 0.004
square root sqrt(mpg) 4.94 0.084
log log(mpg) 0.87 0.647
1/(square root) 1/sqrt(mpg) 0.20 0.905
inverse 1/mpg 2.36 0.307
1/square 1/(mpg^2) 11.99 0.002
1/cubic 1/(mpg^3) 24.30 0.000
If we had typed ladder mpg, gen(mpgx), the variable mpgx containing 1/sqrt(mpg) would have been
automatically generated for us. This is the perfect example of why you should not, in general, specify
the generate() option. We also cannot reject the hypothesis that the inverse of mpg is normally
distributed, and 1/mpg (gallons per mile) has a better interpretation: it is a measure of energy
consumption.
Example 2: gladder
gladder explores the same transforms as ladder but presents results graphically:
. gladder mpg, fraction
(figure omitted: nine ladder-of-powers histograms of Mileage (mpg), one per transformation: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic; y axis: Fraction; overall title: Histograms by transformation)
Technical note
gladder is useful pedagogically, but be careful when using it for research work, especially with
many observations. For instance, consider the following data on the average July temperature in
degrees Fahrenheit for 954 U.S. cities:
. use http://www.stata-press.com/data/r13/citytemp
(City Temperature Data)
. ladder tempjuly
Transformation formula chi2(2) P(chi2)
cubic tempjuly^3 47.49 0.000
square tempjuly^2 19.70 0.000
identity tempjuly 3.83 0.147
square root sqrt(tempjuly) 1.83 0.400
log log(tempjuly) 5.40 0.067
1/(square root) 1/sqrt(tempjuly) 13.72 0.001
inverse 1/tempjuly 26.36 0.000
1/square 1/(tempjuly^2) 64.43 0.000
1/cubic 1/(tempjuly^3) . 0.000
The period in the last line indicates that the χ2 is very large; see [R] sktest.
From the table, we see that there is certainly a difference in normality between the square and
square-root transform. If, however, you can see the difference between the transforms in the diagram
below, you have better eyes than we do:
. gladder tempjuly, l1title("") ylabel(none) xlabel(none)
(figure omitted: gladder histograms of Average July temperature by transformation: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic; overall title: Histograms by transformation)
Example 3: qladder
A better graph for seeing normality is the quantile–normal graph, which can be produced by qladder.
. qladder tempjuly, ylabel(none) xlabel(none)
(figure omitted: qladder quantile–normal plots of Average July temperature by transformation: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic; overall title: Quantile–Normal plots by transformation)
This graph shows that for the square transform, the upper tail (and only the upper tail) diverges
from what would be expected. This divergence is detected by sktest (see [R] sktest) as a problem
with skewness, as we would learn from using sktest to examine tempjuly squared and square
rooted.
Stored results
ladder stores the following in r():
Scalars
r(N) number of observations
r(invcube) χ2 for inverse-cubic transformation
r(P_invcube) significance level for inverse-cubic transformation
r(invsq) χ2 for inverse-square transformation
r(P_invsq) significance level for inverse-square transformation
r(inv) χ2 for inverse transformation
r(P_inv) significance level for inverse transformation
r(invsqrt) χ2 for inverse-root transformation
r(P_invsqrt) significance level for inverse-root transformation
r(log) χ2 for log transformation
r(P_log) significance level for log transformation
r(sqrt) χ2 for square-root transformation
r(P_sqrt) significance level for square-root transformation
r(ident) χ2 for untransformed data
r(P_ident) significance level for untransformed data
r(square) χ2 for square transformation
r(P_square) significance level for square transformation
r(cube) χ2 for cubic transformation
r(P_cube) significance level for cubic transformation
Methods and formulas
For ladder, results are as reported by sktest; see [R] sktest. If generate() is specified, the
transform with the minimum χ2 value is chosen.
gladder sets the number of bins to min(√n, 10 log10 n), rounded to the closest integer, where
n is the number of unique values of varname. See [R] histogram for a discussion of the optimal
number of bins.
Also see Findley (1990) for a ladder-of-powers variable transformation program that produces
one-way graphs with overlaid box plots, in addition to histograms with overlaid normals. Buchner and
Findley (1990) discuss ladder-of-powers transformations as one aspect of preliminary data analysis.
Also see Hamilton (1992, 18–23) and Hamilton (2013, 129–132).
Acknowledgment
qladder was written by Jeroen Weesie of the Department of Sociology at Utrecht University, The
Netherlands.
References
Buchner, D. M., and T. W. Findley. 1990. Research in physical medicine and rehabilitation: VIII. Preliminary data
analysis. American Journal of Physical Medicine and Rehabilitation 69: 154–169.
Cox, N. J. 2005. Speaking Stata: Density probability plots. Stata Journal 5: 259–273.
Findley, T. W. 1990. sed3: Variable transformation and evaluation. Stata Technical Bulletin 2: 15. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, pp. 85–86. College Station, TX: Stata Press.
Hamilton, L. C. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison–Wesley.
Also see
[R] diagnostic plots – Distributional diagnostic plots
[R] lnskew0 – Find zero-skewness log or Box–Cox transform
[R] lv – Letter-value displays
[R] sktest – Skewness and kurtosis test for normality
Title
level — Set default confidence level
Syntax Description Option Remarks and examples Also see
Syntax
set level # [, permanently]
Description
set level specifies the default confidence level for confidence intervals for all commands that
report confidence intervals. The initial value is 95, meaning 95% confidence intervals. # may be
between 10.00 and 99.99, and # can have at most two digits after the decimal point.
Option
permanently specifies that, in addition to making the change right now, the level setting be
remembered and become the default setting when you invoke Stata.
Remarks and examples
To change the level of confidence intervals reported by a particular command, you need not reset
the default confidence level. All commands that report confidence intervals have a level(#) option.
When you do not specify the option, the confidence intervals are calculated for the default level set
by set level, or for 95% if you have not reset set level.
Example 1
We use the ci command to obtain the confidence interval for the mean of mpg:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ci mpg
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.2973 .6725511 19.9569 22.63769
To obtain 90% confidence intervals, we would type
. ci mpg, level(90)
Variable Obs Mean Std. Err. [90% Conf. Interval]
mpg 74 21.2973 .6725511 20.17683 22.41776
or
. set level 90
. ci mpg
Variable Obs Mean Std. Err. [90% Conf. Interval]
mpg 74 21.2973 .6725511 20.17683 22.41776
If we opt for the second alternative, the next time that we fit a model (say, with regress), 90%
confidence intervals will be reported. If we wanted 95% confidence intervals, we could specify
level(95) on the estimation command, or we could reset the default by typing set level 95.
The current setting of level() is stored as the c-class value c(level); see [P] creturn.
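For illustration, you can confirm the current default at any time by displaying that c-class value:
. display c(level)
If you have just typed set level 90, as in the example above, this displays 90; after set level 95, it displays 95.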
Also see
[R] query – Display system parameters
[P] creturn – Return c-class values
[U] 20 Estimation and postestimation commands
[U] 20.7 Specifying the width of confidence intervals
Title
limits — Quick reference for limits
Description Remarks and examples Also see
Description
This entry provides a quick reference for the size limits in Stata. Note that most of these limits
are so high that you will never encounter them.
Remarks and examples
Remarks are presented under the following headings:
Maximum size limits
Matrix size
Determining which flavor of Stata you are running
Maximum size limits
                                          Small Stata     Stata/IC     Stata/MP and Stata/SE
#of observations (1) 1,200 2,147,483,647 2,147,483,647
#of variables 99 2,047 32,767
width of a dataset in bytes 800 24,564 393,192
value of matsize 100 800 11,000
#of RHS variables 98 798 10,998
#characters in a command 13,416 165,216 1,081,527
#options for a command 70 70 70
#of elements in a numlist 2,500 2,500 2,500
#of interacted continuous variables 8 8 8
#of interacted factor variables 8 8 8
#of unique time-series operators in
a command 100 100 100
#seasonal suboperators per time-series
operator 8 8 8
#of dyadic operators in an expression 66 800 800
#of numeric literals in an expression 50 300 300
#of string literals in an expression 256 512 512
length of string in string expression 2,000,000,000 2,000,000,000 2,000,000,000
#of sum functions in an expression 5 5 5
#of pairs of nested parentheses 249 249 249
#of characters in a macro (2) 13,400 165,200 1,081,511
#of nested do-files 64 64 64
(continued)                               Small Stata     Stata/IC     Stata/MP and Stata/SE
#of lines in a program 3,500 3,500 3,500
#of characters in a program 135,600 135,600 135,600
length of a variable name 32 32 32
length of ado-command name 32 32 32
length of a global macro name 32 32 32
length of a local macro name 31 31 31
length of a str# variable 2,045 2,045 2,045
length of a strL variable 2,000,000,000 2,000,000,000 2,000,000,000
anova
#of variables in one anova term 8 8 8
#of terms in the repeated() option 4 4 4
char
length of one characteristic 13,400 67,784 67,784
constraint
#of constraints 1,999 1,999 1,999
encode and decode
#of unique values 1,000 65,536 65,536
estimates hold
#of stored estimation results 300 300 300
estimates store
#of stored estimation results 300 300 300
exlogistic and expoisson
maximum memory specification in memory(#) 2gb 2gb 2gb
grmeanby
#of unique values in varlist N/2 N/2 N/2
graph twoway
#of variables in a plot 100 100 100
#of styles in an option’s stylelist 20 20 20
infile (free format)
record length without dictionary none none none
infile (fixed format)
record length with a dictionary 524,275 524,275 524,275
infix (fixed format)
record length with a dictionary 524,275 524,275 524,275
(continued)                               Small Stata     Stata/IC     Stata/MP and Stata/SE
label
length of dataset label 80 80 80
length of variable label 80 80 80
length of value label string 32,000 32,000 32,000
length of name of value label 32 32 32
#of codings within one value label 1,000 65,536 65,536
label language
#of different languages 100 100 100
macro
#of nested macros 20 20 20
manova
#of variables in single manova term 8 8 8
matrix (3)
dimension of single matrix 40 × 40 800 × 800 11,000 × 11,000
maximize options
iterate() maximum 16,000 16,000 16,000
mprobit
#of categories in a depvar 30 30 30
net
#of description lines in .pkg file 100 100 100
nlogit and nlogittree
#of levels in model 8 8 8
notes
length of one note 13,400 67,784 67,784
#of notes attached to dta 9,999 9,999 9,999
#of notes attached to each variable 9,999 9,999 9,999
numlist
#of elements in the numeric list 2,500 2,500 2,500
reg3,sureg, and other system estimators
#of equations 40 800 11,000
set adosize
memory ado-files may consume 1000K 1000K 1000K
set scrollbufsize
memory for Results window buffer 2000K 2000K 2000k
(continued)                               Small Stata     Stata/IC     Stata/MP and Stata/SE
slogit
#of categories in a depvar 30 30 30
snapspan
length of label 80 80 80
#of saved snapshots 1,000 1,000 1,000
stcox
#of variables in strata() option 5 5 5
stcurve
#of curves plotted on the same graph 10 10 10
table and tabdisp
#of by variables 4 4 4
#of margins, i.e., sum of rows,
columns, supercolumns, and by groups 3,000 3,000 3,000
tabulate oneway
#of rows in one-way table 500 3,000 12,000
tabulate twoway
#of rows & cols in two-way table 160 × 20 300 × 20 1,200 × 80
tabulate, summarize()
#of cells (rows X cols) 375 375 375
teffects
#of treatments 20 20 20
xt estimation commands (e.g., xtgee,
xtgls,xtpoisson,xtprobit,xtreg
with mle option, and xtpcse when
neither option hetonly nor option
independent is specified)
#of time periods within panel 40 800 11,000
#of integration points accepted 195 195 195
by intpoints(#)
(1) 2,147,483,647 is a theoretical maximum; memory availability will certainly impose a smaller
maximum.
(2) The maximum length of the contents of a macro is fixed in Stata/IC and settable in Stata/SE
and Stata/MP. The currently set maximum length is recorded in c(macrolen); type display
c(macrolen). The maximum length can be changed with set maxvar. If you set maxvar to a
larger value, the maximum length increases; if you set maxvar to a smaller value, the maximum
length decreases. The relationship between them is maximum length = 33 × maxvar + 200; see
the sketch following these notes.
(3) In Mata, matrices are limited only by the amount of memory on your computer.
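As an illustration of note (2), in Stata/SE or Stata/MP you could verify the stated relationship yourself; this is only a sketch, the value 10,000 is arbitrary, and Stata may refuse to change maxvar while data are in memory, so clear first:
. clear
. set maxvar 10000
. display c(macrolen)
. display 33*10000 + 200
If the relationship in note (2) holds, both display commands show 330200.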
Matrix size
See [R] matsize.
Determining which flavor of Stata you are running
Type
. about
The response will be Stata/MP, Stata/SE, Stata/IC, or Small Stata. Other information is also shown,
including your serial number. See [R] about.
Also see
[R] about – Display information about your Stata
[R] matsize – Set the maximum number of variables in a model
[D] compress – Compress data in memory
[D] data types – Quick reference for data types
[D] import – Overview of importing data into Stata
[D] infile (fixed format) – Read text data in fixed format with a dictionary
[D] infile (free format) – Read unformatted text data
[D] memory – Memory management
[D] obs – Increase the number of observations in a dataset
Title
lincom — Linear combinations of estimators
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
lincom exp [, options]
options Description
eform generic label; exp(b)
or odds ratio
hr hazard ratio
shr subhazard ratio
irr incidence-rate ratio
rrr relative-risk ratio
level(#) set confidence level; default is level(95)
display options control column formats
df(#) use t distribution with # degrees of freedom for computing p-values
and confidence intervals
exp is any linear combination of coefficients that is a valid syntax for test; see [R] test. exp must not
contain an equal sign.
df(#) does not appear in the dialog box.
Menu
Statistics > Postestimation > Linear combinations of estimates
Description
lincom computes point estimates, standard errors, t or z statistics, p-values, and confidence
intervals for linear combinations of coefficients after any estimation command. Results can optionally
be displayed as odds ratios, hazard ratios, incidence-rate ratios, or relative-risk ratios.
lincom can be used with svy estimation results; see [SVY] svy postestimation.
Options
eform, or, hr, shr, irr, and rrr all report coefficient estimates as exp(β̂) rather than β̂. Standard
errors and confidence intervals are similarly transformed. or is the default after logistic. The
only difference in these options is how the output is labeled.
1033
1034 lincom — Linear combinations of estimators
Option Label Explanation Example commands
eform exp(b) Generic label cloglog
or Odds Ratio Odds ratio logistic,logit
hr Haz. Ratio Hazard ratio stcox,streg
shr SHR Subhazard ratio stcrreg
irr IRR Incidence-rate ratio poisson
rrr RRR Relative-risk ratio mlogit
exp may not contain any additive constants when you use the eform, or, hr, irr, or rrr option.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
display options: cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.
The following option is available with lincom but is not shown in the dialog box:
df(#) specifies that the t distribution with # degrees of freedom be used for computing p-values and
confidence intervals. The default is to use e(df_r) degrees of freedom or the standard normal
distribution if e(df_r) is missing.
Remarks and examples
Remarks are presented under the following headings:
Using lincom
Odds ratios and incidence-rate ratios
Multiple-equation models
Using lincom
After fitting a model and obtaining estimates for coefficients β1, β2, . . . , βk, you may want to
view estimates for linear combinations of the βi, such as β1 − β2. lincom can display estimates for
any linear combination of the form c0 + c1β1 + c2β2 + ··· + ckβk.
lincom works after any estimation command for which test works. Any valid expression for
test syntax 1 (see [R] test) is a valid expression for lincom.
lincom is useful for viewing odds ratios, hazard ratios, etc., for one group (that is, one set of
covariates) relative to another group (that is, another set of covariates). See the examples below.
Example 1
We perform a linear regression:
. use http://www.stata-press.com/data/r13/regress
. regress y x1 x2 x3
Source SS df MS Number of obs = 148
F( 3, 144) = 96.12
Model 3259.3561 3 1086.45203 Prob > F = 0.0000
Residual 1627.56282 144 11.3025196 R-squared = 0.6670
Adj R-squared = 0.6600
Total 4886.91892 147 33.2443464 Root MSE = 3.3619
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x1 1.457113 1.07461 1.36 0.177 -.666934 3.581161
x2 2.221682 .8610358 2.58 0.011 .5197797 3.923583
x3 -.006139 .0005543 -11.08 0.000 -.0072345 -.0050435
_cons 36.10135 4.382693 8.24 0.000 27.43863 44.76407
To see the difference of the coefficients of x2 and x1, we type
. lincom x2 - x1
( 1) - x1 + x2 = 0
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
(1) .7645682 .9950282 0.77 0.444 -1.20218 2.731316
The expression can be any linear combination.
. lincom 3*x1 + 500*x3
( 1) 3*x1 + 500*x3 = 0
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
(1) 1.301825 3.396624 0.38 0.702 -5.411858 8.015507
Nonlinear expressions are not allowed.
. lincom x2/x1
not possible with test
r(131);
For information about estimating nonlinear expressions, see [R] nlcom.
Technical note
lincom uses the same shorthands for coefficients as does test (see [R] test). When you type x1,
for instance, lincom knows that you mean the coefficient of x1. The formal syntax for referencing this
coefficient is actually _b[x1], or alternatively, _coef[x1]. So, more formally, in the last example
we could have typed
. lincom 3*_b[x1] + 500*_b[x3]
(output omitted )
Odds ratios and incidence-rate ratios
After logistic regression, the or option can be specified with lincom to display odds ratios for any
effect. Incidence-rate ratios after commands such as poisson can be similarly obtained by specifying
the irr option.
Example 2
Consider the low birthweight dataset from Hosmer, Lemeshow, and Sturdivant (2013, 24). We fit
a logistic regression model of low birthweight (variable low) on the following variables:
Variable Description Coding
age age in years
race race 1 if white, 2 if black, 3 if other
smoke smoking status 1 if smoker, 0 if nonsmoker
ht history of hypertension 1 if yes, 0 if no
ui uterine irritability 1 if yes, 0 if no
lwd maternal weight before pregnancy 1 if weight <110 lb., 0 otherwise
ptd history of premature labor 1 if yes, 0 if no
c.age##lwd age main effects, lwd main effects,
and their interaction
smoke##lwd smoke main effects, lwd main effects,
and their interaction
We first fit a model without the interaction terms by using logit.
. use http://www.stata-press.com/data/r13/lbw3
(Hosmer & Lemeshow data)
. logit low age lwd i.race smoke ptd ht ui
Iteration 0: log likelihood = -117.336
Iteration 1: log likelihood = -99.3982
Iteration 2: log likelihood = -98.780418
Iteration 3: log likelihood = -98.777998
Iteration 4: log likelihood = -98.777998
Logistic regression Number of obs = 189
LR chi2(8) = 37.12
Prob > chi2 = 0.0000
Log likelihood = -98.777998 Pseudo R2 = 0.1582
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.0464796 .0373888 -1.24 0.214 -.1197603 .0268011
lwd .8420615 .4055338 2.08 0.038 .0472299 1.636893
race
black 1.073456 .5150753 2.08 0.037 .0639273 2.082985
other .815367 .4452979 1.83 0.067 -.0574008 1.688135
smoke .8071996 .404446 2.00 0.046 .0145001 1.599899
ptd 1.281678 .4621157 2.77 0.006 .3759478 2.187408
ht 1.435227 .6482699 2.21 0.027 .1646414 2.705813
ui .6576256 .4666192 1.41 0.159 -.2569313 1.572182
_cons -1.216781 .9556797 -1.27 0.203 -3.089878 .656317
To get the odds ratio for black smokers relative to white nonsmokers (the reference group), we type
. lincom 2.race + smoke, or
( 1) [low]2.race + [low]smoke = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
(1) 6.557805 4.744692 2.60 0.009 1.588176 27.07811
lincom computed exp(β2.race + βsmoke) = 6.56. To see the odds ratio for white smokers relative
to black nonsmokers, we type
. lincom smoke - 2.race, or
( 1) - [low]2.race + [low]smoke = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
(1) .7662425 .4430176 -0.46 0.645 .2467334 2.379603
Now let’s add the interaction terms to the model (Hosmer and Lemeshow 1989, table 4.10). This
time, we will use logistic rather than logit. By default, logistic displays odds ratios.
. logistic low i.race ht ui ptd c.age##lwd smoke##lwd
Logistic regression Number of obs = 189
LR chi2(10) = 42.66
Prob > chi2 = 0.0000
Log likelihood = -96.00616 Pseudo R2 = 0.1818
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
race
black 2.95383 1.532789 2.09 0.037 1.068277 8.167465
other 2.137589 .9919138 1.64 0.102 .8608708 5.307752
ht 3.893141 2.575201 2.05 0.040 1.064768 14.2346
ui 2.071284 .9931388 1.52 0.129 .8092926 5.301192
ptd 3.426633 1.615282 2.61 0.009 1.360252 8.632089
age .9194513 .041896 -1.84 0.065 .8408967 1.005344
1.lwd .1772934 .3312384 -0.93 0.354 .0045539 6.902367
lwd#c.age
1 1.15883 .09602 1.78 0.075 .9851215 1.36317
smoke
smoker 3.168096 1.452378 2.52 0.012 1.289956 7.78076
smoke#lwd
smoker 1 .2447849 .2003996 -1.72 0.086 .0491956 1.217988
_cons .599443 .6519163 -0.47 0.638 .0711271 5.051971
Hosmer and Lemeshow (1989, table 4.13) consider the effects of smoking (smoke = 1) and low
maternal weight before pregnancy (lwd = 1). The effect of smoking among nonlow-weight mothers
(lwd = 0) is given by the odds ratio 3.17 for smoke in the logistic output. The effect of smoking
among low-weight mothers is given by
. lincom 1.smoke + 1.smoke#1.lwd
( 1) [low]1.smoke + [low]1.smoke#1.lwd = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
(1) .7755022 .574951 -0.34 0.732 .1813465 3.316323
We did not have to specify the or option. After logistic, lincom assumes or by default.
The effect of low weight (lwd = 1) is more complicated because we fit an age × lwd interaction.
We must specify the age of mothers for the effect. The effect among 30-year-old nonsmokers is given
by
. lincom 1.lwd + 30*1.lwd#c.age
( 1) [low]1.lwd + 30*[low]1.lwd#c.age = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
(1) 14.7669 13.5669 2.93 0.003 2.439264 89.39633
lincom computed exp(βlwd + 30βage×lwd) = 14.8. It may seem odd that we entered it as 1.lwd
+ 30*1.lwd#c.age, but remember that these terms are just lincom's (and test's) shorthands for
_b[1.lwd] and _b[1.lwd#c.age]. We could have typed
. lincom _b[1.lwd] + 30*_b[1.lwd#c.age]
( 1) [low]1.lwd + 30*[low]1.lwd#c.age = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
(1) 14.7669 13.5669 2.93 0.003 2.439264 89.39633
Multiple-equation models
lincom also works with multiple-equation models. The only difference is how you refer to the
coefficients. Recall that for multiple-equation models, coefficients are referenced using the syntax
[eqno]varname
where eqno is the equation number or equation name and varname is the corresponding variable name
for the coefficient; see [U] 13.5 Accessing coefficients and standard errors and [R] test for details.
Example 3
Let’s consider example 4 from [R] mlogit (Tarlov et al. 1989; Wells et al. 1989).
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite i.site, nolog
Multinomial logistic regression Number of obs = 615
LR chi2(10) = 42.99
Prob > chi2 = 0.0000
Log likelihood = -534.36165 Pseudo R2 = 0.0387
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.011745 .0061946 -1.90 0.058 -.0238862 .0003962
male .5616934 .2027465 2.77 0.006 .1643175 .9590693
nonwhite .9747768 .2363213 4.12 0.000 .5115955 1.437958
site
2 .1130359 .2101903 0.54 0.591 -.2989296 .5250013
3 -.5879879 .2279351 -2.58 0.010 -1.034733 -.1412433
_cons .2697127 .3284422 0.82 0.412 -.3740222 .9134476
Uninsure
age -.0077961 .0114418 -0.68 0.496 -.0302217 .0146294
male .4518496 .3674867 1.23 0.219 -.268411 1.17211
nonwhite .2170589 .4256361 0.51 0.610 -.6171725 1.05129
site
2 -1.211563 .4705127 -2.57 0.010 -2.133751 -.2893747
3 -.2078123 .3662926 -0.57 0.570 -.9257327 .510108
_cons -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
To see the estimate of the sum of the coefficient of male and the coefficient of nonwhite for the
Prepaid outcome, we type
. lincom [Prepaid]male + [Prepaid]nonwhite
( 1) [Prepaid]male + [Prepaid]nonwhite = 0
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
(1) 1.53647 .3272489 4.70 0.000 .8950741 2.177866
To view the estimate as a ratio of relative risks (see [R] mlogit for the definition and interpretation),
we specify the rrr option.
. lincom [Prepaid]male + [Prepaid]nonwhite, rrr
( 1) [Prepaid]male + [Prepaid]nonwhite = 0
insure RRR Std. Err. z P>|z| [95% Conf. Interval]
(1) 4.648154 1.521103 4.70 0.000 2.447517 8.827451
Stored results
lincom stores the following in r():
Scalars
r(estimate) point estimate
r(se) estimate of standard error
r(df) degrees of freedom
References
Hosmer, D. W., Jr., and S. A. Lemeshow. 1989. Applied Logistic Regression. New York: Wiley.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Also see
[R] nlcom – Nonlinear combinations of estimators
[R] test – Test linear hypotheses after estimation
[R] testnl – Test nonlinear hypotheses after estimation
[U] 13.5 Accessing coefficients and standard errors
[U] 20 Estimation and postestimation commands
Title
linktest — Specification link test for single-equation models
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
linktest [if] [in] [, cmd options]
When if and in are not specified, the link test is performed on the same sample as the previous estimation.
Menu
Statistics > Postestimation > Tests > Specification link test for single-equation models
Description
linktest performs a link test for model specification after any single-equation estimation command,
such as logistic, regress, stcox, etc.
Option
 
Main
cmd options must be the same options specified with the underlying estimation command, except
the display options may differ.
Remarks and examples
The form of the link test implemented here is based on an idea of Tukey (1949), which was further
described by Pregibon (1980), elaborating on work in his unpublished thesis (Pregibon 1979). See
Methods and formulas below for more details.
Example 1
We want to explain the mileage ratings of cars in our automobile dataset by using the weight,
engine displacement, and whether the car is manufactured outside the United States:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displ foreign
Source SS df MS Number of obs = 74
F( 3, 70) = 45.88
Model 1619.71935 3 539.906448 Prob > F = 0.0000
Residual 823.740114 70 11.7677159 R-squared = 0.6629
Adj R-squared = 0.6484
Total 2443.45946 73 33.4720474 Root MSE = 3.4304
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0067745 .0011665 -5.81 0.000 -.0091011 -.0044479
displacement .0019286 .0100701 0.19 0.849 -.0181556 .0220129
foreign -1.600631 1.113648 -1.44 0.155 -3.821732 .6204699
_cons 41.84795 2.350704 17.80 0.000 37.15962 46.53628
On the basis of the R2, we are reasonably pleased with this model.
If our model really is specified correctly, then if we were to regress mpg on the prediction and the
prediction squared, the prediction squared would have no explanatory power. This is what linktest
does:
. linktest
Source SS df MS Number of obs = 74
F( 2, 71) = 76.75
Model 1670.71514 2 835.357572 Prob > F = 0.0000
Residual 772.744316 71 10.8837228 R-squared = 0.6837
Adj R-squared = 0.6748
Total 2443.45946 73 33.4720474 Root MSE = 3.299
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
_hat -.4127198 .6577736 -0.63 0.532 -1.724283 .8988434
_hatsq .0338198 .015624 2.16 0.034 .0026664 .0649732
_cons 14.00705 6.713276 2.09 0.041 .6211539 27.39294
We find that the prediction squared does have explanatory power, so our specification is not as
good as we thought.
Although linktest is formally a test of the specification of the dependent variable, it is often
interpreted as a test that, conditional on the specification, the independent variables are specified
incorrectly. We will follow that interpretation and now include weight squared in our model:
. regress mpg weight c.weight#c.weight displ foreign
Source SS df MS Number of obs = 74
F( 4, 69) = 39.37
Model 1699.02634 4 424.756584 Prob > F = 0.0000
Residual 744.433124 69 10.7888859 R-squared = 0.6953
Adj R-squared = 0.6777
Total 2443.45946 73 33.4720474 Root MSE = 3.2846
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0173257 .0040488 -4.28 0.000 -.0254028 -.0092486
c.weight#
c.weight 1.87e-06 6.89e-07 2.71 0.008 4.93e-07 3.24e-06
displacement -.0101625 .0106236 -0.96 0.342 -.031356 .011031
foreign -2.560016 1.123506 -2.28 0.026 -4.801349 -.3186832
_cons 58.23575 6.449882 9.03 0.000 45.36859 71.10291
Now we perform the link test on our new model:
. linktest
Source SS df MS Number of obs = 74
F( 2, 71) = 81.08
Model 1699.39489 2 849.697445 Prob > F = 0.0000
Residual 744.06457 71 10.4797827 R-squared = 0.6955
Adj R-squared = 0.6869
Total 2443.45946 73 33.4720474 Root MSE = 3.2372
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
_hat 1.141987 .7612218 1.50 0.138 -.3758456 2.659821
_hatsq -.0031916 .0170194 -0.19 0.852 -.0371272 .0307441
_cons -1.50305 8.196444 -0.18 0.855 -17.84629 14.84019
We now pass the link test.
Example 2
Above we followed a standard misinterpretation of the link test: when we discovered a problem,
we focused on the explanatory variables of our model. We might consider varying exactly what the
link test tests. The link test told us that our dependent variable was misspecified. For those with an
engineering background, mpg is indeed a strange measure. It would make more sense to model energy
consumption (gallons per mile) in terms of weight and displacement:
. gen gpm = 1/mpg
. regress gpm weight displ foreign
Source SS df MS Number of obs = 74
F( 3, 70) = 76.33
Model .009157962 3 .003052654 Prob > F = 0.0000
Residual .002799666 70 .000039995 R-squared = 0.7659
Adj R-squared = 0.7558
Total .011957628 73 .000163803 Root MSE = .00632
gpm Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight .0000144 2.15e-06 6.72 0.000 .0000102 .0000187
displacement .0000186 .0000186 1.00 0.319 -.0000184 .0000557
foreign .0066981 .0020531 3.26 0.002 .0026034 .0107928
_cons .0008917 .0043337 0.21 0.838 -.0077515 .009535
This model looks every bit as reasonable as our original model:
. linktest
Source SS df MS Number of obs = 74
F( 2, 71) = 117.06
Model .009175219 2 .004587609 Prob > F = 0.0000
Residual .002782409 71 .000039189 R-squared = 0.7673
Adj R-squared = 0.7608
Total .011957628 73 .000163803 Root MSE = .00626
gpm Coef. Std. Err. t P>|t| [95% Conf. Interval]
_hat .6608413 .515275 1.28 0.204 -.3665877 1.68827
_hatsq 3.275857 4.936655 0.66 0.509 -6.567553 13.11927
_cons .008365 .0130468 0.64 0.523 -.0176496 .0343795
Specifying the model in terms of gallons per mile also solves the specification problem and results
in a more parsimonious specification.
Example 3
The link test can be used with any single-equation estimation procedure, not solely regression.
Let’s turn our problem around and attempt to explain whether a car is manufactured outside the
United States by its mileage rating and weight. To save paper, we will specify logit's nolog option,
which suppresses the iteration log:
. logit foreign mpg weight, nolog
Logistic regression Number of obs = 74
LR chi2(2) = 35.72
Prob > chi2 = 0.0000
Log likelihood = -27.175156 Pseudo R2 = 0.3966
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
mpg -.1685869 .0919175 -1.83 0.067 -.3487418 .011568
weight -.0039067 .0010116 -3.86 0.000 -.0058894 -.001924
_cons 13.70837 4.518709 3.03 0.002 4.851859 22.56487
When we run linktest after logit, the result is another logit specification:
. linktest, nolog
Logistic regression Number of obs = 74
LR chi2(2) = 36.83
Prob > chi2 = 0.0000
Log likelihood = -26.615714 Pseudo R2 = 0.4090
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
_hat .8438531 .2738759 3.08 0.002 .3070661 1.38064
_hatsq -.1559115 .1568642 -0.99 0.320 -.4633596 .1515366
_cons .2630557 .4299598 0.61 0.541 -.57965 1.105761
The link test reveals no problems with our specification.
If there had been a problem, we would have been virtually forced to accept the misinterpretation
of the link test: we would have reconsidered our specification of the independent variables. When
using logit, we have no control over the specification of the dependent variable other than to change
likelihood functions.
We admit to having seen a dataset once for which the link test rejected the logit specification.
We did change the likelihood function, refitting the model using probit, and satisfied the link test.
Probit has thinner tails than logit. In general, however, you will not be so lucky.
Technical note
You should specify the same options with linktest that you do with the estimation command,
although you do not have to follow this advice as literally as we did in the preceding example.
logit's nolog option merely suppresses a part of the output, not what is estimated. We specified
nolog both times to save space.
If you are testing a tobit model, you must specify the censoring points just as you do with the
tobit command.
If you are not sure which options are important, duplicate exactly what you specified on the
estimation command.
If you do not specify if exp or in range with linktest, Stata will by default perform the
link test on the same sample as the previous estimation. Suppose that you omitted some data when
performing your estimation, but want to calculate the link test on all the data, which you might do
if you believe the model is appropriate for all the data. You would type linktest if e(sample) < .
to do this.
Stored results
linktest stores the following in r():
Scalars
r(t) t statistic on _hatsq
r(df) degrees of freedom
linktest is not an estimation command in the sense that it leaves previous estimation results
unchanged. For instance, after running a regression and performing the link test, typing regress
without arguments after the link test still replays the original regression.
For integrating an estimation command with linktest, linktest assumes that the name of the
estimation command is stored in e(cmd) and that the name of the dependent variable is stored in
e(depvar). After estimation, it assumes that the number of degrees of freedom for the t test is given
by e(df_m) if the macro is defined.
If the estimation command reports z statistics instead of t statistics, linktest will also report
z statistics. The z statistic, however, is still returned in r(t), and r(df) is set to a missing value.
Methods and formulas
The link test is based on the idea that if a regression or regression-like equation is properly
specified, you should be able to find no additional independent variables that are significant except
by chance. One kind of specification error is called a link error. In regression, this means that the
dependent variable needs a transformation or “link” function to properly relate to the independent
variables. The idea of a link test is to add an independent variable to the equation that is especially
likely to be significant if there is a link error.
Let
$$ y = f(\mathbf{X}\boldsymbol{\beta}) $$
be the model and $\widehat{\boldsymbol{\beta}}$ be the parameter estimates. linktest calculates
$$ \mathit{hat} = \mathbf{X}\widehat{\boldsymbol{\beta}} \qquad \text{and} \qquad \mathit{hatsq} = \mathit{hat}^2 $$
The model is then refit with these two variables, and the test is based on the significance of hatsq.
This is the form suggested by Pregibon (1979) based on an idea of Tukey (1949). Pregibon (1980)
suggests a slightly different method that has come to be known as “Pregibon’s goodness-of-link
test”. We prefer the older version because it is universally applicable, straightforward, and a good
second-order approximation. It can be applied to any single-equation estimation technique, whereas
Pregibon’s more recent tests are estimation-technique specific.
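To see the computation concretely, the first link test in example 1 can be reproduced by hand after regress; this is only an illustrative sketch, and the variable names hat and hatsq are arbitrary:
. regress mpg weight displ foreign
. predict double hat, xb
. generate double hatsq = hat^2
. regress mpg hat hatsq
The coefficients and t statistics on hat and hatsq match those reported for _hat and _hatsq by linktest.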
References
Pregibon, D. 1979. Data analytic methods for generalized linear models. PhD diss., University of Toronto.
. 1980. Goodness of link tests for generalized linear models. Applied Statistics 29: 15–24.
Tukey, J. W. 1949. One degree of freedom for non-additivity. Biometrics 5: 232–242.
Also see
[R] regress postestimation – Postestimation tools for regress
Title
lnskew0 — Find zero-skewness log or Box–Cox transform
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
Reference Also see
Syntax
Zero-skewness log transform
lnskew0 newvar = exp [if] [in] [, options]
Zero-skewness Box–Cox transform
bcskew0 newvar = exp [if] [in] [, options]
options Description
Main
delta(#) increment for derivative of skewness function; default is
delta(0.02) for lnskew0 and delta(0.01) for bcskew0
zero(#) value for determining convergence; default is zero(0.001)
level(#) set confidence level; default is level(95)
Menu
lnskew0
Data > Create or change data > Other variable-creation commands > Zero-skewness log transform
bcskew0
Data > Create or change data > Other variable-creation commands > Box-Cox transform
Description
lnskew0 creates newvar = ln(±exp − k), choosing k and the sign of exp so that the skewness
of newvar is zero.
bcskew0 creates newvar = (exp^λ − 1)/λ, the Box–Cox power transformation (Box and Cox 1964),
choosing λ so that the skewness of newvar is zero. exp must be strictly positive. Also see [R] boxcox
for maximum likelihood estimation of λ.
Options
 
Main
delta(#) specifies the increment used for calculating the derivative of the skewness function with
respect to k (lnskew0) or λ (bcskew0). The default values are 0.02 for lnskew0 and 0.01 for
bcskew0.
zero(#) specifies a value for skewness to determine convergence that is small enough to be considered
zero and is, by default, 0.001.
level(#) specifies the confidence level for the confidence interval for k (lnskew0) or λ (bcskew0).
The confidence interval is calculated only if level() is specified. # is specified as an integer; 95
means 95% confidence intervals. The level() option is honored only if the number of observations
exceeds 7.
Remarks and examples
Example 1: lnskew0
Using our automobile dataset (see [U] 1.2.2 Example datasets), we want to generate a new variable
equal to ln(mpg − k) that is approximately normally distributed. mpg records the miles per gallon for
each of our cars. One feature of the normal distribution is that it has skewness 0.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. lnskew0 lnmpg = mpg
Transform k [95% Conf. Interval] Skewness
ln(mpg-k) 5.383659 (not calculated) -7.05e-06
This created the new variable lnmpg = ln(mpg − 5.384):
. describe lnmpg
storage display value
variable name type format label variable label
lnmpg float %9.0g ln(mpg-5.383659)
Because we did not specify the level() option, no confidence interval was calculated. At the outset,
we could have typed
. use http://www.stata-press.com/data/r13/auto, clear
(Automobile Data)
. lnskew0 lnmpg = mpg, level(95)
Transform k [95% Conf. Interval] Skewness
ln(mpg-k) 5.383659 -17.12339 9.892416 -7.05e-06
The confidence interval is calculated under the assumption that ln(mpg − k) really does have a normal
distribution. It would be perfectly reasonable to use lnskew0, even if we did not believe that the
transformed variable would have a normal distribution (if we literally wanted the zero-skewness
transform), although then the confidence interval would be an approximation of unknown quality to
the true confidence interval. If we now wanted to test the believability of the confidence interval, we
could also test our new variable lnmpg by using swilk (see [R] swilk) with the lnnormal option.
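That follow-up test is, as a minimal sketch (assuming lnmpg was generated as above),
. swilk lnmpg, lnnormal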
Technical note
lnskew0 and bcskew0 report the resulting skewness of the variable merely to reassure you of the
accuracy of its results. In our example above, lnskew0 found k such that the resulting skewness was
−7 × 10^−6, a number close enough to zero for all practical purposes. If we wanted to make it even
smaller, we could specify the zero() option. Typing lnskew0 new=mpg, zero(1e-8) changes the
estimated k to 5.383552 from 5.383659 and reduces the calculated skewness to 2 × 10^−11.
When you request a confidence interval, lnskew0 may report the lower confidence limit as ‘.’,
which should be taken as indicating the lower confidence limit kL = −∞. (This cannot happen with
bcskew0.)
As an example, consider a sample of size n on x and assume that the skewness of x is positive,
but not significantly so, at the desired significance level, say, 5%. Then no matter how large and
negative you make kL, there is no value extreme enough to make the skewness of ln(x − kL) equal
the corresponding percentile (97.5 for a 95% confidence interval) of the distribution of skewness in a
normal distribution of the same sample size. You cannot do this because the distribution of ln(x − kL)
tends to that of x, apart from location and scale shift, as kL → −∞. This “problem” never applies
to the upper confidence limit, kU, because the skewness of ln(x − kU) tends to −∞ as k tends
upward to the minimum value of x.
Example 2: bcskew0
In example 1, using lnskew0 with a variable such as mpg is probably undesirable. mpg has a
natural zero, and we are shifting that zero arbitrarily. On the other hand, use of lnskew0 with a
variable such as temperature measured in Fahrenheit or Celsius would be more appropriate, as the
zero is indeed arbitrary.
For a variable like mpg, it makes more sense to use the Box–Cox power transform (Box and
Cox 1964):
$$ y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda} $$
λ is free to take on any value, but y^(1) = y − 1, y^(0) = ln(y), and y^(−1) = 1 − 1/y.
bcskew0 works like lnskew0:
. bcskew0 bcmpg = mpg, level(95)
Transform L [95% Conf. Interval] Skewness
(mpg^L-1)/L -.3673283 -1.212752 .4339645 .0001898
The 95% confidence interval includes λ = −1 (λ is labeled L in the output), which has a rather
more pleasing interpretation (gallons per mile) than (mpg^−0.3673 − 1)/(−0.3673). The confidence
interval, however, is calculated assuming that the power-transformed variable is normally distributed.
It makes perfect sense to use bcskew0, even when you do not believe that the transformed variable
will be normally distributed, but then the confidence interval is an approximation of unknown quality.
If you believe that the transformed data are normally distributed, you can alternatively use boxcox
to estimate λ; see [R] boxcox.
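To confirm what bcskew0 did, you could construct the transform yourself from the reported λ and check its skewness; this is only an illustrative sketch, and bcmpg2 is an arbitrary name:
. generate double bcmpg2 = (mpg^(-.3673283) - 1)/(-.3673283)
. summarize bcmpg2, detail
summarize, detail reports a skewness near zero (about 0.0002), matching the value in the bcskew0 output.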
Stored results
lnskew0 and bcskew0 store the following in r():
Scalars
r(gamma) k (lnskew0)
r(lambda) λ (bcskew0)
r(lb) lower bound of confidence interval
r(ub) upper bound of confidence interval
r(skewness) resulting skewness of transformed variable
Methods and formulas
Skewness is as calculated by summarize; see [R] summarize. Newton’s method with numeric,
uncentered derivatives is used to estimate k (lnskew0) and λ (bcskew0). For lnskew0, the initial
value is chosen so that the minimum of x − k is 1, and thus ln(x − k) is 0. bcskew0 starts with
λ = 1.
Acknowledgment
lnskew0 and bcskew0 were written by Patrick Royston of the MRC Clinical Trials Unit, London,
and coauthor of the Stata Press book Flexible Parametric Survival Analysis Using Stata: Beyond the
Cox Model.
Reference
Box, G. E. P., and D. R. Cox. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series
B 26: 211–252.
Also see
[R] boxcox – Box–Cox regression models
[R] ladder – Ladder of powers
[R] swilk – Shapiro–Wilk and Shapiro–Francia tests for normality
Title
log — Echo copy of session to file
Syntax Menu
Description Options for use with both log and cmdlog
Options for use with log Option for use with set logtype
Remarks and examples Stored results
Also see
Syntax
Report status of log file
log
log query [logname | _all]
Open log file
log using filename [, append replace [text|smcl] name(logname)]
Close log
log close [logname | _all]
Temporarily suspend logging or resume logging
log {off | on} [logname]
Report status of command log file
cmdlog
Open command log file
cmdlog using filename [, append replace]
Close command log, temporarily suspend logging, or resume logging
cmdlog {close | on | off}
Set default format for logs
set logtype {text | smcl} [, permanently]
Specify screen width
set linesize #
In addition to using the log command, you may access the capabilities of log by selecting File > Log
from the menu and choosing one of the options in the list.
Menu
File > Log
Description
log allows you to make a full record of your Stata session. A log is a file containing what you
type and Stata’s output. You may start multiple log files at the same time, and you may refer to them
with a logname. If you do not specify a logname, Stata will use the name <unnamed>.
cmdlog allows you to make a record of what you type during your Stata session. A command log
contains only what you type, so it is a subset of a full log.
You can make full logs, command logs, or both simultaneously. Neither is produced until you tell
Stata to start logging.
Command logs are always text files, making them easy to convert into do-files. (In this respect, it
would make more sense if the default extension of a command log file was .do because command
logs are do-files. The default is .txt, not .do, however, to keep you from accidentally overwriting
your important do-files.)
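For example, at the end of a session you could turn a command log into a do-file by copying it; the filenames below are arbitrary, and .txt is the extension cmdlog adds by default:
. cmdlog using mysession
  (interactive work omitted)
. cmdlog close
. copy mysession.txt mysession.do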
Full logs are recorded in one of two formats: Stata Markup and Control Language (SMCL) or plain
text. The default is SMCL, but you can use set logtype to change that, or you can specify an option
to state the format you wish. We recommend SMCL because it preserves fonts and colors. SMCL logs
can be converted to text or to other formats by using the translate command; see [R]translate.
You can also use translate to produce printable versions of SMCL logs. SMCL logs can be viewed
and printed from the Viewer, as can any text file; see [R]view.
When using multiple log files, you may have up to five SMCL logs and five text logs open at the
same time.
log or cmdlog, typed without arguments, reports the status of logging. log query, when passed
an optional logname, reports the status of that log.
log using and cmdlog using open a log file. log close and cmdlog close close the file.
Between times, log off and cmdlog off, and log on and cmdlog on, can temporarily suspend and
resume logging.
If filename is specified without an extension, one of the suffixes .smcl,.log, or .txt is added.
The extension .smcl or .log is added by log, depending on whether the file format is SMCL or
text. The extension .txt is added by cmdlog. If filename contains embedded spaces, remember to
enclose it in double quotes.
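For example, to open a log in a file whose name contains spaces (the filename is arbitrary), you would type
. log using "project 1 results", replace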
set logtype specifies the default format in which full logs are to be recorded. Initially, full logs
are recorded in SMCL format.
set linesize specifies the maximum width, in characters, of Stata output. Most commands in
Stata do not respect linesize, because it is not important for most commands. Most users never
need to set linesize, because it will automatically be reset if you resize your Results window.
This is also why there is no permanently option allowed with set linesize. set linesize is
for use with commands such as list and display and is typically used by programmers who wish
the output of those commands to be wider or narrower than the current width of the Results window.
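For instance, a programmer who wants list or display output wrapped at a particular width might type (the width shown is arbitrary)
. set linesize 120
. display c(linesize)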
Options for use with both log and cmdlog
append specifies that results be appended to an existing file. If the file does not already exist, a new
file is created.
replace specifies that filename, if it already exists, be overwritten. When you do not specify either
replace or append, the file is assumed to be new. If the specified file already exists, an error
message is issued and logging is not started.
Options for use with log
text and smcl specify the format in which the log is to be recorded. The default is complicated to
describe but is what you would expect:
If you specify the file as filename.smcl, the default is to write the log in SMCL format (regardless
of the value of set logtype).
If you specify the file as filename.log, the default is to write the log in text format (regardless
of the value of set logtype).
If you type filename without an extension and specify neither the smcl option nor the text
option, the default is to write the file according to the value of set logtype. If you have not
set logtype, then the default is SMCL. Also, the filename you specified will be fixed to read
filename.smcl if a SMCL log is being created or filename.log if a text log is being created.
If you specify either the text or smcl option, then what you specify determines how the log is
written. If filename was specified without an extension, the appropriate extension is added for you.
If you open multiple log files, you may choose a different format for each file.
name(logname)specifies an optional name you may use to refer to the log while it is open. You
can start multiple log files, give each a different logname, and then close, temporarily suspend, or
resume them each individually.
Option for use with set logtype
permanently specifies that, in addition to making the change right now, the logtype setting be
remembered and become the default setting when you invoke Stata.
Remarks and examples
For a detailed explanation of logs, see [U] 15 Saving and printing output—log files.
When you open a full log, the default is to show the name of the file and a time and date stamp:
. log using myfile
name: <unnamed>
log: C:\data\proj1\myfile.smcl
log type: smcl
opened on: 12 Jan 2013, 12:28:23
.
The above information will appear in the log. If you do not want this information to appear, precede
the command by quietly:
. quietly log using myfile
quietly will not suppress any error messages or anything else you need to know.
Similarly, when you close a full log, the default is to show the full information,
. log close
name: <unnamed>
log: C:\data\proj1\myfile.smcl
log type: smcl
closed on: 12 Jan 2013, 12:32:41
and that information will also appear in the log. If you want to suppress that, type quietly log
close.
Stored results
log and cmdlog store the following in r():
Macros
r(name) logname
r(filename) name of file
r(status) on or off
r(type) smcl or text
log query all stores the following in r():
Scalars
r(numlogs) number of open log files
For each open log file, log query all also stores
r(name#)      logname
r(filename#)  name of file
r(status#)    on or off
r(type#)      smcl or text
where # varies between 1 and the value of r(numlogs). Be aware that # will not necessarily represent the order
in which the log files were first opened, nor will it necessarily remain constant for a given log file upon multiple
calls to log query.
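For example, the stored results can be retrieved after opening a log; the filename myfile is hypothetical, and because r(filename) is a macro, it is quoted when displayed:
. log using myfile
. display "`r(filename)'"
. log query all
. display r(numlogs)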
Also see
[R] query    Display system parameters
[R] translate    Print and translate logs
[GSM] 16 Saving and printing results by using logs
[GSW] 16 Saving and printing results by using logs
[GSU] 16 Saving and printing results by using logs
[U] 15 Saving and printing output—log files
Title
logistic — Logistic regression, reporting odds ratios
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
logistic depvar indepvars [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
offset(varname)    include varname in model with coefficient constrained to 1
asis retain perfect predictor variables
constraints(constraints)    apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)    vcetype may be oim, robust, cluster clustvar, bootstrap, or jackknife
Reporting
level(#)    set confidence level; default is level(95)
coef report estimated coefficients
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Binary outcomes >Logistic regression (reporting odds ratios)
Description
logistic fits a logistic regression model of depvar on indepvars, where depvar is a 0/1 variable
(or, more precisely, a 0/non-0 variable). Without arguments, logistic redisplays the last logistic
estimates. logistic displays estimates as odds ratios; to view coefficients, type logit after running
logistic. To obtain odds ratios for any covariate pattern relative to another, see [R]lincom.
Options
 
Model
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.
asis forces retention of perfect predictor variables and their associated perfectly predicted observations
and may produce instabilities in maximization; see [R]probit.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
coef causes logistic to report the estimated coefficients rather than the odds ratios (exponentiated
coefficients). coef may be specified when the model is fit or may be used later to redisplay results.
coef affects only how results are displayed and not how they are estimated.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize. These options are seldom used.
The following option is available with logistic but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
logistic and logit
Robust estimate of variance
Video examples
logistic and logit
logistic provides an alternative and preferred way to fit maximum-likelihood logit models, the
other choice being logit ([R]logit).
First, let’s dispose of some confusing terminology. We use the words logit and logistic to mean
the same thing: maximum likelihood estimation. To some, one or the other of these words connotes
transforming the dependent variable and using weighted least squares to fit the model, but that is not
how we use either word here. Thus the logit and logistic commands produce the same results.
The logistic command is generally preferred to the logit command because logistic
presents the estimates in terms of odds ratios rather than coefficients. To some people, this may seem
disadvantageous, but you can type logit without arguments after logistic to see the underlying
coefficients. You should be cautious when interpreting the odds ratio of the constant term. Usually,
this odds ratio represents the baseline odds of the model when all predictor variables are set to zero.
However, you must verify that a zero value for all predictor variables in the model actually makes
sense before continuing with this interpretation.
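One way to make the constant's odds ratio interpretable is to center continuous predictors so that zero is a meaningful value. The sketch below, using the low-birthweight data analyzed later in this entry and a hypothetical centered variable, is one possibility rather than a required step:
. summarize age
. generate age_c = age - r(mean)
. logistic low age_c lwt i.race smoke ptl ht ui
After centering, the constant's odds ratio refers to a mother of average age rather than a mother of age zero; the remaining covariates are still evaluated at zero or at their base levels.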
Nevertheless, [R]logit is still worth reading because logistic shares the same features as logit,
including omitting variables due to collinearity or one-way causation.
For an introduction to logistic regression, see Lemeshow and Hosmer (2005), Pagano and Gau-
vreau (2000, 470–487), or Pampel (2000); for a complete but nonmathematical treatment, see Kleinbaum
and Klein (2010); and for a thorough discussion, see Hosmer, Lemeshow, and Sturdivant (2013).
See Gould (2000) for a discussion of the interpretation of logistic regression. See Dupont (2009) or
Hilbe (2009) for a discussion of logistic regression with examples using Stata. For a discussion using
Stata with an emphasis on model specification, see Vittinghoff et al. (2012).
Stata has a variety of commands for performing estimation when the dependent variable is dichoto-
mous or polytomous. See Long and Freese (2014) for a book devoted to fitting these models with Stata.
Here is a list of some estimation commands that may be of interest. See help estimation commands
for a complete list of all of Stata’s estimation commands.
asclogit [R] asclogit Alternative-specific conditional logit (McFadden’s choice) model
asmprobit [R] asmprobit Alternative-specific multinomial probit regression
asroprobit [R] asroprobit Alternative-specific rank-ordered probit regression
binreg [R] binreg Generalized linear models for the binomial family
biprobit [R] biprobit Bivariate probit regression
blogit [R] glogit Logit regression for grouped data
bprobit [R] glogit Probit regression for grouped data
clogit [R] clogit Conditional (fixed-effects) logistic regression
cloglog [R] cloglog Complementary log-log regression
exlogistic [R] exlogistic Exact logistic regression
glm [R] glm Generalized linear models
glogit [R] glogit Weighted least-squares logistic regression for grouped data
gprobit [R] glogit Weighted least-squares probit regression for grouped data
heckoprobit [R] heckoprobit Ordered probit model with sample selection
heckprobit [R] heckprobit Probit model with sample selection
hetprobit [R] hetprobit Heteroskedastic probit model
ivprobit [R] ivprobit Probit model with endogenous regressors
logit [R] logit Logistic regression, reporting coefficients
mecloglog [ME] mecloglog Multilevel mixed-effects complementary log-log regression
meglm [ME] meglm Multilevel mixed-effects generalized linear model
melogit [ME] melogit Multilevel mixed-effects logistic regression
meprobit [ME] meprobit Multilevel mixed-effects probit regression
mlogit [R] mlogit Multinomial (polytomous) logistic regression
mprobit [R] mprobit Multinomial probit regression
nlogit [R] nlogit Nested logit regression (RUM-consistent and nonnormalized)
ologit [R] ologit Ordered logistic regression
oprobit [R] oprobit Ordered probit regression
probit [R] probit Probit regression
rologit [R] rologit Rank-ordered logistic regression
scobit [R] scobit Skewed logistic regression
slogit [R] slogit Stereotype logistic regression
svy: cmd [SVY] svy estimation Survey versions of many of these commands are available;
see [SVY] svy estimation
xtcloglog [XT] xtcloglog Random-effects and population-averaged cloglog models
xtgee [XT] xtgee GEE population-averaged generalized linear models
xtlogit [XT] xtlogit Fixed-effects, random-effects, and population-averaged logit models
xtologit [XT] xtologit Random-effects ordered logistic models
xtoprobit [XT] xtoprobit Random-effects ordered probit models
xtprobit [XT] xtprobit Random-effects and population-averaged probit models
Example 1
Consider the following dataset from a study of risk factors associated with low birthweight described
in Hosmer, Lemeshow, and Sturdivant (2013, 24).
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. describe
Contains data from http://www.stata-press.com/data/r13/lbw.dta
obs: 189 Hosmer & Lemeshow data
vars: 11 15 Jan 2013 05:01
size: 2,646
storage display value
variable name type format label variable label
id int %8.0g identification code
low byte %8.0g birthweight<2500g
age byte %8.0g age of mother
lwt int %8.0g weight at last menstrual period
race byte %8.0g race race
smoke byte %9.0g smoke smoked during pregnancy
ptl byte %8.0g premature labor history (count)
ht byte %8.0g has history of hypertension
ui byte %8.0g presence, uterine irritability
ftv byte %8.0g number of visits to physician
during 1st trimester
bwt int %8.0g birthweight (grams)
Sorted by:
We want to investigate the causes of low birthweight. Here race is a categorical variable indicating
whether a person is white (race =1), black (race =2), or some other race (race =3). We want
indicator (dummy) variables for race included in the regression, so we will use factor variables.
. logistic low age lwt i.race smoke ptl ht ui
Logistic regression Number of obs = 189
LR chi2(8) = 33.22
Prob > chi2 = 0.0001
Log likelihood = -100.724 Pseudo R2 = 0.1416
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0354759 -0.74 0.457 .9061578 1.045339
lwt .9849634 .0068217 -2.19 0.029 .9716834 .9984249
race
black 3.534767 1.860737 2.40 0.016 1.259736 9.918406
other 2.368079 1.039949 1.96 0.050 1.001356 5.600207
smoke 2.517698 1.00916 2.30 0.021 1.147676 5.523162
ptl 1.719161 .5952579 1.56 0.118 .8721455 3.388787
ht 6.249602 4.322408 2.65 0.008 1.611152 24.24199
ui 2.1351 .9808153 1.65 0.099 .8677528 5.2534
_cons 1.586014 1.910496 0.38 0.702 .1496092 16.8134
The odds ratios are for a one-unit change in the variable. If we wanted the odds ratio for age to be
in terms of 4-year intervals, we would type
. gen age4 = age/4
. logistic low age4 lwt i.race smoke ptl ht ui
(output omitted )
After logistic, we can type logit to see the model in terms of coefficients and standard errors:
. logit
Logistic regression Number of obs = 189
LR chi2(8) = 33.22
Prob > chi2 = 0.0001
Log likelihood = -100.724 Pseudo R2 = 0.1416
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
age4 -.1084012 .1458017 -0.74 0.457 -.3941673 .1773649
lwt -.0151508 .0069259 -2.19 0.029 -.0287253 -.0015763
race
black 1.262647 .5264101 2.40 0.016 .2309024 2.294392
other .8620792 .4391532 1.96 0.050 .0013548 1.722804
smoke .9233448 .4008266 2.30 0.021 .137739 1.708951
ptl .5418366 .346249 1.56 0.118 -.136799 1.220472
ht 1.832518 .6916292 2.65 0.008 .4769494 3.188086
ui .7585135 .4593768 1.65 0.099 -.1418484 1.658875
_cons .4612239 1.20459 0.38 0.702 -1.899729 2.822176
If we wanted to see the logistic output again, we would type logistic without arguments.
Example 2
We can specify the confidence interval for the odds ratios with the level() option, and we can
do this either at estimation time or when replaying the model. For instance, to see our first model in
example 1 with narrower, 90% confidence intervals, we might type
. logistic, level(90)
Logistic regression Number of obs = 189
LR chi2(8) = 33.22
Prob > chi2 = 0.0001
Log likelihood = -100.724 Pseudo R2 = 0.1416
low Odds Ratio Std. Err. z P>|z| [90% Conf. Interval]
age4 .8972675 .1308231 -0.74 0.457 .7059409 1.140448
lwt .9849634 .0068217 -2.19 0.029 .9738063 .9962483
race
black 3.534767 1.860737 2.40 0.016 1.487028 8.402379
other 2.368079 1.039949 1.96 0.050 1.149971 4.876471
smoke 2.517698 1.00916 2.30 0.021 1.302185 4.867819
ptl 1.719161 .5952579 1.56 0.118 .9726876 3.038505
ht 6.249602 4.322408 2.65 0.008 2.003487 19.49478
ui 2.1351 .9808153 1.65 0.099 1.00291 4.545424
_cons 1.586014 1.910496 0.38 0.702 .2186791 11.50288
Robust estimate of variance
If you specify vce(robust), Stata reports the robust estimate of variance described in [U] 20.21 Ob-
taining robust variance estimates. Here is the model previously fit with the robust estimate of variance:
. logistic low age lwt i.race smoke ptl ht ui, vce(robust)
Logistic regression Number of obs = 189
Wald chi2(8) = 29.02
Prob > chi2 = 0.0003
Log pseudolikelihood = -100.724 Pseudo R2 = 0.1416
Robust
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0329376 -0.80 0.423 .9108015 1.040009
lwt .9849634 .0070209 -2.13 0.034 .9712984 .9988206
race
black 3.534767 1.793616 2.49 0.013 1.307504 9.556051
other 2.368079 1.026563 1.99 0.047 1.012512 5.538501
smoke 2.517698 .9736417 2.39 0.017 1.179852 5.372537
ptl 1.719161 .7072902 1.32 0.188 .7675715 3.850476
ht 6.249602 4.102026 2.79 0.005 1.726445 22.6231
ui 2.1351 1.042775 1.55 0.120 .8197749 5.560858
_cons 1.586014 1.939482 0.38 0.706 .144345 17.42658
Also you can specify vce(cluster clustvar) and then, within cluster, relax the assumption of
independence. To illustrate this, we have made some fictional additions to the low-birthweight data.
Say that these data are not a random sample of mothers but instead are a random sample of
mothers from a random sample of hospitals. In fact, that may be true; we do not know the history
of these data.
Hospitals specialize, and it would not be too incorrect to say that some hospitals specialize in
more difficult cases. We are going to show two extremes. In one, all hospitals are alike, but we are
going to estimate under the possibility that they might differ. In the other, hospitals are strikingly
different. In both cases, we assume that patients are drawn from 20 hospitals.
In both examples, we will fit the same model, and we will type the same command to fit it. Below
are the same data we have been using but with a new variable, hospid, that identifies from which
of the 20 hospitals each patient was drawn (and which we have made up):
. use http://www.stata-press.com/data/r13/hospid1, clear
. logistic low age lwt i.race smoke ptl ht ui, vce(cluster hospid)
Logistic regression Number of obs = 189
Wald chi2(8) = 49.67
Prob > chi2 = 0.0000
Log pseudolikelihood = -100.724 Pseudo R2 = 0.1416
(Std. Err. adjusted for 20 clusters in hospid)
Robust
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0397476 -0.66 0.507 .898396 1.05437
lwt .9849634 .0057101 -2.61 0.009 .9738352 .9962187
race
black 3.534767 2.013285 2.22 0.027 1.157563 10.79386
other 2.368079 .8451325 2.42 0.016 1.176562 4.766257
smoke 2.517698 .8284259 2.81 0.005 1.321062 4.79826
ptl 1.719161 .6676221 1.40 0.163 .8030814 3.680219
ht 6.249602 4.066275 2.82 0.005 1.74591 22.37086
ui 2.1351 1.093144 1.48 0.138 .7827337 5.824014
_cons 1.586014 1.661913 0.44 0.660 .2034094 12.36639
The standard errors are similar to the standard errors we have previously obtained, whether we used
the robust or conventional estimators. In this example, we invented the hospital IDs randomly.
Here are the results of the estimation with the same data but with a different set of hospital IDs:
. use http://www.stata-press.com/data/r13/hospid2
. logistic low age lwt i.race smoke ptl ht ui, vce(cluster hospid)
Logistic regression Number of obs = 189
Wald chi2(8) = 7.19
Prob > chi2 = 0.5167
Log pseudolikelihood = -100.724 Pseudo R2 = 0.1416
(Std. Err. adjusted for 20 clusters in hospid)
Robust
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0293064 -0.90 0.368 .9174862 1.032432
lwt .9849634 .0106123 -1.41 0.160 .9643817 1.005984
race
black 3.534767 3.120338 1.43 0.153 .6265521 19.9418
other 2.368079 1.297738 1.57 0.116 .8089594 6.932114
smoke 2.517698 1.570287 1.48 0.139 .7414969 8.548655
ptl 1.719161 .6799153 1.37 0.171 .7919045 3.732161
ht 6.249602 7.165454 1.60 0.110 .660558 59.12808
ui 2.1351 1.411977 1.15 0.251 .5841231 7.804266
_cons 1.586014 1.946253 0.38 0.707 .1431423 17.573
Note the strikingly larger standard errors. What happened? In these data, women most likely to have
low-birthweight babies are sent to certain hospitals, and the decision about likelihood is based not just
on age, smoking history, etc., but on other things that doctors can see but that are not recorded in
our data. Thus the mere fact that a woman is at one of these centers identifies her as more likely to
have a low-birthweight baby.
Video examples
Logistic regression, part 1: Binary predictors
Logistic regression, part 2: Continuous predictors
Logistic regression, part 3: Factor variables
Stored results
logistic stores the following in e():
Scalars
e(N) number of observations
e(N cds) number of completely determined successes
e(N cdf) number of completely determined failures
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(r2 p) pseudo-R-squared
e(ll) log likelihood
e(ll 0) log likelihood, constant-only model
e(N clust) number of clusters
e(chi2) χ2
e(p) significance of model test
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) logistic
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(estat cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(mns) vector of means of the independent variables
e(rules) information about perfect predictors
e(V) variance-covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Define $x_j$ as the (row) vector of independent variables, augmented by 1, and $b$ as the corresponding
estimated parameter (column) vector. The logistic regression model is fit by logit; see [R] logit for
details of estimation.

The odds ratio corresponding to the $i$th coefficient is $\psi_i = \exp(b_i)$. The standard error of the odds
ratio is $s_i^{\psi} = \psi_i s_i$, where $s_i$ is the standard error of $b_i$ estimated by logit.

Define $I_j = x_j b$ as the predicted index of the $j$th observation. The predicted probability of a
positive outcome is
$$p_j = \frac{\exp(I_j)}{1 + \exp(I_j)}$$
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.

logistic also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Archer, K. J., and S. A. Lemeshow. 2006. Goodness-of-fit test for a logistic regression model fitted using survey
sample data.Stata Journal 6: 97–105.
Brady, A. R. 1998. sbe21: Adjusted population attributable fractions from logistic regression.Stata Technical Bulletin
42: 8–12. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 137–143. College Station, TX: Stata Press.
Buis, M. L. 2010a. Direct and indirect effects in a logit model.Stata Journal 10: 11–29.
. 2010b. Stata tip 87: Interpretation of interactions in nonlinear models.Stata Journal 10: 305–308.
Cleves, M. A., and A. Tosetto. 2000. sg139: Logistic regression when binary outcome is measured with uncertainty.
Stata Technical Bulletin 55: 20–23. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 152–156. College
Station, TX: Stata Press.
Collett, D. 2003. Modelling Survival Data in Medical Research. 2nd ed. London: Chapman & Hall/CRC.
de Irala-Estévez, J., and M. A. Martínez. 2000. sg125: Automatic estimation of interaction effects and their confidence
intervals. Stata Technical Bulletin 53: 29–31. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 270–273.
College Station, TX: Stata Press.
Dupont, W. D. 2009. Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of
Complex Data. 2nd ed. Cambridge: Cambridge University Press.
Freese, J. 2002. Least likely observations in regression models for categorical outcomes.Stata Journal 2: 296–300.
Garrett, J. M. 1997. sbe14: Odds ratios and confidence intervals for logistic regression models with effect modification.
Stata Technical Bulletin 36: 15–22. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 104–114. College
Station, TX: Stata Press.
Gould, W. W. 2000. sg124: Interpreting logistic regression in all its forms.Stata Technical Bulletin 53: 19–29.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 257–270. College Station, TX: Stata Press.
Hilbe, J. M. 1997. sg63: Logistic regression: Standardized coefficients and partial correlations.Stata Technical Bulletin
35: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 162–163. College Station, TX: Stata Press.
. 2009. Logistic Regression Models. Boca Raton, FL: Chapman & Hall/CRC.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
Lemeshow, S. A., and J.-R. L. Gall. 1994. Modeling the severity of illness of ICU patients: A systems update. Journal
of the American Medical Association 272: 1049–1055.
Lemeshow, S. A., and D. W. Hosmer, Jr. 2005. Logistic regression. In Vol. 2 of Encyclopedia of Biostatistics, ed.
P. Armitage and T. Colton, 2870–2880. Chichester, UK: Wiley.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables.Stata Journal 6: 285–308.
Mitchell, M. N., and X. Chen. 2005. Visualizing main effects and interactions for binary logit models.Stata Journal
5: 64–82.
Pagano, M., and K. Gauvreau. 2000. Principles of Biostatistics. 2nd ed. Belmont, CA: Duxbury.
Pampel, F. C. 2000. Logistic Regression: A Primer. Thousand Oaks, CA: Sage.
Paul, C. 1998. sg92: Logistic regression for data including multiple imputations.Stata Technical Bulletin 45: 28–30.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 180–183. College Station, TX: Stata Press.
Pearce, M. S. 2000. sg148: Profile likelihood confidence intervals for explanatory variables in logistic regression.
Stata Technical Bulletin 56: 45–47. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 211–214. College
Station, TX: Stata Press.
Pregibon, D. 1981. Logistic regression diagnostics. Annals of Statistics 9: 705–724.
Reilly, M., and A. Salim. 2000. sg156: Mean score method for missing covariate data in logistic regression models.
Stata Technical Bulletin 58: 25–27. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 256–258. College
Station, TX: Stata Press.
Schonlau, M. 2005. Boosted regression (boosting): An introductory tutorial and a Stata plugin.Stata Journal 5:
330–354.
Vittinghoff, E., D. V. Glidden, S. C. Shiboski, and C. E. McCulloch. 2012. Regression Methods in Biostatistics:
Linear, Logistic, Survival, and Repeated Measures Models. 2nd ed. New York: Springer.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical
outcomes.Stata Journal 5: 537–559.
Also see
[R] logistic postestimation    Postestimation tools for logistic
[R] brier    Brier score decomposition
[R] cloglog    Complementary log-log regression
[R] exlogistic    Exact logistic regression
[R] logit    Logistic regression, reporting coefficients
[R] roc    Receiver operating characteristic (ROC) analysis
[MI] estimation    Estimation commands for use with mi estimate
[SVY] svy estimation    Estimation commands for survey data
[XT] xtlogit    Fixed-effects, random-effects, and population-averaged logit models
[U] 20 Estimation and postestimation commands
Title
logistic postestimation — Postestimation tools for logistic
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas References Also see
Description
The following postestimation commands are of special interest after logistic:
Command Description
estat classification report various summary statistics, including the classification table
estat gof    Pearson or Hosmer–Lemeshow goodness-of-fit test
lroc compute area under ROC curve and graph the curve
lsens graph sensitivity and specificity versus probability cutoff
These commands are not appropriate after the svy prefix.
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)    dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest (2)    likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset rules asif]
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the prediction
dbeta    Pregibon (1981) Δβ̂ influence statistic
deviance    deviance residual
dx2    Hosmer, Lemeshow, and Sturdivant (2013) Δχ2 influence statistic
ddeviance    Hosmer, Lemeshow, and Sturdivant (2013) ΔD influence statistic
hat    Pregibon (1981) leverage
number    sequential number of the covariate pattern
residuals    Pearson residuals; adjusted for number sharing covariate pattern
rstandard    standardized Pearson residuals; adjusted for number sharing covariate pattern
score    first derivative of the log likelihood with respect to x_jβ
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
pr,xb,stdp, and score are the only options allowed with svy estimation results.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
dbeta calculates the Pregibon (1981) Δβ̂ influence statistic, a standardized measure of the difference
in the coefficient vector that is due to deletion of the observation along with all others that share
the same covariate pattern. In Hosmer, Lemeshow, and Sturdivant (2013, 154–155) jargon, this
statistic is M-asymptotic; that is, it is adjusted for the number of observations that share the same
covariate pattern.
deviance calculates the deviance residual.
dx2 calculates the Hosmer, Lemeshow, and Sturdivant (2013, 191) Δχ2 influence statistic, reflecting
the decrease in the Pearson χ2 that is due to the deletion of the observation and all others that
share the same covariate pattern.
ddeviance calculates the Hosmer, Lemeshow, and Sturdivant (2013, 191) ΔD influence statistic,
which is the change in the deviance residual that is due to deletion of the observation and all
others that share the same covariate pattern.
hat calculates the Pregibon (1981) leverage or the diagonal elements of the hat matrix adjusted for
the number of observations that share the same covariate pattern.
number numbers the covariate patterns; observations with the same covariate pattern have the same
number. Observations not used in estimation have number set to missing. The first covariate
pattern is numbered 1, the second 2, and so on.
residuals calculates the Pearson residual as given by Hosmer, Lemeshow, and Sturdivant (2013,
155) and adjusted for the number of observations that share the same covariate pattern.
rstandard calculates the standardized Pearson residual as given by Hosmer, Lemeshow, and Stur-
divant (2013, 191) and adjusted for the number of observations that share the same covariate
pattern.
score calculates the equation-level score, ∂lnL/∂(x_jβ).
 
Options
nooffset is relevant only if you specified offset(varname) for logistic. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as
x_j b rather than as x_j b + offset_j.
rules requests that Stata use any rules that were used to identify the model when making the
prediction. By default, Stata calculates missing for excluded observations. See example 1 in
[R]logit postestimation.
asif requests that Stata ignore the rules and the exclusion criteria and calculate predictions for all
observations possible by using the estimated parameter from the model. See example 1 in [R]logit
postestimation.
Remarks and examples
predict is used after logistic to obtain predicted probabilities, residuals, and influence statistics
for the estimation sample. The suggested diagnostic graphs below are from Hosmer, Lemeshow, and
Sturdivant (2013), where they are more elaborately explained. Also see Collett (2003, 129–168) for
a thorough discussion of model checking.
Remarks are presented under the following headings:
predict without options
predict with the xb and stdp options
predict with the residuals option
predict with the number option
predict with the deviance option
predict with the rstandard option
predict with the hat option
predict with the dx2 option
predict with the ddeviance option
predict with the dbeta option
predict without options
Typing predict newvar after estimation calculates the predicted probability of a positive outcome.
In example 1 of [R]logistic, we ran the model logistic low age lwt i.race smoke ptl ht
ui. We obtain the predicted probabilities of a positive outcome by typing
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. logistic low age lwt i.race smoke ptl ht ui
(output omitted )
. predict p
(option pr assumed; Pr(low))
. summarize p low
Variable Obs Mean Std. Dev. Min Max
p 189 .3121693 .1913915 .0272559 .8391283
low 189 .3121693 .4646093 0 1
predict with the xb and stdp options
predict with the xb option calculates the linear combination x_j b, where x_j are the independent
variables in the jth observation and b is the estimated parameter vector. This is sometimes known as
the index function because the cumulative distribution function indexed at this value is the probability
of a positive outcome.
With the stdp option, predict calculates the standard error of the prediction, which is not adjusted
for replicated covariate patterns in the data. The influence statistics described below are adjusted for
replicated covariate patterns in the data.
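For example, after the model fit in example 1 of [R] logistic, the index and its standard error could be obtained as follows; the new variable names are arbitrary:
. predict xbhat, xb
. predict se_xb, stdp
. summarize xbhat se_xb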
predict with the residuals option
predict can calculate more than predicted probabilities. The Pearson residual is defined as the
square root of the contribution of the covariate pattern to the Pearson χ2 goodness-of-fit statistic,
signed according to whether the observed number of positive responses within the covariate pattern
is less than or greater than expected. For instance,
. predict r, residuals
. summarize r, detail
Pearson residual
Percentiles Smallest
1% -1.750923 -2.283885
5% -1.129907 -1.750923
10% -.9581174 -1.636279 Obs 189
25% -.6545911 -1.636279 Sum of Wgt. 189
50% -.3806923 Mean -.0242299
Largest Std. Dev. .9970949
75% .8162894 2.23879
90% 1.510355 2.317558 Variance .9941981
95% 1.747948 3.002206 Skewness .8618271
99% 3.002206 3.126763 Kurtosis 3.038448
We notice the prevalence of a few large positive residuals:
. sort r
. list id r low p age race in -5/l
id r low p age race
185. 33 2.224501 1 .1681123 19 white
186. 57 2.23879 1 .166329 15 white
187. 16 2.317558 1 .1569594 27 other
188. 77 3.002206 1 .0998678 26 white
189. 36 3.126763 1 .0927932 24 white
predict with the number option
Covariate patterns play an important role in logistic regression. Two observations are said to share
the same covariate pattern if the independent variables for the two observations are identical. Although
we might think of having individual observations, the statistical information in the sample can be
summarized by the covariate patterns, the number of observations with that covariate pattern, and the
number of positive outcomes within the pattern. Depending on the model, the number of covariate
patterns can approach or be equal to the number of observations, or it can be considerably less.
Stata calculates all the residual and diagnostic statistics in terms of covariate patterns, not ob-
servations. That is, all observations with the same covariate pattern are given the same residual
and diagnostic statistics. Hosmer, Lemeshow, and Sturdivant (2013, 154–155) argue that such “M-
asymptotic” statistics are more useful than “N-asymptotic” statistics.
To understand the difference, think of an observed positive outcome with predicted probability
of 0.8. Taking the observation in isolation, the residual must be positive: we expected 0.8 positive
responses and observed 1. This may indeed be the correct residual, but not necessarily. Under the
M-asymptotic definition, we ask how many successes we observed across all observations with this
covariate pattern. If that number were, say, six, and there were a total of 10 observations with this
covariate pattern, then the residual is negative for the covariate pattern: we expected eight positive
outcomes but observed six. predict makes this kind of calculation and then attaches the same
residual to all observations in the covariate pattern.
Occasionally, you might want to find all observations sharing a covariate pattern. number allows
you to do this:
. predict pattern, number
. summarize pattern
Variable Obs Mean Std. Dev. Min Max
pattern 189 89.2328 53.16573 1 182
We previously fit the model logistic low age lwt i.race smoke ptl ht ui over 189 observations.
There are 182 covariate patterns in our data.
predict with the deviance option
The deviance residual is defined as the square root of the contribution to the likelihood-ratio test
statistic of a saturated model versus the fitted model. It has slightly different properties from the
Pearson residual (see Hosmer, Lemeshow, and Sturdivant [2013, 155–157]):
. predict d, deviance
. summarize d, detail
deviance residual
Percentiles Smallest
1% -1.843472 -1.911621
5% -1.33477 -1.843472
10% -1.148316 -1.843472 Obs 189
25% -.8445325 -1.674869 Sum of Wgt. 189
50% -.5202702 Mean -.1228811
Largest Std. Dev. 1.049237
75% .9129041 1.894089
90% 1.541558 1.924457 Variance 1.100898
95% 1.673338 2.146583 Skewness .6598857
99% 2.146583 2.180542 Kurtosis 2.036938
predict with the rstandard option
Pearson residuals do not have a standard deviation equal to 1. rstandard generates Pearson
residuals normalized to have an expected standard deviation equal to 1.
. predict rs, rstandard
. summarize r rs
Variable Obs Mean Std. Dev. Min Max
r 189 -.0242299 .9970949 -2.283885 3.126763
rs 189 -.0279135 1.026406 -2.4478 3.149081
. correlate r rs
(obs=189)
r rs
r 1.0000
rs 0.9998 1.0000
Remember that we previously created r containing the (unstandardized) Pearson residuals. In these
data, whether we use standardized or unstandardized residuals does not matter much.
predict with the hat option
hat calculates the leverage of a covariate pattern, a scaled measure of distance in terms of the
independent variables. Large values indicate covariate patterns far from the average covariate pattern
that can have a large effect on the fitted model even if the corresponding residual is small. Consider
the following graph:
. predict h, hat
. scatter h r, xline(0)
(Graph: leverage plotted against Pearson residual, with a vertical line at 0)
The points to the left of the vertical line are observed negative outcomes; here our data contain
almost as many covariate patterns as observations, so most covariate patterns are unique. In such
unique patterns, we observe either 0 or 1 success and expect p, thus forcing the sign of the residual.
If we had fewer covariate patterns, that is, if we did not have continuous variables in our model, there
would be no such interpretation, and we would not have drawn the vertical line at 0.
Points on the left and right edges of the graph represent large residuals: covariate patterns that
are not fit well by our model. Points at the top of our graph represent high leverage patterns. When
analyzing the influence of observations on the model, we are most interested in patterns with high
leverage and small residuals, patterns that might otherwise escape our attention.
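One simple, informal way to flag such patterns is to list observations whose leverage is large relative to the rest of the sample; the cutoffs below are arbitrary and chosen only for illustration, using the variables created above:
. summarize h, detail
. list id h r p if h > .15 & abs(r) < 1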
predict with the dx2 option
There are many ways to measure influence, and hat is one example. dx2 measures the decrease
in the Pearson χ2 goodness-of-fit statistic that would be caused by deleting an observation (and all
others sharing the covariate pattern):
. predict dx2, dx2
. scatter dx2 p
(Graph: H-L dX^2 plotted against Pr(low))
Paraphrasing Hosmer, Lemeshow, and Sturdivant (2013, 195–197), the points going from the top
left to the bottom right correspond to covariate patterns with the number of positive outcomes equal
to the number in the group; the points on the other curve correspond to 0 positive outcomes. In our
data, most of the covariate patterns are unique, so the points tend to lie along one or the other curves;
the points that are off the curves correspond to the few repeated covariate patterns in our data in
which all the outcomes are not the same.
We examine this graph for large values of dx2; there are two at the top left.
predict with the ddeviance option
Another measure of influence is the change in the deviance residuals due to deletion of a covariate
pattern:
. predict dd, ddeviance
As with dx2, we typically graph ddeviance against the probability of a positive outcome. We direct
you to Hosmer, Lemeshow, and Sturdivant (2013, 195) for an example and for the interpretation of
this graph.
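A sketch of that graph, using the variables created above, is simply
. scatter dd p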
predict with the dbeta option
One of the more direct measures of influence of interest to model fitters is the Pregibon (1981)
dbeta measure, a measure of the change in the coefficient vector that would be caused by deleting
an observation (and all others sharing the covariate pattern):
. predict db, dbeta
. scatter db p
(Graph: Pregibon's dbeta plotted against Pr(low))
One observation has a large effect on the estimated coefficients. We can easily find this point:
. sort db
. list in l
189. id low age lwt race smoke ptl ht ui ftv bwt
188 0 25 95 white smoker 3 0 1 0 3637
p r pattern d rs h
.8391283 -2.283885 117 -1.911621 -2.4478 .1294439
dx2 dd db
5.991726 4.197658 .8909163
Hosmer, Lemeshow, and Sturdivant (2013, 196) suggest a graph that combines two of the influence
measures:
. scatter dx2 p [w=db], title("Symbol size proportional to dBeta") mfcolor(none)
(analytic weights assumed)

(Graph: H-L dX^2 plotted against Pr(low), titled "Symbol size proportional to dBeta")
We can easily spot the most influential points by the dbeta and dx2 measures.
Methods and formulas
Let $j$ index observations. Define $M_j$ for each observation as the total number of observations
sharing $j$'s covariate pattern. Define $Y_j$ as the total number of positive responses among observations
sharing $j$'s covariate pattern.

The Pearson residual for the $j$th observation is defined as
$$r_j = \frac{Y_j - M_j p_j}{\sqrt{M_j p_j (1 - p_j)}}$$

For $M_j > 1$, the deviance residual $d_j$ is defined as
$$d_j = \pm\left(2\left[\,Y_j \ln\!\left(\frac{Y_j}{M_j p_j}\right) + (M_j - Y_j)\ln\!\left(\frac{M_j - Y_j}{M_j(1 - p_j)}\right)\right]\right)^{1/2}$$
where the sign is the same as the sign of $(Y_j - M_j p_j)$. In the limiting cases, the deviance residual
is given by
$$d_j = \begin{cases} -\sqrt{2 M_j\,|\ln(1 - p_j)|} & \text{if } Y_j = 0 \\ \phantom{-}\sqrt{2 M_j\,|\ln p_j|} & \text{if } Y_j = M_j \end{cases}$$

The unadjusted diagonal elements of the hat matrix $h_{Uj}$ are given by $h_{Uj} = (XVX')_{jj}$, where
$V$ is the estimated covariance matrix of parameters. The adjusted diagonal elements $h_j$ created by
hat are then $h_j = M_j p_j (1 - p_j) h_{Uj}$.

The standardized Pearson residual $r_{Sj}$ is $r_j/\sqrt{1 - h_j}$.

The Pregibon (1981) $\Delta\hat\beta_j$ influence statistic is
$$\Delta\hat\beta_j = \frac{r_j^2 h_j}{(1 - h_j)^2}$$

The corresponding change in the Pearson $\chi^2$ is $r_{Sj}^2$. The corresponding change in the deviance residual
is $\Delta D_j = d_j^2/(1 - h_j)$.
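As an informal check of the Pearson residual formula, the M-asymptotic quantities can be built by hand from the covariate-pattern numbers and compared with predict's result. The sketch below assumes the example 1 model from [R] logistic is in memory and that the new variable names are not already in use:
. predict phat
. predict pat, number
. bysort pat: egen Mj = count(low)
. bysort pat: egen Yj = total(low)
. generate rhand = (Yj - Mj*phat) / sqrt(Mj*phat*(1 - phat))
. predict rpred, residuals
. summarize rhand rpred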
References
Collett, D. 2003. Modelling Survival Data in Medical Research. 2nd ed. London: Chapman & Hall/CRC.
Garrett, J. M. 2000. sg157: Predicted values calculated from linear or logistic regression models.Stata Technical
Bulletin 58: 27–30. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 258–261. College Station, TX:
Stata Press.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Mitchell, M. N., and X. Chen. 2005. Visualizing main effects and interactions for binary logit models.Stata Journal
5: 64–82.
Newson, R. B. 2013. Attributable and unattributable risks and fractions and other scenario comparisons.Stata Journal
13: 672–698.
Powers, D. A., H. Yoshioka, and M.-S. Yun. 2011. mvdcmp: Multivariate decomposition for nonlinear response
models.Stata Journal 11: 556–576.
Pregibon, D. 1981. Logistic regression diagnostics. Annals of Statistics 9: 705–724.
Wang, Z. 2007. Two postestimation commands for assessing confounding effects in epidemiological studies.Stata
Journal 7: 183–196.
Also see
[R] logistic    Logistic regression, reporting odds ratios
[R] estat classification    Classification statistics and table
[R] estat gof    Pearson or Hosmer–Lemeshow goodness-of-fit test
[R] lroc    Compute area under ROC curve and graph the curve
[R] lsens    Graph sensitivity and specificity versus probability cutoff
[U] 20 Estimation and postestimation commands
Title
logit — Logistic regression, reporting coefficients
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
logit depvar indepvars [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
offset(varname)    include varname in model with coefficient constrained to 1
asis retain perfect predictor variables
constraints(constraints)    apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)    vcetype may be oim, robust, cluster clustvar, bootstrap, or jackknife
Reporting
level(#)    set confidence level; default is level(95)
or report odds ratios
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
nocoef do not display coefficient table; seldom used
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), nocoef, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
nocoef and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Binary outcomes >Logistic regression
Description
logit fits a logit model for a binary response by maximum likelihood; it models the probability
of a positive outcome given a set of regressors. depvar equal to nonzero and nonmissing (typically
depvar equal to one) indicates a positive outcome, whereas depvar equal to zero indicates a negative
outcome.
Also see [R] logistic; logistic displays estimates as odds ratios. Many users prefer the logistic
command to logit. Results are the same regardless of which you use; both are the maximum-likelihood
estimator. Several auxiliary commands that can be run after logit, probit, or logistic
estimation are described in [R] logistic postestimation. A list of related estimation commands is given
in [R] logistic.
If estimating on grouped data, see [R]glogit.
Options
 
Model
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.
asis forces retention of perfect predictor variables and their associated perfectly predicted observations
and may produce instabilities in maximization; see [R]probit.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
or reports the estimated coefficients transformed to odds ratios, that is, e^b rather than b. Standard errors
and confidence intervals are similarly transformed. This option affects how results are displayed,
not how they are estimated. or may be specified at estimation or when replaying previously
estimated results.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize. These options are seldom used.
The following options are available with logit but are not shown in the dialog box:
nocoef specifies that the coefficient table not be displayed. This option is sometimes used by program
writers but is of no use interactively.
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Basic usage
Model identification
Basic usage
logit fits maximum likelihood models with dichotomous dependent (left-hand-side) variables
coded as 0/1 (or, more precisely, coded as 0 and not-0).
Example 1
We have data on the make, weight, and mileage rating of 22 foreign and 52 domestic automobiles.
We wish to fit a logit model explaining whether a car is foreign on the basis of its weight and mileage.
Here is an overview of our data:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. keep make mpg weight foreign
. describe
Contains data from http://www.stata-press.com/data/r13/auto.dta
obs: 74 1978 Automobile Data
vars: 4 13 Apr 2013 17:45
size: 1,702 (_dta has notes)
storage display value
variable name type format label variable label
make str18 %-18s Make and Model
mpg int %8.0g Mileage (mpg)
weight int %8.0gc Weight (lbs.)
foreign byte %8.0g origin Car type
Sorted by: foreign
Note: dataset has changed since last saved
. inspect foreign
foreign: Car type Number of Observations
Total Integers Nonintegers
# Negative - - -
# Zero 52 52 -
# Positive 22 22 -
#
# # Total 74 74 -
# # Missing -
0 1 74
(2 unique values)
foreign is labeled and all values are documented in the label.
The variable foreign takes on two unique values, 0 and 1. The value 0 denotes a domestic car,
and 1 denotes a foreign car.
The model that we wish to fit is
$$\Pr(\texttt{foreign} = 1) = F(\beta_0 + \beta_1\,\texttt{weight} + \beta_2\,\texttt{mpg})$$
where $F(z) = e^z/(1 + e^z)$ is the cumulative logistic distribution.
To fit this model, we type
. logit foreign weight mpg
Iteration 0: log likelihood = -45.03321
Iteration 1: log likelihood = -29.238536
Iteration 2: log likelihood = -27.244139
Iteration 3: log likelihood = -27.175277
Iteration 4: log likelihood = -27.175156
Iteration 5: log likelihood = -27.175156
Logistic regression Number of obs = 74
LR chi2(2) = 35.72
Prob > chi2 = 0.0000
Log likelihood = -27.175156 Pseudo R2 = 0.3966
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0039067 .0010116 -3.86 0.000 -.0058894 -.001924
mpg -.1685869 .0919175 -1.83 0.067 -.3487418 .011568
_cons 13.70837 4.518709 3.03 0.002 4.851859 22.56487
We find that heavier cars are less likely to be foreign and that cars yielding better gas mileage are
also less likely to be foreign, at least holding the weight of the car constant.
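Because or may also be specified when replaying results, we could redisplay this model with odds ratios rather than coefficients by typing
. logit, or
which shows the same estimates that logistic would report.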
Technical note
Stata interprets a value of 0 as a negative outcome (failure) and treats all other values (except
missing) as positive outcomes (successes). Thus if your dependent variable takes on the values 0 and
1, then 0 is interpreted as failure and 1 as success. If your dependent variable takes on the values 0,
1, and 2, then 0 is still interpreted as failure, but both 1 and 2 are treated as successes.
If you prefer a more formal mathematical statement, when you type logit y x, Stata fits the model
$$\Pr(y_j \neq 0 \mid \mathbf{x}_j) = \frac{\exp(\mathbf{x}_j\boldsymbol\beta)}{1 + \exp(\mathbf{x}_j\boldsymbol\beta)}$$
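If you want the positive outcomes coded strictly as 0/1, one option is to build an explicit indicator first; here y is a hypothetical dependent variable taking values such as 0, 1, and 2, and x1 and x2 stand in for your regressors:
. generate byte y01 = (y != 0) if !missing(y)
. logit y01 x1 x2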
Model identification
The logit command has one more feature, and it is probably the most useful. logit automatically
checks the model for identification and, if it is underidentified, drops whatever variables and observations
are necessary for estimation to proceed. (logistic,probit, and ivprobit do this as well.)
Example 2
Have you ever fit a logit model where one or more of your independent variables perfectly predicted
one or the other outcome?
For instance, consider the following data:
Outcome y    Independent variable x
0            1
0            1
0            0
1            0
Say that we wish to predict the outcome on the basis of the independent variable. The outcome is
always zero whenever the independent variable is one. In our data, Pr(y = 0 | x = 1) = 1, which
means that the logit coefficient on x must be minus infinity with a corresponding infinite standard
error. At this point, you may suspect that we have a problem.
Unfortunately, not all such problems are so easily detected, especially if you have a lot of
independent variables in your model. If you have ever had such difficulties, you have experienced one
of the more unpleasant aspects of computer optimization. The computer has no idea that it is trying
to solve for an infinite coefficient as it begins its iterative process. All it knows is that at each step,
making the coefficient a little bigger, or a little smaller, works wonders. It continues on its merry
way until either 1) the whole thing comes crashing to the ground when a numerical overflow error
occurs or 2) it reaches some predetermined cutoff that stops the process. In the meantime, you have
been waiting. The estimates that you finally receive, if you receive any at all, may be nothing more
than numerical roundoff.
Stata watches for these sorts of problems, alerts us, fixes them, and properly fits the model.
Let’s return to our automobile data. Among the variables we have in the data is one called repair,
which takes on three values. A value of 1 indicates that the car has a poor repair record, 2 indicates
an average record, and 3 indicates a better-than-average record. Here is a tabulation of our data:
. use http://www.stata-press.com/data/r13/repair, clear
(1978 Automobile Data)
. tabulate foreign repair
repair
Car type 1 2 3 Total
Domestic 10 27 9 46
Foreign 0 3 9 12
Total 10 30 18 58
All the cars with poor repair records (repair =1) are domestic. If we were to attempt to predict
foreign on the basis of the repair records, the predicted probability for the repair =1 category
would have to be zero. This in turn means that the logit coefficient must be minus infinity, and that
would set most computer programs buzzing.
Let’s try Stata on this problem.
. logit foreign b3.repair
note: 1.repair != 0 predicts failure perfectly
1.repair dropped and 10 obs not used
Iteration 0: log likelihood = -26.992087
Iteration 1: log likelihood = -22.483187
Iteration 2: log likelihood = -22.230498
Iteration 3: log likelihood = -22.229139
Iteration 4: log likelihood = -22.229138
Logistic regression Number of obs = 48
LR chi2(1) = 9.53
Prob > chi2 = 0.0020
Log likelihood = -22.229138 Pseudo R2 = 0.1765
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
repair
1 0 (empty)
2 -2.197225 .7698003 -2.85 0.004 -3.706005 -.6884436
_cons -1.98e-16 .4714045 -0.00 1.000 -.9239359 .9239359
Remember that all the cars with poor repair records (repair =1) are domestic, so the model
cannot be fit, or at least it cannot be fit if we restrict ourselves to finite coefficients. Stata noted
that fact “note: 1.repair !=0 predicts failure perfectly”. This is Stata’s mathematically precise way of
saying what we said in English. When repair is 1, the car is domestic.
Stata then went on to say “1.repair dropped and 10 obs not used”. This is Stata eliminating
the problem. First 1.repair had to be removed from the model because it would have an infinite
coefficient. Then the 10 observations that led to the problem had to be eliminated, as well, so as
not to bias the remaining coefficients in the model. The 10 observations that are not used are the 10
domestic cars that have poor repair records.
Stata then fit what was left of the model, using the remaining observations. Because no observations
remained for cars with poor repair records, Stata reports “(empty)” in the row for repair =1.
Technical note
Stata is pretty smart about catching problems like this. It will catch “one-way causation by a
dummy variable”, as we demonstrated above.
Stata also watches for “two-way causation”, that is, a variable that perfectly determines the
outcome, both successes and failures. Here Stata says, “so-and-so predicts outcome perfectly” and
stops. Statistics dictates that no model can be fit.
Stata also checks your data for collinear variables; it will say, “so-and-so omitted because of
collinearity”. No observations need to be eliminated in this case, and model fitting will proceed
without the offending variable.
It will also catch a subtle problem that can arise with continuous data. For instance, if we were
estimating the chances of surviving the first year after an operation, and if we included in our model
age, and if all the persons over 65 died within the year, Stata would say, “age >65 predicts failure
perfectly”. It would then inform us about the fix-up it takes and fit what can be fit of our model.
logit (and logistic, probit, and ivprobit) will also occasionally display messages such as
Note: 4 failures and 0 successes completely determined.
There are two causes for a message like this. The first, and most unlikely, case occurs when
a continuous variable (or a combination of a continuous variable with other continuous or dummy
variables) is simply a great predictor of the dependent variable. Consider Stata's auto.dta dataset
with 6 observations removed.
with 6 observations removed.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. drop if foreign==0 & gear_ratio > 3.1
(6 observations deleted)
. logit foreign mpg weight gear_ratio, nolog
Logistic regression Number of obs = 68
LR chi2(3) = 72.64
Prob > chi2 = 0.0000
Log likelihood = -6.4874814 Pseudo R2 = 0.8484
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
mpg -.4944907 .2655508 -1.86 0.063 -1.014961 .0259792
weight -.0060919 .003101 -1.96 0.049 -.0121698 -.000014
gear_ratio 15.70509 8.166234 1.92 0.054 -.300436 31.71061
_cons -21.39527 25.41486 -0.84 0.400 -71.20747 28.41694
Note: 4 failures and 0 successes completely determined.
There are no missing standard errors in the output. If you receive the “completely determined” message
and have one or more missing standard errors in your output, see the second case discussed below.
Note gear_ratio's large coefficient. logit thought that the 4 observations with the smallest
predicted probabilities were essentially predicted perfectly.
. predict p
(option pr assumed; Pr(foreign))
. sort p
. list p in 1/4
p
1. 1.34e-10
2. 6.26e-09
3. 7.84e-09
4. 1.49e-08
If this happens to you, you do not have to do anything. Computationally, the model is sound. The
second case discussed below requires careful examination.
The second case occurs when the independent terms are all dummy variables or continuous ones
with repeated values (for example, age). Here one or more of the estimated coefficients will have
missing standard errors. For example, consider this dataset consisting of 5 observations.
. use http://www.stata-press.com/data/r13/logitxmpl, clear
. list, separator(0)
y x1 x2
1. 0 0 0
2. 0 0 0
3. 0 1 0
4. 1 1 0
5. 0 0 1
6. 1 0 1
. logit y x1 x2
Iteration 0: log likelihood = -3.819085
Iteration 1: log likelihood = -2.9527336
Iteration 2: log likelihood = -2.8110282
Iteration 3: log likelihood = -2.7811973
Iteration 4: log likelihood = -2.7746107
Iteration 5: log likelihood = -2.7730128
(output omitted )
Iteration 15996: log likelihood = -2.7725887 (not concave)
Iteration 15997: log likelihood = -2.7725887 (not concave)
Iteration 15998: log likelihood = -2.7725887 (not concave)
Iteration 15999: log likelihood = -2.7725887 (not concave)
Iteration 16000: log likelihood = -2.7725887 (not concave)
convergence not achieved
Logistic regression Number of obs = 6
LR chi2(1) = 2.09
Prob > chi2 = 0.1480
Log likelihood = -2.7725887 Pseudo R2 = 0.2740
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 18.3704 2 9.19 0.000 14.45047 22.29033
x2 18.3704 . . . . .
_cons -18.3704 1.414214 -12.99 0.000 -21.14221 -15.5986
Note: 2 failures and 0 successes completely determined.
convergence not achieved
r(430);
Three things are happening here. First, logit iterates almost forever and then declares nonconvergence.
Second, logit can fit the outcome (y = 0) for the covariate pattern x1 = 0 and x2 = 0 (that
is, the first two observations) perfectly. These observations are the "2 failures and 0 successes completely
determined". Third, if these observations are dropped, then x1, x2, and the constant are collinear.
This is the cause of the nonconvergence, the message “completely determined”, and the missing
standard errors. It happens when you have a covariate pattern (or patterns) with only one outcome
and there is collinearity when the observations corresponding to this covariate pattern are dropped.
If this happens to you, confirm the causes. First, identify the covariate pattern with only one
outcome. (For your data, replace x1 and x2 with the independent variables of your model.)
. egen pattern = group(x1 x2)
. quietly logit y x1 x2, iterate(100)
. predict p
(option pr assumed; Pr(y))
. summarize p
Variable Obs Mean Std. Dev. Min Max
p 6 .3333333 .2581989 1.05e-08 .5
If successes were completely determined, that means that there are predicted probabilities that are
almost 1. If failures were completely determined, that means that there are predicted probabilities
that are almost 0. The latter is the case here, so we locate the corresponding value of pattern:
. tabulate pattern if p < 1e-7
group(x1
x2) Freq. Percent Cum.
1 2 100.00 100.00
Total 2 100.00
Once we omit this covariate pattern from the estimation sample, logit can deal with the collinearity:
. logit y x1 x2 if pattern != 1, nolog
note: x2 omitted because of collinearity
Logistic regression Number of obs = 4
LR chi2(1) = 0.00
Prob > chi2 = 1.0000
Log likelihood = -2.7725887 Pseudo R2 = 0.0000
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 0 2 0.00 1.000 -3.919928 3.919928
x2 0 (omitted)
_cons 0 1.414214 0.00 1.000 -2.771808 2.771808
We omit the collinear variable. Then we must decide whether to include or omit the observations
with pattern =1. We could include them,
. logit y x1, nolog
Logistic regression Number of obs = 6
LR chi2(1) = 0.37
Prob > chi2 = 0.5447
Log likelihood = -3.6356349 Pseudo R2 = 0.0480
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 1.098612 1.825742 0.60 0.547 -2.479776 4.677001
_cons -1.098612 1.154701 -0.95 0.341 -3.361784 1.164559
or exclude them,
. logit y x1 if pattern != 1, nolog
Logistic regression Number of obs = 4
LR chi2(1) = 0.00
Prob > chi2 = 1.0000
Log likelihood = -2.7725887 Pseudo R2 = 0.0000
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x1 0 2 0.00 1.000 -3.919928 3.919928
_cons 0 1.414214 0.00 1.000 -2.771808 2.771808
If the covariate pattern that predicts outcome perfectly is meaningful, you may want to exclude these
observations from the model. Here you would report that covariate pattern such and such predicted
outcome perfectly and that the best model for the rest of the data is . . . . But, more likely, the perfect
prediction was simply the result of having too many predictors in the model. Then you would omit
the extraneous variables from further consideration and report the best model for all the data.
Stored results
logit stores the following in e():
Scalars
e(N) number of observations
e(N_cds) number of completely determined successes
e(N_cdf) number of completely determined failures
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(r2_p) pseudo-R-squared
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(chi2) χ2
e(p) significance of model test
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) logit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(mns) vector of means of the independent variables
e(rules) information about perfect predictors
e(V) variancecovariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
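As a quick illustration added here (not part of the original entry), these stored results can be inspected after estimation. Using the repair example from above, in which 1.repair was dropped:

. use http://www.stata-press.com/data/r13/repair, clear
. quietly logit foreign b3.repair
. display e(N)
. display %9.4f e(ll)
. matrix list e(rules)

The first two display commands show 48 and -22.2291, matching the header of the earlier output, and e(rules) records the perfect predictor that was dropped.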
Methods and formulas
Cramer (2003, chap. 9) surveys the prehistory and history of the logit model. The word “logit”
was coined by Berkson (1944) and is analogous to the word “probit”. For an introduction to probit
and logit, see, for example, Aldrich and Nelson (1984), Cameron and Trivedi (2010), Greene (2012),
Jones (2007), Long (1997), Long and Freese (2014), Pampel (2000), or Powers and Xie (2008).
The likelihood function for logit is

$$\ln L = \sum_{j \in S} w_j \ln F(\mathbf{x}_j \mathbf{b}) + \sum_{j \notin S} w_j \ln\bigl\{1 - F(\mathbf{x}_j \mathbf{b})\bigr\}$$
where S is the set of all observations j such that y_j ≠ 0, F(z) = e^z/(1 + e^z), and w_j denotes the
optional weights. lnL is maximized as described in [R] maximize.
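As a concrete check added here (a sketch using the auto data; not part of the original entry), the reported log likelihood can be reproduced by summing the log density over the estimation sample in the unweighted case:

. use http://www.stata-press.com/data/r13/auto, clear
. quietly logit foreign mpg weight
. predict double p, pr
. generate double lnf = cond(foreign, ln(p), ln(1-p))
. quietly summarize lnf
. display r(sum)
. display e(ll)

The two displayed values agree.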
This command supports the Huber/White/sandwich estimator of the variance and its clustered version
using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly Maximum
likelihood estimators and Methods and formulas. The scores are calculated as u_j = {1 − F(x_j b)}x_j
for the positive outcomes and −F(x_j b)x_j for the negative outcomes.
logit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
 
Joseph Berkson (1899–1982) was born in New York City and studied at the College of the City
of New York, Columbia, and Johns Hopkins, earning both an MD and a doctorate in statistics.
He then worked at Johns Hopkins before moving to the Mayo Clinic in 1931 as a biostatistician.
Among many other contributions, his most influential one drew upon a long-sustained interest
in the logistic function, especially his 1944 paper on bioassay, in which he introduced the term
"logit". Berkson was a frequent participant in controversy, sometimes humorous and sometimes
bitter, on subjects such as the evidence for links between smoking and various diseases and the
relative merits of probit and logit methods and of different calculation methods.
 
References
Aldrich, J. H., and F. D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Newbury Park, CA: Sage.
Archer, K. J., and S. A. Lemeshow. 2006. Goodness-of-fit test for a logistic regression model fitted using survey sample data. Stata Journal 6: 97–105.
Berkson, J. 1944. Application of the logistic function to bio-assay. Journal of the American Statistical Association 39: 357–365.
Buis, M. L. 2010a. Direct and indirect effects in a logit model. Stata Journal 10: 11–29.
Buis, M. L. 2010b. Stata tip 87: Interpretation of interactions in nonlinear models. Stata Journal 10: 305–308.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cleves, M. A., and A. Tosetto. 2000. sg139: Logistic regression when binary outcome is measured with uncertainty. Stata Technical Bulletin 55: 20–23. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 152–156. College Station, TX: Stata Press.
Cramer, J. S. 2003. Logit Models from Economics and Other Fields. Cambridge: Cambridge University Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hilbe, J. M. 2009. Logistic Regression Models. Boca Raton, FL: Chapman & Hall/CRC.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken, NJ: Wiley.
Jones, A. 2007. Applied Econometrics for Health Economists: A Practical Guide. 2nd ed. Abingdon, UK: Radcliffe.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Mitchell, M. N., and X. Chen. 2005. Visualizing main effects and interactions for binary logit models. Stata Journal 5: 64–82.
O'Fallon, W. M. 1998. Berkson, Joseph. In Vol. 1 of Encyclopedia of Biostatistics, ed. P. Armitage and T. Colton, 290–295. Chichester, UK: Wiley.
Orsini, N., R. Bellocco, and P. C. Sjölander. 2013. Doubly robust estimation in generalized linear models. Stata Journal 13: 185–205.
Pampel, F. C. 2000. Logistic Regression: A Primer. Thousand Oaks, CA: Sage.
Powers, D. A., and Y. Xie. 2008. Statistical Methods for Categorical Data Analysis. 2nd ed. Bingley, UK: Emerald.
Pregibon, D. 1981. Logistic regression diagnostics. Annals of Statistics 9: 705–724.
Schonlau, M. 2005. Boosted regression (boosting): An introductory tutorial and a Stata plugin. Stata Journal 5: 330–354.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical outcomes. Stata Journal 5: 537–559.
Also see
[R] logit postestimation – Postestimation tools for logit
[R] brier – Brier score decomposition
[R] cloglog – Complementary log-log regression
[R] exlogistic – Exact logistic regression
[R] glogit – Logit and probit regression for grouped data
[R] logistic – Logistic regression, reporting odds ratios
[R] probit – Probit regression
[R] roc – Receiver operating characteristic (ROC) analysis
[ME] melogit – Multilevel mixed-effects logistic regression
[MI] estimation – Estimation commands for use with mi estimate
[SVY] svy estimation – Estimation commands for survey data
[XT] xtlogit – Fixed-effects, random-effects, and population-averaged logit models
[U] 20 Estimation and postestimation commands
Title
logit postestimation — Postestimation tools for logit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas References Also see
Description
The following postestimation commands are of special interest after logit:
Command Description
estat classification report various summary statistics, including the classification table
estat gof Pearson or Hosmer–Lemeshow goodness-of-fit test
lroc compute area under ROC curve and graph the curve
lsens graph sensitivity and specificity versus probability cutoff
These commands are not appropriate after the svy prefix.
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1) dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest (2) likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
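As an added sketch (not part of the original entry), the special-interest commands listed above might be used as follows after a logit fit:

. use http://www.stata-press.com/data/r13/lbw, clear
. quietly logit low age i.smoke
. estat gof
. estat classification
. lroc, nograph

Each command uses the current estimation results left behind by logit; the output is not reproduced here.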
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset rules asif]
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the prediction
dbeta Pregibon (1981) ∆β̂ influence statistic
deviance deviance residual
dx2 Hosmer, Lemeshow, and Sturdivant (2013) ∆χ2 influence statistic
ddeviance Hosmer, Lemeshow, and Sturdivant (2013) ∆D influence statistic
hat Pregibon (1981) leverage
number sequential number of the covariate pattern
residuals Pearson residuals; adjusted for number sharing covariate pattern
rstandard standardized Pearson residuals; adjusted for number sharing covariate pattern
score first derivative of the log likelihood with respect to x_jβ
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
pr, xb, stdp, and score are the only options allowed with svy estimation results.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
dbeta calculates the Pregibon (1981) ∆β̂ influence statistic, a standardized measure of the difference
in the coefficient vector that is due to deletion of the observation along with all others that share
the same covariate pattern. In Hosmer, Lemeshow, and Sturdivant (2013, 154–155) jargon, this
statistic is M-asymptotic; that is, it is adjusted for the number of observations that share the same
covariate pattern.
deviance calculates the deviance residual.
dx2 calculates the Hosmer, Lemeshow, and Sturdivant (2013, 191) ∆χ2 influence statistic, reflecting
the decrease in the Pearson χ2 that is due to deletion of the observation and all others that share
the same covariate pattern.
ddeviance calculates the Hosmer, Lemeshow, and Sturdivant (2013, 191) ∆D influence statistic,
which is the change in the deviance residual that is due to deletion of the observation and all
others that share the same covariate pattern.
hat calculates the Pregibon (1981) leverage or the diagonal elements of the hat matrix adjusted for
the number of observations that share the same covariate pattern.
number numbers the covariate patterns; observations with the same covariate pattern have the same
number. Observations not used in estimation have number set to missing. The first covariate
pattern is numbered 1, the second 2, and so on.
residuals calculates the Pearson residual as given by Hosmer, Lemeshow, and Sturdivant (2013,
155) and adjusted for the number of observations that share the same covariate pattern.
rstandard calculates the standardized Pearson residual as given by Hosmer, Lemeshow, and Stur-
divant (2013, 191) and adjusted for the number of observations that share the same covariate
pattern.
score calculates the equation-level score, ∂lnL/∂(x_jβ).
 
Options
nooffset is relevant only if you specified offset(varname) for logit. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b
rather than as x_j b + offset_j.
rules requests that Stata use any rules that were used to identify the model when making the
prediction. By default, Stata calculates missing for excluded observations.
asif requests that Stata ignore the rules and exclusion criteria and calculate predictions for all
observations possible by using the estimated parameter from the model.
Remarks and examples
Once you have fit a logit model, you can obtain the predicted probabilities by using the predict
command for both the estimation sample and other samples; see [U] 20 Estimation and postestimation
commands and [R]predict. Here we will make only a few more comments.
predict without arguments calculates the predicted probability of a positive outcome, that is,
Pr(y_j = 1) = F(x_j b). With the xb option, predict calculates the linear combination x_j b, where
x_j are the independent variables in the jth observation and b is the estimated parameter vector. This
is sometimes known as the index function because the cumulative distribution function indexed at
this value is the probability of a positive outcome.
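As a small check added here (not part of the original entry), the two statistics are linked by the logistic function, so applying invlogit() to the linear prediction reproduces the default probability:

. use http://www.stata-press.com/data/r13/auto, clear
. quietly logit foreign mpg weight
. predict double phat, pr
. predict double xbhat, xb
. generate double pcheck = invlogit(xbhat)
. assert reldif(phat, pcheck) < 1e-10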
In both cases, Stata remembers any rules used to identify the model and calculates missing for
excluded observations, unless rules or asif is specified. For information about the other statistics
available after predict, see [R]logistic postestimation.
Example 1: Predicted probabilities
In example 2 of [R]logit, we fit the logit model logit foreign b3.repair. To obtain predicted
probabilities, type
. use http://www.stata-press.com/data/r13/repair
(1978 Automobile Data)
. logit foreign b3.repair
note: 1.repair != 0 predicts failure perfectly
1.repair dropped and 10 obs not used
(output omitted )
. predict p
(option pr assumed; Pr(foreign))
(10 missing values generated)
. summarize foreign p
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
Stata remembers any rules used to identify the model and sets predictions to missing for any excluded
observations. logit dropped the variable 1.repair from our model and excluded 10 observations.
Thus when we typed predict p, those same 10 observations were again excluded, and their predictions
were set to missing.
predict's rules option uses the rules in the prediction. During estimation, we were told "1.repair
!= 0 predicts failure perfectly", so the rule is that when 1.repair is not zero, we should predict 0
probability of success or a positive outcome:
. predict p2, rules
(option pr assumed; Pr(foreign))
. summarize foreign p p2
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
p2 58 .2068966 .2016268 0 .5
predict's asif option ignores the rules and exclusion criteria and calculates predictions for all
observations possible by using the estimated parameters from the model:
. predict p3, asif
(option pr assumed; Pr(foreign))
. summarize foreign p p2 p3
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
p2 58 .2068966 .2016268 0 .5
p3 58 .2931035 .2016268 .1 .5
Which is right? What predict does by default is the most conservative approach. If many
observations had been excluded because of a simple rule, we could be reasonably certain that the
rules prediction is correct. The asif prediction is correct only if the exclusion is a fluke, and we
would be willing to exclude the variable from the analysis anyway. Then, however, we would refit
the model to include the excluded observations.
Example 2: Predictive margins
We can use the command margins, contrast after logit to make comparisons on the probability
scale. Let’s fit a model predicting low birthweight from characteristics of the mother:
. use http://www.stata-press.com/data/r13/lbw, clear
(Hosmer & Lemeshow data)
. logit low age i.race i.smoke ptl i.ht i.ui
Iteration 0: log likelihood = -117.336
Iteration 1: log likelihood = -103.81846
Iteration 2: log likelihood = -103.40486
Iteration 3: log likelihood = -103.40384
Iteration 4: log likelihood = -103.40384
Logistic regression Number of obs = 189
LR chi2(7) = 27.86
Prob > chi2 = 0.0002
Log likelihood = -103.40384 Pseudo R2 = 0.1187
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.0403293 .0357127 -1.13 0.259 -.1103249 .0296663
race
black 1.009436 .5025122 2.01 0.045 .0245302 1.994342
other 1.001908 .4248342 2.36 0.018 .1692485 1.834568
smoke
smoker .9631876 .3904357 2.47 0.014 .1979477 1.728427
ptl .6288678 .3399067 1.85 0.064 -.0373371 1.295073
1.ht 1.358142 .6289555 2.16 0.031 .125412 2.590872
1.ui .8001832 .4572306 1.75 0.080 -.0959724 1.696339
_cons -1.184127 .9187461 -1.29 0.197 -2.984837 .6165818
The coefficients are log odds-ratios: conditional on the other predictors, smoking during pregnancy
is associated with an increase of 0.96 in the log odds of low birthweight. The model is linear
in the log-odds scale, so the estimate of 0.96 has the same interpretation whatever the values of
the other predictors might be. We could convert 0.96 to an odds ratio by replaying the results with
the or option, that is, by typing logit, or.
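As an added note (not in the original entry), the replay and the equivalent hand calculation look like this; the coefficient on smoking corresponds to an odds ratio of about 2.62:

. logit, or
. display exp(_b[1.smoke])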
But what if we want to talk about the probability of low birthweight, and not the odds? Then
we will need the command margins, contrast. We will use the r. contrast operator to compare
each level of smoke with a reference level. (smoke has only two levels, so there will be only one
comparison: a comparison of smokers with nonsmokers.)
. margins r.smoke, contrast
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(low), predict()
df chi2 P>chi2
smoke 1 6.32 0.0119
Delta-method
Contrast Std. Err. [95% Conf. Interval]
smoke
(smoker vs nonsmoker) .1832779 .0728814 .0404329 .3261229
We see that maternal smoking is associated with an increase of 18.3 percentage points in the probability
of low birthweight. (We obtained a contrast on the probability scale because predicted probabilities are
the default when margins is used after logit.)
The contrast of 18.3 percentage points is a difference of margins that are computed by averaging over the predictions
for observations in the estimation sample. If the values of the other predictors were different, the
contrast for smoke would be different, too. Let's estimate the contrast for 25-year-old mothers:
. margins r.smoke, contrast at(age=25)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(low), predict()
at : age = 25
df chi2 P>chi2
smoke 1 6.19 0.0129
Delta-method
Contrast Std. Err. [95% Conf. Interval]
smoke
(smoker vs nonsmoker) .1808089 .0726777 .0383632 .3232547
Specifying a maternal age of 25 changed the contrast to 18.1 percentage points. Our contrast of probabilities
changed because the logit model is nonlinear in the probability scale. A contrast of log odds
would not have changed.
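One might trace how the contrast varies with maternal age by supplying a numlist to at() (an added sketch; output not shown):

. margins r.smoke, contrast at(age=(20 30 40))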
Methods and formulas
See Methods and formulas of the individual postestimation commands for details.
References
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Newson, R. B. 2013. Attributable and unattributable risks and fractions and other scenario comparisons. Stata Journal 13: 672–698.
Powers, D. A., H. Yoshioka, and M.-S. Yun. 2011. mvdcmp: Multivariate decomposition for nonlinear response models. Stata Journal 11: 556–576.
Pregibon, D. 1981. Logistic regression diagnostics. Annals of Statistics 9: 705–724.
Also see
[R] logit – Logistic regression, reporting coefficients
[R] estat classification – Classification statistics and table
[R] estat gof – Pearson or Hosmer–Lemeshow goodness-of-fit test
[R] lroc – Compute area under ROC curve and graph the curve
[R] lsens – Graph sensitivity and specificity versus probability cutoff
[U] 20 Estimation and postestimation commands
Title
loneway — Large one-way ANOVA, random effects, and reliability
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
loneway response_var group_var [if] [in] [weight] [, options]
options Description
Main
mean expected value of F distribution; default is 1
median median of F distribution; default is 1
exact exact confidence intervals (groups must be equal with no weights)
level(#) set confidence level; default is level(95)
by is allowed; see [D] by.
aweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Linear models and related >ANOVA/MANOVA >Large one-way ANOVA
Description
loneway fits one-way analysis-of-variance (ANOVA) models on datasets with many levels of
group var and presents different ancillary statistics from oneway (see [R]oneway):
Feature oneway loneway
Fit one-way model x x
on fewer than 376 levels x x
on more than 376 levels x
Bartlett’s test for equal variance x
Multiple-comparison tests x
Intragroup correlation and SE x
Intragroup correlation confidence interval x
Est. reliability of group-averaged score x
Est. SD of group effect x
Est. SD within group x
Options
 
Main
mean specifies that the expected value of the F_{k−1,N−k} distribution be used as the reference point
F_m in the estimation of ρ instead of the default value of 1.
median specifies that the median of the F_{k−1,N−k} distribution be used as the reference point F_m in
the estimation of ρ instead of the default value of 1.
exact requests that exact confidence intervals be computed, as opposed to the default asymptotic
confidence intervals. This option is allowed only if the groups are equal in size and weights are
not used.
level(#) specifies the confidence level, as a percentage, for confidence intervals of the coefficients.
The default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence
intervals.
Remarks and examples
Remarks are presented under the following headings:
The one-way ANOVA model
R-squared
The random-effects ANOVA model
Intraclass correlation
Estimated reliability of the group-averaged score
The one-way ANOVA model
Example 1
loneway's output looks like that of oneway, except that loneway presents more information at the
end. Using our automobile dataset, we have created a (numeric) variable called manufacturer_grp
identifying the manufacturer of each car, and within each manufacturer we have retained a maximum
of four models, selecting those with the lowest mpg. We can compute the intraclass correlation of
mpg for all manufacturers with at least four models as follows:
. use http://www.stata-press.com/data/r13/auto7
(1978 Automobile Data)
. loneway mpg manufacturer_grp if nummake == 4
One-way Analysis of Variance for mpg: Mileage (mpg)
Number of obs = 36
R-squared = 0.5228
Source SS df MS F Prob > F
Between manufactur~p 621.88889 8 77.736111 3.70 0.0049
Within manufactur~p 567.75 27 21.027778
Total 1189.6389 35 33.989683
Intraclass Asy.
correlation S.E. [95% Conf. Interval]
0.40270 0.18770 0.03481 0.77060
Estimated SD of manufactur~p effect 3.765247
Estimated SD within manufactur~p 4.585605
Est. reliability of a manufactur~p mean 0.72950
(evaluated at n=4.00)
In addition to the standard one-way ANOVA output, loneway produces the R-squared, the estimated
standard deviation of the group effect, the estimated standard deviation within group, the intragroup
correlation, the estimated reliability of the group-averaged mean, and, for unweighted data, the
asymptotic standard error and confidence interval for the intragroup correlation.
R-squared
The R-squared is, of course, simply the underlying R2 for a regression of response_var on the
levels of group_var, or mpg on the various manufacturers here.
The random-effects ANOVA model
loneway assumes that we observe a variable, y_ij, measured for n_i elements within k groups or
classes such that

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, n_i$$

and α_i and ε_ij are independent zero-mean random variables with variance σ²_α and σ²_ε, respectively.
This is the random-effects ANOVA model, also known as the components-of-variance model, in which
it is typically assumed that the y_ij are normally distributed.
The interpretation with respect to our example is that the observed value of our response variable,
mpg, is created in two steps. First, the ith manufacturer is chosen, and a value, α_i, is determined: the
typical mpg for that manufacturer less the overall mpg μ. Then a deviation, ε_ij, is chosen for the jth
model within this manufacturer. This is how much that particular automobile differs from the typical
mpg value for models from this manufacturer.
For our sample of 36 car models, the estimated standard deviations are σ_α = 3.8 and σ_ε = 4.6.
Thus a little more than half of the variation in mpg between cars is attributable to the car model,
with the rest attributable to differences between manufacturers. These standard deviations differ from
those that would be produced by a (standard) fixed-effects regression in that the regression would
require the sum of the ε_ij within each manufacturer to be zero, whereas
these estimates merely impose the constraint that the sum is expected to be zero.
Intraclass correlation
There are various estimators of the intraclass correlation, such as the pairwise estimator, which is
defined as the Pearson product-moment correlation computed over all possible pairs of observations
that can be constructed within groups. For a discussion of various estimators, see Donner (1986).
loneway computes what is termed the analysis of variance, or ANOVA, estimator. This intraclass
correlation is the theoretical upper bound on the variation in response_var that is explainable by
group_var, of which R-squared is an overestimate because of the serendipity of fitting. This correlation
is comparable to an R-squared; you do not have to square it.
In our example, the intra-manu correlation, the correlation of mpg within manufacturer, is 0.40.
Because aweights were not used and the default correlation was computed (that is, the mean and
median options were not specified), loneway also provided the asymptotic confidence interval and
standard error of the intraclass correlation estimate.
Estimated reliability of the group-averaged score
The estimated reliability of the group-averaged score or mean has an interpretation similar to that
of the intragroup correlation; it is a comparable number if we average response_var by group_var,
or mpg by manu in our example. It is the theoretical upper bound of a regression of manufacturer-
averaged mpg on characteristics of manufacturers. Why would we want to collapse our 36-observation
dataset into a 9-observation dataset of manufacturer averages? Because the 36 observations might be
a mirage. When General Motors builds cars, do they sometimes put a Pontiac label and sometimes
a Chevrolet label on them, so that it appears in our data as if we have two cars when we really have
only one, replicated? If that were the case, and if it were the case for many other manufacturers,
then we would be forced to admit that we do not have data on 36 cars; we instead have data on nine
manufacturer-averaged characteristics.
Stored results
loneway stores the following in r():
Scalars
r(N)      number of observations
r(rho)    intraclass correlation
r(lb)     lower bound of 95% CI for rho
r(ub)     upper bound of 95% CI for rho
r(rho_t)  estimated reliability
r(se)     asymp. SE of intraclass correlation
r(sd_w)   estimated SD within group
r(sd_b)   estimated SD of group effect
Methods and formulas
The mean squares in loneway's ANOVA table are computed as

$$\mathit{MS}_\alpha = \sum_i w_{i\cdot}\,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \big/ (k - 1)$$

and

$$\mathit{MS}_\epsilon = \sum_i \sum_j w_{ij}\,(y_{ij} - \bar{y}_{i\cdot})^2 \big/ (N - k)$$

in which

$$w_{i\cdot} = \sum_j w_{ij} \qquad w_{\cdot\cdot} = \sum_i w_{i\cdot} \qquad \bar{y}_{i\cdot} = \sum_j w_{ij}\,y_{ij} / w_{i\cdot} \qquad \bar{y}_{\cdot\cdot} = \sum_i w_{i\cdot}\,\bar{y}_{i\cdot} / w_{\cdot\cdot}$$

The corresponding expected values of these mean squares are

$$E(\mathit{MS}_\alpha) = \sigma^2_\epsilon + g\,\sigma^2_\alpha \qquad \text{and} \qquad E(\mathit{MS}_\epsilon) = \sigma^2_\epsilon$$

in which

$$g = \frac{w_{\cdot\cdot} - \sum_i w^2_{i\cdot}/w_{\cdot\cdot}}{k - 1}$$

In the unweighted case, we get

$$g = \frac{N - \sum_i n^2_i/N}{k - 1}$$

As expected, g = m for the case of no weights and equal group sizes in the data, that is, n_i = m for
all i. Replacing the expected values with the observed values and solving yields the ANOVA estimates
of σ²_α and σ²_ε. Substituting these into the definition of the intraclass correlation

$$\rho = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\epsilon}$$

yields the ANOVA estimator of the intraclass correlation:

$$\rho_A = \frac{F_{\mathrm{obs}} - 1}{F_{\mathrm{obs}} - 1 + g}$$
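As a quick check added here against Example 1's output (not part of the original entry), the ANOVA estimator can be reproduced from the reported mean squares, with F_obs = 77.736111/21.027778 and g = 4 because every manufacturer contributes four models:

. display (77.736111/21.027778 - 1)/(77.736111/21.027778 - 1 + 4)

which agrees with the reported intraclass correlation of 0.40270.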
F_obs is the observed value of the F statistic from the ANOVA table. For no weights and equal n_i, ρ_A
= roh, which is the intragroup correlation defined by Kish (1965). Two slightly different estimators
are available through the mean and median options (Gleason 1997). If either of these options is
specified, the estimate of ρ becomes

$$\rho = \frac{F_{\mathrm{obs}} - F_m}{F_{\mathrm{obs}} + (g - 1)F_m}$$

For the mean option, F_m = E(F_{k−1,N−k}) = (N − k)/(N − k − 2), that is, the expected value of the
ANOVA table's F statistic. For the median option, F_m is simply the median of the F_{k−1,N−k} distribution. Setting
F_m to 1 gives ρ_A, so for large samples, these different point estimators are essentially the same.
Also, because the intraclass correlation of the random-effects model is by definition nonnegative, for
any of the three possible point estimators, ρ is truncated to zero if F_obs is less than F_m.
For no weighting, interval estimators for ρ_A are computed. If the groups are equal sized (all n_i
equal) and the exact option is specified, the following exact (assuming that the y_ij are normally
distributed) 100(1 − α)% confidence interval is computed:

$$\left[\ \frac{F_{\mathrm{obs}} - F_m F_u}{F_{\mathrm{obs}} + (g - 1)F_m F_u},\ \ \frac{F_{\mathrm{obs}} - F_m F_l}{F_{\mathrm{obs}} + (g - 1)F_m F_l}\ \right]$$

with F_m = 1, F_l = F_{α/2, k−1, N−k}, and F_u = F_{1−α/2, k−1, N−k}, where F_{·, k−1, N−k} is the cumulative
distribution function for the F distribution with k − 1 and N − k degrees of freedom. If mean or
median is specified, F_m is defined as above. If the groups are equal sized and exact is not specified,
the following asymptotic 100(1 − α)% confidence interval for ρ_A is computed,

$$\left[\ \rho_A - z_{\alpha/2}\sqrt{V(\rho_A)},\ \ \rho_A + z_{\alpha/2}\sqrt{V(\rho_A)}\ \right]$$

where z_{α/2} is the 100(1 − α/2) percentile of the standard normal distribution and $\sqrt{V(\rho_A)}$ is the
asymptotic standard error of ρ defined below. This confidence interval is also available for unequal
groups. It is not applicable and, therefore, not computed for the estimates of ρ provided by the mean
and median options. Again, because the intraclass coefficient is nonnegative, if the lower bound is
negative for either confidence interval, it is truncated to zero. As might be expected, the coverage
probability of a truncated interval is higher than its nominal value.
The asymptotic standard error of ρ_A, assuming that the y_ij are normally distributed, is also
computed when appropriate, namely, for unweighted data and when ρ_A is computed (neither the mean
option nor the median option is specified):

$$V(\rho_A) = \frac{2(1 - \rho)^2}{g^2}\,(A + B + C)$$

with

$$A = \frac{\{1 + \rho(g - 1)\}^2}{N - k} \qquad
B = \frac{(1 - \rho)\{1 + \rho(2g - 1)\}}{k - 1} \qquad
C = \frac{\rho^2\bigl\{\sum n_i^2 - 2N^{-1}\sum n_i^3 + N^{-2}(\sum n_i^2)^2\bigr\}}{(k - 1)^2}$$

and ρ_A is substituted for ρ (Donner 1986).
The estimated reliability of the group-averaged score, known as the Spearman–Brown prediction
formula in the psychometric literature (Winer, Brown, and Michels 1991, 1014), is

$$\rho_t = \frac{t\rho}{1 + (t - 1)\rho}$$

for group size t. loneway computes ρ_t for t = g.
The estimated standard deviation of the group effect is $\sigma_\alpha = \sqrt{(\mathit{MS}_\alpha - \mathit{MS}_\epsilon)/g}$. This deviation
comes from the assumption that an observation is derived by adding a group effect to a within-group
effect.
The estimated standard deviation within group is the square root of the mean square due to error,
or $\sqrt{\mathit{MS}_\epsilon}$.
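Again as a check added here against Example 1 (not part of the original entry), the reported reliability and standard deviations follow directly from the ANOVA table, with g = t = 4:

. display (4*.40270)/(1 + 3*.40270)
. display sqrt((77.736111 - 21.027778)/4)
. display sqrt(21.027778)

These reproduce (to rounding) the reported values 0.72950, 3.765247, and 4.585605.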
Acknowledgment
We thank John Gleason of Syracuse University (retired) for his contributions to improving loneway.
References
Donner, A. 1986. A review of inference procedures for the intraclass correlation coefficient in the one-way random
effects model. International Statistical Review 54: 67–82.
Gleason, J. R. 1997. sg65: Computing intraclass correlations and large ANOVAs. Stata Technical Bulletin 35: 25–31. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 167–176. College Station, TX: Stata Press.
Kish, L. 1965. Survey Sampling. New York: Wiley.
Marchenko, Y. V. 2006. Estimating variance components in Stata. Stata Journal 6: 1–21.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New
York: McGraw–Hill.
Also see
[R] anova – Analysis of variance and covariance
[R] icc – Intraclass correlation coefficients
[R] oneway – One-way analysis of variance
Title
lowess — Lowess smoothing
Syntax Menu Description Options
Remarks and examples Methods and formulas Acknowledgment References
Also see
Syntax
lowess yvar xvar [if] [in] [, options]
options Description
Main
mean running-mean smooth; default is running-line least squares
noweight suppress weighted regressions; default is tricube weighting function
bwidth(#) use # for the bandwidth; default is bwidth(0.8)
logit transform dependent variable to logits
adjust adjust smoothed mean to equal mean of dependent variable
nograph suppress graph
generate(newvar) create newvar containing smoothed values of yvar
Plot
marker_options change look of markers (color, size, etc.)
marker_label_options add marker labels; change look or position
Smoothed line
lineopts(cline_options) affect rendition of the smoothed line
Add plots
addplot(plot) add other plots to generated graph
Y axis, X axis, Titles, Legend, Overall, By
twoway_options any of the options documented in [G-3] twoway_options
Menu
Statistics >Nonparametric analysis >Lowess smoothing
Description
lowess carries out a locally weighted regression of yvar on xvar, displays the graph, and optionally
saves the smoothed variable.
Warning: lowess is computationally intensive and may therefore take a long time to run on a
slow computer. Lowess calculations on 1,000 observations, for instance, require performing 1,000
regressions.
Options
 
Main
mean specifies running-mean smoothing; the default is running-line least-squares smoothing.
noweight prevents the use of Cleveland’s (1979) tricube weighting function; the default is to use the
weighting function.
bwidth(#) specifies the bandwidth. Centered subsets of bwidth() × N observations are used for
calculating smoothed values for each point in the data except for the end points, where smaller,
uncentered subsets are used. The greater the bwidth(), the greater the smoothing. The default is
0.8.
logit transforms the smoothed yvar into logits. Predicted values less than 0.0001 or greater than
0.9999 are set to 1/N and 1 − 1/N, respectively, before taking logits.
adjust adjusts the mean of the smoothed yvar to equal the mean of yvar by multiplying by an
appropriate factor. This option is useful when smoothing binary (0/1) data.
nograph suppresses displaying the graph.
generate(newvar) creates newvar containing the smoothed values of yvar.
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Smoothed line
lineopts(cline options)affects the rendition of the lowess-smoothed line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall, By
twoway options are any of the options documented in [G-3]twoway options. These include op-
tions for titling the graph (see [G-3]title options), options for saving the graph to disk (see
[G-3]saving option), and the by() option (see [G-3]by option).
Remarks and examples
By default, lowess provides locally weighted scatterplot smoothing. The basic idea is to create
a new variable (newvar) that, for each yvar y_i, contains the corresponding smoothed value. The
smoothed values are obtained by running a regression of yvar on xvar by using only the data (x_i, y_i)
and a few of the data near this point. In lowess, the regression is weighted so that the central point
(x_i, y_i) gets the highest weight and points that are farther away (based on the distance |x_j − x_i|)
receive less weight. The estimated regression line is then used to predict the smoothed value ŷ_i for
y_i only. The procedure is repeated to obtain the remaining smoothed values, which means that a
separate weighted regression is performed for every point in the data.
Lowess is a desirable smoother because of its locality; it tends to follow the data. Polynomial
smoothing methods, for instance, are global in that what happens on the extreme left of a scatterplot
can affect the fitted values on the extreme right.
Example 1
The amount of smoothing is affected by bwidth(#). You are warned to experiment with different
values. For instance,
. use http://www.stata-press.com/data/r13/lowess1
(example data for lowess)
. lowess h1 depth
[Graph omitted: Lowess smoother of Wet hole 1 versus depth, bandwidth = .8]
Now compare that with
. lowess h1 depth, bwidth(.4)
[Graph omitted: Lowess smoother of Wet hole 1 versus depth, bandwidth = .4]
In the first case, the default bandwidth of 0.8 is used, meaning that 80% of the data are used
in smoothing each point. In the second case, we explicitly specified a bandwidth of 0.4. Smaller
bandwidths follow the original data more closely.
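If you want the smoothed values themselves rather than only the graph, the generate() and nograph options can be combined (an added sketch, not part of the original example):

. lowess h1 depth, bwidth(.4) nograph generate(h1_sm)
. summarize h1 h1_sm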
Example 2
Two lowess options are especially useful with binary (0/1) data: adjust and logit. adjust
adjusts the resulting curve (by multiplication) so that the mean of the smoothed values is equal to
the mean of the unsmoothed values. logit specifies that the smoothed curve be in terms of the log
of the odds ratio:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. lowess foreign mpg, ylabel(0 "Domestic" 1 "Foreign") jitter(5) adjust
[Graph omitted: Lowess smoother, mean adjusted smooth, of Car type (Domestic/Foreign) versus Mileage (mpg), bandwidth = .8]
. lowess foreign mpg, logit yline(0)
[Graph omitted: Lowess smoother, logit transformed smooth, of Car type versus Mileage (mpg), bandwidth = .8]
With binary data, if you do not use the logit option, it is a good idea to specify graph's
jitter() option; see [G-2] graph twoway scatter. Because the underlying data (whether the car
was manufactured outside the United States here) take on only two values, raw data points are more
likely to be on top of each other, thus making it impossible to tell how many points there are. graph's
jitter() option adds some noise to the data to shift the points around. This noise affects only the
location of points on the graph, not the lowess curve.
When you specify the logit option, the display of the raw data is suppressed.
Technical note
lowess can be used for more than just lowess smoothing. Lowess can be usefully thought of as
a combination of two smoothing concepts: the use of predicted values from regression (rather than
means) for imputing a smoothed value and the use of the tricube weighting function (rather than a
constant weighting function). lowess allows you to combine these concepts freely. You can use line
smoothing without weighting (specify noweight), mean smoothing with tricube weighting (specify
mean), or mean smoothing without weighting (specify mean and noweight).
Methods and formulas
Let y_i and x_i be the two variables, and assume that the data are ordered so that x_i ≤ x_{i+1} for
i = 1, ..., N − 1. For each y_i, a smoothed value y^s_i is calculated.
The subset used in calculating y^s_i is indices i_− = max(1, i − k) through i_+ = min(i + k, N), where
k = ⌊(N × bwidth − 0.5)/2⌋. The weights for each of the observations between j = i_−, ..., i_+
are either 1 (noweight) or the tricube (default),

$$w_j = \left\{ 1 - \left( \frac{|x_j - x_i|}{\Delta} \right)^3 \right\}^3$$

where ∆ = 1.0001 max(x_{i_+} − x_i, x_i − x_{i_−}). The smoothed value y^s_i is then the (weighted) mean
or the (weighted) regression prediction at x_i.
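To make the bandwidth concrete (an added illustration, not part of the original entry), with the default bwidth(0.8) and the 74-observation auto dataset the half-width of each subset is

. display floor((74*0.8 - 0.5)/2)
29

so interior points are smoothed using up to 2 × 29 + 1 = 59 observations, roughly 80% of the data.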
 
William Swain Cleveland (1943– ) studied mathematics and statistics at Princeton and Yale. He
worked for several years at Bell Labs in New Jersey and now teaches statistics and computer
science at Purdue. He has made key contributions in many areas of statistics, including graphics
and data visualization, time series, environmental applications, and analysis of Internet traffic
data.
 
Acknowledgment
lowess is a modified version of a command originally written by Patrick Royston of the MRC
Clinical Trials Unit, London, and coauthor of the Stata Press book Flexible Parametric Survival
Analysis Using Stata: Beyond the Cox Model.
References
Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey. 1983. Graphical Methods for Data Analysis. Belmont,
CA: Wadsworth.
Cleveland, W. S. 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American
Statistical Association 74: 829–836.
Cleveland, W. S. 1993. Visualizing Data. Summit, NJ: Hobart.
Cleveland, W. S. 1994. The Elements of Graphing Data. Rev. ed. Summit, NJ: Hobart.
Cox, N. J. 2005. Speaking Stata: Smoothing in various directions. Stata Journal 5: 574–593.
Goodall, C. 1990. A survey of smoothing techniques. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long, 126–176. Newbury Park, CA: Sage.
Lindsey, C., and S. J. Sheather. 2010. Model fit assessment via marginal model plots. Stata Journal 10: 215–225.
Royston, P. 1991. gr6: Lowess smoothing. Stata Technical Bulletin 3: 7–9. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 41–44. College Station, TX: Stata Press.
Royston, P., and N. J. Cox. 2005. A multivariable scatterplot smoother. Stata Journal 5: 405–412.
Salgado-Ugarte, I. H., and M. Shimizu. 1995. snp8: Robust scatterplot smoothing: Enhancements to Stata's ksm. Stata Technical Bulletin 25: 23–26. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 190–194. College Station, TX: Stata Press.
Sasieni, P. D. 1994. snp7: Natural cubic splines. Stata Technical Bulletin 22: 19–22. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 171–174. College Station, TX: Stata Press.
Also see
[R] lpoly – Kernel-weighted local polynomial smoothing
[R] smooth – Robust nonlinear smoother
[D] ipolate – Linearly interpolate (extrapolate) values
Title
lpoly — Kernel-weighted local polynomial smoothing
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
lpoly yvar xvar [if] [in] [weight] [, options]
options Description
Main
kernel(kernel) specify kernel function; default is kernel(epanechnikov)
bwidth(#|varname) specify kernel bandwidth
degree(#) specify degree of the polynomial smooth; default is degree(0)
generate([newvar_x] newvar_s) store smoothing grid in newvar_x and smoothed points in newvar_s
n(#) obtain the smooth at # points; default is min(N, 50)
at(varname) obtain the smooth at the values specified by varname
nograph suppress graph
noscatter suppress scatterplot only
SE/CI
ci plot confidence bands
level(#) set confidence level; default is level(95)
se(newvar) store standard errors in newvar
pwidth(#) specify pilot bandwidth for standard error calculation
var(#|varname) specify estimates of residual variance
Scatterplot
marker_options change look of markers (color, size, etc.)
marker_label_options add marker labels; change look or position
Smoothed line
lineopts(cline_options) affect rendition of the smoothed line
CI plot
ciopts(cline_options) affect rendition of the confidence bands
Add plots
addplot(plot) add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway_options any options other than by() documented in [G-3] twoway_options
kernel Description
epanechnikov Epanechnikov kernel function; the default
epan2 alternative Epanechnikov kernel function
biweight biweight kernel function
cosine cosine trace kernel function
gaussian Gaussian kernel function
parzen Parzen kernel function
rectangle rectangle kernel function
triangle triangle kernel function
fweights and aweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Nonparametric analysis >Local polynomial smoothing
Description
lpoly performs a kernel-weighted local polynomial regression of yvar on xvar and displays a
graph of the smoothed values with (optional) confidence bands.
Options
 
Main
kernel(kernel) specifies the kernel function for use in calculating the weighted local polynomial
estimate. The default is kernel(epanechnikov).
bwidth(#|varname) specifies the half-width of the kernel, the width of the smoothing window
around each point. If bwidth() is not specified, a rule-of-thumb (ROT) bandwidth estimator is
calculated and used. A local variable bandwidth may be specified in varname, in conjunction with
an explicit smoothing grid using the at() option.
degree(#) specifies the degree of the polynomial to be used in the smoothing. The default is
degree(0), meaning local-mean smoothing.
generate([newvar_x] newvar_s) stores the smoothing grid in newvar_x and the smoothed values in
newvar_s. If at() is not specified, then both newvar_x and newvar_s must be specified. Otherwise,
only newvar_s is to be specified.
n(#) specifies the number of points at which the smooth is to be calculated. The default is min(N, 50),
where N is the number of observations.
at(varname) specifies a variable that contains the values at which the smooth should be calculated.
By default, the smoothing is done on an equally spaced grid, but you can use at() to instead
perform the smoothing at the observed xs, for example. This option also allows you to more easily
obtain smooths for different variables or different subsamples of a variable and then overlay the
estimates for comparison.
nograph suppresses drawing the graph of the estimated smooth. This option is often used with the
generate() option.
noscatter suppresses superimposing a scatterplot of the observed data over the smooth. This option
is useful when the number of resulting points would be so large as to clutter the graph.
 
SE/CI
ci plots confidence bands, using the confidence level specified in level().
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
se(newvar) stores the estimates of the standard errors in newvar. This option requires specifying
generate() or at().
pwidth(#) specifies the pilot bandwidth to be used for standard-error computations. The default is
chosen to be 1.5 times the value of the ROT bandwidth selector. If you specify pwidth() without
specifying se() or ci, then the ci option is assumed.
var(#|varname) specifies an estimate of a constant residual variance or a variable containing estimates
of the residual variances at each grid point required for standard-error computation. By default,
the residual variance at each smoothing point is estimated by the normalized weighted residual
sum of squares obtained from locally fitting a polynomial of order p + 2, where p is the degree
specified in degree(). var(varname) is allowed only if at() is specified. If you specify var()
without specifying se() or ci, then the ci option is assumed.
 
Scatterplot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Smoothed line
lineopts(cline options)affects the rendition of the smoothed line; see [G-3]cline options.
 
CI plot
ciopts(cline options) affects the rendition of the confidence bands; see [G-3] cline options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Remarks and examples
Remarks are presented under the following headings:
Introduction
Local polynomial smoothing
Choice of a bandwidth
Confidence bands
Introduction
The last 25 years or so has seen a significant outgrowth in the literature on scatterplot smoothing,
otherwise known as univariate nonparametric regression. Of most appeal is the idea of making no
assumptions about the functional form for the expected value of a response given a regressor, but
instead allowing the data to “speak for themselves”. Various methods and estimators fall into the
category of nonparametric regression, including local mean smoothing as described independently
by Nadaraya (1964) and Watson (1964), the Gasser and Müller (1979) estimator, locally weighted
scatterplot smoothing (LOWESS) as described by Cleveland (1979), wavelets (for example, Donoho
[1995]), and splines (Eubank 1999), to name a few. Much of the vast literature focuses on automating
the amount of smoothing to be performed and dealing with the bias/variance tradeoff inherent to
this type of estimation. For example, for Nadaraya–Watson the amount of smoothing is controlled by
choosing a bandwidth.
Smoothing via local polynomials is by no means a new idea but instead one that has been rediscovered
in recent years in articles such as Fan (1992). A natural extension of the local mean smoothing of
Nadaraya–Watson, local polynomial regression involves fitting the response to a polynomial form of
the regressor via locally weighted least squares. Higher-order polynomials have better bias properties
than the zero-degree local polynomials of the Nadaraya–Watson estimator; in general, higher-order
polynomials do not require bias adjustment at the boundary of the regression space. For a definitive
reference on local polynomial smoothing, see Fan and Gijbels (1996).
Local polynomial smoothing
Consider a set of scatterplot data {(x_1, y_1), ..., (x_n, y_n)} from the model

$$ y_i = m(x_i) + \sigma(x_i)\,\epsilon_i \qquad (1) $$

for some unknown mean and variance functions m(·) and σ^2(·), and symmetric errors ε_i with
E(ε_i) = 0 and Var(ε_i) = 1. The goal is to estimate m(x_0) = E[Y | X = x_0], making no assumption
about the functional form of m(·).
lpoly estimates m(x_0) as the constant term (intercept) of a regression, weighted by the kernel
function specified in kernel(), of yvar on the polynomial terms (xvar - x_0), (xvar - x_0)^2, ..., (xvar - x_0)^p
for each smoothing point x_0. The degree of the polynomial, p, is specified in degree(), the
amount of smoothing is controlled by the bandwidth specified in bwidth(), and the chosen kernel
function is specified in kernel().
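To make this mechanism concrete, the sketch below (not part of the formal documentation of lpoly) builds the kernel weights by hand and fits the weighted regression directly at one arbitrarily chosen smoothing point; the point x0 = 20 and bandwidth h = 6 are illustrative values only, and the epan2 kernel is used so that the weights have the simple form 0.75(1 - u^2) for |u| < 1. The intercept from regress should agree with the lpoly smooth at that point up to rounding.
. use http://www.stata-press.com/data/r13/motorcycle, clear
. * illustrative smoothing point and bandwidth
. local x0 = 20
. local h = 6
. * epan2 kernel weights: 0.75*(1 - u^2) for |u| < 1
. generate double u = (time - `x0')/`h'
. generate double w = 0.75*(1 - u^2)*(abs(u) < 1)
. * polynomial terms (xvar - x0)^j, j = 1, 2, 3
. generate double d1 = time - `x0'
. generate double d2 = d1^2
. generate double d3 = d1^3
. * weighted regression; the intercept is the local cubic smooth at x0
. regress accel d1 d2 d3 [aweight=w] if w > 0
. scalar b0 = _b[_cons]
. generate double x0 = `x0' in 1
. lpoly accel time, degree(3) kernel(epan2) bwidth(`h') at(x0) generate(sm) nograph
. display "weighted-regression intercept = " b0 ",  lpoly smooth = " sm[1]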
Example 1
Consider the motorcycle data as examined (among other places) in Fan and Gijbels (1996). The
data consist of 133 observations and measure the acceleration (accel measured in grams [g]) of
a dummy’s head during impact over time (time measured in milliseconds). For these data, we use
lpoly to fit a local cubic polynomial with the default bandwidth (obtained using the ROT method)
and the default Epanechnikov kernel.
. use http://www.stata-press.com/data/r13/motorcycle
(Motorcycle data from Fan & Gijbels (1996))
. lpoly accel time, degree(3)
(figure: Local polynomial smooth of acceleration (g) versus time (msec); kernel = epanechnikov, degree = 3, bandwidth = 6.04)
The default bandwidth and kernel settings do not provide a satisfactory fit in this example. To
improve the fit, we can either supply a different bandwidth by using the bwidth() option or specify
a different kernel by using the kernel() option. For example, using the alternative Epanechnikov
kernel, kernel(epan2), below provides a better fit for these data.
. lpoly accel time, degree(3) kernel(epan2)
(figure: Local polynomial smooth of acceleration (g) versus time (msec); kernel = epan2, degree = 3, bandwidth = 6.88)
Technical note
lpoly allows specifying in degree() both odd and even orders of the polynomial to be used for
the smoothing. However, the odd-order, 2k+1, polynomial approximations are preferable. They have
an extra parameter compared with the even-order, 2k, approximations, which leads to a significant
bias reduction and there is no increase of variability associated with adding this extra parameter.
Using an odd order when estimating the regression function is therefore usually sufficient. For a more
thorough discussion, see Fan and Gijbels (1996).
Choice of a bandwidth
The choice of a bandwidth is crucial for many smoothing techniques, including local polynomial
smoothing. In general, using a large bandwidth gives smooths with a large bias, whereas a small
bandwidth may result in highly variable smoothed values. Various techniques exist for optimal
bandwidth selection. By default, lpoly uses the ROT method to estimate the bandwidth used for the
smoothing; see Methods and formulas for details.
Example 2
Using the motorcycle data, we demonstrate how a local linear polynomial fit changes using
different bandwidths.
. lpoly accel time, degree(1) kernel(epan2) bwidth(1) generate(at smooth1)
> nograph
. lpoly accel time, degree(1) kernel(epan2) bwidth(7) at(at) generate(smooth2)
> nograph
. label variable smooth1 "smooth: width = 1"
. label variable smooth2 "smooth: width = 7"
. lpoly accel time, degree(1) kernel(epan2) at(at) addplot(line smooth* at)
> legend(label(2 "smooth: width = 3.42 (ROT)")) note("kernel = epan2, degree = 1")
(figure: Local polynomial smooth of acceleration (g) versus time (msec), overlaying smooths of width 1, 7, and 3.42 (ROT); kernel = epan2, degree = 1)
From this graph, we can see that the local linear polynomial fit with larger bandwidth (width =
7) corresponds to a smoother line but fails to fit the curvature of the scatterplot data. The smooth
obtained using the width equal to one seems to fit most data points, but the corresponding line has
several spikes indicating larger variability. The smooth obtained using the ROT bandwidth estimator
seems to have a good tradeoff between the fit and variability in this example.
In the above, we also demonstrated how the generate() and addplot() options may be used to
produce overlaid plots obtained from lpoly with different options. The nograph option saves time
when you need to save only results with generate().
However, to avoid generating variables manually, one can use twoway lpoly instead; see [G-2] graph
twoway lpoly for more details.
. twoway scatter accel time ||
> lpoly accel time, degree(1) kernel(epan2) lpattern(solid) ||
> lpoly accel time, degree(1) kernel(epan2) bwidth(1) ||
> lpoly accel time, degree(1) kernel(epan2) bwidth(7) ||
> , legend(label(2 "smooth: width = 3.42 (ROT)") label(3 "smooth: width = 1")
> label(4 "smooth: width = 7"))
> title("Local polynomial smooth") note("kernel = epan2, degree = 1")
> xtitle("time (msec)") ytitle("acceleration (g)")
(figure: Local polynomial smooth of acceleration (g) versus time (msec), overlaying smooths of width 1, 7, and 3.42 (ROT); kernel = epan2, degree = 1)
The ROT estimate is commonly used as an initial guess for the amount of smoothing; this approach
may be sufficient when the choice of a bandwidth is less important. In other cases, you can pick
your own bandwidth.
When the shape of the regression function has a combination of peaked and flat regions, a variable
bandwidth may be preferable over the constant bandwidth to allow for different degrees of smoothness
in different regions. The bwidth() option allows you to specify the values of the local variable
bandwidths as those stored in a variable in your data.
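For illustration only, the following sketch supplies such a variable bandwidth for the motorcycle data through an explicit grid; the grid and the two bandwidth values are arbitrary choices made for this example, not recommendations.
. use http://www.stata-press.com/data/r13/motorcycle, clear
. * explicit 50-point smoothing grid and an arbitrary two-valued bandwidth:
. * wider (6) where the curve is flat, narrower (3) where it is peaked
. range grid 0 60 50
. generate double bw = cond(grid < 15, 6, 3) if grid < .
. lpoly accel time, degree(1) kernel(epan2) at(grid) bwidth(bw)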
Similar issues with bias and variability arise when choosing a pilot bandwidth (the pwidth()
option) used to compute standard errors of the local polynomial smoother. The default value is chosen
to be 1.5 ×ROT. For a review of methods for pilot bandwidth selection, see Fan and Gijbels (1996).
Confidence bands
The established asymptotic normality of the local polynomial estimators under certain conditions
allows the construction of approximate confidence bands. lpoly offers the ci option to plot these
bands.
Example 3
Let us plot the confidence bands for the local polynomial fit from example 1.
. lpoly accel time, degree(3) kernel(epan2) ci
(figure: Local polynomial smooth with 95% CI bands, acceleration (g) versus time (msec); kernel = epan2, degree = 3, bandwidth = 6.88, pwidth = 10.33)
You can obtain graphs with overlaid confidence bands by using twoway lpolyci; see [G-2]graph
twoway lpolyci for examples.
Constructing the confidence intervals involves computing standard errors obtained by taking a
square root of the estimate of the conditional variance of the local polynomial estimator at each
grid point x0. Estimating the conditional variance requires fitting a polynomial of a higher order
locally by using a different bandwidth, the pilot bandwidth. The value of the pilot bandwidth may
be supplied by using pwidth(). By default, the value of 1.5 ×ROT is used. Also, estimates of the
residual variance σ2(x0)at each grid point, x0, are required to obtain the estimates of the conditional
variances. These estimates may be supplied by using the var() option. By default, they are computed
using the normalized weighted residual sum of squares from a local polynomial fit of a higher order.
See Methods and formulas for details. The standard errors may be saved by using se().
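As a sketch of assembling the same bands by hand (variable names here are arbitrary), the grid, the smooth, and its standard errors can be saved and then combined with the normal quantile:
. use http://www.stata-press.com/data/r13/motorcycle, clear
. lpoly accel time, degree(3) kernel(epan2) generate(gx gm) se(gse) nograph
. * normal-approximation 95% bands built from the saved smooth and standard errors
. generate double lb = gm - invnormal(0.975)*gse
. generate double ub = gm + invnormal(0.975)*gse
. twoway rarea lb ub gx || line gm gx || scatter accel time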
Stored results
lpoly stores the following in r():
Scalars
  r(degree)    smoothing polynomial degree
  r(ngrid)     number of successful regressions
  r(N)         sample size
  r(bwidth)    bandwidth of the smooth
  r(pwidth)    pilot bandwidth
Macros
  r(kernel)    name of kernel
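A minimal sketch of retrieving these stored results after a smooth (output omitted):
. use http://www.stata-press.com/data/r13/motorcycle, clear
. lpoly accel time, degree(3) nograph
. return list
. display "ROT bandwidth = " r(bwidth) ",  grid points = " r(ngrid)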
Methods and formulas
Consider model (1), written in matrix notation,

$$ \mathbf{y} = m(\mathbf{x}) + \boldsymbol{\epsilon} $$

where y and x are the n × 1 vectors of scatterplot values, ε is the n × 1 vector of errors with zero
mean and covariance matrix Σ = diag{σ(x_i)} I_n, and m() and σ() are some unknown functions.
Define m(x_0) = E[Y | X = x_0] and σ^2(x_0) = Var[Y | X = x_0] to be the conditional mean and
conditional variance of random variable Y (residual variance), respectively, for some realization x_0
of random variable X.
The method of local polynomial smoothing is based on the approximation of m(x) locally by a
pth-order polynomial in (x - x_0) for some x in the neighborhood of x_0. For the scatterplot data
{(x_1, y_1), ..., (x_n, y_n)}, the pth-order local polynomial smooth $\widehat{m}(x_0)$ is equal to $\widehat{\beta}_0$, an estimate
of the intercept of the weighted linear regression,

$$ \widehat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{y} \qquad (2) $$

where $\widehat{\boldsymbol{\beta}} = (\widehat{\beta}_0, \widehat{\beta}_1, \ldots, \widehat{\beta}_p)^T$ is the vector of estimated regression coefficients (with
$\{\widehat{\beta}_j = (j!)^{-1}\,\widehat{m}^{(j)}(x)\big|_{x=x_0},\ j = 0, \ldots, p\}$ also representing estimated coefficients from a corresponding
Taylor expansion); $\mathbf{X} = \{(x_i - x_0)^j\}_{i,j=1,0}^{n,p}$ is a design matrix; and $\mathbf{W} = \mathrm{diag}\{K_h(x_i - x_0)\}_{n\times n}$
is a weighting matrix with weights $K_h(\cdot)$ defined as $K_h(x) = h^{-1}K(x/h)$, with $K(\cdot)$ being a
kernel function and h defining a bandwidth. The kernels are defined in Methods and formulas of
[R] kdensity.
The default bandwidth is obtained using the ROT method of bandwidth selection. The ROT bandwidth
is the plugin estimator of the asymptotically optimal constant bandwidth. This is the bandwidth that
minimizes the conditional weighted mean integrated squared error. The ROT plugin bandwidth selector
for the smoothing bandwidth h is defined as follows; assuming constant residual variance
σ^2(x_0) = σ^2 and odd degree p:

$$ \widehat{h} = C_{0,p}(K)\left[\frac{\widehat{\sigma}^2 \int w_0(x)\,dx}{\,n \int \{\widehat{m}^{(p+1)}(x)\}^2\, w_0(x)\, f(x)\,dx\,}\right]^{1/(2p+3)} \qquad (3) $$

where $C_{0,p}(K)$ is a constant, as defined in Fan and Gijbels (1996), that depends on the kernel function
$K(\cdot)$ and the degree of a polynomial p, and $w_0$ is chosen to be an indicator function on the interval
$[\min_x + 0.05 \times \mathrm{range}_x,\ \max_x - 0.05 \times \mathrm{range}_x]$ with $\min_x$, $\max_x$, and $\mathrm{range}_x$ being, respectively, the
minimum, maximum, and the range of x. To obtain the estimates of a constant residual variance, $\widehat{\sigma}^2$,
and the (p+1)th-order derivative of m(x), denoted as $\widehat{m}^{(p+1)}(x)$, a polynomial in x of order (p+3)
is fit globally to y. $\widehat{\sigma}^2$ is estimated as a standardized residual sum of squares from this fit.
The expression for the asymptotically optimal constant bandwidth used in constructing the ROT
bandwidth estimator is derived for the odd-order polynomial approximations. For even-order polynomial
fits the expression would depend not only on $m^{(p+1)}(x)$ but also on $m^{(p+2)}(x)$ and the design density
and its derivative, f(x) and f'(x). Therefore, the ROT bandwidth selector would require estimation
of these additional quantities. Instead, for an even degree p of the local polynomial, lpoly uses the
value of the ROT estimator (3) computed using degree p + 1. As such, for even degrees this is not a
plugin estimator of the asymptotically optimal constant bandwidth.
The estimates of the conditional variance of local polynomial estimators are obtained using

$$ \widehat{\mathrm{Var}}\{\widehat{m}(x_0)\mid X = x_0\} = \widehat{\sigma}^2_m(x_0) = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{W}^2\mathbf{X})(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\,\widehat{\sigma}^2(x_0) \qquad (4) $$

where $\widehat{\sigma}^2(x_0)$ is estimated by the normalized weighted residual sum of squares from the (p + 2)th-order
polynomial fit using pilot bandwidth $h^{*}$.
When the bias is negligible, the normal-approximation method yields a (1 - α) × 100% confidence
interval for m(x_0),

$$ \left\{\,\widehat{m}(x_0) - z_{(1-\alpha/2)}\,\widehat{\sigma}_m(x_0),\ \ \widehat{m}(x_0) + z_{(1-\alpha/2)}\,\widehat{\sigma}_m(x_0)\,\right\} $$

where $z_{(1-\alpha/2)}$ is the (1 - α/2)th quantile of the standard Gaussian distribution, and $\widehat{m}(x_0)$ and
$\widehat{\sigma}_m(x_0)$ are as defined in (2) and (4), respectively.
References
Cleveland, W. S. 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American
Statistical Association 74: 829–836.
Cox, N. J. 2005. Speaking Stata: Smoothing in various directions.Stata Journal 5: 574–593.
Donoho, D. L. 1995. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Applied and
Computational Harmonic Analysis 2: 101–126.
Eubank, R. L. 1999. Nonparametric Regression and Spline Smoothing. 2nd ed. New York: Dekker.
Fan, J. 1992. Design-adaptive nonparametric regression. Journal of the American Statistical Association 87: 998–1004.
Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. London: Chapman & Hall.
Gasser, T., and H.-G. Müller. 1979. Kernel estimation of regression functions. In Smoothing Techniques for Curve
Estimation, Lecture Notes in Mathematics, ed. T. Gasser and M. Rosenblatt, 23–68. New York: Springer.
Gutierrez, R. G., J. M. Linhart, and J. S. Pitblado. 2003. From the help desk: Local polynomial regression and Stata
plugins.Stata Journal 3: 412–419.
Nadaraya, E. A. 1964. On estimating regression. Theory of Probability and Its Application 9: 141–142.
Sheather, S. J., and M. C. Jones. 1991. A reliable data-based bandwidth selection method for kernel density estimation.
Journal of the Royal Statistical Society, Series B 53: 683–690.
Verardi, V., and N. Debarsy. 2012. Robinson’s square root of Nconsistent semiparametric regression estimator in
Stata.Stata Journal 12: 726–735.
Watson, G. S. 1964. Smooth regression analysis. Sankhyā, Series A 26: 359–372.
Also see
[R]kdensity Univariate kernel density estimation
[R]lowess Lowess smoothing
[R]smooth Robust nonlinear smoother
[G-2]graph twoway lpoly Local polynomial smooth plots
[G-2]graph twoway lpolyci Local polynomial smooth plots with CIs
Title
lroc — Compute area under ROC curve and graph the curve
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
lroc [depvar] [if] [in] [weight] [, options]
options Description
Main
all compute area under ROC curve and graph curve for all observations
nograph suppress graph
Advanced
beta(matname)row vector containing model coefficients
Plot
cline options change the look of the line
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
fweights are allowed; see [U] 11.1.6 weight.
lroc is not appropriate after the svy prefix.
Menu
Statistics >Binary outcomes >Postestimation >ROC curve after logistic/logit/probit/ivprobit
Description
lroc graphs the ROC curve and calculates the area under the curve.
lroc requires that the current estimation results be from logistic, logit, probit, or ivprobit;
see [R] logistic, [R] logit, [R] probit, or [R] ivprobit.
Options
 
Main
all requests that the statistic be computed for all observations in the data, ignoring any if or in
restrictions specified by the estimation command.
nograph suppresses graphical output.
 
Advanced
beta(matname) specifies a row vector containing model coefficients. The columns of the row vector
must be labeled with the corresponding names of the independent variables in the data. The
dependent variable depvar must be specified immediately after the command name. See Models
other than the last fitted model later in this entry.
 
Plot
cline options, marker options, and marker label options affect the rendition of the ROC curve, that is,
the plotted points connected by lines. These options affect the size and color of markers, whether and
how the markers are labeled, and whether and how the points are connected; see [G-3] cline options,
[G-3] marker options, and [G-3] marker label options.
 
Reference line
rlopts(cline options) affects the rendition of the reference line; see [G-3] cline options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Remarks and examples
Remarks are presented under the following headings:
Introduction
Samples other than the estimation sample
Models other than the last fitted model
Introduction
Stata also has a suite of commands for performing both parametric and nonparametric receiver
operating characteristic (ROC) analysis. See [R]roc for an overview of these commands.
lroc graphs the ROC curve (a graph of sensitivity versus one minus specificity as the cutoff c
is varied) and calculates the area under it. Sensitivity is the fraction of observed positive-outcome
cases that are correctly classified; specificity is the fraction of observed negative-outcome cases that
are correctly classified. When the purpose of the analysis is classification, you must choose a cutoff.
The curve starts at (0,0), corresponding to c = 1, and continues to (1,1), corresponding to c = 0.
A model with no predictive power would be a 45° line. The greater the predictive power, the more
bowed the curve, and hence the area beneath the curve is often used as a measure of the predictive
power. A model with no predictive power has area 0.5; a perfect model has area 1.
The ROC curve was first discussed in signal detection theory (Peterson, Birdsall, and Fox 1954)
and then was quickly introduced into psychology (Tanner and Swets 1954). It has since been applied
in other fields, particularly medicine (for instance, Metz [1978]). For a classic text on ROC techniques,
see Green and Swets (1966).
lsens also plots sensitivity and specificity; see [R]lsens.
Example 1
Hardin and Hilbe (2012) examine data from the National Canadian Registry of Cardiovascular
Disease (FASTRAK), sponsored by Hoffman-La Roche Canada. They model death within 48 hours
based on whether a patient suffers an anterior infarct (heart attack) rather than an inferior infarct using
a logistic regression and evaluate the model using an ROC curve. We replicate their analysis here.
Both anterior and inferior refer to sites on the heart where damage occurs. The model is also
adjusted for hcabg, whether the subject has had a cardiac bypass surgery (CABG); age, a four-category
age-group indicator; and killip, a four-level risk indicator.
We load the data and then estimate the parameters of the logistic regression with logistic.
Factor-variable notation is used for each predictor, because they are categorical; see [U] 11.4.3 Factor
variables.
. use http://www.stata-press.com/data/r13/heart
(Heart attacks)
. logistic death i.site i.hcabg i.killip i.age
Logistic regression Number of obs = 4483
LR chi2(8) = 211.37
Prob > chi2 = 0.0000
Log likelihood = -636.62553 Pseudo R2 = 0.1424
death Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
site
Anterior 1.901333 .3185757 3.83 0.000 1.369103 2.640464
1.hcabg 2.105275 .7430694 2.11 0.035 1.054076 4.204801
killip
2 2.251732 .4064423 4.50 0.000 1.580786 3.207453
3 2.172105 .584427 2.88 0.004 1.281907 3.680487
4 14.29137 5.087654 7.47 0.000 7.112964 28.71423
age
60-69 1.63726 .5078582 1.59 0.112 .8914261 3.007115
70-79 4.532029 1.206534 5.68 0.000 2.689568 7.636647
>=80 8.893222 2.41752 8.04 0.000 5.219991 15.15125
_cons .0063961 .0016541 -19.54 0.000 .0038529 .010618
The odds ratios for a unit change in each covariate are reported by logistic. At fixed values of
the other covariates, patients who enter Canadian hospitals with an anterior infarct have nearly twice
the odds of death within 48 hours as those with an inferior infarct. Those who have had a previous
CABG have approximately twice the risk of death of those who have not. Those with higher Killip
risks and those who are older are also at greater risk of death.
We use lroc to draw the ROC curve for the model. The area under the curve of approximately
0.8 indicates acceptable discrimination for the model.
. lroc
Logistic model for death
number of observations = 4483
area under ROC curve = 0.7965
(figure: ROC curve, Sensitivity versus 1 − Specificity; area under ROC curve = 0.7965)
Samples other than the estimation sample
lroc can be used with samples other than the estimation sample. By default, lroc remembers
the estimation sample used with the last logistic,logit,probit, or ivprobit command. To
override this, simply use an if or in restriction to select another set of observations, or specify the
all option to force the command to use all the observations in the dataset.
See example 3 in [R]estat gof for an example of using lroc with a sample other than the
estimation sample.
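A minimal sketch of this idea, using an arbitrary random 70/30 split of the heart-attack data purely for illustration:
. use http://www.stata-press.com/data/r13/heart, clear
. * arbitrary split, used only to illustrate scoring outside the estimation sample
. set seed 12345
. generate byte train = runiform() < 0.7
. quietly logistic death i.site i.hcabg i.killip i.age if train
. lroc, nograph
. lroc if !train, nograph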
Models other than the last fitted model
By default, lroc uses the last model fit by logistic,logit,probit, or ivprobit. You may
also directly specify the model to lroc by inputting a vector of coefficients with the beta() option
and passing the name of the dependent variable depvar to lroc.
Example 2
Suppose that someone publishes the following logistic model of low birthweight:
Pr(low = 1) = F(-0.02 age - 0.01 lwt + 1.3 black + 1.1 smoke + 0.5 ptl + 1.8 ht + 0.8 ui + 0.5)
where F is the cumulative logistic distribution. These coefficients are not odds ratios; they are the
equivalent of what logit produces.
We can see whether this model fits our data. First we enter the coefficients as a row vector and
label its columns with the names of the independent variables plus _cons for the constant (see
[P] matrix define and [P] matrix rownames).
. use http://www.stata-press.com/data/r13/lbw3, clear
(Hosmer & Lemeshow data)
. matrix input b = (-.02, -.01, 1.3, 1.1, .5, 1.8, .8, .5)
. matrix colnames b = age lwt black smoke ptl ht ui _cons
Here we use lroc to examine the predictive ability of the model:
. lroc low, beta(b) nograph
Logistic model for low
number of observations = 189
area under ROC curve = 0.7275
The area under the curve indicates that this model does have some predictive power. We can obtain
a graph of sensitivity and specificity as a function of the cutoff probability by typing
. lsens low, beta(b)
(figure: Sensitivity and Specificity versus Probability cutoff)
See [R]lsens.
Stored results
lroc stores the following in r():
Scalars
r(N) number of observations
r(area) area under the ROC curve
Methods and formulas
The ROC curve is a graph of sensitivity against (1 − specificity). This is guaranteed to be a
monotone nondecreasing function because the number of correctly predicted successes increases and
the number of correctly predicted failures decreases as the classification cutoff c decreases.
The area under the ROC curve is the area on the bottom of this graph and is determined by
integrating the curve. The vertices of the curve are determined by sorting the data according to the
predicted index, and the integral is computed using the trapezoidal rule.
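Because the integral is the usual trapezoidal (nonparametric) estimate applied to the predicted index, the area can be cross-checked against roctab run on the predicted probabilities, as in the sketch below (the two areas should agree):
. use http://www.stata-press.com/data/r13/heart, clear
. quietly logistic death i.site i.hcabg i.killip i.age
. lroc, nograph
. * nonparametric (trapezoidal) area computed directly from the predicted probabilities
. predict double p, pr
. roctab death p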
References
Green, D. M., and J. A. Swets. 1966. Signal Detection Theory and Psychophysics. New York: Wiley.
Hardin, J. W., and J. M. Hilbe. 2012. Generalized Linear Models and Extensions. 3rd ed. College Station, TX: Stata
Press.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Metz, C. E. 1978. Basic principles of ROC analysis. Seminars in Nuclear Medicine 8: 283–298.
Peterson, W. W., T. G. Birdsall, and W. C. Fox. 1954. The theory of signal detectability. Transactions IRE Professional
Group on Information Theory PGIT-4: 171–212.
Tanner, W. P., Jr., and J. A. Swets. 1954. A decision-making theory of visual detection. Psychological Review 61:
401–409.
Tilford, J. M., P. K. Roberson, and D. H. Fiser. 1995. sbe12: Using lfit and lroc to evaluate mortality prediction
models.Stata Technical Bulletin 28: 14–18. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 77–81.
College Station, TX: Stata Press.
Also see
[R]logistic Logistic regression, reporting odds ratios
[R]logit Logistic regression, reporting coefficients
[R]probit Probit regression
[R]ivprobit Probit model with continuous endogenous regressors
[R]lsens Graph sensitivity and specificity versus probability cutoff
[R]estat classification Classification statistics and table
[R]estat gof Pearson or Hosmer–Lemeshow goodness-of-fit test
[R]roc Receiver operating characteristic (ROC) analysis
[U] 20 Estimation and postestimation commands
Title
lrtest — Likelihood-ratio test after estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
lrtest modelspec1 [modelspec2] [, options]
where modelspec is  name | . | (namelist)
where name is the name under which estimation results were stored using estimates store (see
[R] estimates store), and "." refers to the last estimation results, whether or not these were already
stored.
options Description
stats display statistical information about the two models
dir display descriptive information about the two models
df(#)override the automatic degrees-of-freedom calculation; seldom used
force force testing even when apparently invalid
Menu
Statistics >Postestimation >Tests >Likelihood-ratio test
Description
lrtest performs a likelihood-ratio test of the null hypothesis that the parameter vector of a
statistical model satisfies some smooth constraint. To conduct the test, both the unrestricted and the
restricted models must be fit using the maximum likelihood method (or some equivalent method),
and the results of at least one must be stored using estimates store; see [R]estimates store.
modelspec1 and modelspec2 specify the restricted and unrestricted model in any order. modelspec1
and modelspec2 cannot have names in common; for example, lrtest (A B C) (C D E) is not allowed
because both model specifications include C. If modelspec2 is not specified, the last estimation result
is used; this is equivalent to specifying modelspec2 as a period (.).
lrtest supports composite models specified by a parenthesized list of model names. In a composite
model, we assume that the log likelihood and dimension (number of free parameters) of the full model
are obtained as the sum of the log-likelihood values and dimensions of the constituting models.
lrtest provides an important alternative to test (see [R]test) for models fit via maximum
likelihood or equivalent methods.
Options
stats displays statistical information about the unrestricted and restricted models, including the
information indices of Akaike and Schwarz.
dir displays descriptive information about the unrestricted and restricted models; see estimates
dir in [R]estimates store.
df(#) is seldom specified; it overrides the automatic degrees-of-freedom calculation.
force forces the likelihood-ratio test calculations to take place in situations where lrtest would
normally refuse to do so and issue an error. Such situations arise when one or more assumptions
of the test are violated, for example, if the models were fit with vce(robust),vce(cluster
clustvar), or pweights; when the dependent variables in the two models differ; when the null log
likelihoods differ; when the samples differ; or when the estimation commands differ. If you use
the force option, there is no guarantee as to the validity or interpretability of the resulting test.
Remarks and examples
The standard way to use lrtest is to do the following:
1. Fit either the restricted model or the unrestricted model by using one of Stata’s estimation commands
and then store the results using estimates store name.
2. Fit the alternative model (the unrestricted or restricted model) and then type ‘lrtest name .’.
lrtest determines for itself which of the two models is the restricted model by comparing the
degrees of freedom.
Often you may want to store the alternative model with estimates store name2, for instance,
if you plan additional tests against models yet to be fit. The likelihood-ratio test is then obtained as
lrtest name name2.
Remarks are presented under the following headings:
Nested models
Composite models
Nested models
lrtest may be used with any estimation command that reports a log likelihood, including
heckman,logit,poisson,stcox, and streg. You must check that one of the model specifications
implies a statistical model that is nested within the model implied by the other specification. Usually,
this means that both models are fit with the same estimation command (for example, both are fit
by logit, with the same dependent variables) and that the set of covariates of one model is a
subset of the covariates of the other model. Second, lrtest is valid only for models that are fit by
maximum likelihood or by some equivalent method, so it does not apply to models that were fit with
probability weights or clusters. Specifying the vce(robust) option similarly would indicate that you
are worried about the valid specification of the model, so you would not use lrtest. Third, lrtest
assumes that under the null hypothesis, the test statistic is (approximately) distributed as chi-squared.
This assumption is not true for likelihood-ratio tests of “boundary conditions”, such as tests for the
presence of overdispersion or random effects (Gutierrez, Carter, and Drukker 2001).
Example 1
We have data on infants born with low birthweights along with the characteristics of the mother
(Hosmer, Lemeshow, and Sturdivant 2013; see also [R]logistic). We fit the following model:
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. logistic low age lwt i.race smoke ptl ht ui
Logistic regression Number of obs = 189
LR chi2(8) = 33.22
Prob > chi2 = 0.0001
Log likelihood = -100.724 Pseudo R2 = 0.1416
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9732636 .0354759 -0.74 0.457 .9061578 1.045339
lwt .9849634 .0068217 -2.19 0.029 .9716834 .9984249
race
black 3.534767 1.860737 2.40 0.016 1.259736 9.918406
other 2.368079 1.039949 1.96 0.050 1.001356 5.600207
smoke 2.517698 1.00916 2.30 0.021 1.147676 5.523162
ptl 1.719161 .5952579 1.56 0.118 .8721455 3.388787
ht 6.249602 4.322408 2.65 0.008 1.611152 24.24199
ui 2.1351 .9808153 1.65 0.099 .8677528 5.2534
_cons 1.586014 1.910496 0.38 0.702 .1496092 16.8134
We now wish to test the constraint that the coefficients on age,lwt,ptl, and ht are all zero or,
equivalently here, that the odds ratios are all 1. One solution is to type
. test age lwt ptl ht
( 1) [low]age = 0
( 2) [low]lwt = 0
( 3) [low]ptl = 0
( 4) [low]ht = 0
chi2( 4) = 12.38
Prob > chi2 = 0.0147
This test is based on the inverse of the information matrix and is therefore based on a quadratic
approximation to the likelihood function; see [R]test. A more precise test would be to refit the model,
applying the proposed constraints, and then calculate the likelihood-ratio test.
We first save the current model:
. estimates store full
We then fit the constrained model, which here is the model omitting age,lwt,ptl, and ht:
. logistic low i.race smoke ui
Logistic regression Number of obs = 189
LR chi2(4) = 18.80
Prob > chi2 = 0.0009
Log likelihood = -107.93404 Pseudo R2 = 0.0801
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
race
black 3.052746 1.498087 2.27 0.023 1.166747 7.987382
other 2.922593 1.189229 2.64 0.008 1.316457 6.488285
smoke 2.945742 1.101838 2.89 0.004 1.415167 6.131715
ui 2.419131 1.047359 2.04 0.041 1.035459 5.651788
_cons .1402209 .0512295 -5.38 0.000 .0685216 .2869447
That done, lrtest compares this model with the model we previously stored:
. lrtest full .
Likelihood-ratio test LR chi2(4) = 14.42
(Assumption: . nested in full) Prob > chi2 = 0.0061
Let's compare results. test reported that age, lwt, ptl, and ht were jointly significant at the 1.5%
level; lrtest reports that they are significant at the 0.6% level. Given the quadratic approximation
made by test, we could argue that lrtest's results are more accurate.
lrtest explicates the assumption that, from a comparison of the degrees of freedom, it has assessed
that the last fit model (.) is nested within the model stored as full. In other words, full is the
unconstrained model and . is the constrained model.
The names in “(Assumption: . nested in full)” are actually links. Click on a name, and the
results for that model are replayed.
Aside: The nestreg command provides a simple syntax for performing likelihood-ratio tests for
nested model specifications; see [R]nestreg. In the previous example, we fit a full logistic model,
used estimates store to store the full model, fit a constrained logistic model, and used lrtest
to report a likelihood-ratio test between two models. To do this with one call to nestreg, use the
lrtable option.
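A minimal sketch of that one-call approach, using a simplified variant of the example 1 comparison (i.race is omitted here just to keep the covariate blocks simple):
. use http://www.stata-press.com/data/r13/lbw, clear
. * block 1 is the restricted model; block 2 adds the covariates being tested
. nestreg, lrtable: logistic low (smoke ui) (age lwt ptl ht)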
Technical note
lrtest determines the degrees of freedom of a model as the rank of the (co)variance matrix
e(V). There are two issues here. First, the numerical determination of the rank of a matrix is a subtle
problem that can, for instance, be affected by the scaling of the variables in the model. The rank of a
matrix depends on the number of (independent) linear combinations of coefficients that sum exactly
to zero. In the world of numerical mathematics, it is hard to tell whether a very small number is
really nonzero or is a real zero that happens to be slightly off because of roundoff error from the finite
precision with which computers make floating-point calculations. Whether a small number is being
classified as one or the other, typically on the basis of a threshold, affects the determined degrees of
freedom. Although Stata generally makes sensible choices, it is bound to make mistakes occasionally.
The moral of this story is to make sure that the calculated degrees of freedom is as you expect before
interpreting the results.
Technical note
A second issue involves regress and related commands such as anova. Mainly for historical
reasons, regress does not treat the residual variance, σ^2, the same way that it treats the regression
coefficients. Type estat vce after regress, and you will see the regression coefficients, not the estimate of σ^2.
Most estimation commands for models with ancillary parameters (for example, streg and heckman)
treat all parameters as equals. There is nothing technically wrong with regress here; we are usually
focused on the regression coefficients, and their estimators are uncorrelated with the estimator of σ^2. But, formally,
σ^2 adds a degree of freedom to the model, which does not matter if you are comparing two regression
models by a likelihood-ratio test. This test depends on the difference in the degrees of freedom,
and hence being "off by 1" in each does not matter. But, if you are comparing a regression model
with a larger model (for example, a heteroskedastic regression model fit by arch), the automatic
determination of the degrees of freedom is incorrect, and you must specify the df(#) option.
Example 2
Returning to the low-birthweight data in example 1, we now wish to test that the coefficient
on 2.race (black) is equal to that on 3.race (other). The base model is still stored under the name
full, so we need only fit the constrained model and perform the test. With z as the index of the
logit model, the base model is

$$ z = \beta_0 + \beta_1\,\mathtt{age} + \beta_2\,\mathtt{lwt} + \beta_3\,\mathtt{2.race} + \beta_4\,\mathtt{3.race} + \cdots $$

If β_3 = β_4, this can be written as

$$ z = \beta_0 + \beta_1\,\mathtt{age} + \beta_2\,\mathtt{lwt} + \beta_3(\mathtt{2.race} + \mathtt{3.race}) + \cdots $$
We can fit the constrained model as follows:
. constraint 1 2.race = 3.race
. logistic low age lwt i.race smoke ptl ht ui, constraints(1)
Logistic regression Number of obs = 189
Wald chi2(7) = 25.17
Log likelihood = -100.9997 Prob > chi2 = 0.0007
( 1) [low]2.race - [low]3.race = 0
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9716799 .0352638 -0.79 0.429 .9049649 1.043313
lwt .9864971 .0064627 -2.08 0.038 .9739114 .9992453
race
black 2.728186 1.080207 2.53 0.011 1.255586 5.927907
other 2.728186 1.080207 2.53 0.011 1.255586 5.927907
smoke 2.664498 1.052379 2.48 0.013 1.228633 5.778414
ptl 1.709129 .5924776 1.55 0.122 .8663666 3.371691
ht 6.116391 4.215585 2.63 0.009 1.58425 23.61385
ui 2.09936 .9699702 1.61 0.108 .8487997 5.192407
_cons 1.309371 1.527398 0.23 0.817 .1330839 12.8825
Comparing this model with our original model, we obtain
. lrtest full .
Likelihood-ratio test LR chi2(1) = 0.55
(Assumption: . nested in full) Prob > chi2 = 0.4577
By comparison, typing test 2.race=3.race after fitting our base model results in a significance
level of 0.4572. Alternatively, we can first store the restricted model, here using the name equal.
Next lrtest is invoked specifying the names of the restricted and unrestricted models (we do not
care about the order). This time, we also add the option stats requesting a table of model statistics,
including the model selection indices AIC and BIC.
. estimates store equal
. lrtest equal full, stats
Likelihood-ratio test LR chi2(1) = 0.55
(Assumption: equal nested in full) Prob > chi2 = 0.4577
Model Obs ll(null) ll(model) df AIC BIC
equal 189 . -100.9997 8 217.9994 243.9334
full 189 -117.336 -100.724 9 219.448 248.6237
Note: N=Obs used in calculating BIC; see [R] BIC note
Composite models
lrtest supports composite models; that is, models that can be fit by fitting a series of simpler
models or by fitting models on subsets of the data. Theoretically, a composite model is one in which
the likelihood function, L(θ), of the parameter vector, θ, can be written as the product

$$ L(\theta) = L_1(\theta_1) \times L_2(\theta_2) \times \cdots \times L_k(\theta_k) $$

of likelihood terms with θ = (θ_1, ..., θ_k) a partitioning of the full parameter vector. In such a
case, the full-model likelihood L(θ) is maximized by maximizing the likelihood terms L_j(θ_j) in
turn. Obviously, $\log L(\widehat{\theta}) = \sum_{j=1}^{k} \log L_j(\widehat{\theta}_j)$. The degrees of freedom for the composite model is
obtained as the sum of the degrees of freedom of the constituting models.
Example 3
As an example of the application of composite models, we consider a test of the hypothesis that the
coefficients of a statistical model do not differ between different portions (“regimes”) of the covariate
space. Economists call a test for such a hypothesis a Chow test.
We continue the analysis of the data on children of low birthweight by using logistic regression
modeling and study whether the regression coefficients are the same among the three races: white,
black, and other. A likelihood-ratio Chow test can be obtained by fitting the logistic regression model
for each of the races and then comparing the combined results with those of the model previously
stored as full. Because the full model included dummies for the three races, this version of the
Chow test allows the intercept of the logistic regression model to vary between the regimes (races).
. logistic low age lwt smoke ptl ht ui if 1.race, nolog
Logistic regression Number of obs = 96
LR chi2(6) = 13.86
Prob > chi2 = 0.0312
Log likelihood = -45.927061 Pseudo R2 = 0.1311
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9869674 .0527757 -0.25 0.806 .8887649 1.096021
lwt .9900874 .0106101 -0.93 0.353 .9695089 1.011103
smoke 4.208697 2.680133 2.26 0.024 1.20808 14.66222
ptl 1.592145 .7474264 0.99 0.322 .6344379 3.995544
ht 2.900166 3.193537 0.97 0.334 .3350554 25.1032
ui 1.229523 .9474768 0.27 0.789 .2715165 5.567715
_cons .4891008 .993785 -0.35 0.725 .0091175 26.23746
. estimates store white
. logistic low age lwt smoke ptl ht ui if 2.race, nolog
Logistic regression Number of obs = 26
LR chi2(6) = 10.12
Prob > chi2 = 0.1198
Log likelihood = -12.654157 Pseudo R2 = 0.2856
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .8735313 .1377846 -0.86 0.391 .6412332 1.189983
lwt .9747736 .016689 -1.49 0.136 .9426065 1.008038
smoke 16.50373 24.37044 1.90 0.058 .9133647 298.2083
ptl 4.866916 9.33151 0.83 0.409 .1135573 208.5895
ht 85.05605 214.6382 1.76 0.078 .6049308 11959.27
ui 67.61338 133.3313 2.14 0.033 1.417399 3225.322
_cons 48.7249 169.9216 1.11 0.265 .0523961 45310.94
. estimates store black
. logistic low age lwt smoke ptl ht ui if 3.race, nolog
Logistic regression Number of obs = 67
LR chi2(6) = 14.06
Prob > chi2 = 0.0289
Log likelihood = -37.228444 Pseudo R2 = 0.1589
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
age .9263905 .0665386 -1.06 0.287 .8047407 1.06643
lwt .9724499 .015762 -1.72 0.085 .9420424 1.003839
smoke .7979034 .6340585 -0.28 0.776 .1680885 3.787586
ptl 2.845675 1.777944 1.67 0.094 .8363053 9.682908
ht 7.767503 10.00537 1.59 0.112 .6220764 96.98826
ui 2.925006 2.046473 1.53 0.125 .7423107 11.52571
_cons 49.09444 113.9165 1.68 0.093 .5199275 4635.769
. estimates store other
We are now ready to perform the likelihood-ratio Chow test:
. lrtest (full) (white black other), stats
Likelihood-ratio test LR chi2(12) = 9.83
Prob > chi2 = 0.6310
Assumption: (full) nested in (white, black, other)
Model Obs ll(null) ll(model) df AIC BIC
full 189 -117.336 -100.724 9 219.448 248.6237
white 96 -52.85752 -45.92706 7 105.8541 123.8046
black 26 -17.71291 -12.65416 7 39.30831 48.11499
other 67 -44.26039 -37.22844 7 88.45689 103.8897
Note: N=Obs used in calculating BIC; see [R] BIC note
We cannot reject the hypothesis that the logistic regression model applies to each of the races at any
reasonable significance level. By specifying the stats option, we can verify the degrees of freedom
of the test: 12 = 7 + 7 + 7 - 9. We can obtain the same test by fitting an expanded model with
interactions between all covariates and race.
. logistic low race##c.(age lwt smoke ptl ht ui)
Logistic regression Number of obs = 189
LR chi2(20) = 43.05
Prob > chi2 = 0.0020
Log likelihood = -95.809661 Pseudo R2 = 0.1835
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
race
black 99.62137 402.0829 1.14 0.254 .0365434 271578.9
other 100.3769 309.586 1.49 0.135 .2378638 42358.38
age .9869674 .0527757 -0.25 0.806 .8887649 1.096021
lwt .9900874 .0106101 -0.93 0.353 .9695089 1.011103
smoke 4.208697 2.680133 2.26 0.024 1.20808 14.66222
ptl 1.592145 .7474264 0.99 0.322 .6344379 3.995544
ht 2.900166 3.193537 0.97 0.334 .3350554 25.1032
ui 1.229523 .9474768 0.27 0.789 .2715165 5.567715
race#c.age
black .885066 .1474079 -0.73 0.464 .638569 1.226714
other .9386232 .0840486 -0.71 0.479 .7875366 1.118695
race#c.lwt
black .9845329 .0198857 -0.77 0.440 .9463191 1.02429
other .9821859 .0190847 -0.93 0.355 .9454839 1.020313
race#c.smoke
black 3.921338 6.305992 0.85 0.395 .167725 91.67917
other .1895844 .1930601 -1.63 0.102 .025763 1.395113
race#c.ptl
black 3.05683 6.034089 0.57 0.571 .0638301 146.3918
other 1.787322 1.396789 0.74 0.457 .3863582 8.268285
race#c.ht
black 29.328 80.7482 1.23 0.220 .1329492 6469.623
other 2.678295 4.538712 0.58 0.561 .0966916 74.18702
race#c.ui
black 54.99155 116.4274 1.89 0.058 .8672471 3486.977
other 2.378976 2.476124 0.83 0.405 .309335 18.29579
_cons .4891008 .993785 -0.35 0.725 .0091175 26.23746
. lrtest full .
Likelihood-ratio test LR chi2(12) = 9.83
(Assumption: full nested in .) Prob > chi2 = 0.6310
Applying lrtest for the full model against the model with all interactions yields the same test
statistic and p-value as for the full model against the composite model for the three regimes. Here
the specification of the model with interactions was convenient, and logistic had no problem
computing the estimates for the expanded model. In models with more complicated likelihoods, such
as Heckman’s selection model (see [R]heckman) or complicated survival-time models (see [ST]streg),
fitting the models with all interactions may be numerically demanding and may be much more time
consuming than fitting a series of models separately for each regime.
Given the model with all interactions, we could also test the hypothesis of no differences among
the regions (races) by a Wald version of the Chow test by using the testparm command; see [R]test.
. testparm race#c.(age lwt smoke ptl ht ui)
( 1) [low]2.race#c.age = 0
( 2) [low]3.race#c.age = 0
( 3) [low]2.race#c.lwt = 0
( 4) [low]3.race#c.lwt = 0
( 5) [low]2.race#c.smoke = 0
( 6) [low]3.race#c.smoke = 0
( 7) [low]2.race#c.ptl = 0
( 8) [low]3.race#c.ptl = 0
( 9) [low]2.race#c.ht = 0
(10) [low]3.race#c.ht = 0
(11) [low]2.race#c.ui = 0
(12) [low]3.race#c.ui = 0
chi2( 12) = 8.24
Prob > chi2 = 0.7663
We conclude that, here, the Wald version of the Chow test is similar to the likelihood-ratio version
of the Chow test.
Stored results
lrtest stores the following in r():
Scalars
r(p) level of significance
r(df) degrees of freedom
r(chi2) LR test statistic
Programmers wishing their estimation commands to be compatible with lrtest should note that
lrtest requires that the following results be returned:
e(cmd) name of estimation command
e(ll) log likelihood
e(V) variance–covariance matrix of the estimators
e(N) number of observations
lrtest also verifies that e(N),e(ll 0), and e(depvar) are consistent between two noncomposite
models.
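As a sketch of using the r() results listed above, continuing example 1 (output omitted):
. lrtest full .
. return list
. display "LR chi2(" r(df) ") = " %6.2f r(chi2) ",  p = " %6.4f r(p)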
Methods and formulas
Let L_0 and L_1 be the log-likelihood values associated with the full and constrained models,
respectively. The test statistic of the likelihood-ratio test is LR = -2(L_1 - L_0). If the constrained
model is true, LR is approximately χ^2 distributed with d_0 - d_1 degrees of freedom, where d_0 and
d_1 are the model degrees of freedom associated with the full and constrained models, respectively
(Greene 2012, 526–527).
lrtest determines the degrees of freedom of a model as the rank of e(V), computed as the
number of nonzero diagonal elements of invsym(e(V)).
References
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Gutierrez, R. G., S. L. Carter, and D. M. Drukker. 2001. sg160: On boundary-value likelihood-ratio tests.Stata
Technical Bulletin 60: 15–18. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 269–273. College Station,
TX: Stata Press.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
Pérez-Hoyos, S., and A. Tobías. 1999. sg111: A modified likelihood-ratio test command. Stata Technical Bulletin 49:
24–25. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 171–173. College Station, TX: Stata Press.
Wang, Z. 2000. sg133: Sequential and drop one term likelihood-ratio tests.Stata Technical Bulletin 54: 46–47.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 332–334. College Station, TX: Stata Press.
Also see
[R]test Test linear hypotheses after estimation
[R]testnl Test nonlinear hypotheses after estimation
[R]nestreg Nested model statistics
Title
lsens — Graph sensitivity and specificity versus probability cutoff
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Reference
Also see
Syntax
lsens [depvar] [if] [in] [weight] [, options]
options Description
Main
all graph all observations in the data
genprob(varname)create variable containing probability cutoffs
gensens(varname)create variable containing sensitivity
genspec(varname)create variable containing specificity
replace overwrite existing variables
nograph suppress the graph
Advanced
beta(matname)row vector containing model coefficients
Plot
connect options affect rendition of the plotted points connected by lines
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
fweights are allowed; see [U] 11.1.6 weight.
lsens is not appropriate after the svy prefix.
Menu
Statistics >Binary outcomes >Postestimation >Sensitivity/specificity plot
Description
lsens graphs sensitivity and specificity versus probability cutoff and optionally creates new
variables containing these data.
lsens requires that the current estimation results be from logistic, logit, probit, or ivprobit;
see [R] logistic, [R] logit, [R] probit, or [R] ivprobit.
Options
 
Main
all requests that the statistic be computed for all observations in the data, ignoring any if or in
restrictions specified by the estimation command.
genprob(varname), gensens(varname), and genspec(varname) specify the names of new variables
created to contain, respectively, the probability cutoffs and the corresponding sensitivity and
specificity.
replace requests that existing variables specified for genprob(), gensens(), or genspec() be
overwritten.
nograph suppresses graphical output.
 
Advanced
beta(matname) specifies a row vector containing model coefficients. The columns of the row vector
must be labeled with the corresponding names of the independent variables in the data. The
dependent variable depvar must be specified immediately after the command name. See Models
other than the last fitted model later in this entry.
 
Plot
connect options affect the rendition of the plotted points connected by lines; see connect options in
[G-2] graph twoway scatter.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Remarks and examples
Remarks are presented under the following headings:
Introduction
Models other than the last fitted model
Introduction
lsens plots sensitivity and specificity; it plots both sensitivity and specificity versus probability
cutoff c. The graph is equivalent to what you would get from estat classification (see [R] estat
classification) if you varied the cutoff probability c from 0 to 1.
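As a sketch of that equivalence (the model is the one used in example 1 below, and the single cutoff value 0.25 is arbitrary):
. use http://www.stata-press.com/data/r13/lbw, clear
. quietly logistic low age i.race smoke ui
. * sensitivity and specificity at one chosen cutoff ...
. estat classification, cutoff(0.25)
. * ... and the same two quantities traced over all cutoffs
. lsens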
Example 1
We illustrate lsens after logistic; see [R]logistic.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. logistic low age i.race smoke ui
(output omitted )
. lsens
(figure: Sensitivity and Specificity versus Probability cutoff)
lsens optionally creates new variables containing the probability cutoff, sensitivity, and specificity.
. lsens, genprob(p) gensens(sens) genspec(spec) nograph
The variables created will have M + 2 distinct nonmissing values: one for each of the M covariate
patterns, one for c = 0, and another for c = 1. Values are recorded for p = 0, for each of the
observed predicted probabilities, and for p = 1. The total number of observations required to do this
can be fewer than N, the same as N, N + 1, or N + 2. If more observations are added, they
are added at the end of the dataset and the values of the original variables are set to missing in the
added observations. How the values added align with existing observations is irrelevant.
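A sketch verifying this count for the model of example 1 (the variable names are arbitrary; output omitted):
. use http://www.stata-press.com/data/r13/lbw, clear
. quietly logistic low age i.race smoke ui
. lsens, genprob(p) gensens(sens) genspec(spec) nograph
. quietly tabulate p
. display "distinct cutoff values stored in p: " r(r)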
Technical note
logistic,logit,probit, or ivprobit and lsens keep track of the estimation sample. If you
type, for instance, logistic . . . if x==1, then when you type lsens, the statistics will be calculated
on the x==1 subsample of the data automatically.
You should specify if or in with lsens only when you wish to produce graphs and calculate
statistics for a set of observations other than the estimation sample.
If the logistic model was fit with fweights, lsens properly accounts for the weights in its
calculations. You do not have to specify the weights when you run lsens. Weights should be specified
with lsens only when you wish to use a different set of weights.
Models other than the last fitted model
By default, lsens uses the last model fit. You may also directly specify the model to lsens by
inputting a vector of coefficients with the beta() option and passing the name of the dependent
variable depvar to lsens.
Example 2
Suppose that someone publishes the following logistic model of low birthweight:
Pr(low = 1) = F(-0.02 age - 0.01 lwt + 1.3 black + 1.1 smoke + 0.5 ptl + 1.8 ht + 0.8 ui + 0.5)
where F is the cumulative logistic distribution. These coefficients are not odds ratios; they are the
equivalent of what logit produces.
We can see whether this model fits our data. First we enter the coefficients as a row vector and
label its columns with the names of the independent variables plus _cons for the constant (see
[P] matrix define and [P] matrix rownames).
. use http://www.stata-press.com/data/r13/lbw3, clear
(Hosmer & Lemeshow data)
. matrix input b = (-0.02, -.01, 1.3, 1.1, .5, 1.8, .8, .5)
. matrix colnames b = age lwt black smoke ptl ht ui _cons
We can use lroc (see [R]lroc) to examine the predictive ability of the model:
. lroc low, beta(b) nograph
Logistic model for low
number of observations = 189
area under ROC curve = 0.7275
The area under the curve indicates that this model does have some predictive power. We can obtain
a graph of sensitivity and specificity as a function of the cutoff probability by typing
. lsens low, beta(b)
(figure: Sensitivity and Specificity versus Probability cutoff)
Stored results
lsens stores the following in r():
Scalars
r(N) number of observations
Methods and formulas
Let j index observations and c be the cutoff probability. Let p_j be the predicted probability of a
positive outcome and y_j be the actual outcome, which we will treat as 0 or 1, although Stata treats
it as 0 and non-0, excluding missing observations.
A prediction is classified as positive if p_j ≥ c and otherwise is classified as negative. The
classification is correct if it is positive and y_j = 1 or if it is negative and y_j = 0.
Sensitivity is the fraction of y_j = 1 observations that are correctly classified. Specificity is the
percentage of y_j = 0 observations that are correctly classified.
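A minimal sketch of these definitions, computed by hand for the model of example 1 at the arbitrary cutoff c = 0.5:
. use http://www.stata-press.com/data/r13/lbw, clear
. quietly logistic low age i.race smoke ui
. predict double p, pr
. * classify as positive at the cutoff c = 0.5
. generate byte pos = p >= 0.5 if p < .
. * sensitivity: correctly classified among y = 1
. quietly count if pos == 1 & low == 1
. scalar tp = r(N)
. quietly count if low == 1
. display "sensitivity = " %6.4f tp/r(N)
. * specificity: correctly classified among y = 0
. quietly count if pos == 0 & low == 0
. scalar tn = r(N)
. quietly count if low == 0
. display "specificity = " %6.4f tn/r(N)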
Reference
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Also see
[R] logistic Logistic regression, reporting odds ratios
[R] logit Logistic regression, reporting coefficients
[R] probit Probit regression
[R] ivprobit Probit model with continuous endogenous regressors
[R] lroc Compute area under ROC curve and graph the curve
[R] estat classification Classification statistics and table
[R] estat gof Pearson or Hosmer–Lemeshow goodness-of-fit test
[R] roc Receiver operating characteristic (ROC) analysis
[U] 20 Estimation and postestimation commands
Title
lv — Letter-value displays
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
lv [varlist] [if] [in] [, generate tail(#)]
by is allowed; see [D] by.
Menu
Statistics > Summaries, tables, and tests > Distributional plots and tests > Letter-value display
Description
lv shows a letter-value display (Tukey 1977, 44–49; Hoaglin 1983) for each variable in varlist.
If no variables are specified, letter-value displays are shown for each numeric variable in the data.
Options
 
Main
generate adds four new variables to the data: _mid, containing the midsummaries; _spread,
containing the spreads; _psigma, containing the pseudosigmas; and _z2, containing the squared
values from a standard normal distribution corresponding to the particular letter value. If the
variables _mid, _spread, _psigma, and _z2 already exist, their contents are replaced. At most,
only the first 11 observations of each variable are used; the remaining observations contain missing.
If varlist specifies more than one variable, the newly created variables contain results for the last
variable specified. The generate option may not be used with the by prefix.
tail(#) indicates the inverse of the tail density through which letter values are to be displayed: 2
corresponds to the median (meaning half in each tail), 4 to the fourths (roughly the 25th and 75th
percentiles), 8 to the eighths, and so on. # may be specified as 4, 8, 16, 32, 64, 128, 256, 512, or
1,024 and defaults to a value of # that has corresponding depth just greater than 1. The default
is taken as 1,024 if the calculation results in a number larger than 1,024. Given the intelligent
default, this option is rarely specified.
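For example, with the automobile data used in example 1 below in memory, one might type
. lv mpg, tail(4)
to display the letter values only through the fourths, that is, the median and the fourths.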
Remarks and examples
Letter-value displays are a collection of observations drawn systematically from the data, focusing
especially on the tails rather than the middle of the distribution. The displays are called letter-value
displays because letters have been (almost arbitrarily) assigned to tail densities:
Letter Tail area Letter Tail area
M 1/2 B 1/64
F 1/4 A 1/128
E 1/8 Z 1/256
D 1/16 Y 1/512
C 1/32 X 1/1024
Example 1
We have data on the mileage ratings of 74 automobiles. To obtain a letter-value display, we type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. lv mpg
# 74 Mileage (mpg)
M 37.5 20 spread pseudosigma
F 19 18 21.5 25 7 5.216359
E 10 15 21.5 28 13 5.771728
D 5.5 14 22.25 30.5 16.5 5.576303
C 3 14 24.5 35 21 5.831039
B 2 12 23.5 35 23 5.732448
A 1.5 12 25 38 26 6.040635
1 12 26.5 41 29 6.16562
# below # above
inner fence 7.5 35.5 0 1
outer fence -3 46 0 0
The decimal points can be made to line up and thus the output made more readable by specifying
a display format for the variable; see [U] 12.5 Formats: Controlling how data are displayed.
. format mpg %9.2f
. lv mpg
# 74 Mileage (mpg)
M 37.5 20.00 spread pseudosigma
F 19 18.00 21.50 25.00 7.00 5.22
E 10 15.00 21.50 28.00 13.00 5.77
D 5.5 14.00 22.25 30.50 16.50 5.58
C 3 14.00 24.50 35.00 21.00 5.83
B 2 12.00 23.50 35.00 23.00 5.73
A 1.5 12.00 25.00 38.00 26.00 6.04
1 12.00 26.50 41.00 29.00 6.17
# below # above
inner fence 7.50 35.50 0 1
outer fence -3.00 46.00 0 0
At the top, the number of observations is indicated as 74. The first line shows the statistics associated
with M, the letter value that puts half the density in each tail, or the median. The median has depth
37.5 (that is, in the ordered data, M is 37.5 observations in from the extremes) and has value 20. The
next line shows the statistics associated with F or the fourths. The fourths have depth 19 (that is, in
the ordered data, the lower fourth is observation 19, and the upper fourth is observation 74 − 19 + 1),
and the values of the lower and upper fourths are 18 and 25. The number in the middle is the point
halfway between the fourths, called a midsummary. If the distribution were perfectly symmetric,
the midsummary would equal the median. The spread is the difference between the lower and upper
summaries (25 − 18 = 7). For fourths, half the data lie within a 7-mpg band. The pseudosigma is a
calculation of the standard deviation using only the lower and upper summaries and assuming that
the variable is normally distributed. If the data really were normally distributed, all the pseudosigmas
would be roughly equal.
After the letter values, the line labeled with depth 1 reports the minimum and maximum values.
Here the halfway point between the extremes is 26.5, which is greater than the median, indicating
that 41 is more extreme than 12, at least relative to the median. And with each letter value, the
midsummaries are increasing; our data are skewed. The pseudosigmas are also increasing, indicating
that the data are spreading out relative to a normal distribution, although, given the evident skewness,
this elongation may be an artifact of the skewness.
At the end is an attempt to identify outliers, although the points so identified are merely outside
some predetermined cutoff. Points outside the inner fence are called outside values or mild outliers.
Points outside the outer fence are called severe outliers. The inner fence is defined as (3/2)IQR and
the outer fence as 3IQR above and below the F summaries, where the interquartile range (IQR) is the
spread of the fourths.
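For the mileage data in example 1, where the fourths are 18 and 25 so that the IQR is 7, these
definitions reproduce the reported fences:
. display "inner fence: " 18 - 1.5*7 "  " 25 + 1.5*7
inner fence: 7.5  35.5
. display "outer fence: " 18 - 3*7 "  " 25 + 3*7
outer fence: -3  46
which matches the 7.5, 35.5, −3, and 46 shown in the letter-value display.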
Technical note
The form of the letter-value display has varied slightly with different authors. lv displays appear
as described by Hoaglin (1983) but as modified by Emerson and Stoto (1983), where they included the
midpoint of each of the spreads. This format was later adopted by Hoaglin (1985). If the distribution
is symmetric, the midpoints will all be roughly equal. On the other hand, if the midpoints vary
systematically, the distribution is skewed.
The pseudosigmas are obtained from the lower and upper summaries for each letter value. For
each letter value, they are the standard deviation a normal distribution would have if its spread for
the given letter value were to equal the observed spread. If the pseudosigmas are all roughly equal,
the data are said to have neutral elongation. If the pseudosigmas increase systematically, the data are
said to be more elongated than a normal, that is, have thicker tails. If the pseudosigmas decrease
systematically, the data are said to be less elongated than a normal, that is, have thinner tails.
Interpretation of the number of mild and severe outliers is more problematic. The following
discussion is drawn from Hamilton (1991):
Obviously, the presence of any such outliers does not rule out that the data have been drawn from
a normal distribution; in large datasets, there will most certainly be observations outside (3/2)IQR and
3IQR. Severe outliers, however, make up about two per million (0.0002%) of a normal population. In
samples, they lie far enough out to have substantial effects on means, standard deviations, and other
classical statistics. The 0.0002%, however, should be interpreted carefully; outliers appear more often
in small samples than one might expect from population proportions because of sampling variation in
estimated quartiles. Monte Carlo simulation by Hoaglin, Iglewicz, and Tukey (1986) obtained these
results on the percentages and numbers of outliers in random samples from a normal population:
                 percentage                      number
   n      any outliers    severe       any outliers    severe
   10         2.83         .362            .283         .0362
   20         1.66         .074            .332         .0148
   50         1.15         .011            .575         .0055
  100          .95         .002            .95          .002
  200          .79         .001           1.58          .002
  300          .75         .001           2.25          .003
   ∞           .70         .0002            ∞             ∞
Thus the presence of any severe outliers in samples of fewer than 300 observations is sufficient to reject normality.
Hoaglin, Iglewicz, and Tukey (1981) suggested the approximation 0.00698 + 0.4/n for the fraction
of mild outliers in a sample of size n or, equivalently, 0.00698n + 0.4 for the number of outliers.
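For the 74-observation automobile data of example 1, that approximation predicts
. display 0.00698*74 + .4
.91652
mild outliers, which is quite consistent with the single outside value flagged there.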
Example 2
The generate option adds the _mid, _spread, _psigma, and _z2 variables to our data, making
possible many of the diagnostic graphs suggested by Hoaglin (1985).
. lv mpg, generate
(output omitted )
. list _mid _spread _psigma _z2 in 1/12
_mid _spread _psigma _z2
1. 20 . . .
2. 21.5 7 5.216359 .4501955
3. 21.5 13 5.771728 1.26828
4. 22.25 16.5 5.576303 2.188846
5. 24.5 21 5.831039 3.24255
6. 23.5 23 5.732448 4.024532
7. 25 26 6.040635 4.631499
8. . . . .
9. . . . .
10. . . . .
11. 26.5 29 6.16562 5.53073
12. . . . .
Observations 12 through the end are missing for these new variables. The definition of the observations
is always the same. The first observation contains the M summary; the second, the F; the third, the E; and
so on. Observation 11 always contains the summary for depth 1. Observations 8–10, corresponding
to letter values Z, Y, and X, contain missing because these statistics were not calculated. We have
only 74 observations, and their depth would be 1.
Hoaglin (1985) suggests graphing the midsummary against _z2. If the distribution is not skewed,
the points in the resulting graph will be along a horizontal line:
. scatter _mid _z2
(graph omitted: mpg midsummary plotted against Z squared)
The graph clearly indicates the skewness of the distribution. We might also graph _psigma against
_z2 to examine elongation.
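That plot is obtained in the same way as the previous one, assuming lv mpg, generate has already
been run:
. scatter _psigma _z2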
Stored results
lv stores the following in r():
Scalars
r(N)        number of observations
r(min)      minimum
r(max)      maximum
r(median)   median
r(l_F)      lower 4th           r(u_F)      upper 4th
r(l_E)      lower 8th           r(u_E)      upper 8th
r(l_D)      lower 16th          r(u_D)      upper 16th
r(l_C)      lower 32nd          r(u_C)      upper 32nd
r(l_B)      lower 64th          r(u_B)      upper 64th
r(l_A)      lower 128th         r(u_A)      upper 128th
r(l_Z)      lower 256th         r(u_Z)      upper 256th
r(l_Y)      lower 512th         r(u_Y)      upper 512th
r(l_X)      lower 1024th        r(u_X)      upper 1024th
The lower/upper 8ths, 16ths, . . . , 1024ths will be defined only if there are sufficient data.
Methods and formulas
Let N be the number of (nonmissing) observations on x, and let x_(i) refer to the ordered data
when i is an integer. Define x_(i+0.5) = (x_(i) + x_(i+1))/2; the median is defined as x_((N+1)/2).
Define x_[d] as the pair of numbers x_(d) and x_(N+1−d), where d is called the depth. Thus x_[1]
refers to the minimum and maximum of the data. Define m = (N+1)/2 as the depth of the median,
f = (⌊m⌋+1)/2 as the depth of the fourths, e = (⌊f⌋+1)/2 as the depth of the eighths, and so
on. Depths are reported on the far left of the letter-value display. The corresponding fourths of the
data are x_[f], the eighths are x_[e], and so on. These values are reported inside the display. The middle
value is defined as the corresponding midpoint of x_[·]. The spreads are defined as the difference in
x_[·].
The corresponding point z_i on a standard normal distribution is obtained as (Hoaglin 1985,
456–457)

    z_i = F^−1{(d_i − 1/3)/(N + 1/3)}    if d_i > 1
    z_i = F^−1{0.695/(N + 0.390)}        otherwise

where d_i is the depth of the letter value and F^−1 is the inverse cumulative standard normal
distribution. The corresponding pseudosigma is obtained as the ratio of the spread to −2z_i
(Hoaglin 1985, 431).
Define (F_l, F_u) = x_[f]. The inner fence has cutoffs F_l − (3/2)(F_u − F_l) and F_u + (3/2)(F_u − F_l). The
outer fence has cutoffs F_l − 3(F_u − F_l) and F_u + 3(F_u − F_l).
The inner-fence values reported by lv are almost equal to those used by graph, box to identify
outside points. The only difference is that graph uses a slightly different definition of fourths, namely,
the 25th and 75th percentiles as defined by summarize; see [R] summarize.
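As an informal check of these formulas, the pseudosigma of the fourths reported in example 1 can be
reproduced by hand, using the depth f = 19 for N = 74 and the fourths 18 and 25 shown there:
. use http://www.stata-press.com/data/r13/auto, clear
. sort mpg
. display "pseudosigma(F) = " (mpg[74-19+1] - mpg[19]) / (-2*invnormal((19 - 1/3)/(74 + 1/3)))
which reproduces the 5.216359 reported for F in example 1.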
References
Emerson, J. D., and M. A. Stoto. 1983. Transforming data. In Understanding Robust and Exploratory Data Analysis,
ed. D. C. Hoaglin, C. F. Mosteller, and J. W. Tukey, 97–128. New York: Wiley.
Fox, J. 1990. Describing univariate distributions. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long,
58–125. Newbury Park, CA: Sage.
Hamilton, L. C. 1991. sed4: Resistant normality check and outlier identification. Stata Technical Bulletin 3: 15–18.
Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 86–90. College Station, TX: Stata Press.
Hoaglin, D. C. 1983. Letter values: A set of selected order statistics. In Understanding Robust and Exploratory Data
Analysis, ed. D. C. Hoaglin, C. F. Mosteller, and J. W. Tukey, 33–57. New York: Wiley.
. 1985. Using quantiles to study shape. In Exploring Data Tables, Trends, and Shapes, ed. D. C. Hoaglin, C. F.
Mosteller, and J. W. Tukey, 417–460. New York: Wiley.
Hoaglin, D. C., B. Iglewicz, and J. W. Tukey. 1981. Small-sample performance of a resistant rule for outlier detection.
In 1980 Proceedings of the Statistical Computing Section. Washington, DC: American Statistical Association.
. 1986. Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association
81: 991–999.
Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison–Wesley.
Also see
[R] diagnostic plots Distributional diagnostic plots
[R] stem Stem-and-leaf displays
[R] summarize Summary statistics
Title
margins — Marginal means, predictive margins, and marginal effects
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
margins [marginlist] [if] [in] [weight] [, response_options options]
where marginlist is a list of factor variables or interactions that appear in the current estimation results.
The variables may be typed with or without the i. prefix, and you may use any factor-variable syntax:
. margins i.sex i.group i.sex#i.group
. margins sex group sex#i.group
. margins sex##group
response_options          Description
Main
predict(pred_opt)         estimate margins for predict, pred_opt
expression(pnl_exp)       estimate margins for pnl_exp
dydx(varlist)             estimate marginal effect of variables in varlist
eyex(varlist)             estimate elasticities of variables in varlist
dyex(varlist)             estimate semielasticity d(y)/d(lnx)
eydx(varlist)             estimate semielasticity d(lny)/d(x)
continuous                treat factor-level indicators as continuous
options                   Description
Main
grand                     add the overall margin; default if no marginlist
At
at(atspec)                estimate margins at specified values of covariates
atmeans                   estimate margins at the means of covariates
asbalanced                treat all factor variables as balanced
if/in/over
over(varlist)             estimate margins at unique values of varlist
subpop(subspec)           estimate margins for subpopulation
Within
within(varlist)           estimate margins at unique values of the nesting factors in varlist
Contrast
contrast_options          any options documented in [R] margins, contrast
Pairwise comparisons
pwcompare_options         any options documented in [R] margins, pwcompare
SE
vce(delta) estimate SEs using delta method; the default
vce(unconditional) estimate SEs allowing for sampling of covariates
nose do not estimate SEs
Advanced
noweights ignore weights specified in estimation
noesample do not restrict margins to the estimation sample
emptycells(empspec)       treatment of empty cells for balanced factors
estimtolerance(tol)       specify numerical tolerance used to determine estimable functions;
                            default is estimtolerance(1e-5)
noestimcheck suppress estimability checks
force estimate margins despite potential problems
chainrule use the chain rule when computing derivatives
nochainrule do not use the chain rule
Reporting
level(#)                  set confidence level; default is level(95)
mcompare(method)          adjust for multiple comparisons; default is mcompare(noadjust)
noatlegend                suppress legend of fixed covariate values
post                      post margins and their VCE as estimation results
display_options           control columns and column formats, row spacing, line width, and
                            factor-variable labeling
df(#)                     use t distribution with # degrees of freedom for computing p-values
                            and confidence intervals
method                    Description
noadjust                  do not adjust for multiple comparisons; the default
bonferroni [adjustall]    Bonferroni's method; adjust across all terms
sidak [adjustall]         Šidák's method; adjust across all terms
scheffe                   Scheffé's method
Time-series operators are allowed if they were used in the estimation.
See at() under Options for a description of atspec.
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
df(#) does not appear in the dialog box.
Menu
Statistics > Postestimation > Marginal means and predictive margins
Statistics > Postestimation > Marginal effects
Description
Margins are statistics calculated from predictions of a previously fit model at fixed values of some
covariates, averaging or otherwise integrating over the remaining covariates.
The margins command estimates margins of responses for specified values of covariates and
presents the results as a table.
Capabilities include estimated marginal means, least-squares means, average and conditional
marginal and partial effects (which may be reported as derivatives or as elasticities), average and
conditional adjusted predictions, and predictive margins.
Options
Warning: The option descriptions are brief and use jargon. Skip to Remarks and examples if you are
reading about margins for the first time.
 
Main
predict(pred_opt) and expression(pnl_exp) are mutually exclusive; they specify the response. If
neither is specified, the response will be the default prediction that would be produced by predict
after the underlying estimation command.
predict(pred_opt) specifies the option(s) to be specified with the predict command to produce
the variable that will be used as the response. After estimation by logistic, you could specify
predict(xb) to obtain linear predictions rather than the predict command's default, the
probabilities.
expression(pnl_exp) specifies the response as an expression. See [R] predictnl for a
full description of pnl_exp. After estimation by logistic, you might specify
expression(exp(predict(xb))) to use relative odds rather than probabilities as the response.
For examples, see Example 12: Margins of a specified expression.
dydx(varlist), eyex(varlist), dyex(varlist), and eydx(varlist) request that margins report
derivatives of the response with respect to varlist rather than the response itself. eyex(), dyex(),
and eydx() report derivatives as elasticities; see Expressing derivatives as elasticities.
continuous is relevant only when one of dydx() or eydx() is also specified. It specifies that the
levels of factor variables be treated as continuous; see Derivatives versus discrete differences. This
option is implied if there is a single-level factor variable specified in dydx() or eydx().
grand specifies that the overall margin be reported. grand is assumed when marginlist is empty.
 
At
at(atspec) specifies values for covariates to be treated as fixed.
at(age=20) fixes covariate age to the value specified. at() may be used to fix continuous or
factor covariates.
at(age=20 sex=1) simultaneously fixes covariates age and sex at the values specified.
at(age=(20 30 40 50)) fixes age first at 20, then at 30, . . . . margins produces separate results
for each specified value.
at(age=(20(10)50)) does the same as at(age=(20 30 40 50)); that is, you may specify a
numlist.
at((mean) age (median) distance) fixes the covariates at the summary statistics specified.
at((p25) _all) fixes all covariates at their 25th percentile values. See Syntax of at() for the
full list of summary-statistic modifiers.
at((mean) _all (median) x x2=1.2 z=(1 2 3)) is read from left to right, with latter specifiers
overriding former ones. Thus all covariates are fixed at their means except for x (fixed at its
median), x2 (fixed at 1.2), and z (fixed first at 1, then at 2, and finally at 3).
at((mean) _all (asobserved) x2) is a convenient way to set all covariates except x2 to the
mean.
Multiple at() options can be specified, and each will produce a different set of margins.
See Syntax of at() for more information.
atmeans specifies that covariates be fixed at their means and is shorthand for at((mean) _all).
atmeans differs from at((mean) _all) in that atmeans will affect subsequent at() options.
For instance,
. margins . . . , atmeans at((p25) x) at((p75) x)
produces two sets of margins with both sets evaluated at the means of all covariates except x.
asbalanced is shorthand for at((asbalanced) _factor) and specifies that factor covariates be
evaluated as though there were an equal number of observations in each level; see Obtaining margins
as though the data were balanced. asbalanced differs from at((asbalanced) _factor) in
that asbalanced will affect subsequent at() options in the same way as atmeans does.
 
if/in/over
over(varlist) specifies that separate sets of margins be estimated for the groups defined by varlist. The
variables in varlist must contain nonnegative integer values. The variables need not be covariates
in your model. When over() is combined with the vce(unconditional) option, each group is
treated as a subpopulation; see [SVY] subpopulation estimation.
subpop([varname] [if]) is intended for use with the vce(unconditional) option. It specifies
that margins be estimated for the single subpopulation identified by the indicator variable or by
the if expression or by both. Zero indicates that the observation be excluded; nonzero, that it be
included; and missing value, that it be treated as outside of the population (and so ignored). See
[SVY] subpopulation estimation for why subpop() is preferred to if expressions and in ranges
when also using vce(unconditional). If subpop() is used without vce(unconditional), it
is treated merely as an additional if qualifier.
 
Within
within(varlist) allows for nested designs. varlist contains the nesting variable(s) over which margins
are to be estimated. See Obtaining margins with nested designs. As with over(varlist), when
within(varlist) is combined with vce(unconditional), each level of the variables in varlist
is treated as a subpopulation.
 
Contrast
contrast_options are any of the options documented in [R] margins, contrast.
 
Pairwise comparisons
pwcompare_options are any of the options documented in [R] margins, pwcompare.
 
SE
vce(delta) and vce(unconditional) specify how the VCE and, correspondingly, standard errors
are calculated.
vce(delta) is the default. The delta method is applied to the formula for the response and the
VCE of the estimation command. This method assumes that values of the covariates used to
calculate the response are given or, if all covariates are not fixed using at(), that the data are
given.
vce(unconditional) specifies that the covariates that are not fixed be treated in a way that
accounts for their having been sampled. The VCE is estimated using the linearization method.
This method allows for heteroskedasticity or other violations of distributional assumptions
and allows for correlation among the observations in the same manner as vce(robust) and
vce(cluster . . . ), which may have been specified with the estimation command. This method
also accounts for complex survey designs if the data are svyset. See Obtaining margins with
survey data and representative samples. When you use complex survey data, this method
requires that the linearized variance estimation method be used for the model. See [SVY] svy
postestimation for an example of margins with replication-based methods.
nose suppresses calculation of the VCE and standard errors. See Requirements for model specification
for an example of the use of this option.
 
Advanced
noweights specifies that any weights specified on the previous estimation command be ignored by
margins. By default, margins uses the weights specified on the estimator to average responses and
to compute summary statistics. If weights are specified on the margins command, they override
previously specified weights, making it unnecessary to specify noweights. The noweights option
is not allowed after svy: estimation when the vce(unconditional) option is specified.
noesample specifies that margins not restrict its computations to the estimation sample used by the
previous estimation command. See Example 15: Margins evaluated out of sample.
With the default delta-method VCE, noesample margins may be estimated on samples other than
the estimation sample; such results are valid under the assumption that the data used are treated
as being given.
You can specify noesample and vce(unconditional) together, but if you do, you should be
sure that the data in memory correspond to the original e(sample). To show that you understand
that, you must also specify the force option. Be aware that making the vce(unconditional)
calculation on a sample different from the estimation sample would be equivalent to estimating
the coefficients on one set of data and computing the scores used by the linearization on another
set; see [P] robust.
emptycells(strict) and emptycells(reweight) are relevant only when the asbalanced option
is also specified. emptycells() specifies how empty cells are handled in interactions involving
factor variables that are being treated as balanced; see Obtaining margins as though the data were
balanced.
emptycells(strict) is the default; it specifies that margins involving empty cells be treated as
not estimable.
emptycells(reweight) specifies that the effects of the observed cells be increased to accom-
modate any missing cells. This makes the margin estimable but changes its interpretation.
emptycells(reweight) is implied when the within() option is specified.
estimtolerance(tol) specifies the numerical tolerance used to determine estimable functions. The
default is estimtolerance(1e-5).
A linear combination of the model coefficients z is found to be not estimable if

    mreldif(z, z×H) > tol

where H is defined in Methods and formulas.
noestimcheck specifies that margins not check for estimability. By default, the requested margins
are checked and those found not estimable are reported as such. Nonestimability is usually caused
by empty cells. If noestimcheck is specified, estimates are computed in the usual way and
reported even though the resulting estimates are manipulable, which is to say they can differ across
equivalent models having different parameterizations. See Estimability of margins.
force instructs margins to proceed in some situations where it would otherwise issue an error
message because of apparent violations of assumptions. Do not be casual about specifying force.
You need to understand and fully evaluate the statistical issues. For an example of the use of
force, see Using margins after the estimates use command.
chainrule and nochainrule specify whether margins uses the chain rule when numerically
computing derivatives. You need not specify these options when using margins after any official
Stata estimator; margins will choose the appropriate method automatically.
Specify nochainrule after estimation by a user-written command. We recommend using
nochainrule, even though chainrule is usually safe and is always faster. nochainrule is safer because
it makes no assumptions about how the parameters and covariates join to form the response.
nochainrule is implied when the expression() option is specified.
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
mcompare(method)specifies the method for computing p-values and confidence intervals that account
for multiple comparisons within a factor-variable term.
Most methods adjust the comparisonwise error rate, αc, to achieve a prespecified experimentwise
error rate, αe.
mcompare(noadjust) is the default; it specifies no adjustment.
    αc = αe

mcompare(bonferroni) adjusts the comparisonwise error rate based on the upper limit of the
Bonferroni inequality

    αe ≤ mαc

where m is the number of comparisons within the term.
The adjusted comparisonwise error rate is

    αc = αe/m

mcompare(sidak) adjusts the comparisonwise error rate based on the upper limit of the probability
inequality

    αe ≤ 1 − (1 − αc)^m

where m is the number of comparisons within the term.
The adjusted comparisonwise error rate is

    αc = 1 − (1 − αe)^(1/m)

This adjustment is exact when the m comparisons are independent (a numeric illustration of the
Bonferroni and Šidák adjustments follows the mcompare() descriptions below).
mcompare(scheffe) controls the experimentwise error rate using the F or χ² distribution with
degrees of freedom equal to the rank of the term.
mcompare(method adjustall) specifies that the multiple-comparison adjustments count all
comparisons across all terms rather than performing multiple comparisons term by term. This
leads to more conservative adjustments when multiple variables or terms are specified in
marginlist. This option is compatible only with the bonferroni and sidak methods.
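As a numeric illustration of the Bonferroni and Šidák adjustments (the value m = 6 is hypothetical),
suppose the desired experimentwise error rate is 0.05:
. display "Bonferroni: " %7.5f .05/6
Bonferroni: 0.00833
. display "Sidak:      " %7.5f (1 - (1-.05)^(1/6))
Sidak:      0.00851
The Šidák comparisonwise rate is slightly larger, and so slightly less conservative, than the Bonferroni
rate, as the formulas above imply.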
noatlegend specifies that the legend showing the fixed values of covariates be suppressed.
post causes margins to behave like a Stata estimation (e-class) command. margins posts the vector
of estimated margins along with the estimated variance–covariance matrix to e(), so you can treat
the estimated margins just as you would results from any other estimation command. For example,
you could use test to perform simultaneous tests of hypotheses on the margins, or you could use
lincom to create linear combinations. See Example 10: Testing margins—contrasts of margins.
display_options: noci, nopvalues, vsquish, nofvlabel, fvwrap(#), fvwrapon(style),
cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch.
noci suppresses confidence intervals from being reported in the coefficient table.
nopvalues suppresses p-values and their test statistics from being reported in the coefficient table.
vsquish specifies that the blank space separating factor-variable terms or time-series–operated
variables from other variables in the model be suppressed.
nofvlabel displays factor-variable level values rather than attached value labels. This option
overrides the fvlabel setting; see [R] set showbaselevels.
fvwrap(#) allows long value labels to wrap the first # lines in the coefficient table. This option
overrides the fvwrap setting; see [R] set showbaselevels.
fvwrapon(style) specifies whether value labels that wrap will break at word boundaries or break
based on available space.
fvwrapon(word), the default, specifies that value labels break at word boundaries.
fvwrapon(width) specifies that value labels break based on available space.
This option overrides the fvwrapon setting; see [R] set showbaselevels.
cformat(%fmt) specifies how to format margins, standard errors, and confidence limits in the
table of estimated margins.
pformat(%fmt) specifies how to format p-values in the table of estimated margins.
sformat(%fmt) specifies how to format test statistics in the table of estimated margins.
nolstretch specifies that the width of the table of estimated margins not be automatically widened
to accommodate longer variable names. The default, lstretch, is to automatically widen the
table of estimated margins up to the width of the Results window. To change the default, use
set lstretch off. nolstretch is not shown in the dialog box.
The following option is available with margins but is not shown in the dialog box:
df(#) specifies that the t distribution with # degrees of freedom be used for computing p-values and
confidence intervals. The default typically is to use the standard normal distribution. However, if
the estimation command computes the residual degrees of freedom (e(df_r)) and predict(xb)
is specified with margins, the default is to use the t distribution with e(df_r) degrees of freedom.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Obtaining margins of responses
Example 1: A simple case after regress
Example 2: A simple case after logistic
Example 3: Average response versus response at average
Example 4: Multiple margins from one command
Example 5: Margins with interaction terms
Example 6: Margins with continuous variables
Example 7: Margins of continuous variables
Example 8: Margins of interactions
Example 9: Decomposing margins
Example 10: Testing margins—contrasts of margins
Example 11: Margins of a specified prediction
Example 12: Margins of a specified expression
Example 13: Margins with multiple outcomes (responses)
Example 14: Margins with multiple equations
Example 15: Margins evaluated out of sample
Obtaining margins of derivatives of responses (a.k.a. marginal effects)
Do not specify marginlist when you mean over()
Use at() freely, especially with continuous variables
Expressing derivatives as elasticities
Derivatives versus discrete differences
Example 16: Average marginal effect (partial effects)
Example 17: Average marginal effect of all covariates
Example 18: Evaluating marginal effects over the response surface
Obtaining margins with survey data and representative samples
Example 19: Inferences for populations, margins of response
Example 20: Inferences for populations, marginal effects
Example 21: Inferences for populations with svyset data
Standardizing margins
Obtaining margins as though the data were balanced
Balancing using asbalanced
Balancing by standardization
Balancing nonlinear responses
Treating a subset of covariates as balanced
Using fvset design
Balancing in the presence of empty cells
Obtaining margins with nested designs
Introduction
Margins with nested designs as though the data were balanced
Coding of nested designs
Special topics
Requirements for model specification
Estimability of margins
Manipulability of tests
Using margins after the estimates use command
Syntax of at()
Estimation commands that may be used with margins
Video examples
Glossary
Introduction
margins is a postestimation command, a command for use after you have fit a model using an
estimation command such as regress or logistic, or using almost any other estimation command.
margins estimates and reports margins of responses and margins of derivatives of responses, also
known as marginal effects. A margin is a statistic based on a fitted model in which some of or all the
covariates are fixed. Marginal effects are changes in the response for a change in a covariate, which
can be reported as a derivative, elasticity, or semielasticity.
For a brief overview of margins, see Williams (2012).
Obtaining margins of responses
What we call margins of responses are also known as predictive margins, adjusted predictions, and
recycled predictions. When applied to balanced data, margins of responses are also called estimated
marginal means and least-squares means.
A margin is a statistic based on a fitted model calculated over a dataset in which some of or
all the covariates are fixed at values different from what they really are. For instance, after a linear
regression fit on males and females, the marginal mean (margin of mean) for males is the predicted
mean of the dependent variable, where every observation is treated as if it represents a male; thus those
observations that in fact do represent males are included, as well as those observations that represent
females. The marginal mean for female would be similarly obtained by treating all observations as
if they represented females.
In making the calculation, sex is treated as male or female everywhere it appears in the model.
The model might be
. regress y age bp i.sex sex#c.age sex#c.bp
and then, in making the marginal calculation of the mean for males and females, margins not only
accounts for the direct effect of i.sex but also for the indirect effects of sex#c.age and sex#c.bp.
The response being margined can be any statistic produced by [R] predict, or any expression of
those statistics.
Standard errors are obtained by the delta method, at least by default. The delta method assumes
that the values at which the covariates are evaluated to obtain the marginal responses are fixed.
When your sample represents a population, whether you are using svy or not (see [SVY] svy), you
can specify margins' vce(unconditional) option, and margins will produce standard errors that
account for the sampling variability of the covariates. Some researchers reserve the term predictive
margins to describe this.
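For instance, after the logistic fit used in the examples that follow, such standard errors could be
requested with (only the syntax is shown here; sex refers to the covariate used in those examples)
. margins sex, vce(unconditional)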
The best way to understand margins is to see some examples. You can run the following examples
yourself if you type
. use http://www.stata-press.com/data/r13/margex
(Artificial data for margins)
Example 1: A simple case after regress
. regress y i.sex i.group
(output omitted )
. margins sex
Predictive margins Number of obs = 3000
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
sex
male 60.56034 .5781782 104.74 0.000 59.42668 61.69401
female 78.88236 .5772578 136.65 0.000 77.7505 80.01422
The numbers reported in the “Margin” column are average values of y. Based on a linear regression
of y on sex and group, 60.6 would be the average value of y if everyone in the data were treated
as if they were male, and 78.9 would be the average value if everyone were treated as if they were
female.
Example 2: A simple case after logistic
margins may be used after almost any estimation command.
. logistic outcome i.sex i.group
(output omitted )
. margins sex
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1286796 .0111424 11.55 0.000 .106841 .1505182
female .1905087 .0089719 21.23 0.000 .1729241 .2080933
The numbers reported in the “Margin” column are average predicted probabilities. Based on a
logistic regression of outcome on sex and group, 0.13 would be the average probability of outcome
if everyone in the data were treated as if they were male, and 0.19 would be the average probability
if everyone were treated as if they were female.
margins reports average values after regress and average probabilities after logistic. By
default, margins makes tables of whatever it is that predict (see [R] predict) predicts by default.
Alternatively, margins can make tables of anything that predict can produce if you use margins'
predict() option; see Example 11: Margins of a specified prediction.
Example 3: Average response versus response at average
In example 2, margins reported average probabilities of outcome for sex = 0 and sex = 1.
If we instead wanted the predicted probabilities evaluated at the mean of the covariates, we would
specify margins' atmeans option. We previously typed
. logistic outcome i.sex i.group
(output omitted )
. margins sex
(output omitted )
and now we type
. margins sex, atmeans
Adjusted predictions Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
at : 0.sex = .4993333 (mean)
1.sex = .5006667 (mean)
1.group = .3996667 (mean)
2.group = .3726667 (mean)
3.group = .2276667 (mean)
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .0966105 .0089561 10.79 0.000 .0790569 .1141641
female .1508362 .0118064 12.78 0.000 .127696 .1739764
The prediction at the average of the covariates is different from the average of the predictions.
The first is the expected probability of a person with average characteristics, a person who, in another
problem, might be 3/4 married and have 1.2 children. The second is the average of the probability
among actual persons in the data.
When you specify atmeans or any other at option, margins reports the values used for the
covariates in the legend above the table. margins lists the values for all the covariates, including
values it may not use, in the results that follow. In this example, margins reported means for sex
even though those means were not used. They were not used because we asked for the margins of
sex, so sex was fixed first at 0 and then at 1.
If you wish to suppress this legend, specify the nolegend option.
Example 4: Multiple margins from one command
More than one margin can be reported by just one margins command. You can type
. margins sex group
and doing that is equivalent in terms of the output to typing
. margins sex
. margins group
When multiple margins are requested on the same command, each is estimated separately. There
is, however, a difference when you also specify margins' post option. Then the variance–covariance
matrix for all margins requested is posted, and that is what allows you to test equality of margins,
etc. Testing equality of margins is covered in Example 10: Testing margins—contrasts of margins.
In any case, below we request margins for sex and for group.
. margins sex group
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1286796 .0111424 11.55 0.000 .106841 .1505182
female .1905087 .0089719 21.23 0.000 .1729241 .2080933
group
1 .2826207 .0146234 19.33 0.000 .2539593 .311282
2 .1074814 .0094901 11.33 0.000 .0888812 .1260817
3 .0291065 .0073417 3.96 0.000 .0147169 .043496
Example 5: Margins with interaction terms
The estimation command on which margins bases its calculations may contain interaction terms,
such as an interaction of sex and group:
. logistic outcome i.sex i.group sex#group
(output omitted )
. margins sex group
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1561738 .0132774 11.76 0.000 .1301506 .182197
female .1983749 .0101546 19.54 0.000 .1784723 .2182776
group
1 .3211001 .0176403 18.20 0.000 .2865257 .3556744
2 .1152127 .0099854 11.54 0.000 .0956417 .1347838
3 .0265018 .0109802 2.41 0.016 .0049811 .0480226
We fit the model by typing logistic outcome i.sex i.group sex#group, but the meaning
would have been the same had we typed logistic outcome sex##group.
As mentioned in example 4, the results for sex and the results for group are calculated independently,
and we would have obtained the same results had we typed margins sex followed by margins
group.
The margin for male (sex = 0) is 0.16. The probability 0.16 is the average probability if everyone
in the data were treated as if sex = 0, including sex = 0 in the main effect and sex = 0 in the
interaction of sex with group.
Had we specified margins sex, atmeans, we would have obtained not average probabilities but
the probabilities evaluated at the average. Rather than obtaining 0.16, we would have obtained 0.10
for sex = 0. The 0.10 is calculated by taking the fitted model, plugging in sex = 0 everywhere, and
plugging in the average value of the group indicator variables everywhere they are used. That is, rather
than treating the group indicators as being (1,0,0), (0,1,0), or (0,0,1) depending on observation, the
group indicators are treated as being (0.40, 0.37, 0.23), which are the average values of group = 1,
group = 2, and group = 3.
Example 6: Margins with continuous variables
To the above example, we will add the continuous covariate age to the model and then rerun
margins sex group.
. logistic outcome i.sex i.group sex#group age
(output omitted )
. margins sex group
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1600644 .0125653 12.74 0.000 .1354368 .184692
female .1966902 .0100043 19.66 0.000 .1770821 .2162983
group
1 .2251302 .0123233 18.27 0.000 .200977 .2492834
2 .150603 .0116505 12.93 0.000 .1277685 .1734376
3 .0736157 .0337256 2.18 0.029 .0075147 .1397167
Compared with the results presented in example 5, results for sex change little, but results for
groups 1 and 3 change markedly. The tables differ because now we are adjusting for the continuous
covariate age, as well as for sex and group.
We will continue examining interactions in example 8. Because we have added a continuous
variable, let’s take a detour to explain how to obtain margins for continuous variables and to explain
their interpretation.
Example 7: Margins of continuous variables
Continuing with our example of
. logistic outcome i.sex i.group sex#group age
let’s examine the continuous covariate age.
You are not allowed to type margins age; doing that will produce an error:
. margins age
‘age’ not found in list of covariates
r(322);
The message "'age' not found in list of covariates" is margins' way of saying, "Yes, age might
be in the model, but if it is, it is not included as a factor variable; it is in as a continuous variable."
Sometimes, Stata is overly terse. margins might also say that because age is continuous, there are
an infinite number of values at which it could evaluate the margins. At what value(s) should age be
fixed? margins requires more guidance with continuous covariates. We can provide that guidance
by using the at() option and typing
. margins, at(age=40)
To understand why that yields the desired result, let us tell you that if you were to type
. margins
margins would report the overall margin, the margin that holds nothing constant. Because our model
is logistic, the average value of the predicted probabilities would be reported. The at() option fixes
one or more covariates to the value(s) specified and can be used with both factor and continuous
variables. Thus, if you typed margins, at(age=40), then margins would average over the data
the responses for everybody, setting age=40. Here is what happens when you type that:
. margins, at(age=40)
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
at : age = 40
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
_cons .1133603 .0070731 16.03 0.000 .0994972 .1272234
Reported is the margin for age =40, adjusted for the other covariates in our model.
If we wanted to obtain the margins for age 30, 35, 40, 45, and 50, we could type
. margins, at(age=(30 35 40 45 50))
or, equivalently,
. margins, at(age=(30(5)50))
Example 8: Margins of interactions
Our model is
. logistic outcome i.sex i.group sex#group age
We can obtain the margins of all possible combinations of the levels of sex and the levels of
group by typing
. margins sex#group
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex#group
male#1 .2379605 .0237178 10.03 0.000 .1914745 .2844465
male#2 .0658294 .0105278 6.25 0.000 .0451953 .0864636
male#3 .0538001 .0136561 3.94 0.000 .0270347 .0805656
female#1 .2158632 .0112968 19.11 0.000 .1937218 .2380045
female#2 .2054406 .0183486 11.20 0.000 .1694781 .2414032
female#3 .085448 .0533914 1.60 0.110 -.0191973 .1900932
The first line in the table reports the marginal probability for sex =0 (male) and group =1. That
is, it reports the estimated probability if everyone in the data were treated as if they were sex =0
and group =1.
Also reported are all the other combinations of sex and group.
By the way, we could have typed margins sex#group even if our fitted model did not include
sex#group. Estimation is one thing, and asking questions about the nature of the estimates is another.
margins does, however, require that i.sex and i.group appear somewhere in the model, because
fixing a value outside the model would just produce the grand margin, and you can separately ask
for that if you want it by typing margins without arguments.
Example 9: Decomposing margins
We have the model
. logistic outcome i.sex i.group sex#group age
In example 6, we typed margins sex and obtained 0.160 for males and 0.197 for females. We
are going to decompose each of those numbers. Let us explain:
1. The margin for males, 0.160, treats everyone as if they were male, and that amounts to
simultaneously
1a. treating males as males and
1b. treating females as males.
2. The margin for females, 0.197, treats everyone as if they were female, and that amounts to
simultaneously
2a. treating males as females and
2b. treating females as females.
The margins 1a and 1b are the decomposition of 1, and the margins 2a and 2b are the decomposition
of 2.
We could obtain 1a and 2a by typing
. margins if sex==0, at(sex=(0 1))
because the qualifier if sex==0 would restrict margins to running on only the males. Similarly, we
could obtain 1b and 2b by typing
. margins if sex==1, at(sex=(0 1))
We run these examples below:
. margins if sex==0, at(sex=(0 1))
Predictive margins Number of obs = 1498
Model VCE : OIM
Expression : Pr(outcome), predict()
1._at : sex = 0
2._at : sex = 1
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
_at
1 .0794393 .0062147 12.78 0.000 .0672586 .0916199
2 .1335584 .0127351 10.49 0.000 .1085981 .1585187
. margins if sex==1, at(sex=(0 1))
Predictive margins Number of obs = 1502
Model VCE : OIM
Expression : Pr(outcome), predict()
1._at : sex = 0
2._at : sex = 1
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
_at
1 .2404749 .0199709 12.04 0.000 .2013326 .2796171
2 .2596538 .0104756 24.79 0.000 .2391219 .2801857
Putting together the results from example 6 and the results above, we have
Margin treating everybody as themself 0.170
Margin treating everybody as male 0.160
Margin treating male as male 0.079
Margin treating female as male 0.240
Margin treating everybody as female 0.197
Margin treating male as female 0.134
Margin treating female as female 0.260
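As a check, and up to the rounding of the displayed values, the margin treating everybody as male
is the observation-weighted average of its two components over the 1,498 males and 1,502 females:
. display (1498*.0794393 + 1502*.2404749)/3000
.16006446
which agrees with the 0.160 (more precisely, .1600644) reported in example 6.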
Example 10: Testing margins—contrasts of margins
Continuing with the previous example, it would be interesting to test the equality of 2b and 1b,
to test whether the average probability of a positive outcome for females treated as females is equal
to that for females treated as males. That test would be different from testing the overall significance
of sex in our model. The test performed on our model would be a test of whether the probability
of a positive outcome differs between males and females when they have equal values of the other
covariates. The test of equality of margins is a test of whether the average probabilities differ given
the different pattern of values of the other covariates that the two sexes have in our data.
We can also perform such tests by treating the results from margins as estimation results. There
are three steps required to perform tests on margins. First, you must arrange it so that all the margins
of interest are reported by just one margins command. Second, you must specify margins' post
option. Third, you perform the test with the test command.
Such tests and comparisons can be readily performed by contrasting margins; see [R] margins,
contrast. Also see Contrasts of margins—effects (discrete marginal effects) in [R] marginsplot.
In the previous example, we used two commands to obtain our results, namely,
. margins if sex==0, at(sex=(0 1))
. margins if sex==1, at(sex=(0 1))
We could, however, have obtained the same results by typing just one command:
. margins, over(sex) at(sex=(0 1))
Performing margins, over(sex) first restricts the sample to sex==0 and then restricts it to
sex==1, and that is equivalent to the two different if conditions that we specified before.
To test whether females treated as females is equal to females treated as males, we will need to
type
. margins, over(sex) at(sex=(0 1)) post
. test _b[2._at#1.sex] = _b[1._at#1.sex]
We admit that the second command may seem to have come out of nowhere. When we specify
post on the margins command, margins behaves as if it were an estimation command, which
means that 1) it posts its estimates and full VCE to e(), 2) it gains the ability to replay results just
as any estimation command can, and 3) it gains access to the standard postestimation commands.
Item 3 explains why we could use test. We learned that we wanted to test _b[2._at#1.sex] and
_b[1._at#1.sex] by replaying the estimation results, but this time with the standard estimation
command option coeflegend. So what we typed was
. margins, over(sex) at(sex=(0 1)) post
. margins, coeflegend
. test _b[2._at#1.sex] = _b[1._at#1.sex]
We will let you try margins, coeflegend for yourself. The results of running the other two
commands are
. margins, over(sex) at(sex=(0 1)) post
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
over : sex
1._at : 0.sex
sex = 0
1.sex
sex = 0
2._at : 0.sex
sex = 1
1.sex
sex = 1
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
_at#sex
1#male .0794393 .0062147 12.78 0.000 .0672586 .0916199
1#female .2404749 .0199709 12.04 0.000 .2013326 .2796171
2#male .1335584 .0127351 10.49 0.000 .1085981 .1585187
2#female .2596538 .0104756 24.79 0.000 .2391219 .2801857
. test _b[2._at#1.sex] = _b[1._at#1.sex]
( 1) - 1bn._at#1.sex + 2._at#1.sex = 0
chi2( 1) = 0.72
Prob > chi2 = 0.3951
We can perform the same test in one command using contrasts of margins:
. logistic outcome i.sex i.group sex#group age
(output omitted )
. margins, over(sex) at(sex=(0 1)) contrast(atcontrast(r._at) wald)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : sex
1._at : 0.sex
sex = 0
1.sex
sex = 0
2._at : 0.sex
sex = 1
1.sex
sex = 1
df chi2 P>chi2
_at@sex
(2 vs 1) male 1 14.59 0.0001
(2 vs 1) female 1 0.72 0.3951
Joint 2 16.13 0.0003
Delta-method
Contrast Std. Err. [95% Conf. Interval]
_at@sex
(2 vs 1) male .0541192 .0141706 .0263453 .081893
(2 vs 1) female .0191789 .0225516 -.0250215 .0633793
We refitted our logistic model because its estimation results were replaced when we posted our
margins. The syntax to perform the contrast we want is admittedly not obvious. Contrasting (testing)
across at() groups is more difficult than contrasting across the margins themselves or across over()
groups, because we have no natural place for the contrast operators (r., in our case). We also
explicitly requested Wald tests of the contrasts, which are not provided by default. Nevertheless, the
chi-squared statistic and its p-value for (2 vs 1) for male matches the results of our test command.
We also obtain the test of whether the response of males treated as males is equal to the response of
males treated as females.
For a gentler introduction to contrasts of margins, see [R] margins, contrast.
Example 11: Margins of a specified prediction
We will fit the model
. use http://www.stata-press.com/data/r13/margex
. tobit ycn i.sex i.group sex#group age, ul(90)
and we will tell the following story about the variables: We run a peach orchard where we allow
people to pick their own peaches. A person receives one empty basket in exchange for $20, along
with the right to enter the orchard. There is no official limit on how many peaches a person can pick,
but only 90 peaches will fit into a basket. The dependent variable in the above tobit model, ycn, is
the number of peaches picked. We use tobit, a special case of censored-normal regression, because
ycn is censored at 90.
After fitting this model, if we typed
. margins sex
we would obtain the margins for males and for females of the uncensored number of peaches picked.
We would obtain that because predict after tobit produces the uncensored number by default. To
obtain the censored prediction, we would have to specify predict's ystar(.,90) option. If we
want the margins based on that response, we type
. margins sex, predict(ystar(.,90))
The results of typing that are
. tobit ycn i.sex i.group sex#group age, ul(90)
(output omitted )
. margins sex, predict(ystar(.,90))
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : E(ycn*|ycn<90), predict(ystar(.,90))
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male 62.21804 .5996928 103.75 0.000 61.04266 63.39342
female 78.34272 .455526 171.98 0.000 77.4499 79.23553
In our previous examples, sex =1 has designated females, so evidently the females visiting our
orchard are better at filling baskets than the men.
Example 12: Margins of a specified expression
Continuing with our peach orchard example and the previously fit model
. use http://www.stata-press.com/data/r13/margex
. tobit ycn i.sex i.group sex#group age, ul(90)
let’s examine how well our baskets are working for us. What is the proportion of the number of
peaches actually picked to the number that would have been picked were the baskets larger? As
mentioned in example 11, predict, ystar(.,90) produces the expected number picked given the
limit of basket size. predict, xb would predict the expected number without a limit. We want the
ratio of those two predictions. That ratio will measure as a proportion how well the baskets work.
Thus we could type
. margins sex, expression(predict(ystar(.,90))/predict(xb))
That would give us the proportion for everyone treated as male and everyone treated as female, but
what we want to know is how well baskets work for true males and true females, so we will type
. margins, over(sex) expression(predict(ystar(.,90))/predict(xb))
. margins, over(sex) expression(predict(ystar(0,90))/predict(xb))
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : predict(ystar(0,90))/predict(xb)
over : sex
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .9811785 .0013037 752.60 0.000 .9786233 .9837337
female .9419962 .0026175 359.88 0.000 .936866 .9471265
By the way, we could count the number of peaches saved by the limited basket size during the
period of data collection by typing
. count
3000
. margins, expression(3000*(predict(xb)-predict(ystar(.,90))))
(output omitted )
The number of peaches saved turns out to be 9,183.
Example 13: Margins with multiple outcomes (responses)
Estimation commands such as mlogit and mprobit (see [R] mlogit and [R] mprobit) calculate
multiple responses, and those multiple responses are reflected in the options available with predict
after estimation. Obtaining margins for such estimators is thus the same as obtaining margins of a
specified prediction, which was demonstrated in example 11. The solution is to include the predict_opt
that selects the desired response in margins' predict(predict_opt) option.
If we fit the multinomial logistic model
. mlogit group i.sex age
then to obtain the margins for the probability that group =1, we would type
. margins sex, predict(outcome(1))
and to obtain the margins for the probability that group =3, we would type
. margins sex, predict(outcome(3))
We learned about the outcome(1) and outcome(3) options by looking in [R] mlogit postestimation.
For an example using margins with a multiple-outcome estimator, see example 4 in [R] mlogit
postestimation.
Example 14: Margins with multiple equations
Estimation commands such as mvreg, manova, sureg, and reg3 (see [MV] mvreg, [MV] manova,
[R] sureg, and [R] reg3) fit multiple equations. Obtaining margins for such estimators is the same as
obtaining margins with multiple outcomes (see example 13), which in turn is the same as obtaining
margins of a specified prediction (see example 11). You place the relevant option from the estimator's
predict command into margins' predict(predict_opt) option.
If we fit the seemingly unrelated regression model
. sureg (y = i.sex age) (distance = i.sex i.group)
we can obtain the marginal means of y for males and females by typing
. margins sex, predict(equation(y))
and we can obtain the marginal means of distance by typing
. margins sex, predict(equation(distance))
We could obtain the difference between the margins of y and distance by typing
. margins sex, expression(predict(equation(y)) -
> predict(equation(distance)))
More examples can be found in [MV] manova and [MV] manova postestimation.
Example 15: Margins evaluated out of sample
You can fit your model on one dataset and use margins on another if you specify margins'
noesample option. Remember that margins reports estimated average responses, and, unless you
lock all the covariates at fixed values by using the at() option, the remaining variables are allowed
to vary as they are observed to vary in the data. That is indeed the point of using margins. The
fitted model provides the basis for adjusting for the remaining variables, and the data provide their
values. The predictions produced by margins are of interest assuming the data used by margins
are in some sense interesting or representative. In some cases, you might need to fit your model on
one set of data and perform margins on another.
In example 11, we fit the model
. tobit ycn i.sex i.group sex#group age, ul(90)
and we told a story about our peach orchard in which we charged people $20 to collect a basket of
peaches, where baskets could hold at most 90 peaches. Let us now tell you that we believe the data on
which we estimated those margins were unrepresentative, or at least, we have a more representative
sample stored in another .dta file. That dataset includes the demographics of our customers but does
not include counts of peaches picked. It is a lot of work counting those peaches.
Thus we will fit our model just as we did previously using the detailed data, but we will bring the other,
more representative dataset into memory before issuing the margins sex, predict(ystar(.,90))
command, and we will add noesample to it.
. use http://www.stata-press.com/data/r13/margex
(Artificial data for margins)
. tobit ycn i.sex i.group sex#group age, ul(90)
(output omitted )
. use http://www.stata-press.com/data/r13/peach
. margins sex, predict(ystar(.,90)) noesample
Predictive margins Number of obs = 2727
Model VCE : OIM
Expression : E(ycn*|ycn<90), predict(ystar(.,90))
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
0 56.79774 1.003727 56.59 0.000 54.83047 58.76501
1 75.02146 .643742 116.54 0.000 73.75975 76.28317
In example 12, we produced an estimate of the number of peaches saved by the limited-size
baskets. We can update that estimate using the new demographic data by typing
. count
2727
. margins, exp(2727*(predict(xb)-predict(ystar(.,90)))) noesample
(output omitted )
By running the above, we find that the updated number of peaches saved is 6,408.
Obtaining margins of derivatives of responses (a.k.a. marginal effects)
Derivatives of responses are themselves responses, so everything said above in Obtaining margins
of responses is equally true of derivatives of responses, and every example above could be repeated
here substituting the derivative of the response for the response.
Derivatives are of interest because they are an informative way of summarizing fitted results. The
change in a response for a change in the covariate is easy to understand and to explain. In simple
models, one hardly needs margins to assist in obtaining such margins. Consider the simple linear
regression
y = β0 + β1 × sex + β2 × age + ε
The derivatives of the responses are
dy/d(sex) = β1
dy/d(age) = β2
The derivatives are the fitted coefficients. How does y change between males and females? It changes
by β1. How does y change with age? It changes by β2 per year.
If you make the model a little more complicated, however, the need for margins arises. Consider
the model
y = β0 + β1 × sex + β2 × age + β3 × age² + ε
Now the derivative with respect to age is
dy/d(age) = β2 + 2 × β3 × age
The change in y for a change in age itself changes with age, and so to better understand the fitted
results, you might want to make a table of the change in y for a change in age for age = 30, age = 40,
and age = 50. margins can do that.
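For example, a sketch of producing that table, using illustrative variable names and factor-variable
notation for the squared term (so that margins knows the two terms are related):
. regress y i.sex c.age c.age#c.age
. margins, dydx(age) at(age=(30 40 50))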
Consider an even more complicated model, such as
y = β0 + β1 × sex + β2 × age + β3 × age² + β4 × bp + β5 × sex × bp + β6 × tmt
+ β7 × tmt × age + β8 × tmt × age² + ε      (1)
The derivatives are
dy/d(sex) = β1 + β5 × bp
dy/d(age) = β2 + 2 × β3 × age + β7 × tmt + 2 × β8 × tmt × age
dy/d(bp) = β4 + β5 × sex
dy/d(tmt) = β6 + β7 × age + β8 × age²
At this point, margins becomes indispensable.
Do not specify marginlist when you mean over()
margins has the same syntax when used with derivatives of responses as when used with responses.
To obtain derivatives, one specifies the dydx() option. If we wanted to examine the response variable
dy/d(tmt), we would specify margins' dydx(tmt) option. The rest of the margins command
has the same syntax as usual, although one tends to specify different syntactical elements. For
instance, one usually does not specify a marginlist. If we typed
. margins sex, dydx(tmt)
we would obtain dy/d(tmt) calculated first as if everyone were male and then as if everyone were
female. At the least, we would probably want to specify
. margins sex, dydx(tmt) grand
so as also to obtain dy/d(tmt), the overall margin, the margin with everyone having their own value
of sex. Usually, however, all we want is the overall margin, and because grand is the default when
the marginlist is not specified, we would just type
. margins, dydx(tmt)
Alternatively, if we were interested in the decomposition by sex, then rather than type margins
sex, dydx(tmt), we probably want to type
. margins, over(sex) dydx(tmt)
This command gives us the average effect of tmt for males and again for females rather than the
average effect with everyone treated as male and then again with everyone treated as female.
Use at() freely, especially with continuous variables
Another option one tends to use more often with derivatives of responses than one does with
responses is at(). Such use is often to better understand or to communicate how the response varies,
or, in technical jargon, to explore the nature of the response surface.
For instance, the effect dy/d(tmt) in (1) is equal to β6 + β7 × age + β8 × age², and so simply to
understand how the effect of treatment varies with age, we may want to fix age at various values. We might type
. margins, dydx(tmt) over(sex) at(age=(30 40 50))
Expressing derivatives as elasticities
You specify the dydx(varname) option on the margins command to use dy/d(varname) as the re-
sponse variable. If you want that derivative expressed as an elasticity, you can specify eyex(varname),
eydx(varname), or dyex(varname). You substitute e for d where you want an elasticity. The formulas
are
dydx() = dy/dx
eyex() = dy/dx × (x/y)
eydx() = dy/dx × (1/y)
dyex() = dy/dx × x
and the interpretations are
dydx(): change in y for a change in x
eyex(): proportional change in y for a proportional change in x
eydx(): proportional change in y for a change in x
dyex(): change in y for a proportional change in x
As margins always does with response functions, calculations are made at the observational level
and are then averaged. Let's assume that in observation 5, dy/dx = 0.5, y = 15, and x = 30; then
dydx() = 0.5
eyex() = 1.0
eydx() = 0.03
dyex() = 15.0
Many social scientists would informally explain the meaning of eyex() = 1 as "y increases 100%
when x increases 100%" or as "y doubles when x doubles", although neither statement is literally
true. eyex(), eydx(), and dyex() are rates evaluated at a point, just as dydx() is a rate, and all
such interpretations are valid only for small (infinitesimal) changes in x. It is true that eyex() = 1
means y increases with x at a rate such that, if the rate were constant, y would double if x doubled.
This issue of casual interpretation is no different from casually interpreting dydx() as if it represents
the response to a unit change. It is not necessarily true that dydx() = 0.5 means that "y increases
by 0.5 if x increases by 1". It is true that "y increases with x at a rate such that, if the rate were
constant, y would increase by 0.5 if x increased by 1".
dydx(), eyex(), eydx(), and dyex() may be used with continuous x variables. dydx() and
eydx() may also be used with factor variables.
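For example, a sketch using the margex data with an illustrative linear model; any of the four
options could be substituted for eyex() here:
. use http://www.stata-press.com/data/r13/margex
. regress y i.sex age
. margins, eyex(age)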
Derivatives versus discrete differences
In (1),
y = β0 + β1 × sex + β2 × age + β3 × age² + β4 × bp + β5 × sex × bp + β6 × tmt
+ β7 × tmt × age + β8 × tmt × age² + ε
Let us call your attention to the derivatives of y with respect to age and sex:
dy/d(age) = β2 + 2 × β3 × age + β7 × tmt + 2 × β8 × tmt × age      (2)
dy/d(sex) = β1 + β5 × bp      (3)
age is presumably a continuous variable and (2) is precisely how margins calculates its derivatives
when you type margins, dydx(age). sex, however, is presumably a factor variable, and margins
does not necessarily make the calculation using (3) were you to type margins, dydx(sex). We will
explain, but let us first clarify what we mean by a continuous and a factor variable. Say that you fit
(1) by typing
. regress y i.sex age c.age#c.age i.bp bp#sex
> i.tmt tmt#c.age tmt#c.age#c.age
It is important that sex entered the model as a factor variable. It would not do to type regress y
sex . . . because then sex would be a continuous variable, or at least it would be a continuous variable
from Stata’s point of view. The model estimates would be the same, but margins’ understanding
of those estimates would be a little different. With the model estimated using i.sex, margins
understands that either sex is 0 or sex is 1. With the model estimated using sex, margins thinks
sex is continuous and, for instance, sex = 1.5 is a possibility.
margins calculates dydx() differently for continuous and for factor variables. For continuous
variables, margins calculates dy/dx. For factor variables, margins calculates the discrete first-
difference from the base category. To obtain that for sex, write down the model and then subtract
from it the model evaluated at the base category for sex, which is sex = 0. If you do that, you will
get the same formula as we obtained for the derivative, namely,
discrete difference{(sex = 1) − (sex = 0)} = β1 + β5 × bp
We obtain the same formula because our model is linear regression. Outside of linear regression,
and outside of linear response functions generally, the discrete difference is not equal to the derivative.
The discrete difference is not equal to the derivative for logistic regression, probit, etc. The discrete
difference calculation is generally viewed as better for factor variables than the derivative calculation
because the discrete difference is what would actually be observed.
If you want the derivative calculation for your factor variables, specify the continuous option
on the margins command.
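For example, a sketch contrasting the two calculations for a factor covariate, using an illustrative
logistic model:
. logistic outcome i.sex age
. margins, dydx(sex)
. margins, dydx(sex) continuous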
Example 16: Average marginal effect (partial effects)
Concerning the title of this example, the way we use the term marginal effect, the effects of factor
variables are calculated using discrete first-differences. If you wanted the continuous calculation, you
would specify margins' continuous option in what follows.
. use http://www.stata-press.com/data/r13/margex
(Artificial data for margins)
. logistic outcome treatment##group age c.age#c.age treatment#c.age
(output omitted )
. margins, dydx(treatment)
Average marginal effects Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
1.treatment .0385625 .0162848 2.37 0.018 .0066449 .0704801
Note: dy/dx for factor levels is the discrete change from the base level.
The average marginal effect of treatment on the probability of a positive outcome is 0.039.
Example 17: Average marginal effect of all covariates
We will continue with the model
. logistic outcome treatment##group age c.age#c.age treatment#c.age
If we wanted the average marginal effects for all covariates, we would type margins, dydx(*)
or margins, dydx(_all); they mean the same thing. This is probably the most common way
margins, dydx() is used.
. margins, dydx(*)
Average marginal effects Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment 2.group 3.group age
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
1.treatment .0385625 .0162848 2.37 0.018 .0066449 .0704801
group
2 -.0776906 .0181584 -4.28 0.000 -.1132805 -.0421007
3 -.1505652 .0400882 -3.76 0.000 -.2291366 -.0719937
age .0095868 .0007796 12.30 0.000 .0080589 .0111148
Note: dy/dx for factor levels is the discrete change from the base level.
Example 18: Evaluating marginal effects over the response surface
Continuing with the model
. logistic outcome treatment##group age c.age#c.age treatment#c.age
What follows maps out the entire response surface of our fitted model. We report the marginal
effect of treatment evaluated at age = 20, 30, ..., 60, by each level of group.
. margins group, dydx(treatment) at(age=(20(10)60))
Conditional marginal effects Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment
1._at : age = 20
2._at : age = 30
3._at : age = 40
4._at : age = 50
5._at : age = 60
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
1.treatment
_at#group
1 1 -.0208409 .0152862 -1.36 0.173 -.0508013 .0091196
1 2 .009324 .0059896 1.56 0.120 -.0024155 .0210635
1 3 .0006558 .0048682 0.13 0.893 -.0088856 .0101972
2 1 -.0436964 .0279271 -1.56 0.118 -.0984325 .0110397
2 2 .0382959 .0120405 3.18 0.001 .014697 .0618949
2 3 .0064564 .0166581 0.39 0.698 -.0261929 .0391057
3 1 -.055676 .0363191 -1.53 0.125 -.1268601 .015508
3 2 .1152235 .0209858 5.49 0.000 .074092 .156355
3 3 .0284808 .0471293 0.60 0.546 -.0638908 .1208524
4 1 -.027101 .0395501 -0.69 0.493 -.1046177 .0504158
4 2 .2447682 .0362623 6.75 0.000 .1736954 .315841
4 3 .0824401 .1025028 0.80 0.421 -.1184616 .2833418
5 1 .0292732 .0587751 0.50 0.618 -.0859239 .1444703
5 2 .3757777 .0578106 6.50 0.000 .2624709 .4890844
5 3 .1688268 .1642191 1.03 0.304 -.1530368 .4906904
Note: dy/dx for factor levels is the discrete change from the base level.
Obtaining margins with survey data and representative samples
The standard errors and confidence intervals produced by margins are based by default on the
delta method applied to the VCE of the current estimates. Delta-method standard errors treat the
covariates at which the response is evaluated as given or fixed. Such standard errors are appropriate
if you specify at() to fix the covariates, and they are appropriate when you are making inferences
about groups exactly like your sample whether you specify at() or not.
On the other hand, if you have a representative sample of the population or if you have complex survey
data and if you want to make inferences about the underlying population, you need to account for the vari-
ation in the covariates that would arise in repeated sampling. You do that using vce(unconditional),
which invokes a different standard-error calculation based on Korn and Graubard (1999). Syntactically,
there are three cases. They all involve specifying the vce(unconditional) option on the margins
command:
1. You have a representative random sample, and you have not svyset your data.
When you fit the model, you need to specify the vce(robust) or vce(cluster clustvar) option.
When you issue the margins command, you need to specify the vce(unconditional) option.
2. You have a weighted sample, and you have not svyset your data.
You need to specify [pw=weight] when you fit the model and, of course, specify the
vce(unconditional) option on the margins command. You do not need to specify the
weights on the margins command because margins will obtain them from the estimation
results.
3. You have svyset your data, whether it be a simple random sample or something more
complex including weights, strata, sampling units, or poststratification, and you are using
the linearized variance estimator.
You need to use the svy prefix when you fit the model. You need to specify
vce(unconditional) when you issue the margins command. You do not need to respecify
the weights.
Even though the data are svyset, and even though the estimation was svy esti-
mation, margins does not default to vce(unconditional). It does not default to
vce(unconditional) because there are valid reasons to want the data-specific, vce(delta)
standard-error estimates. Whether you specify vce(unconditional) or not, margins uses
the weights, so you do not need to respecify them even if you are using vce(unconditional).
vce(unconditional) is allowed only after estimation with vce(robust), vce(cluster . . .),
or the svy prefix with the linearized variance estimator. If the VCE of the current estimates was
specified as clustered, so will be the VCE estimates of margins. If the estimates were from a survey
estimation, the survey settings in the dataset will be used by margins.
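For instance, a sketch of case 3 with hypothetical design variables psu, stratid, and finalwgt and
the logistic model used in the examples below:
. svyset psu [pweight=finalwgt], strata(stratid)
. svy: logistic outcome i.sex i.group sex#group age
. margins sex group, vce(unconditional)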
When you use vce(unconditional), never specify if exp or in range on the margins
command; instead, specify the subpop(if exp) option. You do that for the usual reasons; see
[SVY] subpopulation estimation. If you specify over(varlist) to examine subgroups, the subgroups
will automatically be treated as subpopulations.
If you are using a replication-based variance estimator, you may want to use this method to estimate
the variance of your margins; see [SVY] svy postestimation.
Example 19: Inferences for populations, margins of response
In example 6, we fit the model
. logistic outcome i.sex i.group sex#group age
and we obtained margins by sex and margins by group,
. margins sex group
If our data were randomly drawn from the population of interest and we wanted to account for
this, we would have typed
. logistic outcome i.sex i.group sex#group age, vce(robust)
. margins sex group, vce(unconditional)
We do that below:
. logistic outcome i.sex i.group sex#group age, vce(robust)
(output omitted )
. margins sex group, vce(unconditional)
Predictive margins Number of obs = 3000
Expression : Pr(outcome), predict()
Unconditional
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1600644 .0131685 12.16 0.000 .1342546 .1858743
female .1966902 .0104563 18.81 0.000 .1761963 .2171841
group
1 .2251302 .0127069 17.72 0.000 .200225 .2500354
2 .150603 .0118399 12.72 0.000 .1273972 .1738088
3 .0736157 .0343188 2.15 0.032 .0063522 .1408793
The estimated margins are the same as they were in example 6, but the standard errors and
confidence intervals differ, although not by much. Given that we have 3,000 observations in our
randomly drawn sample, we should expect this.
Example 20: Inferences for populations, marginal effects
In example 17, we fit a logistic model and then obtained the average marginal effects for all
covariates by typing
. logistic outcome treatment##group age c.age#c.age treatment#c.age
. margins, dydx(*)
To repeat that and also obtain standard errors for our population, we would type
. logistic outcome treatment##group age c.age#c.age treatment#c.age,
> vce(robust)
. margins, dydx(*) vce(unconditional)
The results are
. logistic outcome treatment##group age c.age#c.age treatment#c.age, vce(robust)
(output omitted )
. margins, dydx(*) vce(unconditional)
Average marginal effects Number of obs = 3000
Expression : Pr(outcome), predict()
dy/dx w.r.t. : 1.treatment 2.group 3.group age
Unconditional
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
1.treatment .0385625 .0163872 2.35 0.019 .0064442 .0706808
group
2 -.0776906 .0179573 -4.33 0.000 -.1128863 -.0424949
3 -.1505652 .0411842 -3.66 0.000 -.2312848 -.0698456
age .0095868 .0007814 12.27 0.000 .0080553 .0111183
Note: dy/dx for factor levels is the discrete change from the base level.
Example 21: Inferences for populations with svyset data
See example 3 in [SVY] svy postestimation.
Standardizing margins
A standardized margin is the margin calculated on data different from the data used to fit the
model. Typically, the word standardized is reserved for situations in which the alternate population
is a reference population, which may be real or artificial, and which is treated as fixed.
Say that you work for a hospital and have fit a model of mortality on the demographic characteristics
of the hospital’s patients. At this stage, were you to type
. margins
you would obtain the mortality rate for your hospital. You have another dataset, hstandard.dta,
that contains demographic characteristics of patients across all hospitals along with the population
of each hospital recorded in the pop variable. You could obtain the expected mortality rate at your
hospital if your patients matched the characteristics of the standard population by typing
. use http://www.stata-press.com/data/r13/hstandard, clear
. margins [fw=pop], noesample
You specified noesample because the margin is being calculated on data other than the data used
to estimate the model. You specified [fw=pop] because the reference dataset you are using included
population counts, as many reference datasets do.
Obtaining margins as though the data were balanced
Here we discuss what are commonly called estimated marginal means or least-squares means.
These are margins assuming that all levels of factor variables are equally likely or, equivalently, that
the design is balanced. The seminal reference on these margins is Searle, Speed, and Milliken (1980).
In designed experiments, observations are often allocated in a balanced way so that the variances can
be easily compared and decomposed. At the Acme Portable Widget Company, they are experimenting
with a new machine. The machine has three temperature settings and two pressure settings; a
combination of settings will be optimal on any particular day, determined by the weather. At start-up,
one runs a quick test and chooses the optimal setting for the day. Across different days, each setting
will be used about equally, says the manufacturer.
In experiments with the machine, 10 widgets were collected for stress testing at each of the settings
over a six-week period. We wish to know the average stress-test value that can be expected from
these machines over a long period.
Balancing using asbalanced
The data were intended to be balanced, but unfortunately, the stress test sometimes destroys samples
before the stress can be measured. Thus even though the experiment was designed to be balanced,
the data are not balanced. You specify the asbalanced option to estimate the margins as if the data
were balanced. We will type
. use http://www.stata-press.com/data/r13/acmemanuf
. regress y pressure##temp
. margins, asbalanced
So that you can compare the asbalanced results with the observed results, we will also include
margins without the asbalanced option in what follows:
. use http://www.stata-press.com/data/r13/acmemanuf
. regress y pressure##temp
(output omitted )
. margins
Predictive margins Number of obs = 49
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 109.9214 1.422629 77.27 0.000 107.0524 112.7904
. margins, asbalanced
Adjusted predictions Number of obs = 49
Model VCE : OLS
Expression : Linear prediction, predict()
at : pressure (asbalanced)
temp (asbalanced)
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 115.3758 1.530199 75.40 0.000 112.2899 118.4618
Technical note
Concerning how asbalanced calculations are performed, if a factor variable has l levels, then
each level's coefficient contributes to the response weighted by 1/l. If two factors, a and b, interact,
then each coefficient associated with their interaction is weighted by 1/(l_a × l_b).
If a balanced factor interacts with a continuous variable, then each coefficient in the interaction is
applied to the value of the continuous variable, and the results are weighted equally. So, if the factor
being interacted has l_a levels, the effect of each coefficient on the value of the continuous covariate
is weighted by 1/l_a.
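For instance, under this weighting, a sketch of the balanced linear prediction for the acmemanuf
model above, where pressure has 2 levels, temp has 3, and base-level coefficients are 0, is
margin = β0 + (1/2) × (sum of pressure coefficients) + (1/3) × (sum of temp coefficients)
+ (1/6) × (sum of pressure#temp interaction coefficients)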
Balancing by standardization
To better understand the balanced results, we can perform the balancing ourselves by using the
standardizing method shown in Standardizing margins. To do that, we will input a balanced dataset
and then type margins, noesample.
. use http://www.stata-press.com/data/r13/acmemanuf
. regress y pressure##temp
(output omitted )
. drop _all
. input pressure temp
pressure temp
1. 1 1
2. 1 2
3. 1 3
4. 2 1
5. 2 2
6. 2 3
7. end
. margins, noesample
Predictive margins Number of obs = 6
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
_cons 115.3758 1.530199 75.40 0.000 112.2899 118.4618
We obtain the same results as previously.
Balancing nonlinear responses
If our testing had produced a binary outcome, say, acceptable/unacceptable, rather than a continuous
variable, we would type
. use http://www.stata-press.com/data/r13/acmemanuf, clear
. logistic acceptable pressure##temp
. margins, asbalanced
The result of doing that would be 0.680. If we omitted the asbalanced option, the result would
have been 0.667. The two results are so similar because acmemanuf.dta is nearly balanced.
Even though the asbalanced option can be used on both linear and nonlinear responses, such
as probabilities, there is an issue of which you should be aware. The most widely used formulas for
balancing responses apply the balancing to the linear prediction, average that as if it were balanced,
and then apply the nonlinear transform. That is the calculation that produced 0.680.
An alternative would be to apply the standardization method. That amounts to making the linear
predictions observation by observation, applying the nonlinear transform to each, and then averaging
the nonlinear result as if it were balanced. You could do that by typing
. use http://www.stata-press.com/data/r13/acmemanuf, clear
. logistic acceptable pressure##temp
. clear
. input pressure temp
(see above for entered data)
. margins, noesample
The result from the standardization procedure would be 0.672. These two ways of averaging
nonlinear responses are discussed in detail in Lane and Nelder (1982) within the context of general
linear models.
Concerning the method used by the asbalanced option, if your data start balanced and you have
a nonlinear response, you will get different results with and without the asbalanced option!
Treating a subset of covariates as balanced
So far, we have treated all the covariates as if they were balanced. margins will allow you to treat
a subset of the covariates as balanced, too. For instance, you might be performing an experiment in
which you are randomly allocating patients to a treatment arm and so want to balance on arm, but you
do not want to balance the other characteristics because you want mean effects for the experiment’s
population.
In this example, we will imagine that the outcome of the experiment is continuous. We type
. use http://www.stata-press.com/data/r13/margex, clear
. regress y arm##sex sex##agegroup
. margins, at((asbalanced) arm)
If we wanted results balanced on agegroup as well, we could type
. margins, at((asbalanced) arm agegroup)
If we wanted results balanced on all three covariates, we could type
. margins, at((asbalanced) arm agegroup sex)
or we could type
. margins, at((asbalanced) _factor)
or we could type
. margins, asbalanced
Using fvset design
As a convenience feature, equivalent to
. regress y arm##sex sex##agegroup
. margins, at((asbalanced) arm sex)
is
. fvset design asbalanced arm sex
. regress y arm##sex sex##agegroup
. margins
The advantage of the latter is that you have to set the variables as balanced only once. This is
useful when balancing is a design characteristic of certain variables and you wish to avoid accidentally
treating them as unbalanced.
If you save your data after fvsetting, the settings will be remembered in future sessions. If you
want to clear the setting(s), type
. fvset clear varlist
See [R] fvset.
Balancing in the presence of empty cells
The issue of empty cells is not exclusively an issue of balancing, but there are special considerations
when balancing. Empty cells are discussed generally in Estimability of margins.
An empty cell is an interaction of levels of two or more factor variables for which you have
no data. Usually, margins involving empty cells cannot be estimated. When balancing, there is an
alternate definition of the margin that allows the margin to be estimated. margins makes the alternate
calculation when you specify the emptycells(reweight) option. By default, margins uses the
emptycells(strict) option.
If you have empty cells in your data and you request margins involving the empty cells, those
margins will be marked as not estimable even if you specify the asbalanced option.
. use http://www.stata-press.com/data/r13/estimability, clear
(margins estimability)
. regress y sex##group
(output omitted )
. margins sex, asbalanced
Adjusted predictions Number of obs = 69
Model VCE : OLS
Expression : Linear prediction, predict()
at : sex (asbalanced)
group (asbalanced)
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
sex
male 21.91389 1.119295 19.58 0.000 19.67572 24.15206
female . (not estimable)
This example is discussed in Estimability of margins, although without the asbalanced option.
What is said there is equally relevant to the asbalanced case. For reasons explained there, the
margin for sex =1 (female) cannot be estimated.
The margin for sex =1 can be estimated in the asbalanced case if you are willing to
make an assumption. Remember that margins makes the balanced calculation by summing the
responses associated with the levels and then dividing by the number of levels. If you specify
emptycells(reweight),margins sums what is available and divides by the number available.
Thus you are assuming that, whatever the responses in the empty cells, those responses are such that
they would not change the overall mean of what is observed.
The results of specifying emptycells(reweight) are
. margins sex, asbalanced emptycells(reweight)
Adjusted predictions Number of obs = 69
Model VCE : OLS
Expression : Linear prediction, predict()
Empty cells : reweight
at : sex (asbalanced)
group (asbalanced)
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
sex
male 21.91389 1.119295 19.58 0.000 19.67572 24.15206
female 24.85185 1.232304 20.17 0.000 22.38771 27.316
Obtaining margins with nested designs
Introduction
Factors whose meaning depends on other factors are called nested factors, and the factors on which
their meaning depends are called the nesting factors. For instance, assume that we have a sample
of patients and each patient is assigned to one doctor. Then patient is nested within doctor. Let the
identifiers of the first 5 observations of our data be
Doctor Patient Name
1 1 Fred
1 2 Mary
1 3 Bob
2 1 Karen
2 2 Hank
The first patient on one doctor’s list has nothing whatsoever to do with the first patient on another
doctor’s list. The meaning of patient =1 is defined only when the value of doctor is supplied.
Nested factors enter into models as interactions of nesting and nested; the nested factor does not
appear by itself. We might estimate a model such as
. regress y . . . i.doctor doctor#patient . . .
You do not include i.patient because the coding for patient has no meaning except within
doctor. Patient 1 is Fred for doctor 1 and Karen for doctor 2, etc.
margins provides an option to help account for the structure of nested models. The within(varlist)
option specifies that margins estimate and report a set of margins for the value combinations of
varlist. We might type
. margins, within(doctor)
Margin calculations are performed first for doctor =1, then for doctor =2, and so on.
Sometimes you need to specify within(), and other times you do not. Let’s consider the particular
model
. regress y i.doctor doctor#patient i.sex sex#doctor#patient
The guidelines are the following:
1. You may compute overall margins by typing
margins.
2. You may compute overall margins within levels of a nesting factor by typing
margins, within(doctor).
3. You may compute margins of a nested factor within levels of its nesting factor by typing
margins patient, within(doctor).
4. You may compute margins of factors in your model, as long as the factor does not nest
other factors and is not nested within other factors, by typing
margins sex.
5. You may not compute margins of a nesting factor, such as margins doctor, because they
are not estimable.
For examples using within(), see [R] anova.
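For instance, a brief sketch applying guidelines 2 and 3 to the model above:
. regress y i.doctor doctor#patient i.sex sex#doctor#patient
. margins, within(doctor)
. margins patient, within(doctor)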
Margins with nested designs as though the data were balanced
To obtain margins with nested designs as though the data were balanced, the guidelines are the
same as above except that 1) you add the asbalanced option and 2) whenever you do not specify
within(), you specify emptycells(reweight). The updated guidelines are
1. You may compute overall margins by typing
margins, asbalanced emptycells(reweight).
2. You may compute overall margins within levels of a nesting factor by typing
margins, asbalanced within(doctor).
3. You may compute margins of a nested factor within levels of its nesting factor by typing
margins patient, asbalanced within(doctor).
4. You may compute margins of factors in your model, as long as the factor does not nest
other factors and is not nested within other factors, by typing
margins sex, asbalanced emptycells(reweight).
5. You may not compute margins of a nesting factor, such as margins doctor, because they
are not estimable.
Just as explained in Using fvset design, rather than specifying the asbalanced option, you may
set the balancing characteristic on the factor variables once and for all by using the command fvset
design asbalanced varlist.
Technical note
Specifying either emptycells(reweight) or within(varlist) causes margins to rebalance over
all empty cells in your model. If you have interactions in your model that are not involved in the
nesting, margins will lose its ability to detect estimability.
Technical note
Careful readers will note that the description of within(varlist) matches closely the description of
over(varlist). The concept of nesting is similar to the concept of subpopulations. within() differs
from over() in that it gracefully handles the missing cells when margins are computed as balanced.
Coding of nested designs
In the Introduction to this section, we showed a coding of the nested variable patient, where
the coding started over with each doctor:
Doctor Patient Name
1 1 Fred
1 2 Mary
1 3 Bob
2 1 Karen
2 2 Hank
That coding style is not required. The data could just as well have been coded
Doctor Patient Name
1 1 Fred
1 2 Mary
1 3 Bob
2 4 Karen
2 5 Hank
or even
Doctor Patient Name
1 1037239 Fred
1 2223942 Mary
1 0611393 Bob
2 4433329 Karen
2 6110271 Hank
Actually, either of the above two alternatives is better than the first one because margins will
be better able to give you feedback about estimability should you make a mistake following the
guidelines. On the other hand, both of these two alternatives require more memory at the estimation
step. If you run short of memory, you will need to recode your patient ID to the first coding style,
which you could do by typing
. sort doctor patient
. by doctor: gen newpatient = _n
Alternatively, you can set emptycells drop and continue to use your patient ID variable just as
it is coded. If you do this, we recommend that you remember to type set emptycells keep when
you are finished; margins is better able to determine estimability that way. If you regularly work
with large nested models, you can set emptycells drop, permanently so that the setting persists
across sessions. See [R] set emptycells.
Special topics
Requirements for model specification
The results that margins reports are based on the most recently fit model or, in Stata jargon, the
most recently issued estimation command. Here we discuss 1) mechanical requirements for how you
specify that estimation command, 2) work-arounds to use when those requirements cannot be met,
and 3) requirements for margins' predict(pred_opt) option to work.
Concerning 1, when you specify the estimation command, covariates that are logically factor
variables must be Stata factor variables, and that includes indicator variables, binary variables, and
dummies. It will not do to type
. regress y . . . female . . .
even if female is a 0/1 variable. You must type
. regress y . . . i.female . . .
If you violate this rule, you will not get incorrect results, but you will discover that you will be
unable to obtain margins on female:
. margins female
factor female not found in e(b)
r(111);
It is also important that if the same continuous variable appears in your model more than once,
differently transformed, those transforms be performed via Stata’s factor-variable notation. It will not
do to type
. generate age2 = age^2
. regress y . . . age age2 . . .
You must type
. regress y . . . age c.age#c.age . . .
You must do that because margins needs to know everywhere that variable appears in the model
if it is to be able to set covariates to fixed values.
Concerning 2, sometimes the transformations you desire may not be achievable using the factor-
variable notation; in those situations, there is a work-around. Let’s assume you wish to estimate
. generate age1_5 = age^1.5
. regress y . . . age age1_5 . . .
There is no factor-variable notation for including age and age^1.5 in a model, so obviously you
are going to obtain the estimates by typing just what we have shown. In what follows, it would be
okay if there are interactions of age and age1_5 with other variables specified by the factor-variable
notation, so the model could just as well be
. regress y . . . age age1_5 sex#c.age sex#c.age1_5 . . .
Let’s assume you have fit one of these two models. On any subsequent margins command where
you leave age free to vary, there will be no issue. You can type
. margins female
and results will be correct. Issues arise when you attempt to fix age at predetermined values. The
following would produce incorrect results:
. margins female, at(age=20)
The results would be incorrect because they leave age1_5 free to vary, and, logically, fixing age
implies that age1_5 should also be fixed. Because we were unable to state the relationship between
age and age1_5 using the factor-variable notation, margins does not know to fix age1_5 at 20^1.5
when it fixes age at 20. To get the correct results, you must fix the value of age1_5 yourself:
. margins female, at(age=20 age1_5=89.442719)
That command produces correct results. In the command, 89.442719 is 20^1.5.
In summary, when there is a functional relationship between covariates of your model and that
functional relationship is not communicated to margins via the factor-variable notation, then it
becomes your responsibility to ensure that all variables that are functionally related are set to the
appropriate fixed values when any one of them is set to a fixed value.
Concerning 3, we wish to amend our claim that you can calculate margins for anything that
predict will produce. We need to add a qualifier. Let us show you an example where the statement
is not true. After regress, predict will predict something it calls pr(a,b), which is the probability
that y lies between a and b. Yet if we attempted to use pr() with margins after estimation by regress, we would
obtain
. margins sex, predict(pr(10,20))
prediction is a function of possibly stochastic quantities other than e(b)
r(498);
What we should have stated was that you can calculate margins for anything that predict will
produce for which all the estimated quantities used in its calculation appear in e(V), the estimated
VCE. pr() is a function of β, the estimated coefficients, and of s², the estimated variance of the
residual. regress does not post the variance of the residual variance (sic) in e(V), or even estimate it,
and therefore, predict(pr(10,20)) cannot be specified with margins after estimation by regress.
It is unlikely that you will ever encounter these kinds of problems because there are so few
predictions where the components are not posted to e(V). If you do encounter the problem, the
solution may be to specify nose to suppress the standard-error calculation. If the problem is not with
computing the margin, but with computing its standard error, margins will report the result:
. margins sex, predict(pr(10,20)) nose
(output appears with SEs, tests, and CIs left blank)
Technical note
Programmers: If you run into this after running an estimation command that you have written, be
aware that as of Stata 11, you are supposed to set in e(marginsok) the list of options allowed with
predict that are okay to use with margins. When that list is not set, margins looks for violations
of its assumptions and, if it finds any, refuses to proceed.
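For example, a minimal sketch of how a community-contributed eclass command might declare
which predict options are safe for margins; the command name mycmd and the elided fitting steps
are hypothetical, and the options listed (here the default prediction and xb) depend on what your
predict program actually supports:
program define mycmd, eclass
        version 13
        // ... fit the model, then post b and V with ereturn post ...
        ereturn local marginsok "default XB"
end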
Estimability of margins
Sometimes margins will report that a margin cannot be estimated:
. use http://www.stata-press.com/data/r13/estimability, clear
(margins estimability)
. regress y sex##group
(output omitted )
. margins sex
Predictive margins Number of obs = 69
Model VCE : OLS
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
sex
male 21 .8500245 24.71 0.000 19.30027 22.69973
female . (not estimable)
In the above output, the margin for sex =0 (male) is estimated, but the margin for sex =1
(female) is not estimable. This occurs because of empty cells. An empty cell is an interaction of levels
of two or more factor variables for which you have no data. In the example, the lack of estimability
arises because we have two empty cells:
. table sex group
                       group
    sex        1      2      3      4      5
    male       2      9     27      8      2
    female     9      9      3
To calculate the marginal mean response for sex =1, we have no responses to average over for
group =4 and group =5. We obviously could calculate that mean for the observations that really
are sex =1, but remember, the marginal calculation for sex =1 treats everyone as if female, and
we will thus have 8 and 2 observations for which we have no basis for estimating the response.
There is no solution for this problem unless you are willing to treat the data as if it were balanced
and adjust your definition of a margin; see Balancing in the presence of empty cells.
Manipulability of tests
Manipulability is a problem that arises with some tests, and in particular, arises with Wald tests.
Tests of margins are based on Wald tests, hence our interest. This is a generic issue and not specific
to the margins command.
Let's understand the problem. Consider performing a test of whether some statistic φ is 0. Whatever
the outcome of that test, it would be desirable if the outcome were the same were we to test whether
sqrt(φ) were 0, or whether φ² were 0, or whether any other monotonic transform of φ were 0
(for φ², we were considering only the positive half of the number line). If a test does not have that
property, it is manipulable.
Wald tests are manipulable, and that means the tests produced by margins are manipulable. You
can see this for yourself by typing
. use http://www.stata-press.com/data/r13/margex, clear
. replace y = y - 65
. regress y sex##group
. margins, df(.)
. margins, expression(predict(xb)^2)
To compare the results from the two margins commands, we added the df(.) option to the
first one, forcing it to report a z statistic even though a t statistic would have been appropriate in
this case. We would prefer if the test against zero produced by margins, df(.) was equal to the
test produced by margins, expression(predict(xb)^2). But alas, they produce different results.
The first produces z = 12.93, and the second produces z = 12.57.
The difference is not much in our example, but behind the scenes, we worked to make it small.
We subtracted 65 from y so that the experiment would be for a case where it might be reasonable that
you would be testing against 0. One does not typically test whether the mean income in the United
States is zero or whether the mean blood pressure of live patients is zero. Had we left y as it was
originally, we would have obtained z = 190 and z = 96. We did not want to show that comparison
to you first because the mean of yis so far from 0 that you probably would never be testing it. The
corresponding difference in φ is tiny.
Regardless of the example, it is important that you base your tests in the metric where the
likelihood surface is most quadratic. For further discussion on manipulability, see Manipulability in
[R] predictnl.
This manipulability is not limited to Wald tests after estimation; you can also see the manipulability
of results produced by linear regression just by applying nonlinear transforms to a covariate (Phillips
and Park 1988; Gould 1996).
Using margins after the estimates use command
Assume you fit and used estimates save (see [R]estimates save) to save the estimation results:
. regress y sex##group age c.age#c.age if site==1
 . . .
. estimates save mymodel
(file mymodel.ster saved)
Later, perhaps in a different Stata session, you reload the estimation results by typing
. estimates use mymodel
You plan to use margins with the reloaded results. You must remember that margins bases its
results not only on the current estimation results but also on the current data in memory. Before you
can use margins, you must reload the dataset on which you fit the model or, if you wish to produce
standardized margins, some other dataset.
. use mydata, clear
(data for fitting models)
If the dataset you loaded contained the data for standardization, you can stop reading; you know
that to produce standardized margins, you need to specify the noesample option.
We reloaded the original data and want to produce margins for the estimation sample. In addition
to the data, margins requires that e(sample) be set, as margins will remind us:
. margins sex
e(sample) does not identify the estimation sample
r(322);
The best solution is to use estimates esample to rebuild e(sample):
. estimates esample: y sex group age if site==1
If we knew we had no missing values in yand the covariates, we could type
. estimates esample: if site==1
Either way, margins would now work:
. margins sex
(usual output appears)
There is an alternative. We do not recommend it, but we admit that we have used it. Rather
than rebuilding e(sample), you can use marginsnoesample option to tell margins to skip
using e(sample). You could then specify the appropriate if statement (if necessary) to identify the
estimation sample:
. estimates use mymodel
. use mydata, clear
(data for fitting models)
. margins sex if !missing(y, sex, group, age) & site==1, noesample
(usual output appears)
In the above, we are not really running on a sample different from the estimation sample; we are
merely using noesample to fool margins, and then we are specifying on the margins command
the conditions equivalent to re-create e(sample).
If we wish to obtain vce(unconditional) results, however, noesample will be insufficient. We
must also specify the force option,
. margins sex if !missing(y, sex, group, age) & site==1,
> vce(unconditional) noesample force
(usual output appears)
Regardless of the approach you choose (resetting e(sample) or specifying noesample and
possibly force), make sure you are right. In the vce(delta) case, you want to be right to ensure
that you obtain the results you want. In the vce(unconditional) case, you need to be right because
otherwise results will be statistically invalid.
Syntax of at()
In at(atspec), atspec may contain one or more of the following specifications:
varlist
(stat)varlist
varname =#
varname = (numlist)
varname = generate(exp)
where
1. varnames must be covariates in the previously fit model (estimation command).
2. Variable names (whether in varname or varlist) may be continuous variables, factor variables,
or specific level variables, such as age, group, or 3.group.
3. varlist may also be one of three standard lists:
a. _all (all covariates),
b. _factor (all factor-variable covariates), or
c. _continuous (all continuous covariates).
4. Specifications are processed from left to right with latter specifications overriding previous
ones.
5. stat can be any of the following:
stat           Description                                    Variables allowed
asobserved     at observed values in the sample (default)     all
mean           means (default for varlist)                    all
median         medians                                        continuous
p1             1st percentile                                 continuous
p2             2nd percentile                                 continuous
 . . .         3rd-49th percentiles                           continuous
p50            50th percentile (same as median)               continuous
 . . .         51st-97th percentiles                          continuous
p98            98th percentile                                continuous
p99            99th percentile                                continuous
min            minimums                                       continuous
max            maximums                                       continuous
zero           fixed at zero                                  continuous
base           base level                                     factors
asbalanced     all levels equally probable and sum to 1       factors
Any stat except zero, base, and asbalanced may be prefixed with an o to get the overall
statistic, that is, the statistic computed on the sample pooled over all over() groups. For example,
omean, omedian, and op25. Overall statistics differ from their correspondingly named statistics only
when the over() or within() option is specified. When no stat is specified, mean is assumed.
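For instance, a sketch combining several of these specifications with the covariates from the earlier
margex examples; in the first command, the (p25) specification for age overrides the earlier (mean)
specification because specifications are processed left to right:
. margins sex, at((mean) _all (p25) age)
. margins sex, at((mean) _all age=(20(10)60))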
Estimation commands that may be used with margins
margins may be used after most estimation commands.
margins cannot be used after estimation commands that do not produce full variance matrices,
such as exlogistic and expoisson (see [R] exlogistic and [R] expoisson).
margins is all about covariates and cannot be used after estimation commands that do not post
the covariates, which eliminates gmm (see [R] gmm).
margins cannot be used after estimation commands that have an odd data organization, and
that excludes asclogit, asmprobit, asroprobit, and nlogit (see [R] asclogit, [R] asmprobit,
[R] asroprobit, and [R] nlogit).
Video examples
Introduction to margins, part 1: Categorical variables
Introduction to margins, part 2: Continuous variables
Introduction to margins, part 3: Interactions
Glossary
adjusted mean. A margin when the response is the linear predictor from linear regression, ANOVA,
etc. For some authors, adjusting also implies adjusting for unbalanced data. See Obtaining margins
of responses and see Obtaining margins as though the data were balanced.
average marginal effect. See marginal effect and average marginal effect.
average partial effect. See partial effect and average partial effect.
conditional margin. A margin when the response is evaluated at fixed values of all the covariates.
If any covariates are left to vary, the margin is called a predictive margin.
effect. The effect of x is the derivative of the response with respect to covariate x, or it is the
difference in responses caused by a discrete change in x. Also see marginal effect.
The effect of x measures the change in the response for a change in x. Derivatives or differences
might be reported as elasticities. If x is continuous, the effect is measured continuously. If x is a
factor, the effect is measured with respect to each level of the factor and may be calculated as a
discrete difference or as a continuous change, as measured by the derivative. margins calculates
the discrete difference by default and calculates the derivative if the continuous option is specified.
elasticity and semielasticity. The elasticity of y with respect to x is d(ln y)/d(ln x) = (x/y) × (dy/dx),
which is approximately equal to the proportional change in y for a proportional change in x.
The semielasticity of y with respect to x is either 1) dy/d(ln x) = x × (dy/dx) or 2) d(ln y)/dx =
(1/y) × (dy/dx), which is approximately 1) the change in y for a proportional change in x or
2) the proportional change in y for a change in x.
empty cell. An interaction of levels of two or more factor variables for which you have no data. For
instance, you have sex interacted with group in your model, and in your data there are no females
in group 1. Empty cells affect which margins can be estimated; see Estimability of margins.
estimability. Estimability concerns whether a margin can be uniquely estimated (identified); see
Estimability of margins.
estimated marginal mean. This is one of the few terms that has the same meaning across authors.
An estimated marginal mean is a margin assuming the levels of each factor covariate are equally
likely (balanced), including interaction terms. This is obtained using margins' asbalanced
option. In addition, there is an alternate definition of estimated marginal mean in which margins
involving empty cells are redefined so that they become estimable. This is invoked by margins'
emptycells(reweight) option. See Balancing in the presence of empty cells.
least-squares mean. Synonym for estimated marginal mean.
margin. A statistic calculated from predictions or other statistics of a previously fit model at fixed
values of some covariates and averaging or otherwise integrating over the remaining covariates.
The prediction or other statistic on which the margin is based is called the response.
If all the covariates are fixed, then the margin is called a conditional margin. If any covariates are
left to vary, the margin is called a predictive margin.
1190 margins — Marginal means, predictive margins, and marginal effects
In this documentation, we divide margins on the basis of whether the statistic is a response or a
derivative of a response; see Obtaining margins of responses and Obtaining margins of derivatives
of responses.
marginal effect and average marginal effect. The marginal effect of xis the margin of the effect
of x. The term is popular with social scientists, and because of that, you might think the word
marginal in marginal effect means derivative because of terms like marginal cost and marginal
revenue. Marginal used in that way, however, refers to the derivative of revenue and the derivative
of cost; it refers to the numerator, whereas marginal effect refers to the denominator. Moreover,
effect is already a derivative or difference.
Some researchers interpret marginal in marginal effect to mean instantaneous, and thus a marginal
effect is the instantaneous derivative rather than the discrete first-difference, corresponding to
margins' continuous option. Researchers who use marginal in this way refer to the discrete
difference calculation of an effect as a partial effect.
Other researchers define marginal effect to be the margin when all covariates are held fixed and
the average marginal effect when some covariates are not fixed.
out-of-sample prediction. Predictions made in one dataset using the results from a model fit on
another. Sample here refers to the sample on which the model was fit, and out-of-sample refers
to the dataset on which the predictions are made.
partial effect and average partial effect. Some authors restrict the term marginal effect to mean
derivatives and use the term partial effect to denote discrete differences; see marginal effect and
average marginal effect.
population marginal mean. The theoretical (true) value that is estimated by estimated marginal mean.
We avoid this term because it can be confused with the concept of a population in survey statistics,
with which the population marginal mean has no connection.
posting results,posting margins. A Stata concept having to do with storing the results from the
margins command in e() so that those results can be used as if they were estimation results,
thus allowing the subsequent use of postestimation commands, such as test, testnl, lincom,
and nlcom (see [R] test, [R] testnl, [R] lincom, and [R] nlcom). This is achieved by specifying
margins' post option. See Example 10: Testing margins—contrasts of margins.
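For instance, a minimal sketch (the model and at() values are only illustrative) of testing posted margins:
. sysuse auto, clear
. logit foreign price mpg
. margins, at(mpg=20) at(mpg=30) post     // posts the two margins in e()
. lincom _b[2._at] - _b[1._at]            // difference between the posted margins
. test _b[2._at] = _b[1._at]              // Wald test of equality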
predictive margin. A margin in which all the covariates are not fixed. When all covariates are fixed,
it is called a conditional margin.
recycled prediction. A synonym for predictive margin.
response. A prediction or other statistic derived from combining the parameter estimates of a fitted
model with data or specified values on covariates. Derivatives of responses are themselves responses.
Responses are what we take margins of.
standardized margin. The margin calculated on data different from the data used to fit the model.
The term standardized is usually reserved for situations in which the alternate population is a
reference population, which may be real or artificial, and which is treated as fixed.
subpopulation. A subset of your sample that represents a subset of the population, such as the
males in a sample of people. In survey contexts when it is desired to account for sampling of the
covariates, standard errors for marginal statistics and effects need to account for both the population
and the subpopulation. This is accomplished by specifying the vce(unconditional) option and
one of the subpop() or over() options. In fact, the above is allowed even when your data are
not svyset because vce(unconditional) implies that the sample represents a population.
Stored results
margins stores the following in r():
Scalars
  r(N)                number of observations
  r(N_sub)            subpopulation observations
  r(N_clust)          number of clusters
  r(N_psu)            number of sampled PSUs, survey data only
  r(N_strata)         number of strata, survey data only
  r(df_r)             variance degrees of freedom, survey data only
  r(N_poststrata)     number of post strata, survey data only
  r(k_margins)        number of terms in marginlist
  r(k_by)             number of subpopulations
  r(k_at)             number of at() options
  r(level)            confidence level of confidence intervals
Macros
  r(cmd)              margins
  r(cmdline)          command as typed
  r(est_cmd)          e(cmd) from original estimation results
  r(est_cmdline)      e(cmdline) from original estimation results
  r(title)            title in output
  r(subpop)           subspec from subpop()
  r(model_vce)        vcetype from estimation command
  r(model_vcetype)    Std. Err. title from estimation command
  r(vce)              vcetype specified in vce()
  r(vcetype)          title used to label Std. Err.
  r(clustvar)         name of cluster variable
  r(margins)          marginlist
  r(predict_label)    label from predict()
  r(expression)       response expression
  r(xvars)            varlist from dydx(), dyex(), eydx(), or eyex()
  r(derivatives)      "", "dy/dx", "dy/ex", "ey/dx", or "ey/ex"
  r(over)             varlist from over()
  r(within)           varlist from within()
  r(by)               union of r(over) and r(within) lists
  r(by#)              interaction notation identifying the #th subpopulation
  r(atstats#)         the #th at() specification
  r(emptycells)       empspec from emptycells()
  r(mcmethod)         method from mcompare()
  r(mcadjustall)      adjustall or empty
Matrices
  r(b)                estimates
  r(V)                variance–covariance matrix of the estimates
  r(Jacobian)         Jacobian matrix
  r(_N)               sample size corresponding to each margin estimate
  r(at)               matrix of values from the at() options
  r(chainrule)        chainrule information from the fitted model
  r(error)            margin estimability codes; 0 means estimable, 8 means not estimable
  r(table)            matrix containing the margins with their standard errors, test statistics,
                      p-values, and confidence intervals
margins with the post option also stores the following in e():
Scalars
  e(N)                number of observations
  e(N_sub)            subpopulation observations
  e(N_clust)          number of clusters
  e(N_psu)            number of sampled PSUs, survey data only
  e(N_strata)         number of strata, survey data only
  e(df_r)             variance degrees of freedom, survey data only
  e(N_poststrata)     number of post strata, survey data only
  e(k_margins)        number of terms in marginlist
  e(k_by)             number of subpopulations
  e(k_at)             number of at() options
Macros
  e(cmd)              margins
  e(cmdline)          command as typed
  e(est_cmd)          e(cmd) from original estimation results
  e(est_cmdline)      e(cmdline) from original estimation results
  e(title)            title in estimation output
  e(subpop)           subspec from subpop()
  e(model_vce)        vcetype from estimation command
  e(model_vcetype)    Std. Err. title from estimation command
  e(vce)              vcetype specified in vce()
  e(vcetype)          title used to label Std. Err.
  e(clustvar)         name of cluster variable
  e(margins)          marginlist
  e(predict_label)    label from predict()
  e(expression)       prediction expression
  e(xvars)            varlist from dydx(), dyex(), eydx(), or eyex()
  e(derivatives)      "", "dy/dx", "dy/ex", "ey/dx", or "ey/ex"
  e(over)             varlist from over()
  e(within)           varlist from within()
  e(by)               union of e(over) and e(within) lists
  e(by#)              interaction notation identifying the #th subpopulation
  e(atstats#)         the #th at() specification
  e(emptycells)       empspec from emptycells()
  e(mcmethod)         method from mcompare()
  e(mcadjustall)      adjustall or empty
Matrices
  e(b)                estimates
  e(V)                variance–covariance matrix of the estimates
  e(Jacobian)         Jacobian matrix
  e(_N)               sample size corresponding to each margin estimate
  e(at)               matrix of values from the at() options
  e(chainrule)        chainrule information from the fitted model
Functions
  e(sample)           marks estimation sample
Methods and formulas
Margins are statistics calculated from predictions of a previously fit model at fixed values of
some covariates and averaging or otherwise integrating over the remaining covariates. There are many
names for the different statistics that margins can compute: estimated marginal means (see Searle,
Speed, and Milliken [1980]), predictive margins (see Graubard and Korn [2004]), marginal effects
(see Greene [2012]), and average marginal/partial effects (see Wooldridge [2010] and Bartus [2005]).
Methods and formulas are presented under the following headings:
Notation
Marginal effects
Fixing covariates and balancing factors
Estimable functions
Standard errors conditional on the covariates
Unconditional standard errors
Notation
Let θ be the vector of parameters in the current model fit, let z be a vector of covariate values, and
let f(z, θ) be a scalar-valued function returning the value of the predictions of interest. The following
table illustrates the parameters and default prediction for several of Stata's estimation commands.

  Command    θ                   z         f(z, θ)
  regress    β                   x         xβ
  cloglog    β                   x         1 − exp{−exp(xβ)}
  logit      β                   x         1/{1 + exp(−xβ)}
  poisson    β                   x         exp(xβ)
  probit     β                   x         Φ(xβ)
  biprobit   β1, β2, ρ           x1, x2    Φ2(x1β1, x2β2, ρ)
  mlogit     β1, β2, . . . , βk  x         exp(xβ1)/{Σi exp(xβi)}
  nbreg      β, lnα              x         exp(xβ)
Φ() and Φ2() are cumulative distribution functions: Φ() for the standard normal distribution and
Φ2() for the standard bivariate normal distribution.
margins computes estimates of

    p(θ) = (1/M_{S_p}) Σ_{j=1}^{M} δ_j(S_p) f(z_j, θ)

where δ_j(S_p) identifies elements within the subpopulation S_p (for the prediction of interest),

    δ_j(S_p) = 1 if j ∈ S_p, and δ_j(S_p) = 0 if j ∉ S_p

M_{S_p} is the subpopulation size,

    M_{S_p} = Σ_{j=1}^{M} δ_j(S_p)

and M is the population size.

Let θ̂ be the vector of parameter estimates. Then margins estimates p(θ) via

    p̂ = (1/w·) Σ_{j=1}^{N} δ_j(S_p) w_j f(z_j, θ̂)
where

    w· = Σ_{j=1}^{N} δ_j(S_p) w_j

δ_j(S_p) indicates whether observation j is in subpopulation S_p, w_j is the weight for the jth observation,
and N is the sample size.
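As a check, when no covariates are fixed and the weights are all 1, p̂ is simply the mean of the in-sample predictions; a minimal sketch using the margex data (the model is only illustrative):
. use http://www.stata-press.com/data/r13/margex, clear
. logit outcome i.sex age
. margins                           // overall predictive margin of Pr(outcome)
. predict double phat if e(sample)
. summarize phat                    // the mean equals the margin reported above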
Marginal effects
margins also computes marginal/partial effects. For the marginal effect of continuous covariate
x, margins computes

    p̂ = (1/w·) Σ_{j=1}^{N} δ_j(S_p) w_j h(z_j, θ̂)

where

    h(z, θ) = ∂f(z, θ)/∂x

The marginal effect for level k of factor variable A is the simple contrast (a.k.a. difference) comparing
its margin with the margin at the base level,

    h(z, θ) = f(z, θ | A = k) − f(z, θ | A = base)
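The factor-variable case can be traced by hand; a minimal sketch using the margex data (replicating the contrast via posted margins is only illustrative):
. use http://www.stata-press.com/data/r13/margex, clear
. logit outcome i.sex age
. margins, dydx(sex)                  // discrete change: sex = 1 versus base level 0
. margins, at(sex=0) at(sex=1) post   // margins with sex fixed at each level
. lincom _b[2._at] - _b[1._at]        // reproduces the dydx(sex) estimate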
Fixing covariates and balancing factors
margins controls the values in each z vector through the marginlist, the at() option, the atmeans
option, and the asbalanced and emptycells() options. Suppose z is composed of the elements
from the equation specification

    A##B x

where A is a factor variable with a levels, B is a factor variable with b levels, and x is a continuous
covariate. To simplify the notation for this discussion, assume the levels of A and B start with 1 and
are contiguous. Then

    z = (A1, . . . , Aa, B1, . . . , Bb, A1B1, A1B2, . . . , AaBb, x, 1)

where Ai, Bj, and AiBj represent the indicator values for the factor variables A and B and the
interaction A#B.
When factor A is in the marginlist, margins replaces A with i and then computes the mean of the
subsequent prediction, for i = 1, . . . , a. When the interaction term A#B is in the marginlist, margins
replaces A with i and B with j, and then computes the mean of the subsequent prediction, for all
combinations of i = 1, . . . , a and j = 1, . . . , b.
The at() option sets model covariates to fixed values. For example, at(x=15) causes margins
to temporarily set x to 15 for each observation in the dataset before computing any predictions.
Similarly, at((median) x) causes margins to temporarily set x to the median of x using the current
dataset.
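For example, a minimal sketch using the auto data (the model is only illustrative):
. sysuse auto, clear
. regress price mpg i.foreign
. margins foreign, at(mpg=15)         // mpg fixed at 15 for every observation
. margins foreign, at((median) mpg)   // mpg fixed at its sample median
. margins foreign, atmeans            // all covariates fixed at their sample means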
When factor variable A is specified as asbalanced, margins sets each Ai to 1/a. Thus each z
vector will look like

    z = (1/a, . . . , 1/a, B1, . . . , Bb, B1/a, B2/a, . . . , Bb/a, x, 1)

If B is also specified as asbalanced, then each Bj is set to 1/b, and each z vector will look like

    z = (1/a, . . . , 1/a, 1/b, . . . , 1/b, 1/ab, 1/ab, . . . , 1/ab, x, 1)

If emptycells(reweight) is also specified, then margins uses a different balancing weight for each
element of z, depending on how many empty cells the element is associated with. Let δij indicate
that the ijth cell of A#B was observed in the estimation sample,

    δij = 0 if A = i and B = j was an empty cell, and δij = 1 otherwise

For the grand margin, the affected elements of z and their corresponding balancing weights are

    Ai = (Σj δij) / (Σk Σj δkj)

    Bj = (Σi δij) / (Σi Σk δik)

    AiBj = δij / (Σk Σl δkl)

For the jth margin of B, the affected elements of z and their corresponding balancing weights are

    Ai = δij / (Σk δkj)

    Bl = 1 if l = j and not all δij are zero, and Bl = 0 otherwise

    AiBl = {δil / (Σk δkl)} Bl
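A corresponding sketch of the commands, using the margex data (the two-factor model stands in for the A##B specification above and is only illustrative):
. use http://www.stata-press.com/data/r13/margex, clear
. anova y sex##group
. margins sex, asbalanced                        // levels of group weighted equally
. margins sex, asbalanced emptycells(reweight)   // reweight if any sex#group cell is empty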
Estimable functions
The fundamental idea behind estimable functions is clearly defined in the statistical literature for
linear models; see Searle (1971). Assume that we are working with the following linear model:

    y = Xb + e

where y is an N×1 vector of responses, X is an N×p matrix of covariate values, b is a p×1 vector
of coefficients, and e is a vector of random errors. Assuming a constant variance for the random
errors, the normal equations for the least-squares estimator, b̂, are

    X'X b̂ = X'y

When X is not of full column rank, we will need a generalized inverse (g-inverse) of X'X to solve
for b̂. Let G be a g-inverse of X'X.
Searle (1971) defines a linear function of the parameters as estimable if it is identically equal to
some linear function of the expected values of the y vector. Let H = GX'X. Then this definition
simplifies to the following rule:

    zb is estimable if z = zH

margins generalizes this to nonlinear functions by assuming the prediction function f(z, θ) is a
function of one or more of the linear predictions from the equations in the model that θ represents,

    f(z, θ) = h(z1β1, z2β2, . . . , zkβk)

ziβi is considered estimable if zi = ziHi, where Hi = Gi Xi'Xi, Gi is a g-inverse for Xi'Xi, and
Xi is the matrix of covariates from the ith equation of the fitted model. margins considers p(θ) to
be estimable if every ziβi is estimable.
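The z = zH rule can be checked numerically in Mata; the following self-contained sketch (not part of margins itself) uses a deliberately rank-deficient design:
. mata:
:     X = (1, 1, 0 \ 1, 1, 0 \ 1, 0, 1)    // column 1 equals column 2 + column 3
:     G = pinv(cross(X, X))                // a g-inverse of X'X
:     H = G*cross(X, X)
:     z1 = (0, 1, -1)                      // difference of the two indicator coefficients
:     z1*H                                 // returns z1, so z1*b is estimable
:     z2 = (0, 1, 0)                       // a single indicator coefficient
:     z2*H                                 // differs from z2, so z2*b is not estimable
: end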
Standard errors conditional on the covariates
By default, margins uses the delta method to estimate the variance of p̂,

    V̂ar(p̂ | z) = v'Vv

where V is a variance estimate for θ̂ and

    v = ∂p̂/∂θ evaluated at θ = θ̂

This variance estimate is conditional on the z vectors used to compute the marginalized predictions.
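When every covariate is fixed, the margin is just a nonlinear function of the coefficients, so the delta-method result can be reproduced with nlcom; a minimal sketch using the margex data (the model and fixed values are only illustrative):
. use http://www.stata-press.com/data/r13/margex, clear
. logit outcome i.sex age
. margins, at(sex=1 age=30)                              // conditional margin
. nlcom invlogit(_b[_cons] + _b[1.sex] + 30*_b[age])     // same estimate and delta-method std. err.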
Unconditional standard errors
margins with the vce(unconditional) option uses linearization to estimate the unconditional
variance of θ̂. Linearization uses the variance estimator for the total of a score variable for p̂ as an
approximate estimator for Var(p̂); see [SVY] variance estimation. margins requires that the model
was fit using some form of linearized variance estimator and that predict, scores computes the
appropriate score values for the linearized variance estimator.
The score for p̂ from the jth observation is given by

    s_j = ∂p̂/∂w_j = −{δ_j(S_p)/w·} p̂ + {δ_j(S_p)/w·} f(z_j, θ̂) + (1/w·) Σ_{i=1}^{N} δ_i(S_p) w_i ∂f(z_i, θ̂)/∂w_j

The remaining partial derivative can be decomposed using the chain rule,

    ∂f(z_i, θ̂)/∂w_j = {∂f(z_i, θ)/∂θ |_{θ=θ̂}} {∂θ̂/∂w_j}'

This is the inner product of two vectors, the second of which is not a function of the i index. Thus
the score is

    s_j = −{δ_j(S_p)/w·} p̂ + {δ_j(S_p)/w·} f(z_j, θ̂) + {∂p̂/∂θ |_{θ=θ̂}} {∂θ̂/∂w_j}'

If θ̂ was derived from a system of equations (such as in linear regression or maximum likelihood
estimation), then θ̂ is the solution to

    G(θ) = Σ_{j=1}^{N} δ_j(S_m) w_j g(θ, y_j, x_j) = 0

where S_m identifies the subpopulation used to fit the model, g() is the model's gradient function,
and y_j and x_j are the values of the dependent and independent variables for the jth observation. We
can use linearization to derive a first-order approximation for ∂θ̂/∂w_j,

    G(θ̂) ≈ G(θ_0) + {∂G(θ)/∂θ |_{θ=θ_0}} (θ̂ − θ_0)

Let H be the Hessian matrix

    H = ∂G(θ)/∂θ |_{θ=θ_0}

Then

    θ̂ ≈ θ_0 + (−H)^{−1} G(θ_0)
and

    ∂θ̂/∂w_j ≈ (−H)^{−1} {∂G(θ)/∂w_j |_{θ=θ̂}} = (−H)^{−1} δ_j(S_m) g(θ̂, y_j, x_j)

The computed value of the score for p̂ for the jth observation is

    s_j = v'u_j

where

    v = [ −p̂/w· ,  1/w· ,  {∂p̂/∂θ̂}(−H)^{−1} ]'

and

    u_j = [ δ_j(S_p) ,  δ_j(S_p) f(z_j, θ̂) ,  δ_j(S_m) g(θ̂, y_j, x_j)' ]'

Thus the variance estimate for p̂ is

    V̂ar(p̂) = v' V̂ar(Û) v

where

    Û = Σ_{j=1}^{N} w_j u_j

margins uses the model-based variance estimates for (−H)^{−1} and the scores from predict for
g(θ̂, y_j, x_j).
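In practice, this means the model must have been fit with a linearization-based VCE (robust, cluster-robust, or svy linearized) before vce(unconditional) is requested; a minimal sketch using the margex data (the model is only illustrative):
. use http://www.stata-press.com/data/r13/margex, clear
. logit outcome i.sex age, vce(robust)    // linearized VCE; predict, scores is available
. margins sex, vce(unconditional)         // standard errors account for sampling of the covariates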
References
Bartus, T. 2005. Estimation of marginal effects using margeff. Stata Journal 5: 309–329.
Baum, C. F. 2010. Stata tip 88: Efficiently evaluating elasticities with the margins command. Stata Journal 10:
309–312.
Buis, M. L. 2010. Stata tip 87: Interpretation of interactions in nonlinear models. Stata Journal 10: 305–308.
Chang, I. M., R. Gelman, and M. Pagano. 1982. Corrected group prognostic curves and summary statistics. Journal
of Chronic Diseases 35: 669–674.
Cummings, P. 2011. Estimating adjusted risk ratios for matched and unmatched data: An update. Stata Journal 11:
290–298.
Gould, W. W. 1996. crc43: Wald test of nonlinear hypotheses after model estimation. Stata Technical Bulletin 29:
2–4. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 15–18. College Station, TX: Stata Press.
Graubard, B. I., and E. L. Korn. 2004. Predictive margins with survey data. Biometrics 55: 652–659.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Korn, E. L., and B. I. Graubard. 1999. Analysis of Health Surveys. New York: Wiley.
Lane, P. W., and J. A. Nelder. 1982. Analysis of covariance and standardization as instances of prediction. Biometrics
38: 613–621.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Newson, R. B. 2013. Attributable and unattributable risks and fractions and other scenario comparisons. Stata Journal
13: 672–698.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Searle, S. R. 1971. Linear Models. New York: Wiley.
Searle, S. R. 1997. Linear Models for Unbalanced Data. New York: Wiley.
Searle, S. R., F. M. Speed, and G. A. Milliken. 1980. Population marginal means in the linear model: An alternative
to least squares means. American Statistician 34: 216–221.
Williams, R. 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects.
Stata Journal 12: 308–331.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R]contrast Contrasts and linear hypothesis tests after estimation
[R]margins, contrast Contrasts of margins
[R]margins, pwcompare Pairwise comparisons of margins
[R]margins postestimation Postestimation tools for margins
[R]marginsplot Graph results from margins (profile plots, etc.)
[R]lincom Linear combinations of estimators
[R]nlcom Nonlinear combinations of estimators
[R]predict Obtain predictions, residuals, etc., after estimation
[R]predictnl Obtain nonlinear predictions, standard errors, etc., after estimation
[U] 20 Estimation and postestimation commands
Title
margins postestimation — Postestimation tools for margins
Description Remarks and examples Also see
Description
The following standard postestimation command is available after margins:
Command Description
marginsplot graph the results from margins: profile plots, interaction plots, etc.
For information on marginsplot, see [R] marginsplot.
The following standard postestimation commands are available after margins, post:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
Continuing with the example from Example 8: Margins of interactions in [R]margins, we use
the dataset and reestimate the logistic model of outcome:
. use http://www.stata-press.com/data/r13/margex
(Artificial data for margins)
. logistic outcome sex##group age
(output omitted )
We then estimate the margins for males and females and post the margins as estimation results
with a full VCE.
. margins sex, post
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1600644 .0125653 12.74 0.000 .1354368 .184692
female .1966902 .0100043 19.66 0.000 .1770821 .2162983
We can now use nlcom (see [R]nlcom) to estimate a risk ratio of females to males using the
average probabilities for females and males posted by margins:
. nlcom (risk_ratio: _b[1.sex] / _b[0.sex])
risk_ratio: _b[1.sex] / _b[0.sex]
Coef. Std. Err. z P>|z| [95% Conf. Interval]
risk_ratio 1.228819 .1149538 10.69 0.000 1.003514 1.454124
We could similarly estimate the average risk difference between females and males:
. nlcom (risk_diff: _b[1.sex] - _b[0.sex])
risk_diff: _b[1.sex] - _b[0.sex]
Coef. Std. Err. z P>|z| [95% Conf. Interval]
risk_diff .0366258 .0160632 2.28 0.023 .0051425 .068109
Also see
[R]margins Marginal means, predictive margins, and marginal effects
[R]marginsplot Graph results from margins (profile plots, etc.)
[U] 20 Estimation and postestimation commands
Title
margins, contrast — Contrasts of margins
Syntax Menu Description Suboptions
Remarks and examples Stored results Methods and formulas Reference
Also see
Syntax
margins [marginlist] [if] [in] [weight], contrast [margins options]
margins [marginlist] [if] [in] [weight], contrast(suboptions) [margins options]
where marginlist is a list of factor variables or interactions that appear in the current estimation results.
The variables may be typed with or without contrast operators, and you may use any factor-variable
syntax:
. margins sex##group, contrast
. margins sex##g.group, contrast
. margins sex@group, contrast
See the operators (op.) table in [R] contrast for the list of contrast operators. Contrast operators may
also be specified on the variables in margins' over() and within() options to perform contrasts
across the levels of those variables.
See [R] margins for the available margins options.
suboptions Description
Contrast
overall add a joint hypothesis test for all specified contrasts
lincom treat user-defined contrasts as linear combinations
atcontrast(op._at) apply the op. contrast operator to the groups defined by at()
atjoint test jointly across all groups defined by at()
overjoint test jointly across all levels of the unoperated over() variables
withinjoint test jointly across all levels of the unoperated within() variables
marginswithin perform contrasts within the levels of the unoperated terms in marginlist
cieffects show effects table with confidence intervals
pveffects show effects table with p-values
effects show effects table with confidence intervals and p-values
nowald suppress table of Wald tests
noatlevels report only the overall Wald test for terms that use the within @
or nested | operator
nosvyadjust compute unadjusted Wald tests for survey results
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics > Postestimation > Contrasts of margins
Description
margins with the contrast option or with contrast operators performs contrasts of margins. This
extends the capabilities of contrast to any of the nonlinear responses, predictive margins, or other
margins that can be estimated by margins.
Suboptions
 
Contrast
overall specifies that a joint hypothesis test over all terms be performed.
lincom specifies that user-defined contrasts be treated as linear combinations. The default is to require
that all user-defined contrasts sum to zero. (Summing to zero is part of the definition of a contrast.)
atcontrast(op._at) specifies that the op. contrast operator be applied to the groups defined by
the at() option(s). The default behavior, by comparison, is to perform tests and contrasts within
the groups defined by the at() option(s).
See example 6 in Remarks and examples.
atjoint specifies that joint tests be performed across all groups defined by the at() option. The
default behavior, by comparison, is to perform contrasts and tests within each group.
See example 5 in Remarks and examples.
overjoint specifies how unoperated variables in the over() option are treated.
Each variable in the over() option may be specified either with or without a contrast operator.
For contrast-operated variables, the specified contrast comparisons are always performed.
overjoint specifies that joint tests be performed across all levels of the unoperated variables.
The default behavior, by comparison, is to perform contrasts and tests within each combination of
levels of the unoperated variables.
See example 3 in Remarks and examples.
withinjoint specifies how unoperated variables in the within() option are treated.
Each variable in the within() option may be specified either with or without a contrast operator.
For contrast-operated variables, the specified contrast comparisons are always performed.
withinjoint specifies that joint tests be performed across all levels of the unoperated variables.
The default behavior, by comparison, is to perform contrasts and tests within each combination of
levels of the unoperated variables.
marginswithin specifies how unoperated variables in marginlist are treated.
Each variable in marginlist may be specified either with or without a contrast operator. For
contrast-operated variables, the specified contrast comparisons are always performed.
marginswithin specifies that contrasts and tests be performed within each combination of levels
of the unoperated variables. The default behavior, by comparison, is to perform joint tests across
all levels of the unoperated variables.
See example 4 in Remarks and examples.
cieffects specifies that a table containing a confidence interval for each individual contrast be
reported.
pveffects specifies that a table containing a p-value for each individual contrast be reported.
effects specifies that a single table containing a confidence interval and p-value for each individual
contrast be reported.
nowald suppresses the table of Wald tests.
noatlevels indicates that only the overall Wald test be reported for each term containing within or
nested (@or |) operators.
nosvyadjust is for use with svy estimation commands. It specifies that the Wald test be carried out
without the default adjustment for the design degrees of freedom. That is to say, the test is carried
out as W/k ∼ F(k, d) rather than as (d − k + 1)W/(kd) ∼ F(k, d − k + 1), where k is the
dimension of the test and d is the total number of sampled PSUs minus the total number of strata.
Remarks and examples
Remarks are presented under the following headings:
Contrasts of margins
Contrasts and the over() option
The overjoint suboption
The marginswithin suboption
Contrasts and the at() option
Estimating treatment effects with margins
Conclusion
Contrasts of margins
Example 1
Estimating contrasts of margins is as easy as adding a contrast operator to the variable name. Let's
review Example 2: A simple case after logistic of [R] margins. Variable sex is coded 0 for males
and 1 for females.
. use http://www.stata-press.com/data/r13/margex
. logistic outcome i.sex i.group
(output omitted )
. margins sex
Predictive margins Number of obs = 3000
Model VCE : OIM
Expression : Pr(outcome), predict()
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
sex
male .1286796 .0111424 11.55 0.000 .106841 .1505182
female .1905087 .0089719 21.23 0.000 .1729241 .2080933
The first margin, 0.13, is the average probability of a positive outcome, treating everyone as if
they were male. The second margin, 0.19, is the average probability of a positive outcome, treating
everyone as if they were female. We can compare females with males by rerunning margins and
adding a contrast operator:
. margins r.sex
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
df chi2 P>chi2
sex 1 16.61 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
sex
(female vs male) .0618291 .0151719 .0320927 .0915656
The r. prefix for sex is the reference-category contrast operator—see [R]contrast. (The default
reference category is zero, the lowest value of sex.) Contrast operators in a marginlist work just as
they do in the termlist of a contrast command.
The contrast estimate of 0.06 says that unconditional on group, females on average are about 6%
more likely than males to have a positive outcome. The chi-squared statistic of 16.61 shows that the
contrast is significantly different from zero.
You may be surprised that we did not need to include the contrast option to estimate our contrast.
If we had included the option, our output would not have changed:
. margins r.sex, contrast
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
df chi2 P>chi2
sex 1 16.61 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
sex
(female vs male) .0618291 .0151719 .0320927 .0915656
The contrast option is useful mostly for its suboptions, which control the output and how
contrasts are estimated in more complicated situations. But contrast may be specified on its own
(without contrast operators or suboptions) if we do not need estimates or confidence intervals:
. margins sex group, contrast
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
df chi2 P>chi2
sex 1 16.61 0.0000
group 2 225.76 0.0000
Each chi-squared statistic is a joint test of constituent contrasts. The test for group has two degrees
of freedom because group has three levels.
Contrasts and the over() option
Example 2
It is common to estimate margins at combinations of factor levels, and margins, contrast
includes several suboptions for contrasting such margins. Let’s fit a model with two categorical
predictors and their interaction:
. logistic outcome agegroup##group
Logistic regression Number of obs = 3000
LR chi2(8) = 520.64
Prob > chi2 = 0.0000
Log likelihood = -1105.7504 Pseudo R2 = 0.1906
outcome Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
agegroup
30-39 3.54191 2.226951 2.01 0.044 1.032882 12.14576
40+ 16.23351 9.61188 4.71 0.000 5.086452 51.80955
group
2 .834507 .5663738 -0.27 0.790 .2206611 3.15598
3 .2146729 .1772897 -1.86 0.062 .0425407 1.083303
agegroup#
group
30-39#2 .4426927 .3358505 -1.07 0.283 .1000772 1.958257
30-39#3 1.160885 1.103527 0.16 0.875 .1801543 7.480553
40+#2 .440672 .3049393 -1.18 0.236 .1135259 1.71055
40+#3 .4407912 .4034688 -0.89 0.371 .0733 2.650709
_cons .0379747 .0223371 -5.56 0.000 .0119897 .1202762
Each of agegroup and group has three levels. To compare each age group with the reference
category on the probability scale, we can again use margins with the r. contrast operator.
. margins r.agegroup
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
df chi2 P>chi2
agegroup
(30-39 vs 20-29) 1 10.04 0.0015
(40+ vs 20-29) 1 224.44 0.0000
Joint 2 238.21 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
agegroup
(30-39 vs 20-29) .044498 .0140448 .0169706 .0720253
(40+ vs 20-29) .2059281 .0137455 .1789874 .2328688
Our model includes an interaction, though, so it would be nice to estimate the contrasts separately
for each value of group. We need the over() option:
. margins r.agegroup, over(group)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : group
df chi2 P>chi2
agegroup@group
(30-39 vs 20-29) 1 1 6.94 0.0084
(30-39 vs 20-29) 2 1 1.18 0.2783
(30-39 vs 20-29) 3 1 3.10 0.0783
(40+ vs 20-29) 1 1 173.42 0.0000
(40+ vs 20-29) 2 1 57.77 0.0000
(40+ vs 20-29) 3 1 5.12 0.0236
Joint 6 266.84 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
agegroup@group
(30-39 vs 20-29) 1 .0819713 .0311208 .0209757 .142967
(30-39 vs 20-29) 2 .0166206 .0153309 -.0134275 .0466686
(30-39 vs 20-29) 3 .0243462 .0138291 -.0027583 .0514508
(40+ vs 20-29) 1 .3447797 .0261811 .2934658 .3960937
(40+ vs 20-29) 2 .1540882 .0202722 .1143554 .193821
(40+ vs 20-29) 3 .0470319 .0207774 .006309 .0877548
The effect of agegroup appears to be greatest for the first level of group.
Including a variable in the over() option is not equivalent to including the variable in the main
marginlist. The variables in the marginlist are manipulated in the analysis, so that we can measure, for
example, the effect of being in age group 3 and not age group 1. (The manipulation could be mimicked
by running replace and then predict, but the manipulations actually performed by margins do not
change the data in memory.) The variables in the over() option are not so manipulated—the values
of the over() variables are left as they were observed, and the marginlist variables are manipulated
separately for each observed over() group. For more information, see Do not specify marginlist
when you mean over() in [R]margins.
The overjoint suboption
Example 3
Each variable in an over() option may be specified with or without contrast operators. Our option
over(group) did not include a contrast operator, so margins estimated the contrasts separately for
each level of group. If we had instead specified over(r.group), we would have received differences
of the contrasts:
. margins r.agegroup, over(r.group)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : group
df chi2 P>chi2
group#agegroup
(2 vs 1) (30-39 vs 20-29) 1 3.55 0.0596
(2 vs 1) (40+ vs 20-29) 1 33.17 0.0000
(3 vs 1) (30-39 vs 20-29) 1 2.86 0.0906
(3 vs 1) (40+ vs 20-29) 1 79.36 0.0000
Joint 4 83.88 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
group#agegroup
(2 vs 1) (30-39 vs 20-29) -.0653508 .0346921 -.133346 .0026445
(2 vs 1) (40+ vs 20-29) -.1906915 .0331121 -.25559 -.1257931
(3 vs 1) (30-39 vs 20-29) -.0576251 .0340551 -.1243719 .0091216
(3 vs 1) (40+ vs 20-29) -.2977479 .0334237 -.3632572 -.2322385
The contrasts are double differences: the estimate of −0.19, for example, says that the difference
in the probability of success between age group 3 and age group 1 is smaller in group 2 than in
group 1. We can jointly test pairs of the double differences with the overjoint suboption:
. margins r.agegroup, over(group) contrast(overjoint)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : group
df chi2 P>chi2
group#agegroup
(joint) (30-39 vs 20-29) 2 3.62 0.1641
(joint) (40+ vs 20-29) 2 79.45 0.0000
Joint 4 83.88 0.0000
The contrast(overjoint) option overrides the default behavior of over() and requests joint
tests over the levels of the unoperated variable group. The chi-squared statistic of 3.62 tests that the
first and third contrasts from the previous table are jointly zero. The chi-squared statistic of 79.45
jointly tests the other pair of contrasts.
The marginswithin suboption
Example 4
Another suboption that may usefully be combined with over() is marginswithin. marginswithin
requests that contrasts be performed within the levels of unoperated variables in the main
marginlist, instead of performing them jointly across the levels. marginswithin affects only
unoperated variables because contrast operators take precedence over suboptions.
Let’s first look at the default behavior, which occurs when marginswithin is not specified:
. margins agegroup, over(r.group) contrast(effects)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : group
df chi2 P>chi2
group#agegroup
(2 vs 1) (joint) 2 33.94 0.0000
(3 vs 1) (joint) 2 83.38 0.0000
Joint 4 83.88 0.0000
Delta-method
Contrast Std. Err. z P>|z| [95% Conf. Interval]
group#
agegroup
(2 vs 1)
(30-39
vs
base) -.0653508 .0346921 -1.88 0.060 -.133346 .0026445
(2 vs 1)
(40+
vs
base) -.1906915 .0331121 -5.76 0.000 -.25559 -.1257931
(3 vs 1)
(30-39
vs
base) -.0576251 .0340551 -1.69 0.091 -.1243719 .0091216
(3 vs 1)
(40+
vs
base) -.2977479 .0334237 -8.91 0.000 -.3632572 -.2322385
Here agegroup in the main marginlist is an unoperated variable, so margins by default performs
joint tests across the levels of agegroup: the chi-squared statistic of 33.94, for example, jointly tests
whether the first two contrast estimates in the lower table differ significantly from zero.
When we specify marginswithin, the contrasts will instead be performed within the levels of
agegroup:
. margins agegroup, over(r.group) contrast(marginswithin effects)
Contrasts of predictive margins
Model VCE : OIM
Expression : Pr(outcome), predict()
over : group
df chi2 P>chi2
group@agegroup
(2 vs 1) 20-29 1 0.06 0.7991
(2 vs 1) 30-39 1 7.55 0.0060
(2 vs 1) 40+ 1 68.39 0.0000
(3 vs 1) 20-29 1 1.80 0.1798
(3 vs 1) 30-39 1 10.47 0.0012
(3 vs 1) 40+ 1 159.89 0.0000
Joint 6 186.87 0.0000
Delta-method
Contrast Std. Err. z P>|z| [95% Conf. Interval]
group@
agegroup
(2 vs 1)
20-29 -.0058686 .0230533 -0.25 0.799 -.0510523 .039315
(2 vs 1)
30-39 -.0712194 .0259246 -2.75 0.006 -.1220308 -.0204081
(2 vs 1)
40+ -.1965602 .0237688 -8.27 0.000 -.2431461 -.1499742
(3 vs 1)
20-29 -.0284991 .0212476 -1.34 0.180 -.0701436 .0131453
(3 vs 1)
30-39 -.0861243 .0266137 -3.24 0.001 -.1382862 -.0339624
(3 vs 1)
40+ -.326247 .0258009 -12.64 0.000 -.3768159 -.2756781
The joint tests in the top table have been replaced by one-degree-of-freedom tests, one for each
combination of the two reference comparisons and three levels of agegroup. The reference-category
contrasts for group have been performed within levels of agegroup.
Contrasts and the at() option
Example 5
The at() option of margins is used to set predictors to particular values. When at() is used,
contrasts are by default performed within each at() level:
. margins r.agegroup, at(group=(1/3))
Contrasts of adjusted predictions
Model VCE : OIM
Expression : Pr(outcome), predict()
1._at : group = 1
2._at : group = 2
3._at : group = 3
df chi2 P>chi2
agegroup@_at
(30-39 vs 20-29) 1 1 6.94 0.0084
(30-39 vs 20-29) 2 1 1.18 0.2783
(30-39 vs 20-29) 3 1 3.10 0.0783
(40+ vs 20-29) 1 1 173.42 0.0000
(40+ vs 20-29) 2 1 57.77 0.0000
(40+ vs 20-29) 3 1 5.12 0.0236
Joint 6 266.84 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
agegroup@_at
(30-39 vs 20-29) 1 .0819713 .0311208 .0209757 .142967
(30-39 vs 20-29) 2 .0166206 .0153309 -.0134275 .0466686
(30-39 vs 20-29) 3 .0243462 .0138291 -.0027583 .0514508
(40+ vs 20-29) 1 .3447797 .0261811 .2934658 .3960937
(40+ vs 20-29) 2 .1540882 .0202722 .1143554 .193821
(40+ vs 20-29) 3 .0470319 .0207774 .006309 .0877548
Our option at(group=(1/3)) manipulates the values of group and is therefore not equivalent
to over(group). We see that the reference-category contrasts for agegroup have been performed
within each at() level. For a similar example that uses the . at operator instead of the at() option,
see Contrasts of at() groups—discrete effects in [R]marginsplot.
The default within behavior of at() may be changed to joint behavior with the atjoint suboption:
. margins r.agegroup, at(group=(1/3)) contrast(atjoint)
Contrasts of adjusted predictions
Model VCE : OIM
Expression : Pr(outcome), predict()
1._at : group = 1
2._at : group = 2
3._at : group = 3
df chi2 P>chi2
_at#agegroup
(joint) (30-39 vs 20-29) 2 3.62 0.1641
(joint) (40+ vs 20-29) 2 79.45 0.0000
Joint 4 83.88 0.0000
Now the tests are performed jointly over the levels of group, the at() variable. The atjoint
suboption is the analogue for at() of the overjoint suboption from example 3.
Example 6
What if we would like to apply a contrast operator, like r., to the at() levels? It is not possible
to specify the operator inside the at() option. Instead, we need a new suboption, atcontrast():
. margins r.agegroup, at(group=(1/3)) contrast(atcontrast(r))
Contrasts of adjusted predictions
Model VCE : OIM
Expression : Pr(outcome), predict()
1._at : group = 1
2._at : group = 2
3._at : group = 3
df chi2 P>chi2
_at#agegroup
(2 vs 1) (30-39 vs 20-29) 1 3.55 0.0596
(2 vs 1) (40+ vs 20-29) 1 33.17 0.0000
(3 vs 1) (30-39 vs 20-29) 1 2.86 0.0906
(3 vs 1) (40+ vs 20-29) 1 79.36 0.0000
Joint 4 83.88 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
_at#agegroup
(2 vs 1) (30-39 vs 20-29) -.0653508 .0346921 -.133346 .0026445
(2 vs 1) (40+ vs 20-29) -.1906915 .0331121 -.25559 -.1257931
(3 vs 1) (30-39 vs 20-29) -.0576251 .0340551 -.1243719 .0091216
(3 vs 1) (40+ vs 20-29) -.2977479 .0334237 -.3632572 -.2322385
When we specify contrast(atcontrast(r)),margins will apply the r. reference-category
operator to the levels of group, the variable specified inside at(). The default reference category is
1, the lowest level of group.
Estimating treatment effects with margins
margins with the contrast option can also be used to estimate treatment effects in certain cases.
A treatment effect represents the change in an outcome variable that is attributable to a particular
event, controlling for all other factors that could affect the outcome. For example, we might want
to know how a person’s wage changes as a result of being in a union. Here the outcome variable
is the person’s wage, and the “event” is membership in a union. The treatment effect measures the
difference in a person’s wage as a result of being or not being in a union once we control for the
person’s educational background, level of experience, industry, and other factors.
In fact, Stata has an entire manual dedicated to estimators designed specifically for estimating
treatment effects; see the Stata Treatment-Effects Reference Manual. Here we show how margins can
be used to estimate treatment effects using the regression-adjustment estimator when the conditional
independence assumption is met; see [TE]teffects intro. Regression adjustment simply means that
we are going to use a regression model to predict the outcome variable, controlling for treatment
status and other characteristics. The conditional independence assumption implies that we have enough
variables in our dataset so that once we control for them in our regression model, the outcomes one
would obtain with and without treatment are independent of how treatment status is determined.
Example 7: Regression adjustment with a binary treatment variable
nlsw88.dta contains women's wages (wage) in dollars per hour, a binary variable indicating their
union status (union), years of experience (ttl_exp), and a variable, grade, indicating the number
of years of schooling completed. We want to know how being in a union (the treatment) affects
women's wages. Traditionally, a wage equation of the form

    ln wage_i = β0 + β1 union_i + β2 grade_i + β3 ttl_exp_i + β4 ttl_exp_i² + ε_i
would be fit. However, there are two shortcomings that we will improve upon. First, to avoid the
problem of predicting the level of a log-transformed dependent variable, we will use poisson with
the vce(robust) option to fit an exponential regression model; see Wooldridge (2010, sec. 18.2)
for background on this approach. Second, the previous equation implies that factors other than union
status have the same impact on wages for both union and nonunion workers. Regression-adjustment
estimators allow all the variables to have different impacts depending on the level of the treatment
variable, and we can accomplish that here using factor-variable notation. In Stata, we fit our model
by typing
. use http://www.stata-press.com/data/r13/nlsw88
(NLSW, 1988 extract)
. poisson wage i.union##(c.grade c.ttl_exp##c.ttl_exp), vce(robust)
note: you are responsible for interpretation of noncount dep. variable
Iteration 0: log pseudolikelihood = -4770.7957
Iteration 1: log pseudolikelihood = -4770.7693
Iteration 2: log pseudolikelihood = -4770.7693
Poisson regression Number of obs = 1876
Wald chi2(7) = 1047.11
Prob > chi2 = 0.0000
Log pseudolikelihood = -4770.7693 Pseudo R2 = 0.1195
Robust
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
union
union .8638376 .168233 5.13 0.000 .534107 1.193568
grade .0895252 .0056874 15.74 0.000 .0783782 .1006722
ttl_exp .0805737 .0114534 7.03 0.000 .0581255 .103022
c.ttl_exp#
c.ttl_exp -.0015502 .0004612 -3.36 0.001 -.0024541 -.0006463
union#
c.grade
union -.0310298 .0088259 -3.52 0.000 -.0483282 -.0137314
union#
c.ttl_exp
union -.0404226 .0230113 -1.76 0.079 -.085524 .0046788
union#
c.ttl_exp#
c.ttl_exp
union .0011808 .0008428 1.40 0.161 -.0004711 .0028327
_cons .017488 .0893602 0.20 0.845 -.1576547 .1926308
To see how union status affects wages, we can use margins:
. margins r.union, vce(unconditional)
Contrasts of predictive margins
Expression : Predicted number of events, predict()
df chi2 P>chi2
union 1 26.22 0.0000
Unconditional
Contrast Std. Err. [95% Conf. Interval]
union
(union vs nonunion) 1.004119 .1960944 .6197815 1.388457
The estimated contrast 1.004 indicates that on average, belonging to a union causes a woman’s wage
to be slightly more than a dollar higher than if she were not in the union. This estimated contrast is
called the average treatment effect (ATE). Conceptually, we predicted the wage of each woman in the
estimation sample assuming she was in a union and obtained the sample mean. We then predicted
each woman’s wage assuming she was not in a union and obtained that sample mean. The difference
between these two sample means represents the ATE.
We obtain essentially the same results by using teffects ra:
. teffects ra (wage c.grade c.ttl_exp##c.ttl_exp, poisson) (union)
Iteration 0: EE criterion = 2.611e-13
Iteration 1: EE criterion = 1.098e-26
Treatment-effects estimation Number of obs = 1876
Estimator : regression adjustment
Outcome model : Poisson
Treatment model: none
Robust
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
ATE
union
(union
vs
nonunion) 1.004119 .1960421 5.12 0.000 .619884 1.388355
POmean
union
nonunion 7.346493 .1096182 67.02 0.000 7.131645 7.561341
The point estimates of the ATE are identical to those we obtained using margins, though the standard
errors differ slightly from those reported by margins. The standard errors from the two estimators
are, however, asymptotically equivalent, meaning they would coincide with a sufficiently large dataset.
The last statistic in this output table indicates the untreated potential-outcome mean (untreated POM),
which is the mean predicted wage assuming each woman did not belong to a union.
If we specify the pomeans option with teffects ra, we can obtain both the treated and the
untreated POMs, which represent the predicted mean wages assuming all women were or were not in
the union:
. teffects ra (wage c.grade c.ttl_exp##c.ttl_exp, poisson) (union), pomeans
Iteration 0: EE criterion = 2.611e-13
Iteration 1: EE criterion = 1.098e-26
Treatment-effects estimation Number of obs = 1876
Estimator : regression adjustment
Outcome model : Poisson
Treatment model: none
Robust
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
POmeans
union
nonunion 7.346493 .1096182 67.02 0.000 7.131645 7.561341
union 8.350612 .1757346 47.52 0.000 8.006179 8.695046
Notice that the difference between these two POMs equals 1.004119, which is the ATE we obtained
earlier.
In some applications, the average treatment effect of the treated (ATET) is more germane than
the ATE. For example, if the untreated subjects in the sample could not possibly receive treatment
(perhaps because a medical condition precludes their taking an experimental drug), then considering
the counterfactual outcome had those subjects taken the drug may not be relevant. In these cases,
the ATET is a better statistic because it measures the effect of the treatment only for those subjects
who actually did receive treatment. Like the ATE, the ATET involves computing predicted outcomes
for each treatment level, obtaining the sample means, and computing the difference between those
two means. Unlike the ATE, however, we only use observations corresponding to treated subjects.
Example 8: Regression adjustment with a binary treatment variable (continued)
Here we calculate the ATET of union membership, first using margins. Because teffects ra
overwrote our estimation results, we first quietly refit our poisson model. We then call margins to
obtain the ATET:
. quietly poisson wage i.union##(c.grade c.ttl_exp##c.ttl_exp), vce(robust)
. margins r.union, subpop(union) vce(unconditional)
Contrasts of predictive margins
Expression : Predicted number of events, predict()
df chi2 P>chi2
union 1 18.86 0.0000
Unconditional
Contrast Std. Err. [95% Conf. Interval]
union
(union vs nonunion) .901419 .2075863 .4945574 1.308281
The key here was specifying the subpop(union) option to restrict margins computations to those
women who are union members. The results indicate that being in the union causes the union members’
wages to be about 90 cents higher than they would otherwise be.
To replicate these results using teffects ra, we include the atet option to obtain ATETs:
. teffects ra (wage c.grade c.ttl_exp##c.ttl_exp, poisson) (union), atet
Iteration 0: EE criterion = 2.611e-13
Iteration 1: EE criterion = 9.324e-27
Treatment-effects estimation Number of obs = 1876
Estimator : regression adjustment
Outcome model : Poisson
Treatment model: none
Robust
wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
ATET
union
(union
vs
nonunion) .901419 .2075309 4.34 0.000 .4946658 1.308172
POmean
union
nonunion 7.776417 .162121 47.97 0.000 7.458665 8.094168
We obtain the same point estimate of the effect of union status as with margins. As before, the
standard errors differ slightly between the two estimators, but they are asymptotically equivalent. The
output also indicates that among the women who are in a union, their average wage would be $7.78
if they were not in a union.
Technical note
One advantage of the ATET over the ATE is that the ATET can be consistently estimated with slightly
weaker assumptions than are required to consistently estimate the ATE. See Comparing the ATE and
ATET in Remarks and examples of [TE]teffects intro advanced.
Both margins and teffects can estimate treatment effects using regression adjustment, so which
should you use? In addition to regression adjustment, the teffects command implements other
estimators of treatment effects; some of these estimators possess desirable robustness properties that
we cannot replicate using margins. Moreover, all the teffects estimators use a common syntax
and automatically present the estimated treatment effects, whereas we must first fit our own regression
model and then call margins to obtain the treatment effects.
On the other hand, particularly with the at() option, margins gives us more flexibility in
specifying our scenarios. The teffects commands allow us to measure the effect of a single binary
or multinomial treatment, but we can have margins compute the effects of arbitrary interventions,
as we illustrate in the next example.
Example 9: Interventions involving multiple variables
Suppose we want to see how women’s wages would be affected if we could increase each woman’s
education level by one year. That is, we want to measure the treatment effect of an additional year of
schooling. We assume that if a woman attains another year of schooling, she cannot simultaneously
work. Thus an additional year of education implies her total work experience must decrease by a
year. The flexible at() option of margins allows us to manipulate both variables at once:
. quietly poisson wage i.union##(c.grade c.ttl_exp##c.ttl_exp), vce(robust)
. margins, at((asobserved) _all)
> at(grade=generate(grade+1) ttl_exp=generate(ttl_exp-1))
> contrast(atcontrast(r._at))
Contrasts of predictive margins
Model VCE : Robust
Expression : Predicted number of events, predict()
1._at : (asobserved)
2._at : grade = grade+1
ttl_exp = ttl_exp-1
df chi2 P>chi2
_at 1 58.53 0.0000
Delta-method
Contrast Std. Err. [95% Conf. Interval]
_at
(2 vs 1) .3390392 .0443161 .2521813 .4258971
The first at() option instructs margins to obtain predicted wages for all women in the sample
using their existing values for grade and ttl exp and to record the mean of those predictions. The
second at() option instructs margins to obtain the mean predicted wage under the counterfactual
scenario where each woman’s education level is increased by one year and total work experience
is simultaneously decreased by one year. The contrast() option instructs margins to compute
the difference between the two means. The output indicates that increasing education by one year,
which will necessarily decrease work experience by the same amount, will cause the average wage
to increase by about 34 cents per hour, a statistically significant amount.
Conclusion
margins, contrast is a powerful command, and its abundance of suboptions may seem daunting.
The suboptions are in the service of only three goals, however. There are three things that margins,
contrast can do with a factor variable or a set of at() definitions:
1. Perform contrasts across the levels of the factor or set (as in example 1).
2. Perform a joint test across the levels of the factor or set (as in example 5).
3. Perform other tests and contrasts within each level of the factor or set (as in example 4).
The default behavior for variables specified inside at(),over(), and within() is to perform
contrasts within groups; the default behavior for variables in the marginlist is to perform joint tests
across groups.
Stored results
margins, contrast stores the following additional results in r():
Scalars
  r(k_terms)    number of terms participating in contrasts
Macros
  r(cmd)        contrast
  r(cmd2)       margins
  r(overall)    overall or empty
Matrices
  r(L)          matrix of contrasts applied to the margins
  r(chi2)       vector of χ² statistics
  r(p)          vector of p-values corresponding to r(chi2)
  r(df)         vector of degrees of freedom corresponding to r(p)
margins, contrast with the post option also stores the following additional results in e():
Scalars
  e(k_terms)    number of terms participating in contrasts
Macros
  e(cmd)        contrast
  e(cmd2)       margins
  e(overall)    overall or empty
Matrices
  e(L)          matrix of contrasts applied to the margins
  e(chi2)       vector of χ² statistics
  e(p)          vector of p-values corresponding to e(chi2)
  e(df)         vector of degrees of freedom corresponding to e(p)
Methods and formulas
See Methods and formulas in [R]margins and Methods and formulas in [R]contrast.
Reference
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Also see
[R]contrast Contrasts and linear hypothesis tests after estimation
[R]lincom Linear combinations of estimators
[R]margins Marginal means, predictive margins, and marginal effects
[R]margins postestimation Postestimation tools for margins
[R]margins, pwcompare Pairwise comparisons of margins
[R]pwcompare Pairwise comparisons
Title
margins, pwcompare — Pairwise comparisons of margins
Syntax Menu Description Suboptions
Remarks and examples Stored results Methods and formulas Also see
Syntax
margins [marginlist] [if] [in] [weight], pwcompare [margins options]
margins [marginlist] [if] [in] [weight], pwcompare(suboptions) [margins options]
where marginlist is a list of factor variables or interactions that appear in the current estimation results.
The variables may be typed with or without the i. prefix, and you may use any factor-variable syntax:
. margins i.sex i.group i.sex#i.group, pwcompare
. margins sex group sex#i.group, pwcompare
. margins sex##group, pwcompare
See [R] margins for the available margins options.
suboptions Description
Pairwise comparisons
cieffects show effects table with confidence intervals; the default
pveffects show effects table with p-values
effects show effects table with confidence intervals and p-values
cimargins show table of margins and confidence intervals
groups show table of margins and group codes
sort sort the margins or contrasts in each term
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics > Postestimation > Pairwise comparisons of margins
Description
margins with the pwcompare option performs pairwise comparisons of margins. margins,
pwcompare extends the capabilities of pwcompare to any of the nonlinear responses, predictive
margins, or other margins that can be estimated by margins.
Suboptions
 
Pairwise comparisons
cieffects specifies that a table of the pairwise comparisons with their standard errors and confidence
intervals be reported. This is the default.
pveffects specifies that a table of the pairwise comparisons with their standard errors, test statistics,
and p-values be reported.
effects specifies that a table of the pairwise comparisons with their standard errors, test statistics,
p-values, and confidence intervals be reported.
cimargins specifies that a table of the margins with their standard errors and confidence intervals
be reported.
groups specifies that a table of the margins with their standard errors and group codes be reported.
Margins with the same letter in the group code are not significantly different at the specified
significance level.
sort specifies that the reported tables be sorted on the margins or contrasts in each term.
Remarks and examples
You should be familiar with the concepts and syntax of both margins and pwcompare before using
the pwcompare option of margins. These remarks build on those in [R] margins and [R] pwcompare.
margins can perform pairwise comparisons of any of the margins that it estimates.
We begin by fitting a logistic regression model using the NHANES II dataset, ignoring the complex
survey nature of the data. Our dependent variable is highbp, an indicator for whether a person has
high blood pressure. We fit an interacted model including two factor variables representing the region
of the country as well as the continuous covariate bmi.
. use http://www.stata-press.com/data/r13/nhanes2
. logistic highbp region##c.bmi
(output omitted )
By default, margins will compute the predictive margins of the probability of a positive outcome
for each of the terms in marginlist after logistic regression. We will margin on region so that
margins will estimate the average predicted probabilities of having high blood pressure conditional
on being in each of the four regions and unconditional on BMI. We can specify the pwcompare option
to obtain all possible pairwise comparisons of these predictive margins:
. margins region, pwcompare
Pairwise comparisons of predictive margins
Model VCE : OIM
Expression : Pr(highbp), predict()
Delta-method Unadjusted
Contrast Std. Err. [95% Conf. Interval]
region
MW vs NE -.0377194 .0133571 -.0638987 -.01154
S vs NE -.0156843 .0133986 -.041945 .0105764
W vs NE -.006873 .0136595 -.0336451 .019899
S vs MW .0220351 .0124564 -.0023789 .0464492
W vs MW .0308463 .0127366 .0058831 .0558096
W vs S .0088112 .0127801 -.0162373 .0338598
This table gives each of the pairwise differences with confidence intervals. We can see that the
confidence interval in the row labeled MW vs NE does not include 0. At the 5% level, the predictive
margins for the first and second regions, the Northeast and the Midwest, are significantly different.
The same is true of the second and fourth regions, the Midwest and the West. With many pairwise
comparisons, output in this format can be difficult to sort through. We can organize it by adding the
group suboption:
. margins region, pwcompare(group)
Pairwise comparisons of predictive margins
Model VCE : OIM
Expression : Pr(highbp), predict()
Delta-method Unadjusted
Margin Std. Err. Groups
region
NE .4388358 .010069 B
MW .4011164 .0087764 A
S .4231516 .0088395 AB
W .4319628 .0092301 B
Note: Margins sharing a letter in the group label
are not significantly different at the 5%
level.
The group output includes the predictive margins for each region and letters denoting margins
that are not significantly different from one another. In this case, the Northeast (NE), South (S), and
West (W) regions have the letter B in the “Unadjusted Groups” column. The letter B indicates that the
average predicted probability for the Northeast region is not significantly different from the average
predicted probabilities for the South and West regions at the 5% significance level. The Midwest (MW)
region does not share a letter with either the Northeast region or the West region, which indicates that the
average predicted probability for the Midwest region is significantly different from that for each of the other
two regions at our 5% level.
We can also include the mcompare(bonferroni) option to perform tests using Bonferroni’s
method to account for making multiple comparisons.
. margins region, pwcompare(group) mcompare(bonferroni)
Pairwise comparisons of predictive margins
Model VCE : OIM
Expression : Pr(highbp), predict()
Number of
Comparisons
region 6
Delta-method Bonferroni
Margin Std. Err. Groups
region
NE .4388358 .010069 B
MW .4011164 .0087764 A
S .4231516 .0088395 AB
W .4319628 .0092301 AB
Note: Margins sharing a letter in the group label
are not significantly different at the 5%
level.
We now see the letter A on the row corresponding to the West region. At the 5% level and with
Bonferroni’s adjustment, the predictive margins for the probability in the Midwest and West regions
are not significantly different.
Stored results
margins, pwcompare stores the following additional results in r():
Scalars
r(k_terms) number of terms participating in pairwise comparisons
Macros
r(cmd) pwcompare
r(cmd2) margins
r(group#) group code for the #th margin in r(b)
r(mcmethod_vs) method from mcompare()
r(mctitle_vs) title for method from mcompare()
r(mcadjustall_vs) adjustall or empty
Matrices
r(b) margin estimates
r(V) variance–covariance matrix of the margin estimates
r(b_vs) margin difference estimates
r(V_vs) variance–covariance matrix of the margin difference estimates
r(error_vs) margin difference estimability codes; 0 means estimable, 8 means not estimable
r(table_vs) matrix containing the margin differences with their standard errors, test statistics,
p-values, and confidence intervals
r(L) matrix that produces the margin differences
margins, pwcompare with the post option also stores the following additional results in e():
Scalars
e(k_terms) number of terms participating in pairwise comparisons
Macros
e(cmd) pwcompare
e(cmd2) margins
Matrices
e(b) margin estimates
e(V) variance–covariance matrix of the margin estimates
e(b_vs) margin difference estimates
e(V_vs) variance–covariance matrix of the margin difference estimates
e(error_vs) margin difference estimability codes; 0 means estimable, 8 means not estimable
e(L) matrix that produces the margin differences
Methods and formulas
See Methods and formulas in [R] margins and Methods and formulas in [R] pwcompare.
Also see
[R] contrast — Contrasts and linear hypothesis tests after estimation
[R] margins — Marginal means, predictive margins, and marginal effects
[R] margins, contrast — Contrasts of margins
[R] margins postestimation — Postestimation tools for margins
[R] pwcompare — Pairwise comparisons
Title
marginsplot — Graph results from margins (profile plots, etc.)
Syntax Menu Description
Options Remarks and examples Addendum: Advanced uses of dimlist
Acknowledgments References Also see
Syntax
marginsplot [, options]
options Description
Main
xdimension(dimlist[, dimopts])     use dimlist to define the x axis
plotdimension(dimlist[, dimopts])  create plots for groups in dimlist
bydimension(dimlist[, dimopts])    create subgraphs for groups in dimlist
graphdimension(dimlist[, dimopts]) create graphs for groups in dimlist
horizontal                         swap the x and y axes
noci                               do not plot confidence intervals
name(name | stub[, replace])       name of graph, or stub if multiple graphs
Labels
allxlabels                         place ticks and labels on the x axis for each value
nolabels                           label groups with their values, not their labels
allsimplelabels                    forgo variable name and equal signs in all labels
nosimplelabels                     include variable name and equal signs in all labels
separator(string)                  separator for labels when multiple variables are specified
                                     in a dimension
noseparator                        do not use a separator
Plot
plotopts(plot options)             affect rendition of all margin plots
plot#opts(plot options)            affect rendition of #th margin plot
recast(plottype)                   plot margins using plottype
CI plot
ciopts(rcap options)               affect rendition of all confidence interval plots
ci#opts(rcap options)              affect rendition of #th confidence interval plot
recastci(plottype)                 plot confidence intervals using plottype
mcompare(method)                   adjust for multiple comparisons
level(#)                           set confidence level
Pairwise
unique                             plot only unique pairwise comparisons
csort                              sort comparison categories first
Add plots
addplot(plot)                      add other plots to the graph
Y axis, X axis, Titles, Legend, Overall, By
twoway options                     any options documented in [G-3] twoway options
byopts(byopts)                     how subgraphs are combined, labeled, etc.
where dimlist may be any of the dimensions across which margins were computed in the immediately preceding
margins command; see [R] margins. That is to say, dimlist may be any variable used in the margins command,
including variables specified in the at(), over(), and within() options. More advanced specifications of dimlist
are covered in Addendum: Advanced uses of dimlist.
dimopts Description
labels(lablist)                    list of quoted strings to label each level of the dimension
elabels(elablist)                  list of enumerated labels
nolabels                           label groups with their values, not their labels
allsimplelabels                    forgo variable name and equal signs in all labels
nosimplelabels                     include variable name and equal signs in all labels
separator(string)                  separator for labels when multiple variables are specified
                                     in the dimension
noseparator                        do not use a separator
where lablist is defined as
    "label" ["label" [. . .]]
elablist is defined as
    # "label" [# "label" [. . .]]
and the #s are the indices of the levels of the dimension: 1 is the first level, 2 is the second level,
and so on.
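For instance, assuming the preceding margins command computed margins over the two levels of a
variable such as sex and that sex identifies the plots, either suboption below could relabel the plots;
the new labels here are purely illustrative:
. marginsplot, plotdimension(sex, labels("Men" "Women"))
. marginsplot, plotdimension(sex, elabels(2 "Women"))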
plot options Description
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
cline options change look of the line
method Description
noadjust                           do not adjust for multiple comparisons
bonferroni [adjustall]             Bonferroni's method; adjust across all terms
sidak [adjustall]                  Šidák's method; adjust across all terms
scheffe                            Scheffé's method
Menu
Statistics > Postestimation > Margins plots and profile plots
Description
marginsplot graphs the results of the immediately preceding margins command; see [R]margins.
Common names for some of the graphs that marginsplot can produce are profile plots and interaction
plots.
Options
 
Main
xdimension(), plotdimension(), bydimension(), and graphdimension() specify the variables
from the preceding margins command whose group levels will be used for the graph's x axis,
plots, by() subgraphs, and graphs.
marginsplot chooses default dimensions based on the margins command. In most cases, the
first variable appearing in an at() option and evaluated over more than one value is used for
the x axis. If no at() variable meets this condition, the first variable in the marginlist is usually
used for the x axis and the remaining variables determine the plotted lines or markers. Pairwise
comparisons and graphs of marginal effects (derivatives) have different defaults. In all cases, you
may override the defaults and explicitly control which variables are used on each dimension of
the graph by using these dimension options.
Each of these options supports suboptions that control the labeling of the dimension: axis labels
for xdimension(), plot labels for plotdimension(), subgraph titles for bydimension(), and
graph titles for graphdimension().
For examples using the dimension options, see Controlling the graph’s dimensions.
xdimension(dimlist[, dimopts]) specifies the variables for the x axis in dimlist and controls
the content of those labels with dimopts.
plotdimension(dimlist[, dimopts]) specifies in dimlist the variables whose group levels
determine the plots and optionally specifies in dimopts the content of the plots' labels.
bydimension(dimlist[, dimopts]) specifies in dimlist the variables whose group levels
determine the by() subgraphs and optionally specifies in dimopts the content of the subgraphs'
titles. For an example using by(), see Three-way interactions.
graphdimension(dimlist[, dimopts]) specifies in dimlist the variables whose group levels
determine the graphs and optionally specifies in dimopts the content of the graphs' titles.
horizontal reverses the default x and y axes. By default, the y axis represents the estimates of
the margins and the x axis represents one or more factors or continuous covariates. Specifying
horizontal swaps the axes so that the x axis represents the estimates of the margins. This option
can be useful if the labels on the factor or continuous covariates are long.
The horizontal option is discussed in Horizontal is sometimes better.
noci removes plots of the pointwise confidence intervals. The default is to plot the confidence
intervals.
name(name | stub[, replace]) specifies the name of the graph or graphs. If the graphdimension()
option is specified, or if the default action is to produce multiple graphs, then the argument of
name() is taken to be stub and graphs named stub1, stub2, . . . are created.
The replace suboption causes existing graphs with the specified name or names to be replaced.
If name() is not specified, default names are used and the graphs may be replaced by subsequent
marginsplot or other graphing commands.
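A short sketch of both forms of name(), assuming the preceding margins command involved sex with
two levels; the graph names are arbitrary choices for illustration:
. marginsplot, name(margins_age, replace)
. marginsplot, graphdimension(sex) name(bp_, replace)    // creates graphs bp_1 and bp_2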
 
Labels
With the exception of allxlabels, all of these options may be specified either directly as
options or as dimopts within options xdimension(), plotdimension(), bydimension(), and
graphdimension(). When specified in one of the dimension options, only the labels for that
dimension are affected. When specified outside the dimension options, all labels on all dimensions
are affected. Specifications within the dimension options take precedence.
allxlabels specifies that tick marks and labels be placed on the x axis for each value of the
x-dimension variables. By default, if there are more than 25 ticks, default graph axis labeling rules
are applied. Labeling may also be specified using the standard graph twoway x-axis label rules
and options, such as xlabel(); see [G-3] axis label options.
nolabels specifies that value labels not be used to construct graph labels and titles for the group
levels in the dimension. By default, if a variable in a dimension has value labels, those labels are
used to construct labels and titles for axis ticks, plots, subgraphs, and graphs.
Graphs of contrasts and pairwise comparisons are an exception to this rule and are always labeled
with values rather than value labels.
allsimplelabels and nosimplelabels control whether graphs’ labels and titles include just the
values of the variables or include variable names and equal signs. The default is to use just the
value label for variables that have value labels and to use variable names and equal signs for
variables that do not have value labels. An example of the former is “Female” and the latter is
“country=2”.
Sometimes value labels are universally descriptive, and sometimes they have meaning only when
considered in relation to their variable. For example, “Male” and “Female” are typically universal,
regardless of the variable from which they are taken. “High” and “Low” may not have mean-
ing unless you know they are in relation to a specific measure, say, blood-pressure level. The
allsimplelabels and nosimplelabels options let you override the default labeling.
allsimplelabels specifies that all titles and labels use just the value or value label of the
variable.
nosimplelabels specifies that all titles and labels include varname= before the value or value
label of the variable.
separator(string)and noseparator control the separator between label sections when more than
one variable is used to specify a dimension. The default separator is a comma followed by a space,
but no separator may be requested with noseparator or the default may be changed to any string
with separator().
For example, if plotdimension(a b) is specified, the plot labels in our graph legend might
be “a=1, b=1”, “a=1, b=2”, . . . . Specifying separator(:) would create labels “a=1:b=1”,
“a=1:b=2”, . . . .
 
Plot
plotopts(plot options)affects the rendition of all margin plots. The plot options can affect the size
and color of markers, whether and how the markers are labeled, and whether and how the points
are connected; see [G-3]marker options,[G-3]marker label options, and [G-3]cline options.
These settings may be overridden for specific plots by using the plot#opts() option.
plot#opts(plot options)affects the rendition of the #th margin plot. The plot options can affect the
size and color of markers, whether and how the markers are labeled, and whether and how the points
are connected; see [G-3]marker options,[G-3]marker label options, and [G-3]cline options.
recast(plottype) specifies that margins be plotted using plottype. plottype may be scatter, line,
connected, bar, area, spike, dropline, or dot; see [G-2] graph twoway. When recast()
is specified, the plot-rendition options appropriate to the specified plottype may be used in lieu of
plot options. For details on those options, follow the appropriate link from [G-2] graph twoway.
For an example using recast(), see Continuous covariates.
You may specify recast() within a plotopts() or plot#opts() option. It is better, however,
to specify it as documented here, outside those options. When specified outside those options, you
have greater access to the plot-specific rendition options of your specified plottype.
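For example, a sketch combining recast() with per-plot rendition options; the specific line options
are illustrative and appropriate because the margins are recast as lines:
. marginsplot, recast(line) plot1opts(lpattern(dash)) plot2opts(lwidth(thick))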
 
CI plot
ciopts(rcap options) affects the rendition of all confidence interval plots; see [G-3] rcap options.
These settings may be overridden for specific confidence interval plots with the ci#opts() option.
ci#opts(rcap options) affects the rendition of the #th confidence interval; see [G-3] rcap options.
recastci(plottype) specifies that confidence intervals be plotted using plottype. plottype may be
rarea, rbar, rspike, rcap, rcapsym, rline, rconnected, or rscatter; see [G-2] graph
twoway. When recastci() is specified, the plot-rendition options appropriate to the specified
plottype may be used in lieu of rcap options. For details on those options, follow the appropriate
link from [G-2] graph twoway.
For an example using recastci(), see Continuous covariates.
You may specify recastci() within a ciopts() or ci#opts() option. It is better, however, to
specify it as documented here, outside those options. When specified outside those options, you
have greater access to the plot-specific rendition options of your specified plottype.
mcompare(method) specifies the method for confidence intervals that account for multiple comparisons
within a factor-variable term. The default is determined by the margins results stored in r(). If
marginsplot is working from margins results stored in e(), the default is mcompare(noadjust).
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
determined by the margins results stored in r(). If marginsplot is working from margins
results stored in e(), the default is level(95) or as set by set level; see [U] 20.7 Specifying
the width of confidence intervals.
 
Pairwise
These options have an effect only when the pwcompare option was specified on the preceding
margins command.
unique specifies that only unique pairwise comparisons be plotted. The default is to plot all pairwise
comparisons, including those that are mirror images of each other, such as "male" versus "female"
and "female" versus "male". margins reports only the unique pairwise comparisons. unique
also changes the default xdimension() for graphs of pairwise comparisons from the reference
categories (_pw0) to the comparisons of each pairwise category (_pw).
Unique comparisons are often preferred with horizontal graphs that put all pairwise comparisons
on the xaxis, whereas including the full matrix of comparisons is preferred for charts showing
the reference groups on an axis and the comparison groups as plots; see Pairwise comparisons
and Horizontal is sometimes better.
csort specifies that comparison categories are sorted first, and then reference categories are sorted
within comparison category. The default is to sort reference categories first, and then sort comparison
categories within reference categories. This option has an observable effect only when _pw is also
specified in one of the dimension options. It then determines the order of the labeling in the
dimension where _pw is specified.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot option.
For an example using addplot(), see Adding scatterplots of the data.
If multiple graphs are drawn by a single marginsplot command or if plot specifies plots with
multiple y variables, for example, scatter y1 y2 x, then the graph's legend will not clearly identify
all the plots and will require customization using the legend() option; see [G-3]legend options.
 
Y axis, X axis, Titles, Legend, Overall, By
twoway options are any of the options documented in [G-3] twoway options. These include options
for titling the graph (see [G-3] title options); for saving the graph to disk (see [G-3] saving option);
for controlling the labeling and look of the axes (see [G-3] axis options); for controlling the look,
contents, position, and organization of the legend (see [G-3] legend options); for adding lines
(see [G-3] added line options) and text (see [G-3] added text options); and for controlling other
aspects of the graph's appearance (see [G-3] twoway options).
The label() suboption of the legend() option has no effect on marginsplot. Use the order()
suboption instead.
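A sketch, assuming noci so that the two margin plots are legend keys 1 and 2, with illustrative labels:
. marginsplot, noci legend(order(1 "Men" 2 "Women"))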
byopts(byopts) affects the appearance of the combined graph when bydimension() is specified or
when the default graph has subgraphs, including the overall graph title, the position of the legend,
and the organization of subgraphs. See [G-3] by option.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Dataset
Profile plots
Interaction plots
Contrasts of margins—effects (discrete marginal effects)
Three-way interactions
Continuous covariates
Plots at every value of a continuous covariate
Contrasts of at() groups—discrete effects
Controlling the graph’s dimensions
Pairwise comparisons
Horizontal is sometimes better
Marginal effects
Plotting a subset of the results from margins
Advanced usage
Plots with multiple terms
Plots with multiple at() options
Adding scatterplots of the data
Video examples
Introduction
marginsplot is a post-margins command. It graphs the results of the margins command,
whether those results are marginal means, predictive margins, marginal effects, contrasts, pairwise
comparisons, or other statistics; see [R]margins.
By default, the margins are plotted on the y axis, and all continuous and factor covariates specified
in the margins command will usually be placed on the x axis or used to identify plots. Exceptions
are discussed in the following sections and in Addendum: Advanced uses of dimlist below.
marginsplot produces classic plots, such as profile plots and interaction plots. Beyond that,
anything that margins can compute, marginsplot can graph.
We will be using some relatively complicated margins commands with little explanation of the
syntax. We will also avoid lengthy interpretations of the results of margins. See [R]margins for the
complete syntax of margins and discussions of its results.
All graphs in this entry were drawn using the s2gcolor scheme; see [G-4]scheme s2.
Mitchell (2012) shows in many examples how to use marginsplot to understand a fitted model.
Dataset
For continuity, we will use one dataset for most examples: the Second National Health and
Nutrition Examination Survey (NHANES II) (McDowell et al. 1981). NHANES II is part of a study to
assess the health and nutritional status of adults and children in the United States. It is designed to
be a nationally representative sample of the U.S. population. This particular sample is from 1976 to
1980.
The survey nature of the dataset (weights, strata, and sampling units) will be ignored in our
analyses. We are discussing graphing, not survey statistics. If you would like to see the results with
the appropriate adjustments for the survey design, just add svy: before each estimation command, and
if you wish, add vce(unconditional) as an option to each margins command. See [R]margins,
particularly the discussion and examples under Obtaining margins with survey data and representative
samples, for reasons why you probably would want to add vce(unconditional) when analyzing
survey data. For the most part, adjusting for survey design produces moderately larger confidence
intervals and relatively small changes in point estimates.
Profile plots
What does my estimation say about how my response varies as one (or more) of my covariates
changes? That is the question that is answered by profile plots. Profile plots are also referred to as
plots of estimated (or expected, or least-squares) means, though that is unnecessarily restrictive when
considering models of binary, count, and ordered outcomes. In the latter cases, we might prefer to
say they plot conditional expectations of responses, where a response might be a probability.
What we do with the other covariates depends on the questions we wish to answer. Sometimes we
wish to hold other covariates at fixed values, and sometimes we wish to average the response over
their values. margins can do either, so you can graph either.
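For instance, with a model that includes a continuous covariate such as bmi (like the logistic model
fit later in this entry), either approach is a one-option change on margins; a sketch:
. margins agegrp                  // average the predictions over the estimation sample
. margins agegrp, at(bmi=25)      // hold bmi at 25 for every observation
. margins agegrp, atmeans         // hold the other covariates at their sample means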
We can fit a fully factorial two-way ANOVA of systolic blood pressure on age group and sex using
the NHANES II data.
. use http://www.stata-press.com/data/r13/nhanes2
. anova bpsystol agegrp##sex
Number of obs = 10351 R-squared = 0.2497
Root MSE = 20.2209 Adj R-squared = 0.2489
Source Partial SS df MS F Prob > F
Model 1407229.28 11 127929.935 312.88 0.0000
agegrp 1243037.82 5 248607.565 608.02 0.0000
sex 27728.3794 1 27728.3794 67.81 0.0000
agegrp#sex 88675.043 5 17735.0086 43.37 0.0000
Residual 4227440.75 10339 408.882943
Total 5634670.03 10350 544.412563
If you are more comfortable with regression than ANOVA, then type
. regress bpsystol agegrp##sex
The anova and regress commands fit identical models. The output from anova displays all
the terms in the model and thus tends to be more conducive to exploration with margins and
marginsplot.
We estimate the predictive margins of systolic blood pressure for each age group using margins.
. margins agegrp
Predictive margins Number of obs = 10351
Expression : Linear prediction, predict()
Delta-method
Margin Std. Err. t P>|t| [95% Conf. Interval]
agegrp
20-29 117.2684 .419845 279.31 0.000 116.4454 118.0914
30-39 120.2383 .5020813 239.48 0.000 119.2541 121.2225
40-49 126.9255 .56699 223.86 0.000 125.8141 128.0369
50-59 135.682 .5628593 241.06 0.000 134.5787 136.7853
60-69 141.5285 .3781197 374.30 0.000 140.7873 142.2696
70+ 148.1096 .6445073 229.80 0.000 146.8463 149.373
The six predictive margins are just the averages of the predictions over the estimation sample,
holding agegrp to each of its six levels. If this were a designed experiment rather than survey data, we
might wish to assume the cells are balanced, that is, that they have the same number of observations, and
thus estimate what are often called expected means or least-squares means. To do that, we would
simply add the asbalanced option to the margins command. The NHANES II data are decidedly
unbalanced over sex#agegrp cells. So much so that it is unreasonable to assume the cells are
balanced.
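Had balance been a reasonable assumption and expected means wanted, the command would simply
have been
. margins agegrp, asbalanced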
We graph the results:
. marginsplot
Variables that uniquely identify margins: agegrp
(graph omitted: Predictive Margins of agegrp with 95% CIs; Linear Prediction by Age Group)
Profile plots are often drawn without confidence intervals (CIs). The CIs may be removed by adding
the noci option. We prefer to see the CIs.
Disciplines vary widely in their use of the term profile plot. Some disciplines consider any connected
plot of a response over values of other variables to be a profile plot. By that definition, most graphs
in this entry are profile plots.
Interaction plots
Interaction plots are often used to explore the form of an interaction. The interaction term in our
ANOVA results is highly significant. Are the interaction effects also large enough to matter? What form
do they take? We can answer these questions by fixing agegrp and sex to each possible combination
of the two covariates and estimating the margins for those cells.
. margins agegrp#sex
Then we can graph the results:
. marginsplot
Variables that uniquely identify margins: agegrp sex
(graph omitted: Adjusted Predictions of agegrp#sex with 95% CIs; Linear Prediction by Age Group, lines for Male and Female)
It is clear that the effect of age differs by sex; there is an interaction. If there were no interaction,
then the two lines would be parallel.
While males start out with higher systolic blood pressure, females catch up to the males as age
increases and may even surpass males in the upper age groups. We say “may” because we cannot
tell if the differences are statistically significant. The CIs overlap for the top three age groups. It is
tempting to conclude from this overlap that the differences are not statistically significant. Do not fall
into this trap. Likewise, do not fall into the trap that the first three age groups are different because
their CIs do not overlap. The CIs are for the point estimates, not the differences. There is a covariance
between the differences that we must consider if we are to make statements about those differences.
Contrasts of margins—effects (discrete marginal effects)
To assess the differences, all we need do is ask margins to contrast the sets of effects that we
just estimated; see [R]margins, contrast. With only two groups in sex, it does not matter much
which contrast operator we choose. We will use the reference contrast. It will compare the difference
between males and females, with males (the first category) as the reference category.
. margins r.sex@agegrp
Contrasts of adjusted predictions
Expression : Linear prediction, predict()
df F P>F
sex@agegrp
(Female vs Male) 20-29 1 224.92 0.0000
(Female vs Male) 30-39 1 70.82 0.0000
(Female vs Male) 40-49 1 12.15 0.0005
(Female vs Male) 50-59 1 0.47 0.4949
(Female vs Male) 60-69 1 3.88 0.0488
(Female vs Male) 70+ 1 6.37 0.0116
Joint 6 53.10 0.0000
Denominator 10339
Delta-method
Contrast Std. Err. [95% Conf. Interval]
sex@agegrp
(Female vs Male) 20-29 -12.60132 .8402299 -14.24833 -10.9543
(Female vs Male) 30-39 -8.461161 1.005448 -10.43203 -6.490288
(Female vs Male) 40-49 -3.956451 1.134878 -6.181031 -1.731871
(Female vs Male) 50-59 -.7699782 1.128119 -2.981309 1.441353
(Female vs Male) 60-69 1.491684 .756906 .0080022 2.975367
(Female vs Male) 70+ 3.264762 1.293325 .729594 5.79993
Because we are looking for effects that are different from 0, we will add a reference line at 0 to
our graph.
. marginsplot, yline(0)
Variables that uniquely identify margins: agegrp
(graph omitted: Contrasts of Adjusted Predictions of sex@agegrp with 95% CIs; Contrasts of Linear Prediction by Age Group, reference line at 0)
We can now say that females’ systolic blood pressure is substantially and significantly lower than
males’ in the first three age groups but is significantly higher in the last two age groups. Despite the
overlapping CIs for the last two age groups in the interaction graph, the effect of sex is significant in
these age groups.
The terminology for what we just estimated and graphed varies widely across disciplines. Those
versed in design of experiments refer to these values as contrasts or effects. Economists and some other
social scientists call them marginal or partial effects. The latter groups might be more comfortable if
we avoided the whole concept of contrasts and instead estimated the effects by typing
. margins agegrp, dydx(sex)
This will produce estimates that are identical to those shown above, and we can graph them by typing
marginsplot.
The advantage of using the contrast notation and thinking in contrasts is most evident when we
take marginal effects with respect to a categorical covariate with more than two levels. Marginal
effects for each level of the covariate will be taken with respect to a specified base level. Contrasts are
much more flexible. Using the r. operator, we can reproduce the marginal-effects results by taking
derivatives with respect to a reference level (as we saw above). We can also estimate the marginal
effect of first moving from level 1 to level 2, then from level 2 to level 3, then from level 3 to
level 4, . . . using the ar. or “reverse adjacent” operator. Adjacent effects (marginal effects) can be
valuable when evaluating an ordinal covariate, such as agegrp in our current model. For a discussion
of contrasts, see [R]contrast and [R]margins, contrast.
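As a sketch of that adjacent-contrast approach with the current model, we could type
. margins ar.agegrp
. marginsplot, yline(0)
to estimate and graph the change in predicted systolic blood pressure as we move from each age group
to the next.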
Three-way interactions
marginsplot can handle any number of covariates in your margins command. Consider the
three-way ANOVA model that results from adding an indicator for whether an individual has been
diagnosed with diabetes. We will fully interact the new covariate with the others in the model.
. anova bpsystol agegrp##sex##diabetes
Number of obs = 10349 R-squared = 0.2572
Root MSE = 20.131 Adj R-squared = 0.2556
Source Partial SS df MS F Prob > F
Model 1448983.17 23 62999.2681 155.45 0.0000
agegrp 107963.582 5 21592.7164 53.28 0.0000
sex 1232.79267 1 1232.79267 3.04 0.0812
agegrp#sex 11679.5925 5 2335.91849 5.76 0.0000
diabetes 7324.98924 1 7324.98924 18.07 0.0000
agegrp#diabetes 5484.54623 5 1096.90925 2.71 0.0189
sex#diabetes 102.988239 1 102.988239 0.25 0.6142
agegrp#sex#diabetes 4863.14971 5 972.629943 2.40 0.0349
Residual 4184296.88 10325 405.258778
Total 5633280.05 10348 544.38346
The three-way interaction is significant, as is the main effect of diabetes and its interaction with
agegrp.
Again, if you are more comfortable with regression than ANOVA, you may type
. regress bpsystol agegrp##sex##diabetes
The margins and marginsplot results will be the same.
We estimate the expected cell means for each combination of agegrp,sex, and diabetes, and
then graph the results by typing
. margins agegrp#sex#diabetes
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp sex diabetes
(graph omitted: Adjusted Predictions of agegrp#sex#diabetes with 95% CIs; Linear Prediction by Age Group, lines for Male and Female at diabetes=0 and diabetes=1)
The graph is busy and difficult to interpret.
We can make it better by putting those with diabetes on one subgraph and those without on another:
. marginsplot, by(diabetes)
Variables that uniquely identify margins: agegrp sex diabetes
(graph omitted: Adjusted Predictions of agegrp#sex#diabetes with 95% CIs; Linear Prediction by Age Group, subgraphs for diabetes=0 and diabetes=1, lines for Male and Female)
We notice much larger CIs for diabetics. That is not surprising because our sample contains only 499
diabetics compared with 9,850 nondiabetics.
A more interesting way to arrange the plots is by grouping the subgraphs on sex:
. marginsplot, by(sex)
Variables that uniquely identify margins: agegrp sex diabetes
(graph omitted: Adjusted Predictions of agegrp#sex#diabetes with 95% CIs; Linear Prediction by Age Group, subgraphs for Male and Female, lines for diabetes=0 and diabetes=1)
Aside from increased systolic blood pressure in the upper-age groups, which we saw earlier,
it appears that those with diabetes are at greater risk of higher systolic blood pressure for many
upper-age groups. We can check that by having margins estimate the differences between diabetics
and nondiabetics, and graphing the results.
. margins r.diabetes@agegrp#sex
(output omitted )
. marginsplot, by(sex) yline(0)
Variables that uniquely identify margins: agegrp sex
(graph omitted: Contrasts of Adjusted Predictions of diabetes@agegrp#sex with 95% CIs; Contrasts of Linear Prediction by Age Group, subgraphs for sex: Male and sex: Female, reference line at 0)
With CIs above 0 for six of eight age groups over 40, this graph provides evidence that diabetes is
related to higher blood pressure in those over 40.
Continuous covariates
margins and marginsplot are just as useful with continuous covariates as they are with factor
variables. As a variation on our ANOVA/regression models, let’s move to a logistic regression, using
as our dependent variable an indicator for whether a person has high blood pressure. We introduce a
continuous covariate: body mass index (BMI), a measure of weight relative to height. High BMI is
often associated with high blood pressure. We will allow the effect of BMI to vary across sexes, age
groups, and sex/age combinations by fully interacting the covariates.
. logistic highbp sex##agegrp##c.bmi
If we wished, we could perform all the analyses above on this model. Instead of estimating margins,
contrasts, and marginal effects on the level of systolic blood pressure, we would be estimating margins,
contrasts, and marginal effects on the probability of having high blood pressure. You can see those
results by repeating any of the prior commands that involve sex and agegrp. In this section, we will
focus on the continuous covariate bmi.
With continuous covariates, rather than specify them in the marginlist of margins, we specify the
specific values at which we want the covariate evaluated in an at() option. at() options are very
flexible, and there are many ways to specify values; see Syntax of at() in [R]margins.
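A few of the possibilities, using the current model, are sketched below:
. margins sex, at(bmi=(10(5)65))              // a numlist: 10 to 65 in steps of 5
. margins sex, at(bmi=(20 30 40))             // a list of specific values
. margins sex, at((p25) bmi) at((p75) bmi)    // two at() groups: 25th and 75th percentiles
. margins sex, at((mean) bmi)                 // the sample mean of bmi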
BMI in our sample ranges from 12.4 to 61.1. Let’s estimate the predictive margins for males and
females at levels of BMI from 10 through 65 at intervals of 5 and graph the results:
. margins sex, at(bmi=(10(5)65))
(output omitted )
. marginsplot, xlabel(10(10)60)
Variables that uniquely identify margins: bmi sex
(graph omitted: Predictive Margins of sex with 95% CIs; Pr(Highbp) by Body Mass Index (BMI), lines for Male and Female)
We added the xlabel(10(10)60) option to improve the labeling of the x axis. You may add
any twoway options (see [G-3] twoway options) to the marginsplot command.
For a given BMI, males are generally more susceptible to high blood pressure, though the effect
is attenuated by the logistic response when the probabilities approach 0 or 1.
Because bmi is continuous, we might prefer to see the response graphed using a line. We might
also prefer that the CIs be plotted as areas. We change the plottype of the response by using the
recast() option and the plottype of the CI by using the recastci() option:
. marginsplot, xlabel(10(10)60) recast(line) recastci(rarea)
Variables that uniquely identify margins: bmi sex
(graph omitted: Predictive Margins of sex with 95% CIs; Pr(Highbp) by Body Mass Index (BMI), line plots with area CIs, Male and Female)
The CIs are a little dark for our tastes. You can dim them a bit by reducing the intensity of their
color. Adding ciopts(color(*.8)) to our marginsplot command will do that. Any plot option
accepted by twoway rarea (see [G-2]graph twoway rarea) may be specified in a ciopts() option.
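Putting the pieces together, the dimmed-CI version of the previous graph would be produced by
. marginsplot, xlabel(10(10)60) recast(line) recastci(rarea) ciopts(color(*.8))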
Given their confidence regions, the male and female profiles appear to be statistically different over
most of the range of BMI. As with the profiles of categorical covariates, we can check that assertion
by contrasting the two profiles on sex and graphing the results. Let’s improve the smoothness of the
response by specifying intervals of 1 instead of 5.
. margins r.sex, at(bmi=(10(1)65))
(output omitted )
. marginsplot, xlabel(10(10)60) recast(line) recastci(rarea)
Variables that uniquely identify margins: bmi
(graph omitted: Contrasts of Predictive Margins of sex with 95% CIs; Contrasts of Pr(Highbp) by Body Mass Index (BMI))
We see that the difference between the sexes is largest at a BMI of about 35 and that the sexes
respond more similarly with very high and very low BMI. This shape is largely determined by the
response of the logistic function, which is attenuated near probabilities 0 and 1, combined with the
fact that the lowest measured BMIs are associated with low probabilities of high blood pressure and
the highest measured BMIs are associated with high probabilities of high blood pressure.
As when we contrasted profiles of categorical variables, different disciplines will think of this
graph differently. Those familiar with designed experiments will be comfortable with the terms used
above: this is a contrast of profiles, or a profile of effects, or a profile of a contrast. Many social
scientists will prefer to think of this as a graph of marginal or partial effects. For them, this is a plot
of the discrete marginal effect of being female for various levels of BMI. They can obtain an identical
graph, with labeling more appropriate for the marginal effect’s interpretation, by typing
. margins, at(bmi=(10(1)65)) dydx(sex)
. marginsplot, xlabel(10(10)60) recast(line) recastci(rarea)
We can also plot profiles of the response of BMI by levels of another continuous covariate (rather
than by the categorical variable sex). To do so, we will need another continuous variable in our
model. We have been using age groups as a covariate to emphasize the treatment of categorical
variables and to allow the effect of age to be flexible. Our dataset also has age recorded in integer
years. We replace agegrp with continuous age in our logistic regression.
. logistic highbp sex##c.age##c.bmi
We can now obtain profiles of BMI for different ages by specifying ranges for both bmi and age
in a single at() option on the margins command:
. margins sex, at(bmi=(10(5)60) age=(20(10)70))
With six ages specified, we have many profiles, so we will dispense with the CIs by adding the
noci option and also tidy up the graph by asking for three columns in the legend:
. marginsplot, noci by(sex) legend(cols(3))
Variables that uniquely identify margins: bmi age sex
(graph omitted: Adjusted Predictions of sex; Pr(Highbp) by Body Mass Index (BMI), subgraphs for Male and Female, lines for age=20 through age=70)
Our model seems to indicate that males have a sharper reaction to body mass indices than do
females. Likewise, younger subjects display a sharper response, while older subjects have a more
gradual response with earlier onset. That interpretation might be a result of our parametric treatment
of age. As it turns out, the interpretation holds if we allow age to take more flexible forms or return
to our use of age groups, which allows each of seven age groups to have unique BMI profiles. Here
are the commands to perform that analysis:
. logistic highbp sex##agegrp##c.bmi
(output omitted )
. margins sex#agegrp, at(bmi=(10(5)60))
(output omitted )
. marginsplot, noci by(sex) legend(cols(3))
Variables that uniquely identify margins: bmi sex agegrp
(graph omitted: Adjusted Predictions of sex#agegrp; Pr(Highbp) by Body Mass Index (BMI), subgraphs for Male and Female, lines for age groups 20-29 through 70+)
Plots at every value of a continuous covariate
In some cases, the specific values of a continuous covariate are important, and we want to plot
the response at those specific values. Return to our logistic example with age treated as a continuous
covariate.
. logistic highbp sex##c.age##c.bmi
We can use a programming trick to extract all the values of age and then supply them in an at()
option, just as we would any list of values.
. levelsof age
. margins sex, at(age=(‘r(levels)’))
See [P]levelsof for a discussion of the levelsof command. levelsof returns in r(levels) the
sorted list of unique values of the specified varlist, in our case, age.
We can then plot the results using marginsplot.
This is not a very interesting trick when using our age variable, which is recorded as integers
from 20 to 74, but the approach will work with almost any continuous variable. In our model, bmi
might seem more interesting, but there are 9,941 unique values of bmi in our dataset. A graph cannot
resolve so many different values. For that reason, we usually recommend against plotting at every
value of a covariate. Instead, graph at reasonable values over the range of the covariate by using the
at() option, as we did earlier. This trick is best reserved for variables with a few, or at most a few
dozen, unique values.
Contrasts of at() groups—discrete effects
We have previously contrasted across the values of factor variables in our model. Put another way,
we have estimated the discrete marginal effects of factor variables. We can do the same for the levels
of variables in at() specifications and across separate at() specifications.
Returning to one of our logistic models and its margins, we earlier estimated the predictive margins
of BMI at 5-unit intervals for both sexes. These are the commands we typed:
. logistic highbp sex##agegrp##c.bmi
. margins sex, at(bmi=(10(5)65))
. marginsplot, xlabel(10(10)60)
We can estimate the discrete effects by sex of bmi moving from 10 to 15, then from 15 to 20, . . . ,
and then from 60 to 65 by contrasting the levels of the at() groups using the reverse-adjacent contrast
operator (ar.). We specify the operator within the atcontrast() suboption of the contrast()
option. We need to specify one other option. By default, margins, contrast will apply a contrast
to all variables in its marginlist when a contrast has been requested. In this case, we do not want
to contrast across sexes but rather to contrast across the levels of BMI within each sex. To prevent
margins from contrasting across the sexes, we specify the marginswithin option. Our margins
command is
. margins sex, at(bmi=(10(5)65)) contrast(atcontrast(ar._at) marginswithin)
And we graph the results using marginsplot:
. marginsplot
Variables that uniquely identify margins: bmi sex
(graph omitted: Contrasts of Predictive Margins of sex with 95% CIs; Contrasts of Pr(Highbp) by Body Mass Index (BMI), lines for sex: Male and sex: Female)
The graph shows the contrasts (or if you prefer, discrete changes) in the probability of high blood
pressure by sex as one increases BMI in 5-unit increments.
We can even estimate contrasts (discrete effects) across at() options. To start, let’s compare the
age-group profiles of the probability of high blood pressure for those in the 25th and 75th percentile
of BMI.
. margins agegrp, at((p25) bmi) at((p75) bmi)
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp _atopt
Multiple at() options specified:
_atoption=1: (p25) bmi
_atoption=2: (p75) bmi
(graph omitted: Predictive Margins of agegrp with 95% CIs; Pr(Highbp) by Age Group, lines for (p25) bmi and (p75) bmi)
For each age group, people whose BMI is at the 75th percentile have a much higher probability of
high blood pressure than those at the 25th percentile. What is that difference in probability and its
CI? To contrast across the percentiles of BMI within age groups, we again specify a contrast operator
on the at() groups using atcontrast(), and we also tell margins to perform that contrast within
the levels of the marginlist by using the marginswithin option.
. margins agegrp, at((p25) bmi) at((p75) bmi)
> contrast(atcontrast(r._at) marginswithin)
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp _atopt
Multiple at() options specified:
_atoption=1: (p25) bmi
_atoption=2: (p75) bmi
(graph omitted: Contrasts of Predictive Margins of agegrp with 95% CIs; Contrasts of Pr(Highbp) by Age Group)
The differences in probability between 25th and 75th BMI percentiles are clearly significantly
greater than 0. The differences appear to be smallest for those in the youngest and oldest age groups.
Controlling the graph’s dimensions
Thus far, marginsplot has miraculously done almost exactly what we want in most cases. The
things we want on the xaxis have been there, the choice of plots has made sense, etc. Some of
that luck sprang from the relatively simple analyses we were performing, and some was from careful
specification of our margins command. Sometimes, we will not be so lucky.
Consider the following regress,margins, and marginsplot commands:
. regress bpsystol agegrp##sex##c.bmi
(output omitted )
. margins agegrp, over(sex) at(bmi=(10(10)60))
(output omitted )
. marginsplot
Variables that uniquely identify margins: bmi agegrp sex
(graph omitted: Predictive Margins of agegrp with 95% CIs; Linear Prediction by Body Mass Index (BMI), one line for each combination of age group and sex)
By default, marginsplot places the levels of the first multilevel at() specification on the x axis,
and then usually plots the levels of all remaining variables as connected lines. That is what we see
in the graph above: bmi, the at() variable, is on the x axis, and each combination of agegrp and
sex is plotted as a separate connected line. If there is no multilevel at() specification, then the first
variable in marginlist becomes the x axis. There are many more rules, but it is usually best to simply
type marginsplot and see what happens. If you do not like marginsplot's choices, change them.
What if we wanted agegrp on the x axis instead of BMI? We tell marginsplot to make that
change by specifying agegrp in the xdimension() option:
. marginsplot, xdimension(agegrp)
Variables that uniquely identify margins: bmi agegrp sex
(graph omitted: Predictive Margins of agegrp with 95% CIs; Linear Prediction by Age Group, one line for each combination of bmi (10 through 60) and sex)
We have been suppressing the Results window output for marginsplot, but that output is helpful
if we want to change how things are plotted. You may specify any variable used in your margins
command in any of the dimension options: xdimension(), plotdimension(), bydimension(),
and graphdimension(). (In fact, there are some pseudovariables that you may also specify in some
cases; see Addendum: Advanced uses of dimlist for details.) marginsplot tries to help you narrow
your choices by listing a set of variables that uniquely identify all your margins. You are not restricted
to this list.
We have a different xaxis and a different set of plots, but our graph is still busy and difficult to
read. We can make it better by creating separate graph panels for each sex. We do that by adding a
bydimension() option with sex as the argument.
. marginsplot, xdimension(agegrp) bydimension(sex)
Variables that uniquely identify margins: bmi agegrp sex
(graph omitted: Predictive Margins of agegrp with 95% CIs; Linear Prediction by Age Group, subgraphs for Male and Female, lines for bmi=10 through bmi=60)
The patterns and the differences between males and females are now easier to see.
If our interest is in comparing males and females, we might even choose to create a separate panel
for each level of BMI:
. marginsplot, xdimension(agegrp) bydimension(bmi) xlabel(, angle(45))
Variables that uniquely identify margins: bmi agegrp sex
(graph omitted: Predictive Margins of agegrp with 95% CIs; Linear Prediction by Age Group, subgraphs for bmi=10 through bmi=60, lines for Male and Female, angled x-axis labels)
The x-axis labels did not fit, so we angled them.
We leave you to explore the use of the graphdimension() option. It is much like bydimension()
but creates separate graphs rather than separate panels. Operationally, the plotdimension() option
is rarely used. All variables not in the x dimension and not specified elsewhere become the plotted
connected lines.
You will likely use the dimension options frequently. This is one of the rare cases where we
recommend using the minimal abbreviations of the options: x() for xdimension(), plot() for
plotdimension(), by() for bydimension(), and graph() for graphdimension(). The
abbreviations are easy to read and just as meaningful as the full option names. The full names exist to
reinforce the relationship between the dimension options.
Pairwise comparisons
marginsplot can graph the results of margins, pwcompare; see [R]margins, pwcompare. We
return to one of our ANOVA examples. Here we request pairwise comparisons with the pwcompare
option of margins, and we request Bonferroni-adjusted CIs with the mcompare() option:
. anova bpsystol agegrp##sex
(output omitted )
. margins agegrp, pwcompare mcompare(bonferroni)
(output omitted )
. marginsplot
Variables that uniquely identify margins: _pw1 _pw0
_pw enumerates all pairwise comparisons; _pw0 enumerates the reference
categories; _pw1 enumerates the comparison categories.
(graph omitted: Pairwise Comparisons of Predictive Margins of agegrp with 95% CIs; Comparisons of Linear Prediction by comparison category, one connected line per reference category 1 through 6)
Each connected line plot in the graph represents a reference age-group category for the pairwise
comparison. The ticks on the xaxis represent comparison age-group categories. So, each plot is a
profile for a reference category showing its comparison to each other category.
Horizontal is sometimes better
Another interesting way to graph pairwise comparisons is to simply plot each comparison and
label the two categories being compared. This type of graph works better if it is oriented horizontally
rather than vertically.
Continuing with the example above, we will switch the graph to horizontal. We will also make
several changes to display the graph better. We specify that only unique comparisons be plotted. The
graph above plotted both 1 versus 2 and 2 versus 1, which are the same comparison with opposite
signs. We add a reference line at 0 because we are interested in comparisons that differ from 0. This
graph looks better without the connecting lines, so we add the option recast(scatter). We also
reverse the yscale so that the smallest levels of age group appear at the top of the axis.
. marginsplot, horizontal unique xline(0) recast(scatter) yscale(reverse)
Variables that uniquely identify margins: _pw1 _pw0
_pw enumerates all pairwise comparisons; _pw0 enumerates the reference
categories; _pw1 enumerates the comparison categories.
(graph omitted: Pairwise Comparisons of Predictive Margins of agegrp with 95% CIs; horizontal
scatter of the 15 unique comparisons, 2 vs 1 through 6 vs 5, against Comparisons of Linear Prediction,
with a reference line at 0)
All the comparisons differ from 0, so all our age groups are statistically different from each other.
The horizontal option can be useful outside of pairwise comparisons. Profile plots are usually
oriented vertically. However, when your covariates have long labels or there are many levels at which
the margins are being evaluated, the graph may be easier to read when rendered horizontally.
Marginal effects
We have seen how to graph discrete effects for factor variables and continuous variables by using
contrasts, and optionally by using the dydx() option of margins: Contrasts of margins—effects
(discrete marginal effects) and Continuous covariates. Let’s now consider graphing instantaneous
marginal effects for continuous covariates. Begin by refitting our logistic model of high blood
pressure as a function of sex, age, and BMI:
. logistic highbp sex##agegrp##c.bmi
We estimate the average marginal effect of BMI on the probability of high blood pressure for each
age group and then graph the results by typing
. margins agegrp, dydx(bmi)
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp
(graph omitted: Average Marginal Effects of bmi with 95% CIs; Effects on Pr(Highbp) plotted by
Age Group)
These are the conditional expectations of the marginal effects treating everyone in the sample as
though they were in each age group. We can estimate fully conditional marginal effects that do not
depend on averaging over the sample by also margining on our one remaining covariate, sex.
. margins agegrp#sex, dydx(bmi)
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp sex
(graph omitted: Average Marginal Effects of bmi with 95% CIs; Effects on Pr(Highbp) plotted by
Age Group, with separate lines for Male and Female)
The effect of BMI on the probability of high blood pressure looks to increase with age for females.
The marginal effect is higher for males than females in the younger age groups but then decreases
with age for males after the 40–49 age group.
You may want to test for differences in the marginal effect of BMI for males and females by
contrasting across sexes within agegrp:
. margins r.sex@agegrp, dydx(bmi)
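One way to look at those contrasts graphically, not shown in this entry, is to follow that margins
command with marginsplot and a reference line at zero:
. marginsplot, yline(0)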
Plotting a subset of the results from margins
marginsplot plots all the margins produced by the preceding margins command. If you want a
graph that does not include all the margins, then enter a margins command that produces a reduced
set of margins. Obvious ways to reduce the number of margins include not specifying some factors or
interactions in the marginlist of margins, not specifying some at() or over() options, or reducing
the values specified in an at() option. A less obvious technique uses selection lists in factor operators
to select specific sets of levels from factor variables specified in the marginlist.
Instead of typing
. margins agegrp
which will give you margins for all six age groups in our sample, type
. margins i(2/4).agegrp
which will give you only three margins: those for groups 2, 3, and 4. See [U] 11.4.3.4 Selecting
levels.
Advanced usage
margins is incredibly flexible in the statistics it can estimate and in the grouping of those estimates.
Many of the estimates that margins can produce do not make convincing graphs. marginsplot plots
the results of any margins command, regardless of whether the resulting graph is easily interpreted.
Here we demonstrate some options that can make complicated margins into graphs that are somewhat
more useful than those produced by marginsplot's defaults. Others may find truly useful applications
for these approaches.
Plots with multiple terms
Margins plots are rarely interesting when you specify multiple terms on your margins command,
for example, margins a b. Such plots often compare things that are not comparable. The defaults
for marginsplot rarely produce useful plots with multiple terms. Perhaps the most interesting graph
in such cases puts all the levels of all the terms together on the vertical axis and plots their margins
on the horizontal axis. We do that by including the marginlist from margins in an xdimension()
option on marginsplot. The long labels on such graphs look better with a horizontal orientation,
and there is no need to connect the margin estimates, so we specify the recast(scatter) option.
Using one of our ANOVA examples from earlier,
. anova bpsystol agegrp##sex
(output omitted )
. margins agegrp sex
(output omitted )
. marginsplot, xdimension(agegrp sex) horizontal recast(scatter)
Variables that uniquely identify margins: agegrp sex
(graph omitted: Predictive Margins with 95% CIs; horizontal scatter of Linear Prediction against
y-axis categories labeled "20−29, asobserved" through "70+, asobserved", "asobserved, Male", and
"asobserved, Female", grouped as agegrp, sex)
The “asobserved” notations in the y-axis labels are informing us that, for example, when the margin
for females is evaluated, the values of age group are taken as they are observed in the dataset. The
margin is computed as an average over those values.
Plots with multiple at() options
Some disciplines like to compute margins at the means of other covariates in their model and
others like to compute the response for each observation and then take the means of the response.
These correspond to the margins options at((mean) _all) and at((asobserved) _all). For
responses that are linear functions of the coefficients, such as predict after regress, the two
computations yield identical results. For responses that are nonlinear functions of the coefficients, the
two computations estimate different things.
Using one of our logistic models of high blood pressure,
. logistic highbp sex##agegrp##c.bmi
and computing both sets of margins for each age group,
. margins agegrp, at((mean) _all) at((asobserved) _all)
we can use marginsplot to compare the approaches:
. marginsplot
Variables that uniquely identify margins: agegrp _atopt
Multiple at() options specified:
_atoption=1: (mean)_all
_atoption=2: (asobserved) _all
(graph omitted: Predictive Margins of agegrp with 95% CIs; Pr(Highbp) plotted by Age Group, with
one line each for (mean) _all and (asobserved) _all)
For the first three age groups, the probabilities of high blood pressure are lower at the means of sex
and bmi than are the mean probabilities of high blood pressure averaged over the observed values of
sex and bmi. The reverse is true for the last three age groups, although the values are very similar
in these older age groups.
Such comparisons come up even more frequently when evaluating marginal effects. We can estimate
the marginal effects of sex at each age group and graph the results by adding dydx(sex) to our
margins command:
. margins agegrp, at((mean) _all) at((asobserved) _all) dydx(sex)
(output omitted )
. marginsplot
Variables that uniquely identify margins: agegrp _atopt
Multiple at() options specified:
_atoption=1: (mean)_all
_atoption=2: (asobserved) _all
(graph omitted: Average Marginal Effects of 2.sex with 95% CIs; Effects on Pr(Highbp) plotted by
Age Group, with one line each for (mean) _all and (asobserved) _all)
The average marginal effect is smaller for most age groups, but the CIs for both sets of estimates are
wide. Can we tell the difference between the estimates? To answer that, we use the now-familiar tactic of
taking the contrast of our estimated marginal-effects profiles. That means adding contrast(atjoint
marginswithin) to our margins command. We will also add mcompare(bonferroni) to account
for the fact that we will be comparing six contrasts.
. margins agegrp, at((mean) _all) at((asobserved) _all) dydx(sex)
> contrast(atjoint marginswithin) mcompare(bonferroni)
We will also add the familiar reference line at 0 to our graph of the contrasts.
. marginsplot, yline(0)
Variables that uniquely identify margins: agegrp _atopt
Multiple at() options specified:
_atoption=1: (mean)_all
_atoption=2: (asobserved) _all
(graph omitted: Contrasts of Average Marginal Effects of 2.sex with 95% CIs; Contrasts of Pr(Highbp)
plotted by Age Group, with a reference line at 0)
While the difference in the estimates of marginal effects is not large, we can distinguish the estimates
for the 30–39 and 70+ age groups.
The at() option of margins provides far more flexibility than demonstrated above. It can be
used to evaluate a response or marginal effect at almost any point of interest or combinations of such
points. See Syntax of at() in [R]margins.
Adding scatterplots of the data
We can add scatterplots of the observed data to our plots of the margins. The NHANES II dataset
is too large for this to be interesting, so for this example, we will use auto.dta. We fit mileage
on whether the car is foreign and on a quadratic in the weight of the car. We convert the weight
into tons (U.S. definition) to improve the scaling, and we format the new tons variable to improve
its labels on the graph. For our graph, we create separate variables for mileage of domestic and of
foreign cars. We fit a fully interacted model so that the effect of weight on mileage can be different
for foreign and for domestic cars.
. use http://www.stata-press.com/data/r13/auto
. generate tons = weight/2000
. format tons %6.2f
. separate mpg, by(foreign)
. regress mpg foreign##c.tons##c.tons
We then estimate the margins over the range of tons, using the option over(foreign) to obtain
separate estimates for foreign and domestic cars.
. margins, at(tons=(.8(.05)2.4)) over(foreign)
Adding scatterplots of mileage for domestic and foreign cars is easy. We insert into an addplot()
option of marginsplot the same scatterplot syntax for twoway that we would type to produce a
scatterplot of the data:
. marginsplot, addplot(scatter mpg0 tons || scatter mpg1 tons) recast(line) noci
Variables that uniquely identify margins: tons foreign
(graph omitted: Predictive Margins; fitted lines of Linear Prediction against tons for Domestic and
Foreign cars, overlaid with scatterplots of mpg for domestic and foreign cars)
Many will be surprised that the mileage profile is higher in 1978 for domestic (U.S. built) cars.
Is the difference significant?
. margins, at(tons=(.8(.05)2.4)) over(r.for)
(output omitted )
. marginsplot, yline(0)
Variables that uniquely identify margins: tons
(graph omitted: Contrasts of Predictive Margins with 95% CIs; Contrasts of Linear Prediction plotted
against tons, with a reference line at 0)
As we did earlier, we contrast the two profiles. We can discern some difference between the two
profiles for midweight vehicles, but otherwise there is insufficient information to believe mileage
differs across domestic and foreign cars.
Video examples
Profile plots and interaction plots, part 1: A single categorical variable
Profile plots and interaction plots, part 2: A single continuous variable
Profile plots and interaction plots, part 3: Interactions between categorical variables
Profile plots and interaction plots, part 4: Interactions of continuous and categorical variables
Profile plots and interaction plots, part 5: Interactions of two continuous variables
Addendum: Advanced uses of dimlist
dimlist specifies the dimensions from the immediately preceding margins command that are to
be used for the marginsplot's x axis, plots, subgraphs, and graphs. dimlist may contain:

dim           Description

varname       Any variable referenced in the preceding margins command.

at(varname)   If a variable is specified in both the marginlist or the over() option and in the
              at() option of margins, then the two uses can be distinguished in marginsplot
              by typing the at() variables as at(varname) in dimlist.

deriv         If the preceding margins command included a dydx(), eyex(), dyex(), or
              eydx() option, dimlist may also contain deriv to specify all the variables over
              which derivatives were taken.

term          If the preceding margins command included multiple terms (for example, margins
              a b), then dimlist may contain term to enumerate those terms.

_atopt        If the preceding margins command included multiple at() options, then dimlist
              may contain _atopt to enumerate those at() options.

When the pairwise option is specified on margins, you may specify dimensions that enumerate
the pairwise comparisons.

_pw     enumerates all the pairwise comparisons
_pw0    enumerates the reference categories of the comparisons
_pw1    enumerates the comparison categories of the comparisons
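For example, after the margins, pwcompare command used earlier, one could place the pairwise
enumeration on the x axis explicitly (a sketch; with _pw on the x axis, each comparison gets its own
tick, much as in the horizontal example above):
. margins agegrp, pwcompare
. marginsplot, xdimension(_pw)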
Acknowledgments
We thank Philip B. Ender of UCLA Academic Technology Services for his programs that demonstrated
what could be done in this area. We also thank Michael N. Mitchell, author of the Stata Press books
Data Management Using Stata: A Practical Handbook and A Visual Guide to Stata Graphics, for his
generous advice and comprehensive insight into the application of margins and their plots.
References
McDowell, A., A. Engel, J. T. Massey, and K. Maurer. 1981. Plan and operation of the Second National Health and
Nutrition Examination Survey, 1976–1980. Vital and Health Statistics 1(15): 1–144.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Royston, P. 2013. marginscontplot: Plotting the marginal effects of continuous predictors. Stata Journal 13: 510–527.
Williams, R. 2012. Using the margins command to estimate and interpret adjusted predictions and marginal effects.
Stata Journal 12: 308–331.
Also see
[R]margins Marginal means, predictive margins, and marginal effects
[R]margins, contrast Contrasts of margins
[R]margins, pwcompare Pairwise comparisons of margins
[R]margins postestimation Postestimation tools for margins
Title
matsize — Set the maximum number of variables in a model
Syntax Description Option Remarks and examples Also see
Syntax
set matsize # [, permanently]
where 10 ≤ # ≤ 11000 for Stata/MP and Stata/SE and where 10 ≤ # ≤ 800 for Stata/IC.
Description
set matsize sets the maximum number of variables that can be included in any of Stata’s
estimation commands.
For Stata/MP and Stata/SE, the default value is 400, but it may be changed upward or downward.
The upper limit is 11,000.
For Stata/IC, the initial value is 400, but it may be changed upward or downward. The upper limit
is 800.
This command may not be used with Small Stata; matsize is permanently frozen at 100.
Changing matsize has no effect on Mata.
Option
permanently specifies that, in addition to making the change right now, the matsize setting be
remembered and become the default setting when you invoke Stata.
Remarks and examples
set matsize controls the internal size of matrices that Stata uses. The default of 400 for Stata/IC,
for instance, means that linear regression models are limited to 398 independent variables: 398
because the constant uses one position and the dependent variable another, making a total of 400.
You may change matsize with data in memory, but increasing matsize increases the amount of
memory consumed by Stata, increasing the probability of page faults and thus of making Stata run
more slowly.
Example 1
We wish to fit a model of y on the variables x1 through x400. Without thinking, we type
. regress y x1-x400
matsize too small
You have attempted to create a matrix with more than 400 rows or columns
or to fit a model with more than 400 variables plus ancillary parameters.
You need to increase matsize by using the set matsize command; see help
matsize.
r(908);
We realize that we need to increase matsize, so we type
. set matsize 450
. regress y x1-x400
(output omitted )
Programmers should note that the current setting of matsize is stored as the c-class value
c(matsize); see [P]creturn.
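For example, a do-file or program might check the current value and raise it only when needed (a
minimal sketch):
if c(matsize) < 450 {
        set matsize 450
}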
Also see
[R]query Display system parameters
[D]memory Memory management
[U] 6 Managing memory
Title
maximize — Details of iterative maximization
Syntax Description Maximization options
Option for set maxiter Remarks and examples Stored results
Methods and formulas References Also see
Syntax
Maximum likelihood optimization
mle_cmd . . . [, options]
Set default maximum iterations
set maxiter # [, permanently]
options                     Description

difficult                   use a different stepping algorithm in nonconcave regions
technique(algorithm_spec)   maximization technique
iterate(#)                  perform maximum of # iterations; default is iterate(16000)
[no]log                     display an iteration log of the log likelihood; typically, the default
trace                       display current parameter vector in iteration log
gradient                    display current gradient vector in iteration log
showstep                    report steps within an iteration in iteration log
hessian                     display current negative Hessian matrix in iteration log
showtolerance               report the calculated result that is compared to the effective
                            convergence criterion
tolerance(#)                tolerance for the coefficient vector; see Options for the defaults
ltolerance(#)               tolerance for the log likelihood; see Options for the defaults
nrtolerance(#)              tolerance for the scaled gradient; see Options for the defaults
qtolerance(#)               when specified with algorithms bhhh, dfp, or bfgs, the q−H
                            matrix is used as the final check for convergence rather than
                            nrtolerance() and the H matrix; seldom used
nonrtolerance               ignore the nrtolerance() option
from(init_specs)            initial values for the coefficients

where algorithm_spec is
    algorithm [#] [algorithm [#] . . . ]
algorithm is nr | bhhh | dfp | bfgs
and init_specs is one of
    matname [, skip copy]
    {[eqname:]name = # | /eqname = #} [. . .]
    # [# . . .], copy
Description
All Stata commands maximize likelihood functions using moptimize() and optimize(); see
Methods and formulas below. Commands use the Newton–Raphson method with step halving
and special fixups when they encounter nonconcave regions of the likelihood. For details, see
[M-5]moptimize( ) and [M-5]optimize( ). For more information about programming maximum like-
lihood estimators in ado-files and Mata, see [R]ml and the fourth edition of Maximum Likelihood
Estimation with Stata (Gould, Pitblado, and Poi 2010).
set maxiter specifies the default maximum number of iterations for estimation commands that
iterate. The initial value is 16000, and # can be 0 to 16000. To change the maximum number of
iterations performed by a particular estimation command, you need not reset maxiter; you can
specify the iterate(#) option. When iterate(#) is not specified, the maxiter value is used.
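For example (a sketch using hypothetical variables y, x1, and x2), to cap one particular fit at 50
iterations without changing the session default, and then to change the default itself:
. logit y x1 x2, iterate(50)
. set maxiter 500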
Maximization options
difficult specifies that the likelihood function is likely to be difficult to maximize because of
nonconcave regions. When the message “not concave” appears repeatedly, ml's standard stepping
algorithm may not be working well. difficult specifies that a different stepping algorithm be
used in nonconcave regions. There is no guarantee that difficult will work better than the
default; sometimes it is better and sometimes it is worse. You should use the difficult option
only when the default stepper declares convergence and the last iteration is “not concave” or
when the default stepper is repeatedly issuing “not concave” messages and producing only tiny
improvements in the log likelihood.
technique(algorithm_spec) specifies how the likelihood function is to be maximized. The following
algorithms are allowed. For details, see Gould, Pitblado, and Poi (2010).
technique(nr) specifies Stata’s modified Newton–Raphson (NR) algorithm.
technique(bhhh) specifies the Berndt–Hall–Hall–Hausman (BHHH) algorithm.
technique(dfp) specifies the Davidon–Fletcher–Powell (DFP) algorithm.
technique(bfgs) specifies the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
The default is technique(nr).
You can switch between algorithms by specifying more than one in the technique() option. By
default, an algorithm is used for five iterations before switching to the next algorithm. To specify a
different number of iterations, include the number after the technique in the option. For example,
specifying technique(bhhh 10 nr 1000) requests that ml perform 10 iterations with the BHHH
algorithm followed by 1000 iterations with the NR algorithm, and then switch back to BHHH for
10 iterations, and so on. The process continues until convergence or until the maximum number
of iterations is reached.
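For instance (a sketch reusing the blood-pressure model from the marginsplot entry; any estimator
that accepts these algorithms could be substituted):
. logistic highbp sex##agegrp##c.bmi, technique(bfgs 10 nr 1000)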
iterate(#) specifies the maximum number of iterations. When the number of iterations equals
iterate(), the optimizer stops and presents the current results. If convergence is declared before
this threshold is reached, it will stop when convergence is declared. Specifying iterate(0)
is useful for viewing results evaluated at the initial value of the coefficient vector. Specifying
iterate(0) and from() together allows you to view results evaluated at a specified coefficient
vector; however, not all commands allow the from() option. The default value of iterate(#)
for both estimators programmed internally and estimators programmed with ml is the current value
of set maxiter, which is iterate(16000) by default.
log and nolog specify whether an iteration log showing the progress of the log likelihood is to be
displayed. For most commands, the log is displayed by default, and nolog suppresses it. For a
few commands (such as the svy maximum likelihood estimators), you must specify log to see
the log.
trace adds to the iteration log a display of the current parameter vector.
gradient adds to the iteration log a display of the current gradient vector.
showstep adds to the iteration log a report on the steps within an iteration. This option was added so
that developers at StataCorp could view the stepping when they were improving the ml optimizer
code. At this point, it mainly provides entertainment.
hessian adds to the iteration log a display of the current negative Hessian matrix.
showtolerance adds to the iteration log the calculated value that is compared with the effective
convergence criterion at the end of each iteration. Until convergence is achieved, the smallest
calculated value is reported.
shownrtolerance is a synonym of showtolerance.
Below we describe the three convergence tolerances. Convergence is declared when the
nrtolerance() criterion is met and either the tolerance() or the ltolerance() criterion is
also met.
tolerance(#) specifies the tolerance for the coefficient vector. When the relative change in the
coefficient vector from one iteration to the next is less than or equal to tolerance(), the
tolerance() convergence criterion is satisfied.
tolerance(1e-4) is the default for estimators programmed with ml.
tolerance(1e-6) is the default.
ltolerance(#) specifies the tolerance for the log likelihood. When the relative change in the log
likelihood from one iteration to the next is less than or equal to ltolerance(), the ltolerance()
convergence is satisfied.
ltolerance(0) is the default for estimators programmed with ml.
ltolerance(1e-7) is the default.
nrtolerance(#) specifies the tolerance for the scaled gradient. Convergence is declared when
$g H^{-1} g' <$ nrtolerance(). The default is nrtolerance(1e-5).
qtolerance(#) when specified with algorithms bhhh, dfp, or bfgs uses the q−H matrix as the
final check for convergence rather than nrtolerance() and the H matrix.
Beginning with Stata 12, by default, Stata now computes the H matrix when the q−H matrix passes
the convergence tolerance, and Stata requires that H be concave and pass the nrtolerance()
criterion before concluding convergence has occurred.
qtolerance() provides a way for the user to obtain Stata’s earlier behavior.
nonrtolerance specifies that the default nrtolerance() criterion be turned off.
from() specifies initial values for the coefficients. Not all estimators in Stata support this option. You
can specify the initial values in one of three ways: by specifying the name of a vector containing
the initial values (for example, from(b0), where b0 is a properly labeled vector); by specifying
coefficient names with the values (for example, from(age=2.1 /sigma=7.4)); or by specifying
a list of values (for example, from(2.1 7.4, copy)). from() is intended for use when doing
bootstraps (see [R]bootstrap) and in other special situations (for example, with iterate(0)).
Even when the values specified in from() are close to the values that maximize the likelihood,
only a few iterations may be saved. Poor values in from() may lead to convergence problems.
skip specifies that any parameters found in the specified initialization vector that are not also
found in the model be ignored. The default action is to issue an error message.
copy specifies that the list of values or the initialization vector be copied into the initial-value
vector by position rather than by name.
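For example (a sketch with hypothetical variables; the estimation command used must itself allow
from()), you can display results at a given coefficient vector by combining from() with iterate(0):
. logit y x1 x2
. matrix b0 = e(b)
. logit y x1 x2, from(b0) iterate(0)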
Option for set maxiter
permanently specifies that, in addition to making the change right now, the maxiter setting be
remembered and become the default setting when you invoke Stata.
Remarks and examples
Only in rare circumstances would you ever need to specify any of these options, except nolog.
The nolog option is useful for reducing the amount of output appearing in log files.
The following is an example of an iteration log:
Iteration 0: log likelihood = -3791.0251
Iteration 1: log likelihood = -3761.738
Iteration 2: log likelihood = -3758.0632 (not concave)
Iteration 3: log likelihood = -3758.0447
Iteration 4: log likelihood = -3757.5861
Iteration 5: log likelihood = -3757.474
Iteration 6: log likelihood = -3757.4613
Iteration 7: log likelihood = -3757.4606
Iteration 8: log likelihood = -3757.4606
(table of results omitted )
At iteration 8, the model converged. The message “not concave” at the second iteration is notable.
This example was produced using the heckman command; its likelihood is not globally concave, so
it is not surprising that this message sometimes appears. The other message that is occasionally seen
is “backed up”. Neither of these messages should be of any concern unless they appear at the final
iteration.
If a “not concave” message appears at the last step, there are two possibilities. One is that the
result is valid, but there is collinearity in the model that the command did not otherwise catch. Stata
checks for obvious collinearity among the independent variables before performing the maximization,
but strange collinearities or near collinearities can sometimes arise between coefficients and ancillary
parameters. The second, more likely cause for a “not concave” message at the final step is that the
optimizer entered a flat region of the likelihood and prematurely declared convergence.
If a “backed up” message appears at the last step, there are also two possibilities. One is that Stata
found a perfect maximum and could not step to a better point; if this is the case, all is fine, but this
is a highly unlikely occurrence. The second is that the optimizer worked itself into a bad concave
spot where the computed gradient and Hessian gave a bad direction for stepping.
If either of these messages appears at the last step, perform the maximization again with the
gradient option. If the gradient goes to zero, the optimizer has found a maximum that may not
be unique but is a maximum. From the standpoint of maximum likelihood estimation, this is a valid
result. If the gradient is not zero, it is not a valid result, and you should try tightening up the
convergence criterion, or try ltol(0) tol(1e-7) to see if the optimizer can work its way out of
the bad region.
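For example (a sketch; substitute your own estimation command and variables):
. logistic highbp sex##agegrp##c.bmi, gradient ltolerance(0) tolerance(1e-7)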
If you get repeated “not concave” steps with little progress being made at each step, try specifying
the difficult option. Sometimes difficult works wonderfully, reducing the number of iterations
and producing convergence at a good (that is, concave) point. Other times, difficult works poorly,
taking much longer to converge than the default stepper.
Stored results
Maximum likelihood estimators store the following in e():
Scalars
  e(N)                 number of observations                               always stored
  e(k)                 number of parameters                                 always stored
  e(k_eq)              number of equations in e(b)                          usually stored
  e(k_eq_model)        number of equations in overall model test            usually stored
  e(k_dv)              number of dependent variables                        usually stored
  e(df_m)              model degrees of freedom                             always stored
  e(r2_p)              pseudo-R-squared                                     sometimes stored
  e(ll)                log likelihood                                       always stored
  e(ll_0)              log likelihood, constant-only model                  stored when constant-only model is fit
  e(N_clust)           number of clusters                                   stored when vce(cluster clustvar) is specified;
                                                                            see [U] 20.21 Obtaining robust variance estimates
  e(chi2)              χ²                                                   usually stored
  e(p)                 significance of model test                           usually stored
  e(rank)              rank of e(V)                                         always stored
  e(rank0)             rank of e(V) for constant-only model                 stored when constant-only model is fit
  e(ic)                number of iterations                                 usually stored
  e(rc)                return code                                          usually stored
  e(converged)         1 if converged, 0 otherwise                          usually stored
Macros
  e(cmd)               name of command                                      always stored
  e(cmdline)           command as typed                                     always stored
  e(depvar)            names of dependent variables                         always stored
  e(wtype)             weight type                                          stored when weights are specified or implied
  e(wexp)              weight expression                                    stored when weights are specified or implied
  e(title)             title in estimation output                           usually stored by commands using ml
  e(clustvar)          name of cluster variable                             stored when vce(cluster clustvar) is specified;
                                                                            see [U] 20.21 Obtaining robust variance estimates
  e(chi2type)          Wald or LR; type of model χ² test                    usually stored
  e(vce)               vcetype specified in vce()                           stored when command allows vce()
  e(vcetype)           title used to label Std. Err.                        sometimes stored
  e(opt)               type of optimization                                 always stored
  e(which)             max or min; whether optimizer is to perform
                       maximization or minimization                         always stored
  e(ml_method)         type of ml method                                    always stored by commands using ml
  e(user)              name of likelihood-evaluator program                 always stored
  e(technique)         from technique() option                              sometimes stored
  e(singularHmethod)   m-marquardt or hybrid; method used when
                       Hessian is singular                                  sometimes stored¹
  e(crittype)          optimization criterion                               always stored¹
  e(properties)        estimator properties                                 always stored
  e(predict)           program used to implement predict                    usually stored
Matrices
  e(b)                 coefficient vector                                   always stored
  e(Cns)               constraints matrix                                   sometimes stored
  e(ilog)              iteration log (up to 20 iterations)                  usually stored
  e(gradient)          gradient vector                                      usually stored
  e(V)                 variance–covariance matrix of the estimators         always stored
  e(V_modelbased)      model-based variance                                 only stored when e(V) is neither the OIM nor
                                                                            OPG variance
Functions
  e(sample)            marks estimation sample                              always stored
1. Type ereturn list, all to view these results; see [P] return.
See Stored results in the manual entry for any maximum likelihood estimator for a list of returned
results.
Methods and formulas
Optimization is currently performed by moptimize() and optimize(), with the former imple-
mented in terms of the latter; see [M-5]moptimize( ) and [M-5]optimize( ). Some estimators use
moptimize() and optimize() directly, and others use the ml ado-file interface to moptimize().
Prior to Stata 11, Stata had three separate optimization engines: an internal one used by estimation
commands implemented in C code; ml implemented in ado-code separately from moptimize()
and used by most estimators; and moptimize() and optimize() used by a few recently written
estimators. These days, the internal optimizer and the old version of ml are used only under version
control. In addition, arch and arima (see [TS]arch and [TS]arima) are currently implemented using
the old ml.
Let $L_1$ be the log likelihood of the full model (that is, the log-likelihood value shown on the
output), and let $L_0$ be the log likelihood of the “constant-only” model. The likelihood-ratio χ² model
test is defined as $2(L_1 - L_0)$. The pseudo-$R^2$ (McFadden 1974) is defined as $1 - L_1/L_0$. This
is simply the log likelihood on a scale where 0 corresponds to the “constant-only” model and 1
corresponds to perfect prediction for a discrete model (in which case the overall log likelihood is 0).
Some maximum likelihood routines can report coefficients in an exponentiated form, for example,
odds ratios in logistic. Let $b$ be the unexponentiated coefficient, $s$ its standard error, and $b_0$ and $b_1$
the reported confidence interval for $b$. In exponentiated form, the point estimate is $e^b$, the standard
error $e^b s$, and the confidence interval $e^{b_0}$ and $e^{b_1}$. The displayed $Z$ (or $t$) statistics and p-values are
the same as those for the unexponentiated results. This is justified because $e^b = 1$ and $b = 0$ are
equivalent hypotheses, and normality is more likely to hold in the $b$ metric.
References
Gould, W. W., J. S. Pitblado, and B. P. Poi. 2010. Maximum Likelihood Estimation with Stata. 4th ed. College
Station, TX: Stata Press.
McFadden, D. L. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed.
P. Zarembka, 105–142. New York: Academic Press.
Also see
[R]ml Maximum likelihood estimation
[SVY]ml for svy Maximum pseudolikelihood estimation for survey data
[M-5]moptimize( ) Model optimization
[M-5]optimize( ) Function optimization
Title
mean — Estimate means
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
mean varlist [if] [in] [weight] [, options]
options                      Description
Model
  stdize(varname)            variable identifying strata for standardization
  stdweight(varname)         weight variable for standardization
  nostdrescale               do not rescale the standard weight variable
if/in/over
  over(varlist[, nolabel])   group over subpopulations defined by varlist; optionally,
                             suppress group labels
SE/Cluster
  vce(vcetype)               vcetype may be analytic, cluster clustvar, bootstrap, or
                             jackknife
Reporting
  level(#)                   set confidence level; default is level(95)
  noheader                   suppress table header
  nolegend                   suppress table legend
  display_options            control column formats and line width
  coeflegend                 display legend instead of statistics
bootstrap, jackknife, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Means
Description
mean produces estimates of means, along with standard errors.
Options
 
Model
stdize(varname) specifies that the point estimates be adjusted by direct standardization across the
strata identified by varname. This option requires the stdweight() option.
stdweight(varname) specifies the weight variable associated with the standard strata identified in
the stdize() option. The standardization weights must be constant within the standard strata.
nostdrescale prevents the standardization weights from being rescaled within the over() groups.
This option requires stdize() but is ignored if the over() option is not specified.
 
if/in/over
over(varlist[, nolabel]) specifies that estimates be computed for multiple subpopulations, which
are identified by the different values of the variables in varlist.
When this option is supplied with one variable name, such as over(varname), the value labels of
varname are used to identify the subpopulations. If varname does not have labeled values (or there
are unlabeled values), the values themselves are used, provided that they are nonnegative integers.
Noninteger values, negative values, and labels that are not valid Stata names are substituted with
a default identifier.
When over() is supplied with multiple variable names, each subpopulation is assigned a unique
default identifier.
nolabel requests that value labels attached to the variables identifying the subpopulations be
ignored.
 
SE/Cluster
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (analytic), that allow for intragroup correlation (cluster clustvar), and that
use bootstrap or jackknife methods (bootstrap,jackknife); see [R]vce option.
vce(analytic), the default, uses the analytically derived variance estimator associated with the
sample mean.
 
Reporting
level(#); see [R]estimation options.
noheader prevents the table header from being displayed. This option implies nolegend.
nolegend prevents the table legend identifying the subpopulations from being displayed.
display_options: cformat(%fmt) and nolstretch; see [R] estimation options.
The following option is available with mean but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Example 1
Using the fuel data from example 3 of [R]ttest, we estimate the average mileage of the cars
without the fuel treatment (mpg1) and those with the fuel treatment (mpg2).
. use http://www.stata-press.com/data/r13/fuel
. mean mpg1 mpg2
Mean estimation Number of obs = 12
Mean Std. Err. [95% Conf. Interval]
mpg1 21 .7881701 19.26525 22.73475
mpg2 22.75 .9384465 20.68449 24.81551
Using these results, we can test the equality of the mileage between the two groups of cars.
. test mpg1 = mpg2
( 1) mpg1 - mpg2 = 0
F( 1, 11) = 5.04
Prob > F = 0.0463
Example 2
In example 1, the joint observations of mpg1 and mpg2 were used to estimate a covariance between
their means.
. matrix list e(V)
symmetric e(V)[2,2]
mpg1 mpg2
mpg1 .62121212
mpg2 .4469697 .88068182
If the data were organized this way out of convenience but the two variables represent independent
samples of cars (coincidentally of the same sample size), we should reshape the data and use the
over() option to ensure that the covariance between the means is zero.
. use http://www.stata-press.com/data/r13/fuel
. stack mpg1 mpg2, into(mpg) clear
. mean mpg, over(_stack)
Mean estimation Number of obs = 24
1: _stack = 1
2: _stack = 2
Over Mean Std. Err. [95% Conf. Interval]
mpg
1 21 .7881701 19.36955 22.63045
2 22.75 .9384465 20.80868 24.69132
. matrix list e(V)
symmetric e(V)[2,2]
mpg: mpg:
1 2
mpg:1 .62121212
mpg:2 0 .88068182
Now we can test the equality of the mileage between the two independent groups of cars.
. test [mpg]1 = [mpg]2
( 1) [mpg]1 - [mpg]2 = 0
F( 1, 23) = 2.04
Prob > F = 0.1667
Example 3: standardized means
Suppose that we collected the blood pressure data from example 2 of [R]dstdize, and we wish to
obtain standardized high blood pressure rates for each city in 1990 and 1992, using, as the standard,
the age, sex, and race distribution of the four cities and two years combined. Our rate is really the
mean of a variable that indicates whether a sampled individual has high blood pressure. First, we
generate the strata and weight variables from our standard distribution, and then use mean to compute
the rates.
. use http://www.stata-press.com/data/r13/hbp, clear
. egen strata = group(age race sex) if inlist(year, 1990, 1992)
(675 missing values generated)
. by strata, sort: gen stdw = _N
. mean hbp, over(city year) stdize(strata) stdweight(stdw)
Mean estimation
N. of std strata = 24 Number of obs = 455
Over: city year
_subpop_1: 1 1990
_subpop_2: 1 1992
_subpop_3: 2 1990
_subpop_4: 2 1992
_subpop_5: 3 1990
_subpop_6: 3 1992
_subpop_7: 5 1990
_subpop_8: 5 1992
Over Mean Std. Err. [95% Conf. Interval]
hbp
_subpop_1 .058642 .0296273 .0004182 .1168657
_subpop_2 .0117647 .0113187 -.0104789 .0340083
_subpop_3 .0488722 .0238958 .0019121 .0958322
_subpop_4 .014574 .007342 .0001455 .0290025
_subpop_5 .1011211 .0268566 .0483425 .1538998
_subpop_6 .0810577 .0227021 .0364435 .1256719
_subpop_7 .0277778 .0155121 -.0027066 .0582622
_subpop_8 .0548926 0 . .
The standard error of the high blood pressure rate estimate is missing for city 5 in 1992 because
there was only one individual with high blood pressure; that individual was the only person observed
in the stratum of white males 30–35 years old.
By default, mean rescales the standard weights within the over() groups. In the following, we
use the nostdrescale option to prevent this, thus reproducing the results in [R]dstdize.
. mean hbp, over(city year) nolegend stdize(strata) stdweight(stdw)
> nostdrescale
Mean estimation
N. of std strata = 24 Number of obs = 455
Over Mean Std. Err. [95% Conf. Interval]
hbp
_subpop_1 .0073302 .0037034 .0000523 .0146082
_subpop_2 .0015432 .0014847 -.0013745 .004461
_subpop_3 .0078814 .0038536 .0003084 .0154544
_subpop_4 .0025077 .0012633 .000025 .0049904
_subpop_5 .0155271 .0041238 .007423 .0236312
_subpop_6 .0081308 .0022772 .0036556 .012606
_subpop_7 .0039223 .0021904 -.0003822 .0082268
_subpop_8 .0088735 0 . .
Video example
Descriptive statistics in Stata
Stored results
mean stores the following in e():
Scalars
  e(N)               number of observations
  e(N_over)          number of subpopulations
  e(N_stdize)        number of standard strata
  e(N_clust)         number of clusters
  e(k_eq)            number of equations in e(b)
  e(df_r)            sample degrees of freedom
  e(rank)            rank of e(V)
Macros
  e(cmd)             mean
  e(cmdline)         command as typed
  e(varlist)         varlist
  e(stdize)          varname from stdize()
  e(stdweight)       varname from stdweight()
  e(wtype)           weight type
  e(wexp)            weight expression
  e(title)           title in estimation output
  e(cluster)         name of cluster variable
  e(over)            varlist from over()
  e(over_labels)     labels from over() variables
  e(over_namelist)   names from e(over_labels)
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(properties)      b V
  e(estat_cmd)       program used to implement estat
  e(marginsnotok)    predictions disallowed by margins
Matrices
  e(b)               vector of mean estimates
  e(V)               (co)variance estimates
  e(_N)              vector of numbers of nonmissing observations
  e(_N_stdsum)       number of nonmissing observations within the standard strata
  e(_p_stdize)       standardizing proportions
  e(error)           error code corresponding to e(b)
Functions
  e(sample)          marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
The mean estimator
Survey data
The survey mean estimator
The standardized mean estimator
The poststratified mean estimator
The standardized poststratified mean estimator
Subpopulation estimation
The mean estimator
Let $y$ be the variable on which we want to calculate the mean and $y_j$ an individual observation on
$y$, where $j = 1, \dots, n$ and $n$ is the sample size. Let $w_j$ be the weight, and if no weight is specified,
define $w_j = 1$ for all $j$. For aweights, the $w_j$ are normalized to sum to $n$. See The survey mean
estimator for pweighted data.

Let $W$ be the sum of the weights

$$ W = \sum_{j=1}^{n} w_j $$

The mean is defined as

$$ \bar{y} = \frac{1}{W} \sum_{j=1}^{n} w_j y_j $$

The default variance estimator for the mean is

$$ \widehat{V}(\bar{y}) = \frac{1}{W(W-1)} \sum_{j=1}^{n} w_j (y_j - \bar{y})^2 $$

The standard error of the mean is the square root of the variance.

If $x$, $x_j$, and $\bar{x}$ are similarly defined for another variable (observed jointly with $y$), the covariance
estimator between $\bar{x}$ and $\bar{y}$ is

$$ \widehat{\mathrm{Cov}}(\bar{x}, \bar{y}) = \frac{1}{W(W-1)} \sum_{j=1}^{n} w_j (x_j - \bar{x})(y_j - \bar{y}) $$
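In the unweighted case ($w_j = 1$, so $W = n$), the variance formula above reduces to $s^2/n$, which is
easy to verify against summarize (a quick sketch using the automobile data):
. use http://www.stata-press.com/data/r13/auto, clear
. mean mpg
. summarize mpg
. display r(sd)/sqrt(r(N))
The last display reproduces the standard error reported by mean.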
Survey data
See [SVY] variance estimation, [SVY] direct standardization, and [SVY] poststratification for
discussions that provide background information for the following formulas. The following formulas
are derived from the fact that the mean is a special case of the ratio estimator where the denominator
variable is one, $x_j = 1$; see [R] ratio.
The survey mean estimator
Let $Y_j$ be a survey item for the $j$th individual in the population, where $j = 1, \dots, M$ and $M$
is the size of the population. The associated population mean for the item of interest is $\bar{Y} = Y/M$,
where

$$ Y = \sum_{j=1}^{M} Y_j $$

Let $y_j$ be the survey item for the $j$th sampled individual from the population, where $j = 1, \dots, m$
and $m$ is the number of observations in the sample.

The estimator for the mean is $\bar{y} = \widehat{Y}/\widehat{M}$, where

$$ \widehat{Y} = \sum_{j=1}^{m} w_j y_j \qquad \text{and} \qquad \widehat{M} = \sum_{j=1}^{m} w_j $$

and $w_j$ is a sampling weight. The score variable for the mean estimator is

$$ z_j(\bar{y}) = \frac{y_j - \bar{y}}{\widehat{M}} = \frac{\widehat{M} y_j - \widehat{Y}}{\widehat{M}^2} $$
The standardized mean estimator
Let $D_g$ denote the set of sampled observations that belong to the $g$th standard stratum and define
$I_{D_g}(j)$ to indicate if the $j$th observation is a member of the $g$th standard stratum, where $g = 1, \dots, L_D$
and $L_D$ is the number of standard strata. Also, let $\pi_g$ denote the fraction of the population that
belongs to the $g$th standard stratum, thus $\pi_1 + \cdots + \pi_{L_D} = 1$. $\pi_g$ is derived from the stdweight()
option.

The estimator for the standardized mean is

$$ \bar{y}^D = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{Y}_g}{\widehat{M}_g} $$

where

$$ \widehat{Y}_g = \sum_{j=1}^{m} I_{D_g}(j)\, w_j y_j \qquad \text{and} \qquad \widehat{M}_g = \sum_{j=1}^{m} I_{D_g}(j)\, w_j $$

The score variable for the standardized mean is

$$ z_j(\bar{y}^D) = \sum_{g=1}^{L_D} \pi_g I_{D_g}(j)\, \frac{\widehat{M}_g y_j - \widehat{Y}_g}{\widehat{M}_g^{\,2}} $$
The poststratified mean estimator
Let $P_k$ denote the set of sampled observations that belong to poststratum $k$ and define $I_{P_k}(j)$
to indicate if the $j$th observation is a member of poststratum $k$, where $k = 1, \dots, L_P$ and $L_P$ is
the number of poststrata. Also let $M_k$ denote the population size for poststratum $k$. $P_k$ and $M_k$ are
identified by specifying the poststrata() and postweight() options on svyset; see [SVY] svyset.

The estimator for the poststratified mean is

$$ \bar{y}^P = \frac{\widehat{Y}^P}{\widehat{M}^P} = \frac{\widehat{Y}^P}{M} $$

where

$$ \widehat{Y}^P = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{Y}_k = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{P_k}(j)\, w_j y_j $$

and

$$ \widehat{M}^P = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{M}_k = \sum_{k=1}^{L_P} M_k = M $$

The score variable for the poststratified mean is

$$ z_j(\bar{y}^P) = \frac{z_j(\widehat{Y}^P)}{M} = \frac{1}{M} \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( y_j - \frac{\widehat{Y}_k}{\widehat{M}_k} \right) $$
The standardized poststratified mean estimator
The estimator for the standardized poststratified mean is

$$ \bar{y}^{DP} = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{Y}_g^P}{\widehat{M}_g^P} $$

where

$$ \widehat{Y}_g^P = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{Y}_{g,k} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j)\, w_j y_j $$

and

$$ \widehat{M}_g^P = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{M}_{g,k} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j)\, w_j $$

The score variable for the standardized poststratified mean is

$$ z_j(\bar{y}^{DP}) = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{M}_g^P z_j(\widehat{Y}_g^P) - \widehat{Y}_g^P z_j(\widehat{M}_g^P)}{(\widehat{M}_g^P)^2} $$

where

$$ z_j(\widehat{Y}_g^P) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_{D_g}(j)\, y_j - \frac{\widehat{Y}_{g,k}}{\widehat{M}_k} \right) $$

and

$$ z_j(\widehat{M}_g^P) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_{D_g}(j) - \frac{\widehat{M}_{g,k}}{\widehat{M}_k} \right) $$
Subpopulation estimation
Let $S$ denote the set of sampled observations that belong to the subpopulation of interest, and
define $I_S(j)$ to indicate if the $j$th observation falls within the subpopulation.

The estimator for the subpopulation mean is $\bar{y}^S = \widehat{Y}^S / \widehat{M}^S$, where

$$ \widehat{Y}^S = \sum_{j=1}^{m} I_S(j)\, w_j y_j \qquad \text{and} \qquad \widehat{M}^S = \sum_{j=1}^{m} I_S(j)\, w_j $$

Its score variable is

$$ z_j(\bar{y}^S) = I_S(j)\, \frac{y_j - \bar{y}^S}{\widehat{M}^S} = I_S(j)\, \frac{\widehat{M}^S y_j - \widehat{Y}^S}{(\widehat{M}^S)^2} $$

The estimator for the standardized subpopulation mean is

$$ \bar{y}^{DS} = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{Y}_g^S}{\widehat{M}_g^S} $$

where

$$ \widehat{Y}_g^S = \sum_{j=1}^{m} I_{D_g}(j) I_S(j)\, w_j y_j \qquad \text{and} \qquad \widehat{M}_g^S = \sum_{j=1}^{m} I_{D_g}(j) I_S(j)\, w_j $$

Its score variable is

$$ z_j(\bar{y}^{DS}) = \sum_{g=1}^{L_D} \pi_g I_{D_g}(j) I_S(j)\, \frac{\widehat{M}_g^S y_j - \widehat{Y}_g^S}{(\widehat{M}_g^S)^2} $$

The estimator for the poststratified subpopulation mean is

$$ \bar{y}^{PS} = \frac{\widehat{Y}^{PS}}{\widehat{M}^{PS}} $$

where

$$ \widehat{Y}^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{Y}_k^S = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{P_k}(j) I_S(j)\, w_j y_j $$

and

$$ \widehat{M}^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{M}_k^S = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{P_k}(j) I_S(j)\, w_j $$

Its score variable is

$$ z_j(\bar{y}^{PS}) = \frac{\widehat{M}^{PS} z_j(\widehat{Y}^{PS}) - \widehat{Y}^{PS} z_j(\widehat{M}^{PS})}{(\widehat{M}^{PS})^2} $$

where

$$ z_j(\widehat{Y}^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_S(j)\, y_j - \frac{\widehat{Y}_k^S}{\widehat{M}_k} \right) $$

and

$$ z_j(\widehat{M}^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_S(j) - \frac{\widehat{M}_k^S}{\widehat{M}_k} \right) $$

The estimator for the standardized poststratified subpopulation mean is

$$ \bar{y}^{DPS} = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{Y}_g^{PS}}{\widehat{M}_g^{PS}} $$

where

$$ \widehat{Y}_g^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{Y}_{g,k}^S = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j) I_S(j)\, w_j y_j $$

and

$$ \widehat{M}_g^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \widehat{M}_{g,k}^S = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j) I_S(j)\, w_j $$

Its score variable is

$$ z_j(\bar{y}^{DPS}) = \sum_{g=1}^{L_D} \pi_g \frac{\widehat{M}_g^{PS} z_j(\widehat{Y}_g^{PS}) - \widehat{Y}_g^{PS} z_j(\widehat{M}_g^{PS})}{(\widehat{M}_g^{PS})^2} $$

where

$$ z_j(\widehat{Y}_g^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_{D_g}(j) I_S(j)\, y_j - \frac{\widehat{Y}_{g,k}^S}{\widehat{M}_k} \right) $$

and

$$ z_j(\widehat{M}_g^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\widehat{M}_k} \left( I_{D_g}(j) I_S(j) - \frac{\widehat{M}_{g,k}^S}{\widehat{M}_k} \right) $$
References
Bakker, A. 2003. The early history of average values and implications for education. Journal of Statistics Education
11(1). http://www.amstat.org/publications/jse/v11n1/bakker.html.
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Also see
[R]mean postestimation Postestimation tools for mean
[R]ameans Arithmetic, geometric, and harmonic means
[R]proportion Estimate proportions
[R]ratio Estimate ratios
[R]summarize Summary statistics
[R]total Estimate totals
[MI]estimation Estimation commands for use with mi estimate
[SVY]direct standardization Direct standardization of means, proportions, and ratios
[SVY]poststratification Poststratification for survey data
[SVY]subpopulation estimation Subpopulation estimation for survey data
[SVY]svy estimation Estimation commands for survey data
[SVY]variance estimation Variance estimation for survey data
[U] 20 Estimation and postestimation commands
Title
mean postestimation — Postestimation tools for mean
Description Remarks and examples Also see
Description
The following postestimation commands are available after mean:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
Example 1
We have a dataset with monthly rates of return on the Dow and NASDAQ stock indices. We can
use mean to compute the average quarterly rates of return for the two indices separately:
. use http://www.stata-press.com/data/r13/rates
. mean dow nasdaq
Mean estimation Number of obs = 357
Mean Std. Err. [95% Conf. Interval]
dow .2489137 6.524386 -12.58227 13.0801
nasdaq 10.78477 4.160821 2.601887 18.96765
If you chose just one of the indices for your portfolio, you either did rather well or rather poorly,
depending on which one you picked. However, as we now show with the postestimation command
lincom, if you diversified your portfolio, you would have earned a respectable 5.5% rate of return
without having to guess which index would be the better performer.
. lincom .5*dow + .5*nasdaq
( 1) .5*dow + .5*nasdaq = 0
Mean Coef. Std. Err. t P>|t| [95% Conf. Interval]
(1) 5.51684 4.262673 1.29 0.196 -2.866347 13.90003
Also see
[R]mean Estimate means
[U] 20 Estimation and postestimation commands
Title
meta — Meta-analysis
Remarks and examples References
Remarks and examples
Stata does not have a meta-analysis command. Stata users, however, have developed an excellent
suite of commands for performing meta-analysis, including commands for performing standard and
cumulative meta-analysis, commands for producing forest plots and contour-enhanced funnel plots,
and commands for nonparametric analysis of publication bias.
Many articles describing these commands have been published in the Stata Technical Bulletin and
the Stata Journal. These articles were updated and published in a cohesive collection: Meta-Analysis
in Stata: An Updated Collection from the Stata Journal.
In this collection, editor Jonathan Sterne discusses how these articles relate to each other and how
they fit in the overall literature of meta-analysis. Sterne has organized the collection into four areas:
classic meta-analysis; meta-regression; graphical and analytic tools for detecting bias; and recent
advances such as meta-analysis for dose–response curves, diagnostic accuracy, multivariate analysis,
and studies containing missing values.
All meta-analysis commands discussed in this collection may be downloaded by visiting
http://www.stata-press.com/books/mais.html.
We highly recommend that Stata users interested in meta-analysis read this book. Since the
publication of the meta-analysis collection, Kontopantelis and Reeves (2010) published an article
in the Stata Journal describing a new command metaan that performs fixed- or random-effects
meta-analysis.
Please also see the following FAQ on the Stata website:
What meta-analysis features are available in Stata?
http://www.stata.com/support/faqs/stat/meta.html
References
Borenstein, M., L. V. Hedges, J. P. T. Higgins, and H. R. Rothstein. 2009. Introduction to Meta-Analysis. Chichester,
UK: Wiley.
Crowther, M. J., S. R. Hinchliffe, A. Donald, and A. J. Sutton. 2013. Simulation-based sample-size calculation for
designing new clinical trials and diagnostic test accuracy studies to update an existing meta-analysis. Stata Journal
13: 451–473.
Crowther, M. J., D. Langan, and A. J. Sutton. 2012. Graphical augmentations to the funnel plot to assess the impact
of a new study on an existing meta-analysis. Stata Journal 12: 605–622.
Egger, M., G. Davey Smith, and D. G. Altman, ed. 2001. Systematic Reviews in Health Care: Meta-analysis in
Context. 2nd ed. London: BMJ Books.
Kontopantelis, E., and D. Reeves. 2010. metaan: Random-effects meta-analysis. Stata Journal 10: 395–407.
Kontopantelis, E., and D. Reeves. 2013. A short guide and a forest plot command (ipdforest) for one-stage
meta-analysis. Stata Journal 13: 574–587.
Miladinovic, B., I. Hozo, A. Chaimani, and B. Djulbegovic. 2014. Indirect treatment comparison. Stata Journal 14:
76–86.
Miladinovic, B., I. Hozo, and B. Djulbegovic. 2013. Trial sequential boundaries for cumulative meta-analyses. Stata
Journal 13: 77–91.
Ringquist, E. J. 2013. Meta-Analysis for Public Management and Policy. San Francisco: Jossey-Bass.
Sterne, J. A. C., ed. 2009. Meta-Analysis in Stata: An Updated Collection from the Stata Journal. College Station,
TX: Stata Press.
Sutton, A. J., K. R. Abrams, D. R. Jones, T. A. Sheldon, and F. Song. 2000. Methods for Meta-Analysis in Medical
Research. New York: Wiley.
White, I. R. 2011. Multivariate random-effects meta-regression: Updates to mvmeta. Stata Journal 11: 240–254.
Title
mfp — Multivariable fractional polynomial models
Syntax Menu Description Options
Remarks and examples Stored results Acknowledgments References
Also see
Syntax
mfp [, options]: regression_cmd [yvar1 [yvar2]] xvarlist [if] [in] [weight]
[, regression_cmd_options]
options                    Description
Model 2
  sequential               use the Royston and Altman model-selection algorithm; default uses
                           closed-test procedure
  cycles(#)                maximum number of iteration cycles; default is cycles(5)
  dfdefault(#)             default maximum degrees of freedom; default is dfdefault(4)
  center(cent_list)        specification of centering for the independent variables
  alpha(alpha_list)        p-values for testing between FP models; default is alpha(0.05)
  df(df_list)              degrees of freedom for each predictor
  powers(numlist)          list of FP powers to use; default is powers(-2 -1(.5)1 2 3)
Adv. model
  xorder(+ | - | n)        order of entry into model-selection algorithm; default is xorder(+)
  select(select_list)      nominal p-values for selection on each predictor
  xpowers(xp_list)         FP powers for each predictor
  zero(varlist)            treat nonpositive values of specified predictors as zero when FP
                           transformed
  catzero(varlist)         add indicator variable for specified predictors
  all                      include out-of-sample observations in generated variables
Reporting
  level(#)                 set confidence level; default is level(95)
  display_options          control column formats and line width

regression_cmd_options     Description
Adv. model
  regression_cmd_options   options appropriate to the regression command in use
All weight types supported by regression cmd are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
fp generate may be used to create new variables containing fractional polynomial powers. See [R] fp.
where
regression_cmd may be clogit, glm, intreg, logistic, logit, mlogit, nbreg, ologit,
oprobit, poisson, probit, qreg, regress, rreg, stcox, stcrreg, streg, or xtgee.
yvar1 is not allowed for streg, stcrreg, and stcox. For these commands, you must first stset
your data.
yvar1 and yvar2 must both be specified when regression_cmd is intreg.
xvarlist has elements of type varlist and/or (varlist), for example, x1 x2 (x3 x4 x5).
Elements enclosed in parentheses are tested jointly for inclusion in the model and are not eligible
for fractional polynomial transformation.
Menu
Statistics > Linear models and related > Fractional polynomials > Multivariable fractional polynomial models
Description
mfp selects the multivariable fractional polynomial (MFP) model that best predicts the outcome
variable from the right-hand-side variables in xvarlist.
For univariate fractional polynomials, fp can be used to fit a wider range of models than mfp.
See [R] fp for more details.
Options
 
Model 2
sequential chooses the sequential fractional polynomial (FP) selection algorithm (see Methods of
FP model selection).
cycles(#) sets the maximum number of iteration cycles permitted. cycles(5) is the default.
dfdefault(#) determines the default maximum degrees of freedom (df) for a predictor. The default
is dfdefault(4) (second-degree FP).
center(cent list) defines the centering of the covariates xvar1, xvar2, ... of xvarlist. The default
is center(mean), except for binary covariates, where it is center(#), with # being the lower
of the two distinct values of the covariate. A typical item in cent list is varlist:{mean | # | no}.
Items are separated by commas. The first item is special in that varlist is optional, and if it is
omitted, the default is reset to the specified value (mean, #, or no). For example, center(no,
age:mean) sets the default to no (that is, no centering) and the centering of age to mean.
alpha(alpha list) sets the significance levels for testing between FP models of different degrees.
The rules for alpha list are the same as those for df list in the df() option (see below). The
default nominal p-value (significance level, selection level) is 0.05 for all variables.
Example: alpha(0.01) specifies that all variables have an FP selection level of 1%.
Example: alpha(0.05, weight:0.1) specifies that all variables except weight have an FP
selection level of 5%; weight has a level of 10%.
df(df list) sets the df for each predictor. The df (not counting the regression constant, _cons) is
twice the degree of the FP, so, for example, an xvar fit as a second-degree FP (FP2) has 4 df. The
first item in df list may be either # or varlist:#. Subsequent items must be varlist:#. Items are
separated by commas, and varlist is specified in the usual way for variables. With the first type
of item, the df for all predictors is taken to be #. With the second type of item, all members of
varlist (which must be a subset of xvarlist) have # df.
The default number of degrees of freedom for a predictor of type varlist specified in xvarlist but
not in df list is assigned according to the number of distinct (unique) values of the predictor, as
follows:

    # of distinct values    Default df
    1                       (invalid predictor)
    2–3                     1
    4–5                     min(2, dfdefault())
    ≥6                      dfdefault()
Example: df(4)
All variables have 4 df.
Example: df(2, weight displ:4)
weight and displ have 4 df; all other variables have 2 df.
Example: df(weight displ:4, mpg:2)
weight and displ have 4 df, mpg has 2 df; all other variables have default df.
powers(numlist) is the set of FP powers to be used. The default set is −2, −1, −0.5, 0, 0.5, 1, 2,
3 (0 means log).
 
Adv. model
xorder(+ | - | n) determines the order of entry of the covariates into the model-selection algorithm.
The default is xorder(+), which enters them in decreasing order of significance in a multiple
linear regression (most significant first). xorder(-) places them in reverse significance order,
whereas xorder(n) respects the original order in xvarlist.
select(select list) sets the nominal p-values (significance levels) for variable selection by backward
elimination. A variable is dropped if its removal causes a nonsignificant increase in deviance. The
rules for select list are the same as those for df list in the df() option (see above). Using the
default selection level of 1 for all variables forces them all into the model. Setting the nominal
p-value to be 1 for a given variable forces it into the model, leaving others to be selected or
not. The nominal p-value for elements of xvarlist bound by parentheses is specified by including
(varlist) in select list.
Example: select(0.05)
All variables have a nominal p-value of 5%.
Example: select(0.05, weight:1)
All variables except weight have a nominal p-value of 5%; weight is forced into the model.
Example: select(a (b c):0.05)
All variables except a, b, and c are forced into the model. b and c are tested jointly with 2 df at
the 5% level, and a is tested singly at the 5% level.
xpowers(xp list) sets the permitted FP powers for covariates individually. The rules for xp list are
the same as for df list in the df() option. The default selection is the same as that for the
powers() option.
Example: xpowers(-1 0 1)
All variables have powers −1, 0, 1.
Example: xpowers(x5:-1 0 1)
All variables except x5 have default powers; x5 has powers −1, 0, 1.
zero(varlist) treats negative and zero values of members of varlist as zero when FP transformations
are applied. By default, such variables are subjected to a preliminary linear transformation to avoid
negative and zero values, as described in the scale option of [R] fp. varlist must be part of
xvarlist.
catzero(varlist) is a variation on zero(); see Zeros and zero categories below. varlist must be
part of xvarlist.
regression cmd options may be any of the options appropriate to regression cmd.
all includes out-of-sample observations when generating the FP variables. By default, the generated
FP variables contain missing values outside the estimation sample.
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
display options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Iteration report
Estimation algorithm
Methods of FP model selection
Zeros and zero categories
For elements in xvarlist not enclosed in parentheses, mfp leaves variables in the data named
Ixvar__1, Ixvar__2, ..., where xvar represents the first four letters of the name of xvar1, and so
on, for xvar2, xvar3, etc. The new variables contain the best-fitting FP powers of xvar1, xvar2, ....
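For instance, after the mfp run in example 1 below, the created FP variables can be listed and summarized directly (a minimal sketch; the Ix* pattern assumes predictors whose names begin with x, as in that example):
. describe Ix*
. summarize Ix*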
Iteration report
By default, for each continuous predictor, x, mfp compares null, linear, and FP1 models for x
with an FP2 model. The deviance for each of these nested submodels is given in the column labeled
“Deviance”. The line labeled “Final” gives the deviance for the selected model and its powers. All
the other predictors currently selected are included, with their transformations (if any). For models
specified as having 1 df, the only choice is whether the variable enters the model.
Estimation algorithm
The estimation algorithm in mfp processes the xvars in turn. Initially, mfp silently arranges xvarlist
in order of increasing p-value (that is, of decreasing statistical significance) for omitting each predictor
from the model comprising xvarlist, with each term linear. The aim is to model relatively important
variables before unimportant ones. This approach may help to reduce potential model-fitting difficulties
caused by collinearity or, more generally, “concurvity” among the predictors. See the xorder() option
above for details on how to change the ordering.
At the initial cycle, the best-fitting FP function for xvar1 (the first of xvarlist) is determined, with
all the other variables assumed to be linear. Either the default or the alternative procedure is used
(see Methods of FP model selection below). The functional form (but not the estimated regression
coefficients) for xvar1 is kept, and the process is repeated for xvar2, xvar3, etc. The first iteration
concludes when all the variables have been processed in this way. The next cycle is similar, except
that the functional forms from the initial cycle are retained for all variables except the one currently
being processed.
A variable whose functional form is prespecified to be linear (that is, to have 1 df) is tested
for exclusion within the above procedure when its nominal p-value (selection level) according to
select() is less than 1; otherwise, it is included.
Updating of FP functions and candidate variables continues until the functions and variables included
in the overall model do not change (convergence). Convergence is usually achieved within 1–4 cycles.
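The default ordering can be overridden; as a sketch with placeholder outcome and predictor names, xorder(n) processes the covariates in the order in which they are typed:
. mfp, xorder(n) alpha(0.05) select(0.05): regress y x1 x2 x3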
Methods of FP model selection
mfp includes two algorithms for FP model selection, both of which combine backward elimination
with the selection of an FP function. For each continuous variable in turn, they start from a most-
complex permitted FP model and attempt to simplify the model by reducing the degree. The default
algorithm resembles a closed-test procedure, a sequence of tests maintaining the overall type I error
rate at a prespecified nominal level, such as 5%. All significance tests are approximate; therefore, the
algorithm is not precisely a closed-test procedure (Royston and Sauerbrei 2008, chap. 6).
The closed-test algorithm for choosing an FP model with maximum permitted degree m=2 (that
is, an FP2 model with 4 df) for one continuous predictor, x, is as follows:
1. Inclusion: Test FP2 against the null model for x on 4 df at the significance level determined
by select(). If x is significant, continue; otherwise, drop x from the model.
2. Nonlinearity: Test FP2 against a straight line in x on 3 df at the significance level determined
by alpha(). If significant, continue; otherwise, stop, with the chosen model for x being a
straight line.
3. Simplification: Test FP2 against FP1 on 2 df at the significance level determined by alpha().
If significant, the final model is FP2; otherwise, it is FP1.
The first step is omitted if x is to be retained in the model, that is, if its nominal p-value, according
to the select() option, is 1.
An alternative algorithm is available with the sequential option, as originally suggested by
Royston and Altman (1994):
1. Test FP2 against FP1 on 2 df at the alpha() significance level. If significant, the final model
is FP2; otherwise, continue.
2. Test FP1 against a straight line on 1 df at the alpha() level. If significant, the final model
is FP1; otherwise, continue.
3. Test a straight line against omitting x on 1 df at the select() level. If significant, the final
model is a straight line; otherwise, drop x.
The final step is omitted if x is to be retained in the model, that is, if its nominal p-value, according
to the select() option, is 1.
If x is uninfluential, the overall type I error rate of this procedure is about double that of the
closed-test procedure, for which the rate is close to the nominal value. This inflated type I error rate
confers increased apparent power to detect nonlinear relationships.
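Both algorithms are requested with the same syntax; only the sequential option differs. A sketch with placeholder outcome and predictors, first with the default closed-test procedure and then with the sequential algorithm:
. mfp, alpha(0.05) select(0.05): logit y x1 x2
. mfp, sequential alpha(0.05) select(0.05): logit y x1 x2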
Zeros and zero categories
The zero() option permits fitting an FP model to the positive values of a covariate, taking
nonpositive values as zero. An application is the assessment of the effect of cigarette smoking as a
risk factor in an epidemiological study. Nonsmokers may be qualitatively different from smokers, so
the effect of smoking (regarded as a continuous variable) may not be continuous between one and
zero cigarettes. To allow for this, the risk may be modeled as constant for the nonsmokers and as an
FP function of the number of cigarettes for the smokers:
. generate byte nonsmokr = cond(n_cigs==0, 1, 0) if n_cigs != .
. mfp, zero(n_cigs) df(4, nonsmokr:1): logit case n_cigs nonsmokr age
Omission of zero(n_cigs) would cause n_cigs to be transformed before analysis by the addition
of a suitable constant, probably 1.
A closely related approach involves the catzero() option. The command
. mfp, catzero(n_cigs): logit case n_cigs age
would achieve a similar result to the previous command but with important differences. First, mfp
would create the equivalent of the binary variable nonsmokr automatically and include it in the
model. Second, the two smoking variables would be treated as one predictor in the model. With the
select() option active, the two variables would be tested jointly for inclusion in the model. A
modified version is described in Royston and Sauerbrei (2008, sec. 4.15).
Example 1
We illustrate two of the analyses performed by Sauerbrei and Royston (1999). We use
brcancer.dta, which contains prognostic factors data from the German Breast Cancer Study
Group of patients with node-positive breast cancer. The response variable is recurrence-free survival
time (rectime), and the censoring variable is censrec. There are 686 patients with 299 events. We
use Cox regression to predict the log hazard of recurrence from prognostic factors of which five are
continuous (x1, x3, x5, x6, x7) and three are binary (x2, x4a, x4b). Hormonal therapy (hormon) is
known to reduce recurrence rates and is forced into the model. We use mfp to build a model from the
initial set of eight predictors by using the backfitting model-selection algorithm. We set the nominal
p-value for variable and FP selection to 0.05 for all variables except hormon, for which it is set to 1:
. use http://www.stata-press.com/data/r13/brcancer
(German breast cancer data)
. stset rectime, fail(censrec)
(output omitted )
. mfp, alpha(.05) select(.05, hormon:1): stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon,
> nohr
Deviance for model with all terms untransformed = 3471.637, 686 observations
Variable Model (vs.) Deviance Dev diff. P Powers (vs.)
x5 null FP2 3503.610 61.366 0.000* . .5 3
lin. 3471.637 29.393 0.000+ 1
FP1 3449.203 6.959 0.031+ 0
Final 3442.244 .5 3
x6 null FP2 3464.113 29.917 0.000* . -2 .5
lin. 3442.244 8.048 0.045+ 1
FP1 3435.550 1.354 0.508 .5
Final 3435.550 .5
[hormon included with 1 df in model]
x4a null lin. 3440.749 5.199 0.023* . 1
Final 3435.550 1
x3 null FP2 3436.832 3.560 0.469 . -2 3
Final 3436.832 .
x2 null lin. 3437.589 0.756 0.384 . 1
Final 3437.589 .
x4b null lin. 3437.848 0.259 0.611 . 1
Final 3437.848 .
x1 null FP2 3437.893 18.085 0.001* . -2 -.5
lin. 3437.848 18.040 0.000+ 1
FP1 3433.628 13.820 0.001+ -2
Final 3419.808 -2 -.5
x7 null FP2 3420.805 3.715 0.446 . -.5 3
Final 3420.805 .
End of Cycle 1: deviance = 3420.805
x5 null FP2 3494.867 74.143 0.000* . -2 -1
lin. 3451.795 31.071 0.000+ 1
FP1 3428.023 7.299 0.026+ 0
Final 3420.724 -2 -1
x6 null FP2 3452.093 32.704 0.000* . 0 0
lin. 3427.703 8.313 0.040+ 1
FP1 3420.724 1.334 0.513 .5
Final 3420.724 .5
[hormon included with 1 df in model]
x4a null lin. 3425.310 4.586 0.032* . 1
Final 3420.724 1
x3 null FP2 3420.724 5.305 0.257 . -.5 0
Final 3420.724 .
x2 null lin. 3420.724 0.214 0.644 . 1
Final 3420.724 .
x4b null lin. 3420.724 0.145 0.703 . 1
Final 3420.724 .
x1 null FP2 3440.057 19.333 0.001* . -2 -.5
lin. 3440.038 19.314 0.000+ 1
FP1 3436.949 16.225 0.000+ -2
Final 3420.724 -2 -.5
x7 null FP2 3420.724 2.152 0.708 . -1 3
Final 3420.724 .
Fractional polynomial fitting algorithm converged after 2 cycles.
Transformations of covariates:
-> gen double Ix1__1 = X^-2-.0355294635 if e(sample)
-> gen double Ix1__2 = X^-.5-.4341573547 if e(sample)
(where: X = x1/10)
-> gen double Ix5__1 = X^-2-3.983723313 if e(sample)
-> gen double Ix5__2 = X^-1-1.99592668 if e(sample)
(where: X = x5/10)
-> gen double Ix6__1 = X^.5-.3331600619 if e(sample)
(where: X = (x6+1)/1000)
Final multivariable fractional polynomial model for _t
Variable Initial Final
df Select Alpha Status df Powers
x1 4 0.0500 0.0500 in 4 -2 -.5
x2 1 0.0500 0.0500 out 0
x3 4 0.0500 0.0500 out 0
x4a 1 0.0500 0.0500 in 1 1
x4b 1 0.0500 0.0500 out 0
x5 4 0.0500 0.0500 in 4 -2 -1
x6 4 0.0500 0.0500 in 2 .5
x7 4 0.0500 0.0500 out 0
hormon 1 1.0000 0.0500 in 1 1
Cox regression -- Breslow method for ties
Entry time _t0 Number of obs = 686
LR chi2(7) = 155.62
Prob > chi2 = 0.0000
Log likelihood = -1710.3619 Pseudo R2 = 0.0435
_t Coef. Std. Err. z P>|z| [95% Conf. Interval]
Ix1__1 44.73377 8.256682 5.42 0.000 28.55097 60.91657
Ix1__2 -17.92302 3.909611 -4.58 0.000 -25.58571 -10.26032
x4a .5006982 .2496324 2.01 0.045 .0114276 .9899687
Ix5__1 .0387904 .0076972 5.04 0.000 .0237041 .0538767
Ix5__2 -.5490645 .0864255 -6.35 0.000 -.7184554 -.3796736
Ix6__1 -1.806966 .3506314 -5.15 0.000 -2.494191 -1.119741
hormon -.4024169 .1280843 -3.14 0.002 -.6534575 -.1513763
Deviance: 3420.724.
Some explanation of the output from the model-selection algorithm is desirable. Consider the first
few lines of output in the iteration log:
1. Deviance for model with all terms untransformed = 3471.637, 686 observations
Variable Model (vs.) Deviance Dev diff. P Powers (vs.)
2. x5 null FP2 3503.610 61.366 0.000* . .5 3
3. lin. 3471.637 29.393 0.000+ 1
4. FP1 3449.203 6.959 0.031+ 0
5. Final 3442.244 .5 3
Line 1 gives the deviance (−2 × log partial likelihood) for the Cox model with all terms linear, the
place where the algorithm starts. The model is modified variable by variable in subsequent steps. The
most significant linear term turns out to be x5, which is therefore processed first. Line 2 compares
the best-fitting FP2 for x5 with a model omitting x5. The FP has powers (0.5, 3), and the test for
inclusion of x5 is highly significant. The reported deviance of 3,503.610 is of the null model, not
for the FP2 model. The deviance for the FP2 model may be calculated by subtracting the deviance
difference (Dev diff.) from the reported deviance, giving 3,503.610 − 61.366 = 3,442.244. Line 3
shows that the FP2 model is also a significantly better fit than a straight line (lin.) and line 4 that
FP2 is also somewhat better than FP1 (p = 0.031). Thus at this stage in the model-selection procedure,
the final model for x5 (line 5) is FP2 with powers (0.5, 3). The overall model with an FP2 for x5 and
all other terms linear has a deviance of 3,442.244.
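The subtraction can be checked directly in Stata; display returns 3442.244:
. display 3503.610 - 61.366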
After all the variables have been processed (cycle 1) and reprocessed (cycle 2) in this way,
convergence is achieved because the functional forms (FP powers and variables included) after cycle
2 are the same as they were after cycle 1. The model finally chosen is Model II as given in tables 3
and 4 of Sauerbrei and Royston (1999). Because of scaling of variables, the regression coefficients
reported there are different, but the model and its deviance are identical. The model includes x1 with
powers (−2, −0.5), x4a, x5 with powers (−2, −1), and x6 with power 0.5. There is strong evidence
of nonlinearity for x1 and for x5, the deviance differences for comparison with a straight-line model
(FP2 vs lin.) being, respectively, 19.3 and 31.1 at convergence (cycle 2). Predictors x2, x3, x4b,
and x7 are dropped, as may be seen from their status out in the table Final multivariable
fractional polynomial model for _t (the assumed depvar when using stcox).
All predictors except x4a and hormon, which are binary, have been centered on the mean of
the original variable. For example, the mean of x1 (age) is 53.05 years. The first FP-transformed
variable for x1 is x1^-2 and is created by the expression gen double Ix1__1 = X^-2-.0355
if e(sample). The value 0.0355 is obtained from (53.05/10)^-2. The division by 10 is applied
automatically to improve the scaling of the regression coefficient for Ix1__1.
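The centering constant can be reproduced from the reported mean; the result is approximately .0355 (it differs slightly from the stored constant because 53.05 is the rounded mean age):
. display (53.05/10)^-2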
According to Sauerbrei and Royston (1999), medical knowledge dictates that the estimated risk
function for x5 (number of positive nodes), which was based on the above FP with powers (−2, −1),
should be monotonic, but it was not. They improved Model II by estimating a preliminary exponential
transformation, x5e = exp(−0.12 · x5), for x5 and fitting a degree 1 FP for x5e, thus obtaining
a monotonic risk function. The value of −0.12 was estimated univariately using nonlinear Cox
regression with the ado-file boxtid (Royston and Ambler 1999b, 1999d). To ensure a negative
exponent, Sauerbrei and Royston (1999) restricted the powers for x5e to be positive. Their Model III
may be fit by using the following command:
. mfp, alpha(.05) select(.05, hormon:1) df(x5e:2) xpowers(x5e:0.5 1 2 3):
> stcox x1 x2 x3 x4a x4b x5e x6 x7 hormon
Other than the customization for x5e, the command is the same as it was before. The resulting
model is as reported in table 4 of Sauerbrei and Royston (1999):
. use http://www.stata-press.com/data/r13/brcancer, clear
(German breast cancer data)
. stset rectime, fail(censrec)
(output omitted )
. mfp, alpha(.05) select(.05, hormon:1) df(x5e:2) xpowers(x5e:0.5 1 2 3):
> stcox x1 x2 x3 x4a x4b x5e x6 x7 hormon, nohr
(output omitted )
Final multivariable fractional polynomial model for _t
Variable Initial Final
df Select Alpha Status df Powers
x1 4 0.0500 0.0500 in 4 -2 -.5
x2 1 0.0500 0.0500 out 0
x3 4 0.0500 0.0500 out 0
x4a 1 0.0500 0.0500 in 1 1
x4b 1 0.0500 0.0500 out 0
x5e 2 0.0500 0.0500 in 1 1
x6 4 0.0500 0.0500 in 2 .5
x7 4 0.0500 0.0500 out 0
hormon 1 1.0000 0.0500 in 1 1
Cox regression -- Breslow method for ties
Entry time _t0 Number of obs = 686
LR chi2(6) = 153.11
Prob > chi2 = 0.0000
Log likelihood = -1711.6186 Pseudo R2 = 0.0428
_t Coef. Std. Err. z P>|z| [95% Conf. Interval]
Ix1__1 43.55382 8.253433 5.28 0.000 27.37738 59.73025
Ix1__2 -17.48136 3.911882 -4.47 0.000 -25.14851 -9.814212
x4a .5174351 .2493739 2.07 0.038 .0286713 1.006199
Ix5e__1 -1.981213 .2268903 -8.73 0.000 -2.425909 -1.536516
Ix6__1 -1.84008 .3508432 -5.24 0.000 -2.52772 -1.15244
hormon -.3944998 .128097 -3.08 0.002 -.6455654 -.1434342
Deviance: 3423.237.
Stored results
In addition to what regression cmd stores, mfp stores the following in e():
Scalars
  e(fp_nx)       number of predictors in xvarlist
  e(fp_dev)      deviance of final model fit
  e(Fp_id#)      initial degrees of freedom for the #th element of xvarlist
  e(Fp_fd#)      final degrees of freedom for the #th element of xvarlist
  e(Fp_al#)      FP selection level for the #th element of xvarlist
  e(Fp_se#)      backward elimination selection level for the #th element of xvarlist
Macros
  e(fp_cmd)      fracpoly
  e(fp_cmd2)     mfp
  e(cmdline)     command as typed
  e(fracpoly)    command used to fit the selected model using fracpoly
  e(fp_fvl)      variables in final model
  e(fp_depv)     yvar1 (yvar2)
  e(fp_opts)     estimation command options
  e(fp_x1)       first variable in xvarlist
  e(fp_x2)       second variable in xvarlist
  ...
  e(fp_xN)       last variable in xvarlist, N = e(fp_nx)
  e(fp_k1)       power for first variable in xvarlist (*)
  e(fp_k2)       power for second variable in xvarlist (*)
  ...
  e(fp_kN)       power for last variable in xvarlist (*), N = e(fp_nx)
Note: (*) contains ‘.’ if the variable is not selected in the final model.
Acknowledgments
mfp is an update of mfracpol by Royston and Ambler (1998).
References
Ambler, G., and P. Royston. 2001. Fractional polynomial model selection procedures: Investigation of Type I error
rate. Journal of Statistical Computation and Simulation 69: 89–108.
Royston, P., and D. G. Altman. 1994. Regression using fractional polynomials of continuous covariates: Parsimonious
parametric modelling. Applied Statistics 43: 429–467.
Royston, P., and G. Ambler. 1998. sg81: Multivariable fractional polynomials. Stata Technical Bulletin 43: 24–32.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 123–132. College Station, TX: Stata Press.
. 1999a. sg112: Nonlinear regression models involving power or exponential functions of covariates. Stata
Technical Bulletin 49: 25–30. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 173–179. College Station,
TX: Stata Press.
. 1999b. sg81.1: Multivariable fractional polynomials: Update. Stata Technical Bulletin 49: 17–23. Reprinted in
Stata Technical Bulletin Reprints, vol. 9, pp. 161–168. College Station, TX: Stata Press.
. 1999c. sg112.1: Nonlinear regression models involving power or exponential functions of covariates: Update.
Stata Technical Bulletin 50: 26. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 180. College Station,
TX: Stata Press.
. 1999d. sg81.2: Multivariable fractional polynomials: Update. Stata Technical Bulletin 50: 25. Reprinted in Stata
Technical Bulletin Reprints, vol. 9, p. 168. College Station, TX: Stata Press.
Technical Bulletin Reprints, vol. 9, p. 168. College Station, TX: Stata Press.
Royston, P., and W. Sauerbrei. 2007. Multivariable modeling with cubic regression splines: A principled approach.
Stata Journal 7: 45–70.
. 2008. Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional
Polynomials for Modelling Continuous Variables. Chichester, UK: Wiley.
. 2009a. Two techniques for investigating interactions between treatment and continuous covariates in clinical
trials. Stata Journal 9: 230–251.
. 2009b. Bootstrap assessment of the stability of multivariable models. Stata Journal 9: 547–570.
Sauerbrei, W., and P. Royston. 1999. Building multivariable prognostic and diagnostic models: Transformation of the
predictors by using fractional polynomials. Journal of the Royal Statistical Society, Series A 162: 71–94.
. 2002. Corrigendum: Building multivariable prognostic and diagnostic models: Transformation of the predictors
by using fractional polynomials. Journal of the Royal Statistical Society, Series A 165: 399–400.
Also see
[R] mfp postestimation     Postestimation tools for mfp
[R] fp                     Fractional polynomial regression
[U] 20 Estimation and postestimation commands
Title
mfp postestimation — Postestimation tools for mfp
Description Syntax for fracplot and fracpred Menu for fracplot and fracpred
Options for fracplot Options for fracpred Remarks and examples
Methods and formulas Also see
Description
The following postestimation commands are of special interest after mfp:
Command Description
fracplot plot data and fit from most recently fit fractional polynomial model
fracpred create variable containing prediction, deviance residuals, or SEs of fitted values
The following standard postestimation commands are also available if available after regression cmd:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation commands
fracplot plots the data and fit, with 95% confidence limits, from the most recently fit fractional
polynomial (FP) model. The data and fit are plotted against varname, which may be xvar1 or another
of the covariates (xvar2, ..., or a variable from xvarlist). If varname is not specified, xvar1 is
assumed.
fracpred creates newvar containing the fitted index or deviance residuals for the whole model,
or the fitted index or its standard error for varname, which may be xvar1 or another covariate.
Syntax for fracplot and fracpred
Plot data and fit from most recently fit fractional polynomial model
fracplot [varname] [if] [in] [, fracplot options]
Create variable containing the prediction, deviance residuals, or SEs of fitted values
fracpred newvar [, fracpred options]
fracplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Fitted line
lineopts(cline options)    affect rendition of the fitted line
CI plot
ciopts(area options)       affect rendition of the confidence bands
Add plots
addplot(plot)              add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options             any options other than by() documented in [G-3] twoway options
fracpred options           Description
for(varname)               compute prediction for varname
dresid compute deviance residuals
stdp compute standard errors of the fitted values varname
fracplot is not allowed after mfp with clogit, mlogit, or stcrreg. fracpred, dresid is not
allowed after mfp with clogit, mlogit, or stcrreg.
Menu for fracplot and fracpred
fracplot
Statistics > Linear models and related > Fractional polynomials > Multivariable fractional polynomial plot
fracpred
Statistics > Linear models and related > Fractional polynomials > Multivariable fractional polynomial prediction
Options for fracplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3] marker options.
marker label options specify if and how the markers are to be labeled; see [G-3] marker label options.
 
Fitted line
lineopts(cline options) affect the rendition of the fitted line; see [G-3] cline options.
 
CI plot
ciopts(area options) affect the rendition of the confidence bands; see [G-3] area options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Options for fracpred
for(varname) specifies (partial) prediction for variable varname. The fitted values are adjusted to
the value specified by the center() option in mfp.
dresid specifies that deviance residuals be calculated.
stdp specifies calculation of the standard errors of the fitted values varname, adjusted for all the
other predictors at the values specified by center().
Remarks and examples
fracplot actually produces a component-plus-residual plot. For normal-error models with constant
weights and one covariate, this amounts to a plot of the observations with the fitted line inscribed.
For other normal-error models, weighted residuals are calculated and added to the fitted values.
For models with additional covariates, the line is the partial linear predictor for the variable in
question (xvar1 or a covariate) and includes the intercept β0.
For generalized linear and Cox models, the fitted values are plotted on the scale of the “index” (linear
predictor). Deviance residuals are added to the (partial) linear predictor to give component-plus-residual
values. These values are plotted as small circles.
Example 1
In example 1 of [R] mfp, we used Cox regression to predict the log hazard of breast cancer
recurrence from prognostic factors of which five are continuous (x1, x3, x5, x6, x7) and three are
binary (x2, x4a, x4b). We also controlled for hormonal therapy (hormon). We used mfp to build a
model from the initial set of eight predictors by using the backfitting model-selection algorithm. The
nominal p-value for variable and FP selection was set to 0.05 for all variables except hormon, for
which it was set to 1.
. use http://www.stata-press.com/data/r13/brcancer
(German breast cancer data)
. stset rectime, fail(censrec)
(output omitted )
. mfp, alpha(.05) select(.05, hormon:1): stcox x1 x2 x3 x4a x4b x5 x6 x7 hormon,
> nohr
(output omitted )
We can use fracplot to produce component-plus-residual plots of the continuous variables. We
produce the component-plus-residual plot for x1 with fracplot by specifying x1 after the command
name.
. fracplot x1
(figure omitted: component-plus-residual plot, partial predictor + residual of _t against age in years;
fitted fractional polynomial (−2, −.5), adjusted for covariates)
We use fracpred with the stdp option to predict the standard error of the fractional polynomial
prediction for x1. The standard error prediction will be stored in variable sepx1. We specify that
prediction is made for x1 with the for() option. After prediction, we use summarize to show how
the standard error estimate varies over different values of x1.
. fracpred sepx1, stdp for(x1)
. summarize sepx1
Variable Obs Mean Std. Dev. Min Max
sepx1 686 .0542654 .0471993 .0003304 .6862065
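fracplot accepts the graph options listed above, so the display can be customized; a sketch (the particular rendition choices are arbitrary) redrawing the plot for x5 with a dashed fitted line and a lighter confidence band:
. fracplot x5, lineopts(lpattern(dash)) ciopts(fcolor(gs14)) title("Fitted FP function for x5")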
Methods and formulas
The general definition of an FP, accommodating possible repeated powers, may be written for
functions H1(x), ..., Hm(x) as

    β0 + Σ(j=1 to m) βj Hj(x)

where H1(x) = x^(p1) and, for j = 2, ..., m,

    Hj(x) = x^(pj)              if pj ≠ p(j−1)
    Hj(x) = H(j−1)(x) log x     if pj = p(j−1)

For example, an FP of degree 3 with powers (1, 3, 3) has H1(x) = x, H2(x) = x^3, and H3(x) =
x^3 log x and equals β0 + β1 x + β2 x^3 + β3 x^3 log x.
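As a sketch (x stands for any strictly positive variable in memory; the H* names are illustrative), those three terms could be generated by hand, the repeated power 3 producing the logarithmic term:
. generate double H1 = x
. generate double H2 = x^3
. generate double H3 = x^3*ln(x)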
The component-plus-residual values graphed by fracplot are calculated as follows: Let the
data consist of triplets (yi, xi, zi), i = 1, ..., n, where zi is the vector of covariates for the ith
observation, after applying possible fractional polynomial transformation and adjustment as described
earlier. Let

    η̂i = β̂0 + {H(xi) − H(x0)}′β̂ + zi′γ̂

be the linear predictor from the FP model, as given by the fracpred command or, equivalently, by
the predict command with the xb option, following mfp. Here H(xi) = {H1(xi), ..., Hm(xi)}′ is
the vector of FP functions described above, H(x0) = {H1(x0), ..., Hm(x0)}′ is the vector of
adjustments to x0 (often, x0 is chosen to be the mean of the xi), β̂ is the estimated parameter vector,
and γ̂ is the estimated parameter vector for the covariates. The values

    η̂*i = β̂0 + {H(xi) − H(x0)}′β̂

represent the behavior of the FP model for x at fixed values z = 0 of the (adjusted) covariates. The
ith component-plus-residual is defined as η̂*i + di, where di is the deviance residual for the ith
observation. For normal-errors models, di = √wi (yi − η̂i), where wi is the case weight (or 1, if
weight is not specified). For logistic, Cox, and generalized linear regression models, see [R] logistic,
[R] probit, [ST] stcox, and [R] glm for the formula for di. The formula for poisson models is the
same as that for glm with family(poisson). For stcox, di is the partial martingale residual (see
[ST] stcox postestimation).
fracplot plots the values of di and the curve represented by η̂*i against xi. The confidence
interval for η̂*i is obtained from the variance–covariance matrix of the entire model and takes into
account the uncertainty in estimating β0, β, and γ (but not in estimating the FP powers for x).
fracpred with the for(varname) option calculates the predicted index at xi = x0 and zi = 0;
that is, η̂i = β̂0 + {H(xi) − H(x0)}′β̂. The standard error is calculated from the variance–covariance
matrix of (β̂0, β̂), again ignoring estimation of the powers.
Also see
[R] mfp     Multivariable fractional polynomial models
[U] 20 Estimation and postestimation commands
Title
misstable — Tabulate missing values
Syntax Menu Description Options
Remarks and examples Stored results Also see
Syntax
Report counts of missing values
    misstable summarize [varlist] [if] [in] [, summarize options]
Report pattern of missing values
    misstable patterns [varlist] [if] [in] [, patterns options]
Present a tree view of the pattern of missing values
    misstable tree [varlist] [if] [in] [, tree options]
List the nesting rules that describe the missing-value pattern
    misstable nested [varlist] [if] [in] [, nested options]
summarize options Description
all show all variables
showzeros show zeros in table
generate(stub[, exok])   generate missing-value indicators
patterns options Description
asis use variables in order given
frequency report frequencies instead of percentages
exok   treat .a, .b, ..., .z as nonmissing
replace replace data in memory with dataset of patterns
clear okay to replace even if original unsaved
bypatterns list by patterns rather than by frequency
tree options Description
asis use variables in order given
frequency report frequencies instead of percentages
exok   treat .a, .b, ..., .z as nonmissing
nested options Description
exok   treat .a, .b, ..., .z as nonmissing
In addition, programmer’s option nopreserve is allowed with all syntaxes; see [P] nopreserve option.
Menu
Statistics > Summaries, tables, and tests > Other tables > Tabulate missing values
Description
misstable makes tables that help you understand the pattern of missing values in your data.
Options
Options are presented under the following headings:
Options for misstable summarize
Options for misstable patterns
Options for misstable tree
Option for misstable nested
Common options
Options for misstable summarize
all specifies that the table should include all the variables specified or all the variables in the dataset.
The default is to include only numeric variables that contain missing values.
showzeros specifies that zeros in the table should display as 0 rather than being omitted.
generate(stub[, exok]) requests that a missing-value indicator newvar, a new binary variable
containing 0 for complete observations and 1 for incomplete observations, be generated for every
numeric variable in varlist containing missing values. If the all option is specified, missing-value
indicators are created for all the numeric variables specified or for all the numeric variables in the
dataset. If exok is specified within generate(), the extended missing values .a, .b, ..., .z are
treated as if they do not designate missing.
For each variable in varlist, newvar is the corresponding variable name varname prefixed with
stub. If the total length of stub and varname exceeds 32 characters, newvar is abbreviated so that
its name does not exceed 32 characters.
Options for misstable patterns
asis, frequency, and exok – see Common options below.
replace specifies that the data in memory be replaced with a dataset corresponding to the table just
displayed; see misstable patterns under Remarks and examples below.
clear is for use with replace; it specifies that it is okay to change the data in memory even if they
have not been saved to disk.
bypatterns specifies the table be ordered by pattern rather than by frequency. That is, bypatterns
specifies that patterns containing one incomplete variable be listed first, followed by those for two
incomplete variables, and so on. The default is to list the most frequent pattern first, followed by
the next most frequent pattern, etc.
Options for misstable tree
asis, frequency, and exok – see Common options below.
Option for misstable nested
exok – see Common options below.
Common options
asis specifies that the order of the variables in the table be the same as the order in which they
are specified on the misstable command. The default is to order the variables by the number of
missing values, and within that, by the amount of overlap of missing values.
frequency specifies that the table should report frequencies instead of percentages.
exok specifies that the extended missing values .a,.b,. . . ,.z should be treated as if they do not
designate missing. Some users use extended missing values to designate values that are missing
for a known and valid reason.
nopreserve is a programmer’s option allowed with all misstable commands; see [P] nopreserve
option.
Remarks and examples
Remarks are presented under the following headings:
misstable summarize
misstable patterns
misstable tree
misstable nested
Execution time of misstable nested
In what follows, we will use data from a 125-observation, fictional, student-satisfaction survey:
. use http://www.stata-press.com/data/r13/studentsurvey
(Student Survey)
. summarize
Variable Obs Mean Std. Dev. Min Max
m1 125 2.456 .8376619 1 4
m2 125 2.472 .8089818 1 4
age 122 18.97541 .8763477 17 21
female 122 .5245902 .5014543 0 1
dept 116 2.491379 1.226488 1 4
offcampus 125 .36 .4819316 0 1
comment 0
The m1 and m2 variables record the student’s satisfaction with teaching and with academics.
comment is a string variable recording any comments the student might have had.
misstable summarize
Example 1
misstable summarize reports counts of missing values:
. misstable summarize
Obs<.
Unique
Variable Obs=. Obs>. Obs<. values Min Max
age 3 122 5 17 21
female 3 122 2 0 1
dept 9 116 4 1 4
Stata provides 27 different missing values, namely, ., .a, .b, ..., .z. The first of those, ., is often
called system missing. The remaining missing values are called extended missings. The nonmissing
and missing values are ordered nonmissing < . < .a < .b < ··· < .z. Thus reported in the column
“Obs=.” are counts of system missing values; in the column “Obs>.”, extended missing values; and
in the column “Obs<.”, nonmissing values.
The rightmost portion of the table is included to remind you how the variables are encoded.
Our data contain seven variables and yet misstable reported only three of them. The omitted
variables contain no missing values or are string variables. Even if we specified the varlist explicitly,
those variables would not appear in the table unless we specified the all option.
We can also create missing-value indicators for each of the variables above using the generate()
option:
. quietly misstable summarize, generate(miss_)
. describe miss_*
storage display value
variable name type format label variable label
miss_age byte %8.0g (age>=.)
miss_female byte %8.0g (female>=.)
miss_dept byte %8.0g (dept>=.)
For each variable containing missing values, the generate() option creates a new binary variable
containing 0 for complete observations and 1 for incomplete observations. In our example, three new
missing-value indicators are generated, one for each of the incomplete variables age, female, and
dept. The naming convention of generate() is to prefix the corresponding variable names with the
specified stub, which is miss_ in this example.
Missing-value indicators are useful, for example, for checking whether data are missing completely
at random. They are also often used within the multiple-imputation context to identify the observed
and imputed data; see [MI] intro substantive for a general introduction to multiple imputation. Within
Stata’s multiple-imputation commands, an incomplete value is identified by the system missing value,
a dot. By default, misstable summarize, generate() marks the extended missing values as
incomplete values, as well. You can use exok within generate() to treat extended missing values
as complete when creating missing-value identifiers.
misstable patterns
Example 2
misstable patterns reports the pattern of missing values:
. misstable patterns
Missing-value patterns
(1 means complete)
Pattern
Percent 1 2 3
93% 1 1 1
5 1 1 0
2 0 0 0
100%
Variables are (1) age (2) female (3) dept
There are three patterns in these data: (1,1,1), (1,1,0), and (0,0,0). By default, the rows of the table
are ordered by frequency. In larger tables that have more patterns, it is sometimes useful to order the
rows by pattern. We could have obtained that by typing misstable patterns, bypatterns.
In a pattern, 1 indicates that all values of the variable are nonmissing and 0 indicates that all values
are missing. Thus pattern (1,1,1) means no missing values, and 93% of our data have that pattern.
There are two patterns in which variables are missing, (1,1,0) and (0,0,0). Pattern (1,1,0) means that
age is nonmissing, female is nonmissing, and dept is missing. The order of the variables in the
patterns appears in the key at the bottom of the table. Five percent of the observations have pattern
(1,1,0). The remaining 2% have pattern (0,0,0), meaning that all three variables contain missing.
As with misstable summarize, only numeric variables that contain missing are listed, so had
we typed misstable patterns comment age female offcampus dept, we still would have
obtained the same table. Variables that are automatically omitted contain no missing values or are
string variables.
The variables in the table are ordered from lowest to highest frequency of missing values, although
you cannot see that from the information presented in the table. The variables are ordered this way
even if you explicitly specify the varlist with a different ordering. Typing misstable patterns
dept female age would produce the same table as above. Specify the asis option if you want the
variables in the order in which you specify them.
You can obtain a dataset of the patterns by specifying the replace option:
. misstable patterns, replace clear
Missing-value patterns
(1 means complete)
Pattern
Percent 1 2 3
93% 1 1 1
5 1 1 0
2 0 0 0
100%
Variables are (1) age (2) female (3) dept
(summary data now in memory)
. list
_freq age female dept
1. 3 0 0 0
2. 6 1 1 0
3. 116 1 1 1
The differences between the dataset and the printed table are that 1) the dataset always records
frequency and 2) the rows are reversed.
misstable tree
Example 3
misstable tree presents a tree view of the pattern of missing values:
. use http://www.stata-press.com/data/r13/studentsurvey, clear
(Student Survey)
. misstable tree, frequency
Nested pattern of missing values
dept age female
9 3 3
0
6 0
6
116 0 0
0
116 0
116
(number missing listed first)
In this example, we specified the frequency option to see the table in frequency rather than
percentage terms. In the table, each column sums to the total number of observations in the data,
125. Variables are ordered from those with the most missing values to those with the least. Start with
the first column. The dept variable is missing in 9 observations and, farther down, the table reports
that it is not missing in 116 observations.
Go back to the first row and read across, but only to the second column. The dept variable is
missing in 9 observations. Within those 9, age is missing in 3 of them and is not missing in the
remaining 6. Reading down the second column, within the 116 observations that dept is not missing,
age is missing in 0 and not missing in 116.
Reading straight across the first row again, dept is missing in 9 observations, and within the 9,
age is missing in 3, and within the 3, female is also missing in 3. Skipping down just a little, within
the 6 observations for which dept is missing and age is not missing, female is not missing, too.
misstable nested
Example 4
misstable nested lists the nesting rules that describe the missing-value pattern,
. misstable nested
1. female(3) <-> age(3) -> dept(9)
This line says that in observations in which female is missing, so is age missing, and vice versa,
and in observations in which age (or female) is missing, so is dept. The numbers in parentheses
are counts of the missing values. The female variable happens to be missing in 3 observations, and
the same is true for age; the dept variable is missing in 9 observations. Thus dept is missing in
the 3 observations for which age and female are missing, and in 6 more observations, too.
In these data, it turns out that the missing-value pattern can be summarized in one statement. In
a larger dataset, you might see something like this:
. misstable nested
1. female(50) <-> age(50) -> dept(120)
2. female(50) -> m1(58)
3. offcampus(11)
misstable nested accounts for every missing value. In the above, in addition to female <->
age -> dept, we have that female -> m1, and we have offcampus, the last all by itself. The last
line says that the 11 missing values in offcampus are not themselves nested in the missing value of
any other variable, nor do they imply the missing values in another variable. In some datasets, all
the statements will be of this last form.
In our data, however, we have one statement:
. misstable nested
1. female(3) <-> age(3) -> dept(9)
When the missing-value pattern can be summarized in one misstable nested statement, the
pattern of missing values in the data is said to be monotone.
Execution time of misstable nested
The execution time of misstable nested is affected little by the number of observations but can
grow quickly with the number of variables, depending on the fraction of missing values within variable.
The execution time of the example above, which has 3 variables containing missing, is instant. In
worst-case scenarios, with 500 variables, the time might be 25 seconds; with 1,000 variables, the
execution time might be closer to an hour.
In situations where misstable nested takes a long time to complete, it will produce thousands
of rules that will defy interpretation. A 523-variable dataset we have seen ran in 20 seconds and
produced 8,040 rules. Although we spotted a few rules in the output that did not surprise us, such
as the year of the date being missing implied that the month and the day were also missing, mostly
the output was not helpful.
If you have such a dataset, we recommend you run misstable on groups of variables for which
you have reason to believe the patterns of missing values might be related.
Stored results
misstable summarize stores the following values of the last variable summarized in r():
Scalars
  r(N_eq_dot)    number of observations containing .
  r(N_gt_dot)    number of observations containing .a, .b, ..., .z
  r(N_lt_dot)    number of observations containing nonmissing
  r(K_uniq)      number of unique, nonmissing values
  r(min)         variable’s minimum value
  r(max)         variable’s maximum value
Macros
  r(vartype)     numeric, string, or none
r(K_uniq) contains . if the number of unique, nonmissing values is greater than 500. r(vartype)
contains none if no variables are summarized, and in that case, the values of the scalars are all set to
missing (.). Programmers intending to access results after misstable summarize should specify the
all option.
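A programmer’s sketch: run the command quietly with the all option and then list the stored results.
. quietly misstable summarize, all
. return list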
misstable patterns stores the following in r():
Scalars
  r(N_complete)     number of complete observations
  r(N_incomplete)   number of incomplete observations
  r(K)              number of patterns
Macros
  r(vars)           variables used in order presented
r(N_complete) and r(N_incomplete) are defined with respect to the variables specified if variables
were specified and otherwise, defined with respect to all the numeric variables in the dataset. r(N_complete)
is the number of observations that contain no missing values.
misstable tree stores the following in r():
Macros
r(vars) variables used in order presented
misstable nested stores the following in r():
Scalars
r(K) number of statements
Macros
r(stmt1) first statement
r(stmt2) second statement
. .
. .
r(stmt‘r(K)’) last statement
r(stmt1wc) r(stmt1) with missing-value counts
r(vars) variables considered
A statement is encoded “varname”, “varname op varname”, or “varname op varname op varname”, and so on;
op is either “->” or “<->”.
Also see
[MI] mi misstable       Tabulate pattern of missing values
[R] summarize           Summary statistics
[R] tabulate oneway     One-way table of frequencies
[R] tabulate twoway     Two-way table of frequencies
Title
mkspline — Linear and restricted cubic spline construction
Syntax Menu Description Options
Remarks and examples Methods and formulas Acknowledgment References
Also see
Syntax
Linear spline with knots at specified points
    mkspline newvar1 #1 [newvar2 #2 [...]] newvark = oldvar [if] [in] [, marginal displayknots]
Linear spline with knots equally spaced or at percentiles of data
    mkspline stubname # = oldvar [if] [in] [weight] [, marginal pctile displayknots]
Restricted cubic spline
    mkspline stubname = oldvar [if] [in] [weight], cubic [nknots(#) knots(numlist) displayknots]
fweights are allowed with the second and third syntax; see [U] 11.1.6 weight.
Menu
Data > Create or change data > Other variable-creation commands > Linear and cubic spline construction
Description
mkspline creates variables containing a linear spline or a restricted cubic spline of oldvar.
In the first syntax, mkspline creates newvar1, ..., newvark containing a linear spline of oldvar
with knots at the specified #1, ..., #(k−1).
In the second syntax, mkspline creates # variables named stubname1, ..., stubname# containing
a linear spline of oldvar. The knots are equally spaced over the range of oldvar or are placed at the
percentiles of oldvar.
In the third syntax, mkspline creates variables containing a restricted cubic spline of oldvar.
This is also known as a natural spline. The location and spacing of the knots is determined by the
specification of the nknots() and knots() options.
Options
 
Options
marginal is allowed with the first or second syntax. It specifies that the new variables be constructed
so that, when used in estimation, the coefficients represent the change in the slope from the
preceding interval. The default is to construct the variables so that, when used in estimation, the
coefficients measure the slopes for the interval.
displayknots displays the values of the knots that were used in creating the linear or restricted
cubic spline.
pctile is allowed only with the second syntax. It specifies that the knots be placed at percentiles of
the data rather than being equally spaced over the range.
nknots(#) is allowed only with the third syntax. It specifies the number of knots that are to be used
for a restricted cubic spline. This number must be between 3 and 7 unless the knot locations are
specified using knots(). The default number of knots is 5.
knots(numlist) is allowed only with the third syntax. It specifies the exact location of the knots to
be used for a restricted cubic spline. The values of these knots must be given in increasing order.
When this option is omitted, the default knot values are based on Harrell’s recommended percentiles
with the additional restriction that the smallest knot may not be less than the fifth-smallest value
of oldvar and the largest knot may not be greater than the fifth-largest value of oldvar. If both
nknots() and knots() are given, they must specify the same number of knots.
Remarks and examples
Remarks are presented under the following headings:
Linear splines
Restricted cubic splines
Linear splines
Linear splines allow estimating the relationship between y and x as a piecewise linear function,
which is a function composed of linear segments (straight lines). One linear segment represents the
function for values of x below x0, another linear segment handles values between x0 and x1, and
so on. The linear segments are arranged so that they join at x0, x1, ..., which are called the knots.
An example of a piecewise linear function is shown below.
(figure omitted: a piecewise linear function, z plotted against x, with knot 1 and knot 2 marked)
Example 1
We wish to fit a model of log income on education and age by using a piecewise linear function
for age:
lninc = b0 + b1 educ + f(age) + u
The knots are to be placed at 10-year intervals: 20, 30, 40, 50, and 60.
. use http://www.stata-press.com/data/r13/mksp1
. mkspline age1 20 age2 30 age3 40 age4 50 age5 60 age6 = age, marginal
. regress lninc educ age1-age6
(output omitted )
Because we specified the marginal option, we could test whether the age effect is the same in
the 30–40 and 40–50 intervals by asking whether the age4 coefficient is zero. With the marginal
option, coefficients measure the change in slope from the preceding group. Specifying marginal
changes only the interpretation of the coefficients; the same model is fit in either case. Without the
marginal option, the interpretation of the coefficients would have been

    dy/d(age) = a1    if age < 20
                a2    if 20 ≤ age < 30
                a3    if 30 ≤ age < 40
                a4    if 40 ≤ age < 50
                a5    if 50 ≤ age < 60
                a6    otherwise

With the marginal option, the interpretation is

    dy/d(age) = a1                            if age < 20
                a1 + a2                       if 20 ≤ age < 30
                a1 + a2 + a3                  if 30 ≤ age < 40
                a1 + a2 + a3 + a4             if 40 ≤ age < 50
                a1 + a2 + a3 + a4 + a5        if 50 ≤ age < 60
                a1 + a2 + a3 + a4 + a5 + a6   otherwise
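Concretely, after the regression above the hypothesis is a test of a single coefficient; had the spline variables been created without marginal, the equivalent command would be test age3 = age4, testing equality of the two interval slopes. A minimal sketch:
. test age4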
Example 2
Say that we have a binary outcome variable called outcome. We are beginning an analysis and
wish to parameterize the effect of dosage on outcome. We wish to divide the data into five equal-width
groups of dosage for the piecewise linear function.
. use http://www.stata-press.com/data/r13/mksp2, clear
. mkspline dose 5 = dosage, displayknots
knot1 knot2 knot3 knot4
dosage 20 40 60 80
. logistic outcome dose1-dose5
(output omitted )
mkspline dose 5 = dosage creates five variables, dose1, dose2, ..., dose5, equally spacing the
knots over the range of dosage. Because dosage varied between 0 and 100, the mkspline command
above has the same effect as typing
. mkspline dose1 20 dose2 40 dose3 60 dose4 80 dose5 = dosage
The pctile option sets the knots to divide the data into five equal sample-size groups rather than
five equal-width ranges. Typing
. mkspline pctdose 5 = dosage, pctile displayknots
knot1 knot2 knot3 knot4
dosage 16 36.4 55.6 82
places the knots at the 20th, 40th, 60th, and 80th percentiles of the data.
Restricted cubic splines
A linear spline can be used to fit many functions well. However, a restricted cubic spline may
be a better choice than a linear spline when working with a very curved function. When using a
restricted cubic spline, one obtains a continuous smooth function that is linear before the first knot,
a piecewise cubic polynomial between adjacent knots, and linear again after the last knot.
Example 3
Returning to the data from example 1, we may feel that a curved function is a better fit. First, we
will use the knots() option to specify the five knots that we used previously.
. use http://www.stata-press.com/data/r13/mksp1, clear
. mkspline agesp = age, cubic knots(20 30 40 50 60)
. regress lninc educ agesp*
(output omitted )
Harrell (2001, 23) recommends placing knots at equally spaced percentiles of the original variable’s
marginal distribution. If we do not specify the knots() option, variables will be created containing
a restricted cubic spline with five knots determined by Harrell’s default percentiles.
. use http://www.stata-press.com/data/r13/mksp1, clear
. mkspline agesp = age, cubic displayknots
. regress lninc educ agesp*
(output omitted )
Methods and formulas
Methods and formulas are presented under the following headings:
Linear splines
Restricted cubic splines
Linear splines
Let Vi, i = 1, ..., n, be the variables to be created; ki, i = 1, ..., n−1, be the corresponding
knots; and V be the original variable (the command is mkspline V1 k1 V2 k2 ... Vn = V). Then

    V1 = min(V, k1)
    Vi = max{min(V, ki), k_{i-1}} − k_{i-1},   i = 2, ..., n−1
    Vn = max(V, k_{n-1}) − k_{n-1}

If the marginal option is specified, the definitions are

    V1 = V
    Vi = max(0, V − k_{i-1}),   i = 2, ..., n

In the second syntax, mkspline stubname # = V, so let m and M be the minimum and maximum
of V. Without the pctile option, knots are set at m + (M − m)(i/n) for i = 1, ..., n−1. If
pctile is specified, knots are set at the 100(i/n) percentiles, for i = 1, ..., n−1. Percentiles are
calculated by centile; see [R] centile.
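As an informal check of these definitions, the spline variables from example 1 (without marginal)
can be rebuilt by hand with generate; the names chk1, chk2, and chk6 below are placeholders, and
reldif() guards against storage-precision differences:
. use http://www.stata-press.com/data/r13/mksp1, clear
. mkspline age1 20 age2 30 age3 40 age4 50 age5 60 age6 = age
. generate double chk1 = min(age, 20)
. generate double chk2 = max(min(age, 30), 20) - 20
. generate double chk6 = max(age, 60) - 60
. assert reldif(age1, chk1) < 1e-6
. assert reldif(age2, chk2) < 1e-6
. assert reldif(age6, chk6) < 1e-6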
Restricted cubic splines
Let ki, i = 1, ..., n, be the knot values; Vi, i = 1, ..., n−1, be the variables to be created; and
V be the original variable. Then

    V1 = V
    V_{i+1} = [ (V − ki)+^3 − (kn − k_{n-1})^(−1) { (V − k_{n-1})+^3 (kn − ki)
              − (V − kn)+^3 (k_{n-1} − ki) } ] / (kn − k1)^2,   i = 1, ..., n−2

where

    (u)+ = u if u > 0 and (u)+ = 0 if u ≤ 0
Without the knots() option, the locations of the knots are determined by the percentiles
recommended in Harrell (2001, 23). These percentiles are based on the chosen number of knots as
follows:
No. of knots    Percentiles
3               10  50  90
4                5  35  65  95
5                5  27.5  50  72.5  95
6                5  23  41  59  77  95
7                2.5  18.33  34.17  50  65.83  81.67  97.5
Harrell provides default percentiles when the number of knots is between 3 and 7. When using a
number of knots outside this range, the location of the knots must be specified in knots().
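As a quick check of the default placement, the knots reported by displayknots can be compared
with the corresponding percentiles computed by centile; the two may differ slightly because of the
restriction on the smallest and largest knots described under the knots() option:
. use http://www.stata-press.com/data/r13/mksp1, clear
. mkspline agesp = age, cubic displayknots
. centile age, centile(5 27.5 50 72.5 95)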
Acknowledgment
The restricted cubic spline portion of mkspline is based on the rc_spline command by William
Dupont of the Department of Biostatistics at Vanderbilt University.
References
Gould, W. W. 1993. sg19: Linear splines and piecewise linear functions.Stata Technical Bulletin 15: 13–17. Reprinted
in Stata Technical Bulletin Reprints, vol. 3, pp. 98–104. College Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Harrell, F. E., Jr. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression,
and Survival Analysis. New York: Springer.
Newson, R. B. 2000. sg151: B-splines and splines parameterized by their values at reference points on the x-axis.
Stata Technical Bulletin 57: 20–27. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 221–230. College
Station, TX: Stata Press.
. 2012. Sensible parameters for univariate and multivariate splines.Stata Journal 12: 479–504.
Orsini, N., and S. Greenland. 2011. A procedure to tabulate and plot results after flexible modeling of a quantitative
covariate.Stata Journal 11: 1–29.
Panis, C. 1994. sg24: The piecewise linear spline transformation.Stata Technical Bulletin 18: 27–29. Reprinted in
Stata Technical Bulletin Reprints, vol. 3, pp. 146–149. College Station, TX: Stata Press.
Also see
[R]fp Fractional polynomial regression
Title
ml — Maximum likelihood estimation
Syntax Description Options Remarks and examples
Stored results Methods and formulas References Also see
Syntax
ml model in interactive mode
ml model method progname eq [eq ...] [if] [in] [weight] [, model_options svy diparm_options]
ml model method funcname() eq [eq ...] [if] [in] [weight] [, model_options svy diparm_options]
ml model in noninteractive mode
ml model method progname eq [eq ...] [if] [in] [weight], maximize
    [model_options svy diparm_options noninteractive_options]
ml model method funcname() eq [eq ...] [if] [in] [weight], maximize
    [model_options svy diparm_options noninteractive_options]
Noninteractive mode is invoked by specifying the maximize option. Use maximize when ml will
be used as a subroutine of another ado-file or program and you want to carry forth the problem,
from definition to posting of results, in one command.
ml clear
ml query
ml check
ml search [ [/]eqname[: #lb #ub] ... ] [, search_options]
ml plot [eqname:]name [# [# [#]]] [, saving(filename[, replace])]
ml init { [eqname:]name=# | /eqname=# } [...]
ml init # [# ...], copy
ml init matname [, copy skip]
ml report
ml trace { on | off }
ml count [clear | on | off]
ml maximize [, ml_maximize_options display_options eform_option]
ml graph [#] [, saving(filename[, replace])]
ml display [, display_options eform_option]
ml footnote
ml score newvar [if] [in] [, equation(eqname) missing]
ml score newvarlist [if] [in] [, missing]
ml score [type] stub* [if] [in] [, missing]
where method is one of
lf d0 lf0 gf0
d1 lf1
d1debug lf1debug
d2 lf2
d2debug lf2debug
or method can be specified using one of the longer, more descriptive names
method Longer name
lf linearform
d0 derivative0
d1 derivative1
d1debug derivative1debug
d2 derivative2
d2debug derivative2debug
lf0 linearform0
lf1 linearform1
lf1debug linearform1debug
lf2 linearform2
lf2debug linearform2debug
gf0 generalform0
eq is the equation to be estimated, enclosed in parentheses, and optionally with a name to be given
to the equation, preceded by a colon,
    ([eqname:] [varlist_y =] [varlist_x] [, eq_options])
or eq is the name of a parameter, such as sigma, with a slash in front,
    /eqname      which is equivalent to (eqname:)
and diparm_options is one or more diparm(diparm_args) options, where diparm_args is either
sep or anything accepted by the “undocumented” _diparm command; see help _diparm.
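For example, the following hypothetical command (the evaluator myprog and the variables y, x1,
and x2 are placeholders) specifies a two-equation model in which the first equation is named xb and
the second is the free parameter lnsigma:
. ml model lf myprog (xb: y = x1 x2) /lnsigma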
eq_options              Description
noconstant              do not include an intercept in the equation
offset(varname_o)       include varname_o in model with coefficient constrained to 1
exposure(varname_e)     include ln(varname_e) in model with coefficient constrained to 1
model_options           Description
group(varname)          use varname to identify groups
vce(vcetype)            vcetype may be robust, cluster clustvar, oim, or opg
constraints(numlist)    constraints by number to be applied
constraints(matname)    matrix that contains the constraints to be applied
nocnsnotes              do not display notes when constraints are dropped
title(string)           place a title on the estimation output
nopreserve              do not preserve the estimation subsample in memory
collinear               keep collinear variables within equations
missing                 keep observations containing variables with missing values
lf0(#_k #_ll)           number of parameters and log-likelihood value of the
                          constant-only model
continue                specifies that a model has been fit and sets the initial values
                          b0 for the model to be fit based on those results
waldtest(#)             perform a Wald test; see Options for use with ml model in
                          interactive or noninteractive mode below
obs(#)                  number of observations
crittype(string)        describe the criterion optimized by ml
subpop(varname)         compute estimates for the single subpopulation
nosvyadjust             carry out Wald test as W/k ~ F(k, d)
technique(nr)           Stata's modified Newton–Raphson (NR) algorithm
technique(bhhh)         Berndt–Hall–Hall–Hausman (BHHH) algorithm
technique(dfp)          Davidon–Fletcher–Powell (DFP) algorithm
technique(bfgs)         Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
noninteractive_options      Description
init(ml_init_args)          set the initial values b0
search(on)                  equivalent to ml search, repeat(0); the default
search(norescale)           equivalent to ml search, repeat(0) norescale
search(quietly)             same as search(on), except that output is suppressed
search(off)                 prevents calling ml search
repeat(#)                   ml search's repeat() option; see below
bounds(ml_search_bounds)    specify bounds for ml search
nowarning                   suppress “convergence not achieved” message of iterate(0)
novce                       substitute the zero matrix for the variance matrix
negh                        indicates that the evaluator returns the negative Hessian matrix
score(newvars)              new variables containing the contribution to the score
maximize_options            control the maximization process; seldom used
search_options      Description
repeat(#)           number of random attempts to find better initial-value
                      vector; default is repeat(10) in interactive mode and
                      repeat(0) in noninteractive mode
restart             use random actions to find starting values; not recommended
norescale           do not rescale to improve parameter vector; not recommended
maximize_options    control the maximization process; seldom used
ml_maximize_options     Description
nowarning               suppress “convergence not achieved” message of iterate(0)
novce                   substitute the zero matrix for the variance matrix
negh                    indicates that the evaluator returns the negative Hessian matrix
score(newvars|stub*)    new variables containing the contribution to the score
nooutput                suppress display of final results
noclear                 do not clear ml problem definition after model has converged
maximize_options        control the maximization process; seldom used
display_options     Description
noheader            suppress header display above the coefficient table
nofootnote          suppress footnote display below the coefficient table
level(#)            set confidence level; default is level(95)
first               display coefficient table reporting results for first equation only
neq(#)              display coefficient table reporting first # equations
showeqns            display equation names in the coefficient table
plus                display coefficient table ending in dashes–plus-sign–dashes
nocnsreport         suppress constraints display above the coefficient table
noomitted           suppress display of omitted variables
vsquish             suppress blank space separating factor-variable terms or
                      time-series–operated variables from other variables
noemptycells        suppress empty cells for interactions of factor variables
baselevels          report base levels of factor variables and interactions
allbaselevels       display all base levels of factor variables and interactions
cformat(%fmt)       format the coefficients, standard errors, and confidence limits in
                      the coefficient table
pformat(%fmt)       format the p-values in the coefficient table
sformat(%fmt)       format the test statistics in the coefficient table
nolstretch          do not automatically widen the coefficient table to accommodate
                      longer variable names
coeflegend          display legend instead of statistics
eform_option    Description
eform(string)   display exponentiated coefficients; column title is “string”
eform           display exponentiated coefficients; column title is “exp(b)”
hr              report hazard ratios
shr             report subhazard ratios
irr             report incidence-rate ratios
or              report odds ratios
rrr             report relative-risk ratios
fweights, aweights, iweights, and pweights are allowed; see [U] 11.1.6 weight. With all but method lf, you must
write your likelihood-evaluation program carefully if pweights are to be specified, and pweights may not be
specified with method d0, d1, d1debug, d2, or d2debug. See Gould, Pitblado, and Poi (2010, chap. 6) for details.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
To redisplay results, type ml display.
Syntax of subroutines for use by evaluator programs
mleval newvar = vecname [, eq(#)]
mleval scalarname = vecname , scalar [eq(#)]
mlsum scalarname_lnf = exp [if] [, noweight]
mlvecsum scalarname_lnf rowvecname = exp [if] [, eq(#)]
mlmatsum scalarname_lnf matrixname = exp [if] [, eq(#,#)]
mlmatbysum scalarname_lnf matrixname varname_a varname_b [varname_c] [if],
    by(varname) [eq(#,#)]
Syntax of user-written evaluator
Summary of notation
The log-likelihood function is ln L(θ1j, θ2j, ..., θEj), where θij = xij bi, j = 1, ..., N indexes
observations, and i = 1, ..., E indexes the linear equations defined by ml model. If the likelihood
satisfies the linear-form restrictions, it can be decomposed as ln L = Σ_{j=1}^N ln ℓ(θ1j, θ2j, ..., θEj).
Method-lf evaluators
program progname
version 13
args lnfj theta1 theta2 . . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
quietly gen double ‘tmp1’ = . . .
. . .
quietly replace ‘lnfj’ = . . .
end
where
‘lnfj’      variable to be filled in with observation-by-observation values of ln ℓj
‘theta1’    variable containing evaluation of first equation θ1j = x1j b1
‘theta2’    variable containing evaluation of second equation θ2j = x2j b2
. . .
Method-d0 evaluators
program progname
version 13
args todo b lnf
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
mlsum ‘lnf’ = . . .
end
where
‘todo’    always contains 0 (may be ignored)
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnf’     scalar to be filled in with overall ln L
Method-d1 evaluators
program progname
version 13
args todo b lnf g
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
mlsum ‘lnf’ = . . .
if (‘todo’==0 | ‘lnf’>=.) exit
tempname d1 d2 . . .
mlvecsum ‘lnf’ ‘d1’ = formula for ∂lnℓj/∂θ1j, eq(1)
mlvecsum ‘lnf’ ‘d2’ = formula for ∂lnℓj/∂θ2j, eq(2)
. . .
matrix ‘g’ = (‘d1’,‘d2’, . . . )
end
where
‘todo’    contains 0 or 1
              0 ⇒ ‘lnf’ to be filled in;
              1 ⇒ ‘lnf’ and ‘g’ to be filled in
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnf’     scalar to be filled in with overall ln L
‘g’       row vector to be filled in with overall g = ∂ln L/∂b
Method-d2 evaluators
program progname
version 13
args todo b lnf g H
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
mlsum ‘lnf’ = . . .
if (‘todo’==0 | ‘lnf’>=.) exit
tempname d1 d2 . . .
mlvecsum ‘lnf’ ‘d1’ = formula for ∂lnℓj/∂θ1j, eq(1)
mlvecsum ‘lnf’ ‘d2’ = formula for ∂lnℓj/∂θ2j, eq(2)
. . .
matrix ‘g’ = (‘d1’,‘d2’, . . . )
if (‘todo’==1 | ‘lnf’>=.) exit
tempname d11 d12 d22 . . .
mlmatsum ‘lnf’ ‘d11’ = formula for ∂²lnℓj/∂θ1j², eq(1)
mlmatsum ‘lnf’ ‘d12’ = formula for ∂²lnℓj/∂θ1j∂θ2j, eq(1,2)
mlmatsum ‘lnf’ ‘d22’ = formula for ∂²lnℓj/∂θ2j², eq(2)
. . .
matrix ‘H’ = (‘d11’,‘d12’, . . . \ ‘d12’’,‘d22’, . . . )
end
where
‘todo’    contains 0, 1, or 2
              0 ⇒ ‘lnf’ to be filled in;
              1 ⇒ ‘lnf’ and ‘g’ to be filled in;
              2 ⇒ ‘lnf’, ‘g’, and ‘H’ to be filled in
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnf’     scalar to be filled in with overall ln L
‘g’       row vector to be filled in with overall g = ∂ln L/∂b
‘H’       matrix to be filled in with overall Hessian H = ∂²ln L/∂b∂b′
Method-lf0 evaluators
program progname
version 13
args todo b lnfj
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
quietly replace ‘lnfj’ = . . .
end
where
‘todo’ always contains 0 (may be ignored)
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnfj’    variable to be filled in with observation-by-observation values of ln ℓj
Method-lf1 evaluators
program progname
version 13
args todo b lnfj g1 g2 . . .
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
quietly replace ‘lnfj’ = . . .
if (‘todo’==0) exit
quietly replace ‘g1’ = formula for ∂lnℓj/∂θ1j
quietly replace ‘g2’ = formula for ∂lnℓj/∂θ2j
. . .
end
where
‘todo’    contains 0 or 1
              0 ⇒ ‘lnfj’ to be filled in;
              1 ⇒ ‘lnfj’, ‘g1’, ‘g2’, ..., to be filled in
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnfj’    variable to be filled in with observation-by-observation values of ln ℓj
‘g1’      variable to be filled in with ∂lnℓj/∂θ1j
‘g2’      variable to be filled in with ∂lnℓj/∂θ2j
. . .
Method-lf2 evaluators
program progname
version 13
args todo b lnfj g1 g2 . . . H
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
quietly replace ‘lnfj’ = . . .
if (‘todo’==0) exit
quietly replace ‘g1’ = formula for ∂lnℓj/∂θ1j
quietly replace ‘g2’ = formula for ∂lnℓj/∂θ2j
. . .
if (‘todo’==1) exit
tempname d11 d12 d22 lnf . . .
mlmatsum ‘lnf’ ‘d11’ = formula for ∂²lnℓj/∂θ1j², eq(1)
mlmatsum ‘lnf’ ‘d12’ = formula for ∂²lnℓj/∂θ1j∂θ2j, eq(1,2)
mlmatsum ‘lnf’ ‘d22’ = formula for ∂²lnℓj/∂θ2j², eq(2)
. . .
matrix ‘H’ = (‘d11’,‘d12’, . . . \ ‘d12’’,‘d22’, . . . )
end
where
‘todo’    contains 0, 1, or 2
              0 ⇒ ‘lnfj’ to be filled in;
              1 ⇒ ‘lnfj’, ‘g1’, ‘g2’, ..., to be filled in;
              2 ⇒ ‘lnfj’, ‘g1’, ‘g2’, ..., and ‘H’ to be filled in
‘b’       full parameter row vector b = (b1, b2, ..., bE)
‘lnfj’    variable to be filled in with observation-by-observation values of ln ℓj
‘g1’      variable to be filled in with ∂lnℓj/∂θ1j
‘g2’      variable to be filled in with ∂lnℓj/∂θ2j
. . .
‘H’       matrix to be filled in with overall Hessian H = ∂²ln L/∂b∂b′
Method-gf0 evaluators
program progname
version 13
args todo b lnfj
tempvar theta1 theta2 . . .
mleval ‘theta1’ = ‘b’, eq(1)
mleval ‘theta2’ = ‘b’, eq(2) // if there is a θ2
. . .
// if you need to create any intermediate results:
tempvar tmp1 tmp2 . . .
gen double ‘tmp1’ = . . .
. . .
quietly replace ‘lnfj’ = . . .
end
where
‘todo’ always contains 0(may be ignored)
‘b’ full parameter row vector b=(b1,b2,...,bE)
‘lnfj’ variable to be filled in with the values of the log-likelihood ln`j
Global macros for use by all evaluators
$ML_y1      name of first dependent variable
$ML_y2      name of second dependent variable, if any
. . .
$ML_samp    variable containing 1 if observation to be used; 0 otherwise
$ML_w       variable containing weight associated with observation or 1 if no weights specified
Method-lf evaluators can ignore $ML_samp, but restricting calculations to the $ML_samp==1
subsample will speed execution. Method-lf evaluators must ignore $ML_w; application of weights
is handled by the method itself.
Methods d0, d1, d2, lf0, lf1, lf2, and gf0 can ignore $ML_samp as long as ml model's nopreserve
option is not specified. These methods will run more quickly if nopreserve is specified. These
evaluators can ignore $ML_w only if they use mlsum, mlvecsum, mlmatsum, and mlmatbysum to
produce all final results.
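For example, a method-lf evaluator can restrict its calculations this way (a sketch patterned on the
probit evaluator in example 1 below):
quietly replace ‘lnf’ = ln(normal(‘theta1’)) if $ML_y1==1 & $ML_samp==1
quietly replace ‘lnf’ = ln(normal(-‘theta1’)) if $ML_y1==0 & $ML_samp==1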
Description
ml model defines the current problem.
ml clear clears the current problem definition. This command is rarely used because when you type
ml model, any previous problem is automatically cleared.
ml query displays a description of the current problem.
ml check verifies that the log-likelihood evaluator you have written works. We strongly recommend
using this command.
ml search searches for (better) initial values. We recommend using this command.
ml plot provides a graphical way of searching for (better) initial values.
ml init provides a way to specify initial values.
ml report reports the value of ln L, its gradient, and its Hessian at the initial values or current
parameter estimates, b0.
ml trace traces the execution of the user-defined log-likelihood evaluation program.
ml count counts the number of times the user-defined log-likelihood evaluation program is called;
this command is seldom used. ml count clear clears the counter. ml count on turns on the
counter. ml count without arguments reports the current values of the counter. ml count off
stops counting calls.
ml maximize maximizes the likelihood function and reports results. Once ml maximize has success-
fully completed, the previously mentioned ml commands may no longer be used unless noclear
is specified. ml graph and ml display may be used whether or not noclear is specified.
ml graph graphs the log-likelihood values against the iteration number.
ml display redisplays results.
ml footnote displays a warning message when the model did not converge within the specified
number of iterations.
ml score creates new variables containing the equation-level scores. The variables generated by ml
score are equivalent to those generated by specifying the score() option of ml maximize (and
ml model . . . ,. . . maximize).
progname is the name of a Stata program you write to evaluate the log-likelihood function.
funcname() is the name of a Mata function you write to evaluate the log-likelihood function.
In this documentation, progname and funcname() are referred to as the user-written evaluator, the
likelihood evaluator, or sometimes simply as the evaluator. The program you write is written in
the style required by the method you choose. The methods are lf, d0, d1, d2, lf0, lf1, lf2, and
gf0. Thus, if you choose to use method lf, your program is called a method-lf evaluator.
Method-lf evaluators are required to evaluate the observation-by-observation log likelihood ln ℓj,
j = 1, ..., N.
Method-d0 evaluators are required to evaluate the overall log likelihood ln L. Method-d1 evaluators
are required to evaluate the overall log likelihood and its gradient vector g = ∂ln L/∂b. Method-d2
evaluators are required to evaluate the overall log likelihood, its gradient, and its Hessian matrix
H = ∂²ln L/∂b∂b′.
Method-lf0 evaluators are required to evaluate the observation-by-observation log likelihood ln ℓj,
j = 1, ..., N. Method-lf1 evaluators are required to evaluate the observation-by-observation log
likelihood and its equation-level scores gji = ∂lnℓj/∂(xji bi). Method-lf2 evaluators are required to
evaluate the observation-by-observation log likelihood, its equation-level scores, and its Hessian
matrix H = ∂²ln ℓ/∂b∂b′.
Method-gf0 evaluators are required to evaluate the summable pieces of the log likelihood ln ℓk,
k = 1, ..., K.
mleval is a subroutine used by evaluators of methods d0, d1, d2, lf0, lf1, lf2, and gf0 to evaluate
the coefficient vector, b, that they are passed.
mlsum is a subroutine used by evaluators of methods d0, d1, and d2 to define the value, ln L, that is
to be returned.
mlvecsum is a subroutine used by evaluators of methods d1 and d2 to define the gradient vector, g,
that is to be returned. It is suitable for use only when the likelihood function meets the linear-form
restrictions.
mlmatsum is a subroutine used by evaluators of methods d2 and lf2 to define the Hessian matrix, H,
that is to be returned. It is suitable for use only when the likelihood function meets the linear-form
restrictions.
mlmatbysum is a subroutine used by evaluators of method d2 to help define the Hessian matrix, H,
that is to be returned. It is suitable for use when the likelihood function contains terms made
up of grouped sums, such as in panel-data models. For such models, use mlmatsum to compute
the observation-level outer products and mlmatbysum to compute the group-level outer products.
mlmatbysum requires that the data be sorted by the variable identified in the by() option.
Options
Options are presented under the following headings:
Options for use with ml model in interactive or noninteractive mode
Options for use with ml model in noninteractive mode
Options for use when specifying equations
Options for use with ml search
Option for use with ml plot
Options for use with ml init
Options for use with ml maximize
Option for use with ml graph
Options for use with ml display
Options for use with mleval
Option for use with mlsum
Option for use with mlvecsum
Option for use with mlmatsum
Options for use with mlmatbysum
Options for use with ml score
Options for use with ml model in interactive or noninteractive mode
group(varname) specifies the numeric variable that identifies groups. This option is typically used
to identify panels for panel-data models.
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar),
and that are derived from asymptotic theory (oim, opg); see [R] vce_option.
vce(robust),vce(cluster clustvar),pweight, and svy will work with evaluators of methods
lf, lf0, lf1, lf2, and gf0; all you need do is specify them.
These options will not work with evaluators of methods d0, d1, or d2, and specifying these options
will produce an error message.
constraints(numlist | matname) specifies the linear constraints to be applied during estimation.
constraints(numlist) specifies the constraints by number. Constraints are defined by using
the constraint command; see [R] constraint. constraints(matname) specifies a matrix that
contains the constraints.
nocnsnotes prevents notes from being displayed when constraints are dropped. A constraint will
be dropped if it is inconsistent, contradicts other constraints, or causes some other error when the
constraint matrix is being built. Constraints are checked in the order in which they are specified.
title(string)specifies the title for the estimation output when results are complete.
nopreserve specifies that ml need not ensure that only the estimation subsample is in memory when
the user-written likelihood evaluator is called. nopreserve is irrelevant when you use method lf.
For the other methods, if nopreserve is not specified, ml saves the data in a file (preserves the
original dataset) and drops the irrelevant observations before calling the user-written evaluator.
This way, even if the evaluator does not restrict its attentions to the $ML_samp==1 subsample,
results will still be correct. Later, ml automatically restores the original dataset.
ml need not go through these machinations for method lf because the user-written evaluator
calculates observation-by-observation values, and ml itself sums the components.
ml goes through these machinations if and only if the estimation sample is a subsample of the data
in memory. If the estimation sample includes every observation in memory, ml does not preserve
the original dataset. Thus programmers must not alter the original dataset unless they preserve
the data themselves.
We recommend that interactive users of ml not specify nopreserve; the speed gain is not worth
the possibility of getting incorrect results.
We recommend that programmers specify nopreserve, but only after verifying that their evaluator
really does restrict its attentions solely to the $ML_samp==1 subsample.
collinear specifies that ml not remove the collinear variables within equations. There is no reason
to leave collinear variables in place, but this option is of interest to programmers who, in their code,
have already removed collinear variables and do not want ml to waste computer time checking
again.
missing specifies that observations containing variables with missing values not be eliminated from
the estimation sample. There are two reasons you might want to specify missing:
Programmers may wish to specify missing because, in other parts of their code, they have already
eliminated observations with missing values and do not want ml to waste computer time looking
again.
You may wish to specify missing if your model explicitly deals with missing values. Stata’s
heckman command is a good example of this. In such cases, there will be observations where
missing values are allowed and other observations where they are not, where their presence should
cause the observation to be eliminated. If you specify missing, it is your responsibility to specify
an if exp that eliminates the irrelevant observations.
lf0(#_k #_ll) is typically used by programmers. It specifies the number of parameters and log-likelihood
value of the constant-only model so that ml can report a likelihood-ratio test rather than a Wald
test. These values may have been analytically determined, or they may have been determined by
a previous fitting of the constant-only model on the estimation sample.
Also see the continue option directly below.
If you specify lf0(), it must be safe for you to specify the missing option, too, else how did
you calculate the log likelihood for the constant-only model on the same sample? You must have
identified the estimation sample, and done so correctly, so there is no reason for ml to waste time
rechecking your results. All of which is to say, do not specify lf0() unless you are certain your
code identifies the estimation sample correctly.
lf0(), even if specified, is ignored if vce(robust),vce(cluster clustvar),pweight, or svy
is specified because, in that case, a likelihood-ratio test would be inappropriate.
continue is typically specified by programmers and does two things:
First, it specifies that a model has just been fit by either ml or some other estimation command,
such as logit, and that the likelihood value stored in e(ll) and the number of parameters stored
in e(b) as of that instant are the relevant values of the constant-only model. The current value of
the log likelihood is used to present a likelihood-ratio test unless vce(robust),vce(cluster
clustvar),pweight,svy, or constraints() is specified. A likelihood-ratio test is inappropriate
when vce(robust),vce(cluster clustvar),pweight, or svy is specified. We suggest using
lrtest when constraints() is specified; see [R]lrtest.
Second, continue sets the initial values, b0, for the model about to be fit according to the e(b)
currently stored.
The comments made about specifying missing with lf0() apply equally well here.
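For example, you might fit the constant-only model first and then use continue so that a likelihood-ratio
test is reported (a sketch; myprog, y, x1, and x2 are placeholders):
. ml model lf myprog (y =)
. ml maximize, nooutput
. ml model lf myprog (y = x1 x2), continue
. ml maximize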
waldtest(#)is typically specified by programmers. By default, ml presents a Wald test, but that is
overridden if the lf0() or continue option is specified. A Wald test is performed if vce(robust),
vce(cluster clustvar), or pweight is specified.
waldtest(0) prevents even the Wald test from being reported.
waldtest(-1) is the default. It specifies that a Wald test be performed by constraining all coeffi-
cients except the intercept to 0 in the first equation. Remaining equations are to be unconstrained.
A Wald test is performed if neither lf0() nor continue was specified, and a Wald test is forced
if vce(robust),vce(cluster clustvar), or pweight was specified.
waldtest(k) for k ≤ −1 specifies that a Wald test be performed by constraining all coefficients
except intercepts to 0 in the first |k|equations; remaining equations are to be unconstrained. A
Wald test is performed if neither lf0() nor continue was specified, and a Wald test is forced if
vce(robust),vce(cluster clustvar), or pweight was specified.
waldtest(k) for k ≥ 1 works like the options above, except that it forces a Wald test to be
reported even if the information to perform the likelihood-ratio test is available and even if none of
vce(robust), vce(cluster clustvar), or pweight was specified. waldtest(k), k ≥ 1, may
not be specified with lf0().
obs(#)is used mostly by programmers. It specifies that the number of observations reported and
ultimately stored in e(N) be #. Ordinarily, ml works that out for itself. Programmers may want
to specify this option when, for the likelihood evaluator to work for Nobservations, they first had
to modify the dataset so that it contained a different number of observations.
crittype(string)is used mostly by programmers. It allows programmers to supply a string (up to
32 characters long) that describes the criterion that is being optimized by ml. The default is "log
likelihood" for nonrobust and "log pseudolikelihood" for robust estimation.
svy indicates that ml is to pick up the svy settings set by svyset and use the robust variance
estimator. This option requires the data to be svyset; see [SVY]svyset.svy may not be specified
with vce() or weights.
subpop(varname) specifies that estimates be computed for the single subpopulation defined by the
observations for which varname ≠ 0. Typically, varname = 1 defines the subpopulation, and
varname = 0 indicates observations not belonging to the subpopulation. For observations whose
subpopulation status is uncertain, varname should be set to missing (‘.’). This option requires the
svy option.
nosvyadjust specifies that the model Wald test be carried out as W/k ~ F(k, d), where W is the
Wald test statistic, k is the number of terms in the model excluding the constant term, d is the total
number of sampled PSUs minus the total number of strata, and F(k, d) is an F distribution with
k numerator degrees of freedom and d denominator degrees of freedom. By default, an adjusted
Wald test is conducted: (d − k + 1)W/(kd) ~ F(k, d − k + 1). See Korn and Graubard (1990)
for a discussion of the Wald test and the adjustments thereof. This option requires the svy option.
technique(algorithm_spec) specifies how the likelihood function is to be maximized. The following
algorithms are currently implemented in ml. For details, see Gould, Pitblado, and Poi (2010).
technique(nr) specifies Stata's modified Newton–Raphson (NR) algorithm.
technique(bhhh) specifies the Berndt–Hall–Hall–Hausman (BHHH) algorithm.
technique(dfp) specifies the Davidon–Fletcher–Powell (DFP) algorithm.
technique(bfgs) specifies the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
The default is technique(nr).
You can switch between algorithms by specifying more than one in the technique() option. By
default, ml will use an algorithm for five iterations before switching to the next algorithm. To
specify a different number of iterations, include the number after the technique in the option. For
example, technique(bhhh 10 nr 1000) requests that ml perform 10 iterations using the BHHH
algorithm, followed by 1,000 iterations using the NR algorithm, and then switch back to BHHH for
10 iterations, and so on. The process continues until convergence or until reaching the maximum
number of iterations.
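For example, with a hypothetical method-lf evaluator myprog and placeholder variables y, x1, and x2,
such a request could be issued as
. ml model lf myprog (y = x1 x2), technique(bhhh 10 nr 1000)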
Options for use with ml model in noninteractive mode
The following extra options are for use with ml model in noninteractive mode. Noninteractive
mode is for programmers who use ml as a subroutine and want to issue one command that will carry
forth the estimation from start to finish.
maximize is required. It specifies noninteractive mode.
init(ml_init_args) sets the initial values, b0. ml_init_args are whatever you would type after the
ml init command.
search(on | norescale | quietly | off) specifies whether ml search is to be used to improve the
initial values. search(on) is the default and is equivalent to separately running ml search,
repeat(0). search(norescale) is equivalent to separately running ml search, repeat(0)
norescale. search(quietly) is equivalent to search(on), except that it suppresses ml
search's output. search(off) prevents calling ml search.
repeat(#) is ml search's repeat() option. repeat(0) is the default.
bounds(ml_search_bounds) specifies the search bounds. ml_search_bounds is specified as
    eqn_name lower_bound upper_bound ... eqn_name lower_bound upper_bound
for instance, bounds(100 100 lnsigma 0 10). The ml model command issues ml search
ml_search_bounds, repeat(#). Specifying search bounds is optional.
nowarning, novce, negh, and score() are ml maximize's equivalent options.
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
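Putting these pieces together, a noninteractive call might look like this (myprog, y, x1, and x2 are
placeholders):
. ml model lf myprog (y = x1 x2), maximize search(quietly) nolog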
Options for use when specifying equations
noconstant specifies that the equation not include an intercept.
offset(varname_o) specifies that the equation be xb + varname_o; that is, the equation is to include
varname_o with coefficient constrained to be 1.
exposure(varname_e) is an alternative to offset(varname_o); it specifies that the equation be
xb + ln(varname_e). The equation is to include ln(varname_e) with coefficient constrained to be 1.
Options for use with ml search
repeat(#)specifies the number of random attempts that are to be made to find a better initial-value
vector. The default is repeat(10).
repeat(0) specifies that no random attempts be made. More precisely, repeat(0) specifies that
no random attempts be made if the first initial-value vector is a feasible starting point. If it is
not, ml search will make random attempts, even if you specify repeat(0), because it has no
alternative. The repeat() option refers to the number of random attempts to be made to improve
the initial values. When the initial starting value vector is not feasible, ml search will make up to
1,000 random attempts to find starting values. It stops when it finds one set of values that works
and then moves into its improve-initial-values logic.
repeat(k),k > 0, specifies the number of random attempts to be made to improve the initial
values.
restart specifies that random actions be taken to obtain starting values and that the resulting starting
values not be a deterministic function of the current values. Generally, you should not specify this
option because, with restart,ml search intentionally does not produce as good a set of starting
values as it could. restart is included for use by the optimizer when it gets into serious trouble.
The random actions ensure that the optimizer and ml search, working together, do not cause an
endless loop.
restart implies norescale, which is why we recommend that you do not specify restart.
In testing, sometimes rescale worked so well that, even after randomization, the rescaler would
bring the starting values right back to where they had been the first time and thus defeat the
intended randomization.
norescale specifies that ml search not engage in its rescaling actions to improve the parameter
vector. We do not recommend specifying this option because rescaling tends to work so well.
maximize options:nolog and trace; see [R]maximize. These options are seldom used.
Option for use with ml plot
saving( filename[, replace])specifies that the graph be saved in filename.gph.
See [G-3]saving option.
Options for use with ml init
copy specifies that the list of numbers or the initialization vector be copied into the initial-value
vector by position rather than by name.
skip specifies that any parameters found in the specified initialization vector that are not also found
in the model be ignored. The default action is to issue an error message.
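For example (the equation name eq1 and the parameter values are placeholders):
. ml init eq1:x1=0.5 /s=0.1
. matrix b0 = (0.5, 1, 0.1)
. ml init b0, copy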
Options for use with ml maximize
nowarning is allowed only with iterate(0). nowarning suppresses the “convergence not achieved”
message. Programmers might specify iterate(0) nowarning when they have a vector b already
containing the final estimates and want ml to calculate the variance matrix and postestimation
results. Then specify init(b) search(off) iterate(0) nowarning nolog.
novce is allowed only with iterate(0).novce substitutes the zero matrix for the variance matrix,
which in effect posts estimation results as fixed constants.
negh indicates that the evaluator returns the negative Hessian matrix. By default, ml assumes d2 and
lf2 evaluators return the Hessian matrix.
score(newvars | stub*) creates new variables containing the contributions to the score for each
equation and ancillary parameter in the model; see [U] 20.22 Obtaining scores.
If score(newvars) is specified, the newvars must contain k new variables. For evaluators of
methods lf, lf0, lf1, and lf2, k is the number of equations. For evaluators of method gf0, k is the
number of parameters. If score(stub*) is specified, variables named stub1, stub2, ..., stubk are
created.
For evaluators of methods lf, lf0, lf1, and lf2, the first variable contains ∂lnℓj/∂(x1j b1), the
second variable contains ∂lnℓj/∂(x2j b2), and so on.
For evaluators of method gf0, the first variable contains ∂lnℓj/∂b1, the second variable contains
∂lnℓj/∂b2, and so on.
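For example, after defining a two-equation method-lf model, typing (sc is a placeholder stub)
. ml maximize, score(sc*)
creates sc1 and sc2 containing the two equations' score contributions.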
nooutput suppresses display of results. This option is different from prefixing ml maximize with
quietly in that the iteration log is still displayed (assuming that nolog is not specified).
noclear specifies that the ml problem definition not be cleared after the model has converged.
Perhaps you are having convergence problems and intend to run the model to convergence. If so,
use ml search to see if those values can be improved, and then restart the estimation.
maximize options:difficult,iterate(#),nolog,trace,gradient,showstep,hessian,
showtolerance,tolerance(#),ltolerance(#),nrtolerance(#),nonrtolerance; see
[R]maximize. These options are seldom used.
display options; see Options for use with ml display below.
eform option; see Options for use with ml display below.
Option for use with ml graph
saving( filename[, replace])specifies that the graph be saved in filename.gph.
See [G-3]saving option.
Options for use with ml display
noheader suppresses the header display above the coefficient table that displays the final log-likelihood
value, the number of observations, and the model significance test.
nofootnote suppresses the footnote display below the coefficient table, which displays a warning
if the model fit did not converge within the specified number of iterations. Use ml footnote to
display the warning if 1) you add to the coefficient table using the plus option or 2) you have
your own footnotes and want the warning to be last.
level(#)is the standard confidence-level option. It specifies the confidence level, as a percentage,
for confidence intervals of the coefficients. The default is level(95) or as set by set level;
see [U] 20.7 Specifying the width of confidence intervals.
first displays a coefficient table reporting results for the first equation only, and the report makes
it appear that the first equation is the only equation. This option is used by programmers who
estimate ancillary parameters in the second and subsequent equations and who wish to report the
values of such parameters themselves.
neq(#)is an alternative to first.neq(#)displays a coefficient table reporting results for the first
#equations. This option is used by programmers who estimate ancillary parameters in the #+1
and subsequent equations and who wish to report the values of such parameters themselves.
showeqns is a seldom-used option that displays the equation names in the coefficient table. ml
display uses the numbers stored in e(k_eq) and e(k_aux) to determine how to display the
coefficient table. e(k_eq) identifies the number of equations, and e(k_aux) identifies how many
of these are for ancillary parameters. The first option is implied when showeqns is not specified
and all but the first equation are for ancillary parameters.
plus displays the coefficient table, but rather than ending the table in a line of dashes, ends it in
dashes, a plus sign, and more dashes. This is so that programmers can write additional display code
to add more results to the table and make it appear as if the combined result is one table. Programmers
typically specify plus with the first or neq() options. This option implies nofootnote.
nocnsreport suppresses the display of constraints above the coefficient table. This option is ignored
if constraints were not used to fit the model.
noomitted specifies that variables that were omitted because of collinearity not be displayed. The
default is to include in the table any variables omitted because of collinearity and to label them
as “(omitted)”.
vsquish specifies that the blank space separating factor-variable terms or time-series–operated variables
from other variables in the model be suppressed.
noemptycells specifies that empty cells for interactions of factor variables not be displayed. The
default is to include in the table interaction cells that do not occur in the estimation sample and
to label them as “(empty)”.
baselevels and allbaselevels control whether the base levels of factor variables and interactions
are displayed. The default is to exclude from the table all base categories.
baselevels specifies that base levels be reported for factor variables and for interactions whose
bases cannot be inferred from their component factor variables.
allbaselevels specifies that all base levels of factor variables and interactions be reported.
cformat(%fmt) specifies how to format coefficients, standard errors, and confidence limits in the
coefficient table.
pformat(%fmt) specifies how to format p-values in the coefficient table.
sformat(%fmt) specifies how to format test statistics in the coefficient table.
nolstretch specifies that the width of the coefficient table not be automatically widened to accom-
modate longer variable names. The default, lstretch, is to automatically widen the coefficient
table up to the width of the Results window. To change the default, use set lstretch off.
nolstretch is not shown in the dialog box.
coeflegend specifies that the legend of the coefficients and how to specify them in an expression
be displayed rather than displaying the statistics for the coefficients.
eform_option: eform(string), eform, hr, shr, irr, or, and rrr display the coefficient table in
exponentiated form: for each coefficient, exp(b) rather than b is displayed, and standard errors and
confidence intervals are transformed. string is the table header that will be displayed above the
transformed coefficients and must be 11 characters or shorter in length, for example, eform("Odds
ratio"). The options eform, hr, shr, irr, or, and rrr provide a default string equivalent to
“exp(b)”, “Haz. Ratio”, “SHR”, “IRR”, “Odds Ratio”, and “RRR”, respectively. These options
may not be combined.
ml display looks at e(k_eform) to determine how many equations are affected by an
eform_option; by default, only the first equation is affected. Type ereturn list, all to view
e(k_eform); see [P] ereturn.
Options for use with mleval
eq(#) specifies the equation number, i, for which θij = xij bi is to be evaluated. eq(1) is assumed
if eq() is not specified.
scalar asserts that the ith equation is known to evaluate to a constant, meaning that the equation
was specified as (), (name:), or /name on the ml model statement. If you specify this option,
the new variable is created as a scalar. If the ith equation does not evaluate to a scalar,
an error message is issued.
Option for use with mlsum
noweight specifies that weights ($ML_w) be ignored when summing the likelihood function.
Option for use with mlvecsum
eq(#) specifies the equation for which a gradient vector ∂ln L/∂bi is to be constructed. The default
is eq(1).
Option for use with mlmatsum
eq(#,#) specifies the equations for which the Hessian matrix is to be constructed. The default is
eq(1), which is the same as eq(1,1), which means ∂²ln L/∂b1∂b1′. Specifying eq(i,j) results
in ∂²ln L/∂bi∂bj′.
Options for use with mlmatbysum
by(varname) is required and specifies the group variable.
eq(#,#) specifies the equations for which the Hessian matrix is to be constructed. The default is
eq(1), which is the same as eq(1,1), which means ∂²ln L/∂b1∂b1′. Specifying eq(i,j) results
in ∂²ln L/∂bi∂bj′.
Options for use with ml score
equation(eqname)identifies from which equation the observation scores are to come. This option
may be used only when generating one variable.
missing specifies that observations containing variables with missing values not be eliminated from
the estimation sample.
Remarks and examples
For a thorough discussion of ml, see the fourth edition of Maximum Likelihood Estimation with Stata
(Gould, Pitblado, and Poi 2010). The book provides a tutorial introduction to ml, notes on advanced
programming issues, and a discourse on maximum likelihood estimation from both theoretical and
practical standpoints. See Survey options and ml at the end of Remarks and examples for examples
of the new svy options. For more information about survey estimation, see [SVY]survey,[SVY]svy
estimation, and [SVY]variance estimation.
ml requires that you write a program that evaluates the log-likelihood function and, possibly, its
first and second derivatives. The style of the program you write depends upon the method you choose.
Methods lf, lf0, d0, and gf0 require that your program evaluate the log likelihood only. Methods d1
and lf1 require that your program evaluate the log likelihood and its first derivatives. Methods d2
and lf2 require that your program evaluate the log likelihood and its first and second derivatives.
Methods lf, lf0, d0, and gf0 differ from each other in that, with methods lf and lf0, your program
is required to produce observation-by-observation log-likelihood values ln ℓj and it is assumed that
ln L = Σ_j ln ℓj; with method d0, your program is required to produce only the overall value ln L;
and with method gf0, your program is required to produce the summable pieces of the log likelihood,
such as those in panel-data models.
Once you have written the program, called an evaluator, you define a model to be fit using ml
model and obtain estimates using ml maximize. You might type
. ml model . . .
. ml maximize
but we recommend that you type
. ml model . . .
. ml check
. ml search
. ml maximize
ml check verifies your evaluator has no obvious errors, and ml search finds better initial values.
You fill in the ml model statement with 1) the method you are using, 2) the name of your
program, and 3) the “equations”. You write your evaluator in terms of θ1, θ2, ..., each of which
has a linear equation associated with it. That linear equation might be as simple as θi = b0, it might
be θi = b1 mpg + b2 weight + b3, or it might omit the intercept b3. The equations are specified in
parentheses on the ml model line.
Suppose that you are using method lf and the name of your evaluator program is myprog. The
statement
. ml model lf myprog (mpg weight)
would specify one equation with θi = b1 mpg + b2 weight + b3. If you wanted to omit b3, you would
type
. ml model lf myprog (mpg weight, nocons)
and if all you wanted was θi = b0, you would type
. ml model lf myprog ()
With multiple equations, you list the equations one after the other; so, if you typed
. ml model lf myprog (mpg weight) ()
you would be specifying θ1 = b1 mpg + b2 weight + b3 and θ2 = b4. You would write your likelihood
in terms of θ1 and θ2. If the model was linear regression, θ1 might be the xb part and θ2 the variance
of the residuals.
When you specify the equations, you also specify any dependent variables. If you typed
. ml model lf myprog (price = mpg weight) ()
price would be the one and only dependent variable, and that would be passed to your program in
$ML_y1. If your model had two dependent variables, you could type
. ml model lf myprog (price displ = mpg weight) ()
Then $ML_y1 would be price and $ML_y2 would be displ. You can specify however many dependent
variables are necessary and specify them on any equation. It does not matter on which equation you
specify them; the first one specified is placed in $ML_y1, the second in $ML_y2, and so on.
Example 1: Method lf
Using method lf, we want to produce observation-by-observation values of the log likelihood. The
probit log-likelihood function is
    ln ℓj = ln Φ(θ1j)     if yj = 1
    ln ℓj = ln Φ(−θ1j)    if yj = 0

    θ1j = xj b1
The following is the method-lf evaluator for this likelihood function:
program myprobit
version 13
args lnf theta1
quietly replace ‘lnf’ = ln(normal(‘theta1’)) if $ML_y1==1
quietly replace ‘lnf’ = ln(normal(-‘theta1’)) if $ML_y1==0
end
If we wanted to fit a model of foreign on mpg and weight, we would type the following
commands. The ‘foreign =’ part specifies that y is foreign. The ‘mpg weight’ part specifies that
θ1j = b1 mpgj + b2 weightj + b3.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ml model lf myprobit (foreign = mpg weight)
. ml maximize
initial: log likelihood = -51.292891
alternative: log likelihood = -45.055272
rescale: log likelihood = -45.055272
Iteration 0: log likelihood = -45.055272
Iteration 1: log likelihood = -27.905385
Iteration 2: log likelihood = -26.858058
Iteration 3: log likelihood = -26.844198
Iteration 4: log likelihood = -26.844189
Iteration 5: log likelihood = -26.844189
Number of obs = 74
Wald chi2(2) = 20.75
Log likelihood = -26.844189 Prob > chi2 = 0.0000
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
mpg -.1039503 .0515689 -2.02 0.044 -.2050235 -.0028772
weight -.0023355 .0005661 -4.13 0.000 -.003445 -.0012261
_cons 8.275464 2.554142 3.24 0.001 3.269438 13.28149
Example 2: Method lf for two-equation, two-dependent-variable model
A two-equation, two-dependent-variable model is a little different. Rather than receiving one θ,
our program will receive two. Rather than there being one dependent variable in $ML_y1, there will
be dependent variables in $ML_y1 and $ML_y2. For instance, the Weibull regression log-likelihood
function is

    ln ℓj = −{tj exp(−θ1j)}^exp(θ2j) + dj{θ2j − θ1j + (exp(θ2j) − 1)(ln tj − θ1j)}
    θ1j = xj b1
    θ2j = s

where tj is the time of failure or censoring and dj = 1 if failure and 0 if censored. We can make
the log likelihood a little easier to program by introducing some extra variables:

    pj = exp(θ2j)
    Mj = {tj exp(−θ1j)}^pj
    Rj = ln tj − θ1j
    ln ℓj = −Mj + dj{θ2j − θ1j + (pj − 1)Rj}
The method-lf evaluator for this is
program myweib
version 13
args lnf theta1 theta2
tempvar p M R
quietly gen double ‘p’ = exp(‘theta2’)
quietly gen double ‘M’ = ($ML_y1*exp(-‘theta1’))^‘p’
quietly gen double ‘R’ = ln($ML_y1)-‘theta1’
quietly replace ‘lnf’ = -‘M’ + $ML_y2*(‘theta2’-‘theta1’ + (‘p’-1)*‘R’)
end
We can fit a model by typing
. ml model lf myweib (studytime died = i.drug age) ()
. ml maximize
Note that we specified ‘()’ for the second equation. The second equation corresponds to the Weibull
shape parameter s, and the linear combination we want for s contains just an intercept. Alternatively,
we could type
. ml model lf myweib (studytime died = i.drug age) /s
Typing /s means the same thing as typing (s:), and both really mean the same thing as (). The
s, either after a slash or in parentheses before a colon, labels the equation. It makes the output look
prettier, and that is all:
. use http://www.stata-press.com/data/r13/cancer, clear
(Patient Survival in Drug Trial)
. ml model lf myweib (studytime died = i.drug age) /s
. ml maximize
initial: log likelihood = -744
alternative: log likelihood = -356.14276
rescale: log likelihood = -200.80201
rescale eq: log likelihood = -136.69232
Iteration 0: log likelihood = -136.69232 (not concave)
Iteration 1: log likelihood = -124.11726
Iteration 2: log likelihood = -113.91566
Iteration 3: log likelihood = -110.30559
Iteration 4: log likelihood = -110.26747
Iteration 5: log likelihood = -110.26736
Iteration 6: log likelihood = -110.26736
Number of obs = 48
Wald chi2(3) = 35.25
Log likelihood = -110.26736 Prob > chi2 = 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
eq1
drug
2 1.012966 .2903917 3.49 0.000 .4438086 1.582123
3 1.45917 .2821195 5.17 0.000 .9062261 2.012114
age -.0671728 .0205688 -3.27 0.001 -.1074868 -.0268587
_cons 6.060723 1.152845 5.26 0.000 3.801188 8.320259
s
_cons .5573333 .1402154 3.97 0.000 .2825162 .8321504
Example 3: Method d0
Method-d0 evaluators receive b = (b1, b2, ..., bE), the coefficient vector, rather than the already
evaluated θ1, θ2, ..., θE, and they are required to evaluate the overall log likelihood ln L rather than
ln ℓj, j = 1, ..., N.
Use mleval to produce the thetas from the coefficient vector.
Use mlsum to sum the components that enter into ln L.
In the case of Weibull, ln L = Σ_j ln ℓj, and our method-d0 evaluator is
program weib0
        version 13
        args todo b lnf
        tempvar theta1 theta2
        mleval `theta1' = `b', eq(1)
        mleval `theta2' = `b', eq(2)
        local t "$ML_y1"        // this is just for readability
        local d "$ML_y2"
        tempvar p M R
        quietly gen double `p' = exp(`theta2')
        quietly gen double `M' = (`t'*exp(-`theta1'))^`p'
        quietly gen double `R' = ln(`t')-`theta1'
        mlsum `lnf' = -`M' + `d'*(`theta2'-`theta1' + (`p'-1)*`R')
end
To fit our model using this evaluator, we would type
. ml model d0 weib0 (studytime died = i.drug age) /s
. ml maximize
Technical note
Method d0 does not require \(\ln L = \sum_j \ln\ell_j\), \(j = 1, \ldots, N\), as method lf does. Your likelihood
function might have independent components only for groups of observations. Panel-data estimators
have a log-likelihood value \(\ln L = \sum_i \ln L_i\), where \(i\) indexes the panels, each of which contains
multiple observations. Conditional logistic regression has \(\ln L = \sum_k \ln L_k\), where \(k\) indexes the risk
pools. Cox regression has \(\ln L = \sum_{(t)} \ln L_{(t)}\), where \((t)\) denotes the ordered failure times.
To evaluate such likelihood functions, first calculate the within-group log-likelihood contributions.
This usually involves generate and replace statements prefixed with by, as in
        tempvar sumd
        by group: gen double `sumd' = sum($ML_y1)
Structure your code so that the log-likelihood contributions are recorded in the last observation of
each group. Say that a variable is named `cont'. To sum the contributions, code
        tempvar last
        quietly by group: gen byte `last' = (_n==_N)
        mlsum `lnf' = `cont' if `last'
You must inform mlsum which observations contain log-likelihood values to be summed. First, you
do not want to include intermediate results in the sum. Second, mlsum does not skip missing values.
Rather, if mlsum sees a missing value among the contributions, it sets the overall result, `lnf', to
missing. That is how ml maximize is informed that the likelihood function could not be evaluated
at the particular value of b. ml maximize will then take action to escape from what it thinks is an
infeasible area of the likelihood function.
When the likelihood function violates the linear-form restriction \(\ln L = \sum_j \ln\ell_j\), \(j = 1, \ldots, N\),
with \(\ln\ell_j\) being a function solely of values within the jth observation, use method d0. In the following
examples, we will demonstrate methods d1 and d2 with likelihood functions that meet this linear-form
restriction. The d1 and d2 methods themselves do not require the linear-form restriction, but the
utility routines mlvecsum and mlmatsum do. Using method d1 or d2 when the restriction is violated
is difficult; however, mlmatbysum may be of some help for method-d2 evaluators.
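To make the pattern concrete, here is a minimal sketch of a method-d0 evaluator for grouped data. It only wires together the steps described in this technical note: the group identifier id and the observation-level term are hypothetical stand-ins (not any real panel likelihood), and the data are assumed to be already sorted by id.
program mygroup_d0
        version 13
        args todo b lnf
        tempvar theta1 lnfj grpsum last
        mleval `theta1' = `b', eq(1)
        // observation-level term; a Poisson-type kernel used purely as a stand-in
        quietly gen double `lnfj' = $ML_y1*`theta1' - exp(`theta1')
        // running sum within each group (data assumed sorted by id)
        quietly by id: gen double `grpsum' = sum(`lnfj')
        // mark the last observation of each group, where the group total lives
        quietly by id: gen byte `last' = (_n==_N)
        // pass only the group totals to mlsum
        mlsum `lnf' = `grpsum' if `last'
end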
Example 4: Method d1
Method-d1 evaluators are required to produce the gradient vector \(g = \partial\ln L/\partial b\), as well as
the overall log-likelihood value. Using mlvecsum, we can obtain \(\partial\ln L/\partial b\) from \(\partial\ln L/\partial\theta_i\), \(i =
1, \ldots, E\). The derivatives of the Weibull log-likelihood function are
\[
\frac{\partial\ln\ell_j}{\partial\theta_{1j}} = p_j(M_j - d_j)
\qquad\qquad
\frac{\partial\ln\ell_j}{\partial\theta_{2j}} = d_j - R_j\,p_j(M_j - d_j)
\]
The method-d1 evaluator for this is
program weib1
        version 13
        args todo b lnf g                                       // g is new
        tempvar t1 t2
        mleval `t1' = `b', eq(1)
        mleval `t2' = `b', eq(2)
        local t "$ML_y1"
        local d "$ML_y2"
        tempvar p M R
        quietly gen double `p' = exp(`t2')
        quietly gen double `M' = (`t'*exp(-`t1'))^`p'
        quietly gen double `R' = ln(`t')-`t1'
        mlsum `lnf' = -`M' + `d'*(`t2'-`t1' + (`p'-1)*`R')
        if (`todo'==0 | `lnf'>=.) exit                          /* <-- new */
        tempname d1 d2                                          /* <-- new */
        mlvecsum `lnf' `d1' = `p'*(`M'-`d'), eq(1)              /* <-- new */
        mlvecsum `lnf' `d2' = `d' - `R'*`p'*(`M'-`d'), eq(2)    /* <-- new */
        matrix `g' = (`d1',`d2')                                /* <-- new */
end
We obtained this code by starting with our method-d0 evaluator and then adding the extra lines that
method d1 requires. To fit our model using this evaluator, we could type
. ml model d1 weib1 (studytime died = drug2 drug3 age) /s
. ml maximize
but we recommend substituting method d1debug for method d1 and typing
. ml model d1debug weib1 (studytime died = drug2 drug3 age) /s
. ml maximize
Method d1debug will compare the derivatives we calculate with numerical derivatives and thus verify
that our program is correct. Once we are certain the program is correct, then we would switch from
method d1debug to method d1.
Example 5: Method d2
Method-d2 evaluators are required to produce \(H = \partial^2\ln L/\partial b\,\partial b'\), the Hessian matrix, as well as
the gradient and log-likelihood value. mlmatsum will help calculate \(\partial^2\ln L/\partial b\,\partial b'\) from the second
derivatives with respect to \(\theta\). For the Weibull model, these second derivatives are
\[
\frac{\partial^2\ln\ell_j}{\partial\theta_{1j}^2} = -p_j^2 M_j
\qquad
\frac{\partial^2\ln\ell_j}{\partial\theta_{1j}\,\partial\theta_{2j}} = p_j(M_j - d_j + R_j p_j M_j)
\qquad
\frac{\partial^2\ln\ell_j}{\partial\theta_{2j}^2} = -p_j R_j(R_j p_j M_j + M_j - d_j)
\]
The method-d2 evaluator is
program weib2
        version 13
        args todo b lnf g H                                     // H added
        tempvar t1 t2
        mleval `t1' = `b', eq(1)
        mleval `t2' = `b', eq(2)
        local t "$ML_y1"
        local d "$ML_y2"
        tempvar p M R
        quietly gen double `p' = exp(`t2')
        quietly gen double `M' = (`t'*exp(-`t1'))^`p'
        quietly gen double `R' = ln(`t')-`t1'
        mlsum `lnf' = -`M' + `d'*(`t2'-`t1' + (`p'-1)*`R')
        if (`todo'==0 | `lnf'>=.) exit
        tempname d1 d2
        mlvecsum `lnf' `d1' = `p'*(`M'-`d'), eq(1)
        mlvecsum `lnf' `d2' = `d' - `R'*`p'*(`M'-`d'), eq(2)
        matrix `g' = (`d1',`d2')
        if (`todo'==1 | `lnf'>=.) exit                          // new from here down
        tempname d11 d12 d22
        mlmatsum `lnf' `d11' = -`p'^2 * `M', eq(1)
        mlmatsum `lnf' `d12' = `p'*(`M'-`d' + `R'*`p'*`M'), eq(1,2)
        mlmatsum `lnf' `d22' = -`p'*`R'*(`R'*`p'*`M' + `M' - `d'), eq(2)
        matrix `H' = (`d11',`d12' \ `d12'',`d22')
end
We started with our previous method-d1 evaluator and added the lines that method d2 requires. We
could now fit a model by typing
. ml model d2 weib2 (studytime died = drug2 drug3 age) /s
. ml maximize
but we would recommend substituting method d2debug for method d2 and typing
. ml model d2debug weib2 (studytime died = drug2 drug3 age) /s
. ml maximize
Method d2debug will compare the first and second derivatives we calculate with numerical derivatives
and thus verify that our program is correct. Once we are certain the program is correct, then we
would switch from method d2debug to method d2.
As we stated earlier, to produce the robust variance estimator with method lf, there is nothing to
do except specify vce(robust), vce(cluster clustvar), or pweight. For methods d0, d1, and d2,
these options do not work. If your likelihood function meets the linear-form restrictions, you can instead
use methods lf0, lf1, and lf2, and then these options will work. The equation scores are defined as
\[
\frac{\partial\ln\ell_j}{\partial\theta_{1j}}\,,\quad
\frac{\partial\ln\ell_j}{\partial\theta_{2j}}\,,\quad \ldots
\]
Your evaluator will be passed variables, one for each equation, which you fill in with the equation
scores. For both methods lf1 and lf2, these variables are passed in the fourth and subsequent positions
of the argument list. That is, you must process the arguments as
        args todo b lnf g1 g2 ... H
Note that for method lf1, the `H' argument is not used and can be ignored.
Example 6: Robust variance estimates
If you have used mlvecsum in your evaluator of method d1 or d2, it is easy to turn it into an evaluator
of method lf1 or lf2 that allows the computation of the robust variance estimator. The expression that
you specified on the right-hand side of mlvecsum is the equation score.
Here we turn the program that we gave earlier in the method-d1 example into a method-lf1 evaluator
that allows vce(robust), vce(cluster clustvar), or pweight.
program weib1
        version 13
        args todo b lnfj g1 g2                                  // g1 and g2 are new
        tempvar t1 t2
        mleval `t1' = `b', eq(1)
        mleval `t2' = `b', eq(2)
        local t "$ML_y1"
        local d "$ML_y2"
        tempvar p M R
        quietly gen double `p' = exp(`t2')
        quietly gen double `M' = (`t'*exp(-`t1'))^`p'
        quietly gen double `R' = ln(`t')-`t1'
        quietly replace `lnfj' = -`M' + `d'*(`t2'-`t1' + (`p'-1)*`R')
        if (`todo'==0) exit
        quietly replace `g1' = `p'*(`M'-`d')                    /* <-- new */
        quietly replace `g2' = `d' - `R'*`p'*(`M'-`d')          /* <-- new */
end
To fit our model and get the robust variance estimates, we type
. ml model lf1 weib1 (studytime died = drug2 drug3 age) /s, vce(robust)
. ml maximize
Survey options and ml
ml can handle stratification, poststratification, multiple stages of clustering, and finite population
corrections. Specifying the svy option implies that the data come from a survey design and also
implies that the survey linearized variance estimator is to be used; see [SVY] variance estimation.
Example 7
Suppose that we are interested in a probit analysis of data from a survey in which q1 is the answer
to a yes/no question and x1, x2, and x3 are demographic responses. The following is an lf2 evaluator
for the probit model that meets the requirements for vce(robust) (it has linear form and computes
the scores).
1340 ml — Maximum likelihood estimation
program mylf2probit
        version 13
        args todo b lnfj g1 H
        tempvar z Fz lnf
        mleval `z' = `b'
        quietly gen double `Fz' = normal(`z') if $ML_y1 == 1
        quietly replace `Fz' = normal(-`z') if $ML_y1 == 0
        quietly replace `lnfj' = log(`Fz')
        if (`todo'==0) exit
        quietly replace `g1' = normalden(`z')/`Fz' if $ML_y1 == 1
        quietly replace `g1' = -normalden(`z')/`Fz' if $ML_y1 == 0
        if (`todo'==1) exit
        mlmatsum `lnf' `H' = -`g1'*(`g1'+`z'), eq(1,1)
end
To fit a model, we svyset the data and then specify the svy option to ml model.
. svyset psuid [pw=w], strata(strid)
. ml model lf2 mylf2probit (q1 = x1 x2 x3), svy
. ml maximize
We could also use the subpop() option to make inferences about the subpopulation identified by the
variable sub:
. svyset psuid [pw=w], strata(strid)
. ml model lf2 mylf2probit (q1 = x1 x2 x3), svy subpop(sub)
. ml maximize
Stored results
For results stored by ml without the svy option, see [R] maximize.
For results stored by ml with the svy option, see [SVY] svy.
Methods and formulas
ml is implemented using moptimize(); see [M-5] moptimize().
References
Gould, W. W., J. S. Pitblado, and B. P. Poi. 2010. Maximum Likelihood Estimation with Stata. 4th ed. College
Station, TX: Stata Press.
Korn, E. L., and B. I. Graubard. 1990. Simultaneous testing of regression coefficients with complex survey data: Use
of Bonferroni t statistics. American Statistician 44: 270–276.
Royston, P. 2007. Profile likelihood for estimation and confidence intervals. Stata Journal 7: 376–387.
Also see
[R] maximize    Details of iterative maximization
[R] mlexp       Maximum likelihood estimation of user-specified expressions
[R] nl          Nonlinear least-squares estimation
[M-5] moptimize()       Model optimization
[M-5] optimize()        Function optimization
Title
mlexp — Maximum likelihood estimation of user-specified expressions
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
mlexp (<lexp>) [if] [in] [weight] [, options]
where
<lexp> is a substitutable expression representing the log-likelihood function.
options                         Description
Model
  variables(varlist)            specify variables in model
  from(initial_values)          specify initial values for parameters
Derivatives
  derivative(/name = <dexp>)    specify derivative of <lexp> with respect to parameter name;
                                can be specified more than once
SE/Robust
  vce(vcetype)                  vcetype may be oim, opg, robust, cluster clustvar,
                                bootstrap, or jackknife
Reporting
  level(#)                      set confidence level; default is level(95)
  title(string)                 display string as title above the table of parameter estimates
  title2(string)                display string as subtitle
  display_options               control column formats
Maximization
  maximize_options              control the maximization process; seldom used
  coeflegend                    display legend instead of statistics
<lexp> may contain time-series operators; see [U] 13.9 Time-series operators.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
<lexp> and <dexp> are extensions of valid Stata expressions that also contain parameters to
be estimated. The parameters are enclosed in curly braces and must otherwise satisfy the naming
requirements for variables; {beta} is an example of a parameter. Also allowed is a notation of the
form {<eqname>:varlist} for linear combinations of multiple covariates and their parameters. For
example, {xb: mpg price turn} defines a linear combination of the variables mpg, price, and
turn. See Substitutable expressions under Remarks and examples below.
Menu
Statistics > Other > Maximum likelihood estimation of expression
Description
mlexp performs maximum likelihood estimation of models that satisfy the linear-form restrictions,
which is to say models for which you can write down the log likelihood for an individual observation
and for which the overall log likelihood is simply the sum of the individual observations’ log
likelihoods.
You express the observation-level log-likelihood function by using a substitutable expression.
Unlike models fit using ml, you do not need to do any programming. However, ml can fit classes of
models that cannot be fit by mlexp.
Options
 
Model
variables(varlist) specifies the variables in the model. mlexp ignores observations for which any
of these variables has missing values. If you do not specify variables(), then mlexp assumes
all the observations are valid. If the log likelihood cannot be calculated at the initial values for
any observation, mlexp will exit with an error message.
from(initial_values) specifies the initial values to begin the estimation. You can specify a 1 × k
matrix, where k is the number of parameters in the model, or you can specify parameter names
and values. For example, to initialize alpha to 1.23 and delta to 4.57, you would type
        mlexp ..., from(alpha=1.23 delta=4.57) ...
Initial values declared using this option override any that are declared within substitutable expressions.
If you specify a parameter that does not appear in your model, mlexp exits with an error.
If you specify a matrix, the values must be in the same order in which the parameters are declared
in your model. mlexp ignores the row and column names of the matrix.
 
Derivatives
derivative(/name = <dexp>) specifies the derivative of the observation-level log-likelihood
function with respect to parameter name.
<dexp> uses the same substitutable expression syntax as is used to specify the log-likelihood
function. If you declare a linear combination in the log-likelihood function, you provide the
derivative for the linear combination; mlexp then applies the chain rule for you. See Specifying
derivatives under Remarks and examples below for examples.
If you do not specify the derivative() option, mlexp calculates derivatives numerically. You
must either specify no derivatives or specify all the derivatives; you cannot specify some analytic
derivatives and have mlexp compute the rest numerically.
If you are estimating multiple parameters, you supply derivatives using multiple derivative()
specifications.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
title(string) specifies an optional title that will be displayed just above the table of parameter
estimates.
title2(string) specifies an optional subtitle that will be displayed between the title specified in
title() and the table of parameter estimates. If title2() is specified but title() is not, then
title2() has the same effect as title().
display_options: cformat(%fmt), pformat(%fmt), and sformat(%fmt); see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.
The following option is available with mlexp but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Substitutable expressions
Parameter constraints
Specifying derivatives
Introduction
mlexp performs maximum likelihood estimation of models that satisfy the linear-form restrictions,
which is to say models for which you can write down the log likelihood for a single observation and
for which the overall log likelihood is simply the sum of the individual observations’ log likelihoods.
Models designed for use with cross-sectional data usually meet the linear-form restrictions, including
linear regression, many discrete-choice models, limited-dependent-variable models, and selection
models. Examples of models that do not satisfy the linear-form restrictions are random-effects panel-
data models (because the likelihood function is defined at the panel level) and Cox proportional
hazards models (because the likelihood function is defined for risk sets).
Because of its straightforward syntax and accessibility from the menu system, mlexp is particularly
suited to users who are new to Stata and to those using Stata for pedagogical purposes. You express
the observation-level log-likelihood function by using a substitutable expression, which we explain
below. Unlike models fit using ml, you do not need to do any programming. However, ml can fit
classes of models that cannot be fit by mlexp, including those that do not meet the linear-form
restrictions.
Substitutable expressions
You specify the log-likelihood function that mlexp is to maximize by using substitutable expressions
that are similar to those used by nl, nlsur, and gmm. You specify substitutable expressions just
as you would specify any other mathematical expression involving scalars and variables, such as
those expressions you would use with Stata’s generate command, except that the parameters
to be estimated are bound in braces. See [U] 13.2 Operators and [U] 13.3 Functions for more
information on expressions. Parameter names must follow the same conventions as variable names.
See [U] 11.3 Naming conventions.
For example, say that you have observations on variable x, and the log likelihood for the ith
observation is
\[
\ln\ell_i = \ln\lambda - \lambda x_i
\]
where \(\lambda\) is a parameter to be estimated. Then you would type
. mlexp (ln({lambda}) - {lambda}*x)
Because \(\lambda\) is a parameter, we enclosed it in braces. To specify initial values for a parameter, you can
include an equal sign and the initial value after the parameter; for example,
. mlexp (ln({lambda = 0.75}) - {lambda}*x)
would initialize \(\lambda\) to be 0.75. If you do not initialize a parameter, mlexp initializes it to zero.
Frequently, even nonlinear functions contain linear combinations of variables. Continuing the
previous example, say that we want to parameterize \(\lambda\) as
\[
\lambda_i = \alpha_1 u_i + \alpha_2 v_i
\]
where u and v are variables in the dataset. Instead of typing
. mlexp (ln({alpha1}*u + {alpha2}*v) - ({alpha1}*u + {alpha2}*v)*x)
you can instead type
. mlexp (ln({lambda: u v}) - {lambda:}*x)
The notation {lambda: u v} indicates to mlexp that you want a linear combination of the variables
u and v. We named the linear combination lambda, so mlexp will name the parameters for the two
variables lambda_u and lambda_v, respectively. Once you have declared a linear combination, you
can subsequently refer to the linear combination by specifying its name and a colon inside braces, as
we did with this example. You cannot use the same name for both an individual parameter and a linear
combination. However, after a linear combination has been declared, you can refer to the parameter
of an individual variable within that linear combination by using the notation {lc_z}, where lc is the
name of the linear combination and z is the variable whose parameter you want to reference. Linear
combinations do not include a constant term.
There are three rules to follow when defining substitutable expressions:
1. Parameters of the model are bound in braces: {b0}, {param}, etc.
2. Initial values for parameters are given by including an equal sign and the initial value inside
the braces: {b0=1}, {param=3.571}, etc.
3. Linear combinations of variables can be included using the notation {eqname:varlist}:
{xb: mpg price weight}, {score: w x z}, etc. Parameters of linear combinations are
initialized to zero.
If you specify initial values by using the from() option, they override whatever initial values are
given within the substitutable expression. Substitutable expressions are so named because once values
are assigned to the parameters, the resulting expressions can be handled by generate and replace.
Regardless of whether you specify initial values, mlexp performs a search procedure for better
starting values before commencing the first iteration of the maximization routine. If you specify initial
values, the search procedure tries to improve upon those values. Otherwise, the search procedure
begins with all parameters set to zero.
Example 1: The gamma density function
The two-parameter gamma density function for \(y \ge 0\) is
\[
f(y) = \frac{\lambda^P}{\Gamma(P)}\,\exp(-\lambda y)\,y^{P-1} \qquad \lambda > 0,\ P > 0
\]
so that the log likelihood for the ith observation is
\[
\ln\ell_i = P\ln\lambda - \ln\Gamma(P) - \lambda y_i + (P - 1)\ln y_i
\]
The dataset greenegamma.dta, based on Greene (2012, 460–461), contains 20 observations drawn
randomly from the two-parameter gamma distribution. We want to estimate the parameters of that
distribution. We type
. use http://www.stata-press.com/data/r13/greenegamma
. mlexp ({P}*ln({lambda}) - lngamma({P}) - {lambda}*y + ({P}-1)*ln(y))
initial: log likelihood = -<inf> (could not be evaluated)
feasible: log likelihood = -363.37264
rescale: log likelihood = -153.09898
rescale eq: log likelihood = -88.863468
Iteration 0: log likelihood = -88.863468
Iteration 1: log likelihood = -85.405011
Iteration 2: log likelihood = -85.375857
Iteration 3: log likelihood = -85.375669
Iteration 4: log likelihood = -85.375669
Maximum likelihood estimation
Log likelihood = -85.375669 Number of obs = 20
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/P 2.410602 .7158452 3.37 0.001 1.007571 3.813633
/lambda .0770702 .0254361 3.03 0.002 .0272164 .1269241
In our substitutable expression for the log-likelihood function, we enclosed the two parameters of
our model, P and \(\lambda\), in curly braces. We used the lngamma() function to compute \(\ln\Gamma(P)\) because
Stata (unlike Mata) does not have a built-in function to compute \(\Gamma(P)\), and numerical algorithms for
computing \(\ln\Gamma(P)\) directly are more accurate than taking the natural logarithm of \(\Gamma(P)\), anyway.
Because we did not specify initial values, mlexp initialized P and \(\lambda\) to be zero. When both
parameters are zero, the log-likelihood function cannot be evaluated because \(\ln(0)\) is undefined.
Therefore, in the iteration log above the coefficient table, we see that mlexp reported the initial log
likelihood to be -<inf> (could not be evaluated). mlexp uses a search routine to find alternative
initial values that do allow the log-likelihood function to be calculated.
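If we preferred not to rely on that search routine, we could supply feasible starting values ourselves through the from() option described above; the values below are arbitrary but positive, so the log likelihood can be evaluated at them:
. mlexp ({P}*ln({lambda}) - lngamma({P}) - {lambda}*y + ({P}-1)*ln(y)), from(P=1 lambda=0.1)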
Example 2: Obtaining alternative VCEs
In example 1, by default mlexp reported standard errors based on the observed information matrix
of the log-likelihood function. See [R] vce_option for an overview or Gould, Pitblado, and Poi (2010)
for an in-depth discussion of different ways of obtaining the VCE in maximum likelihood estimation.
With mlexp, we can use the vce() option to obtain standard errors based on alternative VCEs. For
example, to obtain the outer product of gradients (OPG) standard errors, we type
. mlexp ({P}*ln({lambda}) - lngamma({P}) - {lambda}*y + ({P}-1)*ln(y)), vce(opg)
initial: log likelihood = -<inf> (could not be evaluated)
feasible: log likelihood = -363.37264
rescale: log likelihood = -153.09898
rescale eq: log likelihood = -88.863468
Iteration 0: log likelihood = -88.863468
Iteration 1: log likelihood = -85.405011
Iteration 2: log likelihood = -85.375857
Iteration 3: log likelihood = -85.375669
Iteration 4: log likelihood = -85.375669
Maximum likelihood estimation
Log likelihood = -85.375669 Number of obs = 20
OPG
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/P 2.410602 .8768255 2.75 0.006 .6920557 4.129149
/lambda .0770702 .0270771 2.85 0.004 .0240001 .1301404
Parameter constraints
In examples 1 and 2, we were lucky. The two-parameter gamma density function is defined
only when both \(\lambda\) and P are positive. However, mlexp does not know this; when maximizing the
log-likelihood function, it will consider all real values for the parameters. Rather than relying on luck,
we should instead reparameterize our model so that we avoid having to directly estimate parameters
that are restricted.
For example, consider the parameter \(\lambda > 0\), and suppose we define the new parameter \(\theta = \ln(\lambda)\)
so that \(\lambda = \exp(\theta)\). With this parameterization, for any real value of \(\theta\) that mlexp might try to use
when evaluating the log-likelihood function, \(\lambda\) is guaranteed to be positive.
Example 3: Classical normal regression
In the classical normal variant of linear regression (Goldberger 1991, chap. 19), we assume not
only that \(y_i = x_i'\beta + \epsilon_i\) but also that \(x_i\) is nonstochastic and that \(\epsilon_i\) is distributed independently and
identically normal with mean zero and variance \(\sigma^2\). This is equivalent to assuming that
\[
y_i \mid x_i \sim N(x_i'\beta,\ \sigma^2)
\]
Using the properties of the normal distribution, we could write the log likelihood for the ith
observation as
\[
\ln\ell_i = \ln\left\{\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right\}
\]
where \(\phi(\cdot)\) is the standard normal density function. In Stata, we use the three-argument version of
the normalden() function and directly specify the conditional mean \((x_i'\beta)\) and standard deviation
\((\sigma)\) as the additional arguments.
The normal density function is defined only for \(\sigma > 0\), so instead of estimating \(\sigma\) directly, we
will instead estimate the unconstrained parameter \(\theta\) and let \(\sigma = \exp(\theta)\). For any real value of \(\theta\), this
transformation ensures that \(\sigma > 0\).
Using auto.dta, say that we want to fit the classical normal regression
\[
\texttt{mpg}_i = \beta_0 + \beta_1 \texttt{weight}_i + \epsilon_i
\]
We type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. mlexp (ln(normalden(mpg, {b0} + {b1}*weight, exp({theta}))))
initial: log likelihood = -<inf> (could not be evaluated)
feasible: log likelihood = -882.02886
rescale: log likelihood = -882.02886
rescale eq: log likelihood = -274.09391
Iteration 0: log likelihood = -274.09391 (not concave)
(output omitted )
Iteration 13: log likelihood = -195.38869
Maximum likelihood estimation
Log likelihood = -195.38869 Number of obs = 74
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b0 39.44028 1.592043 24.77 0.000 36.31993 42.56063
/b1 -.0060087 .0005108 -11.76 0.000 -.0070099 -.0050075
/theta 1.221449 .0821995 14.86 0.000 1.060341 1.382557
To recover our estimate of σ, we can use nlcom:
. nlcom (sigma: exp(_b[/theta]))
sigma: exp(_b[/theta])
Coef. Std. Err. z P>|z| [95% Conf. Interval]
sigma 3.392099 .2788288 12.17 0.000 2.845605 3.938594
In the previous example, we named the unconstrained parameter \(\theta\) theta. In actual practice,
however, we would generally name that parameter lnsigma to indicate that it is \(\ln(\sigma)\).
Two other parameter restrictions often appear in maximum likelihood estimation. Consider the
correlation coefficient \(\rho\). In general, it must be true that \(-1 < \rho < 1\). Define the parameter
\(\eta = \tanh^{-1}(\rho)\), where \(\tanh^{-1}(\cdot)\) is the hyperbolic arctangent function. Then \(\rho = \tanh(\eta)\), and by
the properties of the hyperbolic tangent function, for any real value of \(\eta\), we will have \(-1 < \rho < 1\).
Stata has the built-in function tanh(), so recovering \(\rho\) from \(\eta\) is easy. In practice, instead of naming
the unconstrained parameter eta in our likelihood expression, we would name it atanhrho to remind
us that it is the hyperbolic arctangent of \(\rho\).
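Paralleling the nlcom call used above to recover \(\sigma\), a model that estimated an unconstrained parameter named atanhrho (a hypothetical name) could recover \(\rho\) after estimation by typing
. nlcom (rho: tanh(_b[/atanhrho]))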
Other parameters, such as those that represent probabilities or ratios of variances, are often
restricted to be between 0 and 1. Say that we have the restriction \(0 < \kappa < 1\). Consider the parameter
\(\psi = \ln\{\kappa/(1 - \kappa)\}\). Then \(\kappa = e^{\psi}/(1 + e^{\psi})\), and for any real value of \(\psi\), we have \(0 < \kappa < 1\).
The formula for \(\kappa\) in terms of \(\psi\) is known as the inverse logit transformation and is available as
invlogit() in Stata. Thus in our likelihood expression, we would code invlogit({logitk}) to
map the unconstrained parameter logitk into the \((0, 1)\) interval.
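As a small illustration of the inverse-logit device (not one of the manual's own examples), we could estimate a single proportion from auto.dta, where foreign is a 0/1 variable. The Bernoulli log likelihood for observation \(i\) is \(y_i\ln\kappa + (1 - y_i)\ln(1 - \kappa)\), and we keep \(\kappa\) in \((0, 1)\) by estimating the unconstrained parameter logitk:
. use http://www.stata-press.com/data/r13/auto, clear
. mlexp (foreign*ln(invlogit({logitk})) + (1-foreign)*ln(1-invlogit({logitk})))
. nlcom (kappa: invlogit(_b[/logitk]))
The recovered \(\kappa\) should simply equal the sample proportion of foreign cars.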
Specifying derivatives
By default, mlexp calculates derivatives of the log-likelihood function numerically using a sophis-
ticated algorithm that produces accurate results. However, mlexp will fit your model more quickly
(and even more accurately) if you specify analytic derivatives.
You specify derivatives by using substitutable expressions in much the same way as you specify
the log-likelihood function. If you specify a linear combination in your log-likelihood function, then
you supply a derivative with respect to that linear combination; mlexp then uses the chain rule to
obtain the derivatives with respect to the individual parameters.
We will illustrate how to specify derivatives using the probit model for dichotomous outcomes.
The log likelihood for the probit model is often written as
\[
\ln\ell_i =
\begin{cases}
\ln\Phi(x_i'\beta) & y_i = 1 \\
\ln\Phi(-x_i'\beta) & y_i = 0
\end{cases}
\]
using the fact that \(1 - \Phi(x_i'\beta) = \Phi(-x_i'\beta)\), where \(\Phi(\cdot)\) is the cumulative standard normal distribution
function. If we use the trick suggested by Greene (2012, 691, fn. 7), we can simplify the log-likelihood
function, making the derivative calculation easier. Let \(q_i = 2y_i - 1\). Then we can write the log-likelihood
function as
\[
\ln\ell_i = \ln\Phi(q_i x_i'\beta) \tag{1}
\]
and the first derivative as
\[
\frac{\partial\ln\ell_i}{\partial\beta} = \frac{q_i\,\phi(q_i x_i'\beta)}{\Phi(q_i x_i'\beta)}\,x_i \tag{2}
\]
Example 4: Probit with one regressor
Say that we want to fit a probit model of foreign on mpg and a constant term. We have two
parameters, so we will need to specify two derivatives; \(x_i\) consists of the ith observation on mpg
and a 1 as the constant term. Because the term \(q_i x_i'\beta\) will appear several times in our command, we
create a macro to store it. We type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate q = 2*foreign - 1
. global qxb "q*({b1}*mpg + {b0})"
. mlexp (ln(normal($qxb))), derivative(/b1 = q*normalden($qxb)/normal($qxb)*mpg)
> deriv(/b0 = q*normalden($qxb)/normal($qxb))
initial: log likelihood = -51.292891
alternative: log likelihood = -2017.3105
rescale: log likelihood = -47.888213
rescale eq: log likelihood = -46.343247
Iteration 0: log likelihood = -46.343247
Iteration 1: log likelihood = -39.268764
Iteration 2: log likelihood = -39.258972
Iteration 3: log likelihood = -39.258972
Maximum likelihood estimation
Log likelihood = -39.258972 Number of obs = 74
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 .0960601 .0301523 3.19 0.001 .0369627 .1551575
/b0 -2.635268 .6841462 -3.85 0.000 -3.97617 -1.294366
When you specify a linear combination of variables, you specify the derivative with respect to the
linear combination. That way, if you change the variables that comprise the linear combination, you
do not need to change the derivative at all. To see why this is the case, consider the function \(f(x_i'\beta)\),
where \(x_i'\beta\) is a linear combination. Then, using the chain rule,
\[
\frac{\partial f(x_i'\beta)}{\partial\beta_j}
= \frac{\partial f(x_i'\beta)}{\partial x_i'\beta} \times \frac{\partial x_i'\beta}{\partial\beta_j}
= \frac{\partial f(x_i'\beta)}{\partial x_i'\beta} \times x_{ij}
\]
Once the derivative with respect to the linear combination is known, mlexp can then multiply it
by each of the variables in the linear combination to get the full set of derivatives with respect to
the parameters needed to maximize the likelihood function. Moreover, the derivative with respect to
the linear combination does not depend on the variables within the linear combination, so even if
you change the variables in it, you will not need to modify the specification of the corresponding
derivative() option.
Example 5: Probit with a linear combination
Now let’s fit a probit model of foreign on mpg and gear_ratio. We could specify the parameters
and independent variables individually, but we will use a linear combination instead. First, note that
\[
\frac{\partial\ln\ell_i}{\partial x_i'\beta} = \frac{q_i\,\phi(q_i x_i'\beta)}{\Phi(q_i x_i'\beta)}
\]
We type
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. generate q = 2*foreign - 1
. global qxb "q*({xb:} + {b0})"
. mlexp (ln(normal(q*({xb:mpg gear_ratio}+{b0})))),
> deriv(/xb = q*normalden($qxb)/normal($qxb))
> deriv(/b0 = q*normalden($qxb)/normal($qxb))
initial: log likelihood = -51.292891
alternative: log likelihood = -2556.2172
rescale: log likelihood = -47.865271
rescale eq: log likelihood = -46.658776
Iteration 0: log likelihood = -46.658776
Iteration 1: log likelihood = -22.541058
Iteration 2: log likelihood = -21.467371
Iteration 3: log likelihood = -21.454446
Iteration 4: log likelihood = -21.454436
Iteration 5: log likelihood = -21.454436
Maximum likelihood estimation
Log likelihood = -21.454436 Number of obs = 74
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/xb_mpg -.0282433 .0464514 -0.61 0.543 -.1192864 .0627998
/xb_gear_r~o 3.699635 .8368276 4.42 0.000 2.059483 5.339787
/b0 -11.57588 2.337239 -4.95 0.000 -16.15678 -6.994972
We first redefined our global macro $qxb to contain the linear combination xb and a constant term
b0. More importantly, we did not specify the variables in the linear combination just yet. Instead, we
will use $qxb after we explicitly declare the variables in xb when we specify our model. To avoid
making mistakes, you can declare the variables in a linear combination only once when you set up
your model. If we had declared the variables when we defined $qxb, we would have received an
error because, upon substituting for $qxb, we would have declared the variables multiple times in
our call to mlexp.
Stored results
mlexp stores the following in e():
Scalars
  e(N)                  number of observations
  e(k)                  number of parameters
  e(k_aux)              number of ancillary parameters
  e(k_eq)               number of equations in e(b)
  e(k_eq_model)         number of equations in overall model test
  e(df_m)               model degrees of freedom
  e(ll)                 log likelihood
  e(N_clust)            number of clusters
  e(rank)               rank of e(V)
  e(ic)                 number of iterations
  e(rc)                 return code
  e(converged)          1 if converged, 0 otherwise
Macros
  e(cmd)                mlexp
  e(cmdline)            command as typed
  e(lexp)               likelihood expression
  e(wtype)              weight type
  e(wexp)               weight expression
  e(usrtitle)           user-specified title
  e(usrtitle2)          user-specified secondary title
  e(vce)                vcetype specified in vce()
  e(vcetype)            title used to label Std. Err.
  e(params)             names of parameters
  e(hasderiv)           yes, if derivative() is specified
  e(d_j)                derivative expression for parameter j
  e(rhs)                contents of variables()
  e(opt)                type of optimization
  e(ml_method)          type of ml method
  e(technique)          maximization technique
  e(singularHmethod)    m-marquardt or hybrid; method used when Hessian is singular (1)
  e(crittype)           optimization criterion (1)
  e(properties)         b V
  e(estat_cmd)          program used to implement estat
  e(predict)            program used to implement predict
  e(marginsnotok)       predictions disallowed by margins
  e(marginsprop)        signals to the margins command
Matrices
  e(b)                  coefficient vector
  e(ilog)               iteration log (up to 20 iterations)
  e(init)               initial values
  e(gradient)           gradient vector
  e(V)                  variance–covariance matrix of the estimators
  e(V_modelbased)       model-based variance
Functions
  e(sample)             marks estimation sample
(1) Type ereturn list, all to view these results; see [P] return.
Methods and formulas
Optimization is carried out using moptimize(); see [M-5] moptimize().
References
Goldberger, A. S. 1991. A Course in Econometrics. Cambridge, MA: Harvard University Press.
Gould, W. W., J. S. Pitblado, and B. P. Poi. 2010. Maximum Likelihood Estimation with Stata. 4th ed. College
Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Also see
[R] mlexp postestimation        Postestimation tools for mlexp
[R] gmm         Generalized method of moments estimation
[R] maximize    Details of iterative maximization
[R] ml          Maximum likelihood estimation
[R] nl          Nonlinear least-squares estimation
[R] nlsur       Estimation of nonlinear systems of equations
Title
mlexp postestimation — Postestimation tools for mlexp
Description Syntax for predict Menu for predict Option for predict Also see
Description
The following postestimation commands are available after mlexp:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest (1)      likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict scores
predictnl point estimates, standard errors, testing, and inference for generalized predictions
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] {stub*|newvar1 ... newvark} [if] [in] [, scores]
This statistic is only available for observations within the estimation sample. k represents the number of parameters
in the model.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Option for predict
scores, the default, calculates the equation-level score variables. The jth new variable will contain
the scores for the jth parameter of the model. Linear combinations are expanded prior to computing
scores, so each variable’s parameter will have its own score variable.
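For example, after fitting the two-parameter gamma model shown in [R] mlexp, typing something like the following would create one score variable per parameter (sc is an arbitrary stub):
. predict double sc*, scores
Here sc1 would hold the scores for the first parameter declared in the model and sc2 those for the second.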
Also see
[R] mlexp       Maximum likelihood estimation of user-specified expressions
[U] 20 Estimation and postestimation commands
Title
mlogit — Multinomial (polytomous) logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
mlogit depvar [indepvars] [if] [in] [weight] [, options]
options                 Description
Model
  noconstant            suppress constant term
  baseoutcome(#)        value of depvar that will be the base outcome
  constraints(clist)    apply specified linear constraints; clist has the form #-#, #-#, ...
  collinear             keep collinear variables
SE/Robust
  vce(vcetype)          vcetype may be oim, robust, cluster clustvar, bootstrap,
                        or jackknife
Reporting
  level(#)              set confidence level; default is level(95)
  rrr                   report relative-risk ratios
  nocnsreport           do not display constraints
  display_options       control column formats, row spacing, line width, display of omitted
                        variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options      control the maximization process; seldom used
  coeflegend            display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix
commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Categorical outcomes > Multinomial logistic regression
Description
mlogit fits maximum-likelihood multinomial logit models, also known as polytomous logistic
regression. You can define constraints to perform constrained estimation. Some people refer to
conditional logistic regression as multinomial logit. If you are one of them, see [R] clogit.
See [R] logistic for a list of related estimation commands.
Options
 
Model
noconstant; see [R] estimation options.
baseoutcome(#) specifies the value of depvar to be treated as the base outcome. The default is to
choose the most frequent outcome.
constraints(clist), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
If specifying vce(bootstrap) or vce(jackknife), you must also specify baseoutcome().
 
Reporting
level(#); see [R] estimation options.
rrr reports the estimated coefficients transformed to relative-risk ratios, that is, \(e^b\) rather than \(b\); see
Description of the model below for an explanation of this concept. Standard errors and confidence
intervals are similarly transformed. This option affects how results are displayed, not how they are
estimated. rrr may be specified at estimation or when replaying previously estimated results.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
The following option is available with mlogit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Description of the model
Fitting unconstrained models
Fitting constrained models
mlogit fits maximum likelihood models with discrete dependent (left-hand-side) variables when
the dependent variable takes on more than two outcomes and the outcomes have no natural ordering.
If the dependent variable takes on only two outcomes, estimates are identical to those produced by
logistic or logit; see [R] logistic or [R] logit. If the outcomes are ordered, see [R] ologit.
Description of the model
For an introduction to multinomial logit models, see Greene (2012, 763–766), Hosmer, Lemeshow,
and Sturdivant (2013, 269–289), Long (1997, chap. 6), Long and Freese (2014, chap. 8), and
Treiman (2009, 336–341). For a description emphasizing the difference in assumptions and data
requirements for conditional and multinomial logit, see Davidson and MacKinnon (1993).
Consider the outcomes 1, 2, 3, ..., m recorded in y, and the explanatory variables X. Assume that
there are m = 3 outcomes: “buy an American car”, “buy a Japanese car”, and “buy a European car”.
The values of y are then said to be “unordered”. Even though the outcomes are coded 1, 2, and 3, the
numerical values are arbitrary because 1 < 2 < 3 does not imply that outcome 1 (buy American) is
less than outcome 2 (buy Japanese) is less than outcome 3 (buy European). This unordered categorical
property of y distinguishes the use of mlogit from regress (which is appropriate for a continuous
dependent variable), from ologit (which is appropriate for ordered categorical data), and from logit
(which is appropriate for two outcomes, which can be thought of as ordered).
In the multinomial logit model, you estimate a set of coefficients, \(\beta^{(1)}\), \(\beta^{(2)}\), and \(\beta^{(3)}\), corresponding
to each outcome:
\[
\Pr(y=1) = \frac{e^{X\beta^{(1)}}}{e^{X\beta^{(1)}} + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\qquad
\Pr(y=2) = \frac{e^{X\beta^{(2)}}}{e^{X\beta^{(1)}} + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\qquad
\Pr(y=3) = \frac{e^{X\beta^{(3)}}}{e^{X\beta^{(1)}} + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\]
The model, however, is unidentified in the sense that there is more than one solution to \(\beta^{(1)}\), \(\beta^{(2)}\),
and \(\beta^{(3)}\) that leads to the same probabilities for \(y = 1\), \(y = 2\), and \(y = 3\). To identify the model, you
arbitrarily set one of \(\beta^{(1)}\), \(\beta^{(2)}\), or \(\beta^{(3)}\) to 0; it does not matter which. That is, if you arbitrarily
set \(\beta^{(1)} = 0\), the remaining coefficients \(\beta^{(2)}\) and \(\beta^{(3)}\) will measure the change relative to the \(y = 1\)
group. If you instead set \(\beta^{(2)} = 0\), the remaining coefficients \(\beta^{(1)}\) and \(\beta^{(3)}\) will measure the change
relative to the \(y = 2\) group. The coefficients will differ because they have different interpretations,
but the predicted probabilities for \(y = 1\), 2, and 3 will still be the same. Thus either parameterization
will be a solution to the same underlying model.
Setting \(\beta^{(1)} = 0\), the equations become
\[
\Pr(y=1) = \frac{1}{1 + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\qquad
\Pr(y=2) = \frac{e^{X\beta^{(2)}}}{1 + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\qquad
\Pr(y=3) = \frac{e^{X\beta^{(3)}}}{1 + e^{X\beta^{(2)}} + e^{X\beta^{(3)}}}
\]
The relative probability of \(y = 2\) to the base outcome is
\[
\frac{\Pr(y=2)}{\Pr(y=1)} = e^{X\beta^{(2)}}
\]
Let’s call this ratio the relative risk, and let’s further assume that \(X\) and \(\beta^{(2)}\) are vectors equal to
\((x_1, x_2, \ldots, x_k)\) and \((\beta^{(2)}_1, \beta^{(2)}_2, \ldots, \beta^{(2)}_k)'\), respectively. The ratio of the relative risk for a one-unit
change in \(x_i\) is then
\[
\frac{e^{\beta^{(2)}_1 x_1 + \cdots + \beta^{(2)}_i(x_i+1) + \cdots + \beta^{(2)}_k x_k}}
     {e^{\beta^{(2)}_1 x_1 + \cdots + \beta^{(2)}_i x_i + \cdots + \beta^{(2)}_k x_k}}
= e^{\beta^{(2)}_i}
\]
Thus the exponentiated value of a coefficient is the relative-risk ratio for a one-unit change in the
corresponding variable (risk is measured as the risk of the outcome relative to the base outcome).
Fitting unconstrained models
Example 1: A first example
We have data on the type of health insurance available to 616 psychologically depressed subjects
in the United States (Tarlov et al. 1989; Wells et al. 1989). The insurance is categorized as either an
indemnity plan (that is, regular fee-for-service insurance, which may have a deductible or coinsurance
rate) or a prepaid plan (a fixed up-front payment allowing subsequent unlimited use as provided,
for instance, by an HMO). The third possibility is that the subject has no insurance whatsoever. We
wish to explore the demographic factors associated with each subject’s insurance choice. One of the
demographic factors in our data is the race of the participant, coded as white or nonwhite:
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. tabulate insure nonwhite, chi2 col
Key
frequency
column percentage
nonwhite
insure 0 1 Total
Indemnity 251 43 294
50.71 35.54 47.73
Prepaid 208 69 277
42.02 57.02 44.97
Uninsure 36 9 45
7.27 7.44 7.31
Total 495 121 616
100.00 100.00 100.00
Pearson chi2(2) = 9.5599 Pr = 0.008
Although insure appears to take on the values Indemnity, Prepaid, and Uninsure, it actually
takes on the values 1, 2, and 3. The words appear because we have associated a value label with the
numeric variable insure; see [U] 12.6.3 Value labels.
When we fit a multinomial logit model, we can tell mlogit which outcome to use as the base
outcome, or we can let mlogit choose. To fit a model of insure on nonwhite, letting mlogit
choose the base outcome, we type
. mlogit insure nonwhite
Iteration 0: log likelihood = -556.59502
Iteration 1: log likelihood = -551.78935
Iteration 2: log likelihood = -551.78348
Iteration 3: log likelihood = -551.78348
Multinomial logistic regression Number of obs = 616
LR chi2(2) = 9.62
Prob > chi2 = 0.0081
Log likelihood = -551.78348 Pseudo R2 = 0.0086
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
nonwhite .6608212 .2157321 3.06 0.002 .2379942 1.083648
_cons -.1879149 .0937644 -2.00 0.045 -.3716896 -.0041401
Uninsure
nonwhite .3779586 .407589 0.93 0.354 -.4209011 1.176818
_cons -1.941934 .1782185 -10.90 0.000 -2.291236 -1.592632
mlogit chose the indemnity outcome as the base outcome and presented coefficients for the
outcomes prepaid and uninsured. According to the model, the probability of prepaid for whites
(nonwhite = 0) is
\[
\Pr(\texttt{insure} = \texttt{Prepaid}) = \frac{e^{-.188}}{1 + e^{-.188} + e^{-1.942}} = 0.420
\]
Similarly, for nonwhites, the probability of prepaid is
\[
\Pr(\texttt{insure} = \texttt{Prepaid}) = \frac{e^{-.188+.661}}{1 + e^{-.188+.661} + e^{-1.942+.378}} = 0.570
\]
These results agree with the column percentages presented by tabulate because the mlogit model
is fully saturated. That is, there are enough terms in the model to fully explain the column percentage
in each cell. The model chi-squared and the tabulate chi-squared are in almost perfect agreement;
both test that the column percentages of insure are the same for both values of nonwhite.
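To see that agreement directly (a check not shown in the original example), we could compute the predicted probabilities and average them within each group; the variable names below are arbitrary:
. predict pIndem pPrepaid pUninsure
. tabulate nonwhite, summarize(pPrepaid)
The group means of pPrepaid should reproduce the 42.02% and 57.02% column percentages shown by tabulate above.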
Example 2: Specifying the base outcome
By specifying the baseoutcome() option, we can control which outcome of the dependent variable
is treated as the base. Left to its own, mlogit chose to make outcome 1, indemnity, the base outcome.
To make outcome 2, prepaid, the base, we would type
. mlogit insure nonwhite, base(2)
Iteration 0: log likelihood = -556.59502
Iteration 1: log likelihood = -551.78935
Iteration 2: log likelihood = -551.78348
Iteration 3: log likelihood = -551.78348
Multinomial logistic regression Number of obs = 616
LR chi2(2) = 9.62
Prob > chi2 = 0.0081
Log likelihood = -551.78348 Pseudo R2 = 0.0086
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity
nonwhite -.6608212 .2157321 -3.06 0.002 -1.083648 -.2379942
_cons .1879149 .0937644 2.00 0.045 .0041401 .3716896
Prepaid (base outcome)
Uninsure
nonwhite -.2828627 .3977302 -0.71 0.477 -1.0624 .4966742
_cons -1.754019 .1805145 -9.72 0.000 -2.107821 -1.400217
The baseoutcome() option requires that we specify the numeric value of the outcome, so we could
not type base(Prepaid).
Although the coefficients now appear to be different, the summary statistics reported at the top
are identical. With this parameterization, the probability of prepaid insurance for whites is
\[
\Pr(\texttt{insure} = \texttt{Prepaid}) = \frac{1}{1 + e^{.188} + e^{-1.754}} = 0.420
\]
This is the same answer we obtained previously.
Example 3: Displaying relative-risk ratios
By specifying rrr, which we can do at estimation time or when we redisplay results, we see the
model in terms of relative-risk ratios:
. mlogit, rrr
Multinomial logistic regression Number of obs = 616
LR chi2(2) = 9.62
Prob > chi2 = 0.0081
Log likelihood = -551.78348 Pseudo R2 = 0.0086
insure RRR Std. Err. z P>|z| [95% Conf. Interval]
Indemnity
nonwhite .516427 .1114099 -3.06 0.002 .3383588 .7882073
_cons 1.206731 .1131483 2.00 0.045 1.004149 1.450183
Prepaid (base outcome)
Uninsure
nonwhite .7536233 .2997387 -0.71 0.477 .3456255 1.643247
_cons .1730769 .0312429 -9.72 0.000 .1215024 .2465434
Looked at this way, the relative risk of choosing an indemnity over a prepaid plan is 0.516 for
nonwhites relative to whites.
To illustrate, from the output and discussions of examples 1 and 2 we find that
\[
\Pr(\texttt{insure} = \texttt{Indemnity} \mid \text{white}) = \frac{1}{1 + e^{-.188} + e^{-1.942}} = 0.507
\]
and thus the relative risk of choosing indemnity over prepaid (for whites) is
\[
\frac{\Pr(\texttt{insure} = \texttt{Indemnity} \mid \text{white})}{\Pr(\texttt{insure} = \texttt{Prepaid} \mid \text{white})} = \frac{0.507}{0.420} = 1.207
\]
For nonwhites,
\[
\Pr(\texttt{insure} = \texttt{Indemnity} \mid \text{not white}) = \frac{1}{1 + e^{-.188+.661} + e^{-1.942+.378}} = 0.355
\]
and thus the relative risk of choosing indemnity over prepaid (for nonwhites) is
\[
\frac{\Pr(\texttt{insure} = \texttt{Indemnity} \mid \text{not white})}{\Pr(\texttt{insure} = \texttt{Prepaid} \mid \text{not white})} = \frac{0.355}{0.570} = 0.623
\]
The ratio of these two relative risks, hence the name “relative-risk ratio”, is \(0.623/1.207 = 0.516\), as
given in the output under the heading “RRR”.
Technical note
In models where only two categories are considered, the mlogit model reduces to standard logit.
Consequently the exponentiated regression coefficients, labeled as RRR within mlogit, are equal to
the odds ratios as given when the or option is specified under logit; see [R] logit.
As such, always referring to mlogit’s exponentiated coefficients as odds ratios may be tempting.
However, the discussion in example 3 demonstrates that doing so would be incorrect. In general
mlogit models, the exponentiated coefficients are ratios of relative risks, not ratios of odds.
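To check this equivalence yourself (this check is not part of the original note), you could fit a two-outcome model both ways; with auto.dta, for example,
. use http://www.stata-press.com/data/r13/auto, clear
. mlogit foreign mpg, rrr baseoutcome(0)
. logit foreign mpg, or
With domestic (foreign = 0) as the base outcome, the RRR column reported by mlogit should match the odds-ratio column reported by logit.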
Example 4: Model with continuous and multiple categorical variables
One of the advantages of mlogit over tabulate is that we can include continuous variables and
multiple categorical variables in the model. In examining the data on insurance choice, we decide
that we want to control for age, gender, and site of study (the study was conducted in three sites):
. mlogit insure age male nonwhite i.site
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -534.67443
Iteration 2: log likelihood = -534.36284
Iteration 3: log likelihood = -534.36165
Iteration 4: log likelihood = -534.36165
Multinomial logistic regression Number of obs = 615
LR chi2(10) = 42.99
Prob > chi2 = 0.0000
Log likelihood = -534.36165 Pseudo R2 = 0.0387
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.011745 .0061946 -1.90 0.058 -.0238862 .0003962
male .5616934 .2027465 2.77 0.006 .1643175 .9590693
nonwhite .9747768 .2363213 4.12 0.000 .5115955 1.437958
site
2 .1130359 .2101903 0.54 0.591 -.2989296 .5250013
3 -.5879879 .2279351 -2.58 0.010 -1.034733 -.1412433
_cons .2697127 .3284422 0.82 0.412 -.3740222 .9134476
Uninsure
age -.0077961 .0114418 -0.68 0.496 -.0302217 .0146294
male .4518496 .3674867 1.23 0.219 -.268411 1.17211
nonwhite .2170589 .4256361 0.51 0.610 -.6171725 1.05129
site
2 -1.211563 .4705127 -2.57 0.010 -2.133751 -.2893747
3 -.2078123 .3662926 -0.57 0.570 -.9257327 .510108
_cons -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
These results suggest that the inclination of nonwhites to choose prepaid care is even stronger than
it was without controlling. We also see that subjects in site 2 are less likely to be uninsured.
Fitting constrained models
mlogit can fit models with subsets of coefficients constrained to be zero, with subsets of coefficients
constrained to be equal both within and across equations, and with subsets of coefficients arbitrarily
constrained to equal linear combinations of other estimated coefficients.
Before fitting a constrained model, you define the constraints with the constraint command;
see [R] constraint. Once the constraints are defined, you estimate using mlogit, specifying the
constraint() option. Typing constraint(4) would use the constraint you previously saved as
4. Typing constraint(1,4,6) would use the previously stored constraints 1, 4, and 6. Typing
constraint(1-4,6) would use the previously stored constraints 1, 2, 3, 4, and 6.
Sometimes you will not be able to specify the constraints without knowing the omitted outcome.
In such cases, assume that the omitted outcome is whatever outcome is convenient for you, and
include the baseoutcome() option when you specify the mlogit command.
Example 5: Specifying constraints to test hypotheses
We can use constraints to test hypotheses, among other things. In our insurance-choice model,
let’s test the hypothesis that there is no distinction between having indemnity insurance and being
uninsured. Indemnity-style insurance was the omitted outcome, so we type
. test [Uninsure]
( 1) [Uninsure]age = 0
( 2) [Uninsure]male = 0
( 3) [Uninsure]nonwhite = 0
( 4) [Uninsure]1b.site = 0
( 5) [Uninsure]2.site = 0
( 6) [Uninsure]3.site = 0
Constraint 4 dropped
chi2( 5) = 9.31
Prob > chi2 = 0.0973
If indemnity had not been the omitted outcome, we would have typed test [Uninsure=Indemnity].
The results produced by test are an approximation based on the estimated covariance matrix of
the coefficients. Because the probability of being uninsured is low, the log likelihood may be nonlinear
for the uninsured. Conventional statistical wisdom is not to trust the asymptotic answer under these
circumstances but to perform a likelihood-ratio test instead.
To use Stata’s lrtest (likelihood-ratio test) command, we must fit both the unconstrained and
constrained models. The unconstrained model is the one we have previously fit. Following the
instruction in [R] lrtest, we first store the unconstrained model results:
. estimates store unconstrained
To fit the constrained model, we must refit our model with all the coefficients except the constant set
to 0 in the Uninsure equation. We define the constraint and then refit:
. constraint 1 [Uninsure]
. mlogit insure age male nonwhite i.site, constraints(1)
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -539.80523
Iteration 2: log likelihood = -539.75644
Iteration 3: log likelihood = -539.75643
Multinomial logistic regression Number of obs = 615
Wald chi2(5) = 29.70
Log likelihood = -539.75643 Prob > chi2 = 0.0000
( 1) [Uninsure]o.age = 0
( 2) [Uninsure]o.male = 0
( 3) [Uninsure]o.nonwhite = 0
( 4) [Uninsure]2o.site = 0
( 5) [Uninsure]3o.site = 0
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.0107025 .0060039 -1.78 0.075 -.0224699 .0010649
male .4963616 .1939683 2.56 0.010 .1161907 .8765324
nonwhite .9421369 .2252094 4.18 0.000 .5007346 1.383539
site
2 .2530912 .2029465 1.25 0.212 -.1446767 .6508591
3 -.5521773 .2187237 -2.52 0.012 -.9808678 -.1234869
_cons .1792752 .3171372 0.57 0.572 -.4423023 .8008527
Uninsure
age 0 (omitted)
male 0 (omitted)
nonwhite 0 (omitted)
site
2 0 (omitted)
3 0 (omitted)
_cons -1.87351 .1601099 -11.70 0.000 -2.18732 -1.5597
We can now perform the likelihood-ratio test:
. lrtest unconstrained .
Likelihood-ratio test LR chi2(5) = 10.79
(Assumption: . nested in unconstrained) Prob > chi2 = 0.0557
The likelihood-ratio chi-squared is 10.79 with 5 degrees of freedom; the p-value is just slightly greater than the magic 0.05 level, so we should not call this difference significant.
Technical note
In certain circumstances, you should fit a multinomial logit model with conditional logit; see
[R]clogit. With substantial data manipulation, clogit can handle the same class of models with
some interesting additions. For example, if we had available the price and deductible of the most
competitive insurance plan of each type, mlogit could not use this information, but clogit could.
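A rough sketch of that data manipulation, using the insurance data and only case-specific covariates, might look like the following; the id and alt variables are ours, and the interactions with ib1.alt are what let clogit mimic the multinomial logit structure (this is only a sketch, not the full treatment given by Hendrickx [2000]):
. use http://www.stata-press.com/data/r13/sysdsn1, clear
. generate long id = _n                       // hypothetical case identifier
. expand 3                                    // one record per insurance alternative
. bysort id: generate byte alt = _n           // 1=Indemnity, 2=Prepaid, 3=Uninsure
. generate byte chosen = (alt == insure)      // marks the alternative actually chosen
. clogit chosen ib1.alt ib1.alt#(c.age c.male c.nonwhite i.site), group(id)
With Indemnity (alt 1) as the omitted alternative, the alt main effects play the role of the equation constants, and the interactions reproduce the outcome-specific coefficients.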
Stored results
mlogit stores the following in e():
Scalars
e(N) number of observations
e(N cd) number of completely determined observations
e(k out) number of outcomes
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(r2 p) pseudo-R-squared
e(ll) log likelihood
e(ll 0) log likelihood, constant-only model
e(N clust) number of clusters
e(chi2) χ2
e(p) significance
e(k eq base) equation number of the base outcome
e(baseout) the value of depvar to be treated as the base outcome
e(ibaseout) index of the base outcome
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) mlogit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(chi2type) Wald or LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(eqnames) names of equations
e(baselab) value label corresponding to base outcome
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(out) outcome values
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The multinomial logit model is described in Greene (2012, 763–766).
Suppose that there are $k$ categorical outcomes and, without loss of generality, let the base outcome be 1. The probability that the response for the $j$th observation is equal to the $i$th outcome is

$$
p_{ij} = \Pr(y_j = i) =
\begin{cases}
\dfrac{1}{1 + \sum_{m=2}^{k} \exp(x_j\beta_m)} & \text{if } i = 1 \\[2ex]
\dfrac{\exp(x_j\beta_i)}{1 + \sum_{m=2}^{k} \exp(x_j\beta_m)} & \text{if } i > 1
\end{cases}
$$

where $x_j$ is the row vector of observed values of the independent variables for the $j$th observation and $\beta_m$ is the coefficient vector for outcome $m$. The log pseudolikelihood is

$$
\ln L = \sum_j w_j \sum_{i=1}^{k} I_i(y_j)\,\ln p_{ij}
$$

where $w_j$ is an optional weight and

$$
I_i(y_j) =
\begin{cases}
1 & \text{if } y_j = i \\
0 & \text{otherwise}
\end{cases}
$$

Newton–Raphson maximum likelihood is used; see [R] maximize.
For constrained equations, the set of constraints is orthogonalized, and a subset of maximizable
parameters is selected. For example, a parameter that is constrained to zero is not a maximizable
parameter. If two parameters are constrained to be equal to each other, only one is a maximizable
parameter.
Let $r$ be the vector of maximizable parameters. $r$ is physically a subset of the solution parameters, $b$. A matrix, $T$, and a vector, $m$, are defined as

$$ b = Tr + m $$

so that

$$
\frac{\partial f}{\partial b} = \frac{\partial f}{\partial r}\,T' \qquad\qquad
\frac{\partial^2 f}{\partial b^2} = T\,\frac{\partial^2 f}{\partial r^2}\,T'
$$

$T$ consists of a block form in which one part is a permutation of the identity matrix and the other part describes how to calculate the constrained parameters from the maximizable parameters.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
mlogit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
References
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Freese, J., and J. S. Long. 2000. sg155: Tests for the multinomial logit model. Stata Technical Bulletin 58: 19–25. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 247–255. College Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Haan, P., and A. Uhlendorff. 2006. Estimation of multinomial logit models with unobserved heterogeneity using maximum simulated likelihood. Stata Journal 6: 229–245.
Hamilton, L. C. 1993. sqv8: Interpreting multinomial logistic regression. Stata Technical Bulletin 13: 24–28. Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 176–181. College Station, TX: Stata Press.
———. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hendrickx, J. 2000. sbe37: Special restrictions in multinomial logistic regression. Stata Technical Bulletin 56: 18–26. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 93–103. College Station, TX: Stata Press.
Hole, A. R. 2007. Fitting mixed logit models by using maximum simulated likelihood. Stata Journal 7: 388–401.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Treiman, D. J. 2009. Quantitative Data Analysis: Doing Social Research to Test Ideas. San Francisco: Jossey-Bass.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical outcomes. Stata Journal 5: 537–559.
Also see
[R]mlogit postestimation Postestimation tools for mlogit
[R]clogit Conditional (fixed-effects) logistic regression
[R]logistic Logistic regression, reporting odds ratios
[R]logit Logistic regression, reporting coefficients
[R]mprobit Multinomial probit regression
[R]nlogit Nested logit regression
[R]ologit Ordered logistic regression
[R]rologit Rank-ordered logistic regression
[R]slogit Stereotype logistic regression
[MI]estimation Estimation commands for use with mi estimate
[SVY]svy estimation Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
mlogit postestimation — Postestimation tools for mlogit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Reference Also see
Description
The following postestimation commands are available after mlogit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast [1]     dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest [2]       likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
[1] forecast is not appropriate with mi or svy estimation results.
[2] lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] {stub* | newvar | newvarlist} [if] [in] [, statistic outcome(outcome)]

predict [type] {stub* | newvarlist} [if] [in], scores
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the linear prediction
stddp standard error of the difference in two linear predictions
If you do not specify outcome(), pr (with one new variable specified), xb, and stdp assume outcome(#1). You must specify outcome() with the stddp option.
You specify one or k new variables with pr, where k is the number of outcomes.
You specify one new variable with xb, stdp, and stddp.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of each of the categories of the dependent variable or the
probability of the level specified in outcome(outcome). If you specify the outcome(outcome)
option, you need to specify only one new variable; otherwise, you must specify a new variable
for each category of the dependent variable.
xb calculates the linear prediction. You must also specify the outcome(outcome) option.
stdp calculates the standard error of the linear prediction. You must also specify the outcome(outcome) option.
stddp calculates the standard error of the difference in two linear predictions. You must specify the outcome(outcome) option, and here you specify the two particular outcomes of interest inside the parentheses, for example, predict sed, stddp outcome(1,3).
outcome(outcome) specifies the outcome for which the statistic is to be calculated. equation() is a synonym for outcome(): it does not matter which you use. outcome() or equation() can be specified using
#1,#2,. . . , where #1 means the first category of the dependent variable, #2 means the
second category, etc.;
the values of the dependent variable; or
the value labels of the dependent variable if they exist.
scores calculates equation-level score variables. The number of score variables created will be one less than the number of outcomes in the model. If the number of outcomes in the model were k, then
the first new variable will contain $\partial \ln L/\partial(x_j\beta_1)$;
the second new variable will contain $\partial \ln L/\partial(x_j\beta_2)$;
. . .
the (k − 1)th new variable will contain $\partial \ln L/\partial(x_j\beta_{k-1})$.
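For instance, after the three-outcome insurance model used in these examples, two score variables are created; a minimal sketch (the names sc1 and sc2 are ours) is
. predict sc1 sc2, scores
or, equivalently, predict sc*, scores to let Stata generate the names from a stub.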
Remarks and examples
Remarks are presented under the following headings:
Obtaining predicted values
Calculating marginal effects
Testing hypotheses about coefficients
Obtaining predicted values
Example 1: Obtaining predicted probabilities
After estimation, we can use predict to obtain predicted probabilities, index values, and standard
errors of the index, or differences in the index. For instance, in example 4 of [R]mlogit, we fit a
model of insurance choice on various characteristics. We can obtain the predicted probabilities for
outcome 1 by typing
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age i.male i.nonwhite i.site
(output omitted )
. predict p1 if e(sample), outcome(1)
(option pr assumed; predicted probability)
(29 missing values generated)
. summarize p1
Variable Obs Mean Std. Dev. Min Max
p1 615 .4764228 .1032279 .1698142 .71939
We added the i. prefix to the male, nonwhite, and site variables to explicitly identify them as
factor variables. That makes no difference in the estimated results, but we will take advantage of it in
later examples. We also included if e(sample) to restrict the calculation to the estimation sample.
In example 4 of [R]mlogit, the multinomial logit model was fit on 615 observations, so there must
be missing values in our dataset.
Although we typed outcome(1), specifying 1 for the indemnity outcome, we could have typed
outcome(Indemnity). For instance, to obtain the probabilities for prepaid, we could type
. predict p2 if e(sample), outcome(Prepaid)
(option pr assumed; predicted probability)
(29 missing values generated)
. summarize p2
Variable Obs Mean Std. Dev. Min Max
p2 615 .4504065 .1125962 .1964103 .7885724
We must specify the label exactly as it appears in the underlying value label (or how it appears in
the mlogit output), including capitalization.
Here we have used predict to obtain probabilities for the same sample on which we estimated.
That is not necessary. We could use another dataset that had the independent variables defined (in
our example, age,male,nonwhite, and site) and use predict to obtain predicted probabilities;
here, we would not specify if e(sample).
Example 2: Obtaining index values
predict can also be used to obtain the index values, $\sum_i x_i\widehat{\beta}^{(k)}_i$, as well as the probabilities:
. predict idx1, outcome(Indemnity) xb
(1 missing value generated)
. summarize idx1
Variable Obs Mean Std. Dev. Min Max
idx1 643 0 0 0 0
The indemnity outcome was our base outcome (the outcome for which all the coefficients were set to 0), so the index is always 0. For the prepaid and uninsured outcomes, we type
. predict idx2, outcome(Prepaid) xb
(1 missing value generated)
. predict idx3, outcome(Uninsure) xb
(1 missing value generated)
. summarize idx2 idx3
Variable Obs Mean Std. Dev. Min Max
idx2 643 -.0566113 .4962973 -1.298198 1.700719
idx3 643 -1.980747 .6018139 -3.112741 -.8258458
We can obtain the standard error of the index by specifying the stdp option:
. predict se2, outcome(Prepaid) stdp
(1 missing value generated)
. list p2 idx2 se2 in 1/5
p2 idx2 se2
1. .3709022 -.4831167 .2437772
2. .4977667 .055111 .1694686
3. .4113073 -.1712106 .1793498
4. .5424927 .3788345 .2513701
5. . -.0925817 .1452616
We obtained the probability, p2, in the previous example.
Finally, predict can calculate the standard error of the difference in the index values between
two outcomes with the stddp option:
. predict se_2_3, outcome(Prepaid,Uninsure) stddp
(1 missing value generated)
. list idx2 idx3 se_2_3 in 1/5
idx2 idx3 se_2_3
1. -.4831167 -3.073253 .5469354
2. .055111 -2.715986 .4331918
3. -.1712106 -1.579621 .3053815
4. .3788345 -1.462007 .4492552
5. -.0925817 -2.814022 .4024784
In the first observation, the difference in the indexes is −0.483 − (−3.073) = 2.59. The standard error of that difference is 0.547.
Example 3: Interpreting results using predictive margins
It is more difficult to interpret the results from mlogit than those from clogit or logit because
there are multiple equations. For example, suppose that one of the independent variables in our model
takes on the values 0 and 1, and we are attempting to understand the effect of this variable. Assume
that the coefficient on this variable for the second outcome, β(2), is positive. We might then be
tempted to reason that the probability of the second outcome is higher if the variable is 1 rather than
0. Most of the time, that will be true, but occasionally we will be surprised. The probability of some
other outcome could increase even more (say, β(3) > β(2)), and thus the probability of outcome 2
would actually fall relative to that outcome. We can use predict to help interpret such results.
Continuing with our previously fit insurance-choice model, we wish to describe the model’s
predictions by race. For this purpose, we can use the method of predictive margins (also known
as recycled predictions), in which we vary characteristics of interest across the whole dataset and
average the predictions. That is, we have data on both whites and nonwhites, and our individuals
have other characteristics as well. We will first pretend that all the people in our data are white but
hold their other characteristics constant. We then calculate the probabilities of each outcome. Next
we will pretend that all the people in our data are nonwhite, still holding their other characteristics
constant. Again we calculate the probabilities of each outcome. The difference in those two sets of
calculated probabilities, then, is the difference due to race, holding other characteristics constant.
. gen byte nonwhold=nonwhite // save real race
. replace nonwhite=0 // make everyone white
(126 real changes made)
. predict wpind, outcome(Indemnity) // predict probabilities
(option pr assumed; predicted probability)
(1 missing value generated)
. predict wpp, outcome(Prepaid)
(option pr assumed; predicted probability)
(1 missing value generated)
. predict wpnoi, outcome(Uninsure)
(option pr assumed; predicted probability)
(1 missing value generated)
. replace nonwhite=1 // make everyone nonwhite
(644 real changes made)
. predict nwpind, outcome(Indemnity)
(option pr assumed; predicted probability)
(1 missing value generated)
. predict nwpp, outcome(Prepaid)
(option pr assumed; predicted probability)
(1 missing value generated)
. predict nwpnoi, outcome(Uninsure)
(option pr assumed; predicted probability)
(1 missing value generated)
. replace nonwhite=nonwhold // restore real race
(518 real changes made)
. summarize wp* nwp*, sep(3)
Variable Obs Mean Std. Dev. Min Max
wpind 643 .5141673 .0872679 .3092903 .71939
wpp 643 .4082052 .0993286 .1964103 .6502247
wpnoi 643 .0776275 .0360283 .0273596 .1302816
nwpind 643 .3112809 .0817693 .1511329 .535021
nwpp 643 .630078 .0979976 .3871782 .8278881
nwpnoi 643 .0586411 .0287185 .0209648 .0933874
In example 1 of [R]mlogit, we presented a cross-tabulation of insurance type and race. Those
values were unadjusted. The means reported above are the values adjusted for age, sex, and site.
Combining the results gives
Unadjusted Adjusted
white nonwhite white nonwhite
Indemnity 0.51 0.36 0.51 0.31
Prepaid 0.42 0.57 0.41 0.63
Uninsured 0.07 0.07 0.08 0.06
We find, for instance, that although 57% of nonwhites in our data had prepaid plans, 63% of nonwhites would be predicted to choose prepaid plans after adjusting for age, sex, and site.
Computing predictive margins by hand was instructive, but we can compute these values more
easily using the margins command (see [R]margins). The two margins for the indemnity outcome
can be estimated by typing
. margins nonwhite, predict(outcome(Indemnity)) noesample
Predictive margins Number of obs = 643
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(Indemnity))
Delta-method
Margin Std. Err. z P>|z| [95% Conf. Interval]
nonwhite
0 .5141673 .0223485 23.01 0.000 .470365 .5579695
1 .3112809 .0418049 7.45 0.000 .2293448 .393217
margins also estimates the standard errors and confidence intervals of the margins. By default,
margins uses only the estimation sample. We added the noesample option so that margins would
use the entire sample and produce results comparable to our earlier analysis.
We can use marginsplot to graph the results from margins:
. marginsplot
Variables that uniquely identify margins: nonwhite
[marginsplot graph omitted: Predictive Margins of nonwhite with 95% CIs, plotting Pr(insure==Indemnity) against nonwhite (0 and 1)]
The margins for the other two outcomes can be computed by typing
. margins nonwhite, predict(outcome(Prepaid)) noesample
(output omitted )
. margins nonwhite, predict(outcome(Uninsure)) noesample
(output omitted )
Technical note
You can use predict to classify predicted values and compare them with the observed outcomes
to interpret a multinomial logit model. This is a variation on the notions of sensitivity and specificity
for logistic regression. Here we will classify indemnity and prepaid as definitely predicting indemnity,
definitely predicting prepaid, and ambiguous.
. predict indem, outcome(Indemnity) index // obtain indexes
(1 missing value generated)
. predict prepaid, outcome(Prepaid) index
(1 missing value generated)
. gen diff = prepaid-indem // obtain difference
(1 missing value generated)
. predict sediff, outcome(Indemnity,Prepaid) stddp // & its standard error
(1 missing value generated)
. gen type = 1 if diff/sediff < -1.96 // definitely indemnity
(504 missing values generated)
. replace type = 3 if diff/sediff > 1.96 // definitely prepaid
(100 real changes made)
. replace type = 2 if type>=. & diff/sediff < . // ambiguous
(404 real changes made)
. label def type 1 "Def Ind" 2 "Ambiguous" 3 "Def Prep"
. label values type type // label results
. tabulate insure type
type
insure Def Ind Ambiguous Def Prep Total
Indemnity 78 183 33 294
Prepaid 44 177 56 277
Uninsure 12 28 5 45
Total 134 388 94 616
We can see that the predictive power of this model is modest. There are many misclassifications in
both directions, though there are more correctly classified observations than misclassified observations.
Also the uninsured look overwhelmingly as though they might have come from the indemnity
system rather than from the prepaid system.
Calculating marginal effects
Example 4
We have already noted that the coefficients from multinomial logit can be difficult to interpret
because they are relative to the base outcome. Another way to evaluate the effect of covariates is to
examine the marginal effect of changing their values on the probability of observing an outcome.
The margins command can be used for this too. We can estimate the marginal effect of each
covariate on the probability of observing the first outcomeindemnity insuranceby typing
. margins, dydx(*) predict(outcome(Indemnity))
Average marginal effects Number of obs = 615
Model VCE : OIM
Expression : Pr(insure==Indemnity), predict(outcome(Indemnity))
dy/dx w.r.t. : age 1.male 1.nonwhite 2.site 3.site
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
age .0026655 .001399 1.91 0.057 -.0000765 .0054074
1.male -.1295734 .0450945 -2.87 0.004 -.2179571 -.0411898
1.nonwhite -.2032404 .0482554 -4.21 0.000 -.2978192 -.1086616
site
2 .0070995 .0479993 0.15 0.882 -.0869775 .1011765
3 .1216165 .0505833 2.40 0.016 .022475 .220758
Note: dy/dx for factor levels is the discrete change from the base level.
By default, margins estimates the average marginal effect over the estimation sample, and that is
what we see above. Being male decreases the average probability of having indemnity insurance by
0.130. We also see, from the note at the bottom of the table, that the marginal effect was computed
as a discrete change in the probability of being male rather than female. That is why we made male
a factor variable when fitting the model.
The dydx(*) option requested that margins estimate the marginal effect for each regressor; dydx(age) would have produced estimates only for the effect of age. margins has many options
for controlling how the marginal effect is computed, including the ability to average over subgroups
or to compute estimates for specified values of the regressors; see [R]margins.
We could evaluate the marginal effects on the other two outcomes by typing
. margins, dydx(*) predict(outcome(Prepaid))
(output omitted )
. margins, dydx(*) predict(outcome(Uninsure))
(output omitted )
Testing hypotheses about coefficients
Example 5
test tests hypotheses about the coefficients just as after any estimation command; see [R] test. Note, however, test's syntax for dealing with multiple-equation models. Because test bases its results on the estimated covariance matrix, we might prefer a likelihood-ratio test; see example 5 in [R] mlogit for an example of lrtest.
If we simply list variables after the test command, we are testing that the corresponding coefficients
are zero across all equations:
. test 2.site 3.site
( 1) [Indemnity]2o.site = 0
( 2) [Prepaid]2.site = 0
( 3) [Uninsure]2.site = 0
( 4) [Indemnity]3o.site = 0
( 5) [Prepaid]3.site = 0
( 6) [Uninsure]3.site = 0
Constraint 1 dropped
Constraint 4 dropped
chi2( 4) = 19.74
Prob > chi2 = 0.0006
We can test that all the coefficients (except the constant) in an equation are zero by simply typing
the outcome in square brackets:
. test [Uninsure]
( 1) [Uninsure]age = 0
( 2) [Uninsure]0b.male = 0
( 3) [Uninsure]1.male = 0
( 4) [Uninsure]0b.nonwhite = 0
( 5) [Uninsure]1.nonwhite = 0
( 6) [Uninsure]1b.site = 0
( 7) [Uninsure]2.site = 0
( 8) [Uninsure]3.site = 0
Constraint 2 dropped
Constraint 4 dropped
Constraint 6 dropped
chi2( 5) = 9.31
Prob > chi2 = 0.0973
We specify the outcome just as we do with predict; we can specify the label if the outcome variable
is labeled, or we can specify the numeric value of the outcome. We would have obtained the same
test as above if we had typed test [3] because 3 is the value of insure for the outcome uninsured.
We can combine the two syntaxes. To test that the coefficients on the site variables are 0 in the
equation corresponding to the outcome prepaid, we can type
. test [Prepaid]: 2.site 3.site
( 1) [Prepaid]2.site = 0
( 2) [Prepaid]3.site = 0
chi2( 2) = 10.78
Prob > chi2 = 0.0046
We specified the outcome and then followed that with a colon and the variables we wanted to test.
We can also test that coefficients are equal across equations. To test that all coefficients except the
constant are equal for the prepaid and uninsured outcomes, we can type
. test [Prepaid=Uninsure]
( 1) [Prepaid]age - [Uninsure]age = 0
( 2) [Prepaid]0b.male - [Uninsure]0b.male = 0
( 3) [Prepaid]1.male - [Uninsure]1.male = 0
( 4) [Prepaid]0b.nonwhite - [Uninsure]0b.nonwhite = 0
( 5) [Prepaid]1.nonwhite - [Uninsure]1.nonwhite = 0
( 6) [Prepaid]1b.site - [Uninsure]1b.site = 0
( 7) [Prepaid]2.site - [Uninsure]2.site = 0
( 8) [Prepaid]3.site - [Uninsure]3.site = 0
Constraint 2 dropped
Constraint 4 dropped
Constraint 6 dropped
chi2( 5) = 13.80
Prob > chi2 = 0.0169
To test that only the site variables are equal, we can type
. test [Prepaid=Uninsure]: 2.site 3.site
( 1) [Prepaid]2.site - [Uninsure]2.site = 0
( 2) [Prepaid]3.site - [Uninsure]3.site = 0
chi2( 2) = 12.68
Prob > chi2 = 0.0018
Finally, we can test any arbitrary constraint by simply entering the equation and specifying the
coefficients as described in [U] 13.5 Accessing coefficients and standard errors. The following
hypothesis is senseless but illustrates the point:
. test ([Prepaid]age+[Uninsure]2.site)/2 = 2-[Uninsure]1.nonwhite
( 1) .5*[Prepaid]age + [Uninsure]1.nonwhite + .5*[Uninsure]2.site = 2
chi2( 1) = 22.45
Prob > chi2 = 0.0000
See [R]test for more information about test. The information there about combining hypotheses
across test commands (the accumulate option) also applies after mlogit.
Reference
Fagerland, M. W., and D. W. Hosmer, Jr. 2012. A generalized Hosmer–Lemeshow goodness-of-fit test for multinomial logistic regression models. Stata Journal 12: 447–453.
Also see
[R]mlogit Multinomial (polytomous) logistic regression
[U] 20 Estimation and postestimation commands
Title
more — The —more— message
Syntax Description Option Remarks and examples Also see
Syntax
Tell Stata to pause or not pause for —more— messages
set more {on | off} [, permanently]

Set number of lines between —more— messages
set pagesize #
Description
set more on, which is the default, tells Stata to wait until you press a key before continuing when a —more— message is displayed.
set more off tells Stata not to pause or display the —more— message.
set pagesize # sets the number of lines between —more— messages. The permanently option is not allowed with set pagesize.
Option
permanently specifies that, in addition to making the change right now, the more setting be
remembered and become the default setting when you invoke Stata.
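For example, to suppress the pause now and in future sessions and to lengthen the interval between messages in the current session (40 lines is an arbitrary choice), we could type
. set more off, permanently
. set pagesize 40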
Remarks and examples
When you see —more— at the bottom of the screen,

Press ...                        and Stata ...
letter l or Enter                displays the next line
letter q                         acts as if you pressed Break
Spacebar or any other key        displays the next screen

You can also click on the More button or click on —more— to display the next screen.
more is Stata’s way of telling you that it has something more to show you but that showing
it to you will cause the information on the screen to scroll off.
If you type set more off, —more— conditions will never arise, and Stata's output will scroll by at full speed.
If you type set more on, —more— conditions will be restored at the appropriate places.
Programmers should see [P]more for information on the more programming command.
Also see
[R]query Display system parameters
[P]creturn Return c-class values
[P]more Pause until key is pressed
[P]sleep Pause for a specified time
[U] 7 –more– conditions
Title
mprobit — Multinomial probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
mprobit depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant                  suppress constant terms
baseoutcome(#|lbl)          outcome used to normalize location
probitparam                 use the probit variance parameterization
constraints(constraints)    apply specified linear constraints
collinear                   keep collinear variables
SE/Robust
vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
level(#)                    set confidence level; default is level(95)
nocnsreport                 do not display constraints
display options             control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Integration
intpoints(#)                number of quadrature points
Maximization
maximize options            control the maximization process; seldom used
coeflegend                  display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Categorical outcomes >Independent multinomial probit
Description
mprobit fits multinomial probit (MNP) models via maximum likelihood. depvar contains the
outcome for each observation, and indepvars are the associated covariates. The error terms are
assumed to be independent, standard normal random variables. See [R] asmprobit for the case where
the latent-variable errors are correlated or heteroskedastic and you have alternative-specific variables.
Options
 
Model
noconstant suppresses the J − 1 constant terms.
baseoutcome(#|lbl)specifies the outcome used to normalize the location of the latent variable. The
base outcome may be specified as a number or a label. The default is to use the most frequent
outcome. The coefficients associated with the base outcome are zero.
probitparam specifies to use the probit variance parameterization by fixing the variance of the differenced latent errors between the scale and the base alternatives to be one. The default is to make the variance of the base and scale latent errors one, thereby making the variance of the difference equal to two.
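For instance, to refit the insurance-choice model shown below under this alternative parameterization, purely as an illustration of the option, one could type
. mprobit insure age male nonwhite i.site, probitparam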
constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
If specifying vce(bootstrap) or vce(jackknife), you must also specify baseoutcome().
 
Reporting
level(#); see [R]estimation options.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Integration
intpoints(#) specifies the number of Gaussian quadrature points to use in approximating the likelihood. The default is 15.
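An informal check of the approximation is to refit with more quadrature points and confirm that the log likelihood and coefficients are stable; 30 points here is an arbitrary choice:
. mprobit insure age male nonwhite i.site, intpoints(30)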
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with mprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
The MNP model is used with discrete dependent variables that take on more than two outcomes
that do not have a natural ordering. The stochastic error terms for this implementation of the model
are assumed to have independent, standard normal distributions. To use mprobit, you must have one
observation for each decision maker in the sample. See [R]asmprobit for another implementation of
the MNP model that permits correlated and heteroskedastic errors and is suitable when you have data
for each alternative that a decision maker faced.
The MNP model is frequently motivated using a latent-variable framework. The latent variable for the $j$th alternative, $j = 1, \dots, J$, is

$$ \eta_{ij} = z_i\alpha_j + \xi_{ij} $$

where the $1 \times q$ row vector $z_i$ contains the observed independent variables for the $i$th decision maker. Associated with $z_i$ are the $J$ vectors of regression coefficients $\alpha_j$. The $\xi_{i,1}, \dots, \xi_{i,J}$ are distributed independently and identically standard normal. The decision maker chooses the alternative $k$ such that $\eta_{ik} \ge \eta_{im}$ for $m \ne k$.

Suppose that case $i$ chooses alternative $k$, and take the difference between latent variable $\eta_{ik}$ and the $J - 1$ others:

$$
v_{ijk} = \eta_{ij} - \eta_{ik}
        = z_i(\alpha_j - \alpha_k) + \xi_{ij} - \xi_{ik}
        = z_i\gamma_{j'} + \epsilon_{ij'}
\tag{1}
$$

where $j' = j$ if $j < k$ and $j' = j - 1$ if $j > k$ so that $j' = 1, \dots, J - 1$. $\text{Var}(\epsilon_{ij'}) = \text{Var}(\xi_{ij} - \xi_{ik}) = 2$ and $\text{Cov}(\epsilon_{ij'}, \epsilon_{il'}) = 1$ for $j' \ne l'$. The probability that alternative $k$ is chosen is

$$
\begin{aligned}
\Pr(i \text{ chooses } k) &= \Pr(v_{i1k} \le 0, \dots, v_{i,J-1,k} \le 0) \\
                          &= \Pr(\epsilon_{i1} \le -z_i\gamma_1, \dots, \epsilon_{i,J-1} \le -z_i\gamma_{J-1})
\end{aligned}
$$

Hence, evaluating the likelihood function involves computing probabilities from the multivariate normal distribution. That all the covariances are equal simplifies the problem somewhat; see Methods and formulas for details.

In (1), not all $J$ of the $\alpha_j$ are identifiable. To remove the indeterminacy, $\alpha_l$ is set to the zero vector, where $l$ is the base outcome as specified in the baseoutcome() option. That fixes the $l$th latent variable to zero so that the remaining variables measure the attractiveness of the other alternatives relative to the base.
Example 1
As discussed in example 1 of [R]mlogit, we have data on the type of health insurance available
to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989; Wells et al. 1989).
Patients may have either an indemnity (fee-for-service) plan or a prepaid plan such as an HMO, or
the patient may be uninsured. Demographic variables include age, gender, race, and site. Indemnity
insurance is the most popular alternative, so mprobit will choose it as the base outcome by default.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mprobit insure age male nonwhite i.site
Iteration 0: log likelihood = -535.89424
Iteration 1: log likelihood = -534.56173
Iteration 2: log likelihood = -534.52835
Iteration 3: log likelihood = -534.52833
Multinomial probit regression Number of obs = 615
Wald chi2(10) = 40.18
Log likelihood = -534.52833 Prob > chi2 = 0.0000
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.0098536 .0052688 -1.87 0.061 -.0201802 .000473
male .4774678 .1718316 2.78 0.005 .1406841 .8142515
nonwhite .8245003 .1977582 4.17 0.000 .4369013 1.212099
site
2 .0973956 .1794546 0.54 0.587 -.2543289 .4491201
3 -.495892 .1904984 -2.60 0.009 -.869262 -.1225221
_cons .22315 .2792424 0.80 0.424 -.324155 .7704549
Uninsure
age -.0050814 .0075327 -0.67 0.500 -.0198452 .0096823
male .3332637 .2432986 1.37 0.171 -.1435929 .8101203
nonwhite .2485859 .2767734 0.90 0.369 -.29388 .7910518
site
2 -.6899485 .2804497 -2.46 0.014 -1.23962 -.1402771
3 -.1788447 .2479898 -0.72 0.471 -.6648957 .3072063
_cons -.9855917 .3891873 -2.53 0.011 -1.748385 -.2227986
The likelihood function for mprobit is derived under the assumption that all decision-making units face the same choice set, which is the union of all outcomes observed in the dataset. If that is not true for your model, then an alternative is to use the asmprobit command, which does not require this assumption. To do that, you will need to expand the dataset so that each decision maker has k_i observations, where k_i is the number of alternatives in the choice set faced by decision maker i. You will also need to create a binary variable to indicate the choice made by each decision maker. Moreover, you will need to use the correlation(independent) and stddev(homoskedastic) options with asmprobit unless you have alternative-specific variables.
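A rough sketch of that restructuring for the three-alternative insurance data follows; the id, alt, and choice variables are ours, and dropping cases with a missing outcome ensures that every case has exactly one chosen alternative:
. use http://www.stata-press.com/data/r13/sysdsn1, clear
. drop if missing(insure)                     // each case needs one chosen alternative
. generate long id = _n                       // hypothetical decision-maker identifier
. expand 3                                    // every alternative appears in each choice set
. bysort id: generate byte alt = _n           // 1=Indemnity, 2=Prepaid, 3=Uninsure
. generate byte choice = (alt == insure)      // marks the alternative actually chosen
. asmprobit choice, case(id) alternatives(alt) casevars(age male nonwhite i.site) correlation(independent) stddev(homoskedastic)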
Stored results
mprobit stores the following in e():
Scalars
e(N) number of observations
e(k out) number of outcomes
e(k points) number of quadrature points
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k indvars) number of independent variables
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(ll) log simulated-likelihood
e(N clust) number of clusters
e(chi2) χ2
e(p) significance
e(i base) base outcome index
e(const) 0 if noconstant is specified, 1 otherwise
e(probitparam) 1 if probitparam is specified, 0 otherwise
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) mprobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(indvars) independent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(chi2type) Wald, type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(outeqs) outcome equations
e(out#) outcome labels, # = 1, ..., e(k out)
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(outcomes) outcome values
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
See Cameron and Trivedi (2005, chap. 15) for a discussion of multinomial models, including
multinomial probit. Long and Freese (2014, chap. 8) discuss the multinomial logistic, multinomial
probit, and stereotype logistic regression models, with examples using Stata.
As discussed in Remarks and examples, the latent variables for a $J$-alternative model are $\eta_{ij} = z_i\alpha_j + \xi_{ij}$, for $j = 1, \dots, J$, $i = 1, \dots, n$, and $\{\xi_{i,1}, \dots, \xi_{i,J}\} \sim \text{i.i.d.}\ N(0, 1)$. The experimenter observes alternative $k$ for the $i$th observation if $\eta_{ik} > \eta_{il}$ for $l \ne k$. For $j \ne k$, let

$$
v_{ij'} = \eta_{ij} - \eta_{ik}
        = z_i(\alpha_j - \alpha_k) + \xi_{ij} - \xi_{ik}
        = z_i\gamma_{j'} + \epsilon_{ij'}
$$

where $j' = j$ if $j < k$ and $j' = j - 1$ if $j > k$ so that $j' = 1, \dots, J - 1$. $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{i,J-1}) \sim \text{MVN}(0, \Sigma)$, where

$$
\Sigma = \begin{pmatrix}
2 & 1 & 1 & \cdots & 1 \\
1 & 2 & 1 & \cdots & 1 \\
1 & 1 & 2 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 2
\end{pmatrix}
$$

Denote the deterministic part of the model as $\lambda_{ij'} = z_i\gamma_{j'}$; the probability that subject $i$ chooses outcome $k$ is

$$
\begin{aligned}
\Pr(y_i = k) &= \Pr(v_{i1} \le 0, \dots, v_{i,J-1} \le 0) \\
             &= \Pr(\epsilon_{i1} \le -\lambda_{i1}, \dots, \epsilon_{i,J-1} \le -\lambda_{i,J-1}) \\
             &= \frac{1}{(2\pi)^{(J-1)/2}\,|\Sigma|^{1/2}}
                \int_{-\infty}^{-\lambda_{i1}} \cdots \int_{-\infty}^{-\lambda_{i,J-1}}
                \exp\!\left(-\tfrac{1}{2}\, z'\Sigma^{-1}z\right) dz
\end{aligned}
$$

Because of the exchangeable correlation structure of $\Sigma$ ($\rho_{ij} = 1/2$ for all $i \ne j$), we can use Dunnett's (1989) result to reduce the multidimensional integral to one dimension:

$$
\Pr(y_i = k) = \frac{1}{\sqrt{\pi}} \int_0^\infty
\left\{ \prod_{j=1}^{J-1} \Phi\!\left(-z\sqrt{2} - \lambda_{ij}\right)
      + \prod_{j=1}^{J-1} \Phi\!\left(z\sqrt{2} - \lambda_{ij}\right) \right\}
e^{-z^2}\, dz
$$
Gaussian quadrature is used to approximate this integral, resulting in the $K$-point quadrature formula

$$
\Pr(y_i = k) \approx \frac{1}{2} \sum_{k=1}^{K} w_k
\left\{ \prod_{j=1}^{J-1} \Phi\!\left(-\sqrt{2x_k} - \lambda_{ij}\right)
      + \prod_{j=1}^{J-1} \Phi\!\left(\sqrt{2x_k} - \lambda_{ij}\right) \right\}
$$

where $w_k$ and $x_k$ are the weights and roots of the Laguerre polynomial of order $K$. In mprobit, $K$ is specified by the intpoints() option.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
mprobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Dunnett, C. W. 1989. Algorithm AS 251: Multivariate normal probability integrals with product correlation structure.
Journal of the Royal Statistical Society, Series C 38: 564–579.
Haan, P., and A. Uhlendorff. 2006. Estimation of multinomial logit models with unobserved heterogeneity using maximum simulated likelihood. Stata Journal 6: 229–245.
Hole, A. R. 2007. Fitting mixed logit models by using maximum simulated likelihood. Stata Journal 7: 388–401.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Also see
[R]mprobit postestimation Postestimation tools for mprobit
[R]asmprobit Alternative-specific multinomial probit regression
[R]clogit Conditional (fixed-effects) logistic regression
[R]mlogit Multinomial (polytomous) logistic regression
[R]nlogit Nested logit regression
[R]ologit Ordered logistic regression
[R]oprobit Ordered probit regression
[MI]estimation Estimation commands for use with mi estimate
[SVY]svy estimation Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
mprobit postestimation — Postestimation tools for mprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples References Also see
Description
The following postestimation commands are available after mprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast [1]     dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest [2]       likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predicted probabilities, linear predictions, and standard errors
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
[1] forecast is not appropriate with mi or svy estimation results.
[2] lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] {stub* | newvar | newvarlist} [if] [in] [, statistic outcome(outcome)]

predict [type] {stub* | newvarlist} [if] [in], scores
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the linear prediction
If you do not specify outcome(), pr (with one new variable specified), xb, and stdp assume outcome(#1).
You specify one or k new variables with pr, where k is the number of outcomes.
You specify one new variable with xb and stdp.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of each of the categories of the dependent variable or the
probability of the level specified in outcome(outcome). If you specify the outcome(outcome)
option, you need to specify only one new variable; otherwise, you must specify a new variable
for each category of the dependent variable.
xb calculates the linear prediction, $x_i\alpha_j$, for alternative $j$ and individual $i$. The index, $j$, corresponds to the outcome specified in outcome().
stdp calculates the standard error of the linear prediction.
outcome(outcome) specifies the outcome for which the statistic is to be calculated. equation() is a synonym for outcome(): it does not matter which you use. outcome() or equation() can
be specified using
#1,#2,. . . , where #1 means the first category of the dependent variable, #2 means the
second category, etc.;
the values of the dependent variable; or
the value labels of the dependent variable if they exist.
scores calculates the equation-level score variables. The jth new variable will contain the scores for
the jth fitted equation.
Remarks and examples
Once you have fit a multinomial probit model, you can use predict to obtain probabilities that
an individual will choose each of the alternatives for the estimation sample, as well as other samples;
see [U] 20 Estimation and postestimation commands and [R]predict.
Example 1
In example 1 of [R]mprobit, we fit the multinomial probit model to a dataset containing the type
of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov
et al. 1989; Wells et al. 1989). We can obtain the predicted probabilities by typing
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mprobit insure age male nonwhite i.site
(output omitted )
. predict p1-p3
(option pr assumed; predicted probabilities)
. list p1-p3 insure in 1/10
p1 p2 p3 insure
1. .5961306 .3741824 .029687 Indemnity
2. .4719296 .4972289 .0308415 Prepaid
3. .4896086 .4121961 .0981953 Indemnity
4. .3730529 .5416623 .0852848 Prepaid
5. .5063069 .4629773 .0307158 .
6. .4768125 .4923548 .0308327 Prepaid
7. .5035672 .4657016 .0307312 Prepaid
8. .3326361 .5580404 .1093235 .
9. .4758165 .4384811 .0857024 Uninsure
10. .5734057 .3316601 .0949342 Prepaid
insure contains a missing value for observations 5 and 8. Because of that, those two observations
were not used in the estimation. However, because none of the independent variables is missing,
predict can still calculate the probabilities. Had we typed
. predict p1-p3 if e(sample)
predict would have filled in missing values for p1, p2, and p3 for those observations because they
were not used in the estimation.
References
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Also see
[R]mprobit Multinomial probit regression
[U] 20 Estimation and postestimation commands
Title
nbreg — Negative binomial regression
Syntax Menu Description Options for nbreg
Options for gnbreg Remarks and examples Stored results Methods and formulas
References Also see
Syntax
Negative binomial regression model
nbreg depvar [indepvars] [if] [in] [weight] [, nbreg options]
Generalized negative binomial model
gnbreg depvar [indepvars] [if] [in] [weight] [, gnbreg options]
nbreg options Description
Model
noconstant                  suppress constant term
dispersion(mean)            parameterization of dispersion; the default
dispersion(constant)        constant dispersion for all observations
exposure(varname_e)         include ln(varname_e) in model with coefficient constrained to 1
offset(varname_o)           include varname_o in model with coefficient constrained to 1
constraints(constraints)    apply specified linear constraints
collinear                   keep collinear variables
SE/Robust
vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
level(#)                    set confidence level; default is level(95)
nolrtest                    suppress likelihood-ratio test
irr                         report incidence-rate ratios
nocnsreport                 do not display constraints
display options             control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
maximize options            control the maximization process; seldom used
coeflegend                  display legend instead of statistics
gnbreg options Description
Model
noconstant                  suppress constant term
lnalpha(varlist)            dispersion model variables
exposure(varname_e)         include ln(varname_e) in model with coefficient constrained to 1
offset(varname_o)           include varname_o in model with coefficient constrained to 1
constraints(constraints)    apply specified linear constraints
collinear                   keep collinear variables
SE/Robust
vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
level(#)                    set confidence level; default is level(95)
irr                         report incidence-rate ratios
nocnsreport                 do not display constraints
display options             control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
maximize options            control the maximization process; seldom used
coeflegend                  display legend instead of statistics
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, varname_e, and varname_o may contain time-series operators (nbreg only); see [U] 11.4.4 Time-series varlists.
bootstrap, by (nbreg only), fp (nbreg only), jackknife, mfp (nbreg only), mi estimate, nestreg (nbreg only), rolling, statsby, stepwise, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
nbreg
Statistics >Count outcomes >Negative binomial regression
gnbreg
Statistics >Count outcomes >Generalized negative binomial regression
Description
nbreg fits a negative binomial regression model of depvar on indepvars, where depvar is a
nonnegative count variable. In this model, the count variable is believed to be generated by a Poisson-
like process, except that the variation is greater than that of a true Poisson. This extra variation is
referred to as overdispersion. See [R]poisson before reading this entry.
gnbreg fits a generalization of the negative binomial mean-dispersion model; the shape parameter α may also be parameterized.
If you have panel data, see [XT]xtnbreg and [ME]menbreg.
Options for nbreg
 
Model
noconstant; see [R]estimation options.
dispersion(mean | constant) specifies the parameterization of the model. dispersion(mean), the default, yields a model with dispersion equal to 1 + α exp(x_jβ + offset_j); that is, the dispersion is a function of the expected mean: exp(x_jβ + offset_j). dispersion(constant) has dispersion equal to 1 + δ; that is, it is a constant for all observations.
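For example, with hypothetical variables (accidents as the count, traffic and i.season as covariates), the two parameterizations would be requested as
. nbreg accidents traffic i.season                           // mean dispersion, the default
. nbreg accidents traffic i.season, dispersion(constant)     // constant dispersion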
exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
nolrtest suppresses fitting the Poisson model. Without this option, a comparison Poisson model is
fit, and the likelihood is used in a likelihood-ratio test of the null hypothesis that the dispersion
parameter is zero.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^{β_i} rather than β_i. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated or stored. irr may be specified at estimation or when replaying previously estimated results.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with nbreg but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Options for gnbreg
 
Model
noconstant; see [R]estimation options.
lnalpha(varlist) allows you to specify a linear equation for ln α. Specifying lnalpha(male old) means that ln α = γ_0 + γ_1 male + γ_2 old, where γ_0, γ_1, and γ_2 are parameters to be estimated along with the other model coefficients. If this option is not specified, gnbreg and nbreg will produce the same results because the shape parameter will be parameterized as a constant.
exposure(varnamee),offset(varnameo),constraints(constraints),collinear; see [R]esti-
mation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce option.
 
Reporting
level(#); see [R] estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^{β_i} rather than β_i.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. irr may be specified at estimation or when
replaying previously estimated results.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with gnbreg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction to negative binomial regression
nbreg
gnbreg
Introduction to negative binomial regression
Negative binomial regression models the number of occurrences (counts) of an event when the
event has extra-Poisson variation, that is, when it has overdispersion. The Poisson regression model
is
$$y_j \sim \mathrm{Poisson}(\mu_j)$$
where
$$\mu_j = \exp(x_j\beta + \mathrm{offset}_j)$$
for observed counts y_j with covariates x_j for the jth observation. One derivation of the negative
binomial mean-dispersion model is that individual units follow a Poisson regression model, but there
is an omitted variable ν_j, such that e^{ν_j} follows a gamma distribution with mean 1 and variance α:
$$y_j \sim \mathrm{Poisson}(\mu_j^*)$$
where
$$\mu_j^* = \exp(x_j\beta + \mathrm{offset}_j + \nu_j)$$
and
$$e^{\nu_j} \sim \mathrm{Gamma}(1/\alpha,\ \alpha)$$
With this parameterization, a Gamma(a, b) distribution will have expectation ab and variance ab².
We refer to α as the overdispersion parameter. The larger α is, the greater the overdispersion.
The Poisson model corresponds to α = 0. nbreg parameterizes α as ln α. gnbreg allows ln α to be
modeled as ln α_j = z_jγ, a linear combination of covariates z_j.
nbreg will fit two different parameterizations of the negative binomial model. The default, described
above and also given by the dispersion(mean) option, has dispersion for the jth observation equal
to 1 + α exp(x_jβ + offset_j). This is seen by noting that the above implies that
$$\mu_j^* \sim \mathrm{Gamma}(1/\alpha,\ \alpha\mu_j)$$
and thus
$$\mathrm{Var}(y_j) = E\{\mathrm{Var}(y_j \mid \mu_j^*)\} + \mathrm{Var}\{E(y_j \mid \mu_j^*)\} = E(\mu_j^*) + \mathrm{Var}(\mu_j^*) = \mu_j(1 + \alpha\mu_j)$$
The alternative parameterization, given by the dispersion(constant) option, has dispersion equal
to 1 + δ; that is, it is constant for all observations. This is so because the constant-dispersion model
assumes instead that
$$\mu_j^* \sim \mathrm{Gamma}(\mu_j/\delta,\ \delta)$$
and thus Var(y_j) = μ_j(1 + δ). The Poisson model corresponds to δ = 0.
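Both parameterizations can be fit to the same data and compared informally. Here is a brief sketch using the rod93 dataset analyzed later in this entry (the stored-estimate names nb2 and nb1 are arbitrary):
. use http://www.stata-press.com/data/r13/rod93, clear
. generate logexp = ln(exposure)
. nbreg deaths i.cohort, offset(logexp)
. estimates store nb2
. nbreg deaths i.cohort, offset(logexp) dispersion(constant)
. estimates store nb1
. estimates stats nb2 nb1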
For detailed derivations of both models, see Cameron and Trivedi (2013, 80–89). In particular,
note that the mean-dispersion model is known as the NB2 model in their terminology, whereas the
constant-dispersion model is referred to as the NB1 model.
See Long and Freese (2014) and Cameron and Trivedi (2010, chap. 17) for a discussion of the
negative binomial regression model with Stata examples and for a discussion of other regression
models for count data.
Hilbe (2011) provides an extensive review of the negative binomial model and its variations, using
Stata examples.
nbreg
It is not uncommon to posit a Poisson regression model and observe a lack of model fit. The
following data appeared in Rodríguez (1993):
. use http://www.stata-press.com/data/r13/rod93
. list, sepby(cohort)
cohort age_mos deaths exposure
1. 1941-1949 0.5 168 278.4
2. 1941-1949 2.0 48 538.8
3. 1941-1949 4.5 63 794.4
4. 1941-1949 9.0 89 1,550.8
5. 1941-1949 18.0 102 3,006.0
6. 1941-1949 42.0 81 8,743.5
7. 1941-1949 90.0 40 14,270.0
8. 1960-1967 0.5 197 403.2
9. 1960-1967 2.0 48 786.0
10. 1960-1967 4.5 62 1,165.3
11. 1960-1967 9.0 81 2,294.8
12. 1960-1967 18.0 97 4,500.5
13. 1960-1967 42.0 103 13,201.5
14. 1960-1967 90.0 39 19,525.0
15. 1968-1976 0.5 195 495.3
16. 1968-1976 2.0 55 956.7
17. 1968-1976 4.5 58 1,381.4
18. 1968-1976 9.0 85 2,604.5
19. 1968-1976 18.0 87 4,618.5
20. 1968-1976 42.0 70 9,814.5
21. 1968-1976 90.0 10 5,802.5
. generate logexp = ln(exposure)
. poisson deaths i.cohort, offset(logexp)
Iteration 0: log likelihood = -2160.0544
Iteration 1: log likelihood = -2159.5162
Iteration 2: log likelihood = -2159.5159
Iteration 3: log likelihood = -2159.5159
Poisson regression Number of obs = 21
LR chi2(2) = 49.16
Prob > chi2 = 0.0000
Log likelihood = -2159.5159 Pseudo R2 = 0.0113
deaths Coef. Std. Err. z P>|z| [95% Conf. Interval]
cohort
1960-1967 -.3020405 .0573319 -5.27 0.000 -.4144089 -.1896721
1968-1976 .0742143 .0589726 1.26 0.208 -.0413698 .1897983
_cons -3.899488 .0411345 -94.80 0.000 -3.98011 -3.818866
logexp 1 (offset)
. estat gof
Deviance goodness-of-fit = 4190.689
Prob > chi2(18) = 0.0000
Pearson goodness-of-fit = 15387.67
Prob > chi2(18) = 0.0000
The extreme significance of the goodness-of-fit χ² indicates that the Poisson regression model is
inappropriate, suggesting to us that we should try a negative binomial model:
. nbreg deaths i.cohort, offset(logexp) nolog
Negative binomial regression Number of obs = 21
LR chi2(2) = 0.40
Dispersion = mean Prob > chi2 = 0.8171
Log likelihood = -131.3799 Pseudo R2 = 0.0015
deaths Coef. Std. Err. z P>|z| [95% Conf. Interval]
cohort
1960-1967 -.2676187 .7237203 -0.37 0.712 -1.686084 1.150847
1968-1976 -.4573957 .7236651 -0.63 0.527 -1.875753 .9609618
_cons -2.086731 .511856 -4.08 0.000 -3.08995 -1.083511
logexp 1 (offset)
/lnalpha .5939963 .2583615 .0876171 1.100376
alpha 1.811212 .4679475 1.09157 3.005295
Likelihood-ratio test of alpha=0: chibar2(01) = 4056.27 Prob>=chibar2 = 0.000
Our original Poisson model is a special case of the negative binomial: it corresponds to α = 0.
nbreg, however, estimates α indirectly, estimating instead ln α. In our model, ln α = 0.594, meaning
that α = 1.81 (nbreg undoes the transformation for us at the bottom of the output).
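If you want to recover α from ln α yourself, you can exponentiate the stored coefficient or display the stored scalar directly (a quick check; [lnalpha]_b[_cons] refers to the constant of the ancillary lnalpha equation shown in the output):
. display exp([lnalpha]_b[_cons])
. display e(alpha)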
To test α = 0 (equivalent to ln α = −∞), nbreg performs a likelihood-ratio test. The staggering
χ² value of 4,056 asserts that the probability that we would observe these data conditional on α = 0
is virtually zero, that is, conditional on the process being Poisson. The data are not Poisson. It is not
accidental that this χ² value is close to the goodness-of-fit statistic from the Poisson regression itself.
Technical note
The usual Gaussian test of α = 0 is omitted because this test occurs on the boundary, invalidating
the usual theory associated with such tests. However, the likelihood-ratio test of α = 0 has been
modified to be valid on the boundary. In particular, the null distribution of the likelihood-ratio test
statistic is not the usual $\chi^2_1$, but rather a 50:50 mixture of a $\chi^2_0$ (point mass at zero) and a $\chi^2_1$,
denoted as $\bar\chi^2_{01}$. See Gutierrez, Carter, and Drukker (2001) for more details.
Technical note
The negative binomial model deals with cases in which there is more variation than would
be expected if the process were Poisson. The negative binomial model is not helpful if there is
less than Poisson variation, that is, if the variance of the count variable is less than its mean. However,
underdispersion is uncommon. Poisson models arise because of independently generated events.
Overdispersion comes about if some of the parameters (causes) of the Poisson processes are unknown.
To obtain underdispersion, the sequence of events somehow would have to be regulated; that is, events
would not be independent but controlled based on past occurrences.
gnbreg
gnbreg is a generalization of nbreg, dispersion(mean). Whereas in nbreg one ln α is
estimated, gnbreg allows ln α to vary, observation by observation, as a linear combination of another
set of covariates: ln α_j = z_jγ.
We will assume that the number of deaths is a function of age, whereas the ln α parameter is a
function of cohort. To fit the model, we type
. gnbreg deaths age_mos, lnalpha(i.cohort) offset(logexp)
Fitting constant-only model:
Iteration 0: log likelihood = -187.067 (not concave)
Iteration 1: log likelihood = -137.4064
Iteration 2: log likelihood = -134.07766
Iteration 3: log likelihood = -131.60668
Iteration 4: log likelihood = -131.57951
Iteration 5: log likelihood = -131.57948
Iteration 6: log likelihood = -131.57948
Fitting full model:
Iteration 0: log likelihood = -124.34327
Iteration 1: log likelihood = -117.70256
Iteration 2: log likelihood = -117.56373
Iteration 3: log likelihood = -117.56164
Iteration 4: log likelihood = -117.56164
Generalized negative binomial regression Number of obs = 21
LR chi2(1) = 28.04
Prob > chi2 = 0.0000
Log likelihood = -117.56164 Pseudo R2 = 0.1065
deaths Coef. Std. Err. z P>|z| [95% Conf. Interval]
deaths
age_mos -.0516657 .0051747 -9.98 0.000 -.061808 -.0415233
_cons -1.867225 .2227944 -8.38 0.000 -2.303894 -1.430556
logexp 1 (offset)
lnalpha
cohort
1960-1967 .0939546 .7187747 0.13 0.896 -1.314818 1.502727
1968-1976 .0815279 .7365476 0.11 0.912 -1.362079 1.525135
_cons -.4759581 .5156502 -0.92 0.356 -1.486614 .5346978
We find that age is a significant determinant of the number of deaths. The standard errors for the
variables in the ln α equation suggest that the overdispersion parameter does not vary across cohorts.
We can test this assertion by typing
. test 2.cohort 3.cohort
( 1) [lnalpha]2.cohort = 0
( 2) [lnalpha]3.cohort = 0
chi2( 2) = 0.02
Prob > chi2 = 0.9904
There is no evidence of variation by cohort in these data.
Technical note
Note the intentional absence of a likelihood-ratio test for α = 0 in gnbreg. The test is affected
by the same boundary condition that affects the comparison test in nbreg; however, when α is
parameterized by more than a constant term, the null distribution becomes intractable. For this reason,
we recommend using nbreg to test for overdispersion and, if you have reason to believe that
overdispersion exists, only then modeling the overdispersion using gnbreg.
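With the rod93 data used in this entry, that workflow might look as follows: first check the boundary likelihood-ratio test reported by nbreg and, only if it indicates overdispersion, model the overdispersion with gnbreg:
. use http://www.stata-press.com/data/r13/rod93, clear
. generate logexp = ln(exposure)
. nbreg deaths i.cohort, offset(logexp)
. gnbreg deaths age_mos, lnalpha(i.cohort) offset(logexp)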
Stored results
nbreg and gnbreg store the following in e():
Scalars
e(N)              number of observations
e(k)              number of parameters
e(k_aux)          number of auxiliary parameters
e(k_eq)           number of equations in e(b)
e(k_eq_model)     number of equations in overall model test
e(k_dv)           number of dependent variables
e(df_m)           model degrees of freedom
e(r2_p)           pseudo-R-squared
e(ll)             log likelihood
e(ll_0)           log likelihood, constant-only model
e(ll_c)           log likelihood, comparison model
e(alpha)          value of alpha
e(delta)          value of delta
e(N_clust)        number of clusters
e(chi2)           χ²
e(chi2_c)         χ² for comparison test
e(p)              significance
e(rank)           rank of e(V)
e(rank0)          rank of e(V) for constant-only model
e(ic)             number of iterations
e(rc)             return code
e(converged)      1 if converged, 0 otherwise
Macros
e(cmd)            nbreg or gnbreg
e(cmdline)        command as typed
e(depvar)         name of dependent variable
e(wtype)          weight type
e(wexp)           weight expression
e(title)          title in estimation output
e(clustvar)       name of cluster variable
e(offset)         linear offset variable (nbreg)
e(offset1)        linear offset variable (gnbreg)
e(chi2type)       Wald or LR; type of model χ² test
e(chi2_ct)        Wald or LR; type of model χ² test corresponding to e(chi2_c)
e(dispers)        mean or constant
e(vce)            vcetype specified in vce()
e(vcetype)        title used to label Std. Err.
e(opt)            type of optimization
e(which)          max or min; whether optimizer is to perform maximization or minimization
e(ml_method)      type of ml method
e(user)           name of likelihood-evaluator program
e(technique)      maximization technique
e(properties)     b V
e(predict)        program used to implement predict
e(asbalanced)     factor variables fvset as asbalanced
e(asobserved)     factor variables fvset as asobserved
Matrices
e(b)              coefficient vector
e(Cns)            constraints matrix
e(ilog)           iteration log (up to 20 iterations)
e(gradient)       gradient vector
e(V)              variance–covariance matrix of the estimators
e(V_modelbased)   model-based variance
Functions
e(sample)         marks estimation sample
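For example, after the nbreg fit shown earlier, individual stored results can be retrieved directly (a small illustration; any of the scalars, macros, or matrices listed above can be accessed the same way):
. display e(alpha)
. display e(chi2_c)
. display "`e(cmd)', dispersion = `e(dispers)'"
. matrix list e(b)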
Methods and formulas
See [R] poisson and Johnson, Kemp, and Kotz (2005, chap. 4) for an introduction to the Poisson
distribution.
Methods and formulas are presented under the following headings:
Mean-dispersion model
Constant-dispersion model
Mean-dispersion model
A negative binomial distribution can be regarded as a gamma mixture of Poisson random variables.
The number of times something occurs, y_j, is distributed as Poisson(ν_jμ_j). That is, its conditional
likelihood is
$$f(y_j \mid \nu_j) = \frac{(\nu_j\mu_j)^{y_j}\, e^{-\nu_j\mu_j}}{\Gamma(y_j + 1)}$$
where μ_j = exp(x_jβ + offset_j) and ν_j is an unobserved parameter with a Gamma(1/α, α) density:
$$g(\nu) = \frac{\nu^{(1-\alpha)/\alpha}\, e^{-\nu/\alpha}}{\alpha^{1/\alpha}\,\Gamma(1/\alpha)}$$
This gamma distribution has mean 1 and variance α, where α is our ancillary parameter.
The unconditional likelihood for the jth observation is therefore
$$f(y_j) = \int_0^{\infty} f(y_j \mid \nu)\, g(\nu)\, d\nu = \frac{\Gamma(m + y_j)}{\Gamma(y_j + 1)\,\Gamma(m)}\, p_j^{m} (1 - p_j)^{y_j}$$
where p_j = 1/(1 + αμ_j) and m = 1/α. Solutions for α are handled by searching for ln α because
α must be greater than zero.
The log likelihood (with weights w_j and offsets) is given by
$$m = 1/\alpha \qquad p_j = 1/(1 + \alpha\mu_j) \qquad \mu_j = \exp(x_j\beta + \mathrm{offset}_j)$$
$$\ln L = \sum_{j=1}^{n} w_j \Big[\, \ln\{\Gamma(m + y_j)\} - \ln\{\Gamma(y_j + 1)\} - \ln\{\Gamma(m)\} + m\ln(p_j) + y_j\ln(1 - p_j) \,\Big]$$
For gnbreg, α can vary across the observations according to the parameterization ln α_j = z_jγ.
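As a concrete check of the mean-dispersion log likelihood, you can evaluate it by hand after an nbreg fit and compare the sum with e(ll). The following is a small sketch using the rod93 example from this entry; mu, pj, and llj are arbitrary variable names, and no weights are assumed:
. use http://www.stata-press.com/data/r13/rod93, clear
. generate logexp = ln(exposure)
. quietly nbreg deaths i.cohort, offset(logexp)
. predict double mu, n
. generate double pj = 1/(1 + e(alpha)*mu)
. generate double llj = lngamma(1/e(alpha) + deaths) - lngamma(deaths + 1) - lngamma(1/e(alpha))
. replace llj = llj + (1/e(alpha))*ln(pj) + deaths*ln(1 - pj)
. quietly summarize llj
. display r(sum) "  " e(ll)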
Constant-dispersion model
The constant-dispersion model assumes that y_j is conditionally distributed as Poisson(μ_j^*), where
μ_j^* ∼ Gamma(μ_j/δ, δ) for some dispersion parameter δ (by contrast, the mean-dispersion model
assumes that μ_j^* ∼ Gamma(1/α, αμ_j)). The log likelihood is given by
$$m_j = \mu_j/\delta \qquad p = 1/(1 + \delta)$$
$$\ln L = \sum_{j=1}^{n} w_j \Big[\, \ln\{\Gamma(m_j + y_j)\} - \ln\{\Gamma(y_j + 1)\} - \ln\{\Gamma(m_j)\} + m_j\ln(p) + y_j\ln(1 - p) \,\Big]$$
with everything else defined as before in the calculations for the mean-dispersion model.
nbreg and gnbreg support the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly
Maximum likelihood estimators and Methods and formulas.
These commands also support estimation with survey data. For details on VCEs with survey data,
see [SVY] variance estimation.
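For example, the model fit earlier could be refit with robust or cluster-robust standard errors as follows (shown only to illustrate the syntax; with just three cohorts, clustering on cohort would not be a sensible choice in practice):
. nbreg deaths i.cohort, offset(logexp) vce(robust)
. nbreg deaths i.cohort, offset(logexp) vce(cluster cohort)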
References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.
Deb, P., and P. K. Trivedi. 2006. Maximum simulated likelihood estimation of a negative binomial regression model
with multinomial endogenous treatment. Stata Journal 6: 246–255.
Gutierrez, R. G., S. L. Carter, and D. M. Drukker. 2001. sg160: On boundary-value likelihood-ratio tests. Stata
Technical Bulletin 60: 15–18. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 269–273. College Station,
TX: Stata Press.
Hilbe, J. M. 1998. sg91: Robust variance estimators for MLE Poisson and negative binomial regression. Stata Technical
Bulletin 45: 26–28. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 177–180. College Station, TX: Stata
Press.
. 1999. sg102: Zero-truncated Poisson and negative binomial regression. Stata Technical Bulletin 47: 37–40.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 233–236. College Station, TX: Stata Press.
. 2011. Negative Binomial Regression. 2nd ed. Cambridge: Cambridge University Press.
Johnson, N. L., A. W. Kemp, and S. Kotz. 2005. Univariate Discrete Distributions. 3rd ed. New York: Wiley.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2001. Predicted probabilities for count models. Stata Journal 1: 51–57.
. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata
Press.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Rodríguez, G. 1993. sbe10: An improvement to poisson. Stata Technical Bulletin 11: 11–14. Reprinted in Stata
Technical Bulletin Reprints, vol. 2, pp. 94–98. College Station, TX: Stata Press.
Rogers, W. H. 1991. sbe1: Poisson regression with rates. Stata Technical Bulletin 1: 11–12. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, pp. 62–64. College Station, TX: Stata Press.
. 1993. sg16.4: Comparison of nbreg and glm for negative binomial. Stata Technical Bulletin 16: 7. Reprinted
in Stata Technical Bulletin Reprints, vol. 3, pp. 82–84. College Station, TX: Stata Press.
Also see
[R] nbreg postestimation    Postestimation tools for nbreg and gnbreg
[R] glm    Generalized linear models
[R] poisson    Poisson regression
[R] tnbreg    Truncated negative binomial regression
[R] zinb    Zero-inflated negative binomial regression
[ME] menbreg    Multilevel mixed-effects negative binomial regression
[MI] estimation    Estimation commands for use with mi estimate
[SVY] svy estimation    Estimation commands for survey data
[XT] xtnbreg    Fixed-effects, random-effects, & population-averaged negative binomial models
[U] 20 Estimation and postestimation commands
Title
nbreg postestimation — Postestimation tools for nbreg and gnbreg
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are available after nbreg and gnbreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹    dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest²    likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with mi or svy estimation results.
² lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvar_reg newvar_disp} [if] [in], scores
statistic       Description
Main
n               number of events; the default
ir              incidence rate (equivalent to predict . . . , n nooffset)
pr(n)           probability Pr(y_j = n)
pr(a,b)         probability Pr(a ≤ y_j ≤ b)
xb              linear prediction
stdp            standard error of the linear prediction
In addition, relevant only after gnbreg are the following:
statistic       Description
Main
alpha           predicted values of α_j
lnalpha         predicted values of ln α_j
stdplna         standard error of predicted ln α_j
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is exp(x_jβ) if neither
offset(varname_o) nor exposure(varname_e) was specified when the model was fit;
exp(x_jβ + offset_j) if offset() was specified; or exp(x_jβ) × exposure_j if exposure() was specified.
ir calculates the incidence rate exp(x_jβ), which is the predicted number of events when exposure
is 1. This is equivalent to specifying both the n and the nooffset options.
pr(n) calculates the probability Pr(y_j = n), where n is a nonnegative integer that may be specified
as a number or a variable.
pr(a,b) calculates the probability Pr(a ≤ y_j ≤ b), where a and b are nonnegative integers that may
be specified as numbers or variables;
b missing (b ≥ .) means +∞;
pr(20,.) calculates Pr(y_j ≥ 20);
pr(20,b) calculates Pr(y_j ≥ 20) in observations for which b ≥ . and calculates
Pr(20 ≤ y_j ≤ b) elsewhere.
pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a
missing value in that observation for pr(a,b).
xb calculates the linear prediction, which is x_jβ if neither offset() nor exposure() was specified;
x_jβ + offset_j if offset() was specified; or x_jβ + ln(exposure_j) if exposure() was specified;
see nooffset below.
stdp calculates the standard error of the linear prediction.
alpha, lnalpha, and stdplna are relevant after gnbreg estimation only; they produce the predicted
values of α_j, ln α_j, and the standard error of the predicted ln α_j, respectively.
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It
modifies the calculations made by predict so that they ignore the offset or exposure variable; the
linear prediction is treated as x_jβ rather than as x_jβ + offset_j or x_jβ + ln(exposure_j). Specifying
predict . . . , nooffset is equivalent to specifying predict . . . , ir.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(x_jβ).
The second new variable will contain ∂lnL/∂(ln α_j) for dispersion(mean) and gnbreg.
The second new variable will contain ∂lnL/∂(ln δ) for dispersion(constant).
Remarks and examples
After nbreg and gnbreg, predict returns the expected number of deaths per cohort and the
probability of observing the number of deaths recorded or fewer.
. use http://www.stata-press.com/data/r13/rod93
. nbreg deaths i.cohort, nolog
Negative binomial regression Number of obs = 21
LR chi2(2) = 0.14
Dispersion = mean Prob > chi2 = 0.9307
Log likelihood = -108.48841 Pseudo R2 = 0.0007
deaths Coef. Std. Err. z P>|z| [95% Conf. Interval]
cohort
1960-1967 .0591305 .2978419 0.20 0.843 -.5246289 .64289
1968-1976 -.0538792 .2981621 -0.18 0.857 -.6382662 .5305077
_cons 4.435906 .2107213 21.05 0.000 4.0229 4.848912
/lnalpha -1.207379 .3108622 -1.816657 -.5980999
alpha .29898 .0929416 .1625683 .5498555
Likelihood-ratio test of alpha=0: chibar2(01) = 434.62 Prob>=chibar2 = 0.000
. predict count
(option n assumed; predicted number of events)
. predict p, pr(0, deaths)
. summarize deaths count p
Variable Obs Mean Std. Dev. Min Max
deaths 21 84.66667 48.84192 10 197
count 21 84.66667 4.00773 80 89.57143
p 21 .4991542 .2743702 .0070255 .9801285
The expected number of deaths ranges from 80 to 90. The probability Pr(y_i ≤ deaths) ranges
from 0.007 to 0.98.
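For example, to also obtain the probability of observing a count in a given range, you might type something like the following (pr50150 is just an illustrative variable name):
. predict pr50150, pr(50, 150)
. summarize pr50150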
Methods and formulas
In the following, we use the same notation as in [R] nbreg.
Methods and formulas are presented under the following headings:
Mean-dispersion model
Constant-dispersion model
Mean-dispersion model
The equation-level scores are given by
score(xβ)j=pj(yjµj)
score(τ)j=mαj(µjyj)
1 + αjµjln(1 + αjµj) + ψ(yj+m)ψ(m)
where τj=lnαjand ψ(z)is the digamma function.
Constant-dispersion model
The equation-level scores are given by
score(xβ)j=mj{ψ(yj+mj)ψ(mj) + ln(p)}
score(τ)j=yj(yj+mj)(1 p)score(xβ)j
where τj=lnδj.
Also see
[R] nbreg    Negative binomial regression
[U] 20 Estimation and postestimation commands
Title
nestreg — Nested model statistics
Syntax Menu Description Options
Remarks and examples Stored results Acknowledgment Reference
Also see
Syntax
Standard estimation command syntax
nestreg [, options] : command_name depvar (varlist) [(varlist) ...]
[if] [in] [weight] [, command_options]
Survey estimation command syntax
nestreg [, options] : svy [vcetype] [, svy_options] : command_name depvar
(varlist) [(varlist) ...] [if] [in] [, command_options]
options Description
Reporting
waldtable report Wald test results; the default
lrtable report likelihood-ratio test results
quietly suppress any output from command name
store(stub)    store nested estimation results in est stub#
by is allowed; see [U] 11.1.10 Prefix commands.
Weights are allowed if command_name allows them; see [U] 11.1.6 weight.
A varlist in parentheses indicates that this list of variables is to be considered as a block. Each variable in a varlist
not bound in parentheses will be treated as its own block.
All postestimation commands behave as they would after command name without the nestreg prefix; see the
postestimation manual entry for command name.
Menu
Statistics > Other > Nested model statistics
Description
nestreg fits nested models by sequentially adding blocks of variables and then reports comparison
tests between the nested models.
Options
 
Reporting
waldtable specifies that the table of Wald test results be reported. waldtable is the default.
lrtable specifies that the table of likelihood-ratio tests be reported. This option is not allowed if
pweights, the vce(robust) option, or the vce(cluster clustvar) option is specified. lrtable
is also not allowed with the svy prefix.
quietly suppresses the display of any output from command name.
store(stub) specifies that each model fit by nestreg be stored under the name est stub#, where
# is the nesting order from first to last.
Remarks and examples
Remarks are presented under the following headings:
Estimation commands
Wald tests
Likelihood-ratio tests
Programming for nestreg
Estimation commands
nestreg removes collinear predictors and observations with missing values from the estimation
sample before calling command name.
The following Stata commands are supported by nestreg:
clogit nbreg regress
cloglog ologit scobit
glm oprobit stcox
intreg poisson stcrreg
logistic probit streg
logit qreg tobit
You do not supply a depvar for stcox, stcrreg, or streg; otherwise, depvar is required. You
must supply two depvars for intreg.
Wald tests
Use nestreg to test the significance of blocks of predictors, building the regression model one
block at a time. Using the data from example 1 of [R] test, we wish to test the significance of the
following predictors of birth rate: medage, medagesq, and region (already partitioned into four
indicator variables: reg1, reg2, reg3, and reg4).
. use http://www.stata-press.com/data/r13/census4
(birth rate, median age)
. nestreg: regress brate (medage) (medagesq) (reg2-reg4)
Block 1: medage
Source SS df MS Number of obs = 50
F( 1, 48) = 164.72
Model 32675.1044 1 32675.1044 Prob > F = 0.0000
Residual 9521.71561 48 198.369075 R-squared = 0.7743
Adj R-squared = 0.7696
Total 42196.82 49 861.159592 Root MSE = 14.084
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -15.24893 1.188141 -12.83 0.000 -17.63785 -12.86002
_cons 618.3935 35.15416 17.59 0.000 547.7113 689.0756
Block 2: medagesq
Source SS df MS Number of obs = 50
F( 2, 47) = 158.75
Model 36755.8524 2 18377.9262 Prob > F = 0.0000
Residual 5440.96755 47 115.765267 R-squared = 0.8711
Adj R-squared = 0.8656
Total 42196.82 49 861.159592 Root MSE = 10.759
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -109.8925 15.96663 -6.88 0.000 -142.0132 -77.7718
medagesq 1.607332 .2707228 5.94 0.000 1.062708 2.151956
_cons 2007.071 235.4316 8.53 0.000 1533.444 2480.698
Block 3: reg2 reg3 reg4
Source SS df MS Number of obs = 50
F( 5, 44) = 100.63
Model 38803.419 5 7760.68381 Prob > F = 0.0000
Residual 3393.40095 44 77.1227489 R-squared = 0.9196
Adj R-squared = 0.9104
Total 42196.82 49 861.159592 Root MSE = 8.782
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -109.0957 13.52452 -8.07 0.000 -136.3526 -81.83886
medagesq 1.635208 .2290536 7.14 0.000 1.173581 2.096835
reg2 15.00284 4.252068 3.53 0.001 6.433365 23.57233
reg3 7.366435 3.953336 1.86 0.069 -.6009898 15.33386
reg4 21.39679 4.650602 4.60 0.000 12.02412 30.76946
_cons 1947.61 199.8405 9.75 0.000 1544.858 2350.362
Block Residual Change
Block F df df Pr > F R2 in R2
1 164.72 1 48 0.0000 0.7743
2 35.25 1 47 0.0000 0.8711 0.0967
3 8.85 3 44 0.0001 0.9196 0.0485
This single call to nestreg ran regress three times, adding a block of predictors to the model
for each run as in
. regress brate medage
Source SS df MS Number of obs = 50
F( 1, 48) = 164.72
Model 32675.1044 1 32675.1044 Prob > F = 0.0000
Residual 9521.71561 48 198.369075 R-squared = 0.7743
Adj R-squared = 0.7696
Total 42196.82 49 861.159592 Root MSE = 14.084
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -15.24893 1.188141 -12.83 0.000 -17.63785 -12.86002
_cons 618.3935 35.15416 17.59 0.000 547.7113 689.0756
. regress brate medage medagesq
Source SS df MS Number of obs = 50
F( 2, 47) = 158.75
Model 36755.8524 2 18377.9262 Prob > F = 0.0000
Residual 5440.96755 47 115.765267 R-squared = 0.8711
Adj R-squared = 0.8656
Total 42196.82 49 861.159592 Root MSE = 10.759
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -109.8925 15.96663 -6.88 0.000 -142.0132 -77.7718
medagesq 1.607332 .2707228 5.94 0.000 1.062708 2.151956
_cons 2007.071 235.4316 8.53 0.000 1533.444 2480.698
. regress brate medage medagesq reg2-reg4
Source SS df MS Number of obs = 50
F( 5, 44) = 100.63
Model 38803.419 5 7760.68381 Prob > F = 0.0000
Residual 3393.40095 44 77.1227489 R-squared = 0.9196
Adj R-squared = 0.9104
Total 42196.82 49 861.159592 Root MSE = 8.782
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -109.0957 13.52452 -8.07 0.000 -136.3526 -81.83886
medagesq 1.635208 .2290536 7.14 0.000 1.173581 2.096835
reg2 15.00284 4.252068 3.53 0.001 6.433365 23.57233
reg3 7.366435 3.953336 1.86 0.069 -.6009898 15.33386
reg4 21.39679 4.650602 4.60 0.000 12.02412 30.76946
_cons 1947.61 199.8405 9.75 0.000 1544.858 2350.362
nestreg collected the F statistic for the corresponding block of predictors and the model R²
statistic from each model fit.
The F statistic for the first block, 164.72, is for a test of the joint significance of the first block
of variables; it is simply the F statistic from the regression of brate on medage. The F statistic
for the second block, 35.25, is for a test of the joint significance of the second block of variables
in a regression of both the first and second blocks of variables. In our example, it is an F test
of medagesq in the regression of brate on medage and medagesq. Similarly, the third block's F
statistic of 8.85 corresponds to a joint test of reg2, reg3, and reg4 in the final regression.
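To see where these block statistics come from, you can reproduce them with test after the corresponding regressions (a quick check; the results should match the table above up to rounding):
. regress brate medage medagesq
. test medagesq
. regress brate medage medagesq reg2-reg4
. test reg2 reg3 reg4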
Likelihood-ratio tests
The nestreg command provides a simple syntax for performing likelihood-ratio tests for nested
model specifications; also see lrtest. Using the data from example 1 of [R] lrtest, we wish to
jointly test the significance of the following predictors of low birthweight: age, lwt, ptl, and ht.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. xi: nestreg, lr: logistic low (i.race smoke ui) (age lwt ptl ht)
i.race _Irace_1-3 (naturally coded; _Irace_1 omitted)
Block 1: _Irace_2 _Irace_3 smoke ui
Logistic regression Number of obs = 189
LR chi2(4) = 18.80
Prob > chi2 = 0.0009
Log likelihood = -107.93404 Pseudo R2 = 0.0801
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
_Irace_2 3.052746 1.498087 2.27 0.023 1.166747 7.987382
_Irace_3 2.922593 1.189229 2.64 0.008 1.316457 6.488285
smoke 2.945742 1.101838 2.89 0.004 1.415167 6.131715
ui 2.419131 1.047359 2.04 0.041 1.035459 5.651788
_cons .1402209 .0512295 -5.38 0.000 .0685216 .2869447
Block 2: age lwt ptl ht
Logistic regression Number of obs = 189
LR chi2(8) = 33.22
Prob > chi2 = 0.0001
Log likelihood = -100.724 Pseudo R2 = 0.1416
low Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
_Irace_2 3.534767 1.860737 2.40 0.016 1.259736 9.918406
_Irace_3 2.368079 1.039949 1.96 0.050 1.001356 5.600207
smoke 2.517698 1.00916 2.30 0.021 1.147676 5.523162
ui 2.1351 .9808153 1.65 0.099 .8677528 5.2534
age .9732636 .0354759 -0.74 0.457 .9061578 1.045339
lwt .9849634 .0068217 -2.19 0.029 .9716834 .9984249
ptl 1.719161 .5952579 1.56 0.118 .8721455 3.388787
ht 6.249602 4.322408 2.65 0.008 1.611152 24.24199
_cons 1.586014 1.910496 0.38 0.702 .1496092 16.8134
Block LL LR df Pr > LR AIC BIC
1 -107.934 18.80 4 0.0009 225.8681 242.0768
2 -100.724 14.42 4 0.0061 219.448 248.6237
The estimation results from the full model are left in e(), so we can later use estat and other
postestimation commands.
. estat gof
number of observations = 189
number of covariate patterns = 182
Pearson chi2(173) = 179.24
Prob > chi2 = 0.3567
Programming for nestreg
If you want your user-written command (command_name) to work with nestreg, it must follow
standard Stata syntax and allow the if qualifier. Furthermore, command_name must have sw or swml
as a program property; see [P] program properties. If command_name has swml as a property,
command_name must store the log-likelihood value in e(ll) and the model degrees of freedom in
e(df_m).
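For example, a user-written wrapper command could declare the swml property as follows. This is a schematic sketch only; mycmd and its use of poisson as a stand-in estimator are hypothetical and serve merely to show the required structure (poisson itself leaves e(ll) and e(df_m) behind):
program mycmd, eclass properties(swml)
        version 13
        syntax varlist(min=2) [if] [in]
        marksample touse
        gettoken depvar indepvars : varlist
        * any estimator may go here, provided e(ll) and e(df_m) are stored
        poisson `depvar' `indepvars' if `touse'
end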
Stored results
nestreg stores the following in r():
Matrices
r(wald) matrix corresponding to the Wald table
r(lr) matrix corresponding to the likelihood-ratio table
Acknowledgment
We thank Paul H. Bern of Syracuse University for developing the hierarchical regression command
that inspired nestreg.
Reference
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Also see
[P] program properties    Properties of user-defined programs
Title
net — Install and manage user-written additions from the Internet
Syntax Description Options Remarks and examples
References Also see
Syntax
Set current location for net
net from directory or url
Change to a different net directory
net cd path or url
Change to a different net site
net link linkname
Search for installed packages
net search (see [R]net search)
Report current net location
net
Describe a package
net describe pkgname [, from(directory_or_url)]
Set location where packages will be installed
net set ado dirname
Set location where ancillary files will be installed
net set other dirname
Report net ‘from’, ‘ado’, and ‘other’ settings
net query
Install ado-files and help files from a package
net install pkgname [, all replace force from(directory_or_url)]
Install ancillary files from a package
net get pkgname [, all replace force from(directory_or_url)]
Shortcut to access Stata Journal (SJ) net site
net sj vol-issue insert
Shortcut to access Stata Technical Bulletin (STB) net site
net stb issue insert
List installed packages
ado [, find(string) from(dirname)]
ado dir [pkgid] [, find(string) from(dirname)]
Describe installed packages
ado describe [pkgid] [, find(string) from(dirname)]
Uninstall an installed package
ado uninstall pkgid [, from(dirname)]
where
pkgname is name of a package
pkgid is name of a package
or a number in square brackets: [#]
dirname is a directory name
or PLUS (default)
or PERSONAL
or SITE
Description
net downloads and installs additions to Stata. The additions can be obtained from the Internet or
from physical media. The additions can be ado-files (new commands), help files, or even datasets.
Collections of files are bound together into packages. For instance, the package named zz49 might
add the xyz command to Stata. At a minimum, such a package would contain xyz.ado, the code
to implement the new command, and xyz.sthlp, the system help to describe it. That the package
contains two files is a detail: you use net to download the package zz49, regardless of the number
of files.
ado manages the packages you have installed by using net. The ado command lets you list and
uninstall previously installed packages.
You can also access the net and ado features by selecting Help > SJ and User-written Programs;
this is the recommended method to find and install additions to Stata.
Options
all is used with net install and net get. Typing it with either one makes the command equivalent
to typing net install followed by net get.
replace is for use with net install and net get. It specifies that the downloaded files replace
existing files if any of the files already exists.
force specifies that the downloaded files replace existing files if any of the files already exists, even
if Stata thinks all the files are the same. force implies replace.
find(string)is for use with ado,ado dir, and ado describe. It specifies that the descriptions of
the packages installed on your computer be searched, and that the package descriptions containing
string be listed.
from(dirname), when used with ado, specifies where the packages are installed. The default is
from(PLUS). PLUS is a code word that Stata understands to correspond to a particular directory
on your computer that was set at installation time. On Windows computers, PLUS probably means
the directory c:\ado\plus, but it might mean something else. You can find out what it means
by typing sysdir, but doing so is irrelevant if you use the defaults.
from(directory or url), when used with net, specifies the directory or URL where installable packages
may be found. The directory or URL is the same as the one that would have been specified with
net from.
Remarks and examples
For an introduction to using net and ado, see [U] 28 Using the Internet to keep up to date. The
purpose of this documentation is
1. to briefly, but accurately, describe net and ado and all their features, and
2. to provide documentation to those who wish to set up their own sites to distribute additions to Stata.
Remarks are presented under the following headings:
Definition of a package
The purpose of the net and ado commands
Content pages
Package-description pages
Where packages are installed
A summary of the net command
A summary of the ado command
Relationship of net and ado to the point-and-click interface
Creating your own site
Format of content and package-description files
Example 1
Example 2
Additional package directives
SMCL in content and package-description files
Error-free file delivery
Definition of a package
A package is a collection of files, typically .ado and .sthlp files, that together provide a new
feature in Stata. Packages contain additions that you wish had been part of Stata at the outset. We
write such additions, and so do other users.
One source of these additions is the Stata Journal, a printed and electronic journal with corresponding
software. If you want the journal, you must subscribe, but the software is available for free from our
website.
The purpose of the net and ado commands
The net command makes it easy to distribute and install packages. The goal is to get you quickly
to a package-description page that summarizes the addition, for example,
. net describe rte_stat, from(http://www.wemakeitupaswego.edu/faculty/sgazer/)
package from http://www.wemakeitupaswego.edu/faculty/sgazer/
rte_stat. The robust-to-everything statistic; update.
S. Gazer, Dept. of Applied Theoretical Mathematics, WMIUAWG Univ.
Aleph-0 100% confidence intervals proved too conservative for some
applications; Aleph-1 confidence intervals have been substituted.
The new robust-to-everything supplants the previous robust-to-
everything-conceivable statistic. See "Inference in the absence
of data" (forthcoming). After installation, see help .
(type )
rte.ado
rte.sthlp
nullset.ado
random.ado
If you decide that the addition might prove useful, net makes the installation easy:
. net install rte_stat
checking consistency and verifying not already installed...
installing into c:\ado\plus\ ...
installation complete.
The ado command helps you manage packages installed with net. Perhaps you remember that
you installed a package that calculates the robust-to-everything statistic, but you cannot remember
the command’s name. You could use ado to search what you have previously installed for the rte
command,
. ado
[1] package from http://www.stata.com/stb/stb56
(output omitted )
[15] package from http://www.wemakeitupaswego.edu/faculty/sgazer
(output omitted )
[21] package from http://www.stata-journal.com/software/sj7-1
or you might type
. ado, find("robust-to-everything")
[15] package from http://www.wemakeitupaswego.edu/faculty/sgazer
Perhaps you decide that rte, despite the author’s claims, is not worth the disk space it occupies. You
can use ado to erase it:
. ado uninstall rte_stat
package from http://www.wemakeitupaswego.edu/faculty/sgazer
ado uninstall is easier than erasing the files by hand because ado uninstall erases every file
associated with the package, and, moreover, ado knows where on your computer rte_stat is
installed; otherwise, you would have to hunt for these files yourself.
Content pages
There are two types of pages displayed by net: content pages and package-description pages.
When you type net from,net cd,net link, or net without arguments, Stata goes to the specified
place and displays the content page:
. net from http://www.stata.com
http://www.stata.com/
Welcome to StataCorp.
Below we provide links to sites providing additions to Stata, including
the Stata Journal, STB, and Statalist. These are NOT THE OFFICIAL UPDATES;
you fetch and install the official updates by typing update.
PLACES you could net link to:
sj The Stata Journal
DIRECTORIES you could net cd to:
stb materials published in the Stata Technical Bulletin
users materials written by various people, including StataCorp
employees
meetings software packages from Stata Users Group meetings
links links to other locations providing additions to Stata
A content page tells you about other content pages and package-description pages. The example above
lists other content pages only. Below we follow one of the links for the Stata Journal:
. net link sj
http://www.stata-journal.com/
The Stata Journal is a refereed, quarterly journal containing articles
of interest to Stata users. For more details and subscription information,
visit the Stata Journal website at http://www.stata-journal.com.
PLACES you could net link to:
stata StataCorp website
DIRECTORIES you could net cd to:
production Files for authors of the Stata Journal
software Software associated with Stata Journal articles
. net cd software
http://www.stata-journal.com/software/
PLACES you could net link to:
stata StataCorp website
stb Stata Technical Bulletin (STB) software archive
DIRECTORIES you could net cd to:
(output omitted )
sj7-1 volume 7, issue 1
(output omitted )
sj1-1 volume 1, issue 1
. net cd sj7-1
http://www.stata-journal.com/software/sj7-1/
DIRECTORIES you could net cd to:
.. Other Stata Journals
PACKAGES you could net describe:
dm0027 File filtering in Stata: handling complex data
formats and navigating log files efficiently
st0119 Rasch analysis
st0120 Multivariable regression spline models
st0121 mhbounds - Sensitivity Analysis for Average
Treatment Effects
dm0027, st0119, . . . , st0121 are links to package-description pages.
1. When you type net from, you follow that with a location to display the location’s content
page.
a. The location could be a URL, such as http://www.stata.com. The content page at that
location would then be listed.
b. The location could be e: on a Windows computer or a mounted volume on a Mac
computer. The content page on that source would be listed. That would work if you had
special media obtained from StataCorp or special media prepared by another user.
c. The location could even be a directory on your computer, but that would work only if
that directory contained the right kind of files.
2. Once you have specified a location, typing net cd will take you into subdirectories of that
location, if there are any. Typing
. net from http://www.stata-journal.com
. net cd software
is equivalent to typing
. net from http://www.stata-journal.com/software
Typing net cd displays the content page from that location.
3. Typing net without arguments redisplays the current content page, which is the content page
last displayed.
4. net link is similar to net cd in that the result is to change the location, but rather than
changing to subdirectories of the current location, net link jumps to another location:
. net from http://www.stata-journal.com
http://www.stata-journal.com/
The Stata Journal is a refereed, quarterly journal containing articles
of interest to Stata users. For more details and subscription information,
visit the Stata Journal website at
http://www.stata-journal.com.
PLACES you could net link to:
stata StataCorp website
DIRECTORIES you could net cd to:
production Files for authors of the Stata Journal
software Software associated with Stata Journal articles
Typing net link stata would jump to http://www.stata.com:
. net link stata
http://www.stata.com/
Welcome to StataCorp.
(output omitted )
Package-description pages
Package-description pages describe what could be installed:
. net from http://www.stata-journal.com/software/sj7-1
http://www.stata-journal.com/software/sj7-1/
(output omitted )
. net describe st0119
package from http://www.stata-journal.com/software/sj7-1
SJ7-1 st0119. Rasch analysis
Rasch analysis
by Jean-Benoit Hardouin, University of Nantes, France
Support: jean-benoit.hardouin@univ-nantes.fr
After installation, type help raschtest, raschtestv7,
gammasym, gausshermite, and geekel2d.
INSTALLATION FILES                      (type net install st0119)
st0119/raschtest.ado
st0119/raschtest.hlp
st0119/raschtestv7.ado
st0119/raschtestv7.hlp
st0119/gammasym.ado
st0119/gammasym.hlp
st0119/gausshermite.ado
st0119/gausshermite.hlp
st0119/geekel2d.ado
st0119/geekel2d.hlp
ANCILLARY FILES                         (type net get st0119)
st0119/data.dta
st0119/outrasch.do
A package-description page describes the package and tells you how to install the component files.
Package-description pages potentially describe two types of files:
1. Installation files: files that you type net install to install and that are required to make the
addition work.
2. Ancillary files: additional files that you might want to install (you type net get to install them)
but that you can ignore. Ancillary files are typically datasets that are useful for demonstration
purposes. Ancillary files are not really installed in the sense of being copied to an official place
for use by Stata itself. They are merely copied into the current directory so that you may use
them if you wish.
You install the official files by typing net install followed by the package name. For example, to
install st0119, you would type
. net install st0119
checking consistency and verifying not already installed...
installing into c:\ado\plus\ ...
installation complete.
You get the ancillary files, if there are any and if you want them, by typing net get followed by
the package name:
. net get st0119
checking consistency and verifying not already installed...
copying into ...
copying
copying
ancillary files successfully copied.
Most users ignore the ancillary files.
Once you have installed a package (by typing net install), use ado to redisplay the package-
description page whenever you wish:
. ado describe st0119
[1] package from http://www.stata-journal.com/software/sj7-1
SJ7-1 st0119. Rasch analysis
Rasch analysis
by Jean-Benoit Hardouin, University of Nantes, France
Support: jean-benoit.hardouin@univ-nantes.fr
After installation, type help raschtest, raschtestv7,
gammasym, gausshermite, and geekel2d.
r/raschtest.ado
r/raschtest.hlp
r/raschtestv7.ado
r/raschtestv7.hlp
g/gammasym.ado
g/gammasym.hlp
g/gausshermite.ado
g/gausshermite.hlp
g/geekel2d.ado
g/geekel2d.hlp
24 Apr 2013
The package-description page shown by ado includes the location from which we got the package and
when we installed it. It does not mention the ancillary files that were originally part of this package
because they are not tracked by ado.
Where packages are installed
Packages should be installed in PLUS or SITE, which are code words that Stata understands and
that correspond to some real directories on your computer. Typing sysdir will tell you where these
are, if you care.
. sysdir
STATA: C:\Program Files\Stata13\
BASE: C:\Program Files\Stata13\ado\base\
SITE: C:\Program Files\Stata13\ado\site\
PLUS: c:\ado\plus\
PERSONAL: c:\ado\personal\
OLDPLACE: c:\ado\
If you type sysdir, you may obtain different results.
By default, net installs in the PLUS directory, and ado tells you about what is installed there. If
you are on a multiple-user system, you may wish to install some packages in the SITE directory.
This way, they will be available to other Stata users. To do that, before using net install, type
. net set ado SITE
and when reviewing what is installed or removing packages, redirect ado to that directory:
. ado . . ., from(SITE)
In both cases, you type SITE because Stata will understand that SITE means the site ado-directory
as defined by sysdir. To install into SITE, you must have write access to that directory.
If you reset where net installs and then, in the same session, wish to install into your private
ado-directory, type
. net set ado PLUS
That is how things were originally. If you are confused as to where you are, type net query.
A summary of the net command
The net command displays content pages and package-description pages. Such pages are provided
over the Internet, and most users get them there. We recommend that you start at http://www.stata.com
and work out from there. We also recommend using net search to find packages of interest to you;
see [R] net search.
net from moves you to a location and displays the content page.
net cd and net link change from your current location to other locations. net cd enters
subdirectories of the original location. net link jumps from one location to another, depending on
the code on the content page.
net describe lists a package-description page. Packages are named, and you type net describe
pkgname.
net install installs a package into your copy of Stata. net get copies any additional files
(ancillary files) to your current directory.
net sj and net stb simplify loading files from the Stata Journal and its predecessor, the Stata
Technical Bulletin.
net sj vol-issue
is a synonym for typing
net from http://www.stata-journal.com/software/sjvol-issue
whereas
net sj vol-issue insert
is a synonym for typing
net from http://www.stata-journal.com/software/sjvol-issue
net describe insert
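For example, typing
. net sj 7-1 st0119
is shorthand for
. net from http://www.stata-journal.com/software/sj7-1
. net describe st0119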
net set controls where net installs files. By default, net installs in the PLUS directory; see
[P] sysdir. net set ado SITE would cause subsequent net commands to install in the SITE directory.
net set other sets where ancillary files, such as .dta files, are installed. The default is the current
directory.
net query displays the current net from,net set ado, and net set other settings.
A summary of the ado command
The ado command lists the package descriptions of previously installed packages.
Typing ado without arguments is the same as typing ado dir. Both list the names and titles of
the packages you have installed.
ado describe lists full package-description pages.
ado uninstall removes packages from your computer.
Because you can install packages from a variety of sources, the package names may not always be
unique. Thus the packages installed on your computer are numbered sequentially, and you may refer
to them by name or by number. For instance, say that you wanted to get rid of the robust-to-everything
statistic command you installed. Type
. ado, find("robust-to-everything")
[15] package from http://www.wemakeitupaswego.edu/faculty/sgazer
You could then type
. ado uninstall rte_stat
or
. ado uninstall [15]
Typing ado uninstall rte_stat would work only if the name rte_stat were unique; otherwise,
ado would refuse, and you would have to type the number.
The find() option is allowed with ado dir and ado describe. It searches the package description
for the word or phrase you specify, ignoring case (alpha matches Alpha). The complete package
description is searched, including the author’s name and the name of the files. Thus if rte was the
name of a command that you wanted to eliminate, but you could not remember the name of the
package, you could type
. ado, find(rte)
[15] package from http://www.wemakeitupaswego.edu/faculty/sgazer
Relationship of net and ado to the point-and-click interface
Users may instead select Help > SJ and User-written Programs. There are advantages and
disadvantages:
1. Flipping through content and package-description pages is easier; it is much like a browser.
See [GS] 19 Updating and extending Stata—Internet functionality (GSM, GSU, or GSW).
2. When browsing a product-description page, note that the .sthlp files are highlighted. You
may click on .sthlp files to review them before installing the package.
3. You may not redirect from where ado searches for files.
Creating your own site
The rest of this entry concerns how to create your own site to distribute additions to Stata. The
idea is that you have written additions for use with Stata, say, xyz.ado and xyz.sthlp, and you
wish to put them out so that coworkers or researchers at other institutions can easily install them.
Or, perhaps you just have a dataset that you and others want to share.
In any case, all you need is a webpage. You place the files that you want to distribute on your
webpage (or in a subdirectory), and you add two more files, a content file and a package-description
file, and you are done.
Format of content and package-description files
The content file describes the content page. It must be named stata.toc:
begin stata.toc
OFF (to make site unavailable temporarily)
* lines starting with * are comments; they are ignored
* blank lines are ignored, too
* v indicates version—specify v 3, which is the current version of .toc files
v 3
* d lines display description text
* the first d line is the title, and the remaining ones are text
* blank d lines display a blank line
d title
d text
d text
d
. . .
* l lines display links
l word-to-show path-or-url [description]
l word-to-show path-or-url [description]
. . .
* t lines display other directories within the site
t path [description]
t path [description]
. . .
* p lines display packages
p pkgname [description]
p pkgname [description]
. . .
end stata.toc
Package files describe packages and are named pkgname.pkg:
begin pkgname.pkg
* lines starting with * are comments; they are ignored
* blank lines are ignored, too
* v indicates version—specify v 3, which is the current version of .toc files
v 3
* d lines display package description text
* the first d line is the title, and the remaining ones are text
* blank d lines display a blank line
d title
d text
d Distribution-Date: date
d text
d
. . .
* f identifies the component files
f [path/]filename [description]
f [path/]filename [description]
. . .
* e line is optional; it means stop reading
e
end pkgname.pkg
Note the Distribution-Date description line. This line is optional but recommended. Stata can look
for updates to user-written programs with the adoupdate command if the package files from which
those programs were installed contain a Distribution-Date description line.
Example 1
Say that we want the user to see the following:
. net from http://www.university.edu/~me
http://www.university.edu/~me
PACKAGES you could net describe:
xyz interval-truncated survival
. net describe xyz
package from http://www.university.edu/~me
xyz. interval-truncated survival.
C. Farrar, Uni University.
INSTALLATION FILES                      (type net install xyz)
xyz.ado
xyz.sthlp
ANCILLARY FILES                         (type net get xyz)
sample.dta
The files needed to do this would be
begin stata.toc
v 3
d Chris Farrar, Uni University
p xyz interval-truncated survival
end stata.toc
begin xyz.pkg
v 3
d xyz. interval-truncated survival.
d C. Farrar, Uni University.
f xyz.ado
f xyz.sthlp
f sample.dta
end xyz.pkg
On his homepage, Chris would place the following files:
stata.toc (shown above)
xyz.pkg (shown above)
xyz.ado file to be delivered (for use by net install)
xyz.sthlp file to be delivered (for use by net install)
sample.dta file to be delivered (for use by net get)
Chris does nothing to distinguish ancillary files from installation files.
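From the user's side, installation then proceeds with the standard net commands described earlier in this entry; a sketch:
. net from http://www.university.edu/~me
. net describe xyz
. net install xyz            // copies the installation files xyz.ado and xyz.sthlp
. net get xyz                // copies the ancillary file sample.dta to the current directory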
Example 2
S. Gazer wants to create a more complex site:
. net from http://www.wemakeitupaswego.edu/faculty/sgazer
http://www.wemakeitupaswego.edu/faculty/sgazer
Data-free inference materials
S. Gazer, Department of Applied Theoretical Mathematics

Also see my homepage for the preprint of "Irrefutable inference".

PLACES you could -net link- to:
    stata            StataCorp website

DIRECTORIES you could -net cd- to:
    ir               irrefutable inference programs (work in progress)

PACKAGES you could -net describe-:
    rtec             Robust-to-everything-conceivable statistic
    rte              Robust-to-everything statistic
. net describe rte
package rte from http://www.wemakeitupaswego.edu/faculty/sgazer/

      rte. The robust-to-everything statistic; update.
      S. Gazer, Dept. of Applied Theoretical Mathematics, WMIUAWG Univ.
      Aleph-0 100% confidence intervals proved too conservative for some
      applications; Aleph-1 confidence intervals have been substituted.
      The new robust-to-everything supplants the previous robust-to-
      everything-conceivable statistic. See "Inference in the absence
      of data" (forthcoming). After installation, see help rte.

      Distribution-Date: 20130420

      Support: email sgazer@wemakeitupaswego.edu

      (type net install rte)
      rte.ado
      rte.sthlp
      nullset.ado
      random.ado

      (type net get rte)
      empty.dta
The files needed to do this would be
begin stata.toc
v 3
d Data-free inference materials
d S. Gazer, Department of Applied Theoretical Mathematics
d
d Also see my homepage for the preprint of "Irrefutable inference".
l stata http://www.stata.com
t ir irrefutable inference programs (work in progress)
p rtec Robust-to-everything-conceivable statistic
p rte Robust-to-everything statistic
end stata.toc
begin rte.pkg
v 3
d rte. The robust-to-everything statistic; update.
d {bf:S. Gazer, Dept. of Applied Theoretical Mathematics, WMIUAWG Univ.}
d Aleph-0 100% confidence intervals proved too conservative for some
d applications; Aleph-1 confidence intervals have been substituted.
d The new robust-to-everything supplants the previous robust-to-
d everything-conceivable statistic. See "Inference in the absence
d of data" (forthcoming). After installation, see help {bf:rte}.
d
d Distribution-Date: 20130420
d
d Support: email sgazer@wemakeitupaswego.edu
f rte.ado
f rte.sthlp
f nullset.ado
f random.ado
f empty.dta
end rte.pkg
On his homepage, Mr. Gazer would place the following files:
stata.toc (shown above)
rte.pkg (shown above)
rte.ado (file to be delivered)
rte.sthlp (file to be delivered)
nullset.ado (file to be delivered)
random.ado (file to be delivered)
empty.dta (file to be delivered)
rtec.pkg the other package referred to in stata.toc
rtec.ado the corresponding files to be delivered
rtec.sthlp
ir/stata.toc the contents file for when the user types net cd ir
ir/. . . whatever other .pkg files are referred to
ir/. . . whatever other files are to be delivered
If Mr. Gazer later updated the rte package, he could change the Distribution-Date description line
in his package. Then, if someone who had previously installed the rte package wanted to obtain
the latest version, that person could use the adoupdate command; see [R] adoupdate.
For complex sites, a different structure may prove more convenient:
stata.toc (shown above)
rte.pkg (shown above)
rtec.pkg the other package referred to in stata.toc
rte/ directory containing rte files to be delivered:
rte/rte.ado (file to be delivered)
rte/rte.sthlp (file to be delivered)
rte/nullset.ado (file to be delivered)
rte/random.ado (file to be delivered)
rte/empty.dta (file to be delivered)
rtec/ directory containing rtec files to be delivered:
rtec/. . . (files to be delivered)
ir/stata.toc the contents file for when the user types net cd ir
ir/*.pkg whatever other package files are referred to
ir/*/. . . whatever other files are to be delivered
If you prefer this structure, it is simply a matter of changing the bottom of the rte.pkg from
f rte.ado
f rte.sthlp
f nullset.ado
f random.ado
f empty.dta
to
f rte/rte.ado
f rte/rte.sthlp
f rte/nullset.ado
f rte/random.ado
f rte/empty.dta
In writing paths and files, the directory separator forward slash (/) is used, regardless of operating
system, because this is what the Internet uses.
It does not matter whether the files you put out are in Windows, Mac, or Unix format (how lines
end is recorded differently). When Stata reads the files over the Internet, it will figure out the file
format on its own and will automatically translate the files to what is appropriate for the receiver.
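For instance, a user wanting the work-in-progress material in the ir subdirectory of the complex site above would move there with net cd; a sketch:
. net from http://www.wemakeitupaswego.edu/faculty/sgazer
. net cd ir                  // reads ir/stata.toc and lists what that directory offers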
Additional package directives
F filename is similar to f filename, except that, when the file is installed, it will always be copied to
the system directories (and not the current directory).
With f filename, the file is installed into a directory according to the file's suffix. For instance,
xyz.ado would be installed in the system directories, whereas xyz.dta would be installed in the
current directory.
Coding F xyz.ado would have the same result as coding f xyz.ado.
Coding F xyz.dta, however, would state that xyz.dta is to be installed in the system directories.
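As a hypothetical .pkg fragment (the filenames are made up for illustration), the contrast looks like this:
* with f, the destination follows the file's suffix
f myest.ado
f mydata.dta
* with F, the file always goes to the system directories
F myotherdata.dta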
g platformname filename is also a variation on f filename. It specifies that the file be installed only
if the user's operating system is of type platformname; otherwise, the file is ignored. The platform
names are WIN64A (64-bit x86-64) and WIN (32-bit x86) for Windows; MACINTEL64 (64-bit Intel,
GUI), OSX.X8664 (64-bit Intel, console), MACINTEL (32-bit Intel, GUI), and OSX.X86 (32-bit Intel,
console) for Mac; and LINUX64 (64-bit x86-64), LINUX (32-bit x86), SOLX8664 (64-bit x86-64),
and SOL64 for Unix.
G platformname filename is a variation on F filename. The file, if not ignored, is to be installed in
the system directories.
g platformname filename1 filename2 is a more detailed version of g platformname filename. In this
case, filename1 is the name of the file on the server (the file to be copied), and filename2 is to
be the name of the file on the user's system; for example, you might code
g WIN mydll.forwin mydll.plugin
g LINUX mydll.forlinux mydll.plugin
When you specify one filename, the result is the same as specifying two identical filenames.
G platformname filename1 filename2 is the install-in-system-directories version of g platformname
filename1 filename2.
h filename asserts that filename must be loaded, or this package is not to be installed; for example,
you might code
g WIN mydll.forwin mydll.plugin
g LINUX mydll.forlinux mydll.plugin
h mydll.plugin
if you were offering the plugin mydll.plugin for Windows and Linux only.
SMCL in content and package-description files
The text listed on the second and subsequent d lines in both stata.toc and pkgname.pkg may
contain SMCL as long as you include v 3; see [P] smcl.
Thus, in rte.pkg, S. Gazer coded the third line as
d {bf:S. Gazer, Dept. of Applied Theoretical Mathematics, WMIUAWG Univ.}
Error-free file delivery
Most people transport files over the Internet and never worry about the file being corrupted in the
process because corruption rarely occurs. If, however, the files must be delivered perfectly or not at
all, you can include checksum files in the directory.
For instance, say that big.dta is included in your package and that it must be sent perfectly.
First, use Stata to make the checksum file for big.dta
. checksum big.dta, save
That command creates a small file called big.sum; see [D]checksum. Then copy both big.dta
and big.sum to your homepage. If set checksum is on (the default is off), whenever Stata reads
filename.whatever over the net, it also looks for filename.sum. If it finds such a file, it uses the
information recorded in it to verify that what was copied was error free.
If you do this, be cautious. If you put big.dta and big.sum on your homepage and then later
change big.dta without changing big.sum, people will think that there are transmission errors when
they try to download big.dta.
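A sketch of the two sides of that workflow, using big.dta as the file that must arrive intact:
. checksum big.dta, save     // author: creates big.sum; rerun whenever big.dta changes
. set checksum on            // user: verify files read over the net against their .sum files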
References
Baum, C. F., and N. J. Cox. 1999. ip29: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 52: 10–12. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 121–124. College
Station, TX: Stata Press.
Cox, N. J., and C. F. Baum. 2000. ip29.1: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 54: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 124–126. College
Station, TX: Stata Press.
Also see
[R] adoupdate       Update user-written ado-files
[R] net search      Search the Internet for installable packages
[R] netio           Control Internet connections
[R] search          Search Stata documentation and other resources
[R] sj              Stata Journal and STB installation instructions
[R] ssc             Install and uninstall packages from SSC
[R] update          Check for official updates
[D] checksum        Calculate checksum of file
[P] smcl            Stata Markup and Control Language
[GSM] 19 Updating and extending Stata—Internet functionality
[GSU] 19 Updating and extending Stata—Internet functionality
[GSW] 19 Updating and extending Stata—Internet functionality
[U] 28 Using the Internet to keep up to date
Title
net search — Search the Internet for installable packages
Syntax Description Options Remarks and examples
References Also see
Syntax
net search word [word ...] [, options]

options          Description
or               list packages that contain any of the keywords; default is all
nosj             search non-SJ and non-STB sources
tocpkg           search both tables of contents and packages; the default
toc              search tables of contents only
pkg              search packages only
everywhere       search packages for match
filenames        search filenames associated with package for match
errnone          make return code 111 instead of 0 when no matches found
Description
net search searches the Internet for user-written additions to Stata, including, but not limited to,
user-written additions published in the Stata Journal (SJ) and in the Stata Technical Bulletin (STB).
net search lists the available additions that contain the specified keywords.
The user-written materials found are available for immediate download by using the net command
or by clicking on the link.
In addition to typing net search, you may select Help > Search... and choose Search net
resources. This is the recommended way to search for user-written additions to Stata.
Options
or is relevant only when multiple keywords are specified. By default, net search lists only packages
that include all the keywords. or changes the command to list packages that contain any of the
keywords.
nosj specifies that net search not list matches that were published in the SJ or in the STB.
tocpkg, toc, and pkg determine what is searched. tocpkg is the default, meaning that both tables
of contents (tocs) and packages (pkgs) are searched. toc restricts the search to tables of contents.
pkg restricts the search to packages.
everywhere and filenames determine where in packages net search looks for keywords. The
default is everywhere. filenames restricts net search to search for matches only in the
filenames associated with a package. Specifying everywhere implies pkg.
errnone is a programmer’s option that causes the return code to be 111 instead of 0 when no matches
are found.
Remarks and examples
net search searches the Internet for user-written additions to Stata. If you want to search the
Stata documentation for a particular topic, command, or author, see [R] search. net search word
word . . . (without options) is equivalent to typing search word word . . . , net.
Remarks are presented under the following headings:
Topic searches
Author searches
Command searches
Where does net search look?
How does net search work?
Topic searches
Example: Find what is available about random effects
. net search random effect
Comments:
It is best to search using the singular form of a word. net search random effect will find
both “random effect” and “random effects”.
net search random effect will also find “random-effect” because net search performs a
string search and not a word search.
net search random effect lists all packages containing the words “random” and “effect”,
not necessarily used together.
If you wanted all packages containing the word “random” or the word “effect”, you would type
net search random effect, or.
Author searches
Example: Find what is available by author Jeroen Weesie
. net search weesie
Comments:
You could type net search jeroen weesie, but that might list fewer results because sometimes
the last name is used without the first.
You could type net search Weesie, but it would not matter. Capitalization is ignored in the
search.
Example: Find what is available by Jeroen Weesie, excluding SJ and STB materials
. net search weesie, nosj
The SJ and the STB tend to dominate search results because so much has been published in
them. If you know that what you are looking for is not in the SJ or in the STB, specifying the
nosj option will narrow the search.
net search weesie lists everything that net search weesie, nosj lists, and more. If you
just type net search weesie, look down the list. SJ and STB materials are listed first, and
non-SJ and non-STB materials are listed last.
Command searches
Example: Find the user-written command kursus
. net search kursus, file
You could just type net search kursus, and that will list everything net search kursus,
file lists, and more. Because you know kursus is a command, however, there must be a
kursus.ado file associated with the package. Typing net search kursus, file narrows the
search.
You could also type net search kursus.ado, file to narrow the search even more.
Where does net search look?
net search looks everywhere, not just at http://www.stata.com.
net search begins by looking at http://www.stata.com, but then follows every link, which takes
it to other places, and then follows every link again, which takes it to even more places, and so on.
Authors: Please let us know if you have a site that we should include in our search by sending
an email to webmaster@stata.com. We will then link to your site from ours to ensure that net
search finds your materials. That is not strictly necessary, however, as long as your site is directly
or indirectly linked from some site that is linked to ours.
How does net search work?
[Diagram omitted: your computer talks to http://www.stata.com, whose net search database is built and maintained by a crawler that searches the Internet for Stata resources.]
Our website maintains a database of Stata resources. When you use net search, it contacts
http://www.stata.com with your request, http://www.stata.com searches its database, and Stata returns
the results to you.
Another part of the system is called the crawler, which searches the web for new Stata resources
to add to the net search database and verifies that the resources already found are still available.
When a new resource becomes available, the crawler takes about 2 days to add it to the database, and,
similarly, if a resource disappears, the crawler takes roughly 2 days to remove it from the database.
References
Baum, C. F., and N. J. Cox. 1999. ip29: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 52: 10–12. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 121–124. College
Station, TX: Stata Press.
Cox, N. J., and C. F. Baum. 2000. ip29.1: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 54: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 124–126. College
Station, TX: Stata Press.
Gould, W. W., and A. R. Riley. 2000. stata55: Search web for installable packages. Stata Technical Bulletin 54: 4–6.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 10–13. College Station, TX: Stata Press.
Also see
[R] adoupdate       Update user-written ado-files
[R] net             Install and manage user-written additions from the Internet
[R] search          Search Stata documentation and other resources
[R] sj              Stata Journal and STB installation instructions
[R] ssc             Install and uninstall packages from SSC
[R] update          Check for official updates
Title
netio — Control Internet connections
Syntax Description Options Remarks and examples Also see
Syntax
Turn on or off the use of a proxy server
    set httpproxy {on|off} [, init]

Set proxy host name
    set httpproxyhost "name"

Set the proxy port number
    set httpproxyport #

Turn on or off proxy authorization
    set httpproxyauth {on|off}

Set proxy authorization user ID
    set httpproxyuser "name"

Set proxy authorization password
    set httpproxypw "password"

Set time limit for establishing initial connection
    set timeout1 #seconds [, permanently]

Set time limit for data transfer
    set timeout2 #seconds [, permanently]
Description
Several commands (for example, net, news, and update) are designed specifically for use over
the Internet. Many other Stata commands that read a file (for example, copy, type, and use) can
also read directly from a URL. All of these commands will usually work without your ever needing
to concern yourself with the set commands discussed here. These set commands provide control
over network system parameters.
If you experience problems when using Stata’s network features, ask your system administrator if
your site uses a proxy. A proxy is a server between your computer and the rest of the Internet, and
your computer may need to communicate with other computers on the Internet through this proxy.
If your site uses a proxy, your system administrator can provide you with its host name and the port
your computer can use to communicate with it. If your site’s proxy requires you to log in to it before
it will respond, your system administrator will provide you with a user ID and password.
set httpproxyhost sets the name of the host to be used as a proxy server. set httpproxyport
sets the port number. set httpproxy turns on or off the use of a proxy server, leaving the proxy
host name and port intact, even when not in use.
Under the Mac and Windows operating systems, when you set httpproxy on, Stata will attempt
to obtain the values of httpproxyhost and httpproxyport from the operating system if they have
not been previously set. set httpproxy on, init attempts to obtain these values from the operating
system, even if they have been previously set.
If the proxy requires authorization (user ID and password), set authorization on via
set httpproxyauth on. The proxy user and proxy password must also be set to the appropriate user ID and
password by using set httpproxyuser and set httpproxypw.
Stata remembers the various proxy settings between sessions and does not need a permanently
option.
set timeout1 changes the time limit in seconds that Stata imposes for establishing the initial
connection with a remote host. The default value is 30. set timeout2 changes the time limit in
seconds that Stata imposes for subsequent data transfer with the host. The default value is 180. If
these time limits are exceeded, a “connection timed out” message and error code 2 are produced.
You should seldom need to change these settings.
Options
init specifies that set httpproxy on attempts to initialize httpproxyhost and httpproxyport
from the operating system (Mac and Windows only).
permanently specifies that, in addition to making the change right now, the timeout1 and timeout2
settings be remembered and become the default setting when you invoke Stata.
The various httpproxy settings do not have a permanently option because permanently is
implied.
Remarks and examples
If you receive an error message, see http://www.stata.com/support/faqs/web/ for the latest
information.
1. remote connection failed r(677);
If you see
remote connection failed
r(677);
then you asked for something to be done over the web, and Stata tried but could not contact the
specified host. Stata was able to talk over the network and look up the host but was not able to
establish a connection to that host. Perhaps the host is down; try again later.
If all your web accesses result in this message, then perhaps your network connection is through
a proxy server. If it is, then you must tell Stata.
Contact your system administrator. Ask for the name and port of the “HTTP proxy server”. Say
that you are told
HTTP proxy server: jupiter.myuni.edu
port number: 8080
In Stata, type
. set httpproxyhost jupiter.myuni.edu
. set httpproxyport 8080
. set httpproxy on
Your web accesses should then work.
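If the proxy also requires you to log in, the authorization settings are added to the same sequence; a sketch with made-up credentials:
. set httpproxyhost jupiter.myuni.edu
. set httpproxyport 8080
. set httpproxyuser "myid"            // user ID supplied by your system administrator
. set httpproxypw "mypassword"        // password supplied by your system administrator
. set httpproxyauth on
. set httpproxy on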
2. connection timed out r(2);
If you see
connection timed out
r(2);
then an Internet connection has timed out. This can happen when
a. the connection between you and the host is slow, or
b. the connection between you and the host has disappeared, and so it eventually “timed out”.
For (b), wait a while (say, 5 minutes) and try again (sometimes pieces of the Internet can break
for up to a day, but that is rare). For (a), you can reset the limits for what constitutes “timed out”.
There are two numbers to set.
The time to establish the initial connection is timeout1. By default, Stata waits 30 seconds before
declaring a timeout. You can change the limit:
. set timeout1 #seconds
You might try doubling the usual limit and specify 60; #seconds must be between 1 and 32,000.
The time to retrieve data from an open connection is timeout2. By default, Stata waits 180 seconds
(3 minutes) before declaring a timeout. To change the limit, type
. set timeout2 #seconds
You might try doubling the usual limit and specify 360; #seconds must be between 1 and 32,000.
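Because the timeout settings are not otherwise remembered between sessions, add the permanently option once you have settled on values; for example,
. set timeout1 60, permanently
. set timeout2 360, permanently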
Also see
[R] query           Display system parameters
[P] creturn         Return c-class values
[U] 28 Using the Internet to keep up to date
Title
news — Report Stata news
Syntax Menu Description Remarks and examples Also see
Syntax
news
Menu
Help > News
Description
news displays a brief listing of recent Stata news and information, which it obtains from Stata’s
website. news requires that your computer be connected to the Internet.
You may also execute news by selecting Help > News.
Remarks and examples
news provides an easy way of displaying a brief list of the latest Stata news:
. news
  (Stata logo)                News           The latest from http://www.stata.com
Stata 13 is now available. Visit http://www.stata.com/stata13/
for more information.
Quick summary: 1) 2-billion character long string variables.
2) Treatment-effects estimators. 3) More multilevel
mixed-effects models. 4) Multivariate mixed-effects and
generalized linear SEM. 5) Forecasts. 6) Power and sample
size. 7) New and extended random-effects panel-data estimators.
8) Effect sizes. 9) Project Manager. 10) Java plugins.
11) ...
Visit http://www.stata.com/public-training/ for course
offerings and dates.
Click here (equivalent to pulling down and selecting
) or type update from http://www.stata.com.
See http://www.stata.com/netcourse/ for more information.
(output omitted )
Also see
[U] 28 Using the Internet to keep up to date
Title
nl — Nonlinear least-squares estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
Interactive version
    nl (depvar = <sexp>) [if] [in] [weight] [, options]

Programmed substitutable expression version
    nl sexp_prog : depvar [varlist] [if] [in] [weight] [, options]

Function evaluator program version
    nl func_prog @ depvar [varlist] [if] [in] [weight],
        {parameters(namelist)|nparameters(#)} [options]

where
    depvar is the dependent variable;
    <sexp> is a substitutable expression;
    sexp_prog is a substitutable expression program; and
    func_prog is a function evaluator program.
options                        Description
Model
  variables(varlist)           variables in model
  initial(initial_values)      initial values for parameters
  parameters(namelist)         parameters in model (function evaluator program version only)
  nparameters(#)               number of parameters in model (function evaluator program version only)
  sexp_options                 options for substitutable expression program
  func_options                 options for function evaluator program

Model 2
  lnlsq(#)                     use log least-squares where ln(depvar - #) is assumed to be
                                 normally distributed
  noconstant                   the model has no constant term; seldom used
  hasconstant(name)            use name as constant term; seldom used

SE/Robust
  vce(vcetype)                 vcetype may be gnr, robust, cluster clustvar, bootstrap,
                                 jackknife, hac kernel, hc2, or hc3

Reporting
  level(#)                     set confidence level; default is level(95)
  leave                        create variables containing derivative of E(y)
  title(string)                display string as title above the table of parameter estimates
  title2(string)               display string as subtitle
  display_options              control column formats and line width

Optimization
  optimization_options         control the optimization process; seldom used
  eps(#)                       specify # for convergence criterion; default is eps(1e-5)
  delta(#)                     specify # for computing derivatives; default is delta(4e-7)

  coeflegend                   display legend instead of statistics

For the function evaluator program version, you must specify parameters(namelist) or
nparameters(#), or both.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce(), leave, and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, and iweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Nonlinear least squares
Description
nl fits an arbitrary nonlinear regression function by least squares. With the interactive version of
the command, you enter the function directly on the command line or in the dialog box by using a
substitutable expression. If you have a function that you use regularly, you can write a substitutable
expression program and use the second syntax to avoid having to reenter the function every time.
The function evaluator program version gives you the most flexibility in exchange for increased
complexity; with this version, your program is given a vector of parameters and a variable list, and
your program computes the regression function.
When you write a substitutable expression program or function evaluator program, the first two
letters of the name must be nl. sexp_prog and func_prog refer to the name of the program without
the first two letters. For example, if you wrote a function evaluator program named nlregss, you
would type nl regss @ . . . to estimate the parameters.
Options

Model
variables(varlist) specifies the variables in the model. nl ignores observations for which any of
these variables have missing values. If you do not specify variables(), then nl issues an error
message with return code 480 if the estimation sample contains any missing values.
initial(initial_values) specifies the initial values to begin the estimation. You can specify a 1 × k
matrix, where k is the number of parameters in the model, or you can specify a parameter name,
its initial value, another parameter name, its initial value, and so on. For example, to initialize
alpha to 1.23 and delta to 4.57, you would type
nl . . . , initial(alpha 1.23 delta 4.57) . . .
Initial values declared using this option override any that are declared within substitutable expres-
sions. If you specify a parameter that does not appear in your model, nl exits with error code
480. If you specify a matrix, the values must be in the same order that the parameters are declared
in your model. nl ignores the row and column names of the matrix.
parameters(namelist) specifies the names of the parameters in the model. The names of the
parameters must adhere to the naming conventions of Stata’s variables; see [U] 11.3 Naming
conventions. If you specify both parameters() and nparameters(), the number of names in
the former must match the number specified in the latter; if not, nl issues an error message with
return code 198.
nparameters(#) specifies the number of parameters in the model. If you do not specify names with
the parameters() option, nl names them b1, b2, ..., b#. If you specify both parameters()
and nparameters(), the number of names in the former must match the number specified in the
latter; if not, nl issues an error message with return code 198.
sexp_options refer to any options allowed by your sexp_prog.
func_options refer to any options allowed by your func_prog.

Model 2

lnlsq(#) fits the model by using log least-squares, which we define as least squares with shifted
lognormal errors. In other words, ln(depvar - #) is assumed to be normally distributed. Sums of
squares and deviance are adjusted to the same scale as depvar.
noconstant indicates that the function does not include a constant term. This option is generally
not needed, even if there is no constant term in the model, unless the coefficient of variation (over
observations) of the partial derivative of the function with respect to a parameter is less than eps()
and that parameter is not a constant term.
hasconstant(name) indicates that parameter name be treated as the constant term in the model and
that nl should not use its default algorithm to find a constant term. As with noconstant, this
option is seldom used.

SE/Robust

vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (gnr), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
vce(gnr), the default, uses the conventionally derived variance estimator for nonlinear models fit
using Gauss-Newton regression.
nl also allows the following:
vce(hac kernel [#]) specifies that a heteroskedasticity- and autocorrelation-consistent (HAC)
variance estimate be used. HAC refers to the general form for combining weighted matrices to
form the variance estimate. There are three kernels available for nl:

    nwest | gallant | anderson

# specifies the number of lags. If # is not specified, N - 2 is assumed.

vce(hac kernel [#]) is not allowed if weights are specified.

vce(hc2) and vce(hc3) specify alternative bias corrections for the robust variance calculation.
vce(hc2) and vce(hc3) may not be specified with the svy prefix. By default, vce(robust)
uses σ̂_j² = {n/(n - k)} u_j² as an estimate of the variance of the jth observation, where u_j is the
calculated residual and n/(n - k) is included to improve the overall estimate's small-sample
properties.

vce(hc2) instead uses u_j²/(1 - h_jj) as the observation's variance estimate, where h_jj is the
jth diagonal element of the hat (projection) matrix. This produces an unbiased estimate of the
covariance matrix if the model is homoskedastic. vce(hc2) tends to produce slightly more
conservative confidence intervals than vce(robust).

vce(hc3) uses u_j²/(1 - h_jj)² as suggested by Davidson and MacKinnon (1993 and 2004),
who report that this often produces better results when the model is heteroskedastic. vce(hc3)
produces confidence intervals that tend to be even more conservative.
See, in particular, Davidson and MacKinnon (2004, 239), who advocate the use of vce(hc2)
or vce(hc3) instead of the plain robust estimator for nonlinear least squares.

Reporting

level(#); see [R] estimation options.
leave leaves behind after estimation a set of new variables with the same names as the estimated
parameters containing the derivatives of E(y)with respect to the parameters. If the dataset contains
an existing variable with the same name as a parameter, then using leave causes nl to issue an
error message with return code 110.
leave may not be specified with vce(cluster clustvar)or the svy prefix.
title(string) specifies an optional title that will be displayed just above the table of parameter
estimates.
title2(string) specifies an optional subtitle that will be displayed between the title specified in
title() and the table of parameter estimates. If title2() is specified but title() is not,
title2() has the same effect as title().
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.

Optimization

optimization_options: iterate(#), nolog, trace. iterate() specifies the maximum number of
iterations, log/nolog specifies whether to show the iteration log, and trace specifies that the
iteration log should include the current parameter vector. These options are seldom used.
eps(#) specifies the convergence criterion for successive parameter estimates and for the residual
sum of squares. The default is eps(1e-5).
delta(#) specifies the relative change in a parameter to be used in computing the numeric
derivatives. The derivative for parameter β_i is computed as

    {f(X, β_1, β_2, ..., β_i + d, β_(i+1), ...) - f(X, β_1, β_2, ..., β_i, β_(i+1), ...)}/d

where d is δ(|β_i| + δ). The default is delta(4e-7).
The following options are available with nl but are not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Substitutable expressions
Substitutable expression programs
Built-in functions
Lognormal errors
Other uses
Weights
Potential errors
General comments on fitting nonlinear models
Function evaluator programs
nl fits an arbitrary nonlinear function by least squares. The interactive version allows you to enter
the function directly on the command line or dialog box using substitutable expressions. You can
write a substitutable expression program for functions that you fit frequently to save yourself time.
Finally, function evaluator programs give you the most flexibility in defining your nonlinear function,
though they are more complicated to use.
The next section explains the substitutable expressions that are used to define the regression
function, and the section thereafter explains how to write substitutable expression program files so
that you do not need to type in commonly used functions over and over. Later sections highlight
other features of nl.
The final section discusses function evaluator programs. If you find substitutable expressions
adequate to define your nonlinear function, then you can skip that section entirely. Function evaluator
programs are generally needed only for complicated problems, such as multistep estimators. The
program receives a vector of parameters at which it is to compute the function and a variable into
which the results are to be placed.
Substitutable expressions
You define the nonlinear function to be fit by nl by using a substitutable expression. Substitutable
expressions are just like any other mathematical expressions involving scalars and variables, such as
those you would use with Stata’s generate command, except that the parameters to be estimated are
bound in braces. See [U] 13.2 Operators and [U] 13.3 Functions for more information on expressions.
For example, suppose that you wish to fit the function
    y_i = β0{1 - exp(-β1·x_i)} + ε_i

where β0 and β1 are the parameters to be estimated and ε_i is an error term. You would simply type
. nl (y = {b0}*(1 - exp(-1*{b1}*x)))
You must enclose the entire equation in parentheses. Because b0 and b1 are enclosed in braces, nl
knows that they are parameters in the model. nl will initialize b0 and b1 to zero by default. To
request that nl initialize b0 to 1 and b1 to 0.25, you would type
. nl (y = {b0=1}*(1 - exp(-1*{b1=0.25}*x)))
That is, inside the braces denoting a parameter, you put the parameter name followed by an equal sign
and the initial value. If a parameter appears in your function multiple times, you need to specify
an initial value only once (or never, if you wish to set the initial value to zero). If you do specify
more than one initial value for the same parameter, nl will use the last value given. Parameter names
must follow the same conventions as variable names. See [U] 11.3 Naming conventions.
Frequently, even nonlinear functions contain linear combinations of variables. As an example,
suppose that you wish to fit the function
    y_i = β0[1 - exp{-(β1·x1_i + β2·x2_i + β3·x3_i)}] + ε_i
nl allows you to declare a linear combination of variables by using the shorthand notation
. nl (y = {b0=1}*(1 - exp(-1*{xb: x1 x2 x3})))
In the syntax {xb: x1 x2 x3}, you are telling nl that you are declaring a linear combination named
xb that is a function of three variables, x1, x2, and x3. nl will create three parameters, named
xb_x1, xb_x2, and xb_x3, and initialize them to zero. Instead of typing the previous command, you
could have typed
. nl (y = {b0=1}*(1 - exp(-1*({xb_x1}*x1 + {xb_x2}*x2 + {xb_x3}*x3))))
and yielded the same result. You can refer to the parameters created by nl in the linear combination
later in the function, though you must declare the linear combination first if you intend to do that.
When creating linear combinations, nl ensures that the parameter names it chooses are unique and
have not yet been used in the function.
In general, there are three rules to follow when defining substitutable expressions:
1. Parameters of the model are bound in braces: {b0}, {param}, etc.
2. Initial values for parameters are given by including an equal sign and
the initial value inside the braces: {b0=1}, {param=3.571}, etc.
3. Linear combinations of variables can be included using the notation
{eqname: varlist}, for example, {xb: mpg price weight}, {score: w x z}, etc.
Parameters of linear combinations are initialized to zero.
If you specify initial values by using the initial() option, they override whatever initial values
are given within the substitutable expression. Substitutable expressions are so named because, once
values are assigned to the parameters, the resulting expression can be handled by generate and
replace.
Example 1
We wish to fit the CES production function
    ln Q_i = β0 - (1/ρ) ln{δ·K_i^(-ρ) + (1 - δ)·L_i^(-ρ)} + ε_i        (1)

where ln Q_i is the log of output for firm i; K_i and L_i are firm i's capital and labor usage, respectively;
and ε_i is a regression error term. Because ρ appears in the denominator of a fraction, zero is not a
feasible initial value; for a CES production function, ρ = 1 is a reasonable choice. Setting δ = 0.5
implies that labor and capital have equal impacts on output, which is also a reasonable choice for an
initial value. We type
. use http://www.stata-press.com/data/r13/production
. nl (lnoutput = {b0} - 1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho}) +
> (1 - {delta})*labor^(-1*{rho})))
(obs = 100)
Iteration 0: residual SS = 29.38631
Iteration 1: residual SS = 29.36637
Iteration 2: residual SS = 29.36583
Iteration 3: residual SS = 29.36581
Iteration 4: residual SS = 29.36581
Iteration 5: residual SS = 29.36581
Iteration 6: residual SS = 29.36581
Iteration 7: residual SS = 29.36581
      Source |       SS       df       MS         Number of obs  =       100
-------------+------------------------------      R-squared      =    0.7563
       Model |  91.1449924     2  45.5724962      Adj R-squared  =    0.7513
    Residual |  29.3658055    97  .302740263      Root MSE       =  .5502184
-------------+------------------------------      Res. dev.      =  161.2538
       Total |  120.510798    99  1.21728079

    lnoutput |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b0 |   3.792158    .099682    38.04   0.000     3.594316    3.989999
        /rho |   1.386993    .472584     2.93   0.004     .4490443    2.324941
      /delta |   .4823616   .0519791     9.28   0.000     .3791975    .5855258
-------------+----------------------------------------------------------------
Parameter b0 taken as constant term in model & ANOVA table
nl will attempt to find a constant term in the model and, if one is found, mention it at the bottom of
the output. nl found b0 to be a constant because the partial derivative ∂ln Q_i/∂b0 has a coefficient
of variation less than eps() in the estimation sample.
The elasticity of substitution for the CES production function is σ = 1/(1 + ρ); and, having fit
the model, we can use nlcom to estimate it:
. nlcom (1/(1 + _b[/rho]))
_nl_1: 1/(1 + _b[/rho])
    lnoutput |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .4189372   .0829424     5.05   0.000      .256373    .5815014
See [R] nlcom and [U] 13.5 Accessing coefficients and standard errors for more information.
nl's output closely mimics that of regress; see [R] regress for more information. The R², sums
of squares, and similar statistics are calculated in the same way that regress calculates them. If
no "constant" term is specified, the usual caveats apply to the interpretation of the R² statistic; see
the comments and references in Goldstein (1992). Unlike regress, nl does not report a model F
statistic, because a test of the joint significance of all the parameters except the constant term may
not be relevant in a nonlinear model.
Substitutable expression programs
If you fit the same model often or if you want to write an estimator that will operate on whatever
variables you specify, then you will want to write a substitutable expression program. That program
will return a macro containing a substitutable expression that nl can then evaluate, and it may
optionally calculate initial values as well. The name of the program must begin with the letters nl.
To illustrate, suppose that you use the CES production function often in your work. Instead of
typing in the formula each time, you can write a program like this:
program nlces, rclass
        version 13
        syntax varlist(min=3 max=3) [if]
        local logout : word 1 of `varlist'
        local capital : word 2 of `varlist'
        local labor : word 3 of `varlist'
        // Initial value for b0 given delta=0.5 and rho=1
        tempvar y
        generate double `y' = `logout' + ln(0.5*`capital'^-1 + 0.5*`labor'^-1)
        summarize `y' `if', meanonly
        local b0val = r(mean)
        // Terms for substitutable expression
        local capterm "{delta=0.5}*`capital'^(-1*{rho})"
        local labterm "(1-{delta})*`labor'^(-1*{rho})"
        local term2 "1/{rho=1}*ln(`capterm' + `labterm')"
        // Return substitutable expression and title
        return local eq "`logout' = {b0=`b0val'} - `term2'"
        return local title "CES ftn., ln Q=`logout', K=`capital', L=`labor'"
end
The program accepts three variables for log output, capital, and labor, and it accepts an if exp
qualifier to restrict the estimation sample. All programs that you write to use with nl must accept
an if exp qualifier because, when nl calls the program, it passes a binary variable that marks the
estimation sample (the variable equals one if the observation is in the sample and zero otherwise).
When calculating initial values, you will want to restrict your computations to the estimation sample,
and you can do so by using if with any commands that accept if exp qualifiers. Even if your
program does not calculate initial values or otherwise use the if qualifier, the syntax statement must
still allow it. See [P]syntax for more information on the syntax command and the use of if.
As in the previous example, reasonable initial values for δ and ρ are 0.5 and 1, respectively.
Conditional on those values, (1) can be rewritten as

    β0 = ln Q_i + ln(0.5·K_i^(-1) + 0.5·L_i^(-1)) - ε_i        (2)

so a good initial value for β0 is the mean of the right-hand side of (2) ignoring ε_i. Lines 7–10 of
the program calculate that mean and store it in a local macro. Notice the use of
`if' in the summarize statement so that the mean is calculated only for the estimation sample.
The final part of the program returns two macros. The macro title is optional and defines a
short description of the model that will be displayed in the output immediately above the table of
parameter estimates. The macro eq is required and defines the substitutable expression that nl will
use. If the expression is short, you can define it all at once. However, because the expression used
here is somewhat lengthy, defining local macros and then building up the final expression from them
is easier.
To verify that there are no errors in your program, you can call it directly and then use return
list:
. use http://www.stata-press.com/data/r13/production
. nlces lnoutput capital labor
(output omitted )
. return list
macros:
r(title) : "CES ftn., ln Q=lnoutput, K=capital, L=labor"
r(eq) : "lnoutput = {b0=3.711606264663641} - 1/{rho=1}*ln({delt
> a=0.5}*capital^(-1*{rho}) + (1-{delta})*labor^(-1*{rho}))"
The macro r(eq) contains the same substitutable expression that we specified at the command line
in the preceding example, except for the initial value for b0. In short, an nl substitutable expression
program should return in r(eq) the same substitutable expression you would type at the command
line. The only difference is that when writing a substitutable expression program, you do not bind
the entire expression inside parentheses.
Having written the program, you can use it by typing
. nl ces: lnoutput capital labor
(There is a space between nl and ces.) The output is identical to that shown in example 1, save
for the title defined in the function evaluator program that appears immediately above the table of
parameter estimates.
Technical note
You will want to store nlces as an ado-file called nlces.ado. The alternative is to type the code
into Stata interactively or to place the code in a do-file. While those alternatives are adequate for
occasional use, if you save the program as an ado-file, you can use the function anytime you use
Stata without having to redefine the program. When nl attempts to execute nlces, if the program is
not in Stata’s memory, Stata will search the disk(s) for an ado-file of the same name and, if found,
automatically load it. All you have to do is name the file with the .ado suffix and then place it
in a directory where Stata will find it. You should put the file in the directory Stata reserves for
user-written ado-files, which, depending on your operating system, is c:\ado\personal (Windows),
~/ado/personal (Unix), or ~:ado:personal (Mac). See [U] 17 Ado-files.
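To confirm that Stata can find the file once you have saved it, the adopath and which commands are handy; a sketch:
. adopath                    // lists the directories Stata searches, including PERSONAL
. which nlces                // reports where nlces.ado was found along that path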
Sometimes you may want to pass additional options to the substitutable expression program. You
can modify the syntax statement of your program to accept whatever options you wish. Then when
you call nl with the syntax
. nl sexp_prog: varlist, options
any options that are not recognized by nl (see the table of options at the beginning of this entry) are
passed on to your substitutable expression program. The only other restriction is that your program cannot
accept an option named at because nl uses that option with function evaluator programs.
Built-in functions
Some functions are used so often that nl has them built in so that you do not need to write
them yourself. nl automatically chooses initial values for the parameters, though you can use the
initial(. . .)option to override them.
Three alternatives are provided for exponential regression with one asymptote:
    exp3     y_i = β0 + β1·β2^(x_i) + ε_i
    exp2     y_i = β1·β2^(x_i) + ε_i
    exp2a    y_i = β1{1 - β2^(x_i)} + ε_i
For instance, typing nl exp3: ras dvl fits the three-parameter exponential model (parameters β0,
β1, and β2) using y_i = ras and x_i = dvl.
Two alternatives are provided for the logistic function (symmetric sigmoid shape; not to be confused
with logistic regression):
    log4     y_i = β0 + β1/[1 + exp{-β2(x_i - β3)}] + ε_i
    log3     y_i = β1/[1 + exp{-β2(x_i - β3)}] + ε_i
Finally, two alternatives are provided for the Gompertz function (asymmetric sigmoid shape):
    gom4     y_i = β0 + β1·exp[-exp{-β2(x_i - β3)}] + ε_i
    gom3     y_i = β1·exp[-exp{-β2(x_i - β3)}] + ε_i
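With hypothetical variables y and x, the built-in curves are requested by name; for example,
. nl log4: y x
. nl gom3: y x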
Lognormal errors
A nonlinear model with errors that are independent and identically distributed normal may be
written as

    y_i = f(x_i, β) + u_i,     u_i ~ N(0, σ²)                                  (3)

for i = 1, ..., n. If the y_i are thought to have a k-shifted lognormal instead of a normal
distribution -- that is, ln(y_i - k) ~ N(ζ_i, τ²) -- and the systematic part f(x_i, β) of the original
model is still thought appropriate for y_i, the model becomes

    ln(y_i - k) = ζ_i + v_i = ln{f(x_i, β) - k} + v_i,     v_i ~ N(0, τ²)      (4)

This model is fit if lnlsq(k) is specified.
If model (4) is correct, the variance of (y_i - k) is proportional to {f(x_i, β) - k}². Probably the
most common case is k = 0, sometimes called "proportional errors" because the standard error of y_i
is proportional to its expectation, f(x_i, β). Assuming that the value of k is known, (4) is just another
nonlinear model in β, and it may be fit as usual. However, we may wish to compare the fit of (3) with
that of (4) using the residual sum of squares (RSS) or the deviance D, D = -2 × log-likelihood, from
each model. To do so, we must allow for the change in scale introduced by the log transformation.
Assuming, then, the y_i to be normally distributed, Atkinson (1985, 85–87, 184), by considering
the Jacobian ∏ |∂ln(y_i - k)/∂y_i|, showed that multiplying both sides of (4) by the geometric mean
of y_i - k, ẏ, gives residuals on the same scale as those of y_i. The geometric mean is given by

    ẏ = exp{ (1/n) Σ ln(y_i - k) }

which is a constant for a given dataset. The residual deviance for (3) and for (4) may be expressed
as

    D(β̂) = {1 + ln(2πσ̂²)} n                                                   (5)
where β̂ is the maximum likelihood estimate (MLE) of β for each model and nσ̂² is the RSS from
(3), or that from (4) multiplied by ẏ².
Because (3) and (4) are models with different error structures but the same functional form, the
arithmetic difference in their RSS or deviances is not easily tested for statistical significance. However,
if the deviance difference is large (>4, say), we would naturally prefer the model with the smaller
deviance. Of course, the residuals for each model should be examined for departures from assumptions
(nonconstant variance, nonnormality, serial correlations, etc.) in the usual way.
Alternatively, consider modeling

    E(y_i) = 1/(C + A·e^(B·x_i))                                               (6)
    E(1/y_i) = E(y′_i) = C + A·e^(B·x_i)                                       (7)

where C, A, and B are parameters to be estimated. Using the data (y, x) = (0.04, 5), (0.06, 12),
(0.08, 25), (0.1, 35), (0.15, 42), (0.2, 48), (0.25, 60), (0.3, 75), and (0.5, 120) (Danuso 1991), fitting
the models yields
    Model                    C        A         B          RSS       Deviance
    (6)                    1.781    25.74    -0.03926    0.001640     -51.95
    (6) with lnlsq(0)      1.799    25.45    -0.04051    0.001431     -53.18
    (7)                    1.781    25.74    -0.03926    8.197         24.70
    (7) with lnlsq(0)      1.799    27.45    -0.04051    3.651         17.42
There is little to choose between the two versions of the logistic model (6), whereas for the exponential
model (7), the fit using lnlsq(0) is much better (a deviance difference of 7.28). The reciprocal
transformation has introduced heteroskedasticity into y′_i, which is countered by the proportional
errors property of the lognormal distribution implicit in lnlsq(0). The deviances are not comparable
between the logistic and exponential models because the change of scale has not been allowed for,
although in principle it could be.
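The comparison for model (6) can be reproduced along these lines; this is only a sketch, with starting values placed near the reported estimates rather than chosen by any general rule:
. clear
. input y x
  .04   5
  .06  12
  .08  25
  .10  35
  .15  42
  .20  48
  .25  60
  .30  75
  .50 120
  end
. nl (y = 1/({C=1} + {A=25}*exp({B=-0.04}*x)))
. nl (y = 1/({C=1} + {A=25}*exp({B=-0.04}*x))), lnlsq(0)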
Other uses
Even if you are fitting linear regression models, you may find that nl can save you some typing.
Because you specify the parameters of your model explicitly, you can impose constraints on them
directly.
Example 2
In example 2 of [R] cnsreg, we showed how to fit the model

    mpg = β0 + β1·price + β2·weight + β3·displ + β4·gear_ratio + β5·foreign + β6·length + u

subject to the constraints

    β1 = β2 = β3 = β6
    β4 = -β5 = β0/20
An alternative way is to use nl:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. nl (mpg = {b0} + {b1}*price + {b1}*weight + {b1}*displ +
> {b0}/20*gear_ratio - {b0}/20*foreign + {b1}*length)
(obs = 74)
Iteration 0: residual SS = 1578.522
Iteration 1: residual SS = 1578.522
      Source |       SS       df       MS         Number of obs  =        74
-------------+------------------------------      R-squared      =    0.9562
       Model |  34429.4777     2  17214.7389      Adj R-squared  =    0.9549
    Residual |  1578.52226    72  21.9239203      Root MSE       =  4.682299
-------------+------------------------------      Res. dev.      =  436.4562
       Total |       36008    74  486.594595

         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b0 |   26.52229   1.375178    19.29   0.000     23.78092    29.26365
         /b1 |   -.000923   .0001534    -6.02   0.000    -.0012288   -.0006172
The point estimates and standard errors for β0 and β1 are identical to those reported in example 2
of [R] cnsreg. To get the estimate for β4, we can use nlcom:
. nlcom _b[/b0]/20
_nl_1: _b[/b0]/20
         mpg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   1.326114   .0687589    19.29   0.000     1.191349    1.460879
The advantage to using nl is that we do not need to use the constraint command six times.
nl is also a useful tool when doing exploratory data analysis. For example, you may want to run
a regression of y on a function of x, though you have not decided whether to use sqrt(x) or ln(x).
You can use nl to run both regressions without having first to generate two new variables:
. nl (y = {b0} + {b1}*ln(x))
. nl (y = {b0} + {b1}*sqrt(x))
Poi (2008) shows the advantages of using nl when marginal effects of transformed variables are
desired as well.
Weights
Weights are specified in the usual way: analytic and frequency weights as well as iweights
are supported; see [U] 20.23 Weighted estimation. Use of analytic weights implies that the y_i have
different variances. Therefore, model (3) may be rewritten as

    y_i = f(x_i, β) + u_i,     u_i ~ N(0, σ²/w_i)                              (3a)

where w_i are (positive) weights, assumed to be known and normalized such that their sum equals the
number of observations. The residual deviance for (3a) is

    D(β̂) = {1 + ln(2πσ̂²)} n - Σ ln(w_i)                                       (5a)
[compare with (5)], where

    nσ̂² = RSS = Σ w_i {y_i - f(x_i, β̂)}²

Defining and fitting a model equivalent to (4) when weights have been specified as in (3a) is not
straightforward and has not been attempted. Thus deviances using and not using the lnlsq() option
may not be strictly comparable when analytic weights (other than 0 and 1) are used.
You do not need to modify your substitutable expression in any way to use weights. If, however,
you write a substitutable expression program, then you should account for weights when obtaining
initial values. When nl calls your program, it passes whatever weight expression (if any) was specified
by the user. Here is an outline of a substitutable expression program that accepts weights:
program nlname, rclass
        version 13
        syntax varlist [aw fw iw] if
        ...
        // Obtain initial values allowing weights
        // Use the syntax [`weight'`exp']. For example,
        summarize varname [`weight'`exp'] `if'
        regress depvar varlist [`weight'`exp'] `if'
        ...
        // Return substitutable expression
        return local eq "substitutable expression"
        return local title "description of estimator"
end
For details on how the syntax command processes weight expressions, see [P] syntax.
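Calling such a program then follows the usual weighted syntax; a sketch with a hypothetical program nlmyfn.ado and a hypothetical analytic-weight variable wvar:
. nl myfn: y x1 x2 [aweight=wvar]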
Potential errors
nl is reasonably robust to the inability of your nonlinear function to be evaluated at some parameter
values. nl does assume that your function can be evaluated at the initial values of the parameters. If
your function cannot be evaluated at the initial values, an error message is issued with return code
480. Recall that if you do not specify an initial value for a parameter, then nl initializes it to zero.
Many nonlinear functions cannot be evaluated when some parameters are zero, so in those cases
specifying alternative initial values is crucial.
Thereafter, as nl changes the parameter values, it monitors your function for unexpected missing
values. If these are detected, nl backs up. That is, nl finds a point between the previous, known-to-
be-good parameter vector and the new, known-to-be-bad vector at which the function can be evaluated
and continues its iterations from that point.
nl requires that once a parameter vector is found where the predictions can be calculated, small
changes to the parameter vector be made to calculate numeric derivatives. If a boundary is encountered
at this point, an error message is issued with return code 481.
When specifying lnlsq(), an attempt to take logarithms of y_i - k when y_i ≤ k results in an
error message with return code 482.
If iterate() iterations are performed and estimates still have not converged, results are presented
with a warning, and the return code is set to 430.
If you use the programmed substitutable expression version of nl with a function evaluator program,
or vice versa, Stata issues an error message. Verify that you are using the syntax appropriate for the
program you have.
General comments on fitting nonlinear models
Achieving convergence is often problematic. For example, a unique minimum of the sum-of-
squares function may not exist. Much literature exists on different algorithms that have been used,
on strategies for obtaining good initial parameter values, and on tricks for parameterizing the model
to make its behavior as linear-like as possible. Selected references are Kennedy and Gentle (1980,
chap. 10) for computational matters and Ross (1990) and Ratkowsky (1983) for all three aspects.
Ratkowsky’s book is particularly clear and approachable, with useful discussion on the meaning and
practical implications of intrinsic and parameter-effects nonlinearity. An excellent text on nonlinear
estimation is Gallant (1987). Also see Davidson and MacKinnon (1993 and 2004).
To enhance the success of nl, pay attention to the form of the model fit, along the lines of
Ratkowsky and Ross. For example, Ratkowsky (1983, 49–59) analyzes three possible three-parameter
yield-density models for plant growth:
    E(y_i) = (\alpha + \beta x_i)^{-1}
    E(y_i) = (\alpha + \beta x_i + \gamma x_i^2)^{-1}
    E(y_i) = (\alpha + \beta x_i^{\phi})^{-1}
All three models give similar fits. However, he shows that the second formulation is dramatically
more linear-like than the other two and therefore has better convergence properties. In addition, the
parameter estimates are virtually unbiased and normally distributed, and the asymptotic approximation
to the standard errors, correlations, and confidence intervals is much more accurate than for the other
models. Even within a given model, the way the parameters are expressed (for example, φ^{x_i} or e^{θ x_i})
affects the degree of linearity and convergence behavior.
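As an illustration with hypothetical variables y and x, the two commands below describe the same curve, with phi = exp(theta), yet the two parameterizations can differ markedly in how linear-like the objective function is and therefore in how readily nl converges:

. nl (y = {b0} + {b1}*{phi=0.9}^x)
. nl (y = {b0} + {b1}*exp({theta=-0.1}*x))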
Function evaluator programs
Occasionally, a nonlinear function may be so complex that writing a substitutable expression for it
is impractical. For example, there could be many parameters in the model. Alternatively, if you are
implementing a two-step estimator, writing a substitutable expression may be altogether impossible.
Function evaluator programs can be used in these situations.
nl will pass to your function evaluator program a list of variables, a weight expression, a variable
marking the estimation sample, and a vector of parameters. Your program is to replace the dependent
variable, which is the first variable in the variables list, with the values of the nonlinear function
evaluated at those parameters. As with substitutable expression programs, the first two letters of the
name must be nl.
To focus on the mechanics of the function evaluator program, again let’s compare the CES production
function to the previous examples. The function evaluator program is
program nlces2
        version 13
        syntax varlist(min=3 max=3) if, at(name)
        local logout : word 1 of `varlist'
        local capital : word 2 of `varlist'
        local labor : word 3 of `varlist'
        // Retrieve parameters out of at matrix
        tempname b0 rho delta
        scalar `b0' = `at'[1, 1]
        scalar `rho' = `at'[1, 2]
        scalar `delta' = `at'[1, 3]
        tempvar kterm lterm
        generate double `kterm' = `delta'*`capital'^(-1*`rho') `if'
        generate double `lterm' = (1-`delta')*`labor'^(-1*`rho') `if'
        // Fill in dependent variable
        replace `logout' = `b0' - 1/`rho'*ln(`kterm' + `lterm') `if'
end
Unlike the previous nlces program, this one is not declared to be r-class. The syntax statement
again accepts three variables: one for log output, one for capital, and one for labor. An if exp is
again required because nl will pass a binary variable marking the estimation sample. All function
evaluator programs must accept an option named at() that takes a name as an argument; that is
how nl passes the parameter vector to your program.
The next part of the program retrieves the output, labor, and capital variables from the variables
list. It then breaks up the temporary matrix at and retrieves the parameters b0, rho, and delta. Pay
careful attention to the order in which the parameters refer to the columns of the at matrix because
that will affect the syntax you use with nl. The temporary names you use inside this program are
immaterial, however.
The rest of the program computes the nonlinear function, using some temporary variables to hold
intermediate results. The final line of the program then replaces the dependent variable with the values
of the function. Notice the use of ‘if’ to restrict attention to the estimation sample. nl makes a
copy of your dependent variable so that when the command is finished your data are left unchanged.
To use the program and fit your model, you type
. use http://www.stata-press.com/data/r13/production, clear
. nl ces2 @ lnoutput capital labor, parameters(b0 rho delta)
> initial(b0 0 rho 1 delta 0.5)
The output is again identical to that shown in example 1. The order in which the parameters were
specified in the parameters() option is the same order in which they are retrieved from the at matrix in
the program. To initialize them, you simply list the parameter name, a space, the initial value, and
so on.
If you use the nparameters() option instead of the parameters() option, the parameters are
named b1, b2, ..., bk, where k is the number of parameters. Thus you could have typed
. nl ces2 @ lnoutput capital labor, nparameters(3) initial(b1 0 b2 1 b3 0.5)
With that syntax, the parameters called b0, rho, and delta in the program will be labeled b1, b2,
and b3, respectively. In programming situations or if there are many parameters, instead of listing
the parameter names and initial values in the initial() option, you may find it more convenient
to pass a column vector. In those cases, you could type
. matrix myvals = (0, 1, 0.5)
. nl ces2 @ lnoutput capital labor, nparameters(3) initial(myvals)
In summary, a function evaluator program receives a list of variables, the first of which is the
dependent variable that you are to replace with the values of your nonlinear function. Additionally,
it must accept an if exp, as well as an option named at that will contain the vector of parameters
at which nl wants the function evaluated. You are then free to do whatever is necessary to evaluate
your function and replace the dependent variable.
If you wish to use weights, your function evaluator program’s syntax statement must accept
them. If your program consists only of, for example, generate statements, you need not do anything
with the weights passed to your program. However, if in calculating the nonlinear function you
use commands such as summarize or regress, then you will want to use the weights with those
commands.
As with substitutable expression programs, nl will pass to it any options specified that nl does
not accept, providing you with a way to pass more information to your function.
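For example, here is a minimal sketch of a function evaluator program that accepts such a pass-through option; the program name nlmyfcn2, the option power(), and the variables are all hypothetical:

program nlmyfcn2
        version 13
        syntax varlist(min=2 max=2) if, at(name) [power(real 1)]
        local y : word 1 of `varlist'
        local x : word 2 of `varlist'
        tempname a b
        scalar `a' = `at'[1, 1]
        scalar `b' = `at'[1, 2]
        // power() was typed on the nl command line and passed through by nl
        replace `y' = `a' + `b'*`x'^`power' `if'
end

It could then be invoked as, say, nl myfcn2 @ yvar xvar, nparameters(2) power(2).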
Technical note
Before version 9 of Stata, the nl command used a different syntax, which required you to write
an nlfcn program, and it did not have a syntax for interactive use other than the seven functions that
were built-in. The old syntax of nl still works, and you can still use those nlfcn programs. If nl
does not see a colon, an at sign, or a set of parentheses surrounding the equation in your command,
it assumes that the old syntax is being used.
The current version of nl uses scalars and matrices to store intermediate calculations instead of
local and global macros as the old version did, so the current version produces more accurate results.
In practice, however, any discrepancies are likely to be small.
Stored results
nl stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq model) number of equations in overall model test; always 0
e(df m) model degrees of freedom
e(df r) residual degrees of freedom
e(df t) total degrees of freedom
e(mss) model sum of squares
e(rss) residual sum of squares
e(tss) total sum of squares
e(mms) model mean square
e(msr) residual mean square
e(ll) log likelihood assuming i.i.d. normal errors
e(r2) R-squared
e(r2 a) adjusted R-squared
e(rmse) root mean squared error
e(dev) residual deviance
e(N clust) number of clusters
e(lnlsq) value of lnlsq if specified
e(log t) 1 if lnlsq specified, 0 otherwise
e(gm 2) square of geometric mean of (y − k) if lnlsq; 1 otherwise
e(cj) position of constant in e(b) or 0 if no constant
e(delta) relative change used to compute derivatives
e(rank) rank of e(V)
e(ic) number of iterations
e(converge) 1 if converged, 0 otherwise
Macros
e(cmd) nl
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(title 2) secondary title in estimation output
e(clustvar) name of cluster variable
e(hac kernel) HAC kernel
e(hac lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(type) 1 = interactively entered expression
2 = substitutable expression program
3 = function evaluator program
e(sexp) substitutable expression
e(params) names of parameters
e(funcprog) function evaluator program
e(rhs) contents of variables()
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(init) initial values vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
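For example, after fitting a model with nl, individual stored results can be retrieved directly:

. display "Root MSE = " e(rmse) "   Residual deviance = " e(dev)
. matrix list e(init)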
Methods and formulas
The derivation here is based on Davidson and MacKinnon (2004, chap. 6). Let β denote the k×1
vector of parameters, and write the regression function using matrix notation as y = f(x, β) + u, so
that the objective function can be written as

    SSR(\beta) = \{ y - f(x, \beta) \}' D \{ y - f(x, \beta) \}

The D matrix contains the weights and is defined in [R] regress; if no weights are specified, then D
is the N×N identity matrix. Taking a second-order Taylor series expansion centered at β0 yields

    SSR(\beta) \approx SSR(\beta_0) + g'(\beta_0)(\beta - \beta_0) + \tfrac{1}{2}(\beta - \beta_0)' H(\beta_0)(\beta - \beta_0)    (8)

where g(β0) denotes the k×1 gradient of SSR(β) evaluated at β0 and H(β0) denotes the k×k
Hessian of SSR(β) evaluated at β0. Letting X denote the N×k matrix of derivatives of f(x, β)
with respect to β, the gradient g(β) is

    g(\beta) = -2 X' D u    (9)

X and u are obviously functions of β, though for notational simplicity that dependence is not shown
explicitly. The (m, n) element of the Hessian can be written as

    H_{mn}(\beta) = -2 \sum_{i=1}^{N} d_{ii} \left\{ \frac{\partial^2 f_i}{\partial\beta_m\,\partial\beta_n}\, u_i - X_{im} X_{in} \right\}    (10)

where d_ii is the ith diagonal element of D. As discussed in Davidson and MacKinnon (2004, chap. 6),
the first term inside the brackets of (10) has expectation zero, so the Hessian can be approximated as

    H(\beta) = 2 X' D X    (11)

Differentiating the Taylor series expansion of SSR(β) shown in (8) yields the first-order condition
for a minimum

    g(\beta_0) + H(\beta_0)(\beta - \beta_0) = 0

which suggests the iterative procedure

    \beta_{j+1} = \beta_j - \alpha H^{-1}(\beta_j)\, g(\beta_j)    (12)

where α is a "step size" parameter chosen at each iteration to improve convergence. Using (9) and
(11), we can write (12) as

    \beta_{j+1} = \beta_j + \alpha (X' D X)^{-1} X' D u    (13)

where X and u are evaluated at β_j. Apart from the scalar α, the second term on the right-hand
side of (13) can be computed via a (weighted) regression of the columns of X on the errors. nl
computes the derivatives numerically and then calls regress. At each iteration, α is set to one, and
a candidate value β*_{j+1} is computed by (13). If SSR(β*_{j+1}) < SSR(β_j), then β_{j+1} = β*_{j+1} and the
iteration is complete. Otherwise, α is halved, a new β*_{j+1} is calculated, and the process is repeated.
Convergence is declared when

    \alpha \left| \beta^{*}_{j+1,m} - \beta_{j,m} \right| \leq \epsilon \left( |\beta_{j,m}| + \tau \right)

for all m = 1, ..., k. nl uses τ = 10^{-3} and, by default, ε = 10^{-5}, though you can specify an
alternative value of ε with the eps() option.
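The following is a minimal sketch, not nl's actual code, of a single Gauss–Newton step as in (13), with α = 1 and no weights, for the model mpg = b0 + b1*weight^gamma + u fit to the auto data (the starting values are arbitrary):

* one Gauss-Newton step for mpg = b0 + b1*weight^gamma + u
sysuse auto, clear
scalar b0 = -10                                  // arbitrary current parameter values
scalar b1 = 1500
scalar gamma = -.5
generate double f  = b0 + b1*weight^gamma        // current fitted values
generate double u  = mpg - f                     // current residuals
generate double X1 = 1                           // derivative of f wrt b0
generate double X2 = weight^gamma                // derivative of f wrt b1
generate double X3 = b1*weight^gamma*ln(weight)  // derivative of f wrt gamma
                                                 // (analytic here; nl uses numeric)
* regressing u on the columns of X gives the step (X'DX)^(-1)X'Du in (13)
regress u X1 X2 X3, noconstant
scalar b0    = b0    + _b[X1]
scalar b1    = b1    + _b[X2]
scalar gamma = gamma + _b[X3]

Repeating these steps, halving the step size whenever the residual sum of squares fails to decrease, mirrors the iterations that nl performs.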
As derived, for example, in Davidson and MacKinnon (2004, chap. 6), an expedient way to
obtain the covariance matrix is to compute u and the columns of X at the final estimate β̂ and then
regress that u on X. The covariance matrix of the estimated parameters of that regression serves
as an estimate of Var(β̂). If that regression employs a robust covariance matrix estimator, then the
covariance matrix for the parameters of the nonlinear regression will also be robust.
All other statistics are calculated analogously to those in linear regression, except that the nonlinear
function f(x_i, β) plays the role of the linear function x_i'β. See [R] regress.
This command supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
Acknowledgments
The original version of nl was written by Patrick Royston (1992) of the MRC Clinical Trials
Unit, London, and coauthor of the Stata Press book Flexible Parametric Survival Analysis Using
Stata: Beyond the Cox Model. Francesco Danuso’s menu-driven nonlinear regression program (1991)
provided the inspiration.
References
Atkinson, A. C. 1985. Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic
Regression Analysis. Oxford: Oxford University Press.
Canette, I. 2011. A tip to debug your nl/nlsur function evaluator program. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2011/12/05/a-tip-to-debug-your-nlnlsur-function-evaluator-program/.
Danuso, F. 1991. sg1: Nonlinear regression command. Stata Technical Bulletin 1: 17–19. Reprinted in Stata Technical
Bulletin Reprints, vol. 1, pp. 96–98. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Gallant, A. R. 1987. Nonlinear Statistical Models. New York: Wiley.
Goldstein, R. 1992. srd7: Adjusted summary statistics for logarithmic regressions. Stata Technical Bulletin 5: 17–21.
Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 178–183. College Station, TX: Stata Press.
Kennedy, W. J., Jr., and J. E. Gentle. 1980. Statistical Computing. New York: Dekker.
Poi, B. P. 2008. Stata tip 58: nl is not just for nonlinear models. Stata Journal 8: 139–141.
Ratkowsky, D. A. 1983. Nonlinear Regression Modeling: A Unified Practical Approach. New York: Dekker.
Ross, G. J. S. 1987. MLP User Manual, Release 3.08. Oxford: Numerical Algorithms Group.
. 1990. Nonlinear Estimation. New York: Springer.
Royston, P. 1992. sg7: Centile estimation command. Stata Technical Bulletin 8: 12–15. Reprinted in Stata Technical
Bulletin Reprints, vol. 2, pp. 122–125. College Station, TX: Stata Press.
. 1993. sg1.4: Standard nonlinear curve fits. Stata Technical Bulletin 11: 17. Reprinted in Stata Technical Bulletin
Reprints, vol. 2, p. 121. College Station, TX: Stata Press.
Also see
[R] nl postestimation — Postestimation tools for nl
[R] gmm — Generalized method of moments estimation
[R] ml — Maximum likelihood estimation
[R] mlexp — Maximum likelihood estimation of user-specified expressions
[R] nlcom — Nonlinear combinations of estimators
[R] nlsur — Estimation of nonlinear systems of equations
[R] regress — Linear regression
[SVY] svy estimation — Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
nl postestimation — Postestimation tools for nl
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after nl:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest² likelihood-ratio test
margins³ marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions and residuals
predictnl point estimates, standard errors, testing, and inference for generalized predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with svy estimation results.
² lrtest is not appropriate with svy estimation results.
³ You must specify the variables() option with nl.
Syntax for predict
predict [type] newvar [if] [in] [, statistic]

predict [type] {stub* | newvar1 ... newvark} [if] [in], scores

where k is the number of parameters in the model.

statistic        Description
Main
  yhat           fitted values; the default
  residuals      residuals
  pr(a,b)        Pr(a < yj < b)
  e(a,b)         E(yj | a < yj < b)
  ystar(a,b)     E(y*j), where y*j = max{a, min(yj, b)}

These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
yhat, the default, calculates the fitted values.
residuals calculates the residuals.
pr(a,b) calculates Pr(a < x_j b + u_j < b), the probability that y_j|x_j would be observed in the
interval (a, b).

a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < x_j b + u_j < 30);
pr(lb,ub) calculates Pr(lb < x_j b + u_j < ub); and
pr(20,ub) calculates Pr(20 < x_j b + u_j < ub).

a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < x_j b + u_j < 30);
pr(lb,30) calculates Pr(−∞ < x_j b + u_j < 30) in observations for which lb ≥ .
and calculates Pr(lb < x_j b + u_j < 30) elsewhere.

b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > x_j b + u_j > 20);
pr(20,ub) calculates Pr(+∞ > x_j b + u_j > 20) in observations for which ub ≥ .
and calculates Pr(20 < x_j b + u_j < ub) elsewhere.

e(a,b) calculates E(x_j b + u_j | a < x_j b + u_j < b), the expected value of y_j|x_j conditional on
y_j|x_j being in the interval (a, b), meaning that y_j|x_j is truncated. a and b are specified as they
are for pr().

ystar(a,b) calculates E(y*_j), where y*_j = a if x_j b + u_j ≤ a, y*_j = b if x_j b + u_j ≥ b, and
y*_j = x_j b + u_j otherwise, meaning that y*_j is censored. a and b are specified as they are for pr().
scores calculates the scores. The jth new variable created will contain the score for the jth parameter
in e(b).
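For instance, after the nl fit shown in example 1 below, interval probabilities and censored expected values could be obtained as follows (the new variable names are arbitrary):

. predict pr2030, pr(20,30)
. predict ystar1535, ystar(15,35)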
Remarks and examples
Example 1
Obtaining predictions after fitting a nonlinear regression model with nl is no more difficult than
obtaining predictions after fitting a linear regression model with regress. Here we fit a model of
mpg on weight, allowing for a nonlinear relationship:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. nl (mpg = {b0} + {b1}*weight^{gamma=-.5}), variables(weight) nolog
(obs = 74)
Source SS df MS
Number of obs = 74
Model 1646.43761 2 823.218806 R-squared = 0.6738
Residual 797.021847 71 11.2256598 Adj R-squared = 0.6646
Root MSE = 3.350472
Total 2443.45946 73 33.4720474 Res. dev. = 385.8874
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
/b0 -18.17583 60.61713 -0.30 0.765 -139.0429 102.6913
/b1 1377.267 5292.443 0.26 0.795 -9175.564 11930.1
/gamma -.4460916 .6763643 -0.66 0.512 -1.794724 .9025405
Parameter b0 taken as constant term in model & ANOVA table
Now we obtain the predicted values of mpg and plot them in a graph along with the observed
values:
. predict mpghat
(option yhat assumed; fitted values)
. scatter mpg weight || line mpghat weight, sort
(Graph omitted: scatterplot of Mileage (mpg) against Weight (lbs.), with the fitted values from the nonlinear model overlaid as a connected line.)
Suppose we wanted to know how sensitive mpg is to changes in weight for cars that weigh 3,000
pounds. We can use margins to find out:
. margins, eyex(weight) at(weight = 3000)
Warning: cannot perform check for estimable functions.
Conditional marginal effects Number of obs = 74
Model VCE : GNR
Expression : Fitted values, predict()
ey/ex w.r.t. : weight
at : weight = 3000
Delta-method
ey/ex Std. Err. z P>|z| [95% Conf. Interval]
weight -.8408119 .0804379 -10.45 0.000 -.9984673 -.6831565
With the eyex() option, margins reports elasticities. These results show that if we increase weight
by 1%, then mpg decreases by about 0.84%.
Technical note
Observant readers will notice that margins issued a warning message stating that it could not
perform its usual check for estimable functions. In the case of nl, as long as you do not specify the
predict() option of margins or specify the default predict(yhat), you can safely ignore that
message. The predicted values that nl produces are suitable for use with margins. However, if you
specify any predict() options other than yhat, then the output from margins after using nl will
not be correct.
Also see
[R] nl — Nonlinear least-squares estimation
[U] 20 Estimation and postestimation commands
Title
nlcom — Nonlinear combinations of estimators
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Nonlinear combination of estimators — one expression

    nlcom [name:] exp [, options]

Nonlinear combinations of estimators — more than one expression

    nlcom ([name:] exp) [([name:] exp) ...] [, options]
options Description
level(#)set confidence level; default is level(95)
iterate(#)maximum number of iterations
post post estimation results
display options control column formats and line width
noheader suppress output header
df(#) use t distribution with # degrees of freedom for computing p-values
and confidence intervals
noheader and df(#) do not appear in the dialog box.
The second syntax means that if more than one expression is specified, each must be surrounded by
parentheses. The optional name is any valid Stata name and labels the transformations.
exp is a possibly nonlinear expression containing
_b[coef]
_b[eqno:coef]
[eqno]coef
[eqno]_b[coef]
eqno is
##
name
coef identifies a coefficient in the model. coef is typically a variable name, a level indicator, an
interaction indicator, or an interaction involving continuous variables. Level indicators identify one
level of a factor variable and interaction indicators identify one combination of levels of an interaction;
see [U] 11.4.3 Factor variables. coef may contain time-series operators; see [U] 11.4.4 Time-series
varlists.
Distinguish between the square brackets that are to be typed and the square brackets that indicate optional arguments.
Menu
Statistics >Postestimation >Nonlinear combinations of estimates
Description
nlcom computes point estimates, standard errors, test statistics, significance levels, and confidence
intervals for (possibly) nonlinear combinations of parameter estimates after any Stata estimation
command. Results are displayed in the usual table format used for displaying estimation results.
Calculations are based on the “delta method”, an approximation appropriate in large samples.
nlcom can be used with svy estimation results; see [SVY]svy postestimation.
Options
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
iterate(#) specifies the maximum number of iterations used to find the optimal step size in
calculating numerical derivatives of the transformation(s) with respect to the original parameters.
By default, the maximum number of iterations is 100, but convergence is usually achieved after
only a few iterations. You should rarely have to use this option.
post causes nlcom to behave like a Stata estimation (eclass) command. When post is specified,
nlcom will post the vector of transformed estimators and its estimated variance–covariance matrix to
e(). This option, in essence, makes the transformation permanent. Thus you could, after posting,
treat the transformed estimation results in the same way as you would treat results from other
Stata estimation commands. For example, after posting, you could redisplay the results by typing
nlcom without any arguments, or use test to perform simultaneous tests of hypotheses on linear
combinations of the transformed estimators; see [R]test.
Specifying post clears out the previous estimation results, which can be recovered only by refitting
the original model or by storing the estimation results before running nlcom and then restoring
them; see [R]estimates store.
display options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
The following options are available with nlcom but are not shown in the dialog box:
noheader suppresses the output header.
df(#) specifies that the t distribution with # degrees of freedom be used for computing p-values and
confidence intervals.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Basics
Using the post option
Reparameterizing ML estimators for univariate data
nlcom versus eform
Introduction
nlcom and predictnl both use the delta method. They take nonlinear transformations of the
estimated parameter vector from some fitted model and apply the delta method to calculate the variance,
standard error, Wald test statistic, etc., of the transformations. nlcom is designed for functions of the
parameters, and predictnl is designed for functions of the parameters and of the data, that is, for
predictions.
nlcom generalizes lincom (see [R]lincom) in two ways. First, nlcom allows the transformations
to be nonlinear. Second, nlcom can be used to simultaneously estimate many transformations (whether
linear or nonlinear) and to obtain the estimated variance–covariance matrix of these transformations.
Basics
In [R]lincom, the following regression was performed:
. use http://www.stata-press.com/data/r13/regress
. regress y x1 x2 x3
Source SS df MS Number of obs = 148
F( 3, 144) = 96.12
Model 3259.3561 3 1086.45203 Prob > F = 0.0000
Residual 1627.56282 144 11.3025196 R-squared = 0.6670
Adj R-squared = 0.6600
Total 4886.91892 147 33.2443464 Root MSE = 3.3619
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x1 1.457113 1.07461 1.36 0.177 -.666934 3.581161
x2 2.221682 .8610358 2.58 0.011 .5197797 3.923583
x3 -.006139 .0005543 -11.08 0.000 -.0072345 -.0050435
_cons 36.10135 4.382693 8.24 0.000 27.43863 44.76407
Then lincom was used to estimate the difference between the coefficients of x1 and x2:
. lincom _b[x2] - _b[x1]
( 1) - x1 + x2 = 0
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
(1) .7645682 .9950282 0.77 0.444 -1.20218 2.731316
It was noted, however, that nonlinear expressions are not allowed with lincom:
. lincom _b[x2]/_b[x1]
not possible with test
r(131);
Nonlinear transformations are instead estimated using nlcom:
. nlcom _b[x2]/_b[x1]
_nl_1: _b[x2]/_b[x1]
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
_nl_1 1.524714 .9812848 1.55 0.120 -.3985686 3.447997
Technical note
The notation _b[name] is the standard way in Stata to refer to regression coefficients; see
[U] 13.5 Accessing coefficients and standard errors. Some commands, such as lincom and test,
allow you to drop the _b[] and just refer to the coefficients by name. nlcom, however, requires the
full specification _b[name].
Returning to our linear regression example, nlcom also allows simultaneous estimation of more
than one combination:
. nlcom (_b[x2]/_b[x1]) (_b[x3]/_b[x1]) (_b[x3]/_b[x2])
_nl_1: _b[x2]/_b[x1]
_nl_2: _b[x3]/_b[x1]
_nl_3: _b[x3]/_b[x2]
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
_nl_1 1.524714 .9812848 1.55 0.120 -.3985686 3.447997
_nl_2 -.0042131 .0033483 -1.26 0.208 -.0107756 .0023494
_nl_3 -.0027632 .0010695 -2.58 0.010 -.0048594 -.000667
We can also label the transformations to produce more informative names in the estimation table:
. nlcom (ratio21:_b[x2]/_b[x1]) (ratio31:_b[x3]/_b[x1]) (ratio32:_b[x3]/_b[x2])
ratio21: _b[x2]/_b[x1]
ratio31: _b[x3]/_b[x1]
ratio32: _b[x3]/_b[x2]
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio21 1.524714 .9812848 1.55 0.120 -.3985686 3.447997
ratio31 -.0042131 .0033483 -1.26 0.208 -.0107756 .0023494
ratio32 -.0027632 .0010695 -2.58 0.010 -.0048594 -.000667
nlcom stores the vector of estimated combinations and its estimated variance–covariance matrix
in r().
. matrix list r(b)
r(b)[1,3]
ratio21 ratio31 ratio32
c1 1.5247143 -.00421315 -.00276324
. matrix list r(V)
symmetric r(V)[3,3]
ratio21 ratio31 ratio32
ratio21 .96291982
ratio31 -.00287781 .00001121
ratio32 -.00014234 2.137e-06 1.144e-06
Using the post option
When used with the post option, nlcom stores the estimation vector and variance–covariance
matrix in e(), making the transformation permanent:
. quietly nlcom (ratio21:_b[x2]/_b[x1]) (ratio31:_b[x3]/_b[x1])
> (ratio32:_b[x3]/_b[x2]), post
. matrix list e(b)
e(b)[1,3]
ratio21 ratio31 ratio32
y1 1.5247143 -.00421315 -.00276324
. matrix list e(V)
symmetric e(V)[3,3]
ratio21 ratio31 ratio32
ratio21 .96291982
ratio31 -.00287781 .00001121
ratio32 -.00014234 2.137e-06 1.144e-06
After posting, we can proceed as if we had just run a Stata estimation (eclass) command. For
instance, we can replay the results,
. nlcom
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
ratio21 1.524714 .9812848 1.55 0.120 -.3985686 3.447997
ratio31 -.0042131 .0033483 -1.26 0.208 -.0107756 .0023494
ratio32 -.0027632 .0010695 -2.58 0.010 -.0048594 -.000667
or perform other postestimation tasks in the transformed metric, this time making reference to the
new “coefficients”:
. display _b[ratio31]
-.00421315
. estat vce, correlation
Correlation matrix of coefficients of nlcom model
e(V) ratio21 ratio31 ratio32
ratio21 1.0000
ratio31 -0.8759 1.0000
ratio32 -0.1356 0.5969 1.0000
. test _b[ratio21] = 1
( 1) ratio21 = 1
chi2( 1) = 0.29
Prob > chi2 = 0.5928
We see that testing _b[ratio21] = 1 in the transformed metric is equivalent to testing using testnl
_b[x2]/_b[x1] = 1 in the original metric:
. quietly regress y x1 x2 x3
. testnl _b[x2]/_b[x1] = 1
(1) _b[x2]/_b[x1] = 1
chi2(1) = 0.29
Prob > chi2 = 0.5928
We needed to refit the regression model to recover the original parameter estimates.
Technical note
In a previous technical note, we mentioned that commands such as lincom and test permit
reference to name instead of _b[name]. This is not the case when lincom and test are used after
nlcom, post. In the above, we used
. test _b[ratio21] = 1
rather than
. test ratio21 = 1
which would have returned an error. Consider this a limitation of Stata. For the shorthand notation
to work, you need a variable named name in the data. In nlcom, however, name is just a coefficient
label that does not necessarily correspond to any variable in the data.
Reparameterizing ML estimators for univariate data
When run using only a response and no covariates, Stata’s maximum likelihood (ML) estimation
commands will produce ML estimates of the parameters of some assumed univariate distribution for
the response. The parameterization, however, is usually not one we are used to dealing with in a
nonregression setting. In such cases, nlcom can be used to transform the estimation results from a
regression model to those from a maximum likelihood estimation of the parameters of a univariate
probability distribution in a more familiar metric.
Example 1
Consider the following univariate data on Y=# of traffic accidents at a certain intersection in a
given year:
. use http://www.stata-press.com/data/r13/trafint
. summarize accidents
Variable Obs Mean Std. Dev. Min Max
accidents 12 13.83333 14.47778 0 41
A quick glance at the output from summarize leads us to reject the assumption that Y is
distributed as Poisson because the estimated variance of Y is much greater than the estimated mean
of Y.
Instead, we choose to model the data as univariate negative binomial, of which a common
parameterization is

    \Pr(Y = y) = \frac{\Gamma(r + y)}{\Gamma(r)\,\Gamma(y + 1)}\, p^r (1 - p)^y, \qquad 0 \le p \le 1,\ r > 0,\ y = 0, 1, \ldots

with

    E(Y) = \frac{r(1 - p)}{p} \qquad\qquad \mathrm{Var}(Y) = \frac{r(1 - p)}{p^2}
There exist no closed-form solutions for the maximum likelihood estimates of p and r, yet they
may be estimated by the iterative method of Newton–Raphson. One way to get these estimates would
be to write our own Newton–Raphson program for the negative binomial. Another way would be to
write our own ML evaluator; see [R]ml.
The easiest solution, however, would be to use Stata’s existing negative binomial ML regres-
sion command, nbreg. The only problem with this solution is that nbreg estimates a different
parameterization of the negative binomial, but we can worry about that later.
. nbreg accidents
Fitting Poisson model:
Iteration 0: log likelihood = -105.05361
Iteration 1: log likelihood = -105.05361
Fitting constant-only model:
Iteration 0: log likelihood = -43.948619
Iteration 1: log likelihood = -43.891483
Iteration 2: log likelihood = -43.89144
Iteration 3: log likelihood = -43.89144
Fitting full model:
Iteration 0: log likelihood = -43.89144
Iteration 1: log likelihood = -43.89144
Negative binomial regression Number of obs = 12
LR chi2(0) = 0.00
Dispersion = mean Prob > chi2 = .
Log likelihood = -43.89144 Pseudo R2 = 0.0000
accidents Coef. Std. Err. z P>|z| [95% Conf. Interval]
_cons 2.627081 .3192233 8.23 0.000 2.001415 3.252747
/lnalpha .1402425 .4187147 -.6804233 .9609083
alpha 1.150553 .4817534 .5064026 2.61407
Likelihood-ratio test of alpha=0: chibar2(01) = 122.32 Prob>=chibar2 = 0.000
. nbreg, coeflegend
Negative binomial regression Number of obs = 12
LR chi2(0) = 0.00
Dispersion = mean Prob > chi2 = .
Log likelihood = -43.89144 Pseudo R2 = 0.0000
accidents Coef. Legend
_cons 2.627081 _b[accidents:_cons]
/lnalpha .1402425 _b[lnalpha:_cons]
alpha 1.150553
Likelihood-ratio test of alpha=0: chibar2(01) = 122.32 Prob>=chibar2 = 0.000
From this output, we see that, when used with univariate data, nbreg estimates a regression
intercept, β0, and the logarithm of some parameter α. This parameterization is useful in regression
models: β0 is the intercept meant to be augmented with other terms of the linear predictor, and α is
an overdispersion parameter used for comparison with the Poisson regression model.
However, we need to transform (β0, ln α) to (p, r). Examining Methods and formulas of [R] nbreg
reveals the transformation as

    p = \{ 1 + \alpha \exp(\beta_0) \}^{-1} \qquad\qquad r = \alpha^{-1}
which we apply using nlcom:
. nlcom (p:1/(1 + exp([lnalpha]_b[_cons] + _b[_cons])))
> (r:exp(-[lnalpha]_b[_cons]))
p: 1/(1 + exp([lnalpha]_b[_cons] + _b[_cons]))
r: exp(-[lnalpha]_b[_cons])
accidents Coef. Std. Err. z P>|z| [95% Conf. Interval]
p .0591157 .0292857 2.02 0.044 .0017168 .1165146
r .8691474 .3639248 2.39 0.017 .1558679 1.582427
Given the invariance of maximum likelihood estimators and the properties of the delta method, the
above parameter estimates, standard errors, etc., are precisely those we would have obtained had we
instead performed the Newton–Raphson optimization in the (p, r) metric.
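As a quick check (assuming the nbreg results above are still the active estimates), the implied mean r(1 − p)/p simplifies algebraically to exp(β0) and should therefore reproduce the sample mean of accidents reported by summarize (13.83):

. display exp(_b[_cons])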
Technical note
Note how we referred to the estimate of ln α above as [lnalpha]_b[_cons]. This is not entirely
evident from the output of nbreg, which is why we redisplayed the results using the coeflegend
option so that we would know how to refer to the coefficients; see [U] 13.5 Accessing coefficients and
standard errors.
nlcom versus eform
Many Stata estimation commands allow you to display exponentiated regression coefficients, some
by default, some optionally. Known as “eform” in Stata terminology, this reparameterization serves
many uses: it gives odds ratios for logistic models, hazard ratios in survival models, incidence-rate
ratios in Poisson models, and relative-risk ratios in multinomial logit models, to name a few.
For example, consider the following estimation taken directly from the technical note in [R]poisson:
. use http://www.stata-press.com/data/r13/airline
. generate lnN = ln(n)
. poisson injuries XYZowned lnN
Iteration 0: log likelihood = -22.333875
Iteration 1: log likelihood = -22.332276
Iteration 2: log likelihood = -22.332276
Poisson regression Number of obs = 9
LR chi2(2) = 19.15
Prob > chi2 = 0.0001
Log likelihood = -22.332276 Pseudo R2 = 0.3001
injuries Coef. Std. Err. z P>|z| [95% Conf. Interval]
XYZowned .6840667 .3895877 1.76 0.079 -.0795111 1.447645
lnN 1.424169 .3725155 3.82 0.000 .6940517 2.154285
_cons 4.863891 .7090501 6.86 0.000 3.474178 6.253603
When we replay results and specify the irr (incidence-rate ratios) option,
. poisson, irr
Poisson regression Number of obs = 9
LR chi2(2) = 19.15
Prob > chi2 = 0.0001
Log likelihood = -22.332276 Pseudo R2 = 0.3001
injuries IRR Std. Err. z P>|z| [95% Conf. Interval]
XYZowned 1.981921 .7721322 1.76 0.079 .9235678 4.253085
lnN 4.154402 1.547579 3.82 0.000 2.00181 8.621728
_cons 129.5272 91.84126 6.86 0.000 32.2713 519.8828
we obtain the exponentiated regression coefficients and their estimated standard errors.
Contrast this with what we obtain if we exponentiate the coefficients manually by using nlcom:
. nlcom (E_XYZowned:exp(_b[XYZowned])) (E_lnN:exp(_b[lnN]))
E_XYZowned: exp(_b[XYZowned])
E_lnN: exp(_b[lnN])
injuries Coef. Std. Err. z P>|z| [95% Conf. Interval]
E_XYZowned 1.981921 .7721322 2.57 0.010 .4685701 3.495273
E_lnN 4.154402 1.547579 2.68 0.007 1.121203 7.187602
There are three things to note when comparing poisson, irr (and eform in general) with nlcom:
1. The exponentiated coefficients and standard errors are identical. This is certainly good news.
2. The Wald test statistic (z) and level of significance are different. When using poisson, irr and
other related eform options, the Wald test does not change from what you would have obtained
without the eform option, and you can see this by comparing both versions of the poisson output
given previously.
When you use eform, Stata knows that what is usually desired is a test of

    H0: exp(β) = 1

and not the uninformative-by-comparison

    H0: exp(β) = 0

The test of H0: exp(β) = 1 is asymptotically equivalent to a test of H0: β = 0, the Wald test in
the original metric, but the latter has better small-sample properties. Thus if you specify eform,
you get a test of H0: β = 0.
nlcom, however, is general. It does not attempt to infer the test of greatest interest for a given
transformation, and so a test of

    H0: transformed coefficient = 0

is always given, regardless of the transformation.
3. You may be surprised to see that, even though the coefficients and standard errors are identical,
the confidence intervals (both 95%) are different.
eform confidence intervals are standard confidence intervals with the endpoints transformed. For
example, the confidence interval for the coefficient on lnN is [0.694,2.154], whereas the confidence
interval for the incidence-rate ratio due to lnN is [exp(0.694),exp(2.154)] = [2.002,8.619], which,
except for some roundoff error, is what we see from the output of poisson, irr. For exponentiated
coefficients, confidence intervals based on transform-the-endpoints methodology generally have
better small-sample properties than their asymptotically equivalent counterparts.
The transform-the-endpoints method, however, gives valid coverage only when the transformation
is monotonic. nlcom uses a more general and asymptotically equivalent method for calculating
confidence intervals, as described in Methods and formulas.
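To make points 2 and 3 concrete (assuming the poisson fit above is still the active estimation), a Wald test of H0: exp(β) = 1 for lnN can be requested directly with testnl, and the transform-the-endpoints interval can be reproduced by exponentiating the original confidence limits:

. testnl exp(_b[lnN]) = 1
. display exp(.6940517) "  " exp(2.154285)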
Stored results
nlcom stores the following in r():
Scalars
r(N) number of observations
r(df r) residual degrees of freedom
Matrices
r(b) vector of transformed coefficients
r(V) estimated variance–covariance matrix of the transformed coefficients
If post is specified, nlcom also stores the following in e():
Scalars
e(N) number of observations
e(df r) residual degrees of freedom
e(N strata) number of strata L, if used after svy
e(N psu) number of sampled PSUs n, if used after svy
e(rank) rank of e(V)
Macros
e(cmd) nlcom
e(predict) program used to implement predict
e(properties) b V
Matrices
e(b) vector of transformed coefficients
e(V) estimated variance–covariance matrix of the transformed coefficients
e(V srs) simple-random-sampling-without-replacement (co)variance V̂_srswor, if svy
e(V srswr) simple-random-sampling-with-replacement (co)variance V̂_srswr, if svy and fpc()
e(V msp) misspecification (co)variance V̂_msp, if svy and available
Functions
e(sample) marks estimation sample
Methods and formulas
Given a 1×k vector of parameter estimates, θ̂ = (θ̂1, ..., θ̂k), consider the estimated p-dimensional
transformation

    g(\hat\theta) = \left[ g_1(\hat\theta), g_2(\hat\theta), \ldots, g_p(\hat\theta) \right]

The estimated variance–covariance of g(θ̂) is given by

    \widehat{\mathrm{Var}}\{ g(\hat\theta) \} = G V G'

where G is the p×k matrix of derivatives for which

    G_{ij} = \left. \frac{\partial g_i(\theta)}{\partial\theta_j} \right|_{\theta=\hat\theta} \qquad i = 1, \ldots, p \qquad j = 1, \ldots, k

and V is the estimated variance–covariance matrix of θ̂. Standard errors are obtained as the square
roots of the variances.

The Wald test statistic for testing

    H_0\colon\ g_i(\theta) = 0

versus the two-sided alternative is given by

    Z_i = \frac{ g_i(\hat\theta) }{ \left[ \widehat{\mathrm{Var}}_{ii}\{ g(\hat\theta) \} \right]^{1/2} }

When the variance–covariance matrix of θ̂ is an asymptotic covariance matrix, Z_i is approximately
distributed as Gaussian. For linear regression, Z_i is taken to be approximately t distributed with r
degrees of freedom, where r is the residual degrees of freedom from the original fitted model.

A (1 − α) × 100% confidence interval for g_i(θ) is given by

    g_i(\hat\theta) \pm z_{\alpha/2} \left[ \widehat{\mathrm{Var}}_{ii}\{ g(\hat\theta) \} \right]^{1/2}

for those cases where Z_i is Gaussian and

    g_i(\hat\theta) \pm t_{\alpha/2,\,r} \left[ \widehat{\mathrm{Var}}_{ii}\{ g(\hat\theta) \} \right]^{1/2}

for those cases where Z_i is t distributed. z_p is the 1 − p quantile of the standard normal distribution,
and t_{p,r} is the 1 − p quantile of the t distribution with r degrees of freedom.
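As an illustration of these formulas, the delta-method standard error of the single transformation g(θ) = _b[x2]/_b[x1] from the Basics example can be computed by hand; this is only a sketch, because nlcom obtains G by numerical rather than analytic differentiation:

* delta method by hand for g = _b[x2]/_b[x1] after the Basics regression
use http://www.stata-press.com/data/r13/regress, clear
quietly regress y x1 x2 x3
* row vector of derivatives of g with respect to (x1, x2, x3, _cons)
matrix G = (-_b[x2]/_b[x1]^2, 1/_b[x1], 0, 0)
matrix Vg = G*e(V)*G'
display "estimate = " _b[x2]/_b[x1] "   std. err. = " sqrt(Vg[1,1])

The results should agree with the _nl_1 line reported by nlcom in Basics (1.524714 and .9812848).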
References
Feiveson, A. H. 1999. FAQ: What is the delta method and how is it used to estimate the standard error of a
transformed parameter? http://www.stata.com/support/faqs/stat/deltam.html.
Gould, W. W. 1996. crc43: Wald test of nonlinear hypotheses after model estimation. Stata Technical Bulletin 29:
2–4. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 15–18. College Station, TX: Stata Press.
Oehlert, G. W. 1992. A note on the delta method. American Statistician 46: 27–29.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Also see
[R] lincom — Linear combinations of estimators
[R] predictnl — Obtain nonlinear predictions, standard errors, etc., after estimation
[R] test — Test linear hypotheses after estimation
[R] testnl — Test nonlinear hypotheses after estimation
[U] 20 Estimation and postestimation commands
Title
nlogit — Nested logit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Nested logit regression

    nlogit depvar [indepvars] [if] [in] [weight] || lev1_equation
        [|| lev2_equation ...] || altvar: [byaltvarlist], case(varname) [options]

where the syntax of lev#_equation is

    altvar: [byaltvarlist] [, base(#|lbl) estconst]

Create variable based on specification of branches

    nlogitgen newaltvar = altvar(branchlist) [, nolog]

where branchlist is

    branch, branch [, branch ...]

and branch is

    [label:] alternative [| alternative [| alternative ...]]

Display tree structure

    nlogittree altvarlist [if] [in] [weight] [, choice(depvar) nolabel nobranches]
options Description
Model
case(varname) use varname to identify cases
base(#|lbl) use the specified level or label of altvar as the base alternative for
    the bottom level
noconstant suppress the constant terms for the bottom-level alternatives
nonnormalized use the nonnormalized parameterization
altwise use alternativewise deletion instead of casewise deletion
constraints(constraints) apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar,
    bootstrap, or jackknife
Reporting
level(#) set confidence level; default is level(95)
notree suppress display of tree-structure output; see also
nolabel and nobranches
nocnsreport do not display constraints
display options control column formats and line width
Maximization
maximize options control the maximization process; seldom used
case(varname) is required.
bootstrap, by, fp, jackknife, statsby, and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed with nlogit, and fweights are allowed with nlogittree; see
[U] 11.1.6 weight. Weights for nlogit must be constant within case.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
nlogit
Statistics >Categorical outcomes >Nested logit regression
nlogitgen
Statistics >Categorical outcomes >Setup for nested logit regression
nlogittree
Statistics >Categorical outcomes >Display nested logit tree structure
Description
nlogit performs full information maximum-likelihood estimation for nested logit models. These
models relax the assumption of independently distributed errors and the independence of irrelevant
alternatives inherent in conditional and multinomial logit models by clustering similar alternatives
into nests.
By default, nlogit uses a parameterization that is consistent with random utility maximization
(RUM). Before version 10 of Stata, a nonnormalized version of the nested logit model was fit, which
you can request by specifying the nonnormalized option.
You must use nlogitgen to generate a new categorical variable to specify the branches of the
decision tree before calling nlogit.
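As a brief orientation, one possible workflow (an illustrative sketch; the restaurant-choice example developed under Remarks and examples below walks through each step in detail) is to define the tree with nlogitgen, verify it with nlogittree, and then fit the model with nlogit:

. use http://www.stata-press.com/data/r13/restaurant, clear
. nlogitgen type = restaurant(fast: Freebirds | MamasPizza,
>      family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)
. nlogittree restaurant type, choice(chosen)
. nlogit chosen cost rating distance || type: income kids, base(family)
>      || restaurant:, case(family_id)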
Options
Specification and options for lev# equation
altvar is a variable identifying alternatives at this level of the hierarchy.
byaltvarlist specifies the variables to be used to compute the by-alternative regression coefficients for
that level. For each variable specified in the variable list, there will be one regression coefficient
for each alternative of that level of the hierarchy. If the variable is constant across each alternative
(a case-specific variable), the regression coefficient associated with the base alternative is not
identifiable. These regression coefficients are labeled as (base) in the regression table. If the
variable varies among the alternatives, a regression coefficient is estimated for each alternative.
base(#|lbl) can be specified in each level equation where it identifies the base alternative to be
used at that level. The default is the alternative that has the highest frequency.
If vce(bootstrap) or vce(jackknife) is specified, you must specify the base alternative for
each level that has a byaltvarlist or if the constants will be estimated. Doing so ensures that the
same model is fit with each call to nlogit.
estconst applies to all the level equations except the bottom-level equation. Specifying estconst
requests that constants for each alternative (except the base alternative) be estimated. By default,
no constant is estimated at these levels. Constants can be estimated in only one level of the tree
hierarchy. If you specify estconst for one of the level equations, you must specify noconstant
for the bottom-level equation.
Options for nlogit
 
Model
case(varname) specifies the variable that identifies each case. case() is required.
base(#|lbl) can be specified in each level equation where it identifies the base alternative to be
used at that level. The default is the alternative that has the highest frequency.
If vce(bootstrap) or vce(jackknife) is specified, you must specify the base alternative for
each level that has a byaltvarlist or if the constants will be estimated. Doing so ensures that the
same model is fit with each call to nlogit.
noconstant applies only to the equation defining the bottom level of the hierarchy. By default,
constants are estimated for each alternative of altvar, less the base alternative. To suppress the
constant terms for this level, specify noconstant. If you do not specify noconstant, you cannot
specify estconst for the higher-level equations.
nonnormalized requests a nonnormalized parameterization of the model that does not scale the
inclusive values by the degree of dissimilarity of the alternatives within each nest. Use this
option to replicate results from older versions of Stata. The default is to use the RUM-consistent
parameterization.
altwise specifies that alternativewise deletion be used when marking out observations because of
missing values in your variables. The default is to use casewise deletion. This option does not
apply to observations that are marked out by the if or in qualifier or the by prefix.
constraints(constraints); see [R]estimation options.
The inclusive-valued/dissimilarity parameters are parameterized as ml ancillary parameters. They
are labeled as alternative tauconst, where alternative is one of the alternatives defining a
branch in the tree. To constrain the inclusive-valued/dissimilarity parameter for alternative a1 to
be, say, equal to alternative a2, you would use the following syntax:
. constraint 1 [a1_tau]_cons = [a2_tau]_cons
. nlogit ..., constraints(1)
collinear prevents collinear variables from being dropped. Use this option when you know that
you have collinear variables and you are applying constraints() to handle the rank reduction.
See [R]estimation options for details on using collinear with constraints().
nlogit will not allow you to specify an independent variable in more than one level equation.
Specifying the collinear option will allow execution to proceed in this case, but it is your
responsibility to ensure that the parameters are identified.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
If vce(robust) or vce(cluster clustvar)is specified, the likelihood-ratio test for the indepen-
dence of irrelevant alternatives (IIA) is not computed.
 
Reporting
level(#); see [R]estimation options.
notree specifies that the tree structure of the nested logit model not be displayed. See also nolabel
and nobranches below for when notree is not specified.
nocnsreport; see [R]estimation options.
display options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
The technique(bhhh) option is not allowed.
Specification and options for nlogitgen
newaltvar and altvar are variables identifying alternatives at each level of the hierarchy.
label defines a label to associate with the branch. If no label is given, a numeric value is used.
alternative specifies an alternative, of altvar specified in the syntax, to be included in the branch. It
is either a numeric value or the label associated with that value. An example of nlogitgen is
. nlogitgen type = restaurant(fast: 1 | 2,
> family: CafeEccell | LosNortenos | WingsNmore, fancy: 6 | 7)
nolog suppresses the display of the iteration log.
Specification and options for nlogittree
 
Main
altvarlist is a list of alternative variables that define the tree hierarchy. The first variable must define
bottom-level alternatives, and the order continues to the variable defining the top-level alternatives.
choice(depvar)defines the choice indicator variable and forces nlogittree to compute and display
choice frequencies for each bottom-level alternative.
nolabel forces nlogittree to suppress value labels in tree-structure output.
nobranches forces nlogittree to suppress drawing branches in the tree-structure output.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Data setup and the tree structure
Estimation
Testing for the IIA
Nonnormalized model
Introduction
nlogit performs full information maximum-likelihood estimation for nested logit models. These
models relax the assumption of independently distributed errors and the IIA inherent in conditional
and multinomial logit models by clustering similar alternatives into nests. Because the nested logit
model is a direct generalization of the alternative-specific conditional logit model (also known as
McFadden’s choice model), you may want to read [R]asclogit before continuing.
By default, nlogit uses a parameterization that is consistent with RUM. Before version 10 of
Stata, a nonnormalized version of the nested logit model was fit, which you can request by specifying
the nonnormalized option. We recommend using the RUM-consistent version of the model for new
projects because it is based on a sound model of consumer behavior.
McFadden (1977,1981) showed how this model can be derived from a rational choice framework.
Amemiya (1985, chap. 9) contains a nice discussion of how this model can be derived under the
assumption of utility maximization. Hensher, Rose, and Greene (2005) provide a lucid introduction
to choice models including nested logit.
Throughout this entry, we consider a model of restaurant choice. We begin by introducing the
data.
Example 1
We have fictional data on 300 families and their choice of seven local restaurants. Freebirds and
Mama’s Pizza are fast food restaurants; Caf´
e Eccell, Los Norte˜
nos, and Wings ’N More are family
restaurants; and Christopher’s and Mad Cows are fancy restaurants. We want to model the decision
of where to eat as a function of household income (income, in thousands of dollars), the number
of children in the household (kids), the rating of the restaurant according to a local restaurant
guide (rating, coded 0–5), the average meal cost per person (cost), and the distance between the
household and the restaurant (distance, in miles). income and kids are attributes of the family,
rating is an attribute of the alternative (the restaurant), and cost and distance are attributes of
the alternative as perceived by the families; that is, each family has its own cost and distance for
each restaurant.
We begin by loading the data and listing some of the variables for the first three families:
. use http://www.stata-press.com/data/r13/restaurant
. describe
Contains data from http://www.stata-press.com/data/r13/restaurant.dta
obs: 2,100
vars: 8 10 Mar 2013 01:17
size: 67,200
storage display value
variable name type format label variable label
family_id float %9.0g family ID
restaurant float %12.0g names choices of restaurants
income float %9.0g household income
cost float %9.0g average meal cost per person
kids float %9.0g number of kids in the household
rating float %9.0g ratings in local restaurant guide
distance float %9.0g distance between home and
restaurant
chosen float %9.0g 0 no 1 yes
Sorted by: family_id
. list family_id restaurant chosen kids rating distance in 1/21, sepby(fam)
> abbrev(10)
family_id restaurant chosen kids rating distance
1. 1 Freebirds 1 1 0 1.245553
2. 1 MamasPizza 0 1 1 2.82493
3. 1 CafeEccell 0 1 2 4.21293
4. 1 LosNortenos 0 1 3 4.167634
5. 1 WingsNmore 0 1 2 6.330531
6. 1 Christophers 0 1 4 10.19829
7. 1 MadCows 0 1 5 5.601388
8. 2 Freebirds 0 3 0 4.162657
9. 2 MamasPizza 0 3 1 2.865081
10. 2 CafeEccell 0 3 2 5.337799
11. 2 LosNortenos 1 3 3 4.282864
12. 2 WingsNmore 0 3 2 8.133914
13. 2 Christophers 0 3 4 8.664631
14. 2 MadCows 0 3 5 9.119597
15. 3 Freebirds 1 3 0 2.112586
16. 3 MamasPizza 0 3 1 2.215329
17. 3 CafeEccell 0 3 2 6.978715
18. 3 LosNortenos 0 3 3 5.117877
19. 3 WingsNmore 0 3 2 5.312941
20. 3 Christophers 0 3 4 9.551273
21. 3 MadCows 0 3 5 5.539806
Because each family chose among seven restaurants, there are 7 observations in the dataset for each
family. The variable chosen is coded 0/1, with 1 indicating the chosen restaurant and 0 otherwise.
We could fit a conditional logit model to our data. Because income and kids are constant within
each family, we would use the asclogit command instead of clogit. However, the conditional
logit may be inappropriate. That model assumes that the random errors are independent, and as a
result it forces the odds ratio of any two alternatives to be independent of the other alternatives, a
property known as the IIA. We will discuss the IIA assumption in more detail later.
Assuming that unobserved shocks influencing a decision maker’s attitude toward one alternative
have no effect on his attitudes toward the other alternatives may seem innocuous, but often this
assumption is too restrictive. Suppose that when a family was deciding which restaurant to visit, they
were pressed for time because of plans to attend a movie later. The unobserved shock (being in a
hurry) would raise the likelihood that the family goes to either fast food restaurant (Freebirds or
Mama’s Pizza). Similarly, another family might be choosing a restaurant to celebrate a birthday and
therefore be inclined to attend a fancy restaurant (Christopher’s or Mad Cows).
Nested logit models relax the independence assumption and allow us to group alternatives for
which unobserved shocks may have concomitant effects. Here we suspect that restaurants should be
grouped by type (fast, family, or fancy). The tree structure of a family’s decision about where to eat
might look like this:
Dining
    Fast food restaurants: Freebirds, Mama's Pizza
    Family restaurants: Café Eccell, Los Norteños, Wings 'N More
    Fancy restaurants: Christopher's, Mad Cows
At the bottom of the tree are the individual restaurants, indicating that there are some random
shocks that affect a family’s decision to eat at each restaurant independently. Above the restaurants
are the three types of restaurants, indicating that other random shocks affect the type of restaurant
chosen. As is customary when drawing decision trees, at the top level is one box, representing the
family making the decision.
We use the following terms to describe nested logit models.
level, or decision level, is the level or stage at which a decision is made. The example above has
only two levels. In the first level, a type of restaurant is chosen (fast food, family, or fancy), and
in the second level, a specific restaurant is chosen.
bottom level is the level where the final decision is made. In our example, this is when we choose a
specific restaurant.
alternative set is the set of all possible alternatives at any given decision level.
bottom alternative set is the set of all possible alternatives at the bottom level. This concept is
often referred to as the choice set in the economics-choice literature. In our example, the bottom
alternative set is all seven of the specific restaurants.
alternative is a specific alternative within an alternative set. In the first level of our example, “fast
food” is an alternative. In the second or bottom level, “Mad Cows” is an alternative. Not all
alternatives within an alternative set are available to someone making a choice at a specific stage,
only those that are nested within all higher-level decisions.
chosen alternative is the alternative from an alternative set that we observe someone having chosen.
Technical note
Although decision trees in nested logit analysis are often interpreted as implying that the highest-
level decisions are made first, followed by decisions at lower levels, and finally the decision among
alternatives at the bottom level, no such temporal ordering is implied. See Hensher, Rose, and
Greene (2005, chap. 13). In our example, we are not assuming that families first choose whether to
attend a fast, family, or fancy restaurant and then choose the particular restaurant; we assume merely
that they choose one of the seven restaurants.
Data setup and the tree structure
To fit a nested logit model, you must first create a variable that defines the structure of your
decision tree.
Example 2
To run nlogit, we need to generate a categorical variable that identifies the first-level set
of alternatives: fast food, family restaurants, or fancy restaurants. We can do so easily by using
nlogitgen.
. nlogitgen type = restaurant(fast: Freebirds | MamasPizza,
> family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)
new variable type is generated with 3 groups
. label list lb_type
lb_type:
1 fast
2 family
3 fancy
. nlogittree restaurant type, choice(chosen)
tree structure specified for the nested logit model
type N restaurant N k
fast 600 Freebirds 300 12
MamasPizza 300 15
family 900 CafeEccell 300 78
LosNortenos 300 75
WingsNmore 300 69
fancy 600 Christophers 300 27
MadCows 300 24
total 2100 300
k = number of times alternative is chosen
N = number of observations at each level
The new categorical variable is type, which takes on value 1 (fast) if restaurant is Freebirds
or Mama's Pizza; value 2 (family) if restaurant is Café Eccell, Los Norteños, or Wings 'N More;
and value 3 (fancy) otherwise. nlogittree displays the tree structure.
Technical note
We could also use values instead of value labels of restaurant in nlogitgen. Value labels are
optional, and the default value labels for type are type1, type2, and type3. The vertical bar is
also optional.
. use http://www.stata-press.com/data/r13/restaurant, clear
. nlogitgen type = restaurant(1 2, 3 4 5, 6 7)
new variable type is generated with 3 groups
. label list lb_type
lb_type:
1 type1
2 type2
3 type3
. nlogittree restaurant type
tree structure specified for the nested logit model
type N restaurant N
type1 600 Freebirds 300
MamasPizza 300
type2 900 CafeEccell 300
LosNortenos 300
WingsNmore 300
type3 600 Christophers 300
MadCows 300
total 2100
N = number of observations at each level
In our dataset, every family was able to choose among all seven restaurants. However, in other
applications some decision makers may not have been able to choose among all possible alternatives.
For example, two cases may have choice hierarchies of
case 1                              case 2
  type      restaurant                type      restaurant
  fast      Freebirds                 fast      Freebirds
            MamasPizza                          MamasPizza
  family    CafeEccell                family    LosNortenos
            LosNortenos                         WingsNmore
            WingsNmore
  fancy     Christophers              fancy     Christophers
            MadCows
where the second case does not have the restaurant alternatives Café Eccell or Mad Cows available
to them. The only restriction is that the relationships between higher- and lower-level alternative sets
be the same for all decision makers. In this two-level example, Freebirds and Mama's Pizza are
classified as fast food restaurants for both cases; Café Eccell, Los Norteños, and Wings 'N More are
family restaurants; and Christopher's and Mad Cows are fancy restaurants. nlogit requires only that
hierarchy be maintained for all cases.
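To make this concrete, here is a hedged sketch (not one of the manual's worked examples) of how case 2's restricted choice set could be represented in the data: the rows for the unavailable restaurants are simply absent from that family's case. The restaurant codes 3 and 7 correspond to CafeEccell and MadCows under the value labels listed above.
. preserve
. * illustration only: make Cafe Eccell (3) and Mad Cows (7) unavailable to family 2
. drop if family_id == 2 & inlist(restaurant, 3, 7)
. nlogittree restaurant type, choice(chosen)
. restore
The N column reported by nlogittree would then drop by 1 for the two omitted restaurants, while the type hierarchy remains the same for every family.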
Estimation
Example 3
With our type variable created that defines the three types of restaurants, we can now examine how
the alternative-specific attributes (cost, rating, and distance) apply to the bottom alternative set
(the seven restaurants) and how the family-specific attributes (income and kids) apply to the alternative
set at the first decision level (the three types of restaurants).
. use http://www.stata-press.com/data/r13/restaurant, clear
. qui nlogitgen type = restaurant(fast: Freebirds | MamasPizza,
> family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)
. nlogit chosen cost rating distance || type: income kids, base(family) ||
> restaurant:, noconstant case(family_id)
tree structure specified for the nested logit model
type N restaurant N k
fast 600 Freebirds 300 12
MamasPizza 300 15
family 900 CafeEccell 300 78
LosNortenos 300 75
WingsNmore 300 69
fancy 600 Christophers 300 27
MadCows 300 24
total 2100 300
k = number of times alternative is chosen
N = number of observations at each level
Iteration 0: log likelihood = -541.93581
(output omitted )
Iteration 17: log likelihood = -485.47331
RUM-consistent nested logit regression Number of obs = 2100
Case variable: family_id Number of cases = 300
Alternative variable: restaurant Alts per case: min = 7
avg = 7.0
max = 7
Wald chi2(7) = 46.71
Log likelihood = -485.47331 Prob > chi2 = 0.0000
chosen Coef. Std. Err. z P>|z| [95% Conf. Interval]
restaurant
cost -.1843847 .0933975 -1.97 0.048 -.3674404 -.0013289
rating .463694 .3264935 1.42 0.156 -.1762215 1.10361
distance -.3797474 .1003828 -3.78 0.000 -.5764941 -.1830007
type equations
fast
income -.0266038 .0117306 -2.27 0.023 -.0495952 -.0036123
kids -.0872584 .1385026 -0.63 0.529 -.3587184 .1842016
family
income 0 (base)
kids 0 (base)
fancy
income .0461827 .0090936 5.08 0.000 .0283595 .0640059
kids -.3959413 .1220356 -3.24 0.001 -.6351267 -.1567559
dissimilarity parameters
type
/fast_tau 1.712878 1.48685 -1.201295 4.627051
/family_tau 2.505113 .9646351 .614463 4.395763
/fancy_tau 4.099844 2.810123 -1.407896 9.607583
LR test for IIA (tau = 1): chi2(3) = 6.87 Prob > chi2 = 0.0762
First, let’s examine how we called nlogit. The delimiters (||) separate equations. The first
equation specifies the dependent variable, chosen, and three alternative-specific variables, cost,
rating, and distance. We refer to these variables as alternative-specific because they vary among
the bottom-level alternatives, the restaurants. We obtain one parameter estimate for each variable.
These estimates are listed in the equation subtable labeled restaurant.
For the second equation, we specify the type variable. It identifies the first-level alternatives, the
restaurant types. Following the colon after type, we specify two case-specific variables, income and
kids. Here we obtain a parameter estimate for each variable for each alternative at this level. That is
why we call these variable lists by-alternative variables. Because income and kids do not vary within
each case, to identify the model one alternative’s set of parameters must be set to zero. We specified
the base(family) option with this equation to restrict the parameters for the family alternative.
The variable identifying the bottom-level alternatives, restaurant, is specified after the second
equation delimiter. We do not specify any variables after the colon delimiter at this level. Had we
specified variables here, we would have obtained an estimate for each variable in each equation. As
we will see below, these variables parameterize the constant term in the utility equation for each
bottom-level alternative. The noconstant option suppresses bottom-level alternative-specific constant
terms.
Near the bottom of the output are the dissimilarity parameters, which measure the degree of
correlation of random shocks within each of the three types of restaurants. Dissimilarity parameters
greater than one imply that the model is inconsistent with random-utility maximization (RUM);
Hensher, Rose, and Greene (2005, sec. 13.6) discuss this in detail. We will ignore the fact that all our
dissimilarity parameters exceed one.
The conditional logit model is a special case of nested logit in which all the dissimilarity parameters
are equal to one. At the bottom of the output, we find a likelihood-ratio test of this hypothesis. Here
the evidence against the null hypothesis that all the parameters equal one is mixed: the test rejects at
the 10% level but not at the 5% level. Equivalently, the
property known as the IIA imposed by the conditional logit model holds if and only if all dissimilarity
parameters are equal to one. We discuss the IIA in more detail now.
Testing for the IIA
The IIA is a property of the multinomial and conditional logit models that forces the odds of
choosing one alternative over another to be independent of the other alternatives. For simplicity,
suppose that a family was choosing only between Freebirds and Mama’s Pizza, and the family was
equally likely to choose either of the restaurants. The probability of going to each restaurant is 50%.
Now suppose that Bill’s Burritos opens up next door to Freebirds, which is also a burrito restaurant.
If the IIA holds, then the probability of going to each restaurant must now be 33.33% so that the
family remains equally likely to go to Mama’s Pizza or Freebirds.
The IIA may sometimes be a plausible assumption. However, a more likely scenario would be for
the probability of going to Mama’s Pizza to remain at 50% and the probabilities of going to Freebirds
and Bill’s Burritos to be 25% each, because the two restaurants are next door to each other and serve
the same food. Nested logit analysis would allow us to relax the IIA assumption of conditional logit.
We could group Bill’s Burritos and Freebirds into one nest that encompasses all burrito restaurants
and create a second nest for pizzerias.
The IIA is a consequence of assuming that the errors are independent and identically distributed (i.i.d.).
Because the errors are i.i.d., they cannot contain any alternative-specific unobserved information, and
therefore adding a new alternative cannot affect the relationship between a pair of existing alternatives.
In the previous example, we saw that a joint test that the dissimilarity parameters were equal
to one is one way to test for IIA. However, that test required us to specify a decision tree for the
nested logit model, and different specifications could lead to conflicting results of the test. Hausman
and McFadden (1984) suggest that if part of the choice set truly is irrelevant with respect to the
other alternatives, omitting that subset from the conditional logit model will not lead to inconsistent
estimates. Therefore, Hausman’s (1978) specification test can be used to test for IIA, and this test
will not be sensitive to the tree structure we specify for a nested logit model.
Example 4
We want to test the IIA for the subset of family restaurants against the alternatives of fast food
and fancy restaurants. To do so, we need to use Stata's hausman command; see [R] hausman.
We first run the estimation on the full bottom alternative set, store the results by using estimates
store, and then run the estimation on the bottom alternative set, excluding the alternatives of family
restaurants. We then run the hausman test.
. generate incFast = (type == 1) * income
. generate incFancy = (type == 3) * income
. generate kidFast = (type == 1) * kids
. generate kidFancy = (type == 3) * kids
. clogit chosen cost rating distance incFast incFancy kidFast kidFancy,
> group(family_id) nolog
Conditional (fixed-effects) logistic regression Number of obs = 2100
LR chi2(7) = 189.73
Prob > chi2 = 0.0000
Log likelihood = -488.90834 Pseudo R2 = 0.1625
chosen Coef. Std. Err. z P>|z| [95% Conf. Interval]
cost -.1367799 .0358479 -3.82 0.000 -.2070404 -.0665193
rating .3066622 .1418291 2.16 0.031 .0286823 .584642
distance -.1977505 .0471653 -4.19 0.000 -.2901927 -.1053082
incFast -.0390183 .0094018 -4.15 0.000 -.0574455 -.0205911
incFancy .0407053 .0080405 5.06 0.000 .0249462 .0564644
kidFast -.2398757 .1063674 -2.26 0.024 -.448352 -.0313994
kidFancy -.3893862 .1143797 -3.40 0.001 -.6135662 -.1652061
. estimates store fullset
. clogit chosen cost rating distance incFast kidFast if type != 2,
> group(family_id) nolog
note: 222 groups (888 obs) dropped because of all positive or
all negative outcomes.
Conditional (fixed-effects) logistic regression Number of obs = 312
LR chi2(5) = 44.35
Prob > chi2 = 0.0000
Log likelihood = -85.955324 Pseudo R2 = 0.2051
chosen Coef. Std. Err. z P>|z| [95% Conf. Interval]
cost -.0616621 .067852 -0.91 0.363 -.1946496 .0713254
rating .1659001 .2832041 0.59 0.558 -.3891698 .72097
distance -.244396 .0995056 -2.46 0.014 -.4394234 -.0493687
incFast -.0737506 .0177444 -4.16 0.000 -.108529 -.0389721
kidFast .4105386 .2137051 1.92 0.055 -.0083157 .8293928
. hausman . fullset
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
. fullset Difference S.E.
cost -.0616621 -.1367799 .0751178 .0576092
rating .1659001 .3066622 -.1407621 .2451308
distance -.244396 -.1977505 -.0466456 .0876173
incFast -.0737506 -.0390183 -.0347323 .015049
kidFast .4105386 -.2398757 .6504143 .1853533
b = consistent under Ho and Ha; obtained from clogit
B = inconsistent under Ha, efficient under Ho; obtained from clogit
Test: Ho: difference in coefficients not systematic
chi2(5) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 10.70
Prob>chi2 = 0.0577
(V_b-V_B is not positive definite)
Similar to our findings in example 3, the results of the test of the IIA are mixed. We cannot reject
the IIA at the commonly used 5% significance level, but we could at the 10% level. Substantively, a
significant test result suggests that the odds of going to one of the fancy restaurants versus going to
one of the fast food restaurants changes if we include the family restaurants in the alternative set and
that a nested logit specification may be warranted.
Nonnormalized model
Previous versions of Stata fit a nonnormalized nested logit model that is available via the
nonnormalized option. The nonnormalized version is presented in, for example, Greene (2012, 768–770).
Here we outline the differences between the RUM-consistent and nonnormalized models. Our discussion
follows Heiss (2002) and assumes the decision tree has two levels, with M alternatives at the
upper level and a total of J alternatives at the bottom level.
In a RUM framework, by consuming alternative $j$, decision maker $i$ obtains utility
$$U_{ij} = V_{ij} + \epsilon_{ij} = \alpha_j + \mathbf{x}_{ij}\boldsymbol{\beta}_j + \mathbf{z}_i\boldsymbol{\gamma}_j + \epsilon_{ij}$$
where $V_{ij}$ is the deterministic part of utility and $\epsilon_{ij}$ is the random part. The $\mathbf{x}_{ij}$ are alternative-specific
variables, and the $\mathbf{z}_i$ are case-specific variables. The set of errors $\epsilon_{i1}, \dots, \epsilon_{iJ}$ is assumed to follow the
generalized extreme-value (GEV) distribution, which is a generalization of the type 1 extreme-value
distribution that allows alternatives within nests of the tree structure to be correlated. Let $\rho_m$
denote the correlation in nest $m$, and define the dissimilarity parameter $\tau_m = \sqrt{1 - \rho_m}$. $\tau_m = 0$
implies that the alternatives in nest $m$ are perfectly correlated, whereas $\tau_m = 1$ implies independence.
The inclusive value for the $m$th nest corresponds to the expected value of the utility that decision
maker $i$ obtains by consuming an alternative in nest $m$. Denote this value by $\mathrm{IV}_m$:
$$\mathrm{IV}_m = \ln \sum_{k \in B_m} \exp(V_k/\tau_m) \tag{1}$$
where $B_m$ denotes the set of alternatives in nest $m$. Given the inclusive values, we can show that
the probability that the random-utility-maximizing decision maker $i$ chooses alternative $j$ is
$$\Pr_j = \frac{\exp\{V_j/\tau(j)\}}{\exp\{\mathrm{IV}(j)\}}\;\frac{\exp\{\tau(j)\,\mathrm{IV}(j)\}}{\sum_m \exp(\tau_m \mathrm{IV}_m)}$$
where $\tau(j)$ and $\mathrm{IV}(j)$ are the dissimilarity parameter and inclusive value for the nest in which
alternative $j$ lies.
In contrast, for the nonnormalized model, we have a latent variable
$$\widetilde{V}_{i,j} = \widetilde{\alpha}_j + \mathbf{x}_{i,j}\widetilde{\boldsymbol{\beta}}_j + \mathbf{z}_i\widetilde{\boldsymbol{\gamma}}_j$$
and corresponding inclusive values
$$\widetilde{\mathrm{IV}}_m = \ln \sum_{k \in B_m} \exp(\widetilde{V}_k) \tag{2}$$
The probability of choosing alternative $j$ is
$$\Pr_j = \frac{\exp(\widetilde{V}_j)}{\exp\{\widetilde{\mathrm{IV}}(j)\}}\;\frac{\exp\{\tau(j)\,\widetilde{\mathrm{IV}}(j)\}}{\sum_m \exp(\tau_m \widetilde{\mathrm{IV}}_m)}$$
Equations (1) and (2) represent the key difference between the RUM-consistent and nonnormalized
models. By scaling the $V_{ij}$ within each nest, the RUM-consistent model allows utilities to be compared
across nests. Without the rescaling, utilities can be compared only for goods within the same nest.
Moreover, adding a constant to each $V_{ij}$ for consumer $i$ will not affect the probabilities of the
RUM-consistent model, but adding a constant to each $\widetilde{V}_{ij}$ will affect the probabilities from the
nonnormalized model. Because utility is an ordinal concept, decisions based on utility maximization
can depend only on utility differences and not on the scale or zero point of the utility function, so
the nonnormalized model cannot be consistent with utility maximization.
Heiss (2002) showed that the nonnormalized model can be RUM consistent in the special case
where all the variables are specified in the bottom-level equation. Then multiplying the nonnormalized
coefficients by the respective dissimilarity parameters results in the RUM-consistent coefficients.
Technical note
Degenerate nests occur when there is only one alternative in a branch of the tree hierarchy. The
associated dissimilarity parameter of the RUM model is not defined. The inclusive-value parameter
of the nonnormalized model will be identifiable if there are alternative-specific variables specified
in the first equation of the model specification (the indepvars in the model syntax). Numerically, you can
sidestep the issue of nonidentifiable or undefined parameters by placing constraints on them. For the
RUM model, constrain the dissimilarity parameter to 1. See the description of constraints() in Options
for details on setting constraints on the dissimilarity parameters.
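As a hedged sketch of that workaround (this is not one of the manual's worked examples, and the bracketed equation name is an assumption based on the /fast_tau label in the output above; confirm the exact name by replaying the results with the coeflegend option or by typing matrix list e(b)), a dissimilarity parameter can be pinned at 1 with a constraint and passed to nlogit through constraints():
. constraint 1 [fast_tau]_cons = 1
. nlogit chosen cost rating distance || type: income kids, base(family) ||
>     restaurant:, noconstant case(family_id) constraints(1)
The same device applies to a genuinely degenerate branch, where the constrained value simply removes the undefined parameter from estimation.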
Stored results
nlogit stores the following in e():
Scalars
  e(N)                number of observations
  e(N_case)           number of cases
  e(k_eq)             number of equations in e(b)
  e(k_eq_model)       number of equations in overall model test
  e(k_alt)            number of alternatives for bottom level
  e(k_altj)           number of alternatives for jth level
  e(k_indvars)        number of independent variables
  e(k_ind2vars)       number of by-alternative variables for bottom level
  e(k_ind2varsj)      number of by-alternative variables for jth level
  e(df_m)             model degrees of freedom
  e(df_c)             clogit model degrees of freedom
  e(ll)               log likelihood
  e(ll_c)             clogit model log likelihood
  e(N_clust)          number of clusters
  e(chi2)             χ²
  e(chi2_c)           likelihood-ratio test for IIA
  e(p)                p-value for model Wald test
  e(p_c)              p-value for IIA test
  e(i_base)           base index for bottom level
  e(i_basej)          base index for jth level
  e(levels)           number of levels
  e(alt_min)          minimum number of alternatives
  e(alt_avg)          average number of alternatives
  e(alt_max)          maximum number of alternatives
  e(const)            constant indicator for bottom level
  e(constj)           constant indicator for jth level
  e(rum)              1 if RUM model, 0 otherwise
  e(rank)             rank of e(V)
  e(ic)               number of iterations
  e(rc)               return code
  e(converged)        1 if converged, 0 otherwise
Macros
  e(cmd)                 nlogit
  e(cmdline)             command as typed
  e(depvar)              name of dependent variable
  e(indvars)             name of independent variables
  e(ind2vars)            by-alternative variables for bottom level
  e(ind2varsj)           by-alternative variables for jth level
  e(case)                variable defining cases
  e(altvar)              alternative variable for bottom level
  e(altvarj)             alternative variable for jth level
  e(alteqs)              equation names for bottom level
  e(alteqsj)             equation names for jth level
  e(alti)                ith alternative for bottom level
  e(altj_i)              ith alternative for jth level
  e(wtype)               weight type
  e(wexp)                weight expression
  e(title)               title in estimation output
  e(clustvar)            name of cluster variable
  e(chi2type)            Wald; type of model χ² test
  e(vce)                 vcetype specified in vce()
  e(vcetype)             title used to label Std. Err.
  e(opt)                 type of optimization
  e(which)               max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)           type of ml method
  e(user)                name of likelihood-evaluator program
  e(technique)           maximization technique
  e(datasignature)       the checksum
  e(datasignaturevars)   variables used in calculation of checksum
  e(properties)          b V
  e(estat_cmd)           program used to implement estat
  e(predict)             program used to implement predict
  e(marginsnotok)        predictions disallowed by margins
Matrices
  e(b)                 coefficient vector
  e(Cns)               constraints matrix
  e(k_altern)          number of alternatives at each level
  e(k_branchj)         number of branches at each alternative of jth level
  e(stats)             alternative statistics for bottom level
  e(statsj)            alternative statistics for jth level
  e(altidxj)           alternative indices for jth level
  e(alt_ind2vars)      indicators for the estimated by-alternative variables at the bottom level; e(k_alt) × e(k_ind2vars)
  e(alt_ind2varsj)     indicators for the estimated by-alternative variables at the jth level; e(k_altj) × e(k_ind2varsj)
  e(ilog)              iteration log (up to 20 iterations)
  e(gradient)          gradient vector
  e(V)                 variance–covariance matrix of the estimators
  e(V_modelbased)      model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Two-level nested logit model
Three-level nested logit model
Two-level nested logit model
Consider our two-level nested logit model for restaurant choice. We define $T = \{1, 2, 3\}$ to be the
set of indices denoting the three restaurant types and $R_1 = \{1, 2\}$, $R_2 = \{3, 4, 5\}$, and $R_3 = \{6, 7\}$
to be the sets of indices representing each restaurant within type $t \in T$. Let $C_1$ and $C_2$ be the
random variables that represent the choices made for the first level, restaurant type, and second level,
restaurant, of the hierarchy, where we observe the choices $C_1 = t$, $t \in T$, and $C_2 = j$, $j \in R_t$.
Let $\mathbf{z}_t$ and $\mathbf{x}_{tj}$, for $t \in T$ and $j \in R_t$, refer to the row vectors of explanatory variables for the
first-level alternatives and bottom-level alternatives for one case, respectively. We write the utilities
(latent variables) as $U_{tj} = \mathbf{z}_t\boldsymbol{\alpha}_t + \mathbf{x}_{tj}\boldsymbol{\beta}_j + \epsilon_{tj} = \eta_{tj} + \epsilon_{tj}$, where $\boldsymbol{\alpha}_t$ and $\boldsymbol{\beta}_j$ are column vectors and
the $\epsilon_{tj}$ are random disturbances. When the $\mathbf{x}_{tj}$ are alternative specific, we can drop the indices from
$\boldsymbol{\beta}$, where we estimate one coefficient for each alternative in $R_t$, $t \in T$. These variables are specified
in the first equation of the nlogit syntax (see example 3).
When the random-utility framework is used to describe the choice behavior, the alternative that is
chosen is the alternative that has the highest utility. Assume for our restaurant example that we choose
restaurant type $t \in T$. For the RUM parameterization of nlogit, the conditional distribution of $\epsilon_{tj}$
given choice of restaurant type $t$ is a multivariate version of Gumbel's extreme-value distribution,
$$F_{R|T}(\boldsymbol{\epsilon} \mid t) = \exp\left[-\left\{\sum_{m \in R_t} \exp(-\epsilon_{tm}/\tau_t)\right\}^{\tau_t}\right] \tag{3}$$
where it has been shown that the $\epsilon_{tj}$, $j \in R_t$, are exchangeable with correlation $1 - \tau_t^2$, for $\tau_t \in (0, 1]$
(Kotz and Nadarajah 2000). For example, the probability of choosing Christopher's, $j = 6$, given type
$t = 3$, is
$$\Pr(C_2 = 6 \mid C_1 = 3) = \Pr(U_{36} - U_{37} > 0) = \Pr(\epsilon_{37} < \epsilon_{36} + \eta_{36} - \eta_{37}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\epsilon_{36} + \eta_{36} - \eta_{37}} f_{R|T}(\epsilon_{36}, \epsilon_{37})\, d\epsilon_{37}\, d\epsilon_{36}$$
where $f = \partial^2 F/\partial\epsilon_{36}\,\partial\epsilon_{37}$ is the joint density function of $\boldsymbol{\epsilon}$ given $t$. $U_{37}$ is the utility of eating at Mad
Cows, the other fancy ($t = 3$) restaurant. Amemiya (1985) demonstrates that this integral evaluates
to the logistic function
$$\Pr(C_2 = 6 \mid C_1 = 3) = \frac{\exp(\eta_{36}/\tau_3)}{\exp(\eta_{36}/\tau_3) + \exp(\eta_{37}/\tau_3)} = \frac{\exp(\mathbf{x}_{36}\boldsymbol{\beta}_6/\tau_3)}{\exp(\mathbf{x}_{36}\boldsymbol{\beta}_6/\tau_3) + \exp(\mathbf{x}_{37}\boldsymbol{\beta}_7/\tau_3)}$$
and in general
$$\Pr(C_2 = j \mid C_1 = t) = \frac{\exp(\mathbf{x}_{tj}\boldsymbol{\beta}_j/\tau_t)}{\sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m/\tau_t)} \tag{4}$$
Letting $\tau_t = 1$ in (3) reduces it to the product of independent extreme-value distributions, and (4)
reduces to the multinomial logistic function.
For the logistic function in (4), we scale the linear predictors by the dissimilarity parameters.
Another formulation of the conditional probability of choosing alternative $j \in R_t$ given choice $t \in T$
is the logistic function without this normalization:
$$\Pr(C_2 = j \mid C_1 = t) = \frac{\exp(\mathbf{x}_{tj}\boldsymbol{\beta}_j)}{\sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m)}$$
and this is what is used in nlogit's nonnormalized parameterization.
Amemiya (1985) defines the general form for the joint distribution of the $\epsilon$'s as
$$F_{T,R}(\boldsymbol{\epsilon}) = \exp\left[-\sum_{k \in T} \theta_k\left\{\sum_{m \in R_k} \exp(-\epsilon_{km}/\tau_k)\right\}^{\tau_k}\right]$$
from which the probability of choice $t$, $t \in T$, can be derived as
$$\Pr(C_1 = t) = \frac{\theta_t\left\{\sum_{m \in R_t} \exp(\eta_{tm}/\tau_t)\right\}^{\tau_t}}{\sum_{k \in T} \theta_k\left\{\sum_{m \in R_k} \exp(\eta_{km}/\tau_k)\right\}^{\tau_k}} \tag{5}$$
nlogit sets $\theta_t = 1$. Noting that
$$\left\{\sum_{m \in R_t} \exp(\eta_{tm}/\tau_t)\right\}^{\tau_t} = \left\{\sum_{m \in R_t} \exp\left(\frac{\mathbf{z}_t\boldsymbol{\alpha}_t + \mathbf{x}_{tm}\boldsymbol{\beta}_m}{\tau_t}\right)\right\}^{\tau_t} = \exp(\mathbf{z}_t\boldsymbol{\alpha}_t)\left\{\sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m/\tau_t)\right\}^{\tau_t} = \exp(\mathbf{z}_t\boldsymbol{\alpha}_t + \tau_t I_t)$$
we define the inclusive values $I_t$ as
$$I_t = \ln \sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m/\tau_t)$$
and we can view
$$\exp(\tau_t I_t) = \left\{\sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m)^{1/\tau_t}\right\}^{\tau_t}$$
as a weighted average of the $\exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m)$, for $m \in R_t$. For the nlogit RUM parameterization, we
can express (5) as
$$\Pr(C_1 = t) = \frac{\exp(\mathbf{z}_t\boldsymbol{\alpha}_t + \tau_t I_t)}{\sum_{k \in T} \exp(\mathbf{z}_k\boldsymbol{\alpha}_k + \tau_k I_k)}$$
Next we define inclusive values for the nonnormalized model to be
$$\widetilde{I}_t = \ln \sum_{m \in R_t} \exp(\mathbf{x}_{tm}\boldsymbol{\beta}_m)$$
and we express $\Pr(C_1 = t)$ as
$$\Pr(C_1 = t) = \frac{\exp(\mathbf{z}_t\boldsymbol{\alpha}_t + \tau_t\widetilde{I}_t)}{\sum_{k \in T} \exp(\mathbf{z}_k\boldsymbol{\alpha}_k + \tau_k\widetilde{I}_k)} \tag{6}$$
Equation (5) is consistent with (6) only when $\eta_{ij} = \mathbf{x}_{ij}\boldsymbol{\beta}_j$, so in general the nlogit nonnormalized
model is not consistent with the RUM model.
Now assume that we have $N$ cases, where we add a third subscript, $i$, to denote case $i$, $i = 1, \dots, N$.
Denote $y_{itj}$ to be a binary variable indicating the choice made by case $i$, so that for each $i$ only
one $y_{itj}$ is 1 and the rest are 0 for all $t \in T$ and $j \in R_t$. The log likelihood for the two-level
RUM-consistent model is
$$\log\ell = \sum_{i=1}^{N}\sum_{k \in T}\sum_{m \in R_k} y_{ikm}\log\{\Pr(C_{i1} = k)\,\Pr(C_{i2} = m \mid C_{i1} = k)\}$$
$$= \sum_{i=1}^{N}\sum_{k \in T}\sum_{m \in R_k} y_{ikm}\left[\mathbf{z}_{ik}\boldsymbol{\alpha}_k + \tau_k I_{ik} - \log\left\{\sum_{l \in T}\exp(\mathbf{z}_{il}\boldsymbol{\alpha}_l + \tau_l I_{il})\right\} + \mathbf{x}_{ikm}\boldsymbol{\beta}_m/\tau_k - \log\left\{\sum_{l \in R_k}\exp(\mathbf{x}_{ikl}\boldsymbol{\beta}_l/\tau_k)\right\}\right]$$
The likelihood for the nonnormalized model has a similar form, obtained by replacing $I$ with $\widetilde{I}$ and
by not scaling $\mathbf{x}_{ikm}\boldsymbol{\beta}_m$ by $\tau_k$.
Three-level nested logit model
Here we define a three-level nested logit model that can be generalized to four-level and higher
models. As before, let the integer set $T$ contain the indices for the first level of choices. Let the sets $S_t$,
$t \in T$, be mutually exclusive sets of integers representing the choices of the second level of the
hierarchy. Finally, let $R_j$, $j \in S_t$, be the bottom-level choices. Let $U_{tjk} = \eta_{tjk} + \epsilon_{tjk}$, $k \in R_j$, and
the distribution of the $\epsilon_{tjk}$ be Gumbel's multivariate extreme value of the form
$$F(\boldsymbol{\epsilon}) = \exp\left(-\sum_{t \in T}\left[\sum_{j \in S_t}\left\{\sum_{k \in R_j}\exp(-\epsilon_{tjk}/\tau_j)\right\}^{\tau_j/\upsilon_t}\right]^{\upsilon_t}\right)$$
Let $C_1$, $C_2$, and $C_3$ represent the choice random variables for levels 1, 2, and the bottom, respectively.
Then the set of conditional probabilities is
$$\Pr(C_3 = k \mid C_1 = t,\, C_2 = j) = \frac{\exp(\eta_{tjk}/\tau_j)}{\sum_{l \in R_j}\exp(\eta_{tjl}/\tau_j)}$$
$$\Pr(C_2 = j \mid C_1 = t) = \frac{\left\{\sum_{k \in R_j}\exp(\eta_{tjk}/\tau_j)\right\}^{\tau_j/\upsilon_t}}{\sum_{l \in S_t}\left\{\sum_{k \in R_l}\exp(\eta_{tlk}/\tau_l)\right\}^{\tau_l/\upsilon_t}}$$
$$\Pr(C_1 = t) = \frac{\left[\sum_{j \in S_t}\left\{\sum_{k \in R_j}\exp(\eta_{tjk}/\tau_j)\right\}^{\tau_j/\upsilon_t}\right]^{\upsilon_t}}{\sum_{l \in T}\left[\sum_{j \in S_l}\left\{\sum_{k \in R_j}\exp(\eta_{ljk}/\tau_j)\right\}^{\tau_j/\upsilon_l}\right]^{\upsilon_l}}$$
Assume that we can decompose the linear predictor as $\eta_{tjk} = \mathbf{z}_t\boldsymbol{\alpha}_t + \mathbf{u}_{tj}\boldsymbol{\gamma}_j + \mathbf{x}_{tjk}\boldsymbol{\beta}_k$. Here $\mathbf{z}_t$,
$\mathbf{u}_{tj}$, and $\mathbf{x}_{tjk}$ are the row vectors of explanatory variables for the first, second, and bottom levels of
the hierarchy, respectively, and $\boldsymbol{\alpha}_t$, $\boldsymbol{\gamma}_j$, and $\boldsymbol{\beta}_k$ are the corresponding column vectors of regression
coefficients for $t \in T$, $j \in S_t$, and $k \in R_j$. We then can define the inclusive values for the first and
second levels as
$$I_{tj} = \log\sum_{k \in R_j}\exp(\mathbf{x}_{tjk}\boldsymbol{\beta}_k/\tau_j)$$
$$J_t = \log\sum_{j \in S_t}\exp\left(\mathbf{u}_{tj}\boldsymbol{\gamma}_j/\upsilon_t + \frac{\tau_j}{\upsilon_t}I_{tj}\right)$$
and rewrite the probabilities as
$$\Pr(C_3 = k \mid C_1 = t,\, C_2 = j) = \frac{\exp(\mathbf{x}_{tjk}\boldsymbol{\beta}_k/\tau_j)}{\sum_{l \in R_j}\exp(\mathbf{x}_{tjl}\boldsymbol{\beta}_l/\tau_j)}$$
$$\Pr(C_2 = j \mid C_1 = t) = \frac{\exp(\mathbf{u}_{tj}\boldsymbol{\gamma}_j/\upsilon_t + \frac{\tau_j}{\upsilon_t}I_{tj})}{\sum_{l \in S_t}\exp(\mathbf{u}_{tl}\boldsymbol{\gamma}_l/\upsilon_t + \frac{\tau_l}{\upsilon_t}I_{tl})}$$
$$\Pr(C_1 = t) = \frac{\exp(\mathbf{z}_t\boldsymbol{\alpha}_t + \upsilon_t J_t)}{\sum_{l \in T}\exp(\mathbf{z}_l\boldsymbol{\alpha}_l + \upsilon_l J_l)}$$
We add a fourth index, $i$, for case and define the indicator variable $y_{itjk}$, $i = 1, \dots, N$, to
indicate the choice made by case $i$, $t \in T$, $j \in S_t$, and $k \in R_j$. The log likelihood for the nlogit
RUM-consistent model is
$$\ell = \sum_{i=1}^{N}\sum_{t \in T}\sum_{j \in S_t}\sum_{k \in R_j} y_{itjk}\left[\mathbf{z}_{it}\boldsymbol{\alpha}_t + \upsilon_t J_{it} - \log\left\{\sum_{m \in T}\exp(\mathbf{z}_{im}\boldsymbol{\alpha}_m + \upsilon_m J_{im})\right\} + \mathbf{u}_{itj}\boldsymbol{\gamma}_j/\upsilon_t + \frac{\tau_j}{\upsilon_t}I_{itj} - \log\left\{\sum_{m \in S_t}\exp\left(\mathbf{u}_{itm}\boldsymbol{\gamma}_m/\upsilon_t + \frac{\tau_m}{\upsilon_t}I_{itm}\right)\right\} + \mathbf{x}_{itjk}\boldsymbol{\beta}_k/\tau_j - \log\left\{\sum_{m \in R_j}\exp(\mathbf{x}_{itjm}\boldsymbol{\beta}_m/\tau_j)\right\}\right]$$
and for the nonnormalized nlogit model the log likelihood is
$$\ell = \sum_{i=1}^{N}\sum_{t \in T}\sum_{j \in S_t}\sum_{k \in R_j} y_{itjk}\left[\mathbf{z}_{it}\boldsymbol{\alpha}_t + \upsilon_t J_{it} - \log\left\{\sum_{m \in T}\exp(\mathbf{z}_{im}\boldsymbol{\alpha}_m + \upsilon_m J_{im})\right\} + \mathbf{u}_{itj}\boldsymbol{\gamma}_j + \tau_j I_{itj} - \log\left\{\sum_{m \in S_t}\exp(\mathbf{u}_{itm}\boldsymbol{\gamma}_m + \tau_m I_{itm})\right\} + \mathbf{x}_{itjk}\boldsymbol{\beta}_k - \log\left\{\sum_{m \in R_j}\exp(\mathbf{x}_{itjm}\boldsymbol{\beta}_m)\right\}\right]$$
Extending the model to more than three levels is straightforward, albeit notationally cumbersome.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
References
Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hausman, J. A. 1978. Specification tests in econometrics. Econometrica 46: 1251–1271.
Hausman, J. A., and D. L. McFadden. 1984. Specification tests for the multinomial logit model. Econometrica 52:
1219–1240.
Heiss, F. 2002. Structural choice analysis with nested logit models. Stata Journal 2: 227–252.
Hensher, D. A., J. M. Rose, and W. H. Greene. 2005. Applied Choice Analysis: A Primer. New York: Cambridge
University Press.
Kotz, S., and S. Nadarajah. 2000. Extreme Value Distributions: Theory and Applications. London: Imperial College
Press.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University
Press.
McFadden, D. L. 1977. Quantitative methods for analyzing travel behaviour of individuals: Some recent developments.
Working paper 474, Cowles Foundation. http://cowles.econ.yale.edu/P/cd/d04b/d0474.pdf.
McFadden, D. L. 1981. Econometric models of probabilistic choice. In Structural Analysis of Discrete Data with Econometric
Applications, ed. C. F. Manski and D. McFadden, 198–272. Cambridge, MA: MIT Press.
Also see
[R] nlogit postestimation     Postestimation tools for nlogit
[R] asclogit     Alternative-specific conditional logit (McFadden's choice) model
[R] clogit     Conditional (fixed-effects) logistic regression
[R] mlogit     Multinomial (polytomous) logistic regression
[R] ologit     Ordered logistic regression
[R] rologit     Rank-ordered logistic regression
[R] slogit     Stereotype logistic regression
[U] 20 Estimation and postestimation commands
Title
nlogit postestimation — Postestimation tools for nlogit
Description Syntax for predict Menu for predict
Options for predict Syntax for estat alternatives Menu for estat
Remarks and examples Also see
Description
The following postestimation command is of special interest after nlogit:
Command Description
estat alternatives alternative summary statistics
The following standard postestimation commands are also available:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest likelihood-ratio test
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation command
estat alternatives displays summary statistics about the alternatives in the estimation sample
for each level of the tree structure.
Syntax for predict
predict [type] newvar [if] [in] [, statistic hlevel(#) altwise]

predict [type] {stub* | newvarlist} [if] [in], scores

statistic    Description
Main
  pr         predicted probabilities of choosing the alternatives at all levels of the hierarchy or at level #, where # is specified by hlevel(#); the default
  xb         linear predictors for all levels of the hierarchy or at level #, where # is specified by hlevel(#)
  condp      predicted conditional probabilities at all levels of the hierarchy or at level #, where # is specified by hlevel(#)
  iv         inclusive values for levels 2, ..., e(levels) or for hlevel(#)

The inclusive value for the first-level alternatives is not used in estimation; therefore, it is not calculated.
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pr calculates the probability of choosing each alternative at each level of the hierarchy. Use the
hlevel(#) option to compute the alternative probabilities at level #. When hlevel(#) is not
specified, j new variables must be given, where j is the number of levels, or use the stub* option to
have predict generate j variables with the prefix stub and numbered from 1 to j. The pr option
is the default, and if one new variable is given, the probabilities of the bottom-level alternatives are
computed. Otherwise, probabilities for all levels are computed and the stub* option is still valid.
xb calculates the linear prediction for each alternative at each level. Use the hlevel(#) option to
compute the linear predictor at level #. When hlevel(#) is not specified, j new variables must
be given, where j is the number of levels, or use the stub* option to have predict generate j
variables with the prefix stub and numbered from 1 to j.
condp calculates the conditional probabilities for each alternative at each level. Use the hlevel(#)
option to compute the conditional probabilities of the alternatives at level #. When hlevel(#) is
not specified, j new variables must be given, where j is the number of levels, or use the stub*
option to have predict generate j variables with the prefix stub and numbered from 1 to j.
iv calculates the inclusive value for each alternative at each level. Use the hlevel(#) option to
compute the inclusive value at level #. There is no inclusive value at level 1. If hlevel(#) is
not used, j−1 new variables are required, where j is the number of levels, or use stub* to have
predict generate j−1 variables with the prefix stub and numbered from 2 to j. See Methods
and formulas in [R] nlogit for a definition of the inclusive values.
hlevel(#) calculates the prediction only for hierarchy level #.
altwise specifies that alternativewise deletion be used when marking out observations due to missing
values in your variables. The default is to use casewise deletion. The xb option always uses
alternativewise deletion.
scores calculates the scores for each coefficient in e(b). This option requires a new-variable list of
length equal to the number of columns in e(b). Otherwise, use the stub* option to have predict
generate enumerated variables with prefix stub.
Syntax for estat alternatives
estat alternatives
Menu for estat
Statistics > Postestimation > Reports and statistics
Remarks and examples
predict may be used after nlogit to obtain the predicted values of the probabilities, the conditional
probabilities, the linear predictions, and the inclusive values for each level of the nested logit model.
Predicted probabilities for nlogit must be interpreted carefully. Probabilities are estimated for each
case as a whole and not for individual observations.
Example 1
Continuing with our model in example 3 of [R] nlogit, we refit the model and then examine a
summary of the alternatives and their frequencies in the estimation sample.
. use http://www.stata-press.com/data/r13/restaurant
. nlogitgen type = restaurant(fast: Freebirds | MamasPizza,
> family: CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)
(output omitted )
. nlogit chosen cost rating distance || type: income kids, base(family) ||
> restaurant:, noconst case(family_id)
(output omitted )
. estat alternatives
Alternatives summary for type
Alternative Cases Frequency Percent
index value label present selected selected
1 1 fast 600 27 9.00
2 2 family 900 222 74.00
3 3 fancy 600 51 17.00
Alternatives summary for restaurant
Alternative Cases Frequency Percent
index value label present selected selected
1 1 Freebirds 300 12 4.00
2 2 MamasPizza 300 15 5.00
3 3 CafeEccell 300 78 26.00
4 4 LosNortenos 300 75 25.00
5 5 WingsNmore 300 69 23.00
6 6 Christophers 300 27 9.00
7 7 MadCows 300 24 8.00
Next we predict p2 = Pr(restaurant); p1 = Pr(type); condp = Pr(restaurant | type); xb2,
the linear prediction for the bottom-level alternatives; xb1, the linear prediction for the first-level
alternatives; and iv, the inclusive values for the bottom-level alternatives.
. predict p*
(option pr assumed)
. predict condp, condp hlevel(2)
. sort family_id type restaurant
. list restaurant type chosen p2 p1 condp in 1/14, sepby(family_id) divider
restaurant type chosen p2 p1 condp
1. Freebirds fast 1 .0642332 .1189609 .5399519
2. MamasPizza fast 0 .0547278 .1189609 .4600481
3. CafeEccell family 0 .284409 .7738761 .3675124
4. LosNortenos family 0 .3045242 .7738761 .3935051
5. WingsNmore family 0 .1849429 .7738761 .2389825
6. Christophers fancy 0 .0429508 .107163 .4007991
7. MadCows fancy 0 .0642122 .107163 .5992009
8. Freebirds fast 0 .0183578 .0488948 .3754559
9. MamasPizza fast 0 .030537 .0488948 .6245441
10. CafeEccell family 0 .2832149 .756065 .3745907
11. LosNortenos family 1 .3038883 .756065 .4019341
12. WingsNmore family 0 .1689618 .756065 .2234752
13. Christophers fancy 0 .1041277 .1950402 .533878
14. MadCows fancy 0 .0909125 .1950402 .466122
. predict xb*, xb
. predict iv, iv
. list restaurant type chosen xb* iv in 1/14, sepby(family_id) divider
restaurant type chosen xb1 xb2 iv
1. Freebirds fast 1 -1.124805 -1.476914 -.2459659
2. MamasPizza fast 0 -1.124805 -1.751229 -.2459659
3. CafeEccell family 0 0 -2.181112 .1303341
4. LosNortenos family 0 0 -2.00992 .1303341
5. WingsNmore family 0 0 -3.259229 .1303341
6. Christophers fancy 0 1.405185 -6.804211 -.745332
7. MadCows fancy 0 1.405185 -5.155514 -.745332
8. Freebirds fast 0 -1.804794 -2.552233 -.5104123
9. MamasPizza fast 0 -1.804794 -1.680583 -.5104123
10. CafeEccell family 0 0 -2.400434 .0237072
11. LosNortenos family 1 0 -2.223939 .0237072
12. WingsNmore family 0 0 -3.694409 .0237072
13. Christophers fancy 0 1.490775 -5.35932 -.6796131
14. MadCows fancy 0 1.490775 -5.915751 -.6796131
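Because the probabilities are computed for each case as a whole, the bottom-level probabilities p2 sum to 1 within each family. A hedged sketch of a quick check (the variable name sump2 is ours, not part of the example):
. by family_id, sort: egen double sump2 = total(p2)
. assert reldif(sump2, 1) < 1e-6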
Also see
[R] nlogit     Nested logit regression
[U] 20 Estimation and postestimation commands
Title
nlsur — Estimation of nonlinear systems of equations
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Interactive version
    nlsur (depvar_1 = <sexp_1>) (depvar_2 = <sexp_2>) ... [if] [in] [weight] [, options]
Programmed substitutable expression version
    nlsur sexp_prog : depvar_1 depvar_2 ... [varlist] [if] [in] [weight] [, options]
Function evaluator program version
    nlsur func_prog @ depvar_1 depvar_2 ... [varlist] [if] [in] [weight],
        nequations(#) {parameters(namelist)|nparameters(#)} [options]
where
    depvar_j is the dependent variable for equation j;
    <sexp_j> is the substitutable expression for equation j;
    sexp_prog is a substitutable expression program; and
    func_prog is a function evaluator program.
options    Description
Model
  fgnls                     use two-step FGNLS estimator; the default
  ifgnls                    use iterative FGNLS estimator
  nls                       use NLS estimator
  variables(varlist)        variables in model
  initial(initial_values)   initial values for parameters
  nequations(#)             number of equations in model (function evaluator program version only)
  parameters(namelist)      parameters in model (function evaluator program version only)
  nparameters(#)            number of parameters in model (function evaluator program version only)
  sexp_options              options for substitutable expression program
  func_options              options for function evaluator program
SE/Robust
  vce(vcetype)              vcetype may be gnr, robust, cluster clustvar, bootstrap, or jackknife
Reporting
  level(#)                  set confidence level; default is level(95)
  title(string)             display string as title above the table of parameter estimates
  title2(string)            display string as subtitle
  display_options           control column formats and line width
Optimization
  optimization_options      control the optimization process; seldom used
  eps(#)                    specify # for convergence criteria; default is eps(1e-5)
  ifgnlsiterate(#)          set maximum number of FGNLS iterations
  ifgnlseps(#)              specify # for FGNLS convergence criterion; default is ifgnlseps(1e-10)
  delta(#)                  specify stepsize # for computing derivatives; default is delta(4e-7)
  noconstants               no equations have constant terms
  hasconstants(namelist)    use namelist as constant terms
  coeflegend                display legend instead of statistics

You must specify parameters(namelist), nparameters(#), or both.
bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Multiple-equation models > Nonlinear seemingly unrelated regression
Description
nlsur fits a system of nonlinear equations by feasible generalized nonlinear least squares (FGNLS).
With the interactive version of the command, you enter the system of equations on the command line
or in the dialog box by using substitutable expressions. If you have a system that you use regularly,
you can write a substitutable expression program and use the second syntax to avoid having to
reenter the system every time. The function evaluator program version gives you the most flexibility
in exchange for increased complexity; with this version, your program is given a vector of parameters
and a variable list, and your program computes the system of equations.
When you write a substitutable expression program or a function evaluator program, the first five
letters of the name must be nlsur. sexp_prog and func_prog refer to the name of the program without
the first five letters. For example, if you wrote a function evaluator program named nlsurregss,
you would type nlsur regss @ ... to estimate the parameters.
Options
 
Model
fgnls requests the two-step FGNLS estimator; this is the default.
ifgnls requests the iterative FGNLS estimator. For the nonlinear systems estimator, this is equivalent
to maximum likelihood estimation.
nls requests the nonlinear least-squares (NLS) estimator.
variables(varlist) specifies the variables in the system. nlsur ignores observations for which any
of these variables has missing values. If you do not specify variables(), nlsur issues an error
message if the estimation sample contains any missing values.
initial(initial_values) specifies the initial values to begin the estimation. You can specify a 1 × k
matrix, where k is the total number of parameters in the system, or you can specify a parameter
name, its initial value, another parameter name, its initial value, and so on. For example, to
initialize alpha to 1.23 and delta to 4.57, you would type
. nlsur ..., initial(alpha 1.23 delta 4.57) ...
Initial values declared using this option override any that are declared within substitutable expressions.
If you specify a matrix, the values must be in the same order in which the parameters are
declared in your model. nlsur ignores the row and column names of the matrix.
nequations(#) specifies the number of equations in the system.
parameters(namelist) specifies the names of the parameters in the system. The names of the
parameters must adhere to the naming conventions of Stata's variables; see [U] 11.3 Naming
conventions. If you specify both parameters() and nparameters(), the number of names in
the former must match the number specified in the latter.
nparameters(#) specifies the number of parameters in the system. If you do not specify names with
the parameters() option, nlsur names them b1, b2, ..., b#. If you specify both parameters()
and nparameters(), the number of names in the former must match the number specified in the
latter.
sexp_options refer to any options allowed by your sexp_prog.
func_options refer to any options allowed by your func_prog.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (gnr), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
vce(gnr), the default, uses the conventionally derived variance estimator for nonlinear models fit
using Gauss–Newton regression.

Reporting
level(#); see [R] estimation options.
title(string) specifies an optional title that will be displayed just above the table of parameter
estimates.
title2(string) specifies an optional subtitle that will be displayed between the title specified in
title() and the table of parameter estimates. If title2() is specified but title() is not,
title2() has the same effect as title().
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

Optimization
optimization_options: iterate(#), nolog, trace. iterate() specifies the maximum number
of iterations to use for NLS at each round of FGNLS estimation. This option is different from
ifgnlsiterate(), which controls the maximum rounds of FGNLS estimation to use when the
ifgnls option is specified. log/nolog specifies whether to show the iteration log, and trace
specifies that the iteration log should include the current parameter vector.
eps(#) specifies the convergence criterion for successive parameter estimates and for the residual
sum of squares (RSS). The default is eps(1e-5) (0.00001). eps() also specifies the convergence
criterion for successive parameter estimates between rounds of iterative FGNLS estimation when
ifgnls is specified.
ifgnlsiterate(#) specifies the maximum number of FGNLS iterations to perform. The default is
the number set using set maxiter (see [R] maximize), which is 16,000 by default. To use this
option, you must also specify the ifgnls option.
ifgnlseps(#) specifies the convergence criterion for successive estimates of the error covariance
matrix during iterative FGNLS estimation. The default is ifgnlseps(1e-10). To use this option,
you must also specify the ifgnls option.
delta(#) specifies the relative change in a parameter, $\delta$, to be used in computing the numeric
derivatives. The derivative for parameter $\beta_i$ is computed as
$$\{f_i(\mathbf{x}_i, \beta_1, \beta_2, \dots, \beta_i + d, \beta_{i+1}, \dots) - f_i(\mathbf{x}_i, \beta_1, \beta_2, \dots, \beta_i, \beta_{i+1}, \dots)\}/d$$
where $d = \delta(|\beta_i| + \delta)$. The default is delta(4e-7).
noconstants indicates that none of the equations in the system includes constant terms. This option
is generally not needed, even if there are no constant terms in the system; though in rare cases
without this option, nlsur may claim that there are one or more constant terms even if there are
none.
hasconstants(namelist) indicates the parameters that are to be treated as constant terms in the
system of equations. The number of elements of namelist must equal the number of equations in
the system. The ith entry of namelist specifies the constant term in the ith equation. If an equation
does not include a constant term, specify a period (.) instead of a parameter name. This option is
seldom needed with the interactive and programmed substitutable expression versions, because in
those cases nlsur can almost always find the constant terms automatically.
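For example, in a hedged sketch with made-up variable and parameter names (not from the manual), a two-equation system in which the first equation's constant is {a0} and the second equation has none could be typed as
. nlsur (y1 = {a0} + {a1}*x) (y2 = {b1}*x), hasconstants(a0 .)
Here the option merely illustrates the namelist form with a period marking the constant-free equation; in an interactive call like this one, nlsur would normally detect the constant automatically.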
The following options are available with nlsur but are not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Substitutable expression programs
Function evaluator programs
Introduction
nlsur fits a system of nonlinear equations by FGNLS. It can be viewed as a nonlinear variant of
Zellner's seemingly unrelated regression model (Zellner 1962; Zellner and Huang 1962; Zellner 1963)
and is therefore commonly called nonlinear SUR or nonlinear SURE. The model is also discussed in
textbooks such as Davidson and MacKinnon (1993, 2004) and Greene (2012, 305–306). Formally,
the model fit by nlsur is
$$y_{i1} = f_1(\mathbf{x}_i, \boldsymbol{\beta}) + u_{i1}$$
$$y_{i2} = f_2(\mathbf{x}_i, \boldsymbol{\beta}) + u_{i2}$$
$$\vdots$$
$$y_{iM} = f_M(\mathbf{x}_i, \boldsymbol{\beta}) + u_{iM}$$
for $i = 1, \dots, N$ observations and $m = 1, \dots, M$ equations. The errors for the $i$th observation,
$u_{i1}, u_{i2}, \dots, u_{iM}$, may be correlated, so fitting the $M$ equations jointly may lead to more efficient
estimates. Moreover, fitting the equations jointly allows us to impose cross-equation restrictions on
the parameters. Not all elements of the parameter vector $\boldsymbol{\beta}$ and data vector $\mathbf{x}_i$ must appear in all the
equations, though each element of $\boldsymbol{\beta}$ must appear in at least one equation for $\boldsymbol{\beta}$ to be identified. For this
model, iterative FGNLS estimation is equivalent to maximum likelihood estimation with multivariate
normal disturbances.
The syntax you use with nlsur closely mirrors that used with nl. In particular, you use substitutable
expressions with the interactive and programmed substitutable expression versions to define the functions
in your system. See [R] nl for more information on substitutable expressions. Here we reiterate the
three rules that you must follow:
1. Parameters of the model are bound in braces: {b0}, {param}, etc.
2. Initial values for parameters are given by including an equal sign and the initial value
inside the braces: {b0=1}, {param=3.571}, etc. If you do not specify an initial value, that
parameter is initialized to zero. The initial() option overrides initial values in substitutable
expressions.
3. Linear combinations of variables can be included using the notation {eqname:varlist}, for
example, {xb: mpg price weight}, {score: w x z}, etc. Parameters of linear combinations
are initialized to zero.
Example 1: Interactive version using two-step FGNLS estimator
We have data from an experiment in which two closely related types of bacteria were placed
in a Petri dish, and the number of each type of bacteria was recorded every hour. We suspect a
two-parameter exponential growth model can be used to model each type of bacteria, but because
they shared the same dish, we want to allow for correlation in the error terms. We want to fit the
system of equations
$$p_1 = \beta_1\beta_2^{\,t} + u_1$$
$$p_2 = \gamma_1\gamma_2^{\,t} + u_2$$
where $p_1$ and $p_2$ are the two populations and $t$ is time, and we want to allow for nonzero correlation
between $u_1$ and $u_2$. We type
. use http://www.stata-press.com/data/r13/petridish
. nlsur (p1 = {b1}*{b2}^t) (p2 = {g1}*{g2}^t)
(obs = 25)
Calculating NLS estimates...
Iteration 0: Residual SS = 335.5286
Iteration 1: Residual SS = 333.8583
Iteration 2: Residual SS = 219.9233
Iteration 3: Residual SS = 127.9355
Iteration 4: Residual SS = 14.86765
Iteration 5: Residual SS = 8.628459
Iteration 6: Residual SS = 8.281268
Iteration 7: Residual SS = 8.28098
Iteration 8: Residual SS = 8.280979
Iteration 9: Residual SS = 8.280979
Calculating FGNLS estimates...
Iteration 0: Scaled RSS = 49.99892
Iteration 1: Scaled RSS = 49.99892
Iteration 2: Scaled RSS = 49.99892
FGNLS regression
Equation Obs Parms RMSE R-sq Constant
1 p1 25 2 .4337019 0.9734* (none)
2 p2 25 2 .3783479 0.9776* (none)
* Uncentered R-sq
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 .3926631 .064203 6.12 0.000 .2668275 .5184987
/b2 1.119593 .0088999 125.80 0.000 1.102149 1.137036
/g1 .5090441 .0669495 7.60 0.000 .3778256 .6402626
/g2 1.102315 .0072183 152.71 0.000 1.088167 1.116463
The header of the output contains a summary of each equation, including the number of observations
and parameters and the root mean squared error of the residuals. nlsur checks to see whether each
equation contains a constant term, and if an equation does contain a constant term, an R² statistic is
presented. If an equation does not have a constant term, an uncentered R² is instead reported. The
R² statistic for each equation measures the percentage of variance explained by the nonlinear function
and may be useful for descriptive purposes, though it does not have the same formal interpretation
in the context of FGNLS as it does with NLS estimation. As we would expect, β₂ and γ₂ are both
greater than one, indicating that the two bacterial populations increased in size over time.
The model we fit in the next three examples is in fact linear in the parameters, so it could be fit
using the sureg command. However, we will fit the model using nlsur so that we can focus on the
mechanics of using the command. Moreover, using nlsur will obviate the need to generate several
variables as well as the need to use the constraint command to impose parameter restrictions.
Example 2: Interactive version using iterative FGNLS estimator: the translog production function
Greene (1997, sec. 15.6) discusses the transcendental logarithmic (translog) cost function and
provides cost and input price data for capital, labor, energy, and materials for the U.S. economy. One
way to fit the translog production function to these data is to fit the system of three equations
$$s_k = \beta_k + \delta_{kk}\ln\!\left(\frac{p_k}{p_m}\right) + \delta_{kl}\ln\!\left(\frac{p_l}{p_m}\right) + \delta_{ke}\ln\!\left(\frac{p_e}{p_m}\right) + u_1$$
$$s_l = \beta_l + \delta_{kl}\ln\!\left(\frac{p_k}{p_m}\right) + \delta_{ll}\ln\!\left(\frac{p_l}{p_m}\right) + \delta_{le}\ln\!\left(\frac{p_e}{p_m}\right) + u_2$$
$$s_e = \beta_e + \delta_{ke}\ln\!\left(\frac{p_k}{p_m}\right) + \delta_{le}\ln\!\left(\frac{p_l}{p_m}\right) + \delta_{ee}\ln\!\left(\frac{p_e}{p_m}\right) + u_3$$
where $s_k$ is capital's cost share, $s_l$ is labor's cost share, and $s_e$ is energy's cost share; $p_k$, $p_l$, $p_e$, and
$p_m$ are the prices of capital, labor, energy, and materials, respectively; the $u$'s are regression error
terms; and the $\beta$'s and $\delta$'s are parameters to be estimated. There are three cross-equation restrictions
on the parameters: $\delta_{kl}$, $\delta_{ke}$, and $\delta_{le}$ each appear in two equations. To fit this model by using the
iterative FGNLS estimator, we type
. use http://www.stata-press.com/data/r13/mfgcost
. nlsur (s_k = {bk} + {dkk}*ln(pk/pm) + {dkl}*ln(pl/pm) + {dke}*ln(pe/pm))
> (s_l = {bl} + {dkl}*ln(pk/pm) + {dll}*ln(pl/pm) + {dle}*ln(pe/pm))
> (s_e = {be} + {dke}*ln(pk/pm) + {dle}*ln(pl/pm) + {dee}*ln(pe/pm)),
> ifgnls
(obs = 25)
Calculating NLS estimates...
Iteration 0: Residual SS = .0009989
Iteration 1: Residual SS = .0009989
Calculating FGNLS estimates...
Iteration 0: Scaled RSS = 65.45197
Iteration 1: Scaled RSS = 65.45197
(output omitted )
FGNLS iteration 10...
Iteration 0: Scaled RSS = 75
Iteration 1: Scaled RSS = 75
Parameter change = 4.074e-06
Covariance matrix change = 6.265e-10
FGNLS regression
Equation Obs Parms RMSE R-sq Constant
1 s_k 25 4 .0031722 0.4776 bk
2 s_l 25 4 .0053963 0.8171 bl
3 s_e 25 4 .00177 0.6615 be
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/bk .0568925 .0013454 42.29 0.000 .0542556 .0595294
/dkk .0294833 .0057956 5.09 0.000 .0181241 .0408425
/dkl -.0000471 .0038478 -0.01 0.990 -.0075887 .0074945
/dke -.0106749 .0033882 -3.15 0.002 -.0173157 -.0040341
/bl .253438 .0020945 121.00 0.000 .2493329 .2575432
/dll .0754327 .0067572 11.16 0.000 .0621889 .0886766
/dle -.004756 .002344 -2.03 0.042 -.0093501 -.0001619
/be .0444099 .0008533 52.04 0.000 .0427374 .0460823
/dee .0183415 .0049858 3.68 0.000 .0085694 .0281135
We draw your attention to the iteration log at the top of the output. When iterative FGNLS estimation
is used, the final scaled RSS will equal the product of the number of observations in the estimation
sample and the number of equations; see Methods and formulas for details. Because the RSS is
scaled by the error covariance matrix during each round of FGNLS estimation, the scaled RSS is not
comparable from one FGNLS iteration to the next.
Technical note
You may have noticed that we mentioned having data for four factors of production, yet we fit only
three share equations. Because the four shares sum to one, we must drop one of the equations to avoid
having a singular error covariance matrix. The iterative FGNLS estimator is equivalent to maximum
likelihood estimation, and thus it is invariant to which one of the four equations we choose to drop.
The (linearly restricted) parameters of the fourth equation can be obtained using the lincom command.
Nonlinear functions of the parameters, such as the elasticities of substitution, can be computed using
nlcom.
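For instance, because the four cost shares sum to one, the constant of the omitted materials equation could be recovered after the fit in example 2 with a command along the following lines (a sketch using the parameter names defined there):

. lincom 1 - [bk]_cons - [bl]_cons - [be]_cons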
Substitutable expression programs
If you fit the same model repeatedly or you want to share code with colleagues, you can write
a substitutable expression program to define your system of equations and avoid having to retype
the system every time. The first five letters of the program's name must be nlsur, and the program
must set the r-class macro r(n_eq) to the number of equations in your system. The first equation's
substitutable expression must be returned in r(eq_1), the second equation's in r(eq_2), and so on.
You may optionally set r(title) to label your output; that has the same effect as specifying the
title() option.
Example 3: Programmed substitutable expression version
We return to our translog cost function, for which a substitutable expression program is
program nlsurtranslog, rclass
        version 13
        syntax varlist(min=7 max=7) [if]
        tokenize `varlist'
        args sk sl se pk pl pe pm
        local pkpm ln(`pk'/`pm')
        local plpm ln(`pl'/`pm')
        local pepm ln(`pe'/`pm')
        return scalar n_eq = 3
        return local eq_1 "`sk'= {bk} + {dkk}*`pkpm' + {dkl}*`plpm' + {dke}*`pepm'"
        return local eq_2 "`sl'= {bl} + {dkl}*`pkpm' + {dll}*`plpm' + {dle}*`pepm'"
        return local eq_3 "`se'= {be} + {dke}*`pkpm' + {dle}*`plpm' + {dee}*`pepm'"
        return local title "4-factor translog cost function"
end
We made our program accept seven variables, for the three dependent variables s_k, s_l, and s_e,
and the four factor prices pk, pl, pm, and pe. The tokenize command assigns to macros `1', `2',
..., `7' the seven variables stored in `varlist', and the args command transfers those numbered
macros to macros `sk', `sl', ..., `pm'. Because we knew our substitutable expressions were
going to be somewhat long, we created local macros to hold the log price ratios. These are simply
macros that hold strings such as ln(pk/pm), not variables, and they will save us some repetitious
typing when we define our substitutable expressions. Our program returns the number of equations
in r(n_eq), and we defined our substitutable expressions in r(eq_1), r(eq_2), and r(eq_3). We do not bind
the expressions in parentheses as we do with the interactive version of nlsur. Finally, we put a title
in r(title) to label our output.
Our syntax command also accepts an if clause, and that is how nlsur indicates the estimation
sample to our program. In this application, we can safely ignore it, because our program does not
compute initial values. However, had we used commands such as summarize or regress to obtain
initial values, then we would need to restrict those commands to analyze only the estimation sample.
In those cases, typically, you simply need to include `if' with the commands you are using. For
example, instead of the command

summarize `depvar', meanonly

you would use

summarize `depvar' `if', meanonly
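For instance, a hypothetical variant of nlsurtranslog that seeds {bk} with the mean of the capital share would restrict summarize to the estimation sample and could contain lines such as these (a sketch only; it assumes initial values may be embedded in a substitutable expression as {parameter=value}, as they may be with nl):

        // sketch: compute a starting value for {bk} over the estimation sample
        summarize `sk' `if', meanonly
        local bk0 = r(mean)
        return local eq_1 "`sk' = {bk=`bk0'} + {dkk}*`pkpm' + {dkl}*`plpm' + {dke}*`pepm'"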
We can check our program by typing
. nlsurtranslog s_k s_l s_e pk pl pe pm
. return list
scalars:
r(n_eq) = 3
macros:
r(title) : "4-factor translog cost function"
r(eq_3) : "s_e= {be} + {dke}*ln(pk/pm) + {dle}*ln(pl/pm) + {.."
r(eq_2) : "s_l= {bl} + {dkl}*ln(pk/pm) + {dll}*ln(pl/pm) + {.."
r(eq_1) : "s_k= {bk} + {dkk}*ln(pk/pm) + {dkl}*ln(pl/pm) + {.."
Now that we know that our program works, we fit our model by typing
. nlsur translog: s_k s_l s_e pk pl pe pm, ifgnls
(obs = 25)
Calculating NLS estimates...
Iteration 0: Residual SS = .0009989
Iteration 1: Residual SS = .0009989
Calculating FGNLS estimates...
Iteration 0: Scaled RSS = 65.45197
Iteration 1: Scaled RSS = 65.45197
FGNLS iteration 2...
Iteration 0: Scaled RSS = 73.28311
Iteration 1: Scaled RSS = 73.28311
Iteration 2: Scaled RSS = 73.28311
Parameter change = 6.537e-03
Covariance matrix change = 1.002e-06
(output omitted )
FGNLS iteration 10...
Iteration 0: Scaled RSS = 75
Iteration 1: Scaled RSS = 75
Parameter change = 4.074e-06
Covariance matrix change = 6.265e-10
FGNLS regression
Equation Obs Parms RMSE R-sq Constant
1 s_k 25 4 .0031722 0.4776 bk
2 s_l 25 4 .0053963 0.8171 bl
3 s_e 25 4 .00177 0.6615 be
4-factor translog cost function
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/bk .0568925 .0013454 42.29 0.000 .0542556 .0595294
/dkk .0294833 .0057956 5.09 0.000 .0181241 .0408425
/dkl -.0000471 .0038478 -0.01 0.990 -.0075887 .0074945
/dke -.0106749 .0033882 -3.15 0.002 -.0173157 -.0040341
/bl .253438 .0020945 121.00 0.000 .2493329 .2575432
/dll .0754327 .0067572 11.16 0.000 .0621889 .0886766
/dle -.004756 .002344 -2.03 0.042 -.0093501 -.0001619
/be .0444099 .0008533 52.04 0.000 .0427374 .0460823
/dee .0183415 .0049858 3.68 0.000 .0085694 .0281135
Because we set r(title) in our substitutable expression program, the coefficient table has a title
attached to it. The estimates are identical to those we obtained in example 2.
Technical note
nlsur accepts frequency and analytic weights as well as pweights (sampling weights) and
iweights (importance weights). You do not need to modify your substitutable expressions in any
way to perform weighted estimation, though you must make two changes to your substitutable
expression program. The general outline of a sexp_prog program is

program nlsurname, rclass
        version 13
        syntax varlist [fw aw pw iw] [if]
        // Obtain initial values incorporating weights. For example,
        summarize varname [`weight'`exp'] `if'
        . . .
        // Return n_eq and substitutable expressions
        return scalar n_eq = #
        return local eq_1 = . . .
        . . .
end

First, we wrote the syntax statement to accept a weight expression. Here we allow all four types
of weights, but if you know that your estimator is valid, say, for only frequency weights, then you
should modify the syntax line to accept only fweights. Second, if your program computes starting
values, then any commands you use must incorporate the weights passed to the program; you do that
by including [`weight'`exp'] when calling those commands.
Function evaluator programs
Although substitutable expressions are extremely flexible, there are some problems for which the
nonlinear system cannot be defined using them. You can use the function evaluator program version of
nlsur in these cases. We present two examples, a simple one to illustrate the mechanics of function
evaluator programs and a more complicated one to illustrate the power of nlsur.
Example 4: Function evaluator program version
Here we write a function evaluator program to fit the translog cost function used in examples 2
and 3. The function evaluator program is
program nlsurtranslog2
        version 13
        syntax varlist(min=7 max=7) [if], at(name)
        tokenize `varlist'
        args sk sl se pk pl pe pm
        tempname bk dkk dkl dke bl dll dle be dee
        scalar `bk'  = `at'[1,1]
        scalar `dkk' = `at'[1,2]
        scalar `dkl' = `at'[1,3]
        scalar `dke' = `at'[1,4]
        scalar `bl'  = `at'[1,5]
        scalar `dll' = `at'[1,6]
        scalar `dle' = `at'[1,7]
        scalar `be'  = `at'[1,8]
        scalar `dee' = `at'[1,9]
        local pkpm ln(`pk'/`pm')
        local plpm ln(`pl'/`pm')
        local pepm ln(`pe'/`pm')
        quietly {
                replace `sk' = `bk' + `dkk'*`pkpm' + `dkl'*`plpm' + ///
                                `dke'*`pepm' `if'
                replace `sl' = `bl' + `dkl'*`pkpm' + `dll'*`plpm' + ///
                                `dle'*`pepm' `if'
                replace `se' = `be' + `dke'*`pkpm' + `dle'*`plpm' + ///
                                `dee'*`pepm' `if'
        }
end
Unlike the substitutable expression program we wrote in example 3, nlsurtranslog2 is not
declared as r-class because we will not be returning any stored results. We are again expecting seven
variables: three shares and four factor prices, and nlsur will again mark the estimation sample with
an if expression.
Our function evaluator program also accepts an option named at(), which will receive a parameter
vector at which we are to evaluate the system of equations. All function evaluator programs must
accept this option. Our model has nine parameters to estimate, and we created nine temporary scalars
to hold the elements of the `at' matrix.
Because our model has three equations, the first three variables passed to our program are the
dependent variables that we are to fill in with the function values. We replaced only the observations
in our estimation sample by including the `if' qualifier in the replace statements. Here we could
have ignored the `if' qualifier because nlsur will skip over observations not in the estimation
sample and we did not perform any computations requiring knowledge of the estimation sample.
However, including the `if' is good practice and may result in a slight speed improvement if the
functions of your model are complicated and the estimation sample is much smaller than the dataset
in memory.
We could have avoided creating temporary scalars to hold our individual parameters by writing
the replace statements as, for example,

replace `sk' = `at'[1,1] + `at'[1,2]*`pkpm' + `at'[1,3]*`plpm' + `at'[1,4]*`pepm' `if'

You can use whichever method you find more appealing, though giving the parameters descriptive
names reduces the chance for mistakes and makes debugging easier.
To fit our model by using the function evaluator program version of nlsur, we type
. nlsur translog2 @ s_k s_l s_e pk pl pe pm, ifgnls nequations(3)
> parameters(bk dkk dkl dke bl dll dle be dee)
> hasconstants(bk bl be)
(obs = 25)
Calculating NLS estimates...
Iteration 0: Residual SS = .0009989
Iteration 1: Residual SS = .0009989
Calculating FGNLS estimates...
Iteration 0: Scaled RSS = 65.45197
Iteration 1: Scaled RSS = 65.45197
FGNLS iteration 2...
Iteration 0: Scaled RSS = 73.28311
Iteration 1: Scaled RSS = 73.28311
Iteration 2: Scaled RSS = 73.28311
Parameter change = 6.537e-03
Covariance matrix change = 1.002e-06
FGNLS iteration 3...
Iteration 0: Scaled RSS = 74.7113
Iteration 1: Scaled RSS = 74.7113
Parameter change = 2.577e-03
Covariance matrix change = 3.956e-07
FGNLS iteration 4...
Iteration 0: Scaled RSS = 74.95356
Iteration 1: Scaled RSS = 74.95356
Iteration 2: Scaled RSS = 74.95356
Parameter change = 1.023e-03
Covariance matrix change = 1.571e-07
FGNLS iteration 5...
Iteration 0: Scaled RSS = 74.99261
Iteration 1: Scaled RSS = 74.99261
Parameter change = 4.067e-04
Covariance matrix change = 6.250e-08
FGNLS iteration 6...
Iteration 0: Scaled RSS = 74.99883
Iteration 1: Scaled RSS = 74.99883
Iteration 2: Scaled RSS = 74.99883
Parameter change = 1.619e-04
Covariance matrix change = 2.489e-08
FGNLS iteration 7...
Iteration 0: Scaled RSS = 74.99981
Iteration 1: Scaled RSS = 74.99981
Iteration 2: Scaled RSS = 74.99981
Parameter change = 6.449e-05
Covariance matrix change = 9.912e-09
FGNLS iteration 8...
Iteration 0: Scaled RSS = 74.99997
Iteration 1: Scaled RSS = 74.99997
Iteration 2: Scaled RSS = 74.99997
Parameter change = 2.569e-05
Covariance matrix change = 3.948e-09
FGNLS iteration 9...
Iteration 0: Scaled RSS = 75
Iteration 1: Scaled RSS = 75
Parameter change = 1.023e-05
Covariance matrix change = 1.573e-09
FGNLS iteration 10...
Iteration 0: Scaled RSS = 75
Iteration 1: Scaled RSS = 75
Parameter change = 4.074e-06
Covariance matrix change = 6.265e-10
FGNLS regression
Equation Obs Parms RMSE R-sq Constant
1 s_k 25 . .0031722 0.4776 bk
2 s_l 25 . .0053963 0.8171 bl
3 s_e 25 . .00177 0.6615 be
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/bk .0568925 .0013454 42.29 0.000 .0542556 .0595294
/dkk .0294833 .0057956 5.09 0.000 .0181241 .0408425
/dkl -.0000471 .0038478 -0.01 0.990 -.0075887 .0074945
/dke -.0106749 .0033882 -3.15 0.002 -.0173157 -.0040341
/bl .253438 .0020945 121.00 0.000 .2493329 .2575432
/dll .0754327 .0067572 11.16 0.000 .0621889 .0886766
/dle -.004756 .002344 -2.03 0.042 -.0093501 -.0001619
/be .0444099 .0008533 52.04 0.000 .0427374 .0460823
/dee .0183415 .0049858 3.68 0.000 .0085694 .0281135
When we use the function evaluator program version, nlsur requires us to specify the number of
equations in nequations(), and it requires us either to specify names for each of our parameters
or to specify the number of parameters in the model. Here we used the parameters() option to name our
parameters; the order in which we specified them in this option is the same as the order in which we
extracted them from the `at' matrix in our program. Had we instead specified nparameters(9),
our parameters would have been labeled /b1, /b2, ..., /b9 in the output.
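That is, a call along the following lines (a sketch) would produce generically labeled parameters rather than the descriptive names used above; with generic names, the constants would presumably be referred to as /b1, /b5, and /b8, their positions in the `at' vector:

. nlsur translog2 @ s_k s_l s_e pk pl pe pm, ifgnls nequations(3) nparameters(9)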
nlsur has no way of telling how many parameters appear in each equation, so the Parms column
in the header contains missing values. Moreover, the function evaluator program version of nlsur
does not attempt to identify constant terms, so we used the hasconstants() option to tell nlsur which
parameter in each equation is a constant term.
The estimates are identical to those we obtained in examples 2 and 3.
Technical note
As with substitutable expression programs, if you intend to do weighted estimation with a function
evaluator program, you must modify your func_prog program's syntax statement to accept weights.
Moreover, if you use any statistical commands when computing your nonlinear functions, then you
must include the weight expression with those commands.
Example 5: Fitting the basic AIDS model using nlsur
Deaton and Muellbauer (1980) introduce the almost ideal demand system (AIDS), and Poi (2012)
presents a set of commands and several extensions for fitting the AIDS automatically. Here we show
how to fit the basic AIDS model, which is a common example of a nonlinear system of equations, by
manually using nlsur. The dataset food.dta contains household expenditures, expenditure shares,
and log prices for four broad food groups. For a four-good demand system, we need to fit the following
system of three equations:
$$ w_1 = \alpha_1 + \gamma_{11}\ln p_1 + \gamma_{12}\ln p_2 + \gamma_{13}\ln p_3 + \beta_1 \ln\!\left\{\frac{m}{P(\mathbf{p})}\right\} + u_1 $$
$$ w_2 = \alpha_2 + \gamma_{12}\ln p_1 + \gamma_{22}\ln p_2 + \gamma_{23}\ln p_3 + \beta_2 \ln\!\left\{\frac{m}{P(\mathbf{p})}\right\} + u_2 $$
$$ w_3 = \alpha_3 + \gamma_{13}\ln p_1 + \gamma_{23}\ln p_2 + \gamma_{33}\ln p_3 + \beta_3 \ln\!\left\{\frac{m}{P(\mathbf{p})}\right\} + u_3 $$
where w_k denotes a household's fraction of expenditures on good k, ln p_k denotes the logarithm of
the price paid for good k, m denotes a household's total expenditure on all four goods, the u's are
regression error terms, and
$$ \ln P(\mathbf{p}) = \alpha_0 + \sum_{i=1}^{4}\alpha_i \ln p_i + \frac{1}{2}\sum_{i=1}^{4}\sum_{j=1}^{4}\gamma_{ij}\ln p_i \ln p_j $$
The parameters for the fourth good's share equation can be recovered from the following constraints
that are imposed by economic theory:
$$ \sum_{i=1}^{4}\alpha_i = 1 \qquad \sum_{i=1}^{4}\beta_i = 0 \qquad \gamma_{ij} = \gamma_{ji} \qquad \text{and} \qquad \sum_{i=1}^{4}\gamma_{ij} = 0 \ \text{for all } j $$
Our model has a total of 12 unrestricted parameters. We will not estimate α0 directly. Instead, we
will set it equal to 5; see Deaton and Muellbauer (1980) for a discussion of why treating α0 as fixed
is acceptable.
Our function evaluator program is
program nlsuraids
        version 13
        syntax varlist(min=8 max=8) if, at(name)
        tokenize `varlist'
        args w1 w2 w3 lnp1 lnp2 lnp3 lnp4 lnm
        tempname a1 a2 a3 a4
        scalar `a1' = `at'[1,1]
        scalar `a2' = `at'[1,2]
        scalar `a3' = `at'[1,3]
        scalar `a4' = 1 - `a1' - `a2' - `a3'
        tempname b1 b2 b3
        scalar `b1' = `at'[1,4]
        scalar `b2' = `at'[1,5]
        scalar `b3' = `at'[1,6]
        tempname g11 g12 g13 g14
        tempname g21 g22 g23 g24
        tempname g31 g32 g33 g34
        tempname g41 g42 g43 g44
        scalar `g11' = `at'[1,7]
        scalar `g12' = `at'[1,8]
        scalar `g13' = `at'[1,9]
        scalar `g14' = -`g11'-`g12'-`g13'
        scalar `g21' = `g12'
        scalar `g22' = `at'[1,10]
        scalar `g23' = `at'[1,11]
        scalar `g24' = -`g21'-`g22'-`g23'
        scalar `g31' = `g13'
        scalar `g32' = `g23'
        scalar `g33' = `at'[1,12]
        scalar `g34' = -`g31'-`g32'-`g33'
        scalar `g41' = `g14'
        scalar `g42' = `g24'
        scalar `g43' = `g34'
        scalar `g44' = -`g41'-`g42'-`g43'
        quietly {
                tempvar lnpindex
                gen double `lnpindex' = 5 + `a1'*`lnp1' + `a2'*`lnp2' + ///
                                `a3'*`lnp3' + `a4'*`lnp4'
                forvalues i = 1/4 {
                        forvalues j = 1/4 {
                                replace `lnpindex' = `lnpindex' + ///
                                        0.5*`g`i'`j''*`lnp`i''*`lnp`j''
                        }
                }
                replace `w1' = `a1' + `g11'*`lnp1' + `g12'*`lnp2' + ///
                                `g13'*`lnp3' + `g14'*`lnp4' + ///
                                `b1'*(`lnm' - `lnpindex')
                replace `w2' = `a2' + `g21'*`lnp1' + `g22'*`lnp2' + ///
                                `g23'*`lnp3' + `g24'*`lnp4' + ///
                                `b2'*(`lnm' - `lnpindex')
                replace `w3' = `a3' + `g31'*`lnp1' + `g32'*`lnp2' + ///
                                `g33'*`lnp3' + `g34'*`lnp4' + ///
                                `b3'*(`lnm' - `lnpindex')
        }
end
The syntax statement accepts eight variables: three expenditure share variables, all four log-price
variables, and a variable for log expenditures (lnm). Most of the code simply extracts the parameters
from the `at' matrix. Although we are estimating only 12 parameters, to calculate the price index
term and the expenditure share equations, we need the restricted parameters as well. Notice how we
impose the constraints on the parameters. We then created a temporary variable to hold ln P(p), and
we filled the three dependent variables with the predicted expenditure shares.
To fit our model, we type
. use http://www.stata-press.com/data/r13/food
. nlsur aids @ w1 w2 w3 lnp1 lnp2 lnp3 lnp4 lnexp,
> parameters(a1 a2 a3 b1 b2 b3
> g11 g12 g13 g22 g32 g33)
> neq(3) ifgnls
(obs = 4048)
Calculating NLS estimates...
Iteration 0: Residual SS = 126.9713
Iteration 1: Residual SS = 125.669
Iteration 2: Residual SS = 125.669
Iteration 3: Residual SS = 125.669
Iteration 4: Residual SS = 125.669
Calculating FGNLS estimates...
Iteration 0: Scaled RSS = 12080.14
Iteration 1: Scaled RSS = 12080.14
Iteration 2: Scaled RSS = 12080.14
Iteration 3: Scaled RSS = 12080.14
FGNLS iteration 2...
Iteration 0: Scaled RSS = 12143.99
Iteration 1: Scaled RSS = 12143.99
Iteration 2: Scaled RSS = 12143.99
Parameter change = 1.972e-04
Covariance matrix change = 2.936e-06
FGNLS iteration 3...
Iteration 0: Scaled RSS = 12144
Iteration 1: Scaled RSS = 12144
Parameter change = 2.178e-06
Covariance matrix change = 3.469e-08
FGNLS regression
Equation Obs Parms RMSE R-sq Constant
1 w1 4048 . .1333175 0.9017* (none)
2 w2 4048 . .1024166 0.8480* (none)
3 w3 4048 . .053777 0.7906* (none)
* Uncentered R-sq
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/a1 .3163959 .0073871 42.83 0.000 .3019175 .3308742
/a2 .2712501 .0056938 47.64 0.000 .2600904 .2824097
/a3 .1039898 .0029004 35.85 0.000 .0983051 .1096746
/b1 .0161044 .0034153 4.72 0.000 .0094105 .0227983
/b2 -.0260771 .002623 -9.94 0.000 -.0312181 -.0209361
/b3 .0014538 .0013776 1.06 0.291 -.0012463 .0041539
/g11 .1215838 .0057186 21.26 0.000 .1103756 .1327921
/g12 -.0522943 .0039305 -13.30 0.000 -.0599979 -.0445908
/g13 -.0351292 .0021788 -16.12 0.000 -.0393996 -.0308588
/g22 .0644298 .0044587 14.45 0.000 .0556909 .0731687
/g32 -.0011786 .0019767 -0.60 0.551 -.0050528 .0026957
/g33 .0424381 .0017589 24.13 0.000 .0389909 .0458854
To get the restricted parameters for the fourth share equation, we can use lincom. For example,
to obtain α4, we type
. lincom 1 - [a1]_cons - [a2]_cons - [a3]_cons
( 1) - [a1]_cons - [a2]_cons - [a3]_cons = -1
Coef. Std. Err. z P>|z| [95% Conf. Interval]
(1) .3083643 .0052611 58.61 0.000 .2980528 .3186758
For more information on lincom, see [R]lincom.
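The other restricted parameters follow the same pattern. For example, because the βs sum to zero, β4 could be obtained with a command along these lines (a sketch):

. lincom - [b1]_cons - [b2]_cons - [b3]_cons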
Stored results
nlsur stores the following in e():
Scalars
e(N)                number of observations
e(k)                number of parameters
e(k_#)              number of parameters for equation #
e(k_eq)             number of equation names in e(b)
e(k_eq_model)       number of equations in overall model test
e(n_eq)             number of equations
e(mss_#)            model sum of squares for equation #
e(rss_#)            RSS for equation #
e(rmse_#)           root mean squared error for equation #
e(r2_#)             R² for equation #
e(ll)               Gaussian log likelihood (ifgnls version only)
e(N_clust)          number of clusters
e(rank)             rank of e(V)
e(converge)         1 if converged, 0 otherwise
Macros
e(cmd)              nlsur
e(cmdline)          command as typed
e(method)           fgnls, ifgnls, or nls
e(depvar)           names of dependent variables
e(depvar_#)         dependent variable for equation #
e(wtype)            weight type
e(wexp)             weight expression
e(title)            title in estimation output
e(title_2)          secondary title in estimation output
e(clustvar)         name of cluster variable
e(vce)              vcetype specified in vce()
e(vcetype)          title used to label Std. Err.
e(type)             1 = interactively entered expression
                    2 = substitutable expression program
                    3 = function evaluator program
e(sexpprog)         substitutable expression program
e(sexp_#)           substitutable expression for equation #
e(params)           names of all parameters
e(params_#)         parameters in equation #
e(funcprog)         function evaluator program
e(rhs)              contents of variables()
e(constants)        identifies constant terms
e(properties)       b V
e(predict)          program used to implement predict
Matrices
e(b)                coefficient vector
e(init)             initial values vector
e(Sigma)            error covariance matrix (Σ̂)
e(V)                variance–covariance matrix of the estimators
Functions
e(sample)           marks estimation sample
Methods and formulas
Write the system of equations for the ith observation as
$$ y_i = f(x_i,\beta) + u_i \tag{1} $$
where $y_i$ and $u_i$ are $1 \times M$ vectors, for $i = 1,\dots,N$; $f$ is a function that returns a $1 \times M$ vector;
$x_i$ represents all the exogenous variables in the system; and $\beta$ is a $1 \times k$ vector of parameters. The
generalized nonlinear least-squares system estimator is defined as
$$ \widehat\beta \equiv \arg\min_{\beta} \sum_{i=1}^{N} \{y_i - f(x_i,\beta)\}\,\Sigma^{-1}\{y_i - f(x_i,\beta)\}' $$
where $\Sigma = E(u_i'u_i)$ is an $M \times M$ positive-definite weight matrix. Let $T$ be the Cholesky decomposition
of $\Sigma^{-1}$; that is, $TT' = \Sigma^{-1}$. Postmultiply (1) by $T$:
$$ y_i T = f(x_i,\beta)T + u_i T \tag{2} $$
Because $E(T'u_i'u_iT) = I$, we can "stack" the columns of (2) and write
$$ \begin{aligned}
y_1 T_1 &= f(x_1,\beta)T_1 + \widetilde u_{11} \\
y_1 T_2 &= f(x_1,\beta)T_2 + \widetilde u_{12} \\
&\ \ \vdots \\
y_1 T_M &= f(x_1,\beta)T_M + \widetilde u_{1M} \\
&\ \ \vdots \\
y_N T_1 &= f(x_N,\beta)T_1 + \widetilde u_{N1} \\
y_N T_2 &= f(x_N,\beta)T_2 + \widetilde u_{N2} \\
&\ \ \vdots \\
y_N T_M &= f(x_N,\beta)T_M + \widetilde u_{NM}
\end{aligned} \tag{3} $$
where $T_j$ denotes the jth column of $T$. By construction, all $\widetilde u_{ij}$ are independently distributed with
unit variance. As a result, by transforming the model in (1) to that shown in (3), we have reduced the
multivariate generalized nonlinear least-squares system estimator to a univariate nonlinear least-squares
problem; and the same parameter estimation technique used by nl can be used here. See [R] nl for
the details. Moreover, because the $\widetilde u_{ij}$ all have variance 1, the final scaled RSS reported by nlsur is
equal to $NM$.
To make the estimator feasible, we require an estimate $\widehat\Sigma$ of $\Sigma$. nlsur first sets $\widehat\Sigma = I$. Although
not efficient, the resulting estimate, $\widehat\beta_{\rm NLS}$, is consistent. If the nls option is specified, estimation is
complete. Otherwise, the residuals
$$ \widehat u_i = y_i - f(x_i, \widehat\beta_{\rm NLS}) $$
are calculated and used to compute
$$ \widehat\Sigma = \frac{1}{N}\sum_{i=1}^{N} \widehat u_i'\,\widehat u_i $$
With $\widehat\Sigma$ in hand, a new estimate $\widehat\beta$ is then obtained.

If the ifgnls option is specified, the new $\widehat\beta$ is used to recompute the residuals and obtain a new
estimate of $\widehat\Sigma$, from which $\widehat\beta$ can then be reestimated. Iterations stop when the relative change in
$\widehat\beta$ is less than eps(), the relative change in $\widehat\Sigma$ is less than ifgnlseps(), or if ifgnlsiterate()
iterations have been performed.
If the vce(robust) and vce(cluster clustvar) options were not specified, then
$$ V(\widehat\beta) = \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1} X_i \right)^{-1} $$
where the $M \times k$ matrix $X_i$ has typical element $X_{ist}$, the derivative of the sth element of $f$ with
respect to the tth element of $\beta$, evaluated at $x_i$ and $\widehat\beta$. As a practical matter, once the model is
written in the form of (3), the variance–covariance matrix can be calculated via a Gauss–Newton
regression; see Davidson and MacKinnon (1993, chap. 6).

If robust is specified, then
$$ V_R(\widehat\beta) = \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1} X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1}\,\widehat u_i'\,\widehat u_i\,\widehat\Sigma^{-1} X_i \right) \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1} X_i \right)^{-1} $$

The cluster–robust variance matrix is
$$ V_C(\widehat\beta) = \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1} X_i \right)^{-1} \left( \sum_{c=1}^{N_C} w_c'\,w_c \right) \left( \sum_{i=1}^{N} X_i'\,\widehat\Sigma^{-1} X_i \right)^{-1} $$
where $N_C$ is the number of clusters and
$$ w_c = \sum_{j \in C_k} X_j'\,\widehat\Sigma^{-1}\,\widehat u_j' $$
with $C_k$ denoting the set of observations in the kth cluster. In evaluating these formulas, we use the
value of $\widehat\Sigma$ used in calculating the final estimate of $\widehat\beta$. That is, we do not recalculate $\widehat\Sigma$ after we
obtain the final value of $\widehat\beta$.
The RSS for the jth equation, $\mathrm{RSS}_j$, is
$$ \mathrm{RSS}_j = \sum_{i=1}^{N} (\widehat y_{ij} - y_{ij})^2 $$
where $\widehat y_{ij}$ is the predicted value of the ith observation on the jth dependent variable; the total sum
of squares (TSS) for the jth equation, $\mathrm{TSS}_j$, is
$$ \mathrm{TSS}_j = \sum_{i=1}^{N} (y_{ij} - \bar y_j)^2 $$
if there is a constant term in the jth equation, where $\bar y_j$ is the sample mean of the jth dependent
variable, and
$$ \mathrm{TSS}_j = \sum_{i=1}^{N} y_{ij}^2 $$
if there is no constant term in the jth equation; and the model sum of squares (MSS) for the jth
equation, $\mathrm{MSS}_j$, is $\mathrm{TSS}_j - \mathrm{RSS}_j$.

The $R^2$ for the jth equation is $\mathrm{MSS}_j/\mathrm{TSS}_j$. If an equation does not have a constant term, then the
reported $R^2$ for that equation is "uncentered" and based on the latter definition of $\mathrm{TSS}_j$.

Under the assumption that the $u_i$ are independent and identically distributed $N(\mathbf{0}, \widehat\Sigma)$, the log
likelihood for the model is
$$ \ln L = -\frac{MN}{2}\{1 + \ln(2\pi)\} - \frac{N}{2}\ln\bigl|\widehat\Sigma\bigr| $$
The log likelihood is reported only when the ifgnls option is specified.
References
Canette, I. 2011. A tip to debug your nl/nlsur function evaluator program. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2011/12/05/a-tip-to-debug-your-nlnlsur-function-evaluator-program/.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Deaton, A. S., and J. Muellbauer. 1980. An almost ideal demand system. American Economic Review 70: 312–326.
Greene, W. H. 1997. Econometric Analysis. 3rd ed. Upper Saddle River, NJ: Prentice Hall.
. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Poi, B. P. 2012. Easy demand-system estimation with quaids.Stata Journal 12: 433–446.
Zellner, A. 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias.
Journal of the American Statistical Association 57: 348–368.
. 1963. Estimators for seemingly unrelated regression equations: Some exact finite sample results. Journal of the
American Statistical Association 58: 977–992.
Zellner, A., and D. S. Huang. 1962. Further properties of efficient estimators for seemingly unrelated regression
equations. International Economic Review 3: 300–313.
Also see
[R]nlsur postestimation Postestimation tools for nlsur
[R]nl Nonlinear least-squares estimation
[R]gmm Generalized method of moments estimation
[R]ml Maximum likelihood estimation
[R]mlexp Maximum likelihood estimation of user-specified expressions
[R]reg3 Three-stage estimation for systems of simultaneous equations
[R]sureg Zellner’s seemingly unrelated regression
[U] 20 Estimation and postestimation commands
Title
nlsur postestimation — Postestimation tools for nlsur
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after nlsur:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest likelihood-ratio test
margins (1)        marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) You must specify the variables() option with nlsur.
Syntax for predict
predict [type] newvar [if] [in] [, equation(#eqno) yhat residuals]
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only
for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
equation(#eqno) specifies to which equation you are referring. equation(#1) would mean that the
calculation is to be made for the first equation, equation(#2) would mean the second, and so on.
If you do not specify equation(), results are the same as if you had specified equation(#1).
yhat, the default, calculates the fitted values for the specified equation.
residuals calculates the residuals for the specified equation.
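For example, after the translog fit of example 2 in [R] nlsur, fitted values for the first equation and residuals for the second could be obtained with commands such as these (the new variable names are arbitrary):

. predict skhat, equation(#1) yhat
. predict sl_res, equation(#2) residuals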
Remarks and examples
Example 1
In example 2 of [R] nlsur, we fit a four-factor translog cost function to data for the U.S. economy.
The own-price elasticity for a factor measures the percentage change in its usage as a result of a
1% increase in the factor's price, assuming that output is held constant. For the translog production
function, the own-price factor elasticities are
$$ \eta_i = \frac{\delta_{ii} + s_i(s_i - 1)}{s_i} $$
Here we compute the elasticity for capital at the sample mean of capital's factor share. First, we
use summarize to get the mean of s_k and store that value in a scalar:
. summarize s_k
Variable Obs Mean Std. Dev. Min Max
s_k 25 .053488 .0044795 .04602 .06185
. scalar kmean = r(mean)
Now we can use nlcom to calculate the elasticity:
. nlcom (([dkk]_cons + kmean*(kmean-1)) / kmean)
_nl_1: ([dkk]_cons + kmean*(kmean-1)) / kmean
Coef. Std. Err. z P>|z| [95% Conf. Interval]
_nl_1 -.3952986 .1083535 -3.65 0.000 -.6076676 -.1829295
If the price of capital increases by 1%, its usage will decrease by about 0.4%. To maintain its current
level of output, a firm would increase its usage of other inputs to compensate for the lower capital
usage. The standard error reported by nlcom reflects the sampling variance of the estimated parameter
$\widehat\delta_{kk}$, but nlcom treats the sample mean of s_k as a fixed parameter that does not contribute to the
sampling variance of the estimated elasticity.
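The analogous computation for labor, say, uses the same formula with the labor share and the labor own-price parameter (a sketch):

. summarize s_l, meanonly
. scalar lmean = r(mean)
. nlcom (([dll]_cons + lmean*(lmean-1)) / lmean)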
Also see
[R]nlsur Estimation of nonlinear systems of equations
[U] 20 Estimation and postestimation commands
Title
nptrend — Test for trend across ordered groups
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
nptrend varname [if] [in], by(groupvar) [nodetail score(scorevar)]
Menu
Statistics >Nonparametric analysis >Tests of hypotheses >Trend test across ordered groups
Description
nptrend performs a nonparametric test for trend across ordered groups.
Options
 
Main
by(groupvar) is required; it specifies the group on which the data are to be ordered.
nodetail suppresses the listing of group rank sums.
score(scorevar) defines scores for groups. When it is not specified, the values of groupvar are used
for the scores.
Remarks and examples
nptrend performs the nonparametric test for trend across ordered groups developed by
Cuzick (1985), which is an extension of the Wilcoxon rank-sum test (see [R] ranksum). A correction
for ties is incorporated into the test. nptrend is a useful adjunct to the Kruskal–Wallis test; see
[R] kwallis.
If your data are not grouped, you can test for trend with the signtest and spearman commands;
see [R] signrank and [R] spearman. With signtest, you can perform the Cox and Stuart test, a
sign test applied to differences between equally spaced observations of varname. With spearman,
you can perform the Daniels test, a test of zero Spearman correlation between varname and a time
index. See Conover (1999, 169–175, 323) for a discussion of these tests and their asymptotic relative
efficiency.
Example 1
The following data (Altman 1991, 217) show ocular exposure to ultraviolet radiation for 32 pairs
of sunglasses classified into three groups according to the amount of visible light transmitted.
Group   Transmission of visible light   Ocular exposure to ultraviolet radiation
1       < 25%                           1.4 1.4 1.4 1.6 2.3 2.3
2       25 to 35%                       0.9 1.0 1.1 1.1 1.2 1.2 1.5 1.9 2.2 2.6 2.6
                                        2.6 2.8 2.8 3.2 3.5 4.3 5.1
3       > 35%                           0.8 1.7 1.7 1.7 3.4 7.1 8.9 13.5
Entering these data into Stata, we have
. use http://www.stata-press.com/data/r13/sg
. list, sep(6)
group exposure
1. 1 1.4
2. 1 1.4
3. 1 1.4
4. 1 1.6
5. 1 2.3
6. 1 2.3
7. 2 .9
(output omitted )
31. 3 8.9
32. 3 13.5
We use nptrend to test for a trend of (increasing) exposure across the three groups by typing
. nptrend exposure, by(group)
group score obs sum of ranks
1 1 6 76
2 2 18 290
3 3 8 162
z = 1.52
Prob > |z| = 0.129
When the groups are given any equally spaced scores (such as −1, 0, 1), we will obtain the same
answer as above. To illustrate the effect of changing scores, an analysis of these data with scores 1,
2, and 5 (admittedly not sensible here) produces
. gen mysc = cond(group==3,5,group)
. nptrend exposure, by(group) score(mysc)
group score obs sum of ranks
1 1 6 76
2 2 18 290
3 5 8 162
z = 1.46
Prob > |z| = 0.143
This example suggests that the analysis is not all that sensitive to the scores chosen.
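To verify the claim about equally spaced scores, we could, for example, rescore the groups as −1, 0, and 1 and confirm that the same z statistic is reported (sc2 is an arbitrary variable name):

. gen sc2 = cond(group==1, -1, cond(group==2, 0, 1))
. nptrend exposure, by(group) score(sc2)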
Technical note
The grouping variable may be either a string variable or a numeric variable. If it is a string variable
and no score variable is specified, the natural numbers 1, 2, 3, . . . are assigned to the groups in
the sort order of the string variable. This may not always be what you expect. For example, the sort
order of the strings “one”, “two”, “three” is “one”, “three”, “two”.
Stored results
nptrend stores the following in r():
Scalars
r(N) number of observations r(z) zstatistic
r(p) two-sided p-value r(T) test statistic
Methods and formulas
nptrend is based on a method in Cuzick (1985). The following description of the statistic is from
Altman (1991, 215–217). We have kgroups of sample sizes ni(i=1, . . . , k). The groups are given
scores, li, which reflect their ordering, such as 1, 2, and 3. The scores do not have to be equally
spaced, but they usually are. N=Pniobservations are ranked from 1 to N, and the sums of the
ranks in each group, Ri, are obtained. L, the weighted sum of all the group scores, is
L=
k
X
i=1
lini
The statistic Tis calculated as
T=
k
X
i=1
liRi
Under the null hypothesis, the expected value of Tis E(T) = 0.5(N+1)L, and its standard error is
se(T) = v
u
u
tN+ 1
12 N
k
X
i=1
l2
iniL2
so that the test statistic, z, is given by z={TE(T)}/se(T), which has an approximately standard
normal distribution when the null hypothesis of no trend is true.
The correction for ties affects the standard error of T. Let e
Nbe the number of unique values of
the variable being tested ( e
NN), and let tjbe the number of times the jth unique value of the
variable appears in the data. Define
a=Pe
N
j=1 tj(t2
j1)
N(N21)
The corrected standard error of Tis ese(T) = 1ase(T).
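Ignoring the small tie correction, the statistic reported in example 1 can be checked by hand from the displayed rank sums (a sketch; N = 32, scores 1, 2, 3, group sizes 6, 18, 8, and rank sums 76, 290, 162):

. display "T    = " 1*76 + 2*290 + 3*162
. display "E(T) = " 0.5*(32+1)*(1*6 + 2*18 + 3*8)
. display "z    = " (1142 - 1089)/sqrt((32+1)/12*(32*(1*6 + 4*18 + 9*8) - 66^2))

The last display returns approximately 1.52, matching the output above up to the tie correction.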
Acknowledgments
nptrend was written by K. A. Stepniewska and D. G. Altman (1992) of the Cancer Research UK.
References
Altman, D. G. 1991. Practical Statistics for Medical Research. London: Chapman & Hall/CRC.
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
Cuzick, J. 1985. A Wilcoxon-type test for trend. Statistics in Medicine 4: 87–90.
Sasieni, P. D. 1996. snp12: Stratified test for trend across ordered groups.Stata Technical Bulletin 33: 24–27. Reprinted
in Stata Technical Bulletin Reprints, vol. 6, pp. 196–200. College Station, TX: Stata Press.
Sasieni, P. D., K. A. Stepniewska, and D. G. Altman. 1996. snp11: Test for trend across ordered groups revisited.
Stata Technical Bulletin 32: 27–29. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 193–196. College
Station, TX: Stata Press.
Stepniewska, K. A., and D. G. Altman. 1992. snp4: Non-parametric test for trend across ordered groups.Stata
Technical Bulletin 9: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 2, p. 169. College Station, TX:
Stata Press.
Also see
[R]kwallis Kruskal Wallis equality-of-populations rank test
[R]signrank Equality tests on matched data
[R]spearman Spearman’s and Kendall’s correlations
[R]symmetry Symmetry and marginal homogeneity tests
[ST]epitab Tables for epidemiologists
[ST]strate Tabulate failure rates and rate ratios
Title
ologit — Ordered logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
ologit depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
Model
offset(varname)          include varname in model with coefficient constrained to 1
constraints(constraints) apply specified linear constraints
collinear                keep collinear variables
SE/Robust
vce(vcetype)             vcetype may be oim, robust, cluster clustvar, bootstrap, or jackknife
Reporting
level(#)                 set confidence level; default is level(95)
or                       report odds ratios
nocnsreport              do not display constraints
display_options          control column formats, row spacing, line width, display of omitted
                         variables and base and empty cells, and factor-variable labeling
Maximization
maximize_options         control the maximization process; seldom used
coeflegend               display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Ordinal outcomes >Ordered logistic regression
Description
ologit fits ordered logit models of ordinal variable depvar on the independent variables indepvars.
The actual values taken on by the dependent variable are irrelevant, except that larger values are
assumed to correspond to “higher” outcomes.
See [R]logistic for a list of related estimation commands.
Options
 
Model
offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
or reports the estimated coefficients transformed to odds ratios, that is, e^b rather than b. Standard errors
and confidence intervals are similarly transformed. This option affects how results are displayed,
not how they are estimated. or may be specified at estimation or when replaying previously
estimated results.
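For example, odds ratios could be requested at estimation time or on replay with commands along these lines (a sketch using the model from example 1 below):

. ologit rep77 foreign, or
. ologit, or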
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
The following option is available with ologit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Ordered logit models are used to estimate relationships between an ordinal dependent variable and
a set of independent variables. An ordinal variable is a variable that is categorical and ordered, for
instance, “poor”, “good”, and “excellent”, which might indicate a person’s current health status or
the repair record of a car. If there are only two outcomes, see [R]logistic,[R]logit, and [R]probit.
This entry is concerned only with more than two outcomes. If the outcomes cannot be ordered (for
example, residency in the north, east, south, or west), see [R]mlogit. This entry is concerned only
with models in which the outcomes can be ordered.
In ordered logit, an underlying score is estimated as a linear function of the independent variables
and a set of cutpoints. The probability of observing outcome $i$ corresponds to the probability that the
estimated linear function, plus random error, is within the range of the cutpoints estimated for the
outcome:
$$ \Pr(\mathrm{outcome}_j = i) = \Pr(\kappa_{i-1} < \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + u_j \le \kappa_i) $$
$u_j$ is assumed to be logistically distributed in ordered logit. In either case, we estimate the coefficients
$\beta_1, \beta_2, \dots, \beta_k$ together with the cutpoints $\kappa_1, \kappa_2, \dots, \kappa_{k-1}$, where $k$ is the number of possible
outcomes. $\kappa_0$ is taken as $-\infty$, and $\kappa_k$ is taken as $+\infty$. All of this is a direct generalization of the
ordinary two-outcome logit model.
Example 1
We wish to analyze the 1977 repair records of 66 foreign and domestic cars. The data are a
variation of the automobile dataset described in [U] 1.2.2 Example datasets. The 1977 repair records,
like those in 1978, take on values “Poor”, “Fair”, “Average”, “Good”, and “Excellent”. Here is a
cross-tabulation of the data:
. use http://www.stata-press.com/data/r13/fullauto
(Automobile Models)
. tabulate rep77 foreign, chi2
Repair
Record Foreign
1977 Domestic Foreign Total
Poor 2 1 3
Fair 10 1 11
Average 20 7 27
Good 13 7 20
Excellent 0 5 5
Total 45 21 66
Pearson chi2(4) = 13.8619 Pr = 0.008
Although it appears that foreign takes on the values Domestic and Foreign, it is actually a
numeric variable taking on the values 0 and 1. Similarly, rep77 takes on the values 1, 2, 3, 4, and
5, corresponding to Poor,Fair, and so on. The more meaningful words appear because we have
attached value labels to the data; see [U] 12.6.3 Value labels.
Because the chi-squared value is significant, we could claim that there is a relationship between
foreign and rep77. Literally, however, we can only claim that the distributions are different; the
chi-squared test is not directional. One way to model these data is to model the categorization that
took place when the data were created. Cars have a true frequency of repair, which we will assume
is given by S_j = β foreign_j + u_j, and a car is categorized as "poor" if S_j ≤ κ_0, as "fair" if
κ_0 < S_j ≤ κ_1, and so on:
. ologit rep77 foreign
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -85.951765
Iteration 2: log likelihood = -85.908227
Iteration 3: log likelihood = -85.908161
Iteration 4: log likelihood = -85.908161
Ordered logistic regression Number of obs = 66
LR chi2(1) = 7.97
Prob > chi2 = 0.0047
Log likelihood = -85.908161 Pseudo R2 = 0.0444
rep77 Coef. Std. Err. z P>|z| [95% Conf. Interval]
foreign 1.455878 .5308951 2.74 0.006 .4153425 2.496413
/cut1 -2.765562 .5988208 -3.939229 -1.591895
/cut2 -.9963603 .3217706 -1.627019 -.3657016
/cut3 .9426153 .3136398 .3278925 1.557338
/cut4 3.123351 .5423257 2.060412 4.18629
Our model is S_j = 1.46 foreign_j + u_j; the expected value for foreign cars is 1.46 and, for domestic
cars, 0; foreign cars have better repair records.
The estimated cutpoints tell us how to interpret the score. For a foreign car, the probability of
a poor record is the probability that 1.46 + u_j ≤ −2.77, or equivalently, u_j ≤ −4.23. Making this
calculation requires familiarity with the logistic distribution: the probability is 1/(1 + e^4.23) = 0.014.
On the other hand, for domestic cars, the probability of a poor record is the probability u_j ≤ −2.77,
which is 0.059.
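These two probabilities can be verified directly; the following displays should return roughly 0.014 and 0.059:

. display 1/(1 + exp(4.23))
. display 1/(1 + exp(2.77))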
This, it seems to us, is a far more reasonable prediction than we would have made based on
the table alone. The table showed that 2 of 45 domestic cars had poor records, whereas 1 of 21
foreign cars had poor records, corresponding to probabilities 2/45 = 0.044 and 1/21 = 0.048. The
predictions from our model imposed a smoothness assumption: foreign cars should not, overall,
have better repair records without the difference revealing itself in each category. In our data, the
fractions of foreign and domestic cars in the poor category are virtually identical only because of the
randomness associated with small samples.
Thus if we were asked to predict the true fractions of foreign and domestic cars that would be
classified in the various categories, we would choose the numbers implied by the ordered logit model:
                 ------ tabulate ------      -------- logit --------
                 Domestic     Foreign        Domestic      Foreign
Poor                0.044       0.048           0.059        0.014
Fair                0.222       0.048           0.210        0.065
Average             0.444       0.333           0.450        0.295
Good                0.289       0.333           0.238        0.467
Excellent           0.000       0.238           0.043        0.159
See [R] ologit postestimation for a more complete explanation of how to generate predictions
from an ordered logit model.
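As a preview, the logit columns of the table above could be reproduced along these lines (a sketch; p1 through p5 are arbitrary names, and predict defaults to predicted probabilities, one per outcome):

. predict p1 p2 p3 p4 p5
. tabstat p1-p5, by(foreign)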
Technical note
Here ordered logit provides an alternative to ordinary two-outcome logistic models with an arbitrary
dichotomization, which might otherwise have been tempting. We could, for instance, have summarized
these data by converting the five-outcome rep77 variable to a two-outcome variable, combining cars
in the average, fair, and poor categories to make one outcome and combining cars in the good and
excellent categories to make the second.
Another even less appealing alternative would have been to use ordinary regression, arbitrarily
labeling “excellent” as 5, “good” as 4, and so on. The problem is that with different but equally valid
labelings (say, 10 for “excellent”), we would obtain different estimates. We would have no way of
choosing one metric over another. That assertion is not, however, true of ologit. The actual values
used to label the categories make no difference other than through the order they imply.
In fact, our labeling was 5 for “excellent”, 4 for “good”, and so on. The words “excellent” and
“good” appear in our output because we attached a value label to the variables; see [U] 12.6.3 Value
labels. If we were to now go back and type replace rep77=10 if rep77==5, changing all the 5s
to 10s, we would still obtain the same results when we refit our model.
Example 2
In the example above, we used ordered logit as a way to model a table. We are not, however,
limited to including only one explanatory variable or to including only categorical variables. We can
explore the relationship of rep77 with any of the variables in our data. We might, for instance, model
rep77 not only in terms of the origin of manufacture, but also including length (a proxy for size)
and mpg:
. ologit rep77 foreign length mpg
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -78.775147
Iteration 2: log likelihood = -78.254294
Iteration 3: log likelihood = -78.250719
Iteration 4: log likelihood = -78.250719
Ordered logistic regression Number of obs = 66
LR chi2(3) = 23.29
Prob > chi2 = 0.0000
Log likelihood = -78.250719 Pseudo R2 = 0.1295
rep77 Coef. Std. Err. z P>|z| [95% Conf. Interval]
foreign 2.896807 .7906411 3.66 0.000 1.347179 4.446435
length .0828275 .02272 3.65 0.000 .0382972 .1273579
mpg .2307677 .0704548 3.28 0.001 .0926788 .3688566
/cut1 17.92748 5.551191 7.047344 28.80761
/cut2 19.86506 5.59648 8.896161 30.83396
/cut3 22.10331 5.708936 10.914 33.29262
/cut4 24.69213 5.890754 13.14647 36.2378
foreign still plays a role, and an even larger role than previously. We find that larger cars tend to
have better repair records, as do cars with better mileage ratings.
Stored results
ologit stores the following in e():
Scalars
e(N)                number of observations
e(N_cd)             number of completely determined observations
e(k_cat)            number of categories
e(k)                number of parameters
e(k_aux)            number of auxiliary parameters
e(k_eq)             number of equations in e(b)
e(k_eq_model)       number of equations in overall model test
e(k_dv)             number of dependent variables
e(df_m)             model degrees of freedom
e(r2_p)             pseudo-R-squared
e(ll)               log likelihood
e(ll_0)             log likelihood, constant-only model
e(N_clust)          number of clusters
e(chi2)             χ²
e(p)                significance of model test
e(rank)             rank of e(V)
e(ic)               number of iterations
e(rc)               return code
e(converged)        1 if converged, 0 otherwise
Macros
e(cmd) ologit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type)         Wald or LR; type of model χ² test
e(vce)              vcetype specified in vce()
e(vcetype)          title used to label Std. Err.
e(opt)              type of optimization
e(which)            max or min; whether optimizer is to perform maximization or minimization
e(ml_method)        type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(cat) category values
e(V) variancecovariance matrix of the estimators
e(V_modelbased)     model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
See Long and Freese (2014, chap. 7) for a discussion of models for ordinal outcomes and examples
that use Stata. Cameron and Trivedi (2005, chap. 15) describe multinomial models, including the
model fit by ologit. When you have a qualitative dependent variable, several estimation procedures
are available. A popular choice is multinomial logistic regression (see [R]mlogit), but if you use this
procedure when the response variable is ordinal, you are discarding information because multinomial
logit ignores the ordered aspect of the outcome. Ordered logit and probit models provide a means to
exploit the ordering information.
There is more than one “ordered logit” model. The model fit by ologit, which we will call the
ordered logit model, is also known as the proportional odds model. Another popular choice, not fit
by ologit, is known as the stereotype model; see [R]slogit. All ordered logit models have been
derived by starting with a binary logit/probit model and generalizing it to allow for more than two
outcomes.
The proportional-odds ordered logit model is so called because, if we consider the odds odds(k) =
P(Y ≤ k)/P(Y > k), then odds(k1) and odds(k2) have the same ratio for all independent variable
combinations. The model is based on the principle that the only effect of combining adjoining categories
in ordered categorical regression problems should be a loss of efficiency in estimating the regression
parameters (McCullagh 1980). This model was also described by McKelvey and Zavoina (1975) and,
previously, by Aitchison and Silvey (1957) in a different algebraic form. Brant (1990) offers a set of
diagnostics for the model.
Peterson and Harrell (1990) suggest a model that allows nonproportional odds for a subset of the
explanatory variables. ologit does not allow this, but a model similar to this was implemented by
Fu (1998).
The stereotype model rejects the principle on which the ordered logit model is based. An-
derson (1984) argues that there are two distinct types of ordered categorical variables: “grouped
continuous”, such as income, where the “type a” model applies; and “assessed”, such as extent
of pain relief, where the stereotype model applies. Greenland (1985) independently developed the
same model. The stereotype model starts with a multinomial logistic regression model and imposes
constraints on this model.
Goodness of fit for ologit can be evaluated by comparing the likelihood value with that obtained by
fitting the model with mlogit. Let ln L1 be the log-likelihood value reported by ologit, and let ln L0
be the log-likelihood value reported by mlogit. If there are p independent variables (excluding the
constant) and k categories, mlogit will estimate p(k − 2) additional parameters. We can then perform
a "likelihood-ratio test", that is, calculate −2(ln L1 − ln L0), and compare it with χ² with p(k − 2) degrees of freedom.
This test is suggestive only because the ordered logit model is not nested within the multinomial logit
model. A large value of −2(ln L1 − ln L0) should, however, be taken as evidence of poorness of fit.
Marginally large values, on the other hand, should not be taken too seriously.
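A minimal sketch of this comparison for the model in example 2 (p = 3 covariates, k = 5 categories) might look like the following:

. quietly ologit rep77 foreign length mpg
. scalar ll1 = e(ll)
. quietly mlogit rep77 foreign length mpg
. scalar ll0 = e(ll)
. display "chi2 = " -2*(ll1 - ll0) "   p = " chi2tail(3*(5-2), -2*(ll1 - ll0))

Given the caveat above, the resulting p-value should be read as suggestive rather than as a formal test.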
The coefficients and cutpoints are estimated using maximum likelihood as described in [R]maximize.
In our parameterization, no constant appears, because the effect is absorbed into the cutpoints.
ologit and oprobit begin by tabulating the dependent variable. Category i = 1 is defined as
the minimum value of the variable, i = 2 as the next ordered value, and so on, for the empirically
determined k categories.
The probability of a given observation for ordered logit is
$$ p_{ij} = \Pr(y_j = i) = \Pr\left(\kappa_{i-1} < x_j\beta + u \le \kappa_i\right) = \frac{1}{1 + \exp(-\kappa_i + x_j\beta)} - \frac{1}{1 + \exp(-\kappa_{i-1} + x_j\beta)} $$
$\kappa_0$ is defined as $-\infty$ and $\kappa_k$ as $+\infty$.

For ordered probit, the probability of a given observation is
$$ p_{ij} = \Pr(y_j = i) = \Pr\left(\kappa_{i-1} < x_j\beta + u \le \kappa_i\right) = \Phi\left(\kappa_i - x_j\beta\right) - \Phi\left(\kappa_{i-1} - x_j\beta\right) $$
where $\Phi(\cdot)$ is the standard normal cumulative distribution function.

The log likelihood is
$$ \ln L = \sum_{j=1}^{N} w_j \sum_{i=1}^{k} I_i(y_j) \ln p_{ij} $$
where $w_j$ is an optional weight and
$$ I_i(y_j) = \begin{cases} 1, & \text{if } y_j = i \\ 0, & \text{otherwise} \end{cases} $$
ologit and oprobit support the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
These commands also support estimation with survey data. For details on VCEs with survey data,
see [SVY]variance estimation.
References
Aitchison, J., and S. D. Silvey. 1957. The generalization of probit analysis to the case of multiple responses. Biometrika
44: 131–140.
Anderson, J. A. 1984. Regression and ordered categorical variables (with discussion). Journal of the Royal Statistical
Society, Series B 46: 1–30.
Brant, R. 1990. Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics
46: 1171–1178.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Fu, V. K. 1998. sg88: Estimating generalized ordered logit models.Stata Technical Bulletin 44: 27–30. Reprinted in
Stata Technical Bulletin Reprints, vol. 8, pp. 160–164. College Station, TX: Stata Press.
Goldstein, R. 1997. sg59: Index of ordinal variation and Neyman–Barton GOF.Stata Technical Bulletin 33: 10–12.
Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 145–147. College Station, TX: Stata Press.
Greenland, S. 1985. An application of logistic models to the analysis of ordinal responses. Biometrical Journal 27:
189–197.
Kleinbaum, D. G., and M. Klein. 2010. Logistic Regression: A Self-Learning Text. 3rd ed. New York: Springer.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Lunt, M. 2001. sg163: Stereotype ordinal regression.Stata Technical Bulletin 61: 12–18. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 298–307. College Station, TX: Stata Press.
McCullagh, P. 1977. A logistic model for paired comparisons with ordered categorical data. Biometrika 64: 449–453.
. 1980. Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B
42: 109–142.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
McKelvey, R. D., and W. Zavoina. 1975. A statistical model for the analysis of ordinal level dependent variables.
Journal of Mathematical Sociology 4: 103–120.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables.Stata Journal 6: 285–308.
Peterson, B., and F. E. Harrell, Jr. 1990. Partial proportional odds models for ordinal response variables. Applied
Statistics 39: 205–217.
Williams, R. 2006. Generalized ordered logit/partial proportional odds models for ordinal dependent variables.Stata
Journal 6: 58–82.
. 2010. Fitting heterogeneous choice models with oglm.Stata Journal 10: 540–567.
Wolfe, R. 1998. sg86: Continuation-ratio models for ordinal response data.Stata Technical Bulletin 44: 18–21.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 149–153. College Station, TX: Stata Press.
Wolfe, R., and W. W. Gould. 1998. sg76: An approximate likelihood-ratio test for ordinal response models.Stata
Technical Bulletin 42: 24–27. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 199–204. College Station,
TX: Stata Press.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical
outcomes.Stata Journal 5: 537–559.
Also see
[R]ologit postestimation Postestimation tools for ologit
[R]clogit Conditional (fixed-effects) logistic regression
[R]logistic Logistic regression, reporting odds ratios
[R]logit Logistic regression, reporting coefficients
[R]mlogit Multinomial (polytomous) logistic regression
[R]oprobit Ordered probit regression
[R]rologit Rank-ordered logistic regression
[R]slogit Stereotype logistic regression
[ME]meologit Multilevel mixed-effects ordered logistic regression
[MI]estimation Estimation commands for use with mi estimate
[SVY]svy estimation Estimation commands for survey data
[XT]xtologit Random-effects ordered logistic models
[U] 20 Estimation and postestimation commands
Title
ologit postestimation — Postestimation tools for ologit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after ologit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)     dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest (2)       likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] {stub* | newvar | newvarlist} [if] [in] [, statistic outcome(outcome) nooffset]

predict [type] {stub* | newvarlist} [if] [in], scores
statistic Description
Main
pr predicted probabilities; the default
xb linear prediction
stdp standard error of the linear prediction
If you do not specify outcome(), pr (with one new variable specified) assumes outcome(#1).
You specify one or k new variables with pr, where k is the number of outcomes.
You specify one new variable with xb and stdp.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the predicted probabilities. If you do not also specify the outcome()
option, you specify k new variables, where k is the number of categories of the dependent variable.
Say that you fit a model by typing ologit result x1 x2, and result takes on three values.
Then you could type predict p1 p2 p3 to obtain all three predicted probabilities. If you specify
the outcome() option, you must specify one new variable. Say that result takes on the values
1, 2, and 3. Typing predict p1, outcome(1) would produce the same p1.
xb calculates the linear prediction. You specify one new variable, for example, predict linear,
xb. The linear prediction is defined, ignoring the contribution of the estimated cutpoints.
stdp calculates the standard error of the linear prediction. You specify one new variable, for example,
predict se, stdp.
outcome(outcome) specifies for which outcome the predicted probabilities are to be calculated.
outcome() should contain either one value of the dependent variable or one of #1, #2, . . . , with
#1 meaning the first category of the dependent variable, #2 meaning the second category, etc.
nooffset is relevant only if you specified offset(varname) for ologit. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b
rather than as x_j b + offset_j.
scores calculates equation-level score variables. The number of score variables created will equal
the number of outcomes in the model. If the number of outcomes in the model was k, then
the first new variable will contain ∂lnL/∂(x_j b);
the second new variable will contain ∂lnL/∂κ_1;
the third new variable will contain ∂lnL/∂κ_2;
. . .
and the kth new variable will contain ∂lnL/∂κ_{k−1}, where κ_i refers to the ith cutpoint.
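For instance, after fitting the rep77 model from example 1 (five outcomes and therefore four cutpoints),
the following do-file sketch creates and inspects the five score variables; the names sc1–sc5 are purely
illustrative:

    ologit rep77 foreign length mpg
    predict double sc1 sc2 sc3 sc4 sc5, scores
    // at the maximum, each equation-level score should sum to roughly zero
    summarize sc1-sc5 if e(sample)

The near-zero means reflect that the gradient of the log likelihood is zero at the maximum-likelihood
estimates.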
Remarks and examples
See [U] 20 Estimation and postestimation commands for instructions on obtaining the variance
covariance matrix of the estimators, predicted values, and hypothesis tests. Also see [R]lrtest for
performing likelihood-ratio tests.
Example 1
In example 2 of [R]ologit, we fit the model ologit rep77 foreign length mpg. The predict
command can be used to obtain the predicted probabilities.
We type predict followed by the names of the new variables to hold the predicted probabilities,
ordering the names from low to high. In our data, the lowest outcome is “poor”, and the highest is
“excellent”. We have five categories, so we must type five names following predict; the choice of
names is up to us:
. use http://www.stata-press.com/data/r13/fullauto
(Automobile Models)
. ologit rep77 foreign length mpg
(output omitted )
. predict poor fair avg good exc
(option pr assumed; predicted probabilities)
. list exc good make model rep78 if rep77>=., sep(4) divider
exc good make model rep78
3. .0033341 .0393056 AMC Spirit .
10. .0098392 .1070041 Buick Opel .
32. .0023406 .0279497 Ford Fiesta Good
44. .015697 .1594413 Merc. Monarch Average
53. .065272 .4165188 Peugeot 604 .
56. .005187 .059727 Plym. Horizon Average
57. .0261461 .2371826 Plym. Sapporo .
63. .0294961 .2585825 Pont. Phoenix .
The eight cars listed were introduced after 1977, so they do not have 1977 repair records in our data.
We predicted what their 1977 repair records might have been using the fitted model. We see that,
based on its characteristics, the Peugeot 604 had about a 41.65 + 6.53 ≈ 48.2% chance of a good or
excellent repair record. The Ford Fiesta, which had only a 3% chance of a good or excellent repair
record, in fact, had a good record when it was introduced in the following year.
Technical note
For ordered logit, predict, xb produces $S_j = x_{1j}\beta_1 + x_{2j}\beta_2 + \cdots + x_{kj}\beta_k$. The ordered-logit
predictions are then the probability that $S_j + u_j$ lies between a pair of cutpoints, $\kappa_{i-1}$ and $\kappa_i$. Some
handy formulas are

$$
\begin{aligned}
\Pr(S_j + u_j < \kappa) &= 1/\{1 + e^{S_j - \kappa}\} \\
\Pr(S_j + u_j > \kappa) &= 1 - 1/\{1 + e^{S_j - \kappa}\} \\
\Pr(\kappa_1 < S_j + u_j < \kappa_2) &= 1/\{1 + e^{S_j - \kappa_2}\} - 1/\{1 + e^{S_j - \kappa_1}\}
\end{aligned}
$$
Rather than using predict directly, we could calculate the predicted probabilities by hand. If we
wished to obtain the predicted probability that the repair record is excellent and the probability that it
is good, we look back at ologit's output to obtain the cutpoints. We find that “good” corresponds
to the interval /cut3 < S_j + u < /cut4 and “excellent” to the interval S_j + u > /cut4:
. predict score, xb
. generate probgood = 1/(1+exp(score-_b[/cut4])) - 1/(1+exp(score-_b[/cut3]))
. generate probexc = 1 - 1/(1+exp(score-_b[/cut4]))
The results of our calculation will be the same as those produced in the previous example. We refer
to the estimated cutpoints just as we would any coefficient, so _b[/cut3] refers to the value of the
/cut3 coefficient; see [U] 13.5 Accessing coefficients and standard errors.
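As a quick consistency check (a sketch, assuming the variables good and exc created by predict in
example 1 are still in memory), the hand-calculated probabilities can be compared with those produced
by predict:

    // both comparisons should hold for every observation
    assert reldif(probgood, good) < 1e-6
    assert reldif(probexc, exc) < 1e-6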
Also see
[R]ologit Ordered logistic regression
[U] 20 Estimation and postestimation commands
Title
oneway — One-way analysis of variance
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
oneway response_var factor_var [if] [in] [weight] [, options]
options Description
Main
bonferroni Bonferroni multiple-comparison test
scheffe          Scheffé multiple-comparison test
sidak            Šidák multiple-comparison test
tabulate produce summary table
[no]means        include or suppress means; default is means
[no]standard     include or suppress standard deviations; default is standard
[no]freq         include or suppress frequencies; default is freq
[no]obs          include or suppress number of obs; default is obs if data are weighted
noanova suppress the ANOVA table
nolabel show numeric codes, not labels
wrap do not break wide tables
missing treat missing values as categories
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Linear models and related >ANOVA/MANOVA >One-way ANOVA
Description
The oneway command reports one-way analysis-of-variance (ANOVA) models and performs multiple-
comparison tests.
If you wish to fit more complicated ANOVA layouts or wish to fit analysis-of-covariance (ANCOVA)
models, see [R]anova.
See [D]encode for examples of fitting ANOVA models on string variables.
See [R]loneway for an alternative oneway command with slightly different features.
Options
 
Main
bonferroni reports the results of a Bonferroni multiple-comparison test.
scheffe reports the results of a Scheffé multiple-comparison test.
sidak reports the results of a Šidák multiple-comparison test.
tabulate produces a table of summary statistics of the response var by levels of the factor var.
The table includes the mean, standard deviation, frequency, and, if the data are weighted, the
number of observations. Individual elements of the table may be included or suppressed by using
the [no]means, [no]standard, [no]freq, and [no]obs options. For example, typing
oneway response factor, tabulate means standard
produces a summary table that contains only the means and standard deviations. You could achieve
the same result by typing
oneway response factor, tabulate nofreq
[no]means includes or suppresses only the means from the table produced by the tabulate option.
See tabulate above.
[no]standard includes or suppresses only the standard deviations from the table produced by the
tabulate option. See tabulate above.
[no]freq includes or suppresses only the frequencies from the table produced by the tabulate
option. See tabulate above.
[no]obs includes or suppresses only the reported number of observations from the table produced by
the tabulate option. If the data are not weighted, only the frequency is reported. If the data are
weighted, the frequency refers to the sum of the weights. See tabulate above.
noanova suppresses the display of the ANOVA table.
nolabel causes the numeric codes to be displayed rather than the value labels in the ANOVA and
multiple-comparison test tables.
wrap requests that Stata not break up wide tables to make them more readable.
missing requests that missing values of factor var be treated as a category rather than as observations
to be omitted from the analysis.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Obtaining observed means
Multiple-comparison tests
Weighted data
Video example
Introduction
The oneway command reports one-way ANOVA models. To perform a one-way layout of a variable
called endog on exog, type oneway endog exog.
Example 1
We run an experiment varying the amount of fertilizer used in growing apple trees. We test four
concentrations, using each concentration in three groves of 12 trees each. Later in the year, we
measure the average weight of the fruit.
If all had gone well, we would have had 3 observations on the average weight for each of the
four concentrations. Instead, two of the groves were mistakenly leveled by a confused man on a large
bulldozer. We are left with the following dataset:
. use http://www.stata-press.com/data/r13/apple
(Apple trees)
. describe
Contains data from http://www.stata-press.com/data/r13/apple.dta
obs: 10 Apple trees
vars: 2 16 Jan 2013 11:23
size: 100
storage display value
variable name type format label variable label
treatment int %8.0g Fertilizer
weight double %10.0g Average weight in grams
Sorted by:
. list, abbreviate(10)
treatment weight
1. 1 117.5
2. 1 113.8
3. 1 104.4
4. 2 48.9
5. 2 50.4
6. 2 58.9
7. 3 70.4
8. 3 86.9
9. 4 87.7
10. 4 67.3
To obtain the one-way ANOVA results, we type
. oneway weight treatment
Analysis of Variance
Source SS df MS F Prob > F
Between groups 5295.54433 3 1765.18144 21.46 0.0013
Within groups 493.591667 6 82.2652778
Total 5789.136 9 643.237333
Bartlett’s test for equal variances: chi2(3) = 1.3900 Prob>chi2 = 0.708
We find significant (at better than the 1% level) differences among the four concentrations.
Technical note
Rather than using the oneway command, we could have performed this analysis by using anova.
Example 1 in [R]anova repeats this same analysis. You may wish to compare the output.
You will find the oneway command quicker than the anova command, and, as you will learn,
oneway allows you to perform multiple-comparison tests. On the other hand, anova will let you
generate predictions, examine the covariance matrix of the estimators, and perform more general
hypothesis tests.
Technical note
Although the output is a usual ANOVA table, let’s run through it anyway. The between-group
sum of squares for the model is 5295.5 with 3 degrees of freedom, resulting in a mean square of
5295.5/3 ≈ 1765.2. The corresponding F statistic is 21.46 and has a significance level of 0.0013.
Thus the model appears to be significant at the 0.13% level.
The second line summarizes the within-group (residual) variation. The within-group sum of squares
is 493.59 with 6 degrees of freedom, resulting in a mean squared error of 82.27.
The between- and residual-group variations sum to the total sum of squares (TSS), which is reported
as 5789.1 in the last line of the table. This is the TSS of weight after removal of the mean. Similarly,
the between plus residual degrees of freedom sum to the total degrees of freedom, 9. Remember that
there are 10 observations. Subtracting 1 for the mean, we are left with 9 total degrees of freedom.
At the bottom of the table, Bartlett’s test for equal variances is reported. The value of the statistic
is 1.39. The corresponding significance level (χ² with 3 degrees of freedom) is 0.708, so we cannot
reject the assumption that the variances are homogeneous.
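The reported significance level can be reproduced directly from the χ² distribution; a one-line do-file
sketch:

    display chi2tail(3, 1.39)    // approximately 0.708, matching the oneway output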
Obtaining observed means
Example 2
We typed oneway weight treatment to obtain an ANOVA table of weight of fruit by fertilizer
concentration. Although we obtained the table, we obtained no information on which fertilizer seems
to work the best. If we add the tabulate option, we obtain that additional information:
. oneway weight treatment, tabulate
Summary of Average weight in grams
Fertilizer Mean Std. Dev. Freq.
1 111.9 6.7535176 3
2 52.733333 5.3928966 3
3 78.65 11.667262 2
4 77.5 14.424978 2
Total 80.62 25.362124 10
Analysis of Variance
Source SS df MS F Prob > F
Between groups 5295.54433 3 1765.18144 21.46 0.0013
Within groups 493.591667 6 82.2652778
Total 5789.136 9 643.237333
Bartlett’s test for equal variances: chi2(3) = 1.3900 Prob>chi2 = 0.708
We find that the average weight was largest when we used fertilizer concentration 1.
Multiple-comparison tests
Example 3: Bonferroni multiple-comparison test
oneway can also perform multiple-comparison tests using either Bonferroni, Scheffé, or Šidák
normalizations. For instance, to obtain the Bonferroni multiple-comparison test, we specify the
bonferroni option:
. oneway weight treatment, bonferroni
Analysis of Variance
Source SS df MS F Prob > F
Between groups 5295.54433 3 1765.18144 21.46 0.0013
Within groups 493.591667 6 82.2652778
Total 5789.136 9 643.237333
Bartlett’s test for equal variances: chi2(3) = 1.3900 Prob>chi2 = 0.708
Comparison of Average weight in grams by Fertilizer
(Bonferroni)
Row Mean-
Col Mean 1 2 3
2 -59.1667
0.001
3 -33.25 25.9167
0.042 0.122
4 -34.4 24.7667 -1.15
0.036 0.146 1.000
The results of the Bonferroni test are presented as a matrix. The first entry, −59.17, represents the
difference between fertilizer concentrations 2 and 1 (labeled “Row Mean - Col Mean” in the upper stub
of the table). Remember that in the previous example we requested the tabulate option. Looking
back, we find that the means of concentrations 1 and 2 are 111.90 and 52.73, respectively. Thus
52.73 − 111.90 = −59.17.
Underneath that number is reported “0.001”. This is the Bonferroni-adjusted significance of the
difference. The difference is significant at the 0.1% level. Looking down the column, we see that
concentration 3 is also worse than concentration 1 (4.2% level), as is concentration 4 (3.6% level).
On the basis of this evidence, we would use concentration 1 if we grew apple trees.
Example 4: Scheffé multiple-comparison test
We can just as easily obtain the Scheffé-adjusted significance levels. Rather than specifying the
bonferroni option, we specify the scheffe option.
We will also add the noanova option to prevent Stata from redisplaying the ANOVA table:
. oneway weight treatment, noanova scheffe
Comparison of Average weight in grams by Fertilizer
(Scheffe)
Row Mean-
Col Mean 1 2 3
2 -59.1667
0.001
3 -33.25 25.9167
0.039 0.101
4 -34.4 24.7667 -1.15
0.034 0.118 0.999
The differences are the same as those we obtained in the Bonferroni output, but the significance levels
are not. According to the Bonferroni-adjusted numbers, the significance of the difference between
fertilizer concentrations 1 and 3 is 4.2%. The Scheffé-adjusted significance level is 3.9%.
We will leave it to you to decide which results are more accurate.
Example 5: Šidák multiple-comparison test
Let’s conclude this example by obtaining the Šidák-adjusted multiple-comparison tests. We do this
to illustrate Stata’s capabilities to calculate these results, because searching across adjustment methods
until you find the results you want is not a valid technique for obtaining significance levels.
. oneway weight treatment, noanova sidak
Comparison of Average weight in grams by Fertilizer
(Sidak)
Row Mean-
Col Mean 1 2 3
2 -59.1667
0.001
3 -33.25 25.9167
0.041 0.116
4 -34.4 24.7667 -1.15
0.035 0.137 1.000
We find results that are similar to the Bonferroni-adjusted numbers.
 
Henry Scheffé (1907–1977) was born in New York. He studied mathematics at the University of
Wisconsin, gaining a doctorate with a dissertation on differential equations. He taught mathematics
at Wisconsin, Oregon State University, and Reed College, but his interests changed to statistics and
he joined Wilks at Princeton. After periods at Syracuse, UCLA, and Columbia, Scheffé settled in
Berkeley from 1953. His research increasingly focused on linear models and particularly ANOVA,
on which he produced a celebrated monograph. His death was the result of a bicycle accident.
 
Weighted data
Example 6
oneway can work with both weighted and unweighted data. Let’s assume that we wish to perform
a one-way layout of the death rate on the four census regions of the United States using state data.
Our data contain three variables, drate (the death rate), region (the region), and pop (the population
of the state).
To fit the model, we type oneway drate region [weight=pop], although we typically abbreviate
weight as w. We will also add the tabulate option to demonstrate how the table of summary statistics
differs for weighted data:
. use http://www.stata-press.com/data/r13/census8
(1980 Census data by state)
. oneway drate region [w=pop], tabulate
(analytic weights assumed)
Census Summary of Death Rate
region Mean Std. Dev. Freq. Obs.
NE 97.15 5.82 49135283 9
N Cntrl 88.10 5.58 58865670 12
South 87.05 10.40 74734029 16
West 75.65 8.23 43172490 13
Total 87.34 10.43 2.259e+08 50
Analysis of Variance
Source SS df MS F Prob > F
Between groups 2360.92281 3 786.974272 12.17 0.0000
Within groups 2974.09635 46 64.6542685
Total 5335.01916 49 108.877942
Bartlett’s test for equal variances: chi2(3) = 5.4971 Prob>chi2 = 0.139
When the data are weighted, the summary table has four columns rather than three. The column
labeled “Freq.” reports the sum of the weights. The overall frequency is 2.259 × 10^8, meaning that
there are approximately 226 million people in the United States.
The ANOVA table is appropriately weighted. Also see [U] 11.1.6 weight.
Video example
One-way ANOVA in Stata
Stored results
oneway stores the following in r():
Scalars
    r(N)           number of observations
    r(F)           F statistic
    r(df_r)        within-group degrees of freedom
    r(mss)         between-group sum of squares
    r(df_m)        between-group degrees of freedom
    r(rss)         within-group sum of squares
    r(chi2bart)    Bartlett’s χ²
    r(df_bart)     Bartlett’s degrees of freedom
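These stored results can be picked up directly after estimation. A short do-file sketch using the apple
data from example 1:

    use http://www.stata-press.com/data/r13/apple, clear
    quietly oneway weight treatment
    display r(F)                      // F statistic, 21.46 in example 1
    display r(mss)/r(df_m)            // between-group mean square
    display r(rss)/r(df_r)            // within-group mean square
    display r(chi2bart), r(df_bart)   // Bartlett's test statistic and degrees of freedom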
Methods and formulas
Methods and formulas are presented under the following headings:
One-way analysis of variance
Bartlett’s test
Multiple-comparison tests
One-way analysis of variance
The model of one-way ANOVA is

$$
y_{ij} = \mu + \alpha_i + \epsilon_{ij}
$$

for levels $i = 1, \ldots, k$ and observations $j = 1, \ldots, n_i$. Define $\bar{y}_i$ as the (weighted) mean of $y_{ij}$ over
$j$ and $\bar{y}$ as the overall (weighted) mean of $y_{ij}$. Define $w_{ij}$ as the weight associated with $y_{ij}$, which
is 1 if the data are unweighted. $w_{ij}$ is normalized to sum to $n = \sum_i n_i$ if aweights are used and
is otherwise not normalized. $w_i$ refers to $\sum_j w_{ij}$, and $w$ refers to $\sum_i w_i$.
The between-group sum of squares is then

$$
S_1 = \sum_i w_i (\bar{y}_i - \bar{y})^2
$$

The TSS is

$$
S = \sum_i \sum_j w_{ij} (y_{ij} - \bar{y})^2
$$

The within-group sum of squares is given by $S_e = S - S_1$.
The between-group mean square is $s_1^2 = S_1/(k-1)$, and the within-group mean square is
$s_e^2 = S_e/(w - k)$. The test statistic is $F = s_1^2/s_e^2$. See, for instance, Snedecor and Cochran (1989).
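For unweighted data these sums of squares are easy to reproduce by hand. A do-file sketch using the
apple data from example 1; the variable names ybar_i, ybar, between, and within are purely
illustrative:

    use http://www.stata-press.com/data/r13/apple, clear
    egen double ybar_i = mean(weight), by(treatment)   // group means
    egen double ybar = mean(weight)                    // overall mean
    generate double between = (ybar_i - ybar)^2
    generate double within = (weight - ybar_i)^2
    quietly summarize between
    display r(sum)    // S1, the between-group sum of squares (5295.54 in example 1)
    quietly summarize within
    display r(sum)    // Se, the within-group sum of squares (493.59 in example 1)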
Bartlett’s test
Bartlett’s test assumes that you have $m$ independent, normal, random samples and tests the
hypothesis $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$. The test statistic, $M$, is defined as

$$
M = \frac{(T - m)\ln\hat{\sigma}^2 - \sum_i (T_i - 1)\ln\hat{\sigma}_i^2}
         {1 + \dfrac{1}{3(m-1)}\left\{ \sum_i \dfrac{1}{T_i - 1} - \dfrac{1}{T - m} \right\}}
$$
where there are $T$ overall observations, $T_i$ observations in the $i$th group, and

$$
(T_i - 1)\,\hat{\sigma}_i^2 = \sum_{j=1}^{T_i} (y_{ij} - \bar{y}_i)^2
$$

$$
(T - m)\,\hat{\sigma}^2 = \sum_{i=1}^{m} (T_i - 1)\,\hat{\sigma}_i^2
$$

An approximate test of the homogeneity of variance is based on the statistic $M$ with critical values
obtained from the $\chi^2$ distribution with $m - 1$ degrees of freedom. See Bartlett (1937) or Draper and
Smith (1998, 56–57).
Multiple-comparison tests
Let’s begin by reviewing the logic behind these adjustments. The “standard” t statistic for the
comparison of two means is

$$
t = \frac{\bar{y}_i - \bar{y}_j}{s\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}
$$

where $s$ is the overall standard deviation, $\bar{y}_i$ is the measured average of $y$ in group $i$, and $n_i$ is the
number of observations in the group. We perform hypothesis tests by calculating this t statistic. We
simultaneously choose a critical level, α, and look up the t statistic corresponding to that level in
a table. We reject the hypothesis if our calculated t exceeds the value we looked up. Alternatively,
because we have a computer at our disposal, we calculate the significance level e corresponding to
our calculated t statistic, and if e < α, we reject the hypothesis.
This logic works well when we are performing one test. Now consider what happens when we
perform several separate tests, say, n of them. Let’s assume, just for discussion, that we set α equal to
0.05 and that we will perform six tests. For each test, we have a 0.05 probability of falsely rejecting
the equality-of-means hypothesis. Overall, then, our chances of falsely rejecting at least one of the
hypotheses is 1 − (1 − 0.05)^6 ≈ 0.26 if the tests are independent.
The idea behind multiple-comparison tests is to control for the fact that we will perform multiple
tests and to reduce our overall chances of falsely rejecting each hypothesis to αrather than letting
our chances increase with each additional test. (See Miller [1981] and Hochberg and Tamhane [1987]
for rather advanced texts on multiple-comparison procedures.)
The Bonferroni adjustment (see Miller [1981]; also see van Belle et al. [2004, 534–537]) does
this by (falsely but approximately) asserting that the critical level we should use, a, is the true critical
level, α, divided by the number of tests, n; that is, a = α/n. For instance, if we are going to perform
six tests, each at the 0.05 significance level, we want to adopt a critical level of 0.05/6 ≈ 0.00833.
We can just as easily apply this logic to e, the significance level associated with our tstatistic, as
to our critical level α. If a comparison has a calculated significance of e, then its “real” significance,
adjusted for the fact of ncomparisons, is n×e. If a comparison has a significance level of, say,
0.012, and we perform six tests, then its “real” significance is 0.072. If we adopt a critical level of
0.05, we cannot reject the hypothesis. If we adopt a critical level of 0.10, we can reject it.
Of course, this calculation can go above 1, but that just means that there is no α < 1 for which
we could reject the hypothesis. (This situation arises because of the crude nature of the Bonferroni
adjustment.) Stata handles this case by simply calling the significance level 1. Thus the formula for
the Bonferroni significance level is
$$
e_b = \min(1,\ en)
$$
where n = k(k − 1)/2 is the number of comparisons.
The Šidák adjustment (Šidák [1967]; also see Winer, Brown, and Michels [1991, 165–166]) is
slightly different and provides a tighter bound. It starts with the assertion that

$$
a = 1 - (1 - \alpha)^{1/n}
$$

Turning this formula around and substituting calculated significance levels, we obtain

$$
e_s = \min\left\{1,\ 1 - (1 - e)^n\right\}
$$

For example, if the calculated significance is 0.012 and we perform six tests, the “real” significance
is approximately 0.07.
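These adjustments are easy to reproduce numerically. A do-file sketch of the six-test example above:

    display 1 - (1 - 0.05)^6    // chance of at least one false rejection: about 0.26
    display min(1, 6*0.012)     // Bonferroni-adjusted significance: 0.072
    display 1 - (1 - 0.012)^6   // Sidak-adjusted significance: about 0.070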
The Scheffé test (Scheffé [1953, 1959]; also see Kuehl [2000, 97–98]) differs in derivation, but
it attacks the same problem. Let there be k means for which we want to make all the pairwise tests.
Two means are declared significantly different if

$$
t \ge \sqrt{(k-1)\,F(\alpha;\,k-1,\,\nu)}
$$

where $F(\alpha;\,k-1,\,\nu)$ is the α-critical value of the F distribution with k − 1 numerator and ν
denominator degrees of freedom. Scheffé’s test has the nicety that it never declares a contrast
significant if the overall F test is not significant.
Turning the test around, Stata calculates a significance level

$$
\hat{e} = F\!\left(\frac{t^2}{k-1},\ k-1,\ \nu\right)
$$

For instance, you have a calculated t statistic of 4.0 with 50 degrees of freedom. The simple t test says
that the significance level is 0.00021. The F test equivalent, 16 with 1 and 50 degrees of freedom,
says the same. If you are comparing three means, however, you calculate an F test of 8.0 with 2 and
50 degrees of freedom, which says that the significance level is 0.0010.
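These significance levels can be reproduced with Stata’s distribution functions. A do-file sketch of the
numbers above:

    display 2*ttail(50, 4.0)    // two-sided p for t = 4.0 with 50 df: about 0.00021
    display Ftail(1, 50, 16)    // the equivalent F(1,50) test of 16: the same p
    display Ftail(2, 50, 8)     // Scheffe with three means: 4^2/(3-1) = 8 on F(2,50), about 0.0010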
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Altman, D. G. 1991. Practical Statistics for Medical Research. London: Chapman & Hall/CRC.
Bartlett, M. S. 1937. Properties of sufficiency and statistical tests. Proceedings of the Royal Society, Series A 160:
268–282.
Daniel, C., and E. L. Lehmann. 1979. Henry Scheffé 1907–1977. Annals of Statistics 7: 1149–1161.
Draper, N., and H. Smith. 1998. Applied Regression Analysis. 3rd ed. New York: Wiley.
Hochberg, Y., and A. C. Tamhane. 1987. Multiple Comparison Procedures. New York: Wiley.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis. 2nd ed. Belmont,
CA: Duxbury.
Marchenko, Y. V. 2006. Estimating variance components in Stata.Stata Journal 6: 1–21.
Miller, R. G., Jr. 1981. Simultaneous Statistical Inference. 2nd ed. New York: Springer.
Scheffé, H. 1953. A method for judging all contrasts in the analysis of variance. Biometrika 40: 87–104.
Scheffé, H. 1959. The Analysis of Variance. New York: Wiley.
Šidák, Z. 1967. Rectangular confidence regions for the means of multivariate normal distributions. Journal of the
American Statistical Association 62: 626–633.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
van Belle, G., L. D. Fisher, P. J. Heagerty, and T. S. Lumley. 2004. Biostatistics: A Methodology for the Health
Sciences. 2nd ed. New York: Wiley.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New
York: McGraw–Hill.
Also see
[R]anova Analysis of variance and covariance
[R]loneway Large one-way ANOVA, random effects, and reliability
[PSS]power oneway Power analysis for one-way analysis of variance
Title
oprobit — Ordered probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
oprobit depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)vcetype may be oim,robust,cluster clustvar,bootstrap, or
jackknife
Reporting
level(#)set confidence level; default is level(95)
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,mfp,mi estimate,nestreg,rolling,statsby,stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Ordinal outcomes >Ordered probit regression
Description
oprobit fits ordered probit models of ordinal variable depvar on the independent variables
indepvars. The actual values taken on by the dependent variable are irrelevant, except that larger
values are assumed to correspond to “higher” outcomes.
See [R]logistic for a list of related estimation commands.
Options
 
Model
offset(varname),constraints(constraints),collinear; see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
The following option is available with oprobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
An ordered probit model is used to estimate relationships between an ordinal dependent variable
and a set of independent variables. An ordinal variable is a variable that is categorical and ordered,
for instance, “poor”, “good”, and “excellent”, which might indicate a person’s current health status or
the repair record of a car. If there are only two outcomes, see [R]logistic,[R]logit, and [R]probit.
This entry is concerned only with more than two outcomes. If the outcomes cannot be ordered (for
example, residency in the north, east, south, or west), see [R]mlogit. This entry is concerned only
with models in which the outcomes can be ordered.
In ordered probit, an underlying score is estimated as a linear function of the independent variables
and a set of cutpoints. The probability of observing outcome i corresponds to the probability that the
estimated linear function, plus random error, is within the range of the cutpoints estimated for the
outcome:

$$
\Pr(\text{outcome}_j = i) = \Pr(\kappa_{i-1} < \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + u_j \le \kappa_i)
$$

$u_j$ is assumed to be normally distributed. In either case, we estimate the coefficients $\beta_1, \beta_2, \ldots, \beta_k$
together with the cutpoints $\kappa_1, \kappa_2, \ldots, \kappa_{I-1}$, where $I$ is the number of possible outcomes.
$\kappa_0$ is taken as $-\infty$, and $\kappa_I$ is taken as $+\infty$. All of this is a direct generalization of the ordinary
two-outcome probit model.
Example 1
In example 2 of [R]ologit, we use a variation of the automobile dataset (see [U] 1.2.2 Example
datasets) to analyze the 1977 repair records of 66 foreign and domestic cars. We use ordered logit
to explore the relationship of rep77 in terms of foreign (origin of manufacture), length (a proxy
for size), and mpg. Here we fit the same model using ordered probit rather than ordered logit:
. use http://www.stata-press.com/data/r13/fullauto
(Automobile Models)
. oprobit rep77 foreign length mpg
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -78.106316
Iteration 2: log likelihood = -78.020086
Iteration 3: log likelihood = -78.020025
Iteration 4: log likelihood = -78.020025
Ordered probit regression Number of obs = 66
LR chi2(3) = 23.75
Prob > chi2 = 0.0000
Log likelihood = -78.020025 Pseudo R2 = 0.1321
rep77 Coef. Std. Err. z P>|z| [95% Conf. Interval]
foreign 1.704861 .4246796 4.01 0.000 .8725037 2.537217
length .0468675 .012648 3.71 0.000 .022078 .0716571
mpg .1304559 .0378628 3.45 0.001 .0562463 .2046656
/cut1 10.1589 3.076754 4.128577 16.18923
/cut2 11.21003 3.107527 5.119389 17.30067
/cut3 12.54561 3.155233 6.361467 18.72975
/cut4 13.98059 3.218793 7.671874 20.28931
We find that foreign cars have better repair records, as do larger cars and cars with better mileage
ratings.
Stored results
oprobit stores the following in e():
Scalars
e(N) number of observations
e(N cd) number of completely determined observations
e(k cat) number of categories
e(k) number of parameters
e(k aux) number of auxiliary parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(r2 p) pseudo-R-squared
e(ll) log likelihood
e(ll 0) log likelihood, constant-only model
e(N clust) number of clusters
e(chi2) χ2
e(p) significance of model test
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0otherwise
Macros
e(cmd) oprobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(cat) category values
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
See Methods and formulas of [R]ologit.
References
Aitchison, J., and S. D. Silvey. 1957. The generalization of probit analysis to the case of multiple responses. Biometrika
44: 131–140.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Chiburis, R., and M. Lokshin. 2007. Maximum likelihood and two-step estimation of an ordered-probit selection
model.Stata Journal 7: 167–182.
De Luca, G., and V. Perotti. 2011. Estimation of ordered response models with sample selection.Stata Journal 11:
213–239.
Goldstein, R. 1997. sg59: Index of ordinal variation and Neyman–Barton GOF.Stata Technical Bulletin 33: 10–12.
Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 145–147. College Station, TX: Stata Press.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables.Stata Journal 6: 285–308.
Stewart, M. B. 2004. Semi-nonparametric estimation of extended ordered probit models.Stata Journal 4: 27–39.
Williams, R. 2010. Fitting heterogeneous choice models with oglm.Stata Journal 10: 540–567.
Wolfe, R. 1998. sg86: Continuation-ratio models for ordinal response data.Stata Technical Bulletin 44: 18–21.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 149–153. College Station, TX: Stata Press.
Wolfe, R., and W. W. Gould. 1998. sg76: An approximate likelihood-ratio test for ordinal response models.Stata
Technical Bulletin 42: 24–27. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 199–204. College Station,
TX: Stata Press.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical
outcomes.Stata Journal 5: 537–559.
Also see
[R]oprobit postestimation Postestimation tools for oprobit
[R]heckoprobit Ordered probit model with sample selection
[R]logistic Logistic regression, reporting odds ratios
[R]mlogit Multinomial (polytomous) logistic regression
[R]mprobit Multinomial probit regression
[R]ologit Ordered logistic regression
[R]probit Probit regression
[ME]meoprobit Multilevel mixed-effects ordered probit regression
[MI]estimation Estimation commands for use with mi estimate
[SVY]svy estimation Estimation commands for survey data
[XT]xtoprobit Random-effects ordered probit models
[U] 20 Estimation and postestimation commands
Title
oprobit postestimation — Postestimation tools for oprobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after oprobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)     dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest (2)       likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] {stub* | newvar | newvarlist} [if] [in] [, statistic outcome(outcome) nooffset]

predict [type] {stub* | newvarlist} [if] [in], scores
statistic Description
Main
pr predicted probabilities; the default
xb linear prediction
stdp standard error of the linear prediction
If you do not specify outcome(), pr (with one new variable specified) assumes outcome(#1).
You specify one or k new variables with pr, where k is the number of outcomes.
You specify one new variable with xb and stdp.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the predicted probabilities. If you do not also specify the outcome()
option, you specify k new variables, where k is the number of categories of the dependent variable.
Say that you fit a model by typing oprobit result x1 x2, and result takes on three values.
Then you could type predict p1 p2 p3 to obtain all three predicted probabilities. If you specify
the outcome() option, you must specify one new variable. Say that result takes on values 1,
2, and 3. Typing predict p1, outcome(1) would produce the same p1.
xb calculates the linear prediction. You specify one new variable, for example, predict linear,
xb. The linear prediction is defined, ignoring the contribution of the estimated cutpoints.
stdp calculates the standard error of the linear prediction. You specify one new variable, for example,
predict se, stdp.
outcome(outcome) specifies for which outcome the predicted probabilities are to be calculated.
outcome() should contain either one value of the dependent variable or one of #1, #2, . . . , with
#1 meaning the first category of the dependent variable, #2 meaning the second category, etc.
nooffset is relevant only if you specified offset(varname) for oprobit. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b
rather than as x_j b + offset_j.
scores calculates equation-level score variables. The number of score variables created will equal
the number of outcomes in the model. If the number of outcomes in the model was k, then
the first new variable will contain ∂lnL/∂(x_j b);
the second new variable will contain ∂lnL/∂κ_1;
the third new variable will contain ∂lnL/∂κ_2;
. . .
and the kth new variable will contain ∂lnL/∂κ_{k−1}, where κ_i refers to the ith cutpoint.
Remarks and examples
See [U] 20 Estimation and postestimation commands for instructions on obtaining the variance
covariance matrix of the estimators, predicted values, and hypothesis tests. Also see [R]lrtest for
performing likelihood-ratio tests.
Example 1
In example 1 of [R]oprobit, we fit the model oprobit rep77 foreign length mpg. The
predict command can be used to obtain the predicted probabilities. We type predict followed by
the names of the new variables to hold the predicted probabilities, ordering the names from low to
high. In our data, the lowest outcome is “poor” and the highest is “excellent”. We have five categories,
so we must type five names following predict; the choice of names is up to us:
. use http://www.stata-press.com/data/r13/fullauto
(Automobile Models)
. oprobit rep77 foreign length mpg
(output omitted )
. predict poor fair avg good exc
(option pr assumed; predicted probabilities)
. list make model exc good if rep77>=., sep(4) divider
make model exc good
3. AMC Spirit .0006044 .0351813
10. Buick Opel .0043803 .1133763
32. Ford Fiesta .0002927 .0222789
44. Merc. Monarch .0093209 .1700846
53. Peugeot 604 .0734199 .4202766
56. Plym. Horizon .001413 .0590294
57. Plym. Sapporo .0197543 .2466034
63. Pont. Phoenix .0234156 .266771
Technical note
For ordered probit, predict, xb produces $S_j = x_{1j}\beta_1 + x_{2j}\beta_2 + \cdots + x_{kj}\beta_k$. Ordered probit is
identical to ordered logit, except that we use different distribution functions for calculating probabilities.
The ordered-probit predictions are then the probability that $S_j + u_j$ lies between a pair of cutpoints
$\kappa_{i-1}$ and $\kappa_i$. The formulas for ordered probit are

$$
\begin{aligned}
\Pr(S_j + u < \kappa) &= \Phi(\kappa - S_j) \\
\Pr(S_j + u > \kappa) &= 1 - \Phi(\kappa - S_j) = \Phi(S_j - \kappa) \\
\Pr(\kappa_1 < S_j + u < \kappa_2) &= \Phi(\kappa_2 - S_j) - \Phi(\kappa_1 - S_j)
\end{aligned}
$$
Rather than using predict directly, we could calculate the predicted probabilities by hand.
. predict pscore, xb
. generate probexc = normal(pscore-_b[/cut4])
. generate probgood = normal(_b[/cut4]-pscore) - normal(_b[/cut3]-pscore)
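If the variables good and exc created by predict in example 1 are still in memory, the hand calculation
can be checked against them (a sketch; both assertions should hold for every observation):

    assert reldif(probgood, good) < 1e-6
    assert reldif(probexc, exc) < 1e-6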
Also see
[R]oprobit Ordered probit regression
[U] 20 Estimation and postestimation commands
Title
orthog — Orthogonalize variables and compute orthogonal polynomials
Syntax Menu Description
Options for orthog Options for orthpoly Remarks and examples
Methods and formulas References Also see
Syntax
Orthogonalize variables
orthog varlist [if] [in] [weight], generate(newvarlist) [matrix(matname)]
Compute orthogonal polynomial
orthpoly varname [if] [in] [weight], {generate(newvarlist) | poly(matname)} [degree(#)]
orthpoly requires that generate(newvarlist)or poly(matname), or both, be specified.
varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
iweights, aweights, fweights, and pweights are allowed, see [U] 11.1.6 weight.
Menu
orthog
Data >Create or change data >Other variable-creation commands >Orthogonalize variables
orthpoly
Data >Create or change data >Other variable-creation commands >Orthogonal polynomials
Description
orthog orthogonalizes a set of variables, creating a new set of orthogonal variables (all of type
double), using a modified Gram–Schmidt procedure (Golub and Van Loan 1996). The order of the
variables determines the orthogonalization; hence, the “most important” variables should be listed
first.
Execution time is proportional to the square of the number of variables. With many (>10) variables,
orthog will be fairly slow.
orthpoly computes orthogonal polynomials for one variable.
Options for orthog
 
Main
generate(newvarlist) is required. generate() creates new orthogonal variables of type double.
For orthog, newvarlist will contain the orthogonalized varlist. If varlist contains d variables, then
so will newvarlist. newvarlist can be specified by giving a list of exactly d new variable names,
or it can be abbreviated using the styles newvar1-newvard or newvar*. For these two styles of
abbreviation, new variables newvar1, newvar2, . . . , newvard are generated.
matrix(matname) creates a (d+1) × (d+1) matrix containing the matrix R defined by X = QR,
where X is the N × (d+1) matrix representation of varlist plus a column of ones and Q is the
N × (d+1) matrix representation of newvarlist plus a column of ones (d = number of variables
in varlist, and N = number of observations).
Options for orthpoly
 
Main
generate(newvarlist) or poly(), or both, must be specified. generate() creates new orthogonal
variables of type double. newvarlist will contain orthogonal polynomials of degree 1, 2, . . . ,
d evaluated at varname, where d is as specified by degree(d). newvarlist can be specified by
giving a list of exactly d new variable names, or it can be abbreviated using the styles newvar1-
newvard or newvar*. For these two styles of abbreviation, new variables newvar1, newvar2, . . . ,
newvard are generated.
poly(matname) creates a (d+1) × (d+1) matrix called matname containing the coefficients of
the orthogonal polynomials. The orthogonal polynomial of degree i ≤ d is

    matname[i, d+1] + matname[i, 1]*varname + matname[i, 2]*varname^2 + ··· + matname[i, i]*varname^i

The coefficients corresponding to the constant term are placed in the last column of the matrix.
The last row of the matrix is all zeros, except for the last column, which corresponds to the
constant term.
degree(#) specifies the highest-degree polynomial to include. Orthogonal polynomials of degree
1, 2, . . . , d = # are computed. The default is d = 1.
Remarks and examples
Orthogonal variables are useful for two reasons. The first is numerical accuracy for highly collinear
variables. Stata’s regress and other estimation commands can face much collinearity and still produce
accurate results. But, at some point, these commands will drop variables because of collinearity. If
you know with certainty that the variables are not perfectly collinear, you may want to retain all their
effects in the model. If you use orthog or orthpoly to produce a set of orthogonal variables, all
variables will be present in the estimation results.
Users are more likely to find orthogonal variables useful for the second reason: ease of interpreting
results. orthog and orthpoly create a set of variables such that the “effects” of all the preceding
variables have been removed from each variable. For example, if we issue the command
. orthog x1 x2 x3, generate(q1 q2 q3)
the effect of the constant is removed from x1 to produce q1; the constant and x1 are removed from
x2 to produce q2; and finally the constant, x1, and x2 are removed from x3 to produce q3. Hence,
$$
\begin{aligned}
q_1 &= r_{01} + r_{11}\,x_1 \\
q_2 &= r_{02} + r_{12}\,x_1 + r_{22}\,x_2 \\
q_3 &= r_{03} + r_{13}\,x_1 + r_{23}\,x_2 + r_{33}\,x_3
\end{aligned}
$$
This effect can be generalized and written in matrix notation as

$$
X = QR
$$
where X is the N × (d+1) matrix representation of varlist plus a column of ones, and Q is the
N × (d+1) matrix representation of newvarlist plus a column of ones (d = number of variables
in varlist and N = number of observations). The (d+1) × (d+1) matrix R is a permuted upper-
triangular matrix, that is, R would be upper triangular if the constant were first, but the constant is
last, so the first row/column has been permuted with the last row/column. Because Stata’s estimation
commands list the constant term last, this allows R, obtained via the matrix() option, to be used
to transform estimation results.
Example 1: orthog
Consider Stata’s auto.dta dataset. Suppose that we postulate a model in which price depends
on the car’s length, weight, headroom, and trunk size (trunk). These predictors are collinear, but
not extremely so; the correlations are not that close to 1:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. correlate length weight headroom trunk
(obs=74)
length weight headroom trunk
length 1.0000
weight 0.9460 1.0000
headroom 0.5163 0.4835 1.0000
trunk 0.7266 0.6722 0.6620 1.0000
regress certainly has no trouble fitting this model:
. regress price length weight headroom trunk
Source SS df MS Number of obs = 74
F( 4, 69) = 10.20
Model 236016580 4 59004145 Prob > F = 0.0000
Residual 399048816 69 5783316.17 R-squared = 0.3716
Adj R-squared = 0.3352
Total 635065396 73 8699525.97 Root MSE = 2404.9
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
length -101.7092 42.12534 -2.41 0.018 -185.747 -17.67147
weight 4.753066 1.120054 4.24 0.000 2.518619 6.987512
headroom -711.5679 445.0204 -1.60 0.114 -1599.359 176.2236
trunk 114.0859 109.9488 1.04 0.303 -105.2559 333.4277
_cons 11488.47 4543.902 2.53 0.014 2423.638 20553.31
However, we may believe a priori that length is the most important predictor, followed by weight,
headroom, and trunk. We would like to remove the “effect” of length from all the other predictors,
remove weight from headroom and trunk, and remove headroom from trunk. We can do this by
running orthog, and then we fit the model again using the orthogonal variables:
. orthog length weight headroom trunk, gen(olength oweight oheadroom otrunk)
> matrix(R)
. regress price olength oweight oheadroom otrunk
Source SS df MS Number of obs = 74
F( 4, 69) = 10.20
Model 236016580 4 59004145 Prob > F = 0.0000
Residual 399048816 69 5783316.17 R-squared = 0.3716
Adj R-squared = 0.3352
Total 635065396 73 8699525.97 Root MSE = 2404.9
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
olength 1265.049 279.5584 4.53 0.000 707.3454 1822.753
oweight 1175.765 279.5584 4.21 0.000 618.0617 1733.469
oheadroom -349.9916 279.5584 -1.25 0.215 -907.6955 207.7122
otrunk 290.0776 279.5584 1.04 0.303 -267.6262 847.7815
_cons 6165.257 279.5584 22.05 0.000 5607.553 6722.961
Using the matrix R, we can transform the results obtained using the orthogonal predictors back to
the metric of original predictors:
. matrix b = e(b)*inv(R)’
. matrix list b
b[1,5]
length weight headroom trunk _cons
y1 -101.70924 4.7530659 -711.56789 114.08591 11488.475
Technical note
The matrix R obtained using the matrix() option with orthog can also be used to recover X
(the original varlist) from Q (the orthogonalized newvarlist), one variable at a time. Continuing with
the previous example, we illustrate how to recover the trunk variable:
. matrix C = R[1...,"trunk"]’
. matrix score double rtrunk = C
. compare rtrunk trunk
difference
count minimum average maximum
rtrunk>trunk 74 1.42e-14 2.27e-14 3.55e-14
jointly defined 74 1.42e-14 2.27e-14 3.55e-14
total 74
Here the recovered variable rtrunk is almost exactly the same as the original trunk variable.
When you are orthogonalizing many variables, this procedure can be performed to check the numerical
soundness of the orthogonalization. Because of the ordering of the orthogonalization procedure, the
last variable and the variables near the end of the varlist are the most important ones to check.
The orthpoly command effectively does for polynomial terms what the orthog command does
for an arbitrary set of variables.
Example 2: orthpoly
Again consider the auto.dta dataset. Suppose that we wish to fit the model

$$
\text{mpg} = \beta_0 + \beta_1\,\text{weight} + \beta_2\,\text{weight}^2 + \beta_3\,\text{weight}^3 + \beta_4\,\text{weight}^4 + \epsilon
$$
We will first compute the regression with natural polynomials:
. gen double w1 = weight
. gen double w2 = w1*w1
. gen double w3 = w2*w1
. gen double w4 = w3*w1
. correlate w1-w4
(obs=74)
w1 w2 w3 w4
w1 1.0000
w2 0.9915 1.0000
w3 0.9665 0.9916 1.0000
w4 0.9279 0.9679 0.9922 1.0000
. regress mpg w1-w4
Source SS df MS Number of obs = 74
F( 4, 69) = 36.06
Model 1652.73666 4 413.184164 Prob > F = 0.0000
Residual 790.722803 69 11.4597508 R-squared = 0.6764
Adj R-squared = 0.6576
Total 2443.45946 73 33.4720474 Root MSE = 3.3852
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
w1 .0289302 .1161939 0.25 0.804 -.2028704 .2607307
w2 -.0000229 .0000566 -0.40 0.687 -.0001359 .0000901
w3 5.74e-09 1.19e-08 0.48 0.631 -1.80e-08 2.95e-08
w4 -4.86e-13 9.14e-13 -0.53 0.596 -2.31e-12 1.34e-12
_cons 23.94421 86.60667 0.28 0.783 -148.8314 196.7198
Some of the correlations among the powers of weight are very large, but this does not create any
problems for regress. However, we may wish to look at the quadratic trend with the constant
removed, the cubic trend with the quadratic and constant removed, etc. orthpoly will generate
polynomial terms with this property:
. orthpoly weight, generate(pw*) deg(4) poly(P)
. regress mpg pw1-pw4
Source SS df MS Number of obs = 74
F( 4, 69) = 36.06
Model 1652.73666 4 413.184164 Prob > F = 0.0000
Residual 790.722803 69 11.4597508 R-squared = 0.6764
Adj R-squared = 0.6576
Total 2443.45946 73 33.4720474 Root MSE = 3.3852
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
pw1 -4.638252 .3935245 -11.79 0.000 -5.423312 -3.853192
pw2 .8263545 .3935245 2.10 0.039 .0412947 1.611414
pw3 -.3068616 .3935245 -0.78 0.438 -1.091921 .4781982
pw4 -.209457 .3935245 -0.53 0.596 -.9945168 .5756028
_cons 21.2973 .3935245 54.12 0.000 20.51224 22.08236
Compare the p-values of the terms in the natural polynomial regression with those in the orthogonal
polynomial regression. With orthogonal polynomials, it is easy to see that the pure cubic and quartic
trends are not significant and that the constant, linear, and quadratic terms each have p < 0.05.
The matrix Pobtained with the poly() option can be used to transform coefficients for orthogonal
polynomials to coefficients for natural polynomials:
. orthpoly weight, poly(P) deg(4)
. matrix b = e(b)*P
. matrix list b
b[1,5]
deg1 deg2 deg3 deg4 _cons
y1 .02893016 -.00002291 5.745e-09 -4.862e-13 23.944212
Methods and formulas
orthog's orthogonalization can be written in matrix notation as

X = QR

where X is the N × (d + 1) matrix representation of varlist plus a column of ones and Q is the
N × (d + 1) matrix representation of newvarlist plus a column of ones (d = number of variables in
varlist, and N = number of observations). The (d + 1) × (d + 1) matrix R is a permuted upper-triangular
matrix; that is, R would be upper triangular if the constant were first, but the constant is
last, so the first row/column has been permuted with the last row/column.

Q and R are obtained using a modified Gram–Schmidt procedure; see Golub and Van Loan (1996,
218–219) for details. The traditional Gram–Schmidt procedure is notoriously unsound, but the modified
procedure is good. orthog performs two passes of this procedure.
orthpoly uses the Christoffel–Darboux recurrence formula (Abramowitz and Stegun 1972).
Both orthog and orthpoly normalize the orthogonal variables such that

Q'WQ = MI

where W = diag(w1, w2, ..., wN) with weights w1, w2, ..., wN (all 1 if weights are not specified),
and M is the sum of the weights (the number of observations if weights are not specified).
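This normalization is easy to verify: after orthogonalizing, the cross-product matrix of the new variables plus the constant should be diagonal, with every diagonal entry equal to the number of observations. A minimal sketch with the automobile data (the names o1–o4 are ours, not produced by orthog):

. use http://www.stata-press.com/data/r13/auto, clear
. orthog length weight headroom trunk, gen(o1 o2 o3 o4)
. matrix accum QQ = o1 o2 o3 o4
. matrix list QQ

The off-diagonal entries of QQ should be zero up to rounding error, and each diagonal entry should equal 74, the number of observations.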
References
Abramowitz, M., and I. A. Stegun, ed. 1972. Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables. 10th ed. Washington, DC: National Bureau of Standards.
Golub, G. H., and C. F. Van Loan. 1996. Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Sribney, W. M. 1995. sg37: Orthogonal polynomials. Stata Technical Bulletin 25: 17–18. Reprinted in Stata Technical
Bulletin Reprints, vol. 5, pp. 96–98. College Station, TX: Stata Press.
Also see
[R] regress — Linear regression
Title
pcorr — Partial and semipartial correlation coefficients
Syntax Menu Description Remarks and examples
Stored results Methods and formulas Acknowledgment References
Also see
Syntax
pcorr varname1 varlist [if] [in] [weight]
varname1 and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Partial correlations
Description
pcorr displays the partial and semipartial correlation coefficients of varname1 with each variable
in varlist after removing the effects of all other variables in varlist. The squared correlations and
corresponding significance are also reported.
Remarks and examples
Assume that y is determined by x1, x2, ..., xk. The partial correlation between y and x1 is an
attempt to estimate the correlation that would be observed between y and x1 if the other xs did
not vary. The semipartial correlation, also called part correlation, between y and x1 is an attempt to
estimate the correlation that would be observed between y and x1 after the effects of all other xs
are removed from x1 but not from y.

Both squared correlations estimate the proportion of the variance of y that is explained by each
predictor. The squared semipartial correlation between y and x1 represents the proportion of variance
in y that is explained by x1 only. This squared correlation can also be interpreted as the decrease
in the model's R² value that results from removing x1 from the full model. Thus one could use
the squared semipartial correlations as criteria for model selection. The squared partial correlation
between y and x1 represents the proportion of variance in y not associated with any other xs that is
explained by x1. Thus the squared partial correlation gives an estimate of how much of the variance
of y not explained by the other xs is explained by x1.
Example 1
Using our automobile dataset (described in [U] 1.2.2 Example datasets), we can obtain the simple
correlations between price, mpg, weight, and foreign from correlate (see [R] correlate):
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. correlate price mpg weight foreign
(obs=74)
price mpg weight foreign
price 1.0000
mpg -0.4686 1.0000
weight 0.5386 -0.8072 1.0000
foreign 0.0487 0.3934 -0.5928 1.0000
Although correlate gave us the full correlation matrix, our interest is in just the first column. We
find, for instance, that the higher the mpg, the lower the price. We obtain the partial and semipartial
correlation coefficients by using pcorr:
. pcorr price mpg weight foreign
(obs=74)
Partial and semipartial correlations of price with
Partial Semipartial Partial Semipartial Significance
Variable Corr. Corr. Corr.^2 Corr.^2 Value
mpg 0.0352 0.0249 0.0012 0.0006 0.7693
weight 0.5488 0.4644 0.3012 0.2157 0.0000
foreign 0.5402 0.4541 0.2918 0.2062 0.0000
We now find that the partial and semipartial correlations of price with mpg are near 0. In the
simple correlations, we found that price and foreign were virtually uncorrelated. In the partial and
semipartial correlations, we find that price and foreign are positively correlated. The nonsignificance
of mpg tells us that the amount by which R² decreases when mpg is removed from the model is not
significant. We find that removing either weight or foreign results in a significant drop in the
model's R².
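The interpretation of the squared semipartial correlation as a drop in R² can be verified by hand. A minimal sketch: fit the full model, refit it without mpg, and difference the two R² values; the result should match the squared semipartial correlation for mpg reported above (about 0.0006).

. quietly regress price mpg weight foreign
. scalar r2_full = e(r2)
. quietly regress price weight foreign
. display r2_full - e(r2)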
Technical note
Use caution when interpreting the above results. As we said at the outset, the partial and semipartial
correlation coefficients are an attempt to estimate the correlation that would be observed if the effects
of all other variables were taken out of both y and x or only x. pcorr makes it too easy to ignore
the fact that we are fitting a model. In the example above, the model is

price = β0 + β1 mpg + β2 weight + β3 foreign + ε

which is, in all honesty, a rather silly model. Even if we accept the implied economic assumptions of
the model (that consumers value mpg, weight, and foreign), do we really believe that consumers
place equal value on every extra 1,000 pounds of weight? That is, have we correctly parameterized
the model? If we have not, then the estimated partial and semipartial correlation coefficients may not
represent what they claim to represent. Partial and semipartial correlation coefficients are a reasonable
way to summarize data if we are convinced that the underlying model is reasonable. We should
not, however, pretend that there is no underlying model and that these correlation coefficients are
unaffected by the assumptions and parameterization.
Stored results
pcorr stores the following in r():
Scalars
r(N) number of observations
r(df) degrees of freedom
Matrices
r(p_corr)     partial correlation coefficient vector
r(sp_corr)    semipartial correlation coefficient vector
Methods and formulas
Results are obtained by fitting a linear regression of varname1 on varlist; see [R] regress. The
partial correlation coefficient between varname1 and each variable in varlist is then calculated as

t / sqrt(t² + n - k)

(Greene 2012, 37), where t is the t statistic, n is the number of observations, and k is the number
of independent variables, including the constant but excluding any dropped variables.

The semipartial correlation coefficient between varname1 and each variable in varlist is calculated
as

sign(t) sqrt{t² (1 - R²) / (n - k)}

(Cohen et al. 2003, 89), where R² is the model R² value, and t, n, and k are as described above.

The significance is given by 2 Pr(t_(n-k) > |t|), where t_(n-k) follows a Student's t distribution with
n - k degrees of freedom.
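As a check on these formulas, the partial and semipartial correlations of price with mpg from example 1 can be reproduced from a single regression. A sketch (the scalar names tstat and kvars are ours; kvars counts the constant, as described above):

. quietly regress price mpg weight foreign
. scalar tstat = _b[mpg]/_se[mpg]
. scalar kvars = e(df_m) + 1
. display "partial:     " tstat/sqrt(tstat^2 + e(N) - kvars)
. display "semipartial: " sign(tstat)*sqrt(tstat^2*(1 - e(r2))/(e(N) - kvars))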
Acknowledgment
The addition of semipartial correlation coefficients to pcorr is based on the pcorr2 command
by Richard Williams of the Department of Sociology at the University of Notre Dame.
References
Cohen, J., P. Cohen, S. G. West, and L. S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the
Behavioral Sciences. 3rd ed. Hillsdale, NJ: Erlbaum.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Also see
[R] correlate — Correlations (covariances) of variables or coefficients
[R] spearman — Spearman's and Kendall's correlations
Title
permute — Monte Carlo permutation tests
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
Compute permutation test
permute permvar exp list [, options] : command
Report saved results
permute [varlist] using filename [, display options]
options Description
Main
reps(#)                 perform # random permutations; default is reps(100)
left | right            compute one-sided p-values; default is two-sided
Options
strata(varlist)         permute within strata
saving(filename, ...)   save results to filename; save statistics in double precision;
                        save results to filename every # replications
Reporting
level(#)                set confidence level; default is level(95)
noheader                suppress table header
nolegend                suppress table legend
verbose                 display full table legend
nodrop                  do not drop observations
nodots                  suppress replication dots
noisily                 display any output from command
trace                   trace command
title(text)             use text as title for permutation results
Advanced
eps(#)                  numerical tolerance; seldom used
nowarn                  do not warn when e(sample) is not set
force                   do not check for weights or svy commands; seldom used
reject(exp)             identify invalid results
seed(#)                 set random-number seed to #
weights are not allowed in command.
display options         Description
left | right            compute one-sided p-values; default is two-sided
level(#)                set confidence level; default is level(95)
noheader                suppress table header
nolegend                suppress table legend
verbose                 display full table legend
title(text)             use text as title for results
eps(#)                  numerical tolerance; seldom used
exp list contains       (name: elist)
                        elist
                        eexp
elist contains          newvar = (exp)
                        (exp)
eexp is                 specname
                        [eqno]specname
specname is             _b
                        _b[]
                        _se
                        _se[]
eqno is                 ##
                        name
exp is a standard Stata expression; see [U] 13 Functions and expressions.
Distinguish between square brackets that are to be typed and square brackets that indicate optional arguments.
Menu
Statistics >Resampling >Permutation tests
Description
permute estimates p-values for permutation tests on the basis of Monte Carlo simulations. Typing
. permute permvar exp list, reps(#): command
randomly permutes the values in permvar # times, each time executing command and collecting the
associated values from the expression in exp list.
These p-value estimates can be one-sided: Pr(T* ≤ T) or Pr(T* ≥ T). The default is two-sided:
Pr(|T*| ≥ |T|). Here T* denotes the value of the statistic from a randomly permuted dataset, and
T denotes the statistic as computed on the original data.
permvar identifies the variable whose observed values will be randomly permuted.
command defines the statistical command to be executed. Most Stata commands and user-written
programs can be used with permute, as long as they follow standard Stata syntax; see [U] 11 Language
syntax. The by prefix may not be part of command.
exp list specifies the statistics to be collected from the execution of command.
permute may be used for replaying results, but this feature is appropriate only when a dataset
generated by permute is currently in memory or is identified by the using option. The variables
specified in varlist in this context must be present in the respective dataset.
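For instance, a minimal call (a sketch using the automobile data; the expression name b is our choice) permutes foreign, refits a regression after each permutation, and collects the coefficient on foreign:

. use http://www.stata-press.com/data/r13/auto
. set seed 1234
. permute foreign b=_b[foreign], reps(500) nodots: regress mpg weight foreign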
Options
 
Main
reps(#) specifies the number of random permutations to perform. The default is 100.
left or right requests that one-sided p-values be computed. If left is specified, an estimate of
Pr(T* ≤ T) is produced, where T* is the test statistic and T is its observed value. If right is
specified, an estimate of Pr(T* ≥ T) is produced. By default, two-sided p-values are computed;
that is, Pr(|T*| ≥ |T|) is estimated.
 
Options
strata(varlist) specifies that the permutations be performed within each stratum defined by the
values of varlist.
saving(filename[, suboptions]) creates a Stata data file (.dta file) consisting of (for each statistic
in exp list) a variable containing the replicates.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals.
every(#) specifies that results are to be written to disk every #th replication. every() should
be specified only in conjunction with saving() when command takes a long time for each
replication. This will allow recovery of partial results should some other software crash your
computer. See [P] postfile.
replace specifies that filename be overwritten if it exists. This option does not appear in the
dialog box.
 
Reporting
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [R] level.
noheader suppresses display of the table header. This option implies the nolegend option.
nolegend suppresses display of the table legend. The table legend identifies the rows of the table
with the expressions they represent.
verbose requests that the full table legend be displayed. By default, coefficients and standard errors
are not displayed.
nodrop prevents permute from dropping observations outside the if and in qualifiers. nodrop
will also cause permute to ignore the contents of e(sample) if it exists as a result of running
command. By default, permute temporarily drops out-of-sample observations.
nodots suppresses display of the replication dots. By default, one dot character is displayed for each
successful replication. A red ‘x’ is displayed if command returns an error or if one of the values
in exp list is missing.
noisily requests that any output from command be displayed. This option implies the nodots
option.
trace causes a trace of the execution of command to be displayed. This option implies the noisily
option.
title(text) specifies a title to be displayed above the table of permutation results; the default title
is Monte Carlo permutation results.
 
Advanced
eps(#) specifies the numerical tolerance for testing |T*| ≥ |T|, T* ≤ T, or T* ≥ T. These are
considered true if, respectively, |T*| ≥ |T| - #, T* ≤ T + #, or T* ≥ T - #. The default is 1e-7.
You will not have to specify eps() under normal circumstances.
nowarn suppresses the printing of a warning message when command does not set e(sample).
force suppresses the restriction that command may not specify weights or be a svy command.
permute is not suited for weighted estimation; thus permute should not be used with weights
or svy. permute reports an error when it encounters weights or svy in command if the force
option is not specified. This is a seldom used option, so use it only if you know what you are
doing!
reject(exp) identifies an expression that indicates when results should be rejected. When exp is
true, the resulting values are reset to missing values.
seed(#) sets the random-number seed. Specifying this option is equivalent to typing the following
command prior to calling permute:
. set seed #
Remarks and examples
Permutation tests determine the significance of the observed value of a test statistic in light of
rearranging the order (permuting) of the observed values of a variable.
Example 1: A simple two-sample test
Suppose that we conducted an experiment to determine the effect of a treatment on the development
of cells. Further suppose that we are restricted to six experimental units because of the extreme cost
of the experiment. Thus three units are to be given a placebo, and three units are given the treatment.
The measurement is the number of newly developed healthy cells. The following listing gives the
hypothetical data, along with some summary statistics.
. input y treatment
y treatment
1. 7 0
2. 9 0
3. 11 0
4. 10 1
5. 12 1
6. 14 1
7. end
. sort treatment
. summarize y
Variable Obs Mean Std. Dev. Min Max
y 6 10.5 2.428992 7 14
. by treatment: summarize y
-> treatment = 0
Variable Obs Mean Std. Dev. Min Max
y 3 9 2 7 11
-> treatment = 1
Variable Obs Mean Std. Dev. Min Max
y 3 12 2 10 14
Clearly, there are more cells in the treatment group than in the placebo group, but a statistical
test is needed to conclude that the treatment does affect the development of cells. If the sum of the
treatment measures is our test statistic, we can use permute to determine the probability of observing
36 or more cells, given the observed data and assuming that there is no effect due to the treatment.
. set seed 1234
. permute y sum=r(sum), saving(permdish) right nodrop nowarn: sum y if treatment
(running summarize on estimation sample)
Permutation replications (100)
12345
.................................................. 50
.................................................. 100
Monte Carlo permutation results Number of obs = 6
command: summarize y if treatment
sum: r(sum)
permute var: y
T T(obs) c n p=c/n SE(p) [95% Conf. Interval]
sum 36 10 100 0.1000 0.0300 .0490047 .1762226
Note: confidence interval is with respect to p=c/n.
Note: c = #{T >= T(obs)}
We see that 10 of the 100 randomly permuted datasets yielded sums from the treatment group
larger than or equal to the observed sum of 36. Thus the evidence is not strong enough, at the 5%
level, to reject the null hypothesis that there is no effect of the treatment.
Because of the small size of this experiment, we could have calculated the exact permutation
p-value from all possible permutations. There are six units, but we want the sum of the treatment
units. Thus there are (6 choose 3) = 20 permutation sums from the possible unique permutations.
7 + 9 + 10 = 26
7 + 9 + 11 = 27
7 + 9 + 12 = 28
7 + 9 + 14 = 30
7 + 10 + 11 = 28
7 + 10 + 12 = 29
7 + 10 + 14 = 31
7 + 11 + 12 = 30
7 + 11 + 14 = 32
7 + 12 + 14 = 33
9 + 10 + 11 = 30
9 + 10 + 12 = 31
9 + 10 + 14 = 33
9 + 11 + 12 = 32
9 + 11 + 14 = 34
9 + 12 + 14 = 35
10 + 11 + 12 = 33
10 + 11 + 14 = 35
10 + 12 + 14 = 36
11 + 12 + 14 = 37
Two of the 20 permutation sums are greater than or equal to 36. Thus the exact p-value for this
permutation test is 0.1. Tied values will decrease the number of unique permutations.
When the saving() option is supplied, permute saves the values of the permutation statistic to
the indicated file, in our case, permdish.dta. This file can be used to replay the result of permute.
The level() option controls the confidence level of the confidence interval for the permutation
p-value. This confidence interval is calculated using cii with the reported n (number of nonmissing
replications) and c (the counter for events of significance).
. permute using permdish, level(80)
Monte Carlo permutation results Number of obs = 6
command: summarize y if treatment
sum: r(sum)
permute var: y
T T(obs) c n p=c/n SE(p) [80% Conf. Interval]
sum 36 10 100 0.1000 0.0300 .0631113 .1498826
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
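Because the interval is a binomial confidence interval for c successes out of n replications, it can also be reproduced directly with cii (a sketch):

. cii 100 10, level(80)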
Example 2: Permutation tests with ANOVA
Consider some fictional data from a randomized complete-block design in which we wish to
determine the significance of five treatments.
. use http://www.stata-press.com/data/r13/permute1, clear
. list y treatment in 1/10, abbrev(10)
y treatment
1. 4.407557 1
2. 5.693386 1
3. 7.099699 1
4. 3.12132 1
5. 5.242648 1
6. 4.280349 2
7. 4.508785 2
8. 4.079967 2
9. 5.904368 2
10. 3.010556 2
These data may be analyzed using anova.
. anova y treatment subject
Number of obs = 50 R-squared = 0.3544
Root MSE = .914159 Adj R-squared = 0.1213
Source Partial SS df MS F Prob > F
Model 16.5182188 13 1.27063221 1.52 0.1574
treatment 13.0226706 9 1.44696341 1.73 0.1174
subject 3.49554813 4 .873887032 1.05 0.3973
Residual 30.0847503 36 .835687509
Total 46.6029691 49 .951081002
Suppose that we want to compute the significance of the F statistic for treatment by using
permute. All we need to do is write a short program that will save the result of this statistic for
permute to use. For example,
program panova, rclass
        version 13
        args response fac_intrst fac_other
        anova `response' `fac_intrst' `fac_other'
        return scalar Fmodel = e(F)
        test `fac_intrst'
        return scalar F = r(F)
end
Now in panova, test saves the F statistic for the factor of interest in r(F). This is different
from e(F), the overall model F statistic for the model fit by anova, which panova saves
in r(Fmodel). In the following example, we use the strata() option so that the treatments are
randomly rearranged within each subject. It should not be too surprising that the estimated p-values
are equal for this example, because the two F statistics are equivalent when controlling for differences
between subjects. However, we would not expect to get exactly the same p-values every time we
reran permute.
. set seed 1234
. permute treatment treatmentF=r(F) modelF=e(F), reps(1000) strata(subject)
> saving(permanova) nodots: panova y treatment subject
Monte Carlo permutation results
Number of strata = 5 Number of obs = 50
command: panova y treatment subject
treatmentF: r(F)
modelF: e(F)
permute var: treatment
T T(obs) c n p=c/n SE(p) [95% Conf. Interval]
treatmentF 1.731465 118 1000 0.1180 0.0102 .0986525 .1396277
modelF 1.520463 118 1000 0.1180 0.0102 .0986525 .1396277
Note: confidence intervals are with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
Example 3: Wilcoxon rank-sum test
As a final example, let's consider estimating the p-value of the Z statistic returned by ranksum.
Suppose that we collected data from some experiment: yis some measure we took on 17 individuals,
and group identifies the group that an individual belongs to.
. use http://www.stata-press.com/data/r13/permute2
. list
group y
1. 1 6
2. 1 11
3. 1 20
4. 1 2
5. 1 9
6. 1 5
7. 0 2
8. 0 1
9. 0 6
10. 0 0
11. 0 2
12. 0 3
13. 0 3
14. 0 12
15. 0 4
16. 0 1
17. 0 5
Next we analyze the data using ranksum and notice that the observed value of the test statistic
(stored as r(z)) is -2.02 with an approximate p-value of 0.0434.
. ranksum y, by(group)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
group obs rank sum expected
0 11 79 99
1 6 74 54
combined 17 153 153
unadjusted variance 99.00
adjustment for ties -0.97
adjusted variance 98.03
Ho: y(group==0) = y(group==1)
z = -2.020
Prob > |z| = 0.0434
The observed value of the rank-sum statistic is 79, with an expected value (under the null
hypothesis of no group effect) of 99. There are 17 observations, so the permutation distribution
contains (17 choose 6) = 12,376 possible values of the rank-sum statistic if we ignore ties. With ties, we have
fewer possible values but still too many to want to count them. Thus we use permute with 10,000
replications and see that the Monte Carlo permutation test agrees with the result of the test based on
the normal approximation.
. set seed 18385766
. permute y z=r(z), reps(10000) nowarn nodots: ranksum y, by(group)
Monte Carlo permutation results Number of obs = 17
command: ranksum y, by(group)
z: r(z)
permute var: y
T T(obs) c n p=c/n SE(p) [95% Conf. Interval]
z -2.020002 468 10000 0.0468 0.0021 .0427429 .0511236
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
For an application of a permutation test to a problem in epidemiology, see Hayes and Moulton (2009,
190–193).
Technical note
permute reports confidence intervals for p to emphasize that it is based on the binomial estimator
for proportions. When the variability implied by the confidence interval makes conclusions difficult,
you may increase the number of replications to determine more precisely the significance of the test
statistic of interest. In other words, the value of p from permute will converge to the true permutation
p-value as the number of replications gets arbitrarily large.
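For example, rerunning the test from example 1 with many more replications narrows the interval around p (a sketch; runtime grows roughly linearly with reps()):

. set seed 1234
. permute y sum=r(sum), reps(10000) right nodrop nowarn nodots: summarize y if treatment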
Stored results
permute stores the following in r():
Scalars
  r(N)             sample size
  r(N_reps)        number of requested replications
  r(level)         confidence level
  r(k_exp)         number of standard expressions
  r(k_eexp)        number of _b/_se expressions
Macros
  r(cmd)           permute
  r(command)       command following colon
  r(permvar)       permutation variable
  r(title)         title in output
  r(exp#)          #th expression
  r(left)          left or empty
  r(right)         right or empty
  r(seed)          initial random-number seed
  r(event)         T <= T(obs), T >= T(obs), or |T| >= |T(obs)|
Matrices
  r(b)             observed statistics
  r(c)             count when r(event) is true
  r(reps)          number of nonmissing results
  r(p)             observed proportions
  r(se)            standard errors of observed proportions
  r(ci)            confidence intervals of observed proportions
References
Ångquist, L. 2010. Stata tip 92: Manual implementation of permutations and bootstraps. Stata Journal 10: 686–688.
Good, P. I. 2006. Resampling Methods: A Practical Guide to Data Analysis. 3rd ed. Boston: Birkhäuser.
Hayes, R. J., and L. H. Moulton. 2009. Cluster Randomised Trials. Boca Raton, FL: Chapman & Hall/CRC.
Kaiser, J. 2007. An exact and a Monte Carlo proposal to the Fisher–Pitman permutation tests for paired replicates
and for independent samples. Stata Journal 7: 402–412.
Kaiser, J., and M. G. Lacy. 2009. A general-purpose method for two-group randomization tests. Stata Journal 9:
70–85.
Also see
[R] bootstrap — Bootstrap sampling and estimation
[R] jackknife — Jackknife estimation
[R] simulate — Monte Carlo simulations
Title
pk — Pharmacokinetic (biopharmaceutical) data
Description Remarks and examples References
Description
The term pk refers to pharmacokinetic data and the Stata commands, all of which begin with the
letters pk, designed to do some of the analyses commonly performed in the pharmaceutical industry.
The system is intended for the analysis of pharmacokinetic data, although some of the commands are
for general use.
The pk commands are
pkexamine     [R] pkexamine     Calculate pharmacokinetic measures
pksumm        [R] pksumm        Summarize pharmacokinetic data
pkshape       [R] pkshape       Reshape (pharmacokinetic) Latin-square data
pkcross       [R] pkcross       Analyze crossover experiments
pkequiv       [R] pkequiv       Perform bioequivalence tests
pkcollapse    [R] pkcollapse    Generate pharmacokinetic measurement dataset
Remarks and examples
Several types of clinical trials are commonly performed in the pharmaceutical industry. Examples
include combination trials, multicenter trials, equivalence trials, and active control trials. For each
type of trial, there is an optimal study design for estimating the effects of interest. Currently, the
pk system can be used to analyze equivalence trials, which are usually conducted using a crossover
design; however, it is possible to use a parallel design and still draw conclusions about equivalence.
Equivalence trials assess bioequivalence between two drugs. Although proving that two drugs
behave the same is impossible, the United States Food and Drug Administration believes that if the
absorption properties of two drugs are similar, the two drugs will produce similar effects and have
similar safety profiles. Generally, the goal of an equivalence trial is to assess the equivalence of a generic
drug to an existing drug. This goal is commonly accomplished by comparing a confidence interval
about the difference between a pharmacokinetic measurement of two drugs with a confidence limit
constructed from U.S. federal regulations. If the confidence interval is entirely within the confidence
limit, the drugs are declared bioequivalent. Another approach to assessing bioequivalence is to use
the method of interval hypotheses testing. pkequiv is used to conduct these tests of bioequivalence.
Several pharmacokinetic measures can be used to ascertain how available a drug is for cellular
absorption. The most common measure is the area under the time-versus-concentration curve (AUC).
Another common measure of drug availability is the maximum concentration (Cmax) achieved by
the drug during the follow-up period. Stata reports these and other less common measures of drug
availability, including the time at which the maximum drug concentration was observed and the
duration of the period during which the subject was being measured. Stata also reports the elimination
rate, that is, the rate at which the drug is metabolized, and the drug’s half-life, that is, the time it
takes for the drug concentration to fall to one-half of its maximum concentration.
pkexamine computes and reports all the pharmacokinetic measures that Stata produces, including
four calculations of the area under the time-versus-concentration curve. The standard area under the
curve from 0 to the maximum observed time (AUC0,tmax ) is computed using cubic splines or the
trapezoidal rule. Additionally, pkexamine also computes the area under the curve from 0 to infinity
by extending the standard time-versus-concentration curve from the maximum observed time by using
three different methods. The first method simply extends the standard curve by using a least-squares
linear fit through the last few data points. The second method extends the standard curve by fitting
a decreasing exponential curve through the last few data points. Finally, the third method extends
the curve by fitting a least-squares linear regression line on the log concentration. The mathematical
details of these extensions are described in Methods and formulas of [R]pkexamine.
Data from an equivalence trial may also be analyzed using methods appropriate to the particular
study design. When you have a crossover design, pkcross can be used to fit an appropriate ANOVA
model. As an aside, a crossover design is simply a restricted Latin square; therefore, pkcross can
also be used to analyze any Latin-square design.
There are some practical concerns when dealing with data from equivalence trials. Primarily, the
data must be organized in a manner that Stata can use. The pk commands include pkcollapse and
pkshape, which are designed to help transform data from a common format to one that is suitable
for analysis with Stata.
In the following example, we illustrate several different data formats that are often encountered in
pharmaceutical research and describe how these formats can be transformed to formats that can be
analyzed with Stata.
Example 1
Assume that we have one subject and are interested in determining the drug profile for that subject.
A reasonable experiment would be to give the subject the drug and then measure the concentration
of the drug in the subject’s blood over a given period. For example, here is a part of a dataset from
Chow and Liu (2009, 13):
. use http://www.stata-press.com/data/r13/auc
. list, abbrev(14)
id time concentration
1. 1 0 0
2. 1 .5 0
3. 1 1 2.8
4. 1 1.5 4.4
5. 1 2 4.4
6. 1 3 4.7
7. 1 4 4.1
8. 1 6 4
9. 1 8 3.6
10. 1 12 3
11. 1 16 2.5
12. 1 24 2
13. 1 32 1.6
Examining these data, we notice that the concentration quickly increases, plateaus for a short
period, and then slowly decreases over time. pkexamine is used to calculate the pharmacokinetic
measures of interest. pkexamine is explained in detail in [R]pkexamine. The output is
. pkexamine time conc
Maximum concentration = 4.7
Time of maximum concentration = 3
Time of last observation (Tmax) = 32
Elimination rate = 0.0279
Half life = 24.8503
Area under the curve
AUC [0, inf.) AUC [0, inf.) AUC [0, inf.)
AUC [0, Tmax] Linear of log conc. Linear fit Exponential fit
85.24 142.603 107.759 142.603
Fit based on last 3 points.
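The reported half-life is consistent with the standard pharmacokinetic relationship t1/2 = ln(2)/ke, which you can confirm from the displayed (rounded) elimination rate:

. display ln(2)/0.0279

The small difference from the 24.8503 shown above is due only to the rounding of the displayed rate.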
Clinical trials, however, require that data be collected on more than one subject. There are several
ways to enter raw measured data collected on several subjects. It would be reasonable to enter for
each subject the drug concentration value at specific points in time. Such data could be
id   conc1   conc2   conc3   conc4   conc5   conc6   conc7
 1     0       1       4       7       5       3       1
 2     0       2       6       5       4       3       2
 3     0       1       2       3       5       4       1
where conc1 is the concentration at the first measured time, conc2 is the concentration at the second
measured time, etc. This format requires that each drug concentration measurement be made at the
same time on each subject. Another more flexible way to enter the data is to have an observation
with three variables for each time measurement on a subject. Each observation would have a subject
ID, the time at which the measurement was made, and the corresponding drug concentration at that
time. The data would be
. use http://www.stata-press.com/data/r13/pkdata
. list id concA time, sepby(id)
id concA time
1. 1 0 0
2. 1 3.073403 .5
3. 1 5.188444 1
4. 1 5.898577 1.5
5. 1 5.096378 2
6. 1 6.094085 3
7. 1 5.158772 4
8. 1 5.7065 6
9. 1 5.272467 8
10. 1 4.4576 12
11. 1 5.146423 16
12. 1 4.947427 24
13. 1 1.920421 32
14. 2 0 0
15. 2 2.48462 .5
16. 2 4.883569 1
17. 2 7.253442 1.5
18. 2 5.849345 2
19. 2 6.761085 3
20. 2 4.33839 4
21. 2 5.04199 6
22. 2 4.25128 8
23. 2 6.205004 12
24. 2 5.566165 16
25. 2 3.689007 24
26. 2 3.644063 32
27. 3 0 0
(output omitted )
207. 20 4.673281 24
208. 20 3.487347 32
Stata expects the data to be organized in the second form. If your data are organized as described in
the first dataset, you will need to use reshape to change the data to the second form; see [D] reshape.
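A sketch of that reshape for the first (wide) layout shown above, in which the measurements are stored in conc1–conc7 (the stub name conc and the index name measurement are ours), would be

. reshape long conc, i(id) j(measurement)

after which you would attach the actual measurement times to each row.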
Because the data in the second (or long) format contain information for one drug on several subjects,
pksumm can be used to produce summary statistics of the pharmacokinetic measurements. The output
is
. pksumm id time concA
................
Summary statistics for the pharmacokinetic measures
Number of observations = 16
Measure Mean Median Variance Skewness Kurtosis p-value
auc 151.63 152.18 127.58 -0.34 2.07 0.55
aucline 397.09 219.83 178276.59 2.69 9.61 0.00
aucexp 668.60 302.96 720356.98 2.67 9.54 0.00
auclog 665.95 298.03 752573.34 2.71 9.70 0.00
half 90.68 29.12 17750.70 2.36 7.92 0.00
ke 0.02 0.02 0.00 0.88 3.87 0.08
cmax 7.37 7.42 0.40 -0.64 2.75 0.36
tomc 3.38 3.00 7.25 2.27 7.70 0.00
tmax 32.00 32.00 0.00 . . .
Until now, we have been concerned with the profile of only one drug. We have characterized the
profile of that drug by individual subjects by using pkexamine and by a group of subjects by using
pksumm. The goal of an equivalence trial, however, is to compare two drugs, which we will do in
the rest of this example.
For equivalence trials, the study design most often used is the crossover design. For a complete
discussion of crossover designs, see Ratkowsky, Evans, and Alldredge (1993).
In brief, crossover designs require that each subject be given both treatments at two different times.
The order in which the treatments are applied changes between groups. For example, if we had 20
subjects numbered 1–20, the first 10 would receive treatment A during the first period of the study,
and then they would be given treatment B. The second 10 subjects would be given treatment B during
the first period of the study, and then they would be given treatment A. Each subject in the study
will have four variables that describe the observation: a subject identifier, a sequence identifier that
indicates the order of treatment, and two outcome variables, one for each treatment. The outcome
variables for each subject are the pharmacokinetic measures. The data must be transformed from a
series of measurements on individual subjects to data containing the pharmacokinetic measures for
each subject. In Stata parlance, this is referred to as a collapse, which can be done with pkcollapse;
see [R]pkcollapse.
Here is a part of our data:
. list, sepby(id)
id seq time concA concB
1. 1 1 0 0 0
2. 1 1 .5 3.073403 3.712592
3. 1 1 1 5.188444 6.230602
4. 1 1 1.5 5.898577 7.885944
5. 1 1 2 5.096378 9.241735
6. 1 1 3 6.094085 13.10507
7. 1 1 4 5.158772 .169429
8. 1 1 6 5.7065 8.759894
9. 1 1 8 5.272467 7.985409
10. 1 1 12 4.4576 7.740126
11. 1 1 16 5.146423 7.607208
12. 1 1 24 4.947427 7.588428
13. 1 1 32 1.920421 2.791115
14. 2 1 0 0 0
15. 2 1 .5 2.48462 .9209593
16. 2 1 1 4.883569 5.925818
17. 2 1 1.5 7.253442 8.710549
18. 2 1 2 5.849345 10.90552
19. 2 1 3 6.761085 8.429898
20. 2 1 4 4.33839 5.573152
21. 2 1 6 5.04199 6.32341
22. 2 1 8 4.25128 .5251224
23. 2 1 12 6.205004 7.415988
24. 2 1 16 5.566165 6.323938
25. 2 1 24 3.689007 1.133553
26. 2 1 32 3.644063 5.759489
27. 3 1 0 0 0
(output omitted )
207. 20 2 24 4.673281 6.059818
208. 20 2 32 3.487347 5.213639
This format is similar to the second format described above, except that now we have measurements
for two drugs at each time for each subject. We transform these data with pkcollapse:
. pkcollapse time concA concB, id(id) keep(seq) stat(auc)
................................
. list, sep(8) abbrev(10)
id seq auc_concA auc_concB
1. 1 1 150.9643 218.5551
2. 2 1 146.7606 133.3201
3. 3 1 160.6548 126.0635
4. 4 1 157.8622 96.17461
5. 5 1 133.6957 188.9038
6. 7 1 160.639 223.6922
7. 8 1 131.2604 104.0139
8. 9 1 168.5186 237.8962
9. 10 2 137.0627 139.7382
10. 12 2 153.4038 202.3942
11. 13 2 163.4593 136.7848
12. 14 2 146.0462 104.5191
13. 15 2 158.1457 165.8654
14. 18 2 147.1977 139.235
15. 19 2 164.9988 166.2391
16. 20 2 145.3823 158.5146
For this example, we chose to use the AUC for two drugs as our pharmacokinetic measure. We
could have used any of the measures computed by pkexamine. In addition to the AUCs, the dataset
also contains a sequence variable for each subject indicating when each treatment was administered.
The data produced by pkcollapse are in what Stata calls wide format; that is, there is one
observation per subject containing two or more outcomes. To use pkcross and pkequiv, we need to
transform these data to long format. This goal can be accomplished using pkshape; see [R]pkshape.
Consider the first subject in the dataset. This subject is in sequence one, which means that
treatment A was applied during the first period of the study and treatment B was applied in the second
period of the study. We need to split the first observation into two observations so that the outcome
measure is only in one variable. Also we need two new variables, one indicating the treatment the
subject received and another recording the period of the study when the subject received that treatment.
We might expect the expansion of the first subject to be
id sequence auc treat period
1 1 150.9643 A 1
1 1 218.5551 B 2
We see that subject number 1 was in sequence 1, had an AUC of 150.9643 when treatment A was
applied in the first period of the study, and had an AUC of 218.5551 when treatment B was applied.
Similarly, the expansion of subject 10 (the first subject in sequence 2) would be
id sequence auc treat period
10 2 137.0627 B 1
10 2 139.7382 A 2
Here treatment B was applied to the subject during the first period of the study, and treatment A
was applied to the subject during the second period of the study.
An additional complication is common in crossover study designs. The treatment applied in the first
period of the study might still have some effect on the outcome in the second period. In this example,
each subject was given one treatment followed by another treatment. To get accurate estimates of
treatment effects, it is necessary to account for the effect that the first treatment has in the second
period of the study. This is called the carryover effect. We must, therefore, have a variable that
indicates which treatment was applied in the first treatment period. pkshape creates a variable that
indicates the carryover effect. For treatments applied during the first treatment period, there will never
be a carryover effect. Thus the expanded data created by pkshape for subject 1 will be
id sequence outcome treat period carry
1 1 150.9643 A 1 0
1 1 218.5551 B 2 A
and the data for subject 10 will be
id sequence outcome treat period carry
10 2 137.0627 B 1 0
10 2 139.7382 A 2 B
We pkshape the data:
. pkshape id seq auc*, order(ab ba)
. sort id sequence period
. list, sep(16)
id sequence outcome treat carry period
1. 1 1 150.9643 1 0 1
2. 1 1 218.5551 2 1 2
3. 2 1 146.7606 1 0 1
4. 2 1 133.3201 2 1 2
5. 3 1 160.6548 1 0 1
6. 3 1 126.0635 2 1 2
7. 4 1 157.8622 1 0 1
8. 4 1 96.17461 2 1 2
9. 5 1 133.6957 1 0 1
10. 5 1 188.9038 2 1 2
11. 7 1 160.639 1 0 1
12. 7 1 223.6922 2 1 2
13. 8 1 131.2604 1 0 1
14. 8 1 104.0139 2 1 2
15. 9 1 168.5186 1 0 1
16. 9 1 237.8962 2 1 2
17. 10 2 137.0627 2 0 1
18. 10 2 139.7382 1 2 2
19. 12 2 153.4038 2 0 1
20. 12 2 202.3942 1 2 2
21. 13 2 163.4593 2 0 1
22. 13 2 136.7848 1 2 2
23. 14 2 146.0462 2 0 1
24. 14 2 104.5191 1 2 2
25. 15 2 158.1457 2 0 1
26. 15 2 165.8654 1 2 2
27. 18 2 147.1977 2 0 1
28. 18 2 139.235 1 2 2
29. 19 2 164.9988 2 0 1
30. 19 2 166.2391 1 2 2
31. 20 2 145.3823 2 0 1
32. 20 2 158.5146 1 2 2
As an aside, crossover designs do not require that each subject receive each treatment, but if they
do, the crossover design is referred to as a complete crossover design.
The last dataset is organized in a manner that can be analyzed with Stata. To fit an ANOVA model
to these data, we can use anova or pkcross. To conduct equivalence tests, we can use pkequiv.
This example is further analyzed in [R] pkcross and [R] pkequiv.
References
Chow, S.-C., and J.-P. Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Ratkowsky, D. A., M. A. Evans, and J. R. Alldredge. 1993. Cross-over Experiments: Design, Analysis, and Application.
New York: Dekker.
Title
pkcollapse — Generate pharmacokinetic measurement dataset
Syntax Menu Description Options
Remarks and examples Methods and formulas Also see
Syntax
pkcollapse time concentration [if], id(id_var) [options]
options Description
Main
id(id_var)       subject ID variable
stat(measures)   create specified measures; default is all
trapezoid        use trapezoidal rule; default is cubic splines
fit(#)           use # points to estimate AUC0,∞; default is fit(3)
keep(varlist)    keep variables in varlist
force            force collapse
nodots           suppress dots during calculation
id(id_var) is required.
measures Description
auc        area under the concentration-time curve (AUC0,tmax)
aucline    area under the concentration-time curve from 0 to ∞ using a linear extension
aucexp     area under the concentration-time curve from 0 to ∞ using an exponential extension
auclog     area under the log-concentration-time curve extended with a linear fit
half       half-life of the drug
ke         elimination rate
cmax       maximum concentration
tmax       time at last concentration
tomc       time of maximum concentration
Menu
Statistics >Epidemiology and related >Other >Generate pharmacokinetic measurement dataset
Description
pkcollapse generates new variables with the pharmacokinetic summary measures of interest.
pkcollapse is one of the pk commands. Please read [R]pk before reading this entry.
Options
 
Main
id(id_var) is required and specifies the variable that contains the subject ID over which pkcollapse
is to operate.
stat(measures) specifies the measures to be generated. The default is to generate all the measures.
trapezoid tells Stata to use the trapezoidal rule when calculating the AUC. The default is to use
cubic splines, which give better results for most functions. When the curve is irregular, trapezoid
may give better results.
fit(#) specifies the number of points to use in estimating the AUC0,∞. The default is fit(3), the
last three points. This number should be viewed as a minimum; the appropriate number of points
will depend on your data (see the sketch following these option descriptions).
keep(varlist) specifies the variables to be kept during the collapse. Variables not specified with the
keep() option will be dropped. When keep() is specified, the keep variables are checked to
ensure that all values of the variables are the same within id_var.
force forces the collapse, even when the values of the keep() variables are different within the
id_var.
nodots suppresses the display of dots during calculation.
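For instance, to request the trapezoidal rule and base the extension to infinity on the last five points rather than three (a sketch using the pkdata example below; whether these settings are appropriate depends on your data):

. pkcollapse time concA concB, id(id) trapezoid fit(5) keep(seq)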
Remarks and examples
pkcollapse generates all the summary pharmacokinetic measures.
Example 1
We demonstrate the use of pkcollapse with the data described in [R] pk. We have drug
concentration data on 16 subjects. Each subject is measured at 13 time points over a 32-hour period.
Some of the records are
. use http://www.stata-press.com/data/r13/pkdata
. list, sep(0)
id seq time concA concB
1. 1 1 0 0 0
2. 1 1 .5 3.073403 3.712592
3. 1 1 1 5.188444 6.230602
4. 1 1 1.5 5.898577 7.885944
5. 1 1 2 5.096378 9.241735
6. 1 1 3 6.094085 13.10507
(output omitted )
14. 2 1 0 0 0
15. 2 1 .5 2.48462 .9209593
16. 2 1 1 4.883569 5.925818
17. 2 1 1.5 7.253442 8.710549
18. 2 1 2 5.849345 10.90552
19. 2 1 3 6.761085 8.429898
(output omitted )
207. 20 2 24 4.673281 6.059818
208. 20 2 32 3.487347 5.213639
Although pksumm allows us to view all the pharmacokinetic measures, we can create a dataset with
the measures by using pkcollapse.
. pkcollapse time concA concB, id(id) stat(auc) keep(seq)
................................
. list, sep(8) abbrev(10)
id seq auc_concA auc_concB
1. 1 1 150.9643 218.5551
2. 2 1 146.7606 133.3201
3. 3 1 160.6548 126.0635
4. 4 1 157.8622 96.17461
5. 5 1 133.6957 188.9038
6. 7 1 160.639 223.6922
7. 8 1 131.2604 104.0139
8. 9 1 168.5186 237.8962
9. 10 2 137.0627 139.7382
10. 12 2 153.4038 202.3942
11. 13 2 163.4593 136.7848
12. 14 2 146.0462 104.5191
13. 15 2 158.1457 165.8654
14. 18 2 147.1977 139.235
15. 19 2 164.9988 166.2391
16. 20 2 145.3823 158.5146
The resulting dataset, which we will call pkdata2, contains 1 observation per subject. This dataset
is in wide format. If we want to use pkcross or pkequiv, we must transform these data to long
format, which we do in the last example of [R]pkshape.
Methods and formulas
The statistics generated by pkcollapse are described in [R]pkexamine.
Also see
[R] pk — Pharmacokinetic (biopharmaceutical) data
Title
pkcross — Analyze crossover experiments
Syntax Menu Description Options
Remarks and examples Methods and formulas References Also see
Syntax
pkcross outcome [if] [in] [, options]
options Description
Model
sequence(varname)    sequence variable; default is sequence(sequence)
treatment(varname)   treatment variable; default is treatment(treat)
period(varname)      period variable; default is period(period)
id(varname)          ID variable
carryover(varname)   name of carryover variable; default is carryover(carry)
carryover(none)      omit carryover effects from model; default is carryover(carry)
model(string)        specify the model to fit
sequential           estimate sequential instead of partial sums of squares
Parameterization
param(3)             estimate mean and the period, treatment, and sequence effects;
                     assume no carryover effects exist; the default
param(1)             estimate mean and the period, treatment, and carryover effects;
                     assume no sequence effects exist
param(2)             estimate mean, period and treatment effects, and period-by-treatment
                     interaction; assume no sequence or carryover effects exist
param(4)             estimate mean, sequence and treatment effects, and sequence-by-treatment
                     interaction; assume no period or carryover effects exist
Menu
Statistics >Epidemiology and related >Other >Analyze crossover experiments
Description
pkcross analyzes data from a crossover design experiment. When analyzing pharmaceutical trial
data, if the treatment, carryover, and sequence variables are known, the omnibus test for separability
of the treatment and carryover effects is calculated.
pkcross is one of the pk commands. Please read [R]pk before reading this entry.
Options
 
Model
sequence(varname) specifies the variable that contains the sequence in which the treatment was
administered. If this option is not specified, sequence(sequence) is assumed.
treatment(varname) specifies the variable that contains the treatment information. If this option is
not specified, treatment(treat) is assumed.
period(varname) specifies the variable that contains the period information. If this option is not
specified, period(period) is assumed.
id(varname) specifies the variable that contains the subject identifiers. If this option is not specified,
id(id) is assumed.
carryover(varname | none) specifies the variable that contains the carryover information. If
carry(none) is specified, the carryover effects are omitted from the model. If this option is
not specified, carryover(carry) is assumed.
model(string) specifies the model to be fit. For higher-order crossover designs, this option can be
useful if you want to fit a model other than the default. However, anova (see [R] anova) can also
be used to fit a crossover model. The default model for higher-order crossover designs is outcome
predicted by sequence, period, treatment, and carryover effects. By default, the model statement
is model(sequence period treat carry).
sequential specifies that sequential sums of squares be estimated.
 
Parameterization
param(#) specifies which of the four parameterizations to use for the analysis of a 2 × 2 crossover
experiment. This option is ignored with higher-order crossover designs. The default is param(3).
See the technical note for 2 × 2 crossover designs for more details.
param(3) estimates the overall mean, the period effects, the treatment effects, and the sequence
effects, assuming that no carryover effects exist. This is the default parameterization.
param(1) estimates the overall mean, the period effects, the treatment effects, and the carryover
effects, assuming that no sequence effects exist.
param(2) estimates the overall mean, the period effects, the treatment effects, and the period-by-treatment
interaction, assuming that no sequence or carryover effects exist.
param(4) estimates the overall mean, the sequence effects, the treatment effects, and the sequence-by-treatment
interaction, assuming that no period or carryover effects exist. When the sequence-by-treatment
interaction is equivalent to the period effect, this reduces to the third parameterization.
Remarks and examples
pkcross is designed to analyze crossover experiments. Use pkshape first to reshape your data;
see [R] pkshape. pkcross assumes that the data were reshaped by pkshape or are organized in the
same manner as produced with pkshape. Washout periods are indicated by the number 0. See the
technical note in this entry for more information on analyzing 2 × 2 crossover experiments.
Technical note
The 2 × 2 crossover design cannot be used to estimate more than four parameters because there
are only four pieces of information (the four cell means) collected. pkcross uses ANOVA models to
analyze the data, so one of the four parameters must be the overall mean of the model, leaving just
3 degrees of freedom to estimate the remaining effects (period, sequence, treatment, and carryover).
Thus the model is overparameterized. Estimation of treatment and carryover effects requires the
assumption of either no period effects or no sequence effects. Some researchers maintain that
estimating carryover effects at the expense of other effects is a bad idea. This is a limitation of this
design. pkcross implements four parameterizations for this model. They are numbered sequentially
from one to four and are described in Options.
Example 1
Consider the example data published in Chow and Liu (2009, 71) and described in [R]pkshape.
We have entered and reshaped the data with pkshape and have variables that identify the subjects,
periods, treatments, sequence, and carryover treatment. To compute the ANOVA table, use pkcross:
. use http://www.stata-press.com/data/r13/chowliu
. pkshape id seq period1 period2, order(ab ba)
. pkcross outcome
sequence variable = sequence
period variable = period
treatment variable = treat
carryover variable = carry
id variable = id
Analysis of variance (ANOVA) for a 2x2 crossover study
Source of Variation Partial SS df MS F Prob > F
Intersubjects
Sequence effect 276.00 1 276.00 0.37 0.5468
Residuals 16211.49 22 736.89 4.41 0.0005
Intrasubjects
Treatment effect 62.79 1 62.79 0.38 0.5463
Period effect 35.97 1 35.97 0.22 0.6474
Residuals 3679.43 22 167.25
Total 20265.68 47
Omnibus measure of separability of treatment and carryover = 29.2893%
There is evidence of intersubject variability, but there are no other significant effects. The omnibus
test for separability is a measure reflecting the degree to which the study design allows the treatment
effects to be estimated independently of the carryover effects. The measure of separability of the
treatment and carryover effects indicates approximately 29% separability, which can be interpreted
as the degree to which the treatment and carryover effects are orthogonal. This is a characteristic
of the design of the study. For a complete discussion, see Ratkowsky, Evans, and Alldredge (1993).
Compared to the output in Chow and Liu (2009), the sequence effect is mislabeled as a carryover effect.
See Ratkowsky, Evans, and Alldredge (1993, sec. 3.2) for a complete discussion of the mislabeling.
By specifying param(1), we obtain parameterization 1 for this model.
. pkcross outcome, param(1)
sequence variable = sequence
period variable = period
treatment variable = treat
carryover variable = carry
id variable = id
Analysis of variance (ANOVA) for a 2x2 crossover study
Source of Variation Partial SS df MS F Prob > F
Treatment effect 301.04 1 301.04 0.67 0.4189
Period effect 255.62 1 255.62 0.57 0.4561
Carryover effect 276.00 1 276.00 0.61 0.4388
Residuals 19890.92 44 452.07
Total 20265.68 47
Omnibus measure of separability of treatment and carryover = 29.2893%
Example 2
Consider the case of a two-treatment, four-sequence, two-period crossover design. This design is
commonly referred to as Balaam’s design (Balaam 1968). Ratkowsky, Evans, and Alldredge (1993,
140) published the following data from an amantadine trial, originally published by Taka and
Armitage (1983):
. use http://www.stata-press.com/data/r13/balaam, clear
. list, sep(0)
id seq period1 period2 period3
1. 1 -ab 9 8.75 8.75
2. 2 -ab 12 10.5 9.75
3. 3 -ab 17 15 18.5
4. 4 -ab 21 21 21.5
5. 1 -ba 23 22 18
6. 2 -ba 15 15 13
7. 3 -ba 13 14 13.75
8. 4 -ba 24 22.75 21.5
9. 5 -ba 18 17.75 16.75
10. 1 -aa 14 12.5 14
11. 2 -aa 27 24.25 22.5
12. 3 -aa 19 17.25 16.25
13. 4 -aa 30 28.25 29.75
14. 1 -bb 21 20 19.51
15. 2 -bb 11 10.5 10
16. 3 -bb 20 19.5 20.75
17. 4 -bb 25 22.5 23.5
The sequence identifier must be a string with zeros to indicate washout or baseline periods, or
a number. If the sequence identifier is numeric, the order option must be specified with pkshape.
If the sequence identifier is a string, pkshape will create sequence, period, and treatment identifiers
without the order option. In this example, the dash is used to indicate a baseline period, which is
an invalid code for this purpose. As a result, the data must be encoded; see [D]encode.
. encode seq, gen(num_seq)
. pkshape id num_seq period1 period2 period3, order(0aa 0ab 0ba 0bb)
. pkcross outcome, se
sequence variable = sequence
period variable = period
treatment variable = treat
carryover variable = carry
id variable = id
Analysis of variance (ANOVA) for a crossover study
Source of Variation SS df MS F Prob > F
Intersubjects
Sequence effect 285.82 3 95.27 1.01 0.4180
Residuals 1221.49 13 93.96 59.96 0.0000
Intrasubjects
Period effect 15.13 2 7.56 6.34 0.0048
Treatment effect 8.48 1 8.48 8.86 0.0056
Carryover effect 0.11 1 0.11 0.12 0.7366
Residuals 29.56 30 0.99
Total 1560.59 50
Omnibus measure of separability of treatment and carryover = 64.6447%
In this example, the sequence specifier used dashes instead of zeros to indicate a baseline period
during which no treatment was given. For pkcross to work, we need to encode the string sequence
variable and then use the order option with pkshape. A word of caution: encode does not necessarily
choose the first sequence to be sequence 1, as in this example. Always double-check the sequence
numbering when using encode.
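One quick way to verify the mapping is to list the value label that encode created or to cross-tabulate the original and encoded variables. A minimal sketch using the variables generated above:
. label list num_seq
. tabulate seq num_seq
If the numeric codes do not correspond to the order given in order(), adjust the order() specification (or the encoding) before running pkcross.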
Example 3
Continuing with the example from [R] pkshape, we fit an ANOVA model.
. use http://www.stata-press.com/data/r13/pkdata3, clear
. list, sep(8)
id sequence outcome treat carry period
1. 1 1 150.9643 A 0 1
2. 2 1 146.7606 A 0 1
3. 3 1 160.6548 A 0 1
4. 4 1 157.8622 A 0 1
5. 5 1 133.6957 A 0 1
6. 7 1 160.639 A 0 1
7. 8 1 131.2604 A 0 1
8. 9 1 168.5186 A 0 1
9. 10 2 137.0627 B 0 1
10. 12 2 153.4038 B 0 1
11. 13 2 163.4593 B 0 1
12. 14 2 146.0462 B 0 1
13. 15 2 158.1457 B 0 1
14. 18 2 147.1977 B 0 1
15. 19 2 164.9988 B 0 1
16. 20 2 145.3823 B 0 1
17. 1 1 218.5551 B A 2
18. 2 1 133.3201 B A 2
19. 3 1 126.0635 B A 2
20. 4 1 96.17461 B A 2
21. 5 1 188.9038 B A 2
22. 7 1 223.6922 B A 2
23. 8 1 104.0139 B A 2
24. 9 1 237.8962 B A 2
25. 10 2 139.7382 A B 2
26. 12 2 202.3942 A B 2
27. 13 2 136.7848 A B 2
28. 14 2 104.5191 A B 2
29. 15 2 165.8654 A B 2
30. 18 2 139.235 A B 2
31. 19 2 166.2391 A B 2
32. 20 2 158.5146 A B 2
The ANOVA model is fit using pkcross:
. pkcross outcome
sequence variable = sequence
period variable = period
treatment variable = treat
carryover variable = carry
id variable = id
Analysis of variance (ANOVA) for a 2x2 crossover study
Source of Variation Partial SS df MS F Prob > F
Intersubjects
Sequence effect 378.04 1 378.04 0.29 0.5961
Residuals 17991.26 14 1285.09 1.40 0.2691
Intrasubjects
Treatment effect 455.04 1 455.04 0.50 0.4931
Period effect 419.47 1 419.47 0.46 0.5102
Residuals 12860.78 14 918.63
Total 32104.59 31
Omnibus measure of separability of treatment and carryover = 29.2893%
Example 4
Consider the case of a six-treatment crossover trial in which the squares are not variance balanced.
The following dataset is from a partially balanced crossover trial published by Patterson and Lucas (1962)
and reproduced in Ratkowsky, Evans, and Alldredge (1993, 231):
. use http://www.stata-press.com/data/r13/nobalance
. list, sep(4)
cow seq period1 period2 period3 period4 block
1. 1 adbe 38.7 37.4 34.3 31.3 1
2. 2 baed 48.9 46.9 42 39.6 1
3. 3 ebda 34.6 32.3 28.5 27.1 1
4. 4 deab 35.2 33.5 28.4 25.1 1
5. 1 dafc 32.9 33.1 27.5 25.1 2
6. 2 fdca 30.4 29.5 26.7 23.1 2
7. 3 cfad 30.8 29.3 26.4 23.2 2
8. 4 acdf 25.7 26.1 23.4 18.7 2
9. 1 efbc 25.4 26 23.9 19.9 3
10. 2 becf 21.8 23.9 21.7 17.6 3
11. 3 fceb 21.4 22 19.4 16.6 3
12. 4 cbfe 22.8 21 18.6 16.1 3
When there is no variance balance in the design, a square or blocking variable is needed to indicate
in which treatment cell a sequence was observed, but the mechanical steps are the same.
. pkshape cow seq period1 period2 period3 period4
. pkcross outcome, model(block cow|block period|block treat carry) se
Number of obs = 48 R-squared = 0.9965
Root MSE = .740408 Adj R-squared = 0.9903
Source Seq. SS df MS F Prob > F
Model 2650.1331 30 88.3377701 161.14 0.0000
block 1607.01128 2 803.505642 1465.71 0.0000
cow|block 628.706274 9 69.8562527 127.43 0.0000
period|block 408.031253 9 45.3368059 82.70 0.0000
treat 2.50000057 5 .500000114 0.91 0.4964
carry 3.88428906 5 .776857812 1.42 0.2680
Residual 9.31945887 17 .548203463
Total 2659.45256 47 56.584097
When the model statement is used and the omnibus measure of separability is desired, specify the
variables in the treatment(), carryover(), and sequence() options to pkcross.
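For instance, building on example 4, the omnibus measure could be requested together with the model() fit by naming the variables explicitly; a sketch using the variables created by pkshape above:
. pkcross outcome, model(block cow|block period|block treat carry)
>      treatment(treat) carryover(carry) sequence(sequence) se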
Methods and formulas
pkcross uses ANOVA to fit models for crossover experiments; see [R] anova.

The omnibus measure of separability is

    S = 100(1 - V)\%

where V is Cramér's V and is defined as

    V = \left\{ \frac{\chi^2}{N\,\min(r-1,\ c-1)} \right\}^{1/2}

The \chi^2 is calculated as

    \chi^2 = \sum_i \sum_j \frac{\left( O_{ij} - E_{ij} \right)^2}{E_{ij}}

where O and E are the observed and expected counts in a table of the number of times each treatment
is followed by the other treatments.
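As a quick arithmetic check of this formula, the 29.2893% reported for the 2 x 2 designs in examples 1 and 3 corresponds to V = 1/sqrt(2), roughly 0.7071:
. display 100*(1 - 1/sqrt(2))
which displays approximately 29.2893.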
References
Balaam, L. N. 1968. A two-period design with t² experimental units. Biometrics 24: 61–73.
Chow, S.-C., and J.-P. Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models. 5th ed. New York:
McGraw–Hill/Irwin.
Patterson, H. D., and H. L. Lucas. 1962. Change-over designs. Technical Bulletin 147, North Carolina Agricultural
Experiment Station and the USDA.
Ratkowsky, D. A., M. A. Evans, and J. R. Alldredge. 1993. Cross-over Experiments: Design, Analysis, and Application.
New York: Dekker.
Taka, M. T., and P. Armitage. 1983. Autoregressive models in clinical trials. Communications in Statistics—Theory
and Methods 12: 865–876.
Also see
[R] pk  Pharmacokinetic (biopharmaceutical) data
Title
pkequiv — Perform bioequivalence tests
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
pkequiv outcome treatment period sequence id [if] [in] [, options]
options Description
Options
compare(string)   compare the two specified values of the treatment variable
limit(#)          equivalence limit (between 0.10 and 0.99); default is 0.2
level(#)          set confidence level; default is level(90)
fieller           calculate confidence interval by Fieller's theorem
symmetric         calculate symmetric equivalence interval
anderson          Anderson and Hauck hypothesis test for bioequivalence
tost              two one-sided hypothesis tests for bioequivalence
noboot            do not estimate probability that CI lies within confidence limits
Menu
Statistics > Epidemiology and related > Other > Bioequivalence tests
Description
pkequiv performs bioequivalence testing for two treatments. By default, pkequiv calculates
a standard confidence interval symmetric about the difference between the two treatment means.
pkequiv also calculates confidence intervals symmetric about zero and intervals based on Fieller’s
theorem. Also, pkequiv can perform interval hypothesis tests for bioequivalence.
pkequiv is one of the pk commands. Please read [R]pk before reading this entry.
Options
 
Options
compare(string) specifies the two treatments to be tested for equivalence. Sometimes there may be
more than two treatments, but the equivalence can be determined only between any two treatments.
limit(#) specifies the equivalence limit. The default is 0.2. The equivalence limit can be changed
only symmetrically; that is, it is not possible to have a 0.15 lower limit and a 0.2 upper limit in
the same test.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(90). This setting is not controlled by the set level command.
fieller specifies that an equivalence interval based on Fieller’s theorem be calculated.
symmetric specifies that a symmetric equivalence interval be calculated.
anderson specifies that the Anderson and Hauck (1983) hypothesis test for bioequivalence be computed.
This option is ignored when calculating equivalence intervals based on Fieller’s theorem or when
calculating a confidence interval that is symmetric about zero.
tost specifies that the two one-sided hypothesis tests for bioequivalence be computed. This option
is ignored when calculating equivalence intervals based on Fieller’s theorem or when calculating
a confidence interval that is symmetric about zero.
noboot prevents the estimation of the probability that the confidence interval lies within the confidence
limits. If this option is not specified, this probability is estimated by resampling the data.
Remarks and examples
pkequiv is designed to conduct tests for bioequivalence based on data from a crossover experiment.
pkequiv requires that the user specify the outcome, treatment, period, sequence, and id variables.
The data must be in the same format as that produced by pkshape; see [R] pkshape.
Example 1
We have the following data on which we want to conduct a bioequivalence test between treat = A
and treat = B.
. use http://www.stata-press.com/data/r13/pkdata3
. list, sep(4)
id sequence outcome treat carry period
1. 1 1 150.9643 A 0 1
2. 2 1 146.7606 A 0 1
3. 3 1 160.6548 A 0 1
4. 4 1 157.8622 A 0 1
5. 5 1 133.6957 A 0 1
6. 7 1 160.639 A 0 1
7. 8 1 131.2604 A 0 1
8. 9 1 168.5186 A 0 1
9. 10 2 137.0627 B 0 1
10. 12 2 153.4038 B 0 1
11. 13 2 163.4593 B 0 1
12. 14 2 146.0462 B 0 1
13. 15 2 158.1457 B 0 1
14. 18 2 147.1977 B 0 1
15. 19 2 164.9988 B 0 1
16. 20 2 145.3823 B 0 1
17. 1 1 218.5551 B A 2
18. 2 1 133.3201 B A 2
19. 3 1 126.0635 B A 2
20. 4 1 96.17461 B A 2
21. 5 1 188.9038 B A 2
22. 7 1 223.6922 B A 2
23. 8 1 104.0139 B A 2
24. 9 1 237.8962 B A 2
25. 10 2 139.7382 A B 2
26. 12 2 202.3942 A B 2
27. 13 2 136.7848 A B 2
28. 14 2 104.5191 A B 2
29. 15 2 165.8654 A B 2
30. 18 2 139.235 A B 2
31. 19 2 166.2391 A B 2
32. 20 2 158.5146 A B 2
. set seed 1
. pkequiv outcome treat period seq id
Classic confidence interval for bioequivalence
[equivalence limits] [ test limits ]
difference: -30.296 30.296 -11.332 26.416
ratio: 80% 120% 92.519% 117.439%
probability test limits are within equivalence limits = 0.6410
note: reference treatment = 1
The default output for pkequiv shows a confidence interval for the difference of the means (test
limits), the ratio of the means, and the federal equivalence limits. The classic confidence interval can
be constructed around the difference between the average measure of effect for the two drugs or around
the ratio of the average measure of effect for the two drugs. pkequiv reports both the difference
measure and the ratio measure. For these data, U.S. federal government regulations state that the
confidence interval for the difference must be entirely contained within the range [−30.296, 30.296]
and between 80% and 120% for the ratio. Here the test limits are within the equivalence limits.
Although the test limits are inside the equivalence limits, there is only a 64% assurance that the
observed confidence interval will be within the equivalence limits in the long run. This is an interesting
case because, although this sample shows bioequivalence, the evaluation of the long-run performance
indicates possible problems. These fictitious data were generated with high intersubject variability,
which causes poor long-run performance.
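The difference limits shown above are, numerically, the limit() fraction (0.2 by default) applied to the reference-formulation mean, which for these data is about 151.48; the 80% and 120% ratio limits reflect the same plus-or-minus 20%. This is an observation about the output rather than a formula quoted from this entry, but the arithmetic is easy to check:
. * 151.48 is the approximate reference (treat = A) mean computed from the listing above
. display 0.2*151.48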
If we conduct a bioequivalence test with the data published in Chow and Liu (2009, 71), which
we introduced in [R]pk and fully described in [R]pkshape, we observe that the probability that the
test limits are within the equivalence limits is high.
. use http://www.stata-press.com/data/r13/chowliu2
. set seed 1
. pkequiv outcome treat period seq id
Classic confidence interval for bioequivalence
[equivalence limits] [ test limits ]
difference: -16.512 16.512 -8.698 4.123
ratio: 80% 120% 89.464% 104.994%
probability test limits are within equivalence limits = 0.9980
note: reference treatment = 1
For these data, the test limits are well within the equivalence limits, and the probability that the
test limits are within the equivalence limits is 99.8%.
Example 2
We compute a confidence interval that is symmetric about zero:
. pkequiv outcome treat period seq id, symmetric
Westlake’s symmetric confidence interval for bioequivalence
[Equivalence limits] [ Test mean ]
Test formulation: 75.145 89.974 80.272
note: reference treatment = 1
The reported equivalence limit is constructed symmetrically about the reference mean, which is
equivalent to constructing a confidence interval symmetric about zero for the difference in the two
drugs. In the output above, we see that the test formulation mean of 80.272 is within the equivalence
limits, indicating that the test drug is bioequivalent to the reference drug.
pkequiv displays interval hypothesis tests of bioequivalence if you specify the tost or the
anderson option, or both. For example,
. pkequiv outcome treat period seq id, tost anderson
Classic confidence interval for bioequivalence
[equivalence limits] [ test limits ]
difference: -16.512 16.512 -8.698 4.123
ratio: 80% 120% 89.464% 104.994%
probability test limits are within equivalence limits = 0.9990
Schuirmann’s two one-sided tests
upper test statistic = -5.036 p-value = 0.000
lower test statistic = 3.810 p-value = 0.001
Anderson and Hauck’s test
noncentrality parameter = 4.423
test statistic = -0.613 empirical p-value = 0.0005
note: reference treatment = 1
Both of Schuirmann’s one-sided tests are highly significant, suggesting that the two drugs are
bioequivalent. A similar conclusion is drawn from the Anderson and Hauck test of bioequivalence.
Stored results
pkequiv stores the following in r():
Scalars
r(stddev) pooled-sample standard deviation of period differences from both sequences
r(uci) upper confidence interval for a classic interval
r(lci) lower confidence interval for a classic interval
r(delta) delta value used in calculating a symmetric confidence interval
r(u3) upper confidence interval for Fieller’s confidence interval
r(l3) lower confidence interval for Fieller’s confidence interval
Methods and formulas
The lower confidence interval for the difference in the two treatments for the classic shortest
confidence interval is

    L_1 = \left(\bar{Y}_T - \bar{Y}_R\right) - t_{(\alpha,\, n_1+n_2-2)}\, \hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

The upper limit is

    U_1 = \left(\bar{Y}_T - \bar{Y}_R\right) + t_{(\alpha,\, n_1+n_2-2)}\, \hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
The limits for the ratio measure are

    L_2 = \left( \frac{L_1}{\bar{Y}_R} + 1 \right) 100\%

and

    U_2 = \left( \frac{U_1}{\bar{Y}_R} + 1 \right) 100\%

where \bar{Y}_T is the mean of the test formulation of the drug, \bar{Y}_R is the mean of the reference formulation
of the drug, and t_{(\alpha,\, n_1+n_2-2)} is the t distribution with n_1+n_2-2 degrees of freedom. \hat{\sigma}_d is the
pooled sample variance of the period differences from both sequences, defined as

    \hat{\sigma}_d = \frac{1}{n_1+n_2-2} \sum_{k=1}^{2} \sum_{i=1}^{n_k} \left( d_{ik} - \bar{d}_{\cdot k} \right)^2

The upper and lower limits for the symmetric confidence interval are \bar{Y}_R + \Delta and \bar{Y}_R - \Delta, where

    \Delta = k_1\, \hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} - \left( \bar{Y}_T - \bar{Y}_R \right)

and (simultaneously)

    \Delta = -k_2\, \hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} + 2\left( \bar{Y}_T - \bar{Y}_R \right)

and k_1 and k_2 are computed iteratively to satisfy the above equalities and the condition

    \int_{k_1}^{k_2} f(t)\, dt = 1 - 2\alpha

where f(t) is the probability density function of the t distribution with n_1+n_2-2 degrees of
freedom.
See Chow and Liu (2009, 88–92) for details about calculating the confidence interval based on
Fieller’s theorem.
The two test statistics for the two one-sided tests of equivalence are

    T_L = \frac{\left(\bar{Y}_T - \bar{Y}_R\right) - \theta_L}{\hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

and

    T_U = \frac{\left(\bar{Y}_T - \bar{Y}_R\right) - \theta_U}{\hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

where -\theta_L = \theta_U and are the regulated confidence limits.
The logic of the Anderson and Hauck test is tricky; see Chow and Liu (2009) for a complete
explanation. However, the test statistic is

    T_{AH} = \frac{\left(\bar{Y}_T - \bar{Y}_R\right) - \frac{\theta_L + \theta_U}{2}}{\hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

and the noncentrality parameter is estimated by

    \hat{\delta} = \frac{\theta_U - \theta_L}{2\, \hat{\sigma}_d \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

The empirical p-value is calculated as

    p = F_t\left( |T_{AH}| - \hat{\delta} \right) - F_t\left( -|T_{AH}| - \hat{\delta} \right)

where F_t is the cumulative distribution function of the t distribution with n_1+n_2-2 degrees of
freedom.
References
Anderson, S., and W. W. Hauck. 1983. A new procedure for testing equivalence in comparative bioavailability and
other clinical trials. Communications in Statistics—Theory and Methods 12: 2663–2692.
Chow, S.-C., and J.-P. Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Fieller, E. C. 1954. Some problems in interval estimation. Journal of the Royal Statistical Society, Series B 16:
175–185.
Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models. 5th ed. New York:
McGraw–Hill/Irwin.
Locke, C. S. 1984. An exact confidence interval from untransformed data for the ratio of two formulation means.
Journal of Pharmacokinetics and Biopharmaceutics 12: 649–655.
Schuirmann, D. J. 1989. Confidence intervals for the ratio of two means from a cross-over study. In Proceedings of
the Biopharmaceutical Section, 121–126. Washington, DC: American Statistical Association.
Westlake, W. J. 1976. Symmetrical confidence intervals for bioequivalence trials. Biometrics 32: 741–744.
Also see
[R] pk  Pharmacokinetic (biopharmaceutical) data
Title
pkexamine — Calculate pharmacokinetic measures
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Reference
Also see
Syntax
pkexamine time concentration [if] [in] [, options]
options Description
Main
fit(#)                 use # points to estimate AUC0,∞; default is fit(3)
trapezoid              use trapezoidal rule; default is cubic splines
graph                  graph the AUC
line                   graph the linear extension
log                    graph the log extension
exp(#)                 plot the exponential fit for the AUC0,∞
AUC plot
cline options          affect rendition of plotted points connected by lines
marker options         change look of markers (color, size, etc.)
marker label options   add marker labels; change look or position
Add plots
addplot(plot)          add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options         any options other than by() documented in [G-3] twoway options
by is allowed; see [D] by.
Menu
Statistics > Epidemiology and related > Other > Pharmacokinetic measures
Description
pkexamine calculates pharmacokinetic measures from time-and-concentration subject-level data.
pkexamine computes and displays the maximum measured concentration, the time at the maximum
measured concentration, the time of the last measurement, the elimination time, the half-life, and the
area under the concentration-time curve (AUC). Three estimates of the area under the concentration-time
curve from 0 to infinity (AUC0,∞) are also calculated.
pkexamine is one of the pk commands. Please read [R]pk before reading this entry.
Options
 
Main
fit(#) specifies the number of points, counting back from the last measurement, to use in fitting
the extension to estimate the AUC0,∞. The default is fit(3), or the last three points. This value
should be viewed as a minimum; the appropriate number of points will depend on your data.
trapezoid specifies that the trapezoidal rule be used to calculate the AUC. The default is cubic
splines, which give better results for most functions. When the curve is irregular, trapezoid may
give better results.
graph tells pkexamine to graph the concentration-time curve.
line and log specify the estimates of the AUC0,∞ to display when graphing the AUC0,∞. These
options are ignored, unless they are specified with the graph option.
exp(#) specifies that the exponential fit for the AUC0,∞ be plotted. You must specify the maximum
time value to which you want to plot the curve, and this time value must be greater than the
maximum time measurement in the data. If you specify 0, the curve will be plotted to the point
at which the linear extension would cross the x axis. This option is not valid with the line or
log option and is ignored, unless the graph option is also specified.
 
AUC plot
cline options affect the rendition of the plotted points connected by lines; see [G-3]cline options.
marker options specify the look of markers. This look includes the marker symbol, the marker size,
and its color and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see
[G-3]marker label options.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
pkexamine computes summary statistics for a given patient in a pharmacokinetic trial. If by idvar:
is specified, statistics will be displayed for each subject in the data.
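For example, with a dataset containing several subjects, such as the pksumm data used in [R] pksumm, the statistics can be produced subject by subject; a minimal sketch:
. use http://www.stata-press.com/data/r13/pksumm, clear
. by id, sort: pkexamine time conc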
Example 1
Chow and Liu (2009, 13) present data on a study examining primidone concentrations versus time
for a subject over a 32-hour period after dosing.
. use http://www.stata-press.com/data/r13/auc
. list, abbrev(14)
id time concentration
1. 1 0 0
2. 1 .5 0
3. 1 1 2.8
4. 1 1.5 4.4
5. 1 2 4.4
6. 1 3 4.7
7. 1 4 4.1
8. 1 6 4
9. 1 8 3.6
10. 1 12 3
11. 1 16 2.5
12. 1 24 2
13. 1 32 1.6
We use pkexamine to produce the summary statistics:
. pkexamine time conc, graph
Maximum concentration = 4.7
Time of maximum concentration = 3
Time of last observation (Tmax) = 32
Elimination rate = 0.0279
Half life = 24.8503
Area under the curve
       AUC [0, Tmax]    AUC [0, inf.)          AUC [0, inf.)    AUC [0, inf.)
                        Linear of log conc.    Linear fit       Exponential fit
           85.24            142.603               107.759          142.603

Fit based on last 3 points.
[Graph omitted: concentration-time plot of the data with the fitted AUC curve; y axis Concentration, x axis Analysis Time]
The maximum concentration of 4.7 occurs at time 3, and the time of the last observation (Tmax) is
32. In addition to the AUC, which is calculated from 0 to the maximum value of time, pkexamine
also reports the area under the curve, computed by extending the curve with each of three methods:
a linear fit to the log of the concentration, a linear regression line, and a decreasing exponential
regression line. See Methods and formulas for details on these three methods.
By default, all extensions to the AUC are based on the last three points. Looking at the graph for
these data, it seems more appropriate to use the last seven points to estimate the AUC0,∞:
. pkexamine time conc, fit(7)
Maximum concentration = 4.7
Time of maximum concentration = 3
Time of last observation (Tmax) = 32
Elimination rate = 0.0349
Half life = 19.8354
Area under the curve
       AUC [0, Tmax]    AUC [0, inf.)          AUC [0, inf.)    AUC [0, inf.)
                        Linear of log conc.    Linear fit       Exponential fit
           85.24            131.027                96.805          129.181

Fit based on last 7 points.
This approach decreased the estimate of the AUC0,∞ for all extensions. To see a graph of the AUC0,∞
using a linear extension, specify the graph and line options.
. pkexamine time conc, fit(7) graph line
Maximum concentration = 4.7
Time of maximum concentration = 3
Time of last observation (Tmax) = 32
Elimination rate = 0.0349
Half life = 19.8354
Area under the curve
       AUC [0, Tmax]    AUC [0, inf.)          AUC [0, inf.)    AUC [0, inf.)
                        Linear of log conc.    Linear fit       Exponential fit
           85.24            131.027                96.805          129.181

Fit based on last 7 points.
[Graph omitted: concentration-time plot with the linear extension of the AUC0,∞; y axis Concentration, x axis Analysis Time]
Stored results
pkexamine stores the following in r():
Scalars
r(auc) area under the concentration curve
r(half) half-life of the drug
r(ke) elimination rate
r(tmax) time at last concentration measurement
r(cmax) maximum concentration
r(tomc) time of maximum concentration
r(auc_line)    AUC0,∞ estimated with a linear fit
r(auc_exp)     AUC0,∞ estimated with an exponential fit
r(auc_ln)      AUC0,∞ estimated with a linear fit of the natural log
Methods and formulas
Let i index the observations sorted by time, let k be the number of observations, and let f be the
number of points specified in the fit(#) option.

The AUC0,tmax is defined as

    \text{AUC}_{0,t_{\max}} = \int_0^{t_{\max}} C_t\, dt

where C_t is the concentration at time t. By default, the integral is calculated numerically using cubic
splines. However, if the trapezoidal rule is used, the AUC0,tmax is given as

    \text{AUC}_{0,t_{\max}} = \sum_{i=2}^{k} \frac{C_{i-1} + C_i}{2}\, \left( t_i - t_{i-1} \right)
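As an illustration of this sum (not needed in practice, because specifying the trapezoid option makes pkexamine compute it), the trapezoidal AUC for the single-subject auc dataset of example 1 can be accumulated by hand; a sketch assuming the data are sorted by time:
. use http://www.stata-press.com/data/r13/auc, clear
. sort time
. generate double trap = (concentration + concentration[_n-1])/2*(time - time[_n-1])
. quietly summarize trap
. display r(sum)
The displayed sum should agree with the AUC [0, Tmax] that pkexamine reports when the trapezoid option is specified.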
The AUC0,∞ is the AUC0,tmax + AUCtmax,∞, or

    \text{AUC}_{0,\infty} = \int_0^{t_{\max}} C_t\, dt + \int_{t_{\max}}^{\infty} C_t\, dt

When using the linear extension to the AUC0,tmax, the integration is cut off when the line crosses
the x axis. The log extension is a linear extension on the log-concentration scale. The area for the
exponential extension is

    \text{AUC}_{0,\infty} = \int_{t_{\max}}^{\infty} e^{\beta_0 + t\beta_1}\, dt = -\frac{e^{\beta_0 + t_{\max}\beta_1}}{\beta_1}

The elimination rate K_eq is the negative of the slope from a linear regression of log concentration
on time fit to the number of points specified in the fit(#) option:

    K_{\text{eq}} = -\,\frac{\sum_{i=k-f+1}^{k} \left( t_i - \bar{t}\, \right)\left( \ln C_i - \overline{\ln C}\, \right)}{\sum_{i=k-f+1}^{k} \left( t_i - \bar{t}\, \right)^2}

The half-life is

    t_{\text{half}} = \frac{\ln 2}{K_{\text{eq}}}
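These last two formulas can be checked against example 1, where the default fit(3) was used, by regressing log concentration on time over the last three observations; a sketch assuming the data are sorted by time as in the listing above:
. use http://www.stata-press.com/data/r13/auc, clear
. generate lnconc = ln(concentration)
. * in -3/l restricts the regression to the last three observations, matching fit(3)
. regress lnconc time in -3/l
. display "Keq = " -_b[time]
. display "half-life = " ln(2)/(-_b[time])
The displayed values should reproduce the elimination rate of 0.0279 and the half-life of 24.8503 reported by pkexamine above.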
Reference
Chow, S.-C., and J.-P. Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Also see
[R] pk  Pharmacokinetic (biopharmaceutical) data
Title
pkshape — Reshape (pharmacokinetic) Latin-square data
Syntax Menu Description Options
Remarks and examples References Also see
Syntax
pkshape id sequence period1 period2 [period list] [, options]
options Description
order(string)        apply treatments in specified order
outcome(newvar)      name for outcome variable; default is outcome(outcome)
treatment(newvar)    name for treatment variable; default is treatment(treat)
carryover(newvar)    name for carryover variable; default is carryover(carry)
sequence(newvar)     name for sequence variable; default is sequence(sequence)
period(newvar)       name for period variable; default is period(period)
Menu
Statistics > Epidemiology and related > Other > Reshape pharmacokinetic latin-square data
Description
pkshape reshapes the data for use with anova, pkcross, and pkequiv; see [R] anova, [R] pkcross,
and [R] pkequiv. Latin-square and crossover data are often organized in a manner that cannot be
analyzed easily with Stata. pkshape reorganizes the data in memory for use in Stata.
pkshape is one of the pk commands. Please read [R]pk before reading this entry.
Options
order(string) specifies the order in which treatments were applied. If the sequence() specifier is a
string variable that specifies the order, this option is not necessary. Otherwise, order() specifies
how to generate the treatment and carryover variables. Any string variable can be used to specify
the order. For crossover designs, any washout periods can be indicated with the number 0.
outcome(newvar) specifies the name for the outcome variable in the reorganized data. By default,
outcome(outcome) is used.
treatment(newvar) specifies the name for the treatment variable in the reorganized data. By default,
treatment(treat) is used.
carryover(newvar) specifies the name for the carryover variable in the reorganized data. By default,
carryover(carry) is used.
sequence(newvar) specifies the name for the sequence variable in the reorganized data. By default,
sequence(sequence) is used.
period(newvar) specifies the name for the period variable in the reorganized data. By default,
period(period) is used.
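For example, all the default names can be overridden in one call; a sketch with hypothetical replacement names (compare example 3 below, which renames the sequence, period, and treatment variables):
. * auc, drug, resid, grp, and visit are hypothetical names chosen for illustration
. pkshape id seq period1 period2, order(ab ba) outcome(auc) treatment(drug)
>      carryover(resid) sequence(grp) period(visit)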
Remarks and examples
Often data from a Latin-square experiment are naturally organized in a manner that Stata cannot
manage easily. pkshape reorganizes Latin-square data so that they can be used with anova (see
[R]anova) or any pk command. This includes the classic 2 ×2 crossover design commonly used in
pharmaceutical research, as well as many other Latin-square designs.
Example 1
Consider the example data published in Chow and Liu (2009, 71). There are 24 patients, 12 in
each sequence. Sequence 1 consists of the reference formulation followed by the test formulation;
sequence 2 is the test formulation followed by the reference formulation. The measurements reported
are the AUC0,tmax for each patient and for each period.
. use http://www.stata-press.com/data/r13/chowliu
. list, sep(4)
id seq period1 period2
1. 1 1 74.675 73.675
2. 4 1 96.4 93.25
3. 5 1 101.95 102.125
4. 6 1 79.05 69.45
5. 11 1 79.05 69.025
6. 12 1 85.95 68.7
7. 15 1 69.725 59.425
8. 16 1 86.275 76.125
9. 19 1 112.675 114.875
10. 20 1 99.525 116.25
11. 23 1 89.425 64.175
12. 24 1 55.175 74.575
13. 2 2 74.825 37.35
14. 3 2 86.875 51.925
15. 7 2 81.675 72.175
16. 8 2 92.7 77.5
17. 9 2 50.45 71.875
18. 10 2 66.125 94.025
19. 13 2 122.45 124.975
20. 14 2 99.075 85.225
21. 17 2 86.35 95.925
22. 18 2 49.925 67.1
23. 21 2 42.7 59.425
24. 22 2 91.725 114.05
Because the outcome for one person is in two different variables, the treatment that was applied to
an individual is a function of the period and the sequence. To analyze this treatment using anova, all
the outcomes must be in one variable, and each covariate must be in its own variable. To reorganize
these data, use pkshape:
. pkshape id seq period1 period2, order(ab ba)
. sort seq id treat
. list, sep(8)
id sequence outcome treat carry period
1. 1 1 74.675 1 0 1
2. 1 1 73.675 2 1 2
3. 4 1 96.4 1 0 1
4. 4 1 93.25 2 1 2
5. 5 1 101.95 1 0 1
6. 5 1 102.125 2 1 2
7. 6 1 79.05 1 0 1
8. 6 1 69.45 2 1 2
9. 11 1 79.05 1 0 1
10. 11 1 69.025 2 1 2
11. 12 1 85.95 1 0 1
12. 12 1 68.7 2 1 2
13. 15 1 69.725 1 0 1
14. 15 1 59.425 2 1 2
15. 16 1 86.275 1 0 1
16. 16 1 76.125 2 1 2
17. 19 1 112.675 1 0 1
18. 19 1 114.875 2 1 2
19. 20 1 99.525 1 0 1
20. 20 1 116.25 2 1 2
21. 23 1 89.425 1 0 1
22. 23 1 64.175 2 1 2
23. 24 1 55.175 1 0 1
24. 24 1 74.575 2 1 2
25. 2 2 37.35 1 2 2
26. 2 2 74.825 2 0 1
27. 3 2 51.925 1 2 2
28. 3 2 86.875 2 0 1
29. 7 2 72.175 1 2 2
30. 7 2 81.675 2 0 1
31. 8 2 77.5 1 2 2
32. 8 2 92.7 2 0 1
33. 9 2 71.875 1 2 2
34. 9 2 50.45 2 0 1
35. 10 2 94.025 1 2 2
36. 10 2 66.125 2 0 1
37. 13 2 124.975 1 2 2
38. 13 2 122.45 2 0 1
39. 14 2 85.225 1 2 2
40. 14 2 99.075 2 0 1
41. 17 2 95.925 1 2 2
42. 17 2 86.35 2 0 1
43. 18 2 67.1 1 2 2
44. 18 2 49.925 2 0 1
45. 21 2 59.425 1 2 2
46. 21 2 42.7 2 0 1
47. 22 2 114.05 1 2 2
48. 22 2 91.725 2 0 1
Now the data are organized into separate variables that indicate each factor level for each of the
covariates, so the data may be used with anova or pkcross; see [R]anova and [R]pkcross.
Example 2
Consider the study of background music on bank teller productivity published in Kutner et al. (2005).
The data are
Week Monday Tuesday Wednesday Thursday Friday
1 18(D) 17(C) 14(A) 21(B) 17(E)
2 13(C) 34(B) 21(E) 16(A) 15(D)
3 7(A) 29(D) 32(B) 27(E) 13(C)
4 17(E) 13(A) 24(C) 31(D) 25(B)
5 21(B) 26(E) 26(D) 31(C) 7(A)
The numbers are the productivity scores, and the letters represent the treatment. We entered the
data into Stata:
. use http://www.stata-press.com/data/r13/music, clear
. list
id seq day1 day2 day3 day4 day5
1. 1 dcabe 18 17 14 21 17
2. 2 cbead 13 34 21 16 15
3. 3 adbec 7 29 32 27 13
4. 4 eacdb 17 13 24 31 25
5. 5 bedca 21 26 26 31 7
We reshape these data with pkshape:
. pkshape id seq day1 day2 day3 day4 day5
. list, sep(0)
id sequence outcome treat carry period
1. 3 1 7 1 0 1
2. 5 2 21 3 0 1
3. 2 3 13 5 0 1
4. 1 4 18 2 0 1
5. 4 5 17 4 0 1
6. 3 1 29 2 1 2
7. 5 2 26 4 3 2
8. 2 3 34 3 5 2
9. 1 4 17 5 2 2
10. 4 5 13 1 4 2
11. 3 1 32 3 2 3
12. 5 2 26 2 4 3
13. 2 3 21 4 3 3
14. 1 4 14 1 5 3
15. 4 5 24 5 1 3
16. 3 1 27 4 3 4
17. 5 2 31 5 2 4
18. 2 3 16 1 4 4
19. 1 4 21 3 1 4
20. 4 5 31 2 5 4
21. 3 1 13 5 4 5
22. 5 2 7 1 5 5
23. 2 3 15 2 1 5
24. 1 4 17 4 3 5
25. 4 5 25 3 2 5
Here the sequence variable is a string variable that specifies how the treatments were applied, so
the order option is not used. When the sequence variable is a string and the order is specified, the
arguments from the order option are used. We could now produce an ANOVA table:
. anova outcome seq period treat
Number of obs = 25 R-squared = 0.8666
Root MSE = 3.96232 Adj R-squared = 0.7331
Source Partial SS df MS F Prob > F
Model 1223.6 12 101.966667 6.49 0.0014
sequence 82 4 20.5 1.31 0.3226
period 477.2 4 119.3 7.60 0.0027
treat 664.4 4 166.1 10.58 0.0007
Residual 188.4 12 15.7
Total 1412 24 58.8333333
Example 3
Consider the Latin-square crossover example published in Kutner et al. (2005). The example is
about apple sales given different methods for displaying apples.
Pattern   Store   Week 1   Week 2   Week 3
   1        1      9(B)    12(C)    15(A)
            2      4(B)    12(C)     9(A)
   2        1     12(A)    14(B)     3(C)
            2     13(A)    14(B)     3(C)
   3        1      7(C)    18(A)     6(B)
            2      5(C)    20(A)     4(B)
We entered the data into Stata:
. use http://www.stata-press.com/data/r13/applesales, clear
. list, sep(2)
id seq p1 p2 p3 square
1. 1 1 9 12 15 1
2. 2 1 4 12 9 2
3. 3 2 12 14 3 1
4. 4 2 13 14 3 2
5. 5 3 7 18 6 1
6. 6 3 5 20 4 2
Now the data can be reorganized using descriptive names for the outcome variables.
. pkshape id seq p1 p2 p3, order(bca abc cab) seq(pattern) period(order)
> treat(displays)
. anova outcome pattern order display id|pattern
Number of obs = 18 R-squared = 0.9562
Root MSE = 1.59426 Adj R-squared = 0.9069
Source Partial SS df MS F Prob > F
Model 443.666667 9 49.2962963 19.40 0.0002
pattern .333333333 2 .166666667 0.07 0.9370
order 233.333333 2 116.666667 45.90 0.0000
displays 189 2 94.5 37.18 0.0001
id|pattern 21 3 7 2.75 0.1120
Residual 20.3333333 8 2.54166667
Total 464 17 27.2941176
These are the same results reported by Kutner et al. (2005).
Example 4
We continue with example 1 from [R]pkcollapse; the data are
. use http://www.stata-press.com/data/r13/pkdata2, clear
. list, sep(4) abbrev(10)
id seq auc_concA auc_concB
1. 1 1 150.9643 218.5551
2. 2 1 146.7606 133.3201
3. 3 1 160.6548 126.0635
4. 4 1 157.8622 96.17461
5. 5 1 133.6957 188.9038
6. 7 1 160.639 223.6922
7. 8 1 131.2604 104.0139
8. 9 1 168.5186 237.8962
9. 10 2 137.0627 139.7382
10. 12 2 153.4038 202.3942
11. 13 2 163.4593 136.7848
12. 14 2 146.0462 104.5191
13. 15 2 158.1457 165.8654
14. 18 2 147.1977 139.235
15. 19 2 164.9988 166.2391
16. 20 2 145.3823 158.5146
. pkshape id seq auc_concA auc_concB, order(ab ba)
. sort period id
. list, sep(4)
id sequence outcome treat carry period
1. 1 1 150.9643 1 0 1
2. 2 1 146.7606 1 0 1
3. 3 1 160.6548 1 0 1
4. 4 1 157.8622 1 0 1
5. 5 1 133.6957 1 0 1
6. 7 1 160.639 1 0 1
7. 8 1 131.2604 1 0 1
8. 9 1 168.5186 1 0 1
9. 10 2 137.0627 2 0 1
10. 12 2 153.4038 2 0 1
11. 13 2 163.4593 2 0 1
12. 14 2 146.0462 2 0 1
13. 15 2 158.1457 2 0 1
14. 18 2 147.1977 2 0 1
15. 19 2 164.9988 2 0 1
16. 20 2 145.3823 2 0 1
17. 1 1 218.5551 2 1 2
18. 2 1 133.3201 2 1 2
19. 3 1 126.0635 2 1 2
20. 4 1 96.17461 2 1 2
21. 5 1 188.9038 2 1 2
22. 7 1 223.6922 2 1 2
23. 8 1 104.0139 2 1 2
24. 9 1 237.8962 2 1 2
25. 10 2 139.7382 1 2 2
26. 12 2 202.3942 1 2 2
27. 13 2 136.7848 1 2 2
28. 14 2 104.5191 1 2 2
29. 15 2 165.8654 1 2 2
30. 18 2 139.235 1 2 2
31. 19 2 166.2391 1 2 2
32. 20 2 158.5146 1 2 2
We call the resulting dataset pkdata3. We conduct equivalence testing on the data in [R]pkequiv,
and we fit an ANOVA model to these data in the third example of [R]pkcross.
References
Chow, S.-C., and J.-P. Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. 2005. Applied Linear Statistical Models. 5th ed. New York:
McGraw–Hill/Irwin.
Also see
[R] pk  Pharmacokinetic (biopharmaceutical) data
Title
pksumm — Summarize pharmacokinetic data
Syntax Menu Description Options
Remarks and examples Methods and formulas Also see
Syntax
pksumm id time concentration [if] [in] [, options]
options Description
Main
trapezoid           use trapezoidal rule to calculate AUC; default is cubic splines
fit(#)              use # points to estimate AUC; default is fit(3)
notimechk           do not check whether follow-up time for all subjects is the same
nodots              suppress the dots during calculation
graph               graph the distribution of statistic
stat(statistic)     graph the specified statistic; default is stat(auc)
Histogram, Density plots, Y axis, X axis, Titles, Legend, Overall
histogram options   any option other than by() documented in [R] histogram

statistic           Description
auc                 area under the concentration-time curve (AUC0,tmax); the default
aucline             area under the concentration-time curve from 0 to ∞ using a linear extension
aucexp              area under the concentration-time curve from 0 to ∞ using an exponential extension
auclog              area under the log-concentration-time curve extended with a linear fit
half                half-life of the drug
ke                  elimination rate
cmax                maximum concentration
tmax                time at last concentration
tomc                time of maximum concentration
Menu
Statistics > Epidemiology and related > Other > Summarize pharmacokinetic data
Description
pksumm obtains summary measures based on the first four moments from the empirical distribution
of each pharmacokinetic measurement and tests the null hypothesis that the distribution of that
measurement is normally distributed.
pksumm is one of the pk commands. Please read [R]pk before reading this entry.
Options
 
Main
trapezoid specifies that the trapezoidal rule be used to calculate the AUC. The default is cubic
splines, which give better results for most situations. When the curve is irregular, the trapezoidal
rule may give better results.
fit(#) specifies the number of points, counting back from the last time measurement, to use in
fitting the extension to estimate the AUC0,∞. The default is fit(3), the last three points. This
default should be viewed as a minimum; the appropriate number of points will depend on the data.
notimechk suppresses the check that the follow-up time for all subjects is the same. By default,
pksumm expects the maximum follow-up time to be equal for all subjects.
nodots suppresses the progress dots during calculation. By default, a period is displayed for every
call to calculate the pharmacokinetic measures.
graph requests a graph of the distribution of the statistic specified with stat().
stat(statistic) specifies the statistic that pksumm should graph. The default is stat(auc). If the
graph option is not specified, this option is ignored.
 
Histogram, Density plots, Y axis, X axis, Titles, Legend, Overall
histogram options are any of the options documented in [R]histogram, excluding by(). For pksumm,
fraction is the default, not density.
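For example, to summarize the measures using the trapezoidal rule and graph the distribution of the estimated half-life rather than the default AUC, one might type the following; a sketch using the id, time, and conc variables from example 1 below:
. pksumm id time conc, trapezoid stat(half) graph bin(10)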
Remarks and examples
pksumm produces summary statistics for the distribution of nine common pharmacokinetic measurements.
If there are more than eight subjects, pksumm also computes a test for normality on each
measurement. The nine measurements summarized by pksumm are listed above and are described in
Methods and formulas of [R] pkexamine.
Example 1
We demonstrate the use of pksumm on a variation of the data described in [R]pk. We have drug
concentration data on 15 subjects, each measured at 13 time points over a 32-hour period. A few of
the records are
. use http://www.stata-press.com/data/r13/pksumm
. list, sep(0)
id time conc
1. 1 0 0
2. 1 .5 3.073403
3. 1 1 5.188444
4. 1 1.5 5.898577
5. 1 2 5.096378
6. 1 3 6.094085
(output omitted )
183. 15 0 0
184. 15 .5 3.86493
185. 15 1 6.432444
186. 15 1.5 6.969195
187. 15 2 6.307024
188. 15 3 6.509584
189. 15 4 6.555091
190. 15 6 7.318319
191. 15 8 5.329813
192. 15 12 5.411624
193. 15 16 3.891397
194. 15 24 5.167516
195. 15 32 2.649686
We can use pksumm to view the summary statistics for all the pharmacokinetic parameters.
. pksumm id time conc
...............
Summary statistics for the pharmacokinetic measures
Number of observations = 15
Measure Mean Median Variance Skewness Kurtosis p-value
auc 150.74 150.96 123.07 -0.26 2.10 0.69
aucline 408.30 214.17 188856.87 2.57 8.93 0.00
aucexp 691.68 297.08 762679.94 2.56 8.87 0.00
auclog 688.98 297.67 797237.24 2.59 9.02 0.00
half 94.84 29.39 18722.13 2.26 7.37 0.00
ke 0.02 0.02 0.00 0.89 3.70 0.09
cmax 7.36 7.42 0.42 -0.60 2.56 0.44
tomc 3.47 3.00 7.62 2.17 7.18 0.00
tmax 32.00 32.00 0.00 . . .
For the 15 subjects, the mean AUC0,tmax is 150.74, and σ² = 123.07. The skewness of −0.26 indicates
that the distribution is slightly skewed left. The p-value of 0.69 for the χ² test of normality indicates
that we cannot reject the null hypothesis that the distribution is normal.
If we were to consider any of the three variants of the AUC0,∞, we would see that there is huge
variability and that the distribution is heavily skewed. A skewness different from 0 and a kurtosis
different from 3 are expected because the distribution of the AUC0,∞ is not normal.
We now graph the distribution of AUC0,tmax by specifying the graph option.
. pksumm id time conc, graph bin(20)
...............
Summary statistics for the pharmacokinetic measures
Number of observations = 15
Measure Mean Median Variance Skewness Kurtosis p-value
auc 150.74 150.96 123.07 -0.26 2.10 0.69
aucline 408.30 214.17 188856.87 2.57 8.93 0.00
aucexp 691.68 297.08 762679.94 2.56 8.87 0.00
auclog 688.98 297.67 797237.24 2.59 9.02 0.00
half 94.84 29.39 18722.13 2.26 7.37 0.00
ke 0.02 0.02 0.00 0.89 3.70 0.09
cmax 7.36 7.42 0.42 -0.60 2.56 0.44
tomc 3.47 3.00 7.62 2.17 7.18 0.00
tmax 32.00 32.00 0.00 . . .
[Histogram omitted: y axis Fraction, x axis Area Under Curve (AUC)]
graph, by default, plots AUC0,tmax. To plot a graph of one of the other pharmacokinetic measurements,
we need to specify the stat() option. For example, we can ask Stata to produce a plot of the AUC0,∞
using the log extension:
. pksumm id time conc, stat(auclog) graph bin(20)
...............
Summary statistics for the pharmacokinetic measures
Number of observations = 15
Measure Mean Median Variance Skewness Kurtosis p-value
auc 150.74 150.96 123.07 -0.26 2.10 0.69
aucline 408.30 214.17 188856.87 2.57 8.93 0.00
aucexp 691.68 297.08 762679.94 2.56 8.87 0.00
auclog 688.98 297.67 797237.24 2.59 9.02 0.00
half 94.84 29.39 18722.13 2.26 7.37 0.00
ke 0.02 0.02 0.00 0.89 3.70 0.09
cmax 7.36 7.42 0.42 -0.60 2.56 0.44
tomc 3.47 3.00 7.62 2.17 7.18 0.00
tmax 32.00 32.00 0.00 . . .
[Histogram omitted: y axis Fraction, x axis Linear fit to log concentration AUC for AUC 0-inf.]
Methods and formulas
The χ² test for normality is conducted with sktest; see [R] sktest for more information on the
test of normality.
The statistics reported by pksumm are identical to those reported by summarize and sktest; see
[R] summarize and [R] sktest.
Also see
[R] pk  Pharmacokinetic (biopharmaceutical) data
Title
poisson — Poisson regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
poisson depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant                 suppress constant term
exposure(varname_e)        include ln(varname_e) in model with coefficient constrained to 1
offset(varname_o)          include varname_o in model with coefficient constrained to 1
constraints(constraints)   apply specified linear constraints
collinear                  keep collinear variables
SE/Robust
vce(vcetype)               vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                             or jackknife
Reporting
level(#)                   set confidence level; default is level(95)
irr                        report incidence-rate ratios
nocnsreport                do not display constraints
display_options            control column formats, row spacing, line width, display of omitted
                             variables and base and empty cells, and factor-variable labeling
Maximization
maximize_options           control the maximization process; seldom used
coeflegend                 display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, varname_e, and varname_o may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Count outcomes > Poisson regression
Description
poisson fits a Poisson regression of depvar on indepvars, where depvar is a nonnegative count
variable.
If you have panel data, see [XT]xtpoisson.
Options
 
Model
noconstant, exposure(varname_e), offset(varname_o), constraints(constraints), collinear;
see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R]estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^{β_i} rather than β_i.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. irr may be specified at estimation or when
replaying previously estimated results.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with poisson but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
The basic idea of Poisson regression was outlined by Coleman (1964, 378–379). See Cameron
and Trivedi (2013;2010, chap. 17) and Johnson, Kemp, and Kotz (2005, chap. 4) for information
about the Poisson distribution. See Cameron and Trivedi (2013), Long (1997, chap. 8), Long and
Freese (2014, chap. 9), McNeil (1996, chap. 6), and Selvin (2011, chap. 6) for an introduction
to Poisson regression. Also see Selvin (2004, chap. 5) for a discussion of the analysis of spatial
distributions, which includes a discussion of the Poisson distribution. An early example of Poisson
regression was Cochran (1940).
Poisson regression fits models of the number of occurrences (counts) of an event. The Poisson
distribution has been applied to diverse events, such as the number of soldiers kicked to death by
horses in the Prussian army (von Bortkiewicz 1898); the pattern of hits by buzz bombs launched
against London during World War II (Clarke 1946); telephone connections to a wrong number
(Thorndike 1926); and disease incidence, typically with respect to time, but occasionally with respect
to space. The basic assumptions are as follows:
1. There is a quantity called the incidence rate that is the rate at which events occur. Examples
are 5 per second, 20 per 1,000 person-years, 17 per square meter, and 38 per cubic centimeter.
2. The incidence rate can be multiplied by exposure to obtain the expected number of observed
events. For example, a rate of 5 per second multiplied by 30 seconds means that 150 events
are expected; a rate of 20 per 1,000 person-years multiplied by 2,000 person-years means that
40 events are expected; and so on.
3. Over very small exposures ε, the probability of finding more than one event is small compared
with ε.
4. Nonoverlapping exposures are mutually independent.
With these assumptions, to find the probability of k events in an exposure of size E, you divide
E into n subintervals E_1, E_2, ..., E_n, and approximate the answer as the binomial probability of
observing k successes in n trials. If you let n → ∞, you obtain the Poisson distribution.
In the Poisson regression model, the incidence rate for the jth observation is assumed to be given
by

    r_j = e^{\beta_0 + \beta_1 x_{1,j} + \cdots + \beta_k x_{k,j}}

If E_j is the exposure, the expected number of events, C_j, will be

    C_j = E_j\, e^{\beta_0 + \beta_1 x_{1,j} + \cdots + \beta_k x_{k,j}} = e^{\ln(E_j) + \beta_0 + \beta_1 x_{1,j} + \cdots + \beta_k x_{k,j}}
This model is fit by poisson. Without the exposure() or offset() options, E_j is assumed to be
1 (equivalent to assuming that exposure is unknown), and controlling for exposure, if necessary, is
your responsibility.
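In practice, the two ways of handling a known exposure E_j look like this; a sketch using the Doll and Hill variables introduced in example 2 below, where the second form supplies the log of exposure as an offset() with its coefficient constrained to 1, which is what exposure() does automatically:
. poisson deaths smokes i.agecat, exposure(pyears) irr
. * lnpyears is a new variable created here only for illustration
. generate lnpyears = ln(pyears)
. poisson deaths smokes i.agecat, offset(lnpyears) irr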
Comparing rates is most easily done by calculating incidence-rate ratios (IRRs). For instance,
what is the relative incidence rate of chromosome interchanges in cells as the intensity of radiation
increases; the relative incidence rate of telephone connections to a wrong number as load increases;
or the relative incidence rate of deaths due to cancer for females relative to males? That is, you want
to hold all the xs in the model constant except one, say, the ith. The IRR for a one-unit change in
x_i is

    \frac{e^{\ln(E) + \beta_1 x_1 + \cdots + \beta_i (x_i + 1) + \cdots + \beta_k x_k}}{e^{\ln(E) + \beta_1 x_1 + \cdots + \beta_i x_i + \cdots + \beta_k x_k}} = e^{\beta_i}
More generally, the IRR for a Δx_i change in x_i is e^{β_i Δx_i}. The lincom command can be used after
poisson to display incidence-rate ratios for any group relative to another; see [R] lincom.
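For instance, after fitting a Poisson model that includes a continuous covariate x (a hypothetical variable name), the IRR for a 3-unit increase in x could be displayed with
. * x stands in for whatever covariate appears in the fitted model
. lincom 3*x, irr
which reports e^{3β_x} together with its confidence interval.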
Example 1
Chatterjee and Hadi (2012, 174) give the number of injury incidents and the proportion of flights
for each airline out of the total number of flights from New York for nine major U.S. airlines in one
year:
. use http://www.stata-press.com/data/r13/airline
. list
airline injuries n XYZowned
1. 1 11 0.0950 1
2. 2 7 0.1920 0
3. 3 7 0.0750 0
4. 4 19 0.2078 0
5. 5 9 0.1382 0
6. 6 4 0.0540 1
7. 7 3 0.1292 0
8. 8 1 0.0503 0
9. 9 3 0.0629 1
To their data, we have added a fictional variable, XYZowned. We will imagine that an accusation is
made that the airlines owned by XYZ Company have a higher injury rate.
. poisson injuries XYZowned, exposure(n) irr
Iteration 0: log likelihood = -23.027197
Iteration 1: log likelihood = -23.027177
Iteration 2: log likelihood = -23.027177
Poisson regression Number of obs = 9
LR chi2(1) = 1.77
Prob > chi2 = 0.1836
Log likelihood = -23.027177 Pseudo R2 = 0.0370
injuries IRR Std. Err. z P>|z| [95% Conf. Interval]
XYZowned 1.463467 .406872 1.37 0.171 .8486578 2.523675
_cons 58.04416 8.558145 27.54 0.000 43.47662 77.49281
ln(n) 1 (exposure)
We specified irr to see the IRRs rather than the underlying coefficients. We estimate that XYZ Airlines’
injury rate is 1.46 times larger than that for other airlines, but the 95% confidence interval is 0.85 to
2.52; we cannot even reject the hypothesis that XYZ Airlines has a lower injury rate.
Technical note
In example 1, we assumed that each airline’s exposure was proportional to its fraction of flights
out of New York. What if “large” airlines, however, also used larger planes, and so had even more
passengers than would be expected, given this measure of exposure? A better measure would be each
airline’s fraction of passengers on flights out of New York, a number that we do not have. Even so,
we suppose that n represents this number to some extent, so a better estimate of the effect might be
. gen lnN=ln(n)
. poisson injuries XYZowned lnN
Iteration 0: log likelihood = -22.333875
Iteration 1: log likelihood = -22.332276
Iteration 2: log likelihood = -22.332276
Poisson regression Number of obs = 9
LR chi2(2) = 19.15
Prob > chi2 = 0.0001
Log likelihood = -22.332276 Pseudo R2 = 0.3001
injuries Coef. Std. Err. z P>|z| [95% Conf. Interval]
XYZowned .6840667 .3895877 1.76 0.079 -.0795111 1.447645
lnN 1.424169 .3725155 3.82 0.000 .6940517 2.154285
_cons 4.863891 .7090501 6.86 0.000 3.474178 6.253603
Here rather than specifying the exposure() option, we explicitly included the variable that would
normalize for exposure in the model. We did not specify the irr option, so we see coefficients rather
than IRRs. We started with the model

    \text{rate} = e^{\beta_0 + \beta_1 \text{XYZowned}}

The observed counts are therefore

    \text{count} = n\, e^{\beta_0 + \beta_1 \text{XYZowned}} = e^{\ln(n) + \beta_0 + \beta_1 \text{XYZowned}}

which amounts to constraining the coefficient on ln(n) to 1. This is what was estimated when
we specified the exposure(n) option. In the above model, we included the normalizing exposure
ourselves and, rather than constraining the coefficient to be 1, estimated the coefficient.
The estimated coefficient is 1.42, a respectable distance away from 1, and is consistent with our
speculation that larger airlines also use larger airplanes. With this small amount of data, however, we
also have a wide confidence interval that includes 1.
Our estimated coefficient on XYZowned is now 0.684, and the implied IRR is e^{0.684} ≈ 1.98 (which
we could also see by typing poisson, irr). The 95% confidence interval for the coefficient still
includes 0 (the interval for the IRR includes 1), so although the point estimate is now larger, we still
cannot be certain of our results.
Our expert opinion would be that, although there is not enough evidence to support the charge,
there is enough evidence to justify collecting more data.
Example 2
In a famous age-specific study of coronary disease deaths among male British doctors, Doll and
Hill (1966) reported the following data (reprinted in Rothman, Greenland, and Lash [2008, 264]):
                   Smokers                    Nonsmokers
  Age        Deaths   Person-years      Deaths   Person-years
  35–44         32        52,407            2        18,790
  45–54        104        43,248           12        10,673
  55–64        206        28,612           28         5,710
  65–74        186        12,663           28         2,585
  75–84        102         5,317           31         1,462
The first step is to enter these data into Stata, which we have done:
. use http://www.stata-press.com/data/r13/dollhill3, clear
. list
agecat smokes deaths pyears
1. 35-44 1 32 52,407
2. 45-54 1 104 43,248
3. 55-64 1 206 28,612
4. 65-74 1 186 12,663
5. 75-84 1 102 5,317
6. 35-44 0 2 18,790
7. 45-54 0 12 10,673
8. 55-64 0 28 5,710
9. 65-74 0 28 2,585
10. 75-84 0 31 1,462
The most “natural” analysis of these data would begin by introducing indicator variables for each age
category and one indicator for smoking:
. poisson deaths smokes i.agecat, exposure(pyears) irr
Iteration 0: log likelihood = -33.823284
Iteration 1: log likelihood = -33.600471
Iteration 2: log likelihood = -33.600153
Iteration 3: log likelihood = -33.600153
Poisson regression Number of obs = 10
LR chi2(5) = 922.93
Prob > chi2 = 0.0000
Log likelihood = -33.600153 Pseudo R2 = 0.9321
deaths IRR Std. Err. z P>|z| [95% Conf. Interval]
smokes 1.425519 .1530638 3.30 0.001 1.154984 1.759421
agecat
45-54 4.410584 .8605197 7.61 0.000 3.009011 6.464997
55-64 13.8392 2.542638 14.30 0.000 9.654328 19.83809
65-74 28.51678 5.269878 18.13 0.000 19.85177 40.96395
75-84 40.45121 7.775511 19.25 0.000 27.75326 58.95885
_cons .0003636 .0000697 -41.30 0.000 .0002497 .0005296
ln(pyears) 1 (exposure)
In the above, we specified irr to obtain IRRs. We estimate that smokers have 1.43 times the mortality
rate of nonsmokers. See, however, example 1 in [R]poisson postestimation.
Stored results
poisson stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k eq) number of equations in e(b)
e(k eq model) number of equations in overall model test
e(k dv) number of dependent variables
e(df m) model degrees of freedom
e(r2 p) pseudo-R-squared
e(ll) log likelihood
e(ll 0) log likelihood, constant-only model
e(N clust) number of clusters
e(chi2) χ2
e(p) significance
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) poisson
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(estat cmd) program used to implement estat
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The log likelihood (with weights w_j and offsets) is given by

    \Pr(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}

    \xi_j = \mathbf{x}_j \boldsymbol{\beta} + \text{offset}_j

    f(y_j) = \frac{e^{-\exp(\xi_j)}\, e^{\xi_j y_j}}{y_j!}

    \ln L = \sum_{j=1}^{n} w_j \left\{ -e^{\xi_j} + \xi_j y_j - \ln(y_j!) \right\}
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P]robust, particularly
Maximum likelihood estimators and Methods and formulas.
poisson also supports estimation with survey data. For details on VCEs with survey data, see
[SVY]variance estimation.
 
Siméon-Denis Poisson (1781–1840) was a French mathematician and physicist who contributed
to several fields: his name is perpetuated in Poisson brackets, Poisson's constant, Poisson's
differential equation, Poisson's integral, and Poisson's ratio. Among many other results, he
produced a version of the law of large numbers. His rather misleadingly titled Recherches sur la
probabilité des jugements embraces a complete treatise on probability, as the subtitle indicates,
including what is now known as the Poisson distribution. That, however, was discovered earlier
by the Huguenot–British mathematician Abraham de Moivre (1667–1754).
 
References
Bru, B. 2001. Siméon-Denis Poisson. In Statisticians of the Centuries, ed. C. C. Heyde and E. Seneta, 123–126.
New York: Springer.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.
Chatterjee, S., and A. S. Hadi. 2012. Regression Analysis by Example. 5th ed. Hoboken, NJ: Wiley.
Clarke, R. D. 1946. An application of the Poisson distribution. Journal of the Institute of Actuaries 72: 481.
Cochran, W. G. 1940. The analysis of variance when experimental errors follow the Poisson or binomial laws. Annals
of Mathematical Statistics 11: 335–347.
Cochran, W. G. 1982. Contributions to Statistics. New York: Wiley.
Coleman, J. S. 1964. Introduction to Mathematical Sociology. New York: Free Press.
Doll, R., and A. B. Hill. 1966. Mortality of British doctors in relation to smoking: Observations on coronary
thrombosis. Journal of the National Cancer Institute, Monographs 19: 205–268.
Gould, W. W. 2011. Use poisson rather than regress; tell a friend. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/.
Harris, T., Z. Yang, and J. W. Hardin. 2012. Modeling underdispersed count data with generalized Poisson regression.
Stata Journal 12: 736–747.
poisson — Poisson regression 1637
Hilbe, J. M. 1998. sg91: Robust variance estimators for MLE Poisson and negative binomial regression. Stata Technical
Bulletin 45: 26–28. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 177–180. College Station, TX: Stata Press.
Hilbe, J. M. 1999. sg102: Zero-truncated Poisson and negative binomial regression. Stata Technical Bulletin 47: 37–40.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 233–236. College Station, TX: Stata Press.
Hilbe, J. M., and D. H. Judson. 1998. sg94: Right, left, and uncensored Poisson regression. Stata Technical Bulletin
46: 18–20. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 186–189. College Station, TX: Stata Press.
Johnson, N. L., A. W. Kemp, and S. Kotz. 2005. Univariate Discrete Distributions. 3rd ed. New York: Wiley.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2001. Predicted probabilities for count models. Stata Journal 1: 51–57.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
McNeil, D. 1996. Epidemiological Research Methods. Chichester, UK: Wiley.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample
selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Newman, S. C. 2001. Biostatistical Methods in Epidemiology. New York: Wiley.
Poisson, S. D. 1837. Recherches sur la probabilité des jugements en matière criminelle et en matière civile: précédées
des règles générales du calcul des probabilités. Paris: Bachelier.
Raciborski, R. 2011. Right-censored Poisson regression model. Stata Journal 11: 95–105.
Rodríguez, G. 1993. sbe10: An improvement to poisson. Stata Technical Bulletin 11: 11–14. Reprinted in Stata
Technical Bulletin Reprints, vol. 2, pp. 94–98. College Station, TX: Stata Press.
Rogers, W. H. 1991. sbe1: Poisson regression with rates. Stata Technical Bulletin 1: 11–12. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, pp. 62–64. College Station, TX: Stata Press.
Rothman, K. J., S. Greenland, and T. L. Lash. 2008. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams
& Wilkins.
Rutherford, E., J. Chadwick, and C. D. Ellis. 1930. Radiations from Radioactive Substances. Cambridge: Cambridge
University Press.
Rutherford, M. J., P. C. Lambert, and J. R. Thompson. 2010. Age–period–cohort modeling. Stata Journal 10: 606–627.
Sasieni, P. D. 2012. Age–period–cohort models in Stata. Stata Journal 12: 45–60.
Schonlau, M. 2005. Boosted regression (boosting): An introductory tutorial and a Stata plugin. Stata Journal 5:
330–354.
Selvin, S. 2004. Statistical Analysis of Epidemiologic Data. 3rd ed. New York: Oxford University Press.
Selvin, S. 2011. Statistical Tools for Epidemiologic Research. New York: Oxford University Press.
Thorndike, F. 1926. Applications of Poisson’s probability summation. Bell System Technical Journal 5: 604–624.
Tobías, A., and M. J. Campbell. 1998. sg90: Akaike's information criterion and Schwarz's criterion. Stata Technical
Bulletin 45: 23–25. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 174–177. College Station, TX: Stata
Press.
von Bortkiewicz, L. 1898. Das Gesetz der Kleinen Zahlen. Leipzig: Teubner.
Also see
[R] poisson postestimation — Postestimation tools for poisson
[R] glm — Generalized linear models
[R] nbreg — Negative binomial regression
[R] tpoisson — Truncated Poisson regression
[R] zip — Zero-inflated Poisson regression
[ME] mepoisson — Multilevel mixed-effects Poisson regression
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtpoisson — Fixed-effects, random-effects, and population-averaged Poisson models
[U] 20 Estimation and postestimation commands
Title
poisson postestimation — Postestimation tools for poisson
Description Syntax for predict Menu for predict
Options for predict Syntax for estat gof Menu for estat
Remarks and examples Methods and formulas Also see
Description
The following postestimation command is of special interest after poisson:
Command Description
estat gof goodness-of-fit test
estat gof is not appropriate after the svy prefix.
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest² likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with mi or svy estimation results.
² lrtest is not appropriate with svy estimation results.
Special-interest postestimation command
estat gof performs a goodness-of-fit test of the model. Both the deviance statistic and the Pearson
statistic are reported. If the tests are significant, the Poisson regression model is inappropriate. Then
you could try a negative binomial model; see [R] nbreg.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

statistic     Description
Main
  n           number of events; the default
  ir          incidence rate
  pr(n)       probability Pr(y_j = n)
  pr(a,b)     probability Pr(a ≤ y_j ≤ b)
  xb          linear prediction
  stdp        standard error of the linear prediction
  score       first derivative of the log likelihood with respect to x_jβ
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is exp(x_jβ) if neither offset()
nor exposure() was specified when the model was fit; exp(x_jβ + offset_j) if offset() was
specified; or exp(x_jβ) × exposure_j if exposure() was specified.
ir calculates the incidence rate exp(x_jβ), which is the predicted number of events when exposure
is 1. Specifying ir is equivalent to specifying n when neither offset() nor exposure() was
specified when the model was fit.
pr(n) calculates the probability Pr(y_j = n), where n is a nonnegative integer that may be specified
as a number or a variable.
pr(a,b) calculates the probability Pr(a ≤ y_j ≤ b), where a and b are nonnegative integers that may
be specified as numbers or variables;
b missing (b ≥ .) means +∞;
pr(20,.) calculates Pr(y_j ≥ 20);
pr(20,b) calculates Pr(y_j ≥ 20) in observations for which b ≥ . and calculates
Pr(20 ≤ y_j ≤ b) elsewhere.
pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a
missing value in that observation for pr(a,b).
xb calculates the linear prediction, which is x_jβ if neither offset() nor exposure() was specified;
x_jβ + offset_j if offset() was specified; or x_jβ + ln(exposure_j) if exposure() was specified;
see nooffset below.
stdp calculates the standard error of the linear prediction.
score calculates the equation-level score, ∂ln L/∂(x_jβ).
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It
modifies the calculations made by predict so that they ignore the offset or exposure variable; the
linear prediction is treated as x_jβ rather than as x_jβ + offset_j or x_jβ + ln(exposure_j). Specifying
predict ..., nooffset is equivalent to specifying predict ..., ir.
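The following is a minimal sketch of these options using the smoking data analyzed in example 1
below (the variable names are arbitrary). It illustrates pr() and verifies that nooffset alone
reproduces ir, as just noted:
. use http://www.stata-press.com/data/r13/dollhill3, clear
. poisson deaths smokes i.agecat, exposure(pyears)
. predict double rate, ir              /* exp(x_j*b), ignoring exposure */
. predict double rate2, nooffset       /* default n, ignoring exposure */
. assert reldif(rate, rate2) < 1e-12
. predict double p20, pr(20,.)         /* Pr(deaths >= 20) */
. predict double p0to5, pr(0,5)        /* Pr(0 <= deaths <= 5) */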
Syntax for estat gof
estat gof
Menu for estat
Statistics > Postestimation > Reports and statistics
Remarks and examples
Example 1
Continuing with example 2 of [R] poisson, we use estat gof to determine whether the model
fits the data well.
. use http://www.stata-press.com/data/r13/dollhill3
. poisson deaths smokes i.agecat, exp(pyears) irr
(output omitted )
. estat gof
Deviance goodness-of-fit = 12.13244
Prob > chi2(4) = 0.0164
Pearson goodness-of-fit = 11.15533
Prob > chi2(4) = 0.0249
The deviance goodness-of-fit test tells us that, given the model, we can reject the hypothesis that
these data are Poisson distributed at the 1.64% significance level. The Pearson goodness-of-fit test
tells us that we can reject the hypothesis at the 2.49% significance level.
So let us now back up and be more careful. We can most easily obtain the incidence-rate ratios
within age categories by using ir; see [ST] epitab:
. ir deaths smokes pyears, by(agecat) nohet
age category IRR [95% Conf. Interval] M-H Weight
35-44 5.736638 1.463557 49.40468 1.472169 (exact)
45-54 2.138812 1.173714 4.272545 9.624747 (exact)
55-64 1.46824 .9863624 2.264107 23.34176 (exact)
65-74 1.35606 .9081925 2.096412 23.25315 (exact)
75-84 .9047304 .6000757 1.399687 24.31435 (exact)
Crude 1.719823 1.391992 2.14353 (exact)
M-H combined 1.424682 1.154703 1.757784
We find that the mortality incidence ratios are greatly different within age category, being highest
for the youngest categories and actually dropping below 1 for the oldest. (In the last case, we might
argue that those who smoke and who have not died by age 75 are self-selected to be particularly
robust.)
Seeing this, we will now parameterize the smoking effects separately for each category, although
we will begin by constraining the smoking effects on the third and fourth age categories to be equivalent:
. constraint 1 smokes#3.agecat = smokes#4.agecat
. poisson deaths c.smokes#agecat i.agecat, exposure(pyears) irr constraints(1)
Iteration 0: log likelihood = -31.95424
Iteration 1: log likelihood = -27.796801
Iteration 2: log likelihood = -27.574177
Iteration 3: log likelihood = -27.572645
Iteration 4: log likelihood = -27.572645
Poisson regression Number of obs = 10
Wald chi2(8) = 632.14
Log likelihood = -27.572645 Prob > chi2 = 0.0000
( 1) [deaths]3.agecat#c.smokes - [deaths]4.agecat#c.smokes = 0
deaths IRR Std. Err. z P>|z| [95% Conf. Interval]
agecat#
c.smokes
35-44 5.736637 4.181256 2.40 0.017 1.374811 23.93711
45-54 2.138812 .6520701 2.49 0.013 1.176691 3.887609
55-64 1.412229 .2017485 2.42 0.016 1.067343 1.868557
65-74 1.412229 .2017485 2.42 0.016 1.067343 1.868557
75-84 .9047304 .1855513 -0.49 0.625 .6052658 1.35236
agecat
45-54 10.5631 8.067701 3.09 0.002 2.364153 47.19623
55-64 47.671 34.37409 5.36 0.000 11.60056 195.8978
65-74 98.22765 70.85012 6.36 0.000 23.89324 403.8244
75-84 199.2099 145.3356 7.26 0.000 47.67693 832.3648
_cons .0001064 .0000753 -12.94 0.000 .0000266 .0004256
ln(pyears) 1 (exposure)
. estat gof
Deviance goodness-of-fit = .0774185
Prob > chi2(1) = 0.7808
Pearson goodness-of-fit = .0773882
Prob > chi2(1) = 0.7809
The goodness-of-fit is now small; we are no longer running roughshod over the data. Let us now
consider simplifying the model. The point estimate of the incidence-rate ratio for smoking in age
category 1 is much larger than that for smoking in age category 2, but the confidence interval for
smokes#1.agecat is similarly wide. Is the difference real?
. test smokes#1.agecat = smokes#2.agecat
( 1) [deaths]1b.agecat#c.smokes - [deaths]2.agecat#c.smokes = 0
chi2( 1) = 1.56
Prob > chi2 = 0.2117
The point estimate of the incidence-rate ratio for smoking in the 35–44 age category is much larger
than that for smoking in the 45–54 age category, but there are insufficient data, and we may be
observing random differences. With that success, might we also combine the smokers in the third
and fourth categories with those in the first and second categories?
. test smokes#2.agecat = smokes#3.agecat, accum
( 1) [deaths]1b.agecat#c.smokes - [deaths]2.agecat#c.smokes = 0
( 2) [deaths]2.agecat#c.smokes - [deaths]3.agecat#c.smokes = 0
chi2( 2) = 4.73
Prob > chi2 = 0.0938
Combining the first four categories may be overdoing it; the 9.38% significance level is enough to
stop us, although others may disagree.
Thus we now fit our final model:
. constraint 2 smokes#1.agecat = smokes#2.agecat
. poisson deaths c.smokes#agecat i.agecat, exposure(pyears) irr constraints(1/2)
Iteration 0: log likelihood = -31.550722
Iteration 1: log likelihood = -28.525057
Iteration 2: log likelihood = -28.514535
Iteration 3: log likelihood = -28.514535
Poisson regression Number of obs = 10
Wald chi2(7) = 642.25
Log likelihood = -28.514535 Prob > chi2 = 0.0000
( 1) [deaths]3.agecat#c.smokes - [deaths]4.agecat#c.smokes = 0
( 2) [deaths]1b.agecat#c.smokes - [deaths]2.agecat#c.smokes = 0
deaths IRR Std. Err. z P>|z| [95% Conf. Interval]
agecat#
c.smokes
35-44 2.636259 .7408403 3.45 0.001 1.519791 4.572907
45-54 2.636259 .7408403 3.45 0.001 1.519791 4.572907
55-64 1.412229 .2017485 2.42 0.016 1.067343 1.868557
65-74 1.412229 .2017485 2.42 0.016 1.067343 1.868557
75-84 .9047304 .1855513 -0.49 0.625 .6052658 1.35236
agecat
45-54 4.294559 .8385329 7.46 0.000 2.928987 6.296797
55-64 23.42263 7.787716 9.49 0.000 12.20738 44.94164
65-74 48.26309 16.06939 11.64 0.000 25.13068 92.68856
75-84 97.87965 34.30881 13.08 0.000 49.24123 194.561
_cons .0002166 .0000652 -28.03 0.000 .0001201 .0003908
ln(pyears) 1 (exposure)
The above strikes us as a fair representation of the data. The probabilities of observing the deaths
seen in these data are estimated using the following predict command:
. predict p, pr(0, deaths)
. list deaths p
deaths p
1. 32 .6891766
2. 104 .4456625
3. 206 .5455328
4. 186 .4910622
5. 102 .5263011
6. 2 .227953
7. 12 .7981917
8. 28 .4772961
9. 28 .6227565
10. 31 .5475718
The probability Pr(y ≤ deaths) ranges from 0.23 to 0.80.
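As a cross-check (a minimal sketch; mu and p2 are arbitrary variable names), the same probabilities
can be computed directly with Stata's poisson() cumulative distribution function, because
pr(0, deaths) is Pr(y_j ≤ deaths) evaluated at the predicted mean:
. predict double mu, n                 /* predicted number of deaths */
. generate double p2 = poisson(mu, deaths)
. assert reldif(p, p2) < 1e-6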
Methods and formulas
In the following, we use the same notation as in [R] poisson.
The equation-level scores are given by

    \mathrm{score}(x\beta)_j = y_j - e^{\xi_j}

The deviance (D) and Pearson (P) goodness-of-fit statistics are given by

    \ln L_{\max} = \sum_{j=1}^{n} w_j \left[ y_j\{\ln(y_j) - 1\} - \ln(y_j!) \right]

    \chi^2_D = -2\left\{ \ln L - \ln L_{\max} \right\}

    \chi^2_P = \sum_{j=1}^{n} w_j\, (y_j - e^{\xi_j})^2 / e^{\xi_j}
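As a rough check of the Pearson formula (a minimal sketch with all weights equal to 1; mu and pchi
are arbitrary variable names), the statistic reported by estat gof in example 1 above can be
reproduced from the predicted means:
. use http://www.stata-press.com/data/r13/dollhill3, clear
. poisson deaths smokes i.agecat, exposure(pyears)
. predict double mu, n
. generate double pchi = (deaths - mu)^2 / mu
. quietly summarize pchi
. display "Pearson goodness-of-fit = " r(sum)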
Also see
[R] poisson — Poisson regression
[U] 20 Estimation and postestimation commands
Title
predict — Obtain predictions, residuals, etc., after estimation
Syntax Menu for predict Description Options
Remarks and examples Methods and formulas Also see
Syntax
After single-equation (SE) models
    predict [type] newvar [if] [in] [, single_options]
After multiple-equation (ME) models
    predict [type] newvar [if] [in] [, multiple_options]
    predict [type] {stub* | newvar_1 ... newvar_q} [if] [in], scores
single_options            Description
Main
  xb                      calculate linear prediction
  stdp                    calculate standard error of the prediction
  score                   calculate first derivative of the log likelihood with respect to x_jβ
Options
  nooffset                ignore any offset() or exposure() variable
  other_options           command-specific options

multiple_options          Description
Main
  equation(eqno[, eqno])  specify equations
  xb                      calculate linear prediction
  stdp                    calculate standard error of the prediction
  stddp                   calculate the difference in linear predictions
Options
  nooffset                ignore any offset() or exposure() variable
  other_options           command-specific options
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Description
predict calculates predictions, residuals, influence statistics, and the like after estimation. Exactly
what predict can do is determined by the previous estimation command; command-specific options
are documented with each estimation command. Regardless of command-specific options, the actions
of predict share certain similarities across estimation commands:
1. predict newvar creates newvar containing "predicted values", that is, numbers related to the
E(y_j|x_j). For instance, after linear regression, predict newvar creates x_j b and, after probit,
creates the probability Φ(x_j b).
2. predict newvar, xb creates newvar containing xjb. This may be the same result as option
1 (for example, linear regression) or different (for example, probit), but regardless, option xb
is allowed.
3. predict newvar, stdp creates newvar containing the standard error of the linear prediction
x_j b.
4. predict newvar, other_options may create newvar containing other useful quantities; see
help or the reference manual entry for the particular estimation command to find out about
other available options.
5. nooffset added to any of the above commands requests that the calculation ignore any offset
or exposure variable specified by including the offset(varname_o) or exposure(varname_e)
option when you fit the model.
predict can be used to make in-sample or out-of-sample predictions:
6. predict calculates the requested statistic for all possible observations, whether they were used
in fitting the model or not. predict does this for standard options 1–3 and generally does this
for estimator-specific options 4.
7. predict newvar if e(sample), . . . restricts the prediction to the estimation subsample.
8. Some statistics make sense only with respect to the estimation subsample. In such cases, the
calculation is automatically restricted to the estimation subsample, and the documentation for
the specific option states this. Even so, you can still specify if e(sample) if you are uncertain.
9. predict can make out-of-sample predictions even using other datasets. In particular, you can
. use ds1
. (fit a model)
. use two /* another dataset */
. predict yhat, ... /* fill in the predictions */
Options
 
Main
xb calculates the linear prediction from the fitted model. That is, all models can be thought of as
estimating a set of parameters b_1, b_2, ..., b_k, and the linear prediction is
ŷ_j = b_1 x_{1j} + b_2 x_{2j} + ... + b_k x_{kj}, often written in matrix notation as ŷ_j = x_j b.
For linear regression, the values ŷ_j are called the predicted values or, for out-of-sample predictions,
the forecast. For logit and probit, for example, ŷ_j is called the logit or probit index.
x_{1j}, x_{2j}, ..., x_{kj} are obtained from the data currently in memory and do not necessarily correspond
to the data on the independent variables used to fit the model (obtaining b_1, b_2, ..., b_k).
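As a quick demonstration (a minimal sketch using the automobile data; xbhat and byhand are
arbitrary variable names), the linear prediction can be reproduced from the stored coefficients:
. use http://www.stata-press.com/data/r13/auto, clear
. regress mpg weight foreign
. predict double xbhat, xb
. generate double byhand = _b[_cons] + _b[weight]*weight + _b[foreign]*foreign
. assert reldif(xbhat, byhand) < 1e-10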
stdp calculates the standard error of the linear prediction. Here the prediction means the same thing
as the "index", namely, x_j b. The statistic produced by stdp can be thought of as the standard
error of the predicted expected value, or mean index, for the observation’s covariate pattern. The
standard error of the prediction is also commonly referred to as the standard error of the fitted
value. The calculation can be made in or out of sample.
stddp is allowed only after you have previously fit a multiple-equation model. The standard error of
the difference in linear predictions (x_{1j}b − x_{2j}b) between equations 1 and 2 is calculated. This
option requires that equation(eqno_1, eqno_2) be specified.
score calculates the equation-level score, ∂ln L/∂(x_jβ). Here ln L refers to the log-likelihood
function.
scores is the ME model equivalent of the score option, resulting in multiple equation-level score
variables. An equation-level score variable is created for each equation in the model; ancillary
parameters, such as ln σ and atanh ρ, make up separate equations.
equation(eqno[, eqno]), which has the synonym outcome(), is relevant only when you have previously fit a
multiple-equation model. It specifies the equation to which you are referring.
equation() is typically filled in with one eqno; it would be filled in that way with options
xb and stdp, for instance. equation(#1) would mean the calculation is to be made for the
first equation, equation(#2) would mean the second, and so on. You could also refer to the
equations by their names. equation(income) would refer to the equation named income and
equation(hours) to the equation named hours.
If you do not specify equation(), results are the same as if you specified equation(#1).
Other statistics, such as stddp, refer to between-equation concepts. In those cases, you might
specify equation(#1,#2) or equation(income,hours). When two equations must be specified,
equation() is required.
 
Options
nooffset may be combined with most statistics and specifies that the calculation should be made,
ignoring any offset or exposure variable specified when the model was fit.
This option is available, even if it is not documented for predict after a specific command. If
neither the offset(varname_o) option nor the exposure(varname_e) option was specified when
the model was fit, specifying nooffset does nothing.
other_options refers to command-specific options that are documented with each command.
Remarks and examples
Remarks are presented under the following headings:
Estimation-sample predictions
Out-of-sample predictions
Residuals
Single-equation (SE) models
SE model scores
Multiple-equation (ME) models
ME model scores
Most of the examples are presented using linear regression, but the general syntax is applicable
to all estimators.
You can think of any estimation command as estimating a set of coefficients b_1, b_2, ..., b_k
corresponding to the variables x_1, x_2, ..., x_k, along with a (possibly empty) set of ancillary statistics
γ_1, γ_2, ..., γ_m. All estimation commands store the b_i's and γ_i's. predict accesses that stored
information and combines it with the data currently in memory to make various calculations. For
instance, predict can calculate the linear prediction, ŷ_j = b_1 x_{1j} + b_2 x_{2j} + ... + b_k x_{kj}. The data
on which predict makes the calculation can be the same data used to fit the model or a different
dataset; it does not matter. predict uses the stored parameter estimates from the model, obtains
the corresponding values of x for each observation in the data, and then combines them to produce
the desired result.
Estimation-sample predictions
Example 1
We have a 74-observation dataset on automobiles, including the mileage rating (mpg), the car’s
weight (weight), and whether the car is foreign (foreign). We fit the model
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight if foreign
Source SS df MS Number of obs = 22
F( 1, 20) = 17.47
Model 427.990298 1 427.990298 Prob > F = 0.0005
Residual 489.873338 20 24.4936669 R-squared = 0.4663
Adj R-squared = 0.4396
Total 917.863636 21 43.7077922 Root MSE = 4.9491
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.010426 .0024942 -4.18 0.000 -.0156287 -.0052232
_cons 48.9183 5.871851 8.33 0.000 36.66983 61.16676
If we were to type predict pmpg now, we would obtain the linear predictions for all 74 observations.
To obtain the predictions just for the sample on which we fit the model, we could type
. predict pmpg if e(sample)
(option xb assumed; fitted values)
(52 missing values generated)
Here e(sample) is true only for foreign cars because we typed if foreign when we fit the model
and because there are no missing values among the relevant variables. If there had been missing
values, e(sample) would also account for those.
By the way, the if e(sample) restriction can be used with any Stata command, so we could
obtain summary statistics on the estimation sample by typing
. summarize if e(sample)
(output omitted )
Out-of-sample predictions
By out-of-sample predictions, we mean predictions extending beyond the estimation sample. In
the example above, typing predict pmpg would generate linear predictions using all 74 observations.
predict will work on other datasets, too. You can use a new dataset and type predict to obtain
results for that sample.
Example 2
Using the same auto dataset, assume that we wish to fit the model
mpg = β₁ weight + β₂ ln(weight) + β₃ foreign + β₄
We first create the ln(weight)variable, and then type the regress command:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. generate lnweight = ln(weight)
. regress mpg weight lnweight foreign
Source SS df MS Number of obs = 74
F( 3, 70) = 52.36
Model 1690.27997 3 563.426657 Prob > F = 0.0000
Residual 753.179489 70 10.759707 R-squared = 0.6918
Adj R-squared = 0.6785
Total 2443.45946 73 33.4720474 Root MSE = 3.2802
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight .003304 .0038995 0.85 0.400 -.0044734 .0110813
lnweight -29.59133 11.52018 -2.57 0.012 -52.5676 -6.615061
foreign -2.125299 1.052324 -2.02 0.047 -4.224093 -.0265044
_cons 248.0548 80.37079 3.09 0.003 87.76035 408.3493
If we typed predict pmpg now, we would obtain predictions for all 74 cars in the current data.
Instead, we are going to use a new dataset.
The dataset newautos.dta contains the make, weight, and place of manufacture of two cars, the
Pontiac Sunbird and the Volvo 260. Let’s use the dataset and create the predictions:
. use http://www.stata-press.com/data/r13/newautos, clear
(New Automobile Models)
. list
make weight foreign
1. Pont. Sunbird 2690 Domestic
2. Volvo 260 3170 Foreign
. predict mpg
(option xb assumed; fitted values)
variable lnweight not found
r(111);
Things did not work. We typed predict mpg, and Stata responded with the message “variable
lnweight not found”. predict can calculate predicted values on a different dataset only if that dataset
contains the variables that went into the model. Here our dataset does not contain a variable called
lnweight. lnweight is just the log of weight, so we can create it and try again:
. generate lnweight = ln(weight)
. predict mpg
(option xb assumed; fitted values)
. list
make weight foreign lnweight mpg
1. Pont. Sunbird 2690 Domestic 7.897296 23.25097
2. Volvo 260 3170 Foreign 8.061487 17.85295
We obtained our predicted values. The Pontiac Sunbird has a predicted mileage rating of 23.3 mpg,
whereas the Volvo 260 has a predicted rating of 17.9 mpg.
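As a sketch of what predict just did for the Pontiac Sunbird (weight 2690, domestic, so
foreign = 0), the same figure can be obtained from the stored coefficients; the result should
reproduce the 23.25 mpg shown above:
. display _b[_cons] + _b[weight]*2690 + _b[lnweight]*ln(2690) + _b[foreign]*0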
Residuals
Example 3
With many estimators, predict can calculate more than predicted values. With most regression-
type estimators, we can, for instance, obtain residuals. Using our regression example, we return to
our original data and obtain residuals by typing
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. generate lnweight = ln(weight)
. regress mpg weight lnweight foreign
(output omitted )
. predict double resid, residuals
. summarize resid
Variable Obs Mean Std. Dev. Min Max
resid 74 -1.51e-15 3.212091 -5.453078 13.83719
We could do this without refitting the model. Stata always remembers the last set of estimates, even
as we use new datasets.
It was not necessary to type the double in predict double resid, residuals, but we wanted
to remind you that you can specify the type of a variable in front of the variable’s name; see
[U] 11.4.2 Lists of new variables. We made the new variable resid a double rather than the default
float.
If you want your residuals to have a mean as close to zero as possible, remember to request the
extra precision of double. If we had not specified double, the mean of resid would have been
roughly 10⁻⁹ rather than 10⁻¹⁴. Although 10⁻¹⁴ sounds more precise than 10⁻⁹, the difference
really does not matter.
For linear regression, predict can also calculate standardized residuals and Studentized residuals
with the options rstandard and rstudent; for examples, see [R] regress postestimation.
Single-equation (SE) models
If you have not read the discussion above on using predict after linear regression, please do
so. predict's default calculation almost always produces a statistic in the same metric as the
dependent variable of the fitted model, for example, predicted counts for Poisson regression. In any
case, xb can always be specified to obtain the linear prediction.
predict can calculate the standard error of the prediction, which is obtained by using the covariance
matrix of the estimators.
Example 4
After most binary outcome models (for example, logistic, logit, probit, cloglog, scobit),
predict calculates the probability of a positive outcome if we do not tell it otherwise. We can
specify the xb option if we want the linear prediction (also known as the logit or probit index). The
odd abbreviation xb is meant to suggest xβ. In logit and probit models, for example, the predicted
probability is p = F(xβ), where F() is the logistic or normal cumulative distribution function,
respectively.
. logistic foreign mpg weight
(output omitted )
. predict phat
(option pr assumed; Pr(foreign))
. predict idxhat, xb
. summarize foreign phat idxhat
Variable Obs Mean Std. Dev. Min Max
foreign 74 .2972973 .4601885 0 1
phat 74 .2972973 .3052979 .000729 .8980594
idxhat 74 -1.678202 2.321509 -7.223107 2.175845
Because this is a logit model, we could obtain the predicted probabilities ourselves from the predicted
index
. generate phat2 = exp(idxhat)/(1+exp(idxhat))
but using predict without options is easier.
Example 5
For all models, predict attempts to produce a predicted value in the same metric as the dependent
variable of the model. We have seen that for dichotomous outcome models, the default statistic
produced by predict is the probability of a success. Similarly, for Poisson regression, the default
statistic produced by predict is the predicted count for the dependent variable. You can always
specify the xb option to obtain the linear combination of the coefficients with an observation's x values
(the inner product of the coefficients and x values). For poisson (without an explicit exposure), this
is the natural log of the count.
. use http://www.stata-press.com/data/r13/airline, clear
. poisson injuries XYZowned
(output omitted )
. predict injhat
(option n assumed; predicted number of events)
. predict idx, xb
. generate exp_idx = exp(idx)
. summarize injuries injhat exp_idx idx
Variable Obs Mean Std. Dev. Min Max
injuries 9 7.111111 5.487359 1 19
injhat 9 7.111111 .8333333 6 7.666667
exp_idx 9 7.111111 .8333333 6 7.666667
idx 9 1.955174 .1225612 1.791759 2.036882
We note that our "hand-computed" prediction of the count (exp_idx) matches what was produced
by the default operation of predict.
If our model has an exposure-time variable, we can use predict to obtain the linear prediction
with or without the exposure. Let’s verify what we are getting by obtaining the linear prediction with
and without exposure, transforming these predictions to count predictions and comparing them with
the default count prediction from predict. We must remember to multiply by the exposure time
when using predict . . . , nooffset.
. use http://www.stata-press.com/data/r13/airline, clear
. poisson injuries XYZowned, exposure(n)
(output omitted )
. predict double injhat
(option n assumed; predicted number of events)
. predict double idx, xb
. gen double exp_idx = exp(idx)
. predict double idxn, xb nooffset
. gen double exp_idxn = exp(idxn)*n
. summarize injuries injhat exp_idx exp_idxn idx idxn
Variable Obs Mean Std. Dev. Min Max
injuries 9 7.111111 5.487359 1 19
injhat 9 7.111111 3.10936 2.919621 12.06158
exp_idx 9 7.111111 3.10936 2.919621 12.06158
exp_idxn 9 7.111111 3.10936 2.919621 12.06158
idx 9 1.869722 .4671044 1.071454 2.490025
idxn 9 4.18814 .1904042 4.061204 4.442013
Looking at the identical means and standard deviations for injhat, exp_idx, and exp_idxn, we
see that we can reproduce the default computations of predict for poisson estimations. We have
also demonstrated the relationship between the count predictions and the linear predictions with and
without exposure.
SE model scores
Example 6
With most maximum likelihood estimators, predict can calculate equation-level scores. The first
derivative of the log likelihood with respect to x_jβ is the equation-level score.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. logistic foreign mpg weight
(output omitted )
. predict double sc, score
. summarize sc
Variable Obs Mean Std. Dev. Min Max
sc 74 -1.37e-12 .3533133 -.8760856 .8821309
See [P] robust and [SVY] variance estimation for details regarding the role equation-level scores
play in linearization-based variance estimators.
Technical note
predict after some estimation commands, such as regress and cnsreg, allows the score option
as a synonym for the residuals option.
Multiple-equation (ME) models
If you have not read the above discussion on using predict after SE models, please do so. With
the exception of the ability to select specific equations to predict from, the use of predict after ME
models follows almost the same form that it does for SE models.
Example 7
The details of prediction statistics that are specific to particular ME models are documented with
the estimation command. If you are using ME commands that do not have separate discussions on
obtaining predictions, read Obtaining predicted values in [R] mlogit postestimation, even if your
interest is not in multinomial logistic regression. As a general introduction to the ME models, we will
demonstrate predict after sureg:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. sureg (price foreign displ) (weight foreign length)
Seemingly unrelated regression
Equation Obs Parms RMSE "R-sq" chi2 P
price 74 2 2202.447 0.4348 45.21 0.0000
weight 74 2 245.5238 0.8988 658.85 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
price
foreign 3137.894 697.3805 4.50 0.000 1771.054 4504.735
displacement 23.06938 3.443212 6.70 0.000 16.32081 29.81795
_cons 680.8438 859.8142 0.79 0.428 -1004.361 2366.049
weight
foreign -154.883 75.3204 -2.06 0.040 -302.5082 -7.257674
length 30.67594 1.531981 20.02 0.000 27.67331 33.67856
_cons -2699.498 302.3912 -8.93 0.000 -3292.173 -2106.822
sureg estimated two equations, one called price and the other weight; see [R] sureg.
. predict pred_p, equation(price)
(option xb assumed; fitted values)
. predict pred_w, equation(weight)
(option xb assumed; fitted values)
. summarize price pred_p weight pred_w
Variable Obs Mean Std. Dev. Min Max
price 74 6165.257 2949.496 3291 15906
pred_p 74 6165.257 1678.805 2664.81 10485.33
weight 74 3019.459 777.1936 1760 4840
pred_w 74 3019.459 726.0468 1501.602 4447.996
You may specify the equation by name, as we did above, or by number: equation(#1) means the
same thing as equation(price) in this case.
ME model scores
Example 8
For ME models, predict allows you to specify a stub when generating equation-level score variables.
predict generates new variables using this stub by appending an equation index. Depending upon
the command, the index will start with 0 or 1. Here is an example where predict starts indexing
the score variables with 0.
. ologit rep78 mpg weight
(output omitted )
. predict double sc*, scores
. summarize sc*
Variable Obs Mean Std. Dev. Min Max
sc0 69 -1.33e-11 .5337363 -.9854088 .921433
sc1 69 -7.69e-13 .186919 -.2738537 .9854088
sc2 69 -2.87e-11 .4061637 -.5188487 1.130178
sc3 69 -1.04e-10 .5315368 -1.067351 .8194842
sc4 69 1.47e-10 .360525 -.921433 .6140182
Although it involves much more typing, we could also specify the new variable names individually.
. predict double (sc_xb sc_1 sc_2 sc_3 sc_4), scores
. summarize sc_*
Variable Obs Mean Std. Dev. Min Max
sc_xb 69 -1.33e-11 .5337363 -.9854088 .921433
sc_1 69 -7.69e-13 .186919 -.2738537 .9854088
sc_2 69 -2.87e-11 .4061637 -.5188487 1.130178
sc_3 69 -1.04e-10 .5315368 -1.067351 .8194842
sc_4 69 1.47e-10 .360525 -.921433 .6140182
Methods and formulas
Denote the previously estimated coefficient vector as b and its estimated variance matrix as V.
predict works by recalling various aspects of the model, such as b, and combining that information
with the data currently in memory. Let's write x_j for the jth observation currently in memory.

The predicted value (xb option) is defined as

    \widehat{y}_j = x_j b + \mathrm{offset}_j

The standard error of the prediction (the stdp option) is defined as

    s_{p_j} = \sqrt{x_j V x_j'}

The standard error of the difference in linear predictions between equations 1 and 2 is defined as

    s_{dp_j} = \left\{ (x_{1j}, -x_{2j}, 0, \ldots, 0)\, V\, (x_{1j}, -x_{2j}, 0, \ldots, 0)' \right\}^{1/2}
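As a minimal sketch of the stdp formula (using the automobile data; V, qf, and se_fit are
arbitrary names), the quadratic form x_j V x_j' can be built up by hand for a small model and
compared with predict:
. use http://www.stata-press.com/data/r13/auto, clear
. regress mpg weight foreign
. predict double se_fit, stdp
. matrix V = e(V)                      /* columns: weight, foreign, _cons */
. generate double qf = weight^2*V[1,1] + foreign^2*V[2,2] + V[3,3]
. replace qf = qf + 2*weight*foreign*V[1,2] + 2*weight*V[1,3] + 2*foreign*V[2,3]
. assert reldif(se_fit, sqrt(qf)) < 1e-10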
See the individual estimation commands for information about calculating command-specific
predict statistics.
Also see
[R] predictnl — Obtain nonlinear predictions, standard errors, etc., after estimation
[P] predict — Obtain predictions, residuals, etc., after estimation programming command
[U] 20 Estimation and postestimation commands
Title
predictnl — Obtain nonlinear predictions, standard errors, etc., after estimation
Syntax Menu Description Options
Remarks and examples Methods and formulas References Also see
Syntax
predictnl [type] newvar = pnl_exp [if] [in] [, options]

options              Description
Main
  se(newvar)         create newvar containing standard errors
  variance(newvar)   create newvar containing variances
  wald(newvar)       create newvar containing the Wald test statistic
  p(newvar)          create newvar containing the significance level (p-value) of the Wald test
  ci(newvars)        create newvars containing lower and upper confidence intervals
  level(#)           set confidence level; default is level(95)
  g(stub)            create stub1, stub2, ..., stubk variables containing observation-specific derivatives
Advanced
  iterate(#)         maximum iterations for finding optimal step size; default is 100
  force              calculate standard errors, etc., even when possibly inappropriate
  df(#)              use F distribution with # denominator degrees of freedom for the reference
                     distribution of the test statistic

df(#) does not appear in the dialog box.
Menu
Statistics > Postestimation > Nonlinear predictions
Description
predictnl calculates (possibly) nonlinear predictions after any Stata estimation command and
optionally calculates the variances, standard errors, Wald test statistics, significance levels, and
confidence limits for these predictions. Unlike its companion nonlinear postestimation commands
testnl and nlcom, predictnl generates functions of the data (that is, predictions), not scalars. The
quantities generated by predictnl are thus vectorized over the observations in the data.
Consider some general prediction, g(θ, x_i), for i = 1, ..., n, where θ are the model parameters
and x_i are some data for the ith observation; x_i is assumed fixed. Typically, g(θ, x_i) is estimated
by g(θ̂, x_i), where θ̂ are the estimated model parameters, which are stored in e(b) following any
Stata estimation command.
In its most common use, predictnl generates two variables: one containing the estimated
prediction, g(θ̂, x_i), the other containing the estimated standard error of g(θ̂, x_i). The calculation of
standard errors (and other obtainable quantities that are based on the standard errors, such as test
statistics) is based on the delta method, an approximation appropriate in large samples; see Methods
and formulas.
predictnl can be used with svy estimation results (assuming that predict is also allowed); see
[SVY] svy postestimation.
The specification of g(θ̂, x_i) is handled by specifying pnl_exp, and the values of g(θ̂, x_i) are
stored in the new variable newvar of storage type type. pnl_exp is any valid Stata expression and
may also contain calls to two special functions unique to predictnl:
1. predict([predict_options]): When you are evaluating pnl_exp, predict() is a convenience
function that replicates the calculation performed by the command
predict ..., predict_options
As such, the predict() function may be used either as a shorthand for the formula used to
make this prediction or when the formula is not readily available. When used without arguments,
predict() replicates the default prediction for that particular estimation command.
2. xb([eqno]): The xb() function replicates the calculation of the linear predictor x_i b for equation
eqno. If xb() is specified without eqno, the linear predictor for the first equation (or the only
equation in single-equation estimation) is obtained.
For example, xb(#1) (or equivalently, xb() with no arguments) translates to the linear predictor
for the first equation, xb(#2) for the second, and so on. You could also refer to the equations by
their names, such as xb(income).
When specifying pnl_exp, both of these functions may be used repeatedly, in combination, and in
combination with other Stata functions and expressions. See Remarks and examples for examples
that use both of these functions.
Options
 
Main
se(newvar) adds newvar of storage type type, where for each i in the prediction sample, newvar[i]
contains the estimated standard error of g(θ̂, x_i).
variance(newvar) adds newvar of storage type type, where for each i in the prediction sample,
newvar[i] contains the estimated variance of g(θ̂, x_i).
wald(newvar) adds newvar of storage type type, where for each i in the prediction sample, newvar[i]
contains the Wald test statistic for the test of the hypothesis H0: g(θ, x_i) = 0.
p(newvar) adds newvar of storage type type, where newvar[i] contains the significance level (p-value)
of the Wald test of H0: g(θ, x_i) = 0 versus the two-sided alternative.
ci(newvars) requires the specification of two newvars, such that the ith observation of each will
contain the left and right endpoints (respectively) of a confidence interval for g(θ, x_i). The level
of the confidence intervals is determined by level(#).
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
g(stub) specifies that new variables, stub1, stub2, ..., stubk, be created, where k is the dimension
of θ. stub1 will contain the observation-specific derivatives of g(θ, x_i) with respect to the first
element, θ_1, of θ; stub2 will contain the derivatives of g(θ, x_i) with respect to θ_2; etc. If the
derivative of g(θ, x_i) with respect to a particular coefficient in θ equals zero for all observations
in the prediction sample, the stub variable for that coefficient is not created. The ordering of the
parameters in θ is precisely that of the stored vector of parameter estimates e(b).
 
Advanced
iterate(#) specifies the maximum number of iterations used to find the optimal step size in the
calculation of numerical derivatives of g(θ, x_i) with respect to θ. By default, the maximum number
of iterations is 100, but convergence is usually achieved after only a few iterations. You should
rarely have to use this option.
force forces the calculation of standard errors and other inference-related quantities in situations
where predictnl would otherwise refuse to do so. The calculation of standard errors takes place
by evaluating (at θ̂) the numerical derivative of g(θ, x_i) with respect to θ. If predictnl detects
that g() is possibly a function of random quantities other than θ̂, it will refuse to calculate standard
errors or any other quantity derived from them. The force option forces the calculation to take
place anyway. If you use the force option, there is no guarantee that any inference quantities (for
example, standard errors) will be correct or that the values obtained can be interpreted.
The following option is available with predictnl but is not shown in the dialog box:
df(#) specifies that the F distribution with # denominator degrees of freedom be used for the
reference distribution of the test statistic.
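As a minimal sketch of ci() and level(), which are not demonstrated in the examples below (the
variable names are arbitrary), one could obtain 90% pointwise confidence intervals for the predicted
probabilities after the probit model used in Remarks and examples:
. use http://www.stata-press.com/data/r13/lbw, clear
. probit low lwt smoke ptl ht
. predictnl prhat = predict(p), se(prhat_se) ci(prhat_lb prhat_ub) level(90)
. list prhat prhat_lb prhat_ub in 1/5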
Remarks and examples
Remarks are presented under the following headings:
Introduction
Nonlinear transformations and standard errors
Using xb() and predict()
Multiple-equation (ME) estimators
Test statistics and significance levels
Manipulability
Confidence intervals
Introduction
predictnl and nlcom both use the delta method. They take a nonlinear transformation of the
estimated parameter vector from some fitted model and apply the delta method to calculate the
variance, standard error, Wald test statistic, etc., of this transformation. nlcom is designed for scalar
functions of the parameters, and predictnl is designed for functions of the parameters and of the
data, that is, for predictions.
Nonlinear transformations and standard errors
We begin by fitting a probit model to the low-birthweight data of Hosmer, Lemeshow, and
Sturdivant (2013, 24). The data are described in detail in example 1 of [R] logistic.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. probit low lwt smoke ptl ht
Iteration 0: log likelihood = -117.336
Iteration 1: log likelihood = -106.75886
Iteration 2: log likelihood = -106.67852
Iteration 3: log likelihood = -106.67851
Probit regression Number of obs = 189
LR chi2(4) = 21.31
Prob > chi2 = 0.0003
Log likelihood = -106.67851 Pseudo R2 = 0.0908
low Coef. Std. Err. z P>|z| [95% Conf. Interval]
lwt -.0095164 .0036875 -2.58 0.010 -.0167438 -.0022891
smoke .3487004 .2041772 1.71 0.088 -.0514794 .7488803
ptl .365667 .1921201 1.90 0.057 -.0108815 .7422154
ht 1.082355 .410673 2.64 0.008 .2774503 1.887259
_cons .4238985 .4823224 0.88 0.379 -.5214361 1.369233
After we fit such a model, we first would want to generate the predicted probabilities of a low
birthweight, given the covariate values in the estimation sample. This is easily done using predict
after probit, but it doesn’t answer the question, “What are the standard errors of those predictions?”
For the time being, we will consider ourselves ignorant of any automated way to obtain the
predicted probabilities after probit. The formula for the prediction is
    Pr(y ≠ 0 | x_i) = Φ(x_iβ)
where Φ is the standard cumulative normal. Thus for this example, g(θ, x_i) = Φ(x_iβ). Armed with
the formula, we can use predictnl to generate the predictions and their standard errors:
. predictnl phat = normal(_b[_cons] + _b[ht]*ht + _b[ptl]*ptl +
> _b[smoke]*smoke + _b[lwt]*lwt), se(phat_se)
. list phat phat_se lwt smoke ptl ht in -10/l
phat phat_se lwt smoke ptl ht
180. .2363556 .042707 120 0 0 0
181. .6577712 .1580714 154 0 1 1
182. .2793261 .0519958 106 0 0 0
183. .1502118 .0676339 190 1 0 0
184. .5702871 .0819911 101 1 1 0
185. .4477045 .079889 95 1 0 0
186. .2988379 .0576306 100 0 0 0
187. .4514706 .080815 94 1 0 0
188. .5615571 .1551051 142 0 0 1
189. .7316517 .1361469 130 1 0 1
Thus subject 180 in our data has an estimated probability of low birthweight of 23.6% with standard
error 4.3%.
Used without options, predictnl is not much different from generate. By specifying the
se(phat_se) option, we were able to obtain a variable containing the standard errors of the
predictions; therein lies the utility of predictnl.
Using xb() and predict()
As was the case above, a prediction is often not a function of a few isolated parameters and
their corresponding variables but instead is some (possibly elaborate) function of the entire linear
predictor. For models with many predictors, the brute-force expression for the linear predictor can be
cumbersome to type. An alternative is to use the inline function xb(). xb() is a shortcut for having
to type _b[_cons] + _b[ht]*ht + _b[ptl]*ptl + ...,
. drop phat phat_se
. predictnl phat = norm(xb()), se(phat_se)
. list phat phat_se lwt smoke ptl ht in -10/l
phat phat_se lwt smoke ptl ht
180. .2363556 .042707 120 0 0 0
181. .6577712 .1580714 154 0 1 1
182. .2793261 .0519958 106 0 0 0
183. .1502118 .0676339 190 1 0 0
184. .5702871 .0819911 101 1 1 0
185. .4477045 .079889 95 1 0 0
186. .2988379 .0576306 100 0 0 0
187. .4514706 .080815 94 1 0 0
188. .5615571 .1551051 142 0 0 1
189. .7316517 .1361469 130 1 0 1
which yields the same results. This approach is easier, produces more readable code, and is less prone
to error, such as forgetting to include a term in the sum.
Here we used xb() without arguments because we have only one equation in our model. In
multiple-equation (ME) settings, xb() (or equivalently xb(#1)) yields the linear predictor from the
first equation, xb(#2) from the second, etc. You can also refer to equations by their names, for
example, xb(income).
Technical note
Most estimation commands in Stata allow the postestimation calculation of linear predictors and
their standard errors via predict. For example, to obtain these for the first (or only) equation in the
model, you could type
predict xbvar, xb
predict stdpvar, stdp
Equivalently, you could type
predictnl xbvar = xb(), se(stdpvar)
but we recommend the first method, as it is faster. As we demonstrated above, however, predictnl
is more general.
Returning to our probit example, we can further simplify the calculation by using the inline function
predict(). predict(pred_options) works by substituting, within our predictnl expression, the
calculation performed by
predict ..., pred_options
In our example, we are interested in the predicted probabilities after a probit regression, normally
obtained via
predict . . ., p
We can obtain these predictions (and standard errors) by using
. drop phat phat_se
. predictnl phat = predict(p), se(phat_se)
. list phat phat_se lwt smoke ptl ht in -10/l
phat phat_se lwt smoke ptl ht
180. .2363556 .042707 120 0 0 0
181. .6577712 .1580714 154 0 1 1
182. .2793261 .0519958 106 0 0 0
183. .1502118 .0676339 190 1 0 0
184. .5702871 .0819911 101 1 1 0
185. .4477045 .079889 95 1 0 0
186. .2988379 .0576306 100 0 0 0
187. .4514706 .080815 94 1 0 0
188. .5615571 .1551051 142 0 0 1
189. .7316517 .1361469 130 1 0 1
which again replicates what we have already done by other means. However, this version did not
require knowledge of the formula for the predicted probabilities after a probit regression; predict(p)
took care of that for us.
Because the predicted probability is the default prediction after probit, we could have just used
predict() without arguments, namely,
. predictnl phat = predict(), se(phat_se)
Also, the expression pnl_exp can be inordinately complicated, with multiple calls to predict() and
xb(). For example,
. predictnl phat = normal(invnormal(predict()) + predict(xb)/xb() - 1),
> se(phat_se)
is perfectly valid and will give the same result as before, albeit a bit inefficiently.
Technical note
When using predict() and xb(), the formula for the calculation is substituted within pnl_exp,
not the values that result from the application of that formula. To see this, note the subtle difference
between
. predict xbeta, xb
. predictnl phat = normal(xbeta), se(phat_se)
and
. predictnl phat = normal(xb()), se(phat_se)
Both sequences will yield the same phat, yet for the first sequence, phat_se will equal zero
for all observations. The reason is that, once evaluated, xbeta will contain the values of the linear
predictor, yet these values are treated as fixed and nonstochastic as far as predictnl is concerned. By
contrast, because xb() is shorthand for the formula used to calculate the linear predictor, it contains
not values, but references to the estimated regression coefficients and corresponding variables. Thus
the second method produces the desired result.
Multiple-equation (ME) estimators
In [R] mlogit, data on insurance choice (Tarlov et al. 1989; Wells et al. 1989) were examined,
and a multinomial logit was used to assess the effects of age, gender, race, and site of study (one of
three sites) on the type of insurance:
. use http://www.stata-press.com/data/r13/sysdsn1, clear
(Health insurance data)
. mlogit insure age male nonwhite i.site, nolog
Multinomial logistic regression Number of obs = 615
LR chi2(10) = 42.99
Prob > chi2 = 0.0000
Log likelihood = -534.36165 Pseudo R2 = 0.0387
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.011745 .0061946 -1.90 0.058 -.0238862 .0003962
male .5616934 .2027465 2.77 0.006 .1643175 .9590693
nonwhite .9747768 .2363213 4.12 0.000 .5115955 1.437958
site
2 .1130359 .2101903 0.54 0.591 -.2989296 .5250013
3 -.5879879 .2279351 -2.58 0.010 -1.034733 -.1412433
_cons .2697127 .3284422 0.82 0.412 -.3740222 .9134476
Uninsure
age -.0077961 .0114418 -0.68 0.496 -.0302217 .0146294
male .4518496 .3674867 1.23 0.219 -.268411 1.17211
nonwhite .2170589 .4256361 0.51 0.610 -.6171725 1.05129
site
2 -1.211563 .4705127 -2.57 0.010 -2.133751 -.2893747
3 -.2078123 .3662926 -0.57 0.570 -.9257327 .510108
_cons -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
Of particular interest is the estimation of the relative risk, which, for a given selection, is the ratio of
the probability of making that selection to the probability of selecting the base category (Indemnity
here), given a set of covariate values. In a multinomial logit model, the relative risk (when comparing
to the base category) simplifies to the exponentiated linear predictor for that selection.
Using this example, we can estimate the observation-specific relative risks of selecting a prepaid
plan over the base category (with standard errors) by either referring to the Prepaid equation by
name or number,
. predictnl RRppaid = exp(xb(Prepaid)), se(SERRppaid)
or
. predictnl RRppaid = exp(xb(#1)), se(SERRppaid)
because Prepaid is the first equation in the model.
Those of us for whom the simplified formula for the relative risk does not immediately come to
mind may prefer to calculate the relative risk directly from its definition, that is, as a ratio of two
predicted probabilities. After mlogit, the predicted probability for a category may be obtained using
predict, but we must specify the category as the outcome:
. predictnl RRppaid = predict(outcome(Prepaid))/predict(outcome(Indemnity)),
> se(SERRppaid)
(1 missing value generated)
. list RRppaid SERRppaid age male nonwhite site in 1/10
RRppaid SERRpp~d age male nonwhite site
1. .6168578 .1503759 73.722107 0 0 2
2. 1.056658 .1790703 27.89595 0 0 2
3. .8426442 .1511281 37.541397 0 0 1
4. 1.460581 .3671465 23.641327 0 1 3
5. .9115747 .1324168 40.470901 0 0 2
6. 1.034701 .1696923 29.683777 0 0 2
7. .9223664 .1344981 39.468857 0 0 2
8. 1.678312 .4216626 26.702255 1 0 1
9. .9188519 .2256017 63.101974 0 1 3
10. .5766296 .1334877 69.839828 0 0 1
The “(1 missing value generated)” message is not an error; further examination of the data would reveal that age is missing in one observation and that the offending observation (among others) is not in the estimation sample. Just as with predict, predictnl can generate predictions in or out of the estimation sample.
Thus we estimate (among other things) that a white, female, 73-year-old from site 2 is less likely to choose a prepaid plan over an indemnity plan: her relative risk is about 62%, with a standard error of 15%.
Test statistics and significance levels
Often a standard error calculation is just a means to an end, and what is really desired is a test of the hypothesis
$$H_0\colon g(\theta, x_i) = 0$$
versus the two-sided alternative.
We can use predictnl to obtain the Wald test statistics or significance levels (or both) for the
above tests, whether or not we want standard errors. To obtain the Wald test statistics, we use the
wald() option; for significance levels, we use p().
Returning to our mlogit example, suppose that we wanted for each observation a test of whether
the relative risk of choosing a prepaid plan over an indemnity plan is different from one. One way to
do this would be to define g() to be the relative risk minus one and then test whether g() is different
from zero.
. predictnl RRm1 = exp(xb(Prepaid)) - 1, wald(W_RRm1) p(sig_RRm1)
(1 missing value generated)
note: significance levels are with respect to the chi-squared(1) distribution
. list RRm1 W_RRm1 sig_RRm1 age male nonwhite in 1/10
RRm1 W_RRm1 sig_RRm1 age male nonwhite
1. -.3831422 6.491778 .0108375 73.722107 0 0
2. .0566578 .100109 .7516989 27.89595 0 0
3. -.1573559 1.084116 .2977787 37.541397 0 0
4. .4605812 1.573743 .2096643 23.641327 0 1
5. -.0884253 .4459299 .5042742 40.470901 0 0
6. .0347015 .0418188 .8379655 29.683777 0 0
7. -.0776336 .3331707 .563798 39.468857 0 0
8. .6783119 2.587788 .1076906 26.702255 1 0
9. -.0811482 .1293816 .719074 63.101974 0 1
10. -.4233705 10.05909 .001516 69.839828 0 0
The newly created variable W_RRm1 contains the Wald test statistic for each observation, and sig_RRm1 contains the level of significance. Thus our 73-year-old white female represented by the first observation would have a relative risk of choosing prepaid over indemnity that is significantly different from 1, at least at the 5% level. For this test, it was not necessary to generate a variable containing the standard error of the relative risk minus 1, but we could have done so had we wanted. We could also have omitted specifying wald(W_RRm1) if all we cared about were, say, the significance levels of the tests.
In this regard, predictnl acts as an observation-specific version of testnl, with the test results vectorized over the observations in the data. The significance levels are pointwise; they are not adjusted to reflect any simultaneous testing over the observations in the data.
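As a quick check on how these significance levels are computed (a sketch assuming the variables created above are still in memory), the p-values can be reproduced from the Wald statistics with Stata's χ²(1) upper-tail function:
. generate sig_check = chi2tail(1, W_RRm1)   // upper-tail probability of a chi-squared(1) at W_RRm1
. list sig_RRm1 sig_check in 1/5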
Manipulability
There are many ways to specify g(θ, x_i) to yield tests such that, for multiple specifications of g(), the theoretical conditions for which
$$H_0\colon g(\theta, x_i) = 0$$
is true will be equivalent. However, this does not mean that the tests themselves will be equivalent. This is known as the manipulability of the Wald test for nonlinear hypotheses; also see [R] boxcox.
As an example, consider the previous section where we defined g() to be the relative risk between
choosing a prepaid plan over an indemnity plan, minus 1. We could also have defined g() to be
the risk difference—the probability of choosing a prepaid plan minus the probability of choosing
an indemnity plan. Either specification of g() yields a mathematically equivalent specification of
H0:g() = 0; that is, the risk difference will equal zero when the relative risk equals one. However,
the tests themselves do not give the same results:
. predictnl RD = predict(outcome(Prepaid)) - predict(outcome(Indemnity)),
> wald(W_RD) p(sig_RD)
(1 missing value generated)
note: significance levels are with respect to the chi-squared(1) distribution
. list RD W_RD sig_RD RRm1 W_RRm1 sig_RRm1 in 1/10
RD W_RD sig_RD RRm1 W_RRm1 sig_RRm1
1. -.2303744 4.230243 .0397097 -.3831422 6.491778 .0108375
2. .0266902 .1058542 .7449144 .0566578 .100109 .7516989
3. -.0768078 .9187646 .3377995 -.1573559 1.084116 .2977787
4. .1710702 2.366535 .1239619 .4605812 1.573743 .2096643
5. -.0448509 .4072922 .5233471 -.0884253 .4459299 .5042742
6. .0165251 .0432816 .835196 .0347015 .0418188 .8379655
7. -.0391535 .3077611 .5790573 -.0776336 .3331707 .563798
8. .22382 4.539085 .0331293 .6783119 2.587788 .1076906
9. -.0388409 .1190183 .7301016 -.0811482 .1293816 .719074
10. -.2437626 6.151558 .0131296 -.4233705 10.05909 .001516
In certain cases (such as subject 8), the difference can be severe enough to potentially change the
conclusion. The reason for this inconsistency is that the nonlinear Wald test is actually a standard
Wald test of a first-order Taylor approximation of g(), and this approximation can differ according
to how g() is specified.
As such, keep in mind the manipulability of nonlinear Wald tests when drawing scientific conclusions.
Confidence intervals
We can also use predictnl to obtain confidence intervals for the observation-specific g(θ,xi)
by using the ci() option to specify two new variables to contain the left and right endpoints of the
confidence interval, respectively. For example, we could generate confidence intervals for the risk
differences calculated previously:
. drop RD
. predictnl RD = predict(outcome(Prepaid)) - predict(outcome(Indemnity)),
> ci(RD_lcl RD_rcl)
(1 missing value generated)
note: Confidence intervals calculated using Z critical values
. list RD RD_lcl RD_rcl age male nonwhite in 1/10
RD RD_lcl RD_rcl age male nonwhite
1. -.2303744 -.4499073 -.0108415 73.722107 0 0
2. .0266902 -.1340948 .1874752 27.89595 0 0
3. -.0768078 -.2338625 .080247 37.541397 0 0
4. .1710702 -.0468844 .3890248 23.641327 0 1
5. -.0448509 -.1825929 .092891 40.470901 0 0
6. .0165251 -.1391577 .1722078 29.683777 0 0
7. -.0391535 -.177482 .099175 39.468857 0 0
8. .22382 .0179169 .4297231 26.702255 1 0
9. -.0388409 -.2595044 .1818226 63.101974 0 1
10. -.2437626 -.4363919 -.0511332 69.839828 0 0
The confidence level, here 95%, is either set using the level() option or obtained from the current default level, c(level); see [U] 20.7 Specifying the width of confidence intervals.
From the above output, we can see that, for subjects 1, 8, and 10, a 95% confidence interval for
the risk difference does not contain zero, meaning that, for these subjects, there is some evidence of
a significant difference in risks.
The confidence intervals calculated by predictnl are pointwise; there is no adjustment (such
as a Bonferroni correction) made so that these confidence intervals may be considered jointly at the
specified level.
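The same mechanism extends to other levels. A minimal sketch of requesting 90% rather than 95% intervals for the risk difference (the new variable names are ours):
. predictnl RD90 = predict(outcome(Prepaid)) - predict(outcome(Indemnity)),
>      ci(RD90_lcl RD90_rcl) level(90)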
Methods and formulas
For the ith observation, consider the transformation $g(\theta, x_i)$, estimated by $g(\hat\theta, x_i)$, for the 1 × k parameter vector θ and data $x_i$ ($x_i$ is assumed fixed). The variance of $g(\hat\theta, x_i)$ is estimated by
$$\widehat{\text{Var}}\{g(\hat\theta, x_i)\} = GVG'$$
where G is the vector of derivatives
$$G = \left.\frac{\partial g(\theta, x_i)}{\partial\theta}\right|_{\theta=\hat\theta} \qquad (1\times k)$$
and V is the estimated variance–covariance matrix of $\hat\theta$. Standard errors, $\widehat{\text{se}}\{g(\hat\theta, x_i)\}$, are obtained as the square roots of the variances.
The Wald test statistic for testing
$$H_0\colon g(\theta, x_i) = 0$$
versus the two-sided alternative is given by
$$W_i = \frac{\{g(\hat\theta, x_i)\}^2}{\widehat{\text{Var}}\{g(\hat\theta, x_i)\}}$$
When the variance–covariance matrix of $\hat\theta$ is an asymptotic covariance matrix, $W_i$ is approximately distributed as χ² with 1 degree of freedom. For linear regression, $W_i$ is taken to be approximately distributed as $F_{1,r}$, where r is the residual degrees of freedom from the original model fit. The levels of significance of the observation-by-observation tests of $H_0$ versus the two-sided alternative are given by
$$p_i = \Pr(T > W_i)$$
where T is either a χ²- or F-distributed random variable, as described above.
A (1 − α) × 100% confidence interval for $g(\theta, x_i)$ is given by
$$g(\hat\theta, x_i) \pm z_{\alpha/2}\,\widehat{\text{se}}\{g(\hat\theta, x_i)\}$$
when $W_i$ is χ²-distributed, and
$$g(\hat\theta, x_i) \pm t_{\alpha/2,\,r}\,\widehat{\text{se}}\{g(\hat\theta, x_i)\}$$
when $W_i$ is F-distributed. Here $z_p$ is the 1 − p quantile of the standard normal distribution, and $t_{p,r}$ is the 1 − p quantile of the t distribution with r degrees of freedom.
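In other words, each Wald statistic is simply the squared ratio of the point estimate to its standard error. A minimal sketch of checking this for the relative-risk example above (the variable names ending in b are ours; se() and wald() may be combined in one call):
. predictnl RRm1b = exp(xb(Prepaid)) - 1, se(SE_RRm1b) wald(W_RRm1b)
. generate W_check = (RRm1b/SE_RRm1b)^2   // should match W_RRm1b up to roundoff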
References
Gould, W. W. 1996. crc43: Wald test of nonlinear hypotheses after model estimation. Stata Technical Bulletin 29: 2–4. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 15–18. College Station, TX: Stata Press.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Also see
[R] lincom — Linear combinations of estimators
[R] nlcom — Nonlinear combinations of estimators
[R] predict — Obtain predictions, residuals, etc., after estimation
[R] test — Test linear hypotheses after estimation
[R] testnl — Test nonlinear hypotheses after estimation
[U] 20 Estimation and postestimation commands
Title
probit — Probit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
probit depvar [indepvars] [if] [in] [weight] [, options]
options                       Description

Model
  noconstant                  suppress constant term
  offset(varname)             include varname in model with coefficient constrained to 1
  asis                        retain perfect predictor variables
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables

SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, bootstrap, or jackknife

Reporting
  level(#)                    set confidence level; default is level(95)
  nocnsreport                 do not display constraints
  display_options             control column formats, row spacing, line width, display of omitted
                                variables and base and empty cells, and factor-variable labeling

Maximization
  maximize_options            control the maximization process; seldom used
  nocoef                      do not display the coefficient table; seldom used

  coeflegend                  display legend instead of statistics

indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), nocoef, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
nocoef and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Binary outcomes > Probit regression
Description
probit fits a maximum-likelihood probit model.
If estimating on grouped data, see the bprobit command described in [R] glogit.
Several auxiliary commands may be run after probit, logit, or logistic; see [R] logistic postestimation for a description of these commands.
See [R] logistic for a list of related estimation commands.
Options
 
Model
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.
asis specifies that all specified variables and observations be retained in the maximization process.
This option is typically not specified and may introduce numerical instability. Normally probit
drops variables that perfectly predict success or failure in the dependent variable along with their
associated observations. In those cases, the effective coefficient on the dropped variables is infinity
(negative infinity) for variables that completely determine a success (failure). Dropping the variable
and perfectly predicted observations has no effect on the likelihood or estimates of the remaining
coefficients and increases the numerical stability of the optimization process. Specifying this option
forces retention of perfect predictor variables and their associated observations.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.
The following options are available with probit but are not shown in the dialog box:
nocoef specifies that the coefficient table not be displayed. This option is sometimes used by
programmers but is of no use interactively.
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Robust standard errors
Model identification
probit fits maximum likelihood models with dichotomous dependent (left-hand-side) variables
coded as 0/1 (more precisely, coded as 0 and not 0).
Example 1
We have data on the make, weight, and mileage rating of 22 foreign and 52 domestic automobiles.
We wish to fit a probit model explaining whether a car is foreign based on its weight and mileage.
Here is an overview of our data:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. keep make mpg weight foreign
. describe
Contains data from http://www.stata-press.com/data/r13/auto.dta
obs: 74 1978 Automobile Data
vars: 4 13 Apr 2013 17:45
size: 1,702 (_dta has notes)
storage display value
variable name type format label variable label
make str18 %-18s Make and Model
mpg int %8.0g Mileage (mpg)
weight int %8.0gc Weight (lbs.)
foreign byte %8.0g origin Car type
Sorted by: foreign
Note: dataset has changed since last saved
. inspect foreign
foreign: Car type Number of Observations
Total Integers Nonintegers
# Negative - - -
# Zero 52 52 -
# Positive 22 22 -
#
# # Total 74 74 -
# # Missing -
0 1 74
(2 unique values)
foreign is labeled and all values are documented in the label.
The foreign variable takes on two unique values, 0 and 1. The value 0 denotes a domestic car,
and 1 denotes a foreign car.
The model that we wish to fit is
$$\Pr(\texttt{foreign} = 1) = \Phi(\beta_0 + \beta_1\,\texttt{weight} + \beta_2\,\texttt{mpg})$$
where Φ is the cumulative normal distribution.
To fit this model, we type
. probit foreign weight mpg
Iteration 0: log likelihood = -45.03321
Iteration 1: log likelihood = -27.914626
(output omitted )
Iteration 5: log likelihood = -26.844189
Probit regression Number of obs = 74
LR chi2(2) = 36.38
Prob > chi2 = 0.0000
Log likelihood = -26.844189 Pseudo R2 = 0.4039
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0023355 .0005661 -4.13 0.000 -.003445 -.0012261
mpg -.1039503 .0515689 -2.02 0.044 -.2050235 -.0028772
_cons 8.275464 2.554142 3.24 0.001 3.269437 13.28149
We find that heavier cars are less likely to be foreign and that cars yielding better gas mileage are
also less likely to be foreign, at least holding the weight of the car constant.
See [R]maximize for an explanation of the output.
Technical note
Stata interprets a value of 0 as a negative outcome (failure) and treats all other values (except
missing) as positive outcomes (successes). Thus if your dependent variable takes on the values 0 and
1, then 0 is interpreted as failure and 1 as success. If your dependent variable takes on the values 0,
1, and 2, then 0 is still interpreted as failure, but both 1 and 2 are treated as successes.
If you prefer a more formal mathematical statement, when you type probit y x, Stata fits the model
$$\Pr(y_j \neq 0 \mid x_j) = \Phi(x_j\beta)$$
where Φ is the standard cumulative normal.
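A quick sketch of this convention with hypothetical variables y (coded 0, 1, and 2) and x: collapsing y to an indicator for "not 0" reproduces the same fit.
. generate byte success = (y != 0) if !missing(y)   // values 1 and 2 both count as successes
. probit success x                                  // same estimates as: probit y x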
Robust standard errors
If you specify the vce(robust) option, probit reports robust standard errors; see [U] 20.21 Obtaining robust variance estimates.
Example 2
For the model from example 1, the robust calculation increases the standard error of the coefficient
on mpg by almost 15%:
. probit foreign weight mpg, vce(robust) nolog
Probit regression Number of obs = 74
Wald chi2(2) = 30.26
Prob > chi2 = 0.0000
Log pseudolikelihood = -26.844189 Pseudo R2 = 0.4039
Robust
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
weight -.0023355 .0004934 -4.73 0.000 -.0033025 -.0013686
mpg -.1039503 .0593548 -1.75 0.080 -.2202836 .0123829
_cons 8.275464 2.539177 3.26 0.001 3.298769 13.25216
Without vce(robust), the standard error for the coefficient on mpg was reported to be 0.052, with a resulting confidence interval of [−0.21, −0.00].
Example 3
The vce(cluster clustvar) option can relax the independence assumption required by the probit estimator to independence between clusters. To demonstrate, we will switch to a different dataset.
We are studying unionization of women in the United States and have a dataset with 26,200 observations on 4,434 women between 1970 and 1988. We will use the variables age (the women were 14–26 in 1968, and our data span the age range of 16–46), grade (years of schooling completed, ranging from 0 to 18), not_smsa (28% of the person-time was spent living outside an SMSA, a standard metropolitan statistical area), south (41% of the person-time was in the South), and year. Each of these variables is included in the regression as a covariate along with the interaction between south and year. This interaction, along with the south and year variables, is specified in the probit command using factor-variables notation, south##c.year. We also have variable union, indicating union membership. Overall, 22% of the person-time is marked as time under union membership, and 44% of these women have belonged to a union.
We fit the following model, ignoring that the women are observed an average of 5.9 times each
in these data:
. use http://www.stata-press.com/data/r13/union, clear
(NLS Women 14-24 in 1968)
. probit union age grade not_smsa south##c.year
Iteration 0: log likelihood = -13864.23
Iteration 1: log likelihood = -13545.541
Iteration 2: log likelihood = -13544.385
Iteration 3: log likelihood = -13544.385
Probit regression Number of obs = 26200
LR chi2(6) = 639.69
Prob > chi2 = 0.0000
Log likelihood = -13544.385 Pseudo R2 = 0.0231
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0118481 .0029072 4.08 0.000 .0061502 .017546
grade .0267365 .0036689 7.29 0.000 .0195457 .0339273
not_smsa -.1293525 .0202595 -6.38 0.000 -.1690604 -.0896445
1.south -.8281077 .2472219 -3.35 0.001 -1.312654 -.3435618
year -.0080931 .0033469 -2.42 0.016 -.0146529 -.0015333
south#c.year
1 .0057369 .0030917 1.86 0.064 -.0003226 .0117965
_cons -.6542487 .2007777 -3.26 0.001 -1.047766 -.2607316
The reported standard errors in this model are probably meaningless. Women are observed repeatedly, and so the observations are not independent. Looking at the coefficients, we find a large southern effect against unionization and a time trend for the South that is almost significantly different from the overall downward trend. The vce(cluster clustvar) option provides a way to fit this model and obtain correct standard errors:
. probit union age grade not_smsa south##c.year, vce(cluster id)
Iteration 0: log pseudolikelihood = -13864.23
Iteration 1: log pseudolikelihood = -13545.541
Iteration 2: log pseudolikelihood = -13544.385
Iteration 3: log pseudolikelihood = -13544.385
Probit regression Number of obs = 26200
Wald chi2(6) = 166.53
Prob > chi2 = 0.0000
Log pseudolikelihood = -13544.385 Pseudo R2 = 0.0231
(Std. Err. adjusted for 4434 clusters in idcode)
Robust
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0118481 .0056625 2.09 0.036 .0007499 .0229463
grade .0267365 .0078124 3.42 0.001 .0114244 .0420486
not_smsa -.1293525 .0403885 -3.20 0.001 -.2085125 -.0501925
1.south -.8281077 .3201584 -2.59 0.010 -1.455607 -.2006089
year -.0080931 .0060829 -1.33 0.183 -.0200153 .0038292
south#c.year
1 .0057369 .0040133 1.43 0.153 -.002129 .0136029
_cons -.6542487 .3485976 -1.88 0.061 -1.337487 .02899
These standard errors are larger than those reported by the inappropriate conventional calculation. By
comparison, another model we could fit is an equal-correlation population-averaged probit model:
. xtprobit union age grade not_smsa south##c.year, pa
Iteration 1: tolerance = .12544249
Iteration 2: tolerance = .0034686
Iteration 3: tolerance = .00017448
Iteration 4: tolerance = 8.382e-06
Iteration 5: tolerance = 3.997e-07
GEE population-averaged model Number of obs = 26200
Group variable: idcode Number of groups = 4434
Link: probit Obs per group: min = 1
Family: binomial avg = 5.9
Correlation: exchangeable max = 12
Wald chi2(6) = 242.57
Scale parameter: 1 Prob > chi2 = 0.0000
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0089699 .0053208 1.69 0.092 -.0014586 .0193985
grade .0333174 .0062352 5.34 0.000 .0210966 .0455382
not_smsa -.0715717 .027543 -2.60 0.009 -.1255551 -.0175884
1.south -1.017368 .207931 -4.89 0.000 -1.424905 -.6098308
year -.0062708 .0055314 -1.13 0.257 -.0171122 .0045706
south#c.year
1 .0086294 .00258 3.34 0.001 .0035727 .013686
_cons -.8670997 .294771 -2.94 0.003 -1.44484 -.2893592
The coefficient estimates are similar, but these standard errors are smaller than those produced by
probit, vce(cluster clustvar), as we would expect. If the equal-correlation assumption is valid,
the population-averaged probit estimator above should be more efficient.
Is the assumption valid? That is a difficult question to answer. The default population-averaged
estimates correspond to an assumption of exchangeable correlation within person. It would not be
unreasonable to assume an AR(1) correlation within person or to assume that the observations are correlated but that we do not wish to impose any structure. See [XT] xtprobit and [XT] xtgee for full details.
probit, vce(cluster clustvar) is robust to assumptions about within-cluster correlation. That is, it inefficiently sums within cluster for the standard error calculation rather than attempting to exploit what might be assumed about the within-cluster correlation.
Model identification
The probit command has one more feature that is probably the most useful. It will automatically
check the model for identification and, if the model is underidentified, drop whatever variables and
observations are necessary for estimation to proceed.
Example 4
Have you ever fit a probit model where one or more of your independent variables perfectly
predicted one or the other outcome?
For instance, consider the following data:
Outcome y     Independent variable x
    0                    1
    0                    1
    0                    0
    1                    0
Say that we wish to predict the outcome on the basis of the independent variable. The outcome is always zero when the independent variable is one. In our data, Pr(y = 0 | x = 1) = 1, which means that the probit coefficient on x must be minus infinity with a corresponding infinite standard error. At this point, you may suspect that we have a problem.
Unfortunately, not all such problems are so easily detected, especially if you have many independent
variables in your model. If you have ever had such difficulties, then you have experienced one of the
more unpleasant aspects of computer optimization. The computer has no idea that it is trying to solve
for an infinite coefficient as it begins its iterative process. All it knows is that, at each step, making
the coefficient a little bigger, or a little smaller, works wonders. It continues on its merry way until
either 1) the whole thing comes crashing to the ground when a numerical overflow error occurs or
2) it reaches some predetermined cutoff that stops the process. Meanwhile, you have been waiting.
And the estimates that you finally receive, if any, may be nothing more than numerical roundoff.
Stata watches for these sorts of problems, alerts you, fixes them, and then properly fits the model.
Let’s return to our automobile data. Among the variables we have in the data is one called repair
that takes on three values. A value of 1 indicates that the car has a poor repair record, 2 indicates
an average record, and 3 indicates a better-than-average record. Here is a tabulation of our data:
. use http://www.stata-press.com/data/r13/repair
(1978 Automobile Data)
. tabulate foreign repair
repair
Car type 1 2 3 Total
Domestic 10 27 9 46
Foreign 0 3 9 12
Total 10 30 18 58
All the cars with poor repair records (repair =1) are domestic. If we were to attempt to predict
foreign on the basis of the repair records, the predicted probability for the repair =1 category
would have to be zero. This in turn means that the probit coefficient must be minus infinity, and that
would set most computer programs buzzing.
Let’s try using Stata on this problem.
. probit foreign b3.repair
note: 1.repair != 0 predicts failure perfectly
1.repair dropped and 10 obs not used
Iteration 0: log likelihood = -26.992087
Iteration 1: log likelihood = -22.276479
Iteration 2: log likelihood = -22.229184
Iteration 3: log likelihood = -22.229138
Iteration 4: log likelihood = -22.229138
Probit regression Number of obs = 48
LR chi2(1) = 9.53
Prob > chi2 = 0.0020
Log likelihood = -22.229138 Pseudo R2 = 0.1765
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
repair
1 0 (empty)
2 -1.281552 .4297326 -2.98 0.003 -2.123812 -.4392911
_cons 9.89e-17 .295409 0.00 1.000 -.578991 .578991
Remember that all the cars with poor repair records (repair =1) are domestic, so the model cannot
be fit, or at least it cannot be fit if we restrict ourselves to finite coefficients. Stata noted that fact
“note: 1.repair != 0 predicts failure perfectly”. This is Stata’s mathematically precise way of saying
what we said in English. When repair is 1, the car is domestic.
Stata then went on to say, “1.repair dropped and 10 obs not used”. This is Stata eliminating
the problem. First, 1.repair had to be removed from the model because it would have an infinite
coefficient. Then the 10 observations that led to the problem had to be eliminated, as well, so as
not to bias the remaining coefficients in the model. The 10 observations that are not used are the 10
domestic cars that have poor repair records.
Stata then fit what was left of the model, using the remaining observations. Because no observations
remained for cars with poor repair records, Stata reports “(empty)” in the row for repair =1.
Technical note
Stata is pretty smart about catching these problems. It will catch “one-way causation by a dummy
variable”, as we demonstrated above.
Stata also watches for “two-way causation”, that is, a variable that perfectly determines the outcome,
both successes and failures. Here Stata says that the variable “predicts outcome perfectly” and stops.
Statistics dictate that no model can be fit.
Stata also checks your data for collinear variables; it will say “so-and-so omitted because of
collinearity”. No observations need to be eliminated here and model fitting will proceed without the
offending variable.
It will also catch a subtle problem that can arise with continuous data. For instance, if we were
estimating the chances of surviving the first year after an operation, and if we included in our model
age, and if all the persons over 65 died within the year, Stata will say, “age >65 predicts failure
perfectly”. It will then inform us about how it resolves the issue and fit what can be fit of our model.
probit (and logit, logistic, and ivprobit) will also occasionally fail to converge and then display messages such as
Note: 4 failures and 0 successes completely determined.
The cause of this message and what to do if you see it are described in [R] logit.
Stored results
probit stores the following in e():
Scalars
e(N)                   number of observations
e(N_cds)               number of completely determined successes
e(N_cdf)               number of completely determined failures
e(k)                   number of parameters
e(k_eq)                number of equations in e(b)
e(k_eq_model)          number of equations in overall model test
e(k_dv)                number of dependent variables
e(df_m)                model degrees of freedom
e(r2_p)                pseudo-R-squared
e(ll)                  log likelihood
e(ll_0)                log likelihood, constant-only model
e(N_clust)             number of clusters
e(chi2)                χ2
e(p)                   significance of model test
e(rank)                rank of e(V)
e(ic)                  number of iterations
e(rc)                  return code
e(converged)           1 if converged, 0 otherwise
Macros
e(cmd)                 probit
e(cmdline)             command as typed
e(depvar)              name of dependent variable
e(wtype)               weight type
e(wexp)                weight expression
e(title)               title in estimation output
e(clustvar)            name of cluster variable
e(offset)              linear offset variable
e(chi2type)            Wald or LR; type of model χ2 test
e(vce)                 vcetype specified in vce()
e(vcetype)             title used to label Std. Err.
e(opt)                 type of optimization
e(which)               max or min; whether optimizer is to perform maximization or minimization
e(ml_method)           type of ml method
e(user)                name of likelihood-evaluator program
e(technique)           maximization technique
e(properties)          b V
e(estat_cmd)           program used to implement estat
e(predict)             program used to implement predict
e(asbalanced)          factor variables fvset as asbalanced
e(asobserved)          factor variables fvset as asobserved
Matrices
e(b)                   coefficient vector
e(Cns)                 constraints matrix
e(ilog)                iteration log (up to 20 iterations)
e(gradient)            gradient vector
e(mns)                 vector of means of the independent variables
e(rules)               information about perfect predictors
e(V)                   variance–covariance matrix of the estimators
e(V_modelbased)        model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Probit analysis originated in connection with bioassay, and the word probit, a contraction of
“probability unit”, was suggested by Bliss (1934a,1934b). For an introduction to probit and logit, see,
for example, Aldrich and Nelson (1984), Cameron and Trivedi (2010), Greene (2012), Long (1997),
Pampel (2000), or Powers and Xie (2008). Long and Freese (2014, chap. 5 and 6) and Jones (2007,
chap. 3) provide introductions to probit and logit, along with Stata examples.
The log-likelihood function for probit is
$$\ln L = \sum_{j\in S} w_j \ln\Phi(x_j\beta) \;+\; \sum_{j\notin S} w_j \ln\{1 - \Phi(x_j\beta)\}$$
where S is the set of all observations j such that y_j ≠ 0, Φ is the cumulative normal, and w_j denotes the optional weights. ln L is maximized, as described in [R] maximize.
This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas. The scores are calculated as u_j = {φ(x_j b)/Φ(x_j b)} x_j for the positive outcomes and −[φ(x_j b)/{1 − Φ(x_j b)}] x_j for the negative outcomes, where φ is the normal density.
probit also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
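A minimal sketch of this calculation for the unweighted model in example 1 (the variable names xbhat and llj are ours): summing the observation-level terms reproduces e(ll).
. quietly probit foreign weight mpg
. predict double xbhat, xb
. generate double llj = cond(foreign, ln(normal(xbhat)), ln(1 - normal(xbhat)))
. quietly summarize llj
. display r(sum) "  " e(ll)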
 
Chester Ittner Bliss (1899–1979) was born in Ohio. He was educated as an entomologist, earning
degrees from Ohio State and Columbia, and was employed by the United States Department of
Agriculture until 1933. When he lost his job because of the Depression, Bliss then worked with
R. A. Fisher in London and at the Institute of Plant Protection in Leningrad before returning
to a post at the Connecticut Agricultural Experiment Station in 1938. He was also a lecturer at
Yale for 25 years. Among many contributions to biostatistics, his development and application
of probit methods to biological problems are outstanding.
 
References
Aldrich, J. H., and F. D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Newbury Park, CA: Sage.
Berkson, J. 1944. Application of the logistic function to bio-assay. Journal of the American Statistical Association
39: 357–365.
Bliss, C. I. 1934a. The method of probits. Science 79: 38–39.
. 1934b. The method of probits—a correction. Science 79: 409–410.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cochran, W. G., and D. J. Finney. 1979. Chester Ittner Bliss 1899–1979. Biometrics 35: 715–717.
De Luca, G. 2008. SNP and SML estimation of univariate and bivariate binary-choice models. Stata Journal 8: 190–220.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hilbe, J. M. 1996. sg54: Extended probit regression. Stata Technical Bulletin 32: 20–21. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 131–132. College Station, TX: Stata Press.
Jones, A. 2007. Applied Econometrics for Health Economists: A Practical Guide. 2nd ed. Abingdon, UK: Radcliffe.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. New York: Wiley.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Miranda, A., and S. Rabe-Hesketh. 2006. Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables. Stata Journal 6: 285–308.
Pampel, F. C. 2000. Logistic Regression: A Primer. Thousand Oaks, CA: Sage.
Powers, D. A., and Y. Xie. 2008. Statistical Methods for Categorical Data Analysis. 2nd ed. Bingley, UK: Emerald.
Xu, J., and J. S. Long. 2005. Confidence intervals for predicted outcomes in regression models for categorical outcomes. Stata Journal 5: 537–559.
Also see
[R] probit postestimation — Postestimation tools for probit
[R] asmprobit — Alternative-specific multinomial probit regression
[R] biprobit — Bivariate probit regression
[R] brier — Brier score decomposition
[R] glm — Generalized linear models
[R] heckoprobit — Ordered probit model with sample selection
[R] hetprobit — Heteroskedastic probit model
[R] ivprobit — Probit model with continuous endogenous regressors
[R] logistic — Logistic regression, reporting odds ratios
[R] logit — Logistic regression, reporting coefficients
[R] mprobit — Multinomial probit regression
[R] roc — Receiver operating characteristic (ROC) analysis
[R] scobit — Skewed logistic regression
[ME] meprobit — Multilevel mixed-effects probit regression
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[XT] xtprobit — Random-effects and population-averaged probit models
[U] 20 Estimation and postestimation commands
Title
probit postestimation — Postestimation tools for probit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are of special interest after probit:
Command Description
estat classification report various summary statistics, including the classification table
estat gof Pearson or Hosmer–Lemeshow goodness-of-fit test
lroc compute area under ROC curve and graph the curve
lsens graph sensitivity and specificity versus probability cutoff
These commands are not appropriate after the svy prefix.
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1) dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
lrtest (2) likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset rules asif]
statistic Description
Main
pr probability of a positive outcome; the default
xb linear prediction
stdp standard error of the linear prediction
deviance deviance residual
score first derivative of the log likelihood with respect to x_jβ
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
deviance calculates the deviance residual.
score calculates the equation-level score, ∂ln L/∂(x_jβ).
nooffset is relevant only if you specified offset(varname) for probit. It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as x_jb rather than as x_jb + offset_j.
rules requests that Stata use any rules that were used to identify the model when making the
prediction. By default, Stata calculates missing for excluded observations.
asif requests that Stata ignore the rules and exclusion criteria and calculate predictions for all
observations possible using the estimated parameter from the model.
Remarks and examples
Remarks are presented under the following headings:
Obtaining predicted values
Performing hypothesis tests
Obtaining predicted values
Once you have fit a probit model, you can obtain the predicted probabilities by using the predict
command for both the estimation sample and other samples; see [U] 20 Estimation and postestimation
commands and [R]predict. Here we will make only a few additional comments.
predict without arguments calculates the predicted probability of a positive outcome. With the xb option, predict calculates the linear combination x_jb, where x_j are the independent variables in the jth observation and b is the estimated parameter vector. This is known as the index function because the cumulative density indexed at this value is the probability of a positive outcome.
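A minimal sketch of this relationship, assuming a probit model such as the one in example 1 of [R] probit has just been fit (the names xbhat and pr_check are ours): applying the cumulative normal to the linear prediction reproduces the default probability.
. predict double xbhat, xb
. generate double pr_check = normal(xbhat)   // equals the default  predict pr  up to roundoff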
In both cases, Stata remembers any rules used to identify the model and calculates missing for
excluded observations unless rules or asif is specified. This is covered in the following example.
With the stdp option, predict calculates the standard error of the prediction, which is not
adjusted for replicated covariate patterns in the data.
You can calculate the unadjusted-for-replicated-covariate-patterns diagonal elements of the hat
matrix, or leverage, by typing
. predict pred
. predict stdp, stdp
. generate hat = stdp^2*pred*(1-pred)
Example 1
In example 4 of [R]probit, we fit the probit model probit foreign b3.repair. To obtain
predicted probabilities, we type
. predict p
(option pr assumed; Pr(foreign))
(10 missing values generated)
. summarize foreign p
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
Stata remembers any rules used to identify the model and sets predictions to missing for any excluded observations. In example 4 of [R] probit, probit dropped the variable 1.repair from our model and excluded 10 observations. When we typed predict p, those same 10 observations were again excluded and their predictions set to missing.
predict's rules option uses the rules in the prediction. During estimation, we were told, “1.repair != 0 predicts failure perfectly”, so the rule is that when 1.repair is not zero, we should predict 0 probability of success or a positive outcome:
. predict p2, rules
(option pr assumed; Pr(foreign))
. summarize foreign p p2
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
p2 58 .2068966 .2016268 0 .5
predict's asif option ignores the rules and the exclusion criteria and calculates predictions for all observations possible using the estimated parameters from the model:
. predict p3, asif
(option pr assumed; Pr(foreign))
. summarize for p p2 p3
Variable Obs Mean Std. Dev. Min Max
foreign 58 .2068966 .4086186 0 1
p 48 .25 .1956984 .1 .5
p2 58 .2068966 .2016268 0 .5
p3 58 .2931034 .2016268 .1 .5
Which is right? By default, predict uses the most conservative approach. If many observations
had been excluded due to a simple rule, we could be reasonably certain that the rules prediction is
correct. The asif prediction is correct only if the exclusion is a fluke and we would be willing to
exclude the variable from the analysis, anyway. Then, however, we should refit the model to include
the excluded observations.
Performing hypothesis tests
After estimation with probit, you can perform hypothesis tests by using the test or testnl
command; see [U] 20 Estimation and postestimation commands.
Methods and formulas
Let index j be used to index observations. Define M_j for each observation as the total number of observations sharing j's covariate pattern. Define Y_j as the total number of positive responses among observations sharing j's covariate pattern. Define p_j as the predicted probability of a positive outcome for observation j.
For M_j > 1, the deviance residual d_j is defined as
$$d_j = \pm\left[\, 2\left\{ Y_j \ln\!\left(\frac{Y_j}{M_j p_j}\right) + (M_j - Y_j)\ln\!\left(\frac{M_j - Y_j}{M_j(1 - p_j)}\right) \right\}\right]^{1/2}$$
where the sign is the same as the sign of $(Y_j - M_j p_j)$. In the limiting cases, the deviance residual is given by
$$d_j = \begin{cases} -\sqrt{2 M_j\,\lvert\ln(1 - p_j)\rvert} & \text{if } Y_j = 0\\ \sqrt{2 M_j\,\lvert\ln p_j\rvert} & \text{if } Y_j = M_j \end{cases}$$
Also see
[R] probit — Probit regression
[R] estat classification — Classification statistics and table
[R] estat gof — Pearson or Hosmer–Lemeshow goodness-of-fit test
[R] lroc — Compute area under ROC curve and graph the curve
[R] lsens — Graph sensitivity and specificity versus probability cutoff
[U] 20 Estimation and postestimation commands
Title
proportion — Estimate proportions
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
proportion varlist [if] [in] [weight] [, options]
options                     Description

Model
  stdize(varname)           variable identifying strata for standardization
  stdweight(varname)        weight variable for standardization
  nostdrescale              do not rescale the standard weight variable
  nolabel                   suppress value labels from varlist
  missing                   treat missing values like other values

if/in/over
  over(varlist[, nolabel])  group over subpopulations defined by varlist; optionally,
                              suppress group labels

SE/Cluster
  vce(vcetype)              vcetype may be analytic, cluster clustvar, bootstrap, or jackknife

Reporting
  level(#)                  set confidence level; default is level(95)
  citype(logit | normal)    method to compute limits of confidence intervals; default is
                              citype(logit)
  noheader                  suppress table header
  nolegend                  suppress table legend
  display_options           control column formats and line width

  coeflegend                display legend instead of statistics

bootstrap, jackknife, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Proportions
Description
proportion produces estimates of proportions, along with standard errors, for the categories
identified by the values in each variable of varlist.
Options
 
Model
stdize(varname) specifies that the point estimates be adjusted by direct standardization across the strata identified by varname. This option requires the stdweight() option.
stdweight(varname) specifies the weight variable associated with the standard strata identified in the stdize() option. The standardization weights must be constant within the standard strata.
nostdrescale prevents the standardization weights from being rescaled within the over() groups.
This option requires stdize() but is ignored if the over() option is not specified.
nolabel specifies that value labels attached to the variables in varlist be ignored.
missing specifies that missing values in varlist be treated as valid categories, rather than omitted
from the analysis (the default).
 
if/in/over
over(varlist[, nolabel]) specifies that estimates be computed for multiple subpopulations, which are identified by the different values of the variables in varlist.
When this option is supplied with one variable name, such as over(varname), the value labels of
varname are used to identify the subpopulations. If varname does not have labeled values (or there
are unlabeled values), the values themselves are used, provided that they are nonnegative integers.
Noninteger values, negative values, and labels that are not valid Stata names are substituted with
a default identifier.
When over() is supplied with multiple variable names, each subpopulation is assigned a unique
default identifier.
nolabel requests that value labels attached to the variables identifying the subpopulations be
ignored.
 
SE/Cluster
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (analytic), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
vce(analytic), the default, uses the analytically derived variance estimator associated with the
sample proportion.
 
Reporting
level(#); see [R]estimation options.
citype(logit | normal) specifies how to compute the limits of confidence intervals.
citype(logit), the default, uses the logit transformation to compute the limits of confidence
intervals.
citype(normal) uses the normal approximation to compute the limits of confidence intervals.
noheader prevents the table header from being displayed. This option implies nolegend.
nolegend prevents the table legend identifying the subpopulations from being displayed.
display_options: cformat(%fmt) and nolstretch; see [R] estimation options.
The following option is available with proportion but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Example 1
We can estimate the proportion of each repair rating in auto2.dta:
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. proportion rep78
Proportion estimation Number of obs = 69
Proportion Std. Err. [95% Conf. Interval]
rep78
Poor .0289855 .0203446 .0070061 .1121326
Fair .115942 .0388245 .0580159 .2183014
Average .4347826 .0601159 .3207109 .556206
Good .2608696 .0532498 .1690271 .3798066
Excellent .1594203 .0443922 .089188 .2686455
Here we use the missing option to include missing values as a category of rep78:
. proportion rep78, missing
Proportion estimation Number of obs = 74
_prop_6: rep78 = .
Proportion Std. Err. [95% Conf. Interval]
rep78
Poor .027027 .0189796 .0065484 .1047932
Fair .1081081 .0363433 .054094 .204402
Average .4054054 .0574637 .2977369 .523012
Good .2432432 .0502154 .1572724 .3563376
Excellent .1486486 .0416364 .0831005 .2517065
_prop_6 .0675676 .0293776 .0278144 .1550743
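The citype() option described under Options controls only how the confidence limits are formed; the point estimates and standard errors are unchanged. A sketch of requesting normal-approximation rather than logit-transformed limits for the same proportions:
. proportion rep78, citype(normal)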
Example 2
We can also estimate proportions over groups:
. proportion rep78, over(foreign)
Proportion estimation Number of obs = 69
Poor: rep78 = Poor
Fair: rep78 = Fair
Average: rep78 = Average
Good: rep78 = Good
Excellent: rep78 = Excellent
Domestic: foreign = Domestic
Foreign: foreign = Foreign
Over Proportion Std. Err. [95% Conf. Interval]
Poor
Domestic .0416667 .0291477 .0100299 .1572433
Foreign . (no observations)
Fair
Domestic .1666667 .0543607 .0839032 .3039797
Foreign . (no observations)
Average
Domestic .5625 .0723605 .4169211 .6980553
Foreign .1428571 .0782461 .0444941 .3736393
Good
Domestic .1875 .0569329 .0986718 .3272601
Foreign .4285714 .1106567 .2333786 .6488451
Excellent
Domestic .0416667 .0291477 .0100299 .1572433
Foreign .4285714 .1106567 .2333786 .6488451
Stored results
proportion stores the following in e():
Scalars
e(N)                   number of observations
e(N_over)              number of subpopulations
e(N_stdize)            number of standard strata
e(N_clust)             number of clusters
e(k_eq)                number of equations in e(b)
e(df_r)                sample degrees of freedom
e(rank)                rank of e(V)
Macros
e(cmd)                 proportion
e(cmdline)             command as typed
e(varlist)             varlist
e(stdize)              varname from stdize()
e(stdweight)           varname from stdweight()
e(wtype)               weight type
e(wexp)                weight expression
e(title)               title in estimation output
e(cluster)             name of cluster variable
e(over)                varlist from over()
e(over_labels)         labels from over() variables
e(over_namelist)       names from e(over_labels)
e(namelist)            proportion identifiers
e(label#)              labels from #th variable in varlist
e(vce)                 vcetype specified in vce()
e(vcetype)             title used to label Std. Err.
e(properties)          b V
e(estat_cmd)           program used to implement estat
e(marginsnotok)        predictions disallowed by margins
Matrices
e(b)                   vector of proportion estimates
e(V)                   (co)variance estimates
e(_N)                  vector of numbers of nonmissing observations
e(_N_stdsum)           number of nonmissing observations within the standard strata
e(_p_stdize)           standardizing proportions
e(error)               error code corresponding to e(b)
Functions
e(sample)              marks estimation sample
Methods and formulas
Proportions are means of indicator variables; see [R] mean.
Confidence intervals
Confidence intervals for proportions are calculated using a logit transform so that the endpoints lie between 0 and 1. Let $\hat p$ be an estimated proportion and $\hat s$ be an estimate of its standard error. Let
$$f(\hat p) = \ln\left(\frac{\hat p}{1 - \hat p}\right)$$
be the logit transform of the proportion. In this metric, an estimate of the standard error is
$$\widehat{\text{SE}}\{f(\hat p)\} = f'(\hat p)\,\hat s = \frac{\hat s}{\hat p(1 - \hat p)}$$
Thus a 100(1 − α)% confidence interval in this metric is
$$\ln\left(\frac{\hat p}{1 - \hat p}\right) \pm t_{1-\alpha/2}\,\frac{\hat s}{\hat p(1 - \hat p)}$$
where $t_{1-\alpha/2}$ is the (1 − α/2)th quantile of Student's t distribution with ν degrees of freedom. The endpoints of this confidence interval are transformed back to the proportion metric by using the inverse of the logit transform
$$f^{-1}(y) = \frac{e^y}{1 + e^y}$$
Hence, the displayed confidence intervals for proportions are
$$f^{-1}\left\{ \ln\left(\frac{\hat p}{1 - \hat p}\right) \pm t_{1-\alpha/2}\,\frac{\hat s}{\hat p(1 - \hat p)} \right\}$$
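As a sketch of this calculation for a single proportion (the column position of the estimate in e(b) is an assumption; list the matrix or use proportion, coeflegend to confirm the names), the displayed 95% limits can be reproduced by hand with Stata's logit() and invlogit() functions:
. quietly proportion foreign
. matrix b = e(b)
. matrix V = e(V)
. scalar phat = b[1,2]                        // assumed: second column holds the Foreign proportion
. scalar shat = sqrt(V[2,2])
. scalar tval = invttail(e(df_r), 0.025)      // t quantile for a 95% interval
. display invlogit(logit(phat) - tval*shat/(phat*(1-phat)))   // lower limit
. display invlogit(logit(phat) + tval*shat/(phat*(1-phat)))   // upper limit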
References
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Also see
[R] proportion postestimation — Postestimation tools for proportion
[R] mean — Estimate means
[R] ratio — Estimate ratios
[R] total — Estimate totals
[MI] estimation — Estimation commands for use with mi estimate
[SVY] direct standardization — Direct standardization of means, proportions, and ratios
[SVY] poststratification — Poststratification for survey data
[SVY] subpopulation estimation — Subpopulation estimation for survey data
[SVY] svy estimation — Estimation commands for survey data
[SVY] variance estimation — Variance estimation for survey data
[U] 20 Estimation and postestimation commands
Title
proportion postestimation — Postestimation tools for proportion
Description Remarks and examples Also see
Description
The following postestimation commands are available after proportion:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
Example 1
In example 2 of [R] proportion, we computed the proportions of cars with different repair records for each group, foreign or domestic. We use test to test whether the proportion of cars with repair record equal to 4 is the same for domestic and foreign cars.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. proportion rep78, over(foreign)
(output omitted )
. test [_prop_4]:Domestic=[_prop_4]:Foreign
( 1) [_prop_4]Domestic - [_prop_4]Foreign = 0
F( 1, 68) = 3.75
Prob > F = 0.0569
There is not a significant difference between those proportions at the 5% level.
Example 2
Continuing with auto.dta from example 1, we generate a new variable, highprice, that indicates whether the price is larger than $5,000, and then use proportion to see the proportion of cars with a high price among domestic and foreign cars separately.
. generate highprice = price>5000
. proportion highprice, over(foreign)
Proportion estimation Number of obs = 74
_prop_1: highprice = 0
_prop_2: highprice = 1
Domestic: foreign = Domestic
Foreign: foreign = Foreign
Over Proportion Std. Err. [95% Conf. Interval]
_prop_1
Domestic .5576923 .0695464 .4182157 .6886264
Foreign .3636364 .1049728 .1879015 .5852765
_prop_2
Domestic .4423077 .0695464 .3113736 .5817843
Foreign .6363636 .1049728 .4147235 .8120985
We will compute the odds ratio of having a high price in group Foreign to having a high price
in group Domestic. Usually, odds ratios are computed by using the logistic command, but here
we will perform the computation by using nlcom after proportion.
. nlcom OR: ([_prop_2]_b[Foreign]/[_prop_1]_b[Foreign])/([_prop_2]_b[Domestic]/
> [_prop_1]_b[Domestic])
OR: ([_prop_2]_b[Foreign]/[_prop_1]_b[Foreign])/([_prop_2]_b[Domesti
> c]/[_prop_1]_b[Domestic])
Proportion Coef. Std. Err. z P>|z| [95% Conf. Interval]
OR 2.206522 1.178522 1.87 0.061 -.1033393 4.516383
This is the same odds ratio that we would obtain from
. logistic highprice foreign
The odds ratio is slightly larger than 2, which means that the odds of having a high price among
foreign cars are more than twice that of having a high price among domestic cars.
Also see
[R] proportion — Estimate proportions
[SVY] svy postestimation — Postestimation tools for svy
[U] 20 Estimation and postestimation commands
Title
prtest — Tests of proportions
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
One-sample test of proportion
    prtest varname == #p [if] [in] [, level(#)]
Two-sample test of proportions using groups
    prtest varname [if] [in] , by(groupvar) [level(#)]
Two-sample test of proportions using variables
    prtest varname1 == varname2 [if] [in] [, level(#)]
Immediate form of one-sample test of proportion
    prtesti #obs1 #p1 #p2 [, level(#) count]
Immediate form of two-sample test of proportions
    prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]
by is allowed with prtest; see [D] by.
Menu
prtest
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test
prtesti
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Proportion test calculator
Description
prtest performs tests on the equality of proportions using large-sample statistics.
In the first form, prtest tests that varname has a proportion of #p. In the second form, prtest
tests that varname has the same proportion within the two groups defined by groupvar. In the third
form, prtest tests that varname1 and varname2 have the same proportion.
prtesti is the immediate form of prtest; see [U] 19 Immediate commands.
The bitest command is a better version of the first form of prtest in that it gives exact p-values.
Researchers should use bitest when possible, especially for small samples; see [R] bitest.
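For instance, with the automobile data that example 1 below loads, an exact binomial test of the same null hypothesis could be requested as a sketch; bitest uses the same varname == #p syntax as the first form of prtest:
. use http://www.stata-press.com/data/r13/auto
. bitest foreign == .4
(output omitted )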
Options
 
Main
by(groupvar) specifies a numeric variable that contains the group information for a given observation.
This variable must have only two values. Do not confuse the by() option with the by prefix; both
may be specified.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
count specifies that integer counts instead of proportions be used in the immediate forms of prtest.
In the first syntax, prtesti expects that #obs1 and #p1 are counts (with #p1 ≤ #obs1) and that #p2 is a
proportion. In the second syntax, prtesti expects that all four numbers are integer counts, with
#obs1 ≥ #p1 and #obs2 ≥ #p2.
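For example, assuming 26 successes were observed among 50 trials (the same data as example 3 below, expressed as a count rather than the proportion 0.52), the first immediate form could be typed as a sketch:
. prtesti 50 26 .70, count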
Remarks and examples
The prtest output follows the output of ttest in providing a lot of information. Each proportion
is presented along with a confidence interval. The appropriate one- or two-sample test is performed,
and the two-sided and both one-sided results are included at the bottom of the output. For a two-sample
test, the calculated difference is also presented with its confidence interval. This command may be
used for both large-sample testing and large-sample interval estimation.
Example 1: One-sample test of proportion
In the first form, prtest tests whether the proportion in the sample is equal to a known constant. Assume
that we have a sample of 74 automobiles. We wish to test whether the proportion of automobiles that
are foreign is different from 40%.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. prtest foreign == .4
One-sample test of proportion foreign: Number of obs = 74
Variable Mean Std. Err. [95% Conf. Interval]
foreign .2972973 .0531331 .1931583 .4014363
p = proportion(foreign) z = -1.8034
Ho: p = 0.4
Ha: p < 0.4 Ha: p != 0.4 Ha: p > 0.4
Pr(Z < z) = 0.0357 Pr(|Z| > |z|) = 0.0713 Pr(Z > z) = 0.9643
The test indicates that we cannot reject the hypothesis that the proportion of foreign automobiles is
0.40 at the 5% significance level.
Example 2: Two-sample test of proportions
We have two headache remedies that we give to patients. Each remedy’s effect is recorded as 0
for failing to relieve the headache and 1 for relieving the headache. We wish to test the equality of
the proportion of people relieved by the two treatments.
. use http://www.stata-press.com/data/r13/cure
. prtest cure1 == cure2
Two-sample test of proportions cure1: Number of obs = 50
cure2: Number of obs = 59
Variable Mean Std. Err. z P>|z| [95% Conf. Interval]
cure1 .52 .0706541 .3815205 .6584795
cure2 .7118644 .0589618 .5963013 .8274275
diff -.1918644 .0920245 -.372229 -.0114998
under Ho: .0931155 -2.06 0.039
diff = prop(cure1) - prop(cure2) z = -2.0605
Ho: diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(Z < z) = 0.0197 Pr(|Z| < |z|) = 0.0394 Pr(Z > z) = 0.9803
We find that the proportions are statistically different from each other at any level greater than 3.9%.
Example 3: Immediate form of one-sample test of proportion
prtesti is like prtest, except that you specify summary statistics rather than variables as
arguments. For instance, we are reading an article that reports the proportion of registered voters
among 50 randomly selected eligible voters as 0.52. We wish to test whether the proportion is 0.7:
. prtesti 50 .52 .70
One-sample test of proportion x: Number of obs = 50
Variable Mean Std. Err. [95% Conf. Interval]
x .52 .0706541 .3815205 .6584795
p = proportion(x) z = -2.7775
Ho: p = 0.7
Ha: p < 0.7 Ha: p != 0.7 Ha: p > 0.7
Pr(Z < z) = 0.0027 Pr(|Z| > |z|) = 0.0055 Pr(Z > z) = 0.9973
Example 4: Immediate form of two-sample test of proportions
To judge teacher effectiveness, we wish to test whether the same proportion of people from
two classes will answer an advanced question correctly. In the first classroom of 30 students, 40%
answered the question correctly, whereas in the second classroom of 45 students, 67% answered the
question correctly.
. prtesti 30 .4 45 .67
Two-sample test of proportions x: Number of obs = 30
y: Number of obs = 45
Variable Mean Std. Err. z P>|z| [95% Conf. Interval]
x .4 .0894427 .2246955 .5753045
y .67 .0700952 .532616 .807384
diff -.27 .1136368 -.4927241 -.0472759
under Ho: .1169416 -2.31 0.021
diff = prop(x) - prop(y) z = -2.3088
Ho: diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(Z < z) = 0.0105 Pr(|Z| < |z|) = 0.0210 Pr(Z > z) = 0.9895
Stored results
prtest and prtesti store the following in r():
Scalars
r(z)      z statistic
r(P_#)    proportion for variable #
r(N_#)    number of observations for variable #
Methods and formulas
See Acock (2014, 155–161) for additional examples of tests of proportions using Stata.
A large-sample $100(1-\alpha)\%$ confidence interval for a proportion $p$ is
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
and a $100(1-\alpha)\%$ confidence interval for the difference of two proportions is given by
$$(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
where $\hat{q} = 1 - \hat{p}$ and $z$ is calculated from the inverse cumulative standard normal distribution.
The one-tailed and two-tailed tests of a population proportion use a normally distributed test statistic calculated as
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
where $p_0$ is the hypothesized proportion. A test of the difference of two proportions also uses a normally distributed test statistic calculated as
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p \hat{q}_p (1/n_1 + 1/n_2)}}$$
where
$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2}$$
and $x_1$ and $x_2$ are the total number of successes in the two populations.
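As a quick check of the one-sample formula, the z statistic from example 3 above ($\hat{p} = 0.52$, $p_0 = 0.7$, $n = 50$) can be reproduced with display; the result is approximately -2.78, matching the reported z = -2.7775:
. display (.52 - .7)/sqrt(.7*.3/50)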
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Wang, D. 2000. sg154: Confidence intervals for the ratio of two binomial proportions by Koopman's method. Stata
Technical Bulletin 58: 16–19. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 244–247. College Station,
TX: Stata Press.
Also see
[R] bitest Binomial probability test
[R] proportion Estimate proportions
[R] ttest t tests (mean-comparison tests)
[MV] hotelling Hotelling's T-squared generalized means test
Title
pwcompare — Pairwise comparisons
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
pwcompare marginlist [, options]
where marginlist is a list of factor variables or interactions that appear in the current estimation results
or _eqns to reference equations. The variables may be typed with or without the i. prefix, and you
may use any factor-variable syntax:
. pwcompare i.sex i.group i.sex#i.group
. pwcompare sex group sex#group
. pwcompare sex##group
options Description
Main
mcompare(method) adjust for multiple comparisons; default is mcompare(noadjust)
asobserved treat all factor variables as observed
Equations
equation(eqspec)perform comparisons within equation eqspec
atequations perform comparisons within each equation
Advanced
emptycells(empspec)treatment of empty cells for balanced factors
noestimcheck suppress estimability checks
Reporting
level(#) confidence level; default is level(95)
cieffects show effects table with confidence intervals; the default
pveffects show effects table with p-values
effects show effects table with confidence intervals and p-values
cimargins show table of margins and confidence intervals
groups show table of margins and group codes
sort sort the margins or contrasts within each term
post post margins and their VCEs as estimation results
display_options control column formats, row spacing, line width, and factor-variable labeling
eform_option report exponentiated contrasts
df(#) use t distribution with # degrees of freedom for computing p-values
and confidence intervals
df(#) does not appear in the dialog box.
method                   Description
noadjust                 do not adjust for multiple comparisons; the default
bonferroni [adjustall]   Bonferroni's method; adjust across all terms
sidak [adjustall]        Šidák's method; adjust across all terms
scheffe                  Scheffé's method
tukey                    Tukey's method
snk                      Student–Newman–Keuls' method
duncan                   Duncan's method
dunnett                  Dunnett's method
tukey, snk, duncan, and dunnett are only allowed with results from anova, manova, regress, and mvreg.
tukey, snk, duncan, and dunnett are not allowed with results from svy.
Time-series operators are allowed if they were used in the estimation.
Menu
Statistics >Postestimation >Pairwise comparisons
Description
pwcompare performs pairwise comparisons across the levels of factor variables from the most
recently fit model. pwcompare can compare estimated cell means, marginal means, intercepts, marginal
intercepts, slopes, or marginal slopes, collectively called margins. pwcompare reports the comparisons
as contrasts (differences) of margins along with significance tests or confidence intervals for the
contrasts. The tests and confidence intervals can be adjusted for multiple comparisons.
pwcompare can be used with svy estimation results; see [SVY] svy postestimation.
See [R] margins, pwcompare for performing pairwise comparisons of margins of linear and
nonlinear predictions.
Options
 
Main
mcompare(method) specifies the method for computing p-values and confidence intervals that account
for multiple comparisons within a factor-variable term.
Most methods adjust the comparisonwise error rate, $\alpha_c$, to achieve a prespecified experimentwise
error rate, $\alpha_e$.
mcompare(noadjust) is the default; it specifies no adjustment.
$$\alpha_c = \alpha_e$$
mcompare(bonferroni) adjusts the comparisonwise error rate based on the upper limit of the
Bonferroni inequality:
$$\alpha_e \le m\alpha_c$$
where $m$ is the number of comparisons within the term.
The adjusted comparisonwise error rate is
$$\alpha_c = \alpha_e / m$$
mcompare(sidak) adjusts the comparisonwise error rate based on the upper limit of the probability
inequality
$$\alpha_e \le 1 - (1 - \alpha_c)^m$$
where $m$ is the number of comparisons within the term.
The adjusted comparisonwise error rate is
$$\alpha_c = 1 - (1 - \alpha_e)^{1/m}$$
This adjustment is exact when the $m$ comparisons are independent.
mcompare(scheffe) controls the experimentwise error rate using the $F$ (or $\chi^2$) distribution with
degrees of freedom equal to the rank of the term.
For results from anova, regress, manova, and mvreg (see [R] anova, [R] regress, [MV] manova,
and [MV] mvreg), pwcompare allows the following additional methods. These methods are not
allowed with results that used vce(robust) or vce(cluster clustvar).
mcompare(tukey) uses what is commonly referred to as Tukey’s honestly significant difference.
This method uses the Studentized range distribution instead of the t distribution.
mcompare(snk) is a variation on mcompare(tukey) that counts only the number of margins in
the range for a given comparison instead of the full number of margins.
mcompare(duncan) is a variation on mcompare(snk) with additional adjustment to the significance
probabilities.
mcompare(dunnett) uses Dunnett’s method for making comparisons with a reference category.
mcompare(method adjustall) specifies that the multiple-comparison adjustments count all
comparisons across all terms rather than performing multiple comparisons term by term. This
leads to more conservative adjustments when multiple variables or terms are specified in
marginlist. This option is compatible only with the bonferroni and sidak methods.
asobserved specifies that factor covariates be evaluated using the cell frequencies observed when the
model was fit. The default is to treat all factor covariates as though there were an equal number
of observations at each level.
 
Equations
equation(eqspec)specifies the equation from which margins are to be computed. The default is to
compute margins from the first equation.
atequations specifies that the margins be computed within each equation.
 
Advanced
emptycells(empspec)specifies how empty cells are handled in interactions involving factor variables
that are being treated as balanced.
emptycells(strict) is the default; it specifies that margins involving empty cells be treated as
not estimable.
emptycells(reweight) specifies that the effects of the observed cells be increased to accommodate
any missing cells. This makes the margins estimable but changes their interpretation.
noestimcheck specifies that pwcompare not check for estimability. By default, the requested margins
are checked and those found not estimable are reported as such. Nonestimability is usually caused
by empty cells. If noestimcheck is specified, estimates are computed in the usual way and
reported even though the resulting estimates are manipulable, which is to say they can differ across
equivalent models having different parameterizations.
pwcompare — Pairwise comparisons 1701
 
Reporting
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
The significance level used by the groups option is 100 − #, expressed as a percentage.
cieffects specifies that a table of the pairwise comparisons with their standard errors and confidence
intervals be reported. This is the default.
pveffects specifies that a table of the pairwise comparisons with their standard errors, test statistics,
and p-values be reported.
effects specifies that a table of the pairwise comparisons with their standard errors, test statistics,
p-values, and confidence intervals be reported.
cimargins specifies that a table of the margins with their standard errors and confidence intervals
be reported.
groups specifies that a table of the margins with their standard errors and group codes be reported.
Margins with the same letter in the group code are not significantly different at the specified
significance level.
sort specifies that the reported tables be sorted on the margins or differences in each term.
post causes pwcompare to behave like a Stata estimation (e-class) command. pwcompare posts the
vector of estimated margins along with the estimated variance–covariance matrix to e(), so you
can treat the estimated margins just as you would results from any other estimation command. For
example, you could use test to perform simultaneous tests of hypotheses on the margins, or you
could use lincom to create linear combinations.
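As a sketch of that workflow using the fertilizer model fit in Remarks and examples below: after posting, the margins can be inspected and tested like ordinary coefficients. The names 1.fertilizer and 2.fertilizer are an assumption about how the posted margins are labeled; matrix list e(b) shows the actual names.
. pwcompare fertilizer, post
. matrix list e(b)
. test 1.fertilizer = 2.fertilizer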
display_options: vsquish, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt),
pformat(%fmt), sformat(%fmt), and nolstretch.
vsquish specifies that the blank space separating factor-variable terms or time-series–operated
variables from other variables in the model be suppressed.
nofvlabel displays factor-variable level values rather than attached value labels. This option
overrides the fvlabel setting; see [R] set showbaselevels.
fvwrap(#) specifies how many lines to allow when long value labels must be wrapped. Labels
requiring more than # lines are truncated. This option overrides the fvwrap setting; see [R] set
showbaselevels.
fvwrapon(style) specifies whether value labels that wrap will break at word boundaries or break
based on available space.
fvwrapon(word), the default, specifies that value labels break at word boundaries.
fvwrapon(width) specifies that value labels break based on available space.
This option overrides the fvwrapon setting; see [R] set showbaselevels.
cformat(%fmt) specifies how to format contrasts or margins, standard errors, and confidence
limits in the table of pairwise comparisons.
pformat(%fmt) specifies how to format p-values in the table of pairwise comparisons.
sformat(%fmt) specifies how to format test statistics in the table of pairwise comparisons.
nolstretch specifies that the width of the table of pairwise comparisons not be automatically
widened to accommodate longer variable names. The default, lstretch, is to automatically
widen the table of pairwise comparisons up to the width of the Results window. To change the
default, use set lstretch off. nolstretch is not shown in the dialog box.
eform_option specifies that the contrasts table be displayed in exponentiated form. exp(contrast) is
displayed rather than contrast. Standard errors and confidence intervals are also transformed. See
[R] eform_option for the list of available options.
The following option is available with pwcompare but is not shown in the dialog box:
df(#) specifies that the t distribution with # degrees of freedom be used for computing p-values and
confidence intervals. The default is to use e(df_r) degrees of freedom or the standard normal
distribution if e(df_r) is missing.
Remarks and examples
pwcompare performs pairwise comparisons of margins across the levels of factor variables from
the most recently fit model. The margins can be estimated cell means, marginal means, intercepts,
marginal intercepts, slopes, or marginal slopes. With the exception of slopes, we can also consider
these margins to be marginal linear predictions.
The margins are calculated as linear combinations of the coefficients. Let $k$ be the number of
levels for a factor term in our model; then there are $k$ margins for that term, and
$$m = \binom{k}{2} = \frac{k(k-1)}{2}$$
unique pairwise comparisons of those margins.
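For example, a term with five levels, such as the fertilizer factor analyzed below, yields 10 unique comparisons; comb() is Stata's binomial-coefficient function:
. display comb(5,2)
10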
The confidence intervals and p-values for these pairwise comparisons can be adjusted to account
for multiple comparisons. Bonferroni's, Šidák's, and Scheffé's adjustments can be made for multiple
comparisons after fitting any type of model. In addition, Tukey's, Student–Newman–Keuls', Duncan's,
and Dunnett's adjustments are available when fitting ANOVA, linear regression, MANOVA, or multivariate
regression models.
Remarks are presented under the following headings:
Pairwise comparisons of means
Marginal means
All pairwise comparisons
Overview of multiple-comparison methods
Fisher’s protected least-significant difference (LSD)
Bonferroni’s adjustment
Šidák's adjustment
Scheffé's adjustment
Tukey’s HSD adjustment
Student–Newman–Keuls’ adjustment
Duncan’s adjustment
Dunnett’s adjustment
Example adjustments using one-way models
Fisher’s protected LSD
Tukey’s HSD
Dunnett’s method for comparisons to a control
Two-way models
Pairwise comparisons of slopes
Nonlinear models
Multiple-equation models
Unbalanced data
Empty cells
Pairwise comparisons of means
Suppose we are interested in the effects of five different fertilizers on wheat yield. We could
estimate the following linear regression model to determine the effect of each type of fertilizer on
the yield.
. use http://www.stata-press.com/data/r13/yield
(Artificial wheat yield dataset)
. regress yield i.fertilizer
Source SS df MS Number of obs = 200
F( 4, 195) = 5.33
Model 1078.84207 4 269.710517 Prob > F = 0.0004
Residual 9859.55334 195 50.561812 R-squared = 0.0986
Adj R-squared = 0.0801
Total 10938.3954 199 54.9668111 Root MSE = 7.1107
yield Coef. Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22 3.62272 1.589997 2.28 0.024 .4869212 6.758518
16-04-08 .4906299 1.589997 0.31 0.758 -2.645169 3.626428
18-24-06 4.922803 1.589997 3.10 0.002 1.787005 8.058602
29-03-04 -1.238328 1.589997 -0.78 0.437 -4.374127 1.89747
_cons 41.36243 1.124298 36.79 0.000 39.14509 43.57977
In this simple case, the coefficients for fertilizers 10-08-22, 16-04-08, 18-24-06, and 29-03-04 indicate
the difference in the mean yield for that fertilizer versus the mean yield for fertilizer 10-10-10. That
the standard errors of all four coefficients are identical results from having perfectly balanced data.
Marginal means
We can use pwcompare with the cimargins option to compute the mean yield for each of the
fertilizers.
. pwcompare fertilizer, cimargins
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted
Margin Std. Err. [95% Conf. Interval]
fertilizer
10-10-10 41.36243 1.124298 39.14509 43.57977
10-08-22 44.98515 1.124298 42.7678 47.20249
16-04-08 41.85306 1.124298 39.63571 44.0704
18-24-06 46.28523 1.124298 44.06789 48.50258
29-03-04 40.1241 1.124298 37.90676 42.34145
Looking at the confidence intervals for fertilizers 10-10-10 and 10-08-22 in the table above, we might
be tempted to conclude that these means are not significantly different because the intervals overlap.
However, as discussed in Interaction plots of [R] marginsplot, we cannot draw conclusions about the
differences in means by looking at confidence intervals for the means themselves. Instead, we would
need to look at confidence intervals for the difference in means.
All pairwise comparisons
By default, pwcompare calculates all pairwise differences of the margins, in this case pairwise
differences of the mean yields.
. pwcompare fertilizer
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted
Contrast Std. Err. [95% Conf. Interval]
fertilizer
10-08-22 vs 10-10-10 3.62272 1.589997 .4869212 6.758518
16-04-08 vs 10-10-10 .4906299 1.589997 -2.645169 3.626428
18-24-06 vs 10-10-10 4.922803 1.589997 1.787005 8.058602
29-03-04 vs 10-10-10 -1.238328 1.589997 -4.374127 1.89747
16-04-08 vs 10-08-22 -3.13209 1.589997 -6.267889 .0037086
18-24-06 vs 10-08-22 1.300083 1.589997 -1.835715 4.435882
29-03-04 vs 10-08-22 -4.861048 1.589997 -7.996847 -1.725249
18-24-06 vs 16-04-08 4.432173 1.589997 1.296375 7.567972
29-03-04 vs 16-04-08 -1.728958 1.589997 -4.864757 1.406841
29-03-04 vs 18-24-06 -6.161132 1.589997 -9.29693 -3.025333
If a confidence interval does not include zero, the means for the compared fertilizers are significantly
different. Therefore, at the 5% significance level, we would reject the hypothesis that the means
for fertilizers 10-10-10 and 10-08-22 are equivalent, as we would do for 18-24-06 vs 10-10-10,
29-03-04 vs 10-08-22, 18-24-06 vs 16-04-08, and 29-03-04 vs 18-24-06.
We may prefer to see the p-values instead of looking at confidence intervals to determine whether
the pairwise differences are significantly different from zero. We could use the pveffects option
to see the differences with standard errors and p-values, or we could use the effects option to see
both p-values and confidence intervals in the same table. Here we specify effects as well as the
sort option so that the differences are sorted from smallest to largest.
. pwcompare fertilizer, effects sort
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted Unadjusted
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.000 -9.29693 -3.025333
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.003 -7.996847 -1.725249
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.050 -6.267889 .0037086
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.278 -4.864757 1.406841
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.437 -4.374127 1.89747
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.758 -2.645169 3.626428
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.415 -1.835715 4.435882
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.024 .4869212 6.758518
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.006 1.296375 7.567972
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.002 1.787005 8.058602
We find that 5 of the 10 pairs of means are significantly different at the 5% significance level.
We can use the groups option to obtain a table that identifies groups whose means are not
significantly different by assigning them the same letter.
. pwcompare fertilizer, groups sort
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted
Margin Std. Err. Groups
fertilizer
29-03-04 40.1241 1.124298 A
10-10-10 41.36243 1.124298 A
16-04-08 41.85306 1.124298 AB
10-08-22 44.98515 1.124298 BC
18-24-06 46.28523 1.124298 C
Note: Margins sharing a letter in the group label
are not significantly different at the 5%
level.
The letter A that is assigned to fertilizers 29-03-04, 10-10-10, and 16-04-08 designates that the mean
yields for these fertilizers are not different at the 5% level.
Overview of multiple-comparison methods
For a single test, if we choose a 5% significance level, we would have a 5% chance of concluding
that two margins are different when the population values are actually equal. This is known as making
a type I error. When we perform $m = k(k-1)/2$ pairwise comparisons of the $k$ margins, we have
$m$ opportunities to make a type I error.
pwcompare with the mcompare() option allows us to adjust the confidence intervals and p-values
for each comparison to account for the increased probability of making a type I error when making
multiple comparisons. Bonferroni's adjustment, Šidák's adjustment, and Scheffé's adjustment can be
used when making pairwise comparisons of the margins after any estimation command. Tukey's
honestly significant difference, Student–Newman–Keuls' method, Duncan's method, and Dunnett's
method are only available when fitting linear models after anova, manova, regress, or mvreg.
Fisher’s protected least-significant difference (LSD)
pwcompare does not offer an mcompare() option specifically for Fisher’s protected least-significant
difference (LSD). In this methodology, no adjustment is made to the confidence intervals or p-values.
However, it is protected in the sense that no pairwise comparisons are tested unless the joint test
for the corresponding term in the model is significant. Therefore, the default mcompare(noadjust)
corresponds to Fisher’s protected LSD assuming that the corresponding joint test was performed before
using pwcompare.
Milliken and Johnson (2009) recommend using this methodology for planned comparisons, assuming
the corresponding joint test is significant.
Bonferroni’s adjustment
mcompare(bonferroni) adjusts significance levels based on the Bonferroni inequality, which,
in the case of multiple testing, tells us that the maximum error rate for all comparisons is the sum
of the error rates for the individual comparisons. Assuming that we are using the same significance
level for all tests, the experimentwise error rate is the error rate for a single test multiplied by the
number of comparisons. Therefore, a p-value for each comparison can be computed by multiplying
the unadjusted p-value by the total number of comparisons. If the adjusted p-value is greater than 1,
then pwcompare will report a p-value of 1.
Bonferroni’s adjustment is popular because it is easy to compute manually and because it can be
applied to any set of tests, not only the pairwise comparisons available in pwcompare. In addition,
this method does not require equal sample sizes.
Because Bonferroni’s adjustment is so general, it is more conservative than many of the other
adjustments. It is especially conservative when a large number of tests is being performed.
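As a sketch of the arithmetic: in the one-way fertilizer example above there are m = 10 comparisons, so the unadjusted p-value of 0.024 reported for 10-08-22 vs 10-10-10 would become min(1, 10 × 0.024) = 0.24 under Bonferroni's adjustment:
. display min(1, 10*.024)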
Šidák's adjustment
mcompare(sidak) performs an adjustment using Šidák's method. This adjustment, like Bonferroni's
adjustment, is derived from an inequality. However, in this case, the inequality is based on the
probability of not making a type I error. For a single test, the probability that we do not make a type
I error is $1-\alpha$. For two independent tests, both using $\alpha$ as a significance level, the probability is
$(1-\alpha)(1-\alpha)$. Likewise, for $m$ independent tests, the probability of not making a type I error is
$(1-\alpha)^m$. Therefore, the probability of making one or more type I errors is $1-(1-\alpha)^m$. When
tests are not independent, the probability of making at least one error is less than $1-(1-\alpha)^m$.
Therefore, we can compute an adjusted p-value as $1-(1-u_p)^m$, where $u_p$ is the unadjusted p-value
for a single comparison.
Šidák's method is also conservative although slightly less so than Bonferroni's method. Like
Bonferroni's method, this method does not require equal sample sizes.
Scheffé's adjustment
Scheffé's adjustment is used when mcompare(scheffe) is specified. This adjustment is derived
from the joint F test and its correspondence to the maximum normalized comparison. To adjust for
multiple comparisons, the absolute value of the t statistic for a particular comparison can be compared
with a critical value of $\sqrt{(k-1)F_{k-1,\nu}}$, where $\nu$ is the residual degrees of freedom. $F_{k-1,\nu}$ is
the distribution of the joint F test for the corresponding term in a one-way ANOVA model. Winer,
Brown, and Michels (1991, 191–195) discuss this in detail. For estimation commands that report z
statistics instead of t statistics for the tests on coefficients, a $\chi^2$ distribution is used instead of an F
distribution.
Scheffé's method allows for making all possible comparisons of the $k$ margins, not just the
pairwise comparisons. Unlike the methods described above, it does not take into account the number
of comparisons that are currently being made. Therefore, this method is even more conservative
than the others. Because this method adjusts for all possible comparisons of the levels of the term,
Milliken and Johnson (2009) recommend using this procedure when making unplanned contrasts that
are suggested by the data. As Winer, Brown, and Michels (1991, 191) put it, this method is often
used to adjust for "unfettered data snooping". When using this adjustment, a contrast will never be
significant if the joint F or $\chi^2$ test for the term is not also significant.
This is another method that does not require equal sample sizes.
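As a sketch of the critical-value computation for the one-way fertilizer model above, where the term has k = 5 levels and the residual degrees of freedom are ν = 195 (see the regression output), invFtail() returns the upper-tail inverse F distribution:
. display sqrt((5-1)*invFtail(5-1, 195, .05))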
Tukey’s HSD adjustment
Tukey’s adjustment is also referred to as Tukey’s honestly significant difference (HSD) and is
used when mcompare(tukey) is specified. It is often applied to all pairwise comparisons of means.
Tukey’s HSD is commonly used as a post hoc test although this is not a requirement.
To adjust for multiple comparisons, Tukey’s method compares the absolute value of the tstatistic
from the individual comparison with a critical value based on a Studentized range distribution with
parameter equal to the number of levels in the term. When applied to pairwise comparisons of means,
q=meanmax meanmin
s
follows a Studentized range distribution with parameter kand νdegrees of freedom. Here meanmax
and meanmin are the largest and smallest marginal means, and sis an estimate of the standard error
of the means.
Now for the comparison of the smallest and largest means, we can say that the probability of not
making a type I error is
Pr meanmax meanmin
sqk,ν = 1 α
Then the following inquality holds for all pairs of means simultaneously:
Pr |meanimeanj|
sqk,ν 1α
Based on this procedure, Tukey’s HSD computes the p-value for each of the individual comparisons
using the Studentized range distribution. However, because the equality holds only for the difference
in the largest and smallest means, this procedure produces conservative tests for the remaining
comparisons. Winer, Brown, and Michels (1991, 172–182) discuss this in further detail.
With unequal sample sizes, mcompare(tukey) produces the TukeyKramer adjustment
(Tukey 1953;Kramer 1956).
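As a sketch of that p-value computation, Stata's tukeyprob() function gives the cumulative Studentized range distribution. For the 29-03-04 vs 18-24-06 comparison in the one-way fertilizer example above, |t| = 3.87 with k = 5 and ν = 195, and the Studentized range statistic for a pairwise comparison is q = √2 × |t|; the result should be close to the Tukey-adjusted p-value of 0.001 reported in the Tukey's HSD example later in this entry:
. display 1 - tukeyprob(5, 195, sqrt(2)*3.87)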
Student–Newman–Keuls’ adjustment
The Student–Newman–Keuls (SNK) method is used when mcompare(snk) is specified. It is a
modification to Tukey's method and is less conservative. In this procedure, we first order the means.
We then test the difference in the smallest and largest means using a critical value from the Studentized
range distribution with parameter $k$, where $k$ is the number of levels in the term. This step uses
the same methodology as in Tukey's procedure. However, in the next step, we will then test for
differences in the two sets of means that are the endpoints of the two ranges including $k-1$ means.
Specifically, we test the difference in the smallest mean and the second-largest mean using a critical
value from the Studentized range distribution with parameter $k-1$. We would also test the difference
in the second-smallest mean and the largest mean using this critical value. Likewise, the means that
are the endpoints of ranges including $k-2$ means when ordered are tested using the Studentized
range distribution with parameter $k-2$, and so on.
Equal sample sizes are required for this method.
Duncan’s adjustment
When mcompare(duncan) is specified, tests are adjusted for multiple comparisons using Duncan's
method, which is sometimes referred to as Duncan's new multiple range method. This adjustment
produces tests that are less conservative than both Tukey's HSD and SNK. This procedure is performed
in the same manner as SNK except that the p-values for the individual comparisons are adjusted as
$1-(1-{}_{\mathrm{snk}}p_i)^{1/(r+1)}$, where ${}_{\mathrm{snk}}p_i$ is the p-value computed using the SNK method and $r$ represents
the number of means that, when ordered, fall between the two that are being compared.
Again, equal sample sizes are required for this adjustment.
Dunnett’s adjustment
Dunnett’s adjustment is obtained by specifying mcompare(dunnett). It is used when one of the
levels of a factor can be considered a control or reference level with which each of the other levels
is being compared. When Dunnett’s adjustment is requested, k1 instead of k(k1)/2 pairwise
comparisons are made. Dunnett (1955,1964) developed tables of critical values for what Miller (1981,
76) refers to as the “many-one tstatistic”. The tstatistics for individual comparisons are compared
with these critical values when making many comparisons to a single reference level.
This method also requires equal sample sizes.
Example adjustments using one-way models
Fisher’s protected LSD
Fisher’s protected LSD requires that we first verify that the joint test for a term in our model is
significant before proceeding with pairwise comparisons. Using our previous example, we could have
first used the contrast command to obtain a joint test for the effects of fertilizer.
. contrast fertilizer
Contrasts of marginal linear predictions
Margins : asbalanced
df F P>F
fertilizer 4 5.33 0.0004
Denominator 195
This test for the effects of fertilizer is highly significant. Now we can say we are using Fisher’s
protected LSD when looking at the unadjusted p-values that were obtained from our previous command,
. pwcompare fertilizer, effects sort
Tukey’s HSD
Because we fit a linear regression model and are interested in all pairwise comparisons of the
marginal means, we may instead choose to use Tukey’s HSD.
. pwcompare fertilizer, effects sort mcompare(tukey)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer 10
Tukey Tukey
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.001 -10.53914 -1.78312
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.021 -9.239059 -.4830368
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.285 -7.510101 1.245921
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.813 -6.106969 2.649053
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.936 -5.616339 3.139683
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.998 -3.887381 4.868641
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.925 -3.077928 5.678095
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.156 -.7552913 8.000731
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.046 .0541623 8.810185
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.019 .5447922 9.300815
This time, our p-values have been modified, and we find that only four of the pairwise differences
are considered significantly different from zero at the 5% level.
If we are only interested in performing pairwise comparisons of a subset of our means, we can use
factor-variable operators to select the levels of the factor that we want to compare. Here we exclude
all comparisons involving fertilizer 10-10-10.
. pwcompare i(2/5).fertilizer, effects sort mcompare(tukey)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer 6
Tukey Tukey
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.001 -10.28133 -2.040937
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.013 -8.981242 -.7408538
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.203 -7.252284 .9881042
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.698 -5.849152 2.391236
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.846 -2.820111 5.420278
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.030 .3119792 8.552368
The adjusted p-values and confidence intervals differ from those in the previous output because
Tukey’s adjustment takes into account the total number of comparisons being made when determining
the appropriate degrees of freedom to use for the Studentized range distribution.
Dunnett’s method for comparisons to a control
If one of our five fertilizer groups represents fields where no fertilizer was applied, we may want
to use Dunnett’s method to compare each of the four fertilizers with the control group. In this case,
we make only k1 comparisons for kgroups.
. pwcompare fertilizer, effects mcompare(dunnett)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer 4
Dunnett Dunnett
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.079 -.2918331 7.537273
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.994 -3.423923 4.405183
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.008 1.00825 8.837356
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.852 -5.152881 2.676225
In our previous regress command, fertilizer 10-10-10 was treated as the base. Therefore, by
default, it was treated as the control when using Dunnett’s adjustment, and the pairwise comparisons
are equivalent to the coefficients reported by regress. Based on our regress output, we would
conclude that fertilizers 10-08-22 and 18-24-06 are different from fertilizer 10-10-10 at the 5% level.
However, using Dunnett’s adjustment, we find only fertilizer 18-24-06 to be different from fertilizer
10-10-10 at this same significance level.
If the model is fit without a base level for a factor variable, then pwcompare will choose the
first level as the reference level. If we want to make comparisons with a different level than the one
mcompare(dunnett) chooses by default, we can use the b. operator to override the default. Here
we use fertilizer 5 (29-03-04) as the reference level.
. pwcompare b5.fertilizer, effects sort mcompare(dunnett)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer 4
Dunnett Dunnett
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-10-10
vs
29-03-04 1.238328 1.589997 0.78 0.852 -2.676225 5.152881
16-04-08
vs
29-03-04 1.728958 1.589997 1.09 0.649 -2.185595 5.643511
10-08-22
vs
29-03-04 4.861048 1.589997 3.06 0.009 .9464951 8.775601
18-24-06
vs
29-03-04 6.161132 1.589997 3.87 0.001 2.246579 10.07568
Two-way models
In the previous examples, we have performed pairwise comparisons after fitting a model with a
single factor. Now we include two factors and their interaction in our model.
. regress yield fertilizer##irrigation
Source SS df MS Number of obs = 200
F( 9, 190) = 27.63
Model 6200.81605 9 688.979561 Prob > F = 0.0000
Residual 4737.57936 190 24.9346282 R-squared = 0.5669
Adj R-squared = 0.5464
Total 10938.3954 199 54.9668111 Root MSE = 4.9935
yield Coef. Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22 1.882256 1.57907 1.19 0.235 -1.232505 4.997016
16-04-08 -.5687418 1.57907 -0.36 0.719 -3.683502 2.546019
18-24-06 4.904999 1.57907 3.11 0.002 1.790239 8.01976
29-03-04 -1.217496 1.57907 -0.77 0.442 -4.332257 1.897264
1.irrigation 8.899721 1.57907 5.64 0.000 5.784961 12.01448
fertilizer#
irrigation
10-08-22#1 3.480928 2.233143 1.56 0.121 -.9240084 7.885865
16-04-08#1 2.118743 2.233143 0.95 0.344 -2.286193 6.52368
18-24-06#1 .0356082 2.233143 0.02 0.987 -4.369328 4.440545
29-03-04#1 -.0416636 2.233143 -0.02 0.985 -4.4466 4.363273
_cons 36.91257 1.116571 33.06 0.000 34.7101 39.11504
We can perform pairwise comparisons of the cell means defined by the fertilizer and irrigation
interaction.
. pwcompare fertilizer#irrigation, sort groups mcompare(tukey)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer#irrigation 45
Tukey
Margin Std. Err. Groups
fertilizer#irrigation
29-03-04#0 35.69507 1.116571 A
16-04-08#0 36.34383 1.116571 A
10-10-10#0 36.91257 1.116571 AB
10-08-22#0 38.79482 1.116571 AB
18-24-06#0 41.81757 1.116571 BC
29-03-04#1 44.55313 1.116571 CD
10-10-10#1 45.81229 1.116571 CDE
16-04-08#1 47.36229 1.116571 DEF
18-24-06#1 50.7529 1.116571 EF
10-08-22#1 51.17547 1.116571 F
Note: Margins sharing a letter in the group label are
not significantly different at the 5% level.
Based on Tukey’s HSD and a 5% significance level, we would conclude that the mean yield for
fertilizer 29-03-04 without irrigation is not significantly different from the mean yields for fertilizers
10-10-10, 10-08-22, and 16-04-08 when used without irrigation but is significantly different from the
remaining means.
Up to this point, most of the pairwise comparisons that we have performed could have also been
obtained with pwmean (see [R] pwmean) if we had not been interested in examining the results from
the estimation command before making pairwise comparisons of the means. For instance, we could
reproduce the results from the above pwcompare command by typing
. pwmean yield, over(fertilizer irrigation) sort group mcompare(tukey)
However, pwcompare extends the capabilities of pwmean in many ways. For instance, pwmean
only allows for pairwise comparisons of the cell means determined by the highest level interaction of
the variables specified in the over() option. However, pwcompare allows us to fit a single model,
such as the two-way model that we fit above,
. regress yield fertilizer##irrigation
and compute pairwise comparisons of the marginal means for only one of the variables in the model:
. pwcompare fertilizer, sort effects mcompare(tukey)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer 10
Tukey Tukey
Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.116571 -5.52 0.000 -9.236338 -3.085925
29-03-04
vs
10-08-22 -4.861048 1.116571 -4.35 0.000 -7.936255 -1.785841
16-04-08
vs
10-08-22 -3.13209 1.116571 -2.81 0.044 -6.207297 -.0568832
29-03-04
vs
16-04-08 -1.728958 1.116571 -1.55 0.532 -4.804165 1.346249
29-03-04
vs
10-10-10 -1.238328 1.116571 -1.11 0.802 -4.313535 1.836879
16-04-08
vs
10-10-10 .4906299 1.116571 0.44 0.992 -2.584577 3.565837
18-24-06
vs
10-08-22 1.300083 1.116571 1.16 0.772 -1.775123 4.37529
10-08-22
vs
10-10-10 3.62272 1.116571 3.24 0.012 .5475131 6.697927
18-24-06
vs
16-04-08 4.432173 1.116571 3.97 0.001 1.356967 7.50738
18-24-06
vs
10-10-10 4.922803 1.116571 4.41 0.000 1.847597 7.99801
Here the standard errors for the differences in marginal means and the residual degrees of freedom
are based on the full model. Therefore, the results will differ from those obtained from pwcompare
after fitting the one-way model with only fertilizer (or equivalently using pwmean).
Pairwise comparisons of slopes
If we fit a model with a factor variable that is interacted with a continuous variable, pwcompare
will even allow us to make pairwise comparisons of the slopes of the continuous variable for the
levels of the factor variable.
In this case, we have a continuous variable, N03_N, indicating the amount of nitrate nitrogen
already existing in the soil, based on a sample taken from each field.
. regress yield fertilizer##c.N03_N
Source SS df MS Number of obs = 200
F( 9, 190) = 37.61
Model 7005.69932 9 778.411035 Prob > F = 0.0000
Residual 3932.69609 190 20.6984005 R-squared = 0.6405
Adj R-squared = 0.6234
Total 10938.3954 199 54.9668111 Root MSE = 4.5495
yield Coef. Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22 18.65019 8.452061 2.21 0.029 1.97826 35.32212
16-04-08 -13.34076 10.07595 -1.32 0.187 -33.21585 6.534327
18-24-06 24.35061 9.911463 2.46 0.015 4.799973 43.90125
29-03-04 17.58529 8.446736 2.08 0.039 .9238646 34.24671
N03_N 4.915653 .7983509 6.16 0.000 3.340884 6.490423
fertilizer#
c.N03_N
10-08-22 -1.282039 .8953419 -1.43 0.154 -3.048126 .4840487
16-04-08 -1.00571 .9025862 -1.11 0.267 -2.786087 .7746662
18-24-06 -2.97627 .9136338 -3.26 0.001 -4.778438 -1.174102
29-03-04 -3.275947 .8247385 -3.97 0.000 -4.902767 -1.649127
_cons -5.459168 7.638241 -0.71 0.476 -20.52581 9.607477
These are the pairwise differences of the slopes of N03_N for each pair of fertilizers:
. pwcompare fertilizer#c.N03_N, pveffects sort mcompare(scheffe)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
fertilizer#c.N03_N 10
Scheffe
Contrast Std. Err. t P>|t|
fertilizer#c.N03_N
29-03-04 vs 10-10-10 -3.275947 .8247385 -3.97 0.004
18-24-06 vs 10-10-10 -2.97627 .9136338 -3.26 0.034
29-03-04 vs 16-04-08 -2.270237 .4691771 -4.84 0.000
29-03-04 vs 10-08-22 -1.993909 .4550851 -4.38 0.001
18-24-06 vs 16-04-08 -1.97056 .612095 -3.22 0.038
18-24-06 vs 10-08-22 -1.694232 .6013615 -2.82 0.099
10-08-22 vs 10-10-10 -1.282039 .8953419 -1.43 0.727
16-04-08 vs 10-10-10 -1.00571 .9025862 -1.11 0.871
29-03-04 vs 18-24-06 -.2996772 .4900939 -0.61 0.984
16-04-08 vs 10-08-22 .276328 .5844405 0.47 0.994
Using Scheffé's adjustment, we find that five of the pairs have significantly different slopes at the
5% level.
Nonlinear models
pwcompare can also perform pairwise comparisons of the marginal linear predictions after fitting
a nonlinear model. For instance, we can use the dataset from Beyond linear models in [R] contrast
and fit the following logistic regression model of patient satisfaction on hospital:
. use http://www.stata-press.com/data/r13/hospital
(Artificial hospital satisfaction data)
. logit satisfied i.hospital
Iteration 0: log likelihood = -393.72216
Iteration 1: log likelihood = -387.55736
Iteration 2: log likelihood = -387.4768
Iteration 3: log likelihood = -387.47679
Logistic regression Number of obs = 802
LR chi2(2) = 12.49
Prob > chi2 = 0.0019
Log likelihood = -387.47679 Pseudo R2 = 0.0159
satisfied Coef. Std. Err. z P>|z| [95% Conf. Interval]
hospital
2 .5348129 .2136021 2.50 0.012 .1161604 .9534654
3 .7354519 .2221929 3.31 0.001 .2999618 1.170942
_cons 1.034708 .1391469 7.44 0.000 .7619855 1.307431
For this model, the marginal linear predictions are the predicted log odds for each hospital and
can be obtained with the cimargins option:
. pwcompare hospital, cimargins
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted
Margin Std. Err. [95% Conf. Interval]
hospital
1 1.034708 .1391469 .7619855 1.307431
2 1.569521 .1620618 1.251886 1.887157
3 1.77016 .1732277 1.43064 2.10968
The pairwise comparisons are, therefore, differences in the log odds. We can specify
mcompare(bonferroni) and effects to request Bonferroni-adjusted p-values and confidence intervals.
. pwcompare hospital, effects mcompare(bonferroni)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
satisfied
hospital 3
Bonferroni Bonferroni
Contrast Std. Err. z P>|z| [95% Conf. Interval]
satisfied
hospital
2 vs 1 .5348129 .2136021 2.50 0.037 .0234537 1.046172
3 vs 1 .7354519 .2221929 3.31 0.003 .2035265 1.267377
3 vs 2 .200639 .2372169 0.85 1.000 -.3672535 .7685314
For nonlinear models, only Bonferroni's adjustment, Šidák's adjustment, and Scheffé's adjustment
are available.
If we want pairwise comparisons reported as odds ratios, we can specify the or option.
. pwcompare hospital, effects mcompare(bonferroni) or
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
satisfied
hospital 3
Bonferroni Bonferroni
Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
satisfied
hospital
2 vs 1 1.707129 .3646464 2.50 0.037 1.023731 2.846733
3 vs 1 2.086425 .4635888 3.31 0.003 1.225718 3.551525
3 vs 2 1.222183 .2899226 0.85 1.000 .6926341 2.156597
Notice that these tests are still performed on the marginal linear predictions. The odds ratios reported
here are the exponentiated versions of the pairwise differences of log odds in the previous output.
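For instance, the first odds ratio above is simply the exponentiated first contrast from the preceding table; the following reproduces the reported 1.707129:
. display exp(.5348129)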
For further discussion, see [R] contrast.
Multiple-equation models
pwcompare works with models containing multiple equations. Commands such as intreg and
gnbreg allow their ancillary parameters to be modeled as a function of independent variables,
and pwcompare can compare the margins within these equations. The equation() option can be
used to specify the equation for which pairwise comparisons of the margins should be made. The
atequations option specifies that pairwise comparisons be computed for each equation. In addition,
pwcompare allows a special pseudofactor for equations, called _eqns, when working with results
from manova, mvreg, mlogit, and mprobit.
Here we use the jaw fracture dataset described in example 4 of [MV] manova. We fit a multivariate
regression model including one independent factor variable, fracture.
. use http://www.stata-press.com/data/r13/jaw
(Table 4.6 Two-Way Unbalanced Data for Fractures of the Jaw -- Rencher (1998))
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
two compo.. -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
one simpl.. 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
two compo.. -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
one simpl.. -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
two compo.. 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
one simpl.. .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
pwcompare performs pairwise comparisons of the margins using the coefficients from the first
equation by default:
. pwcompare fracture, mcompare(bonferroni)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
y1
fracture 3
Bonferroni
Contrast Std. Err. [95% Conf. Interval]
y1
fracture
two compound fractures
vs
one compound fracture -8.833333 4.957441 -21.59201 3.925341
one simple fracture
vs
one compound fracture 6 5.394759 -7.884173 19.88417
one simple fracture
vs
two compound fractures 14.83333 4.75773 2.588644 27.07802
We can use the equation() option to get pwcompare to perform comparisons in the y2 equation:
. pwcompare fracture, equation(y2) mcompare(bonferroni)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
y2
fracture 3
Bonferroni
Contrast Std. Err. [95% Conf. Interval]
y2
fracture
two compound fractures
vs
one compound fracture -5.761905 3.008327 -13.50426 1.980449
one simple fracture
vs
one compound fracture -3.053571 3.273705 -11.47891 5.371769
one simple fracture
vs
two compound fractures 2.708333 2.887136 -4.722119 10.13879
Because we are working with mvreg results, we can use the _eqns pseudofactor to compare the
margins between the three dependent variables. The levels of _eqns index the equations: 1 for the
first equation, 2 for the second, and 3 for the third.
. pwcompare _eqns, mcompare(bonferroni)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Number of
Comparisons
_eqns 3
Bonferroni
Contrast Std. Err. [95% Conf. Interval]
_eqns
2 vs 1 -.5654762 2.545923 -7.117768 5.986815
3 vs 1 24.24603 2.320677 18.27344 30.21862
3 vs 2 24.81151 2.368188 18.71664 30.90637
For the previous command, the only methods available are mcompare(bonferroni),
mcompare(sidak), or mcompare(scheffe). Methods that use the Studentized range are not appropriate
for making comparisons across equations.
Unbalanced data
pwcompare treats all factors as balanced when it computes the marginal means. By “balanced”,
we mean that the number of observations in each combination of factor levels (in each cell mean)
is equal. We can alternatively specify the asobserved option when we have unbalanced data to
obtain marginal means that are based on the observed cell frequencies from the model fit. For more
details on the difference in these two types of marginal means and a discussion of when each may
be appropriate, see [R] margins and [R] contrast.
In addition, when our data are not balanced, some of the multiple-comparison adjustments are
no longer appropriate. Student–Newman–Keuls' method, Duncan's method, and Dunnett's method
assume equal numbers of observations per group.
Here we use an unbalanced dataset and fit a two-way ANOVA model for cholesterol levels on race
and age group. Then we perform pairwise comparisons of the mean cholesterol levels for each race,
requesting Šidák's adjustment as well as marginal means that are computed using the observed cell
frequencies.
. use http://www.stata-press.com/data/r13/cholesterol3
(Artificial cholesterol data, unbalanced)
. anova chol race##agegrp
Number of obs = 67 R-squared = 0.8179
Root MSE = 8.37496 Adj R-squared = 0.7689
Source Partial SS df MS F Prob > F
Model 16379.9926 14 1169.99947 16.68 0.0000
race 230.754396 2 115.377198 1.64 0.2029
agegrp 13857.9877 4 3464.49693 49.39 0.0000
race#agegrp 857.815209 8 107.226901 1.53 0.1701
Residual 3647.2774 52 70.13995
Total 20027.27 66 303.443485
. pwcompare race, asobserved mcompare(sidak)
Pairwise comparisons of marginal linear predictions
Margins : asobserved
Number of
Comparisons
race 3
Sidak
Contrast Std. Err. [95% Conf. Interval]
race
white vs black -7.232433 2.686089 -13.85924 -.6056277
other vs black -5.231198 2.651203 -11.77194 1.309541
other vs white 2.001235 2.414964 -3.956682 7.959152
Empty cells
An empty cell is a combination of the levels of factor variables that is not observed in the
estimation sample. When we have empty cells in our data, the marginal means involving those empty
cells are not estimable as described in [R] margins. In addition, all pairwise comparisons involving
a marginal mean that is not estimable are themselves not estimable. Here we use a dataset where
we do not have any observations for white individuals in the 20–29 age group. We can use the
emptycells(reweight) option to reweight the nonempty cells so that we can estimate the marginal
mean for whites and compute pairwise comparisons involving that marginal mean.
. use http://www.stata-press.com/data/r13/cholesterol2
(Artificial cholesterol data, empty cells)
. tabulate race agegrp
agegrp
race 10-19 20-29 30-39 40-59 60-79 Total
black 5 5 5 5 5 25
white 5 0 5 5 5 20
other 5 5 5 5 5 25
Total 15 10 15 15 15 70
. anova chol race##agegrp
Number of obs = 70 R-squared = 0.7582
Root MSE = 9.47055 Adj R-squared = 0.7021
Source Partial SS df MS F Prob > F
Model 15751.6113 13 1211.66241 13.51 0.0000
race 305.49046 2 152.74523 1.70 0.1914
agegrp 14387.8559 4 3596.96397 40.10 0.0000
race#agegrp 795.807574 7 113.686796 1.27 0.2831
Residual 5022.71559 56 89.6913498
Total 20774.3269 69 301.077201
. pwcompare race, emptycells(reweight)
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Empty cells : reweight
Unadjusted
Contrast Std. Err. [95% Conf. Interval]
race
white vs black 2.922769 2.841166 -2.768769 8.614308
other vs black -4.12621 2.678677 -9.492244 1.239824
other vs white -7.048979 2.841166 -12.74052 -1.35744
For further details on the emptycells(reweight) option, see [R] margins and [R] contrast.
Stored results
pwcompare stores the following in r():
Scalars
  r(df_r)              variance degrees of freedom
  r(k_terms)           number of terms in marginlist
  r(level)             confidence level of confidence intervals
  r(balanced)          1 if fully balanced data; 0 otherwise
Macros
  r(cmd)               pwcompare
  r(cmdline)           command as typed
  r(est_cmd)           e(cmd) from original estimation results
  r(est_cmdline)       e(cmdline) from original estimation results
  r(title)             title in output
  r(emptycells)        empspec from emptycells()
  r(groups#)           group codes for the #th margin in r(b)
  r(mcmethod_vs)       method from mcompare()
  r(mctitle_vs)        title for method from mcompare()
  r(mcadjustall_vs)    adjustall or empty
  r(margin_method)     asbalanced or asobserved
  r(vce)               vcetype specified in vce() in original estimation command
Matrices
  r(b)                 margin estimates
  r(V)                 variance–covariance matrix of the margin estimates
  r(error)             margin estimability codes; 0 means estimable, 8 means not estimable
  r(table)             matrix containing the margins with their standard errors, test statistics,
                       p-values, and confidence intervals
  r(M)                 matrix that produces the margins from the model coefficients
  r(b_vs)              margin difference estimates
  r(V_vs)              variance–covariance matrix of the margin difference estimates
  r(error_vs)          margin difference estimability codes; 0 means estimable, 8 means not estimable
  r(table_vs)          matrix containing the margin differences with their standard errors, test
                       statistics, p-values, and confidence intervals
  r(L)                 matrix that produces the margin differences from the model coefficients
  r(k_groups)          number of significance groups for each term
pwcompare with the post option also stores the following in e():
Scalars
e(df_r) variance degrees of freedom
e(k_terms) number of terms in marginlist
e(balanced) 1 if fully balanced data; 0 otherwise
Macros
e(cmd) pwcompare
e(cmdline) command as typed
e(est_cmd) e(cmd) from original estimation results
e(est_cmdline) e(cmdline) from original estimation results
e(title) title in output
e(emptycells) empspec from emptycells()
e(margin_method) asbalanced or asobserved
e(vce) vcetype specified in vce() in original estimation command
e(properties) b V
Matrices
e(b) margin estimates
e(V) variance–covariance matrix of the margin estimates
e(error) margin estimability codes; 0 means estimable, 8 means not estimable
e(M) matrix that produces the margins from the model coefficients
e(b_vs) margin difference estimates
e(V_vs) variance–covariance matrix of the margin difference estimates
e(error_vs) margin difference estimability codes; 0 means estimable, 8 means not estimable
e(L) matrix that produces the margin differences from the model coefficients
e(k_groups) number of significance groups for each term
Methods and formulas
Methods and formulas are presented under the following headings:
Notation
Unadjusted comparisons
Bonferroni’s method
ˇ
Sid´
ak’s method
Scheff´
e’s method
Tukey’s method
Student–Newman–Keuls’ method
Duncan’s method
Dunnett’s method
Notation
pwcompare performs comparisons of margins; see Methods and formulas in [R] contrast.
If there are k margins for a given factor term, then there are

m = \binom{k}{2} = \frac{k(k-1)}{2}

unique pairwise comparisons. Let the ith pairwise comparison be denoted by

\hat{\delta}_i = l_i' b

where b is a column vector of coefficients from the fitted model and l_i is a column vector that forms the corresponding linear combination. If \hat{V} denotes the estimated variance matrix for b, then the standard error for \hat{\delta}_i is given by

\widehat{se}(\hat{\delta}_i) = \sqrt{l_i' \hat{V} l_i}

The corresponding test statistic is then

t_i = \frac{\hat{\delta}_i}{\widehat{se}(\hat{\delta}_i)}

and the limits for a 100(1 - \alpha)% confidence interval for the expected value of \hat{\delta}_i are

\hat{\delta}_i \pm c_i(\alpha)\,\widehat{se}(\hat{\delta}_i)

where c_i(\alpha) is the critical value corresponding to the chosen multiple-comparison method.
Unadjusted comparisons
pwcompare computes unadjusted p-values and confidence intervals by default. pwcompare uses the t distribution with \nu = e(df_r) degrees of freedom when e(df_r) is posted by the estimation command. The unadjusted two-sided p-value is

{}_{u}p_i = 2\,\Pr(t_\nu > |t_i|)

and the unadjusted critical value {}_{u}c_i(\alpha) satisfies the following probability statement:

\alpha = 2\,\Pr\{t_\nu > {}_{u}c_i(\alpha)\}

pwcompare uses the standard normal distribution when e(df_r) is not posted.
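For example, in the cholesterol model above, the white versus black contrast has t_i = -7.232433/2.686089, roughly -2.69, with \nu = 52 residual degrees of freedom, so its unadjusted two-sided p-value could be computed directly with Stata's ttail() function (a minimal sketch; the numbers are taken from the earlier output):
. display 2*ttail(52, abs(-7.232433/2.686089))
The result is just under 0.01.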
Bonferroni’s method
For mcompare(bonferroni), the adjusted p-value is

{}_{b}p_i = \min(1,\, m\,{}_{u}p_i)

and the adjusted critical value is

{}_{b}c_i(\alpha) = {}_{u}c_i(\alpha/m)
Šidák's method
For mcompare(sidak), the adjusted p-value is

{}_{si}p_i = 1 - (1 - {}_{u}p_i)^m

and the adjusted critical value is

{}_{si}c_i(\alpha) = {}_{u}c_i\{1 - (1 - \alpha)^{1/m}\}
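As a quick illustration of how the Bonferroni and Šidák adjustments compare, suppose an unadjusted p-value of 0.01 with m = 3 comparisons (purely hypothetical numbers); the adjusted p-values could be computed as
. display min(1, 3*.01)
. display 1 - (1 - .01)^3
giving .03 and .029701, respectively. The Šidák-adjusted p-value is never larger than the Bonferroni-adjusted one, and the two are nearly identical when the unadjusted p-value is small.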
Scheffé's method
For mcompare(scheffe), the adjusted p-value is

{}_{sc}p_i = \Pr(F_{d,\nu} > t_i^2/d)

where F_{d,\nu} is distributed as an F with d numerator and \nu denominator degrees of freedom and d is the rank of the VCE for the term. The adjusted critical value satisfies the following probability statement:

\alpha = \Pr\left[F_{d,\nu} > \{{}_{sc}c_i(\alpha)\}^2/d\right]

pwcompare uses the \chi^2 distribution when e(df_r) is not posted.
Tukey’s method
For mcompare(tukey), the adjusted p-value is
tpi= Pr qk>|ti|2
where qk,ν is distributed as the Studentized range statistic for kmeans and νresidual degrees of
freedom (Miller 1981). The adjusted critical value satisfies the following probability statement:
α= Pr nqk>tci(α)2o
Student–Newman–Keuls' method
For mcompare(snk), suppose t_i is comparing two margins that have r other margins between them. Then the adjusted p-value is

{}_{snk}p_i = \Pr(q_{r+2,\nu} > |t_i|\sqrt{2})

where r ranges from 0 to k - 2. The adjusted critical value {}_{snk}c_i(\alpha) satisfies the following probability statement:

\alpha = \Pr\{q_{r+2,\nu} > {}_{snk}c_i(\alpha)\sqrt{2}\}
Duncan’s method
For mcompare(duncan), the adjusted p-value is
duncpi= 1 (1 snkpi)1/(r+1)
and the adjusted critical value is
duncci(α) = snkci1(1 α)r+1
Dunnett’s method
For mcompare(dunnett), the margins are compared with a reference category, resulting in only
k1 pairwise comparisons. The adjusted p-value is
dunnpi= Pr(dk1>|ti|)
where dk1is distributed as the many-one tstatistic (Miller 1981, 76). The adjusted critical value
dunnci(α)satisfies the following probability statement:
α= Pr {dk1>dunnci(α)}
The multiple-comparison methods for mcompare(tukey), mcompare(snk), mcompare(duncan), and mcompare(dunnett) assume the normal distribution with equal variance; thus these methods are allowed only with results from anova, regress, manova, and mvreg. mcompare(snk), mcompare(duncan), and mcompare(dunnett) assume equal sample size for each marginal mean. These options will cause pwcompare to report a footnote if unbalanced factors are detected.
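For example, refitting the unbalanced cholesterol model above and then requesting one of these methods (a sketch; output omitted) would produce such a footnote:
. use http://www.stata-press.com/data/r13/cholesterol3
. anova chol race##agegrp
(output omitted)
. pwcompare race, mcompare(snk)
(output omitted)
Because the race-by-age-group cells have unequal numbers of observations, pwcompare would note the imbalance below the table of comparisons.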
References
Dunnett, C. W. 1955. A multiple comparison for comparing several treatments with a control. Journal of the American
Statistical Association 50: 1096–1121.
. 1964. New tables for multiple comparisons with a control. Biometrics 20: 482–491.
Kramer, C. Y. 1956. Extension of multiple range tests to group means with unequal numbers of replications. Biometrics
12: 307–310.
Miller, R. G., Jr. 1981. Simultaneous Statistical Inference. 2nd ed. New York: Springer.
Milliken, G. A., and D. E. Johnson. 2009. Analysis of Messy Data, Volume 1: Designed Experiments. 2nd ed. Boca
Raton, FL: CRC Press.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Searle, S. R. 1997. Linear Models for Unbalanced Data. New York: Wiley.
Tukey, J. W. 1953. The problem of multiple comparisons. Unpublished manuscript, Princeton University.
Winer, B. J., D. R. Brown, and K. M. Michels. 1991. Statistical Principles in Experimental Design. 3rd ed. New
York: McGraw–Hill.
Also see
[R]pwcompare postestimation Postestimation tools for pwcompare
[R]contrast Contrasts and linear hypothesis tests after estimation
[R]lincom Linear combinations of estimators
[R]margins Marginal means, predictive margins, and marginal effects
[R]margins, pwcompare Pairwise comparisons of margins
[R]pwmean Pairwise comparisons of means
[R]test Test linear hypotheses after estimation
[U] 20 Estimation and postestimation commands
Title
pwcompare postestimation — Postestimation tools for pwcompare
Description Remarks and examples Also see
Description
The following postestimation commands are available after pwcompare, post:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
When we use the post option with pwcompare, the marginal linear predictions are posted as
estimation results, and we can use postestimation commands to perform further analysis on them.
In Pairwise comparisons of means of [R]pwcompare, we fit a regression of wheat yield on types
of fertilizers.
. use http://www.stata-press.com/data/r13/yield
(Artificial wheat yield dataset)
. regress yield i.fertilizer
(output omitted )
We also used pwcompare with the cimargins option to obtain the marginal mean yield for each
fertilizer. We can add the post option to this command to post these marginal means and their VCEs
as estimation results.
. pwcompare fertilizer, cimargins post
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
Unadjusted
Margin Std. Err. [95% Conf. Interval]
fertilizer
10-10-10 41.36243 1.124298 39.14509 43.57977
10-08-22 44.98515 1.124298 42.7678 47.20249
16-04-08 41.85306 1.124298 39.63571 44.0704
18-24-06 46.28523 1.124298 44.06789 48.50258
29-03-04 40.1241 1.124298 37.90676 42.34145
Now we can use nlcom to compute a percentage improvement in the mean yield for fertilizer 2
when compared with fertilizer 1.
. nlcom (pct_chg: 100*(_b[2.fertilizer] - _b[1.fertilizer])/_b[1.fertilizer])
pct_chg: 100*(_b[2.fertilizer] - _b[1.fertilizer])/_b[1.fertilizer]
Coef. Std. Err. z P>|z| [95% Conf. Interval]
pct_chg 8.758479 4.015932 2.18 0.029 .8873982 16.62956
The mean yield for fertilizer 2 is about 9% higher than that of fertilizer 1, with a standard error
of 4%.
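Because the margins and their VCE are now posted, the other postestimation commands listed above can be used as well. For instance, a sketch of using lincom to estimate the difference in mean yield between fertilizers 18-24-06 and 29-03-04 (output omitted):
. lincom _b[4.fertilizer] - _b[5.fertilizer]
This reports the contrast of the two posted margins, 46.285 - 40.124 = 6.161, along with its standard error and confidence interval.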
Also see
[R]pwcompare Pairwise comparisons
[U] 20 Estimation and postestimation commands
Title
pwmean — Pairwise comparisons of means
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Reference
Also see
Syntax
pwmean varname, over(varlist) [options]
options Description
Main
over(varlist)compare means across each combination of the levels in varlist
mcompare(method)adjust for multiple comparisons; default is mcompare(noadjust)
Reporting
level(#)confidence level; default is level(95)
cieffects display a table of mean differences and confidence intervals; the default
pveffects display a table of mean differences and p-values
effects display a table of mean differences with p-values and confidence
intervals
cimeans display a table of means and confidence intervals
groups display a table of means with codes that group them with other means
that are not significantly different
sort sort results tables by displayed mean or difference
display options control column formats, line width, and factor-variable labeling
over(varlist)is required.
method Description
noadjust do not adjust for multiple comparisons; the default
bonferroni Bonferroni’s method
sidak Šidák's method
scheffe Scheffé's method
tukey Tukey's method
snk Student–Newman–Keuls' method
duncan Duncan’s method
dunnett Dunnett’s method
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Pairwise comparisons of means
Description
pwmean performs pairwise comparisons of means. It computes all pairwise differences of the means
of varname over the combination of the levels of the variables in varlist. The tests and confidence
intervals for the pairwise comparisons assume equal variances across groups. pwmean also allows for
adjusting the confidence intervals and p-values to account for multiple comparisons using Bonferroni's method, Scheffé's method, Tukey's method, Dunnett's method, and others.
See [R] pwcompare for performing pairwise comparisons of means, estimated marginal means, and other types of marginal linear predictions after anova, regress, and most other estimation commands.
See [R] margins, pwcompare for performing pairwise comparisons of marginal probabilities and other linear and nonlinear predictions after estimation commands.
Options
 
Main
over(varlist)is required and specifies that means are computed for each combination of the levels
of the variables in varlist.
mcompare(method) specifies the method for computing p-values and confidence intervals that account for multiple comparisons.
Most methods adjust the comparisonwise error rate, \alpha_c, to achieve a prespecified experimentwise error rate, \alpha_e.
mcompare(noadjust) is the default; it specifies no adjustment.

\alpha_c = \alpha_e

mcompare(bonferroni) adjusts the comparisonwise error rate based on the upper limit of the Bonferroni inequality:

\alpha_e \le m\alpha_c

where m is the number of comparisons within the term.
The adjusted comparisonwise error rate is

\alpha_c = \alpha_e / m

mcompare(sidak) adjusts the comparisonwise error rate based on the upper limit of the probability inequality

\alpha_e \le 1 - (1 - \alpha_c)^m

where m is the number of comparisons within the term.
The adjusted comparisonwise error rate is

\alpha_c = 1 - (1 - \alpha_e)^{1/m}

This adjustment is exact when the m comparisons are independent.
mcompare(scheffe) controls the experimentwise error rate using the F (or \chi^2) distribution with degrees of freedom equal to k - 1, where k is the number of means being compared.
mcompare(tukey) uses what is commonly referred to as Tukey's honestly significant difference. This method uses the Studentized range distribution instead of the t distribution.
mcompare(snk) is a variation on mcompare(tukey) that counts only the number of means
participating in the range for a given comparison instead of the full number of means.
mcompare(duncan) is a variation on mcompare(snk) with additional adjustment to the significance
probabilities.
mcompare(dunnett) uses Dunnett’s method for making comparisons with a reference category.
 
Reporting
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
The significance level used by the groups option is 100 - #, expressed as a percentage.
cieffects specifies that a table of the pairwise comparisons of means with their standard errors and
confidence intervals be reported. This is the default.
pveffects specifies that a table of the pairwise comparisons of means with their standard errors,
test statistics, and p-values be reported.
effects specifies that a table of the pairwise comparisons of means with their standard errors, test
statistics, p-values, and confidence intervals be reported.
cimeans specifies that a table of the means with their standard errors and confidence intervals be
reported.
groups specifies that a table of the means with their standard errors and group codes be reported.
Means with the same letter in the group code are not significantly different at the specified
significance level.
sort specifies that the reported tables be sorted by the mean or difference that is displayed in the
table.
display options: nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch.
nofvlabel displays factor-variable level values rather than attached value labels. This option
overrides the fvlabel setting; see [R]set showbaselevels.
fvwrap(#)specifies how many lines to allow when long value labels must be wrapped. Labels
requiring more than #lines are truncated. This option overrides the fvwrap setting; see [R]set
showbaselevels.
fvwrapon(style)specifies whether value labels that wrap will break at word boundaries or break
based on available space.
fvwrapon(word), the default, specifies that value labels break at word boundaries.
fvwrapon(width) specifies that value labels break based on available space.
This option overrides the fvwrapon setting; see [R]set showbaselevels.
cformat(%fmt) specifies how to format means, standard errors, and confidence limits in the table of pairwise comparison of means.
pformat(%fmt) specifies how to format p-values in the table of pairwise comparison of means.
sformat(%fmt) specifies how to format test statistics in the table of pairwise comparison of means.
nolstretch specifies that the width of the table of pairwise comparisons not be automatically
widened to accommodate longer variable names. The default, lstretch, is to automatically
widen the table of pairwise comparisons up to the width of the Results window. To change the
default, use set lstretch off. nolstretch is not shown in the dialog box.
Remarks and examples
pwmean performs pairwise comparisons (differences) of means, assuming a common variance
among groups. It can easily adjust the p-values and confidence intervals for the differences to account
for the elevated type I error rate due to multiple comparisons. Adjustments for multiple comparisons
can be made using Bonferroni's method, Scheffé's method, Tukey's method, Dunnett's method, and others.
Remarks are presented under the following headings:
Group means
Pairwise differences of means
Group output
Adjusting for multiple comparisons
Tukey’s method
Dunnett’s method
Multiple over() variables
Equal variance assumption
Group means
Suppose we have data on the wheat yield of fields that were each randomly assigned an application
of one of five types of fertilizers. Let’s first look at the mean yield for each type of fertilizer.
. use http://www.stata-press.com/data/r13/yield
(Artificial wheat yield dataset)
. pwmean yield, over(fertilizer) cimeans
Pairwise comparisons of means with equal variances
over : fertilizer
Unadjusted
yield Mean Std. Err. [95% Conf. Interval]
fertilizer
10-10-10 41.36243 1.124298 39.14509 43.57977
10-08-22 44.98515 1.124298 42.7678 47.20249
16-04-08 41.85306 1.124298 39.63571 44.0704
18-24-06 46.28523 1.124298 44.06789 48.50258
29-03-04 40.1241 1.124298 37.90676 42.34145
Pairwise differences of means
We can compute all pairwise differences in mean wheat yields for the types of fertilizers.
. pwmean yield, over(fertilizer) effects
Pairwise comparisons of means with equal variances
over : fertilizer
Unadjusted Unadjusted
yield Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.024 .4869212 6.758518
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.758 -2.645169 3.626428
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.002 1.787005 8.058602
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.437 -4.374127 1.89747
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.050 -6.267889 .0037086
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.415 -1.835715 4.435882
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.003 -7.996847 -1.725249
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.006 1.296375 7.567972
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.278 -4.864757 1.406841
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.000 -9.29693 -3.025333
The contrast in the row labeled (10-08-22 vs 10-10-10) is the difference in the mean wheat
yield for fertilizer 10-08-22 and fertilizer 10-10-10. At a 5% significance level, we conclude that
there is a difference in the means for these two fertilizers. Likewise, the rows labeled (18-24-06 vs
10-10-10), (29-03-04 vs 10-08-22), (18-24-06 vs 16-04-08), and (29-03-04 vs 18-24-06)
show differences in these pairs of means. In all, we find that 5 of the 10 mean differences are
significantly different from zero at a 5% significance level.
We can specify the sort option to order the differences from smallest to largest in the table.
. pwmean yield, over(fertilizer) effects sort
Pairwise comparisons of means with equal variances
over : fertilizer
Unadjusted Unadjusted
yield Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.000 -9.29693 -3.025333
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.003 -7.996847 -1.725249
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.050 -6.267889 .0037086
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.278 -4.864757 1.406841
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.437 -4.374127 1.89747
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.758 -2.645169 3.626428
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.415 -1.835715 4.435882
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.024 .4869212 6.758518
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.006 1.296375 7.567972
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.002 1.787005 8.058602
Ordering the pairwise differences is particularly convenient when we are comparing means for a large
number of groups.
Group output
We can use the group option to see the mean of each group and a visual representation of the
tests for differences.
. pwmean yield, over(fertilizer) group sort
Pairwise comparisons of means with equal variances
over : fertilizer
Unadjusted
yield Mean Std. Err. Groups
fertilizer
29-03-04 40.1241 1.124298 A
10-10-10 41.36243 1.124298 A
16-04-08 41.85306 1.124298 AB
10-08-22 44.98515 1.124298 BC
18-24-06 46.28523 1.124298 C
Note: Means sharing a letter in the group label
are not significantly different at the 5%
level.
Fertilizers 29-03-04, 10-10-10, and 16-04-08 are all in group A. This means that at our 5% level of
significance, we have insufficient information to distinguish their means. Likewise, fertilizers 16-04-08 and 10-08-22 are in group B and cannot be distinguished at the 5% level. The same is true for
fertilizers 10-08-22 and 18-24-06 in group C.
Fertilizer 29-03-04 and fertilizer 10-08-22 have no letters in common, indicating that the mean
yields of these two groups are significantly different at the 5% level. We can conclude that any other
fertilizers without a letter in common have significantly different means as well.
Adjusting for multiple comparisons
The statistics in the examples above take no account of the fact that we are performing 10 comparisons.
With our 5% significance level and assuming the comparisons are independent, we expect 1 in 20
tests of comparisons to be significant, even if all the population means are truly the same. If we are
performing many comparisons, then we should account for the fact that some tests will be found
significant by chance alone. More formally, the test for each pairwise comparison is made without
adjusting for the elevated type I experimentwise error rate that is introduced when performing multiple
tests. We can use the mcompare() option to adjust the confidence intervals and p-values for multiple
comparisons.
Tukey’s method
Of the available adjustments for multiple comparisons, Tukey’s honestly significant difference,
Student–Newman–Keuls' method, and Duncan's method are most often used when performing all
pairwise comparisons of means. Of these, Tukey’s method is the most conservative and Duncan’s
method is the least conservative. For further discussion of each of the multiple-comparison adjustments,
see [R]pwcompare.
Here we use Tukey’s adjustment to compute p-values and confidence intervals for the pairwise
differences.
. pwmean yield, over(fertilizer) effects sort mcompare(tukey)
Pairwise comparisons of means with equal variances
over : fertilizer
Number of
Comparisons
fertilizer 10
Tukey Tukey
yield Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
29-03-04
vs
18-24-06 -6.161132 1.589997 -3.87 0.001 -10.53914 -1.78312
29-03-04
vs
10-08-22 -4.861048 1.589997 -3.06 0.021 -9.239059 -.4830368
16-04-08
vs
10-08-22 -3.13209 1.589997 -1.97 0.285 -7.510101 1.245921
29-03-04
vs
16-04-08 -1.728958 1.589997 -1.09 0.813 -6.106969 2.649053
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.936 -5.616339 3.139683
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.998 -3.887381 4.868641
18-24-06
vs
10-08-22 1.300083 1.589997 0.82 0.925 -3.077928 5.678095
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.156 -.7552913 8.000731
18-24-06
vs
16-04-08 4.432173 1.589997 2.79 0.046 .0541623 8.810185
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.019 .5447922 9.300815
When using a 5% significance level, Tukey’s adjustment indicates that four pairs of means are different.
With the adjustment, we no longer conclude that the difference in the mean yields for fertilizers
10-08-22 and 10-10-10 is significantly different from zero.
Dunnett’s method
Now let’s suppose that fertilizer 10-10-10 actually represents fields on which no fertilizer was
applied. In this case, we can use Dunnett’s method for comparing each of the fertilizers to the control.
. pwmean yield, over(fertilizer) effects mcompare(dunnett)
Pairwise comparisons of means with equal variances
over : fertilizer
Number of
Comparisons
fertilizer 4
Dunnett Dunnett
yield Contrast Std. Err. t P>|t| [95% Conf. Interval]
fertilizer
10-08-22
vs
10-10-10 3.62272 1.589997 2.28 0.079 -.2918331 7.537273
16-04-08
vs
10-10-10 .4906299 1.589997 0.31 0.994 -3.423923 4.405183
18-24-06
vs
10-10-10 4.922803 1.589997 3.10 0.008 1.00825 8.837356
29-03-04
vs
10-10-10 -1.238328 1.589997 -0.78 0.852 -5.152881 2.676225
Using Dunnett’s adjustment, we conclude that only fertilizer 4 (18-24-06) produces a mean yield that
is significantly different from the mean yield of the field with no fertilizer applied.
By default, pwmean treats the lowest level of the group variable as the control. If, for instance,
fertilizer 3 (16-04-08) was our control group, we could type
. pwmean yield, over(b3.fertilizer) effects mcompare(dunnett)
using the b3. factor-variable operator to specify this level as the reference level.
Multiple over() variables
When we specify more than one variable in the over() option, pairwise comparisons are performed
for the means defined by each combination of levels of these variables.
. pwmean yield, over(fertilizer irrigation) group
Pairwise comparisons of means with equal variances
over : fertilizer irrigation
Unadjusted
yield Mean Std. Err. Groups
fertilizer#irrigation
10-10-10#0 36.91257 1.116571 A
10-10-10#1 45.81229 1.116571 B
10-08-22#0 38.79482 1.116571 A C
10-08-22#1 51.17547 1.116571 E
16-04-08#0 36.34383 1.116571 A
16-04-08#1 47.36229 1.116571 B
18-24-06#0 41.81757 1.116571 CD
18-24-06#1 50.7529 1.116571 E
29-03-04#0 35.69507 1.116571 A
29-03-04#1 44.55313 1.116571 B D
Note: Means sharing a letter in the group label are not
significantly different at the 5% level.
Here the row labeled 10-10-10#0 is the mean for the fields treated with fertilizer 10-10-10 and
without irrigation. This mean is significantly different from the mean of all fertilizer/irrigation pairings
that do not have an A in the “Unadjusted Groups” column. These include all pairings where the fields
were irrigated as well as the fields treated with fertilizer 18-24-06 but without irrigation.
Equal variance assumption
pwmean performs multiple comparisons assuming that there is a common variance for all groups.
In the case of two groups, this is equivalent to performing the familiar two-sample ttest when equal
variances are assumed.
. ttest yield, by(irrigation)
Two-sample t test with equal variances
Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
0 100 37.91277 .5300607 5.300607 36.86102 38.96453
1 100 47.93122 .5630353 5.630353 46.81403 49.0484
combined 200 42.92199 .5242462 7.413961 41.8882 43.95579
diff -10.01844 .7732872 -11.54338 -8.493509
diff = mean(0) - mean(1) t = -12.9557
Ho: diff = 0 degrees of freedom = 198
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
. pwmean yield, over(irrigation) effects
Pairwise comparisons of means with equal variances
over : irrigation
Unadjusted Unadjusted
yield Contrast Std. Err. t P>|t| [95% Conf. Interval]
irrigation
1 vs 0 10.01844 .7732872 12.96 0.000 8.493509 11.54338
The signs for the difference, the test statistic, and the confidence intervals are reversed because
the difference is taken in the opposite direction. The p-value from pwmean is equivalent to the one
for the two-sided test in the ttest output.
pwmean extends the capabilities of ttest to allow for simultaneously comparing all pairs of means
and to allow for using one common variance estimate for all the tests instead of computing a separate
pooled variance for each pair of means when using multiple ttest commands. In addition, pwmean
allows adjustments for multiple comparisons, many of which rely on an assumption of equal variances
among groups.
Stored results
pwmean stores the following in e():
Scalars
e(df_r) variance degrees of freedom
e(balanced) 1 if fully balanced data; 0 otherwise
Macros
e(cmd) pwmean
e(cmdline) command as typed
e(title) title in output
e(depvar) name of variable from which the means are computed
e(over) varlist from over()
e(properties) b V
Matrices
e(b) mean estimates
e(V) variance–covariance matrix of the mean estimates
e(error) mean estimability codes; 0 means estimable, 8 means not estimable
e(b_vs) mean difference estimates
e(V_vs) variance–covariance matrix of the mean difference estimates
e(error_vs) mean difference estimability codes; 0 means estimable, 8 means not estimable
e(k_groups) number of significance groups for each term
Methods and formulas
pwmean is a convenience command that uses pwcompare after fitting a fully factorial linear model.
See Methods and formulas described in [R]pwcompare.
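For example, a sketch of the underlying steps for the one-factor yield example used in this entry (output omitted):
. regress yield i.fertilizer
(output omitted)
. pwcompare fertilizer, effects
(output omitted)
These commands should reproduce the unadjusted pairwise differences reported by pwmean yield, over(fertilizer) effects above; compare the margins shown under Group means with those listed in [R] pwcompare postestimation.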
Reference
Searle, S. R. 1997. Linear Models for Unbalanced Data. New York: Wiley.
Also see
[R]pwmean postestimation Postestimation tools for pwmean
[R]contrast Contrasts and linear hypothesis tests after estimation
[R]margins Marginal means, predictive margins, and marginal effects
[R]margins, pwcompare Pairwise comparisons of margins
[R]pwcompare Pairwise comparisons
[R]ttest ttests (mean-comparison tests)
[U] 20 Estimation and postestimation commands
Title
pwmean postestimation — Postestimation tools for pwmean
Description Remarks and examples Also see
Description
The following postestimation commands are available after pwmean:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
In Pairwise differences of means of [R]pwmean, we computed all pairwise differences in mean
wheat yields for five fertilizers.
. use http://www.stata-press.com/data/r13/yield
(Artificial wheat yield dataset)
. pwmean yield, over(fertilizer)
Pairwise comparisons of means with equal variances
over : fertilizer
Unadjusted
yield Contrast Std. Err. [95% Conf. Interval]
fertilizer
10-08-22 vs 10-10-10 3.62272 1.589997 .4869212 6.758518
16-04-08 vs 10-10-10 .4906299 1.589997 -2.645169 3.626428
18-24-06 vs 10-10-10 4.922803 1.589997 1.787005 8.058602
29-03-04 vs 10-10-10 -1.238328 1.589997 -4.374127 1.89747
16-04-08 vs 10-08-22 -3.13209 1.589997 -6.267889 .0037086
18-24-06 vs 10-08-22 1.300083 1.589997 -1.835715 4.435882
29-03-04 vs 10-08-22 -4.861048 1.589997 -7.996847 -1.725249
18-24-06 vs 16-04-08 4.432173 1.589997 1.296375 7.567972
29-03-04 vs 16-04-08 -1.728958 1.589997 -4.864757 1.406841
29-03-04 vs 18-24-06 -6.161132 1.589997 -9.29693 -3.025333
After pwmean, we can use testnl to test whether the improvement in mean wheat yield when
using fertilizer 18-24-06 instead of fertilizer 29-03-04 is significantly different from 10%.
. testnl (_b[4.fertilizer] - _b[5.fertilizer])/_b[5.fertilizer] = 0.1
(1) (_b[4.fertilizer] - _b[5.fertilizer])/_b[5.fertilizer] = 0.1
chi2(1) = 1.57
Prob > chi2 = 0.2106
The improvement is not significantly different from 10%.
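Alternatively, nlcom, which is also available after pwmean, could be used to obtain a point estimate and confidence interval for the same percentage improvement, following the pattern shown in [R] pwcompare postestimation (a sketch; output omitted):
. nlcom (pct_chg: 100*(_b[4.fertilizer] - _b[5.fertilizer])/_b[5.fertilizer])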
Also see
[R]pwmean Pairwise comparisons of means
[U] 20 Estimation and postestimation commands
Title
qc — Quality control charts
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Draw a c chart
cchart defect_var unit_var [, cchart_options]
Draw a p (fraction-defective) chart
pchart reject_var unit_var ssize_var [, pchart_options]
Draw an R (range or dispersion) chart
rchart varlist [if] [in] [, rchart_options]
Draw an X (control line) chart
xchart varlist [if] [in] [, xchart_options]
Draw vertically aligned X and R charts
shewhart varlist [if] [in] [, shewhart_options]
cchart options Description
Main
nograph suppress graph
Plot
connect options affect rendition of the plotted points
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Control limits
clopts(cline options)affect rendition of the control limits
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in
[G-3]twoway options
pchart options Description
Main
stabilized stabilize the p chart when sample sizes are unequal
nograph suppress graph
generate(newvar_f newvar_lcl newvar_ucl) store the fractions of defective elements and the lower and upper control limits
Plot
connect options affect rendition of the plotted points
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Control limits
clopts(cline options)affect rendition of the control limits
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in
[G-3]twoway options
rchart options Description
Main
std(#)user-specified standard deviation
nograph suppress graph
Plot
connect options affect rendition of the plotted points
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Control limits
clopts(cline options)affect rendition of the control limits
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in
[G-3]twoway options
xchart options Description
Main
std(#)user-specified standard deviation
mean(#)user-specified mean
lower(#) upper(#) lower and upper limits of the X-bar chart
nograph suppress graph
Plot
connect options affect rendition of the plotted points
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Control limits
clopts(cline options)affect rendition of the control limits
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in
[G-3]twoway options
shewhart options Description
Main
std(#)user-specified standard deviation
mean(#)user-specified mean
nograph suppress graph
Plot
connect options affect rendition of the plotted points
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Control limits
clopts(cline options)affect rendition of the control limits
Y axis, X axis, Titles, Legend, Overall
combine options any options documented in [G-2]graph combine
Menu
cchart
Statistics >Other >Quality control >C chart
pchart
Statistics >Other >Quality control >P chart
rchart
Statistics >Other >Quality control >R chart
xchart
Statistics >Other >Quality control >X-bar chart
shewhart
Statistics >Other >Quality control >Vertically aligned X-bar and R chart
Description
These commands provide standard quality-control charts. cchart draws a c chart; pchart, a p
(fraction-defective) chart; rchart, an R (range or dispersion) chart; xchart, an X (control line)
chart; and shewhart, vertically aligned X and R charts.
Options
 
Main
stabilized stabilizes the p chart when sample sizes are unequal.
std(#)specifies the standard deviation of the process. The R chart is calculated (based on the range)
if this option is not specified.
mean(#)specifies the grand mean, which is calculated if not specified.
lower(#) and upper(#) must be specified together or not at all. They specify the lower and upper
limits of the X chart. Calculations based on the mean and standard deviation (whether specified
by option or calculated) are used otherwise.
nograph suppresses the graph.
generate(newvar_f newvar_lcl newvar_ucl) stores the plotted values in the p chart. newvar_f will contain the fractions of defective elements; newvar_lcl and newvar_ucl will contain the lower and upper control limits, respectively.
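For example, a sketch of saving the plotted fractions and control limits from the p chart shown in example 2 below (the new variable names frac, lcl, and ucl are arbitrary):
. pchart rejects day ssize, generate(frac lcl ucl)
The three new variables could then be listed or used to build customized graphs.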
 
Plot
connect options affect whether lines connect the plotted points and the rendition of those lines; see
[G-3]connect options.
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Control limits
clopts(cline options)affects the rendition of the control limits; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
combine options (shewhart only) are any of the options documented in [G-2]graph combine. These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
Control charts may be used to define the goal of a repetitive process, to control that process,
and to determine if the goal has been achieved. Walter A. Shewhart of Bell Telephone Laboratories
devised the first control chart in 1924. In 1931, Shewhart published Economic Control of Quality of
Manufactured Product. According to Burr, “Few fields of knowledge have ever been so completely
explored and charted in the first exposition” (1976, 29). Shewhart states that “a phenomenon will be
said to be controlled when, through the use of past experience, we can predict, at least within limits,
how the phenomenon may be expected to vary in the future. Here it is understood that prediction within
limits means that we can state, at least approximately, the probability that the observed phenomenon
will fall within given limits” (1931, 6).
For more information on quality-control charts, see Burr (1976), Duncan (1986), Harris (1999),
or Ryan (2000).
Example 1: cchart
cchart graphs a c chart showing the number of nonconformities in a unit, where defect var
records the number of defects in each inspection unit and unit var records the unit number. The unit
numbers need not be in order. For instance, consider the following example dataset from Ryan (2000,
156):
. use http://www.stata-press.com/data/r13/ncu
. describe
Contains data from http://www.stata-press.com/data/r13/ncu.dta
obs: 30
vars: 2 31 Mar 2013 03:56
size: 240
storage display value
variable name type format label variable label
day float %9.0g Days in April
defects float %9.0g Numbers of Nonconforming Units
Sorted by:
. list in 1/5
day defects
1. 1 7
2. 2 5
3. 3 11
4. 4 13
5. 5 9
. cchart defects day, title(c Chart for Nonconforming Transistors)
(figure: c chart of Numbers of Nonconforming Units by Days in April, titled "c Chart for Nonconforming Transistors"; center line 10.6, control limits .8327 and 20.367; 0 units are out of control)
The expected number of defects is 10.6, with lower and upper control limits of 0.8327 and 20.37,
respectively. No units are out of control.
Example 2: pchart
pchart graphs a p chart, which shows the fraction of nonconforming items in a subgroup, where
reject var records the number rejected in each inspection unit, unit var records the inspection unit
number, and ssize var records the number inspected in each unit.
Consider the example dataset from Ryan (2000, 156) of the number of nonconforming transistors
out of 1,000 inspected each day during the month of April:
. use http://www.stata-press.com/data/r13/ncu2
. describe
Contains data from http://www.stata-press.com/data/r13/ncu2.dta
obs: 30
vars: 3 31 Mar 2013 14:13
size: 360
storage display value
variable name type format label variable label
day float %9.0g Days in April
rejects float %9.0g Numbers of Nonconforming Units
ssize float %9.0g Sample size
Sorted by:
. list in 1/5
day rejects ssize
1. 1 7 1000
2. 2 5 1000
3. 3 11 1000
4. 4 13 1000
5. 5 9 1000
. pchart rejects day ssize
(figure: p chart of Fraction defective by Days in April; center line .0106, control limits .0009 and .0203; 0 units are out of control)
All the points are within the control limits, which are 0.0009 for the lower limit and 0.0203 for the
upper limit.
Here the sample sizes are fixed at 1,000, so the ssize variable contains 1,000 for each observation.
Sample sizes need not be fixed, however. Say that our data were slightly different:
. use http://www.stata-press.com/data/r13/ncu3
. list in 1/5
day rejects ssize
1. 1 7 920
2. 2 5 920
3. 3 11 920
4. 4 13 950
5. 5 9 950
. pchart rejects day ssize
(figure: p chart of Fraction defective by Days in April with control limits that vary with the sample size; average fraction defective .0119; 0 units are out of control)
Here the control limits are, like the sample size, no longer constant. The stabilize option will
stabilize the control chart:
. pchart rejects day ssize, stabilize
(figure: stabilized p chart of Fraction defective, in standard deviation units, by Days in April, titled "Stabilized p Chart, average number of defects = .0119"; control limits at -3 and 3; 0 units are out of control)
Example 3: rchart
rchart displays an R chart showing the range for repeated measurements at various times.
Variables within observations record measurements. Observations represent different samples.
For instance, say that we take five samples of 5 observations each. In our first sample, our
measurements are 10, 11, 10, 11, and 12. The data are
. list
m1 m2 m3 m4 m5
1. 10 11 10 11 12
2. 12 10 9 10 9
3. 10 11 10 12 10
4. 9 9 9 10 11
5. 12 12 12 12 13
. rchart m1-m5, connect(l)
(figure: R chart of Range by Sample; center line 2, control limits 0 and 4.23; 0 units are out of control)
The expected range in each sample is 2 with lower and upper control limits of 0 and 4.23, respectively.
If we know that the process standard deviation is 0.3, we could specify
. rchart m1-m5, connect(l) std(.3)
(figure: R chart of Range by Sample; center line .6978, control limits -2.1215 and 2.7215; 1 unit is out of control)
Example 4: xchart
xchart graphs an X chart for repeated measurements at various times. Variables within observations
record measurements, and observations represent different samples. Using the same data as in the
previous example, we type
. xchart m1-m5, connect(l)
(figure: X-bar chart of Average by Sample; center line 10.64, control limits 9.486 and 11.794; 1 unit is out of control)
The average measurement in the sample is 10.64, and the lower and upper control limits are 9.486
and 11.794, respectively. Suppose that we knew from prior information that the mean of the process
is 11. Then we would type
. xchart m1-m5, connect(l) mean(11)
(figure: X-bar chart of Average by Sample; center line 11, control limits 9.846 and 12.154; 2 units are out of control)
If we also know that the standard deviation of the process is 0.3, we could type
. xchart m1-m5, connect(l) mean(11) std(.3)
(figure: X-bar chart of Average by Sample; center line 11, control limits 10.598 and 11.402; 3 units are out of control)
Finally, xchart allows us to specify our own control limits:
. xchart m1-m5, connect(l) mean(11) lower(10) upper(12)
(figure: X-bar chart of Average by Sample; center line 11, control limits 10 and 12; 2 units are out of control)
 
Walter Andrew Shewhart (1891–1967) was born in Illinois and educated as a physicist, with degrees from the Universities of Illinois and California. After a brief period teaching physics, he worked for the Western Electric Company and (from 1925) the Bell Telephone Laboratories. His name is most associated with control charts used in quality control, but his many other interests ranged generally from quality assurance to the philosophy of science.
 
Example 5: shewhart
shewhart displays a vertically aligned X and R chart in the same image. To produce the best-
looking combined image possible, you will want to use the xchart and rchart commands separately
and then combine the graphs. shewhart, however, is more convenient.
Using the same data as previously, but realizing that the standard deviation should have been 0.4,
we type
. shewhart m1-m5, connect(l) mean(11) std(.4)
(figure: vertically aligned X-bar and R charts by Sample; the X-bar chart has center line 11 and control limits 10.4633 and 11.5367, with 3 units out of control; the R chart has center line .9304 and control limits -2.8287 and 3.6287, with 0 units out of control)
Stored results
cchart stores the following in r():
Scalars
r(cbar) expected number of nonconformities
r(lcl_c) lower control limit
r(ucl_c) upper control limit
r(N) number of observations
r(out_c) number of units out of control
r(below_c) number of units below the lower limit
r(above_c) number of units above the upper limit

pchart stores the following in r():
Scalars
r(pbar) average fraction of nonconformities
r(lcl_p) lower control limit
r(ucl_p) upper control limit
r(N) number of observations
r(out_p) number of units out of control
r(below_p) number of units below the lower limit
r(above_p) number of units above the upper limit

rchart stores the following in r():
Scalars
r(central_line) ordinate of the central line
r(lcl_r) lower control limit
r(ucl_r) upper control limit
r(N) number of observations
r(out_r) number of units out of control
r(below_r) number of units below the lower limit
r(above_r) number of units above the upper limit

xchart stores the following in r():
Scalars
r(xbar) grand mean
r(lcl_x) lower control limit
r(ucl_x) upper control limit
r(N) number of observations
r(out_x) number of units out of control
r(below_x) number of units below the lower limit
r(above_x) number of units above the upper limit

shewhart stores in r() the combination of stored results from xchart and rchart.
Methods and formulas
For the c chart, the number of defects per unit, C, is taken to be a value of a random variable having a Poisson distribution. If k is the number of units available for estimating \lambda, the parameter of the Poisson distribution, and if C_i is the number of defects in the ith unit, then \lambda is estimated by \bar{C} = \sum_i C_i / k. Then

\text{central line} = \bar{C}
\text{UCL} = \bar{C} + 3\sqrt{\bar{C}}
\text{LCL} = \bar{C} - 3\sqrt{\bar{C}}
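As a check, the center line and limits reported by cchart in example 1 above can be reproduced by hand from the average number of defects (a sketch; summarize stores the mean in r(mean)):
. use http://www.stata-press.com/data/r13/ncu
. summarize defects, meanonly
. display r(mean)
. display r(mean) - 3*sqrt(r(mean))
. display r(mean) + 3*sqrt(r(mean))
With an average of 10.6 defects per day, these display 10.6, .8327, and 20.367, matching the control limits shown on the chart.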
Control limits for the p chart are based on the sampling theory for proportions, using the normal approximation to the binomial. If k samples are taken, the estimator of p is given by \bar{p} = \sum_i \hat{p}_i / k, where \hat{p}_i = x_i/n_i, and x_i is the number of defects in the ith sample of size n_i. The central line and the control limits are given by

\text{central line} = \bar{p}
\text{UCL} = \bar{p} + 3\sqrt{\bar{p}(1 - \bar{p})/n_i}
\text{LCL} = \bar{p} - 3\sqrt{\bar{p}(1 - \bar{p})/n_i}
Control limits for the R chart are based on the distribution of the range of samples of size n from a normal population. If the standard deviation of the process, \sigma, is known,

\text{central line} = d_2\sigma
\text{UCL} = D_2\sigma
\text{LCL} = D_1\sigma

where d_2, D_1, and D_2 are functions of the number of observations in the sample and are obtained from the table published in Beyer (1976).
When \sigma is unknown,

\text{central line} = \bar{R}
\text{UCL} = (D_2/d_2)\bar{R}
\text{LCL} = (D_1/d_2)\bar{R}

where \bar{R} = \sum_i R_i / k is the average of the k sample ranges R_i.
Control limits for the X-bar chart are given by

\text{central line} = \bar{\bar{x}}
\text{UCL} = \bar{\bar{x}} + (3/\sqrt{n})\,\sigma
\text{LCL} = \bar{\bar{x}} - (3/\sqrt{n})\,\sigma

if \sigma is known. If \sigma is unknown,

\text{central line} = \bar{\bar{x}}
\text{UCL} = \bar{\bar{x}} + A_2\bar{R}
\text{LCL} = \bar{\bar{x}} - A_2\bar{R}

where \bar{R} is the average range as defined above and A_2 is a function (op. cit.) of the number of observations in the sample.
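For the xchart shown in example 4 above (five samples of size 5, grand mean 10.64, and average range 2), the tabulated constant A_2 for samples of size 5 is 0.577, so the limits can be verified by hand (a sketch):
. display 10.64 - 0.577*2
. display 10.64 + 0.577*2
which give 9.486 and 11.794, the control limits reported by xchart when neither the mean nor the standard deviation is specified.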
References
Bayart, D. 2001. Walter Andrew Shewhart. In Statisticians of the Centuries, ed. C. C. Heyde and E. Seneta, 398–401.
New York: Springer.
Beyer, W. H. 1976. Factors for computing control limits. In Vol. 2 of Handbook of Tables for Probability and
Statistics, ed. W. H. Beyer, 451–465. Cleveland, OH: The Chemical Rubber Company.
Burr, I. W. 1976. Statistical Quality Control Methods. New York: Dekker.
Caulcutt, R. 2004. Control charts in practice. Significance 1: 81–84.
Duncan, A. J. 1986. Quality Control and Industrial Statistics. 5th ed. Homewood, IL: Irwin.
Harris, R. L. 1999. Information Graphics: A Comprehensive Illustrated Reference. New York: Oxford University Press.
Ryan, T. P. 2000. Statistical Methods for Quality Improvement. 2nd ed. New York: Wiley.
Saw, S. L. C., and T. W. Soon. 1994. sqc1: Estimating process capability indices with Stata. Stata Technical Bulletin 17: 18–19. Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 174–175. College Station, TX: Stata Press.
Shewhart, W. A. 1931. Economic Control of Quality of Manufactured Product. New York: Van Nostrand.
Also see
[R]serrbar Graph standard error bar chart
Title
qreg — Quantile regression
Syntax Menu Description Options for qreg
Options for iqreg Options for sqreg Options for bsqreg Remarks and examples
Stored results Methods and formulas References Also see
Syntax
Quantile regression
qreg depvar [indepvars] [if] [in] [weight] [, qreg_options]

Interquantile range regression
iqreg depvar [indepvars] [if] [in] [, iqreg_options]

Simultaneous-quantile regression
sqreg depvar [indepvars] [if] [in] [, sqreg_options]

Bootstrapped quantile regression
bsqreg depvar [indepvars] [if] [in] [, bsqreg_options]
qreg options Description
Model
quantile(#)estimate #quantile; default is quantile(.5)
SE/Robust
vce(vcetype[, vceopts]) technique used to estimate standard errors
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Optimization
optimization options control the optimization process; seldom used
wlsiter(#)attempt #weighted least-squares iterations before doing linear
programming iterations
vcetype Description
iid compute the VCE assuming the residuals are i.i.d.
robust compute the robust VCE
vceopts Description
denmethod nonparametric density estimation technique
bwidth bandwidth method used by the density estimator
denmethod Description
fitted use the empirical quantile function using fitted values; the default
residual use the empirical residual quantile function
kernel(kernel)use a nonparametric kernel density estimator; default is
epanechnikov
bwidth Description
hsheather Hall–Sheather's bandwidth; the default
bofinger Bofinger’s bandwidth
chamberlain Chamberlain’s bandwidth
kernel Description
epanechnikov Epanechnikov kernel function; the default
epan2 alternative Epanechnikov kernel function
biweight biweight kernel function
cosine cosine trace kernel function
gaussian Gaussian kernel function
parzen Parzen kernel function
rectangle rectangle kernel function
triangle triangle kernel function
iqreg options Description
Model
quantiles(# #)interquantile range; default is quantiles(.25 .75)
reps(#)perform #bootstrap replications; default is reps(20)
Reporting
level(#)set confidence level; default is level(95)
nodots suppress display of the replication dots
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
sqreg options Description
Model
quantiles(# [# [# ...]]) estimate # quantiles; default is quantiles(.5)
reps(#)perform #bootstrap replications; default is reps(20)
Reporting
level(#)set confidence level; default is level(95)
nodots suppress display of the replication dots
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
bsqreg options Description
Model
quantile(#)estimate #quantile; default is quantile(.5)
reps(#)perform #bootstrap replications; default is reps(20)
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
by, mi estimate, rolling, and statsby are allowed by qreg, iqreg, sqreg, and bsqreg; mfp, nestreg, and stepwise are allowed only with qreg; see [U] 11.1.10 Prefix commands.
qreg allows fweights, iweights, and pweights; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
qreg
Statistics >Nonparametric analysis >Quantile regression
iqreg
Statistics >Nonparametric analysis >Interquantile regression
sqreg
Statistics >Nonparametric analysis >Simultaneous-quantile regression
bsqreg
Statistics >Nonparametric analysis >Bootstrapped quantile regression
Description
qreg fits quantile (including median) regression models, also known as least-absolute-value models (LAV or MAD) and minimum L1-norm models. The quantile regression models fit by qreg express the quantiles of the conditional distribution as linear functions of the independent variables.
iqreg estimates interquantile range regressions, regressions of the difference in quantiles. The
estimated variance–covariance matrix of the estimators (VCE) is obtained via bootstrapping.
sqreg estimates simultaneous-quantile regression. It produces the same coefficients as qreg for
each quantile. Reported standard errors will be similar, but sqreg obtains an estimate of the VCE
via bootstrapping, and the VCE includes between-quantile blocks. Thus you can test and construct
confidence intervals comparing coefficients describing different quantiles.
bsqreg is equivalent to sqreg with one quantile.
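For instance, with a dependent variable y and regressors x1 and x2 (the names here are purely illustrative), the commands described above might be called as
. qreg y x1 x2
. qreg y x1 x2, quantile(.25)
. iqreg y x1 x2, quantiles(.25 .75) reps(100)
. sqreg y x1 x2, quantiles(.25 .5 .75) reps(100)
. bsqreg y x1 x2, quantile(.25) reps(100)
fitting, respectively, a median regression, a regression of the first quartile, an interquantile range regression with 100 bootstrap replications, a simultaneous-quantile regression for the three quartiles, and a bootstrapped first-quartile regression.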
Options for qreg
 
Model
quantile(#)specifies the quantile to be estimated and should be a number between 0 and 1, exclusive.
Numbers larger than 1 are interpreted as percentages. The default value of 0.5 corresponds to the
median.
 
SE/Robust
vce(vcetype[, vceopts]) specifies the type of VCE to compute and the density estimation method to use in computing the VCE.
vcetype specifies the type of VCE to compute. Available types are iid and robust.
vce(iid), the default, computes the VCE under the assumption that the residuals are independent
and identically distributed (i.i.d.).
vce(robust) computes the robust VCE under the assumption that the residual density is contin-
uous and bounded away from 0 and infinity at the specified quantile(); see Koenker (2005,
sec. 4.2).
vceopts consists of available denmethod and bwidth options.
denmethod specifies the method to use for the nonparametric density estimator. Available
methods are fitted,residual, or kernel(kernel), where the optional kernel must be
one of the kernel choices listed below.
fitted and residual specify that the nonparametric density estimator use some of the
structure imposed by quantile regression. The default fitted uses a function of the fitted
values and residual uses a function of the residuals. vce(robust, residual) is not
allowed.
kernel() specifies that the nonparametric density estimator use a kernel method. The
available kernel functions are epanechnikov,epan2,biweight,cosine,gaussian,
parzen,rectangle, and triangle. The default is epanechnikov. See [R]kdensity
for the kernel function forms.
bwidth specifies the bandwidth method to use by the nonparametric density estimator. Available
methods are hsheather for the Hall–Sheather bandwidth, bofinger for the Bofinger
bandwidth, and chamberlain for the Chamberlain bandwidth.
See Koenker (2005, sec. 3.4 and 4.10) for a description of the sparsity estimation techniques
and the Hall–Sheather and Bofinger bandwidth formulas. See Chamberlain (1994, eq. 2.2) for the
Chamberlain bandwidth.
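For example, a sketch of requesting a robust VCE with a kernel density estimator and the Bofinger bandwidth (the variable names are illustrative):
. qreg y x1 x2, quantile(.25) vce(robust, kernel(epanechnikov) bofinger)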
 
Reporting
level(#); see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Optimization
optimization options: iterate(#), nolog, trace. iterate() specifies the maximum number of iterations; log/nolog specifies whether to show the iteration log; and trace specifies that the iteration log should include the current parameter vector. These options are seldom used.
wlsiter(#)specifies the number of weighted least-squares iterations that will be attempted before
the linear programming iterations are started. The default value is 1. If there are convergence
problems, increasing this number should help.
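For example, a hypothetical call that suppresses the iteration log and allows more starting weighted least-squares iterations (y, x1, and x2 are placeholder names) is
. qreg y x1 x2, nolog wlsiter(10)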
Options for iqreg
 
Model
quantiles(# #) specifies the quantiles to be compared. The first number must be less than the
second, and both should be between 0 and 1, exclusive. Numbers larger than 1 are interpreted as
percentages. Not specifying this option is equivalent to specifying quantiles(.25 .75), meaning
the interquantile range.
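For example, to compare the 10th and 90th percentiles rather than the default interquartile range (placeholder variable names), you could type
. iqreg y x1 x2, quantiles(.1 .9)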
reps(#) specifies the number of bootstrap replications to be used to obtain an estimate of the
variance–covariance matrix of the estimators (standard errors). reps(20) is the default and is
arguably too small. reps(100) would perform 100 bootstrap replications. reps(1000) would
perform 1,000 replications.
 
Reporting
level(#); see [R] estimation options.
nodots suppresses display of the replication dots.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
Options for sqreg
 
Model
quantiles(# [# [# ...]]) specifies the quantiles to be estimated and should contain numbers
between 0 and 1, exclusive. Numbers larger than 1 are interpreted as percentages. The default
value of 0.5 corresponds to the median.
reps(#) specifies the number of bootstrap replications to be used to obtain an estimate of the
variance–covariance matrix of the estimators (standard errors). reps(20) is the default and is
arguably too small. reps(100) would perform 100 bootstrap replications. reps(1000) would
perform 1,000 replications.
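Combining these two options, a hypothetical call that estimates three quantiles simultaneously with 100 bootstrap replications (placeholder variable names) is
. sqreg y x1 x2, quantiles(.25 .5 .75) reps(100)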
 
Reporting
level(#); see [R] estimation options.
nodots suppresses display of the replication dots.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
Options for bsqreg
 
Model
quantile(#) specifies the quantile to be estimated and should be a number between 0 and 1, exclusive.
Numbers larger than 1 are interpreted as percentages. The default value of 0.5 corresponds to the
median.
reps(#) specifies the number of bootstrap replications to be used to obtain an estimate of the
variance–covariance matrix of the estimators (standard errors). reps(20) is the default and is
arguably too small. reps(100) would perform 100 bootstrap replications. reps(1000) would
perform 1,000 replications.
 
Reporting
level(#); see [R] estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Median regression
Quantile regression
Estimated standard errors
Interquantile and simultaneous-quantile regression
What are the parameters?
Median regression
qreg fits quantile regression models. The default form is median regression, where the objective is
to estimate the median of the dependent variable, conditional on the values of the independent variables.
This method is similar to ordinary regression, where the objective is to estimate the conditional mean
of the dependent variable. Simply put, median regression finds a line through the data that minimizes
the sum of the absolute residuals rather than the sum of the squares of the residuals, as in ordinary
regression. Equivalently, median regression expresses the median of the conditional distribution of
the dependent variable as a linear function of the conditioning (independent) variables. Cameron and
Trivedi (2010, chap. 7) provide a nice introduction to quantile regression using Stata.
Example 1: Estimating the conditional median
Consider a two-group experimental design with 5 observations per group:
. use http://www.stata-press.com/data/r13/twogrp
. list
x y
1. 0 0
2. 0 1
3. 0 3
4. 0 4
5. 0 95
6. 1 14
7. 1 19
8. 1 20
9. 1 22
10. 1 23
. qreg y x
Iteration 1: WLS sum of weighted deviations = 60.941342
Iteration 1: sum of abs. weighted deviations = 55.5
Iteration 2: sum of abs. weighted deviations = 55
Median regression Number of obs = 10
Raw sum of deviations 78.5 (about 14)
Min sum of deviations 55 Pseudo R2 = 0.2994
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 17 18.23213 0.93 0.378 -25.04338 59.04338
_cons 3 12.89207 0.23 0.822 -26.72916 32.72916
We have estimated the equation
y_median = 3 + 17 x
We look back at our data. x takes on the values 0 and 1, so the median for the x = 0 group is 3,
whereas for x = 1 it is 3 + 17 = 20. The output reports that the raw sum of absolute deviations about
14 is 78.5; that is, the sum of |y − 14| is 78.5. Fourteen is the unconditional median of y, although
in these data, any value between 14 and 19 could also be considered an unconditional median (we
have an even number of observations, so the median is bracketed by those two values). In any case,
the raw sum of deviations of y about the median would be the same no matter what number we
choose between 14 and 19. (With a “median” of 14, the raw sum of deviations is 78.5. Now think
of choosing a slightly larger number for the median and recalculating the sum. Half the observations
will have larger negative residuals, but the other half will have smaller positive residuals, resulting in
no net change.)
We turn now to the actual estimated equation. The sum of the absolute deviations about the solution
y_median = 3 + 17 x is 55. The pseudo-R2 is calculated as 1 − 55/78.5 ≈ 0.2994. This result is based
on the idea that the median regression is the maximum likelihood estimate for the double-exponential
distribution.
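If you want to verify this calculation, both sums are stored by qreg (see Stored results below), so after the estimation above you could type
. display 1 - e(sum_adev)/e(sum_rdev)
which should display a value of about .2994.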
Technical note
qreg is an alternative to regular regression or robust regression; see [R] regress and [R] rreg.
Let’s compare the results:
. regress y x
Source SS df MS Number of obs = 10
F( 1, 8) = 0.00
Model 2.5 1 2.5 Prob > F = 0.9586
Residual 6978.4 8 872.3 R-squared = 0.0004
Adj R-squared = -0.1246
Total 6980.9 9 775.655556 Root MSE = 29.535
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x -1 18.6794 -0.05 0.959 -44.07477 42.07477
_cons 20.6 13.20833 1.56 0.157 -9.858465 51.05847
Unlike qreg, regress fits ordinary linear regression and is concerned with predicting the mean rather
than the median, so both results are, in a technical sense, correct. Putting aside those technicalities,
however, we tend to use either regression to describe the central tendency of the data, of which the
mean is one measure and the median another. Thus we can ask, “which method better describes the
central tendency of these data?”
Means, and therefore ordinary linear regression, are sensitive to outliers, and our data were
purposely designed to contain two such outliers: 95 for x = 0 and 14 for x = 1. These two outliers
dominated the ordinary regression and produced results that do not reflect the central tendency
well; you are invited to enter the data and graph y against x.
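With the two-group data still in memory, a minimal version of that graph is
. scatter y x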
Robust regression attempts to correct the outlier-sensitivity deficiency in ordinary regression:
. rreg y x, genwt(wt)
Huber iteration 1: maximum difference in weights = .7311828
Huber iteration 2: maximum difference in weights = .17695779
Huber iteration 3: maximum difference in weights = .03149585
Biweight iteration 4: maximum difference in weights = .1979335
Biweight iteration 5: maximum difference in weights = .23332905
Biweight iteration 6: maximum difference in weights = .09960067
Biweight iteration 7: maximum difference in weights = .02691458
Biweight iteration 8: maximum difference in weights = .0009113
Robust regression Number of obs = 10
F( 1, 8) = 80.63
Prob > F = 0.0000
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 18.16597 2.023114 8.98 0.000 13.50066 22.83128
_cons 2.000003 1.430558 1.40 0.200 -1.298869 5.298875
Here rreg discarded the first outlier completely. (We know this because we included the genwt()
option on rreg and, after fitting the robust regression, examined the weights.) For the other “outlier”,
rreg produced a weight of 0.47.
In any case, the answers produced by qreg and rreg to describe the central tendency are similar,
but the standard errors are different. In general, robust regression will have smaller standard errors
because it is not as sensitive to the exact placement of observations near the median. You are welcome
to try removing the first outlier in the qreg estimation to observe an improvement in the standard
errors by typing
. qreg y x if _n!=5
Also, some authors (Rousseeuw and Leroy 1987, 11) have noted that quantile regression, unlike the
unconditional median, may be sensitive to even one outlier if its leverage is high enough. Rousseeuw
and Leroy (1987) discuss estimators that are more robust to perturbations to the data than either mean
regression or quantile regression.
In the end, quantile regression may be more useful for the interpretation of the parameters that it
estimates than for its robustness to perturbations to the data.
Example 2: Median regression
Let’s now consider a less artificial example using the automobile data described in [U] 1.2.2 Example
datasets. Using median regression, we will regress each car’s price on its weight and length and
whether it is of foreign manufacture:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. qreg price weight length foreign
Iteration 1: WLS sum of weighted deviations = 56397.829
Iteration 1: sum of abs. weighted deviations = 55950.5
Iteration 2: sum of abs. weighted deviations = 55264.718
Iteration 3: sum of abs. weighted deviations = 54762.283
Iteration 4: sum of abs. weighted deviations = 54734.152
Iteration 5: sum of abs. weighted deviations = 54552.638
note: alternate solutions exist
Iteration 6: sum of abs. weighted deviations = 54465.511
Iteration 7: sum of abs. weighted deviations = 54443.699
Iteration 8: sum of abs. weighted deviations = 54411.294
Median regression Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 3.933588 1.328718 2.96 0.004 1.283543 6.583632
length -41.25191 45.46469 -0.91 0.367 -131.9284 49.42456
foreign 3377.771 885.4198 3.81 0.000 1611.857 5143.685
_cons 344.6489 5182.394 0.07 0.947 -9991.31 10680.61
The estimated equation is
price_median = 3.93 weight − 41.25 length + 3377.8 foreign + 344.65
The output may be interpreted in the same way as linear regression output; see [R] regress. The
variables weight and foreign are significant, but length is not significant. The median price of
the cars in these data is $4,934. This value is a median (one of the two center observations), not the
median, which would typically be defined as the midpoint of the two center observations.
Quantile regression
Quantile regression is similar to median regression in that it estimates an equation expressing a
quantile of the conditional distribution, albeit one that generally differs from the 0.5 quantile that is
the median. For example, specifying quantile(.25) estimates the parameters that describe the 25th
percentile (first quartile) of the conditional distribution.
Quantile regression allows for effects of the independent variables to differ over the quantiles. For
example, Chamberlain (1994) finds that union membership has a larger effect on the lower quantiles
than on the higher quantiles of the conditional distribution of U.S. wages. That the effects of the
independent variables may vary over quantiles of the conditional distribution is an important advantage
of quantile regression over mean regression.
Example 3: Estimating quantiles other than the median
Returning to real data, the equation for the 25th percentile of price conditional on weight,
length, and foreign in our automobile data is
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. qreg price weight length foreign, quantile(.25)
Iteration 1: WLS sum of weighted deviations = 49469.235
Iteration 1: sum of abs. weighted deviations = 49728.883
Iteration 2: sum of abs. weighted deviations = 45669.89
Iteration 3: sum of abs. weighted deviations = 43416.646
Iteration 4: sum of abs. weighted deviations = 41947.221
Iteration 5: sum of abs. weighted deviations = 41093.025
Iteration 6: sum of abs. weighted deviations = 37623.424
Iteration 7: sum of abs. weighted deviations = 35721.453
Iteration 8: sum of abs. weighted deviations = 35226.308
Iteration 9: sum of abs. weighted deviations = 34823.319
Iteration 10: sum of abs. weighted deviations = 34801.777
.25 Quantile regression Number of obs = 74
Raw sum of deviations 41912.75 (about 4187)
Min sum of deviations 34801.78 Pseudo R2 = 0.1697
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 1.831789 .6328903 2.89 0.005 .5695289 3.094049
length 2.84556 21.65558 0.13 0.896 -40.34514 46.03626
foreign 2209.925 421.7401 5.24 0.000 1368.791 3051.059
_cons -1879.775 2468.46 -0.76 0.449 -6802.963 3043.413
Compared with our previous median regression, the coefficient on length now has a positive sign,
and the coefficients on foreign and weight are reduced. The actual lower quantile is $4,187,
substantially less than the median $4,934.
We can also estimate the upper quartile as a function of the same three variables:
. qreg price weight length foreign, quantile(.75)
Iteration 1: WLS sum of weighted deviations = 55465.741
Iteration 1: sum of abs. weighted deviations = 55652.957
Iteration 2: sum of abs. weighted deviations = 52994.785
Iteration 3: sum of abs. weighted deviations = 50189.446
Iteration 4: sum of abs. weighted deviations = 49898.245
Iteration 5: sum of abs. weighted deviations = 49398.106
Iteration 6: sum of abs. weighted deviations = 49241.835
Iteration 7: sum of abs. weighted deviations = 49197.967
.75 Quantile regression Number of obs = 74
Raw sum of deviations 79860.75 (about 6342)
Min sum of deviations 49197.97 Pseudo R2 = 0.3840
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 9.22291 1.785767 5.16 0.000 5.66131 12.78451
length -220.7833 61.10352 -3.61 0.001 -342.6504 -98.91616
foreign 3595.133 1189.984 3.02 0.004 1221.785 5968.482
_cons 20242.9 6965.02 2.91 0.005 6351.61 34134.2
This result tells a different story: weight is much more important, and length is now significant, with
a negative coefficient! The prices of high-priced cars seem to be determined by factors different from
those affecting the prices of low-priced cars.
Technical note
One explanation for having substantially different regression functions for different quantiles is
that the data are heteroskedastic, as we will demonstrate below. The following statements create a
sharply heteroskedastic set of data:
. drop _all
. set obs 10000
obs was 0, now 10000
. set seed 50550
. gen x = .1 + .9 * runiform()
. gen y = x * runiform()^2
Let’s now fit the regressions for the 5th and 95th quantiles:
. qreg y x, quantile(.05)
Iteration 1: WLS sum of weighted deviations = 540.36365
Iteration 1: sum of abs. weighted deviations = 539.15959
Iteration 2: sum of abs. weighted deviations = 141.36772
Iteration 3: sum of abs. weighted deviations = 91.234609
Iteration 4: sum of abs. weighted deviations = 91.127281
Iteration 5: sum of abs. weighted deviations = 91.126351
Iteration 6: sum of abs. weighted deviations = 91.126236
Iteration 7: sum of abs. weighted deviations = 91.126229
Iteration 8: sum of abs. weighted deviations = 91.126224
Iteration 9: sum of abs. weighted deviations = 91.126221
.05 Quantile regression Number of obs = 10000
Raw sum of deviations 91.17849 (about .0009234)
Min sum of deviations 91.12622 Pseudo R2 = 0.0006
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .002601 .0004576 5.68 0.000 .001704 .003498
_cons -.0001393 .0002782 -0.50 0.617 -.0006846 .000406
. qreg y x, quantile(.95)
Iteration 1: WLS sum of weighted deviations = 618.77845
Iteration 1: sum of abs. weighted deviations = 619.00068
Iteration 2: sum of abs. weighted deviations = 228.32522
Iteration 3: sum of abs. weighted deviations = 169.22749
Iteration 4: sum of abs. weighted deviations = 169.21949
Iteration 5: sum of abs. weighted deviations = 169.21945
.95 Quantile regression Number of obs = 10000
Raw sum of deviations 277.3444 (about .61326343)
Min sum of deviations 169.2194 Pseudo R2 = 0.3899
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .8898259 .0090984 97.80 0.000 .8719912 .9076605
_cons .0021514 .0055307 0.39 0.697 -.00869 .0129927
The coefficient on x, in particular, differs markedly between the two estimates. For the mathematically
inclined, it is not too difficult to show that the theoretical lines are y = 0.0025 x for the 5th percentile
and y = 0.9025 x for the 95th, numbers in close agreement with our numerical results.
The estimator for the standard errors computed by qreg assumes that the sample is independent
and identically distributed (i.i.d.); see Estimated standard errors and Methods and formulas for details.
Because the data are conditionally heteroskedastic, we should have used bsqreg to consistently
estimate the standard errors using a bootstrap method.
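For example, a call along the following lines would refit the 95th-percentile regression for these simulated data with bootstrapped standard errors (the reps() value is an arbitrary choice for illustration):
. bsqreg y x, quantile(.95) reps(100)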
Estimated standard errors
The variance–covariance matrix of the estimator (VCE) depends on the reciprocal of the density
of the dependent variable evaluated at the quantile of interest. This function, known as the “sparsity
function”, is hard to estimate.
The default method, which uses the fitted values for the predicted quantiles, generally performs
well, but other methods may be preferred in larger samples. The vce() suboptions denmethod and
bwidth provide other estimators of the sparsity function, the details of which are described in Methods
and formulas.
For models with heteroskedastic errors, option vce(robust) computes a Huber (1967) form
of sandwich estimate (Koenker 2005). Alternatively, Gould (1992, 1997b) introduced generalized
versions of qreg that obtain estimates of the standard errors by using bootstrap resampling (see Efron
and Tibshirani [1993] or Wu [1986] for an introduction to bootstrap standard errors). The iqreg,
sqreg, and bsqreg commands provide a bootstrapped estimate of the entire variance–covariance
matrix of the estimators.
Example 4: Obtaining robust standard errors
Example 2 of qreg on real data above was a median regression of price on weight, length, and
foreign using auto.dta. Suppose, after investigation, we are convinced that car price observations
are not independent. We decide that standard errors robust to non-i.i.d. errors would be appropriate
and use the option vce(robust).
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. qreg price weight length foreign, vce(robust)
Iteration 1: WLS sum of weighted deviations = 56397.829
Iteration 1: sum of abs. weighted deviations = 55950.5
Iteration 2: sum of abs. weighted deviations = 55264.718
Iteration 3: sum of abs. weighted deviations = 54762.283
Iteration 4: sum of abs. weighted deviations = 54734.152
Iteration 5: sum of abs. weighted deviations = 54552.638
note: alternate solutions exist
Iteration 6: sum of abs. weighted deviations = 54465.511
Iteration 7: sum of abs. weighted deviations = 54443.699
Iteration 8: sum of abs. weighted deviations = 54411.294
Median regression Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
Robust
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 3.933588 1.694477 2.32 0.023 .55406 7.313116
length -41.25191 51.73571 -0.80 0.428 -144.4355 61.93171
foreign 3377.771 728.5115 4.64 0.000 1924.801 4830.741
_cons 344.6489 5096.528 0.07 0.946 -9820.055 10509.35
We see that the robust standard error for weight increases, making it less significant in modifying
the median automobile price. The standard error for length also increases, but the standard error
for the foreign indicator decreases.
For comparison, we repeat the estimation using bootstrap standard errors:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. set seed 1001
. bsqreg price weight length foreign
(fitting base model)
Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
....................
Median regression, bootstrap(20) SEs Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 3.933588 3.12446 1.26 0.212 -2.297951 10.16513
length -41.25191 83.71267 -0.49 0.624 -208.2116 125.7077
foreign 3377.771 1057.281 3.19 0.002 1269.09 5486.452
_cons 344.6489 7053.301 0.05 0.961 -13722.72 14412.01
The coefficient estimates are the same; indeed, they are obtained using the same technique. Only
the standard errors differ. Therefore, the t statistics, significance levels, and confidence intervals also
differ.
Because bsqreg (as well as sqreg and iqreg) obtains standard errors by randomly resampling
the data, the standard errors it produces will not be the same from run to run unless we first set the
random-number seed to the same number; see [R] set seed.
By default, bsqreg, sqreg, and iqreg use 20 replications. We can control the number of
replications by specifying the reps() option:
. bsqreg price weight length i.foreign, reps(1000)
(fitting base model)
Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
.................................................. 250
.................................................. 300
.................................................. 350
.................................................. 400
.................................................. 450
.................................................. 500
.................................................. 550
.................................................. 600
.................................................. 650
.................................................. 700
.................................................. 750
.................................................. 800
.................................................. 850
.................................................. 900
.................................................. 950
.................................................. 1000
Median regression, bootstrap(1000) SEs Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 3.933588 2.659381 1.48 0.144 -1.370379 9.237555
length -41.25191 69.29771 -0.60 0.554 -179.4618 96.95802
foreign
Foreign 3377.771 1094.264 3.09 0.003 1195.331 5560.211
_cons 344.6489 5916.906 0.06 0.954 -11456.25 12145.55
A comparison of the standard errors is informative.
                                qreg          bsqreg       bsqreg
  Variable        qreg       vce(robust)     reps(20)    reps(1000)
  weight          1.329         1.694          3.124        2.660
  length          45.46         51.74          83.71        69.30
  1.foreign       885.4         728.5          1057.        1094.
  _cons           5182.         5096.          7053.        5917.
The results shown above are typical for models with heteroskedastic errors. (Our dependent variable
is price; if our model had been in terms of ln(price), the standard errors estimated by qreg and
bsqreg would have been nearly identical.) Also, even for heteroskedastic errors, 20 replications is
generally sufficient for hypothesis tests against 0.
Interquantile and simultaneous-quantile regression
Consider a quantile regression model where the qth quantile is given by

    Qq(y) = aq + bq,1 x1 + bq,2 x2

For instance, the 75th and 25th quantiles are given by

    Q0.75(y) = a0.75 + b0.75,1 x1 + b0.75,2 x2
    Q0.25(y) = a0.25 + b0.25,1 x1 + b0.25,2 x2

The difference in the quantiles is then

    Q0.75(y) − Q0.25(y) = (a0.75 − a0.25) + (b0.75,1 − b0.25,1) x1 + (b0.75,2 − b0.25,2) x2
qreg fits models such as Q0.75(y) and Q0.25(y). iqreg fits interquantile models, such as Q0.75(y) −
Q0.25(y). The relationships of the coefficients estimated by qreg and iqreg are exactly as shown:
iqreg reports coefficients that are the difference in coefficients of two qreg models, and, of course,
iqreg reports the appropriate standard errors, which it obtains by bootstrapping.
sqreg is like qreg in that it estimates the equations for the quantiles
    Q0.75(y) = a0.75 + b0.75,1 x1 + b0.75,2 x2
    Q0.25(y) = a0.25 + b0.25,1 x1 + b0.25,2 x2
The coefficients it obtains are the same that would be obtained by estimating each equation separately
using qreg. sqreg differs from qreg in that it estimates the equations simultaneously and obtains
an estimate of the entire variance–covariance matrix of the estimators by bootstrapping. Thus you
can perform hypothesis tests concerning coefficients both within and across equations.
For example, to fit the above model, you could type
. qreg y x1 x2, quantile(.25)
. qreg y x1 x2, quantile(.75)
By doing this, you would obtain estimates of the parameters, but you could not test whether
b0.25,1 = b0.75,1 or, equivalently, b0.75,1 − b0.25,1 = 0. If your interest really is in the difference of
coefficients, you could type
. iqreg y x1 x2, quantiles(.25 .75)
The “coefficients” reported would be the difference in quantile coefficients. You could also estimate
both quantiles simultaneously and then test the equality of the coefficients:
. sqreg y x1 x2, quantiles(.25 .75)
. test [q25]x1 = [q75]x1
Whether you use iqreg or sqreg makes no difference for this test. sqreg, however, because it
estimates the quantiles simultaneously, allows you to test other hypotheses. iqreg, by focusing on
quantile differences, presents results in a way that is easier to read.
Finally, sqreg can estimate quantiles singly,
. sqreg y x1 x2, quantiles(.5)
and can thereby be used as a substitute for the slower bsqreg. (Gould [1997b] presents timings
demonstrating that sqreg is faster than bsqreg.) sqreg can also estimate more than two quantiles
simultaneously:
. sqreg y x1 x2, quantiles(.25 .5 .75)
Example 5: Simultaneous quantile estimation
In demonstrating qreg, we performed quantile regressions using auto.dta. We discovered that
the regression of price on weight, length, and foreign produced vastly different coefficients for
the 0.25, 0.5, and 0.75 quantile regressions. Here are the coefficients that we obtained:
  Variable      25th percentile   50th percentile   75th percentile
  weight              1.83              3.93              9.22
  length              2.85            -41.25            -220.8
  foreign           2209.9            3377.8            3595.1
  _cons            -1879.8             344.6           20242.9
All we can say, having estimated these equations separately, is that price seems to depend differently
on the weight, length, and foreign variables depending on the portion of the price distribution
we examine. We cannot be more precise because the estimates have been made separately. With
sqreg, however, we can estimate all the effects simultaneously:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. set seed 1001
. sqreg price weight length foreign, q(.25 .5 .75) reps(100)
(fitting base model)
Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
Simultaneous quantile regression Number of obs = 74
bootstrap(100) SEs .25 Pseudo R2 = 0.1697
.50 Pseudo R2 = 0.2347
.75 Pseudo R2 = 0.3840
Bootstrap
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
q25
weight 1.831789 1.574777 1.16 0.249 -1.309005 4.972583
length 2.84556 38.63523 0.07 0.941 -74.20998 79.9011
foreign 2209.925 1008.521 2.19 0.032 198.494 4221.357
_cons -1879.775 3665.184 -0.51 0.610 -9189.753 5430.204
q50
weight 3.933588 2.529541 1.56 0.124 -1.111423 8.978599
length -41.25191 68.62258 -0.60 0.550 -178.1153 95.61151
foreign 3377.771 1025.882 3.29 0.002 1331.715 5423.827
_cons 344.6489 6199.257 0.06 0.956 -12019.38 12708.68
q75
weight 9.22291 2.483676 3.71 0.000 4.269374 14.17645
length -220.7833 86.17422 -2.56 0.013 -392.6524 -48.91421
foreign 3595.133 1145.124 3.14 0.002 1311.255 5879.011
_cons 20242.9 9414.242 2.15 0.035 1466.79 39019.02
The coefficient estimates above are the same as those previously estimated, although the standard error
estimates are a little different. sqreg obtains estimates of variance by bootstrapping. The important
thing here, however, is that the full covariance matrix of the estimators has been estimated and stored,
and thus it is now possible to perform hypothesis tests. Are the effects of weight the same at the
25th and 75th percentiles?
. test [q25]weight = [q75]weight
( 1) [q25]weight - [q75]weight = 0
F( 1, 70) = 8.97
Prob > F = 0.0038
It appears that they are not. We can obtain a confidence interval for the difference by using lincom:
. lincom [q75]weight-[q25]weight
( 1) - [q25]weight + [q75]weight = 0
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
(1) 7.391121 2.467548 3.00 0.004 2.469752 12.31249
Indeed, we could test whether the weight and length sets of coefficients are equal at the three
quantiles estimated:
. quietly test [q25]weight = [q50]weight
. quietly test [q25]weight = [q75]weight, accumulate
. quietly test [q25]length = [q50]length, accumulate
. test [q25]length = [q75]length, accumulate
( 1) [q25]weight - [q50]weight = 0
( 2) [q25]weight - [q75]weight = 0
( 3) [q25]length - [q50]length = 0
( 4) [q25]length - [q75]length = 0
F( 4, 70) = 2.43
Prob > F = 0.0553
iqreg focuses on one quantile comparison but presents results that are more easily interpreted:
. set seed 1001
. iqreg price weight length foreign, q(.25 .75) reps(100) nolog
.75-.25 Interquantile regression Number of obs = 74
bootstrap(100) SEs .75 Pseudo R2 = 0.3840
.25 Pseudo R2 = 0.1697
Bootstrap
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight 7.391121 2.467548 3.00 0.004 2.469752 12.31249
length -223.6288 83.09868 -2.69 0.009 -389.3639 -57.89376
foreign 1385.208 1191.018 1.16 0.249 -990.2036 3760.619
_cons 22122.68 9009.159 2.46 0.017 4154.478 40090.88
Looking only at the 0.25 and 0.75 quantiles (the interquartile range), the iqreg command output
is easily interpreted. Increases in weight correspond significantly to increases in price dispersion.
Increases in length correspond to decreases in price dispersion. The foreign variable does not
significantly change price dispersion.
Do not make too much of these results; the purpose of this example is simply to illustrate the
sqreg and iqreg commands and to do so in a context that suggests why analyzing dispersion might
be of interest.
lincom after sqreg produced the same t statistic for the interquartile range of weight as did
the iqreg command above. In general, they will not agree exactly because of the randomness of
bootstrapping, unless the random-number seed is set to the same value before estimation (as was
done here).
Gould (1997a) presents simulation results showing that the coverage (the actual percentage of
confidence intervals containing the true value) for iqreg is appropriate.
What are the parameters?
In this section, we use a specific data-generating process (DGP) to illustrate the interpretation of the
parameters estimated by qreg. If simulation experiments are not intuitive to you, skip this section.
In general, quantile regression parameterizes the quantiles of the distribution of y conditional on
the independent variables x as xβ, where β is a vector of estimated parameters. In our example, we
include a constant term and a single independent variable, and we express quantiles of the distribution
of y conditional on x as β0 + β1 x.
We use simulated data to illustrate what we mean by a conditional distribution and how to interpret
the parameters β estimated by qreg. We also note how we could change our example to illustrate a
DGP for which the estimator in qreg would be misspecified.
We suppose that the distribution of y conditional on x has a Weibull form. If y has a Weibull
distribution, the distribution function is F(y) = 1 − exp{−(y/λ)^k}, where the scale parameter λ > 0
and the shape parameter k > 0. We can make y have a Weibull distribution function conditional on
x by making the scale parameter or the shape parameter functions of x. In our example, we specify
a particular DGP by supposing that λ = 1 + αx, α = 1.5, x = 1 + ν, and that ν has a χ2(1)
distribution. For the moment, we leave the parameter k as is so that we can discuss how this decision
relates to model specification.
Plugging in for λ yields the functional form for the distribution of y conditional on x, which is
known as the conditional distribution function and is denoted F(y|x). F(y|x) is the distribution for
y for each given value of x.
Some algebra yields that F(y|x) = 1 − exp[−{y/(1 + αx)}^k]. Letting τ = F(y|x) implies that
0 ≤ τ ≤ 1, because probabilities must be between 0 and 1.
To obtain the τ quantile of the distribution of y conditional on x, we solve

    τ = 1 − exp[−{y/(1 + αx)}^k]

for y as a function of τ, x, α, and k. The solution is

    y = (1 + αx){−ln(1 − τ)}^(1/k)    (1)

For any value of τ ∈ (0, 1), expression (1) gives the τ quantile of the distribution of y conditional
on x. To use qreg, we must rewrite (1) as a function of x, β0, and β1. Some algebra yields that (1)
can be rewritten as

    y = β0 + β1 x

where β0 = {−ln(1 − τ)}^(1/k) and β1 = α{−ln(1 − τ)}^(1/k). We can express the conditional
quantiles as linear combinations of x, which is a property of the estimator implemented in qreg.
If we parameterize k as a nontrivial function of x, the conditional quantiles will not be linear
in x. If the conditional quantiles cannot be represented as linear functions of x, we cannot estimate
the true parameters of the DGP. This restriction illustrates the limits of the estimator implemented in
qreg.
We set k = 2 for our example.
Conditional quantile regression allows the coefficients to change with the specified quantile. For
our DGP, the coefficients β0 and β1 increase as τ gets larger. Substituting in for α and k yields that
β0 = {−ln(1 − τ)}^(1/2) and β1 = 1.5{−ln(1 − τ)}^(1/2). Table 1 presents the true values for β0 and β1
implied by our DGP when τ ∈ {0.25, 0.5, 0.8}.
Table 1: True values for β0 and β1

      τ         β0           β1
    0.25     0.53636      0.80454
    0.5      0.8325546    1.248832
    0.8      1.268636     1.902954
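As a quick check, these values can be reproduced with display, which simply evaluates the formulas above (shown here for two of the three quantiles):
. display sqrt(-ln(1-.25))       // beta0 at tau = .25
. display 1.5*sqrt(-ln(1-.25))   // beta1 at tau = .25
. display sqrt(-ln(1-.8))        // beta0 at tau = .8
. display 1.5*sqrt(-ln(1-.8))    // beta1 at tau = .8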
We can also use (1) to generate data from the specified distribution of y conditional on x by
plugging in random uniform numbers for τ. Each random uniform number substituted in for τ in (1)
yields a draw from the conditional distribution of y given x.
Example 6
In this example, we generate 100,000 observations from our specified DGP by substituting random
uniform numbers for τ in (1), with α = 1.5, k = 2, x = 1 + ν, and ν coming from a χ2(1)
distribution.
We begin by executing the code that implements this method; below we discuss each line of the
output produced.
. clear // drop existing variables
. set seed 1234571 // set random-number seed
. set obs 100000 // set number of observations
obs was 0, now 100000
. generate double tau = runiform() // generate uniform variate
. generate double x = 1 + sqrt(rchi2(1)) // generate values for x
. generate double lambda = 1 + 1.5*x // lambda is 1 + alpha*x
. generate double k = 2 // fix value of k
. // generate random values for y
. // given x
. generate double y = lambda*((-ln(1-tau))^(1/k))
Although the comments at the end of each line briefly describe what each line is doing, we provide
a more careful description. The first line drops any variables in memory. The second sets the seed
of the random-number generator so that we will always get the same sequence of random uniform
numbers. The third line sets the sample size to 100,000 observations, and the fourth line reports the
change in sample size.
The fifth line substitutes random uniform numbers for τ. This line is the key to the algorithm.
This standard method for computing random numbers, known as the inverse-probability transform, is
discussed by Cameron and Trivedi (2010, 126–127), among others.
Lines 6–8 generate x, λ, and k per our specified DGP. Lines 9–11 implement (1) using the
previously generated λ, x, and k.
At the end, we have 100,000 observations on y and x, with y coming from the conditional
distribution that we specified above.
Example 7
In the example below, we use qreg to estimate β1 and β0, the parameters from the conditional
quantile function, for the 0.5 quantile from our simulated data.
. qreg y x, quantile(.5)
Iteration 1: WLS sum of weighted deviations = 68975.517
Iteration 1: sum of abs. weighted deviations = 68975.325
Iteration 2: sum of abs. weighted deviations = 68843.958
Iteration 3: sum of abs. weighted deviations = 68629.64
Iteration 4: sum of abs. weighted deviations = 68626.382
Iteration 5: sum of abs. weighted deviations = 68625.659
Iteration 6: sum of abs. weighted deviations = 68625.657
Iteration 7: sum of abs. weighted deviations = 68625.657
Median regression Number of obs = 100000
Raw sum of deviations 73840.51 (about 2.944248)
Min sum of deviations 68625.66 Pseudo R2 = 0.0706
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 1.228536 .0118791 103.42 0.000 1.205253 1.251819
_cons .8693355 .0225288 38.59 0.000 .8251793 .9134917
In the qreg output, the results for x correspond to the estimate of β1, and the results for _cons
correspond to the estimate of β0. The reported estimates are close to their true values of 1.248832
and 0.8325546, which are given in table 1.
The intuition in this example comes from the ability of qreg to recover the true parameters of
our specified DGP. As we increase the sample size, the qreg estimates
will get closer to the true values.
Example 8
In the example below, we estimate the parameters of the conditional quantile function for the 0.25
quantile and compare them with the true values.
. qreg y x, quantile(.25)
Iteration 1: WLS sum of weighted deviations = 65497.284
Iteration 1: sum of abs. weighted deviations = 65492.359
Iteration 2: sum of abs. weighted deviations = 60139.477
Iteration 3: sum of abs. weighted deviations = 49999.793
Iteration 4: sum of abs. weighted deviations = 49999.479
Iteration 5: sum of abs. weighted deviations = 49999.465
Iteration 6: sum of abs. weighted deviations = 49999.465
.25 Quantile regression Number of obs = 100000
Raw sum of deviations 52014.79 (about 1.857329)
Min sum of deviations 49999.47 Pseudo R2 = 0.0387
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .7844305 .0107092 73.25 0.000 .7634405 .8054204
_cons .5633285 .0203102 27.74 0.000 .5235209 .6031362
As above, qreg reports the estimates of β1 and β0 in the output table for x and _cons, respectively.
The reported estimates are close to their true values of 0.80454 and 0.53636, which are given in
table 1. Also as expected, the estimates for the 0.25 quantile are smaller than the estimates for
the 0.5 quantile.
Example 9
We finish this section by estimating the parameters of the conditional quantile function for the 0.8
quantile and comparing them with the true values.
. qreg y x, quantile(.8)
Iteration 1: WLS sum of weighted deviations = 66332.299
Iteration 1: sum of abs. weighted deviations = 66332.194
Iteration 2: sum of abs. weighted deviations = 60076.645
Iteration 3: sum of abs. weighted deviations = 52589.193
Iteration 4: sum of abs. weighted deviations = 52340.961
Iteration 5: sum of abs. weighted deviations = 52262.505
Iteration 6: sum of abs. weighted deviations = 52249.305
Iteration 7: sum of abs. weighted deviations = 52245.124
Iteration 8: sum of abs. weighted deviations = 52245.103
Iteration 9: sum of abs. weighted deviations = 52245.081
Iteration 10: sum of abs. weighted deviations = 52245.075
Iteration 11: sum of abs. weighted deviations = 52245.074
Iteration 12: sum of abs. weighted deviations = 52245.073
Iteration 13: sum of abs. weighted deviations = 52245.073
Iteration 14: sum of abs. weighted deviations = 52245.073
Iteration 15: sum of abs. weighted deviations = 52245.073
.8 Quantile regression Number of obs = 100000
Raw sum of deviations 60093.34 (about 4.7121822)
Min sum of deviations 52245.07 Pseudo R2 = 0.1306
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 1.889702 .0146895 128.64 0.000 1.860911 1.918493
_cons 1.293773 .0278587 46.44 0.000 1.23917 1.348375
As above, qreg reports the estimates of β1 and β0 in the output table for x and _cons, respectively.
The reported estimates are close to their true values of 1.902954 and 1.268636, which are given in
table 1. Also as expected, the estimates for the 0.8 quantile are larger than the estimates for
the 0.5 quantile.
Stored results
qreg stores the following in e():
Scalars
e(N) number of observations
e(df_m) model degrees of freedom
e(df_r) residual degrees of freedom
e(q) quantile requested
e(q_v) value of the quantile
e(sum_adev) sum of absolute deviations
e(sum_rdev) sum of raw deviations
e(sum_w) sum of weights
e(f_r) density estimate
e(sparsity) sparsity estimate
e(bwidth) bandwidth
e(kbwidth) kernel bandwidth
e(rank) rank of e(V)
e(convcode) 0 if converged; otherwise, return code for why nonconvergence
Macros
e(cmd) qreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(bwmethod) bandwidth method; hsheather, bofinger, or chamberlain
e(denmethod) density estimation method; fitted, residual, or kernel
e(kernel) kernel function
e(wtype) weight type
e(wexp) weight expression
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
iqreg stores the following in e():
Scalars
e(N) number of observations
e(df_r) residual degrees of freedom
e(q0) lower quantile requested
e(q1) upper quantile requested
e(reps) number of replications
e(sumrdev0) lower quantile sum of raw deviations
e(sumrdev1) upper quantile sum of raw deviations
e(sumadev0) lower quantile sum of absolute deviations
e(sumadev1) upper quantile sum of absolute deviations
e(rank) rank of e(V)
e(convcode) 0 if converged; otherwise, return code for why nonconvergence
Macros
e(cmd) iqreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
sqreg stores the following in e():
Scalars
e(N) number of observations
e(df_r) residual degrees of freedom
e(n_q) number of quantiles requested
e(q#) the quantiles requested
e(reps) number of replications
e(sumrdv#) sum of raw deviations for q#
e(sumadv#) sum of absolute deviations for q#
e(rank) rank of e(V)
e(convcode) 0 if converged; otherwise, return code for why nonconvergence
Macros
e(cmd) sqreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(eqnames) names of equations
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
bsqreg stores the following in e():
Scalars
e(N) number of observations
e(df_r) residual degrees of freedom
e(q) quantile requested
e(q_v) value of the quantile
e(reps) number of replications
e(sum_adev) sum of absolute deviations
e(sum_rdev) sum of raw deviations
e(rank) rank of e(V)
e(convcode) 0 if converged; otherwise, return code for why nonconvergence
Macros
e(cmd) bsqreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Introduction
Linear programming formulation of quantile regression
Standard errors when residuals are i.i.d.
Pseudo-R2
Introduction
According to Stuart and Ord (1991, 1084), the method of minimum absolute deviations was first
proposed by Boscovich in 1757 and was later developed by Laplace; Stigler (1986, 39–55) and
Hald (1998, 97–103, 112–116) provide historical details. According to Bloomfield and Steiger (1980),
Harris (1950) later observed that the problem of minimum absolute deviations could be turned into the
linear programming problem that was first implemented by Wagner (1959). Interest has grown in this
method because robust methods and extreme value modeling have become more popular. Statistical
and computational properties of minimum absolute deviation estimators are surveyed by Narula and
Wellington (1982). Cameron and Trivedi (2005), Hao and Naiman (2007), and Wooldridge (2010)
provide excellent introductions to quantile regression methods, while Koenker (2005) gives an in-depth
review of the topic.
Linear programming formulation of quantile regression
Define τ as the quantile to be estimated; the median is τ = 0.5. For each observation i, let εi be
the residual

    εi = yi − xi′β̂τ
The objective function to be minimized is

    cτ(εi) = (τ 1{εi ≥ 0} + (1 − τ) 1{εi < 0}) |εi|
           = (τ 1{εi ≥ 0} − (1 − τ) 1{εi < 0}) εi
           = (τ − 1{εi < 0}) εi                                (2)

where 1{·} is the indicator function. This function is sometimes referred to as the check function
because it resembles a check mark (Wooldridge 2010, 450); the slope of cτ(εi) is τ when εi > 0
and is τ − 1 when εi < 0, but is undefined for εi = 0. Choosing the β̂τ that minimize cτ(εi) is
equivalent to finding the β̂τ that make xβ̂τ best fit the quantiles of the distribution of y conditional
on x.
This minimization problem is set up as a linear programming problem and is solved with linear
programming techniques, as suggested by Armstrong, Frome, and Kung (1979) and described in detail
by Koenker (2005). Here 2n slack variables, u (n × 1) and v (n × 1), are introduced, where ui ≥ 0,
vi ≥ 0, and ui × vi = 0, reformulating the problem as

    min_{βτ, u, v} { τ 1n′u + (1 − τ) 1n′v | y − Xβτ = u − v }

where 1n is a vector of 1s. This is a linear objective function on a polyhedral constraint set with
(n choose k) vertices, and our goal is to find the vertex that minimizes (2). Each step in the search is described by
a set of k observations through which the regression plane passes, called the basis. A step is taken
by replacing a point in the basis if the linear objective function can be improved. If this occurs, a
line is printed in the iteration log. The definition of convergence is exact in the sense that no amount
of added iterations could improve the objective function.
A series of weighted least-squares (WLS) regressions is used to identify a set of observations
as a starting basis. The WLS algorithm for τ = 0.5 is taken from Schlossmacher (1973) with a
generalization for 0 < τ < 1 implied from Hunter and Lange (2000).
Standard errors when residuals are i.i.d.
The estimator for the VCE implemented in qreg assumes that the errors of the model are independent
and identically distributed (i.i.d.). When the errors are i.i.d., the large-sample VCE is
    cov(βτ) = {τ(1 − τ)/fY^2(ξτ)} {E(xi xi′)}^(-1)                    (3)

where ξτ = FY^(-1)(τ) and FY(y) is the distribution function of Y with density fY(y). See
Koenker (2005, 73) for this result. From (3), we see that the regression precision depends on
the inverse of the density function, termed the sparsity function, sτ = 1/fY(ξτ).
While (1/n) Σ_{i=1}^n xi xi′ estimates E(xi xi′), estimating the sparsity function is more difficult. qreg
provides several methods to estimate the sparsity function. The different estimators are specified
through the suboptions of vce(iid, denmethod bwidth). The suboption denmethod specifies the
functional form for the sparsity estimator. The default is fitted.
Here we outline the logic underlying the fitted estimator. Because FY(y) is the distribution
function for Y, we have fY(y) = dFY(y)/dy, τ = FY(ξτ), and ξτ = FY^(-1)(τ). Differentiating the
identity FY{FY^(-1)(τ)} = τ, the sparsity function can be written as sτ = dFY^(-1)(τ)/dτ.
Numerically, we can approximate the derivative using the centered difference

    dFY^(-1)(τ)/dτ ≈ {FY^(-1)(τ + h) − FY^(-1)(τ − h)}/(2h) = (ξ_{τ+h} − ξ_{τ−h})/(2h) = ŝτ      (4)

where h is the bandwidth.
The empirical quantile function is computed by first estimating β_{τ+h} and β_{τ−h}, and then computing
F̂Y^(-1)(τ + h) = x̄′β̂_{τ+h} and F̂Y^(-1)(τ − h) = x̄′β̂_{τ−h}, where x̄ is the sample mean of the independent
variables x. These quantities are then substituted into (4).
Alternatively, as the option suggests, vce(iid, residual) specifies that qreg use the empirical
quantile function of the residuals to estimate the sparsity. Here we substitute Fε, the distribution of
the residuals, for FY; the two differ only in their first moments.
The k residuals associated with the linear programming basis will be zero, where k is the number
of regression coefficients. These zero residuals are removed before computing the τ + h and τ − h
quantiles, ε_(τ+h) = F̂ε^(-1)(τ + h) and ε_(τ−h) = F̂ε^(-1)(τ − h). The F̂ε^(-1) estimates are then substituted
for FY^(-1) in (4).
Each of the estimators for the sparsity function depends on a bandwidth. The vce() suboption bwidth
specifies the bandwidth method to use. The three bandwidth options and their citations are hsheather
(Hall and Sheather 1988), bofinger (Bofinger 1975), and chamberlain (Chamberlain 1994).
Their formulas are

    hs = n^(-1/3) Φ^(-1)(1 − α/2)^(2/3) [ (3/2) × φ{Φ^(-1)(τ)}^2 / {2Φ^(-1)(τ)^2 + 1} ]^(1/3)

    hb = n^(-1/5) [ (9/2) φ{Φ^(-1)(τ)}^4 / {2Φ^(-1)(τ)^2 + 1}^2 ]^(1/5)

    hc = Φ^(-1)(1 − α/2) {τ(1 − τ)/n}^(1/2)

where hs is the Hall–Sheather bandwidth, hb is the Bofinger bandwidth, hc is the Chamberlain
bandwidth, Φ() and φ() are the standard normal distribution and density functions, n is the sample
size, and 100(1 − α) is the confidence level set by the level() option. Koenker (2005) discusses the
derivation of the Hall–Sheather and the Bofinger bandwidth formulas. You should avoid modifying
the confidence level when replaying estimates that use the Hall–Sheather or Chamberlain bandwidths
because these methods use the confidence level to estimate the coefficient standard errors.
Finally, the vce() suboption kernel(kernel) specifies that qreg use one of several kernel-density
estimators to estimate the sparsity function. kernel allows you to choose which kernel function to
use, where the default is the Epanechnikov kernel. See [R] kdensity for the functional form of the
eight kernels.
The kernel bandwidth is computed using an adaptive estimate of scale

    hk = min(σ̂, rq/1.34) × {Φ^(-1)(τ + h) − Φ^(-1)(τ − h)}

where h is one of hs, hb, or hc; rq is the interquartile range; and σ̂ is the standard deviation of y;
see Silverman (1992, 47) and Koenker (2005, 81) for discussions. Let f̂(εi) be the kernel density
estimate for the ith residual; then the kernel estimator for the sparsity function is

    ŝτ = n hk / Σ_{i=1}^n f̂(εi)
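For instance, with placeholder variable names, an i.i.d. VCE using a Gaussian kernel density estimate with the Bofinger bandwidth could be requested by typing
. qreg y x1 x2, vce(iid, kernel(gaussian) bofinger)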
Finally, substituting your choice of sparsity estimate into (3) results in the i.i.d. variance–covariance
matrix

    Vn = ŝτ^2 τ(1 − τ) ( Σ_{i=1}^n xi xi′ )^(-1)
Pseudo-R2
The pseudo-R2 is calculated as

    1 − (sum of weighted deviations about estimated quantile)/(sum of weighted deviations about raw quantile)

This is based on the likelihood for a double-exponential distribution e^{−vi|εi|}, where the vi are multipliers

    vi = 2τ          if εi > 0
         2(1 − τ)    otherwise

Minimizing the objective function (2) with respect to βτ also minimizes Σi |εi| vi, the sum of
weighted least absolute deviations. For example, for the 50th percentile, vi = 1 for all i, and we
have median regression. If we want to estimate the 75th percentile, we weight the negative residuals
by 0.50 and the positive residuals by 1.50. It can be shown that the criterion is minimized when 75%
of the residuals are negative.
References
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Armstrong, R. D., E. L. Frome, and D. S. Kung. 1979. Algorithm 79-01: A revised simplex algorithm for the absolute
deviation curve fitting problem. Communications in Statistics—Simulation and Computation 8: 175–190.
Bloomfield, P., and W. Steiger. 1980. Least absolute deviations curve-fitting. SIAM Journal on Scientific Computing
1: 290–301.
Bofinger, E. 1975. Estimation of a density function using order statistics. Australian Journal of Statistics 17: 1–17.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Chamberlain, G. 1994. Quantile regression, censoring, and the structure of wages. In Advances in Econometrics,
Vol. 1: Sixth World Congress, ed. C. A. Sims, 171–209. Cambridge: Cambridge University Press.
Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC.
Frölich, M., and B. Melly. 2010. Estimation of quantile treatment effects with Stata. Stata Journal 10: 423–457.
Gould, W. W. 1992. sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21.
Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 137–139. College Station, TX: Stata Press.
. 1997a. crc46: Better numerical derivatives and integrals. Stata Technical Bulletin 35: 3–5. Reprinted in Stata
Technical Bulletin Reprints, vol. 6, pp. 8–12. College Station, TX: Stata Press.
. 1997b. sg70: Interquantile and simultaneous-quantile regression. Stata Technical Bulletin 38: 14–22. Reprinted
in Stata Technical Bulletin Reprints, vol. 7, pp. 167–176. College Station, TX: Stata Press.
Gould, W. W., and W. H. Rogers. 1994. Quantile regression as an alternative to robust regression. In 1994 Proceedings
of the Statistical Computing Section. Alexandria, VA: American Statistical Association.
Hald, A. 1998. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley.
Hall, P., and S. J. Sheather. 1988. On the distribution of a Studentized quantile. Journal of the Royal Statistical
Society, Series B 50: 381–391.
Hao, L., and D. Q. Naiman. 2007. Quantile Regression. Thousand Oaks, CA: Sage.
Harris, T. 1950. Regression using minimum absolute deviations. American Statistician 4: 14–15.
Huber, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Vol. 1 of Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 221–233. Berkeley: University of
California Press.
. 1981. Robust Statistics. New York: Wiley.
Hunter, D. R., and K. Lange. 2000. Quantile regression via an MM algorithm. Journal of Computational and Graphical
Statistics 9: 60–77.
Jolliffe, D., B. Krushelnytskyy, and A. Semykina. 2000. sg153: Censored least absolute deviations estimator: CLAD.
Stata Technical Bulletin 58: 13–16. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 240–244. College
Station, TX: Stata Press.
Koenker, R. 2005. Quantile Regression. New York: Cambridge University Press.
Koenker, R., and K. Hallock. 2001. Quantile regression. Journal of Economic Perspectives 15: 143–156.
Narula, S. C., and J. F. Wellington. 1982. The minimum sum of absolute errors regression: A state of the art survey.
International Statistical Review 50: 317–326.
Orsini, N., and M. Bottai. 2011. Logistic quantile regression in Stata. Stata Journal 11: 327–344.
Rousseeuw, P. J., and A. M. Leroy. 1987. Robust Regression and Outlier Detection. New York: Wiley.
Schlossmacher, E. J. 1973. An iterative technique for absolute deviations curve fitting. Journal of the American
Statistical Association 68: 857–859.
Silverman, B. W. 1992. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
Stigler, S. M. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Belknap
Press.
Stuart, A., and J. K. Ord. 1991. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 5th ed. New
York: Oxford University Press.
Wagner, H. M. 1959. Linear programming techniques for regression analysis. Journal of the American Statistical
Association 54: 206–212.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling methods in regression analysis. Annals of Statistics 14:
1261–1350 (including discussions and rejoinder).
Also see
[R] qreg postestimation — Postestimation tools for qreg, iqreg, sqreg, and bsqreg
[R] bootstrap — Bootstrap sampling and estimation
[R] regress — Linear regression
[R] rreg — Robust regression
[MI] estimation — Estimation commands for use with mi estimate
[U] 20 Estimation and postestimation commands
Title
qreg postestimation — Postestimation tools for qreg, iqreg, sqreg, and bsqreg
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after qreg, iqreg, bsqreg, and sqreg:
Command Description
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast (1)  dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi estimation results.
Syntax for predict
For qreg, iqreg, and bsqreg
predict [type] newvar [if] [in] [, xb | stdp | residuals]
For sqreg
predict [type] newvar [if] [in] [, equation(eqno[, eqno]) statistic]
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
stddp standard error of the difference in linear predictions
residuals residuals
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
stddp is allowed only after you have fit a model using sqreg. The standard error of the difference
in linear predictions ($x_{1j}\mathbf{b} - x_{2j}\mathbf{b}$) between equations 1 and 2 is calculated.
residuals calculates the residuals, that is, $y_j - x_j\mathbf{b}$.
equation(eqno[, eqno]) specifies the equation to which you are making the calculation.
equation() is filled in with one eqno for the xb, stdp, and residuals options. equation(#1)
would mean that the calculation is to be made for the first equation, equation(#2) would mean
the second, and so on. You could also refer to the equations by their names. equation(income)
would refer to the equation named income and equation(hours) to the equation named hours.
If you do not specify equation(), results are the same as if you had specified equation(#1).
To use stddp, you must specify two equations. You might specify equation(#1, #2) or
equation(q80, q20) to indicate the 80th and 20th quantiles.
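For instance, with the auto data used in example 1 below, you might fit two quantiles jointly with sqreg and then use equation() to choose which set of coefficients predict uses. This is only a sketch (the equation names q20 and q80 are the names sqreg assigns for the quantiles requested):
. sqreg price weight length foreign, quantiles(.2 .8)
. predict xb20, equation(#1) xb
. predict xb80, equation(q80) xb
. predict sd2080, equation(q80, q20) stddp
The last command computes the standard error of the difference between the 80th- and 20th-quantile linear predictions.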
Remarks and examples
Example 1
In example 4 of [R]qreg, we fit regressions for the lower and the upper quartile of the price
variable. The predict command can be used to obtain the linear prediction after each regression.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. qreg price weight length foreign, quantile(.25)
(output omitted )
. predict q25
(option xb assumed; fitted values)
. qreg price weight length foreign, quantile(.75)
(output omitted )
. predict q75
(option xb assumed; fitted values)
We can use the variables generated by predict to compute the predicted interquartile range, that
is,
. generate iqr1 = q75 - q25
If we directly perform the interquartile range regression with the iqreg command, we can predict
the interquartile range and also the standard error for the prediction.
. iqreg price weight length foreign, quantile(.25 .75)
(output omitted )
. predict iqr2
(option xb assumed; fitted values)
. predict stdp, stdp
We now plot the predicted interquartile range versus variable length:
. scatter iqr2 length
(graph omitted: scatterplot of the linear prediction iqr2 against Length (in.))
As stated in example 5 of [R] qreg, the negative coefficient for the length variable means
that increases in length imply decreases in the interquartile range and therefore in price dispersion.
Consequently, we could have expected a downward trend in the plot, but none is apparent. This is because
the regression output indicates that, when we hold the rest of the variables constant, an increase
in length leads to a decrease in iqr2. However, there is a high correlation between weight and
length, which could be masking the effect of length on iqr2. We can achieve a better visualization
by using a contour plot.
. twoway contour iqr2 weight length, level(10)
(graph omitted: contour plot of the linear prediction iqr2 over Weight (lbs.) and Length (in.), with contour levels running from about 310 to about 7,332)
We can see the effect by holding weight fixed on the vertical axis at, say, 3,000 lbs. When we
move from left to right along the horizontal axis, we see that for small values of length, iqr2
values are shown in red, meaning high values, and as we move toward the right, the graph shows
a transition into increasingly smaller values.
Also see
[R] qreg Quantile regression
[U] 20 Estimation and postestimation commands
Title
query — Display system parameters
Syntax Description Remarks and examples Also see
Syntax
query [memory | output | interface | graphics | efficiency | network | update | trace | mata | other]
Description
query displays the settings of various Stata parameters.
Remarks and examples
query provides more system information than you will ever want to know. You do not need to
understand every line of output that query produces if all you need is one piece of information. Here
is what happens when you type query:
. query
Memory settings
set maxvar 5000 2048-32767; max. vars allowed
set matsize 400 10-11000; max. # vars in models
set niceness 5 0-10
set min_memory 0 0-1600g
set max_memory . 32m-1600g or .
set segmentsize 32m 1m-32g
Output settings
set more on
set rmsg off
set dp period may be period or comma
set linesize 80 characters
set pagesize 28 lines
set level 95 percent confidence intervals
set showbaselevels may be empty, off, on, or all
set showemptycells may be empty, off, or on
set showomitted may be empty, off, or on
set fvlabel on
set fvwrap 1
set fvwrapon word may be word or width
set lstretch may be empty, off, or on
set cformat may be empty or a numerical format
set pformat may be empty or a numerical format
set sformat may be empty or a numerical format
set coeftabresults on
set logtype smcl may be smcl or text
Interface settings
set dockable on
set dockingguides on
set floatwindows off
set locksplitters off
set pinnable on
set doublebuffer on
set linegap 1 pixels
set scrollbufsize 204800 characters
set fastscroll on
set varlabelpos (not relevant)
set reventries 5000 lines
set maxdb 50 dialog boxes
Graphics settings
set graphics on
set autotabgraphs off
set scheme s2color
set printcolor automatic may be automatic, asis, gs1, gs2, gs3
set copycolor automatic may be automatic, asis, gs1, gs2, gs3
Efficiency settings
set adosize 1000 kilobytes
Network settings
set checksum off
set timeout1 30 seconds
set timeout2 180 seconds
set httpproxy off
set httpproxyhost
set httpproxyport 80
set httpproxyauth off
set httpproxyuser
set httpproxypw
Update settings
set update_query on
set update_interval 7
set update_prompt on
Trace (programming debugging) settings
set trace off
set tracedepth 32000
set traceexpand on
set tracesep on
set traceindent on
set tracenumber off
set tracehilite
Mata settings
set matastrict off
set matalnum off
set mataoptimize on
set matafavor space may be space or speed
set matacache 400 kilobytes
set matalibs lmatabase;lmataado;lmatafc;lmatagsem;lmataopt;
> lmatapath;lmatapss;lmatasem
set matamofirst off
Other settings
set type float may be float or double
set maxiter 16000 max iterations for estimation commands
set searchdefault all may be local, net, or all
set seed X075bcd151f123bb5159a55e50022865700043e55
set varabbrev on
set emptycells keep may be keep or drop
set processors 2 1-2
set haverdir
The output is broken into several divisions: memory, output, interface, graphics, efficiency, network,
update, trace, mata, and other settings. We will discuss each one in turn.
We generated the output above using Stata/MP for Windows. Here is what happens when we type
query and we are running Stata/SE for Mac:
. query
Memory settings
set maxvar 5000 2048-32767; max. vars allowed
set matsize 400 10-11000; max. # vars in models
set niceness 5 0-10
set min_memory 0 0-1600g
set max_memory . 32m-1600g or .
set segmentsize 32m 1m-32g
Output settings
set more off
set rmsg off
set dp period may be period or comma
set linesize 80 characters
set pagesize 23 lines
set level 95 percent confidence intervals
set showbaselevels may be empty, off, on, or all
set showemptycells may be empty, off, or on
set showomitted may be empty, off, or on
set fvlabel on
set fvwrap 1
set fvwrapon word may be word or width
set lstretch may be empty, off, or on
set cformat may be empty or a numerical format
set pformat may be empty or a numerical format
set sformat may be empty or a numerical format
set coeftabresults on
set logtype smcl may be smcl or text
set charset mac may be mac or latin1
set eolchar unix may be mac or unix
set notifyuser on
set playsnd off
set include_bitmap on
Interface settings
set revkeyboard on
set varkeyboard on
set smoothfonts on
set linegap 1 pixels
set scrollbufsize 204800 characters
set varlabelpos (not relevant)
set reventries 5000 lines
set maxdb 50 dialog boxes
Graphics settings
set graphics on
set scheme s2color
set printcolor automatic may be automatic, asis, gs1, gs2, gs3
set copycolor automatic may be automatic, asis, gs1, gs2, gs3
Efficiency settings
set adosize 1000 kilobytes
Network settings
set checksum off
set timeout1 30 seconds
set timeout2 180 seconds
set httpproxy off
set httpproxyhost
set httpproxyport 80
set httpproxyauth off
set httpproxyuser
set httpproxypw
Update settings
set update_query on
set update_interval 7
set update_prompt on
Trace (programming debugging) settings
set trace off
set tracedepth 32000
set traceexpand on
set tracesep on
set traceindent on
set tracenumber off
set tracehilite
Mata settings
set matastrict off
set matalnum off
set mataoptimize on
set matafavor space may be space or speed
set matacache 400 kilobytes
set matalibs lmatabase;lmataado;lmatafc;lmatagsem;lmataopt;
> lmatapath;lmatapss;lmatasem
set matamofirst off
Other settings
set type float may be float or double
set maxiter 16000 max iterations for estimation commands
set searchdefault local may be local, net, or all
set seed X075bcd151f123bb5159a55e50022865700043e55
set varabbrev on
set emptycells keep may be keep or drop
set processors 1
Memory settings
Memory settings indicate how memory is allocated, the maximum number of variables, and the
maximum size of a matrix.
For more information, see
maxvar [D]memory
matsize [R]matsize
niceness [D]memory
min memory [D]memory
max memory [D]memory
segmentsize [D]memory
Output settings
Output settings show how Stata displays output on the screen and in log files.
For more information, see
more [R]more
rmsg [P]rmsg
dp [D]format
linesize [R]log
pagesize [R]more
level [R]level
showbaselevels [R]set showbaselevels
showemptycells [R]set showbaselevels
showomitted [R]set showbaselevels
fvlabel [R]set showbaselevels
fvwrap [R]set showbaselevels
fvwrapon [R]set showbaselevels
cformat [R]set cformat
pformat [R]set cformat
sformat [R]set cformat
coeftabresults [R]set
lstretch [R]set
logtype [R]log
charset [R]set
eolchar [R]set
notifyuser [R]set
playsnd [R]set
include bitmap [R]set
Interface settings
Interface settings control how Stata’s interface works.
For more information, see
dockable [R]set
dockingguides [R]set
floatwindows [R]set
locksplitters [R]set
pinnable [R]set
doublebuffer [R]set
revkeyboard [R]set
varkeyboard [R]set
smoothfonts [R]set
linegap [R]set
scrollbufsize [R]set
fastscroll [R]set
reventries [R]set
maxdb [R]db
Graphics settings
Graphics settings indicate how Stata’s graphics are displayed.
For more information, see
graphics [G-2]set graphics
autotabgraphs [R]set
scheme [G-2]set scheme
printcolor [G-2]set printcolor
copycolor [G-2]set printcolor
Efficiency settings
The efficiency settings set the maximum amount of memory allocated to automatically loaded
do-files, the maximum number of remembered-contents dialog boxes, and the use of virtual memory.
For more information, see
adosize [P]sysdir
Network settings
Network settings determine how Stata interacts with the Internet.
For more information, see [R]netio.
Update settings
Update settings determine how Stata performs updates.
For more information, see [R]update.
Trace settings
Trace settings adjust Stata’s behavior and are particularly useful in debugging code.
For more information, see [P]trace.
Mata settings
Mata settings affect Mata’s system parameters.
For more information, see [M-3]mata set.
Other settings
The other settings are a miscellaneous collection.
For more information, see
type [D]generate
maxiter [R]maximize
searchdefault [R]search
seed [R]set seed
varabbrev [R]set
emptycells [R]set
processors [R]set
odbcmgr [D]odbc
haverdir [D]import haver
In general, the parameters displayed by query can be changed by set; see [R]set.
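For example, you can display a single division, change one of its settings with set, and then confirm the change; the values here are illustrative only:
. query output
. set linesize 120
. query output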
Also see
[R] set Overview of system parameters
[P] creturn Return c-class values
[M-3] mata set Set and display Mata system parameters
Title
ranksum — Equality tests on unmatched data
Syntax Menu Description Options for ranksum
Options for median Remarks and examples Stored results Methods and formulas
References Also see
Syntax
Wilcoxon rank-sum test
ranksum varname [if] [in], by(groupvar) [porder]
Nonparametric equality-of-medians test
median varname [if] [in] [weight], by(groupvar) [median_options]
ranksum options Description
Main
by(groupvar)grouping variable
porder probability that variable for first group is larger than variable for
second group
median options Description
Main
by(groupvar)grouping variable
exact perform Fisher’s exact test
medianties(below) assign values equal to the median to below group
medianties(above) assign values equal to the median to above group
medianties(drop) drop values equal to the median from the analysis
medianties(split) split values equal to the median equally between the two groups
by(groupvar)is required.
by is allowed with ranksum and median; see [D] by.
fweights are allowed with median; see [U] 11.1.6 weight.
Menu
ranksum
Statistics >Nonparametric analysis >Tests of hypotheses >Wilcoxon rank-sum test
median
Statistics >Nonparametric analysis >Tests of hypotheses >K-sample equality-of-medians test
Description
ranksum tests the hypothesis that two independent samples (that is, unmatched data) are from
populations with the same distribution by using the Wilcoxon rank-sum test, which is also known as
the Mann–Whitney two-sample statistic (Wilcoxon 1945; Mann and Whitney 1947).
median performs a nonparametric k-sample test on the equality of medians. It tests the null
hypothesis that the k samples were drawn from populations with the same median. For two samples,
the chi-squared test statistic is computed both with and without a continuity correction.
ranksum and median are for use with unmatched data. For equality tests on matched data, see
[R] signrank.
Options for ranksum
 
Main
by(groupvar)is required. It specifies the name of the grouping variable.
porder displays an estimate of the probability that a random draw from the first population is larger
than a random draw from the second population.
Options for median
 
Main
by(groupvar)is required. It specifies the name of the grouping variable.
exact displays the significance calculated by Fisher’s exact test. For two samples, both one- and
two-sided probabilities are displayed.
medianties(below |above |drop |split) specifies how values equal to the overall median are to
be handled. The median test computes the median for varname by using all observations and then
divides the observations into those falling above the median and those falling below the median.
When values for an observation are equal to the sample median, they can be dropped from the
analysis by specifying medianties(drop); added to the group above or below the median by
specifying medianties(above) or medianties(below), respectively; or if there is more than
1 observation with values equal to the median, they can be equally divided into the two groups by
specifying medianties(split). If this option is not specified, medianties(below) is assumed.
Remarks and examples
Example 1
We are testing the effectiveness of a new fuel additive. We run an experiment with 24 cars: 12
cars with the fuel treatment and 12 cars without. We input these data by creating a dataset with 24
observations. mpg records the mileage rating, and treat records 0 if the mileage corresponds to
untreated fuel and 1 if it corresponds to treated fuel.
. use http://www.stata-press.com/data/r13/fuel2
. ranksum mpg, by(treat)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
treat obs rank sum expected
untreated 12 128 150
treated 12 172 150
combined 24 300 300
unadjusted variance 300.00
adjustment for ties -4.04
adjusted variance 295.96
Ho: mpg(treat==untreated) = mpg(treat==treated)
z = -1.279
Prob > |z| = 0.2010
These results indicate that the medians are not statistically different at any level smaller than 20.1%.
Similarly, the median test,
. median mpg, by(treat) exact
Median test
Greater whether car received
than the fuel additive
median untreated treated Total
no 7 5 12
yes 5 7 12
Total 12 12 24
Pearson chi2(1) = 0.6667 Pr = 0.414
Fisher’s exact = 0.684
1-sided Fisher’s exact = 0.342
Continuity corrected:
Pearson chi2(1) = 0.1667 Pr = 0.683
fails to reject the null hypothesis that there is no difference between the fuel with the additive and
the fuel without the additive.
Compare the results from these two tests with those obtained from signrank and signtest,
where we found significant differences; see [R] signrank. An experiment run on 24 different cars is
not as powerful as a before-and-after comparison using the same 12 cars.
Stored results
ranksum stores the following in r():
Scalars
r(N_1) sample size n1
r(N_2) sample size n2
r(z) z statistic
r(Var_a) adjusted variance
r(group1) value of variable for first group
r(sum_obs) actual sum of ranks for first group
r(sum_exp) expected sum of ranks for first group
r(porder) probability that draw from first population is larger than draw from second population
median stores the following in r():
Scalars
r(N) sample size
r(chi2) Pearson's χ2
r(p) significance of Pearson's χ2
r(p_exact) Fisher's exact p
r(groups) number of groups compared
r(chi2_cc) continuity-corrected Pearson's χ2
r(p_cc) continuity-corrected significance
r(p1_exact) one-sided Fisher's exact p
Methods and formulas
For a practical introduction to these techniques with an emphasis on examples rather than theory,
see Acock (2014), Bland (2000), or Sprent and Smeeton (2007). For a summary of these tests, see
Snedecor and Cochran (1989).
Methods and formulas are presented under the following headings:
ranksum
median
ranksum
For the Wilcoxon rank-sum test, there are two independent random variables, $X_1$ and $X_2$, and we
test the null hypothesis that $X_1 \sim X_2$. We have a sample of size $n_1$ from $X_1$ and another of size
$n_2$ from $X_2$.

The data are then ranked without regard to the sample to which they belong. If the data are tied,
averaged ranks are used. Wilcoxon's test statistic (1945) is the sum of the ranks for the observations
in the first sample:

$$T = \sum_{i=1}^{n_1} R_{1i}$$

Mann and Whitney's $U$ statistic (1947) is the number of pairs $(X_{1i}, X_{2j})$ such that $X_{1i} > X_{2j}$.
These statistics differ only by a constant:

$$U = T - \frac{n_1(n_1+1)}{2}$$

Again Fisher's principle of randomization provides a method for calculating the distribution of
the test statistic, ties or not. The randomization distribution consists of the $\binom{n}{n_1}$ ways to choose $n_1$
ranks from the set of all $n = n_1 + n_2$ ranks and assign them to the first sample.

It is a straightforward exercise to verify that

$$E(T) = \frac{n_1(n+1)}{2} \qquad \text{and} \qquad \operatorname{Var}(T) = \frac{n_1 n_2 s^2}{n}$$

where $s$ is the standard deviation of the combined ranks, $r_i$, for both groups:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (r_i - \bar{r})^2$$

This formula for the variance is exact and holds both when there are no ties and when there are
ties and we use averaged ranks. (Indeed, the variance formula holds for the randomization distribution
of choosing $n_1$ numbers from any set of $n$ numbers.)

Using a normal approximation, we calculate

$$z = \frac{T - E(T)}{\sqrt{\operatorname{Var}(T)}}$$

When the porder option is specified, the probability

$$p = \frac{U}{n_1 n_2}$$

is computed.
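Using the stored results documented above, these quantities can be reproduced by hand. The following sketch uses the fuel2 data from example 1 and should match the reported z statistic; it is illustrative only and not part of the formal documentation:
. use http://www.stata-press.com/data/r13/fuel2, clear
. quietly ranksum mpg, by(treat) porder
. display (r(sum_obs) - r(sum_exp))/sqrt(r(Var_a))
. display r(porder)
The first display recomputes z = {T − E(T)}/sqrt(Var(T)); the second shows the porder probability U/(n1 n2).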
Technical note
We follow the great majority of the literature in naming these tests for Wilcoxon, Mann, and
Whitney. However, they were independently developed by several other researchers in the late 1940s
and early 1950s. In addition to Wilcoxon, Mann, and Whitney, credit is due to Festinger (1946),
Whitfield (1947), Haldane and Smith (1947), and Van der Reyden (1952). Leon Festinger (1919–1989),
John Burdon Sanderson Haldane (1892–1964), and Cedric Austen Bardell Smith (1917–2002) are
well known for other work, but little seems to be known about Whitfield or van der Reyden. For a
detailed study, including information on these researchers, see Berry, Mielke, and Johnston (2012).
median
The median test examines whether it is likely that two or more samples came from populations
with the same median. The null hypothesis is that the samples were drawn from populations with
the same median. The alternative hypothesis is that at least one sample was drawn from a population
with a different median. The test should be used only with ordinal or interval data.
Assume that there are score values for kindependent samples to be compared. The median test
is performed by first computing the median score for all observations combined, regardless of the
sample group. Each score is compared with this computed grand median and is classified as being
above the grand median, below the grand median, or equal to the grand median. Observations with
scores equal to the grand median can be dropped, added to the “above” group, added to the “below”
group, or split between the two groups.
Once all observations are classified, the data are cast into a 2 × k contingency table, and a Pearson's
chi-squared test or Fisher's exact test is performed.
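Because the default medianties(below) rule assigns observations equal to the grand median to the below group, the resulting 2 × k table can be reproduced with tabulate. The sketch below uses the fuel2 data from example 1 and assumes the default handling of ties; it should reproduce the uncorrected Pearson chi-squared and Fisher's exact results shown earlier (the continuity-corrected statistic is not part of tabulate's output):
. use http://www.stata-press.com/data/r13/fuel2, clear
. quietly summarize mpg, detail
. generate byte above = mpg > r(p50)
. tabulate above treat, chi2 exact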
 
Henry Berthold Mann (1905–2000) was born in Vienna, Austria, where he completed a doctorate
in algebraic number theory. He moved to the United States in 1938 and for several years made
his livelihood by tutoring in New York. During this time, he proved a celebrated conjecture in
number theory and studied statistics at Columbia with Abraham Wald, with whom he wrote three
papers. After the war, he taught at Ohio State and the Universities of Wisconsin and Arizona.
In addition to his work in number theory and statistics, he made major contributions to algebra
and combinatorics.
Donald Ransom Whitney (1915–2007) studied at Oberlin, Princeton, and Ohio State Universities
and worked at the latter throughout his career. His PhD thesis under Henry Mann was on
nonparametric statistics. It was this work that produced the test that bears their names.
 
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Berry, K. J., P. W. Mielke, Jr., and J. E. Johnston. 2012. The two-sample rank-sum test: Early development. Electronic
Journal for History of Probability and Statistics 8: 1–26.
http://www.jehps.net/decembre2012/BerryMielkeJohnston.pdf.
Bland, M. 2000. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press.
Conroy, R. M. 2012. What hypotheses do nonparametric two-group tests actually test? Stata Journal 12: 182–190.
Feiveson, A. H. 2002. Power by simulation. Stata Journal 2: 107–124.
Festinger, L. 1946. The significance of difference between means without reference to the frequency distribution
function. Psychometrika 11: 97–105.
Fisher, R. A. 1935. The Design of Experiments. Edinburgh: Oliver & Boyd.
Goldstein, R. 1997. sg69: Immediate Mann–Whitney and binomial effect-size display. Stata Technical Bulletin 36:
29–31. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 187–189. College Station, TX: Stata Press.
Haldane, J. B. S., and C. A. B. Smith. 1947. A simple exact test for birth-order effect. Annals of Human Genetics
14: 117–124.
Harris, T., and J. W. Hardin. 2013. Exact Wilcoxon signed-rank and Wilcoxon Mann–Whitney ranksum tests. Stata
Journal 13: 337–343.
Kruskal, W. H. 1957. Historical notes on the Wilcoxon unpaired two-sample test. Journal of the American Statistical
Association 52: 356–360.
Mann, H. B., and D. R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger
than the other. Annals of Mathematical Statistics 18: 50–60.
Newson, R. B. 2000a. snp15: somersd—Confidence intervals for nonparametric statistics and their differences. Stata
Technical Bulletin 55: 47–55. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 312–322. College Station,
TX: Stata Press.
. 2000b. snp15.1: Update to somersd. Stata Technical Bulletin 57: 35. Reprinted in Stata Technical Bulletin
Reprints, vol. 10, pp. 322–323. College Station, TX: Stata Press.
. 2000c. snp15.2: Update to somersd. Stata Technical Bulletin 58: 30. Reprinted in Stata Technical Bulletin
Reprints, vol. 10, p. 323. College Station, TX: Stata Press.
. 2001. snp15.3: Update to somersd. Stata Technical Bulletin 61: 22. Reprinted in Stata Technical Bulletin
Reprints, vol. 10, p. 324. College Station, TX: Stata Press.
. 2003. snp15_4: Software update for somersd. Stata Journal 3: 325.
. 2005. snp15_5: Software update for somersd. Stata Journal 5: 470.
Perkins, A. M. 1998. snp14: A two-sample multivariate nonparametric test. Stata Technical Bulletin 42: 47–49.
Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 243–245. College Station, TX: Stata Press.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Sprent, P., and N. C. Smeeton. 2007. Applied Nonparametric Statistical Methods. 4th ed. Boca Raton, FL: Chapman
& Hall/CRC.
Sribney, W. M. 1995. crc40: Correcting for ties and zeros in sign and rank tests. Stata Technical Bulletin 26: 2–4.
Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 5–8. College Station, TX: Stata Press.
Van der Reyden, D. 1952. A simple statistical significance test. Rhodesia Agricultural Journal 49: 96–104.
Whitfield, J. W. 1947. Rank correlation between two variables, one of which is ranked, the other dichotomous.
Biometrika 34: 292–296.
Wilcoxon, F. 1945. Individual comparisons by ranking methods. Biometrics 1: 80–83.
Also see
[R] signrank Equality tests on matched data
[R] ttest t tests (mean-comparison tests)
Title
ratio — Estimate ratios
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Basic syntax
ratio [name:] varname / varname
Full syntax
ratio ([name:] varname / varname)
      [([name:] varname / varname) ...] [if] [in] [weight] [, options]
options Description
Model
stdize(varname)variable identifying strata for standardization
stdweight(varname)weight variable for standardization
nostdrescale do not rescale the standard weight variable
if/in/over
over(varlist[, nolabel]) group over subpopulations defined by varlist; optionally, suppress group labels
SE/Cluster
vce(vcetype)vcetype may be linearized,cluster clustvar,bootstrap, or
jackknife
Reporting
level(#)set confidence level; default is level(95)
noheader suppress table header
nolegend suppress table legend
display options control column formats and line width
coeflegend display legend instead of statistics
bootstrap,jackknife,mi estimate,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Ratios
Description
ratio produces estimates of ratios, along with standard errors.
Options
 
Model
stdize(varname)specifies that the point estimates be adjusted by direct standardization across the
strata identified by varname. This option requires the stdweight() option.
stdweight(varname)specifies the weight variable associated with the standard strata identified in
the stdize() option. The standardization weights must be constant within the standard strata.
nostdrescale prevents the standardization weights from being rescaled within the over() groups.
This option requires stdize() but is ignored if the over() option is not specified.
 
if/in/over
over(varlist , nolabel )specifies that estimates be computed for multiple subpopulations, which
are identified by the different values of the variables in varlist.
When this option is supplied with one variable name, such as over(varname), the value labels of
varname are used to identify the subpopulations. If varname does not have labeled values (or there
are unlabeled values), the values themselves are used, provided that they are nonnegative integers.
Noninteger values, negative values, and labels that are not valid Stata names are substituted with
a default identifier.
When over() is supplied with multiple variable names, each subpopulation is assigned a unique
default identifier.
nolabel requests that value labels attached to the variables identifying the subpopulations be
ignored.
 
SE/Cluster
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (linearized), that allow for intragroup correlation (cluster clustvar), and
that use bootstrap or jackknife methods (bootstrap,jackknife); see [R]vce option.
vce(linearized), the default, uses the linearized or sandwich estimator of variance.
 
Reporting
level(#); see [R]estimation options.
noheader prevents the table header from being displayed. This option implies nolegend.
nolegend prevents the table legend identifying the subpopulations from being displayed.
display options:cformat(%fmt)and nolstretch; see [R]estimation options.
The following option is available with ratio but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Example 1
Using the fuel data from example 3 of [R]ttest, we estimate the ratio of mileage for the cars
without the fuel treatment (mpg1) to those with the fuel treatment (mpg2).
. use http://www.stata-press.com/data/r13/fuel
. ratio myratio: mpg1/mpg2
Ratio estimation Number of obs = 12
myratio: mpg1/mpg2
Linearized
Ratio Std. Err. [95% Conf. Interval]
myratio .9230769 .032493 .8515603 .9945936
Using these results, we can test to see if this ratio is significantly different from one.
. test _b[myratio] = 1
( 1) myratio = 1
F( 1, 11) = 5.60
Prob > F = 0.0373
We find that the ratio is different from one at the 5% significance level but not at the 1% significance
level.
Example 2
Using state-level census data, we want to test whether the marriage rate is equal to the death rate.
. use http://www.stata-press.com/data/r13/census2
(1980 Census data by state)
. ratio (deathrate: death/pop) (marrate: marriage/pop)
Ratio estimation Number of obs = 50
deathrate: death/pop
marrate: marriage/pop
Linearized
Ratio Std. Err. [95% Conf. Interval]
deathrate .0087368 .0002052 .0083244 .0091492
marrate .0105577 .0006184 .009315 .0118005
. test _b[deathrate] = _b[marrate]
( 1) deathrate - marrate = 0
F( 1, 49) = 6.93
Prob > F = 0.0113
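The over() option extends such comparisons to subpopulations. As a sketch (it assumes that census2 includes a region variable identifying the census regions, as the census datasets used elsewhere in the manual do), the two rates could be estimated separately for each region:
. ratio (deathrate: death/pop) (marrate: marriage/pop), over(region)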
Stored results
ratio stores the following in e():
Scalars
e(N) number of observations
e(N_over) number of subpopulations
e(N_stdize) number of standard strata
e(N_clust) number of clusters
e(k_eq) number of equations in e(b)
e(df_r) sample degrees of freedom
e(rank) rank of e(V)
Macros
e(cmd) ratio
e(cmdline) command as typed
e(varlist) varlist
e(stdize) varname from stdize()
e(stdweight) varname from stdweight()
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(cluster) name of cluster variable
e(over) varlist from over()
e(over_labels) labels from over() variables
e(over_namelist) names from e(over_labels)
e(namelist) ratio identifiers
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(estat_cmd) program used to implement estat
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) vector of ratio estimates
e(V) (co)variance estimates
e(_N) vector of numbers of nonmissing observations
e(_N_stdsum) number of nonmissing observations within the standard strata
e(_p_stdize) standardizing proportions
e(error) error code corresponding to e(b)
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
The ratio estimator
Survey data
The survey ratio estimator
The standardized ratio estimator
The poststratified ratio estimator
The standardized poststratified ratio estimator
Subpopulation estimation
The ratio estimator
Let $R = Y/X$ be the ratio to be estimated, where $Y$ and $X$ are totals; see [R] total. The estimate
for $R$ is $\hat{R} = \hat{Y}/\hat{X}$ (the ratio of the sample totals). From the delta method (that is, a first-order
Taylor expansion), the approximate variance of the sampling distribution of the linearized $\hat{R}$ is

$$V(\hat{R}) \approx \frac{1}{X^2} \left\{ V(\hat{Y}) - 2R\operatorname{Cov}(\hat{Y},\hat{X}) + R^2 V(\hat{X}) \right\}$$

Direct substitution of $\hat{X}$, $\hat{R}$, and the estimated variances and covariance of $\hat{X}$ and $\hat{Y}$ leads to the
following variance estimator:

$$\hat{V}(\hat{R}) = \frac{1}{\hat{X}^2} \left\{ \hat{V}(\hat{Y}) - 2\hat{R}\,\widehat{\operatorname{Cov}}(\hat{Y},\hat{X}) + \hat{R}^2 \hat{V}(\hat{X}) \right\} \tag{1}$$
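Formula (1) can be checked by hand for the simple setting of example 1: estimate the two totals with total, plug their estimated variances and covariance into (1), and compare the result with the linearized standard error reported by ratio. The following sketch should closely reproduce the point estimate and standard error shown there; it is illustrative only:
. use http://www.stata-press.com/data/r13/fuel, clear
. quietly total mpg1 mpg2
. matrix V = e(V)
. scalar R = _b[mpg1]/_b[mpg2]
. scalar seR = sqrt((V[1,1] - 2*R*V[1,2] + R^2*V[2,2])/_b[mpg2]^2)
. display "ratio = " R "   linearized std. err. = " seR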
Survey data
See [SVY] variance estimation, [SVY] direct standardization, and [SVY] poststratification for
discussions that provide background information for the following formulas.
The survey ratio estimator
Let $Y_j$ and $X_j$ be survey items for the $j$th individual in the population, where $j = 1, \dots, M$ and
$M$ is the size of the population. The associated population ratio for the items of interest is $R = Y/X$,
where

$$Y = \sum_{j=1}^{M} Y_j \qquad \text{and} \qquad X = \sum_{j=1}^{M} X_j$$

Let $y_j$ and $x_j$ be the corresponding survey items for the $j$th sampled individual from the population,
where $j = 1, \dots, m$ and $m$ is the number of observations in the sample.

The estimator $\hat{R}$ for the population ratio $R$ is $\hat{R} = \hat{Y}/\hat{X}$, where

$$\hat{Y} = \sum_{j=1}^{m} w_j y_j \qquad \text{and} \qquad \hat{X} = \sum_{j=1}^{m} w_j x_j$$

and $w_j$ is a sampling weight. The score variable for the ratio estimator is

$$z_j(\hat{R}) = \frac{y_j - \hat{R} x_j}{\hat{X}} = \frac{\hat{X} y_j - \hat{Y} x_j}{\hat{X}^2}$$
The standardized ratio estimator
Let $D_g$ denote the set of sampled observations that belong to the $g$th standard stratum and define
$I_{D_g}(j)$ to indicate whether the $j$th observation is a member of the $g$th standard stratum, where $g = 1,
\dots, L_D$ and $L_D$ is the number of standard strata. Also, let $\pi_g$ denote the fraction of the population
that belongs to the $g$th standard stratum, so that $\pi_1 + \cdots + \pi_{L_D} = 1$. Note that $\pi_g$ is derived from the
stdweight() option.

The estimator for the standardized ratio is

$$\hat{R}^{D} = \sum_{g=1}^{L_D} \pi_g \frac{\hat{Y}_g}{\hat{X}_g}$$

where

$$\hat{Y}_g = \sum_{j=1}^{m} I_{D_g}(j)\, w_j y_j$$

and $\hat{X}_g$ is similarly defined. The score variable for the standardized ratio is

$$z_j(\hat{R}^{D}) = \sum_{g=1}^{L_D} \pi_g I_{D_g}(j) \frac{\hat{X}_g y_j - \hat{Y}_g x_j}{\hat{X}_g^2}$$
The poststratified ratio estimator
Let $P_k$ denote the set of sampled observations that belong to poststratum $k$, and define $I_{P_k}(j)$
to indicate whether the $j$th observation is a member of poststratum $k$, where $k = 1, \dots, L_P$ and $L_P$ is
the number of poststrata. Also, let $M_k$ denote the population size for poststratum $k$. $P_k$ and $M_k$ are
identified by specifying the poststrata() and postweight() options on svyset; see [SVY] svyset.

The estimator for the poststratified ratio is

$$\hat{R}^{P} = \frac{\hat{Y}^{P}}{\hat{X}^{P}}$$

where

$$\hat{Y}^{P} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \hat{Y}_k = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \sum_{j=1}^{m} I_{P_k}(j)\, w_j y_j$$

and $\hat{X}^{P}$ is similarly defined. The score variable for the poststratified ratio is

$$z_j(\hat{R}^{P}) = \frac{z_j(\hat{Y}^{P}) - \hat{R}^{P} z_j(\hat{X}^{P})}{\hat{X}^{P}} = \frac{\hat{X}^{P} z_j(\hat{Y}^{P}) - \hat{Y}^{P} z_j(\hat{X}^{P})}{(\hat{X}^{P})^2}$$

where

$$z_j(\hat{Y}^{P}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\hat{M}_k} \left( y_j - \frac{\hat{Y}_k}{\hat{M}_k} \right)$$

and $z_j(\hat{X}^{P})$ is similarly defined.
The standardized poststratified ratio estimator
The estimator for the standardized poststratified ratio is

$$\hat{R}^{DP} = \sum_{g=1}^{L_D} \pi_g \frac{\hat{Y}_g^{P}}{\hat{X}_g^{P}}$$

where

$$\hat{Y}_g^{P} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \hat{Y}_{g,k} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j)\, w_j y_j$$

and $\hat{X}_g^{P}$ is similarly defined. The score variable for the standardized poststratified ratio is

$$z_j(\hat{R}^{DP}) = \sum_{g=1}^{L_D} \pi_g \frac{\hat{X}_g^{P} z_j(\hat{Y}_g^{P}) - \hat{Y}_g^{P} z_j(\hat{X}_g^{P})}{(\hat{X}_g^{P})^2}$$

where

$$z_j(\hat{Y}_g^{P}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\hat{M}_k} \left\{ I_{D_g}(j)\, y_j - \frac{\hat{Y}_{g,k}}{\hat{M}_k} \right\}$$

and $z_j(\hat{X}_g^{P})$ is similarly defined.
Subpopulation estimation
Let $S$ denote the set of sampled observations that belong to the subpopulation of interest, and
define $I_S(j)$ to indicate whether the $j$th observation falls within the subpopulation.

The estimator for the subpopulation ratio is $\hat{R}^{S} = \hat{Y}^{S}/\hat{X}^{S}$, where

$$\hat{Y}^{S} = \sum_{j=1}^{m} I_S(j)\, w_j y_j \qquad \text{and} \qquad \hat{X}^{S} = \sum_{j=1}^{m} I_S(j)\, w_j x_j$$

Its score variable is

$$z_j(\hat{R}^{S}) = I_S(j) \frac{y_j - \hat{R}^{S} x_j}{\hat{X}^{S}} = I_S(j) \frac{\hat{X}^{S} y_j - \hat{Y}^{S} x_j}{(\hat{X}^{S})^2}$$

The estimator for the standardized subpopulation ratio is

$$\hat{R}^{DS} = \sum_{g=1}^{L_D} \pi_g \frac{\hat{Y}_g^{S}}{\hat{X}_g^{S}}$$

where

$$\hat{Y}_g^{S} = \sum_{j=1}^{m} I_{D_g}(j) I_S(j)\, w_j y_j$$

and $\hat{X}_g^{S}$ is similarly defined. Its score variable is

$$z_j(\hat{R}^{DS}) = \sum_{g=1}^{L_D} \pi_g I_{D_g}(j) I_S(j) \frac{\hat{X}_g^{S} y_j - \hat{Y}_g^{S} x_j}{(\hat{X}_g^{S})^2}$$

The estimator for the poststratified subpopulation ratio is

$$\hat{R}^{PS} = \frac{\hat{Y}^{PS}}{\hat{X}^{PS}}$$

where

$$\hat{Y}^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \hat{Y}_k^{S} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \sum_{j=1}^{m} I_{P_k}(j) I_S(j)\, w_j y_j$$

and $\hat{X}^{PS}$ is similarly defined. Its score variable is

$$z_j(\hat{R}^{PS}) = \frac{\hat{X}^{PS} z_j(\hat{Y}^{PS}) - \hat{Y}^{PS} z_j(\hat{X}^{PS})}{(\hat{X}^{PS})^2}$$

where

$$z_j(\hat{Y}^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\hat{M}_k} \left\{ I_S(j)\, y_j - \frac{\hat{Y}_k^{S}}{\hat{M}_k} \right\}$$

and $z_j(\hat{X}^{PS})$ is similarly defined.

The estimator for the standardized poststratified subpopulation ratio is

$$\hat{R}^{DPS} = \sum_{g=1}^{L_D} \pi_g \frac{\hat{Y}_g^{PS}}{\hat{X}_g^{PS}}$$

where

$$\hat{Y}_g^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \hat{Y}_{g,k}^{S} = \sum_{k=1}^{L_P} \frac{M_k}{\hat{M}_k} \sum_{j=1}^{m} I_{D_g}(j) I_{P_k}(j) I_S(j)\, w_j y_j$$

and $\hat{X}_g^{PS}$ is similarly defined. Its score variable is

$$z_j(\hat{R}^{DPS}) = \sum_{g=1}^{L_D} \pi_g \frac{\hat{X}_g^{PS} z_j(\hat{Y}_g^{PS}) - \hat{Y}_g^{PS} z_j(\hat{X}_g^{PS})}{(\hat{X}_g^{PS})^2}$$

where

$$z_j(\hat{Y}_g^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j) \frac{M_k}{\hat{M}_k} \left\{ I_{D_g}(j) I_S(j)\, y_j - \frac{\hat{Y}_{g,k}^{S}}{\hat{M}_k} \right\}$$

and $z_j(\hat{X}_g^{PS})$ is similarly defined.
References
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Also see
[R] ratio postestimation Postestimation tools for ratio
[R] mean Estimate means
[R] proportion Estimate proportions
[R] total Estimate totals
[MI] estimation Estimation commands for use with mi estimate
[SVY] direct standardization Direct standardization of means, proportions, and ratios
[SVY] poststratification Poststratification for survey data
[SVY] subpopulation estimation Subpopulation estimation for survey data
[SVY] svy estimation Estimation commands for survey data
[SVY] variance estimation Variance estimation for survey data
[U] 20 Estimation and postestimation commands
Title
ratio postestimation — Postestimation tools for ratio
Description Remarks and examples Also see
Description
The following postestimation commands are available after ratio:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
For examples of the use of test after ratio, see [R] ratio.
Also see
[R] ratio Estimate ratios
[U] 20 Estimation and postestimation commands
Title
reg3 — Three-stage estimation for systems of simultaneous equations
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Basic syntax
reg3 (depvar_1 varlist_1) (depvar_2 varlist_2) ... (depvar_N varlist_N) [if] [in] [weight]
Full syntax
reg3 ([eqname_1:] depvar_1a [depvar_1b ...] = varlist_1 [, noconstant])
     ([eqname_2:] depvar_2a [depvar_2b ...] = varlist_2 [, noconstant])
     ...
     ([eqname_N:] depvar_Na [depvar_Nb ...] = varlist_N [, noconstant])
     [if] [in] [weight] [, options]
options Description
Model
ireg3 iterate until estimates converge
constraints(constraints)apply specified linear constraints
Model 2
exog(varlist)exogenous variables not specified in system equations
endog(varlist)additional right-hand-side endogenous variables
inst(varlist)full list of exogenous variables
allexog all right-hand-side variables are exogenous
noconstant suppress constant from instrument list
Est. method
3sls three-stage least squares; the default
2sls two-stage least squares
ols ordinary least squares (OLS)
sure seemingly unrelated regression estimation (SURE)
mvreg sure with OLS degrees-of-freedom adjustment
corr(correlation) unstructured or independent correlation structure; default is
unstructured
df adj.
small report small-sample statistics
dfk use small-sample adjustment
dfk2 use alternate adjustment
Reporting
level(#)set confidence level; default is level(95)
first report first-stage regression
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Optimization
optimization options control the optimization process; seldom used
noheader suppress display of header
notable suppress display of coefficient table
nofooter suppress display of footer
coeflegend display legend instead of statistics
varlist_1, ..., varlist_N and the exog() and the inst() varlist may contain factor variables; see [U] 11.4.3 Factor
variables. You must have the same levels of factor variables in all equations that have factor variables.
depvar and varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights and fweights are allowed; see [U] 11.1.6 weight.
noheader,notable,nofooter, and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Explicit equation naming (eqname:) cannot be combined with multiple dependent variables in an
equation specification.
Menu
Statistics >Endogenous covariates >Three-stage least squares
Description
reg3 estimates a system of structural equations, where some equations contain endogenous variables
among the explanatory variables. Estimation is via three-stage least squares (3SLS); see Zellner and
Theil (1962). Typically, the endogenous explanatory variables are dependent variables from other
equations in the system. reg3 supports iterated GLS estimation and linear constraints.
reg3 can also estimate systems of equations by seemingly unrelated regression estimation (SURE),
multivariate regression (MVREG), and equation-by-equation ordinary least squares (OLS) or two-stage
least squares (2SLS).
Nomenclature
Under 3SLS or 2SLS estimation, a structural equation is defined as one of the equations specified
in the system. A dependent variable will have its usual interpretation as the left-hand-side variable
in an equation with an associated disturbance term. All dependent variables are explicitly taken to
be endogenous to the system and are treated as correlated with the disturbances in the system’s
equations. Unless specified in an endog() option, all other variables in the system are treated as
exogenous to the system and uncorrelated with the disturbances. The exogenous variables are taken
to be instruments for the endogenous variables.
Options
 
Model
ireg3 causes reg3 to iterate over the estimated disturbance covariance matrix and parameter estimates
until the parameter estimates converge. Although the iteration is usually successful, there is no
guarantee that it will converge to a stable point. Under SURE, this iteration converges to the
maximum likelihood estimates.
constraints(constraints); see [R]estimation options.
 
Model 2
exog(varlist)specifies additional exogenous variables that are included in none of the system equations.
This can occur when the system contains identities that are not estimated. If implicitly exogenous
variables from the equations are listed here, reg3 will just ignore the additional information.
Specified variables will be added to the exogenous variables in the system and used in the first
stage as instruments for the endogenous variables. By specifying dependent variables from the
structural equations, you can use exog() to override their endogeneity.
endog(varlist)identifies variables in the system that are not dependent variables but are endogenous
to the system. These variables must appear in the variable list of at least one equation in the
system. Again the need for this identification often occurs when the system contains identities.
For example, a variable that is the sum of an exogenous variable and a dependent variable may
appear as an explanatory variable in some equations.
inst(varlist)specifies a full list of all exogenous variables and may not be used with the endog() or
exog() options. It must contain a full list of variables to be used as instruments for the endogenous
regressors. Like exog(), the list may contain variables not specified in the system of equations.
This option can be used to achieve the same results as the endog() and exog() options, and the
choice is a matter of convenience. Any variable not specified in the varlist of the inst() option
is assumed to be endogenous to the system. As with exog(), including the dependent variables
from the structural equations will override their endogeneity.
allexog indicates that all right-hand-side variables are to be treated as exogenous, even if they
appear as the dependent variable of another equation in the system. This option can be used to
enforce a SURE or MVREG estimation even when some dependent variables appear as regressors.
noconstant; see [R]estimation options.
 
Est. method
3sls specifies the full 3SLS estimation of the system and is the default for reg3.
2sls causes reg3 to perform equation-by-equation 2SLS on the full system of equations. This option
implies dfk,small, and corr(independent).
Cross-equation testing should not be performed after estimation with this option. With 2sls, no
covariance is estimated between the parameters of the equations. For cross-equation testing, use
3sls.
ols causes reg3 to perform equation-by-equation OLS on the system, even if dependent variables
appear as regressors or the regressors differ for each equation; see [MV] mvreg. ols implies
allexog, dfk, small, and corr(independent); nodfk and nosmall may be specified to
override dfk and small.
The covariance of the coefficients between equations is not estimated under this option, and
cross-equation tests should not be performed after estimation with ols. For cross-equation testing,
use sure or 3sls (the default).
sure causes reg3 to perform a SURE of the system, even if dependent variables from some equations
appear as regressors in other equations; see [R] sureg. sure is a synonym for allexog.
mvreg is identical to sure, except that the disturbance covariance matrix is estimated with an OLS
degrees-of-freedom adjustment (the dfk option). If the regressors are identical for all equations,
the parameter point estimates will be the standard MVREG results. If any of the regressors differ,
the point estimates are those for SURE with an OLS degrees-of-freedom adjustment in computing
the covariance matrix. nodfk and nosmall may be specified to override dfk and small.
corr(correlation)specifies the assumed form of the correlation structure of the equation disturbances
and is rarely requested explicitly. For the family of models fit by reg3, the only two allowable
correlation structures are unstructured and independent. The default is unstructured.
This option is used almost exclusively to estimate a system of equations by 2SLS or to perform OLS
regression with reg3 on multiple equations. In these cases, the correlation is set to independent,
forcing reg3 to treat the covariance matrix of equation disturbances as diagonal in estimating model
parameters. Thus a set of two-stage coefficient estimates can be obtained if the system contains
endogenous right-hand-side variables, or OLS regression can be imposed, even if the regressors
differ across equations. Without imposing independent disturbances, reg3 would estimate the
former by 3SLS and the latter by SURE.
Any tests performed after estimation with the independent option will treat coefficients in
different equations as having no covariance; cross-equation tests should not be used after specifying
corr(independent).
 
df adj.
small specifies that small-sample statistics be computed. It shifts the test statistics from χ2 and
z statistics to F statistics and t statistics. This option is intended primarily to support MVREG.
Although the standard errors from each equation are computed using the degrees of freedom for
the equation, the degrees of freedom for the t statistics are all taken to be those for the first
equation. This approach poses no problem under MVREG because the regressors are the same
across equations.
dfk specifies the use of an alternative divisor in computing the covariance matrix for the equation
residuals. As an asymptotically justified estimator, reg3 by default uses the number of sample
observations n as a divisor. When the dfk option is set, a small-sample adjustment is made, and
the divisor is taken to be $\sqrt{(n-k_i)(n-k_j)}$, where $k_i$ and $k_j$ are the numbers of parameters in
equations i and j, respectively.
dfk2 specifies the use of an alternative divisor in computing the covariance matrix for the equation
errors. When the dfk2 option is set, the divisor is taken to be the mean of the residual degrees
of freedom from the individual equations.
 
Reporting
level(#); see [R]estimation options.
first requests that the first-stage regression results be displayed during estimation.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Optimization
optimization options control the iterative process that minimizes the sum of squared errors when
ireg3 is specified. These options are seldom used.
iterate(#)specifies the maximum number of iterations. When the number of iterations equals #,
the optimizer stops and presents the current results, even if the convergence tolerance has not been
reached. The default value of iterate() is the current value of set maxiter (see [R]maximize),
which is iterate(16000) if maxiter has not been changed.
trace adds to the iteration log a display of the current parameter vector.
nolog suppresses the display of the iteration log.
tolerance(#)specifies the tolerance for the coefficient vector. When the relative change in the
coefficient vector from one iteration to the next is less than or equal to #, the optimization process
is stopped. tolerance(1e-6) is the default.
The following options are available with reg3 but are not shown in the dialog box:
noheader suppresses display of the header reporting the estimation method and the table of equation
summary statistics.
notable suppresses display of the coefficient table.
nofooter suppresses display of the footer reporting the list of endogenous and exogenous variables
in the model.
coeflegend; see [R]estimation options.
Remarks and examples
reg3 estimates systems of structural equations where some equations contain endogenous variables
among the explanatory variables. Generally, these endogenous variables are the dependent variables of
other equations in the system, though not always. The disturbance is correlated with the endogenous
variables, violating the assumptions of OLS. Further, because some of the explanatory variables are the
dependent variables of other equations in the system, the error terms among the equations are expected
to be correlated. reg3 uses an instrumental-variables approach to produce consistent estimates and
generalized least squares (GLS) to account for the correlation structure in the disturbances across the
equations. Good general references on three-stage estimation include Davidson and MacKinnon (1993,
651–661) and Greene (2012, 331–334).
Three-stage least squares can be thought of as producing estimates from a three-step process.
Step 1. Develop instrumented values for all endogenous variables. These instrumented values can
simply be considered as the predicted values resulting from a regression of each endogenous
variable on all exogenous variables in the system. This stage is identical to the first step in 2SLS
and is critical for the consistency of the parameter estimates.
Step 2. Obtain a consistent estimate for the covariance matrix of the equation disturbances. These
estimates are based on the residuals from a 2SLS estimation of each structural equation.
Step 3. Perform a GLS-type estimation using the covariance matrix estimated in the second stage and
with the instrumented values in place of the right-hand-side endogenous variables.
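The first two steps can be made explicit. With the 2sls option, reg3 stops after equation-by-equation 2SLS, and for any single equation that is what ivregress 2sls computes when given the system's full set of exogenous variables as instruments. The following sketch, using the Klein data from example 1 below, is one way to see the correspondence; point estimates for the consump equation should coincide, although the reported standard errors may differ because 2sls implies the small option:
. use http://www.stata-press.com/data/r13/klein, clear
. reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1), 2sls
. ivregress 2sls consump (wagepriv = govt capital1) wagegovt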
Technical note
The estimation and use of the covariance matrix of disturbances in three-stage estimation is almost
identical to the SURE method (sureg). As with SURE, using this covariance matrix improves the
efficiency of the three-stage estimator. Even without the covariance matrix, the estimates would be
consistent. (They would be 2SLS estimates.) This improvement in efficiency comes with a caveat. All
the parameter estimates now depend on the consistency of the covariance matrix estimates. If one
equation in the system is misspecified, the disturbance covariance estimates will be inconsistent, and
the resulting coefficients will be biased and inconsistent. Alternatively, if each equation is estimated
separately by 2SLS ([R] ivregress), only the coefficients in the misspecified equation are affected.
Technical note
If an equation is just identified, the 3SLS point estimates for that equation are identical to the 2SLS
estimates. However, as with sureg, even if all equations are just identified, fitting the model via
reg3 has at least one advantage over fitting each equation separately via ivregress; by using reg3,
tests involving coefficients in different equations can be performed easily using test or testnl.
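As a sketch of such a cross-equation test, after fitting the Klein model of example 1 below, test can compare the private-wage coefficient in the consumption equation with the consumption coefficient in the wage equation:
. use http://www.stata-press.com/data/r13/klein, clear
. quietly reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)
. test [consump]wagepriv = [wagepriv]consump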
Example 1
A simple macroeconomic model relates consumption (consump) to private and government wages
paid (wagepriv and wagegovt). Simultaneously, private wages depend on consumption, total
government expenditures (govt), and the lagged stock of capital in the economy (capital1). Although
this is not a plausible model, it does meet the criterion of being simple. This model could be written
as
$$\text{consump} = \beta_0 + \beta_1\,\text{wagepriv} + \beta_2\,\text{wagegovt} + \epsilon_1$$
$$\text{wagepriv} = \beta_3 + \beta_4\,\text{consump} + \beta_5\,\text{govt} + \beta_6\,\text{capital1} + \epsilon_2$$
If we assume that this is the full system, consump and wagepriv will be endogenous variables,
with wagegovt,govt, and capital1 exogenous. Data for the U.S. economy on these variables are
taken from Klein (1950). This model can be fit with reg3 by typing
. use http://www.stata-press.com/data/r13/klein
. reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)
Three-stage least-squares regression
Equation Obs Parms RMSE "R-sq" chi2 P
consump 22 2 1.776297 0.9388 208.02 0.0000
wagepriv 22 3 2.372443 0.8542 80.04 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
consump
wagepriv .8012754 .1279329 6.26 0.000 .5505314 1.052019
wagegovt 1.029531 .3048424 3.38 0.001 .432051 1.627011
_cons 19.3559 3.583772 5.40 0.000 12.33184 26.37996
wagepriv
consump .4026076 .2567312 1.57 0.117 -.1005764 .9057916
govt 1.177792 .5421253 2.17 0.030 .1152461 2.240338
capital1 -.0281145 .0572111 -0.49 0.623 -.1402462 .0840173
_cons 14.63026 10.26693 1.42 0.154 -5.492552 34.75306
Endogenous variables: consump wagepriv
Exogenous variables: wagegovt govt capital1
Without showing the 2SLS results, we note that the consumption function in this system falls under
the conditions noted earlier. That is, the 2SLS and 3SLS coefficients for the equation are identical.
Example 2
Some of the most common simultaneous systems encountered are supply-and-demand models. A
simple system could be specified as
qDemand = β0 + β1 price + β2 pcompete + β3 income + ε1
qSupply = β4 + β5 price + β6 praw + ε2
Equilibrium condition: quantity = qDemand = qSupply
where quantity is the quantity of a product produced and sold,
price is the price of the product,
pcompete is the price of a competing product,
income is the average income level of consumers, and
praw is the price of raw materials used to produce the product.
In this system, price is assumed to be determined simultaneously with demand. The important
statistical implications are that price is not a predetermined variable and that it is correlated with
the disturbances of both equations. The system is somewhat unusual: quantity is associated with
two disturbances. This fact really poses no problem because the disturbances are specified on the
behavioral demand and supply equations, two separate entities. Often one of the two equations is
rewritten to place price on the left-hand side, making this endogeneity explicit in the specification.
To provide a concrete illustration of the effects of simultaneous equations, we can simulate data
for the above system by using known coefficients and disturbance properties. Specifically, we will
simulate the data as
qDemand = 40 − 1.0 price + 0.25 pcompete + 0.5 income + ε1
qSupply = 0.5 price − 0.75 praw + ε2
where
ε1 ~ N(0, 3.8)
ε2 ~ N(0, 2.4)
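The supDem dataset used below was created by StataCorp; the following is only a sketch of one way data with these properties could be generated. It treats 3.8 and 2.4 as variances, draws the exogenous regressors from arbitrary uniform ranges, and obtains the equilibrium price by equating qDemand and qSupply (the seed and ranges are ours, not those used for the shipped dataset).
. clear
. set obs 49
. set seed 12345
. generate pcompete = 10 + 10*runiform()
. generate income = 20 + 20*runiform()
. generate praw = 5 + 10*runiform()
. generate e1 = rnormal(0, sqrt(3.8))
. generate e2 = rnormal(0, sqrt(2.4))
. generate price = (40 + .25*pcompete + .5*income + .75*praw + e1 - e2)/1.5
. generate quantity = .5*price - .75*praw + e2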
For comparison, we can estimate the supply and demand equations separately by OLS. The estimates
for the demand equation are
. use http://www.stata-press.com/data/r13/supDem
. regress quantity price pcompete income
Source SS df MS Number of obs = 49
F( 3, 45) = 1.00
Model 23.1579302 3 7.71931008 Prob > F = 0.4004
Residual 346.459313 45 7.69909584 R-squared = 0.0627
Adj R-squared = 0.0002
Total 369.617243 48 7.70035923 Root MSE = 2.7747
quantity Coef. Std. Err. t P>|t| [95% Conf. Interval]
price .1186265 .1716014 0.69 0.493 -.2269965 .4642496
pcompete .0946416 .1200815 0.79 0.435 -.1472149 .3364981
income .0785339 .1159867 0.68 0.502 -.1550754 .3121432
_cons 7.563261 5.019479 1.51 0.139 -2.54649 17.67301
The OLS estimates for the supply equation are
. regress quantity price praw
Source SS df MS Number of obs = 49
F( 2, 46) = 35.71
Model 224.819549 2 112.409774 Prob > F = 0.0000
Residual 144.797694 46 3.14777596 R-squared = 0.6082
Adj R-squared = 0.5912
Total 369.617243 48 7.70035923 Root MSE = 1.7742
quantity Coef. Std. Err. t P>|t| [95% Conf. Interval]
price .724675 .1095657 6.61 0.000 .5041307 .9452192
praw -.8674796 .1066114 -8.14 0.000 -1.082077 -.652882
_cons -6.97291 3.323105 -2.10 0.041 -13.66197 -.283847
Examining the coefficients from these regressions, we note that they are not close to the known
parameters used to generate the simulated data. In particular, the positive coefficient on price in
the demand equation stands out. We constructed our simulated data to be consistent with economic
theory: people demand less of a product if its price rises and more if their personal income rises.
Although the price coefficient is statistically insignificant, the positive value contrasts starkly with
what is predicted from economic price theory and the −1.0 value that we used in the simulation.
Likewise, we are disappointed with the insignificance and level of the coefficient on average income.
The supply equation has correct signs on the two main parameters, but their levels are different
from the known values. In fact, the coefficient on price (0.724675) is different from the simulated
parameter (0.5) at the 5% level of significance.
All of these problems are to be expected. We explicitly constructed a simultaneous system of
equations that violated one of the assumptions of least squares. Specifically, the disturbances were
correlated with one of the regressors: price.
Two-stage least squares can be used to address the correlation between regressors and disturbances.
Using instruments for the endogenous variable, price, 2SLS will produce consistent estimates of the
parameters in the system. Let's use ivregress (see [R] ivregress) to see how our simulated system
behaves when fit using 2SLS.
. ivregress 2sls quantity (price = praw) pcompete income
Instrumental variables (2SLS) regression Number of obs = 49
Wald chi2(3) = 8.77
Prob > chi2 = 0.0326
R-squared = .
Root MSE = 3.7333
quantity Coef. Std. Err. z P>|z| [95% Conf. Interval]
price -1.015817 .374209 -2.71 0.007 -1.749253 -.282381
pcompete .3319504 .172912 1.92 0.055 -.0069508 .6708517
income .5090607 .1919482 2.65 0.008 .1328491 .8852723
_cons 39.89988 10.77378 3.70 0.000 18.78366 61.01611
Instrumented: price
Instruments: pcompete income praw
. ivregress 2sls quantity (price = pcompete income) praw
Instrumental variables (2SLS) regression Number of obs = 49
Wald chi2(2) = 39.25
Prob > chi2 = 0.0000
R-squared = 0.5928
Root MSE = 1.7525
quantity Coef. Std. Err. z P>|z| [95% Conf. Interval]
price .5773133 .1749974 3.30 0.001 .2343247 .9203019
praw -.7835496 .1312414 -5.97 0.000 -1.040778 -.5263213
_cons -2.550694 5.273067 -0.48 0.629 -12.88571 7.784327
Instrumented: price
Instruments: praw pcompete income
We are now much happier with the estimation results. All the coefficients from both equations are
close to the true parameter values for the system. In particular, the coefficients are all well within
95% confidence intervals for the parameters. The missing R-squared in the demand equation seems
unusual; we will discuss that more later.
Finally, this system could be estimated using 3SLS. To demonstrate how large systems might be
handled and to avoid multiline commands, we will use global macros (see [P] macro) to hold the
specifications for our equations.
. global demand "(qDemand: quantity price pcompete income)"
. global supply "(qSupply: quantity price praw)"
. reg3 $demand $supply, endog(price)
We must specify price as endogenous because it does not appear as a dependent variable in either
equation. Without this option, reg3 would assume that there are no endogenous variables in the
system and produce seemingly unrelated regression (sureg) estimates. The reg3 output from our
series of commands is
Three-stage least-squares regression
Equation Obs Parms RMSE "R-sq" chi2 P
qDemand 49 3 3.739686 -0.8540 8.68 0.0338
qSupply 49 2 1.752501 0.5928 39.25 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
qDemand
price -1.014345 .3742036 -2.71 0.007 -1.74777 -.2809194
pcompete .2647206 .1464194 1.81 0.071 -.0222561 .5516973
income .5299146 .1898161 2.79 0.005 .1578819 .9019472
_cons 40.08749 10.77072 3.72 0.000 18.97726 61.19772
qSupply
price .5773133 .1749974 3.30 0.001 .2343247 .9203019
praw -.7835496 .1312414 -5.97 0.000 -1.040778 -.5263213
_cons -2.550694 5.273067 -0.48 0.629 -12.88571 7.784327
Endogenous variables: quantity price
Exogenous variables: pcompete income praw
The use of 3SLS over 2SLS is essentially an efficiency issue. The coefficients of the demand equation
from 3SLS are close to the coefficients from two-stage least squares, and those of the supply equation
are identical. The latter case was mentioned earlier for systems with some exactly identified equations.
However, even for the demand equation, we do not expect the coefficients to change systematically.
What we do expect from three-stage least squares are more precise estimates of the parameters given
the validity of our specification and reg3's use of the covariances among the disturbances.
Let’s summarize the results. With OLS, we got obviously biased estimates of the parameters. No
amount of data would have improved the OLS estimates; they are inconsistent in the face of the
violated OLS assumptions. With 2SLS, we obtained consistent estimates of the parameters, and these
would have improved with more data. With 3SLS, we obtained consistent estimates of the parameters
that are more efficient than those obtained by 2SLS.
Technical note
We noted earlier that the R-squared was missing from the two-stage estimates of the demand
equation. Now we see that the R-squared is negative for the three-stage estimates of the same equation.
How can we have a negative R-squared?
In most estimators, other than least squares, the R-squared is no more than a summary measure of
the overall in-sample predictive power of the estimator. The computational formula for R-squared is
R-squared = 1 − RSS/TSS, where RSS is the residual sum of squares (sum of squared residuals) and
TSS is the total sum of squared deviations about the mean of the dependent variable. In a standard
linear model with a constant, the model from which the TSS is computed is nested within the full
model from which RSS is computed; they both have a constant term based on the same data. Thus
it must be that TSS ≥ RSS, and R-squared is constrained between 0 and 1.
For 2SLS and 3SLS, some of the regressors enter the model as instruments when the parameters
are estimated. However, because our goal is to fit the structural model, the actual values, not the
instruments for the endogenous right-hand-side variables, are used to determine R-squared. The model
residuals are computed over a different set of regressors from those used to fit the model. The two-
or three-stage estimates are no longer nested within a constant-only model of the dependent variable,
and the residual sum of squares is no longer constrained to be smaller than the total sum of squares.
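As an aside, the reported "R-sq" for the demand equation can be reproduced from the stored results after the reg3 fit above. The following is only a sketch and assumes that qDemand is the first equation, so its residual sum of squares is stored in e(rss_1).
. quietly summarize quantity if e(sample)
. display 1 - e(rss_1)/(r(Var)*(r(N) - 1))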
A negative R-squared in 3SLS should be taken for exactly what it is: an indication that the
structural model predicts the dependent variable worse than a constant-only model. Is this a problem?
It depends on the application. Three-stage least squares applied to our contrived supply-and-demand
example produced good estimates of the known true parameters. Still, the demand equation produced
an R-squared of −0.854. How do we feel about our parameter estimates? This should be determined
by the estimates themselves, their associated standard errors, and the overall model significance. On
this basis, negative R-squared and all, we feel pretty good about all the parameter estimates for both
the supply and demand equations. Would we want to make predictions about equilibrium quantity by
using the demand equation alone? Probably not. Would we want to make these quantity predictions
by using the supply equation? Possibly, because based on in-sample predictions, they seem better
than those from the demand equation. However, both the supply and demand estimates are based on
limited information. If we are interested in predicting quantity, a reduced-form equation containing
all our independent variables would usually be preferred.
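For instance, a simple reduced-form fit for quantity would regress it on all the exogenous variables in the system; a sketch using the simulated data above:
. regress quantity pcompete income praw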
Technical note
As a matter of syntax, we could have specified the supply-and-demand model on one line without
using global macros.
. reg3 (quantity price pcompete income) (quantity price praw), endog(price)
Three-stage least-squares regression
Equation Obs Parms RMSE "R-sq" chi2 P
quantity 49 3 3.739686 -0.8540 8.68 0.0338
2quantity 49 2 1.752501 0.5928 39.25 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
quantity
price -1.014345 .3742036 -2.71 0.007 -1.74777 -.2809194
pcompete .2647206 .1464194 1.81 0.071 -.0222561 .5516973
income .5299146 .1898161 2.79 0.005 .1578819 .9019472
_cons 40.08749 10.77072 3.72 0.000 18.97726 61.19772
2quantity
price .5773133 .1749974 3.30 0.001 .2343247 .9203019
praw -.7835496 .1312414 -5.97 0.000 -1.040778 -.5263213
_cons -2.550694 5.273067 -0.48 0.629 -12.88571 7.784327
Endogenous variables: quantity price
Exogenous variables: pcompete income praw
However, here reg3 has been forced to create a unique equation name for the supply equation:
2quantity. Both the supply and demand equations could not be designated as quantity, so a
number was prefixed to the name for the supply equation.
We could have specified
. reg3 (qDemand: quantity price pcompete income) (qSupply: quantity price praw),
> endog(price)
and obtained the same results and equation labeling as when we used global macros to hold the
equation specifications.
Without explicit equation names, reg3 always assumes that the dependent variable should be
used to name equations. When each equation has a different dependent variable, this rule causes
no problems and produces easily interpreted result tables. If the same dependent variable appears in
more than one equation, however, reg3 will create a unique equation name based on the dependent
variable name. Because equation names must be used for cross-equation tests, you have more control
in this situation if explicit names are placed on the equations.
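For example, with the explicit names qDemand and qSupply attached, a cross-equation hypothesis such as the two price coefficients summing to zero is easy to state. This is an illustrative sketch only; the restriction is not implied by the model.
. test [qDemand]price + [qSupply]price = 0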
Example 3: Using the full syntax of reg3
Klein’s (1950) model of the U.S. economy is often used to demonstrate system estimators. It
contains several common features that will serve to demonstrate the full syntax of reg3. The Klein
model is defined by the following seven relationships:
c = β0 + β1 p + β2 L.p + β3 w + ε1    (1)
i = β4 + β5 p + β6 L.p + β7 L.k + ε2    (2)
wp = β8 + β9 y + β10 L.y + β11 yr + ε3    (3)
y = c + i + g    (4)
p = y − t − wp    (5)
k = L.k + i    (6)
w = wg + wp    (7)
Here we have used Stata’s lag operator L. to represent variables that appear with a one-period lag
in our model; see [U] 13.9 Time-series operators.
The variables in the model are listed below. Two sets of variable names are shown. The concise
first name uses traditional economics mnemonics, whereas the second name provides more guidance
for everyone else. The concise names serve to keep the specification of the model small (and quite
understandable to economists).
Short name Long name Variable definition Type
c consump Consumption endogenous
p profits Private industry profits endogenous
wp wagepriv Private wage bill endogenous
wg wagegovt Government wage bill exogenous
w wagetot Total wage bill endogenous
i invest Investment endogenous
k capital Capital stock endogenous
y totinc Total income/demand endogenous
g govt Government spending exogenous
t taxnetx Indirect bus. taxes + net exports exogenous
yr year Year − 1931 exogenous
Equations (1)–(3) are behavioral and contain explicit disturbances (ε1, ε2, and ε3). The remaining
equations are identities that specify additional variables in the system and their accounting relationships
with the variables in the behavioral equations. Some variables are explicitly endogenous by appearing
as dependent variables in (1)–(3). Others are implicitly endogenous as linear combinations that contain
other endogenous variables (for example, w and p). Still other variables are implicitly exogenous by
appearing in the identities but not in the behavioral equations (for example, wg and g).
Using the concise names, we can fit Klein’s model with the following command:
. use http://www.stata-press.com/data/r13/klein2
. reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
Three-stage least-squares regression
Equation Obs Parms RMSE "R-sq" chi2 P
c 21 3 .9443305 0.9801 864.59 0.0000
i 21 3 1.446736 0.8258 162.98 0.0000
wp 21 3 .7211282 0.9863 1594.75 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
c
p
--. .1248904 .1081291 1.16 0.248 -.0870387 .3368194
L1. .1631439 .1004382 1.62 0.104 -.0337113 .3599992
w .790081 .0379379 20.83 0.000 .715724 .8644379
_cons 16.44079 1.304549 12.60 0.000 13.88392 18.99766
i
p
--. -.0130791 .1618962 -0.08 0.936 -.3303898 .3042316
L1. .7557238 .1529331 4.94 0.000 .4559805 1.055467
k
L1. -.1948482 .0325307 -5.99 0.000 -.2586072 -.1310893
_cons 28.17785 6.793768 4.15 0.000 14.86231 41.49339
wp
y
--. .4004919 .0318134 12.59 0.000 .3381388 .462845
L1. .181291 .0341588 5.31 0.000 .1143411 .2482409
yr .149674 .0279352 5.36 0.000 .094922 .2044261
_cons 1.797216 1.115854 1.61 0.107 -.3898181 3.984251
Endogenous variables: c i wp w p y
Exogenous variables: L.p L.k L.y yr t wg g
We used the exog() option to identify t, wg, and g as exogenous variables in the system. These
variables must be identified because they are part of the system but appear directly in none of the
behavioral equations. Without this option, reg3 would not know that they were part of the system and
could be used as instrumental variables. The endog() option specifying w, p, and y is also required.
Without this information, reg3 would be unaware that these variables are linear combinations that
include endogenous variables. We did not include k in the endog() option because only its lagged
value appears in the behavioral equations.
Technical note
Rather than listing additional endogenous and exogenous variables, we could specify the full list
of exogenous variables in an inst() option,
. reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), inst(g t wg yr L.p L.k L.y)
or equivalently,
. global conseqn "(c p L.p w)"
. global inveqn "(i p L.p L.k)"
. global wageqn "(wp y L.y yr)"
. global inlist "g t wg yr L.p L.k L.y"
. reg3 $conseqn $inveqn $wageqn, inst($inlist)
Macros and explicit equations can also be mixed in the specification
. reg3 $conseqn (i p L.p L.k) $wageqn, endog(w p y) exog(t wg g)
or
. reg3 (c p L.p w) $inveqn (wp y L.y yr), endog(w p y) exog(t wg g)
Placing the equation-binding parentheses in the global macros was also arbitrary. We could have
used
. global consump "c p L.p w"
. global invest "i p L.p L.k"
. global wagepriv "wp y L.y yr"
. reg3 ($consump) ($invest) ($wagepriv), endog(w p y) exog(t wg g)
reg3 is tolerant of all combinations, and these commands will produce identical output.
Switching to the full variable names, we can fit Klein’s model with the commands below. We
will use global macros to store the lists of endogenous and exogenous variables. Again this is not
necessary: these lists could have been typed directly on the command line. However, assigning the
lists to global macros makes additional processing easier if alternative models are to be fit. We will
also use the ireg3 option to produce the iterated estimates.
. use http://www.stata-press.com/data/r13/kleinfull
. global conseqn "(consump profits L.profits wagetot)"
. global inveqn "(invest profits L.profits L.capital)"
. global wageqn "(wagepriv totinc L.totinc year)"
. global enlist "wagetot profits totinc"
. global exlist "taxnetx wagegovt govt"
. reg3 $conseqn $inveqn $wageqn, endog($enlist) exog($exlist) ireg3
Iteration 1: tolerance = .3712549
Iteration 2: tolerance = .1894712
Iteration 3: tolerance = .1076401
(output omitted )
Iteration 24: tolerance = 7.049e-07
Three-stage least-squares regression, iterated
Equation Obs Parms RMSE "R-sq" chi2 P
consump 21 3 .9565088 0.9796 970.31 0.0000
invest 21 3 2.134327 0.6209 56.78 0.0000
wagepriv 21 3 .7782334 0.9840 1312.19 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
consump
profits
--. .1645096 .0961979 1.71 0.087 -.0240348 .3530539
L1. .1765639 .0901001 1.96 0.050 -.0000291 .3531569
wagetot .7658011 .0347599 22.03 0.000 .6976729 .8339294
_cons 16.55899 1.224401 13.52 0.000 14.15921 18.95877
invest
profits
--. -.3565316 .2601568 -1.37 0.171 -.8664296 .1533664
L1. 1.011299 .2487745 4.07 0.000 .5237098 1.498888
capital
L1. -.2602 .0508694 -5.12 0.000 -.3599022 -.1604978
_cons 42.89629 10.59386 4.05 0.000 22.13271 63.65987
wagepriv
totinc
--. .3747792 .0311027 12.05 0.000 .3138191 .4357394
L1. .1936506 .0324018 5.98 0.000 .1301443 .257157
year .1679262 .0289291 5.80 0.000 .1112263 .2246261
_cons 2.624766 1.195559 2.20 0.028 .2815124 4.968019
Endogenous variables: consump invest wagepriv wagetot profits totinc
Exogenous variables: L.profits L.capital L.totinc year taxnetx wagegovt
govt
Example 4: Constraints with reg3
As a simple example of constraints, (1) above may be rewritten with both wages explicitly appearing
(rather than as a variable containing the sum). Using the longer variable names, we have
consump = β0 + β1 profits + β2 L.profits + β3 wagepriv + β12 wagegovt + ε1
To retain the effect of the identity in (7), we need β3 = β12 as a constraint on the system. We
obtain this result by defining the constraint in the usual way and then specifying its use in reg3.
Because reg3 is a system estimator, we will need to use the full equation syntax of constraint. We
assume that the following commands are entered after the model above has been fit. We
are simply changing the definition of the consumption equation (consump) and adding a constraint
on two of its parameters. The rest of the model definition is carried forward.
. global conseqn "(consump profits L.profits wagepriv wagegovt)"
. constraint define 1 [consump]wagepriv = [consump]wagegovt
. reg3 $conseqn $inveqn $wageqn, endog($enlist) exog($exlist) constr(1) ireg3
note: additional endogenous variables not in the system have no effect
and are ignored: wagetot
Iteration 1: tolerance = .3712547
Iteration 2: tolerance = .189471
Iteration 3: tolerance = .10764
(output omitted )
Iteration 24: tolerance = 7.049e-07
Three-stage least-squares regression, iterated
Equation Obs Parms RMSE "R-sq" chi2 P
consump 21 3 .9565086 0.9796 970.31 0.0000
invest 21 3 2.134326 0.6209 56.78 0.0000
wagepriv 21 3 .7782334 0.9840 1312.19 0.0000
( 1) [consump]wagepriv - [consump]wagegovt = 0
Coef. Std. Err. z P>|z| [95% Conf. Interval]
consump
profits
--. .1645097 .0961978 1.71 0.087 -.0240346 .353054
L1. .1765639 .0901001 1.96 0.050 -.0000291 .3531568
wagepriv .7658012 .0347599 22.03 0.000 .6976729 .8339294
wagegovt .7658012 .0347599 22.03 0.000 .6976729 .8339294
_cons 16.55899 1.224401 13.52 0.000 14.1592 18.95877
invest
profits
--. -.3565311 .2601567 -1.37 0.171 -.8664288 .1533666
L1. 1.011298 .2487744 4.07 0.000 .5237096 1.498887
capital
L1. -.2601999 .0508694 -5.12 0.000 -.359902 -.1604977
_cons 42.89626 10.59386 4.05 0.000 22.13269 63.65984
wagepriv
totinc
--. .3747792 .0311027 12.05 0.000 .313819 .4357394
L1. .1936506 .0324018 5.98 0.000 .1301443 .257157
year .1679262 .0289291 5.80 0.000 .1112263 .2246261
_cons 2.624766 1.195559 2.20 0.028 .281512 4.968019
Endogenous variables: consump invest wagepriv wagetot profits totinc
Exogenous variables: L.profits wagegovt L.capital L.totinc year taxnetx
govt
As expected, none of the parameter or standard error estimates has changed from the previous
estimates (before the seventh significant digit). We have simply decomposed the total wage variable
into its two parts and constrained the coefficients on these parts. The warning about additional
endogenous variables was just reg3s way of letting us know that we had specified some information
that was irrelevant to the estimation of the system. We had left the wagetot variable in our endog
macro. It does not mean anything to the system to specify wagetot as endogenous because it is no
longer in the system. That’s fine with reg3 and fine for our current purposes.
We can also impose constraints across the equations. For example, the admittedly meaningless
constraint of requiring profits to have the same effect in both the consumption and investment
equations could be imposed. Retaining the constraint on the wage coefficients, we would estimate
this constrained system.
. constraint define 2 [consump]profits = [invest]profits
. reg3 $conseqn $inveqn $wageqn, endog($enlist) exog($exlist) constr(1 2) ireg3
note: additional endogenous variables not in the system have no effect
and are ignored: wagetot
Iteration 1: tolerance = .1427927
Iteration 2: tolerance = .032539
Iteration 3: tolerance = .00307811
Iteration 4: tolerance = .00016903
Iteration 5: tolerance = .00003409
Iteration 6: tolerance = 7.763e-06
Iteration 7: tolerance = 9.240e-07
Three-stage least-squares regression, iterated
Equation Obs Parms RMSE "R-sq" chi2 P
consump 21 3 .9504669 0.9798 1019.54 0.0000
invest 21 3 1.247066 0.8706 144.57 0.0000
wagepriv 21 3 .7225276 0.9862 1537.45 0.0000
( 1) [consump]wagepriv - [consump]wagegovt = 0
( 2) [consump]profits - [invest]profits = 0
Coef. Std. Err. z P>|z| [95% Conf. Interval]
consump
profits
--. .1075413 .0957767 1.12 0.262 -.0801777 .2952602
L1. .1712756 .0912613 1.88 0.061 -.0075932 .3501444
wagepriv .798484 .0340876 23.42 0.000 .7316734 .8652946
wagegovt .798484 .0340876 23.42 0.000 .7316734 .8652946
_cons 16.2521 1.212157 13.41 0.000 13.87631 18.62788
invest
profits
--. .1075413 .0957767 1.12 0.262 -.0801777 .2952602
L1. .6443378 .1058682 6.09 0.000 .43684 .8518356
capital
L1. -.1766669 .0261889 -6.75 0.000 -.2279962 -.1253375
_cons 24.31931 5.284325 4.60 0.000 13.96222 34.6764
wagepriv
totinc
--. .4014106 .0300552 13.36 0.000 .3425035 .4603177
L1. .1775359 .0321583 5.52 0.000 .1145068 .240565
year .1549211 .0282291 5.49 0.000 .099593 .2102492
_cons 1.959788 1.14467 1.71 0.087 -.2837242 4.203299
Endogenous variables: consump invest wagepriv wagetot profits totinc
Exogenous variables: L.profits wagegovt L.capital L.totinc year taxnetx
govt
Technical note
Identification in a system of simultaneous equations involves the notion that there is enough
information to estimate the parameters of the model given the specified functional form. Under-
identification usually manifests itself as a singular matrix in the 3SLS computations. The most commonly
violated order condition for 2SLS or 3SLS involves the number of endogenous and exogenous variables.
There must be at least as many noncollinear exogenous variables in the remaining system as there
are endogenous right-hand-side variables in an equation. This condition must hold for each structural
equation in the system.
Stated as a set of rules:
1. Count the number of right-hand-side endogenous variables in an equation and call this m_i.
2. Count the number of exogenous variables in the same equation and call this k_i.
3. Count the total number of exogenous variables in all the structural equations plus any additional
variables specified in an exog() or inst() option and call this K.
4. If m_i > (K − k_i) for any structural equation i, then the system is underidentified and cannot
be estimated by 3SLS.
We are also possibly in trouble if any of the exogenous variables are linearly dependent. We must
have m_i linearly independent variables among the exogenous variables represented by (K − k_i).
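As a worked check of these rules (our count, not part of the original discussion), consider the consumption equation of Klein's model in example 3. Its right-hand-side endogenous variables are p and w, its own exogenous variable is L.p, and the system's exogenous variables are L.p, L.k, L.y, yr, t, wg, and g, so
m_1 = 2,   k_1 = 1,   K = 7,   and   m_1 = 2 ≤ K − k_1 = 6
and the order condition is satisfied for that equation.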
The complete conditions for identification involve rank-order conditions on several matrices. For
a full treatment, see Theil (1971) or Greene (2012, 331–334).
 
Henri Theil (1924–2000) was born in Amsterdam and awarded a PhD in 1951 by the University of
Amsterdam. He researched and taught econometric theory, statistics, microeconomics, macroeco-
nomic modeling, and economic forecasting and policy at (now) Erasmus University Rotterdam,
the University of Chicago, and the University of Florida. Theil’s many specific contributions
include work on 2SLS and 3SLS, inequality and concentration, and consumer demand.
 
Stored results
reg3 stores the following in e():
Scalars
e(N)           number of observations
e(k)           number of parameters
e(k_eq)        number of equations in e(b)
e(mss_#)       model sum of squares for equation #
e(df_m#)       model degrees of freedom for equation #
e(rss_#)       residual sum of squares for equation #
e(df_r)        residual degrees of freedom (small)
e(r2_#)        R-squared for equation #
e(F_#)         F statistic for equation # (small)
e(rmse_#)      root mean squared error for equation #
e(dfk2_adj)    divisor used with VCE when dfk2 specified
e(ll)          log likelihood
e(chi2_#)      χ² for equation #
e(p_#)         significance for equation #
e(cons_#)      1 when equation # has a constant, 0 otherwise
e(rank)        rank of e(V)
e(ic)          number of iterations
Macros
e(cmd) reg3
e(cmdline) command as typed
e(depvar) names of dependent variables
e(exog) names of exogenous variables
e(endog) names of endogenous variables
e(eqnames) names of equations
e(corr) correlation structure
e(wtype) weight type
e(wexp) weight expression
e(method)      3sls, 2sls, ols, sure, or mvreg
e(small) small
e(dfk) dfk, if specified
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(Sigma)       Σ̂ matrix
e(V)           variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
The most concise way to represent a system of equations for 3SLS requires thinking of the individual
equations and their associated data as being stacked. reg3 does not expect the data in this format,
but it is a convenient shorthand. The system could then be formulated as
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}
=
\begin{bmatrix}
Z_1 & 0 & \cdots & 0 \\
0 & Z_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Z_M
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_M \end{bmatrix}
$$
In full matrix notation, this is just
$$ y = ZB + \epsilon $$
The Z elements in these matrices represent both the endogenous and the exogenous right-hand-side
variables in the equations.
Also assume that there will be correlation between the disturbances of the equations so that
$$ E(\epsilon\epsilon') = \Sigma $$
where the disturbances are further assumed to have an expected value of 0; $E(\epsilon) = 0$.
The first stage of 3SLS regression requires developing instrumented values for the endogenous
variables in the system. These values can be derived as the predictions from a linear regression
of each endogenous regressor on all exogenous variables in the system or, more succinctly, as
the projection of each regressor through the projection matrix of all exogenous variables onto the
regressors. Designating the set of all exogenous variables as X results in
$$ \hat{z}_i = X(X'X)^{-1}X'z_i \quad \text{for each } i $$
Taken collectively, these $\hat{Z}$ contain the instrumented values for all the regressors. They take on
the actual values for the exogenous variables and first-stage predictions for the endogenous variables.
Given these instrumented variables, a generalized least squares (GLS) or Aitken (1935) estimator can
be formed for the parameters of the system
$$ \hat{B} = \left\{ \hat{Z}'(\Sigma^{-1} \otimes I)\hat{Z} \right\}^{-1} \hat{Z}'(\Sigma^{-1} \otimes I)\,y $$
All that remains is to obtain a consistent estimator for Σ. This estimate can be formed from the
residuals of 2SLS estimates of each equation in the system. Alternately, and identically, the residuals
can be computed from the estimates formed by taking Σ to be an identity matrix. This maintains the
full system of coefficients and allows constraints to be applied when the residuals are computed.
If we take E to be the matrix of residuals from these estimates, a consistent estimate of Σ is
$$ \hat{\Sigma} = \frac{E'E}{n} $$
where n is the number of observations in the sample. An alternative divisor for this estimate can be
obtained with the dfk option as outlined under Options.
With the estimate of $\hat{\Sigma}$ placed into the GLS estimating equation,
$$ \hat{B} = \left\{ \hat{Z}'(\hat{\Sigma}^{-1} \otimes I)\hat{Z} \right\}^{-1} \hat{Z}'(\hat{\Sigma}^{-1} \otimes I)\,y $$
are the 3SLS estimates of the system parameters.
The asymptotic variance–covariance matrix of the estimator is just the standard formulation for a
GLS estimator
$$ V(\hat{B}) = \left\{ \hat{Z}'(\hat{\Sigma}^{-1} \otimes I)\hat{Z} \right\}^{-1} $$
Iterated 3SLS estimates can be obtained by computing the residuals from the three-stage parameter
estimates, using these to formulate a new $\hat{\Sigma}$, and recomputing the parameter estimates. This process
is repeated until the estimates $\hat{B}$ converge, if they converge. Convergence is not guaranteed. When
estimating a system by SURE, these iterated estimates will be the maximum likelihood estimates for
the system. The iterated solution can also be used to produce estimates that are invariant to choice
of system and restriction parameterization for many linear systems under full 3SLS.
The exposition above follows the parallel developments in Greene (2012) and Davidson and
MacKinnon (1993).
References
Aitken, A. C. 1935. On least squares and linear combination of observations. Proceedings of the Royal Society of
Edinburgh 55: 42–48.
Bewley, R. 2000. Mr. Henri Theil: An interview with the International Journal of Forecasting. International Journal
of Forecasting 16: 1–16.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.
Nichols, A. 2007. Causal inference with observational data. Stata Journal 7: 507–541.
Poi, B. P. 2006. Jackknife instrumental variables estimation in Stata. Stata Journal 6: 364–376.
Theil, H. 1971. Principles of Econometrics. New York: Wiley.
Weesie, J. 1999. sg121: Seemingly unrelated estimation and the cluster-adjusted sandwich estimator. Stata Technical
Bulletin 52: 34–47. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 231–248. College Station, TX: Stata
Press.
Zellner, A., and H. Theil. 1962. Three-stage least squares: Simultaneous estimation of simultaneous equations.
Econometrica 30: 54–78.
Also see
[R] reg3 postestimation    Postestimation tools for reg3
[R] ivregress    Single-equation instrumental-variables regression
[R] nlsur    Estimation of nonlinear systems of equations
[R] regress    Linear regression
[R] sureg    Zellner's seemingly unrelated regression
[MV] mvreg    Multivariate regression
[SEM] example 7    Nonrecursive structural model
[SEM] intro 5    Tour of models
[TS] forecast    Econometric model forecasting
[U] 20 Estimation and postestimation commands
Title
reg3 postestimation — Postestimation tools for reg3
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Reference Also see
Description
The following postestimation commands are available after reg3:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
estat ic is not appropriate after reg3, 2sls.
Syntax for predict
predict [type] newvar [if] [in] [, equation(eqno[, eqno]) statistic]
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
residuals residuals
difference difference between the linear predictions of two equations
stddp standard error of the difference in linear predictions
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
equation(eqno[, eqno]) specifies to which equation you are referring.
equation() is filled in with one eqno for the xb, stdp, and residuals options. equation(#1)
would mean the calculation is to be made for the first equation, equation(#2) would mean the
second, and so on. You could also refer to the equations by their names. equation(income)
would refer to the equation named income and equation(hours) to the equation named hours.
If you do not specify equation(), results are the same as if you specified equation(#1).
difference and stddp refer to between-equation concepts. To use these options, you must
specify two equations, for example, equation(#1,#2) or equation(income,hours). When
two equations must be specified, equation() is required.
xb, the default, calculates the linear prediction (fitted values), that is, the prediction of x_j b for the
specified equation.
stdp calculates the standard error of the prediction for the specified equation. It can be thought of as
the standard error of the predicted expected value or mean for the observation’s covariate pattern.
The standard error of the prediction is also referred to as the standard error of the fitted value.
residuals calculates the residuals.
difference calculates the difference between the linear predictions of two equations in the system.
With equation(#1,#2), difference computes the prediction of equation(#1) minus the
prediction of equation(#2).
stddp is allowed only after you have previously fit a multiple-equation model. The standard error of
the difference in linear predictions (x_1j b − x_2j b) between equations 1 and 2 is calculated.
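For instance, after the reg3 fit in example 2 of [R] reg3, the difference between the fitted demand and supply equations and its standard error could be obtained as follows; this is a sketch, and qdiff and sediff are hypothetical new variable names.
. predict qdiff, equation(qDemand, qSupply) difference
. predict sediff, equation(qDemand, qSupply) stddp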
For more information on using predict after multiple-equation estimation commands, see [R] predict.
Remarks and examples
Example 1: Using predict
In example 2 of [R] reg3, we fit a simple supply-and-demand model. Here we obtain the fitted
supply and demand curves assuming that the exogenous regressors equal their sample means. We first
replace each of the three exogenous regressors with their sample means, then we call predict to
obtain the predictions.
. use http://www.stata-press.com/data/r13/supDem
. global demand "(qDemand: quantity price pcompete income)"
. global supply "(qSupply: quantity price praw)"
. reg3 $demand $supply, endog(price)
(output omitted )
. summarize pcompete, meanonly
. replace pcompete = r(mean)
(49 real changes made)
. summarize income, meanonly
. replace income = r(mean)
(49 real changes made)
. summarize praw, meanonly
. replace praw = r(mean)
(49 real changes made)
. predict demand, equation(qDemand)
(option xb assumed; fitted values)
. predict supply, equation(qSupply)
(option xb assumed; fitted values)
. graph twoway line demand price, sort || line supply price, ytitle(" ")
> legend(label(1 "Fitted values: qDemand") label(2 "Fitted values: qSupply"))
(figure omitted: fitted demand and supply curves plotted against price; legend: Fitted values: qDemand, Fitted values: qSupply)
As we would expect based on economic theory, the demand curve slopes downward while the
supply curve slopes upward. With the exogenous variables at their mean levels, the equilibrium price
and quantity are slightly less than 33 and 13, respectively.
Example 2: Obtaining forecasts
In example 3 of [R] reg3, we fit Klein's (1950) model of the U.S. economy. That model includes
three stochastic equations we fit using reg3 as well as four identities. Here we briefly illustrate how
the forecast command can be used to obtain forecasts for all the endogenous variables in the model.
For a more detailed discussion of how to forecast with this model, see [TS] forecast.
In Stata, we type
. use http://www.stata-press.com/data/r13/klein2, clear
. reg3 (c p L.p w) (i p L.p L.k) (wp y L.y yr), endog(w p y) exog(t wg g)
(output omitted )
. estimates store kleineqs
. forecast create kleinmodel
Forecast model kleinmodel started.
. forecast estimates kleineqs
Added estimation results from kleineqs.
Forecast model kleinmodel now contains 3 endogenous variables.
. forecast identity y = c + i + g
Forecast model kleinmodel now contains 4 endogenous variables.
. forecast identity p = y - t - wp
Forecast model kleinmodel now contains 5 endogenous variables.
. forecast identity k = L.k + i
Forecast model kleinmodel now contains 6 endogenous variables.
. forecast identity w = wg + wp
Forecast model kleinmodel now contains 7 endogenous variables.
. forecast solve, begin(1937)
Computing dynamic forecasts for model kleinmodel.
Starting period: 1937
Ending period: 1941
Forecast prefix: f_
1937: ...........................................
1938: ............................................
1939: ...........................................
1940: .........................................
1941: .............................................
Forecast 7 variables spanning 5 periods.
Here we have obtained dynamic forecasts for our 7 endogenous variables beginning in 1937. By
default, the variables containing the forecasts begin with the prefix f_. Next we plot the forecast and
actual values of consumption:
. tsline c f_c
(figure omitted: time-series plot of actual and forecast consumption over 1920–1941; legend: consumption, consumption (kleinmodel f_))
For more information about producing forecasts, see [TS] forecast.
Methods and formulas
The computational formulas for the statistics produced by predict can be found in [R] predict
and [R] regress postestimation.
Reference
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.
Also see
[R] reg3    Three-stage estimation for systems of simultaneous equations
[U] 20 Estimation and postestimation commands
Title
regress — Linear regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
regress depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
hascons has user-supplied constant
tsscons compute total sum of squares with constant; seldom used
SE/Robust
vce(vcetype)   vcetype may be ols, robust, cluster clustvar, bootstrap,
jackknife, hc2, or hc3
Reporting
level(#)   set confidence level; default is level(95)
beta   report standardized beta coefficients
eform(string)   report exponentiated coefficients and label as string
depname(varname)   substitute dependent variable name; programmer's option
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
noheader suppress output header
notable suppress coefficient table
plus make table extendable
mse1 force mean squared error to 1
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mfp, mi estimate, nestreg, rolling, statsby, stepwise, and svy are allowed;
see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
hascons, tsscons, vce(), beta, noheader, notable, plus, depname(), mse1, and weights are not allowed with
the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
noheader, notable, plus, mse1, and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Linear regression
Description
regress fits a model of depvar on indepvars using linear regression.
Here is a short list of other regression commands that may be of interest. See help estimation
commands for a complete list.
Command Entry Description
areg [R] areg an easier way to fit regressions with many dummy variables
arch [TS] arch regression models with ARCH errors
arima [TS] arima ARIMA models
boxcox [R] boxcox Box–Cox regression models
cnsreg [R] cnsreg constrained linear regression
eivreg [R] eivreg errors-in-variables regression
etregress [TE] etregress Linear regression with endogenous treatment effects
frontier [R] frontier stochastic frontier models
gmm [R] gmm generalized method of moments estimation
heckman [R] heckman Heckman selection model
intreg [R] intreg interval regression
ivregress [R] ivregress single-equation instrumental-variables regression
ivtobit [R] ivtobit tobit regression with endogenous variables
newey [TS] newey regression with Newey–West standard errors
nl [R] nl nonlinear least-squares estimation
nlsur [R] nlsur estimation of nonlinear systems of equations
qreg [R] qreg quantile (including median) regression
reg3 [R] reg3 three-stage least-squares (3SLS) regression
rreg [R] rreg a type of robust regression
gsem [SEM] intro 5 generalized structural equation models
sem [SEM] intro 5 linear structural equation models
sureg [R] sureg seemingly unrelated regression
tobit [R] tobit tobit regression
truncreg [R] truncreg truncated regression
xtabond [XT] xtabond Arellano–Bond linear dynamic panel-data estimation
xtdpd [XT] xtdpd linear dynamic panel-data estimation
xtfrontier [XT] xtfrontier panel-data stochastic frontier models
xtgls [XT] xtgls panel-data GLS models
xthtaylor [XT] xthtaylor Hausman–Taylor estimator for error-components models
xtintreg [XT] xtintreg panel-data interval regression models
xtivreg [XT] xtivreg panel-data instrumental-variables (2SLS) regression
xtpcse [XT] xtpcse linear regression with panel-corrected standard errors
xtreg [XT] xtreg fixed- and random-effects linear models
xtregar [XT] xtregar fixed- and random-effects linear models with an AR(1) disturbance
xttobit [XT] xttobit panel-data tobit models
Options
 
Model
noconstant; see [R] estimation options.
hascons indicates that a user-defined constant or its equivalent is specified among the independent
variables in indepvars. Some caution is recommended when specifying this option, as resulting
estimates may not be as accurate as they otherwise would be. Use of this option requires “sweeping”
the constant last, so the moment matrix must be accumulated in absolute rather than deviation form.
This option may be safely specified when the means of the dependent and independent variables
are all reasonable and there is not much collinearity between the independent variables. The best
procedure is to view hascons as a reporting option: estimate with and without hascons and
verify that the coefficients and standard errors of the variables not affected by the identity of the
constant are unchanged.
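A sketch of that check, using the auto data from example 1 below and a user-created constant column named one (a hypothetical variable name):
. generate one = 1
. regress mpg weight foreign one, hascons
. regress mpg weight foreign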
tsscons forces the total sum of squares to be computed as though the model has a constant, that is,
as deviations from the mean of the dependent variable. This is a rarely used option that has an
effect only when specified with noconstant. It affects the total sum of squares and all results
derived from the total sum of squares.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (ols), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
vce(ols), the default, uses the standard variance estimator for ordinary least-squares regression.
regress also allows the following:
vce(hc2) and vce(hc3) specify an alternative bias correction for the robust variance calculation.
vce(hc2) and vce(hc3) may not be specified with the svy prefix. In the unclustered case,
vce(robust) uses $\hat{\sigma}_j^2 = \{n/(n-k)\}u_j^2$ as an estimate of the variance of the jth observation,
where $u_j$ is the calculated residual and $n/(n-k)$ is included to improve the overall estimate's
small-sample properties.
vce(hc2) instead uses $u_j^2/(1-h_{jj})$ as the observation's variance estimate, where $h_{jj}$ is the
diagonal element of the hat (projection) matrix. This estimate is unbiased if the model really
is homoskedastic. vce(hc2) tends to produce slightly more conservative confidence intervals.
vce(hc3) uses $u_j^2/(1-h_{jj})^2$ as suggested by Davidson and MacKinnon (1993), who report
that this method tends to produce better results when the model really is heteroskedastic.
vce(hc3) produces confidence intervals that tend to be even more conservative.
See Davidson and MacKinnon (1993, 554–556) and Angrist and Pischke (2009, 294–308) for
more discussion on these two bias corrections.
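For example, the regression of example 1 below could be refit with the hc3 bias correction (a sketch):
. regress mpg weight foreign, vce(hc3)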
 
Reporting
level(#); see [R] estimation options.
beta asks that standardized beta coefficients be reported instead of confidence intervals. The beta
coefficients are the regression coefficients obtained by first standardizing all variables to have a
mean of 0 and a standard deviation of 1. beta may not be specified with vce(cluster clustvar)
or the svy prefix.
eform(string)is used only in programs and ado-files that use regress to fit models other than
linear regression. eform() specifies that the coefficient table be displayed in exponentiated form
as defined in [R] maximize and that string be used to label the exponentiated coefficients in the
table.
depname(varname) is used only in programs and ado-files that use regress to fit models other than
linear regression. depname() may be specified only at estimation time. varname is recorded as
the identity of the dependent variable, even though the estimates are calculated using depvar. This
method affects the labeling of the output, not the results calculated, but could affect subsequent
calculations made by predict, where the residual would be calculated as deviations from varname
rather than depvar. depname() is most typically used when depvar is a temporary variable (see
[P] macro) used as a proxy for varname.
depname() is not allowed with the svy prefix.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following options are available with regress but are not shown in the dialog box:
noheader suppresses the display of the ANOVA table and summary statistics at the top of the output;
only the coefficient table is displayed. This option is often used in programs and ado-files.
notable suppresses display of the coefficient table.
plus specifies that the output table be made extendable. This option is often used in programs and
ado-files.
mse1 is used only in programs and ado-files that use regress to fit models other than linear
regression and is not allowed with the svy prefix. mse1 sets the mean squared error to 1, forcing
the variance–covariance matrix of the estimators to be $(X'DX)^{-1}$ (see Methods and formulas
below) and affecting calculated standard errors. Degrees of freedom for t statistics is calculated
as n rather than n − k.
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Ordinary least squares
Treatment of the constant
Robust standard errors
Weighted regression
Instrumental variables and two-stage least-squares regression
Video example
regress performs linear regression, including ordinary least squares and weighted least squares.
For a general discussion of linear regression, see Draper and Smith (1998), Greene (2012), or
Kmenta (1997).
See Wooldridge (2013) for an excellent treatment of estimation, inference, interpretation, and
specification testing in linear regression models. This presentation stands out for its clarification of
the statistical issues, as opposed to the algebraic issues. See Wooldridge (2010, chap. 4) for a more
advanced discussion along the same lines.
See Hamilton (2013, chap. 7) and Cameron and Trivedi (2010, chap. 3) for an introduction to
linear regression using Stata. Dohoo, Martin, and Stryhn (2012, 2010) discuss linear regression using
examples from epidemiology, and Stata datasets and do-files used in the text are available. Cameron
and Trivedi (2010) discuss linear regression using econometric examples with Stata. Mitchell (2012)
shows how to use graphics and postestimation commands to understand a fitted regression model.
Chatterjee and Hadi (2012) explain regression analysis by using examples containing typical
problems that you might encounter when performing exploratory data analysis. We also recommend
Weisberg (2005), who emphasizes the importance of the assumptions of linear regression and problems
resulting from these assumptions. Becketti (2013) discusses regression analysis with an emphasis on
time-series data. Angrist and Pischke (2009) approach regression as a tool for exploring relationships,
estimating treatment effects, and providing answers to public policy questions. For a discussion of
model-selection techniques and exploratory data analysis, see Mosteller and Tukey (1977). For a
mathematically rigorous treatment, see Peracchi (2001, chap. 6). Finally, see Plackett (1972) if you
are interested in the history of regression. Least squares, which dates back to the 1790s, was discovered
independently by Legendre and Gauss.
Ordinary least squares
Example 1: Basic linear regression
Suppose that we have data on the mileage rating and weight of 74 automobiles. The variables in
our data are mpg,weight, and foreign. The last variable assumes the value 1 for foreign and 0 for
domestic automobiles. We wish to fit the model
mpg = β0 + β1 weight + β2 foreign + ε
This model can be fit with regress by typing
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 69.75
Model 1619.2877 2 809.643849 Prob > F = 0.0000
Residual 824.171761 71 11.608053 R-squared = 0.6627
Adj R-squared = 0.6532
Total 2443.45946 73 33.4720474 Root MSE = 3.4071
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0065879 .0006371 -10.34 0.000 -.0078583 -.0053175
foreign -1.650029 1.075994 -1.53 0.130 -3.7955 .4954422
_cons 41.6797 2.165547 19.25 0.000 37.36172 45.99768
regress produces a variety of summary statistics along with the table of regression coefficients.
At the upper left, regress reports an analysis-of-variance (ANOVA) table. The column headings SS,
df, and MS stand for “sum of squares”, “degrees of freedom”, and “mean square”, respectively. In
this example, the total sum of squares is 2,443.5: 1,619.3 accounted for by the model and 824.2 left
unexplained. Because the regression included a constant, the total sum reflects the sum after removal
of means, as does the sum of squares due to the model. The table also reveals that there are 73
total degrees of freedom (counted as 74 observations less 1 for the mean removal), of which 2 are
consumed by the model, leaving 71 for the residual.
To the right of the ANOVA table are presented other summary statistics. The F statistic associated
with the ANOVA table is 69.75. The statistic has 2 numerator and 71 denominator degrees of freedom.
The F statistic tests the hypothesis that all coefficients excluding the constant are zero. The chance of
observing an F statistic that large or larger is reported as 0.0000, which is Stata's way of indicating
a number smaller than 0.00005. The R-squared (R²) for the regression is 0.6627, and the R-squared
adjusted for degrees of freedom (adjusted R²) is 0.6532. The root mean squared error, labeled Root MSE, is
3.4071. It is the square root of the mean squared error reported for the residual in the ANOVA table.
Finally, Stata produces a table of the estimated coefficients. The first line of the table indicates
that the left-hand-side variable is mpg. Thereafter follow the estimated coefficients. Our fitted model
is
predicted mpg = 41.68 − 0.0066 weight − 1.65 foreign
Reported to the right of the coefficients in the output are the standard errors. For instance, the
standard error for the coefficient on weight is 0.0006371. The corresponding $t$ statistic is $-10.34$,
which has a two-sided significance level of 0.000. This number indicates that the significance is less
than 0.0005. The 95% confidence interval for the coefficient is $[-0.0079, -0.0053]$.
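The header statistics can also be recovered from the results that regress leaves behind in e(); see Stored results below. A minimal sketch, assuming the regression above has just been fit:
. display "R-squared = " e(mss)/(e(mss) + e(rss))
. display "Root MSE  = " sqrt(e(rss)/e(df_r))
. display "F         = " (e(mss)/e(df_m))/(e(rss)/e(df_r))
Up to rounding, these reproduce 0.6627, 3.4071, and 69.75.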
Example 2: Transforming the dependent variable
If we had a graph comparing mpg with weight, we would notice that the relationship is distinctly
nonlinear. This is to be expected because energy usage per distance should increase linearly with
weight, but mpg is measuring distance per energy used. We could obtain a better model by generating
a new variable measuring the number of gallons used per 100 miles (gp100m) and then using this
new variable in our model:
$\text{gp100m} = \beta_0 + \beta_1\,\text{weight} + \beta_2\,\text{foreign} + \epsilon$
We can now fit this model:
. generate gp100m = 100/mpg
. regress gp100m weight foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 113.97
Model 91.1761694 2 45.5880847 Prob > F = 0.0000
Residual 28.4000913 71 .400001287 R-squared = 0.7625
Adj R-squared = 0.7558
Total 119.576261 73 1.63803097 Root MSE = .63246
gp100m Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight .0016254 .0001183 13.74 0.000 .0013896 .0018612
foreign .6220535 .1997381 3.11 0.003 .2237871 1.02032
_cons -.0734839 .4019932 -0.18 0.855 -.8750354 .7280677
Fitting the physically reasonable model increases our R-squared to 0.7625.
Example 3: Obtaining beta coefficients
regress shares the features of all estimation commands. Among other things, this means that
after running a regression, we can use test to test hypotheses about the coefficients, estat vce to
examine the covariance matrix of the estimators, and predict to obtain predicted values, residuals,
and influence statistics. See [U] 20 Estimation and postestimation commands. Options that affect
how estimates are displayed, such as beta or level(), can be used when replaying results.
Suppose that we meant to specify the beta option to obtain beta coefficients (regression coefficients
normalized by the ratio of the standard deviation of the regressor to the standard deviation of the
dependent variable). Even though we forgot, we can specify the option now:
. regress, beta
Source SS df MS Number of obs = 74
F( 2, 71) = 113.97
Model 91.1761694 2 45.5880847 Prob > F = 0.0000
Residual 28.4000913 71 .400001287 R-squared = 0.7625
Adj R-squared = 0.7558
Total 119.576261 73 1.63803097 Root MSE = .63246
gp100m Coef. Std. Err. t P>|t| Beta
weight .0016254 .0001183 13.74 0.000 .9870255
foreign .6220535 .1997381 3.11 0.003 .2236673
_cons -.0734839 .4019932 -0.18 0.855 .
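The Beta column can be reproduced by hand from the definition above. A small sketch, assuming the gp100m regression is still the active set of estimation results (sd_x and sd_y are just scratch scalar names):
. quietly summarize weight
. scalar sd_x = r(sd)
. quietly summarize gp100m
. scalar sd_y = r(sd)
. display _b[weight]*sd_x/sd_y
The result, roughly 0.987, matches the Beta entry reported for weight.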
Treatment of the constant
By default, regress includes an intercept (constant) term in the model. The noconstant option
suppresses it, and the hascons option tells regress that the model already has one.
Example 4: Suppressing the constant term
We wish to fit a regression of the weight of an automobile against its length, and we wish to
impose the constraint that the weight is zero when the length is zero.
If we simply type regress weight length, we are fitting the model
$\text{weight} = \beta_0 + \beta_1\,\text{length} + \epsilon$
Here a length of zero corresponds to a weight of $\beta_0$. We want to force $\beta_0$ to be zero or, equivalently,
to estimate an equation that does not include an intercept:
$\text{weight} = \beta_1\,\text{length} + \epsilon$
We do this by specifying the noconstant option:
. regress weight length, noconstant
Source SS df MS Number of obs = 74
F( 1, 73) = 3450.13
Model 703869302 1 703869302 Prob > F = 0.0000
Residual 14892897.8 73 204012.299 R-squared = 0.9793
Adj R-squared = 0.9790
Total 718762200 74 9713002.7 Root MSE = 451.68
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
length 16.29829 .2774752 58.74 0.000 15.74528 16.8513
In our data, length is measured in inches and weight in pounds. We discover that each inch of
length adds 16 pounds to the weight.
Sometimes there is no need for Stata to include a constant term in the model. Most commonly,
this occurs when the model contains a set of mutually exclusive indicator variables. hascons is a
variation of the noconstant option: it tells Stata not to add a constant to the regression because
the regression specification already has one, either directly or indirectly.
For instance, we now refit our model of weight as a function of length and include separate
constants for foreign and domestic cars by specifying bn.foreign. bn.foreign is factor-variable
notation for “no base for foreign” or “include all levels of variable foreign in the model”; see
[U] 11.4.3 Factor variables.
. regress weight length bn.foreign, hascons
Source SS df MS Number of obs = 74
F( 2, 71) = 316.54
Model 39647744.7 2 19823872.3 Prob > F = 0.0000
Residual 4446433.7 71 62625.8268 R-squared = 0.8992
Adj R-squared = 0.8963
Total 44094178.4 73 604029.841 Root MSE = 250.25
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
length 31.44455 1.601234 19.64 0.000 28.25178 34.63732
foreign
Domestic -2850.25 315.9691 -9.02 0.000 -3480.274 -2220.225
Foreign -2983.927 275.1041 -10.85 0.000 -3532.469 -2435.385
Technical note
There is a subtle distinction between the hascons and noconstant options. We can most easily
reveal it by refitting the last regression, specifying noconstant rather than hascons:
. regress weight length bn.foreign, noconstant
Source SS df MS Number of obs = 74
F( 3, 71) = 3802.03
Model 714315766 3 238105255 Prob > F = 0.0000
Residual 4446433.7 71 62625.8268 R-squared = 0.9938
Adj R-squared = 0.9936
Total 718762200 74 9713002.7 Root MSE = 250.25
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
length 31.44455 1.601234 19.64 0.000 28.25178 34.63732
foreign
Domestic -2850.25 315.9691 -9.02 0.000 -3480.274 -2220.225
Foreign -2983.927 275.1041 -10.85 0.000 -3532.469 -2435.385
Comparing this output with that produced by the previous regress command, we see that they are
almost, but not quite, identical. The parameter estimates and their associated statistics (the second
half of the output) are identical. The overall summary statistics and the ANOVA table (the first half
of the output) are different, however.
In the first case, the $R^2$ is shown as 0.8992; here it is shown as 0.9938. In the first case, the
$F$ statistic is 316.54; now it is 3,802.03. The numerator degrees of freedom is different as well. In
the first case, the numerator degrees of freedom is 2; now the degrees of freedom is 3. Which is
correct?
Both are. Specifying the hascons option causes regress to adjust the ANOVA table and its
associated statistics for the explanatory power of the constant. The regression in effect has a constant;
it is just written in such a way that a separate constant is unnecessary. No such adjustment is made
with the noconstant option.
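The adjustment works through the total sum of squares: with hascons, the total sum of squares is taken about the mean of weight, whereas with noconstant it is taken about zero. A quick sketch, using the residual sum of squares of 4,446,433.7 shown in both tables, recovers the two R-squared values:
. quietly summarize weight
. scalar tss_centered   = r(Var)*(r(N) - 1)
. scalar tss_uncentered = tss_centered + r(N)*r(mean)^2
. display "hascons R-squared    = " 1 - 4446433.7/tss_centered
. display "noconstant R-squared = " 1 - 4446433.7/tss_uncentered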
Technical note
When the hascons option is specified, regress checks to make sure that the model does in fact
have a constant term. If regress cannot find a constant term, it automatically adds one. Fitting a
model of weight on length and specifying the hascons option, we obtain
. regress weight length, hascons
(note: hascons false)
Source SS df MS Number of obs = 74
F( 1, 72) = 613.27
Model 39461306.8 1 39461306.8 Prob > F = 0.0000
Residual 4632871.55 72 64345.4382 R-squared = 0.8949
Adj R-squared = 0.8935
Total 44094178.4 73 604029.841 Root MSE = 253.66
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
length 33.01988 1.333364 24.76 0.000 30.36187 35.67789
_cons -3186.047 252.3113 -12.63 0.000 -3689.02 -2683.073
Even though we specified hascons, regress included a constant anyway. It also added a note to
our output: “note: hascons false”.
Technical note
Even if the model specification effectively includes a constant term, we need not specify the
hascons option. regress is always on the lookout for collinear variables and omits them from the
model. For instance,
. regress weight length bn.foreign
note: 1.foreign omitted because of collinearity
Source SS df MS Number of obs = 74
F( 2, 71) = 316.54
Model 39647744.7 2 19823872.3 Prob > F = 0.0000
Residual 4446433.7 71 62625.8268 R-squared = 0.8992
Adj R-squared = 0.8963
Total 44094178.4 73 604029.841 Root MSE = 250.25
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
length 31.44455 1.601234 19.64 0.000 28.25178 34.63732
foreign
Domestic 133.6775 77.47615 1.73 0.089 -20.80555 288.1605
Foreign 0 (omitted)
_cons -2983.927 275.1041 -10.85 0.000 -3532.469 -2435.385
Robust standard errors
regress with the vce(robust) option substitutes a robust variance matrix calculation for the
conventional calculation or, if vce(cluster clustvar) is specified, allows relaxing the assumption
of independence within groups. How this method works is explained in [U] 20.21 Obtaining robust
variance estimates. Below we show how well this approach works.
Example 5: Heteroskedasticity and robust standard errors
Specifying the vce(robust) option is equivalent to requesting White-corrected standard errors in
the presence of heteroskedasticity. We use the automobile data and, in the process of looking at the
energy efficiency of cars, analyze a variable with considerable heteroskedasticity.
We will examine the amount of energy, measured in gallons of gasoline, that the cars in the
data need to move 1,000 pounds of their weight 100 miles. We are going to examine the relative
efficiency of foreign and domestic cars.
. gen gpmw = ((1/mpg)/weight)*100*1000
. summarize gpmw
Variable Obs Mean Std. Dev. Min Max
gpmw 74 1.682184 .2426311 1.09553 2.30521
In these data, the engines consume between 1.10 and 2.31 gallons of gas to move 1,000 pounds
of the car’s weight 100 miles. If we ran a regression with conventional standard errors of gpmw on
foreign, we would obtain
. regress gpmw foreign
Source SS df MS Number of obs = 74
F( 1, 72) = 20.07
Model .936705572 1 .936705572 Prob > F = 0.0000
Residual 3.36079459 72 .046677703 R-squared = 0.2180
Adj R-squared = 0.2071
Total 4.29750017 73 .058869865 Root MSE = .21605
gpmw Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign .2461526 .0549487 4.48 0.000 .1366143 .3556909
_cons 1.609004 .0299608 53.70 0.000 1.549278 1.66873
regress with the vce(robust) option, on the other hand, reports
. regress gpmw foreign, vce(robust)
Linear regression Number of obs = 74
F( 1, 72) = 13.13
Prob > F = 0.0005
R-squared = 0.2180
Root MSE = .21605
Robust
gpmw Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign .2461526 .0679238 3.62 0.001 .1107489 .3815563
_cons 1.609004 .0234535 68.60 0.000 1.56225 1.655758
The point estimates are the same (foreign cars need one-quarter gallon more gas), but the standard errors
differ by roughly 20%. Conventional regression reports the 95% confidence interval as [0.14, 0.36],
whereas the robust standard errors make the interval [0.11, 0.38].
Which is right? Notice that gpmw is a variable with considerable heteroskedasticity:
. tabulate foreign, summarize(gpmw)
Summary of gpmw
Car type Mean Std. Dev. Freq.
Domestic 1.6090039 .16845182 52
Foreign 1.8551565 .30186861 22
Total 1.6821844 .24263113 74
Thus here we favor the robust standard errors. In [U] 20.21 Obtaining robust variance estimates,
we show another example using linear regression where it makes little difference whether we specify
vce(robust). The linear-regression assumptions were true, and we obtained nearly linear-regression
results. The advantage of the robust estimate is that in neither case did we have to check assumptions.
Technical note
regress purposefully suppresses displaying the ANOVA table when vce(robust) is specified, as
it is no longer appropriate in a statistical sense, even though, mechanically, the numbers would be
unchanged. That is, sums of squares remain unchanged, but the meaning of those sums is no longer
relevant. The $F$ statistic, for instance, is no longer based on sums of squares; it becomes a Wald test
based on the robustly estimated variance matrix. Nevertheless, regress continues to report the $R^2$
and the root MSE even though both numbers are based on sums of squares and are, strictly speaking,
irrelevant. In this, the root MSE is more in violation of the spirit of the robust estimator than is $R^2$.
As a goodness-of-fit statistic, $R^2$ is still fine; just do not use it in formulas to obtain $F$ statistics
because those formulas no longer apply. The root MSE is valid in a literal sense (it is the square
root of the mean squared error), but it is no longer an estimate of $\sigma$ because there is no single $\sigma$; the
variance of the residual varies observation by observation.
Example 6: Alternative robust standard errors
The vce(hc2) and vce(hc3) options modify the robust variance calculation. In the context of
linear regression without clustering, the idea behind the robust calculation is somehow to measure
$\sigma_j^2$, the variance of the residual associated with the $j$th observation, and then to use that estimate
to improve the estimated variance of $\widehat{\boldsymbol{\beta}}$. Because residuals have (theoretically and practically) mean
0, one estimate of $\sigma_j^2$ is the observation's squared residual itself, $u_j^2$. A finite-sample correction
could improve that by multiplying $u_j^2$ by $n/(n-k)$, and, as a matter of fact, vce(robust) uses
$\{n/(n-k)\}u_j^2$ as its estimate of the residual's variance.

vce(hc2) and vce(hc3) use alternative estimators of the observation-specific variances. For
instance, if the residuals are homoskedastic, we can show that the expected value of $u_j^2$ is $\sigma^2(1-h_{jj})$,
where $h_{jj}$ is the $j$th diagonal element of the projection (hat) matrix. $h_{jj}$ has average value $k/n$, so
$1-h_{jj}$ has average value $1-k/n = (n-k)/n$. Thus the default robust estimator $\widehat{\sigma}_j^2 = \{n/(n-k)\}u_j^2$
amounts to dividing $u_j^2$ by the average of the expectation.

vce(hc2) divides $u_j^2$ by $1-h_{jj}$ itself, so it should yield better estimates if the residuals really are
homoskedastic. vce(hc3) divides $u_j^2$ by $(1-h_{jj})^2$ and has no such clean interpretation. Davidson
and MacKinnon (1993) show that $u_j^2/(1-h_{jj})^2$ approximates a more complicated estimator that
they obtain by jackknifing (MacKinnon and White 1985). Angrist and Pischke (2009) also illustrate
the relative merits of these adjustments.
Here are the results of refitting our efficiency model using vce(hc2) and vce(hc3):
. regress gpmw foreign, vce(hc2)
Linear regression Number of obs = 74
F( 1, 72) = 12.93
Prob > F = 0.0006
R-squared = 0.2180
Root MSE = .21605
Robust HC2
gpmw Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign .2461526 .0684669 3.60 0.001 .1096662 .3826389
_cons 1.609004 .0233601 68.88 0.000 1.562437 1.655571
. regress gpmw foreign, vce(hc3)
Linear regression Number of obs = 74
F( 1, 72) = 12.38
Prob > F = 0.0008
R-squared = 0.2180
Root MSE = .21605
Robust HC3
gpmw Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign .2461526 .069969 3.52 0.001 .1066719 .3856332
_cons 1.609004 .023588 68.21 0.000 1.561982 1.656026
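The leverage values $h_{jj}$ and the rescaled squared residuals that distinguish these estimators can be obtained directly with predict. A rough sketch, continuing with the gpmw regression (u2_hc2 and u2_hc3 are merely illustrative variable names):
. quietly regress gpmw foreign
. predict double h, leverage
. predict double u, residuals
. generate double u2_hc2 = u^2/(1 - h)
. generate double u2_hc3 = u^2/(1 - h)^2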
Example 7: Standard errors for clustered data
The vce(cluster clustvar) option relaxes the assumption of independence. Below we have 28,534
observations on 4,711 women aged 14–46 years. Data were collected on these women between 1968
and 1988. We are going to fit a classic earnings model, and we begin by ignoring that each woman
appears an average of 6.057 times in the data.
. use http://www.stata-press.com/data/r13/regsmpl, clear
(NLS Women 14-26 in 1968)
. regress ln_wage age c.age#c.age tenure
Source SS df MS Number of obs = 28101
F( 3, 28097) = 1842.45
Model 1054.52501 3 351.508335 Prob > F = 0.0000
Residual 5360.43962 28097 .190783344 R-squared = 0.1644
Adj R-squared = 0.1643
Total 6414.96462 28100 .228290556 Root MSE = .43679
ln_wage Coef. Std. Err. t P>|t| [95% Conf. Interval]
age .0752172 .0034736 21.65 0.000 .0684088 .0820257
c.age#c.age -.0010851 .0000575 -18.86 0.000 -.0011979 -.0009724
tenure .0390877 .0007743 50.48 0.000 .0375699 .0406054
_cons .3339821 .0504413 6.62 0.000 .2351148 .4328495
The number of observations in our model is 28,101 because Stata drops observations that have a
missing value for one or more of the variables in the model. We can be reasonably certain that the
standard errors reported above are meaningless. Without a doubt, a woman with higher-than-average
wages in one year typically has higher-than-average wages in other years, and so the residuals are
not independent. One way to deal with this would be to fit a random-effects model (and we are
going to do that), but first we fit the model using regress, specifying vce(cluster id), which
treats only observations with different person ids as truly independent:
. regress ln_wage age c.age#c.age tenure, vce(cluster id)
Linear regression Number of obs = 28101
F( 3, 4698) = 748.82
Prob > F = 0.0000
R-squared = 0.1644
Root MSE = .43679
(Std. Err. adjusted for 4699 clusters in idcode)
Robust
ln_wage Coef. Std. Err. t P>|t| [95% Conf. Interval]
age .0752172 .0045711 16.45 0.000 .0662557 .0841788
c.age#c.age -.0010851 .0000778 -13.94 0.000 -.0012377 -.0009325
tenure .0390877 .0014425 27.10 0.000 .0362596 .0419157
_cons .3339821 .0641918 5.20 0.000 .208136 .4598282
For comparison, we focus on the tenure coefficient, which in economics jargon can be interpreted as the
rate of return for keeping your job. The 95% confidence interval we previously estimated (an interval
we do not believe) is [0.038, 0.041]. The robust interval is twice as wide, being [0.036, 0.042].
As we said, one correct way to fit this model is by random-effects regression. Here is the
random-effects result:
. xtreg ln_wage age c.age#c.age tenure, re
Random-effects GLS regression Number of obs = 28101
Group variable: idcode Number of groups = 4699
R-sq: within = 0.1370 Obs per group: min = 1
between = 0.2154 avg = 6.0
overall = 0.1608 max = 15
Wald chi2(3) = 4717.05
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
ln_wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0568296 .0026958 21.08 0.000 .0515459 .0621132
c.age#c.age -.0007566 .0000447 -16.93 0.000 -.0008441 -.000669
tenure .0260135 .0007477 34.79 0.000 .0245481 .0274789
_cons .6136792 .0394611 15.55 0.000 .5363368 .6910216
sigma_u .33542449
sigma_e .29674679
rho .56095413 (fraction of variance due to u_i)
Robust regression estimated the 95% interval [0.036, 0.042], and xtreg (see [XT] xtreg) estimates
[0.025, 0.027]. Which is better? The random-effects regression estimator assumes a lot. We can check
some of these assumptions by performing a Hausman test. Using estimates (see [R] estimates store),
we store the random-effects estimation results, and then we run the required fixed-effects regression
to perform the test.
. estimates store random
. xtreg ln_wage age c.age#c.age tenure, fe
Fixed-effects (within) regression Number of obs = 28101
Group variable: idcode Number of groups = 4699
R-sq: within = 0.1375 Obs per group: min = 1
between = 0.2066 avg = 6.0
overall = 0.1568 max = 15
F(3,23399) = 1243.00
corr(u_i, Xb) = 0.1380 Prob > F = 0.0000
ln_wage Coef. Std. Err. t P>|t| [95% Conf. Interval]
age .0522751 .002783 18.78 0.000 .0468202 .05773
c.age#c.age -.0006717 .0000461 -14.56 0.000 -.0007621 -.0005813
tenure .021738 .000799 27.21 0.000 .020172 .023304
_cons .687178 .0405944 16.93 0.000 .6076103 .7667456
sigma_u .38743138
sigma_e .29674679
rho .6302569 (fraction of variance due to u_i)
F test that all u_i=0: F(4698, 23399) = 7.98 Prob > F = 0.0000
. hausman . random
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
. random Difference S.E.
age .0522751 .0568296 -.0045545 .0006913
c.age#c.age -.0006717 -.0007566 .0000849 .0000115
tenure .021738 .0260135 -.0042756 .0002816
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 336.62
Prob>chi2 = 0.0000
The Hausman test casts grave suspicions on the random-effects model we just fit, so we should be
careful in interpreting those results.
Meanwhile, our robust regression results still stand, as long as we are careful about the interpretation.
The correct interpretation is that, if the data collection were repeated (on women sampled the same
way as in the original sample), and if we were to refit the model, 95% of the time we would expect
the estimated coefficient on tenure to be in the range [0.036, 0.042].
Even with robust regression, we must be careful about going beyond that statement. Here the
Hausman test is probably picking up something that differs within and between person, which would
cast doubt on our robust regression model in terms of interpreting [0.036, 0.042] to contain the rate
of return for keeping a job, economywide, for all women, without exception.
Weighted regression
regress can perform weighted and unweighted regression. We indicate the weight by specifying
the [weight] qualifier. By default, regress assumes analytic weights; see the technical note below.
Example 8: Using means as regression variables
We have census data recording the death rate (drate) and median age (medage) for each state.
The data also record the region of the country in which each state is located and the overall population
of the state:
. use http://www.stata-press.com/data/r13/census9
(1980 Census data by state)
. describe
Contains data from http://www.stata-press.com/data/r13/census9.dta
obs: 50 1980 Census data by state
vars: 6 6 Apr 2013 15:43
size: 1,450
storage display value
variable name type format label variable label
state str14 %-14s State
state2 str2 %-2s Two-letter state abbreviation
drate float %9.0g Death Rate
pop long %12.0gc Population
medage float %9.2f Median age
region byte %-8.0g cenreg Census region
Sorted by:
We can use factor variables to include dummy variables for region. Because the variables in the
regression reflect means rather than individual observations, the appropriate method of estimation is
analytically weighted least squares (Davidson and MacKinnon 2004, 261–262), where the weight is
total population:
. regress drate medage i.region [w=pop]
(analytic weights assumed)
(sum of wgt is 2.2591e+08)
Source SS df MS Number of obs = 50
F( 4, 45) = 37.21
Model 4096.6093 4 1024.15232 Prob > F = 0.0000
Residual 1238.40987 45 27.5202192 R-squared = 0.7679
Adj R-squared = 0.7472
Total 5335.01916 49 108.877942 Root MSE = 5.246
drate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage 4.283183 .5393329 7.94 0.000 3.196911 5.369455
region
N Cntrl .3138738 2.456431 0.13 0.899 -4.633632 5.26138
South -1.438452 2.320244 -0.62 0.538 -6.111663 3.234758
West -10.90629 2.681349 -4.07 0.000 -16.30681 -5.505777
_cons -39.14727 17.23613 -2.27 0.028 -73.86262 -4.431915
To weight the regression by population, we added the qualifier [w=pop] to the end of the regress
command. Our qualifier was vague (we did not say [aweight=pop]), but unless told otherwise, Stata
assumes analytic weights for regress. Stata informed us that the sum of the weight is $2.2591 \times 10^8$;
there were approximately 226 million people residing in the United States according to our 1980 data.
Technical note
Once we fit a weighted regression, we can obtain the appropriately weighted variance–covariance
matrix of the estimators using estat vce and perform appropriately weighted hypothesis tests using
test.
In the weighted regression in example 8, we see that 4.region is statistically significant but that
2.region and 3.region are not. We use test to test the joint significance of the region variables:
. test 2.region 3.region 4.region
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 45) = 9.84
Prob > F = 0.0000
The results indicate that the region variables are jointly significant.
regress also accepts frequency weights (fweights). Frequency weights are appropriate when the
data do not reflect cell means, but instead represent replicated observations. Specifying aweights or
fweights will not change the parameter estimates, but it will change the corresponding significance
levels.
For instance, if we specified [fweight=pop] in the weighted regression example above (which
would be statistically incorrect), Stata would treat the data as if the data represented 226 million
independent observations on death rates and median age. The data most certainly do not represent
that; they represent 50 observations on state averages.
With aweights, Stata treats the number of observations on the process as the number of observations
in the data. When we specify fweights, Stata treats the number of observations as if it were equal
to the sum of the weights; see Methods and formulas below.
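To see the mechanics, we can fit the example 8 regression both ways. Specifying fweights here is, as just noted, statistically incorrect for these data, but the comparison shows that the coefficients do not change while the reported standard errors do:
. quietly regress drate medage i.region [aweight=pop]
. estimates store aw
. quietly regress drate medage i.region [fweight=pop]
. estimates store fw
. estimates table aw fw, se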
Technical note
A popular request on the help line is to describe the effect of specifying [aweight=exp] with
regress in terms of transformation of the dependent and independent variables. The mechanical
answer is that typing
. regress y x1 x2 [aweight=n]
is equivalent to fitting the model
$$y_j\sqrt{n_j} = \beta_0\sqrt{n_j} + \beta_1 x_{1j}\sqrt{n_j} + \beta_2 x_{2j}\sqrt{n_j} + u_j\sqrt{n_j}$$
This regression will reproduce the coefficients and covariance matrix produced by the aweighted
regression. The mean squared errors (estimates of the variance of the residuals) will, however,
be different. The transformed regression reports $s_t^2$, an estimate of $\mathrm{Var}(u_j\sqrt{n_j})$. The aweighted
regression reports $s_a^2$, an estimate of $\mathrm{Var}\bigl(u_j\sqrt{n_j}\sqrt{N/\sum_k n_k}\bigr)$, where $N$ is the number of observations.
Thus
$$s_a^2 = \frac{N}{\sum_k n_k}\,s_t^2 = \frac{s_t^2}{\bar{n}} \qquad (1)$$
The logic for this adjustment is as follows: Consider the model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
Assume that, were this model fit on individuals, $\mathrm{Var}(u) = \sigma_u^2$, a constant. Assume that individual
data are not available; what is available are averages $(y_j, x_{1j}, x_{2j})$ for $j = 1, \ldots, N$, and each
average is calculated over $n_j$ observations. Then it is still true that
$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + u_j$$
where $u_j$ is the average of $n_j$ mean-0, variance-$\sigma_u^2$ deviates and so has variance $\sigma_u^2/n_j$. Thus
multiplying through by $\sqrt{n_j}$ produces
$$y_j\sqrt{n_j} = \beta_0\sqrt{n_j} + \beta_1 x_{1j}\sqrt{n_j} + \beta_2 x_{2j}\sqrt{n_j} + u_j\sqrt{n_j}$$
and $\mathrm{Var}(u_j\sqrt{n_j}) = \sigma_u^2$. The mean squared error, $s_t^2$, reported by fitting this transformed regression
is an estimate of $\sigma_u^2$. The coefficients and covariance matrix could also be obtained by aweighted
regress. The only difference would be in the reported mean squared error, which from (1) is
$\sigma_u^2/\bar{n}$. On average, each observation in the data reflects the averages calculated over $\bar{n} = \sum_k n_k/N$
individuals, and thus this reported mean squared error is the average variance of an observation in
the dataset. We can retrieve the estimate of $\sigma_u^2$ by multiplying the reported mean squared error by $\bar{n}$.

More generally, aweights are used to solve general heteroskedasticity problems. In these cases,
we have the model
$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + u_j$$
and the variance of $u_j$ is thought to be proportional to $a_j$. If the variance is proportional to $a_j$, it is
also proportional to $\alpha a_j$, where $\alpha$ is any positive constant. Not quite arbitrarily, but with no loss of
generality, we could choose $\alpha = \sum_k (1/a_k)/N$, the average value of the inverse of $a_j$. We can then
write $\mathrm{Var}(u_j) = k\alpha a_j\sigma^2$, where $k$ is the constant of proportionality that is no longer a function of
the scale of the weights.

Dividing this regression through by the $\sqrt{a_j}$,
$$y_j/\sqrt{a_j} = \beta_0/\sqrt{a_j} + \beta_1 x_{1j}/\sqrt{a_j} + \beta_2 x_{2j}/\sqrt{a_j} + u_j/\sqrt{a_j}$$
produces a model with $\mathrm{Var}(u_j/\sqrt{a_j}) = k\alpha\sigma^2$, which is the constant part of $\mathrm{Var}(u_j)$. This variance
is a function of $\alpha$, the average of the reciprocal weights; if the weights are scaled arbitrarily, then so
is this variance.

We can also fit this model by typing
. regress y x1 x2 [aweight=1/a]
This input will produce the same estimates of the coefficients and covariance matrix; the reported
mean squared error is, from (1), $\{N/\sum_k(1/a_k)\}\,k\alpha\sigma^2 = k\sigma^2$. This variance is independent of the
scale of $a_j$.
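The equivalence described in this note is easy to check. A sketch with placeholder names (y, x1, x2, and a weight variable n, standing in for whatever variables are at hand):
. generate double rootn = sqrt(n)
. generate double y_t  = y*rootn
. generate double x1_t = x1*rootn
. generate double x2_t = x2*rootn
. regress y_t x1_t x2_t rootn, noconstant
. regress y x1 x2 [aweight=n]
Because the transformed model carries $\beta_0\sqrt{n_j}$ rather than a free intercept, rootn enters as a regressor and the constant is suppressed. The two commands report the same coefficients and covariance matrix; only the reported mean squared errors differ, as described above.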
Instrumental variables and two-stage least-squares regression
An alternate syntax for regress can be used to produce instrumental-variables (two-stage least
squares) estimates.
regress depvar varlist1 (varlist2) [if] [in] [weight] [, regress options]
This syntax is used mainly by programmers developing estimators using the instrumental-variables
estimates as intermediate results. ivregress is normally used to directly fit these models; see
[R] ivregress.
With this syntax, regress fits a structural equation of depvar on varlist1 using instrumental-variables
regression; (varlist2) indicates the list of instrumental variables. With the exception of
vce(hc2) and vce(hc3), all standard regress options are allowed.
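As a sketch with hypothetical variables, suppose y is the dependent variable, x1 is exogenous, w is endogenous, and z is an excluded instrument. Assuming that varlist2 is the complete instrument list (the included exogenous regressors together with the excluded instruments), the two commands below fit the same structural equation:
. regress y x1 w (x1 z)
. ivregress 2sls y x1 (w = z)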
Video example
Simple linear regression in Stata
Stored results
regress stores the following in e():
Scalars
e(N) number of observations
e(mss) model sum of squares
e(df_m) model degrees of freedom
e(rss) residual sum of squares
e(df_r) residual degrees of freedom
e(r2) R-squared
e(r2_a) adjusted R-squared
e(F) F statistic
e(rmse) root mean squared error
e(ll) log likelihood under additional assumption of i.i.d. normal errors
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) regress
e(cmdline) command as typed
e(depvar) name of dependent variable
e(model) ols or iv
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output when vce() is not ols
e(clustvar) name of cluster variable
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
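After estimation, these results can be inspected directly; for example, after refitting the regression from example 1:
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress mpg weight foreign
. ereturn list
. display e(r2)
. matrix list e(b)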
Methods and formulas
Methods and formulas are presented under the following headings:
Coefficient estimation and ANOVA table
A general notation for the robust variance calculation
Robust calculation for regress
Coefficient estimation and ANOVA table
Variables printed in lowercase and not boldfaced (for example, $x$) are scalars. Variables printed
in lowercase and boldfaced (for example, $\mathbf{x}$) are column vectors. Variables printed in uppercase and
boldfaced (for example, $\mathbf{X}$) are matrices.

Let $\mathbf{v}$ be a column vector of weights specified by the user. If no weights are specified, $\mathbf{v} = \mathbf{1}$.
Let $\mathbf{w}$ be a column vector of normalized weights. If no weights are specified or if the user specified
fweights or iweights, $\mathbf{w} = \mathbf{v}$; otherwise, $\mathbf{w} = \{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})$.

The number of observations, $n$, is defined as $\mathbf{1}'\mathbf{w}$. For iweights, this is truncated to an integer.
The sum of the weights is $\mathbf{1}'\mathbf{v}$. Define $c = 1$ if there is a constant in the regression and zero otherwise.
Define $k$ as the number of right-hand-side variables (including the constant).

Let $\mathbf{X}$ denote the matrix of observations on the right-hand-side variables, $\mathbf{y}$ the vector of observations
on the left-hand-side variable, and $\mathbf{Z}$ the matrix of observations on the instruments. If the user specifies
no instruments, then $\mathbf{Z} = \mathbf{X}$. In the following formulas, if the user specifies weights, then $\mathbf{X}'\mathbf{X}$,
$\mathbf{X}'\mathbf{y}$, $\mathbf{y}'\mathbf{y}$, $\mathbf{Z}'\mathbf{Z}$, $\mathbf{Z}'\mathbf{X}$, and $\mathbf{Z}'\mathbf{y}$ are replaced by $\mathbf{X}'\mathbf{D}\mathbf{X}$, $\mathbf{X}'\mathbf{D}\mathbf{y}$, $\mathbf{y}'\mathbf{D}\mathbf{y}$, $\mathbf{Z}'\mathbf{D}\mathbf{Z}$, $\mathbf{Z}'\mathbf{D}\mathbf{X}$, and $\mathbf{Z}'\mathbf{D}\mathbf{y}$,
respectively, where $\mathbf{D}$ is a diagonal matrix whose diagonal elements are the elements of $\mathbf{w}$. We
suppress the $\mathbf{D}$ below to simplify the notation.

If no instruments are specified, define $\mathbf{A}$ as $\mathbf{X}'\mathbf{X}$ and $\mathbf{a}$ as $\mathbf{X}'\mathbf{y}$. Otherwise, define $\mathbf{A}$ as
$\mathbf{X}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}(\mathbf{X}'\mathbf{Z})'$ and $\mathbf{a}$ as $\mathbf{X}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}$.

The coefficient vector $\mathbf{b}$ is defined as $\mathbf{A}^{-1}\mathbf{a}$. Although not shown in the notation, unless hascons
is specified, $\mathbf{A}$ and $\mathbf{a}$ are accumulated in deviation form and the constant is calculated separately.
This comment applies to all statistics listed below.

The total sum of squares, TSS, equals $\mathbf{y}'\mathbf{y}$ if there is no intercept and $\mathbf{y}'\mathbf{y} - (\mathbf{1}'\mathbf{y})^2/n$ otherwise.
The degrees of freedom is $n - c$.

The error sum of squares, ESS, is defined as $\mathbf{y}'\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$ if there are instruments
and as $\mathbf{y}'\mathbf{y} - \mathbf{b}'\mathbf{X}'\mathbf{y}$ otherwise. The degrees of freedom is $n - k$.

The model sum of squares, MSS, equals TSS $-$ ESS. The degrees of freedom is $k - c$.

The mean squared error, $s^2$, is defined as ESS$/(n-k)$. The root mean squared error is $s$, its
square root.

The $F$ statistic with $k - c$ and $n - k$ degrees of freedom is defined as
$$F = \frac{\text{MSS}}{(k-c)\,s^2}$$
if no instruments are specified. If instruments are specified and $c = 1$, then $F$ is defined as
$$F = \frac{(\mathbf{b} - \mathbf{c})'\mathbf{A}(\mathbf{b} - \mathbf{c})}{(k-1)\,s^2}$$
where $\mathbf{c}$ is a vector of $k - 1$ zeros and $k$th element $\mathbf{1}'\mathbf{y}/n$. Otherwise, $F$ is defined as missing.
(Here you may use the test command to construct any $F$ test that you wish.)

The R-squared, $R^2$, is defined as $R^2 = 1 - \text{ESS}/\text{TSS}$.

The adjusted R-squared, $R^2_a$, is $1 - (1 - R^2)(n - c)/(n - k)$.

If vce(robust) is not specified, the conventional estimate of variance is $s^2\mathbf{A}^{-1}$. The handling
of vce(robust) is described below.
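For the simplest case (no weights and no instruments), the coefficient vector can be reproduced with Stata's matrix commands. A rough sketch using the automobile data of example 1 (A and ya are just scratch matrix names):
. use http://www.stata-press.com/data/r13/auto, clear
. matrix accum A = weight foreign
. matrix vecaccum ya = mpg weight foreign
. matrix b = invsym(A)*ya'
. matrix list b
Here A contains $\mathbf{X}'\mathbf{X}$ and ya contains $\mathbf{y}'\mathbf{X}$, each with a constant appended, so b reproduces the coefficients reported by regress mpg weight foreign.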
A general notation for the robust variance calculation
Put aside all context of linear regression and the notation that goes with it (we will return to it).
First, we are going to establish a notation for describing robust variance calculations.

The calculation formula for the robust variance calculation is
$$\widehat{\mathcal{V}} = q_c\,\widehat{\mathbf{V}}\Bigl(\sum_{k=1}^{M}\mathbf{u}_k^{(G)\prime}\mathbf{u}_k^{(G)}\Bigr)\widehat{\mathbf{V}}$$
where
$$\mathbf{u}_k^{(G)} = \sum_{j\in G_k} w_j\mathbf{u}_j$$
$G_1$, $G_2$, ..., $G_M$ are the clusters specified by vce(cluster clustvar), and $w_j$ are the user-specified
weights, normalized if aweights or pweights are specified and equal to 1 if no weights are specified.

For fweights without clusters, the variance formula is
$$\widehat{\mathcal{V}} = q_c\,\widehat{\mathbf{V}}\Bigl(\sum_{j=1}^{N} w_j\mathbf{u}_j'\mathbf{u}_j\Bigr)\widehat{\mathbf{V}}$$
which is the same as expanding the dataset and making the calculation on the unweighted data.

If vce(cluster clustvar) is not specified, $M = N$, and each cluster contains 1 observation. The
inputs into this calculation are
$\widehat{\mathbf{V}}$, which is typically a conventionally calculated variance matrix;
$\mathbf{u}_j$, $j = 1, \ldots, N$, a row vector of scores; and
$q_c$, a constant finite-sample adjustment.
Thus we can now describe how estimators apply the robust calculation formula by defining $\widehat{\mathbf{V}}$, $\mathbf{u}_j$,
and $q_c$.

Two definitions are popular enough for $q_c$ to deserve a name. The regression-like formula for $q_c$
(Fuller et al. 1986) is
$$q_c = \frac{N-1}{N-k}\,\frac{M}{M-1}$$
where $M$ is the number of clusters and $N$ is the number of observations. For weights, $N$ refers to
the sum of the weights if weights are frequency weights and the number of observations in the dataset
(ignoring weights) in all other cases. Also note that, weighted or not, $M = N$ when vce(cluster
clustvar) is not specified, and then $q_c = N/(N-k)$.

The asymptotic-like formula for $q_c$ is
$$q_c = \frac{M}{M-1}$$
where $M = N$ if vce(cluster clustvar) is not specified.
See [U] 20.21 Obtaining robust variance estimates and [P] robust for a discussion of the robust
variance estimator and a development of these formulas.
Robust calculation for regress
For regress, $\widehat{\mathbf{V}} = \mathbf{A}^{-1}$. The other terms are

No instruments, vce(robust), but not vce(hc2) or vce(hc3),
$$\mathbf{u}_j = (y_j - \mathbf{x}_j\mathbf{b})\mathbf{x}_j$$
and $q_c$ is given by its regression-like definition.

No instruments, vce(hc2),
$$\mathbf{u}_j = \frac{1}{\sqrt{1-h_{jj}}}\,(y_j - \mathbf{x}_j\mathbf{b})\mathbf{x}_j$$
where $q_c = 1$ and $h_{jj} = \mathbf{x}_j(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_j'$.

No instruments, vce(hc3),
$$\mathbf{u}_j = \frac{1}{1-h_{jj}}\,(y_j - \mathbf{x}_j\mathbf{b})\mathbf{x}_j$$
where $q_c = 1$ and $h_{jj} = \mathbf{x}_j(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_j'$.

Instrumental variables,
$$\mathbf{u}_j = (y_j - \mathbf{x}_j\mathbf{b})\widehat{\mathbf{x}}_j$$
where $q_c$ is given by its regression-like definition, and
$$\widehat{\mathbf{x}}_j' = \mathbf{P}\mathbf{z}_j'$$
where $\mathbf{P} = (\mathbf{X}'\mathbf{Z})(\mathbf{Z}'\mathbf{Z})^{-1}$.
Acknowledgments
The robust estimate of variance was first implemented in Stata by Mead Over of the Center for
Global Development, Dean Jolliffe of the World Bank, and Andrew Foster of the Department of
Economics at Brown University (Over, Jolliffe, and Foster 1996).
 
The history of regression is long and complicated: the books by Stigler (1986) and Hald (1998) are
devoted largely to the story. Legendre published first on least squares in 1805. Gauss published
later in 1809, but he had the idea earlier. Gauss, and especially Laplace, tied least squares to a
normal errors assumption. The idea of the normal distribution can itself be traced back to De
Moivre in 1733. Laplace discussed a variety of other estimation methods and error assumptions
over his long career, while linear models long predate either innovation. Most of this work was
linked to problems in astronomy and geodesy.
A second wave of ideas started when Galton used graphical and descriptive methods on data bearing
on heredity to develop what he called regression. His term reflects the common phenomenon that
characteristics of offspring are positively correlated with those of parents but with regression slope
such that offspring “regress toward the mean”. Galton’s work was rather intuitive: contributions
from Pearson, Edgeworth, Yule, and others introduced more formal machinery, developed related
ideas on correlation, and extended application into the biological and social sciences. So most
of the elements of regression as we know it were in place by 1900.
Pierre-Simon Laplace (1749–1827) was born in Normandy and was early recognized as a
remarkable mathematician. He weathered a changing political climate well enough to rise to
Minister of the Interior under Napoleon in 1799 (although only for 6 weeks) and to be made
a Marquis by Louis XVIII in 1817. He made many contributions to mathematics and physics,
his two main interests being theoretical astronomy and probability theory (including statistics).
Laplace transforms are named for him.
Adrien-Marie Legendre (1752–1833) was born in Paris (or possibly in Toulouse) and educated in
mathematics and physics. He worked in number theory, geometry, differential equations, calculus,
function theory, applied mathematics, and geodesy. The Legendre polynomials are named for
him. His main contribution to statistics is as one of the discoverers of least squares. He died in
poverty, having refused to bow to political pressures.
Johann Carl Friedrich Gauss (1777–1855) was born in Braunschweig (Brunswick), now in
Germany. He studied there and at Göttingen. His doctoral dissertation at the University of
Helmstedt was a discussion of the fundamental theorem of algebra. He made many fundamental
contributions to geometry, number theory, algebra, real analysis, differential equations, numerical
analysis, statistics, astronomy, optics, geodesy, mechanics, and magnetism. An outstanding genius,
Gauss worked mostly in isolation in Göttingen.
Francis Galton (1822–1911) was born in Birmingham, England, into a well-to-do family with
many connections: he and Charles Darwin were first cousins. After an unsuccessful foray into
medicine, he became independently wealthy at the death of his father. Galton traveled widely
in Europe, the Middle East, and Africa, and became celebrated as an explorer and geographer.
His pioneering work on weather maps helped in the identification of anticyclones, which he
named. From about 1865, most of his work was centered on quantitative problems in biology,
anthropology, and psychology. In a sense, Galton (re)invented regression, and he certainly named
it. Galton also promoted the normal distribution, correlation approaches, and the use of median
and selected quantiles as descriptive statistics. He was knighted in 1909.
 
References
Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Alexandersson, A. 1998. gr32: Confidence ellipses. Stata Technical Bulletin 46: 10–13. Reprinted in Stata Technical
Bulletin Reprints, vol. 8, pp. 54–57. College Station, TX: Stata Press.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Becketti, S. 2013. Introduction to Time Series Using Stata. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Chatterjee, S., and A. S. Hadi. 2012. Regression Analysis by Example. 5th ed. Hoboken, NJ: Wiley.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Dohoo, I., W. Martin, and H. Stryhn. 2010. Veterinary Epidemiologic Research. 2nd ed. Charlottetown, Prince Edward
Island: VER Inc.
. 2012. Methods in Epidemiologic Research. Charlottetown, Prince Edward Island: VER Inc.
Draper, N., and H. Smith. 1998. Applied Regression Analysis. 3rd ed. New York: Wiley.
Dunnington, G. W. 1955. Gauss: Titan of Science. New York: Hafner Publishing.
Duren, P. 2009. Changing faces: The mistaken portrait of Legendre. Notices of the American Mathematical Society
56: 1440–1443.
Filoso, V. 2013. Regression anatomy, revealed. Stata Journal 13: 92–106.
Fuller, W. A., W. J. Kennedy, Jr., D. Schnell, G. Sullivan, and H. J. Park. 1986. PC CARP. Software package. Ames,
IA: Statistical Laboratory, Iowa State University.
Gillham, N. W. 2001. A Life of Sir Francis Galton: From African Exploration to the Birth of Eugenics. New York:
Oxford University Press.
Gillispie, C. C. 1997. Pierre-Simon Laplace, 1749–1827: A Life in Exact Science. Princeton: Princeton University
Press.
Gould, W. W. 2011a. Understanding matrices intuitively, part 1. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2011/03/03/understanding-matrices-intuitively-part-1/.
. 2011b. Use poisson rather than regress; tell a friend. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hald, A. 1998. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley.
Hamilton, L. C. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hill, R. C., W. E. Griffiths, and G. C. Lim. 2011. Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Kohler, U., and F. Kreuter. 2012. Data Analysis Using Stata. 3rd ed. College Station, TX: Stata Press.
Long, J. S., and J. Freese. 2000. sg152: Listing and interpreting transformed coefficients from certain regression
models. Stata Technical Bulletin 57: 27–34. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 231–240.
College Station, TX: Stata Press.
MacKinnon, J. G., and H. L. White, Jr. 1985. Some heteroskedasticity-consistent covariance matrix estimators with
improved finite sample properties. Journal of Econometrics 29: 305–325.
Mitchell, M. N. 2012. Interpreting and Visualizing Regression Models Using Stata. College Station, TX: Stata Press.
Mosteller, C. F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA:
Addison–Wesley.
Over, M., D. Jolliffe, and A. Foster. 1996. sg46: Huber correction for two-stage least squares estimates. Stata Technical
Bulletin 29: 24–25. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 140–142. College Station, TX: Stata
Press.
Peracchi, F. 2001. Econometrics. Chichester, UK: Wiley.
Plackett, R. L. 1972. Studies in the history of probability and statistics: XXIX. The discovery of the method of least
squares. Biometrika 59: 239–251.
Rogers, W. H. 1991. smv2: Analyzing repeated measurements—some practical alternatives. Stata Technical Bulletin
4: 10–16. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 123–131. College Station, TX: Stata Press.
Royston, P., and G. Ambler. 1998. sg79: Generalized additive models. Stata Technical Bulletin 42: 38–43. Reprinted
in Stata Technical Bulletin Reprints, vol. 7, pp. 217–224. College Station, TX: Stata Press.
Schonlau, M. 2005. Boosted regression (boosting): An introductory tutorial and a Stata plugin. Stata Journal 5:
330–354.
Stigler, S. M. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Belknap
Press.
Tyler, J. H. 1997. sg73: Table making programs. Stata Technical Bulletin 40: 18–23. Reprinted in Stata Technical
Bulletin Reprints, vol. 7, pp. 186–192. College Station, TX: Stata Press.
Weesie, J. 1998. sg77: Regression analysis with multiplicative heteroscedasticity. Stata Technical Bulletin 42: 28–32.
Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 204–210. College Station, TX: Stata Press.
Weisberg, S. 2005. Applied Linear Regression. 3rd ed. New York: Wiley.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Zimmerman, F. 1998. sg93: Switching regressions. Stata Technical Bulletin 45: 30–33. Reprinted in Stata Technical
Bulletin Reprints, vol. 8, pp. 183–186. College Station, TX: Stata Press.
Also see
[R] regress postestimation Postestimation tools for regress
[R] regress postestimation diagnostic plots Postestimation plots for regress
[R] regress postestimation time series Postestimation tools for regress with time series
[R] anova Analysis of variance and covariance
[R] contrast Contrasts and linear hypothesis tests after estimation
[MI] estimation Estimation commands for use with mi estimate
[SEM] example 6 Linear regression
[SEM] intro 5 Tour of models
[SVY] svy estimation Estimation commands for survey data
[TS] forecast Econometric model forecasting
[U] 20 Estimation and postestimation commands
Title
regress postestimation — Postestimation tools for regress
Description Predictions DFBETA influence statistics
Tests for violation of assumptions Variance inflation factors Measures of effect size
Methods and formulas Acknowledgments References
Also see
Description
The following postestimation commands are of special interest after regress:
Command Description
dfbeta DFBETA influence statistics
estat hettest tests for heteroskedasticity
estat imtest information matrix test
estat ovtest Ramsey regression specification-error test for omitted variables
estat szroeter Szroeter’s rank test for heteroskedasticity
estat vif variance inflation factors for the independent variables
estat esize η² and ω² effect sizes
These commands are not appropriate after the svy prefix.
The following standard postestimation commands are also available:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast¹ dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest² likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized
predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ forecast is not appropriate with mi or svy estimation results.
² lrtest is not appropriate with svy estimation results.
Predictions
Syntax for predict
predict [type] newvar [if] [in] [, statistic]
statistic Description
Main
xb linear prediction; the default
residuals residuals
score score; equivalent to residuals
rstandard standardized residuals
rstudent Studentized (jackknifed) residuals
cooksd Cook’s distance
leverage | hat leverage (diagonal elements of hat matrix)
pr(a,b) $\Pr(y_j \mid a < y_j < b)$
e(a,b) $E(y_j \mid a < y_j < b)$
ystar(a,b) $E(y_j^*)$, $y_j^* = \max\{a, \min(y_j, b)\}$
dfbeta(varname) DFBETA for varname
stdp standard error of the linear prediction
stdf standard error of the forecast
stdr standard error of the residual
covratio COVRATIO
dfits DFITS
welsch Welsch distance
Unstarred statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample. Starred statistics are calculated only for the estimation sample, even when if e(sample)
is not specified.
rstandard,rstudent,cooksd,leverage,dfbeta(),stdf,stdr,covratio,dfits, and welsch are not available
if any vce() other than vce(ols) was specified with regress.
xb,residuals,score, and stdp are the only options allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a $\geq$ .) means $-\infty$, and b missing (b $\geq$ .) means $+\infty$; see
[U] 12.2.1 Missing values.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
residuals calculates the residuals.
score is equivalent to residuals in linear regression.
rstandard calculates the standardized residuals.
rstudent calculates the Studentized (jackknifed) residuals.
cooksd calculates the Cook’s $D$ influence statistic (Cook 1977).
leverage or hat calculates the diagonal elements of the projection (“hat”) matrix.
pr(a,b) calculates $\Pr(a < \mathbf{x}_j\mathbf{b} + u_j < b)$, the probability that $y_j|\mathbf{x}_j$ would be observed in the
interval $(a, b)$.

a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < 30)$;
pr(lb,ub) calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_j < ub)$; and
pr(20,ub) calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < ub)$.

a missing (a $\geq$ .) means $-\infty$; pr(.,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_j < 30)$;
pr(lb,30) calculates $\Pr(-\infty < \mathbf{x}_j\mathbf{b} + u_j < 30)$ in observations for which lb $\geq$ .
and calculates $\Pr(lb < \mathbf{x}_j\mathbf{b} + u_j < 30)$ elsewhere.

b missing (b $\geq$ .) means $+\infty$; pr(20,.) calculates $\Pr(+\infty > \mathbf{x}_j\mathbf{b} + u_j > 20)$;
pr(20,ub) calculates $\Pr(+\infty > \mathbf{x}_j\mathbf{b} + u_j > 20)$ in observations for which ub $\geq$ .
and calculates $\Pr(20 < \mathbf{x}_j\mathbf{b} + u_j < ub)$ elsewhere.

e(a,b) calculates $E(\mathbf{x}_j\mathbf{b} + u_j \mid a < \mathbf{x}_j\mathbf{b} + u_j < b)$, the expected value of $y_j|\mathbf{x}_j$ conditional on
$y_j|\mathbf{x}_j$ being in the interval $(a, b)$, meaning that $y_j|\mathbf{x}_j$ is truncated. a and b are specified as they
are for pr().

ystar(a,b) calculates $E(y_j^*)$, where $y_j^* = a$ if $\mathbf{x}_j\mathbf{b} + u_j \leq a$, $y_j^* = b$ if $\mathbf{x}_j\mathbf{b} + u_j \geq b$, and
$y_j^* = \mathbf{x}_j\mathbf{b} + u_j$ otherwise, meaning that $y_j^*$ is censored. a and b are specified as they are for pr().
dfbeta(varname)calculates the DFBETA for varname, the difference between the regression coefficient
when the jth observation is included and excluded, said difference being scaled by the estimated
standard error of the coefficient. varname must have been included among the regressors in the
previously fitted model. The calculation is automatically restricted to the estimation subsample.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas. A brief sketch comparing the two appears after these option descriptions.
stdr calculates the standard error of the residuals.
covratio calculates COVRATIO (Belsley, Kuh, and Welsch 1980), a measure of the influence of the
$j$th observation based on considering the effect on the variance–covariance matrix of the estimates.
The calculation is automatically restricted to the estimation subsample.
dfits calculates DFITS (Welsch and Kuh 1977) and attempts to summarize the information in the
leverage versus residual-squared plot into one statistic. The calculation is automatically restricted
to the estimation subsample.
welsch calculates Welsch distance (Welsch 1982) and is a variation on dfits. The calculation is
automatically restricted to the estimation subsample.
Remarks and examples for predict
Remarks are presented under the following headings:
Terminology
Fitted values and residuals
Prediction standard errors
Prediction with weighted data
Leverage statistics
Standardized and Studentized residuals
DFITS, Cook’s Distance, and Welsch Distance
COVRATIO
Terminology
Many of these commands concern identifying influential data in linear regression. This is, unfortunately,
a field that is dominated by jargon, codified and partially begun by Belsley, Kuh, and
Welsch (1980). In the words of Chatterjee and Hadi (1986, 416), “Belsley, Kuh, and Welsch’s book,
Regression Diagnostics, was a very valuable contribution to the statistical literature, but it unleashed
on an unsuspecting statistical community a computer speak (à la Orwell), the likes of which we
have never seen.” Things have only gotten worse since then. Chatterjee and Hadi’s (1986, 1988)
own attempts to clean up the jargon did not improve matters (see Hoaglin and Kempthorne [1986],
Velleman [1986], and Welsch [1986]). We apologize for the jargon, and for our contribution to the
jargon in the form of inelegant command names, we apologize most of all.
Model sensitivity refers to how estimates are affected by subsets of our data. Imagine data on $y$
and $x$, and assume that the data are to be fit by the regression $y_i = \alpha + \beta x_i + \epsilon_i$. The regression
estimates of $\alpha$ and $\beta$ are $a$ and $b$, respectively. Now imagine that the estimated $a$ and $b$ would be
different if a small portion of the dataset, perhaps even one observation, were deleted. As a data
analyst, you would like to think that you are summarizing tendencies that apply to all the data, but
you have just been told that the model you fit is unduly influenced by one point or just a few points
and that, as a matter of fact, there is another model that applies to the rest of the data, a model
that you have ignored. The search for subsets of the data that, if deleted, would change the results
markedly is a predominant theme of this entry.
There are three key issues in identifying model sensitivity to individual observations, which go by
the names residuals, leverage, and influence. In our $y_i = a + bx_i + e_i$ regression, the residuals are,
of course, $e_i$; they reveal how much our fitted value $\widehat{y}_i = a + bx_i$ differs from the observed $y_i$. A
point $(x_i, y_i)$ with a corresponding large residual is called an outlier. Say that you are interested in
outliers because you somehow think that such points will exert undue influence on your estimates.
Your feelings are generally right, but there are exceptions. A point might have a huge residual and
yet not affect the estimated $b$ at all. Nevertheless, studying observations with large residuals almost
always pays off.
$(x_i, y_i)$ can be an outlier in another way: just as $y_i$ can be far from $\widehat{y}_i$, $x_i$ can be far from
the center of mass of the other $x$'s. Such an “outlier” should interest you just as much as the more
traditional outliers. Picture a scatterplot of $y$ against $x$ with thousands of points in some sort of mass
at the lower left of the graph and one point at the upper right of the graph. Now run a regression
line through the points; the regression line will come close to the point at the upper right of the
graph and may, in fact, go through it. That is, this isolated point will not appear as an outlier as
measured by residuals because its residual will be small. Yet this point might have a dramatic effect
on our resulting estimates in the sense that, were you to delete the point, the estimates would change
markedly. Such a point is said to have high leverage. Just as with traditional outliers, a high leverage
point does not necessarily have an undue effect on regression estimates, but if it does not, it is more
the exception than the rule.
Now all this is a most unsatisfactory state of affairs. Points with large residuals may, but need
not, have a large effect on our results, and points with small residuals may still have a large effect.
Points with high leverage may, but need not, have a large effect on our results, and points with low
leverage may still have a large effect. Can you not identify the influential points and simply have the
computer list them for you? You can, but you will have to define what you mean by “influential”.
“Influential” is defined with respect to some statistic. For instance, you might ask which points in
your data have a large effect on your estimated a, which points have a large effect on your estimated
b, which points have a large effect on your estimated standard error of b, and so on, but do not be
surprised when the answers to these questions are different. In any case, obtaining such measures
is not difficult: all you have to do is fit the regression excluding each observation one at a time
and record the statistic of interest, which, in the day of the modern computer, is not too onerous.
Moreover, you can save considerable computer time by doing algebra ahead of time and working
out formulas that will calculate the same answers as if you ran each of the regressions. (Ignore the
question of pairs of observations that, together, exert undue influence, and triples, and so on, which
remains largely unsolved and for which the brute force fit-every-possible-regression procedure is not
a viable alternative.)
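To make the brute-force idea concrete, the following do-file fragment is a minimal sketch (the variable names b_shift and absshift are ours, chosen only for illustration): it refits a simple regression of mpg on weight with each observation left out in turn, records how far the slope moves, and lists the five cars that move it most. The formulas discussed below compute equivalent quantities without actually rerunning the regressions.

    use http://www.stata-press.com/data/r13/auto, clear
    quietly regress mpg weight
    scalar b_full = _b[weight]                  // full-sample slope
    generate double b_shift = .
    forvalues i = 1/`=_N' {
            quietly regress mpg weight if _n != `i'
            quietly replace b_shift = _b[weight] - scalar(b_full) in `i'
    }
    generate double absshift = abs(b_shift)
    gsort -absshift                             // largest shifts first
    list make b_shift in 1/5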
Fitted values and residuals
Typing predict newvar with no options creates newvar containing the fitted values. Typing
predict newvar, resid creates newvar containing the residuals.
Example 1
Continuing with example 1 from [R] regress, we wish to fit the following model:
$\text{mpg} = \beta_0 + \beta_1\,\text{weight} + \beta_2\,\text{foreign} + \epsilon$
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 69.75
Model 1619.2877 2 809.643849 Prob > F = 0.0000
Residual 824.171761 71 11.608053 R-squared = 0.6627
Adj R-squared = 0.6532
Total 2443.45946 73 33.4720474 Root MSE = 3.4071
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0065879 .0006371 -10.34 0.000 -.0078583 -.0053175
foreign -1.650029 1.075994 -1.53 0.130 -3.7955 .4954422
_cons 41.6797 2.165547 19.25 0.000 37.36172 45.99768
That done, we can now obtain the predicted values from the regression. We will store them in a new
variable called pmpg by typing predict pmpg. Because predict produces no output, we will follow
that by summarizing our predicted and observed values.
. predict pmpg
(option xb assumed; fitted values)
. summarize pmpg mpg
Variable Obs Mean Std. Dev. Min Max
pmpg 74 21.2973 4.709779 9.794333 29.82151
mpg 74 21.2973 5.785503 12 41
Example 2: Out-of-sample predictions
We can just as easily obtain predicted values from the model by using a wholly different dataset
from the one on which the model was fit. The only requirement is that the data have the necessary
variables, which here are weight and foreign.
Using the data on two new cars (the Pontiac Sunbird and the Volvo 260) from the newautos.dta
dataset, we can obtain out-of-sample predictions (or forecasts) by typing
. use http://www.stata-press.com/data/r13/newautos, clear
(New Automobile Models)
. predict pmpg
(option xb assumed; fitted values)
. list, divider
make weight foreign pmpg
1. Pont. Sunbird 2690 Domestic 23.95829
2. Volvo 260 3170 Foreign 19.14607
The Pontiac Sunbird has a predicted mileage rating of 23.96 mpg, whereas the Volvo 260 has a
predicted rating of 19.15 mpg. In comparison, the actual mileage ratings are 24 for the Pontiac and
17 for the Volvo.
Prediction standard errors
predict can calculate the standard error of the forecast (stdf option), the standard error of the
prediction (stdp option), and the standard error of the residual (stdr option). It is easy to confuse
stdf and stdp because both are often called the prediction error. Consider the prediction $\hat{y}_j = x_j b$,
where $b$ is the estimated coefficient (column) vector and $x_j$ is a (row) vector of independent variables
for which you want the prediction. First, $\hat{y}_j$ has a variance due to the variance of the estimated
coefficient vector $b$,
$\mathrm{Var}(\hat{y}_j) = \mathrm{Var}(x_j b) = s^2 h_j$
where $h_j = x_j (X'X)^{-1} x_j'$ and $s^2$ is the mean squared error of the regression. Do not panic over the
algebra; just remember that $\mathrm{Var}(\hat{y}_j) = s^2 h_j$, whatever $s^2$ and $h_j$ are. stdp calculates this quantity.
This is the error in the prediction due to the uncertainty about $b$.
If you are about to hand this number out as your forecast, however, there is another error. According
to your model, the true value of $y_j$ is given by
$y_j = x_j b + \epsilon_j = \hat{y}_j + \epsilon_j$
and thus $\mathrm{Var}(y_j) = \mathrm{Var}(\hat{y}_j) + \mathrm{Var}(\epsilon_j) = s^2 h_j + s^2$, which is the square of stdf. stdf, then,
is the sum of the error in the prediction plus the residual error.
stdr has to do with an analysis-of-variance decomposition of $s^2$, the estimated variance of $y$.
The variance of the prediction is $s^2 h_j$, and therefore $s^2 h_j + s^2(1 - h_j) = s^2$ decomposes $s^2$
into the prediction and residual variances.
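The relationship between stdp and stdf can be checked numerically; the following is a minimal sketch using the automobile data (sp, sf, and sf_chk are our variable names, not part of any Stata command):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress mpg weight foreign
. predict double sp, stdp
. predict double sf, stdf
. generate double sf_chk = sqrt(sp^2 + e(rmse)^2)   // stdf^2 = stdp^2 + s^2
. assert reldif(sf, sf_chk) < 1e-6
assert is silent when the relationship holds, confirming that the squared standard error of the forecast is the squared standard error of the prediction plus the residual variance.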
Example 3: standard error of the forecast
Returning to our model of mpg on weight and foreign, we previously predicted the mileage rating
for the Pontiac Sunbird and Volvo 260 as 23.96 and 19.15 mpg, respectively. We now want to put a
standard error around our forecast. Remember, the data for these two cars were in newautos.dta:
. use http://www.stata-press.com/data/r13/newautos, clear
(New Automobile Models)
. predict pmpg
(option xb assumed; fitted values)
. predict se_pmpg, stdf
. list, divider
make weight foreign pmpg se_pmpg
1. Pont. Sunbird 2690 Domestic 23.95829 3.462791
2. Volvo 260 3170 Foreign 19.14607 3.525875
Thus an approximate 95% confidence interval for the mileage rating of the Volvo 260 is
$19.15 \pm 2 \cdot 3.53 = [12.09, 26.21]$.
Prediction with weighted data
predict can be used after frequency-weighted (fweight) estimation, just as it is used after
unweighted estimation. The technical note below concerns the use of predict after analytically
weighted (aweight) estimation.
Technical note
After analytically weighted estimation, predict is willing to calculate only the prediction (no
options), residual (residual option), standard error of the prediction (stdp option), and diagonal
elements of the projection matrix (hat option). Moreover, the results produced by hat need to
be adjusted, as will be described. For analytically weighted estimation, the standard error of the
forecast and residuals, the standardized and Studentized residuals, and Cook's D are not statistically
well-defined concepts.
Leverage statistics
In addition to providing fitted values and the associated standard errors, the predict command can
also be used to generate various statistics used to detect the influence of individual observations. This
section provides a brief introduction to leverage (hat) statistics, and some of the following subsections
discuss other influence statistics produced by predict.
Example 4: diagonal elements of projection matrix
The diagonal elements of the projection matrix, obtained by the hat option, are a measure of
distance in explanatory variable space. leverage is a synonym for hat.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. regress mpg weight foreign
(output omitted )
. predict xdist, hat
. summarize xdist, detail
Leverage
Percentiles Smallest
1% .0192325 .0192325
5% .0192686 .0192366
10% .0193448 .019241 Obs 74
25% .0220291 .0192686 Sum of Wgt. 74
50% .0383797 Mean .0405405
Largest Std. Dev. .0207624
75% .0494002 .0880814
90% .0693432 .099715 Variance .0004311
95% .0880814 .099715 Skewness 1.159745
99% .1003283 .1003283 Kurtosis 4.083313
Some 5% of our sample has an xdist measure in excess of 0.08. Let’s force them to reveal their
identities:
. list foreign make mpg if xdist>.08, divider
foreign make mpg
24. Domestic Ford Fiesta 28
26. Domestic Linc. Continental 12
27. Domestic Linc. Mark V 12
43. Domestic Plym. Champ 34
64. Foreign Peugeot 604 14
To understand why these cars are on this list, we must remember that the explanatory variables in our
model are weight and foreign and that xdist measures distance in this metric. The Ford Fiesta
and the Plymouth Champ are the two lightest domestic cars in our data. The Lincolns are the two
heaviest domestic cars, and the Peugeot is the heaviest foreign car.
See lvr2plot in [R]regress postestimation diagnostic plots for information on a leverage-versus-
squared-residual plot.
Standardized and Studentized residuals
The terms standardized and Studentized residuals have meant different things to different authors.
In Stata, predict defines the standardized residual as $\hat{e}_i = e_i/(s\sqrt{1 - h_i})$ and the Studentized
residual as $r_i = e_i/(s_{(i)}\sqrt{1 - h_i})$, where $s_{(i)}$ is the root mean squared error of a regression with the
$i$th observation removed. Stata's definition of the Studentized residual is the same as the one given
in Bollen and Jackman (1990, 264) and is what Chatterjee and Hadi (1988, 74) call the "externally
Studentized" residual. Stata's "standardized" residual is the same as what Chatterjee and Hadi (1988,
74) call the "internally Studentized" residual.
Standardized and Studentized residuals are attempts to adjust residuals for their standard errors.
Although the $\epsilon_i$ theoretical residuals are homoskedastic by assumption (that is, they all have the same
variance), the calculated $e_i$ are not. In fact,
$\mathrm{Var}(e_i) = \sigma^2 (1 - h_i)$
where the $h_i$ are the leverage measures obtained from the diagonal elements of the hat matrix. Thus
observations with the greatest leverage have corresponding residuals with the smallest variance.
Standardized residuals use the root mean squared error of the regression for σ. Studentized residuals
use the root mean squared error of a regression omitting the observation in question for σ. In general,
Studentized residuals are preferable to standardized residuals for purposes of outlier identification.
Studentized residuals can be interpreted as the $t$ statistic for testing the significance of a dummy
variable equal to 1 in the observation in question and 0 elsewhere (Belsley, Kuh, and Welsch 1980).
Such a dummy variable would effectively absorb the observation and so remove its influence in
determining the other coefficients in the model. Caution must be exercised here, however, because
of the simultaneous testing problem. You cannot simply list the residuals that would be individually
significant at the 5% level; their joint significance would be far less (their joint significance level
would be far greater).
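These definitions can be verified by hand from the residuals, the leverage values, and the root mean squared error; the following is a minimal sketch using the price model of the next example (res, h, std_chk, and std are our variable names):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price weight foreign##c.mpg
. predict double res, resid
. predict double h, hat
. generate double std_chk = res/(e(rmse)*sqrt(1 - h))   // e_i/(s*sqrt(1-h_i))
. predict double std, rstandard
. assert reldif(std_chk, std) < 1e-6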
Example 5: standardized and Studentized residuals
In the Terminology section of Remarks and examples for predict, we distinguished residuals from
leverage and speculated on the impact of an observation with a small residual but large leverage. If
we adjust the residuals for their standard errors, however, the adjusted residual would be (relatively)
larger and perhaps large enough so that we could simply examine the adjusted residuals. Taking
our price on weight and foreign##c.mpg model from example 1 of [R]regress postestimation
diagnostic plots, we can obtain the in-sample standardized and Studentized residuals by typing
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. predict esta if e(sample), rstandard
. predict estu if e(sample), rstudent
In the lvr2plot section of [R]regress postestimation diagnostic plots, we discovered that the VW
Diesel has the highest leverage in our data, but a corresponding small residual. The standardized and
Studentized residuals for the VW Diesel are
. list make price esta estu if make=="VW Diesel"
make price esta estu
71. VW Diesel 5,397 .6142691 .6114758
The Studentized residual of 0.611 can be interpreted as the $t$ statistic for including a dummy variable
for VW Diesel in our regression. Such a variable would not be significant.
DFITS, Cook’s Distance, and Welsch Distance
DFITS (Welsch and Kuh 1977), Cook’s Distance (Cook 1977), and Welsch Distance (Welsch 1982)
are three attempts to summarize the information in the leverage versus residual-squared plot into one
statistic. That is, the goal is to create an index that is affected by the size of the residuals (outliers) and
the size of $h_i$ (leverage). Viewed mechanically, one way to write DFITS (Bollen and Jackman 1990,
265) is
$\mathrm{DFITS}_i = r_i \sqrt{\frac{h_i}{1 - h_i}}$
where $r_i$ are the Studentized residuals. Thus large residuals increase the value of DFITS, as do large
values of $h_i$. Viewed more traditionally, DFITS is a scaled difference between predicted values for
the $i$th case when the regression is fit with and without the $i$th observation, hence the name.
The mechanical relationship between DFITS and Cook's Distance, $D_i$ (Bollen and Jackman 1990,
266), is
$D_i = \frac{1}{k}\,\frac{s_{(i)}^2}{s^2}\,\mathrm{DFITS}_i^2$
where $k$ is the number of variables (including the constant) in the regression, $s$ is the root mean
squared error of the regression, and $s_{(i)}$ is the root mean squared error when the $i$th observation is
omitted. Viewed more traditionally, $D_i$ is a scaled measure of the distance between the coefficient
vectors when the $i$th observation is omitted.
The mechanical relationship between DFITS and Welsch's Distance, $W_i$ (Chatterjee and Hadi 1988,
123), is
$W_i = \mathrm{DFITS}_i \sqrt{\frac{n - 1}{1 - h_i}}$
The interpretation of $W_i$ is more difficult, as it is based on the empirical influence curve. Although
DFITS and Cook's distance are similar, the Welsch distance measure includes another normalization
by leverage.
Belsley, Kuh, and Welsch (1980, 28) suggest that DFITS values greater than $2\sqrt{k/n}$ deserve more
investigation, and so values of Cook's distance greater than $4/n$ should also be examined (Bollen
and Jackman 1990, 265-266). Through similar logic, the cutoff for Welsch distance is approximately
$3\sqrt{k}$ (Chatterjee and Hadi 1988, 124).
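Because DFITS is defined directly from the Studentized residual and the leverage, it can be reproduced by hand; the following is a minimal sketch (rstu, h, dfits_chk, and dfits_st are our variable names):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price weight foreign##c.mpg
. predict double rstu, rstudent
. predict double h, hat
. generate double dfits_chk = rstu*sqrt(h/(1 - h))   // r_i*sqrt(h_i/(1-h_i))
. predict double dfits_st, dfits
. assert reldif(dfits_chk, dfits_st) < 1e-6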
Example 6: DFITS influence measure
Continuing with our model of price on weight and foreign##c.mpg, we can obtain the DFITS
influence measure:
. predict e if e(sample), resid
. predict dfits, dfits
We did not specify if e(sample) in computing the DFITS statistic. DFITS is available only over the
estimation sample, so specifying if e(sample) would have been redundant. It would have done no
harm, but it would not have changed the results.
Our model has $k = 5$ independent variables ($k$ includes the constant) and $n = 74$ observations;
following the $2\sqrt{k/n}$ cutoff advice, we type
. list make price e dfits if abs(dfits) > 2*sqrt(5/74), divider
make price e dfits
12. Cad. Eldorado 14,500 7271.96 .9564455
13. Cad. Seville 15,906 5036.348 1.356619
24. Ford Fiesta 4,389 3164.872 .5724172
27. Linc. Mark V 13,594 3109.193 .5200413
28. Linc. Versailles 13,466 6560.912 .8760136
42. Plym. Arrow 4,647 -3312.968 -.9384231
We calculate Cook’s distance and list the observations greater than the suggested 4/n cutoff:
. predict cooksd if e(sample), cooksd
. list make price e cooksd if cooksd > 4/74, divider
make price e cooksd
12. Cad. Eldorado 14,500 7271.96 .1492676
13. Cad. Seville 15,906 5036.348 .3328515
24. Ford Fiesta 4,389 3164.872 .0638815
28. Linc. Versailles 13,466 6560.912 .1308004
42. Plym. Arrow 4,647 -3312.968 .1700736
Here we used if e(sample) because Cook’s distance is not restricted to the estimation sample by
default. It is worth comparing this list with the preceding one.
Finally, we use Welsch distance and the suggested 3kcutoff:
. predict wd, welsch
. list make price e wd if abs(wd) > 3*sqrt(5), divider
make price e wd
12. Cad. Eldorado 14,500 7271.96 8.394372
13. Cad. Seville 15,906 5036.348 12.81125
28. Linc. Versailles 13,466 6560.912 7.703005
42. Plym. Arrow 4,647 -3312.968 -8.981481
Here we did not need to specify if e(sample) because welsch automatically restricts the prediction
to the estimation sample.
COVRATIO
COVRATIO (Belsley, Kuh, and Welsch 1980) measures the influence of the ith observation by
considering the effect on the variance-covariance matrix of the estimates. The measure is the ratio
of the determinants of the covariance matrices computed with and without the $i$th observation. The resulting
formula is
$\mathrm{COVRATIO}_i = \frac{1}{1 - h_i}\left(\frac{n - k - \hat{e}_i^2}{n - k - 1}\right)^{k}$
where $\hat{e}_i$ is the standardized residual.
For noninfluential observations, the value of COVRATIO is approximately 1. Large values of the
residuals or large values of leverage will cause deviations from 1, although if both are large, COVRATIO
may tend back toward 1 and therefore not identify such observations (Chatterjee and Hadi 1988, 139).
Belsley, Kuh, and Welsch (1980) suggest that observations for which
$|\mathrm{COVRATIO}_i - 1| \geq \frac{3k}{n}$
are worthy of further examination.
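The COVRATIO formula uses only the standardized residual, the leverage, and the sample dimensions, so it too can be reproduced by hand; the following is a minimal sketch under that reading of the formula (es, h, covr_chk, and covr_st are our variable names):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price weight foreign##c.mpg
. predict double es, rstandard
. predict double h, hat
. local n = e(N)
. local k = `n' - e(df_r)          // number of parameters, including the constant
. generate double covr_chk = (1/(1 - h))*((`n' - `k' - es^2)/(`n' - `k' - 1))^`k'
. predict double covr_st, covratio
. assert reldif(covr_chk, covr_st) < 1e-6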
Example 7: COVRATIO influence measure
Using our model of price on weight and foreign##c.mpg, we can obtain the COVRATIO
measure and list the observations outside the suggested cutoff by typing
. predict covr, covratio
. list make price e covr if abs(covr-1) >= 3*5/74, divider
make price e covr
12. Cad. Eldorado 14,500 7271.96 .3814242
13. Cad. Seville 15,906 5036.348 .7386969
28. Linc. Versailles 13,466 6560.912 .4761695
43. Plym. Champ 4,425 1621.747 1.27782
53. Audi 5000 9,690 591.2883 1.206842
57. Datsun 210 4,589 19.81829 1.284801
64. Peugeot 604 12,990 1037.184 1.348219
66. Subaru 3,798 -909.5894 1.264677
71. VW Diesel 5,397 999.7209 1.630653
74. Volvo 260 11,995 1327.668 1.211888
The covratio option automatically restricts the prediction to the estimation sample.
DFBETA influence statistics
Syntax for dfbeta
dfbeta [ indepvar [ indepvar [ ... ] ] ] [ , stub(name) ]
Menu for dfbeta
Statistics >Linear models and related >Regression diagnostics >DFBETAs
Description for dfbeta
dfbeta will calculate one, more than one, or all the DFBETAs after regress. Although predict
will also calculate DFBETAs, predict can do this for only one variable at a time. dfbeta is a
convenience tool for those who want to calculate DFBETAs for multiple variables. The names for the
new variables created are chosen automatically and begin with the letters _dfbeta_.
Option for dfbeta
stub(name) specifies the leading characters dfbeta uses to name the new variables to be generated.
The default is stub(_dfbeta_).
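For instance, the following minimal sketch names the generated variables with the arbitrary stub infl_ rather than the default _dfbeta_ (the stub name is our choice; any valid variable prefix works):
. quietly regress price weight foreign##c.mpg
. dfbeta, stub(infl_)        // creates infl_1, infl_2, ...
. summarize infl_*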
Remarks and examples for dfbeta
DFBETAs are perhaps the most direct influence measure of interest to model builders. DFBETAs
focus on one coefficient and measure the difference between the regression coefficient when the ith
observation is included and excluded, the difference being scaled by the estimated standard error of
the coefficient. Belsley, Kuh, and Welsch (1980, 28) suggest observations with $|\mathrm{DFBETA}_i| > 2/\sqrt{n}$
as deserving special attention, but it is also common practice to use 1 (Bollen and Jackman 1990,
267), meaning that the observation shifted the estimate at least one standard error.
Example 8: DFBETAs influence measure; the dfbeta() option
Using our model of price on weight and foreign##c.mpg, let’s first ask which observations
have the greatest impact on the determination of the coefficient on 1.foreign. We will use the
suggested $2/\sqrt{n}$ cutoff:
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. sort foreign make
. predict dfor, dfbeta(1.foreign)
. list make price foreign dfor if abs(dfor) > 2/sqrt(74), divider
make price foreign dfor
12. Cad. Eldorado 14,500 Domestic -.5290519
13. Cad. Seville 15,906 Domestic .8243419
28. Linc. Versailles 13,466 Domestic -.5283729
42. Plym. Arrow 4,647 Domestic -.6622424
43. Plym. Champ 4,425 Domestic .2371104
64. Peugeot 604 12,990 Foreign .2552032
69. Toyota Corona 5,719 Foreign -.256431
The Cadillac Seville shifted the coefficient on 1.foreign 0.82 standard errors!
Now let us ask which observations have the greatest effect on the mpg coefficient:
. predict dmpg, dfbeta(mpg)
. list make price mpg dmpg if abs(dmpg) > 2/sqrt(74), divider
make price mpg dmpg
12. Cad. Eldorado 14,500 14 -.5970351
13. Cad. Seville 15,906 21 1.134269
28. Linc. Versailles 13,466 14 -.6069287
42. Plym. Arrow 4,647 28 -.8925859
43. Plym. Champ 4,425 34 .3186909
Once again we see the Cadillac Seville heading the list, indicating that our regression results may be
dominated by this one car.
Example 9: DFBETAs influence measure; the dfbeta command
We can use predict, dfbeta() or the dfbeta command to generate the DFBETAs. dfbeta
makes up names for the new variables automatically and, without arguments, generates the DFBETAs
for all the variables in the regression:
. dfbeta
_dfbeta_1: dfbeta(weight)
_dfbeta_2: dfbeta(1.foreign)
_dfbeta_3: dfbeta(mpg)
_dfbeta_4: dfbeta(1.foreign#c.mpg)
dfbeta created four new variables in our dataset: _dfbeta_1, containing the DFBETAs for weight;
_dfbeta_2, containing the DFBETAs for 1.foreign; and so on. Had we wanted only the DFBETAs for mpg
and weight, we might have typed
. dfbeta mpg weight
_dfbeta_5: dfbeta(weight)
_dfbeta_6: dfbeta(mpg)
In the example above, we typed dfbeta mpg weight instead of dfbeta; if we had typed dfbeta
followed by dfbeta mpg weight, here is what would have happened:
. dfbeta
_dfbeta_7: dfbeta(weight)
_dfbeta_8: dfbeta(1.foreign)
_dfbeta_9: dfbeta(mpg)
_dfbeta_10: dfbeta(1.foreign#c.mpg)
. dfbeta mpg weight
_dfbeta_11: dfbeta(weight)
_dfbeta_12: dfbeta(mpg)
dfbeta would have made up different names for the new variables. dfbeta never replaces existing
variables; instead it makes up a different name, so we need to pay attention to dfbeta's output.
Tests for violation of assumptions
Syntax for estat hettest
estat hettest [ varlist ] [ , rhs [ normal | iid | fstat ] mtest[(spec)] ]
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat hettest
estat hettest performs three versions of the Breusch-Pagan (1979) and Cook-Weisberg (1983)
test for heteroskedasticity. All three versions of this test present evidence against the null hypothesis
that $t = 0$ in $\mathrm{Var}(e) = \sigma^2\exp(zt)$. In the normal version, performed by default, the null hypothesis
also includes the assumption that the regression disturbances are independent-normal draws with
variance σ2. The normality assumption is dropped from the null hypothesis in the iid and fstat
versions, which respectively produce the score and Ftests discussed in Methods and formulas. If
varlist is not specified, the fitted values are used for z. If varlist or the rhs option is specified, the
variables specified are used for z.
Options for estat hettest
rhs specifies that tests for heteroskedasticity be performed for the right-hand-side (explanatory)
variables of the fitted regression model. The rhs option may be combined with a varlist.
normal, the default, causes estat hettest to compute the original Breusch-Pagan/Cook-Weisberg
test, which assumes that the regression disturbances are normally distributed.
iid causes estat hettest to compute the $N \cdot R^2$ version of the score test that drops the normality
assumption.
fstat causes estat hettest to compute the F-statistic version that drops the normality assumption.
mtest(spec) specifies that multiple testing be performed. The argument specifies how p-values
are adjusted. The following specifications, spec, are supported:
bonferroni    Bonferroni's multiple testing adjustment
holm          Holm's multiple testing adjustment
sidak         Šidák's multiple testing adjustment
noadjust      no adjustment is made for multiple testing
mtest may be specified without an argument. This is equivalent to specifying mtest(noadjust);
that is, tests for the individual variables should be performed with unadjusted p-values. By default,
estat hettest does not perform multiple testing. mtest may not be specified with iid or
fstat.
Syntax for estat imtest
estat imtest [ , preserve white ]
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat imtest
estat imtest performs an information matrix test for the regression model and an orthogonal de-
composition into tests for heteroskedasticity, skewness, and kurtosis due to Cameron and Trivedi (1990);
White’s test for homoskedasticity against unrestricted forms of heteroskedasticity (1980) is available
as an option. White's test is usually similar to the first term of the Cameron-Trivedi decomposition.
Options for estat imtest
preserve specifies that the data in memory be preserved, all variables and cases that are not needed
in the calculations be dropped, and at the conclusion the original data be restored. This option is
costly for large datasets. However, because estat imtest has to perform an auxiliary regression
on k(k+1)/2 temporary variables, where kis the number of regressors, it may not be able to
perform the test otherwise.
white specifies that White’s original heteroskedasticity test also be performed.
Syntax for estat ovtest
estat ovtest [ , rhs ]
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat ovtest
estat ovtest performs two versions of the Ramsey (1969) regression specification-error test
(RESET) for omitted variables. This test amounts to fitting $y = xb + zt + u$ and then testing $t = 0$.
If the rhs option is not specified, powers of the fitted values are used for z. If rhs is specified,
powers of the individual elements of xare used.
Option for estat ovtest
rhs specifies that powers of the right-hand-side (explanatory) variables be used in the test rather than
powers of the fitted values.
Syntax for estat szroeter
estat szroeter [ varlist ] [ , rhs mtest(spec) ]
Either varlist or rhs must be specified.
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat szroeter
estat szroeter performs Szroeter’s rank test for heteroskedasticity for each of the variables in
varlist or for the explanatory variables of the regression if rhs is specified.
Options for estat szroeter
rhs specifies that tests for heteroskedasticity be performed for the right-hand-side (explanatory)
variables of the fitted regression model. Option rhs may be combined with a varlist.
mtest(spec) specifies that multiple testing be performed. The argument specifies how p-values are
adjusted. The following specifications, spec, are supported:
bonferroni    Bonferroni's multiple testing adjustment
holm          Holm's multiple testing adjustment
sidak         Šidák's multiple testing adjustment
noadjust      no adjustment is made for multiple testing
estat szroeter always performs multiple testing. By default, it does not adjust the p-values.
Remarks and examples for estat hettest, estat imtest, estat ovtest, and estat szroeter
We introduce some regression diagnostic commands that are designed to test for certain violations
that rvfplot (see [R] regress postestimation diagnostic plots) less formally attempts to detect. estat
ovtest provides Ramsey's test for omitted variables, that is, a pattern in the residuals. estat hettest
provides a test for heteroskedasticity, the increasing or decreasing variation in the residuals with
fitted values, with respect to the explanatory variables, or with respect to yet other variables. The score
test implemented in estat hettest (Breusch and Pagan 1979; Cook and Weisberg 1983) performs
a score test of the null hypothesis that $t = 0$ against the alternative hypothesis of multiplicative
heteroskedasticity. estat szroeter provides a rank test for heteroskedasticity, which is an alternative
to the score test computed by estat hettest. Finally, estat imtest computes an information
matrix test, including an orthogonal decomposition into tests for heteroskedasticity, skewness, and
kurtosis (Cameron and Trivedi 1990). The heteroskedasticity test computed by estat imtest is
similar to the general test for heteroskedasticity that was proposed by White (1980). Cameron and
Trivedi (2010, chap. 3) discuss most of these tests and provide more examples.
Example 10: estat ovtest, estat hettest, estat szroeter, and estat imtest
We use our model of price on weight and foreign##c.mpg.
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. estat ovtest
Ramsey RESET test using powers of the fitted values of price
Ho: model has no omitted variables
F(3, 66) = 7.77
Prob > F = 0.0002
. estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 6.50
Prob > chi2 = 0.0108
Testing for heteroskedasticity in the right-hand-side variables is requested by specifying the rhs
option. By specifying the mtest(bonferroni) option, we request that tests be conducted for each
of the variables, with a Bonferroni adjustment for the p-values to accommodate our testing multiple
hypotheses.
. estat hettest, rhs mtest(bonf)
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variable chi2 df p
weight 15.24 1 0.0004 #
foreign
Foreign 6.15 1 0.0525 #
mpg 9.04 1 0.0106 #
foreign#
c.mpg
Foreign 6.02 1 0.0566 #
simultaneous 15.60 4 0.0036
# Bonferroni-adjusted p-values
. estat szroeter, rhs mtest(holm)
Szroeter’s test for homoskedasticity
Ho: variance constant
Ha: variance monotonic in variable
Variable chi2 df p
weight 17.07 1 0.0001 #
foreign
Foreign 6.15 1 0.0131 #
mpg 11.45 1 0.0021 #
foreign#
c.mpg
Foreign 6.17 1 0.0260 #
# Holm-adjusted p-values
Finally, we request the information matrix test, which is a conditional moments test with second-,
third-, and fourth-order moment conditions.
. estat imtest
Cameron & Trivedi’s decomposition of IM-test
Source chi2 df p
Heteroskedasticity 18.86 10 0.0420
Skewness 11.69 4 0.0198
Kurtosis 2.33 1 0.1273
Total 32.87 15 0.0049
We find evidence for omitted variables, heteroskedasticity, and nonnormal skewness.
So, why bother with the various graphical commands when the tests seem so much easier to
interpret? In part, it is a matter of taste: both are designed to uncover the same problem, and both
are, in fact, going about it in similar ways. One is based on a formal calculation, whereas the other is
based on personal judgment in evaluating a graph. On the other hand, the tests are seeking evidence
of specific problems, whereas judgment is more general. The careful analyst will use both.
We performed the omitted-variable test first. Omitted variables are a more serious problem than
heteroskedasticity or the violations of higher moment conditions tested by estat imtest. If this
were not a manual, having found evidence of omitted variables, we would never have run the
estat hettest, estat szroeter, and estat imtest commands, at least not until we solved the
omitted-variable problem.
Technical note
estat ovtest and estat hettest both perform two flavors of their respective tests. By default,
estat ovtest looks for evidence of omitted variables by fitting the original model augmented by
$\hat{y}^2$, $\hat{y}^3$, and $\hat{y}^4$, which are powers of the fitted values from the original model. Under the assumption of no
misspecification, the coefficients on the powers of the fitted values will be zero. With the rhs option,
estat ovtest instead augments the original model with powers (second through fourth) of the
explanatory variables (except for dummy variables).
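Both flavors are run the same way; a minimal sketch using the model from example 10:
. quietly regress price weight foreign##c.mpg
. estat ovtest
(output omitted )
. estat ovtest, rhs
(output omitted )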
estat hettest, by default, looks for heteroskedasticity by modeling the variance as a function
of the fitted values. If, however, we specify a variable or variables, the variance will be modeled as
a function of the specified variables. In our example, if we had, a priori, some reason to suspect
heteroskedasticity and that the heteroskedasticity is a function of a car’s weight, then using a test that
focuses on weight would be more powerful than the more general tests such as White’s test or the
first term in the Cameron-Trivedi decomposition test.
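For instance, a test that focuses on weight alone would be requested as follows (a minimal sketch, again using the running price model):
. quietly regress price weight foreign##c.mpg
. estat hettest weight
(output omitted )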
estat hettest, by default, computes the original Breusch-Pagan/Cook-Weisberg test, which
includes the assumption of normally distributed errors. Koenker (1981) derived an $N \cdot R^2$ version
of this test that drops the normality assumption. Wooldridge (2013) gives an F-statistic version that
does not require the normality assumption.
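The three flavors are requested with the default, the iid option, and the fstat option; a minimal sketch:
. quietly regress price weight foreign##c.mpg
. estat hettest
(output omitted )
. estat hettest, iid
(output omitted )
. estat hettest, fstat
(output omitted )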
Stored results for estat hettest, estat imtest, and estat ovtest
estat hettest stores the following results for the (multivariate) score test in r():
Scalars
r(chi2)    χ² test statistic
r(df)      # df for the asymptotic χ² distribution under H0
r(p) p-value
estat hettest, fstat stores results for the (multivariate) score test in r():
Scalars
r(F) test statistic
r(df_m)    # df of the test for the F distribution under H0
r(df_r)    # df of the residuals for the F distribution under H0
r(p) p-value
estat hettest (if mtest is specified) and estat szroeter store the following in r():
Matrices
r(mtest) a matrix of test results, with rows corresponding to the univariate tests
mtest[.,1]   χ² test statistic
mtest[.,2]   # df
mtest[.,3] unadjusted p-value
mtest[.,4] adjusted p-value (if an mtest() adjustment method is specified)
Macros
r(mtmethod) adjustment method for p-values
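When mtest() is specified, the per-variable results can be retrieved from the stored matrix; a minimal sketch:
. quietly regress price weight foreign##c.mpg
. quietly estat hettest, rhs mtest(bonferroni)
. matrix list r(mtest)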
estat imtest stores the following in r():
Scalars
r(chi2_t)   IM-test statistic (= r(chi2_h) + r(chi2_s) + r(chi2_k))
r(df_t)     df for limiting χ² distribution under H0 (= r(df_h) + r(df_s) + r(df_k))
r(chi2_h)   heteroskedasticity test statistic
r(df_h)     df for limiting χ² distribution under H0
r(chi2_s)   skewness test statistic
r(df_s)     df for limiting χ² distribution under H0
r(chi2_k)   kurtosis test statistic
r(df_k)     df for limiting χ² distribution under H0
r(chi2_w)   White's heteroskedasticity test (if white specified)
r(df_w)     df for limiting χ² distribution under H0
estat ovtest stores the following in r():
Scalars
r(p) two-sided p-value
r(F)      F statistic
r(df)     degrees of freedom
r(df_r)   residual degrees of freedom
Variance inflation factors
Syntax for estat vif
estat vif [ , uncentered ]
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat vif
estat vif calculates the centered or uncentered variance inflation factors (VIFs) for the independent
variables specified in a linear regression model.
Option for estat vif
uncentered requests computation of the uncentered variance inflation factors. This option is
often used to detect the collinearity of the regressors with the constant. estat vif, uncentered
may be used after regression models fit without the constant term.
Remarks and examples for estat vif
Problems arise in regression when the predictors are highly correlated. In this situation, there may
be a significant change in the regression coefficients if you add or delete an independent variable.
The estimated standard errors of the fitted coefficients are inflated, or the estimated coefficients may
not be statistically significant even though a statistical relation exists between the dependent and
independent variables.
Data analysts rely on these facts to check informally for the presence of multicollinearity. estat
vif, another command for use after regress, calculates the variance inflation factors and tolerances
for each of the independent variables.
The output shows the variance inflation factors together with their reciprocals. Some analysts
compare the reciprocals with a predetermined tolerance. In the comparison, if the reciprocal of the
VIF is smaller than the tolerance, the associated predictor variable is removed from the regression
model. However, most analysts rely on informal rules of thumb applied to the VIF; see Chatterjee
and Hadi (2012). According to these rules, there is evidence of multicollinearity if
1. The largest VIF is greater than 10 (some choose a more conservative threshold value of 30).
2. The mean of all the VIFs is considerably larger than 1.
Example 11: estat vif
We examine a regression model fit using the ubiquitous automobile dataset:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price mpg rep78 trunk headroom length turn displ gear_ratio
Source SS df MS Number of obs = 69
F( 8, 60) = 6.33
Model 264102049 8 33012756.2 Prob > F = 0.0000
Residual 312694909 60 5211581.82 R-squared = 0.4579
Adj R-squared = 0.3856
Total 576796959 68 8482308.22 Root MSE = 2282.9
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
mpg -144.84 82.12751 -1.76 0.083 -309.1195 19.43948
rep78 727.5783 337.6107 2.16 0.035 52.25638 1402.9
trunk 44.02061 108.141 0.41 0.685 -172.2935 260.3347
headroom -807.0996 435.5802 -1.85 0.069 -1678.39 64.19062
length -8.688914 34.89848 -0.25 0.804 -78.49626 61.11843
turn -177.9064 137.3455 -1.30 0.200 -452.6383 96.82551
displacement 30.73146 7.576952 4.06 0.000 15.5753 45.88762
gear_ratio 1500.119 1110.959 1.35 0.182 -722.1303 3722.368
_cons 6691.976 7457.906 0.90 0.373 -8226.058 21610.01
. estat vif
Variable VIF 1/VIF
length 8.22 0.121614
displacement 6.50 0.153860
turn 4.85 0.205997
gear_ratio 3.45 0.290068
mpg 3.03 0.330171
trunk 2.88 0.347444
headroom 1.80 0.554917
rep78 1.46 0.686147
Mean VIF 4.02
The results are mixed. Although we have no VIFs greater than 10, the mean VIF is greater than 1,
though not considerably so. We could continue the investigation of collinearity, but given that other
authors advise that collinearity is a problem only when VIFs exist that are greater than 30 (contradicting
our rule above), we will not do so here.
Example 12: estat vif, with strong evidence of multicollinearity
This example comes from a dataset described in Kutner, Nachtsheim, and Neter (2004, 257) that
examines body fat as modeled by caliper measurements on the triceps, midarm, and thigh.
. use http://www.stata-press.com/data/r13/bodyfat
(Body Fat)
. regress bodyfat tricep thigh midarm
Source SS df MS Number of obs = 20
F( 3, 16) = 21.52
Model 396.984607 3 132.328202 Prob > F = 0.0000
Residual 98.4049068 16 6.15030667 R-squared = 0.8014
Adj R-squared = 0.7641
Total 495.389513 19 26.0731323 Root MSE = 2.48
bodyfat Coef. Std. Err. t P>|t| [95% Conf. Interval]
triceps 4.334085 3.015511 1.44 0.170 -2.058512 10.72668
thigh -2.856842 2.582015 -1.11 0.285 -8.330468 2.616785
midarm -2.186056 1.595499 -1.37 0.190 -5.568362 1.19625
_cons 117.0844 99.78238 1.17 0.258 -94.44474 328.6136
. estat vif
Variable VIF 1/VIF
triceps 708.84 0.001411
thigh 564.34 0.001772
midarm 104.61 0.009560
Mean VIF 459.26
Here we see strong evidence of multicollinearity in our model. More investigation reveals that the
measurements on the thigh and the triceps are highly correlated:
. correlate triceps thigh midarm
(obs=20)
triceps thigh midarm
triceps 1.0000
thigh 0.9238 1.0000
midarm 0.4578 0.0847 1.0000
If we remove the predictor tricep from the model (because it had the highest VIF), we get
. regress bodyfat thigh midarm
Source SS df MS Number of obs = 20
F( 2, 17) = 29.40
Model 384.279748 2 192.139874 Prob > F = 0.0000
Residual 111.109765 17 6.53586854 R-squared = 0.7757
Adj R-squared = 0.7493
Total 495.389513 19 26.0731323 Root MSE = 2.5565
bodyfat Coef. Std. Err. t P>|t| [95% Conf. Interval]
thigh .8508818 .1124482 7.57 0.000 .6136367 1.088127
midarm .0960295 .1613927 0.60 0.560 -.2444792 .4365383
_cons -25.99696 6.99732 -3.72 0.002 -40.76001 -11.2339
. estat vif
Variable VIF 1/VIF
midarm 1.01 0.992831
thigh 1.01 0.992831
Mean VIF 1.01
Note how the coefficients change and how the estimated standard errors for each of the regression
coefficients become much smaller. The calculated value of R2for the overall regression for the
subset model does not appreciably decline when we remove the correlated predictor. Removing an
independent variable from the model is one way to deal with multicollinearity. Other methods include
ridge regression, weighted least squares, and restricting the use of the fitted model to data that follow
the same pattern of multicollinearity. In economic studies, it is sometimes possible to estimate the
regression coefficients from different subsets of the data by using cross-section and time series.
All examples above demonstrated the use of centered VIFs. As pointed out by Belsley (1991), the
centered VIFs may fail to discover collinearity involving the constant term. One solution is to use the
uncentered VIFs instead. According to the definition of the uncentered VIFs, the constant is viewed
as a legitimate explanatory variable in a regression model, which allows one to obtain the VIF value
for the constant term.
Example 13: estat vif, with strong evidence of collinearity with the constant term
Consider the extreme example in which one of the regressors is highly correlated with the constant.
We simulate the data and examine both centered and uncentered VIF diagnostics after fitting the
regression model as follows.
. use http://www.stata-press.com/data/r13/extreme_collin
. regress y one x z
Source SS df MS Number of obs = 100
F( 3, 96) = 2710.27
Model 223801.985 3 74600.6617 Prob > F = 0.0000
Residual 2642.42124 96 27.5252213 R-squared = 0.9883
Adj R-squared = 0.9880
Total 226444.406 99 2287.31723 Root MSE = 5.2464
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
one -3.278582 10.5621 -0.31 0.757 -24.24419 17.68702
x 2.038696 .0242673 84.01 0.000 1.990526 2.086866
z 4.863137 .2681036 18.14 0.000 4.330956 5.395319
_cons 9.760075 10.50935 0.93 0.355 -11.10082 30.62097
. estat vif
Variable VIF 1/VIF
z 1.03 0.968488
x 1.03 0.971307
one 1.00 0.995425
Mean VIF 1.02
. estat vif, uncentered
Variable VIF 1/VIF
one 402.94 0.002482
intercept 401.26 0.002492
z 2.93 0.341609
x 1.13 0.888705
Mean VIF 202.06
According to the values of the centered VIFs (1.03, 1.03, 1.00), no harmful collinearity is detected
in the model. However, by the construction of these simulated data, we know that one is highly
collinear with the constant term. As such, the large values of uncentered VIFs for one (402.94) and
intercept (401.26) reveal high collinearity of the variable one with the constant term.
Measures of effect size
Syntax for estat esize
estat esize [ , omega level(#) ]
Menu for estat
Statistics >Postestimation >Reports and statistics
Description for estat esize
estat esize calculates effect sizes for linear models after regress or anova. By default, estat
esize reports η² (eta-squared) estimates (Kerlinger 1964), which are equivalent to R² estimates. If
the option omega is specified, estat esize reports ω² estimates (Hays 1963), which are equivalent
to adjusted R² estimates. Confidence intervals for η² and ω² estimates are estimated by using
the noncentral F distribution (Smithson 2001). See Kline (2013) or Thompson (2006) for further
information.
Options for estat esize
omega specifies that the ω² estimates of effect size be reported. The default is η² estimates.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
Remarks and examples for estat esize
Whereas p-values are used to assess the statistical significance of a result, measures of effect
size are used to assess the practical significance of a result. Effect sizes can be broadly categorized
as "measures of group differences" (the d family) and "measures of association" (the r family);
see Ellis (2010, table 1.1). The d family includes estimators such as Cohen's D, Hedges's G, and
Glass's Δ (also see [R] esize). The r family includes estimators such as the point-biserial correlation
coefficient, ω², and η². For an introduction to the concepts and calculation of effect sizes, see
Kline (2013) or Thompson (2006). For a more detailed discussion, see Kirk (1996), Ellis (2010),
Cumming (2012), Grissom and Kim (2012), and Kelley and Preacher (2012).
Example 14: Calculating effect sizes for a linear regression model
Suppose we fit a linear regression model for low-birthweight infants.
. use http://www.stata-press.com/data/r13/lbw
(Hosmer & Lemeshow data)
. regress bwt smoke i.race
Source SS df MS Number of obs = 189
F( 3, 185) = 8.69
Model 12346897.6 3 4115632.54 Prob > F = 0.0000
Residual 87568400.9 185 473342.708 R-squared = 0.1236
Adj R-squared = 0.1094
Total 99915298.6 188 531464.354 Root MSE = 688
bwt Coef. Std. Err. t P>|t| [95% Conf. Interval]
smoke -428.0254 109.0033 -3.93 0.000 -643.0746 -212.9761
race
black -450.54 153.066 -2.94 0.004 -752.5194 -148.5607
other -454.1813 116.436 -3.90 0.000 -683.8944 -224.4683
_cons 3334.858 91.74301 36.35 0.000 3153.86 3515.855
We can use the estat esize command to calculate η² for the entire model and a partial η² for
each term in the model.
. estat esize
Effect sizes for linear models
Source Eta-Squared df [95% Conf. Interval]
Model .1235736 3 .0399862 .2041365
smoke .0769345 1 .0193577 .1579213
race .0908394 2 .0233037 .1700334
The omega option causes estat esize to report ω² and partial ω².
. estat esize, omega
Effect sizes for linear models
Source Omega-Squared df [95% Conf. Interval]
Model .1093613 3 .0244184 .1912306
smoke .0719449 1 .0140569 .1533695
race .0810106 2 .0127448 .1610608
Example 15: Calculating effect size for an ANOVA model
We can use estat esize after ANOVA models as well.
. anova bwt smoke race
Number of obs = 189 R-squared = 0.1236
Root MSE = 687.999 Adj R-squared = 0.1094
Source Partial SS df MS F Prob > F
Model 12346897.6 3 4115632.54 8.69 0.0000
smoke 7298536.57 1 7298536.57 15.42 0.0001
race 8749453.3 2 4374726.65 9.24 0.0001
Residual 87568400.9 185 473342.708
Total 99915298.6 188 531464.354
. estat esize
Effect sizes for linear models
Source Eta-Squared df [95% Conf. Interval]
Model .1235736 3 .0399862 .2041365
smoke .0769345 1 .0193577 .1579213
race .0908394 2 .0233037 .1700334
Technical note
η² and ω² were developed in the context of analysis of variance. Thus, the published research on
the calculation of their confidence intervals focuses on cases where the numerator degrees of freedom
are relatively small (for example, df < 20).
Some combinations of the F statistic, numerator degrees of freedom, and denominator degrees of
freedom yield confidence limits that do not contain the corresponding estimated value for an η² or
ω². This problem is most commonly observed for larger numerator degrees of freedom.
Nothing in the literature suggests alternative methods for constructing confidence intervals in such
cases; therefore, we recommend cautious interpretation of confidence intervals for η² and ω² when
the numerator degrees of freedom are greater than 20.
Stored results for estat esize
estat esize stores the following results in r():
Scalars
r(level) confidence level
Matrices
r(esize)    a matrix of effect sizes, confidence intervals, degrees of freedom, and F statistics with rows
            corresponding to each term in the model
esize[.,1]   η²
esize[.,2]   lower confidence bound for η²
esize[.,3]   upper confidence bound for η²
esize[.,4]   ω²
esize[.,5]   lower confidence bound for ω²
esize[.,6]   upper confidence bound for ω²
esize[.,7]   numerator degrees of freedom
esize[.,8]   denominator degrees of freedom
esize[.,9]   F statistic
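The stored matrix can be inspected directly after the command; a minimal sketch using the model from example 14:
. use http://www.stata-press.com/data/r13/lbw, clear
. quietly regress bwt smoke i.race
. quietly estat esize
. matrix list r(esize)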
Methods and formulas
See Hamilton (2013, chap. 7), Kohler and Kreuter (2012, sec. 9.3), or Baum (2006, chap. 5) for
an overview of using Stata to perform regression diagnostics. See Peracchi (2001, chap. 8) for a
mathematically rigorous discussion of diagnostics.
Methods and formulas are presented under the following headings:
predict
Special-interest postestimation commands
predict
Assume that you have already fit the regression model
$y = Xb + e$
where $X$ is $n \times k$.
Denote the previously estimated coefficient vector by $b$ and its estimated variance matrix by $V$.
predict works by recalling various aspects of the model, such as $b$, and combining that information
with the data currently in memory. Let $x_j$ be the $j$th observation currently in memory, and let $s^2$ be
the mean squared error of the regression.
If the user specified weights in regress, then $X'X$ in the following formulas is replaced by
$X'DX$, where $D$ is defined in Coefficient estimation and ANOVA table under Methods and formulas
in [R] regress.
Let $V = s^2 (X'X)^{-1}$. Let $k$ be the number of independent variables including the intercept, if
any, and let $y_j$ be the observed value of the dependent variable.
The predicted value (xb option) is defined as $\hat{y}_j = x_j b$.
Let $\ell_j$ represent a lower bound for an observation $j$ and $u_j$ represent an upper bound. The
probability that $y_j | x_j$ would be observed in the interval $(\ell_j, u_j)$, the pr($\ell$,$u$) option, is
$P(\ell_j, u_j) = \Pr(\ell_j < x_j b + e_j < u_j) = \Phi\!\left(\frac{u_j - \hat{y}_j}{s}\right) - \Phi\!\left(\frac{\ell_j - \hat{y}_j}{s}\right)$
where for the pr($\ell$,$u$), e($\ell$,$u$), and ystar($\ell$,$u$) options, $\ell_j$ and $u_j$ can be anywhere in the
range $(-\infty, +\infty)$.
The option e($\ell$,$u$) computes the expected value of $y_j | x_j$ conditional on $y_j | x_j$ being in the
interval $(\ell_j, u_j)$, that is, when $y_j | x_j$ is truncated. It can be expressed as
$E(\ell_j, u_j) = E(x_j b + e_j \mid \ell_j < x_j b + e_j < u_j) = \hat{y}_j - s\,\frac{\phi\!\left(\frac{u_j - \hat{y}_j}{s}\right) - \phi\!\left(\frac{\ell_j - \hat{y}_j}{s}\right)}{\Phi\!\left(\frac{u_j - \hat{y}_j}{s}\right) - \Phi\!\left(\frac{\ell_j - \hat{y}_j}{s}\right)}$
where $\phi$ is the normal density and $\Phi$ is the cumulative normal.
You can also compute ystar($\ell$,$u$), the expected value of $y_j | x_j$, where $y_j$ is assumed censored
at $\ell_j$ and $u_j$:
$y_j^* = \begin{cases} \ell_j & \text{if } x_j b + e_j \leq \ell_j \\ x_j b + e_j & \text{if } \ell_j < x_j b + e_j < u_j \\ u_j & \text{if } x_j b + e_j \geq u_j \end{cases}$
This computation can be expressed in several ways, but the most intuitive formulation involves a
combination of the two statistics just defined:
$y_j^* = P(-\infty, \ell_j)\,\ell_j + P(\ell_j, u_j)\,E(\ell_j, u_j) + P(u_j, +\infty)\,u_j$
A diagonal element of the projection matrix (hat) or (leverage) is given by
$h_j = x_j (X'X)^{-1} x_j'$
The standard error of the prediction (the stdp option) is defined as $s_{p_j} = \sqrt{x_j V x_j'}$
and can also be written as $s_{p_j} = s\sqrt{h_j}$.
The standard error of the forecast (stdf) is defined as $s_{f_j} = s\sqrt{1 + h_j}$.
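The identity $s_{p_j} = s\sqrt{h_j}$ can be confirmed numerically; a minimal sketch using the automobile data (h, sp, and h_chk are our variable names):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress mpg weight foreign
. predict double h, hat
. predict double sp, stdp
. generate double h_chk = (sp/e(rmse))^2   // h_j recovered from stdp and s
. assert reldif(h, h_chk) < 1e-6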
The standard error of the residual (stdr) is defined as $s_{r_j} = s\sqrt{1 - h_j}$.
The residuals (residuals) are defined as $\hat{e}_j = y_j - \hat{y}_j$.
The standardized residuals (rstandard) are defined as $\hat{e}_{s_j} = \hat{e}_j / s_{r_j}$.
The Studentized residuals (rstudent) are defined as
$r_j = \frac{\hat{e}_j}{s_{(j)}\sqrt{1 - h_j}}$
where $s_{(j)}$ represents the root mean squared error with the $j$th observation removed, which is given
by
$s_{(j)}^2 = \frac{s^2 (T - k)}{T - k - 1} - \frac{\hat{e}_j^2}{(T - k - 1)(1 - h_j)}$
Cook's $D$ (cooksd) is given by
$D_j = \frac{\hat{e}_{s_j}^2 \,(s_{p_j}/s_{r_j})^2}{k} = \frac{h_j \hat{e}_j^2}{k s^2 (1 - h_j)^2}$
DFITS (dfits) is given by
$\mathrm{DFITS}_j = r_j \sqrt{\frac{h_j}{1 - h_j}}$
Welsch distance (welsch) is given by
$W_j = r_j \frac{\sqrt{h_j (n - 1)}}{1 - h_j}$
COVRATIO (covratio) is given by
$\mathrm{COVRATIO}_j = \frac{1}{1 - h_j}\left(\frac{n - k - \hat{e}_j^2}{n - k - 1}\right)^{k}$
The DFBETAs (dfbeta) for a particular regressor $x_i$ are given by
$\mathrm{DFBETA}_j = \frac{r_j u_j}{\sqrt{U^2 (1 - h_j)}}$
where $u_j$ are the residuals obtained from a regression of $x_i$ on the remaining $x$s and $U^2 = \sum_j u_j^2$.
Special-interest postestimation commands
The omitted-variable test (Ramsey 1969) reported by estat ovtest fits the regression
$y_i = x_i b + z_i t + u_i$ and then performs a standard $F$ test of $t = 0$. The default test uses
$z_i = (\hat{y}_i^2, \hat{y}_i^3, \hat{y}_i^4)$. If rhs is specified, $z_i = (x_{1i}^2, x_{1i}^3, x_{1i}^4, x_{2i}^2, \ldots, x_{mi}^4)$. In either case, the
variables are normalized to have minimum 0 and maximum 1 before powers are calculated.
The test for heteroskedasticity (Breusch and Pagan 1979; Cook and Weisberg 1983) models
$\mathrm{Var}(e_i) = \sigma^2 \exp(zt)$, where $z$ is a variable list specified by the user, the list of right-hand-side
variables, or the fitted values $x\hat{\beta}$. The test is of $t = 0$. Mechanically, estat hettest fits the
augmented regression $\hat{e}_i^2/\hat{\sigma}^2 = a + z_i t + v_i$.
The original Breusch-Pagan/Cook-Weisberg version of the test assumes that the $e_i$ are normally
distributed under the null hypothesis, which implies that the score test statistic $S$ is equal to the model
sum of squares from the augmented regression divided by 2. Under the null hypothesis, $S$ has the
$\chi^2$ distribution with $m$ degrees of freedom, where $m$ is the number of columns of $z$.
Koenker (1981) derived a score test of the null hypothesis that $t = 0$ under the assumption that
the $e_i$ are independent and identically distributed (i.i.d.). Koenker showed that $S = N R^2$ has a
large-sample $\chi^2$ distribution with $m$ degrees of freedom, where $N$ is the number of observations,
$R^2$ is the R-squared in the augmented regression, and $m$ is the number of columns of $z$. estat
hettest, iid produces this version of the test.
Wooldridge (2013) showed that an $F$ test of $t = 0$ in the augmented regression can also be used
under the assumption that the $e_i$ are i.i.d. estat hettest, fstat produces this version of the test.
Szroeter's class of tests for homoskedasticity against the alternative that the residual variance
increases in some variable $x$ is defined in terms of
$H = \frac{\sum_{i=1}^{n} h(x_i) e_i^2}{\sum_{i=1}^{n} e_i^2}$
where $h(x)$ is some weight function that increases in $x$ (Szroeter 1978). $H$ is a weighted average
of the $h(x)$, with the squared residuals serving as weights. Under homoskedasticity, $H$ should be
approximately equal to the unweighted average of $h(x)$. Large values of $H$ suggest that $e_i^2$ tends to be
large where $h(x)$ is large; that is, the variance indeed increases in $x$, whereas small values of $H$ suggest
that the variance actually decreases in $x$. estat szroeter uses $h(x_i) = \mathrm{rank}(x_i \text{ in } x_1 \ldots x_n)$; see
Judge et al. (1985, 452) for details. estat szroeter displays a normalized version of $H$,
$Q = \sqrt{\frac{6n}{n^2 - 1}}\left(H - \frac{n + 1}{2}\right)$
which is approximately $N(0,1)$ distributed under the null (homoskedasticity).
estat hettest and estat szroeter provide adjustments of p-values for multiple testing. The
supported methods are described in [R]test.
estat imtest performs the information matrix test for the regression model, as well as an
orthogonal decomposition into tests for heteroskedasticity $\delta_1$, nonnormal skewness $\delta_2$, and nonnormal
kurtosis $\delta_3$ (Cameron and Trivedi 1990; Long and Trivedi 1993). The decomposition is obtained via
three auxiliary regressions. Let $e$ be the regression residuals, $\hat{\sigma}^2$ be the maximum likelihood estimate
of $\sigma^2$ in the regression, $n$ be the number of observations, $X$ be the set of $k$ variables specified with
estat imtest, and $R^2_{un}$ be the uncentered $R^2$ from a regression. $\delta_1$ is obtained as $n R^2_{un}$ from a
regression of $e^2 - \hat{\sigma}^2$ on the cross products of the variables in $X$. $\delta_2$ is computed as $n R^2_{un}$ from a
regression of $e^3 - 3\hat{\sigma}^2 e$ on $X$. Finally, $\delta_3$ is obtained as $n R^2_{un}$ from a regression of $e^4 - 6\hat{\sigma}^2 e^2 - 3\hat{\sigma}^4$
on $X$. $\delta_1$, $\delta_2$, and $\delta_3$ are asymptotically $\chi^2$ distributed with $\frac{1}{2}k(k+1)$, $k$, and 1 degrees of freedom.
The information test statistic $\delta = \delta_1 + \delta_2 + \delta_3$ is asymptotically $\chi^2$ distributed with $\frac{1}{2}k(k+3)$
degrees of freedom. White's test for heteroskedasticity is computed as $nR^2$ from a regression of $\hat{u}^2$
on $X$ and the cross products of the variables in $X$. This test statistic is usually close to $\delta_1$.
estat vif calculates the centered variance inflation factor (VIF$_c$) (Chatterjee and Hadi 2012,
248-251) for $x_j$, given by
$\mathrm{VIF}_c(x_j) = \frac{1}{1 - \hat{R}_j^2}$
where $\hat{R}_j^2$ is the square of the centered multiple correlation coefficient that results when $x_j$ is regressed
with intercept against all the other explanatory variables.
The uncentered variance inflation factor (VIF$_{uc}$) (Belsley 1991, 28-29) for $x_j$ is given by
$\mathrm{VIF}_{uc}(x_j) = \frac{1}{1 - \tilde{R}_j^2}$
where $\tilde{R}_j^2$ is the square of the uncentered multiple correlation coefficient that results when $x_j$ is
regressed without intercept against all the other explanatory variables including the constant term.
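The centered VIF can be reproduced with the auxiliary regression it describes; the following is a minimal sketch for the length coefficient in the model of example 11 (the if e(sample) restriction keeps the auxiliary regression on the original estimation sample):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price mpg rep78 trunk headroom length turn displacement gear_ratio
. quietly regress length mpg rep78 trunk headroom turn displacement gear_ratio if e(sample)
. display 1/(1 - e(r2))          // centered VIF for length
The displayed value should match the VIF that estat vif reports for length.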
The methods and formulas for estat esize are described in Methods and formulas of [R]esize.
Acknowledgments
estat ovtest and estat hettest are based on programs originally written by Richard Goldstein
(1991,1992). estat imtest,estat szroeter, and the current version of estat hettest were
written by Jeroen Weesie of the Department of Sociology at Utrecht University, The Netherlands.
estat imtest is based in part on code written by J. Scott Long of the Department of Sociology at
Indiana University, coauthor of the Stata Press book Regression Models for Categorical and Limited
Dependent Variables, and author of the Stata Press book The Workflow of Data Analysis Using Stata.
References
Adkins, L. C., and R. C. Hill. 2011. Using Stata for Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Baum, C. F., N. J. Cox, and V. L. Wiggins. 2000. sg137: Tests for heteroskedasticity in regression error distribution.
Stata Technical Bulletin 55: 15–17. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 147–149. College
Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000a. sg135: Test for autoregressive conditional heteroskedasticity in regression
error distribution. Stata Technical Bulletin 55: 13–14. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp.
143–144. College Station, TX: Stata Press.
. 2000b. sg136: Tests for serial correlation in regression error distribution. Stata Technical Bulletin 55: 14–15.
Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 145–147. College Station, TX: Stata Press.
Belsley, D. A. 1991. Conditional Diagnostics: Collinearity and Weak Data in Regression. New York: Wiley.
Belsley, D. A., E. Kuh, and R. E. Welsch. 1980. Regression Diagnostics: Identifying Influential Data and Sources of
Collinearity. New York: Wiley.
Bollen, K. A., and R. W. Jackman. 1990. Regression diagnostics: An expository treatment of outliers and influential
cases. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long, 257–291. Newbury Park, CA: Sage.
Breusch, T. S., and A. R. Pagan. 1979. A simple test for heteroscedasticity and random coefficient variation.
Econometrica 47: 1287–1294.
Cameron, A. C., and P. K. Trivedi. 1990. The information matrix test and its applied alternative hypotheses. Working
paper 372, University of California–Davis, Institute of Governmental Affairs.
. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Chatterjee, S., and A. S. Hadi. 1986. Influential observations, high leverage points, and outliers in linear regression.
Statistical Science 1: 379–393.
. 1988. Sensitivity Analysis in Linear Regression. New York: Wiley.
. 2012. Regression Analysis by Example. 5th ed. Hoboken, NJ: Wiley.
Cook, R. D. 1977. Detection of influential observation in linear regression. Technometrics 19: 15–18.
Cook, R. D., and S. Weisberg. 1982. Residuals and Influence in Regression. New York: Chapman & Hall/CRC.
. 1983. Diagnostics for heteroscedasticity in regression. Biometrika 70: 1–10.
Cox, N. J. 2004. Speaking Stata: Graphing model diagnostics. Stata Journal 4: 449–475.
Cumming, G. 2012. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. New
York: Taylor & Francis.
DeMaris, A. 2004. Regression with Social Data: Modeling Continuous and Limited Response Variables. Hoboken,
NJ: Wiley.
Ellis, P. D. 2010. The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of
Research Results. Cambridge: Cambridge University Press.
Garrett, J. M. 2000. sg157: Predicted values calculated from linear or logistic regression models. Stata Technical
Bulletin 58: 27–30. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 258–261. College Station, TX:
Stata Press.
Goldstein, R. 1991. srd5: Ramsey test for heteroscedasticity and omitted variables. Stata Technical Bulletin 2: 27.
Reprinted in Stata Technical Bulletin Reprints, vol. 1, p. 177. College Station, TX: Stata Press.
. 1992. srd14: Cook–Weisberg test of heteroscedasticity. Stata Technical Bulletin 10: 27–28. Reprinted in Stata
Technical Bulletin Reprints, vol. 2, pp. 183–184. College Station, TX: Stata Press.
Grissom, R. J., and J. J. Kim. 2012. Effect Sizes for Research: Univariate and Multivariate Applications. 2nd ed.
New York: Taylor & Francis.
Hamilton, L. C. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hardin, J. W. 1995. sg32: Variance inflation factors and variance-decomposition proportions. Stata Technical Bulletin
24: 17–22. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 154–160. College Station, TX: Stata Press.
Hays, W. L. 1963. Statistics for Psychologists. New York: Holt, Rinehart & Winston.
Hill, R. C., W. E. Griffiths, and G. C. Lim. 2011. Principles of Econometrics. 4th ed. Hoboken, NJ: Wiley.
Hoaglin, D. C., and P. J. Kempthorne. 1986. Comment [on Chatterjee and Hadi 1986]. Statistical Science 1: 408–412.
Hoaglin, D. C., and R. E. Welsch. 1978. The hat matrix in regression and ANOVA. American Statistician 32: 17–22.
Huber, C. 2013. Measures of effect size in Stata 13. The Stata Blog: Not Elsewhere Classified.
http://blog.stata.com/2013/09/05/measures-of-effect-size-in-stata-13/.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Kelley, K., and K. J. Preacher. 2012. On effect size. Psychological Methods 17: 137–152.
Kerlinger, F. N. 1964. Foundations of Behavioral Research. New York: Holt, Rinehart & Winston.
Kirk, R. E. 1996. Practical significance: A concept whose time has come. Educational and Psychological Measurement
56: 746–759.
Kline, R. B. 2013. Beyond Significance Testing: Statistics Reform in the Behavioral Sciences. 2nd ed. Washington,
DC: American Psychological Association.
Koenker, R. 1981. A note on studentizing a test for heteroskedasticity. Journal of Econometrics 17: 107–112.
Kohler, U., and F. Kreuter. 2012. Data Analysis Using Stata. 3rd ed. College Station, TX: Stata Press.
Kutner, M. H., C. J. Nachtsheim, and J. Neter. 2004. Applied Linear Regression Models. 4th ed. New York:
McGraw–Hill/Irwin.
Lindsey, C., and S. J. Sheather. 2010a. Optimal power transformation via inverse response plots. Stata Journal 10:
200–214.
. 2010b. Model fit assessment via marginal model plots. Stata Journal 10: 215–225.
Long, J. S., and J. Freese. 2000. sg145: Scalar measures of fit for regression models. Stata Technical Bulletin 56:
34–40. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 197–205. College Station, TX: Stata Press.
Long, J. S., and P. K. Trivedi. 1993. Some specification tests for the linear regression model. Sociological Methods
and Research 21: 161–204. Reprinted in Testing Structural Equation Models, ed. K. A. Bollen and J. S. Long, pp.
66–110. Newbury Park, CA: Sage.
Peracchi, F. 2001. Econometrics. Chichester, UK: Wiley.
Ramsey, J. B. 1969. Tests for specification errors in classical linear least-squares regression analysis. Journal of the
Royal Statistical Society, Series B 31: 350–371.
Ramsey, J. B., and P. Schmidt. 1976. Some further results on the use of OLS and BLUS residuals in specification
error tests. Journal of the American Statistical Association 71: 389–390.
Rousseeuw, P. J., and A. M. Leroy. 1987. Robust Regression and Outlier Detection. New York: Wiley.
Smithson, M. 2001. Correct confidence intervals for various regression effect sizes and parameters: The importance
of noncentral distributions in computing intervals. Educational and Psychological Measurement 61: 605–632.
Szroeter, J. 1978. A class of parametric tests for heteroscedasticity in linear econometric models. Econometrica 46:
1311–1327.
Thompson, B. 2006. Foundations of Behavioral Statistics: An Insight-Based Approach. New York: Guilford Press.
Velleman, P. F. 1986. Comment [on Chatterjee and Hadi 1986]. Statistical Science 1: 412–413.
Velleman, P. F., and R. E. Welsch. 1981. Efficient computing of regression diagnostics. American Statistician 35:
234–242.
Weesie, J. 2001. sg161: Analysis of the turning point of a quadratic specification. Stata Technical Bulletin 60: 18–20.
Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 273–277. College Station, TX: Stata Press.
Weisberg, S. 2005. Applied Linear Regression. 3rd ed. New York: Wiley.
Welsch, R. E. 1982. Influence functions and regression diagnostics. In Modern Data Analysis, ed. R. L. Launer and
A. F. Siegel, 149–169. New York: Academic Press.
. 1986. Comment [on Chatterjee and Hadi 1986]. Statistical Science 1: 403–405.
Welsch, R. E., and E. Kuh. 1977. Linear Regression Diagnostics. Technical Report 923-77, Massachusetts Institute
of Technology, Cambridge, MA.
White, H. L., Jr. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity.
Econometrica 48: 817–838.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Also see
[R]regress Linear regression
[R]regress postestimation diagnostic plots Postestimation plots for regress
[R]regress postestimation time series Postestimation tools for regress with time series
[U] 20 Estimation and postestimation commands
Title
regress postestimation diagnostic plots — Postestimation plots for regress
Description rvfplot avplot avplots cprplot acprplot
rvpplot lvr2plot Methods and formulas References Also see
Description
The following postestimation commands are of special interest after regress:
Command Description
rvfplot residual-versus-fitted plot
avplot added-variable plot
avplots all added-variables plots in one image
cprplot component-plus-residual plot
acprplot augmented component-plus-residual plot
rvpplot residual-versus-predictor plot
lvr2plot leverage-versus-squared-residual plot
These commands are not appropriate after the svy prefix.
For a discussion of the terminology used in this entry, see the Terminology section of
Remarks and examples for predict in [R]regress postestimation.
rvfplot
Syntax for rvfplot
rvfplot [, rvfplot_options]
rvfplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Add plots
addplot(plot)add plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for rvfplot
Statistics >Linear models and related >Regression diagnostics >Residual-versus-fitted plot
Description for rvfplot
rvfplot graphs a residual-versus-fitted plot, a graph of the residuals against the fitted values.
Options for rvfplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Add plots
addplot(plot)provides a way to add plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for rvfplot
rvfplot graphs the residuals against the fitted values.
Example 1
Using auto.dta described in [U] 1.2.2 Example datasets, we will use regress to fit a
model of price on weight, mpg, foreign, and the interaction of foreign with mpg. We specify
foreign##c.mpg to obtain the interaction of foreign with mpg; see [U] 11.4.3 Factor variables.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign##c.mpg
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =   21.22
       Model |   350319665     4  87579916.3           Prob > F      =  0.0000
    Residual |   284745731    69  4126749.72           R-squared     =  0.5516
-------------+------------------------------           Adj R-squared =  0.5256
       Total |   635065396    73  8699525.97           Root MSE      =  2031.4

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       weight |   4.613589   .7254961     6.36   0.000     3.166263    6.060914
              |
      foreign |
     Foreign  |   11240.33   2751.681     4.08   0.000     5750.878    16729.78
          mpg |   263.1875   110.7961     2.38   0.020     42.15527    484.2197
              |
foreign#c.mpg |
     Foreign  |  -307.2166   108.5307    -2.83   0.006    -523.7294   -90.70368
              |
        _cons |  -14449.58    4425.72    -3.26   0.002    -23278.65    -5620.51
-------------------------------------------------------------------------------
Once we have fit a model, we may use any of the regression diagnostics commands. rvfplot
(read residual-versus-fitted plot) graphs the residuals against the fitted values:
. rvfplot, yline(0)
(graph: Residuals plotted against Fitted values, with a horizontal line at y = 0)
All the diagnostic plot commands allow the graph twoway and graph twoway scatter options;
we specified a yline(0) to draw a line across the graph at y=0; see [G-2]graph twoway scatter.
In a well-fitted model, there should be no pattern to the residuals plotted against the fitted values, something not true of our model. Ignoring the two outliers at the top center of the graph, we see curvature in the pattern of the residuals, suggesting a violation of the assumption that price is linear in our independent variables. We might also have seen increasing or decreasing variation in the residuals, that is, heteroskedasticity. Any pattern whatsoever indicates a violation of the least-squares assumptions.
avplot
Syntax for avplot
avplot indepvar [, avplot_options]
avplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for avplot
Statistics >Linear models and related >Regression diagnostics >Added-variable plot
Description for avplot
avplot graphs an added-variable plot (a.k.a. partial-regression leverage plot, partial regression
plot, or adjusted partial residual plot) after regress. indepvar may be an independent variable (a.k.a.
predictor, carrier, or covariate) that is currently in the model or not.
Options for avplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affects the rendition of the reference line. See [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for avplot
avplot graphs an added-variable plot, also known as the partial-regression leverage plot.
One of the wonderful features of one-regressor regressions (regressions of y on one x) is that we can graph the data and the regression line. There is no easier way to understand the regression than to examine such a graph. Unfortunately, we cannot do this when we have more than one regressor. With two regressors, it is still theoretically possible: the graph must be drawn in three dimensions, but with three or more regressors no graph is possible.
The added-variable plot is an attempt to project multidimensional data back to the two-dimensional
world for each of the original regressors. This is, of course, impossible without making some
concessions. Call the coordinates on an added-variable plot y and x. The added-variable plot has the following properties:
There is a one-to-one correspondence between $(x_i, y_i)$ and the ith observation used in the original regression.
A regression of y on x has the same coefficient and standard error (up to a degree-of-freedom adjustment) as the estimated coefficient and standard error for the regressor in the original regression.
The “outlierness” of each observation in determining the slope is in some sense preserved.
It is equally important to note the properties that are not listed. The y and x coordinates of the added-variable plot cannot be used to identify functional form, or, at least, not well (see Mallows [1986]). In the construction of the added-variable plot, the relationship between y and x is forced to be linear.
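These properties follow from how the plot is constructed: y and x are the residuals from regressing the dependent variable and the regressor of interest, each on all the other regressors. A minimal sketch of that construction for mpg in the model of example 1 (e_price and e_mpg are illustrative names):
. use http://www.stata-press.com/data/r13/auto, clear
. * residuals of price and of mpg, each on the remaining regressors
. quietly regress price weight i.foreign i.foreign#c.mpg
. predict double e_price, residuals
. quietly regress mpg weight i.foreign i.foreign#c.mpg
. predict double e_mpg, residuals
. * the slope (and, up to a degree-of-freedom adjustment, the standard error) should
. * match the coefficient on mpg from regress price weight foreign##c.mpg
. regress e_price e_mpg
. twoway scatter e_price e_mpg || lfit e_price e_mpg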
Example 2
Let’s use the same model as we used in example 1.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
We can now examine the added-variable plot for mpg.
. avplot mpg
(graph: added-variable plot for mpg; y axis e( price | X ), x axis e( mpg | X ); coef = 263.18749, se = 110.79612, t = 2.38)
This graph suggests a problem in determining the coefficient on mpg. Were this a one-regressor
regression, the two points at the top-left corner and the one at the top right would cause us concern,
and so they do in our more complicated multiple-regressor case. To identify the problem points, we
retyped our command, modifying it to read avplot mpg, mlabel(make), and discovered that the
two cars at the top left are the Cadillac Eldorado and the Lincoln Versailles; the point at the top right
is the Cadillac Seville. These three cars account for 100% of the luxury cars in our data, suggesting
that our model is misspecified. By the way, the point at the lower right of the graph, also cause for
concern, is the Plymouth Arrow, our data entry error.
Technical note
Stata’s avplot command can be used with regressors already in the model, as we just did, or
with potential regressors not yet in the model. In either case, avplot will produce the correct graph.
The name “added-variable plot” is unfortunate in the case when the variable is already among the list
of regressors but is, we think, still preferable to the name “partial-regression leverage plot” assigned
by Belsley, Kuh, and Welsch (1980, 30) and more in the spirit of the original use of such plots by
Mosteller and Tukey (1977, 271–279). Welsch (1986, 403), however, disagrees: “I am sorry to see
that Chatterjee and Hadi [1986] endorse the term ‘added-variable plot’ when $X_j$ is part of the original
model” and goes on to suggest the name “adjusted partial residual plot”.
avplots
Syntax for avplots
avplots [, avplots_options]
avplots options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
combine options any of the options documented in [G-2]graph combine
Reference line
rlopts(cline options)affect rendition of the reference line
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for avplots
Statistics >Linear models and related >Regression diagnostics >Added-variable plot
Description for avplots
avplots graphs all the added-variable plots in one image.
Options for avplots
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
combine options are any of the options documented in [G-2]graph combine. These include options for
titling the graph (see [G-3]title options) and for saving the graph to disk (see [G-3]saving option).
 
Reference line
rlopts(cline options)affects the rendition of the reference line. See [G-3]cline options.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for avplots
Example 3
In example 2, we used avplot to examine the added-variable plot for mpg in our regression of
price on weight and foreign##c.mpg. Now let’s use avplots to graph an added-variable plot
for every regressor in the data.
. avplots
(graphs: added-variable plots for all regressors, combined in one image; weight: coef = 4.6135886, se = .7254961, t = 6.36; 1.foreign: coef = 11240.331, se = 2751.6808, t = 4.08; mpg: coef = 263.18749, se = 110.79612, t = 2.38; 1.foreign#c.mpg: coef = −307.21656, se = 108.53072, t = −2.83)
cprplot
Syntax for cprplot
cprplot indepvar [, cprplot_options]
cprplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Options
lowess add a lowess smooth of the plotted points
lsopts(lowess options)affect rendition of the lowess smooth
mspline add median spline of the plotted points
msopts(mspline options)affect rendition of the spline
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for cprplot
Statistics >Linear models and related >Regression diagnostics >Component-plus-residual plot
Description for cprplot
cprplot graphs a component-plus-residual plot (a.k.a. partial residual plot) after regress. indepvar
must be an independent variable that is currently in the model.
Options for cprplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affects the rendition of the reference line. See [G-3]cline options.
 
Options
lowess adds a lowess smooth of the plotted points to assist in detecting nonlinearities.
lsopts(lowess options)affects the rendition of the lowess smooth. For an explanation of these
options, especially the bwidth() option, see [R]lowess. Specifying lsopts() implies the lowess
option.
mspline adds a median spline of the plotted points to assist in detecting nonlinearities.
msopts(mspline options)affects the rendition of the spline. For an explanation of these options,
especially the bands() option, see [G-2]graph twoway mspline. Specifying msopts() implies
the mspline option.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for cprplot
Added-variable plots are successful at identifying outliers, but they cannot be used to identify
functional form. The component-plus-residual plot (Ezekiel 1924; Larsen and McCleary 1972) is
another attempt at projecting multidimensional data into a two-dimensional form, but with different
properties. Although the added-variable plot can identify outliers, the component-plus-residual plot
cannot. It can, however, be used to examine the functional form assumptions of the model. Both plots
have the property that a regression line through the coordinates has a slope equal to the estimated
coefficient in the regression model.
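Equivalently, the component-plus-residual for a regressor is its estimated component plus the residual, so the plot can be built by hand; a minimal sketch for mpg in the model of example 4 below (uhat and cpr_mpg are illustrative names):
. use http://www.stata-press.com/data/r13/auto1, clear
. quietly regress price mpg weight
. predict double uhat, residuals
. * component-plus-residual for mpg: the coefficient on mpg times mpg, plus the residual
. generate double cpr_mpg = _b[mpg]*mpg + uhat
. * a regression line through these points has slope _b[mpg]
. twoway scatter cpr_mpg mpg || lfit cpr_mpg mpg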
Example 4
We illustrate component-plus-residual plots using a variation of auto.dta.
. use http://www.stata-press.com/data/r13/auto1
(Automobile Models)
. regress price mpg weight
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   14.90
       Model |   187716578     2    93858289           Prob > F      =  0.0000
    Residual |   447348818    71  6300687.58           R-squared     =  0.2956
-------------+------------------------------           Adj R-squared =  0.2757
       Total |   635065396    73  8699525.97           Root MSE      =  2510.1

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   -55.9393   75.24136    -0.74   0.460    -205.9663    94.08771
      weight |   1.710992   .5861682     2.92   0.005     .5422063    2.879779
       _cons |     2197.9   3190.768     0.69   0.493    -4164.311     8560.11
------------------------------------------------------------------------------
In fact, we know that the effects of mpg in this model are nonlinear: if we added mpg squared to the model, its coefficient would have a t statistic of 2.38, the t statistic on mpg would become 2.48, and weight's effect would become about one-third of its current value and become statistically insignificant. Pretend that we do not know this.
The component-plus-residual plot for mpg is
. cprplot mpg, mspline msopts(bands(13))
(graph: Component plus residual versus Mileage (mpg), with median spline)
We are supposed to examine the above graph for nonlinearities or, equivalently, ask if the regression
line, which has slope equal to the estimated effect of mpg in the original model, fits the data adequately.
To assist our eyes, we added a median spline. Perhaps some people may detect nonlinearity from this
graph, but we assert that if we had not previously revealed the nonlinearity of mpg and if we had not
added the median spline, the graph would not overly bother us.
acprplot
Syntax for acprplot
acprplot indepvar [, acprplot_options]
acprplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Reference line
rlopts(cline options)affect rendition of the reference line
Options
lowess add a lowess smooth of the plotted points
lsopts(lowess options)affect rendition of the lowess smooth
mspline add median spline of the plotted points
msopts(mspline options)affect rendition of the spline
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for acprplot
Statistics >Linear models and related >Regression diagnostics >Augmented component-plus-residual plot
Description for acprplot
acprplot graphs an augmented component-plus-residual plot (a.k.a. augmented partial residual
plot) as described by Mallows (1986). This seems to work better than the component-plus-residual
plot for identifying nonlinearities in the data.
Options for acprplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Reference line
rlopts(cline options)affects the rendition of the reference line. See [G-3]cline options.
 
Options
lowess adds a lowess smooth of the plotted points to assist in detecting nonlinearities.
lsopts(lowess options)affects the rendition of the lowess smooth. For an explanation of these
options, especially the bwidth() option, see [R]lowess. Specifying lsopts() implies the lowess
option.
mspline adds a median spline of the plotted points to assist in detecting nonlinearities.
msopts(mspline options)affects the rendition of the spline. For an explanation of these options,
especially the bands() option, see [G-2]graph twoway mspline. Specifying msopts() implies
the mspline option.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for acprplot
In the cprplot section above, we discussed the component-plus-residual plot. Mallows (1986) proposed an augmented component-plus-residual plot that is often more sensitive to detecting nonlinearity.
Example 5
Let’s compare the augmented component-plus-residual plot with the component-plus-residual plot
of example 4.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. acprplot mpg, mspline msopts(bands(13))
(graph: Augmented component plus residual versus Mileage (mpg), with median spline)
It does do somewhat better.
rvpplot
Syntax for rvpplot
rvpplot indepvar [, rvpplot_options]
rvpplot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for rvpplot
Statistics >Linear models and related >Regression diagnostics >Residual-versus-predictor plot
Description for rvpplot
rvpplot graphs a residual-versus-predictor plot (a.k.a. independent variable plot or carrier plot),
a graph of the residuals against the specified predictor.
Options for rvpplot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for rvpplot
The residual-versus-predictor plot is a simple way to look for violations of the regression assumptions.
If the assumptions are correct, there should be no pattern on the graph.
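Because the plot is simply the residuals graphed against one predictor, it can also be drawn directly; a minimal sketch (uhat is an illustrative name) that should produce essentially the same graph as the rvpplot command in example 6 below:
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price weight foreign##c.mpg
. predict double uhat, residuals
. twoway scatter uhat mpg, yline(0)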
Example 6
Let’s use our model of price on mpg and weight.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. rvpplot mpg, yline(0)
(graph: Residuals versus Mileage (mpg), with a horizontal line at y = 0)
Remember, any pattern counts as a problem, and in this graph, we see that the variation in the
residuals decreases as mpg increases.
lvr2plot
Syntax for lvr2plot
lvr2plot [, lvr2plot_options]
lvr2plot options Description
Plot
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
Menu for lvr2plot
Statistics >Linear models and related >Regression diagnostics >Leverage-versus-squared-residual plot
Description for lvr2plot
lvr2plot graphs a leverage-versus-squared-residual plot (a.k.a. L-R plot).
Options for lvr2plot
 
Plot
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
marker label options specify if and how the markers are to be labeled; see [G-3]marker label options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples for lvr2plot
One of the most useful diagnostic graphs is provided by lvr2plot (leverage-versus-residual-squared
plot), a graph of leverage against the (normalized) residuals squared.
Example 7
We illustrate lvr2plot using our model in example 1.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress price weight foreign##c.mpg
(output omitted )
. lvr2plot
(graph: Leverage versus Normalized residual squared, with reference lines at the means)
The lines on the chart show the average values of leverage and the (normalized) residuals squared.
Points above the horizontal line have higher-than-average leverage; points to the right of the vertical
line have larger-than-average residuals.
One point immediately catches our eye, and four more make us pause. The point at the top of the
graph has high leverage and a smaller-than-average residual. The other points that bother us all have
higher-than-average leverage, two with smaller-than-average residuals and two with larger-than-average
residuals.
A less pretty but more useful version of the above graph specifies that make be used as the symbol
(see [G-3]marker label options):
. lvr2plot, mlabel(make) mlabp(0) m(none) mlabsize(small)
(graph: Leverage versus Normalized residual squared, with each point labeled by make)
The VW Diesel, Plymouth Champ, Plymouth Arrow, and Peugeot 604 are the points that cause us the
most concern. When we further examine our data, we discover that the VW Diesel is the only diesel
in our data and that the data for the Plymouth Arrow were entered incorrectly into the computer. No
such simple explanations were found for the Plymouth Champ and Peugeot 604.
Methods and formulas
See Hamilton (2013, 209–214) and Kohler and Kreuter (2012, sec. 9.3) for a discussion of these
diagnostic graphs.
The lvr2plot command plots leverage against the squares of the normalized residuals. The normalized residuals are defined as $\widehat{e}_{nj} = \widehat{e}_j / \bigl(\sum_i \widehat{e}_i^2\bigr)^{1/2}$.
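A minimal sketch of this computation (lev, uhat, and enorm2 are illustrative names) uses the leverage and residuals available from predict together with the residual sum of squares stored by regress in e(rss); it should reproduce the plotted coordinates, though not the reference lines at the means:
. use http://www.stata-press.com/data/r13/auto, clear
. quietly regress price weight foreign##c.mpg
. predict double lev, leverage
. predict double uhat, residuals
. * squared normalized residual: squared residual divided by the residual sum of squares
. generate double enorm2 = uhat^2/e(rss)
. twoway scatter lev enorm2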
References
Belsley, D. A., E. Kuh, and R. E. Welsch. 1980. Regression Diagnostics: Identifying Influential Data and Sources of
Collinearity. New York: Wiley.
Chatterjee, S., and A. S. Hadi. 1986. Influential observations, high leverage points, and outliers in linear regression.
Statistical Science 1: 379–393.
Cox, N. J. 2004. Speaking Stata: Graphing model diagnostics. Stata Journal 4: 449–475.
Ezekiel, M. 1924. A method of handling curvilinear correlation for any number of variables. Journal of the American
Statistical Association 19: 431–453.
Hamilton, L. C. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hoaglin, D. C., and R. E. Welsch. 1978. The hat matrix in regression and ANOVA. American Statistician 32: 17–22.
Kohler, U., and F. Kreuter. 2012. Data Analysis Using Stata. 3rd ed. College Station, TX: Stata Press.
Larsen, W. A., and S. J. McCleary. 1972. The use of partial residual plots in regression analysis. Technometrics 14:
781–790.
Lindsey, C., and S. J. Sheather. 2010a. Optimal power transformation via inverse response plots. Stata Journal 10:
200–214.
. 2010b. Model fit assessment via marginal model plots. Stata Journal 10: 215–225.
Mallows, C. L. 1986. Augmented partial residuals. Technometrics 28: 313–319.
Mosteller, C. F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA:
Addison–Wesley.
Welsch, R. E. 1986. Comment [on Chatterjee and Hadi 1986]. Statistical Science 1: 403–405.
Also see
[R]regress Linear regression
[R]regress postestimation Postestimation tools for regress
[R]regress postestimation time series Postestimation tools for regress with time series
[U] 20 Estimation and postestimation commands
Title
regress postestimation time series — Postestimation tools for regress with time series
Description Syntax for estat archlm Options for estat archlm
Syntax for estat bgodfrey Options for estat bgodfrey Syntax for estat durbinalt
Options for estat durbinalt Syntax for estat dwatson Menu for estat
Remarks and examples Stored results Methods and formulas
Acknowledgment References Also see
Description
The following postestimation commands for time series are available for regress:
Command Description
estat archlm test for ARCH effects in the residuals
estat bgodfrey Breusch–Godfrey test for higher-order serial correlation
estat durbinalt Durbin’s alternative test for serial correlation
estat dwatson Durbin–Watson d statistic to test for first-order serial correlation
These commands provide regression diagnostic tools specific to time series. You must tsset your
data before using these commands; see [TS]tsset.
estat archlm tests for time-dependent volatility. estat bgodfrey,estat durbinalt, and
estat dwatson test for serial correlation in the residuals of a linear regression. For non-time-series
regression diagnostic tools, see [R]regress postestimation.
estat archlm performs Engle’s Lagrange multiplier (LM) test for the presence of autoregressive
conditional heteroskedasticity.
estat bgodfrey performs the Breusch–Godfrey test for higher-order serial correlation in the
disturbance. This test does not require that all the regressors be strictly exogenous.
estat durbinalt performs Durbin’s alternative test for serial correlation in the disturbance. This
test does not require that all the regressors be strictly exogenous.
estat dwatson computes the Durbin–Watson d statistic (Durbin and Watson 1950) to test for
first-order serial correlation in the disturbance when all the regressors are strictly exogenous.
Syntax for estat archlm
estat archlm [, archlm_options]
archlm options Description
lags(numlist)test numlist lag orders
force allow test after regress, vce(robust)
Options for estat archlm
lags(numlist)specifies a list of numbers, indicating the lag orders to be tested. The test will be
performed separately for each order. The default is order one.
force allows the test to be run after regress, vce(robust). The command will not work if the
vce(cluster clustvar)option is specified with regress; see [R]regress.
Syntax for estat bgodfrey
estat bgodfrey [, bgodfrey_options]
bgodfrey options Description
lags(numlist)test numlist lag orders
nomiss0 do not use Davidson and MacKinnon’s approach
small obtain p-values using the F or t distribution
Options for estat bgodfrey
lags(numlist)specifies a list of numbers, indicating the lag orders to be tested. The test will be
performed separately for each order. The default is order one.
nomiss0 specifies that Davidson and MacKinnon’s approach (1993, 358), which replaces the missing
values in the initial observations on the lagged residuals in the auxiliary regression with zeros, not
be used.
small specifies that the p-values of the test statistics be obtained using the F or t distribution instead
of the default chi-squared or normal distribution.
Syntax for estat durbinalt
estat durbinalt [, durbinalt_options]
durbinalt options Description
lags(numlist)test numlist lag orders
nomiss0 do not use Davidson and MacKinnon’s approach
robust compute standard errors using the robust/sandwich estimator
small obtain p-values using the F or t distribution
force allow test after regress, vce(robust) or after newey
Options for estat durbinalt
lags(numlist)specifies a list of numbers, indicating the lag orders to be tested. The test will be
performed separately for each order. The default is order one.
nomiss0 specifies that Davidson and MacKinnon’s approach (1993, 358), which replaces the missing
values in the initial observations on the lagged residuals in the auxiliary regression with zeros, not
be used.
robust specifies that the Huber/White/sandwich robust estimator of the variance–covariance matrix be used in Durbin's alternative test.
small specifies that the p-values of the test statistics be obtained using the F or t distribution instead of the default chi-squared or normal distribution. This option may not be specified with robust, which always uses an F or t distribution.
force allows the test to be run after regress, vce(robust) and after newey (see [R]regress and
[TS]newey). The command will not work if the vce(cluster clustvar)option is specified with
regress.
Syntax for estat dwatson
estat dwatson
Menu for estat
Statistics >Postestimation >Reports and statistics
Remarks and examples
The Durbin–Watson test is used to determine whether the error term in a linear regression model follows an AR(1) process. For the linear model
$$y_t = \mathbf{x}_t\boldsymbol\beta + u_t$$
the AR(1) process can be written as
$$u_t = \rho u_{t-1} + \epsilon_t$$
In general, an AR(1) process requires only that $\epsilon_t$ be independent and identically distributed (i.i.d.). The Durbin–Watson test, however, requires $\epsilon_t$ to be distributed $N(0, \sigma^2)$ for the statistic to have an exact distribution. Also, the Durbin–Watson test can be applied only when the regressors are strictly exogenous. A regressor $x$ is strictly exogenous if $\mathrm{Corr}(x_s, u_t) = 0$ for all $s$ and $t$, which precludes the use of the Durbin–Watson statistic with models where lagged values of the dependent variable are included as regressors.
The null hypothesis of the test is that there is no first-order autocorrelation. The Durbin–Watson $d$ statistic can take on values between 0 and 4; under the null, $d$ equals 2. Values of $d$ less than 2 suggest positive autocorrelation ($\rho > 0$), whereas values of $d$ greater than 2 suggest negative autocorrelation ($\rho < 0$). Calculating the exact distribution of the $d$ statistic is difficult, but empirical upper and lower bounds have been established based on the sample size and the number of regressors. Extended tables for the $d$ statistic have been published by Savin and White (1977). For example, suppose you have a model with 30 observations and three regressors (including the constant term). For a test of the null hypothesis of no autocorrelation versus the alternative of positive autocorrelation, the lower bound of the $d$ statistic is 1.284, and the upper bound is 1.567 at the 5% significance level. You would reject the null if $d < 1.284$, and you would fail to reject if $d > 1.567$. A value falling within the range (1.284, 1.567) leads to no conclusion about whether or not to reject the null hypothesis.
When lagged dependent variables are included among the regressors, the past values of the error term are correlated with those lagged variables at time $t$, implying that they are not strictly exogenous regressors. The inclusion of covariates that are not strictly exogenous causes the $d$ statistic to be biased toward the acceptance of the null hypothesis. Durbin (1970) suggested an alternative test for models with lagged dependent variables and extended that test to the more general AR($p$) serial correlation process
$$u_t = \rho_1 u_{t-1} + \cdots + \rho_p u_{t-p} + \epsilon_t$$
where $\epsilon_t$ is i.i.d. with variance $\sigma^2$ but is not assumed or required to be normal for the test.
The null hypothesis of Durbin's alternative test is
$$H_0\colon \rho_1 = 0, \ldots, \rho_p = 0$$
and the alternative is that at least one of the $\rho$'s is nonzero. Although the null hypothesis was originally derived for an AR($p$) process, this test turns out to have power against MA($p$) processes as well. Hence, the actual null of this test is that there is no serial correlation up to order $p$ because the MA($p$) and the AR($p$) models are locally equivalent alternatives under the null. See Godfrey (1988, 113–115) for a discussion of this result.
Durbin's alternative test is in fact an LM test, but it is most easily computed with a Wald test on the coefficients of the lagged residuals in an auxiliary OLS regression of the residuals on their lags and all the covariates in the original regression. Consider the linear regression model
$$y_t = \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + u_t \qquad (1)$$
in which the covariates $x_1$ through $x_k$ are not assumed to be strictly exogenous and $u_t$ is assumed to be i.i.d. and to have finite variance. The process is also assumed to be stationary. (See Wooldridge [2013] for a discussion of stationarity.) Estimating the parameters in (1) by OLS obtains the residuals $\widehat{u}_t$. Next another OLS regression is performed of $\widehat{u}_t$ on $\widehat{u}_{t-1}, \ldots, \widehat{u}_{t-p}$ and the other regressors,
$$\widehat{u}_t = \gamma_1\widehat{u}_{t-1} + \cdots + \gamma_p\widehat{u}_{t-p} + \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + \epsilon_t \qquad (2)$$
where $\epsilon_t$ stands for the random-error term in this auxiliary OLS regression. Durbin's alternative test is then obtained by performing a Wald test that $\gamma_1, \ldots, \gamma_p$ are jointly zero. The test can be made robust to an unknown form of heteroskedasticity by using a robust VCE estimator when estimating the regression in (2). When there are only strictly exogenous regressors and $p = 1$, this test is asymptotically equivalent to the Durbin–Watson test.
The Breusch–Godfrey test is also an LM test of the null hypothesis of no autocorrelation versus the alternative that $u_t$ follows an AR($p$) or MA($p$) process. Like Durbin's alternative test, it is based on the auxiliary regression (2), and it is computed as $NR^2$, where $N$ is the number of observations and $R^2$ is the simple $R^2$ from the regression. This test and Durbin's alternative test are asymptotically equivalent. The test statistic $NR^2$ has an asymptotic $\chi^2$ distribution with $p$ degrees of freedom. It is valid with or without the strict exogeneity assumption but is not robust to conditional heteroskedasticity, even if a robust VCE is used when fitting (2).
In fitting (2), the values of the lagged residuals will be missing in the initial periods. As noted by Davidson and MacKinnon (1993), the residuals will not be orthogonal to the other covariates in the model in this restricted sample, which implies that the $R^2$ from the auxiliary regression will not be zero when the lagged residuals are left out. Hence, Breusch and Godfrey's $NR^2$ version of the test may overreject in small samples. To correct this problem, Davidson and MacKinnon (1993) recommend setting the missing values of the lagged residuals to zero and running the auxiliary regression in (2) over the full sample used in (1). This small-sample correction has become conventional for both the Breusch–Godfrey and Durbin's alternative test, and it is the default for both commands. Specifying the nomiss0 option overrides this default behavior and treats the initial missing values generated by regressing on the lagged residuals as missing. Hence, nomiss0 causes these initial observations to be dropped from the sample of the auxiliary regression.
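A minimal sketch of both computations for one lag, using the Klein data of example 1 below (uhat and uhat_l1 are illustrative names; the initial missing lagged residual is set to zero, as in the default just described):
. use http://www.stata-press.com/data/r13/klein, clear
. tsset yr
. quietly regress consump wagegovt
. predict double uhat, residuals
. generate double uhat_l1 = cond(missing(L.uhat), 0, L.uhat)
. * auxiliary regression (2): residuals on their lag and the original covariates
. quietly regress uhat uhat_l1 wagegovt
. * Durbin's alternative test: Wald test that the lagged-residual coefficient is zero
. * (test reports the F form, comparable to estat durbinalt, small)
. test uhat_l1
. * Breusch-Godfrey statistic: N times R-squared from the auxiliary regression
. display "Breusch-Godfrey chi2(1) = " e(N)*e(r2) ", p-value = " chi2tail(1, e(N)*e(r2))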
Durbin's alternative test and the Breusch–Godfrey test were originally derived for the case covered
by regress without the vce(robust) option. However, after regress, vce(robust) and newey,
Durbin’s alternative test is still valid and can be invoked if the robust and force options are
specified.
Example 1: tests for serial correlation
Using data from Klein (1950), we first fit an OLS regression of consumption on the government
wage bill:
. use http://www.stata-press.com/data/r13/klein
. tsset yr
time variable: yr, 1920 to 1941
delta: 1 unit
. regress consump wagegovt
      Source |       SS       df       MS              Number of obs =      22
-------------+------------------------------           F(  1,    20) =   17.72
       Model |  532.567711     1  532.567711           Prob > F      =  0.0004
    Residual |  601.207167    20  30.0603584           R-squared     =  0.4697
-------------+------------------------------           Adj R-squared =  0.4432
       Total |  1133.77488    21  53.9892799           Root MSE      =  5.4827

------------------------------------------------------------------------------
     consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    wagegovt |    2.50744   .5957173     4.21   0.000     1.264796    3.750085
       _cons |   40.84699   3.192183    12.80   0.000     34.18821    47.50577
------------------------------------------------------------------------------
If we assume that wagegov is a strictly exogenous variable, we can use the Durbin–Watson test
to check for first-order serial correlation in the errors.
. estat dwatson
Durbin-Watson d-statistic( 2, 22) = .3217998
The Durbin–Watson $d$ statistic, 0.32, is far from the center of its distribution ($d = 2.0$). Given 22 observations and two regressors (including the constant term) in the model, the lower 5% bound is about 0.997, much greater than the computed $d$ statistic. Assuming that wagegov is strictly exogenous, we can reject the null of no first-order serial correlation. Rejecting the null hypothesis does not necessarily mean an AR process; other forms of misspecification may also lead to a significant test statistic. If we are willing to assume that the errors follow an AR(1) process and that wagegov is strictly exogenous, we could refit the model using arima or prais and model the error process explicitly; see [TS] arima and [TS] prais.
If we are not willing to assume that wagegov is strictly exogenous, we could instead use Durbin’s
alternative test or the Breusch–Godfrey test for first-order serial correlation. Because we have only
22 observations, we will use the small option.
. estat durbinalt, small
Durbin’s alternative test for autocorrelation
lags(p) F df Prob > F
1 35.035 ( 1, 19 ) 0.0000
H0: no serial correlation
. estat bgodfrey, small
Breusch-Godfrey LM test for autocorrelation
lags(p) F df Prob > F
1 14.264 ( 1, 19 ) 0.0013
H0: no serial correlation
Both tests strongly reject the null of no first-order serial correlation, so we decide to refit the
model with two lags of consump included as regressors and then rerun estat durbinalt and
estat bgodfrey. Because the revised model includes lagged values of the dependent variable, the
Durbin–Watson test is not applicable.
. regress consump wagegovt L.consump L2.consump
      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =   44.01
       Model |  702.660311     3  234.220104           Prob > F      =  0.0000
    Residual |  85.1596011    16  5.32247507           R-squared     =  0.8919
-------------+------------------------------           Adj R-squared =  0.8716
       Total |  787.819912    19  41.4642059           Root MSE      =   2.307

------------------------------------------------------------------------------
     consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    wagegovt |   .6904282   .3295485     2.10   0.052    -.0081835     1.38904
             |
     consump |
         L1. |   1.420536    .197024     7.21   0.000     1.002864    1.838208
         L2. |   -.650888   .1933351    -3.37   0.004     -1.06074     -.241036
             |
       _cons |   9.209073   5.006701     1.84   0.084    -1.404659    19.82281
------------------------------------------------------------------------------
. estat durbinalt, small lags(1/2)
Durbin’s alternative test for autocorrelation
lags(p) F df Prob > F
1 0.080 ( 1, 15 ) 0.7805
2 0.260 ( 2, 14 ) 0.7750
H0: no serial correlation
. estat bgodfrey, small lags(1/2)
Breusch-Godfrey LM test for autocorrelation
lags(p) F df Prob > F
1 0.107 ( 1, 15 ) 0.7484
2 0.358 ( 2, 14 ) 0.7056
H0: no serial correlation
Although wagegov and the constant term are no longer statistically different from zero at the 5%
level, the output from estat durbinalt and estat bgodfrey indicates that including the two lags
of consump has removed any serial correlation from the errors.
Engle (1982) suggests an LM test for checking for autoregressive conditional heteroskedasticity
(ARCH) in the errors. The pth-order ARCH model can be written as
$$\sigma_t^2 = E(u_t^2 \mid u_{t-1}, \ldots, u_{t-p}) = \gamma_0 + \gamma_1 u_{t-1}^2 + \cdots + \gamma_p u_{t-p}^2$$
To test the null hypothesis of no autoregressive conditional heteroskedasticity (that is, $\gamma_1 = \cdots = \gamma_p = 0$), we first fit the OLS model (1), obtain the residuals $\widehat{u}_t$, and run another OLS regression on the lagged residuals:
$$\widehat{u}_t^2 = \gamma_0 + \gamma_1\widehat{u}_{t-1}^2 + \cdots + \gamma_p\widehat{u}_{t-p}^2 + \epsilon \qquad (3)$$
The test statistic is $NR^2$, where $N$ is the number of observations in the sample and $R^2$ is the $R^2$ from the regression in (3). Under the null hypothesis, the test statistic follows a $\chi^2_p$ distribution.
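A minimal sketch of this computation for p = 1, using the same data as the examples (uhat and uhat2 are illustrative names); the result should be close to the lags(1) statistic reported by estat archlm in example 2 below:
. use http://www.stata-press.com/data/r13/klein, clear
. tsset yr
. quietly regress consump wagegovt
. predict double uhat, residuals
. generate double uhat2 = uhat^2
. * regression (3) with p = 1: squared residuals on their first lag
. quietly regress uhat2 L.uhat2
. display "ARCH LM(1) = " e(N)*e(r2) ", p-value = " chi2tail(1, e(N)*e(r2))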
Example 2: estat archlm
We refit the original model that does not include the two lags of consump and then use estat
archlm to see if there is any evidence that the errors are autoregressive conditional heteroskedastic.
. regress consump wagegovt
      Source |       SS       df       MS              Number of obs =      22
-------------+------------------------------           F(  1,    20) =   17.72
       Model |  532.567711     1  532.567711           Prob > F      =  0.0004
    Residual |  601.207167    20  30.0603584           R-squared     =  0.4697
-------------+------------------------------           Adj R-squared =  0.4432
       Total |  1133.77488    21  53.9892799           Root MSE      =  5.4827

------------------------------------------------------------------------------
     consump |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    wagegovt |    2.50744   .5957173     4.21   0.000     1.264796    3.750085
       _cons |   40.84699   3.192183    12.80   0.000     34.18821    47.50577
------------------------------------------------------------------------------
. estat archlm, lags(1 2 3)
LM test for autoregressive conditional heteroskedasticity (ARCH)
lags(p) chi2 df Prob > chi2
1 5.543 1 0.0186
2 9.431 2 0.0090
3 9.039 3 0.0288
H0: no ARCH effects vs. H1: ARCH(p) disturbance
estat archlm shows the results for tests of ARCH(1),ARCH(2), and ARCH(3) effects, respectively. At
the 5% significance level, all three tests reject the null hypothesis that the errors are not autoregressive
conditional heteroskedastic. See [TS]arch for information on fitting ARCH models.
Stored results
estat archlm stores the following in r():
Scalars
    r(N)          number of observations
    r(N_gaps)     number of gaps
    r(k)          number of regressors
Macros
    r(lags)       lag order
Matrices
    r(arch)       test statistic for each lag order
    r(df)         degrees of freedom
    r(p)          two-sided p-values
estat bgodfrey stores the following in r():
Scalars
    r(N)          number of observations
    r(N_gaps)     number of gaps
    r(k)          number of regressors
Macros
    r(lags)       lag order
Matrices
    r(chi2)       chi-squared statistic for each lag order
    r(F)          F statistic for each lag order (small only)
    r(df)         degrees of freedom
    r(df_r)       residual degrees of freedom (small only)
    r(p)          two-sided p-values
estat durbinalt stores the following in r():
Scalars
    r(N)          number of observations
    r(N_gaps)     number of gaps
    r(k)          number of regressors
Macros
    r(lags)       lag order
Matrices
    r(chi2)       chi-squared statistic for each lag order
    r(F)          F statistic for each lag order (small only)
    r(df)         degrees of freedom
    r(df_r)       residual degrees of freedom (small only)
    r(p)          two-sided p-values
estat dwatson stores the following in r():
Scalars
    r(N)          number of observations
    r(N_gaps)     number of gaps
    r(k)          number of regressors
    r(dw)         Durbin–Watson statistic
Methods and formulas
Consider the regression
$$y_t = \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + u_t \qquad (4)$$
in which some of the covariates are not strictly exogenous. In particular, some of the $x_{it}$ may be lags of the dependent variable. We are interested in whether the $u_t$ are serially correlated.
The Durbin–Watson $d$ statistic reported by estat dwatson is
$$d = \frac{\sum_{t=1}^{n-1}(\widehat{u}_{t+1} - \widehat{u}_t)^2}{\sum_{t=1}^{n}\widehat{u}_t^2}$$
where $\widehat{u}_t$ represents the residual of the $t$th observation.
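A minimal sketch of this computation, using the regression of example 1 in Remarks and examples (uhat, d_num, and d_den are illustrative names); the result should reproduce the d statistic reported there by estat dwatson:
. use http://www.stata-press.com/data/r13/klein, clear
. tsset yr
. quietly regress consump wagegovt
. predict double uhat, residuals
. generate double d_num = (uhat - L.uhat)^2
. generate double d_den = uhat^2
. quietly summarize d_num
. scalar num = r(sum)
. quietly summarize d_den
. display "Durbin-Watson d = " num/r(sum)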
To compute Durbin's alternative test and the Breusch–Godfrey test against the null hypothesis that there is no $p$th order serial correlation, we fit the regression in (4), compute the residuals, and then fit the following auxiliary regression of the residuals $\widehat{u}_t$ on $p$ lags of $\widehat{u}_t$ and on all the covariates in the original regression in (4):
$$\widehat{u}_t = \gamma_1\widehat{u}_{t-1} + \cdots + \gamma_p\widehat{u}_{t-p} + \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + \epsilon \qquad (5)$$
Durbin's alternative test is computed by performing a Wald test to determine whether the coefficients of $\widehat{u}_{t-1}, \ldots, \widehat{u}_{t-p}$ are jointly different from zero. By default, the statistic is assumed to be distributed $\chi^2(p)$. When small is specified, the statistic is assumed to follow an $F(p, N-p-k)$ distribution. The reported p-value is a two-sided p-value. When robust is specified, the Wald test is performed using the Huber/White/sandwich estimator of the variance–covariance matrix, and the test is robust to an unspecified form of heteroskedasticity.
The Breusch–Godfrey test is computed as $NR^2$, where $N$ is the number of observations in the auxiliary regression (5) and $R^2$ is the $R^2$ from the same regression (5). Like Durbin's alternative test, the Breusch–Godfrey test is asymptotically distributed $\chi^2(p)$, but specifying small causes the p-value to be computed using an $F(p, N-p-k)$ distribution.
By default, the initial missing values of the lagged residuals are replaced with zeros, and the auxiliary regression is run over the full sample used in the original regression of (4). Specifying the nomiss0 option causes these missing values to be treated as missing values, and the observations are dropped from the sample.
Engle's LM test for ARCH($p$) effects fits an OLS regression of $\widehat{u}_t^2$ on $\widehat{u}_{t-1}^2, \ldots, \widehat{u}_{t-p}^2$:
$$\widehat{u}_t^2 = \gamma_0 + \gamma_1\widehat{u}_{t-1}^2 + \cdots + \gamma_p\widehat{u}_{t-p}^2 + \epsilon$$
The test statistic is $nR^2$ and is asymptotically distributed $\chi^2(p)$.
Acknowledgment
The original versions of estat archlm,estat bgodfrey, and estat durbinalt were written
by Christopher F. Baum of the Department of Economics at Boston College and author of the Stata
Press books An Introduction to Modern Econometrics Using Stata and An Introduction to Stata
Programming.
References
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Baum, C. F., and V. L. Wiggins. 2000a. sg135: Test for autoregressive conditional heteroskedasticity in regression
error distribution. Stata Technical Bulletin 55: 13–14. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp.
143–144. College Station, TX: Stata Press.
. 2000b. sg136: Tests for serial correlation in regression error distribution. Stata Technical Bulletin 55: 14–15.
Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 145–147. College Station, TX: Stata Press.
Beran, R. J., and N. I. Fisher. 1998. A conversation with Geoff Watson. Statistical Science 13: 75–93.
Breusch, T. S. 1978. Testing for autocorrelation in dynamic linear models. Australian Economic Papers 17: 334–355.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Durbin, J. 1970. Testing for serial correlation in least-squares regressions when some of the regressors are lagged
dependent variables. Econometrica 38: 410–421.
Durbin, J., and S. J. Koopman. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford
University Press.
Durbin, J., and G. S. Watson. 1950. Testing for serial correlation in least squares regression. I. Biometrika 37:
409–428.
. 1951. Testing for serial correlation in least squares regression. II. Biometrika 38: 159–177.
Engle, R. F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom
inflation. Econometrica 50: 987–1007.
Fisher, N. I., and P. Hall. 1998. Geoffrey Stuart Watson: Tributes and obituary (3 December 1921–3 January 1998).
Australian and New Zealand Journal of Statistics 40: 257–267.
Godfrey, L. G. 1978. Testing against general autoregressive and moving average error models when the regressors
include lagged dependent variables. Econometrica 46: 1293–1301.
. 1988. Misspecification Tests in Econometrics: The Lagrange Multiplier Principle and Other Approaches.
Econometric Society Monographs, No. 16. Cambridge: Cambridge University Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Klein, L. R. 1950. Economic Fluctuations in the United States 1921–1941. New York: Wiley.
Koopman, S. J. 2012. James Durbin, FBA, 1923–2012. Journal of the Royal Statistical Society, Series A 175:
1060–1064.
Phillips, P. C. B. 1988. The ET Interview: Professor James Durbin. Econometric Theory 4: 125–157.
Savin, N. E., and K. J. White. 1977. The Durbin–Watson test for serial correlation with extreme sample sizes or
many regressors. Econometrica 45: 1989–1996.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
 
James Durbin (1923–2012) was a British statistician who was born in Wigan, near Manchester. He
studied mathematics at Cambridge and after military service and various research posts joined the
London School of Economics in 1950. Later in life, he was also affiliated with University College
London. His many contributions to statistics centered on serial correlation, time series (including
major contributions to structural or unobserved components models), sample survey methodology,
goodness-of-fit tests, and sample distribution functions, with emphasis on applications in the
social sciences. He served terms as president of the Royal Statistical Society and the International
Statistical Institute.
Geoffrey Stuart Watson (1921–1998) was born in Victoria, Australia, and earned degrees at
Melbourne University and North Carolina State University. After a visit to the University of
Cambridge, he returned to Australia, working at Melbourne and then the Australian National
University. Following periods at Toronto and Johns Hopkins, he settled at Princeton. Throughout
his wide-ranging career, he made many notable accomplishments and important contributions,
including the Durbin–Watson test for serial correlation, the Nadaraya–Watson estimator in
nonparametric regression, and methods for analyzing directional data.
Leslie G. Godfrey (1946– ) was born in London and earned degrees at the Universities of Exeter
and London. He is now a professor of econometrics at the University of York. His interests
center on implementation and interpretation of tests of econometric models, including nonnested
models.
Trevor Stanley Breusch (1949– ) was born in Queensland and earned degrees at the University
of Queensland and Australian National University (ANU). After a post at the University of
Southampton, he returned to work at ANU. His background is in econometric methods and his
recent interests include political values and social attitudes, earnings and income, and measurement
of underground economic activity.
 
Also see
[R] regress — Linear regression
[TS] tsset — Declare data to be time-series data
[R] regress postestimation — Postestimation tools for regress
[R] regress postestimation diagnostic plots — Postestimation plots for regress
Title
#review — Review previous commands
Syntax Description Remarks and examples
Syntax
#review [#1 [#2]]
Description
The #review command displays the last few lines typed at the terminal.
Remarks and examples
#review (pronounced pound-review) is a Stata preprocessor command. #commands do not generate
a return code or generate ordinary Stata errors. The only error message associated with #commands
is “unrecognized #command”.
The #review command displays the last few lines typed at the terminal. If no arguments follow
#review, the last five lines typed at the terminal are displayed. The first argument specifies the
number of lines to be reviewed, so #review 10 displays the last 10 lines typed. The second argument
specifies the number of lines to be displayed, so #review 10 5 displays five lines, starting at the
10th previous line.
Stata reserves a buffer for #review lines and stores as many previous lines in the buffer as
will fit, rolling out the oldest line to make room for the newest. Requests to #review lines no
longer stored will be ignored. Only lines typed at the terminal are placed in the #review buffer. See
[U] 10.5 Editing previous lines in Stata.
Example 1
Typing #review by itself will show the last five lines you typed at the terminal:
. #review
5 use mydata
4 * comments go into the #review buffer, too
3 describe
2 tabulate marriage educ [freq=number]
1 tabulate marriage educ [freq=number], chi2
.
Typing #review 15 2 shows the 15th and 14th previous lines:
. #review 15 2
15 replace x=. if x<200
14 summarize x
.
Title
roc — Receiver operating characteristic (ROC) analysis
Description Reference
Description
ROC analysis quantifies the accuracy of diagnostic tests or other evaluation modalities used to
discriminate between two states or conditions, which are here referred to as normal and abnormal or
control and case. The discriminatory accuracy of a diagnostic test is measured by its ability to correctly
classify known normal and abnormal subjects. For this reason, we often refer to the diagnostic test
as a classifier. The analysis uses the ROC curve, a graph of the sensitivity versus 1 − specificity of
the diagnostic test. The sensitivity is the fraction of positive cases that are correctly classified by the
diagnostic test, whereas the specificity is the fraction of negative cases that are correctly classified.
Thus the sensitivity is the true-positive rate, and the specificity is the true-negative rate.
There are six ROC commands:
Command Entry Description
roccomp [R]roccomp Tests of equality of ROC areas
rocgold [R]roccomp Tests of equality of ROC areas against a standard ROC curve
rocfit [R]rocfit Parametric ROC models
rocreg [R]rocreg Nonparametric and parametric ROC regression models
rocregplot [R]rocregplot Plot marginal and covariate-specific ROC curves
roctab [R]roctab Nonparametric ROC analysis
Postestimation commands are available after rocfit and rocreg; see [R]rocfit postestimation and
[R]rocreg postestimation.
Both nonparametric and parametric (semiparametric) methods have been suggested for generating
the ROC curve. The roctab command performs nonparametric ROC analysis for a single classifier.
roccomp extends the nonparametric ROC analysis function of roctab to situations where we have
multiple diagnostic tests of interest to be compared and tested. The rocgold command also provides
ROC analysis for multiple classifiers. rocgold compares each classifier’s ROC curve to a “gold
standard” ROC curve and makes adjustments for multiple comparisons in the analysis. Both rocgold
and roccomp also allow parametric estimation of the ROC curve through a binormal fit. In a binormal
fit, both the control and the case populations are normal.
The rocfit command also estimates the ROC curve of a classifier through a binormal fit. Unlike
roctab,roccomp, and rocgold,rocfit is an estimation command. In postestimation, graphs of
the ROC curve and confidence bands can be produced. Additional tests on the parameters can also be
conducted.
ROC analysis can be interpreted as a two-stage process. First, the control distribution of the classifier
is estimated, assuming a normal model or using a distribution-free estimation technique. The classifier
is standardized using the control distribution to 1 − percentile value, the false-positive rate. Second,
the ROC curve is estimated as the case distribution of the standardized classifier values.
Covariates may affect both stages of ROC analysis. The first stage may be affected, yielding a
covariate-adjusted ROC curve. The second stage may also be affected, producing multiple covariate-
specific ROC curves.
The rocreg command performs ROC analysis under both types of covariate effects. Both parametric
(semiparametric) and nonparametric methods may be used by rocreg. Like rocfit,rocreg is an
estimation command and provides many postestimation capabilities.
The global performance of a diagnostic test is commonly summarized by the area under the ROC
curve (AUC). This area can be interpreted as the probability that the result of a diagnostic test of a
randomly selected abnormal subject will be greater than the result of the same diagnostic test from
a randomly selected normal subject. The greater the AUC, the better the global performance of the
diagnostic test. Each of the ROC commands provides computation of the AUC.
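For example, the AUC of a single classifier can be obtained with roctab; a brief sketch using the
Hanley and McNeil dataset discussed in [R] roctab:
. use http://www.stata-press.com/data/r13/hanley
. roctab disease rating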
Citing a lack of clinical relevance for the AUC, other ROC summary measures have been suggested.
These include the partial area under the ROC curve for a given false-positive rate t[pAUC(t)]. This
is the area under the ROC curve from the false-positive rate of 0 to t. The ROC value at a particular
false-positive rate and the false-positive rate for a particular ROC value are also useful summary
measures for the ROC curve. These three measures are directly estimated by rocreg during the model
fit or postestimation stages. Point estimates of ROC value are computed by the other ROC commands,
but no standard errors are reported.
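As a minimal sketch (hypothetical variables status and rating; the false-positive rates and the
bootstrap seed are arbitrary), these summary measures could be requested with rocreg as follows;
see [R] rocreg:
. rocreg status rating, bseed(12345)
. rocreg status rating, roc(.1) bseed(12345)
. rocreg status rating, pauc(.5) bseed(12345)
The first call reports the AUC (the default), the second the ROC value at a false-positive rate of 0.1,
and the third the partial AUC up to a false-positive rate of 0.5.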
See Pepe (2003) for a discussion of ROC analysis. Pepe has posted Stata datasets and programs
used to reproduce results presented in the book (http://www.stata.com/bookstore/pepe.html).
Reference
Pepe, M. S. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford
University Press.
Title
roccomp — Tests of equality of ROC areas
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Test equality of ROC areas
roccomp refvar classvar [classvars] [if] [in] [weight] [, roccomp options]
Test equality of ROC area against a standard ROC curve
rocgold refvar goldvar classvar [classvars] [if] [in] [weight] [, rocgold options]
roccomp options Description
Main
by(varname)split into groups by variable
test(matname)use contrast matrix for comparing ROC areas
graph graph the ROC curve
norefline suppress plotting the 45-degree reference line
separate place each ROC curve on its own graph
summary report the area under the ROC curve
binormal estimate areas by using binormal distribution assumption
line#opts(cline options)affect rendition of the #th binormal fit line
level(#)set confidence level; default is level(95)
Plot
plot#opts(plot options)affect rendition of the #th ROC curve
Reference line
rlopts(cline options)affect rendition of the reference line
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
rocgold options Description
Main
sidak adjust the significance probability by using Šidák's method
test(matname)use contrast matrix for comparing ROC areas
graph graph the ROC curve
norefline suppress plotting the 45-degree reference line
separate place each ROC curve on its own graph
summary report the area under the ROC curve
binormal estimate areas by using binormal distribution assumption
line#opts(cline options)affect rendition of the #th binormal fit line
level(#)set confidence level; default is level(95)
Plot
plot#opts(plot options)affect rendition of the #th ROC curve; plot 1 is the “gold standard”
Reference line
rlopts(cline options)affect rendition of the reference line
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
plot options Description
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
cline options change the look of the line
fweights are allowed; see [U] 11.1.6 weight.
Menu
roccomp
Statistics >Epidemiology and related >ROC analysis >Test equality of two or more ROC areas
rocgold
Statistics >Epidemiology and related >ROC analysis >Test equality of ROC area against gold standard
Description
The above commands are used to perform receiver operating characteristic (ROC) analyses with
rating and discrete classification data.
The two variables refvar and classvar must be numeric. The reference variable indicates the true
state of the observation, such as diseased and nondiseased or normal and abnormal, and must be
coded as 0 and 1. The rating or outcome of the diagnostic test or test modality is recorded in classvar,
which must be at least ordinal, with higher values indicating higher risk.
roccomp tests the equality of two or more ROC areas obtained from applying two or more test
modalities to the same sample or to independent samples. roccomp expects the data to be in wide
form when comparing areas estimated from the same sample and in long form for areas estimated
from independent samples.
rocgold independently tests the equality of the ROC area of each of several test modalities,
specified by classvar, against a “gold standard” ROC curve, goldvar. For each comparison, rocgold
reports the raw and the Bonferroni-adjusted significance probability. Optionally, Šidák's adjustment
for multiple comparisons can be obtained.
See [R]rocfit and [R]rocreg for commands that fit maximum-likelihood ROC models.
Options
 
Main
by(varname)(roccomp only) is required when comparing independent ROC areas. The by() variable
identifies the groups to be compared.
sidak (rocgold only) requests that the significance probability be adjusted for the effect of multiple
comparisons by using Šidák's method. Bonferroni's adjustment is reported by default.
test(matname)specifies the contrast matrix to be used when comparing ROC areas. By default, the
null hypothesis that all areas are equal is tested.
graph produces graphical output of the ROC curve.
norefline suppresses plotting the 45-degree reference line from the graphical output of the ROC
curve.
separate is meaningful only with roccomp and specifies that each ROC curve be placed on its own
graph rather than one curve on top of the other.
summary reports the area under the ROC curve, its standard error, and its confidence interval. This
option is needed only when also specifying graph.
binormal specifies that the areas under the ROC curves to be compared should be estimated using
the binormal distribution assumption. By default, areas to be compared are computed using the
trapezoidal rule.
line#opts(cline options)affects the rendition of the line representing the #th ROC curve drawn
using the binormal distribution assumption; see [G-3]cline options. These lines are drawn only if
the binormal option is specified.
level(#)specifies the confidence level, as a percentage, for the confidence intervals. The default is
level(95) or as set by set level; see [R]level.
 
Plot
plot#opts(plot options) affects the rendition of the #th ROC curve, that is, the curve's plotted points
connected by lines. The plot options can affect the size and color of markers, whether and how
the markers are labeled, and whether and how the points are connected; see [G-3]marker options,
[G-3]marker label options, and [G-3]cline options.
For rocgold, plot1opts() are applied to the ROC for the gold standard.
 
Reference line
rlopts(cline options)affects the rendition of the reference line; see [G-3]cline options.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options. These include op-
tions for titling the graph (see [G-3]title options), options for saving the graph to disk (see
[G-3]saving option), and the by() option (see [G-3]by option).
Remarks and examples
Remarks are presented under the following headings:
Introduction
Comparing areas under the ROC curve
Correlated data
Independent data
Comparing areas with a gold standard
Introduction
roccomp provides comparison of the ROC curves of multiple classifiers. rocgold compares the
ROC curves of multiple classifiers with a single “gold standard” classifier. Adjustment of inference
for multiple comparisons is also provided by rocgold.
See Pepe (2003) for a discussion of ROC analysis. Pepe has posted Stata datasets and programs
used to reproduce results presented in the book (http://www.stata.com/bookstore/pepe.html).
Comparing areas under the ROC curve
The area under multiple ROC curves can be compared by using roccomp. The command syntax
is slightly different if the ROC curves are correlated (that is, different diagnostic tests are applied to
the same sample) or independent (that is, diagnostic tests are applied to different samples).
Correlated data
Example 1
Hanley and McNeil (1983) presented data from an evaluation of two computer algorithms designed
to reconstruct CT images from phantoms. We will call these two algorithms’ modalities 1 and 2. A
sample of 112 phantoms was selected; 58 phantoms were considered normal, and the remaining 54
were abnormal. Each of the two modalities was applied to each phantom, and the resulting images
were rated by a reviewer using a six-point scale: 1 = definitely normal, 2 = probably normal,
3 = possibly normal, 4 = possibly abnormal, 5 = probably abnormal, and 6 = definitely abnormal.
Because each modality was applied to the same sample of phantoms, the two sets of outcomes are
correlated.
We list the first 7 observations:
. use http://www.stata-press.com/data/r13/ct
. list in 1/7, sep(0)
mod1 mod2 status
1. 2 1 0
2. 5 5 1
3. 2 1 0
4. 2 3 0
5. 5 6 1
6. 2 2 0
7. 3 2 0
The data are in wide form, which is required when dealing with correlated data. Each observation
corresponds to one phantom. The variable mod1 identifies the rating assigned for the first modality,
and mod2 identifies the rating assigned for the second modality. The true status of the phantoms is
given by status=0 if they are normal and status=1 if they are abnormal. The observations with
at least one missing rating were dropped from the analysis.
We plot the two ROC curves and compare their areas.
. roccomp status mod1 mod2, graph summary
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8828 0.0317 0.82067 0.94498
mod2 112 0.9302 0.0256 0.88005 0.98042
Ho: area(mod1) = area(mod2)
chi2(1) = 2.31 Prob>chi2 = 0.1282
(figure: ROC curves for mod1 (area 0.8828) and mod2 (area 0.9302), sensitivity versus 1 − specificity, with 45-degree reference line)
By default, roccomp, with the graph option specified, plots the ROC curves on the same graph.
Optionally the curves can be plotted side by side, each on its own graph, by also specifying separate.
For each curve, roccomp reports summary statistics and provides a test for the equality of the area
under the curves, using an algorithm suggested by DeLong, DeLong, and Clarke-Pearson (1988).
Although the area under the ROC curve for modality 2 is larger than that of modality 1, the
chi-squared test yielded a significance probability of 0.1282, suggesting that there is no significant
difference between these two areas.
The roccomp command can also be used to compare more than two ROC areas. To illustrate this,
we modified the previous dataset by including a fictitious third modality.
. use http://www.stata-press.com/data/r13/ct2
. roccomp status mod1 mod2 mod3, graph summary
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8828 0.0317 0.82067 0.94498
mod2 112 0.9302 0.0256 0.88005 0.98042
mod3 112 0.9240 0.0241 0.87670 0.97132
Ho: area(mod1) = area(mod2) = area(mod3)
chi2(2) = 6.54 Prob>chi2 = 0.0381
(figure: ROC curves for mod1 (area 0.8828), mod2 (area 0.9302), and mod3 (area 0.924), sensitivity versus 1 − specificity, with reference line)
By default, roccomp tests whether the areas under the ROC curves are all equal. Other comparisons
can be tested by creating a contrast matrix and specifying test(matname), where matname is the
name of the contrast matrix.
For example, assume that we are interested in testing whether the area under the ROC for mod1 is
equal to that of mod3. To do this, we can first create an appropriate contrast matrix and then specify
its name with the test() option.
Of course, this is a trivial example because we could have just specified
. roccomp status mod1 mod3
without including mod2 to obtain the same test results. However, for illustration, we will continue
with this example.
The contrast matrix must have its number of columns equal to the number of classvars (that is,
the total number of ROC curves) and a number of rows less than or equal to the number of classvars,
and the elements of each row must add to zero.
. matrix C=(1,0,-1)
. roccomp status mod1 mod2 mod3, test(C)
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8828 0.0317 0.82067 0.94498
mod2 112 0.9302 0.0256 0.88005 0.98042
mod3 112 0.9240 0.0241 0.87670 0.97132
Ho: Comparison as defined by contrast matrix: C
chi2(1) = 5.25 Prob>chi2 = 0.0220
Although all three areas are reported, the comparison is made using the specified contrast matrix.
Perhaps more interesting would be a comparison of the area from mod1 and the average area of
mod2 and mod3.
. matrix C=(1,-.5,-.5)
. roccomp status mod1 mod2 mod3, test(C)
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8828 0.0317 0.82067 0.94498
mod2 112 0.9302 0.0256 0.88005 0.98042
mod3 112 0.9240 0.0241 0.87670 0.97132
Ho: Comparison as defined by contrast matrix: C
chi2(1) = 3.43 Prob>chi2 = 0.0642
Other contrasts could be made. For example, we could test if mod3 is different from at least one
of the other two by first creating the following contrast matrix:
. matrix C=(-1,0,1 \ 0,-1,1)
. mat list C
C[2,3]
c1 c2 c3
r1 -1 0 1
r2 0 -1 1
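Completing the illustration, the comparison defined by this contrast matrix would then be requested
just as before (a sketch; output omitted):
. roccomp status mod1 mod2 mod3, test(C)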
Independent data
Example 2
In example 1, we noted that because each test modality was applied to the same sample of
phantoms, the classification outcomes were correlated. Now assume that we have collected the same
data presented by Hanley and McNeil (1983), except that we applied the first test modality to one
sample of phantoms and the second test modality to a different sample of phantoms. The resulting
measurements are now considered independent.
Here are a few of the observations.
. use http://www.stata-press.com/data/r13/ct3
. list in 1/7, sep(0)
pop status rating mod
1. 12 0 1 1
2. 31 0 1 2
3. 1 1 1 1
4. 3 1 1 2
5. 28 0 2 1
6. 19 0 2 2
7. 3 1 2 1
The data are in long form, which is required when dealing with independent data. The data consist
of 24 observations: 6 observations corresponding to abnormal phantoms and 6 to normal phantoms
evaluated using the first modality, and similarly 6 observations corresponding to abnormal phantoms
and 6 to normal phantoms evaluated using the second modality. The number of phantoms corresponding
to each observation is given by the pop variable. Once again we have frequency-weighted data. The
variable mod identifies the modality, and rating is the assigned classification.
We can better view our data by using the table command.
. table status rating [fw=pop], by(mod) row col
                          mod and rating
  status        1      2      3      4      5      6   Total

  1
       0       12     28      8      6      4             58
       1        1      3      6     13     22      9      54

   Total       13     31     14     19     26      9     112

  2
       0       31     19      5      3                    58
       1        3      2      5     19     15     10      54

   Total       34     21     10     22     15     10     112
The status variable indicates the true status of the phantoms: status =0 if they are normal and
status =1 if they are abnormal.
We now compare the areas under the two ROC curves.
. roccomp status rating [fw=pop], by(mod) graph summary
ROC Asymptotic Normal
mod Obs Area Std. Err. [95% Conf. Interval]
1 112 0.8828 0.0317 0.82067 0.94498
2 112 0.9302 0.0256 0.88005 0.98042
Ho: area(1) = area(2)
chi2(1) = 1.35 Prob>chi2 = 0.2447
(figure: ROC curves for modality 1 (area 0.8828) and modality 2 (area 0.9302), sensitivity versus 1 − specificity, with reference line)
Comparing areas with a gold standard
The area under multiple ROC curves can be compared with a gold standard using rocgold. The
command syntax is similar to that of roccomp. The tests are corrected for the effect of multiple
comparisons.
Example 3
We will use the same data (presented by Hanley and McNeil [1983]) as in the roccomp examples.
Let’s assume that the first modality is considered to be the standard against which both the second
and third modalities are compared.
We want to plot and compare both the areas of the ROC curves of mod2 and mod3 with mod1.
Because we consider mod1 to be the gold standard, it is listed first after the reference variable in the
rocgold command line.
. use http://www.stata-press.com/data/r13/ct2
. rocgold status mod1 mod2 mod3, graph summary
                            ROC                                      Bonferroni
                           Area   Std. Err.      chi2   df   Pr>chi2    Pr>chi2
mod1 (standard)          0.8828      0.0317
mod2                     0.9302      0.0256    2.3146    1    0.1282     0.2563
mod3                     0.9240      0.0241    5.2480    1    0.0220     0.0439
(figure: ROC curves for mod1 (area 0.8828), mod2 (area 0.9302), and mod3 (area 0.924), sensitivity versus 1 − specificity, with reference line)
Equivalently, we could have done this in two steps by using the roccomp command.
. roccomp status mod1 mod2, graph summary
. roccomp status mod1 mod3, graph summary
Stored results
roccomp stores the following in r():
Scalars
r(N_g)       number of groups
r(df)        χ² degrees of freedom
r(p)         significance probability
r(chi2)      χ²
Matrices
r(V)         variance–covariance matrix
rocgold stores the following in r():
Scalars
r(N_g)       number of groups
Matrices
r(V)         variance–covariance matrix
r(chi2)      χ² vector
r(df)        χ² degrees-of-freedom vector
r(p)         significance-probability vector
r(p_adj)     adjusted significance-probability vector
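As a brief sketch, these stored results can be inspected after estimation in the usual way, for example:
. roccomp status mod1 mod2
. display "chi2 = " r(chi2) ", p = " r(p)
. rocgold status mod1 mod2 mod3
. matrix list r(p_adj)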
Methods and formulas
Assume that we applied a diagnostic test to each of N_n normal and N_a abnormal subjects.
Further assume that the higher the outcome value of the diagnostic test, the higher the risk of the
subject being abnormal. Let θ̂ be the estimated area under the curve, and let X_i, i = 1, 2, ..., N_a
and Y_j, j = 1, 2, ..., N_n be the values of the diagnostic test for the abnormal and normal subjects,
respectively.
Areas under ROC curves are compared using an algorithm suggested by DeLong, DeLong, and
Clarke-Pearson (1988). Let θ̂ = (θ̂_1, θ̂_2, ..., θ̂_k) be a vector representing the areas under k ROC
curves. See Methods and formulas in [R] roctab for the definition of these area estimates.
For the rth area, define

    V_{10}^r(X_i) = \frac{1}{N_n} \sum_{j=1}^{N_n} \psi(X_i^r, Y_j^r)

and for each normal subject, j, define

    V_{01}^r(Y_j) = \frac{1}{N_a} \sum_{i=1}^{N_a} \psi(X_i^r, Y_j^r)

where

    \psi(X^r, Y^r) = \begin{cases} 1 & Y^r < X^r \\ \tfrac{1}{2} & Y^r = X^r \\ 0 & Y^r > X^r \end{cases}

Define the k × k matrix S_{10} such that the (r, s)th element is

    S_{10}^{r,s} = \frac{1}{N_a - 1} \sum_{i=1}^{N_a} \{ V_{10}^r(X_i) - \hat{\theta}_r \} \{ V_{10}^s(X_i) - \hat{\theta}_s \}

and S_{01} such that the (r, s)th element is

    S_{01}^{r,s} = \frac{1}{N_n - 1} \sum_{j=1}^{N_n} \{ V_{01}^r(Y_j) - \hat{\theta}_r \} \{ V_{01}^s(Y_j) - \hat{\theta}_s \}

Then the covariance matrix is

    S = \frac{1}{N_a} S_{10} + \frac{1}{N_n} S_{01}

Let L be a contrast matrix defining the comparison, so that

    (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})' L' (L S L')^{-1} L (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})

has a chi-squared distribution with degrees of freedom equal to the rank of LSL'.
References
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
. 2000. sg120.2: Correction to roccomp command. Stata Technical Bulletin 54: 26. Reprinted in Stata Technical
Bulletin Reprints, vol. 9, p. 231. College Station, TX: Stata Press.
. 2002a. Comparative assessment of three common algorithms for estimating the variance of the area under the
nonparametric receiver operating characteristic curve. Stata Journal 2: 280–289.
. 2002b. From the help desk: Comparing areas under receiver operating characteristic curves from two or more
probit or logit models. Stata Journal 2: 301–313.
DeLong, E. R., D. M. DeLong, and D. L. Clarke-Pearson. 1988. Comparing the areas under two or more correlated
receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–845.
Erdreich, L. S., and E. T. Lee. 1981. Use of relative operating characteristic analysis in epidemiology: A method for
dealing with subjective judgment. American Journal of Epidemiology 114: 649–662.
Hanley, J. A., and B. J. McNeil. 1983. A method of comparing the areas under receiver operating characteristic
curves derived from the same cases. Radiology 148: 839–843.
Harbord, R. M., and P. Whiting. 2009. metandi: Meta-analysis of diagnostic accuracy using hierarchical logistic
regression. Stata Journal 9: 211–229.
Juul, S., and M. Frydenberg. 2014. An Introduction to Stata for Health Researchers. 4th ed. College Station, TX:
Stata Press.
Ma, G., and W. J. Hall. 1993. Confidence bands for the receiver operating characteristic curves. Medical Decision
Making 13: 191–197.
Pepe, M. S. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford
University Press.
Reichenheim, M. E., and A. Ponce de Leon. 2002. Estimation of sensitivity and specificity arising from validity
studies with incomplete design. Stata Journal 2: 267–279.
Seed, P. T., and A. Tobías. 2001. sbe36.1: Summary statistics for diagnostic tests. Stata Technical Bulletin 59: 25–27.
Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 90–93. College Station, TX: Stata Press.
Tobías, A. 2000. sbe36: Summary statistics report for diagnostic tests. Stata Technical Bulletin 56: 16–18. Reprinted
in Stata Technical Bulletin Reprints, vol. 10, pp. 87–90. College Station, TX: Stata Press.
Working, H., and H. Hotelling. 1929. Application of the theory of error to the interpretation of trends. Journal of the
American Statistical Association 24 (Suppl.): 73–85.
Also see
[R] logistic postestimation — Postestimation tools for logistic
[R] roc — Receiver operating characteristic (ROC) analysis
[R] rocfit — Parametric ROC models
[R] rocreg — Receiver operating characteristic (ROC) regression
[R] roctab — Nonparametric ROC analysis
Title
rocfit — Parametric ROC models
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
rocfit refvar classvar [if] [in] [weight] [, rocfit options]
rocfit options Description
Model
continuous(#)divide classvar into #groups of approximately equal length
generate(newvar)create newvar containing classification groups
SE
vce(vcetype)vcetype may be oim or opg
Reporting
level(#)set confidence level; default is level(95)
Maximization
maximize options control the maximization process; seldom used
fp is allowed; see [U] 11.1.10 Prefix commands.
fweights are allowed; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Epidemiology and related >ROC analysis >Parametric ROC analysis without covariates
Description
rocfit fits maximum-likelihood ROC models assuming a binormal distribution of the latent variable.
The two variables refvar and classvar must be numeric. The reference variable indicates the true
state of the observation, such as diseased and nondiseased or normal and abnormal, and must be
coded as 0 and 1. The rating or outcome of the diagnostic test or test modality is recorded in classvar,
which must be at least ordinal, with higher values indicating higher risk.
See [R]roc for other commands designed to perform receiver operating characteristic (ROC) analyses
with rating and discrete classification data.
Options
 
Model
continuous(#)specifies that the continuous classvar be divided into #groups of approximately
equal length. This option is required when classvar takes on more than 20 distinct values.
continuous(.) may be specified to indicate that classvar be used as it is, even though it could
have more than 20 distinct values.
generate(newvar)specifies the new variable that is to contain the values indicating the groups
produced by continuous(#).generate() may be specified only with continuous().
 
SE
vce(vcetype)specifies the type of standard error reported. vcetype may be either oim or opg; see
[R]vce option.
 
Reporting
level(#); see [R]estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
Remarks and examples
Dorfman and Alf (1969) developed a generalized approach for obtaining maximum likelihood
estimates of the parameters for a smooth fitting ROC curve. The most commonly used method for
ordinal data, and the one implemented here, is based upon the binormal model; see Pepe (2003),
Pepe, Longton, and Janes (2009), and Janes, Longton, and Pepe (2009) for methods of ROC analysis
for continuous data, including methods for adjusting for covariates.
The model assumes the existence of an unobserved, continuous, latent variable that is normally
distributed (perhaps after a monotonic transformation) in both the normal and abnormal populations
with means µ_n and µ_a and variances σ_n² and σ_a², respectively. The model further assumes that the
K categories of the rating variable result from partitioning the unobserved latent variable by K − 1
fixed boundaries. The method fits a straight line to the empirical ROC points plotted using normal
probability scales on both axes. Maximum likelihood estimates of the line's slope and intercept and
the K − 1 boundaries are obtained simultaneously. See Methods and formulas for details.
The intercept from the fitted line is a measurement of (µ_a − µ_n)/σ_a, and the slope measures
σ_n/σ_a.
Thus the intercept is the standardized difference between the two latent population means, and the
slope is the ratio of the two standard deviations. The null hypothesis that there is no difference between
the two population means is evaluated by testing that the intercept =0, and the null hypothesis that
the variances in the two populations are equal is evaluated by testing that the slope =1.
Example 1
We use Hanley and McNeil’s (1982) dataset, described in example 1 of [R]roctab, to fit a smooth
ROC curve assuming a binormal model.
. use http://www.stata-press.com/data/r13/hanley
. rocfit disease rating
Fitting binormal model:
Iteration 0: log likelihood = -123.68069
Iteration 1: log likelihood = -123.64867
Iteration 2: log likelihood = -123.64855
Iteration 3: log likelihood = -123.64855
Binormal model of disease on rating Number of obs = 109
Goodness-of-fit chi2(2) = 0.21
Prob > chi2 = 0.9006
Log likelihood = -123.64855
Coef. Std. Err. z P>|z| [95% Conf. Interval]
intercept 1.656782 0.310456 5.34 0.000 1.048300 2.265265
slope (*) 0.713002 0.215882 -1.33 0.184 0.289881 1.136123
/cut1 0.169768 0.165307 1.03 0.304 -0.154227 0.493764
/cut2 0.463215 0.167235 2.77 0.006 0.135441 0.790990
/cut3 0.766860 0.174808 4.39 0.000 0.424243 1.109477
/cut4 1.797938 0.299581 6.00 0.000 1.210770 2.385106
Indices from binormal fit
Index Estimate Std. Err. [95% Conf. Interval]
ROC area 0.911331 0.029506 0.853501 0.969161
delta(m) 2.323671 0.502370 1.339044 3.308298
d(e) 1.934361 0.257187 1.430284 2.438438
d(a) 1.907771 0.259822 1.398530 2.417012
(*) z test for slope==1
rocfit outputs the MLE for the intercept and slope of the fitted regression line along with, here, four
boundaries (because there are five ratings) labeled /cut1 through /cut4. Also rocfit computes
and reports four indices based on the fitted ROC curve: the area under the curve (labeled ROC area),
δ(m) (labeled delta(m)), d_e (labeled d(e)), and d_a (labeled d(a)). More information about these
indices can be found in Methods and formulas and in Erdreich and Lee (1981).
Stored results
rocfit stores the following in e():
Scalars
e(N)             number of observations
e(k)             number of parameters
e(k_eq)          number of equations in e(b)
e(k_eq_model)    number of equations in overall model test
e(k_dv)          number of dependent variables
e(df_m)          model degrees of freedom
e(ll)            log likelihood
e(chi2_gf)       goodness-of-fit χ²
e(df_gf)         goodness-of-fit degrees of freedom
e(p_gf)          χ² goodness-of-fit significance probability
e(area)          area under the ROC curve
e(se_area)       standard error for the area under the ROC curve
e(deltam)        delta(m)
e(se_delm)       standard error for delta(m)
e(de)            d(e) index
e(se_de)         standard error for d(e) index
e(da)            d(a) index
e(se_da)         standard error for d(a) index
e(rank)          rank of e(V)
e(ic)            number of iterations
e(rc)            return code
e(converged)     1 if converged, 0 otherwise
Macros
e(cmd)           rocfit
e(cmdline)       command as typed
e(depvar)        refvar and classvar
e(wtype)         weight type
e(wexp)          weight expression
e(title)         title in estimation output
e(chi2type)      GOF; type of model χ² test
e(vce)           vcetype specified in vce()
e(vcetype)       title used to label Std. Err.
e(opt)           type of optimization
e(which)         max or min; whether optimizer is to perform maximization or minimization
e(ml_method)     type of ml method
e(user)          name of likelihood-evaluator program
e(technique)     maximization technique
e(properties)    b V
Matrices
e(b)             coefficient vector
e(ilog)          iteration log (up to 20 iterations)
e(gradient)      gradient vector
e(V)             variance–covariance matrix of the estimators
Functions
e(sample)        marks estimation sample
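As a brief sketch, a few of these results can be displayed after fitting the model in example 1:
. rocfit disease rating
. display "AUC = " e(area) ", std. err. = " e(se_area)
. ereturn list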
Methods and formulas
Dorfman and Alf (1969) developed a general procedure for obtaining maximum likelihood estimates
of the parameters of a smooth-fitting ROC curve. The most common method, and the one implemented
in Stata, is based upon the binormal model.
The model assumes that there is an unobserved continuous latent variable that is normally distributed
in both the normal and abnormal populations. The idea is better explained with the following illustration:
(figure: overlapping normal densities for the normal and abnormal populations, partitioned by the boundaries Z1 through Z4 into the rating categories 1 through 5)
The latent variable is assumed to be normally distributed for both the normal and abnormal subjects,
perhaps after a monotonic transformation, with means µ_n and µ_a and variances σ_n² and σ_a², respectively.
This latent variable is assumed to be partitioned into the k categories of the rating variable by
k − 1 fixed boundaries. In the above figure, the k = 5 categories of the rating variable identified on
the bottom result from the partition of the four boundaries Z_1 through Z_4.
Let R_j for j = 1, 2, ..., k indicate the categories of the rating variable, let i = 1 if the subject
belongs to the normal group, and let i = 2 if the subject belongs to the abnormal group.
Then

    p(R_j \mid i = 1) = F(Z_j) - F(Z_{j-1})

where Z_k = (x_k − µ_n)/σ_n, F is the cumulative normal distribution, F(Z_0) = 0, and F(Z_k) = 1.
Also,

    p(R_j \mid i = 2) = F(bZ_j - a) - F(bZ_{j-1} - a)

where b = σ_n/σ_a and a = (µ_a − µ_n)/σ_a.
The parameters a, b, and the k − 1 fixed boundaries Z_j are simultaneously estimated by maximizing
the log-likelihood function

    \log L = \sum_{i=1}^{2} \sum_{j=1}^{k} r_{ij} \log p(R_j \mid i)

where r_{ij} is the number of R_j's in group i.
The area under the fitted ROC curve is computed as

    \Phi\left( \frac{a}{\sqrt{1 + b^2}} \right)

where Φ is the standard normal cumulative distribution function.
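For instance, plugging in the estimates from example 1 (intercept a ≈ 1.6568 and slope b ≈ 0.7130)
gives Φ(1.6568/√(1 + 0.7130²)) = Φ(1.35) ≈ 0.911, matching the reported ROC area of 0.9113.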
Point estimates for the ROC curve indices are as follows:

    \delta(m) = \frac{a}{b} \qquad d_e = \frac{2a}{b + 1} \qquad d_a = \frac{a\sqrt{2}}{\sqrt{1 + b^2}}

Variances for these indices are computed using the delta method.
The δ(m) estimates (µ_a − µ_n)/σ_n, d_e estimates 2(µ_a − µ_n)/(σ_a + σ_n), and d_a estimates
√2(µ_a − µ_n)/(σ_a² + σ_n²)^{1/2}.
Simultaneous confidence bands for the entire curve are obtained, as suggested by Ma and Hall (1993),
by first obtaining Working–Hotelling (1929) confidence bands for the fitted straight line in normal
probability coordinates and then transforming them back to ROC coordinates.
References
Bamber, D. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic
graph. Journal of Mathematical Psychology 12: 387–415.
Choi, B. C. K. 1998. Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test.
American Journal of Epidemiology 148: 1127–1132.
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
. 2000. sg120.1: Two new options added to rocfit command. Stata Technical Bulletin 53: 18–19. Reprinted in
Stata Technical Bulletin Reprints, vol. 9, pp. 230–231. College Station, TX: Stata Press.
Dorfman, D. D., and E. Alf, Jr. 1969. Maximum-likelihood estimation of parameters of signal-detection theory and
determination of confidence intervals–rating-method data. Journal of Mathematical Psychology 6: 487–496.
Erdreich, L. S., and E. T. Lee. 1981. Use of relative operating characteristic analysis in epidemiology: A method for
dealing with subjective judgment. American Journal of Epidemiology 114: 649–662.
Hanley, J. A., and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic
(ROC) curve. Radiology 143: 29–36.
Janes, H., G. M. Longton, and M. S. Pepe. 2009. Accommodating covariates in receiver operating characteristic
analysis. Stata Journal 9: 17–39.
Ma, G., and W. J. Hall. 1993. Confidence bands for the receiver operating characteristic curves. Medical Decision
Making 13: 191–197.
Pepe, M. S. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford
University Press.
Pepe, M. S., G. M. Longton, and H. Janes. 2009. Estimation and comparison of receiver operating characteristic
curves. Stata Journal 9: 1–16.
Working, H., and H. Hotelling. 1929. Application of the theory of error to the interpretation of trends. Journal of the
American Statistical Association 24 (Suppl.): 73–85.
Also see
[R] rocfit postestimation — Postestimation tools for rocfit
[R] roc — Receiver operating characteristic (ROC) analysis
[R] rocreg — Receiver operating characteristic (ROC) regression
[U] 20 Estimation and postestimation commands
Title
rocfit postestimation — Postestimation tools for rocfit
Description Syntax for rocplot Menu Options for rocplot
Remarks and examples Also see
Description
The following command is of special interest after rocfit:
Command Description
rocplot plot the fitted ROC curve and simultaneous confidence bands
The following standard postestimation commands are also available:
Command Description
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
See Using lincom and test below.
Special-interest postestimation command
rocplot plots the fitted ROC curve and simultaneous confidence bands.
Syntax for rocplot
rocplot [, rocplot options]
rocplot options Description
Main
confband display confidence bands
norefline suppress plotting the reference line
level(#)set confidence level; default is level(95)
Plot
plotopts(plot options)affect rendition of the ROC points
Fit line
lineopts(cline options)affect rendition of the fitted ROC line
CI plot
ciopts(area options)affect rendition of the confidence bands
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
plot options Description
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
cline options change the look of the line
Menu
Statistics >Epidemiology and related >ROC analysis >ROC curves after rocfit
Options for rocplot
 
Main
confband specifies that simultaneous confidence bands be plotted around the ROC curve.
norefline suppresses plotting the 45-degree reference line from the graphical output of the ROC
curve.
level(#)specifies the confidence level, as a percentage, for the confidence bands. The default is
level(95) or as set by set level; see [R]level.
 
Plot
plotopts(plot options)affects the rendition of the plotted ROC points, including the size and color of
markers, whether and how the markers are labeled, and whether and how the points are connected.
For the full list of available plot options, see [G-3]marker options,[G-3]marker label options,
and [G-3]cline options.
 
Fit line
lineopts(cline options)affects the rendition of the fitted ROC line; see [G-3]cline options.
 
CI plot
ciopts(area options)affects the rendition of the confidence bands; see [G-3]area options.
 
Reference line
rlopts(cline options)affects the rendition of the reference line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph. See [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
Remarks are presented under the following headings:
Using lincom and test
Using rocplot
Using lincom and test
intercept, slope, and cut#, shown in example 1 of [R]rocfit, are equation names and not
variable names, so they need to be referenced as described in Special syntaxes after multiple-equation
estimation of [R]test. For example, instead of typing
. test intercept
intercept not found
r(111);
you should type
. test [intercept]_cons
( 1) [intercept]_cons = 0
chi2( 1) = 28.48
Prob > chi2 = 0.0000
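Similarly, the equal-variances hypothesis (slope = 1) can be tested, and lincom can be used to obtain
the estimate and confidence interval for any linear combination of the coefficients; a brief sketch
(output omitted):
. test [slope]_cons = 1
. lincom [intercept]_cons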
Using rocplot
Example 1
In example 1 of [R]rocfit, we fit a ROC curve by typing rocfit disease rating.
In the output table for our model, we are testing whether the variances of the two latent populations
are equal by testing that the slope =1.
We plot the fitted ROC curve.
. rocplot, confband
(figure: fitted ROC curve with simultaneous confidence bands, sensitivity versus 1 − specificity; area under curve = 0.9113, se(area) = 0.0295)
Also see
[R] rocfit — Parametric ROC models
[U] 20 Estimation and postestimation commands
Title
rocreg — Receiver operating characteristic (ROC) regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
Perform nonparametric analysis of ROC curve under covariates, using bootstrap
rocreg refvar classvar [classvars] [if] [in] [, np options]
Perform parametric analysis of ROC curve under covariates, using bootstrap
rocreg refvar classvar [classvars] [if] [in], probit [probit options]
Perform parametric analysis of ROC curve under covariates, using maximum likelihood
rocreg refvar classvar [classvars] [if] [in] [weight], probit ml [probit ml options]
np options Description
Model
auc estimate total area under the ROC curve; the default
roc(numlist)estimate ROC for given false-positive rates
invroc(numlist)estimate false-positive rates for given ROC values
pauc(numlist)estimate partial area under the ROC curve (pAUC) up to each
false-positive rate
cluster(varname)variable identifying resampling clusters
ctrlcov(varlist)adjust control distribution for covariates in varlist
ctrlmodel(strata |linear) stratify or regress on covariates; default is ctrlmodel(strata)
pvc(empirical |normal) use empirical or normal distribution percentile value estimates;
default is pvc(empirical)
tiecorrected adjust for tied observations; not allowed with pvc(normal)
Bootstrap
nobootstrap do not perform bootstrap, just output point estimates
bseed(#)random-number seed for bootstrap
breps(#)number of bootstrap replications; default is breps(1000)
bootcc perform case–control (stratified on refvar) sampling rather than
cohort sampling in bootstrap
nobstrata ignore covariate stratification in bootstrap sampling
nodots suppress bootstrap replication dots
Reporting
level(#)set confidence level; default is level(95)
probit options Description
Model
probit fit the probit model
roccov(varlist)covariates affecting ROC curve
fprpts(#)number of false-positive rate points to use in fitting ROC
curve; default is fprpts(10)
ctrlfprall fit ROC curve at each false-positive rate in control population
cluster(varname)variable identifying resampling clusters
ctrlcov(varlist)adjust control distribution for covariates in varlist
ctrlmodel(strata |linear) stratify or regress on covariates; default is ctrlmodel(strata)
pvc(empirical |normal) use empirical or normal distribution percentile value estimates;
default is pvc(empirical)
tiecorrected adjust for tied observations; not allowed with pvc(normal)
Bootstrap
nobootstrap do not perform bootstrap, just output point estimates
bseed(#)random-number seed for bootstrap
breps(#)number of bootstrap replications; default is breps(1000)
bootcc perform case–control (stratified on refvar) sampling rather than
cohort sampling in bootstrap
nobstrata ignore covariate stratification in bootstrap sampling
nodots suppress bootstrap replication dots
bsave(filename,. . . )save bootstrap replicates from parametric estimation
bfile(filename)use bootstrap replicates dataset for estimation replay
Reporting
level(#)set confidence level; default is level(95)
probit is required.
probit ml options Description
Model
probit fit the probit model
ml fit the probit model by maximum likelihood estimation
roccov(varlist)covariates affecting ROC curve
cluster(varname)variable identifying clusters
ctrlcov(varlist)adjust control distribution for covariates in varlist
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, line width, and display of omitted variables
Maximization
maximize options control the maximization process; seldom used
probit and ml are required.
fweights, iweights, and pweights are allowed with maximum likelihood estimation; see [U] 11.1.6 weight.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Epidemiology and related >ROC analysis >ROC regression models
Description
The rocreg command is used to perform receiver operating characteristic (ROC) analyses with
rating and discrete classification data under the presence of covariates.
The two variables refvar and classvar must be numeric. The reference variable indicates the true
state of the observation, such as diseased and nondiseased or normal and abnormal, and must be
coded as 0 and 1. The refvar coded as 0 can also be called the control population, while the refvar
coded as 1 comprises the case population. The rating or outcome of the diagnostic test or test modality
is recorded in classvar, which must be ordinal, with higher values indicating higher risk.
rocreg can fit three models: a nonparametric model, a parametric probit model that uses the
bootstrap for inference, and a parametric probit model fit using maximum likelihood.
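As a schematic sketch (with hypothetical variables disease and rating), the three forms correspond
to the following calls: nonparametric with bootstrap inference, parametric probit with bootstrap
inference, and parametric probit fit by maximum likelihood.
. rocreg disease rating
. rocreg disease rating, probit
. rocreg disease rating, probit ml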
Options
Options are presented under the following headings:
Options for nonparametric ROC estimation, using bootstrap
Options for parametric ROC estimation, using bootstrap
Options for parametric ROC estimation, using maximum likelihood
Options for nonparametric ROC estimation, using bootstrap
 
Model
auc estimates the total area under the ROC curve. This is the default summary statistic.
roc(numlist)estimates the ROC corresponding to each of the false-positive rates in numlist. The
values of numlist must be in the range (0,1).
invroc(numlist)estimates the false-positive rates corresponding to each of the ROC values in numlist.
The values of numlist must be in the range (0,1).
pauc(numlist)estimates the partial area under the ROC curve up to each false-positive rate in numlist.
The values of numlist must be in the range (0,1].
cluster(varname)specifies the variable identifying resampling clusters.
ctrlcov(varlist)specifies the covariates to be used to adjust the control population.
ctrlmodel(strata |linear) specifies how to model the control population of classifiers on
ctrlcov(). When ctrlmodel(linear) is specified, linear regression is used. The default is
ctrlmodel(strata); that is, the control population of classifiers is stratified on the control
variables.
pvc(empirical |normal) determines how the percentile values of the control population will be
calculated. When pvc(normal) is specified, the standard normal cumulative distribution function
(CDF) is used for calculation. Specifying pvc(empirical) will use the empirical CDFs of the
control population classifiers for calculation. The default is pvc(empirical).
tiecorrected adjusts the percentile values for ties. For each value of the classifier, one half the
probability that the classifier equals that value under the control population is added to the percentile
value. tiecorrected is not allowed with pvc(normal).
 
Bootstrap
nobootstrap specifies that bootstrap standard errors not be calculated.
bseed(#)specifies the random-number seed to be used in the bootstrap.
breps(#)sets the number of bootstrap replications. The default is breps(1000).
bootcc performs case–control (stratified on refvar) sampling rather than cohort bootstrap sampling.
nobstrata ignores covariate stratification in bootstrap sampling.
nodots suppresses bootstrap replicate dots.
 
Reporting
level(#); see [R]estimation options.
Options for parametric ROC estimation, using bootstrap
 
Model
probit fits the probit model. This option is required and implies parametric estimation.
roccov(varlist)specifies the covariates that will affect the ROC curve.
fprpts(#)sets the number of false-positive rate points to use in modeling the ROC curve. These
points form an equispaced grid on (0,1). The default is fprpts(10).
ctrlfprall models the ROC curve at each false-positive rate in the control population.
cluster(varname)specifies the variable identifying resampling clusters.
ctrlcov(varlist)specifies the covariates to be used to adjust the control population.
ctrlmodel(strata |linear) specifies how to model the control population of classifiers on
ctrlcov(). When ctrlmodel(linear) is specified, linear regression is used. The default is
ctrlmodel(strata); that is, the control population of classifiers is stratified on the control
variables.
pvc(empirical |normal) determines how the percentile values of the control population will be
calculated. When pvc(normal) is specified, the standard normal CDF is used for calculation.
Specifying pvc(empirical) will use the empirical CDFs of the control population classifiers for
calculation. The default is pvc(empirical).
tiecorrected adjusts the percentile values for ties. For each value of the classifier, one half the
probability that the classifier equals that value under the control population is added to the percentile
value. tiecorrected is not allowed with pvc(normal).
 
Bootstrap
nobootstrap specifies that bootstrap standard errors not be calculated.
bseed(#)specifies the random-number seed to be used in the bootstrap.
breps(#)sets the number of bootstrap replications. The default is breps(1000).
bootcc performs case–control (stratified on refvar) sampling rather than cohort bootstrap sampling.
nobstrata ignores covariate stratification in bootstrap sampling.
nodots suppresses bootstrap replicate dots.
bsave(filename, ...) saves bootstrap replicates from parametric estimation in the given filename
with specified options (that is, replace). bsave() is only allowed with parametric analysis using
bootstrap.
bfile(filename) specifies to use the bootstrap replicates dataset for estimation replay. bfile() is
only allowed with parametric analysis using bootstrap.
Reporting
level(#); see [R] estimation options.
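For example, a parametric analysis using bootstrap might combine several of these options as follows (a sketch only; d, y, x, z, and id are hypothetical variables, and myreps is a hypothetical filename):

    . rocreg d y, probit roccov(x) fprpts(20) ctrlcov(z) ctrlmodel(linear)
    >      cluster(id) breps(500) bseed(12345) bsave(myreps, replace)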
Options for parametric ROC estimation, using maximum likelihood
Model
probit fits the probit model. This option is required and implies parametric estimation.
ml fits the probit model by maximum likelihood estimation. This option is required and must be
specified with probit.
roccov(varlist) specifies the covariates that will affect the ROC curve.
cluster(varname) specifies the variable used for clustering.
ctrlcov(varlist) specifies the covariates to be used to adjust the control population.
Reporting
level(#); see [R] estimation options.
display options: noomitted, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch;
see [R] estimation options.
Maximization
maximize options: difficult, technique(algorithm spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init specs); see [R] maximize. These options are
seldom used. The technique(bhhh) option is not allowed.
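For example, a maximum likelihood fit might be requested as follows (a sketch only; d, y, x, z, and id are hypothetical variables):

    . rocreg d y, probit ml roccov(x) ctrlcov(z) cluster(id)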
Remarks and examples
Remarks are presented under the following headings:
Introduction
ROC statistics
Covariate-adjusted ROC curves
Parametric ROC curves: Estimating equations
Parametric ROC curves: Maximum likelihood
Introduction
Receiver operating characteristic (ROC) analysis provides a quantitative measure of the accuracy of
diagnostic tests to discriminate between two states or conditions. These conditions may be referred
to as normal and abnormal, nondiseased and diseased, or control and case. We will use these terms
interchangeably. The discriminatory accuracy of a diagnostic test is measured by its ability to correctly
classify known control and case subjects.
The analysis uses the ROC curve, a graph of the sensitivity versus 1 − specificity of the diagnostic
test. The sensitivity is the fraction of positive cases that are correctly classified by the diagnostic test,
whereas the specificity is the fraction of negative cases that are correctly classified. Thus the sensitivity
is the true-positive rate, and the specificity is the true-negative rate. We also call 1 − specificity the
false-positive rate.
These rates are functions of the possible outcomes of the diagnostic test. At each outcome, a
decision will be made by the user of the diagnostic test to classify the tested subject as either normal
or abnormal. The true-positive and false-positive rates measure the probability of correct classification
or incorrect classification of the subject as abnormal. Given the classification role of the diagnostic
test, we will refer to it as the classifier.
Using this basic definition of the ROC curve, Pepe (2000) and Pepe (2003) describe how ROC
analysis can be performed as a two-stage process. In the first stage, the control distribution of the
classifier is estimated. The specificity is then determined as the percentiles of the classifier values
calculated based on the control population. The false-positive rates are calculated as 1 − specificity.
In the second stage, the ROC curve is estimated as the cumulative distribution of the case population’s
“false-positive” rates, also known as the survival function under the case population of the previously
calculated percentiles. We use the terms ROC value and true-positive value interchangeably.
This formulation of ROC curve analysis provides simple, nonparametric estimates of several ROC
curve summary parameters: area under the ROC curve, partial area under the ROC curve, ROC value
for a given false-positive rate, and false-positive rate (also known as invROC) for a given ROC value.
In the next section, we will show how to use rocreg to compute these estimates with bootstrap
inference. There we will also show how rocreg complements the other nonparametric Stata ROC
commands roctab and roccomp.
Other factors beyond condition status and the diagnostic test may affect both stages of ROC analysis.
For example, a test center may affect the control distribution of the diagnostic test. Disease severity
may affect the distribution of the standardized diagnostic test under the case population. Our analysis
of the ROC curve in these situations will be more accurate if we take these covariates into account.
In a nonparametric ROC analysis, covariates may only affect the first stage of estimation; that is,
they may be used to adjust the control distribution of the classifier. In a parametric ROC analysis,
it is assumed that ROC follows a normal distribution, and thus covariates may enter the model at
both stages; they may be used to adjust the control distribution and to model ROC as a function of
these covariates and the false-positive rate. In parametric models, the two sets of covariates need not
be distinct; in fact, they are often the same.
To model covariate effects on the first stage of ROC analysis, Janes and Pepe (2009) propose a
covariate-adjusted ROC curve. We will demonstrate the covariate adjustment capabilities of rocreg
in Covariate-adjusted ROC curves.
To account for covariate effects at the second stage, we assume a parametric model. Particularly,
the ROC curve is a generalized linear model of the covariates. We will thus have a separate ROC curve
for each combination of the relevant covariates. In Parametric ROC curves: Estimating equations,
we show how to fit the model with estimating equations and bootstrap inference using rocreg.
This method, documented as the “pdf” approach in Alonzo and Pepe (2002), works well with weak
assumptions about the control distribution.
Also in Parametric ROC curves: Estimating equations, we show how to fit a constant-only parametric
model (involving no covariates) of the ROC curve with weak assumptions about the control distribution.
The constant-only model capabilities of rocreg in this context will be compared with those of rocfit.
roccomp has the binormal option, which will allow it to compute area under the ROC curve according
to a normal ROC curve, equivalent to that obtained by rocfit. We will compare this functionality
with that of rocreg.
In Parametric ROC curves: Maximum likelihood, we demonstrate maximum likelihood estimation
of the ROC curve model with rocreg. There we assume a normal linear model for the classifier
on the covariates and casecontrol status. This method is documented in Pepe (2003). We will also
demonstrate how to use this method with no covariates, and we will compare rocreg under the
constant-only model with rocfit and roccomp.
The rocregplot command is used repeatedly in this entry. This command provides graphical
output for rocreg and is documented in [R]rocregplot.
ROC statistics
roctab computes the ROC curve by calculating the false-positive rate and true-positive rate
empirically at every value of the input classifier. It makes no distributional assumptions about the
case or control distributions. We can get identical behavior from rocreg by using the default option
settings.
Example 1: Nonparametric ROC, AUC
Hanley and McNeil (1982) presented data from a study in which a reviewer was asked to
classify, using a five-point scale, a random sample of 109 tomographic images from patients with
neurological problems. The rating scale was as follows: 1 is definitely normal, 2 is probably normal,
3 is questionable, 4 is probably abnormal, and 5 is definitely abnormal. The true disease status was
normal for 58 of the patients and abnormal for the remaining 51 patients.
Here we list 9 of the 109 observations:
. use http://www.stata-press.com/data/r13/hanley
. list disease rating in 1/9
disease rating
1. 1 5
2. 0 1
3. 1 5
4. 0 4
5. 0 1
6. 0 3
7. 1 5
8. 0 5
9. 0 1
For each observation, disease identifies the true disease status of the subject (0 is normal, 1 is
abnormal), and rating contains the classification value assigned by the reviewer.
We run roctab on these data, specifying the graph option so that the ROC curve is rendered.
We then calculate the false-positive and true-positive rates of the ROC curve by using rocreg. We
graph the rates with rocregplot. Because we focus on rocreg output later, for now we use the
quietly prefix to omit the output of rocreg. Both graphs are combined using graph combine (see
[G-2]graph combine) for comparison. To ease the comparison, we specify the aspectratio(1)
option in roctab; this is the default aspect ratio in rocregplot.
. roctab disease rating, graph aspectratio(1) name(a) nodraw title("roctab")
. quietly rocreg disease rating
. rocregplot, name(b) nodraw legend(off) title("rocreg")
. graph combine a b
(figure omitted: left panel, titled roctab, plots Sensitivity against 1 − Specificity and notes "Area under ROC curve = 0.8932"; right panel, titled rocreg, plots the true-positive rate (ROC) against the false-positive rate)
Both roctab and rocreg compute the same false-positive rate and ROC values. The stairstep
line connection style of the graph on the right emphasizes the empirical nature of its estimates. The
control distribution of the classifier is estimated using the empirical CDF estimate. Similarly, the ROC
curve, the distribution of the resulting case observation false-positive rate values, is estimated using
the empirical CDF. Note the footnote in the roctab plot. By default, roctab will estimate the area
under the ROC curve (AUC) using a trapezoidal approximation to the estimated false-positive rate and
true-positive rate points.
The AUC can be interpreted as the probability that a randomly selected member of the case population
will have a larger classifier value than a randomly selected member of the control population. It can
also be viewed as the average ROC value, averaged uniformly over the (0,1) false-positive rate domain
(Pepe 2003).
The nonparametric estimator of the AUC (DeLong, DeLong, and Clarke-Pearson 1988;Hanley and
Hajian-Tilaki 1997) used by rocreg is equivalent to the sample mean of the percentile values of the
case observations. Thus to calculate the nonparametric AUC estimate, we only need to calculate the
percentile values of the case observations with respect to the control distribution.
This estimate can differ from the trapezoidal approximation estimate. Under discrete classification
data, as we have here, there may be ties between case and control classifier values. The trapezoidal
approximation uses linear interpolation between the classifier values to correct for ties. Correcting
the nonparametric estimator involves adding a correction term to each observation’s percentile value,
which measures the probability that the classifier is equal to (instead of less than) the observation’s
classifier value.
The tie-corrected nonparametric estimate (trapezoidal approximation) is used when we think the
true ROC curve is smooth. This means that the classifier we measure is a discretized approximation
of a true latent and continuous classifier.
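The following do-file-style sketch spells out this percentile-value calculation for the Hanley and McNeil data used in this example; it mirrors the description above and is meant only as an illustration, not as rocreg's implementation:

    * Sketch: tie-corrected percentile values and the nonparametric AUC by hand
    use http://www.stata-press.com/data/r13/hanley, clear
    quietly count if disease == 0
    local n0 = r(N)                         // number of control observations
    generate double pv = .                  // tie-corrected percentile value
    quietly forvalues i = 1/`=_N' {
            if disease[`i'] == 1 {
                    count if disease == 0 & rating <  rating[`i']
                    local below = r(N)
                    count if disease == 0 & rating == rating[`i']
                    replace pv = (`below' + 0.5*r(N))/`n0' in `i'
            }
    }
    * The mean of pv over the case observations is the tie-corrected
    * nonparametric AUC estimate (the trapezoidal approximation)
    summarize pv if disease == 1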
We now recompute the ROC curve of rating for classifying disease and calculate the AUC.
Specifying the tiecorrected option allows tie correction to be used in the rocreg calculation.
Under nonparametric estimation, rocreg bootstraps to obtain standard errors and confidence intervals
for requested statistics. We use the default 1,000 bootstrap replications to obtain confidence intervals
for our parameters. This is a reasonable lower bound to the number of replications (Mooney and
Duval 1993) required for estimating percentile confidence intervals. By specifying the summary option
in roctab, we will obtain output showing the trapezoidal approximation of the AUC estimate, along
with standard error and confidence interval estimates for the trapezoidal approximation suggested by
DeLong, DeLong, and Clarke-Pearson (1988).
. roctab disease rating, summary
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
109 0.8932 0.0307 0.83295 0.95339
. rocreg disease rating, tiecorrected bseed(29092)
(running rocregstat on estimation sample)
Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
(output omitted )
.................................................. 950
.................................................. 1000
Bootstrap results Number of obs = 109
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical, corrected for ties
ROC method : empirical
Area under the ROC curve
Status : disease
Classifier: rating
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.8931711 .000108 .0292028 .8359347 .9504075 (N)
.8290958 .9457951 (P)
.8280714 .9450642 (BC)
The estimates of AUC match well. The standard error from roctab is close to the bootstrap
standard error calculated by rocreg. The bootstrap standard error generalizes to the more complex
models that we consider later, whereas the roctab standard-error calculation does not.
The AUC can be used to compare different classifiers. It is the most popular summary statistic for
comparisons (Pepe, Longton, and Janes 2009). roccomp will compute the trapezoidal approximation
of the AUC and graph the ROC curves of multiple classifiers. Using the DeLong, DeLong, and Clarke-
Pearson (1988) covariance estimates for the AUC estimate, roccomp performs a Wald test of the null
hypothesis that all classifier AUC values are equal. rocreg has similar capabilities.
Example 2: Nonparametric ROC, AUC, multiple classifiers
Hanley and McNeil (1983) presented data from an evaluation of two computer algorithms designed
to reconstruct CT images from phantoms. We will call these two algorithms modalities 1 and 2. A
sample of 112 phantoms was selected; 58 phantoms were considered normal, and the remaining 54
were abnormal. Each of the two modalities was applied to each phantom, and the resulting images
were rated by a reviewer using a six-point scale: 1 is definitely normal, 2 is probably normal, 3
is possibly normal, 4 is possibly abnormal, 5 is probably abnormal, and 6 is definitely abnormal.
Because each modality was applied to the same sample of phantoms, the two sets of outcomes are
correlated.
We list the first seven observations:
. use http://www.stata-press.com/data/r13/ct, clear
. list in 1/7, sep(0)
mod1 mod2 status
1. 2 1 0
2. 5 5 1
3. 2 1 0
4. 2 3 0
5. 5 6 1
6. 2 2 0
7. 3 2 0
Each observation corresponds to one phantom. The mod1 variable identifies the rating assigned
for the first modality, and the mod2 variable identifies the rating assigned for the second modality.
The true status of the phantoms is given by status==0 if they are normal and status==1 if they
are abnormal. The observations with at least one missing rating were dropped from the analysis.
A fictitious dataset was created from this true dataset, adding a third test modality. We will use
roccomp to compute the AUC statistic for each modality in these data and compare the AUC of the
three modalities. We obtain the same behavior from rocreg. As before, the tiecorrected option
is specified so that the AUC is calculated with the trapezoidal approximation.
. use http://www.stata-press.com/data/r13/ct2
. roccomp status mod1 mod2 mod3, summary
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8828 0.0317 0.82067 0.94498
mod2 112 0.9302 0.0256 0.88005 0.98042
mod3 112 0.9240 0.0241 0.87670 0.97132
Ho: area(mod1) = area(mod2) = area(mod3)
chi2(2) = 6.54 Prob>chi2 = 0.0381
. rocreg status mod1 mod2 mod3, tiecorrected bseed(38038) nodots
Bootstrap results Number of obs = 112
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical, corrected for ties
ROC method : empirical
Area under the ROC curve
Status : status
Classifier: mod1
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.8828225 -.0006367 .0322291 .8196546 .9459903 (N)
.8147518 .9421572 (P)
.8124397 .9394085 (BC)
Status : status
Classifier: mod2
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9302363 -.0015402 .0259593 .8793569 .9811156 (N)
.8737522 .9737432 (P)
.8739467 .9737768 (BC)
Status : status
Classifier: mod3
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9240102 -.0003528 .0247037 .8755919 .9724286 (N)
.8720036 .9674485 (P)
.8693548 .965 (BC)
Ho: All classifiers have equal AUC values.
Ha: At least one classifier has a different AUC value.
P-value: .0389797 Test based on bootstrap (N) assumptions.
We see that the AUC estimates are equivalent, and the standard errors are quite close as well.
The p-value for the tests of equal AUC under rocreg leads to similar inference as the p-value from
roccomp. The Wald test performed by rocreg uses the joint bootstrap estimate variance matrix of the
three AUC estimators rather than the DeLong, DeLong, and Clarke-Pearson (1988) variance estimate
used by roccomp.
roccomp is used here on potentially correlated classifiers that are recorded in wide-format data.
It can also be used on long-format data to compare independent classifiers. Further details can be
found in [R]roccomp.
Citing the AUC's lack of clinical relevance, there are arguments against using it as a key summary
statistic of the ROC curve (Pepe 2003; Cook 2007). Pepe, Longton, and Janes (2009) suggest using
the estimate of the ROC curve itself at a particular point, or the estimate of the false-positive rate at
a given ROC value, also known as invROC.
Recall from example 1 how nonparametric rocreg graphs look, with the stairstep pattern in the
ROC curve. In an ideal world, the graph would be a smooth one-to-one function, and it would be
trivial to map a false-positive rate to its corresponding true-positive rate and vice versa.
However, smooth ROC curves can only be obtained by assuming a parametric model that uses
linear interpolation between observed false-positive rates and between observed true-positive rates, and
rocreg is certainly capable of that; see example 1 of [R]rocregplot. However, under nonparametric
estimation, the mapping between false-positive rates and true-positive rates is not one to one, and
estimates tend to be less reliable the further you are from an observed data point. This is somewhat
mitigated by using tie-corrected rates (the tiecorrected option).
When we examine continuous data, the difference between the tie-corrected estimates and the
standard estimates becomes negligible, and the empirical estimate of the ROC curve becomes close
to the smooth ROC curve obtained by linear interpolation. So the nonparametric ROC and invROC
estimates work well.
Fixing one rate value of interest can be difficult and subjective (Pepe 2003). A compromise measure
is the partial area under the ROC curve (pAUC) (McClish 1989; Thompson and Zucchini 1989). This
is the integral of the ROC curve from 0 up to a given false-positive rate (perhaps the largest
clinically acceptable value). Like the AUC estimate, the nonparametric estimate of the pAUC can be
written as a sample average of the case observation percentiles, but with an adjustment based on the
prescribed maximum false-positive rate (Dodd and Pepe 2003). A tie correction may also be applied
so that it reflects the trapezoidal approximation.
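To see the adjustment concretely, the following do-file-style sketch computes the nonparametric pAUC up to a false-positive rate of 0.5 for the Hanley and McNeil data, using the _fpr_ utility variable that a nonparametric rocreg run leaves behind (see the technical note following example 4); it is an illustration only:

    * Sketch: nonparametric pAUC up to t0 as an average over case observations
    use http://www.stata-press.com/data/r13/hanley, clear
    quietly rocreg disease rating, nobootstrap      // leaves behind _fpr_rating
    local t0 = 0.5
    generate double contrib = max(0, `t0' - _fpr_rating) if disease == 1
    * the mean over the case observations approximates the pAUC estimate at t0
    summarize contrib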
We cannot compare rocreg with roctab or roccomp on the estimation of pAUC, because pAUC
is not computed by the latter two.
Example 3: Nonparametric ROC, other statistics
To see how rocreg estimates ROC, invROC, and pAUC, we will examine a new study. Wieand et al.
(1989) examined a pancreatic cancer study with two continuous classifiers, here called y1 (CA 19-9)
and y2 (CA 125). This study was also examined in Pepe, Longton, and Janes (2009). The indicator
of cancer in a subject is recorded as d. The study was a case–control study, stratifying participants
on disease status.
We list the first five observations:
. use http://labs.fhcrc.org/pepe/book/data/wiedat2b, clear
(S. Wieand - Pancreatic cancer diagnostic marker data)
. list in 1/5
y1 y2 d
1. 28 13.3 no
2. 15.5 11.1 no
3. 8.2 16.7 no
4. 3.4 12.6 no
5. 17.3 7.4 no
We will estimate the ROC curves at a large value (0.7) and a small value (0.2) of the false-positive
rate. These values are specified in roc(). The false-positive rate for ROC or sensitivity value of 0.6 will
also be estimated by specifying invroc(). Percentile confidence intervals for these parameters are
displayed in the graph obtained by rocregplot after rocreg. The pAUC statistic will be calculated
for the false-positive rate of 0.5, which is specified as an argument to the pauc() option. Following
Pepe, Longton, and Janes (2009), we use a stratified bootstrap, sampling separately from the case
and control populations by specifying the bootcc option. This reflects the case–control nature of the
study.
All four statistics can be estimated simultaneously by rocreg. For clarity, however, we will estimate
each statistic with a separate call to rocreg. rocregplot is used after estimation to graph the ROC
and false-positive rate estimates. The display of the individual, observation-specific false-positive rate
and ROC values will be omitted in the plot. This is accomplished by specifying msymbol(i) in our
plot1opts() and plot2opts() options to rocregplot.
. rocreg d y1 y2, roc(.7) bseed(8378923) bootcc nodots
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
ROC curve
Status : d
Classifier: y1
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.7 .9222222 -.0021889 .0323879 .8587432 .9857013 (N)
.8444445 .9777778 (P)
.8555555 .9777778 (BC)
Status : d
Classifier: y2
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.7 .8888889 -.0035556 .0414215 .8077043 .9700735 (N)
.8 .9611111 (P)
.7888889 .9555556 (BC)
Ho: All classifiers have equal ROC values.
Ha: At least one classifier has a different ROC value.
Test based on bootstrap (N) assumptions.
ROC P-value
.7 .5423044
. rocregplot, plot1opts(msymbol(i)) plot2opts(msymbol(i))
(figure omitted: ROC curves for CA 19−9 and CA 125, true-positive rate (ROC) versus false-positive rate, with confidence intervals at the requested points)
In this study, we see that classifier y1 (CA 19-9) is a uniformly better test than is classifier y2
(CA 125) until high levels of false-positive rate and sensitivity or ROC value are reached. At the high
level of false-positive rate, 0.7, the ROC value does not significantly differ between the two classifiers.
This can be seen in the plot by the overlapping confidence intervals.
. rocreg d y1 y2, roc(.2) bseed(8378923) bootcc nodots
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
ROC curve
Status : d
Classifier: y1
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.2 .7777778 .0011778 .0483655 .6829831 .8725725 (N)
.6888889 .8777778 (P)
.6777778 .8666667 (BC)
Status : d
Classifier: y2
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.2 .4888889 -.0091667 .1339863 .2262806 .7514971 (N)
.2222222 .7 (P)
.2111111 .7 (BC)
Ho: All classifiers have equal ROC values.
Ha: At least one classifier has a different ROC value.
Test based on bootstrap (N) assumptions.
ROC P-value
.2 .043234
. rocregplot, plot1opts(msymbol(i)) plot2opts(msymbol(i))
(figure omitted: ROC curves for CA 19−9 and CA 125, true-positive rate (ROC) versus false-positive rate, with confidence intervals at the requested points)
The sensitivity for the false-positive rate of 0.2 is found to be higher under y1 than under y2, and
this difference is significant at the 0.05 level. In the plot, this is shown by the vertical confidence
intervals.
. rocreg d y1 y2, invroc(.6) bseed(8378923) bootcc nodots
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
False-positive rate
Status : d
Classifier: y1
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.6 0 .0158039 .0267288 -.0523874 .0523874 (N)
0 .0784314 (P)
0 .1372549 (BC)
Status : d
Classifier: y2
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.6 .254902 .0101961 .0757902 .1063559 .403448 (N)
.1372549 .4313726 (P)
.1176471 .3921569 (BC)
Ho: All classifiers have equal invROC values.
Ha: At least one classifier has a different invROC value.
Test based on bootstrap (N) assumptions.
invROC P-value
.6 .0016562
. rocregplot, plot1opts(msymbol(i)) plot2opts(msymbol(i))
(figure omitted: ROC curves for CA 19−9 and CA 125, true-positive rate (ROC) versus false-positive rate, with confidence intervals at the requested points)
We find significant evidence that false-positive rates corresponding to a sensitivity of 0.6 are
different from y1 to y2. This is visually indicated by the horizontal confidence intervals, which are
separated from each other.
. rocreg d y1 y2, pauc(.5) bseed(8378923) bootcc nodots
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
Partial area under the ROC curve
Status : d
Classifier: y1
Observed Bootstrap
pAUC Coef. Bias Std. Err. [95% Conf. Interval]
.5 .3932462 -.0000769 .021332 .3514362 .4350562 (N)
.3492375 .435512 (P)
.3492375 .435403 (BC)
Status : d
Classifier: y2
Observed Bootstrap
pAUC Coef. Bias Std. Err. [95% Conf. Interval]
.5 .2496732 .0019168 .0374973 .1761798 .3231666 (N)
.177451 .3253268 (P)
.1738562 .3233115 (BC)
Ho: All classifiers have equal pAUC values.
Ha: At least one classifier has a different pAUC value.
Test based on bootstrap (N) assumptions.
pAUC P-value
.5 .0011201
We also find significant evidence supporting the hypothesis that the pAUC for y1 up to a false-positive
rate of 0.5 differs from the area of the same region under the ROC curve of y2.
Covariate-adjusted ROC curves
When covariates affect the control distribution of the diagnostic test, thresholds for the test being
classified as abnormal may be chosen that vary with the covariate values. These conditional thresholds
will be more accurate than the marginal thresholds that would normally be used, because they take
into account the specific distribution of the diagnostic test under the given covariate values as opposed
to the marginal distribution over all covariate values.
By using these covariate-specific thresholds, we are essentially creating new classifiers for each
covariate-value combination, and thus we are creating multiple ROC curves. As explained in Pepe (2003),
when the case and control distributions of the covariates are the same, the marginal ROC curve will
always be bound above by these covariate-specific ROC curves. So using conditional thresholds will
never provide a less powerful diagnostic test in this case.
In the marginal ROC curve calculation, the classifiers are standardized to percentiles according
to the control distribution, marginalized over the covariates. Thus the ROC curve is the CDF of
the standardized case observations. The covariate-adjusted ROC curve is the CDF of one minus the
conditional control percentiles for the case observations, and the marginal ROC curve is the CDF of
one minus the marginal control percentiles for the case observations (Pepe and Cai 2004). Thus the
standardization of classifier to false-positive rate value is conditioned on the specific covariate values
under the covariate-adjusted ROC curve.
The covariate-adjusted ROC curve (Janes and Pepe 2009) at a given false-positive rate t is equivalent
to the expected value of the covariate-specific ROC at t over all covariate combinations. When the
covariates in question do not affect the case distribution of the classifier, the covariate-specific ROC will
have the same value at each covariate combination. So here the covariate-adjusted ROC is equivalent
to the covariate-specific ROC, regardless of covariate values.
When covariates do affect the case distribution of the classifier, users of the diagnostic test would
likely want to model the covariate-specific ROC curves separately. Tools to do this can be found in
the parametric modeling discussion in the following two sections. Regardless, the covariate-adjusted
ROC curve can serve as a meaningful summary of covariate-adjusted accuracy.
Also note that the ROC summary statistics defined in the previous section have covariate-adjusted
analogs. These analogs are estimated in a similar manner as under the marginal ROC curve (Janes,
Longton, and Pepe 2009). The options for their calculation in rocreg are identical to those given in
the previous section. Further details can be found in Methods and formulas.
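To make "conditional control percentiles" concrete, the following do-file-style sketch computes percentile values stratified on a single discrete covariate; d, y, and z are hypothetical status, classifier, and covariate variables, and the sketch illustrates the idea rather than rocreg's implementation:

    * Sketch: percentile values conditional on a discrete covariate z (stratification)
    generate double pv = .
    quietly forvalues i = 1/`=_N' {
            if d[`i'] == 1 {
                    count if d == 0 & z == z[`i']
                    local n0 = r(N)
                    count if d == 0 & z == z[`i'] & y < y[`i']
                    replace pv = r(N)/`n0' in `i'
            }
    }
    * the covariate-adjusted ROC curve is the empirical CDF of 1 - pv
    * over the case observations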
Example 4: Nonparametric ROC, linear covariate adjustment
Norton et al. (2000) studied data from a neonatal audiology study on three tests to identify hearing
impairment in newborns. These data were also studied in Janes, Longton, and Pepe (2009). Here we
list 5 of the 5,058 observations.
. use http://www.stata-press.com/data/r13/nnhs, clear
(Norton - neonatal audiology data)
. list in 1/5
id ear male currage d y1 y2 y3
1. B0157 R M 42.42 0 -3.1 -9 -1.5
2. B0157 L M 42.42 0 -4.5 -8.7 -2.71
3. B0158 R M 40.14 1 -3.2 -13.2 -2.64
4. B0161 L F 38.14 0 -22.1 -7.8 -2.59
5. B0167 R F 37 0 -10.9 -6.6 -1.42
The classifiers y1 (DPOAE 65 at 2 kHz), y2 (TEOAE 80 at 2 kHz), and y3 (ABR) and the hearing
impairment indicator dare recorded along with some relevant covariates. The infant’s age is recorded
in months as currage, and the infant’s gender is indicated by male. Over 90% of the newborns
were tested in each ear (ear), so we will cluster on infant ID (id).
Following the strategy of Janes, Longton, and Pepe (2009), we will first perform ROC analysis for
the classifiers while adjusting for the covariate effects of the infant’s gender and age. This is done
by specifying these variables in the ctrlcov() option. We adjust using a linear regression rule,
by specifying ctrlmodel(linear). This means that when a user of the diagnostic test chooses a
threshold conditional on the age and gender covariates, they assume that the diagnostic test classifier
has some linear dependence on age and gender and equal variance as their levels vary. Our cluster
adjustment is made by specifying the cluster() option.
We will focus on the first classifier. The percentile, or specificity, values are calculated empirically
by default, and thus so are the false-positive rates (1 − specificity). Also by default, the ROC curve
values are empirically defined by the false-positive rates. To draw the ROC curve, we again use
rocregplot.
The AUC is calculated by default. For brevity, we specify the nobootstrap option so that bootstrap
sampling is not performed. The AUC point estimate will be sufficient for our purposes.
. rocreg d y1, ctrlcov(male currage) ctrlmodel(linear) cluster(id) nobootstrap
Nonparametric ROC estimation
Covariate control : linear regression
Control variables : male currage
Control standardization: empirical
ROC method : empirical
Status : d
Classifier: y1
Covariate control adjustment model:
Linear regression Number of obs = 4907
F( 2, 2685) = 13.80
Prob > F = 0.0000
R-squared = 0.0081
Root MSE = 7.7515
(Std. Err. adjusted for 2686 clusters in id)
Robust
y1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
male .2471744 .2603598 0.95 0.343 -.2633516 .7577005
currage -.2032456 .0389032 -5.22 0.000 -.2795288 -.1269624
_cons -1.239484 1.487855 -0.83 0.405 -4.156942 1.677973
Area under the ROC curve
Status : d
Classifier: y1
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.6293994 . . . . (N)
. . (P)
. . (BC)
. rocregplot
(figure omitted: ROC curve for DPOAE 65 at 2 kHz, true-positive rate (ROC) versus false-positive rate)
Our covariate control adjustment model shows that currage has a negative effect on y1 (DPOAE 65
at 2 kHz) under the control population. At the 0.001 significance level, we reject that its contribution
to y1 is zero, and the point estimate has a negative sign. This result does not directly tell us about
the effect of currage on the ROC curve of y1 as a classifier of d. None of the case observations are
used in the linear regression, so information on currage for abnormal cases is not used in the model.
This result does show us how to calculate false-positive rates for tests that use thresholds conditional
on a child’s sex and current age. We will see how currage affects the ROC curve when y1 is used as
a classifier and conditional thresholds are used based on male and currage in the following section,
Parametric ROC curves: Estimating equations.
Technical note
Under this nonparametric estimation, rocreg saved the false-positive rate for each observation's
y1 values in the utility variable _fpr_y1. The true-positive rates are stored in the utility variable
_roc_y1. For other models, say with classifier yname, these variables would be named _fpr_yname
and _roc_yname. The variables _roc_* and _fpr_* are usually for internal rocreg use only and are
overwritten with each call of rocreg. They
In these models, covariates may only affect the first stage of estimation, the control distribution, and
not the ROC curve itself. In parametric models that allow ROC covariates, different covariate values
would lead to different ROC curves.
To see how the covariate-adjusted ROC curve estimate differs from the standard marginal estimate,
we will reestimate the ROC curve for classifier y1 without covariate adjustment. We rename these
variables before the new estimation and then draw an overlaid twoway line (see [G-2]graph twoway
line) plot to compare the two.
. rename _fpr_y1 o_fpr_y1
. rename _roc_y1 o_roc_y1
. label variable o_roc_y1 "covariate_adjusted"
. rocreg d y1, cluster(id) nobootstrap
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
Area under the ROC curve
Status : d
Classifier: y1
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.6279645 . . . . (N)
. . (P)
. . (BC)
. label variable _roc_y1 "marginal"
. twoway line _roc_y1 _fpr_y1, sort(_fpr_y1 _roc_y1) connect(J) ||
> line o_roc_y1 o_fpr_y1, sort(o_fpr_y1 o_roc_y1)
> connect(J) lpattern(dash) aspectratio(1) legend(cols(1))
(figure omitted: overlaid line plot of the marginal and covariate_adjusted ROC curves against the false-positive rate for y1)
Though they are close, particularly in AUC, there are clearly some points of difference between
the estimates. So the covariate-adjusted ROC curve may be useful here.
In our examples thus far, we have used the empirical CDF estimator to estimate the control
distribution. rocreg allows some flexibility here. The pvc(normal) option may be specified to
calculate the percentile values according to a Gaussian distribution of the control.
Covariate adjustment in rocreg may also be performed with stratification instead of linear
regression. Under the stratification method, the unique values of the stratified covariates each define
separate parameters for the control distribution of the classifier. A user of the diagnostic test chooses
a threshold based on the control distribution conditioned on the unique covariate value parameters.
We will demonstrate the use of normal percentile values and covariate stratification in our next
example.
Example 5: Nonparametric ROC, covariate stratification
The hearing test study of Stover et al. (1996) examined the effectiveness of negative signal-to-noise
ratio, nsnr, as a classifier of hearing loss. The test was administered under nine different settings,
corresponding to different frequency, xf, and intensity, xl, combinations. Here we list 10 of the 1,848
observations.
. use http://www.stata-press.com/data/r13/dp, clear
(Stover - DPOAE test data)
. list in 1/10
id d nsnr xf xl xd
1. 101 1 18 10.01 5.5 3.5
2. 101 1 19 20.02 5.5 3
3. 101 1 7.6 10.01 6 3.5
4. 101 1 15 20.02 6 3
5. 101 1 16 10.01 6.5 3.5
6. 101 1 5.8 20.02 6.5 3
7. 102 0 -2.6 10.01 5.5 .
8. 102 0 -3 14.16 5.5 .
9. 102 1 10 20.02 5.5 1
10. 102 0 -5.8 10.01 6 .
Hearing loss is represented by d. The covariate xd is a measure of the degree of hearing loss. We
will use this covariate in later analysis, because it only affects the case distribution of the classifier.
Multiple measurements are taken for each individual, id, so we will cluster by individual.
We evaluate the effectiveness of nsnr using xf and xl as stratification covariates with rocreg;
stratification is the default method of covariate adjustment.
As mentioned before, the default false-positive rate calculation method in rocreg estimates the
conditional control distribution of the classifiers empirically. For comparison, we will also estimate a
separate ROC curve using false-positive rates assuming the conditional control distribution is normal.
This behavior is requested by specifying the pvc(normal) option. Using the rocregplot option
name() to store the ROC plots and using the graph combine command, we are able to compare the
Gaussian and empirical ROC curves side by side. As before, for brevity we specify the nobootstrap
option to suppress bootstrap sampling.
. rocreg d nsnr, ctrlcov(xf xl) cluster(id) nobootstrap
Nonparametric ROC estimation
Covariate control : stratification
Control variables : xf xl
Control standardization: empirical
ROC method : empirical
Area under the ROC curve
Status : d
Classifier: nsnr
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9264192 . . . . (N)
. . (P)
. . (BC)
. rocregplot, title(Empirical FPR) name(a) nodraw
. rocreg d nsnr, pvc(normal) ctrlcov(xf xl) cluster(id) nobootstrap
Nonparametric ROC estimation
Covariate control : stratification
Control variables : xf xl
Control standardization: normal
ROC method : empirical
Area under the ROC curve
Status : d
Classifier: nsnr
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9309901 . . . . (N)
. . (P)
. . (BC)
. rocregplot, title(Normal FPR) name(b) nodraw
. graph combine a b, xsize(5)
(figure omitted: two panels, titled Empirical FPR and Normal FPR, each plotting the −SNR ROC curve as true-positive rate (ROC) versus false-positive rate)
On cursory visual inspection, we see little difference between the two curves. The AUC values are close
as well. So it is sensible to assume that we have Gaussian percentile values for control standardization.
Parametric ROC curves: Estimating equations
We now assume a parametric model for covariate effects on the second stage of ROC analysis.
Particularly, the ROC curve is a probit model of the covariates. We will thus have a separate ROC
curve for each combination of the relevant covariates.
Under weak assumptions about the control distribution of the classifier, we can fit this model by
using estimating equations as described in Alonzo and Pepe (2002). This method can also be used
without covariate effects in the second stage, assuming a parametric model for the single (constant
only) ROC curve. Covariates may still affect the first stage of estimation, so we parametrically model
the single covariate-adjusted ROC curve (from the previous section). The marginal ROC curve, involving
no covariates in either stage of estimation, can be fit parametrically as well.
In addition to the Alonzo and Pepe (2002) explanation, further details are given in Pepe, Longton,
and Janes (2009); Janes, Longton, and Pepe (2009); Pepe (2003); and Janes and Pepe (2009).
The parametric models that we consider assume that the ROC curve is a cumulative distribution
function g invoked with input of a linear polynomial in the corresponding quantile function invoked
on the false-positive rate u. In this context, we assume that g corresponds to a standard normal
cumulative distribution function, Φ. So the corresponding quantile function is Φ⁻¹. The constant
intercept of the polynomial may depend on covariates, but the slope term α (the quantile coefficient)
may not.

    ROC(u) = g{x′β + αg⁻¹(u)}
The first step of the algorithm involves the choice of false-positive rates to use in the parametric
fit. These are typically a set of equispaced points spanning the interval (0,1). Alonzo and Pepe (2002)
examined the effect of fitting large and small sets of points, finding that relatively small sets could
be used with little loss of efficiency. Alternatively, the set can be formed by using the observed
false-positive rates in the data (Pepe 2003). Further details on the algorithm are provided in Methods
and formulas.
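The following do-file-style sketch illustrates the idea behind the algorithm for a constant-only probit model on the Hanley and McNeil data of example 1. It standardizes the case observations using the _fpr_ utility variable from a nonparametric rocreg run (see the technical note following example 4), builds binary outcomes over a grid of fitting points, and fits a probit regression. It is an illustration only; its point estimates will differ slightly from rocreg's, and inference would still require the bootstrap:

    * Sketch: estimating-equations fit of a constant-only probit ROC model
    use http://www.stata-press.com/data/r13/hanley, clear
    quietly rocreg disease rating, nobootstrap      // leaves behind _fpr_rating
    keep if disease == 1                            // work with the case observations
    generate long caseid = _n
    expand 10                                       // 10 fitting points, as in the default fprpts(10)
    bysort caseid: generate double u = (_n - 0.5)/10   // an equispaced grid on (0,1);
                                                       // the exact points rocreg uses may differ
    generate byte hit = _fpr_rating <= u            // indicator that this case is detected at rate u
    generate double quant = invnormal(u)
    probit hit quant, vce(cluster caseid)
    * _cons estimates the ROC intercept, and the coefficient on quant estimates
    * the slope alpha; compare with the rocreg probit fit in example 8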
Under parametric estimation, the summary measures we defined earlier, except for the AUC, are not
calculated until postestimation. In models with covariates, each covariate combination would yield a
different ROC curve and thus different summary parameters, so no summary parameters are initially
estimated. In marginal parametric models (where there are no ROC covariates, but there are potentially
control covariates), we will calculate the AUC and leave the other measures for postestimation;
see [R]rocreg postestimation. As with the other parameters, we bootstrap for standard errors and
inference.
We will now demonstrate how rocreg performs the Alonzo and Pepe (2002) algorithm using the
previous section’s examples and others.
Example 6: Parametric ROC, linear covariate adjustment
We return to the neonatal audiology study with gender and age covariates (Norton et al. 2000),
which we discussed in example 4. Janes, Longton, and Pepe (2009) suspected the current age of
the infant would play a role in the case distribution of the classifier y1 (DPOAE 65 at 2 kHz). They
postulated a probit link between the ROC curve and the covariate-adjusted false-positive rates. We
follow their investigation and reach similar results.
In example 4, we saw the results of adjusting for the currage and male variables in the control
population for classifier y1. Now we see how currage affects the ROC curve when y1 is used with
thresholds conditioned on male and currage.
We specify the covariates that should affect the ROC curve in the roccov() option. By default,
rocreg will choose 10 equally spaced false-positive rates in the (0,1) interval as fitting points. The
fprpts() option allows the user to specify more or fewer points. We specify the bsave() option
with the nnhs2y1 dataset so that we can use the bootstrap resamples in postestimation.
. use http://www.stata-press.com/data/r13/nnhs, clear
(Norton - neonatal audiology data)
. rocreg d y1, probit ctrlcov(currage male) ctrlmodel(linear) roccov(currage)
> cluster(id) bseed(56930) bsave(nnhs2y1) nodots
Bootstrap results Number of obs = 5056
Replications = 1000
Parametric ROC estimation
Covariate control : linear regression
Control variables : currage male
Control standardization: empirical
ROC method : parametric Link: probit
Status : d
Classifier: y1
Covariate control adjustment model:
Linear regression Number of obs = 4907
F( 2, 2685) = 13.80
Prob > F = 0.0000
R-squared = 0.0081
Root MSE = 7.7515
(Std. Err. adjusted for 2686 clusters in id)
Robust
y1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
currage -.2032456 .0389032 -5.22 0.000 -.2795288 -.1269624
male .2471744 .2603598 0.95 0.343 -.2633516 .7577005
_cons -1.239484 1.487855 -0.83 0.405 -4.156942 1.677973
Status : d
Classifier: y1
ROC Model :
(Replications based on 2741 clusters in id)
Observed Bootstrap
y1 Coef. Bias Std. Err. [95% Conf. Interval]
_cons -1.272505 -.0566737 1.076706 -3.38281 .8377993 (N)
-3.509356 .7178385 (P)
-3.487457 .7813575 (BC)
currage .0448228 .0015878 .0280384 -.0101316 .0997771 (N)
-.007932 .1033131 (P)
-.0102905 .101021 (BC)
probit
_cons .9372393 .0128376 .0747228 .7907853 1.083693 (N)
.8079087 1.101941 (P)
.7928988 1.083399 (BC)
Note how the number of clusters (here, infants) changes from the covariate control adjustment
model fit to the ROC model fit. The control fit is limited to the control observations and thus fewer
infants. The ROC model is fit on all the data, so the variance is adjusted for clustering on all infants.
With a 0.05 level of statistical significance, we cannot reject the null hypothesis that currage has
no effect on the ROC curve at a given false-positive rate. This is because each of our 95% bootstrap
confidence intervals contains 0. This corresponds with the finding in Janes, Longton, and Pepe (2009)
where the reported 95% intervals each contained 0. We cannot reject that the intercept parameter β0,
reported as _cons in the main table, is 0 at the 0.05 level either. The slope parameter α, reported
as _cons in the probit table, is close to 1 and cannot be rejected as being 1 at the 0.05 level.
Under the assumption that the ROC coefficients except α are 0 and that α = 1, the ROC curve at
false-positive rate u is equal to u. In other words, we cannot reject that the false-positive rate is
equal to the true-positive rate, and so the test is noninformative. Further investigation of the results
requires postestimation; see [R] rocreg postestimation.
The fitting point set can be formed by using the observed false-positive rates (Pepe 2003). Our
next example will illustrate this.
Example 7: Parametric ROC, covariate stratification
We return to the hearing test study of Stover et al. (1996), which we discussed in example 5.
Pepe (2003) suspected that intensity, xd, would play a role in the case distribution of the negative
signal-to-noise ratio (nsnr) classifier. A ROC regression was fit with covariate adjustment for xf and
xl with stratification, and for ROC covariates xf, xl, and xd. There is no prohibition against the
same covariate being used in the first and second stages of ROC calculation. The false-positive rate
fitting point set was composed of all observed false-positive rates in the control data.
We fit the model with rocreg here. Using observed false-positive rates as the fitting point set can
make the dataset very large, so fitting the model is computationally intensive. We demonstrate the
fitting algorithm without precise confidence intervals, focusing instead on the coefficient estimates and
standard errors. We will thus perform only 50 bootstrap replications, a reasonable number to obtain
accurate standard error estimates (Mooney and Duval 1993). The number of replications is specified
in the breps() option.
The ROC covariates are specified in roccov(). We specify that all observed false-positive rates
in the control observations be used as fitting points with the ctrlfprall option. The nobstrata
option specifies that the bootstrap is not stratified. The covariate stratification in the first stage of
estimation does not affect the resampling. We will return to this example in postestimation, so we
save the bootstrap results in the nsnrf dataset with the bsave() option.
. use http://www.stata-press.com/data/r13/dp
(Stover - DPOAE test data)
. rocreg d nsnr, probit ctrlcov(xf xl) roccov(xf xl xd) ctrlfprall cluster(id)
> nobstrata bseed(156385) breps(50) bsave(nsnrf)
(running rocregstat on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
Bootstrap results Number of obs = 1848
Replications = 50
Parametric ROC estimation
Covariate control : stratification
Control variables : xf xl
Control standardization: empirical
ROC method : parametric Link: probit
Status : d
Classifier: nsnr
ROC Model :
(Replications based on 208 clusters in id)
Observed Bootstrap
nsnr Coef. Bias Std. Err. [95% Conf. Interval]
_cons 3.247872 -.0846178 .8490006 1.583862 4.911883 (N)
1.598022 4.690076 (P)
1.346904 4.690076 (BC)
xf .0502557 .014478 .0329044 -.0142357 .1147471 (N)
-.0031814 .1186107 (P)
-.0053095 .1132185 (BC)
xl -.4327223 -.0194846 .1116309 -.6515149 -.2139298 (N)
-.6570321 -.2499706 (P)
-.6570321 -.231854 (BC)
xd .4431764 .0086147 .0936319 .2596612 .6266916 (N)
.330258 .6672749 (P)
.3487118 .7674865 (BC)
probit
_cons 1.032657 -.0188887 .1224993 .7925628 1.272751 (N)
.7815666 1.236179 (P)
.7815666 1.237131 (BC)
We obtain results similar to those reported in Pepe (2003, 159). We find that the coefficients for
xl and xd differ from 0 at the 0.05 level of significance. So over certain covariate combinations, we
can have a variety of informative tests using nsnr as a classifier.
As mentioned before, when there are no covariates, rocreg can still fit a parametric model for the
ROC curve of a classifier by using the Alonzo and Pepe (2002) method. roccomp and rocfit can
fit marginal probit models as well. We will compare the behavior of rocreg with that of roccomp
and rocfit for probit models without covariates.
When the binormal option is specified, roccomp calculates the AUC for input classifiers according
to the maximum likelihood algorithm of rocfit. The rocfit algorithm expects discrete classifiers
but can slice continuous classifiers into discrete partitions. Further, the case and control distributions
are both assumed normal. Actually, the observed classification values are taken as discrete indicators
of the latent normally distributed classification values. This method is documented in Dorfman and
Alf (1969).
Alonzo and Pepe (2002) compared their estimating equations probability density function method
(with empirical estimation of the false-positive rates) to the maximum likelihood approach of Dorfman
and Alf (1969) and found that they had similar efficiency and mean squared error. So we should
expect rocfit and rocreg to give similar results when fitting a simple probit model.
Example 8: Parametric ROC, marginal model
We return to the Hanley and McNeil (1982) data. We will fit a probit model to the ROC curve,
assuming that the rating variable is a discrete indicator of an underlying latent normal random
variable in both the case and control populations of disease. We invoke rocfit with the default
options. rocreg is invoked with the probit option. The percentile values are calculated empirically.
Because there are fewer than 10 categories, there will be fewer than 10 false-positive rates that trigger
a different true-positive rate value. So for efficiency, we invoke rocreg with the ctrlfprall option.
. use http://www.stata-press.com/data/r13/hanley
. rocfit disease rating, nolog
Binormal model of disease on rating Number of obs = 109
Goodness-of-fit chi2(2) = 0.21
Prob > chi2 = 0.9006
Log likelihood = -123.64855
Coef. Std. Err. z P>|z| [95% Conf. Interval]
intercept 1.656782 0.310456 5.34 0.000 1.048300 2.265265
slope (*) 0.713002 0.215882 -1.33 0.184 0.289881 1.136123
/cut1 0.169768 0.165307 1.03 0.304 -0.154227 0.493764
/cut2 0.463215 0.167235 2.77 0.006 0.135441 0.790990
/cut3 0.766860 0.174808 4.39 0.000 0.424243 1.109477
/cut4 1.797938 0.299581 6.00 0.000 1.210770 2.385106
Indices from binormal fit
Index Estimate Std. Err. [95% Conf. Interval]
ROC area 0.911331 0.029506 0.853501 0.969161
delta(m) 2.323671 0.502370 1.339044 3.308298
d(e) 1.934361 0.257187 1.430284 2.438438
d(a) 1.907771 0.259822 1.398530 2.417012
(*) z test for slope==1
. rocreg disease rating, probit ctrlfprall bseed(8574309) nodots
Bootstrap results Number of obs = 109
Replications = 1000
Parametric ROC estimation
Control standardization: empirical
ROC method : parametric Link: probit
Status : disease
Classifier: rating
ROC Model :
Observed Bootstrap
rating Coef. Bias Std. Err. [95% Conf. Interval]
_cons 1.635041 .0588548 .3609651 .9275621 2.342519 (N)
1.162363 2.556508 (P)
1.164204 2.566174 (BC)
probit
_cons .6951252 .0572146 .3241451 .0598125 1.330438 (N)
.3500569 1.430441 (P)
.3372983 1.411953 (BC)
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9102903 -.0051749 .0314546 .8486405 .9719402 (N)
.837113 .9605498 (P)
.8468336 .9630486 (BC)
We see that the intercept and slope parameter estimates are close. The intercept (_cons in the main
table) is clearly nonzero. Under rocreg, the slope (_cons in the probit table) and its percentile
and bias-corrected confidence intervals are close to those of rocfit. The area under the ROC curve
for each of the rocreg and rocfit estimators also matches closely.
Now we will compare the parametric fit of rocreg under the constant probit model with roccomp.
Example 9: Parametric ROC, marginal model, multiple classifiers
We now use the fictitious dataset generated from Hanley and McNeil (1983). To fit a probit model
using roccomp, we specify the binormal option. Our specification of rocreg remains the same as
before.
rocregplot is used to render the model produced by rocreg. We specify several graph options
to both roccomp and rocregplot to ease comparison. When the binormal option is specified along
with graph, roccomp will draw the binormal fitted lines in addition to connected line plots of the
empirical false-positive and true-positive rates.
In this plot, we overlay scatterplots of the empirical false-positive rates (because percentile value
calculation defaulted to pvc(empirical)) and the parametric true-positive rates.
. use http://www.stata-press.com/data/r13/ct2, clear
. roccomp status mod1 mod2 mod3, summary binormal graph aspectratio(1)
> plot1opts(connect(i) msymbol(o))
> plot2opts(connect(i) msymbol(s))
> plot3opts(connect(i) msymbol(t))
> legend(label(1 "mod1") label(3 "mod2") label(5 "mod3")
> label(2 "mod1 fit") label(4 "mod2 fit")
> label(6 "mod3 fit") order(1 3 5 2 4 6) cols(1))
> title(roccomp) name(a) nodraw
Fitting binormal model for: mod1
Fitting binormal model for: mod2
Fitting binormal model for: mod3
ROC
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8945 0.0305 0.83482 0.95422
mod2 112 0.9382 0.0264 0.88647 0.99001
mod3 112 0.9376 0.0223 0.89382 0.98139
Ho: area(mod1) = area(mod2) = area(mod3)
chi2(2) = 8.27 Prob>chi2 = 0.0160
. rocreg status mod1 mod2 mod3, probit ctrlfprall bseed(867340912) nodots
Bootstrap results Number of obs = 112
Replications = 1000
Parametric ROC estimation
Control standardization: empirical
ROC method : parametric Link: probit
Status : status
Classifier: mod1
ROC Model :
Observed Bootstrap
mod1 Coef. Bias Std. Err. [95% Conf. Interval]
_cons 1.726034 .1363112 .5636358 .6213277 2.83074 (N)
1.162477 3.277376 (P)
1.152112 3.187595 (BC)
probit
_cons .9666323 .0872018 .4469166 .0906919 1.842573 (N)
.518082 2.219548 (P)
.5568404 2.394036 (BC)
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.8927007 -.0011794 .0313951 .8311675 .954234 (N)
.8245637 .9466904 (P)
.8210562 .9432855 (BC)
Status : status
Classifier: mod2
ROC Model :
Observed Bootstrap
mod2 Coef. Bias Std. Err. [95% Conf. Interval]
_cons 1.696811 .0918364 .5133386 .6906858 2.702936 (N)
1.21812 2.973929 (P)
1.22064 3.068454 (BC)
probit
_cons .4553828 .047228 .3345303 -.2002845 1.11105 (N)
.1054933 1.18013 (P)
.1267796 1.272523 (BC)
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.938734 -.0037989 .0261066 .8875659 .9899021 (N)
.8777664 .9778214 (P)
.8823555 .9792451 (BC)
Status : status
Classifier: mod3
ROC Model :
Observed Bootstrap
mod3 Coef. Bias Std. Err. [95% Conf. Interval]
_cons 2.281359 .1062846 .6615031 .9848363 3.577881 (N)
1.637764 4.157873 (P)
1.666076 4.474779 (BC)
probit
_cons 1.107736 .0514693 .4554427 .2150843 2.000387 (N)
.58586 2.28547 (P)
.6385949 2.671192 (BC)
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.9368321 -.0023853 .0231363 .8914859 .9821784 (N)
.8844096 .9722485 (P)
.8836259 .9718463 (BC)
Ho: All classifiers have equal AUC values.
Ha: At least one classifier has a different AUC value.
P-value: .0778556 Test based on bootstrap (N) assumptions.
. rocregplot, title(rocreg) nodraw name(b)
> plot1opts(msymbol(o)) plot2opts(msymbol(s)) plot3opts(msymbol(t))
. graph combine a b, xsize(5)
(figure omitted: the combined graph shows two panels, the roccomp panel plotting Sensitivity against 1−Specificity and the rocreg panel plotting the true-positive rate against the false-positive rate, each with the empirical points and the fitted curves for mod1, mod2, and mod3)
We see differing true-positive rate values in the scattered points, which is expected because roccomp
gives the empirical estimate and rocreg gives the parametric estimate. However, the estimated curves
and areas under the ROC curve look similar. Using the Wald test based on the bootstrap covariance,
rocreg rejects the null hypothesis that each test has the same AUC at the 0.1 significance level.
roccomp formulates the asymptotic covariance using the rocfit estimates of AUC. Examination of
its output leads to rejection of the null hypothesis that the AUCs are equal across each test at the 0.05
significance level.
Parametric ROC curves: Maximum likelihood
The Alonzo and Pepe (2002) method of fitting a parametric model to the ROC curve is powerful
because it can be generally applied, but that can be a limitation as well. Whenever we invoke the
method and want anything other than point estimates of the parameters, we must perform bootstrap
resampling.
An alternative is to use maximum likelihood inference to fit the ROC curve. This method can save
computational time by avoiding the bootstrap.
rocreg implements maximum likelihood estimation for ROC curve analysis when both the case
and control populations are normal. Particularly, the classifier is a normal linear model on certain
covariates, and the covariate effect and variance of the classifier may change between the case and
control populations. This model is defined in Pepe (2003, 145).
    y = z′β₀ + Dx′β₁ + σ(D)ε

Our error term, ε, is a standard normal random variable. The variable D is our true status variable,
being 1 for the case population observations and 0 for the control population observations. The
variance function σ is defined as

    σ(D) = σ₀I(D = 0) + σ₁I(D = 1)

This provides two variance parameters in the model and does not depend on covariate values.
Suppose a covariate xᵢ is present in z and x. The coefficient β₁ᵢ represents the interaction effect
of xᵢ and D. It is the extra effect that xᵢ has on classifier y under the case population, D = 1,
beyond the main effect β₀ᵢ. These β₁ coefficients are directly related to the ROC curve of y.
Under this model, the ROC curve is derived to be

    ROC(u) = Φ[{x′β₁ + σ₀Φ⁻¹(u)}/σ₁]
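The derivation takes only a few lines, and we sketch it here for completeness; it relies only on the model above. Conditional on the covariates, the classifier is distributed N(z′β₀, σ₀²) in the control population and N(z′β₀ + x′β₁, σ₁²) in the case population. The cutoff c whose control false-positive rate is u satisfies u = 1 − Φ{(c − z′β₀)/σ₀}, so c = z′β₀ − σ₀Φ⁻¹(u). Evaluating the case true-positive rate at that cutoff gives

    ROC(u) = P(y ≥ c | D = 1) = Φ{(z′β₀ + x′β₁ − c)/σ₁} = Φ[{x′β₁ + σ₀Φ⁻¹(u)}/σ₁]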
For convenience, we reparameterize the model at this point, creating the parameters βᵢ = β₁ᵢ/σ₁
and α = σ₀/σ₁. We refer to β₀ as the constant intercept, i_cons. The parameter α is referred to
as the constant slope, s_cons.

    ROC(u) = Φ{x′β + αΦ⁻¹(u)}
We may interpret the final coefficients as the standardized linear effect of the ROC covariate on
the classifier under the case population. The marginal effect of the covariate on the classifier in the
control population is removed, and it is rescaled by the case population standard deviation of the
classifier when all ROC covariate effects are removed. An appreciable effect on the classifier by a
ROC covariate in this measure leads to an appreciable effect on the classifier’s ROC curve by the ROC
covariate.
The advantage of estimating the control coefficients β₀ is similar to the gains of estimating the
covariate control models in the estimating equations ROC method and nonparametric ROC estimation.
This model would similarly apply when evaluating a test that is conditioned on control covariates.
Again we note that under parametric estimation, none of the summary measures we defined earlier,
except the AUC, is calculated until postestimation. In models with covariates, each covariate combination
would yield a different ROC curve and thus different summary parameters, so no summary parameters
are estimated initially. In marginal parametric models, we will calculate the AUC and leave the other
measures for postestimation. There is a simple closed-form formula for the AUC under the probit
model. Using this formula, the delta method can be invoked for inference on the AUC. Details on
AUC estimation for probit marginal models are found in Methods and formulas.
We will demonstrate the maximum likelihood method of rocreg by revisiting the models of the
previous section.
Example 10: Maximum likelihood ROC, single classifier
Returning to the hearing test study of Stover et al. (1996), we use a similar covariate grouping
as before. The frequency xf and intensity xl are control covariates (z), while all three covariates
xf, xl, and hearing loss degree xd are case covariates (x). In example 7, we fit this model using
the Alonzo and Pepe (2002) method. Earlier we stratified on the control covariates and estimated
the conditioned control distribution of nsnr empirically. Now we assume a normal linear model for
nsnr on xf and xl under the control population.
We fit the model by specifying the control covariates in the ctrlcov() option and the case
covariates in the roccov() option. The ml option tells rocreg to perform maximum likelihood
estimation.
. use http://www.stata-press.com/data/r13/dp, clear
(Stover - DPOAE test data)
. rocreg d nsnr, ctrlcov(xf xl) roccov(xf xl xd) probit ml cluster(id) nolog
Parametric ROC estimation
Covariate control : linear regression
Control variables : xf xl
Control standardization: normal
ROC method : parametric Link: probit
Status : d
Classifiers: nsnr
Classifier : nsnr
Covariate control adjustment model:
(Std. Err. adjusted for 208 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
xf .4690907 .1408683 3.33 0.001 .192994 .7451874
xl -3.187785 .8976521 -3.55 0.000 -4.947151 -1.42842
xd 3.042998 .3569756 8.52 0.000 2.343339 3.742657
_cons 23.48064 5.692069 4.13 0.000 12.32439 34.63689
casesd
_cons 7.979708 .354936 22.48 0.000 7.284047 8.67537
ctrlcov
xf -.1447499 .0615286 -2.35 0.019 -.2653438 -.0241561
xl -.8631348 .2871976 -3.01 0.003 -1.426032 -.3002378
_cons 1.109477 1.964004 0.56 0.572 -2.7399 4.958854
ctrlsd
_cons 7.731203 .3406654 22.69 0.000 7.063511 8.398894
Status : d
ROC Model :
(Std. Err. adjusted for 208 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
nsnr
i_cons 2.942543 .7569821 3.89 0.000 1.458885 4.426201
xf .0587854 .0175654 3.35 0.001 .024358 .0932129
xl -.3994865 .1171914 -3.41 0.001 -.6291775 -.1697955
xd .381342 .0449319 8.49 0.000 .2932771 .4694068
s_cons .9688578 .0623476 15.54 0.000 .8466587 1.091057
We find the results are similar to those of example 7. Frequency (xf) and intensity (xl) have a
negative effect on the classifier nsnr in the control population.
The negative control effect is mitigated for xf in the case population, but the effect for xl is even
more negative there. Hearing loss severity, xd, has a positive effect on nsnr in the case population,
and it is undefined in the control population.
The ROC coefficients are shown in the ROC Model table. Each is different from 0 at the 0.05
level. At this level, we also cannot conclude that the variances differ between the case and control
populations, because 1 is in the 95% confidence interval for s_cons, the ratio of the case to control
standard deviation parameters.
Both frequency (xf) and hearing loss severity (xd) make a positive contribution to the ROC curve
and thus make the test more powerful. Intensity (xl) has a negative effect on the ROC curve and
weakens the test. We previously saw in example 5 that the control distribution appears to be normal,
so using maximum likelihood to fit this model is a reasonable approach.
This model was also fit in Pepe (2003, 147). Pepe used separate least-squares estimates for the
case and control samples. We obtain similar results for the coefficients, but the maximum likelihood
fitting yields slightly different standard deviations by considering both case and control observations
concurrently. In addition, a misprint in Pepe (2003, 147) reports a coefficient of 4.91 for xl in the
case population instead of 3.19 as reported by Stata.
Inference on multiple classifiers using the Alonzo and Pepe (2002) estimating equation method
is performed by fitting each model separately and bootstrapping to determine the dependence of the
estimates. Using the maximum likelihood method, we also fit each model separately. We use suest
(see [R] suest) to estimate the joint variance–covariance of our parameter estimates.
For our models, we can view the score equation for each model as an estimating equation. The
estimate that solves the estimating equation (that makes the score 0) is asymptotically normal with a
variance matrix that can be estimated using the inverse of the squared scores. By stacking the score
equations of the separate models, we can estimate the variance matrix for all the parameter estimates
by using this rule. This is an informal explanation; further details can be found in [R] suest and in
the references Rogers (1993); White (1982 and 1996).
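To give a feel for how this stacking works, here is a minimal sketch, not rocreg's internal code, that fits two regressions separately and then combines them with suest to obtain a joint cluster-robust covariance matrix; the classifier, covariate, status, and cluster names (y1, y2, z1, z2, d, id) are hypothetical:

    regress y1 z1 z2 if d == 0        // separate fit for classifier y1
    estimates store m1
    regress y2 z1 z2 if d == 0        // separate fit for classifier y2
    estimates store m2
    suest m1 m2, vce(cluster id)      // stacked scores, sandwich (cluster-robust) VCE

After suest, cross-model hypotheses can be tested with test or lincom because the two sets of estimates now share one covariance matrix.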
Now we will examine a case with multiple classification variables.
Example 11: Maximum likelihood ROC, multiple classifiers
We return to the neonatal audiology study with gender and age covariates (Norton et al. 2000).
In example 6, we fit a model with male and currage as control covariates, and currage as a ROC
covariate for the classifier y1 (DPOAE 65 at 2 kHz). We will refit this model, extending it to include
the classifier y2 (TEOAE 80 at 2 kHz).
. use http://www.stata-press.com/data/r13/nnhs
(Norton - neonatal audiology data)
. rocreg d y1 y2, probit ml ctrlcov(currage male) roccov(currage) cluster(id)
> nolog
Parametric ROC estimation
Covariate control : linear regression
Control variables : currage male
Control standardization: normal
ROC method : parametric Link: probit
Status : d
Classifiers: y1 y2
Classifier : y1
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
currage .494211 .2126672 2.32 0.020 .077391 .9110311
_cons -15.00403 8.238094 -1.82 0.069 -31.1504 1.142338
casesd
_cons 8.49794 .4922792 17.26 0.000 7.533091 9.46279
ctrlcov
currage -.2032048 .0323803 -6.28 0.000 -.266669 -.1397406
male .2369359 .2201391 1.08 0.282 -.1945288 .6684006
_cons -1.23534 1.252775 -0.99 0.324 -3.690734 1.220055
ctrlsd
_cons 7.749156 .0782225 99.07 0.000 7.595843 7.902469
Classifier : y2
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
currage .5729861 .2422662 2.37 0.018 .0981532 1.047819
_cons -18.2597 9.384968 -1.95 0.052 -36.6539 .1344949
casesd
_cons 9.723858 .5632985 17.26 0.000 8.619813 10.8279
ctrlcov
currage -.1694575 .0291922 -5.80 0.000 -.2266732 -.1122419
male .7122587 .1993805 3.57 0.000 .3214802 1.103037
_cons -5.651728 1.129452 -5.00 0.000 -7.865415 -3.438042
ctrlsd
_cons 6.986167 .0705206 99.07 0.000 6.84795 7.124385
Status : d
ROC Model :
(Std. Err. adjusted for 2741 clusters in id)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
y1
i_cons -1.765608 1.105393 -1.60 0.110 -3.932138 .4009225
currage .0581566 .0290177 2.00 0.045 .0012828 .1150303
s_cons .9118864 .0586884 15.54 0.000 .7968593 1.026913
y2
i_cons -1.877825 .905174 -2.07 0.038 -3.651933 -.1037167
currage .0589258 .0235849 2.50 0.012 .0127002 .1051514
s_cons .7184563 .0565517 12.70 0.000 .607617 .8292957
Both classifiers have similar results. The results for y1 show the same direction as the estimating
equation results in example 6. However, we can now reject the null hypothesis that the ROC currage
coefficient is 0 at the 0.05 level.
In example 6, we could not reject that the slope parameter s_cons was 1 and that the constant
intercept or ROC coefficient for current age was 0. The resulting ROC curve implied a noninformative
test using y1 as a classifier. This is not the case with our current results. As currage increases, we
expect a steeper ROC curve and thus a more powerful test, for both classifiers y1 (DPOAE 65 at 2 kHz)
and y2 (TEOAE 80 at 2 kHz).
In example 10, the clustering of observations within infant id was adjusted in the individual fit of
nsnr. In our current example, the adjustment for the clustering of observations within id is performed
during concurrent estimation, as opposed to during the individual classifier fits (as in example 10).
This adjustment, performed by suest, is still accurate.
Now we will fit constant probit models and compare rocreg with rocfit and roccomp with the
binormal option. Our first applications of rocfit and roccomp are taken directly from examples 8
and 9. The Dorfman and Alf (1969) algorithm that rocfit works with uses discrete classifiers or
uses slicing to make a classifier discrete. So we are applying the maximum likelihood method of
rocreg on discrete classification data here, where it expects continuous data. We expect to see some
discrepancies, but we do not find great divergence in the estimates. After revisiting examples 8 and
9, we will fit a probit model with a continuous classifier and no covariates using rocreg, and we
will compare the results with those from rocfit.
Example 12: Maximum likelihood ROC, marginal model
Using the Hanley and McNeil (1982) data, discussed in example 1 and in example 8, we fit a
constant probit model of the classifier rating with true status disease. rocreg is invoked with the
ml option and compared with rocfit.
. use http://www.stata-press.com/data/r13/hanley, clear
. rocfit disease rating, nolog
Binormal model of disease on rating Number of obs = 109
Goodness-of-fit chi2(2) = 0.21
Prob > chi2 = 0.9006
Log likelihood = -123.64855
Coef. Std. Err. z P>|z| [95% Conf. Interval]
intercept 1.656782 0.310456 5.34 0.000 1.048300 2.265265
slope (*) 0.713002 0.215882 -1.33 0.184 0.289881 1.136123
/cut1 0.169768 0.165307 1.03 0.304 -0.154227 0.493764
/cut2 0.463215 0.167235 2.77 0.006 0.135441 0.790990
/cut3 0.766860 0.174808 4.39 0.000 0.424243 1.109477
/cut4 1.797938 0.299581 6.00 0.000 1.210770 2.385106
Indices from binormal fit
Index Estimate Std. Err. [95% Conf. Interval]
ROC area 0.911331 0.029506 0.853501 0.969161
delta(m) 2.323671 0.502370 1.339044 3.308298
d(e) 1.934361 0.257187 1.430284 2.438438
d(a) 1.907771 0.259822 1.398530 2.417012
(*) z test for slope==1
. rocreg disease rating, probit ml nolog
Parametric ROC estimation
Control standardization: normal
ROC method : parametric Link: probit
Status : disease
Classifiers: rating
Classifier : rating
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
_cons 2.3357 .2334285 10.01 0.000 1.878188 2.793211
casesd
_cons 1.117131 .1106124 10.10 0.000 .9003344 1.333927
ctrlcov
_cons 2.017241 .1732589 11.64 0.000 1.67766 2.356823
ctrlsd
_cons 1.319501 .1225125 10.77 0.000 1.07938 1.559621
Status : disease
ROC Model :
Coef. Std. Err. z P>|z| [95% Conf. Interval]
rating
i_cons 2.090802 .2941411 7.11 0.000 1.514297 2.667308
s_cons 1.181151 .1603263 7.37 0.000 .8669177 1.495385
auc .9116494 .0261658 34.84 0.000 .8603654 .9629333
We compare the estimates for these models:

                        rocfit    rocreg, ml
    slope               0.7130        1.1812
    SE of slope         0.2159        0.1603
    intercept           1.6568        2.0908
    SE of intercept     0.3105        0.2941
    AUC                 0.9113        0.9116
    SE of AUC           0.0295        0.0262
We find that both the intercept and the slope are estimated as higher with the maximum likelihood
method under rocreg than with rocfit. The AUC (ROC area in rocfit) is close for both commands.
We find that the standard errors of each of these estimates are slightly lower under rocreg than under
rocfit as well.
Both rocfit and rocreg suggest that the slope parameter of the ROC curve (slope in rocfit
and s_cons in rocreg) is not significantly different from 1. Thus we cannot reject that the classifier
has the same variance in both case and control populations. There is, however, significant evidence
that the intercepts (i_cons in rocreg and intercept in rocfit) differ from 0. Because of the
positive direction of the intercept estimates, the ROC curve for rating as a classifier of disease
suggests that rating provides an informative test. This is also suggested by the high AUC, which is
significantly different from 0.5, that is, a flip of a coin.
Example 13: Maximum likelihood ROC, marginal model, multiple classifiers
We use the fictitious dataset generated from Hanley and McNeil (1983), which we previously used
in example 2 and in example 9. To fit a probit model using roccomp, we specify the binormal option.
We perform parametric, maximum likelihood ROC analysis using rocreg. We use rocregplot to
plot the ROC curves created by rocreg.
. use http://www.stata-press.com/data/r13/ct2, clear
. roccomp status mod1 mod2 mod3, summary binormal graph aspectratio(1)
> plot1opts(connect(i) msymbol(o))
> plot2opts(connect(i) msymbol(s))
> plot3opts(connect(i) msymbol(t))
> legend(label(1 "mod1") label(3 "mod2") label(5 "mod3")
> label(2 "mod1 fit") label(4 "mod2 fit") label(6 "mod3 fit")
> order(1 3 5 2 4 6) cols(1)) title(roccomp) name(a) nodraw
Fitting binormal model for: mod1
Fitting binormal model for: mod2
Fitting binormal model for: mod3
ROC
Obs Area Std. Err. [95% Conf. Interval]
mod1 112 0.8945 0.0305 0.83482 0.95422
mod2 112 0.9382 0.0264 0.88647 0.99001
mod3 112 0.9376 0.0223 0.89382 0.98139
Ho: area(mod1) = area(mod2) = area(mod3)
chi2(2) = 8.27 Prob>chi2 = 0.0160
. rocreg status mod1 mod2 mod3, probit ml nolog
Parametric ROC estimation
Control standardization: normal
ROC method : parametric Link: probit
Status : status
Classifiers: mod1 mod2 mod3
Classifier : mod1
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
_cons 2.118135 .2165905 9.78 0.000 1.693626 2.542645
casesd
_cons 1.166078 .1122059 10.39 0.000 .9461589 1.385998
ctrlcov
_cons 2.344828 .1474147 15.91 0.000 2.0559 2.633755
ctrlsd
_cons 1.122677 .1042379 10.77 0.000 .9183746 1.32698
Classifier : mod2
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
_cons 2.659642 .2072731 12.83 0.000 2.253395 3.06589
casesd
_cons 1.288468 .1239829 10.39 0.000 1.045466 1.53147
ctrlcov
_cons 1.655172 .1105379 14.97 0.000 1.438522 1.871823
ctrlsd
_cons .8418313 .0781621 10.77 0.000 .6886365 .9950262
Classifier : mod3
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
_cons 2.353768 .1973549 11.93 0.000 1.966959 2.740576
casesd
_cons 1.143359 .1100198 10.39 0.000 .9277243 1.358994
ctrlcov
_cons 2.275862 .1214094 18.75 0.000 2.037904 2.51382
ctrlsd
_cons .9246267 .0858494 10.77 0.000 .7563649 1.092888
Status : status
ROC Model :
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
mod1
i_cons 1.81646 .3144804 5.78 0.000 1.20009 2.432831
s_cons .9627801 .1364084 7.06 0.000 .6954245 1.230136
auc .904657 .0343518 26.34 0.000 .8373287 .9719853
mod2
i_cons 2.064189 .3267274 6.32 0.000 1.423815 2.704563
s_cons .6533582 .1015043 6.44 0.000 .4544135 .8523029
auc .9580104 .0219713 43.60 0.000 .9149473 1.001073
mod3
i_cons 2.058643 .2890211 7.12 0.000 1.492172 2.625113
s_cons .8086932 .1163628 6.95 0.000 .5806262 1.03676
auc .9452805 .0236266 40.01 0.000 .8989732 .9915877
Ho: All classifiers have equal AUC values.
Ha: At least one classifier has a different AUC value.
P-value: .0808808
. rocregplot, title(rocreg) nodraw name(b)
> plot1opts(msymbol(o)) plot2opts(msymbol(s)) plot3opts(msymbol(t))
. graph combine a b, xsize(5)
(figure omitted: as in example 9, the combined graph shows the roccomp panel, Sensitivity against 1−Specificity, and the rocreg panel, true-positive rate against false-positive rate, each with points and fitted curves for mod1, mod2, and mod3)
We compare the AUC estimates for these models:

                roccomp    rocreg, ml
    mod1         0.8945        0.9047
    mod2         0.9382        0.9580
    mod3         0.9376        0.9453
Each classifier has a higher estimated AUC under rocreg than roccomp. Each curve appears to
be raised and smoothed in the rocreg fit as compared with roccomp. They are different, but not
drastically different. The inference on whether the curve areas are the same is similar to example 9.
We reject equality at the 0.10 level under rocreg and at the 0.05 level under roccomp.
Each intercept is significantly different from 0 at the 0.05 level and is estimated in a positive
direction. Though every classifier except mod2 has 1 in its slope confidence interval, the high intercepts
suggest steep ROC curves and powerful tests.
Also note that the false-positive and true-positive rate points are calculated empirically in the
roccomp graph and parametrically in rocreg. In example 9, the false-positive rates calculated by
rocreg were calculated empirically, similar to roccomp. But in this example, the rates are calculated
based on normal percentiles.
Now we will generate an example to compare rocfit and rocreg under maximum likelihood
estimation of a continuous classifier.
Example 14: Maximum likelihood ROC, graphical comparison with rocfit
We generate 500 realizations of a population under threat of disease. One quarter of the population
has the disease. A classifier x is measured, which has a control distribution of N(1, 3) and a case
distribution of N(1 + 5, 2). We will invoke rocreg with the ml option on this generated data. We
specify the continuous() option for rocfit and invoke it on the data as well. The continuous()
option tells rocfit how many discrete slices to partition the data into before fitting.
For comparison of the two curves, we will use the rocfit postestimation command, rocplot;
see [R]rocfit postestimation. This command graphs the empirical false-positive and true-positive
rates with an overlaid fit of the binormal curve estimated by rocfit. rocplot also supports an
addplot() option. We use the saved variables from rocreg in this option to overlay a line plot of
the rocreg fit.
. clear
. set seed 8675309
. set obs 500
obs was 0, now 500
. generate d = runiform() < .25
. quietly generate double epsilon = 3*invnormal(runiform()) if d == 0
. quietly replace epsilon = 2*invnormal(runiform()) if d == 1
. quietly generate double x = 1 + d*5 + epsilon
. rocreg d x, probit ml nolog
Parametric ROC estimation
Control standardization: normal
ROC method : parametric Link: probit
Status : d
Classifiers: x
Classifier : x
Covariate control adjustment model:
Coef. Std. Err. z P>|z| [95% Conf. Interval]
casecov
_cons 4.905612 .2411624 20.34 0.000 4.432943 5.378282
casesd
_cons 2.038278 .1299559 15.68 0.000 1.783569 2.292987
ctrlcov
_cons 1.010382 .1561482 6.47 0.000 .7043377 1.316427
ctrlsd
_cons 3.031849 .1104134 27.46 0.000 2.815443 3.248255
Status : d
ROC Model :
Coef. Std. Err. z P>|z| [95% Conf. Interval]
x
i_cons 2.406743 .193766 12.42 0.000 2.026969 2.786518
s_cons 1.487456 .1092172 13.62 0.000 1.273394 1.701518
auc .9103292 .012754 71.38 0.000 .8853318 .9353266
. rocfit d x, continuous(10) nolog
Binormal model of d on x Number of obs = 500
Goodness-of-fit chi2(7) = 1.69
Prob > chi2 = 0.9751
Log likelihood = -911.91338
Coef. Std. Err. z P>|z| [95% Conf. Interval]
intercept 2.207250 0.232983 9.47 0.000 1.750611 2.663888
slope (*) 1.281443 0.158767 1.77 0.076 0.970265 1.592620
/cut1 -1.895707 0.130255 -14.55 0.000 -2.151001 -1.640412
/cut2 -1.326900 0.089856 -14.77 0.000 -1.503015 -1.150784
/cut3 -0.723677 0.070929 -10.20 0.000 -0.862695 -0.584660
/cut4 -0.116960 0.064666 -1.81 0.070 -0.243702 0.009782
/cut5 0.442769 0.066505 6.66 0.000 0.312422 0.573116
/cut6 1.065183 0.075744 14.06 0.000 0.916728 1.213637
/cut7 1.689570 0.102495 16.48 0.000 1.488683 1.890457
/cut8 2.495841 0.185197 13.48 0.000 2.132861 2.858821
/cut9 3.417994 0.348485 9.81 0.000 2.734976 4.101012
Indices from binormal fit
Index Estimate Std. Err. [95% Conf. Interval]
ROC area 0.912757 0.013666 0.885972 0.939542
delta(m) 1.722473 0.127716 1.472153 1.972792
d(e) 1.934960 0.125285 1.689405 2.180515
d(a) 1.920402 0.121804 1.681670 2.159135
(*) z test for slope==1
. rocplot, plotopts(msymbol(i)) lineopts(lpattern(dash))
> norefline addplot(line _roc_x _fpr_x, sort(_fpr_x _roc_x)
> lpattern(solid)) aspectratio(1) legend(off)
(figure omitted: the rocplot graph of Sensitivity against 1 − Specificity shows the empirical points, the dashed rocfit binormal curve, and the solid overlaid rocreg curve; the graph note reports Area under curve = 0.9128, se(area) = 0.0137)
We find that the curves are close. As before, the rocfit estimates are lower for the slope and
intercept than under rocreg. The AUC estimates are close. Though the slope confidence interval
contains 1, a high ROC intercept suggests a steep ROC curve and thus a powerful test.
Stored results
Nonparametric rocreg stores the following in e():
Scalars
e(N) number of observations
e(N_strata) number of covariate strata
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) rocreg
e(cmdline) command as typed
e(classvars) classification variable list
e(refvar) status variable, reference variable
e(ctrlmodel) covariate-adjustment specification
e(ctrlcov) covariate-adjustment variables
e(pvc) percentile value calculation method
e(title) title in estimation output
e(tiecorrected) indicates whether tie correction was used
e(nobootstrap) indicates that bootstrap was not performed
e(bseed) seed used in bootstrap, if bootstrap performed
e(breps) number of bootstrap resamples, if bootstrap performed
e(cc) indicates whether case–control groups were used as resampling strata
e(nobstrata) indicates whether resampling should stratify based on control covariates
e(clustvar) name of cluster variable
e(roc) false-positive rates where ROC was estimated
e(invroc) ROC values where false-positive rates were estimated
e(pauc) false-positive rates where pAUC was estimated
e(auc) indicates that AUC was calculated
e(vce) bootstrap
e(properties) b V (or b if bootstrap not performed)
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
e(b_bs) bootstrap estimates
e(bias) estimated biases
e(se) estimated standard errors
e(z0) median biases
e(ci_normal) normal-approximation confidence intervals
e(ci_percentile) percentile confidence intervals
e(ci_bc) bias-corrected confidence intervals
Functions
e(sample) marks estimation sample
Parametric, bootstrap rocreg stores the following in e():
Scalars
e(N) number of observations
e(N_strata) number of covariate strata
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) rocreg
e(cmdline) command as typed
e(title) title in estimation output
e(classvars) classification variable list
e(refvar) status variable, reference variable
e(ctrlmodel) covariate-adjustment specification
e(ctrlcov) covariate-adjustment variables
e(pvc) percentile value calculation method
e(tiecorrected) indicates whether tie correction was used
e(probit) probit
e(roccov) ROC covariates
e(fprpts) number of points used as false-positive rate fit points
e(ctrlfprall) indicates whether all observed false-positive rates were used as fit points
e(nobootstrap) indicates that bootstrap was not performed
e(bseed) seed used in bootstrap
e(breps) number of bootstrap resamples
e(cc) indicates whether case–control groups were used as resampling strata
e(nobstrata) indicates whether resampling should stratify based on control covariates
e(clustvar) name of cluster variable
e(vce) bootstrap
e(properties) b V (or b if nobootstrap is specified)
e(predict) program used to implement predict
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
e(b_bs) bootstrap estimates
e(reps) number of nonmissing results
e(bias) estimated biases
e(se) estimated standard errors
e(z0) median biases
e(ci_normal) normal-approximation confidence intervals
e(ci_percentile) percentile confidence intervals
e(ci_bc) bias-corrected confidence intervals
Functions
e(sample) marks estimation sample
Parametric, maximum likelihood rocreg stores the following in e():
Scalars
e(N) number of observations
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) rocreg
e(cmdline) command as typed
e(classvars) classification variable list
e(refvar) status variable
e(ctrlmodel) linear
e(ctrlcov) control population covariates
e(roccov) ROC covariates
e(probit) probit
e(pvc) normal
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(vce) cluster if clustering used
e(vcetype) robust if multiple classifiers or clustering used
e(ml) indicates that maximum likelihood estimation was used
e(predict) program used to implement predict
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Assume that we applied a diagnostic test to each of N₀ control and N₁ case subjects. Further
assume that the higher the outcome value of the diagnostic test, the higher the risk of the subject
being abnormal. Let y₁ᵢ, i = 1, 2, . . . , N₁, and y₀ⱼ, j = 1, 2, . . . , N₀, be the values of the diagnostic
test for the case and control subjects, respectively. The true status variable D identifies an observation
as a case (D = 1) or a control (D = 0). The CDF of the classifier Y is F. Conditional on D, we write
the CDF as F_D.
Methods and formulas are presented under the following headings:
ROC statistics
Covariate-adjusted ROC curves
Parametric ROC curves: Estimating equations
Parametric ROC curves: Maximum likelihood
ROC statistics
We obtain these definitions and their estimates from Pepe (2003) and Pepe, Longton, and
Janes (2009). The false-positive and true-positive rates at cutoff c are defined as

    FPR(c) = P(Y ≥ c | D = 0)
    TPR(c) = P(Y ≥ c | D = 1)

The true-positive rate, or ROC value at false-positive rate u, is given by

    ROC(u) = P{1 − F₀(Y) ≤ u | D = 1}

When Y is continuous, the false-positive rate can be written as

    FPR(y) = 1 − F₀(y)
The empirical CDF for the sample z₁, . . . , zₙ is given by

    F̂(z) = (1/n) Σ_{i=1}^{n} I(z < zᵢ)

The empirical estimates F̂PR and R̂OC both use this empirical CDF estimator.
The area under the ROC curve is defined as

    AUC = ∫₀¹ ROC(u) du

The partial area under the ROC curve for false-positive rate a is defined as

    pAUC(a) = ∫₀ᵃ ROC(u) du
The nonparametric estimate for the AUC is given by

    ÂUC = (1/N₁) Σ_{i=1}^{N₁} {1 − F̂PR(y₁ᵢ)}

The nonparametric estimate of pAUC is given by

    p̂AUC(a) = (1/N₁) Σ_{i=1}^{N₁} max{1 − F̂PR(y₁ᵢ) − (1 − a), 0}
For discrete classifiers, a correction term is subtracted from the false-positive rate estimate so that
the ÂUC and p̂AUC estimates correspond with a trapezoidal approximation to the area of the ROC
curve.

    FPRᶜ(y) = 1 − F̂₀(y) − (1/2)(1/N₀) Σ_{j=1}^{N₀} I(y = y₀ⱼ)
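To make the ÂUC formula concrete, here is a minimal sketch, not rocreg's internal code, that averages 1 − F̂PR(y₁ᵢ) over the case observations of the Hanley and McNeil (1982) data from example 1; it treats F̂PR(y) as the proportion of control ratings at least as large as y and omits the tie correction above, so for this discrete classifier it will differ slightly from the tie-corrected estimate:

    use http://www.stata-press.com/data/r13/hanley, clear
    quietly count if disease == 0
    local N0 = r(N)                                  // number of controls
    generate double fprhat = .
    forvalues i = 1/`=_N' {
        quietly count if disease == 0 & rating >= rating[`i']
        quietly replace fprhat = r(N)/`N0' in `i'    // empirical FPR at this observation's rating
    }
    quietly summarize fprhat if disease == 1
    display 1 - r(mean)                              // uncorrected nonparametric AUC estimate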
In the nonparametric estimation of the ROC curve, all inference is performed using the bootstrap
command (see [R] bootstrap). rocreg also allows users to calculate the ROC curve and related
statistics by assuming a normal control distribution. So these formulas are updated by replacing F₀
by Φ (with adjustment of the marginal mean and variance of the control distribution).
Covariate-adjusted ROC curves
Suppose we observe covariate vector Z in addition to the classifier Y. Let Z₁ᵢ, i = 1, 2, . . . , N₁,
and Z₀ⱼ, j = 1, 2, . . . , N₀, be the values of the covariates for the case and control subjects, respectively.
The covariate-adjusted ROC curve is defined by Janes and Pepe (2009) as

    AROC(t) = E{ROC(t | Z₀)}

It is calculated by replacing the marginal control CDF estimate, F̂₀, with the conditional control CDF
estimate, F̂₀|Z. If we used a normal control CDF, then we would replace the marginal control mean
and variance with the conditional control mean and variance. The formulas of the previous section
can be updated for covariate-adjustment by making this substitution of the conditional CDF for the
marginal CDF in the false-positive rate calculation.
Because the calculation of the ROC value is now performed based on the conditionally calculated
false-positive rate, no further conditioning is made in its calculation under nonparametric estimation.
rocreg supports covariate adjustment with stratification and linear regression. Under stratification,
separate parameters are estimated for the control distribution at each level of the covariates. Under
linear regression, the classifier is regressed on the covariates over the control distribution, and the
resulting coefficients serve as parameters for F̂₀|Z.
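As a rough illustration of the linear-regression form of adjustment, not of rocreg's exact internal computation, one could fit the control-only regression and then form a normal percentile value, that is, a covariate-adjusted false-positive rate, for each observation; the variable names (y, d, z1, z2) are hypothetical:

    regress y z1 z2 if d == 0                    // linear model fit over the control distribution
    predict double mu0, xb                       // conditional control mean for every observation
    generate double pv = 1 - normal((y - mu0)/e(rmse))
                                                 // normal percentile value: P(Y0 >= y | Z)

With pvc(empirical) instead, the percentile value would be based on the empirical distribution of the control residuals rather than on the normal CDF.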
Parametric ROC curves: Estimating equations
Under nonparametric estimation of the ROC curve with covariate adjustment, no further conditioning
occurs in the ROC curve calculation beyond the use of covariate-adjusted false-positive rates as inputs.
Under parametric estimation of the ROC curve, we can relax this restriction. We model the ROC
curve as a cumulative distribution function g (standard normal Φ) invoked with input of a linear
polynomial in the corresponding quantile function (here Φ⁻¹) invoked on the false-positive rate u. The
constant intercept of the polynomial may depend on covariates; the slope term α (quantile coefficient)
may not.

    ROC(u) = g{x′β + αg⁻¹(u)}
Pepe (2003) notes that having a binormal ROC (g = Φ) is equivalent to specifying that some
monotone transformation of the data exists to make the case and control classifiers normally distributed.
This specification applies to the marginal case and control.
Under weak assumptions about the control distribution of the classifier, we can fit this model
by using estimating equations (Alonzo and Pepe 2002). The method can be used without covariate
effects in the second stage, assuming a parametric model for the single ROC curve. Using the Alonzo
and Pepe (2002) method, the covariate-adjusted ROC curve may be fit parametrically. The marginal
ROC curve, involving no covariates in either stage of estimation, can be fit parametrically as well. In
addition to the Alonzo and Pepe (2002) explanation, further details are given in Pepe, Longton, and
Janes (2009); Janes, Longton, and Pepe (2009); Pepe (2003); and Janes and Pepe (2009).
The algorithm can be described as follows:
1. Estimate the false-positive rates of the classifier fpr. These may be computed in any fashion
outlined so far: covariate-adjusted, empirically, etc.
2. Determine a set of n_p false-positive rates to use as fitting points f₁, . . . , f_{n_p}. These may be an
equispaced grid on (0, 1) or the set of observed false-positive rates from part 1.
3. Expand the case observation portion of the data to include a subobservation for each fitting point.
So there are now N₁(n_p − 1) additional observations in the data.
4. Generate a new dummy variable u. For subobservation j, u = I(fpr ≤ f_j).
5. Generate a new variable quant containing the quantiles of the false-positive rate fitting points.
For subobservation j, quant = g⁻¹(f_j).
6. Perform a binary regression (probit, g = Φ) of the dummy variable u on the covariates x and the
quantile variable quant.
The coefficients of part 6 are the coefficients of the ROC model. The coefficients of the covariates
coincide naturally with estimates of β, and the α parameter is estimated by the coefficient on quant.
Because the method is so general and makes few distributional assumptions, bootstrapping must be
performed for inference. If multiple classifiers are to be fit, the algorithm is performed separately for
each in each bootstrap, and the bootstrap is used to estimate covariances.
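The following sketch translates steps 3–6 into Stata commands for a single ROC covariate; it is only an illustration under assumed variable names (d for status, fpr for the case false-positive rates from part 1, and x1 for the ROC covariate), not rocreg's implementation, and it uses an equispaced grid of 10 fitting points:

    keep if d == 1                                     // case observations only
    generate long caseid = _n
    local np 10
    expand `np'                                        // step 3: one subobservation per fitting point
    bysort caseid: generate double f = _n/(`np' + 1)   // step 2: equispaced fitting points on (0,1)
    generate byte u = (fpr <= f)                       // step 4: dummy variable
    generate double quant = invnormal(f)               // step 5: probit quantiles of the fitting points
    probit u x1 quant                                  // step 6: coefficient on x1 estimates beta;
                                                       //         coefficient on quant estimates alpha

As the text notes, standard errors from this single fit are not appropriate; the whole procedure would be repeated in each bootstrap replication.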
We mentioned earlier that in parametric estimation, the AUC was the only summary parameter that
could be estimated initially. This is true when we fit the marginal probit model because there are no
covariates in part 6 of the algorithm.
To calculate the AUC statistic under a marginal probit model, we use the formula

    AUC = Φ{β₀/√(1 + α²)}
Alternatively, the AUC for the probit model can be calculated as pAUC(1) in postestimation. Under
both models, bootstrapping is performed for inference on the AUC.
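As a quick check of this formula, we can plug in the constant probit fit from example 12 (fit by maximum likelihood), which reported i_cons = 2.090802, s_cons = 1.181151, and an AUC of 0.9116; the same value is recovered, up to rounding, by typing

    display normal(2.090802/sqrt(1 + 1.181151^2))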
Parametric ROC curves: Maximum likelihood
rocreg supports another form of parametric ROC estimation: maximum likelihood with a normally
distributed classifier. This method assumes that the classifier is a normal linear model on certain
covariates, and the covariate effect and variance of the classifier may change between the case and
control populations. The model is defined in Pepe (2003, 145).
    y = z′β₀ + Dx′β₁ + σ(D)ε

Our error term, ε, is a standard normal random variable. The variable D is our true status variable,
being 1 for the case population observations and 0 for the control population observations. The
variance function σ is defined as

    σ(D) = σ₀I(D = 0) + σ₁I(D = 1)

This provides two variance parameters in the model and does not depend on covariate values.
Under this model, the ROC curve is easily derived to be

    ROC(u) = Φ[{x′β₁ + σ₀Φ⁻¹(u)}/σ₁]
We reparameterize the model, creating the parameters βᵢ = β₁ᵢ/σ₁ and α = σ₀/σ₁. We refer to β₀
as the constant intercept, i_cons. The parameter α is referred to as the constant slope, s_cons.

    ROC(u) = Φ{x′β + αΦ⁻¹(u)}
The original model defining the classifier y leads to the following single-observation likelihoods
for D = 0 and D = 1:

    L(β₀, β₁, σ₁, σ₀, D = 0, y, z, x) = {1/(√(2π) σ₀)} exp{−(y − z′β₀)²/(2σ₀²)}

    L(β₀, β₁, σ₁, σ₀, D = 1, y, z, x) = {1/(√(2π) σ₁)} exp{−(y − z′β₀ − x′β₁)²/(2σ₁²)}

These can be combined to yield the observation-level log likelihood:

    lnL(β₀, β₁, σ₁, σ₀, D, y, z, x) = −ln(2π)/2
        − I(D = 0){lnσ₀ + (y − z′β₀)²/(2σ₀²)}
        − I(D = 1){lnσ₁ + (y − z′β₀ − x′β₁)²/(2σ₁²)}
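For readers who want to see how such a likelihood can be coded, here is a minimal method-lf evaluator sketch for a single classifier; the program, equation, and variable names are hypothetical, the two standard deviations are parameterized on the log scale (our choice, not necessarily rocreg's), and this is not rocreg's internal evaluator, which also produces the reparameterized ROC coefficients:

    program define normroc_lf
        version 13
        args lnf xb0 xb1 lnsig0 lnsig1
        // $ML_y1 is the classifier y; $ML_y2 is the status variable D
        tempvar s0 s1
        quietly generate double `s0' = exp(`lnsig0')
        quietly generate double `s1' = exp(`lnsig1')
        quietly replace `lnf' = ln(normalden($ML_y1, `xb0', `s0'))         if $ML_y2 == 0
        quietly replace `lnf' = ln(normalden($ML_y1, `xb0' + `xb1', `s1')) if $ML_y2 == 1
    end

    ml model lf normroc_lf (ctrlcov: y d = z1 z2) (casecov: = x1 x2) /lnsig0 /lnsig1
    ml maximize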
When there are multiple classifiers, each classifier is fit separately with maximum likelihood. Then
the results are combined by stacking the scores and using the sandwich variance estimator. For more
information, see [R] suest and the references White (1982); Rogers (1993); and White (1996).
Acknowledgments
We thank Margaret S. Pepe, Holly Janes, and Gary Longton of the Fred Hutchinson Cancer
Research Center for providing the inspiration for the rocreg command and for illuminating many
useful datasets for its documentation.
References
Alonzo, T. A., and M. S. Pepe. 2002. Distribution-free ROC analysis using binary regression techniques. Biostatistics
3: 421–432.
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
———. 2000. sg120.2: Correction to roccomp command. Stata Technical Bulletin 54: 26. Reprinted in Stata Technical
Bulletin Reprints, vol. 9, p. 231. College Station, TX: Stata Press.
———. 2002a. Comparative assessment of three common algorithms for estimating the variance of the area under the
nonparametric receiver operating characteristic curve. Stata Journal 2: 280–289.
———. 2002b. From the help desk: Comparing areas under receiver operating characteristic curves from two or more
probit or logit models. Stata Journal 2: 301–313.
Cook, N. R. 2007. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 115:
928–935.
DeLong, E. R., D. M. DeLong, and D. L. Clarke-Pearson. 1988. Comparing the areas under two or more correlated
receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–845.
Dodd, L. E., and M. S. Pepe. 2003. Partial AUC estimation and regression. Biometrics 59: 614–623.
Dorfman, D. D., and E. Alf, Jr. 1969. Maximum-likelihood estimation of parameters of signal-detection theory and
determination of confidence intervals–rating-method data. Journal of Mathematical Psychology 6: 487–496.
Hanley, J. A., and K. O. Hajian-Tilaki. 1997. Sampling variability of nonparametric estimates of the areas under
receiver operating characteristic curves: An update. Academic Radiology 4: 49–58.
Hanley, J. A., and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic
(ROC) curve. Radiology 143: 29–36.
———. 1983. A method of comparing the areas under receiver operating characteristic curves derived from the same
cases. Radiology 148: 839–843.
Janes, H., G. M. Longton, and M. S. Pepe. 2009. Accommodating covariates in receiver operating characteristic
analysis. Stata Journal 9: 17–39.
Janes, H., and M. S. Pepe. 2009. Adjusting for covariate effects on classification accuracy using the covariate-adjusted
receiver operating characteristic curve. Biometrika 96: 371–382.
McClish, D. K. 1989. Analyzing a portion of the ROC curve. Medical Decision Making 9: 190–195.
Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury
Park, CA: Sage.
Norton, S. J., M. P. Gorga, J. E. Widen, R. C. Folsom, Y. Sininger, B. Cone-Wesson, B. R. Vohr, K. Mascher,
and K. Fletcher. 2000. Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic
emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and
Hearing 21: 508–528.
Pepe, M. S. 1998. Three approaches to regression analysis of receiver operating characteristic curves for continuous
test results. Biometrics 54: 124–135.
———. 2000. Receiver operating characteristic methodology. Journal of the American Statistical Association 95: 308–311.
———. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University
Press.
Pepe, M. S., and T. Cai. 2004. The analysis of placement values for evaluating discriminatory measures. Biometrics
60: 528–535.
Pepe, M. S., G. M. Longton, and H. Janes. 2009. Estimation and comparison of receiver operating characteristic
curves. Stata Journal 9: 1–16.
Rogers, W. H. 1993. sg16.4: Comparison of nbreg and glm for negative binomial. Stata Technical Bulletin 16: 7.
Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 82–84. College Station, TX: Stata Press.
Stover, L., M. P. Gorga, S. T. Neely, and D. Montoya. 1996. Toward optimizing the clinical utility of distortion
product otoacoustic emission measurements. Journal of the Acoustical Society of America 100: 956–967.
Thompson, M. L., and W. Zucchini. 1989. On the statistical analysis of ROC curves. Statistics in Medicine 8:
1277–1290.
White, H. L., Jr. 1982. Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25.
———. 1996. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.
Wieand, S., M. H. Gail, B. R. James, and K. L. James. 1989. A family of nonparametric statistics for comparing
diagnostic markers with paired or unpaired data. Biometrika 76: 585–592.
Also see
[R] rocreg postestimation — Postestimation tools for rocreg
[R] rocregplot — Plot marginal and covariate-specific ROC curves after rocreg
[R] rocfit — Parametric ROC models
[R] roc — Receiver operating characteristic (ROC) analysis
Title
rocreg postestimation — Postestimation tools for rocreg
Description Syntax for predict Menu for predict
Options for predict Syntax for estat nproc Menu for estat
Options for estat nproc Remarks and examples Stored results
Methods and formulas References Also see
Description
The following commands are of special interest after rocreg:
Command Description
estat nproc nonparametric ROC curve estimation, keeping fit information from rocreg
rocregplot plot marginal and covariate-specific ROC curves
The following standard postestimation commands are also available:
Command Description
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions for parametric ROC curve estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Special-interest postestimation command
The estat nproc command allows calculation of all the ROC curve summary statistics for
covariate-specific ROC curves, as well as for a nonparametric ROC estimation. Under nonparametric
estimation, a single ROC curve is estimated by rocreg. Covariates can affect this estimation, but
there are no separate covariate-specific ROC curves. Thus the input arguments for estat nproc are
taken in the command line rather than from the data as variable values.
Syntax for predict
predict [type] newvar [if] [in] [, statistic options]
statistic Description
Main
at(varname) input variable for statistic
auc total area under the ROC curve; the default
roc ROC values for given false-positive rates in at()
invroc false-positive rate for given ROC values in at()
pauc partial area under the ROC curve up to each false-positive
rate in at()
classvar(varname) statistic for given classifier
options Description
Options
intpts(#) points in numeric integration of pAUC calculation
se(newvar) predict standard errors
ci(stubname) produce confidence intervals, stored as variables with prefix
stubname and suffixes _l and _u
level(#) set confidence level; default is level(95)
bfile(filename, ...) load dataset containing bootstrap replicates from rocreg
btype(n | p | bc) produce normal-based (n), percentile (p), or bias-corrected (bc)
confidence intervals; default is btype(n)
bfile() and btype() are only allowed with parametric analysis using bootstrap inference.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
at(varname) records the variable to be used as input for the above predictions.
auc predicts the total area under the ROC curve defined by the covariate values in the data. This is
the default statistic.
roc predicts the ROC values for false-positive rates stored in varname specified in at().
invroc predicts the false-positive rates for given ROC values stored in varname specified in at().
pauc predicts the partial area under the ROC curve up to each false-positive rate stored in varname
specified in at().
classvar(varname) performs the prediction for the specified classifier.
 
Options
intpts(#) specifies that # points be used in the pAUC calculation.
se(newvar) specifies that standard errors be produced and stored in newvar.
ci(stubname) requests that confidence intervals be produced and the lower and upper bounds be
stored in stubname_l and stubname_u, respectively.
level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
bfile(filename, ...) uses bootstrap replicates of parameters from rocreg stored in filename to
estimate standard errors and confidence intervals of predictions.
btype(n | p | bc) specifies whether to produce normal-based (n), percentile (p), or bias-corrected (bc)
confidence intervals. The default is btype(n).
Syntax for estat nproc
estat nproc, estat_nproc_options
estat_nproc_options Description
Main
auc estimate total area under the ROC curve
roc(numlist) estimate ROC values for given false-positive rates
invroc(numlist) estimate false-positive rate for given ROC values
pauc(numlist) estimate partial area under the ROC curve up to each false-positive rate
At least one option must be specified.
Menu for estat
Statistics > Postestimation > Reports and statistics
Options for estat nproc
 
Main
auc estimates the total area under the ROC curve.
roc(numlist) estimates the ROC for each of the false-positive rates in numlist. The values in numlist
must be in the range (0,1).
invroc(numlist) estimates the false-positive rate for each of the ROC values in numlist. The values
in numlist must be in the range (0,1).
pauc(numlist) estimates the partial area under the ROC curve up to each false-positive rate in numlist.
The values in numlist must be in the range (0,1].
Remarks and examples
Remarks are presented under the following headings:
Using predict after rocreg
Using estat nproc
Using predict after rocreg
predict, after parametric rocreg, predicts the AUC, the ROC value, the false-positive rate (invROC),
or the pAUC value. The default is auc.
We begin by estimating the area under the ROC curve for each of the three age-specific ROC curves
in example 1 of [R] rocregplot: 30, 40, and 50 months.
Example 1: Parametric ROC, AUC
In example 6 of [R] rocreg, a probit ROC model was fit to audiology test data from Norton et al.
(2000). The estimating equations method of Alonzo and Pepe (2002) was used to fit the model.
Gender and age were covariates that affected the control distribution of the classifier y1 (DPOAE 65
at 2 kHz). Age was a ROC covariate for the model, so we fit separate ROC curves at each age.
Following Janes, Longton, and Pepe (2009), we drew the ROC curves for ages 30, 40, and 50
months in example 1 of [R] rocregplot. Now we use predict to estimate the AUC for the ROC curve
at each of those ages.
The bootstrap dataset saved by rocreg in example 6 of [R] rocreg, nnhs2y1.dta, is used in the
bfile() option.
We will store the AUC prediction in the new variable predAUC. We specify the se() option with
the new variable name seAUC to produce an estimate of the prediction’s standard error. By specifying
the stubname cin in ci(), we tell predict to create normal-based confidence intervals (the default)
as new variables cin_l and cin_u.
. use http://www.stata-press.com/data/r13/nnhs
(Norton - neonatal audiology data)
. rocreg d y1, probit ctrlcov(currage male) ctrlmodel(linear) roccov(currage)
> cluster(id) bseed(56930) bsave(nnhs2y1)
(output omitted )
. set obs 5061
obs was 5058, now 5061
. quietly replace currage = 30 in 5059
. quietly replace currage = 40 in 5060
. quietly replace currage = 50 in 5061
. predict predAUC in 5059/5061, auc se(seAUC) ci(cin) bfile(nnhs2y1)
. list currage predAUC seAUC cin* in 5059/5061
currage predAUC seAUC cin_l cin_u
5059. 30 .5209999 .0712928 .3812686 .6607312
5060. 40 .6479176 .0286078 .5918474 .7039879
5061. 50 .7601378 .0746157 .6138937 .9063819
As expected, we find the AUC to increase with age.
Essentially, we have a stored bootstrap sample of ROC covariate coefficient estimates in
nnhs2y1.dta. We calculate the AUC using each set of coefficient estimates, resulting in a sam-
ple of AUC estimates. Then the bootstrap standard error and confidence intervals are calculated based
on this AUC sample. Further details of the computation of the standard error and percentile confidence
intervals can be found in Methods and formulas and in [R] bootstrap.
We can also produce percentile or bias-corrected confidence intervals by specifying btype(p) or
btype(bc), which we now demonstrate.
. drop *AUC*
. predict predAUC in 5059/5061, auc se(seAUC) ci(cip) bfile(nnhs2y1) btype(p)
. list currage predAUC cip* in 5059/5061
currage predAUC cip_l cip_u
5059. 30 .5209999 .3760555 .6513149
5060. 40 .6479176 .5893397 .7032645
5061. 50 .7601378 .5881404 .8836223
. drop *AUC*
. predict predAUC in 5059/5061, auc se(seAUC) ci(cibc) bfile(nnhs2y1) btype(bc)
. list currage predAUC cibc* in 5059/5061
currage predAUC cibc_l cibc_u
5059. 30 .5209999 .3736968 .6500064
5060. 40 .6479176 .588947 .7010052
5061. 50 .7601378 .5812373 .8807758
predict can also estimate the ROC value and the false-positive rate (invROC).
Example 2: Parametric ROC, invROC, and ROC value
In example 7 of [R] rocreg, we fit the ROC curve for status variable hearing loss (d) and classifier
negative signal-to-noise ratio nsnr with ROC covariates frequency (xf), intensity (xl), and hearing
loss severity (xd). The data were obtained from Stover et al. (1996). The model fit was probit with
bootstrap resampling. We saved 50 bootstrap replications in the dataset nsnrf.dta.
The covariate value combinations xf = 10.01, xl = 5.5, and xd = .5, and xf = 10.01, xl =
6.5, and xd = 4 are of interest. In example 3 of [R] rocregplot, we estimated the ROC values for
false-positive rates 0.2 and 0.7 and the false-positive rate for a ROC value of 0.5 by using rocregplot.
We will use predict to replicate the estimation.
We begin by appending observations with our desired covariate combinations to the data. We also
create two new variables: rocinp, which contains the ROC values for which we wish to predict the
corresponding invROC values, and invrocinp, which contains the invROC values corresponding to
the ROC values we wish to predict.
. clear
. input xf xl xd rocinp invrocinp
xf xl xd rocinp invrocinp
1. 10.01 5.5 .5 .2 .
2. 10.01 6.5 4 .2 .
3. 10.01 5.5 .5 .7 .5
4. 10.01 6.5 4 .7 .5
5. end
. save newdata
file newdata.dta saved
. use http://www.stata-press.com/data/r13/dp
(Stover - DPOAE test data)
. quietly rocreg d nsnr, ctrlcov(xf xl) roccov(xf xl xd) probit cluster(id)
> nobstrata ctrlfprall bseed(156385) breps(50) ctrlmodel(strata) bsave(nsnrf)
. append using newdata
. list xf xl xd invrocinp rocinp in 1849/1852
xf xl xd invroc~p rocinp
1849. 10.01 5.5 .5 . .2
1850. 10.01 6.5 4 . .2
1851. 10.01 5.5 .5 .5 .7
1852. 10.01 6.5 4 .5 .7
Now we will use predict to estimate the ROC value for the false-positive rates stored in rocinp.
We specify the roc option, and we specify rocinp in the at() option. The other options, se()
and ci(), are used to obtain standard errors and confidence intervals, respectively. The dataset of
bootstrap samples, nsnrf.dta, is specified in bfile(). After prediction, we list the point estimates
and standard errors.
. predict rocit in 1849/1852, roc at(rocinp) se(seroc) ci(cin) bfile(nsnrf)
. list xf xl xd rocinp rocit seroc if !missing(rocit)
xf xl xd rocinp rocit seroc
1849. 10.01 5.5 .5 .2 .7652956 .0735506
1850. 10.01 6.5 4 .2 .9672505 .0227977
1851. 10.01 5.5 .5 .7 .9835816 .0204353
1852. 10.01 6.5 4 .7 .999428 .0011309
These results match example 3 of [R] rocregplot. We list the confidence intervals next. These also
conform to the rocregplot results from example 3 in [R] rocregplot. We begin with the confidence
intervals for ROC under the covariate values xf = 10.01, xl = 5.5, and xd = .5.
. list xf xl xd rocinp rocit cin* if inlist(_n, 1849, 1851)
xf xl xd rocinp rocit cin_l cin_u
1849. 10.01 5.5 .5 .2 .7652956 .6211391 .9094521
1851. 10.01 5.5 .5 .7 .9835816 .9435292 1.023634
Now we list the ROC confidence intervals under the covariate values xf = 10.01, xl = 6.5, and
xd = 4.
. list xf xl xd rocinp rocit cin* if inlist(_n, 1850, 1852)
xf xl xd rocinp rocit cin_l cin_u
1850. 10.01 6.5 4 .2 .9672505 .9225678 1.011933
1852. 10.01 6.5 4 .7 .999428 .9972115 1.001644
Now we will predict the false-positive rate for a ROC value by specifying the invroc option. We
pass the invrocinp variable as an argument to the at() option. Again we list the point estimates
and standard errors first.
. drop ci*
. predict invrocit in 1849/1852, invroc at(invrocinp) se(serocinv) ci(cin)
> bfile(nsnrf)
. list xf xl xd invrocinp invrocit serocinv if !missing(invrocit)
xf xl xd invroc~p invrocit serocinv
1851. 10.01 5.5 .5 .5 .0615144 .0254042
1852. 10.01 6.5 4 .5 .0043298 .0045938
These also match those of example 3 of [R] rocregplot. Listing the confidence intervals shows
identical results as well. First we list the confidence intervals under the covariate values xf = 10.01,
xl = 5.5, and xd = .5.
. list xf xl xd invrocinp invrocit cin* in 1851
xf xl xd invroc~p invrocit cin_l cin_u
1851. 10.01 5.5 .5 .5 .0615144 .0117231 .1113057
Now we list the confidence intervals for the false-positive rate under the covariate values xf = 10.01,
xl = 6.5, and xd = 4.
. list xf xl xd invrocinp invrocit cin* in 1852
xf xl xd invroc~p invrocit cin_l cin_u
1852. 10.01 6.5 4 .5 .0043298 -.004674 .0133335
The predict command can also be used after a maximum-likelihood ROC model is fit.
Example 3: Maximum likelihood ROC, invROC, and ROC value
In the previous example, we revisited the estimating equations fit of a probit model with ROC
covariates frequency (xf), intensity (xl), and hearing loss severity (xd) to the Stover et al. (1996)
audiology study data. A maximum likelihood fit of the same model was performed in example 10
of [R] rocreg. In example 2 of [R] rocregplot, we used rocregplot to estimate ROC values and
false-positive rates for this model under two covariate configurations. We will use predict to obtain
the same estimates. We will also estimate the partial area under the ROC curve.
We append the data as in the previous example. This leads to the following four final observations
in the data.
. use http://www.stata-press.com/data/r13/dp, clear
(Stover - DPOAE test data)
. rocreg d nsnr, probit ctrlcov(xf xl) roccov(xf xl xd) ml cluster(id)
(output omitted )
. append using newdata
. list xf xl xd invrocinp rocinp in 1849/1852
xf xl xd invroc~p rocinp
1849. 10.01 5.5 .5 . .2
1850. 10.01 6.5 4 . .2
1851. 10.01 5.5 .5 .5 .7
1852. 10.01 6.5 4 .5 .7
Now we predict the ROC value for false-positive rates of 0.2 and 0.7. Under maximum likelihood
prediction, only Wald-type confidence intervals are produced. We specify a new variable name for
the standard error in the se() option and a stubname for the confidence interval variables in the
ci() option.
. predict rocit in 1849/1852, roc at(rocinp) se(seroc) ci(ci)
. list xf xl xd rocinp rocit seroc ci_l ci_u if !missing(rocit), noobs
xf xl xd rocinp rocit seroc ci_l ci_u
10.01 5.5 .5 .2 .7608593 .0510501 .660803 .8609157
10.01 6.5 4 .2 .9499408 .0179824 .914696 .9851856
10.01 5.5 .5 .7 .978951 .0097382 .9598644 .9980376
10.01 6.5 4 .7 .9985001 .0009657 .9966073 1.000393
These results match our estimates in example 2 of [R] rocregplot. We also match example 2 of
[R] rocregplot when we estimate the false-positive rate for a ROC value of 0.5.
. drop ci*
. predict invrocit in 1851/1852, invroc at(invrocinp) se(serocinv) ci(ci)
. list xf xl xd invrocinp invrocit serocinv ci_l ci_u if !missing(invrocit),
> noobs
xf xl xd invroc~p invrocit serocinv ci_l ci_u
10.01 5.5 .5 .5 .0578036 .0198626 .0188736 .0967336
10.01 6.5 4 .5 .0055624 .0032645 -.0008359 .0119607
Example 4: Maximum likelihood ROC, pAUC, and ROC value
In example 13 of [R] rocreg, we fit a maximum-likelihood marginal probit model to each classifier
of the fictitious dataset generated from Hanley and McNeil (1983). In example 5 of [R] rocregplot,
rocregplot was used to draw the ROC for the mod1 and mod3 classifiers. Estimates of the ROC
value and false-positive rate were also obtained with Wald-type confidence intervals.
We return to this example, this time using predict to estimate the ROC value and false-positive
rate. We will also estimate the pAUC for the false-positive rates of 0.3 and 0.8.
First, we add the input variables to the data. The variable paucinp will hold the 0.3 and 0.8
false-positive rates at which we will estimate the pAUC. The variable invrocinp holds the ROC value of 0.8
for which we will estimate the false-positive rate. Finally, the variable rocinp holds the false-positive
rates of 0.15 and 0.75 for which we will estimate the ROC value.
. use http://www.stata-press.com/data/r13/ct2, clear
. rocreg status mod1 mod2 mod3, probit ml
(output omitted )
. quietly generate paucinp = .3 in 111
. quietly replace paucinp = .8 in 112
. quietly generate invrocinp = .8 in 112
. quietly generate rocinp = .15 in 111
. quietly replace rocinp = .75 in 112
Then, we estimate the ROC value for false-positive rates 0.15 and 0.75 under classifier mod1. The
point estimate is stored in roc1. Wald confidence intervals and standard errors are also estimated.
We find that these results match those of example 5 of [R] rocregplot.
. predict roc1 in 111/112, classvar(mod1) roc at(rocinp) se(sr1) ci(cir1)
. list rocinp roc1 sr1 cir1* in 111/112
rocinp roc1 sr1 cir1_l cir1_u
111. .15 .7934935 .0801363 .6364293 .9505578
112. .75 .9931655 .0069689 .9795067 1.006824
Now we perform the same estimation under the classifier mod3.
. predict roc3 in 111/112, classvar(mod3) roc at(roci) se(sr3) ci(cir3)
. list rocinp roc3 sr3 cir3* in 111/112
rocinp roc3 sr3 cir3_l cir3_u
111. .15 .8888596 .0520118 .7869184 .9908009
112. .75 .9953942 .0043435 .9868811 1.003907
Next we estimate the false-positive rate for the ROC value of 0.8. These results also match example 5
of [R] rocregplot.
. predict invroc1 in 112, classvar(mod1) invroc at(invrocinp) se(sir1) ci(ciir1)
. list invrocinp invroc1 sir1 ciir1* in 112
invroc~p invroc1 sir1 ciir1_l ciir1_u
112. .8 .1556435 .069699 .0190361 .292251
. predict invroc3 in 112, classvar(mod3) invroc at(invrocinp) se(sir3) ci(ciir3)
. list invrocinp invroc3 sir3 ciir3* in 112
invroc~p invroc3 sir3 ciir3_l ciir3_u
112. .8 .0661719 .045316 -.0226458 .1549896
Finally, we estimate the pAUC for false-positive rates of 0.3 and 0.8. The point estimate is calculated
by numeric integration. Wald confidence intervals are obtained with the delta method. Further details
are presented in Methods and formulas.
. predict pauc1 in 111/112, classvar(mod1) pauc at(paucinp) se(sp1) ci(cip1)
. list paucinp pauc1 sp1 cip1* in 111/112
paucinp pauc1 sp1 cip1_l cip1_u
111. .3 .221409 .0240351 .174301 .268517
112. .8 .7033338 .0334766 .6377209 .7689466
. predict pauc3 in 111/112, classvar(mod3) pauc at(paucinp) se(sp3) ci(cip3)
. list paucinp pauc3 sp3 cip3* in 111/112
paucinp pauc3 sp3 cip3_l cip3_u
111. .3 .2540215 .0173474 .2200213 .2880217
112. .8 .7420408 .0225192 .6979041 .7861776
Using estat nproc
When you initially use rocreg to fit a nonparametric ROC curve, you can obtain bootstrap estimates
of a ROC value, false-positive rate, area under the ROC curve, and partial area under the ROC curve.
The estat nproc command allows the user to estimate these parameters after rocreg has originally
been used.
The seed and resampling settings used by rocreg are used by estat nproc. So the results for
these new statistics are identical to what they would be if they had been initially estimated in the
rocreg command. These new statistics, together with those previously estimated in rocreg, are
returned in r().
We demonstrate with an example.
Example 5: Nonparametric ROC, invROC, and pAUC
In example 3 of [R] rocreg, we examined data from a pancreatic cancer study (Wieand et al. 1989).
Two continuous classifiers, y1 (CA 19-9) and y2 (CA 125), were used for the true status variable d.
In that example, we estimated various quantities including the false-positive rate for a ROC value of
0.6 and the pAUC for a false-positive rate of 0.5. Here we replicate that estimation with a call to
rocreg to estimate the former and follow that with a call to estat nproc to estimate the latter. For
simplicity, we restrict estimation to classifier y1 (CA 19-9).
We start by executing rocreg, estimating the false-positive rate for a ROC value of 0.6. This value
is specified in invroc(). Case–control resampling is used by specifying the bootcc option.
. use http://labs.fhcrc.org/pepe/book/data/wiedat2b, clear
(S. Wieand - Pancreatic cancer diagnostic marker data)
. rocreg d y1, invroc(.6) bseed(8378923) bootcc nodots
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
False-positive rate
Status : d
Classifier: y1
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.6 0 .0158039 .0267288 -.0523874 .0523874 (N)
0 .0784314 (P)
0 .1372549 (BC)
Now we will estimate the pAUC for the false-positive rate of 0.5 using estat nproc and the
pauc() option.
. matrix list e(b)
symmetric e(b)[1,1]
y1:
invroc_1
y1 0
. estat nproc, pauc(.5)
Bootstrap results
Number of strata = 2 Number of obs = 141
Replications = 1000
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
False-positive rate
Status : d
Classifier: y1
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.6 0 .0158039 .0267288 -.0523874 .0523874 (N)
0 .0784314 (P)
0 .1372549 (BC)
Partial area under the ROC curve
Status : d
Classifier: y1
Observed Bootstrap
pAUC Coef. Bias Std. Err. [95% Conf. Interval]
.5 .3932462 -.0000769 .021332 .3514362 .4350562 (N)
.3492375 .435512 (P)
.3492375 .435403 (BC)
. matrix list r(b)
r(b)[1,2]
y1: y1:
invroc_1 pauc_1
y1 0 .39324619
. matrix list e(b)
symmetric e(b)[1,1]
y1:
invroc_1
y1 0
. matrix list r(V)
symmetric r(V)[2,2]
y1: y1:
invroc_1 pauc_1
y1:invroc_1 .00071443
y1:pauc_1 -.000326 .00045506
. matrix list e(V)
symmetric e(V)[1,1]
y1:
invroc_1
y1:invroc_1 .00071443
The advantages of using estat nproc are twofold. First, you can estimate additional parameters
of interest without having to respecify the bootstrap settings that you used with rocreg; instead, estat
nproc uses the bootstrap settings that were stored by rocreg. Second, parameters estimated with
estat nproc are added to those parameters estimated by rocreg and returned in the matrices r(b)
(parameter estimates) and r(V) (variance–covariance matrix). Thus you can also obtain correlations
between any quantities you wish to estimate.
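For example, because r(V) is the joint variance–covariance matrix of the two estimates above, their correlation can be read off directly. The lines below are a minimal sketch of that calculation; the 1,2 indexing assumes the parameters are stored in the order shown in r(b).
. * sketch only: correlation between the invROC and pAUC estimates above
. matrix V = r(V)
. display "corr(invROC, pAUC) = " el(V,1,2)/sqrt(el(V,1,1)*el(V,2,2))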
Stored results
estat nproc stores the following in r():
r(b) coefficient vector
r(V) variance–covariance matrix of the estimators
r(ci_normal) normal-approximation confidence intervals
r(ci_percentile) percentile confidence intervals
r(ci_bc) bias-corrected confidence intervals
Methods and formulas
Details on computation of the nonparametric ROC curve and the estimation of the parametric ROC
curve model coefficients can be found in [R] rocreg. Here we describe how to estimate the ROC
curve summary statistics for a parametric model. The cumulative distribution function, g, can be the
standard normal cumulative distribution function, Φ.
Methods and formulas are presented under the following headings:
Parametric model: Summary parameter definition
Maximum likelihood estimation
Estimating equations estimation
Parametric model: Summary parameter definition
Conditioning on covariates x, we have the following ROC curve model:

$$\mathrm{ROC}(u) = g\{x'\beta + \alpha\, g^{-1}(u)\}$$

x can be constant, and β = β₀, the constant intercept.

We can solve this equation to obtain the false-positive rate value u for a ROC value of r:

$$u = g\left[\,\{g^{-1}(r) - x'\beta\}\,\alpha^{-1}\right]$$

The partial area under the ROC curve for the false-positive rate u is defined by

$$\mathrm{pAUC}(u) = \int_0^{u} g\{x'\beta + \alpha\, g^{-1}(t)\}\,dt$$

The area under the ROC curve is defined by

$$\mathrm{AUC} = \int_0^{1} g\{x'\beta + \alpha\, g^{-1}(t)\}\,dt$$

When g is the standard normal cumulative distribution function Φ, we can express the AUC as

$$\mathrm{AUC} = \Phi\!\left(\frac{x'\beta}{\sqrt{1+\alpha^{2}}}\right)$$
Maximum likelihood estimation
We allow maximum likelihood estimation under probit parametric models, so g = Φ. The ROC
value, false-positive rate, and AUC parameters all have closed-form expressions in terms of the covariate
values x, coefficient vector β, and slope parameter α. So to estimate these three types of summary
parameters, we use the delta method (Oehlert 1992; Phillips and Park 1988). Particularly, we use the
nlcom command (see [R] nlcom) to implement the delta method.
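For instance, after the maximum likelihood fit in example 3 above, the ROC value at a false-positive rate of 0.2 under covariate values xf = 10.01, xl = 5.5, and xd = .5 could be reproduced by hand with nlcom. This is only a sketch; it assumes the coefficient names i_cons, xf, xl, xd, and s_cons, which are the names used with testnl after the same fit in [R] rocregplot.
. nlcom normal(_b[i_cons] + 10.01*_b[xf] + 5.5*_b[xl] + .5*_b[xd]
>     + _b[s_cons]*invnormal(.2))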
To estimate the partial area under the ROC curve for false-positive rate u, we use numeric integration.
A trapezoidal approximation is used in calculating the integrals. A numeric integral of the ROC(t)
function, conditioned on the covariate values x, the coefficient vector estimate $\hat\beta$, and the slope parameter
estimate $\hat\alpha$, is computed over the range t = [0, u]. This gives us the point estimate of pAUC(u).
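The Mata function below is a minimal sketch of this trapezoidal computation for the probit case; the function name, its arguments (xb for the fitted linear predictor x'β, alpha for the estimated slope, u for the upper false-positive rate, n for the grid size), and the grid itself are illustrative only and are not part of rocreg.
mata:
// sketch only: trapezoidal approximation of pAUC(u) for a probit ROC model,
// ROC(t) = normal(xb + alpha*invnormal(t)), evaluated on n intervals over [0, u]
real scalar pauc_trap(real scalar xb, real scalar alpha,
                      real scalar u,  real scalar n)
{
    real scalar i, t0, t1, r0, r1, area

    area = 0
    for (i = 1; i <= n; i++) {
        t0 = (i - 1)/n*u
        t1 = i/n*u
        if (t0 > 0) r0 = normal(xb + alpha*invnormal(t0))
        else        r0 = 0                     // ROC(0) = 0 when alpha > 0
        r1 = normal(xb + alpha*invnormal(t1))
        area = area + (r0 + r1)/2*(t1 - t0)    // trapezoid on [t0, t1]
    }
    return(area)
}
end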
To calculate the standard error and confidence intervals for the point estimate of pAUC(u), we
again use the delta method. Details on the delta method algorithm can be found in Methods and
formulas of [R] nlcom and the earlier mentioned references.
Under maximum likelihood estimation, the coefficient estimates $\hat\beta$ and slope estimate $\hat\alpha$ are
asymptotically normal with variance matrix V. For convenience, we rename the parameter vector
$[\beta', \alpha]$ to the k-parameter vector $\theta = [\theta_1, \ldots, \theta_k]$. We will also explicitly refer to the conditioning
of the ROC curve by θ in its mention as ROC(t, θ).

Under the delta method, the continuous scalar function of the estimate $\hat\theta$, $f(\hat\theta)$, has asymptotic
mean f(θ) and asymptotic covariance

$$\widehat{\mathrm{Var}}\{f(\hat\theta)\} = \mathbf{f}\, V\, \mathbf{f}'$$
where $\mathbf{f}$ is the 1 × k matrix of derivatives for which

$$f_{1j} = \frac{\partial f(\theta)}{\partial\theta_j} \qquad j = 1, \ldots, k$$

The asymptotic covariance of $f(\hat\theta)$ is estimated and then used in conjunction with $f(\hat\theta)$ for further
inference, including Wald confidence intervals, standard errors, and hypothesis testing.

In the case of pAUC(u) estimation, our $f(\hat\theta)$ is the aforementioned numeric integral of the ROC
curve. It estimates f(θ), the true integral of the ROC curve on the [0, u] range. The variance matrix V
is estimated using the likelihood information that rocreg calculated, and the estimation is performed
by rocreg itself.

The partial derivatives of f(θ) can be determined by using Leibnitz's rule (Weisstein 2011):

$$f_{1j} = \frac{\partial}{\partial\theta_j}\int_0^{u}\mathrm{ROC}(t,\theta)\,dt
        = \int_0^{u}\frac{\partial}{\partial\theta_j}\,\mathrm{ROC}(t,\theta)\,dt \qquad j = 1, \ldots, k$$

When $\theta_j$ corresponds with the slope parameter α, we obtain the following partial derivative:

$$\frac{\partial}{\partial\alpha}\,\mathrm{pAUC}(u) = \int_0^{u}\phi\{x'\beta + \alpha\Phi^{-1}(t)\}\,\Phi^{-1}(t)\,dt$$

The partial derivative of f(θ) [pAUC(u)] for the intercept β₀ is the following:

$$\frac{\partial}{\partial\beta_0}\,\mathrm{pAUC}(u) = \int_0^{u}\phi\{x'\beta + \alpha\Phi^{-1}(t)\}\,dt$$

For a nonintercept coefficient, we obtain the following:

$$\frac{\partial}{\partial\beta_i}\,\mathrm{pAUC}(u) = \int_0^{u}x_i\,\phi\{x'\beta + \alpha\Phi^{-1}(t)\}\,dt$$

We can estimate each of these integrals by numeric integration, plugging in the estimates $\hat\beta$ and $\hat\alpha$
for the parameters. This, together with the previously calculated estimate $\hat V$, provides an estimate of
the asymptotic covariance of $f(\hat\theta) = \widehat{\mathrm{pAUC}}(u)$, which allows us to perform further statistical
inference on pAUC(u).
Estimating equations estimation
When we fit a model using the Alonzo and Pepe (2002) estimating equations method, we use
the bootstrap to perform inference on the ROC curve summary parameters. Each bootstrap sample
provides a sample of the coefficient estimates βand the slope estimates α. Using the formulas in
Parametric model: Summary parameter definition under Methods and formulas, we can obtain an
estimate of the ROC, false-positive rate, or AUC for each resample. Using numeric integration (with
the trapezoidal approximation), we can also estimate the pAUC of the resample.
By making these calculations, we obtain a bootstrap sample of our summary parameter estimate. We
then obtain bootstrap standard errors, normal approximation confidence intervals, percentile confidence
intervals, and bias-corrected confidence intervals using this bootstrap sample. Further details can be
found in [R] bootstrap.
References
Alonzo, T. A., and M. S. Pepe. 2002. Distribution-free ROC analysis using binary regression techniques. Biostatistics
3: 421–432.
Choi, B. C. K. 1998. Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test.
American Journal of Epidemiology 148: 1127–1132.
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
Cleves, M. A. 2000. sg120.1: Two new options added to rocfit command. Stata Technical Bulletin 53: 18–19. Reprinted in
Stata Technical Bulletin Reprints, vol. 9, pp. 230–231. College Station, TX: Stata Press.
Hanley, J. A., and B. J. McNeil. 1983. A method of comparing the areas under receiver operating characteristic
curves derived from the same cases. Radiology 148: 839–843.
Janes, H., G. M. Longton, and M. S. Pepe. 2009. Accommodating covariates in receiver operating characteristic
analysis. Stata Journal 9: 17–39.
Norton, S. J., M. P. Gorga, J. E. Widen, R. C. Folsom, Y. Sininger, B. Cone-Wesson, B. R. Vohr, K. Mascher,
and K. Fletcher. 2000. Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic
emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and
Hearing 21: 508–528.
Oehlert, G. W. 1992. A note on the delta method. American Statistician 46: 27–29.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Stover, L., M. P. Gorga, S. T. Neely, and D. Montoya. 1996. Toward optimizing the clinical utility of distortion
product otoacoustic emission measurements. Journal of the Acoustical Society of America 100: 956–967.
Weisstein, E. W. 2011. Leibniz integral rule. From Mathworld—A Wolfram Web Resource.
http://mathworld.wolfram.com/LeibnizIntegralRule.html.
Wieand, S., M. H. Gail, B. R. James, and K. L. James. 1989. A family of nonparametric statistics for comparing
diagnostic markers with paired or unpaired data. Biometrika 76: 585–592.
Also see
[R] rocreg Receiver operating characteristic (ROC) regression
[R] rocregplot Plot marginal and covariate-specific ROC curves after rocreg
[U] 20 Estimation and postestimation commands
Title
rocregplot — Plot marginal and covariate-specific ROC curves after rocreg
Syntax Menu Description probit options
common options boot options Remarks and examples Methods and formulas
References Also see
Syntax
Plot ROC curve after nonparametric analysis
    rocregplot [ , common_options boot_options ]

Plot ROC curve after parametric analysis using bootstrap
    rocregplot [ , probit_options common_options boot_options ]

Plot ROC curve after parametric analysis using maximum likelihood
    rocregplot [ , probit_options common_options ]
probit_options                      Description
Main
  at(varname=# [varname=# ...])     value of specified covariates/mean of unspecified covariates
  at1(varname=# [varname=# ...])
  at2(varname=# [varname=# ...])
  ...
  roc(numlist)                      show estimated ROC values for given false-positive rates
  invroc(numlist)                   show estimated false-positive rates for given ROC values
  level(#)                          set confidence level; default is level(95)
Curve
  line#opts(cline_options)          affect rendition of ROC curve #
Only one of roc() or invroc() may be specified.
common_options                      Description
Main
  classvars(varlist)                restrict plotting of ROC curves to specified classifiers
  norefline                         suppress plotting the reference line
Scatter
  plot#opts(scatter_options)        affect rendition of classifier #'s false-positive rate
                                      and ROC scatter points; not allowed with at()
Reference line
  rlopts(cline_options)             affect rendition of the reference line
Y axis, X axis, Titles, Legend, Overall
  twoway_options                    any options other than by() documented in
                                      [G-3] twoway_options

boot_options                        Description
Bootstrap
  bfile(filename)                   load dataset containing bootstrap replicates from rocreg
  btype(n | p | bc)                 plot normal-based (n), percentile (p), or bias-corrected (bc)
                                      confidence intervals; default is btype(n)
bfile() is allowed only with parametric analysis using bootstrap inference, in which case this option is
required with roc() or invroc().
Menu
Statistics >Epidemiology and related >ROC analysis >ROC curves after rocreg
Description
Under parametric estimation, rocregplot plots the fitted ROC curves for specified covariate values
and classifiers. If rocreg, probit or rocreg, probit ml were previously used, the false-positive
rates (for specified ROC values) and ROC values (for specified false-positive rates) for each curve may
also be plotted, along with confidence intervals.
Under nonparametric estimation, rocregplot will plot the fitted ROC curves using the fpr_*
and roc_* variables produced by rocreg. Point estimates and confidence intervals for false-positive
rates and ROC values that were computed in rocreg may be plotted as well.
probit options
 
Main
at(varname=# ...) requests that the covariates specified by varname be set to #. By default, rocreg
evaluates the function by setting each covariate to its mean value. This option causes the ROC
curve to be evaluated at the value of the covariates listed in at() and at the mean of all unlisted
covariates.
at1(varname=# ...), at2(varname=# ...), ..., at10(varname=# ...) specify that ROC curves
(up to 10) be plotted on the same graph. at1(), at2(), ..., at10() work like the at() option.
They request that the function be evaluated at the value of the covariates specified and at the mean
of all unlisted covariates. at1() specifies the values of the covariates for the first curve, at2()
specifies the values of the covariates for the second curve, and so on.
roc(numlist)specifies that estimated ROC values for given false-positive rates be graphed.
invroc(numlist)specifies that estimated false-positive rates for given ROC values be graphed.
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
level() may be specified with either roc() or invroc().
 
Curve
line#opts(cline options)affects the rendition of ROC curve #. See [G-3]cline options.
common options
 
Main
classvars(varlist)restricts plotting ROC curves to specified classification variables.
norefline suppresses plotting the reference line.
 
Scatter
plot#opts(scatter options)affects the rendition of classifier #s false-positive rate and ROC scatter
points. This option applies only to non-ROC covariate estimation graphing. See [G-2]graph twoway
scatter.
 
Reference line
rlopts(cline options)affects rendition of the reference line. See [G-3]cline options.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and options for saving the graph to
disk (see [G-3]saving option).
boot options
 
Bootstrap
bfile(filename) uses bootstrap replicates of parameters from rocreg stored in filename to estimate
standard errors and confidence intervals of predictions. bfile() must be specified with either
roc() or invroc() if parametric estimation with bootstrapping was used.
btype(n | p | bc) indicates the desired type of confidence interval rendering. n draws normal-based,
p draws percentile, and bc draws bias-corrected confidence intervals for specified false-positive
rates and ROC values in roc() and invroc(). The default is btype(n).
Remarks and examples
Remarks are presented under the following headings:
Plotting covariate-specific ROC curves
Plotting marginal ROC curves
Plotting covariate-specific ROC curves
The rocregplot command is also demonstrated in [R] rocreg. We will further demonstrate its
use with several examples. Particularly, we will show how rocregplot can draw the ROC curves of
covariate models that have been fit using rocreg.
Example 1: Parametric ROC
In example 6 of [R] rocreg, we fit a probit ROC model to audiology test data from Norton et al.
(2000). The estimating equation method of Alonzo and Pepe (2002) was used to fit the model.
Gender and age were covariates that affected the control distribution of the classifier y1 (DPOAE 65
at 2 kHz). Age was a ROC covariate for the model, so we fit separate ROC curves at each age.
Following Janes, Longton, and Pepe (2009), we draw the ROC curves for ages 30, 40, and 50
months. The at1(),at2(), and at3() options are used to specify the age covariates.
. use http://www.stata-press.com/data/r13/nnhs
(Norton - neonatal audiology data)
. rocreg d y1, probit ctrlcov(currage male) ctrlmodel(linear) roccov(currage)
> cluster(id) bseed(56930) bsave(nnhs2y1, replace)
(output omitted )
. rocregplot, at1(currage=30) at2(currage=40) at3(currage=50)
(figure omitted: ROC curves for DPOAE 65 at 2kHz, labeled At 1, At 2, and At 3; x axis false-positive rate, y axis true-positive rate (ROC))
Here we use the default entries of the legend, which indicate the “at #” within the specified
at* options and the classifier to which the curve corresponds. ROC curve one corresponds with
currage=30, two with currage=40, and three with currage=50. The positive effect of age on the
ROC curve is evident. At an age of 30 months (currage=30), the ROC curve of y1 (DPOAE 65 at
2 kHz) is nearly equivalent to that of a noninformative test that gives equal probability to hearing loss.
At age 50 months (currage=50), corresponding to some of the oldest children in the study, the ROC
curve shows that test y1 (DPOAE 65 at 2 kHz) is considerably more powerful than the noninformative
test.
You may create your own legend by specifying the legend() option. The default legend is designed
for the possibility of multiple covariates. Here we could change the legend entries to currage values
and gain some extra clarity. However, this may not be feasible when there are many covariates
present.
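For example, because legend() is a twoway option, it is passed through to the graph, so something like the following could label the three curves directly with their currage values; the legend key numbers 1 through 3 are assumed here to correspond to the at1(), at2(), and at3() curves.
. rocregplot, at1(currage=30) at2(currage=40) at3(currage=50)
>     legend(order(1 "currage = 30" 2 "currage = 40" 3 "currage = 50"))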
We can also use rocregplot after maximum likelihood estimation.
Example 2: Maximum likelihood ROC
We return to the audiology study with frequency (xf), intensity (xl), and hearing loss severity
(xd) covariates from Stover et al. (1996) that we examined in example 10 of [R] rocreg. Negative
signal-to-noise ratio is again used as a classifier. Using maximum likelihood, we fit a probit model
to these data with the indicated ROC covariates.
After fitting the model, we wish to compare the ROC curves of two covariate combinations. The
first has an intensity value of 5.5 (the lowest intensity, corresponding to 55 decibels) and a frequency
of 10.01 (the lowest frequency, corresponding to 1001 hertz). We give the first combination a hearing
loss severity value of 0.5 (the lowest). The second covariate combination has the same frequency, but
the highest intensity value of 6.5 (65 decibels). We give this second covariate set a higher severity
value of 4. We will visually compare the two ROC curves resulting from these two covariate value
combinations.
We specify false-positive rates of 0.7 first followed by 0.2 in the roc() option to visually compare
the size of the ROC curve at large and small false-positive rates. Because maximum likelihood
estimation was used to fit the model, a Wald confidence interval is produced for the estimated ROC
value and false-positive rate parameters. Further details are found in Methods and formulas.
. use http://www.stata-press.com/data/r13/dp
(Stover - DPOAE test data)
. rocreg d nsnr, probit ctrlcov(xf xl) roccov(xf xl xd) ml cluster(id)
(output omitted )
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4) roc(.7)
ROC curve
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
ROC Coef. Std. Err. [95% Conf. Interval]
.7 .978951 .0097382 .9598645 .9980376
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
ROC Coef. Std. Err. [95% Conf. Interval]
.7 .9985001 .0009657 .9966073 1.000393
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
At the higher false-positive rate value of 0.7, we see little difference in the ROC values and note that
the confidence intervals nearly overlap. Now we view the same curves with the lower false-positive
rate compared.
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4) roc(.2)
ROC curve
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
ROC Coef. Std. Err. [95% Conf. Interval]
.2 .7608593 .0510501 .660803 .8609157
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
ROC Coef. Std. Err. [95% Conf. Interval]
.2 .9499408 .0179824 .914696 .9851856
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
The lower false-positive rate of 0.2 shows clearly distinguishable ROC values. Now we specify
option invroc(.5) to view how the false-positive rates vary at a ROC value of 0.5.
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4) invroc(.5)
False-positive rate
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
invROC Coef. Std. Err. [95% Conf. Interval]
.5 .0578036 .0198626 .0188736 .0967336
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
invROC Coef. Std. Err. [95% Conf. Interval]
.5 .0055624 .0032645 -.0008359 .0119607
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
At a ROC value of 0.5, the false-positive rates for both curves are small and close to one another.
Technical note
We can use the testnl command to support our visual observations with statistical inference. We
use it to perform a Wald test of the null hypothesis that the two ROC curves just rendered are equal
at a false-positive rate of 0.7.
. testnl normal(_b[i_cons]+10.01*_b[xf]+5.5*_b[xl]
> + .5*_b[xd]+_b[s_cons]*invnormal(.7)) =
> normal(_b[i_cons]+10.01*_b[xf]+6.5*_b[xl]
> + 4*_b[xd]+_b[s_cons]*invnormal(.7))
(1) normal(_b[i_cons]+10.01*_b[xf]+5.5*_b[xl] +.5*_b[xd]+_b[s_cons]*invnormal(.7))=
normal(_b[i_cons]+10.01*_b[xf]+6.5*_b[xl] + 4*_b[xd]+_b[s_cons]*invnormal(.7))
chi2(1) = 4.53
Prob > chi2 = 0.0332
The test is significant at the 0.05 level, and thus we find that the two curves are significantly
different. Now we will use testnl again to test equality of the false-positive rates for each curve
with a ROC value of 0.5. The inverse ROC formula used is derived in Methods and formulas.
. testnl normal((invnormal(.5)-(_b[i_cons]+10.01*_b[xf]+5.5*_b[xl]+.5*_b[xd]))
> /_b[s_cons]) =
> normal((invnormal(.5)-(_b[i_cons]+10.01*_b[xf]+6.5*_b[xl]+4*_b[xd]))
> /_b[s_cons])
(1) normal((invnormal(.5)-(_b[i_cons]+10.01*_b[xf]+5.5*_b[xl]+.5*_b[xd]))
/_b[s_cons]) =
normal((invnormal(.5)-(_b[i_cons]+10.01*_b[xf]+6.5*_b[xl]+4*_b[xd]))
/_b[s_cons])
chi2(1) = 8.01
Prob > chi2 = 0.0046
We again reject the null hypothesis that the two curves are equal at the 0.05 level.
The model of our last example was also fit using the estimating equations method in example 7
of [R] rocreg. We will demonstrate rocregplot after that model fit as well.
Example 3: Parametric ROC, invROC, and ROC value
In example 2, we used rocregplot after a maximum likelihood model fit of the ROC curve
for classifier nsnr and covariates frequency (xf), intensity (xl), and hearing loss severity (xd). The
data were obtained from the audiology study described in Stover et al. (1996). In example 7 of
[R] rocreg, we fit the model using the estimating equations method of Alonzo and Pepe (2002). Under
this method, bootstrap resampling is used to make inferences. We saved 50 bootstrap replications in
nsnrf.dta, which we re-create below.
We use rocregplot to draw the ROC curves for nsnr under the covariate values xf = 10.01,
xl = 5.5, and xd = .5, and xf = 10.01, xl = 6.5, and xd = 4. The at#() options are used to
specify the covariate values. The previous bootstrap results are made available to rocregplot with
the bfile() option. As before, we will specify 0.2 and 0.7 as false-positive rates in the roc() option
and 0.5 as a ROC value in the invroc() option. We do not specify btype() and thus our graph will
contain normal-based bootstrap confidence bands, the default.
. use http://www.stata-press.com/data/r13/dp
(Stover - DPOAE test data)
. rocreg d nsnr, probit ctrlcov(xf xl) roccov(xf xl xd) cluster(id)
> nobstrata ctrlfprall bseed(156385) breps(50) bsave(nsnrf, replace)
(output omitted )
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4)
> roc(.7) bfile(nsnrf)
ROC curve
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
(Replications based on 208 clusters in id)
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.7 .9835816 .0087339 .0204353 .9435292 1.023634 (N)
.9155462 .9974037 (P)
.9392258 .9976629 (BC)
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
(Replications based on 208 clusters in id)
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.7 .999428 .0006059 .0011309 .9972115 1.001644 (N)
.9958003 .9999675 (P)
.9968304 .9999901 (BC)
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
As shown in the graph, we find that the ROC values at a false-positive rate of 0.7 are close together,
as they were in the maximum likelihood estimation in example 2. We now repeat this process for the
lower false-positive rate of 0.2 by using the roc(.2) option.
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4)
> roc(.2) bfile(nsnrf)
ROC curve
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
(Replications based on 208 clusters in id)
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.2 .7652956 .0145111 .0735506 .6211391 .9094522 (N)
.6054495 .878052 (P)
.6394838 .9033081 (BC)
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
(Replications based on 208 clusters in id)
Observed Bootstrap
ROC Coef. Bias Std. Err. [95% Conf. Interval]
.2 .9672505 .0072429 .0227977 .9225679 1.011933 (N)
.9025254 .9931714 (P)
.9235289 .9979637 (BC)
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
The ROC values are slightly higher at the false-positive rate of 0.2 than they were in the maximum
likelihood estimation in example 2. To see if the false-positive rates differ at a ROC value of 0.5, we
specify the invroc(.5) option.
. rocregplot, at1(xf=10.01, xl=5.5, xd=.5) at2(xf=10.01, xl=6.5, xd=4)
> invroc(.5) bfile(nsnrf)
False-positive rate
Status : d
Classifier: nsnr
Under covariates:
at1
xf 10.01
xl 5.5
xd .5
(Replications based on 208 clusters in id)
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.5 .0615144 -.0063531 .0254042 .0117231 .1113057 (N)
.0225159 .1265046 (P)
.0224352 .1265046 (BC)
Under covariates:
at2
xf 10.01
xl 6.5
xd 4
(Replications based on 208 clusters in id)
Observed Bootstrap
invROC Coef. Bias Std. Err. [95% Conf. Interval]
.5 .0043298 -.0012579 .0045938 -.004674 .0133335 (N)
.0002773 .0189199 (P)
.0001292 .0134801 (BC)
(figure omitted: ROC curves for −SNR, labeled At 1 and At 2; x axis false-positive rate, y axis true-positive rate (ROC))
The point estimates of the ROC value and false-positive rate are both computed directly using the
point estimates of the ROC coefficients. Calculation of the standard errors and confidence intervals
is slightly more complicated. Essentially, we have stored a sample of our ROC covariate coefficient
estimates in nsnrf.dta. We then calculate the ROC value or false-positive rate estimates using each
set of coefficient estimates, resulting in a sample of point estimates. Then the bootstrap standard error
and confidence intervals are calculated based on these bootstrap samples. Details of the computation
of the standard error and percentile confidence intervals can be found in Methods and formulas and
in [R] bootstrap.
As mentioned in [R] rocreg, 50 resamples is a reasonable lower bound for obtaining bootstrap
standard errors (Mooney and Duval 1993). However, it may be too low for obtaining percentile and
bias-corrected confidence intervals. Normal-based confidence intervals are valid when the bootstrap
distribution exhibits normality. See [R] bootstrap postestimation for more details.
We can assess the normality of the bootstrap distribution by using a normal probability plot. Stata
provides this in the pnorm command (see [R] diagnostic plots). We will use nsnrf.dta to draw a
normal probability plot for the ROC estimate corresponding to a false-positive rate of 0.2. We use the
covariate values xf = 10.01, xl = 6.5, and xd = 4.
. use nsnrf
(bootstrap: rocregstat)
. generate double rocp2 = nsnr_b_i_cons + 10.01*nsnr_b_xf + 6.5*nsnr_b_xl +
> 4*nsnr_b_xd+nsnr_b_s_cons*invnormal(.2)
. replace rocp2 = normal(rocp2)
(50 real changes made)
. pnorm rocp2
(figure omitted: normal probability plot of rocp2; x axis Empirical P[i] = i/(N+1), y axis Normal F[(rocp2−m)/s])
The closeness of the points to the reference line on the normal probability plot shows us that the
bootstrap distribution is approximately normal. So it is reasonable to use the normal-based confidence
intervals for ROC at a false-positive rate of 0.2 under covariate values xf = 10.01, xl = 6.5, and
xd = 4.
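Because the bootstrap standard error is simply the standard deviation of these replicate-level estimates, a quick companion check is to summarize rocp2; its standard deviation should match, up to rounding, the bootstrap Std. Err. reported above for this curve at a false-positive rate of 0.2.
. summarize rocp2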
Plotting marginal ROC curves
The rocregplot command can also be used after fitting models with no covariates. We will
demonstrate this with an empirical ROC model fit in [R] rocreg.
Example 4: Nonparametric ROC
We run rocregplot after fitting the single-classifier, empirical ROC model shown in example 1 of
[R] rocreg. There we empirically predicted the ROC curve of the classifier rating for the true status
variable disease from the Hanley and McNeil (1982) data. The rocreg command saves variables
roc_rating and fpr_rating, which give the ROC values and false-positive rates, respectively,
for every value of rating. These variables are used by rocregplot to render the ROC curve.
. use http://www.stata-press.com/data/r13/hanley, clear
. rocreg disease rating, noboot
Nonparametric ROC estimation
Control standardization: empirical
ROC method : empirical
Area under the ROC curve
Status : disease
Classifier: rating
Observed Bootstrap
AUC Coef. Bias Std. Err. [95% Conf. Interval]
.8407708 . . . . (N)
. . (P)
. . (BC)
. rocregplot
(figure omitted: empirical ROC curve for classifier rating; x axis false-positive rate, y axis true-positive rate (ROC))
We end our discussion of rocregplot by showing its use after a marginal probit model.
Example 5: Maximum likelihood ROC, invROC, and ROC value
In example 13 of [R] rocreg, we fit a maximum-likelihood probit model to each classifier of the
fictitious dataset generated from Hanley and McNeil (1983).
We use rocregplot after the original rocreg command to draw the ROC curves for classifiers
mod1 and mod3. This is accomplished by specifying the two variables in the classvars() option.
We will use the roc() option to obtain confidence intervals for ROC values at false-positive rates of
0.15 and 0.75. We will specify the invroc() option to obtain false-positive rate confidence intervals
for a ROC value of 0.8. As mentioned previously, these are Wald confidence intervals.
First, we will view results for a false-positive rate of 0.75.
. use http://www.stata-press.com/data/r13/ct2, clear
. rocreg status mod1 mod2 mod3, probit ml
(output omitted )
. rocregplot, classvars(mod1 mod3) roc(.75)
ROC curve
Status : status
Classifier: mod1
ROC Coef. Std. Err. [95% Conf. Interval]
.75 .9931655 .0069689 .9795067 1.006824
Status : status
Classifier: mod3
ROC Coef. Std. Err. [95% Conf. Interval]
.75 .9953942 .0043435 .9868811 1.003907
(figure omitted: ROC points and fitted curves for classifiers mod1 and mod3, with legend entries mod1, mod3, mod1 Fit, and mod3 Fit; x axis false-positive rate, y axis true-positive rate (ROC))
We see that the estimates for each of the two ROC curves are close. Because this is a marginal
model, the actual false-positive rate and the true-positive rate for each observation are plotted in
the graph. The added point estimates of the ROC value at false-positive rate 0.75 are shown as
diamond (mod3) and circle (mod1) symbols in the upper-right-hand corner of the graph at FPR =0.75.
Confidence bands are also plotted at FPR =0.75 but are so narrow that they are barely noticeable.
Under both classifiers, the ROC value at 0.75 is very high. Now we will compare these results to
those with a lower false-positive rate of 0.15.
. rocregplot, classvars(mod1 mod3) roc(.15)
ROC curve
Status : status
Classifier: mod1
ROC Coef. Std. Err. [95% Conf. Interval]
.15 .7934935 .0801363 .6364292 .9505578
Status : status
Classifier: mod3
ROC Coef. Std. Err. [95% Conf. Interval]
.15 .8888596 .0520118 .7869184 .9908008
(figure omitted: ROC points and fitted curves for classifiers mod1 and mod3, with legend entries mod1, mod3, mod1 Fit, and mod3 Fit; x axis false-positive rate, y axis true-positive rate (ROC))
The ROC value for the false-positive rate of 0.15 is more separated in the two classifiers. Here
we see that mod3 has a larger ROC value than mod1 for this false-positive rate, but the confidence
intervals of the estimates overlap.
By specifying invroc(.8), we obtain invROC confidence intervals corresponding to a ROC value
of 0.8.
. rocregplot, classvars(mod1 mod3) invroc(.8)
False-positive rate
Status : status
Classifier: mod1
invROC Coef. Std. Err. [95% Conf. Interval]
.8 .1556435 .069699 .019036 .2922509
Status : status
Classifier: mod3
invROC Coef. Std. Err. [95% Conf. Interval]
.8 .0661719 .045316 -.0226458 .1549896
(figure omitted: ROC points and fitted curves for classifiers mod1 and mod3, with legend entries mod1, mod3, mod1 Fit, and mod3 Fit; x axis false-positive rate, y axis true-positive rate (ROC))
For estimation of the false-positive rate at a ROC value of 0.8, the confidence intervals overlap.
Both classifiers require only a small false-positive rate to achieve a ROC value of 0.8.
Methods and formulas
Details on computation of the nonparametric ROC curve and the estimation of the parametric ROC
curve model coefficients can be found in [R] rocreg. Here we describe how to estimate the ROC
values and false-positive rates of a parametric model. The cumulative distribution function g can be
the standard normal cumulative distribution function.
Methods and formulas are presented under the following headings:
Parametric model: Summary parameter definition
Maximum likelihood estimation
Estimating equations estimation
Parametric model: Summary parameter definition
Conditioning on covariates x, we have the following ROC curve model:

$$\mathrm{ROC}(u) = g\{x'\beta + \alpha\, g^{-1}(u)\}$$

x can be constant, and β = β₀, the constant intercept.

With simple algebra, we can solve this equation to obtain the false-positive rate value u for a ROC
value of r:

$$u = g\left[\,\{g^{-1}(r) - x'\beta\}\,\alpha^{-1}\right]$$
Maximum likelihood estimation
We allow maximum likelihood estimation under probit parametric models, so g = Φ. The ROC
value and false-positive rate parameters all have closed-form expressions in terms of the covariate
values x, coefficient vector β, and slope parameter α. Thus to estimate these two types of summary
parameters, we use the delta method (Oehlert 1992; Phillips and Park 1988). Particularly, we use the
nlcom command (see [R] nlcom) to implement the delta method.
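For example, the false-positive rate at a ROC value of 0.5 for the first covariate set of example 2 could be reproduced by hand with nlcom, using the same inverse-ROC expression and coefficient names as in the testnl technical note above; this is a sketch rather than part of rocregplot itself.
. nlcom normal((invnormal(.5) - (_b[i_cons] + 10.01*_b[xf] + 5.5*_b[xl]
>     + .5*_b[xd]))/_b[s_cons])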
Under maximum likelihood estimation, the coefficient estimates $\hat\beta$ and slope estimate $\hat\alpha$ are
asymptotically normal with variance matrix V. For convenience, we rename the parameter vector
$[\beta', \alpha]$ to the k-parameter vector $\theta = [\theta_1, \ldots, \theta_k]$. We will also explicitly refer to the conditioning
of the ROC curve by θ in its mention as ROC(t, θ).

Under the delta method, the continuous scalar function of the estimate $\hat\theta$, $f(\hat\theta)$, has asymptotic
mean f(θ) and asymptotic covariance

$$\widehat{\mathrm{Var}}\{f(\hat\theta)\} = \mathbf{f}\, V\, \mathbf{f}'$$

where $\mathbf{f}$ is the 1 × k matrix of derivatives for which

$$f_{1j} = \frac{\partial f(\theta)}{\partial\theta_j} \qquad j = 1, \ldots, k$$

The asymptotic covariance of $f(\hat\theta)$ is estimated and then used in conjunction with $f(\hat\theta)$ for further
inference, including Wald confidence intervals, standard errors, and hypothesis testing.
Estimating equations estimation
When we fit a model using the Alonzo and Pepe (2002) estimating equations method, we use
the bootstrap to perform inference on the ROC curve summary parameters. Each bootstrap sample
provides a sample of the coefficient estimates βand the slope estimates α. Using the formulas above,
we can obtain an estimate of the ROC value or false-positive rate for each resample.
By making these calculations, we obtain a bootstrap sample of our summary parameter estimate. We
then obtain bootstrap standard errors, normal approximation confidence intervals, percentile confidence
intervals, and bias-corrected confidence intervals using this bootstrap sample. Further details can be
found in [R] bootstrap.
References
Alonzo, T. A., and M. S. Pepe. 2002. Distribution-free ROC analysis using binary regression techniques. Biostatistics
3: 421–432.
Bamber, D. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic
graph. Journal of Mathematical Psychology 12: 387–415.
Choi, B. C. K. 1998. Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test.
American Journal of Epidemiology 148: 1127–1132.
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
Cleves, M. A. 2000. sg120.1: Two new options added to rocfit command. Stata Technical Bulletin 53: 18–19. Reprinted in
Stata Technical Bulletin Reprints, vol. 9, pp. 230–231. College Station, TX: Stata Press.
Hanley, J. A., and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic
(ROC) curve. Radiology 143: 29–36.
Hanley, J. A., and B. J. McNeil. 1983. A method of comparing the areas under receiver operating characteristic
curves derived from the same cases. Radiology 148: 839–843.
Janes, H., G. M. Longton, and M. S. Pepe. 2009. Accommodating covariates in receiver operating characteristic
analysis. Stata Journal 9: 17–39.
Mooney, C. Z., and R. D. Duval. 1993. Bootstrapping: A Nonparametric Approach to Statistical Inference. Newbury
Park, CA: Sage.
Norton, S. J., M. P. Gorga, J. E. Widen, R. C. Folsom, Y. Sininger, B. Cone-Wesson, B. R. Vohr, K. Mascher,
and K. Fletcher. 2000. Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic
emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and
Hearing 21: 508–528.
Oehlert, G. W. 1992. A note on the delta method. American Statistician 46: 27–29.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Stover, L., M. P. Gorga, S. T. Neely, and D. Montoya. 1996. Toward optimizing the clinical utility of distortion
product otoacoustic emission measurements. Journal of the Acoustical Society of America 100: 956–967.
Also see
[R] rocreg Receiver operating characteristic (ROC) regression
[R] rocreg postestimation Postestimation tools for rocreg
[U] 20 Estimation and postestimation commands
Title
roctab — Nonparametric ROC analysis
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
roctab refvar classvar [if] [in] [weight] [, options]
roctab options Description
Main
lorenz report Gini and Pietra indices
binomial calculate exact binomial confidence intervals
nolabel display numeric codes rather than value labels
detail show details on sensitivity/specificity for each cutpoint
table display the raw data in a 2 ×kcontingency table
bamber calculate standard errors by using the Bamber method
hanley calculate standard errors by using the Hanley method
graph graph the ROC curve
norefline suppress plotting the 45-degree reference line
summary report the area under the ROC curve
specificity graph sensitivity versus specificity
level(#)set confidence level; default is level(95)
Plot
plotopts(plot options)affect rendition of the ROC curve
Reference line
rlopts(cline options)affect rendition of the reference line
Add plots
addplot(plot)add other plots to the generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options any options other than by() documented in [G-3]twoway options
fweights are allowed; see [U] 11.1.6 weight.
plot options Description
marker options change look of markers (color, size, etc.)
marker label options add marker labels; change look or position
cline options change the look of the line
Menu
Statistics >Epidemiology and related >ROC analysis >Nonparametric ROC analysis without covariates
Description
The above command is used to perform receiver operating characteristic (ROC) analyses with rating
and discrete classification data.
The two variables refvar and classvar must be numeric. The reference variable indicates the true
state of the observation, such as diseased and nondiseased or normal and abnormal, and must be
coded as 0 and 1. The rating or outcome of the diagnostic test or test modality is recorded in classvar,
which must be at least ordinal, with higher values indicating higher risk.
roctab performs nonparametric ROC analyses. By default, roctab calculates the area under the
ROC curve. Optionally, roctab can plot the ROC curve, display the data in tabular form, and produce
Lorenz-like plots.
See [R] rocfit for a command that fits maximum-likelihood ROC models.
Options
 
Main
lorenz specifies that Gini and Pietra indices be reported. Optionally, graph will plot the Lorenz-like
curve.
binomial specifies that exact binomial confidence intervals be calculated.
nolabel specifies that numeric codes be displayed rather than value labels.
detail outputs a table displaying the sensitivity, specificity, the percentage of subjects correctly
classified, and two likelihood ratios for each possible cutpoint of classvar.
table outputs a 2 ×kcontingency table displaying the raw data.
bamber specifies that the standard error for the area under the ROC curve be calculated using the
method suggested by Bamber (1975). Otherwise, standard errors are obtained as suggested by
DeLong, DeLong, and Clarke-Pearson (1988).
hanley specifies that the standard error for the area under the ROC curve be calculated using the method
suggested by Hanley and McNeil (1982). Otherwise, standard errors are obtained as suggested by
DeLong, DeLong, and Clarke-Pearson (1988).
graph produces graphical output of the ROC curve. If lorenz is specified, graphical output of a
Lorenz-like curve will be produced.
norefline suppresses plotting the 45-degree reference line from the graphical output of the ROC
curve.
summary reports the area under the ROC curve, its standard error, and its confidence interval. If
lorenz is specified, Lorenz indices are reported. This option is needed only when also specifying
graph.
specificity produces a graph of sensitivity versus specificity instead of sensitivity versus
(1 − specificity). specificity implies graph.
level(#)specifies the confidence level, as a percentage, for the confidence intervals. The default is
level(95) or as set by set level; see [R]level.
 
Plot
plotopts(plot options)affects the rendition of the plotted ROC curvethe curve’s plotted points
connected by lines. The plot options can affect the size and color of markers, whether and how
the markers are labeled, and whether and how the points are connected; see [G-3]marker options,
[G-3]marker label options, and [G-3]cline options.
 
Reference line
rlopts(cline options)affects the rendition of the reference line; see [G-3]cline options.
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3]twoway options, excluding by(). These
include options for titling the graph (see [G-3]title options) and for saving the graph to disk (see
[G-3]saving option).
Remarks and examples
Remarks are presented under the following headings:
Introduction
Nonparametric ROC curves
Lorenz-like curves
Introduction
The roctab command provides nonparametric estimation of the ROC for a given classifier and
true-status reference variable. The Lorenz curve functionality of roctab, which provides an alternative
to standard ROC analysis, is discussed in Lorenz-like curves.
See Pepe (2003) for a discussion of ROC analysis. Pepe has posted Stata datasets and programs
used to reproduce results presented in the book (http://www.stata.com/bookstore/pepe.html).
Nonparametric ROC curves
The points on the nonparametric ROC curve are generated using each possible outcome of the
diagnostic test as a classification cutpoint and computing the corresponding sensitivity and 1 − specificity.
These points are then connected by straight lines, and the area under the resulting ROC curve is
computed using the trapezoidal rule.
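As a sketch of this computation, the sensitivity and 1 − specificity at a single cutpoint can be tabulated by hand. The lines below assume the hanley dataset introduced in example 1 below and use the ">= 4" cutpoint; roctab repeats this calculation at every observed cutpoint.
. use http://www.stata-press.com/data/r13/hanley, clear
. quietly count if disease == 1
. local abn = r(N)
. quietly count if rating >= 4 & disease == 1
. display "sensitivity     = " %7.4f r(N)/`abn'
. quietly count if disease == 0
. local nrm = r(N)
. quietly count if rating >= 4 & disease == 0
. display "1 - specificity = " %7.4f r(N)/`nrm'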
Example 1
Hanley and McNeil (1982) presented data from a study in which a reviewer was asked to classify,
using a five-point scale, a random sample of 109 tomographic images from patients with neurological
problems. The rating scale was as follows: 1 =definitely normal, 2 =probably normal, 3 =
questionable, 4 =probably abnormal, and 5 =definitely abnormal. The true disease status was
normal for 58 of the patients and abnormal for the remaining 51 patients.
Here we list 9 of the 109 observations:
. use http://www.stata-press.com/data/r13/hanley
. list disease rating in 1/9
disease rating
1. 1 5
2. 0 1
3. 1 5
4. 0 4
5. 0 1
6. 0 3
7. 1 5
8. 0 5
9. 0 1
For each observation, disease identifies the true disease status of the subject (0 =normal, 1 =
abnormal), and rating contains the classification value assigned by the reviewer.
We can use roctab to calculate and plot the nonparametric ROC curve by specifying both the
summary and graph options. By also specifying the table option, we obtain a contingency table
summarizing our dataset.
. roctab disease rating, table graph summary
rating
disease 1 2 3 4 5 Total
0 33 6 6 11 2 58
1 3 2 2 11 33 51
Total 36 8 8 22 35 109
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
109 0.8932 0.0307 0.83295 0.95339
(figure omitted: nonparametric ROC curve, Sensitivity versus 1 − Specificity; Area under ROC curve = 0.8932)
By default, roctab reports the area under the curve, its standard error, and its confidence interval.
The graph option can be used to plot the ROC curve.
The ROC curve is plotted by computing the sensitivity and specificity using each value of the
rating variable as a possible cutpoint. A point is plotted on the graph for each of the cutpoints. These
plotted points are joined by straight lines to form the ROC curve, and the area under the ROC curve
is computed using the trapezoidal rule.
We can tabulate the computed sensitivities and specificities for each of the possible cutpoints by
specifying detail.
. roctab disease rating, detail
Detailed report of sensitivity and specificity
Correctly
Cutpoint Sensitivity Specificity Classified LR+ LR-
( >= 1 ) 100.00% 0.00% 46.79% 1.0000
( >= 2 ) 94.12% 56.90% 74.31% 2.1835 0.1034
( >= 3 ) 90.20% 67.24% 77.98% 2.7534 0.1458
( >= 4 ) 86.27% 77.59% 81.65% 3.8492 0.1769
( >= 5 ) 64.71% 96.55% 81.65% 18.7647 0.3655
( > 5 ) 0.00% 100.00% 53.21% 1.0000
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
109 0.8932 0.0307 0.83295 0.95339
Each cutpoint in the table indicates the ratings used to classify tomographs as being from an abnormal
subject. For example, the first cutpoint (>= 1) indicates that all tomographs rated as 1 or greater are
classified as coming from abnormal subjects. Because all tomographs have a rating of 1 or greater, all
are considered abnormal. Consequently, all abnormal cases are correctly classified (sensitivity =100%),
but none of the normal patients is classified correctly (specificity =0%). For the second cutpoint
(>=2), tomographs with ratings of 1 are classified as normal, and those with ratings of 2 or greater are
classified as abnormal. The resulting sensitivity and specificity are 94.12% and 56.90%, respectively.
Using this cutpoint, we correctly classified 74.31% of the 109 tomographs. Similar interpretations
can be used on the remaining cutpoints. As mentioned, each cutpoint corresponds to a point on the
nonparametric ROC curve. The first cutpoint (>=1) corresponds to the point at (1,1), and the last
cutpoint (>5)corresponds to the point at (0,0).
detail also reports two likelihood ratios suggested by Choi (1998): the likelihood ratio for a
positive test result (LR+) and the likelihood ratio for a negative test result (LR–). The LR+ is the
ratio of the probability of a positive test among the truly positive subjects to the probability of a
positive test among the truly negative subjects. The LR– is the ratio of the probability of a negative
test among the truly positive subjects to the probability of a negative test among the truly negative
subjects. Choi points out that LR+ corresponds to the slope of the line from the origin to the point
on the ROC curve determined by the cutpoint. Similarly, LR– corresponds to the slope from the point
(1,1) to the point on the ROC curve determined by the cutpoint.
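Equivalently, at any cutpoint these slopes reduce to the familiar expressions

$$\text{LR}{+} = \frac{\text{sensitivity}}{1-\text{specificity}}, \qquad \text{LR}{-} = \frac{1-\text{sensitivity}}{\text{specificity}}$$

For example, at the cutpoint (>= 2) above, 0.9412/(1 − 0.5690) ≈ 2.18 and (1 − 0.9412)/0.5690 ≈ 0.103, matching the reported LR+ and LR−.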
By default, roctab calculates the standard error for the area under the curve by using an algorithm
suggested by DeLong, DeLong, and Clarke-Pearson (1988) and asymptotic normal confidence intervals.
Optionally, standard errors based on methods suggested by Bamber (1975) or Hanley and McNeil (1982)
can be computed by specifying bamber or hanley, respectively, and an exact binomial confidence
interval can be obtained by specifying binomial.
. roctab disease rating, bamber
ROC Bamber Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
109 0.8932 0.0306 0.83317 0.95317
. roctab disease rating, hanley binomial
ROC Hanley Binomial Exact
Obs Area Std. Err. [95% Conf. Interval]
109 0.8932 0.0320 0.81559 0.94180
Lorenz-like curves
For applications where it is known that the risk status increases or decreases monotonically with
increasing values of the diagnostic test, the ROC curve and associated indices are useful in assessing
the overall performance of a diagnostic test. When the risk status does not vary monotonically with
increasing values of the diagnostic test, however, the resulting ROC curve can be nonconvex and its
indices can be unreliable. For these situations, Lee (1999) proposed an alternative to the ROC analysis
based on Lorenz-like curves and the associated Pietra and Gini indices.
Lee (1999) mentions at least three specific situations where results from Lorenz curves are superior
to those obtained from ROC curves: 1) a diagnostic test with similar means but very different standard
deviations in the abnormal and normal populations, 2) a diagnostic test with bimodal distributions in
either the normal or abnormal population, and 3) a diagnostic test distributed symmetrically in the
normal population and skewed in the abnormal.
When the risk status increases or decreases monotonically with increasing values of the diagnostic
test, the ROC and Lorenz curves yield interchangeable results.
Example 2
To illustrate the use of the lorenz option, we constructed a fictitious dataset that yields results
similar to those presented in Table III of Lee (1999). The data assume that a 12-point rating scale
was used to classify 442 diseased and 442 healthy subjects. We list a few of the observations.
. use http://www.stata-press.com/data/r13/lorenz, clear
. list in 1/7, noobs sep(0)
disease class pop
0 5 66
1 11 17
0 6 85
0 3 19
0 10 19
0 2 7
1 4 16
The data consist of 24 observations: 12 observations from diseased individuals and 12 from nondiseased
individuals. Each observation corresponds to one of the 12 classification values of the rating-scale
variable, class. The number of subjects represented by each observation is given by the pop variable,
making this a frequency-weighted dataset. The data were generated assuming a binormal distribution
of the latent variable with similar means for the normal and abnormal populations but with the standard
deviation for the abnormal population five times greater than that of the normal population.
. roctab disease class [fweight=pop], graph summary
ROC Asymptotic Normal
Obs Area Std. Err. [95% Conf. Interval]
884 0.5774 0.0215 0.53517 0.61959
(Figure omitted: nonparametric ROC curve, sensitivity versus 1 − specificity; area under ROC curve = 0.5774)
The resulting ROC curve is nonconvex or, as termed by Lee, “wiggly”. Lee argues that for this
and similar situations, the Lorenz curve and indices are preferred.
. roctab disease class [fweight=pop], lorenz summary graph
Lorenz curve
Pietra index = 0.6493
Gini index = 0.7441
(Figure omitted: Lorenz curve, cumulative % of disease=1 versus cumulative % of disease=0)
As with ROC curves, a more bowed Lorenz curve suggests a better diagnostic test. This bowedness is quantified by the Pietra index, which is geometrically equivalent to twice the area of the largest triangle that can be inscribed between the curve and the diagonal line, and by the Gini index, which is equivalent to twice the area between the Lorenz curve and the diagonal. Lee (1999) provides several additional interpretations for the Pietra and Gini indices.
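In terms of the Lorenz curve $L(p)$, these geometric descriptions correspond to

$$\text{Pietra} = \max_{0 \le p \le 1}\,\bigl|L(p)-p\bigr|, \qquad \text{Gini} = 2\int_{0}^{1}\bigl|L(p)-p\bigr|\,dp$$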
Stored results
roctab stores the following in r():
Scalars
r(N) number of observations
r(se) standard error for the area under the ROC curve
r(lb) lower bound of CI for the area under the ROC curve
r(ub) upper bound of CI for the area under the ROC curve
r(area) area under the ROC curve
r(pietra) Pietra index
r(gini) Gini index
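For example, a minimal sketch of reusing these stored results after roctab:

. roctab disease rating
. display "AUC = " r(area) "   SE = " r(se)
. display "95% CI: [" r(lb) ", " r(ub) "]"

Because the results are returned in r(), they remain available until the next r-class command is run.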
Methods and formulas
Assume that we applied a diagnostic test to each of $N_n$ normal and $N_a$ abnormal subjects. Further assume that the higher the outcome value of the diagnostic test, the higher the risk of the subject being abnormal. Let $\hat\theta$ be the estimated area under the curve, and let $X_i$, $i = 1, 2, \dots, N_a$, and $Y_j$, $j = 1, 2, \dots, N_n$, be the values of the diagnostic test for the abnormal and normal subjects, respectively.

The points on the nonparametric ROC curve are generated using each possible outcome of the diagnostic test as a classification cutpoint and computing the corresponding sensitivity and 1 − specificity. These points are then connected by straight lines, and the area under the resulting ROC curve is computed using the trapezoidal rule.

The default standard error for the area under the ROC curve is computed using the algorithm described by DeLong, DeLong, and Clarke-Pearson (1988). For each abnormal subject, $i$, define

$$V_{10}(X_i) = \frac{1}{N_n}\sum_{j=1}^{N_n}\psi(X_i, Y_j)$$

and for each normal subject, $j$, define

$$V_{01}(Y_j) = \frac{1}{N_a}\sum_{i=1}^{N_a}\psi(X_i, Y_j)$$

where

$$\psi(X, Y) = \begin{cases} 1 & Y < X \\ \tfrac{1}{2} & Y = X \\ 0 & Y > X \end{cases}$$

Define

$$S_{10} = \frac{1}{N_a-1}\sum_{i=1}^{N_a}\bigl\{V_{10}(X_i)-\hat\theta\bigr\}^2$$

and

$$S_{01} = \frac{1}{N_n-1}\sum_{j=1}^{N_n}\bigl\{V_{01}(Y_j)-\hat\theta\bigr\}^2$$

The variance of the estimated area under the ROC curve is given by

$$\operatorname{var}(\hat\theta) = \frac{1}{N_a}S_{10} + \frac{1}{N_n}S_{01}$$
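As an illustration, a minimal Mata sketch of these quantities for two small hypothetical vectors of test values:

. mata:
: X = (5, 5, 5, 4)'                  // hypothetical test values, abnormal subjects
: Y = (1, 1, 3, 4, 1)'               // hypothetical test values, normal subjects
: Na = rows(X)
: Nn = rows(Y)
: PSI = (X :> Y') + 0.5*(X :== Y')   // Na x Nn matrix of psi(X_i, Y_j)
: theta = sum(PSI)/(Na*Nn)           // area under the ROC curve
: V10 = rowsum(PSI)/Nn               // V10(X_i)
: V01 = colsum(PSI)'/Na              // V01(Y_j)
: S10 = sum((V10 :- theta):^2)/(Na-1)
: S01 = sum((V01 :- theta):^2)/(Nn-1)
: theta, sqrt(S10/Na + S01/Nn)       // estimate and its DeLong standard error
: end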
The hanley standard error for the area under the ROC curve is computed using the algorithm described by Hanley and McNeil (1982). It requires the calculation of two quantities: $Q_1$ is Pr(two randomly selected abnormal subjects will both have a higher score than a randomly selected normal subject), and $Q_2$ is Pr(one randomly selected abnormal subject will have a higher score than any two randomly selected normal subjects). The Hanley and McNeil variance of the estimated area under the ROC curve is

$$\operatorname{var}(\hat\theta) = \frac{\hat\theta(1-\hat\theta) + (N_a-1)(Q_1-\hat\theta^{\,2}) + (N_n-1)(Q_2-\hat\theta^{\,2})}{N_a N_n}$$
The bamber standard error for the area under the ROC curve is computed using the algorithm described by Bamber (1975). For any two $Y$ values, $Y_j$ and $Y_k$, and any $X_i$ value, define

$$b_{yyx} = p(Y_j, Y_k < X_i) + p(X_i < Y_j, Y_k) - 2\,p(Y_j < X_i < Y_k)$$

and similarly, for any two $X$ values, $X_i$ and $X_l$, and any $Y_j$ value, define

$$b_{xxy} = p(X_i, X_l < Y_j) + p(Y_j < X_i, X_l) - 2\,p(X_i < Y_j < X_l)$$

Bamber's unbiased estimate of the variance for the area under the ROC curve is

$$\operatorname{var}(\hat\theta) = \frac{1}{4(N_a-1)(N_n-1)}\Bigl\{p(X \ne Y) + (N_a-1)\,b_{xxy} + (N_n-1)\,b_{yyx} - 4(N_a+N_n-1)(\hat\theta-0.5)^2\Bigr\}$$
Asymptotic confidence intervals are constructed and reported by default, assuming a normal
distribution for the area under the ROC curve.
Exact binomial confidence intervals are calculated as described in [R] ci, with p equal to the area under the ROC curve.
References
Bamber, D. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic
graph. Journal of Mathematical Psychology 12: 387–415.
Choi, B. C. K. 1998. Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test.
American Journal of Epidemiology 148: 1127–1132.
Cleves, M. A. 1999. sg120: Receiver operating characteristic (ROC) analysis. Stata Technical Bulletin 52: 19–33. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 212–229. College Station, TX: Stata Press.
———. 2000. sg120.2: Correction to roccomp command. Stata Technical Bulletin 54: 26. Reprinted in Stata Technical Bulletin Reprints, vol. 9, p. 231. College Station, TX: Stata Press.
———. 2002a. Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve. Stata Journal 2: 280–289.
———. 2002b. From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models. Stata Journal 2: 301–313.
DeLong, E. R., D. M. DeLong, and D. L. Clarke-Pearson. 1988. Comparing the areas under two or more correlated
receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–845.
Erdreich, L. S., and E. T. Lee. 1981. Use of relative operating characteristic analysis in epidemiology: A method for
dealing with subjective judgment. American Journal of Epidemiology 114: 649–662.
Hanley, J. A., and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic
(ROC) curve. Radiology 143: 29–36.
Harbord, R. M., and P. Whiting. 2009. metandi: Meta-analysis of diagnostic accuracy using hierarchical logistic regression. Stata Journal 9: 211–229.
Juul, S., and M. Frydenberg. 2014. An Introduction to Stata for Health Researchers. 4th ed. College Station, TX:
Stata Press.
Lee, W. C. 1999. Probabilistic analysis of global performances of diagnostic tests: Interpreting the Lorenz curve-based
summary measures. Statistics in Medicine 18: 455–471.
Ma, G., and W. J. Hall. 1993. Confidence bands for the receiver operating characteristic curves. Medical Decision
Making 13: 191–197.
Pepe, M. S. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford
University Press.
Reichenheim, M. E., and A. Ponce de Leon. 2002. Estimation of sensitivity and specificity arising from validity studies with incomplete design. Stata Journal 2: 267–279.
Seed, P. T., and A. Tobías. 2001. sbe36.1: Summary statistics for diagnostic tests. Stata Technical Bulletin 59: 25–27. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 90–93. College Station, TX: Stata Press.
Tobías, A. 2000. sbe36: Summary statistics report for diagnostic tests. Stata Technical Bulletin 56: 16–18. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 87–90. College Station, TX: Stata Press.
Working, H., and H. Hotelling. 1929. Application of the theory of error to the interpretation of trends. Journal of the
American Statistical Association 24 (Suppl.): 73–85.
Also see
[R] logistic postestimation Postestimation tools for logistic
[R] roc Receiver operating characteristic (ROC) analysis
[R] roccomp Tests of equality of ROC areas
[R] rocfit Parametric ROC models
[R] rocreg Receiver operating characteristic (ROC) regression
Title
rologit — Rank-ordered logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
rologit depvar [indepvars] [if] [in] [weight], group(varname) [options]

options              Description
Model
  group(varname)     identifier variable that links the alternatives
  offset(varname)    include varname in model with coefficient constrained to 1
  incomplete(#)      use # to code unranked alternatives; default is incomplete(0)
  reverse            reverse the preference order
  notestrhs          keep right-hand-side variables that do not vary within group
  ties(spec)         method to handle ties: exactm, breslow, efron, or none
SE/Robust
  vce(vcetype)       vcetype may be oim, robust, cluster clustvar, bootstrap, or jackknife
Reporting
  level(#)           set confidence level; default is level(95)
  display options    control column formats, row spacing, line width, display of omitted
                     variables and base and empty cells, and factor-variable labeling
Maximization
  maximize options   control the maximization process; seldom used
  coeflegend         display legend instead of statistics

group(varname) is required.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights, iweights, and pweights are allowed, except with ties(efron); see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Ordinal outcomes > Rank-ordered logistic regression
Description
rologit fits the rank-ordered logistic regression model by maximum likelihood (Beggs, Cardell, and Hausman 1981). This model is also known as the Plackett–Luce model (Marden 1995), as the exploded logit model (Punj and Staelin 1978), and as the choice-based method of conjoint analysis (Hair et al. 2010).

rologit expects the data to be in long form, similar to clogit (see [R] clogit), in which each of the ranked alternatives forms an observation; all observations related to an individual are linked together by the variable that you specify in the group() option. The distinction from clogit is that depvar in rologit records the rankings of the alternatives, whereas for clogit, depvar marks only the best alternative by a value not equal to zero. rologit interprets equal scores of depvar as ties. The ranking information may be incomplete "at the bottom" (least preferred alternatives). That is, unranked alternatives may be coded as 0 or as a common value that may be specified with the incomplete() option.

If your data record only the unique best alternative, rologit fits the same model as clogit.
Options
 
Model
group(varname) is required, and it specifies the identifier variable (numeric or string) that links the alternatives for an individual, which have been compared and rank ordered with respect to one another.

offset(varname); see [R] estimation options.

incomplete(#) specifies the numeric value used to code alternatives that are not ranked. It is assumed that unranked alternatives are less preferred than the ranked alternatives (that is, the data record the ranking of the most preferred alternatives). It is not assumed that subjects are indifferent between the unranked alternatives. # defaults to 0.
reverse specifies that in the preference order, a higher number means a less attractive alternative. The default is that higher values indicate more attractive alternatives. The rank-ordered logit model is not symmetric, so reversing the ordering does not simply change the signs of the coefficients.
notestrhs suppresses the test that the independent variables vary within (at least some of) the groups.
Effects of variables that are always constant are not identified. For instance, a rater’s gender cannot
directly affect his or her rankings; it could affect the rankings only via an interaction with a
variable that does vary over alternatives.
ties(spec) specifies the method for handling ties (indifference between alternatives); see [ST] stcox for details:
  exactm     exact marginal likelihood (default)
  breslow    Breslow's method (default if pweights specified)
  efron      Efron's method (default if robust VCE)
  none       no ties allowed
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.

If ties(exactm) is specified, vcetype may be only oim, bootstrap, or jackknife.
 
Reporting
level(#); see [R] estimation options.

display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize options: iterate(#), trace, nolog, tolerance(#), ltolerance(#), nrtolerance(#), and nonrtolerance; see [R] maximize. These options are seldom used.

The following option is available with rologit but is not shown in the dialog box:

coeflegend; see [R] estimation options.
Remarks and examples
The rank-ordered logit model can be applied to analyze how decision makers combine attributes of
alternatives into overall evaluations of the attractiveness of these alternatives. The model generalizes
a version of McFadden’s choice model without alternative-specific covariates, as fit by the clogit
command. It uses richer information about the comparison of alternatives, namely, how decision-makers
rank the alternatives rather than just specifying the alternative that they like best.
Remarks are presented under the following headings:
Examples
Comparing respondents
Incomplete rankings and ties
Clustered choice data
Comparison of rologit and clogit
On reversals of rankings
Examples
A popular way to study employer preferences for characteristics of employees is the quasi-
experimental “vignette method”. As an example, we consider the research by de Wolf on the labor
market position of social science graduates (de Wolf 2000). This study addresses how the educational
portfolio (for example, general skills versus specific knowledge) affects short-term and long-term
labor-market opportunities. De Wolf asked 22 human resource managers (the respondents) to rank
order the six most suitable candidates of 20 fictitious applicants and to rank order these six candidates
for three jobs, namely, 1) researcher, 2) management trainee, and 3) policy adviser. Applicants
were described by 10 attributes, including their age, gender, details of their portfolio, and work
experience. In this example, we analyze a subset of the data. Also, to simplify the output, we drop, at
random, 10 nonselected applicants per case. The resulting dataset includes 29 cases, consisting of 10
applicants each. The data are in long form: observations correspond to alternatives (the applications),
and alternatives that figured in one decision task are identified by the variable caseid. We list
the observations for caseid==7, in which the respondent considered applicants for a social-science
research position.
. use http://www.stata-press.com/data/r13/evignet
(Vignet study employer prefs (Inge de Wolf 2000))
. list pref female age grades edufit workexp boardexp if caseid==7, noobs
pref female age grades edufit workexp boardexp
0 yes 28 A/B no none no
0 no 25 C/D yes one year no
0 no 25 C/D yes none yes
0 yes 25 C/D no internship yes
1 no 25 C/D yes one year yes
2 no 25 A/B yes none no
3 yes 25 A/B yes one year no
4 yes 25 A/B yes none yes
5 no 25 A/B yes internship no
6 yes 28 A/B yes one year yes
Here six applicants were selected. The rankings are stored in the variable pref, where a value
of 6 corresponds to “best among the candidates”, a value of 5 corresponds to “second-best among
the candidates”, etc. The applicants with a ranking of 0 were not among the best six candidates for
the job. The respondent was not asked to express his preferences among these four applicants, but
by the elicitation procedure, it is known that he ranks these four applicants below the six selected
applicants. The best candidate was a female, 28 years old, with education fitting the job, with good
grades (A/B), with 1 year of work experience, and with experience being a board member of a
fraternity, a sports club, etc. The profiles of the other candidates read similarly. Here the respondent
completed the task; that is, he selected and rank ordered the six most suitable applicants. Sometimes
the respondent performed only part of the task.
. list pref female age grades edufit workexp boardexp if caseid==18, noobs
pref female age grades edufit workexp boardexp
0 no 25 C/D yes none yes
0 no 25 C/D no internship yes
0 no 28 C/D no internship yes
0 yes 25 A/B no one year no
2 yes 25 A/B no none yes
2 no 25 A/B no none yes
2 no 25 A/B no one year yes
5 no 25 A/B no none yes
5 no 25 A/B no none yes
5 yes 25 A/B no none no
The respondent selected the six best candidates and segmented these six candidates into two groups:
one group with the three best candidates, and a second group of three candidates that were “still
acceptable”. The numbers 2 and 5, indicating these two groups, are arbitrary apart from the implied
ranking of the groups. The ties between the candidates in a group indicate that the respondent was
not able to rank the candidates within the group.
The purpose of the vignette experiment was to explore and test hypotheses about which of the
employees’ attributes are valued by employers, how these attributes are weighted depending on the type
of job (described by variable job in these data), etc. In the psychometric tradition of Thurstone (1927),
value is assumed to be linear in the attributes, with the coefficients expressing the direction and weight
of the attributes. In addition, it is assumed that valuation is to some extent a random procedure,
captured by an additive random term. For instance, if value depends only on an applicant’s age and
gender, we would have
$$\text{value}(\text{female}_i, \text{age}_i) = \beta_1\,\text{female}_i + \beta_2\,\text{age}_i + \epsilon_i$$
where the random residual, $\epsilon_i$, captures all omitted attributes. Thus $\beta_1 > 0$ means that the employer assigns higher value to a woman than to a man. Given this conceptualization of value, it is
straightforward to model the decision (selection) among alternatives or the ranking of alternatives:
the alternative with the highest value is selected (chosen), or the alternatives are ranked according to
their value. To complete the specification of a model of choice and of ranking, we assume that the
random residual $\epsilon_i$ follows an "extreme value distribution of type I", introduced in this context by
Luce (1959). This specific assumption is made mostly for computational convenience.
This model is known by many names. Among others, it is known as the rank-ordered logit model
in economics (Beggs, Cardell, and Hausman 1981), as the exploded logit model in marketing research
(Punj and Staelin 1978), as the choice-based conjoint analysis model (Hair et al. 2010), and as
the Plackett–Luce model (Marden 1995). The model coefficients are estimated using the method of
maximum likelihood. The implementation in rologit uses an analogy between the rank-ordered
logit model and the Cox regression model observed by Allison and Christakis (1994); see Methods
and formulas. The rologit command implements this method for rankings, whereas clogit deals
with the variant of choices, that is, only the most highly valued alternative is recorded. In the latter
case, the model is also known as the Luce–McFadden choice model. In fact, when the data record
the most preferred (unique) alternative and no additional ranking information about preferences is
available, rologit and clogit return the same information, though formatted somewhat differently.
. rologit pref female age grades edufit workexp boardexp if job==1, group(caseid)
Iteration 0: log likelihood = -95.41087
Iteration 1: log likelihood = -71.180903
Iteration 2: log likelihood = -68.47734
Iteration 3: log likelihood = -68.345918
Iteration 4: log likelihood = -68.345389
Refining estimates:
Iteration 0: log likelihood = -68.345389
Rank-ordered logistic regression Number of obs = 80
Group variable: caseid Number of groups = 8
No ties in data Obs per group: min = 10
avg = 10.00
max = 10
LR chi2(6) = 54.13
Log likelihood = -68.34539 Prob > chi2 = 0.0000
pref Coef. Std. Err. z P>|z| [95% Conf. Interval]
female -.4487287 .3671307 -1.22 0.222 -1.168292 .2708343
age -.0984926 .0820473 -1.20 0.230 -.2593024 .0623172
grades 3.064534 .6148245 4.98 0.000 1.8595 4.269568
edufit .7658064 .3602366 2.13 0.034 .0597556 1.471857
workexp 1.386427 .292553 4.74 0.000 .8130341 1.959821
boardexp .6944377 .3762596 1.85 0.065 -.0430176 1.431893
Focusing only on the variables whose coefficients are significant at the 10% level (we are analyzing
8 respondents only!), the estimated value of an applicant for a job of type 1 (research positions) can
be written as
value = 3.06*grades + 0.77*edufit + 1.39*workexp + 0.69*boardexp
Thus employers prefer applicants for a research position (job==1) whose educational portfolio fits
the job, who have better grades, who have more relevant work experience, and who have (extracurricular)
board experience. They do not seem to care much about the sex and age of applicants, which is
comforting.
Given these estimates of the valuation by employers, we consider the probabilities that each of the applications is ranked first. Under the assumption that the $\epsilon_i$ are independent and follow an extreme value type I distribution, Luce (1959) showed that the probability, $\pi_1$, that alternative 1 is valued higher than alternatives $2, \dots, k$ can be written in the multinomial logit form

$$\pi_1 = \Pr\{\text{value}_1 > \max(\text{value}_2, \dots, \text{value}_k)\} = \frac{\exp(\text{value}_1)}{\sum_{j=1}^{k}\exp(\text{value}_j)}$$
The probability of observing a specific ranking can be written as the product of such terms, representing
a sequential decision interpretation in which the rater first chooses the most preferred alternative, and
then the most preferred alternative among the rest, etc.
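Written out for a complete ranking of $m$ alternatives, labeled so that alternative 1 is ranked first, alternative 2 second, and so on, this product is

$$\Pr(1 \succ 2 \succ \cdots \succ m) \;=\; \prod_{s=1}^{m-1}\frac{\exp(\text{value}_s)}{\sum_{j=s}^{m}\exp(\text{value}_j)}$$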
The probabilities for alternatives to be ranked first are conveniently computed by predict.
. predict p if e(sample)
(option pr assumed; conditional probability that alternative is ranked first)
(210 missing values generated)
. sort caseid pref p
. list pref p grades edufit workexp boardexp if caseid==7, noobs
pref p grades edufit workexp boardexp
0 .0027178 C/D yes none yes
0 .0032275 C/D no internship yes
0 .0064231 A/B no none no
0 .0217202 C/D yes one year no
1 .0434964 C/D yes one year yes
2 .0290762 A/B yes none no
3 .2970933 A/B yes one year no
4 .0371747 A/B yes none yes
5 .1163203 A/B yes internship no
6 .4427504 A/B yes one year yes
There clearly is a positive relation between the stated ranking and the predicted probabilities for
alternatives to be ranked first, but the association is not perfect. In fact, we would not have expected a
perfect association, as the model specifies a (nondegenerate) probability distribution over the possible
rankings of the alternatives. These predictions for sets of 10 candidates can also be used to make
predictions for subsets of the alternatives. For instance, suppose that only the last three candidates listed
in this table would be available. According to parameter estimates of the rank-ordered logit model, the
probability that the last of these candidates is selected equals 0.443/(0.037 + 0.116 + 0.443) = 0.743.
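A quick way to reproduce this arithmetic from the listed predictions is

. display 0.4427504/(0.0371747 + 0.1163203 + 0.4427504)

which displays approximately .743.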
Comparing respondents
The rologit model assumes that all respondents, HR managers in large public-sector organizations
in The Netherlands, use the same valuation function; that is, they apply the same decision weights. This
is the substantive interpretation of the assumption that the βs are constant between the respondents.
To probe this assumption, we could test whether the coefficients vary between different groups of
respondents. For a metric characteristic of the HR manager, such as firmsize, we can consider a
trend-model in the valuation weights,
$$\beta_{ij} = \alpha_{i0} + \alpha_{i1}\,\text{firmsize}_j$$
and we can test that the slopes $\alpha_{i1}$ of firmsize are zero.
. generate firmsize = employer
. rologit pref edufit grades workexp c.firmsize#c.(edufit grades workexp boardexp)
> if job==1, group(caseid) nolog
Rank-ordered logistic regression Number of obs = 80
Group variable: caseid Number of groups = 8
No ties in data Obs per group: min = 10
avg = 10.00
max = 10
LR chi2(7) = 57.17
Log likelihood = -66.82346 Prob > chi2 = 0.0000
pref Coef. Std. Err. z P>|z| [95% Conf. Interval]
edufit 1.29122 1.13764 1.13 0.256 -.9385127 3.520953
grades 6.439776 2.288056 2.81 0.005 1.955267 10.92428
workexp 1.23342 .8065067 1.53 0.126 -.347304 2.814144
c.firmsize#
c.edufit -.0173333 .0711942 -0.24 0.808 -.1568714 .1222048
c.firmsize#
c.grades -.2099279 .1218251 -1.72 0.085 -.4487008 .028845
c.firmsize#
c.workexp .0097508 .0525081 0.19 0.853 -.0931632 .1126649
c.firmsize#
c.boardexp .0382304 .0227545 1.68 0.093 -.0063676 .0828284
. testparm c.firmsize#c.(edufit grades workexp boardexp)
( 1) c.firmsize#c.edufit = 0
( 2) c.firmsize#c.grades = 0
( 3) c.firmsize#c.workexp = 0
( 4) c.firmsize#c.boardexp = 0
chi2( 4) = 7.14
Prob > chi2 = 0.1288
The Wald test that the slopes of the interacted firmsize variables are jointly zero provides no
evidence upon which we would reject the null hypothesis; that is, we do not find evidence against the
assumption of constant valuation weights of the attributes by firms of different size. We did not enter
firmsize as a predictor variable. Characteristics of the decision-making agent do not vary between
alternatives. Thus an additive effect of these characteristics on the valuation of alternatives does not
affect the agent’s ranking of alternatives and his choice. Consequently the coefficient of firmsize is
not identified. rologit would in fact have diagnosed the problem and dropped firmsize from the
analysis. Diagnosing this problem can slow the estimation considerably; the test may be suppressed
by specifying the notestrhs option.
Incomplete rankings and ties
rologit allows incomplete rankings and ties in the rankings as proposed by Allison and Chris-
takis (1994). rologit permits rankings to be incomplete only "at the bottom"; namely, the ranking of the least attractive alternatives for subjects may not be known; do not confuse this with the situation in which a subject is indifferent between these alternatives. This form of incompleteness
occurred in the example discussed here, because the respondents were instructed to select and rank
only the top six alternatives. It may also be that respondents refused to rank the alternatives that are
very unattractive. rologit does not allow other forms of incompleteness, for instance, data in which
respondents indicate which of four cars they like best, and which one they like least, but not how
they rank the two intermediate cars. Another example of incompleteness that cannot be analyzed with
rologit is data in which respondents select the three alternatives they like best but are not requested
to express their preferences among the three selected alternatives.
rologit also permits ties in rankings. rologit assumes that if a subject expresses a tie between
two or more alternatives, he or she actually holds one particular strict preference ordering, but with all
possibilities of a strict ordering consistent with the expressed weak ordering being equally probable.
For instance, suppose that a respondent ranks alternative 1 highest. He prefers alternatives 2 and 3
over alternative 4, and he is indifferent between alternatives 2 and 3. We assume that this respondent
either has the strict preference ordering 1 > 2 > 3 > 4 or 1 > 3 > 2 > 4, with both possibilities
being equally likely. From a psychometric perspective, it may actually be more appropriate to also
assume that the alternatives 2 and 3 are close; for instance, the difference between the associated
valuations (utilities) is less than some threshold or minimally discernible difference. Computationally,
however, this is a more demanding model.
Clustered choice data
We have seen that applicants with work experience are in a relatively favorable position. To test
whether the effects of work experience vary between the jobs, we can include interactions between the
type of job and the attributes of applicants. Such interactions can be obtained using factor variables.
Because some HR managers contributed data for more than one job, we cannot assume that
their selection decisions for different jobs are independent. We can account for this by specifying
the vce(cluster clustvar) option. By treating choice data as incomplete ranking data with only
the most preferred alternative marked, rologit may be used to estimate the model parameters for
clustered choice data.
. rologit pref job##c.(female grades edufit workexp), group(caseid)
> vce(cluster employer) nolog
2.job 3.job omitted because of no within-caseid variance
Rank-ordered logistic regression Number of obs = 290
Group variable: caseid Number of groups = 29
Ties handled via the Efron method Obs per group: min = 10
avg = 10.00
max = 10
Wald chi2(12) = 79.57
Log pseudolikelihood = -296.3855 Prob > chi2 = 0.0000
(Std. Err. adjusted for 22 clusters in employer)
Robust
pref Coef. Std. Err. z P>|z| [95% Conf. Interval]
job
managemen.. 0 (omitted)
policy ad.. 0 (omitted)
female -.2286609 .2519883 -0.91 0.364 -.7225489 .2652272
grades 2.812555 .8517878 3.30 0.001 1.143081 4.482028
edufit .7027757 .2398396 2.93 0.003 .2326987 1.172853
workexp 1.224453 .3396773 3.60 0.000 .5586978 1.890208
job#c.female
managemen.. .0293815 .4829166 0.06 0.951 -.9171177 .9758808
policy ad.. .1195538 .3688844 0.32 0.746 -.6034463 .8425538
job#c.grades
managemen.. -2.364247 1.005963 -2.35 0.019 -4.335898 -.3925961
policy ad.. -1.88232 .8995277 -2.09 0.036 -3.645362 -.1192782
job#c.edufit
managemen.. -.267475 .4244964 -0.63 0.529 -1.099473 .5645226
policy ad.. -.3182995 .3689972 -0.86 0.388 -1.041521 .4049217
job#
c.workexp
managemen.. -.6870077 .3692946 -1.86 0.063 -1.410812 .0367964
policy ad.. -.4656993 .4515712 -1.03 0.302 -1.350763 .4193639
The parameter estimates for the first job type are very similar to those that would have been
obtained from an analysis isolated to these data. Differences are due only to an implied change in
the method of handling ties. With clustered observations, rologit uses Efron’s method. If we had
specified the ties(efron) option with the separate analyses, then the parameter estimates would
have been identical to the simultaneous results. Another difference is that rologit now reports robust standard errors, adjusted for clustering within respondents. These could have been obtained for the separate analyses as well, by specifying the vce(robust) option. In fact, that option would also have forced rologit to switch to Efron's method.
Given the combined results for the three types of jobs, we can test easily whether the weights for
the attributes of applicants vary between the jobs, in other words, whether employers are looking for
different qualifications in applicants for different jobs. A Wald test for the equality hypothesis of no
difference can be obtained with the testparm command:
. testparm job#c.(female grades edufit workexp)
( 1) 2.job#c.female = 0
( 2) 3.job#c.female = 0
( 3) 2.job#c.grades = 0
( 4) 3.job#c.grades = 0
( 5) 2.job#c.edufit = 0
( 6) 3.job#c.edufit = 0
( 7) 2.job#c.workexp = 0
( 8) 3.job#c.workexp = 0
chi2( 8) = 14.96
Prob > chi2 = 0.0599
We find only mild evidence that employers look for different qualities in candidates according to
the job for which they are being considered.
Technical note
Allison (1999) stressed that the comparison between groups of the coefficients of logistic regression
is problematic, especially in its latent-variable interpretation. In many common latent-variable models,
only the regression coefficients divided by the scale of the latent variable are identified. Thus a
comparison of logit regression coefficients between, say, men and women is meaningful only if one
is willing to argue that the standard deviation of the latent residual does not differ between the sexes.
The rank-ordered logit model is also affected by this problem. While we formulated the model with a
scale-free residual, we can actually think of the model for the value of an alternative as being scaled
by the standard deviation of the random term, representing other relevant attributes of alternatives.
Again comparing attribute weights between jobs is meaningful to the extent that we are willing to
defend the proposition that “all omitted attributes” are equally important for different kinds of jobs.
Comparison of rologit and clogit
The rank-ordered logit model also has a sequential interpretation. A subject first chooses the best
among the alternatives. Next he or she selects the best alternative among the remaining alternatives,
etc. The decisions at each of the subsequent stages are described by a conditional logit model,
and a subject is assumed to apply the same decision weights at each stage. Some authors have
expressed concern that later choices may well be made more randomly than the first few decisions.
A formalization of this idea is a heteroskedastic version of the rank-ordered logit model in which the
scale of the random term increases with the number of decisions made (for example, Hausman and
Ruud [1987]). This extended model is currently not supported by rologit. However, the hypothesis
that the same decision weights are applied at the first stage and at later stages can be tested by
applying a Hausman test.
First, we fit the rank-ordered logit model on the full ranking data for the first type of job,
. rologit pref age female edufit grades workexp boardexp if job==1,
> group(caseid) nolog
Rank-ordered logistic regression Number of obs = 80
Group variable: caseid Number of groups = 8
No ties in data Obs per group: min = 10
avg = 10.00
max = 10
LR chi2(6) = 54.13
Log likelihood = -68.34539 Prob > chi2 = 0.0000
pref Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.0984926 .0820473 -1.20 0.230 -.2593024 .0623172
female -.4487287 .3671307 -1.22 0.222 -1.168292 .2708343
edufit .7658064 .3602366 2.13 0.034 .0597556 1.471857
grades 3.064534 .6148245 4.98 0.000 1.8595 4.269568
workexp 1.386427 .292553 4.74 0.000 .8130341 1.959821
boardexp .6944377 .3762596 1.85 0.065 -.0430176 1.431893
and we save the estimates for later use with the estimates command.
. estimates store Ranking
To estimate the decision weights on the basis of the most preferred alternatives only, we create a
variable, best, that is 1 for the best alternatives, and 0 otherwise. The by prefix is useful here.
. by caseid (pref), sort: gen best = pref == pref[_N] if job==1
(210 missing values generated)
By specifying (pref) with by caseid, we ensured that the data were sorted in increasing order on
pref within caseid. Hence, the most preferred alternatives are last in the sort order. The expression pref == pref[_N] is true (1) for the most preferred alternatives, even if the alternative is not unique, and false (0) otherwise. If the most preferred alternatives were sometimes tied, we could still fit the model for the best-alternatives-only data via rologit, but clogit would yield different results because it deals with ties in a way that is less appropriate for continuous valuations. To ascertain whether
there are ties in the selected data regarding applicants for research positions, we can combine by with
assert:
. by caseid (pref), sort: assert pref[_N-1] != pref[_N] if job==1
There are no ties. We can now fit the model on the choice data by using either clogit or rologit.
. rologit best age edufit grades workexp boardexp if job==1, group(caseid) nolog
Rank-ordered logistic regression Number of obs = 80
Group variable: caseid Number of groups = 8
No ties in data Obs per group: min = 10
avg = 10.00
max = 10
LR chi2(5) = 17.27
Log likelihood = -9.783205 Prob > chi2 = 0.0040
best Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.1048959 .2017068 -0.52 0.603 -.5002339 .2904421
edufit .4558387 .9336775 0.49 0.625 -1.374136 2.285813
grades 3.443851 1.969002 1.75 0.080 -.4153223 7.303025
workexp 2.545648 1.099513 2.32 0.021 .3906422 4.700655
boardexp 1.765176 1.112763 1.59 0.113 -.4157988 3.946152
. estimates store Choice
The same results, though with a slightly differently formatted header, would have been obtained by
using clogit on these data.
. clogit best age edufit grades workexp boardexp if job==1, group(caseid) nolog
Conditional (fixed-effects) logistic regression Number of obs = 80
LR chi2(5) = 17.27
Prob > chi2 = 0.0040
Log likelihood = -9.7832046 Pseudo R2 = 0.4689
best Coef. Std. Err. z P>|z| [95% Conf. Interval]
age -.1048959 .2017068 -0.52 0.603 -.5002339 .2904421
edufit .4558387 .9336775 0.49 0.625 -1.374136 2.285813
grades 3.443851 1.969002 1.75 0.080 -.4153223 7.303025
workexp 2.545648 1.099513 2.32 0.021 .3906422 4.700655
boardexp 1.765176 1.112763 1.59 0.113 -.4157988 3.946152
The parameters of the ranking and choice models look different, but the standard errors based
on the choice data are much larger. Are we estimating parameters with the ranking data that are
different from those with the choice data? A Hausman test compares two estimators of a parameter.
One of the estimators should be efficient under the null hypothesis, namely, that choosing the
second-best alternative is determined with the same decision weights as the best, etc. In our case, the
efficient estimator of the decision weights uses the ranking information. The other estimator should
be consistent, even if the null hypothesis is false. In our application, this is the estimator that uses
the first-choice data only.
. hausman Choice Ranking
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
Choice Ranking Difference S.E.
age -.1048959 -.0984926 -.0064033 .1842657
edufit .4558387 .7658064 -.3099676 .8613846
grades 3.443851 3.064534 .3793169 1.870551
workexp 2.545648 1.386427 1.159221 1.059878
boardexp 1.765176 .6944377 1.070739 1.04722
b = consistent under Ho and Ha; obtained from rologit
B = inconsistent under Ha, efficient under Ho; obtained from rologit
Test: Ho: difference in coefficients not systematic
chi2(5) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 3.05
Prob>chi2 = 0.6918
We do not find evidence for misspecification. We have to be cautious, though, because Hausman-type tests are often not powerful, and the number of observations in our example is very small, which makes the quality of the chi-squared approximation to the null distribution rather uncertain.
On reversals of rankings
The rank-ordered logit model has a property that you may find unexpected and even unfortunate.
Compare two analyses with the rank-ordered logit model, one in which alternatives are ranked
from “most attractive” to “least attractive”, the other a reversed analysis in which these alternatives
are ranked from "most unattractive" to "least unattractive". By unattractiveness, you probably mean just the opposite of attractiveness, and you expect the weights of the attributes in predicting "attractiveness" to be minus the weights in predicting "unattractiveness". This is, however, not true for the rank-ordered logit model. The assumed distribution of the random residual takes the form $F(\epsilon) = 1 - \exp\{-\exp(\epsilon)\}$. This distribution is right-skewed. Therefore, slightly different models result from adding and subtracting the random residual, corresponding with high-to-low and low-to-high rankings. Thus the estimated coefficients will differ between the two specifications, though
usually not in an important way. You may observe the difference by specifying the reverse option
of rologit. Reversing the rank order makes rankings that are incomplete at the bottom become
incomplete at the top. Only the first kind of incompleteness is supported by rologit. Thus, for this
comparison, we exclude the alternatives that are not ranked, omitting the information that ranked
alternatives are preferred over excluded ones.
. rologit pref grades edufit workexp boardexp if job==1 & pref!=0, group(caseid)
(output omitted )
. estimates store Original
. rologit pref grades edufit workexp boardexp if job==1 & pref!=0, group(caseid)
> reverse
(output omitted )
. estimates store Reversed
. estimates table Original Reversed, stats(aic bic)
Variable Original Reversed
grades 2.0032332 -1.0955335
edufit -.13111006 -.05710681
workexp 1.2805373 -1.2096383
boardexp .46213212 -.27200317
aic 96.750452 99.665642
bic 104.23526 107.15045
Thus, although the weights of the attributes for reversed rankings are indeed mostly of opposite
signs, the magnitudes of the weights and their standard errors differ. Which one is more appropriate?
We have no advice to offer here. The specific science of the problem will determine what is appropriate,
though we would be surprised indeed if this helps here. Formal testing does not help much either, as
the models for the original and reversed rankings are not nested. The model-selection indices, such
as the AIC and BIC, however, suggest that you stick to the rank-ordered logit model applied to the
original ranking rather than to the reversed ranking.
Stored results
rologit stores the following in e():
Scalars
e(N) number of observations
e(ll_0) log likelihood of the null model ("all rankings are equiprobable")
e(ll) log likelihood
e(df_m) model degrees of freedom
e(chi2) χ²
e(p) significance
e(r2_p) pseudo-R²
e(N_g) number of groups
e(g_min) minimum group size
e(g_avg) average group size
e(g_max) maximum group size
e(code_inc) value for incomplete preferences
e(N_clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) rologit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(group) name of group() variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ² test
e(reverse) reverse, if specified
e(ties) breslow, efron, or exactm
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Allison and Christakis (1994) demonstrate that maximum likelihood estimates for the rank-ordered
logit model can be obtained as the maximum partial-likelihood estimates of an appropriately specified
Cox regression model for waiting time ([ST] stcox). In this analogy, a higher value for an alternative
is formally equivalent to a higher hazard rate of failure. rologit uses stcox to fit the rank-ordered
logit model based on such a specification of the data in Cox terms. A higher stated preference is
represented by a shorter waiting time until failure. Incomplete rankings are dealt with via censoring.
Moreover, decision situations (subjects) are to be treated as strata. Finally, as proposed by Allison
and Christakis, ties in rankings are handled by the marginal-likelihood method, specifying that all
strict preference orderings consistent with the stated weak preference ordering are equally likely.
The marginal-likelihood estimator is available in stcox via the exactm option. The methods of the
marginal likelihood due to Breslow and Efron are also appropriate for the analysis of rank-ordered
logit models. Because in most applications the number of ranked alternatives by one subject will be
fairly small (at most, say, 20), the number of ties is small as well, and so you rarely will need to
turn to methods to restrict computer time. Because the marginal-likelihood estimator in stcox does
not support the cluster adjustment or pweights, you should use the Efron method in such cases.
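A minimal sketch of setting up such data, assuming a ranking variable pref within groups caseid as in the examples above; the generated variables t and ranked are hypothetical, and the sketch only illustrates the analogy rather than reproducing the internal implementation:

. egen maxpref = max(pref), by(caseid)
. generate double t = maxpref - pref + 1      // higher stated preference -> shorter waiting time
. generate byte ranked = pref > 0             // unranked (incomplete) alternatives are censored
. stset t, failure(ranked)
. stcox grades workexp boardexp, strata(caseid) exactm nohr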
This command supports the clustered version of the Huber/White/sandwich estimator of the
variance using vce(robust) and vce(cluster clustvar). See [P] _robust, particularly Maximum likelihood estimators and Methods and formulas. Specifying vce(robust) is equivalent to specifying
vce(cluster groupvar), where groupvar is the identifier variable that links the alternatives.
Acknowledgment
The rologit command was written by Jeroen Weesie of the Department of Sociology at Utrecht
University, The Netherlands.
References
Allison, P. D. 1999. Comparing logit and probit coefficients across groups. Sociological Methods and Research 28:
186–208.
Allison, P. D., and N. Christakis. 1994. Logit models for sets of ranked items. In Vol. 24 of Sociological Methodology,
ed. P. V. Marsden, 123–126. Oxford: Blackwell.
Beggs, S., S. Cardell, and J. A. Hausman. 1981. Assessing the potential demand for electric cars. Journal of
Econometrics 17: 1–19.
de Wolf, I. 2000. Opleidingsspecialisatie en arbeidsmarktsucces van sociale wetenschappers. Amsterdam: ThelaThesis.
Hair, J. F., Jr., W. C. Black, B. J. Babin, and R. E. Anderson. 2010. Multivariate Data Analysis. 7th ed. Upper
Saddle River, NJ: Pearson.
Hausman, J. A., and P. A. Ruud. 1987. Specifying and testing econometric models for rank-ordered data. Journal of
Econometrics 34: 83–104.
Luce, R. D. 1959. Individual Choice Behavior: A Theoretical Analysis. New York: Dover.
Marden, J. I. 1995. Analyzing and Modeling Rank Data. London: Chapman & Hall.
McCullagh, P. 1993. Permutations and regression models. In Probability Models and Statistical Analysis for Ranking
Data, ed. M. A. Fligner and J. S. Verducci, 196–215. New York: Springer.
Plackett, R. L. 1975. The analysis of permutations. Applied Statistics 24: 193–202.
Punj, G. N., and R. Staelin. 1978. The choice process for graduate business schools. Journal of Marketing Research
15: 588–598.
Thurstone, L. L. 1927. A law of comparative judgment. Psychological Reviews 34: 273–286.
Yellott, J. I., Jr. 1977. The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment,
and the double exponential distribution. Journal of Mathematical Psychology 15: 109–144.
Also see
[R] rologit postestimation Postestimation tools for rologit
[R] clogit Conditional (fixed-effects) logistic regression
[R] logistic Logistic regression, reporting odds ratios
[R] mlogit Multinomial (polytomous) logistic regression
[R] nlogit Nested logit regression
[R] slogit Stereotype logistic regression
[U] 20 Estimation and postestimation commands
Title
rologit postestimation — Postestimation tools for rologit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after rologit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
linktest link test for model specification
lrtest likelihood-ratio test
margins¹ marginal means, predictive margins, marginal effects, and average marginal effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ The default prediction statistic pr cannot be correctly handled by margins; however, margins can be used after rologit with the predict(xb) option.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
statistic Description
Main
pr probability that alternatives are ranked first; the default
xb linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict ... if e(sample) ... if wanted only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability that alternatives are ranked first.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
nooffset is relevant only if you specified offset(varname) for rologit. It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b rather than as x_j b + offset_j.
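For example, a minimal sketch using the job==1 data from [R] rologit; the new variable names are arbitrary:

. rologit pref grades edufit workexp boardexp if job==1, group(caseid)
. predict prank if e(sample)              // pr (the default): probability of being ranked first
. predict xbhat if e(sample), xb          // linear prediction
. predict sehat if e(sample), stdp        // standard error of the linear prediction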
Remarks and examples
See Comparing respondents and Clustered choice data in [R] rologit for examples of the use of testparm, an alternative to the test command.

See Comparison of rologit and clogit and On reversals of rankings in [R] rologit for examples of the use of estimates.

See Comparison of rologit and clogit in [R] rologit for an example of the use of hausman.
Also see
[R] rologit Rank-ordered logistic regression
[U] 20 Estimation and postestimation commands
Title
rreg — Robust regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
rreg depvar [indepvars] [if] [in] [, options]
options Description
Model
tune(#) use # as the biweight tuning constant; default is tune(7)
Reporting
level(#)set confidence level; default is level(95)
genwt(newvar) create newvar containing the weights assigned to each observation
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Optimization
optimization options control the optimization process; seldom used
graph graph weights during convergence
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, mfp, mi estimate, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Other > Robust regression
Description
rreg performs one version of robust regression of depvar on indepvars.
Also see Robust standard errors in [R] regress for standard regression with robust variance estimates and [R] qreg for quantile (including median or least-absolute-residual) regression.
Options
 
Model
tune(#) is the biweight tuning constant. The default is 7, meaning seven times the median absolute
deviation (MAD) from the median residual; see Methods and formulas. Lower tuning constants
downweight outliers rapidly but may lead to unstable estimates (less than 6 is not recommended).
Higher tuning constants produce milder downweighting.
 
Reporting
level(#); see [R] estimation options.
genwt(newvar)creates the new variable newvar containing the weights assigned to each observation.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Optimization
optimization options: iterate(#), tolerance(#), nolog. iterate() specifies the maximum number of iterations; iterations stop when the maximum change in weights drops below tolerance(); and log/nolog specifies whether to show the iteration log. These options are seldom used.
graph allows you to graphically watch the convergence of the iterative technique. The weights
obtained from the most recent round of estimation are graphed against the weights obtained from
the previous round.
The following option is available with rreg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
rreg first performs an initial screening based on Cook's distance > 1 to eliminate gross outliers
before calculating starting values and then performs Huber iterations followed by biweight iterations,
as suggested by Li (1985).
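For orientation, the standard Huber and Tukey biweight weight functions for a scaled residual $u$ and tuning constant $c$ take the form (a generic sketch; the exact scaling rreg applies to the residuals is given in Methods and formulas):

$$w_{\text{Huber}}(u) = \begin{cases} 1 & |u| \le c \\ c/|u| & |u| > c \end{cases} \qquad w_{\text{biweight}}(u) = \begin{cases} \bigl\{1-(u/c)^2\bigr\}^2 & |u| \le c \\ 0 & |u| > c \end{cases}$$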
Example 1
We wish to examine the relationship between mileage rating, weight, and location of manufacture
for the 74 cars in our automobile data. As a point of comparison, we begin by fitting an ordinary
regression:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight foreign
Source SS df MS Number of obs = 74
F( 2, 71) = 69.75
Model 1619.2877 2 809.643849 Prob > F = 0.0000
Residual 824.171761 71 11.608053 R-squared = 0.6627
Adj R-squared = 0.6532
Total 2443.45946 73 33.4720474 Root MSE = 3.4071
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0065879 .0006371 -10.34 0.000 -.0078583 -.0053175
foreign -1.650029 1.075994 -1.53 0.130 -3.7955 .4954422
_cons 41.6797 2.165547 19.25 0.000 37.36172 45.99768
We now compare this with the results from rreg:
. rreg mpg weight foreign
Huber iteration 1: maximum difference in weights = .80280176
Huber iteration 2: maximum difference in weights = .2915438
Huber iteration 3: maximum difference in weights = .08911171
Huber iteration 4: maximum difference in weights = .02697328
Biweight iteration 5: maximum difference in weights = .29186818
Biweight iteration 6: maximum difference in weights = .11988101
Biweight iteration 7: maximum difference in weights = .03315872
Biweight iteration 8: maximum difference in weights = .00721325
Robust regression Number of obs = 74
F( 2, 71) = 168.32
Prob > F = 0.0000
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0063976 .0003718 -17.21 0.000 -.007139 -.0056562
foreign -3.182639 .627964 -5.07 0.000 -4.434763 -1.930514
_cons 40.64022 1.263841 32.16 0.000 38.1202 43.16025
Note the large change in the foreign coefficient.
Technical note
It would have been better if we had fit the previous robust regression by typing rreg mpg weight
foreign, genwt(w). The new variable, w, would then contain the estimated weights. Let’s pretend
that we did this:
. rreg mpg weight foreign, genwt(w)
(output omitted )
. summarize w, detail
Robust Regression Weight
Percentiles Smallest
1% 0 0
5% .0442957 0
10% .4674935 0 Obs 74
25% .8894815 .0442957 Sum of Wgt. 74
50% .9690193 Mean .8509966
Largest Std. Dev. .2746451
75% .9949395 .9996715
90% .9989245 .9996953 Variance .0754299
95% .9996715 .9997343 Skewness -2.287952
99% .9998585 .9998585 Kurtosis 6.874605
We discover that 3 observations in our data were dropped altogether (they have weight 0). We could
further explore our data:
. sort w
. list make mpg weight w if w <.467, sep(0)
make mpg weight w
1. VW Diesel 41 2,040 0
2. Subaru 35 2,050 0
3. Datsun 210 35 2,020 0
4. Plym. Arrow 28 3,260 .04429567
5. Cad. Seville 21 4,290 .08241943
6. Toyota Corolla 31 2,200 .10443129
7. Olds 98 21 4,060 .28141296
Being familiar with the automobile data, we immediately spotted two things: the VW is the only
diesel car in our data, and the weight recorded for the Plymouth Arrow is incorrect.
Example 2
If we specify no explanatory variables, rreg produces a robust estimate of the mean:
. rreg mpg
Huber iteration 1: maximum difference in weights = .64471879
Huber iteration 2: maximum difference in weights = .05098336
Huber iteration 3: maximum difference in weights = .0099887
Biweight iteration 4: maximum difference in weights = .25197391
Biweight iteration 5: maximum difference in weights = .00358606
Robust regression Number of obs = 74
F( 0, 73) = 0.00
Prob > F = .
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
_cons 20.68825 .641813 32.23 0.000 19.40912 21.96738
The estimate is given by the coefficient on _cons. The mean is 20.69 with an estimated standard
error of 0.6418. The 95% confidence interval is [19.4, 22.0]. By comparison, ci (see [R] ci) gives
us the standard calculation:
. ci mpg
Variable Obs Mean Std. Err. [95% Conf. Interval]
mpg 74 21.2973 .6725511 19.9569 22.63769
Stored results
rreg stores the following in e():
Scalars
  e(N)             number of observations
  e(mss)           model sum of squares
  e(df_m)          model degrees of freedom
  e(rss)           residual sum of squares
  e(df_r)          residual degrees of freedom
  e(r2)            R-squared
  e(r2_a)          adjusted R-squared
  e(F)             F statistic
  e(rmse)          root mean squared error
  e(rank)          rank of e(V)
Macros
  e(cmd)           rreg
  e(cmdline)       command as typed
  e(depvar)        name of dependent variable
  e(genwt)         variable containing the weights
  e(title)         title in estimation output
  e(model)         ols
  e(vce)           ols
  e(properties)    b V
  e(predict)       program used to implement predict
  e(marginsok)     predictions allowed by margins
  e(asbalanced)    factor variables fvset as asbalanced
  e(asobserved)    factor variables fvset as asobserved
Matrices
  e(b)             coefficient vector
  e(V)             variance–covariance matrix of the estimators
Functions
  e(sample)        marks estimation sample
Methods and formulas
See Berk (1990), Goodall (1983), and Rousseeuw and Leroy (1987) for a general description of
the issues and methods. Hamilton (1991a, 1992) provides a more detailed description of rreg and
some Monte Carlo evaluations.
rreg begins by fitting the regression (see [R] regress), calculating Cook's D (see [R] predict and
[R] regress postestimation), and excluding any observation for which D > 1.
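This screening step can be illustrated by hand from an ordinary regression fit; the sketch below uses example 1's variables, and the variable name d is our own:
. quietly regress mpg weight foreign
. predict double d, cooksd
. count if d > 1 & !missing(d)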
Thereafter rreg works iteratively: it performs a regression, calculates case weights from absolute
residuals, and regresses again using those weights. Iterations stop when the maximum change in
weights drops below tolerance(). Weights derive from one of two weight functions, Huber weights
and biweights. Huber weights (Huber 1964) are used until convergence, and then, from that result,
biweights are used until convergence. The biweight was proposed by Beaton and Tukey (1974, 151–
152) after the Princeton robustness study (Andrews et al. 1972) had compared various estimators.
Both weighting functions are used because Huber weights have problems dealing with severe outliers,
whereas biweights sometimes fail to converge or have multiple solutions. The initial Huber weighting
should improve the behavior of the biweight estimator.
In Huber weighting, cases with small residuals receive weights of 1; cases with larger residuals
receive gradually smaller weights. Let e_i = y_i − x_i b represent the ith-case residual. The ith
scaled residual u_i = e_i/s is calculated, where s = M/0.6745 is the residual scale estimate and
M = med(|e_i − med(e_i)|) is the median absolute deviation from the median residual. Huber estimation
obtains case weights

$$
w_i =
\begin{cases}
1 & \text{if } |u_i| \le c_h \\
c_h/|u_i| & \text{otherwise}
\end{cases}
$$

rreg defines c_h = 1.345, so downweighting begins with cases whose absolute residual exceeds
(1.345/0.6745)M ≈ 2M.
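As a rough illustration of a single Huber weighting step, the scaled residuals and weights can be computed from OLS residuals as follows. This is only a sketch using example 1's variables and our own variable names (e, absdev, u, whuber); rreg itself applies the Cook's distance screen first and then iterates, so these weights will not match rreg's final weights:
. quietly regress mpg weight foreign
. predict double e, residuals
. quietly summarize e, detail
. generate double absdev = abs(e - r(p50))
. quietly summarize absdev, detail
. generate double u = e/(r(p50)/0.6745)
. generate double whuber = cond(abs(u) <= 1.345, 1, 1.345/abs(u))
The first summarize leaves the median residual in r(p50); the second leaves M, the median absolute deviation, so u holds the scaled residuals and whuber the first-round Huber weights.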
With biweights, all cases with nonzero residuals receive some downweighting, according to the
smoothly decreasing biweight function

$$
w_i =
\begin{cases}
\{1 - (u_i/c_b)^2\}^2 & \text{if } |u_i| \le c_b \\
0 & \text{otherwise}
\end{cases}
$$

where c_b = 4.685 × tune()/7. Thus when tune() = 7, cases with absolute residuals of
(4.685/0.6745)M ≈ 7M or more are assigned 0 weight and thus are effectively dropped.
Goodall (1983, 377) suggests using a value between 6 and 9, inclusive, for tune() in the
biweight case and states that performance is good between 6 and 12, inclusive.
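Continuing the sketch above, the corresponding biweight for the default tune(7), so that c_b = 4.685, could be computed as
. generate double wbi = cond(abs(u) <= 4.685, (1 - (u/4.685)^2)^2, 0)
again only for illustration, because rreg rescales the residuals at every iteration.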
The tuning constants c_h = 1.345 and c_b = 4.685 (assuming tune() is set at the default 7)
give rreg about 95% of the efficiency of OLS when applied to data with normally distributed errors
(Hamilton 1991b). Lower tuning constants downweight outliers more drastically (but give up Gaussian
efficiency); higher tuning constants make the estimator more like OLS.
Standard errors are calculated using the pseudovalues approach described in Street, Carroll, and
Ruppert (1988).
Acknowledgment
The current version of rreg is due to the work of Lawrence Hamilton of the Department of
Sociology at the University of New Hampshire.
References
Andrews, D. F., P. J. Bickel, F. R. Hampel, P. J. Huber, W. H. Rogers, and J. W. Tukey. 1972. Robust Estimates of
Location: Survey and Advances. Princeton: Princeton University Press.
Beaton, A. E., and J. W. Tukey. 1974. The fitting of power series, meaning polynomials, illustrated on band-spectroscopic
data. Technometrics 16: 147–185.
Berk, R. A. 1990. A primer on robust regression. In Modern Methods of Data Analysis, ed. J. Fox and J. S. Long,
292–324. Newbury Park, CA: Sage.
Goodall, C. 1983. M-estimators of location: An outline of the theory. In Understanding Robust and Exploratory Data
Analysis, ed. D. C. Hoaglin, C. F. Mosteller, and J. W. Tukey, 339–431. New York: Wiley.
Gould, W. W., and W. H. Rogers. 1994. Quantile regression as an alternative to robust regression. In 1994 Proceedings
of the Statistical Computing Section. Alexandria, VA: American Statistical Association.
Hamilton, L. C. 1991a. srd1: How robust is robust regression? Stata Technical Bulletin 2: 21–26. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, pp. 169–175. College Station, TX: Stata Press.
———. 1991b. ssi2: Bootstrap programming. Stata Technical Bulletin 4: 18–27. Reprinted in Stata Technical Bulletin
Reprints, vol. 1, pp. 208–220. College Station, TX: Stata Press.
———. 1992. Regression with Graphics: A Second Course in Applied Statistics. Belmont, CA: Duxbury.
———. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Huber, P. J. 1964. Robust estimation of a location parameter. Annals of Mathematical Statistics 35: 73–101.
Li, G. 1985. Robust regression. In Exploring Data Tables, Trends, and Shapes, ed. D. C. Hoaglin, C. F. Mosteller,
and J. W. Tukey, 281–340. New York: Wiley.
Mosteller, C. F., and J. W. Tukey. 1977. Data Analysis and Regression: A Second Course in Statistics. Reading, MA:
Addison–Wesley.
Relles, D. A., and W. H. Rogers. 1977. Statisticians are fairly robust estimators of location. Journal of the American
Statistical Association 72: 107–111.
Rousseeuw, P. J., and A. M. Leroy. 1987. Robust Regression and Outlier Detection. New York: Wiley.
Street, J. O., R. J. Carroll, and D. Ruppert. 1988. A note on computing robust regression estimates via iteratively
reweighted least squares. American Statistician 42: 152–154.
Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata Journal 9: 439–453.
Also see
[R] rreg postestimation    Postestimation tools for rreg
[R] qreg    Quantile regression
[R] regress    Linear regression
[MI] estimation    Estimation commands for use with mi estimate
[U] 20 Estimation and postestimation commands
Title
rreg postestimation — Postestimation tools for rreg
Description Syntax for predict Menu for predict Options for predict Also see
Description
The following postestimation commands are available after rreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast (1)        dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic]
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
residuals residuals
hat diagonal elements of the hat matrix
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
residuals calculates the residuals.
hat calculates the diagonal elements of the hat matrix. You must have run the rreg command with
the genwt() option.
Also see
[R] rreg    Robust regression
[U] 20 Estimation and postestimation commands
Title
runtest — Test for random order
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References
Syntax
runtest varname [in] [, options]
options Description
continuity continuity correction
drop ignore values equal to the threshold
split randomly split values equal to the threshold as above or below the
threshold; default is to count as below
mean use mean as threshold; default is median
threshold(#)   assign arbitrary threshold; default is median
Menu
Statistics > Nonparametric analysis > Tests of hypotheses > Test for random order
Description
runtest tests whether the observations of varname are serially independent—that is, whether
they occur in a random order—by counting how many runs there are above and below a threshold.
By default, the median is used as the threshold. A small number of runs indicates positive serial
correlation; a large number indicates negative serial correlation.
Options
continuity specifies a continuity correction that may be helpful in small samples. If there are
fewer than 10 observations either above or below the threshold, however, the tables in Swed and
Eisenhart (1943) provide more reliable critical values. By default, no continuity correction is used.
drop directs runtest to ignore any values of varname that are equal to the threshold value when
counting runs and tabulating observations. By default, runtest counts a value as being above the
threshold when it is strictly above the threshold and as being below the threshold when it is less
than or equal to the threshold.
split directs runtest to randomly split values of varname that are equal to the threshold. In other
words, when varname is equal to threshold, a “coin” is flipped. If it comes up heads, the value is
counted as above the threshold. If it comes up tails, the value is counted as below the threshold.
mean directs runtest to tabulate runs above and below the mean rather than the median.
threshold(#)specifies an arbitrary threshold to use in counting runs. For example, if varname has
already been coded as a 0/1 variable, the median generally will not be a meaningful separating
value.
Remarks and examples
runtest performs a nonparametric test of the hypothesis that the observations of varname occur
in a random order by counting how many runs there are above and below a threshold. If varname is
positively serially correlated, it will tend to remain above or below its median for several observations
in a row; that is, there will be relatively few runs. If, on the other hand, varname is negatively serially
correlated, observations above the median will tend to be followed by observations below the median
and vice versa; that is, there will be relatively many runs.
By default, runtest uses the median for the threshold, and this is not necessarily the best choice.
If mean is specified, the mean is used instead of the median. If threshold(#) is specified, # is used.
Because runtest divides the data into two states—above and below the threshold—it is appropriate
for data that are already binary; for example, win or lose, live or die, rich or poor, etc. Such variables
are often coded as 0 for one state and 1 for the other. Here you should specify threshold(0)
because, by default, runtest separates the observations into those that are greater than the threshold
and those that are less than or equal to the threshold.
As with most nonparametric procedures, the treatment of ties complicates the test. Observations
equal to the threshold value are ties and can be treated in one of three ways. By default, they are
treated as if they were below the threshold. If drop is specified, they are omitted from the calculation
and the total number of observations is adjusted. If split is specified, each is randomly assigned to
the above- and below-threshold groups. The random assignment is different each time the procedure
is run unless you specify the random-number seed; see [R] set seed.
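For reproducible results with split, set the seed before calling runtest; for instance (x is a hypothetical variable here):
. set seed 339487731
. runtest x, split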
Example 1
We can use runtest to check regression residuals for serial correlation.
. use http://www.stata-press.com/data/r13/run1
. scatter resid year, connect(l) yline(0) title(Regression residuals)
(graph omitted: residuals plotted against year, 1975–1990, with a horizontal line at zero, titled "Regression residuals")
The graph gives the impression that these residuals are positively correlated. Excursions above or
below zero—the natural threshold for regression residuals—tend to last for several observations.
runtest can evaluate the statistical significance of this impression.
. runtest resid, thresh(0)
N(resid <= 0) = 8
N(resid > 0) = 8
obs = 16
N(runs) = 5
z = -2.07
Prob>|z| = .04
There are five runs in these 16 observations. Using the normal approximation to the true distribution of
the number of runs, the five runs in this series are fewer than would be expected if the residuals were
serially independent. The p-value is 0.04, indicating a two-sided significant result at the 5% level. If the
alternative hypothesis is positive serial correlation, rather than any deviation from randomness, then the
one-sided p-value is 0.04/2 = 0.02. With so few observations, however, the normal approximation
may be inaccurate. (Tables compiled by Swed and Eisenhart list five runs as the 5% critical value
for a one-sided test.)
runtest is a nonparametric test. It ignores the magnitudes of the observations and notes only
whether the values are above or below the threshold. We can demonstrate this feature by reducing
the information about the regression residuals in this example to a 0/1 variable that indicates only
whether a residual is positive or negative.
. generate byte sign = resid>0
. runtest sign, thresh(0)
N(sign <= 0) = 8
N(sign > 0) = 8
obs = 16
N(runs) = 5
z = -2.07
Prob>|z| = .04
As expected, runtest produces the same answer as before.
Technical note
The run test can also be used to test the null hypothesis that two samples are drawn from the same
underlying distribution. The run test is sensitive to differences in the shapes, as well as the locations,
of the empirical distributions.
Suppose, for example, that two different additives are added to the oil in 10 different cars during
an oil change. The cars are run until a viscosity test determines that another oil change is needed,
and the number of miles traveled between oil changes is recorded. The data are
. use http://www.stata-press.com/data/r13/additive, clear
. list
additive miles
1. 1 4024
2. 1 4756
3. 1 7993
4. 1 5025
5. 1 4188
6. 2 3007
7. 2 1988
8. 2 1051
9. 2 4478
10. 2 4232
To test whether the additives generate different distributions of miles between oil changes, we sort the
data by miles and then use runtest to see whether the marker for each additive occurs in random
order:
. sort miles
. runtest additive, thresh(1)
N(additive <= 1) = 5
N(additive > 1) = 5
obs = 10
N(runs) = 4
z = -1.34
Prob>|z| = .18
Here the additives do not produce statistically different results.
Technical note
A test that is related to the run test is the runs up-and-down test. In the latter test, the data
are classified not by whether they lie above or below a threshold but by whether they are steadily
increasing or decreasing. Thus an unbroken string of increases in the variable of interest is counted
as one run, as is an unbroken string of decreases. According to Madansky (1988), the run test is
superior to the runs up-and-down test for detecting trends in the data, but the runs up-and-down test
is superior for detecting autocorrelation.
runtest can be used to perform a runs up-and-down test. Using the regression residuals from
the example above, we can perform a runtest on their first differences:
. use http://www.stata-press.com/data/r13/run1
. generate resid_D = resid - resid[_n-1]
(1 missing value generated)
. runtest resid_D, thresh(0)
N(resid_D <= 0) = 7
N(resid_D > 0) = 8
obs = 15
N(runs) = 6
z = -1.33
Prob>|z| = .18
Edgington (1961) has compiled a table of the small-sample distribution of the runs up-and-down
statistic, and this table is reprinted in Madansky (1988). For large samples, the z statistic reported by
runtest is incorrect for the runs up-and-down test. Let N be the number of observations (15 here),
and let r be the number of runs (6). The expected number of runs in the runs up-and-down test is

$$\mu_r = \frac{2N - 1}{3}$$

the variance is

$$\sigma_r^2 = \frac{16N - 29}{90}$$

and the correct z statistic is

$$\hat{z} = \frac{r - \mu_r}{\sigma_r}$$
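Immediately after the runtest call above, the corrected statistic can be computed from the stored results (see Stored results below); the scalar names are our own:
. scalar nobs = r(N)
. scalar runs = r(n_runs)
. scalar mu   = (2*nobs - 1)/3
. scalar sig2 = (16*nobs - 29)/90
. display "corrected z = " (runs - mu)/sqrt(sig2)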
Technical note
runtest will tolerate missing values at the beginning or end of a series, as occurred in the
technical note above (generating first differences resulted in a missing value for the first observation).
runtest, however, will issue an error message if there are any missing observations in the interior
of the series (in the portion covered by the in range modifier). To perform the test anyway, simply
drop the missing observations before using runtest.
Stored results
runtest stores the following in r():
Scalars
  r(N)          number of observations         r(p)          p-value of z
  r(N_below)    number below the threshold     r(z)          z statistic
  r(N_above)    number above the threshold     r(n_runs)     number of runs
  r(mean)       expected number of runs        r(Var)        variance of the number of runs
Methods and formulas
runtest begins by calculating the number of observations below the threshold, n_0; the number
of observations above the threshold, n_1; the total number of observations, N = n_0 + n_1; and the
number of runs, r. These statistics are always reported, so the exact tables of critical values in Swed
and Eisenhart (1943) may be consulted if necessary.
The expected number of runs under the null is

$$\mu_r = \frac{2 n_0 n_1}{N} + 1$$

the variance is

$$\sigma_r^2 = \frac{2 n_0 n_1 (2 n_0 n_1 - N)}{N^2 (N - 1)}$$

and the normal approximation test statistic is

$$\hat{z} = \frac{r - \mu_r}{\sigma_r}$$
Acknowledgment
runtest was written by Sean Becketti, a past editor of the Stata Technical Bulletin and author
of the Stata Press book Introduction to Time Series Using Stata.
References
Edgington, E. S. 1961. Probability table for number of runs of signs of first differences in ordered series. Journal of
the American Statistical Association 56: 156–159.
Madansky, A. 1988. Prescriptions for Working Statisticians. New York: Springer.
Swed, F. S., and C. Eisenhart. 1943. Tables for testing randomness of grouping in a sequence of alternatives. Annals
of Mathematical Statistics 14: 66–87.
Title
scobit — Skewed logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
scobit depvar [indepvars] [if] [in] [weight] [, options]
options                    Description
Model
  noconstant               suppress constant term
  offset(varname)          include varname in model with coefficient constrained to 1
  asis                     retain perfect predictor variables
  constraints(constraints) apply specified linear constraints
  collinear                keep collinear variables
SE/Robust
  vce(vcetype)             vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                             or jackknife
Reporting
  level(#)                 set confidence level; default is level(95)
  or                       report odds ratios
  nocnsreport              do not display constraints
  display_options          control column formats, row spacing, line width, display of omitted
                             variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options         control the maximization process
  coeflegend               display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, nestreg, rolling, statsby, stepwise, and svy are allowed; see [U] 11.1.10 Prefix
commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Binary outcomes > Skewed logit regression
Description
scobit fits a maximum-likelihood skewed logit model.
See [R] logistic for a list of related estimation commands.
Options
 
Model
noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation
options.
asis forces retention of perfect predictor variables and their associated perfectly predicted observations
and may produce instabilities in maximization; see [R] probit.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
or reports the estimated coefficients transformed to odds ratios, that is, e^b rather than b. Standard errors
and confidence intervals are similarly transformed. This option affects how results are displayed,
not how they are estimated. or may be specified at estimation or when replaying previously
estimated results.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with scobit but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Skewed logistic model
Robust standard errors
Skewed logistic model
scobit fits maximum likelihood models with dichotomous dependent variables coded as 0/1 (or,
more precisely, coded as 0 and not 0).
Example 1
We have data on the make, weight, and mileage rating of 22 foreign and 52 domestic automobiles.
We wish to fit a model explaining whether a car is foreign based on its mileage. Here is an overview
of our data:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. keep make mpg weight foreign
. describe
Contains data from http://www.stata-press.com/data/r13/auto.dta
obs: 74 1978 Automobile Data
vars: 4 13 Apr 2013 17:45
size: 1,702 (_dta has notes)
storage display value
variable name type format label variable label
make str18 %-18s Make and Model
mpg int %8.0g Mileage (mpg)
weight int %8.0gc Weight (lbs.)
foreign byte %8.0g origin Car type
Sorted by: foreign
Note: dataset has changed since last saved
. inspect foreign
foreign: Car type Number of Observations
Total Integers Nonintegers
# Negative - - -
# Zero 52 52 -
# Positive 22 22 -
#
# # Total 74 74 -
# # Missing -
0 1 74
(2 unique values)
foreign is labeled and all values are documented in the label.
The variable foreign takes on two unique values, 0 and 1. The value 0 denotes a domestic car,
and 1 denotes a foreign car.
The model that we wish to fit is

$$\Pr(\texttt{foreign} = 1) = F(\beta_0 + \beta_1\,\texttt{mpg})$$

where F(z) = 1 − 1/{1 + exp(z)}^α.
To fit this model, we type
. scobit foreign mpg
Fitting logistic model:
Iteration 0: log likelihood = -45.03321
Iteration 1: log likelihood = -39.380959
Iteration 2: log likelihood = -39.288802
Iteration 3: log likelihood = -39.28864
Iteration 4: log likelihood = -39.28864
Fitting full model:
Iteration 0: log likelihood = -39.28864
Iteration 1: log likelihood = -39.286393
Iteration 2: log likelihood = -39.284415
Iteration 3: log likelihood = -39.284234
Iteration 4: log likelihood = -39.284197
Iteration 5: log likelihood = -39.284196
Skewed logistic regression Number of obs = 74
Zero outcomes = 52
Log likelihood = -39.2842 Nonzero outcomes = 22
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
mpg .1813879 .2407362 0.75 0.451 -.2904463 .6532222
_cons -4.274883 1.399305 -3.06 0.002 -7.017471 -1.532295
/lnalpha -.4450405 3.879885 -0.11 0.909 -8.049476 7.159395
alpha .6407983 2.486224 .0003193 1286.133
Likelihood-ratio test of alpha=1: chi2(1) = 0.01 Prob > chi2 = 0.9249
Note: likelihood-ratio tests are recommended for inference with scobit models.
We find that cars yielding better gas mileage are less likely to be foreign. The likelihood-ratio test
at the bottom of the output indicates that the model is not significantly different from a logit model.
Therefore, we should use the more parsimonious model.
Technical note
Stata interprets a value of 0 as a negative outcome (failure) and treats all other values (except
missing) as positive outcomes (successes). Thus if the dependent variable takes on the values 0 and
1, then 0 is interpreted as failure and 1 as success. If the dependent variable takes on the values 0,
1, and 2, then 0 is still interpreted as failure, but both 1 and 2 are treated as successes.
Formally, when we type scobit y x, Stata fits the model

$$\Pr(y_j \neq 0 \mid \mathbf{x}_j) = 1 - 1\big/\{1 + \exp(\mathbf{x}_j\boldsymbol{\beta})\}^{\alpha}$$
Robust standard errors
If you specify the vce(robust) option, scobit reports robust standard errors as described in
[U] 20.21 Obtaining robust variance estimates. For the model of foreign on mpg, the robust
calculation increases the standard error of the coefficient on mpg by around 25%:
. scobit foreign mpg, vce(robust) nolog
Skewed logistic regression Number of obs = 74
Zero outcomes = 52
Log pseudolikelihood = -39.2842 Nonzero outcomes = 22
Robust
foreign Coef. Std. Err. z P>|z| [95% Conf. Interval]
mpg .1813879 .3028487 0.60 0.549 -.4121847 .7749606
_cons -4.274883 1.335521 -3.20 0.001 -6.892455 -1.657311
/lnalpha -.4450405 4.71561 -0.09 0.925 -9.687466 8.797385
alpha .6407983 3.021755 .0000621 6616.919
Without vce(robust), the standard error for the coefficient on mpg was reported to be 0.241, with
a resulting confidence interval of [−0.29, 0.65].
Specifying the vce(cluster clustvar) option relaxes the independence assumption required by
the skewed logit estimator to being just independence between clusters. To demonstrate this, we will
switch to a different dataset.
Example 2
We are studying the unionization of women in the United States and have a dataset with 26,200
observations on 4,434 women between 1970 and 1988. For our purposes, we will use the variables
age (the women were 14–26 in 1968, and the data thus span the age range of 16–46), grade (years
of schooling completed, ranging from 0 to 18), not_smsa (28% of the person-time was spent living
outside an SMSA—standard metropolitan statistical area), south (41% of the person-time was in the
South), and year. Each of these variables is included in the regression as a covariate along with the
interaction between south and year. This interaction, along with the south and year variables, is
specified in the scobit command using factor-variables notation, south##c.year. We also have
variable union. Overall, 22% of the person-time is marked as time under union membership and
44% of these women have belonged to a union.
We fit the following model, ignoring that women are observed an average of 5.9 times each in
these data:
. use http://www.stata-press.com/data/r13/union, clear
(NLS Women 14-24 in 1968)
. scobit union age grade not_smsa south##c.year, nrtol(1e-3)
(output omitted )
Skewed logistic regression Number of obs = 26200
Zero outcomes = 20389
Log likelihood = -13540.61 Nonzero outcomes = 5811
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0185365 .0043615 4.25 0.000 .0099881 .0270849
grade .0452803 .0057124 7.93 0.000 .0340842 .0564764
not_smsa -.1886849 .0317802 -5.94 0.000 -.250973 -.1263968
1.south -1.422381 .3949298 -3.60 0.000 -2.196429 -.6483327
year -.0133017 .0049575 -2.68 0.007 -.0230182 -.0035853
south#c.year
1 .0105663 .0049233 2.15 0.032 .0009168 .0202158
_cons -10.19247 63.69015 -0.16 0.873 -135.0229 114.6379
/lnalpha 8.972796 63.68825 0.14 0.888 -115.8539 133.7995
alpha 7885.616 502221.1 4.85e-51 1.28e+58
Likelihood-ratio test of alpha=1: chi2(1) = 3.76 Prob > chi2 = 0.0524
Note: likelihood-ratio tests are recommended for inference with scobit models.
The reported standard errors in this model are probably meaningless. Women are observed repeatedly,
so the observations are not independent. Looking at the coefficients, we find a large southern effect
against unionization and a different time trend for the south. The vce(cluster clustvar) option
provides a way to fit this model and obtain correct standard errors:
. scobit union age grade not_smsa south##c.year, vce(cluster id) nrtol(1e-3)
(output omitted )
Skewed logistic regression Number of obs = 26200
Zero outcomes = 20389
Log pseudolikelihood = -13540.61 Nonzero outcomes = 5811
(Std. Err. adjusted for 4434 clusters in idcode)
Robust
union Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0185365 .0084867 2.18 0.029 .0019029 .0351701
grade .0452803 .0125764 3.60 0.000 .0206311 .0699296
not_smsa -.1886849 .0642035 -2.94 0.003 -.3145214 -.0628484
1.south -1.422381 .5064916 -2.81 0.005 -2.415086 -.4296756
year -.0133017 .0090621 -1.47 0.142 -.0310632 .0044597
south#c.year
1 .0105663 .0063172 1.67 0.094 -.0018152 .0229478
_cons -10.19247 .945772 -10.78 0.000 -12.04615 -8.33879
/lnalpha 8.972796 .7482517 11.99 0.000 7.506249 10.43934
alpha 7885.616 5900.426 1819.377 34178.16
scobit, vce(cluster clustvar) is robust to assumptions about within-cluster correlation. That
is, it inefficiently sums within cluster for the standard error calculation rather than attempting to exploit
what might be assumed about the within-cluster correlation (as do the xtgee population-averaged
models; see [XT] xtgee).
Technical note
The scobit model can be difficult to fit because of the functional form. Often it requires many
iterations, or the optimizer prints out warning and informative messages during the optimization. For
example, without the nrtol(1e-3) option, the model using the union dataset will not converge.
See [R] maximize for details about the optimizer.
Technical note
The main reason for using scobit rather than logit is that the effects of the regressors on the
probability of success are not constrained to be the largest when the probability is 0.5. Rather, the
independent variables might show their largest impact when the probability of success is 0.3 or 0.6.
This added flexibility results because the scobit function, unlike the logit function, can be skewed
and is not constrained to be mirror symmetric about the 0.5 probability of success.
As Nagler (1994) pointed out, the point of maximum impact is constrained under the scobit model
to fall within the interval (0, 1 − e^(−1)), or approximately (0, 0.63). Achen (2002) notes that if we
believe the maximum impact to be outside that range, we can instead estimate the “power logit”
model by simply reversing the 0s and 1s of our outcome variable and estimating a scobit model on
failure, rather than success. We would need to reverse the signs of the coefficients if we wanted to
interpret them in terms of impact on success, or we could leave them as they are and interpret them
in terms of impact on failure. The important thing to remember is that the scobit model, unlike the
logit model, is not invariant to the choice of which result is assigned to success.
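A minimal sketch of this reversal, using the automobile data of example 1 (the variable name domestic is our own):
. use http://www.stata-press.com/data/r13/auto, clear
. generate byte domestic = foreign == 0
. scobit domestic mpg
The coefficients from this fit are then interpreted in terms of the reversed outcome.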
Stored results
scobit stores the following in e():
Scalars
  e(N)              number of observations
  e(k)              number of parameters
  e(k_eq)           number of equations in e(b)
  e(k_aux)          number of auxiliary parameters
  e(k_dv)           number of dependent variables
  e(ll)             log likelihood
  e(ll_c)           log likelihood, comparison model
  e(N_f)            number of failures (zero outcomes)
  e(N_s)            number of successes (nonzero outcomes)
  e(alpha)          alpha
  e(N_clust)        number of clusters
  e(chi2)           χ2
  e(chi2_c)         χ2 for comparison test
  e(p)              significance
  e(rank)           rank of e(V)
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise
Macros
  e(cmd)            scobit
  e(cmdline)        command as typed
  e(depvar)         name of dependent variable
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output
  e(clustvar)       name of cluster variable
  e(offset)         linear offset variable
  e(chi2type)       Wald or LR; type of model χ2 test
  e(chi2_ct)        Wald or LR; type of model χ2 test corresponding to e(chi2_c)
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(opt)            type of optimization
  e(which)          max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)      type of ml method
  e(user)           name of likelihood-evaluator program
  e(technique)      maximization technique
  e(properties)     b V
  e(predict)        program used to implement predict
  e(footnote)       program used to implement the footnote display
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved
Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(V)              variance–covariance matrix of the estimators
  e(V_modelbased)   model-based variance
Functions
  e(sample)         marks estimation sample
Methods and formulas
Skewed logit analysis is an alternative to logit that relaxes the assumption that individuals with
initial probability of 0.5 are most sensitive to changes in independent variables.
The log-likelihood function for skewed logit is

$$\ln L = \sum_{j \in S} w_j \ln F(\mathbf{x}_j \mathbf{b}) + \sum_{j \notin S} w_j \ln\{1 - F(\mathbf{x}_j \mathbf{b})\}$$

where S is the set of all observations j such that y_j ≠ 0, F(z) = 1 − 1/{1 + exp(z)}^α, and w_j
denotes the optional weights. ln L is maximized as described in [R] maximize.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] _robust, particularly
Maximum likelihood estimators and Methods and formulas.
scobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Achen, C. H. 2002. Toward a new political methodology: Microfoundations and ART. Annual Review of Political
Science 5: 423–450.
Nagler, J. 1994. Scobit: An alternative estimator to logit and probit. American Journal of Political Science 38:
230–255.
Also see
[R] scobit postestimation    Postestimation tools for scobit
[R] cloglog    Complementary log-log regression
[R] glm    Generalized linear models
[R] logistic    Logistic regression, reporting odds ratios
[SVY] svy estimation    Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
scobit postestimation — Postestimation tools for scobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after scobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)        dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest (2)          likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvar_reg newvar_lnalpha} [if] [in], scores
statistic Description
Main
pr probability of a positive outcome; the default
xb        x_j b, linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of a positive outcome.
xb calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
nooffset is relevant only if you specified offset(varname) for scobit. It modifies the calculations
made by predict so that they ignore the offset variable; the linear prediction is treated as x_j b
rather than as x_j b + offset_j.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(x_j β).
The second new variable will contain ∂lnL/∂ lnα.
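For example, after fitting a scobit model, both score variables could be created with a stub (the stub name sc is our own):
. predict double sc*, scores
. describe sc*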
Remarks and examples
Once you have fit a model, you can obtain the predicted probabilities by using the predict
command for both the estimation sample and other samples; see [U] 20 Estimation and postestimation
commands and [R]predict. Here we will make only a few additional comments.
predict without arguments calculates the predicted probability of a positive outcome. With the
xb option, it calculates the linear combination x_j b, where x_j are the independent variables in the
jth observation and b is the estimated parameter vector.
With the stdp option, predict calculates the standard error of the prediction, which is not
adjusted for replicated covariate patterns in the data.
Example 1
In example 1 of [R] scobit, we fit the model scobit foreign mpg. To obtain predicted probabilities,
we type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. keep make mpg weight foreign
. scobit foreign mpg
(output omitted )
. predict p
(option pr assumed; Pr(foreign))
. summarize foreign p
Variable Obs Mean Std. Dev. Min Max
foreign 74 .2972973 .4601885 0 1
p 74 .2974049 .182352 .0714664 .871624
Also see
[R] scobit    Skewed logistic regression
[U] 20 Estimation and postestimation commands
Title
sdtest — Variance-comparison tests
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
One-sample variance-comparison test
    sdtest varname == # [if] [in] [, level(#)]
Two-sample variance-comparison test using groups
    sdtest varname [if] [in], by(groupvar) [level(#)]
Two-sample variance-comparison test using variables
    sdtest varname1 == varname2 [if] [in] [, level(#)]
Immediate form of one-sample variance-comparison test
    sdtesti #obs {#mean | .} #sd #val [, level(#)]
Immediate form of two-sample variance-comparison test
    sdtesti #obs,1 {#mean,1 | .} #sd,1 #obs,2 {#mean,2 | .} #sd,2 [, level(#)]
Robust tests for equality of variances
    robvar varname [if] [in], by(groupvar)
by is allowed with sdtest and robvar; see [D] by.
Menu
sdtest
    Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Variance-comparison test
sdtesti
    Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Variance-comparison test calculator
robvar
    Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Robust equal-variance test
Description
sdtest performs tests on the equality of standard deviations (variances). In the first form, sdtest
tests that the standard deviation of varname is #. In the second form, sdtest performs the same
test, using the standard deviations of the two groups defined by groupvar. In the third form, sdtest
tests that varname1 and varname2 have the same standard deviation.
sdtesti is the immediate form of sdtest; see [U] 19 Immediate commands.
Both the traditional F test for the homogeneity of variances and Bartlett's generalization of this
test to K samples are sensitive to the assumption that the data are drawn from an underlying Gaussian
distribution. See, for example, the cautionary results discussed by Markowski and Markowski (1990).
Levene (1960) proposed a test statistic for equality of variance that was found to be robust under
nonnormality. Then Brown and Forsythe (1974) proposed alternative formulations of Levene’s test
statistic that use more robust estimators of central tendency in place of the mean. These reformulations
were demonstrated to be more robust than Levene’s test when dealing with skewed populations.
robvar reports Levene’s robust test statistic (W0) for the equality of variances between the groups
defined by groupvar and the two statistics proposed by Brown and Forsythe that replace the mean in
Levene’s formula with alternative location estimators. The first alternative (W50) replaces the mean
with the median. The second alternative replaces the mean with the 10% trimmed mean (W10).
Options
level(#) specifies the confidence level, as a percentage, for confidence intervals of the means. The
default is level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence
intervals.
by(groupvar) specifies the groupvar that defines the groups to be compared. For sdtest, there
should be two groups, but for robvar there may be more than two groups. Do not confuse the
by() option with the by prefix; both may be specified.
Remarks and examples
Remarks are presented under the following headings:
Basic form
Immediate form
Robust test
Basic form
sdtest performs two different statistical tests: one testing equality of variances and the other
testing that the standard deviation is equal to a known constant. Which test it performs is determined
by whether you type a variable name or a number to the right of the equal sign.
Example 1: One-sample test of variance
We have a sample of 74 automobiles. For each automobile, we know the mileage rating. We wish
to test whether the overall standard deviation is 5 mpg:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sdtest mpg == 5
One-sample test of variance
Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
mpg 74 21.2973 .6725511 5.785503 19.9569 22.63769
sd = sd(mpg) c = chi2 = 97.7384
Ho: sd = 5 degrees of freedom = 73
Ha: sd < 5 Ha: sd != 5 Ha: sd > 5
Pr(C < c) = 0.9717 2*Pr(C > c) = 0.0565 Pr(C > c) = 0.0283
Example 2: Variance ratio test
We are testing the effectiveness of a new fuel additive. We run an experiment on 12 cars, running
each without and with the additive. The data can be found in [R] ttest. The results for each car are
stored in the variables mpg1 and mpg2:
. use http://www.stata-press.com/data/r13/fuel
. sdtest mpg1==mpg2
Variance ratio test
Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
mpg1 12 21 .7881701 2.730301 19.26525 22.73475
mpg2 12 22.75 .9384465 3.250874 20.68449 24.81551
combined 24 21.875 .6264476 3.068954 20.57909 23.17091
ratio = sd(mpg1) / sd(mpg2) f = 0.7054
Ho: ratio = 1 degrees of freedom = 11, 11
Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1
Pr(F < f) = 0.2862 2*Pr(F < f) = 0.5725 Pr(F > f) = 0.7138
We cannot reject the hypothesis that the standard deviations are the same.
In [R] ttest, we draw an important distinction between paired and unpaired data, which, in this
example, means whether there are 12 cars in a before-and-after experiment or 24 different cars. For
sdtest, on the other hand, there is no distinction. If the data had been unpaired and stored as
described in [R] ttest, we could have typed sdtest mpg, by(treated), and the results would have
been the same.
Immediate form
Example 3: sdtesti
Immediate commands are used not with data, but with reported summary statistics. For instance,
to test whether a variable on which we have 75 observations and a reported standard deviation of 6.5
comes from a population with underlying standard deviation 6, we would type
. sdtesti 75 . 6.5 6
One-sample test of variance
Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
x 75 . .7505553 6.5 . .
sd = sd(x) c = chi2 = 86.8472
Ho: sd = 6 degrees of freedom = 74
Ha: sd < 6 Ha: sd != 6 Ha: sd > 6
Pr(C < c) = 0.8542 2*Pr(C > c) = 0.2916 Pr(C > c) = 0.1458
The mean plays no role in the calculation, so it may be omitted.
To test whether the variable comes from a population with the same standard deviation as another
for which we have a calculated standard deviation of 7.5 over 65 observations, we would type
. sdtesti 75 . 6.5 65 . 7.5
Variance ratio test
Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
x 75 . .7505553 6.5 . .
y 65 . .9302605 7.5 . .
combined 140 . . . . .
ratio = sd(x) / sd(y) f = 0.7511
Ho: ratio = 1 degrees of freedom = 74, 64
Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1
Pr(F < f) = 0.1172 2*Pr(F < f) = 0.2344 Pr(F > f) = 0.8828
Robust test
Example 4: robvar
We wish to test whether the standard deviation of the length of stay for patients hospitalized for a
given medical procedure differs by gender. Our data consist of observations on the length of hospital
stay for 1778 patients: 884 males and 894 females. Length of stay, lengthstay, is highly skewed
(skewness coefficient = 4.912591) and thus violates Bartlett's normality assumption. Therefore, we
use robvar to compare the variances.
. use http://www.stata-press.com/data/r13/stay
. robvar lengthstay, by(sex)
Summary of Length of stay in days
sex Mean Std. Dev. Freq.
male 9.0874434 9.7884747 884
female 8.800671 9.1081478 894
Total 8.9432508 9.4509466 1778
W0 = 0.55505315 df(1, 1776) Pr > F = 0.45635888
W50 = 0.42714734 df(1, 1776) Pr > F = 0.51347664
W10 = 0.44577674 df(1, 1776) Pr > F = 0.50443411
For these data, we cannot reject the null hypothesis that the variances are equal. However, Bartlett’s
test yields a significance probability of 0.0319 because of the pronounced skewness of the data.
Technical note
robvar implements both the conventional Levene’s test centered at the mean and a median-centered
test. In a simulation study, Conover, Johnson, and Johnson (1981) compare the properties of the two
tests and recommend using the median test for asymmetric data, although for small sample sizes
the test is somewhat conservative. See Carroll and Schneider (1985) for an explanation of why both
mean- and median-centered tests have approximately the same level for symmetric distributions, but
for asymmetric distributions the median test is closer to the correct level.
Stored results
sdtest and sdtesti store the following in r():
Scalars
  r(N)        number of observations
  r(p_l)      lower one-sided p-value
  r(p_u)      upper one-sided p-value
  r(p)        two-sided p-value
  r(F)        F statistic
  r(sd)       standard deviation
  r(sd_1)     standard deviation for first variable
  r(sd_2)     standard deviation for second variable
  r(df)       degrees of freedom
  r(df_1)     numerator degrees of freedom
  r(df_2)     denominator degrees of freedom
  r(chi2)     χ2
robvar stores the following in r():
Scalars
  r(N)        number of observations
  r(w50)      Brown and Forsythe's F statistic (median)
  r(p_w50)    Brown and Forsythe's p-value
  r(w0)       Levene's F statistic
  r(p_w0)     Levene's p-value
  r(w10)      Brown and Forsythe's F statistic (trimmed mean)
  r(p_w10)    Brown and Forsythe's p-value (trimmed mean)
  r(df_1)     numerator degrees of freedom
  r(df_2)     denominator degrees of freedom
Methods and formulas
See Armitage et al. (2002, 149–153) or Bland (2000, 171–172) for an introduction and explanation
of the calculation of these tests.
The test for σ = σ_0 is given by

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

which is distributed as χ2 with n − 1 degrees of freedom.
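For instance, the statistic reported in example 1 can be reproduced directly from this formula:
. display (74 - 1)*5.785503^2/5^2
which matches, up to rounding, the value c = 97.7384 shown there.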
The test for σ_x^2 = σ_y^2 is given by

$$F = \frac{s_x^2}{s_y^2}$$

which is distributed as F with n_x − 1 and n_y − 1 degrees of freedom.
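Similarly, the variance ratio in example 2 can be reproduced from the two reported standard deviations:
. display 2.730301^2/3.250874^2
which agrees, up to rounding, with the f = 0.7054 reported there.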
Let X_ij be the jth observation of X for the ith group. Let Z_ij = |X_ij − X̄_i|, where X̄_i is the
mean of X in the ith group. Levene's test statistic is

$$W_0 = \frac{\sum_i n_i (\overline{Z}_i - \overline{Z})^2 / (g - 1)}{\sum_i \sum_j (Z_{ij} - \overline{Z}_i)^2 / \sum_i (n_i - 1)}$$

where n_i is the number of observations in group i and g is the number of groups. W50 is obtained
by replacing X̄_i with the ith group median of X_ij, whereas W10 is obtained by replacing X̄_i with
the 10% trimmed mean for group i.
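Because W0 is the one-way ANOVA F statistic computed on the absolute deviations Z_ij, it can be reproduced with oneway. The sketch below uses the hospital-stay data of example 4; the variable names xbar and z are our own:
. use http://www.stata-press.com/data/r13/stay, clear
. bysort sex: egen double xbar = mean(lengthstay)
. generate double z = abs(lengthstay - xbar)
. oneway z sex
The F statistic from oneway equals the W0 value reported by robvar; replacing the group mean with the group median or with the 10% trimmed mean gives W50 and W10.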
References
Armitage, P., G. Berry, and J. N. S. Matthews. 2002. Statistical Methods in Medical Research. 4th ed. Oxford:
Blackwell.
Bland, M. 2000. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press.
Brown, M. B., and A. B. Forsythe. 1974. Robust tests for the equality of variances. Journal of the American Statistical
Association 69: 364–367.
Carroll, R. J., and H. Schneider. 1985. A note on Levene’s tests for equality of variances. Statistics and Probability
Letters 3: 191–194.
Cleves, M. A. 1995. sg35: Robust tests for the equality of variances. Stata Technical Bulletin 25: 13–15. Reprinted
in Stata Technical Bulletin Reprints, vol. 5, pp. 91–93. College Station, TX: Stata Press.
———. 2000. sg35.2: Robust tests for the equality of variances update to Stata 6. Stata Technical Bulletin 53: 17–18.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 158–159. College Station, TX: Stata Press.
Conover, W. J., M. E. Johnson, and M. M. Johnson. 1981. A comparative study of tests for homogeneity of variances,
with applications to the outer continental shelf bidding data. Technometrics 23: 351–361.
Gastwirth, J. L., Y. R. Gel, and W. Miao. 2009. The impact of Levene’s test of equality of variances on statistical
theory and practice. Statistical Science 24: 343–360.
Levene, H. 1960. Robust tests for equality of variances. In Contributions to Probability and Statistics: Essays in Honor
of Harold Hotelling, ed. I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, and H. B. Mann, 278–292. Menlo
Park, CA: Stanford University Press.
Markowski, C. A., and E. P. Markowski. 1990. Conditions for the effectiveness of a preliminary test of variance.
American Statistician 44: 322–326.
Seed, P. T. 2000. sbe33: Comparing several methods of measuring the same quantity. Stata Technical Bulletin 55:
2–9. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 73–82. College Station, TX: Stata Press.
Tobías, A. 1998. gr28: A graphical procedure to test equality of variances. Stata Technical Bulletin 42: 4–6. Reprinted
in Stata Technical Bulletin Reprints, vol. 7, pp. 68–70. College Station, TX: Stata Press.
Also see
[R] ttest    t tests (mean-comparison tests)
Title
search — Search Stata documentation and other resources
Syntax Menu Description
Options for search Option for set searchdefault Remarks and examples
Acknowledgment Also see
Syntax
search word [word ...] [, search_options]
set searchdefault {all | local | net} [, permanently]
search_options    Description
all search across both the local keyword database and the net material; the default
local search using Stata’s keyword database
net search across materials available via Stata’s net command
author search by author’s name
entry search by entry ID
exact search across both the local keyword database and the net materials; prevents
matching on abbreviations
faq search the FAQs posted to the Stata website
historical search entries that are of historical interest only
or list an entry if any of the words typed after search are associated with the entry
manual search the entries in the Stata Documentation
sj search the entries in the Stata Journal and the STB
Menu
Help > Search...
Description
search searches a keyword database and the Internet for Stata materials related to your query.
Capitalization of the words following search is irrelevant, as is the inclusion or exclusion of
special characters such as commas and hyphens.
set searchdefault affects the default behavior of the search command. all is the default.
search, all is the best way to search for information on a topic across all sources, including the
system help, the FAQs at the Stata website, the Stata Journal, and all Stata-related Internet sources
including user-written additions. From the results, you can click to go to a source or to install additions.
Options for search
all, the default (unless changed by set searchdefault), specifies that the search be performed
across both the local keyword database and the net materials. The results of a search performed
with all and no other options will be displayed in the Viewer window.
local specifies that the search be performed using only Stata’s keyword database. The results of a
search performed with local and no other options will be displayed in the Viewer window.
net specifies that the search be performed across the materials available via Stata’s net command.
Using search word [word . . . ], net is equivalent to typing net search word [word . . . ]
(without options); see [R] net search. The results of a search performed with net and no other
options will be displayed in the Viewer window.
author specifies that the search be performed on the basis of author’s name rather than keywords.
A search with the author option is performed on the local keyword database only, and the results
are displayed in the Results window.
entry specifies that the search be performed on the basis of entry IDs rather than keywords. A
search with the entry option is performed on the local keyword database only, and the results
are displayed in the Results window.
exact prevents matching on abbreviations. A search with the exact option is performed across both
the local keyword database and the net materials, and the results are displayed in the Results
window.
faq limits the search to the FAQs posted on the Stata website: http://www.stata.com. A search with
the faq option is performed on the local keyword database only, and the results are displayed in
the Results window.
historical adds to the search entries that are of historical interest only. By default, such entries
are not listed. Past entries are classified as historical if they discuss a feature that later became an
official part of Stata. Updates to historical entries will always be found, even if historical is
not specified. A search with the historical option is performed on the local keyword database
only, and the results are displayed in the Results window.
or specifies that an entry be listed if any of the words typed after search are associated with the
entry. The default is to list the entry only if all the words specified are associated with the entry.
A search with the or option is performed on the local keyword database only, and the results are
displayed in the Results window.
manual limits the search to entries in the Stata Documentation; that is, the search is limited to the
User’s Guide and all the reference manuals. A search with the manual option is performed on the
local keyword database only, and the results are displayed in the Results window.
sj limits the search to entries in the Stata Journal and its predecessor, the Stata Technical Bulletin;
see [R] sj. A search with the sj option is performed on the local keyword database only, and the
results are displayed in the Results window.
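For instance (the keywords here are only illustrative), to restrict a search to the Stata Documentation or to the Stata Journal and STB, you might type
. search generalized linear models, manual
. search meta-analysis, sj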
Option for set searchdefault
permanently specifies that, in addition to making the change right now, the searchdefault setting
be remembered and become the default setting when you invoke Stata.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Internet searches
Author searches
Entry ID searches
Return codes
Introduction
See [U] 4 Stata’s help and search facilities for a tutorial introduction to search. search is one
of Stata’s most useful commands. To understand the advanced features of search, you need to know
how it works.
search has a database (a set of files) containing the titles, etc., of every entry in the User’s Guide,
the Base Reference Manual, the Data Management Reference Manual, the Graphics Reference
Manual, the Longitudinal-Data/Panel-Data Reference Manual, the Multilevel Mixed-Effects Reference
Manual, the Multiple-Imputation Reference Manual, the Multivariate Statistics Reference Manual,
the Power and Sample-Size Reference Manual, the Programming Reference Manual, the Structural
Equation Modeling Reference Manual, the Survey Data Reference Manual, the Survival Analysis and
Epidemiological Tables Reference Manual, the Treatment-Effects Reference Manual, the Time-Series
Reference Manual, the Mata Reference Manual, undocumented help files, NetCourses, Stata Press
books, FAQs posted on the Stata website, videos posted on the Stata YouTube channel, selected articles
on StataCorp’s official blog, selected user-written FAQs and examples, and the articles in the Stata
Journal and the Stata Technical Bulletin. In these files is a list of words, called keywords, associated
with each entry.
When you type search xyz, search reads the database and compares the list of keywords with
xyz. If it finds xyz in the list or a keyword that allows an abbreviation of xyz, it displays the entry.
When you type search xyz abc, search does the same thing but displays an entry only if it
contains both keywords. The order does not matter, so you can search linear regression or
search regression linear.
Obviously, how many entries search finds depends on how the search database was constructed.
We have included a plethora of keywords under the theory that, for a given request, it is better to
list too much rather than risk listing nothing at all. Still, you are in the position of guessing the
keywords. Do you look up normality test, normality tests, or tests of normality? Well, normality test
would be best, but all would work. In general, use the singular, and strike the unnecessary words.
For guidelines for specifying keywords, see [U] 4.6 More on search.
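For example, to run the lookup just discussed, you would type
. search normality test
(output omitted)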
set searchdefault allows you to specify where search searches. set searchdefault all,
the default, indicates that both the keyword database and the Internet are to be searched. set
searchdefault local restricts search to using only Stata’s keyword database. set searchdefault
net restricts search to searching only the Internet.
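For example, if you typically work without an Internet connection, you might type
. set searchdefault local, permanently
so that future searches use only the local keyword database.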
Internet searches
search with the net option searches the Internet for user-written additions to Stata, including,
but not limited to, user-written additions published in the Stata Journal (SJ) and the Stata Technical
Bulletin (STB). search keywords, net performs the same search as the command net search (with
no options); see [R] net search.
. search random effect, net
(contacting http://www.stata.com)
70 packages found (Stata Journal and STB listed first)
------------------------------------------------------
st0156_1 from http://www.stata-journal.com/software/sj11-2
SJ11-2 st0156_1. Update: Multivariate random-effects... / Update:
Multivariate random-effects meta-regression / by Ian White / Support:
ian.white@mrc-bsu.cam.ac.uk / After installation, type help mvmeta and
mvmeta_make
st0201 from http://www.stata-journal.com/software/sj10-3
SJ10-3 st0201. metaan: Random-effects meta-analysis / metaan:
Random-effects meta-analysis / by Evangelos Kontopantelis, / National
Primary Care Research and Development Centre (NPCRDC), / University of
Manchester, Manchester, UK / David Reeves, / National Primary Care
sbe24_3 from http://www.stata-journal.com/software/sj9-2
SJ9-2 sbe24_3. Update: metan: fixed- and random-effects... / Update:
metan: fixed- and random-effects meta-analysis / by Ross J. Harris, Roger
M. Harbord, and Jonathan A. C. Sterne, / Department of Social Medicine,
University of Bristol / Jonathan J. Deeks, Department of Primary Care
(output omitted )
(end of search)
Author searches
search ordinarily compares the words following search with the keywords for the entry. If you
specify the author option, however, it compares the words with the author’s name. In the search
database, we have filled in author names for all SJ and STB inserts.
For instance, in [R] kdensity in this manual you will discover that Isaías H. Salgado-Ugarte wrote
the first version of Stata’s kdensity command and published it in the STB. Assume that you read
his original insert and found the discussion useful. You might now wonder what else he has written
in the SJ or STB. To find out, you type
. search Salgado-Ugarte, author
(output omitted )
Names like Salgado-Ugarte are confusing to many people. search does not require you to specify
the entire name; what you type is compared with each “word” of the name and, if any part matches,
the entry is listed. The hyphen is a special character, and you can omit it. Thus you can obtain the
same list by looking up Salgado, Ugarte, or Salgado Ugarte without the hyphen.
Actually, to find all entries written by Salgado-Ugarte, you need to type
. search Salgado-Ugarte, author historical
(output omitted )
Prior inserts in the SJ or STB that provide a feature that later was superseded by a built-in feature of
Stata are marked as historical in the search database and, by default, are not listed. The historical
option ensures that all entries are listed.
Entry ID searches
If you specify the entry option, search compares what you have typed with the entry ID. The
entry ID is not the title; it is the reference listed to the left of the title that tells you where to look.
For instance, in
[R] regress . . . . . . . . . . . . . . . . . . . . . . Linear regression
(help regress)
[R] regress is the entry ID. This is a reference, of course, to this manual. In
FAQ . . . . . . . . . . . Analysis of multiple failure-time survival data
. . . . . . . . . . . . . . . . . . . . . . . M. Cleves and I. Canette
07/09 How do I analyze multiple failure-time data using Stata?
http://www.stata.com/support/faqs/statistics/multiple-failure-
type-data/
“FAQ” is the entry ID. In
SJ-7-1 st0118 . . A survey on survey stat.: What is and can be done in Stata
. . . . . . . . . . . . . . . . . . . . . . F. Kreuter and R. Valliant
Q1/07 SJ7(1):1--21 (no commands)
discusses survey issues in analyzing complex survey
data and describes some of Stata’s capabilities for
such analyses
“SJ-7-1” is the entry ID.
search with the entry option searches these entry IDs.
Thus you could generate a table of contents for the User’s Guide by typing
. search [U], entry
(output omitted )
You could generate a table of contents for Stata Journal, Volume 1, Issue 1, by typing
. search sj-1-1, entry
(output omitted )
To generate a table of contents for the 26th issue of the STB, you would type
. search STB-26, entry historical
(output omitted )
The historical option here is possibly important. STB-26 was published in July 1995, and perhaps
some of its inserts have already been marked historical.
You could obtain a list of all inserts associated with sg53 by typing
. search sg53, entry historical
(output omitted )
Again we include the historical option in case any of the relevant inserts have been marked
historical.
Return codes
In addition to indexing the entries in the User’s Guide and all the Reference manuals, search
also can be used to search return codes.
To see information on return code 131, type
. search rc 131
[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 131
not possible with test;
You requested a test of a hypothesis that is nonlinear in the
variables. test tests only linear hypotheses. Use testnl.
If you want a list of all Stata return codes, type
. search error, entry
(output omitted )
Acknowledgment
We thank Nicholas J. Cox of the Department of Geography at Durham University, UK, and coeditor
of the Stata Journal for his contributions to the search command.
Also see
[R] help — Display help in Stata
[R] net search — Search the Internet for installable packages
[U] 4 Stata’s help and search facilities
Title
serrbar — Graph standard error bar chart
Syntax Menu Description Options
Remarks and examples Acknowledgment Also see
Syntax
serrbar mvar svar xvar [if] [in] [, options]
options Description
Main
scale(#)  scale length of graph bars; default is scale(1)
Error bars
rcap options  affect rendition of capped spikes
Plotted points
mvopts(scatter options)  affect rendition of plotted points
Add plots
addplot(plot)  add other plots to generated graph
Y axis, X axis, Titles, Legend, Overall
twoway options  any options other than by() documented in [G-3] twoway options
Menu
Statistics > Other > Quality control > Standard error bar chart
Description
serrbar graphs mvar ± scale() × svar against xvar. Usually, but not necessarily, mvar and svar
will contain means and standard errors or standard deviations of some variable so that a standard
error bar chart is produced.
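A typical workflow, sketched here with a hypothetical dataset and variable names, is to collapse raw measurements to means and standard errors and then plot them:
. use mymeasurements, clear
. collapse (mean) mean=weight (semean) se=weight, by(date)
. serrbar mean se date, scale(2)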
Options
 
Main
scale(#) controls the length of the bars. The upper and lower limits of the bars will be mvar +
scale() × svar and mvar − scale() × svar. The default is scale(1).
 
Error bars
rcap options affect the rendition of the plotted error bars (the capped spikes). See [G-2] graph twoway
rcap.
 
Plotted points
mvopts(scatter options) affects the rendition of the plotted points (mvar versus xvar). See [G-2] graph
twoway scatter.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph; see [G-3] addplot option.
 
Y axis, X axis, Titles, Legend, Overall
twoway options are any of the options documented in [G-3] twoway options, excluding by(). These
include options for titling the graph (see [G-3] title options) and for saving the graph to disk (see
[G-3] saving option).
Remarks and examples
Example 1
In quality-control applications, the three most commonly used variables with this command are
the process mean, process standard deviation, and time. For instance, we have data on the average
weights and standard deviations from an assembly line in San Francisco for the period January 8 to
January 16. Our data are
. use http://www.stata-press.com/data/r13/assembly
. list, sep(0) divider
date mean std
1. 108 192.22 3.94
2. 109 192.64 2.83
3. 110 192.37 4.58
4. 113 194.76 3.25
5. 114 192.69 2.89
6. 115 195.02 1.73
7. 116 193.40 2.62
We type serrbar mean std date, scale(2) but, after seeing the result, decide to make it fancier:
. serrbar mean std date, scale(2) title("Observed Weight Variation")
> sub("San Francisco plant, 1/8 to 1/16") yline(195) yaxis(1 2)
> ylab(195, axis(2)) ytitle("", axis(2))
(Graph omitted: “Observed Weight Variation,” San Francisco plant, 1/8 to 1/16; y axis: Package weight in lbs.; x axis: date.)
Acknowledgment
serrbar was written by Nicholas J. Cox of the Department of Geography at Durham University,
UK, and coeditor of the Stata Journal.
Also see
[R] qc — Quality control charts
Title
set — Overview of system parameters
Syntax Description Remarks and examples Also see
Syntax
set setcommand . . .
set typed without arguments is equivalent to query typed without arguments.
Description
This entry provides a reference to Stata’s set commands. For many entries, more thorough
information is provided elsewhere; see the Reference field in each entry below for the location of
this information.
To reset system parameters to factory defaults, see [R] set defaults.
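For example (the particular values shown are only illustrative), typical adjustments look like this:
. set level 90
. set more off, permanently
. set scheme s1mono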
Remarks and examples
set adosize
Syntax: set adosize #, permanently
Default: 1,000
Description: sets the maximum amount of memory that automatically loaded do-files
may consume. 10 ≤ # ≤ 10,000.
Reference: [P]sysdir
set autotabgraphs (Windows only)
Syntax: set autotabgraphs on |off , permanently
Default: off
Description: determines whether graphs are created as tabs within one window or as separate
windows.
set cformat
Syntax: set cformat fmt , permanently
Description: specifies the output format of coefficients, standard errors, and confidence limits
in coefficient tables. fmt is a numerical format; see [D]format.
Reference: [R]set cformat
set charset (Mac only)
Syntax: set charset mac |latin1 , permanently
Default: mac
Description: sets the character set used by Stata for Mac for rendering of ASCII text.
set checksum
Syntax: set checksum on |off , permanently
Default: off
Description: determines whether files should be prevented from being downloaded from the
Internet if checksums do not match.
Reference: [D]checksum
set coeftabresults
Syntax: set coeftabresults on |off
Default: on
Description: determines whether coefficient table results are stored in r().
There is no permanently option because permanently is implied.
set conren (Unix console only)
Syntax 1: set conren
Syntax 2: set conren clear
Syntax 3: set conren [sf | bf | it]
{result | txt | text | input | error | link | hilite}
[char [char . . . ]]
Syntax 4: set conren {ulon | uloff} [char [char . . . ]]
Syntax 5: set conren reset [char [char . . . ]]
Description: can possibly make the output on your screen appear prettier.
set conren displays a list of the currently defined display codes.
set conren clear clears all codes.
set conren followed by a font type (bf, sf, or it) and display context (result,
error, link, or hilite) and then followed by a series of space-separated
characters sets the code for the specified font type and display context. If the font
type is omitted, the code is set to the same specified code for all three font types.
set conren ulon and set conren uloff set the codes for turning on and off
underlining.
set conren reset sets the code that will turn off all display and underlining codes.
Reference: [GSU]conren
set copycolor (Mac and Windows only)
Syntax: set copycolor automatic |asis |gs1 |gs2 |gs3 , permanently
Default: automatic
Description: determines how colors are handled when graphs are copied to the Clipboard.
Reference: [G-2]set printcolor
set dockable (Windows only)
Syntax: set dockable on |off , permanently
Default: on
Description: determines whether to enable the use of dockable window characteristics,
including the ability to dock or tab a window into another window.
set dockingguides (Windows only)
Syntax: set dockingguides on |off , permanently
Default: on
Description: determines whether to enable the use of dockable guides when repositioning
a dockable window.
set doublebuffer (Windows only)
Syntax: set doublebuffer on |off, permanently
Default: on
Description: enables or disables double buffering of the Results, Viewer, and Data Editor
windows. Double buffering prevents the windows from flickering when redrawn
or resized. Users who encounter performance problems such as the Results window
outputting very slowly should disable double buffering.
set dp
Syntax: set dp comma |period, permanently
Default: period
Description: determines whether a period or a comma is to be used as the decimal point.
Reference: [D]format
set emptycells
Syntax: set emptycells keep |drop, permanently
Default: keep
Description: sets what to do with empty cells in interactions.
Reference: [R]set emptycells
set eolchar (Mac only)
Syntax: set eolchar mac |unix, permanently
Default: unix
Description: sets the default end-of-line delimiter for text files created in Stata.
set fastscroll (Unix and Windows only)
Syntax: set fastscroll on |off, permanently
Default: on
Description: sets the scrolling method for new output in the Results window. Setting
fastscroll to on is faster but can be jumpy. Setting fastscroll to off
is slower but smoother.
set floatwindows (Windows only)
Syntax: set floatwindows on |off
Default: off
Description: determines whether to enable floating window behavior for dialog boxes and dockable
windows. The term “float” in this context means that a window will always float
over the main Stata window; these windows cannot be placed behind the main Stata
window. There is no permanently option because permanently is implied.
set fvlabel
Syntax: set fvlabel {on |off }, permanently
Description: specifies whether to display factor-variable value labels in coefficient tables.
Reference: [R]set showbaselevels
set fvwrap
Syntax: set fvwrap #, permanently
Description: specifies that long value labels wrap # lines in coefficient tables.
Reference: [R]set showbaselevels
set fvwrapon
Syntax: set fvwrapon {word |width }, permanently
Description: specifies whether value labels that wrap will break at word boundaries or break
based on available space.
Reference: [R]set showbaselevels
set graphics
Syntax: set graphics on |off
Default: on; default is off for console Stata
Description: determines whether graphs are displayed on your monitor.
Reference: [G-2]set graphics
set haverdir
Syntax: set haverdir "path", permanently
Description: specifies the directory where the Haver databases are stored.
Reference: [D]import haver
set httpproxy
Syntax: set httpproxy on |off, init
Default: off
Description: turns on/off the use of a proxy server. There is no permanently option because
permanently is implied.
Reference: [R]netio
set httpproxyauth
Syntax: set httpproxyauth on |off
Default: off
Description: determines whether authorization is required for the proxy server.
There is no permanently option because permanently is implied.
Reference: [R]netio
set httpproxyhost
Syntax: set httpproxyhost "name"
Description: sets the name of a host to be used as a proxy server. There is no permanently
option because permanently is implied.
Reference: [R]netio
set httpproxyport
Syntax: set httpproxyport #
Default: 8080 if Stata cannot autodetect the proper setting for your computer.
Description: sets the port number for a proxy server. There is no permanently option
because permanently is implied.
Reference: [R]netio
set httpproxypw
Syntax: set httpproxypw "password"
Description: sets the appropriate password. There is no permanently option because
permanently is implied.
Reference: [R]netio
set httpproxyuser
Syntax: set httpproxyuser "name"
Description: sets the appropriate user ID. There is no permanently option because
permanently is implied.
Reference: [R]netio
set include bitmap (Mac only)
Syntax: set include bitmap on |off, permanently
Default: on
Description: sets the output behavior when copying an image to the Clipboard.
set level
Syntax: set level #, permanently
Default: 95
Description: sets the default confidence level for confidence intervals for all commands
that report confidence intervals. 10.00 ≤ # ≤ 99.99, and # can have at
most two digits after the decimal point.
Reference: [R]level
set linegap
Syntax: set linegap #
Default: 1
Description: sets the space between lines, in pixels, in the Results window. There is no
permanently option because permanently is implied.
set linesize
Syntax: set linesize #
Default: 1 less than the full width of the screen
Description: sets the line width, in characters, for both the screen and the log file.
Reference: [R]log
set locksplitters (Windows only)
Syntax: set locksplitters on |off , permanently
Default: off
Description: determines whether splitters should be locked so that docked windows
cannot be resized.
set logtype
Syntax: set logtype text |smcl, permanently
Default: smcl
Description: sets the default log filetype.
Reference: [R]log
set lstretch
Syntax: set lstretch on |off , permanently
Description: specifies whether to automatically widen the coefficient table up to the width of
the Results window to accommodate longer variable names.
set matacache, set matafavor, set matalibs, set matalnum, set matamofirst,
set mataoptimize, and set matastrict; see [M-3] mata set.
set matsize
Syntax: set matsize #, permanently
Default: 400 for Stata/MP, Stata/SE, and Stata/IC; 40 for Small Stata
Description: sets the maximum number of variables that can be included in any estimation
command. This setting cannot be changed in Small Stata.
10 ≤ # ≤ 11,000 for Stata/MP and Stata/SE; 10 ≤ # ≤ 800 for Stata/IC.
Reference: [R]matsize
set max memory
Syntax: set max memory #b|k|m|g, permanently
Default: . (all the memory the operating system will supply)
Description: specifies the maximum amount of memory Stata can use to store your data.
2 × segmentsize ≤ # ≤ .
Reference: [D]memory
set maxdb
Syntax: set maxdb #, permanently
Default: 50
Description: sets the maximum number of dialog boxes whose contents are remembered
from one invocation to the next during a session. 5 ≤ # ≤ 1,000.
Reference: [R]db
set maxiter
Syntax: set maxiter #, permanently
Default: 16000
Description: sets the default maximum number of iterations for estimation commands.
0 ≤ # ≤ 16,000
Reference: [R]maximize
set maxvar
Syntax: set maxvar #, permanently
Default: 5000 for Stata/MP and Stata/SE, 2048 for Stata/IC, and 99 for Small Stata
Description: sets the maximum number of variables. This can be changed only in Stata/MP and
Stata/SE. 2,048 ≤ # ≤ 32,767.
Reference: [D]memory
set min memory
Syntax: set min memory #b|k|m|g, permanently
Default: 0
Description: specifies an amount of memory Stata will not fall below. This setting affects
efficiency, not the size of datasets you can analyze. 0 ≤ # ≤ max memory.
Reference: [D]memory
set more
Syntax: set more on |off, permanently
Default: on
Description: pauses when more is displayed, continuing only when the user presses a key.
Reference: [R]more
set niceness
Syntax: set niceness #, permanently
Default: 5
Description: affects how soon Stata gives back unused segments to the operating system.
0 ≤ # ≤ 10
Reference: [D]memory
set notifyuser (Mac only)
Syntax: set notifyuser on |off, permanently
Default: on
Description: sets the default Notification Manager behavior in Stata.
set obs
Syntax: set obs #
Default: current number of observations
Description: changes the number of observations in the current dataset. #must be at least
as large as the current number of observations. If there are variables in memory,
the values of all new observations are set to missing.
Reference: [D]obs
set odbcmgr (Unix only)
Syntax: set odbcmgr iodbc |unixodbc, permanently
Default: iodbc
Description: determines whether iODBC or unixODBC is your ODBC driver manager.
Reference: [D]odbc
set output
Syntax: set output proc |inform |error
Default: proc
Description: specifies the output to be displayed. proc means display all output; inform
suppresses procedure output but displays informative messages and error messages;
error suppresses all output except error messages. set output is seldom used.
Reference: [P]quietly
set pagesize
Syntax: set pagesize #
Default: 2 less than the physical number of lines on the screen
Description: sets the number of lines between more messages.
Reference: [R]more
set pformat
Syntax: set pformat fmt , permanently
Description: specifies the output format of p-values in coefficient tables.
fmt is a numerical format; see [D]format.
Reference: [R]set cformat
set pinnable (Windows only)
Syntax: set pinnable on |off , permanently
Default: on
Description: determines whether to enable the use of pinnable window characteristics for certain
windows in Stata.
set playsnd (Mac only)
Syntax: set playsnd on |off, permanently
Default: on
Description: sets the sound behavior for the Notification Manager in Stata.
set printcolor
Syntax: set printcolor automatic |asis |gs1 |gs2 |gs3, permanently
Default: automatic
Description: determines how colors are handled when graphs are printed.
Reference: [G-2]set printcolor
set processors
Syntax: set processors #
Description: sets the number of processors or cores that Stata/MP will use. The default
is the number of processors available on the computer, or the number of
processors allowed by Stata/MP’s license, whichever is less.
set reventries
Syntax: set reventries #, permanently
Default: 5000
Description: sets the number of scrollback lines available in the Review window.
5 ≤ # ≤ 32,000.
set revkeyboard (Mac only)
Syntax: set revkeyboard on |off, permanently
Default: on
Description: sets the keyboard navigation behavior for the Review window. on indicates
that you can use the keyboard to navigate and enter items from the Review
window into the Command window. off indicates that all keyboard input be
directed at the Command window; items can be entered from the Review
window only by using the mouse.
set rmsg
Syntax: set rmsg on |off, permanently
Default: off
Description: indicates whether a return message telling the execution time is to be displayed at
the completion of each command.
Reference: [P]rmsg
set scheme
Syntax: set scheme schemename , permanently
Default: s2color
Description: determines the overall look for graphs.
Reference: [G-2]set scheme
set scrollbufsize
Syntax: set scrollbufsize #
Default: 200000
Description: sets the scrollback buffer size, in bytes, for the Results window;
may be set between 10,000 and 2,000,000.
set searchdefault
Syntax: set searchdefault {all | local | net} [, permanently]
Default: all
Description: sets the default behavior of the search command. set searchdefault local
restricts search to use only Stata’s keyword database. set searchdefault net
restricts search to searching only the Internet. set searchdefault all
indicates that both the keyword database and the Internet are to be searched.
Reference: [R]search
set seed
Syntax: set seed #|code
Default: 123456789
Description: specifies initial value of the random-number seed used by the runiform() function.
Reference: [R]set seed
set segmentsize
Syntax: set segmentsize #b|k|m|g, permanently
Default: 32m for 64-bit machines; 16m for 32-bit machines
Description: Stata allocates memory for data in units of segmentsize. This setting changes the
amount of memory in a single segment.
1m ≤ # ≤ 32g for 64-bit machines; 1m ≤ # ≤ 1g for 32-bit machines
Reference: [D]memory
set sformat
Syntax: set sformat fmt , permanently
Description: specifies the output format of test statistics in coefficient tables.
fmt is a numerical format; see [D]format.
Reference: [R]set cformat
set showbaselevels
Syntax: set showbaselevels {on |off |all }, permanently
Description: specifies whether to display base levels of factor variables and their interactions
in coefficient tables.
Reference: [R]set showbaselevels
set showemptycells
Syntax: set showemptycells {on |off }, permanently
Description: specifies whether to display empty cells in coefficient tables.
Reference: [R]set showbaselevels
set showomitted
Syntax: set showomitted {on |off }, permanently
Description: specifies whether to display omitted coefficients in coefficient tables.
Reference: [R]set showbaselevels
set smoothfonts (Mac only)
Syntax: set smoothfonts on |off
Default: on
Description: determines whether to use font smoothing (antialiased text) in the Results, Viewer,
and Data Editor windows.
set timeout1
Syntax: set timeout1 #seconds , permanently
Default: 30
Description: sets the number of seconds Stata will wait for a remote host to respond to an initial
contact before giving up. In general, users should not modify this value unless
instructed to do so by Stata Technical Services.
Reference: [R]netio
set timeout2
Syntax: set timeout2 #seconds , permanently
Default: 180
Description: sets the number of seconds Stata will keep trying to get information from a remote
host after initial contact before giving up. In general, users should not modify this
value unless instructed to do so by Stata Technical Services.
Reference: [R]netio
set trace
Syntax: set trace on |off
Default: off
Description: determines whether to trace the execution of programs for debugging.
Reference: [P]trace
set tracedepth
Syntax: set tracedepth #
Default: 32000 (equivalent to ∞)
Description: if trace is set on, traces execution of programs and nested programs up to
tracedepth. For example, if tracedepth is 2, the current program and any
subroutine called would be traced, but subroutines of subroutines would not
be traced.
Reference: [P]trace
set traceexpand
Syntax: set traceexpand on |off, permanently
Default: on
Description: if trace is set on, shows lines both before and after macro expansion. If
traceexpand is set off, only the line before macro expansion is shown.
Reference: [P]trace
set tracehilite
Syntax: set tracehilite "pattern", word
Default: ""
Description: highlights pattern in the trace output.
Reference: [P]trace
set traceindent
Syntax: set traceindent on |off, permanently
Default: on
Description: if trace is set on, indents displayed lines according to their nesting level. The
lines of the main program are not indented. Two spaces of indentation are used for
each level of nested subroutine.
Reference: [P]trace
set tracenumber
Syntax: set tracenumber on |off, permanently
Default: off
Description: if trace is set on, shows the nesting level numerically in front of the line.
Lines of the main program are preceded by 01, lines of subroutines called by the
main program are preceded by 02, etc.
Reference: [P]trace
set tracesep
Syntax: set tracesep on |off, permanently
Default: on
Description: if trace is set on, displays a horizontal separator line that displays the name
of the subroutine whenever a subroutine is called or exits.
Reference: [P]trace
set type
Syntax: set type float |double, permanently
Default: float
Description: specifies the default storage type assigned to new variables.
Reference: [D]generate
set update interval (Mac and Windows only)
Syntax: set update interval #
Default: 7
Description: sets the number of days to elapse before performing the next automatic
update query.
Reference: [R]update
set update prompt (Mac and Windows only)
Syntax: set update prompt on |off
Default: on
Description: determines whether a dialog is to be displayed before performing an automatic
update query. There is no permanently option because permanently is implied.
Reference: [R]update
set update query (Mac and Windows only)
Syntax: set update query on |off
Default: on
Description: determines whether update query is to be automatically performed when Stata
is launched. There is no permanently option because permanently is implied.
Reference: [R]update
set varabbrev
Syntax: set varabbrev on |off, permanently
Default: on
Description: indicates whether Stata should allow variable abbreviations.
Reference: [P]varabbrev
set varkeyboard (Mac only)
Syntax: set varkeyboard on |off, permanently
Default: on
Description: sets the keyboard navigation behavior for the Variables window. on indicates
that you can use the keyboard to navigate and enter items from the Variables
window into the Command window. off indicates that all keyboard input be
directed at the Command window; items can be entered from the Variables
window only by using the mouse.
Also see
[R] query — Display system parameters
[R] set defaults — Reset system parameters to original Stata defaults
[P] creturn — Return c-class values
[M-3] mata set — Set and display Mata system parameters
Title
set cformat — Format settings for coefficient tables
Syntax Description Option Remarks and examples Also see
Syntax
set cformat [fmt] [, permanently]
set pformat [fmt] [, permanently]
set sformat [fmt] [, permanently]
where fmt is a numerical format.
Description
set cformat specifies the output format of coefficients, standard errors, and confidence limits in
coefficient tables.
set pformat specifies the output format of p-values in coefficient tables.
set sformat specifies the output format of test statistics in coefficient tables.
Option
permanently specifies that, in addition to making the change right now, the setting be remembered
and become the default setting when you invoke Stata.
Remarks and examples
The formatting of the numbers in the coefficient table can be controlled by using the set cformat,
set pformat, and set sformat commands or by using the cformat(%fmt), pformat(%fmt), and
sformat(%fmt) options at the time of estimation or on replay of the estimation command. See
[R] estimation options.
The maximum format widths for set cformat, set pformat, and set sformat in coefficient
tables are 9, 5, and 8, respectively.
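For instance, to tighten the display of p-values and test statistics as well (the formats chosen here are only illustrative), you could type
. set pformat %4.3f
. set sformat %6.2f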
Example 1
We use auto.dta to illustrate.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg weight displacement
Source SS df MS Number of obs = 74
F( 2, 71) = 66.79
Model 1595.40969 2 797.704846 Prob > F = 0.0000
Residual 848.049768 71 11.9443629 R-squared = 0.6529
Adj R-squared = 0.6432
Total 2443.45946 73 33.4720474 Root MSE = 3.4561
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0065671 .0011662 -5.63 0.000 -.0088925 -.0042417
displacement .0052808 .0098696 0.54 0.594 -.0143986 .0249602
_cons 40.08452 2.02011 19.84 0.000 36.05654 44.11251
. set cformat %9.2f
. regress mpg weight displacement
Source SS df MS Number of obs = 74
F( 2, 71) = 66.79
Model 1595.40969 2 797.704846 Prob > F = 0.0000
Residual 848.049768 71 11.9443629 R-squared = 0.6529
Adj R-squared = 0.6432
Total 2443.45946 73 33.4720474 Root MSE = 3.4561
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -0.01 0.00 -5.63 0.000 -0.01 -0.00
displacement 0.01 0.01 0.54 0.594 -0.01 0.02
_cons 40.08 2.02 19.84 0.000 36.06 44.11
. regress mpg weight displacement, cformat(%9.3f)
Source SS df MS Number of obs = 74
F( 2, 71) = 66.79
Model 1595.40969 2 797.704846 Prob > F = 0.0000
Residual 848.049768 71 11.9443629 R-squared = 0.6529
Adj R-squared = 0.6432
Total 2443.45946 73 33.4720474 Root MSE = 3.4561
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -0.007 0.001 -5.63 0.000 -0.009 -0.004
displacement 0.005 0.010 0.54 0.594 -0.014 0.025
_cons 40.085 2.020 19.84 0.000 36.057 44.113
To reset the cformat setting to its command-specific default, type
. set cformat
. regress mpg weight displacement
Source SS df MS Number of obs = 74
F( 2, 71) = 66.79
Model 1595.40969 2 797.704846 Prob > F = 0.0000
Residual 848.049768 71 11.9443629 R-squared = 0.6529
Adj R-squared = 0.6432
Total 2443.45946 73 33.4720474 Root MSE = 3.4561
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0065671 .0011662 -5.63 0.000 -.0088925 -.0042417
displacement .0052808 .0098696 0.54 0.594 -.0143986 .0249602
_cons 40.08452 2.02011 19.84 0.000 36.05654 44.11251
Also see
[R] estimation options — Estimation options
[R] query — Display system parameters
[R] set — Overview of system parameters
[U] 20.8 Formatting the coefficient table
Title
set defaults — Reset system parameters to original Stata defaults
Syntax Description Option Remarks and examples Also see
Syntax
set defaults {category | all} [, permanently]
where category is one of memory | output | interface | graphics | efficiency |
network | update | trace | mata | other
Description
set defaults resets settings made by set to the original default settings that were shipped with
Stata.
set defaults all resets all the categories, whereas set defaults category resets only the
settings for the specified category.
Option
permanently specifies that, in addition to making the change right now, the settings be remembered
and become the default settings when you invoke Stata.
Remarks and examples
Example 1
To assist us in debugging a new command, we modified some of the trace settings. To return
them to their original values, we type
. set_defaults trace
-> set trace off
-> set tracedepth 32000
-> set traceexpand on
-> set tracesep on
-> set traceindent on
-> set tracenumber off
-> set tracehilite ""
(preferences reset)
Also see
[R] query — Display system parameters
[R] set — Overview of system parameters
[M-3] mata set — Set and display Mata system parameters
Title
set emptycells — Set what to do with empty cells in interactions
Syntax Description Option Remarks and examples Also see
Syntax
set emptycells {keep | drop} [, permanently]
Description
set emptycells allows you to control how Stata handles interaction terms with empty cells.
Stata can keep empty cells or drop them. The default is to keep empty cells.
Option
permanently specifies that, in addition to making the change right now, the setting be remembered
and become the default setting when you invoke Stata.
Remarks and examples
By default, Stata keeps empty cells so they can be reported in the coefficient table. For example,
type
. use http://www.stata-press.com/data/r13/auto
. regress mpg rep78#foreign, baselevels
and you will see a regression of mpg on 10 indicator variables because rep78 takes on 5 values and
foreign takes on 2 values in the auto dataset. Two of those cells will be reported as empty because
the data contain no observations of foreign cars with a rep78 value of 1 or 2.
Many real datasets contain a large number of empty cells, and this could cause the “matsize too
small” error, r(908). In that case, type
. set emptycells drop
to get Stata to drop empty cells from the list of coefficients. If you commonly fit models with empty
cells, you can permanently set Stata to drop empty cells by typing the following:
. set emptycells drop, permanently
Also see
[R] set — Overview of system parameters
Title
set seed — Specify initial value of random-number seed
Syntax Description Remarks and examples Also see
Syntax
set seed #
set seed statecode
where
# is any number between 0 and 2^31 - 1 (2,147,483,647), and
statecode is a random-number state previously obtained from creturn value c(seed).
Description
set seed # specifies the initial value of the random-number seed used by the random-number
functions, such as runiform() and rnormal().
set seed statecode resets the state of the random-number functions to the value specified, which
is a state previously obtained from creturn value c(seed).
Remarks and examples
Remarks are presented under the following headings:
Examples
Setting the seed
How to choose a seed
Do not set the seed too often
Preserving and restoring the random-number generator state
Examples
1. Specify initial value of random-number seed
. set seed 339487731
2. Create variable u containing uniformly distributed pseudorandom numbers on the interval [0,1)
. generate u = runiform()
3. Create variable z containing normally distributed random numbers with mean 0 and standard
deviation 1
. generate z = rnormal()
4. Obtain state of pseudorandom-number generator and store it in a local macro named state
. local state = c(seed)
5. Restore pseudorandom-number generator state to that previously stored in local macro named
state
. set seed `state'
Setting the seed
Stata’s random-number generation functions, such as runiform() and rnormal(), do not really
produce random numbers. These functions are deterministic algorithms that produce numbers that
can pass for random. runiform() produces numbers that can pass for independent draws from a
rectangular distribution over [0,1); rnormal() produces numbers that can pass for independent draws
from N(0,1). Stata’s random-number functions are formally called pseudorandom-number functions.
The sequences these functions produce are determined by the seed, which is just a number and
which is set to 123456789 every time Stata is launched. This means that runiform() produces
the same sequence each time you start Stata. The first time you use runiform() after Stata is
launched, runiform() returns 0.136984078446403146. The second time you use it, runiform()
returns 0.643220667960122228. The third time you use it, . . . .
To obtain different sequences, you must specify different seeds using the set seed command.
You might specify the seed 472195:
. set seed 472195
If you were now to use runiform(), the first call would return 0.247166610788553953, the second
call would return 0.593119932804256678, and so on. Whenever you set seed 472195, runiform()
will return those numbers the first two times you use it.
Thus you set the seed to obtain different pseudorandom sequences from the pseudorandom-number
functions.
If you record the seed you set, pseudorandom results such as results from a simulation or imputed
values from mi impute can be reproduced later. Whatever you do after setting the seed, if you set
the seed to the same value and repeat what you did, you will obtain the same results.
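A quick way to convince yourself of this is the following sketch; the seed and the number of observations are arbitrary:
. clear
. set obs 5
. set seed 472195
. generate x1 = runiform()
. set seed 472195
. generate x2 = runiform()
. assert x1 == x2
The assert command completes silently because the two sequences are identical.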
How to choose a seed
Your best choice for the seed is an element chosen randomly from the set {0,1,...,2,147,483,647}.
We recommend that, but that is difficult to achieve because finding easy-to-access, truly random sources
is difficult.
One person we know uses digits from the serial numbers from dollar bills he finds in his wallet.
Of course, the numbers he obtains are not really random, but they are good enough, and they are
probably a good deal more random than the seeds most people choose. Some people use dates and
times, although we recommend against that because, over the day, it just gets later and later, and
that is a pattern. Others try to make up a random number, figuring if they include enough digits, the
result just has to be random. This is a variation on the five-second rule for dropped food, and we
admit to using both of these rules.
It does not really matter how you set the seed, as long as there is no obvious pattern in the seeds
that you set and as long as you do not set the seed too often during a session.
Nonetheless, here are two methods that we have seen used but you should not use:
1. The first time you set the seed, you set the number 1. The next time, you set 2, and then 3,
and so on. Variations on this included setting 1001, 1002, 1003, . . . , or setting 1001, 2001,
3001, and so on.
Do not follow any of these procedures. The seeds you set must not exhibit a pattern.
2. To set the seed, you obtain a pseudorandom number from runiform() and then use the
digits from that to form the seed.
This is a bad idea because the pseudorandom-number generator can converge to a cycle.
If you obtained the pseudorandom number from a generator unrelated to those in Stata, this would
work well, but then you would have to find a rule to set the first generator’s seed. In any
case, the pseudorandom-number generators in Stata are all closely related, and so you must
not follow this procedure.
Choosing seeds that do not exhibit a pattern is of great importance. That the seeds satisfy the
other properties of randomness is minor by comparison.
Do not set the seed too often
We cannot emphasize this enough: Do not set the seed too often.
To see why this is such a bad idea, consider the limiting case: You set the seed, draw one
pseudorandom number, reset the seed, draw again, and so continue. The pseudorandom numbers you
obtain will be nothing more than the seeds you run through a mathematical function. The results you
obtain will not pass for random unless the seeds you choose pass for random. If you already had
such numbers, why are you even bothering to use the pseudorandom-number generator?
The definition of too often is more than once per problem.
If you are running a simulation of 10,000 replications, set the seed at the start of the simulation
and do not reset it until the 10,000th replication is finished. The pseudorandom-number generators
provided by Stata have long periods. The longer you go between setting the seed, the more random-like
are the numbers produced.
It is sometimes useful later to be able to reproduce in isolation any one of the replications, and so
you might be tempted to set the seed to a known value for each of the replications. We negatively
mentioned setting the seed to 1, 2, . . . , and it is in exactly such situations that we have seen this done.
The advantage, however, is that you could reproduce the fifth replication merely by setting the seed
to 5 and then repeating whatever it is that is to be replicated. If this is your goal, you do not need
to reset the seed. You can record the state of the random-number generator, save the state with your
replication results, and then use the recorded states later to reproduce whichever of the replications
that you wish. This will be discussed in Preserving and restoring the random-number generator state.
There is another reason you might be tempted to set the seed more than once per problem. It
sometimes happens that you run a simulation, let’s say for 5,000 replications, and then you decide
you should have run it for 10,000 replications. Instead of running all 10,000 replications afresh, you
decide to save time by running another 5,000 replications and then combining those results with
your previous 5,000 results. That is okay. We at StataCorp do this kind of thing. If you do this, it
is important that you set the seed especially well, particularly if you repeat this process to add yet
another 5,000 replications. It is also important that in each run there be a large enough number of
replications, which is to say, thousands of them.
Even so, do not do this: You want 500,000 replications. To obtain them, you run in batches of
1,000, setting the seed 500 times. Unless you have a truly random source for the seeds, it is unlikely
you can produce a patternless sequence of 500 seeds. The fact that you ran 1,000 replications in
between choosing the seeds does not mitigate the requirement that there be no pattern to the seeds
you set.
In all cases, the best solution is to set the seed only once and then use the method we suggest in
the next section.
Preserving and restoring the random-number generator state
In the previous section, we discussed the case in which you might be tempted to set the seed
more frequently than otherwise necessary, either to save time or to be able to rerun any one of the
replications. In such cases, there is an alternative to setting a new seed: recording the state of the
pseudorandom-number generator and then restoring the state later should the need arise.
The state of the random-number generator is a string that looks like this:
Xb5804563c43f462544a474abacbdd93d00021fb3
You can obtain the state from c(seed):
. display c(seed)
Xb5804563c43f462544a474abacbdd93d00021fb3
The name c(seed) is unfortunate because it suggests that
Xb5804563c43f462544a474abacbdd93d00021fb3 is nothing more than a seed such as 1073741823 in
a different guise. It is not. A better name for c(seed) would have been c(rng_state). The state
string specifies an entry point into the sequence produced by the pseudorandom-number generator.
Let us explain.
The best way to use a pseudorandom-number generator would be to choose a seed once, draw
random numbers until you use up the generator, and then get a new generator and choose a new key.
Pseudorandom-number generators have a period, after which they repeat the original sequence. That
is what we mean by using up a generator. The period of the pseudorandom-number generator that
Stata is currently using is over 2^123. Stata uses the KISS generator. It is difficult to imagine that you
could ever use up KISS.
The string reported by c(seed) reports an encoded form of the information necessary for Stata
to reestablish exactly where it is located in the pseudorandom-number generator’s sequence.
We are not seriously suggesting you choose only one seed over your entire lifetime, but let’s look
at how you might do that. Sometime after birth, when you needed your first random number, you
would set your seed,
. set seed 1073741823
On that day, you would draw, say, 10,000 pseudorandom numbers, perhaps to impute some missing
values. Being done for the day, you type
. display c(seed)
X15b512f3b2143ab434f1c92f4e7058e400023bc3
The next day, after launching Stata, you type
. set seed X15b512f3b2143ab434f1c92f4e7058e400023bc3
When you type set seed followed by a state string rather than a number, instead of setting the
seed, Stata reestablishes the previous state. Thus the next time you draw a pseudorandom number,
Stata will produce the 10,001st result after setting seed 1073741823. Let’s assume that you draw
100,000 numbers this day. Done for the day, you display c(seed).
. display c(seed)
X5d13d693a72ad0602b093cc4f61e07a500020381
On the third day, after setting the seed to the string above, you will be in a position to draw the
110,001st pseudorandom number.
In this way, you would eat your way through the 2^123 random numbers, but you would be unlikely
ever to make it to the end. Assuming you did this every day for 100 years, to arrive at the end of
the sequence you would need to consume 2.9e+32 pseudorandom numbers per day.
We do not expect you to set the seed just once in your life, but using the state string makes it
easy to set the seed just once for a problem.
When we do simulations at StataCorp, we record c(seed) for each replication. Just like everybody
else, we record results from replications as observations in datasets; we just happen to have an extra
variable in the dataset, namely, a string variable named state. That string is filled in observation by
observation from the then-current values of c(seed), which is a function and so can be used in any
context that a function can be used in Stata.
Anytime we want to reproduce a particular replication, we thus have the information we need to
reset the pseudorandom-number generator, and having it in the dataset is convenient because we had
to go there anyway to determine which replication we wanted to reproduce.
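In a do-file, that bookkeeping might look like the following sketch; the filename, replication count, and recorded statistic are arbitrary:
set seed 1073741823
postfile sim str54 state double mean using simresults, replace
forvalues i = 1/1000 {
    local st = c(seed)              // state before this replication
    quietly drawnorm x, n(100) clear
    quietly summarize x
    post sim ("`st'") (r(mean))
}
postclose sim
Any single replication can later be reproduced by typing set seed followed by the state string saved for it.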
In addition to recording each of the state strings for each replication, we record the closing value
of c(seed) as a note, which is easy enough to do:
. note: closing state `c(seed)'
If we want to add more replications later, we have a state string that we can use to continue from
where we left off.
Also see
[R] set — Overview of system parameters
[D] functions — Functions
Title
set showbaselevels — Display settings for coefficient tables
Syntax Description Option Remarks and examples Also see
Syntax
set showbaselevels {on | off | all} [, permanently]
set showemptycells {on | off} [, permanently]
set showomitted {on | off} [, permanently]
set fvlabel {on | off} [, permanently]
set fvwrap # [, permanently]
set fvwrapon {word | width} [, permanently]
Description
set showbaselevels specifies whether to display base levels of factor variables and their
interactions in coefficient tables. set showbaselevels on specifies that base levels be reported for
factor variables and for interactions whose bases cannot be inferred from their component factor
variables. set showbaselevels all specifies that all base levels of factor variables and interactions
be reported.
set showemptycells specifies whether to display empty cells in coefficient tables.
set showomitted specifies whether to display omitted coefficients in coefficient tables.
set fvlabel specifies whether to display factor-variable value labels in coefficient tables. set
fvlabel on, the default, specifies that the labels be displayed. set fvlabel off specifies that the
levels of factor variables rather than the labels be displayed.
set fvwrap # specifies that long value labels wrap # lines in the coefficient table. The default is
set fvwrap 1, which means that long value labels will be abbreviated to fit on one line.
set fvwrapon specifies whether value labels that wrap will break at word boundaries or break
based on available space. set fvwrapon word, the default, specifies that value labels break at word
boundaries. set fvwrapon width specifies that value labels break based on available space.
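For example (the settings shown are only illustrative), to allow labels to wrap onto two lines and to break wherever space runs out rather than at word boundaries, you could type
. set fvwrap 2
. set fvwrapon width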
Option
permanently specifies that, in addition to making the change right now, the setting be remembered
and become the default setting when you invoke Stata.
Remarks and examples
Example 1
We illustrate the first three set commands using cholesterol2.dta.
. use http://www.stata-press.com/data/r13/cholesterol2
(Artificial cholesterol data, empty cells)
. generate x = race
. regress chol race##agegrp x
note: 2.race#2.agegrp identifies no observations in the sample
note: x omitted because of collinearity
Source SS df MS Number of obs = 70
F( 13, 56) = 13.51
Model 15751.6113 13 1211.66241 Prob > F = 0.0000
Residual 5022.71559 56 89.6913498 R-squared = 0.7582
Adj R-squared = 0.7021
Total 20774.3269 69 301.077201 Root MSE = 9.4706
chol Coef. Std. Err. t P>|t| [95% Conf. Interval]
race
white 12.84185 5.989703 2.14 0.036 .8430383 24.84067
other -.167627 5.989703 -0.03 0.978 -12.16644 11.83119
agegrp
20-29 17.24681 5.989703 2.88 0.006 5.247991 29.24562
30-39 31.43847 5.989703 5.25 0.000 19.43966 43.43729
40-59 34.86613 5.989703 5.82 0.000 22.86732 46.86495
60-79 44.43374 5.989703 7.42 0.000 32.43492 56.43256
race#agegrp
white 20-29 0 (empty)
white 30-39 -22.83983 8.470719 -2.70 0.009 -39.80872 -5.870939
white 40-59 -14.67558 8.470719 -1.73 0.089 -31.64447 2.293306
white 60-79 -10.51115 8.470719 -1.24 0.220 -27.48004 6.457735
other 20-29 -6.054425 8.470719 -0.71 0.478 -23.02331 10.91446
other 30-39 -11.48083 8.470719 -1.36 0.181 -28.44971 5.488063
other 40-59 -.6796112 8.470719 -0.08 0.936 -17.6485 16.28928
other 60-79 -1.578052 8.470719 -0.19 0.853 -18.54694 15.39084
x 0 (omitted)
_cons 175.2309 4.235359 41.37 0.000 166.7464 183.7153
. set showemptycells off
. set showomitted off
. set showbaselevels all
. regress chol race##agegrp x
note: 2.race#2.agegrp identifies no observations in the sample
note: x omitted because of collinearity
Source SS df MS Number of obs = 70
F( 13, 56) = 13.51
Model 15751.6113 13 1211.66241 Prob > F = 0.0000
Residual 5022.71559 56 89.6913498 R-squared = 0.7582
Adj R-squared = 0.7021
Total 20774.3269 69 301.077201 Root MSE = 9.4706
chol Coef. Std. Err. t P>|t| [95% Conf. Interval]
race
black 0 (base)
white 12.84185 5.989703 2.14 0.036 .8430383 24.84067
other -.167627 5.989703 -0.03 0.978 -12.16644 11.83119
agegrp
10-19 0 (base)
20-29 17.24681 5.989703 2.88 0.006 5.247991 29.24562
30-39 31.43847 5.989703 5.25 0.000 19.43966 43.43729
40-59 34.86613 5.989703 5.82 0.000 22.86732 46.86495
60-79 44.43374 5.989703 7.42 0.000 32.43492 56.43256
race#agegrp
black 10-19 0 (base)
black 20-29 0 (base)
black 30-39 0 (base)
black 40-59 0 (base)
black 60-79 0 (base)
white 10-19 0 (base)
white 30-39 -22.83983 8.470719 -2.70 0.009 -39.80872 -5.870939
white 40-59 -14.67558 8.470719 -1.73 0.089 -31.64447 2.293306
white 60-79 -10.51115 8.470719 -1.24 0.220 -27.48004 6.457735
other 10-19 0 (base)
other 20-29 -6.054425 8.470719 -0.71 0.478 -23.02331 10.91446
other 30-39 -11.48083 8.470719 -1.36 0.181 -28.44971 5.488063
other 40-59 -.6796112 8.470719 -0.08 0.936 -17.6485 16.28928
other 60-79 -1.578052 8.470719 -0.19 0.853 -18.54694 15.39084
_cons 175.2309 4.235359 41.37 0.000 166.7464 183.7153
To restore the display of empty cells, omitted predictors, and base levels to their command-specific
default behavior, type
. set showemptycells
. set showomitted
. set showbaselevels
. regress chol race##agegrp x
note: 2.race#2.agegrp identifies no observations in the sample
note: x omitted because of collinearity
Source SS df MS Number of obs = 70
F( 13, 56) = 13.51
Model 15751.6113 13 1211.66241 Prob > F = 0.0000
Residual 5022.71559 56 89.6913498 R-squared = 0.7582
Adj R-squared = 0.7021
Total 20774.3269 69 301.077201 Root MSE = 9.4706
chol Coef. Std. Err. t P>|t| [95% Conf. Interval]
race
white 12.84185 5.989703 2.14 0.036 .8430383 24.84067
other -.167627 5.989703 -0.03 0.978 -12.16644 11.83119
agegrp
20-29 17.24681 5.989703 2.88 0.006 5.247991 29.24562
30-39 31.43847 5.989703 5.25 0.000 19.43966 43.43729
40-59 34.86613 5.989703 5.82 0.000 22.86732 46.86495
60-79 44.43374 5.989703 7.42 0.000 32.43492 56.43256
race#agegrp
white 20-29 0 (empty)
white 30-39 -22.83983 8.470719 -2.70 0.009 -39.80872 -5.870939
white 40-59 -14.67558 8.470719 -1.73 0.089 -31.64447 2.293306
white 60-79 -10.51115 8.470719 -1.24 0.220 -27.48004 6.457735
other 20-29 -6.054425 8.470719 -0.71 0.478 -23.02331 10.91446
other 30-39 -11.48083 8.470719 -1.36 0.181 -28.44971 5.488063
other 40-59 -.6796112 8.470719 -0.08 0.936 -17.6485 16.28928
other 60-79 -1.578052 8.470719 -0.19 0.853 -18.54694 15.39084
x 0 (omitted)
_cons 175.2309 4.235359 41.37 0.000 166.7464 183.7153
Example 2
We illustrate the last three set commands using jaw.dta.
. use http://www.stata-press.com/data/r13/jaw, clear
(Table 4.6 Two-Way Unbalanced Data for Fractures of the Jaw -- Rencher (1998))
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
two compou.. -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
one simple.. 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
two compou.. -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
one simple.. -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
two compou.. 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
one simple.. .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
. set fvwrap 2
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
two compound
fractures -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
one simple
fracture 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
two compound
fractures -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
one simple
fracture -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
two compound
fractures 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
one simple
fracture .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
. set fvwrapon width
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
two compound
fractures -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
one simple f
racture 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
two compound
fractures -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
one simple f
racture -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
two compound
fractures 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
one simple f
racture .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
. set showfvlabel off
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
2 -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
3 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
2 -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
3 -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
2 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
3 .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
To restore these last three set commands to their defaults, type
. set showfvlabel on
. set fvwrap 1
. set fvwrapon word
. mvreg y1 y2 y3 = i.fracture
Equation Obs Parms RMSE "R-sq" F P
y1 27 3 10.42366 0.2966 5.060804 0.0147
y2 27 3 6.325398 0.1341 1.858342 0.1777
y3 27 3 5.976973 0.1024 1.368879 0.2735
Coef. Std. Err. t P>|t| [95% Conf. Interval]
y1
fracture
two compou.. -8.833333 4.957441 -1.78 0.087 -19.06499 1.398322
one simple.. 6 5.394759 1.11 0.277 -5.134235 17.13423
_cons 37 3.939775 9.39 0.000 28.8687 45.1313
y2
fracture
two compou.. -5.761905 3.008327 -1.92 0.067 -11.97079 .446977
one simple.. -3.053571 3.273705 -0.93 0.360 -9.810166 3.703023
_cons 38.42857 2.390776 16.07 0.000 33.49425 43.36289
y3
fracture
two compou.. 4.261905 2.842618 1.50 0.147 -1.60497 10.12878
one simple.. .9285714 3.093377 0.30 0.767 -5.455846 7.312989
_cons 58.57143 2.259083 25.93 0.000 53.90891 63.23395
Also see
[R] set — Overview of system parameters
[R] query — Display system parameters
Title
signrank — Equality tests on matched data
Syntax Menu Description Remarks and examples
Stored results Methods and formulas References Also see
Syntax
Wilcoxon matched-pairs signed-ranks test
signrank varname = exp [if] [in]
Sign test of matched pairs
signtest varname = exp [if] [in]
by is allowed with signrank and signtest; see [D] by.
Menu
signrank
Statistics >Nonparametric analysis >Tests of hypotheses >Wilcoxon matched-pairs signed-rank test
signtest
Statistics >Nonparametric analysis >Tests of hypotheses >Test equality of matched pairs
Description
signrank tests the equality of matched pairs of observations by using the Wilcoxon matched-pairs
signed-ranks test (Wilcoxon 1945). The null hypothesis is that both distributions are the same.
signtest also tests the equality of matched pairs of observations (Arbuthnott [1710], but better
explained by Snedecor and Cochran [1989]) by calculating the differences between varname and the
expression. The null hypothesis is that the median of the differences is zero; no further assumptions
are made about the distributions. This, in turn, is equivalent to the hypothesis that the true proportion
of positive (negative) signs is one-half.
For equality tests on unmatched data, see [R] ranksum.
Remarks and examples
Example 1: signrank
We are testing the effectiveness of a new fuel additive. We run an experiment with 12 cars. We
first run each car without the fuel treatment and measure the mileage. We then add the fuel treatment
and repeat the experiment. The results of the experiment are
Without With Without With
treatment treatment treatment treatment
20 24 18 17
23 25 24 28
21 21 20 24
25 22 24 27
18 23 23 21
17 18 19 23
We create two variables called mpg1 and mpg2, representing mileage without and with the treatment,
respectively. We can test the null hypothesis that the treatment had no effect by typing
. use http://www.stata-press.com/data/r13/fuel
. signrank mpg1=mpg2
Wilcoxon signed-rank test
sign obs sum ranks expected
positive 3 13.5 38.5
negative 8 63.5 38.5
zero 1 1 1
all 12 78 78
unadjusted variance 162.50
adjustment for ties -1.62
adjustment for zeros -0.25
adjusted variance 160.62
Ho: mpg1 = mpg2
z = -1.973
Prob > |z| = 0.0485
The output indicates that we can reject the null hypothesis at any level above 4.85%.
Example 2: signtest
signtest tests that the median of the differences is zero, making no further assumptions, whereas
signrank assumed that the distributions are equal as well. Using the data above, we type
. signtest mpg1=mpg2
Sign test
sign observed expected
positive 3 5.5
negative 8 5.5
zero 1 1
all 12 12
One-sided tests:
Ho: median of mpg1 - mpg2 = 0 vs.
Ha: median of mpg1 - mpg2 > 0
Pr(#positive >= 3) =
Binomial(n = 11, x >= 3, p = 0.5) = 0.9673
Ho: median of mpg1 - mpg2 = 0 vs.
Ha: median of mpg1 - mpg2 < 0
Pr(#negative >= 8) =
Binomial(n = 11, x >= 8, p = 0.5) = 0.1133
Two-sided test:
Ho: median of mpg1 - mpg2 = 0 vs.
Ha: median of mpg1 - mpg2 != 0
Pr(#positive >= 8 or #negative >= 8) =
min(1, 2*Binomial(n = 11, x >= 8, p = 0.5)) = 0.2266
The summary table indicates that there were three comparisons for which mpg1 exceeded mpg2, eight
comparisons for which mpg2 exceeded mpg1, and one comparison for which they were the same.
The output below the summary table is based on the binomial distribution. The significance of the
one-sided test, where the alternative hypothesis is that the median of mpg2 − mpg1 is greater than
zero, is 0.1133. The significance of the two-sided test, where the alternative hypothesis is simply that
the median of the differences is different from zero, is 0.2266 = 2 × 0.1133.
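These binomial probabilities can be checked directly with Stata's binomialtail(n,k,p) function, which returns the probability of observing k or more successes in n trials:
. display binomialtail(11, 8, .5)
.11328125
. display min(1, 2*binomialtail(11, 8, .5))
.2265625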
Stored results
signrank stores the following in r():
Scalars
r(N_neg)     number of negative comparisons       r(sum_neg)   sum of the negative ranks
r(N_pos)     number of positive comparisons       r(z)         z statistic
r(N_tie)     number of tied comparisons           r(Var_a)     adjusted variance
r(sum_pos)   sum of the positive ranks
signtest stores the following in r():
Scalars
r(N_neg)     number of negative comparisons       r(p_2)       two-sided probability
r(N_pos)     number of positive comparisons       r(p_neg)     one-sided probability of negative comparison
r(N_tie)     number of tied comparisons           r(p_pos)     one-sided probability of positive comparison
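After either command, these stored results can be used directly in later calculations; for example, a minimal sketch picking up the two-sided probability from signtest using the data of Example 2 (the quietly prefix simply suppresses the output shown earlier):
. quietly signtest mpg1=mpg2
. display r(p_2)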
Methods and formulas
For a practical introduction to these techniques with an emphasis on examples rather than theory,
see Bland (2000) or Sprent and Smeeton (2007). For a summary of these tests, see Snedecor and
Cochran (1989).
Methods and formulas are presented under the following headings:
signrank
signtest
signrank
Both the sign test and the Wilcoxon signed-rank test test the null hypothesis that the distribution
of a random variable D = varname − exp has median zero. The sign test makes no additional
assumptions, but the Wilcoxon signed-rank test makes the additional assumption that the distribution
of D is symmetric. If D = X_1 − X_2, where X_1 and X_2 have the same distribution, then it follows
that the distribution of D is symmetric about zero. Thus the Wilcoxon signed-rank test is often
described as a test of the hypothesis that two distributions are the same, that is, X_1 ∼ X_2.
Let d_j denote the difference for any matched pair of observations,
\[ d_j = x_{1j} - x_{2j} = \mathit{varname} - \mathit{exp} \]
for j = 1, 2, ..., n.
Rank the absolute values of the differences, |d_j|, and assign any tied values the average rank.
Consider the signs of d_j, and let
\[ r_j = \operatorname{sign}(d_j)\,\operatorname{rank}(|d_j|) \]
be the signed ranks. The test statistic is
\[ T_{\mathrm{obs}} = \sum_{j=1}^{n} r_j = (\text{sum of ranks for $+$ signs}) - (\text{sum of ranks for $-$ signs}) \]
The null hypothesis is that the distribution of d_j is symmetric about 0. Hence the likelihood is
unchanged if we flip signs on the d_j, and thus the randomization datasets are the 2^n possible sign
changes for the d_j. Thus the randomization distribution of our test statistic T can be computed by
considering all the 2^n possible values of
\[ T = \sum_{j=1}^{n} S_j r_j \]
where the r_j are the observed signed ranks (considered fixed) and S_j is either +1 or -1.
With this distribution, the mean and variance of T are given by
\[ E(T) = 0 \qquad \text{and} \qquad \operatorname{Var}_{\mathrm{adj}}(T) = \sum_{j=1}^{n} r_j^2 \]
The test statistic for the Wilcoxon signed-rank test is often expressed (equivalently) as the sum of
the positive signed ranks, T_+, where
\[ E(T_+) = \frac{n(n+1)}{4} \qquad \text{and} \qquad \operatorname{Var}_{\mathrm{adj}}(T_+) = \frac{1}{4}\sum_{j=1}^{n} r_j^2 \]
Zeros and ties do not affect the theory above, and the exact variance is still given by the above
formula for Var_adj(T_+). When d_j = 0 is observed, d_j will always be zero in each of the randomization
datasets (using sign(0) = 0). When there are ties, you can assign averaged ranks for each group of
ties and then treat them the same as the other ranks.
The "unadjusted variance" reported by signrank is the variance that the randomization distribution
would have had if there had been no ties or zeros:
\[ \operatorname{Var}_{\mathrm{unadj}}(T_+) = \frac{1}{4}\sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{24} \]
The adjustment for zeros is the change in the variance when the ranks for the zeros are signed to
make r_j = 0,
\[ \Delta\operatorname{Var}_{\mathrm{zero\ adj}}(T_+) = -\frac{1}{4}\sum_{j=1}^{n_0} j^2 = -\frac{n_0(n_0+1)(2n_0+1)}{24} \]
where n_0 is the number of zeros. The adjustment for ties is the change in the variance when the
ranks (for nonzero observations) are replaced by averaged ranks:
\[ \Delta\operatorname{Var}_{\mathrm{ties\ adj}}(T_+) = \operatorname{Var}_{\mathrm{adj}}(T_+) - \operatorname{Var}_{\mathrm{unadj}}(T_+) - \Delta\operatorname{Var}_{\mathrm{zero\ adj}}(T_+) \]
A normal approximation is used to calculate
\[ z = \frac{T_+ - E(T_+)}{\sqrt{\operatorname{Var}_{\mathrm{adj}}(T_+)}} \]
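Using the quantities displayed in Example 1 (sum of positive ranks 13.5, expectation 38.5, and adjusted variance 160.62), this formula can be checked directly; up to rounding of the displayed values, it reproduces the reported z of −1.973:
. display (13.5 - 38.5)/sqrt(160.62)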
signtest
The test statistic for the sign test is the number n_+ of differences
\[ d_j = x_{1j} - x_{2j} = \mathit{varname} - \mathit{exp} \]
greater than zero. Assuming that the probability of a difference being equal to zero is exactly zero, then,
under the null hypothesis, n_+ ∼ binomial(n, p = 1/2), where n is the total number of observations.
But what if some differences are zero? This question has a ready answer if you view the test
from the perspective of Fisher's Principle of Randomization (Fisher 1935). Fisher's idea (stated in a
modern way) was to look at a family of transformations of the observed data such that the a priori
likelihood (under the null hypothesis) of the transformed data is the same as the likelihood of the
observed data. The distribution of the test statistic is then produced by calculating its value for each
of the transformed "randomization" datasets, assuming that each dataset is equally likely.
For the sign test, the "data" are simply the set of signs of the differences. Under the null hypothesis
of the sign test, the probability that d_j is less than zero is equal to the probability that d_j is greater
than zero. Thus you can transform the observed signs by flipping any number of them, and the set of
signs will have the same likelihood. The 2^n possible sign changes form the family of randomization
datasets. If you have no zeros, this procedure again leads to n_+ ∼ binomial(n, p = 1/2).
If you do have zeros, changing their signs leaves them as zeros. So, if you observe n_0 zeros,
each of the 2^n sign-change datasets will also have n_0 zeros. Hence, the values of n_+ calculated
over the sign-change datasets range from 0 to n − n_0, and the "randomization" distribution of n_+ is
binomial(n − n_0, p = 1/2).
The work of Arbuthnott (1710) and later eighteenth-century contributions is discussed by Hald (2003,
chap. 17).
Frank Wilcoxon (1892–1965) was born in Ireland to American parents. After working in various
occupations (including merchant seaman, oil-well pump attendant, and tree surgeon), he settled in
chemistry, gaining degrees from Rutgers and Cornell and employment from various companies.
Working mainly on the development of fungicides and insecticides, Wilcoxon became interested
in statistics in 1925 and made several key contributions to nonparametric methods. After retiring
from industry, he taught statistics at Florida State until his death.
 
References
Arbuthnott, J. 1710. An argument for divine providence, taken from the constant regularity observed in the births of
both sexes. Philosophical Transactions of the Royal Society of London 27: 186–190.
Bland, M. 2000. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press.
Bradley, R. A. 2001. Frank Wilcoxon. In Statisticians of the Centuries, ed. C. C. Heyde and E. Seneta, 420–424.
New York: Springer.
Fisher, R. A. 1935. The Design of Experiments. Edinburgh: Oliver & Boyd.
Hald, A. 2003. A History of Probability and Statistics and Their Applications before 1750. New York: Wiley.
Harris, T., and J. W. Hardin. 2013. Exact Wilcoxon signed-rank and Wilcoxon Mann–Whitney ranksum tests. Stata
Journal 13: 337–343.
Kaiser, J. 2007. An exact and a Monte Carlo proposal to the Fisher–Pitman permutation tests for paired replicates
and for independent samples. Stata Journal 7: 402–412.
Newson, R. B. 2006. Confidence intervals for rank statistics: Somers' D and extensions. Stata Journal 6: 309–334.
Snedecor, G. W., and W. G. Cochran. 1989. Statistical Methods. 8th ed. Ames, IA: Iowa State University Press.
Sprent, P., and N. C. Smeeton. 2007. Applied Nonparametric Statistical Methods. 4th ed. Boca Raton, FL: Chapman
& Hall/CRC.
Sribney, W. M. 1995. crc40: Correcting for ties and zeros in sign and rank tests. Stata Technical Bulletin 26: 2–4.
Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 5–8. College Station, TX: Stata Press.
Wilcoxon, F. 1945. Individual comparisons by ranking methods. Biometrics 1: 80–83.
Also see
[R] ranksum — Equality tests on unmatched data
[R] ttest — t tests (mean-comparison tests)
Title
simulate — Monte Carlo simulations
Syntax Description Options Remarks and examples
References Also see
Syntax
simulate [exp_list] , reps(#) [options] : command

options                  Description
nodots                   suppress replication dots
noisily                  display any output from command
trace                    trace command
saving(filename, ...)    save results to filename
nolegend                 suppress table legend
verbose                  display the full table legend
seed(#)                  set random-number seed to #

All weight types supported by command are allowed; see [U] 11.1.6 weight.

exp_list contains    (name: elist)
                     elist
                     eexp
elist contains       newvar = (exp)
                     (exp)
eexp is              specname
                     [eqno]specname
specname is          _b
                     _b[]
                     _se
                     _se[]
eqno is              ##
                     name
exp is a standard Stata expression; see [U] 13 Functions and expressions.
Distinguish between [ ], which are to be typed, and [ ], which indicate optional arguments.
Description
simulate eases the programming task of performing Monte Carlo–type simulations. Typing
. simulate exp_list, reps(#): command
runs command for # replications and collects the results in exp_list.
command defines the command that performs one simulation. Most Stata commands and user-
written programs can be used with simulate, as long as they follow standard Stata syntax; see
[U] 11 Language syntax. The by prefix may not be part of command.
exp_list specifies the expressions to be calculated from the execution of command. If no expressions
are given, exp_list assumes a default, depending upon whether command changes results in e() or
r(). If command changes results in e(), the default is _b. If command changes results in r() (but
not e()), the default is all the scalars posted to r(). It is an error not to specify an expression in
exp_list otherwise.
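As an illustration of this default behavior (a sketch using the rclass program lnsim defined in Example 1 below, which leaves the scalars r(mean) and r(Var) behind and does not change e()), typing simulate with no expression list should collect both scalars, although the named-expression form used in Example 1 is usually clearer:
. simulate, reps(1000): lnsim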
Options
reps(#) is required; it specifies the number of replications to be performed.
nodots suppresses display of the replication dots. By default, one dot character is displayed for each
successful replication. A red ‘x’ is displayed if command returns an error or if one of the values
in exp list is missing.
noisily requests that any output from command be displayed. This option implies the nodots
option.
trace causes a trace of the execution of command to be displayed. This option implies the noisily
option.
saving(filename, suboptions) creates a Stata data file (.dta file) consisting of (for each statistic
in exp_list) a variable containing the simulated values.
double specifies that the results for each replication be saved as doubles, meaning 8-byte reals.
By default, they are saved as floats, meaning 4-byte reals.
every(#) specifies that results be written to disk every #th replication. every() should be specified
only in conjunction with saving() when command takes a long time for each replication.
This will allow recovery of partial results should some other software crash your computer.
See [P] postfile.
replace specifies that filename be overwritten if it exists.
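As a sketch of how these suboptions combine (using the lnsim simulator defined in Example 1 below; the filename simres is arbitrary), the following saves the replication results as doubles, writes them to disk every 100 replications, and overwrites any existing file of that name:
. simulate mean=r(mean) var=r(Var), reps(10000) saving(simres, double every(100) replace): lnsim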
nolegend suppresses display of the table legend. The table legend identifies the rows of the table
with the expressions they represent.
verbose requests that the full table legend be displayed. By default, coefficients and standard errors
are not displayed.
seed(#) sets the random-number seed. Specifying this option is equivalent to typing the following
command before calling simulate:
. set seed #
Remarks and examples
For an introduction to Monte Carlo methods, see Cameron and Trivedi (2010, chap. 4). White (2010)
provides a command for analyzing results of simulation studies.
Example 1: Simulating basic summary statistics
We want to construct a dataset containing the means and variances of 100-observation samples
drawn from a lognormal distribution (as a first step in evaluating, say, the coverage of a 95%,
t-based confidence interval). We will perform the experiment 1,000 times.
The following command definition will generate 100 independent observations from a lognormal
distribution and compute the summary statistics for this sample.
program lnsim, rclass
version 13
drop _all
set obs 100
gen z = exp(rnormal())
summarize z
return scalar mean = r(mean)
return scalar Var = r(Var)
end
We can save 1,000 simulated means and variances from lnsim by typing
. set seed 1234
. simulate mean=r(mean) var=r(Var), reps(1000) nodots: lnsim
command: lnsim
mean: r(mean)
var: r(Var)
. describe *
storage display value
variable name type format label variable label
mean float %9.0g r(mean)
var float %9.0g r(Var)
. summarize
Variable Obs Mean Std. Dev. Min Max
mean 1000 1.638466 .214371 1.095099 2.887392
var 1000 4.63856 6.428406 .8626 175.3746
Technical note
Before executing our lnsim simulator, we can verify that it works by executing it interactively.
. set seed 1234
. lnsim
obs was 0, now 100
Variable Obs Mean Std. Dev. Min Max
z 100 1.597757 1.734328 .0625807 12.71548
. return list
scalars:
r(Var) = 3.007893773683719
r(mean) = 1.59775722913444
Example 2: Simulating a regression model
Consider a more complicated problem. Let's experiment with fitting y_j = a + b x_j + u_j when the
true model has a = 1, b = 2, u_j = z_j + c x_j, and z_j is N(0, 1). We will save the parameter
estimates and standard errors and experiment with varying c. x_j will be fixed across experiments but
will originally be generated as N(0, 1). We begin by interactively making the true data:
. drop _all
. set obs 100
obs was 0, now 100
. set seed 54321
. gen x = rnormal()
. gen true_y = 1+2*x
. save truth
file truth.dta saved
Our program is
program hetero1
        version 13
        args c
        use truth, clear
        gen y = true_y + (rnormal() + `c'*x)
        regress y x
end
Note the use of `c' in our statement for generating y. c is a local macro generated from args c and
thus refers to the first argument supplied to hetero1. If we want c = 3 for our experiment, we type
. simulate _b _se, reps(10000): hetero1 3
(output omitted )
Our program hetero1 could, however, be more efficient because it rereads the file truth once
every replication. It would be better if we could read the data just once. In fact, if we read in the data
right before running simulate, we really should not have to reread for each subsequent replication.
A faster version reads
program hetero2
        version 13
        args c
        capture drop y
        gen y = true_y + (rnormal() + `c'*x)
        regress y x
end
Requiring that the current dataset has the variables true_y and x may become inconvenient.
Another improvement would be to require that the user supply variable names, such as in
program hetero3
        version 13
        args truey x c
        capture drop y
        gen y = `truey' + (rnormal() + `c'*`x')
        regress y x
end
Thus we can type
. simulate _b _se, reps(10000): hetero3 true_y x 3
(output omitted )
Example 3: Simulating a ratio of statistics
Now let's consider the problem of simulating the ratio of two medians. Suppose that each sample
of size n_i comes from a normal population with mean µ_i and standard deviation σ_i, where i = 1, 2.
We write the program below and save it as a text file called myratio.ado (see [U] 17 Ado-files).
Our program is an rclass command that requires six arguments as input, identified by the local
macros n1, mu1, sigma1, n2, mu2, and sigma2, which correspond to n_1, µ_1, σ_1, n_2, µ_2, and
σ_2, respectively. With these arguments, myratio will generate the data for the two samples, use
summarize to compute the two medians, and store the ratio of the medians in r(ratio).
program myratio, rclass
        version 13
        args n1 mu1 sigma1 n2 mu2 sigma2
        // generate the data
        drop _all
        local N = `n1'+`n2'
        set obs `N'
        tempvar y
        generate `y' = rnormal()
        replace `y' = cond(_n<=`n1',`mu1'+`y'*`sigma1',`mu2'+`y'*`sigma2')
        // calculate the medians
        tempname m1
        summarize `y' if _n<=`n1', detail
        scalar `m1' = r(p50)
        summarize `y' if _n>`n1', detail
        // store the results
        return scalar ratio = `m1' / r(p50)
end
The result of running our simulation is
. set seed 19192
. simulate ratio=r(ratio), reps(1000) nodots: myratio 5 3 1 10 3 2
command: myratio 5 3 1 10 3 2
ratio: r(ratio)
. summarize
Variable Obs Mean Std. Dev. Min Max
ratio 1000 1.08571 .4427828 .3834799 6.742217
Technical note
Stata lets us do simulations of simulations and simulations of bootstraps. Stata's bootstrap
command (see [R] bootstrap) works much like simulate, except that it feeds the user-written
program a bootstrap sample. Say that we want to evaluate the bootstrap estimator of the standard
error of the median when applied to lognormally distributed data. We want to perform a simulation,
resulting in a dataset of medians and bootstrap estimated standard errors.
As background, summarize (see [R] summarize) calculates summary statistics, leaving the mean
in r(mean) and the standard deviation in r(sd). summarize with the detail option also calculates
summary statistics, but more of them, and leaves the median in r(p50).
Thus our plan is to perform simulations by randomly drawing a dataset: we calculate the median
of our random sample, we use bootstrap to obtain a dataset of medians calculated from bootstrap
samples of our random sample, the standard deviation of those medians is our estimate of the standard
error, and the summary statistics are stored in the results of summarize.
Our simulator is
program define bsse, rclass
version 13
drop _all
set obs 100
gen x = rnormal()
tempfile bsfile
bootstrap midp=r(p50), rep(100) saving(`bsfile'): summarize x, detail
use `bsfile', clear
summarize midp
return scalar mean = r(mean)
return scalar sd = r(sd)
end
We can obtain final results, running our simulation 1,000 times, by typing
. set seed 48901
. simulate med=r(mean) bs_se=r(sd), reps(1000): bsse
command: bsse
med: r(mean)
bs_se: r(sd)
Simulations (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
.................................................. 250
.................................................. 300
.................................................. 350
.................................................. 400
.................................................. 450
.................................................. 500
.................................................. 550
.................................................. 600
.................................................. 650
.................................................. 700
.................................................. 750
.................................................. 800
.................................................. 850
.................................................. 900
.................................................. 950
.................................................. 1000
. summarize
Variable Obs Mean Std. Dev. Min Max
med 1000 -.0008696 .1210451 -.3132536 .4058724
bs_se 1000 .126236 .029646 .0326791 .2596813
This is a case where the simulation dots (drawn by default, unless the nodots option is specified)
will give us an idea of how long this simulation will take to finish as it runs.
References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Gould, W. W. 1994. ssi6.1: Simplified Monte Carlo simulations. Stata Technical Bulletin 20: 22–24. Reprinted in
Stata Technical Bulletin Reprints, vol. 4, pp. 207–210. College Station, TX: Stata Press.
Hamilton, L. C. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hilbe, J. M. 2010. Creating synthetic discrete-response regression models. Stata Journal 10: 104–124.
Weesie, J. 1998. ip25: Parameterized Monte Carlo simulations: Enhancement to the simulation command. Stata
Technical Bulletin 43: 13–15. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 75–77. College Station,
TX: Stata Press.
White, I. R. 2010. simsum: Analyses of simulation studies including Monte Carlo error. Stata Journal 10: 369–385.
Also see
[R] bootstrap — Bootstrap sampling and estimation
[R] jackknife — Jackknife estimation
[R] permute — Monte Carlo permutation tests
Title
sj — Stata Journal and STB installation instructions
Description Remarks and examples Also see
Description
The Stata Journal (SJ) is a quarterly journal containing articles about statistics, data analysis,
teaching methods, and effective use of Stata’s language. The SJ publishes reviewed papers together
with shorter notes and comments, regular columns, tips, book reviews, and other material of interest
to researchers applying statistics in a variety of disciplines. You can read all about the Stata Journal
at http://www.stata-journal.com.
The Stata Journal is a printed and electronic journal with corresponding software. If you want
the journal, you must subscribe, but the software is available for no charge from our website at
http://www.stata-journal.com. PDF copies of SJ articles that are older than three years are available
for download for no charge at http://www.stata-journal.com/archives.html. More recent articles may
be individually purchased.
The predecessor to the Stata Journal was the Stata Technical Bulletin (STB). The STB was
also a printed and electronic journal with corresponding software. PDF copies of all STB journals
are available for download for no charge at http://www.stata-press.com/journals/stbj.html. The STB
software is available for no charge from our website at http://www.stata.com.
Below are instructions for installing the Stata Journal and the Stata Technical Bulletin software
from our website.
Remarks and examples
Remarks are presented under the following headings:
Installing the Stata Journal software
Obtaining from the Internet by pointing and clicking
Obtaining from the Internet via command mode
Installing the STB software
Obtaining from the Internet by pointing and clicking
Obtaining from the Internet via command mode
Installing the Stata Journal software
Each issue of the Stata Journal is labeled Volume #, Number #. Volume 1 refers to the first year
of publication, Volume 2 to the second, and so on. Issues are numbered 1, 2, 3, and 4 within each
year. The first issue of the Journal was published in the fourth quarter of 2001, and that issue is
numbered Volume 1, Number 1. For installation purposes, we refer to this issue as sj1-1.
The articles, columns, notes, and comments that make up the Stata Journal are assigned a letter-
and-number code, called an insert tag, such as st0001, an0034, or ds0011. The letters represent a
category: st is the statistics category, an is the announcements category, etc. The numbers are assigned
sequentially, so st0001 is the first article in the statistics category.
Sometimes inserts are subsequently updated, either to fix bugs or to add new features. A number
such as st0001 1 indicates that this article, column, note, or comment is an update to the original
st0001 article. Updates are complete; that is, installing st0001 1 provides all the features of the
original article and more.
The Stata Journal software may be obtained by pointing and clicking or by using command mode.
The sections below detail how to install an insert. In all cases, pretend that you wish to install
insert st0274 from sj12-4.
Obtaining from the Internet by pointing and clicking
1. Select Help > SJ and User-written Programs.
2. Click on Stata Journal.
3. Click on sj12-4.
4. Click on st0274.
5. Click on (click here to install).
Obtaining from the Internet via command mode
Type the following:
. net from http://www.stata-journal.com/software
. net cd sj12-4
. net describe st0274
. net install st0274
The above could be shortened to
. net from http://www.stata-journal.com/software/sj12-4
. net describe st0274
. net install st0274
Alternatively, you could type
. net sj 12-4
. net describe st0274
. net install st0274
but going about it the long way is more entertaining, at least the first time.
Installing the STB software
Each issue of the STB is numbered. STB-1 refers to the first issue (published May 1991), STB-2
refers to the second (published July 1991), and so on.
An issue of the STB consists of inserts (articles), and these are assigned letter-and-number
combinations, such as sg84, dm80, sbe26.1, etc. The letters represent a category; for example, sg
is the general statistics category and dm the data management category. The numbers are assigned
sequentially, so sbe39 is the 39th insert in the biostatistics and epidemiology series.
Insert sbe39, it turns out, provides a method of accounting for publication bias in meta-analysis; it
adds a new command called metatrim to Stata. If you installed sbe39, you would have that command
and its help file. Insert sbe39 was published in STB-57 (September 2000). Obtaining metatrim simply
requires going to STB-57 and getting sbe39.
Sometimes inserts were subsequently updated, either to fix bugs or to add new features. sbe39 was
updated: the first update is sbe39.1 and the second is sbe39.2. You could install insert sbe39.2, and
it would not matter whether you had previously installed sbe39.1. Updates are complete: installing
sbe39.2 provides all the features of the original insert and more.
For computer naming purposes, insert sbe39.2 is referred to as sbe39_2. When referred to in
normal text, however, the insert is still called sbe39.2 because that looks nicer.
Inserts are easily available from the Internet. Inserts may be obtained by pointing and clicking or
by using command mode.
The sections below detail how to install an insert. In all cases, pretend that you wish to install
insert sbe39.2 from STB-61.
Obtaining from the Internet by pointing and clicking
1. Select Help > SJ and User-written Programs.
2. Click on STB.
3. Click on stb61.
4. Click on sbe39 2.
5. Click on (click here to install).
Obtaining from the Internet via command mode
Type the following:
. net from http://www.stata.com
. net cd stb
. net cd stb61
. net describe sbe39_2
. net install sbe39_2
The above could be shortened to
. net from http://www.stata.com/stb/stb61
. net describe sbe39_2
. net install sbe39_2
but going about it the long way is more entertaining, at least the first time.
Also see
[R] search — Search Stata documentation and other resources
[R] net — Install and manage user-written additions from the Internet
[R] net search — Search the Internet for installable packages
[R] update — Check for official updates
[U] 3.5 The Stata Journal
[U] 28 Using the Internet to keep up to date
[GSM] 19 Updating and extending Stata—Internet functionality
[GSU] 19 Updating and extending Stata—Internet functionality
[GSW] 19 Updating and extending Stata—Internet functionality
Title
sktest — Skewness and kurtosis test for normality
Syntax Menu Description Option
Remarks and examples Stored results Methods and formulas Acknowledgments
References Also see
Syntax
sktest varlist [if] [in] [weight] [, noadjust]
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Distributional plots and tests >Skewness and kurtosis normality test
Description
For each variable in varlist,sktest presents a test for normality based on skewness and another
based on kurtosis and then combines the two tests into an overall test statistic. sktest requires a
minimum of 8 observations to make its calculations. See [MV] mvtest normality for multivariate
tests of normality.
Option
 
Main
noadjust suppresses the empirical adjustment made by Royston (1991c) to the overall χ² and
its significance level and presents the unaltered test as described by D'Agostino, Belanger, and
D'Agostino (1990).
Remarks and examples
Also see [R] swilk for the Shapiro–Wilk and Shapiro–Francia tests for normality. Those tests are,
in general, preferred for nonaggregated data (Gould and Rogers 1991; Gould 1992; Royston 1991c).
Moreover, a normal quantile plot should be used with any test for normality; see [R] diagnostic plots
for more information.
Example 1
Using our automobile dataset, we will test whether the variables mpg and trunk are normally
distributed:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sktest mpg trunk
Skewness/Kurtosis tests for Normality
joint
Variable Obs Pr(Skewness) Pr(Kurtosis) adj chi2(2) Prob>chi2
mpg 74 0.0015 0.0804 10.95 0.0042
trunk 74 0.9115 0.0445 4.19 0.1228
We can reject the hypothesis that mpg is normally distributed, but we cannot reject the hypothesis
that trunk is normally distributed, at least at the 12% level. The kurtosis for trunk is 2.19, as can
be verified by issuing the command
. summarize trunk, detail
(output omitted )
and the p-value of 0.0445 shown in the table above indicates that it is significantly different from
the kurtosis of a normal distribution at the 5% significance level. However, on the basis of skewness
alone, we cannot reject the hypothesis that trunk is normally distributed.
Technical note
sktest implements the test as described by D'Agostino, Belanger, and D'Agostino (1990) but with
the adjustment made by Royston (1991c). In the above example, if we had specified the noadjust
option, the χ² values would have been 13.13 for mpg and 4.05 for trunk. With the adjustment, the
χ² value might show as '.'. This result should be interpreted as an absurdly large number; the data
are most certainly not normal.
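To see the unadjusted values for yourself, rerun the test with the option (output omitted here):
. sktest mpg trunk, noadjust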
Stored results
sktest stores the following in r():
Scalars
r(chi2)      χ²
r(P_skew)    Pr(skewness)
r(P_kurt)    Pr(kurtosis)
r(P_chi2)    Prob > chi2
Matrices
r(N)         matrix of observations
r(Utest)     matrix of test results, one row per variable
Methods and formulas
sktest implements the test described by D’Agostino, Belanger, and D’Agostino (1990) with the
empirical correction developed by Royston (1991c).
Let g_1 denote the coefficient of skewness and b_2 denote the coefficient of kurtosis as calculated
by summarize, and let n denote the sample size. If weights are specified, then g_1, b_2, and n
denote the weighted coefficients of skewness and kurtosis and the weighted sample size, respectively.
See [R] summarize for the formulas for skewness and kurtosis.
To perform the test of skewness, we compute
\[ Y = g_1 \left\{ \frac{(n+1)(n+3)}{6(n-2)} \right\}^{1/2} \qquad\qquad
   \beta_2(g_1) = \frac{3(n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)} \]
\[ W^2 = -1 + \bigl[\, 2\{\beta_2(g_1) - 1\} \,\bigr]^{1/2} \qquad \text{and} \qquad
   \alpha = \bigl\{ 2/(W^2 - 1) \bigr\}^{1/2} \]
Then the distribution of the test statistic
\[ Z_1 = \frac{1}{\sqrt{\ln W}}\, \ln\!\left[\, Y/\alpha + \left\{ (Y/\alpha)^2 + 1 \right\}^{1/2} \right] \]
is approximately standard normal under the null hypothesis that the data are distributed normally.
To perform the test of kurtosis, we compute
\[ E(b_2) = \frac{3(n-1)}{n+1} \qquad\qquad
   \operatorname{var}(b_2) = \frac{24n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)} \qquad\qquad
   X = \frac{b_2 - E(b_2)}{\sqrt{\operatorname{var}(b_2)}} \]
\[ \sqrt{\beta_1(b_2)} = \frac{6(n^2 - 5n + 2)}{(n+7)(n+9)} \left\{ \frac{6(n+3)(n+5)}{n(n-2)(n-3)} \right\}^{1/2} \qquad \text{and} \qquad
   A = 6 + \frac{8}{\sqrt{\beta_1(b_2)}} \left[ \frac{2}{\sqrt{\beta_1(b_2)}} + \left\{ 1 + \frac{4}{\beta_1(b_2)} \right\}^{1/2} \right] \]
Then the distribution of the test statistic
\[ Z_2 = \frac{ \left( 1 - \dfrac{2}{9A} \right) - \left\{ \dfrac{1 - 2/A}{1 + X\sqrt{2/(A-4)}} \right\}^{1/3} }{ \sqrt{2/(9A)} } \]
is approximately standard normal under the null hypothesis that the data are distributed normally.
D'Agostino, Belanger, and D'Agostino Jr.'s omnibus test of normality uses the statistic
\[ K^2 = Z_1^2 + Z_2^2 \]
which has approximately a χ² distribution with 2 degrees of freedom under the null of normality.
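These calculations can be carried out by hand from the results of summarize, detail. The following is a minimal sketch, assuming the auto dataset from Example 1 is still in memory and using arbitrary scalar names; up to rounding, it should approximately reproduce the unadjusted χ² of 13.13 reported for mpg in the technical note above (sktest itself then applies the Royston adjustment described next):
. quietly summarize mpg, detail
. scalar n  = r(N)
. scalar g1 = r(skewness)
. scalar b2 = r(kurtosis)
. scalar Y  = g1*sqrt((n+1)*(n+3)/(6*(n-2)))
. scalar B2 = 3*(n^2+27*n-70)*(n+1)*(n+3)/((n-2)*(n+5)*(n+7)*(n+9))
. scalar W2 = -1 + sqrt(2*(B2-1))
. scalar a  = sqrt(2/(W2-1))
. scalar Z1 = ln(Y/a + sqrt((Y/a)^2+1))/sqrt(ln(sqrt(W2)))
. scalar X  = (b2 - 3*(n-1)/(n+1))/sqrt(24*n*(n-2)*(n-3)/((n+1)^2*(n+3)*(n+5)))
. scalar sb = 6*(n^2-5*n+2)/((n+7)*(n+9))*sqrt(6*(n+3)*(n+5)/(n*(n-2)*(n-3)))
. scalar A  = 6 + (8/sb)*(2/sb + sqrt(1 + 4/sb^2))
. scalar Z2 = ((1 - 2/(9*A)) - ((1-2/A)/(1 + X*sqrt(2/(A-4))))^(1/3))/sqrt(2/(9*A))
. display "K2 = " (Z1^2 + Z2^2) "   Prob > chi2 = " chi2tail(2, Z1^2 + Z2^2)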
Royston (1991c) proposed the following adjustment to the test of normality, which sktest uses
by default. Let Φ(x) denote the cumulative standard normal distribution function for x, and let
Φ⁻¹(p) denote the inverse cumulative standard normal function [that is, x = Φ⁻¹{Φ(x)}]. Define
the following terms:
\[ Z_c = -\Phi^{-1}\!\left\{ \exp\!\left( -\tfrac{1}{2} K^2 \right) \right\} \qquad\qquad Z_t = 0.55\, n^{0.2} - 0.21 \]
\[ a_1 = (-5 + 3.46 \ln n)\, \exp(-1.37 \ln n) \qquad\qquad b_1 = 1 + (0.854 - 0.148 \ln n)\, \exp(-0.55 \ln n) \]
\[ a_2 = a_1 - \{ 2.13/(1 - 2.37 \ln n) \}\, Z_t \qquad \text{and} \qquad b_2 = 2.13/(1 - 2.37 \ln n) + b_1 \]
If Z_c < -1, set Z = Z_c; else if Z_c < Z_t, set Z = a_1 + b_1 Z_c; else set Z = a_2 + b_2 Z_c. Define
P = 1 - Φ(Z). Then K² = -2 ln P is approximately distributed as χ² with 2 degrees of freedom.
The relative merits of the skewness and kurtosis test versus the Shapiro–Wilk and Shapiro–Francia
tests have been a subject of debate. The interested reader is directed to the articles in the Stata Technical
Bulletin. Our recommendation is to use the Shapiro–Francia test whenever possible, that is, whenever
dealing with nonaggregated or ungrouped data (Gould and Rogers 1991; Gould 1992); see [R] swilk.
If normality is rejected, use sktest to determine the source of the problem.
As both D'Agostino, Belanger, and D'Agostino (1990) and Royston (1991d) mention, researchers
should also examine the normal quantile plot to determine normality rather than blindly relying on a
few test statistics. See the qnorm command documented in [R] diagnostic plots for more information
on normal quantile plots.
sktest is similar in spirit to the Jarque–Bera (1987) test of normality. The Jarque–Bera test
statistic is also calculated from the sample skewness and kurtosis, though it is based on asymptotic
standard errors with no corrections for sample size. In effect, sktest offers two adjustments for
sample size, that of Royston (1991c) and that of D'Agostino, Belanger, and D'Agostino (1990).
Acknowledgments
sktest has benefited greatly by the comments and work of Patrick Royston of the MRC Clinical
Trials Unit, London, and coauthor of the Stata Press book Flexible Parametric Survival Analysis
Using Stata: Beyond the Cox Model. At this point, the program should be viewed as due as much to
Royston as to us, except, of course, for any errors. We are also indebted to Nicholas J. Cox of the
Department of Geography at Durham University, UK, and coeditor of the Stata Journal for his helpful
comments.
References
D’Agostino, R. B., A. J. Belanger, and R. B. D’Agostino, Jr. 1990. A suggestion for using powerful and informative
tests of normality. American Statistician 44: 316–321.
D'Agostino, R. B., A. J. Belanger, and R. B. D'Agostino, Jr. 1991. sg3.3: Comment on tests of normality. Stata
Technical Bulletin 3: 20. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 105–106. College Station, TX: Stata Press.
Gould, W. W. 1991. sg3: Skewness and kurtosis tests of normality. Stata Technical Bulletin 1: 20–21. Reprinted in
Stata Technical Bulletin Reprints, vol. 1, pp. 99–101. College Station, TX: Stata Press.
Gould, W. W. 1992. sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21. Reprinted
in Stata Technical Bulletin Reprints, vol. 2, pp. 137–139. College Station, TX: Stata Press.
Gould, W. W., and W. H. Rogers. 1991. sg3.4: Summary of tests of normality. Stata Technical Bulletin 3: 20–23.
Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 106–110. College Station, TX: Stata Press.
Jarque, C. M., and A. K. Bera. 1987. A test for normality of observations and regression residuals. International
Statistical Review 2: 163–172.
Marchenko, Y. V., and M. G. Genton. 2010. A suite of commands for fitting the skew-normal and skew-t models.
Stata Journal 10: 507–539.
Royston, P. 1991a. sg3.1: Tests for departure from normality. Stata Technical Bulletin 2: 16–17. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, pp. 101–104. College Station, TX: Stata Press.
Royston, P. 1991b. sg3.2: Shapiro–Wilk and Shapiro–Francia tests. Stata Technical Bulletin 3: 19. Reprinted in Stata
Technical Bulletin Reprints, vol. 1, p. 105. College Station, TX: Stata Press.
Royston, P. 1991c. sg3.5: Comment on sg3.4 and an improved D'Agostino test. Stata Technical Bulletin 3: 23–24. Reprinted
in Stata Technical Bulletin Reprints, vol. 1, pp. 110–112. College Station, TX: Stata Press.
Royston, P. 1991d. sg3.6: A response to sg3.3: Comment on tests of normality. Stata Technical Bulletin 4: 8–9. Reprinted
in Stata Technical Bulletin Reprints, vol. 1, pp. 112–114. College Station, TX: Stata Press.
Also see
[R] diagnostic plots — Distributional diagnostic plots
[R] ladder — Ladder of powers
[R] lv — Letter-value displays
[R] swilk — Shapiro–Wilk and Shapiro–Francia tests for normality
[MV] mvtest normality — Multivariate normality tests
Title
slogit — Stereotype logistic regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
slogit depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
  dimension(#)           dimension of the model; default is dimension(1)
  baseoutcome(# | lbl)   set the base outcome to # or lbl; default is the last outcome
  constraints(numlist)   apply specified linear constraints
  collinear              keep collinear variables
  nocorner               do not generate the corner constraints
SE/Robust
  vce(vcetype)           vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                           or jackknife
Reporting
  level(#)               set confidence level; default is level(95)
  nocnsreport            do not display constraints
  display_options        control column formats, row spacing, line width, display of omitted
                           variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options       control the maximization process; seldom used
  initialize(initype)    method of initializing scale parameters; initype can be constant,
                           random, or svd; see Options for details
  nonormalize            do not normalize the numeric variables
  coeflegend             display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Categorical outcomes >Stereotype logistic regression
Description
slogit fits maximum-likelihood stereotype logistic regression models as developed by Anderson (1984).
Like multinomial logistic and ordered logistic models, stereotype logistic models are for
use with categorical dependent variables. In a multinomial logistic model, the categories cannot be
ranked, whereas in an ordered logistic model the categories follow a natural ranking scheme. You
can view stereotype logistic models as a compromise between those two models. You can use them
when you are unsure of the relevance of the ordering, as is often the case when subjects are asked to
assess or judge something. You can also use them in place of multinomial logistic models when you
suspect that some of the alternatives are similar. Unlike ordered logistic models, stereotype logistic
models do not impose the proportional-odds assumption.
Options
 
Model
dimension(#) specifies the dimension of the model, which is the number of equations required
to describe the relationship between the dependent variable and the independent variables. The
maximum dimension is min(m − 1, p), where m is the number of categories of the dependent
variable and p is the number of independent variables in the model. The stereotype model with
maximum dimension is a reparameterization of the multinomial logistic model.
baseoutcome(# | lbl) specifies the outcome level whose scale parameters and intercept are constrained
to be zero. The base outcome may be specified as a number or a label. By default, slogit assumes
that the outcome levels are ordered and uses the largest level of the dependent variable as the base
outcome.
constraints(numlist), collinear; see [R] estimation options.
By default, the linear equality constraints suggested by Anderson (1984), termed the corner
constraints, are generated for you. You can add constraints to these as needed, or you can turn off
the corner constraints by specifying nocorner. These constraints are in addition to the constraints
placed on the φ parameters corresponding to baseoutcome(#).
nocorner specifies that slogit not generate the corner constraints. If you specify nocorner, you
must specify at least dimension() × dimension() constraints for the model to be identified.
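For instance, in the one-dimensional model fit in Example 1 below, dimension() × dimension() = 1, so supplying the single corner constraint yourself and turning off the automatic constraints should be equivalent to the default parameterization (a sketch only; the constraint number 2 is arbitrary):
. constraint define 2 [phi1_1]_cons = 1
. slogit repair foreign mpg price gratio, nocorner constraints(2)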
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
If specifying vce(bootstrap) or vce(jackknife), you must also specify baseoutcome().
Reporting
level(#); see [R] estimation options.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
initialize(constant | random | svd) specifies how initial estimates are computed. The default,
initialize(constant), is to set the scale parameters to the constant min(1/2, 1/d), where d
is the dimension specified in dimension().
initialize(random) requests that uniformly distributed random numbers between 0 and 1 be
used as initial values for the scale parameters. If you specify this option, you should also use
set seed to ensure that you can replicate your results; see [R] set seed.
initialize(svd) requests that a singular value decomposition (SVD) be performed on the
matrix of regression estimates from mlogit to reduce its rank to the dimension specified in
dimension(). slogit uses the reduced-rank components of the SVD as initial estimates for
the scale and regression coefficients. For details, see Methods and formulas.
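For instance, to fit the model from Example 1 below with randomly drawn starting values for the scale parameters in a reproducible way, you could type (a sketch; the seed value is arbitrary):
. set seed 1234
. slogit repair foreign mpg price gratio, initialize(random)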
nonormalize specifies that the numeric variables not be normalized. Normalization of the numeric
variables improves numerical stability but consumes more memory in generating temporary double-
precision variables. Variables that are of type byte are not normalized, and if initial estimates are
specified using the from() option, normalization of variables is not performed. See Methods and
formulas for more information.
The following option is available with slogit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
One-dimensional model
Higher-dimension models
Introduction
Stereotype logistic models are often used when subjects are requested to assess or judge something.
For example, consider a survey in which consumers may be asked to rate the quality of a product on
a scale from 1 to 5, with 1 indicating poor quality and 5 indicating excellent quality. If the categories
are monotonically related to an underlying latent variable, the ordered logistic model is appropriate.
However, suppose that consumers assess quality not just along one dimension, but rather weigh
two or three latent factors. Stereotype logistic regression allows you to specify multiple equations
to capture the effects of those latent variables, which you then parameterize in terms of observable
characteristics. Unlike with multinomial logit, the number of equations you specify could be less than
m − 1, where m is the number of categories of the dependent variable.
Stereotype logistic models are also used when categories may be indistinguishable. Suppose that
a consumer must choose among A, B, C, or D. Multinomial logistic modeling assumes that the
four choices are distinct in the sense that a consumer choosing one of the goods can distinguish its
characteristics from the others. If goods A and B are in fact similar, consumers may be randomly
picking between the two. One alternative is to combine the two categories and fit a three-category
multinomial logistic model. A more flexible alternative is to use a stereotype logistic model.
In the multinomial logistic model, you estimate m − 1 parameter vectors β̃_k, k = 1, ..., m − 1,
where m is the number of categories of the dependent variable. The stereotype logistic model is
a restriction on the multinomial model in the sense that there are d parameter vectors, where d
is between one and min(m − 1, p), and p is the number of regressors. The relationship between
the stereotype model's coefficients β_j, j = 1, ..., d, and the multinomial model's coefficients is
\[ \widetilde{\boldsymbol\beta}_k = -\sum_{j=1}^{d} \phi_{jk}\, \boldsymbol\beta_j \]
The φs are scale parameters to be estimated along with the β_j's.
Given a row vector of covariates x, let
\[ \eta_k = \theta_k - \sum_{j=1}^{d} \phi_{jk}\, \mathbf{x}\boldsymbol\beta_j \]
The probability of observing outcome k is
\[ \Pr(Y_i = k) =
   \begin{cases}
   \dfrac{\exp(\eta_k)}{1 + \sum_{l=1}^{m-1} \exp(\eta_l)} & k < m \\[2ex]
   \dfrac{1}{1 + \sum_{l=1}^{m-1} \exp(\eta_l)} & k = m
   \end{cases} \]
This model includes a set of θ parameters so that each equation has an unrestricted constant term.
If d = m − 1, the stereotype model is just a reparameterization of the multinomial logistic model.
To identify the φs and the βs, you must place at least d² restrictions on the parameters. By default,
slogit uses the "corner constraints" φ_jj = 1 and φ_jk = 0 for j ≠ k, k ≤ d, and j ≤ d.
For a discussion of the stereotype logistic model, see Lunt (2005).
One-dimensional model
Example 1
We have 2 years of repair rating data on the make, price, mileage rating, and gear ratio of 104
foreign and 44 domestic automobiles (with 13 missing values on repair rating). We wish to fit a
stereotype logistic model to discriminate between the levels of repair rating using mileage, price, gear
ratio, and origin of the manufacturer. Here is an overview of our data:
. use http://www.stata-press.com/data/r13/auto2yr
(Automobile Models)
. tabulate repair
Repair
rating Freq. Percent Cum.
Poor 5 3.70 3.70
Fair 19 14.07 17.78
Average 57 42.22 60.00
Good 38 28.15 88.15
Excellent 16 11.85 100.00
Total 135 100.00
The variable repair can take five values, 1, ..., 5, which represent the subjective rating of the car
model's repair record as Poor, Fair, Average, Good, and Excellent.
We wish to fit the one-dimensional stereotype logistic model

    η_k = θ_k − φ_k(β_1 foreign + β_2 mpg + β_3 price + β_4 gratio)

for k < 5 and η_5 = 0. To fit this model, we type
. slogit repair foreign mpg price gratio
Iteration 0: log likelihood = -173.78178 (not concave)
Iteration 1: log likelihood = -164.77316
Iteration 2: log likelihood = -161.7069
Iteration 3: log likelihood = -159.76138
Iteration 4: log likelihood = -159.34327
Iteration 5: log likelihood = -159.25914
Iteration 6: log likelihood = -159.25691
Iteration 7: log likelihood = -159.25691
Stereotype logistic regression Number of obs = 135
Wald chi2(4) = 9.33
Log likelihood = -159.25691 Prob > chi2 = 0.0535
( 1) [phi1_1]_cons = 1
repair Coef. Std. Err. z P>|z| [95% Conf. Interval]
foreign 5.947382 2.094126 2.84 0.005 1.84297 10.05179
mpg .1911968 .08554 2.24 0.025 .0235414 .3588521
price -.0000576 .0001357 -0.42 0.671 -.0003236 .0002083
gratio -4.307571 1.884713 -2.29 0.022 -8.00154 -.6136017
/phi1_1 1 (constrained)
/phi1_2 1.262268 .3530565 3.58 0.000 .5702904 1.954247
/phi1_3 1.17593 .3169397 3.71 0.000 .5547394 1.79712
/phi1_4 .8657195 .2411228 3.59 0.000 .3931275 1.338311
/phi1_5 0 (base outcome)
/theta1 -6.864749 4.21252 -1.63 0.103 -15.12114 1.391639
/theta2 -7.613977 4.861803 -1.57 0.117 -17.14294 1.914981
/theta3 -5.80655 4.987508 -1.16 0.244 -15.58189 3.968786
/theta4 -3.85724 3.824132 -1.01 0.313 -11.3524 3.637922
/theta5 0 (base outcome)
(repair=Excellent is the base outcome)
The coefficient associated with the first scale parameter, φ_11, is 1, and its standard error and other
statistics are missing. This is the corner constraint applied to the one-dimensional model; in the header,
this constraint is listed as [phi1_1]_cons = 1. Also, the φ and θ parameters that are associated
with the base outcome are identified. Keep in mind, though, that there are no coefficient estimates
for [phi1_5]_cons or [theta5]_cons in the ereturn matrix e(b). The Wald statistic is for a
test of the joint significance of the regression coefficients on foreign, mpg, price, and gratio.
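The names that slogit gives these parameters in e(b) are the ones used with test and constraint
define below. If you are unsure of a name, the coefficient legend can be displayed by replaying the
results; a minimal sketch, assuming the model above has just been fit:
. slogit, coeflegend
(output omitted )
The legend lists names such as [phi1_2]_cons and [theta2]_cons exactly as they must be typed in
later commands.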
The one-dimensional stereotype model restricts the multinomial logistic regression coefficients
β̃_k, k = 1, ..., m − 1, to be parallel; that is, β̃_k = −φ_k β. As Lunt (2001) discusses, in the
one-dimensional stereotype model, one linear combination x_i β best discriminates the outcomes of
the dependent variable, and the scale parameters φ_k measure the distance between the outcome levels
and the linear predictor. If φ_1 ≥ φ_2 ≥ ··· ≥ φ_{m−1} ≥ φ_m ≡ 0, the model suggests that the subjective
assessment of the dependent variable is indeed ordered. Here the maximum likelihood estimates of
the φs are not monotonic, as would be assumed in an ordered logit model.
We test that φ_1 = φ_2 by typing
. test [phi1_2]_cons = [phi1_1]_cons
( 1) - [phi1_1]_cons + [phi1_2]_cons = 0
chi2( 1) = 0.55
Prob > chi2 = 0.4576
Because the two parameters are not statistically different, we decide to add a constraint to force
φ_1 = φ_2:
. constraint define 1 [phi1_2]_cons = [phi1_1]_cons
. slogit repair foreign mpg price gratio, constraint(1) nolog
Stereotype logistic regression Number of obs = 135
Wald chi2(4) = 21.28
Log likelihood = -159.65769 Prob > chi2 = 0.0003
( 1) [phi1_1]_cons = 1
( 2) - [phi1_1]_cons + [phi1_2]_cons = 0
repair Coef. Std. Err. z P>|z| [95% Conf. Interval]
foreign 7.166515 1.690177 4.24 0.000 3.853829 10.4792
mpg .2340043 .0807042 2.90 0.004 .0758271 .3921816
price -.000041 .0001618 -0.25 0.800 -.0003581 .000276
gratio -5.218107 1.798717 -2.90 0.004 -8.743528 -1.692686
/phi1_1 1 (constrained)
/phi1_2 1 (constrained)
/phi1_3 .9751096 .1286563 7.58 0.000 .7229478 1.227271
/phi1_4 .7209343 .1220353 5.91 0.000 .4817494 .9601191
/phi1_5 0 (base outcome)
/theta1 -8.293452 4.645182 -1.79 0.074 -17.39784 .8109368
/theta2 -6.958451 4.629292 -1.50 0.133 -16.0317 2.114795
/theta3 -5.620232 4.953981 -1.13 0.257 -15.32986 4.089392
/theta4 -3.745624 3.809189 -0.98 0.325 -11.2115 3.720249
/theta5 0 (base outcome)
(repair=Excellent is the base outcome)
The φ estimates are now monotonically decreasing and the standard errors of the φs are small relative
to the size of the estimates, so we conclude that, with the exception of outcomes Poor and Fair,
the groups are distinguishable for the one-dimensional model and that the quality assessment can be
ordered.
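The same kind of check can be applied to any other pair of adjacent scale parameters; a minimal
sketch using the parameter names shown in the output above:
. test [phi1_3]_cons = [phi1_4]_cons
A small p-value indicates that outcomes Average and Good are distinguishable along the single
dimension.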
Higher-dimension models
The stereotype logistic model is not limited to ordered categorical dependent variables; you can
use it on nominal data to reduce the dimension of the regressions. Recall that a multinomial model
fit to a categorical dependent variable with m levels will have m − 1 sets of regression coefficients.
However, a model with fewer dimensions may fit the data equally well, suggesting that some of the
categories are indistinguishable.
Example 2
As discussed in [R] mlogit, we have data on the type of health insurance available to 616
psychologically depressed subjects in the United States (Tarlov et al. 1989; Wells et al. 1989).
Patients may have either an indemnity (fee-for-service) plan or a prepaid plan, such as an HMO, or
may be uninsured. Demographic variables include age, gender, race, and site.
First, we fit the saturated, two-dimensional model that is equivalent to a multinomial logistic model.
We choose the base outcome to be 1 (indemnity insurance) because that is the default for mlogit.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. slogit insure age male nonwhite i.site, dim(2) base(1)
Iteration 0: log likelihood = -534.36165
Iteration 1: log likelihood = -534.36165
Stereotype logistic regression Number of obs = 615
Wald chi2(10) = 38.17
Log likelihood = -534.36165 Prob > chi2 = 0.0000
( 1) [phi1_2]_cons = 1
( 2) [phi1_3]_cons = 0
( 3) [phi2_2]_cons = 0
( 4) [phi2_3]_cons = 1
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
dim1
age .011745 .0061946 1.90 0.058 -.0003962 .0238862
male -.5616934 .2027465 -2.77 0.006 -.9590693 -.1643175
nonwhite -.9747768 .2363213 -4.12 0.000 -1.437958 -.5115955
site
2 -.1130359 .2101903 -0.54 0.591 -.5250013 .2989296
3 .5879879 .2279351 2.58 0.010 .1412433 1.034733
dim2
age .0077961 .0114418 0.68 0.496 -.0146294 .0302217
male -.4518496 .3674867 -1.23 0.219 -1.17211 .268411
nonwhite -.2170589 .4256361 -0.51 0.610 -1.05129 .6171725
site
2 1.211563 .4705127 2.57 0.010 .2893747 2.133751
3 .2078123 .3662926 0.57 0.570 -.510108 .9257327
/phi1_1 0 (base outcome)
/phi1_2 1 (constrained)
/phi1_3 0 (omitted)
/phi2_1 0 (base outcome)
/phi2_2 0 (omitted)
/phi2_3 1 (constrained)
/theta1 0 (base outcome)
/theta2 .2697127 .3284422 0.82 0.412 -.3740222 .9134476
/theta3 -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
(insure=Indemnity is the base outcome)
For comparison, we also fit the model by using mlogit:
. mlogit insure age male nonwhite i.site, nolog
Multinomial logistic regression Number of obs = 615
LR chi2(10) = 42.99
Prob > chi2 = 0.0000
Log likelihood = -534.36165 Pseudo R2 = 0.0387
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.011745 .0061946 -1.90 0.058 -.0238862 .0003962
male .5616934 .2027465 2.77 0.006 .1643175 .9590693
nonwhite .9747768 .2363213 4.12 0.000 .5115955 1.437958
site
2 .1130359 .2101903 0.54 0.591 -.2989296 .5250013
3 -.5879879 .2279351 -2.58 0.010 -1.034733 -.1412433
_cons .2697127 .3284422 0.82 0.412 -.3740222 .9134476
Uninsure
age -.0077961 .0114418 -0.68 0.496 -.0302217 .0146294
male .4518496 .3674867 1.23 0.219 -.268411 1.17211
nonwhite .2170589 .4256361 0.51 0.610 -.6171725 1.05129
site
2 -1.211563 .4705127 -2.57 0.010 -2.133751 -.2893747
3 -.2078123 .3662926 -0.57 0.570 -.9257327 .510108
_cons -1.286943 .5923219 -2.17 0.030 -2.447872 -.1260134
Apart from having opposite signs, the coefficients from the stereotype logistic model are identical
to those from the multinomial logit model. Recall the definition of η_k given in Remarks and
examples, particularly the minus sign in front of the summation. One other difference in the output
is that the constant estimates labeled /theta in the slogit output are the constants labeled _cons
in the mlogit output.
Next we examine the one-dimensional model.
. slogit insure age male nonwhite i.site, dim(1) base(1) nolog
Stereotype logistic regression Number of obs = 615
Wald chi2(5) = 28.20
Log likelihood = -539.75205 Prob > chi2 = 0.0000
( 1) [phi1_2]_cons = 1
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
age .0108366 .0061918 1.75 0.080 -.0012992 .0229723
male -.5032537 .2078171 -2.42 0.015 -.9105678 -.0959396
nonwhite -.9480351 .2340604 -4.05 0.000 -1.406785 -.489285
site
2 -.2444316 .2246366 -1.09 0.277 -.6847113 .1958481
3 .556665 .2243799 2.48 0.013 .1168886 .9964415
/phi1_1 0 (base outcome)
/phi1_2 1 (constrained)
/phi1_3 .0383539 .4079705 0.09 0.925 -.7612535 .8379613
/theta1 0 (base outcome)
/theta2 .187542 .3303847 0.57 0.570 -.4600001 .835084
/theta3 -1.860134 .2158898 -8.62 0.000 -2.28327 -1.436997
(insure=Indemnity is the base outcome)
We have reduced a two-dimensional multinomial model to one dimension, reducing the number of
estimated parameters by four and decreasing the model log likelihood by about 5.4.
slogit does not report a model likelihood-ratio test. The test of d = 1 (a one-dimensional model)
versus d = 0 (the null model) does not have an asymptotic χ² distribution because the unconstrained
φ parameters (/phi1_3 in this example) cannot be identified if β = 0. More generally, this problem
precludes testing any hierarchical model of dimension d versus d − 1. Of course, the likelihood-ratio
test of a full-dimension model versus d = 0 is valid because the full model is just multinomial
logistic, and all the φ parameters are fixed at 0 or 1.
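That valid comparison is easy to carry out by hand because e(ll) and e(ll_0) store the fitted and
null log likelihoods (see Stored results below); a minimal sketch:
. quietly slogit insure age male nonwhite i.site, dim(2) base(1)
. display "LR chi2 = " %6.2f 2*(e(ll) - e(ll_0))
Because the full-dimension model is just a reparameterization of the multinomial logit, this should
reproduce the LR chi2(10) = 42.99 statistic reported by mlogit above.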
Technical note
The stereotype model is a special case of the reduced-rank vector generalized linear model discussed
by Yee and Hastie (2003). If we define η_ik = θ_k − Σ_{j=1}^{d} φ_jk x_i β_j, for k = 1, ..., m − 1, we can
write the expression in matrix notation as

    η_i = θ − Φ(x_i B)′

where Φ is an (m − 1) × d matrix containing the φ_jk parameters and B is a p × d matrix with
columns containing the β_j parameters, j = 1, ..., d. The factorization ΦB′ is not unique because
ΦB′ = ΦMM⁻¹B′ for any nonsingular d × d matrix M. To avoid this identifiability problem, we
choose M = Φ_1⁻¹, where

    Φ = [ Φ_1 ]
        [ Φ_2 ]

and Φ_1 is d × d of rank d so that

    ΦM = [ I_d         ]
         [ Φ_2 Φ_1⁻¹   ]

and I_d is a d × d identity matrix. Thus the corner constraints used by slogit are φ_jj ≡ 1 and
φ_jk ≡ 0 for j ≠ k and k, j ≤ d.
Stored results
slogit stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_indvars) number of independent variables
e(k_out) number of outcomes
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(df_m) Wald test degrees of freedom
e(df_0) null model degrees of freedom
e(k_dim) model dimension
e(i_base) base outcome index
e(ll) log likelihood
e(ll_0) null model log likelihood
e(N_clust) number of clusters
e(chi2) χ²
e(p) significance
e(ic) number of iterations
e(rank) rank of e(V)
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) slogit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(indvars) independent variables
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(out#) outcome labels, # = 1, ..., e(k_out)
e(chi2type) Wald; type of model χ2test
e(labels) outcome labels or numeric levels
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(marginsnotok) predictions disallowed by margins
e(footnote) program used to implement the footnote display
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(outcomes) outcome values
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variancecovariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
slogit obtains the maximum likelihood estimates for the stereotype logistic model by using ml;
see [R] ml. Each set of regression estimates, one set of β_j's for each dimension, constitutes one ml
model equation. The d × (m − 1) φ's and the (m − 1) θ's are ml ancillary parameters.
Without loss of generality, let the base outcome level be the mth level of the dependent variable.
Define the row vector φ_k = (φ_1k, ..., φ_dk) for k = 1, ..., m − 1, and define the p × d matrix
B = (β_1, ..., β_d). For observation i, the log odds of outcome level k relative to level m, k = 1,
..., m − 1, is the index

    ln{ Pr(Y_i = k) / Pr(Y_i = m) } = η_ik = θ_k − φ_k (x_i B)′ = θ_k − φ_k ν_i′

The row vector ν_i can be interpreted as a latent variable reducing the p-dimensional vector of
covariates to a more interpretable dimension d < p.
The probability of the ith observation having outcome level k is then

    Pr(Y_i = k) = p_ik = exp(η_ik) / {1 + Σ_{j=1}^{m−1} exp(η_ij)}    if k < m

                       = 1 / {1 + Σ_{j=1}^{m−1} exp(η_ij)}            if k = m

from which the log-likelihood function is computed as

    L = Σ_{i=1}^{n} w_i Σ_{k=1}^{m} I_k(y_i) ln(p_ik)      (1)

Here w_i is the weight for observation i and

    I_k(y_i) = 1 if observation y_i has outcome k
             = 0 otherwise
Numeric variables are normalized for numerical stability during optimization: a new double-precision
variable x̃_j is created from variable x_j, j = 1, ..., p, such that x̃_j = (x_j − x̄_j)/s_j. This
feature is turned off if you specify nonormalize or if you use the from() option for initial estimates.
Normalization is not performed on byte variables, including the indicator variables generated by [R] xi.
The linear equality constraints for regression parameters, if specified, must be scaled also. Assume
that a constraint is applied to the regression parameter associated with variable j and dimension i,
β_ji; the corresponding element of the constraint matrix (see [P] makecns) is then divided by s_j.
After convergence, the parameter estimates for variable j and dimension i (β̃_ji, say) are
transformed back to their original scale, β_ji = β̃_ji/s_j. For the intercepts, you compute

    θ_k = θ̃_k + Σ_{i=1}^{d} φ_ik Σ_{j=1}^{p} β̃_ji x̄_j / s_j
Initial values are computed using estimates obtained from mlogit fitting a multinomial logistic model.
Let the p × (m − 1) matrix B̃ contain the multinomial logistic regression parameters less the m − 1
intercepts. Each φ is initialized with constant values min(1/2, 1/d) under the initialize(constant)
option (the default) or with uniform random numbers under the initialize(random) option. Constraints
are then applied to the starting values so that the structure of the (m − 1) × d matrix Φ is

    Φ = [ φ_1     ]   [ I_d ]
        [ φ_2     ] = [     ]
        [  ...    ]   [  Φ̃  ]
        [ φ_{m−1} ]

where I_d is a d × d identity matrix. Assume that only the corner constraints are used, but any
constraints you place on the scale parameters are also applied to the initial scale estimates, so the
structure of Φ will change accordingly. The φ parameters are invariant to the scale of the covariates,
so initial estimates in [0, 1] are reasonable. The constraints guarantee that the rank of Φ is at least d,
so the initial estimates for the stereotype regression parameters are obtained from B = B̃ Φ(Φ′Φ)⁻¹.
One other approach for initial estimates is provided: initialize(svd). It starts with the mlogit
estimates and computes B̃′ = UDV′, where U_{(m−1)×p} and V_{p×p} are orthonormal matrices and
D_{p×p} is a diagonal matrix containing the singular values of B̃. The estimates for Φ and B are the
first d columns of U and VD, respectively (Yee and Hastie 2003).
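When convergence is slow, it may be worth comparing the initialization strategies just described; a
minimal sketch using the dataset of example 1:
. slogit repair foreign mpg price gratio, dim(1) initialize(svd)
. slogit repair foreign mpg price gratio, dim(1) initialize(random)
If the likelihood is well behaved, both runs should reach the same maximum; only the iteration paths
differ.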
The score for regression coefficients is

    u_i(β_j) = ∂L_ik/∂β_j = x_i ( Σ_{l=1}^{m−1} φ_jl p_il − φ_jk )

the score for the scale parameters is

    u_i(φ_jl) = ∂L_ik/∂φ_jl = x_i β_j (p_ik − 1)    if l = k
                            = x_i β_j p_il          if l ≠ k

for l = 1, ..., m − 1; and the score for the intercepts is

    u_i(θ_l) = ∂L_ik/∂θ_l = 1 − p_ik    if l = k
                          = −p_il       if l ≠ k
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
slogit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Anderson, J. A. 1984. Regression and ordered categorical variables (with discussion). Journal of the Royal Statistical
Society, Series B 46: 1–30.
Lunt, M. 2001. sg163: Stereotype ordinal regression. Stata Technical Bulletin 61: 12–18. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 298–307. College Station, TX: Stata Press.
Lunt, M. 2005. Prediction of ordinal outcomes when the association between predictors and outcome differs between
outcome levels. Statistics in Medicine 24: 1357–1369.
Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes
study. An application of methods for monitoring the results of medical care. Journal of the American Medical
Association 262: 925–930.
Wells, K. B., R. D. Hays, M. A. Burnam, W. H. Rogers, S. Greenfield, and J. E. Ware, Jr. 1989. Detection of
depressive disorder for patients receiving prepaid or fee-for-service care. Results from the Medical Outcomes Survey.
Journal of the American Medical Association 262: 3298–3302.
Yee, T. W., and T. J. Hastie. 2003. Reduced-rank vector generalized linear models. Statistical Modelling 3: 15–41.
Also see
[R] slogit postestimation    Postestimation tools for slogit
[R] logistic    Logistic regression, reporting odds ratios
[R] mlogit    Multinomial (polytomous) logistic regression
[R] ologit    Ordered logistic regression
[R] oprobit    Ordered probit regression
[R] roc    Receiver operating characteristic (ROC) analysis
[SVY] svy estimation    Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
slogit postestimation — Postestimation tools for slogit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Also see
Description
The following postestimation commands are available after slogit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest¹ likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predicted probabilities, estimated index and its approximate standard error
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
¹ lrtest is not appropriate with svy estimation results.
Syntax for predict
predict type  stub*|newvar |newvarlist  if  in  ,statistic outcome(outcome)
predict type  stub*|newvarlist  if  in , scores
statistic Description
Main
pr probability of one or all of the dependent variable outcomes; the default
xb index for the kth outcome
stdp standard error of the index for the kth outcome
If you do not specify outcome(),pr (with one new variable specified), xb, and stdp assume outcome(#1).
You specify one or knew variables with pr, where kis the number of outcomes.
You specify one new variable with xb and stdp.
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
pr, the default, calculates the probability of each of the categories of the dependent variable or the
probability of the level specified in outcome(outcome). If you specify the outcome(outcome)
option, you need to specify only one new variable; otherwise, you must specify a new variable
for each category of the dependent variable.
xb calculates the index, θ_k − Σ_{j=1}^{d} φ_jk x_i β_j, for outcome level k ≠ e(i_base) and dimension
d = e(k_dim). It returns a vector of zeros if k = e(i_base). A synonym for xb is index. If
outcome() is not specified, outcome(#1) is assumed.
stdp calculates the standard error of the index. A synonym for stdp is seindex. If outcome() is
not specified, outcome(#1) is assumed.
outcome(outcome)specifies the outcome for which the statistic is to be calculated. equation() is
a synonym for outcome(): it does not matter which you use. outcome() or equation() can
be specified using
#1, #2, ..., where #1 means the first category of the dependent variable, #2 means the
second category, etc.;
the values of the dependent variable; or
the value labels of the dependent variable if they exist.
scores calculates the equation-level score variables. For models with d dimensions and m levels,
d + (d + 1)(m − 1) new variables are created. Assume j = 1, ..., d and k = 1, ..., m in the
following.
The first d new variables will contain ∂lnL/∂(xβ_j).
The next d(m − 1) new variables will contain ∂lnL/∂φ_jk.
The last m − 1 new variables will contain ∂lnL/∂θ_k.
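For instance, after the one-dimensional insurance model fit in example 1 below, a stub name lets
predict create and number all of the score variables at once; a minimal sketch (sc is an arbitrary
stub):
. predict sc*, scores
. describe sc*
With d = 1 and m = 3, this creates d + (d + 1)(m − 1) = 5 new variables.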
Remarks and examples
Once you have fit a stereotype logistic model, you can obtain the predicted probabilities by using
the predict command for both the estimation sample and other samples; see [U] 20 Estimation and
postestimation commands and [R] predict.
predict without arguments (or with the pr option) calculates the predicted probability of each
outcome of the dependent variable. You must therefore give a new variable name for each of the
outcomes. To compute the estimated probability of one outcome, you use the outcome(outcome)
option where outcome is the level encoding the outcome. If the dependent variable’s levels are labeled,
the outcomes can also be identified by the label values (see [D] label).
The xb option in conjunction with outcome(outcome)specifies that the index be computed for
the outcome encoded by level outcome. Its approximate standard error is computed if the stdp option
is specified. Only one of the pr, xb, or stdp options can be specified with a call to predict.
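A minimal sketch of requesting the index and its approximate standard error for one outcome (the
new variable names are arbitrary; the model is the one fit in example 1 below):
. predict xbPrep, xb outcome(Prepaid)
. predict sePrep, stdp outcome(Prepaid)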
Example 1
In example 2 of [R] slogit, we fit the one-dimensional stereotype model, where the depvar is
insure with levels k = 1 for outcome Indemnity, k = 2 for Prepaid, and k = 3 for Uninsure. The
base outcome for the model is Indemnity, so for k ≠ 1 the vector of indices for the kth level is

    η_k = θ_k − φ_k(β_1 age + β_2 male + β_3 nonwhite + β_4 2.site + β_5 3.site)
We estimate the group probabilities by calling predict after slogit.
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. slogit insure age male nonwhite i.site, dim(1) base(1) nolog
(output omitted )
. predict pIndemnity pPrepaid pUninsure, p
. list pIndemnity pPrepaid pUninsure insure in 1/10
pIndem~y pPrepaid pUnins~e insure
1. .5419344 .3754875 .0825782 Indemnity
2. .4359638 .496328 .0677081 Prepaid
3. .5111583 .4105107 .0783309 Indemnity
4. .3941132 .5442234 .0616633 Prepaid
5. .4655651 .4625064 .0719285 .
6. .4401779 .4915102 .0683118 Prepaid
7. .4632122 .4651931 .0715948 Prepaid
8. .3772302 .5635696 .0592002 .
9. .4867758 .4383018 .0749225 Uninsure
10. .5823668 .3295802 .0880531 Prepaid
Observations 5 and 8 are not used to fit the model because insure is missing at these points, but
predict estimates the probabilities for these observations since none of the independent variables is
missing. You can use if e(sample) in the call to predict to use only those observations that are
used to fit the model.
Methods and formulas
predict
Let level b be the base outcome that is used to fit the stereotype logistic regression model of
dimension d. The index for observation i and level k ≠ b is η_ik = θ_k − Σ_{j=1}^{d} φ_jk x_i β_j. This
is the log odds of outcome encoded as level k relative to that of b, so we define η_ib ≡ 0.
The outcome probabilities for this model are defined as Pr(Y_i = k) = exp(η_ik) / Σ_{j=1}^{m} exp(η_ij). Unlike in
mlogit, ologit, and oprobit, the index is no longer a linear function of the parameters. The
standard error of index η_ik is thus computed using the delta method (see also [R] predictnl).
The equation-level score for regression coefficients is

    ∂lnL_ik / ∂(x_i β_j) = Σ_{l=1}^{m−1} φ_jl p_il − φ_jk

the equation-level score for the scale parameters is

    ∂lnL_ik / ∂φ_jl = x_i β_j (p_ik − 1)    if l = k
                    = x_i β_j p_il          if l ≠ k

for l = 1, ..., m − 1; and the equation-level score for the intercepts is

    ∂lnL_ik / ∂θ_l = 1 − p_ik    if l = k
                   = −p_il       if l ≠ k
Also see
[R] slogit    Stereotype logistic regression
[U] 20 Estimation and postestimation commands
Title
smooth — Robust nonlinear smoother
Syntax Menu Description Option
Remarks and examples Methods and formulas Acknowledgments References
Also see
Syntax
smooth smoother[, twice] varname [if] [in], generate(newvar)

where smoother is specified as Sm[Sm[...]] and Sm is one of

    1|2|3|4|5|6|7|8|9[R]
    3[R]S[S|R][S|R]...
    E
    H

Letters may be specified in lowercase if preferred. Examples of smoother[, twice] include

    3RSSH    3RSSH,twice    4253H    4253H,twice    43RSR2H,twice
    3rssh    3rssh,twice    4253h    4253h,twice    43rsr2h,twice
Menu
Statistics >Nonparametric analysis >Robust nonlinear smoother
Description
smooth applies the specified resistant, nonlinear smoother to varname and stores the smoothed
series in newvar.
Option
generate(newvar)is required; it specifies the name of the new variable that will contain the
smoothed values.
Remarks and examples
Smoothing is an exploratory data-analysis technique for making the general shape of a series
apparent. In this approach (Tukey 1977), the observed data series is assumed to be the sum of an
underlying process that evolves smoothly (the smooth) and of an unsystematic noise component (the
rough); that is,
data =smooth +rough
Smoothed values z_t are obtained by taking medians (or some other location estimate) of each point
in the original data y_t and a few of the points around it. The number of points used is called the span
of the smoother. Thus a span-3 smoother produces z_t by taking the median of y_{t−1}, y_t, and y_{t+1}.
smooth provides running median smoothers of spans 1 to 9, indicated by the digit that specifies
their span. Median smoothers are resistant to isolated outliers, so they provide robustness to spikes
in the data. Because the median is also a nonlinear operator, such smoothers are known as robust (or
resistant) nonlinear smoothers.
smooth also provides the Hanning linear, nonrobust smoother, indicated by the letter H. Hanning
is a span-3 smoother with binomial weights. Repeated applications of H (HH, HHH, etc.) provide
binomial smoothers of span 5, 7, etc. See Cox (1997, 2004) for a graphical application of this fact.
Because one smoother usually cannot adequately separate the smooth from the rough, compound
smoothers (multiple smoothers applied in sequence) are used. The smoother 35H, for instance, first
smooths the data with a span-3 median smoother, then smooths the result with a span-5 median smoother,
and finally smooths that result with the Hanning smoother. smooth allows you to specify any number
of smoothers in any sequence.
Three refinements can be combined with the running median and Hanning smoothers. First, the
endpoints of a smooth can be given special treatment. This is specified by the E operator. Second,
smoothing by 3, the span-3 running median, tends to produce flat-topped hills and valleys. The
splitting operator, S, “splits” these repeated values, applies the endpoint operator to them, and then
“rejoins” the series. Finally, it is sometimes useful to repeat an odd-span median smoother or the
splitting operator until the smooth no longer changes. Following a digit or an S with an R specifies
this type of repetition.
Even the best smoother may fail to separate the smooth from the rough adequately. To guard
against losing any systematic components of the data series, after smoothing, the smoother can be
reapplied to the resulting rough, and any recovered signal can be added back to the original smooth.
The twice operator specifies this procedure. More generally, an arbitrary smoother can be applied
to the rough (using a second smooth command), and the recovered signal can be added back to the
smooth. This more general procedure is called reroughing (Tukey 1977).
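For example, reroughing with a different smoother can be done by hand with two calls to smooth;
a minimal sketch (the variable names z, s1, rough, srough, and sfinal are arbitrary):
. smooth 4253h z, gen(s1)              // first-pass smooth
. generate rough = z - s1              // the rough left behind
. smooth 3rssh rough, gen(srough)      // smooth the rough with another smoother
. generate sfinal = s1 + srough        // add any recovered signal back
Specifying the twice operator is the special case in which the same smoother is applied to the rough.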
The details of each of the smoothers and operators are explained in Methods and formulas below.
Example 1
smooth is designed to recover the general features of a series that has been contaminated with
noise. To demonstrate this, we construct a series, add noise to it, and then smooth the noisy version
to recover an estimate of the original data. First, we construct and display the data:
. drop _all
. set obs 10
. set seed 123456789
. generate time = _n
. label variable time "Time"
. generate x = _n^3 - 10*_n^2 + 5*_n
. label variable x "Signal"
. generate z = x + 50*rnormal()
. label variable z "Observed series"
. scatter x z time, c(l .) m(i o) ytitle("")
(Figure: Signal and Observed series plotted against Time)
Now we smooth the noisy series, z, assumed to be the only data we would observe:
. smooth 4253eh,twice z, gen(sz)
. label variable sz "Smoothed series"
. scatter x z sz time, c(l . l) m(i o i) ytitle("") || scatter sz time,
> c(l . l) m(i o i) ytitle("") clpattern(dash_dot)
(Figure: Signal, Observed series, and Smoothed series plotted against Time)
Example 2
Salgado-Ugarte and Curts-García (1993) provide data on the frequencies of observed fish lengths.
In this example, the series to be smoothed (the frequencies) is ordered by fish length rather than
by time.
. use http://www.stata-press.com/data/r13/fishdata, clear
. smooth 4253eh,twice freq, gen(sfreq)
. label var sfreq "4253EH,twice of frequencies"
. scatter sfreq freq length, c(l .) m(i o)
> title("Smoothed frequencies of fish lengths") ytitle("") xlabel(#4)
(Figure: Smoothed frequencies of fish lengths; smoothed and observed frequencies versus standard body length)
Technical note
smooth allows missing values at the beginning and end of the series, but missing values in the
middle are not allowed. Leading and trailing missing values are ignored. If you wish to ignore missing
values in the middle of the series, you must drop the missing observations before using smooth.
Doing so, of course, would violate smooth's assumption that observations are equally spaced: each
observation represents a year, a quarter, or a month (or a 1-year birth-rate category). In practice,
smooth produces good results as long as the spaces between adjacent observations do not vary too
much.
Smoothing is usually applied to time series, but any variable with a natural order can be smoothed.
For example, a smoother might be applied to the birth rate recorded by the age of the mothers (birth
rate for 17-year-olds, birth rate for 18-year-olds, and so on).
Methods and formulas
Methods and formulas are presented under the following headings:
Running median smoothers of odd span
Running median smoothers of even span
Repeat operator
Endpoint rule
Splitting operator
Hanning smoother
Twicing
Running median smoothers of odd span
The smoother 3 defines

    z_t = median(y_{t−1}, y_t, y_{t+1})

The smoother 5 defines

    z_t = median(y_{t−2}, y_{t−1}, y_t, y_{t+1}, y_{t+2})

and so on. The smoother 1 defines z_t = median(y_t), so it does nothing.
Endpoints are handled by using smoothers of shorter, odd span. Thus for 3,

    z_1 = y_1
    z_2 = median(y_1, y_2, y_3)
    ...
    z_{N−1} = median(y_{N−2}, y_{N−1}, y_N)
    z_N = y_N

For 5,

    z_1 = y_1
    z_2 = median(y_1, y_2, y_3)
    z_3 = median(y_1, y_2, y_3, y_4, y_5)
    z_4 = median(y_2, y_3, y_4, y_5, y_6)
    ...
    z_{N−2} = median(y_{N−4}, y_{N−3}, y_{N−2}, y_{N−1}, y_N)
    z_{N−1} = median(y_{N−2}, y_{N−1}, y_N)
    z_N = y_N

and so on.
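The span-3 smoother and its endpoint handling are easy to verify directly; a minimal sketch, assuming
a variable y in its natural order with no missing values (the median of three values is written with
nested min() and max() calls):
. smooth 3 y, gen(s3)
. generate s3chk = max(min(y[_n-1], y), min(max(y[_n-1], y), y[_n+1]))
. replace s3chk = y in 1               // endpoints are copied in
. replace s3chk = y in l
. assert reldif(s3, s3chk) < 1e-6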
Running median smoothers of even span
Define the median() function as returning the linearly interpolated value when given an even
number of arguments. Thus the smoother 2 defines

    z_{t+0.5} = (y_t + y_{t+1})/2

The smoother 4 defines z_{t+0.5} as the linearly interpolated median of (y_{t−1}, y_t, y_{t+1}, y_{t+2}), and so
on. Endpoints are always handled using smoothers of shorter, even span. Thus for 4,
    z_{0.5} = y_1
    z_{1.5} = median(y_1, y_2) = (y_1 + y_2)/2
    z_{2.5} = median(y_1, y_2, y_3, y_4)
    ...
    z_{N−2.5} = median(y_{N−4}, y_{N−3}, y_{N−2}, y_{N−1})
    z_{N−1.5} = median(y_{N−2}, y_{N−1})
    z_{N−0.5} = median(y_{N−1}, y_N)
    z_{N+0.5} = y_N
As defined above, an even-span smoother increases the length of the series by 1 observation. However,
the series can be recentered on the original observation numbers, and the “extra” observation can be
eliminated by smoothing the series again with another even-span smoother. For instance, the smooth
of 4 illustrated above could be followed by a smooth of 2 to obtain
    z*_1 = (z_{0.5} + z_{1.5})/2
    z*_2 = (z_{1.5} + z_{2.5})/2
    z*_3 = (z_{2.5} + z_{3.5})/2
    ...
    z*_{N−2} = (z_{N−2.5} + z_{N−1.5})/2
    z*_{N−1} = (z_{N−1.5} + z_{N−0.5})/2
    z*_N = (z_{N−0.5} + z_{N+0.5})/2
smooth keeps track of the number of even smoothers applied to the data and expands and shrinks the
length of the series accordingly. To ensure that the final smooth has the same number of observations
as varname, smooth requires you to specify an even number of even-span smoothers. However, the
pairs of even-span smoothers need not be contiguous; for instance, 4253 and 4523 are both allowed.
Repeat operator
R indicates that a smoother is to be repeated until convergence, that is, until repeated applications
of the smoother produce the same series. Thus 3 applies the smoother of running medians of span
3. 33 applies the smoother twice. 3R produces the result of repeating 3 an infinite number of times.
R should be used only with odd-span smoothers because even-span smoothers are not guaranteed to
converge.
The smoother 453R2 applies a span-4 smoother, followed by a span-5 smoother, followed by
repeated applications of a span-3 smoother, followed by a span-2 smoother.
Endpoint rule
The endpoint rule E modifies the values z_1 and z_N according to the following formulas:

    z_1 = median(3z_2 − 2z_3, z_1, z_2)
    z_N = median(3z_{N−1} − 2z_{N−2}, z_N, z_{N−1})

When the endpoint rule is not applied, endpoints are typically “copied in”; that is, z_1 = y_1 and
z_N = y_N.
Splitting operator
The smoothers 3 and 3R can produce flat-topped hills and valleys. The split operator attempts to
eliminate such hills and valleys by splitting the sequence, applying the endpoint rule E, rejoining the
series, and then resmoothing by 3R.
The S operator may be applied only after 3, 3R, or S.
We recommend that the S operator be repeated once (SS) or until no further changes take place
(SR).
Hanning smoother
H is the Hanning linear smoother:

    z_t = (y_{t−1} + 2y_t + y_{t+1})/4

Endpoints are copied in: z_1 = y_1 and z_N = y_N. H should be applied only after all nonlinear
smoothers.
Twicing
A smoother divides the data into a smooth and a rough:
data =smooth +rough
If the smoothing is successful, the rough should exhibit no pattern. Twicing refers to applying the
smoother to the observed data, calculating the rough, and then applying the smoother to the rough. The
resulting “smoothed rough” is then added back to the smooth from the first step.
Acknowledgments
smooth was originally written by William Gould (1992), at which time it was named nlsm, and
was inspired by Salgado-Ugarte and Curts-García (1992). Salgado-Ugarte and Curts-García (1993)
subsequently reported anomalies in nlsm's treatment of even-span median smoothers. smooth corrects
these problems and incorporates other improvements but otherwise is essentially the same as originally
published.
References
Cox, N. J. 1997. gr22: Binomial smoothing plot. Stata Technical Bulletin 35: 7–9. Reprinted in Stata Technical
Bulletin Reprints, vol. 6, pp. 36–38. College Station, TX: Stata Press.
Cox, N. J. 2004. gr22 1: Software update: Binomial smoothing plot. Stata Journal 4: 490.
Cox, N. J. 2005. Speaking Stata: Smoothing in various directions. Stata Journal 5: 574–593.
Gould, W. W. 1992. sg11.1: Quantile regression with bootstrapped standard errors. Stata Technical Bulletin 9: 19–21.
Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 137–139. College Station, TX: Stata Press.
Royston, P., and N. J. Cox. 2005. A multivariable scatterplot smoother. Stata Journal 5: 405–412.
Salgado-Ugarte, I. H., and J. Curts-García. 1992. sed7: Resistant smoothing using Stata. Stata Technical Bulletin 7:
8–11. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 99–103. College Station, TX: Stata Press.
Salgado-Ugarte, I. H., and J. Curts-García. 1993. sed7.2: Twice reroughing procedure for resistant nonlinear smoothing.
Stata Technical Bulletin 11: 14–16. Reprinted in Stata Technical Bulletin Reprints, vol. 2, pp. 108–111. College
Station, TX: Stata Press.
Sasieni, P. D. 1998. gr27: An adaptive variable span running line smoother. Stata Technical Bulletin 41: 4–7. Reprinted
in Stata Technical Bulletin Reprints, vol. 7, pp. 63–68. College Station, TX: Stata Press.
Tukey, J. W. 1977. Exploratory Data Analysis. Reading, MA: Addison–Wesley.
Velleman, P. F. 1977. Robust nonlinear data smoothers: Definitions and recommendations. Proceedings of the National
Academy of Sciences 74: 434–436.
Velleman, P. F. 1980. Definition and comparison of robust nonlinear data smoothing algorithms. Journal of the American
Statistical Association 75: 609–615.
Velleman, P. F., and D. C. Hoaglin. 1981. Applications, Basics, and Computing of Exploratory Data Analysis. Boston:
Duxbury.
Also see
[R] lowess    Lowess smoothing
[R] lpoly    Kernel-weighted local polynomial smoothing
[TS] tssmooth    Smooth and forecast univariate time-series data
Title
spearman — Spearman’s and Kendall’s correlations
Syntax Menu Description Options for spearman
Options for ktau Remarks and examples Stored results Methods and formulas
Acknowledgment References Also see
Syntax
Spearman’s rank correlation coefficients
spearman varlist if in ,spearman options
Kendall’s rank correlation coefficients
ktau varlist  if  in  ,ktau options
spearman_options    Description
Main
stats(spearman list)list of statistics; select up to three statistics; default is stats(rho)
print(#)significance level for displaying coefficients
star(#)significance level for displaying with a star
bonferroni use Bonferroni-adjusted significance level
sidak    use Šidák-adjusted significance level
pw calculate all pairwise correlation coefficients by using all available data
matrix display output in matrix form
ktau_options    Description
Main
stats(ktau list)list of statistics; select up to six statistics; default is stats(taua)
print(#)significance level for displaying coefficients
star(#)significance level for displaying with a star
bonferroni use Bonferroni-adjusted significance level
sidak    use Šidák-adjusted significance level
pw calculate all pairwise correlation coefficients by using all available data
matrix display output in matrix form
by is allowed with spearman and ktau; see [D] by.
where the elements of spearman_list may be
rho    correlation coefficient
obs    number of observations
p      significance level
and the elements of ktau_list may be
taua     correlation coefficient τ_a
taub     correlation coefficient τ_b
score    score
se       standard error of score
obs      number of observations
p        significance level
Menu
spearman
Statistics >Nonparametric analysis >Tests of hypotheses >Spearman’s rank correlation
ktau
Statistics >Nonparametric analysis >Tests of hypotheses >Kendall’s rank correlation
Description
spearman displays Spearman’s rank correlation coefficients for all pairs of variables in varlist or,
if varlist is not specified, for all the variables in the dataset.
ktau displays Kendall’s rank correlation coefficients between the variables in varlist or, if varlist is
not specified, for all the variables in the dataset. ktau is intended for use on small- and moderate-sized
datasets; it requires considerable computation time for larger datasets.
Options for spearman
 
Main
stats(spearman_list) specifies the statistics to be displayed in the matrix of output. stats(rho)
is the default. Up to three statistics may be specified; stats(rho obs p) would display the
correlation coefficient, number of observations, and significance level. If varlist contains only two
variables, all statistics are shown in tabular form, and stats(), print(), and star() have no
effect unless the matrix option is specified.
print(#) specifies the significance level of correlation coefficients to be printed. Correlation coefficients
with larger significance levels are left blank in the matrix. Typing spearman, print(.10)
would list only those correlation coefficients that are significant at the 10% level or lower.
star(#) specifies the significance level of correlation coefficients to be marked with a star. Typing
spearman, star(.05) would “star” all correlation coefficients significant at the 5% level or
lower.
bonferroni makes the Bonferroni adjustment to calculated significance levels. This adjustment affects
printed significance levels and the print() and star() options. Thus spearman, print(.05)
bonferroni prints coefficients with Bonferroni-adjusted significance levels of 0.05 or less.
sidak makes the Šidák adjustment to calculated significance levels. This adjustment affects printed
significance levels and the print() and star() options. Thus spearman, print(.05) sidak
prints coefficients with Šidák-adjusted significance levels of 0.05 or less.
pw specifies that correlations be calculated using pairwise deletion of observations with missing
values. By default, spearman uses casewise deletion, where observations are ignored if any of
the variables in varlist are missing.
spearman — Spearman’s and Kendall’s correlations 2199
matrix forces spearman to display the statistics as a matrix, even if varlist contains only two
variables. matrix is implied if more than two variables are specified.
Options for ktau
 
Main
stats(ktau_list) specifies the statistics to be displayed in the matrix of output. stats(taua) is
the default. Up to six statistics may be specified; stats(taua taub score se obs p) would
display the correlation coefficients τ_a, τ_b, score, standard error of score, number of observations,
and significance level. If varlist contains only two variables, all statistics are shown in tabular
form and stats(), print(), and star() have no effect unless the matrix option is specified.
print(#) specifies the significance level of correlation coefficients to be printed. Correlation coefficients
with larger significance levels are left blank in the matrix. Typing ktau, print(.10)
would list only those correlation coefficients that are significant at the 10% level or lower.
star(#) specifies the significance level of correlation coefficients to be marked with a star. Typing
ktau, star(.05) would “star” all correlation coefficients significant at the 5% level or lower.
bonferroni makes the Bonferroni adjustment to calculated significance levels. This adjustment
affects printed significance levels and the print() and star() options. Thus ktau, print(.05)
bonferroni prints coefficients with Bonferroni-adjusted significance levels of 0.05 or less.
sidak makes the Šidák adjustment to calculated significance levels. This adjustment affects printed
significance levels and the print() and star() options. Thus ktau, print(.05) sidak prints
coefficients with Šidák-adjusted significance levels of 0.05 or less.
pw specifies that correlations be calculated using pairwise deletion of observations with missing values.
By default, ktau uses casewise deletion, where observations are ignored if any of the variables in
varlist are missing.
matrix forces ktau to display the statistics as a matrix, even if varlist contains only two variables.
matrix is implied if more than two variables are specified.
Remarks and examples
Example 1
We wish to calculate the correlation coefficients among marriage rate (mrgrate), divorce rate
(divorce rate), and median age (medage) in state data. We can calculate the standard Pearson
correlation coefficients and significance by typing
. use http://www.stata-press.com/data/r13/states2
(State data)
. pwcorr mrgrate divorce_rate medage, sig
mrgrate divorc~e medage
mrgrate 1.0000
divorce_rate 0.7895 1.0000
0.0000
medage 0.0011 -0.1526 1.0000
0.9941 0.2900
We can calculate Spearman’s rank correlation coefficients by typing
. spearman mrgrate divorce_rate medage, stats(rho p)
(obs=50)
Key
rho
Sig. level
mrgrate divorc~e medage
mrgrate 1.0000
divorce_rate 0.6933 1.0000
0.0000
medage -0.4869 -0.2455 1.0000
0.0003 0.0857
The large difference in the results is caused by one observation. Nevada’s marriage rate is almost 10
times higher than the state with the next-highest marriage rate. An important feature of the Spearman
rank correlation coefficient is its reduced sensitivity to extreme values compared with the Pearson
correlation coefficient.
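Because Spearman's rho is the Pearson correlation computed on the (average) ranks (see Methods and
formulas below), the same value can be reproduced by ranking the variables first; a minimal sketch
(the rank variable names are arbitrary):
. egen rmrg = rank(mrgrate)
. egen rdiv = rank(divorce_rate)
. pwcorr rmrg rdiv, sig
Ranking caps the influence of Nevada's extreme marriage rate, which is why the Spearman correlations
are much less affected by that observation.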
We can calculate Kendall’s rank correlations by typing
. ktau mrgrate divorce_rate medage, stats(taua taub p)
(obs=50)
Key
tau_a
tau_b
Sig. level
mrgrate divorc~e medage
mrgrate 0.9829
1.0000
divorce_rate 0.5110 0.9804
0.5206 1.0000
0.0000
medage -0.3486 -0.1698 0.9845
-0.3544 -0.1728 1.0000
0.0004 0.0828
There are tied values for variables mrgrate, divorce_rate, and medage, so tied ranks are used.
As a result, τ_a < 1 on the diagonal (see Methods and formulas for the definition of τ_a).
spearman — Spearman’s and Kendall’s correlations 2201
Technical note
According to Conover (1999, 323), “Spearman's ρ tends to be larger than Kendall's τ in absolute
value. However, as a test of significance, there is no strong reason to prefer one over the other because
both will produce nearly identical results in most cases.”
Example 2
We illustrate spearman and ktau with the auto data, which contains some missing values.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. spearman mpg rep78
Number of obs = 69
Spearman’s rho = 0.3098
Test of Ho: mpg and rep78 are independent
Prob > |t| = 0.0096
Because we specified two variables, spearman displayed the sample size, correlation, and p-value in
tabular form. To obtain just the correlation coefficient displayed in matrix form, we type
. spearman mpg rep78, stats(rho) matrix
(obs=69)
mpg rep78
mpg 1.0000
rep78 0.3098 1.0000
The pw option instructs spearman and ktau to use all nonmissing observations between a pair
of variables when calculating their correlation coefficient. In the output below, some correlations are
based on 74 observations, whereas others are based on 69 because 5 observations contain a missing
value for rep78.
. spearman mpg price rep78, pw stats(rho obs p) star(0.01)
Key
rho
Number of obs
Sig. level
mpg price rep78
mpg 1.0000
74
price -0.5419* 1.0000
74 74
0.0000
rep78 0.3098* 0.1028 1.0000
69 69 69
0.0096 0.4008
Finally, the bonferroni and sidak options provide adjusted significance levels:
. ktau mpg price rep78, stats(taua taub score se p) bonferroni
(obs=69)
Key
tau_a
tau_b
score
se of score
Sig. level
mpg price rep78
mpg 0.9471
1.0000
2222.0000
191.8600
price -0.3973 1.0000
-0.4082 1.0000
-932.0000 2346.0000
192.4561 193.0682
0.0000
rep78 0.2076 0.0648 0.7136
0.2525 0.0767 1.0000
487.0000 152.0000 1674.0000
181.7024 182.2233 172.2161
0.0224 1.0000
 
Charles Edward Spearman (1863–1945) was a British psychologist who made contributions
to correlation, factor analysis, test reliability, and psychometrics. After several years’ military
service, he obtained a PhD in experimental psychology at Leipzig and became a professor at
University College London, where he sustained a long program of work on the interpretation of
intelligence tests. Ironically, the rank correlation version bearing his name is not the formula he
advocated.
Maurice George Kendall (1907–1983) was a British statistician who contributed to rank correlation,
time series, and multivariate analysis, among other topics, and wrote many statistical texts. Most
notably, perhaps, his advanced survey of the theory of statistics went through several editions,
later ones with Alan Stuart; the baton has since passed to others. Kendall was employed in turn
as a government and business statistician, as a professor at the London School of Economics, as
a consultant, and as director of the World Fertility Survey. He was knighted in 1974.
 
spearman — Spearman’s and Kendall’s correlations 2203
Stored results
spearman stores the following in r():
Scalars
r(N) number of observations (last variable pair)
r(rho) ρ (last variable pair)
r(p) two-sided p-value (last variable pair)
Matrices
r(Nobs) number of observations
r(Rho) ρ
r(P) two-sided p-value
ktau stores the following in r():
Scalars
r(N) number of observations (last variable pair)
r(tau_a) τ_a (last variable pair)
r(tau_b) τ_b (last variable pair)
r(score) Kendall's score (last variable pair)
r(se_score) se of score (last variable pair)
r(p) two-sided p-value (last variable pair)
Matrices
r(Nobs) number of observations
r(Tau_a) τ_a
r(Tau_b) τ_b
r(Score) Kendall's score
r(Se_Score) standard error of score
r(P) two-sided p-value
Methods and formulas
Spearman’s (1904) rank correlation is calculated as Pearson’s correlation computed on the ranks
and average ranks (Conover 1999, 314 315). Ranks are as calculated by egen; see [D]egen. The
significance is calculated using the approximation
p= 2 ×ttail(n2,|bρ|n2/p1bρ2)
For any two pairs of ranks (x_i, y_i) and (x_j, y_j) of one variable pair (varname_1, varname_2),
1 ≤ i, j ≤ n, where n is the number of observations, define them as concordant if

    (x_i − x_j)(y_i − y_j) > 0

and discordant if this product is less than zero.
Kendall’s (1938; also see Kendall and Gibbons [1990] or Bland [2000], 222–225) score Sis
defined as CD, where C(D) is the number of concordant (discordant) pairs. Let N=n(n1)/2
be the total number of pairs, so τais given by
τa=S/N
and τbis given by
τb=S
NUNV
where

    U = Σ_{i=1}^{N_1} u_i(u_i − 1)/2

    V = Σ_{j=1}^{N_2} v_j(v_j − 1)/2

and where N_1 is the number of sets of tied x values, u_i is the number of tied x values in the ith
set, N_2 is the number of sets of tied y values, and v_j is the number of tied y values in the jth set.
Under the null hypothesis of independence between varname_1 and varname_2, the variance of S is
exactly (Kendall and Gibbons 1990, 66)

    Var(S) = (1/18) { n(n − 1)(2n + 5) − Σ_{i=1}^{N_1} u_i(u_i − 1)(2u_i + 5) − Σ_{j=1}^{N_2} v_j(v_j − 1)(2v_j + 5) }

           + 1/{9n(n − 1)(n − 2)} { Σ_{i=1}^{N_1} u_i(u_i − 1)(u_i − 2) } { Σ_{j=1}^{N_2} v_j(v_j − 1)(v_j − 2) }

           + 1/{2n(n − 1)} { Σ_{i=1}^{N_1} u_i(u_i − 1) } { Σ_{j=1}^{N_2} v_j(v_j − 1) }
Using a normal approximation with a continuity correction,

    z = ( |S| − 1 ) / √Var(S)

For the hypothesis of independence, the statistics S, τ_a, and τ_b produce equivalent tests and give the
same significance.
For Kendall’s τ, the normal approximation is surprisingly accurate for sample sizes as small as 8,
at least for calculating p-values under the null hypothesis for continuous variables. (See Kendall and
Gibbons [1990, chap. 4], who also present some tables for calculating exact p-values for n < 10.)
For Spearman’s ρ, the normal approximation requires larger samples to be valid.
Let v be the number of variables specified so that k = v(v − 1)/2 correlation coefficients are
to be estimated. If bonferroni is specified, the adjusted significance level is p′ = min(1, kp). If
sidak is specified, p′ = min{1, 1 − (1 − p)^k}. See Methods and formulas in [R] oneway for a
more complete description of the logic behind these adjustments.
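These adjustments are easy to reproduce from the stored results; a minimal sketch for the three-variable
example above, where k = 3 and r(P) holds the two-sided p-values (the matrix name P is arbitrary):
. quietly spearman mpg price rep78, pw stats(rho p)
. matrix P = r(P)
. display "Bonferroni-adjusted p for mpg and price: " min(1, 3*P[2,1])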
Early work on rank correlation is surveyed by Kruskal (1958).
Acknowledgment
The original version of ktau was written by Sean Becketti, a past editor of the Stata Technical
Bulletin and author of the Stata Press book Introduction to Time Series Using Stata.
spearman — Spearman’s and Kendall’s correlations 2205
References
Barnard, G. A. 1997. Kendall, Maurice George. In Leading Personalities in Statistical Sciences: From the Seventeenth
Century to the Present, ed. N. L. Johnson and S. Kotz, 130–132. New York: Wiley.
Bland, M. 2000. An Introduction to Medical Statistics. 3rd ed. Oxford: Oxford University Press.
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
David, H. A., and W. A. Fuller. 2007. Sir Maurice Kendall (1907–1983): A centenary appreciation. American
Statistician 61: 41–46.
Jeffreys, H. 1961. Theory of Probability. 3rd ed. Oxford: Oxford University Press.
Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30: 81–93.
Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. New York: Oxford University Press.
Kruskal, W. H. 1958. Ordinal measures of association. Journal of the American Statistical Association 53: 814–861.
Lovie, P., and A. D. Lovie. 1996. Charles Edward Spearman, F.R.S. (1863–1945). Notes and Records of the Royal
Society of London 50: 75–88.
Newson, R. B. 2000a. snp15: somersd—Confidence intervals for nonparametric statistics and their differences. Stata
Technical Bulletin 55: 47–55. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 312–322. College Station,
TX: Stata Press.
Newson, R. B. 2000b. snp15.1: Update to somersd. Stata Technical Bulletin 57: 35. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 322–323. College Station, TX: Stata Press.
Newson, R. B. 2000c. snp15.2: Update to somersd. Stata Technical Bulletin 58: 30. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, p. 323. College Station, TX: Stata Press.
Newson, R. B. 2001. snp15.3: Update to somersd. Stata Technical Bulletin 61: 22. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, p. 324. College Station, TX: Stata Press.
Newson, R. B. 2003. snp15 4: Software update for somersd. Stata Journal 3: 325.
Newson, R. B. 2005. snp15 5: Software update for somersd. Stata Journal 5: 470.
Newson, R. B. 2006. Confidence intervals for rank statistics: Percentile slopes, differences, and ratios. Stata Journal
6: 497–520.
Seed, P. T. 2001. sg159: Confidence intervals for correlations. Stata Technical Bulletin 59: 27–28. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 267–269. College Station, TX: Stata Press.
Spearman, C. E. 1904. The proof and measurement of association between two things. American Journal of Psychology
15: 72–101.
Wolfe, F. 1997. sg64: pwcorrs: Enhanced correlation display. Stata Technical Bulletin 35: 22–25. Reprinted in Stata
Technical Bulletin Reprints, vol. 6, pp. 163–167. College Station, TX: Stata Press.
Wolfe, F. 1999. sg64.1: Update to pwcorrs. Stata Technical Bulletin 49: 17. Reprinted in Stata Technical Bulletin
Reprints, vol. 9, p. 159. College Station, TX: Stata Press.
Also see
[R] correlate    Correlations (covariances) of variables or coefficients
[R] nptrend    Test for trend across ordered groups
Title
spikeplot — Spike plots and rootograms
Syntax Menu Description Options
Remarks and examples Acknowledgments References Also see
Syntax
spikeplot varname if  in weight ,options
options Description
Main
round(#)    round varname to nearest multiple of # (bin width)
fraction make vertical scale the proportion of total values; default is frequencies
root make vertical scale show square roots of frequencies
Plot
spike options affect rendition of plotted spikes
Add plots
addplot(plot)    add other plots to generated graph
Y axis, X axis, Titles, Legend, Overall, By
twoway options any options documented in [G-3]twoway options
fweights, aweights, and iweights are allowed; see [U] 11.1.6 weight.
Menu
Graphics >Distributional graphs >Spike plot and rootogram
Description
spikeplot produces a frequency plot for a variable in which the frequencies are depicted as
vertical lines from zero. The frequency may be a count, a fraction, or the square root of the count
(Tukey’s rootogram, circa 1965). The vertical lines may also originate from a baseline other than
zero at the user’s option.
Options
 
Main
round(#) rounds the values of varname to the nearest multiple of #. This action effectively specifies
the bin width.
fraction specifies that the vertical scale be the proportion of total values (percentage) rather than
the count.
root specifies that the vertical scale show square roots. This option may not be specified if fraction
is specified.
 
Plot
spike_options affect the rendition of the plotted spikes; see [G-2] graph twoway spike.
 
Add plots
addplot(plot) provides a way to add other plots to the generated graph. See [G-3] addplot_option.
 
Y axis, X axis, Titles, Legend, Overall, By
twoway_options are any of the options documented in [G-3] twoway_options. These include options
for titling the graph (see [G-3] title_options), options for saving the graph to disk (see
[G-3] saving_option), and the by() option (see [G-3] by_option).
Remarks and examples
Example 1
Cox and Brady (1997a) present an illustrative example using the age structure of the population
of Ghana from the 1960 census (rounded to the nearest 1,000). The dataset has ages from 0 (less
than 1 year) to 90. To view the distribution of ages, we would like to use each integer from 0 to 90
as the bins for the dataset.
. use http://www.stata-press.com/data/r13/ghanaage
. spikeplot age [fw=pop], ytitle("Population in 1000s") xlab(0(10)90)
> xmtick(5(10)85)
(figure omitted; y axis: Population in 1000s, x axis: Age in years)
The resulting graph shows a “heaping” of ages at the multiples of 5. Also, ages ending in even
numbers are more frequent than ages ending in odd numbers (except for 5). This preference for
reporting ages is well known in demography and other social sciences.
Note also that we used the ytitle() option to override the default title of “Frequency” and that
we used the xlab() and xmtick() options with numlists to further customize the resulting graph.
See [U] 11.1.8 numlist for details on specifying numlists.
Example 2
The rootogram is a plot of the square-root transformation of the frequency counts. The square root
of a normal density is a multiple of another normal density.
. clear
. set seed 1234567
. set obs 5000
obs was 0, now 5000
. generate normal = rnormal()
. label variable normal "Gaussian(0,1) random numbers"
. spikeplot normal, round(.10) xlab(-4(1)4)
(figure omitted; y axis: Frequency, x axis: Gaussian(0,1) random numbers)
. spikeplot normal, round(.10) xlab(-4(1)4) root
(figure omitted; y axis: Root of frequency, x axis: Gaussian(0,1) random numbers)
Interpreting a histogram in terms of normality is thus similar to interpreting the rootogram for
normality.
This example also shows how the round() option is used to bin the values for a spike plot of a
continuous variable.
Example 3
spikeplot can also be used to produce time-series plots. varname should be the time variable,
and weights should be specified as the values for those times. To get a plot of daily rainfall, we type
. spikeplot day [w=rain] if rain, ytitle("Daily rainfall in mm")
The base() option of graph twoway spike may be used to set a different baseline, such as
when we want to show variations relative to an average or to some other measure of level.
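For instance, to draw the rainfall spikes relative to the mean daily rainfall rather than from zero, one
might first compute the mean and pass it to base(). This is only a sketch, reusing the hypothetical day
and rain variables from above:
. summarize rain, meanonly
. local avg = r(mean)
. spikeplot day [w=rain] if rain, base(`avg') ytitle("Daily rainfall in mm")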
Acknowledgments
The original version of spikeplot was written by Nicholas J. Cox of the Department of Geography
at Durham University, UK, and coeditor of the Stata Journal and Anthony R. Brady of the Imperial
College School of Medicine (1997a, 1997b).
References
Cox, N. J., and A. R. Brady. 1997a. gr25: Spike plots for histograms, rootograms, and time-series plots. Stata
Technical Bulletin 36: 8–11. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 50–54. College Station,
TX: Stata Press.
. 1997b. gr25.1: Spike plots for histograms, rootograms, and time-series plots: Update. Stata Technical Bulletin
40: 12. Reprinted in Stata Technical Bulletin Reprints, vol. 7, p. 58. College Station, TX: Stata Press.
Tukey, J. W. 1965. The future of processes of data analysis. In The Collected Works of John W. Tukey, Volume IV:
Philosophy and Principles of Data Analysis: 1965–1986, ed. L. V. Jones, 123–126. Monterey, CA: Wadsworth &
Brooks/Cole.
Also see
[R] histogram    Histograms for continuous and categorical variables
Title
ssc — Install and uninstall packages from SSC
Syntax Description Options Remarks and examples
Acknowledgments References Also see
Syntax
Summary of packages most recently added or updated at SSC
    ssc new [, saving(filename[, replace]) type]
Summary of most popular packages at SSC
    ssc hot [, n(#) author(name)]
Describe a specified package at SSC
    ssc describe {pkgname | letter} [, saving(filename[, replace])]
Install a specified package from SSC
    ssc install pkgname [, all replace]
Uninstall from your computer a previously installed package from SSC
    ssc uninstall pkgname
Type a specific file stored at SSC
    ssc type filename [, asis]
Copy a specific file from SSC to your computer
    ssc copy filename [, plus personal replace public binary]
where letter in ssc describe is a-z or _ (underscore).
Description
ssc works with packages (and files) from the Statistical Software Components (SSC) archive,
which is often called the Boston College Archive and is provided by http://repec.org.
The SSC has become the premier Stata download site for user-written software on the web.
ssc provides a convenient interface to the resources available there. For example, on Statalist (see
http://www.statalist.org/), users will often write
The program can be found by typing ssc install newprogramname.
Typing that would load everything associated with newprogramname, including the help files.
If you are searching for what is available, type ssc new and ssc hot, and see [R] search. search
searches the SSC and other places, too. search provides a GUI interface from which programs can
be installed, including the programs at the SSC archive.
You can uninstall particular packages by using ssc uninstall. For the packages that you keep,
see [R] adoupdate for an automated way of keeping those packages up to date.
Command overview
ssc new summarizes the packages made available or updated recently. Output is presented in the
Stata Viewer, and from there you may click to find out more about individual packages or to
install them.
ssc hot lists the most popular packages, where popularity is based on a moving average of the
number of downloads in the past three months. By default, 10 packages are listed.
ssc describe pkgname describes, but does not install, the specified package. Use search to find
packages; see [R] search. If you know the package name but do not know the exact spelling, type
ssc describe followed by one letter, a-z or _ (underscore), to list all the packages starting with
that letter.
ssc install pkgname installs the specified package. You do not have to describe a package before
installing it. (You may also install a package by using net install; see [R] net.)
ssc uninstall pkgname removes the previously installed package from your computer. It does not
matter how the package was installed. (ssc uninstall is a synonym for ado uninstall, so either
may be used to uninstall any package.)
ssc type filename types a specific file stored at SSC. ssc cat is a synonym for ssc type, which
may appeal to those familiar with Unix.
ssc copy filename copies a specific file stored at SSC to your computer. By default, the file is
copied to the current directory, but you can use options to change this. ssc copy is a rarely used
alternative to ssc install . . . , all. ssc cp is a synonym for ssc copy.
Options
Options are presented under the following headings:
Options for use with ssc new
Options for use with ssc hot
Option for use with ssc describe
Options for use with ssc install
Option for use with ssc type
Options for use with ssc copy
Options for use with ssc new
saving(filename[, replace]) specifies that the “what’s new” summary be saved in filename. If
filename is specified without a suffix, filename.smcl is assumed. If saving() is not specified,
saving(ssc_results.smcl) is assumed.
type specifies that the “what’s new” results be displayed in the Results window rather than in the
Viewer.
Options for use with ssc hot
n(#) specifies the number of packages to list; n(10) is the default. Specify n(.) to list all packages
in order of popularity.
author(name) lists the 10 most popular packages by the specified author. If n(#) is also specified,
the top # packages are listed.
Option for use with ssc describe
saving(filename[, replace]) specifies that, in addition to the description being displayed on
your screen, it be saved in the specified file.
If filename is specified without an extension, .smcl will be assumed, and the file will be saved
as a SMCL file.
If filename is specified with an extension, no default extension is added. If the extension is .log,
the file will be stored as a text file.
If replace is specified, filename is replaced if it already exists.
Options for use with ssc install
all specifies that any ancillary files associated with the package be downloaded to your current
directory, in addition to the program and help files being installed. Ancillary files are files that
do not end in .ado or .sthlp and typically contain datasets or examples of the use of the new
command.
You can find out which files are associated with the package by typing ssc describe pkgname
before or after installing. If you install without using the all option and then want the ancillary
files, you can ssc install again.
replace specifies that any files being downloaded that already exist on your computer be replaced
by the downloaded files. If replace is not specified and any files already exist, none of the files
from the package is downloaded or installed.
It is better not to specify the replace option and wait to see if there is a problem. If there
is a problem, it is usually better to uninstall the old package by using ssc uninstall or ado
uninstall (which are, in fact, the same command).
Option for use with ssc type
asis affects how files with the suffixes .smcl and .sthlp are displayed. The default is to interpret
SMCL directives the file might contain. asis specifies that the file be displayed in raw, uninterpreted
form.
Options for use with ssc copy
plus specifies that the file be copied to the PLUS directory, the directory where user-written additions
are installed. Typing sysdir will display the identity of the PLUS directory on your computer;
see [P] sysdir.
personal specifies that the file be copied to your PERSONAL directory as reported by sysdir; see
[P] sysdir.
If neither plus nor personal is specified, the default is to copy the file to the current directory.
replace specifies that, if the file already exists on your computer, the new file replace it.
public specifies that the new file be made readable by everyone; otherwise, the file will be created
according to the default permission you have set with your operating system.
binary specifies that the file being copied is a binary file and that it is to be copied as is. The default
is to assume that the file is a text file and change the end-of-line characters to those appropriate
for your computer/operating system.
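For instance, to place a copy of an ado-file directly in the PLUS directory rather than in the current
directory, one might type the following (a sketch only, reusing the bidensity.ado file from the
examples below; sysdir shows where PLUS points on your machine):
. sysdir
. ssc copy bidensity.ado, plus replace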
Remarks and examples
Users can add new features to Stata, and some users choose to make new features that they have
written available to others via the web. The files that comprise a new feature are called a package, and
a package usually consists of one or more ado-files and help files. The net command (see [R]net)
makes it reasonably easy to install and uninstall packages regardless of where they are on the web.
One site, the SSC, has become particularly popular as a repository for additions to Stata. Command
ssc is an easier-to-use version of net designed especially for the SSC.
Many packages are available at the SSC. Packages have names, such as oaxaca, estout, or
egenmore. At SSC, capitalization is not significant, so Oaxaca, ESTOUT, and EGENmore are ways of
writing the same package names.
When you type
. ssc install oaxaca
the files associated with the package are downloaded and installed on your computer. Package names
usually correspond to the names of the command being added to Stata, so one would expect that
installing the package oaxaca will add command oaxaca to Stata on your computer, and expect
that typing help oaxaca will provide the documentation. That is the situation here, but that is not
always so. Before or after installing a package, type ssc describe pkgname to obtain the details.
Example 1
ssc new summarizes the packages most recently made available or updated. Output is presented in
the Viewer, from which you may click on a package name to find out more or install it. For example,
. ssc new
(contacting http://repec.org)
(output omitted )
GEOCODE3
module to retrieve coordinates or addresses from Google Geocoding API Version3
Authors: Stefan Bernhard Req: Stata version 12, insheetjson and libjson
> from SSC (q.v.)
Created: 2013-05-19
GGTAX
module to identify the most suitable GG family model
Authors: Andres L Gonzalez Rangel Req: Stata version 11
Created: 2013-05-19
ASL_NORM
module computing bootstrap Gaussianity tests
Authors: Maarten L. Buis Req: Stata version 11
Created: 2013-05-16
(output omitted )
End of recent additions and updates
ssc hot provides a list of the most popular packages at SSC.
. ssc hot
Apr 2013
Rank # hits Package Author(s)
1 12621.4 estout Ben Jann
2 12606.8 outreg2 Roy Wada
3 8508.8 ranktest Mark E Schaffer, Frank Kleibergen
4 8061.9 ivreg2 Mark E Schaffer, Steven Stillman,
Christopher F Baum
5 3595.9 psmatch2 Edwin Leuven, Barbara Sianesi
6 2862.6 tabout Ian Watson
7 2358.2 outreg John Luke Gallup
8 2300.3 winsor Nicholas J. Cox
9 1743.0 xtabond2 David Roodman
10 1482.2 xtivreg2 Mark E Schaffer
(Click on package name for description)
Use the n(#)option to change the number of packages listed:
. ssc hot, n(20)
Apr 2013
Rank # hits Package Author(s)
1 12621.4 estout Ben Jann
2 12606.8 outreg2 Roy Wada
3 8508.8 ranktest Mark E Schaffer, Frank Kleibergen
4 8061.9 ivreg2 Mark E Schaffer, Steven Stillman,
Christopher F Baum
5 3595.9 psmatch2 Edwin Leuven, Barbara Sianesi
6 2862.6 tabout Ian Watson
7 2358.2 outreg John Luke Gallup
8 2300.3 winsor Nicholas J. Cox
9 1743.0 xtabond2 David Roodman
10 1482.2 xtivreg2 Mark E Schaffer
11 1481.7 fre Ben Jann
12 1361.7 cem Stefano Iacus, Gary King, Matthew
Blackwell, Giuseppe Porro
13 1279.3 xttest3 Christopher F Baum
14 1205.7 mdesc Rose Anne Medeiros, Dan Blanchette
15 1188.7 bcuse Christopher F Baum
16 1154.4 usespss Sergiy Radyakin
17 1113.3 distinct Gary Longton, Nicholas J. Cox
18 1111.1 egenmore Nicholas J. Cox
19 950.6 gllamm Sophia Rabe-Hesketh
20 948.3 hprescott Christopher F Baum
(Click on package name for description)
The author(name)option allows you to list the most popular packages by a specific person:
. ssc hot, author(baum)
Apr 2013
Rank # hits Package Author(s)
4 8061.9 ivreg2 Mark E Schaffer, Steven Stillman,
Christopher F Baum
13 1279.3 xttest3 Christopher F Baum
15 1188.7 bcuse Christopher F Baum
20 948.3 hprescott Christopher F Baum
27 806.8 tscollap Christopher F Baum
31 742.7 whitetst Christopher F Baum, Nicholas J. Cox
34 696.3 xttest2 Christopher F Baum
42 574.0 overid Christopher F Baum, Mark E Schaffer,
Vince Wiggins, Steven Stillman
46 495.3 kpss Christopher F Baum
58 437.0 ivendog Christopher F Baum, Mark E Schaffer,
Steven Stillman
(Click on package name for description)
ssc describe pkgname describes, but does not install, the specified package. You must already
know the name of the package. See [R] search for assistance in searching for packages. Sometimes
you know the package name, but you do not know the exact spelling. Then you can type ssc
describe followed by one letter, a-z or _ (underscore), to list all the packages starting with that letter; even so,
using search is better.
. ssc describe bidensity
package from http://fmwww.bc.edu/repec/bocode/b
’BIDENSITY’: module to produce and graph bivariate density estimates
bidensity produces bivariate kernel density estimates and graphs
the result using a twoway contourline plot, optionally
overlaying a scatterplot. The default kernel is Epanechnikov;
all of the kernels provided by -kdensity- are also available.
Compared to Baum’s -kdens2- (SSC), which was recently enhanced to
produce contourline plots, -bidensity- computes the bivariate
kernel densities much more efficiently through use of Mata, and
provides a choice of kernel estimators. The estimated densities
can be saved in a Stata dataset or accessed as Mata matrices.
KW: density estimation
KW: bivariate density
KW: contourline plots
Requires: Stata version 12.1 and moremata from SSC (q.v.)
Distribution-Date: 20130119
Author: John Luke Gallup, Portland State University
Support: email jlgallup@pdx.edu
Author: Christopher F Baum, Boston College
Support: email baum@bc.edu
bidensity.ado
bidensity.sthlp
(type ssc install bidensity to install)
The default setting for the saving() option is for the output to be saved with the .smcl extension.
You could also save the file with a .log extension, and in this case, the file would be stored as a text
file.
. ssc describe b, saving(b.index)
(output omitted )
. ssc describe bidensity, saving(bidensity.log)
(output omitted )
ssc install pkgname installs the specified package. You do not have to describe a package
before installing it. There are ways of installing packages other than ssc install, such as net; see
[R]net. It does not matter how a package is installed. For instance, a package can be installed using
net and still be uninstalled using ssc.
. ssc install bidensity
checking consistency and verifying not already installed...
installing into C:\ado\plus\...
installation complete.
ssc uninstall pkgname removes the specified, previously installed package from your computer.
You can uninstall immediately after installation or at any time in the future. (Technical note: ssc
uninstall is a synonym for ado uninstall, so it can uninstall any installed package, not just
packages obtained from the SSC.)
. ssc uninstall bidensity
package from http://fmwww.bc.edu/repec/bocode/b
(package uninstalled)
ssc type filename types a specific file stored at the SSC. Although not shown in the syntax
diagram, ssc cat is a synonym for ssc type, which may appeal to those familiar with Unix. To
view only the bidensity help file for the bidensity package, you would type
. ssc type bidensity.sthlp
help for bidensity
Bivariate kernel density estimation
bidensity varnameY varnameX [if exp] [in range] [, n(#)
kernel(kernelname) xwidth(#) ywidth(#) saving( name) replace
nograph scatter[(scatter_options)] contourline_options
mname(name)
(output omitted )
ssc copy filename copies a specific file stored at the SSC to your computer. By default, the file
is copied to the current directory, but you can use options to change this. ssc copy is a rarely used
alternative to ssc install . . . , all. ssc cp is a synonym for ssc copy.
. ssc copy bidensity.ado
(file bidensity.ado copied to current directory)
For more details on the SSC archive and for information on how to submit your own programs to
the SSC, see http://repec.org/bocode/s/sscsubmit.html.
Acknowledgments
ssc is based on archutil by Nicholas J. Cox of the Department of Geography at Durham
University, UK, and coeditor of the Stata Journal and Christopher F. Baum of the Department of
Economics at Boston College and author of the Stata Press books An Introduction to Modern
Econometrics Using Stata and An Introduction to Stata Programming. The reworking of the original
was done with their blessing and their participation.
Christopher Baum maintains the Stata-related files stored at the SSC archive. We thank him for
this contribution to the Stata community.
References
Baum, C. F., and N. J. Cox. 1999. ip29: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 52: 10–12. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 121–124. College
Station, TX: Stata Press.
Cox, N. J., and C. F. Baum. 2000. ip29.1: Metadata for user-written contributions to the Stata programming language.
Stata Technical Bulletin 54: 21–22. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 124–126. College
Station, TX: Stata Press.
Also see
[R] adoupdate    Update user-written ado-files
[R] net    Install and manage user-written additions from the Internet
[R] search    Search Stata documentation and other resources
[R] sj    Stata Journal and STB installation instructions
[P] sysdir    Query and set system directories
Title
stem — Stem-and-leaf displays
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
stem varname [if] [in] [, options]
options            Description
Main
  prune            do not print stems that have no leaves
  round(#)         round data to this value; default is round(1)
  truncate(#)      truncate data to this value
  digits(#)        digits per leaf; default is digits(1)
  lines(#)         number of stems per interval of 10^digits
  width(#)         stem width; equal to 10^digits/lines
by is allowed; see [D] by.
Menu
Statistics > Summaries, tables, and tests > Distributional plots and tests > Stem-and-leaf display
Description
stem displays stem-and-leaf plots.
Options
 
Main
prune prevents printing any stems that have no leaves.
round(#) rounds the data to this value and displays the plot in these units. If round() is not
specified, noninteger data will be rounded automatically.
truncate(#) truncates the data to this value and displays the plot in these units.
digits(#) sets the number of digits per leaf. The default is 1.
lines(#) sets the number of stems per every data interval of 10^digits. The value of lines() must
divide 10^digits; that is, if digits(1) is specified, then lines() must divide 10. If digits(2) is
specified, then lines() must divide 100, etc. Only one of lines() or width() may be specified.
If neither is specified, an appropriate value will be set automatically.
width(#) sets the width of a stem. lines() is equal to 10^digits/width, and this option is merely
an alternative way of setting lines(). The value of width() must divide 10^digits. Only one of
width() or lines() may be specified. If neither is specified, an appropriate value will be set
automatically.
Note: If lines() or width() is not specified, digits() may be decreased in some circumstances
to make a better-looking plot. If lines() or width() is set, the user-specified value of digits()
will not be altered.
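As a worked illustration of the relationship between lines() and width(): with digits(1), the data
interval is 10^1 = 10, so lines(5) implies a stem width of 10/5 = 2, and width(2) implies lines(10/2) = 5.
Assuming the auto dataset used in the examples below is in memory, the following two commands
therefore draw the same plot:
. stem mpg, lines(5)
. stem mpg, width(2)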
Remarks and examples
Example 1
Stem-and-leaf displays are a compact way to present considerable information about a batch of
data. For instance, using our automobile data (described in [U] 1.2.2 Example datasets):
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. stem mpg
Stem-and-leaf plot for mpg (Mileage (mpg))
1t | 22
1f | 44444455
1s | 66667777
1. | 88888888899999999
2* | 00011111
2t | 22222333
2f | 444455555
2s | 666
2. | 8889
3* | 001
3t |
3f | 455
3s |
3. |
4* | 1
The stem-and-leaf display provides a way to list our data. The expression to the left of the vertical
bar is called the stem; the digits to the right are called the leaves. All the stems that begin with the
same digit and the corresponding leaves, written beside each other, reconstruct an observation of the
data. Thus, if we look at the four stems that begin with the digit 1 and their corresponding leaves,
we see that we have two cars rated at 12 mpg, 6 cars at 14, 2 at 15, and so on. The car with the
highest mileage rating in our data is rated at 41 mpg.
The above plot is a five-line plot with lines() equal to 5 (five lines per interval of 10) and
width() equal to 2 (two leaves per stem).
Instead, we could specify lines(2):
. stem mpg, lines(2)
Stem-and-leaf plot for mpg (Mileage (mpg))
1* | 22444444
1. | 556666777788888888899999999
2* | 00011111222223334444
2. | 555556668889
3* | 0014
3. | 55
4* | 1
stem mpg, width(5) would produce the same plot as above.
The stem-and-leaf display provides a crude histogram of our data, one not so pretty as that produced
by histogram (see [R] histogram), but one that is nonetheless informative.
Example 2
The miles per gallon rating fits easily into a stem-and-leaf display because, in our data, it has two
digits. However, stem does not require two digits.
. stem price, lines(1) digits(3)
Stem-and-leaf plot for price (Price)
3*** | 291,299,667,748,798,799,829,895,955,984,995
4*** | 010,060,082,099,172,181,187,195,296,389,424,425,453,482,499, ... (26)
5*** | 079,104,172,189,222,379,397,705,719,788,798,799,886,899
6*** | 165,229,295,303,342,486,850
7*** | 140,827
8*** | 129,814
9*** | 690,735
10*** | 371,372
11*** | 385,497,995
12*** | 990
13*** | 466,594
14*** | 500
15*** | 906
The (26) at the right of the second stem shows that there were 26 leaves on this stem, too many
to display on one line.
We can make a more compact stem-and-leaf plot by rounding. To display the stems in units of 100,
we could type
. stem price, round(100)
Stem-and-leaf plot for price (Price)
price rounded to nearest multiple of 100
plot in units of 100
3* | 33778889
4* | 00001112222344455555667777899
5* | 11222447788899
6* | 2233359
7* | 18
8* | 18
9* | 77
10* | 44
11* | 45
12* | 0
13* | 056
14* | 5
15* | 9
price, in our data, has four or five digits. stem presented the display in terms of units of 100, so a
car that cost $3,291 was treated for display purposes as $3,300.
Technical note
Stem-and-leaf diagrams have been used in Japanese railway timetables, as shown in Tufte (1990,
46–47).
Stored results
stem stores the following in r():
Scalars
r(width) width of a stem
r(digits) number of digits per leaf; default is 1
Macros
r(round) number specified in round()
r(truncate) number specified in truncate()
References
Cox, N. J. 2007. Speaking Stata: Turning over a new leaf. Stata Journal 7: 413–433.
Emerson, J. D., and D. C. Hoaglin. 1983. Stem-and-leaf displays. In Understanding Robust and Exploratory Data
Analysis, ed. D. C. Hoaglin, C. F. Mosteller, and J. W. Tukey, 7–32. New York: Wiley.
Tufte, E. R. 1990. Envisioning Information. Cheshire, CT: Graphics Press.
Tukey, J. W. 1972. Some graphic and semigraphic displays. In Statistical Papers in Honor of George W. Snedecor,
ed. T. A. Bancroft and S. A. Brown, 293–316. Ames, IA: Iowa State University Press.
. 1977. Exploratory Data Analysis. Reading, MA: Addison–Wesley.
Also see
[R] histogram    Histograms for continuous and categorical variables
[R] lv    Letter-value displays
Title
stepwise — Stepwise estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
stepwise [, options] : command
options            Description
Model
  pr(#)            significance level for removal from the model
  pe(#)            significance level for addition to the model
Model 2
  forward          perform forward-stepwise selection
  hierarchical     perform hierarchical selection
  lockterm1        keep the first term
  lr               perform likelihood-ratio test instead of Wald test
Reporting
  display_options  control column formats and line width
At least one of pr(#) or pe(#) must be specified.
by and xi are allowed; see [U] 11.1.10 Prefix commands.
Weights are allowed if command allows them; see [U] 11.1.6 weight.
All postestimation commands behave as they would after command without the stepwise prefix; see the postestimation
manual entry for command.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Other > Stepwise estimation
Description
stepwise performs stepwise estimation. Typing
. stepwise, pr(#): command
performs backward-selection estimation for command. The stepwise selection method is determined
by the following option combinations:
options                   Description
  pr(#)                   backward selection
  pr(#) hierarchical      backward hierarchical selection
  pr(#) pe(#)             backward stepwise
  pe(#)                   forward selection
  pe(#) hierarchical      forward hierarchical selection
  pr(#) pe(#) forward     forward stepwise
command defines the estimation command to be executed. The following Stata commands are
supported by stepwise:
clogit nbreg regress
cloglog ologit scobit
glm oprobit stcox
intreg poisson stcrreg
logistic probit streg
logit qreg tobit
stepwise expects command to have the following form:
    command_name [depvar] term [term ...] [if] [in] [weight] [, command_options]
where term is either varname or (varlist) (a varlist in parentheses indicates that this group of
variables is to be included or excluded together). depvar is not present when command_name is
stcox, stcrreg, or streg; otherwise, depvar is assumed to be present. For intreg, depvar is
actually two dependent variable names (depvar1 and depvar2).
sw is a synonym for stepwise.
Options
 
Model
pr(#) specifies the significance level for removal from the model; terms with p >= pr() are eligible
for removal.
pe(#)specifies the significance level for addition to the model; terms with p < pe() are eligible
for addition.
 
Model 2
forward specifies the forward-stepwise method and may be specified only when both pr() and pe()
are also specified. Specifying both pr() and pe() without forward results in backward-stepwise
selection. Specifying only pr() results in backward selection, and specifying only pe() results
in forward selection.
hierarchical specifies hierarchical selection.
lockterm1 specifies that the first term be included in the model and not be subjected to the selection
criteria.
lr specifies that the test of term significance be the likelihood-ratio test. The default is the less
computationally expensive Wald test; that is, the test is based on the estimated variance-covariance
matrix of the estimators.
 
Reporting
display_options: cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see
[R] estimation options.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Search logic for a step
Full search logic
Examples
Estimation sample considerations
Messages
Programming for stepwise
Introduction
Typing
. stepwise, pr(.10): regress y1 x1 x2 d1 d2 d3 x4 x5
performs a backward-selection search for the regression model y1 on x1, x2, d1, d2, d3, x4, and
x5. In this search, each explanatory variable is said to be a term. Typing
. stepwise, pr(.10): regress y1 x1 x2 (d1 d2 d3) (x4 x5)
performs a similar backward-selection search, but the variables d1, d2, and d3 are treated as one
term, as are x4 and x5. That is, d1, d2, and d3 may or may not appear in the final model, but they
appear or do not appear together.
Example 1
Using the automobile dataset, we fit a backward-selection model of mpg:
. use http://www.stata-press.com/data/r13/auto
. generate weight2 = weight*weight
. stepwise, pr(.2): regress mpg weight weight2 displ gear turn headroom foreign
> price
begin with full model
p = 0.7116 >= 0.2000 removing headroom
p = 0.6138 >= 0.2000 removing displacement
p = 0.3278 >= 0.2000 removing price
Source SS df MS Number of obs = 74
F( 5, 68) = 33.39
Model 1736.31455 5 347.262911 Prob > F = 0.0000
Residual 707.144906 68 10.3991898 R-squared = 0.7106
Adj R-squared = 0.6893
Total 2443.45946 73 33.4720474 Root MSE = 3.2248
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0158002 .0039169 -4.03 0.000 -.0236162 -.0079842
weight2 1.77e-06 6.20e-07 2.86 0.006 5.37e-07 3.01e-06
foreign -3.615107 1.260844 -2.87 0.006 -6.131082 -1.099131
gear_ratio 2.011674 1.468831 1.37 0.175 -.9193321 4.94268
turn -.3087038 .1763099 -1.75 0.084 -.6605248 .0431172
_cons 59.02133 9.3903 6.29 0.000 40.28327 77.75938
This estimation treated each variable as its own term and thus considered each one separately. The
engine displacement and gear ratio should really be considered together:
. stepwise, pr(.2): regress mpg weight weight2 (displ gear) turn headroom
> foreign price
begin with full model
p = 0.7116 >= 0.2000 removing headroom
p = 0.3944 >= 0.2000 removing displacement gear_ratio
p = 0.2798 >= 0.2000 removing price
Source SS df MS Number of obs = 74
F( 4, 69) = 40.76
Model 1716.80842 4 429.202105 Prob > F = 0.0000
Residual 726.651041 69 10.5311745 R-squared = 0.7026
Adj R-squared = 0.6854
Total 2443.45946 73 33.4720474 Root MSE = 3.2452
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
weight -.0160341 .0039379 -4.07 0.000 -.0238901 -.0081782
weight2 1.70e-06 6.21e-07 2.73 0.008 4.58e-07 2.94e-06
foreign -2.758668 1.101772 -2.50 0.015 -4.956643 -.5606925
turn -.2862724 .176658 -1.62 0.110 -.6386955 .0661508
_cons 65.39216 8.208778 7.97 0.000 49.0161 81.76823
Search logic for a step
Before discussing the complete search logic, consider the logic for a step (the first step) in
detail. The other steps follow the same logic. If you type
. stepwise, pr(.20): regress y1 x1 x2 (d1 d2 d3) (x4 x5)
the logic is
1. Fit the model y on x1 x2 d1 d2 d3 x4 x5.
2. Consider dropping x1.
3. Consider dropping x2.
4. Consider dropping d1 d2 d3.
5. Consider dropping x4 x5.
6. Find the term above that is least significant. If its significance
level is >= 0.20, remove that term.
If you type
. stepwise, pr(.20) hierarchical: regress y1 x1 x2 (d1 d2 d3) (x4 x5)
the logic would be different because the hierarchical option states that the terms are ordered. The
initial logic would become
1. Fit the model y on x1 x2 d1 d2 d3 x4 x5.
2. Consider dropping x4 x5, the last term.
3. If the significance of this last term is >= 0.20, remove the term.
The process would then stop or continue. It would stop if x4 x5 were not dropped, and otherwise,
stepwise would continue to consider the significance of the next-to-last term, d1 d2 d3.
Specifying pe() rather than pr() switches to forward estimation. If you type
. stepwise, pe(.20): regress y1 x1 x2 (d1 d2 d3) (x4 x5)
stepwise performs forward-selection search. The logic for the first step is
1. Fit a model of y on nothing (meaning a constant).
2. Consider adding x1.
3. Consider adding x2.
4. Consider adding d1 d2 d3.
5. Consider adding x4 x5.
6. Find the term above that is most significant. If its significance
level is <0.20, add that term.
As with backward estimation, if you specify hierarchical,
. stepwise, pe(.20) hierarchical: regress y1 x1 x2 (d1 d2 d3) (x4 x5)
the search for the most significant term is restricted to the next term:
1. Fit a model of y on nothing (meaning a constant).
2. Consider adding x1, the first term.
3. If the significance is <0.20, add the term.
If x1 were added, stepwise would next consider x2; otherwise, the search process would stop.
stepwise can also use a stepwise selection logic that alternates between adding and removing
terms. The full logic for all the possibilities is given below.
Full search logic
Option Logic
pr() Fit the full model on all explanatory variables.
(backward selection) While the least-significant term is “insignificant”, remove it
and reestimate.
pr() hierarchical Fit full model on all explanatory variables.
(backward hierarchical selection) While the last term is “insignificant”, remove it
and reestimate.
pr() pe() Fit full model on all explanatory variables.
(backward stepwise) If the least-significant term is “insignificant”, remove it and
reestimate; otherwise, stop.
Do that again: if the least-significant term is “insignificant”,
remove it and reestimate; otherwise, stop.
Repeatedly,
if the most-significant excluded term is “significant”, add
it and reestimate;
if the least-significant included term is “insignificant”,
remove it and reestimate;
until neither is possible.
pe() Fit “empty” model.
(forward selection) While the most-significant excluded term is “significant”,
add it and reestimate.
pe() hierarchical Fit “empty” model.
(forward hierarchical selection) While the next term is “significant”, add it
and reestimate.
pr() pe() forward Fit “empty” model.
(forward stepwise) If the most-significant excluded term is “significant”,
add it and reestimate; otherwise, stop.
Do that again: if the most-significant excluded term is
“significant”, add it and reestimate; otherwise, stop.
Repeatedly,
if the least-significant included term is “insignificant”,
remove it and reestimate;
if the most-significant excluded term is “significant”,
add it and reestimate;
until neither is possible.
Examples
The following two statements are equivalent; both include solely single-variable terms:
. stepwise, pr(.2): regress price mpg weight displ
. stepwise, pr(.2): regress price (mpg) (weight) (displ)
The following two statements are equivalent; the last term in each is r1, . . . , r4:
. stepwise, pr(.2) hierarchical: regress price mpg weight displ (r1-r4)
. stepwise, pr(.2) hierarchical: regress price (mpg) (weight) (displ) (r1-r4)
To group variables weight and displ into one term, type
. stepwise, pr(.2) hierarchical: regress price mpg (weight displ) (r1-r4)
stepwise can be used with commands other than regress; for instance,
. stepwise, pr(.2): logit outcome (sex weight) treated1 treated2
. stepwise, pr(.2): logistic outcome (sex weight) treated1 treated2
Either statement would fit the same model because logistic and logit both perform logistic
regression; they differ only in how they report results; see [R]logit and [R]logistic.
We use the lockterm1 option to force the first term to be included in the model. To keep treated1
and treated2 in the model no matter what, we type
. stepwise, pr(.2) lockterm1: logistic outcome (treated1 treated2) ...
After stepwise estimation, we can type stepwise without arguments to redisplay results,
. stepwise
(output from logistic appears )
or type the underlying estimation command:
. logistic
(output from logistic appears )
At estimation time, we can specify options unique to the command being stepped:
. stepwise, pr(.2): logit outcome (sex weight) treated1 treated2, or
or is logit's option to report odds ratios rather than coefficients; see [R] logit.
Estimation sample considerations
Whether you use backward or forward estimation, stepwise forms an estimation sample by taking
observations with nonmissing values of all the variables specified (except for depvar1 and depvar2
for intreg). The estimation sample is held constant throughout the stepping. Thus if you type
. stepwise, pr(.2) hierarchical: regress amount sk edul sval
and variable sval is missing in half the data, that half of the data will not be used in the reported
model, even if sval is not included in the final model.
The function e(sample) identifies the sample that was used. e(sample) contains 1 for observations
used and 0 otherwise. For instance, if you type
. stepwise, pr(.2) pe(.10): logistic outcome x1 x2 (x3 x4) (x5 x6 x7)
and the final model is outcome on x1,x5,x6, and x7, you could re-create the final regression by
typing
. logistic outcome x1 x5 x6 x7 if e(sample)
You could obtain summary statistics within the estimation sample of the independent variables by
typing
. summarize x1 x5 x6 x7 if e(sample)
If you fit another model, e(sample) will automatically be redefined. Typing
. stepwise, lock pr(.2): logistic outcome (x1 x2) (x3 x4) (x5 x6 x7)
would automatically drop e(sample) and re-create it.
Messages
note: dropped because of collinearity
Each term is checked for collinearity, and variables within the term are dropped if collinearity is
found. For instance, say that you type
. stepwise, pr(.2): regress y x1 x2 (r1-r4) (x3 x4)
and assume that variables r1 through r4 are mutually exclusive and exhaustive dummy
variables; perhaps r1, . . . , r4 indicate in which of four regions the subject resides. One of the r1,
. . . , r4 variables will be automatically dropped to identify the model.
This message should cause you no concern.
Error message: between-term collinearity, variable
After removing any within-term collinearity, if stepwise still finds collinearity between terms, it
refuses to continue. For instance, assume that you type
. stepwise, pr(.2): regress y1 x1 x2 (d1-d8) (r1-r4)
Assume that r1, . . . , r4 identify in which of four regions the subject resides, and that d1, . . . , d8
identify the same sort of information, but more finely. r1, say, amounts to d1 and d2; r2 to d3, d4,
and d5; r3 to d6 and d7; and r4 to d8. You can estimate the d* variables or the r* variables, but
not both.
It is your responsibility to specify noncollinear terms.
note: dropped because of estimability
note: obs. dropped because of estimability
You probably received this message in fitting a logistic or probit model. Regardless of estimation
strategy, stepwise checks that the full model can be fit. The indicated variable had a 0 or infinite
standard error.
For logistic, logit, and probit, this message is typically caused by one-way causation. Assume that
you type
. stepwise, pr(.2): logistic outcome (x1 x2 x3) d1
and assume that variable d1 is an indicator (dummy) variable. Further assume that whenever d1 =1,
outcome =1 in the data. Then the coefficient on d1 is infinite. One (conservative) solution to this
problem is to drop the d1 variable and the d1==1 observations. The underlying estimation commands
probit,logit, and logistic report the details of the difficulty and solution; stepwise simply
accumulates such problems and reports the above summary messages. Thus if you see this message,
you could type
. logistic outcome x1 x2 x3 d1
to see the details. Although you should think carefully about such situations, Stata’s solution of
dropping the offending variables and observations is, in general, appropriate.
Programming for stepwise
stepwise requires that command_name follow standard Stata syntax and allow the if qualifier;
see [U] 11 Language syntax. Furthermore, command_name must have sw or swml as a program
property; see [P] program properties. If command_name has swml as a property, command_name
must store the log-likelihood value in e(ll) and model degrees of freedom in e(df_m).
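As a minimal sketch of such a wrapper (the command name myols and its delegation to regress are
purely illustrative and not part of official Stata), a Wald-test-based (sw) command could look like this:

program define myols, eclass properties(sw)
        version 13
        syntax varlist(min=2 numeric) [if] [in]
        marksample touse
        gettoken depvar rhs : varlist
        // A real command would fit its own model and post results with
        // ereturn post; here we simply delegate to regress, which already
        // follows standard syntax and leaves e(b) and e(V) behind.
        regress `depvar' `rhs' if `touse'
end

After defining the program, it could be stepped like any supported estimator:
. stepwise, pr(.2): myols mpg weight length turn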
Stored results
stepwise stores whatever is stored by the underlying estimation command.
Also, stepwise stores stepwise in e(stepwise).
Methods and formulas
Some statisticians do not recommend stepwise procedures; see Sribney (1998) for a summary.
References
Afifi, A. A., S. May, and V. A. Clark. 2012. Practical Multivariate Analysis. 5th ed. Boca Raton, FL: CRC Press.
Beale, E. M. L. 1970. Note on procedures for variable selection in multiple regression. Technometrics 12: 909–914.
Bendel, R. B., and A. A. Afifi. 1977. Comparison of stopping rules in forward “stepwise” regression. Journal of the
American Statistical Association 72: 46–53.
Berk, K. N. 1978. Comparing subset regression procedures. Technometrics 20: 1–6.
Draper, N., and H. Smith. 1998. Applied Regression Analysis. 3rd ed. New York: Wiley.
Efroymson, M. A. 1960. Multiple regression analysis. In Mathematical Methods for Digital Computers, ed. A. Ralston
and H. S. Wilf, 191–203. New York: Wiley.
Gorman, J. W., and R. J. Toman. 1966. Selection of variables for fitting equations to data. Technometrics 8: 27–51.
Hocking, R. R. 1976. The analysis and selection of variables in linear regression. Biometrics 32: 1–49.
Hosmer, D. W., Jr., S. A. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. 3rd ed. Hoboken,
NJ: Wiley.
Kennedy, W. J., Jr., and T. A. Bancroft. 1971. Model-building for prediction in regression based on repeated significance
tests. Annals of Mathematical Statistics 42: 1273–1284.
Lindsey, C., and S. J. Sheather. 2010. Variable selection in linear regression. Stata Journal 10: 650–669.
Mantel, N. 1970. Why stepdown procedures in variable selection. Technometrics 12: 621–625.
. 1971. More on variable selection and an alternative approach (letter to the editor). Technometrics 13: 455–457.
Sribney, W. M. 1998. FAQ: What are some of the problems with stepwise regression?
http://www.stata.com/support/faqs/stat/stepwise.html.
Wang, Z. 2000. sg134: Model selection using the Akaike information criterion. Stata Technical Bulletin 54: 47–49.
Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 335–337. College Station, TX: Stata Press.
Williams, R. 2007. Stata tip 46: Step we gaily, on we go. Stata Journal 7: 272–274.
Also see
[R] nestreg    Nested model statistics
Title
stored results — Stored results
Syntax Description Option Remarks and examples References Also see
Syntax
List results from general commands, stored in r()
return list [, all]
List results from estimation commands, stored in e()
ereturn list [, all]
List results from parsing commands, stored in s()
sreturn list
Description
Results of calculations are stored by many Stata commands so that they can be easily accessed
and substituted into later commands.
return list lists results stored in r().
ereturn list lists results stored in e().
sreturn list lists results stored in s().
This entry discusses using stored results. Programmers wishing to store results should see [P] return
and [P] ereturn.
Option
all is for use with return list and ereturn list. all specifies that hidden and historical stored
results be listed along with the usual stored results. This option is seldom used. See Using hidden
and historical stored results and Programming hidden and historical stored results under Remarks
and examples of [P]return for more information. These sections are written in terms of return
list, but everything said there applies equally to ereturn list.
all is not allowed with sreturn list because s() does not allow hidden or historical results.
Remarks and examples
Stata commands are classified as being
r-class general commands that store results in r()
e-class estimation commands that store results in e()
s-class parsing commands that store results in s()
n-class commands that do not store in r(),e(), or s()
There is also a c-class, c(), containing the values of system parameters and settings, along with
certain constants, such as the value of pi; see [P]creturn. A program, however, cannot be c-class.
You can look at the Stored results section of the manual entry of a command to determine whether
it is r-, e-, s-, or n-class, but it is easy enough to guess.
Commands producing statistical results are either r-class or e-class. They are e-class if they present
estimation results and r-class otherwise. s-class is a class used by programmers and is primarily used
in subprograms performing parsing. n-class commands explicitly state where the result is to go. For
instance, generate and replace are n-class because their syntax is generate varname = . . . and
replace varname = . . . .
After executing a command, you can type return list,ereturn list, or sreturn list to
see what has been stored.
Example 1
. use http://www.stata-press.com/data/r13/auto4
(1978 Automobile Data)
. describe
Contains data from http://www.stata-press.com/data/r13/auto4.dta
obs: 74 1978 Automobile Data
vars: 6 6 Apr 2013 00:20
size: 2,072
storage display value
variable name type format label variable label
price int %8.0gc Price
weight int %8.0gc Weight (lbs.)
mpg int %8.0g Mileage (mpg)
make str18 %-18s Make and Model
length int %8.0g Length (in.)
rep78 int %8.0g Repair Record 1978
Sorted by:
. return list
scalars:
r(changed) = 0
r(width) = 28
r(k) = 6
r(N) = 74
To view all stored results, including those that are historical or hidden, specify the all option.
. return list, all
scalars:
r(changed) = 0
r(width) = 28
r(k) = 6
r(N) = 74
scalars:
r(widthmax) = 1048576
r(k_max) = 2048
r(N_max) = 2147483646
r(widthmax), r(k_max), and r(N_max) are historical stored results. They are no longer relevant
because Stata dynamically adjusts memory beginning with Stata 12.
Technical note
In the above example, we stated that r(widthmax) and r(N_max) are no longer relevant. In
fact, they are not useful. Stata no longer has a fixed memory size, so the methods used to calculate
r(widthmax) and r(N_max) are no longer appropriate.
Example 2
You can use stored results in expressions.
. summarize mpg
Variable Obs Mean Std. Dev. Min Max
mpg 74 21.2973 5.785503 12 41
. return list
scalars:
r(N) = 74
r(sum_w) = 74
r(mean) = 21.2972972972973
r(Var) = 33.47204738985561
r(sd) = 5.785503209735141
r(min) = 12
r(max) = 41
r(sum) = 1576
. generate double mpgstd = (mpg-r(mean))/r(sd)
. summarize mpgstd
Variable Obs Mean Std. Dev. Min Max
mpgstd 74 -1.64e-16 1 -1.606999 3.40553
Be careful to use results stored in r() soon because they will be replaced the next time you execute
another r-class command. For instance, although r(mean) was 21.3 (approximately) after summarize
mpg, it is -1.64e-16 now because you just ran summarize with mpgstd.
Example 3
e-class is really no different from r-class, except for where results are stored and that, when an
estimation command stores results, it tends to store a lot of them:
. regress mpg weight length
(output omitted )
. ereturn list
scalars:
e(N) = 74
e(df_m) = 2
e(df_r) = 71
e(F) = 69.34050004300228
e(r2) = .6613903979336324
e(rmse) = 3.413681741382589
e(mss) = 1616.08062422659
e(rss) = 827.3788352328694
e(r2_a) = .6518520992838756
e(ll) = -194.3267619410807
e(ll_0) = -234.3943376482347
e(rank) = 3
macros:
e(cmdline) : "regress mpg weight length"
e(title) : "Linear regression"
e(marginsok) : "XB default"
e(vce) : "ols"
e(depvar) : "mpg"
e(cmd) : "regress"
e(properties) : "b V"
e(predict) : "regres_p"
e(model) : "ols"
e(estat_cmd) : "regress_estat"
matrices:
e(b) : 1 x 3
e(V) : 3 x 3
functions:
e(sample)
These e-class results will stick around until you run another estimation command. Typing return
list and ereturn list is the easy way to find out what a command stores.
Both r- and e-class results come in four flavors: scalars, macros, matrices, and functions. (s-class
results come in only one flavor, macros, and as earlier noted, s-class is used solely by programmers,
so ignore it.)
Scalars are just that: numbers by any other name. You can subsequently refer to r(mean) or
e(rmse) in numeric expressions and obtain the result to full precision.
Macros are strings. For instance, e(depvar) contains “mpg”. You can refer to it, too, in subsequent
expressions, but really that would be of most use to programmers, who will refer to it using constructs
like "‘e(depvar)’". In any case, macros are macros, and you obtain their contents just as you
would a local macro, by enclosing their name in single quotes. The name here is the full name, so
‘e(depvar)’ is mpg.
Matrices are matrices, and all estimation commands store e(b) and e(V) containing the coefficient
vector and variance-covariance matrix of the estimates (VCE).
Functions are stored by e-class commands only, and the only function existing is e(sample).
e(sample) evaluates to 1 (meaning true) if the observation was used in the previous estimation and
to 0 (meaning false) otherwise.
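Continuing the regress example above, a short sketch of how each flavor might be accessed
interactively (the display and count commands here are only illustrative):
. display e(rmse)
. display "`e(depvar)'"
. matrix list e(b)
. count if e(sample)
The first line prints the scalar at full precision, the second expands the macro to the string mpg, the
third lists the coefficient vector, and the last counts the observations used in the estimation.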
Technical note
Say that some command set r(scalar) and r(macro), the first being stored as a scalar and
the second as a macro. In theory, in subsequent use you are supposed to refer to r(scalar) and
‘r(macro)’. In fact, however, you can refer to either one with or without quotes, so you could refer
to ‘r(scalar)’ and r(macro). Programmers sometimes do this.
When you refer to r(scalar), you are referring to the full double-precision stored result. Think
of r(scalar) without quotes as a function returning the value of the stored result scalar. When you
refer to r(scalar) in quotes, Stata understands ‘r(scalar)’ to mean “substitute the printed result
of evaluating r(scalar)”. Pretend that r(scalar) equals the number 23. Then ‘r(scalar)’ is
23, the character 2 followed by 3.
Referring to r(scalar) in quotes is sometimes useful. Say that you want to use the immediate
command ci with r(scalar). The immediate command ci requires its arguments to be numbers
(numeric literals in programmer’s jargon), and it will not take an expression. Thus you could not type
‘ci r(scalar) . . . ’. You could, however, type ‘ci ‘r(scalar)’ . . . ’ because ‘r(scalar)’ is just
a numeric literal.
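For instance, a small sketch using summarize and the immediate command cii (whose arguments
here are the number of observations, the mean, and the standard deviation):
. quietly summarize mpg
. cii `r(N)' `r(mean)' `r(sd)'
The quoted macros expand to numeric literals, which is exactly what cii requires.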
For r(macro), you are supposed to refer to it in quotes: ‘r(macro)’. If, however, you omit the
quotes in an expression context, Stata evaluates the macro and then pretends that it is the result of
function-returning-string. There are side effects of this, the most important being that the result is
trimmed to 80 characters.
Referring to r(macro) without quotes is never a good idea; the feature was included merely for
completeness.
You can even refer to r(matrix) in quotes (assume that r(matrix) is a matrix). ‘r(matrix)’
does not result in the matrix being substituted; it returns the word matrix. Programmers sometimes
find that useful.
References
Jann, B. 2005. Making regression tables from stored estimates. Stata Journal 5: 288–308.
. 2007. Making regression tables simplified. Stata Journal 7: 227–244.
Also see
[P] ereturn    Post the estimation results
[P] return    Return stored results
[U] 18.8 Accessing results calculated by other programs
[U] 18.9 Accessing results calculated by estimation commands
Title
suest — Seemingly unrelated estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
suest namelist [, options]
where namelist is a list of one or more names under which estimation results were stored via
estimates store; see [R] estimates store. Wildcards may be used. * and _all refer to all stored
results. A period (.) may be used to refer to the last estimation results, even if they have not (yet)
been stored.
options              Description
SE/Robust
  svy                survey data estimation
  vce(vcetype)       vcetype may be robust or cluster clustvar
Reporting
  level(#)           set confidence level; default is level(95)
  dir                display a table describing the models
  eform(string)      report exponentiated coefficients and label as string
  display_options    control column formats, row spacing, line width, display of omitted
                     variables and base and empty cells, and factor-variable labeling
  coeflegend         display legend instead of statistics
coeflegend does not appear in the dialog box.
Menu
Statistics > Postestimation > Tests > Seemingly unrelated estimation
Description
suest is a postestimation command; see [U] 20 Estimation and postestimation commands.
suest combines the estimation results (parameter estimates and associated (co)variance matrices)
stored under namelist into one parameter vector and simultaneous (co)variance matrix of the
sandwich/robust type. This (co)variance matrix is appropriate even if the estimates were obtained on
the same or on overlapping data.
Typical applications of suest are tests for intramodel and cross-model hypotheses using test
or testnl, for example, a generalized Hausman specification test. lincom and nlcom may be used
after suest to estimate linear combinations and nonlinear functions of coefficients. suest may also
be used to adjust a standard VCE for clustering or survey design effects.
Different estimators are allowed, for example, a regress model and a probit model; the only
requirement is that predict produce equation-level scores with the score option after an estimation
command. The models may be estimated on different samples, due either to explicit if or in selection
or to missing values. If weights are applied, the same weights (type and values) should be applied to
all models in namelist. The estimators should be estimated without vce(robust) or vce(cluster
clustvar) options. suest returns the robust VCE, allows the vce(cluster clustvar) option, and
automatically works with results from the svy prefix command (only for vce(linearized)). See
example 7 in [SVY] svy postestimation for an example using suest with svy: ologit.
Because suest posts its results like a proper estimation command, its results can be stored
via estimates store. Moreover, like other estimation commands, suest typed without arguments
replays the results.
Options
 
SE/Robust
svy specifies that estimation results should be modified to reflect the survey design effects according
to the svyset specifications; see [SVY] svyset.
The svy option is implied when suest encounters survey estimation results from the svy prefix;
see [SVY]svy. Poststratification is allowed only with survey estimation results from the svy prefix.
vce(vcetype) specifies the type of standard error reported, which includes types that are robust
to some kinds of misspecification (robust) and that allow for intragroup correlation (cluster
clustvar); see [R] vce_option.
The vce() option may not be combined with the svy option or estimation results from the svy
prefix.
 
Reporting
level(#) specifies the confidence level, as a percentage, for confidence intervals of the coefficients;
see [R] level.
dir displays a table describing the models in namelist just like estimates dir namelist.
eform(string) displays the coefficient table in exponentiated form: for each coefficient, exp(b) rather
than b is displayed, and standard errors and confidence intervals are transformed. string is the
table header that will be displayed above the transformed coefficients and must be 11 characters
or fewer, for example, eform("Odds ratio").
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following option is available with suest but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Remarks are presented under the following headings:
Using suest
Remarks on regress
Testing the assumption of the independence of irrelevant alternatives
Testing proportionality
Testing cross-model hypotheses
Using suest
If you plan to use suest, you must take precautions when fitting the original models. These
restrictions are relaxed when using svy commands; see [SVY] svy postestimation.
1. suest works with estimation commands that allow predict to generate equation-level score
variables when supplied with the score (or scores) option. For example, equation-level
score variables are generated after running mlogit by typing
. predict sc*, scores
2. Estimation should take place without the vce(robust) or vce(cluster clustvar)op-
tion. suest always computes the robust estimator of the (co)variance, and suest has a
vce(cluster clustvar)option.
The within-model covariance matrices computed by suest are identical to those obtained
by specifying a vce(robust) or vce(cluster clustvar)option during estimation. suest,
however, also estimates the between-model covariances of parameter estimates.
3. Finally, the estimation results to be combined should be stored by estimates store; see
[R] estimates store.
After estimating and storing a series of estimation results, you are ready to combine the estimation
results with suest,
. suest name1 name2 . . . , vce(cluster clustvar)
and you can subsequently use postestimation commands, such as test, to test hypotheses. Here an
important issue is how suest assigns names to the equations. If you specify one model name, the
original equation names are left unchanged; otherwise, suest constructs new equation names. The
coefficients of a single-equation model (such as logit and poisson) that was stored via estimates
store under name X are collected under equation X. With a multiequation model stored under name X,
suest prefixes X to each original equation name eq, forming equation name X_eq.
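For instance, under these naming rules (a sketch only; the models, variables, and outcome levels A and B
are hypothetical, and both models are assumed to have been fit on the same data):

. logit y1 x1                   // single-equation model
. estimates store M1
. mlogit y2 x2                  // multiequation model with outcome levels A and B
. estimates store M2
. suest M1 M2
. test [M1]x1                   // logit coefficients are collected under equation M1
. test [M2_A = M2_B]            // mlogit equations A and B become M2_A and M2_B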
Technical note
Earlier we said that standard errors from suest are identical to those obtained by specifying the
vce(robust) option with each command individually. Thus if you fit a logistic model using logit
with the vce(robust) option, you will get the same standard errors when you type
. suest .
directly after logit using the same data without the vce(robust) option.
This is not true for multiple estimation results when the estimation samples are not all the same.
The standard errors from suest will be slightly smaller than those from individual model fits using the
vce(robust) option because suest uses a larger number of observations to estimate the simultaneous
(co)variance matrix.
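A minimal sketch of this check, using the automobile data (the particular model is only illustrative):

. use http://www.stata-press.com/data/r13/auto2
. logit foreign mpg weight, vce(robust)    // robust standard errors reported by logit
. logit foreign mpg weight                 // refit without vce(robust)
. suest .                                  // reproduces the same robust standard errors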
Technical note
In rare circumstances, suest may have to truncate equation names to 32 characters. When
equation names are not unique because of truncation, suest numbers the equations within models,
using equations named X_#.
Remarks on regress
regress (see [R] regress) does not include its ancillary parameter, the residual variance, in its
coefficient vector and (co)variance matrix. Moreover, while the score option is allowed with predict
after regress, a score variable is generated for the mean but not for the variance parameter. suest
contains special code that assigns the equation name mean to the coefficients for the mean, adds the
equation lnvar for the log variance, and computes the appropriate two score variables itself.
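In example 2 below, for instance, the income equation fit by regress and stored under Inc contributes
these two equations to the combined results, which can then be referenced directly; a brief sketch:

. suest Inc Promo, vce(cluster famid)
. test [Inc_mean]edu                  // coefficients for the mean are collected under Inc_mean
. display _b[Inc_lnvar:_cons]         // log of the residual variance appears under Inc_lnvar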
Testing the assumption of the independence of irrelevant alternatives
The multinomial logit model and the closely related conditional logit model satisfy a probabilistic
version of the assumption of the independence of irrelevant alternatives (IIA), implying that the ratio
of the probabilities for two alternatives does not depend on what other alternatives are available.
Hausman and McFadden (1984) proposed a test for this assumption that is implemented in the
hausman command. The standard Hausman test has several limitations. First, the test statistic may be
undefined because the estimated VCE does not satisfy the required asymptotic properties of the test.
Second, the classic Hausman test applies only to the test of the equality of two estimators. Third, the
test requires access to a fully efficient estimator; such an estimator may not be available, for example,
if you are analyzing complex survey data. Using suest can overcome these three limitations.
Example 1
In our first example, we follow the analysis of the type of health insurance reported in [R]mlogit
and demonstrate the hausman command with the suest/test combination. We fit the full multinomial
logit model for all three alternatives and two restricted multinomial models in which one alternative
is excluded. After fitting each of these models, we store the results by using the store subcommand
of estimates. The title() option simply documents the models.
. use http://www.stata-press.com/data/r13/sysdsn4
(Health insurance data)
. mlogit insure age male
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -551.32973
Iteration 2: log likelihood = -551.32802
Iteration 3: log likelihood = -551.32802
Multinomial logistic regression Number of obs = 615
LR chi2(4) = 9.05
Prob > chi2 = 0.0598
Log likelihood = -551.32802 Pseudo R2 = 0.0081
insure Coef. Std. Err. z P>|z| [95% Conf. Interval]
Indemnity (base outcome)
Prepaid
age -.0100251 .0060181 -1.67 0.096 -.0218204 .0017702
male .5095747 .1977893 2.58 0.010 .1219147 .8972346
_cons .2633838 .2787575 0.94 0.345 -.2829708 .8097383
Uninsure
age -.0051925 .0113821 -0.46 0.648 -.0275011 .0171161
male .4748547 .3618462 1.31 0.189 -.2343508 1.18406
_cons -1.756843 .5309602 -3.31 0.001 -2.797506 -.7161803
. estimates store m1, title(all three insurance forms)
. quietly mlogit insure age male if insure != "Uninsure":insure
. estimates store m2, title(insure != "Uninsure":insure)
. quietly mlogit insure age male if insure != "Prepaid":insure
. estimates store m3, title(insure != "Prepaid":insure)
Having performed the three estimations, we inspect the results. estimates dir provides short
descriptions of the models that were stored using estimates store. Typing estimates table lists
the coefficients, displaying blanks for a coefficient not contained in a model.
. estimates dir
name command depvar npar title
m1 mlogit insure 9 all three insurance forms
m2 mlogit insure 6 insure != Uninsure :insure
m3 mlogit insure 6 insure != Prepaid :insure
. estimates table m1 m2 m3, star stats(N ll) keep(Prepaid: Uninsure:)
Variable m1 m2 m3
Prepaid
age -.01002511 -.01015205
male .50957468** .51440033**
_cons .26338378 .26780432
Uninsure
age -.00519249 -.00410547
male .47485472 .45910738
_cons -1.7568431*** -1.8017743***
Statistics
N 615 570 338
ll -551.32802 -390.48643 -131.76807
legend: * p<0.05; ** p<0.01; *** p<0.001
Comparing the coefficients between models does not suggest substantial differences. We can
formally test that coefficients are the same for the full model m1 and the restricted models m2 and m3
by using the hausman command. hausman expects the models to be specified in the order “always
consistent” first and “efficient under H0” second.
. hausman m2 m1, alleqs constant
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
m2 m1 Difference S.E.
age -.0101521 -.0100251 -.0001269 .
male .5144003 .5095747 .0048256 .0123338
_cons .2678043 .2633838 .0044205 .
b = consistent under Ho and Ha; obtained from mlogit
B = inconsistent under Ha, efficient under Ho; obtained from mlogit
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 0.08
Prob>chi2 = 0.9944
(V_b-V_B is not positive definite)
. hausman m3 m1, alleqs constant
Coefficients
(b) (B) (b-B) sqrt(diag(V_b-V_B))
m3 m1 Difference S.E.
age -.0041055 -.0051925 .001087 .0021355
male .4591074 .4748547 -.0157473 .
_cons -1.801774 -1.756843 -.0449311 .1333421
b = consistent under Ho and Ha; obtained from mlogit
B = inconsistent under Ha, efficient under Ho; obtained from mlogit
Test: Ho: difference in coefficients not systematic
chi2(3) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= -0.18 chi2<0 ==> model fitted on these
data fails to meet the asymptotic
assumptions of the Hausman test;
see suest for a generalized test
According to the test of m1 against m2, we cannot reject the hypothesis that the coefficients of m1
and m2 are the same. The second Hausman test is not well defined, something that happens fairly
often. The problem is due to estimating the variance V(b-B) by V(b)-V(B), which is a feasible
estimator only asymptotically. Here it simply is not a proper variance matrix, and the Hausman test
becomes undefined.
suest m1 m2 estimates the simultaneous (co)variance of the coefficients of models m1 and m2.
Although suest is technically a postestimation command, it acts like an estimation command in that
it stores the simultaneous coefficients in e(b) and the full (co)variance matrix in e(V). We could have
used the estat vce command to display the full (co)variance matrix to show that the cross-model
covariances were indeed estimated. Typically, we would not have a direct interest in e(V).
. suest m1 m2, noomitted
Simultaneous results for m1, m2
Number of obs = 615
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
m1_Indemnity
m1_Prepaid
age -.0100251 .0059403 -1.69 0.091 -.0216679 .0016176
male .5095747 .1988159 2.56 0.010 .1199027 .8992467
_cons .2633838 .277307 0.95 0.342 -.280128 .8068956
m1_Uninsure
age -.0051925 .0109005 -0.48 0.634 -.0265571 .0161721
male .4748547 .3677326 1.29 0.197 -.2458879 1.195597
_cons -1.756843 .4971383 -3.53 0.000 -2.731216 -.78247
m2_Indemnity
m2_Prepaid
age -.0101521 .0058988 -1.72 0.085 -.0217135 .0014094
male .5144003 .1996133 2.58 0.010 .1231654 .9056352
_cons .2678043 .2744019 0.98 0.329 -.2700134 .8056221
suest created equation names by combining the name under which we stored the results using
estimates store with the original equation names. Thus, in the simultaneous estimation result,
equation Prepaid originating in model m1 is named m1_Prepaid. According to the McFadden-Hausman
specification of a test for IIA, the coefficients of the equations m1_Prepaid and m2_Prepaid
should be equal. This equality can be tested easily with the test command. The cons option specifies
that the intercept _cons be included in the test.
. test [m1_Prepaid = m2_Prepaid], cons
( 1) [m1_Prepaid]age - [m2_Prepaid]age = 0
( 2) [m1_Prepaid]male - [m2_Prepaid]male = 0
( 3) [m1_Prepaid]_cons - [m2_Prepaid]_cons = 0
chi2( 3) = 0.89
Prob > chi2 = 0.8266
The Hausman test via suest is comparable to that computed by hausman, but they use different
estimators of the variance of the difference of the estimates. The hausman command estimates V(b-B)
by V(b) - V(B), whereas suest estimates V(b-B) by V(b) - cov(b,B) - cov(B,b) + V(B).
One advantage of the second estimator is that it is always admissible, so the resulting test is always
well defined. This quality is illustrated in the Hausman-type test of IIA comparing models m1 and m3.
. suest m1 m3, noomitted
Simultaneous results for m1, m3
Number of obs = 615
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
m1_Indemnity
m1_Prepaid
age -.0100251 .0059403 -1.69 0.091 -.0216679 .0016176
male .5095747 .1988159 2.56 0.010 .1199027 .8992467
_cons .2633838 .277307 0.95 0.342 -.280128 .8068956
m1_Uninsure
age -.0051925 .0109005 -0.48 0.634 -.0265571 .0161721
male .4748547 .3677326 1.29 0.197 -.2458879 1.195597
_cons -1.756843 .4971383 -3.53 0.000 -2.731216 -.78247
m3_Indemnity
m3_Uninsure
age -.0041055 .0111185 -0.37 0.712 -.0258974 .0176865
male .4591074 .3601307 1.27 0.202 -.2467357 1.164951
_cons -1.801774 .5226351 -3.45 0.001 -2.82612 -.7774283
. test [m1_Uninsure = m3_Uninsure], cons
( 1) [m1_Uninsure]age - [m3_Uninsure]age = 0
( 2) [m1_Uninsure]male - [m3_Uninsure]male = 0
( 3) [m1_Uninsure]_cons - [m3_Uninsure]_cons = 0
chi2( 3) = 1.49
Prob > chi2 = 0.6845
Although the classic Hausman test computed by hausman is not defined here, the suest-based
test is just fine. We cannot reject the equality of the common coefficients across m1 and m3.
A second advantage of the suest approach is that we can estimate the (co)variance matrix of the
multivariate normal distribution of the estimators of the three models m1, m2, and m3 and test that
the common coefficients are equal.
. suest m*, noomitted
Simultaneous results for m1, m2, m3
Number of obs = 615
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
m1_Indemnity
m1_Prepaid
age -.0100251 .0059403 -1.69 0.091 -.0216679 .0016176
male .5095747 .1988159 2.56 0.010 .1199027 .8992467
_cons .2633838 .277307 0.95 0.342 -.280128 .8068956
m1_Uninsure
age -.0051925 .0109005 -0.48 0.634 -.0265571 .0161721
male .4748547 .3677326 1.29 0.197 -.2458879 1.195597
_cons -1.756843 .4971383 -3.53 0.000 -2.731216 -.78247
m2_Indemnity
m2_Prepaid
age -.0101521 .0058988 -1.72 0.085 -.0217135 .0014094
male .5144003 .1996133 2.58 0.010 .1231654 .9056352
_cons .2678043 .2744019 0.98 0.329 -.2700134 .8056221
m3_Indemnity
m3_Uninsure
age -.0041055 .0111185 -0.37 0.712 -.0258974 .0176865
male .4591074 .3601307 1.27 0.202 -.2467357 1.164951
_cons -1.801774 .5226351 -3.45 0.001 -2.82612 -.7774283
. test [m1_Prepaid = m2_Prepaid] , cons notest
( 1) [m1_Prepaid]age - [m2_Prepaid]age = 0
( 2) [m1_Prepaid]male - [m2_Prepaid]male = 0
( 3) [m1_Prepaid]_cons - [m2_Prepaid]_cons = 0
. test [m1_Uninsure = m3_Uninsure], cons acc
( 1) [m1_Prepaid]age - [m2_Prepaid]age = 0
( 2) [m1_Prepaid]male - [m2_Prepaid]male = 0
( 3) [m1_Prepaid]_cons - [m2_Prepaid]_cons = 0
( 4) [m1_Uninsure]age - [m3_Uninsure]age = 0
( 5) [m1_Uninsure]male - [m3_Uninsure]male = 0
( 6) [m1_Uninsure]_cons - [m3_Uninsure]_cons = 0
chi2( 6) = 1.95
Prob > chi2 = 0.9240
Again we do not find evidence against the correct specification of the multinomial logit for type
of insurance. The classic Hausman test assumes that one of the estimators (named B in hausman) is
efficient, that is, that it has minimal (asymptotic) variance. This assumption ensures that V(b) - V(B)
is an admissible, viable estimator for V(b-B). The assumption that we have an efficient estimator
is a restrictive one. It is violated, for instance, if our data are clustered. We want to adjust for
clustering via a vce(cluster clustvar) option by requesting the cluster-adjusted sandwich estimator
of variance. Consequently, in such a case, hausman cannot be used. This problem does not exist
with the suest version of the Hausman test. To illustrate this feature, we suppose that the data are
clustered by city (we constructed an imaginary variable cityid for this illustration). If we plan to
apply suest, we would not specify the vce(cluster clustvar) option at the time of estimation.
suest has a vce(cluster clustvar)option. Thus we do not need to refit the models; we can call
suest and test right away.
. suest m1 m2, vce(cluster cityid) noomitted
Simultaneous results for m1, m2
Number of obs = 615
(Std. Err. adjusted for 260 clusters in cityid)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
m1_Indemnity
m1_Prepaid
age -.0100251 .005729 -1.75 0.080 -.0212538 .0012035
male .5095747 .1910496 2.67 0.008 .1351244 .884025
_cons .2633838 .2698797 0.98 0.329 -.2655708 .7923384
m1_Uninsure
age -.0051925 .0104374 -0.50 0.619 -.0256495 .0152645
male .4748547 .3774021 1.26 0.208 -.2648399 1.214549
_cons -1.756843 .4916613 -3.57 0.000 -2.720481 -.7932048
m2_Indemnity
m2_Prepaid
age -.0101521 .0057164 -1.78 0.076 -.0213559 .0010518
male .5144003 .1921385 2.68 0.007 .1378158 .8909848
_cons .2678043 .2682193 1.00 0.318 -.2578959 .7935045
. test [m1_Prepaid = m2_Prepaid], cons
( 1) [m1_Prepaid]age - [m2_Prepaid]age = 0
( 2) [m1_Prepaid]male - [m2_Prepaid]male = 0
( 3) [m1_Prepaid]_cons - [m2_Prepaid]_cons = 0
chi2( 3) = 0.79
Prob > chi2 = 0.8529
suest provides some descriptive information about the clustering on cityid. Like any other
estimation command, suest informs us that the standard errors are adjusted for clustering. The
Hausman-type test obtained from the test command uses a simultaneous (co)variance of m1 and m2
appropriately adjusted for clustering. In this example, we still do not have reason to conclude that
the multinomial logit model in this application is misspecified, that is, that IIA is violated.
The multinomial logistic regression model is a special case of the conditional logistic regression
model; see [R] clogit. Like the multinomial logistic regression model, the conditional logistic regression
model also makes the IIA assumption. Consider an example, introduced in [R] asclogit, in which
the demand for American, Japanese, and European cars is modeled in terms of the number of local
dealers of the respective brands and of some individual attributes incorporated in interaction with
the nationality of cars. We want to perform a Hausman-type test for IIA comparing the decision
between all nationalities with the decision between non-American cars. The following code fragment
demonstrates how to conduct a Hausman test for IIA via suest in this case.
. clogit choice japan europe maleJap maleEur incJap incEur dealer, group(id)
. estimates store allcars
. clogit choice japan maleJap incJap dealer if car!=1 , group(id)
. estimates store foreign
. suest allcars foreign
. test [allcars_choice=foreign_choice], common
Testing proportionality
The applications of suest that we have discussed so far concern Hausman-type tests for mis-
specification. To test such a hypothesis, we compared two estimators that have the same probability
limit if the hypothesis holds true, but otherwise have different limits. We may also want to compare
the coefficients of models (estimators) for other substantive reasons. Although we most often want
to test whether coefficients differ between models or estimators, we may occasionally want to test
other constraints (see Hausman and Ruud [1987]).
Example 2
In this example, using simulated labor market data for siblings, we consider two dependent
variables, income (inc) and whether a person was promoted in the last year (promo). We apply
familiar economic arguments regarding human capital, according to which employees with more human
capital have a higher income and a higher probability of being promoted. Human capital is
acquired through formal education (edu) and on-the-job training experience (exp). We study whether
income and promotion are “two sides of the same coin”, that is, whether they reflect a common latent
variable, “human capital”. Accordingly, we want to compare the effects of different aspects of human
capital on different outcome variables.
We estimate fairly simple labor market equations. The income model is estimated with regress,
and the estimation results are stored under the name Inc.
. use http://www.stata-press.com/data/r13/income
. regress inc edu exp male
Source SS df MS Number of obs = 277
F( 3, 273) = 42.34
Model 2058.44672 3 686.148908 Prob > F = 0.0000
Residual 4424.05183 273 16.2053181 R-squared = 0.3175
Adj R-squared = 0.3100
Total 6482.49855 276 23.4873136 Root MSE = 4.0256
inc Coef. Std. Err. t P>|t| [95% Conf. Interval]
edu 2.213707 .243247 9.10 0.000 1.734828 2.692585
exp 1.47293 .231044 6.38 0.000 1.018076 1.927785
male .5381153 .4949466 1.09 0.278 -.436282 1.512513
_cons 1.255497 .3115808 4.03 0.000 .642091 1.868904
. est store Inc
Because these are sibling data, the observations are clustered on family of origin, famid. In the estimation
of the regression parameters, we did not specify a vce(cluster famid) option to adjust standard
errors for clustering on family (famid). Thus the standard errors reported by regress are potentially
flawed. This problem will, however, be corrected by specifying a vce(cluster clustvar)option
with suest.
Next we estimate the promotion equation with probit and again store the results under an
appropriate name.
. probit promo edu exp male, nolog
Probit regression Number of obs = 277
LR chi2(3) = 49.76
Prob > chi2 = 0.0000
Log likelihood = -158.43888 Pseudo R2 = 0.1357
promo Coef. Std. Err. z P>|z| [95% Conf. Interval]
edu .4593002 .0898537 5.11 0.000 .2831901 .6354102
exp .3593023 .0805774 4.46 0.000 .2013735 .5172312
male .2079983 .1656413 1.26 0.209 -.1166527 .5326494
_cons -.464622 .1088166 -4.27 0.000 -.6778985 -.2513454
. est store Promo
The coefficients in the income and promotion equations definitely seem to be different. However,
because the scales of the two variables are different, we would not expect the coefficients to be equal.
The appropriate hypothesis here is that the coefficients of the two models, apart from
the constants, are proportional. This formulation would still reflect that the relative effects of the different
aspects of human capital do not differ between the dependent variables. We can obtain a nonlinear
Wald test for the hypothesis of proportionality by using the testnl command on the combined
estimation results of the two estimators. Thus we first have to form the combined estimation results.
At this point, we specify the vce(cluster famid) option to adjust for the clustering of observations
on famid.
. suest Inc Promo, vce(cluster famid)
Simultaneous results for Inc, Promo
Number of obs = 277
(Std. Err. adjusted for 135 clusters in famid)
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
Inc_mean
edu 2.213707 .2483907 8.91 0.000 1.72687 2.700543
exp 1.47293 .1890583 7.79 0.000 1.102383 1.843478
male .5381153 .4979227 1.08 0.280 -.4377952 1.514026
_cons 1.255497 .3374977 3.72 0.000 .594014 1.916981
Inc_lnvar
_cons 2.785339 .079597 34.99 0.000 2.629332 2.941347
Promo_promo
edu .4593002 .0886982 5.18 0.000 .2854549 .6331454
exp .3593023 .079772 4.50 0.000 .2029522 .5156525
male .2079983 .1691053 1.23 0.219 -.1234419 .5394386
_cons -.464622 .1042169 -4.46 0.000 -.6688833 -.2603607
The standard errors reported by suest are identical to those reported by the respective estimation
commands when invoked with the vce(cluster famid) option. We are now ready to test for
proportionality:
$$H_0:\quad \frac{\beta_{\mathrm{edu}}^{\mathrm{Income}}}{\beta_{\mathrm{edu}}^{\mathrm{Promotion}}} \;=\; \frac{\beta_{\mathrm{exp}}^{\mathrm{Income}}}{\beta_{\mathrm{exp}}^{\mathrm{Promotion}}} \;=\; \frac{\beta_{\mathrm{male}}^{\mathrm{Income}}}{\beta_{\mathrm{male}}^{\mathrm{Promotion}}}$$
It is straightforward to translate this into syntax suitable for testnl, recalling that the coefficient of
variable v in equation eq is denoted by [eq]v.
. testnl [Inc_mean]edu/[Promo_promo]edu =
> [Inc_mean]exp/[Promo_promo]exp =
> [Inc_mean]male/[Promo_promo]male
(1) [Inc_mean]edu/[Promo_promo]edu = [Inc_mean]exp/[Promo_promo]exp
(2) [Inc_mean]edu/[Promo_promo]edu = [Inc_mean]male/[Promo_promo]male
chi2(2) = 0.61
Prob > chi2 = 0.7385
From the evidence, we fail to reject the hypothesis that the coefficients of the income and promotion
equations are proportional. Thus it is not unreasonable to assume that income and promotion can be
explained by the same latent variable, “labor market success”.
A disadvantage of the nonlinear Wald test is that it is not invariant with respect to representation:
a Wald test for a mathematically equivalent formulation of the nonlinear constraint usually leads to
a different test result. An equivalent formulation of the proportionality hypothesis is
$$H_0:\quad \beta_{\mathrm{edu}}^{\mathrm{Income}}\, \beta_{\mathrm{exp}}^{\mathrm{Promotion}} = \beta_{\mathrm{edu}}^{\mathrm{Promotion}}\, \beta_{\mathrm{exp}}^{\mathrm{Income}} \qquad\text{and}\qquad \beta_{\mathrm{edu}}^{\mathrm{Income}}\, \beta_{\mathrm{male}}^{\mathrm{Promotion}} = \beta_{\mathrm{edu}}^{\mathrm{Promotion}}\, \beta_{\mathrm{male}}^{\mathrm{Income}}$$
This formulation is "more linear" in the coefficients. The asymptotic $\chi^2$ distribution of the nonlinear
Wald statistic can be expected to be more accurate for this representation.
. testnl ([Inc_mean]edu*[Promo_promo]exp = [Inc_mean]exp*[Promo_promo]edu)
> ([Inc_mean]edu*[Promo_promo]male = [Inc_mean]male*[Promo_promo]edu)
(1) [Inc_mean]edu*[Promo_promo]exp = [Inc_mean]exp*[Promo_promo]edu
(2) [Inc_mean]edu*[Promo_promo]male = [Inc_mean]male*[Promo_promo]edu
chi2(2) = 0.46
Prob > chi2 = 0.7936
Here the two representations lead to similar test statistics and p-values. As before, we fail to reject
the hypothesis of proportionality of the coefficients of the two models.
Testing cross-model hypotheses
Example 3
In this example, we demonstrate how some cross-model hypotheses can be tested using the
facilities already available in most estimation commands. This demonstration will explain the intricate
relationship between the cluster adjustment of the robust estimator of variance and the suest command.
It will also be made clear that a new facility is required to perform more general cross-model testing.
We want to test whether the effect of x1 on the binary variable y1 is the same as the effect of x2
on the binary y2; see Clogg, Petkova, and Haritou (1995). In this setting, x1 may equal x2, and y1
may equal y2. We assume that logistic regression models can be used to model the responses, and
for simplicity, we ignore further predictor variables in these models. If the two logit models are fit on
independent samples so that the estimators are (stochastically) independent, a Wald test for _b[x1]
= _b[x2] rejects the null hypothesis if
$$\frac{\widehat b(x_1) - \widehat b(x_2)}{\Big[\, \widehat\sigma^2\{\widehat b(x_1)\} + \widehat\sigma^2\{\widehat b(x_2)\} \,\Big]^{1/2}}$$

is larger than the appropriate $\chi^2_1$ threshold. If the models are fit on the same sample (or on dependent
samples), so that the estimators are stochastically dependent, the above test that ignores the covariance
between the estimators is not appropriate.
It is instructive to see how this problem can be tackled by “stacking” data. In the stacked format,
we double the number of observations. The dependent variable is y1 in the first half of the data and
is y2 in the second half of the data. The predictor variable z1 is set to x1 in the first half of the
expanded data and to 0 in the rest. Similarly, z2 is 0 in the first half and x2 in the second half. The
following diagram illustrates the transformation, in the terminology of the reshape command, from
wide to long format.
    id   y1    y2    x1    x2
     1   y11   y21   x11   x21
     2   y12   y22   x12   x22
     3   y13   y23   x13   x23

              ==>

    id   y     z1    z2    model
     1   y11   x11   0       1
     2   y12   x12   0       1
     3   y13   x13   0       1
     1   y21   0     x21     2
     2   y22   0     x22     2
     3   y23   0     x23     2
The observations in the long format data organization are clustered on the original subjects and
are identified with the identifier id. The clustering on id has to be accounted for when fitting a
simultaneous model. The simplest way to deal with clustering is to use the cluster adjustment of the
robust or sandwich estimator; see [P]robust. The data manipulation can be accomplished easily with
the stack command; see [D]stack. Subsequently, we fit a simultaneous logit model and perform a
Wald test for the hypothesis that the coefficients of z1 and z2 are the same. A full setup to obtain
the cross-model Wald test could then be as follows:
. generate zero = 0 // a variable that is always 0
. generate one = 1 // a variable that is always 1
. generate two = 2 // a variable that is always 2
. stack id y1 x1 zero one id y2 zero x2 two, into(id y z1 z2 model)
. generate model2 = (model==2)
. logit y model2 z1 z2, vce(cluster id)
. test _b[z1] = _b[z2]
The coefficient of z1 represents the effect of x1 on y1, and similarly, z2 for the effect of x2
on y2. The variable model2 is a dummy for the “second model”, which is included to allow the
intercept in the second model to differ from that in the first model. The estimates of the coefficient
of z1 and its standard error in the combined model are the same as the estimates of the coefficient
of z1 and its standard error if we fit the model on the unstacked data.
. logit y1 x1, vce(robust)
The vce(cluster clustvar) option specified with the logit command for the stacked data ensures
that the covariances of _b[z1] and _b[z2] are indeed estimated. This estimation ensures that the
Wald test for the equality of the coefficients is correct. If we had not specified the vce(cluster
clustvar) option, the (co)variance matrix of the coefficients would have been block-diagonal; that is,
the covariances of _b[z1] and _b[z2] would have been 0. Then test would have effectively used
the invalid formula for the Wald test for two independent samples.
In this example, the two logit models were fit on the same data. The same setup would apply,
without modification, when the two logit models were fit on overlapping data that resulted, for
instance, if the y or x variables were missing in some observations.
The suest command allows us to obtain the above Wald test more efficiently by avoiding the
data manipulation, obviating the need to fit a model with twice the number of coefficients. The test
statistic produced by the above code fragment is identical to that obtained via suest on the original
(unstacked) data:
. logit y1 x1
. estimates store M1
. logit y2 x2
. estimates store M2
. suest M1 M2
. test [M1]x1=[M2]x2
The stacking method can be applied not only to the testing of cross-model hypotheses for logit
models but also to any estimation command that supports the vce(cluster clustvar)option. The
stacking approach clearly generalizes to stacking more than two logit or other models, testing more
general linear hypotheses, and testing nonlinear cross-model hypotheses (see [R]testnl). In all of these
cases, suest would yield identical statistical results but at smaller costs in terms of data management,
computer storage, and computer time.
Is suest nothing but a convenience command? No, there are two disadvantages to the stacking
method, both of which are resolved via suest. First, if the models include ancillary parameters
(in a regression model, the residual variance; in an ordinal response model, the cutpoints; and in
lognormal survival-time regression, the time scale parameter), these parameters are constrained to be
equal between the stacked models. In suest, this constraint is relaxed. Second, the stacking method
does not generalize to compare different statistical models, such as a probit model and a regression
model. As demonstrated in the previous section, suest can deal with this situation.
Stored results
suest stores the following in e():
Scalars
e(N) number of observations
e(N clust) number of clusters
e(rank) rank of e(V)
Macros
e(cmd) suest
e(eqnames#)   original names of equations of model #
e(names) list of model names
e(wtype) weight type
e(wexp) weight expression
e(clustvar) name of cluster variable
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
Matrices
e(b) stacked coefficient vector of the models
e(V)   variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
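A minimal sketch of inspecting these stored results after a hypothetical call such as suest M1 M2:

. suest M1 M2
. ereturn list                   // lists e(names), e(eqnames1), e(eqnames2), and the other results
. matrix list e(b)               // stacked coefficient vector of the models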
Methods and formulas
The estimation of the simultaneous (co)variance of a series of k estimators is a nonstandard
application of the sandwich estimator, as implemented by the command [P] robust. You may want
to read this manual entry before reading further.
The starting point is that we have fit k different models on the same data (partially overlapping
or nonoverlapping data are special cases). We want to derive the simultaneous distribution of these k
estimators, for instance, to test a cross-estimator hypothesis H0. As in the framework of Hausman
testing, H0 will often be of the form that different estimators have the same probability limit under
some hypothesis, while the estimators have different limits if the hypothesis is violated.

We consider (vector) estimators $\widehat\beta_i$ to be defined as "the" solution of the estimation equations $G_i$,

$$G_i(b_i) = \sum_j w_{ij}\, u_{ij}(b_i) = 0, \qquad i = 1, \ldots, k$$
We refer to the $u_{ij}$ as the "scores". Specifying some weights $w_{ij} = 0$ trivially accommodates
partially overlapping or even disjoint data. Under "suitable regularity conditions" (see White
[1982; 1996] for details), the $\widehat\beta_i$ are asymptotically normally distributed, with the variance estimated
consistently by the sandwich estimator

$$V_i = \mathrm{Var}(\widehat\beta_i) = D_i^{-1} \Big( \sum_j w_{ij}\, u_{ij} u_{ij}' \Big) D_i^{-1}$$
where $D_i$ is the Jacobian of $G_i$ evaluated at $\widehat\beta_i$. In the context of maximum likelihood estimation,
$D_i$ can be estimated consistently by (minus) the Hessian of the log likelihood or by the Fisher
information matrix. If the model is also well specified, the sandwiched term $\sum_j w_{ij} u_{ij} u_{ij}'$ converges
in probability to $D_i$, so $V_i$ may be consistently estimated by $D_i^{-1}$.
To derive the simultaneous distribution of the estimators, we consider the "stacked" estimation
equation,

$$G(\widehat\beta) = \left\{\, G_1(\widehat\beta_1)' \;\; G_2(\widehat\beta_2)' \;\; \ldots \;\; G_k(\widehat\beta_k)' \,\right\}' = 0$$
Under "suitable regularity conditions" (see White [1996] for details), $\widehat\beta$ is asymptotically jointly
normally distributed. The Jacobian and scores of the simultaneous equation are easily expressed in
terms of the Jacobian and scores of the separate equations. The Jacobian of $G$,

$$D(\widehat\beta) = \left. \frac{dG(\beta)}{d\beta} \right|_{\beta = \widehat\beta}$$

is block diagonal with blocks $D_1, \ldots, D_k$. The inverse of $D(\widehat\beta)$ is again block diagonal, with the
inverses of the $D_i$ on the diagonal. The scores $u$ of $G$ are simply obtained as the concatenated scores
of the separate equations:

$$u_j = \left( u_{1j}' \;\; u_{2j}' \;\; \ldots \;\; u_{kj}' \right)'$$
Out-of-sample (that is, where $w_{ij} = 0$) values of the score variables are defined as 0 (thus we drop the
$i$ subscript from the common weight variable). The sandwich estimator for the asymptotic variance
of $\widehat\beta$ reads

$$V = \mathrm{Var}(\widehat\beta) = D(\widehat\beta)^{-1} \Big( \sum_j w_j\, u_j u_j' \Big)\, D(\widehat\beta)^{-1}$$
Taking a "partitioned" look at this expression, we see that $V(\widehat\beta_i)$ is estimated by

$$D_i^{-1} \Big( \sum_j w_j\, u_{ij} u_{ij}' \Big) D_i^{-1}$$

which is, yet again, the familiar sandwich estimator for $\widehat\beta_i$ based on the separate estimation equation
$G_i$. Thus considering several estimators simultaneously in this way does not affect the estimators
of the asymptotic variances of these estimators. However, as a bonus of stacking, we obtain a
sandwich-type estimate of the covariance $V_{ih}$ of the estimators $\widehat\beta_i$ and $\widehat\beta_h$,

$$V_{ih} = \mathrm{Cov}(\widehat\beta_i, \widehat\beta_h) = D_i^{-1} \Big( \sum_j w_j\, u_{ij} u_{hj}' \Big) D_h^{-1}$$
which is also obtained by White (1982).
This estimator for the covariance of estimators is an application of the cluster modification of the
sandwich estimator proposed by Rogers (1993). Consider the stacked data format as discussed in the
logit example, and assume that Stata would be able to estimate a “stacked model” in which different
models apply to different observations, for example, a probit model for the first half, a regression
model for the second half, and a one-to-one cluster relation between the first and second half. If there
are no common parameters to both models, the score statistics of parameters for the stacked models
are zero in the half of the data in which they do not occur. In Rogers’ method, we have to sum the
score statistics over the observations within a cluster. This step boils down to concatenating the score
statistics at the level of the cluster.
We compare the sandwich estimator of the (co)variance $V_{12}$ of two estimators with the estimator
of variance $\widetilde V_{12}$ applied in the classic Hausman test. Hausman (1978) showed that if $\widehat\beta_1$ is consistent
under $H_0$ and $\widehat\beta_2$ is efficient under $H_0$, then asymptotically

$$\mathrm{Cov}(\widehat\beta_1, \widehat\beta_2) = \mathrm{Var}(\widehat\beta_2)$$

and so $\mathrm{Var}(\widehat\beta_1 - \widehat\beta_2)$ is consistently estimated by $V_1 - V_2$.
Acknowledgment
suest was written by Jeroen Weesie of the Department of Sociology at Utrecht University, The
Netherlands. This research was supported by grant PGS 50-370 from The Netherlands Organization for
Scientific Research.
An earlier version of suest was published in the Stata Technical Bulletin (1999). The current
version of suest is not backward compatible with the STB version because of the introduction of
new ways to manage estimation results via the estimates command.
References
Arminger, G. 1990. Testing against misspecification in parametric rate models. In Event History Analysis in Life
Course Research, ed. K. U. Mayer and N. B. Tuma, 146–158. Madison: University of Wisconsin Press.
Clogg, C. C., E. Petkova, and A. Haritou. 1995. Statistical methods for comparing regression coefficients between
models. American Journal of Sociology 100: 1261–1312. (With comments by P. D. Allison and a reply by C. C.
Clogg, E. Petkova, and T. Cheng).
Gourieroux, C. S., and A. Monfort. 1997. Time Series and Dynamic Models. Trans. ed. G. M. Gallo. Cambridge:
Cambridge University Press.
Hausman, J. A. 1978. Specification tests in econometrics. Econometrica 46: 1251–1271.
Hausman, J. A., and D. L. McFadden. 1984. Specification tests for the multinomial logit model. Econometrica 52:
1219–1240.
Hausman, J. A., and P. A. Ruud. 1987. Specifying and testing econometric models for rank-ordered data. Journal of
Econometrics 34: 83–104.
Huber, P. J. 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In Vol. 1 of Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 221–233. Berkeley: University of
California Press.
Rogers, W. H. 1993. sg16.4: Comparison of nbreg and glm for negative binomial. Stata Technical Bulletin 16: 7.
Reprinted in Stata Technical Bulletin Reprints, vol. 3, pp. 82–84. College Station, TX: Stata Press.
Weesie, J. 1999. sg121: Seemingly unrelated estimation and the cluster-adjusted sandwich estimator. Stata Technical
Bulletin 52: 34–47. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 231–248. College Station, TX: Stata
Press.
White, H. L., Jr. 1982. Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25.
. 1996. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.
Also see
[R] estimates     Save and manipulate estimation results
[R] hausman     Hausman specification test
[R] lincom     Linear combinations of estimators
[R] nlcom     Nonlinear combinations of estimators
[R] test     Test linear hypotheses after estimation
[R] testnl     Test nonlinear hypotheses after estimation
[P] robust     Robust variance estimates
Title
summarize — Summary statistics
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
summarize [varlist] [if] [in] [weight] [, options]
options            Description

Main
  detail           display additional statistics
  meanonly         suppress the display; calculate only the mean; programmer's option
  format           use variable's display format
  separator(#)     draw separator line after every # variables; default is separator(5)
  display_options  control spacing, line width, and base and empty cells
varlist may contain factor variables; see [U] 11.4.3 Factor variables.
varlist may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
aweights, fweights, and iweights are allowed. However, iweights may not be used with the detail option; see
[U] 11.1.6 weight.
Menu
Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics
Description
summarize calculates and displays a variety of univariate summary statistics. If no varlist is
specified, summary statistics are calculated for all the variables in the dataset.
Also see [R] ci for calculating the standard error and confidence intervals of the mean.
Options
 
Main
detail produces additional statistics, including skewness, kurtosis, the four smallest and largest
values, and various percentiles.
meanonly, which is allowed only when detail is not specified, suppresses the display of results
and calculation of the variance. Ado-file writers will find this useful for fast calls.
format requests that the summary statistics be displayed using the display formats associated with
the variables rather than the default g display format; see [U] 12.5 Formats: Controlling how
data are displayed.
separator(#) specifies how often to insert separation lines into the output. The default is
separator(5), meaning that a line is drawn after every five variables. separator(10) would draw
a line after every 10 variables. separator(0) suppresses the separation line.
display_options: vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), and fvwrapon(style); see [R] estimation options.
Remarks and examples
summarize can produce two different sets of summary statistics. Without the detail option,
the number of nonmissing observations, the mean and standard deviation, and the minimum and
maximum values are presented. With detail, the same information is presented along with the
variance, skewness, and kurtosis; the four smallest and four largest values; and the 1st, 5th, 10th,
25th, 50th (median), 75th, 90th, 95th, and 99th percentiles.
Example 1: summarize with the separator() option
We have data containing information on various automobiles, among which is the variable mpg,
the mileage rating. We can obtain a quick summary of the mpg variable by typing
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. summarize mpg
Variable Obs Mean Std. Dev. Min Max
mpg 74 21.2973 5.785503 12 41
We see that we have 74 observations. The mean of mpg is 21.3 miles per gallon, and the standard
deviation is 5.79. The minimum is 12, and the maximum is 41.
If we had not specified the variable (or variables) we wanted to summarize, we would have obtained
summary statistics on all the variables in the dataset:
. summarize, separator(4)
Variable Obs Mean Std. Dev. Min Max
make 0
price 74 6165.257 2949.496 3291 15906
mpg 74 21.2973 5.785503 12 41
rep78 69 3.405797 .9899323 1 5
headroom 74 2.993243 .8459948 1.5 5
trunk 74 13.75676 4.277404 5 23
weight 74 3019.459 777.1936 1760 4840
length 74 187.9324 22.26634 142 233
turn 74 39.64865 4.399354 31 51
displacement 74 197.2973 91.83722 79 425
gear_ratio 74 3.014865 .4562871 2.19 3.89
foreign 74 .2972973 .4601885 0 1
There are only 69 observations on rep78, so some of the observations are missing. There are no
observations on make because it is a string variable.
 
The idea of the mean is quite old (Plackett 1958), but its extension to a scheme of moment-based
measures was not done until the end of the 19th century. Between 1893 and 1905, Pearson
discussed and named the standard deviation, skewness, and kurtosis, but he was not the first
to use any of these. Thiele (1889), in contrast, had earlier firmly grasped the notion that the
$m_r$ provide a systematic basis for discussing distributions. However, even earlier anticipations
can also be found. For example, Euler in 1778 used $m_2$ and $m_3$ in passing in a treatment of
estimation (Hald 1998, 87), but seemingly did not build on that.
Similarly, the idea of the median is quite old. The history of the interquartile range is tangled
up with that of the probable error, a long-popular measure. Extending this in various ways to a
more general approach based on quantiles (to use a later term) occurred to several people in the
nineteenth century. Galton (1875) is a nice example, particularly because he seems so close to
the key idea of the quantiles as a function, which took another century to reemerge strongly.
Thorvald Nicolai Thiele (1838–1910) was a Danish scientist who worked in astronomy, mathematics,
actuarial science, and statistics. He made many pioneering contributions to statistics,
several of which were overlooked until recently. Thiele advocated graphical analysis of residuals,
checking for trends, symmetry of distributions, and changes of sign, and he even warned against
overinterpreting such graphs.
 
Example 2: summarize with the detail option
The detail option provides all the information of a normal summarize and more. The format
of the output also differs, as shown here:
. summarize mpg, detail
Mileage (mpg)
Percentiles Smallest
1% 12 12
5% 14 12
10% 14 14 Obs 74
25% 18 14 Sum of Wgt. 74
50% 20 Mean 21.2973
Largest Std. Dev. 5.785503
75% 25 34
90% 29 35 Variance 33.47205
95% 34 35 Skewness .9487176
99% 41 41 Kurtosis 3.975005
As in the previous example, we see that the mean of mpg is 21.3 miles per gallon and that the standard
deviation is 5.79. We also see the various percentiles. The median of mpg (the 50th percentile) is 20
miles per gallon. The 25th percentile is 18, and the 75th percentile is 25.
When we performed summarize, we learned that the minimum and maximum were 12 and 41,
respectively. We now see that the four smallest values in our dataset are 12, 12, 14, and 14. The four
largest values are 34, 35, 35, and 41. The skewness of the distribution is 0.95, and the kurtosis is
3.98. (A normal distribution would have a skewness of 0 and a kurtosis of 3.)
Skewness is a measure of the lack of symmetry of a distribution. If the distribution is symmetric,
the coefficient of skewness is 0. If the coefficient is negative, the median is usually greater than
the mean and the distribution is said to be skewed left. If the coefficient is positive, the median is
usually less than the mean and the distribution is said to be skewed right. Kurtosis (from the Greek
kyrtosis, meaning curvature) is a measure of peakedness of a distribution. The smaller the coefficient
of kurtosis, the flatter the distribution. The normal distribution has a coefficient of kurtosis of 3 and
provides a convenient benchmark.
Technical note
The convention of calculating the median of an even number of values by averaging the central
two order statistics is of long standing. (That is, given 8 values, average the 4th and 5th smallest
values, or given 42, average the 21st and 22nd smallest.) Stigler (1977) filled a much-needed gap
in the literature by naming such paired central order statistics as “comedians”, although it remains
unclear how far he was joking.
Example 3: summarize with the by prefix
summarize can usefully be combined with the by varlist: prefix. In our dataset, we have a
variable, foreign, that distinguishes foreign and domestic cars. We can obtain summaries of mpg
and weight within each subgroup by typing
. by foreign: summarize mpg weight
-> foreign = Domestic
Variable Obs Mean Std. Dev. Min Max
mpg 52 19.82692 4.743297 12 34
weight 52 3317.115 695.3637 1800 4840
-> foreign = Foreign
Variable Obs Mean Std. Dev. Min Max
mpg 22 24.77273 6.611187 14 41
weight 22 2315.909 433.0035 1760 3420
Domestic cars in our dataset average 19.8 miles per gallon, whereas foreign cars average 24.8.
Because by varlist: can be combined with summarize, it can also be combined with summarize,
detail:
. by foreign: summarize mpg, detail
-> foreign = Domestic
Mileage (mpg)
Percentiles Smallest
1% 12 12
5% 14 12
10% 14 14 Obs 52
25% 16.5 14 Sum of Wgt. 52
50% 19 Mean 19.82692
Largest Std. Dev. 4.743297
75% 22 28
90% 26 29 Variance 22.49887
95% 29 30 Skewness .7712432
99% 34 34 Kurtosis 3.441459
-> foreign = Foreign
Mileage (mpg)
Percentiles Smallest
1% 14 14
5% 17 17
10% 17 17 Obs 22
25% 21 18 Sum of Wgt. 22
50% 24.5 Mean 24.77273
Largest Std. Dev. 6.611187
75% 28 31
90% 35 35 Variance 43.70779
95% 35 35 Skewness .657329
99% 41 41 Kurtosis 3.10734
Technical note
summarize respects display formats if we specify the format option. When we type summarize
price weight, we obtain
. summarize price weight
Variable Obs Mean Std. Dev. Min Max
price 74 6165.257 2949.496 3291 15906
weight 74 3019.459 777.1936 1760 4840
The display is accurate but is not as aesthetically pleasing as we may wish, particularly if we plan to
use the output directly in published work. By placing formats on the variables, we can control how
the table appears:
. format price weight %9.2fc
. summarize price weight, format
Variable Obs Mean Std. Dev. Min Max
price 74 6,165.26 2,949.50 3,291.00 15,906.00
weight 74 3,019.46 777.19 1,760.00 4,840.00
If you specify a weight (see [U] 11.1.6 weight), each observation is multiplied by the value of the
weighting expression before the summary statistics are calculated so that the weighting expression is
interpreted as the discrete density of each observation.
Example 4: summarize with factor variables
You can also use summarize to obtain summary statistics for factor variables. For example, if
you type
. summarize i.rep78
Variable Obs Mean Std. Dev. Min Max
rep78
Fair 69 .115942 .3225009 0 1
Average 69 .4347826 .4993602 0 1
Good 69 .2608696 .4423259 0 1
Excellent 69 .1594203 .3687494 0 1
you obtain the sample proportions for four of the five levels of the rep78 variable. For example,
11.6% of the 69 cars with nonmissing values of rep78 have a fair repair record. When you use
factor-variable notation, the base category is suppressed by default. If you type
. summarize bn.rep78
Variable Obs Mean Std. Dev. Min Max
rep78
Poor 69 .0289855 .1689948 0 1
Fair 69 .115942 .3225009 0 1
Average 69 .4347826 .4993602 0 1
Good 69 .2608696 .4423259 0 1
Excellent 69 .1594203 .3687494 0 1
the notation bn.rep78 indicates that Stata should not suppress the base category so that we see the
proportions for all five levels.
We could have used tabulate oneway rep78 to obtain the sample proportions along with the
cumulative proportions. Alternatively, we could have used proportion rep78 to obtain the sample
proportions along with the standard errors of the proportions instead of the standard deviations of the
proportions.
Example 5: summarize with weights
We have 1980 census data on each of the 50 states. Included in our variables is medage, the
median age of the population of each state. If we type summarize medage, we obtain unweighted
statistics:
. use http://www.stata-press.com/data/r13/census
(1980 Census data by state)
. summarize medage
Variable Obs Mean Std. Dev. Min Max
medage 50 29.54 1.693445 24.2 34.7
Also among our variables is pop, the population in each state. Typing summarize medage [w=pop]
produces population-weighted statistics:
. summarize medage [w=pop]
(analytic weights assumed)
Variable Obs Weight Mean Std. Dev. Min Max
medage 50 225907472 30.11047 1.66933 24.2 34.7
The number listed under Weight is the sum of the weighting variable, pop, indicating that there
are roughly 226 million people in the United States. The pop-weighted mean of medage is 30.11
(compared with 29.54 for the unweighted statistic), and the weighted standard deviation is 1.67
(compared with 1.69).
Example 6: summarize with weights and the detail option
We can obtain detailed summaries of weighted data as well. When we do this, all the statistics
are weighted, including the percentiles.
. summarize medage [w=pop], detail
(analytic weights assumed)
Median age
Percentiles Smallest
1% 27.1 24.2
5% 27.7 26.1
10% 28.2 27.1 Obs 50
25% 29.2 27.4 Sum of Wgt. 225907472
50% 29.9 Mean 30.11047
Largest Std. Dev. 1.66933
75% 30.9 32
90% 32.1 32.1 Variance 2.786661
95% 32.2 32.2 Skewness .5281972
99% 34.7 34.7 Kurtosis 4.494223
Technical note
If you are writing a program and need to access the mean of a variable, the meanonly option
provides for fast calls. For example, suppose that your program reads as follows:
program mean
summarize ‘1’, meanonly
display " mean = " r(mean)
end
The result of executing this is
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. mean price
mean = 6165.2568
Video example
Descriptive statistics in Stata
Stored results
summarize stores the following in r():
Scalars
  r(N)           number of observations
  r(mean)        mean
  r(skewness)    skewness (detail only)
  r(min)         minimum
  r(max)         maximum
  r(sum_w)       sum of the weights
  r(p1)          1st percentile (detail only)
  r(p5)          5th percentile (detail only)
  r(p10)         10th percentile (detail only)
  r(p25)         25th percentile (detail only)
  r(p50)         50th percentile (detail only)
  r(p75)         75th percentile (detail only)
  r(p90)         90th percentile (detail only)
  r(p95)         95th percentile (detail only)
  r(p99)         99th percentile (detail only)
  r(Var)         variance
  r(kurtosis)    kurtosis (detail only)
  r(sum)         sum of variable
  r(sd)          standard deviation
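For example, after summarize with the detail option, individual stored results can be retrieved directly
(assuming the automobile data used above are in memory):

. quietly summarize mpg, detail
. display r(mean) "  " r(p50) "  " r(skewness)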
Methods and formulas
Let $x$ denote the variable on which we want to calculate summary statistics, and let $x_i$, $i = 1, \ldots, n$,
denote an individual observation on $x$. Let $v_i$ be the weight, and if no weight is specified, define
$v_i = 1$ for all $i$.

Define $V$ as the sum of the weights:

$$V = \sum_{i=1}^{n} v_i$$

Define $w_i$ to be $v_i$ normalized to sum to $n$, $w_i = v_i (n/V)$.
The mean, $\bar x$, is defined as

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} w_i x_i$$

The variance, $s^2$, is defined as

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} w_i (x_i - \bar x)^2$$

The standard deviation, $s$, is defined as $\sqrt{s^2}$.

Define $m_r$ as the $r$th moment about the mean $\bar x$:

$$m_r = \frac{1}{n} \sum_{i=1}^{n} w_i (x_i - \bar x)^r$$

The coefficient of skewness is then defined as $m_3\, m_2^{-3/2}$. The coefficient of kurtosis is defined as
$m_4\, m_2^{-2}$.
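As a sketch of how these moment formulas can be checked in the unweighted case ($w_i = 1$), using the
mpg variable from the examples above:

. use http://www.stata-press.com/data/r13/auto2, clear
. quietly summarize mpg
. scalar xbar = r(mean)
. scalar nobs = r(N)
. generate double d2 = (mpg - scalar(xbar))^2
. generate double d3 = (mpg - scalar(xbar))^3
. generate double d4 = (mpg - scalar(xbar))^4
. quietly summarize d2
. scalar m2 = r(sum)/scalar(nobs)
. quietly summarize d3
. scalar m3 = r(sum)/scalar(nobs)
. quietly summarize d4
. scalar m4 = r(sum)/scalar(nobs)
. display m3/m2^1.5 "  " m4/m2^2    // matches r(skewness) and r(kurtosis) from summarize mpg, detail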
Let $x_{(i)}$ refer to the $x$ in ascending order, and let $w_{(i)}$ refer to the corresponding weights of $x_{(i)}$.
The four smallest values are $x_{(1)}$, $x_{(2)}$, $x_{(3)}$, and $x_{(4)}$. The four largest values are $x_{(n)}$, $x_{(n-1)}$,
$x_{(n-2)}$, and $x_{(n-3)}$.
To obtain the $p$th percentile, which we will denote as $x_{[p]}$, let $P = np/100$. Let

$$W_{(i)} = \sum_{j=1}^{i} w_{(j)}$$

Find the first index $i$ such that $W_{(i)} > P$. The $p$th percentile is then

$$x_{[p]} = \begin{cases} \dfrac{x_{(i-1)} + x_{(i)}}{2} & \text{if } W_{(i-1)} = P \\[1ex] x_{(i)} & \text{otherwise} \end{cases}$$
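For example, for the unweighted mpg data summarized above ($n = 74$, so each $w_{(j)} = 1$ and
$W_{(i)} = i$), the 25th percentile has $P = 74 \times 25/100 = 18.5$. The first index with $W_{(i)} > 18.5$ is
$i = 19$, and because $W_{(18)} = 18 \neq P$, the formula gives $x_{[25]} = x_{(19)} = 18$, the value reported by
summarize mpg, detail.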
References
Cox, N. J. 2010. Speaking Stata: The limits of sample skewness and kurtosis. Stata Journal 10: 482–495.
David, H. A. 2001. First (?) occurrence of common terms in statistics and probability. In Annotated Readings in the
History of Statistics, ed. H. A. David and A. W. F. Edwards, 209–246. New York: Springer.
Galton, F. 1875. Statistics by intercomparison, with remarks on the law of frequency of error. Philosophical Magazine
49: 33–46.
Gleason, J. R. 1997. sg67: Univariate summaries with boxplots. Stata Technical Bulletin 36: 23–25. Reprinted in
Stata Technical Bulletin Reprints, vol. 6, pp. 179–183. College Station, TX: Stata Press.
. 1999. sg67.1: Update to univar. Stata Technical Bulletin 51: 27–28. Reprinted in Stata Technical Bulletin
Reprints, vol. 9, pp. 159–161. College Station, TX: Stata Press.
Hald, A. 1998. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley.
Hamilton, L. C. 1996. Data Analysis for Social Scientists. Belmont, CA: Duxbury.
. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Kirkwood, B. R., and J. A. C. Sterne. 2003. Essential Medical Statistics. 2nd ed. Malden, MA: Blackwell.
Lauritzen, S. L. 2002. Thiele: Pioneer in Statistics. Oxford: Oxford University Press.
Plackett, R. L. 1958. Studies in the history of probability and statistics: VII. The principle of the arithmetic mean.
Biometrika 45: 130–135.
Stigler, S. M. 1977. Fractional order statistics, with applications. Journal of the American Statistical Association 72:
544–550.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Thiele, T. N. 1889. Forelæsringer over Almindelig Iagttagelseslære: Sandsynlighedsregning og mindste Kvadraters
Methode. Kjøbenhavn: C.A. Reitzel. (English translation included in Lauritzen 2002).
Weisberg, H. F. 1992. Central Tendency and Variability. Newbury Park, CA: Sage.
Also see
[R] ameans     Arithmetic, geometric, and harmonic means
[R] centile     Report centile and confidence interval
[R] mean     Estimate means
[R] proportion     Estimate proportions
[R] ratio     Estimate ratios
[R] table     Flexible table of summary statistics
[R] tabstat     Compact table of summary statistics
[R] tabulate, summarize()     One- and two-way tables of summary statistics
[R] total     Estimate totals
[D] codebook     Describe data contents
[D] describe     Describe data in memory or in file
[D] inspect     Display simple summary of data's attributes
[ST] stsum     Summarize survival-time data
[SVY] svy estimation     Estimation commands for survey data
[XT] xtsum     Summarize xt data
Title
sunflower — Density-distribution sunflower plots
Syntax Menu Description Options
Remarks and examples Acknowledgments References
Syntax
sunflower yvar xvar [if] [in] [weight] [, options]
options              Description

Main
  nograph            do not show graph
  notable            do not show summary table; implied when by() is specified
  marker_options     affect rendition of markers drawn at the plotted points

Bins/Petals
  binwidth(#)        width of the hexagonal bins
  binar(#)           aspect ratio of the hexagonal bins
  bin_options        affect rendition of hexagonal bins
  light(#)           minimum observations for a light sunflower; default is light(3)
  dark(#)            minimum observations for a dark sunflower; default is dark(13)
  xcenter(#)         x-coordinate of the reference bin
  ycenter(#)         y-coordinate of the reference bin
  petalweight(#)     observations in a dark sunflower petal
  petallength(#)     length of sunflower petal as a percentage
  petal_options      affect rendition of sunflower petals
  flowersonly        show petals only; do not render bins
  nosinglepetal      suppress single petals

Add plots
  addplot(plot)      add other plots to generated graph

Y axis, X axis, Titles, Legend, Overall, By
  twoway_options     any options documented in [G-3] twoway_options
bin options                      Description
  l|dbstyle(areastyle)           overall look of hexagonal bins
  l|dbcolor(colorstyle)          outline and fill color
  l|dbfcolor(colorstyle)         fill color
  l|dblstyle(linestyle)          overall look of outline
  l|dblcolor(colorstyle)         outline color
  l|dblwidth(linewidthstyle)     thickness of outline

petal options                    Description
  l|dflstyle(linestyle)          overall style of sunflower petals
  l|dflcolor(colorstyle)         color of sunflower petals
  l|dflwidth(linewidthstyle)     thickness of sunflower petals
All options are rightmost; see [G-4] concept: repeated options.
fweights are allowed; see [U] 11.1.6 weight.
Menu
Graphics >Smoothing and densities >Density-distribution sunflower plot
Description
sunflower draws density-distribution sunflower plots (Plummer and Dupont 2003). These plots
are useful for displaying bivariate data whose density is too great for conventional scatterplots to be
effective.
A sunflower is several line segments of equal length, called petals, that radiate from a central point.
There are two varieties of sunflowers: light and dark. Each petal of a light sunflower represents 1
observation. Each petal of a dark sunflower represents several observations. Dark and light sunflowers
represent high- and medium-density regions of the data, and marker symbols represent individual
observations in low-density regions.
The plane defined by the variables yvar and xvar is divided into contiguous hexagonal bins. The
number of observations contained within a bin determines how the bin will be represented.
When there are fewer than light(#) observations in a bin, each point is plotted using the usual marker symbols in a scatterplot.
Bins with at least light(#) but fewer than dark(#) observations are represented by a light sunflower.
Bins with at least dark(#) observations are represented by a dark sunflower.
Options
 
Main
nograph prevents the graph from being generated.
notable prevents the summary table from being displayed. This option is implied when the by()
option is specified.
marker options affect the rendition of markers drawn at the plotted points, including their shape,
size, color, and outline; see [G-3]marker options.
 
Bins/Petals
binwidth(#) specifies the horizontal width of the hexagonal bins in the same units as xvar. By default,
        binwidth = max(rbw, nbw)
where
        rbw = range of xvar / 40
        nbw = range of xvar / max(1, nb)
and
        nb = int(min(sqrt(n), 10 × log10(n)))
where n is the number of observations in the dataset.
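As a check on this default (a small sketch, assuming the 1978 automobile data and taking xvar to be weight, as in example 1 below), the bin width that would apply if binwidth() were not specified can be computed directly from the formula:
. use http://www.stata-press.com/data/r13/auto
. quietly summarize weight
. display max((r(max)-r(min))/40, (r(max)-r(min))/max(1, int(min(sqrt(r(N)), 10*log10(r(N))))))
385
Here the range of weight is 4,840 − 1,760 = 3,080 and n = 74, so nb = 8, nbw = 385, and rbw = 77; example 1 overrides this default by specifying binwid(500).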
binar(#) specifies the aspect ratio for the hexagonal bins. The height of the bins is given by
        binheight = binwidth × # × 2/sqrt(3)
where binheight and binwidth are specified in the units of yvar and xvar, respectively. The default is binar(r), where r results in the rendering of regular hexagons.
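For instance, using the bin width and the default aspect ratio reported in example 1 below, . display 500*.0145268*2/sqrt(3) returns approximately 8.387, which agrees, up to rounding of the displayed aspect ratio, with the reported bin height of 8.38703.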
bin options affect how the hexagonal bins are rendered.
lbstyle(areastyle)and dbstyle(areastyle)specify the look of the light and dark hexagonal
bins, respectively. The options listed below allow you to change each attribute, but lbstyle()
and dbstyle() provide the starting points. See [G-4]areastyle for a list of available area styles.
lbcolor(colorstyle)and dbcolor(colorstyle)specify one color to be used both to outline the
shape and to fill the interior of the light and dark hexagonal bins, respectively. See [G-4]colorstyle
for a list of color choices.
lbfcolor(colorstyle)and dbfcolor(colorstyle)specify the color to be used to fill the interior of
the light and dark hexagonal bins, respectively. See [G-4]colorstyle for a list of color choices.
lblstyle(linestyle)and dblstyle(linestyle)specify the overall style of the line used to outline
the area, which includes its pattern (solid, dashed, etc.), thickness, and color. The other options
listed below allow you to change the line’s attributes, but lblstyle() and dblstyle() are
the starting points. See [G-4]linestyle for a list of choices.
lblcolor(colorstyle)and dblcolor(colorstyle)specify the color to be used to outline the light
and dark hexagonal bins, respectively. See [G-4]colorstyle for a list of color choices.
lblwidth(linewidthstyle)and dblwidth(linewidthstyle)specify the thickness of the line to be
used to outline the light and dark hexagonal bins, respectively. See [G-4]linewidthstyle for a
list of choices.
light(#)specifies the minimum number of observations needed for a bin to be represented by a
light sunflower. The default is light(3).
dark(#)specifies the minimum number of observations needed for a bin to be represented by a dark
sunflower. The default is dark(13).
xcenter(#)and ycenter(#)specify the center of the reference bin. The default values are the
median values of xvar and yvar, respectively. The centers of the other bins are implicitly defined
by the location of the reference bin together with the common bin width and height.
petalweight(#)specifies the number of observations represented by each petal of a dark sunflower.
The default value is chosen so that the maximum number of petals on a dark sunflower is 14.
petallength(#)specifies the length of petals in the sunflowers. The value specified is interpreted
as a percentage of half the bin width. The default is 100%.
petal options affect how the sunflower petals are rendered.
lflstyle(linestyle)and dflstyle(linestyle)specify the overall style of the light and dark
sunflower petals, respectively.
lflcolor(colorstyle)and dflcolor(colorstyle)specify the color of the light and dark sunflower
petals, respectively.
lflwidth(linewidthstyle)and dflwidth(linewidthstyle)specify the width of the light and dark
sunflower petals, respectively.
flowersonly suppresses rendering of the bins. This option is equivalent to specifying lbcolor(none)
and dbcolor(none).
nosinglepetal suppresses flowers from being drawn in light bins that contain only 1 observation
and dark bins that contain as many observations as the petal weight (see the petalweight()
option).
 
Add plots
addplot(plot)provides a way to add other plots to the generated graph; see [G-3]addplot option.
 
Y axis, X axis, Titles, Legend, Overall, By
twoway options are any of the options documented in [G-3]twoway options. These include op-
tions for titling the graph (see [G-3]title options), options for saving the graph to disk (see
[G-3]saving option), and the by() option (see [G-3]by option).
Remarks and examples
See Dupont (2009, 87–92) for a discussion of sunflower plots and how to create them using Stata.
Example 1
Using the auto dataset, we want to examine the relationship between weight and mpg. To do that,
we type
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sunflower mpg weight, binwid(500) petalw(2) dark(8) scheme(s2color)
Bin width = 500
Bin height = 8.38703
Bin aspect ratio = .0145268
Max obs in a bin = 15
Light = 3
Dark = 8
X-center = 3190
Y-center = 20
Petal weight = 2
  flower    petal   No. of    No. of   estimated   actual
    type   weight   petals   flowers        obs.     obs.
    none                                      10       10
   light        1        3         1          3        3
   light        1        4         2          8        8
   light        1        7         3         21       21
    dark        2        4         1          8        8
    dark        2        5         1         10        9
    dark        2        8         1         16       15
                                              76       74
(Graph omitted: density-distribution sunflower plot of Mileage (mpg) against Weight (lbs.); legend: 1 petal = 1 obs. for light sunflowers, 1 petal = 2 obs. for dark sunflowers.)
The three darkly shaded sunflowers immediately catch our eyes, indicating a group of eight cars that are heavy (nearly 4,000 pounds) and fuel inefficient and two groups of cars that get about 20 miles per gallon and weigh in the neighborhood of 3,000 pounds, one with 10 cars and one with 8 cars. The lighter sunflowers with seven petals each indicate groups of seven cars that share similar
weight and fuel economy characteristics. To obtain the number of cars in each group, we counted
the number of petals in each flower and consulted the graph legend to see how many observations
each petal represents.
Acknowledgments
We thank William D. Dupont and W. Dale Plummer Jr., both of the Department of Biostatistics at
Vanderbilt University, who are the authors of the original sunflower command, for their assistance
in producing this version.
References
Cleveland, W. S., and R. McGill. 1984. The many faces of a scatterplot. Journal of the American Statistical Association
79: 807–822.
Dupont, W. D. 2009. Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of
Complex Data. 2nd ed. Cambridge: Cambridge University Press.
Dupont, W. D., and W. D. Plummer, Jr. 2005. Using density-distribution sunflower plots to explore bivariate relationships in dense data. Stata Journal 5: 371–384.
Huang, C., J. A. McDonald, and W. Stuetzle. 1997. Variable resolution bivariate plots. Journal of Computational and
Graphical Statistics 6: 383–396.
Levy, D. E. 1999. 50 Years of Discovery: Medical Milestones from the National Heart, Lung, and Blood Institute’s
Framingham Heart Study. Hoboken, NJ: Center for Bio-Medical Communication.
Plummer, W. D., Jr., and W. D. Dupont. 2003. Density distribution sunflower plots. Journal of Statistical Software
8: 1–11.
Steichen, T. J., and N. J. Cox. 1999. flower: Stata module to draw sunflower plots. Boston College Department of
Economics, Statistical Software Components S393001. http://ideas.repec.org/c/boc/bocode/s393001.html.
Title
sureg — Zellner’s seemingly unrelated regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Basic syntax
    sureg (depvar1 varlist1) (depvar2 varlist2) ... (depvarN varlistN)
          [if] [in] [weight]

Full syntax

    sureg ([eqname1: ] depvar1a [depvar1b ... =] varlist1 [, noconstant])
          ([eqname2: ] depvar2a [depvar2b ... =] varlist2 [, noconstant])
          ...
          ([eqnameN: ] depvarNa [depvarNb ... =] varlistN [, noconstant])
          [if] [in] [weight] [, options]
Explicit equation naming (eqname:) cannot be combined with multiple dependent variables in an
equation specification.
options                     Description
Model
  isure                     iterate until estimates converge
  constraints(constraints)  apply specified linear constraints
df adj.
  small                     report small-sample statistics
  dfk                       use small-sample adjustment
  dfk2                      use alternate adjustment
Reporting
  level(#)                  set confidence level; default is level(95)
  corr                      perform Breusch–Pagan test
  nocnsreport               do not display constraints
  display options           control column formats, row spacing, line width, display of omitted
                              variables and base and empty cells, and factor-variable labeling
Optimization
  optimization options      control the optimization process; seldom used
  noheader                  suppress header table from above coefficient table
  notable                   suppress coefficient table
  coeflegend                display legend instead of statistics
varlist1, ..., varlistN may contain factor variables; see [U] 11.4.3 Factor variables. You must have the same levels of factor variables in all equations that have factor variables.
depvars and the varlists may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
aweights and fweights are allowed; see [U] 11.1.6 weight.
noheader, notable, and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Multiple-equation models >Seemingly unrelated regression
Description
sureg fits seemingly unrelated regression models (Zellner 1962; Zellner and Huang 1962; Zellner 1963). The acronyms SURE and SUR are often used for the estimator.
Options
 
Model
isure specifies that sureg iterate over the estimated disturbance covariance matrix and parameter
estimates until the parameter estimates converge. Under seemingly unrelated regression, this
iteration converges to the maximum likelihood results. If this option is not specified, sureg
produces two-step estimates.
constraints(constraints); see [R]estimation options.
 
df adj.
small specifies that small-sample statistics be computed. It shifts the test statistics from chi-squared and z statistics to F and t statistics. Although the standard errors from each equation are computed using the degrees of freedom for the equation, the degrees of freedom for the t statistics are all taken to be those for the first equation.
dfk specifies the use of an alternate divisor in computing the covariance matrix for the equation residuals. As an asymptotically justified estimator, sureg by default uses the number of sample observations (n) as a divisor. When the dfk option is set, a small-sample adjustment is made, and the divisor is taken to be sqrt{(n − ki)(n − kj)}, where ki and kj are the numbers of parameters in equations i and j, respectively.
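In the common case in which both equations have the same number of parameters, say ki = kj = k, this divisor reduces to sqrt{(n − k)(n − k)} = n − k, the familiar small-sample divisor from single-equation least squares.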
dfk2 specifies the use of an alternate divisor in computing the covariance matrix for the equation
residuals. When the dfk2 option is set, the divisor is taken to be the mean of the residual degrees
of freedom from the individual equations.
 
Reporting
level(#); see [R]estimation options.
corr displays the correlation matrix of the residuals between equations and performs a Breusch–Pagan test for independent equations; that is, the disturbance covariance matrix is diagonal.
nocnsreport; see [R]estimation options.
display options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Optimization
optimization options control the iterative process that minimizes the sum of squared errors when
isure is specified. These options are seldom used.
iterate(#)specifies the maximum number of iterations. When the number of iterations equals #,
the optimizer stops and presents the current results, even if the convergence tolerance has not been
reached. The default value of iterate() is the current value of set maxiter (see [R]maximize),
which is iterate(16000) if maxiter has not been changed.
trace adds to the iteration log a display of the current parameter vector.
nolog suppresses the display of the iteration log.
tolerance(#)specifies the tolerance for the coefficient vector. When the relative change in the
coefficient vector from one iteration to the next is less than or equal to #, the optimization process
is stopped. tolerance(1e-6) is the default.
The following options are available with sureg but are not shown in the dialog box:
noheader suppresses display of the table reporting Fstatistics, R-squared, and root mean squared
error above the coefficient table.
notable suppresses display of the coefficient table.
coeflegend; see [R]estimation options.
Remarks and examples
Seemingly unrelated regression models are so called because they appear to be joint estimates
from several regression models, each with its own error term. The regressions are related because the
(contemporaneous) errors associated with the dependent variables may be correlated. Chapter 5 of
Cameron and Trivedi (2010) contains a discussion of the seemingly unrelated regression model and
the feasible generalized least-squares estimator underlying it.
Example 1
When we fit models with the same set of right-hand-side variables, the seemingly unrelated
regression results (in terms of coefficients and standard errors) are the same as fitting the models
separately (using, say, regress). The same is true when the models are nested. Even in such cases,
sureg is useful when we want to perform joint tests. For instance, let us assume that we think
        price  = β0 + β1 foreign + β2 length + u1
        weight = γ0 + γ1 foreign + γ2 length + u2
Because the models have the same set of explanatory variables, we could estimate the two equations separately. Yet, we might still choose to estimate them with sureg because we want to perform the joint test β1 = γ1 = 0.
We use the small and dfk options to obtain small-sample statistics comparable with regress or
mvreg.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sureg (price foreign length) (weight foreign length), small dfk
Seemingly unrelated regression
Equation Obs Parms RMSE "R-sq" F-Stat P
price 74 2 2474.593 0.3154 16.35 0.0000
weight 74 2 250.2515 0.8992 316.54 0.0000
Coef. Std. Err. t P>|t| [95% Conf. Interval]
price
foreign 2801.143 766.117 3.66 0.000 1286.674 4315.611
length 90.21239 15.83368 5.70 0.000 58.91219 121.5126
_cons -11621.35 3124.436 -3.72 0.000 -17797.77 -5444.93
weight
foreign -133.6775 77.47615 -1.73 0.087 -286.8332 19.4782
length 31.44455 1.601234 19.64 0.000 28.27921 34.60989
_cons -2850.25 315.9691 -9.02 0.000 -3474.861 -2225.639
These two equations have a common set of regressors, and we could have used a shorthand syntax
to specify the equations:
. sureg (price weight = foreign length), small dfk
Here the results presented by sureg are the same as if we had estimated the equations separately:
. regress price foreign length
(output omitted )
. regress weight foreign length
(output omitted )
There is, however, a difference. We have allowed u1 and u2 to be correlated and have estimated the full variance–covariance matrix of the coefficients. sureg has estimated the correlations, but it does not report them unless we specify the corr option. We did not remember to specify corr when we fit the model, but we can redisplay the results:
. sureg, notable noheader corr
Correlation matrix of residuals:
price weight
price 1.0000
weight 0.5840 1.0000
Breusch-Pagan test of independence: chi2(1) = 25.237, Pr = 0.0000
The notable and noheader options prevented sureg from redisplaying the header and coefficient
tables. We find that, for the same cars, the correlation of the residuals in the price and weight
equations is 0.5840 and that we can reject the hypothesis that this correlation is zero.
We can test that the coefficients on foreign are jointly zero in both equations, as we set out to do, by typing test foreign; see [R] test. When we type a variable without specifying the equation, that variable is tested for zero in all equations in which it appears:
. test foreign
( 1) [price]foreign = 0
( 2) [weight]foreign = 0
F( 2, 142) = 17.99
Prob > F = 0.0000
Example 2
When the models do not have the same set of explanatory variables and are not nested, sureg
may lead to more efficient estimates than running the models separately as well as allowing joint
tests. This time, let us assume that we believe
        price  = β0 + β1 foreign + β2 mpg + β3 displ + u1
        weight = γ0 + γ1 foreign + γ2 length + u2
To fit this model, we type
. sureg (price foreign mpg displ) (weight foreign length), corr
Seemingly unrelated regression
Equation Obs Parms RMSE "R-sq" chi2 P
price 74 3 2165.321 0.4537 49.64 0.0000
weight 74 2 245.2916 0.8990 661.84 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
price
foreign 3058.25 685.7357 4.46 0.000 1714.233 4402.267
mpg -104.9591 58.47209 -1.80 0.073 -219.5623 9.644042
displacement 18.18098 4.286372 4.24 0.000 9.779842 26.58211
_cons 3904.336 1966.521 1.99 0.047 50.0263 7758.645
weight
foreign -147.3481 75.44314 -1.95 0.051 -295.2139 .517755
length 30.94905 1.539895 20.10 0.000 27.93091 33.96718
_cons -2753.064 303.9336 -9.06 0.000 -3348.763 -2157.365
Correlation matrix of residuals:
price weight
price 1.0000
weight 0.3285 1.0000
Breusch-Pagan test of independence: chi2(1) = 7.984, Pr = 0.0047
In comparison, if we had fit the price model separately,
. regress price foreign mpg displ
Source SS df MS Number of obs = 74
F( 3, 70) = 20.13
Model 294104790 3 98034929.9 Prob > F = 0.0000
Residual 340960606 70 4870865.81 R-squared = 0.4631
Adj R-squared = 0.4401
Total 635065396 73 8699525.97 Root MSE = 2207
price Coef. Std. Err. t P>|t| [95% Conf. Interval]
foreign 3545.484 712.7763 4.97 0.000 2123.897 4967.072
mpg -98.88559 63.17063 -1.57 0.122 -224.8754 27.10426
displacement 22.40416 4.634239 4.83 0.000 13.16146 31.64686
_cons 2796.91 2137.873 1.31 0.195 -1466.943 7060.763
The coefficients are slightly different, but the standard errors are uniformly larger. This would still be
true if we specified the dfk option to make a small-sample adjustment to the estimated covariance
of the disturbances.
Technical note
Constraints can be applied to SURE models using Stata’s standard syntax for constraints. For a
general discussion of constraints, see [R]constraint; for examples similar to seemingly unrelated
regression models, see [R]reg3.
Stored results
sureg stores the following in e():
Scalars
  e(N)            number of observations
  e(k)            number of parameters
  e(k_eq)         number of equations in e(b)
  e(mss_#)        model sum of squares for equation #
  e(df_m#)        model degrees of freedom for equation #
  e(rss_#)        residual sum of squares for equation #
  e(df_r)         residual degrees of freedom
  e(r2_#)         R-squared for equation #
  e(F_#)          F statistic for equation # (small only)
  e(rmse_#)       root mean squared error for equation #
  e(dfk2_adj)     divisor used with VCE when dfk2 specified
  e(ll)           log likelihood
  e(chi2_#)       χ² for equation #
  e(p_#)          significance for equation #
  e(chi2_bp)      Breusch–Pagan χ²
  e(df_bp)        degrees of freedom for Breusch–Pagan χ² test
  e(cons_#)       1 if equation # has a constant, 0 otherwise
  e(rank)         rank of e(V)
  e(ic)           number of iterations
Macros
e(cmd) sureg
e(cmdline) command as typed
e(method) sure or isure
e(depvar) names of dependent variables
e(exog) names of exogenous variables
e(eqnames) names of equations
e(wtype) weight type
e(wexp) weight expression
e(corr) correlation structure
e(small) small
e(dfk) alternate divisor (dfk or dfk2 only)
e(properties) b V
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
  e(b)            coefficient vector
  e(Cns)          constraints matrix
  e(Sigma)        estimated Σ matrix
  e(V)            variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
sureg uses the asymptotically efficient, feasible, generalized least-squares algorithm described in Greene (2012, 292–304). The computing formulas are given on pages 293–294.
The R-squared reported is the percent of variance explained by the predictors. It may be used for
descriptive purposes, but R-squared is not a well-defined concept when GLS is used.
sureg will refuse to compute the estimators if the same equation is named more than once or the
covariance matrix of the residuals is singular.
The Breusch and Pagan (1980) χ² statistic, a Lagrange multiplier statistic, is given by

$$ \lambda = T \sum_{m=1}^{M} \sum_{n=1}^{m-1} r_{mn}^{2} $$

where r_mn is the estimated correlation between the residuals of the M equations and T is the number of observations. It is distributed as χ² with M(M − 1)/2 degrees of freedom.
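As a rough check using the rounded residual correlation displayed in example 1 (where M = 2 and T = 74), . display 74*0.5840^2 returns about 25.24, essentially the chi2(1) = 25.237 reported there; the small discrepancy reflects only the rounding of the displayed correlation.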
 
Arnold Zellner (1927–2010) was born in New York. He studied physics at Harvard and economics
at Berkeley, and then he taught economics at the Universities of Washington and Wisconsin
before settling in Chicago in 1966. Among his many major contributions to econometrics and
statistics are his work on seemingly unrelated regression, three-stage least squares, and Bayesian
econometrics.
 
References
Breusch, T. S., and A. R. Pagan. 1980. The Lagrange multiplier test and its applications to model specification in
econometrics. Review of Economic Studies 47: 239–253.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
McDowell, A. W. 2004. From the help desk: Seemingly unrelated regression with unbalanced equations. Stata Journal 4: 442–448.
Rossi, P. E. 1989. The ET interview: Professor Arnold Zellner. Econometric Theory 5: 287–317.
Weesie, J. 1999. sg121: Seemingly unrelated estimation and the cluster-adjusted sandwich estimator. Stata Technical Bulletin 52: 34–47. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 231–248. College Station, TX: Stata Press.
Zellner, A. 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias.
Journal of the American Statistical Association 57: 348–368.
. 1963. Estimators for seemingly unrelated regression equations: Some exact finite sample results. Journal of the
American Statistical Association 58: 977–992.
Zellner, A., and D. S. Huang. 1962. Further properties of efficient estimators for seemingly unrelated regression
equations. International Economic Review 3: 300–313.
Also see
[R]sureg postestimation Postestimation tools for sureg
[R]nlsur Estimation of nonlinear systems of equations
[R]reg3 Three-stage estimation for systems of simultaneous equations
[R]regress Linear regression
[MV]mvreg Multivariate regression
[SEM] example 12 Seemingly unrelated regression
[SEM]intro 5 Tour of models
[TS]dfactor Dynamic-factor models
[U] 20 Estimation and postestimation commands
Title
sureg postestimation — Postestimation tools for sureg
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Also see
Description
The following postestimation commands are available after sureg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Syntax for predict
predict [type] newvar [if] [in] [, equation(eqno[, eqno]) statistic]
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
residuals residuals
difference difference between the linear predictions of two equations
stddp standard error of the difference in linear predictions
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
equation(eqno[, eqno]) specifies to which equation(s) you are referring.
equation() is filled in with one eqno for the xb,stdp, and residuals options. equation(#1)
would mean that the calculation is to be made for the first equation, equation(#2) would mean
the second, and so on. You could also refer to the equations by their names. equation(income)
would refer to the equation named income and equation(hours) to the equation named hours.
If you do not specify equation(), the results are the same as if you specified equation(#1).
difference and stddp refer to between-equation concepts. To use these options, you must
specify two equations, for example, equation(#1,#2) or equation(income,hours). When
two equations must be specified, equation() is required.
xb, the default, calculates the linear prediction (fitted values), that is, the prediction of xjb for the specified equation.
stdp calculates the standard error of the prediction for the specified equation. It can be thought of as
the standard error of the predicted expected value or mean for the observation’s covariate pattern.
The standard error of the prediction is also referred to as the standard error of the fitted value.
residuals calculates the residuals.
difference calculates the difference between the linear predictions of two equations in the system.
With equation(#1,#2),difference computes the prediction of equation(#1) minus the
prediction of equation(#2).
stddp is allowed only after you have previously fit a multiple-equation model. The standard error of the difference in linear predictions (x1jb − x2jb) between equations 1 and 2 is calculated.
For more information on using predict after multiple-equation estimation commands, see [R]predict.
Remarks and examples
For an example of cross-equation testing of parameters using the test command, see example 1
in [R]sureg.
Example 1
In example 1 of [R]sureg, we fit a seemingly unrelated regressions model of price and weight.
Here we obtain the fitted values.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sureg (price foreign length) (weight foreign length), small dfk
(output omitted )
. predict phat, equation(price)
(option xb assumed; fitted values)
. predict what, equation(weight)
(option xb assumed; fitted values)
. summarize price phat weight what
Variable Obs Mean Std. Dev. Min Max
price 74 6165.257 2949.496 3291 15906
phat 74 6165.257 1656.407 1639.872 9398.138
weight 74 3019.459 777.1936 1760 4840
what 74 3019.459 736.9666 1481.199 4476.331
Just as in single-equation OLS regression, in a SURE model the sample mean of the fitted values
for an equation equals the sample mean of the dependent variable.
Example 2
Suppose that for whatever reason we were interested in the difference between the predicted values of price and weight. predict has an option to compute this difference in one step:
. predict diff, equation(price, weight) difference
diff is the same as phat - what:
. generate mydiff = phat - what
. summarize diff mydiff
Variable Obs Mean Std. Dev. Min Max
diff 74 3145.797 1233.26 -132.2275 5505.914
mydiff 74 3145.797 1233.26 -132.2275 5505.914
Also see
[R]sureg Zellner’s seemingly unrelated regression
[U] 20 Estimation and postestimation commands
Title
swilk — Shapiro–Wilk and Shapiro–Francia tests for normality
Syntax Menu Description Options for swilk
Options for sfrancia Remarks and examples Stored results Methods and formulas
Acknowledgment References Also see
Syntax
Shapiro–Wilk normality test

    swilk varlist [if] [in] [, swilk options]

Shapiro–Francia normality test

    sfrancia varlist [if] [in] [, sfrancia options]

swilk options          Description
Main
  generate(newvar)     create newvar containing W test coefficients
  lnnormal             test for three-parameter lognormality
  noties               do not use average ranks for tied values

sfrancia options       Description
Main
  boxcox               use the Box–Cox transformation for W′; the default is to use the
                         log transformation
  noties               do not use average ranks for tied values
by is allowed with swilk and sfrancia; see [D] by.
Menu
swilk
Statistics >Summaries, tables, and tests >Distributional plots and tests >Shapiro-Wilk normality test
sfrancia
Statistics >Summaries, tables, and tests >Distributional plots and tests >Shapiro-Francia normality test
Description
swilk performs the Shapiro–Wilk W test for normality, and sfrancia performs the Shapiro–Francia W′ test for normality. swilk can be used with 4 ≤ n ≤ 2000 observations, and sfrancia can be used with 5 ≤ n ≤ 5000 observations; see [R] sktest for a test allowing more observations. See [MV] mvtest normality for multivariate tests of normality.
Options for swilk
 
Main
generate(newvar) creates new variable newvar containing the W test coefficients.
lnnormal specifies that the test be for three-parameter lognormality, meaning that ln(X − k) is tested for normality, where k is calculated from the data as the value that makes the skewness coefficient zero. When simply testing ln(X) for normality, do not specify this option. See [R] lnskew0 for estimation of k.
noties suppresses use of averaged ranks for tied values when calculating the W test coefficients.
Options for sfrancia
 
Main
boxcox specifies that the Box–Cox transformation of Royston (1983) for calculating W′ test coefficients be used instead of the default log transformation (Royston 1993a). Under the Box–Cox transformation, the normal approximation to the sampling distribution of W′, used by sfrancia, is valid for 5 ≤ n ≤ 1000. Under the log transformation, it is valid for 10 ≤ n ≤ 5000.
noties suppresses use of averaged ranks for tied values when calculating the W′ test coefficients.
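For instance, with a very small sample only the Box–Cox approximation is within its validated range, so one might type
. sfrancia smallvar, boxcox
where smallvar is a hypothetical variable with, say, 8 observations; with 10 or more observations the default log transformation is also valid.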
Remarks and examples
Example 1
Using our automobile dataset, we will test whether the variables mpg and trunk are normally
distributed:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. swilk mpg trunk
Shapiro-Wilk W test for normal data
Variable Obs W V z Prob>z
mpg 74 0.94821 3.335 2.627 0.00430
trunk 74 0.97921 1.339 0.637 0.26215
. sfrancia mpg trunk
Shapiro-Francia W’ test for normal data
Variable Obs W’ V’ z Prob>z
mpg 74 0.94872 3.650 2.510 0.00604
trunk 74 0.98446 1.106 0.195 0.42271
We can reject the hypothesis that mpg is normally distributed, but we cannot reject that trunk is
normally distributed.
The values reported under W and W′ are the Shapiro–Wilk and Shapiro–Francia test statistics. The tests also report V and V′, which are more appealing indexes for departure from normality. The median values of V and V′ are 1 for samples from normal populations. Large values indicate nonnormality. The 95% critical values of V (V′), which depend on the sample size, are between 1.2 and 2.4 (2.0 and 2.8); see Royston (1991b). There is no more information in V (V′) than in W (W′); one is just the transform of the other.
Example 2
We have data on a variable called studytime, which we suspect is distributed lognormally:
. use http://www.stata-press.com/data/r13/cancer
(Patient Survival in Drug Trial)
. generate lnstudytime = ln(studytime)
. swilk lnstudytime
Shapiro-Wilk W test for normal data
Variable Obs W V z Prob>z
lnstudytime 48 0.92731 3.311 2.547 0.00543
We can reject the lognormal assumption. We do not specify the lnnormal option when testing for
lognormality. The lnnormal option is for three-parameter lognormality.
Example 3
Having discovered that ln(studytime) is not distributed normally, we now test that ln(studytime − k) is normally distributed, where k is chosen so that the resulting skewness is zero. We obtain the estimate for k from lnskew0; see [R] lnskew0:
. lnskew0 lnstudytimek = studytime, level(95)
Transform k [95% Conf. Interval] Skewness
ln(studytim-k) -11.01181 -infinity -.9477328 -.0000173
. swilk lnstudytimek, lnnormal
Shapiro-Wilk W test for 3-parameter lognormal data
Variable Obs W V z Prob>z
lnstudytimek 48 0.97064 1.337 1.261 0.10363
We cannot reject the hypothesis that ln(studytime + 11.01181) is distributed normally. We do specify the lnnormal option when using an estimated value of k.
Stored results
swilk and sfrancia store the following in r():
Scalars
  r(N)      number of observations
  r(W)      W or W′
  r(p)      significance
  r(V)      V or V′
  r(z)      z statistic
Methods and formulas
The Shapiro–Wilk test is based on Shapiro and Wilk (1965) with a new approximation accurate for 4 ≤ n ≤ 2000 (Royston 1992). The calculations made by swilk are based on Royston (1982, 1992, 1993b).
The Shapiro–Francia test (Shapiro and Francia 1972; Royston 1983; Royston 1993a) is an approximate test that is similar to the Shapiro–Wilk test for very large samples.
 
Samuel Sanford Shapiro (1930– ) earned degrees in statistics and engineering from City College
of New York, Columbia, and Rutgers. After employment in the U.S. Army and industry, he
joined the faculty at Florida International University in 1972. Shapiro has coauthored various
texts in statistics and published several papers on distributional testing and other statistical topics.
 
Acknowledgment
swilk and sfrancia were written by Patrick Royston of the MRC Clinical Trials Unit, London
and coauthor of the Stata Press book Flexible Parametric Survival Analysis Using Stata: Beyond the
Cox Model.
References
Brzezinski, M. 2012. The Chen–Shapiro test for normality. Stata Journal 12: 368–374.
Genest, C., and G. J. Brackstone. 2010. A conversation with Martin Bradbury Wilk. Statistical Science 25: 258–273.
Gould, W. W. 1992. sg3.7: Final summary of tests of normality. Stata Technical Bulletin 5: 10–11. Reprinted in Stata Technical Bulletin Reprints, vol. 1, pp. 114–115. College Station, TX: Stata Press.
Royston, P. 1982. An extension of Shapiro and Wilks’s Wtest for normality to large samples. Applied Statistics 31:
115–124.
. 1983. A simple method for evaluating the Shapiro–Francia W’ test of non-normality. Statistician 32: 297–300.
. 1991a. sg3.2: Shapiro–Wilk and Shapiro–Francia tests. Stata Technical Bulletin 3: 19. Reprinted in Stata Technical Bulletin Reprints, vol. 1, p. 105. College Station, TX: Stata Press.
. 1991b. Estimating departure from normality. Statistics in Medicine 10: 1283–1293.
. 1992. Approximating the Shapiro–Wilk W-test for non-normality. Statistics and Computing 2: 117–119.
. 1993a. A pocket-calculator algorithm for the Shapiro–Francia test for non-normality: An application to medicine.
Statistics in Medicine 12: 181–184.
. 1993b. A toolkit for testing for non-normality in complete and censored samples. Statistician 42: 37–43.
Shapiro, S. S., and R. S. Francia. 1972. An approximate analysis of variance test for normality. Journal of the
American Statistical Association 67: 215–216.
Shapiro, S. S., and M. B. Wilk. 1965. An analysis of variance test for normality (complete samples). Biometrika 52:
591–611.
Also see
[R]lnskew0 Find zero-skewness log or BoxCox transform
[R]lv Letter-value displays
[R]sktest Skewness and kurtosis test for normality
[MV]mvtest normality Multivariate normality tests
Title
symmetry — Symmetry and marginal homogeneity tests
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Symmetry and marginal homogeneity tests
    symmetry casevar controlvar [if] [in] [weight] [, options]

Immediate form of symmetry and marginal homogeneity tests

    symmi #11 #12 [...] \ #21 #22 [...] [\ ...] [if] [in] [, options]
options Description
Main
notable suppress output of contingency table
contrib report contribution of each off-diagonal cell pair
exact perform exact test of table symmetry
mh perform two marginal homogeneity tests
trend perform a test for linear trend in the (log) relative risk (RR)
cc use continuity correction when calculating test for linear trend
fweights are allowed; see [U] 11.1.6 weight.
Menu
symmetry
Statistics >Epidemiology and related >Other >Symmetry and marginal homogeneity test
symmi
Statistics >Epidemiology and related >Other >Symmetry and marginal homogeneity test calculator
Description
symmetry performs asymptotic symmetry and marginal homogeneity tests, as well as an exact symmetry test on K × K tables where there is a 1-to-1 matching of cases and controls (nonindependence). This testing is used to analyze matched-pair case–control data with multiple discrete levels of the exposure (outcome) variable. In genetics, the test is known as the transmission/disequilibrium test (TDT) and is used to test the association between transmitted and nontransmitted parental marker alleles to an affected child (Spielman, McGinnis, and Ewens 1993). For 2 × 2 tables, the asymptotic test statistics reduce to the McNemar test statistic, and the exact symmetry test produces an exact McNemar test; see [ST] epitab. For many exposure variables, symmetry can optionally perform a test for linear trend in the log relative risk.
symmetry expects the data to be in the wide format; that is, each observation contains the matched
case and control values in variables casevar and controlvar. Variables can be numeric or string.
symmi is the immediate form of symmetry. The symmi command uses the values specified on
the command line; rows are separated by ‘\’, and options are the same as for symmetry. See
[U] 19 Immediate commands for a general introduction to immediate commands.
Options
 
Main
notable suppresses the output of the contingency table. By default, symmetry displays the n×n
contingency table at the top of the output.
contrib reports the contribution of each off-diagonal cell pair to the overall symmetry χ2.
exact performs an exact test of table symmetry. This option is recommended for sparse tables.
CAUTION: The exact test requires substantial amounts of time and memory for large tables.
mh performs two marginal homogeneity tests that do not require the inversion of the variance–covariance matrix.
By default, symmetry produces the Stuart–Maxwell test statistic, which requires the inversion of the nondiagonal variance–covariance matrix, V. When the table is sparse, the matrix may not be of full rank, and then the command substitutes a generalized inverse V⁻ for V⁻¹. mh calculates optional marginal homogeneity statistics that do not require the inversion of the variance–covariance matrix. These tests may be preferred in certain situations. See Methods and formulas and Bickeböller and Clerget-Darpoux (1995) for details on these test statistics.
trend performs a test for linear trend in the (log) relative risk (RR). This option is allowed only for
numeric exposure (outcome) variables, and its use should be restricted to measurements on the
ordinal or the interval scales.
cc specifies that the continuity correction be used when calculating the test for linear trend. This
correction should be specified only when the levels of the exposure variable are equally spaced.
Remarks and examples
symmetry and symmi may be used to analyze 1-to-1 matched case–control data with multiple discrete levels of the exposure (outcome) variable.
Example 1
Consider a survey of 344 individuals (BMDP 1990, 267–270) who were asked in October 1986
whether they agreed with President Reagan’s handling of foreign affairs. In January 1987, after the
Iran-Contra affair became public, these same individuals were surveyed again and asked the same
question. We would like to know if public opinion changed over this period.
We first describe the dataset and list a few observations.
. use http://www.stata-press.com/data/r13/iran
. describe
Contains data from http://www.stata-press.com/data/r13/iran.dta
obs: 344
vars: 2 29 Jan 2013 02:37
size: 688
storage display value
variable name type format label variable label
before byte %8.0g vlab Public Opinion before IC
after byte %8.0g vlab Public Opinion after IC
Sorted by:
. list in 1/5
before after
1. agree agree
2. agree disagree
3. agree unsure
4. disagree agree
5. disagree disagree
Each observation corresponds to one of the 344 individuals. The data are in wide form so that
each observation has a before and an after measurement. We now perform the test without options.
. symmetry before after
Public
Opinion Public Opinion after IC
before IC agree disagree unsure Total
agree 47 56 38 141
disagree 28 61 31 120
unsure 26 47 10 83
Total 101 164 79 344
chi2 df Prob>chi2
Symmetry (asymptotic) 14.87 3 0.0019
Marginal homogeneity (Stuart-Maxwell) 14.78 2 0.0006
The test first tabulates the data in a K × K table and then performs Bowker's (1948) test for table symmetry and the Stuart–Maxwell (Stuart 1955; Maxwell 1970) test for marginal homogeneity.
Both the symmetry test and the marginal homogeneity test are highly significant, thus indicating
a shift in public opinion.
An exact test of symmetry is provided for use on sparse tables. This test is computationally
intensive, so it should not be used on large tables. Because we are working on a fast computer, we
will run the symmetry test again and this time include the exact option. We will suppress the output
of the contingency table by specifying notable and include the contrib option so that we may
further examine the cells responsible for the significant result.
. symmetry before after, contrib exact mh notable
Contribution
to symmetry
Cells chi-squared
n1_2 & n2_1 9.3333
n1_3 & n3_1 2.2500
n2_3 & n3_2 3.2821
chi2 df Prob>chi2
Symmetry (asymptotic) 14.87 3 0.0019
Marginal homogeneity (Stuart-Maxwell) 14.78 2 0.0006
Marginal homogeneity (Bickenboller) 13.53 2 0.0012
Marginal homogeneity (no diagonals) 15.25 2 0.0005
Symmetry (exact significance probability) 0.0018
The largest contribution to the symmetry χ2is due to cells n12 and n21. These correspond to
changes between the agree and disagree categories. Of the 344 individuals, 56 (16.3%) changed from
the agree to the disagree response, whereas only 28 (8.1%) changed in the opposite direction.
For these data, the results from the exact test are similar to those from the asymptotic test.
Example 2
Breslow and Day (1980, 163) reprinted data from Mack et al. (1976) from a case–control study of the effect of exogenous estrogen on the risk of endometrial cancer. The data consist of 59 elderly
women diagnosed with endometrial cancer and 59 disease-free control subjects living in the same
community as the cases. Cases and controls were matched on age, marital status, and time living
in the community. The data collected included information on the daily dose of conjugated estrogen
therapy. Breslow and Day analyzed these data by creating four levels of the dose variable. Here are
the data as entered into a Stata dataset:
. use http://www.stata-press.com/data/r13/bd163
. list, noobs divider
case control count
0 0 6
0 0.1-0.299 2
0 0.3-0.625 3
0 0.626+ 1
0.1-0.299 0 9
0.1-0.299 0.1-0.299 4
0.1-0.299 0.3-0.625 2
0.1-0.299 0.626+ 1
0.3-0.625 0 9
0.3-0.625 0.1-0.299 2
0.3-0.625 0.3-0.625 3
0.3-0.625 0.626+ 1
0.626+ 0 12
0.626+ 0.1-0.299 1
0.626+ 0.3-0.625 2
0.626+ 0.626+ 1
This dataset is in a different format from that of the previous example. Instead of each observation
representing one matched pair, each observation represents possibly multiple pairs indicated by the
count variable. For instance, the first observation corresponds to six matched pairs where neither
the case nor the control was on estrogen, the second observation corresponds to two matched pairs
where the case was not on estrogen and the control was on 0.1 to 0.299 mg/day, etc.
To use symmetry to analyze this dataset, we must specify fweight to indicate that in our data
there are observations corresponding to more than one matched pair.
. symmetry case control [fweight=count]
control
case 0 0.1-0.299 0.3-0.625 0.626+ Total
0 6 2 3 1 12
0.1-0.299 9 4 2 1 16
0.3-0.625 9 2 3 1 15
0.626+ 12 1 2 1 16
Total 36 9 10 4 59
chi2 df Prob>chi2
Symmetry (asymptotic) 17.10 6 0.0089
Marginal homogeneity (Stuart-Maxwell) 16.96 3 0.0007
Both the test of symmetry and the test of marginal homogeneity are highly significant, thus leading
us to reject the null hypothesis that there is no effect of exposure to estrogen on the risk of endometrial
cancer.
Breslow and Day perform a test for trend assuming that the estrogen exposure levels were equally
spaced by recoding the exposure levels as 1, 2, 3, and 4.
We can easily reproduce their results by recoding our data in this way and by specifying the
trend option. Two new numeric variables were created, ca and co, corresponding to the variables
case and control, respectively. Below we list some of the data and our results from symmetry:
. encode case, gen(ca)
. encode control, gen(co)
. label values ca
. label values co
. list in 1/4
case control count ca co
1. 0 0 6 1 1
2. 0 0.1-0.299 2 1 2
3. 0 0.3-0.625 3 1 3
4. 0 0.626+ 1 1 4
. symmetry ca co [fw=count], notable trend cc
chi2 df Prob>chi2
Symmetry (asymptotic) 17.10 6 0.0089
Marginal homogeneity (Stuart-Maxwell) 16.96 3 0.0007
Linear trend in the (log) RR 14.43 1 0.0001
We requested the continuity correction by specifying cc. Doing so is appropriate because our coded
exposure levels are equally spaced.
The test for trend was highly significant, indicating an increased risk of endometrial cancer with
increased dosage of conjugated estrogen.
You must be cautious: the way in which you code the exposure variable affects the linear trend
statistic. If instead of coding the levels as 1, 2, 3, and 4, we had instead used 0, 0.2, 0.46, and 0.7
(roughly the midpoint in the range of each level), we would have obtained a χ2statistic of 11.19 for
these data.
Stored results
symmetry stores the following in r():
Scalars
  r(N_pair)     number of matched pairs
  r(chi2)       asymptotic symmetry χ²
  r(df)         asymptotic symmetry degrees of freedom
  r(p)          asymptotic symmetry p-value
  r(chi2_sm)    MH (Stuart–Maxwell) χ²
  r(df_sm)      MH (Stuart–Maxwell) degrees of freedom
  r(p_sm)       MH (Stuart–Maxwell) p-value
  r(chi2_b)     MH (Bickeböller) χ²
  r(df_b)       MH (Bickeböller) degrees of freedom
  r(p_b)        MH (Bickeböller) p-value
  r(chi2_nd)    MH (no diagonals) χ²
  r(df_nd)      MH (no diagonals) degrees of freedom
  r(p_nd)       MH (no diagonals) p-value
  r(chi2_t)     χ² for linear trend
  r(p_trend)    p-value for linear trend
  r(p_exact)    exact symmetry p-value
Methods and formulas
Methods and formulas are presented under the following headings:
Asymptotic tests
Exact symmetry test
Asymptotic tests
Consider a square table with K exposure categories, that is, K rows and K columns. Let n_ij be the count corresponding to row i and column j of the table, N_ij = n_ij + n_ji for i, j = 1, 2, ..., K, and let n_i. and n_.j be the marginal totals for row i and column j, respectively. Asymptotic tests for symmetry and marginal homogeneity for this K × K table are calculated as follows:
The null hypothesis of complete symmetry, p_ij = p_ji, i ≠ j, is tested by calculating the test statistic (Bowker 1948)

$$ T_{cs} = \sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}} $$

which is asymptotically distributed as χ² with K(K − 1)/2 − R degrees of freedom, where R is the number of off-diagonal cells with N_ij = 0.
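For instance, the per-pair contributions displayed in example 1 are exactly these terms: (56 − 28)²/(56 + 28) = 9.3333, (38 − 26)²/(38 + 26) = 2.2500, and (31 − 47)²/(31 + 47) = 3.2821, which sum to the reported symmetry χ² of 14.87.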
The null hypothesis of marginal homogeneity, p_i. = p_.i, is tested by calculating the Stuart–Maxwell test statistic (Stuart 1955; Maxwell 1970)

$$ T_{sm} = \mathbf{d}' \mathbf{V}^{-1} \mathbf{d} $$

where d is a column vector with elements equal to the differences d_i = n_i. − n_.i for i = 1, 2, ..., K, and V is the variance–covariance matrix with elements

$$ v_{ii} = n_{i.} + n_{.i} - 2 n_{ii} \qquad v_{ij} = -(n_{ij} + n_{ji}), \quad i \neq j $$

T_sm is asymptotically χ² with K − 1 degrees of freedom.
This test statistic properly accounts for the dependence between the table's rows and columns. When the matrix V is not of full rank, a generalized inverse V⁻ is substituted for V⁻¹.
The Bickeböller and Clerget-Darpoux (1995) marginal homogeneity test statistic is calculated by

$$ T_{mh} = \sum_{i} \frac{(n_{i.} - n_{.i})^2}{n_{i.} + n_{.i}} $$

This statistic is asymptotically distributed, under the assumption of marginal independence, as χ² with K − 1 degrees of freedom.
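Using the marginal totals from example 1 (row totals 141, 120, and 83; column totals 101, 164, and 79), this is (141 − 101)²/242 + (120 − 164)²/284 + (83 − 79)²/162 ≈ 6.61 + 6.82 + 0.10 = 13.53, the marginal homogeneity value reported in that example.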
The marginal homogeneity (no diagonals) test statistic T′mh is calculated in the same way as Tmh, except that the diagonal elements do not enter into the calculation of the marginal totals. Unlike the previous test statistic, T′mh reduces to a McNemar test statistic for 2 × 2 tables. The test statistic {(K − 1)/2} T′mh is asymptotically distributed as χ² with K − 1 degrees of freedom (Cleves, Olson, and Jacobs 1997; Spielman and Ewens 1996).
Breslow and Day's test statistic for linear trend in the (log) of RR is

$$ \frac{\left\{ \sum_{i<j} (n_{ij} - n_{ji})(X_j - X_i) - cc \right\}^2}{\sum_{i<j} (n_{ij} + n_{ji})(X_j - X_i)^2} $$

where the X_j are the doses associated with the various levels of exposure and cc is the continuity correction; it is asymptotically distributed as χ² with 1 degree of freedom.
The continuity correction option is applicable only when the levels of the exposure variable are
equally spaced.
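As a worked illustration (assuming, as the output in example 2 suggests, a continuity correction of one-half unit): with the scores 1, 2, 3, and 4 used there, the numerator sum over the six off-diagonal pairs is (2 − 9)(1) + (3 − 9)(2) + (1 − 12)(3) + 0 + 0 + (1 − 2)(1) = −53, the denominator is 11(1) + 12(4) + 13(9) + 4(1) + 2(4) + 3(1) = 191, and (53 − 0.5)²/191 ≈ 14.43, the trend statistic reported there.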
Exact symmetry test
The exact test is based on a permutation algorithm applied to the null distribution. The distribution of the off-diagonal elements n_ij, i ≠ j, conditional on the sum of the complementary off-diagonal cells, N_ij = n_ij + n_ji, can be written as the product of K(K − 1)/2 binomial random variables,

$$ P(\mathbf{n}) = \prod_{i<j} \binom{N_{ij}}{n_{ij}} \, \pi_{ij}^{\,n_{ij}} (1 - \pi_{ij})^{\,N_{ij} - n_{ij}} $$

where n is a vector with elements n_ij and π_ij = E(n_ij / N_ij | N_ij). Under the null hypothesis of complete symmetry, π_ij = π_ji = 1/2, and thus the permutation distribution is given by

$$ P_0(\mathbf{n}) = \prod_{i<j} \binom{N_{ij}}{n_{ij}} \left(\frac{1}{2}\right)^{N_{ij}} $$

The exact significance test is performed by evaluating

$$ P_{cs} = \sum_{\mathbf{n}^* \in p} P_0(\mathbf{n}^*) $$

where p = {n*: P_0(n*) ≤ P_0(n)} and n is the observed contingency table data vector. The algorithm evaluates P_cs exactly. For information about permutation tests, see Good (2005, 2006).
References
Bickeböller, H., and F. Clerget-Darpoux. 1995. Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genetic Epidemiology 12: 865–870.
BMDP. 1990. BMDP Statistical Software Manual. Oakland, CA: University of California Press.
Bowker, A. H. 1948. A test for symmetry in contingency tables. Journal of the American Statistical Association 43:
572–574.
Breslow, N. E., and N. E. Day. 1980. Statistical Methods in Cancer Research: Vol. 1—The Analysis of Case–Control
Studies. Lyon: IARC.
Cleves, M. A. 1997. sg74: Symmetry and marginal homogeneity test/Transmission-Disequilibrium Test (TDT). Stata Technical Bulletin 40: 23–27. Reprinted in Stata Technical Bulletin Reprints, vol. 7, pp. 193–197. College Station, TX: Stata Press.
. 1999. sg110: Hardy–Weinberg equilibrium test and allele frequency estimation. Stata Technical Bulletin 48: 34–37. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 280–284. College Station, TX: Stata Press.
Cleves, M. A., J. M. Olson, and K. B. Jacobs. 1997. Exact transmission-disequilibrium tests with multiallelic markers.
Genetic Epidemiology 14: 337–347.
Cui, J. 2000. sg150: Hardy–Weinberg equilibrium test in case–control studies. Stata Technical Bulletin 57: 17–19. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 218–220. College Station, TX: Stata Press.
Good, P. I. 2005. Permutation, Parametric, and Bootstrap Tests of Hypotheses: A Practical Guide to Resampling
Methods for Testing Hypotheses. 3rd ed. New York: Springer.
. 2006. Resampling Methods: A Practical Guide to Data Analysis. 3rd ed. Boston: Birkhäuser.
Mack, T. M., M. C. Pike, B. E. Henderson, R. I. Pfeffer, V. R. Gerkins, M. Arthur, and S. E. Brown. 1976. Estrogens
and endometrial cancer in a retirement community. New England Journal of Medicine 294: 1262–1267.
Mander, A. P. 2000. sbe38: Haplotype frequency estimation using an EM algorithm and log-linear modeling. Stata Technical Bulletin 57: 5–7. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 104–107. College Station, TX: Stata Press.
Maxwell, A. E. 1970. Comparing the classification of subjects by two independent judges. British Journal of Psychiatry
116: 651–655.
Spielman, R. S., and W. J. Ewens. 1996. The TDT and other family-based tests for linkage disequilibrium and association. American Journal of Human Genetics 59: 983–989.
Spielman, R. S., R. E. McGinnis, and W. J. Ewens. 1993. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics 52: 506–516.
Stuart, A. 1955. A test for homogeneity of the marginal distributions in a two-way classification. Biometrika 42:
412–416.
Also see
[ST]epitab Tables for epidemiologists
Title
table — Flexible table of summary statistics
Syntax Menu Description Options
Remarks and examples Methods and formulas Also see
Syntax
table rowvar [colvar [supercolvar]] [if] [in] [weight] [, options]
options Description
Main
contents(clist)contents of table cells; select up to five statistics; default is
contents(freq)
by(superrowvarlist)superrow variables
Options
cellwidth(#)cell width
csepwidth(#)column-separation width
stubwidth(#)stub width
scsepwidth(#)supercolumn-separation width
center center-align table cells; default is right-align
left left-align table cells; default is right-align
cw perform casewise deletion
row add row totals
column add column totals
scolumn add supercolumn totals
concise suppress rows with all missing entries
missing show missing statistics with period
replace replace current data with table statistics
name(string) name new variables with prefix string
format(%fmt) display format for numbers in cells; default is format(%9.0g)
by is allowed; see [D] by.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight. pweights may not be used with sd, semean,
sebinomial, or sepoisson. iweights may not be used with semean, sebinomial, or sepoisson.
where the elements of clist may be

freq                    frequency
mean varname            mean of varname
sd varname              standard deviation
semean varname          standard error of the mean (sd/sqrt(n))
sebinomial varname      standard error of the mean, binomial distribution (sqrt(p(1-p)/n))
sepoisson varname       standard error of the mean, Poisson distribution (sqrt(mean))
sum varname             sum
rawsum varname          sums ignoring optionally specified weight
count varname           count of nonmissing observations
n varname               same as count
max varname             maximum
min varname             minimum
median varname          median
p1 varname              1st percentile
p2 varname              2nd percentile
...                     3rd–49th percentiles
p50 varname             50th percentile (median)
...                     51st–97th percentiles
p98 varname             98th percentile
p99 varname             99th percentile
iqr varname             interquartile range
Rows, columns, supercolumns, and superrows are thus defined as

  rows only:
                        row 1    .
                        row 2    .

  rows and columns:
                                 col 1   col 2
                        row 1      .       .
                        row 2      .       .

  rows, columns, and supercolumns:
                                   supercol 1        supercol 2
                                 col 1   col 2     col 1   col 2
                        row 1      .       .         .       .
                        row 2      .       .         .       .

  rows, columns, supercolumns, and superrows:
                                   supercol 1        supercol 2
                                 col 1   col 2     col 1   col 2
                        superrow 1:
                        row 1      .       .         .       .
                        row 2      .       .         .       .
                        superrow 2:
                        row 1      .       .         .       .
                        row 2      .       .         .       .
Menu
Statistics > Summaries, tables, and tests > Other tables > Flexible table of summary statistics
Description
table calculates and displays tables of statistics.
Options
 
Main
contents(clist) specifies the contents of the table’s cells; if not specified, contents(freq) is used
by default. contents(freq) produces a table of frequencies. contents(mean mpg) produces
a table of the means of variable mpg. contents(freq mean mpg sd mpg) produces a table of
frequencies together with the mean and standard deviation of variable mpg. Up to five statistics
may be specified.
by(superrowvarlist) specifies that numeric or string variables be treated as superrows. Up to four
variables may be specified in superrowvarlist. The by() option may be specified with the by
prefix.
 
Options
cellwidth(#) specifies the width of the cell in units of digit widths; 10 means the space occupied by
10 digits, which is 0123456789. The default cellwidth() is not a fixed number, but a number
chosen by table to spread the table out while presenting a reasonable number of columns across
the page.
csepwidth(#) specifies the separation between columns in units of digit widths. The default is not
a fixed number, but a number chosen by table according to what it thinks looks best.
stubwidth(#) specifies the width, in units of digit widths, to be allocated to the left stub of the
table. The default is not a fixed number, but a number chosen by table according to what it
thinks looks best.
scsepwidth(#) specifies the separation between supercolumns in units of digit widths. The default
is not a fixed number, but a number chosen by table to present the results best.
center specifies that results be centered in the table’s cells. The default is to right-align results.
For centering to work well, you typically need to specify a display format as well. center
format(%9.2f) is popular.
left specifies that column labels be left-aligned. The default is to right-align column labels to
distinguish them from supercolumn labels, which are left-aligned.
cw specifies casewise deletion. If cw is not specified, all observations possible are used to calculate
each of the specified statistics. cw is relevant only when you request a table containing statistics
on multiple variables. For instance, contents(mean mpg mean weight) would produce a table
reporting the means of variables mpg and weight. Consider an observation in which mpg is known
but weight is missing. By default, that observation will be used in the calculation of the mean of
mpg. If you specify cw, the observation will be excluded in the calculation of the means of both
mpg and weight.
row specifies that a row be added to the table reflecting the total across the rows.
column specifies that a column be added to the table reflecting the total across columns.
scolumn specifies that a supercolumn be added to the table reflecting the total across supercolumns.
concise specifies that rows with all missing entries not be displayed.
missing specifies that missing statistics be shown in the table as periods (Stata’s missing-value
indicator). The default is that missing entries be left blank.
replace specifies that the data in memory be replaced with data containing 1 observation per cell
(row, column, supercolumn, and superrow) and with variables containing the statistics designated
in contents().
This option is rarely specified. If you do not specify this option, the data in memory remain
unchanged.
If you do specify this option, the first statistic will be named table1, the second table2, and so
on. For instance, if contents(mean mpg sd mpg) was specified, the means of mpg would be in
variable table1 and the standard deviations in table2.
name(string) is relevant only if you specify replace. name() allows changing the default stub
name that replace uses to name the new variables associated with the statistics. If you specify
name(stat), the first statistic will be placed in variable stat1, the second in stat2, and so on.
format(%fmt) specifies the display format for presenting numbers in the table’s cells. format(%9.0g)
is the default; format(%9.2f) and format(%9.2fc) are popular alternatives. The width of the
format you specify does not matter, except that %fmt must be valid. The width of the cells is
chosen by table to present the results best. The cellwidth() option allows you to override
table’s choice.
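As a minimal sketch of how replace and name() work together (using the automobile dataset from the examples below; the listed values are not shown here):

. use http://www.stata-press.com/data/r13/auto2, clear
. table rep78 foreign, contents(mean mpg sd mpg) replace name(stat)
. list rep78 foreign stat1 stat2 in 1/4

After replace, the data in memory hold one observation per table cell; stat1 contains the cell means of mpg and stat2 the cell standard deviations.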
Limits
Up to four variables may be specified in the by(), so with the three row, column, and supercolumn
variables, seven-way tables may be displayed.
Up to five statistics may be displayed in each cell of the table.
The sum of the number of rows, columns, supercolumns, and superrows is called the number of
margins. A table may contain up to 3,000 margins. Thus a one-way table may contain 3,000 rows. A
two-way table could contain 2,998 rows and two columns, 2,997 rows and three columns, . . ., 1,500
rows and 1,500 columns, . . ., two rows and 2,998 columns. A three-way table is similarly limited
by the sum of the number of rows, columns, and supercolumns. An r × c × d table is feasible if
r + c + d ≤ 3,000. The limit is set in terms of the sum of the rows, columns, supercolumns, and
superrows, and not, as you might expect, in terms of their product.
Remarks and examples
Remarks are presented under the following headings:
One-way tables
Two-way tables
Three-way tables
Four-way and higher-dimensional tables
Video example
One-way tables
Example 1
From the automobile dataset, here is a simple one-way table:
. use http://www.stata-press.com/data/r13/auto2
(1978 Automobile Data)
. table rep78, contents(mean mpg)
Repair
Record
1978 mean(mpg)
Poor 21
Fair 19.125
Average 19.4333
Good 21.6667
Excellent 27.3636
We are not limited to including only one statistic:
. table rep78, c(n mpg mean mpg sd mpg median mpg)
Repair
Record
1978 N(mpg) mean(mpg) sd(mpg) med(mpg)
Poor 2 21 4.24264 21
Fair 8 19.125 3.758324 18
Average 30 19.4333 4.141325 19
Good 18 21.6667 4.93487 22.5
Excellent 11 27.3636 8.732385 30
We abbreviated contents() as c(). The format() option will allow us to better format the numbers
in the table:
. table rep78, c(n mpg mean mpg sd mpg median mpg) format(%9.2f)
Repair
Record
1978 N(mpg) mean(mpg) sd(mpg) med(mpg)
Poor 2 21.00 4.24 21.00
Fair 8 19.12 3.76 18.00
Average 30 19.43 4.14 19.00
Good 18 21.67 4.93 22.50
Excellent 11 27.36 8.73 30.00
The center option will center the results under the headings:
. table rep78, c(n mpg mean mpg sd mpg median mpg) format(%9.2f) center
Repair
Record
1978 N(mpg) mean(mpg) sd(mpg) med(mpg)
Poor 2 21.00 4.24 21.00
Fair 8 19.12 3.76 18.00
Average 30 19.43 4.14 19.00
Good 18 21.67 4.93 22.50
Excellent 11 27.36 8.73 30.00
Two-way tables
Example 2
In example 1, when we typed ‘table rep78, . . .’, we obtained a one-way table. If we were to
type ‘table rep78 foreign, . . .’, we would obtain a two-way table:
. table rep78 foreign, c(mean mpg)
Repair
Record Car type
1978 Domestic Foreign
Poor 21
Fair 19.125
Average 19 23.3333
Good 18.4444 24.8889
Excellent 32 26.3333
Note the missing cells. Certain combinations of repair record and car type do not exist in our dataset.
As with one-way tables, we can specify a display format for the cells and center the numbers
within the cells if we wish.
. table rep78 foreign, c(mean mpg) format(%9.2f) center
Repair
Record Car type
1978 Domestic Foreign
Poor 21.00
Fair 19.12
Average 19.00 23.33
Good 18.44 24.89
Excellent 32.00 26.33
We can obtain row totals by specifying the row option and obtain column totals by specifying the
col option. We specify both below:
. table rep78 foreign, c(mean mpg) format(%9.2f) center row col
Repair
Record Car type
1978 Domestic Foreign Total
Poor 21.00 21.00
Fair 19.12 19.12
Average 19.00 23.33 19.43
Good 18.44 24.89 21.67
Excellent 32.00 26.33 27.36
Total 19.54 25.29 21.29
table can display multiple statistics within cells, but once we move beyond one-way tables, the
table becomes busy:
. table foreign rep78, c(mean mpg n mpg) format(%9.2f) center
Repair Record 1978
Car type Poor Fair Average Good Excellent
Domestic 21.00 19.12 19.00 18.44 32.00
2 8 27 9 2
Foreign 23.33 24.89 26.33
3 9 9
This two-way table with two statistics per cell works well here. That was, in part, helped along by our
interchanging the rows and columns. We turned the table around by typing table foreign rep78
rather than table rep78 foreign.
Another way to display two-way tables is to specify a row and superrow rather than a row and
column. We do that below and display three statistics per cell:
. table foreign, by(rep78) c(mean mpg sd mpg n mpg) format(%9.2f) center
Repair
Record
1978 and
Car type mean(mpg) sd(mpg) N(mpg)
Poor
Domestic 21.00 4.24 2
Foreign
Fair
Domestic 19.12 3.76 8
Foreign
Average
Domestic 19.00 4.09 27
Foreign 23.33 2.52 3
Good
Domestic 18.44 4.59 9
Foreign 24.89 2.71 9
Excellent
Domestic 32.00 2.83 2
Foreign 26.33 9.37 9
Three-way tables
Example 3
We have data on the prevalence of byssinosis, a form of pneumoconiosis to which workers exposed
to cotton dust are susceptible. The dataset is on 5,419 workers in a large cotton mill. We know
whether each worker smokes, his or her race, and the dustiness of the work area. The categorical
variables are

    smokes       Smoker or nonsmoker in the last five years.
    race         White or other.
    workplace    1 (most dusty), 2 (less dusty), 3 (least dusty).
Moreover, this dataset includes a frequency-weight variable pop. Here is a three-way table showing
the fraction of workers with byssinosis:
. use http://www.stata-press.com/data/r13/byssin
(Byssinosis incidence)
. table workplace smokes race [fw=pop], c(mean prob)
Dustiness Race and Smokes
of other white
workplace no yes no yes
least .0107527 .0101523 .0081549 .0162774
less .02 .0081633 .0136612 .0143149
most .0820896 .1679105 .0833333 .2295082
This table would look better if we showed the fraction to four digits:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.4f)
Dustiness Race and Smokes
of other white
workplace no yes no yes
least 0.0108 0.0102 0.0082 0.0163
less 0.0200 0.0082 0.0137 0.0143
most 0.0821 0.1679 0.0833 0.2295
In this table, the rows are the dustiness of the workplace, the columns are whether the worker smokes,
and the supercolumns are the worker’s race.
Now we request that the table include the supercolumn totals by specifying the scolumn option,
which we can abbreviate as sc:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.4f) sc
Dustiness Race and Smokes
of other white Total
workplace no yes no yes no yes
least 0.0108 0.0102 0.0082 0.0163 0.0090 0.0145
less 0.0200 0.0082 0.0137 0.0143 0.0159 0.0123
most 0.0821 0.1679 0.0833 0.2295 0.0826 0.1929
The supercolumn total is the total over race and is divided into its columns based on smokes. Here
is the table with the column rather than the supercolumn totals:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.4f) col
Dustiness Race and Smokes
of other white
workplace no yes Total no yes Total
least 0.0108 0.0102 0.0104 0.0082 0.0163 0.0129
less 0.0200 0.0082 0.0135 0.0137 0.0143 0.0140
most 0.0821 0.1679 0.1393 0.0833 0.2295 0.1835
Here is the table with both column and supercolumn totals:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.4f) sc col
Dustin
ess of Race and Smokes
workpl other white Total
ace no yes Total no yes Total no yes Total
least 0.0108 0.0102 0.0104 0.0082 0.0163 0.0129 0.0090 0.0145 0.0122
less 0.0200 0.0082 0.0135 0.0137 0.0143 0.0140 0.0159 0.0123 0.0138
most 0.0821 0.1679 0.1393 0.0833 0.2295 0.1835 0.0826 0.1929 0.1570
table is struggling to keep this table from becoming too wide; notice how it divided the words in
the title in the top-left stub. Here, if the table had more columns or if we demanded more digits,
table would be forced to segment the table and present it in pieces, which it would do:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.6f) sc col
Dustiness Race and Smokes
of other white
workplace no yes Total no yes Total
least 0.010753 0.010152 0.010417 0.008155 0.016277 0.012949
less 0.020000 0.008163 0.013483 0.013661 0.014315 0.014035
most 0.082090 0.167910 0.139303 0.083333 0.229508 0.183521
Dustiness Race and Smokes
of Total
workplace no yes Total
least 0.008990 0.014471 0.012174
less 0.015901 0.012262 0.013846
most 0.082569 0.192905 0.156951
Here three digits is probably enough, so here is the table including all the row, column, and supercolumn
totals:
. table workplace smokes race [fw=pop], c(mean prob) format(%9.3f) sc col row
Dustiness Race and Smokes
of other white Total
workplace no yes Total no yes Total no yes Total
least 0.011 0.010 0.010 0.008 0.016 0.013 0.009 0.014 0.012
less 0.020 0.008 0.013 0.014 0.014 0.014 0.016 0.012 0.014
most 0.082 0.168 0.139 0.083 0.230 0.184 0.083 0.193 0.157
Total 0.025 0.048 0.038 0.014 0.035 0.026 0.018 0.039 0.030
We can show multiple statistics:
. table workplace smokes race [fw=pop], c(mean prob n prob) format(%9.3f) sc
> col row
Dustiness Race and Smokes
of other white Total
workplace no yes Total no yes Total no yes Total
least 0.011 0.010 0.010 0.008 0.016 0.013 0.009 0.014 0.012
465 591 1,056 981 1,413 2,394 1,446 2,004 3,450
less 0.020 0.008 0.013 0.014 0.014 0.014 0.016 0.012 0.014
200 245 445 366 489 855 566 734 1,300
most 0.082 0.168 0.139 0.083 0.230 0.184 0.083 0.193 0.157
134 268 402 84 183 267 218 451 669
Total 0.025 0.048 0.038 0.014 0.035 0.026 0.018 0.039 0.030
799 1,104 1,903 1,431 2,085 3,516 2,230 3,189 5,419
Four-way and higher-dimensional tables
Example 4
Let’s pretend that our byssinosis dataset also recorded each worker’s sex (it does not, and we have
made up this extra information). We obtain a four-way table just as we would a three-way table, but
we specify the fourth variable as a superrow by including it in the by() option:
. use http://www.stata-press.com/data/r13/byssin1
(Byssinosis incidence)
. table workplace smokes race [fw=pop], by(sex) c(mean prob) format(%9.3f) sc
> col row
Sex and
Dustiness Race and Smokes
of other white Total
workplace no yes Total no yes Total no yes Total
Female
least 0.006 0.009 0.008 0.009 0.021 0.016 0.009 0.018 0.014
less 0.020 0.008 0.010 0.015 0.015 0.015 0.016 0.012 0.014
most 0.057 0.154 0.141 0.057 0.154 0.141
Total 0.017 0.051 0.043 0.011 0.020 0.016 0.012 0.032 0.024
Male
least 0.013 0.011 0.012 0.006 0.007 0.006 0.009 0.008 0.009
less 0.020 0.000 0.019 0.000 0.013 0.011 0.016 0.013 0.014
most 0.091 0.244 0.136 0.083 0.230 0.184 0.087 0.232 0.167
Total 0.029 0.041 0.033 0.020 0.056 0.043 0.025 0.052 0.039
If our dataset also included work group and we wanted a five-way table, we could include both
the sex and work-group variables in the by() option. You may include up to four variables in by(),
and so produce up to 7-way tables.
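For instance, if the dataset also contained a work-group variable, say workgroup (hypothetical here; the byssinosis data include no such variable), a five-way table might be requested along these lines:

. table workplace smokes race [fw=pop], by(sex workgroup) c(mean prob) format(%9.3f)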
Video example
Combining cross-tabulations and descriptives in Stata
Methods and formulas
The contents of cells are calculated by collapse and are displayed by tabdisp; see [D] collapse
and [P] tabdisp.
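As a rough sketch of that division of labor (an illustration only, not the internal implementation), the first one-way example above can be approximated by collapsing and then displaying:

. use http://www.stata-press.com/data/r13/auto2, clear
. collapse (mean) mpg, by(rep78)
. tabdisp rep78, cellvar(mpg)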
Also see
[R] summarize — Summary statistics
[R] tabstat — Compact table of summary statistics
[R] tabulate oneway — One-way table of frequencies
[R] tabulate twoway — Two-way table of frequencies
[D] collapse — Make dataset of summary statistics
[P] tabdisp — Display tables
Title
tabstat — Compact table of summary statistics
Syntax Menu Description Options
Remarks and examples Acknowledgments Also see
Syntax
tabstat varlist [if] [in] [weight] [, options]
options Description
Main
by(varname) group statistics by variable
statistics(statname ...) report specified statistics
Options
labelwidth(#) width for by() variable labels; default is labelwidth(16)
varwidth(#) variable width; default is varwidth(12)
columns(variables) display variables in table columns; the default
columns(statistics) display statistics in table columns
format(%fmt) display format for statistics; default format is %9.0g
casewise perform casewise deletion of observations
nototal do not report overall statistics; use with by()
missing report statistics for missing values of by() variable
noseparator do not use separator line between by() categories
longstub make left table stub wider
save store summary statistics in r()
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics > Summaries, tables, and tests > Other tables > Compact table of summary statistics
Description
tabstat displays summary statistics for a series of numeric variables in one table, possibly broken
down on (conditioned by) another variable.
Without the by() option, tabstat is a useful alternative to summarize (see [R] summarize)
because it allows you to specify the list of statistics to be displayed.
With the by() option, tabstat resembles tabulate used with its summarize() option in that
both report statistics of varlist for the different values of varname. tabstat allows more flexibility
in terms of the statistics presented and the format of the table.
tabstat is sensitive to the linesize (see set linesize in [R] log); it widens the table if
possible and wraps if necessary.
Options
 
Main
by(varname) specifies that the statistics be displayed separately for each unique value of varname;
varname may be numeric or string. For instance, tabstat height would present the overall mean
of height. tabstat height, by(sex) would present the mean height of males, and of females,
and the overall mean height. Do not confuse the by() option with the by prefix (see [D] by); both
may be specified.
statistics(statname ...) specifies the statistics to be displayed; the default is equivalent to
specifying statistics(mean). (stats() is a synonym for statistics().) Multiple statistics
may be specified and are separated by white space, such as statistics(mean sd). Available
statistics are
statname     Definition
mean         mean
count        count of nonmissing observations
n            same as count
sum          sum
max          maximum
min          minimum
range        range = max - min
sd           standard deviation
variance     variance
cv           coefficient of variation (sd/mean)
semean       standard error of mean (sd/sqrt(n))
skewness     skewness
kurtosis     kurtosis
p1           1st percentile
p5           5th percentile
p10          10th percentile
p25          25th percentile
median       median (same as p50)
p50          50th percentile (same as median)
p75          75th percentile
p90          90th percentile
p95          95th percentile
p99          99th percentile
iqr          interquartile range = p75 - p25
q            equivalent to specifying p25 p50 p75
 
Options
labelwidth(#) specifies the maximum width to be used within the stub to display the labels of the
by() variable. The default is labelwidth(16). 8 ≤ # ≤ 32.
varwidth(#) specifies the maximum width to be used within the stub to display the names of the vari-
ables. The default is varwidth(12). varwidth() is effective only with columns(statistics).
Setting varwidth() implies longstub. 8 ≤ # ≤ 16.
columns(variables | statistics) specifies whether to display variables or statistics in the columns
of the table. columns(variables) is the default when more than one variable is specified.
format and format(%fmt) specify how the statistics are to be formatted. The default is to use a
%9.0g format.
format specifies that each variable’s statistics be formatted with the variable’s display format; see
[D] format.
format(%fmt) specifies the format to be used for all statistics. The maximum width of the specified
format should not exceed nine characters.
casewise specifies casewise deletion of observations. Statistics are to be computed for the sample
that is not missing for any of the variables in varlist. The default is to use all the nonmissing
values for each variable.
nototal is for use with by(); it specifies that the overall statistics not be reported.
missing specifies that missing values of the by() variable be treated just like any other value and
that statistics should be displayed for them. The default is not to report the statistics for the by()==
missing group. If the by() variable is a string variable, by()=="" is considered to mean missing.
noseparator specifies that a separator line between the by() categories not be displayed.
longstub specifies that the left stub of the table be made wider so that it can include names of the
statistics or variables in addition to the categories of by(varname). The default is to describe the
statistics or variables in a header. longstub is ignored if by(varname)is not specified.
save specifies that the summary statistics be returned in r(). The overall (unconditional) statistics
are returned in matrix r(StatTotal) (rows are statistics, columns are variables). The conditional
statistics are returned in the matrices r(Stat1),r(Stat2),. . . , and the names of the corresponding
variables are returned in the macros r(name1),r(name2),. . . .
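As a minimal sketch of save (using the automobile data from the examples below), the returned matrices can be inspected with matrix list:

. use http://www.stata-press.com/data/r13/auto, clear
. tabstat price mpg, by(foreign) stat(mean sd) save
. matrix list r(StatTotal)
. matrix list r(Stat1)

The overall means and standard deviations are in r(StatTotal), and the statistics for the first by() group are in r(Stat1).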
Remarks and examples
This command is probably most easily understood by going through a series of examples.
Example 1
We have data on the price, weight, mileage rating, and repair record of 22 foreign and 52 domestic
1978 automobiles. We want to summarize these variables for the different origins of the automobiles.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. tabstat price weight mpg rep78, by(foreign)
Summary statistics: mean
by categories of: foreign (Car type)
foreign price weight mpg rep78
Domestic 6072.423 3317.115 19.82692 3.020833
Foreign 6384.682 2315.909 24.77273 4.285714
Total 6165.257 3019.459 21.2973 3.405797
More summary statistics can be requested via the statistics() option. The group totals can be
suppressed with the nototal option.
. tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal
Summary statistics: mean, sd, min, max
by categories of: foreign (Car type)
foreign price weight mpg rep78
Domestic 6072.423 3317.115 19.82692 3.020833
3097.104 695.3637 4.743297 .837666
3291 1800 12 1
15906 4840 34 5
Foreign 6384.682 2315.909 24.77273 4.285714
2621.915 433.0035 6.611187 .7171372
3748 1760 14 3
12990 3420 41 5
Although the header of the table describes the statistics running vertically in the “cells”, the table
may become hard to read, especially with many variables or statistics. The longstub option specifies
that a column be added describing the contents of the cells. The format option can be issued to
specify that tabstat display the statistics by using the display format of the variables rather than
the overall default %9.0g.
. tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) long format
foreign stats price weight mpg rep78
Domestic mean 6,072.4 3,317.1 19.8269 3.02083
sd 3,097.1 695.364 4.7433 .837666
min 3,291 1,800 12 1
max 15,906 4,840 34 5
Foreign mean 6,384.7 2,315.9 24.7727 4.28571
sd 2,621.9 433.003 6.61119 .717137
min 3,748 1,760 14 3
max 12,990 3,420 41 5
Total mean 6,165.3 3,019.5 21.2973 3.4058
sd 2,949.5 777.194 5.7855 .989932
min 3,291 1,760 12 1
max 15,906 4,840 41 5
We can specify a layout of the table in which the statistics run horizontally and the variables run
vertically by specifying the col(statistics) option.
. tabstat price weight mpg rep78, by(foreign) stat(min mean max) col(stat) long
foreign variable min mean max
Domestic price 3291 6072.423 15906
weight 1800 3317.115 4840
mpg 12 19.82692 34
rep78 1 3.020833 5
Foreign price 3748 6384.682 12990
weight 1760 2315.909 3420
mpg 14 24.77273 41
rep78 3 4.285714 5
Total price 3291 6165.257 15906
weight 1760 3019.459 4840
mpg 12 21.2973 41
rep78 1 3.405797 5
Finally, tabstat can also be used to enhance summarize so we can specify the statistics to
be displayed. For instance, we can display the number of observations, the mean, the coefficient of
variation, and the 25%, 50%, and 75% quantiles for a list of variables.
. tabstat price weight mpg rep78, stat(n mean cv q) col(stat)
variable N mean cv p25 p50 p75
price 74 6165.257 .478406 4195 5006.5 6342
weight 74 3019.459 .2573949 2240 3190 3600
mpg 74 21.2973 .2716543 18 20 25
rep78 69 3.405797 .290661 3 3 4
Because we did not specify the by() option, these statistics were not displayed for the subgroups
of the data formed by the categories of the by() variable.
Video example
Descriptive statistics in Stata
Acknowledgments
The tabstat command was written by Jeroen Weesie and Vincent Buskens both of the Department
of Sociology at Utrecht University, The Netherlands.
Also see
[R] summarize — Summary statistics
[R] table — Flexible table of summary statistics
[R] tabulate, summarize() — One- and two-way tables of summary statistics
[D] collapse — Make dataset of summary statistics
Title
tabulate oneway — One-way table of frequencies
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
One-way table
tabulate varname [if] [in] [weight] [, tabulate1 options]
One-way table for each variable — a convenience tool
tab1 varlist [if] [in] [weight] [, tab1 options]
tabulate1 options Description
Main
subpop(varname) exclude observations for which varname = 0
missing treat missing values like other values
nofreq do not display frequencies
nolabel display numeric codes rather than value labels
plot produce a bar chart of the relative frequencies
sort display the table in descending order of frequency
Advanced
generate(stubname) create indicator variables for stubname
matcell(matname) save frequencies in matname; programmer’s option
matrow(matname) save unique values of varname in matname; programmer’s option
tab1 options Description
Main
subpop(varname) exclude observations for which varname = 0
missing treat missing values like other values
nofreq do not display frequencies
nolabel display numeric codes rather than value labels
plot produce a bar chart of the relative frequencies
sort display the table in descending order of frequency
by is allowed with tabulate and tab1; see [D] by.
fweights, aweights, and iweights are allowed by tabulate. fweights are allowed by tab1. See [U] 11.1.6 weight.
Menu
tabulate oneway
Statistics > Summaries, tables, and tests > Frequency tables > One-way table
tabulate ..., generate()
Data > Create or change data > Other variable-creation commands > Create indicator variables
tab1
Statistics > Summaries, tables, and tests > Frequency tables > Multiple one-way tables
Description
tabulate produces a one-way table of frequency counts.
For information about a two-way table of frequency counts along with various measures of
association, including the common Pearson χ2, the likelihood-ratio χ2, Cramér’s V, Fisher’s exact
test, Goodman and Kruskal’s gamma, and Kendall’s τb, see [R] tabulate twoway.
tab1 produces a one-way tabulation for each variable specified in varlist.
Also see [R] table and [R] tabstat if you want one-, two-, or n-way tables of frequencies and a wide
variety of summary statistics. See [R] tabulate, summarize() for a description of tabulate with the
summarize() option; it produces a table (breakdowns) of means and standard deviations. table is
better than tabulate, summarize(), but tabulate, summarize() is faster. See [ST] epitab for
a 2 × 2 table with statistics of interest to epidemiologists.
Options
 
Main
subpop(varname) excludes observations for which varname = 0 in tabulating frequencies. The
mathematical results of tabulate ..., subpop(myvar) are the same as tabulate ... if myvar
!= 0, but the table may be presented differently. The identities of the rows and columns will be
determined from all the data, including the myvar = 0 group, so there may be entries in the table
with frequency 0.
Consider tabulating answer, a variable that takes on values 1, 2, and 3, but consider tabulating
it just for the male==1 subpopulation. Assume that answer is never 2 in this group. tabulate
answer if male==1 produces a table with two rows: one for answer 1 and one for answer 3.
There will be no row for answer 2 because answer 2 was never observed. tabulate answer,
subpop(male) produces a table with three rows. The row for answer 2 will be shown as having
0 frequency.
missing requests that missing values be treated like other values in calculations of counts, percentages,
and other statistics.
nofreq suppresses the printing of the frequencies.
nolabel causes the numeric codes to be displayed rather than the value labels.
plot produces a bar chart of the relative frequencies in a one-way table. (Also see [R]histogram.)
sort puts the table in descending order of frequency (and ascending order of the variable within
equal values of frequency).
 
Advanced
generate(stubname) creates a set of indicator variables (stubname1, stubname2, ...) reflecting the
observed values of the tabulated variable. The generate() option may not be used with the by
prefix.
matcell(matname) saves the reported frequencies in matname. This option is for use by programmers.
matrow(matname) saves the numeric values of the r × 1 row stub in matname. This option is for
use by programmers. matrow() may not be specified if the row variable is a string.
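A minimal sketch using the highway data from the examples below (the matrix names freqs and values are arbitrary):

. use http://www.stata-press.com/data/r13/hiway, clear
. tabulate spdlimit, matcell(freqs) matrow(values)
. matrix list freqs
. matrix list values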
Limits
A one-way table may have a maximum of 12,000 rows (Stata/MP and Stata/SE), 3,000 rows
(Stata/IC), or 500 rows (Small Stata).
Remarks and examples
Remarks are presented under the following headings:
tabulate
tab1
Video example
For each value of a specified variable, tabulate reports the number of observations with that
value. The number of times a value occurs is called its frequency.
tabulate
Example 1
We have data summarizing the speed limit and the accident rate per million vehicle miles along
various Minnesota highways in 1973. The variable containing the speed limit is called spdlimit. If
we summarize the variable, we obtain its mean and standard deviation:
. use http://www.stata-press.com/data/r13/hiway
(Minnesota Highway Data, 1973)
. summarize spdlimit
Variable Obs Mean Std. Dev. Min Max
spdlimit 39 55 5.848977 40 70
The average speed limit is 55 miles per hour. We can learn more about this variable by tabulating it:
. tabulate spdlimit
Speed Limit Freq. Percent Cum.
40 1 2.56 2.56
45 3 7.69 10.26
50 7 17.95 28.21
55 15 38.46 66.67
60 11 28.21 94.87
65 1 2.56 97.44
70 1 2.56 100.00
Total 39 100.00
We see that one highway has a speed limit of 40 miles per hour, three have speed limits of 45, 7
of 50, and so on. The column labeled Percent shows the percentage of highways in the dataset
that have the indicated speed limit. For instance, 38.46% of highways in our dataset have a speed
limit of 55 miles per hour. The final column shows the cumulative percentage. We see that 66.67%
of highways in our dataset have a speed limit of 55 miles per hour or less.
Example 2
The plot option places a sideways histogram alongside the table:
. tabulate spdlimit, plot
Speed Limit Freq.
40 1 *
45 3 ***
50 7 *******
55 15 ***************
60 11 ***********
65 1 *
70 1 *
Total 39
Of course, graph can produce better-looking histograms; see [R]histogram.
Example 3
tabulate labels tables using variable and value labels if they exist. To demonstrate how this
works, let’s add a new variable to our dataset that categorizes spdlimit into three categories. We
will call this new variable spdcat:
. generate spdcat=recode(spdlimit,50,60,70)
The recode() function divides spdlimit into 50 miles per hour or below, 51–60, and above 60;
see [D] functions. We specified the breakpoints in the arguments (spdlimit,50,60,70). The first
argument is the variable to be recoded. The second argument is the first breakpoint, the third argument
is the second breakpoint, and so on. We can specify as many breakpoints as we wish.
recode() used our arguments not only as the breakpoints but also to label the results. If spdlimit
is less than or equal to 50, spdcat is set to 50; if spdlimit is between 51 and 60, spdcat is 60;
otherwise, spdcat is arbitrarily set to 70. (See [U] 25 Working with categorical data and factor
variables.)
Because we just created the variable spdcat, it is not yet labeled. When we make a table using
this variable, tabulate uses the variable’s name to label it:
. tabulate spdcat
spdcat Freq. Percent Cum.
50 11 28.21 28.21
60 26 66.67 94.87
70 2 5.13 100.00
Total 39 100.00
Even though the table is not well labeled, recode()’s coding scheme provides us with clues as to
the table’s meaning. The first line of the table corresponds to 50 miles per hour and below, the next
to 51 through 60 miles per hour, and the last to above 60 miles per hour.
We can improve this table by labeling the values and variables:
. label define scat 50 "40 to 50" 60 "55 to 60" 70 "Above 60"
. label values spdcat scat
. label variable spdcat "Speed Limit Category"
We define a value label called scat that attaches labels to the numbers 50, 60, and 70 using the
label define command; see [U] 12.6.3 Value labels. We label the value 50 as ‘40 to 50’, because
we looked back at our original tabulation in the first example and saw that the speed limit was never
less than 40. Similarly, we could have labeled the last category ‘65 to 70’ because the speed limit
is never greater than 70 miles per hour.
Next we requested that Stata label the values of the new variable spdcat using the value label
scat. Finally, we labeled our variable Speed Limit Category. We are now ready to tabulate the
result:
. tabulate spdcat
Speed Limit
Category Freq. Percent Cum.
40 to 50 11 28.21 28.21
55 to 60 26 66.67 94.87
Above 60 2 5.13 100.00
Total 39 100.00
Example 4
If we have missing values in our dataset, tabulate ignores them unless we explicitly indicate
otherwise. We have no missing data in our example, so let’s add some:
. replace spdcat=. in 39
(1 real change made, 1 to missing)
We changed the first observation on spdcat to missing. Let’s now tabulate the result:
. tabulate spdcat
Speed Limit
Category Freq. Percent Cum.
40 to 50 11 28.95 28.95
55 to 60 26 68.42 97.37
Above 60 1 2.63 100.00
Total 38 100.00
Comparing this output with that in the previous example, we see that the total frequency count is now
one less than it was: 38 rather than 39. Also, the ‘Above 60’ category now has only one observation
where it used to have two, so we evidently changed a road with a high speed limit.
We want tabulate to treat missing values just as it treats numbers, so we specify the missing
option:
. tabulate spdcat, missing
Speed Limit
Category Freq. Percent Cum.
40 to 50 11 28.21 28.21
55 to 60 26 66.67 94.87
Above 60 1 2.56 97.44
. 1 2.56 100.00
Total 39 100.00
We now see our missing value: the last category, labeled ‘.’, shows a frequency count of 1. The
table sum is once again 39.
Let’s put our dataset back as it was originally:
. replace spdcat=70 in 39
(1 real change made)
Technical note
tabulate also can automatically create indicator variables from categorical variables. We will
briefly review that capability here, but see [U] 25 Working with categorical data and factor variables
for a complete description. Let’s begin by describing our highway dataset:
. describe
Contains data from http://www.stata-press.com/data/r13/hiway.dta
obs: 39 Minnesota Highway Data, 1973
vars: 3 16 Nov 2012 12:39
size: 351
storage display value
variable name type format label variable label
spdlimit byte %8.0g Speed Limit
rate float %9.0g rcat Accident rate per million vehicle
miles
spdcat float %9.0g scat Speed Limit Category
Sorted by:
Note: dataset has changed since last saved
Our dataset contains three variables. We will type tabulate spdcat, generate(spd), describe
our data, and then explain what happened.
. tabulate spdcat, generate(spd)
Speed Limit
Category Freq. Percent Cum.
40 to 50 11 28.21 28.21
55 to 60 26 66.67 94.87
Above 60 2 5.13 100.00
Total 39 100.00
. describe
Contains data from http://www.stata-press.com/data/r13/hiway.dta
obs: 39 Minnesota Highway Data, 1973
vars: 6 16 Nov 2012 12:39
size: 468
storage display value
variable name type format label variable label
spdlimit byte %8.0g Speed Limit
rate float %9.0g rcat Accident rate per million vehicle
miles
spdcat float %9.0g scat Speed Limit Category
spd1 byte %8.0g spdcat==40 to 50
spd2 byte %8.0g spdcat==55 to 60
spd3 byte %8.0g spdcat==Above 60
Sorted by:
Note: dataset has changed since last saved
When we typed tabulate with the generate() option, Stata responded by producing a one-way
frequency table, so it appeared that the option did nothing. Yet when we describe our dataset, we
find that we now have six variables instead of the original three. The new variables are named spd1,
spd2, and spd3.
When we specify the generate() option, we are telling Stata to not only produce the table but
also create a set of indicator variables that correspond to that table. Stata adds a numeric suffix to
the name we specify in the parentheses. spd1 refers to the first line of the table, spd2 to the second
line, and so on. Also Stata labels the variables so that we know what they mean. spd1 is an indicator
variable that is true (takes on the value 1) when spdcat is between 40 and 50; otherwise, it is zero.
(There is an exception: if spdcat is missing, so are the spd1,spd2, and spd3 variables. This did
not happen in our dataset.)
We want to prove our claim. Because we have not yet introduced two-way tabulations, we will
use the summarize statement:
. summarize spdlimit if spd1==1
Variable Obs Mean Std. Dev. Min Max
spdlimit 11 47.72727 3.437758 40 50
. summarize spdlimit if spd2==1
Variable Obs Mean Std. Dev. Min Max
spdlimit 26 57.11538 2.519157 55 60
. summarize spdlimit if spd3==1
Variable Obs Mean Std. Dev. Min Max
spdlimit 2 67.5 3.535534 65 70
Notice the indicated minimum and maximum in each of the tables above. When we restrict the
sample to spd1,spdlimit is between 40 and 50; when we restrict the sample to spd2,spdlimit
is between 55 and 60; when we restrict the sample to spd3,spdlimit is between 65 and 70.
Thus tabulate provides an easy way to create indicator (sometimes called dummy) variables.
For an overview of indicator and categorical variables, see [U] 25 Working with categorical data
and factor variables.
tab1
tab1 is a convenience tool. Typing
. tab1 myvar thisvar thatvar, plot
is equivalent to typing
. tabulate myvar, plot
. tabulate thisvar, plot
. tabulate thatvar, plot
Video example
Tables and cross-tabulations in Stata
Stored results
tabulate and tab1 store the following in r():
Scalars
r(N)    number of observations
r(r)    number of rows
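For example, a minimal sketch of retrieving these results right after tabulating spdlimit from the examples above:

. quietly tabulate spdlimit
. display r(N) " observations in " r(r) " rows"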
References
Cox, N. J. 2009. Speaking Stata: I. J. Good and quasi-Bayes smoothing of categorical frequencies. Stata Journal 9:
306–314.
Harrison, D. A. 2006. Stata tip 34: Tabulation by listing. Stata Journal 6: 425–427.
Also see
[R] table — Flexible table of summary statistics
[R] tabstat — Compact table of summary statistics
[R] tabulate twoway — Two-way table of frequencies
[R] tabulate, summarize() — One- and two-way tables of summary statistics
[D] collapse — Make dataset of summary statistics
[ST] epitab — Tables for epidemiologists
[SVY] svy: tabulate oneway — One-way tables for survey data
[SVY] svy: tabulate twoway — Two-way tables for survey data
[XT] xttab — Tabulate xt data
[U] 12.6.3 Value labels
[U] 25 Working with categorical data and factor variables
Title
tabulate twoway — Two-way table of frequencies
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
Two-way table
tabulate varname1 varname2 [if] [in] [weight] [, options]
Two-way table for all possible combinations — a convenience tool
tab2 varlist [if] [in] [weight] [, options]
Immediate form of two-way tabulations
tabi #11 #12 [...] \ #21 #22 [...] [\ ...] [, options]
options Description
Main
chi2 report Pearson’s χ2
exact(#) report Fisher’s exact test
gamma report Goodman and Kruskal’s gamma
lrchi2 report likelihood-ratio χ2
taub report Kendall’s τb
V report Cramér’s V
cchi2 report Pearson’s χ2 in each cell
column report relative frequency within its column of each cell
row report relative frequency within its row of each cell
clrchi2 report likelihood-ratio χ2 in each cell
cell report the relative frequency of each cell
expected report expected frequency in each cell
nofreq do not display frequencies
missing treat missing values like other values
wrap do not wrap wide tables
[no]key report/suppress cell contents key
nolabel display numeric codes rather than value labels
nolog do not display enumeration log for Fisher’s exact test
firstonly show only tables that include the first variable in varlist
Advanced
matcell(matname) save frequencies in matname; programmer’s option
matrow(matname) save unique values of varname1 in matname; programmer’s option
matcol(matname) save unique values of varname2 in matname; programmer’s option
replace replace current data with given cell frequencies
all equivalent to specifying chi2 lrchi2 V gamma taub
firstonly is available only for tab2.
replace is available only for tabi.
by is allowed with tabulate and tab2; see [D] by.
fweights, aweights, and iweights are allowed by tabulate. fweights are allowed by tab2. See [U] 11.1.6 weight.
all does not appear in the dialog box.
Menu
tabulate
Statistics > Summaries, tables, and tests > Frequency tables > Two-way table with measures of association
tab2
Statistics > Summaries, tables, and tests > Frequency tables > All possible two-way tables
tabi
Statistics > Summaries, tables, and tests > Frequency tables > Table calculator
Description
tabulate produces a two-way table of frequency counts, along with various measures of association,
including the common Pearson’s χ2, the likelihood-ratio χ2, Cramér’s V, Fisher’s exact test, Goodman
and Kruskal’s gamma, and Kendall’s τb.
Line size is respected. That is, if you resize the Results window before running tabulate,
the resulting two-way tabulation will take advantage of the available horizontal space. Stata for
Unix(console) users can instead use the set linesize command to take advantage of this feature.
tab2 produces all possible two-way tabulations of the variables specified in varlist.
tabi displays the r × c table, using the values specified; rows are separated by ‘\’. If no options
are specified, it is as if exact were specified for a 2 × 2 table and chi2 were specified otherwise.
See [U] 19 Immediate commands for a general description of immediate commands. See Tables with
immediate data below for examples using tabi.
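For instance (the cell counts here are purely illustrative), a 2 × 2 table can be analyzed directly from its counts, with Fisher’s exact test reported by default:

. tabi 30 18 \ 38 14

Fuller examples appear under Tables with immediate data below.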
See [R] tabulate oneway if you want a one-way table of frequencies. See [R] table and [R] tabstat
if you want one-, two-, or n-way tables of frequencies and a wide variety of summary statistics. See
[R] tabulate, summarize() for a description of tabulate with the summarize() option; it produces a
table (breakdowns) of means and standard deviations. table is better than tabulate, summarize(),
but tabulate, summarize() is faster. See [ST] epitab for a 2 × 2 table with statistics of interest
to epidemiologists.
Options
 
Main
chi2 calculates and displays Pearson’s χ2 for the hypothesis that the rows and columns in a two-way
table are independent. chi2 may not be specified if aweights or iweights are specified.
exact(#) displays the significance calculated by Fisher’s exact test and may be applied to r × c as
well as to 2 × 2 tables. For 2 × 2 tables, both one- and two-sided probabilities are displayed. For
r × c tables, one-sided probabilities are displayed. The optional positive integer # is a multiplier on
the amount of memory that the command is permitted to consume. The default is 1. This option
should not be necessary for reasonable r × c tables. If the command terminates with error 910,
try exact(2). The maximum row or column dimension allowed when computing Fisher’s exact
test is the maximum row or column dimension for tabulate (see [R] limits).
gamma displays Goodman and Kruskal’s gamma along with its asymptotic standard error. gamma is
appropriate only when both variables are ordinal. gamma may not be specified if aweights or
iweights are specified.
lrchi2 displays the likelihood-ratio χ2 statistic. lrchi2 may not be specified if aweights or
iweights are specified.
taub displays Kendall’s τb along with its asymptotic standard error. taub is appropriate only when
both variables are ordinal. taub may not be specified if aweights or iweights are specified.
V (note capitalization) displays Cramér’s V. V may not be specified if aweights or iweights are
specified.
cchi2 displays each cell’s contribution to Pearson’s chi-squared in a two-way table.
column displays the relative frequency of each cell within its column in a two-way table.
row displays the relative frequency of each cell within its row in a two-way table.
clrchi2 displays each cell’s contribution to the likelihood-ratio chi-squared in a two-way table.
cell displays the relative frequency of each cell in a two-way table.
expected displays the expected frequency of each cell in a two-way table.
nofreq suppresses the printing of the frequencies.
missing requests that missing values be treated like other values in calculations of counts, percentages,
and other statistics.
wrap requests that Stata take no action on wide, two-way tables to make them readable. Unless wrap
is specified, wide tables are broken into pieces to enhance readability.
[no]key suppresses or forces the display of a key above two-way tables. The default is to display the
key if more than one cell statistic is requested, and otherwise to omit it. key forces the display
of the key. nokey suppresses its display.
nolabel causes the numeric codes to be displayed rather than the value labels.
nolog suppresses the display of the log for Fisher’s exact test. Using Fisher’s exact test requires
counting all tables that have a probability exceeding that of the observed table given the observed
row and column totals. The log counts down each stage of the network computations, starting from
the number of columns and counting down to 1, displaying the number of nodes in the network
at each stage. A log is not displayed for 2 ×2 tables.
firstonly, available only with tab2, restricts the output to only those tables that include the first
variable in varlist. Use this option to interact one variable with a set of others.
 
Advanced
matcell(matname) saves the reported frequencies in matname. This option is for use by programmers.
matrow(matname) saves the numeric values of the r × 1 row stub in matname. This option is for
use by programmers. matrow() may not be specified if the row variable is a string.
matcol(matname) saves the numeric values of the 1 × c column stub in matname. This option is
for use by programmers. matcol() may not be specified if the column variable is a string.
replace indicates that the immediate data specified as arguments to the tabi command be left as
the current data in place of whatever data were there.
The following option is available with tabulate but is not shown in the dialog box:
all is equivalent to specifying chi2 lrchi2 V gamma taub. Note the omission of exact. When
all is specified, no may be placed in front of the other options. all noV requests all association
measures except Cramér’s V (and Fisher’s exact). all exact requests all association measures,
including Fisher’s exact test. all may not be specified if aweights or iweights are specified.
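A brief sketch using the dose data from example 3 below:

. use http://www.stata-press.com/data/r13/dose, clear
. tabulate dose function, all noV

This reports all the association measures except Cramér’s V; adding exact would also report Fisher’s exact test.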
Limits
Two-way tables may have a maximum of 1,200 rows and 80 columns (Stata/MP and Stata/SE),
300 rows and 20 columns (Stata/IC), or 160 rows and 20 columns (Small Stata). If larger tables are
needed, see [R]table.
Remarks and examples
Remarks are presented under the following headings:
tabulate
Measures of association
N-way tables
Weighted data
Tables with immediate data
tab2
Video examples
For each value of a specified variable (or a set of values for a pair of variables), tabulate
reports the number of observations with that value. The number of times a value occurs is called its
frequency.
tabulate
Example 1
tabulate will make two-way tables if we specify two variables following the word tabulate.
In our highway dataset, we have a variable called rate that divides the accident rate into three
categories: below 4, 4–7, and above 7 per million vehicle miles. Let’s make a table of the speed
limit category and the accident-rate category:
. use http://www.stata-press.com/data/r13/hiway2
(Minnesota Highway Data, 1973)
. tabulate spdcat rate
Speed Accident rate per million
Limit vehicle miles
Category Below 4 4-7 Above 7 Total
40 to 50 3 5 3 11
55 to 60 19 6 1 26
Above 60 2 0 0 2
Total 24 11 4 39
The table indicates that three stretches of highway have an accident rate below 4 and a speed limit of
40 to 50 miles per hour. The table also shows the row and column sums (called the marginals). The
number of highways with a speed limit of 40 to 50 miles per hour is 11, which is the same result
we obtained in our previous one-way tabulations.
Stata can present this basic table in several ways (16, to be precise), and we will show just a
few below. It might be easier to read the table if we included the row percentages. For instance, of
11 highways in the lowest speed limit category, three are also in the lowest accident-rate category.
Three-elevenths amounts to some 27.3%. We can ask Stata to fill in this information for us by using
the row option:
. tabulate spdcat rate, row
Key
frequency
row percentage
Speed Accident rate per million
Limit vehicle miles
Category Below 4 4-7 Above 7 Total
40 to 50 3 5 3 11
27.27 45.45 27.27 100.00
55 to 60 19 6 1 26
73.08 23.08 3.85 100.00
Above 60 2 0 0 2
100.00 0.00 0.00 100.00
Total 24 11 4 39
61.54 28.21 10.26 100.00
The number listed below each frequency is the percentage of cases that each cell represents out of
its row. That is easy to remember because we see 100% listed in the “Total” column. The bottom
row is also informative. We see that 61.54% of all the highways in our dataset fall into the lowest
accident-rate category, that 28.21% are in the middle category, and that 10.26% are in the highest.
tabulate can calculate column percentages and cell percentages, as well. It does so when we
specify the column or cell options, respectively. We can even specify them together. Below is a
table that includes everything:
. tabulate spdcat rate, row column cell
Key
frequency
row percentage
column percentage
cell percentage
Speed Accident rate per million
Limit vehicle miles
Category Below 4 4-7 Above 7 Total
40 to 50 3 5 3 11
27.27 45.45 27.27 100.00
12.50 45.45 75.00 28.21
7.69 12.82 7.69 28.21
55 to 60 19 6 1 26
73.08 23.08 3.85 100.00
79.17 54.55 25.00 66.67
48.72 15.38 2.56 66.67
Above 60 2 0 0 2
100.00 0.00 0.00 100.00
8.33 0.00 0.00 5.13
5.13 0.00 0.00 5.13
Total 24 11 4 39
61.54 28.21 10.26 100.00
100.00 100.00 100.00 100.00
61.54 28.21 10.26 100.00
The number at the top of each cell is the frequency count. The second number is the row
percentage; they sum to 100% going across the table. The third number is the column percentage;
they sum to 100% going down the table. The bottom number is the cell percentage; they sum
to 100% going down all the columns and across all the rows. For instance, highways with a
speed limit above 60 miles per hour and in the lowest accident rate category account for 100% of
highways with a speed limit above 60 miles per hour; 8.33% of highways in the lowest accident-rate
category; and 5.13% of all our data.
A fourth option, nofreq, tells Stata not to print the frequency counts. To construct a table consisting
of only row percentages, we type
. tabulate spdcat rate, row nofreq
Speed Accident rate per million
Limit vehicle miles
Category Below 4 4-7 Above 7 Total
40 to 50 27.27 45.45 27.27 100.00
55 to 60 73.08 23.08 3.85 100.00
Above 60 100.00 0.00 0.00 100.00
Total 61.54 28.21 10.26 100.00
Measures of association
Example 2
tabulate will calculate the Pearson χ2 test for the independence of the rows and columns if we
specify the chi2 option. Suppose that we have 1980 census data on 956 cities in the United States
and wish to compare the age distribution across regions of the country. Assume that agecat is the
median age in each city and that region denotes the region of the country in which the city is
located.
. use http://www.stata-press.com/data/r13/citytemp2
(City Temperature Data)
. tabulate region agecat, chi2
Census agecat
Region 19-29 30-34 35+ Total
NE 46 83 37 166
N Cntrl 162 92 30 284
South 139 68 43 250
West 160 73 23 256
Total 507 316 133 956
Pearson chi2(6) = 61.2877 Pr = 0.000
We obtain the standard two-way table and, at the bottom, a summary of the χ2 test. Stata informs us
that the χ2 associated with this table has 6 degrees of freedom and is 61.29. The observed differences
are significant.
The table is, perhaps, easier to understand if we suppress the frequencies and print just the row
percentages:
. tabulate region agecat, row nofreq chi2
Census agecat
Region 19-29 30-34 35+ Total
NE 27.71 50.00 22.29 100.00
N Cntrl 57.04 32.39 10.56 100.00
South 55.60 27.20 17.20 100.00
West 62.50 28.52 8.98 100.00
Total 53.03 33.05 13.91 100.00
Pearson chi2(6) = 61.2877 Pr = 0.000
Example 3
We have data on dose level and outcome for a set of patients and wish to evaluate the association
between the two variables. We can obtain all the association measures by specifying the all and
exact options:
. use http://www.stata-press.com/data/r13/dose
. tabulate dose function, all exact
Enumerating sample-space combinations:
stage 3: enumerations = 1
stage 2: enumerations = 9
stage 1: enumerations = 0
Function
Dosage < 1 hr 1 to 4 4+ Total
1/day 20 10 2 32
2/day 16 12 4 32
3/day 10 16 6 32
Total 46 38 12 96
Pearson chi2(4) = 6.7780 Pr = 0.148
likelihood-ratio chi2(4) = 6.9844 Pr = 0.137
Cramér’s V = 0.1879
gamma = 0.3689 ASE = 0.129
Kendall’s tau-b = 0.2378 ASE = 0.086
Fisher’s exact = 0.145
We find evidence of association but not enough to be truly convincing.
If we had not also specified the exact option, we would not have obtained Fisher’s exact test.
Stata can calculate this statistic both for 2 × 2 tables and for r × c. For 2 × 2 tables, the calculation
is almost instant. On more general tables, however, the calculation can take longer.
We carefully constructed our example so that all would be meaningful. Kendall’s τb and Goodman
and Kruskal’s gamma are relevant only when both dimensions of the table can be ordered, say, from
low to high or from worst to best. The other statistics, however, are always applicable.
Technical note
Be careful when attempting to compute the p-value for Fisher’s exact test because the number of
tables that contribute to the p-value can be extremely large and a solution may not be feasible. The
errors that are indicative of this situation are errors 910, exceeded memory limitations, and 1401,
integer overflow due to large row-margin frequencies. If execution terminates because of memory
limitations, use exact(2) to permit the algorithm to consume twice the memory, exact(3) for three
times the memory, etc. The default memory usage should be sufficient for reasonable tables.
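For instance, if the default enumeration terminated because of memory limitations, one might retry with a larger multiple of the default workspace. The following is a sketch only (it reuses the dose data from example 3, which does not actually need the extra memory):
. tabulate dose function, exact(2)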
N-way tables
If you need more than two-way tables, your best alternative is to use table, not tabulate; see
[R] table.
The technical note below shows you how to use tabulate to create a sequence of two-way tables
that together form, in effect, a three-way table, but using table is easy and produces prettier results:
. use http://www.stata-press.com/data/r13/birthcat
(City data)
. table birthcat region agecat, c(freq)
agecat and Census Region
birthcat 19-29 30-34
NE N Cntrl South West NE N Cntrl South West
29-136 11 23 11 11 34 27 10 8
137-195 31 97 65 46 48 58 45 42
196-529 4 38 59 91 1 3 12 21
agecat and Census Region
birthcat 35+
NE N Cntrl South West
29-136 34 26 27 18
137-195 3 4 7 4
196-529 4
Technical note
We can make n-way tables by combining the by varlist: prefix with tabulate. Continuing with
the dataset of 956 cities, say that we want to make a table of age category by birth-rate category by
region of the country. The birth-rate category variable is named birthcat in our dataset. To make
separate tables for each age category, we would type
. by agecat, sort: tabulate birthcat region
-> agecat = 19-29
Census Region
birthcat NE N Cntrl South West Total
29-136 11 23 11 11 56
137-195 31 97 65 46 239
196-529 4 38 59 91 192
Total 46 158 135 148 487
-> agecat = 30-34
Census Region
birthcat NE N Cntrl South West Total
29-136 34 27 10 8 79
137-195 48 58 45 42 193
196-529 1 3 12 21 37
Total 83 88 67 71 309
-> agecat = 35+
Census Region
birthcat NE N Cntrl South West Total
29-136 34 26 27 18 105
137-195 3 4 7 4 18
196-529 0 0 4 0 4
Total 37 30 38 22 127
Weighted data
Example 4
tabulate can process weighted as well as unweighted data. As with all Stata commands, we
indicate the weight by specifying the [weight] modifier; see [U] 11.1.6 weight.
Continuing with our dataset of 956 cities, we also have a variable called pop, the population of
each city. We can make a table of region by age category, weighted by population, by typing
. tabulate region agecat [freq=pop]
Census agecat
Region 19-29 30-34 35+ Total
NE 4,721,387 10,421,387 5,323,610 20,466,384
N Cntrl 16,901,550 8,964,756 4,015,593 29,881,899
South 13,894,254 7,686,531 4,141,863 25,722,648
West 16,698,276 7,755,255 2,375,118 26,828,649
Total 52,215,467 34,827,929 15,856,184 102899580
If we specify the cell, column, or row options, they will also be appropriately weighted. Below we
repeat the table, suppressing the counts and substituting row percentages:
. tabulate region agecat [freq=pop], nofreq row
Census agecat
Region 19-29 30-34 35+ Total
NE 23.07 50.92 26.01 100.00
N Cntrl 56.56 30.00 13.44 100.00
South 54.02 29.88 16.10 100.00
West 62.24 28.91 8.85 100.00
Total 50.74 33.85 15.41 100.00
Tables with immediate data
Example 5
tabi ignores the dataset in memory and uses as the table the values that we specify on the
command line:
. tabi 30 18 \ 38 14
col
row 1 2 Total
1 30 18 48
2 38 14 52
Total 68 32 100
Fisher’s exact = 0.289
1-sided Fisher’s exact = 0.179
We may specify any of the options of tabulate and are not limited to 2 × 2 tables:
. tabi 30 18 38 \ 13 7 22, chi2 exact
Enumerating sample-space combinations:
stage 3: enumerations = 1
stage 2: enumerations = 3
stage 1: enumerations = 0
col
row 1 2 3 Total
1 30 18 38 86
2 13 7 22 42
Total 43 25 60 128
Pearson chi2(2) = 0.7967 Pr = 0.671
Fisher’s exact = 0.707
. tabi 30 13 \ 18 7 \ 38 22, all exact col
Key
frequency
column percentage
Enumerating sample-space combinations:
stage 3: enumerations = 1
stage 2: enumerations = 3
stage 1: enumerations = 0
col
row 1 2 Total
1 30 13 43
34.88 30.95 33.59
2 18 7 25
20.93 16.67 19.53
3 38 22 60
44.19 52.38 46.88
Total 86 42 128
100.00 100.00 100.00
Pearson chi2(2) = 0.7967 Pr = 0.671
likelihood-ratio chi2(2) = 0.7985 Pr = 0.671
Cramér’s V = 0.0789
gamma = 0.1204 ASE = 0.160
Kendall’s tau-b = 0.0630 ASE = 0.084
Fisher’s exact = 0.707
For 2 × 2 tables, both one- and two-sided Fisher’s exact probabilities are displayed; this is true of
both tabulate and tabi. See Cumulative incidence data and Case-control data in [ST] epitab for
more discussion on the relationship between one- and two-sided probabilities.
Technical note
tabi, as with all immediate commands, leaves any data in memory undisturbed. With the replace
option, however, the data in memory are replaced by the data from the table:
. tabi 30 18 \ 38 14, replace
col
row 1 2 Total
1 30 18 48
2 38 14 52
Total 68 32 100
Fisher’s exact = 0.289
1-sided Fisher’s exact = 0.179
. list
row col pop
1. 1 1 30
2. 1 2 18
3. 2 1 38
4. 2 2 14
With this dataset, you could re-create the above table by typing
. tabulate row col [freq=pop], exact
col
row 1 2 Total
1 30 18 48
2 38 14 52
Total 68 32 100
Fisher’s exact = 0.289
1-sided Fisher’s exact = 0.179
tab2
tab2 is a convenience tool. Typing
. tab2 myvar thisvar thatvar, chi2
is equivalent to typing
. tabulate myvar thisvar, chi2
. tabulate myvar thatvar, chi2
. tabulate thisvar thatvar, chi2
Video examples
Pearson’s chi2 and Fisher’s exact test in Stata
Tables and cross-tabulations in Stata
Immediate commands in Stata: Cross-tabulations and chi-squared tests from summary data
Stored results
tabulate, tab2, and tabi store the following in r():
Scalars
  r(N)          number of observations
  r(r)          number of rows
  r(c)          number of columns
  r(chi2)       Pearson’s χ²
  r(p)          significance of Pearson’s χ²
  r(gamma)      gamma
  r(p1_exact)   one-sided Fisher’s exact p
  r(p_exact)    Fisher’s exact p
  r(chi2_lr)    likelihood-ratio χ²
  r(p_lr)       significance of likelihood-ratio χ²
  r(CramersV)   Cramér’s V
  r(ase_gam)    ASE of gamma
  r(ase_taub)   ASE of τ_b
  r(taub)       τ_b
r(p1_exact) is defined only for 2 × 2 tables. Also, the matrow(), matcol(), and matcell() options allow you to
obtain the row values, column values, and frequencies, respectively.
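For instance, after the cross-tabulation in example 2 above, the stored results and the cell-frequency matrix might be retrieved as follows (a sketch added here for illustration; the matrix name F is arbitrary):
. quietly tabulate region agecat, chi2 matcell(F)
. display r(N) " observations, " r(r) " rows, " r(c) " columns"
. display "Pearson chi2 = " r(chi2) ", p = " r(p)
. matrix list F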
Methods and formulas
Let $n_{ij}$, $i = 1, \dots, I$ and $j = 1, \dots, J$, be the number of observations in the $i$th row and $j$th
column. If the data are not weighted, $n_{ij}$ is just a count. If the data are weighted, $n_{ij}$ is the sum of
the weights of all data corresponding to the $(i, j)$ cell.
Define the row and column marginals as
\[
n_{i\cdot} = \sum_{j=1}^{J} n_{ij} \qquad\qquad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}
\]
and let $n = \sum_i \sum_j n_{ij}$ be the overall sum. Also, define the concordance and discordance as
\[
A_{ij} = \sum_{k>i}\sum_{l>j} n_{kl} + \sum_{k<i}\sum_{l<j} n_{kl}
\qquad\qquad
D_{ij} = \sum_{k>i}\sum_{l<j} n_{kl} + \sum_{k<i}\sum_{l>j} n_{kl}
\]
along with twice the number of concordances $P = \sum_i \sum_j n_{ij} A_{ij}$ and twice the number of discordances
$Q = \sum_i \sum_j n_{ij} D_{ij}$.
The Pearson $\chi^2$ statistic with $(I-1)(J-1)$ degrees of freedom (so called because it is based
on Pearson (1900); see Conover [1999, 240] and Fienberg [1980, 9]) is defined as
\[
X^2 = \sum_i \sum_j \frac{(n_{ij} - m_{ij})^2}{m_{ij}}
\]
where $m_{ij} = n_{i\cdot} n_{\cdot j} / n$.
The likelihood-ratio $\chi^2$ statistic with $(I-1)(J-1)$ degrees of freedom (Fienberg 1980, 40) is
defined as
\[
G^2 = 2 \sum_i \sum_j n_{ij} \ln(n_{ij}/m_{ij})
\]
Cramér’s $V$ (Cramér 1946) is a measure of association designed so that the attainable upper bound
is 1. For 2 × 2 tables, $-1 \le V \le 1$, and otherwise, $0 \le V \le 1$.
\[
V = \begin{cases}
(n_{11}n_{22} - n_{12}n_{21})\,/\,(n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2})^{1/2} & \text{for } 2 \times 2 \\[4pt]
\bigl\{ (X^2/n)\,/\,\min(I-1,\, J-1) \bigr\}^{1/2} & \text{otherwise}
\end{cases}
\]
Gamma (Goodman and Kruskal 1954, 1959, 1963, 1972; also see Agresti [2010, 186–188])
ignores tied pairs and is based only on the number of concordant and discordant pairs of observations,
$-1 \le \gamma \le 1$,
\[
\gamma = (P - Q)/(P + Q)
\]
with asymptotic variance
\[
16 \sum_i \sum_j n_{ij} \,(Q A_{ij} - P D_{ij})^2 \,/\, (P + Q)^4
\]
Kendall’s $\tau_b$ (Kendall 1945; also see Agresti 2010, 188–189), $-1 \le \tau_b \le 1$, is similar to gamma,
except that it uses a correction for ties,
\[
\tau_b = (P - Q)/(w_r w_c)^{1/2}
\]
with asymptotic variance
\[
\frac{\sum_i \sum_j n_{ij} (2 w_r w_c d_{ij} + \tau_b v_{ij})^2 - n^3 \tau_b^2 (w_r + w_c)^2}{(w_r w_c)^4}
\]
where
\[
w_r = n^2 - \sum_i n_{i\cdot}^2 \qquad
w_c = n^2 - \sum_j n_{\cdot j}^2 \qquad
d_{ij} = A_{ij} - D_{ij} \qquad
v_{ij} = n_{i\cdot} w_c + n_{\cdot j} w_r
\]
Fisher’s exact test (Fisher 1935; Finney 1948; see Zelterman and Louis [1992, 293–301] for
the 2 × 2 case) yields the probability of observing a table that gives at least as much evidence of
association as the one actually observed under the assumption of no association. Holding row and
column marginals fixed, the hypergeometric probability of every possible table is computed, and the
p-value is
\[
P = \sum_{T \in A} \Pr(T)
\]
where $A$ is the set of all tables with the same marginals as the observed table, $T_0$, such that
$\Pr(T) \le \Pr(T_0)$. For 2 × 2 tables, the one-sided probability is calculated by further restricting $A$ to
tables in the same tail as $T_0$. The first algorithm extending this calculation to r × c tables was Pagano
and Halvorsen (1981); the one implemented here is the FEXACT algorithm by Mehta and Patel (1986).
This is a search-tree clipping method originally published by Mehta and Patel (1983) with further
refinements by Joe (1988) and Clarkson, Fan, and Joe (1993). Fisher’s exact test is a permutation
test. For more information on permutation tests, see Good (2005 and 2006) and Pesarin (2001).
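As an illustration of the Pearson formula (not part of the original entry), the statistic can be recomputed by hand from the counts saved by matcell(); with the citytemp2 data of example 2, the result should reproduce the 61.2877 reported there. The matrix name n is arbitrary.
. quietly tabulate region agecat, matcell(n)
. mata:
:     n  = st_matrix("n")               // observed counts n_ij
:     m  = rowsum(n)*colsum(n)/sum(n)   // expected counts m_ij = n_i. * n_.j / n
:     X2 = sum((n :- m):^2 :/ m)        // Pearson X^2
:     X2
: end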
References
Agresti, A. 2010. Analysis of Ordinal Categorical Data. 2nd ed. Hoboken, NJ: Wiley.
Campbell, M. J., D. Machin, and S. J. Walters. 2007. Medical Statistics: A Textbook for the Health Sciences. 4th
ed. Chichester, UK: Wiley.
Clarkson, D. B., Y.-A. Fan, and H. Joe. 1993. A remark on Algorithm 643: FEXACT: An algorithm for performing
Fisher’s exact test in r × c contingency tables. ACM Transactions on Mathematical Software 19: 484–488.
Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.
Cox, N. J. 1996. sg57: An immediate command for two-way tables. Stata Technical Bulletin 33: 7–9. Reprinted in
Stata Technical Bulletin Reprints, vol. 6, pp. 140–143. College Station, TX: Stata Press.
. 1999. sg113: Tabulation of modes. Stata Technical Bulletin 50: 26–27. Reprinted in Stata Technical Bulletin
Reprints, vol. 9, pp. 180–181. College Station, TX: Stata Press.
. 2003. sg113_1: Software update: Tabulation of modes. Stata Journal 3: 211.
. 2009. Speaking Stata: I. J. Good and quasi-Bayes smoothing of categorical frequencies. Stata Journal 9: 306–314.
Cramér, H. 1946. Mathematical Methods of Statistics. Princeton: Princeton University Press.
Fienberg, S. E. 1980. The Analysis of Cross-Classified Categorical Data. 2nd ed. Cambridge, MA: MIT Press.
Finney, D. J. 1948. The Fisher–Yates test of significance in 2 × 2 contingency tables. Biometrika 35: 145–156.
Fisher, R. A. 1935. The logic of inductive inference. Journal of the Royal Statistical Society 98: 39–82.
Good, P. I. 2005. Permutation, Parametric, and Bootstrap Tests of Hypotheses: A Practical Guide to Resampling
Methods for Testing Hypotheses. 3rd ed. New York: Springer.
. 2006. Resampling Methods: A Practical Guide to Data Analysis. 3rd ed. Boston: Birkhäuser.
Goodman, L. A., and W. H. Kruskal. 1954. Measures of association for cross classifications. Journal of the American
Statistical Association 49: 732–764.
. 1959. Measures of association for cross classifications II: Further discussion and references. Journal of the
American Statistical Association 54: 123–163.
. 1963. Measures of association for cross classifications III: Approximate sampling theory. Journal of the American
Statistical Association 58: 310–364.
. 1972. Measures of association for cross classifications IV: Simplification of asymptotic variances. Journal of
the American Statistical Association 67: 415–421.
Harrison, D. A. 2006. Stata tip 34: Tabulation by listing. Stata Journal 6: 425–427.
Jann, B. 2008. Multinomial goodness-of-fit: Large-sample tests with survey design correction and exact tests for small
samples. Stata Journal 8: 147–169.
Joe, H. 1988. Extreme probabilities for contingency tables under row and column independence with application to
Fisher’s exact test. Communications in Statistics—Theory and Methods 17: 3677–3685.
Judson, D. H. 1992. sg12: Extended tabulate utilities. Stata Technical Bulletin 10: 22–23. Reprinted in Stata Technical
Bulletin Reprints, vol. 2, pp. 140–141. College Station, TX: Stata Press.
Kendall, M. G. 1945. The treatment of ties in rank problems. Biometrika 33: 239–251.
Longest, K. C. 2012. Using Stata for Quantitative Analysis. Thousand Oaks, CA: Sage.
Mehta, C. R., and N. R. Patel. 1983. A network algorithm for performing Fisher’s exact test in r × c contingency
tables. Journal of the American Statistical Association 78: 427–434.
. 1986. Algorithm 643 FEXACT: A FORTRAN subroutine for Fisher’s exact test on unordered r × c contingency
tables. ACM Transactions on Mathematical Software 12: 154–161.
Newson, R. B. 2002. Parameters behind “nonparametric” statistics: Kendall’s tau, Somers’ D and median differences.
Stata Journal 2: 45–64.
Pagano, M., and K. T. Halvorsen. 1981. An algorithm for finding the exact significance levels of r × c contingency
tables. Journal of the American Statistical Association 76: 931–934.
Pearson, K. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated
system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical
Magazine, Series 5 50: 157–175.
Pesarin, F. 2001. Multivariate Permutation Tests: With Applications in Biostatistics. Chichester, UK: Wiley.
Weesie, J. 2001. dm91: Patterns of missing values. Stata Technical Bulletin 61: 5–7. Reprinted in Stata Technical
Bulletin Reprints, vol. 10, pp. 49–51. College Station, TX: Stata Press.
Wolfe, R. 1999. sg118: Partitions of Pearson’s χ² for analyzing two-way tables that have ordered columns. Stata
Technical Bulletin 51: 37–40. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 203–207. College Station,
TX: Stata Press.
Zelterman, D., and T. A. Louis. 1992. Contingency tables in medical studies. In Medical Uses of Statistics, 2nd ed,
ed. J. C. Bailar III and C. F. Mosteller, 293–310. Boston: Dekker.
Also see
[R] table — Flexible table of summary statistics
[R] tabstat — Compact table of summary statistics
[R] tabulate oneway — One-way table of frequencies
[R] tabulate, summarize() — One- and two-way tables of summary statistics
[D] collapse — Make dataset of summary statistics
[ST] epitab — Tables for epidemiologists
[SVY] svy: tabulate oneway — One-way tables for survey data
[SVY] svy: tabulate twoway — Two-way tables for survey data
[XT] xttab — Tabulate xt data
[U] 12.6.3 Value labels
[U] 25 Working with categorical data and factor variables
Title
tabulate, summarize() — One- and two-way tables of summary statistics
Syntax Menu Description Options
Remarks and examples Also see
Syntax
tabulate varname1 [varname2] [if] [in] [weight] [, options]
options                Description
Main
  summarize(varname3)  report summary statistics for varname3
  [no]means            include or suppress means
  [no]standard         include or suppress standard deviations
  [no]freq             include or suppress frequencies
  [no]obs              include or suppress number of observations
  nolabel              show numeric codes, not labels
  wrap                 do not break wide tables
  missing              treat missing values of varname1 and varname2 as categories
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Other tables >Table of means, std. dev., and frequencies
Description
tabulate, summarize() produces one- and two-way tables (breakdowns) of means and standard
deviations. See [R] tabulate oneway and [R] tabulate twoway for one- and two-way frequency tables.
See [R] table for a more flexible command that produces one-, two-, and n-way tables of frequencies
and a wide variety of summary statistics. table is better, but tabulate, summarize() is faster.
Also see [R] tabstat for yet another alternative.
Options
 
Main
summarize(varname3) identifies the name of the variable for which summary statistics are to be
reported. If you do not specify this option, a table of frequencies is produced; see [R] tabulate
oneway and [R] tabulate twoway. The description here concerns tabulate when this option is
specified.
[no]means includes or suppresses only the means from the table.
The summarize() table normally includes the mean, standard deviation, frequency, and, if the
data are weighted, number of observations. Individual elements of the table may be included or
suppressed by the [no]means, [no]standard, [no]freq, and [no]obs options. For example, typing
. tabulate category, summarize(myvar) means standard
produces a summary table by category containing only the means and standard deviations of
myvar. You could also achieve the same result by typing
. tabulate category, summarize(myvar) nofreq
[no]standard includes or suppresses only the standard deviations from the table; see [no]means
option above.
[no]freq includes or suppresses only the frequencies from the table; see [no]means option above.
[no]obs includes or suppresses only the reported number of observations from the table. If the data
are not weighted, the number of observations is identical to the frequency, and by default only the
frequency is reported. If the data are weighted, the frequency refers to the sum of the weights.
See [no]means option above.
nolabel causes the numeric codes to be displayed rather than the label values.
wrap requests that no action be taken on wide tables to make them readable. Unless wrap is specified,
wide tables are broken into pieces to enhance readability.
missing requests that missing values of varname1 and varname2 be treated as categories rather than
as observations to be omitted from the analysis.
Remarks and examples
tabulate with the summarize() option produces one- and two-way tables of summary statistics.
When combined with the by prefix, it can produce n-way tables as well.
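For instance, using the automobile data introduced in example 1 below, an n-way breakdown of mileage by repair record within car type might be requested as follows (a sketch only; output omitted):
. use http://www.stata-press.com/data/r13/auto
. by foreign, sort: tabulate rep78, summarize(mpg)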
Remarks are presented under the following headings:
One-way tables
Two-way tables
One-way tables
Example 1
We have data on 74 automobiles. Included in our dataset are the variables foreign, which marks
domestic and foreign cars, and mpg, the car’s mileage rating. Typing tabulate foreign displays a
breakdown of the number of observations we have by the values of the foreign variable.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. tabulate foreign
Car type Freq. Percent Cum.
Domestic 52 70.27 70.27
Foreign 22 29.73 100.00
Total 74 100.00
We discover that we have 52 domestic cars and 22 foreign cars in our dataset. If we add the
summarize(varname) option, however, tabulate produces a table of summary statistics for varname:
. tabulate foreign, summarize(mpg)
Summary of Mileage (mpg)
Car type Mean Std. Dev. Freq.
Domestic 19.826923 4.7432972 52
Foreign 24.772727 6.6111869 22
Total 21.297297 5.7855032 74
We also discover that the average gas mileage for domestic cars is about 20 mpg and the average
for foreign cars is almost 25 mpg. Overall, the average is 21 mpg in our dataset.
Technical note
We might now wonder if the difference in gas mileage between foreign and domestic cars is
statistically significant. We can use the oneway command to find out; see [R] oneway. To obtain an
analysis-of-variance table of mpg on foreign, we type
. oneway mpg foreign
Analysis of Variance
Source SS df MS F Prob > F
Between groups 378.153515 1 378.153515 13.18 0.0005
Within groups 2065.30594 72 28.6848048
Total 2443.45946 73 33.4720474
Bartlett’s test for equal variances: chi2(1) = 3.4818 Prob>chi2 = 0.062
The F statistic is 13.18, and the difference between foreign and domestic cars’ mileage ratings is
significant at the 0.05% level.
There are several ways that we could have statistically compared mileage ratings (see, for instance,
[R] anova, [R] oneway, [R] regress, and [R] ttest), but oneway seemed the most convenient.
Two-way tables
Example 2
tabulate, summarize can be used to obtain two-way as well as one-way breakdowns. For
instance, we obtained summary statistics on mpg decomposed by foreign by typing tabulate
foreign, summarize(mpg). We can specify up to two variables before the comma:
. generate wgtcat = autocode(weight,4,1760,4840)
. tabulate wgtcat foreign, summarize(mpg)
Means, Standard Deviations and Frequencies of Mileage (mpg)
Car type
wgtcat Domestic Foreign Total
2530 28.285714 27.0625 27.434783
3.0937725 5.9829619 5.2295149
7 16 23
3300 21.75 19.6 21.238095
2.4083189 3.4351128 2.7550819
16 5 21
4070 17.26087 14 17.125
1.8639497 0 1.9406969
23 1 24
4840 14.666667 . 14.666667
3.32666 . 3.32666
6 0 6
Total 19.826923 24.772727 21.297297
4.7432972 6.6111869 5.7855032
52 22 74
In addition to the means, standard deviations, and frequencies for each weight-mileage cell, also
reported are the summary statistics by weight, by mileage, and overall. For instance, the last row
of the table reveals that the average mileage of domestic cars is 19.83 and that of foreign cars is
24.77; domestic cars yield poorer mileage than foreign cars. But we now see that domestic cars
yield better gas mileage within weight class; the reason domestic cars yield poorer gas mileage
overall is that they are, on average, heavier.
Example 3
If we do not specify the statistics to be included in a table, tabulate reports the mean, standard
deviation, and frequency. We can specify the statistics that we want to see using the means, standard,
and freq options:
. tabulate wgtcat foreign, summarize(mpg) means
Means of Mileage (mpg)
Car type
wgtcat Domestic Foreign Total
2530 28.285714 27.0625 27.434783
3300 21.75 19.6 21.238095
4070 17.26087 14 17.125
4840 14.666667 . 14.666667
Total 19.826923 24.772727 21.297297
When we specify one or more of the means, standard, and freq options, only those statistics
are displayed. Thus we could obtain a table containing just the means and standard deviations by
typing means standard after the summarize(mpg) option. We can also suppress selected statistics
by placing no in front of the option name. Another way of obtaining only the means and standard
deviations is to add the nofreq option:
. tabulate wgtcat foreign, summarize(mpg) nofreq
Means and Standard Deviations of Mileage (mpg)
Car type
wgtcat Domestic Foreign Total
2530 28.285714 27.0625 27.434783
3.0937725 5.9829619 5.2295149
3300 21.75 19.6 21.238095
2.4083189 3.4351128 2.7550819
4070 17.26087 14 17.125
1.8639497 0 1.9406969
4840 14.666667 . 14.666667
3.32666 . 3.32666
Total 19.826923 24.772727 21.297297
4.7432972 6.6111869 5.7855032
Also see
[R] table — Flexible table of summary statistics
[R] tabstat — Compact table of summary statistics
[R] tabulate oneway — One-way table of frequencies
[R] tabulate twoway — Two-way table of frequencies
[D] collapse — Make dataset of summary statistics
[SVY] svy: tabulate oneway — One-way tables for survey data
[SVY] svy: tabulate twoway — Two-way tables for survey data
[U] 12.6 Dataset, variable, and value labels
[U] 25 Working with categorical data and factor variables
Title
test — Test linear hypotheses after estimation
Syntax Menu Description Options for testparm
Options for test Remarks and examples Stored results Methods and formulas
Acknowledgment References Also see
Syntax
Basic syntax
test coeflist (Syntax 1)
test exp=exp=. . . (Syntax 2)
test [eqno]: coeflist (Syntax 3)
test [eqno=eqno=. . .]: coeflist (Syntax 4)
testparm varlist [, equal equation(eqno)]
Full syntax
test (spec) [(spec) . . .] [, test_options]
test options Description
Options
mtest(opt) test each condition separately
coef report estimated constrained coefficients
accumulate test hypothesis jointly with previously tested hypotheses
notest suppress the output
common test only variables common to all the equations
constant include the constant in coefficients to be tested
nosvyadjust compute unadjusted Wald tests for survey results
minimum perform test with the constant, drop terms until the test
becomes nonsingular, and test without the constant on the
remaining terms; highly technical
matvlc(matname) save the variance-covariance matrix; programmer’s option
coeflist and varlist may contain factor variables and time-series operators; see [U] 11.4.3 Factor variables and
[U] 11.4.4 Time-series varlists.
matvlc(matname) does not appear in the dialog box.
Syntax 1 tests that coefficients are 0.
Syntax 2 tests that linear expressions are equal.
Syntax 3 tests that coefficients in eqno are 0.
Syntax 4 tests equality of coefficients between equations.
spec is one of
coeflist
exp=exp=exp
[eqno]: coeflist
[eqno1=eqno2=. . .]: coeflist
coeflist is
coef coef . . .
[eqno]coef [eqno]coef . . .
[eqno]_b[coef] [eqno]_b[coef] . . .
exp is a linear expression containing
coef
_b[coef]
_b[eqno:coef]
[eqno]coef
[eqno]_b[coef]
eqno is
##
name
coef identifies a coefficient in the model. coef is typically a variable name, a level indicator, an
interaction indicator, or an interaction involving continuous variables. Level indicators identify one
level of a factor variable and interaction indicators identify one combination of levels of an interaction;
see [U] 11.4.3 Factor variables. coef may contain time-series operators; see [U] 11.4.4 Time-series
varlists.
Distinguish between the square brackets that are to be typed, as in [eqno], and the brackets in the syntax diagrams, which merely indicate optional arguments.
Although not shown in the syntax diagram, parentheses around spec are required only with multiple
specifications. Also, the diagram does not show that test may be called without arguments to
redisplay the results from the last test.
anova and manova (see [R] anova and [MV] manova) allow the test syntax above plus more
(see [R] anova postestimation for test after anova; see [MV] manova postestimation for test
after manova).
Menu
test
Statistics >Postestimation >Tests >Test linear hypotheses
testparm
Statistics >Postestimation >Tests >Test parameters
Description
test performs Wald tests of simple and composite linear hypotheses about the parameters of the
most recently fit model.
test supports svy estimators (see [SVY] svy estimation), carrying out an adjusted Wald test by
default in such cases. test can be used with svy estimation results; see [SVY] svy postestimation.
testparm provides a useful alternative to test that permits varlist rather than a list of coefficients
(which is often nothing more than a list of variables), allowing the use of standard Stata notation,
including ‘-’ and ‘*’, which are given the expression interpretation by test.
test and testparm perform Wald tests. For likelihood-ratio tests, see [R] lrtest. For Wald-type
tests of nonlinear hypotheses, see [R] testnl. To display estimates for one-dimensional linear or
nonlinear expressions of coefficients, see [R] lincom and [R] nlcom.
See [R] anova postestimation for additional test syntax allowed after anova.
See [MV] manova postestimation for additional test syntax allowed after manova.
Options for testparm
equal tests that the variables appearing in varlist, which also appear in the previously fit model, are
equal to each other rather than jointly equal to zero.
equation(eqno) is relevant only for multiple-equation models, such as mvreg, mlogit, and heckman.
It specifies the equation for which the all-zero or all-equal hypothesis is tested. equation(#1)
specifies that the test be conducted regarding the first equation, #1. equation(price) specifies
that the test concern the equation named price.
Options for test
 
Options
mtest(opt)specifies that tests be performed for each condition separately. opt specifies the method
for adjusting p-values for multiple testing. Valid values for opt are
bonferroni Bonferroni’s method
holm Holm’s method
sidak Šidák’s method
noadjust no adjustment is to be made
Specifying mtest without an argument is equivalent to mtest(noadjust).
coef specifies that the constrained coefficients be displayed.
accumulate allows a hypothesis to be tested jointly with the previously tested hypotheses.
notest suppresses the output. This option is useful when you are interested only in the joint test of
several hypotheses, specified in a subsequent call of test, accumulate.
common specifies that when you use the [eqno1=eqno2=. . .] form of spec, the variables common
to the equations eqno1, eqno2, etc., be tested. The default action is to complain if the equations
have variables not in common.
constant specifies that _cons be included in the list of coefficients to be tested when using the
[eqno1=eqno2=. . .] or [eqno] forms of spec. The default is not to include _cons.
nosvyadjust is for use with svy estimation commands; see [SVY] svy estimation. It specifies that
the Wald test be carried out without the default adjustment for the design degrees of freedom. That
is, the test is carried out as W/k ∼ F(k, d) rather than as (d − k + 1)W/(kd) ∼ F(k, d − k + 1),
where k = the dimension of the test and d = the total number of sampled PSUs minus the total
number of strata.
minimum is a highly technical option. It first performs the test with the constant added. If this test
is singular, coefficients are dropped until the test becomes nonsingular. Then the test without the
constant is performed with the remaining terms.
The following option is available with test but is not shown in the dialog box:
matvlc(matname), a programmer’s option, saves the variance-covariance matrix of the linear
combinations involved in the suite of tests. For the test of the linear constraints Lb = c, matname
contains LVL', where V is the estimated variance-covariance matrix of b.
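As a sketch of this programmer’s option (assuming a model with the region indicators used in the examples below has already been fit; the matrix name Vlc is arbitrary):
. test (2.region=0) (3.region=0), matvlc(Vlc)
. matrix list Vlc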
Remarks and examples
Remarks are presented under the following headings:
Introductory examples
Special syntaxes after multiple-equation estimation
Constrained coefficients
Multiple testing
Introductory examples
test performs F or χ² tests of linear restrictions applied to the most recently fit model (for
example, regress or svy: regress in the linear regression case; logit, stcox, svy: logit, . . .
in the single-equation maximum-likelihood case; and mlogit, mvreg, streg, . . . in the multiple-
equation maximum-likelihood case). test may be used after any estimation command, although for
maximum likelihood techniques, test produces a Wald test that depends only on the estimate of the
covariance matrix; you may prefer to use the more computationally expensive likelihood-ratio test;
see [U] 20 Estimation and postestimation commands and [R] lrtest.
There are several variations on the syntax for test. The second syntax,
test exp=exp=. . .
is allowed after any form of estimation. After fitting a model of depvar on x1,x2, and x3, typing
test x1+x2=x3 tests the restriction that the coefficients on x1 and x2 sum to the coefficient on x3.
The expressions can be arbitrarily complicated; for instance, typing test x1+2*(x2+x3)=x2+3*x3
is the same as typing test x1+x2=x3.
As a convenient shorthand, test also allows you to specify equality for multiple expressions; for
example, test x1+x2 = x3+x4 = x5+x6 tests that the three specified pairwise sums of coefficients
are equal.
test understands that when you type x1, you are referring to the coefficient on x1.
You could also more explicitly type test _b[x1]+_b[x2]=_b[x3]; or you could test
_coef[x1]+_coef[x2]=_coef[x3], or test [#1]x1+[#1]x2=[#1]x3, or many other things be-
cause there is more than one way to refer to an estimated coefficient; see [U] 13.5 Accessing coefficients
and standard errors. The shorthand involves less typing. On the other hand, you must be more explicit
after estimation of multiple-equation models because there may be more than one coefficient associated
with an independent variable. You might type, for instance, test [#2]x1+[#2]x2=[#2]x3 to test
the constraint in equation 2 or, more readably, test [ford]x1+[ford]x2=[ford]x3, meaning that
Stata will test the constraint on the equation corresponding to ford, which might be equation 2. ford
would be an equation name after, say, sureg, or, after mlogit, ford would be one of the outcomes.
For mlogit, you could also type test [2]x1+[2]x2=[2]x3 (note the lack of the #), meaning not
equation 2, but the equation corresponding to the numeric outcome 2. You can even test constraints
across equations: test [ford]x1+[ford]x2=[buick]x3.
The syntax
test coeflist
is available after all estimation commands and is a convenient way to test that multiple coefficients
are zero following estimation. A coeflist can simply be a list of variable names,
test varname varname . . .
and it is most often specified that way. After you have fit a model of depvar on x1,x2, and x3,
typing test x1 x3 tests that the coefficients on x1 and x3 are jointly zero. After multiple-equation
estimation, this would test that the coefficients on x1 and x3 are zero in all equations that contain
them. You can also be more explicit and type, for instance, test [ford]x1 [ford]x3 to test that
the coefficients on x1 and x3 are zero in the equation for ford.
In the multiple-equation case, there are more alternatives. You could also test that the coefficients
on x1 and x3 are zero in the equation for ford by typing test [ford]: x1 x3. You could test that
all coefficients except the coefficient on the constant are zero in the equation for ford by typing test
[ford]. You could test that the coefficients on x1 and x3 in the equation for ford are equal to the
corresponding coefficients in the equation corresponding to buick by typing test [ford=buick]:
x1 x3. You could test that all the corresponding coefficients except the constant in three equations
are equal by typing test [ford=buick=volvo].
testparm is much like the first syntax of test. Its usefulness will be demonstrated below.
The examples below use regress, but what is said applies equally after any single-equation
estimation command (such as logistic). It also applies after multiple-equation estimation commands
as long as references to coefficients are qualified with an equation name or number in square brackets
placed before them. The convenient syntaxes for dealing with tests of many coefficients in multiple-
equation models are demonstrated in Special syntaxes after multiple-equation estimation below.
Example 1: Testing for a single coefficient against zero
We have 1980 census data on the 50 states recording the birth rate in each state (brate), the
median age (medage), and the region of the country in which each state is located.
The region variable is 1 if the state is in the Northeast, 2 if the state is in the North Central, 3
if the state is in the South, and 4 if the state is in the West. We estimate the following regression:
. use http://www.stata-press.com/data/r13/census3
(1980 Census data by state)
. regress brate medage c.medage#c.medage i.region
Source SS df MS Number of obs = 50
F( 5, 44) = 100.63
Model 38803.4208 5 7760.68416 Prob > F = 0.0000
Residual 3393.39921 44 77.1227094 R-squared = 0.9196
Adj R-squared = 0.9104
Total 42196.82 49 861.159592 Root MSE = 8.782
brate Coef. Std. Err. t P>|t| [95% Conf. Interval]
medage -109.0958 13.52452 -8.07 0.000 -136.3527 -81.83892
c.medage#
c.medage 1.635209 .2290536 7.14 0.000 1.173582 2.096836
region
N Cntrl 15.00283 4.252067 3.53 0.001 6.433353 23.57231
South 7.366445 3.953335 1.86 0.069 -.6009775 15.33387
West 21.39679 4.650601 4.60 0.000 12.02412 30.76946
_cons 1947.611 199.8405 9.75 0.000 1544.859 2350.363
test can now be used to perform a variety of statistical tests. Specify the coeflegend option
with your estimation command to see a legend of the coefficients and how to specify them; see
[R]estimation options. We can test the hypothesis that the coefficient on 3.region is zero by typing
. test 3.region=0
( 1) 3.region = 0
F( 1, 44) = 3.47
Prob > F = 0.0691
The F statistic with 1 numerator and 44 denominator degrees of freedom is 3.47. The significance
level of the test is 6.91%; we can reject the hypothesis at the 10% level but not at the 5% level.
This result from test is identical to one presented in the output from regress, which indicates
that the t statistic on the 3.region coefficient is 1.863 and that its significance level is 0.069. The
t statistic presented in the output can be used to test the hypothesis that the corresponding coefficient
is zero, although it states the test in slightly different terms. The F distribution with 1 numerator
degree of freedom is, however, identical to the t² distribution. We note that 1.863² ≈ 3.47 and that
the significance levels in each test agree, although one extra digit is presented by the test command.
Technical note
After all estimation commands, including those that use the maximum likelihood method, the
test that one variable is zero is identical to that reported by the command’s output. The tests are
performed in the same way, using the estimated covariance matrix, and are known as Wald tests.
If the estimation command reports significance levels and confidence intervals using z rather than
t statistics, test reports results using the χ² rather than the F statistic.
Example 2: Testing the value of a single coefficient
If that were all test could do, it would be useless. We can use test, however, to perform other
tests. For instance, we can test the hypothesis that the coefficient on 2.region is 21 by typing
. test 2.region=21
( 1) 2.region = 21
F( 1, 44) = 1.99
Prob > F = 0.1654
We find that we cannot reject that hypothesis, or at least we cannot reject it at any significance level
below 16.5%.
Example 3: Testing the equality of two coefficients
The previous test is useful, but we could almost as easily perform it by hand using the results
presented in the regression output if we were well read on our statistics. We could type
. display Ftail(1,44,((_coef[2.region]-21)/4.252068)^2)
.16544873
So, now let’s test something a bit more difficult: whether the coefficient on 2.region is the same
as the coefficient on 4.region:
. test 2.region=4.region
( 1) 2.region - 4.region = 0
F( 1, 44) = 2.84
Prob > F = 0.0989
We find that we cannot reject the equality hypothesis at the 5% level, but we can at the 10% level.
Example 4
When we tested the equality of the 2.region and 4.region coefficients, Stata rearranged our
algebra. When Stata displayed its interpretation of the specified test, it indicated that we were testing
whether 2.region minus 4.region is zero. The rearrangement is innocuous and, in fact, allows
Stata to perform much more complicated algebra, for instance,
. test 2*(2.region-3*(3.region-4.region))=3.region+2.region+6*(4.region-3.region)
( 1) 2.region - 3.region = 0
F( 1, 44) = 5.06
Prob > F = 0.0295
Although we requested what appeared to be a lengthy hypothesis, once Stata simplified the algebra,
it realized that all we wanted to do was test whether the coefficient on 2.region is the same as the
coefficient on 3.region.
Technical note
Stata’s ability to simplify and test complex hypotheses is limited to linear hypotheses. If you
attempt to test a nonlinear hypothesis, you will be told that it is not possible:
. test 2.region/3.region=2.region+3.region
not possible with test
r(131);
To test a nonlinear hypothesis, see [R] testnl.
Example 5: Testing joint hypotheses
The real power of test is demonstrated when we test joint hypotheses. Perhaps we wish to test
whether the region variables, taken as a whole, are significant by testing whether the coefficients on
2.region,3.region, and 4.region are simultaneously zero. test allows us to specify multiple
conditions to be tested, each embedded within parentheses.
. test (2.region=0) (3.region=0) (4.region=0)
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
test displays the set of conditions and reports an F statistic of 8.85. test also reports the degrees
of freedom of the test to be 3, the “dimension” of the hypothesis, and the residual degrees of freedom,
44. The significance level of the test is close to 0, so we can strongly reject the hypothesis of no
difference between the regions.
An alternative method to specify simultaneous hypotheses uses the convenient shorthand of
conditions with multiple equality operators.
. test 2.region=3.region=4.region=0
( 1) 2.region - 3.region = 0
( 2) 2.region - 4.region = 0
( 3) 2.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
Technical note
Another method to test simultaneous hypotheses is to specify a test for each constraint and
accumulate it with the previous constraints:
. test 2.region=0
( 1) 2.region = 0
F( 1, 44) = 12.45
Prob > F = 0.0010
. test 3.region=0, accumulate
( 1) 2.region = 0
( 2) 3.region = 0
F( 2, 44) = 6.42
Prob > F = 0.0036
. test 4.region=0, accumulate
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
We tested the hypothesis that the coefficient on 2.region was zero by typing test 2.region=0.
We then tested whether the coefficient on 3.region was also zero by typing test 3.region=0,
accumulate. The accumulate option told Stata that this was not the start of a new test but a
continuation of a previous one. Stata responded by showing us the two equations and reporting an
F statistic of 6.42. The significance level associated with those two coefficients being zero is 0.36%.
When we added the last constraint test 4.region=0, accumulate, we discovered that the three
region variables are significant. If all we wanted was the overall significance and we did not want to
bother seeing the interim results, we could have used the notest option:
. test 2.region=0, notest
( 1) 2.region = 0
. test 3.region=0, accumulate notest
( 1) 2.region = 0
( 2) 3.region = 0
. test 4.region=0, accumulate
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
Example 6: Quickly testing coefficients against zero
Because tests that coefficients are zero are so common in applied statistics, the test command
has a more convenient syntax to accommodate this case:
. test 2.region 3.region 4.region
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
Example 7: Specifying varlists
We will now show how to use testparm. In its first syntax, test accepts a list of variable names
but not a varlist.
. test i(2/4).region
i not found
r(111);
In the varlist, i(2/4).region means all the level variables from 2.region through 4.region,
yet we received an error. test does not actually understand varlists, but testparm does. In fact, it
understands only varlists.
. testparm i(2/4).region
( 1) 2.region = 0
( 2) 3.region = 0
( 3) 4.region = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
Another way to test all the region variables is to type testparm i.region.
That testparm accepts varlists has other advantages that do not involve factor variables. Suppose
that we have a dataset that has dummy variables reg2,reg3, and reg4, rather than the categorical
variable region.
. use http://www.stata-press.com/data/r13/census4
(birth rate, median age)
. regress brate medage c.medage#c.medage reg2 reg3 reg4
(output omitted )
. test reg2-reg4
- not found
r(111);
In a varlist, reg2-reg4 means variables reg2 and reg4 and all the variables between, yet we received
an error. test is confused because the - has two meanings: it means subtraction in an expression
and “through” in a varlist. Similarly, ‘*’ means “any set of characters” in a varlist and multiplication
in an expression. testparm avoids this confusion; it allows only a varlist.
. testparm reg2-reg4
( 1) reg2 = 0
( 2) reg3 = 0
( 3) reg4 = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
testparm has another advantage. We have five variables in our dataset that start with the characters
reg: region, reg1, reg2, reg3, and reg4. reg* thus means those five variables:
. describe reg*
storage display value
variable name type format label variable label
region int %8.0g region Census Region
reg1 byte %9.0g region==NE
reg2 byte %9.0g region==N Cntrl
reg3 byte %9.0g region==South
reg4 byte %9.0g region==West
We cannot type test reg* because, in an expression, ‘*’ means multiplication, but here is what
would happen if we attempted to test all the variables that begin with reg:
. test region reg1 reg2 reg3 reg4
region not found
r(111);
The variable region was not included in our model, so it was not found. However, with testparm,
. testparm reg*
( 1) reg2 = 0
( 2) reg3 = 0
( 3) reg4 = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
That is, testparm took reg* to mean all variables that start with reg that were in our model.
Technical note
Actually, reg* means what it always does: all variables in our dataset that begin with reg, in
this case, region reg1 reg2 reg3 reg4. testparm just ignores any variables you specify that are
not in the model.
Example 8: Replaying the previous test
We just used test (testparm, actually, but it does not matter) to test the hypothesis that reg2,
reg3, and reg4 are jointly zero. We can review the results of our last test by typing test without
arguments:
. test
( 1) reg2 = 0
( 2) reg3 = 0
( 3) reg4 = 0
F( 3, 44) = 8.85
Prob > F = 0.0001
Technical note
test does not care how we build joint hypotheses; we may freely mix different forms of syntax.
(We can even start with testparm, but we cannot use it thereafter because it does not have an
accumulate option.)
Say that we type test reg2 reg3 reg4 to test that the coefficients on our region dummies
are jointly zero. We could then add a fourth constraint, say, that medage =100, by typing test
medage=100, accumulate. Or, if we had introduced the medage constraint first (our first test
command had been test medage=100), we could then add the region dummy test by typing test
reg2 reg3 reg4, accumulate or test (reg2=0) (reg3=0) (reg4=0), accumulate.
Remember that all previous tests are cleared when we do not specify the accumulate option. No
matter what tests we performed in the past, if we type test medage c.medage#c.medage, omitting
the accumulate option, we would test that medage and c.medage#c.medage are jointly zero.
Example 9: Testing the equality of multiple coefficients
Let’s return to our census3.dta dataset and test the hypothesis that all the included regions have
the same coefficient, that is, that the Northeast is significantly different from the rest of the nation:
. use http://www.stata-press.com/data/r13/census3
(1980 Census data by state)
. regress brate medage c.medage#c.medage i.region
(output omitted )
. test 2.region=3.region=4.region
( 1) 2.region - 3.region = 0
( 2) 2.region - 4.region = 0
F( 2, 44) = 8.23
Prob > F = 0.0009
We find that they are not all the same. The syntax 2.region=3.region=4.region with multiple
=operators is just a convenient shorthand for typing that the first expression equals the second
expression and that the first expression equals the third expression,
. test (2.region=3.region) (2.region=4.region)
We performed the test for equality of the three regions by imposing two constraints: region 2 has
the same coefficient as region 3, and region 2 has the same coefficient as region 4. Alternatively, we
could have tested that the coefficients on regions 2 and 3 are the same and that the coefficients on
regions 3 and 4 are the same. We would obtain the same results in either case.
To test for equality of the three regions, we might, likely by mistake, type equality constraints for
all pairs of regions:
. test (2.region=3.region) (2.region=4.region) (3.region=4.region)
( 1) 2.region - 3.region = 0
( 2) 2.region - 4.region = 0
( 3) 3.region - 4.region = 0
Constraint 3 dropped
F( 2, 44) = 8.23
Prob > F = 0.0009
Equality of regions 2 and 3 and of regions 2 and 4, however, implies equality of regions 3 and 4.
test recognized that the last constraint is implied by the other constraints and hence dropped it.
Technical note
Generally, Stata uses =for assignment, as in gen newvar =exp, and == as the operator for testing
equality in expressions. For your convenience, test allows both =and == to be used.
Example 10
The test for the equality of the regions is also possible with the testparm command. When we
include the equal option, testparm tests that the coefficients of all the variables specified are equal:
. testparm i(2/4).region, equal
( 1) - 2.region + 3.region = 0
( 2) - 2.region + 4.region = 0
F( 2, 44) = 8.23
Prob > F = 0.0009
We can also obtain the equality test by accumulating single equality tests.
. test 2.region=3.region, notest
( 1) 2.region - 3.region = 0
. test 2.region=4.region, accum
( 1) 2.region - 3.region = 0
( 2) 2.region - 4.region = 0
F( 2, 44) = 8.23
Prob > F = 0.0009
Technical note
If we specify a set of inconsistent constraints, test will tell us by dropping the constraint or
constraints that led to the inconsistency. For instance, let’s test that the coefficients on region 2 and
region 4 are the same, add the test that the coefficient on region 2 is 20, and finally add the test that
the coefficient on region 4 is 21:
. test (2.region=4.region) (2.region=20) (4.region=21)
( 1) 2.region - 4.region = 0
( 2) 2.region = 20
( 3) 4.region = 21
Constraint 1 dropped
F( 2, 44) = 1.29
Prob > F = 0.2868
test informed us that it was dropping constraint 1. All three equations cannot be simultaneously
true, so test drops whatever it takes to get back to something that makes sense.
Special syntaxes after multiple-equation estimation
Everything said above about tests after single-equation estimation applies to tests after multiple-
equation estimation, as long as you remember to specify the equation name. To demonstrate, let’s
estimate a seemingly unrelated regression by using sureg; see [R] sureg.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. sureg (price foreign mpg displ) (weight foreign length)
Seemingly unrelated regression
Equation Obs Parms RMSE "R-sq" chi2 P
price 74 3 2165.321 0.4537 49.64 0.0000
weight 74 2 245.2916 0.8990 661.84 0.0000
Coef. Std. Err. z P>|z| [95% Conf. Interval]
price
foreign 3058.25 685.7357 4.46 0.000 1714.233 4402.267
mpg -104.9591 58.47209 -1.80 0.073 -219.5623 9.644042
displacement 18.18098 4.286372 4.24 0.000 9.779842 26.58211
_cons 3904.336 1966.521 1.99 0.047 50.0263 7758.645
weight
foreign -147.3481 75.44314 -1.95 0.051 -295.2139 .517755
length 30.94905 1.539895 20.10 0.000 27.93091 33.96718
_cons -2753.064 303.9336 -9.06 0.000 -3348.763 -2157.365
To test the significance of foreign in the price equation, we could type
. test [price]foreign
( 1) [price]foreign = 0
chi2( 1) = 19.89
Prob > chi2 = 0.0000
which is the same result reported by sureg: 4.4602² ≈ 19.89. To test foreign in both equations, we
could type
. test [price]foreign [weight]foreign
( 1) [price]foreign = 0
( 2) [weight]foreign = 0
chi2( 2) = 31.61
Prob > chi2 = 0.0000
or
. test foreign
( 1) [price]foreign = 0
( 2) [weight]foreign = 0
chi2( 2) = 31.61
Prob > chi2 = 0.0000
This last syntax, typing the variable name by itself, tests the coefficients in all equations in which
they appear. The variable length appears in only the weight equation, so typing
. test length
( 1) [weight]length = 0
chi2( 1) = 403.94
Prob > chi2 = 0.0000
yields the same result as typing test [weight]length. We may also specify a linear expression
rather than a list of coefficients:
. test mpg=displ
( 1) [price]mpg - [price]displacement = 0
chi2( 1) = 4.85
Prob > chi2 = 0.0277
or
. test [price]mpg = [price]displ
( 1) [price]mpg - [price]displacement = 0
chi2( 1) = 4.85
Prob > chi2 = 0.0277
A variation on this syntax can be used to test cross-equation constraints:
. test [price]foreign = [weight]foreign
( 1) [price]foreign - [weight]foreign = 0
chi2( 1) = 23.07
Prob > chi2 = 0.0000
Typing an equation name in square brackets by itself tests all the coefficients except the intercept
in that equation:
. test [price]
( 1) [price]foreign = 0
( 2) [price]mpg = 0
( 3) [price]displacement = 0
chi2( 3) = 49.64
Prob > chi2 = 0.0000
Typing an equation name in square brackets, a colon, and a list of variable names tests those variables
in the specified equation:
. test [price]: foreign displ
( 1) [price]foreign = 0
( 2) [price]displacement = 0
chi2( 2) = 25.19
Prob > chi2 = 0.0000
test [eqname1=eqname2] tests that all the coefficients in the two equations are equal. We cannot
use that syntax here because there are different variables in the model:
. test [price=weight]
variables differ between equations
(to test equality of coefficients in common, specify option common)
r(111);
The common option specifies a test of the equality of the coefficients common to the equations price
and weight,
. test [price=weight], common
( 1) [price]foreign - [weight]foreign = 0
chi2( 1) = 23.07
Prob > chi2 = 0.0000
By default, test does not include the constant, the coefficient of the constant variable _cons, in
the test. The cons option specifies that the constant be included.
. test [price=weight], common cons
( 1) [price]foreign - [weight]foreign = 0
( 2) [price]_cons - [weight]_cons = 0
chi2( 2) = 51.23
Prob > chi2 = 0.0000
We can also use a modification of this syntax with the model if we also type a colon and the names
of the variables we want to test:
. test [price=weight]: foreign
( 1) [price]foreign - [weight]foreign = 0
chi2( 1) = 23.07
Prob > chi2 = 0.0000
We have only one variable in common between the two equations, but if there had been more, we
could have listed them.
Finally, a simultaneous test of multiple constraints may be specified just as after single-equation
estimation.
. test ([price]: foreign) ([weight]: foreign)
( 1) [price]foreign = 0
( 2) [weight]foreign = 0
chi2( 2) = 31.61
Prob > chi2 = 0.0000
test can also test for equality of coefficients across more than two equations. For instance, test
[eq1=eq2=eq3] specifies a test that the coefficients in the three equations eq1, eq2, and eq3 are
equal. This requires that the same variables be included in the three equations. If some variables are
entered only in some of the equations, you can type test [eq1=eq2=eq3], common to test that the
coefficients of the variables common to all three equations are equal. Alternatively, you can explicitly
list the variables for which equality of coefficients across the equations is to be tested. For instance,
test [eq1=eq2=eq3]: time money tests that the coefficients of the variables time and money do
not differ between the equations.
Technical note
test [eq1=eq2=eq3], common tests the equality of the coefficients common to all equations,
but it does not test the equality of all common coefficients. Consider the case where
eq1 contains the variables var1 var2 var3
eq2 contains the variables var1 var2 var4
eq3 contains the variables var1 var3 var4
Obviously, only var1 is common to all three equations. Thus test [eq1=eq2=eq3], common
tests that the coefficients of var1 do not vary across the equations, so it is equivalent to test
[eq1=eq2=eq3]: var1. To perform a test of the coefficients of variables common to two equations,
you could explicitly list the constraints to be tested,
. test ([eq1=eq2=eq3]:var1) ([eq1=eq2]:var2) ([eq1=eq3]:var3) ([eq2=eq3]:var4)
or use test with the accumulate option, and maybe also with the notest option, to form the
appropriate joint hypothesis:
. test [eq1=eq2], common notest
. test [eq1=eq3], common accumulate notest
. test [eq2=eq3], common accumulate
Constrained coefficients
If the test indicates that the data do not allow you to conclude that the constraints are not satisfied,
you may want to inspect the constrained coefficients. The coef option specifies that the constrained
results, estimated by GLS, are shown.
. test [price=weight], common coef
( 1) [price]foreign - [weight]foreign = 0
chi2( 1) = 23.07
Prob > chi2 = 0.0000
Constrained coefficients
Coef. Std. Err. z P>|z| [95% Conf. Interval]
price
foreign -216.4015 74.06083 -2.92 0.003 -361.558 -71.2449
mpg -121.5717 58.36972 -2.08 0.037 -235.9742 -7.169116
displacement 7.632566 3.681114 2.07 0.038 .4177148 14.84742
_cons 7312.856 1834.034 3.99 0.000 3718.215 10907.5
weight
foreign -216.4015 74.06083 -2.92 0.003 -361.558 -71.2449
length 30.34875 1.534815 19.77 0.000 27.34057 33.35693
_cons -2619.719 302.6632 -8.66 0.000 -3212.928 -2026.51
The constrained coefficient of foreign is -216.40 with standard error 74.06 in equations price
and weight. The other coefficients and their standard errors are affected by imposing the equality
constraint of the two coefficients of foreign because the unconstrained estimates of these two
coefficients were correlated with the estimates of the other coefficients.
Technical note
The two-step constrained coefficients b_c displayed by test, coef are asymptotically equivalent to
the one-step constrained estimates that are computed by specifying the constraints during estimation
using the constraint() option of estimation commands (Gourieroux and Monfort 1995, chap. 10).
Generally, one-step constrained estimates have better small-sample properties. For inspection and
interpretation, however, two-step constrained estimates are a convenient alternative. Moreover, some
estimation commands (for example, stcox and many xt estimators) do not have a constraint() option.
Multiple testing
When performing the test of a joint hypothesis, you might want to inspect the underlying
1-degree-of-freedom hypotheses. Which constraint “is to blame”? test displays the univariate as well as the
simultaneous test if the mtest option is specified. For example,
. test [price=weight], common cons mtest
( 1) [price]foreign - [weight]foreign = 0
( 2) [price]_cons - [weight]_cons = 0
chi2 df p
(1) 23.07 1 0.0000 #
(2) 11.17 1 0.0008 #
all 51.23 2 0.0000
# unadjusted p-values
Both coefficients seem to contribute to the highly significant result. The 1-degree-of-freedom tests
shown here are identical to those obtained if test had been invoked to test each simple hypothesis by itself.
There is, of course, a real risk in inspecting these simple hypotheses. Especially in high-dimensional hypotheses,
you may easily find one hypothesis that happens to be significant. Multiple-testing procedures are
designed to provide some safeguard against this risk. p-values of the univariate hypotheses are modified
so that the probability of falsely rejecting one of the null hypotheses is bounded. test provides
methods based on Bonferroni, Šidák, and Holm.
. test [price=weight], common cons mtest(b)
( 1) [price]foreign - [weight]foreign = 0
( 2) [price]_cons - [weight]_cons = 0
chi2 df p
(1) 23.07 1 0.0000 #
(2) 11.17 1 0.0017 #
all 51.23 2 0.0000
# Bonferroni-adjusted p-values
Stored results
test and testparm store the following in r():
Scalars
    r(p)            two-sided p-value
    r(chi2)         χ²
    r(F)            F statistic
    r(df)           test constraints degrees of freedom
    r(df_r)         residual degrees of freedom
    r(ss)           sum of squares (test)
    r(rss)          residual sum of squares
    r(drop)         1 if constraints were dropped, 0 otherwise
    r(dropped_i)    index of ith constraint dropped
Macros
    r(mtmethod)     method of adjustment for multiple testing
Matrices
    r(mtest)        multiple test results
r(ss) and r(rss) are defined only when test is used for testing effects after anova.
Methods and formulas
test and testparm perform Wald tests. Let the estimated coefficient vector be b and the estimated
variance–covariance matrix be V. Let Rb = r denote the set of q linear hypotheses to be tested jointly.
The Wald test statistic is (Judge et al. 1985, 20–28)

    W = (Rb - r)'(RVR')^{-1}(Rb - r)

If the estimation command reports its significance levels using Z statistics, a chi-squared distribution
with q degrees of freedom,

    W ~ χ²(q)

is used for computation of the significance level of the hypothesis test.
If the estimation command reports its significance levels using t statistics with d degrees of freedom,
an F statistic,

    F = (1/q) W

is computed, and an F distribution with q numerator degrees of freedom and d denominator degrees
of freedom computes the significance level of the hypothesis test.
The two-step constrained estimates b_c displayed by test with the coef option are the GLS estimates
of the unconstrained estimates b subject to the specified constraints Rb = r (Gourieroux and Monfort
1995, chap. 10),

    b_c = b - VR'(RVR')^{-1}(Rb - r)

with variance–covariance matrix

    V_c = V - VR'(RVR')^{-1}RV
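As a minimal illustration (not one of the manual's examples), the Wald statistic and the two-step constrained GLS coefficients can be reproduced by hand in Mata from e(b) and e(V) once R and r are written out explicitly. The sketch below assumes a linear regression on the auto data with a single constraint that the coefficients of mpg and weight are equal; with one constraint after regress, W equals the F statistic that . test mpg = weight would report.
. sysuse auto
. regress price mpg weight
. mata:
: b = st_matrix("e(b)")'                             // column vector of estimates
: V = st_matrix("e(V)")
: R = (1, -1, 0)                                     // one constraint: _b[mpg] - _b[weight] = 0
: r = 0
: W = (R*b :- r)' * invsym(R*V*R') * (R*b :- r)      // Wald statistic
: bc = b - V*R' * invsym(R*V*R') * (R*b :- r)        // two-step constrained GLS coefficients
: W
: end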
If test displays a Wald test for joint (simultaneous) hypotheses, it can also display all
1-degree-of-freedom tests, with p-values adjusted for multiple testing. Let p_1, p_2, . . . , p_k be the unadjusted p-values
of these 1-degree-of-freedom tests. The Bonferroni-adjusted p-values are defined as p_i^b = min(1, k p_i).
The Šidák-adjusted p-values are p_i^s = 1 - (1 - p_i)^k. Holm's method for adjusting p-values is defined
as p_i^h = min(1, k_i p_i), where k_i is the number of p-values at least as large as p_i. Note that p_i^h <= p_i^b,
reflecting that Holm's method is less conservative than the widely used Bonferroni method.
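As a small illustration of these adjustments (not from the manual), all three sets of adjusted p-values can be computed in Mata for any vector of unadjusted p-values; the values below are arbitrary.
. mata:
: p = (0.012, 0.034, 0.210)'                      // unadjusted 1-degree-of-freedom p-values
: k = rows(p)
: pb = rowmin((J(k,1,1), k:*p))                   // Bonferroni: min(1, k*p_i)
: ps = 1 :- (1:-p):^k                             // Sidak: 1 - (1 - p_i)^k
: ki = J(k,1,.)
: for (i=1; i<=k; i++) ki[i] = sum(p :>= p[i])    // k_i = number of p-values >= p_i
: ph = rowmin((J(k,1,1), ki:*p))                  // Holm: min(1, k_i*p_i)
: (p, pb, ps, ph)
: end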
If test is used after a svy command, it carries out an adjusted Wald test; this adjustment should
not be confused with the adjustment for multiple testing. Both adjustments may actually be combined.
Specifically, the survey adjustment uses an approximate F statistic (d - k + 1)W/(kd), where W is the
Wald test statistic, k is the dimension of the hypothesis test, and d is the total number of sampled PSUs
minus the total number of strata. Under the null hypothesis, (d - k + 1)W/(kd) ~ F(k, d - k + 1), where
F(k, d - k + 1) is an F distribution with k numerator degrees of freedom and d - k + 1 denominator
degrees of freedom. If nosvyadjust is specified, the p-value is computed using W/k ~ F(k, d).
See Korn and Graubard (1990) for a detailed description of the Bonferroni adjustment technique
and for a discussion of the relative merits of it and of the adjusted and unadjusted Wald tests.
Acknowledgment
The svy adjustment code was adopted from another command developed in collaboration with
John L. Eltinge of the Bureau of Labor Statistics.
References
Beale, E. M. L. 1960. Confidence regions in non-linear estimation. Journal of the Royal Statistical Society, Series B
22: 41–88.
Eltinge, J. L., and W. M. Sribney. 1996. svy5: Estimates of linear combinations and hypothesis tests for survey data.
Stata Technical Bulletin 31: 31–42. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 246–259. College
Station, TX: Stata Press.
Gourieroux, C. S., and A. Monfort. 1995. Statistics and Econometric Models, Vol 1: General Concepts, Estimation,
Prediction, and Algorithms. Trans. Q. Vuong. Cambridge: Cambridge University Press.
Holm, S. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65–70.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Korn, E. L., and B. I. Graubard. 1990. Simultaneous testing of regression coefficients with complex survey data: Use
of Bonferroni tstatistics. American Statistician 44: 270–276.
Weesie, J. 1999. sg100: Two-stage linear constrained estimation. Stata Technical Bulletin 47: 24–30. Reprinted in
Stata Technical Bulletin Reprints, vol. 8, pp. 217–225. College Station, TX: Stata Press.
Also see
[R] anova — Analysis of variance and covariance
[R] anova postestimation — Postestimation tools for anova
[R] contrast — Contrasts and linear hypothesis tests after estimation
[R] lincom — Linear combinations of estimators
[R] lrtest — Likelihood-ratio test after estimation
[R] nestreg — Nested model statistics
[R] nlcom — Nonlinear combinations of estimators
[R] testnl — Test nonlinear hypotheses after estimation
[U] 13.5 Accessing coefficients and standard errors
[U] 20 Estimation and postestimation commands
Title
testnl — Test nonlinear hypotheses after estimation
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
testnl exp = exp [= exp . . . ] [, options]
testnl (exp = exp [= exp . . . ]) [(exp = exp [= exp . . . ]) . . . ] [, options]
options          Description
mtest[(opt)]     test each condition separately
iterate(#)       use maximum # of iterations to find the optimal step size
df(#)            use F distribution with # denominator degrees of freedom for the
                   reference distribution of the test statistic
nosvyadjust      carry out the Wald test as W/k ~ F(k, d); for use with svy
                   estimation commands when the df() option is also specified
df(#) and nosvyadjust do not appear in the dialog box.
The second syntax means that if more than one expression is specified, each must be surrounded by
parentheses.
exp is a possibly nonlinear expression containing
    _b[coef]
    _b[eqno:coef]
    [eqno]coef
    [eqno]_b[coef]
eqno is
##
name
coef identifies a coefficient in the model. coef is typically a variable name, a level indicator, an
interaction indicator, or an interaction involving continuous variables. Level indicators identify one
level of a factor variable and interaction indicators identify one combination of levels of an interaction;
see [U] 11.4.3 Factor variables. coef may contain time-series operators; see [U] 11.4.4 Time-series
varlists.
Distinguish between the square brackets that are to be typed and the brackets in the syntax diagram, which indicate optional arguments.
Menu
Statistics >Postestimation >Tests >Test nonlinear hypotheses
Description
testnl tests (linear or nonlinear) hypotheses about the estimated parameters from the most recently
fit model.
testnl produces Wald-type tests of smooth nonlinear (or linear) hypotheses about the estimated
parameters from the most recently fit model. The p-values are based on the delta method, an
approximation appropriate in large samples.
testnl can be used with svy estimation results; see [SVY] svy postestimation.
The format (exp1=exp2=exp3 . . . ) for a simultaneous-equality hypothesis is just a convenient
shorthand for a list (exp1=exp2) (exp1=exp3), etc.
testnl may also be used to test linear hypotheses. test is faster if you want to test only
linear hypotheses; see [R] test. testnl is the only option for testing linear and nonlinear hypotheses
simultaneously.
Options
mtest[(opt)] specifies that tests be performed for each condition separately. opt specifies the method
for adjusting p-values for multiple testing. Valid values for opt are
    bonferroni    Bonferroni's method
    holm          Holm's method
    sidak         Šidák's method
    noadjust      no adjustment is to be made
Specifying mtest without an argument is equivalent to specifying mtest(noadjust).
iterate(#) specifies the maximum number of iterations used to find the optimal step size in the
calculation of numerical derivatives of the test expressions. By default, the maximum number of
iterations is 100, but convergence is usually achieved after only a few iterations. You should rarely
have to use this option.
The following options are available with testnl but are not shown in the dialog box:
df(#) specifies that the F distribution with # denominator degrees of freedom be used for the reference
distribution of the test statistic. With survey data, # is the design degrees of freedom unless nosvyadjust is
specified.
nosvyadjust is for use with svy estimation commands when the df() option is also specified; see
[SVY] svy estimation. It specifies that the Wald test be carried out without the default adjustment
for the design degrees of freedom. That is, the test is carried out as W/k ~ F(k, d) rather than
as (d - k + 1)W/(kd) ~ F(k, d - k + 1), where k is the dimension of the test and d is the
design degrees of freedom specified in the df() option.
Remarks and examples
Remarks are presented under the following headings:
Introduction
Using testnl to perform linear tests
Specifying constraints
Dropped constraints
Multiple constraints
Manipulability
Introduction
Example 1
We have just estimated the parameters of an earnings model on cross-sectional time-series data
using one of Stata’s more sophisticated estimators:
. use http://www.stata-press.com/data/r13/earnings
(NLS Women 14-24 in 1968)
. xtgee ln_w grade age c.age#c.age, corr(exchangeable) nolog
GEE population-averaged model Number of obs = 1326
Group variable: idcode Number of groups = 269
Link: identity Obs per group: min = 1
Family: Gaussian avg = 4.9
Correlation: exchangeable max = 9
Wald chi2(3) = 327.33
Scale parameter: .0976738 Prob > chi2 = 0.0000
ln_wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
grade .0749686 .0066111 11.34 0.000 .062011 .0879261
age .1080806 .0235861 4.58 0.000 .0618526 .1543086
c.age#c.age -.0016253 .0004739 -3.43 0.001 -.0025541 -.0006966
_cons -.8788933 .2830899 -3.10 0.002 -1.433739 -.3240473
An implication of this model is that peak earnings occur at age -_b[age]/(2*_b[c.age#c.age]),
which here is equal to 33.2. Say that we have a theory that peak earnings should occur at age 16 +
1/_b[grade].
. testnl -_b[age]/(2*_b[c.age#c.age]) = 16 + 1/_b[grade]
(1) -_b[age]/(2*_b[c.age#c.age]) = 16 + 1/_b[grade]
chi2(1) = 1.71
Prob > chi2 = 0.1914
These data do not reject our theory.
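As an aside (not part of the output shown above), the implied peak age itself, together with a delta-method standard error, could be obtained after the same fit with nlcom:
. nlcom -_b[age]/(2*_b[c.age#c.age])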
Using testnl to perform linear tests
testnl may be used to test linear constraints, but test is faster; see [R]test. You could type
. testnl _b[x4] = _b[x1]
but it would take less computer time if you typed
. test _b[x4] = _b[x1]
Specifying constraints
The constraints to be tested can be formulated in many different ways. You could type
. testnl _b[mpg]*_b[weight] = 1
or
. testnl _b[mpg] = 1/_b[weight]
or you could express the constraint any other way you wished. (To say that testnl allows constraints
to be specified in different ways does not mean that the test itself does not depend on the formulation.
This point is briefly discussed later.) In formulating the constraints, you must, however, exercise one
caution: users of test often refer to the coefficient on a variable by specifying the variable name.
For example,
. test mpg = 0
More formally, they should type
. test _b[mpg] = 0
but test allows the _b[] surrounding the variable name to be omitted. testnl does not allow this
shorthand. Typing
. testnl mpg=0
specifies the constraint that the value of variable mpg in the first observation is zero. If you make this
mistake, sometimes testnl will catch it:
. testnl mpg=0
equation (1) contains reference to X rather than _b[X]
r(198);
In other cases, testnl may not catch the mistake; then the constraint will be dropped because it
does not make sense:
. testnl mpg=0
Constraint (1) dropped
(There are reasons other than this for constraints being dropped.) The worst case, however, is
. testnl _b[weight]*mpg = 1
when what you mean is not that _b[weight] equals the reciprocal of the value of mpg in the first
observation, but rather that
. testnl _b[weight]*_b[mpg] = 1
Sometimes this mistake will be caught by the “contains reference to X rather than _b[X]” error, and
sometimes it will not. Be careful.
testnl, like test, can be used after any Stata estimation command, including the survey
estimators. When you use it after a multiple-equation command, such as mlogit or heckman, you
refer to coefficients by using Stata's standard syntax: [eqname]_b[varname].
Stata’s single-equation estimation output looks like this:
Coef . . .
weight 12.27 . . . <- coefficient is _b[weight]
mpg 3.21 . . .
Stata’s multiple-equation output looks like this:
Coef . . .
cat1 . . .
weight 12.27 . . . <- coefficient is [cat1]_b[weight]
mpg 3.21 . . .
8. . .
weight 5.83 . . . <- coefficient is [8]_b[weight]
mpg 7.43 . . .
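With the hypothetical multiple-equation results sketched above, a nonlinear restriction involving coefficients from both equations would therefore be written with fully qualified names, for example,
. testnl [cat1]_b[weight]*[8]_b[mpg] = 1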
Dropped constraints
testnl automatically drops constraints when
They are nonbinding, for example, _b[mpg] = _b[mpg]. More subtle cases include
_b[mpg]*_b[weight] = 4
_b[weight] = 2
_b[mpg] = 2
In this example, the third constraint is nonbinding because it is implied by the first two.
They are contradictory, for example, _b[mpg]=2 and _b[mpg]=3. More subtle cases include
_b[mpg]*_b[weight] = 4
_b[weight] = 2
_b[mpg] = 3
The third constraint contradicts the first two.
Multiple constraints
Example 2
We illustrate the simultaneous test of a series of constraints using simulated data on labor-market
promotion in a given year. We fit a probit model with separate effects for education, experience, and
experience-squared for men and women.
. use http://www.stata-press.com/data/r13/promotion
. probit promo male male#c.(yedu yexp yexp2), nolog
Probit regression Number of obs = 775
LR chi2(7) = 424.42
Prob > chi2 = 0.0000
Log likelihood = -245.42768 Pseudo R2 = 0.4637
promo Coef. Std. Err. z P>|z| [95% Conf. Interval]
male .6489974 .203739 3.19 0.001 .2496763 1.048318
male#c.yedu
0 .9730237 .1056136 9.21 0.000 .7660248 1.180023
1 1.390517 .1527288 9.10 0.000 1.091174 1.68986
male#c.yexp
0 .4559544 .0901169 5.06 0.000 .2793285 .6325803
1 1.422539 .1544255 9.21 0.000 1.11987 1.725207
male#c.yexp2
0 -.1027149 .0573059 -1.79 0.073 -.2150325 .0096026
1 -.3749457 .1160113 -3.23 0.001 -.6023236 -.1475677
_cons .9872018 .1148215 8.60 0.000 .7621559 1.212248
Note: 1 failure and 2 successes completely determined.
The effects of human capital seem to differ between men and women. A formal test confirms this.
. test (yedu#0.male = yedu#1.male) (yexp#0.male = yexp#1.male)
> (yexp2#0.male = yexp2#1.male)
( 1) [promo]0b.male#c.yedu - [promo]1.male#c.yedu = 0
( 2) [promo]0b.male#c.yexp - [promo]1.male#c.yexp = 0
( 3) [promo]0b.male#c.yexp2 - [promo]1.male#c.yexp2 = 0
chi2( 3) = 35.43
Prob > chi2 = 0.0000
How do we interpret this gender difference? It has repeatedly been stressed (see, for example, Long
[1997, 47–50]; Allison [1999]) that comparison of groups in binary response models, and similarly
in other latent-variable models, is hampered by an identification problem: with β the regression
coefficients for the latent variable and σ the standard deviation of the latent residual, only the ratios β/σ
are identified. In fact, in terms of the latent regression, the probit coefficients should be interpreted
as β/σ, not as the β. If we cannot claim convincingly that the residual standard deviation σ does
not vary between the sexes, equality of the regression coefficients β implies that the coefficients of
the probit model for men and women are proportional but not necessarily equal. This is a nonlinear
hypothesis in terms of the probit coefficients, not a linear one.
. testnl _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
> = _b[yexp2#1.male]/_b[yexp2#0.male]
(1) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
(2) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp2#1.male]/_b[yexp2#0.male]
chi2(2) = 9.21
Prob > chi2 = 0.0100
We conclude that we find fairly strong evidence against the proportionality of the coefficients, and
hence we have to conclude that success in the labor market is produced in different ways by men
and women. (But remember, these were simulated data.)
Example 3
The syntax for specifying the equality of multiple expressions is just a convenient shorthand for
specifying a series of constraints, namely, that the first expression equals the second expression,
the first expression also equals the third expression, etc. The Wald test performed and the output
of testnl are the same whether we use the shorthand or we specify the series of constraints. The
lengthy specification as a series of constraints can be simplified using the continuation symbols ///.
. testnl (_b[yedu#1.male]/_b[yedu#0.male] = ///
> _b[yexp#1.male]/_b[yexp#0.male]) ///
> (_b[yedu#1.male]/_b[yedu#0.male] = ///
> _b[yexp2#1.male]/_b[yexp2#0.male])
(1) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
(2) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp2#1.male]/_b[yexp2#0.male]
chi2(2) = 9.21
Prob > chi2 = 0.0100
Having established differences between men and women, we would like to do multiple testing
between the ratios. Because we did not specify hypotheses in advance, we prefer to adjust the p-values
of tests using, here, Bonferroni’s method.
. testnl _b[yedu#1.male]/_b[yedu#0.male] = ///
> _b[yexp#1.male]/_b[yexp#0.male] = ///
> _b[yexp2#1.male]/_b[yexp2#0.male], mtest(b)
(1) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
(2) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp2#1.male]/_b[yexp2#0.male]
chi2 df p
(1) 6.89 1 0.0173 #
(2) 0.93 1 0.6713 #
all 9.21 2 0.0100
# Bonferroni-adjusted p-values
Manipulability
Although testnl allows you to specify constraints in different ways that are mathematically
equivalent, as noted above, this does not mean that the tests are the same. This difference is known as
the manipulability of the Wald test for nonlinear hypotheses; also see [R]boxcox. The test might even
be significant for one formulation but not significant for another formulation that is mathematically
equivalent. Trying out different specifications to find a formulation with the desired p-value is totally
inappropriate, though it may actually be fun to try. There is no invariance under reparameterization because
the nonlinear Wald test is actually a standard Wald test of a linearization of the constraint, which
depends on the particular specification. We note that the likelihood-ratio test is not manipulable in
this sense.
From a statistical point of view, it is best to choose a specification of the constraints that is as linear
as possible. Doing so usually improves the accuracy of the approximation of the null distribution
of the test by a χ² or an F distribution. The example above used the nonlinear Wald test to test
whether the coefficients of human capital variables for men were proportional to those of women. A
specification of proportionality of coefficients in terms of ratios of coefficients is fairly nonlinear if
the coefficients in the denominator are close to 0. A more linear version of the test results from a
bilinear formulation. Thus instead of
. testnl _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
(1) _b[yedu#1.male]/_b[yedu#0.male] = _b[yexp#1.male]/_b[yexp#0.male]
chi2(1) = 6.89
Prob > chi2 = 0.0087
perhaps
. testnl _b[yedu#1.male]*_b[yexp#0.male] = _b[yedu#0.male]*_b[yexp#1.male]
(1) _b[yedu#1.male]*_b[yexp#0.male] = _b[yedu#0.male]*_b[yexp#1.male]
chi2(1) = 13.95
Prob > chi2 = 0.0002
is better, and in fact it has been suggested that the latter version of the test is more reliable. This
assertion is confirmed by performing simulations and is in line with theoretical results of Phillips and
Park (1988). There is strong evidence against the proportionality of human capital effects between
men and women, implying for this example that differences in the residual variances between the
sexes can be ruled out as the explanation of the sex differences in the analysis of labor market
participation.
Stored results
testnl stores the following in r():
Scalars
r(df) degrees of freedom
r(df_r) residual degrees of freedom
r(chi2) χ²
r(p) significance
r(F) F statistic
Macros
r(mtmethod) method specified in mtest()
Matrices
r(G) derivatives of R(b) with respect to b; see Methods and formulas below
r(R) R(b) - q; see Methods and formulas below
r(mtest) multiple test results
Methods and formulas
After fitting a model, define b as the resulting 1 × k parameter vector and V as the k × k
covariance matrix. The (linear or nonlinear) hypothesis is given by R(b) = q, where R is a function
returning a j × 1 vector. The Wald test formula is (Greene 2012, 528)

    W = {R(b) - q}'(GVG')^{-1}{R(b) - q}

where G is the derivative matrix of R(b) with respect to b. W is distributed as χ² if V is an
asymptotic covariance matrix. F = W/j is distributed as F for linear regression.
The adjustment methods for multiple testing are described in [R] test. The adjustment for survey
design effects is described in [SVY] svy postestimation.
References
Allison, P. D. 1999. Comparing logit and probit coefficients across groups. Sociological Methods and Research 28:
186–208.
Gould, W. W. 1996. crc43: Wald test of nonlinear hypotheses after model estimation. Stata Technical Bulletin 29:
2–4. Reprinted in Stata Technical Bulletin Reprints, vol. 5, pp. 15–18. College Station, TX: Stata Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Phillips, P. C. B., and J. Y. Park. 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56:
1065–1083.
Also see
[R] contrast — Contrasts and linear hypothesis tests after estimation
[R] lincom — Linear combinations of estimators
[R] lrtest — Likelihood-ratio test after estimation
[R] nlcom — Nonlinear combinations of estimators
[R] test — Test linear hypotheses after estimation
[U] 13.5 Accessing coefficients and standard errors
[U] 20 Estimation and postestimation commands
Title
tetrachoric — Tetrachoric correlations for binary variables
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
tetrachoric varlist [if] [in] [weight] [, options]
options Description
Main
  stats(statlist)   list of statistics; select up to 4 statistics; default is stats(rho)
  edwards           use the noniterative Edwards and Edwards estimator; default is the
                      maximum likelihood estimator
  print(#)          significance level for displaying coefficients
  star(#)           significance level for displaying with a star
  bonferroni        use Bonferroni-adjusted significance level
  sidak             use Šidák-adjusted significance level
  pw                calculate all the pairwise correlation coefficients by using all available
                      data (pairwise deletion)
  zeroadjust        adjust frequencies when one cell has a zero count
  matrix            display output in matrix form
  notable           suppress display of correlations
  posdef            modify correlation matrix to be positive semidefinite
statlist Description
rho tetrachoric correlation coefficient
se standard error of rho
obs number of observations
p     exact two-sided significance level
by is allowed; see [D] by.
fweights are allowed; see [U] 11.1.6 weight.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Tetrachoric correlations
Description
tetrachoric computes estimates of the tetrachoric correlation coefficients of the binary variables
in varlist. All of these variables should be 0, 1, or missing values.
Tetrachoric correlations assume a latent bivariate normal distribution (X1, X2) for each pair of
variables (v1, v2), with a threshold model for the manifest variables, vi = 1 if and only if Xi > 0.
The means and variances of the latent variables are not identified, but the correlation, r, of X1 and
X2 can be estimated from the joint distribution of v1 and v2 and is called the tetrachoric correlation
coefficient.
tetrachoric computes pairwise estimates of the tetrachoric correlations by the (iterative) maximum
likelihood estimator obtained from bivariate probit without explanatory variables (see [R] biprobit)
by using the Edwards and Edwards (1984) noniterative estimator as the initial value.
The pairwise correlation matrix is returned as r(Rho) and can be used to perform a factor analysis
or a principal component analysis of binary variables by using the factormat or pcamat commands;
see [MV] factor and [MV] pca.
Options
 
Main
stats(statlist) specifies the statistics to be displayed in the matrix of output. stats(rho) is the
default. Up to four statistics may be specified. stats(rho se p obs) would display the tetrachoric
correlation coefficient, its standard error, the significance level, and the number of observations. If
varlist contains only two variables, all statistics are shown in tabular form. stats(), print(),
and star() have no effect unless the matrix option is also specified.
edwards specifies that the noniterative Edwards and Edwards estimator be used. The default is the
maximum likelihood estimator. If you analyze many binary variables, you may want to use the fast
noniterative estimator proposed by Edwards and Edwards (1984). However, if you have skewed
variables, the approximation does not perform well.
print(#) specifies the maximum significance level of correlation coefficients to be printed. Correlation
coefficients with larger significance levels are left blank in the matrix. Typing tetrachoric . . . ,
print(.10) would list only those correlation coefficients that are significant at the 10% level or
lower.
star(#) specifies the maximum significance level of correlation coefficients to be marked with a
star. Typing tetrachoric . . . , star(.05) would “star” all correlation coefficients significant at
the 5% level or lower.
bonferroni makes the Bonferroni adjustment to calculated significance levels. This option affects
printed significance levels and the print() and star() options. Thus tetrachoric . . . ,
print(.05) bonferroni prints coefficients with Bonferroni-adjusted significance levels of 0.05
or less.
sidak makes the Šidák adjustment to calculated significance levels. This option affects printed
significance levels and the print() and star() options. Thus tetrachoric . . . , print(.05)
sidak prints coefficients with Šidák-adjusted significance levels of 0.05 or less.
pw specifies that the tetrachoric correlation be calculated by using all available data. By default,
tetrachoric uses casewise deletion, where observations are ignored if any of the specified
variables in varlist are missing.
zeroadjust specifies that when one of the cells has a zero count, a frequency adjustment be applied
in such a way as to increase the zero to one-half and maintain row and column totals.
matrix forces tetrachoric to display the statistics as a matrix, even if varlist contains only two
variables. matrix is implied if more than two variables are specified.
notable suppresses the output.
posdef modifies the correlation matrix so that it is positive semidefinite, that is, a proper correlation
matrix. The modified result is the correlation matrix associated with the least-squares approximation
of the tetrachoric correlation matrix by a positive-semidefinite matrix. If the correlation matrix is
modified, the standard errors and significance levels are not displayed and are returned in r().
Remarks and examples
Remarks are presented under the following headings:
Association in 2-by-2 tables
Factor analysis of dichotomous variables
Tetrachoric correlations with simulated data
Association in 2-by-2 tables
Although a wide variety of measures of association in cross tabulations have been proposed, such
measures are essentially equivalent (monotonically related) in the special case of 2 × 2 tables; there
is only 1 degree of freedom for nonindependence. Still, some measures have more desirable properties
than others. Here we compare two measures: the standard Pearson correlation coefficient and the
tetrachoric correlation coefficient. Given asymmetric row or column margins, Pearson correlations are
limited to a range smaller than -1 to 1, although tetrachoric correlations can still span the range
from -1 to 1. To illustrate, consider the following set of tables for two binary variables, X and Z:

                 Z = 0    Z = 1
      X = 0     20 - a   10 + a     30
      X = 1          a   10 - a     10
                    20       20     40

For a equal to 0, 1, 2, 5, 8, 9, and 10, the Pearson and tetrachoric correlations for the above table are

      a                 0       1       2      5       8       9      10
      Pearson       0.577   0.462   0.346      0  -0.346  -0.462  -0.577
      Tetrachoric   1.000   0.792   0.607      0  -0.607  -0.792  -1.000
The restricted range for the Pearson correlation is especially unfortunate when you try to analyze
the association between binary variables by using models developed for continuous data, such as
factor analysis and principal component analysis.
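To make the comparison concrete, here is a small sketch (hypothetical data entry, not from the manual) that should reproduce the a = 1 column of the table above by entering the four cell counts as frequency weights:
. clear
. input byte x byte z int n
  0 0 19
  0 1 11
  1 0  1
  1 1  9
  end
. correlate x z [fw=n]
. tetrachoric x z [fw=n]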
The tetrachoric correlation of two variables (Y_1, Y_2) can be thought of as the Pearson correlation
of two latent bivariate normally distributed variables (Y*_1, Y*_2) with threshold measurement models
Y_i = (Y*_i > c_i) for unknown cutpoints c_i. Or, equivalently, Y_i = (Y**_i > 0), where the latent bivariate
normal variables (Y**_1, Y**_2) are shifted versions of (Y*_1, Y*_2) so that the cutpoints are zero. Obviously, you
must judge whether assuming underlying latent variables is meaningful for the data. If this assumption
is justified, tetrachoric correlations have two advantages. First, you have an intuitive understanding of
the size of correlations that are substantively interesting in your field of research, and this intuition is
based on correlations that range from -1 to 1. Second, because the tetrachoric correlation for binary
variables estimates the Pearson correlation of the latent continuous variables (assumed multivariate
normal distributed), you can use the tetrachoric correlations to analyze multivariate relationships
between the dichotomous variables. When doing so, remember that you must interpret the model in
terms of the underlying continuous variables.
Example 1
To illustrate tetrachoric correlations, we examine three binary variables from the familyvalues
dataset (described in example 2).
. use http://www.stata-press.com/data/r13/familyvalues
(Attitudes on gender, relationships and family)
. tabulate RS075 RS076
fam att: fam att: trad
women in division of labor
charge bad 0 1 Total
0 1,564 979 2,543
1 119 632 751
Total 1,683 1,611 3,294
. correlate RS074 RS075 RS076
(obs=3291)
RS074 RS075 RS076
RS074 1.0000
RS075 0.0396 1.0000
RS076 0.1595 0.3830 1.0000
. tetrachoric RS074 RS075 RS076
(obs=3291)
RS074 RS075 RS076
RS074 1.0000
RS075 0.0689 1.0000
RS076 0.2480 0.6427 1.0000
As usual, the tetrachoric correlation coefficients are larger (in absolute value) and more dispersed
than the Pearson correlations.
Factor analysis of dichotomous variables
Example 2
Factor analysis is a popular model for measuring latent continuous traits. The standard estimators
are appropriate only for continuous unimodal data. Because of the skewness implied by Bernoulli-
distributed variables (especially when the probability is distributed unevenly), a factor analysis of a
Pearson correlation matrix can be rather misleading when used in this context. A factor analysis of
a matrix of tetrachoric correlations is more appropriate under these conditions (Uebersax 2000). We
illustrate this with data on gender, relationship, and family attitudes of spouses using the Households
in The Netherlands survey 1995 (Weesie et al. 1995). For attitude variables, it seems reasonable to
assume that agreement or disagreement is just a coarse measurement of more nuanced underlying
attitudes.
To demonstrate, we examine a few of the variables from the familyvalues dataset.
. use http://www.stata-press.com/data/r13/familyvalues
(Attitudes on gender, relationships and family)
. describe RS056-RS063
storage display value
variable name type format label variable label
RS056 byte %9.0g fam att: should be together
RS057 byte %9.0g fam att: should fight for relat
RS058 byte %9.0g fam att: should avoid conflict
RS059 byte %9.0g fam att: woman better nurturer
RS060 byte %9.0g fam att: both spouses money goo
RS061 byte %9.0g fam att: woman techn school goo
RS062 byte %9.0g fam att: man natural breadwinne
RS063 byte %9.0g fam att: common leisure good
. summarize RS056-RS063
Variable Obs Mean Std. Dev. Min Max
RS056 3298 .5630685 .4960816 0 1
RS057 3296 .5400485 .4984692 0 1
RS058 3283 .6387451 .4804374 0 1
RS059 3308 .654474 .4756114 0 1
RS060 3302 .3906723 .487975 0 1
RS061 3293 .7102946 .4536945 0 1
RS062 3307 .5857272 .4926705 0 1
RS063 3298 .5379018 .498637 0 1
. correlate RS056-RS063
(obs=3221)
RS056 RS057 RS058 RS059 RS060 RS061 RS062
RS056 1.0000
RS057 0.1350 1.0000
RS058 0.2377 0.0258 1.0000
RS059 0.1816 0.0097 0.2550 1.0000
RS060 -0.1020 -0.0538 -0.0424 0.0126 1.0000
RS061 -0.1137 0.0610 -0.1375 -0.2076 0.0706 1.0000
RS062 0.2014 0.0285 0.2273 0.4098 -0.0793 -0.2873 1.0000
RS063 0.2057 0.1460 0.1049 0.0911 0.0179 -0.0233 0.0975
RS063
RS063 1.0000
Skewness in these data is relatively modest. For comparison, here are the tetrachoric correlations:
. tetrachoric RS056-RS063
(obs=3221)
RS056 RS057 RS058 RS059 RS060 RS061 RS062
RS056 1.0000
RS057 0.2114 1.0000
RS058 0.3716 0.0416 1.0000
RS059 0.2887 0.0158 0.4007 1.0000
RS060 -0.1620 -0.0856 -0.0688 0.0208 1.0000
RS061 -0.1905 0.1011 -0.2382 -0.3664 0.1200 1.0000
RS062 0.3135 0.0452 0.3563 0.6109 -0.1267 -0.4845 1.0000
RS063 0.3187 0.2278 0.1677 0.1467 0.0286 -0.0388 0.1538
RS063
RS063 1.0000
Again we see that the tetrachoric correlations are generally larger in absolute value than the
Pearson correlations. The bivariate probit and Edwards and Edwards estimators (the edwards option)
implemented in tetrachoric may return a correlation matrix that is not positive semidefinite, even though
positive semidefiniteness is a mathematical property of any real correlation matrix. Positive definiteness is required by commands for
analyses of correlation matrices, such as factormat and pcamat; see [MV] factor and [MV] pca. The
posdef option of tetrachoric tests for positive definiteness and projects the estimated correlation
matrix to a positive-semidefinite matrix if needed.
. tetrachoric RS056-RS063, notable posdef
. matrix C = r(corr)
This time, we suppressed the display of the correlations with the notable option and requested
that the correlation matrix be positive semidefinite with the posdef option. Had the correlation matrix
not been positive definite, tetrachoric would have displayed a warning message and then adjusted
the matrix to be positive semidefinite. We placed the resulting tetrachoric correlation matrix into a
matrix, C, so that we can perform a factor analysis upon it.
tetrachoric with the posdef option asserted that C was positive definite because no warning
message was displayed. We can verify this by using a familiar characterization of symmetric positive-
definite matrices: all eigenvalues are real and positive.
. matrix symeigen eigenvectors eigenvalues = C
. matrix list eigenvalues
eigenvalues[1,8]
e1 e2 e3 e4 e5 e6 e7
r1 2.5974789 1.3544664 1.0532476 .77980391 .73462018 .57984565 .54754512
e8
r1 .35299228
We can proceed with a factor analysis on the matrix C. We use factormat and select iterated
principal factors as the estimation method; see [MV]factor.
. factormat C, n(3221) ipf factor(2)
(obs=3221)
Factor analysis/correlation Number of obs = 3221
Method: iterated principal factors Retained factors = 2
Rotation: (unrotated) Number of params = 15
Factor Eigenvalue Difference Proportion Cumulative
Factor1 2.06855 1.40178 0.7562 0.7562
Factor2 0.66677 0.47180 0.2438 1.0000
Factor3 0.19497 0.06432 0.0713 1.0713
Factor4 0.13065 0.10967 0.0478 1.1191
Factor5 0.02098 0.10085 0.0077 1.1267
Factor6 -0.07987 0.01037 -0.0292 1.0975
Factor7 -0.09024 0.08626 -0.0330 1.0645
Factor8 -0.17650 . -0.0645 1.0000
LR test: independent vs. saturated: chi2(28) = 4620.01 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
Variable Factor1 Factor2 Uniqueness
RS056 0.5528 0.4120 0.5247
RS057 0.1124 0.4214 0.8098
RS058 0.5333 0.0718 0.7105
RS059 0.6961 -0.1704 0.4865
RS060 -0.1339 -0.0596 0.9785
RS061 -0.5126 0.2851 0.6560
RS062 0.7855 -0.2165 0.3361
RS063 0.2895 0.3919 0.7626
Example 3
We noted in example 2 that the matrix of estimates of the tetrachoric correlation coefficients need
not be positive definite. Here is an example:
. use http://www.stata-press.com/data/r13/familyvalues
(Attitudes on gender, relationships and family)
. tetrachoric RS056-RS063 in 1/20, posdef
(obs=18)
matrix with tetrachoric correlations is not positive semidefinite;
it has 2 negative eigenvalues
maxdiff(corr,adj-corr) = 0.2346
(adj-corr: tetrachoric correlations adjusted to be positive semidefinite)
adj-corr RS056 RS057 RS058 RS059 RS060 RS061 RS062
RS056 1.0000
RS057 0.5284 1.0000
RS058 0.3012 0.2548 1.0000
RS059 0.3251 0.2791 0.0550 1.0000
RS060 -0.5197 -0.4222 -0.7163 0.0552 1.0000
RS061 0.3448 0.4815 -0.0958 -0.1857 -0.0980 1.0000
RS062 0.1066 -0.0375 0.0072 0.3909 -0.2333 -0.7654 1.0000
RS063 0.3830 0.4939 0.4336 0.0075 -0.8937 -0.0337 0.4934
adj-corr RS063
RS063 1.0000
. mata:
mata (type to exit)
: C2 = st_matrix("r(corr)")
: eigenvecs = .
: eigenvals = .
: symeigensystem(C2, eigenvecs, eigenvals)
: eigenvals
1 2 3 4
1 3.156592567 2.065279398 1.324911199 .7554904485
5 6 7 8
1 .4845368741 .2131895139 -8.80914e-19 -1.90196e-16
: end
The estimated tetrachoric correlation matrix is rank-2 deficient. With this C2 matrix, we can only
use models of correlation that allow for singular cases.
Tetrachoric correlations with simulated data
Example 4
We use drawnorm (see [D]drawnorm) to generate a sample of 1,000 observations from a bivariate
normal distribution with means 1 and 1, unit variances, and correlation 0.4.
. clear
. set seed 11000
. matrix m = (1, -1)
. matrix V = (1, 0.4 \ 0.4, 1)
. drawnorm c1 c2, n(1000) means(m) cov(V)
(obs 1000)
Now consider the measurement model assumed by the tetrachoric correlations. We observe only
whether c1 and c2 are greater than zero,
. generate d1 = (c1 > 0)
. generate d2 = (c2 > 0)
. tabulate d1 d2
d2
d1 0 1 Total
0 176 6 182
1 656 162 818
Total 832 168 1,000
We want to estimate the correlation of c1 and c2 from the binary variables d1 and d2. Pearson’s
correlation of the binary variables d1 and d2 is 0.170, a seriously biased estimate of the underlying
correlation ρ = 0.4.
. correlate d1 d2
(obs=1000)
d1 d2
d1 1.0000
d2 0.1704 1.0000
The tetrachoric correlation coefficient of d1 and d2 estimates the Pearson correlation of the latent
continuous variables, c1 and c2.
. tetrachoric d1 d2
Number of obs = 1000
Tetrachoric rho = 0.4790
Std error = 0.0700
Test of Ho: d1 and d2 are independent
2-sided exact P = 0.0000
The estimate of the tetrachoric correlation of d1 and d2, 0.4790, is much closer to the underlying
correlation, 0.4, between c1 and c2.
Stored results
tetrachoric stores the following in r():
Scalars
r(rho) tetrachoric correlation coefficient between variables 1 and 2
r(N) number of observations
r(nneg) number of negative eigenvalues (posdef only)
r(se_rho) standard error of r(rho)
r(p) exact two-sided significance level
Macros
r(method) estimator used
Matrices
r(Rho) tetrachoric correlation matrix
r(Se_Rho) standard errors of r(Rho)
r(Nobs) number of observations used in computing correlation
r(P) exact two-sided significance level matrix
Methods and formulas
tetrachoric provides two estimators for the tetrachoric correlation ρ of two binary variables with
the frequencies n_ij, i, j = 0, 1. tetrachoric defaults to the slower (iterative) maximum likelihood
estimator obtained from bivariate probit without explanatory variables (see [R] biprobit) by using
the Edwards and Edwards noniterative estimator as the initial value. A fast (noniterative) estimator is
also available by specifying the edwards option (Edwards and Edwards 1984; Digby 1983),

    ρ̂ = (α - 1)/(α + 1)

where

    α = {(n_00 n_11)/(n_01 n_10)}^(π/4)        (π = 3.14 . . .)

if all n_ij > 0. If n_00 = 0 or n_11 = 0, ρ̂ = -1; if n_01 = 0 or n_10 = 0, ρ̂ = 1.
The asymptotic variance of the Edwards and Edwards estimator of the tetrachoric correlation is
easily obtained by the delta method,

    avar(ρ̂) = {πα / (2(1 + α)²)}² (1/n_00 + 1/n_01 + 1/n_10 + 1/n_11)

provided all n_ij > 0; otherwise it is left undefined (missing). The Edwards and Edwards estimator
is fast but may be inaccurate if the margins are very skewed.
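As an illustration (not part of the manual's formulas), the noniterative estimator and its delta-method standard error can be computed by hand in Mata; the cell counts below are taken from the tabulation of d1 and d2 in example 4 above. This reproduces only the starting value, not the maximum likelihood estimate that tetrachoric reports by default.
. mata:
: n00 = 176; n01 = 6; n10 = 656; n11 = 162      // cell counts n_ij from example 4
: a = (n00*n11/(n01*n10))^(pi()/4)
: rho = (a - 1)/(a + 1)                          // Edwards and Edwards estimator
: v = (pi()*a/(2*(1+a)^2))^2 * (1/n00 + 1/n01 + 1/n10 + 1/n11)
: (rho, sqrt(v))                                 // estimate and delta-method std. error
: end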
tetrachoric reports exact p-values for statistical independence, computed by the exact option
of [R] tabulate twoway.
References
Brown, M. B. 1977. Algorithm AS 116: The tetrachoric correlation and its asymptotic standard error. Applied Statistics
26: 343–351.
Brown, M. B., and J. K. Benedetti. 1977. On the mean and variance of the tetrachoric correlation coefficient.
Psychometrika 42: 347–355.
Digby, P. G. N. 1983. Approximating the tetrachoric correlation coefficient. Biometrics 39: 753–757.
Edwards, J. H., and A. W. F. Edwards. 1984. Approximating the tetrachoric correlation coefficient. Biometrics 40:
563.
Golub, G. H., and C. F. Van Loan. 1996. Matrix Computations. 3rd ed. Baltimore: Johns Hopkins University Press.
Uebersax, J. S. 2000. Estimating a latent trait model by factor analysis of tetrachoric correlations.
http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm.
Weesie, J., M. Kalmijn, W. Bernasco, and D. Giesen. 1995. Households in The Netherlands 1995. Utrecht, Netherlands:
Datafile, ISCORE, University of Utrecht.
Also see
[R] biprobit — Bivariate probit regression
[R] correlate — Correlations (covariances) of variables or coefficients
[R] spearman — Spearman's and Kendall's correlations
[R] tabulate twoway — Two-way table of frequencies
[MV] factor — Factor analysis
[MV] pca — Principal component analysis
Title
tnbreg — Truncated negative binomial regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
tnbreg depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
  noconstant               suppress constant term
  ll(# | varname)          truncation point; default value is ll(0), zero truncation
  dispersion(mean)         parameterization of dispersion; the default
  dispersion(constant)     constant dispersion for all observations
  exposure(varname_e)      include ln(varname_e) in model with coefficient constrained to 1
  offset(varname_o)        include varname_o in model with coefficient constrained to 1
  constraints(constraints) apply specified linear constraints
  collinear                keep collinear variables
SE/Robust
  vce(vcetype)             vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
                             or jackknife
Reporting
  level(#)                 set confidence level; default is level(95)
  nolrtest                 suppress likelihood-ratio test
  irr                      report incidence-rate ratios
  nocnsreport              do not display constraints
  display_options          control column formats, row spacing, line width, display of omitted
                             variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options         control the maximization process; seldom used
  coeflegend               display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Count outcomes >Truncated negative binomial regression
Description
tnbreg estimates the parameters of a truncated negative binomial model by maximum likelihood.
The dependent variable depvar is regressed on indepvars, where depvar is a positive count variable
whose values are all above the truncation point.
Options
 
Model
noconstant; see [R] estimation options.
ll(# | varname) specifies the truncation point, which is a nonnegative integer. The default is zero
truncation, ll(0).
dispersion(mean | constant) specifies the parameterization of the model. dispersion(mean),
the default, yields a model with dispersion equal to 1 + α exp(x_j β + offset_j); that is, the dispersion
is a function of the expected mean: exp(x_j β + offset_j). dispersion(constant) has dispersion
equal to 1 + δ; that is, it is a constant for all observations.
exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R]estimation options.
nolrtest suppresses fitting the Poisson model. Without this option, a comparison Poisson model is
fit, and the likelihood is used in a likelihood-ratio test of the null hypothesis that the dispersion
parameter is zero.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^{β_i} rather than β_i.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. irr may be specified at estimation or when
replaying previously estimated results.
nocnsreport; see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with tnbreg but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Grogger and Carson (1991) showed that overdispersion causes inconsistent estimation of the
mean in the truncated Poisson model. To solve this problem, they proposed using the truncated
negative binomial model as an alternative. If data are truncated but do not exhibit overdispersion,
the truncated Poisson model is more appropriate; see [R] tpoisson. For an introduction to negative
binomial regression, see Cameron and Trivedi (2005, 2010) and Long and Freese (2014). For an
introduction to truncated negative binomial models, see Cameron and Trivedi (2013) and Long (1997,
chap. 8).
tnbreg fits the mean-dispersion and the constant-dispersion parameterizations of truncated negative
binomial models. These parameterizations extend those implemented in nbreg; see [R]nbreg.
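For instance, with the data of example 1 below, the constant-dispersion parameterization would be requested simply by changing the dispersion() option (this variant is not shown in the manual's examples; the default remains dispersion(mean)):
. tnbreg los died hmo type2-type3, dispersion(constant)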
Example 1
We illustrate the truncated negative binomial model using the 1997 MedPar dataset (Hilbe 1999).
The data are from 1,495 patients in Arizona who were assigned to a diagnostic-related group (DRG)
of patients having a ventilator. Length of stay (los), the dependent variable, is a positive integer; it
cannot have zero values. The data are truncated because there are no observations on individuals who
stayed for zero days.
The objective of this example is to determine whether the length of stay was related to the binary
variables: died, hmo, type1, type2, and type3.
The died variable was recorded as a 0 unless the patient died, in which case, it was recorded
as a 1. The other variables also adopted this encoding. The hmo variable was set to 1 if the patient
belonged to a health maintenance organization (HMO).
The type1–type3 variables indicated the type of admission used for the patient. The type1
variable indicated an emergency admit. The type2 variable indicated an urgent admit, that is, the
first available bed. The type3 variable indicated an elective admission. Because type1–type3 were
mutually exclusive, only two of the three could be used in the truncated negative binomial regression
shown below.
. use http://www.stata-press.com/data/r13/medpar
. tnbreg los died hmo type2-type3, vce(cluster provnum) nolog
Truncated negative binomial regression
Truncation point: 0 Number of obs = 1495
Dispersion = mean Wald chi2(4) = 36.01
Log likelihood = -4737.535 Prob > chi2 = 0.0000
(Std. Err. adjusted for 54 clusters in provnum)
Robust
los Coef. Std. Err. z P>|z| [95% Conf. Interval]
died -.2521884 .061533 -4.10 0.000 -.3727908 -.1315859
hmo -.0754173 .0533132 -1.41 0.157 -.1799091 .0290746
type2 .2685095 .0666474 4.03 0.000 .137883 .3991359
type3 .7668101 .2183505 3.51 0.000 .338851 1.194769
_cons 2.224028 .034727 64.04 0.000 2.155964 2.292091
/lnalpha -.630108 .0764019 -.779853 -.480363
alpha .5325343 .0406866 .4584734 .6185588
Because observations within the same hospital (provnum) are likely to be correlated, we specified
the vce(cluster provnum) option. The results show that whether the patient died in the hospital
and the type of admission have significant effects on the patient’s length of stay.
Example 2
To illustrate truncated negative binomial regression with more complex data than the previous
example, similar data were created from 100 hospitals. Each hospital had its own way of tracking
patient data. In particular, hospitals only recorded data from patients with a minimum length of stay,
denoted by the variable minstay.
Definitions for minimum length of stay varied among hospitals, typically, from 5 to 18 days. The
objective of this example is the same as before: to determine whether the length of stay, recorded in
los, was related to the binary variables: died, hmo, type1, type2, and type3.
The binary variables encode the same information as in example 1 above. The minstay variable
was used to allow for varying truncation points.
. use http://www.stata-press.com/data/r13/medproviders
. tnbreg los died hmo type2-type3, ll(minstay) vce(cluster hospital) nolog
Truncated negative binomial regression
Truncation points: minstay Number of obs = 2144
Dispersion = mean Wald chi2(4) = 15.22
Log likelihood = -7864.0928 Prob > chi2 = 0.0043
(Std. Err. adjusted for 100 clusters in hospital)
Robust
los Coef. Std. Err. z P>|z| [95% Conf. Interval]
died .0781044 .0303596 2.57 0.010 .0186006 .1376081
hmo -.0731128 .0368897 -1.98 0.047 -.1454152 -.0008104
type2 .0294136 .0390167 0.75 0.451 -.0470578 .1058849
type3 .0626352 .054012 1.16 0.246 -.0432265 .1684969
_cons 3.014964 .0290886 103.65 0.000 2.957951 3.071977
/lnalpha -.9965131 .082867 -1.158929 -.8340967
alpha .3691645 .0305916 .313822 .4342666
In this analysis, two variables have a statistically significant relationship with length of stay.
On average, patients who died in the hospital had longer lengths of stay (p = 0.01). Because the
coefficient for hmo is negative, that is, b_HMO = -0.073, on average, patients who were insured by an
HMO had shorter lengths of stay (p = 0.047). The type of admission was not statistically significant
(p > 0.05).
Stored results
tnbreg stores the following in e():
Scalars
e(N)               number of observations
e(k)               number of parameters
e(k_aux)           number of auxiliary parameters
e(k_eq)            number of equations in e(b)
e(k_eq_model)      number of equations in overall model test
e(k_dv)            number of dependent variables
e(df_m)            model degrees of freedom
e(r2_p)            pseudo-R-squared
e(ll)              log likelihood
e(ll_0)            log likelihood, constant-only model
e(ll_c)            log likelihood, comparison model
e(alpha)           value of alpha
e(N_clust)         number of clusters
e(chi2)            χ²
e(chi2_c)          χ² for comparison test
e(p)               significance
e(rank)            rank of e(V)
e(rank0)           rank of e(V) for constant-only model
e(ic)              number of iterations
e(rc)              return code
e(converged)       1 if converged, 0 otherwise
Macros
e(cmd) tnbreg
e(cmdline) command as typed
e(depvar) name of dependent variable
e(llopt) contents of ll(), or 0 if not specified
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ² test
e(chi2_ct) Wald or LR; type of model χ² test corresponding to e(chi2_c)
e(dispers) mean or constant
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variancecovariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
Mean-dispersion model
Constant-dispersion model
Mean-dispersion model
A negative binomial distribution can be regarded as a gamma mixture of Poisson random variables.
The number of times an event occurs, y_j, is distributed as Poisson(ν_j μ_j). That is, its conditional
likelihood is

f(y_j \mid \nu_j) = \frac{(\nu_j \mu_j)^{y_j} e^{-\nu_j \mu_j}}{\Gamma(y_j + 1)}

where μ_j = exp(x_j β + offset_j) and ν_j is an unobserved parameter with a Gamma(1/α, α) density:

g(\nu) = \frac{\nu^{(1-\alpha)/\alpha}\, e^{-\nu/\alpha}}{\alpha^{1/\alpha}\,\Gamma(1/\alpha)}

This gamma distribution has a mean of 1 and a variance of α, where α is our ancillary parameter.
The unconditional likelihood for the jth observation is therefore

f(y_j) = \int_0^{\infty} f(y_j \mid \nu)\, g(\nu)\, d\nu = \frac{\Gamma(m + y_j)}{\Gamma(y_j + 1)\,\Gamma(m)}\, p_j^{m} (1 - p_j)^{y_j}

where p_j = 1/(1 + αμ_j) and m = 1/α. Solutions for α are handled by searching for ln α because α
must be greater than zero. The conditional probability of observing y_j events given that y_j is greater
than the truncation point τ_j is

\Pr(Y = y_j \mid y_j > \tau_j, \mathbf{x}_j) = \frac{f(y_j)}{\Pr(Y > \tau_j \mid \mathbf{x}_j)}

The log likelihood (with weights w_j and offsets) is given by

m = 1/\alpha \qquad p_j = 1/(1 + \alpha\mu_j) \qquad \mu_j = \exp(\mathbf{x}_j\boldsymbol{\beta} + \mathrm{offset}_j)

\ln L = \sum_{j=1}^{n} w_j \Big[\, \ln\{\Gamma(m + y_j)\} - \ln\{\Gamma(y_j + 1)\} - \ln\{\Gamma(m)\} + m\ln(p_j) + y_j\ln(1 - p_j) - \ln\{\Pr(Y > \tau_j \mid p_j, m)\} \,\Big]
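For the default zero truncation (τ_j = 0), the normalizing probability has a simple closed form; as a
brief added note, because Pr(Y = 0 | p_j, m) = p_j^m for this distribution,

\Pr(Y > 0 \mid p_j, m) = 1 - p_j^{m}

so the final term of the log likelihood reduces to -\ln(1 - p_j^{m}).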
Constant-dispersion model
The constant-dispersion model assumes that y_j is conditionally distributed as Poisson(μ*_j), where
μ*_j ~ Gamma(μ_j/δ, δ) for some dispersion parameter δ [by contrast, the mean-dispersion model
assumes that μ*_j ~ Gamma(1/α, αμ_j)]. The log likelihood is given by

m_j = \mu_j/\delta \qquad p = 1/(1 + \delta)

\ln L = \sum_{j=1}^{n} w_j \Big[\, \ln\{\Gamma(m_j + y_j)\} - \ln\{\Gamma(y_j + 1)\} - \ln\{\Gamma(m_j)\} + m_j\ln(p) + y_j\ln(1 - p) - \ln\{\Pr(Y > \tau_j \mid p, m_j)\} \,\Big]
with everything else defined as shown above in the calculations for the mean-dispersion model.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
tnbreg also supports estimation with survey data. For details on variance–covariance estimates
with survey data, see [SVY] variance estimation.
Acknowledgment
We gratefully acknowledge the previous work by Joseph Hilbe (1999) of Arizona State University,
a past editor of the Stata Technical Bulletin and coauthor of the Stata Press book Generalized
Linear Models and Extensions.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.
Grogger, J. T., and R. T. Carson. 1991. Models for truncated counts. Journal of Applied Econometrics 6: 225–238.
Hilbe, J. M. 1998. sg91: Robust variance estimators for MLE Poisson and negative binomial regression. Stata Technical
Bulletin 45: 26–28. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 177–180. College Station, TX: Stata
Press.
Hilbe, J. M. 1999. sg102: Zero-truncated Poisson and negative binomial regression. Stata Technical Bulletin 47: 37–40.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 233–236. College Station, TX: Stata Press.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Simonoff, J. S. 2003. Analyzing Categorical Data. New York: Springer.
Also see
[R]tnbreg postestimation Postestimation tools for tnbreg
[R]nbreg Negative binomial regression
[R]poisson Poisson regression
[R]tpoisson Truncated Poisson regression
[R]zinb Zero-inflated negative binomial regression
[R]zip Zero-inflated Poisson regression
[SVY]svy estimation Estimation commands for survey data
[XT]xtnbreg Fixed-effects, random-effects, & population-averaged negative binomial models
[U] 20 Estimation and postestimation commands
Title
tnbreg postestimation — Postestimation tools for tnbreg
Description Syntax for predict Menu for predict Options for predict
Methods and formulas Also see
Description
The following postestimation commands are available after tnbreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast[1] dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest[2] likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
[1] forecast is not appropriate with svy estimation results.
[2] lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvarreg newvardisp} [if] [in], scores
statistic Description
Main
n number of events; the default
ir incidence rate
cm conditional mean, E(yj | yj > τj)
pr(n) probability Pr(yj = n)
pr(a,b) probability Pr(a ≤ yj ≤ b)
cpr(n) conditional probability Pr(yj = n | yj > τj)
cpr(a,b) conditional probability Pr(a ≤ yj ≤ b | yj > τj)
xb linear prediction
stdp standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is exp(xjβ) if neither offset()
nor exposure() was specified when the model was fit; exp(xjβ + offsetj) if offset() was
specified; or exp(xjβ) × exposurej if exposure() was specified.
ir calculates the incidence rate exp(xjβ), which is the predicted number of events when exposure
is 1. This is equivalent to specifying both the n and the nooffset options.
cm calculates the conditional mean,

E(y_j \mid y_j > \tau_j) = \frac{E(y_j)}{\Pr(y_j > \tau_j)}

where τj is the truncation point found in e(llopt).
pr(n) calculates the probability Pr(yj = n), where n is a nonnegative integer that may be specified
as a number or a variable.
pr(a,b) calculates the probability Pr(a ≤ yj ≤ b), where a and b are nonnegative integers that may
be specified as numbers or variables;
b missing (b ≥ .) means +∞;
pr(20,.) calculates Pr(yj ≥ 20);
pr(20,b) calculates Pr(yj ≥ 20) in observations for which b ≥ . and calculates
Pr(20 ≤ yj ≤ b) elsewhere.
pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a
missing value in that observation for pr(a,b).
cpr(n) calculates the conditional probability Pr(yj = n | yj > τj), where τj is the truncation point
found in e(llopt). n is an integer greater than the truncation point that may be specified as a
number or a variable.
cpr(a,b) calculates the conditional probability Pr(a ≤ yj ≤ b | yj > τj), where τj is the truncation
point found in e(llopt). The syntax for this option is analogous to that used for pr(a,b) except
that a must be greater than the truncation point.
xb calculates the linear prediction, which is xjβ if neither offset() nor exposure() was specified
when the model was fit; xjβ + offsetj if offset() was specified; or xjβ + ln(exposurej) if
exposure() was specified; see nooffset below.
stdp calculates the standard error of the linear prediction.
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It
modifies the calculations made by predict so that they ignore the offset or exposure variable; the
linear prediction is treated as xjβ rather than as xjβ + offsetj or xjβ + ln(exposurej). Specifying
predict . . . , nooffset is equivalent to specifying predict . . . , ir.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(xjβ).
The second new variable will contain ∂lnL/∂(lnα) for dispersion(mean).
The second new variable will contain ∂lnL/∂(lnδ) for dispersion(constant).
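A minimal sketch of these options in use (assuming the medproviders data and the model fit in
[R] tnbreg are in memory; the new variable names are arbitrary):
. predict nlos            // predicted number of events
. predict cmlos, cm       // conditional mean given los > minstay
. predict p5, pr(5)       // unconditional Pr(los = 5)
The cpr() options work the same way but require their arguments to exceed the truncation point for
every observation.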
Methods and formulas
In the following formulas, we use the same notation as in [R]tnbreg.
Methods and formulas are presented under the following headings:
Mean-dispersion model
Constant-dispersion model
Mean-dispersion model
The equation-level scores are given by
\mathrm{score}(\mathbf{x}\boldsymbol{\beta})_j = p_j(y_j - \mu_j) - \frac{p_j^{(m+1)}\,\mu_j}{\Pr(Y > \tau_j \mid p_j, m)}

\mathrm{score}(\omega)_j = -m\left\{\frac{\alpha(\mu_j - y_j)}{1 + \alpha\mu_j} - \ln(1 + \alpha\mu_j) + \psi(y_j + m) - \psi(m)\right\} - \frac{p_j^{m}}{\Pr(Y > \tau_j \mid p_j, m)}\,\{m\ln(p_j) + \mu_j p_j\}

where ω_j = ln α_j, ψ(z) is the digamma function, and τ_j is the truncation point found in e(llopt).
Constant-dispersion model
The equation-level scores are given by
\mathrm{score}(\mathbf{x}\boldsymbol{\beta})_j = m_j\left\{\psi(y_j + m_j) - \psi(m_j) + \ln(p) + \frac{p^{m_j}\ln(p)}{\Pr(Y > \tau_j \mid p, m_j)}\right\}

\mathrm{score}(\omega)_j = y_j - (y_j + m_j)(1 - p) - \mathrm{score}(\mathbf{x}\boldsymbol{\beta})_j - \frac{\mu_j\, p}{\Pr(Y > \tau_j \mid p, m_j)}

where ω_j = ln δ_j and τ_j is the truncation point found in e(llopt).
Also see
[R]tnbreg Truncated negative binomial regression
[U] 20 Estimation and postestimation commands
Title
tobit — Tobit regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
tobit depvar [indepvars] [if] [in] [weight] [, ll(#) ul(#) options]
options Description
Model
noconstant suppress constant term
ll(#) left-censoring limit
ul(#) right-censoring limit
offset(varname) include varname in model with coefficient constrained to 1
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar, bootstrap, or
jackknife
Reporting
level(#) set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
You must specify at least one of ll(#) or ul(#).
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,nestreg,rolling,statsby,stepwise, and svy are allowed; see [U] 11.1.10 Prefix
commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Censored regression >Tobit regression
Description
tobit fits a model of depvar on indepvars where the censoring values are fixed.
Options
 
Model
noconstant; see [R]estimation options.
ll(#) and ul(#) indicate the lower and upper limits for censoring, respectively. You may specify
one or both. Observations with depvar ≤ ll() are left-censored; observations with depvar ≥ ul()
are right-censored; and remaining observations are not censored. You do not have to specify the
censoring values at all. It is enough to type ll, ul, or both. When you do not specify a censoring
value, tobit assumes that the lower limit is the minimum observed in the data (if ll is specified)
and the upper limit is the maximum (if ul is specified).
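A minimal sketch of the two ways of specifying the limit (using the auto dataset that also appears
in the examples below):
. use http://www.stata-press.com/data/r13/auto, clear
. generate wgt = weight/1000
. tobit mpg wgt, ll         // lower limit taken to be the minimum of mpg
. tobit mpg wgt, ll(17)     // values of mpg at or below 17 treated as censored at 17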
offset(varname); see [R]estimation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived
from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
 
Maximization
maximize options:iterate(#),nolog,trace,tolerance(#),ltolerance(#),
nrtolerance(#), and nonrtolerance; see [R]maximize. These options are seldom used.
Unlike most maximum likelihood commands, tobit defaults to nolog; that is, it suppresses the
iteration log.
The following option is available with tobit but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Tobit estimation was originally developed by Tobin (1958). A consumer durable was purchased if
a consumer’s desire was high enough, where desire was measured by the dollar amount spent by the
purchaser. If no purchase was made, the measure of desire was censored at zero.
Example 1: Censored from below
We will demonstrate tobit with an artificial example, which in the process will allow us to
emphasize the assumptions underlying the estimation. We have a dataset containing the mileage
ratings and weights of 74 cars. There are no censored variables in this dataset, but we are going to
create one. Before that, however, the relationship between mileage and weight in our complete data is
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate wgt = weight/1000
. regress mpg wgt
Source SS df MS Number of obs = 74
F( 1, 72) = 134.62
Model 1591.99024 1 1591.99024 Prob > F = 0.0000
Residual 851.469221 72 11.8259614 R-squared = 0.6515
Adj R-squared = 0.6467
Total 2443.45946 73 33.4720474 Root MSE = 3.4389
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt -6.008687 .5178782 -11.60 0.000 -7.041058 -4.976316
_cons 39.44028 1.614003 24.44 0.000 36.22283 42.65774
(We divided weight by 1,000 simply to make discussing the resulting coefficients easier. We find
that each additional 1,000 pounds of weight reduces mileage by 6 mpg.)
mpg in our data ranges from 12 to 41. Let us now pretend that our data were censored in the sense
that we could not observe a mileage rating below 17 mpg. If the true mpg is 17 or less, all we know
is that the mpg is less than or equal to 17:
. replace mpg=17 if mpg<=17
(14 real changes made)
. tobit mpg wgt, ll
Tobit regression Number of obs = 74
LR chi2(1) = 72.85
Prob > chi2 = 0.0000
Log likelihood = -164.25438 Pseudo R2 = 0.1815
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt -6.87305 .7002559 -9.82 0.000 -8.268658 -5.477442
_cons 41.49856 2.05838 20.16 0.000 37.39621 45.6009
/sigma 3.845701 .3663309 3.115605 4.575797
Obs. summary: 18 left-censored observations at mpg<=17
56 uncensored observations
0 right-censored observations
The replace before estimation was not really necessary; we remapped all the mileage ratings below
17 to 17 merely to reassure you that tobit was not somehow using uncensored data. We typed ll
after tobit to inform tobit that the data were left-censored. tobit found the minimum of mpg in
our data and assumed that was the censoring point. We could also have dispensed with replace and
typed ll(17), informing tobit that all values of the dependent variable 17 and below are really
censored at 17. In either case, at the bottom of the table, we are informed that there are, as a result,
18 left-censored observations.
On these data, our estimate is now a reduction of 6.9 mpg per 1,000 extra pounds of weight as
opposed to 6.0. The parameter reported as /sigma is the estimated standard error of the regression;
the resulting 3.8 is comparable with the estimated root mean squared error reported by regress of
3.4.
Technical note
You would never want to throw away information by purposefully censoring variables. The regress
estimates are in every way preferable to those of tobit. Our example is designed solely to illustrate
the relationship between tobit and regress. If you have uncensored data, use regress. If your
data are censored, you have no choice but to use tobit.
Example 2: Censored from above
tobit can also fit models that are censored from above. This time, let’s assume that we do not
observe the actual mileage rating of cars yielding 24 mpg or better; we know only that it is at least
24. (Also assume that we have undone the change to mpg we made in the previous example.)
. use http://www.stata-press.com/data/r13/auto, clear
(1978 Automobile Data)
. generate wgt = weight/1000
. regress mpg wgt
(output omitted )
. tobit mpg wgt, ul(24)
Tobit regression Number of obs = 74
LR chi2(1) = 90.72
Prob > chi2 = 0.0000
Log likelihood = -129.8279 Pseudo R2 = 0.2589
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt -5.080645 .43493 -11.68 0.000 -5.947459 -4.213831
_cons 36.08037 1.432056 25.19 0.000 33.22628 38.93445
/sigma 2.385357 .2444604 1.898148 2.872566
Obs. summary: 0 left-censored observations
51 uncensored observations
23 right-censored observations at mpg>=24
Example 3: Two-limit tobit model
tobit can also fit models that are censored from both sides (the so-called two-limit tobit):
. tobit mpg wgt, ll(17) ul(24)
Tobit regression Number of obs = 74
LR chi2(1) = 77.60
Prob > chi2 = 0.0000
Log likelihood = -104.25976 Pseudo R2 = 0.2712
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt -5.764448 .7245417 -7.96 0.000 -7.208457 -4.320438
_cons 38.07469 2.255917 16.88 0.000 33.57865 42.57072
/sigma 2.886337 .3952143 2.098676 3.673998
Obs. summary: 18 left-censored observations at mpg<=17
33 uncensored observations
23 right-censored observations at mpg>=24
Stored results
tobit stores the following in e():
Scalars
e(N) number of observations
e(N_unc) number of uncensored observations
e(N_lc) number of left-censored observations
e(N_rc) number of right-censored observations
e(llopt) contents of ll(), if specified
e(ulopt) contents of ul(), if specified
e(k_aux) number of auxiliary parameters
e(df_m) model degrees of freedom
e(df_r) residual degrees of freedom
e(r2_p) pseudo-R-squared
e(chi2) χ2
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(F) F statistic
e(p) significance
e(rank) rank of e(V)
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) tobit
e(cmdline) command as typed
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) LR; type of model χ2 test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(predict) program used to implement predict
e(footnote) program and arguments to display footnote
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
 
James Tobin (1918–2002) was an American economist who after education and research at Harvard
moved to Yale, where he was on the faculty from 1950 to 1988. He made many outstanding
contributions to economics and was awarded the Nobel Prize in 1981 "for his analysis of financial
markets and their relations to expenditure decisions, employment, production and prices". He
trained in the U.S. Navy with the writer Herman Wouk, who later fashioned a character after
Tobin in the novel The Caine Mutiny (1951): "A mandarin-like midshipman named Tobit, with
a domed forehead, measured quiet speech, and a mind like a sponge, was ahead of the field by
a spacious percentage."
 
Methods and formulas
See Methods and formulas in [R]intreg.
See Tobin (1958) for the original derivation of the tobit model. An introductory description
of the tobit model can be found in, for instance, Wooldridge (2013, sec. 17.2), Davidson and
MacKinnon (2004, 484–486), Long (1997, 196–210), and Maddala and Lahiri (2006, 333–336).
Cameron and Trivedi (2010, chap. 16) discuss the tobit model using Stata examples.
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
tobit also supports estimation with survey data. For details on VCEs with survey data, see
[SVY] variance estimation.
References
Amemiya, T. 1973. Regression analysis when the dependent variable is truncated normal. Econometrica 41: 997–1016.
Amemiya, T. 1984. Tobit models: A survey. Journal of Econometrics 24: 3–61.
Burke, W. J. 2009. Fitting and interpreting Cragg's tobit alternative using Stata. Stata Journal 9: 584–592.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cong, R. 2000. sg144: Marginal effects of the tobit model. Stata Technical Bulletin 56: 27–34. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 189–197. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Drukker, D. M. 2002. Bootstrapping a conditional moments test for normality after tobit estimation. Stata Journal 2:
125–139.
Goldberger, A. S. 1983. Abnormal selection bias. In Studies in Econometrics, Time Series, and Multivariate Statistics,
ed. S. Karlin, T. Amemiya, and L. A. Goodman, 67–84. New York: Academic Press.
Hurd, M. 1979. Estimation in truncated samples when there is heteroscedasticity. Journal of Econometrics 11: 247–258.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Maddala, G. S., and K. Lahiri. 2006. Introduction to Econometrics. 4th ed. New York: Wiley.
McDonald, J. F., and R. A. Moffitt. 1980. The use of tobit analysis. Review of Economics and Statistics 62: 318–321.
Shiller, R. J. 1999. The ET interview: Professor James Tobin. Econometric Theory 15: 867–900.
Stewart, M. B. 1983. On least squares estimation when the dependent variable is grouped. Review of Economic
Studies 50: 737–753.
Tobin, J. 1958. Estimation of relationships for limited dependent variables. Econometrica 26: 24–36.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Also see
[R]tobit postestimation Postestimation tools for tobit
[R]heckman Heckman selection model
[R]intreg Interval regression
[R]ivtobit Tobit model with continuous endogenous regressors
[R]regress Linear regression
[R]truncreg Truncated regression
[SVY]svy estimation Estimation commands for survey data
[XT]xtintreg Random-effects interval-data regression models
[XT]xttobit Random-effects tobit models
[U] 20 Estimation and postestimation commands
Title
tobit postestimation — Postestimation tools for tobit
Description Syntax for predict Menu for predict Options for predict
Remarks and examples References Also see
Description
The following postestimation commands are available after tobit:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast[1] dynamic forecasts and simulations
hausman Hausman’s specification test
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
lrtest[2] likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
[1] forecast is not appropriate with svy estimation results.
[2] lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
predict [type] {stub* | newvarreg newvarsigma} [if] [in], scores
statistic Description
Main
xb linear prediction; the default
stdp standard error of the linear prediction
stdf standard error of the forecast
pr(a,b) Pr(a < yj < b)
e(a,b) E(yj | a < yj < b)
ystar(a,b) E(y*j), where y*j = max{a, min(yj, b)}
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .)
means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R]regress postestimation.
pr(a,b) calculates Pr(a < xjb + uj < b), the probability that yj|xj would be observed in the
interval (a, b).
a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < xjb + uj < 30);
pr(lb,ub) calculates Pr(lb < xjb + uj < ub); and
pr(20,ub) calculates Pr(20 < xjb + uj < ub).
a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < xjb + uj < 30);
pr(lb,30) calculates Pr(−∞ < xjb + uj < 30) in observations for which lb ≥ .
and calculates Pr(lb < xjb + uj < 30) elsewhere.
b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > xjb + uj > 20);
pr(20,ub) calculates Pr(+∞ > xjb + uj > 20) in observations for which ub ≥ .
and calculates Pr(20 < xjb + uj < ub) elsewhere.
e(a,b) calculates E(xjb + uj | a < xjb + uj < b), the expected value of yj|xj conditional on
yj|xj being in the interval (a, b), meaning that yj|xj is truncated.
a and b are specified as they are for pr().
ystar(a,b) calculates E(y*j), where y*j = a if xjb + uj ≤ a, y*j = b if xjb + uj ≥ b, and
y*j = xjb + uj otherwise, meaning that y*j is censored. a and b are specified as they are for pr().
nooffset is relevant only if you specified offset(varname). It modifies the calculations made by
predict so that they ignore the offset variable; the linear prediction is treated as xjb rather than
as xjb + offsetj.
scores calculates equation-level score variables.
The first new variable will contain ∂lnL/∂(xjβ).
The second new variable will contain ∂lnL/∂σ.
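A minimal sketch of these options (assuming the two-limit model of example 1 below has just been
fit; the new variable names are arbitrary):
. predict xbhat, xb               // linear prediction
. predict pmid, pr(17,24)         // probability of being uncensored
. predict emid, e(17,24)          // truncated mean
. predict ystarhat, ystar(17,24)  // censored mean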
Remarks and examples
Following Cong (2000), write the tobit model as

y_i^* = \begin{cases} y_i & \text{if } a < y_i < b \\ a & \text{if } y_i \le a \\ b & \text{if } y_i \ge b \end{cases}

yi is a latent variable; instead, we observe y*i, which is bounded between a and b if yi is outside
those bounds.
There are four types of marginal effects that may be of interest in the tobit model, depending on
the application:
1. The β coefficients themselves measure how the unobserved variable yi changes with respect
to changes in the regressors.
2. The marginal effects of the truncated expected value E(y*i | a < y*i < b) measure the changes
in yi with respect to changes in the regressors among the subpopulation for which yi is not
at a boundary.
3. The marginal effects of the censored expected value E(y*i) describe how the observed
variable y*i changes with respect to the regressors.
4. The marginal effects of Pr(a < y*i < b) describe how the probability of being uncensored
changes with respect to the regressors.
In the next example, we show how to obtain each of these.
Example 1
In example 3 of [R]tobit, we fit a two-limit tobit model of mpg on wgt.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. generate wgt = weight/1000
. tobit mpg wgt, ll(17) ul(24)
Tobit regression Number of obs = 74
LR chi2(1) = 77.60
Prob > chi2 = 0.0000
Log likelihood = -104.25976 Pseudo R2 = 0.2712
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
wgt -5.764448 .7245417 -7.96 0.000 -7.208457 -4.320438
_cons 38.07469 2.255917 16.88 0.000 33.57865 42.57072
/sigma 2.886337 .3952143 2.098676 3.673998
Obs. summary: 18 left-censored observations at mpg<=17
33 uncensored observations
23 right-censored observations at mpg>=24
tobit reports the βcoefficients for the latent regression model. The marginal effect of xkon yis
simply the corresponding βk, because E(y|x)is linear in x. Thus a 1,000-pound increase in a car’s
weight (which is a 1-unit increase in wgt) would lower fuel economy by 5.8 mpg.
To estimate the means of the marginal effects on the expected value of the censored outcome,
conditional on weight being each of three values (2,000; 3,000; and 4,000 pounds), we type
. margins, dydx(wgt) predict(ystar(17,24)) at(wgt=(2 3 4))
Conditional marginal effects Number of obs = 74
Model VCE : OIM
Expression : E(mpg*|17<mpg<24), predict(ystar(17,24))
dy/dx w.r.t. : wgt
1._at : wgt = 2
2._at : wgt = 3
3._at : wgt = 4
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
wgt
_at
1 -1.0861 .311273 -3.49 0.000 -1.696184 -.4760162
2 -4.45315 .4772541 -9.33 0.000 -5.388551 -3.51775
3 -1.412822 .3289702 -4.29 0.000 -2.057591 -.768052
The E(y*|x) is nonlinear in x, so the marginal effect for a continuous covariate is not the same
as the change in y* induced by a one-unit change in x. Recall that the marginal effect at a point
is the slope of the tangent line at that point. In our example, we estimate the mean of the marginal
effects for different values of wgt. The estimated mean of the marginal effects is -1.1 mpg for a
2,000 pound car; -4.5 mpg for a 3,000 pound car; and -1.4 mpg for a 4,000 pound car.
To estimate the means of the marginal effects on the expected value of the truncated outcome at
the same levels of wgt, we type
. margins, dydx(wgt) predict(e(17,24)) at(wgt=(2 3 4))
Conditional marginal effects Number of obs = 74
Model VCE : OIM
Expression : E(mpg|17<mpg<24), predict(e(17,24))
dy/dx w.r.t. : wgt
1._at : wgt = 2
2._at : wgt = 3
3._at : wgt = 4
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]
wgt
_at
1 -1.166572 .0827549 -14.10 0.000 -1.328768 -1.004375
2 -2.308842 .4273727 -5.40 0.000 -3.146477 -1.471207
3 -1.288896 .0889259 -14.49 0.000 -1.463188 -1.114604
The mean of the marginal effects of a change in wgt on yi (which is bounded between 17 and 24)
is about -1.2 mpg for a 2,000 pound car; -2.3 mpg for a 3,000 pound car; and -1.3 for a 4,000
pound car.
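The remaining two effects in the list at the start of this section can be obtained the same way; a
minimal sketch, continuing the example:
. margins, dydx(wgt) predict(pr(17,24)) at(wgt=(2 3 4))
estimates the mean marginal effects of wgt on the probability of being uncensored, and the marginal
effect on the latent variable is simply the coefficient reported by tobit, here -5.76.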
References
Cong, R. 2000. sg144: Marginal effects of the tobit model. Stata Technical Bulletin 56: 27–34. Reprinted in Stata
Technical Bulletin Reprints, vol. 10, pp. 189–197. College Station, TX: Stata Press.
McDonald, J. F., and R. A. Moffitt. 1980. The use of tobit analysis. Review of Economics and Statistics 62: 318–321.
Also see
[R]tobit Tobit regression
[U] 20 Estimation and postestimation commands
Title
total — Estimate totals
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
total varlist [if] [in] [weight] [, options]
options Description
if/in/over
over(varlist[, nolabel]) group over subpopulations defined by varlist; optionally,
suppress group labels
SE/Cluster
vce(vcetype) vcetype may be analytic, cluster clustvar, bootstrap, or
jackknife
Reporting
level(#)set confidence level; default is level(95)
noheader suppress table header
nolegend suppress table legend
display options control column formats and line width
coeflegend display legend instead of statistics
bootstrap,jackknife,mi estimate,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Summaries, tables, and tests >Summary and descriptive statistics >Totals
Description
total produces estimates of totals, along with standard errors.
Options
 
if/in/over
over(varlist , nolabel )specifies that estimates be computed for multiple subpopulations, which
are identified by the different values of the variables in varlist.
When this option is supplied with one variable name, such as over(varname), the value labels of
varname are used to identify the subpopulations. If varname does not have labeled values (or there
are unlabeled values), the values themselves are used, provided that they are nonnegative integers.
Noninteger values, negative values, and labels that are not valid Stata names are substituted with
a default identifier.
When over() is supplied with multiple variable names, each subpopulation is assigned a unique
default identifier.
nolabel specifies that value labels attached to the variables identifying the subpopulations be
ignored.
 
SE/Cluster
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (analytic), that allow for intragroup correlation (cluster clustvar), and that
use bootstrap or jackknife methods (bootstrap,jackknife); see [R]vce option.
vce(analytic), the default, uses the analytically derived variance estimator associated with the
sample total.
 
Reporting
level(#); see [R]estimation options.
noheader prevents the table header from being displayed. This option implies nolegend.
nolegend prevents the table legend identifying the subpopulations from being displayed.
display options:cformat(%fmt)and nolstretch; see [R]estimation options.
The following option is available with total but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Example 1
Suppose that we collected data on incidence of heart attacks. The variable heartatk indicates
whether a person ever had a heart attack (1 means yes; 0 means no). We can then estimate the total
number of persons who have had heart attacks for each sex in the population represented by the data
we collected.
. use http://www.stata-press.com/data/r13/total
. total heartatk [pw=swgt], over(sex)
Total estimation Number of obs = 4946
Male: sex = Male
Female: sex = Female
Over Total Std. Err. [95% Conf. Interval]
heartatk
Male 944559 104372.3 739943 1149175
Female 581590 82855.59 419156.3 744023.7
Stored results
total stores the following in e():
Scalars
e(N) number of observations
e(N_over) number of subpopulations
e(N_clust) number of clusters
e(k_eq) number of equations in e(b)
e(df_r) sample degrees of freedom
e(rank) rank of e(V)
Macros
e(cmd) total
e(cmdline) command as typed
e(varlist) varlist
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(cluster) name of cluster variable
e(over) varlist from over()
e(over_labels) labels from over() variables
e(over_namelist) names from e(over_labels)
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(properties) b V
e(estat_cmd) program used to implement estat
e(marginsnotok) predictions disallowed by margins
Matrices
e(b) vector of total estimates
e(V) (co)variance estimates
e(_N) vector of numbers of nonmissing observations
e(error) error code corresponding to e(b)
Functions
e(sample) marks estimation sample
Methods and formulas
Methods and formulas are presented under the following headings:
The total estimator
Survey data
The survey total estimator
The poststratified total estimator
Subpopulation estimation
The total estimator
Let y denote the variable on which to calculate the total and y_j, j = 1, . . . , n, denote an individual
observation on y. Let w_j be the frequency weight (or iweight), and if no weight is specified, define
w_j = 1 for all j. See the next section for pweighted data. The sum of the weights is an estimate
of the population size:

\widehat{N} = \sum_{j=1}^{n} w_j

If the population values of y are denoted by Y_j, j = 1, . . . , N, the associated population total is

Y = \sum_{j=1}^{N} Y_j = N\,\bar{Y}

where \bar{Y} is the population mean. The total is estimated as

\widehat{Y} = \widehat{N}\,\bar{y}

The variance estimator for the total is

\widehat{V}(\widehat{Y}) = \widehat{N}^{2}\, \widehat{V}(\bar{y})

where \widehat{V}(\bar{y}) is the variance estimator for the mean; see [R] mean. The standard error of the total is
the square root of the variance.
If x, x_j, \bar{x}, and \widehat{X} are similarly defined for another variable (observed jointly with y), the
covariance estimator between \widehat{X} and \widehat{Y} is

\widehat{\mathrm{Cov}}(\widehat{X}, \widehat{Y}) = \widehat{N}^{2}\, \widehat{\mathrm{Cov}}(\bar{x}, \bar{y})

where \widehat{\mathrm{Cov}}(\bar{x}, \bar{y}) is the covariance estimator between two means; see [R] mean.
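As a quick numerical check of \widehat{Y} = \widehat{N}\,\bar{y} (a sketch; the auto dataset is used only for illustration):
. use http://www.stata-press.com/data/r13/auto, clear
. quietly mean mpg
. display e(N)*_b[mpg]
. total mpg
The displayed product reproduces the total reported by total, up to rounding.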
Survey data
See [SVY] variance estimation and [SVY] poststratification for discussions that provide background
information for the following formulas.
The survey total estimator
Let Y_j be a survey item for the jth individual in the population, where j = 1, . . . , M and M is
the size of the population. The associated population total for the item of interest is

Y = \sum_{j=1}^{M} Y_j

Let y_j be the survey item for the jth sampled individual from the population, where j = 1, . . . , m
and m is the number of observations in the sample.
The estimator \widehat{Y} for the population total Y is

\widehat{Y} = \sum_{j=1}^{m} w_j y_j

where w_j is a sampling weight. The estimator for the number of individuals in the population is

\widehat{M} = \sum_{j=1}^{m} w_j

The score variable for the total estimator is the variable itself,

z_j(\widehat{Y}) = y_j
The poststratified total estimator
Let P_k denote the set of sampled observations that belong to poststratum k, and define I_{P_k}(j)
to indicate if the jth observation is a member of poststratum k, where k = 1, . . . , L_P and L_P is
the number of poststrata. Also, let M_k denote the population size for poststratum k. P_k and M_k are
identified by specifying the poststrata() and postweight() options on svyset; see [SVY] svyset.
The estimator for the poststratified total is

\widehat{Y}^{P} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k}\,\widehat{Y}_k = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{P_k}(j)\, w_j y_j

where

\widehat{M}_k = \sum_{j=1}^{m} I_{P_k}(j)\, w_j

The score variable for the poststratified total is

z_j(\widehat{Y}^{P}) = \sum_{k=1}^{L_P} I_{P_k}(j)\, \frac{M_k}{\widehat{M}_k} \left( y_j - \frac{\widehat{Y}_k}{\widehat{M}_k} \right)
Subpopulation estimation
Let S denote the set of sampled observations that belong to the subpopulation of interest, and
define I_S(j) to indicate if the jth observation falls within the subpopulation.
The estimator for the subpopulation total is

\widehat{Y}^{S} = \sum_{j=1}^{m} I_S(j)\, w_j y_j

and its score variable is

z_j(\widehat{Y}^{S}) = I_S(j)\, y_j

The estimator for the poststratified subpopulation total is

\widehat{Y}^{PS} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k}\,\widehat{Y}_k^{S} = \sum_{k=1}^{L_P} \frac{M_k}{\widehat{M}_k} \sum_{j=1}^{m} I_{P_k}(j)\, I_S(j)\, w_j y_j

and its score variable is

z_j(\widehat{Y}^{PS}) = \sum_{k=1}^{L_P} I_{P_k}(j)\, \frac{M_k}{\widehat{M}_k} \left\{ I_S(j)\, y_j - \frac{\widehat{Y}_k^{S}}{\widehat{M}_k} \right\}
References
Cochran, W. G. 1977. Sampling Techniques. 3rd ed. New York: Wiley.
Stuart, A., and J. K. Ord. 1994. Kendall’s Advanced Theory of Statistics: Distribution Theory, Vol I. 6th ed. London:
Arnold.
Also see
[R]total postestimation Postestimation tools for total
[R]mean Estimate means
[R]proportion Estimate proportions
[R]ratio Estimate ratios
[MI]estimation Estimation commands for use with mi estimate
[SVY]direct standardization Direct standardization of means, proportions, and ratios
[SVY]poststratification Poststratification for survey data
[SVY]subpopulation estimation Subpopulation estimation for survey data
[SVY]svy estimation Estimation commands for survey data
[SVY]variance estimation Variance estimation for survey data
[U] 20 Estimation and postestimation commands
Title
total postestimation — Postestimation tools for total
Description Remarks and examples Also see
Description
The following postestimation commands are available after total:
Command Description
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Remarks and examples
Example 1
Continuing with our data on incidence of heart attacks from example 1 in [R] total, we want to
test whether there are twice as many heart attacks among men as among women in the population.
. use http://www.stata-press.com/data/r13/total
. total heartatk [pw=swgt], over(sex)
(output omitted )
. test _b[Male] = 2*_b[Female]
( 1) [heartatk]Male - 2*[heartatk]Female = 0
F( 1, 4945) = 1.25
Prob > F = 0.2643
Thus we do not reject our hypothesis that the total number of heart attacks for men is twice that for
women in the population.
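A related sketch: the estimated difference itself, with its standard error, can be obtained with lincom
after the same total command:
. lincom _b[Male] - 2*_b[Female]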
Also see
[R]total Estimate totals
[U] 20 Estimation and postestimation commands
Title
tpoisson — Truncated Poisson regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas Acknowledgment
References Also see
Syntax
tpoisson depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
ll(# | varname) truncation point; default value is ll(0), zero truncation
exposure(varname_e) include ln(varname_e) in model with coefficient constrained to 1
offset(varname_o) include varname_o in model with coefficient constrained to 1
constraints(constraints) apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
or jackknife
Reporting
level(#)set confidence level; default is level(95)
irr report incidence-rate ratios
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap,by,fp,jackknife,rolling,statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Count outcomes >Truncated Poisson regression
Description
tpoisson estimates the parameters of a truncated Poisson model by maximum likelihood. The
dependent variable depvar is regressed on indepvars, where depvar is a positive count variable whose
values are all above the truncation point.
Options
 
Model
noconstant; see [R]estimation options.
ll(#|varname)specifies the truncation point, which is a nonnegative integer. The default is zero
truncation, ll(0).
exposure(varnamee),offset(varnameo),constraints(constraints),collinear; see [R]esti-
mation options.
 
SE/Robust
vce(vcetype)specifies the type of standard error reported, which includes types that are derived from
asymptotic theory (oim,opg), that are robust to some kinds of misspecification (robust), that
allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods
(bootstrap,jackknife); see [R]vce option.
 
Reporting
level(#); see [R]estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, eβirather than βi.
Standard errors and confidence intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated. irr may be specified at estimation or when replaying
previously estimated results.
nocnsreport; see [R]estimation options.
display options:noomitted,vsquish,noemptycells,baselevels,allbaselevels,nofvla-
bel,fvwrap(#),fvwrapon(style),cformat(%fmt),pformat(% fmt),sformat(% fmt), and
nolstretch; see [R]estimation options.
 
Maximization
maximize options:difficult,technique(algorithm spec),iterate(#),nolog,trace,
gradient,showstep,hessian,showtolerance,tolerance(#),ltolerance(#),
nrtolerance(#),nonrtolerance, and from(init specs); see [R]maximize. These options are
seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with tpoisson but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
Truncated Poisson regression is used to model the number of occurrences of an event when that
number is restricted to be above the truncation point. If the dependent variable is not truncated,
standard Poisson regression may be more appropriate; see [R]poisson. Truncated Poisson regression
was first proposed by Grogger and Carson (1991). For an introduction to Poisson regression, see
Cameron and Trivedi (2005,2010) and Long and Freese (2014). For an introduction to truncated
Poisson models, see Cameron and Trivedi (2013) and Long (1997, chap. 8).
Suppose that the patients admitted to a hospital for a given condition form a random sample from
a population of interest and that each admitted patient stays at least one day. You are interested in
modeling the length of stay of patients in days. The sample is truncated at zero because you only
have data on individuals who stayed at least one day. tpoisson accounts for the truncated sample,
whereas poisson does not.
Truncation is not the same as censoring. Right-censored Poisson regression was implemented in
Stata by Raciborski (2011).
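A minimal sketch of the distinction (los, x1, and x2 are hypothetical variable names):
. poisson los x1 x2           // ignores that los is never observed at or below 0
. tpoisson los x1 x2          // accounts for the default zero truncation, ll(0)
. tpoisson los x1 x2, ll(3)   // truncation point of 3 instead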
Example 1
Consider the Simonoff (2003) dataset of running shoes for a sample of runners who registered
an online running log. A running-shoe marketing executive is interested in knowing how the number
of running shoes purchased relates to other factors such as gender, marital status, age, education,
income, typical number of runs per week, average miles run per week, and the preferred type of
running. These data are naturally truncated at zero. A truncated Poisson model is fit to the number
of shoes owned on runs per week, miles run per week, gender, age, and marital status.
No options are needed because zero truncation is the default for tpoisson.
. use http://www.stata-press.com/data/r13/runshoes
. tpoisson shoes rpweek mpweek male age married
Iteration 0: log likelihood = -88.328151
Iteration 1: log likelihood = -86.272639
Iteration 2: log likelihood = -86.257999
Iteration 3: log likelihood = -86.257994
Truncated Poisson regression Number of obs = 60
Truncation point: 0 LR chi2(5) = 22.75
Prob > chi2 = 0.0004
Log likelihood = -86.257994 Pseudo R2 = 0.1165
shoes Coef. Std. Err. z P>|z| [95% Conf. Interval]
rpweek .1575811 .1097893 1.44 0.151 -.057602 .3727641
mpweek .0210673 .0091113 2.31 0.021 .0032094 .0389252
male .0446134 .2444626 0.18 0.855 -.4345246 .5237513
age .0185565 .0137786 1.35 0.178 -.008449 .045562
married -.1283912 .2785044 -0.46 0.645 -.6742498 .4174674
_cons -1.205844 .6619774 -1.82 0.069 -2.503296 .0916078
Using the zero-truncated Poisson regression with these data, only the coefficient on average miles
per week is statistically significant at the 5% level.
Example 2
Semiconductor manufacturing requires that silicon wafers be coated with a layer of metal oxide.
The depth of this layer is strictly controlled. In this example, a critical oxide layer is designed for
300 ± 20 angstroms (Å).
After the oxide layer is coated onto a wafer, the wafer enters a photolithography step in which the
lines representing the electrical connections are printed on the oxide and later etched and filled with
metal. The widths of these lines are measured. In this example, they are controlled to 90 ± 5 micrometers
(µm).
After these and other steps, each wafer is electrically tested at probe. If too many failures are
discovered, the wafer is rejected and sent for engineering analysis. In this example, the maximum
number of probe failures tolerated for this product is 10.
A major failure at probe has been encountered: 88 wafers had more than 10 failures each. The
88 wafers that failed were tested using 4 probe machines. The engineer suspects that the failures
were a result of faulty probe machines, poor depth control, or poor line widths. The line widths and
depths in these data are the actual measurement minus its specification target, 300 Å for the oxide
depths and 90 µm for the line widths.
The following table tabulates the average failure rate for each probe using Stata’s mean command;
see [R]mean.
. use http://www.stata-press.com/data/r13/probe
. mean failures, over(probe) nolegend
Mean estimation Number of obs = 88
Over Mean Std. Err. [95% Conf. Interval]
failures
1 15.875 1.186293 13.51711 18.23289
2 14.95833 .5912379 13.78318 16.13348
3 16.47059 .9279866 14.62611 18.31506
4 23.09677 .9451117 21.21826 24.97529
The 95% confidence intervals in this table suggest that there are about 5–11 additional failures
per wafer on probe 4. These are unadjusted for varying line widths and oxide depths. Possibly, probe
4 received the wafers with larger line widths or extreme oxide depths.
Truncated Poisson regression more clearly identifies the root causes for the increased failures by
estimating the differences between probes adjusted for the line widths and oxide depths. It also allows
us to determine whether the deviations from specifications in line widths or oxide depths might be
contributing to the problem.
. tpoisson failures i.probe depth width, ll(10) nolog
Truncated Poisson regression Number of obs = 88
Truncation point: 10 LR chi2(5) = 73.70
Prob > chi2 = 0.0000
Log likelihood = -239.35746 Pseudo R2 = 0.1334
failures Coef. Std. Err. z P>|z| [95% Conf. Interval]
probe
2 -.1113037 .1019786 -1.09 0.275 -.3111781 .0885707
3 .0114339 .1036032 0.11 0.912 -.1916245 .2144924
4 .4254115 .0841277 5.06 0.000 .2605242 .5902989
depth -.0005034 .0033375 -0.15 0.880 -.0070447 .006038
width .0330225 .015573 2.12 0.034 .0025001 .063545
_cons 2.714025 .0752617 36.06 0.000 2.566515 2.861536
The coefficients listed for the probes are testing the null hypothesis H0: probe_i = probe_1, where i
equals 2, 3, and 4. Because the only coefficient that is statistically significant is the one for testing
H0: probe_4 = probe_1, p < 0.001, and because the p-values for the other probes are not statistically
significant, that is, p ≥ 0.275, the implication is that there is a difference between probe 4 and the
other machines. Because the coefficient for this test is positive, 0.425, the conclusion is that the
average failure rate for probe 4, after adjusting for line widths and oxide depths, is higher than the
other probes. Possibly, probe 4 needs calibration or the head used with this machine is defective.
Line-width control is statistically significant, p=0.034, but variation in oxide depths is not causing
the increased failure rate. The engineer concluded that the sudden increase in failures is the result of
two problems. First, probe 4 is malfunctioning, and second, there is a possible lithography or etching
problem.
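As a follow-up sketch (not shown in the output above), the adjusted expected failure counts can be
compared directly across probes with
. margins probe, predict(cm)
which evaluates the conditional mean E(failures | failures > 10) at each probe, averaging over the
observed line widths and oxide depths.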
Stored results
tpoisson stores the following in e():
Scalars
e(N) number of observations
e(k) number of parameters
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(df_m) model degrees of freedom
e(r2_p) pseudo-R-squared
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(chi2) χ2
e(p) significance
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) tpoisson
e(cmdline) command as typed
e(depvar) name of dependent variable
e(llopt) contents of ll(), or 0 if not specified
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset) linear offset variable
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
The conditional probability of observing y_j events given that y_j > τ_j, where τ_j is the truncation
point, is given by

\Pr(Y = y_j \mid y_j > \tau_j, \mathbf{x}_j) = \frac{e^{-\lambda}\,\lambda^{y_j}}{y_j!\,\Pr(Y > \tau_j \mid \mathbf{x}_j)}

The log likelihood (with weights w_j and offsets) is given by

\xi_j = \mathbf{x}_j\boldsymbol{\beta} + \mathrm{offset}_j

f(y_j) = \frac{\exp\{-\exp(\xi_j)\}\,\exp(\xi_j y_j)}{y_j!\,\Pr(Y > \tau_j \mid \xi_j)}

\ln L = \sum_{j=1}^{n} w_j \big[ -\exp(\xi_j) + \xi_j y_j - \ln(y_j!) - \ln\{\Pr(Y > \tau_j \mid \xi_j)\} \big]
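For the default zero truncation, the normalizing probability has a closed form; as a brief added note,
for the Poisson density Pr(Y = 0 | ξ_j) = exp{-exp(ξ_j)}, so

\Pr(Y > 0 \mid \xi_j) = 1 - \exp\{-\exp(\xi_j)\}

and the final term of the log likelihood reduces to -\ln\big[1 - \exp\{-\exp(\xi_j)\}\big].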
This command supports the Huber/White/sandwich estimator of the variance and its clustered
version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly
Maximum likelihood estimators and Methods and formulas.
tpoisson also supports estimation with survey data. For details on variance–covariance estimates
with survey data, see [SVY] variance estimation.
Acknowledgment
We gratefully acknowledge the previous work by Joseph Hilbe (1999) of Arizona State University,
a past editor of the Stata Technical Bulletin and coauthor of the Stata Press book Generalized
Linear Models and Extensions.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.
Farbmacher, H. 2011. Estimation of hurdle models for overdispersed count data. Stata Journal 11: 82–94.
Grogger, J. T., and R. T. Carson. 1991. Models for truncated counts. Journal of Applied Econometrics 6: 225–238.
Hilbe, J. M. 1998. sg91: Robust variance estimators for MLE Poisson and negative binomial regression. Stata Technical
Bulletin 45: 26–28. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 177–180. College Station, TX: Stata
Press.
Hilbe, J. M. 1999. sg102: Zero-truncated Poisson and negative binomial regression. Stata Technical Bulletin 47: 37–40.
Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 233–236. College Station, TX: Stata Press.
Hilbe, J. M., and D. H. Judson. 1998. sg94: Right, left, and uncensored Poisson regression. Stata Technical Bulletin
46: 18–20. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 186–189. College Station, TX: Stata Press.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College
Station, TX: Stata Press.
Raciborski, R. 2011. Right-censored Poisson regression model. Stata Journal 11: 95–105.
Simonoff, J. S. 2003. Analyzing Categorical Data. New York: Springer.
Also see
[R] tpoisson postestimation — Postestimation tools for tpoisson
[R] poisson — Poisson regression
[R] nbreg — Negative binomial regression
[R] tnbreg — Truncated negative binomial regression
[R] zinb — Zero-inflated negative binomial regression
[R] zip — Zero-inflated Poisson regression
[SVY] svy estimation — Estimation commands for survey data
[XT] xtpoisson — Fixed-effects, random-effects, and population-averaged Poisson models
[U] 20 Estimation and postestimation commands
Title
tpoisson postestimation — Postestimation tools for tpoisson
Description Syntax for predict Menu for predict Options for predict
Methods and formulas Also see
Description
The following postestimation commands are available after tpoisson:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1) dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest (2) likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

statistic      Description
Main
  n            number of events; the default
  ir           incidence rate
  cm           conditional mean, E(y_j | y_j > τ_j)
  pr(n)        probability Pr(y_j = n)
  pr(a,b)      probability Pr(a ≤ y_j ≤ b)
  cpr(n)       conditional probability Pr(y_j = n | y_j > τ_j)
  cpr(a,b)     conditional probability Pr(a ≤ y_j ≤ b | y_j > τ_j)
  xb           linear prediction
  stdp         standard error of the linear prediction
  score        first derivative of the log likelihood with respect to x_jβ

These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is exp(x_jβ) if neither offset() nor exposure() was specified when the model was fit; exp(x_jβ + offset_j) if offset() was specified; or exp(x_jβ) × exposure_j if exposure() was specified.

ir calculates the incidence rate exp(x_jβ), which is the predicted number of events when exposure is 1. This is equivalent to specifying both the n and the nooffset options.

cm calculates the conditional mean,

$$E(y_j \mid y_j > \tau_j) = \frac{E(y_j)}{\Pr(y_j > \tau_j)}$$

where τ_j is the truncation point found in e(llopt).

pr(n) calculates the probability Pr(y_j = n), where n is a nonnegative integer that may be specified as a number or a variable.

pr(a,b) calculates the probability Pr(a ≤ y_j ≤ b), where a and b are nonnegative integers that may be specified as numbers or variables;

b missing (b ≥ .) means +∞;
pr(20,.) calculates Pr(y_j ≥ 20);
pr(20,b) calculates Pr(y_j ≥ 20) in observations for which b ≥ . and calculates Pr(20 ≤ y_j ≤ b) elsewhere.
pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a missing value in that observation for pr(a,b).

cpr(n) calculates the conditional probability Pr(y_j = n | y_j > τ_j), where τ_j is the truncation point found in e(llopt). n is an integer greater than the truncation point that may be specified as a number or a variable.

cpr(a,b) calculates the conditional probability Pr(a ≤ y_j ≤ b | y_j > τ_j), where τ_j is the truncation point found in e(llopt). The syntax for this option is analogous to that used for pr(a,b) except that a must be greater than the truncation point.

xb calculates the linear prediction, which is x_jβ if neither offset() nor exposure() was specified when the model was fit; x_jβ + offset_j if offset() was specified; or x_jβ + ln(exposure_j) if exposure() was specified; see nooffset below.

stdp calculates the standard error of the linear prediction.

score calculates the equation-level score, ∂lnL/∂(x_jβ).

nooffset is relevant only if you specified offset() or exposure() when you fit the model. It modifies the calculations made by predict so that they ignore the offset or exposure variable; the linear prediction is treated as x_jβ rather than as x_jβ + offset_j or x_jβ + ln(exposure_j). Specifying predict . . . , nooffset is equivalent to specifying predict . . . , ir.
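For instance, after fitting a model truncated at zero, the default prediction, the conditional mean, and a conditional probability could be obtained as below. This is only a syntax sketch; the outcome, covariate, and new-variable names are hypothetical:
. tpoisson y x1 x2, ll(0)
. predict nhat
. predict cmean, cm
. predict p3, cpr(3)
Here nhat holds the predicted number of events (statistic n, the default), cmean holds E(y_j | y_j > 0), and p3 holds Pr(y_j = 3 | y_j > 0).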
Methods and formulas
In the following formula, we use the same notation as in [R] tpoisson.

The equation-level scores are given by

$$\mathrm{score}(\mathbf{x}\boldsymbol{\beta})_j = y_j - e^{\xi_j} - \frac{e^{-e^{\xi_j}}\, e^{\xi_j}}{\Pr(Y > \tau_j \mid \xi_j)}$$

where $\tau_j$ is the truncation point found in e(llopt).
Also see
[R] tpoisson — Truncated Poisson regression
[U] 20 Estimation and postestimation commands
Title
translate — Print and translate logs
Syntax Description Options for print Options for translate
Remarks and examples Stored results Also see
Syntax
Print log and SMCL files
print filename [, like(ext) name(windowname) override_options]

Translate log files to SMCL files and vice versa

translate filename_in filename_out [, translator(tname) name(windowname) override_options replace]

View translator parameter settings

translator query [tname]

Change translator parameter settings

translator set [tname setopt setval]

Return translator parameter settings to default values

translator reset tname

List current mappings from one extension to another

transmap query [.ext]

Specify that files with one extension be treated the same as files with another extension

transmap define .extnew .extold
filename in print, in addition to being a filename to be printed, may be specified as @Results to
mean the Results window and @Viewer to mean the Viewer window.
filename_in in translate may be specified just as filename in print.
tname in translator specifies the name of a translator; see the translator() option under Options
for translate.
Description
print prints log, SMCL, and text files. Although there is considerable flexibility in how print
(and translate, which print uses) can be set to work, they have already been set up and should
just work:
. print mylog.smcl
. print mylog.log
Unix users may discover that they need to do a bit of setup before print works; see Printing files,
Unix below. International Unix users may also wish to modify the default paper size. All users can
tailor print and translate to their needs.
print may also be used to print the current contents of the Results window or the Viewer. For
instance, the current contents of the Results window could be printed by typing
. print @Results
translate translates log and SMCL files from one format to another, the other typically being
suitable for printing. translate can also translate SMCL logs (logs created by typing, say, log using
mylog) to plain text:
. translate mylog.smcl mylog.log
You can use translate to recover a log when you have forgotten to start one. You may type
. translate @Results mylog.txt
to capture as plain text what is currently shown in the Results window.
This entry provides a general overview of print and translate and covers in detail the printing
and translation of text (nongraphic) files.
translator query, translator set, and translator reset show, change, and restore the
default values of the settings for each translator.
transmap define and transmap query create and show mappings from one file extension to
another for use with print and translate.
For example, print myfile.txt knows to use a translator appropriate for printing text files
because of the .txt extension. However, it does not know what to do with .xyz files. If you have
.xyz files and always wish to treat them as .txt files, you can type transmap define .xyz .txt.
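For instance, continuing the .xyz example just described (a hypothetical extension used only for illustration), you would type
. transmap define .xyz .txt
. print myfile.xyz
and print would thereafter treat .xyz files as plain text.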
Options for print
like(ext)specifies how the file should be translated to a form suitable for printing. The default is to
determine the translation method from the extension of filename. Thus mylog.smcl is translated
according to the rule for translating smcl files, myfile.txt is translated according to the rule for
translating txt files, and so on. (These rules are, in fact, translate's smcl2prn and txt2prn translators, but put that aside for the moment.)
Rules for the following extensions are predefined:
.txt assume input file contains plain text
.log assume input file contains Stata log text
.smcl assume input file contains SMCL
To print a file that has an extension different from those listed above, you can define a new
extension, but you do not have to do that. Assume that you wish to print the file read.me, which
you know to contain plain text. If you were just to type print read.me, you would be told that
Stata cannot translate .me files. (You would actually be told that the translator for me2prn was
not found.) You could type print read.me, like(txt) to tell print to print read.me like a
.txt file.
On the other hand, you could type
. transmap define .me .txt
to tell Stata that .me files are always to be treated like .txt files. If you did that, Stata would
remember the new rule, even in future sessions.
When you specify the like() option, you override the recorded rules. So, if you were to type
print mylog.smcl, like(txt), the file would be printed as plain text (meaning that all the
SMCL commands would show).
name(windowname)specifies which window to print when printing a Viewer. The default is for
Stata to print the topmost Viewer [Unix(GUI) users: See the second technical note in Printing files,
Unix]. The name() option is ignored when printing the Results window.
The window name is located inside parentheses in the window title. For example, if the title for
a Viewer window is Viewer (#1) [help print], the name for the window is #1.
override_options refers to translate's options for overriding default values. print uses translate to translate the file into a format suitable for sending to the printer, and thus translate's override_options may also be used with print. The settings available vary across translators (for example, smcl2ps has different settings than smcl2txt) and may also differ across operating systems (for example, Windows may have different printing options than Mac OS X).
To find out what you can override when printing .smcl files, type
. translator query smcl2prn
(output omitted )
In the omitted output, you might learn that there is an rmargin # tunable value, which specifies the right margin in inches. You could specify the override option rmargin(#) to temporarily override the default value, or you could type translator set smcl2prn rmargin # beforehand to permanently reset the value.
Alternatively, on some computers with some translators, you might discover that nothing can be
set.
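As a sketch of the two approaches (assuming the smcl2prn translator on your system does expose an rmargin setting; the value 1.5 is arbitrary):
. translator query smcl2prn
. print mylog.smcl, rmargin(1.5)
. translator set smcl2prn rmargin 1.5
The print call overrides the margin for that job only; the translator set command changes the stored default.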
Options for translate
translator(tname)specifies the name of the translator to be used to translate the file. The available
translators are
tname Input Output
smcl2ps SMCL PostScript
log2ps Stata text log PostScript
txt2ps generic text file PostScript
Viewer2ps Viewer window PostScript
Results2ps Results window PostScript
smcl2prn SMCL default printer format
log2prn Stata text log default printer format
txt2prn generic text log default printer format
Results2prn Results window default printer format
Viewer2prn Viewer window default printer format
smcl2txt SMCL generic text file
smcl2log SMCL Stata text log
Results2txt Results window generic text file
Viewer2txt Viewer window generic text file
smcl2pdf SMCL PDF
log2pdf Stata text log PDF
txt2pdf generic text log PDF
Results2pdf Results window PDF
Viewer2pdf Viewer window PDF
If translator() is not specified, translate determines which translator to use from extensions
of the filenames specified. Typing translate myfile.smcl myfile.ps would use the smcl2ps
translator. Typing translate myfile.smcl myfile.ps, translator(smcl2prn) would override
the default and use the smcl2prn translator.
Actually, when you type translate a.b c.d, translate looks up .b in the transmap extension-synonym table. If .b is not found, the translator b2d is used. If .b is found in the table, the mapped extension is used (call it b′), and then the translator b′2d is used. For example,
Command Translator used
. translate myfile.smcl myfile.ps smcl2ps
. translate myfile.odd myfile.ps odd2ps, which does not exist, so error
. transmap define .odd .txt
. translate myfile.odd myfile.ps txt2ps
You can list the mappings that translate uses by typing transmap query.
name(windowname)specifies which window to translate when translating a Viewer. The default is for
Stata to translate the topmost Viewer. The name() option is ignored when translating the Results
window.
The window name is located inside parentheses in the window title. For example, if the title for
a Viewer window is Viewer (#1) [help print], the name for the window is #1.
override_options override any of the default options of the specified or implied translator. To find out what you can override for, say, log2ps, type

. translator query log2ps
(output omitted)

In the omitted output, you might learn that there is an rmargin # tunable value, which, for log2ps, specifies the right margin in inches. You could specify the override option rmargin(#) to temporarily override the default value or type translator set log2ps rmargin # beforehand to permanently reset the value.
replace specifies that filenameout be replaced if it already exists.
Remarks and examples
Remarks are presented under the following headings:
Printing files
Printing files, Mac and Windows
Printing files, Unix
Translating files from one format to another
Printing files
Printing should be easy; just type
. print mylog.smcl
. print mylog.log
You can use print to print SMCL files, plain text files, and even the contents of the Results and
Viewer windows:
. print @Results
. print @Viewer
. print @Viewer, name(#2)
For information about printing and translating graph files, see [G-2] graph print and [G-2] graph export.
Printing files, Mac and Windows
When you type print, you are using the same facility that you would be using if you had selected
Print from the File menu. If you try to print a file that Stata does not know about, Stata will complain:
. print read.me
translator me2prn not found
(perhaps you need to specify the like() option)
r(111);
Then you could type
. print read.me, like(txt)
to indicate that you wanted read.me sent to the printer in the same fashion as if the file were named
readme.txt, or you could type
. transmap define .me .txt
. print read.me
Here you are telling Stata once and for all that you want files ending in .me to be treated in the
same way as files ending in .txt. Stata will remember this mapping, even across sessions. To clear
the .me mapping, type
. transmap define .me
To see all the mappings, type
. transmap query
To print to a file, use the translate command, not print:
. translate mylog.smcl mylog.prn
translate prints to a file by using the Windows print driver when the new filename ends in .prn.
Under Mac, the prn translators are the same as the pdf translators. We suggest that you simply use
the .pdf file extension when printing to a file.
Printing files, Unix
Stata assumes that you have a PostScript printer attached to your Unix computer and that the Unix
command lpr(1) can be used to send PostScript files to it, but you can change this. On your Unix
system, typing
mycomputer$ lpr < filename
may not be sufficient to print PostScript files. For instance, perhaps on your system you would need
to type
mycomputer$ lpr -Plexmark < filename
or
mycomputer$ lpr -Plexmark filename
or something else. To set the print command to be lpr -Plexmark filename and to state that the
printer expects to receive PostScript files, type
. printer define prn ps "lpr -Plexmark @"
To set the print command to lpr -Plexmark < filename and to state that the printer expects to receive
plain text files, type
. printer define prn txt "lpr -Plexmark < @"
That is, just type the command necessary to send files to your printer and include an @ sign where
the filename should be substituted. Two file formats are available: ps and txt. The default setting,
as shipped from the factory, is
. printer define prn ps "lpr < @"
We will return to the printer command in the technical note that follows because it has some other
capabilities you should know about.
In any case, after you redefine the default printer, the following should just work:
. print mylog.smcl
. print mylog.log
If you try to print a file that Stata does not know about, it will complain:
. print read.me
translator me2prn not found
r(111);
Here you could type
. print read.me, like(txt)
to indicate that you wanted read.me sent to the printer in the same fashion as if the file were named
readme.txt, or you could type
. transmap define .me .txt
. print read.me
Here you are telling Stata once and for all that you want files ending in .me to be treated in the
same way as files ending in .txt. Stata will remember this setting for .me, even across sessions.
If you want to clear the .me setting, type
. transmap define .me
If you want to see all your settings, type
. transmap query
Technical note
The syntax of the printer command is
printer define printername [{ps | txt} "Unix command with @"]
printer query [printername]
You may define multiple printers. By default, print uses the printer named prn, but print has the
syntax
print filename [, like(ext) printer(printername) override_options]
so, if you define multiple printers, you may route your output to them.
For instance, if you have a second printer on your system, you might type
. printer define lexmark ps "lpr -Plexmark < @"
After doing that, you could type
. print myfile.smcl, printer(lexmark)
Any printers that you set will be remembered even across sessions. You can delete printers:
. printer define lexmark
You can list all the defined printers by typing printer query, and you can list the definition of a
particular printer, say, prn, by typing printer query prn.
The default printer prn we have predefined for you is
. printer define prn ps "lpr < @"
meaning that we assume that it is a PostScript printer and that the Unix command lpr(1), without
options, is sufficient to cause files to print. Feel free to change the default definition. If you change
it, the change will be remembered across sessions.
Technical note
Unix(GUI) users should note that X-Windows does not have the concept of a window z-order, which
prevents Stata from determining which window is the topmost window. Instead, Stata determines
which window is topmost based on which window has the focus. However, some window managers
will set the focus to a window without bringing the window to the top. What Stata considers the
topmost window may not appear topmost visually. For this reason, you should always use the name()
option to ensure that the correct window is printed.
Technical note
When you select the Results window to print from the Print menu or toolbar button, the result is
the same as if you were to issue the print command. When you select a Viewer window to print
from the Print menu or toolbar button, the result is the same as if you were to issue the print
command with a name() option.
The translation to PostScript format is done by translate and, in particular, is performed by
the translators smcl2ps, log2ps, and txt2ps. There are many tunable parameters in each of these
translators. You can display the current values of these tunable parameters for, say, smcl2ps by
typing
. translator query smcl2ps
(output omitted )
and you can set any of the tunable parameters (for instance, setting smcl2ps's rmargin value to 1)
by typing
. translator set smcl2ps rmargin 1
(output omitted )
Any settings you make will be remembered across sessions. You can reset smcl2ps to be as it was
when Stata was shipped by typing
. translator reset smcl2ps
Translating files from one format to another
If you have a SMCL log, which you might have created by previously typing log using mylog,
you can translate it to a text log by typing
. translate myfile.smcl myfile.log
and you can translate it to a PostScript file by typing
. translate myfile.smcl myfile.ps
translate translates files from one format to another, and, in fact, print uses translate to
produce a file suitable for sending to the printer.
When you type
. translate a.b c.d
translate looks for the predefined translator b2d and uses that to perform the translation. If there is a transmap synonym for b, however, the mapped value b′ is used: b′2d.
Only certain translators exist, and they are listed under the description of the translator() option in Options for translate above, or you can type
. translator query
for a complete (and perhaps more up-to-date) list.
Anyway, translate forms the name b2d or b′2d, and if the translator does not exist, translate
issues an error message. With the translator() option, you can specify exactly which translator
to use, and then it does not matter how your files are named.
The only other thing to know is that some translators have tunable parameters that affect how they
perform their translation. You can type
. translator query translator_name
to find out what those parameters are. Some translators have no tunable parameters, and some have
many:
. translator query smcl2ps
header on
headertext
logo on
user
projecttext
cmdnumber on
fontsize 9 lmargin 1.00
pagesize letter rmargin 1.00
pagewidth 8.50 tmargin 1.00
pageheight 11.00 bmargin 1.00
scheme monochrome
cust1_result_color 0 0 0 cust2_result_color 0 0 0
cust1_standard_color 0 0 0 cust2_standard_color 0 0 0
cust1_error_color 0 0 0 cust2_error_color 255 0 0
cust1_input_color 0 0 0 cust2_input_color 0 0 0
cust1_link_color 0 0 0 cust2_link_color 0 0 255
cust1_hilite_color 0 0 0 cust2_hilite_color 0 0 0
cust1_result_bold on cust2_result_bold on
cust1_standard_bold off cust2_standard_bold off
cust1_error_bold on cust2_error_bold on
cust1_input_bold off cust2_input_bold off
cust1_link_bold off cust2_link_bold off
cust1_hilite_bold on cust2_hilite_bold on
cust1_link_underline on cust2_link_underline on
cust1_hilite_underline off cust2_hilite_underline off
You can temporarily override any setting by specifying the setopt(setval) option on the translate (or print) command. For instance, you can type

. translate . . . , . . . cmdnumber(off)
or you can reset the value permanently by typing
. translator set smcl2ps setopt setval
For instance,
. translator set smcl2ps cmdnumber off
If you reset a value, Stata will remember the change, even in future sessions.
Mac and Windows users: The smcl2ps translator (and the other *2ps translators) is not used by print, even when you have a PostScript printer attached to your computer. Instead, the Mac or Windows print driver is used. Resetting smcl2ps values will not affect printing; instead, you change the defaults in the Printers Control Panel in Windows or by selecting Page Setup... from the File menu in Mac. You can, however, translate files yourself using the smcl2ps translator and the other *2ps translators.
Stored results
transmap query .ext stores in macro r(suffix) the mapped extension (without the leading
period) or stores ext if the ext is not mapped.
translator query translatorname stores setval in macro r(setopt) for every setopt, setval pair.
printer query printername (Unix only) stores in macro r(suffix) the “filetype” of the input
that the printer expects (currently “ps” or “txt”) and, in macro r(command), the command to send
output to the printer.
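For example, after mapping .me to .txt as earlier in this entry, a quick check of the stored result (a minimal sketch) is
. transmap define .me .txt
. transmap query .me
. display "`r(suffix)'"
Given that mapping, r(suffix) contains txt.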
Also see
[R] log — Echo copy of session to file
[G-2] graph export — Export current graph
[G-2] graph print — Print a graph
[G-2] graph set — Set graphics options
[P] smcl — Stata Markup and Control Language
[U] 15 Saving and printing output—log files
Title
truncreg — Truncated regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
truncreg depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
ll(varname | #)   lower limit for left-truncation
ul(varname | #)   upper limit for right-truncation
offset(varname)include varname in model with coefficient constrained to 1
constraints(constraints)apply specified linear constraints
collinear keep collinear variables
SE/Robust
vce(vcetype)   vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
level(#)   set confidence level; default is level(95)
noskip perform likelihood-ratio test
nocnsreport do not display constraints
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Maximization
maximize options control the maximization process; seldom used
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, fp, jackknife, mi estimate, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
vce(bootstrap) and vce(jackknife) are not allowed with the mi estimate prefix; see [MI] mi estimate.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
vce(), noskip, and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Linear models and related > Truncated regression
Description
truncreg fits a regression model of depvar on indepvars from a sample drawn from a restricted
part of the population. Under the normality assumption for the whole population, the error terms in
the truncated regression model have a truncated normal distribution, which is a normal distribution
that has been scaled upward so that the distribution integrates to one over the restricted range.
Options
 
Model
noconstant; see [R] estimation options.

ll(varname | #) and ul(varname | #) indicate the lower and upper limits for truncation, respectively. You may specify one or both. Observations with depvar ≤ ll() are left-truncated, observations with depvar ≥ ul() are right-truncated, and the remaining observations are not truncated. See [R] tobit for a more detailed description.

offset(varname), constraints(constraints), collinear; see [R] estimation options.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
noskip specifies that a full maximum-likelihood model with only a constant for the regression equation
be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test
for the model test statistic displayed in the estimation header. By default, the overall model test
statistic is an asymptotically equivalent Wald test of all the parameters in the regression equation
being zero (except the constant). For many models, this option can substantially increase estimation
time.
nocnsreport; see [R] estimation options.

display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used, but you may use the ltol(#) option to relax the convergence criterion; the default is 1e-6 during specification searches.

Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with truncreg but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
Truncated regression fits a model of a dependent variable on independent variables from a restricted
part of a population. Truncation is essentially a characteristic of the distribution from which the sample
data are drawn. If $x$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, the density of the truncated normal distribution is

$$f(x \mid a < x < b) = \frac{f(x)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)} = \frac{\frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}$$

where $\phi$ and $\Phi$ are the density and distribution functions of the standard normal distribution.
Compared with the mean of the untruncated variable, the mean of the truncated variable is greater
if the truncation is from below, and the mean of the truncated variable is smaller if the truncation is
from above. Moreover, truncation reduces the variance compared with the variance in the untruncated
distribution.
Example 1
We will demonstrate truncreg with part of the Mroz dataset distributed with Berndt (1996). This
dataset contains 753 observations on women’s labor supply. Our subsample is of 250 observations,
with 150 market laborers and 100 nonmarket laborers.
. use http://www.stata-press.com/data/r13/laborsub
. describe
Contains data from http://www.stata-press.com/data/r13/laborsub.dta
obs: 250
vars: 6 25 Sep 2012 18:36
size: 1,750
storage display value
variable name type format label variable label
lfp byte %9.0g 1 if woman worked in 1975
whrs int %9.0g Wife’s hours of work
kl6 byte %9.0g # of children younger than 6
k618 byte %9.0g # of children between 6 and 18
wa byte %9.0g Wife’s age
we byte %9.0g Wife’s educational attainment
Sorted by:
. summarize, sep(0)
Variable Obs Mean Std. Dev. Min Max
lfp 250 .6 .4908807 0 1
whrs 250 799.84 915.6035 0 4950
kl6 250 .236 .5112234 0 3
k618 250 1.364 1.370774 0 8
wa 250 42.92 8.426483 30 60
we 250 12.352 2.164912 5 17
We first perform ordinary least-squares estimation on the market laborers.
. regress whrs kl6 k618 wa we if whrs > 0
Source SS df MS Number of obs = 150
F( 4, 145) = 2.80
Model 7326995.15 4 1831748.79 Prob > F = 0.0281
Residual 94793104.2 145 653745.546 R-squared = 0.0717
Adj R-squared = 0.0461
Total 102120099 149 685369.794 Root MSE = 808.55
whrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
kl6 -421.4822 167.9734 -2.51 0.013 -753.4748 -89.48953
k618 -104.4571 54.18616 -1.93 0.056 -211.5538 2.639668
wa -4.784917 9.690502 -0.49 0.622 -23.9378 14.36797
we 9.353195 31.23793 0.30 0.765 -52.38731 71.0937
_cons 1629.817 615.1301 2.65 0.009 414.0371 2845.597
Now we use truncreg to perform truncated regression with truncation from below zero.
. truncreg whrs kl6 k618 wa we, ll(0)
(note: 100 obs. truncated)
Fitting full model:
Iteration 0: log likelihood = -1205.6992
Iteration 1: log likelihood = -1200.9873
Iteration 2: log likelihood = -1200.9159
Iteration 3: log likelihood = -1200.9157
Iteration 4: log likelihood = -1200.9157
Truncated regression
Limit: lower = 0 Number of obs = 150
upper = +inf Wald chi2(4) = 10.05
Log likelihood = -1200.9157 Prob > chi2 = 0.0395
whrs Coef. Std. Err. z P>|z| [95% Conf. Interval]
kl6 -803.0042 321.3614 -2.50 0.012 -1432.861 -173.1474
k618 -172.875 88.72898 -1.95 0.051 -346.7806 1.030579
wa -8.821123 14.36848 -0.61 0.539 -36.98283 19.34059
we 16.52873 46.50375 0.36 0.722 -74.61695 107.6744
_cons 1586.26 912.355 1.74 0.082 -201.9233 3374.442
/sigma 983.7262 94.44303 10.42 0.000 798.6213 1168.831
If we assume that our data were censored, the tobit model is
. tobit whrs kl6 k618 wa we, ll(0)
Tobit regression Number of obs = 250
LR chi2(4) = 23.03
Prob > chi2 = 0.0001
Log likelihood = -1367.0903 Pseudo R2 = 0.0084
whrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
kl6 -827.7657 214.7407 -3.85 0.000 -1250.731 -404.8008
k618 -140.0192 74.22303 -1.89 0.060 -286.2129 6.174547
wa -24.97919 13.25639 -1.88 0.061 -51.08969 1.131317
we 103.6896 41.82393 2.48 0.014 21.31093 186.0683
_cons 589.0001 841.5467 0.70 0.485 -1068.556 2246.556
/sigma 1309.909 82.73335 1146.953 1472.865
Obs. summary: 100 left-censored observations at whrs<=0
150 uncensored observations
0 right-censored observations
Technical note
Whether truncated regression is more appropriate than ordinary least-squares estimation depends on the purpose of that estimation. If we are interested in the mean of wife's working hours conditional on the subsample of market laborers, least-squares estimation is appropriate. However, if we are interested in the mean of wife's working hours regardless of market or nonmarket labor status, least-squares estimates could be seriously misleading.
Truncation and censoring are different concepts. A sample has been censored if no observations
have been systematically excluded but some of the information contained in them has been suppressed.
In a truncated distribution, only the part of the distribution above (or below, or between) the truncation
points is relevant to our computations. We need to scale it up by the probability that an observation
falls in the range that interests us to make the distribution integrate to one. The censored distribution
used by tobit, however, is a mixture of discrete and continuous distributions. Instead of rescaling
over the observable range, we simply assign the full probability from the censored regions to the
censoring points. The truncated regression model is sometimes less well behaved than the tobit model.
Davidson and MacKinnon (1993) provide an example where truncation results in more inconsistency
than censoring.
Stored results
truncreg stores the following in e():
Scalars
e(N) number of observations
e(N_bf) number of observations before truncation
e(chi2) model χ2
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_aux) number of auxiliary parameters
e(df_m) model degrees of freedom
e(ll) log likelihood
e(ll_0) log likelihood, constant-only model
e(N_clust) number of clusters
e(sigma) estimate of sigma
e(p) significance
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) truncreg
e(cmdline) command as typed
e(llopt) contents of ll(), if specified
e(ulopt) contents of ul(), if specified
e(depvar) name of dependent variable
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset1) offset
e(chi2type) Wald or LR; type of model χ2test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
e(means) means of independent variables
e(dummy) indicator for dummy variables
Functions
e(sample) marks estimation sample
Methods and formulas
Greene (2012, 833–839) and Davidson and MacKinnon (1993, 534–537) provide introductions to
the truncated regression model.
Let $y = \mathbf{X}\boldsymbol{\beta} + \epsilon$ be the model. $y$ represents continuous outcomes either observed or not observed. Our model assumes that $\epsilon \sim N(0, \sigma^2\mathbf{I})$.

Let $a$ be the lower limit and $b$ be the upper limit. The log likelihood is

$$\ln L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(y_j - \mathbf{x}_j\boldsymbol{\beta})^2 - \sum_{j=1}^{n}\log\left\{\Phi\left(\frac{b-\mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right) - \Phi\left(\frac{a-\mathbf{x}_j\boldsymbol{\beta}}{\sigma}\right)\right\}$$
This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.

truncreg also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
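For instance, the model of example 1 could be refit with robust standard errors (a sketch only; output omitted):
. use http://www.stata-press.com/data/r13/laborsub
. truncreg whrs kl6 k618 wa we, ll(0) vce(robust)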
References
Berndt, E. R. 1996. The Practice of Econometrics: Classic and Contemporary. New York: Addison–Wesley.
Cong, R. 1999. sg122: Truncated regression. Stata Technical Bulletin 52: 47–52. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 248–255. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Also see
[R] truncreg postestimation — Postestimation tools for truncreg
[R] regress — Linear regression
[R] tobit — Tobit regression
[MI] estimation — Estimation commands for use with mi estimate
[SVY] svy estimation — Estimation commands for survey data
[U] 20 Estimation and postestimation commands
Title
truncreg postestimation — Postestimation tools for truncreg
Description Syntax for predict Menu for predict Options for predict Also see
Description
The following postestimation commands are available after truncreg:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1) dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear combinations
of coefficients
lrtest (2) likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear combinations
of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with mi or svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

predict [type] {stub* | newvar_reg newvar_lnsigma} [if] [in], scores

statistic       Description
Main
  xb            linear prediction; the default
  stdp          standard error of the prediction
  stdf          standard error of the forecast
  pr(a,b)       Pr(a < y_j < b)
  e(a,b)        E(y_j | a < y_j < b)
  ystar(a,b)    E(y*_j), y*_j = max{a, min(y_j, b)}

These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for the estimation sample.
stdf is not allowed with svy estimation results.
where a and b may be numbers or variables; a missing (a ≥ .) means −∞, and b missing (b ≥ .) means +∞; see [U] 12.2.1 Missing values.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the prediction, which can be thought of as the standard error of
the predicted expected value or mean for the observation’s covariate pattern. The standard error
of the prediction is also referred to as the standard error of the fitted value.
stdf calculates the standard error of the forecast, which is the standard error of the point prediction
for 1 observation. It is commonly referred to as the standard error of the future or forecast value.
By construction, the standard errors produced by stdf are always larger than those produced by
stdp; see Methods and formulas in [R] regress postestimation.
pr(a,b) calculates Pr(a < x_jb + u_j < b), the probability that y_j|x_j would be observed in the interval (a, b).

a and b may be specified as numbers or variable names; lb and ub are variable names;
pr(20,30) calculates Pr(20 < x_jb + u_j < 30);
pr(lb,ub) calculates Pr(lb < x_jb + u_j < ub); and
pr(20,ub) calculates Pr(20 < x_jb + u_j < ub).

a missing (a ≥ .) means −∞; pr(.,30) calculates Pr(−∞ < x_jb + u_j < 30);
pr(lb,30) calculates Pr(−∞ < x_jb + u_j < 30) in observations for which lb ≥ . and calculates Pr(lb < x_jb + u_j < 30) elsewhere.

b missing (b ≥ .) means +∞; pr(20,.) calculates Pr(+∞ > x_jb + u_j > 20);
pr(20,ub) calculates Pr(+∞ > x_jb + u_j > 20) in observations for which ub ≥ . and calculates Pr(20 < x_jb + u_j < ub) elsewhere.
e(a,b) calculates E(x_jb + u_j | a < x_jb + u_j < b), the expected value of y_j|x_j conditional on y_j|x_j being in the interval (a, b), meaning that y_j|x_j is truncated. a and b are specified as they are for pr().

ystar(a,b) calculates E(y*_j), where y*_j = a if x_jb + u_j ≤ a, y*_j = b if x_jb + u_j ≥ b, and y*_j = x_jb + u_j otherwise, meaning that y*_j is censored. a and b are specified as they are for pr().
nooffset is relevant only if you specified offset(varname). It modifies the calculations made by predict so that they ignore the offset variable; the linear prediction is treated as x_jb rather than as x_jb + offset_j.

scores calculates equation-level score variables.

The first new variable will contain ∂lnL/∂(x_jβ).

The second new variable will contain ∂lnL/∂σ.
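Continuing with example 1 of [R] truncreg, the sketch below computes the linear prediction, a conditional mean over the positive range, and the standard error of the prediction; the new-variable names are arbitrary:
. use http://www.stata-press.com/data/r13/laborsub
. truncreg whrs kl6 k618 wa we, ll(0)
. predict xbhat
. predict condmean, e(0,.)
. predict sepred, stdp
Here e(0,.) requests E(y_j | 0 < y_j < +∞), matching the lower truncation limit used when the model was fit.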
Also see
[R] truncreg — Truncated regression
[U] 20 Estimation and postestimation commands
Title
ttest — t tests (mean-comparison tests)
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
One-sample t test

ttest varname == # [if] [in] [, level(#)]

Two-sample t test using groups

ttest varname [if] [in], by(groupvar) [options_1]

Two-sample t test using variables

ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)]

Paired t test

ttest varname1 == varname2 [if] [in] [, level(#)]

Immediate form of one-sample t test

ttesti #obs #mean #sd #val [, level(#)]

Immediate form of two-sample t test

ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options_2]

options_1        Description
Main
  by(groupvar)   variable defining the groups
  unequal        unpaired data have unequal variances
  welch          use Welch's approximation
  level(#)       set confidence level; default is level(95)
by(groupvar) is required.

options_2        Description
Main
  unequal        unpaired data have unequal variances
  welch          use Welch's approximation
  level(#)       set confidence level; default is level(95)

by is allowed with ttest; see [D] by.
Menu
ttest
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > t test (mean-comparison test)
ttesti
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > t test calculator
Description
ttest performs t tests on the equality of means. In the first form, ttest tests that varname has a mean of #. In the second form, ttest tests that varname has the same mean within the two groups defined by groupvar. In the third form, ttest tests that varname1 and varname2 have the same mean, assuming unpaired data. In the fourth form, ttest tests that varname1 and varname2 have the same mean, assuming paired data.

ttesti is the immediate form of ttest; see [U] 19 Immediate commands.

For the equivalent of a two-sample t test with sampling weights (pweights), use the svy: mean command with the over() option, and then use lincom; see [R] mean and [SVY] svy postestimation.
Options
 
Main
by(groupvar) specifies the groupvar that defines the two groups that ttest will use to test the hypothesis that their means are equal. Specifying by(groupvar) implies an unpaired (two-sample) t test. Do not confuse the by() option with the by prefix; you can specify both.
unpaired specifies that the data be treated as unpaired. The unpaired option is used when the two
sets of values to be compared are in different variables.
unequal specifies that the unpaired data not be assumed to have equal variances.
welch specifies that the approximate degrees of freedom for the test be obtained from Welch’s formula
(1947) rather than from Satterthwaite’s approximation formula (1946), which is the default when
unequal is specified. Specifying welch implies unequal.
level(#)specifies the confidence level, as a percentage, for confidence intervals. The default is
level(95) or as set by set level; see [U] 20.7 Specifying the width of confidence intervals.
Remarks and examples
Remarks are presented under the following headings:
One-sample t test
Two-sample t test
Paired t test
Two-sample t test compared with one-way ANOVA
Immediate form
Video examples
One-sample t test
Example 1
In the first form, ttest tests whether the mean of the sample is equal to a known constant under
the assumption of unknown variance. Assume that we have a sample of 74 automobiles. We know
each automobile’s average mileage rating and wish to test whether the overall average for the sample
is 20 miles per gallon.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. ttest mpg==20
One-sample t test
Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
mpg 74 21.2973 .6725511 5.785503 19.9569 22.63769
mean = mean(mpg) t = 1.9289
Ho: mean = 20 degrees of freedom = 73
Ha: mean < 20 Ha: mean != 20 Ha: mean > 20
Pr(T < t) = 0.9712 Pr(|T| > |t|) = 0.0576 Pr(T > t) = 0.0288
The test indicates that the underlying mean is not 20 with a significance level of 5.8%.
Two-sample t test
Example 2: Two-sample t test using groups
We are testing the effectiveness of a new fuel additive. We run an experiment in which 12 cars
are given the fuel treatment and 12 cars are not. The results of the experiment are as follows:
treated mpg
0 20
0 23
0 21
0 25
0 18
0 17
0 18
0 24
0 20
0 24
0 23
0 19
1 24
1 25
1 21
1 22
1 23
1 18
1 17
1 28
1 24
1 27
1 21
1 23
The treated variable is coded as 1 if the car received the fuel treatment and 0 otherwise.
We can test the equality of means of the treated and untreated group by typing
. use http://www.stata-press.com/data/r13/fuel3
. ttest mpg, by(treated)
Two-sample t test with equal variances
Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
0 12 21 .7881701 2.730301 19.26525 22.73475
1 12 22.75 .9384465 3.250874 20.68449 24.81551
combined 24 21.875 .6264476 3.068954 20.57909 23.17091
diff -1.75 1.225518 -4.291568 .7915684
diff = mean(0) - mean(1) t = -1.4280
Ho: diff = 0 degrees of freedom = 22
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0837 Pr(|T| > |t|) = 0.1673 Pr(T > t) = 0.9163
We do not find a statistically significant difference in the means.
If we were not willing to assume that the variances were equal and wanted to use Welch’s formula,
we could type
. ttest mpg, by(treated) welch
Two-sample t test with unequal variances
Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
0 12 21 .7881701 2.730301 19.26525 22.73475
1 12 22.75 .9384465 3.250874 20.68449 24.81551
combined 24 21.875 .6264476 3.068954 20.57909 23.17091
diff -1.75 1.225518 -4.28369 .7836902
diff = mean(0) - mean(1) t = -1.4280
Ho: diff = 0 Welch’s degrees of freedom = 23.2465
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0833 Pr(|T| > |t|) = 0.1666 Pr(T > t) = 0.9167
Technical note
In randomized designs analyzed with the two-sample t test using groups, subjects will sometimes refuse the assigned treatment but still be measured for an outcome. In this case, take care to specify the group properly. You might be tempted to let varname contain missing where the subject refused and thus let ttest drop such observations from the analysis. Zelen (1979) argues that it would be better to specify that the subject belongs to the group in which he or she was randomized, even though such inclusion will dilute the measured effect.
Example 3: Two-sample t test using variables
There is a second, inferior way to organize the data in the preceding example. We ran a test on
24 cars, 12 without the additive and 12 with. We now create two new variables, mpg1 and mpg2.
mpg1 mpg2
20 24
23 25
21 21
25 22
18 23
17 18
18 17
24 28
20 24
24 27
23 21
19 23
This method is inferior because it suggests a connection that is not there. There is no link between
the car with 20 mpg and the car with 24 mpg in the first row of the data. Each column of data could
be arranged in any order. Nevertheless, if our data are organized like this, ttest can accommodate
us.
. use http://www.stata-press.com/data/r13/fuel
. ttest mpg1==mpg2, unpaired
Two-sample t test with equal variances
Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
mpg1 12 21 .7881701 2.730301 19.26525 22.73475
mpg2 12 22.75 .9384465 3.250874 20.68449 24.81551
combined 24 21.875 .6264476 3.068954 20.57909 23.17091
diff -1.75 1.225518 -4.291568 .7915684
diff = mean(mpg1) - mean(mpg2) t = -1.4280
Ho: diff = 0 degrees of freedom = 22
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0837 Pr(|T| > |t|) = 0.1673 Pr(T > t) = 0.9163
Paired t test
Example 4
Suppose that the preceding data were actually collected by running a test on 12 cars. Each car
was run once with the fuel additive and once without. Our data are stored in the same manner as in
example 3, but this time, there is most certainly a connection between the mpg values that appear
in the same row. These come from the same car. The variables mpg1 and mpg2 represent mileage
without and with the treatment, respectively.
. use http://www.stata-press.com/data/r13/fuel
. ttest mpg1==mpg2
Paired t test
Variable Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
mpg1 12 21 .7881701 2.730301 19.26525 22.73475
mpg2 12 22.75 .9384465 3.250874 20.68449 24.81551
diff 12 -1.75 .7797144 2.70101 -3.46614 -.0338602
mean(diff) = mean(mpg1 - mpg2) t = -2.2444
Ho: mean(diff) = 0 degrees of freedom = 11
Ha: mean(diff) < 0 Ha: mean(diff) != 0 Ha: mean(diff) > 0
Pr(T < t) = 0.0232 Pr(|T| > |t|) = 0.0463 Pr(T > t) = 0.9768
We find that the means are statistically different from each other at any level greater than 4.6%.
Two-sample t test compared with one-way ANOVA
Example 5
In example 2, we saw that ttest can be used to test the equality of a pair of means; see [R] oneway for an extension that allows testing the equality of more than two means.
Suppose that we have data on the 50 states. The dataset contains the median age of the population
(medage) and the region of the country (region) for each state. Region 1 refers to the Northeast,
region 2 to the North Central, region 3 to the South, and region 4 to the West. Using oneway, we
can test the equality of all four means.
. use http://www.stata-press.com/data/r13/census
(1980 Census data by state)
. oneway medage region
Analysis of Variance
Source SS df MS F Prob > F
Between groups 46.3961903 3 15.4653968 7.56 0.0003
Within groups 94.1237947 46 2.04616945
Total 140.519985 49 2.8677548
Bartlett’s test for equal variances: chi2(3) = 10.5757 Prob>chi2 = 0.014
We find that the means are different, but we are interested only in testing whether the means for the
Northeast (region==1) and West (region==4) are different. We could use oneway:
. oneway medage region if region==1 | region==4
Analysis of Variance
Source SS df MS F Prob > F
Between groups 46.241247 1 46.241247 20.02 0.0002
Within groups 46.1969169 20 2.30984584
Total 92.4381638 21 4.40181733
Bartlett’s test for equal variances: chi2(1) = 2.4679 Prob>chi2 = 0.116
We could also use ttest:
. ttest medage if region==1 | region==4, by(region)
Two-sample t test with equal variances
Group Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
NE 9 31.23333 .3411581 1.023474 30.44662 32.02005
West 13 28.28462 .4923577 1.775221 27.21186 29.35737
combined 22 29.49091 .4473059 2.098051 28.56069 30.42113
diff 2.948718 .6590372 1.57399 4.323445
diff = mean(NE) - mean(West) t = 4.4743
Ho: diff = 0 degrees of freedom = 20
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.9999 Pr(|T| > |t|) = 0.0002 Pr(T > t) = 0.0001
The significance levels of both tests are the same.
Immediate form
Example 6
ttesti is like ttest, except that we specify summary statistics rather than variables as arguments.
For instance, we are reading an article that reports the mean number of sunspots per month as 62.6
with a standard deviation of 15.8. There are 24 months of data. We wish to test whether the mean
is 75:
. ttesti 24 62.6 15.8 75
One-sample t test
Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
x 24 62.6 3.225161 15.8 55.92825 69.27175
mean = mean(x) t = -3.8448
Ho: mean = 75 degrees of freedom = 23
Ha: mean < 75 Ha: mean != 75 Ha: mean > 75
Pr(T < t) = 0.0004 Pr(|T| > |t|) = 0.0008 Pr(T > t) = 0.9996
Example 7
There is no immediate form of ttest with paired data because the test is also a function of the
covariance, a number unlikely to be reported in any published source. For nonpaired data, however,
we might type
. ttesti 20 20 5 32 15 4
Two-sample t test with equal variances
Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
x 20 20 1.118034 5 17.65993 22.34007
y 32 15 .7071068 4 13.55785 16.44215
combined 52 16.92308 .6943785 5.007235 15.52905 18.3171
diff 5 1.256135 2.476979 7.523021
diff = mean(x) - mean(y) t = 3.9805
Ho: diff = 0 degrees of freedom = 50
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.9999 Pr(|T| > |t|) = 0.0002 Pr(T > t) = 0.0001
If we had typed ttesti 20 20 5 32 15 4, unequal, the test would have assumed unequal variances.
Video examples
One-sample t test in Stata
t test for two independent samples in Stata
t test for two paired samples in Stata
Immediate commands in Stata: One-sample t test from summary data
Immediate commands in Stata: Two-sample t test from summary data
Stored results
ttest and ttesti store the following in r():
Scalars
  r(N_1)      sample size n_1
  r(N_2)      sample size n_2
  r(p_l)      lower one-sided p-value
  r(p_u)      upper one-sided p-value
  r(p)        two-sided p-value
  r(se)       estimate of standard error
  r(t)        t statistic
  r(sd_1)     standard deviation for first variable
  r(sd_2)     standard deviation for second variable
  r(sd)       combined standard deviation
  r(mu_1)     x̄_1, mean for population 1
  r(mu_2)     x̄_2, mean for population 2
  r(df_t)     degrees of freedom
  r(level)    confidence level
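For example, the stored results can be inspected after the one-sample test of example 1 (a minimal sketch):
. use http://www.stata-press.com/data/r13/auto
. ttest mpg==20
. return list
. display r(p)
return list shows everything stored in r(), and r(p) is the two-sided p-value of the test.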
Methods and formulas
See, for instance, Hoel (1984, 140–161) or Dixon and Massey (1983, 121–130) for an introduction and explanation of the calculation of these tests. Acock (2014, 162–173) and Hamilton (2013, 145–150) describe t tests using applications in Stata.

The test for $\mu = \mu_0$ for unknown $\sigma$ is given by

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}$$

The statistic is distributed as Student's t with $n-1$ degrees of freedom (Gosset [Student, pseud.] 1908).
The test for $\mu_x = \mu_y$ when $\sigma_x$ and $\sigma_y$ are unknown but $\sigma_x = \sigma_y$ is given by

$$t = \frac{\bar{x} - \bar{y}}{\left\{\dfrac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}\right\}^{1/2}\left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)^{1/2}}$$

The result is distributed as Student's t with $n_x + n_y - 2$ degrees of freedom.
You could perform ttest (without the unequal option) in a regression setting given that regression assumes a homoskedastic error model. To compare with the ttest command, denote the underlying observations on x and y by $x_j$, $j = 1, \dots, n_x$, and $y_j$, $j = 1, \dots, n_y$. In a regression framework, typing ttest without the unequal option is equivalent to

1. creating a new variable $z_j$ that represents the stacked observations on x and y (so that $z_j = x_j$ for $j = 1, \dots, n_x$ and $z_{n_x+j} = y_j$ for $j = 1, \dots, n_y$)

2. and then estimating the equation $z_j = \beta_0 + \beta_1 d_j + \epsilon_j$, where $d_j = 0$ for $j = 1, \dots, n_x$ and $d_j = 1$ for $j = n_x+1, \dots, n_x+n_y$ (that is, $d_j = 0$ when the z observations represent x, and $d_j = 1$ when the z observations represent y).

The estimated value of $\beta_1$, $b_1$, will equal $\bar{y} - \bar{x}$, and the reported t statistic will be the same t statistic as given by the formula above; the sketch following this paragraph illustrates the equivalence.
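As a sketch of this equivalence using the fuel dataset from example 3 (the stacked-variable and indicator names are arbitrary):
. use http://www.stata-press.com/data/r13/fuel, clear
. ttest mpg1==mpg2, unpaired
. stack mpg1 mpg2, into(z) clear
. generate byte d = _stack - 1
. regress z d
The coefficient on d equals the difference in group means, and its t statistic matches in magnitude the one reported by ttest.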
The test for $\mu_x = \mu_y$ when $\sigma_x$ and $\sigma_y$ are unknown and $\sigma_x \neq \sigma_y$ is given by

    t = \frac{\bar{x} - \bar{y}}{\left( s_x^2/n_x + s_y^2/n_y \right)^{1/2}}

The result is distributed as Student's t with $\nu$ degrees of freedom, where $\nu$ is given by (with Satterthwaite's [1946] formula)

    \nu = \frac{\left( s_x^2/n_x + s_y^2/n_y \right)^2}
               {\dfrac{\left( s_x^2/n_x \right)^2}{n_x - 1} + \dfrac{\left( s_y^2/n_y \right)^2}{n_y - 1}}
With Welch's formula (1947), the number of degrees of freedom is given by

    -2 + \frac{\left( s_x^2/n_x + s_y^2/n_y \right)^2}
              {\dfrac{\left( s_x^2/n_x \right)^2}{n_x + 1} + \dfrac{\left( s_y^2/n_y \right)^2}{n_y + 1}}
The test for $\mu_x = \mu_y$ for matched observations (also known as paired observations, correlated pairs, or permanent components) is given by

    t = \frac{\bar{d}\sqrt{n}}{s_d}

where $\bar{d}$ represents the mean of $x_i - y_i$ and $s_d$ represents the standard deviation. The test statistic t is distributed as Student's t with $n-1$ degrees of freedom.
You can also use ttest without the unpaired option in a regression setting because a paired comparison includes the assumption of constant variance; the corresponding regression is of the differences on a constant,

    (x_j - y_j) = \beta_0 + \epsilon_j

The ttest with an unequal variance assumption does not lend itself to an easy representation in regression settings and is not discussed here.
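A minimal sketch of the paired case in a regression setting, again with illustrative variable names x and y:
. generate d = x - y
. regress d
. ttest x == y
regress d is a regression on an intercept only, so the t statistic on the constant reproduces the paired t statistic reported by ttest x == y.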
 
William Sealy Gosset (1876–1937) was born in Canterbury, England. He studied chemistry and
mathematics at Oxford and worked as a chemist with the brewers Guinness in Dublin. Gosset
became interested in statistical problems, which he discussed with Karl Pearson and later with
Fisher and Neyman. He published several important papers under the pseudonym “Student”, and
he lent that name to the t test he invented.
 
References
Acock, A. C. 2014. A Gentle Introduction to Stata. 4th ed. College Station, TX: Stata Press.
Boland, P. J. 2000. William Sealy Gosset—alias ‘Student’ 1876–1937. In Creators of Mathematics: The Irish Connection,
ed. K. Houston, 105–112. Dublin: University College Dublin Press.
Dixon, W. J., and F. J. Massey, Jr. 1983. Introduction to Statistical Analysis. 4th ed. New York: McGraw–Hill.
Gleason, J. R. 1999. sg101: Pairwise comparisons of means, including the Tukey wsd method.Stata Technical Bulletin
47: 31–37. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 225–233. College Station, TX: Stata Press.
Gosset, W. S. 1943. “Student’s” Collected Papers. London: Biometrika Office, University College.
Gosset [Student, pseud.], W. S. 1908. The probable error of a mean. Biometrika 6: 1–25.
Hamilton, L. C. 2013. Statistics with Stata: Updated for Version 12. 8th ed. Boston: Brooks/Cole.
Hoel, P. G. 1984. Introduction to Mathematical Statistics. 5th ed. New York: Wiley.
Pearson, E. S., R. L. Plackett, and G. A. Barnard. 1990. ‘Student’: A Statistical Biography of William Sealy Gosset.
Oxford: Oxford University Press.
Preece, D. A. 1982. t is for trouble (and textbooks): A critique of some examples of the paired-samples t-test.
Statistician 31: 169–195.
Satterthwaite, F. E. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2:
110–114.
Senn, S. J., and W. Richardson. 1994. The first t-test. Statistics in Medicine 13: 785–803.
Welch, B. L. 1947. The generalization of ‘student’s’ problem when several different population variances are involved.
Biometrika 34: 28–35.
Zelen, M. 1979. A new design for randomized clinical trials. New England Journal of Medicine 300: 1242–1245.
Also see
[R]bitest Binomial probability test
[R]ci Confidence intervals for means, proportions, and counts
[R]esize Effect size based on mean comparison
[R]mean Estimate means
[R]oneway One-way analysis of variance
[R]prtest Tests of proportions
[R]sdtest Variance-comparison tests
[MV]hotelling Hotelling’s T-squared generalized means test
Title
update — Check for official updates
Syntax Menu Description Options
Remarks and examples Stored results Also see
Syntax
Report on update level of currently installed Stata
update
Set update source
update from location
Compare update level of currently installed Stata with that of source
update query [, from(location)]
Perform update if necessary
update all [, from(location) detail force exit]
Set automatic updates (Mac and Windows only)
set update_query on | off
set update_interval #
set update_prompt on | off
Menu
Help >Check for Updates
Description
The update command reports on the current update level and installs official updates to Stata. Official
updates are updates to Stata as it was originally shipped from StataCorp, not the additions to
Stata published in, for instance, the Stata Journal (SJ). Those additions are installed using the net
command and updated using the adoupdate command; see [R]net and [R]adoupdate.
update without arguments reports on the update level of the currently installed Stata.
update from sets an update source, where location is a directory name or URL. If you are on the
Internet, type update from http://www.stata.com.
update query compares the update level of the currently installed Stata with that available from the
update source and displays a report.
update all updates all necessary files. This is what you should type to check for and install updates.
set update_query determines if update query is to be automatically performed when Stata is launched. Only Mac and Windows platforms can be set for automatic updating.

set update_interval # sets the number of days to elapse before performing the next automatic update query. The default # is 7. The interval starts from the last time an update query was performed (automatically or manually). Only Mac and Windows platforms can be set for automatic updating.

set update_prompt determines whether a dialog is to be displayed before performing an automatic update query. The dialog allows you to perform an update query now, perform one the next time Stata is launched, perform one after the next interval has passed, or disable automatic update query. Only Mac and Windows platforms can be set for automatic updating.
Options
from(location)specifies the location of the update source. You can specify the from() option on
the individual update commands or use the update from command. Which you do makes no
difference. You typically do not need to use this option.
detail specifies to display verbose output during the update process.
force specifies to force downloading of all files even if, based on the date comparison, Stata does
not think it is necessary. There is seldom a reason to specify this option.
exit instructs Stata to exit when the update has successfully completed. There is seldom a reason
to specify this option.
Remarks and examples
update updates the official components of Stata from the official source: http://www.stata.com.
If you are connected to the Internet, the easy thing to do is to type
. update all
and follow the instructions. If Stata is up to date, update all will do nothing. Otherwise, it will
download whatever is necessary and install the files. If you just want to know what updates are
available, type
. update query
update query will check if any updates are available and report that information. If updates are
available, it will recommend that you type update all.
If you want to report the current update level, type
. update
update will report the update level of the Stata installation. update will also show you the date that
updates were last checked and if any updates were available at that time.
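For example, on Mac or Windows you might tell Stata to check every two weeks without prompting first; the 14-day interval below is only an illustration:
. set update_query on
. set update_interval 14
. set update_prompt off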
Stored results
update without a subcommand, update from, and update query store the following in r():
Scalars
r(inst_exe)          date of executable installed (*)
r(avbl_exe)          date of executable available over web (*) (**)
r(inst_ado)          date of ado-files installed (*)
r(avbl_ado)          date of ado-files available over web (*) (**)
r(inst_utilities)    date of utilities installed (*)
r(avbl_utilities)    date of utilities available over web (*) (**)
r(inst_docs)         date of documentation installed (*)
r(avbl_docs)         date of documentation available over web (*) (**)
Macros
r(name_exe)          name of the Stata executable
r(dir_exe)           directory in which executable is stored
r(dir_ado)           directory in which ado-files are stored
r(dir_utilities)     directory in which utilities are stored
r(dir_docs)          directory in which PDF documentation is stored
Notes:
* Dates are stored as integers counting the number of days since January 1, 1960; see [D] datetime.
** These dates are not stored by update without a subcommand because update by itself reports information
solely about the local computer and does not check what is available on the web.
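Because the dates are stored as %td integers, they can be formatted for display. A small sketch, assuming update has just been run:
. update
. display %td r(inst_ado)
. display %td r(inst_exe)
which prints the installation dates of the ado-files and the executable in readable form.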
Also see
[R]adoupdate Update user-written ado-files
[R]net Install and manage user-written additions from the Internet
[R]ssc Install and uninstall packages from SSC
[P]sysdir Query and set system directories
[U] 28 Using the Internet to keep up to date
[GSM] 19 Updating and extending Stata—Internet functionality
[GSU] 19 Updating and extending Stata—Internet functionality
[GSW] 19 Updating and extending Stata—Internet functionality
Title
vce_option — Variance estimators
Syntax Description Options Remarks and examples
Methods and formulas Also see
Syntax
estimation_cmd ... , vce(vcetype) ...
vcetype Description
Likelihood based
oim observed information matrix (OIM)
opg outer product of the gradient (OPG) vectors
Sandwich estimators
robust Huber/White/sandwich estimator
cluster clustvar clustered sandwich estimator
Replication based
bootstrap[, bootstrap_options]     bootstrap estimation
jackknife[, jackknife_options]     jackknife estimation
Description
This entry describes the vce() option, which is common to most estimation commands. vce()
specifies how to estimate the variance–covariance matrix (VCE) corresponding to the parameter
estimates. The standard errors reported in the table of parameter estimates are the square root of the
variances (diagonal elements) of the VCE.
Options
 
SE/Robust
vce(oim) is usually the default for models fit using maximum likelihood. vce(oim) uses the observed
information matrix (OIM); see [R]ml.
vce(opg) uses the sum of the outer product of the gradient (OPG) vectors; see [R]ml. This is the
default VCE when the technique(bhhh) option is specified; see [R]maximize.
vce(robust) uses the robust or sandwich estimator of variance. This estimator is robust to some
types of misspecification so long as the observations are independent; see [U] 20.21 Obtaining
robust variance estimates.
If the command allows pweights and you specify them, vce(robust) is implied; see
[U] 20.23.3 Sampling weights.
vce(cluster clustvar)specifies that the standard errors allow for intragroup correlation, relaxing the
usual requirement that the observations be independent. That is, the observations are independent
across groups (clusters) but not necessarily within groups. clustvar specifies to which group each
observation belongs, for example, vce(cluster personid) in data with repeated observations
on individuals. vce(cluster clustvar) affects the standard errors and variance–covariance matrix
of the estimators but not the estimated coefficients; see [U] 20.21 Obtaining robust variance
estimates.
vce(bootstrap[, bootstrap_options]) uses a bootstrap; see [R] bootstrap. After estimation with vce(bootstrap), see [R] bootstrap postestimation to obtain percentile-based or bias-corrected confidence intervals.

vce(jackknife[, jackknife_options]) uses the delete-one jackknife; see [R] jackknife.
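For example, using the automobile data that also appear below, a robust VCE is requested simply by adding the option to the estimation command; the particular regressors are only for illustration:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg turn trunk, vce(robust)
Only the standard errors, test statistics, and confidence intervals change relative to the default VCE; the coefficient estimates themselves are unaffected.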
Remarks and examples
Remarks are presented under the following headings:
Prefix commands
Passing options in vce()
Prefix commands
Specifying vce(bootstrap) or vce(jackknife) is often equivalent to using the corresponding
prefix command. Here is an example using jackknife with regress.
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg turn trunk, vce(jackknife)
(running regress on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Linear regression Number of obs = 74
Replications = 74
F( 2, 73) = 66.26
Prob > F = 0.0000
R-squared = 0.5521
Adj R-squared = 0.5395
Root MSE = 3.9260
Jackknife
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
turn -.7610113 .150726 -5.05 0.000 -1.061408 -.4606147
trunk -.3161825 .1282326 -2.47 0.016 -.5717498 -.0606152
_cons 55.82001 5.031107 11.09 0.000 45.79303 65.84699
. jackknife: regress mpg turn trunk
(running regress on estimation sample)
Jackknife replications (74)
12345
.................................................. 50
........................
Linear regression Number of obs = 74
Replications = 74
F( 2, 73) = 66.26
Prob > F = 0.0000
R-squared = 0.5521
Adj R-squared = 0.5395
Root MSE = 3.9260
Jackknife
mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]
turn -.7610113 .150726 -5.05 0.000 -1.061408 -.4606147
trunk -.3161825 .1282326 -2.47 0.016 -.5717498 -.0606152
_cons 55.82001 5.031107 11.09 0.000 45.79303 65.84699
Here it does not matter whether we specify the vce(jackknife) option or instead use the jackknife
prefix.
However, vce(jackknife) should be used in place of the jackknife prefix whenever available
because they are not always equivalent. For example, to use the jackknife prefix with clogit
properly, you must tell jackknife to omit whole groups rather than individual observations. Specifying
vce(jackknife) does this automatically.
. use http://www.stata-press.com/data/r13/clogitid
. jackknife, cluster(id): clogit y x1 x2, group(id)
(output omitted )
This extra information is automatically communicated to jackknife by clogit when the vce()
option is specified.
. clogit y x1 x2, group(id) vce(jackknife)
(running clogit on estimation sample)
Jackknife replications (66)
12345
.................................................. 50
................
Conditional (fixed-effects) logistic regression
Number of obs = 369
Replications = 66
F( 2, 65) = 4.58
Prob > F = 0.0137
Log likelihood = -123.41386 Pseudo R2 = 0.0355
(Replications based on 66 clusters in id)
Jackknife
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x1 .653363 .3010608 2.17 0.034 .052103 1.254623
x2 .0659169 .0487858 1.35 0.181 -.0315151 .1633489
Passing options in vce()
If you wish to specify more options to the bootstrap or jackknife estimation, you can include them
within the vce() option. Below we request 300 bootstrap replications and save the replications in
bsreg.dta:
. use http://www.stata-press.com/data/r13/auto
(1978 Automobile Data)
. regress mpg turn trunk, vce(bootstrap, nodots seed(123) rep(300) saving(bsreg))
Linear regression Number of obs = 74
Replications = 300
Wald chi2(2) = 127.28
Prob > chi2 = 0.0000
R-squared = 0.5521
Adj R-squared = 0.5395
Root MSE = 3.9260
Observed Bootstrap Normal-based
mpg Coef. Std. Err. z P>|z| [95% Conf. Interval]
turn -.7610113 .1361786 -5.59 0.000 -1.027916 -.4941062
trunk -.3161825 .1145728 -2.76 0.006 -.540741 -.0916239
_cons 55.82001 4.69971 11.88 0.000 46.60875 65.03127
. bstat using bsreg
Bootstrap results Number of obs = 74
Replications = 300
command: regress mpg turn trunk
Observed Bootstrap Normal-based
Coef. Std. Err. z P>|z| [95% Conf. Interval]
turn -.7610113 .1361786 -5.59 0.000 -1.027916 -.4941062
trunk -.3161825 .1145728 -2.76 0.006 -.540741 -.0916239
_cons 55.82001 4.69971 11.88 0.000 46.60875 65.03127
Methods and formulas
By default, Stata’s maximum likelihood estimators display standard errors based on variance
estimates given by the inverse of the negative Hessian (second derivative) matrix. If vce(robust),
vce(cluster clustvar), or pweights is specified, standard errors are based on the robust variance
estimator (see [U] 20.21 Obtaining robust variance estimates); likelihood-ratio tests are not appropriate
here (see [SVY] survey), and the model χ2 is from a Wald test. If vce(opg) is specified, the standard errors are based on the outer product of the gradients; this option has no effect on likelihood-ratio tests, though it does affect Wald tests.

If vce(bootstrap) or vce(jackknife) is specified, the standard errors are based on the chosen replication method; here the model χ2 or F statistic is from a Wald test using the respective replication-based covariance matrix. The t distribution is used in the coefficient table when the vce(jackknife) option is specified. vce(bootstrap) and vce(jackknife) are also available with some commands that are not maximum likelihood estimators.
Also see
[R]bootstrap Bootstrap sampling and estimation
[R]jackknife Jackknife estimation
[XT]vce options Variance estimators
[U] 20 Estimation and postestimation commands
Title
view — View files and logs
Syntax Menu Description Options
Remarks and examples Also see
Syntax
Display file in Viewer
view file "filename", asis adopath
Bring up browser pointed to specified URL
view browse "url"
Display help results in Viewer
view help topic or command name
Display search results in Viewer
view search keywords
Display news results in Viewer
view news
Display net results in Viewer
view net netcmd
Display ado-results in Viewer
view ado adocmd
Display update results in Viewer
view update updatecmd
Menu
File >View...
Description
view displays file contents in the Viewer.
view file displays the specified file. file is optional, so if you had a SMCL session log created by typing log using mylog, you could view it by typing view mylog.smcl. view file can properly display .smcl files (logs and the like), .sthlp files, and text files. view file's asis option specifies that the file be displayed as plain text, regardless of the filename's extension.
view browse opens your browser pointed to url. Typing
view browse http://www.stata.com would bring up your browser pointed to the
http://www.stata.com website.
view help displays the specified topic in the Viewer. For example, to review the help for Stata’s
print command, you could type help print. See [R]help for more details.
view search displays the results of the search command in the Viewer. For instance, to search
the system help for information on robust regression, you could type search robust regression.
See [R]search for more details.
view news does the same as the news command (see [R] news) but displays the results in the Viewer. (news displays the latest news from http://www.stata.com.)

view net does the same as the net command (see [R] net) but displays the result in the Viewer. For instance, typing view net search hausman test would search the Internet for additions to Stata related to the Hausman test. Typing view net from http://www.stata.com would go to the Stata additions download site at http://www.stata.com.

view ado does the same as the ado command (see [R] net) but displays the result in the Viewer. For instance, typing view ado dir would show a list of files you have installed.

view update does the same as the update command (see [R] update) but displays the result in the Viewer. Typing view update would show the dates of what you have installed, and from there you could click to compare those dates with the latest updates available. Typing view update query would skip the first step and show the comparison.
Options
asis, allowed with view file, specifies that the file be displayed as text, regardless of the filename's extension. view file's default action is to display files ending in .smcl and .sthlp as SMCL; see [P] smcl.

adopath, allowed with view file, specifies that Stata search the S_ADO path for filename and display it, if found.
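For example, a short sketch using a log named mylog, as in the description above:
. log using mylog
. display 2 + 2
. log close
. view file mylog.smcl
. view file mylog.smcl, asis
The first view renders the SMCL markup; the second shows the raw SMCL codes because asis forces plain-text display.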
Remarks and examples
Most users access the Viewer by selecting File > View... and proceeding from there. Some commands allow you to skip that step. Some common interactive uses of commands that display their results in the Viewer are the following:
. view mysession.smcl
. view mysession.log
. help print
. help regress
. view news
. view browse http://www.stata.com
. search hausman test
. view net
. view ado
. view update query
Also see
[R]help Display help in Stata
[R]net Install and manage user-written additions from the Internet
[R]news Report Stata news
[R]search Search Stata documentation and other resources
[R]update Check for official updates
[D]type Display contents of a file
[GSM] 3 Using the Viewer
[GSU] 3 Using the Viewer
[GSW] 3 Using the Viewer
Title
vwls — Variance-weighted least squares
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
vwls depvar indepvars [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
sd(varname)variable containing estimate of conditional standard deviation
Reporting
level(#)set confidence level; default is level(95)
display options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
coeflegend display legend instead of statistics
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, jackknife, rolling, and statsby are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
fweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics >Linear models and related >Other >Variance-weighted least squares
Description
vwls estimates a linear regression using variance-weighted least squares. It differs from ordinary
least-squares (OLS) regression in that it does not assume homogeneity of variance, but requires that
the conditional variance of depvar be estimated prior to the regression. The estimated variance need
not be constant across observations. vwls treats the estimated variance as if it were the true variance
when it computes the standard errors of the coefficients.
You must supply an estimate of the conditional standard deviation of depvar to vwls by using
the sd(varname)option, or you must have grouped data with the groups defined by the indepvars
variables. In the latter case, vwls treats all indepvars as categorical variables, computes the mean
and standard deviation of depvar separately for each subgroup, and computes the regression of the
subgroup means on indepvars.
regress with analytic weights can be used to produce another kind of “variance-weighted least
squares”; see Remarks and examples for an explanation of the difference.
Options
 
Model
noconstant; see [R]estimation options.
sd(varname)is an estimate of the conditional standard deviation of depvar (that is, it can vary
observation by observation). All values of varname must be >0. If you specify sd(), you cannot
use fweights.
If sd() is not given, the data will be grouped by indepvars. Here indepvars are treated as categorical
variables, and the means and standard deviations of depvar for each subgroup are calculated and
used for the regression. Any subgroup for which the standard deviation is zero is dropped.
 
Reporting
level(#); see [R]estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
The following option is available with vwls but is not shown in the dialog box:
coeflegend; see [R]estimation options.
Remarks and examples
The vwls command is intended for use with two special (and different) types of data. The first
contains data that consist of measurements from physical science experiments in which all error is
due solely to measurement errors and the sizes of the measurement errors are known.
You can also use variance-weighted least-squares linear regression for certain problems in categorical
data analysis, such as when all the independent variables are categorical and the outcome variable is
either continuous or a quantity that can sensibly be averaged. If each of the subgroups defined by
the categorical variables contains a reasonable number of subjects, then the variance of the outcome
variable can be estimated independently within each subgroup. For the purposes of estimation, vwls
treats each subgroup as one observation, with the dependent variable being the subgroup mean of the
outcome variable.
The vwls command fits the model

    y_i = x_i \beta + \varepsilon_i

where the errors $\varepsilon_i$ are independent normal random variables with the distribution $\varepsilon_i \sim N(0, \nu_i)$. The independent variables $x_i$ are assumed to be known without error.

As described above, vwls assumes that you already have estimates $s_i^2$ for the variances $\nu_i$. The error variance is not estimated in the regression. The estimates $s_i^2$ are used to compute the standard errors of the coefficients; see Methods and formulas below.

In contrast, weighted OLS regression assumes that the errors have the distribution $\varepsilon_i \sim N(0, \sigma^2/w_i)$, where the $w_i$ are known weights and $\sigma^2$ is an unknown parameter that is estimated in the regression. This is the difference from variance-weighted least squares: in weighted OLS, the magnitude of the error variance is estimated in the regression using all the data.
Example 1
An artificial, but informative, example illustrates the difference between variance-weighted least
squares and weighted OLS.
We measure the quantities xiand yiand estimate that the standard deviation of yiis si. We enter
the data into Stata:
. use http://www.stata-press.com/data/r13/vwlsxmpl
. list
x y s
1. 1 1.2 .5
2. 2 1.9 .5
3. 3 3.2 1
4. 4 4.3 1
5. 5 4.9 1
6. 6 6.0 2
7. 7 7.2 2
8. 8 7.9 2
Because we want observations with smaller variance to carry larger weight in the regression, we
compute an OLS regression with analytic weights proportional to the inverse of the squared standard
deviations:
. regress y x [aweight=s^(-2)]
(sum of wgt is 1.1750e+01)
Source SS df MS Number of obs = 8
F( 1, 6) = 702.26
Model 22.6310183 1 22.6310183 Prob > F = 0.0000
Residual .193355117 6 .032225853 R-squared = 0.9915
Adj R-squared = 0.9901
Total 22.8243734 7 3.26062477 Root MSE = .17952
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .9824683 .0370739 26.50 0.000 .8917517 1.073185
_cons .1138554 .1120078 1.02 0.349 -.1602179 .3879288
If we compute a variance-weighted least-squares regression by using vwls, we get the same results
for the coefficient estimates but very different standard errors:
. vwls y x, sd(s)
Variance-weighted least-squares regression Number of obs = 8
Goodness-of-fit chi2(6) = 0.28 Model chi2(1) = 33.24
Prob > chi2 = 0.9996 Prob > chi2 = 0.0000
y Coef. Std. Err. z P>|z| [95% Conf. Interval]
x .9824683 .170409 5.77 0.000 .6484728 1.316464
_cons .1138554 .51484 0.22 0.825 -.8952124 1.122923
Although the values of y_i were nicely linear with x_i, the vwls regression used the large estimates for the standard deviations to compute large standard errors for the coefficients. For weighted OLS regression, however, the scale of the analytic weights has no effect on the standard errors of the coefficients; only the relative proportions of the analytic weights affect the regression.

If we are sure of the sizes of our error estimates for y_i, using vwls is valid. However, if we can estimate only the relative proportions of error among the y_i, then vwls is not appropriate.
Example 2
Let’s now consider an example of the use of vwls with categorical data. Suppose that we have
blood pressure data for n=400 subjects, categorized by gender and race (black or white). Here is
a description of the data:
. use http://www.stata-press.com/data/r13/bp
. table gender race, c(mean bp sd bp freq) row col format(%8.1f)
Race
Gender White Black Total
Female 117.1 118.5 117.8
10.3 11.6 10.9
100.0 100.0 200.0
Male 122.1 125.8 124.0
10.6 15.5 13.3
100.0 100.0 200.0
Total 119.6 122.2 120.9
10.7 14.1 12.6
200.0 200.0 400.0
Performing a variance-weighted regression using vwls gives
. vwls bp gender race
Variance-weighted least-squares regression Number of obs = 400
Goodness-of-fit chi2(1) = 0.88 Model chi2(2) = 27.11
Prob > chi2 = 0.3486 Prob > chi2 = 0.0000
bp Coef. Std. Err. z P>|z| [95% Conf. Interval]
gender 5.876522 1.170241 5.02 0.000 3.582892 8.170151
race 2.372818 1.191683 1.99 0.046 .0371631 4.708473
_cons 116.6486 .9296297 125.48 0.000 114.8266 118.4707
By comparison, an OLS regression gives the following result:
. regress bp gender race
Source SS df MS Number of obs = 400
F( 2, 397) = 15.24
Model 4485.66639 2 2242.83319 Prob > F = 0.0000
Residual 58442.7305 397 147.210908 R-squared = 0.0713
Adj R-squared = 0.0666
Total 62928.3969 399 157.71528 Root MSE = 12.133
bp Coef. Std. Err. t P>|t| [95% Conf. Interval]
gender 6.1775 1.213305 5.09 0.000 3.792194 8.562806
race 2.5875 1.213305 2.13 0.034 .2021938 4.972806
_cons 116.4862 1.050753 110.86 0.000 114.4205 118.552
Note the larger value for the race coefficient (and smaller p-value) in the OLS regression. The
assumption of homogeneity of variance in OLS means that the mean for black men pulls the regression
line higher than in the vwls regression, which takes into account the larger variance for black men
and reduces its effect on the regression.
Stored results
vwls stores the following in e():
Scalars
e(N) number of observations
e(df_m)        model degrees of freedom
e(chi2)        model χ2
e(df_gf)       goodness-of-fit degrees of freedom
e(chi2_gf)     goodness-of-fit χ2
e(rank) rank of e(V)
Macros
e(cmd) vwls
e(cmdline) command as typed
e(depvar) name of dependent variable
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(V)           variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
Methods and formulas
Let $y = (y_1, y_2, \ldots, y_n)'$ be the vector of observations of the dependent variable, where n is the number of observations. When sd() is specified, let $s_1, s_2, \ldots, s_n$ be the standard deviations supplied by sd(). For categorical data, when sd() is not given, the means and standard deviations of y for each subgroup are computed, and n becomes the number of subgroups, y is the vector of subgroup means, and the $s_i$ are the standard deviations for the subgroups.

Let $V = \mathrm{diag}(s_1^2, s_2^2, \ldots, s_n^2)$ denote the estimate of the variance of y. Then the estimated regression coefficients are

    b = (X'V^{-1}X)^{-1} X'V^{-1} y

and their estimated covariance matrix is

    \widehat{\mathrm{Cov}}(b) = (X'V^{-1}X)^{-1}

A statistic for the goodness of fit of the model is

    Q = (y - Xb)' V^{-1} (y - Xb)

where Q has a $\chi^2$ distribution with $n - k$ degrees of freedom (k is the number of independent variables plus the constant, if any).
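These formulas can be verified by hand. Below is a minimal Mata sketch using the vwlsxmpl dataset from example 1; it only illustrates the algebra and is not how vwls is implemented internally:
. use http://www.stata-press.com/data/r13/vwlsxmpl, clear
. mata:
: y = st_data(., "y")
: X = st_data(., "x"), J(rows(y), 1, 1)
: Vinv = diag(1 :/ (st_data(., "s"):^2))
: b = invsym(X'*Vinv*X)*X'*Vinv*y
: b
: invsym(X'*Vinv*X)
: end
The vector b reproduces the slope and constant reported by vwls y x, sd(s), and the diagonal of the last matrix contains the squared standard errors.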
References
Gini, R., and J. Pasquini. 2006. Automatic generation of documents. Stata Journal 6: 22–39.
Grizzle, J. E., C. F. Starmer, and G. G. Koch. 1969. Analysis of categorical data by linear models. Biometrics 25:
489–504.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical Recipes: The Art of Scientific
Computing. 3rd ed. New York: Cambridge University Press.
Also see
[R]vwls postestimation Postestimation tools for vwls
[R]regress Linear regression
[U] 11.1.6 weight
[U] 20 Estimation and postestimation commands
Title
vwls postestimation — Postestimation tools for vwls
Description Syntax for predict Menu for predict Options for predict Also see
Description
The following postestimation commands are available after vwls:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estimates cataloging estimation results
forecast dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
linktest link test for model specification
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
Syntax for predict
predict [type] newvar [if] [in] [, xb stdp]
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted
only for the estimation sample.
Menu for predict
Statistics >Postestimation >Predictions, residuals, etc.
Options for predict
 
Main
xb, the default, calculates the linear prediction.
stdp calculates the standard error of the linear prediction.
Also see
[R]vwls Variance-weighted least squares
[U] 20 Estimation and postestimation commands
Title
which — Display location and version for an ado-file
Syntax Description Option Remarks and examples Also see
Syntax
which fname[.ftype] [, all]
Description
which looks for fname.ftype along the S_ADO path. If Stata finds the file, which displays the full path and filename, along with, if the file is text, all lines in the file that begin with "*!" in the first column. If Stata cannot find the file, which issues the message "file not found along ado-path" and
sets the return code to 111. ftype must be a file type for which Stata usually looks along the ado-path
to find. Allowable ftypes are
.ado,.class,.dlg,.idlg,.sthlp,.ihlp,.hlp,.key,.maint,.mata,.mlib,.mo,.mnu,
.plugin,.scheme,.stbcal, and .style
If ftype is omitted, which assumes .ado. When searching for .ado files, if Stata cannot find the
file, Stata then checks to see if fname is a built-in Stata command, allowing for valid abbreviations.
If it is, the message “built-in command” is displayed; if not, the message “command not found as
either built-in or ado-file” is displayed and the return code is set to 111.
For information about internal version control, see [P]version.
Option
all forces which to report the location of all files matching the fname.ftype found along the search
path. The default is to report just the first one found.
Remarks and examples
If you write programs, you know that you make changes to the programs over time. If you are like
us, you also end up with multiple versions of the program stored on your disk, perhaps in different
directories. You may even have given copies of your programs to other Stata users, and you may not
remember which version of a program you or your friends are using. The which command helps
you solve this problem.
Example 1
The which command displays the path for filename.ado and any lines in the code that begin
with “*!”. For example, we might want information about the test command, described in [R]test,
which is an ado-file written by StataCorp. Here is what happens when we type which test:
. which test
C:\Program Files\Stata13\ado\base\t\test.ado
*! version 2.2.2 07feb2012
which displays the path for the test.ado file and also a line beginning with "*!" that indicates the version of the file. This is how we, at StataCorp, do version control; see [U] 18.11.1 Version for an explanation of our version control numbers.

We do not need to be so formal. which will display anything typed after lines that begin with '*!'. For instance, we might write myprog.ado:
. which myprog
.\myprog.ado
*! first written 1/03/2013
*! bug fix on 1/05/2013 (no variance case)
*! updated 1/24/2013 to include noconstant option
*! still suspicious if variable takes on only two values
It does not matter where in the program the lines beginning with *! are; which will list them (in particular, our "still suspicious" comment was buried about 50 lines down in the code). All that is important is that the *! marker appear in the first two columns of a line.
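For instance, here is a minimal sketch of an ado-file written to take advantage of this; myprog and its contents are entirely hypothetical:
*! version 1.0.0  15jan2013  (example only)
program define myprog
        display "myprog ran"
end
Saving this as myprog.ado somewhere on the ado-path and typing which myprog would then report the file's location followed by the *! line.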
Example 2
If we type which command, where command is a built-in command rather than an ado-file, Stata
responds with
. which summarize
built-in command: summarize
If command was neither a built-in command nor an ado-file, Stata would respond with
. which junk
command junk not found as either built-in or ado-file
r(111);
Also see
[P]findfile Find file in path
[P]version Version control
[U] 17 Ado-files
[U] 18.11.1 Version
Title
xi — Interaction expansion
Syntax Menu Description Options
Remarks and examples Stored results References Also see
Syntax
xi [, prefix(string) noomit] term(s)

xi [, prefix(string) noomit] : any_stata_command varlist_with_terms ...

where a term has the form

    i.varname                  or   I.varname
    i.varname1*i.varname2           I.varname1*I.varname2
    i.varname1*varname3             I.varname1*varname3
    i.varname1|varname3             I.varname1|varname3

varname, varname1, and varname2 denote numeric or string categorical variables. varname3 denotes a continuous, numeric variable.
Menu
Data >Create or change data >Other variable-creation commands >Interaction expansion
 
Most commands in Stata now allow factor variables; see [U] 11.4.3 Factor variables. To determine
if a command allows factor variables, see the information printed below the options table for the
command. If the command allows factor variables, it will say something like “indepvars may
contain factor variables”.
We recommend that you use factor variables instead of xi if a command allows factor variables.
We include [R]xi in our documentation so that readers can consult it when using a Stata command
that does not allow factor variables.
 
Description
xi expands terms containing categorical variables into indicator (also called dummy) variable sets
by creating new variables and, in the second syntax (xi: any stata command ), executes the specified
command with the expanded terms. The dummy variables created are
i.varname                creates dummies for categorical variable varname
i.varname1*i.varname2    creates dummies for categorical variables varname1 and varname2:
                         all interactions and main effects
i.varname1*varname3      creates dummies for categorical variable varname1 and continuous
                         variable varname3: all interactions and main effects
i.varname1|varname3      creates dummies for categorical variable varname1 and continuous
                         variable varname3: all interactions and main effect of varname3,
                         but no main effect of varname1
Options
prefix(string) allows you to choose a prefix other than _I for the newly created interaction variables. The prefix cannot be longer than four characters. By default, xi will create interaction variables starting with _I. When you use xi, it drops all previously created interaction variables starting with the prefix specified in the prefix(string) option or with _I by default. Therefore, if you want to keep the variables with a certain prefix, specify a different prefix in the prefix(string) option.
noomit prevents xi from omitting groups. This option provides a way to generate an indicator
variable for every category having one or more variables, which is useful when combined with
the noconstant option of an estimation command.
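For example, the following sketch generates the full set of agegrp indicators and fits a cell-means style model; the variable names are illustrative only:
. xi, noomit: regress y i.agegrp, noconstant
Because no category is omitted and the constant is suppressed, each _Iagegrp_* coefficient estimates the mean of y within the corresponding age group.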
Remarks and examples
Remarks are presented under the following headings:
Background
Indicator variables for simple effects
Controlling the omitted dummy
Categorical variable interactions
Interactions with continuous variables
Using xi: Interpreting output
How xi names variables
xi as a command rather than a command prefix
Warnings
xi provides a convenient way to include dummy or indicator variables when fitting a model (say,
with regress or logistic). For instance, assume that the categorical variable agegrp contains 1
for ages 20–24, 2 for ages 25–39, 3 for ages 40–44, etc. Typing
. xi: logistic outcome weight i.agegrp bp
estimates a logistic regression of outcome on weight, dummies for each agegrp category, and bp.
That is, xi searches out and expands terms starting with "i." or "I." but ignores the other variables.
xi will expand both numeric and string categorical variables, so if you had a string variable race
containing “white”, “black”, and “other”, typing
. xi: logistic outcome weight bp i.agegrp i.race
would include indicator variables for the race group as well.
The i. indicator variables xi expands may appear anywhere in the varlist, so
. xi: logistic outcome i.agegrp weight i.race bp
would fit the same model.
You can also create interactions of categorical variables; typing
xi: logistic outcome weight bp i.agegrp*i.race
fits a model with indicator variables for all agegrp and race combinations, including the agegrp and
race main-effect terms (that is, the terms that are created when you just type i.agegrp i.race).
You can interact dummy variables with continuous variables; typing
xi: logistic outcome bp i.agegrp*weight i.race
fits a model with indicator variables for all agegrp categories interacted with weight, plus the
main-effect terms weight and i.agegrp.
You can get the interaction terms without the agegrp main effect (but with the weight main
effect) by typing
xi: logistic outcome bp i.agegrp|weight i.race
You can also include multiple interactions:
xi: logistic outcome bp i.agegrp*weight i.agegrp*i.race
We will now back up and describe the construction of dummy variables in more detail.
Background
The terms continuous,categorical, and indicator or dummy variables are used below. Continuous
variables measure somethingsuch as height or weightand at least conceptually can take on any
real number over some range. Categorical variables, on the other hand, take on a finite number of
values, each denoting membership in a subclassfor example, excellent, good, and poor, which
might be coded 0, 1, 2, or 1, 2, 3, or even “Excellent”, “Good”, and “Poor”. An indicator or
dummy variablethe terms are used interchangeablyis a special type of two-valued categorical
variable that contains values 0, denoting false, and 1, denoting true. The information contained in
any k-valued categorical variable can be equally well represented by kindicator variables. Instead
of one variable recording values representing excellent, good, and poor, you can have three indicator
variables, indicating the truth or falseness of “result is excellent”, “result is good”, and “result is
poor”.
xi provides a convenient way to convert categorical variables to dummy or indicator variables
when you fit a model (say, with regress or logistic).
Example 1
For instance, assume that the categorical variable agegrp contains 1 for ages 20–24, 2 for ages 25–39, and 3 for ages 40–44. (There is no one over 44 in our data.) As it stands, agegrp would be
a poor candidate for inclusion in a model even if we thought age affected the outcome. The reason
is that the coding would restrict the effect of being in the second age group to be twice the effect of
being in the first, and, similarly, the effect of being in the third to be three times the first. That is, if
we fit the model,
    y = β0 + β1 agegrp + Xβ2

the effect of being in the first age group is β1, the second 2β1, and the third 3β1. If the coding 1, 2, and 3 is arbitrary, we could just as well have coded the age groups 1, 4, and 9, making the effects β1, 4β1, and 9β1.

The solution is to convert the categorical variable agegrp to a set of indicator variables, a1, a2, and a3, where ai is 1 if the individual is a member of the ith age group and 0 otherwise. We can then fit the model

    y = β0 + β11 a1 + β12 a2 + β13 a3 + Xβ2

The effect of being in age group 1 is now β11; 2, β12; and 3, β13; and these results are independent of our (arbitrary) coding. The only difficulty at this point is that the model is unidentified in the sense that there are an infinite number of (β0, β11, β12, β13) that fit the data equally well.
To see this, pretend that (β0, β11, β12, β13) = (1, 1, 3, 4). The predicted values of y for the various age groups are

    y = 1 + 1 + Xβ2 = 2 + Xβ2    (age group 1)
        1 + 3 + Xβ2 = 4 + Xβ2    (age group 2)
        1 + 4 + Xβ2 = 5 + Xβ2    (age group 3)

Now pretend that (β0, β11, β12, β13) = (2, 0, 2, 3). The predicted values of y are

    y = 2 + 0 + Xβ2 = 2 + Xβ2    (age group 1)
        2 + 2 + Xβ2 = 4 + Xβ2    (age group 2)
        2 + 3 + Xβ2 = 5 + Xβ2    (age group 3)

These two sets of predictions are indistinguishable: for age group 1, y = 2 + Xβ2 regardless of the coefficient vector used, and similarly for age groups 2 and 3. This arises because we have three equations and four unknowns. Any solution is as good as any other, and, for our purposes, we merely need to choose one of them. The popular selection method is to set the coefficient on the first indicator variable to 0 (as we have done in our second coefficient vector). This is equivalent to fitting the model

    y = β0 + β12 a2 + β13 a3 + Xβ2

How we select a particular coefficient vector (that is, how we identify the model) does not matter. It does, however, affect the interpretation of the coefficients.

For instance, we could just as well choose to omit the second group. In our artificial example, this would yield (β0, β11, β12, β13) = (4, -2, 0, 1) instead of (2, 0, 2, 3). These coefficient vectors are the same in the sense that

    y = 2 + 0 + Xβ2 = 2 + Xβ2 = 4 - 2 + Xβ2    (age group 1)
        2 + 2 + Xβ2 = 4 + Xβ2 = 4 + 0 + Xβ2    (age group 2)
        2 + 3 + Xβ2 = 5 + Xβ2 = 4 + 1 + Xβ2    (age group 3)

But what does it mean that β13 can just as well be 3 or 1? We obtain β13 = 3 when we set β11 = 0, so β13 = β13 - β11 and β13 measures the difference between age groups 3 and 1.

In the second case, we obtain β13 = 1 when we set β12 = 0, so β13 - β12 = 1 and β13 measures the difference between age groups 3 and 2. There is no inconsistency. According to our β12 = 0 model, the difference between age groups 3 and 1 is β13 - β11 = 1 - (-2) = 3, the same result we got in the β11 = 0 model.
Example 2
The issue of interpretation is important because it can affect the way we discuss results. Imagine
that we are studying recovery after a coronary bypass operation. Assume that the age groups are
children under 13 (we have two of them), young adults under 25 (we have a handful of them), adults
under 46 (of which we have even more), mature adults under 56, older adults under 65, and elderly
adults. We follow the prescription of omitting the first group, so all our results are reported relative
to children under 13. While there is nothing statistically wrong with this, readers will be suspicious
when we make statements like “compared with young children, older and elder adults . . . ”. Moreover,
we will probably have to end each statement with “although results are not statistically significant”
because we have only two children in our comparison group. Of course, even with results reported in
this way, we can do reasonable comparisons (say, with mature adults), but we will have to do extra
work to perform the appropriate linear hypothesis test using Stata’s test command.
Here it would be better to force the omitted group to be more reasonable, such as mature adults.
There is, however, a generic rule for automatic comparison group selection that, although less popular,
tends to work better than the omit-the-first-group rule. That rule is to omit the most prevalent group.
The most prevalent is usually a reasonable baseline.
In any case, the prescription for categorical variables is

1. Convert each k-valued categorical variable to k indicator variables.

2. Drop one of the k indicator variables; any one will do, but dropping the first is popular, dropping the most prevalent is probably better in terms of having the computer guess at a reasonable interpretation, and dropping a specified one often eases interpretation the most.

3. Fit the model on the remaining k - 1 indicator variables.
xi automates this procedure.
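For comparison, here is a minimal sketch of steps 1-3 done by hand with tabulate's generate() option; agegrp and y are the same illustrative variables used above:
. tabulate agegrp, generate(a)
. drop a1
. regress y a2 a3 a4
. xi: regress y i.agegrp
The two regressions produce the same fit; xi simply creates (and names) the indicator variables for you.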
We will now consider each of xi's features in detail.
Indicator variables for simple effects
When you type i.varname,xi internally tabulates varname (which may be a string or a numeric
variable) and creates indicator (dummy) variables for each observed value, omitting the indicator for
the smallest value. For instance, say that agegrp takes on the values 1, 2, 3, and 4. Typing
xi: logistic outcome i.agegrp
creates indicator variables named _Iagegrp_2, _Iagegrp_3, and _Iagegrp_4. (xi chooses the names and tries to make them readable; xi guarantees that the names are unique.) The expanded logistic model is

. logistic outcome _Iagegrp_2 _Iagegrp_3 _Iagegrp_4

Afterward, you can drop the new variables xi leaves behind by typing 'drop _I*' (note the capitalization).
xi provides the following features when you type i.varname:
varname may be string or numeric.
Dummy variables are created automatically.
By default, the dummy-variable set is identified by dropping the dummy corresponding to
the smallest value of the variable (how to specify otherwise is discussed below).
The new dummy variables are left in your dataset. By default, the names of the new dummy variables start with _I; therefore, you can drop them by typing 'drop _I*'. You do not have to do this; each time you use xi, any automatically generated dummies with the same prefix as the one specified in the prefix(string) option, or _I by default, are dropped and new ones are created.
The new dummy variables have variable labels so that you can determine what they correspond
to by typing ‘describe’.
xi may be used with any Stata command (not just logistic).
Controlling the omitted dummy
By default, i.varname omits the dummy corresponding to the smallest value of varname; for
a string variable, this is interpreted as dropping the first in an alphabetical, case-sensitive sort. xi
provides two alternatives to dropping the first: xi will drop the dummy corresponding to the most
prevalent value of varname, or xi will let you choose the particular dummy to be dropped.
To change xi's behavior to dropping the most prevalent dummy, type
. char _dta[omit] prevalent
although whether you type “prevalent” or “yes” or anything else does not matter. Setting this
characteristic affects the expansion of all categorical variables in the dataset. If you resave your
dataset, the prevalent preference will be remembered. If you want to change the behavior back to the
default drop-the-first rule, type
. char _dta[omit]
to clear the characteristic.
Once you set _dta[omit], i.varname omits the dummy corresponding to the most prevalent value of varname. Thus the coefficients on the dummies have the interpretation of change from the most prevalent group. For example,

. char _dta[omit] prevalent
. xi: regress y i.agegrp

might create _Iagegrp_1 through _Iagegrp_4, resulting in _Iagegrp_2 being omitted if agegrp = 2 is most common (as opposed to the default dropping of _Iagegrp_1). The model is then

    y = b0 + b1 _Iagegrp_1 + b3 _Iagegrp_3 + b4 _Iagegrp_4 + u

Then

    Predicted y for agegrp 1 = b0 + b1        Predicted y for agegrp 3 = b0 + b3
    Predicted y for agegrp 2 = b0             Predicted y for agegrp 4 = b0 + b4

Thus the model's reported t or Z statistics are for a test of whether each group is different from the most prevalent group.
Perhaps you wish to omit the dummy for agegrp 3 instead. You do this by setting the variable’s
omit characteristic:
. char agegrp[omit] 3
This overrides _dta[omit] if you have set it. Now when you type
. xi: regress y i.agegrp
_Iagegrp_3 will be omitted, and you will fit the model

    y = b0' + b1' _Iagegrp_1 + b2' _Iagegrp_2 + b4' _Iagegrp_4 + u
Later if you want to return to the default omission, type
. char agegrp[omit]
to clear the characteristic.
In summary, i.varname omits the first group by default, but if you define
. char _dta[omit] prevalent
the default behavior changes to dropping the most prevalent group. Either way, if you define a
characteristic of the form
. char varname[omit] #
or, if varname is a string,
. char varname[omit] string-literal
the specified value will be omitted.
Examples: . char agegrp[omit] 1
. char race[omit] White (for race, a string variable)
. char agegrp[omit] (to restore default for agegrp)
Categorical variable interactions
i.varname1*i.varname2 creates the dummy variables associated with the interaction of the categorical variables varname1 and varname2. The identification rules (which categories are omitted) are the same as those for i.varname. For instance, assume that agegrp takes on four values and race
takes on three values. Typing
. xi: regress y i.agegrp*i.race
results in
    model                                                           dummies for:
    y = a + b2 _Iagegrp_2 + b3 _Iagegrp_3 + b4 _Iagegrp_4           (agegrp)
          + c2 _Irace_2 + c3 _Irace_3                               (race)
          + d22 _IageXrac_2_2 + d23 _IageXrac_2_3
          + d32 _IageXrac_3_2 + d33 _IageXrac_3_3                   (agegrp*race)
          + d42 _IageXrac_4_2 + d43 _IageXrac_4_3
          + u
That is, typing
. xi: regress y i.agegrp*i.race
is the same as typing
. xi: regress y i.agegrp i.race i.agegrp*i.race
Although there are many other ways the interaction could have been parameterized, this method has
the advantage that you can test the joint significance of the interactions by typing
. testparm _IageXrac*
When you perform the estimation step, whether you specify i.agegrp*i.race or i.race*i.agegrp makes no difference (other than in the names given to the interaction terms; in the first case, the names will begin with _IageXrac; in the second, _IracXage). Thus
. xi: regress y i.race*i.agegrp
fits the same model.
You may also include multiple interactions simultaneously:
. xi: regress y i.agegrp*i.race i.agegrp*i.sex
The model fit is

    model                                                           dummies for:
    y = a + b2 _Iagegrp_2 + b3 _Iagegrp_3 + b4 _Iagegrp_4           (agegrp)
          + c2 _Irace_2 + c3 _Irace_3                               (race)
          + d22 _IageXrac_2_2 + d23 _IageXrac_2_3
          + d32 _IageXrac_3_2 + d33 _IageXrac_3_3                   (agegrp*race)
          + d42 _IageXrac_4_2 + d43 _IageXrac_4_3
          + e2 _Isex_2                                              (sex)
          + f22 _IageXsex_2_2 + f32 _IageXsex_3_2 + f42 _IageXsex_4_2   (agegrp*sex)
          + u
The agegrp dummies are (correctly) included only once.
Interactions with continuous variables
i.varname1*varname2 (as distinguished from i.varname1*i.varname2; note the second i.) specifies an interaction of a categorical variable with a continuous variable. For instance,

. xi: regress y i.agegrp*wgt

results in the model

    y = a + b2 _Iagegrp_2 + b3 _Iagegrp_3 + b4 _Iagegrp_4           (agegrp dummies)
          + c wgt                                                   (continuous wgt effect)
          + d2 _IageXwgt_2 + d3 _IageXwgt_3 + d4 _IageXwgt_4        (agegrp*wgt interactions)
          + u

A variation on this notation, using | rather than *, omits the agegrp dummies. Typing

. xi: regress y i.agegrp|wgt

fits the model

    y = a' + c' wgt                                                 (continuous wgt effect)
          + d2' _IageXwgt_2 + d3' _IageXwgt_3 + d4' _IageXwgt_4     (agegrp*wgt interactions)
          + u'

The predicted values of y are

    agegrp*wgt model                  agegrp|wgt model
    a + c wgt                         a' + c' wgt                   if agegrp = 1
    a + c wgt + b2 + d2 wgt           a' + c' wgt + d2' wgt         if agegrp = 2
    a + c wgt + b3 + d3 wgt           a' + c' wgt + d3' wgt         if agegrp = 3
    a + c wgt + b4 + d4 wgt           a' + c' wgt + d4' wgt         if agegrp = 4
That is, typing
. xi: regress y i.agegrp*wgt
is equivalent to typing
. xi: regress y i.agegrp i.agegrp|wgt
In either case, you do not need to specify separately the continuous variable wgt; it is included
automatically.
Using xi: Interpreting output
. xi: regress mpg i.rep78
i.rep78 _Irep78_1-5 (naturally coded; _Irep78_1 omitted)
(output from regress appears )
Interpretation: i.rep78 expanded to the dummies _Irep78_1, _Irep78_2, ..., _Irep78_5. The numbers on the end are "natural" in the sense that _Irep78_1 corresponds to rep78 = 1, _Irep78_2 to rep78 = 2, and so on. Finally, the dummy for rep78 = 1 was omitted.
. xi: regress mpg i.make
i.make _Imake_1-74 (_Imake_1 for make==AMC Concord omitted)
(output from regress appears )
Interpretation: i.make expanded to _Imake_1, _Imake_2, ..., _Imake_74. The coding is not natural because make is a string variable. _Imake_1 corresponds to one make, _Imake_2 to another, and so on. You can find out the coding by typing describe. _Imake_1 for the AMC Concord was omitted.
How xi names variables
By default, xi assigns to the dummy variables it creates names having the form

    _Istub_groupid

You may subsequently refer to the entire set of variables by typing '_Istub*'. For example,

    name               =   _I + stub + _ + groupid      Entire set
    _Iagegrp_1             _I   agegrp  _   1           _Iagegrp*
    _Iagegrp_2             _I   agegrp  _   2           _Iagegrp*
    _IageXwgt_1            _I   ageXwgt _   1           _IageXwgt*
    _IageXrac_1_2          _I   ageXrac _   1_2         _IageXrac*
    _IageXrac_2_1          _I   ageXrac _   2_1         _IageXrac*

If you specify a prefix in the prefix(string) option, say, _S, then xi will name the variables starting with the prefix

    _Sstub_groupid
xi as a command rather than a command prefix
xi can be used as a command prefix or as a command by itself. In the latter form, xi merely
creates the indicator and interaction variables. Typing
. xi: regress y i.agegrp*wgt
i.agegrp _Iagegrp_1-4 (naturally coded; _Iagegrp_1 omitted)
i.agegrp*wgt _IageXwgt_1-4 (coded as above)
(output from regress appears )
is equivalent to typing
. xi i.agegrp*wgt
i.agegrp _Iagegrp_1-4 (naturally coded; _Iagegrp_1 omitted)
i.agegrp*wgt _IageXwgt_1-4 (coded as above)
. regress y _Iagegrp* _IageXwgt*
(output from regress appears )
Warnings
1. xi creates new variables in your dataset; most are bytes, but interactions with continuous
variables will have the storage type of the underlying continuous variable. You may get
the message “insufficient memory”. If so, you will need to increase the amount of memory
allocated to Stata’s data areas; see [U] 6 Managing memory.
2. When using xi with an estimation command, you may get the message “matsize too small”.
If so, see [R] matsize.
Stored results
xi stores the following characteristics:
_dta[__xi__Vars__Prefix__]      prefix names
_dta[__xi__Vars__To__Drop__]    variables created
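These characteristics can be listed with the char command; for example (a sketch), typing
. char list _dta[]
displays all characteristics currently attached to the dataset, including the ones stored by xi.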
References
Hendrickx, J. 1999. dm73: Using categorical variables in Stata. Stata Technical Bulletin 52: 2–8. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 51–59. College Station, TX: Stata Press.
Hendrickx, J. 2000. dm73.1: Contrasts for categorical variables: Update. Stata Technical Bulletin 54: 7. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 60–61. College Station, TX: Stata Press.
Hendrickx, J. 2001a. dm73.2: Contrasts for categorical variables: Update. Stata Technical Bulletin 59: 2–5. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 9–14. College Station, TX: Stata Press.
Hendrickx, J. 2001b. dm73.3: Contrasts for categorical variables: Update. Stata Technical Bulletin 61: 5. Reprinted in Stata Technical Bulletin Reprints, vol. 10, pp. 14–15. College Station, TX: Stata Press.
Also see
[U] 11.1.10 Prefix commands
[U] 20 Estimation and postestimation commands
Title
zinb — Zero-inflated negative binomial regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
zinb depvar [indepvars] [if] [in] [weight] ,
     inflate(varlist[, offset(varname)] | _cons) [options]

options                       Description
Model
  inflate()                   equation that determines whether the count is zero
  noconstant                  suppress constant term
  exposure(varname_e)         include ln(varname_e) in model with coefficient constrained to 1
  offset(varname_o)           include varname_o in model with coefficient constrained to 1
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables
  probit                      use probit model to characterize excess zeros; default is logit
SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
  level(#)                    set confidence level; default is level(95)
  irr                         report incidence-rate ratios
  vuong                       perform Vuong test
  zip                         perform ZIP likelihood-ratio test
  nocnsreport                 do not display constraints
  display_options             control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options            control the maximization process; seldom used

  coeflegend                  display legend instead of statistics

inflate(varlist[, offset(varname)] | _cons) is required.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), vuong, zip, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Count outcomes > Zero-inflated negative binomial regression
Description
zinb estimates a zero-inflated negative binomial (ZINB) regression of depvar on indepvars, where
depvar is a nonnegative count variable.
Options
 
Model
inflate(varlist[, offset(varname)] | _cons) specifies the equation that determines whether the observed count is zero. Conceptually, omitting inflate() would be equivalent to fitting the model with nbreg.
inflate(varlist[, offset(varname)]) specifies the variables in the equation. You may optionally include an offset for this varlist.
inflate(_cons) specifies that the equation determining whether the count is zero contains only an intercept. To run a zero-inflated model of depvar with only an intercept in both equations, type zinb depvar, inflate(_cons).
noconstant, exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.
probit requests that a probit, instead of logit, model be used to characterize the excess zeros in the
data.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^{β_i} rather than β_i. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated or stored. irr may be specified at estimation or when replaying previously estimated results.
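For instance, after fitting a zinb model you could redisplay the stored results as incidence-rate ratios by typing (a sketch)
. zinb, irr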
vuong specifies that the Vuong (1989) test of ZINB versus negative binomial be reported. This test
statistic has a standard normal distribution with large positive values favoring the ZINB model and
large negative values favoring the negative binomial model.
zip requests that a likelihood-ratio test comparing the ZINB model with the zero-inflated Poisson
model be included in the output.
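For example (a sketch using the fish data from example 1 below), typing
. zinb count persons livebait, inflate(child camper) zip
would add the ZINB-versus-ZIP likelihood-ratio test to the output.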
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with zinb but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
See Long (1997, 242–247) and Greene (2012, 821–826) for a discussion of zero-modified count
models. For information about the test developed by Vuong (1989), see Greene (2012, 823–824) and
Long (1997). Greene (1994) applied the test to zero-inflated Poisson and negative binomial models,
and there is a description of that work in Greene (2012).
Negative binomial regression fits models of the number of occurrences (counts) of an event. You
could use nbreg for this (see [R]nbreg), but in some count-data models, you might want to account
for the prevalence of zero counts in the data.
For instance, you could count how many fish each visitor to a park catches. Many visitors may
catch zero, because they do not fish (as opposed to being unsuccessful). You may be able to model
whether a person fishes depending on several covariates related to fishing activity and model how
many fish a person catches depending on several covariates having to do with the success of catching
fish (type of lure/bait, time of day, temperature, season, etc.). This is the type of data for which the
zinb command is useful.
The zero-inflated (or zero-altered) negative binomial model allows overdispersion through the
splitting process that models the outcomes as zero or nonzero.
Example 1
We have data on the number of fish caught by visitors to a national park. Some of the visitors do
not fish, but we do not have the data on whether a person fished; we have data merely on how many
fish were caught, together with several covariates. Because our data have a preponderance of zeros
(142 of 250), we use the zinb command to model the outcome.
. use http://www.stata-press.com/data/r13/fish
. zinb count persons livebait, inf(child camper) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -519.33992
(output omitted )
Iteration 8: log likelihood = -442.66299
Fitting full model:
Iteration 0: log likelihood = -442.66299 (not concave)
(output omitted )
Iteration 8: log likelihood = -401.54776
Zero-inflated negative binomial regression Number of obs = 250
Nonzero obs = 108
Zero obs = 142
Inflation model = logit LR chi2(2) = 82.23
Log likelihood = -401.5478 Prob > chi2 = 0.0000
count Coef. Std. Err. z P>|z| [95% Conf. Interval]
count
persons .9742984 .1034938 9.41 0.000 .7714543 1.177142
livebait 1.557523 .4124424 3.78 0.000 .7491503 2.365895
_cons -2.730064 .476953 -5.72 0.000 -3.664874 -1.795253
inflate
child 3.185999 .7468551 4.27 0.000 1.72219 4.649808
camper -2.020951 .872054 -2.32 0.020 -3.730146 -.3117567
_cons -2.695385 .8929071 -3.02 0.003 -4.44545 -.9453189
/lnalpha .5110429 .1816816 2.81 0.005 .1549535 .8671323
alpha 1.667029 .3028685 1.167604 2.380076
Vuong test of zinb vs. standard negative binomial: z = 5.59 Pr>z = 0.0000
In general, Vuong test statistics that are significantly positive favor the zero-inflated models, whereas
those that are significantly negative favor the nonzero-inflated models. Thus, in the above model,
the zero inflation is significant.
Stored results
zinb stores the following in e():
Scalars
  e(N)               number of observations
  e(N_zero)          number of zero observations
  e(k)               number of parameters
  e(k_eq)            number of equations in e(b)
  e(k_eq_model)      number of equations in overall model test
  e(k_aux)           number of auxiliary parameters
  e(k_dv)            number of dependent variables
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(ll_0)            log likelihood, constant-only model
  e(df_c)            degrees of freedom for comparison test
  e(N_clust)         number of clusters
  e(chi2)            χ2
  e(p)               significance of model test
  e(chi2_cp)         χ2 for test of α = 0
  e(vuong)           Vuong test statistic
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise
Macros
  e(cmd)             zinb
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(inflate)         logit or probit
  e(wtype)           weight type
  e(wexp)            weight expression
  e(title)           title in estimation output
  e(clustvar)        name of cluster variable
  e(offset1)         offset
  e(offset2)         offset for inflate()
  e(chi2type)        Wald or LR; type of model χ2 test
  e(chi2_cpt)        Wald or LR; type of model χ2 test corresponding to e(chi2_cp)
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(opt)             type of optimization
  e(which)           max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)       type of ml method
  e(user)            name of likelihood-evaluator program
  e(technique)       maximization technique
  e(properties)      b V
  e(predict)         program used to implement predict
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved
Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(V)               variance–covariance matrix of the estimators
  e(V_modelbased)    model-based variance
Functions
  e(sample)          marks estimation sample
Methods and formulas
Several models in the literature are (correctly) described as zero inflated. The zinb command
maximizes the log likelihood lnL, defined by

  $m = 1/\alpha$
  $p_j = 1/(1 + \alpha\mu_j)$
  $\xi_j^{\beta} = x_j\beta + \mathrm{offset}_j^{\beta}$
  $\xi_j^{\gamma} = z_j\gamma + \mathrm{offset}_j^{\gamma}$
  $\mu_j = \exp(\xi_j^{\beta})$

  $\ln L = \sum_{j \in S} w_j \ln\bigl\{ F(\xi_j^{\gamma}) + \bigl[1 - F(\xi_j^{\gamma})\bigr] p_j^{m} \bigr\}
         + \sum_{j \notin S} w_j \bigl[ \ln\bigl\{1 - F(\xi_j^{\gamma})\bigr\} + \ln\Gamma(m + y_j) - \ln\Gamma(y_j + 1) - \ln\Gamma(m) + m \ln p_j + y_j \ln(1 - p_j) \bigr]$

where wj are the weights, F is the inverse of the logit link (or the inverse of the probit link if probit was specified), and S is the set of observations for which the outcome yj = 0.
This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.
zinb also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
References
Desmarais, B. A., and J. J. Harden. 2013. Testing for zero inflation in count models: Bias correction for the Vuong test. Stata Journal 13: 810–835.
Greene, W. H. 1994. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Working paper EC-94-10, Department of Economics, Stern School of Business, New York University. http://ideas.repec.org/p/ste/nystbu/94-10.html.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2001. Predicted probabilities for count models. Stata Journal 1: 51–57.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.
Mullahy, J. 1986. Specification and testing of some modified count data models. Journal of Econometrics 33: 341–365.
Vuong, Q. H. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57: 307–333.
Also see
[R] zinb postestimation     Postestimation tools for zinb
[R] zip                     Zero-inflated Poisson regression
[R] nbreg                   Negative binomial regression
[R] poisson                 Poisson regression
[R] tnbreg                  Truncated negative binomial regression
[R] tpoisson                Truncated Poisson regression
[SVY] svy estimation        Estimation commands for survey data
[XT] xtnbreg                Fixed-effects, random-effects, & population-averaged negative binomial models
[U] 20 Estimation and postestimation commands
Title
zinb postestimation — Postestimation tools for zinb
Description Syntax for predict Menu for predict Options for predict
Methods and formulas Reference Also see
Description
The following postestimation commands are available after zinb:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)    dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest (2)      likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

predict [type] {stub* | newvar_reg newvar_inflate newvar_lnalpha} [if] [in] , scores

statistic     Description
Main
  n           number of events; the default
  ir          incidence rate
  pr          probability of a degenerate zero
  pr(n)       probability Pr(yj = n)
  pr(a,b)     probability Pr(a ≤ yj ≤ b)
  xb          linear prediction
  stdp        standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is (1 − pj) exp(xjβ) if neither offset() nor exposure() was specified when the model was fit, where pj is the predicted probability of a zero outcome; (1 − pj) exp(xjβ + offsetj) if offset() was specified; or (1 − pj){exp(xjβ) × exposurej} if exposure() was specified.
ir calculates the incidence rate exp(xjβ), which is the predicted number of events when exposure is 1. This is equivalent to specifying both the n and the nooffset options.
pr calculates the probability Pr(yj = 0), where this zero was obtained from the degenerate distribution F(zjγ). If offset() was specified within the inflate() option, then F(zjγ + offsetj^γ) is calculated.
pr(n) calculates the probability Pr(yj = n), where n is a nonnegative integer that may be specified as a number or a variable. Note that pr is not equivalent to pr(0).
pr(a,b) calculates the probability Pr(a ≤ yj ≤ b), where a and b are nonnegative integers that may be specified as numbers or variables;
   b missing (b ≥ .) means +∞;
   pr(20,.) calculates Pr(yj ≥ 20);
   pr(20,b) calculates Pr(yj ≥ 20) in observations for which b ≥ . and calculates Pr(20 ≤ yj ≤ b) elsewhere.
   pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a missing value in that observation for pr(a,b).
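For example, after fitting a zinb model, you might type (a sketch with hypothetical new variable names)
. predict przero, pr(0)
. predict prsmall, pr(0,2)
so that przero holds Pr(yj = 0) and prsmall holds Pr(0 ≤ yj ≤ 2) for each observation.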
xb calculates the linear prediction, which is xjβ if neither offset() nor exposure() was specified; xjβ + offsetj if offset() was specified; or xjβ + ln(exposurej) if exposure() was specified; see nooffset below.
stdp calculates the standard error of the linear prediction.
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It modifies the calculations made by predict so that they ignore the offset or exposure variable; the linear prediction is treated as xjβ rather than as xjβ + offsetj or xjβ + ln(exposurej). Specifying predict ..., nooffset is equivalent to specifying predict ..., ir.
scores calculates equation-level score variables.
   The first new variable will contain ∂lnL/∂(xjβ).
   The second new variable will contain ∂lnL/∂(zjγ).
   The third new variable will contain ∂lnL/∂lnα.
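For example (a sketch using a hypothetical stub name), typing
. predict double sc*, scores
creates sc1, sc2, and sc3 containing the three equation-level scores listed above.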
Methods and formulas
The probabilities calculated using the pr(n) option are the probabilities Pr(yi = n). These are calculated using

  $\Pr(0 \mid x_i) = \omega_i + (1 - \omega_i)\, p_2(0 \mid x_i)$
  $\Pr(n \mid x_i) = (1 - \omega_i)\, p_2(n \mid x_i) \qquad \text{for } n = 1, 2, \ldots$

where ωi is the probability of obtaining an observation from the degenerate distribution whose mass is concentrated at zero, and p2(n | xi) is the probability of yi = n from the nondegenerate, negative binomial distribution. ωi can be obtained from the pr option.
See Cameron and Trivedi (2013, sec. 4.6) for further details.
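These relationships can be checked after estimation; for example (a sketch with hypothetical variable names),
. predict omega, pr
. predict przero, pr(0)
. generate p2zero = (przero - omega)/(1 - omega)
recovers p2(0 | xi), the zero probability implied by the negative binomial portion of the model.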
Reference
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge
University Press.
Also see
[R] zinb     Zero-inflated negative binomial regression
[U] 20 Estimation and postestimation commands
Title
zip — Zero-inflated Poisson regression
Syntax Menu Description Options
Remarks and examples Stored results Methods and formulas References
Also see
Syntax
zip depvar [indepvars] [if] [in] [weight] ,
     inflate(varlist[, offset(varname)] | _cons) [options]

options                       Description
Model
  inflate()                   equation that determines whether the count is zero
  noconstant                  suppress constant term
  exposure(varname_e)         include ln(varname_e) in model with coefficient constrained to 1
  offset(varname_o)           include varname_o in model with coefficient constrained to 1
  constraints(constraints)    apply specified linear constraints
  collinear                   keep collinear variables
  probit                      use probit model to characterize excess zeros; default is logit
SE/Robust
  vce(vcetype)                vcetype may be oim, robust, cluster clustvar, opg, bootstrap, or jackknife
Reporting
  level(#)                    set confidence level; default is level(95)
  irr                         report incidence-rate ratios
  vuong                       perform Vuong test
  nocnsreport                 do not display constraints
  display_options             control column formats, row spacing, line width, display of omitted variables and base and empty cells, and factor-variable labeling
Maximization
  maximize_options            control the maximization process; seldom used

  coeflegend                  display legend instead of statistics

inflate(varlist[, offset(varname)] | _cons) is required.
indepvars and varlist may contain factor variables; see [U] 11.4.3 Factor variables.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce(), vuong, and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Count outcomes > Zero-inflated Poisson regression
Description
zip estimates a zero-inflated Poisson (ZIP) regression of depvar on indepvars, where depvar is a
nonnegative count variable.
Options
 
Model
inflate(varlist[, offset(varname)] | _cons) specifies the equation that determines whether the observed count is zero. Conceptually, omitting inflate() would be equivalent to fitting the model with poisson; see [R] poisson.
inflate(varlist[, offset(varname)]) specifies the variables in the equation. You may optionally include an offset for this varlist.
inflate(_cons) specifies that the equation determining whether the count is zero contains only an intercept. To run a zero-inflated model of depvar with only an intercept in both equations, type zip depvar, inflate(_cons).
noconstant, exposure(varname_e), offset(varname_o), constraints(constraints), collinear; see [R] estimation options.
probit requests that a probit, instead of logit, model be used to characterize the excess zeros in the
data.
 
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.
 
Reporting
level(#); see [R] estimation options.
irr reports estimated coefficients transformed to incidence-rate ratios, that is, e^b rather than b. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated or stored. irr may be specified at estimation or when replaying previously estimated results.
vuong specifies that the Vuong (1989) test of ZIP versus Poisson be reported. This test statistic has a
standard normal distribution with large positive values favoring the ZIP model and large negative
values favoring the Poisson model.
nocnsreport; see [R] estimation options.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.
 
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), nolog, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following option is available with zip but is not shown in the dialog box:
coeflegend; see [R] estimation options.
Remarks and examples
See Long (1997, 242–247) and Greene (2012, 821–826) for a discussion of zero-modified count
models. For information about the test developed by Vuong (1989), see Greene (2012, 823–824) and
Long (1997). Greene (1994) applied the test to ZIP and ZINB models, as described in Greene (2012,
824).
Poisson regression fits models of the number of occurrences (counts) of an event. You could use
poisson for this (see [R]poisson), but in some count-data models, you might want to account for
the prevalence of zero counts in the data.
For instance, you might count how many fish each visitor to a park catches. Many visitors may
catch zero, because they do not fish (as opposed to being unsuccessful). You may be able to model
whether a person fishes depending on several covariates related to fishing activity and model how
many fish a person catches depending on several covariates having to do with the success of catching
fish (type of lure/bait, time of day, temperature, season, etc.). This is the type of data for which the
zip command is useful.
The zero-inflated (or zero-altered) Poisson model allows overdispersion through the splitting process
that models the outcomes as zero or nonzero.
Example 1
We have data on the number of fish caught by visitors to a national park. Some of the visitors do
not fish, but we do not have the data on whether a person fished; we merely have data on how many
fish were caught together with several covariates. Because our data have a preponderance of zeros
(142 of 250), we use the zip command to model the outcome.
. use http://www.stata-press.com/data/r13/fish
. zip count persons livebait, inf(child camper) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -1347.807
(output omitted )
Iteration 4: log likelihood = -1103.9425
Fitting full model:
Iteration 0: log likelihood = -1103.9425
(output omitted )
Iteration 5: log likelihood = -850.70142
Zero-inflated Poisson regression Number of obs = 250
Nonzero obs = 108
Zero obs = 142
Inflation model = logit LR chi2(2) = 506.48
Log likelihood = -850.7014 Prob > chi2 = 0.0000
count Coef. Std. Err. z P>|z| [95% Conf. Interval]
count
persons .8068853 .0453288 17.80 0.000 .7180424 .8957281
livebait 1.757289 .2446082 7.18 0.000 1.277866 2.236713
_cons -2.178472 .2860289 -7.62 0.000 -2.739078 -1.617865
inflate
child 1.602571 .2797719 5.73 0.000 1.054228 2.150913
camper -1.015698 .365259 -2.78 0.005 -1.731593 -.2998038
_cons -.4922872 .3114562 -1.58 0.114 -1.10273 .1181558
Vuong test of zip vs. standard Poisson: z = 3.95 Pr>z = 0.0000
In general, Vuong test statistics that are significantly positive favor the zero-inflated models, while
those that are significantly negative favor the nonzero-inflated models. Thus, in the above model,
the zero inflation is significant.
Stored results
zip stores the following in e():
Scalars
  e(N)               number of observations
  e(N_zero)          number of zero observations
  e(k)               number of parameters
  e(k_eq)            number of equations in e(b)
  e(k_eq_model)      number of equations in overall model test
  e(k_dv)            number of dependent variables
  e(df_m)            model degrees of freedom
  e(ll)              log likelihood
  e(ll_0)            log likelihood, constant-only model
  e(ll_c)            log likelihood, comparison model
  e(df_c)            degrees of freedom for comparison test
  e(N_clust)         number of clusters
  e(chi2)            χ2
  e(p)               significance of model test
  e(vuong)           Vuong test statistic
  e(rank)            rank of e(V)
  e(ic)              number of iterations
  e(rc)              return code
  e(converged)       1 if converged, 0 otherwise
Macros
  e(cmd)             zip
  e(cmdline)         command as typed
  e(depvar)          name of dependent variable
  e(inflate)         logit or probit
  e(wtype)           weight type
  e(wexp)            weight expression
  e(title)           title in estimation output
  e(clustvar)        name of cluster variable
  e(offset1)         offset
  e(offset2)         offset for inflate()
  e(chi2type)        Wald or LR; type of model χ2 test
  e(vce)             vcetype specified in vce()
  e(vcetype)         title used to label Std. Err.
  e(opt)             type of optimization
  e(which)           max or min; whether optimizer is to perform maximization or minimization
  e(ml_method)       type of ml method
  e(user)            name of likelihood-evaluator program
  e(technique)       maximization technique
  e(properties)      b V
  e(predict)         program used to implement predict
  e(asbalanced)      factor variables fvset as asbalanced
  e(asobserved)      factor variables fvset as asobserved
Matrices
  e(b)               coefficient vector
  e(Cns)             constraints matrix
  e(ilog)            iteration log (up to 20 iterations)
  e(gradient)        gradient vector
  e(V)               variance–covariance matrix of the estimators
  e(V_modelbased)    model-based variance
Functions
  e(sample)          marks estimation sample
Methods and formulas
Several models in the literature are (correctly) described as zero inflated. The zip command
maximizes the log likelihood lnL, defined by

  $\xi_j^{\beta} = x_j\beta + \mathrm{offset}_j^{\beta}$
  $\xi_j^{\gamma} = z_j\gamma + \mathrm{offset}_j^{\gamma}$
  $\lambda_j = \exp(\xi_j^{\beta})$

  $\ln L = \sum_{j \in S} w_j \ln\bigl[ F(\xi_j^{\gamma}) + \bigl\{1 - F(\xi_j^{\gamma})\bigr\} \exp(-\lambda_j) \bigr]
         + \sum_{j \notin S} w_j \bigl[ \ln\bigl\{1 - F(\xi_j^{\gamma})\bigr\} - \lambda_j + \xi_j^{\beta} y_j - \ln(y_j!) \bigr]$

where wj are the weights, F is the inverse of the logit link (or the inverse of the probit link if probit was specified), and S is the set of observations for which the outcome yj = 0.
This command supports the Huber/White/sandwich estimator of the variance and its clustered version using vce(robust) and vce(cluster clustvar), respectively. See [P] robust, particularly Maximum likelihood estimators and Methods and formulas.
zip also supports estimation with survey data. For details on VCEs with survey data, see [SVY] variance estimation.
References
Desmarais, B. A., and J. J. Harden. 2013. Testing for zero inflation in count models: Bias correction for the Vuong test. Stata Journal 13: 810–835.
Greene, W. H. 1994. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Working paper EC-94-10, Department of Economics, Stern School of Business, New York University. http://ideas.repec.org/p/ste/nystbu/94-10.html.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Lambert, D. 1992. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34: 1–14.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage.
Long, J. S., and J. Freese. 2001. Predicted probabilities for count models. Stata Journal 1: 51–57.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.
Mullahy, J. 1986. Specification and testing of some modified count data models. Journal of Econometrics 33: 341–365.
Vuong, Q. H. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57: 307–333.
Also see
[R] zip postestimation      Postestimation tools for zip
[R] zinb                    Zero-inflated negative binomial regression
[R] nbreg                   Negative binomial regression
[R] poisson                 Poisson regression
[R] tnbreg                  Truncated negative binomial regression
[R] tpoisson                Truncated Poisson regression
[SVY] svy estimation        Estimation commands for survey data
[XT] xtpoisson              Fixed-effects, random-effects, and population-averaged Poisson models
[U] 20 Estimation and postestimation commands
Title
zip postestimation — Postestimation tools for zip
Description Syntax for predict Menu for predict Options for predict
Remarks and examples Methods and formulas Reference Also see
Description
The following postestimation commands are available after zip:
Command Description
contrast contrasts and ANOVA-style joint tests of estimates
estat ic Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC)
estat summarize summary statistics for the estimation sample
estat vce variance–covariance matrix of the estimators (VCE)
estat (svy) postestimation statistics for survey data
estimates cataloging estimation results
forecast (1)    dynamic forecasts and simulations
lincom point estimates, standard errors, testing, and inference for linear
combinations of coefficients
lrtest (2)      likelihood-ratio test
margins marginal means, predictive margins, marginal effects, and average marginal
effects
marginsplot graph the results from margins (profile plots, interaction plots, etc.)
nlcom point estimates, standard errors, testing, and inference for nonlinear
combinations of coefficients
predict predictions, residuals, influence statistics, and other diagnostic measures
predictnl point estimates, standard errors, testing, and inference for generalized predictions
pwcompare pairwise comparisons of estimates
suest seemingly unrelated estimation
test Wald tests of simple and composite linear hypotheses
testnl Wald tests of nonlinear hypotheses
(1) forecast is not appropriate with svy estimation results.
(2) lrtest is not appropriate with svy estimation results.
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]

predict [type] {stub* | newvar_reg newvar_inflate} [if] [in] , scores

statistic     Description
Main
  n           number of events; the default
  ir          incidence rate
  pr          probability of a degenerate zero
  pr(n)       probability Pr(yj = n)
  pr(a,b)     probability Pr(a ≤ yj ≤ b)
  xb          linear prediction
  stdp        standard error of the linear prediction
These statistics are available both in and out of sample; type predict . . . if e(sample) . . . if wanted only for
the estimation sample.
Menu for predict
Statistics > Postestimation > Predictions, residuals, etc.
Options for predict
 
Main
n, the default, calculates the predicted number of events, which is (1 − pj) exp(xjβ) if neither offset() nor exposure() was specified when the model was fit, where pj is the predicted probability of a zero outcome; (1 − pj) exp(xjβ + offsetj) if offset() was specified; or (1 − pj){exp(xjβ) × exposurej} if exposure() was specified.
ir calculates the incidence rate exp(xjβ), which is the predicted number of events when exposure is 1. This is equivalent to specifying both the n and the nooffset options.
pr calculates the probability Pr(yj = 0), where this zero was obtained from the degenerate distribution F(zjγ). If offset() was specified within the inflate() option, then F(zjγ + offsetj^γ) is calculated.
pr(n) calculates the probability Pr(yj = n), where n is a nonnegative integer that may be specified as a number or a variable. Note that pr is not equivalent to pr(0).
pr(a,b) calculates the probability Pr(a ≤ yj ≤ b), where a and b are nonnegative integers that may be specified as numbers or variables;
   b missing (b ≥ .) means +∞;
   pr(20,.) calculates Pr(yj ≥ 20);
   pr(20,b) calculates Pr(yj ≥ 20) in observations for which b ≥ . and calculates Pr(20 ≤ yj ≤ b) elsewhere.
   pr(.,b) produces a syntax error. A missing value in an observation of the variable a causes a missing value in that observation for pr(a,b).
xb calculates the linear prediction, which is xjβ if neither offset() nor exposure() was specified; xjβ + offsetj if offset() was specified; or xjβ + ln(exposurej) if exposure() was specified; see nooffset below.
stdp calculates the standard error of the linear prediction.
nooffset is relevant only if you specified offset() or exposure() when you fit the model. It modifies the calculations made by predict so that they ignore the offset or exposure variable; the linear prediction is treated as xjβ rather than as xjβ + offsetj or xjβ + ln(exposurej). Specifying predict ..., nooffset is equivalent to specifying predict ..., ir.
scores calculates equation-level score variables.
   The first new variable will contain ∂lnL/∂(xjβ).
   The second new variable will contain ∂lnL/∂(zjγ).
Remarks and examples
Example 1
Continuing with example 1 from [R]zip, we will use predict to compute the predicted number
of fish captured by each individual.
. use http://www.stata-press.com/data/r13/fish
. zip count persons livebait, inf(child camper) vuong
(output omitted )
. predict numfished
(option n assumed; predicted number of events)
predict with the pr option computes the probability that an individual does not fish.
. predict pr, pr
On the other hand, predict with the pr(n) option computes the probability of catching n fish; in particular, the probability of catching zero fish will be
. predict pr0, pr(0)
. list pr pr0 in 1
pr pr0
1. .3793549 .8609267
Notice that pr0 is always equal to or greater than pr. For example, for the first individual, the
probability of not fishing is 0.38; on the other hand, the probability of catching zero fish (0.86) is
equal to the sum of the probability of not fishing and the probability of fishing but not catching any
fish. pr0 can also be computed as one minus the probability of catching at least one fish, that is:
. predict pr_catch, pr(1,.)
. gen pr0b = 1-pr_catch
Methods and formulas
The probabilities calculated using the pr(n) option are the probabilities Pr(yi = n). These are calculated using

  $\Pr(0 \mid x_i) = \omega_i + (1 - \omega_i) \exp(-\lambda_i)$
  $\Pr(n \mid x_i) = (1 - \omega_i)\, \dfrac{\lambda_i^{n} \exp(-\lambda_i)}{n!} \qquad \text{for } n = 1, 2, \ldots$

where ωi is the probability of obtaining an observation from the degenerate distribution whose mass is concentrated at zero. ωi can be obtained from the pr option.
See Cameron and Trivedi (2013, sec. 4.6) for further details.
Reference
Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge
University Press.
Also see
[R]zip Zero-inflated Poisson regression
[U] 20 Estimation and postestimation commands
Author index
This is the author index for the Stata Base Reference
Manual.
A
Abramowitz, M., [R] contrast,[R] orthog
Abrams, K. R., [R] meta
Abramson, J. H., [R] kappa
Abramson, Z. H., [R] kappa
Abrevaya, J., [R] boxcox postestimation
Achen, C. H., [R] scobit
Acock, A. C., [R] anova,[R] correlate,[R] nestreg,
[R] oneway,[R] prtest,[R] ranksum,[R] ttest
Adkins, L. C., [R] heckman,[R] regress,[R] regress
postestimation
Afifi, A. A., [R] anova,[R] stepwise
Agresti, A., [R] ci,[R] expoisson,[R] tabulate twoway
Aigner, D. J., [R] frontier
Aiken, L. S., [R] pcorr
Aitchison, J., [R] ologit,[R] oprobit
Aitken, A. C., [R] reg3
Aivazian, S. A., [R] ksmirnov
Akaike, H., [R] BIC note,[R] estat ic,[R] glm
Aldrich, J. H., [R] logit,[R] probit
Alexandersson, A., [R] regress
Alf, E., Jr., [R] rocfit,[R] rocreg
Algina, J., [R] esize
Alldredge, J. R., [R] pk,[R] pkcross
Allison, P. D., [R] rologit,[R] testnl
Almås, I., [R] inequality
Alonzo, T. A., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Altman, D. G., [R] anova,[R] fp,[R] kappa,
[R] kwallis,[R] meta,[R] mfp,[R] nptrend,
[R] oneway
Ambler, G., [R] fp,[R] fp postestimation,[R] mfp,
[R] regress
Amemiya, T., [R] glogit,[R] intreg,[R] ivprobit,
[R] nlogit,[R] tobit
Andersen, E. B., [R] clogit
Andersen, P. K., [R] glm
Anderson, J. A., [R] ologit,[R] slogit
Anderson, R. E., [R] rologit
Anderson, R. L., [R] anova
Anderson, S., [R] pkequiv
Anderson, T. W., [R] ivregress postestimation
Andrews, D. F., [R] rreg
Andrews, D. W. K., [R] ivregress
Ångquist, L., [R] bootstrap,[R] permute
Angrist, J. D., [R] ivregress,[R] ivregress
postestimation,[R] qreg,[R] regress
Anscombe, F. J., [R] binreg postestimation,[R] glm,
[R] glm postestimation
Arbuthnott, J., [R] signrank
Archer, K. J., [R] estat gof,[R] logistic,[R] logit
Arellano, M., [R] areg postestimation,[R] gmm
Arminger, G., [R] suest
Armitage, P., [R] ameans,[R] expoisson,[R] pkcross,
[R] sdtest
Armstrong, R. D., [R] qreg
Arthur, M., [R] symmetry
Atella, V., [R] frontier
Atkinson, A. C., [R] boxcox,[R] nl
Azen, S. P., [R] anova
B
Babin, B. J., [R] rologit
Baker, R. J., [R] glm
Baker, R. M., [R] ivregress postestimation
Bakker, A., [R] mean
Balaam, L. N., [R] pkcross
Baltagi, B. H., [R] hausman
Bamber, D., [R] rocfit,[R] rocregplot,[R] roctab
Bancroft, T. A., [R] stepwise
Barnard, G. A., [R] spearman,[R] ttest
Barnett, A. G., [R] glm
Barrison, I. G., [R] binreg
Bartlett, M. S., [R] oneway
Bartus, T., [R] margins
Basmann, R. L., [R] ivregress,[R] ivregress
postestimation
Basu, A., [R] glm
Bauldry, S., [R] ivregress
Baum, C. F., [R] gmm,[R] heckman,[R] heckoprobit,
[R] heckprobit,[R] ivregress,[R] ivregress
postestimation,[R] margins,[R] net,[R] net
search,[R] regress postestimation,[R] regress
postestimation time series,[R] ssc
Bayart, D., [R] qc
Beale, E. M. L., [R] stepwise,[R] test
Beaton, A. E., [R] rreg
Becketti, S., [R] fp,[R] fp postestimation,[R] regress,
[R] runtest,[R] spearman
Beggs, S., [R] rologit
Belanger, A. J., [R] sktest
Bellocco, R., [R] glm,[R] logit
Belotti, F., [R] frontier
Belsley, D. A., [R] regress postestimation,[R] regress
postestimation diagnostic plots
Bendel, R. B., [R] stepwise
Benedetti, J. K., [R] tetrachoric
Beniger, J. R., [R] cumul
Bera, A. K., [R] sktest
Beran, R. J., [R] regress postestimation time series
Berk, K. N., [R] stepwise
Berk, R. A., [R] rreg
Berkson, J., [R] logit,[R] probit
Bern, P. H., [R] nestreg
Bernasco, W., [R] tetrachoric
Berndt, E. K., [R] glm
Berndt, E. R., [R] truncreg
Berry, G., [R] ameans,[R] expoisson,[R] sdtest
Berry, K. J., [R] ranksum
Bewley, R., [R] reg3
Beyer, W. H., [R] qc
Bickeböller, H., [R] symmetry
Bickel, P. J., [R] rreg
Birdsall, T. G., [R] lroc
Black, W. C., [R] rologit
Blackwell, J. L., III, [R] areg
Bland, M., [R] ranksum,[R] sdtest,[R] signrank,
[R] spearman
Blevins, J. R., [R] hetprobit
Bliese, P. D., [R] icc
Bliss, C. I., [R] probit
Bloch, D. A., [R] brier
Bloomfield, P., [R] qreg
Blundell, R., [R] gmm,[R] ivprobit
BMDP, [R] symmetry
Bofinger, E., [R] qreg
Boice, J. D., Jr., [R] bitest
Boland, P. J., [R] ttest
Bolduc, D., [R] asmprobit
Bollen, K. A., [R] regress postestimation
Bond, S., [R] gmm
Bonferroni, C. E., [R] correlate
Borenstein, M., [R] meta
Bottai, M., [R] qreg
Bound, J., [R] ivregress postestimation
Bowker, A. H., [R] symmetry
Box, G. E. P., [R] anova,[R] boxcox,[R] lnskew0
Box, J. F., [R] anova
Boyd, N. F., [R] kappa
Brackstone, G. J., [R] diagnostic plots,[R] swilk
Bradley, R. A., [R] signrank
Brady, A. R., [R] logistic,[R] spikeplot
Brant, R., [R] ologit
Breslow, N. E., [R] clogit,[R] dstdize,[R] symmetry
Breusch, T. S., [R] regress postestimation,[R] regress
postestimation time series,[R] sureg
Brier, G. W., [R] brier
Brillinger, D. R., [R] jackknife
Brook, R. H., [R] brier
Brown, D. R., [R] anova,[R] contrast,[R] loneway,
[R] oneway,[R] pwcompare
Brown, L. D., [R] ci
Brown, M. B., [R] sdtest,[R] tetrachoric
Brown, S. E., [R] symmetry
Brown, W., [R] icc
Bru, B., [R] poisson
Brzezinski, M., [R] swilk
Buchner, D. M., [R] ladder
Buis, M. L., [R] constraint,[R] eform option,
[R] logistic,[R] logit,[R] margins
Bunch, D. S., [R] asmprobit
Burke, W. J., [R] tobit
Burnam, M. A., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Burr, I. W., [R] qc
Buskens, V., [R] tabstat
C
Cai, T., [R] rocreg
Cai, T. T., [R] ci
Cameron, A. C., [R] asclogit,[R] asmprobit,
[R] bootstrap,[R] gmm,[R] heckman,
[R] heckoprobit,[R] intreg,[R] ivpoisson,
[R] ivregress,[R] ivregress postestimation,
[R] logit,[R] mprobit,[R] nbreg,[R] ologit,
[R] oprobit,[R] poisson,[R] probit,[R] qreg,
[R] regress,[R] regress postestimation,
[R] simulate,[R] sureg,[R] tnbreg,[R] tobit,
[R] tpoisson,[R] zinb postestimation,[R] zip
postestimation
Campbell, M. J., [R] ci,[R] kappa,[R] poisson,
[R] tabulate twoway
Canette, I., [R] nl,[R] nlsur
Cappellari, L., [R] asmprobit
Cardell, S., [R] rologit
Carlile, T., [R] kappa
Carlin, J. B., [R] ameans
Carpenter, J. R., [R] bootstrap,[R] bstat
Carroll, R. J., [R] boxcox,[R] rreg,[R] sdtest
Carson, R. T., [R] tnbreg,[R] tpoisson
Carter, S. L., [R] frontier,[R] lrtest,[R] nbreg
Caudill, S. B., [R] frontier
Caulcutt, R., [R] qc
Chadwick, J., [R] poisson
Chaimani, A., [R] meta
Chakraborti, S., [R] ksmirnov
Chamberlain, G., [R] clogit,[R] gmm,[R] qreg
Chambers, J. M., [R] diagnostic plots,[R] grmeanby,
[R] lowess
Chang, I. M., [R] margins
Charlett, A., [R] fp
Chatfield, M., [R] anova
Chatterjee, S., [R] poisson,[R] regress,[R] regress
postestimation,[R] regress postestimation
diagnostic plots
Chen, X., [R] logistic,[R] logistic postestimation,
[R] logit
Chiburis, R., [R] heckman,[R] heckoprobit,
[R] heckprobit,[R] oprobit
Choi, B. C. K., [R] rocfit,[R] rocreg postestimation,
[R] rocregplot,[R] roctab
Chow, G. C., [R] contrast
Chow, S.-C., [R] pk,[R] pkcross,[R] pkequiv,
[R] pkexamine,[R] pkshape
Christakis, N., [R] rologit
Clark, V. A., [R] stepwise
Clarke, R. D., [R] poisson
Clarke-Pearson, D. L., [R] roccomp,[R] rocreg,
[R] roctab
Clarkson, D. B., [R] tabulate twoway
Clayton, D. G., [R] cloglog,[R] cumul
Clerget-Darpoux, F., [R] symmetry
Cleveland, W. S., [R] diagnostic plots,[R] lowess,
[R] lpoly,[R] sunflower
Cleves, M. A., [R] binreg,[R] dstdize,[R] logistic,
[R] logit,[R] roccomp,[R] rocfit,[R] rocreg,
[R] rocreg postestimation,[R] rocregplot,
[R] roctab,[R] sdtest,[R] symmetry
Clogg, C. C., [R] suest
Clopper, C. J., [R] ci
Cobb, G. W., [R] anova
Cochran, W. G., [R] ameans,[R] anova,[R] correlate,
[R] dstdize,[R] mean,[R] oneway,[R] poisson,
[R] probit,[R] proportion,[R] ranksum,
[R] ratio,[R] signrank,[R] total
Coelli, T. J., [R] frontier
Cohen, J., [R] esize,[R] kappa,[R] pcorr
Cohen, P., [R] pcorr
Coleman, J. S., [R] poisson
Collett, D., [R] clogit,[R] logistic,[R] logistic
postestimation
Cone-Wesson, B., [R] rocreg,[R] rocreg
postestimation,[R] rocregplot
Cong, R., [R] tobit,[R] tobit postestimation,
[R] truncreg
Conover, W. J., [R] centile,[R] ksmirnov,[R] kwallis,
[R] nptrend,[R] sdtest,[R] spearman,
[R] tabulate twoway
Conroy, R. M., [R] intreg,[R] ranksum
Consonni, D., [R] dstdize
Cook, A., [R] ci
Cook, N. R., [R] rocreg
Cook, R. D., [R] boxcox,[R] regress postestimation
Coster, D., [R] contrast
Coull, B. A., [R] ci
Cox, D. R., [R] boxcox,[R] exlogistic,[R] expoisson,
[R] lnskew0
Cox, G. M., [R] anova
Cox, N. J., [R] ci,[R] cumul,[R] diagnostic plots,
[R] histogram,[R] inequality,[R] kappa,
[R] kdensity,[R] ladder,[R] lowess,
[R] lpoly,[R] net,[R] net search,[R] regress
postestimation,[R] regress postestimation
diagnostic plots,[R] search,[R] serrbar,
[R] sktest,[R] smooth,[R] spikeplot,[R] ssc,
[R] stem,[R] summarize,[R] sunflower,
[R] tabulate oneway,[R] tabulate twoway
Cragg, J. G., [R] ivregress postestimation
Cramér, H., [R] tabulate twoway
Cramer, J. S., [R] logit
Cronbach, L. J., [R] icc
Croux, C., [R] rreg
Crowther, M. J., [R] meta
Cui, J., [R] symmetry
Cumming, G., [R] esize,[R] regress postestimation
Cummings, P., [R] binreg,[R] glm,[R] margins
Curts-García, J., [R] smooth
Cuzick, J., [R] kappa,[R] nptrend
D
D’Agostino, R. B., [R] sktest
D’Agostino, R. B., Jr., [R] sktest
Daidone, S., [R] frontier
Daniel, C., [R] diagnostic plots,[R] oneway
Danuso, F., [R] nl
DasGupta, A., [R] ci
Davey Smith, G., [R] meta
David, F. N., [R] correlate
David, H. A., [R] spearman,[R] summarize
Davidson, R., [R] boxcox,[R] cnsreg,[R] gmm,
[R] intreg,[R] ivregress,[R] ivregress
postestimation,[R] mlogit,[R] nl,[R] nlsur,
[R] reg3,[R] regress,[R] regress postestimation
time series,[R] tobit,[R] truncreg
Davison, A. C., [R] bootstrap
Day, N. E., [R] clogit,[R] dstdize,[R] symmetry
de Irala-Estévez, J., [R] logistic
De Luca, G., [R] biprobit,[R] heckoprobit,
[R] heckprobit,[R] oprobit,[R] probit
de Wolf, I., [R] rologit
Deaton, A. S., [R] nlsur
Deb, P., [R] nbreg
Debarsy, N., [R] lpoly
Dehon, C., [R] correlate
DeLong, D. M., [R] roccomp,[R] rocreg,[R] roctab
DeLong, E. R., [R] roccomp,[R] rocreg,[R] roctab
DeMaris, A., [R] regress postestimation
Desbordes, R., [R] ivregress
Desmarais, B. A., [R] zinb,[R] zip
Dewey, M. E., [R] correlate
Didelez, V., [R] ivregress
Digby, P. G. N., [R] tetrachoric
Dixon, W. J., [R] ttest
Djulbegovic, B., [R] meta
Dobson, A. J., [R] glm
Dodd, L. E., [R] rocreg
Dohoo, I., [R] regress
Doll, R., [R] poisson
Donald, A., [R] meta
Donald, S. G., [R] ivregress postestimation
Donner, A., [R] loneway
Donoho, D. L., [R] lpoly
Dore, C. J., [R] fp
Dorfman, D. D., [R] rocfit,[R] rocreg
Doris, A., [R] gmm,[R] inequality
Draper, N., [R] eivreg,[R] oneway,[R] regress,
[R] stepwise
Drukker, D. M., [R] asmprobit,[R] boxcox,
[R] frontier,[R] lrtest,[R] nbreg,[R] tobit
Duan, N., [R] boxcox postestimation,[R] heckman
Duncan, A. J., [R] qc
Dunn, G., [R] kappa
Dunnett, C. W., [R] mprobit,[R] pwcompare
Dunnington, G. W., [R] regress
Dupont, W. D., [R] logistic,[R] mkspline,
[R] sunflower
Durbin, J., [R] ivregress postestimation,[R] regress
postestimation time series
Duren, P., [R] regress
Duval, R. D., [R] bootstrap,[R] jackknife,[R] rocreg,
[R] rocregplot
E
Edgington, E. S., [R] runtest
Edwards, A. L., [R] anova
Edwards, A. W. F., [R] tetrachoric
Edwards, J. H., [R] tetrachoric
Efron, B., [R] bootstrap,[R] qreg
Efroymson, M. A., [R] stepwise
Egger, M., [R] meta
Eisenhart, C., [R] correlate,[R] runtest
Ellis, C. D., [R] poisson
Ellis, P. D., [R] esize,[R] regress postestimation
Eltinge, J. L., [R] test
Emerson, J. D., [R] lv,[R] stem
Ender, P. B., [R] marginsplot
Engel, A., [R] boxcox,[R] marginsplot
Engle, R. F., [R] regress postestimation time series
Erdreich, L. S., [R] roccomp,[R] rocfit,[R] roctab
Eubank, R. L., [R] lpoly
Evans, M. A., [R] pk,[R] pkcross
Everitt, B. S., [R] gllamm,[R] glm
Ewens, W. J., [R] symmetry
Ezekiel, M., [R] regress postestimation diagnostic
plots
F
Fagerland, M. W., [R] estat gof,[R] mlogit
postestimation
Fan, J., [R] lpoly
Fan, Y.-A., [R] tabulate twoway
Fang, K.-T., [R] asmprobit
Farbmacher, H., [R] tpoisson
Feiveson, A. H., [R] nlcom,[R] ranksum
Feldt, L. S., [R] anova
Ferri, H. A., [R] kappa
Festinger, L., [R] ranksum
Field, C. A., [R] bootstrap
Fieller, E. C., [R] pkequiv
Fienberg, S. E., [R] kwallis,[R] tabulate twoway
Filon, L. N. G., [R] correlate
Filoso, V., [R] regress
Finch, S., [R] esize
Findley, D. F., [R] estat ic
Findley, T. W., [R] ladder
Finlay, K., [R] ivprobit,[R] ivregress,[R] ivtobit
Finney, D. J., [R] probit,[R] tabulate twoway
Fiorio, C. V., [R] kdensity
Fiser, D. H., [R] estat gof,[R] lroc
Fishell, E., [R] kappa
Fisher, L. D., [R] anova,[R] dstdize,[R] oneway
Fisher, N. I., [R] regress postestimation time series
Fisher, R. A., [R] anova,[R] anova,[R] esize,
[R] ranksum,[R] signrank,[R] tabulate
twoway
Flannery, B. P., [R] dydx,[R] vwls
Fleiss, J. L., [R] dstdize,[R] icc,[R] kappa
Fletcher, K., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Flynn, Z. L., [R] gmm
Folsom, R. C., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Ford, J. M., [R] frontier
Forsythe, A. B., [R] sdtest
Forthofer, R. N., [R] dstdize
Foster, A., [R] regress
Fouladi, R. T., [R] esize
Fourier, J. B. J., [R] cumul
Fox, J., [R] kdensity,[R] lv
Fox, W. C., [R] lroc
Francia, R. S., [R] swilk
Freese, J., [R] asroprobit,[R] clogit,[R] cloglog,
[R] logistic,[R] logit,[R] mlogit,[R] mprobit,
[R] nbreg,[R] ologit,[R] oprobit,[R] poisson,
[R] probit,[R] regress,[R] regress
postestimation,[R] tnbreg,[R] tpoisson,
[R] zinb,[R] zip
Frölich, M., [R] qreg
Frome, E. L., [R] qreg
Frydenberg, M., [R] dstdize,[R] roccomp,[R] roctab
Fu, V. K., [R] ologit
Fuller, W. A., [R] regress,[R] spearman
G
Gail, M. H., [R] rocreg,[R] rocreg postestimation
Gall, J.-R. L., [R] estat gof,[R] logistic
Gallant, A. R., [R] ivregress,[R] nl
Gallup, J. L., [R] estimates table
Galton, F., [R] correlate,[R] cumul,[R] regress,
[R] summarize
Gan, F. F., [R] diagnostic plots
Garrett, J. M., [R] logistic,[R] logistic postestimation,
[R] regress postestimation
Garsd, A., [R] exlogistic
Gasser, T., [R] lpoly
Gastwirth, J. L., [R] sdtest
Gates, R., [R] asmprobit
Gauss, J. C. F., [R] regress
Gauvreau, K., [R] dstdize,[R] logistic
Geisser, S., [R] anova
Gel, Y. R., [R] sdtest
Gelbach, J., [R] ivprobit,[R] ivtobit
Gelman, R., [R] margins
Genest, C., [R] diagnostic plots,[R] swilk
Gentle, J. E., [R] anova,[R] nl
Genton, M. G., [R] sktest
Genz, A., [R] asmprobit
Gerkins, V. R., [R] symmetry
Geweke, J., [R] asmprobit
Gibbons, J. D., [R] ksmirnov,[R] spearman
Giesen, D., [R] tetrachoric
Gijbels, I., [R] lpoly
Gillham, N. W., [R] regress
Gillispie, C. C., [R] regress
Gini, R., [R] vwls
Glass, G. V., [R] esize
Gleason, J. R., [R] anova,[R] bootstrap,[R] ci,
[R] correlate,[R] loneway,[R] summarize,
[R] ttest
Glidden, D. V., [R] logistic
Gnanadesikan, R., [R] cumul,[R] diagnostic plots
Godfrey, L. G., [R] regress postestimation time series
Goeden, G. B., [R] kdensity
Goerg, S. J., [R] ksmirnov
Goldberger, A. S., [R] intreg,[R] mlexp,[R] tobit
Goldstein, R., [R] brier,[R] correlate,[R] inequality,
[R] nl,[R] ologit,[R] oprobit,[R] ranksum,
[R] regress postestimation
Golub, G. H., [R] orthog,[R] tetrachoric
Good, P. I., [R] permute,[R] symmetry,[R] tabulate
twoway
Goodall, C., [R] lowess,[R] rreg
Goodman, L. A., [R] tabulate twoway
Gordon, M. G., [R] binreg
Gorga, M. P., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Gorman, J. W., [R] stepwise
Gosset [Student, pseud.], W. S., [R] ttest
Gosset, W. S., [R] ttest
Gould, W. W., [R] bootstrap,[R] bsample,[R] dydx,
[R] frontier,[R] gmm,[R] grmeanby,
[R] jackknife,[R] kappa,[R] logistic,
[R] margins,[R] maximize,[R] mkspline,
[R] ml,[R] mlexp,[R] net search,[R] nlcom,
[R] ologit,[R] oprobit,[R] poisson,
[R] predictnl,[R] qreg,[R] regress,[R] rreg,
[R] simulate,[R] sktest,[R] smooth,[R] swilk,
[R] testnl
Gourieroux, C. S., [R] hausman,[R] suest,[R] test
Graubard, B. I., [R] margins,[R] ml,[R] test
Graybill, F. A., [R] centile
Green, D. M., [R] lroc
Greene, W. H., [R] asclogit,[R] asmprobit,
[R] biprobit,[R] clogit,[R] cnsreg,[R] frontier,
[R] gmm,[R] heckman,[R] heckoprobit,
[R] heckprobit,[R] hetprobit,[R] ivregress,
[R] logit,[R] lrtest,[R] margins,[R] mkspline,
[R] mlexp,[R] mlogit,[R] nlogit,[R] nlsur,
[R] pcorr,[R] probit,[R] reg3,[R] regress,
[R] regress postestimation time series,
[R] sureg,[R] testnl,[R] truncreg,[R] zinb,
[R] zip
Greenfield, S., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Greenhouse, S. W., [R] anova
Greenland, S., [R] ci,[R] glogit,[R] mkspline,
[R] ologit,[R] poisson
Gregoire, A., [R] kappa
Grieve, R., [R] bootstrap,[R] bstat
Griffith, J. L., [R] brier
Griffith, R., [R] gmm
Griffiths, W. E., [R] cnsreg,[R] estat ic,[R] glogit,
[R] ivregress,[R] ivregress postestimation,
[R] logit,[R] probit,[R] regress,[R] regress
postestimation,[R] test
Grissom, R. J., [R] esize,[R] regress postestimation
Grizzle, J. E., [R] vwls
Grogger, J. T., [R] tnbreg,[R] tpoisson
Gronau, R., [R] heckman
Gropper, D. M., [R] frontier
Guan, W., [R] bootstrap
Gutierrez, R. G., [R] frontier,[R] lpoly,[R] lrtest,
[R] nbreg
H
Haan, P., [R] asmprobit,[R] mlogit,[R] mprobit
Hadi, A. S., [R] poisson,[R] regress,[R] regress
postestimation,[R] regress postestimation
diagnostic plots
Hadorn, D. C., [R] brier
Hahn, J., [R] ivregress postestimation
Hair, J. F., Jr., [R] rologit
Hajian-Tilaki, K. O., [R] rocreg
Hajivassiliou, V. A., [R] asmprobit
Hald, A., [R] qreg,[R] regress,[R] signrank,
[R] summarize
Haldane, J. B. S., [R] ranksum
Hall, A. D., [R] frontier
Hall, A. R., [R] gmm,[R] gmm postestimation,
[R] ivpoisson,[R] ivpoisson postestimation,
[R] ivregress,[R] ivregress postestimation
Hall, B. H., [R] glm
Hall, N. S., [R] anova
Hall, P., [R] bootstrap,[R] qreg,[R] regress
postestimation time series
Hall, R. E., [R] glm
Hall, W. J., [R] roccomp,[R] rocfit,[R] roctab
Hallock, K., [R] qreg
Halvorsen, K. T., [R] tabulate twoway
Hamerle, A., [R] clogit
Hamilton, J. D., [R] gmm
Hamilton, L. C., [R] bootstrap,[R] diagnostic plots,
[R] estat vce,[R] ladder,[R] lv,[R] mlogit,
[R] regress,[R] regress postestimation,
[R] regress postestimation diagnostic plots,
[R] rreg,[R] simulate,[R] summarize,[R] ttest
Hampel, F. R., [R] rreg
Hanley, J. A., [R] roccomp,[R] rocfit,[R] rocreg,
[R] rocreg postestimation,[R] rocregplot,
[R] roctab
Hansen, L. P., [R] gmm,[R] ivregress,[R] ivregress
postestimation
Hao, L., [R] qreg
Harbord, R. M., [R] roccomp,[R] roctab
Harden, J. J., [R] zinb,[R] zip
Hardin, J. W., [R] binreg,[R] biprobit,[R] estat ic,
[R] glm,[R] glm postestimation,[R] lroc,
[R] poisson,[R] ranksum,[R] regress
postestimation,[R] signrank
Haritou, A., [R] suest
Harkness, J., [R] ivprobit,[R] ivtobit
Harrell, F. E., Jr., [R] mkspline,[R] ologit
Harris, R. L., [R] qc
Harris, T., [R] poisson,[R] qreg,[R] ranksum,
[R] signrank
Harrison, D. A., [R] histogram,[R] tabulate oneway,
[R] tabulate twoway
Harrison, J. A., [R] dstdize
Hartmann, D. P., [R] icc
Harvey, A. C., [R] hetprobit
Hastie, T. J., [R] grmeanby,[R] slogit
Hauck, W. W., [R] pkequiv
Haughton, J. H., [R] inequality
Hausman, J. A., [R] glm,[R] hausman,[R] ivregress
postestimation,[R] nlogit,[R] rologit,[R] suest
Havnes, T., [R] inequality
Hayashi, F., [R] gmm,[R] ivpoisson,[R] ivregress,
[R] ivregress postestimation
Hayes, R. J., [R] permute
Hays, R. D., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Hays, W. L., [R] esize,[R] regress postestimation
Heagerty, P. J., [R] anova,[R] dstdize,[R] oneway
Heckman, J., [R] biprobit,[R] heckman,[R] heckman
postestimation,[R] heckoprobit,[R] heckprobit
Hedges, L. V., [R] esize,[R] meta
Heiss, F., [R] nlogit
Henderson, B. E., [R] symmetry
Hendrickx, J., [R] mlogit,[R] xi
Hensher, D. A., [R] nlogit
Hickam, D. H., [R] brier
Higgins, J. E., [R] anova
Higgins, J. P. T., [R] meta
Hilbe, J. M., [R] cloglog,[R] estat ic,[R] glm,[R] glm
postestimation,[R] logistic,[R] logit,[R] lroc,
[R] nbreg,[R] poisson,[R] probit,[R] simulate,
[R] tnbreg,[R] tpoisson
Hill, A. B., [R] poisson
Hill, R. C., [R] cnsreg,[R] estat ic,[R] glogit,
[R] heckman,[R] ivregress,[R] ivregress
postestimation,[R] logit,[R] probit,
[R] regress,[R] regress postestimation,[R] test
Hills, M., [R] cloglog,[R] cumul
Hinchliffe, S. R., [R] meta
Hinkley, D. V., [R] bootstrap
Hirji, K. F., [R] exlogistic,[R] expoisson
Hoaglin, D. C., [R] diagnostic plots,[R] lv,[R] regress
postestimation,[R] regress postestimation
diagnostic plots,[R] smooth,[R] stem
Hochberg, Y., [R] oneway
Hocking, R. R., [R] stepwise
Hoel, P. G., [R] bitest,[R] ttest
Hoffmann, J. P., [R] glm
Hole, A. R., [R] asmprobit,[R] clogit,[R] mlogit,
[R] mprobit
Holloway, L., [R] brier
Holm, S., [R] test
Holmes, S., [R] bootstrap
Hood, W. C., [R] ivregress
Hosmer, D. W., Jr., [R] clogit,[R] clogit
postestimation,[R] estat classification,[R] estat
gof,[R] glm,[R] glogit,[R] lincom,[R] logistic,
[R] logistic postestimation,[R] logit,[R] logit
postestimation,[R] lroc,[R] lrtest,[R] lsens,
[R] mlogit,[R] mlogit postestimation,
[R] predictnl,[R] stepwise
Hotelling, H., [R] roccomp,[R] rocfit,[R] roctab
Hozo, I., [R] meta
Huang, C., [R] sunflower
Huang, D. S., [R] nlsur,[R] sureg
Huber, C., [R] esize,[R] regress postestimation
Huber, P. J., [R] qreg,[R] rreg,[R] suest
Hunter, D. R., [R] qreg
Hurd, M., [R] intreg,[R] tobit
Hutto, C., [R] exlogistic
Huynh, H., [R] anova
I
Iglewicz, B., [R] lv
Ilardi, G., [R] frontier
Isaacs, D., [R] fp
Ishiguro, M., [R] BIC note
J
Jackman, R. W., [R] regress postestimation
Jacobs, K. B., [R] symmetry
Jaeger, D. A., [R] ivregress postestimation
James, B. R., [R] rocreg,[R] rocreg postestimation
James, K. L., [R] rocreg,[R] rocreg postestimation
Janes, H., [R] rocfit,[R] rocreg,[R] rocreg
postestimation,[R] rocregplot
Jann, B., [R] estimates store,[R] ksmirnov,[R] stored
results,[R] tabulate twoway
Jarque, C. M., [R] sktest
Jeffreys, H., [R] ci,[R] spearman
Jenkins, S. P., [R] asmprobit,[R] do,[R] inequality
Joe, H., [R] tabulate twoway
Johnson, D. E., [R] anova,[R] contrast,
[R] pwcompare
Johnson, M. E., [R] sdtest
Johnson, M. M., [R] sdtest
Johnson, N. L., [R] ksmirnov,[R] nbreg,[R] poisson
Johnston, J. E., [R] ranksum
Jolliffe, D., [R] inequality,[R] qreg,[R] regress
Jolliffe, I. T., [R] brier
Jones, A., [R] heckman,[R] logit,[R] probit
Jones, D. R., [R] meta
Jones, M. C., [R] kdensity,[R] lpoly
Judge, G. G., [R] estat ic,[R] glogit,[R] ivregress,
[R] ivregress postestimation,[R] logit,
[R] probit,[R] regress postestimation,[R] test
Judson, D. H., [R] poisson,[R] tabulate twoway,
[R] tpoisson
Juul, S., [R] dstdize,[R] roccomp,[R] roctab
K
Kahn, H. A., [R] dstdize
Kaiser, J., [R] ksmirnov,[R] permute,[R] signrank
Kalmijn, M., [R] tetrachoric
Keane, M. P., [R] asmprobit
Keeler, E. B., [R] brier
Kelley, K., [R] esize,[R] regress postestimation
Kemp, A. W., [R] nbreg,[R] poisson
Kempthorne, P. J., [R] regress postestimation
Kendall, M. G., [R] centile,[R] spearman,
[R] tabulate twoway
Kennedy, W. J., Jr., [R] anova,[R] nl,[R] regress,
[R] stepwise
Kerlinger, F. N., [R] esize,[R] regress postestimation
Keselman, H. J., [R] esize
Kettenring, J. R., [R] diagnostic plots
Keynes, J. M., [R] ameans
Khan, S., [R] hetprobit
Khandker, S. R., [R] inequality
Kiernan, M., [R] kappa
Kim, J. J., [R] esize,[R] regress postestimation
Kirk, R. E., [R] esize,[R] regress postestimation
Kirkwood, B. R., [R] dstdize,[R] summarize
Kish, L., [R] loneway
Kitagawa, G., [R] BIC note
Klar, J., [R] estat gof
Kleiber, C., [R] inequality
Klein, L. R., [R] reg3,[R] reg3 postestimation,
[R] regress postestimation time series
Klein, M., [R] binreg,[R] clogit,[R] logistic,
[R] lrtest,[R] mlogit,[R] ologit
Kleinbaum, D. G., [R] binreg,[R] clogit,[R] logistic,
[R] lrtest,[R] mlogit,[R] ologit
Kleiner, B., [R] diagnostic plots,[R] lowess
Kline, R. B., [R] esize,[R] regress postestimation
Kmenta, J., [R] eivreg,[R] ivregress,[R] regress
Koch, G. G., [R] anova,[R] kappa,[R] vwls
Koehler, K. J., [R] diagnostic plots
Koenker, R., [R] qreg,[R] regress postestimation
Kohler, U., [R] estat classification,[R] kdensity,
[R] regress,[R] regress postestimation,
[R] regress postestimation diagnostic plots
Kolmogorov, A. N., [R] ksmirnov
Kontopantelis, E., [R] meta
Koopman, S. J., [R] regress postestimation time series
Koopmans, T. C., [R] ivregress
Korn, E. L., [R] margins,[R] ml,[R] test
Kotz, S., [R] inequality,[R] ksmirnov,[R] nbreg,
[R] nlogit,[R] poisson
Kreuter, F., [R] estat classification,[R] kdensity,
[R] regress,[R] regress postestimation,
[R] regress postestimation diagnostic plots
Krushelnytskyy, B., [R] inequality,[R] qreg
Kruskal, W. H., [R] kwallis,[R] ranksum,
[R] spearman,[R] tabulate twoway
Kuehl, R. O., [R] anova,[R] contrast,[R] icc,
[R] oneway
Kuh, E., [R] regress postestimation,[R] regress
postestimation diagnostic plots
Kumbhakar, S. C., [R] frontier,[R] frontier
postestimation
Kung, D. S., [R] qreg
Kutner, M. H., [R] pkcross,[R] pkequiv,[R] pkshape,
[R] regress postestimation
L
Lachenbruch, P. A., [R] diagnostic plots
Lacy, M. G., [R] permute
Lafontaine, F., [R] boxcox
Lahiri, K., [R] tobit
Lai, S., [R] exlogistic
Laird, N. M., [R] expoisson
Lambert, D., [R] zip
Lambert, P. C., [R] poisson
Landis, J. R., [R] kappa
Lane, P. W., [R] margins
Langan, D., [R] meta
Lange, K., [R] qreg
Laplace, P.-S., [R] regress
Larsen, W. A., [R] regress postestimation diagnostic
plots
Lash, T. L., [R] ci,[R] glogit,[R] poisson
Lauritzen, S. L., [R] summarize
Lee, E. S., [R] dstdize
Lee, E. T., [R] roccomp,[R] rocfit,[R] roctab
Lee, T.-C., [R] estat ic,[R] glogit,[R] ivregress,
[R] ivregress postestimation,[R] logit,
[R] probit,[R] regress postestimation,[R] test
Lee, W. C., [R] roctab
Legendre, A.-M., [R] regress
Lehmann, E. L., [R] oneway
Lemeshow, S. A., [R] clogit,[R] clogit postestimation,
[R] estat classification,[R] estat gof,
[R] glm,[R] glogit,[R] lincom,[R] logistic,
[R] logistic postestimation,[R] logit,[R] logit
postestimation,[R] lroc,[R] lrtest,[R] lsens,
[R] mlogit,[R] predictnl,[R] stepwise
Leroy, A. M., [R] qreg,[R] regress postestimation,
[R] rreg
Levene, H., [R] sdtest
Levin, B., [R] dstdize,[R] kappa
Levinsohn, J. A., [R] frontier
Levy, D. E., [R] sunflower
Lewis, H. G., [R] heckman
Lewis, I. G., [R] binreg
Lewis, J. D., [R] fp
Li, G., [R] rreg
Li, W., [R] pkcross,[R] pkequiv,[R] pkshape
Libois, F., [R] fp
Lim, G. C., [R] cnsreg,[R] regress,[R] regress
postestimation
Lindley, D. V., [R] ci
Lindsey, C., [R] boxcox,[R] lowess,[R] regress
postestimation,[R] regress postestimation
diagnostic plots,[R] stepwise
Linhart, J. M., [R] lpoly
Lipset, S. M., [R] histogram
Liu, J.-P., [R] pk,[R] pkcross,[R] pkequiv,
[R] pkexamine,[R] pkshape
Locke, C. S., [R] pkequiv
Lockwood, J. R., [R] areg
Lokshin, M., [R] biprobit,[R] heckman,
[R] heckoprobit,[R] heckprobit,[R] oprobit
Long, J. S., [R] asroprobit,[R] clogit,[R] cloglog,
[R] intreg,[R] logistic,[R] logit,[R] mlogit,
[R] mprobit,[R] nbreg,[R] ologit,[R] oprobit,
[R] poisson,[R] probit,[R] regress,[R] regress
postestimation,[R] testnl,[R] tnbreg,[R] tobit,
[R] tpoisson,[R] zinb,[R] zip
Longest, K. C., [R] tabulate twoway
Longley, J. D., [R] kappa
Longton, G. M., [R] rocfit,[R] rocreg,[R] rocreg
postestimation,[R] rocregplot
López-Feldman, A., [R] inequality
Lorenz, M. O., [R] inequality
Louis, T. A., [R] tabulate twoway
Lovell, C. A. K., [R] frontier,[R] frontier
postestimation
Lovie, A. D., [R] spearman
Lovie, P., [R] spearman
Lucas, H. L., [R] pkcross
Luce, R. D., [R] rologit
Lumley, T. S., [R] anova,[R] dstdize,[R] oneway
Lunt, M., [R] ologit,[R] slogit
Lütkepohl, H., [R] estat ic,[R] glogit,[R] ivregress,
[R] ivregress postestimation,[R] logit,
[R] probit,[R] regress postestimation,[R] test
M
Ma, G., [R] roccomp,[R] rocfit,[R] roctab
Machin, D., [R] ci,[R] kappa,[R] tabulate twoway
Mack, T. M., [R] symmetry
MacKinnon, J. G., [R] boxcox,[R] cnsreg,[R] gmm,
[R] intreg,[R] ivregress,[R] ivregress
postestimation,[R] mlogit,[R] nl,[R] nlsur,
[R] reg3,[R] regress,[R] regress postestimation
time series,[R] tobit,[R] truncreg
MacRae, K. D., [R] binreg
Madansky, A., [R] runtest
Maddala, G. S., [R] nlogit,[R] tobit
Magnusson, L. M., [R] gmm,[R] ivprobit,
[R] ivregress,[R] ivtobit
Mallows, C. L., [R] regress postestimation diagnostic
plots
Mander, A. P., [R] anova,[R] symmetry
Mann, H. B., [R] kwallis,[R] ranksum
Manning, W. G., [R] heckman
Manski, C. F., [R] gmm
Mantel, N., [R] stepwise
Marchenko, Y. V., [R] anova,[R] loneway,
[R] oneway,[R] sktest
Marden, J. I., [R] rologit
Markowski, C. A., [R] sdtest
Markowski, E. P., [R] sdtest
Marschak, J., [R] ivregress
Martin, W., [R] regress
Martínez, M. A., [R] logistic
Mascher, K., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Massey, F. J., Jr., [R] ttest
Massey, J. T., [R] boxcox,[R] marginsplot
Master, I. M., [R] exlogistic
Mastrucci, M. T., [R] exlogistic
Matthews, J. N. S., [R] ameans,[R] expoisson,
[R] sdtest
Mátyás, L., [R] gmm
Maurer, K., [R] boxcox,[R] marginsplot
Maxwell, A. E., [R] symmetry
May, S., [R] stepwise
McCaffrey, D. F., [R] areg
McCleary, S. J., [R] regress postestimation diagnostic
plots
McClish, D. K., [R] rocreg
McCullagh, P., [R] binreg,[R] binreg postestimation,
[R] glm,[R] glm postestimation,[R] ologit,
[R] rologit
McCulloch, C. E., [R] logistic
McDonald, J. A., [R] sunflower
McDonald, J. F., [R] tobit,[R] tobit postestimation
McDowell, A., [R] boxcox,[R] marginsplot
McDowell, A. W., [R] sureg
McFadden, D. L., [R] asclogit,[R] asmprobit,
[R] clogit,[R] hausman,[R] maximize,
[R] nlogit,[R] suest
McGill, R., [R] sunflower
McGinnis, R. E., [R] symmetry
McGraw, K. O., [R] icc
McGuire, T. J., [R] dstdize
McKelvey, R. D., [R] ologit
McNeil, B. J., [R] roccomp,[R] rocfit,[R] rocreg,
[R] rocreg postestimation,[R] rocregplot,
[R] roctab
McNeil, D., [R] poisson
Meeusen, W., [R] frontier
Mehta, C. R., [R] exlogistic,[R] exlogistic
postestimation,[R] expoisson,[R] tabulate
twoway
Melly, B., [R] qreg
Mensing, R. W., [R] anova postestimation
Metz, C. E., [R] lroc
Miao, W., [R] sdtest
Michels, K. M., [R] anova,[R] contrast,[R] loneway,
[R] oneway,[R] pwcompare
Mielke, P. W., Jr., [R] brier,[R] ranksum
Mihaly, K., [R] areg
Miladinovic, B., [R] meta
Miller, A. B., [R] kappa
Miller, R. G., Jr., [R] diagnostic plots,[R] oneway,
[R] pwcompare
Milliken, G. A., [R] anova,[R] contrast,[R] margins,
[R] pwcompare
Miranda, A., [R] gllamm,[R] heckoprobit,
[R] heckprobit,[R] ivprobit,[R] ivtobit,
[R] logistic,[R] logit,[R] nbreg,[R] ologit,
[R] oprobit,[R] poisson,[R] probit
Mitchell, C., [R] exlogistic
Mitchell, M. N., [R] anova,[R] anova postestimation,
[R] contrast,[R] logistic,[R] logistic
postestimation,[R] logit,[R] margins,
[R] marginsplot,[R] pwcompare,[R] regress
Moffitt, R. A., [R] tobit,[R] tobit postestimation
Mogstad, M., [R] inequality
Monfort, A., [R] hausman,[R] suest,[R] test
Monson, R. R., [R] bitest
Montoya, D., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Mood, A. M., [R] centile
Mooney, C. Z., [R] bootstrap,[R] jackknife,
[R] rocreg,[R] rocregplot
Moran, J. L., [R] dstdize
Morris, C., [R] bootstrap
Morris, N. F., [R] binreg
Moskowitz, M., [R] kappa
Mosteller, C. F., [R] jackknife,[R] regress,[R] regress
postestimation diagnostic plots,[R] rreg
Moulton, L. H., [R] permute
Muellbauer, J., [R] nlsur
Mullahy, J., [R] gmm,[R] ivpoisson,[R] zinb,[R] zip
Müller, H.-G., [R] lpoly
Muro, J., [R] heckoprobit,[R] heckprobit
Murphy, A. H., [R] brier
Murray-Lyon, I. M., [R] binreg
Muñoz, J., [R] exlogistic
N
Nachtsheim, C. J., [R] pkcross,[R] pkequiv,
[R] pkshape,[R] regress postestimation
Nadarajah, S., [R] nlogit
Nadaraya, E. A., [R] lpoly
Nagler, J., [R] scobit
Naiman, D. Q., [R] qreg
Narula, S. C., [R] qreg
Nee, J. C. M., [R] kappa
Neely, S. T., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Nelder, J. A., [R] binreg,[R] binreg postestimation,
[R] glm,[R] glm postestimation,[R] margins,
[R] ologit
Nelson, C. R., [R] ivregress postestimation
Nelson, E. C., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Nelson, F. D., [R] logit,[R] probit
Neter, J., [R] pkcross,[R] pkequiv,[R] pkshape,
[R] regress postestimation
Newey, W. K., [R] glm,[R] gmm,[R] ivpoisson,
[R] ivprobit,[R] ivregress,[R] ivtobit
Newman, S. C., [R] poisson
Newson, R. B., [R] centile,[R] glm,[R] glm
postestimation,[R] inequality,[R] kwallis,
[R] logistic postestimation,[R] logit
postestimation,[R] margins,[R] mkspline,
[R] ranksum,[R] signrank,[R] spearman,
[R] tabulate twoway
Newton, H. J., [R] kdensity
Neyman, J., [R] ci
Ng, E. S.-W., [R] bootstrap,[R] bstat
Nicewander, W. A., [R] correlate
Nichols, A., [R] ivregress,[R] reg3
Nickell, S. J., [R] gmm
Nolan, D., [R] diagnostic plots
Norton, S. J., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
O
O’Fallon, W. M., [R] logit
O’Neill, D., [R] gmm,[R] inequality
Oehlert, G. W., [R] nlcom,[R] rocreg postestimation,
[R] rocregplot
Olivier, D., [R] expoisson
Olkin, I., [R] kwallis
Olson, J. M., [R] symmetry
Ord, J. K., [R] centile,[R] mean,[R] proportion,
[R] qreg,[R] ratio,[R] summarize,[R] total
Orsini, N., [R] glm,[R] logit,[R] mkspline,[R] qreg
Ostle, B., [R] anova postestimation
Over, M., [R] regress
P
Pacheco, J. M., [R] dstdize
Pagan, A. R., [R] frontier,[R] regress postestimation,
[R] sureg
Pagano, M., [R] dstdize,[R] logistic,[R] margins,
[R] tabulate twoway
Paik, M. C., [R] dstdize,[R] kappa
Palmer, T. M., [R] ivregress
Pampel, F. C., [R] logistic,[R] logit,[R] probit
Panis, C., [R] mkspline
Park, H. J., [R] regress
Park, J. Y., [R] boxcox,[R] margins,[R] nlcom,
[R] predictnl,[R] rocreg postestimation,
[R] rocregplot,[R] testnl
Parks, W. P., [R] exlogistic
Parner, E. T., [R] glm
Parzen, E., [R] estat ic,[R] kdensity
Pasquini, J., [R] vwls
Patel, N. R., [R] exlogistic,[R] exlogistic
postestimation,[R] expoisson,[R] tabulate
twoway
Patterson, H. D., [R] pkcross
Paul, C., [R] logistic
Pearce, M. S., [R] logistic
Pearson, E. S., [R] ci,[R] ttest
Pearson, K., [R] correlate,[R] esize,
[R] tabulate twoway
Penfield, R. D., [R] esize
Pepe, M. S., [R] roc,[R] roccomp,[R] rocfit,
[R] rocreg,[R] rocreg postestimation,
[R] rocregplot,[R] roctab
Peracchi, F., [R] regress,[R] regress postestimation
Pérez-Hernández, M. A., [R] kdensity
Pérez-Hoyos, S., [R] lrtest
Perkins, A. M., [R] ranksum
Perotti, V., [R] heckoprobit,[R] heckprobit,
[R] oprobit
Perrin, E., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Pesarin, F., [R] tabulate twoway
Peterson, B., [R] ologit
Peterson, W. W., [R] lroc
Petitclerc, M., [R] kappa
Petkova, E., [R] suest
Petrin, A. K., [R] frontier
Pfeffer, R. I., [R] symmetry
Phillips, P. C. B., [R] boxcox,[R] margins,[R] nlcom,
[R] predictnl,[R] regress postestimation
time series,[R] rocreg postestimation,
[R] rocregplot,[R] testnl
Pickles, A., [R] gllamm,[R] glm
Pike, M. C., [R] symmetry
Pindyck, R. S., [R] biprobit,[R] heckprobit
Pischke, J.-S., [R] ivregress,[R] ivregress
postestimation,[R] qreg,[R] regress
Pitblado, J. S., [R] frontier,[R] gmm,[R] lpoly,
[R] maximize,[R] ml,[R] mlexp
Plackett, R. L., [R] ameans,[R] regress,[R] rologit,
[R] summarize,[R] ttest
Plummer, W. D., Jr., [R] sunflower
Poi, B. P., [R] bootstrap,[R] bstat,[R] frontier,
[R] gmm,[R] ivregress,[R] ivregress
postestimation,[R] maximize,[R] ml,
[R] mlexp,[R] nl,[R] nlsur,[R] reg3
Poirier, D. J., [R] biprobit
Poisson, S. D., [R] poisson
Pollock, P. H., III, [R] histogram
Ponce de Leon, A., [R] roccomp,[R] roctab
Porter, T. M., [R] correlate
Powers, D. A., [R] logistic postestimation,[R] logit,
[R] logit postestimation,[R] probit
Preacher, K. J., [R] esize,[R] regress postestimation
Preece, D. A., [R] ttest
Pregibon, D., [R] glm,[R] linktest,[R] logistic,
[R] logistic postestimation,[R] logit,[R] logit
postestimation
Press, W. H., [R] dydx,[R] vwls
Punj, G. N., [R] rologit
R
Rabe-Hesketh, S., [R] gllamm,[R] glm,
[R] heckoprobit,[R] heckprobit,[R] ivprobit,
[R] ivtobit,[R] logistic,[R] logit,[R] nbreg,
[R] ologit,[R] oprobit,[R] poisson,[R] probit
Raciborski, R., [R] poisson,[R] tpoisson
Raftery, A. E., [R] BIC note,[R] estat ic,[R] glm
Ramalheira, C., [R] ameans
Ramsahai, R. R., [R] ivregress
Ramsey, J. B., [R] regress postestimation
Ratkowsky, D. A., [R] nl,[R] pk,[R] pkcross
Redelmeier, D. A., [R] brier
Reeves, D., [R] meta
Reichenheim, M. E., [R] kappa,[R] roccomp,
[R] roctab
Reid, C., [R] ci
Reilly, M., [R] logistic
Relles, D. A., [R] rreg
Rencher, A. C., [R] anova postestimation
Revankar, N. S., [R] frontier
Richardson, W., [R] ttest
Riffenburgh, R. H., [R] ksmirnov,[R] kwallis
Riley, A. R., [R] net search
Ringquist, E. J., [R] meta
Rivers, D., [R] ivprobit
Roberson, P. K., [R] estat gof,[R] lroc
Robyn, D. L., [R] cumul
Rodgers, J. L., [R] correlate
Rodríguez, G., [R] nbreg,[R] poisson
Rogers, W. H., [R] brier,[R] glm,[R] heckman,
[R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] nbreg,
[R] poisson,[R] predictnl,[R] qreg,[R] regress,
[R] rocreg,[R] rreg,[R] sktest,[R] slogit,
[R] suest
Ronning, G., [R] clogit
Rose, J. M., [R] nlogit
Rosenthal, R., [R] contrast
Rosnow, R. L., [R] contrast
Ross, G. J. S., [R] nl
Rossi, P. E., [R] sureg
Rothman, K. J., [R] ci,[R] dstdize,[R] glogit,
[R] poisson
Rothstein, H. R., [R] meta
Rousseeuw, P. J., [R] qreg,[R] regress postestimation,
[R] rreg
Rovine, M. J., [R] correlate
Royston, P., [R] bootstrap,[R] centile,[R] cusum,
[R] diagnostic plots,[R] dotplot,[R] dydx,
[R] estat ic,[R] fp,[R] fp postestimation,
[R] glm,[R] kdensity,[R] lnskew0,[R] lowess,
[R] marginsplot,[R] mfp,[R] ml,[R] nl,
[R] regress,[R] sktest,[R] smooth,[R] swilk
Rubin, D. B., [R] contrast
Rubin, H., [R] ivregress postestimation
Rubinfeld, D. L., [R] biprobit,[R] heckprobit
Rudebusch, G. D., [R] ivregress postestimation
Ruppert, D., [R] boxcox,[R] rreg
Rutherford, E., [R] poisson
Rutherford, M. J., [R] poisson
Ruud, P. A., [R] gmm,[R] rologit,[R] suest
Ryan, T. P., [R] qc
S
Sajaia, Z., [R] biprobit,[R] heckprobit
Sakamoto, Y., [R] BIC note
Salgado-Ugarte, I. H., [R] kdensity,[R] lowess,
[R] smooth
Salim, A., [R] logistic
Sanders, F., [R] brier
Santos Silva, J. M. C., [R] gmm,[R] ivpoisson
Sargan, J. D., [R] ivregress postestimation
Sasieni, P. D., [R] dotplot,[R] glm,[R] lowess,
[R] nptrend,[R] poisson,[R] smooth
Sass, T. R., [R] areg
Satterthwaite, F. E., [R] esize,[R] ttest
Sauerbrei, W., [R] bootstrap,[R] estat ic,[R] fp,
[R] mfp
Savin, N. E., [R] regress postestimation time series
Saw, S. L. C., [R] qc
Sawa, T., [R] estat ic
Saxl, I., [R] correlate
Schaalje, G. B., [R] anova postestimation
Schaffer, M. E., [R] ivregress,[R] ivregress
postestimation
Scheffé, H., [R] anova,[R] oneway
Schlesselman, J. J., [R] boxcox
Schlossmacher, E. J., [R] qreg
Schmidt, C. H., [R] brier
Schmidt, P., [R] frontier,[R] regress postestimation
Schneider, H., [R] sdtest
Schnell, D., [R] regress
Schonlau, M., [R] glm,[R] logistic,[R] logit,
[R] poisson,[R] regress
Schuirmann, D. J., [R] pkequiv
Schwarz, G., [R] BIC note,[R] estat ic
Scott, D. W., [R] kdensity
Scott, E. L., [R] intro
Scott, G. B., [R] exlogistic
Scotto, M. G., [R] diagnostic plots
Searle, S. R., [R] contrast,[R] margins,
[R] pwcompare,[R] pwmean
Seed, P. T., [R] ci,[R] correlate,[R] roccomp,
[R] roctab,[R] sdtest,[R] spearman
Seidler, J., [R] correlate
Selvin, S., [R] poisson
Sempos, C. T., [R] dstdize
Semykina, A., [R] inequality,[R] qreg
Seneta, E., [R] correlate
Senn, S. J., [R] glm,[R] ttest
Shapiro, S. S., [R] swilk
Shea, J. S., [R] ivregress postestimation
Sheather, S. J., [R] boxcox,[R] lowess,[R] lpoly,
[R] qreg,[R] regress postestimation,[R] regress
postestimation diagnostic plots,[R] stepwise
Sheehan, N. A., [R] ivregress
Sheldon, T. A., [R] meta
Shewhart, W. A., [R] qc
Shiboski, S. C., [R] logistic
Shiller, R. J., [R] tobit
Shimizu, M., [R] kdensity,[R] lowess
Shrout, P. E., [R] icc,[R] kappa
Šidák, Z., [R] correlate,[R] oneway
Silverman, B. W., [R] kdensity,[R] qreg
Silvey, S. D., [R] ologit,[R] oprobit
Simonoff, J. S., [R] kdensity,[R] tnbreg,[R] tpoisson
Simor, I. S., [R] kappa
Singleton, K. J., [R] gmm
Sininger, Y., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Sitgreaves, R., [R] icc
Sjölander, P. C., [R] glm,[R] logit
Skrondal, A., [R] gllamm,[R] glm
Smeeton, N. C., [R] ranksum,[R] signrank
Smirnov, N. V., [R] ksmirnov
Smith, C. A. B., [R] ranksum
Smith, H., [R] eivreg,[R] oneway,[R] regress,
[R] stepwise
Smith, J. M., [R] fp
Smith, M. L., [R] esize
Smith, R. J., [R] ivprobit
Smithson, M., [R] esize,[R] regress postestimation
Snedecor, G. W., [R] ameans,[R] anova,[R] correlate,
[R] oneway,[R] ranksum,[R] signrank
Snell, E. J., [R] exlogistic,[R] expoisson
Song, F., [R] meta
Soon, T. W., [R] qc
Spearman, C. E., [R] icc,[R] spearman
Speed, F. M., [R] margins
Speed, T., [R] diagnostic plots
Spiegelhalter, D. J., [R] brier
Spieldman, R. S., [R] symmetry
Spitzer, J. J., [R] boxcox
Sprent, P., [R] ranksum,[R] signrank
Sribney, W. M., [R] orthog,[R] ranksum,
[R] signrank,[R] stepwise,[R] test
Staelin, R., [R] rologit
Staiger, D. O., [R] ivregress postestimation
Starmer, C. F., [R] vwls
Startz, R., [R] ivregress postestimation
Stegun, I. A., [R] contrast,[R] orthog
Steichen, T. J., [R] kappa,[R] kdensity,[R] sunflower
Steiger, J. H., [R] esize
Steiger, W., [R] qreg
Stein, C., [R] bootstrap
Stephenson, D. B., [R] brier
Stepniewska, K. A., [R] nptrend
Sterne, J. A. C., [R] dstdize,[R] meta,[R] summarize
Stevenson, R. E., [R] frontier
Stewart, M. B., [R] intreg,[R] oprobit,[R] tobit
Stigler, S. M., [R] ameans,[R] ci,[R] correlate,
[R] kwallis,[R] qreg,[R] regress,
[R] summarize
Stillman, S., [R] ivregress,[R] ivregress postestimation
Stine, R., [R] bootstrap
Stock, J. H., [R] areg postestimation,[R] ivregress,
[R] ivregress postestimation
Stoto, M. A., [R] lv
Stover, L., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Street, J. O., [R] rreg
Stryhn, H., [R] regress
Stuart, A., [R] centile,[R] mean,[R] proportion,
[R] qreg,[R] ratio,[R] summarize,
[R] symmetry,[R] total
Student, see Gosset, W. S.
Stuetzle, W., [R] sunflower
Sturdivant, R. X., [R] clogit,[R] clogit postestimation,
[R] estat classification,[R] estat gof,
[R] glm,[R] glogit,[R] lincom,[R] logistic,
[R] logistic postestimation,[R] logit,[R] logit
postestimation,[R] lroc,[R] lrtest,[R] lsens,
[R] mlogit,[R] predictnl,[R] stepwise
Suárez, C., [R] heckoprobit,[R] heckprobit
Suen, H. K., [R] icc
Sullivan, G., [R] regress
Sutton, A. J., [R] meta
Swed, F. S., [R] runtest
Sweetman, O., [R] gmm,[R] inequality
Swets, J. A., [R] lroc
Szroeter, J., [R] regress postestimation
T
Taka, M. T., [R] pkcross
Tamhane, A. C., [R] oneway
Taniuchi, T., [R] kdensity
Tanner, W. P., Jr., [R] lroc
Tanur, J. M., [R] kwallis
Tapia, R. A., [R] kdensity
Tarlov, A. R., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Tauchmann, H., [R] frontier
Taylor, C., [R] gllamm,[R] glm
Teukolsky, S. A., [R] dydx,[R] vwls
Theil, H., [R] ivregress,[R] reg3
Thiele, T. N., [R] summarize
Thompson, B., [R] esize,[R] regress postestimation
Thompson, J. C., [R] diagnostic plots
Thompson, J. R., [R] kdensity,[R] poisson
Thompson, M. L., [R] rocreg
Thorndike, F., [R] poisson
Thurstone, L. L., [R] rologit
Tibshirani, R. J., [R] bootstrap,[R] qreg
Tidmarsh, C. E., [R] fp
Tilford, J. M., [R] estat gof,[R] lroc
Tobías, A., [R] lrtest,[R] poisson,[R] roccomp,
[R] roctab,[R] sdtest
Tobin, J., [R] tobit
Toman, R. J., [R] stepwise
Tong, H., [R] estat ic
Toplis, P. J., [R] binreg
Tosetto, A., [R] logistic,[R] logit
Train, K. E., [R] asmprobit
Trapido, E., [R] exlogistic
Treiman, D. J., [R] eivreg,[R] mlogit
Trivedi, P. K., [R] asclogit,[R] asmprobit,
[R] bootstrap,[R] gmm,[R] heckman,
[R] heckoprobit,[R] intreg,[R] ivpoisson,
[R] ivregress,[R] ivregress postestimation,
[R] logit,[R] mprobit,[R] nbreg,[R] ologit,
[R] oprobit,[R] poisson,[R] probit,[R] qreg,
[R] regress,[R] regress postestimation,
[R] simulate,[R] sureg,[R] tnbreg,[R] tobit,
[R] tpoisson,[R] zinb postestimation,[R] zip
postestimation
Tsiatis, A. A., [R] exlogistic
Tufte, E. R., [R] stem
Tukey, J. W., [R] jackknife,[R] ladder,[R] linktest,
[R] lv,[R] regress,[R] regress postestimation
diagnostic plots,[R] rreg,[R] smooth,
[R] spikeplot,[R] stem
Tukey, P. A., [R] diagnostic plots,[R] lowess
Tyler, J. H., [R] regress
U
Uebersax, J. S., [R] tetrachoric
Uhlendorff, A., [R] asmprobit,[R] mlogit,[R] mprobit
University Group Diabetes Program, [R] glogit
Utts, J. M., [R] ci
V
Valman, H. B., [R] fp
van Belle, G., [R] anova,[R] dstdize,[R] oneway
Van de Ven, W. P. M. M., [R] biprobit,
[R] heckoprobit,[R] heckprobit
van den Broeck, J., [R] frontier
Van der Reyden, D., [R] ranksum
Van Kerm, P., [R] inequality,[R] kdensity
Van Loan, C. F., [R] orthog,[R] tetrachoric
Van Pragg, B. M. S., [R] biprobit,[R] heckoprobit,
[R] heckprobit
Velleman, P. F., [R] regress postestimation,[R] smooth
Venables, W., [R] esize
Verardi, V., [R] correlate,[R] fp,[R] ivregress,
[R] lpoly,[R] rreg
Vetterling, W. T., [R] dydx,[R] vwls
Vidmar, S., [R] ameans
Vittinghoff, E., [R] logistic
Vohr, B. R., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
von Bortkiewicz, L., [R] poisson
von Eye, A., [R] correlate
Von Storch, H., [R] brier
Vondráček, J., [R] correlate
Vuong, Q. H., [R] ivprobit,[R] zinb,[R] zip
W
Wacholder, S., [R] binreg
Wagner, H. M., [R] qreg
Wallis, W. A., [R] kwallis
Walters, S. J., [R] ci,[R] kappa,[R] tabulate twoway
Wand, M. P., [R] kdensity
Wang, D., [R] ci,[R] dstdize,[R] prtest
Wang, Q., [R] ivregress
Wang, Y., [R] asmprobit
Wang, Z., [R] logistic postestimation,[R] lrtest,
[R] stepwise
Ware, J. E., Jr., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Waterson, E. J., [R] binreg
Watson, G. S., [R] lpoly,[R] regress postestimation
time series
Watson, M. W., [R] areg postestimation,[R] ivregress
Weber, S., [R] correlate
Webster, A. D., [R] fp
Wedderburn, R. W. M., [R] glm
Weesie, J., [R] constraint,[R] hausman,
[R] ladder,[R] reg3,[R] regress,[R] regress
postestimation,[R] rologit,[R] simulate,
[R] suest,[R] sureg,[R] tabstat,[R] tabulate
twoway,[R] test,[R] tetrachoric
Weisberg, H. F., [R] summarize
Weisberg, S., [R] boxcox,[R] regress,[R] regress
postestimation
Weiss, M., [R] estimates table
Weisstein, E. W., [R] rocreg postestimation
Welch, B. L., [R] esize,[R] ttest
Wellington, J. F., [R] qreg
Wells, K. B., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Welsch, R. E., [R] regress postestimation,[R] regress
postestimation diagnostic plots
Welsh, A. H., [R] bootstrap
West, K. D., [R] glm,[R] gmm,[R] ivregress
West, S. G., [R] pcorr
Westlake, W. J., [R] pkequiv
White, H. L., Jr., [R] regress,[R] regress
postestimation,[R] rocreg,[R] suest
White, I. R., [R] meta,[R] simulate
White, K. J., [R] boxcox,[R] regress postestimation
time series
Whitehouse, E., [R] inequality
Whitfield, J. W., [R] ranksum
Whiting, P., [R] roccomp,[R] roctab
Whitney, D. R., [R] kwallis,[R] ranksum
Widen, J. E., [R] rocreg,[R] rocreg postestimation,
[R] rocregplot
Wieand, S., [R] rocreg,[R] rocreg postestimation
Wiggins, V. L., [R] regress postestimation,[R] regress
postestimation time series
Wilcox, D. W., [R] ivregress postestimation
Wilcoxon, F., [R] kwallis,[R] ranksum,[R] signrank
Wilde, J., [R] gmm
Wilk, M. B., [R] cumul,[R] diagnostic plots,[R] swilk
Wilks, D. S., [R] brier
Williams, R., [R] glm,[R] margins,[R] marginsplot,
[R] ologit,[R] oprobit,[R] pcorr,[R] stepwise
Wilson, E. B., [R] ci
Wilson, S. R., [R] bootstrap
Windmeijer, F., [R] gmm,[R] ivpoisson
Winer, B. J., [R] anova,[R] contrast,[R] loneway,
[R] oneway,[R] pwcompare
Wolfe, F., [R] correlate,[R] spearman
Wolfe, R., [R] ologit,[R] oprobit,[R] tabulate twoway
Wolfson, C., [R] kappa
Wolpin, K. I., [R] asmprobit
Wong, S. P., [R] icc
Wood, F. S., [R] diagnostic plots
Woodard, D. E., [R] contrast
Wooldridge, J. M., [R] areg postestimation,[R] gmm,
[R] heckoprobit,[R] intreg,[R] ivpoisson,
[R] ivprobit,[R] ivregress,[R] ivregress
postestimation,[R] ivtobit,[R] margins,
[R] margins, contrast,[R] qreg,[R] regress,
[R] regress postestimation,[R] regress
postestimation time series,[R] tobit
Working, H., [R] roccomp,[R] rocfit,[R] roctab
Wright, J. H., [R] ivregress,[R] ivregress
postestimation
Wright, J. T., [R] binreg
Wright, P. G., [R] ivregress
Wu, C. F. J., [R] qreg
Wu, D.-M., [R] ivregress postestimation
Wu, N., [R] ivregress
X
Xie, Y., [R] logit,[R] probit
Xu, J., [R] cloglog,[R] logistic,[R] logit,[R] mlogit,
[R] ologit,[R] oprobit,[R] probit
Y
Yang, Z., [R] poisson
Yates, J. F., [R] brier
Yee, T. W., [R] slogit
Yellott, J. I., Jr., [R] rologit
Yogo, M., [R] ivregress,[R] ivregress postestimation
Yoshioka, H., [R] logistic postestimation,[R] logit
postestimation
Yun, M.-S., [R] logistic postestimation,[R] logit
postestimation
Z
Zabell, S. L., [R] kwallis
Zamora, M., [R] heckoprobit,[R] heckprobit
Zavoina, W., [R] ologit
Zelen, M., [R] ttest
Zellner, A., [R] frontier,[R] nlsur,[R] reg3,[R] sureg
Zelterman, D., [R] tabulate twoway
Zheng, X., [R] gllamm
Zimmerman, F., [R] regress
Zubkoff, M., [R] lincom,[R] mlogit,[R] mprobit,
[R] mprobit postestimation,[R] predictnl,
[R] slogit
Zucchini, W., [R] rocreg
Zwiers, F. W., [R] brier
Subject index
This is the subject index for the Base Reference Manual.
Readers may also want to consult the combined subject
index (and the combined author index) in the Glossary
and Index.
A
about command, [R] about
absorption in regression, [R] areg
acprplot command, [R] regress postestimation
diagnostic plots
added-variable plots, [R] regress postestimation
diagnostic plots
adjusted
margins, [R] margins,[R] marginsplot
means, [R] contrast,[R] margins,[R] marginsplot
partial residual plot, [R] regress postestimation
diagnostic plots
ado command, [R] net
ado describe command, [R] net
ado dir command, [R] net
ado uninstall command, [R] net
ado,view subcommand, [R] view
ado d,view subcommand, [R] view
ado-files,
editing, [R] doedit
installing, [R] net,[R] sj,[R] ssc
location of, [R] which
official, [R] update
searching for, [R] search,[R] ssc
updating user-written, [R] adoupdate
adosize,set subcommand, [R] set
adoupdate command, [R] adoupdate
agreement, interrater, [R] kappa
AIC, see Akaike information criterion
Akaike information criterion, [R] BIC note,[R] estat,
[R] estat ic,[R] estimates stats,[R] glm,
[R] lrtest
all,update subcommand, [R] update
alternative-specific
conditional logit (McFadden’s choice) model,
[R] asclogit
multinomial probit regression, [R] asmprobit
rank-ordered probit regression, [R] asroprobit
alternatives,estat subcommand, [R] asclogit
postestimation,[R] asmprobit postestimation,
[R] asroprobit postestimation,[R] nlogit
postestimation
ameans command, [R] ameans
analysis of covariance, [R] anova
analysis of variance, [R] anova,[R] contrast,[R] icc,
[R] loneway,[R] oneway
Kruskal–Wallis, [R] kwallis
plots, [R] marginsplot
repeated measures, [R] anova
analysis-of-variance test of normality, [R] swilk
ANCOVA, see analysis of covariance
ANOVA, see analysis of variance
anova command, [R] anova,[R] anova postestimation
ARCH effects, testing for, [R] regress postestimation
time series
archlm,estat subcommand, [R] regress
postestimation time series
area under the curve, [R] lroc,also see pharmacokinetic
data,also see receiver operating characteristic
analysis
areg command, [R] areg,[R] areg postestimation
asclogit command, [R] asclogit,[R] asclogit
postestimation
asmprobit command, [R] asmprobit,[R] asmprobit
postestimation
asroprobit command, [R] asroprobit,[R] asroprobit
postestimation
association test, [R] correlate,[R] spearman,
[R] tabulate twoway,[R] tetrachoric
association, measures of, [R] tabulate twoway
asymmetry, see skewness
AUC, also see area under the curve
augmented
component-plus-residual plot, [R] regress
postestimation diagnostic plots
partial residual plot, [R] regress postestimation
diagnostic plots
autocorrelation, [R] regress postestimation time series,
also see HAC variance estimate
autoregressive conditional heteroskedasticity test,
[R] regress postestimation time series
autotabgraphs,set subcommand, [R] set
average
marginal effects, [R] margins,[R] marginsplot
partial effects (APEs), [R] margins,[R] marginsplot
predictions, [R] margins,[R] marginsplot
averages, see means
avplot and avplots commands, [R] regress
postestimation diagnostic plots
B
backed up message, [R] maximize
Bartlett’s test for equal variances, [R] oneway
base,fvset subcommand, [R] fvset
Bayesian information criterion, [R] BIC note,[R] estat,
[R] estat ic,[R] estimates stats,[R] glm,
[R] lrtest
bcskew0 command, [R] lnskew0
Berndt–Hall–Hall–Hausman algorithm, [R] ml
beta coefficients, [R] regress
BFGS algorithm, see Broyden–Fletcher–Goldfarb–
Shanno algorithm
bgodfrey,estat subcommand, [R] regress
postestimation time series
BHHH algorithm, see Berndt–Hall–Hall–Hausman
algorithm
bias corrected and accelerated, [R] bootstrap
postestimation,[R] bstat
BIC, see Bayesian information criterion
Bickenböller test statistic, [R] symmetry
binary outcome model, see outcomes, binary
binomial
distribution, confidence intervals, [R] ci
family regression, [R] binreg
probability test, [R] bitest
binreg command, [R] binreg,[R] binreg
postestimation
bioequivalence test, [R] pk,[R] pkequiv
biopharmaceutical data, see pharmacokinetic data
biprobit command, [R] biprobit,[R] biprobit
postestimation
bitest and bitesti commands, [R] bitest
bivariate probit regression, [R] biprobit
biweight kernel function, [R] kdensity,[R] lpoly,
[R] qreg
biweight regression estimates, [R] rreg
blogit command, [R] glogit,[R] glogit postestimation
Bonferroni’s multiple-comparison adjustment, see
multiple comparisons, Bonferroni’s method
bootstrap
sampling and estimation, [R] bootstrap,
[R] bsample,[R] bstat,[R] qreg,[R] rocreg,
[R] simulate
standard errors, [R] vce option
bootstrap prefix command, [R] bootstrap,
[R] bootstrap postestimation
bootstrap,estat subcommand, [R] bootstrap
postestimation
Boston College archive, see Statistical Software
Components archive
Box–Cox
power transformations, [R] lnskew0
regression, [R] boxcox
boxcox command, [R] boxcox,[R] boxcox
postestimation
Box’s conservative epsilon, [R] anova
bprobit command, [R] glogit,[R] glogit
postestimation
Breusch–Godfrey test, [R] regress postestimation time
series
Breusch–Pagan test, [R] sureg
Breusch–Pagan/Cook–Weisberg test for
heteroskedasticity, [R] regress postestimation
brier command, [R] brier
Brier score decomposition, [R] brier
browse,view subcommand, [R] view
Broyden–Fletcher–Goldfarb–Shanno algorithm, [R] ml
bsample command, [R] bsample
bsqreg command, [R] qreg,[R] qreg postestimation
bstat command, [R] bstat
C
c(cformat) c-class value, [R] set cformat
c(pformat) c-class value, [R] set cformat
c(seed) c-class value, [R] set emptycells,[R] set seed
c(sformat) c-class value, [R] set cformat
c(showbaselevels) c-class value, [R] set
showbaselevels
c(showemptycells) c-class value, [R] set
showbaselevels
c(showomitted) c-class value, [R] set showbaselevels
calculator, [R] display
carryover effects, [R] pk,[R] pkcross,[R] pkshape
case–control data, [R] clogit,[R] logistic,[R] rocreg,
[R] symmetry
categorical, also see factor variables
contrasts after anova, [R] contrast
covariates, [R] anova
data, agreement, measures for, [R] kappa
graphs, [R] grmeanby,[R] spikeplot
outcomes, see outcomes, categorical,also see
outcomes, binary,also see outcomes, ordinal
regression, also see outcomes subentry
absorbing one categorical variable, [R] areg
tabulations, [R] table,[R] tabstat,[R] tabulate
oneway,[R] tabulate twoway,[R] tabulate,
summarize()
variable creation, [R] tabulate oneway,[R] xi
cchart command, [R] qc
cd,net subcommand, [R] net
censored observations, [R] heckman,[R] heckoprobit,
[R] heckprobit,[R] intreg,[R] ivtobit,[R] tobit,
also see truncated observations
censored-normal regression, see interval regression
centile command, [R] centile
centiles, see percentiles, displaying
central tendency, measures of, see means,see medians
cformat,set subcommand, [R] set,[R] set cformat
charset,set subcommand, [R] set
check,ml subcommand, [R] ml
checksum,set subcommand, [R] set
chi-squared
hypothesis test, [R] hausman,[R] lrtest,[R] sdtest,
[R] tabulate twoway,[R] test,[R] testnl
probability plot, [R] diagnostic plots
quantile plot, [R] diagnostic plots
test for marginal homogeneity, [R] symmetry
test of independence, [R] tabulate twoway
choice models, [R] asclogit,[R] asmprobit,
[R] asroprobit,[R] clogit,[R] cloglog,
[R] exlogistic,[R] glm,[R] glogit,
[R] heckoprobit,[R] heckprobit,[R] hetprobit,
[R] ivprobit,[R] logistic,[R] logit,[R] mlogit,
[R] mprobit,[R] nlogit,[R] ologit,[R] oprobit,
[R] probit,[R] rologit,[R] scobit,[R] slogit,
[R] suest
Chow test, [R] anova,[R] contrast,[R] lrtest
ci and cii commands, [R] ci
classification
data, see receiver operating characteristic analysis
interrater agreement, [R] kappa
table, [R] estat classification
classification,estat subcommand, [R] estat
classification
clear,
estimates subcommand, [R] estimates store
fvset subcommand, [R] fvset
ml subcommand, [R] ml
clearing estimation results, [R] estimates store
clogit command, [R] bootstrap,[R] clogit,[R] clogit
postestimation,[R] exlogistic,[R] rologit
cloglog command, [R] cloglog,[R] cloglog
postestimation
close,
cmdlog subcommand, [R] log
log subcommand, [R] log
cls command, [R] cls
cluster estimator of variance, [R] vce option
alternative-specific
conditional logit model, [R] asclogit
multinomial probit regression, [R] asmprobit
rank-ordered probit regression, [R] asroprobit
complementary log-log regression, [R] cloglog
generalized linear models, [R] glm
for binomial family, [R] binreg
generalized method of moments, [R] gmm,
[R] ivpoisson
heckman selection model, [R] heckman
instrumental-variables regression, [R] ivregress
interval regression, [R] intreg
linear regression, [R] regress
constrained, [R] cnsreg
truncated, [R] truncreg
with dummy-variable set, [R] areg
logistic regression, [R] logistic,[R] logit,also see
logit regression subentry
conditional, [R] clogit
multinomial, [R] mlogit
ordered, [R] ologit
rank-ordered, [R] rologit
skewed, [R] scobit
stereotype, [R] slogit
logit regression, [R] logit,also see logistic regression
subentry
for grouped data, [R] glogit
nested, [R] nlogit
maximum likelihood estimation, [R] ml,[R] mlexp
multinomial
logistic regression, [R] mlogit
probit regression, [R] mprobit
negative binomial regression
truncated, [R] nbreg
zero-inflated, [R] zinb
nonlinear
least-squares estimation, [R] nl
systems of equations, [R] nlsur
Poisson regression, [R] poisson
truncated, [R] tpoisson
with endogenous regressors, [R] ivpoisson
zero-inflated, [R] zip
probit regression, [R] probit
bivariate, [R] biprobit
for grouped data, [R] glogit
heteroskedastic, [R] hetprobit
multinomial, [R] mprobit
ordered, [R] oprobit
ordered heckman selection model,
[R] heckoprobit
with endogenous regressors, [R] ivprobit
with sample selection, [R] heckprobit
summary statistics,
mean, [R] mean
proportion, [R] proportion
ratio, [R] ratio
total, [R] total
tobit model, [R] tobit
with endogenous regressors, [R] ivtobit
truncated
negative binomial regression, [R] tnbreg
Poisson regression, [R] tpoisson
regression, [R] truncreg
with endogenous regressors,
instrumental-variables regression, [R] ivregress
Poisson regression, [R] ivpoisson
probit model, [R] ivprobit
tobit model, [R] ivtobit
zero-inflated
negative binomial regression, [R] zinb
Poisson regression, [R] zip
cluster sampling, [R] bootstrap,[R] bsample,
[R] jackknife
cmdlog
close command, [R] log
command, [R] log
off command, [R] log
on command, [R] log
using command, [R] log
cnsreg command, [R] cnsreg,[R] cnsreg
postestimation
coefficient of variation, [R] tabstat
coefficients (from estimation),
cataloging, [R] estimates
linear combinations of, see linear combinations of
estimators
nonlinear combinations of, see nonlinear
combinations of estimators
testing equality of, [R] test,[R] testnl
coeftabresults,set subcommand, [R] set
collinearity,
display of omitted variables, [R] set showbaselevels
handling by regress,[R] regress
retaining collinear variables, [R] estimation options,
[R] orthog
variance inflation factors, [R] regress postestimation
command line, launching dialog box from, [R] db
commands, reviewing, [R] #review
comparative scatterplot, [R] dotplot
comparison test between nested models, [R] nestreg
complementary log-log regression, [R] cloglog,[R] glm
completely determined outcomes, [R] logit
component-plus-residual plot, [R] regress
postestimation diagnostic plots
conditional
logistic regression, [R] asclogit,[R] clogit,
[R] rologit,[R] slogit
marginal effects, [R] margins,[R] marginsplot
margins, [R] margins,[R] marginsplot
confidence interval
for bioequivalence, [R] pkequiv
for bootstrap statistics, [R] bootstrap
postestimation,[R] rocreg,[R] rocreg
postestimation
for combinations of coefficients,
linear, [R] lincom
nonlinear, [R] nlcom
for contrasts, [R] contrast
for counts, [R] ci
for false-positive rates, [R] rocregplot
for incidence-rate ratios, [R] expoisson,[R] glm,
[R] nbreg,[R] poisson,[R] tnbreg,[R] tpoisson,
[R] zinb,[R] zip
for intragroup correlations, [R] loneway
for margins, [R] margins
for means, [R] ci,[R] ameans,[R] esize,[R] mean,
[R] ttest
for medians and percentiles, [R] centile
for odds ratios, [R] exlogistic,[R] glm,[R] glogit,
[R] logistic,[R] logit,[R] ologit,[R] scobit
for proportions, [R] ci,[R] proportion
for ratios, [R] ratio
for relative-risk ratios, [R] mlogit
for ROC area, [R] roccomp,[R] rocfit,[R] rocreg,
[R] roctab
for ROC values, [R] rocregplot
for standardized mortality ratios, [R] dstdize
for totals, [R] total
confidence interval, set default, [R] level
confidence levels, [R] level
conjoint analysis, [R] rologit
conren,set subcommand, [R] set
console, controlling scrolling of output, [R] more
constrained estimation, [R] constraint,[R] estimation
options
alternative-specific
conditional logistic model, [R] asclogit
multinomial probit regression, [R] asmprobit
rank-ordered probit regression, [R] asroprobit
complementary log-log regression, [R] cloglog
generalized linear models, [R] glm
for binomial family, [R] binreg
generalized negative binomial regression, [R] nbreg
heckman selection model, [R] heckman,
[R] heckoprobit
interval regression, [R] intreg
linear regression, [R] cnsreg
seemingly unrelated, [R] sureg
stochastic frontier, [R] frontier
three-stage least squares, [R] reg3
truncated, [R] truncreg
logistic regression, [R] logistic,[R] logit,also see
logit regression subentry
conditional, [R] clogit
multinomial, [R] mlogit
ordered, [R] ologit
skewed, [R] scobit
stereotype, [R] slogit
logit regression, [R] logit,also see logistic regression
subentry
for grouped data, [R] glogit
nested, [R] nlogit
maximum likelihood estimation, [R] ml
multinomial
logistic regression, [R] mlogit
probit regression, [R] mprobit
negative binomial regression, [R] nbreg
truncated, [R] tnbreg
zero-inflated, [R] zinb
Poisson regression, [R] poisson
truncated, [R] tpoisson
zero-inflated, [R] zip
probit regression, [R] probit
bivariate, [R] biprobit
for grouped data, [R] glogit
heteroskedastic, [R] hetprobit
multinomial, [R] mprobit
ordered, [R] oprobit
with endogenous regressors, [R] ivprobit
with sample selection, [R] heckprobit
tobit model with endogenous regressors, [R] ivtobit
truncated
negative binomial regression, [R] tnbreg
Poisson regression, [R] tpoisson
regression, [R] truncreg
with endogenous regressors
probit regression, [R] ivprobit
tobit model, [R] ivtobit
zero-inflated
negative binomial regression, [R] zinb
Poisson regression, [R] zip
constraint
command, [R] constraint
define command, [R] constraint
dir command, [R] constraint
drop command, [R] constraint
free command, [R] constraint
get command, [R] constraint
list command, [R] constraint
contingency tables, [R] roctab,[R] symmetry,
[R] table,[R] tabulate twoway
contrast command, [R] anova postestimation,
[R] contrast,[R] contrast postestimation,
[R] margins, contrast
contrasts, [R] contrast,[R] margins, contrast,
[R] marginsplot
control charts, [R] qc
convergence criteria, [R] maximize
Cook–Weisberg test for heteroskedasticity, [R] regress
postestimation
Cook’s D,[R] glm postestimation,[R] regress
postestimation
copy,ssc subcommand, [R] ssc
copycolor,set subcommand, [R] set
copyright
Apache, [R] copyright apache
boost, [R] copyright boost
freetype, [R] copyright freetype
icu, [R] copyright icu
JagPDF, [R] copyright jagpdf
lapack, [R] copyright lapack
libpng, [R] copyright libpng
MiG Layout, [R] copyright miglayout
scintilla, [R] copyright scintilla
ttf2pt1, [R] copyright ttf2pt1
zlib, [R] copyright zlib
copyright command, [R] copyright
correlate command, [R] correlate
correlated errors, see robust, Huber/White/sandwich
estimator of variance,also see autocorrelation
correlation, [R] correlate
binary variables, [R] tetrachoric
continuous variables, [R] correlate
intraclass, [R] icc
intracluster, [R] loneway
Kendall’s rank, [R] spearman
matrices, [R] correlate,[R] estat,[R] estat vce
pairwise, [R] correlate
partial and semipartial, [R] pcorr
serial, [R] runtest
Spearman’s rank, [R] spearman
structure, [R] asmprobit,[R] asroprobit,[R] reg3
tetrachoric, [R] tetrachoric
correlation,estat subcommand, [R] asmprobit
postestimation,[R] asroprobit postestimation
cosine kernel function, [R] kdensity,[R] lpoly,
[R] qreg
cost frontier model, [R] frontier
count data,
confidence intervals for counts, [R] ci
estimation, [R] expoisson,[R] glm,[R] gmm,
[R] ivpoisson,[R] nbreg,[R] poisson,
[R] tnbreg,[R] tpoisson,[R] zinb,[R] zip
graphs, [R] histogram,[R] kdensity,[R] spikeplot
interrater agreement, [R] kappa
summary statistics of, [R] table,[R] tabstat,
[R] tabulate oneway,[R] tabulate twoway,
[R] tabulate, summarize()
symmetry and marginal homogeneity tests,
[R] symmetry
count,ml subcommand, [R] ml
covariance
matrix of estimators, [R] estat,[R] estat vce,
[R] estimates store
of variables or coefficients, [R] correlate
covariance, analysis of, [R] anova
covariance,estat subcommand, [R] asmprobit
postestimation,[R] asroprobit postestimation
covariate patterns, [R] logistic postestimation,[R] logit
postestimation,[R] probit postestimation
COVRATIO, [R] regress postestimation
cprplot command, [R] regress postestimation
diagnostic plots
Cramér's V,[R] tabulate twoway
crossover designs, [R] pk,[R] pkcross,[R] pkshape
cross-tabulations, see tables
cumul command, [R] cumul
cumulative distribution, empirical, [R] cumul
cumulative incidence data, [R] poisson
cusum command, [R] cusum
cusum test, [R] cusum
D
data,
autocorrelated, see autocorrelation
case–control, see case–control data
categorical, see categorical data, agreement,
measures for
experimental, see experimental data
matched case–control, see matched case–control data
observational, see observational data
range of, see range of data
ranking, see ranking data
sampling, see sampling
summarizing, see summarizing data
survival-time, see survival analysis
time-series, see time-series analysis
data manipulation, [R] fvrevar,[R] fvset
Davidon–Fletcher–Powell algorithm, [R] ml
db command, [R] db
default settings of system parameters, [R] query,
[R] set defaults
define,
constraint subcommand, [R] constraint
transmap subcommand, [R] translate
delta beta influence statistic, [R] clogit postestimation,
[R] logistic postestimation,[R] logit
postestimation
delta chi-squared influence statistic, [R] clogit
postestimation,[R] logistic postestimation,
[R] logit postestimation
delta deviance influence statistic, [R] clogit
postestimation,[R] logistic postestimation,
[R] logit postestimation
delta method, [R] margins,[R] nlcom,[R] predictnl,
[R] testnl
density estimation, kernel, [R] kdensity
density-distribution sunflower plot, [R] sunflower
derivatives, numeric, [R] dydx,[R] testnl
describe,
ado subcommand, [R] net
estimates subcommand, [R] estimates describe
net subcommand, [R] net
ssc subcommand, [R] ssc
descriptive statistics,
CIs for means, proportions, and counts, [R] ci
correlations, [R] correlate,[R] pcorr,
[R] tetrachoric
displaying, [R] grmeanby,[R] lv,[R] summarize
estimation, [R] mean,[R] proportion,[R] ratio,
[R] total
means, [R] ameans,[R] summarize
percentiles, [R] centile
pharmacokinetic data,
make dataset of, [R] pkcollapse
summarize, [R] pksumm
tables, [R] table,[R] tabstat,[R] tabulate oneway,
[R] tabulate twoway,[R] tabulate, summarize()
design,fvset subcommand, [R] fvset
design effects, [R] loneway
deviance residual, [R] binreg postestimation,[R] fp
postestimation,[R] glm postestimation,
[R] logistic postestimation,[R] logit
postestimation,[R] probit postestimation
DFBETA, [R] regress postestimation
dfbeta command, [R] regress postestimation
DFITS, [R] regress postestimation
DFP algorithm, [R] ml
diagnostic plots, [R] diagnostic plots,[R] logistic
postestimation,[R] regress postestimation
diagnostic plots
diagnostics, regression, see regression diagnostics
dialog box, [R] db
dichotomous outcome model, see outcomes, binary
difference of estimated coefficients, see linear
combinations of estimators
difficult option, [R] maximize
dir,
ado subcommand, [R] net
constraint subcommand, [R] constraint
estimates subcommand, [R] estimates store
direct standardization, [R] dstdize,[R] mean,
[R] proportion,[R] ratio
dispersion, measures of, see percentiles, displaying,see
range of data,see standard deviations, displaying,
see variance, displaying
display
settings, [R] set showbaselevels
width and length, [R] log
display command, as a calculator, [R] display
display,ml subcommand, [R] ml
displaying, also see printing, logs (output)
previously typed lines, [R] #review
stored results, [R] stored results
distributions,
examining, [R] ameans,[R] centile,[R] kdensity,
[R] mean,[R] pksumm,[R] summarize,
[R] total
income, [R] inequality
plots, [R] cumul,[R] cusum,[R] diagnostic plots,
[R] dotplot,[R] histogram,[R] kdensity,
[R] ladder,[R] lv,[R] spikeplot,[R] stem
standard population, [R] dstdize
testing equality of, [R] ksmirnov,[R] kwallis,
[R] ranksum,[R] signrank
testing for normality, [R] sktest,[R] swilk
transformations
to achieve normality, [R] boxcox,[R] ladder
to achieve zero skewness, [R] lnskew0
do command, [R] do
dockable,set subcommand, [R] set
dockingguides,set subcommand, [R] set
documentation, keyword search on, [R] search
doedit command, [R] doedit
do-files, [R] do
editing, [R] doedit
dose–response models, [R] binreg,[R] glm,[R] logistic
dotplot command, [R] dotplot
doublebuffer,set subcommand, [R] set
dp,set subcommand, [R] set
drop,
constraint subcommand, [R] constraint
estimates subcommand, [R] estimates store
dstdize command, [R] dstdize
dummy variables, see indicator variables
Duncan’s multiple-comparison adjustment, see multiple
comparisons, Duncan’s method
Dunnett’s multiple comparison adjustment, see multiple
comparisons, Dunnett’s method
duration analysis, see survival analysis
Durbin–Watson statistic, [R] regress postestimation
time series
durbinalt,estat subcommand, [R] regress
postestimation time series
Durbin’s alternative test, [R] regress postestimation
time series
dwatson,estat subcommand, [R] regress
postestimation time series
dydx command, [R] dydx
E
e() stored results, [R] stored results
e(sample), resetting, [R] estimates save
e-class command, [R] stored results
editing
ado-files and do-files, [R] doedit
files while in Stata, [R] doedit
efficiency,query subcommand, [R] query
eform option,[R] eform option
eivreg command, [R] eivreg,[R] eivreg
postestimation
empirical cumulative distribution function, [R] cumul
emptycells,set subcommand, [R] set,[R] set
emptycells
ending a Stata session, [R] exit
endless loop, see loop, endless
endogeneity test, [R] ivregress postestimation
endogenous covariates, [R] gmm,[R] ivpoisson,
[R] ivprobit,[R] ivregress,[R] ivtobit,[R] reg3
endogenous,estat subcommand, [R] ivregress
postestimation
Engle’s LM test, [R] regress postestimation time series
eolchar,set subcommand, [R] set
Epanechnikov kernel function, [R] kdensity,[R] lpoly,
[R] qreg
epidemiology and related,
Brier score decomposition, [R] brier
interrater agreement, [R] kappa
meta-analysis, [R] meta
pharmacokinetic data, see pharmacokinetic data
ROC analysis, see receiver operating characteristic
analysis
standardization, [R] dstdize
symmetry and marginal homogeneity tests,
[R] symmetry
tables, [R] tabulate twoway
equality test of
binomial proportions, [R] bitest
coefficients, [R] pwcompare,[R] sureg,[R] test,
[R] testnl
distributions, [R] ksmirnov,[R] kwallis,
[R] ranksum,[R] signrank
margins, [R] margins,[R] pwcompare
means, [R] contrast,[R] esize,[R] pwmean,
[R] ttest
medians, [R] ranksum
proportions, [R] bitest,[R] prtest
ROC areas, [R] roccomp,[R] rocreg
variances, [R] sdtest
equivalence test, [R] pk,[R] pkequiv
ereturn list command, [R] stored results
error messages and return codes, [R] error messages
searching, [R] search
error-bar charts, [R] serrbar
errors-in-variables regression, [R] eivreg
esample,estimates subcommand, [R] estimates save
esize and esizei commands, [R] esize
esize,estat subcommand, [R] regress
postestimation
estat
alternatives command, [R] asclogit
postestimation,[R] asmprobit postestimation,
[R] asroprobit postestimation,[R] nlogit
postestimation
archlm command, [R] regress postestimation time
series
bgodfrey command, [R] regress postestimation
time series
bootstrap command, [R] bootstrap postestimation
classification command, [R] estat classification
correlation command, [R] asmprobit
postestimation,[R] asroprobit postestimation
covariance command, [R] asmprobit
postestimation,[R] asroprobit postestimation
durbinalt command, [R] regress postestimation
time series
dwatson command, [R] regress postestimation time
series
endogenous command, [R] ivregress
postestimation
esize command, [R] regress postestimation
facweights command, [R] asmprobit
postestimation,[R] asroprobit postestimation
firststage command, [R] ivregress
postestimation
gof command, [R] estat gof,[R] poisson
postestimation
hettest command, [R] regress postestimation
ic command, [R] estat,[R] estat ic
imtest command, [R] regress postestimation
mfx command, [R] asclogit postestimation,
[R] asmprobit postestimation,[R] asroprobit
postestimation
nproc command, [R] rocreg postestimation
overid command, [R] gmm postestimation,
[R] ivpoisson postestimation,[R] ivregress
postestimation
ovtest command, [R] regress postestimation
predict command, [R] exlogistic postestimation
se command, [R] exlogistic postestimation,
[R] expoisson postestimation
summarize command, [R] estat,[R] estat
summarize
szroeter command, [R] regress postestimation
vce command, [R] estat,[R] estat vce
vif command, [R] regress postestimation
estimates
clear command, [R] estimates store
command, [R] suest
introduction, [R] estimates
describe command, [R] estimates describe
dir command, [R] estimates store
drop command, [R] estimates store
esample command, [R] estimates save
for command, [R] estimates for
notes command, [R] estimates notes
query command, [R] estimates store
replay command, [R] estimates replay
restore command, [R] estimates store
save command, [R] estimates save
stats command, [R] estimates stats
store command, [R] estimates store
table command, [R] estimates table
title command, [R] estimates title
use command, [R] estimates save
estimation
options, [R] estimation options
results,
clearing, [R] estimates store
storing and restoring, [R] estimates store
tables of, [R] estimates table
sample, summarizing, [R] estat,[R] estat
summarize
estimators,
covariance matrix of, [R] correlate,[R] estat,
[R] estat vce
linear combinations of, [R] lincom
nonlinear combinations of, [R] nlcom
event history analysis, see survival analysis
exact statistics,
binary confidence intervals, [R] ci,[R] exlogistic,
[R] roctab
centiles, [R] centile
indirect standardization, [R] dstdize
one-way anova, [R] loneway
regression, [R] exlogistic,[R] expoisson
test,
binomial probability, [R] bitest
equality of distributions, [R] ksmirnov
equality of medians, [R] ranksum
Fisher’s, [R] tabulate twoway
symmetry and marginal homogeneity,
[R] symmetry
tetrachoric correlations, [R] tetrachoric
exit command, [R] exit
exiting Stata, see exit command
exlogistic command, [R] exlogistic,[R] exlogistic
postestimation
exogeneity test, see endogeneity test
experimental data, [R] anova,[R] contrast,
[R] correlate,[R] kwallis,[R] logit,[R] mean,
[R] regress,[R] summarize,[R] tabulate
oneway,[R] tabulate twoway,[R] ttest
exploded logit model, [R] rologit
expoisson command, [R] expoisson,[R] expoisson
postestimation
exponentiated coefficients, [R] eform option
F
factor variables, [R] fvrevar,[R] fvset
factorial design, [R] anova
factor-variable settings, [R] fvset
facweights,estat subcommand, [R] asmprobit
postestimation,[R] asroprobit postestimation
failure-time model, see survival analysis
false-positive rate, [R] estat classification,[R] roc,
[R] rocreg,[R] rocreg postestimation,
[R] rocregplot
FAQs, search, [R] search
fastscroll,set subcommand, [R] set
feasible generalized least squares, [R] reg3,[R] sureg
feasible generalized nonlinear least squares, [R] nlsur
fences, [R] lv
FGLS, see feasible generalized least squares
FGNLS, see feasible generalized nonlinear least squares
files, downloading, [R] adoupdate,[R] net,[R] sj,
[R] ssc,[R] update
firststage,estat subcommand, [R] ivregress
postestimation
Fisher’s exact test, [R] tabulate twoway
fixed-effects model, [R] anova,[R] areg,[R] asclogit,
[R] clogit
flexible functional form, [R] boxcox,[R] fp,[R] mfp
floatwindows,set subcommand, [R] set
footnote,ml subcommand, [R] ml
for,estimates subcommand, [R] estimates for
forecast, standard error of, [R] regress postestimation
format settings, [R] set cformat
fp generate command, [R] fp
fp plot command, [R] fp postestimation
fp predict command, [R] fp postestimation
fp prefix command, [R] fp,[R] fp postestimation
fraction defective, [R] qc
fractional polynomial regression, [R] fp
multivariable, [R] mfp
free,constraint subcommand, [R] constraint
frequencies,
graphical representation, [R] histogram,
[R] kdensity
table of, [R] table,[R] tabstat,[R] tabulate
oneway,[R] tabulate twoway,[R] tabulate,
summarize()
from,
net subcommand, [R] net
update subcommand, [R] update
from() option, [R] maximize
frontier command, [R] frontier,[R] frontier
postestimation
frontier model, see stochastic frontier model
functions,
combinations of estimators, [R] lincom,[R] nlcom
cumulative distribution, [R] cumul
derivatives and integrals of, [R] dydx
estimable, [R] margins
evaluator program, [R] gmm,[R] nl,[R] nlsur
fractional polynomial, [R] fp,[R] mfp
index, [R] logistic postestimation,[R] logit
postestimation,[R] probit postestimation
kernel, [R] kdensity,[R] lpoly
link, [R] glm
maximizing likelihood, [R] maximize,[R] ml
obtaining help for, [R] help
orthogonalization, [R] orthog
parameters, [R] nlcom
piecewise cubic and piecewise linear, [R] mkspline
prediction, [R] predict,[R] predictnl
production and cost, [R] frontier
variance, [R] glm
fvlabel,set subcommand, [R] set,[R] set
showbaselevels
fvrevar command, [R] fvrevar
fvset
base command, [R] fvset
clear command, [R] fvset
design command, [R] fvset
report command, [R] fvset
fvwrap,set subcommand, [R] set,[R] set
showbaselevels
fvwrapon,set subcommand, [R] set,[R] set
showbaselevels
G
Gaussian kernel function, [R] kdensity,[R] lpoly,
[R] qreg
generalized
least squares,
feasible, see feasible generalized least squares
linear latent and mixed models, [R] gllamm
linear models, [R] binreg,[R] glm
method of moments, see gmm command
negative binomial regression, [R] nbreg
get,
constraint subcommand, [R] constraint
net subcommand, [R] net
gladder command, [R] ladder
GLLAMM, see generalized linear latent and mixed
models
gllamm command, [R] gllamm
GLM, see generalized linear models
glm command, [R] glm,[R] glm postestimation
glogit command, [R] glogit,[R] glogit postestimation
gmm command, [R] gmm,[R] gmm postestimation
gnbreg command, [R] nbreg,[R] nbreg
postestimation
gof,estat subcommand, [R] estat gof,[R] poisson
postestimation
Goodman and Kruskal’s gamma, [R] tabulate twoway
goodness of fit, [R] brier,[R] diagnostic plots,
[R] estat gof,[R] ksmirnov,[R] linktest,
[R] logistic postestimation,[R] lrtest,
[R] poisson postestimation,[R] regress
postestimation,also see deviance residual,also
see normal distribution and normality, test for
gprobit command, [R] glogit,[R] glogit
postestimation
gradient option, [R] maximize
graph,ml subcommand, [R] ml
graphics,
query subcommand, [R] query
set subcommand, [R] set
graphs,
added-variable plot, [R] regress postestimation
diagnostic plots
adjusted partial residual plot, [R] regress
postestimation diagnostic plots
augmented component-plus-residual plot, [R] regress
postestimation diagnostic plots
augmented partial residual plot, [R] regress
postestimation diagnostic plots
binary variable cumulative sum, [R] cusum
component-plus-residual, [R] regress postestimation
diagnostic plots
cumulative distribution, [R] cumul
density, [R] kdensity
density-distribution sunflower, [R] sunflower
derivatives, [R] dydx,[R] testnl
diagnostic, [R] diagnostic plots
dotplot, [R] dotplot
error-bar charts, [R] serrbar
fractional polynomial, [R] fp postestimation
histograms, [R] histogram,[R] kdensity
integrals, [R] dydx
interaction plots, [R] marginsplot
ladder-of-power histograms, [R] ladder
letter-value display, [R] lv
leverage-versus-(squared)-residual, [R] regress
postestimation diagnostic plots
logistic diagnostic, [R] logistic postestimation,
[R] lsens
lowess smoothing, [R] lowess
margins plots, [R] marginsplot
means and medians, [R] grmeanby
normal probability, [R] diagnostic plots
partial residual, [R] regress postestimation
diagnostic plots
partial-regression leverage, [R] regress
postestimation diagnostic plots
profile plots, [R] marginsplot
quality control, [R] qc
quantile, [R] diagnostic plots
quantile–normal, [R] diagnostic plots
quantile–quantile, [R] diagnostic plots
regression diagnostic, [R] regress postestimation
diagnostic plots
residual versus fitted, [R] regress postestimation
diagnostic plots
residual versus predictor, [R] regress postestimation
diagnostic plots
ROC curve, [R] lroc,[R] roccomp,[R] rocfit
postestimation,[R] rocregplot,[R] roctab
rootograms, [R] spikeplot
smoothing, [R] kdensity,[R] lowess,[R] lpoly
spike plot, [R] spikeplot
stem-and-leaf, [R] stem
sunflower, [R] sunflower
symmetry, [R] diagnostic plots
time-versus-concentration curve, [R] pk,
[R] pkexamine
Greenhouse–Geisser epsilon, [R] anova
grmeanby command, [R] grmeanby
grouped-data regression, [R] glogit, [R] intreg
H
HAC variance estimate, [R] binreg,[R] glm,[R] gmm,
[R] ivregress,[R] nl
Hansen’s J statistic, [R] gmm, [R] gmm
postestimation, [R] ivpoisson, [R] ivpoisson
postestimation, [R] ivregress
harmonic mean, [R] ameans
hat matrix, see projection matrix, diagonal elements of
hausman command, [R] hausman
Hausman specification test, [R] hausman
haverdir,set subcommand, [R] set
hazard ratio, [R] eform option,[R] lincom
health ratio, [R] binreg
heckman command, [R] heckman,[R] heckman
postestimation
Heckman selection model, [R] heckman,
[R] heckoprobit,[R] heckprobit
heckoprobit command, [R] heckoprobit,
[R] heckoprobit postestimation
heckprobit command, [R] heckprobit,
[R] heckprobit postestimation
Helmert contrasts, [R] contrast
help command, [R] help
help,view subcommand, [R] view
help d,view subcommand, [R] view
hessian option, [R] maximize
heteroskedastic probit regression, [R] hetprobit
heteroskedasticity, also see HAC variance estimate
conditional, [R] regress postestimation time series
robust variances, see robust, Huber/White/sandwich
estimator of variance
test, [R] hetprobit,[R] regress postestimation,
[R] regress postestimation time series
heteroskedasticity test, [R] sdtest
hetprobit command, [R] hetprobit,[R] hetprobit
postestimation
hettest,estat subcommand, [R] regress
postestimation
hierarchical
regression, [R] nestreg,[R] stepwise
samples, [R] anova,[R] gllamm,[R] loneway,
[R] areg
histogram command, [R] histogram
histograms, [R] histogram
dotplots, [R] dotplot
kernel density estimator, [R] kdensity
ladder-of-powers, [R] ladder
of categorical variables, [R] histogram
rootograms, [R] spikeplot
stem-and-leaf, [R] stem
Holm’s multiple-comparison adjustment, see multiple
comparisons, Holm’s method
homogeneity of variances, [R] oneway,[R] sdtest
homoskedasticity tests, [R] regress postestimation
Hosmer–Lemeshow
delta chi-squared influence statistic, see delta chi-
squared influence statistic
delta deviance influence statistic, see delta deviance
influence statistic
goodness-of-fit test, [R] estat gof
hot,ssc subcommand, [R] ssc
httpproxy,set subcommand, [R] netio,[R] set
httpproxyauth,set subcommand, [R] netio,[R] set
httpproxyhost,set subcommand, [R] netio,[R] set
httpproxyport,set subcommand, [R] netio,[R] set
httpproxypw,set subcommand, [R] netio,[R] set
httpproxyuser,set subcommand, [R] netio,[R] set
Huber weighting, [R] rreg
Huber/White/sandwich estimator of variance, see robust,
Huber/White/sandwich estimator of variance
Huynh–Feldt epsilon, [R] anova
hypertext help, [R] help
I
ic,estat subcommand, [R] estat,[R] estat ic
icc command, [R] icc
IIA, see independence of irrelevant alternatives
immediate commands, [R] bitest,[R] ci,[R] esize,
[R] prtest,[R] sdtest,[R] symmetry,
[R] tabulate twoway,[R] ttest
imtest,estat subcommand, [R] regress
postestimation
incidence rate,
negative binomial regression, [R] nbreg
postestimation,[R] tnbreg postestimation,
[R] zinb postestimation
Poisson regression, [R] poisson postestimation,
[R] tpoisson postestimation,[R] zip
postestimation
incidence-rate ratio, [R] eform option
estimation,
negative binomial regression, [R] nbreg,
[R] tnbreg,[R] zinb
Poisson regression, [R] expoisson,[R] ivpoisson,
[R] poisson,[R] tpoisson,[R] zip
postestimation, [R] contrast,[R] expoisson
postestimation,[R] lincom
include bitmap,set subcommand, [R] set
income distributions, [R] inequality
independence of irrelevant alternatives,
assumption, [R] clogit,[R] mlogit
relaxing assumption, [R] asclogit,[R] asmprobit,
[R] asroprobit,[R] nlogit
test for, [R] hausman,[R] nlogit,[R] suest
independence test, [R] correlate,[R] spearman,
[R] tabulate twoway
index of probit and logit, [R] logit postestimation,
[R] predict,[R] probit postestimation
index search, [R] search
indicator variables, [R] tabulate oneway,[R] xi,also
see factor variables
indirect standardization, [R] dstdize
inequality measures, [R] inequality
influence statistics, see delta beta influence statistic,
see delta chi-squared influence statistic,see delta
deviance influence statistic,see DFBETA
information
criteria, see Akaike information criterion,see
Bayesian information criterion
matrix, [R] correlate,[R] maximize
matrix test, [R] regress postestimation
init,ml subcommand, [R] ml
inner fence, [R] lv
install,
net subcommand, [R] net
ssc subcommand, [R] ssc
installation
of official updates, [R] update
of SJ and STB, [R] net,[R] sj
of user-written commands (updating), [R] adoupdate
instrumental-variables regression, [R] gmm,
[R] ivpoisson,[R] ivprobit,[R] ivregress,
[R] ivtobit,[R] reg3
integ command, [R] dydx
integrals, numeric, [R] dydx
interaction, [R] anova,[R] contrast,[R] fvrevar,
[R] margins,[R] margins, contrast,
[R] margins, pwcompare,[R] marginsplot,
[R] pwcompare,[R] set emptycells,[R] xi
interaction expansion, [R] xi
interaction plots, [R] marginsplot
interface,query subcommand, [R] query
Internet,
commands to control connections to, [R] netio
installation of updates from, [R] adoupdate,[R] net,
[R] sj,[R] update
search, [R] net search
interquantile range, [R] qreg
interquartile range, [R] lv,[R] table,[R] tabstat
interrater agreement, [R] kappa
interval regression, [R] intreg
intraclass correlation, see correlation, intraclass
intracluster correlation, see correlation, intracluster
intreg command, [R] intreg,[R] intreg
postestimation
IQR, see interquartile range
iqreg command, [R] qreg,[R] qreg postestimation
IRLS, see iterated, reweighted least squares
istdize command, [R] dstdize
iterate() option, [R] maximize
iterated, reweighted least squares, [R] binreg,[R] glm,
[R] reg3,[R] sureg
iterations, controlling the maximum number,
[R] maximize
ivpoisson command, [R] ivpoisson,[R] ivpoisson
postestimation
ivprobit command, [R] ivprobit,[R] ivprobit
postestimation
ivregress command, [R] ivregress,[R] ivregress
postestimation
ivtobit command, [R] ivtobit,[R] ivtobit
postestimation
J
jackknife
estimation, [R] jackknife
standard errors, [R] vce option
jackknife prefix command, [R] jackknife,
[R] jackknife postestimation
jackknifed residuals, [R] regress postestimation
K
kap command, [R] kappa
kappa command, [R] kappa
kapwgt command, [R] kappa
kdensity command, [R] kdensity
Kendall’s tau, [R] spearman,[R] tabulate twoway
kernel density estimator, [R] kdensity
kernel-weighted local polynomial estimator, [R] lpoly
Kish design effects, [R] loneway
Kolmogorov–Smirnov test, [R] ksmirnov
Kruskal–Wallis test, [R] kwallis
ksmirnov command, [R] ksmirnov
ktau command, [R] spearman
kurtosis, [R] lv,[R] pksumm,[R] regress
postestimation,[R] sktest,[R] summarize,
[R] tabstat
kwallis command, [R] kwallis
L
L1-norm models, [R] qreg
LAD regression, [R] qreg
ladder command, [R] ladder
ladder of powers, [R] ladder
Lagrange multiplier test, [R] regress postestimation
time series
Latin-square designs, [R] anova,[R] pkshape
LAV regression, [R] qreg
least absolute
deviations, [R] qreg
residuals, [R] qreg
value regression, [R] qreg
least squared deviations, see linear regression
least squares, see linear regression
generalized, see feasible generalized least squares
least-squares means, [R] margins,[R] marginsplot
letter values, [R] lv
level,set subcommand, [R] level,[R] set
Levene’s robust test statistic, [R] sdtest
leverage, [R] logistic postestimation,[R] regress
postestimation diagnostic plots
leverage-versus-(squared)-residual plot, [R] regress
postestimation diagnostic plots
license, [R] about
likelihood, see maximum likelihood estimation
likelihood-ratio
chi-squared of association, [R] tabulate twoway
test, [R] lrtest
limited dependent variables, [R] asclogit,
[R] asmprobit,[R] asroprobit,[R] binreg,
[R] biprobit,[R] brier,[R] clogit,[R] cloglog,
[R] cusum,[R] exlogistic,[R] expoisson,
[R] glm,[R] glogit,[R] heckoprobit,
[R] heckprobit,[R] hetprobit,[R] ivpoisson,
[R] ivprobit,[R] logistic,[R] logit,[R] mlogit,
[R] mprobit,[R] nbreg,[R] nlogit,[R] ologit,
[R] oprobit,[R] poisson,[R] probit,[R] rocfit,
[R] rocreg,[R] rologit,[R] scobit,[R] slogit,
[R] tnbreg,[R] tpoisson,[R] zinb,[R] zip
limits, [R] limits,[R] matsize
lincom command, [R] lincom
linear
combinations of estimators, [R] lincom
hypothesis test after estimation, [R] contrast,
[R] lrtest,[R] margins,[R] margins, contrast,
[R] margins, pwcompare,[R] pwcompare,
[R] test
regression, [R] anova,[R] areg,[R] binreg,
[R] cnsreg,[R] eivreg,[R] frontier,
[R] glm,[R] gmm,[R] heckman,[R] intreg,
[R] ivregress,[R] ivtobit,[R] qreg,[R] reg3,
[R] regress,[R] rreg,[R] sureg,[R] tobit,
[R] vwls
splines, [R] mkspline
linegap,set subcommand, [R] set
linesize,set subcommand, [R] log,[R] set
link function, [R] glm
link,net subcommand, [R] net
linktest command, [R] linktest
list,
constraint subcommand, [R] constraint
ereturn subcommand, [R] stored results
return subcommand, [R] stored results
sreturn subcommand, [R] stored results
lnskew0 command, [R] lnskew0
local linear, [R] lpoly
local polynomial, [R] lpoly
locally weighted smoothing, [R] lowess
location, measures of, [R] lv,[R] summarize,[R] table
locksplitters,set subcommand, [R] set
log
close command, [R] log
command, [R] log,[R] view
off command, [R] log
on command, [R] log
query command, [R] log
using command, [R] log
log files, printing, [R] translate,also see log command
log or nolog option, [R] maximize
log transformations, [R] boxcox,[R] lnskew0
logistic and logit regression, [R] logistic,[R] logit
complementary log-log, [R] cloglog
conditional, [R] asclogit,[R] clogit,[R] rologit
exact, [R] exlogistic
fixed-effects, [R] asclogit,[R] clogit
fractional polynomial, [R] fp
generalized linear model, [R] glm
multinomial, [R] asclogit,[R] clogit,[R] mlogit
nested, [R] nlogit
ordered, [R] ologit
polytomous, [R] mlogit
rank-ordered, [R] rologit
skewed, [R] scobit
stereotype, [R] slogit
with grouped data, [R] glogit
logistic command, [R] logistic,[R] logistic
postestimation
logit command, [R] logit,[R] logit postestimation
logit regression, see logistic and logit regression
log-linear model, [R] expoisson,[R] glm,[R] ivpoisson,
[R] poisson,[R] tpoisson,[R] zip
logtype,set subcommand, [R] log,[R] set
loneway command, [R] loneway
loop, endless, see endless loop
Lorenz curve, [R] inequality
lowess, see locally weighted smoothing
lowess command, [R] lowess
lpoly command, [R] lpoly
L-R plots, [R] regress postestimation diagnostic plots
lroc command, [R] lroc
lrtest command, [R] lrtest
lsens command, [R] lsens
lstat command, see estat classification
command
lstretch,set subcommand, [R] set
ltolerance() option, [R] maximize
lv command, [R] lv
lvr2plot command, [R] regress postestimation
diagnostic plots
M
MAD regression, [R] qreg
main effects, [R] anova
man command, [R] help
Mann–Whitney two-sample statistics, [R] ranksum
marginal
effects, [R] margins,[R] marginsplot
homogeneity, test of, [R] symmetry
means, [R] contrast,[R] margins,[R] margins,
contrast,[R] margins, pwcompare,
[R] marginsplot,[R] pwcompare
margins command, [R] margins,[R] margins
postestimation,[R] margins, contrast,
[R] margins, pwcompare,[R] marginsplot
margins test, [R] margins,[R] pwcompare
marginsplot command, [R] marginsplot
mata
query command, [R] set
set matacache command, [R] set
set matafavor command, [R] set
set matalibs command, [R] set
set matalnum command, [R] set
set matamofirst command, [R] set
set mataoptimize command, [R] set
set matastrict command, [R] set
mata,query subcommand, [R] query
matched case–control data, [R] asclogit,[R] clogit,
[R] symmetry
matched-pairs tests, [R] signrank,[R] ttest
matsize,set subcommand, [R] matsize,[R] set
max memory,set subcommand, [R] set
maxdb,set subcommand, [R] db,[R] set
maximization technique explained, [R] maximize,
[R] ml
maximize,ml subcommand, [R] ml
maximum
likelihood estimation, [R] maximize,[R] ml,
[R] mlexp
limits, [R] limits
number of variables in a model, [R] matsize
maximums and minimums, reporting, [R] lv,
[R] summarize,[R] table
maxiter,set subcommand, [R] maximize,[R] set
maxvar,set subcommand, [R] set
McFadden’s choice model, [R] asclogit
McNemar’s chi-squared test, [R] clogit
mean command, [R] mean,[R] mean postestimation
means,
arithmetic, geometric, and harmonic, [R] ameans
confidence interval and standard error, [R] ci
displaying, [R] ameans,[R] summarize,[R] table,
[R] tabstat,[R] tabulate, summarize()
estimating, [R] mean
graphing, [R] grmeanby
marginal, [R] margins
pairwise comparisons of, [R] pwmean
pharmacokinetic data, [R] pksumm
robust, [R] rreg
testing equality of, see equality test of means
measurement error, [R] vwls
measures of
association, [R] tabulate twoway
central tendency, see means,see medians
dispersion, see percentiles, displaying,see range
of data,see standard deviations, displaying,see
variance, displaying
inequality, [R] inequality
location, [R] lv,[R] summarize
median command, [R] ranksum
median regression, [R] qreg
median test, [R] ranksum
medians,
displaying, [R] centile,[R] lv,[R] summarize,
[R] table,[R] tabstat
graphing, [R] grmeanby
testing equality of, see equality test of medians
memory, matsize, see matsize,set subcommand
memory,query subcommand, [R] query
messages and return codes, see error messages and
return codes
meta-analysis, [R] meta
mfp prefix command, [R] mfp,[R] mfp postestimation
mfx,estat subcommand, [R] asclogit postestimation,
[R] asmprobit postestimation,[R] asroprobit
postestimation
midsummaries, [R] lv
mild outliers, [R] lv
Mills’ ratio, [R] heckman,[R] heckman
postestimation
min memory,set subcommand, [R] set
minimum
absolute deviations, [R] qreg
squared deviations, [R] areg,[R] cnsreg,[R] nl,
[R] regress,[R] regress postestimation
minimums and maximums, see maximums and
minimums, reporting
missing values, [R] misstable
misstable
nested command, [R] misstable
patterns command, [R] misstable
summarize command, [R] misstable
tree command, [R] misstable
mixed designs, [R] anova
mkspline command, [R] mkspline
ml
check command, [R] ml
clear command, [R] ml
count command, [R] ml
display command, [R] ml
footnote command, [R] ml
graph command, [R] ml
init command, [R] ml
maximize command, [R] ml
model command, [R] ml
plot command, [R] ml
query command, [R] ml
report command, [R] ml
score command, [R] ml
search command, [R] ml
trace command, [R] ml
mleval command, [R] ml
mlexp command, [R] mlexp,[R] mlexp postestimation
mlmatbysum command, [R] ml
mlmatsum command, [R] ml
mlogit command, [R] mlogit,[R] mlogit
postestimation
mlsum command, [R] ml
mlvecsum command, [R] ml
MNP, see outcomes, multinomial
model coefficients test, [R] lrtest,[R] test,[R] testnl
model specification test, see specification test
model,ml subcommand, [R] ml
models, maximum number of variables in, [R] matsize
modulus transformations, [R] boxcox
monotone-missing pattern, [R] misstable
Monte Carlo simulations, [R] permute,[R] simulate
more command and parameter, [R] more
more,set subcommand, [R] more,[R] set
mprobit command, [R] mprobit,[R] mprobit
postestimation
multilevel model, [R] gllamm
multinomial outcome model, see outcomes, multinomial
multiple comparisons, [R] contrast,[R] margins,
[R] pwcompare,[R] pwmean,[R] anova
postestimation,[R] correlate,[R] oneway,
[R] regress postestimation,[R] roccomp,
[R] spearman,[R] test,[R] testnl,
[R] tetrachoric
Bonferroni’s method, [R] contrast,[R] margins,
[R] pwcompare,[R] pwmean,[R] anova
postestimation,[R] correlate,[R] oneway,
[R] regress postestimation,[R] roccomp,
[R] spearman,[R] test,[R] testnl,
[R] tetrachoric
Duncan’s method, [R] pwcompare,[R] pwmean
Dunnett’s method, [R] pwcompare,[R] pwmean
Holm’s method, [R] anova postestimation,
[R] regress postestimation,[R] test,[R] testnl
multiple-range method, see Dunnett’s method
subentry
Scheffé’s method, [R] contrast, [R] margins,
[R] pwcompare, [R] pwmean, [R] oneway
Šidák’s method, [R] contrast, [R] margins,
[R] pwcompare, [R] pwmean, [R] anova
postestimation, [R] correlate, [R] oneway,
[R] regress postestimation, [R] roccomp,
[R] spearman, [R] test, [R] testnl,
[R] tetrachoric
Studentized-range method, see Tukey’s method
subentry
Student–Newman–Keuls’ method, [R] pwcompare,
[R] pwmean
Tukey’s method, [R] pwcompare,[R] pwmean
multiple regression, see linear regression
multiple-range multiple-comparison adjustment, see
multiple comparisons, Dunnett’s method
multivariable fractional polynomial regression, [R] mfp
multivariate analysis,
bivariate probit, [R] biprobit
three-stage least squares, [R] reg3
Zellner’s seemingly unrelated, [R] nlsur,[R] sureg
N
natural splines, [R] mkspline
nbreg command, [R] nbreg,[R] nbreg postestimation
needle plot, [R] spikeplot
negative binomial regression, [R] nbreg
generalized linear models, [R] glm
truncated, [R] tnbreg
zero-inflated, [R] zinb
nested
designs, [R] anova
effects, [R] anova
logit, [R] nlogit
model statistics, [R] nestreg
regression, [R] nestreg
nested,misstable subcommand, [R] misstable
nestreg prefix command, [R] nestreg
net
cd command, [R] net
describe command, [R] net
from command, [R] net
get command, [R] net
install command, [R] net
link command, [R] net
query command, [R] net
search command, [R] net search
set ado command, [R] net
set other command, [R] net
sj command, [R] net
stb command, [R] net
net,view subcommand, [R] view
net d,view subcommand, [R] view
network,query subcommand, [R] query
new,ssc subcommand, [R] ssc
Newey–West standard errors, [R] glm
news command, [R] news
news,view subcommand, [R] view
Newton–Raphson algorithm, [R] ml
niceness,set subcommand, [R] set
nl command, [R] nl,[R] nl postestimation
nlcom command, [R] nlcom
nlogit command, [R] nlogit,[R] nlogit
postestimation
nlogitgen command, [R] nlogit
nlogittree command, [R] nlogit
nlsur command, [R] nlsur,[R] nlsur postestimation
nolog or log option, [R] maximize
nonconformities, quality control, [R] qc
nonconstant variance, see robust, Huber/White/sandwich
estimator of variance
nonlinear
combinations of estimators, [R] nlcom
hypothesis test after estimation, [R] lrtest,
[R] margins,[R] margins, contrast,
[R] margins, pwcompare,[R] nlcom,
[R] predictnl,[R] testnl
least squares, [R] nl
regression, [R] boxcox,[R] nl,[R] nlsur
nonparametric analysis,
hypothesis tests,
agreement, [R] kappa
association, [R] spearman,[R] tabulate twoway
cusum, [R] cusum
equality of distributions, [R] ksmirnov,
[R] kwallis,[R] ranksum,[R] signrank
medians, [R] ranksum
proportions, [R] bitest,[R] prtest
random order, [R] runtest
trend, [R] nptrend
percentiles, [R] centile
quantile regression, [R] qreg
ROC analysis, [R] roc
estimation, [R] rocreg
graphs, [R] rocregplot
test equality of areas, [R] roccomp
without covariates, [R] roctab
smoothing, [R] kdensity,[R] lowess,[R] lpoly,
[R] smooth
nonrtolerance option, [R] maximize
nonselection hazard, [R] heckman,[R] heckman
postestimation
normal distribution and normality,
examining distributions for, [R] diagnostic plots,
[R] lv
probability and quantile plots, [R] diagnostic plots
test for, [R] sktest,[R] swilk
transformations to achieve, [R] boxcox,[R] ladder,
[R] lnskew0
not concave message, [R] maximize
notes on estimation results, [R] estimates notes
notes,estimates subcommand, [R] estimates notes
notifyuser,set subcommand, [R] set
nproc,estat subcommand, [R] rocreg postestimation
nptrend command, [R] nptrend
NR algorithm, [R] ml
nrtolerance() option, [R] maximize
N-way analysis of variance, [R] anova
O
obs,set subcommand, [R] set
observational data, [R] correlate,[R] heckman,
[R] ivregress,[R] logit,[R] mean,[R] regress,
[R] summarize,[R] tabulate oneway,
[R] tabulate twoway,[R] ttest
observed information matrix, [R] ml,[R] vce option
odbcmgr,set subcommand, [R] set
odds ratio, [R] eform option
estimation, [R] asclogit,[R] binreg,[R] clogit,
[R] cloglog,[R] exlogistic,[R] glm,[R] glogit,
[R] logistic,[R] logit,[R] mlogit,[R] scobit
postestimation, [R] contrast,[R] exlogistic
postestimation,[R] lincom
off,
cmdlog subcommand, [R] log
log subcommand, [R] log
OIM, see observed information matrix
ologit command, [R] ologit,[R] ologit postestimation
OLS regression, see linear regression
omitted variables test, [R] regress postestimation,also
see specification test
on,
cmdlog subcommand, [R] log
log subcommand, [R] log
one-way analysis of variance, [R] kwallis,[R] loneway,
[R] oneway
oneway command, [R] oneway
OPG, see outer product of the gradient
oprobit command, [R] oprobit,[R] oprobit
postestimation
order statistics, [R] lv
ordered
logit, [R] ologit
probit, [R] heckoprobit,[R] oprobit
ordinal outcome model, see outcomes, ordinal
ordinary least squares, see linear regression
orthog command, [R] orthog
orthogonal polynomial, [R] contrast,[R] margins,
contrast,[R] orthog
orthpoly command, [R] orthog
other,query subcommand, [R] query
outcomes,
binary,
complementary log-log, [R] cloglog
glm for binomial family, [R] binreg,[R] glm
grouped data, [R] glogit
logistic, [R] exlogistic,[R] logistic,[R] logit,
[R] scobit
probit, [R] biprobit,[R] heckprobit,
[R] hetprobit,[R] ivprobit,[R] probit
ROC analysis, [R] rocfit,[R] rocreg
categorical,
logistic, [R] asclogit,[R] clogit,[R] mlogit,
[R] nlogit,[R] slogit
probit, [R] asmprobit,[R] mprobit
count,
negative binomial, [R] nbreg,[R] tnbreg,
[R] zinb
Poisson, [R] expoisson,[R] ivpoisson,
[R] poisson,[R] tpoisson,[R] zip
multinomial, see categorical subentry,see ordinal
subentry,see rank subentry
ordinal,
logistic, [R] ologit,[R] slogit
probit, [R] heckoprobit,[R] oprobit
polytomous, see categorical subentry,see ordinal
subentry,see rank subentry
rank,
logistic, [R] rologit
probit, [R] asroprobit
outer fence, [R] lv
outer product of the gradient, [R] ml,[R] vce option
outliers, [R] lv,[R] qreg,[R] regress postestimation,
[R] rreg
out-of-sample predictions, [R] predict,[R] predictnl
output,
query subcommand, [R] query
set subcommand, [R] set
output,
coefficient table,
automatically widen, [R] set
display settings, [R] set showbaselevels
format settings, [R] set cformat
controlling the scrolling of, [R] more
printing, [R] translate
recording, [R] log
outside values, [R] lv
overid,estat subcommand, [R] gmm postestimation,
[R] ivpoisson postestimation,[R] ivregress
postestimation
overidentifying restrictions test, [R] gmm
postestimation,[R] ivpoisson postestimation,
[R] ivregress postestimation
ovtest,estat subcommand, [R] regress
postestimation
P
P–P plot, [R] diagnostic plots
pagesize,set subcommand, [R] more,[R] set
paging of screen output, controlling, [R] more
pairwise comparisons, [R] margins, pwcompare,
[R] marginsplot,[R] pwcompare,[R] pwmean
pairwise correlation, [R] correlate
parameters, system, see system parameters
partial
correlation, [R] pcorr
effects, [R] margins,[R] marginsplot
regression leverage plot, [R] regress postestimation
diagnostic plots
regression plot, [R] regress postestimation
diagnostic plots
residual plot, [R] regress postestimation diagnostic
plots
Parzen kernel function, [R] kdensity,[R] lpoly,
[R] qreg
pattern of missing values, [R] misstable
patterns,misstable subcommand, [R] misstable
pausing until key is pressed, [R] more
pchart command, [R] qc
pchi command, [R] diagnostic plots
pcorr command, [R] pcorr
PDF, [R] translate
Pearson goodness-of-fit test, [R] estat gof,[R] logistic
postestimation,[R] poisson postestimation
Pearson product-moment correlation coefficient,
[R] correlate
Pearson residual, [R] binreg postestimation,[R] estat
gof,[R] glm postestimation,[R] logistic
postestimation,[R] logit postestimation
percentiles, displaying, [R] centile,[R] lv,
[R] summarize,[R] table,[R] tabstat
permutation test, [R] permute
permute prefix command, [R] permute
pformat,set subcommand, [R] set,[R] set cformat
pharmaceutical statistics, [R] pk,[R] pksumm
pharmacokinetic data, [R] pk,[R] pkcollapse,
[R] pkcross,[R] pkequiv,[R] pkexamine,
[R] pkshape,[R] pksumm
piecewise
cubic functions, [R] mkspline
linear functions, [R] mkspline
pinnable,set subcommand, [R] set
pk, see pharmacokinetic data
pkcollapse command, [R] pkcollapse
pkcross command, [R] pkcross
pkequiv command, [R] pkequiv
pkexamine command, [R] pkexamine
.pkg filename suffix, [R] net
pkshape command, [R] pkshape
pksumm command, [R] pksumm
Plackett–Luce model, [R] rologit
playsnd,set subcommand, [R] set
plot,ml subcommand, [R] ml
pnorm command, [R] diagnostic plots
poisson command, [R] nbreg,[R] poisson,
[R] poisson postestimation
Poisson distribution,
confidence intervals, [R] ci
regression, see Poisson regression
Poisson regression, [R] nbreg,[R] poisson
generalized linear model, [R] glm
truncated, [R] tpoisson
zero-inflated, [R] zip
polynomials,
fractional, [R] fp,[R] mfp
orthogonal, [R] orthog
smoothing, see local polynomial
polytomous outcome model, see outcomes, polytomous
populations,
diagnostic plots, [R] diagnostic plots
examining, [R] histogram,[R] lv,[R] spikeplot,
[R] stem,[R] summarize,[R] table
standard, [R] dstdize
testing equality of, see distributions, testing equality
of
testing for normality, [R] sktest,[R] swilk
postestimation command, [R] contrast,[R] estat,
[R] estat ic,[R] estat summarize,[R] estat
vce,[R] estimates,[R] hausman,[R] lincom,
[R] linktest,[R] lrtest,[R] margins,
[R] margins, contrast,[R] margins,
pwcompare,[R] marginsplot,[R] nlcom,
[R] predict,[R] predictnl,[R] pwcompare,
[R] suest,[R] test,[R] testnl
poverty indices, [R] inequality
power transformations, [R] boxcox,[R] lnskew0
predict command, [R] predict,[R] regress
postestimation
predict,estat subcommand, [R] exlogistic
postestimation
predictions, [R] predict,[R] predictnl
predictions, standard error of, [R] glm,[R] predict,
[R] regress postestimation
predictnl command, [R] predictnl
prefix command, [R] bootstrap,[R] fp,[R] jackknife,
[R] mfp,[R] nestreg,[R] permute,[R] simulate,
[R] stepwise,[R] xi
Pregibon delta beta influence statistic, see delta beta
influence statistic
preprocessor commands, [R] #review
prevalence studies, see case–control data
print command, [R] translate
printcolor,set subcommand, [R] set
printing, logs (output), [R] translate
probit command, [R] probit,[R] probit
postestimation
probit regression, [R] probit
alternative-specific multinomial probit,
[R] asmprobit
alternative-specific rank-ordered, [R] asroprobit
bivariate, [R] biprobit
generalized linear model, [R] glm
heteroskedastic, [R] hetprobit
multinomial, [R] mprobit
ordered, [R] heckoprobit,[R] oprobit
two-equation, [R] biprobit
with endogenous regressors, [R] ivprobit
with grouped data, [R] glogit
with sample selection, [R] heckprobit
processors,set subcommand, [R] set
production frontier model, [R] frontier
product-moment correlation, [R] correlate
between ranks, [R] spearman
profile plots, [R] marginsplot
programming, limits, [R] limits
programs, user-written, see ado-files
projection matrix, diagonal elements of, [R] binreg
postestimation,[R] clogit postestimation,
[R] glm postestimation,[R] logistic
postestimation,[R] logit postestimation,
[R] regress postestimation,[R] rreg
postestimation
proportion command, [R] proportion,
[R] proportion postestimation
proportional
hazards models, see survival analysis
odds assumption, [R] ologit
relaxed, [R] slogit
odds model, [R] ologit
sampling, [R] bootstrap
proportions,
confidence intervals for, [R] ci
estimating, [R] proportion
testing equality of, [R] bitest,[R] prtest
prtest command, [R] prtest
prtesti command, [R] prtest
pseudo R-squared, [R] maximize
pseudosigmas, [R] lv
pwcompare command, [R] pwcompare,
[R] pwcompare postestimation
pwcorr command, [R] correlate
pwmean command, [R] pwmean,[R] pwmean
postestimation
Q
Q–Q plot, [R] diagnostic plots
qc charts, see quality control charts
qchi command, [R] diagnostic plots
qladder command, [R] ladder
qnorm command, [R] diagnostic plots
qqplot command, [R] diagnostic plots
qreg command, [R] qreg,[R] qreg postestimation
qtolerance() option, [R] maximize
qualitative dependent variables, [R] asclogit,
[R] asmprobit,[R] asroprobit,[R] binreg,
[R] biprobit,[R] brier,[R] clogit,[R] cloglog,
[R] cusum,[R] exlogistic,[R] glm,[R] glogit,
[R] heckoprobit,[R] heckprobit,[R] hetprobit,
[R] ivprobit,[R] logistic,[R] logit,[R] mlogit,
[R] mprobit,[R] nlogit,[R] ologit,[R] oprobit,
[R] probit,[R] rocfit,[R] rocreg,[R] rologit,
[R] scobit,[R] slogit
quality control charts, [R] qc,[R] serrbar
quantile command, [R] diagnostic plots
quantile–normal plots, [R] diagnostic plots
quantile plots, [R] diagnostic plots
quantile–quantile plots, [R] diagnostic plots
quantile regression, [R] qreg
quantiles, see percentiles, displaying
query
command, [R] query
efficiency command, [R] query
graphics command, [R] query
interface command, [R] query
mata command, [R] query
memory command, [R] query
network command, [R] query
other command, [R] query
output command, [R] query
trace command, [R] query
update command, [R] query
query,
estimates subcommand, [R] estimates store
log subcommand, [R] log
ml subcommand, [R] ml
net subcommand, [R] net
translator subcommand, [R] translate
transmap subcommand, [R] translate
update subcommand, [R] update
quitting Stata, see exit command
R
r() stored results, [R] stored results
Ramsey test, [R] regress postestimation
random
order, test for, [R] runtest
sample, [R] bootstrap
random-effects model, [R] anova,[R] loneway
random-order test, [R] runtest
range chart, [R] qc
range of data, [R] lv,[R] stem,[R] summarize,
[R] table,[R] tabstat
rank correlation, [R] spearman
ranking data, [R] rologit
rank-order statistics, [R] signrank,[R] spearman
rank-ordered logistic regression, see outcomes, rank
ranksum command, [R] ranksum
rate ratio, see incidence-rate ratio
ratio command, [R] ratio,[R] ratio postestimation
ratios, estimating, [R] ratio
rc (return codes), see error messages and return codes
rchart command, [R] qc
receiver operating characteristic analysis, [R] roc
area under ROC curve, [R] lroc
nonparametric analysis without covariates,
[R] roctab
parametric analysis without covariates, [R] rocfit
regression models, [R] rocreg
ROC curves after rocfit,[R] rocfit postestimation
ROC curves after rocreg,[R] rocregplot
test equality of ROC areas, see equality test of ROC
areas
rectangle kernel function, [R] kdensity,[R] lpoly,
[R] qreg
reexpression, [R] boxcox,[R] ladder,[R] lnskew0
reg3 command, [R] reg3,[R] reg3 postestimation
regress command, [R] regress,[R] regress
postestimation,[R] regress postestimation
diagnostic plots,[R] regress postestimation time
series
regression
diagnostics, [R] estat classification,[R] estat gof,
[R] logistic postestimation,[R] lroc,[R] lsens,
[R] poisson postestimation,[R] predict,
[R] predictnl,[R] regress postestimation
diagnostic plots,[R] regress postestimation time
series
function, estimating, [R] lpoly
regression,
constrained, [R] cnsreg
creating orthogonal polynomials for, [R] orthog
dummy variables, with, [R] anova,[R] areg,[R] xi
fixed-effects, [R] areg
fractional polynomial, [R] fp,[R] mfp
graphing, [R] logistic,[R] regress postestimation
diagnostic plots
grouped data, [R] intreg
increasing number of variables allowed, [R] matsize
instrumental variables, [R] gmm,[R] ivpoisson,
[R] ivprobit,[R] ivregress,[R] ivtobit
linear, see linear regression
system, [R] gmm,[R] ivpoisson,[R] ivregress,
[R] nlsur,[R] reg3,[R] sureg
truncated, [R] truncreg
relative-risk ratio, [R] eform option,[R] lincom,
[R] mlogit
reliability, [R] brier,[R] eivreg,[R] icc,[R] intreg,
[R] loneway,[R] poisson
reliability theory, see survival analysis
repeated-measures ANOVA, [R] anova
repeating and editing commands, [R] #review
replay,estimates subcommand, [R] estimates
replay
report,fvset subcommand, [R] fvset
report,ml subcommand, [R] ml
RESET test, [R] regress postestimation
reset,translator subcommand, [R] translate
residuals, [R] logistic,[R] predict,[R] regress
postestimation diagnostic plots,[R] rreg
postestimation
residual-versus-fitted plot, [R] regress postestimation
diagnostic plots
residual-versus-predictor plot, [R] regress
postestimation diagnostic plots
resistant smoothers, [R] smooth
restore,estimates subcommand, [R] estimates
store
restricted cubic splines, [R] mkspline
Results window, clearing, [R] cls
results,
saving, [R] estimates save
stored, [R] stored results
return codes, see error messages and return codes
return list command, [R] stored results
reventries,set subcommand, [R] set
#review command, [R] #review
revkeyboard,set subcommand, [R] set
risk ratio, [R] binreg
rmsg,set subcommand, [R] set
robust regression, [R] regress,[R] rreg,also see robust,
Huber/White/sandwich estimator of variance
robust test for equality of variance, [R] sdtest
robust, Huber/White/sandwich estimator of variance,
[R] vce option
alternative-specific
conditional logit model, [R] asclogit
multinomial probit regression, [R] asmprobit
rank-ordered probit regression, [R] asroprobit
complementary log-log regression, [R] cloglog
generalized linear models, [R] glm
for binomial family, [R] binreg
generalized method of moments, [R] gmm,
[R] ivpoisson
heckman selection model, [R] heckman
instrumental-variables regression, [R] ivregress
interval regression, [R] intreg
linear regression, [R] regress
constrained, [R] cnsreg
truncated, [R] truncreg
with dummy-variable set, [R] areg
logistic regression, [R] logistic,[R] logit,also see
logit regression subentry
conditional, [R] clogit
multinomial, [R] mlogit
ordered, [R] ologit
rank-ordered, [R] rologit
skewed, [R] scobit
stereotype, [R] slogit
logit regression, [R] logistic,[R] logit,also see
logistic regression subentry
for grouped data, [R] glogit
nested, [R] nlogit
maximum likelihood estimation, [R] ml,[R] mlexp
multinomial
logistic regression, [R] mlogit
probit regression, [R] mprobit
negative binomial regression, [R] nbreg
truncated, [R] tnbreg
zero-inflated, [R] zinb
nonlinear
least-squares estimation, [R] nl
systems of equations, [R] nlsur
Poisson regression, [R] poisson
truncated, [R] tpoisson
with endogenous regressors, [R] ivpoisson
zero-inflated, [R] zip
probit regression, [R] probit
bivariate, [R] biprobit
for grouped data, [R] glogit
heteroskedastic, [R] hetprobit
multinomial, [R] mprobit
ordered, [R] heckoprobit,[R] oprobit
with endogenous regressors, [R] ivprobit
with sample selection, [R] heckprobit
quantile regression, [R] qreg
summary statistics,
mean, [R] mean
proportion, [R] proportion
ratio, [R] ratio
total, [R] total
tobit model, [R] tobit
with endogenous regressors, [R] ivtobit
truncated
negative binomial regression, [R] tnbreg
Poisson regression, [R] tpoisson
regression, [R] truncreg
with endogenous regressors,
instrumental-variables regression, [R] ivregress
Poisson regression, [R] ivpoisson
probit regression, [R] ivprobit
tobit regression, [R] ivtobit
zero-inflated
negative binomial regression, [R] zinb
Poisson regression, [R] zip
robust, other methods of, [R] rreg,[R] smooth
robvar command, [R] sdtest
ROC, see receiver operating characteristic analysis
roccomp command, [R] roc,[R] roccomp
rocfit command, [R] rocfit,[R] rocfit postestimation
rocgold command, [R] roc,[R] roccomp
rocplot command, [R] rocfit postestimation
rocreg command, [R] rocreg,[R] rocreg
postestimation,[R] rocregplot
rocregplot command, [R] rocregplot
roctab command, [R] roc,[R] roctab
roh, [R] loneway
rologit command, [R] rologit,[R] rologit
postestimation
rootograms, [R] spikeplot
rreg command, [R] rreg,[R] rreg postestimation
run command, [R] do
runiform() function, [R] set seed
runtest command, [R] runtest
rvfplot command, [R] regress postestimation
diagnostic plots
rvpplot command, [R] regress postestimation
diagnostic plots
S
s() stored results, [R] stored results
sample, random, see random sample
sampling, [R] bootstrap,[R] bsample,also see cluster
sampling
sandwich/Huber/White estimator of variance, see robust,
Huber/White/sandwich estimator of variance
save,estimates subcommand, [R] estimates save
saved results, see stored results
saving results, [R] estimates save
Scheffé’s multiple-comparison adjustment, see multiple
comparisons, Scheffé’s method
scheme,set subcommand, [R] set
Schwarz information criterion, see Bayesian information
criterion
s-class command, [R] stored results
scobit command, [R] scobit,[R] scobit
postestimation
score,ml subcommand, [R] ml
scores, [R] predict
scrollbufsize,set subcommand, [R] set
scrolling of output, controlling, [R] more
sdtest command, [R] sdtest
sdtesti command, [R] sdtest
se,estat subcommand, [R] exlogistic postestimation,
[R] expoisson postestimation
search,
ml subcommand, [R] ml
net subcommand, [R] net
view subcommand, [R] view
search command, [R] search
search Internet, [R] net search
search d,view subcommand, [R] view
searchdefault,set subcommand, [R] search,[R] set
seed,set subcommand, [R] set,[R] set seed
seemingly unrelated
estimation, [R] suest
regression, [R] nlsur,[R] reg3,[R] sureg
segmentsize,set subcommand, [R] set
selection models, [R] heckman,[R] heckoprobit,
[R] heckprobit
sensitivity, [R] estat classification,[R] lroc,[R] lsens,
also see receiver operating characteristic analysis
model, [R] regress postestimation,[R] rreg
serial correlation, see autocorrelation
serial independence test, [R] runtest
serrbar command, [R] serrbar
session, recording, [R] log
set
adosize command, [R] set
autotabgraphs command, [R] set
cformat command, [R] set,[R] set cformat
charset command, [R] set
checksum command, [R] set
coeftabresults command, [R] set
command, [R] query,[R] set
conren command, [R] set
copycolor command, [R] set
dockable command, [R] set
dockingguides command, [R] set
doublebuffer command, [R] set
dp command, [R] set
emptycells command, [R] set,[R] set emptycells
eolchar command, [R] set
fastscroll command, [R] set
floatwindows command, [R] set
fvlabel command, [R] set,[R] set showbaselevels
fvwrap command, [R] set,[R] set showbaselevels
fvwrapon command, [R] set,[R] set
showbaselevels
graphics command, [R] set
haverdir command, [R] set
httpproxy command, [R] netio,[R] set
httpproxyauth command, [R] netio,[R] set
httpproxyhost command, [R] netio,[R] set
httpproxyport command, [R] netio,[R] set
httpproxypw command, [R] netio,[R] set
httpproxyuser command, [R] netio,[R] set
include bitmap command, [R] set
level command, [R] level,[R] set
linegap command, [R] set
linesize command, [R] log,[R] set
locksplitters command, [R] set
logtype command, [R] log,[R] set
lstretch command, [R] set
matsize command, [R] matsize,[R] set
maxdb command, [R] db,[R] set
maxiter command, [R] maximize,[R] set
max memory command, [R] set
maxvar command, [R] set
min memory command, [R] set
more command, [R] more,[R] set
niceness command, [R] set
notifyuser command, [R] set
obs command, [R] set
odbcmgr command, [R] set
output command, [R] set
pagesize command, [R] more,[R] set
pformat command, [R] set,[R] set cformat
pinnable command, [R] set
playsnd command, [R] set
printcolor command, [R] set
processors command, [R] set
reventries command, [R] set
revkeyboard command, [R] set
rmsg command, [R] set
scheme command, [R] set
scrollbufsize command, [R] set
searchdefault command, [R] search,[R] set
seed command, [R] set,[R] set seed
segmentsize command, [R] set
sformat command, [R] set,[R] set cformat
showbaselevels command, [R] set,[R] set
showbaselevels
showemptycells command, [R] set,[R] set
showbaselevels
showomitted command, [R] set,[R] set
showbaselevels
smoothfonts command, [R] set
timeout1 command, [R] netio,[R] set
timeout2 command, [R] netio,[R] set
trace command, [R] set
tracedepth command, [R] set
traceexpand command, [R] set
tracehilite command, [R] set
traceindent command, [R] set
tracenumber command, [R] set
tracesep command, [R] set
type command, [R] set
update interval command, [R] set,[R] update
update prompt command, [R] set,[R] update
update query command, [R] set,[R] update
varabbrev command, [R] set
varkeyboard command, [R] set
set ado,net subcommand, [R] net
set matacache,mata subcommand, [R] set
set matafavor,mata subcommand, [R] set
set matalibs,mata subcommand, [R] set
set matalnum,mata subcommand, [R] set
set matamofirst,mata subcommand, [R] set
set mataoptimize,mata subcommand, [R] set
set matastrict,mata subcommand, [R] set
set other,net subcommand, [R] net
set,translator subcommand, [R] translate
set defaults command, [R] set defaults
settings,
display, [R] set showbaselevels
format, [R] set cformat
sformat,set subcommand, [R] set,[R] set cformat
sfrancia command, [R] swilk
Shapiro–Francia test for normality, [R] swilk
Shapiro–Wilk test for normality, [R] swilk
shewhart command, [R] qc
showbaselevels,set subcommand, [R] set,[R] set
showbaselevels
showemptycells,set subcommand, [R] set,[R] set
showbaselevels
shownrtolerance option, [R] maximize
showomitted,set subcommand, [R] set,[R] set
showbaselevels
showstep option, [R] maximize
showtolerance option, [R] maximize
Šidák’s multiple-comparison adjustment, see multiple
comparisons, Šidák’s method
signrank command, [R] signrank
signtest command, [R] signrank
simulate prefix command, [R] simulate
simulation, Monte Carlo, [R] permute,[R] simulate
simultaneous
quantile regression, [R] qreg
systems, [R] reg3
SIR, see standardized incidence ratio
SJ, see Stata Journal and Stata Technical Bulletin
sj,net subcommand, [R] net
skewed logistic regression, [R] scobit
skewness, [R] ladder,[R] regress postestimation,
[R] summarize,[R] lnskew0,[R] lv,
[R] pksumm,[R] sktest,[R] tabstat
sktest command, [R] sktest
slogit command, [R] slogit,[R] slogit postestimation
Small Stata, [R] limits
smooth command, [R] smooth
smoothfonts,set subcommand, [R] set
smoothing, [R] lpoly,[R] smooth
graphs, [R] kdensity,[R] lowess
SMR, see standardized mortality ratio
spearman command, [R] spearman
Spearman’s rho, [R] spearman
specification test, [R] gmm postestimation,
[R] hausman,[R] ivpoisson postestimation,
[R] ivregress postestimation,[R] linktest,
[R] lnskew0,[R] regress postestimation,
[R] suest
specificity, [R] estat classification,[R] lroc,[R] lsens,
also see receiver operating characteristic analysis
Spiegelhalter’s Z statistic, [R] brier
spike plot, [R] spikeplot
spikeplot command, [R] spikeplot
splines
linear, [R] mkspline
restricted cubic, [R] mkspline
split-plot designs, [R] anova
spread, [R] lv
sqreg command, [R] qreg,[R] qreg postestimation
sreturn list command, [R] stored results
ssc
copy command, [R] ssc
describe command, [R] ssc
hot command, [R] ssc
install command, [R] ssc
new command, [R] ssc
type command, [R] ssc
uninstall command, [R] ssc
SSC archive, see Statistical Software Components
archive
standard deviations,
displaying, [R] lv,[R] summarize,[R] table,
[R] tabstat,[R] tabulate, summarize()
testing equality of, [R] sdtest
standard errors,
for general predictions, [R] predictnl
forecast, [R] predict,[R] regress postestimation
mean, [R] ci,[R] mean
prediction, [R] glm,[R] predict,[R] regress
postestimation
residuals, [R] predict,[R] regress postestimation
robust, see robust, Huber/White/sandwich estimator
of variance
standardized
incidence ratio, [R] dstdize
margins, [R] margins
means, [R] mean
mortality ratio, [R] dstdize
proportions, [R] proportion
rates, [R] dstdize
ratios, [R] ratio
residuals, [R] binreg postestimation,[R] glm
postestimation,[R] logistic postestimation,
[R] logit postestimation,[R] predict,[R] regress
postestimation
Stata Journal and Stata Technical Bulletin
installation of, [R] net,[R] sj
keyword search of, [R] search
Stata limits, [R] limits
Stata/IC, [R] limits
Stata/MP, [R] limits
Stata/SE, [R] limits
stata.key file, [R] search
Statistical Software Components archive, [R] ssc
stats,estimates subcommand, [R] estimates stats
STB, see Stata Journal and Stata Technical Bulletin
stb,net subcommand, [R] net
stcox, fractional polynomials, [R] fp,[R] mfp
stem command, [R] stem
stem-and-leaf displays, [R] stem
stepwise estimation, [R] stepwise
stepwise prefix command, [R] stepwise
stereotype logistic regression, [R] slogit
stochastic frontier model, [R] frontier
store,estimates subcommand, [R] estimates store
stored results, [R] stored results
storing and restoring estimation results, [R] estimates
store
stratified
graphs, [R] dotplot
models, [R] asclogit,[R] asmprobit,[R] asroprobit,
[R] clogit,[R] exlogistic,[R] expoisson,
[R] rocreg,[R] rologit
resampling, [R] bootstrap,[R] bsample,[R] bstat,
[R] permute
standardization, [R] dstdize
summary statistics, [R] mean,[R] proportion,
[R] ratio,[R] total
structural vector autoregressive
postestimation, [R] regress postestimation time
series
Stuart–Maxwell test statistic, [R] symmetry
Studentized residuals, [R] predict,[R] regress
postestimation
Studentized-range multiple-comparison adjustment, see
multiple comparisons, Tukey’s method
Student–Newman–Keuls’ multiple-comparison
adjustment, see multiple comparisons, Student–
Newman–Keuls’ method
Student’s t distribution, see t distribution
subhazard ratio, [R] eform option,[R] lincom
suest command, [R] suest
summarize,
estat subcommand, [R] estat,[R] estat summarize
misstable subcommand, [R] misstable
summarize command, [R] summarize,[R] tabulate,
summarize()
summarizing data, [R] summarize,[R] tabstat,[R] lv,
[R] table,[R] tabulate oneway,[R] tabulate
twoway,[R] tabulate, summarize()
summary statistics, see descriptive statistics
sums, over observations, [R] summarize
sunflower command, [R] sunflower
sunflower plots, [R] sunflower
sureg command, [R] sureg,[R] sureg postestimation
survey sampling, see cluster sampling
survival analysis, [R] intreg,[R] logistic,[R] poisson
survival-time data, see survival analysis
swilk command, [R] swilk
symbolic forms, [R] anova
symmetry command, [R] symmetry
symmetry plots, [R] diagnostic plots
symmetry test, [R] symmetry
symmi command, [R] symmetry
symplot command, [R] diagnostic plots
syntax diagrams explained, [R] intro
system
estimators, [R] gmm,[R] ivpoisson,[R] ivregress,
[R] nlsur,[R] reg3,[R] sureg
parameters, [R] query,[R] set,[R] set defaults
szroeter,estat subcommand, [R] regress
postestimation
Szroeter’s test for heteroskedasticity, [R] regress
postestimation
T
t distribution
confidence interval for mean, [R] ci, [R] mean
testing equality of means, [R] esize, [R] ttest
tab1 command, [R] tabulate oneway
tab2 command, [R] tabulate twoway
tabi command, [R] tabulate twoway
table command, [R] table
table,estimates subcommand, [R] estimates table
tables,
coefficient,
display in exponentiated form, [R] eform option
display settings, [R] estimation options,[R] set
showbaselevels
format settings, [R] set cformat
maximum likelihood display options, [R] ml
system parameter settings, [R] set
contingency, [R] table,[R] tabulate twoway
estimation results, [R] estimates table
frequency, [R] tabulate oneway,[R] tabulate
twoway,[R] table,[R] tabstat,[R] tabulate,
summarize()
missing values, [R] misstable
summary statistics, [R] table,[R] tabstat,
[R] tabulate, summarize()
tabstat command, [R] tabstat
tabulate command, [R] tabulate oneway,
[R] tabulate twoway
summarize(),[R] tabulate, summarize()
tau, [R] spearman
TDT test, see transmission-disequilibrium test
technique() option, [R] maximize
test,
ARCH, see autoregressive conditional
heteroskedasticity test
association, see association test
autoregressive conditional heteroskedasticity, see
autoregressive conditional heteroskedasticity test
binomial probability, see binomial probability test
bioequivalence, see bioequivalence test
Breusch–Godfrey, see Breusch–Godfrey test
Breusch–Pagan, see Breusch–Pagan test
chi-squared hypothesis, see chi-squared hypothesis
test
Chow, see Chow test
comparison (between nested models), see
comparison test between nested models
cusum, see cusum test
Durbin’s alternative, see Durbin’s alternative test
endogeneity, see endogeneity test
Engle’s LM, see Engle’s LM test
equality of
binomial proportions, see equality test of
binomial proportions
coefficients, see equality test of coefficients
distributions, see distributions, testing equality of
margins, see equality test of margins
means, see equality test of means
medians, see equality test of medians
proportions, see equality test of proportions
ROC areas, see equality test of ROC areas
variances, see equality test of variances
equivalence, see equivalence test
exogeneity, see endogeneity test
Fisher’s exact, see Fisher’s exact test
goodness-of-fit, see goodness of fit
Hausman specification, see Hausman specification
test
heteroskedasticity, see heteroskedasticity test
independence, see independence test,also see
Breusch–Pagan test
independence of irrelevant alternatives, see
independence of irrelevant alternatives
information matrix, see information matrix test
interrater agreement, see interrater agreement
Kolmogorov–Smirnov, see Kolmogorov–Smirnov test
Kruskal–Wallis, see Kruskal–Wallis test
kurtosis, see kurtosis
likelihood-ratio, see likelihood-ratio test
linear hypotheses after estimation, see linear
hypothesis test after estimation
marginal homogeneity, see marginal homogeneity,
test of
margins, see margins test
model coefficients, see model coefficients test
model specification, see specification test
nonlinear hypotheses after estimation, see nonlinear
hypothesis test after estimation
normality, see normal distribution and normality
omitted variables, see omitted variables test
overidentifying restrictions, see overidentifying
restrictions test
permutation, see permutation test
Ramsey, see Ramsey test
random-order, see random-order test
RESET, see RESET test
serial correlation, see autocorrelation
serial independence, see serial independence test
Shapiro–Francia, see Shapiro–Francia test for
normality
Shapiro–Wilk, see Shapiro–Wilk test for normality
skewness, see skewness
symmetry, see symmetry test
Szroeter’s, see Szroeter’s test for heteroskedasticity
TDT, see transmission-disequilibrium test
transmission-disequilibrium test, see transmission-
disequilibrium test
trend, see trend, test for
variance-comparison, see variance-comparison test
Vuong, see Vuong test
weak instrument, see weak instrument test
test command, [R] anova postestimation,[R] test
testnl command, [R] testnl
testparm command, [R] test
tetrachoric command, [R] tetrachoric
three-stage least squares, [R] reg3
timeout1,set subcommand, [R] netio,[R] set
timeout2,set subcommand, [R] netio,[R] set
time-series analysis, [R] regress postestimation time
series
time-versus-concentration curve, [R] pk
title,estimates subcommand, [R] estimates title
tnbreg command, [R] tnbreg,[R] tnbreg
postestimation
tobit command, [R] tobit,[R] tobit postestimation
tobit regression, [R] ivtobit,[R] tobit,also see intreg
command,also see truncreg command
.toc filename suffix, [R] net
tolerance() option, [R] maximize
total command, [R] total,[R] total postestimation
totals, estimation, [R] total
tpoisson command, [R] tpoisson,[R] tpoisson
postestimation
trace,
ml subcommand, [R] ml
query subcommand, [R] query
set subcommand, [R] set
trace option, [R] maximize
tracedepth,set subcommand, [R] set
traceexpand,set subcommand, [R] set
tracehilite,set subcommand, [R] set
traceindent,set subcommand, [R] set
tracenumber,set subcommand, [R] set
tracesep,set subcommand, [R] set
tracing iterative maximization process, [R] maximize
transformations
log, [R] lnskew0
modulus, [R] boxcox
power, [R] boxcox, [R] lnskew0
to achieve normality, [R] boxcox, [R] ladder
to achieve zero skewness, [R] lnskew0
translate command, [R] translate
translate logs, [R] translate
translator
query command, [R] translate
reset command, [R] translate
set command, [R] translate
transmap
define command, [R] translate
query command, [R] translate
transmission-disequilibrium test, [R] symmetry
tree, misstable subcommand, [R] misstable
trend, test for, [R] nptrend, [R] symmetry
triangle kernel function, [R] kdensity, [R] lpoly, [R] qreg
truncated
negative binomial regression, [R] tnbreg
observations, [R] truncreg, also see censored
observations
Poisson regression, [R] tpoisson
regression, [R] truncreg
truncreg command, [R] truncreg, [R] truncreg postestimation
ttest and ttesti commands, [R] ttest
Tukey’s multiple-comparison adjustment, see multiple
comparisons, Tukey’s method
tuning constant, [R] rreg
two-stage least squares, [R] ivregress
two-way
analysis of variance, [R] anova
scatterplots, [R] lowess
type,
set subcommand, [R] set
ssc subcommand, [R] ssc
U
U statistic, [R] ranksum
uniformly distributed random-number function, [R] set
seed
uninstall,
net subcommand, [R] net
ssc subcommand, [R] ssc
unique values, counting, [R] table, [R] tabulate oneway
univariate
distributions, displaying, [R] cumul, [R] diagnostic plots, [R] histogram, [R] ladder, [R] lv, [R] stem
kernel density estimation, [R] kdensity
update
all command, [R] update
command, [R] update
from command, [R] update
query command, [R] update
update,
query subcommand, [R] query
view subcommand, [R] view
update d, view subcommand, [R] view
update interval, set subcommand, [R] set, [R] update
update prompt, set subcommand, [R] set, [R] update
update query, set subcommand, [R] set, [R] update
updates to Stata, [R] adoupdate, [R] net, [R] sj, [R] update
use, estimates subcommand, [R] estimates save
user-written additions,
installing, [R] net, [R] ssc
searching for, [R] net search, [R] ssc
using,
cmdlog subcommand, [R] log
log subcommand, [R] log
V
varabbrev, set subcommand, [R] set
variables,
categorical, see categorical data, agreement,
measures for
dummy, see indicator variables
factor, see factor variables
in model, maximum number, [R] matsize
orthogonalize, [R] orthog
variance,
analysis of, [R] anova, [R] loneway, [R] oneway
displaying, [R] summarize, [R] tabstat
estimators, [R] vce option
Huber/White/sandwich estimator, see robust,
Huber/White/sandwich estimator of variance
inflation factors, [R] regress postestimation
nonconstant, see robust, Huber/White/sandwich
estimator of variance
stabilizing transformations, [R] boxcox
testing equality of, [R] sdtest
variance–covariance matrix of estimators, [R] correlate, [R] estat, [R] estat vce
variance-comparison test, [R] sdtest
variance-weighted least squares, [R] vwls
varkeyboard, set subcommand, [R] set
vce, estat subcommand, [R] estat, [R] estat vce
vce() option, [R] vce option
version of ado-file, [R] which
version of Stata, [R] about
view
ado command, [R] view
ado d command, [R] view
browse command, [R] view
command, [R] view
help command, [R] view
help d command, [R] view
net command, [R] view
net d command, [R] view
news command, [R] view
search command, [R] view
search d command, [R] view
update command, [R] view
update d command, [R] view
view d command, [R] view
view d, view subcommand, [R] view
viewing previously typed lines, [R] #review
vif, estat subcommand, [R] regress postestimation
Vuong test, [R] zinb, [R] zip
vwls command, [R] vwls, [R] vwls postestimation
W
Wald test, [R] contrast, [R] predictnl, [R] test,
[R] testnl
weak instrument test, [R] ivregress postestimation
weighted least squares, [R] regress
for grouped data, [R] glogit
generalized linear models, [R] glm
generalized method of moments estimation,
[R] gmm, [R] ivpoisson
instrumental-variables regression, [R] gmm,
[R] ivregress
nonlinear least-squares estimation, [R] nl
nonlinear systems of equations, [R] nlsur
variance, [R] vwls
Welsch distance, [R] regress postestimation
which command, [R] which
White/Huber/sandwich estimator of variance, see robust,
Huber/White/sandwich estimator of variance
White’s test for heteroskedasticity, [R] regress
postestimation
Wilcoxon
rank-sum test, [R] ranksum
signed-ranks test, [R] signrank
X
xchart command, [R] qc
xi prefix command, [R] xi
Z
Zellner’s seemingly unrelated regression, [R] sureg,
[R] reg3, [R] suest
zero-altered, see zero-inflated
zero-inflated
negative binomial regression, [R] zinb
Poisson regression, [R] zip
zero-skewness transform, [R] lnskew0
zinb command, [R] zinb, [R] zinb postestimation
zip command, [R] zip, [R] zip postestimation