LIWC2015 Language Manual

The Development and Psychometric
Properties of LIWC2015

James W. Pennebaker, Ryan L. Boyd,
Kayla Jordan, and Kate Blackburn

The University of Texas at Austin

Correspondence should be sent to James W. Pennebaker, Department of Psychology, The
University of Texas at Austin, 108 E. Dean Keeton Stop A8000, Austin, TX 78712-1043. The
LIWC2015 program is a commercial product distributed by Pennebaker Conglomerates for
research purposes and by Receptiviti, Inc. for commercial purposes. All profits to Pennebaker for
the research-based version are donated to the Department of Psychology, University of Texas at
Austin.
The official reference to this paper is:
Pennebaker, J.W., Boyd, R.L., Jordan, K., & Blackburn, K. (2015). The development and
psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.

LIWC2015 Development Manual


The Development and Psychometric Properties of
LIWC2015
The ways people use words in their daily lives can provide rich information about their beliefs,
fears, thinking patterns, social relationships, and personalities. From the time of Freud’s writings
about slips of the tongue to the early days of computer-based text analysis, researchers have
amassed increasingly compelling evidence that the words we use have tremendous
psychological value (Gottschalk & Glaser, 1969; Stone, Dunphy, Smith, & Ogilvie, 1966;
Weintraub, 1989).
Although promising, the early computer methods floundered because of the sheer complexity of
the task. Extensive samples of text were not digitized, computers were slow and unwieldy, and
there was little agreement about which features of natural language were most related to
psychological states. Everything changed in the 1990s with the advent of efficient desktop
computers, improved data storage technology, and the explosion of the internet. These factors
allowed for the easy collection of large stores of books, conversations, and other digitized text
samples.
In order to provide an efficient and effective method for studying the various emotional,
cognitive, and structural components present in individuals’ verbal and written speech samples,
we originally developed a text analysis application called Linguistic Inquiry and Word Count, or
LIWC. The first LIWC application was developed as part of an exploratory study of language
and disclosure (Francis, 1993; Pennebaker, 1993). The second (LIWC2001) and third
(LIWC2007) versions updated the original application with an expanded dictionary and a more
modern software design (Pennebaker, Francis, & Booth, 2001; Pennebaker, Booth, & Francis,
2007).
The most recent evolution, LIWC2015 (Pennebaker, Booth, Boyd, & Francis, 2015), has
significantly altered both the dictionary and the software options. Importantly, the LIWC2015
software and dictionary are new, rather than a basic update to previous versions of LIWC. As
with previous versions, however, the program is designed to analyze individual or multiple
language files quickly and efficiently. At the same time, the program attempts to be transparent
and flexible in its operation, allowing the user to explore word use in multiple ways.

The LIWC2015 Framework
Both the standard downloadable and web-based versions of the LIWC2015 application rely on
an internal default dictionary that defines which words should be counted in the target text files.
Note that the LIWC2015 processor is an executable file and cannot be read or opened. To avoid
confusion in the subsequent discussion, words contained in texts that are read and analyzed by
LIWC2015 are referred to as target words. Words in the LIWC2015 dictionary file will be
referred to as dictionary words. Groups of dictionary words that tap a particular domain (e.g.,
negative emotion words) are variously referred to as subdictionaries or word categories.


The LIWC2015 Main Text Processing Module
Because the software application is written in a cross-platform language, it runs identically on
PC and Mac computers via the Java Virtual Machine. LIWC2015 is designed to accept written or
transcribed verbal text which has been stored as a digital, machine-readable file in one of
multiple formats, including plain text, PDF, RTF, or standard Microsoft Word files (i.e., .doc and
.docx). Unlike previous versions, the software can now process text on a line-by-line basis within
and across columns inside of multiple spreadsheet formats, including those saved as .xls, .xlsx,
and .csv files.
During operation, LIWC2015 accesses a single text file, a group of files, or texts within a
spreadsheet and analyzes each sequentially. For each file, LIWC2015 reads one target word at a
time. As each target word is processed, the dictionary file is searched, looking for a dictionary
match with the current target word. If the target word is matched with a dictionary word, the
appropriate word category scale (or scales) for that word is incremented. As the target text file is
being processed, counts for various structural composition elements (e.g., word count and
sentence punctuation) are also incremented.
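The word-by-word lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-entry dictionary, the regular-expression tokenizer, and the function names are hypothetical and are not taken from the LIWC2015 software, whose processor cannot be inspected.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only. The category labels use
# the abbreviations from Table 1; the real dictionary is far larger.
TOY_DICTIONARY = {
    "cried": ["sad", "negemo", "affect", "verb", "focuspast"],
    "happy": ["affect", "adj"],
    "the": ["article"],
}


def tokenize(text):
    """Split text into lowercase target words (an assumed, simplified tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_text(text, dictionary=TOY_DICTIONARY):
    """Return (word count, {category: percentage of total words})."""
    counts = Counter()
    words = tokenize(text)
    for target_word in words:  # read one target word at a time
        # Exact matches only in this sketch; stems are ignored here.
        for category in dictionary.get(target_word, ()):
            counts[category] += 1  # increment every scale the word belongs to
    total = len(words)
    if total == 0:
        return 0, {}
    return total, {cat: 100.0 * n / total for cat, n in counts.items()}
```

Calling `score_text("The child cried")` yields a word count of 3 and a score of one third of the words (about 33.3 percent) for each category attached to "the" and "cried".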
For each text file, approximately 90 output variables are written as one line of data to an output
file. This data record includes the file name and word count, 4 summary language variables
(analytical thinking, clout, authenticity, and emotional tone), 3 general descriptor categories
(words per sentence, percent of target words captured by the dictionary, and percent of words in
the text that are longer than six letters), 21 standard linguistic dimensions (e.g., percentage of
words in the text that are pronouns, articles, auxiliary verbs, etc.), 41 word categories tapping
psychological constructs (e.g., affect, cognition, biological processes, drives), 6 personal concern
categories (e.g., work, home, leisure activities), 5 informal language markers (assents, fillers,
nonfluencies, swear words, netspeak), and 12 punctuation categories (periods, commas, etc.). A
complete list of the standard LIWC2015 scales is included in Table 1.
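As a rough illustration of the three general descriptor categories, the sketch below computes words per sentence (WPS), the percentage of words longer than six letters (Sixltr), and the percentage of target words captured by the dictionary (Dic). The tokenizer and the rule that a sentence ends at a period, question mark, or exclamation mark are assumptions made for this example; the manual does not document how LIWC2015 segments sentences.

```python
import re


def general_descriptors(text, dictionary_words):
    """Compute WC, WPS, Sixltr, and Dic for one text.

    dictionary_words is a set of lowercase words (a hypothetical stand-in
    for the LIWC2015 dictionary).
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Assumed sentence rule: split on runs of ., !, or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    wc = len(words)
    if wc == 0:
        return {"WC": 0, "WPS": 0.0, "Sixltr": 0.0, "Dic": 0.0}
    return {
        "WC": wc,
        "WPS": wc / len(sentences),
        "Sixltr": 100.0 * sum(len(w) > 6 for w in words) / wc,
        "Dic": 100.0 * sum(w.lower() in dictionary_words for w in words) / wc,
    }
```

For the text "I cried. Everything changed yesterday!" with a dictionary containing only "i" and "cried", this gives 5 words, 2.5 words per sentence, 60 percent long words, and 40 percent dictionary words.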

The Default LIWC2015 Dictionary
The LIWC2015 Dictionary is the heart of the text analysis strategy. The default LIWC2015
Dictionary is composed of almost 6,400 words, word stems, and select emoticons. Each
dictionary entry additionally defines one or more word categories or subdictionaries. For
example, the word cried is part of five word categories: sadness, negative emotion, overall affect,
verbs, and past focus. Hence, if the word cried is found in the target text, each of these five
subdictionary scale scores will be incremented. As in this example, many of the LIWC2015
categories are arranged hierarchically. All sadness words, by definition, belong to the broader
“negative emotion” category, as well as the “overall affect words” category. Note too that word
stems can be captured by the LIWC2015 system. For example, the dictionary includes the stem
hungr*, which allows for any target word that matches the first five letters to be counted as an
ingestion word (including hungry, hungrier, hungriest). The asterisk, then, denotes the
acceptance of all letters, hyphens, or numbers following its appearance.
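A minimal sketch of stem matching follows. The two dictionary entries come from the examples in the text, but the data structure, the function name, and the rule that an exact entry takes precedence over a stem are assumptions made for illustration; the manual does not say how LIWC2015 resolves overlapping entries.

```python
# Hypothetical two-entry dictionary built from the examples in the text.
TOY_ENTRIES = {
    "hungr*": ["ingest"],
    "cried": ["sad", "negemo", "affect", "verb", "focuspast"],
}


def categories_for(target_word, entries=TOY_ENTRIES):
    """Return the categories a target word increments, honoring '*' stems."""
    word = target_word.lower()
    if word in entries:  # assumed rule: an exact entry wins over a stem
        return list(entries[word])
    matched = []
    for entry, categories in entries.items():
        # A trailing asterisk accepts any characters after the stem.
        if entry.endswith("*") and word.startswith(entry[:-1]):
            matched.extend(categories)
    return matched
```

With these entries, "hungry", "hungrier", and "hungriest" all map to the ingestion category, while "hung" matches nothing because it is shorter than the five-letter stem.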


Each of the default LIWC2015 categories is composed of a list of dictionary words that define
that scale. Table 1 provides a comprehensive list of the default LIWC2015 dictionary categories,
scales, sample scale words, and relevant scale word counts.
Table 1. LIWC2015 Output Variable Information

Category                Abbrev        Examples                Words in  Internal         Internal
                                                              category  consistency      consistency
                                                                        (uncorrected α)  (corrected α)
Word count              WC            -                       -         -                -
Summary Language Variables
Analytical thinking     Analytic      -                       -         -                -
Clout                   Clout         -                       -         -                -
Authentic               Authentic     -                       -         -                -
Emotional tone          Tone          -                       -         -                -
Words/sentence          WPS           -                       -         -                -
Words > 6 letters       Sixltr        -                       -         -                -
Dictionary words        Dic           -                       -         -                -
Linguistic Dimensions
Total function words    funct         it, to, no, very        491       .05              .24
Total pronouns          pronoun       I, them, itself         153       .25              .67
Personal pronouns       ppron         I, them, her            93        .20              .61
1st pers singular       i             I, me, mine             24        .41              .81
1st pers plural         we            we, us, our             12        .43              .82
2nd person              you           you, your, thou         30        .28              .70
3rd pers singular       shehe         she, her, him           17        .49              .85
3rd pers plural         they          they, their, they’d     11        .37              .78
Impersonal pronouns     ipron         it, it’s, those         59        .28              .71
Articles                article       a, an, the              3         .05              .23
Prepositions            prep          to, with, above         74        .04              .18
Auxiliary verbs         auxverb       am, will, have          141       .16              .54
Common Adverbs          adverb        very, really            140       .43              .82
Conjunctions            conj          and, but, whereas       43        .14              .50
Negations               negate        no, not, never          62        .29              .71
Other Grammar
Common verbs            verb          eat, come, carry        1000      .05              .23
Common adjectives       adj           free, happy, long       764       .04              .19
Comparisons             compare       greater, best, after    317       .08              .35
Interrogatives          interrog      how, when, what         48        .18              .57
Numbers                 number        second, thousand        36        .45              .83
Quantifiers             quant         few, many, much         77        .23              .64
Psychological Processes
Affective processes     affect        happy, cried            1393      .18              .57
Positive emotion        posemo        love, nice, sweet       620       .23              .64
Negative emotion        negemo        hurt, ugly, nasty       744       .17              .55
Anxiety                 anx           worried, fearful        116       .31              .73
Anger                   anger         hate, kill, annoyed     230       .16              .53
Sadness                 sad           crying, grief, sad      136       .28              .70
Social processes        social        mate, talk, they        756       .51              .86
Family                  family        daughter, dad, aunt     118       .55              .88
Friends                 friend        buddy, neighbor         95        .20              .60
Female references       female        girl, her, mom          124       .53              .87
Male references         male          boy, his, dad           116       .52              .87
Cognitive processes     cogproc       cause, know, ought      797       .65              .92
Insight                 insight       think, know             259       .47              .84
Causation               cause         because, effect         135       .26              .67
Discrepancy             discrep       should, would           83        .34              .76
Tentative               tentat        maybe, perhaps          178       .44              .83
Certainty               certain       always, never           113       .31              .73
Differentiation         differ        hasn’t, but, else       81        .38              .78
Perceptual processes    percept       look, heard, feeling    436       .17              .55
See                     see           view, saw, seen         126       .46              .84
Hear                    hear          listen, hearing         93        .27              .69
Feel                    feel          feels, touch            128       .24              .65
Biological processes    bio           eat, blood, pain        748       .29              .71
Body                    body          cheek, hands, spit      215       .52              .87
Health                  health        clinic, flu, pill       294       .09              .37
Sexual                  sexual        horny, love, incest     131       .37              .78
Ingestion               ingest        dish, eat, pizza        184       .67              .92
Drives                  drives                                1103      .39              .80
Affiliation             affiliation   ally, friend, social    248       .40              .80
Achievement             achieve       win, success, better    213       .41              .81
Power                   power         superior, bully         518       .35              .76
Reward                  reward        take, prize, benefit    120       .27              .69
Risk                    risk          danger, doubt           103       .26              .68
Time orientations       TimeOrient
Past focus              focuspast     ago, did, talked        341       .23              .64
Present focus           focuspresent  today, is, now          424       .24              .66
Future focus            focusfuture   may, will, soon         97        .26              .68
Relativity              relativ       area, bend, exit        974       .50              .86
Motion                  motion        arrive, car, go         325       .36              .77
Space                   space         down, in, thin          360       .45              .83
Time                    time          end, until, season      310       .39              .79
Personal concerns
Work                    work          job, majors, xerox      444       .69              .93
Leisure                 leisure       cook, chat, movie       296       .50              .86
Home                    home          kitchen, landlord       100       .46              .83
Money                   money         audit, cash, owe        226       .60              .90
Religion                relig         altar, church           174       .64              .91
Death                   death         bury, coffin, kill      74        .39              .79
Informal language       informal                              380       .46              .84
Swear words             swear         fuck, damn, shit        131       .45              .83
Netspeak                netspeak      btw, lol, thx           209       .42              .82
Assent                  assent        agree, OK, yes          36        .10              .39
Nonfluencies            nonflu        er, hm, umm             19        .27              .69
Fillers                 filler        Imean, youknow          14        .06              .27


“Words in category” refers to the number of different dictionary words and stems that make up the variable
category. All alphas were computed on a sample of ~181,000 text files from several of our language corpora (see
Table 2). Uncorrected internal consistency alphas are based on Cronbach estimates; corrected alphas are based on
Spearman-Brown. See the Reliability and Validity section below. Note that the LIWC2015 dictionary generally
arranges categories hierarchically. There are some exceptions to the hierarchy rules. For example, Social processes
include a large group of words that denote social processes, including all non-first-person-singular personal
pronouns as well as verbs that suggest human interaction (talking, sharing); many of these words do not belong to
any of the Social processes subcategories. Another example is Relativity, which includes a large number of words
that cannot be found in any of its subcategories.

LIWC2015 Dictionary Development
The selection of words defining the LIWC2015 categories involved multiple steps over several
years. Originally, the idea was to identify a group of words that tapped basic emotional and
cognitive dimensions often studied in social, health, and personality psychology. With time, the
domain of word categories expanded considerably.
The most recent version of the dictionary, LIWC2015, is a completely new version compared to
earlier ones. Dictionaries can now accommodate numbers, punctuation, and even short phrases.
These additions allow the user to read "netspeak" language that is common in Twitter and
Facebook posts, as well as SMS (short message service, a.k.a. “text messaging”) and SMS-like
modes of communication (e.g., Snapchat, instant messaging). For example, "b4" is coded as a
preposition and ":)" is coded as a positive emotion word.
A handful of new categories have been added and a small number have been removed. With the
advent of more powerful analytic methods and more diverse language samples, we have been
able to build more internally consistent language dictionaries. This means that many of the
dictionaries in previous LIWC versions may have the same name, but the words making up the
dictionaries have been altered (categories subjected to major changes are presented below). We
present here a complete overview of the process used to create the LIWC2015 dictionary.
Step 1. Word Collection. In the design and development of the LIWC category scales, sets of
words were first generated for each conceptual dimension, using the LIWC2007 dictionary as a
starting point. Within the Psychological Processes category, for example, the emotion
subdictionaries were based on words from several sources, including previous versions of the
LIWC dictionary. We drew on common emotion rating scales, such as the PANAS (Watson,
Clark, & Tellegen, 1988), Roget’s Thesaurus, and standard English dictionaries. Following the
creation of preliminary category word lists, 2-6 judges individually generated word lists for each
category, then group brainstorming sessions among 4-8 judges were held in which words
relevant to the various scales were generated and added to the initial scale lists. Similar schemes
were used for the other subjective dictionary categories.
Step 2. Judge Rating Phase. Once the grand list of words was amassed, each word in the
dictionary was examined by a group of 4-8 judges and qualitatively rated in terms of “goodness
of fit” for each category. In order for a word to remain in a given category, a majority of judges
had to agree on its inclusion. In cases of disputes, several corpora and online sources were
referenced to determine a word’s common use, inflection, and meaning. Words for which judges
could not decide on appropriate category placement were removed from the dictionary.


Step 3. Base Rate Analyses. Once a working version of the dictionary was constructed from
judges’ ratings, texts from several sources were analyzed using the Meaning Extraction Helper
(MEH; Boyd, 2015) to determine how frequently dictionary words were used in various
contexts. These sources included blog posts, spoken language studies, Twitter, Facebook, novels,
student writings, and several others. Dictionary words that did not occur at least once in multiple
corpora were omitted from the dictionary.
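The base-rate filter in Step 3 can be illustrated as follows. The function name, the input format, and the threshold of two corpora are assumptions; the text says only that words had to occur at least once in "multiple" corpora, and the actual analysis was run with MEH rather than code like this.

```python
def words_to_keep(dictionary_words, corpora, min_corpora=2):
    """Keep dictionary words that occur at least once in min_corpora corpora.

    corpora maps a corpus name to an iterable of already-tokenized words.
    min_corpora=2 is an assumed reading of "multiple corpora".
    """
    vocabularies = [set(tokens) for tokens in corpora.values()]
    kept = []
    for word in dictionary_words:
        corpora_containing = sum(word in vocab for vocab in vocabularies)
        if corpora_containing >= min_corpora:
            kept.append(word)
    return kept
```

For example, with a blog corpus containing "happy" and "xerox" and a Twitter corpus containing only "happy", the word "xerox" would be dropped under this threshold and "happy" retained.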
Step 4. Candidate Word List Generation. In order to expand the dictionary, we explored several
sources of language for high-frequency words that had not been added by judges. Using MEH,
high-frequency words were quantified as a percentage of total words for hundreds of thousands
of text files from multiple studies and sources. For several linguistic categories (e.g., verbs,
adjectives), the Stanford Natural Language Toolkit (NLTK; Toutanova, Klein, Manning, &
Singer, 2003) was used in conjunction with MEH to identify common words. All candidate
words were then correlated with all dictionary categories in order to detect common words that
were not yet included in the dictionary. Words that correlated positively with dictionary
categories were added to a list of candidate words for possible inclusion. Following this, 4-8
judges reviewed the candidate list and voted on 1) whether words should be included in the
dictionary and 2) whether words were a sound conceptual fit for specific dictionary categories.
Judges’ rating procedures were parallel to those outlined in Step 2.
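The correlation screen in Step 4 might look like the sketch below, which pairs each candidate word's per-text percentage with each category's per-text percentage and keeps the positively correlated pairs. The use of a Pearson correlation, the zero threshold, and all names are assumptions for illustration; the manual does not specify the statistic or cutoff that was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def candidate_pairs(word_percents, category_percents, threshold=0.0):
    """Return (word, category, r) for every pair with r above the threshold.

    Both arguments map a name to a list holding one percentage per text,
    with texts in the same order.
    """
    pairs = []
    for word, word_values in word_percents.items():
        for category, category_values in category_percents.items():
            r = pearson(word_values, category_values)
            if r > threshold:
                pairs.append((word, category, r))
    return pairs
```

The output of such a screen is only a list for human review; in the procedure described here, judges still decided whether each candidate word belonged in a category.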

Step 5. Psychometric Evaluation. Following all previously described steps, each language
category was separated into its constituent words. Each word was then quantified as a percentage
of total words for ~181,000 text files hailing from 5 corpora, totalling ~231,000,000 words (see
Table 2). All words for each category were treated as a “response” and used to compute internal
consistency statistics for each language category as a whole. Words that were detrimental to the
internal consistency of their overarching language category were added to a candidate list of
words for omission from the final dictionary. A group of 2-8 judges then reviewed the list of
candidate words and voted on whether words should be retained. Words for which no majority
could be established were omitted. Several linguistic categories, such as pronouns and adverbs,
constitute established linguistic constructs and were therefore not a part of the omission process.
We discuss the psychometric evaluation procedures in extensive detail in the next section.
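One common way to find items that are "detrimental to internal consistency" is to recompute Cronbach's alpha with each item left out. The sketch below does this for a category whose words are treated as items. It assumes the standard alpha formula and an "alpha if item deleted" rule; the manual does not state the exact criterion the authors applied, and the function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each column holds one dictionary word's percentage in every text,
    with texts in the same order across columns.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(column[i] for column in items) for i in range(len(items[0]))]
    total_variance = pvariance(totals)
    if total_variance == 0:
        return 0.0
    item_variance = sum(pvariance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance / total_variance)


def words_hurting_alpha(word_columns):
    """Return words whose removal raises the category's alpha (assumed rule)."""
    words = list(word_columns)
    full_alpha = cronbach_alpha([word_columns[w] for w in words])
    flagged = []
    for word in words:
        rest = [word_columns[other] for other in words if other != word]
        if len(rest) >= 2 and cronbach_alpha(rest) > full_alpha:
            flagged.append(word)
    return flagged
```

In a toy anger category made up of "hate", "kill", "annoyed", and an unrelated word whose usage runs opposite to the others, only the unrelated word would be flagged and passed to judges for a vote.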
Step 6. Refinement Phase. After Steps 1 through 5 were complete, they were repeated in their
entirety. This was done to catch any possible mistakes/oversights that might have occurred
throughout the dictionary creation process. Note that the psychometrics of each language
category changed negligibly during each refinement phase. During the last stage of the final
refinement phase, two judges reviewed the dictionary for mistakes.
Step 7. Addition of Summary Variables.
A major change from earlier versions of LIWC is the inclusion of four new summary variables:
analytical thinking (Pennebaker et al., 2014), clout (Kacewicz et al., 2012), authenticity
(Newman et al., 2003), and emotional tone (Cohn et al., 2004). Each summary variable was
derived from previously published findings from our lab and converted to percentiles based on
standardized scores from large comparison samples. It must be emphasized that the summary
variables are the only non-transparent dimensions in the LIWC2015 output.
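The summary-variable algorithms themselves are not published, so the following is not the LIWC2015 computation. It only illustrates, under a normality assumption, what converting a standardized score to a percentile can mean: a raw score is standardized against a comparison sample's mean and standard deviation and then mapped through the normal cumulative distribution. The function name and parameters are hypothetical.

```python
from statistics import NormalDist


def to_percentile(raw_score, comparison_mean, comparison_sd):
    """Map a raw score to a 0-100 percentile, assuming normally distributed scores."""
    z = (raw_score - comparison_mean) / comparison_sd  # standardized score
    return 100.0 * NormalDist().cdf(z)
```

Under this assumption a score equal to the comparison mean lands at the 50th percentile, and a score one standard deviation above the mean lands near the 84th.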


A Note about the LIWC2015 Language Categories
For those who are familiar with LIWC2007, some of the LIWC2015 categories and results will
be a bit jarring. Some of the original categories have been removed, largely due to their
consistently low base rates, low internal reliability, or their infrequent use by researchers:
Past tense verbs
Present tense verbs
Future tense verbs
Inhibition words
Inclusives
Exclusives
Human words

The following is a list of categories that are either a) new to LIWC2015, or b) substantially
different from their counterparts in previous versions. While other LIWC2015 categories may
also be slightly different from those in previous versions, categories from previous versions of
LIWC that are presented in the list below have undergone substantial revision.

Common verbs
Common adjectives
Common comparison words
Interrogatives
Female references
Male references
Cognitive processes
Differentiation words
Drives
Affiliation words
Achievement words
Power words
Risk words
Reward words
Past focus words
Present focus words
Future focus words
Informal language
Netspeak words
Quantifiers

Note that the LIWC2015 application comes with the original internal dictionaries for both
LIWC2001 and LIWC2007 for those who want to rely on older versions of the dictionary as well
as to compare LIWC2015 analyses with those provided by older versions of the software.

LIWC2015: Internal Reliability and External Validity
Assessing the reliability and validity of text analysis programs is a tricky business. On the
surface, one would think that you could determine the internal reliability of a LIWC scale the
same way it is done with a questionnaire. With a questionnaire that taps anger or aggression, for
example, participants complete a self-report asking a number of questions about their feelings or
behaviors related to anger. Reliability coefficients are computed by correlating people’s
responses to the various questions. The more highly they correlate, the reasoning goes, the more
the questionnaire items all measure the same thing. Voila! The scale is deemed internally
consistent.
A similar strategy can be used with words. But be warned: the psychometrics of natural language
use are not as straightforward as with questionnaires. The reason is obvious once you think
about it. Once you say something, you generally don’t need to say it again in the same paragraph
or essay. The nature of discourse, then, is that we usually say something and then move on to the
next topic. Repeating the same idea over and over again is generally bad form in language, yet
this is a staple of self-report questionnaire design. It is important, then, to understand that
acceptable boundaries for natural language reliability coefficients are lower than those commonly
seen elsewhere in psychological tests.
The LIWC Anger scale, for example, is made up of 230 anger-related words and word stems. In
theory, the more that people use one type of anger word in a given text, the more they should use
other anger words in the same text. To test this idea, we can determine the degree to which
people use each of the 230 anger words across a select group of text files and then calculate the
intercorrelations of the word use. Indeed, in Table 1, we include these internal reliability
statistics, including those of Anger, where the alpha reliabilities are .53 (corrected) and .16
(uncorrected), depending on how they are computed. In order to calculate these statistics, each
dictionary word was measured as a percentage of total words per text. These scores were then
entered as an “item” in a standard Cronbach’s alpha calculation, providing raw alpha scores for
each word category, separately for each corpus. Uncorrected alphas in Table 1 are averages of
each corpus’s alpha score. Importantly, the uncorrected method tends to grossly underestimate
reliability in language categories due to the highly variable base rates of word usage within any
given category. Corrected alphas were computed using the Spearman-Brown prediction formula
(Brown, 1910; Spearman, 1910), and are generally a more accurate approximation of each
category’s “true” internal consistency.
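The two reporting steps described above, averaging the per-corpus alphas and applying the Spearman-Brown prediction formula, can be sketched as follows. The manual does not state which lengthening factor was used for the correction. A factor of 6 reproduces the corrected values in Table 1 to rounding (for example, Anger: .16 uncorrected, .53 corrected), but that factor is inferred here from the tabled numbers and should be treated as an assumption.

```python
def mean_alpha(per_corpus_alphas):
    """Average the raw Cronbach's alphas computed separately for each corpus."""
    return sum(per_corpus_alphas) / len(per_corpus_alphas)


def spearman_brown(reliability, length_factor):
    """Spearman-Brown prediction: reliability of a measure lengthened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Assumed lengthening factor, inferred from Table 1 rather than stated in the text.
ASSUMED_FACTOR = 6

anger_corrected = spearman_brown(0.16, ASSUMED_FACTOR)  # about 0.53
```

The same assumed factor maps the uncorrected .05 for total function words to .24 and the uncorrected .55 for Family to .88, in line with Table 1.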
Issues of validity are also a bit tricky. We can have people complete a questionnaire that assesses
their general moods and then have them write an essay which we then subject to the LIWC
program. We can also have judges evaluate the essay for its emotional content. In other words,
we can get self-reported, judged, and LIWC numbers that all reflect a participant’s anger.
One of the first tests of the validity of the LIWC scales was undertaken by Pennebaker and
Francis (1996) as part of an experiment in which first-year college students wrote about the
experience of coming to college. During the writing phase of the study, 72 Introductory
Psychology students met as a group on three consecutive days to write on their assigned topics.
Participants in the experimental condition (n = 35) were instructed to write about their deepest
thoughts and feelings concerning the experience of coming to college. Those in the control
condition (n = 37) were asked to describe any particular object or event of their choosing in an
unemotional way. After the writing phase of the study was completed, four judges rated the
participants’ essays on various emotional, cognitive, content, and composition dimensions
designed to correspond to selected LIWC Dictionary scales. Using LIWC output and judges’
ratings, Pearson correlational analyses were performed to test LIWC’s external validity. The
findings suggested that LIWC successfully measures positive and negative emotions, a number
of cognitive strategies, several types of thematic content, and various language composition
elements. The level of agreement between judges’ ratings and LIWC’s objective word count
strategy provides support for LIWC’s external validity.
Since the first version of LIWC, hundreds of studies have found the LIWC categories to be valid
across dozens of psychological domains. As a starting point for exploring this body of literature,
we recommend a close reading of Tausczik and Pennebaker (2010).


Base Rates of Word Usage
In evaluating any text analysis program, it is helpful to get a sense of the degree to which
language varies across settings. Since 1986, we have been collecting text samples from a variety
of studies – both from our own lab as well as from dozens of others in the United States,
England, Canada, New Zealand, and Australia. For purposes of comparison, texts from several
dozen studies have been analyzed using the updated LIWC2015 dictionary. As can be seen in
Table 2, these analyses reflect the utterances of over 80,000 writers or speakers totaling over 231
million words. We provide a brief description of each dataset below.

Table 2. Summary Information for LIWC2015 Statistics

                             Expressive                  Natural
                       Blogs      writing       Novels       speech     NY Times      Twitter
Total files           37,295        6,179          875        3,232       34,929       35,269
Total authors         37,295        2,510          441        2,174      Unknown       35,269
Total words      119,449,058    2,526,709   57,467,183    2,566,446   26,007,632   23,172,994

Note: All texts for all corpora required a minimum of 25 words for inclusion in our analyses. All texts with fewer
than 25 words were omitted for all statistics reported in this document.

Blogs. This is an expanded version of the corpus described in Schler, Koppel, Argamon, and
Pennebaker (2006). All blog posts were merged by individual prior to analysis, reflecting the
entirety of each person’s blog.
Expressive writing. This dataset consists of 29 samples from experiments where people were
randomly assigned to write either about deeply emotional topics (emotional writing) or about
relatively trivial topics such as plans for the day (control writing). Individuals from all walks of
life – ranging from college students to psychiatric prisoners to elderly and even elementary-aged
individuals – are represented in these studies. Only the emotional writing topics were included in
the current analyses.
Novels. This is a sample of novels acquired from Project Gutenberg (http://www.gutenberg.org/)
that had been tagged as “literature”. All novels were written in the English language by authors
who lived between approximately 1660 and 2008. The number of authors presented in Table 2
reflects only known authors of the works analyzed; works for which the author was unknown
were not included in this figure but were included in the analyses.
Natural speech. The speech samples included diverse transcripts from multiple contexts,
including people wearing audio recorders over days or weeks, strangers interacting in a waiting
room, couples talking about problems, and open-air tape recordings of people in public spaces.
New York Times. A collection of articles published online at the New York Times website
(http://www.nytimes.com). Articles were collected from the New York Times internet archives
and include various types of work, including editorials, features, U.S. and world news, letters to
the editor, and so on. All articles were published between January and July of 2014. Author
information was not preserved for this dataset, so the true number of authors is unknown.
Twitter. Individual Twitter posts (i.e., “tweets”) were collected from the public profiles of users
whose names were entered into the Analyze Words webpage (http://analyzewords.com). Each
user’s tweets were combined into a single unit of observation for analysis.
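Two preprocessing rules recur in the dataset descriptions above: posts by the same person are merged into one unit of observation (Blogs and Twitter), and texts with fewer than 25 words are dropped. A minimal sketch of both, assuming whitespace-delimited word counting and a hypothetical input of (author, text) pairs, could look like this:

```python
from collections import defaultdict

MIN_WORDS = 25  # minimum text length stated in the note to Table 2


def merge_by_author(posts, min_words=MIN_WORDS):
    """Combine each author's posts into one text and drop short texts.

    posts is an iterable of (author_id, text) pairs. Words are counted by
    splitting on whitespace, which is an assumption of this sketch.
    """
    collected = defaultdict(list)
    for author, text in posts:
        collected[author].append(text)
    merged = {author: " ".join(texts) for author, texts in collected.items()}
    return {a: t for a, t in merged.items() if len(t.split()) >= min_words}
```

Merging before applying the length cutoff means that an author with many short posts can still contribute a usable text, which is the order assumed here; the manual does not say which order was used.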
As can be seen in Table 3, the LIWC2015 version captures, on average, roughly 85 percent of the
words people use in writing and speech. Note that except for total word count and words per
sentence and the four summary variables (Analytic, Clout, Authentic, and Tone), all means in
Table 3 are expressed as percentage of total words used in any given language sample. Simple
statistical tests indicate that nearly all language categories differ significantly between contexts.

Table 3. LIWC2015 Output Variable Information

                                    Expressive                Natural                                Grand       Mean
Category                      Blogs    writing     Novels     speech   NY Times    Twitter      means        SDs
Linguistic Processes
Word count (mean)           3206.45     408.94   65716.49     794.17     744.62     660.24   11921.82   10274.32
Analytic                      49.89      44.88      70.33      18.43      92.57      61.94      56.34      17.58
Clout                         47.87      37.02      75.37      56.27      68.17      63.02      57.95      17.51
Authentic                     60.93      76.01      21.56      61.32      24.84      50.39      49.17      20.92
Tone                          54.50      38.60      37.06      79.29      43.61      72.24      54.22      23.27
Words/sentence*               18.40      18.42      16.13          -      21.94      12.10      17.40      16.38
Words>6 letters               14.38      13.62      16.30      10.42      23.58      15.31      15.60       3.76
Dictionary words              85.79      91.93      84.52      91.60      74.62      82.60      85.18       5.36
Total function words          53.10      58.27      54.51      56.86      42.39      46.08      51.87       5.13
Total pronouns                16.20      18.03      15.15      20.92       7.41      13.62      15.22       3.61
Personal pronouns             10.66      12.74      10.35      13.37       3.56       9.02       9.95       3.02
1st pers singular              6.26       8.66       2.63       7.03       0.63       4.75       4.99       2.46
1st pers plural                0.91       0.81       0.61       0.87       0.38       0.74       0.72       0.83
2nd person                     1.32       0.68       1.39       4.04       0.34       2.41       1.70       1.35
3rd pers singular              1.50       2.01       4.80       0.77       1.53       0.64       1.88       1.53
3rd pers plural                0.68       0.57       0.92       0.65       0.68       0.47       0.66       0.60
Impersonal pronouns            5.53       5.28       4.79       7.53       3.84       4.60       5.26       1.62
Articles                       6.00       5.70       8.35       4.34       9.08       5.58       6.51       1.79
Prepositions                  12.60      14.27      14.27      10.29      14.27      11.88      12.93       2.11
Auxiliary verbs                8.75       9.25       7.77      12.03       5.11       8.27       8.53       2.04
Adverbs                        5.88       6.02       4.17       7.67       2.76       5.13       5.27       1.61
Conjunctions                   6.43       7.46       6.28       6.21       4.85       4.19       5.90       1.57
Negations                      1.81       1.69       1.68       2.42       0.62       1.74       1.66       0.86
Other Grammar
Common verbs                  17.03      18.63      15.42      21.01      10.23      16.33      16.44       2.93
Common adjectives              4.53       4.52       4.36       4.13       4.52       4.89       4.49       1.30
Comparisons                    2.17       2.42       2.13       2.35       2.39       1.89       2.23       0.95
Interrogatives                 1.51       1.49       1.53       2.44       1.26       1.43       1.61       0.76
Number                         1.89       1.87       1.23       2.19       3.55       1.98       2.12       2.07
Quantifiers                    2.27       2.35       1.80       1.93       1.94       1.85       2.02       0.83
Psychological Processes
Affective processes            5.79       4.77       4.81       6.54       3.82       7.67       5.57       1.99
Positive emotion               3.66       2.57       2.67       5.31       2.32       5.48       3.67       1.63
Negative emotion               2.06       2.12       2.08       1.19       1.45       2.14       1.84       1.09
Anxiety                        0.27       0.50       0.44       0.14       0.25       0.24       0.31       0.32
Anger                          0.68       0.49       0.51       0.36       0.47       0.75       0.54       0.59
Sadness                        0.44       0.50       0.55       0.23       0.29       0.43       0.41       0.40
Social processes^b             8.95       8.69      12.26      10.42       7.62      10.47       9.74       3.38
Family                         0.46       0.77       0.39       0.31       0.33       0.36       0.44       0.63
Friends                        0.40       0.55       0.25       0.37       0.18       0.43       0.36       0.40
Female references              0.91       1.37       1.88       0.55       0.62       0.54       0.98       1.26
Male references                1.31       1.47       4.09       0.80       1.38       0.84       1.65       1.34
Cognitive processes           11.58      12.52       9.84      12.27       7.52       9.96      10.61       3.02
Insight                        2.28       2.66       2.11       2.46       1.54       1.92       2.16       1.08
Causation                      1.46       1.65       1.03       1.45       1.42       1.41       1.40       0.73
Discrepancy                    1.56       1.74       1.48       1.45       0.89       1.54       1.44       0.80
Tentative                      2.82       2.89       2.27       3.06       1.74       2.35       2.52       1.09
Certainty                      1.56       1.51       1.45       1.38       0.76       1.43       1.35       0.70
Differentiation                3.31       3.40       2.82       3.73       2.03       2.62       2.99       1.18
Perceptual processes           2.58       2.38       3.74       2.11       2.42       2.96       2.70       1.20
See                            1.04       0.80       1.58       0.78       0.88       1.39       1.08       0.78
Hear                           0.75       0.48       1.26       0.63       1.06       0.82       0.83       0.62
Feel                           0.64       0.92       0.76       0.61       0.35       0.56       0.64       0.52
Biological processes           2.16       2.59       2.17       1.23       1.44       2.60       2.03       1.39
Body                           0.74       0.69       1.24       0.31       0.41       0.77       0.69       0.64
Health                         0.61       0.93       0.48       0.38       0.57       0.54       0.59       0.65
Sexual                         0.17       0.09       0.08       0.09       0.10       0.24       0.13       0.30
Ingestion                      0.54       0.86       0.39       0.35       0.41       0.86       0.57       0.83
Drives                         6.87       7.35       5.84       6.39       7.60       7.50       6.93       2.03
Affiliation                    2.20       2.45       1.39       2.06       1.69       2.53       2.05       1.28
Achievement                    1.27       1.37       0.91       0.99       1.82       1.45       1.30       0.82
Power                          2.07       2.02       2.46       1.72       3.62       2.17       2.35       1.12
Reward                         1.49       1.56       1.04       1.73       1.07       1.86       1.46       0.81
Risk                           0.46       0.54       0.53       0.30       0.56       0.46       0.47       0.41
Time orientations
Past focus                     4.25       5.83       7.06       3.78       4.09       2.81       4.64       2.06
Present focus                 10.95      10.45       6.21      15.28       5.14      11.74       9.96       2.80
Future focus                   1.60       1.85       1.19       1.45       0.80       1.60       1.42       0.90
Relativity                    14.23      16.19      14.56      12.12      14.47      13.99      14.26       3.18
Motion                         2.15       2.58       2.34       2.20       1.70       1.94       2.15       1.03
Space                          6.43       6.96       7.82       5.86       7.76       6.51       6.89       1.96
Time                           5.86       7.01       4.71       4.28       5.17       5.75       5.46       1.81
Personal Concerns
Work                           2.04       2.64       1.20       2.87       4.49       2.16       2.56       1.81
Leisure                        1.50       1.17       0.56       1.11       1.67       2.11       1.35       1.08
Home                           0.49       0.99       0.56       0.34       0.47       0.43       0.55       0.63
Money                          0.59       0.41       0.45       0.44       1.47       0.74       0.68       0.83
Religion                       0.39       0.20       0.34       0.14       0.25       0.35       0.28       0.57

Category                      Blogs Expressive     Novels    Natural   NY Times    Twitter      Grand       Mean
                                       writing                speech                            Means        SDs
Death                          0.15       0.12       0.26       0.04       0.22       0.19       0.16       0.29
Informal Language              2.09       0.45       0.53       7.10       0.29       4.68       2.52       1.65
Swear words                    0.35       0.09       0.05       0.25       0.02       0.49       0.21       0.37
Netspeak                       0.92       0.05       0.10       1.35       0.16       3.23       0.97       1.17
Assent                         0.33       0.10       0.14       3.29       0.05       1.82       0.95       0.72
Nonfluencies                   0.42       0.17       0.24       1.96       0.07       0.39       0.54       0.49
Fillers                        0.11       0.04       0.01       0.46       0.00       0.04       0.11       0.27
Punctuation*
Total Punctuation             24.18      12.41      23.68          -      19.02      27.46      21.35       9.01
Periods                       10.29       6.17       6.04          -       5.88       9.07       7.49       3.76
Commas                         4.15       3.17       7.09          -       6.60       2.76       4.75       1.94
Colons                         0.43       0.21       0.12          -       0.27       2.15       0.64       0.85
Semicolons                     0.10       0.04       0.53          -       0.17       0.67       0.30       0.53
Question marks                 0.59       0.15       0.60          -       0.15       1.40       0.58       1.00
Exclamation marks              1.16       0.12       0.49          -       0.02       3.21       1.00       1.35
Dashes                         0.99       0.39       2.14          -       1.23       1.21       1.19       1.38
Quotation marks                0.71       0.22       3.90          -       2.23       1.30       1.67       1.36
Apostrophes                    3.85       1.40       2.19          -       1.56       3.32       2.46       4.94
Parentheses                    0.90       0.32       0.06          -       0.54       0.81       0.53       0.87
Other punctuation              1.00       0.23       0.52          -       0.36       1.56       0.73       1.70

Notes: Grand Means are the unweighted means of the six genres; Mean SDs refer to the unweighted mean of the
standard deviations across the six genre categories.
*In calculating grand means and standard deviations for the words per sentence (WPS) and punctuation categories,
the natural speech corpus was excluded due to differing transcription rules across documents.
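
As a worked illustration of the notes above, the following Python sketch recomputes two Grand Means from the genre means reported in Table 3. It is only a minimal sketch: the function name `grand_mean` and the use of `None` to mark the excluded natural speech corpus are assumptions made for this example, not part of LIWC2015.

```python
from statistics import mean

# Genre means copied from Table 3, in the order of the six genre columns.
# None marks the natural speech corpus, which has no punctuation value
# (an encoding assumed for this sketch).
death_means = [0.15, 0.12, 0.26, 0.04, 0.22, 0.19]
total_punctuation_means = [24.18, 12.41, 23.68, None, 19.02, 27.46]


def grand_mean(genre_means):
    """Unweighted mean across genres, skipping genres with no value."""
    available = [m for m in genre_means if m is not None]
    return round(mean(available), 2)


print(grand_mean(death_means))              # 0.16, six genres contribute
print(grand_mean(total_punctuation_means))  # 21.35, five genres contribute
```

Both results agree with the Grand Means column of Table 3. Because the mean is unweighted, each genre counts equally regardless of how many text files it contains.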

In many ways, Table 3 points to the important role that context plays in people’s use of
language. Not surprisingly, the topics of writing – as reflected in the current concerns category –
vary substantially as a function of genre. More striking, however, are the large differences in
people’s use of function words as well as punctuation from genre to genre (cf. Biber, 1988).

Comparing LIWC2015 with LIWC2007
For users of LIWC2007, a new edition of LIWC that uses a different dictionary can be an
unsettling experience. Most of the older dictionaries have been slightly changed, some have been
substantially reworked (e.g., social words, cognitive process words), and several others have
been removed or added. To assist in the transition to the new version of LIWC, we include Table
4, which lists the means, standard deviations, and correlations between the two dictionary
versions. These analyses are based on the corpora detailed in Tables 2 and 3. All numbers
presented in Table 4 are the average results from all six corpora.
To get a sense of how much a dictionary has changed from the LIWC2007 to the LIWC2015
versions, look at the LIWC2015/2007 Correlation column. The lower the correlation, the more
change across the two versions.
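
One way such a correlation could be computed is sketched below in Python. This is an illustration under stated assumptions, not the authors' actual procedure: it assumes that each corpus supplies paired per-document scores for one category from the two dictionary versions, that a Pearson correlation is computed within each corpus, and that the corpus-level correlations are then averaged. The names `pearson_r` and `average_correlation` and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_correlation(corpora):
    """Mean of the per-corpus correlations (hypothetical helper).

    `corpora` maps a corpus name to a pair of lists: per-document
    LIWC2015 scores and per-document LIWC2007 scores for one category.
    """
    return mean(pearson_r(new, old) for new, old in corpora.values())


# Toy per-document scores, invented purely for illustration.
toy_corpora = {
    "corpus_a": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
    "corpus_b": ([0.5, 1.5, 1.0, 2.0], [0.4, 1.2, 1.3, 1.9]),
}
print(round(average_correlation(toy_corpora), 2))
```

A category whose word list barely changed would yield a value near 1.00 under this scheme, while a heavily reworked category would yield a noticeably lower value.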

Table 4. Comparisons Between LIWC2015 and LIWC2007: Means, Standard Deviations,
and Correlations

LIWC Dimension            Output Label    LIWC2015    LIWC2007  LIWC2015/2007
                                              mean        mean   Correlation¹
Word count                WC             11,921.82   11,852.99           1.00
Summary Variables
Analytical thinking       Analytic           56.34           -              -
Clout                     Clout              57.95           -              -
Authentic                 Authentic          49.17           -              -
Emotional tone            Tone               54.22           -              -
Language Metrics
Words per sentence*       WPS                17.40       25.07           0.74
Words>6 letters           Sixltr             15.60       15.89           0.98
Dictionary words          Dic                85.18       83.95           0.94
Function Words            function           51.87       54.29           0.95
Total pronouns            pronoun            15.22       14.99           0.99
Personal pronouns         ppron               9.95        9.83           0.99
1st pers singular         i                   4.99        4.97           1.00
1st pers plural           we                  0.72        0.72           1.00
2nd person                you                 1.70        1.61           0.98
3rd pers singular         shehe               1.88        1.87           1.00
3rd pers plural           they                0.66        0.66           0.99
Impersonal pronouns       ipron               5.26        5.17           0.99
Articles                  article             6.51        6.53           0.99
Prepositions              prep               12.93       12.59           0.96
Auxiliary verbs           auxverb             8.53        8.82           0.96
Common adverbs            adverb              5.27        4.83           0.97
Conjunctions              conj                5.90        5.87           0.99
Negations                 negate              1.66        1.72           0.96
Other Grammar
Regular verbs             verb               16.44       15.26           0.72
Adjectives                adj                 4.49           -              -
Comparatives              compare             2.23           -              -
Interrogatives            interrog            1.61           -              -
Numbers                   number              2.12        1.98           0.98
Quantifiers               quant               2.02        2.48           0.88
Affect Words              affect              5.57        5.63           0.96
Positive emotion          posemo              3.67        3.75           0.96
Negative emotion          negemo              1.84        1.83           0.96
Anxiety                   anx                 0.31        0.33           0.94
Anger                     anger               0.54        0.60           0.97
Sadness                   sad                 0.41        0.39           0.92
Social Words              social              9.74        9.36           0.96
Family                    family              0.44        0.38           0.94
Friends                   friend              0.36        0.23           0.78
Female referents          female              0.98           -              -
Male referents            male                1.65           -              -
Cognitive Processes²      cogproc            10.61       14.99           0.84
Insight                   insight             2.16        2.13           0.98
Cause                     cause               1.40        1.41           0.97
Discrepancies             discrep             1.44        1.45           0.99
Tentativeness             tentat              2.52        2.42           0.98
Certainty                 certain             1.35        1.27           0.92
Differentiation³          differ              2.99        2.48           0.85
Perceptual Processes      percept             2.70        2.36           0.92
Seeing                    see                 1.08        0.87           0.88
Hearing                   hear                0.83        0.73           0.94
Feeling                   feel                0.64        0.62           0.92
Biological Processes      bio                 2.03        1.88           0.94
Body                      body                0.69        0.68           0.96
Health/illness            health              0.59        0.53           0.87
Sexuality                 sexual              0.13        0.28           0.76
Ingesting                 ingest              0.57        0.46           0.94
Drives and Needs          drives              6.93           -              -
Affiliation               affiliation         2.05           -              -
Achievement               achieve             1.30        1.56           0.93
Power                     power               2.35           -              -
Reward focus              reward              1.46           -              -
Risk focus                risk                0.47           -              -
Time Orientations⁴
Past focus                focuspast           4.64        4.14           0.97
Present focus             focuspresent        9.96        8.10           0.92
Future focus              focusfuture         1.42        1.00           0.63
Relativity                relativ            14.26       13.87           0.98
Motion                    motion              2.15        2.06           0.93
Space                     space               6.89        6.17           0.96
Time                      time                5.46        5.79           0.94
Personal Concerns
Work                      work                2.56        2.27           0.97
Leisure                   leisure             1.35        1.37           0.95
Home                      home                0.55        0.56           0.99
Money                     money               0.68        0.70           0.97
Religion                  relig               0.28        0.32           0.96
Death                     death               0.16        0.16           0.96
Informal Speech           informal            2.52           -              -
Swear words               swear               0.21        0.17           0.89
Netspeak                  netspeak            0.97           -              -
Assent                    assent              0.95        1.11           0.68
Nonfluencies              nonfl               0.54        0.30           0.84
Fillers                   filler              0.11        0.40           0.29
All Punctuation*          Allpunc            21.35       21.65           0.98
Periods                   Period              7.49        7.56           0.98
Commas                    Comma               4.75        4.75           1.00
Colons                    Colon               0.64        0.73           0.98
Semicolons                SemiC               0.30        0.29           0.97
Question marks            QMark               0.58        0.58           1.00
Exclamation marks         Exclam              1.00        1.00           1.00
Dashes                    Dash                1.19        1.21           0.98
Quotation marks           Quote               1.67        1.64           0.93
Apostrophes               Apostro             2.46        2.52           0.94
Parentheses (pairs)       Parenth             0.53        0.63           0.90
Other punctuation         OtherP              0.73        0.72           0.95

* Due to differences in punctuation rules for transcriptions, the natural speech corpus was excluded when
computing means and correlations for punctuation categories as well as words per sentence.
¹ Correlation is the average correlation between the 2007 and 2015 dictionaries across six corpora. Low correlations
(<.80) are to be expected due to the large category differences between the two versions.
² Cognitive processes is conceptually similar to the cognitive mechanisms LIWC2007 category. The newer cognitive
process dimension restricts constituent words to true markers of cognitive activity.
³ Differentiation is conceptually similar to the 2007 exclusive category.
⁴ Time Orientation categories are similar to the 2007 categories past, present, and future but are more unified to
reflect a general time orientation instead of just verb tense usage.

LIWC Dictionary Translations
The LIWC dictionaries have been translated into several languages, including Spanish, German,
Dutch, Norwegian, Italian, and Portuguese. Several other language translations are underway,
including Arabic, Korean, Turkish, and Chinese. To date, these translations have relied on the
LIWC2001 or LIWC2007 dictionaries rather than LIWC2015.
Unlike previous versions of LIWC, the current version is bundled exclusively with the original
English dictionary versions. LIWC dictionary translations, as well as other published
dictionaries, will be made available at the official LIWC dictionary repository
(http://www.liwc.net/dictionaries). If you would like to build a non-English LIWC2015
dictionary, or if you have built one independently and would like to add it to the repository,
contact the first author at pennebaker@mail.utexas.edu.

Helpful References
Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in
formal written texts. Text, 23, 321-346.

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the
author of an anonymous text. Communications of the Association for Computing Machinery (CACM),
52, 119-123.

Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX Lexical Database (Release
I) [CD ROM]. Philadelphia: Linguistic Data Consortium, University of Pennsylvania.

Back, M. D., Küfner, A. C., & Egloff, B. (2011). “Automatic or the people?” Anger on
September 11, 2001, and lessons learned for the analysis of large digital data sets.
Psychological Science, 22, 837-838.

Baddeley, J. L., Daniel, G. R., & Pennebaker, J. W. (2011). How Henry Hellyer’s use of
language foretold his suicide. Crisis, 32, 288-292.

Bazarova, N. N., Taft, J. G., Choi, Y. H., & Cosley, D. (2012). Managing impressions and
relationships on Facebook: Self-presentational and relational concerns revealed through
the analysis of language style. Journal of Language and Social Psychology, 32, 121-141.

Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.

Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’
conception of time. Cognitive Psychology, 43, 1-22.

Bosson, J. K., Swann, W. B., Jr., & Pennebaker, J. W. (2000). Stalking the perfect measure of
implicit self-esteem: The blind men and the elephant revisited? Journal of Personality
and Social Psychology, 79, 631-643.

Boyd, R. L. (2015). MEH: Meaning Extraction Helper [Software]. Available from
http://meh.ryanb.cc

Boyd, R. L., & Pennebaker, J. W. (2015). Did Shakespeare write Double Falsehood? Identifying
individuals by creating psychological signatures with text analysis. Psychological
Science, 26, 570-582.

Brewer, M. B., & Gardner, W. (1996). Who is this “We”? Levels of collective identity and self
representations. Journal of Personality & Social Psychology, 71, 83-93.

Brown, R. (1968). Words and things: An introduction to language. NY: Free Press.

Bruner, J. S. (1973). Beyond the information given: Studies in the psychology of knowing.
London: W. W. Norton.

Bucci, W. (1995). The power of the narrative: A multiple code account. In J. W. Pennebaker
(Ed.), Emotion, Disclosure, and Health (pp. 93-122). Washington, DC: American
Psychological Association.

Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space:
Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.

Carey, A. L., Brucks, M. S., Küfner, A. C. P., Holtzman, N. S., Deters, F. G., Back, M. D.,
Donnellan, M. B., et al. (2015). Narcissism and the use of personal pronouns revisited.
Journal of Personality and Social Psychology, 109, 1-15.

Campbell, R. S., & Pennebaker, J. W. (2003). The secret life of pronouns: Flexibility in writing
style and physical health. Psychological Science, 14, 60-65.

Chambers, J. K., Trudgill, P., & Schilling-Estes, N. (2004). The handbook of language variation
and change. London: Blackwell.

Chung, C. K., & Pennebaker, J. W. (2013). Using computerized text analysis to track social
processes. In T. Holtgraves (Ed.), Handbook of language and social psychology (pp.
219-23). New York, NY: Oxford.

Chung, C. K., & Pennebaker, J. W. (2012). Linguistic inquiry and word count (LIWC):
Pronounced “Luke,”... and other useful facts. In P. M. McCarthy & C. Boonthum-Denecke
(Eds.), Applied natural language processing: Identification, investigation and
resolution (pp. 206-229). Hershey, PA: IGI Global.

Chung, C. K., & Pennebaker, J. W. (2005). Assessing quality of life through natural language
use: Implications of computerized text analysis. In W. R. Lenderking and D. A. Revicki
(Eds.), Advancing health outcomes research methods and clinical applications (pp.
79-94). Washington, DC: Degnon Associates.

Chung, C. K., & Pennebaker, J. W. (2007). The psychological functions of function words. In K.
Fiedler (Ed.), Social communication (pp. 343-359). New York, NY: Psychology Press.

Chung, C. K., & Pennebaker, J. W. (2008). Revealing dimensions of thinking in open-ended
self-descriptions: An automated meaning extraction method for natural language.
Journal of Research in Personality, 42, 96-132.

Cohn, M. A., Mehl, M. R., & Pennebaker, J. W. (2004). Linguistic markers of psychological
change surrounding September 11, 2001. Psychological Science, 15, 687-93.

Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems.
Journal of Machine Learning Research, 3, 951-991.

Damasio, A. R. (1995). Descartes' error: Emotion, reason and the human brain. NY: Harper
Collins.

Davison, K. P., Pennebaker, J. W., & Dickerson, S. S. (2000). Who talks? The social
psychology of illness support groups. American Psychologist, 55, 205-217.

De Choudhury, M., Counts, S., & Horvitz, E. (2013, April). Predicting postpartum changes in
emotion and behavior via social media. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems (pp. 3267-3276). ACM.

Feixas, G., Geldschlager, H., & Neimeyer, R. A. (2002). Content analysis of personal
constructs. Journal of Constructivist Psychology, 15, 1-19.

Fiedler, K., & Semin, G. R. (1992). Attribution and language as a socio-cognitive environment.
In G. R. Semin and K. Fiedler (Eds.), Language, Interaction, and Social Cognition (pp.
58-78). Thousand Oaks, CA: Sage Publications, Inc.

Fitzsimons, G. M., & Kay, A. C. (2004). Language and interpersonal cognition: Causal effects
of variations in pronoun usage on perceptions of closeness. Personality and Social
Psychology Bulletin, 30, 547-557.

Foltz, P. W. (1996). Latent semantic analysis for text-based research. Behavior Research
Methods, Instruments & Computers, 28, 197-202.

Francis, W. N., & Kucera, H. (1982). Frequency analyses of English usage: Lexicon and
grammar. MA: Houghton Mifflin.

Gazzaniga, M. S. (2005). The ethical brain. NY: Dana Press.

Genkin, A., Lewis, D. D., & Madigan, D. (2006). Large-scale Bayesian logistic regression for
text categorization. Technometrics, 49, 291-304.

Gill, A. (2003). Personality and language. The projection and perception of personality in
computer mediated communication. Unpublished doctoral dissertation. University of
Edinburgh, Scotland.

Gill, A. J., Oberlander, J., & Austin, E. (2006). The perception of e-mail personality at
zero-acquaintance. Personality and Individual Differences, 40, 497-507.

Gortner, E. M., & Pennebaker, J. W. (2003). The anatomy of a disaster: Media coverage and
community-wide health effects of the Texas A&M Bonfire tragedy. Journal of Social
and Clinical Psychology, 22, 580-603.

Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In C.
W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical
inferences from texts and transcripts (pp. 117-129). Mahwah, NJ: Erlbaum.

Gottschalk, L. A., & Gleser, G. C. (1969). The measurement of psychological states through the
content analysis of verbal behavior. CA: University of California Press.

Graesser, A. C., Gernsbacher, M. A., & Goldman, S. R. (2003). Introduction to the Handbook of
Discourse Processes. In A. C. Graesser, M. A. Gernsbacher, and S. R. Goldman (Eds.),
Handbook of discourse processes (pp. 1-23). Mahwah, NJ: Lawrence Erlbaum
Associates.

Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H., Ventura, M., Olney, A., & Louwerse, M.
M. (2004). AutoTutor: A tutor with dialogue in natural language. Behavior
Research Methods, Instruments, and Computers, 36, 180-193.

Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of
text on cohesion and language. Behavior Research Methods, Instruments &
Computers, 36, 193-202.

Graham, L. E., Scherwitz, L., & Brand, R. (1989). Self reference and coronary heart disease
incidence in the Western Collaborative Group Study. Psychosomatic Medicine, 51,
137-144.

Graybeal, A., Seagal, J. D., & Pennebaker, J. W. (2002). The role of story-making in disclosure
writing: The psychometrics of narrative. Psychology and Health, 17, 571-581.

Groom, C. J., & Pennebaker, J. W. (2005). The language of love: Sex, sexual orientation, and
language use in online personal advertisements. Sex Roles, 52, 447-461.

Groom, C. J., & Pennebaker, J. W. (2002). Words. Journal of Research in Personality, 36,
615-621.

Hajek, C., & Giles, H. (2003). New directions in intercultural communication competence. In J.
O. Greene and B. R. Burleson (Eds.), Handbook of communication and social
interaction skills (pp. 935-957). Mahwah, NJ: Lawrence Erlbaum Associates,
Publishers.

Halliday, M. A. K., & Matthiessen, C. (2004). An introduction to functional grammar (3rd ed.).
London: Arnold.

Hart, R. P., Jarvis, S. E., Jennings, W. P., & Smith-Howell, D. (2005). Political keywords: Using
language that uses us. NY: Oxford University Press.

Hartley, J., Pennebaker, J. W., & Fox, C. (2003). Using new technology to assess the academic
writing styles of male and female pairs and individuals. Journal of Technical Writing
and Communication, 33, 243-261.

Hartley, J., Sotto, E., & Pennebaker, J. W. (2003). Speaking versus typing: A case-study of the
effects of using voice-recognition software on academic correspondence. British
Journal of Educational Technology, 34, 5-16.

Hartley, J., Sotto, E., & Pennebaker, J. W. (2002). Style and substance in psychology: Are
influential articles more readable than less influential ones? Social Studies of Science,
32, 321-334.

Heberlein, A. S., Adolphs, R., Pennebaker, J. W., & Tranel, D. (2003). Effects of damage to
right-hemisphere brain structures on spontaneous emotional and social judgments.
Political Psychology, 24, 705-726.

Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of Research
in Personality, 45, 92-99.

Holtzman, N. S., Vazire, S., & Mehl, M. R. (2010). Sounds like a narcissist: Behavioral
manifestations of narcissism in everyday life. Journal of Research in Personality, 44,
478-484.

Ireland, M. E., & Henderson, M. D. (2014). Language style matching, engagement, and impasse
in negotiations. Negotiation and Conflict Management Research, 7, 1-16.

Ireland, M. E., Slatcher, R. B., Eastwick, P. W., Scissors, L. E., Finkel, E. J., & Pennebaker, J.
W. (2011). Language style matching predicts relationship initiation and stability.
Psychological Science, 22, 39-44.

Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., & Graesser, A. C. (2013). Pronoun use
reflects standings in social hierarchies. Journal of Language and Social Psychology, 33,
125-143.

Kanagawa, C., Cross, S. E., & Markus, H. R. (2001). "Who am I?" The cultural psychology of
the conceptual self. Personality and Social Psychology Bulletin, 27, 90-103.

Kashima, E. S., & Kashima, Y. (1998). Culture and language: The case of cultural dimensions
and personal pronoun use. Journal of Cross-Cultural Psychology, 29, 461-486.

Kashima, E. S., & Kashima, Y. (2005). Erratum to Kashima and Kashima (1998) and reiteration.
Journal of Cross-Cultural Psychology, 36, 396-400.

Koppel, M., Schler, J., & Zigdon, K. (2005, August). Determining an author's native language by
mining a text for errors. In Proceedings of the eleventh ACM SIGKDD international
conference on Knowledge discovery in data mining (pp. 624-628). ACM.

Koppel, M., Schler, J., Argamon, S., & Pennebaker, J. W. (2006). Effects of age and gender on
blogging. Presented at AAAI 2006 Spring Symposium on Computational Approaches to
Analysing Weblogs, Stanford, CA, March 2006.

Lee, Chang H., Nam, K., & Pennebaker, J. W. (2004). Is writing as much phonological as
speaking? Homophone usage across speaking and writing. Psychologia: An
International Journal of Psychology in the Orient, 47, 1-9.

Lepore, S. J., & Smyth, J. M. (2002). The writing cure: How expressive writing promotes health
and emotional well-being. Washington: American Psychological Association.

Li, J., Zheng, R., & Chen, H. (2006). From fingerprint to writeprint. Communications of the
ACM, 49, 76-82.

Liehr, P., Mehl, M. R., Summers, L. C., & Pennebaker, J. W. (2004). Connecting with others in
the midst of a stressful upheaval on September 11, 2001. Applied Nursing Research, 17,
2-9.

Liehr, P., Takahashi, R., Nishimura, C., Frazier, L., Kuwajima, I., & Pennebaker, J. W. (2002).
Embodied language: Comparison of the cardiac and stroke health experience for
Japanese elders. Journal of Nursing Scholarship, 34, 27-32.

Lyons, E. J., Mehl, M. R., & Pennebaker, J. W. (2006). Linguistic self-presentation in anorexia:
Differences between pro-anorexia and recovering anorexia internet language use.
Journal of Psychosomatic Research, 60, 253-256.

Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion,
and motivation. Psychological Review, 98, 224-253.

McAdams, D. P. (2001). The psychology of life stories. Review of General Psychology, 5,
100-122.

Mehl, M. R., & Pennebaker, J. W. (2003). The social dynamics of a cultural upheaval: Social
interactions surrounding September 11, 2001. Psychological Science, 14, 579-85.

Mehl, M. R., & Pennebaker, J. W. (2003). The sounds of social life: A psychometric analysis of
students’ daily social environments and conversations. Journal of Personality and
Social Psychology, 84, 857-870.

Mehl, M. R., Robbins, M. L., & Holleran, S. E. (2012). How taking a word for a word can be
problematic: Context-dependent linguistic markers of extraversion and neuroticism.
Journal of Methods and Measurement in the Social Sciences, 3, 30-50.

Miller, G. A. (1995). The Science of Words. NY: Scientific American Library.

Mitchell, T. (1999). Machine Learning. NY: McGraw-Hill.

Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender
differences in language use: An analysis of 14,000 text samples. Discourse Processes,
45, 211-236.

Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying
words: Predicting deception from linguistic style. Personality and Social Psychology
Bulletin, 29, 665-675.

Niederhoffer, K. G., & Pennebaker, J. W. (2002). Linguistic style matching in social interaction.
Journal of Language and Social Psychology, 21, 337-360.

Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently.
NY: Free Press.

Oberlander, J., & Gill, A. J. (2006). Language with character: A stratified corpus comparison of
individual differences in e-mail communication. Discourse Processes, 42, 239-270.

Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction.
American Psychologist, 54, 741-754.

Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process.
Psychological Science, 8, 162-166.

Pennebaker, J. W. (2002). What our words can say about us: Towards a broader language
psychology. Psychological Science Agenda, 15, 8-9.

Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count
(LIWC): LIWC2007. Austin, TX: LIWC.net.

Pennebaker, J. W., Booth, R. J., Boyd, R. L., & Francis, M. E. (2015). Linguistic Inquiry and
Word Count: LIWC2015. Austin, TX: Pennebaker Conglomerates (www.LIWC.net).

Pennebaker, J. W., & Campbell, R. S. (2000). The effects of writing about traumatic experience.
Clinical Quarterly, 9, 17-21.

Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). When
small words foretell academic success: The case of college admissions essays. PloS
One, 9, 1-10.

Pennebaker, J. W., & Chung, C. K. (2005). Tracking the social dynamics of responses to
terrorism: Language, behavior, and the Internet. In S. Wessely and V. N. Krasnov
(Eds.), Psychological responses to the new terrorism: A NATO-Russia dialogue (pp.
159-170). Amsterdam, Holland: IOS Press.

Pennebaker, J. W., & Graybeal, A. (2001). Patterns of natural language use: Disclosure,
personality, and social integration. Current Directions in Psychological Science, 10,
90-93.

Pennebaker, J. W., & Lee, Chang H. (2002). The power of words in social, clinical, and
personality psychology. The Korean Journal of Thinking and Problem Solving, 12,
35-43.

Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in
disclosure. Cognition and Emotion, 10, 601-626.

Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic Inquiry and Word Count
(LIWC): LIWC2001. Mahwah: Lawrence Erlbaum Associates.

Pennebaker, J. W., Groom, C. J., Loew, D., & Dabbs, J. M. (2004). Testosterone as a social
inhibitor: Two case studies of the effect of testosterone treatment on language. Journal
of Abnormal Psychology, 113, 172-175.

Pennebaker, J. W., & Ireland, M. (2008). Analyzing words to understand literature. In W. van
Peer and J. Auracher (Eds.), New beginnings for the study of literature (pp. 24-48).
Cambridge, UK: Cambridge Scholars Publishing.

Pennebaker, J. W., & Ireland, M. E. (2011). Using literature to understand authors: The case for
computerized text analysis. Scientific Study of Literature, 1, 34-48.

Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual
difference. Journal of Personality & Social Psychology, 77, 1296-1312.

Pennebaker, J. W., Mayne, T., & Francis, M. E. (1997). Linguistic predictors of adaptive
bereavement. Journal of Personality and Social Psychology, 72, 863-871.

Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. (2003). Psychological aspects of natural
language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.

Pennebaker, J. W., Slatcher, R. B., & Chung, C. K. (2005). Linguistic markers of psychological
state through media interviews: John Kerry and John Edwards in 2004, Al Gore in
2000. Analysis of Social and Public Policy, 5, 1-9.

Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the lifespan.
Journal of Personality and Social Psychology, 85, 291-301.

Ramirez-Esparza, N., & Pennebaker, J. W. (2006). Do good stories produce good health?
Exploring words, language, and culture. Narrative Inquiry, 16, 211-219.

Ramirez-Esparza, N., Pennebaker, J. W., Garcia, F. A., & Suria, R. (2007). La psicología del
uso de las palabras: Un programa de computadora que analiza textos en Español (The
psychology of word use: A computer program that analyzes texts in Spanish). Revista
Mexicana de Psicología, 24, 85-99.

Robinson, R. L., Navea, R., & Ickes, W. (2013). Predicting final course performance from
students’ written self-introductions: A LIWC analysis. Journal of Language and Social
Psychology, 32, 469-479.

Rochon, E., Saffran, E. M., Berndt, R. S., & Schwartz, M. F. (2000). Quantitative analysis of
aphasic sentence production: Further development and new data. Brain and Language,
72, 193-218.

Rosenberg, S. D., & Tucker, G. J. (1978). Verbal behavior and schizophrenia: The semantic
dimension. Archives of General Psychiatry, 36, 1331-1337.

Rude, S. S., Gortner, E. M., & Pennebaker, J. W. (2004). Language use of depressed and
depression-vulnerable college students. Cognition & Emotion, 18, 1121-1133.

Sbarra, D. A., Smith, H. L., & Mehl, M. R. (2012). When leaving your ex, love yourself:
Observational ratings of self-compassion predict the course of emotional recovery
following marital separation. Psychological Science, 23, 261-269.

Scherwitz, L., Berton, K., & Leventhal, H. (1978). Type A behavior, self-involvement, and
cardiovascular response. Psychosomatic Medicine, 40, 593-609.

Schiller, R., Tellegen, A., & Evens, J. (1995). An idiographic and nomothetic study of
personality description. In J. N. Butcher and C. D. Spielberger (Eds.), Advances in
personality assessment (pp. 1-23). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Schultheiss, O. C., & Brunstein, J. C. (2001). Assessment of implicit motives with a research
version of the TAT: Picture profiles, gender differences, and relations to other
personality measures. Journal of Personality Assessment, 77, Special issue: More data
on the current Rorschach controversy, 71-86.

Scott, M. (1996). WordSmith. NY: Oxford University Press.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing
Surveys, 34, 1-47.

Semin, G. R., Rubini, M., & Fiedler, K. (1995). The answer is in the question: The effect of verb
causality on the locus of explanation. Personality & Social Psychology Bulletin, 21,
834-841.

Skoyen, J. A., Randall, A. K., Mehl, M. R., & Butler, E. A. (2014). “We” overeat, but “I” can
stay thin: Pronoun use and body weight in couples who eat to regulate emotion. Journal
of Social and Clinical Psychology, 33, 743-766.

Slatcher, R. B., & Pennebaker, J. W. (2006). How do I love thee? Let me count the words: The
social effects of expressive writing. Psychological Science, 17, 660-664.

Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words:
Individual differences in linguistic style among U.S. presidential and vice presidential
candidates. Journal of Research in Personality, 41, 63-75.

Slobin, D. (1996). From “thought” and “language” to “thinking” for “speaking”. In J. J.
Gumperz and S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70-96). New
York, NY: Cambridge University Press.

Stiles, W. B. (1992). Describing talk: A taxonomy of verbal response modes. California: Sage.

Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and non-suicidal
poets. Psychosomatic Medicine, 63, 517-522.

Stone, L. D., & Pennebaker, J. W. (2002). Trauma in real time: Talking and avoiding online
conversations about the death of Princess Diana. Basic & Applied Social Psychology,
24, 172-182.

Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M. (1966). The General Inquirer: A
Computer Approach to Content Analysis. Cambridge: MIT Press.

Tannen, D. (1993). Framing in discourse. London: Oxford University Press.

Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and
computerized text analysis methods. Journal of Language and Social Psychology, 29,
24-54.

Tausczik, Y., Faasse, K., Pennebaker, J. W., & Petrie, K. J. (2012). Public anxiety and
information seeking following the H1N1 outbreak: Blogs, newspaper articles, and
Wikipedia visits. Health Communication, 27, 179-185.

Toma, C. L., & Hancock, J. T. (2010, February). Reading between the lines: Linguistic cues to
deception in online dating profiles. In Proceedings of the 2010 ACM Conference on
Computer Supported Cooperative Work (pp. 5-8). ACM.

Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with
Twitter: What 140 characters reveal about political sentiment. ICWSM, 10, 178-185.

Van Petten, C., & Kutas, M. (1991). Influences of semantic and syntactic context on open- and
closed-class words. Memory & Cognition, 19, 95-112.

Van Swol, L. M., & Carlson, C. L. (2015). Language use and influence among minority,
majority, and homogeneous group members. Communication Research, 43, 1-18.

Väyrynen, J., & Honkela, T. (2005). Comparison of independent component analysis and
singular value decomposition in word context analysis. Proceedings of AKRR, 5,
135-140.

Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures
of positive and negative affect: The PANAS scales. Journal of Personality and Social
Psychology, 54, 1063-1070.

Weber-Fox, C., & Neville, H. J. (2001). Sensitive periods differentiate processing of open- and
closed-class words: An event-related brain potential study of bilinguals. Journal of
Speech, Language, and Hearing Research, 44, 1338-1353.

Weintraub, W. (1989). Verbal behavior in everyday life. NY: Springer.

Williams-Baucom, K. J., Atkins, D. C., Sevier, M., Eldridge, K. A., & Christensen, A. (2010).
“You” and “I” need to talk about “us”: Linguistic patterns in marital interactions.
Personal Relationships, 17, 41-56.

Winter, D. G., & McClelland, D. C. (1978). Thematic analysis: An empirically derived measure
of the effects of liberal arts education. Journal of Educational Psychology, 70, 8-16.

Wolf, M., Horn, A. B., Mehl, M. R., Haug, S., Pennebaker, J. W., & Kordy, H. (2008).
Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen
Version des Linguistic Inquiry and Word Count. Diagnostica, 54, 85-98.

Zijlstra, H., Van Meerveld, T., Van Middendorp, H., Pennebaker, J. W., & Geenen, R. (2004).
De Nederlandse versie van de ‘linguistic inquiry and word count’ (LIWC). Gedrag &
gezondheid, 32, 271-281.

Portions of the research reported in this manual were made possible by grants from the National
Institutes of Health (MH52391), the National Science Foundation (IIS-1344257), the Army Research
Institute (W5J9CQ12C0043), and the Templeton Foundation. Special thanks go to Cindy Chung.
Cindy’s mastery of language, thoughtful feedback, and valuable insights have been vital to the
longevity of the LIWC project. We are also deeply indebted to a number of people who
have helped with different phases of LIWC, including: Martha Francis, Laura King, Yitai Seah,
Jenna Baddelley, Molly Ireland, Yla Tausczik, Matthias Mehl, Richard Slatcher, Jason Ferrell,
Sam Gosling, and Gabriella Harari. We are particularly indebted to the LIWC2015
Development Team of Kiki Adams, Jennifer Caplan, Zachary Reese, Courtney Wang, and Nick
Abbs.


