LIWC2015 Language Manual
The Development and Psychometric Properties of LIWC2015

James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn
The University of Texas at Austin

Correspondence should be sent to James W. Pennebaker, Department of Psychology, The University of Texas at Austin, 108 E. Dean Keeton Stop A8000, Austin, TX 78712-1043.

The LIWC2015 program is a commercial product distributed by Pennebaker Conglomerates for research purposes and by Receptiviti, Inc. for commercial purposes. All profits to Pennebaker for the research-based version are donated to the Department of Psychology, University of Texas at Austin.

The official reference to this paper is: Pennebaker, J.W., Boyd, R.L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.

LIWC2015 Development Manual

The ways people use words in their daily lives can provide rich information about their beliefs, fears, thinking patterns, social relationships, and personalities. From the time of Freud’s writings about slips of the tongue to the early days of computer-based text analysis, researchers amassed increasingly compelling evidence that the words we use have tremendous psychological value (Gottschalk & Glaser, 1969; Stone, Dunphy, Smith, & Ogilvie, 1966; Weintraub, 1989).

Although promising, the early computer methods floundered because of the sheer complexity of the task. Extensive samples of text were not digitized, computers were slow and unwieldy, and there was little agreement about which features of natural language were most related to psychological states. Everything changed in the 1990s with the advent of efficient desktop computers, improved data storage technology, and the explosion of the internet. These factors allowed for the easy collection of large stores of books, conversations, and other digitized text samples.
In order to provide an efficient and effective method for studying the various emotional, cognitive, and structural components present in individuals’ verbal and written speech samples, we originally developed a text analysis application called Linguistic Inquiry and Word Count, or LIWC. The first LIWC application was developed as part of an exploratory study of language and disclosure (Francis, 1993; Pennebaker, 1993). The second (LIWC2001) and third (LIWC2007) versions updated the original application with an expanded dictionary and a more modern software design (Pennebaker, Francis, & Booth, 2001; Pennebaker, Booth, & Francis, 2007). The most recent evolution, LIWC2015 (Pennebaker, Booth, Boyd, & Francis, 2015), has significantly altered both the dictionary and the software options. Importantly, the LIWC2015 software and dictionary are new, rather than a basic update to previous versions of LIWC. As with previous versions, however, the program is designed to analyze individual or multiple language files quickly and efficiently. At the same time, the program attempts to be transparent and flexible in its operation, allowing the user to explore word use in multiple ways.

The LIWC2015 Framework

Both the standard downloadable and web-based versions of the LIWC2015 application rely on an internal default dictionary that defines which words should be counted in the target text files. Note that the LIWC2015 processor is an executable file and cannot be read or opened. To avoid confusion in the subsequent discussion, words contained in texts that are read and analyzed by LIWC2015 are referred to as target words. Words in the LIWC2015 dictionary file will be referred to as dictionary words. Groups of dictionary words that tap a particular domain (e.g., negative emotion words) are variously referred to as subdictionaries or word categories.
The LIWC2015 Main Text Processing Module

Because the software application is written in a cross-platform language, it runs identically on PC and Mac computers via the Java Virtual Machine. LIWC2015 is designed to accept written or transcribed verbal text that has been stored as a digital, machine-readable file in one of multiple formats, including plain text, PDF, RTF, or standard Microsoft Word files (i.e., .doc and .docx). Unlike previous versions, the software can now process text on a line-by-line basis within and across columns inside of multiple spreadsheet formats, including those saved as .xls, .xlsx, and .csv files.

During operation, LIWC2015 accesses a single text file, a group of files, or texts within a spreadsheet and analyzes each sequentially. For each file, LIWC2015 reads one target word at a time. As each target word is processed, the dictionary file is searched, looking for a dictionary match with the current target word. If the target word is matched with a dictionary word, the appropriate word category scale (or scales) for that word is incremented. As the target text file is being processed, counts for various structural composition elements (e.g., word count and sentence punctuation) are also incremented. For each text file, approximately 90 output variables are written as one line of data to an output file.
This data record includes the file name and word count, 4 summary language variables (analytical thinking, clout, authenticity, and emotional tone), 3 general descriptor categories (words per sentence, percent of target words captured by the dictionary, and percent of words in the text that are longer than six letters), 21 standard linguistic dimensions (e.g., percentage of words in the text that are pronouns, articles, auxiliary verbs, etc.), 41 word categories tapping psychological constructs (e.g., affect, cognition, biological processes, drives), 6 personal concern categories (e.g., work, home, leisure activities), 5 informal language markers (assents, fillers, nonfluencies, swear words, netspeak), and 12 punctuation categories (periods, commas, etc.). A complete list of the standard LIWC2015 scales is included in Table 1.

The Default LIWC2015 Dictionary

The LIWC2015 Dictionary is the heart of the text analysis strategy. The default LIWC2015 Dictionary is composed of almost 6,400 words, word stems, and select emoticons. Each dictionary entry additionally defines one or more word categories or subdictionaries. For example, the word cried is part of five word categories: sadness, negative emotion, overall affect, verbs, and past focus. Hence, if the word cried is found in the target text, each of these five subdictionary scale scores will be incremented. As in this example, many of the LIWC2015 categories are arranged hierarchically. All sadness words, by definition, belong to the broader “negative emotion” category, as well as the “overall affect words” category.

Note too that word stems can be captured by the LIWC2015 system. For example, the dictionary includes the stem hungr*, which allows any target word that matches the first five letters to be counted as an ingestion word (including hungry, hungrier, hungriest). The asterisk, then, denotes the acceptance of all letters, hyphens, or numbers following its appearance.
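The matching procedure described above (read one target word at a time, look it up, increment every category the entry belongs to, and treat a trailing asterisk as a stem wildcard) can be sketched in a few lines of Python. This is a minimal illustration, not the LIWC2015 implementation: the two-entry dictionary, the tokenizer, and the function names `tokenize`, `categories_for`, and `score` are assumptions made for the example. Only the entries cried and hungr* and their categories come from the text.

```python
import re
from collections import Counter

# Toy dictionary for illustration only; NOT the real LIWC2015 dictionary.
# Category labels follow the abbreviations used in Table 1.
TOY_DICTIONARY = {
    "cried": ["sad", "negemo", "affect", "verb", "focuspast"],
    "hungr*": ["ingest"],
}

def tokenize(text):
    """Lowercase the text and split it into word-like tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def categories_for(word, dictionary):
    """Return the categories of the first dictionary entry that matches `word`."""
    if word in dictionary:                      # exact entries are checked first
        return dictionary[word]
    for entry, categories in dictionary.items():
        # A trailing asterisk accepts anything that follows the stem.
        if entry.endswith("*") and word.startswith(entry[:-1]):
            return categories
    return []

def score(text, dictionary):
    """Return (word count, {category: percentage of total words})."""
    words = tokenize(text)
    counts = Counter()
    for word in words:
        for category in categories_for(word, dictionary):
            counts[category] += 1
    total = len(words)
    if total == 0:
        return 0, {}
    return total, {cat: 100.0 * n / total for cat, n in counts.items()}

total, percents = score("She cried because she was hungrier than ever", TOY_DICTIONARY)
print(total, percents)
```

In this sketch a single occurrence of cried raises all five of its category scores at once, which is how the hierarchical arrangement of categories shows up in the output.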
Each of the default LIWC2015 categories is composed of a list of dictionary words that define that scale. Table 1 provides a comprehensive list of the default LIWC2015 dictionary categories, scales, sample scale words, and relevant scale word counts.

Table 1. LIWC2015 Output Variable Information

| Category | Abbrev | Examples | Words in category | Internal Consistency (Uncorrected α) | Internal Consistency (Corrected α) |
|---|---|---|---|---|---|
| Word count | WC | | | | |
| Summary Language Variables | | | | | |
| Analytical thinking | Analytic | | | | |
| Clout | Clout | | | | |
| Authentic | Authentic | | | | |
| Emotional tone | Tone | | | | |
| Words/sentence | WPS | | | | |
| Words > 6 letters | Sixltr | | | | |
| Dictionary words | Dic | | | | |
| Linguistic Dimensions | | | | | |
| Total function words | funct | it, to, no, very | 491 | .05 | .24 |
| Total pronouns | pronoun | I, them, itself | 153 | .25 | .67 |
| Personal pronouns | ppron | I, them, her | 93 | .20 | .61 |
| 1st pers singular | i | I, me, mine | 24 | .41 | .81 |
| 1st pers plural | we | we, us, our | 12 | .43 | .82 |
| 2nd person | you | you, your, thou | 30 | .28 | .70 |
| 3rd pers singular | shehe | she, her, him | 17 | .49 | .85 |
| 3rd pers plural | they | they, their, they’d | 11 | .37 | .78 |
| Impersonal pronouns | ipron | it, it’s, those | 59 | .28 | .71 |
| Articles | article | a, an, the | 3 | .05 | .23 |
| Prepositions | prep | to, with, above | 74 | .04 | .18 |
| Auxiliary verbs | auxverb | am, will, have | 141 | .16 | .54 |
| Common Adverbs | adverb | very, really | 140 | .43 | .82 |
| Conjunctions | conj | and, but, whereas | 43 | .14 | .50 |
| Negations | negate | no, not, never | 62 | .29 | .71 |
| Other Grammar | | | | | |
| Common verbs | verb | eat, come, carry | 1000 | .05 | .23 |
| Common adjectives | adj | free, happy, long | 764 | .04 | .19 |
| Comparisons | compare | greater, best, after | 317 | .08 | .35 |
| Interrogatives | interrog | how, when, what | 48 | .18 | .57 |
| Numbers | number | second, thousand | 36 | .45 | .83 |
| Quantifiers | quant | few, many, much | 77 | .23 | .64 |
| Psychological Processes | | | | | |
| Affective processes | affect | happy, cried | 1393 | .18 | .57 |
| Positive emotion | posemo | love, nice, sweet | 620 | .23 | .64 |
| Negative emotion | negemo | hurt, ugly, nasty | 744 | .17 | .55 |
| Anxiety | anx | worried, fearful | 116 | .31 | .73 |
| Anger | anger | hate, kill, annoyed | 230 | .16 | .53 |
| Sadness | sad | crying, grief, sad | 136 | .28 | .70 |
| Social processes | social | mate, talk, they | 756 | .51 | .86 |
| Family | family | daughter, dad, aunt | 118 | .55 | .88 |
| Friends | friend | buddy, neighbor | 95 | .20 | .60 |
| Female references | female | girl, her, mom | 124 | .53 | .87 |
| Male references | male | boy, his, dad | 116 | .52 | .87 |
| Cognitive processes | cogproc | cause, know, ought | 797 | .65 | .92 |
| Insight | insight | think, know | 259 | .47 | .84 |
| Causation | cause | because, effect | 135 | .26 | .67 |
| Discrepancy | discrep | should, would | 83 | .34 | .76 |
| Tentative | tentat | maybe, perhaps | 178 | .44 | .83 |
| Certainty | certain | always, never | 113 | .31 | .73 |
| Differentiation | differ | hasn’t, but, else | 81 | .38 | .78 |
| Perceptual processes | percept | look, heard, feeling | 436 | .17 | .55 |
| See | see | view, saw, seen | 126 | .46 | .84 |
| Hear | hear | listen, hearing | 93 | .27 | .69 |
| Feel | feel | feels, touch | 128 | .24 | .65 |
| Biological processes | bio | eat, blood, pain | 748 | .29 | .71 |
| Body | body | cheek, hands, spit | 215 | .52 | .87 |
| Health | health | clinic, flu, pill | 294 | .09 | .37 |
| Sexual | sexual | horny, love, incest | 131 | .37 | .78 |
| Ingestion | ingest | dish, eat, pizza | 184 | .67 | .92 |
| Drives | drives | | 1103 | .39 | .80 |
| Affiliation | affiliation | ally, friend, social | 248 | .40 | .80 |
| Achievement | achieve | win, success, better | 213 | .41 | .81 |
| Power | power | superior, bully | 518 | .35 | .76 |
| Reward | reward | take, prize, benefit | 120 | .27 | .69 |
| Risk | risk | danger, doubt | 103 | .26 | .68 |
| Time orientations | TimeOrient | | | | |
| Past focus | focuspast | ago, did, talked | 341 | .23 | .64 |
| Present focus | focuspresent | today, is, now | 424 | .24 | .66 |
| Future focus | focusfuture | may, will, soon | 97 | .26 | .68 |
| Relativity | relativ | area, bend, exit | 974 | .50 | .86 |
| Motion | motion | arrive, car, go | 325 | .36 | .77 |
| Space | space | down, in, thin | 360 | .45 | .83 |
| Time | time | end, until, season | 310 | .39 | .79 |
| Personal concerns | | | | | |
| Work | work | job, majors, xerox | 444 | .69 | .93 |
| Leisure | leisure | cook, chat, movie | 296 | .50 | .86 |
| Home | home | kitchen, landlord | 100 | .46 | .83 |
| Money | money | audit, cash, owe | 226 | .60 | .90 |
| Religion | relig | altar, church | 174 | .64 | .91 |
| Death | death | bury, coffin, kill | 74 | .39 | .79 |
| Informal language | informal | | 380 | .46 | .84 |
| Swear words | swear | fuck, damn, shit | 131 | .45 | .83 |
| Netspeak | netspeak | btw, lol, thx | 209 | .42 | .82 |
| Assent | assent | agree, OK, yes | 36 | .10 | .39 |
| Nonfluencies | nonflu | er, hm, umm | 19 | .27 | .69 |
| Fillers | filler | Imean, youknow | 14 | .06 | .27 |

“Words in category” refers to the number of different dictionary words and stems that make up the variable category. All alphas were computed on a sample of ~181,000 text files from several of our language corpora (see Table 2). Uncorrected internal consistency alphas are based on Cronbach estimates; corrected alphas are based on Spearman-Brown. See the Reliability and Validity section below.

Note that the LIWC2015 dictionary generally arranges categories hierarchically. There are some exceptions to the hierarchy rules. For example, Social processes include a large group of words that denote social processes, including all non-first-person-singular personal pronouns as well as verbs that suggest human interaction (talking, sharing); many of these words do not belong to any of the Social processes subcategories. Another example is Relativity, which includes a large number of words that cannot be found in any of its subcategories.

LIWC2015 Dictionary Development

The selection of words defining the LIWC2015 categories involved multiple steps over several years. Originally, the idea was to identify a group of words that tapped basic emotional and cognitive dimensions often studied in social, health, and personality psychology. With time, the domain of word categories expanded considerably.

The most recent version of the dictionary, LIWC2015, is a completely new version compared to earlier ones. Dictionaries can now accommodate numbers, punctuation, and even short phrases. These additions allow the user to read "netspeak" language that is common in Twitter and Facebook posts, as well as SMS (short messaging service, a.k.a.
“text messaging”) and SMS-like modes of communication (e.g., Snapchat, instant messaging). For example, "b4" is coded as a preposition and ":)" is coded as a positive emotion word.

A handful of new categories have been added and a small number have been removed. With the advent of more powerful analytic methods and more diverse language samples, we have been able to build more internally consistent language dictionaries. This means that many of the dictionaries in previous LIWC versions may have the same name, but the words making up the dictionaries have been altered (categories subjected to major changes are presented below). We present here a complete overview of the process used to create the LIWC2015 dictionary.

Step 1. Word Collection. In the design and development of the LIWC category scales, sets of words were first generated for each conceptual dimension, using the LIWC2007 dictionary as a starting point. Within the Psychological Processes category, for example, the emotion subdictionaries were based on words from several sources, including previous versions of the LIWC dictionary. We drew on common emotion rating scales, such as the PANAS (Watson, Clark, & Tellegen, 1988), Roget’s Thesaurus, and standard English dictionaries. Following the creation of preliminary category word lists, 2-6 judges individually generated word lists for each category, then group brainstorming sessions among 4-8 judges were held in which words relevant to the various scales were generated and added to the initial scale lists. Similar schemes were used for the other subjective dictionary categories.

Step 2. Judge Rating Phase. Once the grand list of words was amassed, each word in the dictionary was examined by a group of 4-8 judges and qualitatively rated in terms of “goodness of fit” for each category. In order for a word to remain in a given category, a majority of judges had to agree on its inclusion.
In cases of disputes, several corpora and online sources were referenced to determine a word’s common use, inflection, and meaning. Words for which judges could not decide on appropriate category placement were removed from the dictionary.

Step 3. Base Rate Analyses. Once a working version of the dictionary was constructed from judges’ ratings, texts from several sources were analyzed using the Meaning Extraction Helper (MEH; Boyd, 2015) to determine how frequently dictionary words were used in various contexts. These sources included blog posts, spoken language studies, Twitter, Facebook, novels, student writings, and several others. Dictionary words that did not occur at least once in multiple corpora were omitted from the dictionary.

Step 4. Candidate Word List Generation. In order to expand the dictionary, we explored several sources of language for high-frequency words that had not been added by judges. Using MEH, high-frequency words were quantified as a percentage of total words for hundreds of thousands of text files from multiple studies and sources. For several linguistic categories (e.g., verbs, adjectives), the Stanford Natural Language Toolkit (NLTK; Toutanova, Klein, Manning, & Singer, 2003) was used in conjunction with MEH to identify common words. All candidate words were then correlated with all dictionary categories in order to detect common words that were not yet included in the dictionary. Words that correlated positively with dictionary categories were added to a list of candidate words for possible inclusion. Following this, 4-8 judges reviewed the candidate list and voted on 1) whether words should be included in the dictionary and 2) whether words were a sound conceptual fit for specific dictionary categories. Judges’ rating procedures were parallel to those outlined in Step 2.

Step 5. Psychometric Evaluation.
Following all previously described steps, each language category was separated into its constituent words. Each word was then quantified as a percentage of total words for ~181,000 text files hailing from 5 corpora, totalling ~231,000,000 words (see Table 2). All words for each category were treated as a “response” and used to compute internal consistency statistics for each language category as a whole. Words that were detrimental to the internal consistency of their overarching language category were added to a candidate list of words for omission from the final dictionary. A group of 2-8 judges then reviewed the list of candidate words and voted on whether words should be retained. Words for which no majority could be established were omitted. Several linguistic categories, such as pronouns and adverbs, constitute established linguistic constructs and were therefore not a part of the omission process. We discuss the psychometric evaluation procedures in extensive detail in the next section.

Step 6. Refinement Phase. After Steps 1 through 5 were complete, they were repeated in their entirety. This was done to catch any possible mistakes or oversights that might have occurred throughout the dictionary creation process. Note that the psychometrics of each language category changed negligibly during each refinement phase. During the last stage of the final refinement phase, two judges reviewed the dictionary for mistakes.

Step 7. Addition of Summary Variables. A major change from earlier versions of LIWC is the inclusion of four new summary variables: analytical thinking (Pennebaker et al., 2014), clout (Kacewicz et al., 2012), authenticity (Newman et al., 2003), and emotional tone (Cohn et al., 2004). Each summary variable was derived from previously published findings from our lab and converted to percentiles based on standardized scores from large comparison samples.
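The scoring formulas behind the summary variables are not given in this document, so the conversion cannot be reproduced here. As a purely generic illustration of what "converted to percentiles based on standardized scores" can mean, the sketch below standardizes a raw score against a comparison sample's mean and standard deviation and maps the resulting z-score to a 0-100 percentile under a normal-distribution assumption. The function name `to_percentile`, the normality assumption, and the example numbers are all assumptions made for this example, not LIWC2015's actual procedure.

```python
from statistics import NormalDist

def to_percentile(raw_score, comparison_mean, comparison_sd):
    """Map a raw score to a 0-100 percentile via its z-score.

    Assumes scores in the comparison sample are roughly normally distributed.
    """
    z = (raw_score - comparison_mean) / comparison_sd
    return 100.0 * NormalDist().cdf(z)

# Hypothetical numbers: a raw score one SD above the comparison mean.
print(round(to_percentile(12.0, 10.0, 2.0), 1))
```

A score equal to the comparison mean lands at the 50th percentile under this scheme, which is one way to read the summary variables as positions relative to a reference population.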
It must be emphasized that the summary variables are the only non-transparent dimensions in the LIWC2015 output.

A Note about the LIWC2015 Language Categories

For those who are familiar with LIWC2007, some of the LIWC2015 categories and results will be a bit jarring. Some of the original categories have been removed, largely due to their consistently low base rates, low internal reliability, or their infrequent use by researchers:

- Past tense verbs
- Present tense verbs
- Future tense verbs
- Inhibition words
- Inclusives
- Exclusives
- Human words

The following is a list of categories that are either a) new to LIWC2015, or b) substantially different from their counterparts in previous versions. While other LIWC2015 categories may also be slightly different from those in previous versions, categories from previous versions of LIWC that are presented in the list below have undergone substantial revision.

- Common verbs
- Common adjectives
- Common comparison words
- Interrogatives
- Female references
- Male references
- Cognitive processes
- Differentiation words
- Drives
- Affiliation words
- Achievement words
- Power words
- Risk words
- Reward words
- Past focus words
- Present focus words
- Future focus words
- Informal language
- Netspeak words
- Quantifiers

Note that the LIWC2015 application comes with the original internal dictionaries for both LIWC2001 and LIWC2007 for those who want to rely on older versions of the dictionary as well as to compare LIWC2015 analyses with those provided by older versions of the software.

LIWC2015: Internal Reliability and External Validity

Assessing the reliability and validity of text analysis programs is a tricky business. On the surface, one would think that you could determine the internal reliability of a LIWC scale the same way it is done with a questionnaire. With a questionnaire that taps anger or aggression, for example, participants complete a self-report asking a number of questions about their feelings or behaviors related to anger.
Reliability coefficients are computed by correlating people’s responses to the various questions. The more highly they correlate, the reasoning goes, the more the questionnaire items all measure the same thing. Voila! The scale is deemed internally consistent.

A similar strategy can be used with words. But be warned: the psychometrics of natural language use are not as straightforward as with questionnaires. The reason is obvious once you think about it. Once you say something, you generally don’t need to say it again in the same paragraph or essay. The nature of discourse, then, is that we usually say something and then move on to the next topic. Repeating the same idea over and over again is generally bad form in language, yet this is a staple of self-report questionnaire design. It is important, then, to understand that acceptable boundaries for natural language reliability coefficients are lower than those commonly seen elsewhere in psychological tests.

The LIWC Anger scale, for example, is made up of 230 anger-related words and word stems. In theory, the more that people use one type of anger word in a given text, the more they should use other anger words in the same text. To test this idea, we can determine the degree to which people use each of the 230 anger words across a select group of text files and then calculate the intercorrelations of the word use. Indeed, in Table 1, we include these internal reliability statistics, including those of Anger where the alpha reliabilities range between .52 (corrected) and .07 (uncorrected) depending on how it is computed.

In order to calculate these statistics, each dictionary word was measured as a percentage of total words per text. These scores were then entered as an “item” in a standard Cronbach’s alpha calculation, providing raw alpha scores for each word category, separately for each corpus. Uncorrected alphas in Table 1 are averages of each corpus’s alpha score.
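The calculation described above can be sketched as follows, treating each dictionary word's percentage of total words as one "item". This is a minimal illustration rather than the authors' code: `cronbach_alpha` and `spearman_brown` are hypothetical helper names, and the toy matrix is made up. The Spearman-Brown step is included because Table 1 reports a corrected alpha, but the lengthening factor used for that correction is not stated in the text. The factor of 6 used below is an assumption; it happens to reproduce several Table 1 rows (for example, .18 to .57 for affective processes and .05 to .24 for total function words).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a texts-by-words matrix.

    rows[t][w] is the percentage of total words in text t accounted for by
    dictionary word w; each word (column) is treated as one item.
    """
    k = len(rows[0])                                    # number of items (words)
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 words and 2 texts")
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each word
    total_var = variance([sum(row) for row in rows])    # variance of category totals
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def spearman_brown(reliability, length_factor):
    """Predicted reliability if the measure were `length_factor` times longer."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Made-up data: 4 texts, 3 words from one hypothetical category.
toy = [
    [0.5, 0.0, 0.2],
    [1.0, 0.4, 0.1],
    [0.0, 0.1, 0.0],
    [1.5, 0.6, 0.3],
]
print(cronbach_alpha(toy))

# Assumed factor of 6 applied to uncorrected alphas taken from Table 1.
print(round(spearman_brown(0.18, 6), 2))   # affective processes
print(round(spearman_brown(0.05, 6), 2))   # total function words
```

Because most words in a category occur rarely in any single text, the per-word variances are small and uneven, which is one reason the raw alpha for language data tends to come out lower than for questionnaire items.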
Importantly, the uncorrected method tends to grossly underestimate reliability in language categories due to the highly variable base rates of word usage within any given category. Corrected alphas were computed using the Spearman-Brown prediction formula (Brown, 1910; Spearman, 1910), and are generally a more accurate approximation of each category’s “true” internal consistency.

Issues of validity are also a bit tricky. We can have people complete a questionnaire that assesses their general moods and then have them write an essay which we then subject to the LIWC program. We can also have judges evaluate the essay for its emotional content. In other words, we can get self-reported, judged, and LIWC numbers that all reflect a participant’s anger.

One of the first tests of the validity of the LIWC scales was undertaken by Pennebaker and Francis (1996) as part of an experiment in which first-year college students wrote about the experience of coming to college. During the writing phase of the study, 72 Introductory Psychology students met as a group on three consecutive days to write on their assigned topics. Participants in the experimental condition (n = 35) were instructed to write about their deepest thoughts and feelings concerning the experience of coming to college. Those in the control condition (n = 37) were asked to describe any particular object or event of their choosing in an unemotional way. After the writing phase of the study was completed, four judges rated the participants’ essays on various emotional, cognitive, content, and composition dimensions designed to correspond to selected LIWC Dictionary scales. Using LIWC output and judges’ ratings, Pearson correlational analyses were performed to test LIWC’s external validity. The findings suggested that LIWC successfully measures positive and negative emotions, a number of cognitive strategies, several types of thematic content, and various language composition elements.
The level of agreement between judges’ ratings and LIWC’s objective word count strategy provides support for LIWC’s external validity.

Since the first version of LIWC, hundreds of studies have found the LIWC categories to be valid across dozens of psychological domains. As a starting point for exploring this body of literature, we recommend a close reading of Tausczik and Pennebaker (2010).

Base Rates of Word Usage

In evaluating any text analysis program, it is helpful to get a sense of the degree to which language varies across settings. Since 1986, we have been collecting text samples from a variety of studies – both from our own lab as well as from dozens of others in the United States, England, Canada, New Zealand, and Australia. For purposes of comparison, text from several dozen studies has been analyzed using the updated LIWC2015 dictionary. As can be seen in Table 2, these analyses reflect the utterances of over 80,000 writers or speakers totaling over 231 million words. We provide a brief description of each dataset below.

Table 2. Summary Information for LIWC2015 Statistics

| | Blogs | Expressive writing | Novels | Natural Speech | NY Times | Twitter |
|---|---|---|---|---|---|---|
| Total files | 37,295 | 6,179 | 875 | 3,232 | 34,929 | 35,269 |
| Total authors | 37,295 | 2,510 | 441 | 2,174 | Unknown | 35,269 |
| Total words | 119,449,058 | 2,526,709 | 57,467,183 | 2,566,446 | 26,007,632 | 23,172,994 |

Note: All texts for all corpora required a minimum of 25 words for inclusion in our analyses. All texts with fewer than 25 words were omitted for all statistics reported in this document.

Blogs. This is an expanded version of the corpus described in Schler, Koppel, Argamon, and Pennebaker (2006). All blog posts were merged by individual prior to analysis, reflecting the entirety of each person’s blog.

Expressive writing.
This dataset consists of 29 samples from experiments where people were randomly assigned to write either about deeply emotional topics (emotional writing) or about relatively trivial topics such as plans for the day (control writing). Individuals from all walks of life – ranging from college students to psychiatric prisoners to elderly and even elementary-aged individuals – are represented in these studies. Only the emotional writing topics were included in the current analyses.

Novels. This is a sample of novels acquired from Project Gutenberg ( http://www.gutenberg.org/ ) that had been tagged as “literature”. All novels were written in the English language by authors who lived between approximately 1660 and 2008. The number of authors presented in Table 2 reflects only known authors of the works analyzed; works for which the author was unknown were not included in this figure, but were included in analyses.

Natural speech. The speech samples included diverse transcripts from multiple contexts, including people wearing audio recorders over days or weeks, strangers interacting in a waiting room, couples talking about problems, and open-air tape recordings of people in public spaces.

New York Times. A collection of articles published online at the New York Times website ( http://www.nytimes.com ). Articles were collected from the New York Times internet archives and include various types of work, including editorials, features, U.S. and world news, letters to the editor, and so on. All articles were published between January and July of 2014. Author information was not preserved for this dataset, so the true number of authors is unknown.

Twitter. Individual Twitter posts (i.e., “tweets”) were collected from the public profiles of users whose names were entered into the Analyze Words webpage ( http://analyzewords.com ). Each user’s tweets were combined into a single unit of observation for analysis.
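Two preparation rules recur in the corpus descriptions above: posts are merged so that each author contributes one unit of observation (blogs and Twitter), and texts shorter than 25 words are dropped. A minimal sketch of both rules is shown below. The function names, the `(author, text)` input shape, and the whitespace-based word count are assumptions made for illustration; the manual does not describe how words were counted for the 25-word threshold.

```python
from collections import defaultdict

MIN_WORDS = 25  # minimum text length stated in the note to Table 2

def merge_by_author(posts):
    """Combine (author, text) pairs into a single text per author."""
    merged = defaultdict(list)
    for author, text in posts:
        merged[author].append(text)
    return {author: " ".join(texts) for author, texts in merged.items()}

def drop_short_texts(texts_by_author, min_words=MIN_WORDS):
    """Keep only texts with at least `min_words` whitespace-separated words."""
    return {
        author: text
        for author, text in texts_by_author.items()
        if len(text.split()) >= min_words
    }

# Made-up posts: author "a" passes the threshold only after merging.
posts = [("a", "word " * 20), ("a", "word " * 10), ("b", "too short")]
kept = drop_short_texts(merge_by_author(posts))
print(sorted(kept))
```

Merging before filtering matters here: an author whose individual posts are all under 25 words can still be retained once the posts are combined.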
As can be seen in Table 3, the LIWC2015 version captures, on average, over 86 percent of the words people use in writing and speech. Note that except for total word count, words per sentence, and the four summary variables (Analytic, Clout, Authentic, and Tone), all means in Table 3 are expressed as percentage of total words used in any given language sample. Simple statistical tests indicate that nearly all language categories differ significantly between contexts.

Table 3. LIWC2015 Output Variable Information

| Category | Blogs | Expressive writing | Novels | Natural Speech | NY Times | Twitter | Grand Means | Mean SDs |
|---|---|---|---|---|---|---|---|---|
| Linguistic Processes | | | | | | | | |
| Word count (mean) | 3206.45 | 408.94 | 65716.49 | 794.17 | 744.62 | 660.24 | 11921.82 | 10274.32 |
| Analytic | 49.89 | 44.88 | 70.33 | 18.43 | 92.57 | 61.94 | 56.34 | 17.58 |
| Clout | 47.87 | 37.02 | 75.37 | 56.27 | 68.17 | 63.02 | 57.95 | 17.51 |
| Authentic | 60.93 | 76.01 | 21.56 | 61.32 | 24.84 | 50.39 | 49.17 | 20.92 |
| Tone | 54.50 | 38.60 | 37.06 | 79.29 | 43.61 | 72.24 | 54.22 | 23.27 |
| *Words/sentence | 18.40 | 18.42 | 16.13 | | 21.94 | 12.10 | 17.40 | 16.38 |
| Words>6 letters | 14.38 | 13.62 | 16.30 | 10.42 | 23.58 | 15.31 | 15.60 | 3.76 |
| Dictionary words | 85.79 | 91.93 | 84.52 | 91.60 | 74.62 | 82.60 | 85.18 | 5.36 |
| Total function words | 53.10 | 58.27 | 54.51 | 56.86 | 42.39 | 46.08 | 51.87 | 5.13 |
| Total pronouns | 16.20 | 18.03 | 15.15 | 20.92 | 7.41 | 13.62 | 15.22 | 3.61 |
| Personal pronouns | 10.66 | 12.74 | 10.35 | 13.37 | 3.56 | 9.02 | 9.95 | 3.02 |
| 1st pers singular | 6.26 | 8.66 | 2.63 | 7.03 | 0.63 | 4.75 | 4.99 | 2.46 |
| 1st pers plural | 0.91 | 0.81 | 0.61 | 0.87 | 0.38 | 0.74 | 0.72 | 0.83 |
| 2nd person | 1.32 | 0.68 | 1.39 | 4.04 | 0.34 | 2.41 | 1.70 | 1.35 |
| 3rd pers singular | 1.50 | 2.01 | 4.80 | 0.77 | 1.53 | 0.64 | 1.88 | 1.53 |
| 3rd pers plural | 0.68 | 0.57 | 0.92 | 0.65 | 0.68 | 0.47 | 0.66 | 0.60 |
| Impersonal pronouns | 5.53 | 5.28 | 4.79 | 7.53 | 3.84 | 4.60 | 5.26 | 1.62 |
| Articles | 6.00 | 5.70 | 8.35 | 4.34 | 9.08 | 5.58 | 6.51 | 1.79 |
| Prepositions | 12.60 | 14.27 | 14.27 | 10.29 | 14.27 | 11.88 | 12.93 | 2.11 |
| Auxiliary verbs | 8.75 | 9.25 | 7.77 | 12.03 | 5.11 | 8.27 | 8.53 | 2.04 |
| Adverbs | 5.88 | 6.02 | 4.17 | 7.67 | 2.76 | 5.13 | 5.27 | 1.61 |
| Conjunctions | 6.43 | 7.46 | 6.28 | 6.21 | 4.85 | 4.19 | 5.90 | 1.57 |
| Negations | 1.81 | 1.69 | 1.68 | 2.42 | 0.62 | 1.74 | 1.66 | 0.86 |
| Other Grammar | | | | | | | | |
| Common verbs | 17.03 | 18.63 | 15.42 | 21.01 | 10.23 | 16.33 | 16.44 | 2.93 |
| Common adjectives | 4.53 | 4.52 | 4.36 | 4.13 | 4.52 | 4.89 | 4.49 | 1.30 |
| Comparisons | 2.17 | 2.42 | 2.13 | 2.35 | 2.39 | 1.89 | 2.23 | 0.95 |
| Interrogatives | 1.51 | 1.49 | 1.53 | 2.44 | 1.26 | 1.43 | 1.61 | 0.76 |
| Number | 1.89 | 1.87 | 1.23 | 2.19 | 3.55 | 1.98 | 2.12 | 2.07 |
| Quantifiers | 2.27 | 2.35 | 1.80 | 1.93 | 1.94 | 1.85 | 2.02 | 0.83 |
| Psychological Processes | | | | | | | | |
| Affective processes | 5.79 | 4.77 | 4.81 | 6.54 | 3.82 | 7.67 | 5.57 | 1.99 |
| Positive emotion | 3.66 | 2.57 | 2.67 | 5.31 | 2.32 | 5.48 | 3.67 | 1.63 |
| Negative emotion | 2.06 | 2.12 | 2.08 | 1.19 | 1.45 | 2.14 | 1.84 | 1.09 |
| Anxiety | 0.27 | 0.50 | 0.44 | 0.14 | 0.25 | 0.24 | 0.31 | 0.32 |
| Anger | 0.68 | 0.49 | 0.51 | 0.36 | 0.47 | 0.75 | 0.54 | 0.59 |
| Sadness | 0.44 | 0.50 | 0.55 | 0.23 | 0.29 | 0.43 | 0.41 | 0.40 |
| Social processes | 8.95 | 8.69 | 12.26 | 10.42 | 7.62 | 10.47 | 9.74 | 3.38 |
| Family | 0.46 | 0.77 | 0.39 | 0.31 | 0.33 | 0.36 | 0.44 | 0.63 |
| Friends | 0.40 | 0.55 | 0.25 | 0.37 | 0.18 | 0.43 | 0.36 | 0.40 |
| Female references | 0.91 | 1.37 | 1.88 | 0.55 | 0.62 | 0.54 | 0.98 | 1.26 |
| Male references | 1.31 | 1.47 | 4.09 | 0.80 | 1.38 | 0.84 | 1.65 | 1.34 |
| Cognitive processes | 11.58 | 12.52 | 9.84 | 12.27 | 7.52 | 9.96 | 10.61 | 3.02 |
| Insight | 2.28 | 2.66 | 2.11 | 2.46 | 1.54 | 1.92 | 2.16 | 1.08 |
| Causation | 1.46 | 1.65 | 1.03 | 1.45 | 1.42 | 1.41 | 1.40 | 0.73 |
| Discrepancy | 1.56 | 1.74 | 1.48 | 1.45 | 0.89 | 1.54 | 1.44 | 0.80 |
| Tentative | 2.82 | 2.89 | 2.27 | 3.06 | 1.74 | 2.35 | 2.52 | 1.09 |
| Certainty | 1.56 | 1.51 | 1.45 | 1.38 | 0.76 | 1.43 | 1.35 | 0.70 |
| Differentiation | 3.31 | 3.40 | 2.82 | 3.73 | 2.03 | 2.62 | 2.99 | 1.18 |
| Perceptual processes | 2.58 | 2.38 | 3.74 | 2.11 | 2.42 | 2.96 | 2.70 | 1.20 |
| See | 1.04 | 0.80 | 1.58 | 0.78 | 0.88 | 1.39 | 1.08 | 0.78 |
| Hear | 0.75 | 0.48 | 1.26 | 0.63 | 1.06 | 0.82 | 0.83 | 0.62 |
| Feel | 0.64 | 0.92 | 0.76 | 0.61 | 0.35 | 0.56 | 0.64 | 0.52 |
| Biological processes | 2.16 | 2.59 | 2.17 | 1.23 | 1.44 | 2.60 | 2.03 | 1.39 |
| Body | 0.74 | 0.69 | 1.24 | 0.31 | 0.41 | 0.77 | 0.69 | 0.64 |
| Health | 0.61 | 0.93 | 0.48 | 0.38 | 0.57 | 0.54 | 0.59 | 0.65 |
| Sexual | 0.17 | 0.09 | 0.08 | 0.09 | 0.10 | 0.24 | 0.13 | 0.30 |
| Ingestion | 0.54 | 0.86 | 0.39 | 0.35 | 0.41 | 0.86 | 0.57 | 0.83 |
| Drives | 6.87 | 7.35 | 5.84 | 6.39 | 7.60 | 7.50 | 6.93 | 2.03 |
| Affiliation | 2.20 | 2.45 | 1.39 | 2.06 | 1.69 | 2.53 | 2.05 | 1.28 |
| Achievement | 1.27 | 1.37 | 0.91 | 0.99 | 1.82 | 1.45 | 1.30 | 0.82 |
| Power | 2.07 | 2.02 | 2.46 | 1.72 | 3.62 | 2.17 | 2.35 | 1.12 |
| Reward | 1.49 | 1.56 | 1.04 | 1.73 | 1.07 | 1.86 | 1.46 | 0.81 |
| Risk | 0.46 | 0.54 | 0.53 | 0.30 | 0.56 | 0.46 | 0.47 | 0.41 |
| Time orientations | | | | | | | | |
| Past focus | 4.25 | 5.83 | 7.06 | 3.78 | 4.09 | 2.81 | 4.64 | 2.06 |
| Present focus | 10.95 | 10.45 | 6.21 | 15.28 | 5.14 | 11.74 | 9.96 | 2.80 |
| Future focus | 1.60 | 1.85 | 1.19 | 1.45 | 0.80 | 1.60 | 1.42 | 0.90 |
| Relativity | 14.23 | 16.19 | 14.56 | 12.12 | 14.47 | 13.99 | 14.26 | 3.18 |
| Motion | 2.15 | 2.58 | 2.34 | 2.20 | 1.70 | 1.94 | 2.15 | 1.03 |
| Space | 6.43 | 6.96 | 7.82 | 5.86 | 7.76 | 6.51 | 6.89 | 1.96 |
| Time | 5.86 | 7.01 | 4.71 | 4.28 | 5.17 | 5.75 | 5.46 | 1.81 |
| Personal Concerns | | | | | | | | |
| Work | 2.04 | 2.64 | 1.20 | 2.87 | 4.49 | 2.16 | 2.56 | 1.81 |
| Leisure | 1.50 | 1.17 | 0.56 | 1.11 | 1.67 | 2.11 | 1.35 | 1.08 |
| Home | 0.49 | 0.99 | 0.56 | 0.34 | 0.47 | 0.43 | 0.55 | 0.63 |
| Money | 0.59 | 0.41 | 0.45 | 0.44 | 1.47 | 0.74 | 0.68 | 0.83 |
| Religion | 0.39 | 0.20 | 0.34 | 0.14 | 0.25 | 0.35 | 0.28 | 0.57 |
| Death | 0.15 | 0.12 | 0.26 | 0.04 | 0.22 | 0.19 | 0.16 | 0.29 |
| Informal Language | 2.09 | 0.45 | 0.53 | 7.10 | 0.29 | 4.68 | 2.52 | 1.65 |
| Swear words | 0.35 | 0.09 | 0.05 | 0.25 | 0.02 | 0.49 | 0.21 | 0.37 |
| Netspeak | 0.92 | 0.05 | 0.10 | 1.35 | 0.16 | 3.23 | 0.97 | 1.17 |
| Assent | 0.33 | 0.10 | 0.14 | 3.29 | 0.05 | 1.82 | 0.95 | 0.72 |
| Nonfluencies | 0.42 | 0.17 | 0.24 | 1.96 | 0.07 | 0.39 | 0.54 | 0.49 |
| Fillers | 0.11 | 0.04 | 0.01 | 0.46 | 0.00 | 0.04 | 0.11 | 0.27 |
| *Punctuation | | | | | | | | |
| Total Punctuation | 24.18 | 12.41 | 23.68 | | 19.02 | 27.46 | 21.35 | 9.01 |
| Periods | 10.29 | 6.17 | 6.04 | | 5.88 | 9.07 | 7.49 | 3.76 |
| Commas | 4.15 | 3.17 | 7.09 | | 6.60 | 2.76 | 4.75 | 1.94 |
| Colons | 0.43 | 0.21 | 0.12 | | 0.27 | 2.15 | 0.64 | 0.85 |
| Semicolons | 0.10 | 0.04 | 0.53 | | 0.17 | 0.67 | 0.30 | 0.53 |
| Question marks | 0.59 | 0.15 | 0.60 | | 0.15 | 1.40 | 0.58 | 1.00 |
| Exclamation marks | 1.16 | 0.12 | 0.49 | | 0.02 | 3.21 | 1.00 | 1.35 |
| Dashes | 0.99 | 0.39 | 2.14 | | 1.23 | 1.21 | 1.19 | 1.38 |
| Quotation marks | 0.71 | 0.22 | 3.90 | | 2.23 | 1.30 | 1.67 | 1.36 |
| Apostrophes | 3.85 | 1.40 | 2.19 | | 1.56 | 3.32 | 2.46 | 4.94 |
| Parentheses | 0.90 | 0.32 | 0.06 | | 0.54 | 0.81 | 0.53 | 0.87 |
| Other punctuation | 1.00 | 0.23 | 0.52 | | 0.36 | 1.56 | 0.73 | 1.70 |

Notes: Grand Means are the unweighted means of the six genres; Mean SDs refer to the unweighted mean of the standard deviations across the six genre categories.
*In calculating grand means and standard deviations for the words per sentence (WPS) and punctuation categories, the natural speech corpus was excluded due to differing transcription rules across documents. In many ways, Table 3 points to the important role that context plays in people’s use of language. Not surprisingly, the topics of writing – as reflected in the current concerns category – vary substantially as a function of genre. More striking, however, are the large differences in people’s use of function words as well as punctuation from genre to genre (cf., Biber, 1988). Comparing LIWC2015 with LIWC2007 For users of LIWC2007, a new edition of LIWC that uses a different dictionary can be an unsettling experience. Most of the older dictionaries have been slightly changed, some have been substantially reworked (e.g., social words, cognitive process words), and several others have been removed or added. To assist in the transition to the new version of LIWC, we include Table 4 which lists the means, standard deviations, and correlations between the two dictionary versions. These analyses are based on the corpora detailed in Tables 2 and 3. All numbers presented in Table 4 are the average results from all six corpora. To get a sense of how much a dictionary has changed from the LIWC2007 to the LIWC2015 versions, look at the LIWC2015/2007 Correlation column. The lower the correlation, the more change across the two versions. LIWC2015 Development Manual Page 13 Table 4. 
Comparisons Between LIWC2015 and LIWC2007: Means, Standard Deviations, and Correlations

| LIWC Dimension | Output Label | LIWC2015 mean | LIWC2007 mean | LIWC 2015/2007 Correlation¹ |
|---|---|---|---|---|
| Word count | WC | 11,921.82 | 11,852.99 | 1.00 |
| Summary Variables | | | | |
| Analytical thinking | Analytic | 56.34 | | |
| Clout | Clout | 57.95 | | |
| Authentic | Authentic | 49.17 | | |
| Emotional tone | Tone | 54.22 | | |
| Language Metrics | | | | |
| Words per sentence* | WPS | 17.40 | 25.07 | 0.74 |
| Words>6 letters | Sixltr | 15.60 | 15.89 | 0.98 |
| Dictionary words | Dic | 85.18 | 83.95 | 0.94 |
| Function Words | function | 51.87 | 54.29 | 0.95 |
| Total pronouns | pronoun | 15.22 | 14.99 | 0.99 |
| Personal pronouns | ppron | 9.95 | 9.83 | 0.99 |
| 1st pers singular | i | 4.99 | 4.97 | 1.00 |
| 1st pers plural | we | 0.72 | 0.72 | 1.00 |
| 2nd person | you | 1.70 | 1.61 | 0.98 |
| 3rd pers singular | shehe | 1.88 | 1.87 | 1.00 |
| 3rd pers plural | they | 0.66 | 0.66 | 0.99 |
| Impersonal pronouns | ipron | 5.26 | 5.17 | 0.99 |
| Articles | article | 6.51 | 6.53 | 0.99 |
| Prepositions | prep | 12.93 | 12.59 | 0.96 |
| Auxiliary verbs | auxverb | 8.53 | 8.82 | 0.96 |
| Common adverbs | adverb | 5.27 | 4.83 | 0.97 |
| Conjunctions | conj | 5.90 | 5.87 | 0.99 |
| Negations | negate | 1.66 | 1.72 | 0.96 |
| Other Grammar | | | | |
| Regular verbs | verb | 16.44 | 15.26 | 0.72 |
| Adjectives | adj | 4.49 | | |
| Comparatives | compare | 2.23 | | |
| Interrogatives | interrog | 1.61 | | |
| Numbers | number | 2.12 | 1.98 | 0.98 |
| Quantifiers | quant | 2.02 | 2.48 | 0.88 |
| Affect Words | affect | 5.57 | 5.63 | 0.96 |
| Positive emotion | posemo | 3.67 | 3.75 | 0.96 |
| Negative emotion | negemo | 1.84 | 1.83 | 0.96 |
| Anxiety | anx | 0.31 | 0.33 | 0.94 |
| Anger | anger | 0.54 | 0.6 | 0.97 |
| Sadness | sad | 0.41 | 0.39 | 0.92 |
| Social Words | social | 9.74 | 9.36 | 0.96 |
| Family | family | 0.44 | 0.38 | 0.94 |
| Friends | friend | 0.36 | 0.23 | 0.78 |
| Female referents | female | 0.98 | | |
| Male referents | male | 1.65 | | |
| Cognitive Processes² | cogproc | 10.61 | 14.99 | 0.84 |
| Insight | insight | 2.16 | 2.13 | 0.98 |
| Cause | cause | 1.40 | 1.41 | 0.97 |
| Discrepancies | discrep | 1.44 | 1.45 | 0.99 |
| Tentativeness | tentat | 2.52 | 2.42 | 0.98 |
| Certainty | certain | 1.35 | 1.27 | 0.92 |
| Differentiation³ | differ | 2.99 | 2.48 | 0.85 |
| Perceptual Processes | percept | 2.70 | 2.36 | 0.92 |
| Seeing | see | 1.08 | 0.87 | 0.88 |
| Hearing | hear | 0.83 | 0.73 | 0.94 |
| Feeling | feel | 0.64 | 0.62 | 0.92 |
| Biological Processes | bio | 2.03 | 1.88 | 0.94 |
| Body | body | 0.69 | 0.68 | 0.96 |
| Health/illness | health | 0.59 | 0.53 | 0.87 |
| Sexuality | sexual | 0.13 | 0.28 | 0.76 |
| Ingesting | ingest | 0.57 | 0.46 | 0.94 |
| Drives and Needs | drives | 6.93 | | |
| Affiliation | affiliation | 2.05 | | |
| Achievement | achieve | 1.30 | 1.56 | 0.93 |
| Power | power | 2.35 | | |
| Reward focus | reward | 1.46 | | |
| Risk focus | risk | 0.47 | | |
| Time Orientations⁴ | | | | |
| Past focus | focuspast | 4.64 | 4.14 | 0.97 |
| Present focus | focuspresent | 9.96 | 8.1 | 0.92 |
| Future focus | focusfuture | 1.42 | 1.00 | 0.63 |
| Relativity | relativ | 14.26 | 13.87 | 0.98 |
| Motion | motion | 2.15 | 2.06 | 0.93 |
| Space | space | 6.89 | 6.17 | 0.96 |
| Time | time | 5.46 | 5.79 | 0.94 |
| Personal Concerns | | | | |
| Work | work | 2.56 | 2.27 | 0.97 |
| Leisure | leisure | 1.35 | 1.37 | 0.95 |
| Home | home | 0.55 | 0.56 | 0.99 |
| Money | money | 0.68 | 0.70 | 0.97 |
| Religion | relig | 0.28 | 0.32 | 0.96 |
| Death | death | 0.16 | 0.16 | 0.96 |
| Informal Speech | informal | 2.52 | | |
| Swear words | swear | 0.21 | 0.17 | 0.89 |
| Netspeak | netspeak | 0.97 | | |
| Assent | assent | 0.95 | 1.11 | 0.68 |
| Nonfluencies | nonfl | 0.54 | 0.30 | 0.84 |
| Fillers | filler | 0.11 | 0.40 | 0.29 |
| All Punctuation* | Allpunc | 21.35 | 21.65 | 0.98 |
| Periods | Period | 7.49 | 7.56 | 0.98 |
| Commas | Comma | 4.75 | 4.75 | 1.00 |
| Colons | Colon | 0.64 | 0.73 | 0.98 |
| Semicolons | SemiC | 0.3 | 0.29 | 0.97 |
| Question marks | QMark | 0.58 | 0.58 | 1.00 |
| Exclamation marks | Exclam | 1.00 | 1.00 | 1.00 |
| Dashes | Dash | 1.19 | 1.21 | 0.98 |
| Quotation marks | Quote | 1.67 | 1.64 | 0.93 |
| Apostrophes | Apostro | 2.46 | 2.52 | 0.94 |
| Parentheses (pairs) | Parenth | 0.53 | 0.63 | 0.90 |
| Other punctuation | OtherP | 0.73 | 0.72 | 0.95 |

* Due to differences in punctuation rules for transcriptions, the natural language corpus was excluded when computing means and correlations for punctuation categories as well as words per sentence.
¹ Correlation is the average correlation between the 2007 and 2015 dictionaries across six corpora. Low correlations (<.80) are to be expected due to the large category differences between the two versions.
² Cognitive processes is conceptually similar to the cognitive mechanisms LIWC2007 category. The newer cognitive process dimension restricts constituent words to true markers of cognitive activity.
³ Differentiation is conceptually similar to the 2007 exclusive category.
⁴ Time Orientation categories are similar to the 2007 categories past, present, and future but are more unified to reflect a general time orientation instead of just verb tense usage.

LIWC Dictionary Translations

The LIWC dictionaries have been translated into several languages, including Spanish, German, Dutch, Norwegian, Italian, and Portuguese. Several other language translations are underway, including Arabic, Korean, Turkish, and Chinese. To date, these translations have relied on the LIWC2001 or LIWC2007 dictionaries rather than LIWC2015. Unlike previous versions of LIWC, the current version is bundled exclusively with the original English dictionary versions. LIWC dictionary translations, as well as other published dictionaries, will be made available at the official LIWC dictionary repository (http://www.liwc.net/dictionaries). If you would like to build a non-English LIWC2015 dictionary, or if you have built one independently and would like to add it to the repository, contact the first author at pennebaker@mail.utexas.edu.

Helpful References

Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23, 32346.
Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the Association for Computing Machinery (CACM), 52, 119-123.
Baayen, R. H., Piepenbrock, R., & Bulickers, L. (1995). The CELEX Lexical Database (Release I) [CD ROM]. Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Back, M. D., Küfner, A. C., & Egloff, B. (2011). “Automatic or the people?” Anger on September 11, 2001, and lessons learned for the analysis of large digital data sets. Psychological science, 22, 837-838.
Baddeley, J. L., Daniel, G. R., & Pennebaker, J. W. (2015).
How Henry Hellyer’s use of language foretold his suicide. Crisis, 32, 288-292.
Bazarova, N. N., Taft, J. G., Choi, Y. H., & Cosley, D. (2012). Managing impressions and relationships on Facebook: Self-presentational and relational concerns revealed through the analysis of language style. Journal of Language and Social Psychology, 32, 121-141.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conception of time. Cognitive Psychology, 43, 122.
Bosson, J. K., Swann, W. B., Jr., & Pennebaker, J. W. (2000). Stalking the perfect measure of implicit self-esteem: The blind men and the elephant revisited? Journal of Personality and Social Psychology, 79, 631-643.
Boyd, R. L. (2015). MEH: Meaning Extraction Helper [Software]. Available from http://meh.ryanb.cc
Boyd, R. L., & Pennebaker, J. W. (2015). Did Shakespeare write Double Falsehood? Identifying individuals by creating psychological signatures with text analysis. Psychological science, 26, 570-582.
Brewer, M. B., & Gardner, W. (1996). Who is this “We”? Levels of collective identity and self representations. Journal of Personality & Social Psychology, 71, 83-93.
Brown, R. (1968). Words and things: An introduction to language. NY: Free Press.
Bruner, J. S. (1973). Beyond the information given: Studies in the psychology of knowing. London: W. W. Norton.
Bucci, W. (1995). The power of the narrative: a multiple code account. In J. W. Pennebaker (Ed.), Emotion, Disclosure, and Health (pp. 93-122). Washington, DC: American Psychological Association.
Buchanan, L., Westbury, C., & Burgess, C. (2001). Characterizing semantic space: Neighborhood effects in word recognition. Psychonomic Bulletin & Review, 8, 531-544.
Carey, A. L., Brucks, M. S., Küfner, A. C. P., Holtzman, N. S., Deters, F. G., Back, M. D., Donnellan, M. B., et al. (2015).
Narcissism and the use of personal pronouns revisited. Journal of Personality and Social Psychology, 109, 115.
Campbell, R. S. & Pennebaker, J. W. (2003). The secret life of pronouns: Flexibility in writing style and physical health. Psychological science, 14, 60-65.
Chambers, J. K., Trudgill, P., & Schilling-Estes, N. (2004). The handbook of language variation and change. London: Blackwell.
Chung, C. K., & Pennebaker, J. W. (2013). Using computerized text analysis to track social processes. In T. Holtgraves (Ed.), Handbook of language and social psychology (pp. 21923). New York, NY: Oxford.
Chung, C. K., & Pennebaker, J. W. (2012). Linguistic inquiry and word count (LIWC): Pronounced “Luke,”... and other useful facts. In P. M. McCarthy & C. Boonthum Denecke (Eds.), Applied natural language processing: Identification, investigation and resolution (pp. 206-229). Hershey, PA: IGI Global.
Chung, C. K., & Pennebaker, J. W. (2005). Assessing quality of life through natural language use: Implications of computerized text analysis. In W. R. Lenderking and D. A. Revicki (Eds.), Advancing health outcomes research methods and clinical applications (pp. 79-94). Washington, DC: Degnon Associates.
Chung, C. K., & Pennebaker, J. W. (2007). The psychological functions of function words. In K. Fiedler (Ed.), Social communication (pp. 343-359). New York, NY: Psychology Press.
Chung, C. K., & Pennebaker, J. W. (2008). Revealing dimensions of thinking in open-ended self-descriptions: An automated meaning extraction method for natural language. Journal of Research in Personality, 42, 96-132.
Cohn, M. A., Mehl, M. R., & Pennebaker, J. W. (2004). Linguistic markers of psychological change surrounding September 11, 2001. Psychological science, 15, 68793.
Crammer, K. & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3, 951-991.
Damasio, A. R. (1995). Descartes' error: Emotion, reason and the human brain.
NY: Harper Collins.
Davison, K. P., Pennebaker, J. W., & Dickerson, S. S. (2000). Who talks? The social psychology of illness support groups. American Psychologist, 55, 205-217.
De Choudhury, M., Counts, S., & Horvitz, E. (2013, April). Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 3267-3276). ACM.
Feixas, G., Geldschlager, H., & Neimeyer, R. A. (2002). Content analysis of personal constructs. Journal of Constructivist Psychology, 15, 119.
Fiedler, K., & Semin, G. R. (1992). Attribution and language as a sociocognitive environment. In G. R. Semin, and K. Fiedler (Eds.), Language, Interaction, and Social Cognition (pp. 58-78). Thousand Oaks, CA: Sage Publications, Inc.
Fitzsimmons, G. M., & Kay, A. C. (2004). Language and interpersonal cognition: Causal effects of variations in pronoun usage on perceptions of closeness. Personality and Social Psychology Bulletin, 5, 547-557.
Foltz, P. W. (1996). Latent semantic analysis for text-based research. Behavior Research Methods, Instruments & Computers, 28, 197-202.
Francis, W. N., & Kucera, H. (1982). Frequency analyses of English usage: Lexicon and grammar. MA: Houghton Mifflin.
Gazzaniga, M. S. (2005). The ethical brain. NY: Dana Press.
Genkin, A., Lewis, D. D., and Madigan, D. (2006). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291-304.
Gill, A. (2003). Personality and language. The projection and perception of personality in computer mediated communication. Unpublished doctoral dissertation. University of Edinburgh, Scotland.
Gill, A. J., Oberlander, J., & Austin, E. (2006). The perception of email personality at zero-acquaintance. Personality and Individual Differences, 40, 497-507.
Gortner, E. M., & Pennebaker, J. W. (2003). The anatomy of a disaster: Media coverage and community-wide health effects of the Texas A&M Bonfire tragedy.
Journal of Social and Clinical Psychology, 22, 580-603.
Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts (pp. 117-129). Mahwah, NJ: Erlbaum.
Gottschalk, L. A., & Gleser, G. C. (1969). The measurement of psychological states through the content analysis of verbal behavior. CA: University of California Press.
Graesser, A. C., Gernsbacher, M. A., & Goldman, S. R. (2003). Introduction to the Handbook of Discourse Processes. In A. C. Graesser, M. A. Gernsbacher, and S. R. Goldman, Handbook of discourse processes (pp. 123). Mahwah, NJ: Lawrence Erlbaum Associates.
Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H., Ventura, M., Olney, A., & Louwerse, M. M. (2004). AutoTutor: A tutor with dialogue in natural language. Behavioral Research Methods, Instruments, and Computers, 36, 180-193.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments & Computers, 36, 193-202.
Graham, L. E., Scherwitz, L., & Brand, R. (1989). Self reference and coronary heart disease incidence in the Western Collaborative Group Study. Psychosomatic Medicine, 51, 137-144.
Graybeal, A., Seagal, J. D., & Pennebaker, J. W. (2002). The role of storymaking in disclosure writing: The psychometrics of narrative. Psychology and Health, 17, 571-581.
Groom, C. J., & Pennebaker, J. W. (2005). The language of love: Sex, sexual orientation, and language use in online personal advertisements. Sex Roles, 52, 447-461.
Groom, C. J., & Pennebaker, J. W. (2003). Words. Journal of Research in Personality, 36, 615-621.
Hajek, C., & Giles, H. (2003). New directions in intercultural communication competence. In J. O. Greene and B. R.
Burleson (Eds.), Handbook of communication and social interaction skills (pp. 935-957). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Halliday, M. A. K., & Matthiessen, C. (2004). An introduction to functional grammar (3rd ed.). London: Arnold.
Hart, R. P., Jarvis, S. E., Jennings, W. P., & Smith-Howell, D. (2005). Political keywords: Using language that uses us. NY: Oxford University Press.
Hartley, J., Pennebaker, J. W., & Fox, C. (2003). Using new technology to assess the academic writing styles of male and female pairs and individuals. Journal of Technical Writing and Communication, 33, 243-261.
Hartley, J., Sotto, E., & Pennebaker, J. W. (2003). Speaking versus typing: A case-study of the effects of using voice-recognition software on academic correspondence. British Journal of Educational Technology, 34, 516.
Hartley, J., Sotto, E. and Pennebaker, J. W. (2002). Style and substance in psychology: Are influential articles more readable than less influential ones. Social Studies of Science, 32, 321-334.
Heberlein, A. S., Adolphs, R., Pennebaker, J. W., & Tranel, D. (2003). Effects of damage to right-hemisphere brain structures on spontaneous emotional and social judgments. Political Psychology, 24, 705-726.
Holtgraves, T. (2011). Text messaging, personality, and the social context. Journal of Research in Personality, 45, 92-99.
Holtzman, N. S., Vazire, S., & Mehl, M. R. (2010). Sounds like a narcissist: Behavioral manifestations of narcissism in everyday life. Journal of Research in Personality, 44, 478-484.
Ireland, M. E., & Henderson, M. D. (2014). Language style matching, engagement, and impasse in negotiations. Negotiation and conflict management research, 7, 116.
Ireland, M. E., Slatcher, R. B., Eastwick, P. W., Scissors, L. E., Finkel, E. J., & Pennebaker, J. W. (2011). Language style matching predicts relationship initiation and stability. Psychological science, 22, 39-44.
Kacewicz, E., Pennebaker, J. W., Davis, M., Jeon, M., & Graesser, A. C.
(2013). Pronoun use reflects standings in social hierarchies. Journal of Language and Social Psychology, 33, 125-143.
Kanagawa, C., Cross, S. E., & Markus, H. R. (2001). "Who am I?" The cultural psychology of the conceptual self. Personality and Social Psychology Bulletin, 27, 90-103.
Kashima, E. S., & Kashima, Y. (1998). Culture and language: The case of cultural dimensions and personal pronoun use. Journal of Cross-Cultural Psychology, 29, 461-486.
Kashima, E. S., & Kashima, Y. (2005). Erratum to Kashima and Kashima (1998) and reiteration. Journal of Cross-Cultural Psychology, 36, 396-400.
Koppel, M., Schler, J., & Zigdon, K. (2005, August). Determining an author's native language by mining a text for errors. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 624-628). ACM.
Koppel, M., Schler, J., Argamon, S., & Pennebaker, J. W. (2006). Effects of age and gender on blogging. Presented at AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs, Stanford, CA, March 2006.
Lee, Chang H., Nam, K., & Pennebaker, J. W. (2004). Is writing as much phonological as speaking? Homophone usage across speaking and writing. Psychologia: An International Journal of Psychology in the Orient, 47, 19.
Lepore, S. J., & Smyth, J. M. (2002). The writing cure: How expressive writing promotes health and emotional well-being. Washington: American Psychological Association.
Li, J., Zheng, R., & Chen, H. (2006). From fingerprint to writeprint. Communications of the ACM, 49, 76-82.
Liehr, P., Mehl, M. R., Summers, L. C., & Pennebaker, J. W. (2004). Connecting with others in the midst of a stressful upheaval on September 11, 2001. Applied Nursing Research, 17, 29.
Liehr, P., Takahashi, R., Nishimura, C., Frazier, L., Kuwajima, I. & Pennebaker, J. W. (2002). Embodied language: Comparison of the cardiac and stroke health experience for Japanese elders.
Journal of Nursing Scholarship, 34, 27-32.
Lyons, E. J., Mehl, M. R., & Pennebaker, J. W. (2006). Linguistic self-presentation in anorexia: Differences between pro-anorexia and recovering anorexia internet language use. Journal of Psychosomatic Research, 60, 253-256.
Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98, 224-253.
McAdams, D. P. (2001). The psychology of life stories. Review of General Psychology, 5, 100-122.
Mehl, M. R., & Pennebaker, J. W. (2003). The social dynamics of a cultural upheaval: Social interactions surrounding September 11, 2001. Psychological Science, 14, 57985.
Mehl, M. R., & Pennebaker, J. W. (2003). The sounds of social life: A psychometric analysis of students’ daily social environments and conversations. Journal of Personality and Social Psychology, 84, 857-870.
Mehl, M. R., Robbins, M. L., & Holleran, S. E. (2012). How taking a word for a word can be problematic: Context-dependent linguistic markers of extraversion and neuroticism. Journal of Methods and Measurement in the Social Sciences, 3, 30-50.
Miller, G. A. (1995). The Science of Words. NY: Scientific American Library.
Mitchell, T. (1999). Machine Learning. NY: McGraw-Hill.
Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes, 45, 211-236.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic style. Personality and Social Psychology Bulletin, 29, 665-675.
Niederhoffer, K. G. & Pennebaker, J. W. (2002). Linguistic style matching in social interaction. Journal of Language and Social Psychology, 21, 337-360.
Nisbett, R. E. (2003). The geography of thought: How Asians and Westerners think differently. NY: Free Press.
Oberlander, J., & Gill, A. J. (2006).
Language with character: A stratified corpus comparison of individual differences in email communication. Discourse Processes, 42, 239-270.
Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction. American Psychologist, 54, 741-754.
Pennebaker, J. W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8, 162-166.
Pennebaker, J. W. (2002). What our words can say about us: Towards a broader language psychology. Psychological Science Agenda, 15, 89.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count (LIWC): LIWC2007. Austin, TX: LIWC.net.
Pennebaker, J. W., Booth, R. J., Boyd, R. L., & Francis, M. E. (2015). Linguistic Inquiry and Word Count: LIWC2015. Austin, TX: Pennebaker Conglomerates (www.LIWC.net).
Pennebaker, J. W. & Campbell, R. S. (2000). The effects of writing about traumatic experience. Clinical Quarterly, 9, 17-21.
Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). When small words foretell academic success: The case of college admissions essays. PloS One, 9, 110.
Pennebaker, J. W. & Chung, C. K. (2005). Tracking the social dynamics of responses to terrorism: Language, behavior, and the Internet. In S. Wessely and V. N. Krasnov (Eds.), Psychological responses to the new terrorism: A NATO-Russia dialogue (pp. 159-170). Holland, Amsterdam: ISO Press.
Pennebaker, J. W. & Graybeal, A. (2001). Patterns of natural language use: Disclosure, personality, and social integration. Current Directions in Psychological Science, 10, 90-93.
Pennebaker, J. W. & Lee, Chang H. (2002). The power of words in social, clinical, and personality psychology. The Korean Journal of Thinking and Problem Solving, 12, 35-43.
Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in disclosure. Cognition and Emotion, 10, 601-626.
Pennebaker, J. W., Francis, M.
E., & Booth, R. J. (2001). Linguistic Inquiry and Word Count (LIWC): LIWC2001. Mahwah: Lawrence Erlbaum Associates.
Pennebaker, J. W., Groom, C. J., Loew, D., & Dabbs, J. M. (2004). Testosterone as a social inhibitor: Two case studies of the effect of testosterone treatment on language. Journal of Abnormal Psychology, 113, 172-175.
Pennebaker, J. W., & Ireland, M. (2008). Analyzing words to understand literature. In W. van Peer and J. Auracher (Eds.), New beginnings for the study of literature (pp. 24-48). Cambridge, UK: Cambridge Scholars Publishing.
Pennebaker, J. W., & Ireland, M. E. (2011). Using literature to understand authors: The case for computerized text analysis. Scientific Study of Literature, 1, 34-48.
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality & Social Psychology, 77, 1296-1312.
Pennebaker, J. W., Mayne, T., & Francis, M. E. (1997). Linguistic predictors of adaptive bereavement. Journal of Personality and Social Psychology, 72, 863-871.
Pennebaker, J. W., Mehl, M. R., & Niederhoffer, K. (2003). Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54, 547-577.
Pennebaker, J. W., Slatcher, R. B., & Chung, C. K. (2005). Linguistic markers of psychological state through media interviews: John Kerry and John Edwards in 2004, Al Gore in 2000. Analysis of Social and Public Policy, 5, 19.
Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the lifespan. Journal of Personality and Social Psychology, 85, 291-301.
Ramirez-Esparza, N., & Pennebaker, J. W. (2006). Do good stories produce good health? Exploring words, language, and culture. Narrative Inquiry, 16, 211-219.
Ramirez-Esparza, N., Pennebaker, J. W., Garcia, F. A., & Suria, R. (2007). La psicología del uso de las palabras: Un programa de computadora que analiza textos en Español (The psychology of word use: A computer program that analyzes texts in Spanish).
Revista Mexicana de Psicología, 24, 85-99.
Robinson, R. L., Navea, R., & Ickes, W. (2013). Predicting final course performance from students’ written self-introductions: A LIWC analysis. Journal of Language and Social Psychology, 32, 469-479.
Rochon, E., Saffran, E. M., Berndt, R. S., & Schwartz, M. F. (2000). Quantitative analysis of aphasic sentence production: Further development and new data. Brain and Language, 72, 193-218.
Rosenberg, S. D. & Tucker, G. J. (1978). Verbal behavior and schizophrenia: The semantic dimension. Archives of General Psychiatry, 36, 1331-1337.
Rude, S. S., Gortner, E. M., & Pennebaker, J. W. (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18, 1121-1133.
Sbarra, D. A., Smith, H. L., & Mehl, M. R. (2012). When leaving your ex, love yourself: Observational ratings of self-compassion predict the course of emotional recovery following marital separation. Psychological Science, 23, 261-269.
Scherwitz, L., Berton, K., & Leventhal, H. (1978). Type A behavior, self-involvement, and cardiovascular response. Psychosomatic Medicine, 40, 593-609.
Schiller, R., Tellegen, A., & Evens, J. (1995). An idiographic and nomothetic study of personality description. In J. N. Butcher and C. D. Spielberger (Eds.), Advances in personality assessment (pp. 123). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Schultheiss, O. C., & Brunstein, J. C. (2001). Assessment of implicit motives with a research version of the TAT: Picture profiles, gender differences, and relations to other personality measures. Journal of Personality Assessment, 77, Special issue: More data on the current Rorschach controversy, 71-86.
Scott, M. (1996). WordSmith. NY: Oxford University Press.
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34, 147.
Semin, G. R., Rubini, M., & Fiedler, K. (1995).
The answer is in the question: The effect of verb causality on the locus of explanation. Personality & Social Psychology Bulletin, 21, 834-841.
Skoyen, J. A., Randall, A. K., Mehl, M. R., & Butler, E. A. (2014). “We” overeat, but “I” can stay thin: Pronoun use and body weight in couples who eat to regulate emotion. Journal of Social and Clinical Psychology, 33, 743-766.
Slatcher, R. B. & Pennebaker, J. W. (2006). How do I love thee? Let me count the words: The social effects of expressive writing. Psychological Science, 17, 660-664.
Slatcher, R. B., Chung, C. K., Pennebaker, J. W., & Stone, L. D. (2007). Winning words: Individual differences in linguistic style among U.S. presidential and vice presidential candidates. Journal of Research in Personality, 41, 63-75.
Slobin, D. (1996). From “thought” and “language” to “thinking” for “speaking”. In J. J. Gumperz and S. J. Levinson (Eds.), Rethinking linguistic relativity (pp. 70-96). New York, NY: Cambridge University Press.
Stiles, W. B. (1992). Describing talk: A taxonomy of verbal response modes. California: Sage.
Stirman, S. W., & Pennebaker, J. W. (2001). Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine, 63, 517-522.
Stone, L. D., & Pennebaker, J. W. (2002). Trauma in real time: Talking and avoiding online conversations about the death of Princess Diana. Basic & Applied Social Psychology, 24, 172-182.
Stone, P. J., Dunphy, D. C., & Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis. Cambridge: MIT Press.
Tannen, D. (1993). Framing in discourse. London: Oxford University Press.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods.
Journal of language and social psychology, 29, 24-54.
Tausczik, Y., Faasse, K., Pennebaker, J. W., & Petrie, K. J. (2012). Public anxiety and information seeking following the H1N1 outbreak: blogs, newspaper articles, and Wikipedia visits. Health Communication, 27, 179-185.
Toma, C. L., & Hancock, J. T. (2010, February). Reading between the lines: linguistic cues to deception in online dating profiles. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (pp. 58). ACM.
Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with twitter: What 140 characters reveal about political sentiment. ICWSM, 10, 178-185.
Van Petten, C., & Kutas, M. (1991). Influences of semantic and syntactic context on open- and closed-class words. Memory & Cognition, 19, 95-112.
Van Swol, L. M., & Carlson, C. L. (2015). Language use and influence among minority, majority, and homogeneous group members. Communication Research, 43, 118.
Väyrynen, J., & Honkela, T. (2005). Comparison of independent component analysis and singular value decomposition in word context analysis. Proceedings of AKRR, 5, 135-140.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063-1070.
Weber-Fox, C., & Neville, H. J. (2001). Sensitive periods differentiate processing of open- and closed-class words: An event-related brain potential study of bilinguals. Journal of Speech, Language, and Hearing Research, 44, 1338-1353.
Weintraub, W. (1989). Verbal behavior in everyday life. NY: Springer.
Williams-Baucom, K. J., Atkins, D. C., Sevier, M., Eldridge, K. A., & Christensen, A. (2010). “You” and “I” need to talk about “us”: Linguistic patterns in marital interactions. Personal Relationships, 17, 41-56.
Winter, D. G., & McClelland, D. C. (1978).
Thematic analysis: An empirically derived measure of the effects of liberal arts education. Journal of Educational Psychology, 70, 816.
Wolf, M., Horn, A. B., Mehl, M. R., Haug, S., Pennebaker, J. W., & Kordy, H. (2008). Computergestützte quantitative Textanalyse: Äquivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count. Diagnostica, 54, 85-98.
Zijlstra, H., Van Meerveld, T., Van Middendorp, H., Pennebaker, J. W., & Geenen, R. (2004). De Nederlandse versie van de ‘linguistic inquiry and word count’ (LIWC). Gedrag & gezondheid, 32, 271-281.

Portions of the research reported in this manual were made possible by grants from the National Institutes of Health (MH52391), National Science Foundation (IIS 1344257), the Army Research Institute (W5J9CQ12C0043), and the Templeton Foundation. Special thanks go to Cindy Chung. Cindy’s mastery of language, thoughtful feedback, and valuable insights have been vital to the ongoing longevity of the LIWC project. We are also deeply indebted to a number of people who have helped with different phases of LIWC, including: Martha Francis, Laura King, Yitai Seah, Jenna Baddelley, Molly Ireland, Yla Tausczik, Matthias Mehl, Richard Slatcher, Jason Ferrell, Sam Gosling, and Gabriella Harari. We are particularly indebted to the LIWC2015 Development Team of Kiki Adams, Jennifer Caplan, Zachary Reese, Courtney Wang, and Nick Abbs.
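
As a worked check on the notes to Table 3, the following minimal Python sketch recomputes two Grand Means from the per-genre means printed in that table. The dictionary layout and the `grand_mean` helper are illustrative assumptions, not part of the LIWC2015 software; `None` marks the natural speech corpus, which the notes say was excluded for words per sentence.

```python
from statistics import mean

# Per-genre means copied from Table 3, in column order:
# Blogs, Expressive writing, Novels, Natural Speech, NY Times, Twitter.
# None marks a genre that the table notes exclude for that row.
GENRE_MEANS = {
    "Affective processes": [5.79, 4.77, 4.81, 6.54, 3.82, 7.67],
    "Words/sentence": [18.40, 18.42, 16.13, None, 21.94, 12.10],
}


def grand_mean(values):
    """Unweighted mean over the genres that report a value (hypothetical helper)."""
    reported = [v for v in values if v is not None]
    return round(mean(reported), 2)


for category, values in GENRE_MEANS.items():
    print(category, grand_mean(values))
```

Each genre counts equally regardless of how many texts or words it contains, which is what "unweighted" means in the table notes; the two results agree with the Grand Means column (5.57 and 17.40).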