A Practical Guide To Sentiment Analysis


Preface
Contents
1 Affective Computing and Sentiment Analysis
2 Many Facets of Sentiment Analysis
3 Reflections on Sentiment/Opinion Analysis
4 Challenges in Sentiment Analysis
5 Sentiment Resources: Lexicons and Datasets
6 Generative Models for Sentiment Analysis and Opinion Mining
7 Social Media Summarization
8 Deception Detection and Opinion Spam
9 Concept-Level Sentiment Analysis with SenticNet
Index

Socio-Affective Computing 5

ErikCambria

DipankarDas

SivajiBandyopadhyay

AntonioFeraco Editors

A Practical Guide to Sentiment Analysis

Socio-Affective Computing

Volume 5

Series Editor

Amir Hussain, University of Stirling, Stirling, UK

Co-Editor

Erik Cambria, Nanyang Technological University, Singapore

This exciting Book Series aims to publish state-of-the-art research on socially

intelligent, affective and multimodal human-machine interaction and systems.

It will emphasize the role of affect in social interactions and the humanistic side

of affective computing by promoting publications at the cross-roads between

engineering and human sciences (including biological, social and cultural aspects

of human life). Three broad domains of social and affective computing will be

covered by the book series: (1) social computing, (2) affective computing, and

(3) interplay of the ﬁrst two domains (for example, augmenting social interaction

through affective computing). Examples of the first domain include, but are not

limited to: all types of social interactions that contribute to the meaning, interest and

richness of our daily life, for example, information produced by a group of people

used to provide or enhance the functioning of a system. Examples of the second

domain include, but are not limited to: computational and psychological models of

emotions, bodily manifestations of affect (facial expressions, posture, behavior,

physiology), and affective interfaces and applications (dialogue systems, games,

learning etc.). This series will publish works of the highest quality that advance

the understanding and practical application of social and affective computing

techniques. Research monographs, introductory and advanced level textbooks,

volume editions and proceedings will be considered.

More information about this series at http://www.springer.com/series/13199


Editors

Erik Cambria

School of Computer Science

and Engineering

Nanyang Technological University

Singapore, Singapore

Sivaji Bandyopadhyay

Computer Science

and Engineering Department

Jadavpur University

Kolkata, India

Dipankar Das

Computer Science

and Engineering Department

Jadavpur University

Kolkata, India

Antonio Feraco

Fraunhofer IDM@NTU

Nanyang Technological University

Singapore, Singapore

ISSN 2509-5706 ISSN 2509-5714 (electronic)

Socio-Affective Computing

ISBN 978-3-319-55392-4 ISBN 978-3-319-55394-8 (eBook)

DOI 10.1007/978-3-319-55394-8

Library of Congress Control Number: 2017938021

Chapter 4 is published with kind permission of Her Majesty the Queen in Right of Canada.

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of

the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation,

broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information

storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology

now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication

does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant

protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book

are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or

the editors give a warranty, express or implied, with respect to the material contained herein or for any

errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional

claims in published maps and institutional afﬁliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

While sentiment analysis research has become very popular in the past ten years,

most companies and researchers still approach it simply as a polarity detection

problem. In reality, sentiment analysis is a “suitcase problem” that requires tackling

many natural language processing (NLP) subtasks, including microtext analysis,

sarcasm detection, anaphora resolution, subjectivity detection, and aspect extrac-

tion. In this book, we propose an overview of the main issues and challenges

associated with current sentiment analysis research and provide some insights on

practical tools and techniques that can be exploited to both advance the state of the

art in all sentiment analysis subtasks and explore new areas in the same context.

In Chap. 1, we discuss the state of the art of affective computing and sentiment

analysis research, including recent deep learning techniques and linguistic patterns

for emotion and polarity detection from different modalities, e.g., text and video.

In Chap. 2, Bing Liu describes different aspects of sentiment analysis and

different types of opinions. In particular, he uses product reviews as examples to

introduce general key concepts and deﬁnitions that are applicable to all forms of

formal and informal opinion text and all kinds of domains including social and

political domains.

In Chap. 3, Jiwei Li and Eduard Hovy describe possible directions for deeper

understanding about what opinions or sentiments are, why people hold them, and

why and how their facets are chosen and expressed, helping bridge the gap between

psychology/cognitive science and computational approaches.

In Chap. 4, Saif Mohammad discusses different sentiment analysis problems and

the challenges that are to be faced in order to go beyond simply determining whether

a piece of text is positive, negative, or neutral. In particular, the chapter aims to equip

researchers and practitioners with pointers to the latest developments in sentiment

analysis and encourage more work in the diverse landscape of problems, especially

those areas that are relatively less explored.

In Chap. 5, Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire contrast the

process of lexicon creation for a new language or a resource-scarce language from

a resource-rich one and, hence, show how the produced sentiment resources can be

exploited to solve classic sentiment analysis problems.


In Chap. 6, Hongning Wang and ChengXiang Zhai show how generative models

can be used to integrate opinionated text data and their companion numerical

sentiment ratings, enabling deeper analysis of sentiment and opinions to obtain not

only subtopic-level sentiment but also latent relative weights on different subtopics.

In Chap. 7, Vasudeva Varma, Litton Kurisinkel, and Priya Radhakrishnan present

an overview of general approaches to automated text summarization with more

emphasis on extractive summarization techniques. They also describe recent works

on extractive summarization and the nature of scoring function for candidate

summary.

In Chap. 8, Paolo Rosso and Leticia Cagnina describe the very challenging

problems of deception detection and opinion spam detection, as lies and spam are

becoming increasingly serious issues with the rise, both in size and importance, of

social media and public opinion.

Finally, in Chap. 9, Federica Bisio et al. describe how to enhance the accuracy

of any algorithm for emotion or polarity detection through the integration of

commonsense reasoning resources, e.g., by embedding a concept-level knowledge

base for sentiment analysis.

Singapore, Singapore Erik Cambria

Kolkata, India Dipankar Das

Kolkata, India Sivaji Bandyopadhyay

Singapore, Singapore Antonio Feraco

Contents

1 Affective Computing and Sentiment Analysis

Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco

2 Many Facets of Sentiment Analysis

Bing Liu

3 Reflections on Sentiment/Opinion Analysis

Jiwei Li and Eduard Hovy

4 Challenges in Sentiment Analysis

Saif M. Mohammad

5 Sentiment Resources: Lexicons and Datasets

Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire

6 Generative Models for Sentiment Analysis and Opinion Mining

Hongning Wang and ChengXiang Zhai

7 Social Media Summarization

Vasudeva Varma, Litton J. Kurisinkel, and Priya Radhakrishnan

8 Deception Detection and Opinion Spam

Paolo Rosso and Leticia C. Cagnina

9 Concept-Level Sentiment Analysis with SenticNet

Federica Bisio, Claudia Meda, Paolo Gastaldo, Rodolfo Zunino, and Erik Cambria

Index

Chapter 1

Affective Computing and Sentiment Analysis

Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco

Abstract Understanding emotions is one of the most important aspects of personal

development and growth and, as such, it is a key tile for the emulation of

human intelligence. Besides being important for the advancement of AI, emotion

processing is also important for the closely related task of polarity detection. The

opportunity to automatically capture the sentiments of the general public about

social events, political movements, marketing campaigns, and product preferences,

in fact, has raised increasing interest both in the scientiﬁc community, for the

exciting open challenges, and in the business world, for the remarkable payoffs

in marketing and financial market prediction. This has led to the emerging fields

of affective computing and sentiment analysis, which leverage human-computer

interaction, information retrieval, and multimodal signal processing for distilling

people’s sentiments from the ever-growing amount of online social data.

Keywords Affective computing • Sentiment analysis • Five eras of the Web •

Jumping NLP curves • Hybrid approaches

1.1 Introduction

Emotions play an important role in successful and effective human-human relation-

ships. In fact, in many situations, human ‘emotional intelligence’ is more important

than IQ for successful interaction (Pantic et al. 2005). There is also signiﬁcant

evidence that rational learning in humans is dependent on emotions (Picard 1997).

E. Cambria (✉)

School of Computer Science and Engineering, Nanyang Technological University, 639798,

Singapore, Singapore

e-mail: cambria@ntu.edu.sg

D. Das • S. Bandyopadhyay

Computer Science and Engineering Department, Jadavpur University, 700032, Kolkata, India

e-mail: das@cse.jdvu.ac.in; sbandyopadhyay@cse.jdvu.ac.in

A. Feraco

Fraunhofer IDM@NTU, Nanyang Technological University, Singapore, Singapore

e-mail: antonio.feraco@fraunhofer.sg

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_1


Affective computing and sentiment analysis, hence, are key for the advancement

of AI (Minsky 2006) and all the research ﬁelds that stem from it. Moreover, they ﬁnd

applications in several different scenarios and there is a good number of companies,

large and small, that include the analysis of emotions and sentiments as part of

their mission. Sentiment mining techniques can be exploited for the creation and

automated upkeep of review and opinion aggregation websites, in which opinionated

text and videos are continuously gathered from the Web and not restricted to

just product reviews, but also to wider topics such as political issues and brand

perception.

Affective computing and sentiment analysis also have great potential as a

sub-component technology for other systems. They can enhance the capabilities

of customer relationship management and recommendation systems by making it possible, for

example, to find out which features customers are particularly happy about or

to exclude from the recommendations items that have received very negative

feedback. Similarly, they can be exploited for affective tutoring and affective

entertainment or for troll filtering and spam detection in online social communication.

Business intelligence is also one of the main factors behind corporate interest

in the ﬁelds of affective computing and sentiment analysis. Nowadays, companies

invest an increasing amount of money in marketing strategies and they are constantly

interested in both collecting and predicting the attitudes of the general public

towards their products and brands. The design of automatic tools capable of mining

sentiments over the Web in real time and of creating condensed versions of these

represents one of the most active research and development areas. The development

of such systems, moreover, is not only important for commercial purposes, but

also for government intelligence applications able to monitor increases in hostile

communications or to model cyber-issue diffusion.

Several commercial and academic tools, e.g., IBM,¹ SAS,² Oracle,³ SenticNet⁴

and Luminoso,⁵ track public viewpoints on a large scale by offering graphical

summarizations of trends and opinions in the blogosphere. Nevertheless, most

commercial off-the-shelf (COTS) tools are limited to a polarity evaluation or a mood

classiﬁcation according to a very limited set of emotions. In addition, such methods

mainly rely on parts of text in which emotional states are explicitly expressed

and, hence, they are unable to capture opinions and sentiments that are expressed

implicitly. Because they are mainly based on statistical properties associated with

words, in fact, many COTS tools are easily tricked by linguistic operators such as

negation and disjunction.

The remainder of this chapter lists common tasks of affective computing and

sentiment analysis and presents a general categorization for them, after which some

concluding remarks are proposed.

¹ http://ibm.com/analytics

² http://sas.com/social

³ http://oracle.com/social

⁴ http://business.sentic.net

⁵ http://luminoso.com


1.2 Common Tasks

The Web is evolving towards an era where communities will deﬁne future products

and services.⁶ In this context, big social data analysis (Cambria et al. 2014) is

destined to attract increasing interest from both academia and business (Fig. 1.1).

Fig. 1.1 Owyang’s Five-Eras vision shows that mining sentiments from the general public is

becoming increasingly important for the future of the Web

⁶ http://web-strategist.com/blog/2009/04/27


The basic tasks of affective computing and sentiment analysis are emotion

recognition (Picard 1997; Calvo and D’Mello 2010; Zeng et al. 2009; Schuller et al.

2011; Gunes and Schuller 2012) and polarity detection (Pang and Lee 2008; Liu

2012; Wilson et al. 2005; Cambria 2016). While the former focuses on extracting a

set of emotion labels, the latter is usually a binary classiﬁcation task with outputs

such as ‘positive’ versus ‘negative’, ‘thumbs up’ versus ‘thumbs down’ or ‘like’

versus ‘dislike’. These two tasks are highly inter-related and inter-dependent to the

extent that some sentiment categorization models, e.g., the Hourglass of Emotions

(Cambria et al. 2012), treat them as a single task by inferring the polarity associated with

a sentence directly from the emotions it conveys. In many cases, in fact, emotion

recognition is considered a sub-task of polarity detection.

Polarity classiﬁcation itself can also be viewed as a subtask of more advanced

analyses. For example, it can be applied to identifying ‘pro and con’ expressions that

can be used in individual reviews to evaluate the pros and cons that have inﬂuenced

the judgements of a product and that make such judgements more trustworthy.

Another instance of binary sentiment classiﬁcation is agreement detection, that is,

given a pair of affective inputs, deciding whether they should receive the same or

differing sentiment-related labels.

Complementary to binary sentiment classiﬁcation is the assignment of degrees of

positivity to the detected polarity or valence to the inferred emotions. If we waive the

assumption that the input under examination is opinionated and it is about one single

issue or item, new challenging tasks arise, e.g., subjectivity detection, opinion target

identiﬁcation, and more (Cambria et al. 2015). The capability of distinguishing

whether an input is subjective or objective, in particular, can be highly beneﬁcial

for a more effective sentiment classiﬁcation. Moreover, a record can also have a

polarity without necessarily containing an opinion, for example a news article can

be classiﬁed into good or bad news without being subjective.

Typically, affective computing and sentiment analysis are performed over on-

topic documents, e.g., on the result of a topic-based search engine. However, several

studies suggest that handling topic detection and sentiment analysis jointly can benefit

overall performance. For example, off-topic passages of a document could contain

irrelevant affective information and prove misleading for the global sentiment

polarity about the main topic. Also, a document can contain material on multiple

topics that may be of interest to the user. In this case, it is therefore necessary to

identify the topics and separate the opinions associated with each of them.

Similar to topic detection is aspect extraction, a subtask of sentiment analysis

that consists in identifying opinion targets in opinionated text, i.e., in detecting

the speciﬁc aspects of a product or service the opinion holder is either praising

or complaining about. In a recent approach, Poria et al. (2016) used a 7-layer deep

convolutional neural network to tag each word in opinionated sentences as either

aspect or non-aspect word and developed a set of linguistic patterns for the same

purpose in combination with the neural network.

Other sentiment analysis subtasks include aspect extraction (Poria et al. 2016),

subjectivity detection (Chaturvedi et al. 2016), concept extraction (Rajagopal et al.


2013), named entity recognition (Ma et al. 2016), and sarcasm detection (Poria et al.

2016), but also complementary tasks such as personality recognition (Poria et al.

2013), user proﬁling (Mihalcea and Garimella 2016) and especially multimodal

fusion (Poria et al. 2016). With an increasing number of webcams installed in end-user

devices such as smartphones, touchpads, or netbooks, there is an increasing

amount of affective information posted to social online services in an audio or

audiovisual format rather than on a purely textual basis. To give a rough idea of

the scale, consider that two days of video material are uploaded to YouTube on

average per minute. Besides speech-to-text recognition, this allows for additional

exploitation of acoustic information, facial expression and body movement analysis

or even the “mood” of the background music or the color ﬁlters, etc.

Multimodal fusion integrates all single modalities into a single combined

representation. There are basically two types of fusion techniques that have been

used in most of the literature to improve reliability in emotion recognition from

multimodal information: feature-level fusion and decision-level fusion (Konar and

Chakraborty 2015). The authors in Raaijmakers et al. (2008) fuse acoustic and

linguistic information. Yet, linguistic information is based on the transcript of the

spoken content rather than on automatic speech recognition output. In Morency et al.

(2011), acoustic, textual, and video features are combined for the assessment of

opinion polarity in 47 YouTube videos. A signiﬁcant improvement is demonstrated

in a leave-one-video-out evaluation using Hidden-Markov-Models for classiﬁcation.

As relevant features the authors identify polarized words, smile, gaze, pauses, and

voice pitch. Textual analysis is, however, also only based on the manual transcript

of spoken words.

In Poria et al. (2016), ﬁnally, the authors propose a novel methodology for

multimodal sentiment analysis, which consists in harvesting sentiments from Web

videos by demonstrating a model that uses audio, visual and textual modalities as

sources of information. They used both feature- and decision-level fusion methods

to merge affective information extracted from multiple modalities, achieving an

accuracy of nearly 80%.
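The difference between the two fusion strategies can be sketched in a few lines of Python. This is a minimal illustration and not the method of any of the works cited above: the modality names, feature values, scores, and weights are all hypothetical, and a real system would pass the fused vector (or the per-modality scores) to a trained classifier.

```python
from typing import Dict, List


def feature_level_fusion(features: Dict[str, List[float]]) -> List[float]:
    """Early fusion: concatenate per-modality features into one vector.

    Modalities are visited in sorted name order so the layout is stable.
    """
    fused: List[float] = []
    for modality in sorted(features):
        fused.extend(features[modality])
    return fused


def decision_level_fusion(scores: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Late fusion: weighted average of per-modality polarity scores."""
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total_weight


# Hypothetical features extracted from one video clip.
clip_features = {"audio": [0.2, 0.7], "visual": [0.9], "text": [0.1, 0.4, 0.5]}
fused_vector = feature_level_fusion(clip_features)

# Hypothetical polarity scores in [-1, 1] from three separate classifiers.
clip_scores = {"audio": 0.5, "visual": -0.5, "text": 1.0}
clip_weights = {"audio": 1.0, "visual": 1.0, "text": 2.0}
fused_score = decision_level_fusion(clip_scores, clip_weights)
```

Feature-level fusion lets a single model learn cross-modal interactions, whereas decision-level fusion keeps each modality's classifier independent and only combines their outputs, which makes it easier to cope with a missing modality.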

1.3 General Categorization

Existing approaches to affective computing and sentiment analysis can be grouped

into three main categories: knowledge-based techniques, statistical methods, and

hybrid approaches.

Knowledge-based techniques are very popular because of their accessibility and

economy. Text is classiﬁed into affect categories based on the presence of fairly

unambiguous affect words like ‘happy’, ‘sad’, ‘afraid’, and ‘bored’. Popular sources

of affect words or multi-word expressions are Ortony’s Affective Lexicon (Ortony

et al. 1988), Wiebe’s linguistic annotation scheme (Wiebe et al. 2005), WordNet-


Affect (Strapparava and Valitutti 2004), SentiWordNet (Esuli and Sebastiani 2006),

SenticNet (Cambria et al. 2016), and other probabilistic knowledge bases trained

from linguistic corpora (Stevenson et al. 2007; Somasundaran et al. 2008; Rao

and Ravichandran 2009). The major weakness of knowledge-based approaches is

poor recognition of affect when linguistic rules are involved. For example, while

a knowledge base can correctly classify the sentence “today was a happy day”

as being happy, it is likely to fail on a sentence like “today wasn’t a happy

day at all”. To this end, more sophisticated knowledge-based approaches exploit

linguistic rules to distinguish how each specific knowledge base entry is used in

text (Poria et al. 2015). The validity of knowledge-based approaches, moreover,

heavily depends on the depth and breadth of the employed resources. Without

a comprehensive knowledge base that encompasses human knowledge, in fact,

it is not easy for a sentiment mining system to grasp the semantics associated

with natural language or human behavior. Another limitation of knowledge-based

approaches lies in the typicality of their knowledge representation, which is usually

strictly deﬁned and does not allow handling different concept nuances, as the

inference of semantic and affective features associated with concepts is bounded

by the ﬁxed, ﬂat representation.
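The negation example above can be made concrete with a short sketch. It assumes a tiny, hypothetical affect lexicon and a deliberately crude negation rule (a negator within a few tokens before the affect word); neither is taken from the resources cited above, and real linguistic rules are considerably richer.

```python
import re

# Tiny hypothetical affect lexicon; real resources are far larger.
AFFECT_WORDS = {"happy", "sad", "afraid", "bored"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase, expand "n't" to " not", and split into alphabetic tokens."""
    text = text.lower().replace("\u2019", "'").replace("n't", " not")
    return re.findall(r"[a-z]+", text)


def keyword_spotting(text):
    """Return every affect word found, ignoring context."""
    return [tok for tok in tokenize(text) if tok in AFFECT_WORDS]


def with_negation_rule(text, window=3):
    """Prefix an affect word with 'not_' if a negator occurs within
    the `window` tokens before it."""
    tokens = tokenize(text)
    labels = []
    for i, tok in enumerate(tokens):
        if tok in AFFECT_WORDS:
            negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
            labels.append("not_" + tok if negated else tok)
    return labels


plain = keyword_spotting("today wasn't a happy day at all")
ruled = with_negation_rule("today wasn't a happy day at all")
```

Here `plain` still reports `happy` for the negated sentence, while `ruled` reports `not_happy`; on "today was a happy day" both functions agree.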

Statistical methods, such as support vector machines and deep learning, have

been popular for affect classiﬁcation of texts and have been used by researchers

on projects such as Pang’s movie review classiﬁer (Pang et al. 2002) and many

others (Hu and Liu 2004; Glorot et al. 2011; Socher et al. 2013; Lau et al. 2014;

Oneto et al. 2016). By feeding a machine learning algorithm a large training corpus

of affectively annotated texts, it is possible for the system to not only learn the

affective valence of affect keywords (as in the keyword spotting approach), but also

to take into account the valence of other arbitrary keywords (like lexical afﬁnity)

and word co-occurrence frequencies. However, statistical methods are generally

semantically weak, i.e., lexical or co-occurrence elements in a statistical model have

little predictive value individually. As a result, statistical text classiﬁers only work

with acceptable accuracy when given a sufﬁciently large text input. So, while these

methods may be able to affectively classify user’s text on the page- or paragraph-

level, they do not work well on smaller text units such as sentences or clauses.
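As a rough illustration of the statistical route, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on a four-sentence toy corpus. Naive Bayes is swapped in here for brevity in place of support vector machines or deep learning, and the corpus, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        label_total = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                count = word_counts[label][tok]
                score += math.log((count + 1) / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, affectively annotated toy corpus.
corpus = [
    ("a great and moving film", "positive"),
    ("great acting and a great plot", "positive"),
    ("a dull and boring film", "negative"),
    ("boring plot and dull acting", "negative"),
]
model = train(corpus)
```

With this corpus, `predict(model, "great film")` yields `positive` and `predict(model, "dull plot")` yields `negative`. An input containing no in-vocabulary words is decided by the class priors alone, which hints at why such models need sufficiently large inputs to be reliable.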

Hybrid approaches to affective computing and sentiment analysis, ﬁnally, exploit

both knowledge-based techniques and statistical methods to perform tasks such as

emotion recognition and polarity detection from text or multimodal data. Sentic

computing (Cambria and Hussain 2015), for example, exploits an ensemble of

knowledge-driven linguistic patterns and statistical methods to infer polarity from

text. Xia et al. (2015) used SenticNet and a Bayesian model for contextual

concept polarity disambiguation. Dragoni et al. (2014) proposed a fuzzy framework

which merges WordNet, ConceptNet and SenticNet to extract key concepts from a

sentence. iFeel (Araújo et al. 2014) is a system that allows users to create their own

sentiment analysis framework by combining SenticNet, SentiWordNet and other

sentiment analysis methods. Chenlo and Losada (2014) used SenticNet to extract a bag of

concepts and polarity features for subjectivity detection and other sentiment analysis

tasks. Chung et al. (2014) used SenticNet concepts as seeds and proposed a method


of random walk in ConceptNet to retrieve more concepts along with polarity scores.

Other works propose the joint use of knowledge bases and machine learning for

Twitter sentiment analysis (Bravo-Marquez et al. 2014), short text message classiﬁ-

cation (Gezici et al. 2013) and frame-based opinion mining (Recupero et al. 2014).
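One simple way to combine the two families is to consult a knowledge base first and fall back to a statistical classifier only when the knowledge base has no coverage. The sketch below illustrates that general idea only; it is not the design of sentic computing or of any system cited above, and the concept polarities and the stand-in fallback classifier are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical concept-polarity entries in [-1, 1].
CONCEPT_POLARITY: Dict[str, float] = {
    "happy": 0.8, "great": 0.7, "sad": -0.7, "boring": -0.6,
}


def knowledge_based_score(tokens: List[str],
                          lexicon: Dict[str, float]) -> Optional[float]:
    """Average polarity of known tokens, or None if nothing is covered."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else None


def hybrid_polarity(text: str,
                    lexicon: Dict[str, float],
                    fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, source): knowledge base first, statistics as backup."""
    score = knowledge_based_score(text.lower().split(), lexicon)
    if score is None:
        return fallback(text), "statistical"
    return ("positive" if score >= 0 else "negative"), "knowledge"


def toy_statistical_classifier(text: str) -> str:
    """Stand-in for a trained model; always answers 'negative'."""
    return "negative"


covered = hybrid_polarity("a great day", CONCEPT_POLARITY,
                          toy_statistical_classifier)
uncovered = hybrid_polarity("meh whatever", CONCEPT_POLARITY,
                            toy_statistical_classifier)
```

Returning the source alongside the label makes it easy to measure how often the knowledge base actually covers the input and how often the statistical backup is needed.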

1.4 Conclusion

The passage from a read-only to a read-write Web made users more enthusiastic

about sharing their emotions and opinions through social networks, online

communities, blogs, wikis, and other online collaborative media. In recent years, this

collective intelligence has spread to many different areas of the Web, with particular

focus on ﬁelds related to our everyday life such as commerce, tourism, education,

and health.

Despite signiﬁcant progress, however, affective computing and sentiment anal-

ysis are still ﬁnding their own voice as new inter-disciplinary ﬁelds. Engineers

and computer scientists use machine learning techniques for automatic affect

classiﬁcation from video, voice, text, and physiology. Psychologists use their long

tradition of emotion research with their own discourse, models, and methods.

Affective computing and sentiment analysis are research ﬁelds inextricably bound

to the affective sciences that attempt to understand human emotions. Simply put, the

development of affect-sensitive systems cannot be divorced from the century-long

psychological research on emotion.

Hybrid approaches aim to better grasp the conceptual rules that govern sentiment

and the clues that can convey these concepts from realization to verbalization in

the human mind. In recent years, such approaches have been establishing affective

computing and sentiment analysis as interdisciplinary fields in between mere

NLP and natural language understanding by gradually shifting from syntax-based

techniques to more and more semantics-aware frameworks (Cambria and White

2014), where both conceptual knowledge and sentence structure are taken into

account (Fig. 1.2).

So far, sentiment mining approaches from text or speech have been mainly based

on the bag-of-words model because, at ﬁrst glance, the most basic unit of linguistic

structure appears to be the word. Single-word expressions, however, are just a subset

of concepts, multi-word expressions that carry speciﬁc semantics and sentics, that

is, the denotative and connotative information commonly associated with objects,

actions, events, and people. Sentics, in particular, speciﬁes the affective information

associated with real-world entities, which is key for emotion recognition and

polarity detection, the basic tasks of affective computing and sentiment analysis.

The best way forward for these two ﬁelds, hence, is the ensemble application of

semantic knowledge and machine learning, where different approaches can cover

for each other’s ﬂaws. In particular, the combined application of linguistics and

knowledge bases will allow sentiments to flow from concept to concept based on

the dependency relation of the input sentence, while machine learning will act as

backup for missing concepts and unknown linguistic patterns.

Fig. 1.2 Jumping NLP curves

Next-generation sentiment mining systems need broader and deeper common

and commonsense knowledge bases, together with more brain-inspired and

psychologically-motivated reasoning methods, in order to better understand

natural language opinions and, hence, more efﬁciently bridge the gap between

(unstructured) multimodal information and (structured) machine-processable data.

Looking ahead, blending scientiﬁc theories of emotion with the practical engi-

neering goals of analyzing sentiments in natural language and human behavior

will pave the way for development of more bio-inspired approaches to the design

of intelligent sentiment mining systems capable of handling semantic knowledge,

making analogies, learning new affective knowledge, and detecting, perceiving, and

‘feeling’ emotions.

References

Araújo, M., P. Gonçalves, M. Cha, and F. Benevenuto. 2014. iFeel: A system that compares and

combines sentiment analysis methods. In WWW, 75–78.

Bravo-Marquez, F., M. Mendoza, and B. Poblete. 2014. Meta-level sentiment models for big social

data analysis. Knowledge-Based Systems 69: 86–99.


Calvo, R., and S. D’Mello. 2010. Affect detection: An interdisciplinary review of models, methods,

and their applications. IEEE Transactions on Affective Computing 1(1): 18–37.

Cambria, E. 2016. Affective computing and sentiment analysis. IEEE Intelligent Systems 31(2):

102–107.

Cambria, E., and A. Hussain. 2015. Sentic computing: A common-sense-based framework for

concept-level sentiment analysis. Cham: Springer.

Cambria, E., A. Livingstone, and A. Hussain. 2012. The hourglass of emotions. In Cognitive

behavioral systems, ed. A. Esposito, A. Vinciarelli, R. Hoffmann, and V. Muller, Lecture notes

in computer science, vol. 7403, 144–157. Berlin/Heidelberg: Springer.

Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for

sentiment analysis based on conceptual primitives. In COLING, 2666–2677.

Cambria, E., S. Poria, F. Bisio, R. Bajpai, and I. Chaturvedi. 2015. The CLSA model: A novel

framework for concept-level sentiment analysis. In Computational linguistics and intelligent

text processing. CICLing 2015, ed. A. Gelbukh, LNCS, vol. 9042, 3–22. Cham: Springer.

Cambria, E., H. Wang, and B. White. 2014. Guest editorial: Big social data analysis. Knowledge-

Based Systems 69: 1–2.

Cambria, E., and B. White. 2014. Jumping NLP curves: A review of natural language processing

research. IEEE Computational Intelligence Magazine 9(2): 48–57.

Chaturvedi, I., E. Cambria, and D. Vilares. 2016. Lyapunov ﬁltering of objectivity for Spanish

sentiment model. In IJCNN, 4474–4481.

Chenlo, J.M., and D.E. Losada. 2014. An empirical study of sentence features for subjectivity and

polarity classiﬁcation. Information Sciences 280: 275–288.

Chung, J.K.C., C.E. Wu, and R.T.H. Tsai. 2014. Improve polarity detection of online reviews with

bag-of-sentimental-concepts. In Proceedings of the 11th ESWC. Semantic Web Evaluation

Challenge. Crete: Springer.

Dragoni, M., A.G. Tettamanzi, and C. da Costa Pereira. 2014. A fuzzy system for concept-level

sentiment analysis. In Semantic web evaluation challenge, 21–27. Cham: Springer.

Esuli, A., and F. Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion

mining. In LREC.

Gezici, G., R. Dehkharghani, B. Yanikoglu, D. Tapucu, and Y. Saygin. 2013. Su-sentilab: A

classiﬁcation system for sentiment analysis in twitter. In International Workshop on Semantic

Evaluation, 471–477.

Glorot, X., A. Bordes, and Y. Bengio. 2011. Domain adaptation for large-scale sentiment

classiﬁcation: A deep learning approach. In ICML, Bellevue.

Gunes, H., and B. Schuller. 2012. Categorical and dimensional affect analysis in continuous input:

Current trends and future directions. Image and Vision Computing 31(2): 120–136.

Hu, M., and B. Liu. 2004. Mining and summarizing customer reviews. In KDD, Seattle.

Konar, A., and A. Chakraborty. 2015. Emotion recognition: A pattern analysis approach. Hoboken:

Wiley & Sons.

Lau, R., Y. Xia, and Y. Ye. 2014. A probabilistic generative model for mining cybercriminal

networks from online social media. IEEE Computational Intelligence Magazine 9(1): 31–43.

Liu, B. 2012. Sentiment analysis and opinion mining. San Rafael: Morgan and Claypool.

Ma, Y., E. Cambria, and S. Gao. 2016. Label embedding for zero-shot ﬁne-grained named entity

typing. In COLING, Osaka, 171–180.

Mihalcea, R., and A. Garimella. 2016. What men say, what women hear: Finding gender-speciﬁc

meaning shades. IEEE Intelligent Systems 31(4): 62–67.

Minsky, M. 2006. The emotion machine: Commonsense thinking, artiﬁcial intelligence, and the

future of the human mind. New York: Simon & Schuster.

Morency, L.P., R. Mihalcea, and P. Doshi. 2011. Towards multimodal sentiment analysis:

Harvesting opinions from the web. In International Conference on Multimodal Interfaces

(ICMI), 169–176. New York: ACM.

Oneto, L., F. Bisio, E. Cambria, and D. Anguita. 2016. Statistical learning theory and ELM for big

social data analysis. IEEE Computational Intelligence Magazine 11(3): 45–55.

Ortony, A., G. Clore, and A. Collins. 1988. The cognitive structure of emotions. Cambridge:

Cambridge University Press.

Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in

Information Retrieval 2: 1–135.

Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment classiﬁcation using machine

learning techniques. In EMNLP, Philadelphia, 79–86.

Pantic, M., N. Sebe, J. Cohn, and T. Huang. 2005. Affective multimodal human-computer

interaction. In ACM International Conference on Multimedia, New York, 669–676.

Picard, R. 1997. Affective computing. Boston: The MIT Press.

Poria, S., E. Cambria, and A. Gelbukh. 2016. Aspect extraction for opinion mining with a deep

convolutional neural network. Knowledge-Based Systems 108: 42–49.

Poria, S., E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data ﬂow analysis

by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):

26–36.

Poria, S., E. Cambria, D. Hazarika, and P. Vij. 2016. A deeper look into sarcastic tweets using deep

convolutional neural networks. In COLING, 1601–1612.

Poria, S., E. Cambria, N. Howard, G.B. Huang, and A. Hussain. 2016. Fusing audio, visual and

textual clues for sentiment analysis from multimodal content. Neurocomputing 174: 50–59.

Poria, S., I. Chaturvedi, E. Cambria, and A. Hussain. 2016. Convolutional MKL based multimodal

emotion recognition and sentiment analysis. In ICDM, 439–448.

Poria, S., A. Gelbukh, B. Agarwal, E. Cambria, and N. Howard. 2013. Common sense knowledge

based personality recognition from text. In Advances in soft computing and its applications,

484–496. Berlin/Heidelberg: Springer.

Raaijmakers, S., K. Truong, and T. Wilson. 2008. Multimodal subjectivity analysis of multiparty

conversation. In EMNLP, Edinburgh, 466–474.

Rajagopal, D., E. Cambria, D. Olsher, and K. Kwok. 2013. A graph-based approach to common-

sense concept extraction and semantic similarity detection. In WWW, Rio De Janeiro, 565–570.

Rao, D., and D. Ravichandran. 2009. Semi-supervised polarity lexicon induction. In EACL,

Athens, 675–682.

Recupero, D.R., V. Presutti, S. Consoli, A. Gangemi, and A. Nuzzolese. 2014. Sentilo: Frame-

based sentiment analysis. Cognitive Computation 7(2): 211–225.

Schuller, B., A. Batliner, S. Steidl, and D. Seppi. 2011. Recognising realistic emotions and affect

in speech: State of the art and lessons learnt from the ﬁrst challenge. Speech Communication

53(9/10): 1062–1087.

Socher, R., A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, and C. Potts. 2013.

Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP,

1642–1654.

Somasundaran, S., J. Wiebe, and J. Ruppenhofer. 2008. Discourse level opinion interpretation. In

COLING, Manchester, 801–808.

Stevenson, R., J. Mikels, and T. James. 2007. Characterization of the affective norms for English

words by discrete emotional categories. Behavior Research Methods 39: 1020–1024.

Strapparava, C., and A. Valitutti. 2004. WordNet-Affect: An affective extension of WordNet. In

LREC, Lisbon, 1083–1086.

Wiebe, J., T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in

language. Language Resources and Evaluation 39(2): 165–210.

Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level

sentiment analysis. In HLT/EMNLP, Vancouver, 347–354.

Xia, Y., E. Cambria, A. Hussain, and H. Zhao. 2015. Word polarity disambiguation using Bayesian

model and opinion-level features. Cognitive Computation 7(3): 369–380.

Zeng, Z., M. Pantic, G. Roisman, and T. Huang. 2009. A survey of affect recognition methods:

Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and

Machine Intelligence 31(1): 39–58.

Chapter 2

Many Facets of Sentiment Analysis

Bing Liu

Abstract Sentiment analysis or opinion mining is the computational study of

people’s opinions, sentiments, evaluations, attitudes, moods, and emotions. It is

one of the most active research areas in natural language processing, data mining,

information retrieval, and Web mining. In recent years, its research and applications

have also spread to management sciences and social sciences due to its importance

to business and society as a whole. This chapter deﬁnes the sentiment analysis

problem and its related concepts such as sentiment, opinion, emotion, mood, and

affect. The goal is to abstract a structure from the complex unstructured natural

language text related to the problem and its pertinent concepts. The deﬁnitions not

only enable us to see a rich set of inter-related sub-problems, but also a common

framework that can unify existing research directions. They also help researchers

design more robust solution techniques by exploiting the inter-relationships of the

sub-problems.

Keywords Sentiment analysis • Opinion mining • Emotion • Mood • Affect •

Subjectivity

Many people think that sentiment analysis is just the problem of classifying

whether a document or a sentence expresses a positive or negative sentiment or

opinion. It is in fact a much more complex problem than that. It involves many facets

and multiple sub-problems. In this chapter, I deﬁne an abstraction of the sentiment

analysis problem. The deﬁnitions will enable us to see a rich set of inter-related

sub-problems. It is often said that if we cannot structure a problem, we probably

do not understand the problem. The objective of the deﬁnitions is to abstract a

structure from the complex unstructured natural language text. The structure serves

as a common framework to unify existing research directions and enable researchers

to design more robust solution techniques by exploiting the inter-relationships of the

sub-problems.

B. Liu ()

Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA

e-mail: liub@cs.uic.edu

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_2

Unlike factual information, sentiment and opinion have an important characteristic,

namely, being subjective. The subjectivity comes from many sources. First

of all, different people may have different experiences and thus different opinions.

Different people may also have different interests and/or different ideologies. Due to

such different subjective experiences, views, interests and ideologies, it is important

to examine a collection of opinions from many people rather than only one opinion

from a single person because such an opinion represents only the subjective view of

a single person, which is usually not sufﬁcient for action. With a large number of

opinions, some form of summary becomes necessary (Hu and Liu 2004). Thus, the

problem deﬁnitions should also state what kind of summary may be desired. Along

with the problem deﬁnitions, the chapter also discusses different types of opinions

and the important concepts of affect, emotion and mood.

Throughout this chapter, I mainly use product reviews and sentences from such

reviews as examples to introduce the key concepts, but the ideas and the resulting

deﬁnitions are general and applicable to all forms of formal and informal opinion

text such as news articles, tweets (Twitter posts), forum discussions, blogs, and

Facebook posts, and all kinds of domains including social and political domains.

The content of this chapter is mainly taken from my book “Sentiment Analysis:

Mining Opinions, Sentiments, and Emotions” (Liu 2015).

2.1 Deﬁnition of Opinion

Sentiment analysis mainly studies opinions that express or imply positive or

negative sentiment. We deﬁne the problem in this context. We use the term opinion

as a broad concept that covers sentiment, evaluation, appraisal, or attitude, and its

associated information such as opinion target and the person who holds the opinion,

and use the term sentiment to mean only the underlying positive or negative feeling

implied by opinion. Due to the need to analyze a large volume of opinions, in

defining opinion we consider two levels of abstraction: a single opinion and a set

of opinions. In this section, we focus on deﬁning a single opinion and describing

the tasks involved in extracting an opinion. Section 2.2 focuses on a set of opinions,

where we deﬁne opinion summary.

2.1.1 Opinion Deﬁnition

We use the following review (Review A) about a camera to introduce the problem

(an id number is associated with each sentence for easy reference):

Review A

Posted by John Smith

Date: September 10, 2011

(1) I bought a Canon G12 camera six months ago. (2) I simply love it. (3) The picture quality

is amazing. (4) The battery life is also long. (5) However, my wife thinks it is too heavy for

her.

From this review, we can make the following important observations:

Opinion, sentiment and target: Review A has several opinions with positive or

negative sentiment about the Canon G12 camera. Sentence (2) expresses a

positive sentiment about the Canon camera as a whole. Sentence (3) expresses

a positive sentiment about its picture quality. Sentence (4) expresses a positive

sentiment about its battery life. Sentence (5) expresses a negative sentiment about

the camera’s weight.

From these opinions, we can make a crucial observation about sentiment

analysis. That is, an opinion has two key components: a target g and a sentiment

s on the target, i.e., (g, s), where g can be any entity or aspect of the entity on

which an opinion has been expressed, and s can be a positive, negative, or neutral

sentiment, or a numeric rating. Positive, negative and neutral are called sentiment

or opinion orientations. For example, the target of the opinion in sentence (2) is

the Canon G12 camera, the target of the opinion in sentence (3) is the picture

quality of Canon G12, and the target of sentence (5) is the weight of Canon G12

(weight is indicated by heavy). Target is also called topic by some researchers.

Opinion holder: Review A contains opinions from two persons, who are called

opinion sources or opinion holders (Kim and Hovy 2004; Wiebe et al. 2005). The

holder of the opinions in sentences (2), (3), and (4) is the author of the review

(“John Smith”), but for sentence (5), it is the wife of the author.

Time of opinion: The date of the review was September 10, 2011. This date is useful

because one often wants to know how opinions change over time or the opinion

trend.

With this example, we can deﬁne opinion as a quadruple.

Deﬁnition 1 (Opinion) An opinion is a quadruple,

(g, s, h, t),

where g is the sentiment target, s is the sentiment of the opinion about the target g,

h is the opinion holder (the person or organization who holds the opinion), and t is

the time when the opinion is expressed.
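
As a minimal sketch, the quadruple can be written down as a small record type and filled in by hand for Review A. The class name Opinion, the field names, and the free-text target strings are illustrative assumptions and are not part of the definition itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One regular opinion as the quadruple (g, s, h, t)."""
    target: str     # g: the sentiment target
    sentiment: str  # s: "positive", "negative", or "neutral"
    holder: str     # h: the opinion holder
    time: date      # t: when the opinion was expressed


posted = date(2011, 9, 10)

# Hand-annotated opinions from sentences (2)-(5) of Review A.
review_a = [
    Opinion("Canon G12", "positive", "John Smith", posted),
    Opinion("picture quality of Canon G12", "positive", "John Smith", posted),
    Opinion("battery life of Canon G12", "positive", "John Smith", posted),
    Opinion("weight of Canon G12", "negative", "wife of John Smith", posted),
]

# The holder component lets us separate the two people whose views appear.
holders = sorted({op.holder for op in review_a})
negative_targets = [op.target for op in review_a if op.sentiment == "negative"]
```

Sentence (1) contributes no quadruple because it states a fact and carries no sentiment.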

The four components here are essential. It is generally problematic if any of

them is missing. For example, the time component is important in practice because

an opinion two years ago is not the same as an opinion today. Not having an opinion

holder is also problematic. For example, an opinion from a very important person

(e.g., the US President) is probably more important than that from the average Joe

on the street.

One thing that we want to stress about the definition is that an opinion has a target.

Recognizing this is important for two reasons: First, in a sentence with multiple

targets, we need to identify the speciﬁc target for each positive or negative sentiment.

For example, “Apple is doing very well in this poor economy” has a positive

sentiment and a negative sentiment. The target for the positive sentiment is Apple

and the target for the negative sentiment is economy. Second, words or phrases such

as good, amazing, bad and poor that express sentiments (called sentiment or opinion

terms) and opinion targets often have some syntactic relations (Hu and Liu 2004;

Qiu et al. 2011; Zhuang et al. 2006), which allow us to design algorithms to extract

both sentiment terms and opinion targets, which are two core tasks of sentiment

analysis (see Sect. 2.1.6).

The opinion deﬁned here is just one type of opinion, called a regular opinion

(e.g., “Coke tastes great”). Another type is comparative opinion (e.g., “Coke tastes

better than Pepsi”) which needs a different definition (Jindal and Liu 2006b; Liu

2006). Section 2.1.4 will further discuss different types of opinions. For the rest of

this section, we focus on only regular opinions, which, for simplicity, we will just

call opinions.

2.1.2 Sentiment Target

Deﬁnition 2 (Sentiment Target) The sentiment target, also known as the opinion

target, of an opinion is the entity or a part or attribute of the entity that the sentiment

has been expressed upon.

For example, in sentence (3) of Review A, the target is the picture quality of

Canon G12, although the sentence mentioned only the picture quality. The target is

not just the picture quality because without knowing that the picture quality belongs

to the Canon G12 camera, the opinion in the sentence is of little use.

An entity can be decomposed and represented hierarchically (Liu 2006).

Deﬁnition 3 (Entity) An entity e is a product, service, topic, person, organization,

issue or event. It is described with a pair, e: (T, W), where T is a hierarchy of parts,

sub-parts, and so on, and W is a set of attributes of e. Each part or sub-part also has

its own set of attributes.

For example, a particular camera model is an entity, e.g., Canon G12. It has a

set of attributes, e.g., picture quality, size, and weight, and a set of parts, e.g., lens,

viewfinder, and battery. Battery also has its own set of attributes, e.g., battery life

and battery weight. A topic can be an entity too, e.g., tax increase, with its sub-

topics or parts ‘tax increase for the poor,’ ‘tax increase for the middle class’ and

‘tax increase for the rich.’

This deﬁnition describes an entity hierarchy based on the part-of relation. The

root node is the name of the entity, e.g., Canon G12 in Review A. All the other nodes

are parts and sub-parts, etc. An opinion can be expressed on any node and any

attribute of the node. For instance, in Review A, sentence (2) expresses a positive

opinion about the entity Canon G12 as a whole, and sentence (3) expresses a

positive opinion about the picture quality attribute of the camera. Clearly, we can

also express opinions about any part or component of the camera.
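
The entity hierarchy can be sketched as a small tree in which every node carries its own attribute set. This is only an illustration under stated assumptions: the class name Node, its fields, and the helper opinion_targets are hypothetical, and the Canon G12 parts and attributes are the ones named in the example above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple


@dataclass
class Node:
    """An entity, part, or sub-part with its own attributes."""
    name: str
    attributes: Set[str] = field(default_factory=set)  # W for this node
    parts: List["Node"] = field(default_factory=list)  # next level of T

    def walk(self) -> Iterator["Node"]:
        """Yield this node and then every part and sub-part below it."""
        yield self
        for part in self.parts:
            yield from part.walk()


canon_g12 = Node(
    "Canon G12",
    attributes={"picture quality", "size", "weight"},
    parts=[
        Node("lens"),
        Node("viewfinder"),
        Node("battery", attributes={"battery life", "battery weight"}),
    ],
)


def opinion_targets(entity: Node) -> List[Tuple[str, Optional[str]]]:
    """List every node, and every attribute of a node, that can be a target."""
    targets: List[Tuple[str, Optional[str]]] = []
    for node in entity.walk():
        targets.append((node.name, None))  # the node as a whole
        targets.extend((node.name, attr) for attr in sorted(node.attributes))
    return targets
```

For this tree, opinion_targets(canon_g12) yields four whole-node targets and five attribute targets, which mirrors the statement that an opinion can be expressed on any node and any attribute of the node.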

In the research literature, entities are also called objects, and attributes are also

called features (as in product features) (Hu and Liu 2004; Liu 2010). The terms

object and feature are not used in this chapter because object can be confused with

the term object used in grammar and feature can be confused with feature used

in machine learning as data attribute. In recent years, the term aspect has become

popular, which covers both part and attribute (see Sect. 2.1.4).

Entities may be called other names in speciﬁc application domains. For example,

in politics, entities are usually political candidates, issues, and events. There is

no term that is perfect for all application domains. The term entity is chosen

because most current applications of sentiment analysis study opinions about

various forms of named entities, e.g., products, services, brands, organizations,

events, and people.

2.1.3 Sentiment and Its Intensity

Deﬁnition 4 (Sentiment) Sentiment is the underlying feeling, attitude, evaluation,

or emotion associated with an opinion. It is represented as a triple,

(y, o, i),

where y is the type of the sentiment, o is the orientation of the sentiment, and i is

the intensity of the sentiment.

Sentiment type: Sentiment can be classiﬁed into several types. There are linguistic-

based, psychology-based, and consumer research-based classiﬁcations. Here I

choose to use a consumer research-based classiﬁcation as it is simple and easy

to use in practice. Consumer research classiﬁes sentiment broadly into two

categories: rational sentiment and emotional sentiment (Chaudhuri 2006).

Definition 5 (Rational Sentiment) Rational sentiments are from rational reasoning,

tangible beliefs, and utilitarian attitudes. They express no emotions.

We also call opinions expressing rational sentiment the rational opinions. The

opinions in the following sentences imply rational sentiment: “The voice of this

phone is clear,” and “This car is worth the price.”

Deﬁnition 6 (Emotional Sentiment) Emotional sentiments are from non-tangible

and emotional responses to entities which go deep into people’s psychological state

of mind.

We also call opinions expressing emotional sentiment the emotional opinions.

The opinions in the following sentences imply emotional sentiment: “I love iPhone,”

“I am so angry with their service people,” “This is the best car ever” and “After our

team won, I cried.”

Emotional sentiment is stronger than rational sentiment, and is usually more

important in practice. For example, in marketing, to guarantee the success of a new

product in the market, the positive sentiment from a large population of consumers

has to reach the emotional level. Rational positive sentiment may not be sufficient.

Each of these broad categories can be further divided into smaller categories.

For example, there are many types of emotions, e.g., anger, joy, fear, and sadness.

We will discuss some possible sub-divisions of rational sentiment in Sect. 2.4.2 and

different emotions in Sect. 2.3. In applications, the user is also free to design their

own sub-categories.

Sentiment orientation: It can be positive, negative, or neutral. Neutral usually means

the absence of sentiment or no sentiment or opinion. Sentiment orientation is also

called polarity, semantic orientation, or valence in the research literature.

Sentiment intensity: Sentiment can have different levels of strength or intensity.

People often use two ways to express intensity of their feelings in text. The

ﬁrst is to choose sentiment terms (words or phrases) with suitable strengths.

For example, good is weaker than excellent, and dislike is weaker than detest.

Sentiment words are words in a language that are often used to express positive

or negative sentiments. For example, good, wonderful, and amazing are positive

sentiment words, and bad, poor, and terrible are negative sentiment words.

The second is to use intensiﬁers and diminishers, which are terms that change

the degree of the expressed sentiment. An intensiﬁer increases the intensity

of a positive/negative term, while a diminisher decreases the intensity of that

term. Common English intensifiers include very, so, extremely, dreadfully, really,

awfully, terribly, etc., and common English diminishers include slightly, pretty,

a little bit, a bit, somewhat, barely, etc.

Sentiment rating: In applications, we commonly use some discrete ratings to express

sentiment intensity. Five levels (e.g., 1–5 stars) are commonly employed, which

can be interpreted as follows based on the two types of sentiment in Deﬁnitions

5 and 6:

• emotional positive (+2 or 5 stars)

• rational positive (+1 or 4 stars)

• neutral (0 or 3 stars)

• rational negative (-1 or 2 stars)

• emotional negative (-2 or 1 star)

Clearly, it is possible to have more rating levels, which, however, become difﬁcult

to differentiate based on the natural language text alone due to the highly subjective

nature and the fact that people’s spoken or written expressions may not fully match

with their psychological states of mind. For example, the sentence “This is an

excellent phone” expresses a rational evaluation of the phone, while “I love this

phone” expresses an emotional evaluation about the phone. However, whether they

represent completely different psychological states of mind of the authors is hard to

say. In practice, the above ﬁve levels are sufﬁcient for most applications. If these ﬁve

levels are not enough in some applications, I suggest dividing emotional positive

(and, respectively, emotional negative) into two levels. Such applications are likely

to involve sentiment about personal, social or political events or issues, for which

people can be highly emotional.
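
The five rating levels of this section can be sketched as a lookup from sentiment type and orientation to a signed level, which is then shifted onto a star scale. The names Sentiment, LEVELS, make_sentiment, and to_stars are assumptions made for this illustration, and the sketch covers only the five levels listed in this section.

```python
from typing import NamedTuple


class Sentiment(NamedTuple):
    """The triple (y, o, i) of Definition 4."""
    kind: str         # y: "rational" or "emotional"
    orientation: str  # o: "positive", "negative", or "neutral"
    intensity: int    # i: signed level from -2 to +2


# Signed level for each non-neutral (type, orientation) pair.
LEVELS = {
    ("emotional", "positive"): 2,
    ("rational", "positive"): 1,
    ("rational", "negative"): -1,
    ("emotional", "negative"): -2,
}


def make_sentiment(kind: str, orientation: str) -> Sentiment:
    """Build a triple whose intensity follows the five-level scheme."""
    if orientation == "neutral":
        return Sentiment(kind, orientation, 0)
    return Sentiment(kind, orientation, LEVELS[(kind, orientation)])


def to_stars(sentiment: Sentiment) -> int:
    """Shift a signed level in [-2, 2] onto a 1-5 star rating."""
    return sentiment.intensity + 3
```

Under this scheme “I love this phone” (emotional positive) maps to 5 stars and “This is an excellent phone” (rational positive) to 4 stars. Splitting emotional positive or emotional negative into two levels would only require extending the lookup table and the offset.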

2.1.4 Opinion Deﬁnition Simpliﬁed

Opinion as deﬁned in Deﬁnition 1, although concise, may not be easy to use in

practice especially in the domain of online reviews of products, services, and brands.

Let us ﬁrst look at the sentiment (or opinion) target. The central concept here is

entity, which is represented as a hierarchy with an arbitrary number of levels. This

can be too complex for practical applications because NLP is a very difﬁcult task.

Recognizing parts and attributes of an entity at different levels of details is extremely

hard. Most applications also do not need such a complex analysis. Thus, we simplify

the hierarchy to two levels and use the term aspect to denote both part and attribute.

In the simpliﬁed tree, the root node is still the entity itself and the second level (also

the leaf level) nodes are different aspects of the entity.

The definition of sentiment in Definition 4 can be simplified too. In many

applications, positive (denoted by +1), negative (denoted by -1) and neutral

(denoted by 0) orientations alone are already enough. In almost all applications,

5 levels of ratings are sufﬁcient, e.g., 1–5 stars. In both cases, sentiment can be

represented with a single value. The other two components in the triple can be folded

into this value.

This simpliﬁed framework is what is typically used in practical sentiment

analysis systems. We now redefine the concept of opinion (Hu and Liu 2004; Liu

2010).

Deﬁnition 7 (Opinion) An opinion is a quintuple,

(e, a, s, h, t),

where e is the target entity, a is the target aspect of entity e on which the opinion

has been expressed, s is the sentiment of the opinion on aspect a of entity e, h is

the opinion holder, and t is the opinion posting time. s can be positive, negative,

or neutral, or a rating (e.g., 1–5 stars). When an opinion is only on the entity as

a whole, the special aspect GENERAL is used to denote it. Here, e and a together

represent the opinion target.
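
To illustrate how the simplified definition splits the target into an entity and an aspect, the opinions of Review A can be rewritten as quintuples. This is a hand-annotated sketch: the class name Opinion, the field names, and the aspect labels chosen for each sentence are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Tuple, Union

GENERAL = "GENERAL"  # special aspect: the opinion is on the entity as a whole


@dataclass(frozen=True)
class Opinion:
    """One opinion as the quintuple (e, a, s, h, t)."""
    entity: str                 # e
    aspect: str                 # a
    sentiment: Union[str, int]  # s: an orientation or a rating such as 1-5 stars
    holder: str                 # h
    time: date                  # t

    @property
    def target(self) -> Tuple[str, str]:
        """e and a together represent the opinion target."""
        return (self.entity, self.aspect)


posted = date(2011, 9, 10)
review_a = [
    Opinion("Canon G12", GENERAL, "positive", "John Smith", posted),
    Opinion("Canon G12", "picture quality", "positive", "John Smith", posted),
    Opinion("Canon G12", "battery life", "positive", "John Smith", posted),
    Opinion("Canon G12", "weight", "negative", "wife of John Smith", posted),
]

# With a flat entity/aspect split, grouping sentiments by aspect is a one-liner.
by_aspect = defaultdict(list)
for op in review_a:
    by_aspect[op.aspect].append(op.sentiment)
```

Note that battery life, which is an attribute of the battery part in the full hierarchy, becomes a direct aspect of the camera in this two-level form.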

Sentiment analysis (or opinion mining) based on this deﬁnition is often called

aspect-based sentiment analysis, or feature-based sentiment analysis as it was called

earlier in (Hu and Liu 2004; Liu 2010).

We should note that due to the simpliﬁcation, the quintuple representation of

opinion may result in information loss. For example, ink is a part of printer. A

printer review might say “The ink of this printer is expensive.” This sentence does

not say that the printer is expensive (expensive here indicates the aspect price). If

one does not care about any attribute of the ink, this sentence just gives a negative

opinion about the ink (which is an aspect of the printer entity). This results in

information loss. However, if one also wants to study opinions about different

aspects of the ink, then the ink needs to be treated as a separate entity. The quintuple

representation still applies, but an extra mechanism will be required to record the

part-of relationship between ink and printer. Of course, conceptually we can also

extend the ﬂat quintuple relation to a nested relation to make it more expressive.

However, as we explained above, too complex a deﬁnition can make the problem

extremely difficult to solve in practice. Despite this limitation, Definition 7 does

cover the essential information of an opinion sufﬁciently for most applications.
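
One possible form of the extra mechanism for recording part-of relationships is a separate lookup table kept alongside the flat quintuples. The sketch below is an assumption about how this could be done, not a prescribed design: the table PART_OF, the helper names, and the second (printer) opinion are hypothetical, and the quintuples are shortened to (entity, aspect, sentiment) for brevity.

```python
# Hypothetical part-of table: child entity -> parent entity.
PART_OF = {"ink": "printer"}


def ancestors(entity):
    """Follow part-of links upward and return the chain of parents."""
    chain = []
    while entity in PART_OF:
        entity = PART_OF[entity]
        chain.append(entity)
    return chain


# Opinions shortened to (entity, aspect, sentiment).
opinions = [
    ("ink", "price", "negative"),        # "The ink of this printer is expensive."
    ("printer", "GENERAL", "positive"),  # made-up opinion for illustration
]


def related_to(entity):
    """Opinions on the entity itself or on any of its recorded parts."""
    return [
        op for op in opinions
        if op[0] == entity or entity in ancestors(op[0])
    ]
```

Here the ink is treated as a separate entity with its own price aspect, so no information is lost, while related_to("printer") can still gather the ink opinion through the part-of table.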

In some applications, it may not be easy to distinguish entity and aspect or there

is no need to distinguish them. Such cases often occur when people discuss political

or social issues, e.g., “I hate property tax increases.” We may deal with them in two

ways. First, since the author regards ‘property tax increase’ as a general issue that

does not belong to any specific entity, we can treat it as an entity with the aspect

GENERAL. Second, we can regard ‘property tax’ as an entity and ‘property tax

increases’ as one of its aspects to form a hierarchical relationship. Whether to treat

an issue/topic as an aspect or an entity can also depend on the speciﬁc context.

For example, in commenting about a local government, one says “I hate the

proposed property tax increase.” Since it is the local government that imposes and

levies property taxes, the speciﬁc local government may be regarded as an entity

and ‘the proposed property tax increase’ as one of its aspects.

Not all applications need all ﬁve components of an opinion. In some applications,

the user may not need the aspect information. For example, in brand management,

the user typically is interested in only opinions about product brands (entities). This

is sometimes called entity-based sentiment analysis. In some other applications,

the user may not need to know the opinion holder or time of opinion. Then these

components can be ignored.

2.1.5 Reason and Qualiﬁer for Opinion

We can in fact perform an even ﬁner-grained analysis of opinions. Let us use the

sentence “This car is too small for a tall person” to explain. It expresses a negative

sentiment about the size aspect of the car. However, only reporting the negative

sentiment for size does not tell the whole story because it can mean too small or too

big. In the above sentence, we call “too small” the reason for the negative sentiment

about size. Furthermore, the sentence does not say that the car is too small for

everyone, but only for a tall person. We call “for a tall person” the qualifier of

the opinion. We now deﬁne these concepts.

Deﬁnition 8 (Reason for Opinion) A reason for an opinion is the cause of the

opinion.

In practical applications, discovering the reasons for each positive or negative

opinion can be very important because it may be these reasons that enable one to

perform actions to remedy the situation. For example, the sentence “I do not like the

picture quality of this camera” is not as useful as “I do not like the picture quality of

this camera because the pictures are quite dark.” The ﬁrst sentence does not give the

reason for the negative sentiment about the picture quality and it is thus difﬁcult to

know what to do to improve the picture quality. The second sentence is more informative

because it gives the reason or cause for the negative sentiment. The camera

manufacturer can make use of this piece of information to improve the picture quality

of the camera. In most industrial applications, such reasons are called problems

or issues. Knowing the issues allows businesses to ﬁnd ways to address them.

Deﬁnition 9 (Qualiﬁer of Opinion) A qualiﬁer of an opinion limits or modiﬁes

the meaning of the opinion.

Knowing the qualiﬁer is also important in practice because it tells what the

opinion is good for. For example, “This car is too small for a tall person” does

not say that the car is too small for everyone, but just for tall people. For a person

who is not tall, this opinion does not apply.

However, as we have seen, not every opinion comes with an explicit reason

and/or an explicit qualiﬁer. “The picture quality of this camera is not great” does not

have a reason or a qualiﬁer. “The picture quality of this camera is not good for night

shots” has a qualiﬁer “for night shots,” but does not give a speciﬁc reason for the

negative sentiment. “The picture quality of this camera is not good for night shots

as the pictures are quite dark” has a reason for the negative sentiment (‘the pictures

are quite dark’) and also a qualiﬁer (‘for night shots’). Sometimes, the qualiﬁer and

the reason may not be in the same sentence and/or may be quite implicit, e.g., “The

picture quality of this camera is not great. Pictures of night shots are very dark”

and “I am 6 feet 5 inches tall. This car is too small for me.” An expression can also

serve multiple purposes. For example, ‘too small’ in the above sentence indicates

the size aspect of the car, a negative sentiment about the size, and also the reason

for the negative sentiment/opinion.
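
The reason and the qualifier can be attached to an opinion as two optional fields, since not every opinion states them. The following is a minimal sketch using the example sentences of this section; the class name QualifiedOpinion and its fields are assumptions, and the holder and time components are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedOpinion:
    """An opinion with an optional reason and an optional qualifier."""
    entity: str
    aspect: str
    sentiment: str
    reason: Optional[str] = None     # cause of the opinion, if stated
    qualifier: Optional[str] = None  # limits or modifies the opinion, if stated


examples = [
    # "The picture quality of this camera is not great."
    QualifiedOpinion("camera", "picture quality", "negative"),
    # "The picture quality of this camera is not good for night shots."
    QualifiedOpinion("camera", "picture quality", "negative",
                     qualifier="for night shots"),
    # "... is not good for night shots as the pictures are quite dark."
    QualifiedOpinion("camera", "picture quality", "negative",
                     reason="the pictures are quite dark",
                     qualifier="for night shots"),
    # "This car is too small for a tall person."
    QualifiedOpinion("car", "size", "negative",
                     reason="too small", qualifier="for a tall person"),
]

# Opinions that state a reason are the ones a business can act on directly.
actionable = [op for op in examples if op.reason is not None]
```

In the car example the expression ‘too small’ appears only in the reason field, yet it is also what signals the size aspect and the negative sentiment, which shows how one expression can serve several purposes.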

2.1.6 Objective and Tasks of Sentiment Analysis

With the definitions in Sects. 2.1.1, 2.1.2, 2.1.3 and 2.1.4, we can now present the

core objective and the key tasks of (aspect-based) sentiment analysis.

Objective of Sentiment Analysis Given an opinion document d, discover all

opinion quintuples (e, a, s, h, t) in d. For more advanced analysis, discover the

reason and qualiﬁer for the sentiment in each opinion quintuple.

Key Tasks of Sentiment Analysis The key tasks of sentiment analysis can

be derived from the ﬁve components of the quintuple (Deﬁnition 7). The ﬁrst

component is the entity and the ﬁrst task is to extract entities. The task is similar

to named entity recognition (NER) in information extraction (Hobbs and Riloff

2010; Sarawagi 2008). However, as deﬁned in Deﬁnition 3, an entity can also be

an event, issue, or topic, which is usually not a named entity. For example, in “I

hate tax increase,” the entity is ‘tax increase,’ which is an issue or topic. In such

cases, entity extraction is basically the same as aspect extraction and the difference

between entity and aspect becomes blurry. In some applications, there may not be a

need to distinguish them.

After extraction, we need to categorize the extracted entities as people often write

the same entity in different ways. For example, Motorola may be written as Mot,

Moto, and Motorola. We need to recognize that they all refer to the same entity (see

Liu (2015) for details).

Deﬁnition 10 (Entity Category and Entity Expression) An entity category

represents a unique entity, while an entity expression or mention is an actual word

or phrase that indicates an entity category in the text.

Each entity or entity category should have a unique name in a particular

application. The process of grouping or clustering entity expressions into entity

categories is called entity resolution or grouping.
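The grouping step can be sketched as a lookup from entity expressions to entity categories. This is only a minimal illustration: the alias table `ENTITY_ALIASES` and the function `resolve_entities` are hypothetical names, the Motorola variants come from the example above, and "Nokia" is an invented extra input. A real system would learn or curate the mapping rather than hard-code it.

```python
from collections import defaultdict

# Hypothetical alias table: each entity expression maps to one entity category.
# In practice this mapping would be learned or curated; here it is hard-coded.
ENTITY_ALIASES = {
    "mot": "Motorola",
    "moto": "Motorola",
    "motorola": "Motorola",
}


def resolve_entities(expressions):
    """Group entity expressions under their entity category.

    Expressions missing from the alias table are kept as their own category.
    """
    groups = defaultdict(list)
    for expression in expressions:
        category = ENTITY_ALIASES.get(expression.lower(), expression)
        groups[category].append(expression)
    return dict(groups)


groups = resolve_entities(["Mot", "Moto", "Motorola", "Nokia"])
print(groups)
```

The same structure applies to aspect resolution, with aspect expressions and aspect categories in place of entity expressions and entity categories.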

For aspects of entities, the problem is basically the same as for entities. For

example, picture, image, and photo refer to the same aspect for cameras. We thus

need to extract aspect expressions and resolve them.

Deﬁnition 11 (Aspect Category and Aspect Expression) An aspect category of

an entity represents a unique aspect of the entity, while an aspect expression or

mention is an actual word or phrase that indicates an aspect category in the text.

Each aspect or aspect category should also have a unique name in a particular

application. The process of grouping aspect expressions into aspect categories

(aspects) is called aspect resolution or grouping.

Aspect expressions are usually nouns and noun phrases but can also be verbs,

verb phrases, adjectives, and adverbs. They can also be explicit or implicit (Hu and

Liu 2004).

Deﬁnition 12 (Explicit Aspect Expression) Aspect expressions that appear in an

opinion text as nouns and noun phrases are called explicit aspect expressions.

For example, ‘picture quality’ in “The picture quality of this camera is great” is

an explicit aspect expression.

Deﬁnition 13 (Implicit Aspect Expression) Aspect expressions that are not nouns

or noun phrases but indicate some aspects are called implicit aspect expressions.

For example, expensive is an implicit aspect expression in “This camera is

expensive.” It implies the aspect price. Many implicit aspect expressions are

adjectives and adverbs used to describe or qualify some speciﬁc aspects, e.g.,

expensive (price), and reliably (reliability). They can also be verbs and verb phrases,

e.g., “I can install the software easily.” Install indicates the aspect installation.

Implicit aspect expressions are not just individual adjectives, adverbs, verbs and

verb phrases; they can be very complex. For example, in “This camera will not easily

ﬁt in my pocket,” ‘ﬁt in my pocket’ indicates the aspect size (and/or shape). In the

sentence “This restaurant closes too early,” ‘closes too early’ indicates the aspect

of closing time of the restaurant. In both cases, some commonsense knowledge may

be needed to recognize them.
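As a rough illustration of the implicit aspect expressions discussed above, the following sketch maps each expression from the examples to the aspect it implies. The table `IMPLICIT_ASPECTS` and the function `find_implicit_aspects` are hypothetical, and plain substring matching is an assumption made for brevity. It cannot supply the commonsense knowledge that harder cases require.

```python
# Hypothetical lookup from implicit aspect expressions to aspect categories,
# filled only with the examples given in the text.
IMPLICIT_ASPECTS = {
    "expensive": "price",
    "reliably": "reliability",
    "install": "installation",
    "fit in my pocket": "size",
    "closes too early": "closing time",
}


def find_implicit_aspects(sentence):
    """Return the aspect categories whose expressions occur in the sentence."""
    text = sentence.lower()
    return [aspect for expression, aspect in IMPLICIT_ASPECTS.items()
            if expression in text]


print(find_implicit_aspects("This camera will not easily fit in my pocket."))
print(find_implicit_aspects("This restaurant closes too early."))
```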

Aspect extraction is a very challenging problem, especially when it involves

verbs and verb phrases. In some cases, it is even very hard for human beings to

recognize and to annotate. For example, in a vacuum cleaner review, one wrote

“The vacuum cleaner does not get the crumbs out of thick carpets,” which seems to

describe only one very speciﬁc aspect, ‘get the crumbs out of thick carpets.’ But in

practice, it may be more useful to decompose it into three different aspects indicated

by (1) ‘get something out of,’ (2) crumbs, and (3) ‘thick carpets.’ (1) represents the

suction power of the vacuum cleaner in general, (2) represents suction related to

crumbs, and (3) represents suction related to ‘thick carpets.’ All three are important

and useful because the user may be interested in knowing whether the vacuum can

suck crumbs, and whether it works well with thick carpets.

The third component in the opinion deﬁnition is the sentiment. For this, we

need to perform sentiment classiﬁcation or regression to determine the sentiment

orientation or score on the involved aspect and/or entity. The fourth and

fifth components are the opinion holder and the opinion posting time, respectively. They

also have expressions and categories, as entities and aspects do. I will not repeat their

definitions. Note that opinion holders (Bethard et al. 2004; Choi et al. 2005; Kim

and Hovy 2004) are also called opinion sources in (Wiebe et al. 2005).

Based on the above discussions, we can now deﬁne a model of entity and a model

of opinion document (Liu 2006) and summarize the main sentiment analysis tasks.

Model of Entity: An entity e is represented by itself as a whole and a finite set of

its aspects $A = \{a_1, a_2, \ldots, a_n\}$. e can be expressed in text with any one of a finite

set of its entity expressions $\{ee_1, ee_2, \ldots, ee_s\}$. Each aspect $a \in A$ of entity e can be

expressed with any one of its finite set of aspect expressions $\{ae_1, ae_2, \ldots, ae_m\}$.

Model of Opinion Document: An opinion document d contains opinions about a

set of entities $\{e_1, e_2, \ldots, e_r\}$ and a subset of aspects of each entity. The opinions

are from a set of opinion holders $\{h_1, h_2, \ldots, h_p\}$ and are given at a particular time

point t.
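The two models translate naturally into simple data structures. The sketch below is one possible encoding, not a prescribed one: the class names `Aspect`, `Entity`, and `OpinionDocument`, the entity name "Camera-X", the holder "reviewer_1", and the date string are all hypothetical. The aspect expressions picture, image, and photo are taken from the earlier camera example.

```python
from dataclasses import dataclass, field


@dataclass
class Aspect:
    name: str                                       # aspect category a
    expressions: set = field(default_factory=set)   # aspect expressions of a


@dataclass
class Entity:
    name: str                                       # entity category e
    expressions: set = field(default_factory=set)   # entity expressions of e
    aspects: dict = field(default_factory=dict)     # aspect set A, keyed by name


@dataclass
class OpinionDocument:
    entities: list   # entities discussed in document d
    holders: list    # opinion holders in d
    time: str        # posting time t


# Hypothetical entity with one aspect and its three aspect expressions.
camera = Entity(
    name="Camera-X",
    expressions={"Camera-X"},
    aspects={"picture": Aspect("picture", {"picture", "image", "photo"})},
)
document = OpinionDocument(entities=[camera], holders=["reviewer_1"], time="2011-09-15")
print(sorted(camera.aspects["picture"].expressions))
```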

Given a set of opinion documents D, sentiment analysis performs the following

eight (8) main tasks:

Task 1 (entity extraction and resolution): Extract all entity expressions in D, and

group synonymous entity expressions into entity clusters (or categories). Each

entity expression cluster refers to a unique entity e.

Task 2 (aspect extraction and resolution): Extract all aspect expressions of the

entities, and group these aspect expressions into clusters. Each aspect expression

cluster of entity e represents a unique aspect a.

Task 3 (opinion holder extraction and resolution): Extract the holder expression

of each opinion from the text or structured data and group them. The task is

analogous to tasks 1 and 2.

Task 4 (time extraction and standardization): Extract the posting time of each

opinion and standardize different time formats.

Task 5 (aspect sentiment classification or regression): Determine whether an opinion

about an aspect a (or entity e) is positive, negative or neutral (classification),

or assign a numeric sentiment rating score to the aspect (or entity) (regression).

Task 6 (opinion quintuple generation): Produce all opinion quintuples (e, a, s, h, t)

expressed in D based on the results from tasks 1–5. This task is seemingly very

simple but it is in fact quite difﬁcult in many cases as Review B below shows.

For more advanced analysis, we also need to perform the following two

additional tasks, which are analogous to task 2:

Task 7 (opinion reason extraction and resolution): Extract reason expressions for

each opinion, and group all reason expressions for each aspect or entity and each

sentiment orientation into clusters. Each cluster for an aspect (or entity) and a

sentiment orientation represents a unique reason for the aspect (or entity) and the

orientation.

Task 8 (opinion qualiﬁer extraction and resolution): Extract qualiﬁer expressions

for each opinion, and group all qualiﬁer expressions for each aspect (or entity)

and each sentiment orientation into clusters. Each cluster for an aspect (or entity)

and a sentiment orientation represents a unique qualiﬁer for the aspect (or entity)

and the orientation.

Although reasons for and qualifiers of opinions are useful, their extraction and

categorization are very challenging. Little research has been done on them so far.
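Of the tasks above, Task 4 (time extraction and standardization) is the easiest to illustrate in code. The sketch below assumes a short, hypothetical list of input formats (`KNOWN_FORMATS`) and chooses ISO 8601 (YYYY-MM-DD) as the standard form, which is my assumption rather than a format used in this chapter. Month-name parsing with `%b` and `%B` also assumes an English locale.

```python
from datetime import datetime
import re

# Assumed input formats; a real system would need many more.
KNOWN_FORMATS = ("%b %d, %Y", "%B %d, %Y", "%Y-%m-%d", "%d %b %Y", "%m/%d/%Y")


def standardize_date(raw):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None if unparsed."""
    # Drop periods and normalize the abbreviation "Sept" to "Sep",
    # which is the abbreviated month name that strptime expects.
    cleaned = re.sub(r"\bSept\b", "Sep", raw.replace(".", "")).strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(standardize_date("Sept. 15, 2011"))
print(standardize_date("September 15, 2011"))
```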

We use an example review to illustrate the tasks (a sentence id is again associated

with each sentence) and the mining results.

Review B

Posted by: bigJohn

Date: Sept. 15, 2011

(1) I bought a Samsung camera and my friend bought a Canon camera yesterday. (2) In the

past week, we both used the cameras a lot. (3) The photos from my Samy are not clear for

night shots, and the battery life is short too. (4) My friend was very happy with his camera

and loves its picture quality. (5) I want a camera that can take good photos. (6) I am going

to return it tomorrow.

Task 1 should extract the entity expressions, Samsung, Samy, and Canon, and

group Samsung and Samy together because they represent the same entity. Task 2

should extract aspect expressions picture, photo, and battery life, and group picture

and photo together as they are synonyms for cameras. Task 3 should find that the

holder of the opinions in sentence (3) is bigJohn (the blog author) and that the holder

of the opinions in sentence (4) is bigJohn’s friend. Task 4 should find that the time

when the blog was posted is Sept-15-2011. Task 5 should ﬁnd that sentence (3)

gives a negative opinion to the picture quality of the Samsung camera and a negative

opinion also to its battery life. Sentence (4) gives a positive opinion to the Canon

camera as a whole and also to its picture quality. Sentence (5) seemingly expresses

a positive opinion, but it does not. To generate opinion quintuples for sentence (4)

we need to know what ‘his camera’ and ‘its’ refer to. Task 6 should finally generate

the following opinion quintuples:

1. (Samsung, picture_quality, negative, bigJohn, Sept-15-2011)

2. (Samsung, battery_life, negative, bigJohn, Sept-15-2011)

3. (Canon, GENERAL, positive, bigJohn’s_friend, Sept-15-2011)

4. (Canon, picture_quality, positive, bigJohn’s_friend, Sept-15-2011)

With more advanced mining and analysis, we also ﬁnd the reasons and qualiﬁers

of opinions. None below means unspeciﬁed.

1. (Samsung, picture_quality, negative, bigJohn, Sept-15-2011)

Reason for opinion: picture not clear

Qualiﬁer of opinion: night shots

2. (Samsung, battery_life, negative, bigJohn, Sept-15-2011)

Reason for opinion: short battery life

Qualiﬁer of opinion: none

3. (Canon, GENERAL, positive, bigJohn’s_friend, Sept-15-2011)

Reason for opinion: none

Qualiﬁer of opinion: none

4. (Canon, picture_quality, positive, bigJohn’s_friend, Sept-15-2011)

Reason for opinion: none

Qualiﬁer of opinion: none
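The mining results for Review B can be written down directly as records. The sketch below is one possible representation: the class name `Opinion` and its field names are hypothetical, while the values are those listed above. The optional `reason` and `qualifier` fields default to `None`, matching the use of "none" for unspecified.

```python
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    entity: str                      # e
    aspect: str                      # a
    sentiment: str                   # s
    holder: str                      # h
    time: str                        # t
    reason: Optional[str] = None     # None means unspecified
    qualifier: Optional[str] = None  # None means unspecified


review_b = [
    Opinion("Samsung", "picture_quality", "negative", "bigJohn", "Sept-15-2011",
            reason="picture not clear", qualifier="night shots"),
    Opinion("Samsung", "battery_life", "negative", "bigJohn", "Sept-15-2011",
            reason="short battery life"),
    Opinion("Canon", "GENERAL", "positive", "bigJohn's_friend", "Sept-15-2011"),
    Opinion("Canon", "picture_quality", "positive", "bigJohn's_friend", "Sept-15-2011"),
]

# The core quintuple (e, a, s, h, t) is the first five fields of each record.
for opinion in review_b:
    print(opinion[:5])
```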

2.2 Deﬁnition of Opinion Summary

Unlike facts, opinions are subjective (although they may not be all expressed

in subjective sentences). An opinion from a single opinion holder is usually not

sufﬁcient for action. In almost all applications, the user needs to analyze opinions

from a large number of opinion holders. This tells us that some form of summary

of opinions is necessary. The question is what an opinion summary should be. On

the surface, an opinion summary is just like a multi-document summary because we

need to summarize multiple opinion documents, e.g., reviews. It is, however, very

different from traditional multi-document summary. Although there are informal

descriptions about what a traditional multi-document summary should be, it is never

formally deﬁned. A traditional multi-document summary is often just “deﬁned”

operationally based on each speciﬁc algorithm that produces the summary. Thus

different algorithms produce different kinds of summaries. The resulting summaries

are also hard to evaluate. An opinion summary in its core form, on the other hand,

can be deﬁned precisely based on the quintuple deﬁnition of opinion and easily

evaluated. That is, all opinion summarization algorithms should aim to produce the

same summary. Although they may still produce different ﬁnal summaries, that is

due to their different accuracies. This core form of opinion summary is called the

aspect-based opinion summary (or feature-based opinion summary) (Hu and Liu

2004; Liu et al. 2005).

Definition 11 (Aspect-Based Opinion Summary) The aspect-based opinion summary

about an entity e is of the following form:

GENERAL:  number of opinion holders who are positive about entity e
          number of opinion holders who are negative about entity e
Aspect 1: number of opinion holders who are positive about aspect 1 of entity e
          number of opinion holders who are negative about aspect 1 of entity e
...
Aspect n: number of opinion holders who are positive about aspect n of entity e
          number of opinion holders who are negative about aspect n of entity e

where GENERAL represents the entity e itself and n is the total number of aspects

of e.

The key features of this opinion summary deﬁnition are that it is based on positive

and negative opinions about each entity and its aspects and that it is quantitative.

The quantitative perspective is reﬂected by the numbers of positive and negative

opinions. In an application, the number counts can also be replaced by percentages.

The quantitative perspective is especially important in practice. For example, 20%

of people being positive about a product is very different from 80% of people

being positive about the product.
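Because the summary is defined over opinion quintuples, it can be computed by simple counting. The following is a minimal sketch under stated assumptions: the function names `summarize` and `positive_share`, the entity names "camera1" and "camera2", and the holder ids are all hypothetical. Opinion holders are counted once per aspect and orientation, following the definition.

```python
from collections import defaultdict


def summarize(opinions, entity):
    """Count distinct positive and negative opinion holders per aspect of one entity.

    `opinions` is an iterable of (entity, aspect, sentiment, holder, time) tuples.
    """
    holders = defaultdict(lambda: {"positive": set(), "negative": set()})
    for e, aspect, sentiment, holder, _time in opinions:
        if e == entity and sentiment in ("positive", "negative"):
            holders[aspect][sentiment].add(holder)
    return {aspect: {s: len(h) for s, h in by_sentiment.items()}
            for aspect, by_sentiment in holders.items()}


def positive_share(counts):
    """Percentage of opinion holders who are positive, given one aspect's counts."""
    total = counts["positive"] + counts["negative"]
    return 100.0 * counts["positive"] / total if total else 0.0


# Hypothetical quintuples for two cameras.
opinions = [
    ("camera1", "GENERAL", "positive", "u1", "2011-09-15"),
    ("camera1", "GENERAL", "positive", "u2", "2011-09-16"),
    ("camera1", "GENERAL", "negative", "u3", "2011-09-16"),
    ("camera1", "picture_quality", "negative", "u1", "2011-09-15"),
    ("camera2", "GENERAL", "positive", "u4", "2011-09-17"),
]
summary = summarize(opinions, "camera1")
print(summary)
print(round(positive_share(summary["GENERAL"]), 1))
```

Replacing the counts with the output of `positive_share` gives the percentage form of the summary mentioned above.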

To illustrate this form of summary, we summarize a set of reviews of a digital

camera, called digital camera 1, in Figure 2.1. This is called a structured summary

in contrast to a traditional text summary of a short document generated from one

or multiple long documents. In the ﬁgure, 105 reviews expressed positive opinions

about the camera itself denoted by GENERAL and 12 expressed negative opinions.

Picture quality and battery life are two camera aspects. 75 reviews expressed

positive opinions about the picture quality, and 42 expressed negative opinions.

Digital Camera 1:

Aspect: GENERAL

Positive: 105 <Individual review sentences>

Negative: 12 <Individual review sentences>

Aspect: Picture quality

Positive: 75 <Individual review sentences>

Negative: 42 <Individual review sentences>

Aspect: Battery life

Positive: 50 <Individual review sentences>

Negative: 9 <Individual review sentences>

…

Fig. 2.1 An aspect-based opinion summary

We also added <Individual review sentences>, which can be a link pointing to the

sentences and/or the whole reviews that contain the opinions (Hu and Liu 2004;

Liu et al. 2005). With this summary, one can easily see how existing customers feel

about the camera. If one is interested in a particular aspect and additional details,

one can drill down by following the <Individual review sentences> link to see the

actual opinion sentences or reviews.

In a more advanced analysis, we can also summarize opinion reasons and qualifiers

in a similar way. Based on my experience, qualifiers for opinion statements

are rare, but reasons for opinions are quite common. To perform the task, we need

another level of summary. For example, in the example of Figure 2.1, we may want

to summarize the reasons for the poor picture quality based on the sentences in

<Individual review sentences>. We may ﬁnd that 35 people say the pictures are not

bright enough and 7 people say that the pictures are blurry.

Based on the idea of aspect-based summary, researchers have proposed many

opinion summarization algorithms, and also extended this form of summary to some

other more specialized forms (Liu 2015).

2.3 Affect, Emotion, and Mood

Affect, emotion, and mood have been studied extensively in several fields, e.g.,

psychology, philosophy, and sociology. However, investigations in these ﬁelds are

seldom concerned with the language expressions used to express such feelings.

Their main concerns are people’s psychological states of mind, theorizing what

affect, emotion and mood are, what constitutes basic emotions, what physiological

reactions happen (e.g., heart rate changes, blood pressure, sweating and so on),

what facial expressions, gestures and postures are, and measuring and investigating

the impact of such mental states. These mental states have also been exploited

extensively in application areas such as marketing, economics, and education.

However, even with such extensive research, understanding these concepts is still

slippery and confusing because different theorists often have somewhat different

definitions for them and do not even completely agree with each other about what

emotion, mood, and affect are. For example, about emotion, diverse theorists have

proposed that there are from two to twenty basic human emotions and some do not

even believe that there is such a thing as basic emotions (Ortony and Turner 1990).

In most cases, emotion and affect are regarded as synonymous, and indeed, all three

terms are sometimes used interchangeably. Affect is also used as an encompassing

term covering all topics related to emotion, feeling, and mood. To make matters

worse, in applications, researchers and practitioners use these concepts loosely in

whatever way they like, without following any established definitions. Thus

one is often left puzzled by just what an author means when the word emotion,

mood, or affect is used. In most cases, the definition of each term also uses one or

more of the other terms, resulting in circular definitions, which causes further confusion.

The good news for natural language processing researchers and practitioners

is that in practical applications of sentiment analysis, we needn’t be too concerned

with such an unsettled state of affairs because in practice we can pick and use

whatever emotion or mood states are suitable for the applications at hand.

This section ﬁrst tries to create a reasonable understanding of these concepts

and their relationships for our tasks of natural language processing in general and

sentiment analysis in particular. It then puts these three concepts in the context of

sentiment analysis and discusses how they can be handled in sentiment analysis.

2.3.1 Affect, Emotion, and Mood in Psychology

We start the discussion with the dictionary definitions of affect, emotion, and mood [1].

The concept of feeling is also included as all three concepts are about human

feelings. From the deﬁnitions, we can see how difﬁcult it is to explain or to articulate

these concepts:

• Affect: Feeling or emotion, especially as manifested by facial expression or body

language.

• Emotion: A mental state that arises spontaneously rather than through conscious

effort and is often accompanied by physiological changes.

• Mood: A state of mind or emotion.

• Feeling: An affective state of consciousness, such as that resulting from emotions,

sentiments, or desires.

These deﬁnitions are confusing from a scientiﬁc point of view because we do not

see a clear demarcation for each concept. We turn to the ﬁeld of psychology to look

for a better deﬁnition for each of them. The convergence of views and ideas among

theorists in the past twenty years gives us a workable classiﬁcation scheme.

An affect is commonly defined as a neurophysiological state consciously

accessible as the simplest raw (nonreﬂective) feeling evident in moods and emotions

(Russell 2003). The key point here is that such a feeling is primitive and not directed

at an object. For example, you are watching a scary movie. If you are affected,

it moves you and you experience a feeling of being scared. Your mind further

processes this feeling and expresses it to yourself and the world around you. The

feeling is then displayed as an emotion, such as crying, shock, or screaming.

Emotion is thus the indicator of affect. Due to cognitive processing, emotion is a

compound (rather than primitive) feeling concerned with a speciﬁc object, such as

a person, an event, a thing, or a topic. It tends to be intense and focused and lasts a

short period of time. Mood, like emotion, is a feeling or affective state but it typically

lasts longer than emotion and tends to be more unfocused and diffused. Mood is also

less intense than emotion. For example, you may wake up feeling happy and stay

that way for most of the day.

[1] http://www.thefreedictionary.com/subjective

In short, emotions are quick and intense, while moods are more diffused and

prolonged feelings. For example, we can get very angry very quickly, but it is

difﬁcult to stay very angry for a long time. The anger emotion may subside into an

irritable mood that can last quite a long time. An emotion is usually very speciﬁc,

triggered by noticeable events, which means that an emotion has a speciﬁc target.

In this sense, emotion is like a rational opinion. On the other hand, a mood can be

caused by multiple events, and sometimes it may not have any speciﬁc targets or

causes. Mood typically also has a dimension of future expectation. It can involve a

structured set of beliefs about general expectations of a future experience of pleasure

or pain, or of positive or negative affect in the future (Batson et al. 1992).

Since sentiment analysis is not so much concerned with affect as deﬁned above,

below we focus only on emotion and mood in the psychological context. Let us start

with emotion. Emotion has been frequently mentioned in sentiment analysis. Since

it has a target or an involved entity, it ﬁts the sentiment analysis context naturally.

Almost all applications are interested in opinions and emotions about some target

entities.

Theorists in psychology have grouped emotions into categories. However, as we

mentioned earlier, there is still not a set of agreed basic (or primary) emotions among

theorists. In (Ortony and Turner 1990), the basic emotions proposed by several

theorists were compiled to show there is a great deal of disagreement. We reproduce

them in Table 2.1.

Table 2.1 Basic emotions from different theorists

| Source | Basic emotions |
|---|---|
| Arnold (1960) | Anger, aversion, courage, dejection, desire, despair, fear, hate, hope, love, sadness |
| Ekman et al. (1982) | Anger, disgust, fear, joy, sadness, surprise |
| Gray (1982) | Anxiety, joy, rage, terror |
| Izard (1971) | Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise |
| James (1884) | Fear, grief, love, rage |
| McDougall (1926) | Anger, disgust, elation, fear, subjection, tender-emotion, wonder |
| Mowrer (1960) | Pain, pleasure |
| Oatley and Johnson-Laird (1987) | Anger, disgust, anxiety, happiness, sadness |
| Panksepp (1982) | Expectancy, fear, rage, panic |
| Plutchik (1980) | Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise |
| Tomkins (1984) | Anger, interest, contempt, disgust, distress, fear, joy, shame, surprise |
| Watson (1930) | Fear, love, rage |
| Weiner and Graham (1984) | Happiness, sadness |
| Parrott (2001) | Anger, fear, joy, love, sadness, surprise |

In (Parrott 2001), apart from the basic emotions, secondary and tertiary emotions

were also proposed (see Table 2.2). These secondary and tertiary emotions are useful in some

sentiment analysis applications because the set of basic emotions may not be fine-grained

enough. For example, in one of the applications that I worked on, the client

was interested in detecting optimism in the financial market. Optimism is not a basic

emotion in the list of any theorist, but it is a secondary emotion for joy in Table 2.2.

Table 2.2 Primary, secondary and tertiary emotions from Parrott (2001)

| Primary emotion | Secondary emotion | Tertiary emotion |
|---|---|---|
| Anger | Disgust | Contempt, loathing, revulsion |
| | Envy | Jealousy |
| | Exasperation | Frustration |
| | Irritability | Aggravation, agitation, annoyance, crosspatch, grouchy, grumpy |
| | Rage | Anger, bitter, dislike, ferocity, fury, hatred, hostility, outrage, resentment, scorn, spite, vengefulness, wrath |
| | Torment | Torment |
| Fear | Horror | Alarm, fear, fright, horror, hysteria, mortification, panic, shock, terror |
| | Nervousness | Anxiety, apprehension (fear), distress, dread, suspense, uneasiness, worry |
| Joy | Cheerfulness | Amusement, bliss, gaiety, glee, jolliness, joviality, joy, delight, enjoyment, gladness, happiness, jubilation, elation, satisfaction, ecstasy, euphoria |
| | Contentment | Pleasure |
| | Enthrallment | Enthrallment, rapture |
| | Optimism | Eagerness, hope |
| | Pride | Triumph |
| | Relief | Relief |
| | Zest | Enthusiasm, excitement, exhilaration, thrill, zeal |
| Love | Affection | Adoration, attractiveness, caring, compassion, fondness, liking, sentimentality, tenderness |
| | Longing | Longing |
| | Lust/sexual desire | Desire, infatuation, passion |
| Sadness | Disappointment | Dismay, displeasure |
| | Neglect | Alienation, defeatism, dejection, embarrassment, homesickness, humiliation, insecurity, insult, isolation, loneliness, rejection |
| | Sadness | Depression, despair, gloom, glumness, grief, melancholy, misery, sorrow, unhappy, woe |
| | Shame | Guilt, regret, remorse |
| | Suffering | Agony, anguish, hurt |
| | Sympathy | Pity, sympathy |
| Surprise | Surprise | Amazement, astonishment |

Note that although the words in Table 2.2 describe different emotions or states of

mind, they can also be used as part of an emotion lexicon in sentiment analysis to

spot different kinds of emotions. Of course, they need to be significantly expanded

to include synonymous words and phrases to form a reasonably complete

emotion lexicon. In fact, some emotion lexicons have already been compiled

by researchers; see (Liu 2015). Note also that for sentiment analysis, we do not need

to be concerned with the disagreement among theorists. For a particular application, we

can choose the types of emotion that are useful to the application. We also do not

need to worry about whether they are primary, secondary or tertiary.
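To make the lexicon idea concrete, the sketch below builds a tiny seed lexicon from a handful of words in Table 2.2 and uses it to spot emotions at either the primary or the secondary level. The names `EMOTION_LEXICON` and `spot_emotions` are hypothetical, the example sentence is invented, and whitespace tokenization is a simplifying assumption. A usable lexicon would be far larger and would also cover phrases.

```python
# A tiny, hypothetical seed lexicon built from a few words in Table 2.2.
# Each word maps to (primary emotion, secondary emotion).
EMOTION_LEXICON = {
    "hope": ("joy", "optimism"),
    "eagerness": ("joy", "optimism"),
    "frustration": ("anger", "exasperation"),
    "worry": ("fear", "nervousness"),
    "grief": ("sadness", "sadness"),
    "amazement": ("surprise", "surprise"),
}


def spot_emotions(text, level="primary"):
    """Return the emotion labels of lexicon words found in the text.

    `level` selects "primary" or "secondary" labels, since an application
    may need either granularity.
    """
    index = 0 if level == "primary" else 1
    tokens = [token.strip(".,!?;:\"'").lower() for token in text.split()]
    return [EMOTION_LEXICON[token][index] for token in tokens
            if token in EMOTION_LEXICON]


sentence = "Investors expressed hope despite some worry."
print(spot_emotions(sentence))
print(spot_emotions(sentence, level="secondary"))
```

An application interested in optimism, as in the financial example earlier, would simply read the secondary-level labels.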

The emotion annotation and representation language (EARL) proposed by the

Human-Machine Interaction Network on Emotion (HUMAINE) (HUMAINE 2006)

has classiﬁed 48 emotions into different kinds of positive and negative orientations

or valences (Table 2.3). This is useful to us because sentiment analysis is mainly

interested in expressions with positive or negative orientations or polarities (also

called valences). However, we should take note that some emotions do not have

positive or negative orientations, e.g., surprise and interest. Some psychologists felt

that these should not be regarded as emotions (Ortony and Turner 1990) simply

because they do not have positive or negative orientations or valences. For the same

reason, they are not commonly used in sentiment analysis.

Table 2.3 HUMAINE polarity annotations of emotions

| Category | Emotions |
|---|---|
| Negative and forceful | Anger, annoyance, contempt, disgust, irritation |
| Negative and not in control | Anxiety, embarrassment, fear, helplessness, powerlessness, worry |
| Negative thoughts | Doubt, envy, frustration, guilt, shame |
| Negative and passive | Boredom, despair, disappointment, hurt, sadness |
| Agitation | Stress, shock, tension |
| Positive and lively | Amusement, delight, elation, excitement, happiness, joy, pleasure |
| Positive thoughts | Courage, hope, pride, satisfaction, trust |
| Quiet positive | Calm, content, relaxed, relieved, serene |
| Caring | Affection, empathy, friendliness, love |
| Reactive | Interest, politeness, surprised |

We now turn to mood. The types of mood are similar to those of emotion except

that the types of emotion that last only momentarily will not usually be moods, e.g.,

surprise and shock. Thus, the words or phrases used to express moods are similar

to those for emotions too. However, since mood is a feeling that lasts a relatively

long time, is diffused, and may not have a clear cause or target object, it is hard

to recognize unless a person explicitly says it, e.g., I feel sad today. We can also

monitor one’s writings over a period of time to assess his/her prevailing mood in

the period, which can help discover people with prolonged mental or other medical

conditions (e.g., chronic depression) and even the tendency to commit suicide or

crimes.

It is also interesting to discover the mood of the general population, e.g., public

mood, and the general atmosphere between organizations or countries, e.g., the

mood of US and Russian relations, by monitoring the traditional news media and/or

social media over a period of time.

2.3.2 Affect, Emotion, and Mood in Sentiment Analysis

The above discussions are only about people’s states of mind, which are the subjects

of study of psychologists. However, for sentiment analysis, we need to know how

such feelings are expressed in natural language and how they can be recognized.

This leads us to the linguistics of affect, emotion and mood. Affect, as defined

by psychologists (a primitive response or feeling with no target), is not of much

interest to us, as almost everything written in text or displayed in the form of

facial expressions and other visible signs has already gone through some cognitive

processing to become emotion or mood. However, we note that the term affect is still

commonly used in linguistics and many other ﬁelds to mean emotion and mood.

Wikipedia has a good page describing the linguistic aspect of emotion and

mood. There are two main ways that human beings express themselves, speech

and writing. In addition to choices of grammatical and lexical expressions, which

are common to both speech and writing (see below), speaker emotion can also be

conveyed through paralinguistic mechanisms such as intonations, facial expressions,

body movements, biophysical signals or changes, gestures, and postures. In writing,

special punctuation (e.g., repeated exclamation marks, !!!!), capitalization of all

letters of a word, emoticons, and lengthening of words (e.g., sloooooow) are

frequently used, especially in social media.
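The written cues listed above are surface patterns, so a first approximation can be made with regular expressions. The sketch below is illustrative only: the pattern names, the thresholds (two or more punctuation marks, three or more repeated letters), and the small emoticon set are all my assumptions. The all-caps pattern will also fire on acronyms such as "US".

```python
import re

# Rough patterns for the written cues listed above; thresholds are assumptions.
REPEATED_PUNCTUATION = re.compile(r"[!?]{2,}")
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")
LENGTHENED_WORD = re.compile(r"\b[A-Za-z]*([A-Za-z])\1{2,}[A-Za-z]*\b")
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")


def written_cues(text):
    """Count occurrences of each written emotion cue in the text."""
    return {
        "repeated_punctuation": len(REPEATED_PUNCTUATION.findall(text)),
        "all_caps": len(ALL_CAPS_WORD.findall(text)),
        "lengthened": len(LENGTHENED_WORD.findall(text)),
        "emoticons": len(EMOTICON.findall(text)),
    }


print(written_cues("This phone is sloooooow and I HATE it!!!! :("))
```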

Regarding choices of grammatical and lexical expressions, there are several

common ways that people often employ to express emotions or moods:

1. use emotion or mood words or phrases such as love, disgusting, angry, and upset.

2. describe emotion-related behaviors, e.g., “He cried after he saw his mother” and

“After he received the news, he jumped up and down for a few minutes like a small

boy.”

3. use intensiﬁers. As we discussed in Sect. 2.1.3, common English intensiﬁers

include very, so, extremely, dreadfully, really, awfully (e.g., awfully bad), terribly

(e.g., terribly good), never (e.g., “I will never buy any product from them again”),

the sheer number of, on earth (e.g., “What on earth do you think you are doing?”),

the hell (e.g., “What the hell are you doing?”), a hell of a, etc. To emphasize

further, intensiﬁers may be repeated, e.g., “This car is very very good.”

4. use superlatives. Arguably, many superlative expressions also express emotions,

e.g., “This car is simply the best.”

5. use pejorative (e.g., “He is a fascist.”), laudatory (e.g., “He is a saint.”), and

sarcastic expressions (e.g., “What a great car, it broke the second day”).

6. use swearing, cursing, insulting, blaming, accusing, and threatening expressions.
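As a small illustration of clue 3, the sketch below finds single-word intensifiers and reports how many times each is repeated in a row. The set `INTENSIFIERS` holds only the single-word items from the list above, and the function name `intensifier_runs` is hypothetical. Multiword intensifiers are left out, and ambiguous words such as "so" would need disambiguation in practice.

```python
import re
from itertools import groupby

# Single-word intensifiers taken from the list above; multiword ones are omitted.
INTENSIFIERS = {"very", "so", "extremely", "dreadfully", "really",
                "awfully", "terribly", "never"}


def intensifier_runs(sentence):
    """Return (intensifier, repeat_count) for each run of consecutive repeats."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(word, len(list(group)))
            for word, group in groupby(tokens)
            if word in INTENSIFIERS]


print(intensifier_runs("This car is very very good."))
print(intensifier_runs("I will never buy any product from them again"))
```

A repeat count above one signals the extra emphasis described in clue 3.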

My experience is that using these clues is sufficient to recognize emotion and

mood in text. In linguistics, adversative forms, honorific and deferential

language, interrogatives, tag questions, and the like may also be employed to

express emotional feelings, but their uses are rare and are also hard to recognize

computationally.

To design emotion detection algorithms, in addition to considering the above

clues, we should be aware that there is a cognitive gap between people’s true

psychological states of mind and the language that they use to express such states.

There are many reasons (e.g., being polite or not wanting people to know

one’s true feelings) that they may not fully match. Thus, language does not always

represent psychological reality. For example, when one says “I am happy with this

car,” one may not have any emotional reaction towards the car although the emotion

word happy is used. Furthermore, emotion and mood are very difﬁcult to distinguish

in written text (Alm 2008). We normally do not distinguish them. When we say

emotion, we mean emotion or mood.

Since emotions have targets and most of them also imply positive or negative

sentiment, they can be represented and handled in very much the same way as

rational opinions. Although a rational opinion emphasizes a person’s evaluation

about an entity and an emotion emphasizes a person’s feeling caused by an entity,

emotion can essentially be regarded as sentiment with a stronger intensity (see Sect.

2.1.3). It is often the case that when the sentiment of a person becomes so strong,

he/she becomes emotional. For example, “The hotel manager is not professional”

expresses a rational opinion, while “I almost cried when the hotel manager talked to

me in a hostile manner” indicates that the author’s sentiment reached the emotional

level of sadness and/or anger. The sentiment orientation of an emotion naturally

inherits the polarity of the emotion, e.g., sadness, anger, disgust, and fear are negative,

and love and joy are positive. At the emotional level, sentiment becomes more ﬁne-

grained. Additional mechanisms are needed to recognize different types of emotions

in writing.

Due to the similarity of emotion and rational opinion, we can still use the

quadruple or quintuple representation of opinion (Definitions 1 and 7) to represent

emotion. However, if we want to be more precise, we can give it a separate deﬁnition

based on the quadruple (Deﬁnition 1) or quintuple (Deﬁnition 7) deﬁnitions as the

meanings of some components in the tuple are not exactly the same as they were in

the opinion deﬁnition because emotions focus on personal feelings, while rational

opinions focus on evaluations.

Deﬁnition 14 (Emotion) An emotion is a quintuple,

(e, a, m, f, t),

where e is the target entity, a is the target aspect of e that is responsible for the

emotion, m is the emotion type or a pair representing an emotion type and an

intensity level, f is the feeler of the emotion, and t is the time when the emotion

is expressed.

For example, for the emotion expressed in the sentence “I am so upset with the

manager of the hotel,” the entity is ‘the hotel,’ the aspect is ‘the manager’ of the

hotel, the emotion type is anger, and the emotion feeler is I (the author). If we

know the time when the emotion was expressed we can add it to the quintuple

representation. As another example, in “After hearing of his brother’s death, he burst

into tears,” the target entity is ‘his brother’s death,’ which is an event, and there is

no aspect. The emotion type is sadness and the emotion feeler is he.

In practical applications, we should integrate the analysis of rational opinions

and emotions. We may also want to add the sentiment orientation or polarity of

an emotion, i.e., whether it is positive (desirable) or negative (undesirable) for the

feeler. If that is required, a sentiment component can be added to Definition 14 to

make it a sextuple.
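As a minimal sketch of how Definition 14 might be held in a program, the Python record below stores the quintuple and derives the optional sixth (sentiment) component from the emotion type. The class name, field names, and the polarity table are illustrative assumptions, not part of the definition; the table lists only the emotion types named in the text, and intensity is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Polarity of the emotion types named in the text: sadness, anger, disgust,
# and fear are negative; love and joy are positive. (Illustrative table.)
EMOTION_POLARITY = {
    "sadness": "negative",
    "anger": "negative",
    "disgust": "negative",
    "fear": "negative",
    "love": "positive",
    "joy": "positive",
}


@dataclass(frozen=True)
class Emotion:
    """Emotion quintuple (e, a, m, f, t); field names are assumptions."""
    entity: str                  # e: target entity
    aspect: Optional[str]        # a: target aspect, None if there is none
    emotion_type: str            # m: emotion type (intensity omitted here)
    feeler: str                  # f: who feels the emotion
    time: Optional[str] = None   # t: when it was expressed, if known

    def polarity(self) -> Optional[str]:
        """Sixth component of the sextuple; None for unlisted emotion types."""
        return EMOTION_POLARITY.get(self.emotion_type)


# "I am so upset with the manager of the hotel"
upset = Emotion("the hotel", "the manager", "anger", "I")
# "After hearing his brother's death, he burst into tears."
grief = Emotion("his brother's death", None, "sadness", "he")
print(upset.polarity(), grief.polarity())
```

Keeping the polarity as a derived value rather than a stored field is one possible design choice: it keeps the record identical to the quintuple and adds the sextuple view only when an application asks for it.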

Cause of Emotion In Sect. 2.1.5, we discussed the reasons for opinions. In a

similar way, emotions have causes as emotions are usually caused by some internal

or external events. Here we use the word cause instead of reason because an emotion

is an effect produced by a cause (usually an event), rather than a justiﬁcation or

explanation in support of an opinion. In the above sentence, ‘his brother’s death’ is

the cause for his sadness emotion. Actually, ‘his brother’s death’ is both the target

entity and the cause. In many cases, the target and the cause of an emotion are

different. For example, in “I am so mad with the hotel manager because he refused

to refund my booking fee,” the target entity is the hotel, the target aspect is the

manager of the hotel, and the cause of the anger emotion is ‘he refused to refund

my booking fee.’ There is a subtle difference between ‘his brother’s death’ and ‘he

refused to refund my booking fee.’ The latter states an action performed by he (the

hotel manager) that causes the anger emotion (negative). He is the agent of the

undesirable action. The sentiment on the hotel manager is negative. The sentence

also explicitly states that the anger is directed at the hotel manager. In the case of ‘his

brother’s death,’ ‘his brother’ or death alone is not the target of the emotion. It

is the whole event that is the target and the cause of the sadness emotion.

Unlike rational opinions, in many emotion and mood sentences, the authors

may not explicitly state the entities (e.g., named entities, topics, issues, actions

and events) that are responsible for the emotions or moods, e.g., “I felt a bit sad

this morning” and “There is sadness in her eyes.” The reason is that a rational

opinion sentence focuses on both the opinion target and the sentiment on the target

but the opinion holder is often omitted (e.g., “The pictures from this camera are

great”), while an emotion sentence focuses on the feeling of the feeler (e.g., “There

is sadness in her eyes”). This means that a rational opinion sentence contains both

sentiments and their targets explicitly, but may or may not give the opinion holder.

An emotion sentence always has feelers and emotion expressions, but may or may

not state the emotion target or the cause (e.g., “I love this car” and “I felt sad this

morning”). This does not mean that some emotions do not have targets or causes.

They do, but the targets or the causes may be expressed in previous sentences or

implied by the context, which makes extracting targets or causes very difﬁcult. In

the case of mood, the causes may be implicit or even unknown and are thus not

stated in the text.

2.4 Different Types of Opinions

Opinions can actually be classiﬁed along many dimensions. We discuss some main

classiﬁcations in this section.

2.4.1 Regular and Comparative Opinions

The type of opinion that we have deﬁned is called the regular opinion (Liu 2006).

Another type is comparative opinion (Jindal and Liu 2006b).

Regular Opinion A regular opinion is often referred to simply as an opinion in the

literature. It has two main sub-types (Liu 2006):

Direct opinion: A direct opinion is an opinion that is expressed directly on an entity

or an entity aspect, e.g., “The picture quality is great.”

Indirect opinion: An indirect opinion is an opinion that is expressed indirectly on an

entity or aspect of an entity based on some positive or negative effects on some

other entities. This sub-type often occurs in the medical domain. For example,

the sentence “After injection of the drug, my joints felt worse” describes an

undesirable effect of the drug on ‘my joints,’ which indirectly gives a negative

opinion or sentiment to the drug. In this case, the entity is the drug and the aspect

is the effect on joints. Indirect opinions also occur in other domains, although

less frequently. In these cases, they typically express benefits (positive) or

issues (negative) of entities, e.g., “With this machine, I can finish my work in one

hour, which used to take me 5 hours” and “After switching to this laptop, my eyes

felt much better.” In marketing, beneﬁts of a product or service are regarded as

the major selling points. Thus, extracting such beneﬁts is of practical interest.

Comparative Opinion A comparative opinion expresses a relation of similarities

or differences between two or more entities and/or a preference of the opinion holder

based on some shared aspects of the entities (Jindal and Liu 2006a,b). For example,

the sentences “Coke tastes better than Pepsi” and “Coke tastes the best” express

two comparative opinions. A comparative opinion is usually expressed using the

comparative or superlative form of an adjective or adverb, although not always (e.g.,

prefer). The deﬁnitions in Sects. 2.1 and 2.2 do not cover comparative opinion.

Comparative opinions have many types. See (Liu 2015) for their deﬁnitions.

2.4.2 Subjective and Fact-Implied Opinions

Opinions and sentiments are by nature subjective because they are about people’s

subjective views, appraisals, evaluations, and feelings. But when they are expressed

in actual text, they do not have to appear as subjective sentences. People can use

objective or factual sentences to express their happiness and displeasure because

facts can be desirable or undesirable. Conversely, not all subjective sentences

express positive or negative sentiments, e.g., “I think he went home,” which is a

belief and has no positive or negative orientation. Based on subjectivity, we can

classify opinions into two types, subjective opinions and fact-implied opinions. We

deﬁne them below.

Subjective Opinion A subjective opinion is a regular or comparative opinion

given in a subjective statement, e.g.,

“Coke tastes great.”

“I think Google’s proﬁt will go up next month.”

“This camera is a masterpiece.”

“We are seriously concerned about this new policy.”

“Coke tastes better than Pepsi.”

We can broadly classify subjective opinions into two categories: rational

opinions and emotional opinions (Sect. 2.1.3).

Fact-Implied Opinion A fact-implied opinion is a regular or comparative opinion

implied in an objective or factual statement. Such an objective statement expresses

a desirable or undesirable fact or action. This type of opinion can be further divided

into two subtypes:

1. Personal fact-implied opinion: Such an opinion is implied by a factual state-

ment about someone’s personal experience, e.g.,

“I bought the mattress a week ago, and a valley has formed in the middle.”

“I bought the toy yesterday and I have already thrown it into the trash can.”

“My dad bought the car yesterday and it broke today.”

“The battery of this phone lasts longer than that of my previous Samsung phone.”

Although factual, these sentences tell us whether the opinion holder is positive

or negative about the product or his preference among different products. Thus,

the opinions implied by these factual sentences are no different from subjective

opinions.

2. Non-personal fact-implied opinion: This type is entirely different as it does not

imply any personal opinion. It often comes from fact reporting and the reported

fact does not give any opinion from anyone, e.g.,

“Google’s revenue went up by 30%.”

“The unemployment rate came down last week.”

“Google made more money than Yahoo last month.”

Unlike personal facts, these sentences do not express any experience or evaluation

from any person. For instance, the first sentence above does not have the same

meaning as a sentiment resulting from a person who has used a Google product

and expresses a desirable or undesirable fact about the Google product. Since these

sentences do not give any personal opinion, they do not have opinion holders

although they do have the sources of information. For example, the source of the

information in the ﬁrst sentence above is likely to be Google itself, but it is a fact,

not Google’s subjective opinion.

However, we can still treat them as a type of opinion sentences due to the

following two reasons:

1. Each of the sentences above does indicate a desirable and/or undesirable state for

the involved entities or topics (i.e., Google, Yahoo and unemployment rate) based

on our commonsense knowledge.

2. The persons who post the above sentences might be expressing positive or

negative opinions implicitly about the involved entities. For example, the person

who posted the ﬁrst sentence on Twitter is likely to have a positive sentiment

about Google; otherwise, he/she would probably not post the fact. This kind of

post occurs very frequently on Twitter, where Twitter users pick up some news

headlines from the traditional media and post them on Twitter. Many people may

also re-tweet them.

As we can see, it is important to distinguish personal facts and non-personal

facts as opinions induced from non-personal facts represent a very different type

of opinion and need special treatment. How to deal with such facts depends on

the application. My recommendation is to assign each such sentence a positive or negative

orientation based on our commonsense knowledge of whether the fact is desirable

or undesirable for the involved entity, e.g., Google. Users of the sentiment analysis

system should be made aware of this convention so that they can make use of the

opinion appropriately in their applications.
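The convention recommended above can be illustrated with a small Python sketch. It assumes that the quantity and the direction of change have already been extracted from the sentence, and it stands in for commonsense knowledge with a tiny hand-written table. The table, its entries, and the function name are hypothetical; a real system would need far broader, domain-specific knowledge.

```python
# Hypothetical commonsense table: is an increase in this quantity desirable
# for the entity it describes? (Illustrative entries only.)
INCREASE_IS_DESIRABLE = {
    "revenue": True,
    "profit": True,
    "share price": True,
    "unemployment rate": False,
}


def fact_orientation(quantity: str, direction: str) -> str:
    """Return 'positive', 'negative', or 'unknown' for a reported change.

    direction is 'up' or 'down'. The orientation is relative to the involved
    entity, not to the author or the reader of the sentence.
    """
    if quantity not in INCREASE_IS_DESIRABLE or direction not in ("up", "down"):
        return "unknown"
    went_up = direction == "up"
    desirable = INCREASE_IS_DESIRABLE[quantity] == went_up
    return "positive" if desirable else "negative"


# "Google's revenue went up by 30%."
print(fact_orientation("revenue", "up"))
# "The unemployment rate came down last week."
print(fact_orientation("unemployment rate", "down"))
```

Returning 'unknown' for anything outside the table, rather than guessing, reflects the point that desirable and undesirable facts differ from domain to domain.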

Sometimes the author who posts such a fact may also give an explicit opinion,

e.g.,

“I am so upset that Google’s share price went up today.”

The clause ‘Google’s share price went up today’ in the example gives a non-

personal fact-implied positive opinion about Google, but the author is negative about

it. This is called a meta-opinion, an opinion about an opinion.

Subjective opinions are usually easier to deal with because the number of words

and phrases that can be used to explicitly express subjective feelings is limited, but

this is not the case for fact-implied opinions. There seem to be an inﬁnite number of

desirable and undesirable facts and every domain is different. Much of the existing

research has focused on subjective opinions. Limited work has been done about

fact-implied opinions (Zhang and Liu 2011).

2.4.3 First-Person and Non-First-Person Opinions

In some applications, it is important to distinguish those statements expressing

one’s own opinions from those statements expressing beliefs about someone else’s

opinions. For example, in a political election, one votes based on one’s belief of

each candidate’s stances on issues, rather than the true stances of the candidate,

which may or may not be the same.

1. First-person opinion: Such an opinion states one’s own attitude towards an

entity. It can be from a person, a representative of a group, or an organization.

Here are some example sentences expressing ﬁrst-person opinions.

“Tax increase is bad for the economy.”

“I think Google’s proﬁt will go up next month.”

“We are seriously concerned about this new policy.”

“Coke tastes better than Pepsi.”

Notice that not every sentence needs to explicitly use the ﬁrst person pronoun “I”

or “we,” or to mention an organization name.

2. Non-ﬁrst-person opinion: Such an opinion is expressed by a person stating

someone else’s opinion. That is, it is a belief of someone else’s opinion about

some entities or topics, e.g.,

“I think John likes Lenovo PCs.”

“Jim loves his iPhone.”

“President Obama supports tax increase.”

“I believe Obama does not like wars.”

2.4.4 Meta-opinions

Meta-opinions are opinions about opinions. That is, a meta-opinion’s target is also

an opinion which is usually contained in a subordinate clause. The opinion in the

subordinate clause can express either a fact with an implied opinion or a subjective

opinion. Let us see some examples:

“I am so upset that Google’s proﬁt went up”

“I am very happy that my daughter loves her new Ford car”

“I am so sad that Germany lost the game.”

These sentences look quite different from the opinion sentences seen so far. But they

still follow the same opinion deﬁnition in Deﬁnition 7. It is just that the target of

the meta-opinion in the main clause is now an opinion itself in the subordinate

clause. For example, in the ﬁrst sentence, the author is negative about ‘Google’s

proﬁt went up,’ which is the target of the meta-opinion in the main clause. So the

meta-opinion is negative, but its target is a regular positive opinion about ‘Google’s

proﬁt.’ In practice, these two types of opinions should be treated differently. Since

meta-opinions are rare, there is little research or practical work about them.
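One way to picture the nesting described above is a record whose target may itself be an opinion. The Python sketch below is a deliberately simplified stand-in for Definition 7 (it keeps only target, sentiment, and holder), and all names in it are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Opinion:
    """A simplified opinion: a target, a sentiment, and an optional holder.

    The target is either a plain description of an entity or aspect or,
    for a meta-opinion, another Opinion.
    """
    target: Union[str, "Opinion"]
    sentiment: str                 # 'positive' or 'negative'
    holder: Optional[str] = None   # None when no holder is given

    @property
    def is_meta(self) -> bool:
        # A meta-opinion is an opinion whose target is itself an opinion.
        return isinstance(self.target, Opinion)


# "I am so upset that Google's profit went up"
inner = Opinion(target="Google's profit", sentiment="positive")
outer = Opinion(target=inner, sentiment="negative", holder="author")
print(outer.is_meta, outer.sentiment, outer.target.sentiment)
```

Keeping the inner and outer sentiments in separate records makes it straightforward for an application to treat the two types of opinions differently, for example by reporting only the inner opinion when summarizing sentiment about Google.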

2.5 Author and Reader Standpoint

We can look at an opinion from two perspectives, that of the author (opinion holder)

who posts the opinion, and that of the reader who reads the opinion. Since opinions

are subjective, naturally the author and the reader may not see the same thing in the

same way. Let us use the following two example sentences to illustrate the point:

“This car is too small for me.”

“Google’s proﬁts went up by 30%.”

Since the author or the opinion holder of the ﬁrst sentence felt the car is too small,

a sentiment analysis system should output a negative opinion about the size of the

car. However, this does not mean that the car is too small for everyone. A reader may

actually like the small size, and feel positive about it. This causes a problem because

if the system outputs only a negative opinion about size, the reader will not know

whether it is too small or too large and then he/she would not see this positive aspect

for him/her. Fortunately, this problem can be dealt with by mining and summarizing

opinion reasons (see Sect. 2.1.5). Here ‘too small’ not only indicates a negative

opinion about the size but also the reason for the negative opinion. With the reason,

the reader can see a more complete picture of the opinion.

The second sentence represents a non-personal fact-implied opinion. As discussed

in Sect. 2.4.2, the person who posts the fact is likely to be positive about

Google. However, the readers may have different feelings. Those who have ﬁnancial

interests in Google should feel happy, but Google’s competitors will not be thrilled.

In Sect. 2.4.2, we choose to assign positive sentiment to the opinion because our

commonsense knowledge says that the fact is desirable for Google. Users can decide

how to use the opinion based on their application needs.

2.6 Summary

This chapter described many facets of sentiment analysis. It started with the deﬁni-

tions of the concepts of opinion, sentiment, and opinion summary. The deﬁnitions

abstracted a structure from the unstructured natural language text, and also showed

that sentiment analysis is a multi-faceted problem with many interrelated sub-problems.

Researchers can exploit the inter-relationships to design more robust and

accurate solution techniques. This chapter also classiﬁed and discussed different

types of opinions. Along with these deﬁnitions and discussions, the important

concepts of affect, emotion and mood were introduced and deﬁned too. They

are closely related to, but are also different from conventional rational opinions.

Opinions emphasize evaluation or appraisal of some target objects, events or topics

(which are collectively called entities in this chapter), while emotions emphasize

people’s feelings caused by such entities.

After reading this chapter, I am sure that you would agree with me that on the one

hand, sentiment analysis is a challenging area of research involving many different

tasks and perspectives, and on the other, it is also highly subjective in nature. Thus,

I do not expect that you completely agree with me on everything in the chapter.

I also do not claim that this chapter covered all important aspects of sentiment

and opinion. My goal is to present a reasonably precise deﬁnition of sentiment

analysis (or opinion mining) and its related concepts, issues, and tasks. I hope I

have succeeded to some extent.

References

Alm, Ebba Cecilia Ovesdotter. 2008. Affect in text and speech: ProQuest.

Arnold, Magda B. 1960. Emotion and personality. New York: Columbia University Press.

Batson, C. Daniel, Laura L. Shaw, and Kathryn C. Oleson. 1992. Differentiating affect, mood, and

emotion: Toward functionally based conceptual distinctions. Emotion Review of Personality

and Social Psychology 13: 294–326.

Bethard, Steven, Hong Yu, Ashley Thornton, Vasileios Hatzivassiloglou, and Dan Jurafsky. 2004.

Automatic extraction of opinion propositions and their holders. In Proceedings of the AAAI

spring symposium on exploring attitude and affect in text.

Chaudhuri, Arjun. 2006. Emotion and reason in consumer behavior. Oxford: Elsevier Butterworth-

Heinemann.

Choi, Yejin, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. 2005. Identifying sources of

opinions with conditional random ﬁelds and extraction patterns. In Proceedings of the human

language technology conference and the conference on empirical methods in natural language

processing (HLT/EMNLP-2005).

Ekman, P., W.V. Friesen, and P. Ellsworth. 1982. What emotion categories or dimensions can

observers judge from facial behavior? In Emotion in the human face, ed. P. Ekman, 98–110.

Cambridge: Cambridge University Press.

Gray, Jeffrey A. 1982. The neuropsychology of anxiety. Oxford: Oxford University Press.

Hobbs, Jerry R., and Ellen Riloff. 2010. Information extraction. In Handbook of natural language

processing, ed. N. Indurkhya and F.J. Damerau, 2nd ed. London: Chapman & Hall/CRC Press.

Hu, Minqing and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of

ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD-

2004).

HUMAINE. 2006. Emotion annotation and representation language. Available from: http://

emotion-research.net/projects/humaine/earl

Izard, Carroll Ellis. 1971. The face of emotion. New York: Appleton-Century-Crofts.

James, William. 1884. What is an emotion? Mind 9: 188–205.

Jindal, Nitin and Bing Liu. 2006a. Identifying comparative sentences in text documents. In

Proceedings of ACM SIGIR conference on research and development in information retrieval

(SIGIR-2006).

———. 2006b. Mining comparative sentences and relations. In Proceedings of national conference

on artiﬁcial intelligence (AAAI-2006).

Kim, Soo-Min and Eduard Hovy. 2004. Determining the sentiment of opinions. In Proceedings of

international conference on computational linguistics (COLING-2004).

Liu, Bing. 2006. Web data mining: Exploring hyperlinks, contents, and usage data. Berlin:

Springer.

———. 2010. Sentiment analysis and subjectivity, in Handbook of natural language processing,

Second Edition, N. Indurkhya and F.J. Damerau, Editors.

———. 2015. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge:

Cambridge University Press.

Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: Analyzing and comparing

opinions on the web. In Proceedings of international conference on world wide web (WWW-2005).

McDougall, William. 1926. An introduction to social psychology. Boston: Luce.

Mowrer, Orval Hobart. 1960. Learning theory and behavior. New York: Wiley.

Oatley, K., and P.N. Johnson-Laird. 1987. Towards a cognitive theory of emotions. Cognition and

Emotion 1: 29–50.

Ortony, Andrew, and Terence J. Turner. 1990. What’s basic about basic emotions? Psychological

Review 97 (3): 315–331.

Panksepp, Jaak. 1982. Toward a general psychobiological theory of emotions. Behavioral and

Brain Sciences 5 (3): 407–422.

Parrott, W. Gerrod. 2001. Emotions in social psychology: Essential readings. Philadelphia:

Psychology Press.

Plutchik, Robert. 1980. A general psychoevolutionary theory of emotion. In Emotion: Theory,

research, and experience: Vol. 1. Theories of emotion, ed. R. Plutchik and H. Kellerman, 3–33.

New York: Academic Press.

Qiu, Guang, Bing Liu, Bu Jiajun, and Chun Chen. 2011. Opinion word expansion and target

extraction through double propagation. Computational Linguistics 37 (1): 9–27.

Russell, James A. 2003. Core affect and the psychological construction of emotion. Psychological

Review 110 (1): 145–172.

Sarawagi, Sunita. 2008. Information extraction. Foundations and Trends in Databases 1 (3): 261–

377.

Tomkins, Silvan. 1984. Affect theory. In Approaches to emotion, ed. K.R. Scherer and P. Ekman,

163–195. Hillsdale: Erlbaum.

Watson, John B. 1930. Behaviorism. Chicago: Chicago University Press.

Weiner, B., and S. Graham. 1984. An attributional approach to emotional development. In Emotion,

cognition and behavior, ed. C.E. Izard, J. Kagan, and R.B. Zajonc, 167–191. New York:

Cambridge University Press.

Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and

emotions in language. Language Resources and Evaluation 39 (2): 165–210.

Zhang, Lei and Bing Liu. 2011. Identifying noun product features that imply opinions. In

Proceedings of the annual meeting of the Association for Computational Linguistics (short

paper) (ACL-2011).

Zhuang, Li, Feng Jing, and Xiaoyan Zhu. 2006. Movie review mining and summarization. In

Proceedings of ACM international conference on information and knowledge management

(CIKM-2006).

Chapter 3

Reﬂections on Sentiment/Opinion Analysis

Jiwei Li and Eduard Hovy

Abstract The detection of expressions of sentiment in online text has become a

popular Natural Language Processing application. The task is commonly deﬁned

as identifying the words or phrases in a given fragment of text in which the reader

understands that the author expresses some person’s positive, negative, or perhaps

neutral attitude toward a topic. These four elements—expression words, attitude

holder, topic, and attitude value—have evolved with hardly any discussion in the

literature about their foundation or nature. Speciﬁcally, the use of two (or three)

attitude values is far more simplistic than many examples of real language show.

In this paper we ask: where do sentiments come from? We focus on two basic

sources of human attitude—the holder’s non-logical/emotional preferences and the

fulﬁllment of the holder’s goals. After exploring each source we provide a notional

algorithm sketch and examples of how sentiment systems could provide richer and

more realistic accounts of sentiment in text.

Keywords Sentiment analysis • Opinion mining • Natural language processing •

Aspect extraction • Psychology of emotions

3.1 Introduction

Sentiment analysis is an application of natural language processing that focuses on

identifying expressions that reﬂect authors’ opinion-based attitude (i.e., good or bad,

like or dislike) toward entities (e.g., products, topics, issues) or facets of them (e.g.,

price, quality).

Since the early 2000s, a large number of models and frameworks have been

introduced to address this application, with emphasis on various aspects such as

opinion-related entity extraction, review mining, topic mining, sentiment summarization,

and recommendation, applied to significantly diverse text sources including

product reviews, news articles, social media (blogs, Twitter, forum discussions), and

so on.

J. Li

Computer Science Department, Stanford University, Stanford, 94305, CA, USA

e-mail: jiweil@stanford.edu

E. Hovy

Language Technology Institute, Carnegie Mellon University, Pittsburgh, 15213, PA, USA

e-mail: hovy@cmu.edu

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_3

However, despite this activity, disappointingly little has been published about

what exactly a sentiment or opinion actually is. It is generally simply assumed that

two (or perhaps three) polar values (positive, negative, neutral) are enough, and that

they are clear, and that anyone would agree on how to assign such labels to arbitrary

texts. Further, existing methods, despite employing increasingly sophisticated (and

of course more powerful) models (e.g., neural nets), still essentially boil down to

considering individual or local combinations of words and matching them against

predeﬁned lists of words with ﬁxed sentiment values, and thus hardly transcend

what was described in the early work by Pang et al. (2002).

There is nothing against simple methods when they work, but they do not

always work, and without some discussion of why not, and where to go next, this

application remains rather technically uninteresting. The goal of this paper is to

identify gaps in the current sentiment analysis literature and to outline practical

computational ways to address these issues.

Goals, Expectations and Sentiments. We begin with the fundamental question

“What makes people hold positive attitudes towards some entities and negative

attitudes toward others?”. The answer to this question is a psychological state that

relates to the opinion holder’s satisfaction and dissatisfaction with some aspect of

the topic in question. One of only two principal factors determines the answer: either

(1) the holder’s deep emotionally-driven, non-logical native preferences, or (2)

whether (and how well) one of the holder’s goals is fulﬁlled, and how (in what

ways) the goal is fulﬁlled.

Examples of the former are reﬂected in sentences like “I just like red” or “seeing

that makes me happy”. They are typiﬁed by adverbs like “just” and “simply” that

suggest that no further conscious psychological reﬂection or motivation obtains. Of

this class of factor we can say nothing computationally, and do not address it in the

rest of this chapter.

Fortunately, a large proportion of the attitudes people write about reﬂect the other

factor, which one can summarize as goal-driven utility. This relates primarily to

Consequentialism: both to Utilitarianism, in which pleasure, economic well-being

and the lack of suffering are considered desirable, but also to the general case that

morally justiﬁable actions (and the objects that enable them) are desirable. That

is, the ultimate basis for any judgment about the rightness or wrongness of one’s

actions, and hence of the objects that support/enable them, is a consideration of

their outcome, or consequence.

In everyday life, people establish and maintain goals or expectations, whether long-term

or short-term, urgent or non-urgent. Achieving these goals fills one

with satisfaction; failing to achieve them brings dissatisfaction. A man who walks into a

restaurant with the goal of getting full cannot be satisfied if all the food is sold out (the main goal

not being achieved). A voter would not be satisfied if his candidate or party fails to

win an election, since the longer-term consequences would generally work against

his own preferences. The generation of sentiment-related texts is guided by such

sorts of mental satisfaction and dissatisfaction induced by goals being achieved or

needs being fulﬁlled.

We next provide some examples to illustrate why identifying these aspects is

essential and fundamental for adequate sentiment/opinion analysis. Following the

most popular motivation for computational sentiment analysis, suppose we wish to

analyze customers’ opinions towards a product or an offering. It is not sufﬁcient to

simply determine that someone likes or dislikes something; to make that knowledge

useful and actionable, one also wants to know why that is the case. Especially when

one would like to change the opinion, it is important to determine what it is about

the topic that needs to be changed.

Case (1)

• Question: Why did the customer like detergent X?

• Customer’s review: The detergent removes stubborn stains.

No general sentiment indicator is found in the above review. But the review directly

provides the reason, and assuming his/her goal of clean clothing is achieved, it is

evident that the opinion holder holds a positive opinion towards the detergent.

Case (2)

• Question: Why did the traveller dislike flight Y?

• Customer’s review: The food was good. The crew was helpful and took care of

everything. The service was efﬁcient. However the ﬂight was supposed to take

1.5 h but was 3 h late, and I missed my next connecting ﬂight.

The major goal of taking a ﬂight is to get to your destination, which is more impor-

tant than goals like enjoying one’s food and receiving pampering service. While

multiple simultaneous goals induce competing opinion decisions, the presence of

an importance ranking among them determines the overall sentiment.

Case (3)

• Question: Why did the customer visit restaurant Z?

• Review 1: The food is bad.

• Review 2: The waiter was kind but the food was bad.

• Review 3: The food was good but the waiter was rude.

Although the primary goal of being sated may be achieved, secondary goals such

as enjoying the food and receiving respectful service can be violated in various

combinations. Often, these goals pertain to the method by which the primary goal

was achieved; in other words, to the question “how?” rather than “why?”.

A sentiment determination algorithm that can provide more than just a simple

opinion label thus has to pay attention both to the primary reason behind the holder’s

involvement with the topic (“why?”) and to the secondary reasons (both “why?” and

“how?”), and has to be able to determine their relative importance and relationship

to the primary goal.


Goals and Expectations are Personal. As different people (opinion holders)

are from different backgrounds, have different personalities, and are in different

situations, they have different goals, needs, and expectations of life. This

diversity generally leads to completely diverse opinions towards the same entity,

the same action, and the same situation: a billionaire wouldn’t be the least bit

concerned with the price in a bread shop but would consider the quality, while a

beggar might care only about the price. This rather banal observation is explained

best by Maslow’s famous hierarchy of needs (Maslow 1943), in which the beggar’s

attention focuses on Maslow’s Physiological needs while the billionaire’s focuses

on Self-Actualization; more on this in Sect. 3.3.1.

Life Requires Trade-offs. Most situations in real life address many personal

needs simultaneously. People thus face trade-offs between their goals, which

entails sacriﬁcing the achievement of one goal for the satisfaction of another.

Given the variability among people, the rankings and decision procedures will also

vary from individual to individual. However, Maslow’s hierarchy describes the general

behavioral trends of people in most societies and situations.

Complex Sentiment Expressions. As far as we see, current opinion analysis

frameworks mostly fail to address the kinds of issues mentioned above, and thereby

impair a deeper understanding about opinion or sentiment. As a result, they ﬁnd it

impossible to provide even rudimentary approaches to cases such as the following

(from Hovy 2015):

1. Peter thinks the pants are great and I cannot agree more.

2. Peter thinks the pants are great but I don’t agree.

3. Sometimes I like it but sometimes I hate it.

4. He was half excited, half terriﬁed.

5. The movie is indeed wonderful, but for some reason, I just don’t like it.

6. Why I won’t buy this game even though I like it.

In this paper, we explore the feasibility of addressing these issues in a practical way

using machine learning techniques currently available.

3.2 A Review of Current Sentiment Analysis

Here we give a brief overview of tasks in current sentiment analysis literature. More

details can be found in Liu (2010, 2012).

The key points involved at the algorithm level in the sentiment analysis literature

follow the basic approaches of statistical machine learning, in which a gold-standard

labeling of training data is obtained through manual annotation or other data

harvesting approaches (e.g., semi-supervised or weakly supervised), and this is then

used to train a variety of association-learning techniques, which are then tested on

new material. Usually, some text unit has to be identiﬁed and then associated with

a sentiment label (e.g., positive, neutral, negative). Based on the annotated dataset,

the techniques learn that vocabulary items like “bad”, “awful”, and “disgusting” are


negative sentiment indicators while “good”, “fantastic” and “awesome” are positive

ones. The main complexity lies in learning which words carry some opinion and,

especially, what to decide in cases where different words with opposite labels appear

in the same clause.

Basic sentiment analysis identiﬁes the simple polarity of a text unit (e.g., a

token, a phrase, a sentence, or a document) and is framed as a binary or multi-

class classification task; see for example Pang et al.’s work (2002), which uses a

unigram/bigram feature-based SVM classiﬁer. Over the past 15 years, techniques

have evolved from simple rule-based word matching to more sophisticated feature

and signal (e.g., local word composition, facets of topics, opinion holder) identiﬁca-

tion and combination, from the level of single tokens to entire documents, and from

‘ﬂat’ word strings without any syntactic structure at all to incorporation of complex

linguistic structures (e.g., discourse or mixed-affect sentences); see (Pang and Lee

2004; Hu and Liu 2004; Wiebe et al. 2005; Nakagawa et al. 2010; Maas et al. 2011;

Tang et al. 2014a,b; Qiu et al. 2011; Wang and Manning 2012; Yang and Cardie

2014a; Snyder and Barzilay 2007). Recent progress in neural models provides new

techniques for local composition of both opinion and structure (e.g., subordination,

conjunction) using distributed representations of text units (e.g., Socher et al. 2013;

Irsoy and Cardie 2014a,b; Tang 2015; Tang et al. 2014c).

A supporting line of research extends the basic sentiment classiﬁcation to

include related aspects and facets, such as identifying opinion holders, the topics

of opinions, topics not explicitly mentioned in the text, etc.; see (Choi et al. 2006;

Kim and Hovy 2004, 2006; Li and Hovy 2014; Jin et al. 2009; Breck et al.

2007; Johansson and Moschitti 2010; Yang and Cardie 2012, 2013, 2014b). These

approaches usually employ sequence labeling models (e.g., CRF (Lafferty et al.

2001), HMM (Liu et al. 2004)) to identify whether the current token corresponds to

a speciﬁc sentiment-related aspect or facet.

An important part of such supportive work is the identiﬁcation of the relevant

aspects or facets of the topic (e.g., the ambience of a restaurant vs. its food or staff

or cleanliness) and the corresponding sentiment; see (Brody and Elhadad 2010; Lu

et al. 2011; Titov and McDonald 2008; Jo and Oh 2011; Xueke et al. 2013; Kim et al.

2013; García-Moya et al. 2013; Wang et al. 2011; Moghaddam and Ester 2012).

Online reviews (about products or offerings) on crowdsourcing and traditional sites

(e.g., Yelp, Amazon, Consumer Reports) include some sort of aspect-oriented star

rating system, where more stars indicate a higher level of satisfaction. Consumers rely

on these user-generated online reviews when making purchase decisions. To support

this use, researchers have developed aspect identification or target extraction approaches as

one subfield of sentiment analysis. These approaches first identify aspects/facets

of the principal topic and then discover the authors’ corresponding opinions for each

one; e.g., (Brody and Elhadad 2010; Titov and McDonald 2008). Aspects are usually

identified either manually or automatically using word clustering models (e.g., LDA

(Blei et al. 2003) or pLSA). However, real life is usually a lot more complex and

much harder to break into a series of facets (e.g., quality of living, marriage, career).

Other related work includes opinion summarization, aiming to summarize the key

sentiment points of long texts (e.g., Hu and Liu 2004; Liu et al. 2005; Zhuang

et al. 2006; Ku et al. 2006), opinion spam detection aiming at identifying fictitious


reviews generated to deceive readers (e.g., Ott et al. 2011; Li et al. 2013, 2014; Jindal

and Liu 2008; Lim et al. 2010), sentiment text generation (e.g., Mohammad 2011;

Blair-Goldensohn et al. 2008), and large-scale sentiment/mood analysis on social

media for trend detection (e.g., O’Connor et al. 2010; Bollen et al. 2011; Conover

et al. 2011; Paul and Dredze 2011).

3.3 The Needs and Goals Behind Sentiments

As outlined in Sect. 3.1, this chapter argues that an adequate and complete account

of utilitarian-based sentiment is possible only with reference to the goals of the

opinion holder. In this section we discuss a classic model of human needs and

associated goals and then outline a method for determining such goals from text.

3.3.1 Maslow’s Hierarchy of Needs

Abraham Maslow (Maslow 1943, 1967, 1971; Maslow et al. 1970) developed a

theory of the basic human needs as being organized in a hierarchy of importance,

visualized using a pyramid (shown in Fig. 3.1), where needs at the bottom are the

most pressing, basic, and fundamental to human life (that is, the human will tend to

choose to satisfy them ﬁrst before progressing to needs higher up).

According to Maslow’s theory, the two most basic levels of human needs are¹:

• Physiological needs: breathing, food, water, sleep, sex, excretion, etc.

• Safety needs: security of body, employment, property, health, etc.

Fig. 3.1 Maslow’s hierarchy of needs. The pyramid levels, from bottom to top: Physiology (breathing, food, water, sleep, excretion, sex); Safety (security of self (body, resources, property, employment, health) and family); Love & belonging (family, friendship, (sexual) intimacy); Esteem (self-esteem, confidence, respect of and by others); Self-actualization (creativity, spontaneity, lack of prejudice, acceptance of facts, morality)

¹ References from

https://en.wikipedia.org/wiki/Abraham_Maslow;

https://en.wikipedia.org/wiki/Maslow’s_hierarchy_of_needs;

http://www.edpsycinteractive.org/topics/conation/maslow.html


which are essential for the physical survival of a person. Once these needs are

satisﬁed, people tend to accomplish more and move to higher levels:

• Love and Belonging: psychological needs like friendship, family, sexual inti-

macy.

• Esteem: the need to be competent and recognized such as through status and level

of success like achievement, respect by others, etc.

These four types of needs are also referred to as DEFICIT NEEDS (or D-NEEDS),

meaning that for any human, if he or she doesn’t have enough of any of them, he

or she will experience the desire to obtain them. Less pressing than the D-needs are

the so-called GROWTH NEEDS, including Cognitive, Aesthetic (need for harmony,

order and beauty), and Self-actualization (described by Maslow as “the desire to

accomplish everything that one can, to become the most that one can be”). Growth

needs are more generalized, obscure, and computationally challenging. We focus in

this chapter on deﬁcit needs. For further reading, refer to Maslow’s original papers

(1943, 1967) or the relevant Wikipedia pages.

We note that real life offers many situations in which an action does not easily

align with a need listed in the hierarchy (for example, the goal of British troops to

arrest an Irish Republican Army leader or of US troops to attack Iraq). Additionally,

a single action (e.g., going to college, looking for a job) can simultaneously address

multiple needs. Putting aside such complex situations in this chapter, we focus on

more tractable situations to illustrate the key points.²

3.3.2 Finding Appropriate Goals for Actions and Entities

Typically, each deﬁcit need gives rise to one or more goals that impel the agent (the

opinion holder) to appropriate action. Following standard AI and Cognitive Science

practice, we assume that the agent instantiates one or more plans to achieve his

or her goals, where a plan is a sequence of actions intended to alter the state of

the world from some situation (typically, the agent’s initial state) to a situation in

which the goal has been achieved and the need satisﬁed. In each plan, its actions,

their preconditions, and the entities used in performing them (the plan’s so-called

props) constitute the material upon which sentiment analysis operates. For example,

the goal to sate one’s hunger may be achieved by plans such as visit-restaurant,

cook-and-eat-meal-at-home, buy-or-steal-ready-made-food, cadge-meal-invitation,

etc. In all these plans, food is one of the props. For the restaurant and buying-food

plans, an affordable price is an important precondition.

² However, putting them aside doesn’t mean that we don’t need to explore and explain

these complex situations. On the contrary, these situations are essential and fundamental to the

understanding of opinion and sentiment, but require deeper and more systematic exploration in

psychology, cognitive science, and AI.


A sentiment detection system that seeks to understand why the holder holds a

speciﬁc opinion valence has to determine the speciﬁc actions, preconditions, and

props that are relevant to the holder’s goal, and to what degree they sufﬁce. In

principle, a complete account requires the system to infer from the given text:

1. what need is active,

2. which goal(s) have been activated to address the need,

3. which plan(s) is/are being followed to achieve the goal(s),

4. which actions, preconditions, and props appear in these plan(s),

5. which of these is/are being talked about in the text,

6. how well it/they actually have furthered the agent’s plan(s),

from which the sentiment valence can be automatically deduced. When the valence

is given in the text, one can work ‘backwards’ to infer step 6, and possibly even

earlier steps.

Determining all this is a tall order for computational systems. Fortunately, it

is possible to circumvent much of this reasoning in practice. For most common

situations, a relatively small set of goals and plans obtains, and the relevant actions,

preconditions, and props are usually quite standard. (In fact, they are precisely

what are typically called ‘facets’ in the sentiment analysis literature, for which, as

described in Sect. 3.2, various techniques have been investigated, albeit without a

clear understanding of the reason these facets are important.)

Given this, the principal unaddressed computational problem today is the deter-

mination from the text of the original need or goal being experienced by the holder,

since that is what ties together all the other (and currently investigated) aspects.

How can one, for a given topic, determine the goals an agent would typically have

for it, suggest likely plans, and potentially pinpoint speciﬁc actions, preconditions,

and props?

One approach is to perform automated goal and plan harvesting, using typical

text mining / pattern-matching approaches from Information Extraction. This is a

relatively mature application of NLP (Hearst 1992; Riloff and Shepherd 1997; Riloff

and Jones 1999; Snow et al. 2004; Davidov and Rappoport 2006; Etzioni et al.

2005; Banko 2009; Mitchell et al. 2009; Ritter et al. 2009; Kozareva and Hovy

2013), and the harvesting power and behavior of various styles of patterns have been

investigated for over two decades. (In practice, the Double-Anchored Pattern (DAP)

method (Kozareva and Hovy 2013) works better than most others.) Stated simply,

one creates or automatically induces text patterns anchored on the topic (e.g., a

camera) such as

“I want a camera because *”

“If I had a camera I could *”

“the main reason to get a camera is *”

“wanted to *, so he bought a camera” etc.

and then extracts from large amounts of text the matched VPs and NPs as being

relevant to the topic. Appropriately rephrased and categorized, the

information harvested by these patterns would provide typical goals (reasons) for

buying and using cameras.
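The pattern-based harvesting described above can be illustrated with a minimal sketch. It is not the Double-Anchored Pattern method itself. It only shows the general idea of topic-anchored patterns, using plain regular expressions in place of a parser that would extract proper VPs and NPs. The function names `build_patterns` and `harvest_goals`, the exact regular expressions, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def build_patterns(topic):
    """Compile goal-harvesting patterns anchored on `topic` (e.g. 'camera').

    The four templates mirror the example patterns in the text; the regex
    details (article handling, stop at punctuation) are illustrative choices.
    """
    t = re.escape(topic)
    templates = [
        rf"\bI want an? {t} because (?P<goal>[^.!?]+)",
        rf"\bif I had an? {t} I could (?P<goal>[^.!?]+)",
        rf"\bthe main reason to get an? {t} is (?P<goal>[^.!?]+)",
        rf"\bwanted to (?P<goal>[^,.!?]+), so (?:he|she|they|I) bought an? {t}\b",
    ]
    return [re.compile(p, re.IGNORECASE) for p in templates]


def harvest_goals(topic, sentences):
    """Count the text spans that fill the wildcard slot of any pattern."""
    patterns = build_patterns(topic)
    goals = Counter()
    for sentence in sentences:
        for pattern in patterns:
            for match in pattern.finditer(sentence):
                goals[match.group("goal").strip().lower()] += 1
    return goals


# Hypothetical sample sentences standing in for a large text collection.
corpus = [
    "I want a camera because I love taking pictures.",
    "If I had a camera I could record the whole trip!",
    "He wanted to film birds, so he bought a camera.",
    "The camera was on sale.",
]
for goal, count in harvest_goals("camera", corpus).items():
    print(count, goal)
```

In a realistic setting the harvested strings would still need the rephrasing and categorization step mentioned above before they can serve as goals.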


3.4 Toward a Practical Computational Approach

We are now ready to describe the overall approach necessary for a more complete

sentiment analysis system. For illustrative purposes we focus on simple binary

(positive/negative) valence identiﬁcation. However, the framework applies to ﬁner

granularity (e.g., multi-class classiﬁcation, regression) with minor adjustments. We

ﬁrst provide an overall algorithm sketch, provide a series of examples, and then

suggest models for determining the still unexplored aspects required for deeper

sentiment analysis.

First, we assume that standard techniques are employed to ﬁnd the following

from some given text:

1. Opinion Holder: Individual or organization holding the opinion.

2. Entity/Aspect/Theme/Facet: topic or aspect about which the opinion is held.

3. Sentiment Indicator: Sentiment-related text (tokens, phrases, sentences, etc.) that

indicates the polarity of the holder.

4. Valence: like, neutral, or dislike.

These have been deﬁned (or at least used with implicit deﬁnition) throughout the

sentiment literature, and are deﬁned for example in Hovy (2015). Of these, item 1

is usually achieved by simple matching. Item 2 can be partially addressed by recent

topic/facet mining models, and item 3 can be addressed by existing sentiment related

algorithms at the word-, sentence-, or text-level. Item 4 at its simplest is a matter of

keyword matching, but the composition within a sentence of contrasting valences has

generated some interesting research. Annotated corpora (or other semi-supervised

data harvesting techniques) might be needed for goal and need identiﬁcation, as

discussed above.

Given this, the following sketch algorithm implements deeper sentiment

analysis:

1. In the text, identify the key goal underlying the Theme.

2. Is there no apparent goal?

• If yes, the opinion is probably non-utilitarian, so ﬁnd and return a valence if

any, but return no reason for it.

• If no, go to step 3.

3. Determine whether the goal is satisﬁed:

• If yes, go to step 4.

• If no, return a negative valence.

4. Identify the subgoals involved in achieving the major goal.

5. Identify how well the subgoals are satisﬁed.

6. Determine the ﬁnal utilitarian sentiment based on the trade-off between different

subgoals, and return it together with the trade-off analysis as the reasoning.


This procedure requires the determination of the Goals or Subgoals and the

Condition/Situation under which the opinion holder holds that opinion. The former

is discussed above; the latter can usually be determined from the context of the

given text.
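The control flow of the six-step sketch algorithm can be written down as follows. This is only a sketch under strong assumptions: the goal, its satisfaction, and the subgoals are taken as already extracted, the class and field names (`ReviewAnalysis`, `Subgoal`, `weight`) are hypothetical, and the trade-off in step 6 is reduced to a weighted balance in which a tie counts as positive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Subgoal:
    name: str
    satisfied: bool
    weight: float = 1.0  # hypothetical importance of this subgoal


@dataclass
class ReviewAnalysis:
    goal: Optional[str]                    # step 1: key goal, None if none is apparent
    goal_satisfied: bool = False           # step 3
    subgoals: List[Subgoal] = field(default_factory=list)  # steps 4-5
    stated_valence: Optional[str] = None   # valence found in the text, if any


def deeper_sentiment(analysis: ReviewAnalysis) -> Tuple[Optional[str], Optional[str]]:
    """Return (valence, reasoning) following the six-step sketch."""
    # Step 2: no apparent goal, so the opinion is probably non-utilitarian;
    # return whatever valence was found, without a reason.
    if analysis.goal is None:
        return analysis.stated_valence, None
    # Step 3: an unmet major goal yields a negative valence.
    if not analysis.goal_satisfied:
        return "negative", f"major goal '{analysis.goal}' was not achieved"
    # Steps 4-6: trade off satisfied against violated subgoals.
    met = [s for s in analysis.subgoals if s.satisfied]
    unmet = [s for s in analysis.subgoals if not s.satisfied]
    balance = sum(s.weight for s in met) - sum(s.weight for s in unmet)
    valence = "positive" if balance >= 0 else "negative"  # tie-break is an assumption
    reasoning = (
        f"major goal '{analysis.goal}' achieved; "
        f"met: {[s.name for s in met]}; unmet: {[s.name for s in unmet]}"
    )
    return valence, reasoning


# A review in which the party left the restaurant without eating.
left_hungry = ReviewAnalysis(goal="sate hunger", goal_satisfied=False)
print(deeper_sentiment(left_hungry))
```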

3.4.1 Examples and Illustration

As a running example we use simple restaurant reviews, with sentences in italics

indicating original text from the reviews³:

Case 1

1. My friends and I went to restaurant X.

2. So many people were waiting there and we left without eating.

Following the algorithm sketch, the question “was the major goal of going to a

restaurant fulﬁlled?” is answered no. The reviewer is predicted to hold a negative

sentiment. Similar reasoning applies to Case 2 in Sect. 3.1.

Case 2

1. My friends and I went to restaurant X.

2. The waiter was friendly and knowledgeable.

3. We ordered curry chicken, potato chips and Italian sausage. The Italian sausage

was delicious.

4. Overall the food was appetizing,

5. but I just didn’t enjoy the experience.

To the question “was the major goal of being full fulfilled?” the answer is yes, as

the food was ordered and eaten. Next, the algorithm addresses the how (manner

of achievement) question described in steps 4–6, which involves the functional

elements of goals/needs embedded in each sentence:

1. My friends and I went to restaurant X.

Opinion Holder: I

Entity/Aspect/Theme: restaurant X

Need: sate hunger

Goal: visit restaurant

Sentiment Indicator: none

Valence: neutral

Condition: in restaurant X

2. The waiter was friendly and knowledgeable.

Opinion Holder: I

³ These reviews were originally Yelp reviews and were revised by the authors for illustration

purposes.


Entity/Aspect/Theme: waiter

Need: gather respect/friendship

Subgoal: order food

Sentiment Indicator: friendly, knowledgeable

Valence: positive

Condition: in restaurant X

3. We ordered curry chicken, potato chips and Italian sausage. The Italian sausage was

delicious.

Opinion Holder: I

Entity/Aspect/Theme: Italian sausage

Need: sate hunger

Subgoal: eat food

Sentiment Indicator: delicious

Valence: positive

Condition: in restaurant X

4. Overall the food was appetizing,

Opinion Holder: I

Entity/Aspect/Theme: food

Need: sate hunger

Subgoal: eat enough to remove hunger

Sentiment Indicator: appetizing

Valence: positive

Condition: in restaurant X

5. but I just didn’t enjoy the experience.

Opinion Holder: I

Entity/Aspect/Theme: restaurant visit experience

Need: none — this is not utilitarian

Goal: none

Sentiment Indicator: didn’t enjoy

Valence: negative

Condition: in restaurant X

The analysis of the needs/goals and their respective positive and negative valences

allows one to justify the various sentiment statements, and (in the case of the final

negative decision) also indicate that it is not based on utilitarian considerations.
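The per-sentence analysis of Case 2 can be held in a small record structure, which makes the final observation easy to check mechanically. The sketch below is illustrative only: the `Element` record, the `justify` function, and the rule that the last statement carries the overall verdict are assumptions, and the field values are copied from the analysis above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    theme: str
    need: Optional[str]       # None marks a non-utilitarian statement
    indicator: Optional[str]  # sentiment indicator, None if absent
    valence: str              # "positive", "neutral", or "negative"


CASE_2 = [
    Element("restaurant X", "sate hunger", None, "neutral"),
    Element("waiter", "gather respect/friendship", "friendly, knowledgeable", "positive"),
    Element("Italian sausage", "sate hunger", "delicious", "positive"),
    Element("food", "sate hunger", "appetizing", "positive"),
    Element("restaurant visit experience", None, "didn't enjoy", "negative"),
]


def justify(elements: List[Element]) -> Tuple[str, bool, List[str]]:
    """Return (final valence, is it utilitarian?, themes that support it).

    Assumption: the last statement carries the reviewer's overall verdict.
    """
    final = elements[-1]
    utilitarian = final.need is not None
    supporting = [
        e.theme
        for e in elements[:-1]
        if e.need is not None and e.valence == final.valence
    ]
    return final.valence, utilitarian, supporting


print(justify(CASE_2))
```

An empty list of supporting themes together with a non-utilitarian final statement corresponds to the situation described above: the negative verdict is not explained by any need or goal stated in the review.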

3.4.2 A Computational Model of Each Part

Current computational models can be used to address each of the aspects involved

in the sketch algorithm. We provide only a high-level description of each.

Deciding Functional Elements. Case 2 above involves three of the needs

described in Maslow’s hierarchy: food, respect/friendship, and emotion. The ﬁrst

two are stated to have been achieved. The third is a pure emotion, expressed without


a reason, why the holder “just didn’t enjoy the experience”. Pure emotions usually

have no overt utilitarian value but only relate to the holder’s high-level goal of

being happy. In this example, we have to conclude that since all overt goals were

met, either some unstated utilitarian Maslow-type need was not met, or the holder’s

opinion stems from a deeper psychological/emotional bias, of the kind mentioned

in Sect. 3.1, that goes beyond utilitarian value.

Whether the Major Goal is Achieved. To make a decision about goal achieve-

ment, one must: (1) identify the goal/subgoal of an action (e.g., buying the detergent,

going to a restaurant); (2) identify whether that goal/subgoal is achieved. The two

steps can be computed either separately or jointly using current machine learning

models and techniques, including:

• Joint Model: Annotate corpora for the satisfaction (or not) of all goals and subgoals

together, and train a single machine learning algorithm.

• Separate Model:

1. Determine the goal and its plans and subgoals either through annotation or as

described in Sect. 3.3.2.

2. Associate the actions or entities of the Theme (e.g., going to a restaurant;

buying a car) with their respective (sub)goals.

3. Align each subgoal with indicator sentence(s) in the document (e.g., “I got a

small portion”; “the car was all it was supposed to be”).

4. Decide whether the subgoal is satisﬁed based on indicator sentence(s).
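Steps 3 and 4 of the separate model can be sketched with very simple stand-ins for the learned components. Everything specific here is an assumption made for illustration: the subgoal names, the cue-word sets used for alignment, and the tiny positive/negative word lists used to decide satisfaction. A real system would replace both with trained models.

```python
import re

# Hypothetical subgoals of a restaurant visit with hand-picked cue words.
SUBGOAL_CUES = {
    "get enough food": {"portion", "portions", "full", "hungry"},
    "enjoy the food": {"food", "taste", "dish", "meal"},
    "receive respectful service": {"waiter", "service", "staff"},
}
# Tiny illustrative polarity lists, not a real sentiment lexicon.
POSITIVE = {"good", "great", "friendly", "delicious", "generous"}
NEGATIVE = {"bad", "small", "rude", "cold", "slow"}


def tokens(sentence):
    """Lower-case word set of a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def align(sentences, cues=SUBGOAL_CUES):
    """Step 3: attach each sentence to every subgoal whose cue words it mentions."""
    aligned = {subgoal: [] for subgoal in cues}
    for sentence in sentences:
        words = tokens(sentence)
        for subgoal, cue_words in cues.items():
            if words & cue_words:
                aligned[subgoal].append(sentence)
    return aligned


def is_satisfied(indicator_sentences):
    """Step 4: True/False from polarity word counts, None if undecided."""
    score = 0
    for sentence in indicator_sentences:
        words = tokens(sentence)
        score += len(words & POSITIVE) - len(words & NEGATIVE)
    if score == 0:
        return None
    return score > 0


review = ["I got a small portion.", "The waiter was friendly.", "We sat near the window."]
for subgoal, sentences in align(review).items():
    print(subgoal, is_satisfied(sentences))
```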

Learning Weights for Different Goals/Needs. One can clearly infer that the

customer in case 2 assigns more weight to the emotional aspect, that being his or

her ﬁnal conclusion, and less to the food or respect/friendship (which comes last in

this scenario). More formally, for a given text $D$, we discover $L$ needs/(sub)goals,

with indices $1, 2, \ldots, L$. Each type of need/(sub)goal $i \in [1, L]$ is associated with

a weight $v_i$ that contributes to the final sentiment valence decision. In document

$D$, each type of need $i$ is associated with achievement value $a_i$ that indicates how

the need or goal is satisfied. The sentiment score $S_D$ for given document $D$ is then

given by:

$$S_D = \sum_{i \in [1, L]} v_i a_i$$

This simple approach is comparable to a regression model that assigns weights

to relevant aspects, where gold standard examples can be the overall ratings of

the labeled restaurant reviews. One can view such a weight decision procedure

as a supervised regression model by assigning a weight value to each discovered

need. Such a procedure is similar to latent aspect rating as introduced in Wang

et al. (2011) and Zhao et al. (2010), which learns aspect weights (e.g., value, room,

location, or service) for hotel review ratings. A simple illustrative example might be

collaborative filtering in recommendation systems, e.g., Breese et al. (1998) and Sarwar


et al. (2001), optimizing need weight regarding each respective individual (which

could be sampled from a uniform prior for humans’ generally accepted weights).
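As a concrete illustration of the weighted sum and of learning the weights by regression, the following minimal sketch computes the score from weights v_i and achievement values a_i and fits the weights to overall ratings by plain gradient descent on squared error. The toy achievement vectors, the "true" weights used to generate ratings, and the learning-rate and epoch settings are invented for the example. This is ordinary least-squares fitting, not the latent aspect rating model of the cited work.

```python
def sentiment_score(weights, achievements):
    """Weighted sum of achievement values: sum over i of v_i * a_i."""
    if len(weights) != len(achievements):
        raise ValueError("weights and achievements must have the same length")
    return sum(v * a for v, a in zip(weights, achievements))


def fit_weights(documents, ratings, learning_rate=0.05, epochs=2000):
    """Fit one weight per need by gradient descent on mean squared error.

    `documents` is a list of achievement vectors (one value per need),
    `ratings` the gold-standard overall scores for the same documents.
    """
    n_needs = len(documents[0])
    weights = [0.0] * n_needs
    for _ in range(epochs):
        gradients = [0.0] * n_needs
        for achievements, rating in zip(documents, ratings):
            error = sentiment_score(weights, achievements) - rating
            for i in range(n_needs):
                gradients[i] += 2 * error * achievements[i] / len(documents)
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights


# Toy data: needs are (food, respect/friendship, emotion); values in {-1, 0, 1}.
hidden_weights = [0.5, 0.2, 1.0]  # invented, used only to generate ratings
docs = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [-1, 1, 1], [1, 0, -1], [0, 1, 1]]
gold = [sentiment_score(hidden_weights, d) for d in docs]

learned = fit_weights(docs, gold)
print([round(w, 3) for w in learned])
```

With food and service achieved but the emotional need violated, the achievement vector [1, 1, -1] receives a negative score under these weights, which matches the reading of Case 2 given above.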

Since individual expectations can differ, it would be advantageous to maintain

opinion holder profiles (for example, both Yelp and Amazon keep individual profiles

for each customer) that record one’s long-term activity. This would support individ-

ual analysis of background, personality, or social identity, and enable learning of

speciﬁc goal weights for different individuals.

When these issues have been addressed, one can start asking deeper questions

like:

• Q: Why does John like his current job though his salary is low?

A: He weighs employment more highly than family.

• Q: How wealthy is a particular opinion holder?

A: He might be rich as he places little concern (weight) on money.

or make user-oriented recommendations like:

• Q: Should the system recommend an expensive-but-luxurious hotel or a

cheap-but-poor hotel?

3.4.3 Prior/Default Knowledge About Opinion Holders

Sentiment/opinion analysis can be considerably assisted by the existence of a

knowledge base that provides information about the typical preferences of the

holder.

Individuals’ goals vary across backgrounds, ages, nationalities, genders, etc. An

engineer would have different life goals from a businessman or a doctor; a citizen

living in South America would have a different weighting system from one in

Europe or the United States; and people in wartime would have different life expectations

from those in peacetime. Two general methods exist today for practically

collecting such standardized knowledge to construct a relevant knowledge base:

(1) Rule-based Approaches. Hierarchies of personality proﬁles have been pro-

posed, and changes to them have long been explored in the social and

developmental psychology literature, usually based on polls or surveys. For

example, Goebel and Brown (1981) found that children have higher physical needs than other

age groups; love needs emerge in the transitional period from childhood to

adulthood; esteem needs are the highest among adolescents; the highest self-

actualization levels are found with adults; and the highest levels of security

are found at older ages. As another example, researchers (Tang and Ibrahim

1998; Tang et al. 2002; Tang and West 1997) have found that survival (i.e.,

physiological and safety) needs dominate during wartime while psychological

needs (i.e., love, self-esteem, and self-actualization) surface during peacetime,

which is in line with our expectations. For computational implementation,


however, these sorts of studies provide very limited evidence, since only a few

aspects are typically explored.

(2) Computational Inference Approaches. Despite the lack of information about

individuals, reasonable preferences can be inferred from other resources such

as online social media. A vast section of the Social Network Analysis research

focuses on this problem, as well as much of the research of the large web

search engine companies. Networking websites like Facebook, LinkedIn, and

Google Plus provide rich repositories of personal information about individ-

ual attributes such as education, employment, nationality, religion, likes and

dislikes, etc. Additionally, online posts usually offer direct evidence for such

attributes. Some examples include age (Rao et al. 2010; Rao and Yarowsky

2010), gender (Ciot et al. 2013), living location (Sadilek et al. 2012), and

education (Mislove et al. 2010).

3.5 Conclusion and Discussion

The past 15 years have witnessed significant performance improvements in training

machine learning algorithms for the sentiment/opinion identiﬁcation application.

But little progress has been made toward a deeper understanding about what

opinions or sentiments are, why people hold them, and why and how their

facets are chosen and expressed. No one can deny the unprecedented

contributions of statistical learning algorithms in modern-day (post-1990s) NLP, for

this application as for others. However, ignoring cognitive and psychological

perspectives in favor of engineering alone inevitably hampers progress once the

algorithms asymptote to their optimal performance, since understanding how

to do something doesn’t necessarily lead to better insight about what needs

to be done, or how it is best represented. For example, when inter-annotator

agreement on sentiment labels peaks at 0.79 even for the rather crude 3-way

sentiment granularity of positive/neutral/negative (Ogneva 2010), is that the the-

oretical best that could be achieved? How could one ever know, without under-

standing what other aspects of sentiment/opinion are pertinent and investigating

whether they could constrain the annotation task and help boost annotation agree-

ment?

In this paper, we described possible directions for deeper understanding, help-

ing bridge the gap between psychology / cognitive science and computational

approaches. We focus on the opinion holder’s underlying needs and their resultant

goals, which, in a utilitarian model of sentiment, provide the basis for explaining

the reason a sentiment valence is held. (The complementary non-utilitarian, purely

intuitive preference-based basis for some sentiment decisions is a topic requiring

altogether different treatment.) While these thoughts are still immature, scattered,

unstructured, and even imaginary, we believe that these perspectives might suggest

fruitful avenues for various kinds of future work.


References

Banko, Michelle. 2009. Ph.D. Dissertation, University of Washington.

Blair-Goldensohn, Sasha, Kerry Hannan, Ryan McDonald, Tyler Neylon, George A Reis, and Jeff

Reynar. 2008. Building a sentiment summarizer for local service reviews. In WWW Workshop

on NLP in the Information Explosion Era, vol. 14.

Blei, David M, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. The Journal

of Machine Learning Research 3: 993–1022.

Bollen, Johan, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market.

Journal of Computational Science 2(1): 1–8.

Breck, Eric, Yejin Choi, and Claire Cardie. 2007. Identifying expressions of opinion in context.

In IJCAI.

Breese, John S, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive

algorithms for collaborative ﬁltering. In Proceedings of the Fourteenth Conference on

Uncertainty in Artiﬁcial Intelligence, 43–52. Morgan Kaufmann Publishers Inc.

Brody, Samuel, and Noemie Elhadad. 2010. An unsupervised aspect-sentiment model for

online reviews. In Human Language Technologies: The 2010 Annual Conference of the North

American Chapter of the Association for Computational Linguistics, 804–812. Association for

Computational Linguistics.

Choi, Yejin, Eric Breck, and Claire Cardie. 2006. Joint extraction of entities and relations for

opinion recognition. In EMNLP.

Ciot, Morgane, Morgan Sonderegger, and Derek Ruths. 2013. Gender inference of twitter users in

non-English contexts. In EMNLP, 1136–1145.

Conover, Michael, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and

Alessandro Flammini. 2011. Political polarization on twitter. In ICWSM.

Davidov, D., and A. Rappoport. 2006. Efficient unsupervised discovery of word categories

using symmetric patterns and high frequency words. In Proceedings of the 21st International

Conference on Computational Linguistics COLING and the 44th Annual Meeting of the ACL,

297–304.

Etzioni, O., M. Cafarella, D. Downey, A.M. Popescu, T. Shaked, and S. Soderland et al. 2005.

Unsupervised named-entity extraction from the web: An experimental study. Artiﬁcial

Intelligence 165(1): 91–134.

García-Moya, Lisette, Henry Anaya-Sánchez, and Rafael Berlanga-Llavori. 2013. Retrieving

product features and opinions from customer reviews. IEEE Intelligent Systems 28(3):

0019–27.

Goebel, Barbara L, and Delores R Brown. 1981. Age differences in motivation related to Maslow’s

need hierarchy. Developmental Psychology 17(6): 809.

Hearst, Marti. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings

of the 14th Conference on Computational Linguistics, 539–545.

Hovy, Eduard H. 2015. What are sentiment, affect, and emotion? Applying the methodology

of Michael Zock to sentiment analysis. In Language production, cognition, and the Lexicon,

13–24. Cham: Springer.

Hu, Minqing, and Bing Liu. 2004. Mining opinion features in customer reviews. In AAAI, vol. 4,

755–760.

Irsoy, Ozan, and Claire Cardie. 2014a. Deep recursive neural networks for compositionality in

language. In Advances in neural information processing systems, 2096–2104. Cham: Springer.

Irsoy, Ozan, and Claire Cardie. 2014b. Opinion mining with deep recurrent neural networks. In

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing

(EMNLP), 720–728.

Jin, Wei, Hung Hay Ho, and Rohini K Srihari. 2009. A novel lexicalized HMM-based learning

framework for web opinion mining. In ICML.

Jindal, Nitin, and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008

International Conference on Web Search and Data Mining, 219–230. ACM.


Jo, Yohan, and Alice H Oh. 2011. Aspect and sentiment uniﬁcation model for online review

analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data

Mining, 815–824. ACM.

Johansson, Richard, and Alessandro Moschitti. 2010. Syntactic and semantic structure for opinion

expression detection. In Proceedings of the Fourteenth Conference on Computational Natural

Language Learning.

Kim, Soo-Min, and Eduard Hovy. 2004. Determining the sentiment of opinions. In Proceedings

of the 20th International Conference on Computational Linguistics, 1367. Association for

Computational Linguistics.

Kim, Soo-Min, and Eduard Hovy. 2006. Extracting opinions, opinion holders, and topics expressed

in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in

Text.

Kim, Suin, Jianwen Zhang, Zheng Chen, Alice H Oh, and Shixia Liu. 2013. A hierarchical aspect-

sentiment model for online reviews. In AAAI.

Kozareva, Z., and E.H Hovy. 2013. Tailoring the automated construction of large-scale taxonomies

using the web. Journal of Language Resources and Evaluation 47: 859–890.

Ku, Lun-Wei, Yu-Ting Liang, and Hsin-Hsi Chen. 2006. Opinion extraction, summarization and

tracking in news and blog corpora. In AAAI Spring Symposium: Computational Approaches to

Analyzing Weblogs, vol. 100107.

Lafferty, John, Andrew McCallum, and Fernando Pereira. 2001. Conditional random ﬁelds:

Probabilistic models for segmenting and labeling sequence data. In Proceedings of the

eighteenth international conference on machine learning, ICML, vol. 1.

Li, Jiwei, and Eduard H. Hovy. 2014. Sentiment analysis on the people’s daily. In EMNLP, 467–

476.

Li, Jiwei, Myle Ott, and Claire Cardie. 2013. Identifying manipulated offerings on review portals.

In EMNLP, 1933–1942.

Li, Jiwei, Myle Ott, Claire Cardie, and Eduard H. Hovy. 2014. Towards a general rule for

identifying deceptive opinion spam. In ACL (1), 1566–1576.

Lim, Ee-Peng, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Wirawan Lauw. 2010. Detecting

product review spammers using rating behaviors. In Proceedings of the 19th ACM International

Conference on Information and Knowledge Management, 939–948. ACM.

Liu, Bing. 2010. Sentiment analysis and subjectivity. Handbook of Natural Language Processing

2: 627–666.

Liu, Bing. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language

Technologies 5(1): 1–167.

Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: analyzing and comparing

opinions on the web. In Proceedings of the 14th International Conference on World Wide Web,

342–351. ACM.

Liu, Yun-zhong, Ya-ping Lin, and Zhi-ping Chen. 2004. Text information extraction based on

hidden Markov model [J]. Acta Simulata Systematica Sinica 3: 038.

Lu, Bin, Myle Ott, Claire Cardie, and Benjamin K Tsou. 2011. Multi-aspect sentiment analysis

with topic models. In 2011 IEEE 11th International Conference on Data Mining Workshops

(ICDMW), 81–88. IEEE.

Maas, Andrew L, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher

Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual

Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1,

142–150. Association for Computational Linguistics.

Maslow, Abraham Harold, Robert Frager, James Fadiman, Cynthia McReynolds, and Ruth Cox.

1970. Motivation and personality, vol. 2. New York: Harper & Row.

Maslow, Abraham Harold. 1943. A theory of human motivation. Psychological Review 50(4):

370.

Maslow, Abraham H. 1967. A theory of metamotivation: The biological rooting of the value-life.

Journal of Humanistic Psychology 7(2): 93–127.

Maslow, Abraham H. 1971. The farther reaches of human nature.


Mislove, Alan, Bimal Viswanath, Krishna P Gummadi, and Peter Druschel. 2010. You are who

you know: Inferring user proﬁles in online social networks. In Proceedings of the Third ACM

International Conference on Web Search and Data Mining, 251–260. ACM.

Mitchell, T.M., J. Betteridge, A. Carlson, E. Hruschka, and R. Wang. 2009. Populating the

semantic web by macro-reading internet text. In Proceedings of the 8th International Semantic

Web Conference (ISWC).

Moghaddam, Samaneh, and Martin Ester. 2012. On the design of LDA models for aspect-based

opinion mining. In Proceedings of the 21st ACM International Conference on Information and

Knowledge Management, 803–812. ACM.

Mohammad, Saif. 2011. From once upon a time to happily ever after: Tracking emotions in novels

and fairy tales. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for

Cultural Heritage, Social Sciences, and Humanities, 105–114. Association for Computational

Linguistics.

Nakagawa, Tetsuji, Kentaro Inui, and Sadao Kurohashi. 2010. Dependency tree-based sentiment

classiﬁcation using CRFs with hidden variables. In Human Language Technologies: The

2010 Annual Conference of the North American Chapter of the Association for Computational

Linguistics, 786–794. Association for Computational Linguistics.

O’Connor, Brendan, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A Smith. 2010.

From tweets to polls: Linking text sentiment to public opinion time series. ICWSM 11: 122–

129.

Ogneva, Maria. 2010. How companies can use sentiment analysis to improve their business.

Mashable. http://mashable.com/2010/04/19/sentiment-analysis/

Ott, Myle, Yejin Choi, Claire Cardie, and Jeffrey T Hancock. 2011. Finding deceptive opinion

spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the

Association for Computational Linguistics: Human Language Technologies-Volume 1, 309–

319. Association for Computational Linguistics.

Pang, Bo, and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity

summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on

Association for Computational Linguistics, 271. Association for Computational Linguistics.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classiﬁcation

using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical

Methods in Natural Language Processing-Volume 10, 79–86. Association for Computational

Linguistics.

Paul, Michael J, and Mark Dredze. 2011. You are what you tweet: Analyzing twitter for public

health. In ICWSM, 265–272.

Qiu, Guang, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target

extraction through double propagation. Computational Linguistics 37(1): 9–27.

Rao, Delip, and David Yarowsky. 2010. Detecting latent user properties in social media. In

Proceedings of the NIPS MLSN Workshop.

Rao, Delip, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. 2010. Classifying latent

user attributes in twitter. In Proceedings of the 2nd International Workshop on Search and

Mining User-Generated Contents, 37–44. ACM.

Riloff, J., and E. Shepherd. 1997. A corpus-based approach for building semantic lexicons. In

Proceedings of the Second Conference on Empirical Methods in Natural Language Processing

(EMNLP), 117–124.

Riloff, R., and E. Jones. 1999. Learning dictionaries for information extraction by multi-level

bootstrapping. In Proceedings of the Sixteenth National Conference on Artiﬁcial Intelligence

(AAAI), 474–479.

Ritter, A., S. Soderland, and O. Etzioni. 2009. What is this, anyway: Automatic hypernym

discovery. In Proceedings of the AAAI Spring Symposium on Learning by Reading and

Learning to Read.

Sadilek, Adam, Henry Kautz, and Jeffrey P Bigham. 2012. Finding your friends and following

them to where you are. In Proceedings of the Fifth ACM International Conference on Web

Search and Data Mining, 723–732. ACM.


Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative

ﬁltering recommendation algorithms. In Proceedings of the 10th International Conference on

World Wide Web, 285–295. ACM.

Snow, Rion, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic

hypernym discovery. In NIPS, vol. 17, 1297–1304.

Snyder, Benjamin, and Regina Barzilay. 2007. Multiple aspect ranking using the good grief

algorithm. In HLT-NAACL, 300–307.

Socher, Richard, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y

Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over

a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural

Language Processing (EMNLP), 1631–1642.

Tang, Thomas Li-Ping, and Abdul H Safwat Ibrahim. 1998. Importance of human needs during

retrospective peacetime and the persian gulf war: Mideastern employees. International Journal

of Stress Management 5(1): 25–37.

Tang, Thomas Li-Ping, and W. Beryl West. 1997. The importance of human needs during

peacetime, retrospective peacetime, and the persian gulf war. International Journal of Stress

Management 4(1): 47–62.

Tang, T.L.P, A.H.S Ibrahim, and W.B. West. 2002. Effects of war-related stress on the satisfaction

of human needs: The united states and the middle east. International Journal of Management

Theory and Practices 3(1): 35–53.

Tang, Duyu, Furu Wei, Bing Qin, Li Dong, Ting Liu, and Ming Zhou. 2014a. A joint segmentation

and classiﬁcation framework for sentiment analysis. In EMNLP, 477–487.

Tang, Duyu, Furu Wei, Bing Qin, Ming Zhou, and Ting Liu. 2014b. Building large-scale Twitter-

speciﬁc sentiment Lexicon: A representation learning approach. In COLING, 172–182.

Tang, Duyu, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014c. Learning sentiment-

speciﬁc word embedding for twitter sentiment classiﬁcation. In Proceedings of the 52nd Annual

Meeting of the Association for Computational Linguistics, 1555–1565.

Tang, Duyu. 2015. Sentiment-speciﬁc representation learning for document-level sentiment

analysis. In Proceedings of the Eighth ACM International Conference on Web Search and

Data Mining, 447–452. ACM.

Titov, Ivan, and Ryan T McDonald. 2008. A joint model of text and aspect ratings for sentiment

summarization. In ACL, vol. 8, 308–316. Citeseer.

Wang, Sida, and Christopher D Manning. 2012. Baselines and bigrams: Simple, good sentiment

and topic classiﬁcation. In Proceedings of the 50th Annual Meeting of the Association for

Computational Linguistics: Short Papers-Volume 2, 90–94. Association for Computational

Linguistics.

Wang, Hongning, Yue Lu, and ChengXiang Zhai. 2011. Latent aspect rating analysis without

aspect keyword supervision. In Proceedings of the 17th ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, 618–626. ACM.

Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and

emotions in language. Language Resources and Evaluation 39(2–3): 165–210.

Xueke, Xu, Cheng Xueqi, Tan Songbo, Liu Yue, and Shen Huawei. 2013. Aspect-level opinion

mining of online customer reviews. Communications, China 10(3): 25–41.

Yang, Bishan, and Claire Cardie. 2012. Extracting opinion expressions with semi-Markov

conditional random ﬁelds. In EMNLP.

Yang, Bishan, and Claire Cardie. 2013. Joint inference for ﬁne-grained opinion extraction. In ACL

(1), 1640–1649.

Yang, Bishan, and Claire Cardie. 2014a. Context-aware learning for sentence-level sentiment

analysis with posterior regularization. In Proceedings of ACL.

Yang, Bishan, and Claire Cardie. 2014b. Joint modeling of opinion expression extraction and

attribute classiﬁcation. Transactions of the Association for Computational Linguistics 2: 505–

516.


Zhao, Wayne Xin, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects

and opinions with a maxent-LDA hybrid. In Proceedings of the 2010 Conference on Empirical

Methods in Natural Language Processing, 56–65. Association for Computational Linguistics.

Zhuang, Li, Feng Jing, and Xiao-Yan Zhu. 2006. Movie review mining and summarization.

In Proceedings of the 15th ACM International Conference on Information and Knowledge

Management, 43–50. ACM.

Chapter 4

Challenges in Sentiment Analysis

Saif M. Mohammad

Abstract A vast majority of the work in Sentiment Analysis has been on developing

more accurate sentiment classifiers, usually involving supervised machine

learning algorithms and a battery of features. Surveys by Pang and Lee (Found

Trends Inf Retr 2(1–2):1–135, 2008), Liu and Zhang (A survey of opinion mining

and sentiment analysis. In: Aggarwal CC, Zhai C (eds) Mining text data.

Springer, New York, pp 415–463, 2012), and Mohammad (Sentiment

analysis: detecting valence, emotions, and other affectual states from text.

In: Meiselman H (ed) Emotion measurement. Elsevier, Amsterdam, 2016b) give

summaries of the many automatic classiﬁers, features, and datasets used to detect

sentiment. In this chapter, we ﬂesh out some of the challenges that still remain,

questions that have not been explored sufﬁciently, and new issues emerging from

taking on new sentiment analysis problems. We also discuss proposals to deal with

these challenges. The goal of this chapter is to equip researchers and practitioners

with pointers to the latest developments in sentiment analysis and encourage more

work in the diverse landscape of problems, especially those areas that are relatively

less explored.

Keywords Sentiment analysis tasks • Sentiment of the writer, reader, and other

entities • Sentiment towards aspects of an entity • Stance detection

• Sentiment lexicons • Sentiment annotation • Multilingual sentiment analysis

S.M. Mohammad

National Research Council Canada, 1200 Montreal Rd., Ottawa, ON, Canada

e-mail: Saif.Mohammad@nrc-cnrc.gc.ca

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_4

4.1 Introduction

There has been a large volume of work in sentiment analysis over the past decade

and it continues to rapidly develop in new directions. However, much of it is on

developing more accurate sentiment classifiers. In this chapter, we flesh out some of

the challenges that still remain. We start by discussing different sentiment analysis

problems and how one of the challenges is to explore new sentiment analysis

problems that go beyond simply determining whether a piece of text is positive,

negative, or neutral (Sect. 4.2). Some of the more ambitious problems that need

more work include detecting sentiment at various levels of text granularities (terms,

sentences, paragraphs, etc.); detecting sentiment of the reader or sentiment of entities

mentioned in the text; detecting sentiment towards aspects of products; detecting

stance towards pre-speciﬁed targets that may not be explicitly mentioned in the text

and that may not be the targets of opinion in the text; and detecting semantic roles

of sentiment. Since many sentiment analysis systems rely on sentiment lexicons, we

discuss capabilities and limitations of existing manually and automatically created

sentiment lexicons in Sect. 4.3. In Sect. 4.4, we discuss the difﬁcult problem of

sentiment composition—how to predict the sentiment of a combination of terms.

More speciﬁcally, we discuss the determination of sentiment of phrases (that may

include negators, degree adverbs, and intensiﬁers) and sentiment of sentences and

tweets. In Sect. 4.5, we discuss challenges in annotation of data for sentiment.

We provide categories of sentences that are particularly challenging for sentiment

annotation. Section 4.6 presents challenges in multilingual sentiment analysis. This

is followed by a discussion on the challenges of applying sentiment analysis to

downstream applications, and ﬁnally, some concluding remarks (Sect. 4.7).

4.2 The Array of Sentiment Analysis Tasks

Sentiment analysis is a generic name for a large number of opinion and affect related

tasks, each of which presents its own unique challenges. The sub-sections below

provide an overview.

4.2.1 Sentiment at Different Text Granularities

Sentiment can be determined at various levels: from sentiment associations of

words and phrases; to sentiment of sentences, SMS messages, chat messages, and

tweets; to sentiment in product reviews, blog posts, and whole documents. A word–

sentiment (or valence) association lexicon may have entries such as:

delighted – positive

killed – negative

shout – negative

desk – neutral

These lexicons can be created either by manual annotation or through automatic

means. Manually created lexicons tend to be in the order of a few thousand

entries, but automatically generated lexicons can capture sentiment associations for

hundreds of thousands of unigrams (single word strings) and even for larger expressions

such as bigrams (two-word sequences) and trigrams (three-word sequences).

Entries in an automatically generated lexicon often also include a real-valued score

indicating the strength of association between the word and the valence category.

These numbers are prior estimates of the sentiment of terms in an average usage

of the term. While sentiment lexicons are often useful in sentence-level sentiment

analysis,¹ the same terms may convey different sentiments in different contexts. The

SemEval 2013 and 2014 Sentiment Analysis in Twitter shared tasks had a separate

sub-task aimed at identifying sentiment of terms in context. Automatic systems have

largely performed well in this task, obtaining F-scores close to 0.9. We discuss

manually and automatically created sentiment lexicons in more detail in Sect. 4.3.
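As a minimal sketch of how such a lexicon might be stored and queried, the Python snippet below keeps a category and a real-valued association score per term and looks up both unigrams and bigrams in a tokenized text. The entries, the scores, and the function name `lookup_terms` are hypothetical and chosen only for illustration; they are not taken from any published lexicon.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: term -> (valence category, association score).
# The scores are invented for illustration only.
LEXICON: Dict[str, Tuple[str, float]] = {
    "delighted": ("positive", 0.9),
    "killed": ("negative", 0.8),
    "shout": ("negative", 0.4),
    "desk": ("neutral", 0.0),
    "not good": ("negative", 0.6),  # a bigram entry
}


def lookup_terms(tokens: List[str], max_n: int = 2) -> List[Tuple[str, str, float]]:
    """Return (term, category, score) for every n-gram (n <= max_n) in the lexicon."""
    hits = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            term = " ".join(tokens[i:i + n]).lower()
            if term in LEXICON:
                category, score = LEXICON[term]
                hits.append((term, category, score))
    return hits


if __name__ == "__main__":
    print(lookup_terms("She was delighted with the desk".split()))
```

Because the scores are prior estimates for average usage, a lookup of this kind says nothing about how a term behaves in a particular context.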

Sentence-level valence classiﬁcation systems assign labels such as positive,

negative, or neutral to whole sentences. It should be noted that the valence of a

sentence is not simply the sum of the polarities of its constituent words. Automatic

systems learn a model from labeled training data (instances that are already marked

as positive, negative, or neutral) using a large number of features such as word

and character ngrams, valence association lexicons, negation lists, word clusters,

and even embeddings-based features. In recent years, there have been a number

of shared task competitions on valence classiﬁcation such as the 2013, 2014,

and 2015 SemEval shared tasks titled Sentiment Analysis in Twitter, the 2014

and 2015 SemEval shared tasks on Aspect Based Sentiment Analysis, the 2015

SemEval shared task Sentiment Analysis of Figurative Language in Twitter, and

the 2015 Kaggle competition Sentiment Analysis on Movie Reviews.² The

NRC-Canada system (Mohammad et al. 2013a; Kiritchenko et al. 2014b), a supervised

machine learning system, came ﬁrst in the 2013 and 2014 competitions. Other

sentiment analysis systems developed speciﬁcally for tweets include those by Pak

and Paroubek (2010), Agarwal et al. (2011), Thelwall et al. (2011), Brody and

Diakopoulos (2011), Aisopos et al. (2012), and Bakliwal et al. (2012). However,

even the best systems currently obtain an F-score of only about 0.7.
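Two of the feature types named above, word and character ngrams, can be illustrated with a short sketch. This is not the feature extractor of any of the cited systems; the feature-name prefixes (`w1=`, `w2=`, `c3=`) and the choice of unigrams, bigrams, and character trigrams are assumptions made for the example. A real system would combine such counts with lexicon, negation, and cluster features and feed them to a classifier.

```python
from collections import Counter
from typing import List


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous sequences of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """All contiguous sequences of n characters (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(text: str) -> Counter:
    """Count word unigrams/bigrams and character trigrams of a lowercased text."""
    lowered = text.lower()
    tokens = lowered.split()
    feats: Counter = Counter()
    for n in (1, 2):
        for gram in word_ngrams(tokens, n):
            feats[f"w{n}={gram}"] += 1
    for gram in char_ngrams(lowered, 3):
        feats[f"c3={gram}"] += 1
    return feats


if __name__ == "__main__":
    for name, count in sorted(extract_features("Not good").items()):
        print(name, count)
```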

Sentiment analysis involving many sentences is often broken down into the

sentiment analysis of the component sentences. However, there is interesting work

in sentiment analysis of documents to generate text summaries (Ku et al. 2006; Liu

et al. 2007; Somprasertsri and Lalitrojwong 2010; Stoyanov and Cardie 2006; Lloret

et al. 2009), as well as detecting the patterns of sentiment and detecting sentiment

networks in novels and fairy tales (Nalisnick and Baird 2013a,b; Mohammad and

Yang 2011).

¹ The top systems in the SemEval-2013 and 2014 Sentiment Analysis in Twitter tasks used large

sentiment lexicons (Wilson et al. 2013; Rosenthal et al. 2014a).

² http://alt.qcri.org/semeval2015/task10/

http://alt.qcri.org/semeval2015/task12/

http://alt.qcri.org/semeval2015/task11/

http://www.kaggle.com/c/sentiment-analysis-on-movie-reviews


4.2.2 Detecting Sentiment of the Writer, Reader, and Other Entities

On the surface, sentiment may seem unambiguous, but looking closer, it is easy

to see how sentiment can be associated with any of the following: 1. the speaker or

writer, 2. the listener or reader, or 3. one or more entities mentioned in the utterance.

A large majority of research in sentiment analysis has focused on detecting the

sentiment of the speaker, and this is often done by analyzing only the utterance.

However, there are several instances where it is unclear whether the sentiment in the

utterance is the same as the sentiment of the speaker. For example, consider:

James: The pop star suffered a fatal overdose of heroin.

The sentence describes a negative event (death of a person), but it is unclear whether

to conclude that James (the speaker) is personally saddened by the event. It is

possible that James is a news reader and merely communicating information about

the event. Developers of sentiment systems have to decide beforehand whether they

wish to assign a negative or neutral sentiment to the speaker in such cases. More

generally, they have to decide whether the speaker’s sentiment will be chosen to be

neutral in the absence of clear signifiers of the speaker’s own sentiment, or whether the

speaker’s sentiment will be chosen to be the same as the sentiment of events and

topics mentioned in the utterance.

On the other hand, people can react differently to the same utterance, for

example, people on opposite sides of a debate or rival sports fans. Thus modeling

listener sentiment requires modeling listener proﬁles. This is an area of research not

explored much by the community. Similarly, there is no work on modeling sentiment

of entities mentioned in the text, for example, given:

Drew: Jackson could not stop talking about the new Game of Thrones episode.

It will be useful to develop automatic systems that can deduce that Jackson (not

Drew) liked the new episode of Game of Thrones (a TV show).

4.2.3 Sentiment Towards Aspects of an Entity

A review of a product or service can express sentiment towards various aspects.

For example, a restaurant review can speak positively about the service, but express

a negative attitude towards the food. There is now a growing amount of work in

detecting aspects of products in text and also in determining sentiment towards

these aspects. In 2014, a shared task was organized for detecting aspect sentiment

in restaurant and laptop reviews (Pontiki et al. 2014a). The best performing

systems had a strong sentence-level sentiment analysis system to which they added

localization features so that more weight was given to sentiment features close to the

mention of the aspect. This task was repeated in 2015. It will be useful to develop


aspect-based sentiment systems for other domains such as blogs and news articles

as well. (See proceedings of SemEval-2014 and 2015 for details about participating

aspect sentiment systems.)
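The idea of giving more weight to sentiment features that occur close to the aspect mention can be sketched as follows. This is an illustrative toy, not the method of any participating system: the word scores, the function name `aspect_score`, and the decay function 1 / (1 + distance) are all assumptions.

```python
from typing import Dict, List

# Hypothetical word scores (positive > 0, negative < 0), for illustration only.
SCORES: Dict[str, float] = {"friendly": 1.0, "bland": -1.0}


def aspect_score(tokens: List[str], aspect: str) -> float:
    """Sum word scores, down-weighting words that are far from the aspect mention."""
    positions = [i for i, tok in enumerate(tokens) if tok == aspect]
    if not positions:
        return 0.0  # aspect not mentioned
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in SCORES:
            distance = min(abs(i - p) for p in positions)
            total += SCORES[tok] / (1.0 + distance)  # assumed decay function
    return total


if __name__ == "__main__":
    review = "the service was friendly but the food was bland".split()
    print("service:", aspect_score(review, "service"))
    print("food:", aspect_score(review, "food"))
```

On the example review, the nearby word dominates each aspect's score, so the service receives a positive score and the food a negative one even though both sentiment words appear in the same sentence.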

4.2.4 Stance Detection

Stance detection is the task of automatically determining from text whether the

author of the text is in favor of, against, or neutral towards a proposition or target.

For example, given the following target and text pair:

Target of interest: women have the right to abortion

Text: A foetus has rights too!

Humans can deduce from the text that the speaker is against the proposition.

However, this is a challenging task for computers. To successfully detect stance,

automatic systems often have to identify relevant bits of information that may not

be present in the focus text. The systems also have to ﬁrst identify the target of

opinion in the text and then determine its implication on the target of interest. Note

that the target of opinion need not be the same as the target of interest. For example,

if one actively supports foetus rights (target of opinion), then he or she is

likely against the right to abortion (target of interest). Automatic systems can obtain

such information from large amounts of domain text.

Automatically detecting stance has widespread applications in information

retrieval, text summarization, and textual entailment. In fact, one can argue that

stance detection can bring complementary information to sentiment analysis,

because we often care about the author’s evaluative outlook towards specific targets

and propositions rather than simply about whether the speaker was angry or happy.

Mohammad et al. (2016b) created the ﬁrst dataset of tweets labeled for both

stance and sentiment. More than 4000 tweets are annotated for whether one can

deduce favorable or unfavorable stance towards one of ﬁve targets ‘Atheism’,

‘Climate Change is a Real Concern’, ‘Feminist Movement’, ‘Hillary Clinton’, and

‘Legalization of Abortion’. Each of these tweets is also annotated for whether

the target of opinion expressed in the tweet is the same as the given target of

interest. Finally, each tweet is annotated for whether it conveys positive, negative,

or neutral sentiment. Partitions of this stance-annotated data were used as training

and test sets in the SemEval-2016 shared task competition, Task #6: Detecting

Stance from Tweets (Mohammad et al. 2016a). Participants were provided with

2,914 training instances labeled for stance for the ﬁve targets. The test data included

1,249 instances. All of the stance data is made freely available through the shared

task website. The task received submissions from 19 teams. The best performing

system obtained an overall average F-score of 67.8 in a three-way classiﬁcation

task: favour, against, or neither. They employed two recurrent neural network (RNN)

classiﬁers: the ﬁrst was trained to predict task-relevant hashtags on a large unlabeled

Twitter corpus. This network was used to initialize a second RNN classiﬁer, which


was trained with the provided training data (Zarrella and Marsh 2016). Mohammad

et al. (2016b) developed an SVM system that only uses features drawn from word

and character ngrams and word embeddings to obtain an even better F-score of 70.3

on the shared task’s test set. Yet, performance of systems is substantially lower on

tweets where the target of opinion is an entity other than the target of interest.
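For readers who want to reproduce scores of this kind, the sketch below computes a per-class F-score and its average over a chosen set of classes. Which classes enter the average in the shared task is not stated in this chapter, so the `labels` argument is left to the caller; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Harmonic mean of precision and recall for one class label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(gold: Sequence[str], pred: Sequence[str], labels: List[str]) -> float:
    """Unweighted mean of the per-class F-scores over the given labels."""
    return sum(f1_for_class(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    gold = ["favour", "against", "neither", "against"]
    pred = ["favour", "against", "against", "favour"]
    print(average_f1(gold, pred, ["favour", "against"]))
```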

Most of the earlier work focused on two-sided debates, for example on congressional

debates (Thomas et al. 2006) or debates in online forums (Somasundaran

and Wiebe 2009; Murakami and Raymond 2010; Anand et al. 2011; Walker et al.

2012; Hasan and Ng 2013; Sridhar et al. 2014). New research in domains such as

social media texts, and approaches that combine traditional sentiment analysis with

relation extraction can make a signiﬁcant impact in improving the state-of-the-art in

automatic stance detection.

4.2.5 Detecting Semantic Roles of Feeling

Past work in sentiment analysis has focused extensively on detecting polarity,

and to a smaller extent on detecting the target of the sentiment (the stimulus)

(Popescu and Etzioni 2005; Su et al. 2006; Xu et al. 2013; Qadir 2009; Zhang

et al. 2010; Zhang and Liu 2011; Kessler and Nicolov 2009). However, there

exist other aspects relevant to sentiment. Tables 4.1 and 4.2 show FrameNet

(Baker et al. 1998) frames for ‘feelings’ and ‘emotions’, respectively. Observe

that in addition to Evaluation, State, and Stimulus, several other roles such as

Reason, Degree, Topic, and Circumstance are also of signiﬁcance and beneﬁcial

to down-stream applications such as information retrieval, summarization, and

textual entailment. Detecting these various roles is essentially a semantic

role-labeling problem (Gildea and Jurafsky 2002; Màrquez et al. 2008; Palmer et al.

2010), and it is possible that they can be modeled jointly to improve detection

accuracy. Li and Xu (2014) proposed a rule-based system to extract the event that

was the cause of an emotional Weibo (Chinese microblogging service) message.

Mohammad et al. (2015a) created a corpus of tweets from the run up to the 2012

US presidential elections, with annotations for sentiment, emotion, stimulus, and

experiencer. The data also includes annotations for whether the tweet is sarcastic,

ironic, or hyperbolic. Diman Ghazi and Szpakowicz (2015) compiled FrameNet

sentences that were tagged with the stimulus of certain emotions.

Table 4.1 The FrameNet frame for feeling

Role            Description
Core
Emotion         The feeling that the experiencer experiences
State           The state the experiencer is in
Evaluation      A negative or positive assessment of the experiencer regarding his/her state
Experiencer     One who experiences the emotion and is in the state
Non-Core
Explanation     The thing that leads to the experiencer feeling the emotion or state

Table 4.2 The FrameNet frame for emotions

Role            Description
Core
Experiencer     The person that experiences or feels the emotion
State           The abstract noun that describes the experience
Stimulus        The person or event that evokes the emotional response in the experiencer
Topic           The general area in which the emotion occurs
Non-Core
Circumstances   The condition in which stimulus evokes response
Degree          The extent to which the experiencer’s emotion deviates from the norm for the emotion
Empathy_target  The individual or individuals with which the experiencer identifies emotionally
Manner          Any description of the way in which the experiencer experiences the stimulus which is not covered by more specific frame elements
Reason          The explanation for why the stimulus evokes a certain emotional response
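One way to see what a role-labeled emotion annotation could look like is a small record type whose fields follow some of the roles in Table 4.2. The class name `EmotionFrame`, the selection of fields, and the example sentence are hypothetical and serve only as a sketch; they do not reproduce the format of FrameNet or of any corpus cited above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EmotionFrame:
    """Role fillers for one emotion mention; field names follow Table 4.2."""
    experiencer: str
    state: str
    stimulus: Optional[str] = None
    topic: Optional[str] = None
    degree: Optional[str] = None
    reason: Optional[str] = None

    def filled_roles(self) -> Dict[str, str]:
        """Return only the roles that have a filler."""
        return {role: filler for role, filler in vars(self).items() if filler is not None}


if __name__ == "__main__":
    # Hypothetical annotation of: "Sam felt great anger at the referee."
    frame = EmotionFrame(experiencer="Sam", state="anger",
                         stimulus="the referee", degree="great")
    print(frame.filled_roles())
```

A sentence-level polarity label would keep only the fact that the sentence is negative; the record above additionally keeps who feels the emotion, towards what, and how strongly.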

4.2.6 Detecting Affect and Emotions

Sentiment analysis is most commonly used to refer to the goal of determining

the valence or polarity of a piece of text. However, it can refer more generally to

determining one’s attitude towards a particular target or topic. Here, attitude can

even mean emotional or affectual attitude such as frustration, joy, anger, sadness,

excitement, and so on. Russell (1980) developed a circumplex model of affect and

showed that it can be characterized by two primary dimensions: valence (positive

and negative dimension) and arousal (degree of reactivity to stimulus). Thus, it

is not surprising that a large amount of work in sentiment analysis is focused on

determining valence. However, there is barely any work on automatically detecting

arousal and a relatively small amount of work on detecting emotions such as anger,

frustration, sadness, and optimism (Strapparava and Mihalcea 2007; Aman and

Szpakowicz 2007; Tokuhisa et al. 2008; Neviarouskaya et al. 2009; Bellegarda

2010; Mohammad 2012; Boucouvalas 2002; Zhe and Boucouvalas 2002; Holzman

and Pottenger 2003; Ma et al. 2005; Mohammad 2012; John et al. 2006; Mihalcea

and Liu 2006; Genereux and Evans 2006). Detecting these more subtle aspects

of sentiment has wide-ranging applications, for example in developing customer

relation models, public health, military intelligence, and the video games industry,

where it is necessary to make distinctions between anger and sadness (both of which

are negative), calm and excited (both of which are positive), and so on.

4.3 Sentiment of Words

Term–sentiment associations have been captured by manually created sentiment

lexicons as well as automatically generated ones.

4.3.1 Manually Generated Term-Sentiment Association Lexicons

The General Inquirer (GI) has sentiment labels for about 3,600 terms (Stone et al.

1966). Hu and Liu (2004) manually labeled about 6,800 words and used them

for detecting sentiment of customer reviews. The MPQA Subjectivity Lexicon,

which draws from the General Inquirer and other sources, has sentiment labels

for about 8,000 words (Wilson et al. 2005). The NRC Emotion Lexicon has

sentiment and emotion labels for about 14,000 words (Mohammad and Turney

2010; Mohammad and Yang 2011). These labels were compiled through Mechanical

Turk annotations.3

For people, assigning a score indicating the degree of sentiment is not natural.

Different people may assign different scores to the same target item, and it is hard

for even the same annotator to remain consistent when annotating a large number of

items. In contrast, it is easier for annotators to determine whether one word is more

positive (or more negative) than the other. However, the latter requires a much larger

number of annotations than the former (in the order of $N^2$, where $N$ is the number

of items to be annotated).

An annotation scheme that retains the comparative aspect of annotation while

still requiring only a small number of annotations comes from survey analysis

techniques and is called MaxDiff (Louviere 1991). The annotator is presented with

four terms and asked which word is the most positive and which is the least positive.

By answering just these two questions, five out of the six pairwise inequalities are known. If

the respondent says that A is most positive and D is least positive, then:

$A > B,\; A > C,\; A > D,\; B > D,\; C > D$

Only the relative order of B and C remains undetermined.

Each of these MaxDiff questions can be presented to multiple annotators. The

responses to the MaxDiff questions can then be easily translated into a ranking of

all the terms and also a real-valued score for all the terms (Orme 2009). If two

words have very different degrees of association (for example, $A \gg D$), then A will

be chosen as most positive much more often than D and D will be chosen as least

positive much more often than A. This will eventually lead to a ranked list such that

A and D are significantly farther apart, and their real-valued association scores are

also signiﬁcantly different. On the other hand, if two words have similar degrees

3 https://www.mturk.com/mturk/welcome

of association with positive sentiment (for example, A and B), then it is possible

that for MaxDiff questions having both A and B, some annotators will choose A as

most positive, and some will choose B as most positive. Further, both A and B will

be chosen as most positive (or most negative) a similar number of times. This will

result in a list such that A and B are ranked close to each other and their real-valued

association scores will also be close in value.
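
The following is a minimal sketch of how MaxDiff responses might be turned into orderings and scores. It assumes one simple counting rule (the proportion of times a term is chosen as most positive minus the proportion of times it is chosen as least positive); the function names and the toy responses are illustrative assumptions and are not taken from the chapter or from Orme (2009).

```python
from collections import defaultdict


def implied_inequalities(terms, best, worst):
    """Return the pairwise orderings (x, y), meaning x > y, implied by one answer."""
    pairs = set()
    for t in terms:
        if t != best:
            pairs.add((best, t))       # the best term outranks every other term
        if t != best and t != worst:
            pairs.add((t, worst))      # every remaining term outranks the worst
    return pairs


def maxdiff_scores(responses):
    """Score each term as (share of times best) - (share of times worst).

    responses: iterable of (terms, best, worst) tuples, one per answered question.
    """
    seen = defaultdict(int)
    best_count = defaultdict(int)
    worst_count = defaultdict(int)
    for terms, best, worst in responses:
        for t in terms:
            seen[t] += 1
        best_count[best] += 1
        worst_count[worst] += 1
    return {t: (best_count[t] - worst_count[t]) / seen[t] for t in seen}


# Hypothetical answers from three annotators to the same four-term question.
question = ("A", "B", "C", "D")
responses = [
    (question, "A", "D"),
    (question, "B", "D"),
    (question, "A", "D"),
]
scores = maxdiff_scores(responses)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With four terms per question there are six possible pairs, and `implied_inequalities` recovers five of them from a single answer, which is why far fewer questions are needed than with exhaustive pairwise comparison.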

MaxDiff was used for obtaining annotations of relation similarity of pairs of

items in a SemEval-2012 shared task (Jurgens et al. 2012). Kiritchenko and Mohammad

(2016a) applied Best–Worst Scaling (another name for MaxDiff) to obtain real-valued sentiment association

scores for words and phrases in three different domains: general English, English

Twitter, and Arabic Twitter. They showed that on all three domains the ranking

of words by sentiment remains remarkably consistent even when the annotation

process is repeated with a different set of annotators. They also determined the

minimum difference in sentiment association that is perceptible to native speakers

of a language.

4.3.2 Automatically Generated Term-Sentiment Association Lexicons

Semi-supervised and automatic methods have also been proposed to detect the

polarity of words. Hatzivassiloglou and McKeown (1997) proposed an algorithm

to determine the polarity of adjectives. SentiWordNet was created using supervised

classiﬁers as well as manual annotation (Esuli and Sebastiani 2006). Turney and

Littman (2003) proposed a minimally supervised algorithm to calculate the polarity

of a word by determining if its tendency to co-occur with a small set of positive

seed words is greater than its tendency to co-occur with a small set of negative seed

words. Mohammad et al. (2013b) employed the Turney method to generate a lexicon

(Hashtag Sentiment Lexicon) from tweets with certain sentiment-bearing seed-word

hashtags (such as #excellent, #good, and #terrible) and another lexicon

(Sentiment140 Lexicon) from tweets with emoticons.4 Since the lexicons themselves

are generated from tweets, they even have entries for creatively spelled words

(e.g. happpeee), slang (e.g. bling), abbreviations (e.g. lol), and even hashtags and

conjoined words (e.g. #loveumom). Cambria et al. (2016) created SenticNet, which has

sentiment entries for 30,000 words and multi-word expressions using information

propagation to connect various parts of common-sense knowledge representations.

Kiritchenko et al. (2014b) proposed a method to create separate lexicons for words

found in negated context and those found in afﬁrmative context; the idea being

that the same word contributes to sentiment differently depending on whether

it is negated or not. These lexicons contain sentiment associations for hundreds

4 http://www.purl.com/net/lexicons

of thousands of unigrams and bigrams. However, they do not explicitly handle

combinations of terms with modals, degree adverbs, and intensiﬁers.
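
The co-occurrence idea behind the Turney and Littman (2003) approach described above can be sketched as a difference of pointwise mutual information (PMI) values. The code below is an illustrative simplification, not the authors' implementation: it assumes each text has already been labeled "pos" or "neg" by the seed word or seed hashtag it contains, and it uses add-one smoothing, which is an assumption made here to avoid division by zero.

```python
import math
from collections import Counter


def pmi_sentiment_scores(labeled_docs, smoothing=1.0):
    """Return {word: PMI(word, pos) - PMI(word, neg)}.

    labeled_docs: iterable of (tokens, label) pairs with label "pos" or "neg".
    The label is assumed to come from a seed word or hashtag in the text.
    """
    word_counts = {"pos": Counter(), "neg": Counter()}
    totals = {"pos": 0, "neg": 0}
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
        totals[label] += len(tokens)
    if totals["pos"] == 0 or totals["neg"] == 0:
        raise ValueError("need at least one token in each class")

    scores = {}
    for word in set(word_counts["pos"]) | set(word_counts["neg"]):
        pos = word_counts["pos"][word] + smoothing
        neg = word_counts["neg"][word] + smoothing
        # PMI(w, pos) - PMI(w, neg) reduces to this log ratio, because the
        # corpus size and the overall frequency of w cancel out.
        scores[word] = math.log2((pos * totals["neg"]) / (neg * totals["pos"]))
    return scores


# Toy corpus (made up for illustration).
docs = [
    (["happpeee", "day"], "pos"),
    (["great", "day"], "pos"),
    (["awful", "day"], "neg"),
    (["terrible", "day"], "neg"),
]
lexicon = pmi_sentiment_scores(docs)
```

A positive score indicates a stronger association with the positive seeds and a negative score the opposite. A word such as "day", which occurs equally often with both seed sets, receives a score of zero.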

4.4 Sentiment of Phrases, Sentences, and Tweets: Sentiment Composition

Semantic composition, which aims at determining a representation of the meaning

of two words through manipulations of their individual representations, has gained

substantial attention in recent years with work from Mitchell and Lapata (2010),

Baroni and Zamparelli (2010), Rudolph and Giesbrecht (2010), Yessenalina and

Cardie (2011), Grefenstette et al. (2013), Grefenstette and Sadrzadeh (2011), and

Turney (2014). Socher et al. (2012) and Mikolov et al. (2013) introduced deep

learning models and distributed word representations in vector space (word embeddings)

to obtain substantial improvements over the state-of-the-art in semantic

composition. Mikolov’s word2vec tool for generating word embeddings is available

publicly.5
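
As a rough illustration of what "manipulating individual representations" can mean, the sketch below composes two word vectors by addition and by element-wise multiplication, two of the simplest composition functions explored in this line of work. The three-dimensional vectors are made up for the example; real embeddings would come from a trained model.

```python
import math


def compose_additive(u, v):
    """Compose two word vectors by element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u, v):
    """Compose two word vectors by element-wise multiplication."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not taken from any trained model.
dark = [0.2, 0.9, 0.1]
chocolate = [0.8, 0.3, 0.4]

phrase_add = compose_additive(dark, chocolate)
phrase_mul = compose_multiplicative(dark, chocolate)
similarity_to_head = cosine(phrase_add, chocolate)
```

Neither function takes word order or syntactic structure into account, which is one motivation for the matrix-based and neural composition models cited above.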

Sentiment of a phrase or a sentence is often not simply the sum of the sentiments

of its constituents. Sentiment composition is the determination of the sentiment of a

multi-word linguistic unit, such as a phrase or a sentence, based on its constituents.

Lexicons that include sentiment associations for phrases as well as their constituent

words are referred to as sentiment composition lexicons (SCLs). Kiritchenko

and Mohammad created sentiment composition lexicons for English and Arabic

that included: (1) negated expressions (Kiritchenko and Mohammad 2016a,b),

(2) phrases with adverbs, modals, and intensifiers (Kiritchenko and Mohammad

2016a,b), and (3) opposing polarity phrases (where at least one word in the phrase

is positive and at least one word is negative, for example, happy accident and

dark chocolate) (Kiritchenko and Mohammad 2016c). Socher et al. (2013) took

a dataset of movie review sentences that were annotated for sentiment and further

annotated every word and phrasal constituent within those sentences for sentiment.

Such datasets where sentences, phrases, and their constituent words are annotated

for sentiment are helping foster further research on how sentiment is composed.

We discuss speciﬁc types of sentiment composition, and challenges for automatic

methods that address them, in the sub-sections below.

4.4.1 Negated Expressions

Morante and Sporleder (2012) deﬁne negation to be “a grammatical category that

allows the changing of the truth value of a proposition”. Negation is often expressed

5 https://code.google.com/p/word2vec

through the use of negative signals or negator words such as not and never, and it can

signiﬁcantly affect the sentiment of its scope. Understanding the impact of negation

on sentiment improves automatic analysis of sentiment. Earlier works on negation

handling employ simple heuristics such as ﬂipping the polarity of the words in a

negator’s scope (Kennedy and Inkpen 2005; Choi and Cardie 2008) or changing

the degree of sentiment of the modiﬁed word by a ﬁxed constant (Taboada et al.

2011). Zhu et al. (2014) show that these simple heuristics fail to capture the true

impact of negators on the words in their scope. They show that negators tend to

make positive words negative (albeit with lower intensity) and make negative

words less negative (not positive). Zhu et al. also propose certain embeddings-based

recursive neural network models to capture the impact of negators more precisely.

As mentioned earlier, Kiritchenko et al. (2014b) capture the impact of negation by

creating separate sentiment lexicons for words seen in afﬁrmative context and those

seen in negated contexts. They use a hand-chosen list of negators and determine

scope to be starting from the negator and ending at the ﬁrst punctuation (or end of

sentence).
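
The scope heuristic just described (from the negator to the next punctuation mark or the end of the sentence) can be sketched as follows. The negator list, the `_NEG` suffix, and the function name are illustrative assumptions rather than the list or notation used by Kiritchenko et al. (2014b); in this sketch the tokens after the negator are marked, and the negator itself is left unmarked.

```python
import re

# Small illustrative negator list; a real system would use a longer, curated one.
NEGATORS = {"not", "no", "never", "cannot", "nobody", "nothing"}
PUNCTUATION = re.compile(r"^[.,:;!?]+$")


def mark_negated_context(tokens):
    """Append '_NEG' to every token between a negator and the next punctuation."""
    marked = []
    in_scope = False
    for tok in tokens:
        if PUNCTUATION.match(tok):
            in_scope = False           # punctuation closes the negation scope
            marked.append(tok)
        elif in_scope:
            marked.append(tok + "_NEG")
        else:
            marked.append(tok)
            if tok.lower() in NEGATORS or tok.lower().endswith("n't"):
                in_scope = True        # scope opens after the negator
    return marked


example = "I do not like this movie , but the cast is good .".split()
marked_example = mark_negated_context(example)
```

Tokens carrying the marker can then be looked up in a lexicon built from negated contexts, while unmarked tokens are looked up in the lexicon built from affirmative contexts.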

Several aspects of negation are still not understood, though: for example, can

negators be ranked in terms of their average impact on the sentiment of their scopes

(which negators impact sentiment more and which impact sentiment less); in what

contexts does the same negator impact the sentiment of its scope more and in what

contexts is the impact less; how do people in different communities and cultures use

negations differently; and how negations of sentiment expressions should be dealt

with by paraphrase and textual entailment systems.

4.4.2 Phrases with Degree Adverbs, Intensiﬁers, and Modals

Degree adverbs such as barely, moderately, and slightly quantify the extent or

amount of the predicate. Intensiﬁers such as too and very are modiﬁers that do

not change the propositional content (or truth value) of the predicate they modify,

but they add to the emotionality. However, even linguists are hard pressed to

give out comprehensive lists of degree adverbs and intensiﬁers. Additionally, the

boundaries between degree adverbs and intensiﬁers can sometimes be blurred,

and so it is not surprising that the terms are occasionally used interchangeably.

Impacting propositional content or not, both degree adverbs and intensiﬁers impact

the sentiment of the predicate, and there is some work in exploring this interaction

(Zhang et al. 2008; Wang and Wang 2012; Xu et al. 2008; Lu and Tsou 2010;

Taboada et al. 2008). Most of this work focuses on identifying sentiment words

by bootstrapping over patterns involving degree adverbs and intensiﬁers. Thus

several areas remain unexplored, such as identifying patterns and regularities in how

different kinds of degree adverbs and intensiﬁers impact sentiment, ranking degree

adverbs and intensiﬁers in terms of how they impact sentiment, and determining

when (in what contexts) the same modiﬁer will impact sentiment differently than

its usual behavior. (See Kiritchenko and Mohammad (2016b) for some recent work

exploring these questions in manually annotated sentiment composition lexicons.)

Modals (a kind of auxiliary verb) are used to convey the degree of conﬁdence,

permission, or obligation to the predicate. Thus, if the predicate is sentiment bearing,

then the sentiment of the combination of the modal and the predicate can be different

from the sentiment of the predicate alone. For example, cannot work seems less

positive than work or will work (cannot and will are modals). There is little work

on automatically determining the impact of modals on sentiment.

4.4.3 Sentiment of Sentences, Tweets, and SMS Messages

Bag-of-word models such as the NRC-Canada system (Mohammad et al. 2013a;

Kiritchenko et al. 2014a,b) and Unitn (Severyn and Moschitti 2015) have been very

successful in recent shared task competitions on determining sentiment of whole

tweets, SMS messages, and sentences. However, approaches that apply systematic

sentiment composition of smaller units to determine sentiment of sentences are

growing in popularity. Socher et al. (2013) proposed a word-embeddings based

model that learns the sentiment of term compositions. They obtain state-of-the-

art results in determining both the overall sentiment and sentiment of constituent

phrases in movie review sentences. This has inspired tremendous interest in more

embeddings-based work for sentiment composition (Dong et al. 2014; Kalchbrenner

et al. 2014). These recursive models do not require any hand-crafted features or

semantic knowledge, such as a list of negation words or sentiment lexicons. However,

they are computationally intensive and need substantial additional annotations

(word and phrase-level sentiment labeling). Nonetheless, use of word-embeddings

in sentiment composition is still in its infancy, and we will likely see much more

work using these techniques in the future.

4.4.4 Sentiment in Figurative Expressions

Figurative expressions in text, by deﬁnition, are not compositional. That is, their

meaning cannot fully be derived from the meaning of their components in isolation.

There is growing interest in detecting ﬁgurative language, especially irony and

sarcasm (Carvalho et al. 2009; Reyes et al. 2013; Veale and Hao 2010; Filatova

2012; González-Ibánez et al. 2011). In 2015, a SemEval shared task was organized

on detecting sentiment in tweets rich in metaphor and irony (Task 11).6 Participants

were asked to determine the degree of sentiment for each tweet where the score is

a real number in the range from −5 (most negative) to +5 (most positive). One of

6 The proceedings will be released later in 2015.

the characteristics of the data is that a large majority is negative; thereby suggesting

that ironic tweets are largely negative. The SemEval 2014 shared task Sentiment

Analysis in Twitter (Rosenthal et al. 2014a) had a separate test set involving

sarcastic tweets. Participants were asked not to train their system on sarcastic tweets,

but rather apply their regular sentiment system on this new test set; the goal was

to determine performance of regular sentiment systems on sarcastic tweets. It was

observed that performance dropped by about 25% to 70%, thereby showing that

systems must be adjusted if they are to be applied to sarcastic tweets. We found little

to no work exploring automatic sentiment detection in hyperbole, understatement,

rhetorical questions, and other creative uses of language.

4.5 Challenges in Annotating for Sentiment

Clear and simple instructions are crucial for obtaining high-quality annotations. This

is true even for seemingly simple annotation tasks, such as sentiment annotation,

where one is to label instances as positive, negative, or neutral. For word annotations,

researchers have often framed the task as ‘is this word positive, negative,

or neutral?’ (Hu and Liu 2004), ‘does this word have associations with positive,

negative, or neutral sentiment?’ (Mohammad and Turney 2013), or ‘which word

is more positive?’/‘which word has a greater association with positive sentiment?’

(Kiritchenko et al. 2016; Kiritchenko and Mohammad 2016c). Similar instructions

are also widely used for sentence-level sentiment annotations—‘is this sentence

positive, negative, or neutral?’ (Rosenthal et al. 2015, 2014b; Mohammad et al.

2016a, 2015b). We will refer to such annotation schemes as the simple sentiment

questionnaires. On the one hand, this characterization of the task is simple, terse,

and reliant on the intuitions of native speakers of a language (rather than biasing the

annotators by providing deﬁnitions of what it means to be positive, negative, and

neutral). On the other hand, the lack of speciﬁcation leaves the annotator in doubt

over how to label certain kinds of instances—for example, sentences where one side

wins against another, sarcastic sentences, or retweets.

A different approach to sentiment annotation is to ask respondents to identify

the target of opinion, and the sentiment towards this target of opinion (Pontiki

et al. 2014b; Mohammad et al. 2015b; Deng and Wiebe 2014). We will refer to

such annotation schemes as the semantic-role based sentiment questionnaires. This

approach of sentiment annotation is more speciﬁc, and more involved, than the

simple sentiment questionnaire approach; however, it too is insufﬁcient for handling

several scenarios. Most notably, the emotional state of the speaker is not under

the purview of this scheme. Many applications require that statements expressing

positive or negative emotional state of the speaker should be marked as ‘positive’

or ‘negative’, respectively. Similarly, many applications require statements that

describe positive or negative events or situations to be marked as ‘positive’ or

‘negative’, respectively. Instructions for annotating opinion towards targets do not

specify how such instances are to be annotated, and worse still, possibly imply that

such instances are to be labeled as neutral.

Some sentence types that are especially challenging for sentiment annotation

(using either the simple sentiment questionnaire or the semantic-role based sentiment

questionnaire) are listed below:

• Speaker’s emotional state: The speaker’s emotional state may or may not

have the same polarity as the opinion expressed by the speaker. For example,

a politician’s tweet can imply both a negative opinion about a rival’s past

indiscretion, and a joyous mental state as the news will impact the rival adversely.

• Success or failure of one side w.r.t. another: Often sentences describe the

success or failure of one side w.r.t. another side—for example, ‘Yay! France

beat Germany 3–1’, ‘Supreme court judges in favor of gay marriage’, and ‘the

coalition captured the rebels’. If one supports France, gay marriage, and the

coalition, then these events are positive, but if one supports Germany, marriage

as a union only between man and woman, and the rebels, then these events can

be seen as negative.

Also note that the framing of an event as the success of one party (or as the failure

of another party) does not automatically imply that the speaker is expressing

positive (or negative) opinion towards the mentioned party. For example, when

Finland beat Russia in ice hockey in the 2014 Sochi Winter Olympics, the

event was tweeted around the world predominantly as “Russia lost to Finland”

as opposed to “Finland beat Russia”. This is not because the speakers were

expressing negative opinion towards the Russian team, but rather simply because

Russia, being the host nation, was the focus of attention and traditionally Russian

hockey teams have been strong.

• Neutral reporting of valenced information: If the speaker does not give any

indication of her own emotional state but describes valenced events or situations,

then it is unclear whether to consider these statements as neutral unemotional

reporting of developments or whether to assume that the speaker is in a negative

emotional state (sad, angry, etc.). Example:

The war has created millions of refugees.

• Sarcasm and ridicule: Sarcasm and ridicule are tricky from the perspective of

assigning a single label of sentiment because they can often indicate positive

emotional state of the speaker (pleasure from mocking someone or something)

even though they have a negative attitude towards someone or something.

• Different sentiment towards different targets of opinion: The speaker may express

opinion about multiple targets, and sentiment towards the different targets might

be different. The targets may be different people or objects (for example, an

iPhone vs. an android phone), or they may be different aspects of the same entity

(for example, quality of service vs. quality of food at a restaurant).

• Precisely determining the target of opinion: Sometimes it is difficult to precisely

identify the target of opinion. For example, consider:

Glad to see Hillary’s lies being exposed.

It is unclear whether the target of opinion is ‘Hillary’, ‘Hillary’s lies’, or

‘Hillary’s lies being exposed’. One reasonable interpretation is that positive

sentiment is expressed about ‘Hillary’s lies being exposed’. However, one can

also infer that the speaker has a negative attitude towards ‘Hillary’s lies’ and

probably ‘Hillary’ in general. It is unclear whether annotators should be asked to

provide all three opinion–target pairs or only one (in which case, which one?).

• Supplications and requests: Many tweets convey positive supplications to God

or positive requests to people in the context of a (usually) negative situation.

Examples include:

May god help those displaced by war.

Let us all come together and say no to fear mongering and divisive politics.

• Rhetorical questions: Rhetorical questions can be treated simply as queries (and

thus neutral) or as utterances that give away the emotional state of the speaker.

For example, consider:

Why do we have to quibble every time?

On the one hand, this tweet can be treated as a neutral question, but on the

other hand, it can be seen as negative because the utterance betrays a sense of

frustration on the part of the speaker.

• Quoting somebody else or re-tweeting: Quotes and retweets are difficult to

annotate for sentiment because it is often not evident

whether the one who quotes (or retweets) holds the same opinions as that

expressed by the quotee.

The challenges listed above can be addressed to varying degrees by providing

instructions to the annotators on how such instances are to be labeled. However,

detailed and complicated instructions can be counter-productive as the annotators

may not understand or may not have the inclination to understand the subtleties

involved. See Mohammad (2016a) for annotation schemes that address some of

these challenges.

4.6 Challenges in Multilingual Sentiment Analysis

Work on multilingual sentiment analysis has mainly addressed mapping sentiment

resources from English into morphologically complex languages. Mihalcea et al.

(2007) use English resources to automatically generate a Romanian subjectivity

lexicon using an English–Romanian dictionary. The generated lexicon is then

used to classify Romanian text. Wan (2008) translated Chinese customer reviews

to English using a machine translation system. The translated reviews are then

annotated using a rule-based system that uses English lexicons. A higher accuracy is

achieved when using ensemble methods and combining knowledge from Chinese

and English resources. Balahur and Turchi (2014) conducted a study to assess

the performance of statistical sentiment analysis techniques on machine-translated

texts. Opinion-bearing phrases from the New York Times Text (2002–2005) corpus

were automatically translated using publicly available machine-translation engines

(Google, Bing, and Moses). Then, the accuracy of a sentiment analysis system

trained on original English texts was compared to the accuracy of the system

trained on automatic translations to German, Spanish, and French. The authors

conclude that the quality of machine translation is acceptable for sentiment analysis

to be performed on automatically translated texts. Salameh et al. (2015) conducted

experiments to determine loss in sentiment predictability when they translate Arabic

social media posts into English, manually and automatically. As benchmarks,

they use manually and automatically determined sentiment labels of the Arabic

texts. They show that sentiment analysis of English translations of Arabic texts

produces competitive results, w.r.t. Arabic sentiment analysis. They also claim that

even though translation signiﬁcantly reduces human ability to recover sentiment,

automatic sentiment systems are affected relatively less by this.

Some of the areas less explored in the realm of multilingual sentiment analysis

include: how to translate text so as to preserve the degree of sentiment in the

source text; how sentiment modiﬁers such as negators and modals differ in function

across languages; understanding how automatic translations differ from manual

translations in terms of sentiment; and how to translate ﬁgurative language without

losing its affectual gist.

4.7 Challenges in Applying Sentiment Analysis

Applications of sentiment analysis beneﬁt from the fact that even though systems are

not extremely accurate at determining sentiment of individual sentences, they can

accurately capture signiﬁcant changes in the proportion of instances that are positive

(or negative). It is also worth noting that such sentiment tracking systems are more

effective when incorporating carefully chosen baselines. For example, knowing the

percentage of tweets that are negative towards Russian President, Vladimir Putin,

is less useful than, for instance, knowing: the percentage of tweets that are negative

towards Putin before vs. after the invasion of Crimea; or, the percentage of tweets

that are negative towards Putin in Russia vs. the rest of the world; or, the percentage

of tweets negative towards Putin vs. Barack Obama (US president).
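
The baseline comparison described above amounts to comparing proportions rather than reading a single percentage in isolation. The sketch below assumes that each tweet has already been assigned a label by some sentiment system; the label names, the function names, and the toy data are illustrative.

```python
def negative_share(labels):
    """Fraction of instances labeled 'negative'."""
    if not labels:
        raise ValueError("no labels given")
    return sum(1 for label in labels if label == "negative") / len(labels)


def change_vs_baseline(baseline_labels, current_labels):
    """Difference in negative share between a current sample and a baseline."""
    return negative_share(current_labels) - negative_share(baseline_labels)


# Hypothetical system outputs for tweets collected before and after an event.
before = ["negative"] * 2 + ["positive"] * 6 + ["neutral"] * 2
after = ["negative"] * 5 + ["positive"] * 3 + ["neutral"] * 2

shift = change_vs_baseline(before, after)
```

Because the same system labels both samples, errors on individual tweets affect both proportions in a similar way, so a large shift can still be informative even when per-tweet accuracy is modest.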

Sentiment analysis is commonly applied in several areas including tracking

sentiment towards products, movies, politicians, and companies (O’Connor et al.

2010; Pang and Lee 2008), improving customer relation models (Bougie et al.

2003), detecting happiness and well-being (Schwartz et al. 2013), tracking the stock

market (Bollen et al. 2011), and improving automatic dialogue systems (Velásquez

1997; Ravaja et al. 2006). The sheer volume of work in this area precludes

detailed summarization here. Nonetheless, it should be noted that often the desired

application can help direct certain design choices in the sentiment analysis system.

For example, the threshold between neutral and positive sentiment and the threshold

between neutral and negative sentiment can be determined empirically by what

is most suitable for the target application. Similarly, as suggested earlier, some

applications may require only the identiﬁcation of strongly positive and strongly

negative instances.
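
One way to choose such thresholds empirically is a small grid search over labeled development data. The sketch below assumes a system that outputs a real-valued score per instance and selects the threshold pair with the highest accuracy; the candidate values, the function names, and the toy data are hypothetical, and an application might prefer a different criterion (for example, precision on strongly negative instances).

```python
def to_label(score, neg_threshold, pos_threshold):
    """Map a real-valued sentiment score to a three-way label."""
    if score >= pos_threshold:
        return "positive"
    if score <= neg_threshold:
        return "negative"
    return "neutral"


def pick_thresholds(scores, gold_labels, candidates):
    """Return (neg_threshold, pos_threshold, accuracy) with the best accuracy."""
    best = None
    for neg_t in candidates:
        for pos_t in candidates:
            if neg_t >= pos_t:
                continue               # the neutral band must have positive width
            predicted = [to_label(s, neg_t, pos_t) for s in scores]
            correct = sum(p == g for p, g in zip(predicted, gold_labels))
            accuracy = correct / len(gold_labels)
            if best is None or accuracy > best[2]:
                best = (neg_t, pos_t, accuracy)
    return best


# Hypothetical development data: system scores and gold labels.
dev_scores = [-0.9, -0.4, -0.1, 0.05, 0.3, 0.8]
dev_gold = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]

chosen = pick_thresholds(dev_scores, dev_gold, grid)
```

Widening the neutral band (raising the positive threshold and lowering the negative one) is a simple way to keep only strongly positive and strongly negative instances.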

Abundant availability of product reviews and their ratings has powered a lot of

the initial research in sentiment analysis. However, as we look forward, one can

be optimistic that the future holds more diverse and more compelling applications

of sentiment analysis. Some recent examples include predicting heart attack rates

through sentiment word usage in tweets (Eichstaedt et al. 2015), corpus-based

poetry generation (Colton et al. 2012), generating music that captures the sentiment

in novels (Davis and Mohammad 2014), conﬁrming theories in literary analysis

(Hassan et al. 2012), and automatically detecting cyberbullying (Nahar et al. 2012).

References

Agarwal, A., B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. 2011. Sentiment analysis of twitter

data. In Proceedings of Language in Social Media, 30–38. Portland.

Aisopos, F., G. Papadakis, K. Tserpes, and T. Varvarigou. 2012. Textual and contextual patterns for

sentiment analysis over microblogs. In Proceedings of the 21st WWW Companion, New York,

453–454.

Aman, S., and S. Szpakowicz. 2007. Identifying expressions of emotion in text. In Text, Speech

and Dialogue, Lecture Notes in Computer Science, vol. 4629, 196–205.

Anand, Pranav, et al. 2011. Cats rule and dogs drool!: Classifying stance in online debate. In

Proceedings of the ACL workshop on computational approaches to subjectivity and sentiment

analysis, Portland.

Baker, C.F., C.J. Fillmore, and J.B. Lowe. 1998. The Berkeley framenet project. In Proceedings of

ACL, Stroudsburg, 86–90.

Bakliwal, A., P. Arora, S. Madhappan, N. Kapre, M. Singh, and V. Varma. 2012. Mining sentiments

from tweets. In Proceedings of WASSA’12, 11–18, Jeju.

Balahur, A., and M. Turchi. 2014. Comparative experiments using supervised learning and machine

translation for multilingual sentiment analysis. Computer Speech & Language 28(1): 56–75.

Baroni, M., and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing

adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on

Empirical Methods in Natural Language Processing, 1183–1193.

Bellegarda, J. 2010. Emotion analysis using latent affective folding and embedding. In Proceedings

of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation

of Emotion in Text. Los Angeles.

Bollen, J., H. Mao, and X. Zeng. 2011. Twitter mood predicts the stock market. Journal of

Computational Science 2(1): 1–8.

Boucouvalas, A.C. 2002. Real time text-to-emotion engine for expressive internet communication.

Emerging Communication: Studies on New Technologies and Practices in Communication 5:

305–318.

Bougie, J.R.G., R. Pieters, and M. Zeelenberg. 2003. Angry customers don’t come back, they get

back: The experience and behavioral implications of anger and dissatisfaction in services. Open

access publications from Tilburg university, Tilburg University.

Brody, S., and N. Diakopoulos. 2011. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: using word

lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical

Methods in Natural Language Processing, EMNLP’11, 562–570.

Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for

sentiment analysis based on conceptual primitives. In COLING, 2666–2677.

Carvalho, P., L. Sarmento, M.J. Silva, and E. De Oliveira. 2009. Clues for detecting irony in

user-generated contents: oh...!! it’s so easy;-). In Proceedings of the 1st International CIKM

Workshop on Topic-Sentiment Analysis for Mass Opinion, 53–56. ACM.

Choi, Y., and C. Cardie. 2008. Learning with compositional semantics as structural inference for

subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in

Natural Language Processing, EMNLP’08, Honolulu, 793–801.

Colton, S., J. Goodwin, and T. Veale. 2012. Full face poetry generation. In Proceedings of the Third

International Conference on Computational Creativity, 95–102.

Davis, H., and S. Mohammad. 2014. Generating music from literature. In Proceedings of the 3rd

Workshop on Computational Linguistics for Literature (CLFL), Gothenburg, 1–10.

Deng, L., and J. Wiebe. 2014. Sentiment propagation via implicature constraints. In EACL, 377–

385.

Diman Ghazi, D.I., and S. Szpakowicz. 2015. Detecting emotion stimuli in emotion-bearing

sentences. In Proceedings of the 2015 Conference on Intelligent Text Processing and

Computational Linguistics.

Dong, L., F. Wei, M. Zhou, and K. Xu. 2014. Adaptive multi-compositionality for recursive neural

models with applications to sentiment analysis. In Twenty-Eighth AAAI Conference on Artiﬁcial

Intelligence (AAAI).

Eichstaedt, J.C., H.A. Schwartz, M.L. Kern, G. Park, D.R. Labarthe, R.M. Merchant, S. Jha, M.

Agrawal, L.A. Dziurzynski, and M. Sap et al. 2015. Psychological language on twitter predicts

county-level heart disease mortality. Psychological Science 2: 159–169.

Esuli, A., and F. Sebastiani. 2006. SENTIWORDNET: A publicly available lexical resource

for opinion mining. In Proceedings of the 5th Conference on Language Resources and

Evaluation, LREC’06, 417–422.

Filatova, E. 2012. Irony and sarcasm: Corpus generation and analysis using crowdsourcing. In

LREC, 392–398.

Genereux, M., and R.P. Evans. 2006. Distinguishing affective states in weblogs. In AAAI-2006

Spring Symposium on Computational Approaches to Analysing Weblogs, Stanford, 27–29.

Gildea, D., and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics

28(3): 245–288.

González-Ibánez, R., S. Muresan, and N. Wacholder. 2011. Identifying sarcasm in twitter: A closer

look. In Proceedings of the ACL, 581–586.

Grefenstette, E., G. Dinu, Y.-Z. Zhang, M. Sadrzadeh, and M. Baroni. 2013. Multi-step regression

learning for compositional distributional semantics. arXiv preprint arXiv:1301.6939.

Grefenstette, E., and M. Sadrzadeh. 2011. Experimental support for a categorical compositional

distributional model of meaning. In Proceedings of the Conference on Empirical Methods in

Natural Language Processing, 1394–1404.

Hasan, Kazi Saidul, and Vincent Ng. 2013. Stance classiﬁcation of ideological debates: Data,

models, features, and constraints. In The 6th international joint conference on natural language

processing, Nagoya.

Hassan, A., A. Abu-Jbara, and D. Radev. 2012. Extracting signed social networks from text.

In Workshop Proceedings of TextGraphs-7 on Graph-Based Methods for Natural Language

Processing, 6–14.

Hatzivassiloglou, V., and K.R. McKeown. 1997. Predicting the semantic orientation of adjectives.

In Proceedings of the 8th Conference of European Chapter of the Association for Computa-

tional Linguistics, Madrid, 174–181.

Holzman, L.E., and W.M. Pottenger. 2003. Classiﬁcation of emotions in internet chat: An

application of machine learning using speech phonemes. Technical report, Lehigh University.

Hu, M., and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’04,

New York, 168–177.

John, D., A.C. Boucouvalas, and Z. Xu. 2006. Representing emotional momentum within expres-

sive internet communication. In Proceedings of the 24th IASTED International Conference on

Internet and Multimedia Systems and Applications, 183–188. Anaheim: ACTA Press.

Jurgens, D., S.M. Mohammad, P. Turney, and K. Holyoak. 2012. Semeval-2012 task 2: Measuring

degrees of relational similarity. In Proceedings of the 6th International Workshop on Semantic

Evaluation, SemEval’12, Montréal, 356–364.

Kalchbrenner, N., E. Grefenstette, and P. Blunsom. 2014. A convolutional neural network for

modelling sentences. arXiv preprint arXiv:1404.2188.

Kennedy, A., and D. Inkpen. 2005. Sentiment classiﬁcation of movie and product reviews using

contextual valence shifters. In Proceedings of the Workshop on the Analysis of Informal and

Formal Information Exchange During Negotiations, Ottawa.

Kessler, J.S., and N. Nicolov. 2009. Targeting sentiment expressions through supervised ranking of

linguistic conﬁgurations. In 3rd Int’l AAAI Conference on Weblogs and Social Media (ICWSM

2009).

Kiritchenko, S., and S.M. Mohammad. 2016a. Capturing reliable ﬁne-grained sentiment associa-

tions by crowdsourcing and best–worst scaling. In Proceedings of the 15th Annual Conference

of the North American Chapter of the Association for Computational Linguistics: Human

Language Technologies (NAACL), San Diego.

Kiritchenko, S., and S.M. Mohammad. 2016b. The effect of negators, modals, and degree adverbs

on sentiment composition. In Proceedings of the Workshop on Computational Approaches to

Subjectivity, Sentiment and Social Media Analysis (WASSA).

Kiritchenko, S., and S.M. Mohammad. 2016c. Sentiment composition of words with opposing

polarities. In Proceedings of the 15th Annual Conference of the North American Chapter of

the Association for Computational Linguistics: Human Language Technologies (NAACL), San

Diego.

Kiritchenko, S., S.M. Mohammad, and M. Salameh. 2016. Semeval-2016 task 7: Determining

sentiment intensity of English and Arabic phrases. In Proceedings of the International Workshop

on Semantic Evaluation, SemEval-2016, San Diego.

Kiritchenko, S., X. Zhu, C. Cherry, and S. Mohammad. 2014a. Nrc-canada-2014: Detecting aspects

and sentiment in customer reviews. In Proceedings of the 8th International Workshop on

Semantic Evaluation (SemEval 2014), Dublin, 437–442.

Kiritchenko, S., X. Zhu, and S.M. Mohammad. 2014b. Sentiment analysis of short informal texts.

Journal of Artiﬁcial Intelligence Research 50: 723–762.

Ku, L.-W., Y.-T. Liang, and H.-H. Chen. 2006. Opinion extraction, summarization and tracking in

news and blog corpora. In AAAI Spring Symposium: Computational Approaches to Analyzing

Weblogs, vol. 100107.

Li, W., and H. Xu. 2014. Text-based emotion classiﬁcation using emotion cause extraction. Expert

Systems with Applications 41(4, Part 2): 1742–1749.

Liu, B., and L. Zhang. 2012. A survey of opinion mining and sentiment analysis. In Mining Text

Data, ed. C.C. Aggarwal and C. Zhai, 415–463. New York: Springer.

Liu, J., Y. Cao, C.-Y. Lin, Y. Huang, and M. Zhou. 2007. Low-quality product review detection in

opinion summarization. In EMNLP-CoNLL, 334–342.

Lloret, E., A. Balahur, M. Palomar, and A. Montoyo. 2009. Towards building a competitive

opinion summarization system: challenges and keys. In Proceedings of Human Language

Technologies: The 2009 Annual Conference of the North American Chapter of the Association

for Computational Linguistics, Companion Volume: S, 72–77.

Louviere, J.J. 1991. Best-worst scaling: A model for the largest difference judgments. Working

Paper.

Lu, B., and B.K. Tsou. 2010. Cityu-dac: Disambiguating sentiment-ambiguous adjectives within

context. In Proceedings of the 5th International Workshop on Semantic Evaluation, 292–295.

Ma, C., H. Prendinger, and M. Ishizuka. 2005. Emotion estimation and reasoning based on affective

textual interaction. In First International Conference on Affective Computing and Intelligent

Interaction (ACII-2005), ed. J. Tao, R.W. Picard, Beijing, 622–628.

Màrquez, L., X. Carreras, K.C. Litkowski, and S. Stevenson. 2008. Semantic role labeling: An

introduction to the special issue. Computational Linguistics 34(2): 145–159.

Mihalcea, R., C. Banea, and J. Wiebe. 2007. Learning multilingual subjective language via

cross-lingual projections. In Proceedings of the 45th Annual Meeting of the Association of

Computational Linguistics.

Mihalcea, R., and H. Liu. 2006. A corpus-based approach to ﬁnding happiness. In AAAI-2006

Spring Symposium on Computational Approaches to Analysing Weblogs, 139–144. AAAI

Press.

Mikolov, T., I. Sutskever, K. Chen, G.S. Corrado, and J. Dean. 2013. Distributed representations of

words and phrases and their compositionality. In Advances in Neural Information Processing

Systems, 3111–3119.

Mitchell, J., and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive

Science 34(8): 1388–1429.

Mohammad, S. 2012. Portable features for classifying emotional text. In Proceedings of the 2012

Conference of the North American Chapter of the Association for Computational Linguistics:

Human Language Technologies, Montréal, 587–591.

Mohammad, S., S. Kiritchenko, and X. Zhu. 2013a. Nrc-canada: Building the state-of-the-art

in sentiment analysis of tweets. In Proceedings of the Seventh International Workshop on

Semantic Evaluation Exercises (SemEval-2013), Atlanta.

Mohammad, S., S. Kiritchenko, and X. Zhu. 2013b. NRC-Canada: Building the state-of-the-art

in sentiment analysis of tweets. In Proceedings of the International Workshop on Semantic

Evaluation, SemEval’13, Atlanta.

Mohammad, S., and T. Yang. 2011. Tracking sentiment in mail: How genders differ on emotional

axes. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and

Sentiment Analysis (WASSA 2011), Portland, 70–79.

Mohammad, S.M. 2012. #emotional tweets. In Proceedings of the First Joint Conference on

Lexical and Computational Semantics – Volume 1: Proceedings of the Main Conference and

the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic

Evaluation, SemEval’12, Stroudsburg, 246–255.

Mohammad, S.M. 2016a. A practical guide to sentiment annotation: Challenges and solutions.

In Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and

Social Media Analysis.

Mohammad, S.M. 2016b. Sentiment analysis: Detecting valence, emotions, and other affectual

states from text. In Emotion Measurement, ed. H. Meiselman. Amsterdam: Elsevier.

Mohammad, S.M., S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry. 2016a. Semeval-2016

task 6: Detecting stance in tweets. In Proceedings of the International Workshop on Semantic

Evaluation, SemEval’16, San Diego.

Mohammad, S.M., P. Sobhani, and S. Kiritchenko. 2016b, In Press. Stance and Sentiment in

Tweets. Special Section of the ACM Transactions on Internet Technology on Argumentation

in Social Media.

Mohammad, S.M., and P.D. Turney. 2010. Emotions evoked by common words and phrases:

Using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL-HLT 2010

Workshop on Computational Approaches to Analysis and Generation of Emotion in Text,

California.

Mohammad, S.M., and P.D. Turney. 2013. Crowdsourcing a word-emotion association lexicon.

Computational Intelligence 29(3): 436–465.

Mohammad, S.M., X. Zhu, S. Kiritchenko, and J. Martin. 2015a. Sentiment, emotion, purpose, and

style in electoral tweets. Information Processing & Management 51: 480–499.

Mohammad, S.M., X. Zhu, S. Kiritchenko, and J. Martin. 2015b. Sentiment, emotion, purpose, and

style in electoral tweets. Information Processing and Management 51(4): 480–499.

Murakami, Akiko, and Rudy Raymond. 2010. Support or oppose?: Classifying positions in online

debates from reply activities and opinion expressions. In Proceedings of the 23rd international

conference on computational linguistics, Beijing.

Nahar, V., S. Unankard, X. Li, and C. Pang. 2012. Sentiment analysis for effective detection of

cyber bullying. In Web Technologies and Applications, 767–774. Berlin/Heidelberg: Springer.

Nalisnick, E.T., and H.S. Baird. 2013a. Character-to-character sentiment analysis in Shakespeare’s

plays. In Proceedings of the 51st annual meeting of the association for computational

linguistics (ACL), Short Paper, Soﬁa, 479–483, Aug 2013.

Nalisnick, E.T., and H.S. Baird. 2013b. Extracting sentiment networks from Shakespeare’s plays.

In 2013 12th International Conference on Document Analysis and Recognition (ICDAR), 758–

762. IEEE.

Neviarouskaya, A., H. Prendinger, and M. Ishizuka. 2009. Compositionality principle in

recognition of fine-grained emotions from text. In Proceedings of the Third

International Conference on Weblogs and Social Media (ICWSM-09), 278–281, San Jose.

O’Connor, B., R. Balasubramanyan, B.R. Routledge, and N.A. Smith. 2010. From tweets to polls:

Linking text sentiment to public opinion time series. In Proceedings of the International AAAI

Conference on Weblogs and Social Media.

Orme, B. 2009. Maxdiff analysis: Simple counting, individual-level logit, and HB. Orem: Sawtooth

Software, Inc.

Pak, A., and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining.

In Proceedings of the 7th Conference on International Language Resources and Evaluation,

LREC’10, Valletta.

Palmer, M., D. Gildea, and N. Xue. 2010. Semantic role labeling. Synthesis Lectures on Human

Language Technologies 3(1): 1–103.

Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in

Information Retrieval 2(1–2): 1–135.

Pontiki, M., D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar.

2014a. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of the Interna-

tional Workshop on Semantic Evaluation, SemEval’14, Dublin.

Pontiki, M., H. Papageorgiou, D. Galanis, I. Androutsopoulos, J. Pavlopoulos, and S. Manandhar.

2014b. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th

International Workshop on Semantic Evaluation, SemEval’14, Dublin.

Popescu, A.-M., and O. Etzioni. 2005. Extracting product features and opinions from reviews. In

Proceedings of the Conference on Human Language Technology and Empirical Methods in

Natural Language Processing, HLT’05, Stroudsburg, 339–346.

Qadir, A. 2009. Detecting opinion sentences speciﬁc to product features in customer reviews using

typed dependency relations. In Proceedings of the Workshop on Events in Emerging Text Types,

eETTs’09, Stroudsburg, 38–43.

Ravaja, N., T. Saari, M. Turpeinen, J. Laarni, M. Salminen, and M. Kivikangas. 2006. Spatial

presence and emotions during video game playing: Does it matter with whom you play?

Presence: Teleoperators and Virtual Environments 15(4): 381–392.

Reyes, A., P. Rosso, and T. Veale. 2013. A multidimensional approach for detecting irony in twitter.

Language Resources and Evaluation 47(1): 239–268.

Rosenthal, S., P. Nakov, S. Kiritchenko, S. Mohammad, A. Ritter, and V. Stoyanov. 2015. SemEval-

2015 task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop

on Semantic Evaluation, SemEval’15, Denver, 450–462.

Rosenthal, S., P. Nakov, A. Ritter, and V. Stoyanov. 2014a. SemEval-2014 Task 9: Sentiment

analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation,

ed. P. Nakov, and T. Zesch, SemEval-2014, Dublin.

Rosenthal, S., A. Ritter, P. Nakov, and V. Stoyanov. 2014b. SemEval-2014 Task 9: Sentiment

analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation

(SemEval 2014), Dublin, 73–80.

Rudolph, S., and E. Giesbrecht. 2010. Compositional matrix-space models of language. In

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics,

907–916.

Russell, J.A. 1980. A circumplex model of affect. Journal of Personality and Social Psychology

39(6): 1161.

Salameh, M., S.M. Mohammad, and S. Kiritchenko. 2015. Sentiment after translation: A case-

study on Arabic social media posts. In Proceedings of the North American Chapter of

Association of Computational Linguistics, Denver.

Schwartz, H., J. Eichstaedt, M. Kern, L. Dziurzynski, R. Lucas, M. Agrawal, G. Park, et al.

2013. Characterizing geographic variation in well-being using tweets. In Proceedings of the

International AAAI Conference on Weblogs and Social Media.

Severyn, A., and A. Moschitti. 2015. Unitn: Training deep convolutional neural network for

twitter sentiment classiﬁcation. In Proceedings of the 9th International Workshop on Semantic

Evaluation (SemEval 2015), 464–469. Denver: Association for Computational Linguistics.

Socher, R., B. Huval, C.D. Manning, and A.Y. Ng. 2012. Semantic compositionality through

recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in

Natural Language Processing, EMNLP’12, Jeju.

Socher, R., A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, and C. Potts. 2013. Recur-

sive deep models for semantic compositionality over a sentiment treebank. In Proceedings of

the Conference on Empirical Methods in Natural Language Processing, EMNLP’13, Seattle.

Somasundaran, Swapna, and Janyce Wiebe. 2009. Recognizing stances in online debates. In

Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th

international joint conference on natural language processing of the AFNLP, Singapore.

Somprasertsri, G., and P. Lalitrojwong. 2010. Mining feature-opinion in online customer reviews

for opinion summarization. Journal of Universal Computer Science 16(6): 938–955.

Sridhar, Dhanya, Lise Getoor, and Marilyn Walker. 2014. Collective stance classiﬁcation of posts

in online debate forums. In Proceedings of the 52nd annual meeting of the association for

computational linguistics, Baltimore.

Stone, P., D.C. Dunphy, M.S. Smith, D.M. Ogilvie, and associates. 1966. The General Inquirer: A

Computer Approach to Content Analysis. Cambridge, MA: The MIT Press.

Stoyanov, V., and C. Cardie. 2006. Toward opinion summarization: Linking the sources. In

Proceedings of the Workshop on Sentiment and Subjectivity in Text, 9–14.

Strapparava, C., and R. Mihalcea. 2007. Semeval-2007 Task 14: Affective text. In Proceedings of

SemEval-2007, Prague, 70–74.

Su, Q., K. Xiang, H. Wang, B. Sun, and S. Yu. 2006. Using pointwise mutual information

to identify implicit features in customer reviews. In Proceedings of the 21st International

Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research

Challenges Ahead, ICCPOL’06, 22–30. Berlin/Heidelberg: Springer.

Taboada, M., J. Brooke, M. Toﬁloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for

sentiment analysis. Computational Linguistics 37(2): 267–307.

Taboada, M., K. Voll, and J. Brooke. 2008. Extracting sentiment as a function of discourse structure

and topicality. Simon Fraser University School of Computing Science Technical Report.

Thelwall, M., K. Buckley, and G. Paltoglou. 2011. Sentiment in Twitter events. Journal of the

American Society for Information Science and Technology 62(2): 406–418.

Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition

from congressional ﬂoor-debate transcripts. In Proceedings of the 2006 conference on empirical

methods in natural language processing. Sydney: Association for Computational Linguistics.

Tokuhisa, R., K. Inui, and Y. Matsumoto. 2008. Emotion classiﬁcation using massive examples

extracted from the web. In Proceedings of the 22nd International Conference on Computational

Linguistics – Volume 1, COLING’08, 881–888.

Turney, P., and M.L. Littman. 2003. Measuring praise and criticism: Inference of semantic

orientation from association. ACM Transactions on Information Systems 21(4): 315–346.

Turney, P.D. 2014. Semantic composition and decomposition: From recognition to generation.

arXiv preprint arXiv:1405.7908.

Veale, T., and Y. Hao. 2010. Detecting ironic intent in creative comparisons. ECAI 215: 765–770.

Velásquez, J.D. 1997. Modeling emotions and other motivations in synthetic agents. In Proceedings

of the Fourteenth National Conference on Artiﬁcial Intelligence and Ninth Conference on

Innovative Applications of Artiﬁcial Intelligence, AAAI’97/IAAI’97, 10–15. AAAI Press.

Walker, Marilyn A., et al. 2012. A corpus for research on deliberation and debate. In proceedings

of the eighth international conference on language resources and evaluation (LREC), Istanbul.

Wan, X. 2008. Using bilingual knowledge and ensemble techniques for unsupervised Chinese

sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural

Language Processing, EMNLP’08, 553–561.

Wang, C., and F. Wang. 2012. A bootstrapping method for extracting sentiment words using degree

adverb patterns. In 2012 International Conference on Computer Science and Service System

(CSSS), 2173–2176. IEEE.

Wilson, T., Z. Kozareva, P. Nakov, S. Rosenthal, V. Stoyanov, and A. Ritter. 2013. SemEval-

2013 Task 2: Sentiment analysis in Twitter. In Proceedings of the International Workshop on

Semantic Evaluation, SemEval’13, Atlanta.

Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level

sentiment analysis. In Proceedings of the Conference on Human Language Technology and

Empirical Methods in Natural Language Processing, HLT’05, Stroudsburg, 347–354.

Xu, G., C.-R. Huang, and H. Wang. 2013. Extracting Chinese product features: Representing a

sequence by a set of skip-bigrams. In Proceedings of the 13th Chinese Conference on Chinese

Lexical Semantics, CLSW’12, 72–83. Berlin/Heidelberg: Springer.

Xu, R., K.-F. Wong, Q. Lu, Y. Xia, and W. Li. 2008. Learning knowledge from relevant webpage

for opinion analysis. In IEEE/WIC/ACM International Conference on Web Intelligence and

Intelligent Agent Technology, 2008. WI-IAT’08., vol. 1, 307–313. IEEE.

Yessenalina, A., and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis.

In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 172–

182.

Zarrella, G., and A. Marsh. 2016. MITRE at SemEval-2016 Task 6: Transfer learning for Stance

detection. In Proceedings of the International Workshop on Semantic Evaluation, SemEval’16,

San Diego.

Zhang, C., D. Zeng, Q. Xu, X. Xin, W. Mao, and F.-Y. Wang. 2008. Polarity classiﬁcation

of public health opinions in Chinese. In Intelligence and Security Informatics, 449–454.

Berlin/Heidelberg: Springer.

Zhang, L., and B. Liu. 2011. Identifying noun product features that imply opinions. In Proceedings

of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language

Technologies: Short Papers – Volume 2, HLT’11, 575–580.

Zhang, L., B. Liu, S.H. Lim, and E. O’Brien-Strain. 2010. Extracting and ranking product features

in opinion documents. In Proceedings of the 23rd International Conference on Computational

Linguistics: Posters, COLING’10, Stroudsburg, 1462–1470.

Zhe, X., and A. Boucouvalas. 2002. Text-to-Emotion Engine for Real Time Internet Communication,

164–168.

Zhu, X., H. Guo, S. Mohammad, and S. Kiritchenko. 2014. An empirical study on the effect of

negation words on sentiment. In Proceedings of the 52nd Annual Meeting of the Association

for Computational Linguistics (Volume 1: Long Papers), Baltimore, 304–313.

Chapter 5

Sentiment Resources: Lexicons and Datasets

Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire

Abstract Sentiment lexicons and datasets represent the knowledge base that lies

at the foundation of an SA system. In its simplest form, a sentiment lexicon is

a repository of words/phrases labelled with sentiment. Similarly, a sentiment-

annotated dataset consists of documents (tweets, sentences or longer documents)

labelled with one or more sentiment labels. This chapter explores the philosophy,

execution and utility of popular sentiment lexicons and datasets. We describe

different labelling schemes that may be used. We then provide a detailed description

of existing sentiment and emotion lexicons, and the trends underlying research in

lexicon generation. This is followed by a survey of sentiment-annotated datasets

and the nuances of labelling involved. We then show how lexicons and datasets

created for one language can be transferred to a new language. Finally, we place

these sentiment resources in the perspective of their classic applications to sentiment

analysis.

Keywords Sentiment lexicons • Sentiment datasets • Evaluation • Transfer

learning

The previous chapter shows that sentiment analysis (SA) is more challenging than it
seems. The next question that arises is: where does the program ‘learn’ the sentiment
from? In other words, where does the knowledge required for an SA system come from?
This chapter discusses sentiment resources as the means of meeting this knowledge
requirement. We refer to words/phrases and documents as ‘textual units’. In sentiment
resources, it is these textual units that are annotated with sentiment information.

A. Joshi ()

IITB-Monash Research Academy, Mumbai, India

e-mail: adityaj@cse.iitb.ac.in

P. Bhattacharyya • S. Ahire

IIT Bombay, Mumbai, India

e-mail: pb@cse.iitb.ac.in

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_5

5.1 Introduction

Sentiment resources, i.e., lexicons and datasets, represent the knowledge base of an SA
system. Thus, creation of a sentiment lexicon or a dataset is a fundamental requirement
of an SA system. In the case of a lexicon, the knowledge is in the form of simpler units
like words and phrases, whereas in the case of datasets, it consists of comparatively
longer text. There exists a wide spectrum of such resources that can be used for
sentiment/emotion analysis. Before we proceed, we reiterate the definitions of sentiment
and emotion analysis. We refer to sentiment analysis as a positive/negative/neutral
classification task, whereas emotion analysis deals with a wider spectrum of emotions
such as angry, excited, etc. A discussion of both sentiment and emotion lexicons is
imperative to show how different the philosophies behind their construction are.

A sentiment resource is a repository of textual units marked with one or

more labels representing a sentiment state. This means that there are two driving

components of a sentiment resource: (a) the textual unit, and (b) the labels. We

discuss the second component, labels, in detail in Sect. 5.2.

In case of a sentiment lexicon, the lexical unit may be a word, a phrase or a

concept from a general purpose lexicon like WordNet. What constitutes the labels

is also important. The set of labels may be purely functional, i.e., task-based. For a

simple positive-negative classification, it is often sufficient to have a set of positive

and negative words. If the goal is a system that gives ‘magnitude’ (‘The movie was

horrible’ is more strongly negative than ‘The movie was bad’), then the lexicon

needs to capture that information in terms of a magnitude in addition to positive and

negative words.
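
The difference between a polarity-only lexicon and one that also records magnitude can be sketched in a few lines of Python. This is a minimal illustration, not a description of any published lexicon: the class name `LexiconEntry`, the dictionary `TOY_LEXICON`, the function `score` and all magnitude values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    polarity: str     # "positive" or "negative"
    magnitude: float  # strength of the sentiment; values below are invented


# Hypothetical toy lexicon. The words come from the example sentences in the
# text; the magnitudes are made-up numbers chosen only for illustration.
TOY_LEXICON = {
    "good": LexiconEntry("positive", 0.5),
    "bad": LexiconEntry("negative", 0.5),
    "horrible": LexiconEntry("negative", 0.9),
}


def score(sentence: str) -> float:
    """Sum the signed magnitudes of all lexicon words found in the sentence."""
    total = 0.0
    for token in sentence.lower().split():
        entry = TOY_LEXICON.get(token.strip(".,!?\"'"))
        if entry is not None:
            sign = 1.0 if entry.polarity == "positive" else -1.0
            total += sign * entry.magnitude
    return total
```

With only positive and negative word lists, ‘The movie was horrible’ and ‘The movie was bad’ would be indistinguishable; with magnitudes, `score` ranks the first sentence as more strongly negative than the second.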

An annotated dataset consists of documents labelled with one or more output labels. As
in the case of sentiment lexicons, the two driving components of a sentiment-annotated
dataset are: (a) the textual unit, and (b) the labels. For example, a dataset may consist
of a set of movie reviews (the textual units) annotated by human annotators as positive
or negative (the labels). Datasets often contain additional annotation that enriches the
sentiment labels. For example, a dataset of restaurant reviews annotated with sentiment
may contain additional annotation in the form of restaurant location. Such annotation
may facilitate insights such as which restaurant is the most popular, or which issues
people complain about most at a particular outlet of a restaurant.
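
As a rough sketch of how additional annotation supports such insights, the Python snippet below stores each review with its sentiment label and a location, and then computes the share of positive reviews per location. The record type `AnnotatedReview`, the sample reviews and the outlet names are all hypothetical and serve only as an illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedReview:
    text: str       # the textual unit
    sentiment: str  # the label: "positive" or "negative"
    location: str   # additional annotation (hypothetical outlet name)


# Invented sample data for illustration only.
REVIEWS = [
    AnnotatedReview("Loved the pasta.", "positive", "Downtown"),
    AnnotatedReview("Service was slow.", "negative", "Downtown"),
    AnnotatedReview("Great desserts.", "positive", "Airport"),
    AnnotatedReview("Friendly staff.", "positive", "Airport"),
]


def positive_share_by_location(reviews):
    """Return {location: fraction of that location's reviews labelled positive}."""
    totals, positives = Counter(), Counter()
    for review in reviews:
        totals[review.location] += 1
        if review.sentiment == "positive":
            positives[review.location] += 1
    return {loc: positives[loc] / totals[loc] for loc in totals}
```

Taking the location with the highest share is one simple reading of ‘most popular’; other definitions (for example, the raw count of positive reviews) are equally plausible.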

5.2 Labels

A set of labels is the pre-determined set of attributes that each textual unit in a

sentiment resource will be annotated with. The process of assigning a label to a

textual unit is called annotation, and in case the label pertains to sentiment, the

process is called sentiment annotation. The goal of sentiment annotation is to assign

labels under one of three schemes: absolute, overlapping and fuzzy. The first two

are described in Liu (2010).

Absolute labelling is when a textual unit is marked with exactly one of multiple labels.
An example of absolute labelling is positive versus negative, where each document is
annotated as either positive or negative. An additional label ‘neutral’ may be added. A
fallback label such as ‘ambiguous’/‘unknown’/‘unsure’ may also be introduced. Numeric
schemes that allow labels to range between, say, −5 and +5 also fall under this method
of labelling.

Labels can be overlapping as well. A typical example of this is emotion labels. Emotions
are more complex than sentiment, because more than one emotion can be present at a time.
For example, the sentence “Was happy to bump into my friend at the airport this
afternoon.” would be labelled as positive in a sentiment-annotated dataset. On the other
hand, an emotion annotation would require two labels to be assigned to this text:
happiness and surprise. Emotions can, in fact, be thought of as arising from a
combination of emotions and their magnitudes. This means that while positive and
negative are mutually exclusive, emotions need not be. In such cases, each emotion must
be viewed as a Boolean attribute. This means that the word ‘amazed’ will be marked as
‘happy: yes, surprised: yes’ in an emotion lexicon, whereas the same word will be marked
as ‘positive’ in a sentiment lexicon. By definition, a word marked positive is not
negative.

Finally, the third scheme of labelling is fuzzy, where a distribution over different
labels is assigned to a textual unit. Consider the case where we assign a distribution
over ‘positive/negative’ as a label. Such a distribution indicates the likelihood that
the textual unit belongs to each label. For example, a word with ‘positive:0.8,
negative:0.2’ tends to occur more frequently in a positive sense; however, it is not
completely positive and may still be used in the negative sense to an extent.
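
The three labelling schemes can be contrasted as data structures. The following Python sketch is one possible encoding under stated assumptions: the variable and function names are invented, `some_word` is a placeholder, and the extra `sad` attribute is added only to show a Boolean attribute that is switched off.

```python
# Absolute: exactly one label per textual unit.
absolute_label = {"amazed": "positive"}

# Overlapping: every label is an independent Boolean attribute, so several
# can hold at once ('amazed' is both happy and surprised in the text).
overlapping_label = {"amazed": {"happy": True, "surprised": True, "sad": False}}

# Fuzzy: a distribution over labels; 'some_word' is a placeholder and the
# numbers are the ones used in the text's example.
fuzzy_label = {"some_word": {"positive": 0.8, "negative": 0.2}}


def is_valid_absolute(label, allowed):
    """An absolute label must be a single member of the allowed label set."""
    return label in allowed


def active_labels(flags):
    """List the labels that are switched on in an overlapping annotation."""
    return sorted(name for name, on in flags.items() if on)


def is_valid_fuzzy(dist, tol=1e-9):
    """A fuzzy label must be a probability distribution over the labels."""
    non_negative = all(p >= 0 for p in dist.values())
    return non_negative and abs(sum(dist.values()) - 1.0) <= tol
```

The validity checks make the distinction concrete: an absolute annotation carries one value, an overlapping annotation carries one yes/no flag per label, and a fuzzy annotation carries weights that must sum to one.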

Several linguistic studies have explored what constitutes basic labels for a

sentiment resource. In the next subsections, we look at three strategies.

5.2.1 Stand-Alone Labels

A sentiment resource may use two labels: positive or negative. The granularity

can be increased to strongly positive, moderately positive and so on. A positive

unit represents a desirable state, whereas a negative unit represents an undesirable

state (Liu 2010). Emotion labels are more nuanced. Basic emotions are a list of

emotions that are fundamental to human experience. Whether or not there are any

basic emotions at all, and whether it is worthwhile to discover these basic emotions

has been a matter of disagreement. Ortony and Turner (1990) state that the basic

emotion approach (i.e., stating that there are basic emotions and other emotions

evolve from them) is flawed, while Ekman (1992) supports the basic emotion theory.

Several basic emotions have been suggested. Ekman suggests six basic emotions:

anger, disgust, fear, sadness, happiness and surprise. Plutchik has listed eight basic

emotions: six from Ekman’s list in addition to anticipation and trust (Plutchik 1980).

5.2.2 Dimensions

Sentiment has been defined by Liu (2010) as a 5-tuple: <sentiment-holder,
sentiment-target, sentiment-target-aspect, sentiment, sentiment-time>. This means that
sentiment in a textual unit can be captured accurately only if information along all
five dimensions is obtained. Similarly, emotions can be described along two dimensions:
valence and arousal (Mehrabian and Russell 1974). Valence indicates whether an emotion
is pleasant or unpleasant. Arousal indicates the magnitude of an emotion. Happy and
excited are two forms of a pleasant emotion, but they differ along the arousal axis.
Excitement indicates a state where a person is happy, but aroused to a great degree. On
the other hand, calm and content, while still being pleasant emotions, represent a
deactivated state. Corresponding emotions on the unpleasant side (the left half of the
valence-arousal plane) are sad, stressed, bored and fatigued. In such a case,
overlapping labelling must be used. A resource annotated using a dimensional structure
will assign a value per dimension to each textual unit.
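
A dimensional annotation of emotions can be sketched as one value per dimension. In the Python snippet below, the type `AffectPoint`, the function `quadrant` and every coordinate are assumptions made for illustration; the text only states which emotions are pleasant or unpleasant and that excited is highly aroused while calm is deactivated, so placing ‘stressed’ as activated and ‘bored’ as deactivated is an additional assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectPoint:
    valence: float  # below 0: unpleasant, above 0: pleasant
    arousal: float  # below 0: deactivated, above 0: activated


# Illustrative coordinates only; they are not taken from any published lexicon.
EMOTIONS = {
    "excited": AffectPoint(0.7, 0.8),
    "calm": AffectPoint(0.6, -0.6),
    "stressed": AffectPoint(-0.6, 0.7),
    "bored": AffectPoint(-0.5, -0.6),
}


def quadrant(point: AffectPoint) -> str:
    """Name the quadrant of the valence-arousal plane that a point falls in."""
    pleasantness = "pleasant" if point.valence >= 0 else "unpleasant"
    activation = "activated" if point.arousal >= 0 else "deactivated"
    return f"{pleasantness}/{activation}"
```

Under this encoding, ‘excited’ and ‘calm’ share the sign of their valence and differ only in arousal, which mirrors the distinction drawn in the text.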

5.2.3 Structures

Plutchik wheel of emotions (Plutchik 1982) is a popular structure that represents

basic emotions, and emotions that arise as a combination of these emotions. It

combines the notion of basic emotions, along with arousal as seen in case of

emotion dimensions. The basic emotions according to Plutchik’s wheel are joy,

trust, fear, surprise, anticipation, sadness, disgust, anger and anticipation. The basic

emotions are arranged in a circular manner to indicate antonymy. The opposite of

‘joy’ is placed diametrically opposite to it: ‘sadness’. Similarly, ‘anticipation’ lies

diametrically opposite to ‘surprise’. Each ‘petal’ of the wheel indicates the arousal

of the emotion. The emotion ‘joy’ has ‘serenity’ above it and ‘ecstasy’ below it.

These emotions indicate a deactivated and activated state of arousal respectively.

Similarly, an aroused state of ‘anger’ becomes ‘rage’. Thus, the eight emotions in the

central circle are the aroused forms of the basic emotions. These are: rage, loathing,

grief, amazement, terror, admiration, ecstasy and vigilance. The wheel also allows

combination of emotions to create more nuanced emotions. A resource annotated

using a structure such as the Plutchik wheel of emotions will place every textual

unit in the space represented by the structure.
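A small sketch of how the wheel might be encoded as a lookup structure follows. The intense forms and the joy/sadness and anticipation/surprise pairs are taken from the description above; the trust/disgust and fear/anger pairs are the remaining opposites on the standard wheel. The function name and the returned dictionary layout are assumptions for illustration.

```python
# Opposite basic emotions on the wheel.
OPPOSITE_PAIRS = [
    ("joy", "sadness"),
    ("trust", "disgust"),
    ("fear", "anger"),
    ("surprise", "anticipation"),
]

# Aroused (central circle) form of each basic emotion.
INTENSE_FORM = {
    "anger": "rage",
    "disgust": "loathing",
    "sadness": "grief",
    "surprise": "amazement",
    "fear": "terror",
    "trust": "admiration",
    "joy": "ecstasy",
    "anticipation": "vigilance",
}

# Build a symmetric lookup so that either member of a pair can be queried.
OPPOSITE = {}
for first, second in OPPOSITE_PAIRS:
    OPPOSITE[first] = second
    OPPOSITE[second] = first


def describe(emotion, aroused=False):
    """Place a basic emotion on the wheel (hypothetical helper)."""
    label = INTENSE_FORM[emotion] if aroused else emotion
    return {"label": label, "opposite": OPPOSITE[emotion]}


print(describe("anger", aroused=True))
print(describe("joy"))
```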

5.3 Lexicons

We now discuss sentiment lexicons: we describe them individually ﬁrst, and then

show trends in lexicon generation. Words/phrases have two kinds of sentiment,

as given in Liu (2010): absolute and relative. Absolute sentiment means that

the sentiment remains the same, given the right word/phrase and meaning. For

example, the word ‘beautiful’ is a positive word. Relative sentiment means that the

sentiment changes depending on the context. For example, the word ‘increased’ or

‘fuelled’ has a positive/negative sentiment based on what the object of the word is.

There exists a third category of sentiment: implicit sentiment. Implicit sentiment

is different from absolute sentiment. Implicit sentiment is the sentiment that is

commonly invoked in the mind of a reader when he/she reads that word/phrase.

Consider the example ‘amusement parks’. A reader typically experiences positive

sentiment on reading this word. Similarly, the phrase ‘waking up in the middle of

the night’ does involve an implicit negative sentiment.

Currently, most sentiment lexicons limit themselves to absolute sentiment words.

Extraction of implicit sentiment in phrases forms a different branch of work.

However, there exist word association lexicons that capture implied sentiment in

words (Mohammad and Turney 2010). We stick to this deﬁnition as well, and

discuss sentiment and emotion lexicons that capture absolute sentiment.

5.3.1 Sentiment Lexicons

Early development of sentiment lexicons focused on creation of sentiment dictio-

naries. Stone et al. (1966) present a lexicon called ‘General Inquirer’ that has been

widely used for sentiment analysis. Finn (2011) present a lexicon called AFINN.

Like General Inquirer, it is also a manually generated lexicon. To show the general

methodology underlying sentiment lexicons, we describe some popular sentiment

lexicons in the forthcoming subsections.

5.3.1.1 SentiWordNet

SentiWordNet, described ﬁrst by Esuli and Sebastiani (2006), is a sentiment lexicon

which augments WordNet (Miller 1995) with sentiment information. The labelling

is fuzzy, and is done by adding three sentiment scores to each synset in the WordNet

as follows. Every synset s has three scores:

1. Pos(s): The positive score of synset s

2. Neg(s): The negative score of synset s

3. Obj(s): The objective score of synset s

Thus, in SentiWordNet, sentiment is associated with the meaning of a word rather

than the word itself. This representation allows a word to have multiple sentiments

corresponding to each meaning. Because there are three scores, each meaning in

itself can be both positive and negative, or neither positive nor negative.

The process of SentiWordNet creation is an expansion of the approach used

for the three-class sentiment classiﬁcation to handle graded sentiment values. The

algorithm to create SentiWordNet can be summarized as:

1. Selection of Seed Set: Seed sets L_p and L_n, consisting of ‘paradigmatic’

positive and negative synsets respectively, were created. Each synset was represented

using the TDS. This representation converted words in the synset, its WordNet

deﬁnition and the sample phrases together with explicit labels for negation into

vectors.

2. Creation of Training Set: This seed set was expanded for k iterations using

the following relations of WordNet: Direct antonymy, Similarity, Derived from,

Pertains to, Attribute and Also see. These were the relations hypothesized to

preserve or invert the associated sentiment. After k iterations of expansion, this

gave rise to the sets Tr_p^k and Tr_n^k. The objective set L_o = Tr_o^k was

assumed to consist of all the synsets that did not belong to Tr_p^k or Tr_n^k.

3. Creation of Classiﬁers: A classiﬁer can be deﬁned as a combination of a learning

algorithm and a training set. In addition to the two choices of learning algorithms

(SVM and Rocchio), four different training sets were constructed with the

number of iterations of expansion k = 0, 2, 4, 6. The size of the training set

increased substantially with an increase in k. As a result, low values of k yielded

classiﬁers with low recall but high precision, while higher k led to high recall

but low precision. This gave 8 ternary classifiers in total, one for each of the

combinations of the 2 learners and 4 training sets. Each ternary classiﬁer was

made up of two binary classiﬁers, positive vs. not positive and negative vs. not

negative.

4. Synset Scoring: Each synset from the WordNet was vectorized and given to the

committee of ternary classiﬁers as test input. Depending upon the output of the

classiﬁers, each synset was assigned sentiment scores by dividing the count of

classiﬁers that give a label by the total number of classiﬁers (8).
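The scoring step (step 4) can be sketched as follows. The function name and the example votes are hypothetical; the only thing taken from the description above is that each score is the fraction of the committee's classifiers that output the corresponding label.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")


def synset_scores(committee_labels):
    """Turn the labels output by a committee of ternary classifiers
    into Pos, Neg and Obj scores for one synset."""
    total = len(committee_labels)
    counts = Counter(committee_labels)
    return {label: counts[label] / total for label in LABELS}


# Hypothetical output of the 8 ternary classifiers for one synset.
votes = ["positive"] * 5 + ["objective"] * 2 + ["negative"]
scores = synset_scores(votes)
print(scores)  # {'positive': 0.625, 'negative': 0.125, 'objective': 0.25}
```

Because every classifier outputs exactly one label, the three scores of a synset always sum to 1.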

5.3.1.2 SO-CAL

The Semantic Orientation CALculator (SO-CAL) system (Brooke et al. 2009) is based

on a manually constructed low-coverage resource made up of raw words. Unlike

SentiWordNet, there is no sense information associated with a word. SO-CAL

uses as its basis a lexical sentiment resource consisting of about 5000 words. (In

comparison, SentiWordNet has over 38,000 polar words and several other strictly

objective words.) Each word in SO-CAL has a sentiment label which is an integer

in [-5, +5] apart from 0, as objective words are simply excluded. The strengths

of SO-CAL lie in its accuracy, as it is manually annotated, and the use of detailed

features that handle sentiment in various cases in ways conforming to linguistic

phenomena.

SO-CAL uses several ‘features’ to model different word categories and the

effects they have on sentiment. In addition, a few special features operate outside the

scope of the lexicon in order to affect the sentiment on the document level. These

are some of the features of SO-CAL:

1. Adjectives: A manual dictionary of adjectives was created by manually tagging

all adjectives in a 500-document multidomain review corpus, and the terms from

the General Inquirer dictionary were annotated and added to the list thus obtained.

2. Nouns, Verbs and Adverbs: SO-CAL also extended the approach used for

adjectives to nouns and verbs. As a result, 1142 nouns and 903 verbs were added

to the sentiment lexicon. Adverbs were added by simply adding the -ly sufﬁx to

adjectives and then manually altering words whose sentiment was not preserved,

such as essentially. In addition, multi-word expressions were also added, leading

to an addition of 152 multiwords in the lexicon. Thus, while the adjective ‘funny’

has a sentiment of +2, the multiword ‘act funny’ has a sentiment of -1.

3. Intensiﬁers and Downtoners: An Intensiﬁer is a word which increases the

intensity of the phrase to which it is applied, while a Downtoner is a word which

decreases the intensity of the phrase to which it is applied. For instance the word

‘extraordinarily’ in the phrase ‘extraordinarily good’ is an intensiﬁer while the

word ‘somewhat’ in the phrase ‘somewhat nice’ is a downtoner.
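To make the effect of intensifiers and downtoners concrete, here is a minimal sketch in which a modifier scales the score of the sentiment word that follows it. The word scores, the modifier percentages and the function name are all hypothetical values chosen for illustration; they are not the entries or the exact rules of SO-CAL.

```python
# Hypothetical lexicon entries: word -> sentiment score in [-5, +5].
WORD_SCORES = {"good": 3, "nice": 2}

# Hypothetical modifiers: word -> relative change applied to the next
# sentiment word (+0.5 means 50% stronger, -0.3 means 30% weaker).
MODIFIERS = {"extraordinarily": 0.5, "somewhat": -0.3}


def phrase_score(tokens):
    """Sum the scores of sentiment words, scaled by preceding modifiers."""
    total = 0.0
    factor = 1.0
    for token in tokens:
        word = token.lower()
        if word in MODIFIERS:
            factor *= 1.0 + MODIFIERS[word]
        elif word in WORD_SCORES:
            total += WORD_SCORES[word] * factor
            factor = 1.0  # a modifier only affects the next sentiment word
    return total


print(phrase_score(["extraordinarily", "good"]))
print(phrase_score(["somewhat", "nice"]))
print(phrase_score(["nice"]))
```

Under these assumed values, the intensified phrase scores higher than the bare word 'good' and the downtoned phrase scores lower than the bare word 'nice'.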

5.3.1.3 Sentiment Treebank & Associated Lexicon

This Treebank was introduced in Socher et al. (2013). In order to create the

Treebank, the work also came up with a lexicon called the Sentiment Treebank,

which is a lexicon consisting of partial parse trees annotated with sentiment.

The lexicon was created as follows. A movie review corpus was obtained

from www.rottentomatoes.com, consisting of 10,662 sentences. Each sentence was

parsed using the Stanford Parser. This gave a parse tree for each sentence. The parse

trees were split into phrases, i.e., each parse tree was split into its components,

each of which was then output as a phrase. This gave rise to 215,154 phrases.

Each of these phrases was tagged for sentiment using Amazon Mechanical Turk’s

interface. The selection of labels is also described in the original paper. Initially,

the granularity of the sentiment values was 25, i.e., 25 possible values could be

given for the sentiment, but it was observed from the data from the Mechanical

Turk experiment that most responses contained one of only 5 values. These

5 values were then called ‘very positive’, ‘positive’, ‘neutral’, ‘negative’ and ‘very

negative’.
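The step of splitting a parse tree into phrases can be sketched as follows. The tree here is a hand-written nested tuple standing in for parser output, and the function names are hypothetical; the sketch only shows that every node of the tree, from single words up to the full sentence, yields one phrase to be annotated.

```python
def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree:
        words.extend(leaves(child))
    return words


def phrases(tree):
    """Return the text spanned by every node of the tree."""
    result = [" ".join(leaves(tree))]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(phrases(child))
    return result


# Hand-written binary tree for 'the movie was very good' (illustrative only).
tree = (("the", "movie"), ("was", ("very", "good")))
for phrase in phrases(tree):
    print(phrase)
```

A five-word sentence with a binary tree already yields nine phrases, which is consistent with the corpus above growing from 10,662 sentences to 215,154 phrases.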

Table 5.1 Summary of sentiment lexicons

| Lexicon | Approach | Lexical unit | Labels | Observation |
|---|---|---|---|---|
| SO-CAL | Manual | Word | Integer in [-5, +5] | Performance can be improved by incorporating linguistic features even with low coverage |
| SentiWordNet | Automatic | WordNet synset | 3 fractional values Pos, Neg, Obj in [0, 1] | WordNet captures senses. Different senses may have different sentiment. |
| Sentiment Treebank | Manual, crowdsourced | Phrase | 5 labels ranging from “very negative” to “very positive” | Crowdsourcing can be beneficial. Tune labels according to the task. |
| Macquarie semantic orientation lexicon | Semi-supervised | Words | Positive/negative | Using links in a thesaurus to discover new words. |

5.3.1.4 Summary

Table 5.1 summarizes sentiment lexicons described above, and in addition, also

mentions some other sentiment lexicons. We compare along four parameters: the

approach used for creation, lexical units, labels and some observations. Mohammad

et al. (2009) present the Macquarie semantic orientation lexicon. This is a sentiment

lexicon that contains 76,400 terms, marked as positive or negative. In terms of

obtaining manual annotations, Louviere (1991) presents an approach called the

MaxDiff approach. In this case, instead of obtaining annotations for one word at

a time, an annotator is shown multiple words and asked to identify the least positive

and most positive word among them.
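The text above describes only how MaxDiff judgments are collected. One simple way to turn such judgments into a score per word, assumed here for illustration and not taken from the cited work, is to count how often a word was picked as most positive minus how often it was picked as least positive, divided by how often it was shown. The words and judgments below are invented.

```python
from collections import defaultdict


def best_worst_scores(judgments):
    """judgments: iterable of (shown_words, most_positive, least_positive).

    Returns a score in [-1, 1] per word using simple counting.
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for words, most, least in judgments:
        for word in words:
            shown[word] += 1
        best[most] += 1
        worst[least] += 1
    return {word: (best[word] - worst[word]) / shown[word] for word in shown}


# Invented judgments: each annotator sees four words at a time.
judgments = [
    (("great", "fine", "poor", "awful"), "great", "awful"),
    (("great", "fine", "poor", "dull"), "great", "poor"),
    (("fine", "dull", "awful", "poor"), "fine", "awful"),
]
scores = best_worst_scores(judgments)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```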

5.3.2 Emotion Lexicons

We now describe emotion lexicons. They have been described in this separate

subsection so as to highlight challenges and the approaches speciﬁc to emotion

lexicon generation.

5.3.2.1 LIWC

Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al. 2001) is a popular

manually created lexicon. The lexicon consists of 4500 words and word stems

(an example word stem is happ*, which covers adjectival and adverbial forms

of the word) arranged in four categories. The four categories of words in LIWC

are: linguistic processes (pronouns, prepositions, conjunctions, etc.), speaking

processes (interjections, fillers, etc.), personal concerns (words related to work,

home, etc.) and psychological processes. The words in the psychological processes

category deal with affect and opinion, and are further classiﬁed into cognitive

and affective processes. Cognitive processes include words indicating certainty

(‘deﬁnitely’), possibility (‘likely’) and inhibition (‘prevention’), etc. Affective

processes include words with positive/negative emotion, words expressing anxiety,

anger, sadness. LIWC 2001 has 713 cognitive and 915 affective processes words.

LIWC was manually created by three linguistic experts in two steps:

(a) Deﬁne category scales: The judges determined categories and decided how they

can be grouped into a hierarchy

(b) Manual population: The categories were manually populated with words. For

each word, three judges manually evaluated whether or not a word should be

placed in a category. In addition, they also considered if a word can be moved

higher up in the hierarchy.

LIWC now exists in multiple languages, and has been widely used by several

applications for analysis of topic as well as sentiment/emotion.
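A lexicon of words and word stems such as happ* can be applied with a simple prefix match, as in the sketch below. The category names and entries are a tiny hypothetical sample, not the actual LIWC dictionary, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical sample: category -> list of words and word stems.
CATEGORY_ENTRIES = {
    "positive emotion": ["happ*", "love"],
    "certainty": ["definitely"],
}


def matches(entry, word):
    """A stem such as 'happ*' matches any word starting with 'happ'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def categories_of(word):
    word = word.lower()
    return [
        category
        for category, entries in CATEGORY_ENTRIES.items()
        if any(matches(entry, word) for entry in entries)
    ]


def category_counts(tokens):
    """Count how many tokens of a text fall into each category."""
    counts = Counter()
    for token in tokens:
        counts.update(categories_of(token))
    return counts


print(categories_of("happily"))
print(category_counts("I am definitely happy and I love it".split()))
```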

5.3.2.2 ANEW

Affective norms for English words (ANEW) (Bradley and Lang 1999) is a dictionary

of around 1000 words where each word is indicated with a three-tuple

representation: pleasure, arousal and dominance. Pleasure indicates the valence of a

word, arousal the intensity, while dominance indicates whether the emotion expressed

in the word is in control or not. Consider the example word ‘afraid’. This word is

indicated by the tuple (negative, 3, not), indicating that it is a negative emotion with

an arousal of 3 that is not in control. ANEW was manually created by 25

annotators working separately. Each annotation experiment was conducted in runs of

100–150 words. Annotators were given a sheet called the ScanSAM sheet, on which each

annotator marked values of S, A and M for each word.

5.3.2.3 Emo-Lexicon

Emo-Lexicon (Mohammad and Turney 2013) is a lexicon of 14,000 terms created

using crowd-sourcing portals like Amazon Mechanical Turk. Association with

positive and negative valence as well as with the eight Plutchik emotions is also

available. Although it is manually created, the lexicon is larger than other emotion

lexicons – a clear indication that crowdsourcing is indeed a powerful mechanism

for large-scale creation of emotion lexicon. However, because the task of lexicon

creation has been opened up to the ‘crowd’, quality control is a key challenge. To

mitigate this, the lexicon is created with additional drivers, as follows:

1. A list of words is created from a thesaurus.

2. When an annotator annotates a word with emotion, he/she must ﬁrst ascertain

the sense of the word. The target word is displayed along with four words. The

annotator must select one that is closest to the target word.

3. Only if the annotator was able to correctly determine the sense of the word is

his/her annotation for emotion label obtained.
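The quality-control steps above amount to a filter on the crowd's responses, which can be sketched as follows. The record fields, the example words and the function name are hypothetical and chosen only to mirror the three steps.

```python
def accepted_annotations(responses, closest_word):
    """Keep an emotion label only if the annotator first picked the
    word closest in meaning to the target word."""
    kept = []
    for response in responses:
        target = response["target"]
        if response["chosen_word"] == closest_word[target]:
            kept.append((target, response["emotion"]))
    return kept


# Hypothetical gold answers for the word-choice question.
closest_word = {"startle": "shock"}

# Hypothetical crowd responses for the target word 'startle'.
responses = [
    {"target": "startle", "chosen_word": "shock", "emotion": "surprise"},
    {"target": "startle", "chosen_word": "flower", "emotion": "joy"},
    {"target": "startle", "chosen_word": "shock", "emotion": "fear"},
]

print(accepted_annotations(responses, closest_word))
```

The second response is dropped because the annotator did not identify the sense of the word, so his/her emotion label is never used.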

5.3.2.4 WordNet-Affect

WordNet-Affect (Strapparava and Valitutti 2004), like SentiWordNet, is a resource

that annotates senses in WordNet with emotions. WordNet-Affect was created using

a semi-supervised method. It consists of 2874 synsets annotated with affective labels

(called a-labels). WordNet-Affect was created as follows:

1. A set of core synsets is created. These are synsets whose emotion has been

manually labelled in the form of a-labels.

2. These labels are projected to other synsets using WordNet relations.

3. The a-labels are then manually evaluated and corrected, wherever necessary.

5.3.2.5 Chinese Emotion Lexicon

A Chinese emotion lexicon (Xu et al. 2010) was created using a semi-supervised

approach, in absence of a graphical structure such as WordNet. There are two steps

of creation:

1. Select a core set of labelled words.

2. Expand these words using a similarity matrix. Iterate until convergence.

The similarity matrix takes three kinds of similarity into account:

1. Syntagmatic similarity: This includes co-occurrence of two words in a large text

corpus.

2. Paradigmatic similarity: This includes relations between two words in a semantic

dictionary.

3. Linguistic peculiarity: This involves syllable overlap, possibly to cover different

forms of the same word.
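The two creation steps can be illustrated with a generic propagation loop. This is a sketch under stated assumptions and not the algorithm of Xu et al. (2010): it assumes the three kinds of similarity have already been combined into one similarity value per word pair, tracks a single emotion, keeps seed scores fixed, and updates every other word to the similarity-weighted average of its neighbours until the scores stop changing. All words and values are invented.

```python
def propagate(similarity, seeds, tol=1e-6, max_iter=100):
    """similarity: word -> {neighbour: similarity value}
    seeds: word -> fixed score for one emotion (e.g. 1.0 or 0.0)
    """
    words = list(similarity)
    scores = {word: seeds.get(word, 0.0) for word in words}
    for _ in range(max_iter):
        updated = {}
        for word in words:
            if word in seeds:
                updated[word] = seeds[word]  # seed labels never change
                continue
            neighbours = similarity[word]
            total = sum(neighbours.values())
            if total == 0:
                updated[word] = 0.0
            else:
                weighted = sum(s * scores[n] for n, s in neighbours.items())
                updated[word] = weighted / total
        change = max(abs(updated[w] - scores[w]) for w in words)
        scores = updated
        if change < tol:  # converged
            break
    return scores


# Invented similarity values between three words.
similarity = {
    "happy": {"glad": 0.9},
    "glad": {"happy": 0.9, "table": 0.1},
    "table": {"glad": 0.1},
}
seeds = {"happy": 1.0, "table": 0.0}
print(propagate(similarity, seeds))
```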

5.3.2.6 SenticNet

SenticNet (the most recent version being SenticNet 4) by Cambria et al. (2016)

is a rich graphical repository of concepts. The resource aims to capture semantic

and sentic properties of words and phrases. The sentic properties are related to

connotations of words. A detailed discussion of SenticNet forms a forthcoming

chapter of this book.

Table 5.2 Summary of emotion lexicons

| Lexicon | Approach | Labels | Observation |
|---|---|---|---|
| LIWC | Manual | Hierarchy of categories | Decide hierarchy of categories; have judges interacting with each other |
| ANEW & ANEW for Spanish | Manual | Valence, arousal, dominance | ScanSAM lists; have a set of annotators annotating in parallel |
| Emo-Lex | Manual | Eight emotions, two valence categories | Use crowd-sourcing. Attention to quality control. |
| WordNet-Affect | Semi-supervised | Affective labels | Annotate a seed set. Expand using WordNet relations. |
| Chinese emotion lexicon | Semi-supervised | Five emotions | Annotate a seed set. Expand using similarity matrices |
| NRC Hashtag emotion lexicon | Automatic | Eight emotions | Use hashtag based supervision of tweets |
| SenticNet 4 | Semi-supervised | A larger structure | Semi-supervised graphical structure, created using techniques such as agglomerative clustering |

5.3.2.7 Summary

Table 5.2 shows a summary of emotion lexicons discussed in this section. We

observe that manual approaches dominate emotion lexicon creation. Key issues

in manual emotion annotation are: ascertaining the quality of the labels, and deciding

hierarchies, if any. Additional useful lexicons are available at:

http://www.saifmohammad.com/WebPages/lexicons.html. On the other hand, automatic

emotion annotation is mostly semi-supervised. To expand a seed set, structures like

WordNet may be used, or similarity matrices constructed from large corpora can

be employed. Mohammad (2012) presents a hashtag emotion lexicon that consists

of 16,000+ unigrams annotated with eight emotions. The lexicon is created using

emotion-denoting hashtags present in tweets. Mohammad and Turney (2010) also

present an emotion lexicon created using a crowdsourcing platform.

5.4 Sentiment-Annotated Datasets

This section describes sentiment-annotated datasets, and is organized as follows. We

ﬁrst describe sources of data, mechanisms of annotation, and then provide a list of

some sentiment-annotated datasets.

5.4.1 Sources of Data

The ﬁrst step is to obtain raw data. The following are candidate sources of raw

data:

1. Social networking websites like Twitter are a rich source of data for sentiment

analysis applications. For example, the Twitter API (Makice 2009) is a publicly

available API that allows you to download tweets based on a lot of interesting

search criteria such as keyword-based-search, download-timelines, download-

tweet-threads, etc.

2. Competitions such as SemEval have been regularly conducting Sentiment

analysis related tasks. These competitions release a training dataset followed by

a test dataset. These datasets can be used as benchmark datasets.

3. Discussion forums are portals where users discuss topics, often in the context

of a central theme or an initial question. These discussion forums often arrange

posts in a thread-like manner. This lends a discourse structure to sentiment. However,

this also introduces an additional challenge. A reply to a post could mean

one out of three possibilities: (a) The reply is an opinion with respect to the

post, offering an agreement or disagreement (example: Well-written post), (b)

The reply is an opinion towards the author of the post (example: Why do you

always post hateful things?), or (c) The reply is an opinion towards the topics

being discussed in the post. (Example: You said that the situation is bad. But do

you think that....). Reddit threads have been used as opinion datasets in several

past works.

4. Review websites: Amazon and other review websites have reviews on different

domains. Each kind of reviews has unique challenges of its own. In case of

movie reviews, the review often has a portion describing ‘what’ the movie is

about. It is possible to create subjective extracts before using them as done by

Mukherjee and Bhattacharyya (2012). In case of product reviews, the review

often contains sentiment towards different ‘aspects’. (‘Aspects’ of a cell phone

are battery, weight, OS, etc.).

5. Blogs are often long text describing an opinion with respect to a topic. They can

also be crawled and annotated to create a sentiment dataset. Blogs tend to be

structured narratives analyzing the topic. They may not always contain the same

sentiment throughout but can be useful sources of data that looks at different

aspects of the given topic.

5.4.2 Obtaining Labels

Once raw data has been obtained, the second step is to label this data. There are

different approaches that can be used for obtaining labels for a dataset:

1. Manual labelling: Several datasets have been created by human annotators.

The labelling can be done through crowd-sourcing applications like Amazon

Mechanical Turk. They allow obtaining large volumes of annotations by employing

the ‘power of the crowds’ (Paolacci et al. 2010). To control the quality of

annotation, one way is to use a seed set of gold labels. Human annotators within

the controlled setup of the experiment create a set of gold labels. If a crowd-

sourced annotator (known as ‘worker’ in the crowd-sourcing parlance) gets a

sufﬁcient number of gold labels right, only then is he/she permitted to perform

the task of annotation.

2. Distant supervision: Distant supervision refers to the situation where the label or

the supervision is obtained without an annotator – hence the word ‘distant’. One

way to do so is to use annotation provided by the writer themselves. However,

the question of reliability arises because not every data unit has been manually

veriﬁed by a human annotator. This has to be validated using the approach used

to obtain distant supervision. Consider the example of Amazon reviews. Each

review is often accompanied by star ratings. These star ratings can be used as

labels provided by the writer. Since these ratings are out of 5, a review with 1

star is likely to be strongly negative, whereas a review with 5 stars is likely to be

strongly positive. To improve the quality of the dataset obtained, Pang and Lee

(2005) consider reviews that are deﬁnitely positive and deﬁnitely negative – i.e.

reviews with 5 and 1 stars respectively.

Another technique to obtain distant supervision is the use of hashtags. Twitter

provides a reverse index mechanism in the form of hashtags. An example tweet

is ‘Just ﬁnished writing a 20 page long assignment. #Engineering #Boring’.

‘#Engineering’ and ‘#Boring’ are hashtags, since they are phrases preceded

by a hashtag symbol. Note that a hashtag is created by the author of the tweet

and hence, can be anything – topical (i.e. identifying what the tweet is about.

Engineering, in this case) or emotion-related (i.e. expressing an opinion through

a hashtag. In this case, the author of the tweet is bored). Purver and Battersby

(2012) use such hashtags to obtain a set of tweets containing emotion-related

hashtags. Thus, hashtags such as ‘#happy’, ‘#sad’, etc. are used to

download tweets using the Twitter API. The tweets are then labelled as ‘#happy’,

‘#sad’, etc. Since hashtags are user-created, they can be more nuanced than

this. For example, consider the hypothetical tweet: ‘Meeting my ex-girlfriend

after about three years. #happy #not’. The last hashtag ‘#not’ inverts sentiment

expressed by the preceding hashtag ‘#happy’. This unique construct ‘#not’ or

‘#notserious’ or ‘#justkidding’/‘#jk’ is popular in tweets and must be handled

properly when hashtag-based supervision is used to create a dataset.
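The two distant supervision techniques above can be sketched as labelling functions. The function names and the hashtag tables are hypothetical. Discarding a tweet that carries an inverting hashtag such as '#not' is one conservative way of handling it, assumed here for illustration; the text above only requires that such hashtags be handled.

```python
import re


def label_from_stars(stars):
    """Keep only definitely positive (5 stars) or negative (1 star) reviews."""
    if stars == 5:
        return "positive"
    if stars == 1:
        return "negative"
    return None  # reviews with 2 to 4 stars are left out of the dataset


# Hypothetical tables of emotion-related and inverting hashtags.
EMOTION_HASHTAGS = {"#happy": "happy", "#sad": "sad"}
INVERTERS = {"#not", "#notserious", "#justkidding", "#jk"}


def label_from_hashtags(tweet):
    """Return an emotion label for a tweet, or None if it should be skipped."""
    tags = [tag.lower() for tag in re.findall(r"#\w+", tweet)]
    if any(tag in INVERTERS for tag in tags):
        return None  # assumed policy: drop tweets whose label is inverted
    emotions = {EMOTION_HASHTAGS[tag] for tag in tags if tag in EMOTION_HASHTAGS}
    if len(emotions) == 1:
        return emotions.pop()
    return None  # no emotion hashtag, or conflicting ones


print(label_from_stars(5))
print(label_from_hashtags("Finally home after a long week #happy"))
print(label_from_hashtags(
    "Meeting my ex-girlfriend after about three years. #happy #not"))
```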

5.4.3 Popular Sentiment-Annotated Datasets

We now discuss some popular sentiment-annotated datasets. We divide them into

two categories: sentence-level annotation, discourse-level annotation. The latter

points to text longer than a sentence. While tweets may contain more than a

sentence, we group them under sentence-level annotation because of limited length

of tweets.

Sentence-Level Annotated Datasets

Niek Sanders released a dataset at http://www.sananalytics.com/lab/twitter-sentiment/.

It consists of 5513 manually labelled tweets, classified as per four

topics.

SemEval is a competition that is run for specific tasks. Sentiment analysis and

related tasks have featured since 2013 (Nakov et al. 2013; Rosenthal et al. 2014,

2015). The datasets for these tasks are released online, and can be useful for

sentiment applications.

SemEval 2013 dataset is at: http://www.cs.york.ac.uk/semeval-2013/semeval2013.tgz

SemEval 2014 dataset is at: http://alt.qcri.org/semeval2014/task9/

SemEval 2015 dataset is at: http://alt.qcri.org/semeval2015/task10/index.php?id=subtaske-readme

The Darmstadt corpus consists of consumer reviews annotated at sentence and expression

level. The dataset is available at:

https://www.ukp.tu-darmstadt.de/data/sentiment-analysis/darmstadt-service-review-corpus/

A sentence annotated polarity dataset from Pang et al. (2002) is also available at:

https://www.cs.cornell.edu/people/pabo/movie-review-data/

Sentiment140 (Go et al. 2009) is a corpus made available by Stanford at

http://help.sentiment140.com/for-students. The dataset

is of tweets and contains additional information such as timestamp, author, tweet

id, etc.

Deng et al. (2013) released a goodFor/badFor corpus that is available at:

http://mpqa.cs.pitt.edu/corpora/gfbf/. goodFor/badFor indicates positive/negative

sentiment respectively. This corpus uses a five-tuple representation for opinion

annotation. Consider this example sentence from their user manual: ‘The smell

stiﬂed his hunger.’ This sentence is marked as: ‘span: stiﬂed, polarity: badFor,

agent: the smell, object: his hunger’.

Discourse-Level Annotated Datasets

Many movie review datasets and lexicons are available at:

https://www.cs.cornell.edu/people/pabo/movie-review-data/. These datasets include:

sentiment annotated datasets, subjectivity annotated datasets, and sentiment scale datasets.

These have been released in Pang and Lee (2004, 2005), and widely used.

A Congressional speech dataset (Thomas et al. 2006) annotated with opinion is

available at: http://www.cs.cornell.edu/home/llee/data/convote.html. The labels

indicate whether the speaker supported or opposed a legislation that he/she was

talking about.

A corpus consisting of Amazon reviews from different domains such as electronics,

movies, etc. is available at: https://snap.stanford.edu/data/web-Amazon.html

(McAuley and Leskovec 2013). This dataset spans a period of 18 years, and

contains information such as: product title, author name, star rating, helpful

votes, etc.

The Political Debate Corpus by Somasundaran and Wiebe (2009) is a dataset of

political debates that is arranged based on different topics. It is available here:

http://mpqa.cs.pitt.edu/corpora/product_debates/.

MPQA Opinion Corpus (Wiebe et al. 2005) is a popular dataset that consists of

news articles from different sources. Version 2.0 of the corpus contains nearly 15,000

sentences. The sentences are annotated with topics and labels. The topics are

from different countries around the world. This corpus is available at

http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/.

5.5 Bridging the Language Gap

Creation of a sentiment lexicon or a labelled dataset is a time/effort-intensive

task. Since English is the dominant language in which SA research has been

carried out, it is only natural that work on many other languages has tried to leverage

resources developed for English by adapting and/or reusing them. Cross-lingual

SA refers to use of systems and resources developed for one language to perform

SA of another. The ﬁrst language (where the resources/lexicons/systems have been

developed) is called the source language, while the second language (where a new

system/resource/lexicon needs to be deployed) is called the target language. The

basis of cross-lingual SA is availability of a lexicon or an annotated dataset in the

source language. It must be noted that several nuanced methodologies to perform

cross-lingual SA exist, but have been left out due to the scope of this chapter. We

focus on cross-lingual sentiment resources.

The fundamental requirement is a mapping between the two languages. Let us

consider what happens in case we wish to map a lexicon in language X to language

Y. For a lexicon, this mapping can be in the form of a parallel dictionary where

words of one language are mapped to another. Redondo et al. (2007) describe the

generation of ANEW for Spanish. ANEW was originally created for

English words; its parallel Spanish version is created by translating words from

English to Spanish, and then manually validating them. The mapping can also be in the form

of linked WordNets, in case the lexicons involve concepts like synsets. For Hindi

SentiWordNet, Joshi et al. (2010) map synsets in English to Hindi using a WordNet

linking, and generate a Hindi SentiWordNet from its English variant. Mahyoub et al.

(2014) describe a technique to create a sentiment lexicon for Arabic. Based on a seed

set of positive and negative words, and Arabic WordNet, they present an expansion

algorithm to create a lexicon. The algorithm uses WordNet relations in order to

propagate sentiment labels to new words/synsets. The WordNet relations they use

are divided into two categories: the ones that preserve the sentiment orientation, and

the ones that invert the sentiment orientation.
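Expansion over relations that preserve or invert sentiment can be sketched as a breadth-first traversal from the seed set. The relation names, the toy graph and the function name are hypothetical and do not reproduce the relation inventory or the algorithm of Mahyoub et al. (2014); the sketch only shows how the two categories of relations propagate a label.

```python
from collections import deque

# Hypothetical relation names, grouped by their effect on orientation.
PRESERVING = {"similar_to", "also_see"}
INVERTING = {"antonym"}


def expand(seeds, edges, max_depth):
    """seeds: word -> +1 (positive) or -1 (negative)
    edges: word -> list of (relation, neighbour)
    """
    labels = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue
        for relation, neighbour in edges.get(word, []):
            if neighbour in labels:
                continue  # keep the first label a word receives
            if relation in PRESERVING:
                labels[neighbour] = labels[word]
            elif relation in INVERTING:
                labels[neighbour] = -labels[word]
            else:
                continue  # relation says nothing about sentiment
            queue.append((neighbour, depth + 1))
    return labels


# Toy graph, for illustration only.
edges = {
    "good": [("similar_to", "fine"), ("antonym", "bad")],
    "bad": [("similar_to", "awful")],
    "fine": [("hypernym", "quality")],
}
print(expand({"good": 1}, edges, max_depth=2))
# {'good': 1, 'fine': 1, 'bad': -1, 'awful': -1}
```

The `max_depth` parameter bounds how far labels travel from the seeds. Labels reached through long chains of relations are less reliable, so a larger value trades precision for coverage.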

How is this process of mapping words in one language to another any different

for datasets? In case a machine translation (MT) system is available, this task is

simple. A dataset in the source language can be translated to the target language.

This is a common strategy that has been employed (Mihalcea et al. 2007; Duh

et al. 2011). It follows that translation may introduce additional errors into the

system, thus causing a degradation in the quality of the dataset. This is particularly

applicable to translation of sentiment-bearing idioms. Salameh et al. (2015) perform

their experiments for Arabic where an MT system is used to translate documents,

following which sentiment analysis is performed. An interesting observation that

the authors make is that although MT may result in a poor translation making it

difﬁcult for humans to identify sentiment, a classiﬁer performs reasonably well.

However, MT systems may not exist for all language pairs. Balamurali et al. (2012)

suggest a naive replacement for an MT system. To translate a corpus from Hindi to

Marathi (and vice versa), they obtain sense annotations for words in the dataset.

Then, they use a WordNet linking to transfer the word from the source language to

the target language.

An immediate question that arises is the hypothesis at the bottom of all cross-

lingual approaches: sentiment is retained across languages. This means that if a

word has a sentiment s in the source language, the translated word in target language

(with appropriate sense recorded) also has sentiment s. How fair is the hypothesis

that words in different languages bear the same emotion? This can be seen from

linear correlations between ratings for the three affective dimensions, as was done

for ANEW for Spanish. ANEW for Spanish (Redondo et al. 2007), as described

above, was a lexicon created using ANEW in English. The correlation values for

valence, arousal and dominance are 0.916, 0.746 and 0.720 respectively. This means

that a positive English word is very likely to be a positive Spanish word. The arousal

and dominance values remain the same to a lower extent.
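The linear correlation used in such a comparison is the Pearson coefficient, which can be computed as in the sketch below. The ratings are invented for illustration and are not taken from either ANEW lexicon.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equally long rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented valence ratings for four words and their translations.
english_valence = [8.2, 2.1, 5.0, 7.4]
spanish_valence = [7.9, 1.8, 5.5, 7.0]
print(round(pearson(english_valence, spanish_valence), 3))
```

A coefficient close to 1 means that words rated as pleasant in one language tend to be rated as pleasant in the other, which is the sense in which sentiment is said to be retained.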

Thus, we have two options now. The ﬁrst option is cross-lingual SA: use

resources generated for the source language and map it to the target language.

The second option is in-language SA: create resources for the target language on

its own. Balamurali et al. (2013) weigh in-language SA against cross-lingual SA

based on Machine Translation. The authors show for English, German, French and

Russian that in-language SA does consistently better than cross-lingual SA relying

on translation alone.

Cross-lingual SA also beneﬁts from additional corpora in target language:

1. Unlabeled corpus in target language: This type of corpus is used in different

approaches, the most noteworthy being the co-training-based approach of Wan

(2009). The author assumes that a labelled corpus in the source language, an

unlabeled corpus in the target language and an MT system to translate back and forth

between the two languages are available.

2. Labelled corpus in target language: The size of this dataset is assumed to be

much smaller than the training set.

3. Pseudo-parallel data: Lu et al. (2011) describe use of pseudo-parallel data for

their experiments. Pseudo-parallel data is the set of sentences in the source

language that are translated to the target language and used as an additional

polarity-labelled data set. This allows the classiﬁer to be trained on a larger

number of samples.


5.6 Applications of Sentiment Resources

In the preceding sections, we described sentiment resources in terms of labels,

annotation techniques and approaches to creation. We will now see how a sentiment

resource (either a lexicon or a dataset) can be used.

A lexicon is useful as a knowledge base for a rule-based SA system. A rule-based

SA system takes a textual unit as input, applies a set of pre-determined rules, and

produces a prediction. Joshi et al. (2011) present C-Feel-It, a rule-based SA system

for tweets. The workﬂow is as follows:

1. A user types a keyword. Tweets containing the keyword are downloaded using

the Twitter API.

2. The tweets are pre-processed to correct extended words (e.g. ‘happpyyyyy’ is

replaced with two occurrences of ‘happy’; two, because the extended form of the

word ‘happy’ has a magnified sentiment).

3. The words in a tweet are looked up individually in four lexical resources.

The sentiment label of a tweet is calculated as a sum of positive and negative

words – with rules applied for conjunctions and negation. In case of negation,

the sentiment of words within a window is inverted. In case of conjunctions such

as ‘but’, the latter part of a tweet is considered.

4. The resultant prediction of a tweet is a weighted sum of prediction made by the

four lexical resources. The weights are determined experimentally by considering

how well the resources perform on an already labelled dataset of tweets.
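Steps 2 and 3 of the workflow above can be sketched as follows. This is only an illustration under stated assumptions, not the C-Feel-It implementation: the toy lexicon, the negation window of three tokens, and all function names are hypothetical, a single lexicon is used, and the weighted combination of step 4 is omitted.

```python
import re

# Hypothetical toy lexicon (word -> polarity); not taken from C-Feel-It.
LEXICON = {"happy": 1, "good": 1, "great": 1, "bad": -1, "sad": -1, "terrible": -1}
NEGATIONS = {"not", "no", "never"}
NEGATION_WINDOW = 3  # assumed number of tokens affected by a negation

def squeeze(word):
    """Collapse every run of repeated letters to one letter ('happy' -> 'hapy')."""
    return re.sub(r"(.)\1+", r"\1", word)

# Index of squeezed lexicon words, consulted only for elongated tokens.
SQUEEZED = {squeeze(word): word for word in LEXICON}

def lookup(token):
    """Return (polarity, weight); an elongated lexicon word counts twice."""
    if token in LEXICON:
        return LEXICON[token], 1
    if re.search(r"(.)\1\1", token):  # some letter repeated three or more times
        base = SQUEEZED.get(squeeze(token))
        if base is not None:
            return LEXICON[base], 2
    return 0, 1

def score_tweet(text):
    """Sum word polarities, applying the conjunction and negation rules."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Conjunction rule: keep only the part after the last 'but'.
    if "but" in tokens:
        last_but = len(tokens) - 1 - tokens[::-1].index("but")
        tokens = tokens[last_but + 1:]
    total, negate_left = 0, 0
    for token in tokens:
        if token in NEGATIONS:
            negate_left = NEGATION_WINDOW
            continue
        polarity, weight = lookup(token)
        # Negation rule: invert polarity of words inside the window.
        if negate_left > 0:
            polarity = -polarity
            negate_left -= 1
        total += polarity * weight
    return total

def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(label(score_tweet("The food was bad but the service was great")))
```

Counting an elongated word with weight 2 is equivalent to replacing it with two occurrences of the base word, as described in step 2.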

The above approach is a common framework for rule-based SA systems.

Levallois (2013) also uses lexicons and a set of rules to perform sentiment analysis

of tweets. The goal, as stated by the author, is a system that is ‘fast and scalable’.

LIWC provides a tool that likewise uses its lexicon and applies a set of rules to generate

a prediction. Typically, systems that use SA as a sub-module of a larger application

can beneﬁt greatly from a lexicon and simple hand-crafted rules.

Lexicons have also been used in topic models (Lin and He 2009) to set priors

on the word-topic distributions. A topic model takes as input a dataset (labelled

or unlabeled) and generates clusters of words called topics, such that a word may

belong to more than one topic. A topic model based on LDA (Blei et al. 2003)

samples a latent variable called topic, for every word occurrence in a document.

This results in two types of distributions over an unlabeled dataset: topic-document

distributions (the probability of seeing this topic in this document, given the words

and the topic-word assignments), and word-topic distributions (the probability of

seeing this word belonging to the topic in the entire corpus, given the words

and the topic-word assignments). The word-topic distribution is a multinomial

with a Dirichlet prior. Sentiment lexicons have commonly been used to set the Dirichlet

hyperparameters of the word-topic distribution. Consider the following example.

In a typical scenario, all words have symmetric priors over the topics. This means

that all words are equally likely to belong to a certain topic. However, if we wish

to have ‘sentiment coherence’ in topics, then setting the Dirichlet hyperparameters

appropriately can adjust the priors on topics. Let us assume that we wish

the first half of the topics to represent ‘positive’ topics, and the second half to

represent ‘negative’ topics. A ‘positive’ topic here means a topic with positive words

corresponding to a concept. The priors can then be made asymmetric, so that words

listed as positive in the lexicon receive more prior weight in the first half of the topics

and words listed as negative receive more in the second half.

More complex topic models which model additional

latent variables (such as sentiment or switch variables) also use lexicons to set priors

(Mukherjee and Bhattacharyya 2012). Lexicons have also been used to train deep

learning-based neural networks (Socher et al. 2013). A combination of datasets and

lexicons has also been used. Tao et al. (2009) propose a three-pronged factorization

method for sentiment classiﬁcation. They factor in information from sentiment

lexicons (in the form of word level polarities), unlabeled datasets (in the form of

word co-occurrence) and labelled datasets (to set up the correspondences). Lexicons

can also be used to determine values of frequency-based features in a statistical

classiﬁcation system. Kiritchenko et al. (2014) use features derived from a lexicon

such as: number of tokens with non-zero sentiment, total and maximal score of

sentiment, etc. This work also presents a set of ablation tests to identify the value of

individual sets of features. Removing the lexicon-based features from the

complete set causes the largest degradation. Such lexicon-based features

have been used for related tasks such as sentiment annotation complexity prediction

(Joshi et al. 2014), thwarting detection (Ramteke et al. 2013) and sarcasm detection

(Joshi et al. 2015).
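The use of a lexicon to set Dirichlet hyperparameters, discussed earlier in this section, can be sketched as below. This is a sketch under our own assumptions rather than the procedure of any cited model: the function name, the `base` and `boost` values, and the convention that the first half of the topics is ‘positive’ are all hypothetical.

```python
def build_topic_word_prior(vocab, lexicon, num_topics, base=0.01, boost=1.0):
    """Return prior[t][w], the Dirichlet hyperparameter of word w under topic t.

    Topics 0 .. num_topics // 2 - 1 are treated as 'positive' topics and the
    remaining ones as 'negative' topics (a convention assumed for this sketch).
    Words absent from the lexicon keep the symmetric value `base` everywhere.
    """
    half = num_topics // 2
    prior = []
    for t in range(num_topics):
        topic_is_positive = t < half
        row = []
        for word in vocab:
            polarity = lexicon.get(word, 0)
            matches = (polarity > 0 and topic_is_positive) or (
                polarity < 0 and not topic_is_positive
            )
            # Lexicon words get extra prior mass on topics of matching polarity.
            row.append(base + boost if matches else base)
        prior.append(row)
    return prior

# Hypothetical vocabulary and lexicon.
vocab = ["good", "bad", "camera"]
lexicon = {"good": 1, "bad": -1}
prior = build_topic_word_prior(vocab, lexicon, num_topics=4)
for t, row in enumerate(prior):
    print(t, row)
```

The resulting matrix would be passed to a topic model as the prior on the word-topic distributions; how strongly it shapes the learned topics depends on the ratio of `boost` to `base`.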

Let us now look at how sentiment-labelled datasets can be used, especially

in machine learning (ML)-based classiﬁcation systems. ML-based systems model

sentiment analysis as a classiﬁcation problem. A classiﬁcation model predicts the

label of a document as one among different labels. This model is learnt using a

labelled dataset as follows. A document is converted to a feature vector. The most

common form of a feature vector of a document is the unigram representation with

the length equal to the vocabulary size. The vocabulary is the set of unique words

in the labelled dataset. A Boolean or numeric feature vector of length equal to the

vocabulary size is constructed for each document where the value is set for the words

present in the document. The goal of the model is to minimize error on training

documents, with appropriate regularization for variance in unseen documents. The

labelled documents serve as a building block for an ML-based system. While the

unigram representation is common, several other features, such as word sense-based

features (Balamurali et al. 2011) and qualitative features such as POS sequences (Pang

et al. 2002), have also been used in ML-based systems. The annotated

datasets form the basis for creation of feature vectors with the documents acting

as observed instances. Melville et al. (2009) combine knowledge from lexicons

and labelled datasets in a unique manner. Sentiment lexicon forms the background

knowledge about words while labelled datasets provide a domain-speciﬁc view of

the task, in a typical text classiﬁcation scenario.
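The unigram representation described above can be sketched as follows. The tiny training set, the whitespace tokenizer and the function names are assumptions made for illustration only.

```python
def build_vocabulary(documents):
    """Map every unique word in the labelled dataset to a vector position."""
    vocab = sorted({token for doc in documents for token in doc.lower().split()})
    return {word: index for index, word in enumerate(vocab)}

def to_feature_vector(document, vocab_index, boolean=True):
    """Boolean (presence) or numeric (count) unigram vector of vocabulary length."""
    vector = [0] * len(vocab_index)
    for token in document.lower().split():
        index = vocab_index.get(token)
        if index is None:
            continue  # word not seen in the labelled dataset
        vector[index] = 1 if boolean else vector[index] + 1
    return vector

# Hypothetical labelled dataset.
train = [("good good movie", "pos"), ("bad movie", "neg")]
vocab_index = build_vocabulary(doc for doc, _ in train)
features = [to_feature_vector(doc, vocab_index) for doc, _ in train]
labels = [label for _, label in train]
print(vocab_index, features, labels)
```

The `features` and `labels` lists are what a classifier would be trained on; words outside the vocabulary are simply ignored at prediction time in this sketch.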


5.7 Conclusion

This chapter described sentiment resources: speciﬁcally, sentiment lexicons and

sentiment-annotated datasets. Our focus was on the philosophy and trends in the

generation and use of sentiment lexicons and datasets. We described creation

of several popular sentiment and emotion lexicons. We then discussed different

strategies to create annotated datasets, and also presented a list of available datasets.

Finally, we discussed two critical points in the context of sentiment resources: how a

resource in one language can be mapped to another, and how these resources are

actually deployed in an SA system. The diversity in goals, approaches and uses of

sentiment resources highlights the value of good quality sentiment resources to

sentiment analysis.

References

Balamurali, A.R., Aditya Joshi, and Pushpak Bhattacharyya. 2011. Harnessing wordnet senses for

supervised sentiment classiﬁcation. In Proceedings of the conference on empirical methods in

natural language processing. Association for Computational Linguistics.

———. 2012. Cross-lingual sentiment analysis for Indian languages using linked WordNets. In:

COLING.

Balamurali, A.R., Mitesh M. Khapra, and Pushpak Bhattacharyya. 2013. Lost in translation:

Viability of machine translation for cross language sentiment analysis. In Computational

linguistics and intelligent text processing. Berlin/Heidelberg: Springer.

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. The

Journal of Machine Learning Research 3: 993–1022.

Bradley, M.M., and P.J. Lang. 1999. Affective norms for English words (ANEW): Instruction manual

and affective ratings. Technical report C-1. The Center for Research in Psychophysiology,

University of Florida.

Brooke, Julian, Milan Toﬁloski, and Maite Taboada. 2009. Cross-linguistic sentiment analysis:

From English to Spanish. In: RANLP.

Cambria, Erik, Soujanya Poria, Rajiv Bajpai, and Björn Schuller. 2016. SenticNet 4: A semantic

resource for sentiment analysis based on conceptual primitives. In The 26th International

conference on computational linguistics (COLING), Osaka, 2666–2677.

Deng, Lingjia, Yoonjung Choi, and Janyce Wiebe. 2013. Benefactive/Malefactive event and writer

attitude annotation. ACL (2).

Duh, Kevin, Akinori Fujino, and Masaaki Nagata. 2011. Is machine translation ripe for cross-

lingual sentiment classification? In Proceedings of the 49th annual meeting of the association

for computational linguistics: Human language technologies: Short papers-volume 2.

Association for Computational Linguistics.

Ekman, Paul. 1992. An argument for basic emotions. Cognition and Emotion 6 (3–4): 169–200.

Esuli, Andrea, and Fabrizio Sebastiani. 2006. Sentiwordnet: A publicly available lexical resource

for opinion mining. In Proceedings of LREC, vol. 6.


Finn, Arup. 2011. AFINN. Informatics and Mathematical Modelling, Technical University of

Denmark.

Go, Alec, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classiﬁcation using distant

supervision. CS224N Project Report, Stanford 1 (2009): 12.

Joshi, Aditya, A.R. Balamurali, and Pushpak Bhattacharyya. 2010. A fall-back strategy for

sentiment analysis in hindi: A case study. In Proceedings of the 8th ICON.

Joshi, Aditya, A.R. Balamurali, Pushpak Bhattacharyya, and Rajat Mohanty. 2011. C-Feel-It:

A sentiment analyzer for micro-blogs. In Proceedings of the 49th annual meeting of the

association for computational linguistics.

Joshi, Aditya, Abhijit Mishra, and Pushpak Bhattacharyya. 2014. Measuring sentiment annotation

complexity of text. In Conference for association of computational linguistics.

Joshi, Aditya, Vinita Sharma, and Pushpak Bhattacharyya. 2015. Harnessing context incongruity

for Sarcasm detection. In Conference for association of computational linguistics.

Kiritchenko, Svetlana, Xiaodan Zhu, and Saif M. Mohammad. 2014. Sentiment analysis of short

informal texts. Journal of Artiﬁcial Intelligence Research 50: 723–762.

Levallois, Clement. 2013. Umigon: Sentiment analysis for tweets based on lexicons and heuristics.

In Proceedings of the international workshop on semantic evaluation. SemEval, vol. 13.

Lin, Chenghua, and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In

Proceedings of the 18th ACM conference on Information and knowledge management. ACM.

Liu, Bing. 2010. Sentiment analysis and subjectivity. In Handbook of natural language processing,

vol. 2, 627–666.

Louviere, Jordan J. 1991. Best-worst scaling: A model for the largest difference judgments.

Technical report, University of Alberta.

Lu, Bin, et al. 2011. Joint bilingual sentiment classiﬁcation with unlabeled parallel corpora.

In Proceedings of the 49th annual meeting of the association for computational linguistics:

Human language technologies-volume 1. Association for Computational Linguistics.

Mahyoub, Fawaz H.H., Muazzam A. Siddiqui, and Mohamed Y. Dahab. 2014. Building an Arabic

sentiment Lexicon using semi-supervised learning. Journal of King Saud University-Computer

and Information Sciences 26 (4): 417–424.

Makice, Kevin. 2009. Twitter API: Up and running: Learn how to build applications with the

Twitter API. Beijing: O’Reilly Media, Inc.

McAuley, Julian, and Jure Leskovec. 2013. Hidden factors and hidden topics: Understanding rating

dimensions with review text. In Proceedings of the 7th ACM conference on recommender

systems. ACM.

Mehrabian, Albert, and James A. Russell. 1974. An approach to environmental psychology.

Cambridge, MA: MIT Press.

Melville, Prem, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by

combining lexical knowledge with text classiﬁcation. In Proceedings of the 15th ACM SIGKDD

international conference on Knowledge discovery and data mining. ACM.

Mihalcea, Rada, Carmen Banea, and Janyce M. Wiebe. 2007. Learning multilingual subjective

language via cross-lingual projections.

Miller, George A. 1995. WordNet: A lexical database for English. Communications of the ACM 38

(11): 39–41.

Mohammad, Saif. 2012. #Emotional tweets. In Proceedings of the ﬁrst joint conference on lexical

and computational semantics (*Sem), June 2012.

Mohammad, Saif M., and Peter D. Turney. 2010. Emotions evoked by common words and phrases:

Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT

2010 workshop on computational approaches to analysis and generation of emotion in text.

Association for Computational Linguistics.

———. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence 29

(3): 436–465.


Mohammad, S., C. Dunne, and B. Dorr. 2009. Generating high-coverage semantic orientation

lexicons from overtly marked words and a thesaurus. In EMNLP, 599–608.

Mukherjee, S., and P. Bhattacharyya. 2012. Wikisent: Weakly supervised sentiment analysis

through extractive summarization with wikipedia. In Machine learning and knowledge

discovery in databases, 774–793. Berlin/Heidelberg: Springer.

Nakov, Preslav, et al. 2013. Semeval-2013 task 2: Sentiment analysis in twitter.

Ortony, Andrew, and Terence J. Turner. 1990. What’s basic about basic emotions? Psychological

Review 97 (3): 315–331.

Pang, Bo, and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity

summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on

association for computational linguistics. Association for Computational Linguistics.

———. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect

to rating scales. In Proceedings of the 43rd annual meeting on association for computational

linguistics. Association for Computational Linguistics.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification

using machine learning techniques. In Proceedings of the ACL-02 conference on

empirical methods in natural language processing-volume 10. Association for Computational

Linguistics.

Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis. 2010. Running experiments on

Amazon Mechanical turk. Judgment and Decision Making 5 (5): 411–419.

Pennebaker, James W., Martha E. Francis, and Roger J. Booth. 2001. Linguistic inquiry and word

count: LIWC 2001, vol 71. Mahwah: Lawrence Erlbaum Associates.

Plutchik, Robert. 1980. Emotion: A psychoevolutionary synthesis. New York: Harpercollins

College Division.

———. 1982. A psychoevolutionary theory of emotions. Social Science Information/sur les

sciences sociales 21: 529–553.

Purver, Matthew, and Stuart Battersby. 2012. Experimenting with distant supervision for emotion

classiﬁcation. In Proceedings of the 13th conference of the European chapter of the association

for computational linguistics. Association for Computational Linguistics.

Ramteke, Ankit, Pushpak Bhattacharyya, and J. Saketha Nath. 2013. Detecting Turnarounds in

sentiment analysis: Thwarting. In Conference for association of computational linguistics.

Redondo, Jaime, et al. 2007. The Spanish adaptation of ANEW (affective norms for English

words). Behavior Research Methods 39 (3): 600–605.

Rosenthal, Sara, et al. 2014. Semeval-2014 task 9: Sentiment analysis in twitter. In Proceedings of

SemEval, 73–80.

Rosenthal, Sara, Preslav Nakov, Svetlana Kiritchenko, Saif M. Mohammad, Alan Ritter, and

Veselin Stoyanov. 2015. Semeval-2015 task 10: Sentiment analysis in twitter. In Proceedings

of the 9th international workshop on semantic evaluation, SemEval.

Salameh, Mohammad, Saif Mohammad, and Svetlana Kiritchenko. 2015. Sentiment after translation:

A case-study on Arabic social media posts. In Proceedings of the 2015 conference of

the North American chapter of the association for computational linguistics: Human language

technologies.

Socher, Richard, et al. 2013. Recursive deep models for semantic compositionality over a

sentiment treebank. In Proceedings of the conference on empirical methods in natural language

processing (EMNLP), vol. 1631.

Somasundaran, Swapna, and Janyce Wiebe. 2009. Recognizing stances in online debates. In

Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th

international joint conference on natural language processing of the AFNLP: Volume 1-volume

1. Association for Computational Linguistics.

Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. 1966. The general inquirer: A computer

approach to content analysis. Cambridge, MA: MIT Press.


Strapparava, Carlo, and Alessandro Valitutti. 2004. WordNet affect: An affective extension of

WordNet. LREC, vol. 4.

Tao, L., Y. Zhang, and V. Sindhwani. 2009. A non-negative matrix tri-factorization approach to

sentiment classiﬁcation with lexical prior knowledge. In Proceedings of the joint conference

of the 47th annual meeting of the ACL and the 4th international joint conference on natural

language processing of the AFNLP Association for Computational Linguistics.

Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition

from Congressional floor-debate transcripts. In Proceedings of the 2006 conference on

empirical methods in natural language processing. Association for Computational Linguistics.

Wan, Xiaojun. 2009. Co-training for cross-lingual sentiment classiﬁcation. In Proceedings of

the joint conference of the 47th annual meeting of the ACL and the 4th international joint

conference on natural language processing of the AFNLP: Volume 1-volume 1. Association for

Computational Linguistics.

Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and

emotions in language. Language Resources and Evaluation 39 (2–3): 165–210.

Xu, Ge, Xinfan Meng, and Houfeng Wang. 2010. Build Chinese emotion lexicons using a graph-

based algorithm and multiple resources. In Proceedings of the 23rd international conference

on computational linguistics. Association for Computational Linguistics.

Chapter 6

Generative Models for Sentiment Analysis

and Opinion Mining

Hongning Wang and ChengXiang Zhai

Abstract This chapter provides a survey of recent work on using generative models

for sentiment analysis and opinion mining. Generative models attempt to model the

joint distribution of all the relevant data with parameters that can be interpreted as

reﬂecting latent structures or properties in the data. As a result of ﬁtting such a

model to the observed data, we can obtain an estimate of these parameters, thus

“revealing” the latent structures or properties of the data to be analyzed. Such

models have already been widely used for analyzing latent topics in text data. Some

of the models have been extended to model both topics and sentiment of a topic,

thus enabling sentiment analysis at the topic level. Moreover, new generative models

have also been developed to model both opinionated text data and their companion

numerical sentiment ratings, enabling deeper analysis of sentiment and opinions to

not only obtain subtopic-level sentiment but also latent relative weights on different

subtopics. These generative models are general and robust and require no or little

human effort in model estimation. Thus they can be applied broadly to perform

sentiment analysis and opinion mining on any text data in any natural language.

Keywords Generative model • Probabilistic topic model • Topic-sentiment mixture

• Latent aspect rating analysis • Latent variable analysis

There are many approaches to performing sentiment analysis and opinion mining.

At a high level, we can distinguish two main families of approaches. The ﬁrst is

rule-based approaches where human expertise is leveraged to create rules (e.g.,

sentiment lexicon) for determining sentiment of a text object (Ding and Liu 2007;

Ding et al. 2008; Esuli and Sebastiani 2006; Taboada et al. 2011; Cambria et al.

2016). The second is statistical model based approaches, where statistical models

H. Wang ()

Department of Computer Science, University of Virginia, 22903, Charlottesville, VA, USA

e-mail: hw5x@virginia.edu

C.X. Zhai

Department of Computer Science, University of Illinois at Urbana-Champaign, 61801, Urbana,

IL, USA

e-mail: czhai@illinois.edu

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_6


are estimated on labeled data or domain-speciﬁc priors generated by humans to

essentially learn “soft” rules for sentiment prediction (Dave et al. 2003; Kim and

Hovy 2004; Leskovec et al. 2010; Pang et al. 2002; Poria et al. 2015), also known as

learning-based methods. Learning-based approaches usually require labeled data

for parameter estimation, while rule-based approaches have less dependence on

manual annotation but suffer from limited generalization capability. The

rules can also be treated as high-level features to be used in a statistical model so as

to combine the two families of approaches (Hu et al. 2013; Lu et al. 2011; Rao and

Ravichandran 2009; Melville et al. 2009).

Among the statistical approaches, we may further distinguish generative models

from discriminative models (Bishop 2006). Generative models focus on modeling

the joint probability between class labels (e.g., sentiment labels) and data instances

(e.g., text documents). Latent variables can be introduced in generative models to

capture the unobservable or missing structures, e.g., latent topics (Blei et al. 2003;

Blei 2012; Hofmann 1999). As a result, a generative model is a full probabilistic

model of both observed and unobserved variables. In general, generative models

attempt to model the joint distribution of all the relevant data with parameters that

can be interpreted as reﬂecting latent structures or properties in the data. As a result

of ﬁtting such a model to the observed data, we can obtain an estimate of these

parameters, thus “revealing” the latent structures or properties of the data to be

analyzed.

In contrast, discriminative models, such as support vector machines (Hearst

et al. 1998; Joachims 1998), directly model the decision boundaries, e.g., the

conditional probability of class labels given data instances. Thus, a discriminative

model provides a model only for the target variables conditioned on the observed

variables. Flexible feature representations can be exploited in discriminative models,

and empirically they often result in better classiﬁcation performance than generative

models (Jordan 2002). This category of statistical solutions for sentiment analysis

has been well discussed in the survey books by Liu and by Pang and Lee (Liu 2012, 2015; Pang

and Lee 2008), and therefore we will not cover it in this chapter.

In addition to supporting sentiment classification, one major advantage of generative

models over discriminative models is the ability to express complex relationships

between the observed and target variables, even when such relationships

are not directly observable. This property is of particular importance in sentiment

analysis and opinion mining, when formalizing the subtle dependency between

sentiment and text document content for more accurate modeling of opinions.

Promising progress in exploring generative models for sentiment analysis and

opinion mining has been achieved in recent studies (Lin and He 2009; Mei et al.

2007; Titov and McDonald 2008a; Jo and Oh 2011; Wang et al. 2010, 2011;

McAuley and Leskovec 2013; Moghaddam and Ester 2011). Previously, generative

models have already been widely used for analyzing latent topics in text documents,

e.g., topic models (Blei et al. 2003; Blei 2012; Hofmann 1999). Some of the

models have been extended to model the sentiment of a topic, thus enabling

sentiment analysis at the topic level (Lin and He 2009; Mei et al. 2007; Titov


and McDonald 2008a; Jo and Oh 2011). Moreover, new generative models have

also been developed to model both opinionated text data and their companion

numerical sentiment ratings, enabling deeper analysis of sentiment and opinions to

not only obtain subtopic-level sentiment but also latent relative weights on different

subtopics (Wang et al. 2010, 2011; McAuley and Leskovec 2013; Moghaddam

and Ester 2011). This chapter provides a survey of these recent works on using

generative models for sentiment analysis and opinion mining, and discusses various

applications of such models.

The rest of this chapter is organized as follows. In Sect. 6.1, we provide essential

background about language models and topic models, which is the basis of the

generative models that we will review in this chapter. We then present a detailed

review of the major generative models for sentiment analysis in Sect. 6.2. We will

discuss their applications in Sect. 6.3. To facilitate application development using

such models, in Sect. 6.4, we also provide a brief review of the relevant resources

on the Web.

6.1 Background: Language Models and Probabilistic Topic

Models

As background, we first introduce generative models for text data, starting

from N-gram language models and proceeding to probabilistic

topic models. We will introduce the two most typical topic models, i.e., the probabilistic

latent semantic indexing model (Hofmann 1999) and the latent Dirichlet allocation

model (Blei et al. 2003). We will also brieﬂy discuss the model estimation procedure

for these generative models.

6.1.1 Language Models for Text

The simplest generative models for text data are N-gram language models,

which were first introduced in speech recognition for distinguishing between

words and phrases that sound similar (Katz 1987; Rabiner and Juang 1993) and

later introduced to information retrieval for matching keyword queries with text documents

(Ponte and Croft 1998; Hiemstra and Kraaij 1998; Zhai and Lafferty 2001a).

A statistical language model speciﬁes a probability distribution over sequences

of words. For example, with a language model estimated on a collection of computer

science research papers, one can make statistical assertions about which text

sequence is more likely to be generated by a computer scientist, e.g., P(“generative

models for sentiment analysis”) > P(“the ﬂight to Chicago is cancelled”). Formally,

a language model $P(w_1, w_2, \ldots, w_n)$ specifies the joint probability of observing

the word sequence $w_1, w_2, \ldots, w_n$. Using the chain rule of probability, it can be

written as

$$P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1}) \tag{6.1}$$

where $P(w_k \mid w_1, \ldots, w_{k-1})$ is a multinomial distribution over words in the vocabulary

given the word sequence $w_1, \ldots, w_{k-1}$.

The chain rule shows the link between computing the joint probability of a

sequence of words and computing the conditional probability of a word given all

preceding words. Intuitively, Eq. (6.1) defines the generation process of a word

sequence: repeatedly select the next word conditioned on all the words that precede

it until the predefined sequence length is reached. For this reason, such a model is

often called a generative model.

Although Eq. (6.1) suggests that one can compute the joint probability of

an entire sequence of words by multiplying together a number of conditional

probabilities, it does not reduce the computational complexity. The bottleneck is

that we do not have any efﬁcient way to compute the exact probability of a word

given a long sequence of preceding words. For example, with a vocabulary size

of $V$, to compute $P(w_k \mid w_1, \ldots, w_{k-1})$ one needs in total $(V-1)V^{k-1}$ elements in

the probability table (minus one because the probabilities sum up to one). This

complexity is of the same order as that of directly computing $P(w_1, \ldots, w_k)$, which is

$V^k - 1$. Since in general these probabilities must be estimated from empirically

observed data, and in practice we almost never have enough data to observe

all these different sequences, we must make simplifying assumptions about the

model to make it tractable and actually useful in an application.
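To give a feel for these table sizes, the short sketch below evaluates the two expressions from the paragraph above. The function names and the example vocabulary size of 10,000 words are our own illustrative assumptions.

```python
def conditional_table_size(vocab_size, k):
    """Free parameters of P(w_k | w_1, ..., w_{k-1}): (V - 1) * V**(k - 1)."""
    return (vocab_size - 1) * vocab_size ** (k - 1)

def joint_table_size(vocab_size, k):
    """Free parameters of the joint P(w_1, ..., w_k): V**k - 1."""
    return vocab_size ** k - 1

# Hypothetical vocabulary of 10,000 words.
V = 10_000
for k in (1, 2, 3, 5):
    print(k, conditional_table_size(V, k), joint_table_size(V, k))
```

Even for sequences of five words, both counts are on the order of 10^20 for this vocabulary, far more than any corpus could support with observed counts.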

N-gram language models provide a practical solution to this computational complexity

challenge: instead of computing the probability of a word given the entire

preceding sequence, we can approximate the preceding sequence by just a finite

number of previous words, i.e., $P(w_k \mid w_1, \ldots, w_{k-1}) \approx P(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$.

The assumption that the conditional probability of a word depends only on

the previous $N-1$ words is called a Markov assumption. The unigram model is the

simplest N-gram language model, in which one assumes the current word is totally

independent of any other words in the sequence, i.e., $P(w_k \mid w_1, \ldots, w_{k-1}) = P(w_k)$;

as a result,

$$P(w_1, w_2, \ldots, w_n) = \prod_{k=1}^{n} P(w_k) \tag{6.2}$$

In the literature, the unigram language model is also referred to as the bag-of-words model

(Harris 1954), since the order between words is totally ignored. To capture the local

dependency between words, bigram and trigram models are usually exploited.


One fundamental problem in applying N-gram language models is to

estimate the N-gram probabilities $P(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$. The simplest and

most intuitive way of estimating such probabilities is maximum likelihood

estimation (Bishop 2006), in which one looks for the configuration of those

unknown probabilities that maximizes the likelihood function over a given set of

training data. For the general case of maximum likelihood estimation for N-gram

language models, one estimates the conditional probability as follows,

$$P(w_k \mid w_{k-N+1}, \ldots, w_{k-1}) = \frac{C(w_{k-N+1}, \ldots, w_{k-1} w_k)}{C(w_{k-N+1}, \ldots, w_{k-1})} \tag{6.3}$$

where $C(w_{k-N+1}, \ldots, w_{k-1})$ is the frequency of word sequence $w_{k-N+1}, \ldots, w_{k-1}$

in the training corpus.
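The count ratio in Eq. (6.3) can be sketched in a few lines. The toy corpus, the function names and the `<s>` start-of-sentence padding symbol are assumptions introduced for this illustration; histories are counted only where a next word follows, so that the estimates for each history sum to one.

```python
from collections import Counter

def train_ngram_mle(sentences, n):
    """Count N-grams and their (N-1)-word histories in tokenized sentences."""
    ngram_counts = Counter()
    history_counts = Counter()
    for tokens in sentences:
        # Pad with an assumed start symbol so early words have a full history.
        padded = ["<s>"] * (n - 1) + list(tokens)
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            ngram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts

def mle_probability(word, history, ngram_counts, history_counts):
    """Maximum likelihood estimate C(history, word) / C(history)."""
    history = tuple(history)
    if history_counts[history] == 0:
        return 0.0  # unseen history: no estimate available without smoothing
    return ngram_counts[history + (word,)] / history_counts[history]

# Hypothetical toy corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
bigrams, histories = train_ngram_mle(corpus, n=2)
print(mle_probability("cat", ["the"], bigrams, histories))
print(mle_probability("flew", ["cat"], bigrams, histories))
```

The second call returns zero because the bigram "cat flew" never occurs in the toy corpus.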

One important concept in maximum likelihood estimation for N-gram language

models is “smoothing.” Due to the sparse observations in the training data,

zero probability is assigned to some word sequences, so that any sequence

containing them also has zero probability in the estimated model. Various

types of techniques have been developed to smooth a language model, e.g., Laplace

smoothing, Good-Turing discounting and linear interpolation. Since this topic is

beyond the scope of this book, we refer readers to the following literature

for more details (Jurafsky and Martin 2009; Chen and Goodman 1996; Zhai and

Lafferty 2001b).
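As a minimal illustration of the simplest of these techniques, the sketch below applies Laplace (add-one) smoothing to a bigram model: one is added to every bigram count and the vocabulary size to the history count. The toy text and function name are hypothetical, and practical systems usually prefer the more refined methods discussed in the cited literature.

```python
from collections import Counter

def laplace_bigram_probability(word, prev, bigram_counts, prev_counts, vocab_size):
    """Add-one estimate: (C(prev, word) + 1) / (C(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (prev_counts[prev] + vocab_size)

# Hypothetical toy text.
tokens = "the cat sat the cat ran".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
prev_counts = Counter(tokens[:-1])  # histories that are followed by a word
vocab = sorted(set(tokens))

seen = laplace_bigram_probability("cat", "the", bigram_counts, prev_counts, len(vocab))
unseen = laplace_bigram_probability("ran", "the", bigram_counts, prev_counts, len(vocab))
print(seen, unseen)
```

The unseen bigram "the ran" now receives a small non-zero probability, and the smoothed estimates for a given history still sum to one over the vocabulary.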

6.1.2 Probabilistic Topic Models

Topic models are a class of generative models for uncovering the underlying

semantic structure of a document collection. The original idea of topic
modeling is rooted in Deerwester et al.'s seminal work on latent semantic indexing

(LSI) (Deerwester et al. 1990), in which singular value decomposition is performed

to discover inter- and intra-document statistical structures in a lower dimensional

space. However, this approach is not a generative model, making it unclear how to

interpret the latent topics discovered. A signiﬁcant step forward in this direction was

made by Hofmann (1999), who solved the problem of latent semantic indexing in a

probabilistic fashion (the pLSI model). In pLSI, words and documents are modeled

in a generative perspective: a document is modeled as a mixture of latent topics

and each topic is modeled as a multinomial distribution over words. However, the pLSI
model is not a complete generative model, since it does not specify the generation
process at the document level. To address this problem, a fully Bayesian probabilistic
model, the latent Dirichlet allocation (LDA) model (Blei et al. 2003), was introduced, in

which the topic proportion in each document is assumed to be drawn from a shared

Dirichlet distribution in the same corpus. LDA is an important milestone which

opened up many possibilities for further development of various generative models

for modeling topics. It has served as a springboard for many other topic models in

analyzing different types of text data, including scientiﬁc literature (Steyvers et al.

2004; Blei and Lafferty 2007; Wang and Blei 2011), social media (Zhao et al. 2011;

Hong and Davison 2010) and opinionated text reviews (Titov and McDonald 2008a;

Lin and He 2009; Mei et al. 2007; Jo and Oh 2011; Wang et al. 2011).

In this section, we will brieﬂy introduce these two basic probabilistic topic

models for text modeling, i.e., pLSI and LDA. We will focus on the basic notations,

generative assumptions, graphical model representation, and model estimation

procedure for each model.

6.1.2.1 pLSI

Probabilistic latent semantic indexing (pLSI), also known as probabilistic latent

semantic analysis (pLSA) is a generative model for document modeling. It models

a text document as a mixture over a set of latent topics, and each topic is modeled as

a probabilistic distribution over a ﬁxed vocabulary. To formally describe the pLSI

model, and later other more advanced topic models, we will ﬁrst introduce some

notations and terminologies.

Formally, a word $w$ is the basic unit defined in a fixed-size vocabulary, indexed from 1 to $V$. A document is a length-$N$ sequence of words, denoted as $d = (w_1 w_2 \ldots w_N)$. A corpus is a collection of $M$ documents, denoted as $D = \{d_1, d_2, \ldots, d_M\}$. In pLSI, a corpus is assumed to contain a set of $k$ latent topics, each of which is modeled as a multinomial distribution over the vocabulary, i.e., $p(w \mid \beta_i)$, where $\beta_i$ is the distribution parameter for topic $i$. Thus a document is modeled as a composition of those $k$ topics: each word in a document is generated from a single topic indexed by $z$, and different words in a document may be generated from different topics.

An important assumption made in the pLSI model is that given the topic assignments $z = (z_1 z_2 \ldots z_N)$ for the words in a document $d$, the words are independent of the document index. As a result, the joint probability of document $d$ and its words $w_1 w_2 \ldots w_N$ can be computed as,

$$P(d, w_1 w_2 \ldots w_N) = P(d) \prod_{i=1}^{N} \sum_{z_i} P(w_i \mid z_i) P(z_i \mid d) \qquad (6.4)$$

The decomposition of the joint probability of a document and its words in pLSI can be described by the following generative process:

1. For each $d \in D$, sample $d$ by $d \sim p(d)$;

2. To generate each word $w_i \in d$,

a. Sample topic assignment $z_i$ by $z_i \sim p(z \mid d)$;

b. Sample word $w_i$ by $w_i \sim p(w \mid \beta, z_i)$;

Using the graphical model representation, the above generation process of a text document defined by the pLSI model is illustrated in Fig. 6.1.
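The generative process above can be simulated directly. The sketch below is illustrative only: the function name, the array layout (`p_z_given_d` as a documents-by-topics matrix, `beta` as a topics-by-vocabulary matrix) and the toy parameter values are assumptions, not part of the pLSI specification.

```python
import numpy as np

def sample_plsi_document(p_d, p_z_given_d, beta, doc_length, rng):
    """Sample a document index, topic assignments and words as in pLSI."""
    d = rng.choice(len(p_d), p=p_d)                                      # step 1: d ~ p(d)
    z = rng.choice(beta.shape[0], size=doc_length, p=p_z_given_d[d])     # step 2a: z_i ~ p(z | d)
    w = np.array([rng.choice(beta.shape[1], p=beta[z_i]) for z_i in z])  # step 2b: w_i ~ p(w | beta, z_i)
    return d, z, w

rng = np.random.default_rng(0)
p_d = np.array([0.5, 0.5])                        # two training documents
p_z_given_d = np.array([[0.9, 0.1], [0.2, 0.8]])  # topic proportions per document
beta = np.array([[0.5, 0.5, 0.0, 0.0],            # topic 0 uses words 0 and 1
                 [0.0, 0.0, 0.5, 0.5]])           # topic 1 uses words 2 and 3
d, z, w = sample_plsi_document(p_d, p_z_given_d, beta, doc_length=20, rng=rng)
print(d, z, w)
```

Note that the document variable is only an index into the rows of `p_z_given_d`, which is why pLSI cannot generate a document it has not seen during training.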

Fig. 6.1 Graphical model representation of probabilistic latent semantic indexing (pLSI) model.

The plates represent replicates, where the index on the bottom right corner indicates the number

of repetitions. The outer plate represents documents, while the inner plate represents the repeated

choice of topics and words within a document. The circles represent random variables, where

shaded circle indicates observable variables and light circle indicates latent variables

Consider the unigram language model described in Eq. (6.2), which assumes the

whole corpus only contains one topic and every word in documents is sampled from

that topic. pLSI relaxes this assumption by introducing $k$ latent topics in a given
collection, and allows each document to be a mixture over those $k$ topics. Hence, in
pLSI each document is represented as a list of mixing proportions for these mixture
components (i.e., $p(z \mid d)$) and thereby reduced to a probability distribution on a fixed

set of topics. Those mixing proportions can be considered as a lower dimensional

representation of a document, which can also be regarded as useful knowledge about

coverage of topics in each document.

The pLSI model has served as a building block in many other generative models for
text documents. Brants et al. used the pLSI model to perform topic-based document

segmentation (Brants et al. 2002), Mei et al. utilized it to model the facets and

opinions in weblogs (Mei et al. 2007) and discover evolutionary theme patterns from

text (Mei and Zhai 2005), Zhai et al. used it for cross-collection comparative text

mining (Zhai et al. 2004), and Lu et al. exploited it for rated aspect summarization

of short comments (Lu et al. 2009).

The pLSI model has two sets of parameters to be estimated, i.e., the word distribution under a given topic $i$, $p(w \mid \beta_i)$, and the topic proportions in a given document $d$, $p(z \mid d)$. Due to the existence of latent variables in pLSI (i.e., the topic assignments of words), the maximum likelihood estimate no longer has a closed-form solution. The expectation maximization (EM) algorithm (Dempster et al. 1977) is popularly used to estimate those two sets of parameters. Briefly, the EM algorithm maximizes a lower bound of the data log-likelihood (i.e., $\log p(d, w) = \log \sum_z p(d, w, z)$) by computing the expectation of the complete data log-likelihood over the latent variables (i.e., $E_z[\log p(d, w, z)]$). Two steps are executed alternately in the EM algorithm: in the E-step, the expectation of the complete data log-likelihood over the latent variables is computed; in the M-step, the optimal model parameters are found to maximize this expectation. Since a principled derivation of the EM algorithm and the proof of its convergence are beyond the scope of this book, interested readers can refer to Dempster et al. (1977), McLachlan and Krishnan (2007), and Wu (1983) for more details.
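To make the two steps concrete, the following is a minimal NumPy sketch of EM for pLSI on a document-word count matrix. It is not a reference implementation: the function name `plsi_em`, the random Dirichlet initialization and the small constant added for numerical safety are choices made here, and the code assumes every document contains at least one word. The tracked quantity is the log-likelihood of the words given the documents, which omits the constant term involving $p(d)$.

```python
import numpy as np

def plsi_em(counts, n_topics, n_iters=50, seed=0):
    """Fit pLSI by EM. counts: (M, V) matrix of word counts per document."""
    rng = np.random.default_rng(seed)
    n_docs, vocab_size = counts.shape
    p_z_d = rng.dirichlet(np.ones(n_topics), size=n_docs)      # p(z | d), shape (M, k)
    p_w_z = rng.dirichlet(np.ones(vocab_size), size=n_topics)  # p(w | z), shape (k, V)
    trace = []
    for _ in range(n_iters):
        # E-step: posterior p(z | d, w) proportional to p(w | z) p(z | d).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # shape (M, k, V)
        p_w_d = joint.sum(axis=1)                              # p(w | d), shape (M, V)
        trace.append(float((counts * np.log(p_w_d + 1e-12)).sum()))
        posterior = joint / (p_w_d[:, None, :] + 1e-12)
        # M-step: re-estimate both parameter sets from expected counts.
        expected = counts[:, None, :] * posterior              # shape (M, k, V)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z, trace

# Toy count matrix (hypothetical data): two documents per underlying topic.
counts = np.array([[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 1, 4, 6]], dtype=float)
p_z_d, p_w_z, trace = plsi_em(counts, n_topics=2)
print(np.round(p_z_d, 2))
```

Running the sketch with different seeds gives different solutions, which illustrates why initialization matters in practice.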

The EM iterations are guaranteed to stop at a local maximum. However, there is
no guarantee that an EM algorithm will find the global optimum. As a result, pLSI is

prone to overﬁtting the data and good initialization in pLSI becomes very important.

Another source of overﬁtting in the pLSI model is its incomplete generative

process: the document variable $d$ is simply modeled as an index in the corpus, and

there is no generative assumption about it. As a result, the number of parameters

in the model grows linearly with the size of the corpus (each document has its own

k-dimensional topic proportion vector), and it is not clear how to assign probability

to a document outside of the training set.

To address these limitations, latent Dirichlet allocation model was introduced

later to impose a full generative assumption about the document generation process.

We will introduce the LDA model in the next section.

6.1.2.2 LDA and Advanced Topic Models

The latent Dirichlet allocation (LDA) model, proposed by Blei et al. (2003), introduces a shared Dirichlet distribution over the topic proportions in each document to control the number of parameters in a topic model. As shown in Fig. 6.2, the topic proportion $p(z \mid \theta, d)$ in document $d$ is modeled as a multinomial distribution parameterized by a $k$-dimensional vector $\theta$, which is assumed to be drawn from a Dirichlet distribution with $\alpha$ as the concentration parameter,

$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1} \qquad (6.5)$$

where $\Gamma(\cdot)$ is the Gamma function.

According to Fig. 6.2, the generative process of documents specified by an LDA model can be described as follows,

1. For each $d \in D$, sample $\theta$ by $\theta \sim \mathrm{Dir}(\alpha)$;

2. For each $w_i \in d$,

a. Sample topic assignment $z_i$ by $z_i \sim p(z \mid \theta, d)$;

b. Sample word $w_i$ by $w_i \sim p(w \mid \beta, z_i)$;
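A short simulation of these steps is sketched below. The function name, the representation of `beta` as a topics-by-vocabulary matrix and the toy values are assumptions made for illustration.

```python
import numpy as np

def sample_lda_document(alpha, beta, doc_length, rng):
    """Sample topic proportions, topic assignments and words as in LDA."""
    theta = rng.dirichlet(alpha)                                         # step 1: theta ~ Dir(alpha)
    z = rng.choice(len(alpha), size=doc_length, p=theta)                 # step 2a: z_i ~ p(z | theta)
    w = np.array([rng.choice(beta.shape[1], p=beta[z_i]) for z_i in z])  # step 2b: w_i ~ p(w | beta, z_i)
    return theta, z, w

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])             # shared Dirichlet concentration parameter
beta = np.array([[0.5, 0.5, 0.0, 0.0],   # topic 0 uses words 0 and 1
                 [0.0, 0.0, 0.5, 0.5]])  # topic 1 uses words 2 and 3
theta, z, w = sample_lda_document(alpha, beta, doc_length=20, rng=rng)
print(np.round(theta, 2), z, w)
```

Unlike in pLSI, the topic proportions are drawn afresh for every document from the shared prior, so the same code can generate documents outside any training set.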

The corresponding joint probability of words $w$, latent topic assignments $z$, and latent topic proportion $\theta$ in document $d$ specified by an LDA model can be computed as,

$$p(w, z, \theta \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(w_n \mid \beta, z_n) p(z_n \mid \theta) \qquad (6.6)$$

Fig. 6.2 Graphical model representation of latent Dirichlet allocation (LDA) model. $\alpha$ and $\beta$ are corpus-level parameters for the distribution of topic proportion in documents and word distribution under topics (Blei et al. 2003)

The LDA model postulates a two-layer hierarchical Bayesian assumption in the document generation process: the topic proportion $\theta$ is drawn from a Dirichlet distribution, and the specific topic assignment of each word is drawn from a multinomial distribution specified by $\theta$. The conjugacy between the Dirichlet distribution and the multinomial distribution provides additional computational advantage, which facilitates posterior inference. Compared to the pLSI model, the topic proportion $\theta$ is now modeled as a latent variable, rather than a model parameter. This makes the number of parameters in the LDA model independent of the size of the training corpus, and provides a principled way to estimate the topic proportion in unseen test documents, i.e., via statistical posterior inference.

Many extensions of LDA have been made. Blei and Lafferty replaced the Dirichlet prior for the topic proportion in documents with a logistic normal distribution to model the covariance of topics in a corpus (Blei and Lafferty 2007). Temporal dynamics of word distribution under topics in a given corpus are modeled in Blei and Lafferty (2006). Both continuous supervision (Mcauliffe and Blei 2008), e.g., opinion ratings, and discrete supervision (Zhu et al. 2009; Ramage et al. 2009), e.g., sentiment class, are introduced into LDA. Teh et al. introduced another layer of Bayesian hierarchy over the generation of Dirichlet parameter $\alpha$ (Teh et al. 2006), such that the clustering property of documents can be captured.

Because of the coupling between the continuous variable $\theta$ and the discrete variable $z$ in a document, posterior inference in the LDA model is more challenging than in the pLSI model. The two most popular inference methods are Gibbs sampling (Griffiths and Steyvers 2004) and variational inference (Blei et al. 2003). Both inference methods take advantage of the conjugacy between the Dirichlet distribution and the multinomial distribution to facilitate the computation, e.g., $\theta$ can be integrated out in Gibbs sampling and a closed-form solution for $\theta$ exists in variational inference. Further details about those two inference procedures can be found in Andrieu et al. (2003) and Wainwright and Jordan (2008). Parallel implementations of the LDA model for large-scale document collections can be found in Smola and Narayanamurthy (2010), Andrieu et al. (2003), Zhai et al. (2012), and Wang et al. (2009). Parameter estimation in an LDA model can also be achieved via EM algorithms (Blei et al. 2003).
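As a rough illustration of how integrating out the topic proportions simplifies inference, the sketch below implements a collapsed Gibbs sampler in the style of Griffiths and Steyvers (2004). It additionally assumes a symmetric Dirichlet prior with parameter `eta` on the word distribution of each topic and symmetric scalar hyperparameters; the function name, argument names and defaults are choices made here, not taken from the cited papers.

```python
import numpy as np

def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_sweeps=100, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))    # words in document d assigned to topic t
    topic_word = np.zeros((n_topics, vocab_size))  # occurrences of word w assigned to topic t
    topic_total = np.zeros(n_topics)               # total words assigned to topic t
    assignments = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, assignments[d]):
            doc_topic[d, t] += 1
            topic_word[t, w] += 1
            topic_total[t] += 1
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment of this word from the counts.
                t = assignments[d][n]
                doc_topic[d, t] -= 1
                topic_word[t, w] -= 1
                topic_total[t] -= 1
                # Conditional p(z = t | all other assignments, words), up to a constant.
                weights = (doc_topic[d] + alpha) * (topic_word[:, w] + eta) / (topic_total + vocab_size * eta)
                t = rng.choice(n_topics, p=weights / weights.sum())
                assignments[d][n] = t
                doc_topic[d, t] += 1
                topic_word[t, w] += 1
                topic_total[t] += 1
    # Point estimates of topic proportions and word distributions from the final counts.
    theta = (doc_topic + alpha) / (doc_topic + alpha).sum(axis=1, keepdims=True)
    phi = (topic_word + eta) / (topic_word + eta).sum(axis=1, keepdims=True)
    return theta, phi, assignments

# Toy corpus of word ids (hypothetical data, for illustration only).
docs = [[0, 1, 0, 1, 1], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
theta, phi, assignments = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=4)
print(np.round(theta, 2))
```

Only count tables are maintained during sampling; the topic proportions are recovered from the counts at the end, which is the practical benefit of the Dirichlet-multinomial conjugacy mentioned above.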

6.2 Generative Models for Sentiment Analysis

With the basic concepts about generative modeling of text documents introduced in the previous section, we are now ready to discuss how to utilize the generative models for sentiment analysis. Before diving into the details of specific models, we will first define some categorizations of generative models for sentiment analysis to facilitate our later discussions. According to the notion proposed in Mimno and McCallum’s work (Mimno and McCallum 2008), we can categorize most existing generative models for sentiment analysis as upstream models and downstream models, according to their particular dependency assumption among the sentiment label $s$, topic assignment $z$ and observed word $w$ in a given document. Using the language of graphical models, we can illustrate these two classes of generative models for sentiment analysis in Fig. 6.3.

Fig. 6.3 Basic categorization of generative models for sentiment analysis (Mimno and McCallum 2008). (a) Upstream model. (b) Downstream model

Upstream models assume that in order to generate a word $w_{d,n}$ in a given document $d$, one needs to first decide the sentiment polarity $s_{d,n}$ of this word, and $s_{d,n}$ then determines the topic assignment $z_{d,n}$ for this word. Upstream models usually model sentiment as discrete labels and assume there are different topic proportions under different sentiment labels. In contrast, downstream models assume the sentiment label $s_{d,n}$ is determined by the topic assignment $z_{d,n}$, in parallel to the word $w_{d,n}$. Therefore, downstream models are more flexible in modeling the sentiment, e.g., continuous ratings can also be modeled (Mcauliffe and Blei 2008; Wang et al. 2011). The key difference between the two kinds of models lies in the way we specify the dependency.

Intuitively, in the upstream models, topics and words are potentially dependent on

the sentiment variable, thus it can be regarded as in the “up stream” with its inﬂuence

on other variables directly captured in the model. In the downstream model, the

sentiment variable is assumed to depend on topics, thus the sentiment variable can

be regarded as in the “down stream”, and the model attempts to capture how other

variables (mostly topics) inﬂuence the sentiment variable. Since we treat sentiment

as a response variable of topic variable, it opens up many different ways to model

sentiment, and can easily model numerical ratings, which would be hard to model

with an upstream model.

One thing we need to emphasize about the graphical representation illustrated in Fig. 6.3 is that we do not explicitly distinguish the scope of sentiment label $s$, e.g., a document-level label vs. a word-level variable. In some existing models, $s$ is considered as a document-level variable, such that all $s_{d,n}$ are forced to share the same value (Mcauliffe and Blei 2008; Wang et al. 2011); while some models treat $s$ as a word-level or sentence-level variable, so that different words or sentences in the same document might be associated with different sentiment (Jo and Oh 2011; Lin and He 2009; Mei et al. 2007). Another factor not specified in Fig. 6.3 is whether $s_{d,n}$ is observable or latent. In most downstream models, $s_{d,n}$ is considered as an observable random variable, e.g., sentiment class label for the documents (Mcauliffe

and Blei 2008). Some upstream models treat $s_{d,n}$ as latent variables and a sentiment prior is introduced to guide the corresponding model learning process, e.g., in Mei et al. (2007), Lin and He (2009), and Jo and Oh (2011); while some consider it as document-level observable variables (Ramage et al. 2009, 2011).

Following this categorization, we will introduce the basic modeling assumptions,

model speciﬁcations and interesting ﬁndings and results from upstream and down-

stream models for sentiment analysis in the following sections.

6.2.1 Upstream Models for Sentiment Analysis

Upstream models assume that to generate a word in a text document, one needs to

ﬁrst sample a latent sentiment label, then sample a topic label with respect to this

sentiment category, and ﬁnally sample the word from this chosen topic. One typical

upstream generative model for sentiment analysis is the Topic-Sentiment Mixture

model (TSM) proposed in Mei et al. (2007). TSM is constructed based on the pLSI

model: in addition to assuming a corpus consists of $k$ topics with neutral sentiment,

TSM introduces two additional sentiment models, one for positive and one for

negative opinions. In TSM, the sentiment models are assumed to be orthogonal

to topic models in the sense that they would assign high probabilities to general

words that are frequently used to express sentiment polarities whereas topical

models would assign high probabilities to words representing topical contents with

neutral opinions. For example, for a collection of MP3 player reviews, the words

“nano,” “price” and “mini” are supposed to be observed more often in the neutral

topic models, “awesome,” “love” are more likely to be found in positive sentiment

models, and “hate,” “bad” are more likely to be found in negative sentiment models.

A new concept called “theme” is then introduced in TSM and it is modeled as a

compound of these three components: neutral words, positive words and negative

words, in each document. The combination of topic models and sentiment models

creates a theme about a particular aspect with certain sentiment polarity in a

given document. And such combination varies across different documents to reﬂect

users’ distinct sentiment polarities toward the same aspect. Once the themes are

determined, a document is modeled as a mixture over the themes, and the rest
of the generation process follows that of the pLSI model.

We followed the representation used in Mei et al. (2007) to depict the TSM model

in Fig. 6.4. We should note this representation does not follow the conventional

graphical model representation of probabilistic models. According to the ﬁgure,

the generation of words from the document-speciﬁc themes follows the same

assumption as that in a pLSI model. The themes in a particular document are

modeled as another mixture over the corpus-level neutral, positive and negative

topics. As a result, a TSM model can be considered as a three-layer Bayesian model

of documents.
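The layered mixture described above can be written as a word distribution for a single document. The sketch below is an illustration under stated assumptions: the argument names (`lambda_b` for the background weight, `pi_d` for the theme coverage in the document, `delta_d` for the neutral, positive and negative mixing weights of each theme) follow the labels used by Mei et al. (2007) but are chosen here, and all numeric values are made up.

```python
import numpy as np

def tsm_word_distribution(lambda_b, p_background, pi_d, delta_d, p_neutral, p_positive, p_negative):
    """Word distribution of one document under a TSM-style mixture.

    pi_d: (k,) theme coverage; delta_d: (k, 3) weights over (neutral, positive, negative);
    p_neutral: (k, V) neutral topic models; p_positive, p_negative, p_background: (V,).
    """
    # Each theme mixes its own neutral topic with the shared sentiment models.
    themes = (delta_d[:, [0]] * p_neutral
              + delta_d[:, [1]] * p_positive[None, :]
              + delta_d[:, [2]] * p_negative[None, :])   # shape (k, V)
    # A word comes from the background model or from the document's mixture of themes.
    return lambda_b * p_background + (1.0 - lambda_b) * (pi_d @ themes)

p_background = np.full(4, 0.25)
p_neutral = np.array([[0.7, 0.3, 0.0, 0.0], [0.3, 0.7, 0.0, 0.0]])
p_positive = np.array([0.0, 0.0, 1.0, 0.0])   # word 2 plays the role of a positive word
p_negative = np.array([0.0, 0.0, 0.0, 1.0])   # word 3 plays the role of a negative word
pi_d = np.array([0.6, 0.4])
delta_d = np.array([[0.5, 0.4, 0.1], [0.8, 0.1, 0.1]])
p_w = tsm_word_distribution(0.2, p_background, pi_d, delta_d, p_neutral, p_positive, p_negative)
print(np.round(p_w, 3))
```

Because `delta_d` is specific to the document, the same neutral topic can be paired with mostly positive words in one review and mostly negative words in another.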

Since the TSM model is based on the pLSI model, an EM algorithm with closed-form posterior inference is possible. TSM is unsupervised and it does not directly model sentiment labels. In TSM, a sentiment prior extracted from an external corpus is introduced to the EM algorithm to guide the parameter estimation of sentiment models. Thus a collection of text data with sentiment labels is needed to induce priors for effective separation of positive and negative topics, but the sample text data does not have to be related to the opinionated text data to be analyzed. With the learned topic models and sentiment models in TSM, topic life cycles and sentiment dynamics can be extracted from text documents. These mining results provide unique insights about the latent sentiment conveyed in unstructured text data.

Fig. 6.4 Illustration of Topic-Sentiment Mixture model. $\{\theta_1, \theta_2, \ldots, \theta_k\}$, $\theta_P$ and $\theta_N$ labeled with “Neutral,” “Positive” and “Negative” in the dashed round box denote the neutral, positive and negative topics in the corpus accordingly. $\{\theta_1, \theta_2, \ldots, \theta_k\}$ located in the dashed round box labeled with “Themes” denote the themes of a particular document. A theme is modeled as a mixture over the latent neutral, positive and negative topics; and the mixing weights are denoted as $\{\delta_{i,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$ for each specific theme $i$. $\theta_B$ represents the background topic model, and words in a given document are sampled from a mixture of the themes and background topic (Mei et al. 2007)

Because the TSM model is based on the pLSI model, it also suffers from the same limitations, e.g., it is prone to overfitting and can hardly generalize to unseen documents. Several follow-up works try to address these limitations with LDA’s modeling assumptions. Lin and He (2009) proposed a joint sentiment and topic (JST) model for sentiment analysis. In the JST model, a corpus is assumed to contain $S \times k$ topics, where $S$ is the number of sentiment categories, e.g., positive, negative and neutral. As a result, in JST the combination of topics and sentiments is modeled as a Cartesian product between topic models and sentiment models, similarly to the linear interpolation combination assumed in the TSM model.

As an upstream model, JST model ﬁrst samples a sentiment label and then sam-

ples topic assignment and the word from corresponding distributions. To generate a

document with the JST model, one needs to ﬁrst sample a sentiment mixture for that

document from a shared Dirichlet distribution; and under each sentiment category,

sample a topic mixing proportion from another corpus-level Dirichlet distribution.

Fig. 6.5 Graphical model representation of Joint Sentiment and Topic (JST) model. The word distribution parameter is an $S$-by-$T$ matrix controlling the word distribution under each sentiment-topic combination. The sentiment mixture proportion in a given document is assumed to be drawn from a Dirichlet distribution. $l$ is a specific sentiment assignment for word $w$, and it also controls the topic assignment $z$ of this word. The topic proportions are $S$ $k$-dimensional vectors, which denote the topic proportion under each sentiment class in this document (Lin and He 2009)

Specifically, the topic proportion in each document is modeled as $S$ $k$-dimensional
vectors, which allow different topic mixtures under different sentiment categories.

Gibbs sampling is used to perform the posterior inference of latent variables in JST,

e.g., latent topic assignments, sentiment and topic mixture. The graphical model

representation of JST model is illustrated in Fig. 6.5.
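The upstream sampling order of JST (sentiment first, then topic, then word) can be simulated as follows. This is a sketch under assumptions: the names `gamma`, `alpha` and `phi` for the two Dirichlet parameters and the sentiment-topic word distributions are chosen here for illustration, and the parameter values are random.

```python
import numpy as np

def sample_jst_document(gamma, alpha, phi, doc_length, rng):
    """Sample one document in the upstream order used by JST.

    gamma: (S,) Dirichlet parameter for the sentiment mixture;
    alpha: (k,) Dirichlet parameter for topic proportions under each sentiment;
    phi: (S, k, V) word distribution for each sentiment-topic pair.
    """
    n_sentiments, n_topics, vocab_size = phi.shape
    pi = rng.dirichlet(gamma)                        # sentiment mixture of the document
    theta = rng.dirichlet(alpha, size=n_sentiments)  # one topic proportion per sentiment, shape (S, k)
    labels, topics, words = [], [], []
    for _ in range(doc_length):
        l = rng.choice(n_sentiments, p=pi)           # sentiment label of the word
        z = rng.choice(n_topics, p=theta[l])         # topic under that sentiment
        w = rng.choice(vocab_size, p=phi[l, z])      # word from the sentiment-topic pair
        labels.append(l)
        topics.append(z)
        words.append(w)
    return pi, theta, np.array(labels), np.array(topics), np.array(words)

rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(6), size=(3, 2))         # S = 3 sentiments, k = 2 topics, V = 6 words
pi, theta, labels, topics, words = sample_jst_document(np.ones(3), np.ones(2), phi, 15, rng)
print(labels, topics, words)
```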

Given JST model is also an unsupervised model, sentiment prior is vital for

it. Sentiment seed words are injected as the prior for the word distribution under

different topics in JST. The authors reported that without sentiment prior, JST’s

performance in sentiment categorization is close to random (Lin and He 2009).

Jo and Oh’s Aspect and Sentiment Uniﬁcation Model (ASUM) employs the same

generative assumption as that in JST model. But to enforce the topic and sentiment

coherence inside a document, they further assumed all the words in one sentence

share the same topic and sentiment assignment. The same posterior inference

procedure as that in JST model is applied in ASUM, which takes sentence as the

basic unit for inference. Because ASUM is based on the same generation assumption

as that in JST, it also heavily depends on sentiment seed words to differentiate different

types of sentiments.

A different variant of the upstream generative model for sentiment analysis is proposed in Zhao et al. (2010). In particular, a Maximum Entropy (ME) model is introduced into the LDA model to control the selection of words from the background topic, aspect-specific topics and opinion-specific topics. In the proposed ME-LDA model, a given word can be generated from five different types of topics: the background topic, the general aspect topic, aspect-specific topics, the general opinion topic and aspect-specific opinion topics. A particular word’s assignment to those five types of topics is controlled by a Maximum Entropy model based on discriminative features extracted from the POS tags and word content of the previous, current and next words. The authors used a set of training sentences with labeled background,

aspect and opinion words to estimate the ME model beforehand. With this pretrained

ME model on a separately labeled corpus, ME-LDA should really be regarded as a

hybrid of generative and discriminative model.

Generative topic models have been used as building blocks in many other sentiment analysis tasks. Lu and Zhai used the pLSI model to integrate opinions expressed in a well-written expert review with the many opinions scattered across various sources such as blogspaces and forums (Lu and Zhai 2008). A sentiment prior is given to the pLSI model to identify sentiment-oriented aspects from expert reviews. Such sentiment-oriented aspects are then used to retrieve the most relevant sentences from various sources of opinionated text data. Later on, Lu et al. used topics learned from pLSI models as a lower dimensional representation of documents for clustering (Lu et al. 2009). In each aspect-specific document cluster, the overall sentiment rating is aggregated to predict aspect-level opinions.

From the discussion above, we can observe that most of the typical upstream

generative models for sentiment analysis treat sentiment label as latent variable

over each word, and sentiment prior is used to inject sentiment polarity into the

models. Although such a modeling approach provides ﬂexibility of identifying

distinct opinions on individual words, strong knowledge about sentiment is required

to ensure satisfactory analysis results. As an alternative solution, Ramage et al.’s

Labeled-LDA model provides a different perspective of modeling sentiment with

topics in an upstream model (Ramage et al. 2009). Specifically, in the Labeled-LDA model, sentiment can be modeled as a document-level variable, which is directly observable, and the choice of document sentiment labels affects the topic mixing proportion in this document. Later on, the partially Labeled-LDA model was developed to handle the situation in which some of the labels are not directly observable in a document (Ramage et al. 2011).

6.2.2 Downstream Models for Sentiment Analysis

Downstream models reverse the generation assumption between the sentiment

labels and latent topic assignments: to generate a text document, one needs to

ﬁrst select the topic assignments in this document, and sample the words and

sentiment labels with respect to those topics. One typical downstream generative

model for sentiment analysis is Blei and McAuliffe’s supervised LDA (sLDA)

model (Mcauliffe and Blei 2008). The graphical model representation of sLDA

model is illustrated in Fig. 6.6.

The assumed generation process of text content in the sLDA model is identical to that assumed in the LDA model. In addition to document generation, sLDA assumes the document-level response variable $y$ is drawn from a Gaussian distribution with mean $\eta^{\top} \bar{z}$ and standard deviation $\sigma$, in which $\bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n$, i.e., the mean vector of topic assignments in document $d$. With this continuous assumption about the response variable $y$, sLDA can be used as a regression model to model the opinion ratings in text documents. The generation of $y$ can be further modeled with a generalized linear model, e.g., a logistic model, to model discrete sentiment classes. Variational inference similar to that used in the LDA model can be applied in the sLDA model for posterior inference. Later on, Zhu et al. introduced the idea of maximum margin training in the sLDA model for better predictive performance (Zhu et al. 2009). Wang and Blei extended sLDA to a collaborative setting (Wang and Blei 2011), where collaborative filtering based on users’ opinion ratings can be achieved in the latent topic space.

Fig. 6.6 Graphical model representation of supervised Latent Dirichlet Allocation (sLDA) model. $y$ is the response variable observed in document $d$ (Mcauliffe and Blei 2008)
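The downstream step of sLDA, in which the response is generated from the topic assignments, can be sketched as below. The names `eta` (regression coefficients) and `sigma` (standard deviation of the Gaussian) are labels assumed here for illustration, as are the toy values.

```python
import numpy as np

def sample_slda_response(z, n_topics, eta, sigma, rng):
    """Draw a document-level response from the mean topic assignment vector."""
    # Treating each z_n as a one-hot vector, their mean is the empirical topic frequency.
    z_bar = np.bincount(z, minlength=n_topics) / len(z)
    mean = float(eta @ z_bar)
    y = rng.normal(mean, sigma)   # response drawn from a Gaussian centered at eta^T z_bar
    return z_bar, mean, y

rng = np.random.default_rng(0)
z = np.array([0, 0, 1, 2])        # topic assignments of a four-word document
eta = np.array([1.0, 3.0, 5.0])   # hypothetical rating contribution of each topic
z_bar, mean, y = sample_slda_response(z, n_topics=3, eta=eta, sigma=0.5, rng=rng)
print(z_bar, mean, y)
```

Replacing the Gaussian draw with a Bernoulli draw whose probability is a logistic function of the same linear score would give the discrete sentiment class variant mentioned above.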

Boyd-Graber and Resnik further generalized sLDA model to perform holistic

sentiment analysis across languages (Boyd-Graber and Resnik 2010). In their

proposed MLSLDA model, topics are organized according to some shared semantic

structure that can be represented as a tree, and the sentiment label in a given

document is modeled as a regression response variable with respect to the topic

assignments. As a result, MLSLDA simultaneously identiﬁes how multilingual

concepts are clustered into thematically coherent topics and how topics associated

with text connect to the sentiment ratings.

Lin et al. (2012) performed an interesting reparameterization

of JST to turn their original upstream JST model into a new downstream joint

sentiment-topic model, named Reverse-JST. In Reverse-JST, it is assumed that to

generate the word sequence in a given document, one needs to ﬁrst sample topic

assignment, then sample sentiment category with respect to the selected topic,

and select a word under this topic sentiment combination. Without the sentiment

seed words being speciﬁed, the JST model and Reverse-JST model are essentially

the same, since both of them model the combination of topics and sentiments

with Cartesian product. The authors’ empirical evaluation indicates JST performs

consistently better than Reverse-JST when sentiment seed words are available.

One important line of research in downstream generative models for sentiment

analysis focuses on aspect-level understanding of opinions. Those aspect ratings

can be understood as users’ sentiment polarities over the latent topics in a given

document. This line of research exploits and analyzes user-generated opinionated

text content at the detailed topical aspect level and enables a deeper and more

detailed understanding of user opinions.

Titov and McDonald developed a LDA-based generative model called Multi-

Aspect Sentiment (MAS) model for joint modeling of text content and aspect

ratings for sentiment summarization (Titov and McDonald 2008b). In their solution,

two types of topics, i.e., global and local topics, are explicitly modeled; and each

fraction inside a document (modeled as a moving window of sequential words

in the document) is assumed to be a mixture over those global and local topics.

Based on the latent topic assignments, aspect ratings are assumed to be determined

by a logistic regression model, which takes the topic assignments and the word

sequence in that window as input. Compared to the sLDA model, which only captures

the document-level sentiment, MAS enables the understanding of sentiment at ﬁner

granularity, in which the detailed prediction of aspect-level opinions is possible.

However, in MAS the aspect-level sentiment labels are assumed to be known to

the model during the training phase. This limits the application of this type of aspect-level
sentiment analysis when such detailed annotations are not available. Wang et
al.’s work on latent aspect rating analysis (LARA) (Wang et al. 2010, 2011) alleviates

the dependency on the fully annotated data and enables in-depth understanding of

users’ opinions at the aspect-level. In the LARA model, the overall rating is assumed

to be observable in a given document and it provides guidance for estimation of

corresponding latent aspect ratings. Moreover, in addition to analyzing opinions

expressed in text document at the level of topical aspects to discover each individual

user’s latent opinion on each aspect, the LARA model also identiﬁes the relative

preference users have placed onto those different aspects when forming the overall

judgment.

A two-stage approach based on bootstrapping aspect segmentation and a latent rating regression model was first proposed to solve the problem of LARA in Wang et al. (2010). This solution assumes that a set of predefined keywords specifying the latent topical aspects is available. The overall sentiment rating in a document is assumed to be drawn from a mixture of the latent aspect ratings. Via posterior inference, the overall rating can be decomposed into aspect ratings, and the inferred mixing weights reflect users’ preference over those latent aspects.

However, this two-step solution is not a fully generative model, because it does
not specify the generation of text content in a document. Later on, a unified solution
based on the LDA model was introduced to jointly identify the latent topical aspects
and infer the latent aspect weights and ratings from each user’s opinionated review
article (Wang et al. 2011). As shown in Fig. 6.7, in the unified LARA model, each
latent aspect rating in a given document is assumed to be drawn from a Gaussian
distribution with mean determined by the linear combination of words assigned to
that aspect, e.g., $s_i \sim N\big(\sum_{n=1}^{N} \beta_{ij}\,\mathbb{I}[w_n = v_j, z_n = i],\ \delta^2\big)$.
Intuitively, the latent topic assignments $z$ segment the text content into different
aspects, and the observed words in each aspect segment contribute to the sentiment
polarity of the corresponding aspect rating. Then the observable overall rating is
assumed to be drawn from another linear combination of these latent aspect ratings,
i.e., $r \sim N(\alpha^{T} s,\ \sigma^2)$. Variational inference is used to infer the
latent topic assignments, aspect ratings and weights in a given document simultaneously.

Fig. 6.7 Latent aspect rating analysis (LARA) model. $s$ is a K-dimensional vector indicating
the aspect-level latent opinion ratings. $r$ denotes the observable document-level opinion rating.
Specifically, the LARA model assumes the overall rating $r$ is determined by the weighted average
of aspect ratings, i.e., $r \sim N(\alpha^{T} s,\ \sigma^2)$ (Wang et al. 2011)
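The rating assumptions of the unified LARA model described above can be sketched in a few lines of Python. This is only an illustration of how the two Gaussian means are formed for a single toy document; it does not perform variational inference. The names `beta` (per-aspect word sentiment weights) and `alpha` (aspect weights), the toy words, and all numeric values are assumptions introduced here, not values from Wang et al. (2011).

```python
def aspect_rating_means(words, z, beta, num_aspects):
    """Mean of each latent aspect rating: the sum of the sentiment weights
    of the words assigned to that aspect (z[n] is the aspect of words[n])."""
    means = [0.0] * num_aspects
    for word, aspect in zip(words, z):
        means[aspect] += beta[aspect].get(word, 0.0)
    return means


def overall_rating_mean(alpha, s):
    """Mean of the overall rating: the weighted combination alpha^T s."""
    return sum(a * s_i for a, s_i in zip(alpha, s))


# Hypothetical toy hotel review; aspect 0 = room, aspect 1 = location.
words = ["clean", "spacious", "noisy", "central"]
z = [0, 0, 1, 1]                      # latent aspect assignment of each word
beta = [
    {"clean": 2.0, "spacious": 1.5},  # assumed word weights under aspect 0
    {"noisy": -1.0, "central": 2.5},  # assumed word weights under aspect 1
]
alpha = [0.25, 0.75]                  # assumed aspect weights of this reviewer

s = aspect_rating_means(words, z, beta, num_aspects=2)
r_mean = overall_rating_mean(alpha, s)
print(s, r_mean)
```

In the full model both quantities are means of Gaussian distributions, and `z`, `s` and `alpha` are latent variables to be inferred rather than given.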

Clearly distinct from all previous work in opinion analysis that mostly focuses

on integrated entity-level opinions, LARA reveals individual users’ latent sentiment

preferences at the level of topical aspects in an unsupervised manner. Discovering

such detailed user preferences (which are often hard to obtain by a human from

simply reading many reviews) enables many important applications. First, such

analysis facilitates in-depth understanding of user intents. For example, by mining

the product reviews, LARA recognizes which aspect inﬂuences a particular user’s

purchase decision the most. Second, by identifying each user’s latent aspect

preference in a particular domain (e.g., hotel booking), personalized result ranking

and recommendation can be achieved. Third, discovering the general population’s

sentiment preferences over different aspects of a particular product or service

provides a more effective way for businesses to manage their customer relationships

and conduct market research.

Follow-up work extended the LARA model in different directions. Diao et al.
introduced collaborative filtering into LARA modeling to uniformly model different
users’ rating preferences in a generative manner (Diao et al. 2014). Wu and Ester
also combined the LARA model with a collaborative filtering method to predict the
latent aspect ratings even when the users have not generated the review content (Wu
and Ester 2015). Both models enable aspect-based recommendation.

6.3 Applications of Generative Models for Sentiment Analysis

In the above discussions, we have summarized the most representative works in

modeling opinionated text documents with generative models. In this section, we

review the landscape of application opportunities of such models.

6.3.1 Sentiment Lexicon Construction

A sentiment lexicon can be directly used for sentiment tagging or suggesting

useful features for supervised learning approaches to sentiment analysis. One major

challenge in constructing a sentiment lexicon is that the polarity of a word such as

“long” highly depends on the context; for example, “long battery life” is positive,

while “long rebooting time” is negative in the same review of a laptop. Thus a

lexicon must incorporate context when specifying the polarity of a word.

A generative model can capture context by using appropriate latent variables, and
can thus be useful for constructing a topic-specific sentiment lexicon. The sentiment
polarity of a word can be modeled in two different ways in a generative model.
In the first, we explicitly have a positive or negative topic represented as a
word distribution. In such a case, the probability of a word can be regarded as an
indicator of polarity: a word with very high probability according to a positive
model would be tagged as a positive word, and the probability can be used as a
measure of confidence, which may be useful to include in the lexicon. In the second,
the sentiment of a term is modeled with a real number, which can be positive or
negative depending on the sentiment of the word. In such a case, a high positive
weight would indicate a very positive word (for the corresponding topic).
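The two ways of representing word polarity can be contrasted with a minimal Python sketch. The function names, the `min_prob` threshold and all probabilities and weights below are hypothetical; in practice the word distributions and weights would come from a fitted generative model.

```python
def lexicon_from_topics(pos_topic, neg_topic, min_prob=0.05):
    """First way: polarity topics are word distributions; a word's probability
    under the more likely polarity topic doubles as a confidence score."""
    lexicon = {}
    for word in set(pos_topic) | set(neg_topic):
        p_pos = pos_topic.get(word, 0.0)
        p_neg = neg_topic.get(word, 0.0)
        if max(p_pos, p_neg) < min_prob:
            continue  # too unlikely under either polarity to be tagged
        label = "positive" if p_pos >= p_neg else "negative"
        lexicon[word] = (label, max(p_pos, p_neg))
    return lexicon


def lexicon_from_weights(weights):
    """Second way: each word has a signed real-valued weight; the sign gives
    the polarity and the magnitude gives the strength."""
    return {
        word: ("positive" if w > 0 else "negative", abs(w))
        for word, w in weights.items()
        if w != 0
    }


# Hypothetical word distributions of a positive and a negative topic.
pos_topic = {"love": 0.30, "beautiful": 0.25, "traffic": 0.01}
neg_topic = {"hate": 0.35, "traffic": 0.20, "love": 0.02}
by_topic = lexicon_from_topics(pos_topic, neg_topic)

# Hypothetical signed weights for one topic.
by_weight = lexicon_from_weights({"comfortable": 2.3, "dirty": -0.8})
```

The first representation yields a confidence in [0, 1] per word, while the second yields an unbounded strength whose sign carries the polarity.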

One example of work in the ﬁrst category is the topic-sentiment mixture model

(Mei et al. 2007). In this work, the authors demonstrated a list of positive and

negative words speciﬁc to the topics of “movies” and “cities”: “beautiful,” “love”

and “awesome” are automatically identiﬁed as positive for “cities” while “hate,”

“trafﬁc” and “stink” are identiﬁed as negative for this topic. The authors in Lin and

He (2009) also reported a similar sentiment lexicon learned by the JST model
on a movie review data set. However, as we discussed before, upstream models
depend on sentiment priors to determine the sentiment polarity of learned topics.
The bias in those sentiment seed words determines the quality of the learned
sentiment lexicon.

Another example of the ﬁrst category is the downstream model sLDA (Mcauliffe

and Blei 2008). In general, the downstream models can resolve the dependency on

sentiment prior by directly learning from the given sentiment labels. In (Mcauliffe

and Blei 2008), the authors applied sLDA on a set of labeled movie reviews,

where the learned topics are directly aligned with numerical sentiment polarities,

e.g., a topic represented by the words of “least,” “problem” and “unfortunately” is

strongly correlated with negative opinion while the topic represented by the words

of “motion,” “simple” and “perfect” is strongly correlated with positive opinion.

An example of the second category is the LARA model (Wang et al. 2010), which

is also a downstream model, but in contrast with sLDA, LARA uses numerical

weights to model the sentiment of a word, and thus can learn a topic-speciﬁc lexicon

in the form of positive and negative weights for words. Table 6.1 illustrates a
sample output from the LARA model (Wang et al. 2010), where the aspect-specific
word sentiment polarity was learned from a collection of hotel reviews.

As shown in the table, words “linen”, “walk” and “beach” do not have opinion

annotations in general sentiment lexicons, e.g., SentiWordNet (Esuli and Sebastiani

2006), since they are nouns, while the LARA model automatically assigns them

positive sentiment likely because “linen” may suggest the “cleanliness” condition is

good and “walk” and “beach” might imply the location of a hotel is convenient.

Table 6.1 Estimated word sentiment polarities under different aspects. The numbers to the right
of listed words indicate their learned sentiment weight from a LARA model (Wang et al. 2010)

Value                Rooms                Location             Cleanliness
Resort       22.80   View         28.05   Restaurant   24.47   Clean        55.35
Value        19.64   Comfortable  23.15   Walk         18.89   Smell        14.38
Excellent    19.54   Modern       15.82   Bus          14.32   Linen        14.25
Worth        19.20   Quiet        15.37   Beach        14.11   Maintain     13.51
Quality      18.60   Spacious     14.25   Perfect      13.63   Spotlessly    8.95
Bad         -24.09   Carpet       -9.88   Wall        -11.70   Smelly       -0.53
Money       -11.02   Smell        -8.83   Bad          -5.40   Urine        -0.43
Terrible    -10.01   Dirty        -7.85   MRT          -4.83   Filthy       -0.42
Overprice    -9.06   Stain        -5.85   Road         -2.90   Dingy        -0.38
Cheap        -7.31   Ok           -5.46   Website      -1.67   Damp         -0.30

In general, one can potentially design a generative model to embed a particular
perspective of topical context as needed for an application to automatically construct
a topic-specific lexicon that would capture the desired dependency of sentiment on
context. Such a lexicon may itself be used directly as knowledge about people’s
opinions about a topic, thus facilitating comparative analysis of opinions across
opinion holders or other interesting context variables.
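As a rough illustration of how such an aspect-specific lexicon might be used, the Python sketch below scores a tokenized sentence separately under two aspects. The weights are a subset of Table 6.1, under the assumption that the words in the lower half of the table carry negative weights; the function name `aspect_scores` and the example sentence are hypothetical.

```python
# Subset of Table 6.1; negative signs for the lower-half words are assumed.
ASPECT_LEXICON = {
    "rooms": {"view": 28.05, "comfortable": 23.15, "quiet": 15.37,
              "carpet": -9.88, "smell": -8.83, "dirty": -7.85},
    "cleanliness": {"clean": 55.35, "smell": 14.38, "linen": 14.25,
                    "smelly": -0.53, "filthy": -0.42},
}


def aspect_scores(tokens, lexicon=ASPECT_LEXICON):
    """Sum the aspect-specific weights of the tokens, separately per aspect."""
    return {
        aspect: sum(weights.get(tok, 0.0) for tok in tokens)
        for aspect, weights in lexicon.items()
    }


# Hypothetical sentence; a real system would first assign words to aspects.
scores = aspect_scores("the room was quiet but the carpet had a smell".split())
print(scores)
```

Note that the same word ("smell") contributes with a different sign under each aspect, which is exactly the context dependency a general-purpose lexicon cannot express.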

6.3.2 Sentiment Annotation and Pattern Discovery

Another direct application of the generative models for sentiment analysis is

sentiment annotation and pattern discovery. Sentiment annotation is to tag a text

object with sentiment labels which can be categorical (e.g., positive vs. negative vs.

neutral) or numerical (i.e., ratings). Once tagging is done, we can associate
sentiment labels with context variables such as time, location, and sources of
opinions to reveal patterns such as spatiotemporal trends of opinions.
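Once documents carry sentiment labels, pattern discovery over a context variable reduces to grouping and aggregating. A minimal sketch for the time dimension follows; the month buckets, the -1/0/+1 label encoding and the data are assumptions made for illustration.

```python
from collections import defaultdict


def sentiment_trend(tagged_docs):
    """Average sentiment label per time bucket.

    tagged_docs: iterable of (time_bucket, label) with label in {-1, 0, +1}.
    Returns a list of (time_bucket, mean_label) sorted by time bucket.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [sum of labels, count]
    for bucket, label in tagged_docs:
        totals[bucket][0] += label
        totals[bucket][1] += 1
    return [(b, s / n) for b, (s, n) in sorted(totals.items())]


# Hypothetical tagged documents: (month, sentiment label).
docs = [("2013-01", 1), ("2013-01", -1),
        ("2013-02", 1), ("2013-02", 1), ("2013-02", 0)]
trend = sentiment_trend(docs)
print(trend)
```

The same grouping applies to location or opinion source by swapping the bucket key.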

In (Lin and He 2009), the JST model is reported to achieve performance comparable
to that of supervised statistical algorithms in binary sentiment classification.
sLDA is reported to have better predictive power than a supervised lasso
least-squares regression model trained on the LDA model’s topic output (Mcauliffe
and Blei 2008). With a maximum margin estimation method, further improved
classification performance is achieved by the MedLDA model (Zhu et al. 2009).
Aspect-level sentiment models, e.g., MAS (Titov and McDonald 2008b) and LARA (Wang
et al. 2010, 2011), can also predict aspect-level sentiment ratings, which might be
unobservable during the training process, thus enabling discovery of latent patterns
of opinions at the level of subtopics.

Based on the sentiment polarity identified from text content, the temporal dynamics
of opinions in user-generated content are studied with the TSM model (Mei et al. 2007).
A hidden Markov model is built on the neutral, positive and negative opinions that the
TSM model identifies over time to capture topic life cycles and sentiment dynamics.
A similar idea has been explored in Si et al. (2013) to leverage topic-based
sentiments from Twitter to help predict the stock market. A continuous Dirichlet
Process Mixture model is developed to estimate the daily topic set, which is mapped
to a sentiment time series according to a predefined sentiment lexicon. A regression
model is built to predict the stock index with respect to this Twitter sentiment time
series.
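The last step, regressing an index on a sentiment time series, can be illustrated with a deliberately simple stand-in. The sketch below fits a one-variable ordinary least-squares line; this is not the regression model used by Si et al. (2013), and the function name and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: daily sentiment score and the next day's index change.
sentiment = [0.0, 1.0, 2.0, 3.0]
next_day_change = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(sentiment, next_day_change)
prediction = slope * 4.0 + intercept  # forecast for a new sentiment score
print(slope, intercept, prediction)
```

A realistic setup would use lagged values of both the index and the sentiment series and would be evaluated on held-out days.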

6.3.3 Topic-Speciﬁc Sentiment Summarization

Yet another interesting application of the generative sentiment analysis models is to
generate topic-specific sentiment summaries. Summarization of opinions facilitates
digestion of opinions by users and also provides entry points for a user to navigate
into detailed information about a specific aspect of opinion. In (Jo and Oh 2011),
review text content can be summarized according to its topic and sentiment.
Table 6.2 illustrates the aspect-specific sentiment summarization reported in Wang
et al. (2010). Such detailed aspect-level sentiment analysis and summarization
provide flexibility for ordinary users to navigate through the opinionated text corpus.

Table 6.2 Aspect-based comparative summarization (Hotel Max in Seattle as an example) (Wang
et al. 2010)

Aspect     Rating   Summary
Value      3.1      Truly unique character and a great location at a reasonable price Hotel Max was an excellent choice for our recent three night stay in Seattle
           1.7      Overall not a negative experience, however considering that the hotel industry is very much in the impressing business there was a lot of room for improvement
Room       3.7      We chose this hotel because there was a Travelzoo deal where the Queen of Art room was $139.00/night
           1.2      Heating system is a window AC unit that has to be shut off at night or guests will roast
Location   3.5      The location ‘a short walk to downtown and Pike Place market’ made the hotel a good choice
           2.1      When you visit a big metropolitan city, be prepared to hear a little traffic outside!

6.3.4 Deep Analysis of Latent Preferences of Opinion Holders

An important application enabled by generative models is deep analysis of latent
preferences of opinion holders. While the applications discussed above can all
be potentially supported by other approaches to sentiment analysis, the deep
analysis of latent preferences of opinion holders cannot be easily supported by
other approaches, and thus represents a unique advantage of generative models for
sentiment analysis. This unique benefit comes from the explicit use of meaningful
latent variables in a generative model to model and capture the latent information
about an opinion holder.

Table 6.3 User rating behavior analysis (Wang et al. 2010)

              Expensive hotel      Cheap hotel
Aspect        5 Stars   3 Stars    5 Stars   1 Star
Value         0.134     0.148      0.171     0.093
Room          0.098     0.162      0.126     0.121
Location      0.171     0.074      0.161     0.082
Cleanliness   0.081     0.163      0.116     0.294
Service       0.251     0.101      0.101     0.049

For example, the aspect-level sentiment analysis provided by the LARA model
enables an in-depth understanding of users’ sentiment preferences in their decision
making process. In (Wang et al. 2010), the authors demonstrated the learned aspect
weights in a hotel data set (see Table 6.3); such latent weights unveil
reviewers’ detailed sentiment preferences over those aspects.

It is interesting to note that according to the learned aspect preference weights
in Table 6.3, reviewers give the “expensive” hotels high ratings mainly due to their
nice services and locations, while they give low ratings to such hotels because of
undesirable room conditions and overpricing. In contrast, reviewers give the “cheap”
hotels high ratings mostly because of the good price/value and good location, while
giving low ratings for their poor cleanliness. Such analysis can be performed
for different groups of hotels, different groups of consumers, or different time
periods, thus potentially enabling many interesting applications. Note that such
a deep understanding of reviewers cannot be easily achieved by other approaches
to sentiment analysis; indeed, it cannot even be easily achieved by humans even if
they read all the reviews, thus representing an important benefit of using generative
models for sentiment analysis.
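The kind of reading applied to Table 6.3 can be mechanized by ranking the aspects within each group of reviews. The sketch below transcribes the weights from Table 6.3; the dictionary layout and the helper `top_aspects` are hypothetical conveniences, not part of the LARA implementation.

```python
# Aspect weights transcribed from Table 6.3, keyed by (hotel group, rating).
ASPECT_WEIGHTS = {
    ("expensive", "5 stars"): {"value": 0.134, "room": 0.098, "location": 0.171,
                               "cleanliness": 0.081, "service": 0.251},
    ("expensive", "3 stars"): {"value": 0.148, "room": 0.162, "location": 0.074,
                               "cleanliness": 0.163, "service": 0.101},
    ("cheap", "5 stars"): {"value": 0.171, "room": 0.126, "location": 0.161,
                           "cleanliness": 0.116, "service": 0.101},
    ("cheap", "1 star"): {"value": 0.093, "room": 0.121, "location": 0.082,
                          "cleanliness": 0.294, "service": 0.049},
}


def top_aspects(weights, k=2):
    """Return the k aspects with the largest weights, largest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


for group, weights in ASPECT_WEIGHTS.items():
    print(group, top_aspects(weights))
```

The same ranking could be computed per consumer group or per time period once the corresponding weights have been inferred.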

Such a deep understanding of latent preferences would further enable many
applications, particularly those requiring a better understanding of people’s behavior
and preferences and finding groups of people with shared preferences. Examples include
market research where we want to understand consumers’ preferences, business
intelligence where we want to understand the relative strengths and weaknesses of a
product with respect to another product for a particular group of consumers, and
targeted advertising where the goal is to discover groups of consumers that may
potentially find a product appealing.

6.3.5 Entity Ranking and Recommendation

Generative models enable detailed understanding of opinions about entities such as
products as well as detailed understanding of preferences of people such as reviewers.
Thus they can be used to generate more informative representations for both
entities and users, which further helps improve the ranking and recommendation
of entities for users.

For example, based on the identified aspect preferences, collaborative filtering
can be performed. In (Wang and Blei 2011), scientific article recommendation is
performed based on the latent topics learned for each individual user from their rating
history. Compared with traditional collaborative filtering solutions, which can only
provide item-level recommendations, the collaborative topic model enables
topic-specific recommendations. Diao et al.’s JMARS model identifies users’ aspect-level
sentiment preference and the content distribution in their generated review content
(Diao et al. 2014). Improved recommendation performance is reported compared
with traditional collaborative filtering solutions.

In LARA (Wang et al. 2010), the inferred reviewer preferences can be leveraged
to support personalized entity recommendation. Specifically, a user can specify his
or her preferences (e.g., price is much more important than service or location), and
the system can selectively use only those reviews that are written by reviewers
with similar preferences to recommend hotels, instead of using the generic set
of all reviews, making the recommendation more accurately reflect the specific
preferences of this particular group of users. Such a personalized recommendation
is only possible because of the inferred latent preference information, which enables
us to know which reviewers have put more weight on price than on location and
service.
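A minimal sketch of this preference-filtered recommendation is given below. It assumes that each review comes with the reviewer's inferred aspect weights; the use of cosine similarity, the `min_sim` threshold, the aspect order and all data are illustrative assumptions rather than details of the cited system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two aspect-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_scores(user_pref, reviews, min_sim=0.9):
    """Average each hotel's rating over the reviews written by reviewers whose
    inferred aspect weights are similar to the user's stated preferences."""
    sums, counts = {}, {}
    for hotel, rating, reviewer_weights in reviews:
        if cosine(user_pref, reviewer_weights) >= min_sim:
            sums[hotel] = sums.get(hotel, 0.0) + rating
            counts[hotel] = counts.get(hotel, 0) + 1
    return {hotel: sums[hotel] / counts[hotel] for hotel in sums}


# Aspect order: (price, service, location); all values are hypothetical.
user_pref = (0.8, 0.1, 0.1)
reviews = [
    ("Hotel A", 5, (0.7, 0.2, 0.1)),    # price-focused reviewer
    ("Hotel A", 2, (0.1, 0.8, 0.1)),    # service-focused reviewer
    ("Hotel B", 3, (0.75, 0.1, 0.15)),  # price-focused reviewer
    ("Hotel B", 5, (0.1, 0.1, 0.8)),    # location-focused reviewer
]
scores = personalized_scores(user_pref, reviews)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

With all reviews pooled, Hotel B would have the higher average rating (4.0 versus 3.5); restricting to price-focused reviewers reverses the order, which is the effect the personalization aims for.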

6.3.6 Social Network and Social Media Analysis

Generative-model-based solutions for sentiment analysis have also been explored
in the context of social networks. Liu et al. explored topic modeling techniques to
study topic-level influence in heterogeneous networks (Liu et al. 2010). Rao et
al. developed a supervised topic model to analyze emotion based on social media
content (Rao et al. 2014). Xu et al. developed a pLSI-based generative model to
analyze users’ posting behaviors on Twitter: via generative modeling, the motivation
of a user’s posting behavior is decomposed into the factors of breaking news, posts
from social friends and the user’s intrinsic interest.

6.4 Resources on the Web

Most of the aforementioned generative models for sentiment analysis have open
implementations online, and there are also publicly available sentiment data sets
on the Web. In this section, we briefly summarize some resources for this line
of research.

David M. Blei maintains a page for topic modeling, where implementations of
many LDA-based generative models (e.g., the LDA (Blei et al. 2003) and sLDA
(Mcauliffe and Blei 2008) models) are provided:
http://www.cs.princeton.edu/~blei/topicmodeling.html. The Stanford Natural Language
Processing group provides a Topic Modeling Toolbox, which can easily import and
manipulate text from cells in Excel and other spreadsheets. This toolbox focuses on
helping social scientists and others who wish to perform analysis on datasets that
have a substantial textual component. Implementations of the LDA and Labeled-LDA
(Ramage et al. 2009) models are provided in this toolbox. Andrew McCallum and David
Mimno developed a Java-based package for statistical text document modeling named
MALLET (McCallum 2002), which provides implementations of several of the
aforementioned topic models, e.g., the LDA model. Besides those generic implementations
of standard topic models, there are also implementations of the specific generative
models for sentiment analysis introduced above. The authors of the JST model (Lin and
He 2009) provide their implementation on GitHub at https://github.com/linron84/JST,
and the authors of the LARA model (Wang et al. 2010) provide their implementation of
the two-step solution at http://www.cs.virginia.edu/~hw5x/Codes/LARA.zip.

Besides those open implementations of generative models, there are also public
sentiment data sets available on the Web. The Stanford Network Analysis Project
provides a large collection of Amazon reviews, spanning a period of 18 years,
including around 35 million reviews up to March 2013. The data can be found
at http://snap.stanford.edu/data/web-Amazon.html. The author of the book “Sentiment
Analysis and Opinion Mining” (Liu 2012) also provides a large collection of Amazon
reviews at http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html, where additional
sentence-level positive and negative annotations are available for a subset of
reviews. Yelp.com hosts an annual “Yelp Dataset Challenge,” which provides more
than 1.6 million Yelp reviews from more than 366k users. Besides the text content
and opinion ratings, this Yelp data set also includes the social connections among
those reviewers. In addition to those user review data sets, Twitter data sets with
sentiment annotations are also available. Go et al. manually created a collection of
40,216 tweets with polarity sentiment labels (Go et al. 2009). This data set can be
found at http://help.sentiment140.com/for-students. Shamma et al. used Amazon
Mechanical Turk to annotate sentiment polarities in 3,269 tweets posted during
the presidential debate on September 26, 2008 between Barack Obama and John
McCain (Shamma et al. 2009). The data set can be found at
https://bitbucket.org/speriosu/updown/src/5de483437466/data/. Saif et al. provided a
survey of datasets for Twitter sentiment analysis (Saif et al. 2013).

6.5 Summary

In this chapter, we provide an introduction and systematic review of generative
models for sentiment analysis, which represent an important family of (mostly
unsupervised) approaches to sentiment analysis that can be potentially applied
to any opinionated text data due to their generality and robustness. They are
especially powerful in inferring latent variables about opinion holders or detailed
opinions about specific subtopics and can very effectively perform joint analysis
of both opinionated text data and the companion numerical ratings. Besides
supporting common applications of sentiment analysis such as sentiment classification,
sentiment lexicon construction, and sentiment summarization, they also enable
many other interesting new applications such as topic-specific lexicon construction,
detailed opinion pattern discovery in association with context variables such as time,
location, and sources, personalized entity ranking and recommendation, and deep
analysis of latent preferences of opinion holders. When using appropriate latent
variables, such generative models can discover latent opinion patterns from large
amounts of data that are hard for humans to discover even if they have time to read
all the opinionated text data; they are thus essential tools for building intelligent
systems for opinion understanding and its related applications, as well as for
research in computational social science.

References

Andrieu, C., N. De Freitas, A. Doucet, and M.I. Jordan. 2003. An introduction to MCMC for

machine learning. Machine Learning 50(1–2): 5–43.

Bishop, C.M. 2006. Pattern recognition and machine learning. New York: Springer.

Blei, D.M. 2012. Probabilistic topic models. Communications of the ACM 55(4): 77–84.

Blei, D.M., and J.D. Lafferty. 2006. Dynamic topic models. In Proceedings of the 23rd

International Conference on Machine Learning, 113–120. ACM.

Blei, D.M., and J.D. Lafferty. 2007. A correlated topic model of science. The Annals of Applied

Statistics 1(1): 17–35.

Blei, D.M., A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine

Learning Research 3: 993–1022.

Boyd-Graber, J., and P. Resnik. 2010. Holistic sentiment analysis across languages: Multilingual

supervised latent Dirichlet allocation. In Proceedings of the 2010 Conference on Empirical

Methods in Natural Language Processing (EMNLP ’10), 45–55, Stroudsburg. Association for

Computational Linguistics.

Brants, T., F. Chen, and I. Tsochantaridis. 2002. Topic-based document segmentation with

probabilistic latent semantic analysis. In Proceedings of the Eleventh International Conference

on Information and Knowledge Management, 211–218. ACM.

Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for

sentiment analysis based on conceptual primitives. In: COLING, 2666–2677.

Chen, S.F., and J. Goodman. 1996. An empirical study of smoothing techniques for language

modeling. In Proceedings of the 34th Annual Meeting on Association for Computational

Linguistics, 310–318. Association for Computational Linguistics.

Dave, K., S. Lawrence, and D.M. Pennock. 2003. Mining the peanut gallery: Opinion extraction

and semantic classiﬁcation of product reviews. In Proceedings of the 12th International

Conference on World Wide Web (WWW ’03), 519–528. New York: ACM.

Deerwester, S.C., S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman. 1990. Indexing

by latent semantic analysis. Journal of the American Society for Information Science 41(6): 391–407.

Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via

the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological) 39: 1–38.

Diao, Q., M. Qiu, C.-Y. Wu, A.J. Smola, J. Jiang, and C. Wang. 2014. Jointly modeling aspects,

ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 193–202.

ACM.

Ding, X., and B. Liu. 2007. The utility of linguistic rules in opinion mining. In Proceedings

of the 30th Annual International ACM SIGIR Conference on Research and Development in

Information Retrieval, 811–812. ACM.

Ding, X., B. Liu, and P.S. Yu. 2008. A holistic lexicon-based approach to opinion mining. In

Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM

’08), 231–240. New York: ACM.

Esuli, A., and F. Sebastiani. 2006. Sentiwordnet: A publicly available lexical resource for opinion

mining. In Proceedings of LREC, vol. 6, 417–422. Citeseer.

Go, A., R. Bhayani, and L. Huang. 2009. Twitter sentiment classiﬁcation using distant supervision.

CS224N Project Report, Stanford, 1: 12.

Grifﬁths, T.L., and M. Steyvers. 2004. Finding scientiﬁc topics. Proceedings of the National

Academy of Sciences, 101(suppl 1): 5228–5235.

Harris, Z.S. 1954. Distributional structure. Word.

Hearst, M.A., S.T. Dumais, E. Osman, J. Platt, and B. Scholkopf. 1998. Support vector machines.

Intelligent Systems and their Applications, IEEE, 13(4): 18–28.

Hiemstra, D., and W. Kraaij. 1998. Twenty-one at TREC7: ad-hoc and cross-language track. In

Proceedings of The Seventh Text REtrieval Conference (TREC 1998), Gaithersburg, 174–185,

9–11 Nov 1998.

Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual

International ACM SIGIR Conference on Research and Development in Information Retrieval,

50–57. ACM.

Hong, L., and B.D. Davison. 2010. Empirical study of topic modeling in twitter. In Proceedings of

the First Workshop on Social Media Analytics, 80–88. ACM.

Hu, X., J. Tang, H. Gao, and H. Liu. 2013. Unsupervised sentiment analysis with emotional

signals. In Proceedings of the 22nd International Conference on World Wide Web, 607–618.

International World Wide Web Conferences Steering Committee.

Jo, Y., and A.H. Oh. 2011. Aspect and sentiment uniﬁcation model for online review analysis. In

Proceedings of the Fourth ACM International Conference on Web Search and Data Mining,

815–824. ACM.

Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant

features. Berlin/New York: Springer.

Jordan, A. 2002. On discriminative vs. generative classiﬁers: A comparison of logistic regression

and naive bayes. Advances in Neural Information Processing Systems 14: 841.

Jurafsky, D., and J.H. Martin. 2009. Speech and language processing: An introduction to natural

language processing, computational linguistics, and speech recognition.

Katz, S.M. 1987. Estimation of probabilities from sparse data for the language model component

of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing 35(3):

400–401.

Kim, S.-M., and E. Hovy. 2004. Determining the sentiment of opinions. In Proceedings of the 20th

International Conference on Computational Linguistics, 1367. Association for Computational

Linguistics.

Leskovec, J., D. Huttenlocher, and J. Kleinberg. 2010. Predicting positive and negative links in

online social networks. In Proceedings of the 19th International Conference on World Wide

Web, 641–650. ACM.

Lin, C., and Y. He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the

18th ACM Conference on Information and Knowledge Management, 375–384. ACM.

Lin, C., Y. He, R. Everson, and S. Rüger. 2012. Weakly supervised joint sentiment-topic detection

from text. IEEE Transactions on Knowledge and Data Engineering 24(6): 1134–1145.

Liu, B. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language

Technologies 5(1): 1–167.

Liu, B. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge:

Cambridge University Press.

Liu, L., J. Tang, J. Han, M. Jiang, and S. Yang. 2010. Mining topic-level inﬂuence in heterogeneous

networks. In Proceedings of the 19th ACM International Conference on Information and

Knowledge Management (CIKM ’10), 199–208. New York: ACM.

Lu, Y., M. Castellanos, U. Dayal, and C. Zhai. 2011. Automatic construction of a context-

aware sentiment lexicon: An optimization approach. In Proceedings of the 20th International

Conference on World Wide Web, 347–356. ACM.

Lu, Y., and C. Zhai. 2008. Opinion integration through semi-supervised topic modeling. In

Proceedings of the 17th International Conference on World Wide Web, 121–130. ACM.

Lu, Y., C. Zhai, and N. Sundaresan. 2009. Rated aspect summarization of short comments. In

Proceedings of the 18th International Conference on World Wide Web, 131–140. ACM.

McAuley, J., and J. Leskovec. 2013. Hidden factors and hidden topics: Understanding rating

dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender

Systems, 165–172. ACM.

Mcauliffe, J.D., and D.M. Blei. 2008. Supervised topic models. In Advances in Neural Information

Processing Systems, 121–128.

McCallum, A.K. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.

edu.

McLachlan, G., and T. Krishnan. 2007. The EM algorithm and extensions, vol. 382. John Wiley &

Sons.

Mei, Q., X. Ling, M. Wondra, H. Su, and C. Zhai. 2007. Topic sentiment mixture: Modeling facets

and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide

Web, 171–180. ACM.

Mei, Q., and C. Zhai. 2005. Discovering evolutionary theme patterns from text: An exploration of

temporal text mining. In Proceedings of the Eleventh ACM SIGKDD International Conference

on Knowledge Discovery in Data Mining, 198–207. ACM.

Melville, P., W. Gryc, and R.D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical

knowledge with text classiﬁcation. In Proceedings of the 15th ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining (KDD ’09), 1275–1284. New York:

ACM.

Mimno, D., and A. McCallum. 2008. Topic models conditioned on arbitrary features with

Dirichlet-multinomial regression. In Proceedings of the 24th Conference on Uncertainty in

Artificial Intelligence, 411–418.

Moghaddam, S., and M. Ester. 2011. ILDA: Interdependent LDA model for learning latent aspects

and their ratings from online product reviews. In Proceedings of the 34th International ACM

SIGIR Conference on Research and Development in Information Retrieval, 665–674. ACM.

Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in

Information Retrieval 2(1–2): 1–135.

Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classiﬁcation using machine

learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in

Natural Language Processing, vol. 10, 79–86. Association for Computational Linguistics.

Ponte, J.M., and W.B. Croft. 1998. A language modeling approach to information retrieval.

In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and

Development in Information Retrieval (SIGIR ’98), 24–28 Aug 1998, Melbourne, 275–281.

Poria, S., E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data ﬂow analysis

by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):

26–36.

Rabiner, Lawrence R., and Biing-Hwang Juang. 1993. Fundamentals of speech recognition. Upper

Saddle River: Prentice-Hall.

Ramage, D., D. Hall, R. Nallapati, and C.D. Manning. 2009. Labeled LDA: A supervised topic

model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference

on Empirical Methods in Natural Language Processing, vol. 1, 248–256. Association for

Computational Linguistics.

Ramage, D., C.D. Manning, and S. Dumais. 2011. Partially labeled topic models for interpretable

text mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge

Discovery and Data Mining, 457–465. ACM.

Rao, D., and D. Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of

the 12th Conference of the European Chapter of the Association for Computational Linguistics,

675–682. Association for Computational Linguistics.

Rao, Y., Q. Li, X. Mao, and L. Wenyin. 2014. Sentiment topic models for social emotion mining.

Information Sciences 266: 90–100.

Saif, H., M. Fernandez, Y. He, and H. Alani. 2013. Evaluation datasets for Twitter sentiment

analysis: A survey and a new dataset, the STS-Gold. In Proceedings of the 1st Workshop on Emotion

and Sentiment in Social and Expressive Media (ESSEM), in conjunction with the AI*IA Conference,

Turin.

Shamma, D.A., L. Kennedy, and E.F. Churchill. 2009. Tweet the debates: Understanding commu-

nity annotation of uncollected sources. In Proceedings of the First SIGMM Workshop on Social

Media, 3–10. ACM.

Si, J., A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng. 2013. Exploiting topic based twitter

sentiment for stock prediction. In ACL (2), 24–29.

Smola, A., and S. Narayanamurthy. 2010. An architecture for parallel topic models. Proceedings

of the VLDB Endowment 3(1–2): 703–710.

Steyvers, M., P. Smyth, M. Rosen-Zvi, and T. Grifﬁths. 2004. Probabilistic author-topic models for

information discovery. In Proceedings of the Tenth ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, 306–315. ACM.

Taboada, M., J. Brooke, M. Toﬁloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for

sentiment analysis. Computational linguistics, 37(2): 267–307.

Teh, Y.W., M.I. Jordan, M.J. Beal, and D.M. Blei. 2006. Hierarchical Dirichlet processes. Journal

of the American Statistical Association 101(476).

Titov, I., and R. McDonald. 2008a. Modeling online reviews with multi-grain topic models. In

Proceedings of the 17th International Conference on World Wide Web, 111–120. ACM.

Titov, I., and R.T. McDonald. 2008b. A joint model of text and aspect ratings for sentiment

summarization. In ACL, vol. 8, 308–316. Citeseer.

Wainwright, M.J., and M.I. Jordan. 2008. Graphical models, exponential families, and variational

inference. Foundations and Trends® in Machine Learning 1(1–2): 1–305.

Wang, C., and D.M. Blei. 2011. Collaborative topic modeling for recommending scientiﬁc articles.

In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery

and Data Mining, 448–456. ACM.

Wang, H., Y. Lu, and C. Zhai. 2010. Latent aspect rating analysis on review text data: A rating

regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, 783–792. ACM.

Wang, H., Y. Lu, and C. Zhai. 2011. Latent aspect rating analysis without aspect keyword

supervision. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge

Discovery and Data Mining, 618–626. ACM.

Wang, Y., H. Bai, M. Stanton, W.-Y. Chen, and E.Y. Chang. 2009. PLDA: Parallel latent Dirichlet

allocation for large-scale applications. In Algorithmic Aspects in Information and Management,

301–314. Springer.


Wu, C.J. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11:

95–103.

Wu, Y., and M. Ester. 2015. Flame: A probabilistic model combining aspect based opinion mining

and collaborative ﬁltering. In Proceedings of the Eighth ACM International Conference on Web

Search and Data Mining, 199–208. ACM.

Zhai, C., and J. Lafferty. 2001a. Model-based feedback in the language modeling approach to

information retrieval. In Proceedings of the Tenth International Conference on Information

and Knowledge Management, 403–410. ACM.

Zhai, C., and J. Lafferty. 2001b. A study of smoothing methods for language models applied to

ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR

Conference on Research and Development in Information Retrieval, 334–342. ACM.

Zhai, C., A. Velivelli, and B. Yu. 2004. A cross-collection mixture model for comparative text

mining. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge

Discovery and Data Mining, 743–748. ACM.

Zhai, K., J. Boyd-Graber, N. Asadi, and M.L. Alkhouja. 2012. Mr. LDA: A ﬂexible large scale

topic modeling package using variational inference in mapreduce. In Proceedings of the 21st

International Conference on World Wide Web, 879–888. ACM.

Zhao, W.X., J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. 2011. Comparing twitter and

traditional media using topic models. In Advances in Information Retrieval, 338–349. Springer.

Zhao, W.X., J. Jiang, H. Yan, and X. Li. 2010. Jointly modeling aspects and opinions with a

maxent-lda hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural

Language Processing (EMNLP ’10), Stroudsburg, 56–65. Association for Computational

Linguistics.

Zhu, J., A. Ahmed, and E.P. Xing. 2009. Medlda: Maximum margin supervised topic models for

regression and classiﬁcation. In Proceedings of the 26th Annual International Conference on

Machine Learning, 1257–1264. ACM.

Chapter 7

Social Media Summarization

Vasudeva Varma, Litton J. Kurisinkel, and Priya Radhakrishnan

Abstract Social media is an important venue for sharing information and for discussions and
conversations on a wide variety of topics and events from across the globe. Applying automated
text summarization techniques to the large volume of information accumulated in social media
can produce textual summaries in a variety of flavors, depending on the difficulty of the use
case. This chapter surveys the techniques available for generating summaries from different
genres of social media text, with an extensive introduction to extractive summarization
techniques.

Keywords Social media summarization • Extractive summarization • Conversational summarization •
Event summarization • Sentiment analysis • Attribute extraction • Semantic similarity •
Topic modeling

7.1 Introduction

Text summarization is one of the prominent areas of computational text processing. The field is
more relevant than ever in the current era of social media, given the enormous amount of data
available in diverse styles and formats, from tweets and blogs to articles and news reports.
Some of these data, such as tweets and social media posts, stand apart from conventional formal
texts because of their highly informal and often ungrammatical usage. Nevertheless, their
content is no less important than that of any formal document, because social media data are
instantaneous, temporally and topically relevant, and sensitive to world affairs. This is
precisely what makes social media summarization interesting, despite the challenges posed by
the data. In this chapter we first consider psychological perspectives on social media usage,
then discuss at length a wide range of issues pertinent to the field, present a coherent
description of the prevailing methodologies, and show how the choice of summarization technique
varies with the nature of the data.

V. Varma (✉) • L.J. Kurisinkel • P. Radhakrishnan

International Institute of Information Technology-Hyderabad, Hyderabad, India

e-mail: vv@iiit.ac.in; litton.jKurisinkel@research.iiit.ac.in; priya.r@research.iiit.ac.in

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_7


Section 7.2 presents an overview of general approaches to automated text summarization, with
an emphasis on extractive summarization techniques. Section 7.2.1 describes recent work on
extractive summarization, and Sect. 7.2.2 discusses the nature of the scoring function for
candidate summaries. Section 7.3, the final part, covers the challenges involved in social
media summarization, general approaches to social media summarization, event summarization,
sentiment analysis and summarization, conversational summarization and emerging trends in
social media summarization in Sects. 7.3.1, 7.3.2, 7.3.3, 7.3.4, 7.3.5 and 7.3.6, respectively.

7.1.1 Expressiveness of Social Media

According to Erikson’s psycho-social theory, the phases that characterize adolescent and adult
development include the formation of identity and the development of intimate relationships.
Social networking sites allow people to engage in activities that reflect their identity.
Friendships, romantic relationships and ideology remain key aspects of adolescent development.
These identity challenges of adulthood are addressed through self-disclosure, particularly
with peers.

Since online interactions offer a level of anonymity and privacy that is quite uncommon in
face-to-face interactions, people tend to express themselves more openly in this relatively
safe environment. Kang (2000) has noted that ‘Cyberspace makes talking with strangers easier’.
People with stigmatized social identities (for example, homosexuality or fringe political
beliefs) may be motivated to join and participate in online groups devoted to that particular
identity because of the relative anonymity and safety of the internet and the shortage of such
groups in the offline world (Bargh and McKenna 2004).

Polarized political opinions, support for various social causes and intimate relationships are
expressed more openly online than in the offline world. This is due to the relative insulation
from identity disclosure, implicit trust in the privacy of communication, the disruption of
the reflexive operation of racial stereotypes, and similar factors (Kang 2000). Using online
networks requires considerable trust: we trust that the information we share will not be used
in unlawful or deceitful ways. We write open and confidential messages to our friends and
colleagues and believe that they will remain confidential. For these reasons, data obtained
from social media are more expressive of people’s actual opinions than most offline
interactions.

7.1.2 Need for Text Summarization on Social Media Data

Social media interactions are instrumental in the massive production and sharing of data in
the form of video, images and text. This enormous amount of data can be used to identify
implicit patterns in social behavior, which in turn can inform social surveys, business
decisions or the framing of governmental policies.

The majority of the data produced and shared by social media applications is text, in the form
of posts, comments or messages. The data produced as a consequence of a particular pattern of
social behaviour can be huge in size and noisy. These data need to be summarized and converted
into interpretable forms so that the information they contain can be put to practical use.

The information can be reported in graphical forms such as histograms or pie charts, which
analyze the data along various parameters and present them statistically. But lay users who
are looking for the opinion of the masses about a movie, an incident or a retail product may
lack the background or the patience to interpret these representations. In such a context, a
noise-free textual summary generated from this huge volume of data makes the information
accessible to a lay end user who can afford only to skim it.

In other words, while statistical representations can effectively capture the information
pertaining to specific parameters of a large social media data set, a text summary aims to
capture the content of the various topics and present a coherent overview of them. For
example, a statistical representation may rate the cinematography of a movie as good, with 4
on a scale of 5. A textual summary, in contrast, may give an overview of what is good in the
cinematography: say, ‘the veteran cinematographer Rajiv Menon has displayed sheer brilliance
in the climax, which received critical acclaim’.

7.2 An Overview of Automated Text Summarization

A summary is a text that is produced from one or more texts, that conveys important

information in the original text(s), and that is no longer than half of the original text(s)

and usually signiﬁcantly less than that. (Radev et al. 2002)

In an era of information explosion, where a large number of information sources co-exist and
overlap significantly in content, there is an immense need for automatic means of summarizing
this information so that a noise-free essence of everything available can be brought out.
Automated text summarization techniques provide such means for textual content and are broadly
classified into extractive and abstractive methods. Abstractive summarization techniques
convert the source text into an internal semantic representation, which natural language
generation techniques then use to generate a summary intended to be comparable to a
human-written one. Because of the complexity of abstractive techniques, the research community
has been overwhelmingly inclined towards extractive techniques. We focus on extractive
summarization techniques in the remainder of this section.

Extractive summarization approaches try to identify, from the original corpus of textual data,
a proper subset of linguistic units that best represents the original corpus within the
constraints of a stipulated summary size. The linguistic units can be sentences, phrases or
short textual entities such as tweets. The research community has approached the problem of
automated summarization in a variety of ways, but most approaches can be generalized to follow
the three steps given below.

1. Creating an intermediate representation of the target text that captures its key textual
   features. Possible approaches are topic signatures, word-frequency counts, latent space
   approaches using matrix factorisation, and Bayesian approaches.

2. Using the intermediate representation to assign scores to individual linguistic units
   within the text.

3. Selecting a set of linguistic units that maximises the total score as the summary of the
   target text.
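The three steps can be sketched in a few lines of Python. This is only an illustrative
sketch, not a method from the literature cited in this chapter: it assumes word-frequency
counts as the intermediate representation, the mean word frequency as the sentence score,
and a simple greedy selection under a word budget instead of an exact maximisation. The
function names `tokenize` and `summarize` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(sentences, max_words):
    # Step 1: intermediate representation = relative word frequencies in the corpus.
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    freq = {w: c / total for w, c in counts.items()}

    # Step 2: score each sentence by the mean frequency of its words.
    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Step 3: greedily keep the best-scoring sentences that still fit the word budget
    # (a heuristic stand-in for maximising the total score exactly).
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(tokenize(sentences[i]))
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Present the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, max_words=50)` on a list of sentence strings returns the
selected sentences in source order; any of the three steps can be swapped for a more
elaborate component without changing the overall structure.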

Candidate summaries are those subsets of linguistic units in the original corpus whose total
size falls within the stipulated target summary size. The quality of a candidate summary is
estimated with a scoring function, and the highest-scoring candidate is chosen as the summary
of the corpus. For generic summarization, the scoring function for candidate summaries takes
one of the forms

$$F(S) = \lambda\,\mathrm{Coverage}(S) - (1-\lambda)\,\mathrm{Redundancy}(S) \tag{7.1}$$

$$F(S) = \lambda\,\mathrm{Coverage}(S) + (1-\lambda)\,\mathrm{Diversity}(S) \tag{7.2}$$

where $\lambda$ is a constant and S is a candidate summary. The Coverage function rewards a
summary that covers as much information from the original text as possible, the Redundancy
function penalizes a candidate summary for carrying redundant information, and the Diversity
function assigns higher values to candidate summaries with diverse information.
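As a concrete illustration of Eq. 7.1, the sketch below scores two candidate summaries. The
particular definitions are assumptions made for the example only: coverage is taken as the
fraction of the corpus vocabulary that appears in the summary, and redundancy as the mean
Jaccard overlap between pairs of summary sentences. The value `lam=0.7` is arbitrary.

```python
def words(sentence):
    """Set of lower-cased whitespace tokens in a sentence."""
    return set(sentence.lower().split())

def coverage(summary, corpus):
    """Fraction of distinct corpus words that also appear in the summary."""
    corpus_vocab = set().union(*(words(s) for s in corpus))
    summary_vocab = set().union(*(words(s) for s in summary))
    return len(summary_vocab & corpus_vocab) / len(corpus_vocab)

def redundancy(summary):
    """Mean Jaccard overlap over all pairs of summary sentences (0 if fewer than two)."""
    pairs = [(a, b) for i, a in enumerate(summary) for b in summary[i + 1:]]
    if not pairs:
        return 0.0
    overlaps = [len(words(a) & words(b)) / len(words(a) | words(b)) for a, b in pairs]
    return sum(overlaps) / len(overlaps)

def score(summary, corpus, lam=0.7):
    """F(S) = lam * Coverage(S) - (1 - lam) * Redundancy(S), as in Eq. 7.1."""
    return lam * coverage(summary, corpus) - (1 - lam) * redundancy(summary)

corpus = ["great camera", "great camera quality", "poor battery"]
repetitive = ["great camera", "great camera quality"]
diverse = ["great camera quality", "poor battery"]
print(score(repetitive, corpus) < score(diverse, corpus))  # True
```

The repetitive candidate covers only three of the five corpus words and its two sentences
overlap heavily, so it is ranked below the diverse candidate, which covers all five words
with no overlap.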

7.2.1 Recent Developments in Extractive Summarization

Extensive work has been done on extractive summarization that tries to achieve proper content
coverage by scoring and selecting sentences. Typically these methods extract candidate
sentences to be included in the summary and then reorder them separately. Most extractive
summarization research aims to increase the total salience of the selected sentences while
reducing redundancy. Approaches include Maximum Marginal Relevance (Carbonell and Goldstein
1998), centroid-based summarization (Radev et al. 2002), summarization through keyphrase
extraction (Qazvinian et al. 2010) and formulation as a minimum dominating set problem (Shen
and Li 2010). Graph centrality has also been used to estimate the salience of a sentence
(Erkan and Radev 2004). Approaches to content analysis include generative topic models
(Haghighi and Vanderwende 2009; Celikyilmaz and Hakkani-Tur 2010; Li et al. 2011a) and
discriminative models (Aker et al. 2010).

ILP2 (Galanis et al. 2012) is a system that uses Integer Linear Programming (ILP) to jointly
optimize the importance of the summary’s sentences and their diversity (non-redundancy), while
respecting the maximum allowed summary length. It uses a Support Vector Regression model to
generate a scoring function for the sentences. Woodsend and Lapata (2012) arrived at a scoring
function with linear components that quantify the salience of bi-grams and the salience of
parse tree nodes, together with a language-model component that penalises unlikely sentences.
Berg-Kirkpatrick et al. (2011) proposed an approach based on the distribution of important
concepts in the summary, where the concepts are the bi-grams of the corpus to be summarised.
They formulated an ILP objective function over the space of candidate summaries that maximizes
the total concept weight of the chosen summary.
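The idea of maximising total concept weight can be illustrated on a toy corpus. The sketch
below is not the ILP formulation of the cited work: it enumerates all sentence subsets by
brute force, which is feasible only for a handful of sentences, and it assumes that the weight
of a bi-gram is the number of sentences containing it. Each covered bi-gram is counted once,
however many selected sentences contain it. The name `best_summary` is hypothetical.

```python
from itertools import combinations

def bigrams(sentence):
    """Set of adjacent word pairs in a sentence."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))

def best_summary(sentences, max_words):
    # Assumed concept weight: number of sentences in which the bi-gram occurs.
    weight = {}
    for s in sentences:
        for b in bigrams(s):
            weight[b] = weight.get(b, 0) + 1

    best, best_score = (), 0
    for r in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            # Skip candidate summaries that exceed the length budget.
            if sum(len(sentences[i].split()) for i in subset) > max_words:
                continue
            covered = set().union(*(bigrams(sentences[i]) for i in subset))
            total = sum(weight[b] for b in covered)  # each concept counted once
            if total > best_score:
                best, best_score = subset, total
    return [sentences[i] for i in best], best_score
```

Because a concept contributes its weight only once, a sentence that repeats already covered
bi-grams adds little to the objective, so redundancy is discouraged without a separate
penalty term.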

Takamura and Okumura (2009) treated multi-document summarization as a maximum concept coverage
problem with a knapsack constraint (MCKP) and explored decoding algorithms for solving MCKP in
the summarization task. Lin and Bilmes (2011) formulated summarization as a submodular
function maximization problem over the set of possible candidate summaries, subject to the
space constraint. The primary goal of all these methods is to achieve maximum content
coverage.

As far as sentence ordering is concerned, Li et al. (2011b) used context inference to achieve
better sentence ordering, while McKeown et al. (2001) used a majority ordering algorithm to
sort sentences. Lapata (2013) provided an unsupervised probabilistic model for sentence
ordering, while Ji and Yu (2013) used a cluster-adjacency-based approach. One disadvantage of
these approaches is that, although they can achieve a topical order of sentences, they never
capture the local structural relations between sentences.

The work that pioneered a holistic approach to multi-document summarization, bringing sentence
selection and coherence under a single umbrella, is G-Flow (Janara et al. 2013). The authors
built a graph that stores discourse relations, with edge weights that quantify coherence. This
coherence value is linearly combined with salience and redundancy in the sentence scoring
function to formulate multi-document summarization as a constrained optimization problem. The
system thus accounts for the readability of the extracted sentences in the output summary by
quantifying their coherence with a discourse graph, so that content coverage is pursued
together with the readability and coherence of the resulting summary.

Varma et al. (2011) and Jagadeesh et al. (2007b) used the Hyperspace Analogue to Language
(HAL) model to create a semantic space of words from word co-occurrence statistics and
leveraged this information for summarization. Chandan et al. (2008) created a scheme for
generating personalised summaries of web documents by using user-specific information that
reflects the user’s subjective information need. Chandan et al. (2009) formulated
summarization as a decision-making problem in which the risk associated with selecting a
sentence is estimated in terms of information loss, and the set of sentences with the minimum
total risk forms the summary. Rahul et al. (2009) approached summarization through a sentence
position policy, assuming that key sentences occur at specific locations in the text.

7.2.2 Expected Nature of Scoring Function for Candidate Summary

The scoring function of candidate summaries designed for extractive summarization can be
formalized as follows. For a given corpus containing the set of sentences
$V = \{v_1, v_2, \ldots, v_n\}$, $F: 2^V \rightarrow \mathbb{R}$ is a function that returns a
real value for any subset $S \subseteq V$. The summarization function seeks a subset of
bounded size that maximises F, i.e.

$$S_{sum} = \arg\max_{S \subseteq V} F(S) \tag{7.3}$$

where $|S_{sum}| \le k$ and k is the targeted summary size.

This optimization problem is NP-hard in general, yet an automated multi-document summarization
approach is expected to scale to large document sets and still produce a reliable summary. Lin
and Bilmes (2011) observed the importance of monotone submodular functions for the extractive
summarization process. Nemhauser et al. (1978) showed that if F is a monotone non-decreasing
submodular function, a greedy algorithm finds an approximate summary $S_{sum}$ such that

$$F(S_{sum}) \ge \frac{e-1}{e}\,F(S_{opt}) \tag{7.4}$$

where $S_{opt}$ is the optimal summary under the same size constraint,

$$S_{opt} = \arg\max_{S \subseteq V} F(S).$$
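A minimal sketch of such a greedy procedure is given below, under the assumption that the
size constraint is a limit k on the number of selected sentences. The objective used in the
example, the number of distinct words covered, is chosen only because it is easy to verify
that it is monotone and submodular; the names `greedy_select` and `word_coverage` are
hypothetical.

```python
def greedy_select(candidates, objective, k):
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(k):
        if not remaining:
            break
        base = objective(selected)
        # Marginal gain of every remaining candidate with respect to the current selection.
        gains = [(objective(selected + [c]) - base, c) for c in remaining]
        best_gain, best = max(gains, key=lambda g: g[0])
        if best_gain <= 0:
            break  # nothing left that improves the objective
        selected.append(best)
        remaining.remove(best)
    return selected

def word_coverage(summary):
    """Number of distinct words covered: a monotone submodular set function."""
    return len({w for s in summary for w in s.lower().split()})
```

For example, `greedy_select(["a b c", "c d", "a b", "e"], word_coverage, 2)` first takes
"a b c" (gain 3) and then a sentence that adds one new word. Each step evaluates the
objective once per remaining candidate, so the cost grows with both k and the corpus size.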

Minoux (1978) came up with a version of this algorithm that scales to very large data sets.
Submodular functions possess the interesting property of ‘diminishing returns’, which can be
formalised as follows. For any $A \subseteq B \subseteq V$ and any $v \in V$ with
$v \notin A$ and $v \notin B$, if F is submodular,

$$F(A + v) - F(A) \ge F(B + v) - F(B) \tag{7.5}$$

i.e. the value added by v decreases as A grows to B. F is non-decreasing if

$$\forall A \subseteq B,\; F(A) \le F(B) \tag{7.6}$$

Monotone non-decreasing (MND) submodular functions have an additional property: a weighted sum
of several MND submodular functions is itself a monotone submodular function, provided that
the weights are positive real numbers. That is, $F = \sum_i \alpha_i F_i$ is monotone
submodular if each $F_i$ is a monotone non-decreasing submodular function and $\alpha_i > 0$
for all i. This is of significant importance to summarization because, in most cases, the
scoring function used for extractive summarization is a weighted sum of a function that
estimates topical coverage and another function that rewards topical diversity.
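This closure property follows directly from the diminishing-returns inequality (Eq. 7.5) and
the monotonicity condition (Eq. 7.6). A short derivation in the same notation, for any
$A \subseteq B \subseteq V$ and $v \notin B$, is:

```latex
\begin{align*}
F(A + v) - F(A) &= \sum_i \alpha_i \bigl[F_i(A + v) - F_i(A)\bigr]
  \;\ge\; \sum_i \alpha_i \bigl[F_i(B + v) - F_i(B)\bigr] = F(B + v) - F(B), \\
F(A) &= \sum_i \alpha_i F_i(A) \;\le\; \sum_i \alpha_i F_i(B) = F(B).
\end{align*}
```

Both inequalities hold term by term: each $F_i$ satisfies Eqs. 7.5 and 7.6, and multiplying
by a positive $\alpha_i$ preserves the direction of an inequality.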

For generic summarization, Lin and Bilmes (2011) used the following function:

$$F(S) = L_1(S) + R_1(S) \tag{7.7}$$

Here $L_1(S)$ is given by

$$L_1(S) = \sum_{i \in V} \min\Bigl\{ \sum_{j \in S} w_{i,j},\; \alpha \sum_{k \in V} w_{i,k} \Bigr\} \tag{7.8}$$

where

$w_{i,j}$ → TF-IDF cosine similarity between sentences i and j

V → set of all sentences in the corpus

S → candidate summary

$\alpha$ → a learned parameter

and $R_1(S)$ is given by

$$R_1(S) = \sum_{k=1}^{K} \sqrt{\sum_{j \in S \cap P_k} \sum_{i \in V} w_{i,j}} \tag{7.9}$$

where $P_1, P_2, \ldots, P_K$ are the sentence clusters obtained by applying k-means
clustering to the set of sentences in the corpus, with TF-IDF cosine similarity as the
similarity metric. All other variables are as described for Eq. 7.8.
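The two terms can be evaluated directly once a sentence similarity matrix and a clustering
are available. The sketch below assumes both are given as plain Python lists (`w[i][j]` for
the similarities, `clusters[j]` for the cluster index of sentence j) and represents a
candidate summary S as a list of sentence indices. The trade-off weight `lam` is an extra
knob added for illustration; with `lam=1.0` the function is the plain sum of the two terms.
The value `alpha=0.5` is arbitrary, not a learned value.

```python
import math

def coverage_term(S, w, alpha):
    """L1(S): similarity of each sentence to the summary, capped at alpha times
    its total similarity to the corpus."""
    n = len(w)
    return sum(
        min(sum(w[i][j] for j in S), alpha * sum(w[i][k] for k in range(n)))
        for i in range(n)
    )

def diversity_term(S, w, clusters):
    """R1(S): for each cluster, the square root of the summed rewards of its
    selected sentences; the square root favours spreading picks across clusters."""
    n = len(w)
    total = 0.0
    for c in set(clusters):
        reward = sum(sum(w[i][j] for i in range(n)) for j in S if clusters[j] == c)
        total += math.sqrt(reward)
    return total

def objective(S, w, clusters, alpha=0.5, lam=1.0):
    return coverage_term(S, w, alpha) + lam * diversity_term(S, w, clusters)

# Sentences 0 and 1 are near-duplicates in one cluster; sentence 2 is in another.
w = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
clusters = [0, 0, 1]
print(objective([0, 2], w, clusters) > objective([0, 1], w, clusters))  # True
```

Selecting one sentence from each cluster scores higher than selecting the two near-duplicates,
because the coverage of sentences 0 and 1 saturates at the cap and the square root rewards a
second cluster more than a second pick from the same cluster.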

7.3 Social Media Summarization

Social media platforms have a large number of users, and the interactions among these users
produce an enormous amount of data every day. Summarizing such a large amount of
user-generated content without distorting its essence can provide useful insights for various
purposes. In this section we introduce the types of social media data and the nature of the
interactions that produce them, and then discuss the challenges of information extraction in
social media summarization.

7.3.1 Challenges of Information Extraction in Social Media

The text in user-generated content arising from social interactions is usually not well-formed
in terms of natural language grammar, structure and formality. It also departs from other
conventions of language in notable ways: inconsistent capitalization of named entities,
missing punctuation, repetitions and reduplications, poor sentence structure, false starts,
non-standard words, filler words such as “uuumm” and “uhh”, and other texting disfluencies.
This is because social media texts are more likely to express emotional and context-specific
content.

Unstructured noisy text is found in informal settings such as online chat, text messages,
e-mails, message boards, user reviews, blogs, wikis and social networking posts. Information
extraction research therefore faces many problems with such text, because standard natural
language processing tools such as part-of-speech (POS) tagging, parsing and Named Entity
Recognition (NER) perform poorly on unstructured data.

Most of the textual content in social media takes the form of tweets, comments and captions
for images or videos, and is relatively short. Traditional text similarity metrics expect a
reasonable amount of contextual information for accurate similarity estimation. Short social
media texts provide only sparse contextual information, which can result in poor text
classification, clustering or information extraction when traditional text similarity metrics
are used.
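The sparsity problem is easy to reproduce. The sketch below, an illustration with made-up
tweets rather than data from any cited study, computes the cosine similarity of raw
term-count vectors for two short texts that describe the same kind of event but share no
words.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts using raw term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

tweet_a = "awesome goal by messi"
tweet_b = "what a brilliant strike from leo"
print(cosine(tweet_a, tweet_b))  # 0.0
```

With no term in common the similarity is exactly zero, even though a human reader would judge
the two tweets to be closely related. Longer documents on the same topic would normally share
enough vocabulary for the metric to register the relation.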

Another distinguishing characteristic of social media data is its sensitivity to chronological
recency. Social media opinion about persons or commercial products is volatile, because it
depends heavily on the most recent developments concerning them. For instance, the media
build-up and public image of a politician can collapse once a video about his private affairs
goes viral on social media. When a summarization technique is applied to a particular
scenario, the system needs to ensure that it uses the most chronologically significant set of
social media interactions.

7.3.2 General Approaches to Social Media Summarization

Microblogging sites such as Twitter generate and share data with users at an unprecedented
rate. The raw data from these social media are informative but overwhelming, given their
sheer volume and the noise and redundancy they contain. Unlike conventional summarization
systems, which focus on short, static data, social media summarization involves dynamic,
rapidly changing, large-scale streams of information. Some of the approaches attempted are as
follows. Shou et al. (2013) proposed an online tweet clustering algorithm that clusters tweets
and maintains distilled statistics called Tweet Cluster Vectors; they implemented a prototype
called Sumblr. TweetMotif, an exploratory search application for Twitter, was implemented by
Shou et al. (2010); it groups messages by frequent significant terms, which facilitates
navigation and drill-down through a faceted search interface. Chang et al. (2013) introduced
user influence models, which project user interaction information onto a Twitter context tree,
to help Twitter context summarization within a supervised learning framework. Rakesh et al.
(2013) used a scheme called Location Centric Word Co-occurrence, which uses the content of the
tweets and the network information of the Twitter users to identify location-specific tweets.
Olariu (2014) used a word graph together with optimization techniques such as decaying windows
and pruning.

Social media interactions fall into different genres, such as friendship-driven interactions,
interest-driven interactions and expert discussions. Approaches to social media summarization
vary with the purpose of summarization and the genre of the data. Some special cases of social
media summarization are explained below.

7.3.3 Event Summarization

Most events that occur anywhere in the world, whether natural, ad hoc or planned, are subject
to global attention, and people near to and far from the scene report, analyse and share their
opinions through social media. Social media also provides an easily accessible platform for
like-minded people across the world to come together and organize events. In general, an event
can be viewed as a sequence of incidents or sub-events evolving at different points on the
event timeline, or as a set of contributions by different entities that together realise a
single event.

Deepayan and Kunal (2011) proposed an approach to event summarization on tweets that divides
the event timeline into segments, each corresponding to a sub-event (a semantically distinct
portion of the full event), and picks the required number of tweets from each segment. A
specially designed HMM, which accounts for the burstiness of the tweet stream and the word
distribution of the tweets, is used to segment the stream. The approach includes
representative tweets from both low-activity and bursty periods along the event timeline. This
ensures that inherently bursty sub-events, which produce more tweets than other sub-events, do
not occupy an undeservedly large share of the summary. For example, a terrorist attack during
an athletic meet grabs the attention of a large audience, including people who are not
interested in athletics. A one-shot summarization of the tweet data that ignores burstiness
can result in a summary that covers only the ‘terrorist attack’ for the entire week. The use
of automatically learned language models helps to separate sub-events that are not far apart
in time.
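The effect of selecting per segment rather than in one shot can be shown with a small sketch.
It does not implement the HMM segmentation: the segment label and a relevance score for each
tweet are assumed to be given as input, and the function name `summarize_by_segment` and the
example tweets are hypothetical.

```python
from collections import defaultdict

def summarize_by_segment(tweets, per_segment=1):
    """tweets: list of (segment_id, score, text) tuples.
    Returns the top-scoring texts of every segment, in segment order."""
    by_segment = defaultdict(list)
    for segment_id, score, text in tweets:
        by_segment[segment_id].append((score, text))
    summary = []
    for segment_id in sorted(by_segment):
        # The same quota applies to every segment, however many tweets it contains.
        top = sorted(by_segment[segment_id], key=lambda pair: pair[0], reverse=True)[:per_segment]
        summary.extend(text for _, text in top)
    return summary

tweets = [
    (0, 0.9, "opening ceremony begins"),
    (1, 0.8, "explosion near stadium"),
    (1, 0.7, "blast reported"),
    (1, 0.6, "police on site"),
    (2, 0.5, "race resumes"),
]
print(summarize_by_segment(tweets))
```

The bursty middle segment contributes three of the five tweets but only one line of the
summary, so the quieter sub-events before and after it remain represented.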

Chua and Asur (2013a) proposed a Search and Summarize framework that performs event
summarization on Twitter in a bootstrapping manner. An ordinary keyword-based event search can
return a large and highly heterogeneous set of tweets, which makes sub-event detection a
herculean task. The Twitter stream $D_e$ for an event exhibits a high-level temporal topical
relation: if tweets $d_1, d_2, d_3 \in D_e$ are written at times $t_1, t_2, t_3$ respectively,
and $t_1 < t_2 < t_3$, then the topical similarity between $d_1$ and $d_2$ will be higher than
that between $d_1$ and $d_3$. The authors also formulated a Decay Topic Model, which captures
the temporal significance of a latent topic in the tweet stream by quantifying it with an
exponential decay function, alongside conventional estimates based on word co-occurrence.
Initially, the system queries for tweets with a set of keywords related to the event and
applies the topic model to the resulting set of tweets. Each latent topic identified
corresponds to one of the sub-events, and the top-ranked words of each topic are used to query
for tweets again. The new result set is merged with the older set of tweets and the topic
model is updated. The final set of topics is used to summarize the tweet sequence by
selecting, for each latent topic, the tweets that give the lowest perplexity.
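The temporal relation described above can be mimicked with a simple exponential decay. The
sketch below is not the Decay Topic Model itself, whose details are not given here; it only
shows how multiplying a content-based similarity by an exponentially decaying factor of the
time gap reproduces the stated ordering. The parameter `decay_rate` and the time unit are
assumptions.

```python
import math

def decayed_similarity(content_sim, t_i, t_j, decay_rate=0.1):
    """Scale a content similarity by exp(-decay_rate * |t_i - t_j|)."""
    return content_sim * math.exp(-decay_rate * abs(t_i - t_j))

# Three tweets with the same content similarity to d1, written at times 0, 1 and 5.
sim_d1_d2 = decayed_similarity(0.8, 0, 1)
sim_d1_d3 = decayed_similarity(0.8, 0, 5)
print(sim_d1_d2 > sim_d1_d3)  # True
```

With equal content similarity, the tweet written closer in time to d1 receives the higher
score, which matches the relation between d1, d2 and d3 stated in the text.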

Chao Shen and Tao Li (Chua and Asur 2013b) had come up with a participant

based approach for event summarization in twitter where participants are entities

that play a key role in shaping up the event. They trigger the process by tagging

proper nouns using CMU TweetNLP tool (Gimpel et al. 2011),followedbya

hierarchical clustering where resultant clusters contain the different mentions of

same entity. The similarity metric used in the clustering process is represented as

follows.

sim ci;cjDlexSim ci;cjcontSim ci;cj(7.10)

where lexSim evaluates the lexical similarity between two mentions on the basis of the edit distance between them, while contSim quantifies the contextual similarity between two entity mentions. Context is defined by a temporal segment in the tweet sequence surrounding an entity mention, and the contextual similarity between two contexts relies on the term distribution in the respective contexts. The global tweet set corresponding to the event is divided into participant streams, where each individual stream contains the tweets holding at least one mention of the participant entity. The major sub-events in each participant stream are identified through a mixture model approach incorporating both time and content aspects. The sub-events identified for each participant are merged to create a global list of sub-events, and the summary is finally generated by extracting a representative tweet for each listed sub-event.
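The mention similarity of Eq. 7.10 can be illustrated with a minimal Python sketch. It assumes that the two factors are multiplied, that lexSim is the edit distance normalised by the longer mention, and that contSim is the cosine similarity of term counts; these normalisations and all function names are illustrative assumptions rather than the exact choices of the cited work.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lex_sim(a: str, b: str) -> float:
    """Assumed normalisation: 1 - edit distance / length of the longer mention."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cont_sim(context_a: list[str], context_b: list[str]) -> float:
    """Assumed form: cosine similarity between term-count vectors of two contexts."""
    ta, tb = Counter(context_a), Counter(context_b)
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def sim(mention_a: str, context_a: list[str], mention_b: str, context_b: list[str]) -> float:
    """Product of lexical and contextual similarity, as in Eq. 7.10."""
    return lex_sim(mention_a, mention_b) * cont_sim(context_a, context_b)
```

With a product, two mentions are grouped only when they both look alike and occur in similar contexts; a near-identical spelling with disjoint contexts scores zero.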

7.3.4 Sentiment Analysis and Summarization

The Internet has brought about a revolutionary change in the way people across the globe communicate with each other, melting away the geographical rifts between them to a considerable extent. This has tremendously increased the visibility of incidents from across the world which would otherwise have been treated as only locally significant. More powerful modes of communication have also paved the way for cultural exchange between communities that are far apart geographically, which have started experiencing things that were alien to their previous generations. This has enabled larger sections of humanity to become sensitized to the same issues, such as Israeli encroachments in Palestine, the US Presidential Election, a newly released music album, or students' agitations for democracy in Gulf countries. This shared sensitization among larger groups enables people to express their support, empathy, hatred and aggression through social media. The huge volume of textual data piling up in social media, due to debates and discussions among people belonging to different cultural, religious, economic and racial backgrounds, can be utilised for comprehensive opinion surveys serving a large variety of purposes. Mass opinion about an entity can be broken down into different subsets, where each subset highlights some aspect of the entity. A well-formed opinion summary should provide a fine-grained view of popular opinions on different aspects of the entity. With this in mind, Hyun Duk Kim et al. (2010), in their detailed survey on opinion summarization, abstracted the opinion summarization techniques that generate a textual summary holding the opinion distribution of each aspect into three major steps, as follows.

1. Identify the various aspects (features/subtopic)

2. Sentiment prediction for each occurrence of an aspect

3. Extract sentences that represent the popular sentiment on each aspect

NLP techniques devised for feature identification include a combination of POS tagging and syntactic tree parsing, as most of the features are noun phrases (Lu et al. 2009; Popescu and Etzioni 2005; Hu and Liu 2004a,b). Hu and Liu (2004a,b) used association rule mining for feature extraction to learn rules of the form $A_1, A_2, \ldots, A_n \rightarrow F_s$, where $F_s$ stands for the feature; the approach is based on the other words in a sentence and their POS tags.

The problem of product attribute and opinion extraction has been handled as a sequence labeling task (Jin and Ho 2009; Qi and Chen 2010; Zhang et al. 2010). Somprasertsri and Lalitrojwong (2008) use maximum entropy models to address the issue. Jin and Ho (2009) proposed a lexicalized HMM-based method to perform opinion mining at the level of attributes. Miao et al. (2010) introduced a method for opinion mining at a very fine granularity by utilizing Conditional Random Fields (CRFs) (Lafferty et al. 2001) and domain knowledge. To extract information about both products and their opinions at the same time, Qi and Chen (2010) made use of linear-chain CRFs.

Given a source text and a context, sentiment prediction deals with identifying the sentiment orientation towards some aspect of the text in the given context. For example, 'Story line of the movie is bad' holds a negative polarity towards the story line of some movie, while 'cinematography is excellent' holds a strongly positive polarity. Some methods for sentiment prediction are based on numerical information associated with an opinion text, such as the product rating that accompanies a product review comment. But this cannot be generalized to all the textual opinions appearing on the web, as many of them lack any user-given numerical information. Lexicon-based methods offer a more flexible and generalisable approach for assessing sentiment polarity. SentiWordNet (Baccianella et al. 2010) is one such lexical resource, devised to support sentiment classification; it is the result of an automatic annotation of WordNet synsets¹ with their degrees of positivity, negativity and neutrality. The sentiment polarity of a word appearing within the context of occurrence of a specific aspect provides a reliable clue about the sentiment orientation towards that aspect in that particular context. Several works trace out the sentiment polarity distribution on each aspect and generate a statistical summary which, when transformed, gives rise to an easily interpretable graphical representation. Having said that, there are certain contexts where a textual summary carrying more specific information, including the reasons for the polarity, is indispensable.
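As a rough illustration of lexicon-based prediction for an aspect, the following Python sketch sums the polarity of lexicon words found near an aspect term. The tiny lexicon, its scores, the window size and the function names are all hypothetical; a real system would look up scores in a resource such as SentiWordNet rather than in this toy table.

```python
# Hypothetical toy lexicon: word -> (positivity, negativity).
# The scores are made up for illustration and are not SentiWordNet values.
TOY_LEXICON = {
    "bad": (0.0, 0.75),
    "excellent": (0.875, 0.0),
    "boring": (0.0, 0.625),
    "good": (0.75, 0.0),
}


def aspect_polarity(tokens, aspect, window=6, lexicon=TOY_LEXICON):
    """Sum (positivity - negativity) of words within `window` tokens of the aspect."""
    score = 0.0
    for pos, tok in enumerate(tokens):
        if tok != aspect:
            continue
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for neighbour in tokens[lo:hi]:
            positive, negative = lexicon.get(neighbour, (0.0, 0.0))
            score += positive - negative
    return score


def label(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sentence = "story line of the movie is bad".split()
    print(label(aspect_polarity(sentence, "story")))
```

Counting such labels per aspect over many sentences gives the kind of polarity distribution from which a statistical summary can be drawn.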

Opinion summaries can be generated to convey different levels of granularity of opinions. Popular opinionated terms (e.g. excellent, boring), relevant to various aspects of a particular topic, are used to retrieve word-level opinions, as shown by Popescu and Etzioni (2005). A summary based on word-level popularity gives coarse information about the opinion, say 'Direction: Good'. More granularity and a deeper level of understanding can be achieved by a sentence-level summary, e.g. 'Technically brilliant attempt from director Vasantabalan'. Along with popular sentiments, a sentence in an opinion summary should also convey the reason for the sentiment so that the user gets a reliable insight. A summary generated by picking sentences carrying the popular sentiment polarity may not hold the reason for the sentiment.

Glaser and Schutze (2012) proposed an approach to identify a 'supporting sentence' that represents the overall sentiment towards a product and carries a convincing reason for that sentiment. As an initial step, they apply sentiment classification to the entire set of sentences, classify them into positive and negative sentences, and pick the 'n' sentences which exhibit the highest probability of conforming to the overall sentiment of the document. In the succeeding step, they select a sentence which contains enough supporting reason for the orientation of polarity. The amount of supporting information contained in a sentence is quantified by a weighting function based on the frequency of domain-specific noun phrases, the intuition being that supporting information cannot be conveyed without noun phrases.

¹ https://wordnet.princeton.edu/

Hyun Duk Kim et al. (2013) rank the explanativeness of a sentence based on the following heuristics.

1. Sentence length: a lengthier sentence can be more explanatory.

2. Popularity and representativeness: a sentence is more explanatory if it contains more terms that are frequent in the source text.

3. Discriminativeness relative to background: a sentence is expected to be explanatory if it can discriminate the source text O, which is to be summarised, from the background set B, which is a superset of O. The set O consists of sentences satisfying the constraints that they cover aspect A of topic T with sentiment polarity P. The background set B can be constructed by relaxing any of the constraints adopted to create O.

They propose two schemes for measuring explanativeness. The first is a modified version of the BM25 (Jones et al. 2000) ranking function for information retrieval. It treats a sentence as a query and ranks its explanativeness based on the frequency of the words of the sentence in O and B.

$$\mathrm{BM25E}(S, O, B) = \sum_{w \in S} \mathrm{IDF}(w, B) \cdot \frac{c(w, O)\,(k_1 + 1)}{c(w, O) + k_1 \left(1 - b + b\,\frac{|O|}{\mathrm{avgdl}}\right)}$$

$$\mathrm{IDF}(w, B) = \log \frac{|B| - c(w, B) + 0.5}{c(w, B) + 0.5} \tag{7.11}$$

where

$c(w, O)$ → count of $w$ in data set O,

$|O|$ → total number of term occurrences in data set O,

$|B|$ → total number of term occurrences in data set B,

avgdl → average number of total term occurrences of the sub-clusters in T from which O is extracted,

$k_1$ and $b$ → parameters which can be set empirically.
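The BM25E score can be sketched in a few lines of Python, reading Eq. 7.11 in the standard BM25 form. Sentences and data sets are assumed to be pre-tokenised lists of words; the default values k1 = 1.2 and b = 0.75 are common BM25 settings chosen here for illustration, not values reported by the cited work, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def idf(word, background_counts, background_size):
    """IDF(w, B) = log((|B| - c(w, B) + 0.5) / (c(w, B) + 0.5))."""
    c_b = background_counts[word]
    return log((background_size - c_b + 0.5) / (c_b + 0.5))


def bm25e(sentence, o_tokens, b_tokens, avgdl, k1=1.2, b=0.75):
    """Score a tokenised sentence against source set O and background set B."""
    o_counts, b_counts = Counter(o_tokens), Counter(b_tokens)
    o_size, b_size = len(o_tokens), len(b_tokens)
    # Length normalisation term k1 * (1 - b + b * |O| / avgdl).
    length_norm = k1 * (1 - b + b * o_size / avgdl)
    score = 0.0
    for w in sentence:
        c_o = o_counts[w]
        score += idf(w, b_counts, b_size) * c_o * (k1 + 1) / (c_o + length_norm)
    return score


if __name__ == "__main__":
    O = ["battery", "life", "short", "battery", "drains"]
    B = O + ["screen", "great", "camera", "great", "price",
             "fair", "sound", "good", "case", "nice"]
    print(bm25e(["battery", "drains"], O, B, avgdl=5.0))
```

Words that are frequent in O but rare in B contribute most, which is how the score captures both the popularity and the discriminativeness heuristics. Note that a word occurring in more than half of B receives a negative IDF under this formula.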

The second scheme measures the explanativeness of a sentence as the sum of the explanativeness of each word in it. The explanativeness of each word is modelled probabilistically as follows:

$$\mathrm{ES}(S) = \sum_{w \in S} \frac{p(w \mid E = 1)}{p(w \mid E = 0)} \tag{7.12}$$

Here $E = 1$ implies that the word $w$ is observed in an explanatory sentence and $E = 0$ implies that the word $w$ is observed in a non-explanatory sentence.
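A minimal Python sketch of the second scheme follows, reading Eq. 7.12 as a sum of per-word probability ratios. How the two word distributions are estimated is not covered here, so they are passed in as plain dictionaries; the probability floor for unseen words and the function name are assumptions made only to keep the example runnable.

```python
def explanativeness(sentence, p_explanatory, p_non_explanatory, floor=1e-6):
    """Sum over words of p(w | E = 1) / p(w | E = 0).

    `floor` is an assumed smoothing value for words missing from either table.
    """
    return sum(
        p_explanatory.get(w, floor) / p_non_explanatory.get(w, floor)
        for w in sentence
    )


if __name__ == "__main__":
    # Hypothetical probabilities, chosen only for illustration.
    p1 = {"because": 0.5, "good": 0.25}
    p0 = {"because": 0.125, "good": 0.25}
    print(explanativeness(["because", "good"], p1, p0))
```

A word that is equally likely in both kinds of sentence contributes a ratio of 1, while a word typical of explanatory sentences contributes more.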

7.3.5 Conversational Summarization

Besides catering to people's need to express themselves and providing them with a platform for their instantaneous emotional reflexes, social media also provides many sophisticated venues for expert discussions and for seeking expert advice in almost all domains, including healthcare, IT and finance. Such discussion forums produce a vast body of expert opinions about the latest updates in different fields, ranging across different perspectives, and effect a concurrent knowledge creation in the form of user-generated content. An expert dialog summary created out of such data can satisfy many academic queries. Usually such discussions revolve around a primary topic of interest, but they are liable to frequent topic shifts due to the relatively large number of participants, which leads to data sparseness. This data sparseness can be countered by incorporating web documents of related content so that unsupervised topic modelling techniques can be employed. One of the latent topics can then be traced out as the primary topic of discussion.

Arpit et al. (2013) define the primary topic as the most prevalent topic in the longest sentence of each conversation element. A search engine is queried for each word in such a 'topic sentence' and the first web document returned is fetched. This document is considered to be a description of that word. The documents obtained for words belonging to the same sentence are concatenated to form a single document, and the set of such documents obtained for all topic sentences constitutes the input corpus for topic modelling schemes like Latent Dirichlet Allocation (LDA). The latent topic whose topic terms are most prominent in the longest sentence of each conversation element is treated as the primary topic of the on-going conversation, and a sentence's presence in the summary is decided by quantifying its relation with the primary topic. Statistical measures based on word co-occurrence can reliably quantify the relation of a word with the primary topic words. For this purpose we use the HAL (Hyperspace Analogue to Language) model, which constructs the dependencies of a word w on other words based on their occurrence in the context of w within a sufficiently large corpus. The HAL model creates a term × term matrix where each element represents a co-occurrence score between two words within a predefined window of length K.

HAL w0jwD

kD0

W.k/matrix w;k;w0(7.13)

where

matrix(w’,k,w) !number of times word w’ occurs k distance away from w,

K!Window length,

W(k) !KkC1 denotes the strength of coccurrence between two words

pHAL is given by

$$p_{\mathrm{HAL}}(w' \mid w) = \frac{\mathrm{HAL}(w' \mid w)}{n(w) \times K} \tag{7.14}$$

Here $p_{\mathrm{HAL}}$ is the probability of associating a word $w'$ with another word $w$ in a window of size $K$, and $n(w)$ is the word frequency of $w$.
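The HAL and pHAL computations can be sketched in Python as follows. Two simplifications are assumptions of this sketch: the window looks only forward (w' occurring after w), and the distance loop starts at k = 1 because a distance of 0 would pair a word only with itself. The function names are illustrative.

```python
from collections import Counter, defaultdict


def hal_scores(tokens, K):
    """Return a dict mapping (w_prime, w) to HAL(w' | w) for a forward window of length K."""
    hal = defaultdict(float)
    for i, w in enumerate(tokens):
        for k in range(1, K + 1):
            if i + k >= len(tokens):
                break
            # W(k) = K - k + 1: closer words get a larger weight.
            hal[(tokens[i + k], w)] += K - k + 1
    return hal


def p_hal(w_prime, w, hal, freq, K):
    """pHAL(w' | w) = HAL(w' | w) / (n(w) * K)."""
    if freq[w] == 0:
        return 0.0
    return hal[(w_prime, w)] / (freq[w] * K)


if __name__ == "__main__":
    tokens = ["a", "b", "c", "a", "b"]
    K = 2
    hal = hal_scores(tokens, K)
    freq = Counter(tokens)
    print(p_hal("b", "a", hal, freq, K))
```

Dividing by n(w) × K bounds the value by 1 in this forward-window setting, since each occurrence of w can contribute at most K to the score of a single neighbour w'.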

Given the topic terms $t_1, t_2, \ldots, t_k$ of the primary topic, the salience of a sentence $S$ for inclusion in the summary is given by

$$\mathrm{Score}(S) = \prod_{w_i \in S} \left( P(w_i) \prod_{j=1}^{k} p_{\mathrm{HAL}}(t_j \mid w_i) \right) \tag{7.15}$$

A scenario where a user raises a question for an expert or peer opinion and receives more than one answer deserves a separate treatment from the one discussed above, where a many-to-many interaction happens. Wang et al. (2014) address this problem by incorporating a ranking function, which quantifies the relevance of a sentence to the posted query, along with other linear components for topical coverage and diversity in the scoring function. They also encourage contributions from a larger number of authors with an author coverage component. The scoring function includes a linear component to assess the relevance of a candidate summary S, which can be written as follows.

$$r(S) = \sum_{i=1}^{|S|} \sqrt{\mathrm{rank}_i^{-1}} \tag{7.16}$$

where $\mathrm{rank}_i$ is the rank of sentence $i$ in $V$, the set of all sentences in the source corpus to be summarized. $\mathrm{rank}_i$ is calculated using the ListNet (Cao et al. 2007) ranker.

Along with other coverage functions, they introduce an author coverage function which encourages the participation of all the authors in the summary. Authorship coverage involves clustering the sentences based on authorship. It is given by the authorship score a(S),

$$a(S) = \sum_{A \in \mathcal{A}} \sqrt{|S \cap A|} \tag{7.17}$$

where $\mathcal{A}$ is the clustering induced by the sentence-to-author relation.
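The relevance and author coverage components can be illustrated with a short Python sketch. It assumes the square-root form for both Eq. 7.16 and Eq. 7.17, takes the ranks as already computed (the ListNet ranker itself is not reproduced), and represents a candidate summary simply by the rank and the author of each selected sentence. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def relevance(ranks):
    """r(S): sum over summary sentences of sqrt(1 / rank_i), with ranks starting at 1."""
    return sum(sqrt(1.0 / r) for r in ranks)


def author_coverage(authors):
    """a(S): sum over authors of sqrt(number of summary sentences by that author)."""
    return sum(sqrt(n) for n in Counter(authors).values())


if __name__ == "__main__":
    # Four sentences by one author versus four sentences by four authors.
    print(author_coverage(["x", "x", "x", "x"]))
    print(author_coverage(["x", "y", "z", "w"]))
```

Because the square root grows more slowly as more sentences come from the same author, a summary that spreads its sentences over several authors scores higher than one that draws on a single author.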

7.3.6 Future Trends

A summary is expected to be representative of the original corpus and is intended to convey the information contained in it without any incorrect reading. Beyond extracting the content, a lot of work needs to be done in re-organizing the extracted information into a presentable output which supports the right inference. The quality of the summaries can be improved by applying abstractive summarization techniques to user-generated content, which treat the user-generated content just as a source of information and generate the summary in well-formed, interpretable language. In future, it should be possible to extract more specific information about users and their interests to generate insightful summaries highlighting aspects relevant to those interests. User activity networks, and summarization based on those activities, can provide such meaningful insights.

The virtual world of social media provides many opportunities for a user to find new friends and expand his or her circle of close contacts. The strength of each friendship can be evaluated based on different parameters such as the frequency of wall posts shared, the number of messages sent or the number of comments made on each other's posts (Viswanath et al. 2009). The macroscopic view of all active user-to-user links can bring into focus the existence of larger user-activity networks within social media. Members of such user-activity networks share many attributes, like geographical location, age, being batchmates in college, or having similar tastes and interests. A textual summary of the interactions happening in a user-activity network can offer granular data, based on specific fine-tuned attributes of the network, for example 'Interactions of photography enthusiasts in the district'. Such a precise summary is more insightful than the conventional generic summary of all the social media interactions on a particular topic. But it should be remembered that the data being dealt with belongs to the domain of inter-personal social interactions, and individuals have their own privacy settings on what should or should not be shared. Due to such intrinsic limitations on data disclosure, such fine-tuned granular summaries are practically constrained, if not infeasible.

Acknowledgement We extend our sincere thanks to the people of SIEL lab, IIIT Hyderabad for their suggestions on organizing the chapter and to Vigneshwaran M, LTRC, IIIT for helping us in editing the content. We also thank Sangeetha Thomas, MA Psychology, University of Hyderabad for her insightful inputs on psychological aspects of social media usage. We received grants from DIETY, NOKIA (Microsoft Mobile) and acknowledge their contribution towards the research activities at SIEL lab, IIIT Hyderabad.

References

Aker, Ahmet, Trevor Cohn, and Robert Gaizauskas. 2010. Multi-document summarization using A* search and discriminative training. In EMNLP.

Arpit, Sood, Thanvir P. Mohamed, and Vasudeva Varma. 2013. Topic-focused summarization of

chat conversations. In ECIR.

Baccianella, S, A. Esuli, and F. Sebastiani. 2010. SENTIWORDNET 3.0, An enhanced lexical

resource for sentiment analysis and opinion mining. In Proceedings of the 7th conference on

international language resources and evaluation. (LREC’10).

Bargh, J.A., and K.Y. McKenna. 2004. The Internet and social life. Annual Review of Psychology 55: 573–590.

Berg-Kirkpatrick, Taylor, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and

compress. In Proceedings of the 49th annual meeting of the association for computational

linguistics, vol. 1, 481–490.

Cao, Zhe, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: From

pairwise approach to listwise approach. In Proceedings of the 24th international conference on

machine learning, ICML 07.

Carbonell, Jaime, and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for

reordering documents and producing summaries. In SIGIR.

Celikyilmaz, Asli, and Dilek Hakkani-Tur. 2010. A hybrid hierarchical model for multi-document

summarization. In Annual meeting- association for computational linguistics 2, no. conf, 48,

815–824.

Chandan, Kumar, Prasad Pingali, and Vasudeva Varma. 2008. Generating personalized summaries using publicly available Web documents. In International conference on Web intelligence and intelligent agent technology.

———. 2009. Estimating risk of picking a sentence for document summarization. CICLing.

Chang, Yi, Xuanhui Wang, Qiaozhu Mei, and Yan Liu. 2013. Towards Twitter context summariza-

tion with user inﬂuence models. In WSDM’13.

Chua, Freddy Chong Tat, and Sitaram Asur. 2013a. Automatic summarization of events in social media. In ICWSM.

———. 2013b. A participant-based approach for event summarization using Twitter streams. In

NAACL.

Deepayan, Chakrabarti, and Punera Kunal. 2011. Event summarization using Tweets. In Associa-

tion for the advancement of artiﬁcial intelligence.

Erkan, Gunes, and Dragomir R. Radev. 2004. LexRank: Graph-based centrality as salience in text

summarization. Journal of Artiﬁcial Intelligence Research 22 (1): 457–479.

Galanis, Dimitrios, Gerasimos Lampouras, and Ion Androutsopoulos. 2012. Extractive multi-document summarization with integer linear programming and support vector regression. In Proceedings of the ... International conference on computational linguistics, vol. 1, 911–926.

Gimpel, Kevin, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for Twitter: Annotation, features, and experiments. In ACL.

Glaser, Andrea, and Hinrich Schutze. 2012. Automatic generation of short informative sentiment

summaries. In ACL 2012.

Haghighi, Aria, and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. In NAACL.

Hu, M., and B. Liu. 2004a. Mining and summarizing customer reviews. In KDD 04: Proceedings

of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.

———. 2004b. Mining opinion features in customer reviews. In AAAI04: Proceedings of the 19th national conference on Artificial intelligence.

Hyun Duk Kim, Kavita Ganesan, Parikshit Sondhi, Chenxiang Zhai. 2010. Comprehensive review

of opinion summarization. In Survey paper, 2010.

Hyun Duk Kim, Malu G. Castellanos, Meichun Hsu, ChenXiang Zhai, Umeshwar Dayal, and

Riddhiman Ghosh. 2013. Ranking explanatory sentences for opinion summarization. In SIGIR.

Jagadeesh, J., Prasad Pingali and Vasudeva Varma. 2007a. Capturing sentence prior for query-

based multi-document summarization. In Conference RIAO2007.

Jagadeesh, Jagarlamudi, Prasad Pingali, and Vasudeva Varma. 2007b. Capturing sentence prior for

query-based multi-document summarization. In Document understanding conferences.

Janara, Christensen, Stephen Soderland Mausam, and Oren Etzioni. 2013. Towards coherent multi-

document summarization. In Proceedings of NAACL-HLT 2013, 1163–1173.

Ji, Donghong, and Nie Yu. 2013. Sentence ordering based on cluster adjacency in multi-document summarization. In ACL 2013.

Jin, W., and H.A. Ho. 2009. Novel lexicalized HMM based learning framework for web opinion

mining. In Proceedings of the 26th annual international conference on machine learning.

Jones, K.S., S. Walker, and S.E. Robertson. 2000. A probabilistic model of information retrieval:

Development and comparative experiments. In Information Processing and Management.

Kang, J. 2000. Cyber-race. Harvard Law Review 113 (5): 1130–1208.

Karamanis, Nikiforos, Massimo Poesio, Chris Mellish, and Jon Oberlander. 2009. Evaluating

centering-based metrics of coherence for text structuring using a reliably annotated corpus.

In Proceedings of the 42nd annual meeting of the association for computational linguistics,

391–398, Barcelona, Spain.

Lafferty, J., A. McCallum, F. Pereira. 2001. Conditional random ﬁelds: Probabilistic models for

segmenting and labeling sequence data. In Proceedings of the 18th international conference on

machine learning.

Lapata, Mirella. 2013. Probabilistic text structuring: Experiments with sentence ordering. In ACL

2003, 545–552.

Li, Peng, Yinglin Wang, Wei Gao, and Jing Jiang. 2011a. Generating aspect-oriented multi-

document summarization with event-aspect model. In EMNLP.

Li, Peifeng, Guangxi Deng, and Qiaoming Zhu. 2011b. Multi-document Summarization. In Proceedings of the 5th international joint conference on natural language processing, 1055–1061.

Lin, Hui, and Jeff Bilmes. 2011. A class of submodular functions for document summarization. In Proceedings of the association for computational linguistics, vol. 1, 510–520.

Lu, Y., C. Zhai, and N. Sundaresan. 2009. Rated aspect summarization of short comments. In

Proceedings of the 18th international conference on World wide web.

Mckeown, Kathleen R., Regina Barzilay, David Evans, Vasileios Hatzivassiloglou, and Simone

Teufel. 2001. Multi-document summarization: Approach and evaluation. In DUC01.

McKoon, Gail, and Roger Ratcliff. 1992. Inference during reading. Psychological Review 99 (3):

440–446.

Miao, Q., Q. Li, Z. Daniel. 2010. Mining ﬁne grained opinions by using probabilistic models and

domain knowledge. In Proceedings of the IEEE/WIC/ACM international conference on Web

intelligence.

Minoux, M. 1978. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization techniques, 234–243.

Nemhauser, G.L., L.A. Wolsey, and M.L. Fisher. 1978. An analysis of approximations for

maximizing submodular set functions I. Mathematical Programming 14 (1): 265–294.

Olariu, Andrei. 2014. Efﬁcient online summarization of microblogging streams. In ACL, 2014.

Popescu, A.M., and O. Etzioni. 2005. Extracting product features and opinions from reviews. In

Proceedings of the conference on human language technology and empirical methods in natural

language processing., HLT’05.

Qazvinian, Vahed, Dragomir R. Radev, and Arzucan Ozgur. 2010. Citation summarization through keyphrase extraction. In Proceedings of the ... International conference on computational linguistics, vol 2, 895–903.

Qi, L., and L. Chen. 2010. A linear-chain CRF-based learning approach for web opinion mining.

In Proceedings of the 11th international conference on Web information systems engineering.

Radev, D., A. Winkeil, and M. Topper. 2002. Multi-document centroid based text summarization.

In Meeting of the association for computational linguistics, 112–113.

Rahul, Katragadda, Prasad Pingali, and Vasudeva Varma. 2009. Sentence position revisited: A

robust light-weight update summarization baseline Algorithm CLIAWS3. In Third international

cross lingual information access workshop.

Rakesh, Vineeth, Chandan K. Reddy, Dilpreet Singh, and M.S. Ramachandran. 2013. Location-

speciﬁc tweet detection and topic summarization in Twitter. In IEEE.

Shen, Chao, and Tao Li. 2010. Multi-document summarization via the minimum dominating set. In Proceedings of COLING, 984–992.

Shou, Lidan, Zhenhua Wang, Ke Chen, and Gang Chen. 2010. TweetMotif: Exploratory search and

topic summarization for Twitter. In AAAI.

Shou, Lidan, Zhenhua Wang, Ke Chen, and Gang Chen. 2013. Sumblr: Continuous summarization of evolving tweet streams. In SIGIR'13.

Sipos, R., A. Swaminathan, P. Shivaswamy, and T. Joachims. 2012. Temporal corpus summariza-

tion using submodular word coverage. In Proceedings of CIKM.

Somprasertsri, G., and P. Lalitrojwong. 2008. Automatic product feature extraction from online

product reviews using maximum entropy with lexical and syntactic features. In Proceedings of

the 2008 IEEE international conference on information reuse and integration.

Takamura, Hiroya, and Manabu Okumura. 2009. Text summarization model based on maximum coverage problem and its variants. In Associations for Computational Linguistics, Stroudsburg.

Varma, Vasudeva, Sudheer Kovelamudi, Jayant Gupta, Nikhil Priyatam,

arpit.soodug08@students.iiit.ac.in, Harshit Jain, Aditya Mogadala Mogadala, and Srikanth

Reddy Vaddepally. 2011. IIIT Hyderabad in summarization and knowledge base population.

In TAC 2011.

Viswanath, Bimal, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in Facebook. In WOSN09.

Wang, Lu, Hema Raghavan, Claire Cardie, and Vittorio Castelli. 2014. Query-focused opinion

summarization for user-generated content. In COLING.

Woodsend, Kristian, and Mirella Lapata. 2012. Multiple aspect summarization using integer linear programming. In Proceedings of the ... Joint conference on EMNLP and computational natural language learning, 233–243.

Zhang, S., W. Jia, Y. Xia, Y. Meng, and H. Yu. 2010. Product features extraction and categorization

in Chinese reviews. In Proceedings of the 6th international multi-conference on computing in

the global information technology.

Chapter 8

Deception Detection and Opinion Spam

Paolo Rosso and Leticia C. Cagnina

Abstract In this chapter we first introduce the reader to the problem of deception detection in general, describing how lies may be detected automatically using different methods. Later we address the specific problem of deception detection in predatory communication. We place special emphasis on approaches that use affective resources, such as the categorical and psychometric information provided by natural language processing tools. Finally, we focus on the problem of opinion spam, whose detection is very important for reliable opinion mining. Nowadays a large number of opinion reviews are posted on the Web. Such reviews are a very important source of information for customers and companies. Unfortunately, due to the business behind them, there is an increasing number of deceptive opinions on the Web. Those opinions are fictitious and have been deliberately written to sound authentic in order to deceive consumers, promoting a low quality product (positive deceptive opinions) or criticizing a potentially good quality one (negative deceptive opinions). We then summarize some interesting approaches to detecting opinion spam on the Web.

Keywords Deception detection • Opinion spam • Lie detection • Online sexual

predators detection

8.1 Lie Detection

It has been demonstrated that deception is frequently present in computer-mediated communication (CMC)¹ as part of everyday human communication (Hancock et al. 2004). Verbal deception, defined as “a message knowingly transmitted by a sender to foster a false belief or conclusion by the receiver” (Buller and Burgoon 1996), is a concept that can be perfectly applied to CMC. Deception detection is a well-known challenging problem in any research area, basically because the human ability to detect deception is poor. Perhaps for that reason, there is no reliable and robust approach that is able to perform that kind of detection automatically.

¹ The term CMC was proposed in Wolz et al. (1997).

P. Rosso
PRHLT Research Center, Universitat Politècnica de València, Valencia, España
e-mail: prosso@dsic.upv.es

L.C. Cagnina
CONICET – LIDIC, Universidad Nacional de San Luis, San Luis, Argentina
e-mail: lcagnina@unsl.edu.ar

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis, Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_8

Studies in social psychology and communication show that people's accuracy in detecting deception is in the range of 55–58% (Frank et al. 2004), that is, slightly better than chance. Many works point out how to find patterns that help to solve this task (Fitzpatrick et al. 2015; Poria et al. 2015). Newman et al. (2003) present the results of an experiment with participants who either lied or told the truth. They observed that liars use first person singular pronouns (I, me, my) less frequently, perhaps due to the lack of personal experience. Burgoon et al. (2003) suggest that liars use more emotional terms, specifically terms for negative and positive emotions (hate, happy, sad), than truth tellers.
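The two cues mentioned above can be turned into simple text features, as in the following Python sketch. The word sets contain only the example words given in the text, so they are far from complete; the tokenisation rule and the function name are assumptions, and the rates are descriptive features rather than a deception detector in themselves.

```python
import re

FIRST_PERSON = {"i", "me", "my"}          # pronouns listed in the text
EMOTION_TERMS = {"hate", "happy", "sad"}  # example emotion terms listed in the text


def cue_rates(text):
    """Return the share of first person singular pronouns and of emotion terms in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"first_person": 0.0, "emotion": 0.0}
    total = len(words)
    return {
        "first_person": sum(w in FIRST_PERSON for w in words) / total,
        "emotion": sum(w in EMOTION_TERMS for w in words) / total,
    }


if __name__ == "__main__":
    print(cue_rates("I was happy with my stay"))
```

Features of this kind would typically be fed, together with many others, into a classifier trained on texts labelled as truthful or deceptive.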

According to Zuckerman et al. (1981), the non-verbal behavior of liars includes: emotional reactions (guilt, fear and delight are usually associated with deception (Ekman 2001)), cognitive effort in formulating their stories so as to avoid contradictions (Vrij et al. 2008), and behavioral control (verbal and non-verbal) in order to appear convincing. For instance, strong emotions can activate facial muscles at almost the same moment at which the deception occurs. The work presented in Ekman (2001) shows that deception can be detected by observing those facial micro-expressions. In fact, the author was able to classify deception correctly with an accuracy of 80% using observations of micro-expressions. That value was further improved by 7% simply by incorporating the tone of voice (Frank and Ekman 1997) of the person who is lying.

Some computational approaches include computer vision methods used to

distinguish expressions of genuine and posed pain (Littlewort et al. 2007), and

facial expressions (Zhang et al. 2007; Valstar et al. 2006,2007; Cohn and Schmidt

2004). Particularly in Littlewort et al. (2007), the authors trained 20 Support Vector

Machine (SVM) classiﬁers with 5500 images of posed and spontaneous facial

expressions (fake versus real pain). The proposed method considered 20 facial

action units and obtained an accuracy of 72% in differentiating fake from real pain

expressions. A different strategy is presented in Zhang et al. (2007) to distinguish

real facial expressions, which arise from internal emotions, from deceptive ones,

which are simulated. The system uses facial action units related to emotions (anger,

enjoyment, fear and sadness), considering distance- and texture-based features. The

accuracy values obtained with this strategy were in the range of 73–90%. In a first

attempt to distinguish posed from spontaneous facial behavior, Valstar et al. (2006)

proposed a semi-automated system to

discriminate brow actions. They used the speed, intensity, duration and occurrence

order of each brow movement. The system obtained a classification rate of 90.7% on

189 samples of spontaneous and volitional facial data. Later, in Valstar et al. (2007)

the authors proposed a method for automatic multi-cue discrimination between

posed and spontaneous smiles in videos. For that, they considered head, face

and shoulder movements. The classiﬁcation was performed with kernel methods

combined with ensemble learning techniques. The obtained results reached a rate

of 94% of correctly classiﬁed videos. In Cohn and Schmidt (2004) the authors used

a linear classifier with timing and amplitude measures of smiles to discriminate

spontaneous from deliberate smiles. They obtained a classification accuracy of

93% using 81 videos of young adults. A different approach is the one

proposed in Mihalcea et al. (2013) in which the verbal component of videos was

used to detect deception. The authors used a collection of 140 fake and truthful

recordings represented with a word-unigram model. SVM and Naïve Bayes

obtained accuracies in the range of 52–73% with and without considering stop

words.
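To make the unigram setting concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over word unigrams with an optional stop-word filter. It is not the implementation used in Mihalcea et al. (2013): the stop-word list, the class name `UnigramNaiveBayes` and the toy statements are assumptions introduced only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"the", "a", "an", "i", "it", "was", "and", "to", "of", "is"}


def unigrams(text, remove_stop_words=False):
    """Lowercase the text and split it into word unigrams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


class UnigramNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word unigrams."""

    def __init__(self, remove_stop_words=False):
        self.remove_stop_words = remove_stop_words

    def fit(self, texts, labels):
        self.word_counts = {}              # label -> Counter of token frequencies
        self.doc_counts = Counter(labels)  # label -> number of training texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = unigrams(text, self.remove_stop_words)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = unigrams(text, self.remove_stop_words)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy statements, for illustration only.
train_texts = [
    "I honestly believe the trip was wonderful",
    "I think we felt happy at the beach",
    "you would never find anything wrong there",
    "everyone always says it is the best place",
]
train_labels = ["truthful", "truthful", "deceptive", "deceptive"]

model = UnigramNaiveBayes(remove_stop_words=True).fit(train_texts, train_labels)
prediction = model.predict("I believe we felt happy")
```

Switching `remove_stop_words` between `True` and `False` reproduces, in miniature, the two experimental conditions mentioned above.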

More sophisticated features were used in Newman et al. (2003), in which 568

texts with true and false statements were analyzed considering 29 variables of

the Linguistic Inquiry and Word Count (LIWC) tool,² such as word count, number

of pronouns, positive and negative emotions, motion verbs, etc. The statements

were obtained by video-recording opinions about abortion, which were then

transcribed. In addition, one group of people was asked to write about their feelings

on abortion, and others were asked about friends and a fictitious crime. With

the texts, the authors trained a logistic regression model and obtained 67%

correct classification of truth tellers and liars. The authors also concluded

that the five most significant variables among the 29 were: self-reference terms,

negative emotions, motion verbs, references to others and exclusive words. A similar

study was carried out in Mihalcea and Strapparava (2009), in which the authors

constructed three corpora of 100 texts with truths and 100 with lies, considering

topics such as abortion, death penalty and best friends. To determine whether the two

classes of texts can be separated, Naïve Bayes and SVM classifiers were

used, considering only the bag-of-words (BOW) representation with tokenization

and stemming as preprocessing (no feature selection was performed). The results

obtained with each corpus show accuracy values around 70% in distinguishing truthful

from deceptive texts. The authors were also interested in knowing which features

are distinctive of deceptive statements. For this, they calculated a dominance

score of a given word class considering the collection of deceptive texts. This score

is a measure of saliency for the word classes used, in this case, the 70 categories

deﬁned in LIWC. According to this score, the conclusion indicates that the word

usage in deceptive texts includes detachment from the self (you, other) and words

related to certainty (always, very) while belief-oriented words (feel, believe, think)

are present in truthful statements.
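The dominance score can be illustrated with a short sketch. The text above only states that the score measures the saliency of a word class in the deceptive collection; the sketch assumes it is the ratio between the coverage of the class in the deceptive texts and its coverage in the truthful texts, where coverage is the share of tokens belonging to the class. The function names, the two word classes (taken from the examples in the text) and the toy statements are illustrative assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(texts, word_class):
    """Share of all tokens in `texts` that belong to `word_class`."""
    tokens = [t for text in texts for t in tokenize(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in word_class)
    return hits / len(tokens)


def dominance(deceptive_texts, truthful_texts, word_class):
    """Assumed definition: coverage in deceptive texts / coverage in truthful texts.

    Values above 1 mean the class is more salient in the deceptive collection,
    values below 1 mean it is more salient in the truthful one.
    """
    cov_d = coverage(deceptive_texts, word_class)
    cov_t = coverage(truthful_texts, word_class)
    if cov_t == 0:
        return float("inf") if cov_d > 0 else 0.0
    return cov_d / cov_t


# Word classes built only from the examples quoted in the text.
CERTAINTY = {"always", "very"}
BELIEF = {"feel", "believe", "think"}

# Invented toy statements, for illustration only.
deceptive = ["you always say it is very good", "other people always agree"]
truthful = ["i think it is very good", "i feel and believe it works"]

certainty_score = dominance(deceptive, truthful, CERTAINTY)
belief_score = dominance(deceptive, truthful, BELIEF)
```

On this toy data the certainty class scores above 1 and the belief class below 1, which mirrors the direction of the findings reported above; the numbers themselves carry no empirical meaning.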

It is important to detect linguistic patterns in verbal deception in order to differentiate

deceptive from truthful CMC (email, instant messaging, chat, etc.). In Hancock

et al. (2008) the analysis of 242 CMCs showed that liars use more words in general,

more sense-based words (touch, listen), and fewer self-oriented pronouns (I, me),

particularly when they attempt to increase the distance between themselves and the

2 LIWC (Tausczik and Pennebaker 2010) is a tool able to analyze the positive and negative emotions

(among other characteristics) contained in the text. http://www.liwc.net

deception. The authors also revealed that liars ask more questions, use more negative

emotions and avoid causation words (because, hence) in deceptive conversations

for reducing the possibility of contradictions. In Warkentin et al. (2010) the authors

examined the effect of three particular warrants (pieces of information): name, photo

and acquaintance, on 562 CMCs including emails, instant messages, forums, chat

rooms and social networking sites. They aimed to learn whether the use of warrants could

reduce the frequency of deception and constrain its seriousness. Warrants can be

used to determine whether the online identity of a person matches the real-world

one. The authors claimed that warrants affect the perception of the information

about others and that, consequently, this could foster deceptive practices in CMC.

For that purpose, they analyzed the data collected through a survey using a mixed

model approach. The conclusions stated that people lie most frequently in chats and least

in emails and social networking sites. The authors also found a

negative linear relationship between warrants and deception, with the exception of

real world acquaintances which constrain deception in emails and social networking

sites. With similar characteristics, in Smith et al. (2014) the authors studied the effect

of lies in text messaging, a particular form of CMC. A total of 164 participants ﬁlled

in a short questionnaire with information about demographics and text messaging

behavior. Then, the participants completed a Web survey with information related

to the last 15 messages sent to two selected persons. After analyzing the obtained

results, the authors concluded that deception in text messaging is not very common

although there are some prolific liars (people who lie more often in a day than the

average). Deception in this kind of CMC seems to be less frequent between people who

are close, and it is related to concerns about coordinating activities and plans.

As we have described previously, automatic deception detection has been studied

considering psychological (Zuckerman et al. 1981; Zhu et al. 2007; Vrij et al. 2008;

Tsiamyrtzis et al. 2006) and psycholinguistic (DePaulo et al. 2003; Newman et al.

2003; Burgoon et al. 2003; Hancock et al. 2008) traits. In Hauch et al. (2014)

some general linguistic cues related to deception were analyzed as well as those

that can be detected using automatic tools. The authors considered 79 linguistic

cues extracted from published articles on deception. They determined that around

60% of the studies used the LIWC tool, while less than 25% of the total of works

used other general tools. Only 18.6% used speciﬁc tools developed for deception

detection. The results reported in the work indicated that people who lied could

experience greater cognitive load than truth tellers, demonstrated more

negative emotions, and frequently used negations, first-person pronouns and

present-tense verbs. Additional conclusions were that liars express fewer sensory and

contextual detail words and refer less often to cognitive processes in comparison

with truth tellers. Besides the results obtained regarding the relationship between

the language used by liars and the act of deception, the authors concluded that these

linguistic cues can be applied in computational methods to detect deception.

Some proposals considered languages other than English, such as Italian and

Spanish. In Fornaciari and Poesio (2012) a corpus named DECOUR, consisting of 3015

utterances transcribed from hearings held in Italian courts, was used. The authors

represented an Italian utterance by a feature vector considering: the length of the

utterance (with and without the punctuation), the number of words with length

longer than 6 letters, 80 linguistic variables obtained from LIWC for Italian

language, and frequencies of lemmas and n-grams (with n = 1, ..., 5). SVM was

used for the experimental study. Three different experiments were performed: one

considering the whole corpus divided into train and test sets, while the remaining

two experiments used smaller subsets of utterances for training the models. These

subsets were obtained performing two kinds of clusterings, each one considering:

(1) distances between hearings for detecting outliers, and (2) the gender of the

speakers. According to the Monte Carlo simulation applied to the test sets, 59.60%,

61.26% and 63.19% of the predictions were correct for each

experiment respectively. This suggests that the models are effective in detecting

deceptive texts. Later, in Fornaciari et al. (2013) a combination of personality traits

was used as set of features to guide the classiﬁcation of deceptive communication.

The authors used 5 different classiﬁers to perform the experiments on a subset of

DECOUR corpus. The best results were obtained with a decision tree technique, and

the features used were emotional stability/neuroticism and openness to experience.

The F-measure obtained outperformed the baseline with a value of 0.55. In Almela

et al. (2013) the authors studied deception in Spanish written communication. They

collected 100 true and 100 false statements from Spanish-speakers considering

three different topics. The speakers were asked for their opinions on homosexual

adoption and bullfighting, and for their feelings about their best friend. A linear SVM

classifier was trained with the Spanish LIWC categories on each collected corpus.

Combinations of the standard LIWC dimensions were used: linguistic dimension,

psychological processes, relativity and personal concerns. The results obtained with

each combination showed that F-measure scores are in the range [0.50, 0.72] for

the homosexual adoption opinion corpus. In the case of the bullﬁghting corpus,

the scores of F-measure are in the range [0.52, 0.68], and [0.63, 0.84] for the best

friend feeling corpus. The conclusions showed that the fourth dimension (personal

concerns) is the least discriminant, and that the first and second ones (linguistic

dimension and psychological processes) are the most relevant.
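The surface part of the utterance representation described above for the DECOUR corpus (lengths with and without punctuation, number of words longer than 6 letters, and n-gram frequencies for n = 1, ..., 5) can be sketched as follows. This is only an illustration: measuring length in characters is an assumption, the function name `utterance_features` is hypothetical, and the LIWC variables and lemma frequencies are left out because they require external resources.

```python
import string
from collections import Counter


def utterance_features(utterance, max_n=5):
    """Surface features of one utterance (lengths, long words, n-gram counts)."""
    # Length "without punctuation" is taken here as the character count after
    # stripping ASCII punctuation; this is an assumption about the original setup.
    no_punct = utterance.translate(str.maketrans("", "", string.punctuation))
    words = no_punct.lower().split()
    features = {
        "len_with_punct": len(utterance),
        "len_without_punct": len(no_punct),
        "long_words": sum(1 for w in words if len(w) > 6),
    }
    # Frequencies of word n-grams for n = 1 .. max_n.
    ngrams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            ngrams[" ".join(words[i:i + n])] += 1
    features["ngrams"] = ngrams
    return features


# Invented example utterance, for illustration only.
example = utterance_features("Non ricordo, signor giudice.")
```

A real system would map the n-gram counter onto a fixed vocabulary learned from the training set before passing the vector to the SVM.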

8.2 Lies in Predatory Communication: Online Sexual Predators Detection

Pedophilia is a problem that has gained relevance in the past decade due to the

massive use of social media like facebook, myspace, Hi5 and micro-blogs like

Twitter, Plurk, Tumblr, etc. New ways of meeting people are offered through

chat rooms like chatroulette.com and omegle.com, where the

identification of the user is often not needed. Anonymity, the lack of information

and poor parental control make pedophilia a major social problem.

Pedophilia is a clinical diagnosis defined by the World Health Organization (2012) as:

“A sexual preference for children, boys or girls or both, usually of prepubertal or

early pubertal age”. It is a particular case of a disorder of sexual preference of an

adult individual, commonly named pedophile. From a computational point of view,

a pedophile using social media for gaining access to young victims could be named

“online sexual predator” or “cyberpedophile”. In Guo (2008) an online sexual

predator is defined as “someone who uses internet to sexually exploit vulnerable

individuals, typically underaged youths” and is characterized as a person who

talks about sex as soon as he can, usually every three or four messages exchanged

with the minor. Online sexual predators present a friendly personality because

they try to detect vulnerabilities in the victims (which they use to “understand” them

and thus become “a friend”). On the other hand, cyberpedophiles have feelings of

inferiority, isolation and loneliness, low self-esteem, and an emotional immaturity that

prevents adequate interpersonal interaction, and they experience high levels

of passive aggressiveness (Hall and Hall 2007), which they express through the text of the

chat conversations. The offenders deceive minors by making promises of love and

romance, but their intentions are primarily sexual. Cyberpedophiles often create

false profiles, pretend to be younger or of the opposite sex, and try to copy children’s

behaviour.

The phenomenon of pedophilia has been studied from different research perspectives.

From the law enforcement view, criminal codes have been reformed in

order to create a new offence for persons who use the Internet to procure a minor

to commit a sexual act or to expose a minor to pornography. These reforms, together with

programs and non-profit organizations that investigate, control, detect and prevent

the sexual exploitation of minors in social media, aim to address the problem.

Psychology and forensic psychiatry study the interactions between the offender and

the victims in order to establish a behavioral pattern. From the natural language

processing (NLP) perspective, research aims to provide reliable tools to

automatically detect pedophilia in online social media.

One of the problems involved with the computational approaches for the detection

of sexual predators is the manual monitoring of chat conversations. Usually

these are impossible to analyze manually due to the massive amount of data to process and

some privacy issues. Besides, the characteristics of this kind of text prevent the use of

general tools to evaluate its content. Chat texts are generally informal and

quite different from regular written texts (news, abstracts, monographs, etc.),

even blogs. Chat conversations contain a large number of mistakes

and misspellings caused by fast typing, as well as emoticons, abbreviations,

character flooding and specific slang. For general NLP tools, these

characteristics are very noisy data, but for approaches that specifically aim at

detecting possible pedophiles, they can be valuable information.
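As an illustration of how such chat-specific noise can be turned into features rather than discarded, the sketch below counts tokens with character flooding and simple emoticons, and optionally normalizes flooding for tools that expect cleaner text. The regular expressions, the threshold of three repeated characters and the function names are assumptions made for this example; none of the cited works prescribes them.

```python
import re

# A character repeated three or more times in a row is treated as flooding
# (the threshold of three is an assumption).
FLOOD = re.compile(r"(.)\1{2,}")

# A deliberately small emoticon pattern: eyes, optional nose, mouth.
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")


def chat_noise_features(message):
    """Count chat-specific signals that general NLP tools usually treat as noise."""
    tokens = message.split()
    return {
        "flooded_tokens": sum(1 for t in tokens if FLOOD.search(t)),
        "emoticons": len(EMOTICON.findall(message)),
    }


def collapse_flooding(message):
    """Reduce any run of three or more identical characters to two."""
    return FLOOD.sub(r"\1\1", message)


# Invented example message, for illustration only.
sample = "heyyyy r u there???? :) LOL"
sample_features = chat_noise_features(sample)
normalized = collapse_flooding(sample)
```

Keeping the counts as features while also producing a normalized copy lets the same message feed both a chat-aware classifier and a conventional tokenizer.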

The detection of certain emotions in the text could help to detect possible

pedophiles. An initial work used categorical and psychometric information pro-

vided by LIWC as features, besides the traditional term-based features for the

representation of the chat conversations (Rahman Miah et al. 2011). Typical words

such as “friend”, “family”, “sex”, “anger”, “happy”, “sad” and “anxiety” are indicative

of emotional and cognitive components. Then, using standard text categorization

techniques such as Naïve Bayes, J48 decision tree and classification via regression, the

authors classiﬁed the chat logs in three categories: (a) underage exploiting: an

adult offender chats with a minor, (b) sex fantasy: chats between two adults with

sexual content and, (c) general chats: without sexual subject matter. The study

showed that the representation enriched with the category of the words and the

psychometric information, improved the performance of some classiﬁers used to

predict the class of underage exploiting chats. Other works have used LIWC to

extract useful information for the pedophile detection in social media. Such is the

case of Gupta et al. (2012) in which LIWC was used to create psycholinguistic

proﬁles for ﬁnding patterns related to the six online grooming stages (O’Connell

2003): friendship forming, relationship forming, risk assessment, exclusivity, sex and

conclusion. These proﬁles can be used in automatic classiﬁers to detect possible

stages of grooming in chat conversations. From the study, the authors concluded that

relationship forming (personal information exchange about family, friends, school,

etc.) is the most characteristic stage. However, the pedophile generally does not

wait for the end of grooming to arrange a meeting with the minor. Therefore,

the conclusion stage should be identified early in order to detect a possible attack. In

Parapar et al. (2012) the authors studied three different strategies to extract a feature-

based representation for chat conversations: the standard term-based tf-idf, eleven

chat-based features with information about the activity of the person in chatrooms

(number of lines in a chat, number of users participating in a conversation, time

between consecutive line messages, etc.), and LIWC features for analyzing aspects such as

deception versus honesty through the category of words (psychological constructs:

affect, cognition; personal concern: home, leisure, etc.). The three sets of features

were used independently for performing the classiﬁcation with SVM. The results

obtained were not good enough for identifying sexual predators. Then, combinations

of these sets were used obtaining the best results with the tf-idf and chat-based

features. In a later work (Parapar et al. 2014), the authors proposed additional

LIWC features (80 in total) based on psycho-linguistic evidence. They argued that

those features are markers of emotional states and provide valuable clues about

deception and honesty. A deeper analysis of the best performing classiﬁers and

the most discriminative features concluded that the set of features utilized and

the relative weighting of the misclassiﬁcation costs in the SVM algorithms, are

important factors that affect the performance of the system. They identiﬁed that

the word categories most implicated in deception are: use of pronouns, emotion

words, markers of cognitive complexity, and motion verbs. The authors of

Cano et al. (2014) arrived at similar conclusions; there, the chat conversations were

classified considering features of sentiment polarity, content, and psycholinguistic

and discourse patterns. The proposal focuses on the behavior of predators

in each grooming stage (as proposed in Gupta et al. 2012), classifying

the lines into such stages. Basically, the authors in Cano et al. (2014) build a

profile of the predator considering six different types of features: BOW (1-, 2- and

3-grams), syntactical (POS tags), sentiment polarity (extracted from a sentence with

SentiStrength³), content (complexity, readability, length), psycholinguistic (62 in

total obtained with LIWC) and discourse patterns (semantic frame in which a word

sense is used). Then, a supervised approach for automatic classiﬁcation of online

grooming stages was proposed. The results obtained showed that the discourse

‘label’ feature outperformed the baseline in terms of precision for the three stages.

When combined features were used, the results improved in terms of precision and

F-measure for grooming and approach stages. Regarding the analysis of features,

the authors found that sentiment polarity characteristics used in the study, were not

discriminatory. On the contrary, discourse frames such as emotional_state, desiring and

stimulus_focus (ﬁne-grained emotions) were useful in the classiﬁcation task. For

evaluating the differences and similarities of the grooming stage in both online and

face-to-face environments, the authors in Black et al. (2015) used the transcripts

of 44 convicted online offenders. They also used LIWC and content analysis of

strategies in order to study the texts. The strategies considered involved situations

such as friendship forming, risk assessment, exclusivity and sexual stages. The results

indicated that many strategies, such as talking about plans, using flattery, assessing

parents’ activities and mentioning past relationships, are common practices in

both environments. However, the timing and the order of the strategies

seem to be different in online communication; for example, the deceiver in CMC

uses the strategies faster, and risk assessment in particular is more frequent than

in face-to-face communication.
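The chat-based activity features mentioned above for Parapar et al. (2012), such as the number of lines, the number of users in a conversation and the time between consecutive messages, can be sketched as follows. The conversation format (a time-ordered list of `(seconds, author, text)` tuples), the choice to measure gaps between consecutive lines of the same user, and the function name are assumptions for illustration; the original work uses eleven such features, of which only three are shown.

```python
from statistics import mean


def chat_activity_features(conversation, user):
    """Activity features of `user` in one conversation.

    `conversation` is a list of (seconds, author, text) tuples ordered by time.
    """
    user_times = [t for t, author, _ in conversation if author == user]
    # Gaps between consecutive lines written by the same user (an assumption).
    gaps = [later - earlier for earlier, later in zip(user_times, user_times[1:])]
    return {
        "lines_by_user": len(user_times),
        "users_in_conversation": len({author for _, author, _ in conversation}),
        "mean_gap_seconds": mean(gaps) if gaps else 0.0,
    }


# Invented toy conversation, for illustration only.
toy_conversation = [
    (0, "a", "hi"),
    (5, "b", "hello"),
    (20, "a", "how are you"),
    (30, "a", "?"),
]
activity_a = chat_activity_features(toy_conversation, "a")
```

Such per-user vectors can be concatenated with tf-idf weights, which is the combination reported above as giving the best results.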

The change of mood could be indicative of the level of emotional instability

of pedophiles. There is an interesting publicly available resource named SenticNet

(Cambria et al. 2016) which associates semantics and sentics to many common and

common-sense concepts for the analysis of concept-level sentiments. Another useful

resource to obtain information about the emotion contained in the words is WordNet

Affect (Strapparava and Valitutti 2004), an additional hierarchy of “affective domain

labels” as part of WordNet Domains. WordNet Affect was used for the identiﬁcation

of emotions in Bogdanova et al. (2014) such as positive and negative words related

to basic emotions such as joy, sadness, anger, disgust, surprise and fear. Content- and

style-based features, such as approach words (meet, car), family words (mum, dad),

relationship nouns (boyfriend, date), personal pronouns (I, you) and obligation verbs

(must, have to), were also considered in the same work. The results obtained with

an SVM classifier showed that high-level features achieve 97%

accuracy in discriminating cyberpedophile chats from cybersex chats, compared with

50–64% for low-level features.
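A minimal sketch of such high-level, lexicon-based features is given below. The lexicons contain only the example words quoted in the text, so they are far smaller than whatever lists were used in Bogdanova et al. (2014); normalizing counts by the number of tokens and the function name `high_level_features` are likewise assumptions.

```python
import re

# Lexicons limited to the examples quoted in the text (illustrative only).
FEATURE_LEXICONS = {
    "approach": {"meet", "car"},
    "family": {"mum", "dad"},
    "relationship": {"boyfriend", "date"},
    "pronoun": {"i", "you"},
    "obligation": {"must", "have to"},
}


def high_level_features(text):
    """Relative frequency of each lexicon in the text."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    total = max(len(tokens), 1)
    features = {}
    for name, lexicon in FEATURE_LEXICONS.items():
        hits = 0
        for entry in lexicon:
            if " " in entry:
                # Multi-word entries are matched as whole phrases.
                pattern = r"\b" + re.escape(entry) + r"\b"
                hits += len(re.findall(pattern, lowered))
            else:
                hits += tokens.count(entry)
        features[name] = hits / total
    return features


# Invented example line, for illustration only.
line_features = high_level_features("I have to meet you before your dad comes")
```

The resulting dictionary has one value per feature family and can be fed to any standard classifier in place of, or alongside, low-level n-gram features.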

Other works have used only the content of the chat conversations directly. In

Kucukyilmaz et al. (2008), Egan et al. (2011), and Barber and Bettez (2014)

the authors investigated the feasibility of predicting the author of a chat by the

extraction of the information contained in the text. In Kucukyilmaz et al. (2008)

the authors stated that chat messaging has evolved in order to transfer emotions. A

3 http://sentistrength.wlv.ac.uk/

clear example of it is the use of emoticons⁴ for representing feelings by typing only a

sequence of punctuation marks. Also, the repetition of some characters or the use

of uppercase letters in a word, are used to transfer emotions. In Egan et al. (2011)

the authors analyzed the written language in chats in order to identify recurrent

topics that cyberpedophiles usually use. In Barber and Bettez (2014) the authors

identified behavior patterns of online sexual predators for potential use in pattern

recognition. Their study concluded that characteristics such as fantasizing (cyber sexual

elements in text), sexuality assessment (to obtain information about the sexual skill

of the youth), domination (over the acts of the victim), enticement, and the intention

to have a face-to-face meeting, could be used to improve automated detection

software and educational tools.

Given the important challenge involved in the detection of predatory

communication, a shared task on sexual predator identification was organized at

PAN-2012 (Inches and Crestani 2012). The objective of the task was twofold:

identifying the predators among all the users in the different chat conversations and

identifying the most distinctive lines of the predators’ bad behaviour. The 16 teams

participating in the contest made it possible to recognize common patterns for

predator identification. The winning method used a two-step approach

(Villatoro-Tello et al. 2012) to distinguish predator conversations from normal chats.

The authors performed a preprocessing step to remove conversations with just

one participant, with fewer than 6 interventions per user, or containing text with 3 long

sequences of unrecognized characters. The best result, an F-measure of 0.87, was

obtained with a neural network classifier using BOW with a boolean weighting scheme. Only

two proposals considered characteristics that go beyond shallow lexical features.

In Vartapetiance and Gillam (2012) the authors discovered that sexual intentions

can be detected from the text, even when not stated explicitly, by considering activities that

the pedophile tries to share with the victim, such as watching TV, listening to music,

meeting and having fun. Also, some word combinations such as “go down

on you” and “make you come” are commonly used to express sexual

intentions. The latter was used as a feature in Vartapetiance and Gillam (2012) along

with the identification of words related to age (“you are young”, “wish you were”),

parents (“your mom”, “Ur dads car”) and address (“ur address”, URLs). Using these

four features the authors obtained an F-measure score of 0.47. In Morris and Hirst

(2012) the authors proposed behavioral features in addition to lexical ones, in order to model

the actions of a possible predator. Features such as the tendency to initiate a conversation,

the number of times the same question is asked, attempts to keep a conversation going,

response time, repeated messages and dominance of the conversation contributed

to an F-measure score of 0.72.
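The preprocessing filter and the boolean BOW weighting described above for the winning PAN-2012 approach can be sketched as follows. This is a reading of the description rather than the authors' code: the sketch assumes that a conversation is dropped when any participant has fewer than 6 interventions, it omits the filter on unrecognized character sequences, and the names `keep_conversation` and `boolean_bow` are hypothetical.

```python
from collections import Counter

MIN_INTERVENTIONS = 6  # threshold quoted in the text


def keep_conversation(messages):
    """Return False for conversations that the preprocessing step would drop.

    `messages` is a list of (author, text) pairs.
    """
    per_user = Counter(author for author, _ in messages)
    if len(per_user) < 2:
        return False  # just one participant
    # Assumption: every participant needs at least MIN_INTERVENTIONS lines.
    return all(count >= MIN_INTERVENTIONS for count in per_user.values())


def boolean_bow(text, vocabulary):
    """Boolean bag of words: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(text.lower().split())
    return [1 if word in present else 0 for word in vocabulary]


# Invented toy conversations, for illustration only.
monologue = [("a", "hello")] * 8
balanced = [("a", "hi"), ("b", "hey")] * 6
short_reply = [("a", "hi")] * 6 + [("b", "ok")] * 2

vector = boolean_bow("hi hi there", ["hi", "you", "there"])
```

With the boolean scheme, repeated words do not increase the weight, which is why "hi hi there" and "hi there" map to the same vector.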

4 emot(ion) + icon. “A sideways facial glyph used in e-mail to indicate an emotion or attitude, as to

indicate intended humor” (Pickett 2000).

8.3 Lies in Opinions: Deceptive Opinions Detection

With the increasing availability of review sites and blogs, consumers rely more than

ever on online reviews to make their purchase decisions. A recent survey⁵ found

that positive online reviews reinforced the decision of 68% of consumers to purchase a

product or service, and that 92% of consumers read online reviews to judge a

local business or a product. Therefore, detecting lies in opinions is a very important

problem as well as challenging since opinions expressed on the Web are typically

short texts, written by unknown people using different styles and for different

purposes.

The detection of opinion spam, i.e., the identification of fake reviews that

deliberately try to mislead human readers, is just another facet of the problem of

the detection of lies on the Web (Lau et al. 2012). Nevertheless, the construction

of automatic detection methods for this task is complex since manually gath-

ering labeled reviews, and particularly truthful opinions, is difﬁcult (Mukherjee

et al. 2011). Due to the lack of reliable labeled data, most initial works on the

detection of opinion spam considered unsupervised approaches which relied on

meta-information from reviews and reviewers. For example, in Jindal and Liu (2008)

the authors proposed detecting opinion spam by identifying duplicate content. This

method showed good precision using a logistic regression classifier on a dataset of

Amazon reviews, but it failed to detect original (non-duplicated) fake reviews. In a subsequent

paper (Jindal et al. 2010), the authors proposed to detect spammers by searching

for unusual review patterns. They classiﬁed a reviewer as spam suspect if the

person wrote negative reviews about all the products of a brand but wrote positive

reviews about a competing brand. The duplication of content was also considered in

Lin et al. (2014), in which several features based on similarities were presented.

The authors measured the similarity of a review to other reviews by the

same author and to other reviews about the same product, the review frequency of the

product, and the comment frequency. Those features were then compared against a threshold

to determine whether a review is spam. Also based on a similarity score, a

probabilistic language model was used to detect similar content between two reviews (Lai et al.

2010). The authors tested the model with an SVM classifier and obtained a precision

of 81% in detecting spam reviews. A lower precision value of 43.6% was obtained

with an analogous approach that used the conventional cosine function to

measure conceptual features (Algur et al. 2010). Similarly, in Wu et al. (2010) the

authors presented a method to detect hotels which are more likely to be involved in

spamming. They proposed a number of criteria that might be indicative of suspicious

reviews and then, they evaluated alternative methods for integrating these criteria to

produce a suspiciousness ranking. Their criteria mainly derive from characteristics

of the network of reviewers and also from the impact and ratings of reviews. It is

worth mentioning that they did not take advantage of reviews’ content for their

analysis. In the same category of unsupervised approaches, in Mukherjee et al.

5 Local Consumer Review Survey 2015 (visited: January 3, 2016): https://www.brightlocal.com/learn/local-consumer-review-survey/

(2011) the authors proposed a method for detecting groups of opinion spammers

based on criteria such as the number of products on which the group works together

and a high content similarity of their reviews. Finally, Xie et al. (2012)

demonstrated that a high correlation between an increase in the volume of singleton

reviews and a sharp increase or decrease in the ratings is a clear signal that the rating

is being manipulated by possible spam reviews. Based on this observation, the authors

proposed an opinion spam detection method based on temporal pattern discovery.
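Several of the approaches above rest on measuring content similarity between reviews and comparing it against a threshold. The sketch below illustrates the idea with the conventional cosine function over raw term-frequency vectors. It does not reproduce any of the cited systems: the threshold of 0.8, the tokenization and the function names are assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Term-frequency vector of a review."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def near_duplicates(reviews, threshold=0.8):
    """Index pairs of reviews whose similarity reaches the (assumed) threshold."""
    vectors = [tf_vector(r) for r in reviews]
    return [
        (i, j)
        for i, j in combinations(range(len(reviews)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


# Invented toy reviews, for illustration only.
toy_reviews = [
    "Great hotel, friendly staff and clean rooms",
    "Great hotel friendly staff and very clean rooms",
    "The battery died after two days",
]
suspicious_pairs = near_duplicates(toy_reviews)
```

As noted above for duplicate-based detection in general, a filter of this kind says nothing about original fake reviews that do not copy existing content.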

It was only after the release of the gold-standard datasets (Ott et al. 2011, 2013),

which contain examples of positive and negative deceptive opinion spam, that it

was possible to conduct supervised learning and a reliable evaluation of the task.⁶

In Ott et al. (2011) the authors employed a SVM classiﬁer to distinguish between

positive deceptive and truthful reviews using different stylistic, syntactic and lexical

features. Then, in Ott et al. (2013) they applied the same approach to classify

negative opinions. The main conclusion from these works is that standard text

categorization techniques using unigram and bigram word features are effective at

detecting deception in text, and that their results signiﬁcantly outperform those from

human judges. Following this research direction, in Feng et al. (2012a,b) the authors

extended Ott et al.’s n-gram feature set by incorporating deep syntax features, i.e.,

syntactic production rules derived from probabilistic context free grammar parse

trees. Their experimental results consistently ﬁnd statistical evidence that deep

syntactic patterns are helpful in discriminating deceptive writing. Similarly, in Feng

and Hirst (2013) the authors extended the previous work of Ott et al. and Feng et al.

by incorporating features that characterize the degree of compatibility between the

personal experience described in a test review and a product proﬁle derived from

a collection of reference reviews about the same product. This idea was supported

by the hypothesis that since the writer of a deceptive review usually does not have

any actual experience with that product, the resulting review might contain some

contradictions with facts about the product. This approach signiﬁcantly improved

the performance of identifying deceptive reviews.

Although supervised text classification techniques have proven to be very

robust if they are trained using large sets of labeled instances from both deceptive

and truthful opinions (some works have reported F1 measures around 0.90; Ott

et al. 2011, 2013; Feng and Hirst 2013), in real application scenarios it is very

difficult to compile such large training sets, and it may be almost impossible

to determine the authenticity of the opinions, i.e., to assemble a set of verified

truthful reviews (Mukherjee et al. 2011). To overcome this restriction, PU-learning

(Liu et al. 2002) has been applied to detect deceptive opinion spam by learning only

from a few examples of deceptive opinions and a set of unlabeled data, under

the consideration that deceptive opinion spam can be accurately generated via

crowdsourcing as suggested in Ott et al. (2011).

6 http://myleott.com/op_spam

In Li et al. (2014) the authors present a study on Chinese fake review detection.

First they considered two classes of reviews: fake and unknown. However, since the

unknown data set may contain many fake reviews, it was treated as an unlabeled

set. Therefore, the PU-learning model was employed in order to learn from positive

and unlabeled examples. Experimental results showed that PU learning not only

outperforms supervised learning signiﬁcantly, but also detects a large number of

potentially fake reviews hidden among the unlabeled examples.
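The idea of learning from positive and unlabeled examples can be illustrated with a minimal sketch. It is not the procedure of Li et al. (2014) or Liu et al. (2002): it assumes one simple iterative scheme in which the unlabeled reviews are provisionally treated as negatives, a classifier is trained, and every unlabeled review that the classifier labels as deceptive is pruned before retraining. The small Naïve Bayes classifier, the function name pu_learning and the toy reviews are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(doc):
                score += math.log((self.counts[c][tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


def pu_learning(positives, unlabeled, max_iter=10):
    """Assumed scheme: treat unlabeled reviews as negatives, then prune and retrain."""
    positives, negatives = list(positives), list(unlabeled)
    model = None
    for _ in range(max_iter):
        docs = positives + negatives
        labels = ["deceptive"] * len(positives) + ["unknown"] * len(negatives)
        model = NaiveBayes().fit(docs, labels)
        # Keep only the unlabeled reviews that still look unlike the positives.
        kept = [d for d in negatives if model.predict(d) == "unknown"]
        if len(kept) == len(negatives) or not kept:
            break  # converged, or pruning would empty the negative set
        negatives = kept
    return model, negatives


# Toy data (hypothetical): two known deceptive reviews and four unlabeled ones.
positives = ["amazing amazing luxury stay", "amazing luxury hotel experience"]
unlabeled = ["amazing luxury stay experience", "room was small",
             "bed was cold", "small room cold bed"]
model, reliable_negatives = pu_learning(positives, unlabeled)
```

In this toy run the unlabeled review that resembles the known deceptive ones is pruned in the first iteration, which mirrors the observation that fake reviews may be hidden among the unlabeled examples.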

In Hernández Fusilier et al. (2015) the authors proposed a PU-learning variant

for detecting opinion spam. The evaluation of the proposed method was carried out

using the set of hotel reviews gathered in Ott et al. (2013) containing positive and

negative deceptive opinion spam. The results are encouraging: on the one hand,

they indicate that using only a hundred examples of deceptive opinions for

training it is possible to reach F1 measures of 0.8 and 0.7 for positive and negative

opinions, respectively. On the other hand, they demonstrate the appropriateness

of the proposed PU-learning variant for detecting opinion spam, since its results

signiﬁcantly outperformed those from the original PU-learning approach in both

kinds of opinion spam. Moreover, the authors analysed the role of opinions’ polarity

in the detection of deception. Their results conﬁrm that negative deceptive opinions

are more difﬁcult to detect than positive ones, but they also show that having

one single classiﬁer for analysing both kinds of opinions is better than using two

separate classiﬁers, suggesting that there are common characteristics in the way

people write positive and negative opinion spam. In Ren et al. (2014) the

authors proposed a semi-supervised model. Firstly, some reliable negative examples

were identified from the unlabeled dataset. Secondly, some representative positive

examples and negative examples were generated with Latent Dirichlet Allocation.

Thirdly, an SVM classifier was fed with the remaining unlabeled examples and

their similarity weights. Experiments on Ott’s gold-standard dataset yielded

accuracy values above 80%. Better results on detecting

whether a review is spam or not were obtained with the framework presented in

Sharma and Lin (2013). Criteria such as rating consistency, questions, capital letters,

comparative sentences and links were used to calculate a rating of a review.

Based on that rating, the framework could determine whether a review is spam

or not with high accuracy.

In Hernández Fusilier et al. (2015) the detection of opinion spam was considered

as a stylistic classiﬁcation task. That is, although deceptive and truthful opinions

given a particular domain are similar in content, they differ in the way opinions

are written. The authors proposed using character n-grams as features, since

they have been shown to capture lexical content as well as stylistic information. They

evaluated their approach on the de facto standard Ott corpus, composed of 1600

hotel reviews, considering positive and negative reviews. They compared the results

obtained with character n-grams against the ones with word n-grams. The results

obtained show that character n-grams are good features for the detection of opinion

spam; they seem to capture the content of deceptive opinions and the writing style

of the deceiver better than word n-grams do. In particular, the results

show an improvement of 2.3% and 2.1% over the word-based representations in

the detection of positive and negative deceptive opinions, respectively. Furthermore,

character n-grams also achieve good performance with a very small training

corpus. Using only 25% of the training set, a Naïve Bayes classifier reached F1

values up to 0.80 for both opinion polarities. A similar study was presented in

Cagnina and Rosso (2015), in which the authors studied the performance of Naïve

Bayes and SVM classifiers using character n-grams in tokens, the sentiment score,

and LIWC linguistic features such as pronouns, articles and verbs (present, past and

future tenses). The Ott corpus cited previously was used to test the proposed

features, and the results were compared with those obtained with state-of-the-art

methods. From the experimental study the authors concluded that character n-grams

in tokens correctly capture the content and the writing style of the reviews, that the

sentiment-based feature does not provide useful information for detecting deception

in this kind of text, and that LIWC variables such as pronouns, articles and verbs are

meaningful. In fact, character 4-grams in tokens combined with LIWC variables

performed best with an SVM classifier, reaching an F-measure of 0.87. Regarding

the comparison with the results of Hernández Fusilier et al. (2015), the statistical

signiﬁcance test showed that both approaches performed similarly although the

proposal in Cagnina and Rosso (2015) used a lower dimensionality representation

(95% reduction of features) compared with the one presented in Hernández Fusilier

et al. (2015).
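The two character-level representations discussed above can be sketched as follows. This is only an illustration under stated assumptions: "character n-grams in tokens" is taken here to mean n-grams that never cross a whitespace boundary, tokens shorter than n are kept whole, and the function names are hypothetical rather than taken from the cited papers.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """All character n-grams of the raw text, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_ngrams_in_tokens(text, n=4):
    """Character n-grams taken inside each whitespace-delimited token only."""
    grams = []
    for token in text.split():
        if len(token) < n:
            grams.append(token)  # assumption: short tokens are kept whole
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def ngram_profile(text, n=4, in_tokens=True):
    """Bag of character n-grams (lower-cased), usable as classifier features."""
    extract = char_ngrams_in_tokens if in_tokens else char_ngrams
    return Counter(extract(text.lower(), n))
```

Restricting n-grams to tokens yields fewer distinct features than sliding over the raw text, which is in line with the lower-dimensional representation reported for Cagnina and Rosso (2015), although the exact feature set used there may differ.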

8.4 Conclusions

From the point of view of psychological, linguistic and computational processes,

deception detection presents constant challenges to be addressed. In this

work different approaches to automatically detect deception have been described,

although we have focused mainly on those that consider emotional and cognitive

aspects of the problem. Verbal deception detection has also been addressed in the

communications of online sexual predators. Special attention has been given to the problem

of the detection of deceptive opinions.

Acknowledgements This work is the result of the collaboration in the framework of the WIQ-EI

IRSES research project (grant no. 269180) within the EC FP7 Marie Curie. The work of the

first author was also in the framework of the SomEMBED TIN2015-71147-C2-1-P MINECO

research project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent

Systems. The first author would also like to thank the co-authors of the papers on deceptive

opinions detection. The research work of the second author has been partially funded by CONICET

(Argentina).

References

Algur, S.P., A.P. Patil, P.S. Hiremath, and S. Shivashankar. 2010. Conceptual level similarity

measure based review spam detection. In 2010 International Conference on Signal and Image

Processing (ICSIP), 416–423.

Almela, A., R. Valencia-García, and P. Cantos. 2013. Seeing through deception: A computational

approach to deceit detection in Spanish written communication. Linguistic Evidence in

Security, Law and Intelligence 1(1): 3–12.

Barber, C.S., and S.C. Bettez. 2014. Deconstructing the online grooming of youth: Toward

improved information systems for detection of online sexual predators. In Proceedings of the

International Conference on Information Systems (ICIS 2014).

Black, P.J., M. Wollis, M. Woodworth, and J.T. Hancock. 2015. A linguistic analysis of grooming

strategies of online child sex offenders: Implications for our understanding of predatory sexual

behavior in an increasingly computer-mediated world. Child Abuse & Neglect 44: 140–149.

Bogdanova, D., P. Rosso, and T. Solorio. 2014. Exploring high-level features for detecting

cyberpedophilia. Computer Speech and Language 28(1): 108–120.

Buller, D.B., and J.K. Burgoon. 1996. Interpersonal deception theory. Communication Theory 6(3):

203–242.

Burgoon, J.K., J.P. Blair, T. Qin, and J.F. Nunamaker Jr. 2003. Detecting deception through

linguistic analysis. In Intelligence and security informatics, ed. H. Chen, R. Miranda, D.D.

Zeng, C. Demchak, J. Schroeder, and T. Madhusudan, Lecture notes in computer science,

vol. 2665, 91–101. Berlin/Heidelberg: Springer.

Cagnina, L., and P. Rosso. 2015. Classification of deceptive opinions using a low dimensionality

representation. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity,

Sentiment and Social Media Analysis, 58–66. Lisboa: Association for Computational

Linguistics.

Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for

sentiment analysis based on conceptual primitives. In Proceedings of COLING 2016, the 26th

International Conference on Computational Linguistics: Technical Papers, 2666–2677, Osaka,

Japan.

Cano, A.E., M. Fernandez, and H. Alani. 2014. Detecting child grooming behaviour patterns on

social media. In Social informatics, ed. L. Aiello and D. McFarland, Lecture notes in computer

science, vol. 8851, 412–427. Springer.

Cohn, J.F., and K.L. Schmidt. 2004. The timing of facial motion in posed and spontaneous smiles.

International Journal of Wavelets, Multiresolution and Information Processing 2(2): 121–132.

DePaulo, B.M., J.J. Lindsay, B.E. Malone, L. Muhlenbruck, K. Charlton, and H. Cooper. 2003.

Cues to deception. Psychological Bulletin 129(1): 74–118.

Egan, V., J. Hoskinson, and D. Shewan. 2011. Perverted justice: A content analysis of the language

used by offenders detected attempting to solicit children for sex. In Antisocial behavior:

Causes, correlations and treatments, ed. R.M. Clarke, 119–134. New York: Nova Science

Publishers.

Ekman, P. 2001. Telling lies: Clues to deceit in the marketplace, politics, and marriage. New York:

W. W. Norton & Company.

Feng, S., R. Banerjee, and Y. Choi. 2012a. Syntactic stylometry for deception detection. In

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics

(ACL’12), 171–175. The Association for Computer Linguistics.

Feng, S., L. Xing, A. Gogar, and Y. Choi. 2012b. Distributional footprints of deceptive product

reviews. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social

Media, ed. J.G. Breslin, N.B. Ellison, J.G. Shanahan, and Z. Tufekci, 98–105. The AAAI Press.

Feng, V.W., and G. Hirst. 2013. Detecting deceptive opinions with proﬁle compatibility. In

Proceedings of the Sixth International Joint Conference on Natural Language Processing,

338–346. Asian Federation of Natural Language Processing.

Fitzpatrick, E., J. Bachenko, and T. Fornaciari. 2015. Automatic detection of verbal deception,

Synthesis lectures on human language technologies. Morgan & Claypool Publishers.

Fornaciari, T., F. Celli, and M. Poesio. 2013. The effect of personality type on deceptive

communication style. In European Intelligence and Security Informatics Conference, 1–6.

IEEE.

Fornaciari, T., and M. Poesio. 2012. On the use of homogenous sets of subjects in deceptive

language analysis. In Proceedings of the Workshop on Computational Approaches to Deception

Detection (EACL 2012), 39–47. Association for Computational Linguistics.

Forner, P., J. Karlgren, and C. Womser-Hacker, ed. 2012. CLEF 2012 Evaluation Labs and

Workshop, Online Working Notes, Rome.

Frank, M.G., and P. Ekman. 1997. The ability to detect deceit generalizes across different types of

high-stake lies. Journal of Personality and Social Psychology 72(6): 1429–1439.

Frank, M.G., N. Paolantonio, T.H. Feeley, and T.J. Servoss. 2004. Individual and small group

accuracy in judging truthful and deceptive communication. Group Decision and Negotiation

13(1): 45–59.

Guo, R.M. 2008. Stranger danger and the online social network. Berkeley Technology Law Journal

23(1): 617–644.

Gupta, A., P. Kumaraguru, and A. Sureka. 2012. Characterizing pedophile conversations on the

internet using online grooming. arXiv:1208.4324v1.

Hall, R.C.W., and R.C.W. Hall. 2007. A proﬁle of pedophilia: Deﬁnition, characteristics of

offenders, recidivism, treatment outcomes, and forensic issues. In Mayo Clinic Proceedings,

457–471. Elsevier.

Hancock, J.T., L.E. Curry, S. Goorha, and M. Woodworth. 2008. On lying and being lied to: A

linguistic analysis of deception in computer-mediated communication. Discourse Processes

45(1): 1–23.

Hancock, J.T., J. Thom-Santelli, and T. Ritchie. 2004. Deception and design: The impact of

communication technology on lying behavior. In Proceedings of the SIGCHI Conference on

Human Factors in Computing Systems, 129–134. ACM.

Hauch, V., I. Blandon-Gitlin, J. Masip, and S.L. Sporer. 2014. Are computers effective lie

detectors? A meta-analysis of linguistic cues to deception. Personality and Social Psychology

Review 41: 1–36.

Hernández Fusilier, D., M. Montes y Gómez, P. Rosso, and R. Guzmán Cabrera. 2015. Detecting

positive and negative deceptive opinions using pu-learning. Information Processing and

Management 51(4): 433–443.

Hernández Fusilier, D., M. Montes y Gómez, P. Rosso, and R. Guzmán Cabrera. 2015. Detection

of opinion spam with character n-grams. In 16th International Conference on Intelligent Text

Processing and Computational Linguistics, Lecture Notes in Computer Science, vol. 9042,

285–294. Springer.

Inches, G., and F. Crestani. 2012. Overview of the international sexual predator identiﬁcation

competition at pan-2012. In Forner et al. (2012).

Jindal, N., and B. Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008 International

Conference on Web Search and Data Mining (WSDM ’08), 219–230. ACM.

Jindal, N., B. Liu, and E. Lim. 2010. Finding unusual review patterns using unexpected rules.

In Proceedings of the 19th ACM International Conference on Information and Knowledge

Management, ed. J. Huang, N. Koudas, G.J. F. Jones, X. Wu, K. Collins-Thompson, and A. An,

1549–1552. ACM.

Kucukyilmaz, T., B.B. Cambazoglu, C. Aykanat, and F. Can. 2008. Chat mining: Predicting user

and message attributes in computer-mediated communication. Information Processing and

Management 44(4): 1448–1466.

Lai, C.L., K.Q. Xu, R.Y.K. Lau, Y. Li, and L. Jing. 2010. Toward a language modeling approach

for consumer review spam detection. In 2010 IEEE 7th International Conference on e-Business

Engineering (ICEBE), 1–8.

Lau, R.Y.K., S.Y. Liao, R.C. Kwok, K. Xu, Y. Xia, and Y. Li. 2012. Text mining and probabilistic

language modeling for online review spam detection. ACM Transactions on Management

Information Systems 2(4): 1–25.

Li, H., B. Liu, A. Mukherjee, and J. Shao. 2014. Spotting fake reviews using positive-unlabeled

learning. Computación y Sistemas 18(3): 467–475.

Lin, Y., T. Zhu, H. Wu, J. Zhang, X. Wang, and A. Zhou. 2014. Towards online anti-opinion spam:

Spotting fake reviews from the review sequence. In 2014 IEEE/ACM International Conference

on Advances in Social Networks Analysis and Mining (ASONAM), 261–264.

Littlewort, G.C., M.S. Bartlett, and K. Lee. 2007. Faces of pain: Automated measurement

of spontaneous facial expressions of genuine and posed pain. In Proceedings of the 9th

International Conference on Multimodal Interfaces (ICMI ’07), 15–21. ACM.

Liu, B., W.S. Lee, P.S. Yu, and X. Li. 2002. Partially supervised classiﬁcation of text documents.

In Proceedings of the Nineteenth International Conference on Machine Learning (ICML ’02),

387–394. Morgan Kaufmann Publishers Inc.

Mihalcea, R., V. Pérez-Rosas, and M. Burzo. 2013. Automatic detection of deceit in verbal

communication. In Proceedings of the 15th ACM on International Conference on Multimodal

Interaction (ICMI ’13), 131–134. ACM.

Mihalcea, R., and C. Strapparava. 2009. The lie detector: Explorations in the automatic recognition

of deceptive language. In Proceedings of the Association for Computational Linguistics

(ACL-IJCNLP 2009) Conference Short Papers (ACLShort ’09), 309–312. Association for

Computational Linguistics.

Morris, C., and G. Hirst. 2012. Identifying sexual predators by SVM classiﬁcation with lexical and

behavioral features. In Forner et al. (2012).

Mukherjee, A., B. Liu, J. Wang, N. Glance, and N. Jindal. 2011. Detecting group review spam. In

Proceedings of the 20th International Conference Companion on World Wide Web (WWW ’11),

93–94. ACM.

Newman, M.L., J.W. Pennebaker, D.S. Berry, and J.M. Richards. 2003. Lying words: Predicting

deception from linguistic styles. Personality and Social Psychology Bulletin 29(5): 665–675.

O’Connell, R. 2003. A typology of child cybersexploitation and online grooming practices,

http://netsafe.org.nz/Doc_Library/racheloconnell1.pdf.

Ott, M., C. Cardie, and J.T. Hancock. 2013. Negative deceptive opinion spam. In Proceedings

of the 2013 Conference of the North American Chapter of the Association for Computational

Linguistics: Human Language Technologies (NAACL-HLT 2013), 497–501. The Association

for Computational Linguistics.

Ott, M., Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding deceptive opinion spam by any

stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for

Computational Linguistics: Human Language Technologies, vol. 1, 309–319.

Parapar, J., D. Losada, and A. Barreiro. 2012. A learning-based approach for the identiﬁcation of

sexual predators in chat logs. In Forner et al. (2012).

Parapar, J., D.E. Losada, and A. Barreiro. 2014. Combining psycho-linguistic, content-based and

chat-based features to detect predation in chatrooms. Journal of Universal Computer Science

20(2): 213–239.

Pickett, J.P. 2000. The American Heritage Dictionary of the English Language. Number 2 in The

American Heritage Dictionary of the English Language. Houghton Mifﬂin.

Poria, S., E. Cambria, A.F. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data ﬂow analysis

by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):

26–36.

Rahman Miah, M.W., J. Yearwood, and S. Kulkarni. 2011. Detection of child exploiting chats

from a mixed chat dataset as text classiﬁcation task. In Proceedings of the Australian Language

Technology Association Workshop, 157–165. ALTA.

Ren, Y., D. Ji, and H. Zhang. 2014. Positive unlabeled learning for deceptive reviews detection. In

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing

(EMNLP), 488–498. Association for Computational Linguistics.

Sharma, K., and K-I. Lin. 2013. Review spam detector with rating consistency check. In

Proceedings of the 51st ACM Southeast Conference (ACMSE ’13), 34:1–34:6. ACM.

Smith, M.E., J.T. Hancock, L. Reynolds, and J.P. Birnholtz. 2014. Everyday deception or a few

proliﬁc liars? The prevalence of lies in text messaging. Computers in Human Behavior 41:

220–227.

Strapparava, C., and A. Valitutti. 2004. WordNet-affect: An affective extension of WordNet. In

Proceedings of the 4th International Conference on Language Resources and Evaluation,

1083–1086. ELRA.

Tausczik, Y.R., and J.W. Pennebaker. 2010. The psychological meaning of words: LIWC and

computerized text analysis methods. Journal of Language and Social Psychology 29(1): 24–54.

Tsiamyrtzis, P., J. Dowdall, D. Shastri, I.T. Pavlidis, M.G. Frank, and P. Ekman. 2006. Imaging

facial physiology for the detection of deceit. International Journal of Computer Vision 71(2):

197–214.

Valstar, M.F., H. Gunes, and M. Pantic. 2007. How to distinguish posed from spontaneous smiles

using geometric features. In Proceedings of the 9th International Conference on Multimodal

Interfaces (ICMI ’07), 38–45. ACM.

Valstar, M.F., M. Pantic, Z. Ambadar, and J.F. Cohn. 2006. Spontaneous vs. posed facial behavior:

Automatic analysis of brow actions. In Proceedings of the 8th International Conference on

Multimodal Interfaces (ICMI ’06), 162–170. ACM.

Vartapetiance, A., and L. Gillam. 2012. Quite simple approaches for authorship attribution,

intrinsic plagiarism detection and sexual predator identiﬁcation. In Forner et al. (2012).

Villatoro-Tello, E., A. Juárez-González, H.J. Escalante, M. Montes y Gómez, and L. Villaseñor

Pineda. 2012. A two-step approach for effective detection of misbehaving users in chats. In

Forner et al. (2012).

Vrij, A., ed. 2008. Detecting lies and deceit: Pitfalls and opportunities, 2nd ed. Chichester: Wiley.

Warkentin, D., M. Woodworth, J.T. Hancock, and N. Cormier. 2010. Warrants and deception in

computer mediated communication. In Proceedings of the 2010 ACM Conference on Computer

Supported Cooperative Work (CSCW), ed. K.I. Quinn, C. Gutwin, and J.C. Tang, 9–12. ACM.

Wolz, U., J. Palme, P. Anderson, Z. Chen, J. Dunne, G. Karlsson, A. Laribi, S. Männikkö,

R. Spielvogel, and H. Walker. 1997. Computer-mediated communication in collaborative

educational settings. In The Supplemental Proceedings of the Conference on Integrating

Technology into Computer Science Education: Working Group Reports and Supplemental

Proceedings (ITiCSE-WGR ’97), 51–69. ACM.

World Health Organization. 2012. International Statistical Classiﬁcation of Diseases and Related

Health Problems (ICD-10). Canadian Institute for Health Information.

Wu, G., D. Greene, and P. Cunningham. 2010. Merging multiple criteria to identify suspicious

reviews. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys

’10), 241–244. ACM.

Xie, S., G. Wang, S. Lin, and P.S. Yu. 2012. Review spam detection via temporal pattern discovery.

In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery

and Data Mining (KDD ’12), 823–831. ACM.

Zhang, Z., V. Singh, T.E. Slowe, S. Tulyakov, and V. Govindaraju. 2007. Real-time automatic

deceit detection from involuntary facial expressions. In IEEE Conference on Computer Vision

and Pattern Recognition, 2007, 1–6.

Zhu, Z., P. Tsiamyrtzis, and I. Pavlidis. 2007. Forehead thermal signature extraction in lie detection.

In 29th Annual International Conference of the IEEE Engineering in Medicine and Biology

Society, 2007, 243–246.

Zuckerman, M., B.M. DePaulo, and R. Rosenthal. 1981. Verbal and nonverbal communication of

deception. Advances in Experimental Social Psychology 14: 1–59.

Chapter 9

Concept-Level Sentiment Analysis with SenticNet

Federica Bisio, Claudia Meda, Paolo Gastaldo, Rodolfo Zunino,

and Erik Cambria

Abstract SenticNet is a publicly available resource for opinion mining that

exploits AI, linguistics, and psychology to infer the polarity associated with

commonsense concepts and encode this in a semantic-aware representation. In

particular, SenticNet uses dimensionality reduction to calculate the affective valence

of multi-word expressions and, hence, represent it in a machine-accessible and

machine-processable format. This chapter presents an overview of the most recent

sentic computing tools and techniques, with particular focus on applications in the

context of big social data analysis.

Keywords SenticNet • Sentic computing • Concept-level sentiment analysis

• Big social data analysis

9.1 Introduction

Sentic computing (Cambria and Hussain 2015) is a multi-disciplinary approach

to sentiment analysis that exploits both computer and social sciences to better

recognize, interpret, and process opinions and sentiments over the Web. The

approach speciﬁcally brings together lessons from both affective computing and

commonsense computing because, in the field of opinion mining, not only commonsense

knowledge, but also emotional knowledge is important to grasp both the

cognitive and affective information (termed semantics and sentics) associated with

natural language opinions and sentiments.

F. Bisio () • C. Meda • P. Gastaldo • R. Zunino

DITEN, University of Genoa, Via Opera Pia 11A, Genova, I-16145, Italy

e-mail: federica.bisio@edu.unige.it;claudia.meda@edu.unige.it;paolo.gastaldo@unige.it;

rodolfo.zunino@unige.it

E. Cambria

School of Computer Science and Engineering, Nanyang Technological University, 639798,

Singapore, Singapore

e-mail: cambria@ntu.edu.sg

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_9

174 F. Bisio et al.

During most of the last century, research on emotions was conducted by

philosophers and psychologists, whose work was based on a small set of emotion

theories that continue to underpin research in this area. The ﬁrst researchers to

try linking text to emotions were actually social psychologists and anthropologists

who tried to ﬁnd similarities on how people from different cultures communicate

(Osgood et al. 1975). This research was also triggered by a dissatisfaction with the

dominant cognitive view centered around humans as ‘information processors’ (Lutz

and White 1986).

Later on, in the 1980s, researchers such as Turkle (1984) began to speculate about

how computers might be used to study emotions. Systematic research programs

along this front began to emerge in the early 1990s. For example, Scherer (1993)

implemented a computational model of emotion as an expert system. A few years

later, Picard’s landmark book Affective Computing (Vesterinen 2001) prompted a

wave of interest among computer scientists and engineers looking for ways to

improve human-computer interfaces by coordinating emotions and cognition with

task constraints and demands. Picard described three types of affective computing

applications:

• Systems that detect the emotions of the user;

• Systems that express what a human would perceive as an emotion;

• Systems that actually ‘feel’ an emotion.

Although touching upon HCI and affective modeling, sentic computing primarily

focuses on affect detection from text. Affect detection is critical because an affect-sensitive

interface can never respond to users’ affective states if it cannot sense their

affective states. Affect detection need not be perfect, but must be approximately on

target. Affect detection is, however, a very challenging problem because emotions

are constructs (i.e., conceptual quantities that cannot be directly measured) with

fuzzy boundaries and with substantial individual difference variations in expression

and experience. To overcome such a hurdle, sentic computing builds upon a

biologically inspired and psychologically-motivated affective categorization model

(Cambria et al. 2012) that can potentially describe the full range of emotional

experiences in terms of four independent but concomitant dimensions, whose

different levels of activation make up the total emotional state of the mind.

In sentic computing, a term that derives from the Latin “sentire” (root of

words such as sentiment and sentience) and “sensus” (intended both as capability of

feeling and as commonsense), the analysis of natural language is based on affective

ontologies and commonsense reasoning tools, which enable the analysis of text not

only at document, page, or paragraph level, but also at sentence and clause level. In

particular, sentic computing involves the use of AI and Semantic Web techniques, for

knowledge representation and inference; mathematics, for carrying out tasks such as

graph mining and multi-dimensionality reduction; linguistics, for discourse analysis

and pragmatics; psychology, for cognitive and affective modeling; sociology, for

understanding social network dynamics and social influence; and, finally, ethics, for

understanding related issues about the nature of mind and the creation of emotional

machines.

Sentic computing tackles the crucial issue of analyzing sentiments and feelings

by exploiting affective commonsense reasoning, i.e., the intrinsically human

capacity to interpret the cognitive and affective information associated with natural

language. In particular, sentic computing leverages a commonsense knowledge

base built through crowdsourcing (Cambria et al. 2012). Commonsense is useful in

many different computer-science applications including data visualization (Cambria

et al. 2010), text recognition (Wang et al. 2013), and human-computer interaction

(Poria et al. 2016). In this context, commonsense is used to bridge the semantic gap

between word-level natural language data and the concept-level opinions conveyed

by these (Cambria et al. 2015).

To perform affective commonsense reasoning (Bisio et al. 2015), a knowledge

database is required for storing and extracting the semantic and affective information

associated with words and multi-word expressions. By applying semantic

multidimensional scaling (Cambria et al. 2015) to the matrix representation of this

knowledge base, we obtain SenticNet (Cambria et al. 2016), an RDF/XML repository

of natural language concepts speciﬁcally designed for sentiment analysis.

This chapter presents an overview of the most recent and advanced technologies

of sentic computing, with particular focus on the applications related to the

SenticNet framework. The main result consists in a review of the most interesting

methods employed to compare, classify and visualize affective information. The

chapter is organized as follows: Sect. 9.2 provides a description of SenticNet and

sentic computing techniques; Sect. 9.3 describes several applications which employ

sentic computing and the SenticNet framework; finally, Sect. 9.4 presents conclusions

and ﬁnal remarks.

9.2 SenticNet

SenticNet is a publicly available resource for sentiment analysis that provides the

semantics and sentics associated with 30,000 natural language concepts by leveraging

an ensemble of graph mining and multi-dimensional scaling techniques

(Fig. 9.1).

The latest release, SenticNet 4 (Cambria et al. 2016), exploits ‘energy flows’ to

connect different parts of both common and commonsense knowledge representations

to one another, unlike standard graph-mining and dimensionality-reduction

techniques. SenticNet 4, therefore, models semantics and sentics, that is, the

conceptual and affective information associated with multi-word natural language

expressions. To this end, SenticNet 4 employs an energy-based knowledge representation

to provide the semantics and sentics associated with 30,000 concepts, thus

enabling a ﬁne-grained analysis of natural language opinions. SenticNet 4 contains

both unambiguous adjectives as standalone entries (like ‘good’ and ‘awful’) and

non-trivial multi-word expressions such as ‘small room’ and ‘cold bed’. This is due

to the fact that while unambiguous adjectives convey positive or negative polarities

(whatever noun they are associated with), other adjectives are able to carry a specific

polarity only when coupled with certain nouns.

Fig. 9.1 SenticNet construction framework

SenticNet 4 focuses on the use of ‘energy’ or information ﬂows to connect

various parts of common and commonsense knowledge representations to one

another. Each quantum of energy possesses a scalar magnitude, a valence (binary

positive/negative), and an edge history, deﬁned as a list of the edge labels that a

particular quantum of energy has traversed in the past. Essentially, common and

commonsense knowledge is broken down into ‘atoms’, thus allowing the fusing of

data from different knowledge bases without requiring any ontology alignment.
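The three properties of a quantum of energy listed above (scalar magnitude, binary valence, edge history) map naturally onto a small record type. The following is a minimal sketch, not SenticNet's actual implementation: the class name, the traverse method, the decay parameter and the edge labels used in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyQuantum:
    magnitude: float          # scalar magnitude
    positive: bool            # binary valence: True = positive, False = negative
    edge_history: tuple = ()  # labels of the edges traversed so far, oldest first

    def traverse(self, edge_label, decay=1.0):
        """Return the quantum after crossing one edge.

        `decay` is a hypothetical attenuation factor, not described in the text.
        """
        return EnergyQuantum(
            magnitude=self.magnitude * decay,
            positive=self.positive,
            edge_history=self.edge_history + (edge_label,),
        )


# Hypothetical edge labels, used only for illustration.
quantum = EnergyQuantum(1.0, True).traverse("IsA", 0.5).traverse("Causes", 0.5)
```

Because the record is immutable, each traversal produces a new quantum, so the edge history of a quantum can be inspected at any point without being altered by later steps.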

9.2.1 Knowledge Sources

SenticNet mainly leverages the general commonsense knowledge extracted

from Open Mind Common Sense (OMCS), the affective knowledge coming from

WordNet-Affect (WNA) and the practical commonsense knowledge crowdsourced

from a game engine for commonsense knowledge acquisition (GECKA).

OMCS (Singh 2002) is a second-generation commonsense database. It differs

from previous attempts to build a commonsense database in the innovative way it

collects and represents knowledge. Knowledge, in fact, is represented in natural

language, rather than using a formal logical structure, and information is not

hand-crafted by expert engineers but spontaneously inserted by online volunteers.

The reason why Lenat decided to develop an ad hoc language for Cyc (Lenat

and Guha 1989) is that vagueness and ambiguity pervade English and computer

reasoning systems generally require knowledge to be expressed accurately and

precisely. However, as expressed in the Society of Mind (Minsky 1986), ambiguity

is unavoidable when trying to represent the commonsense world.

WNA (Strapparava and Valitutti 2004) is an extension of WordNet Domains,

including a subset of synsets suitable to represent affective concepts correlated

with affective words. Similarly to the method used for domain labels, a number

of WordNet synsets are assigned to one or more affective labels (a-labels). In

particular, the affective concepts representing emotional states are identified by

synsets marked with the a-label emotion. There are also other a-labels for those

concepts representing moods, situations eliciting emotions, or emotional responses.

The resource was extended with a set of additional a-labels (termed emotional

categories), hierarchically organized, in order to specialize synsets with a-label

emotion. The hierarchical structure of new a-labels was modeled on the WordNet

hyperonym relation.

GECKA (Cambria et al. 2015) implements a new game with a purpose (GWAP)

concept that aims to overcome the main drawbacks of traditional data-collecting

games by empowering users to create their own GWAPs and by mining knowledge

that is highly reusable and multi-purpose. In particular, GECKA allows users to

design compelling serious games for their peers to play and, while doing so, gather

commonsense knowledge useful for intelligent applications in any ﬁeld requiring

in-depth knowledge of the real world, including reasoning, perception and social

systems simulation. Besides allowing for the acquisition of knowledge from game

designers, GECKA enables players of the ﬁnished games to be educated in useful

ways, all while being entertained. The knowledge gained from GECKA is later

encoded in SenticNet in the form <concept-relationship-concept>. The use of

this natural language based (rather than logic-based) framework allows GECKA

players to conceptualize the world in their own terms, at an ideal level of semantic

abstraction. Players can work with knowledge exactly as they envision it, and

researchers can access data on the same level as players’ thoughts, greatly enhancing

the usefulness of the captured data.

9.2.2 SenticNet Structure

The aggregation of common and commonsense knowledge bases is designed as a

2-stage process in which different pieces of knowledge are ﬁrst translated into RDF

triples and then inserted into a graph. Considering as an example ‘Pablo Picasso

is an artist’, we obtain the RDF triple <Pablo Picasso-isA-artist> and, hence, the

entry [artist - SUBSUME → Pablo Picasso].

Fig. 9.2 A sample concept in SenticNet

In this way, we obtain a shared representation for common and commonsense

knowledge, thus performing a conceptual decomposition of relation types, i.e.,

the unfolding of relation types that are usually opaque in natural-language-based

resources.
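The two-stage process described above (knowledge translated into a triple, then inserted into a graph) can be sketched in a few lines of Python. This is only an illustration: the names `EDGE_LABEL`, `to_triple` and `insert_triple` are hypothetical, and the only mapping taken from the text is isA to SUBSUME. The actual SenticNet pipeline is not shown in this chapter.

```python
from collections import defaultdict

# Assumed mapping from a relation in the triple to the edge label stored in
# the graph. Only isA -> SUBSUME comes from the example in the text.
EDGE_LABEL = {"isA": "SUBSUME"}


def to_triple(subject, relation, obj):
    """Stage 1: represent a piece of knowledge as a (subject, relation, object) triple."""
    return (subject, relation, obj)


def insert_triple(graph, triple):
    """Stage 2: insert the triple into the graph as a labeled, directed edge.

    For isA, the edge points from the general concept to the specific one,
    mirroring the 'artist SUBSUME Pablo Picasso' entry of the example.
    """
    subject, relation, obj = triple
    graph[obj].append((EDGE_LABEL[relation], subject))
    return graph


graph = defaultdict(list)
insert_triple(graph, to_triple("Pablo Picasso", "isA", "artist"))
print(dict(graph))  # {'artist': [('SUBSUME', 'Pablo Picasso')]}
```

Storing every statement as a labeled edge is what lets entries from different knowledge bases share one graph without aligning their ontologies first.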

After low conﬁdence score trimming and duplicates removal, the resulting

semantic network (built out of about 25 million RDF statements) contains 2,693,200

nodes. Of these, 30,000 affect-driven concepts (that is, those concepts that are

most highly linked to emotion nodes) have been selected for the construction of

SenticNet 4 (Fig. 9.2).

SenticNet 4 conceptualizes the information as ‘energy’ and sets up pathways

upon which this energy may ﬂow between different semantic fragments. In this way,

complex concepts can be built upon simpler pieces by connecting them together via

energy ﬂows. Once an element is reached by a certain quantum of energy ﬂow, it is

included in a wider concept representation, thus enabling simple elements to deeply

affect larger conceptual connections. Such a representation is optimal for modeling

domains characterized by nuanced, interconnected semantics and sentics (including

most socially-oriented AI modeling domains).

Each quantum of energy possesses a scalar magnitude, a valence (binary

positive/negative), and an edge history, deﬁned as a list of the edge labels that a

particular quantum of energy has traversed in the past. These three elements describe

the semantics and sentics of the quantum of energy and they are extracted for each

concept of the semantic network.
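As a minimal sketch of the three elements just listed, a quantum of energy can be modeled as a small immutable record. The class name `EnergyQuantum`, the `traverse` method and the multiplicative `weight` are assumptions made for illustration; the text only specifies the magnitude, the binary valence and the edge history.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnergyQuantum:
    magnitude: float                    # scalar magnitude
    positive: bool                      # binary valence: True = positive, False = negative
    edge_history: Tuple[str, ...] = ()  # labels of the edges traversed so far

    def traverse(self, edge_label: str, weight: float = 1.0) -> "EnergyQuantum":
        """Return the quantum after crossing one edge.

        Scaling the magnitude by an edge weight is an assumption of this
        sketch; appending the label to the history follows the definition
        of the edge history given in the text.
        """
        return EnergyQuantum(
            magnitude=self.magnitude * weight,
            positive=self.positive,
            edge_history=self.edge_history + (edge_label,),
        )


q = EnergyQuantum(magnitude=1.0, positive=True)
q = q.traverse("SUBSUME", weight=0.5).traverse("SUBSUME", weight=0.5)
print(q.magnitude, q.positive, q.edge_history)
```

Keeping the record immutable means each hop produces a new quantum, so the history of any quantum can be inspected without being altered by later traversals.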

In particular, the extraction of semantics and sentics is achieved through multiple

steps of spreading activation with respect to the nodes representing the activation

levels of the Hourglass of Emotions (Cambria et al. 2012), a brain-inspired model

for the representation and the analysis of human emotions.

9.2.3 The Hourglass of Emotions

The Hourglass of Emotions is an affective categorization model developed starting

from Plutchik’s studies on human emotions (Plutchik 2001). The main advantage

over other emotion categorization models is that it allows emotions to be deconstructed

into independent but concomitant affective dimensions, whose different

levels of activation make up the total emotional state of the mind. Such a modular

approach to emotion categorization allows different factors (or energy flows) to be

concomitantly taken into account for the generation of an affective state.

Table 9.1 The sentic levels of the Hourglass model

Interval               Pleasantness   Attention      Sensitivity    Aptitude
[G(1), G(2/3))         Ecstasy        Vigilance      Rage           Admiration
[G(2/3), G(1/3))       Joy            Anticipation   Anger          Trust
[G(1/3), G(0))         Serenity       Interest       Annoyance      Acceptance
(G(0), G(-1/3)]        Pensiveness    Distraction    Apprehension   Boredom
(G(-1/3), G(-2/3)]     Sadness        Surprise       Fear           Disgust
(G(-2/3), G(-1)]       Grief          Amazement      Terror         Loathing

The model can potentially synthesize the full range of emotional experiences

in terms of four affective dimensions, Pleasantness, Attention, Sensitivity, and

Aptitude, which determine the intensity of the expressed/perceived emotion as

a float ∈ [-1, +1]. Each affective dimension is characterized by six levels of

activation, termed ‘sentic levels’, which are also labeled as a set of 24 basic emotions

(six for each affective dimension) (Table 9.1). Previous works (Cambria et al. 2015)

already proved that a categorization model based on these four affective dimensions

is effective in the design of an emotion categorization architecture.

The transition between different emotional states is modeled, within the same

affective dimension, using the function $G(x) = -\frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$, for its symmetric

inverted bell curve shape that quickly rises up towards the unit value. In particular,

the function models how valence or intensity of an affective dimension varies

according to different values of arousal or activation, spanning from null value

(emotional void) to the unit value (heightened emotionality). Mapping this space

of possible emotions leads to an hourglass shape (Fig. 9.3).
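To make Table 9.1 concrete, the following sketch maps an activation value in [-1, +1] to the sentic level of a given affective dimension. It is a simplification under stated assumptions: the thresholds at 1/3 and 2/3 are applied directly to the raw activation value rather than to G-transformed values, zero is treated as emotional void, and the function name `sentic_level` is hypothetical.

```python
import math

# Sentic levels from Table 9.1, ordered from most negative to most positive.
SENTIC_LEVELS = {
    "pleasantness": ("grief", "sadness", "pensiveness", "serenity", "joy", "ecstasy"),
    "attention": ("amazement", "surprise", "distraction", "interest", "anticipation", "vigilance"),
    "sensitivity": ("terror", "fear", "apprehension", "annoyance", "anger", "rage"),
    "aptitude": ("loathing", "disgust", "boredom", "acceptance", "trust", "admiration"),
}


def sentic_level(dimension, activation):
    """Return the sentic level for an activation in [-1, +1], or None for 0."""
    if not -1.0 <= activation <= 1.0:
        raise ValueError("activation must lie in [-1, +1]")
    if activation == 0:
        return None  # assumed to correspond to emotional void
    # Band 1, 2 or 3 depending on which third of the unit interval |x| falls in.
    band = min(3, math.ceil(abs(activation) * 3))
    index = 2 + band if activation > 0 else 3 - band
    return SENTIC_LEVELS[dimension][index]


print(sentic_level("pleasantness", 0.5))   # joy
print(sentic_level("sensitivity", -0.9))   # terror
```

A full affective state would then be four such values, one per dimension, which is the vector form used later for document-level analysis.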

9.2.4 Sentic Patterns

Sentic patterns (Poria et al. 2015) are a novel paradigm for concept-level sentiment

analysis that blends computational intelligence, linguistics, and commonsense

computing in order to improve the accuracy of computationally expensive tasks such

as polarity detection from big social data. The algorithm assigns contextual polarity

to concepts in text and ﬂows this polarity through the dependency arcs in order to

assign a ﬁnal polarity label to each sentence. Analyzing how sentiment ﬂows from

concept to concept through dependency relations allows for a better understanding

of the contextual role of each concept in a text.

Fig. 9.3 The 3D model of the Hourglass of Emotions

The polarity detection algorithm employs SenticNet to retrieve the polarity

scores of concepts. The procedure can be considered as a tree painting algorithm

operating on the nodes and arcs of the syntactic dependency tree. For those words

or relations (concepts, or multiword expressions) for which the polarity can be

determined directly from the existing lexical resources, the algorithm assigns it

directly. Then, it gradually extends the labels to other arcs and nodes, with the

Fig. 9.4 Sentic patterns

necessary transformations determined by sentic pattern rules (Poria et al. 2014),

until it obtains the ﬁnal label for the root element, which is the desired output. The

extending of the polarity labels is termed the ﬂow of the sentiment.

The success of this rule-based algorithm crucially relies on the completeness of

the knowledge base used, in this case, SenticNet. For the concepts that are

absent from SenticNet, an ELM classifier (Cambria et al. 2013; Huang et al. 2006) is

employed instead (Fig. 9.4).
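The idea of painting a dependency tree and letting polarity flow to the root can be illustrated with a toy example. Everything below is an assumption made for illustration: the lexicon values, the `Node` class, the single negation rule and the summing of child polarities are stand-ins, not the sentic pattern rules of Poria et al., and `fallback_polarity` merely marks the place where a trained classifier would be called for concepts missing from the lexicon.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical polarity scores standing in for a lookup in SenticNet.
TOY_LEXICON = {"good": 0.8, "boring": -0.6, "love": 0.9}


@dataclass
class Node:
    word: str
    relation: str = "root"  # dependency relation to the parent node
    children: List["Node"] = field(default_factory=list)


def fallback_polarity(word):
    """Placeholder for a trained classifier; returns neutral here."""
    return 0.0


def flow(node):
    """Paint the subtree bottom-up and return the polarity at its root."""
    own = TOY_LEXICON.get(node.word)
    total = fallback_polarity(node.word) if own is None else own
    negated = False
    for child in node.children:
        if child.relation == "neg":
            negated = True           # toy rule: a negation child flips the label
        else:
            total += flow(child)     # toy rule: child polarity flows to the parent
    total = max(-1.0, min(1.0, total))
    return -total if negated else total


not_good = Node("good", children=[Node("not", relation="neg")])
love_movie = Node("love", children=[Node("movie", relation="dobj")])
print(flow(not_good), flow(love_movie))
```

The value returned for the root node plays the role of the final sentence label; real sentic patterns apply relation-specific transformations at each arc instead of the two toy rules used here.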

9.3 Applications of the SenticNet Framework

SenticNet is freely available both as an API (http://sentic.net/api) and as an RDF/XML standalone

resource (http://sentic.net/downloads). The SenticNet framework can be tried at the SenticNet demo page (http://sentic.net/demo). More

advanced functionalities are available at the SenticNet Ltd. website (http://business.sentic.net). Besides many

companies using SenticNet services for tasks such as brand positioning, customer

relationship management, and social media marketing, a good number of

research works exploit it for different sentiment analysis tasks. Xia et al. (2016),

for example, used SenticNet for contextual concept polarity disambiguation. In their

approach, SenticNet was used as a baseline and contextual polarity was detected by

a Bayesian method.

Other works (Poria et al. 2012, 2014) focused on extending or enhancing

SenticNet. Poria et al. (2012), for example, developed a fuzzy based SVM semi-

supervised classiﬁer to assign emotion labels to the SenticNet concepts. Several

lexical and syntactic features as well as SenticNet based features were used to

train the semi-supervised model. Qazi et al. (2014) used SenticNet for improving

business intelligence from suggestive reviews. They built a supervised system where

sentiment speciﬁc features were grasped from SenticNet.

SenticNet can also be used for extracting concepts and discovering domains from

sentences. This use of SenticNet was studied by Dragoni et al. (2014), who proposed

a fuzzy based framework which merges WordNet, ConceptNet and SenticNet to

extract key concepts from a sentence. iFeel (Araújo et al. 2014) is a system which

allows its users to create their own sentiment analysis framework by combining

SenticNet, SentiWordNet and other sentiment analysis methods.

SenticNet was adopted in the context of e-health to mine the opinions of patients

about their experience with healthcare providers and to compare these with ofﬁcial

ratings (Cambria et al. 2011). Some approaches (Wu et al. 2011) focused on

developing a multilingual concept-level sentiment lexicon following the way SenticNet

was built.

SenticNet was also used to develop several supervised baseline methods (Xia

et al. 2016; Duthil et al. 2012; Gezici et al. 2013). Among other supervised

approaches using SenticNet, the work by Chenlo and Losada (2014) is notable.

They used SenticNet to extract bag of concepts and polarity features for subjectivity

and sentiment analysis tasks. Chung et al. (2014) used SenticNet concepts as

seeds and proposed a method of random walk in the ConceptNet to retrieve

more concepts along with polarity scores. Their method aimed to expand

SenticNet to 265,353 concepts. After expanding SenticNet, they formed

Bag-of-Sentimental-Concepts features, which are similar to Bag of Concepts features.

Each dimension in the feature vector represents a concept and each concept was

assigned a value by multiplying tf-idf and polarity value of the concept. SenticNet

has also been adopted for enhancing Twitter sentiment classiﬁcation accuracy. The

approach by Bravo-Marquez et al. (2014) used both SenticNet and SentiWordNet

to improve the baseline Twitter classiﬁcation system. SenticNet was also used for

informal short text message (SMS) classiﬁcation (Gezici et al. 2013) and within

a domain independent unsupervised sentiment analysis system termed Sentilo

(Recupero et al. 2014).
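The Bag-of-Sentimental-Concepts weighting mentioned above, in which each concept dimension is the product of its tf-idf and its polarity, can be sketched as follows. The tf-idf variant (raw term count times the natural logarithm of N/df), the function name `bosc_vectors` and the toy polarity values are assumptions; the text does not specify which tf-idf formula was used.

```python
import math
from collections import Counter


def bosc_vectors(documents, polarity):
    """Build one feature vector per document: tf-idf(concept) * polarity(concept).

    documents: list of concept lists; polarity: dict concept -> score.
    Concepts without a polarity score get weight 0 (an assumption).
    """
    n_docs = len(documents)
    doc_freq = Counter(c for doc in documents for c in set(doc))
    vocabulary = sorted(doc_freq)
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append([
            tf[c] * math.log(n_docs / doc_freq[c]) * polarity.get(c, 0.0)
            for c in vocabulary
        ])
    return vocabulary, vectors


docs = [["good", "phone"], ["bad", "phone"]]
toy_polarity = {"good": 0.5, "bad": -0.5, "phone": 0.1}  # hypothetical scores
vocabulary, vectors = bosc_vectors(docs, toy_polarity)
print(vocabulary)
print(vectors)
```

In this toy corpus, "phone" occurs in every document, so its idf is zero and it contributes nothing, while "good" and "bad" receive weights of opposite sign.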

The SenticNet framework is optimized for binary polarity classiﬁcation on

sentences in formal English. However, the system can also be applied to document-level

sentiment classification and micro-text analysis (as shown in the next two

sections, respectively).

9.3.1 Document-Level Sentiment Analysis

An example of how the SenticNet framework can be adapted to document-level

classiﬁcation is provided by Bisio et al. (2016), a work that aims to study and

identify the best similarity metric able to describe the sentiment distribution of

several types of books, establishing a different point of view on the interpretation of

feeling extraction: the classiﬁcation of documents based on an emotional distance.

In particular, Bisio et al. (2016) employed a text miner application (Meda

et al. 2015), in which the word ‘document’ is used to denote any source of

data able to carry information, e.g., text written in natural language, web pages,

images (Bisio et al. 2013). The tool normalizes input documents into an internal

representation and applies several metrics to compute distances between pairs of

documents; the document-distance used takes into account a conventional content-

based similarity metric, a stylistic similarity criterion and a semantic representation

of the documents, in order to apply machine learning algorithms (Oneto et al. 2016)

for both clustering and classiﬁcation purposes.

After a pre-processing phase, in which language identiﬁcation, stemming and

stopword removal steps are carried out, a text document becomes a ‘docum object’,

deprived of useless information (e.g., articles, prepositions, punctuation, special

characters). At this level the ‘SenticNet semantic descriptor’ is applied.

The SenticNet framework allows one to retrieve four different sentiment experiences

associated with a specific word; then, the aim is the development of a

sentiment semantic descriptor made up of a vector of four affective dimensions

(Pleasantness, Attention, Sensitivity and Aptitude). Thus, the ‘SenticNet Semantic

Descriptor’ extracts the list of words that compose the document and submits each

single word to SenticNet. After the semantic descriptor step, the distance between

two documents can be calculated.

In order to test the approach, Bisio et al. (2016) selected books from five

distinct literary genres and applied three different distance metrics (Manhattan,

Euclidean and Maximum norm). The experiments underline the fact that it is

possible to notice a similarity between different literary genres, because, from

an affective point of view, even though different novels may be set in different

environments, mentalities and social constraints, they can still convey similar types

of feelings.
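The descriptor and the three distance metrics named above can be sketched as follows. Several details are assumptions introduced for illustration: the toy lexicon values, the choice to average the per-word vectors into one document descriptor (the aggregation step is not specified here), and all function names.

```python
import math

DIMENSIONS = ("pleasantness", "attention", "sensitivity", "aptitude")

# Hypothetical per-word affective values standing in for SenticNet lookups.
TOY_AFFECT = {
    "joy": (0.75, 0.25, 0.0, 0.5),
    "war": (-0.75, 0.5, 0.5, -0.5),
    "garden": (0.5, 0.0, -0.25, 0.25),
}


def semantic_descriptor(words, lexicon=TOY_AFFECT):
    """Average the four-dimensional vectors of all words found in the lexicon."""
    vectors = [lexicon[w] for w in words if w in lexicon]
    if not vectors:
        return (0.0,) * len(DIMENSIONS)
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximum_norm(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


doc_a = semantic_descriptor(["joy", "garden", "the"])
doc_b = semantic_descriptor(["war"])
print(manhattan(doc_a, doc_b), euclidean(doc_a, doc_b), maximum_norm(doc_a, doc_b))
```

Because every document is reduced to the same four numbers, any of the three metrics can be swapped in without changing the rest of a clustering or classification pipeline.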

9.3.2 Micro-text Sentiment Analysis

Supervised learning classiﬁers often misclassify tweets containing conjunctions

like ‘but’ and conditionals like ‘if’, due to their special linguistic characteristics.

Moreover, tweets often contain misspelled words, slang, URLs, elongations,

repeated punctuation, emoticons, abbreviations and hashtags. To tackle such

Fig. 9.5 Flowchart of the Twitter sentiment analysis system

challenges, the SenticNet framework can be adopted to enhance supervised learning

for polarity classiﬁcation (Chikersal et al. 2015). The general scheme of the system

is presented in Fig. 9.5. This system ﬁrst considers the number of positive and

negative emoticons of the tweet and the following rules are applied:

• If a tweet contains one or more positive emoticons and no negative emoticons, it

is labeled as positive.

• If a tweet contains one or more negative emoticons and no positive emoticons, it

is labeled as negative.

• If neither one of the two rules above can be applied, the tweet is labeled as

unknown.

If these emoticon-based rules label a tweet as positive or negative, this is

considered the final label output by the system. Otherwise, all tweets labeled

as unknown are passed into a supervised learning classiﬁer.
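The three emoticon rules translate almost directly into code. In this sketch the emoticon sets are small hypothetical samples and tokens are obtained by whitespace splitting, which is a simplification; the labeling logic itself follows the rules as stated.

```python
# Hypothetical, deliberately small emoticon inventories.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def emoticon_label(tweet):
    """Label a tweet from its emoticons alone: positive, negative or unknown."""
    tokens = tweet.split()
    n_positive = sum(token in POSITIVE_EMOTICONS for token in tokens)
    n_negative = sum(token in NEGATIVE_EMOTICONS for token in tokens)
    if n_positive >= 1 and n_negative == 0:
        return "positive"
    if n_negative >= 1 and n_positive == 0:
        return "negative"
    return "unknown"  # no emoticons, or a mix of both kinds


for tweet in ["great day :)", "missed the bus :(", "well :) :(", "no emoticons here"]:
    print(tweet, "->", emoticon_label(tweet))
```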

To this end, each tweet is represented as a feature vector of case-sensitive n-grams

(unigrams, bigrams, and trigrams). These features are the frequencies of sequences of 1,

2 or 3 contiguous tokens in a tweet. After handling negation, all tweets containing

the conjunction ‘but’ and the conditionals ‘if’, ‘unless’, ‘until’, and ‘in case’ are

considered, and speciﬁc linguistic rules are formulated in order to enable removal

of irrelevant or oppositely oriented n-grams from the tweet’s feature vector.
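A minimal sketch of the n-gram representation, together with one illustrative linguistic rule, is given below. The rule shown (keeping only the clause after the last 'but') is an assumption chosen for illustration and is not claimed to be the exact rule used by the system; whitespace tokenization and the function names are likewise hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, max_n=3):
    """Count case-sensitive unigrams, bigrams and trigrams of contiguous tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def keep_clause_after_but(tokens):
    """Illustrative rule (an assumption): drop everything up to the last 'but'."""
    lowered = [token.lower() for token in tokens]
    if "but" not in lowered:
        return tokens
    last = len(lowered) - 1 - lowered[::-1].index("but")
    return tokens[last + 1:]


tokens = "nice screen but poor battery".split()
features = ngram_counts(keep_clause_after_but(tokens))
print(features)
```

Applying the rule before counting means that n-grams from the discarded clause never enter the feature vector, which is one simple way to realize the removal of oppositely oriented n-grams.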

Finally, an SVM classifier is trained in order to obtain the tweet’s label. For tweets

with an absolute decision score or confidence below 0.5, the class labels assigned by

the SVM are discarded and an unsupervised classifier is employed. The rules used by

this classifier are based on a linguistic analysis of tweets, and leverage sentiment

analysis resources that contain polarity values of words and phrases; the primary

resource used for this purpose is SenticNet.

This unsupervised classiﬁcation process works as follows:

1. Single-word and multi-word concepts are extracted from the tweets in order to

fetch their polarities from SenticNet.

2. If a single-word concept is not found in SenticNet, it is queried in SentiWordNet

(Esuli and Sebastiani 2006), and if it is not found in SentiWordNet, it is searched

in the list of positive and negative words from the Bing Liu lexicon (Liu et al.

2005).

3. Based on the number of positive and negative concepts, and the most polar value

occurring in the tweet, the following rules are applied:

• If the number of positive concepts is greater than the number of negative

concepts and the most polar value occurring in the tweet is greater than or

equal to 0.6, the tweet is labeled as positive.

• If the number of negative concepts is greater than the number of positive

concepts and the most polar value occurring in the tweet is less than or equal

to 0:6, the tweet is labeled as negative.

• If neither one of the two rules stated above can be applied, the tweet is labeled

as unknown by the rule-based classiﬁer, and the SVM’s low conﬁdence

prediction is taken as the ﬁnal output of the system.
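The unsupervised rules, and the way they combine with the emoticon rules and the SVM, can be sketched as follows. The dictionaries are toy stand-ins for SenticNet, SentiWordNet and the Bing Liu word lists; assigning +1.0 and -1.0 to words found only in the word lists is an assumption, as is applying the fallback chain to every concept rather than to single-word concepts only. All function names are hypothetical.

```python
# Toy stand-ins for the three resources (values are hypothetical).
TOY_SENTICNET = {"love": 0.9, "nice": 0.4, "hate": -0.9}
TOY_SENTIWORDNET = {"awful": -0.7}
TOY_LIU_POSITIVE = {"awesome"}
TOY_LIU_NEGATIVE = {"lousy"}


def concept_polarity(concept):
    """Look a concept up in SenticNet, then SentiWordNet, then the word lists."""
    if concept in TOY_SENTICNET:
        return TOY_SENTICNET[concept]
    if concept in TOY_SENTIWORDNET:
        return TOY_SENTIWORDNET[concept]
    if concept in TOY_LIU_POSITIVE:
        return 1.0   # assumption: word lists carry no scores
    if concept in TOY_LIU_NEGATIVE:
        return -1.0
    return None


def rule_based_label(concepts):
    scores = [s for s in map(concept_polarity, concepts) if s is not None]
    if not scores:
        return "unknown"
    n_positive = sum(s > 0 for s in scores)
    n_negative = sum(s < 0 for s in scores)
    most_polar = max(scores, key=abs)  # value with the largest magnitude
    if n_positive > n_negative and most_polar >= 0.6:
        return "positive"
    if n_negative > n_positive and most_polar <= -0.6:
        return "negative"
    return "unknown"


def final_label(emoticon, svm_label, svm_confidence, concepts):
    """Cascade: emoticon rules, confident SVM, rule-based classifier, SVM fallback."""
    if emoticon != "unknown":
        return emoticon
    if abs(svm_confidence) >= 0.5:
        return svm_label
    rule = rule_based_label(concepts)
    return rule if rule != "unknown" else svm_label


print(final_label("unknown", "negative", 0.2, ["love", "nice"]))
```

In the usage line, the SVM is not confident enough (0.2 is below 0.5), so the rule-based classifier decides; had it also returned unknown, the low-confidence SVM label would have been kept.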

9.4 Conclusion

With the advent of the Social Web, the way people express their views and opinions

has dramatically changed. Reviews, forums and blogs now represent huge sources

of information with many practical applications. However, ﬁnding opinion sources

and monitoring them can be a formidable task because there are a large number of

diverse sources and each source may also have a huge volume of opinionated text.

Thus, automated opinion discovery and summarization systems are needed.

Due to its tremendous value for practical applications, there has been an

explosive growth of sentiment analysis techniques in both research in academia and

applications in the industry. However, most of the existing approaches still rely on

the syntactic structure of text, which is far from the way the human mind processes natural

language.

This chapter showed how sentic computing techniques can be employed for the

development of several sentiment analysis tasks. In order to assess the capability of

sentic computing to tackle real-world NLP tasks, we considered several applications

in different domains and different text formats.

All such applications demonstrate how SenticNet represents a useful resource for

the analysis of social data, as it goes beyond the use of domain-dependent keywords

by using an ensemble of commonsense computing tools and linguistics.

References

Araújo, M., P. Gonçalves, M. Cha, and F. Benevenuto. 2014. iFeel: A system that compares and

combines sentiment analysis methods. In WWW, 75–78.

Bisio, F., P. Gastaldo, C. Peretti, R. Zunino, and E. Cambria. 2013. Data intensive review mining

for sentiment classiﬁcation across heterogeneous domains. In Advances in Social Networks

Analysis and Mining (ASONAM), 1061–1067. IEEE.

Bisio, F., P. Gastaldo, R. Zunino, and E. Cambria. 2015. A learning scheme based on similarity

functions for affective common-sense reasoning. In IJCNN, 2476–2481.

Bisio, F., C. Meda, P. Gastaldo, R. Zunino, and E. Cambria. 2016. Sentiment-oriented information

retrieval: Affective analysis of documents based on the senticnet framework. In Sentiment

analysis and ontology engineering, Studies in Computational Intelligence, eds. W. Pedrycz

and S.-M. Chen, vol. 639, 175–195.

Bravo-Marquez, F., M. Mendoza, and B. Poblete. 2014. Meta-level sentiment models for big social

data analysis. Knowledge-Based Systems 69: 86–99.

Cambria, E., J. Fu, F. Bisio, and S. Poria. 2015. AffectiveSpace 2: Enabling affective intuition for

concept-level sentiment analysis. In AAAI, Austin, 508–514.

Cambria, E., P. Gastaldo, F. Bisio, and R. Zunino. 2015. An ELM-based model for affective

analogical reasoning. Neurocomputing 149: 443–455.

Cambria, E., G.B. Huang, et al. 2013. Extreme learning machines. IEEE Intelligent Systems 28(6):

30–59.

Cambria, E., and A. Hussain. 2015. Sentic computing: A common-sense-based framework for

concept-level sentiment analysis. Cham: Springer.

Cambria, E., A. Hussain, and C. Eckl. 2011. Bridging the gap between structured and unstructured

health-care data through semantics and sentics. In WebSci, Koblenz.

Cambria, E., A. Hussain, C. Havasi, and C. Eckl. 2010. SenticSpace: Visualizing opinions

and sentiments in a multi-dimensional vector space. In Knowledge-based and intelligent

information and engineering systems, ed. R. Setchi, I. Jordanov, R. Howlett, L. Jain, Lecture

Notes in Artiﬁcial Intelligence, vol. 6279, 385–393. Berlin: Springer.

Cambria, E., A. Livingstone, and A. Hussain. 2012. The hourglass of emotions. In Cognitive

behavioral systems, ed. A. Esposito, A. Vinciarelli, R. Hoffmann, V. Muller, Lecture Notes

in Computer Science, vol. 7403, 144–157. Berlin/Heidelberg: Springer.

Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for

sentiment analysis based on conceptual primitives. In COLING, 2666–2677.

Cambria, E., S. Poria, F. Bisio, R. Bajpai, and I. Chaturvedi. 2015. The CLSA model: A novel

framework for concept-level sentiment analysis. In LNCS, vol. 9042, 3–22. Springer.

Cambria, E., D. Rajagopal, K. Kwok, and J. Sepulveda. 2015. GECKA: Game engine for

commonsense knowledge acquisition. In FLAIRS, 282–287.

Cambria, E., Y. Xia, and A. Hussain. 2012. Affective common sense knowledge acquisition for

sentiment analysis. In LREC, Istanbul, 3580–3585.

Chenlo, J.M., and D.E. Losada. 2014. An empirical study of sentence features for subjectivity and

polarity classiﬁcation. Information Sciences 280: 275–288.

Chikersal, P., S. Poria, E. Cambria, A. Gelbukh, and C.-E. Siong. 2015. Modelling public sentiment

in Twitter: Using linguistic patterns to enhance supervised learning. In CICLing, 49–65.

Chung, J.K.C., C.E. Wu, and R.T.H. Tsai. 2014. Improve polarity detection of online reviews

with bag-of-sentimental-concepts. In Proceedings of the 11th ESWC, Semantic web evaluation

challenge. Crete: Springer.

Dragoni, M., A.G. Tettamanzi, and C. da Costa Pereira. 2014. A fuzzy system for concept-level

sentiment analysis. In Semantic web evaluation challenge, 21–27. Springer.

Duthil, B., F. Trousset, G. Dray, J. Montmain, and P. Poncelet. 2012. Opinion extraction applied to

criteria. In Database and expert systems applications, 489–496. Springer.

Esuli, A., and F. Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion

mining. In LREC.

Gezici, G., R. Dehkharghani, B. Yanikoglu, D. Tapucu, and Y. Saygin. 2013. Su-sentilab: A

classiﬁcation system for sentiment analysis in twitter. In International Workshop on Semantic

Evaluation, 471–477.

Huang, G.B., Q.Y. Zhu, and C.K. Siew. 2006. Extreme learning machine: Theory and applications.

Neurocomputing 70(1): 489–501.

Huang, G.B., E. Cambria, K.A. Toh, B. Widrow, and Z. Xu. 2015. New trends of learning in

computational intelligence. IEEE Computational Intelligence Magazine 10(2):16–17.

Lenat, D., and R. Guha. 1989. Building large knowledge-based systems: Representation and

inference in the Cyc project. Boston: Addison-Wesley.

Liu, B., M. Hu, and J. Cheng. 2005. Opinion observer: Analyzing and comparing opinions on the

web. In WWW, Chiba.

Lutz, C., and G. White. 1986. The anthropology of emotions. Annual Review of Anthropology 15:

405–436.

Meda, C., F. Bisio, P. Gastaldo, R. Zunino, R. Surlinelli, E. Scillia, and A.V. Ottaviano. 2015.

Content-adaptive analysis and ﬁltering of microblogs trafﬁc for event-monitoring applications.

In Proceedings of the 18th Asia Paciﬁc Symposium on Intelligent and Evolutionary Systems,

vol. 1, 155–170. Springer.

Minsky, M. 1986. The society of mind. New York: Simon and Schuster.

Oneto, L., S. Ridella, and D. Anguita. 2016. Tikhonov Ivanov and Morozov regularization for

support vector machine learning. Machine Learning 103: 103–136.

Osgood, C., W. May, and M. Miron. 1975. Cross-cultural universals of affective meaning. Urbana:

University of Illinois Press.

Plutchik, R. 2001. The nature of emotions. American Scientist 89(4): 344–350.

Poria, S., E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data ﬂow analysis

by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):

26–36.

Poria, S., I. Chaturvedi, E. Cambria, and A. Hussain. 2016. Convolutional MKL based multimodal

emotion recognition and sentiment analysis. In ICDM, 439–448.

Poria, S., E. Cambria, G. Winterstein, and G.B. Huang. 2014. Sentic patterns: Dependency-based

rules for concept-level sentiment analysis. Knowledge-Based Systems 69: 45–63.

Poria, S., A. Gelbukh, E. Cambria, D. Das, and S. Bandyopadhyay. 2012. Enriching SenticNet

polarity scores through semi-supervised fuzzy clustering. In IEEE ICDM, Brussels, 709–716.

Poria, S., A. Gelbukh, E. Cambria, A. Hussain, and G.B. Huang. 2014. EmoSenticSpace: A novel

framework for affective common-sense reasoning. Knowledge-Based Systems 69: 108–123.

Qazi, A., R.G. Raj, M. Tahir, E. Cambria, and K.B.S. Syed. 2014. Enhancing business intelligence

by means of suggestive reviews. The Scientiﬁc World Journal 2014: 1–11.

Recupero, D.R., V. Presutti, S. Consoli, A. Gangemi, and A. Nuzzolese. 2014. Sentilo: Frame-

based sentiment analysis. Cognitive Computation 7(2): 211–225.

Scherer, K. 1993. Studying the emotion-antecedent appraisal process: An expert system approach.

Cognition and Emotion 7: 325–355.

Singh, P. 2002. The open mind common sense project. KurzweilAI.net.

Strapparava, C., and A. Valitutti. 2004. WordNet-Affect: An affective extension of WordNet. In

LREC, Lisbon, 1083–1086.

Turkle, S. 1984. The second self: Computers and the human spirit. New York: Simon & Schuster.

Vesterinen, E. 2001. Affective computing. In Digital media research seminar. Helsinki.

Wang, Q., E. Cambria, C. Liu, and A. Hussain. 2013. Common sense knowledge for handwritten

Chinese recognition. Cognitive Computation 5(2): 234–242.

Wu, H.H., A.C.R. Tsai, R.T.H. Tsai, and J.Y.J. Hsu. 2011. Sentiment value propagation for

an integral sentiment dictionary based on commonsense knowledge. In 2011 International

Conference on Technologies and Applications of Artiﬁcial Intelligence (TAAI), 75–81. IEEE.

Xia, R., F. Xu, J. Yu, Y. Qi, and E. Cambria. 2016. Polarity shift detection, elimination and

ensemble: A three-stage model for document-level sentiment analysis. Information Processing

and Management 52: 36–45.

Index

Absolute sentiment, 89

Abstractive summarization techniques, 137,

150

Affect detection, 174

Affect, emotion, and mood analysis

basic emotions, 27

cause of, 32–33

deﬁnition, 26, 32

EARL, 29

facial expressions, gestures and postures,

grammatical and lexical expressions, 30–31

HUMAINE, 29

language expressions, 25

mental states, 25

optimism, 28

primary, secondary and tertiary emotions,

27–28

quadruple/quintuple representation, 31

rational opinions, 31

speaker emotion, 30

surprise and shock, 30

Affective computing

aspect extraction, 4

big social data analysis, 3

business intelligence, 2

commercial and academic tools, 2

complementary to binary sentiment

classiﬁcation, 4

concept extraction, 4

COTS tools, 2

emotional intelligence, 1

emotion recognition, 4

emotions, 1

end-user devices webcams installation, 5

entity recognition, 5

government intelligence applications, 2

Hidden-Markov models, 5

hybrid approaches, 6–7

jumping NLP curves, 7–8

knowledge-based techniques, 5–6

multimodal fusion, 5

next-generation sentiment mining systems,

off-topic passages, 4

on-topic documents, 4

personality recognition, 5

polarity classiﬁcation, 4

sarcasm detection, 5

sentiment mining techniques, 2

speech-to-text recognition, 5

statistical methods, 6

sub-component technology, 2

subjectivity detection, 4

topic detection, 4

types, 174

user proﬁling, 5

Affective dimension, 100, 179, 183

Affective norms for English words (ANEW),

93, 99

AFINN, 89

a-label emotion, 177

Amazon Mechanical Turk, 91, 93, 129

Amazon reviews dataset, 98

ANEW. See Affective norms for English words

(ANEW)

Annotation, 86

Applications of sentiment analysis, 76–77

Aptitude, 179, 183

E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,

Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8

Arabic sentiment analysis, 76

Arousal dimension, 88

Artiﬁcial intelligence (AI), 1, 47, 174, 178

Aspect-based comparative summarization, 126

Aspect-level sentiment analysis, 120–122,

125–128

Attention, 179, 183

Automated text summarization

extractive summarization, 138–140

scoring function, 138, 140–141

Bag-of-sentimental-concepts, 182

Bag of word (BOW) model, 7, 72, 110, 157,

161, 163

Best–Worst Scaling, 69

Bigrams, 45, 63, 70, 110, 165, 184

Big social data analysis, 3, 179

Blogs, 96

Boolean attribute, 87

Brow movement, 156

C-Feel-It, 101

Chain rule, 109–110

Chat conversations, 62, 142, 158–163

Chinese emotion lexicon, 94

Classiﬁer creation, 90

Clustering process, 20, 45, 115, 120, 141–144,

149, 159, 183

CMC. See Computer-mediated communication

(CMC)

CMU TweetNLP tool, 144

Cognitive processes, 93

Collaborative ﬁltering solutions, 128

Commercial off-the-shelf (COTS) tools, 2

Commonsense knowledge, 8, 21, 35, 37,

173–178

Comparative opinion, 33–34

Computational inference approaches, 54

Computer-mediated communication (CMC),

155–158, 162

Concept-level sentiment analysis, 173–186

ConceptNet, 6, 7, 182

Conditional probability, 108, 110, 111

Conditional random ﬁeld models (CRFs), 145,

146

Congressional speech dataset, 98

contSim, 144

Conversational summarization, 148–149

Corpus, 6, 65, 66, 76, 77, 91, 94, 98–101,

111–115, 117, 118, 120, 126,

137–141, 148, 149, 157–159, 166,

167

COTS tools. See Commercial off-the-shelf

(COTS) tools

CRFs. See Conditional Random Field models

(CRFs)

Cross-lingual sentiment resources, 99

Crowdsourcing, 45, 92, 93, 95, 165, 175

Cyber-bullying, 77

Cyberpedophile, 160, 162, 163

Cyc, 177

Darmstadt corpus, 98

Data visualization, 175

Decay topic model, 144

Deception detection

accuracy rates, 156

brow movement, 156

CMC, 155–158, 162

computer vision methods, 156

DECOUR corpus, 158–159

facial micro-expressions, 156

linear classiﬁer, 156

linguistic patterns, 157

LIWC, 157

Naïve Bayes classiﬁer, 157, 161, 167

non-verbal behavior, liar, 156

online sexual predator, 159–163

opinion spam, 164–167

SVM classiﬁer, 156

text messaging, 158

verbal component, 157

videos, 156–157

voice tone, 156

warrants, 158

DECOUR corpus, 158, 159

Degree adverbs, 71

Direct opinion, 33

Dirichlet hyperparameters, 101–102

Dirichlet process mixture model, 126

Discourse-level annotated datasets, 98

Discriminative models, 108

Distant supervision, 97

D-NEEDS, 47

Document-level sentiment analysis, 183

Downstream models, 116, 120–123

Downtoner, 91

ELM classiﬁer, 181

EM algorithm, 117–118

Emo-Lexicon, 93–94

Emotional sentiments, 15–16

Emotion annotation and representation language (EARL), 29

Emotions. See also Affect, emotion, and mood analysis

Hourglass model, 178–180

lexicons, 92–95

NRC lexicon, 68

positive/negative, 184

structures, 88

theories, 174

Energy ﬂow, 175, 176, 178, 179

Entity ranking, 128, 130

Event summarization, 143–144

Explanativeness, of sentence, 147

Extractive summarization techniques, 136–141

Facebook, 54

Fact-implied opinion, 34–36

Fake reviews, 164, 166

Feeling, 26

Figurative expressions, 72–73

First-person opinion, 36

FrameNet frame, 66–67

F-score, 65

Game engine for commonsense knowledge acquisition (GECKA), 176, 177

Game with a purpose (GWAP), 177

General Inquirer (GI), 89, 91

Generative models

approaches of, 107–108

categorization of, 115–116

downstream models of, 120–123

entity ranking, 128

language models, 109–111

opinion holders latent preferences, 126–127

pattern discovery, 125–126

probabilistic topic models, 111–115

recommendation, 128

sentiment annotation, 125–126

sentiment lexicon construction, 123–125

social media analysis, 128

social network, 128

topic-speciﬁc sentiment summarization, 126

upstream models of, 117–120

Web resources, 129

GI. See General Inquirer (GI)

Google Plus, 54

GROWTH NEEDS, 47

HAL model, 148, 149

Hidden-Markov models, 5

Hourglass of Emotions, 178–180

Human-computer interaction, 174, 175

Human-machine interaction network on emotion (HUMAINE), 29

Hybrid approaches, 6–7

IBM, 2

iFeel, 182

Implicit sentiment, 89

Indirect opinion, 33

Information processors, 174

Integer Linear Programming (ILP), 139

Intensiﬁers, 71–72, 91

Java-based package, 129

J48 decision tree, 161

JMARS model, 128

JST model, 118–119, 125

Kernel methods, 157

Knowledge-based techniques, 5–6

Labels

corpus, target language, 100

dimensions, 88

stand-alone labels, 87–88

structures, 88

Latent aspect rating analysis (LARA), 122–125

Latent Dirichlet Allocation, 109, 111, 114, 121, 166

Latent Dirichlet allocation (LDA) model, 45, 101, 111–112, 114–115, 119–122, 125, 129, 148

Latent semantic indexing (LSI), 111

Latent variables, 108, 113

LDA. See Latent Dirichlet allocation (LDA) model

Learning based approaches, 108

Lexicons. See also Term-Sentiment Association

emotion lexicons, 92–95

MPQA Subjectivity Lexicon, 68

NRC Emotion Lexicon, 68

sentiment lexicons, 89–92

lexSim, 144

Lie detection. See Deception detection

Linguistic Inquiry and Word Count (LIWC) tool, 92–93, 157

Linguistic peculiarity, 94

Linguistic processes, 93

LinkedIn, 54

LIWC. See Linguistic Inquiry and Word Count (LIWC) tool

Location Centric Word Cooccurrence, 143

Logistic regression model, 122

LSI. See Latent semantic indexing (LSI)

Luminoso, 2

Machine learning (ML)-based classiﬁcation systems, 102

Machine translation (MT) system, 99–100

MALLET, 129

Manual labelling, 96–97

Markov assumption, 110

Markov model, 125–126

MAS. See Multi-Aspect Sentiment (MAS) model

Maslow’s hierarchy of needs, 46

MaxDiff questions, 68–69

Maximum coverage problem with knapsack constraint (MCKP), 139

Maximum Entropy (ME) model, 119–120

MedLDA model, 125

Meta-opinions, 36–37

Microblogging sites, 66, 142

Micro-text sentiment analysis, 183–185

MLSLDA model, 121

Modals, 72

Movie review datasets, 98

MPQA Opinion Corpus, 99

MPQA Subjectivity Lexicon, 68

MT system. See Machine translation (MT) system

Multi-Aspect Sentiment (MAS) model, 121–122

Multilingual sentiment analysis, 75–76

Multimodal sentiment analysis, 5

Multi-word concepts, 5, 7, 69, 70, 91, 175, 185

Naïve Bayes classiﬁer, 157, 161, 167

Named entity recognition (NER), 20, 142

Natural Language Generation techniques, 137

Natural language processing (NLP), 7, 8, 17, 48, 54, 145, 160, 185

Negative emotion, 87

Next-generation sentiment mining systems, 8

N-gram language models, 109–111, 159, 166, 167, 184

Niek Sanders, 98

NLP. See Natural language processing (NLP)

Non-ﬁrst-person opinion, 36

Non-personal fact-implied opinion, 35–36

NRC-Canada system, 72

NRC Emotion Lexicon, 68

Ofﬂine interactions, 136

Online reviews, 17, 45, 164

Online sexual predator, 159–163. See also Pedophilia

Open Mind Common Sense (OMCS), 176–177

Opinion analysis

actions and entities, 47–48

aspect-based sentiment analysis, 17, 24–25

aspect category, 20

aspect expression, 20

aspect extraction, 21

aspect-oriented star rating systems, 45

comparative, 14, 33–34

complex sentiment expressions, 44

deciding functional elements, 51–52

deﬁnition, 12–14

emotional sentiments, 15–16

entity, 14–15

entity-based sentiment analysis, 18

entity category, 20

entity expression, 20

explicit aspect expression, 20

fact-implied, 34–36

feature-based sentiment analysis, 17

ﬁrst-person, 36

goals and expectations, 42–44

identiﬁcation/target extraction approaches, 45

implicit aspect expressions, 20–21

joint model, 52

large-scale sentiment/mood analysis, 45–46

Maslow’s hierarchy of needs, 46–47

meta-opinions, 36–37

model of entity, 21

model of opinion document, 21–23

models and frameworks, 41–42

NER, 20

non-ﬁrst-person, 36

objective and tasks, 19–20

opinion holders, 53–54

opinion spam detection, 45–46

opinion summarization, 45

picture quality, 14

quintuple, 19

rational sentiment, 15

reason and qualiﬁer, 18–19

regular, 14, 33

restaurant reviews (examples), 50–51

sentiment label, 44

sentiment score, 52

sentiment target, 14

sentiment text generation, 46

separate model, 52

sketch algorithm, 49–50

sources, 21

standard techniques, 49

subjective, 34

target, 14

trade-offs, life, 44

unigram/bigram feature-based SVM classiﬁer, 45

Opinion holders, 53–54, 126–127

Opinion spam, 164–167

Opinion summaries, 23–25, 45, 145, 146

Optimism, 28

Oracle, 2

Paradigmatic similarity, 94

Parts of speech tagging (POS), 142, 145, 161

Pattern discovery, 125–126

Pedophilia, 159–160

Personal fact-implied opinion, 34

Pleasantness, 179, 183

pLSA. See Probabilistic latent semantic analysis (pLSA)

pLSI. See Probabilistic latent semantic indexing (pLSI)

pLSI-based generative model, 128

Polarity detection algorithm, 180

The Political Debate Corpus, 98

Pornography, 160

Posed/fake facial expressions, 156, 157

Positive emotion, 87

Probabilistic latent semantic analysis (pLSA), 112

Probabilistic latent semantic indexing (pLSI), 112–114

Probabilistic topic models, 111–115

Probability, chain rule of, 109–110

Props, 47

Pseudo-parallel data, 100

Psychological processes, 93

PU-learning model, 165, 166

Purchase decisions, 45, 123, 164

Putin vs. Barack Obama (percentage of tweets),

Rational sentiment, 15

RDF/XML, 175, 181

Recurrent neural network (RNN), 65

Regression model, 126

Regular opinion, 33

Relative sentiment, 89

Reverse-JST model, 121

Romanian subjectivity lexicon, 75

Rule-based approaches, 53, 107

Rule-based SA system, 101

Rule-based system, 66

SAS, 2

SA system. See Sentiment analysis (SA) system

ScanSAM sheet, 93

Scoring function, of candidate, 140–141

Search and summarize framework, 144

Seed set selection, 90

Semantic composition

degree adverbs, 71

English and Arabic, 70

ﬁgurative expressions, 72–73

intensiﬁers, 71–72

Mikolov’s word2vec tool, 70

modals, 72

multi-word linguistic unit, 70

negated expressions, 70–71

NRC-Canada system, 72

Unitn Severyn and Moschitti, 72

Semantic-role based sentiment questionnaire, 74–75

Semantic Web techniques, 174

SemEval, 96, 98

SemEval-2014, 65, 73

SemEval-2016, 65

Semi-supervised approach, 44, 49, 69, 92, 94, 95, 166, 182

Sensitivity, 179, 183

Sentence-level annotated datasets, 98

Sentence-level valence classiﬁcation systems,

Sentic computing. See also SenticNet

AI, 174, 178

commonsense reasoning, 175

deﬁnition, 173

Semantic Web techniques, 174

Sentic levels, 179

SenticNet, 2, 6, 69, 94–95. See also Sentic computing

construction framework, 176

demo page, 181

document-level sentiment analysis, 183

energy/information ﬂows, 175–176

GECKA, 176, 177

Hourglass model, 178–180

micro-text sentiment analysis, 183–185

OMCS, 176–177

patterns, 179–181

semantic descriptor, 183

services, 181

structure, 177–178

uses, 181

WNA, 176, 177

SenticNet 4, 94, 95, 175–176, 178

Sentiment analysis

affect, emotion, and mood, 67

basic emotions, 27

cause of, 32–33

deﬁnition, 26, 32

EARL, 29

facial expressions, gestures and postures, 25

grammatical and lexical expressions, 30–31

HUMAINE, 29

language expressions, 25

mental states, 25

optimism, 28

primary, secondary and tertiary emotions, 27–28

quadruple or quintuple representation,

rational opinions, 31

speaker emotion, 30

surprise and shock, 30

affective computing. See (Affective computing)

challenges

applications of, 76–77

multilingual sentiment analysis, 75–76

sentiment annotation, 73–75

entity aspects, 64

FrameNet frame, 66–67

generative models. See (Generative models)

opinion

actions and entities, 47–48

aspect-based sentiment analysis, 17, 24–25

aspect category, 20

aspect expression, 20

aspect extraction, 21

aspect-oriented star rating systems, 45

comparative, 14, 33–34

complex sentiment expressions, 44

deciding functional elements, 51–52

deﬁnition, 12–14

emotional sentiments, 15–16

entity, 14–15

entity-based sentiment analysis, 18

entity category, 20

entity expression, 20

explicit aspect expression, 20

fact-implied, 34–36

feature-based sentiment analysis, 17

ﬁrst-person, 36

goals and expectations, 42–44

identiﬁcation/target extraction approaches, 45

implicit aspect expressions, 20–21

joint model, 52

large-scale sentiment/mood analysis, 45–46

Maslow’s hierarchy of needs, 46–47

meta-opinions, 36–37

model of entity, 21

model of opinion document, 21–23

models and frameworks, 41–42

NER, 20

non-ﬁrst-person, 36

objective and tasks, 19–20

opinion holders, 53–54

opinion spam detection, 45–46

opinion summarization, 45

picture quality, 14

quintuple, 19

rational sentiment, 15

reason and qualiﬁer, 18–19

regular, 14, 33

restaurant reviews (examples), 50–51

sentiment label, 44

sentiment score, 52

sentiment target, 14

sentiment text generation, 46

separate model, 52

sketch algorithm, 49–50

sources, 21

standard techniques, 49

subjective, 34

target, 14

trade-offs, life, 44

unigram/bigram feature-based SVM classiﬁer, 45

semantic composition

degree adverbs, 71

English and Arabic, 70

ﬁgurative expressions, 72–73

intensiﬁers, 71–72

Mikolov’s word2vec tool, 70

modals, 72

multi-word linguistic unit, 70

negated expressions, 70–71

NRC-Canada system, 72

Unitn Severyn and Moschitti, 72

SenticNet. See (SenticNet)

stance detection, 65–66

Term-Sentiment Association automatic lexicons, 69–70

Term-Sentiment Association manual lexicons

annotation scheme, 68

Best–Worst Scaling, 69

General Inquirer, 68

MaxDiff questions, 68–69

MPQA Subjectivity Lexicon, 68

NRC Emotion Lexicon, 68

text granularities, 62–63

writer, reader and other entities, 64

Sentiment analysis (SA) system, 86, 99–101, 103

Sentiment-annotated datasets, 95–99

Sentiment annotation, 73–75, 86–87

in generative models, 125–126

Sentiment composition lexicons (SCLs), 70

Sentiment intensity, 16

Sentiment lexicon construction, 123–125

Sentiment lexicons, 86, 89–92

Sentiment orientation, 16

Sentiment Orientation CALculator (SO-CAL) system, 90–91

Sentiment rating, 16–17

Sentiment resources

applications of, 101–102

emotion lexicons, 92–95

labels, 86–88

language gap, bridging, 99–100

sentiment-annotated datasets, 95–99

sentiment lexicons, 89–92

Sentiment score, 52

Sentiment Treebank, 91–92

SentiWordNet, 6, 69, 89–90, 92, 94, 99, 124, 146, 182, 185

Short text message (SMS), 62, 72, 182

Single-word concept, 7, 62, 183, 185

Smoothing concept, 111

SO-CAL system. See Sentiment Orientation CALculator (SO-CAL) system

Social media analysis, 128

Social media summarization

approaches, 142–143

automated text summarization

abstractive summarization, 137, 150

extractive summarization, 138–140

scoring function, 140–141

conversational, 148–149

events, 143–144

expressiveness, 136

future aspects, 149–150

information extraction challenges, 142

sentiment analysis, 145–147

text summarization need, 136–137

Social networking websites, 96, 128, 136, 142, 158

Society of Mind, 177

Source language, 99–100

Speaking processes, 93

Spontaneous facial expressions, 26, 156, 157

Stance detection, 65–66

Stand-alone labels, 87–88

Stanford Natural Language Processing group, 129

Stanford Network Analysis Project, 129

Stanford Parser, 91

Statistical language model, 109–110

Statistical methods, 6

Statistical model based approaches, 107–108

Sumblr, 143

Supporting sentence, 146

Support vector machine (SVM) classiﬁer, 45, 66, 90, 156, 157, 159, 161, 162, 164–167, 182, 185

Support Vector Regression model, 139

SVM system, 66

Synset, 89, 90, 94, 99

Syntagmatic similarity, 94

Target language, 99, 100

Task-based label, 86

Term-Sentiment Association

automatic lexicons, 69–70

manual lexicons

annotation scheme, 68

Best–Worst Scaling, 69

General Inquirer, 68

MaxDiff questions, 68–69

MPQA Subjectivity Lexicon, 68

NRC Emotion Lexicon, 68

Text classiﬁcation, 102, 142, 165

Text mining, 48, 183

Text recognition, 175

Text summarization, 65, 135–138

Textual unit, 86–88, 101

Theme concept, 117

Topic-document distributions, 101

Topic model

based on LDA, 101

Dirichlet distribution, 114–115

LSI, 111

pLSI, 112–114

Reverse-JST, 121

Topic Modeling Toolbox, 129

Topic-Sentiment Mixture (TSM) model, 117, 125–126

Topic-speciﬁc sentiment summarization, 126

Training set creation, 90

Turney method, 69

Tweet Cluster vectors, 143

Tweetmotif, 143

Twitter API, 96, 97, 101

Twitter sentiment analysis, 7, 129, 184

Two-stage approach, 122

Unigram language model, 110, 113

Unigrams, 45, 62, 70, 95, 102, 110, 113, 157, 165, 184

Unlabeled corpus, 100

Unsupervised approach, 117, 119, 123, 130, 139, 148, 164, 182, 185

Upstream models, 116–120

User rating behavior analysis, 127

Valence dimension, 88

Verbal deception, 155, 157, 167

Web resources, 129

WordNet, 6, 86

WordNet-Affect (WNA), 94, 176, 177

WordNet synsets, 146, 177

Word-topic distributions, 101

Yelp Dataset Challenge, 129
