A Practical Guide To Sentiment Analysis

Page Count: 199

Socio-Affective Computing 5

Erik Cambria
Dipankar Das
Sivaji Bandyopadhyay
Antonio Feraco Editors

A Practical
Guide to
Sentiment
Analysis

Socio-Affective Computing
Volume 5

Series Editor
Amir Hussain, University of Stirling, Stirling, UK
Co-Editor
Erik Cambria, Nanyang Technological University, Singapore

This exciting Book Series aims to publish state-of-the-art research on socially
intelligent, affective and multimodal human-machine interaction and systems.
It will emphasize the role of affect in social interactions and the humanistic side
of affective computing by promoting publications at the cross-roads between
engineering and human sciences (including biological, social and cultural aspects
of human life). Three broad domains of social and affective computing will be
covered by the book series: (1) social computing, (2) affective computing, and
(3) interplay of the first two domains (for example, augmenting social interaction
through affective computing). Examples of the first domain will include, but are not
limited to, all types of social interactions that contribute to the meaning, interest and
richness of our daily life, for example, information produced by a group of people
used to provide or enhance the functioning of a system. Examples of the second
domain will include, but are not limited to, computational and psychological models of
emotions, bodily manifestations of affect (facial expressions, posture, behavior,
physiology), and affective interfaces and applications (dialogue systems, games,
learning etc.). This series will publish works of the highest quality that advance
the understanding and practical application of social and affective computing
techniques. Research monographs, introductory and advanced level textbooks,
volume editions and proceedings will be considered.

More information about this series at http://www.springer.com/series/13199

Editors
Erik Cambria
School of Computer Science and Engineering
Nanyang Technological University
Singapore, Singapore

Sivaji Bandyopadhyay
Computer Science and Engineering Department
Jadavpur University
Kolkata, India

Dipankar Das
Computer Science and Engineering Department
Jadavpur University
Kolkata, India

Antonio Feraco
Fraunhofer IDM@NTU
Nanyang Technological University
Singapore, Singapore

ISSN 2509-5706
ISSN 2509-5714 (electronic)
Socio-Affective Computing
ISBN 978-3-319-55392-4
ISBN 978-3-319-55394-8 (eBook)
DOI 10.1007/978-3-319-55394-8
Library of Congress Control Number: 2017938021
© Springer International Publishing AG 2017
Chapter 4 is published with kind permission of Her Majesty the Queen in Right of Canada.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

While sentiment analysis research has become very popular in the past ten years,
most companies and researchers still approach it simply as a polarity detection
problem. In reality, sentiment analysis is a “suitcase problem” that requires tackling
many natural language processing (NLP) subtasks, including microtext analysis,
sarcasm detection, anaphora resolution, subjectivity detection, and aspect extraction. In this book, we present an overview of the main issues and challenges
associated with current sentiment analysis research and provide some insights on
practical tools and techniques that can be exploited to both advance the state of the
art in all sentiment analysis subtasks and explore new areas in the same context.
In Chap. 1, we discuss the state of the art of affective computing and sentiment
analysis research, including recent deep learning techniques and linguistic patterns
for emotion and polarity detection from different modalities, e.g., text and video.
In Chap. 2, Bing Liu describes different aspects of sentiment analysis and
different types of opinions. In particular, he uses product reviews as examples to
introduce general key concepts and definitions that are applicable to all forms of
formal and informal opinion text and all kinds of domains including social and
political domains.
In Chap. 3, Jiwei Li and Eduard Hovy describe possible directions for deeper
understanding about what opinions or sentiments are, why people hold them, and
why and how their facets are chosen and expressed, helping bridge the gap between
psychology/cognitive science and computational approaches.
In Chap. 4, Saif Mohammad discusses different sentiment analysis problems and
the challenges that are to be faced in order to go beyond simply determining whether
a piece of text is positive, negative, or neutral. In particular, the chapter aims to equip
researchers and practitioners with pointers to the latest developments in sentiment
analysis and encourage more work in the diverse landscape of problems, especially
those areas that are relatively less explored.
In Chap. 5, Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire contrast the
process of lexicon creation for a new or resource-scarce language with that for
a resource-rich one and show how the resulting sentiment resources can be
exploited to solve classic sentiment analysis problems.
In Chap. 6, Hongning Wang and ChengXiang Zhai show how generative models
can be used to integrate opinionated text data and their companion numerical
sentiment ratings, enabling deeper analysis of sentiment and opinions to obtain not
only subtopic-level sentiment but also latent relative weights on different subtopics.
In Chap. 7, Vasudeva Varma, Litton Kurisinkel, and Priya Radhakrishnan present
an overview of general approaches to automated text summarization with more
emphasis on extractive summarization techniques. They also describe recent work
on extractive summarization and the nature of the scoring function for a candidate
summary.
In Chap. 8, Paolo Rosso and Leticia Cagnina describe the very challenging
problems of deception detection and opinion spam detection, as lies and spam are
becoming increasingly serious issues with the rise, both in size and importance, of
social media and public opinion.
Finally, in Chap. 9 Federica Bisio et al. describe how to enhance the accuracy
of any algorithm for emotion or polarity detection through the integration of
commonsense reasoning resources, e.g., by embedding a concept-level knowledge
base for sentiment analysis.
Erik Cambria, Singapore, Singapore
Dipankar Das, Kolkata, India
Sivaji Bandyopadhyay, Kolkata, India
Antonio Feraco, Singapore, Singapore

Contents

1 Affective Computing and Sentiment Analysis
  Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco

2 Many Facets of Sentiment Analysis
  Bing Liu

3 Reflections on Sentiment/Opinion Analysis
  Jiwei Li and Eduard Hovy

4 Challenges in Sentiment Analysis
  Saif M. Mohammad

5 Sentiment Resources: Lexicons and Datasets
  Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire

6 Generative Models for Sentiment Analysis and Opinion Mining
  Hongning Wang and ChengXiang Zhai

7 Social Media Summarization
  Vasudeva Varma, Litton J. Kurisinkel, and Priya Radhakrishnan

8 Deception Detection and Opinion Spam
  Paolo Rosso and Leticia C. Cagnina

9 Concept-Level Sentiment Analysis with SenticNet
  Federica Bisio, Claudia Meda, Paolo Gastaldo, Rodolfo Zunino, and Erik Cambria

Index

Chapter 1

Affective Computing and Sentiment Analysis
Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco

Abstract Understanding emotions is one of the most important aspects of personal
development and growth and, as such, it is a key tile for the emulation of
human intelligence. Besides being important for the advancement of AI, emotion
processing is also important for the closely related task of polarity detection. The
opportunity to automatically capture the sentiments of the general public about
social events, political movements, marketing campaigns, and product preferences
has raised increasing interest both in the scientific community, for the
exciting open challenges, and in the business world, for the remarkable implications
for marketing and financial market prediction. This has led to the emerging fields
of affective computing and sentiment analysis, which leverage human-computer
interaction, information retrieval, and multimodal signal processing to distill
people’s sentiments from the ever-growing amount of online social data.
Keywords Affective computing • Sentiment analysis • Five eras of the Web •
Jumping NLP curves • Hybrid approaches

1.1 Introduction
Emotions play an important role in successful and effective human-human relationships. In fact, in many situations, human ‘emotional intelligence’ is more important
than IQ for successful interaction (Pantic et al. 2005). There is also significant
evidence that rational learning in humans is dependent on emotions (Picard 1997).

E. Cambria (✉)
School of Computer Science and Engineering, Nanyang Technological University, 639798,
Singapore, Singapore
e-mail: cambria@ntu.edu.sg
D. Das • S. Bandyopadhyay
Computer Science and Engineering Department, Jadavpur University, 700032, Kolkata, India
e-mail: das@cse.jdvu.ac.in; sbandyopadhyay@cse.jdvu.ac.in
A. Feraco
Fraunhofer IDM@NTU, Nanyang Technological University, Singapore, Singapore
e-mail: antonio.feraco@fraunhofer.sg
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_1

Affective computing and sentiment analysis, hence, are key for the advancement
of AI (Minsky 2006) and all the research fields that stem from it. Moreover, they find
applications in several different scenarios and there is a good number of companies,
large and small, that include the analysis of emotions and sentiments as part of
their mission. Sentiment mining techniques can be exploited for the creation and
automated upkeep of review and opinion aggregation websites, in which opinionated
text and videos are continuously gathered from the Web; these are not restricted to
product reviews but also cover wider topics such as political issues and brand
perception.
Affective computing and sentiment analysis also have great potential as a
sub-component technology for other systems. They can enhance the capabilities
of customer relationship management and recommendation systems by making it
possible, for example, to find out which features customers are particularly happy about or
to exclude from the recommendations items that have received very negative
feedback. Similarly, they can be exploited for affective tutoring and affective entertainment or for troll filtering and spam detection in online social communication.
Business intelligence is also one of the main factors behind corporate interest
in the fields of affective computing and sentiment analysis. Nowadays, companies
invest an increasing amount of money in marketing strategies and they are constantly
interested in both collecting and predicting the attitudes of the general public
towards their products and brands. The design of automatic tools capable of mining
sentiments over the Web in real time and of creating condensed versions of these
represents one of the most active research and development areas. The development
of such systems, moreover, is not only important for commercial purposes, but
also for government intelligence applications able to monitor increases in hostile
communications or to model cyber-issue diffusion.
Several commercial and academic tools, e.g., IBM,1 SAS,2 Oracle,3 SenticNet4
and Luminoso,5 track public viewpoints on a large scale by offering graphical
summarizations of trends and opinions in the blogosphere. Nevertheless, most
commercial off-the-shelf (COTS) tools are limited to a polarity evaluation or a mood
classification according to a very limited set of emotions. In addition, such methods
mainly rely on parts of text in which emotional states are explicitly expressed
and, hence, they are unable to capture opinions and sentiments that are expressed
implicitly. Because they are mainly based on statistical properties associated with
words, in fact, many COTS tools are easily tricked by linguistic operators such as
negation and disjunction.
The remainder of this chapter lists common tasks of affective computing and
sentiment analysis and presents a general categorization for them, after which some
concluding remarks are proposed.

1 http://ibm.com/analytics
2 http://sas.com/social
3 http://oracle.com/social
4 http://business.sentic.net
5 http://luminoso.com

1.2 Common Tasks
The Web is evolving towards an era where communities will define future products
and services.6 In this context, big social data analysis (Cambria et al. 2014) is
destined to attract increasing interest from both academia and business (Fig. 1.1).

Fig. 1.1 Owyang’s Five-Eras vision shows that mining sentiments from the general public is
becoming increasingly important for the future of the Web

6 http://web-strategist.com/blog/2009/04/27

The basic tasks of affective computing and sentiment analysis are emotion
recognition (Picard 1997; Calvo and D’Mello 2010; Zeng et al. 2009; Schuller et al.
2011; Gunes and Schuller 2012) and polarity detection (Pang and Lee 2008; Liu
2012; Wilson et al. 2005; Cambria 2016). While the former focuses on extracting a
set of emotion labels, the latter is usually a binary classification task with outputs
such as ‘positive’ versus ‘negative’, ‘thumbs up’ versus ‘thumbs down’ or ‘like’
versus ‘dislike’. These two tasks are highly inter-related and inter-dependent, to the
extent that some sentiment categorization models, e.g., the Hourglass of Emotions
(Cambria et al. 2012), treat them as a single task by inferring the polarity associated with
a sentence directly from the emotions it conveys. In many cases, in fact, emotion
recognition is considered a sub-task of polarity detection.
Polarity classification itself can also be viewed as a subtask of more advanced
analyses. For example, it can be applied to identifying ‘pro and con’ expressions that
can be used in individual reviews to evaluate the pros and cons that have influenced
the judgements of a product and that make such judgements more trustworthy.
Another instance of binary sentiment classification is agreement detection, that is,
given a pair of affective inputs, deciding whether they should receive the same or
differing sentiment-related labels.
Complementary to binary sentiment classification is the assignment of degrees of
positivity to the detected polarity or valence to the inferred emotions. If we waive the
assumption that the input under examination is opinionated and it is about one single
issue or item, new challenging tasks arise, e.g., subjectivity detection, opinion target
identification, and more (Cambria et al. 2015). The capability of distinguishing
whether an input is subjective or objective, in particular, can be highly beneficial
for a more effective sentiment classification. Moreover, a record can also have a
polarity without necessarily containing an opinion, for example a news article can
be classified into good or bad news without being subjective.
Typically, affective computing and sentiment analysis are performed over on-topic documents, e.g., on the results of a topic-based search engine. However, several
studies suggest that managing topic detection and sentiment analysis jointly can be beneficial for
overall performance. For example, off-topic passages of a document could contain
irrelevant affective information and prove misleading for the global sentiment
polarity about the main topic. Also, a document can contain material on multiple
topics that may be of interest to the user. In this case, it is necessary to
identify the topics and separate the opinions associated with each of them.
Similar to topic detection is aspect extraction, a subtask of sentiment analysis
that consists in identifying opinion targets in opinionated text, i.e., in detecting
the specific aspects of a product or service the opinion holder is either praising
or complaining about. In a recent approach, Poria et al. (2016) used a 7-layer deep
convolutional neural network to tag each word in opinionated sentences as either
aspect or non-aspect word and developed a set of linguistic patterns for the same
purpose in combination with the neural network.
Other sentiment analysis subtasks include aspect extraction (Poria et al. 2016),
subjectivity detection (Chaturvedi et al. 2016), concept extraction (Rajagopal et al.
2013), named entity recognition (Ma et al. 2016), and sarcasm detection (Poria et al.
2016), but also complementary tasks such as personality recognition (Poria et al.
2013), user profiling (Mihalcea and Garimella 2016) and especially multimodal
fusion (Poria et al. 2016). With an increasing number of webcams installed in end-user devices such as smartphones, touchpads, or netbooks, a growing
amount of affective information is posted to social online services in an audio or
audiovisual format rather than on a purely textual basis. For a rough impression of
the scale, consider that two days of video material are uploaded to YouTube on
average per minute. Besides speech-to-text recognition, this allows for additional
exploitation of acoustic information, facial expression and body movement analysis,
or even the “mood” of the background music or the color filters.
Multimodal fusion integrates all single modalities into a single combined
representation. Two types of fusion techniques have been
used in most of the literature to improve reliability in emotion recognition from
multimodal information: feature-level fusion and decision-level fusion (Konar and
Chakraborty 2015). Raaijmakers et al. (2008) fuse acoustic and
linguistic information. Yet, the linguistic information is based on the transcript of the
spoken content rather than on automatic speech recognition output. In Morency et al.
(2011), acoustic, textual, and video features are combined for the assessment of
opinion polarity in 47 YouTube videos. A significant improvement is demonstrated
in a leave-one-video-out evaluation using hidden Markov models for classification.
The authors identify polarized words, smile, gaze, pauses, and
voice pitch as relevant features. Textual analysis, however, is again based only on the manual transcript
of spoken words.
In Poria et al. (2016), finally, the authors propose a novel methodology for
multimodal sentiment analysis, which consists in harvesting sentiments from Web
videos by demonstrating a model that uses audio, visual and textual modalities as
sources of information. They used both feature- and decision-level fusion methods
to merge affective information extracted from multiple modalities, achieving an
accuracy of nearly 80%.
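
The difference between the two fusion strategies named above can be illustrated with a minimal sketch. It is not the method of any of the cited works: the function names, the feature values, the per-modality probabilities, and the equal weights are all assumptions chosen for illustration. Feature-level fusion is shown as plain concatenation of feature vectors, and decision-level fusion as a weighted average of the class probabilities produced by separate unimodal classifiers.

```python
from typing import Dict, List

Scores = Dict[str, float]  # maps a class label to a probability


def feature_level_fusion(audio: List[float], visual: List[float],
                         text: List[float]) -> List[float]:
    """Early fusion: concatenate per-modality features into one joint vector.

    A single classifier (not shown) would then be trained on the joint vector.
    """
    return audio + visual + text


def decision_level_fusion(per_modality: Dict[str, Scores],
                          weights: Dict[str, float]) -> Scores:
    """Late fusion: weighted average of each modality's class probabilities."""
    total_weight = sum(weights[m] for m in per_modality)
    labels = list(next(iter(per_modality.values())))
    return {
        label: sum(weights[m] * scores[label]
                   for m, scores in per_modality.items()) / total_weight
        for label in labels
    }


# Made-up outputs of three independent unimodal classifiers for one clip.
per_modality = {
    "audio": {"positive": 0.6, "negative": 0.4},
    "visual": {"positive": 0.3, "negative": 0.7},
    "text": {"positive": 0.8, "negative": 0.2},
}
weights = {"audio": 1.0, "visual": 1.0, "text": 1.0}  # hypothetical weights

fused = decision_level_fusion(per_modality, weights)
predicted = max(fused, key=fused.get)
```

In this toy case the visual modality disagrees with the other two, and the averaged decision follows the majority. In practice the weights would be tuned on held-out data, and feature-level fusion would additionally require the features to be aligned in time.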

1.3 General Categorization
Existing approaches to affective computing and sentiment analysis can be grouped
into three main categories: knowledge-based techniques, statistical methods, and
hybrid approaches.
Knowledge-based techniques are very popular because of their accessibility and
economy. Text is classified into affect categories based on the presence of fairly
unambiguous affect words like ‘happy’, ‘sad’, ‘afraid’, and ‘bored’. Popular sources
of affect words or multi-word expressions are Ortony’s Affective Lexicon (Ortony
et al. 1988), Wiebe’s linguistic annotation scheme (Wiebe et al. 2005), WordNet-Affect
(Strapparava and Valitutti 2004), SentiWordNet (Esuli and Sebastiani 2006),
SenticNet (Cambria et al. 2016), and other probabilistic knowledge bases trained
from linguistic corpora (Stevenson et al. 2007; Somasundaran et al. 2008; Rao
and Ravichandran 2009). The major weakness of knowledge-based approaches is
poor recognition of affect when linguistic rules are involved. For example, while
a knowledge base can correctly classify the sentence “today was a happy day”
as being happy, it is likely to fail on a sentence like “today wasn’t a happy
day at all”. To address this, more sophisticated knowledge-based approaches exploit
linguistic rules to distinguish how each specific knowledge base entry is used in
text (Poria et al. 2015). The validity of knowledge-based approaches, moreover,
heavily depends on the depth and breadth of the employed resources. Without
a comprehensive knowledge base that encompasses human knowledge, in fact,
it is not easy for a sentiment mining system to grasp the semantics associated
with natural language or human behavior. Another limitation of knowledge-based
approaches lies in the typicality of their knowledge representation, which is usually
strictly defined and does not allow handling different concept nuances, as the
inference of semantic and affective features associated with concepts is bounded
by the fixed, flat representation.
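
The negation failure described above, and the kind of linguistic rule that mitigates it, can be sketched as follows. This is a toy illustration only: the four-word lexicon, the negator list, and the three-token look-back window are assumptions, and none of the resources cited above is being reproduced.

```python
import re

# Hypothetical miniature affect lexicon (word -> affect category).
AFFECT_LEXICON = {"happy": "joy", "sad": "sadness",
                  "afraid": "fear", "bored": "boredom"}
NEGATORS = {"not", "no", "never", "n't"}


def tokenize(text):
    """Lowercase and split, separating the clitic "n't" from its verb."""
    text = text.lower().replace("\u2019", "'").replace("n't", " n't")
    return re.findall(r"n't|[a-z]+", text)


def spot_affect(text):
    """Plain keyword spotting: report every affect word, ignoring context."""
    return [AFFECT_LEXICON[t] for t in tokenize(text) if t in AFFECT_LEXICON]


def spot_affect_with_negation(text, window=3):
    """Keyword spotting plus one rule: flag an affect word as negated
    when a negator occurs within the preceding `window` tokens."""
    tokens = tokenize(text)
    results = []
    for i, tok in enumerate(tokens):
        if tok in AFFECT_LEXICON:
            context = tokens[max(0, i - window):i]
            negated = any(t in NEGATORS for t in context)
            results.append((AFFECT_LEXICON[tok], negated))
    return results


plain = spot_affect("today wasn't a happy day at all")
with_rule = spot_affect_with_negation("today wasn't a happy day at all")
```

Here `plain` reports "joy" for the negated sentence, which is exactly the error discussed above, while `with_rule` marks the same hit as negated. A fixed look-back window is a crude stand-in for real scope detection and will itself fail on longer or nested constructions.
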
Statistical methods, such as support vector machines and deep learning, have
been popular for affect classification of texts and have been used by researchers
on projects such as Pang’s movie review classifier (Pang et al. 2002) and many
others (Hu and Liu 2004; Glorot et al. 2011; Socher et al. 2013; Lau et al. 2014;
Oneto et al. 2016). By feeding a machine learning algorithm a large training corpus
of affectively annotated texts, it is possible for the system to not only learn the
affective valence of affect keywords (as in the keyword spotting approach), but also
to take into account the valence of other arbitrary keywords (like lexical affinity)
and word co-occurrence frequencies. However, statistical methods are generally
semantically weak, i.e., lexical or co-occurrence elements in a statistical model have
little predictive value individually. As a result, statistical text classifiers only work
with acceptable accuracy when given a sufficiently large text input. So, while these
methods may be able to affectively classify a user’s text at the page or paragraph level, they do not work well on smaller text units such as sentences or clauses.
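
As a minimal sketch of the statistical route, the following trains a multinomial naive Bayes classifier with add-one smoothing on a bag-of-words representation. Naive Bayes is chosen here only because it fits in a few lines; it is not claimed to be the setup of any work cited above, and the four training sentences are invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns a simple model tuple."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> word frequencies
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {label: math.log(count / len(docs))
                  for label, count in label_counts.items()}
    return log_priors, word_counts, vocab


def log_scores(text, model):
    """Log-probability score per label; words outside the vocabulary are skipped."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = prior
        for tok in text.lower().split():
            if tok in vocab:
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log((word_counts[label][tok] + 1)
                                  / (total + len(vocab)))
        scores[label] = score
    return scores


def classify(text, model):
    scores = log_scores(text, model)
    return max(scores, key=scores.get)


# Invented toy corpus.
docs = [
    ("a wonderful moving film", "positive"),
    ("great acting and a wonderful story", "positive"),
    ("a dull and boring film", "negative"),
    ("terrible acting and a boring story", "negative"),
]
model = train_naive_bayes(docs)
label = classify("wonderful story", model)
```

The weakness on short inputs is visible even here: for a two-word input such as "unexpected twist", no word is in the vocabulary, so both labels receive the same score and the decision rests on the class priors alone.
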
Hybrid approaches to affective computing and sentiment analysis, finally, exploit
both knowledge-based techniques and statistical methods to perform tasks such as
emotion recognition and polarity detection from text or multimodal data. Sentic
computing (Cambria and Hussain 2015), for example, exploits an ensemble of
knowledge-driven linguistic patterns and statistical methods to infer polarity from
text. Xia et al. (2015) used SenticNet and a Bayesian model for contextual
concept polarity disambiguation. Dragoni et al. (2014) proposed a fuzzy framework
which merges WordNet, ConceptNet and SenticNet to extract key concepts from a
sentence. iFeel (Araújo et al. 2014) is a system that allows users to create their own
sentiment analysis framework by combining SenticNet, SentiWordNet and other sentiment analysis methods. Chenlo and Losada (2014) used SenticNet to extract bag of
concepts and polarity features for subjectivity detection and other sentiment analysis
tasks. Chung et al. (2014) used SenticNet concepts as seeds and proposed a method
of random walk in ConceptNet to retrieve more concepts along with polarity scores.
Other works propose the joint use of knowledge bases and machine learning for
Twitter sentiment analysis (Bravo-Marquez et al. 2014), short text message classification (Gezici et al. 2013) and frame-based opinion mining (Recupero et al. 2014).
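
The general idea of a hybrid approach can be sketched as a lexicon lookup with a learned fallback. This is only one simple way to combine the two families and does not reproduce sentic computing or any other system cited above; the lexicon entries, polarity values, training sentences, and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical polarity lexicon (word -> polarity in [-1, 1]).
POLARITY_LEXICON = {"happy": 0.8, "great": 0.7, "sad": -0.7, "awful": -0.9}


def lexicon_polarity(tokens: List[str]) -> Optional[float]:
    """Knowledge-based step: mean polarity of lexicon hits, or None if no hit."""
    hits = [POLARITY_LEXICON[t] for t in tokens if t in POLARITY_LEXICON]
    return sum(hits) / len(hits) if hits else None


def train_fallback(docs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Statistical step: per-word polarity estimated from labeled examples
    as (positive count - negative count) / total count."""
    pos, neg = Counter(), Counter()
    for text, label in docs:
        (pos if label == "positive" else neg).update(text.lower().split())
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w])
            for w in set(pos) | set(neg)}


def hybrid_polarity(text: str, learned: Dict[str, float]) -> Tuple[str, str]:
    """Use the lexicon when it fires; otherwise back off to learned scores."""
    tokens = text.lower().split()
    score = lexicon_polarity(tokens)
    source = "lexicon"
    if score is None:
        known = [learned[t] for t in tokens if t in learned]
        score = sum(known) / len(known) if known else 0.0
        source = "statistical"
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, source


# Invented training data for the fallback model.
learned = train_fallback([
    ("the battery lasts forever", "positive"),
    ("the screen cracked quickly", "negative"),
])
result = hybrid_polarity("screen cracked", learned)
```

For "a great phone" the lexicon decides; for "screen cracked" no lexicon entry matches and the learned word scores take over. Real hybrid systems interleave the two sources far more tightly than this strict back-off.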

1.4 Conclusion
The passage from a read-only to a read-write Web made users more enthusiastic
about sharing their emotions and opinions through social networks, online communities, blogs, wikis, and other online collaborative media. In recent years, this
collective intelligence has spread to many different areas of the Web, with particular
focus on fields related to our everyday life such as commerce, tourism, education,
and health.
Despite significant progress, however, affective computing and sentiment analysis are still finding their own voice as new inter-disciplinary fields. Engineers
and computer scientists use machine learning techniques for automatic affect
classification from video, voice, text, and physiology. Psychologists use their long
tradition of emotion research with their own discourse, models, and methods.
Affective computing and sentiment analysis are research fields inextricably bound
to the affective sciences that attempt to understand human emotions. Simply put, the
development of affect-sensitive systems cannot be divorced from the century-long
psychological research on emotion.
Hybrid approaches aim to better grasp the conceptual rules that govern sentiment
and the clues that can convey these concepts from realization to verbalization in
the human mind. In recent years, such approaches have been positioning affective
computing and sentiment analysis as interdisciplinary fields in between mere
NLP and natural language understanding by gradually shifting from syntax-based
techniques to more and more semantics-aware frameworks (Cambria and White
2014), where both conceptual knowledge and sentence structure are taken into
account (Fig. 1.2).
So far, sentiment mining approaches from text or speech have been mainly based
on the bag-of-words model because, at first glance, the most basic unit of linguistic
structure appears to be the word. Single-word expressions, however, are just a subset
of concepts, multi-word expressions that carry specific semantics and sentics, that
is, the denotative and connotative information commonly associated with objects,
actions, events, and people. Sentics, in particular, specifies the affective information
associated with real-world entities, which is key for emotion recognition and
polarity detection, the basic tasks of affective computing and sentiment analysis.
The best way forward for these two fields, hence, is the ensemble application of
semantic knowledge and machine learning, where different approaches can cover
for each other’s flaws. In particular, the combined application of linguistics and
knowledge bases will allow sentiments to flow from concept to concept based on

8

E. Cambria et al.

Fig. 1.2 Jumping NLP curves

the dependency relation of the input sentence, while machine learning will act as
backup for missing concepts and unknown linguistic patterns.
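
The contrast between single words and multi-word concepts drawn in this section can be made concrete with a small sketch. It uses a greedy longest-match lookup against a toy concept base; the two entries and their polarity values are invented for illustration and are not taken from SenticNet or any other resource, and the matching strategy is an assumption rather than the method of the cited works.

```python
# Hypothetical concept base: multi-word concept -> polarity in [-1, 1].
CONCEPT_BASE = {
    ("small", "room"): -0.5,
    ("small", "price"): 0.6,
}


def extract_concepts(tokens, kb, max_len=3):
    """Greedy longest match: at each position, take the longest token span
    (up to max_len tokens) that is an entry in the concept base."""
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in kb:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # no concept starts here; move on
    return concepts


def concept_polarity(text, kb=CONCEPT_BASE):
    """Mean polarity of the matched concepts, or None when nothing matches."""
    found = extract_concepts(text.lower().split(), kb)
    if not found:
        return None
    return sum(kb[c] for c in found) / len(found)


hotel = concept_polarity("the hotel had a small room")
deal = concept_polarity("we paid a small price")
```

The word "small" appears in both sentences, yet the two matched concepts carry opposite polarity, which a bag-of-words model with one score per word cannot express. When `concept_polarity` returns None, a learned model would have to act as the backup mentioned above.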
Next-generation sentiment mining systems need broader and deeper common
and commonsense knowledge bases, together with more brain-inspired and
psychologically-motivated reasoning methods, in order to better understand
natural language opinions and, hence, more efficiently bridge the gap between
(unstructured) multimodal information and (structured) machine-processable data.
Looking ahead, blending scientific theories of emotion with the practical engineering goals of analyzing sentiments in natural language and human behavior
will pave the way for development of more bio-inspired approaches to the design
of intelligent sentiment mining systems capable of handling semantic knowledge,
making analogies, learning new affective knowledge, and detecting, perceiving, and
‘feeling’ emotions.

References
Araújo, M., P. Gonçalves, M. Cha, and F. Benevenuto. 2014. iFeel: A system that compares and
combines sentiment analysis methods. In WWW, 75–78.
Bravo-Marquez, F., M. Mendoza, and B. Poblete. 2014. Meta-level sentiment models for big social
data analysis. Knowledge-Based Systems 69: 86–99.
Calvo, R., and S. D’Mello. 2010. Affect detection: An interdisciplinary review of models, methods,
and their applications. IEEE Transactions on Affective Computing 1(1): 18–37.
Cambria, E. 2016. Affective computing and sentiment analysis. IEEE Intelligent Systems 31(2):
102–107.
Cambria, E., and A. Hussain. 2015. Sentic computing: A common-sense-based framework for
concept-level sentiment analysis. Cham: Springer.
Cambria, E., A. Livingstone, and A. Hussain. 2012. The hourglass of emotions. In Cognitive
behavioral systems, ed. A. Esposito, A. Vinciarelli, R. Hoffmann, and V. Muller, Lecture notes
in computer science, vol. 7403, 144–157. Berlin/Heidelberg: Springer.
Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for
sentiment analysis based on conceptual primitives. In COLING, 2666–2677.
Cambria, E., S. Poria, F. Bisio, R. Bajpai, and I. Chaturvedi. 2015. The CLSA model: A novel
framework for concept-level sentiment analysis. In Computational linguistics and intelligent
text processing. CICLing 2015, ed. A. Gelbukh, LNCS, vol. 9042, 3–22. Cham: Springer.
Cambria, E., H. Wang, and B. White. 2014. Guest editorial: Big social data analysis. Knowledge-Based Systems 69: 1–2.
Cambria, E., and B. White. 2014. Jumping NLP curves: A review of natural language processing
research. IEEE Computational Intelligence Magazine 9(2): 48–57.
Chaturvedi, I., E. Cambria, and D. Vilares. 2016. Lyapunov filtering of objectivity for Spanish
sentiment model. In IJCNN, 4474–4481.
Chenlo, J.M., and D.E. Losada. 2014. An empirical study of sentence features for subjectivity and
polarity classification. Information Sciences 280: 275–288.
Chung, J.K.C., C.E. Wu, and R.T.H. Tsai. 2014. Improve polarity detection of online reviews with
bag-of-sentimental-concepts. In Proceedings of the 11th ESWC. Semantic Web Evaluation
Challenge. Crete: Springer.
Dragoni, M., A.G. Tettamanzi, and C. da Costa Pereira. 2014. A fuzzy system for concept-level
sentiment analysis. In Semantic web evaluation challenge, 21–27. Cham: Springer.
Esuli, A., and F. Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion
mining. In LREC.
Gezici, G., R. Dehkharghani, B. Yanikoglu, D. Tapucu, and Y. Saygin. 2013. Su-sentilab: A
classification system for sentiment analysis in twitter. In International Workshop on Semantic
Evaluation, 471–477.
Glorot, X., A. Bordes, and Y. Bengio. 2011. Domain adaptation for large-scale sentiment
classification: A deep learning approach. In ICML, Bellevue.
Gunes, H., and B. Schuller. 2012. Categorical and dimensional affect analysis in continuous input:
Current trends and future directions. Image and Vision Computing 31(2): 120–136.
Hu, M., and B. Liu. 2004. Mining and summarizing customer reviews. In KDD, Seattle.
Konar, A., and A. Chakraborty. 2015. Emotion recognition: A pattern analysis approach. Hoboken:
Wiley & Sons.
Lau, R., Y. Xia, and Y. Ye. 2014. A probabilistic generative model for mining cybercriminal
networks from online social media. IEEE Computational Intelligence Magazine 9(1): 31–43.
Liu, B. 2012. Sentiment analysis and opinion mining. San Rafael: Morgan and Claypool.
Ma, Y., E. Cambria, and S. Gao. 2016. Label embedding for zero-shot fine-grained named entity
typing. In COLING, Osaka, 171–180.
Mihalcea, R., and A. Garimella. 2016. What men say, what women hear: Finding gender-specific
meaning shades. IEEE Intelligent Systems 31(4): 62–67.
Minsky, M. 2006. The emotion machine: Commonsense thinking, artificial intelligence, and the
future of the human mind. New York: Simon & Schuster.
Morency, L.P., R. Mihalcea, and P. Doshi. 2011. Towards multimodal sentiment analysis:
Harvesting opinions from the web. In International Conference on Multimodal Interfaces
(ICMI), 169–176. New York: ACM.
Oneto, L., F. Bisio, E. Cambria, and D. Anguita. 2016. Statistical learning theory and ELM for big
social data analysis. IEEE Computational Intelligence Magazine 11(3): 45–55.
Ortony, A., G. Clore, and A. Collins. 1988. The cognitive structure of emotions. Cambridge:
Cambridge University Press.
Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2: 1–135.
Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine
learning techniques. In EMNLP, Philadelphia, 79–86.
Pantic, M., N. Sebe, J. Cohn, and T. Huang. 2005. Affective multimodal human-computer
interaction. In ACM International Conference on Multimedia, New York, 669–676.
Picard, R. 1997. Affective computing. Boston: The MIT Press.
Poria, S., E. Cambria, and A. Gelbukh. 2016. Aspect extraction for opinion mining with a deep
convolutional neural network. Knowledge-Based Systems 108: 42–49.
Poria, S., E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data flow analysis
by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):
26–36.
Poria, S., E. Cambria, D. Hazarika, and P. Vij. 2016. A deeper look into sarcastic tweets using deep
convolutional neural networks. In COLING, 1601–1612.
Poria, S., E. Cambria, N. Howard, G.B. Huang, and A. Hussain. 2016. Fusing audio, visual and
textual clues for sentiment analysis from multimodal content. Neurocomputing 174: 50–59.
Poria, S., I. Chaturvedi, E. Cambria, and A. Hussain. 2016. Convolutional MKL based multimodal
emotion recognition and sentiment analysis. In ICDM, 439–448.
Poria, S., A. Gelbukh, B. Agarwal, E. Cambria, and N. Howard. 2013. Common sense knowledge
based personality recognition from text. In Advances in soft computing and its applications,
484–496. Berlin/Heidelberg: Springer.
Raaijmakers, S., K. Truong, and T. Wilson. 2008. Multimodal subjectivity analysis of multiparty
conversation. In EMNLP, Edinburgh, 466–474.
Rajagopal, D., E. Cambria, D. Olsher, and K. Kwok. 2013. A graph-based approach to commonsense concept extraction and semantic similarity detection. In WWW, Rio De Janeiro, 565–570.
Rao, D., and D. Ravichandran. 2009. Semi-supervised polarity lexicon induction. In EACL,
Athens, 675–682.
Recupero, D.R., V. Presutti, S. Consoli, A. Gangemi, and A. Nuzzolese. 2014. Sentilo: Frame-based sentiment analysis. Cognitive Computation 7(2): 211–225.
Schuller, B., A. Batliner, S. Steidl, and D. Seppi. 2011. Recognising realistic emotions and affect
in speech: State of the art and lessons learnt from the first challenge. Speech Communication
53(9/10): 1062–1087.
Socher, R., A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, and C. Potts. 2013.
Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP,
1642–1654.
Somasundaran, S., J. Wiebe, and J. Ruppenhofer. 2008. Discourse level opinion interpretation. In
COLING, Manchester, 801–808.
Stevenson, R., J. Mikels, and T. James. 2007. Characterization of the affective norms for english
words by discrete emotional categories. Behavior Research Methods 39: 1020–1024.
Strapparava, C., and A. Valitutti. 2004. WordNet-Affect: An affective extension of WordNet. In
LREC, Lisbon, 1083–1086.
Wiebe, J., T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in
language. Language Resources and Evaluation 39(2): 165–210.
Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level
sentiment analysis. In HLT/EMNLP, Vancouver, 347–354.
Xia, Y., E. Cambria, A. Hussain, and H. Zhao. 2015. Word polarity disambiguation using bayesian
model and opinion-level features. Cognitive Computation 7(3): 369–380.
Zeng, Z., M. Pantic, G. Roisman, and T. Huang. 2009. A survey of affect recognition methods:
Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and
Machine Intelligence 31(1): 39–58.

Chapter 2

Many Facets of Sentiment Analysis
Bing Liu

Abstract Sentiment analysis or opinion mining is the computational study of
people’s opinions, sentiments, evaluations, attitudes, moods, and emotions. It is
one of the most active research areas in natural language processing, data mining,
information retrieval, and Web mining. In recent years, its research and applications
have also spread to management sciences and social sciences due to its importance
to business and society as a whole. This chapter defines the sentiment analysis
problem and its related concepts such as sentiment, opinion, emotion, mood, and
affect. The goal is to abstract a structure from the complex unstructured natural
language text related to the problem and its pertinent concepts. The definitions not
only enable us to see a rich set of inter-related sub-problems, but also a common
framework that can unify existing research directions. They also help researchers
design more robust solution techniques by exploiting the inter-relationships of the
sub-problems.
Keywords Sentiment analysis • Opinion mining • Emotion • Mood • Affect •
Subjectivity

Many people think that sentiment analysis is just the problem of classifying
whether a document or a sentence expresses a positive or negative sentiment or
opinion. It is in fact a much more complex problem than that. It involves many facets
and multiple sub-problems. In this chapter, I define an abstraction of the sentiment
analysis problem. The definitions will enable us to see a rich set of inter-related
sub-problems. It is often said that if we cannot structure a problem, we probably
do not understand the problem. The objective of the definitions is to abstract a
structure from the complex unstructured natural language text. The structure serves
as a common framework to unify existing research directions and enable researchers
to design more robust solution techniques by exploiting the inter-relationships of the
sub-problems.

B. Liu
Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
e-mail: liub@cs.uic.edu
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_2

Unlike factual information, sentiment and opinion have an important characteristic, namely, being subjective. The subjectivity comes from many sources. First
of all, different people may have different experiences and thus different opinions.
Different people may also have different interests and/or different ideologies. Due to
such different subjective experiences, views, interests and ideologies, it is important
to examine a collection of opinions from many people rather than only one opinion
from a single person because such an opinion represents only the subjective view of
a single person, which is usually not sufficient for action. With a large number of
opinions, some form of summary becomes necessary (Hu and Liu 2004). Thus, the
problem definitions should also state what kind of summary may be desired. Along
with the problem definitions, the chapter also discusses different types of opinions
and the important concepts of affect, emotion and mood.
Throughout this chapter, I mainly use product reviews and sentences from such
reviews as examples to introduce the key concepts, but the ideas and the resulting
definitions are general and applicable to all forms of formal and informal opinion
text such as news articles, tweets (Twitter posts), forum discussions, blogs, and
Facebook posts, and all kinds of domains including social and political domains.
The content of this chapter is mainly taken from my book “Sentiment Analysis:
Mining Opinions, Sentiments, and Emotions” (Liu 2015).

2.1 Definition of Opinion
Sentiment analysis mainly studies opinions that express or imply positive or
negative sentiment. We define the problem in this context. We use the term opinion
as a broad concept that covers sentiment, evaluation, appraisal, or attitude, and its
associated information such as opinion target and the person who holds the opinion,
and use the term sentiment to mean only the underlying positive or negative feeling
implied by opinion. Due to the need to analyze a large volume of opinions, in
defining opinion we consider two levels of abstraction: a single opinion and a set
of opinions. In this section, we focus on defining a single opinion and describing
the tasks involved in extracting an opinion. Section 2.2 focuses on a set of opinions,
where we define opinion summary.

2.1.1 Opinion Definition
We use the following review (Review A) about a camera to introduce the problem
(an id number is associated with each sentence for easy reference):

Review A
Posted by John Smith
Date: September 10, 2011
(1) I bought a Canon G12 camera six months ago. (2) I simply love it. (3) The picture quality
is amazing. (4) The battery life is also long. (5) However, my wife thinks it is too heavy for
her.

From this review, we can make the following important observation:
Opinion, sentiment and target: Review A has several opinions with positive or
negative sentiment about the Canon G12 camera. Sentence (2) expresses a
positive sentiment about the Canon camera as a whole. Sentence (3) expresses
a positive sentiment about its picture quality. Sentence (4) expresses a positive
sentiment about its battery life. Sentence (5) expresses a negative sentiment about
the camera’s weight.
From these opinions, we can make a crucial observation about sentiment
analysis. That is, an opinion has two key components: a target g and a sentiment
s on the target, i.e., (g, s), where g can be any entity or aspect of the entity on
which an opinion has been expressed, and s can be a positive, negative, or neutral
sentiment, or a numeric rating. Positive, negative and neutral are called sentiment
or opinion orientations. For example, the target of the opinion in sentence (2) is
the Canon G12 camera, the target of the opinion in sentence (3) is the picture
quality of Canon G12, and the target of sentence (5) is the weight of Canon G12
(weight is indicated by heavy). Target is also called topic by some researchers.
Opinion holder: Review A contains opinions from two persons, who are called
opinion sources or opinion holders (Kim and Hovy 2004; Wiebe et al. 2005). The
holder of the opinions in sentences (2), (3), and (4) is the author of the review
(“John Smith”), but for sentence (5), it is the wife of the author.
Time of opinion: The date of the review was September 10, 2011. This date is useful
because one often wants to know how opinions change over time or the opinion
trend.
With this example, we can define opinion as a quadruple.
Definition 1 (Opinion) An opinion is a quadruple,
(g, s, h, t),
where g is the sentiment target, s is the sentiment of the opinion about the target g,
h is the opinion holder (the person or organization who holds the opinion), and t is
the time when the opinion is expressed.
The four components here are essential. It is generally problematic if any of
them is missing. For example, the time component is important in practice because
an opinion two years ago is not the same as an opinion today. Not having an opinion
holder is also problematic. For example, an opinion from a very important person
(e.g., the US President) is probably more important than that from the average Joe
on the street.
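To make Definition 1 concrete, the following is a minimal sketch of the quadruple as a Python data structure, filled in by hand for sentences (2) to (5) of Review A. The class name `Opinion`, its field names, and the string labels for sentiment are illustrative choices and not part of the definition. The sketch also assumes that the opinion in sentence (5) carries the posting date of the review, since the review gives no other date for it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """An opinion as the quadruple (g, s, h, t) of Definition 1."""
    target: str     # g: the sentiment target
    sentiment: str  # s: "positive", "negative", or "neutral"
    holder: str     # h: the opinion holder
    time: date      # t: when the opinion was expressed


posted = date(2011, 9, 10)

# Hand-labeled quadruples for sentences (2) to (5) of Review A.
review_a = [
    Opinion("Canon G12", "positive", "John Smith", posted),
    Opinion("picture quality of Canon G12", "positive", "John Smith", posted),
    Opinion("battery life of Canon G12", "positive", "John Smith", posted),
    Opinion("weight of Canon G12", "negative", "John Smith's wife", posted),
]

# The review contains opinions from two distinct holders.
holders = {op.holder for op in review_a}

# Only the opinion about weight is negative.
negative_targets = [op.target for op in review_a if op.sentiment == "negative"]
```

Keeping all four components in one record is what later allows opinions to be grouped by holder or tracked over time.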
One thing that we want to stress about the definition is that an opinion has a target.
Recognizing this is important for two reasons: First, in a sentence with multiple
targets, we need to identify the specific target for each positive or negative sentiment.
For example, “Apple is doing very well in this poor economy” has a positive
sentiment and a negative sentiment. The target for the positive sentiment is Apple
and the target for the negative sentiment is economy. Second, words or phrases such
as good, amazing, bad and poor that express sentiments (called sentiment or opinion
terms) and opinion targets often have some syntactic relations (Hu and Liu 2004;
Qiu et al. 2011; Zhuang et al. 2006), which allow us to design algorithms to extract
both sentiment terms and opinion targets, which are two core tasks of sentiment
analysis (see Sect. 2.1.6).
The opinion defined here is just one type of opinion, called a regular opinion
(e.g., “Coke tastes great”). Another type is comparative opinion (e.g., “Coke tastes
better than Pepsi”) which needs a different definition (Jindal and Liu 2006b; Liu
2006). Section 2.1.4 will further discuss different types of opinions. For the rest of
this section, we focus on only regular opinions, which, for simplicity, we will just
call opinions.

2.1.2 Sentiment Target
Definition 2 (Sentiment Target) The sentiment target, also known as the opinion
target, of an opinion is the entity or a part or attribute of the entity that the sentiment
has been expressed upon.
For example, in sentence (3) of Review A, the target is the picture quality of
Canon G12, although the sentence mentioned only the picture quality. The target is
not just the picture quality because without knowing that the picture quality belongs
to the Canon G12 camera, the opinion in the sentence is of little use.
An entity can be decomposed and represented hierarchically (Liu 2006).
Definition 3 (Entity) An entity e is a product, service, topic, person, organization,
issue or event. It is described with a pair, e: (T, W), where T is a hierarchy of parts,
sub-parts, and so on, and W is a set of attributes of e. Each part or sub-part also has
its own set of attributes.
For example, a particular camera model is an entity, e.g., Canon G12. It has a
set of attributes, e.g., picture quality, size, and weight, and a set of parts, e.g., lens,
viewfinder, and battery. Battery also has its own set of attributes, e.g., battery life
and battery weight. A topic can be an entity too, e.g., tax increase, with its subtopics or parts ‘tax increase for the poor,’ ‘tax increase for the middle class’ and
‘tax increase for the rich.’
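The pair (T, W) of Definition 3 can be sketched as a small recursive structure in which each node holds its own attribute set and its own parts. This is only an illustration: the class name `Node` and the method `targets` are hypothetical, and the Canon G12 parts and attributes are the ones named in the example above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    """An entity or one of its parts, with its own attribute set."""
    name: str
    attributes: Set[str] = field(default_factory=set)
    parts: List["Node"] = field(default_factory=list)

    def targets(self, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield every node and every (node, attribute) pair as a path from the root."""
        here = path + (self.name,)
        yield here
        for attribute in sorted(self.attributes):
            yield here + (attribute,)
        for part in self.parts:
            yield from part.targets(here)


canon_g12 = Node(
    "Canon G12",
    attributes={"picture quality", "size", "weight"},
    parts=[
        Node("lens"),
        Node("viewfinder"),
        Node("battery", attributes={"battery life", "battery weight"}),
    ],
)

# Every path is something an opinion could be expressed on.
all_targets = list(canon_g12.targets())
```

Each path in `all_targets`, such as `("Canon G12", "battery", "battery life")`, corresponds to one possible sentiment target in the sense of Definition 2.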

This definition describes an entity hierarchy based on the part-of relation. The
root node is the name of the entity, e.g., Canon G12 in Review A. All the other nodes
are parts and sub-parts, etc. An opinion can be expressed on any node and any
attribute of the node. For instance, in Review A, sentence (2) expresses a positive
opinion about the entity Canon G12 as a whole, and sentence (3) expresses a
positive opinion about the picture quality attribute of the camera. Clearly, we can
also express opinions about any part or component of the camera.
In the research literature, entities are also called objects, and attributes are also
called features (as in product features) (Hu and Liu 2004; Liu 2010). The terms
object and feature are not used in this Chapter because object can be confused with
the term object used in grammar and feature can be confused with feature used
in machine learning as data attribute. In recent years, the term aspect has become
popular, which covers both part and attribute (see Sect. 2.1.4).
Entities may be called other names in specific application domains. For example,
in politics, entities are usually political candidates, issues, and events. There is
no term that is perfect for all application domains. The term entity is chosen
because most current applications of sentiment analysis study opinions about
various forms of named entities, e.g., products, services, brands, organizations,
events, and people.

2.1.3 Sentiment and Its Intensity
Definition 4 (Sentiment) Sentiment is the underlying feeling, attitude, evaluation,
or emotion associated with an opinion. It is represented as a triple,
(y, o, i),
where y is the type of the sentiment, o is the orientation of the sentiment, and i is
the intensity of the sentiment.
Sentiment type: Sentiment can be classified into several types. There are linguistic-based, psychology-based, and consumer research-based classifications. Here I
choose to use a consumer research-based classification as it is simple and easy
to use in practice. Consumer research classifies sentiment broadly into two
categories: rational sentiment and emotional sentiment (Chaudhuri 2006).
Definition 5 (Rational Sentiment) Rational sentiments are from rational reasoning, tangible beliefs, and utilitarian attitudes. They express no emotions.
We also call opinions that express rational sentiment rational opinions. The
opinions in the following sentences imply rational sentiment: “The voice of this
phone is clear,” and “This car is worth the price.”
Definition 6 (Emotional Sentiment) Emotional sentiments are from non-tangible
and emotional responses to entities which go deep into people’s psychological state
of mind.

We also call opinions that express emotional sentiment emotional opinions.
The opinions in the following sentences imply emotional sentiment: “I love iPhone,”
“I am so angry with their service people,” “This is the best car ever” and “After our
team won, I cried.”
Emotional sentiment is stronger than rational sentiment, and is usually more
important in practice. For example, in marketing, to guarantee the success of a new
product in the market, the positive sentiment from a large population of consumers
has to reach the emotional level. Rational positive sentiment may not be sufficient.
Each of these broad categories can be further divided into smaller categories.
For example, there are many types of emotions, e.g., anger, joy, fear, and sadness.
We will discuss some possible sub-divisions of rational sentiment in Sect. 2.4.2 and
different emotions in Sect. 2.3. In applications, the user is also free to design their
own sub-categories.
Sentiment orientation: It can be positive, negative, or neutral. Neutral usually means
the absence of sentiment or no sentiment or opinion. Sentiment orientation is also
called polarity, semantic orientation, or valence in the research literature.
Sentiment intensity: Sentiment can have different levels of strength or intensity.
People often use two ways to express intensity of their feelings in text. The
first is to choose sentiment terms (words or phrases) with suitable strengths.
For example, good is weaker than excellent, and dislike is weaker than detest.
Sentiment words are words in a language that are often used to express positive
or negative sentiments. For example, good, wonderful, and amazing are positive
sentiment words, and bad, poor, and terrible are negative sentiment words.
The second is to use intensifiers and diminishers, which are terms that change
the degree of the expressed sentiment. An intensifier increases the intensity
of a positive/negative term, while a diminisher decreases the intensity of that
term. Common English intensifiers include very, so, extremely, dreadfully, really,
awfully, terribly, etc., and common English diminishers include slightly, pretty,
a little bit, a bit, somewhat, barely, etc.
Sentiment rating: In applications, we commonly use some discrete ratings to express
sentiment intensity. Five levels (e.g., 1–5 stars) are commonly employed, which
can be interpreted as follows based on the two types of sentiment in Definitions
5 and 6:
• emotional positive (+2 or 5 stars)
• rational positive (+1 or 4 stars)
• neutral (0 or 3 stars)
• rational negative (−1 or 2 stars)
• emotional negative (−2 or 1 star)

Clearly, it is possible to have more rating levels, which, however, become difficult
to differentiate based on the natural language text alone due to the highly subjective
nature and the fact that people’s spoken or written expressions may not fully match
with their psychological states of mind. For example, the sentence “This is an
excellent phone” expresses a rational evaluation of the phone, while “I love this
phone” expresses an emotional evaluation about the phone. However, whether they
represent completely different psychological states of mind of the authors is hard to
say. In practice, the above five levels are sufficient for most applications. If these five
levels are not enough in some applications, I suggest dividing emotional positive
(and, respectively, emotional negative) into two levels. Such applications are likely
to involve sentiment about personal, social or political events or issues, for which
people can be highly emotional.
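The five rating levels above follow mechanically from the sentiment type and orientation of Definition 4, which the sketch below makes explicit. The function names `rating` and `stars` and the string labels are hypothetical. The code only encodes the mapping and does not address the harder problem of deciding from text whether a sentiment is rational or emotional.

```python
def rating(kind: str, orientation: str) -> int:
    """Map a sentiment type and orientation to a level from -2 to +2."""
    if kind not in ("rational", "emotional"):
        raise ValueError(f"unknown sentiment type: {kind!r}")
    if orientation not in ("positive", "negative", "neutral"):
        raise ValueError(f"unknown orientation: {orientation!r}")
    if orientation == "neutral":
        return 0
    # Emotional sentiment is treated as stronger than rational sentiment.
    strength = 2 if kind == "emotional" else 1
    return strength if orientation == "positive" else -strength


def stars(level: int) -> int:
    """Convert a level from -2 to +2 into a 1-5 star rating."""
    if level not in range(-2, 3):
        raise ValueError(f"level out of range: {level}")
    return level + 3
```

For instance, `stars(rating("emotional", "positive"))` gives 5 and `stars(rating("rational", "negative"))` gives 2, matching the list above.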

2.1.4 Opinion Definition Simplified
Opinion as defined in Definition 1, although concise, may not be easy to use in
practice especially in the domain of online reviews of products, services, and brands.
Let us first look at the sentiment (or opinion) target. The central concept here is
entity, which is represented as a hierarchy with an arbitrary number of levels. This
can be too complex for practical applications because NLP is a very difficult task.
Recognizing parts and attributes of an entity at different levels of details is extremely
hard. Most applications also do not need such a complex analysis. Thus, we simplify
the hierarchy to two levels and use the term aspect to denote both part and attribute.
In the simplified tree, the root node is still the entity itself and the second level (also
the leaf level) nodes are different aspects of the entity.
The definition of sentiment in Definition 4 can be simplified too. In many
applications, positive (denoted by +1), negative (denoted by −1) and neutral
(denoted by 0) orientations alone are already enough. In almost all applications,
5 levels of ratings are sufficient, e.g., 1–5 stars. In both cases, sentiment can be
represented with a single value. The other two components in the triple can be folded
into this value.
This simplified framework is what is typically used in practical sentiment
analysis systems. We now redefine the concept of opinion (Hu and Liu 2004; Liu
2010).
Definition 7 (Opinion) An opinion is a quintuple,
(e, a, s, h, t),
where e is the target entity, a is the target aspect of entity e on which the opinion
has been expressed, s is the sentiment of the opinion on aspect a of entity e, h is
the opinion holder, and t is the opinion posting time. s can be positive, negative,
or neutral, or a rating (e.g., 1–5 stars). When an opinion is only on the entity as
a whole, the special aspect GENERAL is used to denote it. Here, e and a together
represent the opinion target.
Sentiment analysis (or opinion mining) based on this definition is often called
aspect-based sentiment analysis, or feature-based sentiment analysis as it was called
earlier in (Hu and Liu 2004; Liu 2010).
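As an illustration of Definition 7, the opinions in Review A can be written as quintuples with a single numeric sentiment value. The sketch below is hand-labeled, not the output of any system. The class and field names are assumptions made for the example, as is the choice to give the opinion in sentence (5) the posting date of the review.

```python
from dataclasses import dataclass
from datetime import date

GENERAL = "GENERAL"  # special aspect: the opinion is on the entity as a whole


@dataclass(frozen=True)
class Opinion:
    """An opinion as the quintuple (e, a, s, h, t) of Definition 7."""
    entity: str     # e
    aspect: str     # a
    sentiment: int  # s: +1 positive, 0 neutral, -1 negative
    holder: str     # h
    time: date      # t


posted = date(2011, 9, 10)

# Hand-labeled quintuples for sentences (2) to (5) of Review A.
review_a = [
    Opinion("Canon G12", GENERAL, +1, "John Smith", posted),
    Opinion("Canon G12", "picture quality", +1, "John Smith", posted),
    Opinion("Canon G12", "battery life", +1, "John Smith", posted),
    Opinion("Canon G12", "weight", -1, "John Smith's wife", posted),
]

# Sentiment per aspect, readable directly from the flat representation.
sentiment_by_aspect = {op.aspect: op.sentiment for op in review_a}
```

Because the entity and the aspect are separate fields, grouping by aspect is a simple lookup rather than a traversal of an entity hierarchy.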

We should note that due to the simplification, the quintuple representation of
opinion may result in information loss. For example, ink is a part of printer. A
printer review might say “The ink of this printer is expensive.” This sentence does
not say that the printer is expensive (expensive here indicates the aspect price). If
one does not care about any attribute of the ink, this sentence just gives a negative
opinion about the ink (which is an aspect of the printer entity). This results in
information loss. However, if one also wants to study opinions about different
aspects of the ink, then the ink needs to be treated as a separate entity. The quintuple
representation still applies, but an extra mechanism will be required to record the
part-of relationship between ink and printer. Of course, conceptually we can also
extend the flat quintuple relation to a nested relation to make it more expressive.
However, as we explained above, too complex a definition can make the problem
extremely difficult to solve in practice. Despite this limitation, Definition 7 does
cover the essential information of an opinion sufficiently for most applications.
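The chapter does not specify the extra mechanism for recording the part-of relationship. One possible sketch, under that caveat, is a side table that maps a part treated as its own entity to its parent entity. The names `PART_OF`, `ancestors`, and `opinions_under` are hypothetical, and the opinion tuples omit holder and time for brevity.

```python
from typing import Dict, List, Tuple

# (entity, aspect, sentiment); holder and time are omitted for brevity.
Opinion = Tuple[str, str, int]

# Hypothetical side table: part treated as an entity -> its parent entity.
PART_OF: Dict[str, str] = {"ink": "printer"}

opinions: List[Opinion] = [
    ("ink", "price", -1),  # "The ink of this printer is expensive."
]


def ancestors(entity: str) -> List[str]:
    """Follow part-of links upward; assumes the table contains no cycles."""
    chain = []
    while entity in PART_OF:
        entity = PART_OF[entity]
        chain.append(entity)
    return chain


def opinions_under(root: str, all_opinions: List[Opinion]) -> List[Opinion]:
    """Return opinions on `root` itself or on any entity recorded as part of it."""
    return [
        op for op in all_opinions
        if op[0] == root or root in ancestors(op[0])
    ]
```

With this table, a query for the printer still finds the opinion about the price of the ink, while the quintuple itself stays flat.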
In some applications, it may not be easy to distinguish entity and aspect or there
is no need to distinguish them. Such cases often occur when people discuss political
or social issues, e.g., “I hate property tax increases.” We may deal with them in two
ways. First, since the author regards ‘property tax increase’ as a general issue that
does not belong to any specific entity, we can treat it as an entity with the aspect
GENERAL. Second, we can regard ‘property tax’ as an entity and ‘property tax
increases’ as one of its aspects to form a hierarchical relationship. Whether to treat
an issue/topic as an aspect or an entity can also depend on the specific context.
For example, in commenting about a local government, one says “I hate the
proposed property tax increase.” Since it is the local government that imposes and
levies property taxes, the specific local government may be regarded as an entity
and ‘the proposed property tax increase’ as one of its aspects.
Not all applications need all five components of an opinion. In some applications,
the user may not need the aspect information. For example, in brand management,
the user typically is interested in only opinions about product brands (entities). This
is sometimes called entity-based sentiment analysis. In some other applications,
the user may not need to know the opinion holder or time of opinion. Then these
components can be ignored.

2.1.5 Reason and Qualifier for Opinion
We can in fact perform an even finer-grained analysis of opinions. Let us use the
sentence “This car is too small for a tall person” to explain. It expresses a negative
sentiment about the size aspect of the car. However, only reporting the negative
sentiment for size does not tell the whole story because it can mean too small or too
big. In the above sentence, we call “too small” the reason for the negative sentiment
about size. Furthermore, the sentence does not say that the car is too small for
everyone, but only for a tall person. We call “for a tall person” the qualifier of
the opinion. We now define these concepts.

Definition 8 (Reason for Opinion) A reason for an opinion is the cause of the
opinion.
In practical applications, discovering the reasons for each positive or negative
opinion can be very important because it may be these reasons that enable one to
perform actions to remedy the situation. For example, the sentence “I do not like the
picture quality of this camera” is not as useful as “I do not like the picture quality of
this camera because the pictures are quite dark.” The first sentence does not give the
reason for the negative sentiment about the picture quality and it is thus difficult to
know what to do to improve the picture quality. The second sentence is more informative because it gives the reason or cause for the negative sentiment. The camera
manufacturer can make use of this piece of information to improve the picture quality of the camera. In most industrial applications, such reasons are called problems
or issues. Knowing the issues allows businesses to find ways to address them.
Definition 9 (Qualifier of Opinion) A qualifier of an opinion limits or modifies
the meaning of the opinion.
Knowing the qualifier is also important in practice because it tells what the
opinion is good for. For example, “This car is too small for a tall person” does
not say that the car is too small for everyone, but just for tall people. For a person
who is not tall, this opinion does not apply.
However, as we have seen, not every opinion comes with an explicit reason
and/or an explicit qualifier. “The picture quality of this camera is not great” does not
have a reason or a qualifier. “The picture quality of this camera is not good for night
shots” has a qualifier “for night shots,” but does not give a specific reason for the
negative sentiment. “The picture quality of this camera is not good for night shots
as the pictures are quite dark” has a reason for the negative sentiment (‘the pictures
are quite dark’) and also a qualifier (‘for night shots’). Sometimes, the qualifier and
the reason may not be in the same sentence and/or may be quite implicit, e.g., “The
picture quality of this camera is not great. Pictures of night shots are very dark”
and “I am 6 feet 5 inches tall. This car is too small for me.” An expression can also
serve multiple purposes. For example, ‘too small’ in the above sentence indicates
the size aspect of the car, a negative sentiment about the size, and also the reason
for the negative sentiment/opinion.
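The reason and qualifier of Definitions 8 and 9 can be represented as two optional fields attached to an opinion, since not every opinion carries them. The sketch below labels the example sentences from this section by hand. The class name `QualifiedOpinion` and its field names are illustrative, and holder and time are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedOpinion:
    entity: str
    aspect: str
    sentiment: int                   # +1, 0, or -1
    reason: Optional[str] = None     # Definition 8: the cause of the opinion
    qualifier: Optional[str] = None  # Definition 9: limits the opinion's scope


examples = [
    # "The picture quality of this camera is not great."
    QualifiedOpinion("camera", "picture quality", -1),
    # "The picture quality of this camera is not good for night shots."
    QualifiedOpinion("camera", "picture quality", -1,
                     qualifier="for night shots"),
    # "... is not good for night shots as the pictures are quite dark."
    QualifiedOpinion("camera", "picture quality", -1,
                     reason="the pictures are quite dark",
                     qualifier="for night shots"),
    # "This car is too small for a tall person."
    QualifiedOpinion("car", "size", -1,
                     reason="too small",
                     qualifier="for a tall person"),
]

# Opinions that state a reason are the ones a manufacturer can act on.
actionable = [op for op in examples if op.reason is not None]
```

Here only the last two examples end up in `actionable`, which mirrors the point that an opinion without a reason says little about what to fix.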

2.1.6 Objective and Tasks of Sentiment Analysis
With the definitions in Sects. 2.1.1, 2.1.2, 2.1.3 and 2.1.4, we can now present the
core objective and the key tasks of (aspect-based) sentiment analysis.
Objective of Sentiment Analysis Given an opinion document d, discover all
opinion quintuples (e, a, s, h, t) in d. For more advanced analysis, discover the
reason and qualifier for the sentiment in each opinion quintuple.

Key Tasks of Sentiment Analysis The key tasks of sentiment analysis can
be derived from the five components of the quintuple (Definition 7). The first
component is the entity and the first task is to extract entities. The task is similar
to named entity recognition (NER) in information extraction (Hobbs and Riloff
2010; Sarawagi 2008). However, as defined in Definition 3, an entity can also be
an event, issue, or topic, which is usually not a named entity. For example, in “I
hate tax increase,” the entity is ‘tax increase,’ which is an issue or topic. In such
cases, entity extraction is basically the same as aspect extraction and the difference
between entity and aspect becomes blurry. In some applications, there may not be a
need to distinguish them.
After extraction, we need to categorize the extracted entities as people often write
the same entity in different ways. For example, Motorola may be written as Mot,
Moto, and Motorola. We need to recognize that they all refer to the same entity (see
(Liu 2015) for details).
Definition 10 (Entity Category and Entity Expression) An entity category
represents a unique entity, while an entity expression or mention is an actual word
or phrase that indicates an entity category in the text.
Each entity or entity category should have a unique name in a particular
application. The process of grouping or clustering entity expressions into entity
categories is called entity resolution or grouping.
For aspects of entities, the problem is basically the same as for entities. For
example, picture, image, and photo refer to the same aspect for cameras. We thus
need to extract aspect expressions and resolve them.
Definition 11 (Aspect Category and Aspect Expression) An aspect category of
an entity represents a unique aspect of the entity, while an aspect expression or
mention is an actual word or phrase that indicates an aspect category in the text.
Each aspect or aspect category should also have a unique name in a particular
application. The process of grouping aspect expressions into aspect categories
(aspects) is called aspect resolution or grouping.
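The grouping step in Definitions 10 and 11 can be sketched as a lookup from expressions to categories. This is only a minimal illustration under stated assumptions, not a method from this chapter: the table EXPRESSION_TO_CATEGORY, the category name "picture", and the function resolve are hypothetical, and a practical system would usually learn such groupings rather than hand-code them.

```python
from collections import defaultdict

# Hypothetical hand-built mapping from expressions (mentions) to categories.
# The entries are taken from the examples in the text; the category names
# are assumptions made for this sketch.
EXPRESSION_TO_CATEGORY = {
    "mot": "Motorola",
    "moto": "Motorola",
    "motorola": "Motorola",
    "picture": "picture",
    "image": "picture",
    "photo": "picture",
}

def resolve(expressions):
    """Group expressions by category; unknown expressions go under None."""
    groups = defaultdict(list)
    for expr in expressions:
        category = EXPRESSION_TO_CATEGORY.get(expr.lower())
        groups[category].append(expr)
    return dict(groups)

groups = resolve(["Mot", "Moto", "Motorola", "photo", "image", "zoom"])
```

Expressions that the table does not know, such as "zoom" here, are kept under the key None so that they can be reviewed rather than silently dropped.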
Aspect expressions are usually nouns and noun phrases but can also be verbs,
verb phrases, adjectives, and adverbs. They can also be explicit or implicit (Hu and
Liu 2004).
Definition 12 (Explicit Aspect Expression) Aspect expressions that appear in an
opinion text as nouns and noun phrases are called explicit aspect expressions.
For example, ‘picture quality’ in “The picture quality of this camera is great” is
an explicit aspect expression.
Definition 13 (Implicit Aspect Expression) Aspect expressions that are not nouns
or noun phrases but indicate some aspects are called implicit aspect expressions.
For example, expensive is an implicit aspect expression in “This camera is
expensive.” It implies the aspect price. Many implicit aspect expressions are
adjectives and adverbs used to describe or qualify some specific aspects, e.g.,
expensive (price), and reliably (reliability). They can also be verbs and verb phrases,
e.g., in “I can install the software easily,” install indicates the aspect installation.

2 Many Facets of Sentiment Analysis

Implicit aspect expressions are not just individual adjectives, adverbs, verbs and
verb phrases; they can be very complex. For example, in “This camera will not easily
fit in my pocket,” ‘fit in my pocket’ indicates the aspect size (and/or shape). In the
sentence “This restaurant closes too early,” ‘closes too early’ indicates the aspect
of closing time of the restaurant. In both cases, some commonsense knowledge may
be needed to recognize them.
Aspect extraction is a very challenging problem, especially when it involves
verbs and verb phrases. In some cases, it is even very hard for human beings to
recognize and to annotate. For example, in a vacuum cleaner review, one wrote
“The vacuum cleaner does not get the crumbs out of thick carpets,” which seems to
describe only one very specific aspect, ‘get the crumbs out of thick carpets.’ But in
practice, it may be more useful to decompose it into three different aspects indicated
by (1) ‘get something out of,’ (2) crumbs, and (3) ‘thick carpets.’ (1) represents the
suction power of the vacuum cleaner in general, (2) represents suction related to
crumbs, and (3) represents suction related to ‘thick carpets.’ All three are important
and useful because the user may be interested in knowing whether the vacuum can
suck crumbs, and whether it works well with thick carpets.
The third component in the opinion definition is the sentiment. For this, we
need to perform sentiment classification or regression to determine the sentiment
orientation or score on the involved aspect and/or entity. The fourth and fifth
components are the opinion holder and the opinion posting time, respectively. Like
entities and aspects, they also have expressions and categories; I will not repeat the
definitions. Note that opinion holders (Bethard et al. 2004; Choi et al. 2005; Kim
and Hovy 2004) are also called opinion sources (Wiebe et al. 2005).
Based on the above discussions, we can now define a model of entity and a model
of opinion document (Liu 2006) and summarize the main sentiment analysis tasks.
Model of Entity An entity e is represented by itself as a whole and a finite set of
its aspects $A = \{a_1, a_2, \ldots, a_n\}$. e can be expressed in text with any one of a finite
set of its entity expressions $\{ee_1, ee_2, \ldots, ee_s\}$. Each aspect $a \in A$ of entity e can be
expressed with any one of its finite set of aspect expressions $\{ae_1, ae_2, \ldots, ae_m\}$.
Model of Opinion Document An opinion document d contains opinions about a
set of entities $\{e_1, e_2, \ldots, e_r\}$ and a subset of aspects of each entity. The opinions
are from a set of opinion holders $\{h_1, h_2, \ldots, h_p\}$ and are given at a particular time
point t.
Given a set of opinion documents D, sentiment analysis performs the following
eight (8) main tasks:
Task 1 (entity extraction and resolution): Extract all entity expressions in D, and
group synonymous entity expressions into entity clusters (or categories). Each
entity expression cluster refers to a unique entity e.
Task 2 (aspect extraction and resolution): Extract all aspect expressions of the
entities, and group these aspect expressions into clusters. Each aspect expression
cluster of entity e represents a unique aspect a.

Task 3 (opinion holder extraction and resolution): Extract the holder expression
of each opinion from the text or structured data and group them. The task is
analogous to tasks 1 and 2.
Task 4 (time extraction and standardization): Extract the posting time of each
opinion and standardize different time formats.
Task 5 (aspect sentiment classification or regression): Determine whether an opinion about an aspect a (or entity e) is positive, negative or neutral (classification),
or assign a numeric sentiment rating score to the aspect (or entity) (regression).
Task 6 (opinion quintuple generation): Produce all opinion quintuples (e, a, s, h, t)
expressed in D based on the results from tasks 1–5. This task is seemingly very
simple but it is in fact quite difficult in many cases as Review B below shows.
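As an illustration of Task 4, posting times written in different formats can be mapped to a single representation. The sketch below is an assumption-laden example rather than a prescribed method: the FORMATS list and the function standardize_date are hypothetical, only a handful of formats are covered, and month names are assumed to be English (the default C locale).

```python
from datetime import date, datetime

# Candidate input formats; this list is an assumption made for the sketch.
FORMATS = ["%b %d, %Y", "%B %d, %Y", "%Y-%m-%d", "%d/%m/%Y"]

def standardize_date(raw):
    """Return a datetime.date for `raw`, or None if no known format matches."""
    # Drop periods and rewrite the abbreviation "Sept", which %b does not accept.
    cleaned = raw.strip().replace(".", "").replace("Sept ", "Sep ")
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None

posted = standardize_date("Sept. 15, 2011")
```

Returning None for unrecognized strings keeps the failure visible, so that unparsed time expressions can be handled separately.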
For more advanced analysis, we also need to perform the following two
additional tasks, which are analogous to task 2:
Task 7 (opinion reason extraction and resolution): Extract reason expressions for
each opinion, and group all reason expressions for each aspect or entity and each
sentiment orientation into clusters. Each cluster for an aspect (or entity) and a
sentiment orientation represents a unique reason for the aspect (or entity) and the
orientation.
Task 8 (opinion qualifier extraction and resolution): Extract qualifier expressions
for each opinion, and group all qualifier expressions for each aspect (or entity)
and each sentiment orientation into clusters. Each cluster for an aspect (or entity)
and a sentiment orientation represents a unique qualifier for the aspect (or entity)
and the orientation.
Although reasons for and qualifiers of opinions are useful, their extraction and
categorization are very challenging. Little research has been done on them so far.
We use an example review to illustrate the tasks (a sentence id is again associated
with each sentence) and the mining results.
Review B
Posted by: bigJohn
Date: Sept. 15, 2011
(1) I bought a Samsung camera and my friend bought a Canon camera yesterday. (2) In the
past week, we both used the cameras a lot. (3) The photos from my Samy are not clear for
night shots, and the battery life is short too. (4) My friend was very happy with his camera
and loves its picture quality. (5) I want a camera that can take good photos. (6) I am going
to return it tomorrow.

Task 1 should extract the entity expressions, Samsung, Samy, and Canon, and
group Samsung and Samy together because they represent the same entity. Task 2
should extract aspect expressions picture, photo, and battery life, and group picture
and photo together as they are synonyms for cameras. Task 3 should find that the
holder of the opinions in sentence (3) is bigJohn (the blog author) and that the holder
of the opinions in sentence (4) is bigJohn’s friend. Task 4 should find that the time
when the blog was posted is Sept-15-2011. Task 5 should find that sentence (3)
gives a negative opinion to the picture quality of the Samsung camera and a negative
opinion also to its battery life. Sentence (4) gives a positive opinion to the Canon
camera as a whole and also to its picture quality. Sentence (5) seemingly expresses
a positive opinion, but it does not. To generate opinion quintuples for sentence (4)
we need to know what ‘his camera’ and ‘its’ refer to. Task 6 should finally generate
the following opinion quintuples:
1. (Samsung, picture_quality, negative, bigJohn, Sept-15-2011)
2. (Samsung, battery_life, negative, bigJohn, Sept-15-2011)
3. (Canon, GENERAL, positive, bigJohn’s_friend, Sept-15-2011)
4. (Canon, picture_quality, positive, bigJohn’s_friend, Sept-15-2011)

With more advanced mining and analysis, we also find the reasons and qualifiers
of opinions. None below means unspecified.
1. (Samsung, picture_quality, negative, bigJohn, Sept-15-2011)
   Reason for opinion: picture not clear
   Qualifier of opinion: night shots
2. (Samsung, battery_life, negative, bigJohn, Sept-15-2011)
   Reason for opinion: short battery life
   Qualifier of opinion: none
3. (Canon, GENERAL, positive, bigJohn’s_friend, Sept-15-2011)
   Reason for opinion: none
   Qualifier of opinion: none
4. (Canon, picture_quality, positive, bigJohn’s_friend, Sept-15-2011)
   Reason for opinion: none
   Qualifier of opinion: none
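The mining results for Review B can be held in a small record type. The following is a minimal sketch, assuming a hypothetical Opinion class whose first five fields follow the quintuple (e, a, s, h, t) and whose optional reason and qualifier fields cover the more advanced analysis; the field names are illustrative choices, not part of the definitions.

```python
from typing import NamedTuple, Optional

class Opinion(NamedTuple):
    entity: str
    aspect: str                      # "GENERAL" denotes the entity as a whole
    sentiment: str
    holder: str
    time: str
    reason: Optional[str] = None     # None means unspecified
    qualifier: Optional[str] = None  # None means unspecified

# The four opinions mined from Review B, as listed in the text.
review_b = [
    Opinion("Samsung", "picture_quality", "negative", "bigJohn", "Sept-15-2011",
            reason="picture not clear", qualifier="night shots"),
    Opinion("Samsung", "battery_life", "negative", "bigJohn", "Sept-15-2011",
            reason="short battery life"),
    Opinion("Canon", "GENERAL", "positive", "bigJohn's_friend", "Sept-15-2011"),
    Opinion("Canon", "picture_quality", "positive", "bigJohn's_friend", "Sept-15-2011"),
]

# Example query: which aspects received negative opinions?
negative_aspects = [(o.entity, o.aspect) for o in review_b if o.sentiment == "negative"]
```

Keeping reason and qualifier optional lets the same record serve both the core quintuple and the extended form.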

2.2 Definition of Opinion Summary
Unlike facts, opinions are subjective (although they may not be all expressed
in subjective sentences). An opinion from a single opinion holder is usually not
sufficient for action. In almost all applications, the user needs to analyze opinions
from a large number of opinion holders. This tells us that some form of summary
of opinions is necessary. The question is what an opinion summary should be. On
the surface, an opinion summary is just like a multi-document summary because we
need to summarize multiple opinion documents, e.g., reviews. It is, however, very
different from a traditional multi-document summary. Although there are informal
descriptions of what a traditional multi-document summary should be, it has never
been formally defined. A traditional multi-document summary is often just “defined”
operationally based on each specific algorithm that produces the summary. Thus
different algorithms produce different kinds of summaries. The resulting summaries
are also hard to evaluate. An opinion summary in its core form, on the other hand,
can be defined precisely based on the quintuple definition of opinion and easily
evaluated. That is, all opinion summarization algorithms should aim to produce the
same summary. Although they may still produce different final summaries, that is
due to their different accuracies. This core form of opinion summary is called the
aspect-based opinion summary (or feature-based opinion summary) (Hu and Liu
2004; Liu et al. 2005).
Definition 11 (Aspect-Based Opinion Summary) The aspect-based opinion summary about an entity e is of the following form:
GENERAL:  number of opinion holders who are positive about entity e
          number of opinion holders who are negative about entity e
Aspect 1: number of opinion holders who are positive about aspect 1 of entity e
          number of opinion holders who are negative about aspect 1 of entity e
...
Aspect n: number of opinion holders who are positive about aspect n of entity e
          number of opinion holders who are negative about aspect n of entity e
where GENERAL represents the entity e itself and n is the total number of aspects
of e.
The key features of this opinion summary definition are that it is based on positive
and negative opinions about each entity and its aspects and that it is quantitative.
The quantitative perspective is reflected by the numbers of positive and negative
opinions. In an application, the number counts can also be replaced by percentages.
The quantitative perspective is especially important in practice. For example, 20%
of people being positive about a product is very different from 80% of people
being positive about the product.
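The summary form defined above can be computed directly from opinion quintuples by counting distinct opinion holders per aspect and orientation. The sketch below is one possible reading of the definition, not the algorithm of any cited paper; the functions summarize and positive_share, the entity names, and the toy data are all hypothetical.

```python
from collections import defaultdict

def summarize(opinions, entity):
    """Count distinct positive and negative opinion holders per aspect of `entity`."""
    holders = defaultdict(set)
    for e, a, s, h, t in opinions:
        if e == entity and s in ("positive", "negative"):
            holders[(a, s)].add(h)
    summary = {}
    for (a, s), hs in holders.items():
        summary.setdefault(a, {"positive": 0, "negative": 0})[s] = len(hs)
    return summary

def positive_share(counts):
    """Fraction of holders who are positive, or None when there are no opinions."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total if total else None

# Made-up quintuples (e, a, s, h, t) used only to exercise the functions.
opinions = [
    ("camera1", "GENERAL", "positive", "h1", "t1"),
    ("camera1", "GENERAL", "positive", "h2", "t1"),
    ("camera1", "GENERAL", "negative", "h3", "t1"),
    ("camera1", "battery_life", "negative", "h1", "t1"),
    ("camera1", "battery_life", "negative", "h1", "t2"),  # same holder, counted once
    ("camera2", "GENERAL", "positive", "h4", "t1"),
]
summary = summarize(opinions, "camera1")
```

Counting holders in a set rather than counting opinions follows the wording of the definition ("number of opinion holders"); an application that counts reviews or sentences instead would change only the aggregation step.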
To illustrate this form of summary, we summarize a set of reviews of a digital
camera, called digital camera 1, in Figure 2.1. This is called a structured summary
in contrast to a traditional text summary of a short document generated from one
or multiple long documents. In the figure, 105 reviews expressed positive opinions
about the camera itself denoted by GENERAL and 12 expressed negative opinions.
Picture quality and battery life are two camera aspects. 75 reviews expressed
positive opinions about the picture quality, and 42 expressed negative opinions.

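Digital Camera 1:
  Aspect: GENERAL
    Positive: 105   [individual review sentences]
    Negative: 12    [individual review sentences]
  Aspect: Picture quality
    Positive: 75    [individual review sentences]
    Negative: 42    [individual review sentences]
  Aspect: Battery life
    Positive: 50    [individual review sentences]
    Negative: 9     [individual review sentences]
  …
Fig. 2.1 An aspect-based opinion summary
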
We also added an “individual review sentences” entry to each count, which can be a link pointing to the
sentences and/or the whole reviews that contain the opinions (Hu and Liu 2004;
Liu et al. 2005). With this summary, one can easily see how existing customers feel
about the camera. If one is interested in a particular aspect and additional details,
one can drill down by following the “individual review sentences” link to see the
actual opinion sentences or reviews.
In a more advanced analysis, we can also summarize opinion reasons and qualifiers in a similar way. Based on my experience, qualifiers for opinion statements
are rare, but reasons for opinions are quite common. To perform the task, we need
another level of summary. For example, in the example of Figure 2.1, we may want
to summarize the reasons for the poor picture quality based on the sentences behind
the corresponding “individual review sentences” link. We may find that 35 people say the pictures are not
bright enough and 7 people say that the pictures are blurry.
Based on the idea of aspect-based summary, researchers have proposed many
opinion summarization algorithms, and also extended this form of summary to some
other more specialized forms (Liu 2015).

2.3 Affect, Emotion, and Mood
Affect, emotion, and mood have been studied extensively in several fields, e.g.,
psychology, philosophy, and sociology. However, investigations in these fields are
seldom concerned with the language expressions used to express such feelings.
Their main concerns are people’s psychological states of mind: theorizing what
affect, emotion, and mood are, what constitutes a basic emotion, what physiological
reactions occur (e.g., changes in heart rate, blood pressure, and sweating),
what facial expressions, gestures, and postures accompany them, and measuring and investigating
the impact of such mental states. These mental states have also been exploited
extensively in application areas such as marketing, economics, and education.
However, even with such extensive research, understanding these concepts is still
slippery and confusing because different theorists often have somewhat different
definitions for them and do not even completely agree with each other about what
emotion, mood, and affect are. For example, diverse theorists have
proposed that there are from two to twenty basic human emotions, and some do not even
believe there is such a thing as basic emotions (Ortony and Turner 1990).
In most cases, emotion and affect are regarded as synonymous, and indeed, all three
terms are sometimes used interchangeably. Affect is also used as an encompassing
term covering all topics related to emotion, feeling, and mood. To make matters
worse, in applications, researchers and practitioners use these concepts loosely in
whatever way they like without following any established definitions. Thus
one is often left puzzled by just what an author means when the word emotion,
mood, or affect is used. In most cases, the definition of each term also uses one or
more of the other terms, resulting in circular definitions, which causes further confusion. The good news for natural language processing researchers and practitioners
is that in practical applications of sentiment analysis, we needn’t be too concerned
with such an unsettled state of affairs because in practice we can pick and use
whatever emotion or mood states are suitable for the applications at hand.
This section first tries to create a reasonable understanding of these concepts
and their relationships for our tasks of natural language processing in general and
sentiment analysis in particular. It then puts these three concepts in the context of
sentiment analysis and discusses how they can be handled in sentiment analysis.

2.3.1 Affect, Emotion, and Mood in Psychology
We start the discussion with the dictionary definitions of affect, emotion, and mood.¹
The concept of feeling is also included as all three concepts are about human
feelings. From the definitions, we can see how difficult it is to explain or to articulate
these concepts:
• Affect: Feeling or emotion, especially as manifested by facial expression or body
language.
• Emotion: A mental state that arises spontaneously rather than through conscious
effort and is often accompanied by physiological changes.
• Mood: A state of mind or emotion.
• Feeling: An affective state of consciousness, such as that resulting from emotions, sentiments, or desires.
These definitions are confusing from a scientific point of view because we do not
see a clear demarcation for each concept. We turn to the field of psychology to look
for a better definition for each of them. The convergence of views and ideas among
theorists in the past twenty years gives us a workable classification scheme.
An affect is commonly defined as a neurophysiological state consciously
accessible as the simplest raw (nonreflective) feeling evident in moods and emotions
(Russell 2003). The key point here is that such a feeling is primitive and not directed
at an object. For example, suppose you are watching a scary movie. If you are affected,
it moves you and you experience a feeling of being scared. Your mind further
processes this feeling and expresses it to yourself and the world around you. The
feeling is then displayed as an emotion, such as crying, shock, or screaming.
Emotion is thus the indicator of affect. Due to cognitive processing, emotion is a
compound (rather than primitive) feeling concerned with a specific object, such as
a person, an event, a thing, or a topic. It tends to be intense and focused and lasts a
short period of time. Mood, like emotion, is a feeling or affective state but it typically
lasts longer than emotion and tends to be more unfocused and diffused. Mood is also
less intense than emotion. For example, you may wake up feeling happy and stay
that way for most of the day.

¹ http://www.thefreedictionary.com/subjective


In short, emotions are quick and tense, while moods are more diffused and
prolonged feelings. For example, we can get very angry very quickly, but it is
difficult to stay very angry for a long time. The anger emotion may subside into an
irritable mood that can last quite a long time. An emotion is usually very specific,
triggered by noticeable events, which means that an emotion has a specific target.
In this sense, emotion is like a rational opinion. On the other hand, a mood can be
caused by multiple events, and sometimes it may not have any specific targets or
causes. Mood typically also has a dimension of future expectation. It can involve a
structured set of beliefs about general expectations of a future experience of pleasure
or pain, or of positive or negative affect in the future (Batson et al. 1992).
Since sentiment analysis is not so much concerned with affect as defined above,
below we focus only on emotion and mood in the psychological context. Let us start
with emotion. Emotion has been frequently mentioned in sentiment analysis. Since
it has a target or an involved entity, it fits the sentiment analysis context naturally.
Almost all applications are interested in opinions and emotions about some target
entities.
Theorists in psychology have grouped emotions into categories. However, as we
mentioned earlier, there is still no agreed set of basic (or primary) emotions among
theorists. Ortony and Turner (1990) compiled the basic emotions proposed by several
theorists to show that there is a great deal of disagreement. We reproduce
them in Table 2.1.
Parrott (2001) proposed, in addition to the basic emotions, secondary and tertiary
emotions (see Table 2.2). These secondary and tertiary emotions are useful in some
Table 2.1 Basic emotions from different theorists
Source                             Basic emotions
Arnold (1960)                      Anger, aversion, courage, dejection, desire, despair, fear, hate, hope, love, sadness
Ekman et al. (1982)                Anger, disgust, fear, joy, sadness, surprise
Gray (1982)                        Anxiety, joy, rage, terror
Izard (1971)                       Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise
James (1884)                       Fear, grief, love, rage
McDougall (1926)                   Anger, disgust, elation, fear, subjection, tender-emotion, wonder
Mowrer (1960)                      Pain, pleasure
Oatley and Johnson-Laird (1987)    Anger, disgust, anxiety, happiness, sadness
Panksepp (1982)                    Expectancy, fear, rage, panic
Plutchik (1980)                    Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise
Tomkins (1984)                     Anger, interest, contempt, disgust, distress, fear, joy, shame, surprise
Watson (1930)                      Fear, love, rage
Weiner and Graham (1984)           Happiness, sadness
Parrott (2001)                     Anger, fear, joy, love, sadness, surprise

Table 2.2 Primary, Secondary and Tertiary emotions from Parrott (2001)
Primary emotion   Secondary emotion     Tertiary emotion
Anger             Disgust               Contempt, loathing, revulsion
                  Envy                  Jealousy
                  Exasperation          Frustration
                  Irritability          Aggravation, agitation, annoyance, crosspatch, grouchy, grumpy
                  Rage                  Anger, bitter, dislike, ferocity, fury, hatred, hostility, outrage, resentment, scorn, spite, vengefulness, wrath
                  Torment               Torment
Fear              Horror                Alarm, fear, fright, horror, hysteria, mortification, panic, shock, terror
                  Nervousness           Anxiety, apprehension (fear), distress, dread, suspense, uneasiness, worry
Joy               Cheerfulness          Amusement, bliss, gaiety, glee, jolliness, joviality, joy, delight, enjoyment, gladness, happiness, jubilation, elation, satisfaction, ecstasy, euphoria
                  Contentment           Pleasure
                  Enthrallment          Enthrallment, rapture
                  Optimism              Eagerness, hope
                  Pride                 Triumph
                  Relief                Relief
                  Zest                  Enthusiasm, excitement, exhilaration, thrill, zeal
Love              Affection             Adoration, attractiveness, caring, compassion, fondness, liking, sentimentality, tenderness
                  Longing               Longing
                  Lust/sexual desire    Desire, infatuation, passion
Sadness           Disappointment        Dismay, displeasure
                  Neglect               Alienation, defeatism, dejection, embarrassment, homesickness, humiliation, insecurity, insult, isolation, loneliness, rejection
                  Sadness               Depression, despair, gloom, glumness, grief, melancholy, misery, sorrow, unhappy, woe
                  Shame                 Guilt, regret, remorse
                  Suffering             Agony, anguish, hurt
                  Sympathy              Pity, sympathy
Surprise          Surprise              Amazement, astonishment

sentiment analysis applications because the set of basic emotions may not be fine-grained enough. For example, in one of the applications that I worked on, the client
was interested in detecting optimism in the financial market. Optimism is not a basic
emotion in the list of any theorist, but it is a secondary emotion for joy in Table 2.2.
Note that although the words in Table 2.2 describe different emotions or states of
mind, they can also be used as part of an emotion lexicon in sentiment analysis to
spot different kinds of emotions. Of course, they need to be significantly expanded
to include those synonymous words and phrases to form a reasonably complete
emotion lexicon. In fact, there are some emotion lexicons that have been compiled
by researchers, see (Liu 2015). Note also that for sentiment analysis, we do not need
to be concerned with the disagreement of theorists. For a particular application, we
can choose the types of emotion that are useful to the application. We also do not
need to worry about whether they are primary, secondary, or tertiary.
The emotion annotation and representation language (EARL) proposed by the
Human-Machine Interaction Network on Emotion (HUMAINE) (HUMAINE 2006)
has classified 48 emotions into different kinds of positive and negative orientations
or valences (Table 2.3). This is useful to us because sentiment analysis is mainly
interested in expressions with positive or negative orientations or polarities (also
called valences). However, we should take note that some emotions do not have
positive or negative orientations, e.g., surprise and interest. Some psychologists felt
that these should not be regarded as emotions (Ortony and Turner 1990) simply
because they do not have positive or negative orientations or valences. For the same
reason, they are not commonly used in sentiment analysis.

Table 2.3 HUMAINE polarity annotations of emotions
Negative and forceful:        Anger, Annoyance, Contempt, Disgust, Irritation
Negative and not in control:  Anxiety, Embarrassment, Fear, Helplessness, Powerlessness, Worry
Negative thoughts:            Doubt, Envy, Frustration, Guilt, Shame
Agitation:                    Stress, Shock, Tension
Negative and passive:         Boredom, Despair, Disappointment, Hurt, Sadness
Positive and lively:          Amusement, Delight, Elation, Excitement, Happiness, Joy, Pleasure
Positive thoughts:            Courage, Hope, Pride, Satisfaction, Trust
Quiet positive:               Calm, Content, Relaxed, Relieved, Serene
Caring:                       Affection, Empathy, Friendliness, Love
Reactive:                     Interest, Politeness, Surprised
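The use of emotion words as a lexicon, together with the polarity groups of Table 2.3, can be sketched as a simple lookup. This is a toy illustration under stated assumptions: EMOTION_LEXICON holds only a few entries copied from the table, spot_emotions is a hypothetical helper, matching is on exact lower-cased words, and a usable lexicon would need the significant expansion with synonymous words and phrases discussed above.

```python
import re

# A few entries from Table 2.3: word -> (group, polarity).
# Reactive emotions carry no positive or negative orientation, hence None.
EMOTION_LEXICON = {
    "anger": ("negative and forceful", "negative"),
    "fear": ("negative and not in control", "negative"),
    "sadness": ("negative and passive", "negative"),
    "joy": ("positive and lively", "positive"),
    "hope": ("positive thoughts", "positive"),
    "interest": ("reactive", None),
    "surprised": ("reactive", None),
}

def spot_emotions(text):
    """Return (word, group, polarity) for every lexicon word found in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, *EMOTION_LEXICON[tok]) for tok in tokens if tok in EMOTION_LEXICON]

found = spot_emotions("Her joy turned to fear, and I was surprised.")
```

Exact word matching misses inflected and related forms (for example, "angry" is not matched by "anger"), which is one reason a practical lexicon has to be much larger.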

We now turn to mood. The types of mood are similar to those of emotion except
that the types of emotion that last only momentarily will not usually be moods, e.g.,
surprise and shock. Thus, the words or phrases used to express moods are similar
to those for emotions too. However, since mood is a feeling that lasts a relatively
long time, is diffused, and may not have a clear cause or target object, it is hard
to recognize unless a person states it explicitly, e.g., “I feel sad today.” We can also
monitor one’s writings over a period of time to assess his/her prevailing mood in
the period, which can help discover people with prolonged mental or other medical
conditions (e.g., chronic depression) and even a tendency to commit suicide or
crimes.
It is also interesting to discover the mood of the general population, e.g., public
mood, and the general atmosphere between organizations or countries, e.g., the
mood of US and Russian relations, by monitoring the traditional news media and/or
social media over a period of time.

2.3.2 Affect, Emotion, and Mood in Sentiment Analysis
The above discussions are only about people’s states of mind, which are the subjects
of study of psychologists. However, for sentiment analysis, we need to know how
such feelings are expressed in natural language and how they can be recognized.
This leads us to the linguistics of affect, emotion and mood. Affect, as defined
by psychologists (a primitive response or feeling with no target), is of little
interest to us, as almost everything written in text or displayed in the form of
facial expressions and other visible signs has already gone through some cognitive
processing to become emotion or mood. However, we note that the term affect is still
commonly used in linguistics and many other fields to mean emotion and mood.
Wikipedia has a good page describing the linguistic aspect of emotion and
mood. There are two main ways that human beings express themselves, speech
and writing. In addition to choices of grammatical and lexical expressions, which
are common to both speech and writing (see below), speaker emotion can also be
conveyed through paralinguistic mechanisms such as intonations, facial expressions,
body movements, biophysical signals or changes, gestures, and postures. In writing,
special punctuations (e.g., repeated exclamation marks, !!!!), capitalization of all
letters of a word, emoticons, and lengthening of words (e.g., sloooooow) are
frequently used, especially in social media.
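Some of the written cues just listed can be detected with simple patterns. The sketch below is illustrative only: the function writing_cues is hypothetical, the threshold of three repeated letters for a lengthened word is an assumption, and emoticons are left out because they require a dedicated list.

```python
import re

def writing_cues(text):
    """Collect surface cues of emotion in writing: repeated punctuation,
    all-capital words, and words lengthened by repeating a letter."""
    words = re.findall(r"[A-Za-z]+", text)
    return {
        "repeated_punctuation": re.findall(r"[!?]{2,}", text),
        # Two or more capitals, so the pronoun "I" is not counted.
        "all_caps_words": re.findall(r"\b[A-Z]{2,}\b", text),
        # Assumed threshold: the same letter three or more times in a row.
        "lengthened_words": [w for w in words if re.search(r"([A-Za-z])\1{2,}", w)],
    }

cues = writing_cues("This phone is sloooooow!!!! I HATE it.")
```

All-capital acronyms such as "USB" would also be flagged by this pattern, so in practice such cues are better treated as weak signals than as decisions.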
Regarding choices of grammatical and lexical expressions, there are several
common ways that people often employ to express emotions or moods:
1. use emotion or mood words or phrases such as love, disgusting, angry, and upset.
2. describe emotion-related behaviors, e.g., “He cried after he saw his mother” and
“After receiving the news, he jumped up and down for a few minutes like a small
boy.”
3. use intensifiers. As we discussed in Sect. 2.1.3, common English intensifiers
include very, so, extremely, dreadfully, really, awfully (e.g., awfully bad), terribly
(e.g., terribly good), never (e.g., “I will never buy any product from them again”),
the sheer number of, on earth (e.g., “What on earth do you think you are doing?”),
the hell (e.g., “What the hell are you doing?”), a hell of a, etc. To emphasize
further, intensifiers may be repeated, e.g., “This car is very very good.”
4. use superlatives. Arguably, many superlative expressions also express emotions,
e.g., “This car is simply the best.”
5. use pejorative (e.g., “He is a fascist.”), laudatory (e.g., “He is a saint.”), and
sarcastic expressions (e.g., “What a great car, it broke the second day”).
6. use swearing, cursing, insulting, blaming, accusing, and threatening expressions.
In my experience, these clues are sufficient to recognize emotion and
mood in text. In linguistics, adversative forms, honorific and deferential
language, interrogatives, tag questions, and the like may also be employed to
express emotional feelings, but their uses are rare and hard to recognize
computationally.
To design emotion detection algorithms, in addition to considering the above
clues, we should be aware that there is a cognitive gap between people’s true
psychological states of mind and the language that they use to express such states.
There are many reasons (e.g., being polite, or not wanting others to know
one’s true feelings) why they may not fully match. Thus, language does not always
represent psychological reality. For example, when one says “I am happy with this
car,” one may not have any emotional reaction towards the car although the emotion
word happy is used. Furthermore, emotion and mood are very difficult to distinguish
in written text (Alm 2008). We normally do not distinguish them. When we say
emotion, we mean emotion or mood.
Since emotions have targets and most of them also imply positive or negative
sentiment, they can be represented and handled in very much the same way as
rational opinions. Although a rational opinion emphasizes a person’s evaluation
about an entity and an emotion emphasizes a person’s feeling caused by an entity,
emotion can essentially be regarded as sentiment with a stronger intensity (see Sect.
2.1.3). It is often the case that when the sentiment of a person becomes so strong,
he/she becomes emotional. For example, “The hotel manager is not professional”
expresses a rational opinion, while “I almost cried when the hotel manager talked to
me in a hostile manner” indicates that the author’s sentiment reached the emotional
level of sadness and/or anger. The sentiment orientation of an emotion naturally
inherits the polarity of the emotion, e.g., sadness, anger, disgust, and fear are negative,
and love and joy are positive. At the emotional level, sentiment becomes more fine-grained. Additional mechanisms are needed to recognize different types of emotions
in writing.
Due to the similarity of emotion and rational opinion, we can still use the
quadruple or quintuple representation of opinion (Definitions 1 and 7) to represent
emotion. However, if we want to be more precise, we can give it a separate definition
based on the quadruple (Definition 1) or quintuple (Definition 7) definitions, as the
meanings of some components in the tuple are not exactly the same as they were in
the opinion definition: emotions focus on personal feelings, while rational
opinions focus on evaluations.

Definition 14 (Emotion) An emotion is a quintuple,
(e, a, m, f, t),
where e is the target entity, a is the target aspect of e that is responsible for the
emotion, m is the emotion type or a pair representing an emotion type and an
intensity level, f is the feeler of the emotion, and t is the time when the emotion
is expressed.
For example, for the emotion expressed in the sentence “I am so upset with the
manager of the hotel,” the entity is ‘the hotel,’ the aspect is ‘the manager’ of the
hotel, the emotion type is anger, and the emotion feeler is I (the author). If we
know the time when the emotion was expressed we can add it to the quintuple
representation. As another example, in “After hearing his brother’s death, he burst
into tears.” the target entity is ‘his brother’s death,’ which is an event, and there is
no aspect. The emotion type is sadness and the emotion feeler is he.
In practical applications, we should integrate the analysis of rational opinions
and emotions. We may also want to add the sentiment orientation or polarity of
an emotion, i.e., whether it is positive (desirable) or negative (undesirable) for the
feeler. If that is required, a sentiment component can be added to Definition 14 to
make it a sextuple.
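To make Definition 14 concrete, the quintuple can be written down as a small record type. The following Python sketch is only an illustration: the class name Emotion, the field names, and the optional intensity and polarity fields are our own assumptions and not part of the definition. The two instances encode the example sentences discussed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Emotion:
    """Emotion quintuple (e, a, m, f, t); field names are illustrative."""
    entity: str                      # e: target entity
    aspect: Optional[str]            # a: aspect responsible for the emotion, if any
    emotion_type: str                # m: emotion type, e.g. "anger"
    feeler: str                      # f: who feels the emotion
    time: Optional[date] = None      # t: when the emotion was expressed, if known
    intensity: Optional[str] = None  # optional intensity level paired with m
    polarity: Optional[str] = None   # optional sixth component (the sextuple)


# "I am so upset with the manager of the hotel."
upset = Emotion(entity="the hotel", aspect="the manager",
                emotion_type="anger", feeler="author", polarity="negative")

# "After hearing his brother's death, he burst into tears."
grief = Emotion(entity="his brother's death", aspect=None,
                emotion_type="sadness", feeler="he", polarity="negative")
```

Leaving time, intensity, and polarity optional mirrors the text: the time is added only when it is known, and the polarity only when the application needs the sextuple form.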
Cause of Emotion In Sect. 2.1.5, we discussed the reasons for opinions. In a
similar way, emotions have causes as emotions are usually caused by some internal
or external events. Here we use the word cause instead of reason because an emotion
is an effect produced by a cause (usually an event), rather than a justification or
explanation in support of an opinion. In the above sentence, ‘his brother’s death’ is
the cause for his sadness emotion. Actually, ‘his brother’s death’ is both the target
entity and the cause. In many cases, the target and the cause of an emotion are
different. For example, in “I am so mad with the hotel manager because he refused
to refund my booking fee,” the target entity is the hotel, the target aspect is the
manager of the hotel, and the cause of the anger emotion is ‘he refused to refund
my booking fee.’ There is a subtle difference between ‘his brother’s death’ and ‘he
refused to refund my booking fee.’ The latter states an action performed by he (the
hotel manager) that causes the anger emotion (negative). He is the agent of the
undesirable action. The sentiment on the hotel manager is negative. The sentence
also explicitly states that the anger is directed at the hotel manager. In the case of ‘his
brother’s death,’ neither ‘his brother’ nor death alone is the target of the emotion. It
is the whole event that is the target and the cause of the sadness emotion.
Unlike rational opinions, in many emotion and mood sentences, the authors
may not explicitly state the entities (e.g., named entities, topics, issues, actions
and events) that are responsible for the emotions or moods, e.g., “I felt a bit sad
this morning” and “There is sadness in her eyes.” The reason is that a rational
opinion sentence focuses on both the opinion target and the sentiment on the target
but the opinion holder is often omitted (e.g., “The pictures from this camera are
great”) while an emotion sentence focuses on the feeling of the feeler (e.g., “There
is sadness in her eyes”). This means that a rational opinion sentence contains both
sentiments and their targets explicitly, but may or may not give the opinion holder.
An emotion sentence always has feelers and emotion expressions, but may or may
not state the emotion target or the cause (e.g., “I love this car” and “I felt sad this
morning”). This does not mean that some emotions do not have targets or causes.
They do, but the targets or the causes may be expressed in previous sentences or
implied by the context, which makes extracting targets or causes very difficult. In
the case of mood, the causes may be implicit or even unknown and are thus not
stated in the text.

2.4 Different Types of Opinions
Opinions can actually be classified along many dimensions. We discuss some main
classifications in this section.

2.4.1 Regular and Comparative Opinions
The type of opinion that we have defined is called the regular opinion (Liu 2006).
Another type is comparative opinion (Jindal and Liu 2006b).
Regular Opinion A regular opinion is often referred to simply as an opinion in the
literature. It has two main sub-types (Liu 2006):
Direct opinion: A direct opinion is an opinion that is expressed directly on an entity
or an entity aspect, e.g., “The picture quality is great.”
Indirect opinion: An indirect opinion is an opinion that is expressed indirectly on an
entity or aspect of an entity based on some positive or negative effects on some
other entities. This sub-type often occurs in the medical domain. For example,
the sentence “After injection of the drug, my joints felt worse” describes an
undesirable effect of the drug on ‘my joints,’ which indirectly gives a negative
opinion or sentiment to the drug. In this case, the entity is the drug and the aspect
is the effect on joints. Indirect opinions also occur in other domains, although
less frequently. In these cases, they typically express benefits (positive) or
issues (negative) of entities, e.g., “With this machine, I can finish my work in one
hour, which used to take me 5 hours” and “After switching to this laptop, my eyes
felt much better.” In marketing, benefits of a product or service are regarded as
the major selling points. Thus, extracting such benefits is of practical interest.
Comparative Opinion A comparative opinion expresses a relation of similarities
or differences between two or more entities and/or a preference of the opinion holder
based on some shared aspects of the entities (Jindal and Liu 2006a, b). For example,
the sentences “Coke tastes better than Pepsi” and “Coke tastes the best” express
two comparative opinions. A comparative opinion is usually expressed using the
comparative or superlative form of an adjective or adverb, although not always (e.g.,
prefer). The definitions in Sects. 2.1 and 2.2 do not cover comparative opinion.
Comparative opinions have many types. See (Liu 2015) for their definitions.

2.4.2 Subjective and Fact-Implied Opinions
Opinions and sentiments are by nature subjective because they are about people’s
subjective views, appraisals, evaluations, and feelings. But when they are expressed
in actual text, they do not have to appear as subjective sentences. People can use
objective or factual sentences to express their happiness and displeasure because
facts can be desirable or undesirable. Conversely, not all subjective sentences
express positive or negative sentiments, e.g., “I think he went home,” which is a
belief and has no positive or negative orientation. Based on subjectivity, we can
classify opinions into two types, subjective opinions and fact-implied opinions. We
define them below.
Subjective Opinion A subjective opinion is a regular or comparative opinion
given in a subjective statement, e.g.,
“Coke tastes great.”
“I think Google’s profit will go up next month.”
“This camera is a masterpiece.”
“We are seriously concerned about this new policy.”
“Coke tastes better than Pepsi.”

We can broadly classify subjective opinions into two categories: rational
opinions and emotional opinions (Sect. 2.1.3).
Fact-Implied Opinion A fact-implied opinion is a regular or comparative opinion
implied in an objective or factual statement. Such an objective statement expresses
a desirable or undesirable fact or action. This type of opinion can be further divided
into two subtypes:
1. Personal fact-implied opinion: Such an opinion is implied by a factual statement about someone’s personal experience, e.g.,
“I bought the mattress a week ago, and a valley has formed in the middle.”
“I bought the toy yesterday and I have already thrown it into the trash can.”
“My dad bought the car yesterday and it broke today.”
“The battery of this phone lasts longer than that of my previous Samsung phone.”

Although factual, these sentences tell us whether the opinion holder is positive
or negative about the product or his preference among different products. Thus,
the opinions implied by these factual sentences are no different from subjective
opinions.

2. Non-personal fact-implied opinion: This type is entirely different as it does not
imply any personal opinion. It often comes from fact reporting and the reported
fact does not give any opinion from anyone, e.g.,
“Google’s revenue went up by 30%.”
“The unemployment rate came down last week.”
“Google made more money than Yahoo last month.”

Unlike personal facts, these sentences do not express any experience or evaluation
from any person. For instance, the first sentence above does not have the same
meaning as a sentiment resulting from a person who has used a Google product
and expresses a desirable or undesirable fact about the Google product. Since these
sentences do not give any personal opinion, they do not have opinion holders
although they do have the sources of information. For example, the source of the
information in the first sentence above is likely to be Google itself, but it is a fact,
not Google’s subjective opinion.
However, we can still treat them as a type of opinion sentences due to the
following two reasons:
1. Each of the sentences above does indicate a desirable and/or undesirable state for
the involved entities or topics (i.e., Google, Yahoo and unemployment rate) based
on our commonsense knowledge.
2. The persons who post the above sentences might be expressing positive or
negative opinions implicitly about the involved entities. For example, the person
who posted the first sentence on Twitter is likely to have a positive sentiment
about Google; otherwise, he/she would probably not post the fact. This kind of
post occurs very frequently on Twitter, where Twitter users pick up some news
headlines from the traditional media and post them on Twitter. Many people may
also re-tweet them.
As we can see, it is important to distinguish personal facts and non-personal
facts as opinions induced from non-personal facts represent a very different type
of opinion and need special treatment. How to deal with such facts depends on
the application. My recommendation is to assign such a sentence a positive or negative
orientation based on our commonsense knowledge of whether the fact is desirable
or undesirable for the involved entity, e.g., Google. Users of the sentiment analysis
system should be made aware of the convention so that they can make use of the
opinion appropriately based on their applications.
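The recommended convention can be sketched as a table lookup. The Python fragment below is a minimal illustration under our own assumptions: the table RISE_IS_DESIRABLE, its entries, and the function fact_polarity are hypothetical and stand in for whatever commonsense knowledge source an application actually uses.

```python
# Hypothetical commonsense table: is a rise in this measure desirable
# for the entity it describes? The entries are illustrative only.
RISE_IS_DESIRABLE = {
    "revenue": True,
    "profit": True,
    "share price": True,
    "unemployment rate": False,
}


def fact_polarity(measure: str, direction: str) -> str:
    """Orientation of a reported fact from the involved entity's standpoint."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    rise_good = RISE_IS_DESIRABLE.get(measure)
    if rise_good is None:
        return "unknown"  # no commonsense entry, so do not guess
    went_up = direction == "up"
    # The fact is desirable when the movement matches the desirable direction.
    return "positive" if went_up == rise_good else "negative"


# "Google's revenue went up by 30%."
revenue_fact = fact_polarity("revenue", "up")
# "The unemployment rate came down last week."
unemployment_fact = fact_polarity("unemployment rate", "down")
```

Note that the result describes the fact's desirability for the involved entity only. It says nothing about how the person posting the fact, or a reader, feels about it.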
Sometimes the author who posts such a fact may also give an explicit opinion,
e.g.,
“I am so upset that Google’s share price went up today.”

The clause ‘Google’s share price went up today’ in the example gives a non-personal
fact-implied positive opinion about Google, but the author is negative about
it. This is called a meta-opinion, an opinion about an opinion.

Subjective opinions are usually easier to deal with because the number of words
and phrases that can be used to explicitly express subjective feelings is limited, but
this is not the case for fact-implied opinions. There seem to be an infinite number of
desirable and undesirable facts and every domain is different. Much of the existing
research has focused on subjective opinions. Limited work has been done on
fact-implied opinions (Zhang and Liu 2011).

2.4.3 First-Person and Non-First-Person Opinions
In some applications, it is important to distinguish those statements expressing
one’s own opinions from those statements expressing beliefs about someone else’s
opinions. For example, in a political election, one votes based on one’s belief of
each candidate’s stances on issues, rather than the true stances of the candidate,
which may or may not be the same.
1. First-person opinion: Such an opinion states one’s own attitude towards an
entity. It can be from a person, a representative of a group, or an organization.
Here are some example sentences expressing first-person opinions.
“Tax increase is bad for the economy.”
“I think Google’s profit will go up next month.”
“We are seriously concerned about this new policy.”
“Coke tastes better than Pepsi.”

Notice that not every sentence needs to explicitly use the first person pronoun “I”
or “we,” or to mention an organization name.
2. Non-first-person opinion: Such an opinion is expressed by a person stating
someone else’s opinion. That is, it is a belief of someone else’s opinion about
some entities or topics, e.g.,
“I think John likes Lenovo PCs.”
“Jim loves his iPhone.”
“President Obama supports tax increase.”
“I believe Obama does not like wars.”

2.4.4 Meta-opinions
Meta-opinions are opinions about opinions. That is, a meta-opinion’s target is also
an opinion which is usually contained in a subordinate clause. The opinion in the
subordinate clause can express either a fact with an implied opinion or a subjective
opinion. Let us see some examples:
“I am so upset that Google’s profit went up.”
“I am very happy that my daughter loves her new Ford car.”
“I am so sad that Germany lost the game.”

These sentences look quite different from the opinion sentences seen before. But they
still follow the same opinion definition in Definition 7. It is just that the target of
the meta-opinion in the main clause is now an opinion itself in the subordinate
clause. For example, in the first sentence, the author is negative about ‘Google’s
profit went up,’ which is the target of the meta-opinion in the main clause. So the
meta-opinion is negative, but its target is a regular positive opinion about ‘Google’s
profit.’ In practice, these two types of opinions should be treated differently. Since
meta-opinions are rare, there is little research or practical work about them.
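The nesting described above can be made explicit with a recursive record in which the target of an opinion is either an entity or another opinion. This Python sketch is an illustration only: the class Opinion, its three fields, and the is_meta property are our own simplification and omit the other components of Definition 7.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Opinion:
    """Simplified opinion record; the field names are illustrative."""
    holder: Optional[str]            # None when there is no opinion holder
    target: Union[str, "Opinion"]    # an entity, or another opinion
    polarity: str                    # "positive" or "negative"

    @property
    def is_meta(self) -> bool:
        # A meta-opinion is an opinion whose target is itself an opinion.
        return isinstance(self.target, Opinion)


# "I am so upset that Google's profit went up."
# Subordinate clause: a fact-implied positive opinion with no holder.
inner = Opinion(holder=None, target="Google's profit", polarity="positive")
# Main clause: the author's negative opinion about that opinion.
meta = Opinion(holder="author", target=inner, polarity="negative")
```

Keeping both polarities in the structure lets an application report the negative meta-opinion and the positive embedded opinion separately instead of collapsing them into one label.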

2.5 Author and Reader Standpoint
We can look at an opinion from two perspectives, that of the author (opinion holder)
who posts the opinion, and that of the reader who reads the opinion. Since opinions
are subjective, naturally the author and the reader may not see the same thing in the
same way. Let us use the following two example sentences to illustrate the point:
“This car is too small for me.”
“Google’s profits went up by 30%.”

Since the author or the opinion holder of the first sentence felt the car is too small,
a sentiment analysis system should output a negative opinion about the size of the
car. However, this does not mean that the car is too small for everyone. A reader may
actually like the small size, and feel positive about it. This causes a problem because
if the system outputs only a negative opinion about size, the reader will not know
whether the car is too small or too large, and so would not see that the size may be
a positive aspect for him/her. Fortunately, this problem can be dealt with by mining and summarizing
opinion reasons (see Sect. 2.1.5). Here ‘too small’ not only indicates a negative
opinion about the size but also the reason for the negative opinion. With the reason,
the reader can see a more complete picture of the opinion.
The second sentence represents a non-personal fact-implied opinion. As discussed
in Sect. 2.4.2, the person who posts the fact is likely to be positive about
Google. However, the readers may have different feelings. Those who have financial
interests in Google should feel happy, but Google’s competitors will not be thrilled.
In Sect. 2.4.2, we chose to assign positive sentiment to the opinion because our
commonsense knowledge says that the fact is desirable for Google. Users can decide
how to use the opinion based on their application needs.

2.6 Summary
This chapter described many facets of sentiment analysis. It started with the definitions of the concepts of opinion, sentiment, and opinion summary. The definitions
abstracted a structure from the unstructured natural language text, and also showed
that sentiment analysis is a multi-faceted problem with many interrelated
sub-problems. Researchers can exploit the inter-relationships to design more robust and
accurate solution techniques. This chapter also classified and discussed different
types of opinions. Along with these definitions and discussions, the important
concepts of affect, emotion and mood were introduced and defined too. They
are closely related to, but are also different from conventional rational opinions.
Opinions emphasize evaluation or appraisal of some target objects, events or topics
(which are collectively called entities in this chapter), while emotions emphasize
people’s feelings caused by such entities.
After reading this chapter, I am sure that you would agree with me that on the one
hand, sentiment analysis is a challenging area of research involving many different
tasks and perspectives, and on the other, it is also highly subjective in nature. Thus,
I do not expect that you completely agree with me on everything in the chapter.
I also do not claim that this chapter covered all important aspects of sentiment
and opinion. My goal is to present a reasonably precise definition of sentiment
analysis (or opinion mining) and its related concepts, issues, and tasks. I hope I
have succeeded to some extent.

References
Alm, Ebba Cecilia Ovesdotter. 2008. Affect in text and speech: ProQuest.
Arnold, Magda B. 1960. Emotion and personality. New York: Columbia University Press.
Batson, C. Daniel, Laura L. Shaw, and Kathryn C. Oleson. 1992. Differentiating affect, mood, and
emotion: Toward functionally based conceptual distinctions. Emotion Review of Personality
and Social Psychology 13: 294–326.
Bethard, Steven, Hong Yu, Ashley Thornton, Vasileios Hatzivassiloglou, and Dan Jurafsky. 2004.
Automatic extraction of opinion propositions and their holders. In Proceedings of the AAAI
spring symposium on exploring attitude and affect in text.
Chaudhuri, Arjun. 2006. Emotion and reason in consumer behavior. Oxford: Elsevier
Butterworth-Heinemann.
Choi, Yejin, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. 2005. Identifying sources of
opinions with conditional random fields and extraction patterns. In Proceedings of the human
language technology conference and the conference on empirical methods in natural language
processing (HLT/EMNLP-2005).
Ekman, P., W.V. Friesen, and P. Ellsworth. 1982. What emotion categories or dimensions can
observers judge from facial behavior? In Emotion in the human face, ed. P. Ekman, 98–110.
Cambridge: Cambridge University Press.
Gray, Jeffrey A. 1982. The neuropsychology of anxiety. Oxford: Oxford University Press.
Hobbs, Jerry R., and Ellen Riloff. 2010. Information extraction. In Handbook of natural language
processing, ed. N. Indurkhya and F.J. Damerau, 2nd ed. London: Chapman & Hall/CRC Press.
Hu, Minqing and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of
ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD-2004).
HUMAINE. 2006. Emotion annotation and representation language. Available from: http://
emotion-research.net/projects/humaine/earl
Izard, Carroll Ellis. 1971. The face of emotion. New York: Appleton-Century-Crofts.
James, William. 1884. What is an emotion? Mind 9: 188–205.
Jindal, Nitin and Bing Liu. 2006a. Identifying comparative sentences in text documents. In
Proceedings of ACM SIGIR conference on research and development in information retrieval
(SIGIR-2006).
———. 2006b. Mining comparative sentences and relations. In Proceedings of national conference
on artificial intelligence (AAAI-2006).
Kim, Soo-Min and Eduard Hovy. 2004. Determining the sentiment of opinions. In Proceedings of
international conference on computational linguistics (COLING-2004).
Liu, Bing. 2006. Web data mining: Exploring hyperlinks, contents, and usage data. Berlin:
Springer.
———. 2010. Sentiment analysis and subjectivity, in Handbook of natural language processing,
Second Edition, N. Indurkhya and F.J. Damerau, Editors.
———. 2015. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge:
Cambridge University Press.
Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: Analyzing and comparing
opinions on the web. Proceedings of international conference on world wide web (WWW-2005).
McDougall, William. 1926. An introduction to social psychology. Boston: Luce.
Mowrer, Orval Hobart. 1960. Learning theory and behavior. New York: Wiley.
Oatley, K., and P.N. Johnson-Laird. 1987. Towards a cognitive theory of emotions. Cognition and
Emotion 1: 29–50.
Ortony, Andrew, and Terence J. Turner. 1990. What’s basic about basic emotions? Psychological
Review 97 (3): 315–331.
Panksepp, Jaak. 1982. Toward a general psychobiological theory of emotions. Behavioral and
Brain Sciences 5 (3): 407–422.
Parrott, W. Gerrod. 2001. Emotions in social psychology: Essential readings. Philadelphia:
Psychology Press.
Plutchik, Robert. 1980. A general psychoevolutionary theory of emotion. In Emotion: Theory,
research, and experience: Vol. 1. Theories of emotion, ed. R. Plutchik and H. Kellerman, 3–33.
New York: Academic Press.
Qiu, Guang, Bing Liu, Bu Jiajun, and Chun Chen. 2011. Opinion word expansion and target
extraction through double propagation. Computational Linguistics 37 (1): 9–27.
Russell, James A. 2003. Core affect and the psychological construction of emotion. Psychological
Review 110 (1): 145–172.
Sarawagi, Sunita. 2008. Information extraction. Foundations and Trends in Databases 1 (3): 261–
377.
Tomkins, Silvan. 1984. Affect theory. In Approaches to emotion, ed. K.R. Scherer and P. Ekman,
163–195. Hillsdale: Erlbaum.
Watson, John B. 1930. Behaviorism. Chicago: Chicago University Press.
Weiner, B., and S. Graham. 1984. An attributional approach to emotional development. In Emotion,
cognition and behavior, ed. C.E. Izard, J. Kagan, and R.B. Zajonc, 167–191. New York:
Cambridge University Press.
Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and
emotions in language. Language Resources and Evaluation 39 (2): 165–210.
Zhang, Lei and Bing Liu. 2011. Identifying noun product features that imply opinions. In
Proceedings of the annual meeting of the Association for Computational Linguistics (short
paper) (ACL-2011).
Zhuang, Li, Feng Jing, and Xiaoyan Zhu. 2006. Movie review mining and summarization. In
Proceedings of ACM international conference on information and knowledge management
(CIKM-2006).

Chapter 3

Reflections on Sentiment/Opinion Analysis
Jiwei Li and Eduard Hovy

Abstract The detection of expressions of sentiment in online text has become a
popular Natural Language Processing application. The task is commonly defined
as identifying the words or phrases in a given fragment of text in which the reader
understands that the author expresses some person’s positive, negative, or perhaps
neutral attitude toward a topic. These four elements—expression words, attitude
holder, topic, and attitude value—have evolved with hardly any discussion in the
literature about their foundation or nature. Specifically, the use of two (or three)
attitude values is far more simplistic than many examples of real language show.
In this paper we ask: where do sentiments come from? We focus on two basic
sources of human attitude—the holder’s non-logical/emotional preferences and the
fulfillment of the holder’s goals. After exploring each source we provide a notional
algorithm sketch and examples of how sentiment systems could provide richer and
more realistic accounts of sentiment in text.
Keywords Sentiment analysis • Opinion mining • Natural language processing •
Aspect extraction • Psychology of emotions

3.1 Introduction
Sentiment analysis is an application of natural language processing that focuses on
identifying expressions that reflect authors’ opinion-based attitude (i.e., good or bad,
like or dislike) toward entities (e.g., products, topics, issues) or facets of them (e.g.,
price, quality).
Since the early 2000s, a large number of models and frameworks have been
introduced to address this application, with emphasis on various aspects like
opinion-related entity extraction, review mining, topic mining, sentiment summarization,
and recommendation, with these drawn from significantly diverse text sources including
product reviews, news articles, social media (blogs, Twitter, forum discussions), and
so on.

J. Li
Computer Science Department, Stanford University, Stanford, 94305, CA, USA
e-mail: jiweil@stanford.edu
E. Hovy
Language Technology Institute, Carnegie Mellon University, Pittsburgh, 15213, PA, USA
e-mail: hovy@cmu.edu
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_3

However, despite this activity, disappointingly little has been published about
what exactly a sentiment or opinion actually is. It is generally simply assumed that
two (or perhaps three) polar values (positive, negative, neutral) are enough, that
they are clear, and that anyone would agree on how to assign such labels to arbitrary
texts. Further, existing methods, despite employing increasingly sophisticated (and
of course more powerful) models (e.g., neural nets), still essentially boil down to
considering individual or local combinations of words and matching them against
predefined lists of words with fixed sentiment values, and thus hardly transcend
what was described in the early work by Pang et al. (2002).
There is nothing against simple methods when they work, but they do not
always work, and without some discussion of why not, and where to go next, this
application remains rather technically uninteresting. The goal of this paper is to
identify gaps in the current sentiment analysis literature and to outline practical
computational ways to address these issues.
Goals, Expectations and Sentiments. We begin with the fundamental question
“What makes people hold positive attitudes towards some entities and negative
attitudes toward others?” The answer to this question is a psychological state that
relates to the opinion holder’s satisfaction and dissatisfaction with some aspect of
the topic in question. One of only two principal factors determines the answer: either
(1) the holder’s deep emotionally-driven, non-logical native preferences, or (2)
whether (and how well) one of the holder’s goals is fulfilled, and how (in what
ways) the goal is fulfilled.
Examples of the former are reflected in sentences like “I just like red” or “seeing
that makes me happy”. They are typified by adverbs like “just” and “simply” that
suggest that no further conscious psychological reflection or motivation obtains. Of
this class of factor we can say nothing computationally, and do not address it in the
rest of this chapter.
Fortunately, a large proportion of the attitudes people write about reflect the other
factor, which one can summarize as goal-driven utility. This relates primarily to
Consequentialism: both to Utilitarianism, in which pleasure, economic well-being
and the lack of suffering are considered desirable, and to the general case that
morally justifiable actions (and the objects that enable them) are desirable. That
is, the ultimate basis for any judgment about the rightness or wrongness of one’s
actions, and hence of the objects that support/enable them, is a consideration of
their outcome, or consequence.
In everyday life, people establish and maintain goals or expectations, whether
long-term or short-term, urgent or not urgent. Achieving these goals fills one
with satisfaction; failing to achieve them brings dissatisfaction. A man who walks
into a restaurant to achieve the goal of getting full cannot be satisfied if all the
food is sold out (the main goal not being achieved). A voter would not be satisfied
if his candidate or party fails to win an election, since the longer-term
consequences would generally work against
his own preferences. The generation of sentiment-related texts is guided by such
sorts of mental satisfaction and dissatisfaction induced by goals being achieved or
needs being fulfilled.
We next provide some examples to illustrate why identifying these aspects is
essential and fundamental for adequate sentiment/opinion analysis. Following the
most popular motivation for computational sentiment analysis, suppose we wish to
analyze customers’ opinions towards a product or an offering. It is not sufficient to
simply determine that someone likes or dislikes something; to make that knowledge
useful and actionable, one also wants to know why that is the case. Especially when
one would like to change the opinion, it is important to determine what it is about
the topic that needs to be changed.
Case (1)
• Question: Why did the customer like detergent X?
• Customer’s review: The detergent removes stubborn stains.
No general sentiment indicator is found in the above review. But the review directly
provides the reason, and assuming his/her goal of clean clothing is achieved, it is
evident that the opinion holder holds a positive opinion towards the detergent.
Case (2)
• Question: Why did the traveller dislike flight Y?
• Customer’s review: The food was good. The crew was helpful and took care of
everything. The service was efficient. However the flight was supposed to take
1.5 h but was 3 h late, and I missed my next connecting flight.
The major goal of taking a flight is to get to your destination, which is more important than goals like enjoying one’s food and receiving pampering service. While
multiple simultaneous goals induce competing opinion decisions, the presence of
an importance ranking among them determines the overall sentiment.
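One way to picture how an importance ranking resolves competing goals is a weighted tally. The Python sketch below is a notional illustration only: the Goal record, the numeric importance weights, and the weighted-sum rule in overall_sentiment are our own assumptions, chosen to reproduce Case (2), and are not a method proposed in the literature reviewed here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Goal:
    name: str
    importance: float  # hypothetical weight; larger means more important
    fulfilled: bool


def overall_sentiment(goals: Sequence[Goal]) -> str:
    """Fulfilled goals add their importance, violated goals subtract it."""
    score = sum(g.importance if g.fulfilled else -g.importance for g in goals)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Case (2): three secondary goals are met, but the primary goal is not.
flight_y = [
    Goal("arrive on time and make the connection", importance=5.0, fulfilled=False),
    Goal("good food", importance=1.0, fulfilled=True),
    Goal("helpful crew", importance=1.0, fulfilled=True),
    Goal("efficient service", importance=1.0, fulfilled=True),
]
verdict = overall_sentiment(flight_y)  # "negative": score is 3.0 - 5.0 = -2.0
```

With equal weights the same review would come out positive, which is why the ranking, rather than a count of positive and negative clauses, has to drive the result.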
Case (3)
• Question: Why did the customer visit restaurant Z?
• Review1: The food is bad.
• Review2: The waiter was kind but the food was bad.
• Review3: The food was good but the waiter was rude.

Although the primary goal of being sated may be achieved, secondary goals such
as enjoying the food and receiving respectful service can be violated in various
combinations. Often, these goals pertain to the method by which the primary goal
was achieved; in other words, to the question “how?” rather than “why?”.
A sentiment determination algorithm that can provide more than just a simple
opinion label thus has to pay attention both to the primary reason behind the holder’s
involvement with the topic (“why?”) and to the secondary reasons (both “why?” and
“how?”), and has to be able to determine their relative importance and relationship
to the primary goal.

Goals and Expectations are Personal. As different people (opinion holders)
are from different backgrounds, have different personalities, and are in different
situations, they have different goals, needs, and expectations of life. This
diversity generally leads to widely differing opinions towards the same entity,
the same action, and the same situation: a billionaire wouldn’t be the least bit
concerned with the price in a bread shop but would consider the quality, while a
beggar might care only about the price. This rather banal observation is explained
best by Maslow’s famous hierarchy of needs (Maslow 1943), in which the beggar’s
attention focuses on Maslow’s Physiological needs while the billionaire’s focuses
on Self-Actualization; more on this in Sect. 3.3.1.
Life Requires Trade-offs. Most situations in real life address many personal
needs simultaneously. People thus face trade-offs between their goals, which
entails sacrificing the achievement of one goal for the satisfaction of another.
Given the variability among people, the rankings and decision procedures will also
vary from individual to individual. However, Maslow’s hierarchy describes the general
behavioral trends of people in most societies and situations.
Complex Sentiment Expressions. As far as we can see, current opinion analysis
frameworks mostly fail to address the kinds of issues mentioned above, and thereby
impede a deeper understanding of opinion or sentiment. As a result, they find it
impossible to provide even rudimentary approaches to cases such as the following
(from Hovy 2015):
1. Peter thinks the pants are great and I cannot agree more.
2. Peter thinks the pants are great but I don’t agree.
3. Sometime I like it but sometimes I hate it.
4. He was half excited, half terrified.
5. The movie is indeed wonderful, but for some reason, I just don’t like it.
6. Why I won’t buy this game even though I like it.

In this chapter, we explore the feasibility of addressing these issues in a practical way
using machine learning techniques currently available.

3.2 A Review of Current Sentiment Analysis
Here we give a brief overview of tasks in current sentiment analysis literature. More
details can be found in Liu (2010, 2012).
The key points involved at the algorithm level in the sentiment analysis literature
follow the basic approaches of statistical machine learning, in which a gold-standard
labeling of training data is obtained through manual annotation or other data
harvesting approaches (e.g., semi-supervised or weakly supervised), and this is then
used to train a variety of association-learning techniques, which are then tested on
new material. Usually, some text unit has to be identified and then associated with
a sentiment label (e.g., positive, neutral, negative). Based on the annotated dataset,
the techniques learn that vocabulary items like “bad”, “awful”, and “disgusting” are

3 Reflections on Sentiment/Opinion Analysis

negative sentiment indicators while “good”, “fantastic” and “awesome” are positive
ones. The main complexity lies in learning which words carry some opinion and,
especially, what to decide in cases where different words with opposite labels appear
in the same clause.
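The lexicon view described above, and the difficulty raised by opposite labels in one clause, can be illustrated with a minimal sketch. The word list reuses the examples from the text; the +1/-1 scores, the function name `clause_polarity`, and the "mixed" label are assumptions made for illustration, not part of any published system.

```python
import re

# Toy polarity lexicon. The words come from the text above;
# the numeric scores (+1 / -1) are an assumption for illustration.
LEXICON = {
    "bad": -1, "awful": -1, "disgusting": -1,
    "good": 1, "fantastic": 1, "awesome": 1,
}

def clause_polarity(clause):
    """Return (label, hits) for one clause using plain lexicon lookup."""
    tokens = re.findall(r"[a-z']+", clause.lower())
    hits = [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]
    total = sum(score for _, score in hits)
    if not hits:
        label = "neutral"      # no opinion-bearing word found
    elif total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "mixed"        # opposite labels cancel out in the same clause
    return label, hits
```

A clause such as "good food, awful service" comes out as "mixed", which is exactly the case where simple lookup gives no usable decision and further modeling is needed.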
Basic sentiment analysis identifies the simple polarity of a text unit (e.g., a
token, a phrase, a sentence, or a document) and is framed as a binary or multiclass classification task; see for example Pang et al.’s work (2002), which uses a
unigram/bigram feature-based SVM classifier. Over the past 15 years, techniques
have evolved from simple rule-based word matching to more sophisticated feature
and signal (e.g., local word composition, facets of topics, opinion holder) identification and combination, from the level of single tokens to entire documents, and from
‘flat’ word strings without any syntactic structure at all to incorporation of complex
linguistic structures (e.g., discourse or mixed-affect sentences); see (Pang and Lee
2004; Hu and Liu 2004; Wiebe et al. 2005; Nakagawa et al. 2010; Maas et al. 2011;
Tang et al. 2014a,b; Qiu et al. 2011; Wang and Manning 2012; Yang and Cardie
2014a; Snyder and Barzilay 2007). Recent progress in neural models provides new
techniques for local composition of both opinion and structure (e.g., subordination,
conjunction) using distributed representations of text units (e.g., Socher et al. 2013;
Irsoy and Cardie 2014a,b; Tang 2015; Tang et al. 2014c).
A supporting line of research extends the basic sentiment classification to
include related aspects and facets, such as identifying opinion holders, the topics
of opinions, topics not explicitly mentioned in the text, etc.; see (Choi et al. 2006;
Kim and Hovy 2006, 2004; Li and Hovy 2014; Jin et al. 2009; Breck et al.
2007; Johansson and Moschitti 2010; Yang and Cardie 2012, 2013, 2014b). These
approaches usually employ sequence labeling models (e.g., CRF (Lafferty et al.
2001), HMM (Liu et al. 2004)) to identify whether the current token corresponds to
a specific sentiment-related aspect or facet.
An important part of such supportive work is the identification of the relevant
aspects or facets of the topic (e.g., the ambience of a restaurant vs. its food or staff
or cleanliness) and the corresponding sentiment; see (Brody and Elhadad 2010; Lu
et al. 2011; Titov and McDonald 2008; Jo and Oh 2011; Xueke et al. 2013; Kim et al.
2013; García-Moya et al. 2013; Wang et al. 2011; Moghaddam and Ester 2012).
Online reviews (about products or offerings) on crowdsourcing and traditional sites
(e.g., Yelp, Amazon, Consumer Reports) include some sort of aspect-oriented star
rating system in which more stars indicate a higher level of satisfaction. Consumers rely
on these user-generated online reviews when making purchase decisions. To analyze
such reviews, researchers have developed aspect identification or target extraction approaches as
one subfield of sentiment analysis. These approaches first identify the aspects/facets
of the principal Topic and then discover authors’ corresponding opinions for each
one; e.g., (Brody and Elhadad 2010; Titov and McDonald 2008). Aspects are usually
identified either manually or automatically using word clustering models (e.g., LDA
(Blei et al. 2003) or pLSA). However, real life is usually a lot more complex and
much harder to break into a series of facets (e.g., quality of living, marriage, career).
Other related work includes opinion summarization, aiming to summarize sentiment key points given long texts (e.g., Hu and Liu 2004; Liu et al. 2005; Zhuang
et al. 2006; Ku et al. 2006), opinion spam detection aiming at identifying fictitious

reviews generated to deceive readers (e.g., Ott et al. 2011; Li et al. 2014, 2013; Jindal
and Liu 2008; Lim et al. 2010), sentiment text generation (e.g., Mohammad 2011;
Blair-Goldensohn et al. 2008), and large-scale sentiment/mood analysis on social
media for trend detection (e.g., O’Connor et al. 2010; Bollen et al. 2011; Conover
et al. 2011; Paul and Dredze 2011).

3.3 The Needs and Goals Behind Sentiments
As outlined in Sect. 3.1, this chapter argues that an adequate and complete account
of utilitarian-based sentiment is possible only with reference to the goals of the
opinion holder. In this section we discuss a classic model of human needs and
associated goals and then outline a method for determining such goals from text.

3.3.1 Maslow’s Hierarchy of Needs
Abraham Maslow (Maslow 1943, 1967, 1971; Maslow et al. 1970) developed a
theory of the basic human needs as being organized in a hierarchy of importance,
visualized using a pyramid (shown in Fig. 3.1), where needs at the bottom are the
most pressing, basic, and fundamental to human life (that is, the human will tend to
choose to satisfy them first before progressing to needs higher up).
According to Maslow’s theory, the two most basic levels of human needs are1:
• Physiological needs: breathing, food, water, sleep, sex, excretion, etc.
• Safety needs: security of body, employment, property, health, etc.
Fig. 3.1 Maslow’s hierarchy of needs. Pyramid levels, from top to bottom:
Self-actualization: creativity, spontaneity, lack of prejudice, acceptance of facts, morality
Esteem: self-esteem, confidence, respect of and by others
Love & belonging: family, friendship, (sexual) intimacy
Safety: security of self (body, resources, property, employment, health) and family
Physiology: breathing, food, water, sleep, excretion, sex

1 References from
https://en.wikipedia.org/wiki/Abraham_Maslow;
https://en.wikipedia.org/wiki/Maslow’s_hierarchy_of_needs;
http://www.edpsycinteractive.org/topics/conation/maslow.html

which are essential for the physical survival of a person. Once these needs are
satisfied, people tend to accomplish more and move to higher levels:
• Love and Belonging: psychological needs like friendship, family, sexual intimacy.
• Esteem: the need to be competent and recognized such as through status and level
of success like achievement, respect by others, etc.
These four types of needs are also referred to as DEFICIT NEEDS (or D-NEEDS),
meaning that for any human, if he or she doesn’t have enough of any of them, he
or she will experience the desire to obtain them. Less pressing than the D-needs are
the so-called GROWTH NEEDS, including Cognitive, Aesthetic (need for harmony,
order and beauty), and Self-actualization (described by Maslow as “the desire to
accomplish everything that one can, to become the most that one can be”). Growth
needs are more generalized, obscure, and computationally challenging. We focus in
this chapter on deficit needs. For further reading, refer to Maslow’s original papers
(1943, 1967) or relevant Wikipedia pages.
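For computational purposes, the hierarchy and the split between deficit and growth needs can be written down as a small data structure. The sketch below is one possible encoding: the class name `Need`, the helper names, and in particular the relative order of the three growth needs are assumptions; only the ordering of the four deficit needs and the deficit/growth split are taken from the text.

```python
from enum import IntEnum

class Need(IntEnum):
    """Maslow-style needs; a lower value means a more pressing need."""
    PHYSIOLOGICAL = 1
    SAFETY = 2
    LOVE_BELONGING = 3
    ESTEEM = 4
    # Growth needs: their relative order here is an assumption.
    COGNITIVE = 5
    AESTHETIC = 6
    SELF_ACTUALIZATION = 7

# The four deficit needs (D-needs) named in the text.
DEFICIT_NEEDS = frozenset(
    {Need.PHYSIOLOGICAL, Need.SAFETY, Need.LOVE_BELONGING, Need.ESTEEM}
)

def is_deficit(need):
    """True for D-needs, False for growth needs."""
    return need in DEFICIT_NEEDS

def most_pressing(needs):
    """Return the need that, per the hierarchy, tends to be satisfied first."""
    return min(needs)
```

Using an ordered enumeration keeps the "satisfy lower levels first" tendency explicit, although, as noted above, individual rankings vary.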
We note that real life offers many situations in which an action does not easily
align with a need listed in the hierarchy (for example, the goal of British troops to
arrest an Irish Republican Army leader or of US troops to attack Iraq). Additionally,
a single action (e.g., going to college, looking for a job) can simultaneously address
multiple needs. Putting aside such complex situations in this chapter, we focus on
more tractable situations to illustrate the key points.2

3.3.2 Finding Appropriate Goals for Actions and Entities
Typically, each deficit need gives rise to one or more goals that impel the agent (the
opinion holder) to appropriate action. Following standard AI and Cognitive Science
practice, we assume that the agent instantiates one or more plans to achieve his
or her goals, where a plan is a sequence of actions intended to alter the state of
the world from some situation (typically, the agent’s initial state) to a situation in
which the goal has been achieved and the need satisfied. In each plan, its actions,
their preconditions, and the entities used in performing them (the plan’s so-called
props) constitute the material upon which sentiment analysis operates. For example,
the goal to sate one’s hunger may be achieved by plans such as visit-restaurant,
cook-and-eat-meal-at-home, buy-or-steal-ready-made-food, cadge-meal-invitation,
etc. In all these plans, food is one of the props. For the restaurant and buying-food
plans, an affordable price is an important precondition.

2 However, putting them aside doesn’t mean that we don’t need to explore and explain
these complex situations. On the contrary, these situations are essential and fundamental to the
understanding of opinion and sentiment, but require deeper and more systematic exploration in
psychology, cognitive science, and AI.

A sentiment detection system that seeks to understand why the holder holds a
specific opinion valence has to determine the specific actions, preconditions, and
props that are relevant to the holder’s goal, and to what degree they suffice. In
principle, a complete account requires the system to infer from the given text:
1. what need is active,
2. which goal(s) have been activated to address the need,
3. which plan(s) is/are being followed to achieve the goal(s),
4. which actions, preconditions, and props appear in these plan(s),
5. which of these is/are being talked about in the text,
6. how well it/they actually have furthered the agent’s plan(s),

from which the sentiment valence can be automatically deduced. When the valence
is given in the text, one can work ‘backwards’ to infer step 6, and possibly even
earlier steps.
Determining all this is a tall order for computational systems. Fortunately, it
is possible to circumvent much of this reasoning in practice. For most common
situations, a relatively small set of goals and plans obtains, and the relevant actions,
preconditions, and props are usually quite standard. (In fact, they are precisely
what are typically called ‘facets’ in the sentiment analysis literature, for which, as
described in Sect. 3.2, various techniques have been investigated, albeit without a
clear understanding of the reason these facets are important.)
Given this, the principal unaddressed computational problem today is the determination from the text of the original need or goal being experienced by the holder,
since that is what ties together all the other (and currently investigated) aspects.
How can one, for a given topic, determine the goals an agent would typically have
for it, suggest likely plans, and potentially pinpoint specific actions, preconditions,
and props?
One approach is to perform automated goal and plan harvesting, using typical
text mining / pattern-matching approaches from Information Extraction. This is a
relatively mature application of NLP (Hearst 1992; Riloff and Shepherd 1997; Riloff
and Jones 1999; Snow et al. 2004; Davidov and Rappoport 2006; Etzioni et al.
2005; Banko 2009; Mitchell et al. 2009; Ritter et al. 2009; Kozareva and Hovy
2013), and the harvesting power and behavior of various styles of patterns has been
investigated for over two decades. (In practice, the Double-Anchored Pattern (DAP)
method (Kozareva and Hovy 2013) works better than most others.) Stated simply,
one creates or automatically induces text patterns anchored on the topic (e.g., a
camera) such as
“I want a camera because *”
“If I had a camera I could *”
“the main reason to get a camera is *”
“wanted to *, so he bought a camera” etc.

and then extracts from large amounts of text the matched VPs and NPs as being
relevant to the topic. Appropriately rephrased and categorized, the
information harvested by these patterns provides typical goals (reasons) for
buying and using cameras.
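The pattern-based harvesting just described can be sketched with ordinary regular expressions. This is plain surface-pattern matching anchored on the topic, not the Double-Anchored Pattern method itself; the function names `build_patterns` and `harvest_goals` are hypothetical, and the captured span is taken as raw text up to the next punctuation mark rather than as a parsed VP or NP.

```python
import re
from collections import Counter

def build_patterns(topic):
    """Compile the four example patterns from the text for a given topic."""
    t = re.escape(topic)
    return [
        re.compile(rf"\bI want an? {t} because ([^.!?]+)", re.I),
        re.compile(rf"\bIf I had an? {t} I could ([^.!?]+)", re.I),
        re.compile(rf"\bthe main reason to get an? {t} is ([^.!?]+)", re.I),
        re.compile(
            rf"\bwanted to ([^.!?,]+), so (?:he|she|they|I|we) bought an? {t}\b",
            re.I,
        ),
    ]

def harvest_goals(topic, corpus):
    """Count candidate goal phrases matched by the patterns in a list of texts."""
    patterns = build_patterns(topic)
    counts = Counter()
    for text in corpus:
        for pattern in patterns:
            for match in pattern.finditer(text):
                # Normalize lightly; real systems would rephrase and categorize.
                counts[match.group(1).strip().lower()] += 1
    return counts

# Usage on a tiny, made-up corpus:
corpus = [
    "I want a camera because I like to record family trips.",
    "If I had a camera I could take photos of birds!",
    "She wanted to take photos of birds, so she bought a camera.",
]
goals = harvest_goals("camera", corpus)
```

Phrases harvested by several different patterns (here "take photos of birds") are stronger candidates for typical goals than phrases seen only once.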

3.4 Toward a Practical Computational Approach
We are now ready to describe the overall approach necessary for a more complete
sentiment analysis system. For illustrative purposes we focus on simple binary
(positive/negative) valence identification. However, the framework applies to finer
granularity (e.g., multi-class classification, regression) with minor adjustments. We
first provide an overall algorithm sketch, provide a series of examples, and then
suggest models for determining the still unexplored aspects required for deeper
sentiment analysis.
First, we assume that standard techniques are employed to find the following
from some given text:
1. Opinion Holder: Individual or organization holding the opinion.
2. Entity/Aspect/Theme/Facet: topic or aspect about which the opinion is held.
3. Sentiment Indicator: Sentiment-related text (tokens, phrases, sentences, etc.) that
indicate the polarity of the holder.
4. Valence: like, neutral, or dislike.
These have been defined (or at least used with implicit definition) throughout the
sentiment literature, and are defined for example in Hovy (2015). Of these, item 1
is usually achieved by simple matching. Item 2 can be partially addressed by recent
topic/facet mining models, and item 3 can be addressed by existing sentiment related
algorithms at the word-, sentence-, or text-level. Item 4 at its simplest is a matter of
keyword matching, but the composition within a sentence of contrasting valences has
generated some interesting research. Annotated corpora (or other semi-supervised
data harvesting techniques) might be needed for goal and need identification, as
discussed above.
Given this, the following sketch algorithm implements deeper sentiment
analysis:
1. In the text, identify the key goal underlying the Theme.
2. Is there no apparent goal?
• If yes, the opinion is probably non-utilitarian, so find and return a valence if
any, but return no reason for it.
• If no, go to step 3.
3. Determine whether the goal is satisfied:
• If yes, go to step 4,
• If no, return a negative valence.
4. Identify the subgoals involved in achieving the major goal.
5. Identify how well the subgoals are satisfied.
6. Determine the final utilitarian sentiment based on the trade-off between different
subgoals, and return it together with the trade-off analysis as the reasoning.
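The control flow of the sketch algorithm above can be written down as follows. This is only an illustration under strong assumptions: steps 1, 3, 4, and 5 (finding the goal, deciding its satisfaction, and scoring the subgoals) are taken as already computed and passed in, the satisfaction scale [-1, 1] and the per-subgoal weights are invented, and the names `Subgoal`, `Analysis`, and `deeper_sentiment` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Subgoal:
    name: str
    satisfaction: float      # assumed scale: -1 (violated) .. +1 (fully met)
    weight: float = 1.0      # assumed importance to this opinion holder

@dataclass
class Analysis:
    goal: Optional[str]                    # step 1; None if no apparent goal
    goal_satisfied: bool = False           # step 3
    subgoals: List[Subgoal] = field(default_factory=list)  # steps 4-5
    surface_valence: Optional[str] = None  # valence found by standard means

def deeper_sentiment(a):
    """Return (valence, reasoning) following steps 2, 3, and 6 of the sketch."""
    # Step 2: no apparent goal, so the opinion is probably non-utilitarian.
    if a.goal is None:
        return a.surface_valence, None
    # Step 3: an unsatisfied major goal yields a negative valence.
    if not a.goal_satisfied:
        return "negative", f"major goal '{a.goal}' not satisfied"
    # Step 6: trade-off between subgoals as a weighted sum.
    contributions = {s.name: s.weight * s.satisfaction for s in a.subgoals}
    score = sum(contributions.values())
    if score > 0:
        valence = "positive"
    elif score < 0:
        valence = "negative"
    else:
        valence = "neutral"  # tie handling is an assumption
    return valence, contributions
```

Returning the per-subgoal contributions alongside the valence is what provides the "reasoning" that a plain polarity label lacks.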

This procedure requires the determination of the Goals or Subgoals and the
Condition/Situation under which the opinion holder holds that opinion. The former
is discussed above; the latter can usually be determined from the context of the
given text.

3.4.1 Examples and Illustration
As a running example we use simple restaurant reviews, sentences in italics
indicating original text from the reviews3 :
Case 1
1. My friends and I went to restaurant X.
2. So many people were waiting there and we left without eating.
Following the algorithm sketch, the question “was the major goal of going to a
restaurant fulfilled?” is answered no. The reviewer is predicted to hold a negative
sentiment. Similar reasoning applies to Case 2 in Sect. 3.1.
Case 2
1. My friends and I went to restaurant X.
2. The waiter was friendly and knowledgeable.
3. We ordered curry chicken, potato chips and Italian sausage. The Italian sausage
was delicious.
4. Overall the food was appetizing,
5. but I just didn’t enjoy the experience.
To the question “was the major goal of being full fulfilled?” the answer is yes, as
the food was ordered and eaten. Next the algorithm addresses the how (manner
of achievement) question described in steps 4–6, which involves the functional
elements of goals/needs embedded in each sentence:
1. My friends and I went to restaurant X.
Opinion Holder: I
Entity/Aspect/Theme: restaurant X
Need: sate hunger
Goal: visit restaurant
Sentiment Indicator: none
Valence: neutral
Condition: in restaurant X
2. The waiter was friendly and knowledgeable.
Opinion Holder: I

3 These reviews were originally Yelp reviews, revised by the authors for illustration
purposes.

Entity/Aspect/Theme: waiter
Need: gather respect/friendship
Subgoal: order food
Sentiment Indicator: friendly, knowledgeable
Valence: positive
Condition: in restaurant X
3. We ordered curry chicken, potato chips and Italian sausage. The Italian sausage was
delicious.
Opinion Holder: I
Entity/Aspect/Theme: Italian sausage
Need: sate hunger
Subgoal: eat food
Sentiment Indicator: delicious
Valence: positive
Condition: in restaurant X
4. Overall the food was appetizing,
Opinion Holder: I
Entity/Aspect/Theme: food
Need: sate hunger
Subgoal: eat enough to remove hunger
Sentiment Indicator: appetizing
Valence: positive
Condition: in restaurant X
5. but I just didn’t enjoy the experience.
Opinion Holder: I
Entity/Aspect/Theme: restaurant visit experience
Need: none — this is not utilitarian
Goal: none
Sentiment Indicator: didn’t enjoy
Valence: negative
Condition: in restaurant X
The analysis of the needs/goals and their respective positive and negative valences
allows one to justify the various sentiment statements, and (in the case of the final
negative decision) also indicate that it is not based on utilitarian considerations.
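The Case 2 annotations can be held in a simple record structure, which makes the final observation mechanical: every statement tied to a need is positive or neutral, and the only negative statement has no need attached. The class name `FunctionalElement`, its field names, and the use of `None` to mark a non-utilitarian statement are assumptions chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class FunctionalElement:
    """One annotated sentence; fields mirror the labels used in Case 2."""
    theme: str
    need: Optional[str]          # None marks a non-utilitarian statement
    indicators: Tuple[str, ...]
    valence: str                 # "positive", "neutral", or "negative"

CASE_2 = [
    FunctionalElement("restaurant X", "sate hunger", (), "neutral"),
    FunctionalElement("waiter", "gather respect/friendship",
                      ("friendly", "knowledgeable"), "positive"),
    FunctionalElement("Italian sausage", "sate hunger", ("delicious",), "positive"),
    FunctionalElement("food", "sate hunger", ("appetizing",), "positive"),
    FunctionalElement("restaurant visit experience", None,
                      ("didn't enjoy",), "negative"),
]

def split_by_basis(elements):
    """Separate valences that rest on a need from those that do not."""
    utilitarian = {e.theme: e.valence for e in elements if e.need is not None}
    non_utilitarian = {e.theme: e.valence for e in elements if e.need is None}
    return utilitarian, non_utilitarian

utilitarian, non_utilitarian = split_by_basis(CASE_2)
```

Here `utilitarian` contains no negative valence, while `non_utilitarian` holds the single negative judgment about the overall experience.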

3.4.2 A Computational Model of Each Part
Current computational models can be used to address each of the aspects involved
in the sketch algorithm. We provide only a high-level description of each.
Deciding Functional Elements. Case 2 above involves three of the needs
described in Maslow’s hierarchy: food, respect/friendship, and emotion. The first
two are stated to have been achieved. The third is a pure emotion, expressed without

a reason: the holder “just didn’t enjoy the experience”. Pure emotions usually
have no overt utilitarian value but only relate to the holder’s high-level goal of
being happy. In this example, we have to conclude that since all overt goals were
met, either some unstated utilitarian Maslow-type need was not met, or the holder’s
opinion stems from a deeper psychological/emotional bias, of the kind mentioned
in Sect. 3.1, that goes beyond utilitarian value.
Whether the Major Goal is Achieved. To make a decision about goal achievement, one must: (1) identify the goal/subgoal of an action (e.g., buying the detergent,
going to a restaurant); (2) identify whether that goal/subgoal is achieved. The two
steps can be computed either separately or jointly using current machine learning
models and techniques, including:
• Joint Model: Annotate corpora for satisfaction or not for all goals and subgoals
together, and train a single machine learning algorithm.
• Separate Model:
1. Determine the goal and its plans and subgoals either through annotation or as
described in Sect. 3.3.2.
2. Associate the actions or entities of the Theme (e.g., going to a restaurant;
buying a car) with their respective (sub)goals.
3. Align each subgoal with indicator sentence(s) in the document (e.g., “I got a
small portion”; “the car was all it was supposed to be”).
4. Decide whether the subgoal is satisfied based on indicator sentence(s).
Learning Weights for Different Goals/Needs. One can clearly infer that the
customer in Case 2 assigns more weight to the emotional aspect, that being his or
her final conclusion, and less to the food or respect/friendship (which comes last in
this scenario). More formally, for a given text $D$, we discover $L$ needs/(sub)goals,
with indices $1, 2, \ldots, L$. Each type of need/(sub)goal $i \in [1, L]$ is associated with
a weight $v_i$ that contributes to the final sentiment valence decision. In document
$D$, each type of need $i$ is associated with achievement value $a_i$ that indicates how
the need or goal is satisfied. The sentiment score $S_D$ for given document $D$ is then
given by:

$$S_D = \sum_{i \in [1, L]} v_i \cdot a_i$$

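As a worked illustration of this weighted sum, the following sketch computes the score for Case 2. The function name `sentiment_score`, the numeric weights, and the achievement values (+1 for satisfied, -1 for not satisfied) are all assumptions chosen to reflect a holder who weighs the emotional aspect most heavily.

```python
def sentiment_score(weights, achievements):
    """Weighted sum of achievement values a_i with weights v_i over all needs."""
    if weights.keys() != achievements.keys():
        raise ValueError("weights and achievements must cover the same needs")
    return sum(weights[need] * achievements[need] for need in weights)

# Assumed numbers for Case 2: the holder cares mostly about the emotional aspect.
weights = {"food": 0.2, "respect/friendship": 0.1, "emotion": 0.7}
achievements = {"food": 1.0, "respect/friendship": 1.0, "emotion": -1.0}

score = sentiment_score(weights, achievements)  # roughly -0.4, i.e. negative
```

With these assumed values the two satisfied needs contribute 0.3 in total and the unmet emotional need contributes -0.7, so the overall score is negative even though most individual statements are positive.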

This simple approach is comparable to a regression model that assigns weights
to relevant aspects, where gold standard examples can be the overall ratings of
the labeled restaurant reviews. One can view such a weight decision procedure
as a supervised regression model by assigning a weight value to each discovered
need. Such a procedure is similar to the latent aspect rating introduced in Wang
et al. (2011) and Zhao et al. (2010), which learns aspect weights (e.g., for value, room,
location, or service) for hotel review ratings. A simple illustrative example might be
collaborative filtering in recommendation systems, e.g., Breese et al. (1998); Sarwar

et al. (2001), optimizing need weight regarding each respective individual (which
could be sampled from a uniform prior for humans’ generally accepted weights).
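The supervised regression view can be sketched with plain least squares fitted by batch gradient descent. This is not the latent aspect rating model cited above; the function name `fit_weights`, the learning rate, the number of epochs, and the toy data (achievement values for food, respect/friendship, and emotion, with made-up overall ratings) are assumptions for illustration.

```python
def fit_weights(rows, ratings, lr=0.05, epochs=2000):
    """Fit need weights by minimizing mean squared error with gradient descent.

    rows[k][i] is the achievement value of need i in review k;
    ratings[k] is the gold overall rating of review k.
    """
    n_needs = len(rows[0])
    n_rows = len(rows)
    w = [0.0] * n_needs
    for _ in range(epochs):
        grad = [0.0] * n_needs
        for a, y in zip(rows, ratings):
            err = sum(wi * ai for wi, ai in zip(w, a)) - y
            for j in range(n_needs):
                grad[j] += 2.0 * err * a[j] / n_rows
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Made-up training data: columns are (food, respect/friendship, emotion).
rows = [
    [1, 1, 1], [1, 1, -1], [1, -1, 1],
    [-1, 1, 1], [-1, -1, 1], [1, -1, -1],
]
ratings = [1.0, -0.4, 0.8, 0.6, 0.4, -0.6]

learned = fit_weights(rows, ratings)
```

On this noise-free toy data the learned weights approach 0.2, 0.1, and 0.7, the values used to generate the ratings; real review data would require regularization and far more examples.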
Since individual expectations can differ, it would be advantageous to maintain
opinion holder profiles (for example, both Yelp and Amazon keep individual profiles
for each customer) that record one’s long-term activity. This would support individual analysis of background, personality, or social identity, and enable learning of
specific goal weights for different individuals.
When these issues have been addressed, one can start asking deeper questions
like:
• Q: Why does John like his current job though his salary is low?
A: He weighs employment more highly than family.
• Q: How wealthy is a particular opinion holder?
A: He might be rich as he places little concern (weight) on money.
or make user-oriented recommendations like:
• Q: Should the system recommend an expensive-but-luxurious hotel or a cheap-but-poor hotel?

3.4.3 Prior/Default Knowledge About Opinion Holders
Sentiment/opinion analysis can be considerably assisted by the existence of a
knowledge base that provides information about the typical preferences of the
holder.
Individuals’ goals vary across backgrounds, ages, nationalities, genders, etc. An
engineer would have different life goals from a businessman or a doctor; a citizen
living in South America would have a different weighting system from one living in
Europe or the United States; and people in wartime would have different life expectations from those in peacetime. Two general methods exist today for practically
collecting such standardized knowledge to construct a relevant knowledge base:
(1) Rule-based Approaches. Hierarchies of personality profiles have been proposed, and changes to them have long been explored in the social and
developmental psychology literature, usually based on polls or surveys. For
example, Goebel and Brown (1981) found that children have higher physical needs than other
age groups; love needs emerge in the transitional period from childhood to
adulthood; esteem needs are highest among adolescents; the highest self-actualization levels are found among adults; and the highest levels of security
are found at older ages. As another example, researchers (Tang and Ibrahim
1998; Tang et al. 2002; Tang and West 1997) have found that survival (i.e.,
physiological and safety) needs dominate during wartime while psychological
needs (i.e., love, self-esteem, and self-actualization) surface during peacetime,
which is in line with our expectations. For computational implementation,

however, these sorts of studies provide very limited evidence, since only a few
aspects are typically explored.
(2) Computational Inference Approaches. Despite the lack of information about
individuals, reasonable preferences can be inferred from other resources such
as online social media. A vast section of the Social Network Analysis research
focuses on this problem, as well as much of the research of the large web
search engine companies. Networking websites like Facebook, LinkedIn, and
Google Plus provide rich repositories of personal information about individual attributes such as education, employment, nationality, religion, likes and
dislikes, etc. Additionally, online posts usually offer direct evidence for such
attributes. Some examples include age (Rao et al. 2010; Rao and Yarowsky
2010), gender (Ciot et al. 2013), living location (Sadilek et al. 2012), and
education (Mislove et al. 2010).

3.5 Conclusion and Discussion
The past 15 years have witnessed significant performance improvements in training
machine learning algorithms for the sentiment/opinion identification application.
But little progress has been made toward a deeper understanding about what
opinions or sentiments are, why people hold them, and why and how their
facets are chosen and expressed. No one can deny the unprecedented contributions of statistical learning algorithms in modern-day (post-1990s) NLP, for
this application as for others. However, ignoring cognitive and psychological
perspectives in favor of engineering alone inevitably hampers progress once the
algorithms asymptote to their optimal performance, since understanding how
to do something doesn’t necessarily lead to better insight about what needs
to be done, or how it is best represented. For example, when inter-annotator
agreement on sentiment labels peaks at 0.79 even for the rather crude 3-way
sentiment granularity of positive/neutral/negative (Ogneva 2010), is that the theoretical best that could be achieved? How could one ever know, without understanding what other aspects of sentiment/opinion are pertinent and investigating
whether they could constrain the annotation task and help boost annotation agreement?
In this chapter, we described possible directions for deeper understanding, helping bridge the gap between psychology / cognitive science and computational
approaches. We focus on the opinion holder’s underlying needs and their resultant
goals, which, in a utilitarian model of sentiment, provide the basis for explaining
the reason a sentiment valence is held. (The complementary non-utilitarian, purely
intuitive preference-based basis for some sentiment decisions is a topic requiring
altogether different treatment.) While these thoughts are still immature, scattered,
unstructured, and even imaginary, we believe that these perspectives might suggest
fruitful avenues for various kinds of future work.

References
Banko, Michelle. 2009. Ph.D. Dissertation, University of Washington.
Blair-Goldensohn, Sasha, Kerry Hannan, Ryan McDonald, Tyler Neylon, George A Reis, and Jeff
Reynar. 2008. Building a sentiment summarizer for local service reviews. In WWW Workshop
on NLP in the Information Explosion Era, vol. 14.
Blei, David M, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. The Journal
of Machine Learning Research 3: 993–1022.
Bollen, Johan, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market.
Journal of Computational Science 2(1): 1–8.
Breck, Eric, Yejin Choi, and Claire Cardie. 2007. Identifying expressions of opinion in context.
In IJCAI.
Breese, John S, David Heckerman, and Carl Kadie. 1998. Empirical analysis of predictive
algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on
Uncertainty in Artificial Intelligence, 43–52. Morgan Kaufmann Publishers Inc.
Brody, Samuel, and Noemie Elhadad. 2010. An unsupervised aspect-sentiment model for
online reviews. In Human Language Technologies: The 2010 Annual Conference of the North
American Chapter of the Association for Computational Linguistics, 804–812. Association for
Computational Linguistics.
Choi, Yejin, Eric Breck, and Claire Cardie. 2006. Joint extraction of entities and relations for
opinion recognition. In EMNLP.
Ciot, Morgane, Morgan Sonderegger, and Derek Ruths. 2013. Gender inference of twitter users in
non-English contexts. In EMNLP, 1136–1145.
Conover, Michael, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and
Alessandro Flammini. 2011. Political polarization on twitter. In ICWSM.
Davidov, D., and A. Rappoport. 2006. Efficient unsupervised discovery of word categories
using symmetric patterns and high frequency words. In Proceedings of the 21st International
Conference on Computational Linguistics COLING and the 44th Annual Meeting of the ACL,
297–304.
Etzioni, O., M. Cafarella, D. Downey, A.M. Popescu, T. Shaked, and S. Soderland et al. 2005.
Unsupervised named-entity extraction from the web: An experimental study. Artificial
Intelligence 165(1): 91–134.
García-Moya, Lisette, Henry Anaya-Sánchez, and Rafael Berlanga-Llavori. 2013. Retrieving
product features and opinions from customer reviews. IEEE Intelligent Systems 28(3):
19–27.
Goebel, Barbara L, and Delores R Brown. 1981. Age differences in motivation related to Maslow’s
need hierarchy. Developmental Psychology 17(6): 809.
Hearst, Marti. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings
of the 14th Conference on Computational Linguistics, 539–545.
Hovy, Eduard H. 2015. What are sentiment, affect, and emotion? Applying the methodology
of Michael Zock to sentiment analysis. In Language production, cognition, and the Lexicon,
13–24. Cham: Springer.
Hu, Minqing, and Bing Liu. 2004. Mining opinion features in customer reviews. In AAAI, vol. 4,
755–760.
Irsoy, Ozan, and Claire Cardie. 2014a. Deep recursive neural networks for compositionality in
language. In Advances in neural information processing systems, 2096–2104.
Irsoy, Ozan, and Claire Cardie. 2014b. Opinion mining with deep recurrent neural networks. In
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), 720–728.
Jin, Wei, Hung Hay Ho, and Rohini K Srihari. 2009. A novel lexicalized HMM-based learning
framework for web opinion mining. In ICML.
Jindal, Nitin, and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the 2008
International Conference on Web Search and Data Mining, 219–230. ACM.

Jo, Yohan, and Alice H Oh. 2011. Aspect and sentiment unification model for online review
analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data
Mining, 815–824. ACM.
Johansson, Richard, and Alessandro Moschitti. 2010. Syntactic and semantic structure for opinion
expression detection. In Proceedings of the Fourteenth Conference on Computational Natural
Language Learning.
Kim, Soo-Min, and Eduard Hovy. 2004. Determining the sentiment of opinions. In Proceedings
of the 20th International Conference on Computational Linguistics, 1367. Association for
Computational Linguistics.
Kim, Soo-Min, and Eduard Hovy. 2006. Extracting opinions, opinion holders, and topics expressed
in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in
Text.
Kim, Suin, Jianwen Zhang, Zheng Chen, Alice H Oh, and Shixia Liu. 2013. A hierarchical aspect-sentiment model for online reviews. In AAAI.
Kozareva, Z., and E.H Hovy. 2013. Tailoring the automated construction of large-scale taxonomies
using the web. Journal of Language Resources and Evaluation 47: 859–890.
Ku, Lun-Wei, Yu-Ting Liang, and Hsin-Hsi Chen. 2006. Opinion extraction, summarization and
tracking in news and blog corpora. In AAAI Spring Symposium: Computational Approaches to
Analyzing Weblogs, vol. 100107.
Lafferty, John, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields:
Probabilistic models for segmenting and labeling sequence data. In Proceedings of the
eighteenth international conference on machine learning, ICML. Vol. 1.
Li, Jiwei, and Eduard H. Hovy. 2014. Sentiment analysis on the people’s daily. In EMNLP, 467–
476.
Li, Jiwei, Myle Ott, and Claire Cardie. 2013. Identifying manipulated offerings on review portals.
In EMNLP, 1933–1942.
Li, Jiwei, Myle Ott, Claire Cardie, and Eduard H. Hovy. 2014. Towards a general rule for
identifying deceptive opinion spam. In ACL (1), 1566–1576.
Lim, Ee-Peng, Viet-An Nguyen, Nitin Jindal, Bing Liu, and Hady Wirawan Lauw. 2010. Detecting
product review spammers using rating behaviors. In Proceedings of the 19th ACM International
Conference on Information and Knowledge Management, 939–948. ACM.
Liu, Bing. 2010. Sentiment analysis and subjectivity. Handbook of Natural Language Processing
2: 627–666.
Liu, Bing. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language
Technologies 5(1): 1–167.
Liu, Bing, Minqing Hu, and Junsheng Cheng. 2005. Opinion observer: analyzing and comparing
opinions on the web. In Proceedings of the 14th International Conference on World Wide Web,
342–351. ACM.
Liu, Yun-zhong, Ya-ping Lin, and Zhi-ping Chen. 2004. Text information extraction based on
hidden Markov model [J]. Acta Simulata Systematica Sinica 3: 038.
Lu, Bin, Myle Ott, Claire Cardie, and Benjamin K Tsou. 2011. Multi-aspect sentiment analysis
with topic models. In 2011 IEEE 11th International Conference on Data Mining Workshops
(ICDMW), 81–88. IEEE.
Maas, Andrew L, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher
Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1,
142–150. Association for Computational Linguistics.
Maslow, Abraham Harold, Robert Frager, James Fadiman, Cynthia McReynolds, and Ruth Cox.
1970. Motivation and personality, vol. 2. New York: Harper & Row.
Maslow, Abraham Harold. 1943. A theory of human motivation. Psychological Review 50(4):
370.
Maslow, Abraham H. 1967. A theory of metamotivation: The biological rooting of the value-life.
Journal of Humanistic Psychology 7(2): 93–127.
Maslow, Abraham H. 1971. The farther reaches of human nature.

3 Reflections on Sentiment/Opinion Analysis

Mislove, Alan, Bimal Viswanath, Krishna P Gummadi, and Peter Druschel. 2010. You are who
you know: Inferring user profiles in online social networks. In Proceedings of the Third ACM
International Conference on Web Search and Data Mining, 251–260. ACM.
Mitchell, T.M., J. Betteridge, A. Carlson, E. Hruschka, and R. Wang. 2009. Populating the
semantic web by macro-reading internet text. In Proceedings of the 8th International Semantic
Web Conference (ISWC).
Moghaddam, Samaneh, and Martin Ester. 2012. On the design of LDA models for aspect-based
opinion mining. In Proceedings of the 21st ACM International Conference on Information and
Knowledge Management, 803–812. ACM.
Mohammad, Saif. 2011. From once upon a time to happily ever after: Tracking emotions in novels
and fairy tales. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for
Cultural Heritage, Social Sciences, and Humanities, 105–114. Association for Computational
Linguistics.
Nakagawa, Tetsuji, Kentaro Inui, and Sadao Kurohashi. 2010. Dependency tree-based sentiment
classification using CRFs with hidden variables. In Human Language Technologies: The
2010 Annual Conference of the North American Chapter of the Association for Computational
Linguistics, 786–794. Association for Computational Linguistics.
O’Connor, Brendan, Ramnath Balasubramanyan, Bryan R Routledge, and Noah A Smith. 2010.
From tweets to polls: Linking text sentiment to public opinion time series. ICWSM 11: 122–
129.
Ogneva, Maria. 2010. How companies can use sentiment analysis to improve their business.
Mashable. http://mashable.com/2010/04/19/sentiment-analysis/
Ott, Myle, Yejin Choi, Claire Cardie, and Jeffrey T Hancock. 2011. Finding deceptive opinion
spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies-Volume 1, 309–
319. Association for Computational Linguistics.
Pang, Bo, and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity
summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on
Association for Computational Linguistics, 271. Association for Computational Linguistics.
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification
using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical
Methods in Natural Language Processing-Volume 10, 79–86. Association for Computational
Linguistics.
Paul, Michael J, and Mark Dredze. 2011. You are what you tweet: Analyzing twitter for public
health. In ICWSM, 265–272.
Qiu, Guang, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target
extraction through double propagation. Computational Linguistics 37(1): 9–27.
Rao, Delip, and David Yarowsky. 2010. Detecting latent user properties in social media. In
Proceedings of the NIPS MLSN Workshop.
Rao, Delip, David Yarowsky, Abhishek Shreevats, and Manaswi Gupta. 2010. Classifying latent
user attributes in twitter. In Proceedings of the 2nd International Workshop on Search and
Mining User-Generated Contents, 37–44. ACM.
Riloff, E., and J. Shepherd. 1997. A corpus-based approach for building semantic lexicons. In
Proceedings of the Second Conference on Empirical Methods in Natural Language Processing
(EMNLP), 117–124.
Riloff, E., and R. Jones. 1999. Learning dictionaries for information extraction by multi-level
bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence
(AAAI), 474–479.
Ritter, A., S. Soderland, and O. Etzioni. 2009. What is this, anyway: Automatic hypernym
discovery. In Proceedings of the AAAI Spring Symposium on Learning by Reading and
Learning to Read.
Sadilek, Adam, Henry Kautz, and Jeffrey P Bigham. 2012. Finding your friends and following
them to where you are. In Proceedings of the Fifth ACM International Conference on Web
Search and Data Mining, 723–732. ACM.

Sarwar, Badrul, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative
filtering recommendation algorithms. In Proceedings of the 10th International Conference on
World Wide Web, 285–295. ACM.
Snow, Rion, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic
hypernym discovery. In NIPS, vol. 17, 1297–1304.
Snyder, Benjamin, and Regina Barzilay. 2007. Multiple aspect ranking using the good grief
algorithm. In HLT-NAACL, 300–307.
Socher, Richard, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y
Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over
a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 1631–1642.
Tang, Thomas Li-Ping, and Abdul H Safwat Ibrahim. 1998. Importance of human needs during
retrospective peacetime and the persian gulf war: Mideastern employees. International Journal
of Stress Management 5(1): 25–37.
Tang, Thomas Li-Ping, and W. Beryl West. 1997. The importance of human needs during
peacetime, retrospective peacetime, and the persian gulf war. International Journal of Stress
Management 4(1): 47–62.
Tang, T.L.P, A.H.S Ibrahim, and W.B. West. 2002. Effects of war-related stress on the satisfaction
of human needs: The united states and the middle east. International Journal of Management
Theory and Practices 3(1): 35–53.
Tang, Duyu, Furu Wei, Bing Qin, Li Dong, Ting Liu, and Ming Zhou. 2014a. A joint segmentation
and classification framework for sentiment analysis. In EMNLP, 477–487.
Tang, Duyu, Furu Wei, Bing Qin, Ming Zhou, and Ting Liu. 2014b. Building large-scale
Twitter-specific sentiment Lexicon: A representation learning approach. In COLING, 172–182.
Tang, Duyu, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. 2014c. Learning
sentiment-specific word embedding for twitter sentiment classification. In Proceedings of the 52nd Annual
Meeting of the Association for Computational Linguistics, 1555–1565.
Tang, Duyu. 2015. Sentiment-specific representation learning for document-level sentiment
analysis. In Proceedings of the Eighth ACM International Conference on Web Search and
Data Mining, 447–452. ACM.
Titov, Ivan, and Ryan T McDonald. 2008. A joint model of text and aspect ratings for sentiment
summarization. In ACL, vol. 8, 308–316. Citeseer.
Wang, Sida, and Christopher D Manning. 2012. Baselines and bigrams: Simple, good sentiment
and topic classification. In Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics: Short Papers-Volume 2, 90–94. Association for Computational
Linguistics.
Wang, Hongning, Yue Lu, and ChengXiang Zhai. 2011. Latent aspect rating analysis without
aspect keyword supervision. In Proceedings of the 17th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 618–626. ACM.
Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and
emotions in language. Language Resources and Evaluation 39(2–3): 165–210.
Xueke, Xu, Cheng Xueqi, Tan Songbo, Liu Yue, and Shen Huawei. 2013. Aspect-level opinion
mining of online customer reviews. Communications, China 10(3): 25–41.
Yang, Bishan, and Claire Cardie. 2012. Extracting opinion expressions with semi-Markov
conditional random fields. In EMNLP.
Yang, Bishan, and Claire Cardie. 2013. Joint inference for fine-grained opinion extraction. In ACL
(1), 1640–1649.
Yang, Bishan, and Claire Cardie. 2014a. Context-aware learning for sentence-level sentiment
analysis with posterior regularization. In Proceedings of ACL.
Yang, Bishan, and Claire Cardie. 2014b. Joint modeling of opinion expression extraction and
attribute classification. Transactions of the Association for Computational Linguistics 2: 505–
516.

Zhao, Wayne Xin, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects
and opinions with a maxent-LDA hybrid. In Proceedings of the 2010 Conference on Empirical
Methods in Natural Language Processing, 56–65. Association for Computational Linguistics.
Zhuang, Li, Feng Jing, and Xiao-Yan Zhu. 2006. Movie review mining and summarization.
In Proceedings of the 15th ACM International Conference on Information and Knowledge
Management, 43–50. ACM.

Chapter 4

Challenges in Sentiment Analysis
Saif M. Mohammad

Abstract A vast majority of the work in Sentiment Analysis has been on developing more accurate sentiment classifiers, usually involving supervised machine
learning algorithms and a battery of features. Surveys by Pang and Lee (Found
Trends Inf Retr 2(1–2):1–135, 2008), Liu and Zhang (A survey of opinion mining
and sentiment analysis. In: Aggarwal CC, Zhai C (eds) Mining text data.
Springer, New York, pp 415–463, 2012), and Mohammad (Sentiment analysis: detecting valence, emotions, and other affectual states from text.
In: Meiselman H (ed) Emotion measurement. Elsevier, Amsterdam, 2016b) give
summaries of the many automatic classifiers, features, and datasets used to detect
sentiment. In this chapter, we flesh out some of the challenges that still remain,
questions that have not been explored sufficiently, and new issues emerging from
taking on new sentiment analysis problems. We also discuss proposals to deal with
these challenges. The goal of this chapter is to equip researchers and practitioners
with pointers to the latest developments in sentiment analysis and encourage more
work in the diverse landscape of problems, especially those areas that are relatively
less explored.
Keywords Sentiment analysis tasks • Sentiment of the writer, reader, and other
entities • Sentiment towards aspects of an entity • Stance detection
• Sentiment lexicons • Sentiment annotation • Multilingual sentiment analysis

S.M. Mohammad
National Research Council Canada, 1200 Montreal Rd., Ottawa, ON, Canada
e-mail: Saif.Mohammad@nrc-cnrc.gc.ca
© Her Majesty the Queen in Right of Canada 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_4

4.1 Introduction
There has been a large volume of work in sentiment analysis over the past decade
and it continues to rapidly develop in new directions. However, much of it is on
developing more accurate sentiment classifiers. In this chapter, we flesh out some of
the challenges that still remain. We start by discussing different sentiment analysis
problems and how one of the challenges is to explore new sentiment analysis
problems that go beyond simply determining whether a piece of text is positive,
negative, or neutral (Sect. 4.2). Some of the more ambitious problems that need
more work include detecting sentiment at various levels of text granularity (terms,
sentences, paragraphs, etc.); detecting sentiment of the reader or sentiment of entities
mentioned in the text; detecting sentiment towards aspects of products; detecting
stance towards pre-specified targets that may not be explicitly mentioned in the text
and that may not be the targets of opinion in the text; and detecting semantic roles
of sentiment. Since many sentiment analysis systems rely on sentiment lexicons, we
discuss capabilities and limitations of existing manually and automatically created
sentiment lexicons in Sect. 4.3. In Sect. 4.4, we discuss the difficult problem of
sentiment composition—how to predict the sentiment of a combination of terms.
More specifically, we discuss the determination of sentiment of phrases (that may
include negators, degree adverbs, and intensifiers) and sentiment of sentences and
tweets. In Sect. 4.5, we discuss challenges in annotation of data for sentiment.
We provide categories of sentences that are particularly challenging for sentiment
annotation. Section 4.6 presents challenges in multilingual sentiment analysis. This
is followed by a discussion on the challenges of applying sentiment analysis to
downstream applications, and finally, some concluding remarks (Sect. 4.7).

4.2 The Array of Sentiment Analysis Tasks
Sentiment analysis is a generic name for a large number of opinion- and affect-related
tasks, each of which presents its own challenges. The sub-sections below
provide an overview.

4.2.1 Sentiment at Different Text Granularities
Sentiment can be determined at various levels: from sentiment associations of
words and phrases; to sentiment of sentences, SMS messages, chat messages, and
tweets; to sentiment in product reviews, blog posts, and whole documents. A word–
sentiment (or valence) association lexicon may have entries such as:
delighted – positive
killed – negative
shout – negative
desk – neutral

These lexicons can be created either by manual annotation or through automatic
means. Manually created lexicons tend to be on the order of a few thousand
entries, but automatically generated lexicons can capture sentiment associations for
hundreds of thousands of unigrams (single-word strings) and even for larger expressions
such as bigrams (two-word sequences) and trigrams (three-word sequences).
Entries in an automatically generated lexicon often also include a real-valued score
indicating the strength of association between the word and the valence category.
These numbers are prior estimates of the sentiment of terms in an average usage
of the term. While sentiment lexicons are often useful in sentence-level sentiment
analysis,1 the same terms may convey different sentiments in different contexts. The
SemEval 2013 and 2014 Sentiment Analysis in Twitter shared tasks had a separate
sub-task aimed at identifying sentiment of terms in context. Automatic systems have
largely performed well in this task, obtaining F-scores close to 0.9. We discuss
manually and automatically created sentiment lexicons in more detail in Sect. 4.3.
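As a rough illustration of the lexicon structure described above, the following Python sketch stores real-valued prior scores for unigrams and bigrams and looks up the longest matching entry at each position. The entries, the scores, and the function name `prior_scores` are illustrative assumptions, not values or code from any published lexicon.

```python
# Toy valence lexicon keyed by n-gram tuples.
# All entries and scores are illustrative assumptions.
LEXICON = {
    ("delighted",): 0.9,
    ("killed",): -0.9,
    ("shout",): -0.4,
    ("desk",): 0.0,
    ("bad",): -0.7,
    ("not", "bad"): 0.3,  # a bigram entry can differ from its parts
}


def prior_scores(tokens, lexicon=LEXICON, max_n=2):
    """Return (ngram, score) pairs, preferring the longest matching n-gram."""
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):  # try bigrams before unigrams
            ngram = tuple(tokens[i:i + n])
            if len(ngram) == n and ngram in lexicon:
                matches.append((ngram, lexicon[ngram]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return matches


print(prior_scores("the film was not bad".split()))
```

Because these scores are priors for an average usage, a lookup like this says nothing about how a term behaves in a particular context, which is the gap the term-in-context sub-task addresses.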
Sentence-level valence classification systems assign labels such as positive,
negative, or neutral to whole sentences. It should be noted that the valence of a
sentence is not simply the sum of the polarities of its constituent words. Automatic
systems learn a model from labeled training data (instances that are already marked
as positive, negative, or neutral) using a large number of features such as word
and character ngrams, valence association lexicons, negation lists, word clusters,
and even embeddings-based features. In recent years, there have been a number
of shared task competitions on valence classification such as the 2013, 2014,
and 2015 SemEval shared tasks titled Sentiment Analysis in Twitter, the 2014
and 2015 SemEval shared tasks on Aspect Based Sentiment Analysis, the 2015
SemEval shared task Sentiment Analysis of Figurative Language in Twitter, and
the 2015 Kaggle competition Sentiment Analysis on Movie Reviews.2 The NRC-Canada system
(Mohammad et al. 2013a; Kiritchenko et al. 2014b), a supervised
machine learning system, came first in the 2013 and 2014 competitions. Other
sentiment analysis systems developed specifically for tweets include those by Pak
and Paroubek (2010), Agarwal et al. (2011), Thelwall et al. (2011), Brody and
Diakopoulos (2011), Aisopos et al. (2012), and Bakliwal et al. (2012). However,
even the best systems currently obtain an F-score of only about 0.7.
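The feature types listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the feature set of any particular shared-task system: the negation list, the two-word lexicon, and the feature names are all hypothetical.

```python
from collections import Counter

NEGATORS = {"not", "no", "never"}          # illustrative negation list
TOY_LEXICON = {"good": 1.0, "bad": -1.0}   # illustrative valence scores


def word_ngrams(tokens, n):
    """Contiguous word n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Contiguous character n-grams, including spaces."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(text):
    """Map a sentence to a sparse feature dictionary for a classifier."""
    lowered = text.lower()
    tokens = lowered.split()
    feats = Counter()
    for n in (1, 2):
        for gram in word_ngrams(tokens, n):
            feats[f"w{n}={gram}"] += 1
    for gram in char_ngrams(lowered, 3):
        feats[f"c3={gram}"] += 1
    # Lexicon feature: plain sum of prior scores of the words.
    feats["lex_sum"] = sum(TOY_LEXICON.get(t, 0.0) for t in tokens)
    # Negation feature: number of negators in the sentence.
    feats["n_negators"] = sum(t in NEGATORS for t in tokens)
    return feats


features = extract_features("not good")
print(features["lex_sum"], features["n_negators"], features["w2=not good"])
```

For "not good" the lexicon sum alone is positive, which illustrates why the valence of a sentence is not simply the sum of word polarities; a learned model has to rely on the bigram and negation features to recover the negative reading.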
Sentiment analysis involving many sentences is often broken down into the
sentiment analysis of the component sentences. However, there is interesting work
in sentiment analysis of documents to generate text summaries (Ku et al. 2006; Liu
et al. 2007; Somprasertsri and Lalitrojwong 2010; Stoyanov and Cardie 2006; Lloret
et al. 2009), as well as detecting the patterns of sentiment and detecting sentiment
networks in novels and fairy tales (Nalisnick and Baird 2013a,b; Mohammad and
Yang 2011).

1 The top systems in the SemEval-2013 and 2014 Sentiment Analysis in Twitter tasks used large
sentiment lexicons (Wilson et al. 2013; Rosenthal et al. 2014a).
2 http://alt.qcri.org/semeval2015/task10/
http://alt.qcri.org/semeval2015/task12/
http://alt.qcri.org/semeval2015/task11/
http://www.kaggle.com/c/sentiment-analysis-on-movie-reviews

4.2.2 Detecting Sentiment of the Writer, Reader, and Other Entities
On the surface, sentiment may seem unambiguous, but looking closer, it is easy
to see how sentiment can be associated with any of the following: 1. the speaker or
writer, 2. the listener or reader, or 3. one or more entities mentioned in the utterance.
A large majority of research in sentiment analysis has focused on detecting the
sentiment of the speaker, and this is often done by analyzing only the utterance.
However, there are several instances where it is unclear whether the sentiment in the
utterance is the same as the sentiment of the speaker. For example, consider:
James: The pop star suffered a fatal overdose of heroin.

The sentence describes a negative event (death of a person), but it is unclear whether
to conclude that James (the speaker) is personally saddened by the event. It is
possible that James is a news reader and merely communicating information about
the event. Developers of sentiment systems have to decide beforehand whether they
wish to assign a negative or neutral sentiment to the speaker in such cases. More
generally, they have to decide whether the speaker’s sentiment will be chosen to be
neutral in the absence of clear signifiers of the speaker’s own sentiment, or whether the
speaker’s sentiment will be chosen to be the same as the sentiment of events and
topics mentioned in the utterance.
On the other hand, people can react differently to the same utterance, for
example, people on opposite sides of a debate or rival sports fans. Thus modeling
listener sentiment requires modeling listener profiles. This is an area of research not
explored much by the community. Similarly, there is no work on modeling sentiment
of entities mentioned in the text, for example, given:
Drew: Jackson could not stop talking about the new Game of Thrones episode.

It will be useful to develop automatic systems that can deduce that Jackson (not
Drew) liked the new episode of Game of Thrones (a TV show).

4.2.3 Sentiment Towards Aspects of an Entity
A review of a product or service can express sentiment towards various aspects.
For example, a restaurant review can speak positively about the service, but express
a negative attitude towards the food. There is now a growing amount of work in
detecting aspects of products in text and also in determining sentiment towards
these aspects. In 2014, a shared task was organized for detecting aspect sentiment
in restaurant and laptop reviews (Pontiki et al. 2014a). The best performing
systems had a strong sentence-level sentiment analysis system to which they added
localization features so that more weight was given to sentiment features close to the
mention of the aspect. This task was repeated in 2015. It will be useful to develop
aspect-based sentiment systems for other domains such as blogs and news articles
as well. (See the proceedings of SemEval-2014 and 2015 for details about participating
aspect sentiment systems.)
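The idea of localization features can be sketched as a distance-weighted lexicon score, where sentiment words closer to the aspect mention count more. The weighting function 1 / (1 + distance), the toy lexicon, and the name `aspect_score` are assumptions made for illustration; the shared-task systems used their own feature designs.

```python
# Illustrative valence scores (assumed, not from a published lexicon).
TOY_LEXICON = {"great": 1.0, "friendly": 0.8, "bland": -0.8, "cold": -0.6}


def aspect_score(tokens, aspect, lexicon=TOY_LEXICON):
    """Sum lexicon scores, down-weighting words far from the aspect mention."""
    positions = [i for i, tok in enumerate(tokens) if tok == aspect]
    if not positions:
        return 0.0  # aspect not mentioned in this sentence
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            distance = min(abs(i - p) for p in positions)
            total += lexicon[tok] / (1 + distance)
    return total


review = "the service was friendly but the food was bland".split()
print(aspect_score(review, "service") > 0)  # nearest sentiment word: friendly
print(aspect_score(review, "food") < 0)     # nearest sentiment word: bland
```

The same sentence thus yields a positive score for one aspect and a negative score for the other, which a single sentence-level label cannot express.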

4.2.4 Stance Detection
Stance detection is the task of automatically determining from text whether the
author of the text is in favor of, against, or neutral towards a proposition or target.
For example, given the following target and text pair:
Target of interest: women have the right to abortion
Text: A foetus has rights too!

Humans can deduce from the text that the speaker is against the proposition.
However, this is a challenging task for computers. To successfully detect stance,
automatic systems often have to identify relevant bits of information that may not
be present in the focus text. The systems also have to first identify the target of
opinion in the text and then determine its implication on the target of interest. Note
that the target of opinion need not be the same as the target of interest. For example,
if one actively supports foetus rights (target of opinion), then he or she is
likely against the right to abortion (target of interest). Automatic systems can obtain
such information from large amounts of domain text.
Automatically detecting stance has widespread applications in information
retrieval, text summarization, and textual entailment. In fact, one can argue that
stance detection can bring complementary information to sentiment analysis,
because we often care about the author’s evaluative outlook towards specific targets
and propositions rather than simply about whether the speaker was angry or happy.
Mohammad et al. (2016b) created the first dataset of tweets labeled for both
stance and sentiment. More than 4000 tweets are annotated for whether one can
deduce favorable or unfavorable stance towards one of five targets ‘Atheism’,
‘Climate Change is a Real Concern’, ‘Feminist Movement’, ‘Hillary Clinton’, and
‘Legalization of Abortion’. Each of these tweets is also annotated for whether
the target of opinion expressed in the tweet is the same as the given target of
interest. Finally, each tweet is annotated for whether it conveys positive, negative,
or neutral sentiment. Partitions of this stance-annotated data were used as training
and test sets in the SemEval-2016 shared task competition, Task #6: Detecting
Stance from Tweets (Mohammad et al. 2016a). Participants were provided with
2,914 training instances labeled for stance for the five targets. The test data included
1,249 instances. All of the stance data is made freely available through the shared
task website. The task received submissions from 19 teams. The best performing
system obtained an overall average F-score of 67.8 in a three-way classification
task: favour, against, or neither. They employed two recurrent neural network (RNN)
classifiers: the first was trained to predict task-relevant hashtags on a large unlabeled
Twitter corpus. This network was used to initialize a second RNN classifier, which
was trained with the provided training data (Zarrella and Marsh 2016). Mohammad
et al. (2016b) developed an SVM system that only uses features drawn from word
and character ngrams and word embeddings to obtain an even better F-score of 70.3
on the shared task’s test set. Yet, performance of systems is substantially lower on
tweets where the target of opinion is an entity other than the target of interest.
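The text reports an "overall average F-score" without defining it. The sketch below assumes, for illustration only, that the average is taken over the per-class F-scores of the favor and against classes, with the neither class affecting the result only through misclassifications. The label strings and function names are hypothetical.

```python
def f_score(gold, pred, label):
    """Harmonic mean of precision and recall for one class label."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f(gold, pred, labels=("favor", "against")):
    """Unweighted mean of the per-class F-scores (an assumed definition)."""
    return sum(f_score(gold, pred, label) for label in labels) / len(labels)


gold = ["favor", "against", "neither", "against"]
pred = ["favor", "against", "against", "neither"]
print(average_f(gold, pred))
```

Here the favor class is predicted perfectly while the against class has precision and recall of one half, so the two per-class scores differ even though both classes contribute equally to the average.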
Most of the earlier work focused on two-sided debates, for example on congressional debates (Thomas et al. 2006) or debates in online forums (Somasundaran
and Wiebe 2009; Murakami and Raymond 2010; Anand et al. 2011; Walker et al.
2012; Hasan and Ng 2013; Sridhar et al. 2014). New research in domains such as
social media texts, and approaches that combine traditional sentiment analysis with
relation extraction, could substantially improve the state of the art in
automatic stance detection.

4.2.5 Detecting Semantic Roles of Feeling
Past work in sentiment analysis has focused extensively on detecting polarity,
and to a smaller extent on detecting the target of the sentiment (the stimulus)
(Popescu and Etzioni 2005; Su et al. 2006; Xu et al. 2013; Qadir 2009; Zhang
et al. 2010; Zhang and Liu 2011; Kessler and Nicolov 2009). However, there
exist other aspects relevant to sentiment. Tables 4.1 and 4.2 show FrameNet
(Baker et al. 1998) frames for ‘feelings’ and ‘emotions’, respectively. Observe
that in addition to Evaluation, State, and Stimulus, several other roles such as
Reason, Degree, Topic, and Circumstance are also of significance and beneficial
to downstream applications such as information retrieval, summarization, and
textual entailment. Detecting these various roles is essentially a semantic
role-labeling problem (Gildea and Jurafsky 2002; Màrquez et al. 2008; Palmer et al.
2010), and it is possible that they can be modeled jointly to improve detection
accuracy. Li and Xu (2014) proposed a rule-based system to extract the event that
was the cause of an emotional Weibo (Chinese microblogging service) message.
Mohammad et al. (2015a) created a corpus of tweets from the run up to the 2012
US presidential elections, with annotations for sentiment, emotion, stimulus, and
experiencer. The data also includes annotations for whether the tweet is sarcastic,
ironic, or hyperbolic. Diman Ghazi and Szpakowicz (2015) compiled FrameNet
sentences that were tagged with the stimulus of certain emotions.

Table 4.1 The FrameNet frame for feeling
Role              Description
Core
  Emotion         The feeling that the experiencer experiences
  State           The state the experiencer is in
  Evaluation      A negative or positive assessment of the experiencer regarding his/her state
  Experiencer     One who experiences the emotion and is in the state
Non-Core
  Explanation     The thing that leads to the experiencer feeling the emotion or state

Table 4.2 The FrameNet frame for emotions
Role              Description
Core
  Experiencer     The person that experiences or feels the emotion
  State           The abstract noun that describes the experience
  Stimulus        The person or event that evokes the emotional response in the experiencer
  Topic           The general area in which the emotion occurs
Non-Core
  Circumstances   The condition in which stimulus evokes response
  Degree          The extent to which the experiencer’s emotion deviates from the norm for
                  the emotion
  Empathy_target  The Empathy_target is the individual or individuals with which the
                  experiencer identifies emotionally
  Manner          Any description of the way in which the experiencer experiences
                  the stimulus which is not covered by more specific frame elements
  Reason          The explanation for why the stimulus evokes a certain emotional response
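One way to hold such role annotations in a program is a simple record with one optional field per role of Table 4.2. The class name, the choice to require only the experiencer, and the example sentence are all assumptions made for illustration; they are not part of FrameNet or of the corpora cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmotionFrame:
    """Role fillers for the emotions frame; roles follow Table 4.2."""
    # Core roles (only the experiencer is required in this sketch).
    experiencer: str
    state: Optional[str] = None
    stimulus: Optional[str] = None
    topic: Optional[str] = None
    # Non-core roles.
    circumstances: Optional[str] = None
    degree: Optional[str] = None
    empathy_target: Optional[str] = None
    manner: Optional[str] = None
    reason: Optional[str] = None

    def filled_roles(self):
        """Return only the roles that were annotated."""
        return {k: v for k, v in vars(self).items() if v is not None}


# Hypothetical annotation for: "Sam was very angry about the delay."
frame = EmotionFrame(
    experiencer="Sam", state="anger", stimulus="the delay", degree="very"
)
print(frame.filled_roles())
```

A polarity-only system would reduce the same sentence to a single negative label, discarding who feels the emotion and what caused it.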

4.2.6 Detecting Affect and Emotions
Sentiment analysis is most commonly used to refer to the goal of determining
the valence or polarity of a piece of text. However, it can refer more generally to
determining one’s attitude towards a particular target or topic. Here, attitude can
even mean emotional or affectual attitude such as frustration, joy, anger, sadness,
excitement, and so on. Russell (1980) developed a circumplex model of affect and
showed that it can be characterized by two primary dimensions: valence (positive
and negative dimension) and arousal (degree of reactivity to stimulus). Thus, it
is not surprising that a large amount of work in sentiment analysis is focused on
determining valence. However, there is barely any work on automatically detecting
arousal and a relatively small amount of work on detecting emotions such as anger,
frustration, sadness, and optimism (Strapparava and Mihalcea 2007; Aman and
Szpakowicz 2007; Tokuhisa et al. 2008; Neviarouskaya et al. 2009; Bellegarda
2010; Mohammad 2012; Boucouvalas 2002; Zhe and Boucouvalas 2002; Holzman
and Pottenger 2003; Ma et al. 2005; John et al. 2006; Mihalcea
and Liu 2006; Genereux and Evans 2006). Detecting these more subtle aspects
of sentiment has wide-ranging applications, for example in developing customer
relation models, public health, military intelligence, and the video games industry,
where it is necessary to make distinctions between anger and sadness (both of which
are negative), calm and excited (both of which are positive), and so on.
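The distinction drawn above can be made concrete by placing each emotion at a point in a two-dimensional valence and arousal space. The coordinates below are illustrative assumptions chosen only to respect the signs discussed in the text; they are not measurements from Russell (1980) or any other study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affect:
    valence: float  # negative (-1.0) to positive (+1.0)
    arousal: float  # low activation (0.0) to high activation (1.0)


# Illustrative coordinates (assumed values).
CIRCUMPLEX = {
    "anger": Affect(valence=-0.7, arousal=0.9),
    "sadness": Affect(valence=-0.7, arousal=0.2),
    "calm": Affect(valence=0.6, arousal=0.1),
    "excited": Affect(valence=0.7, arousal=0.9),
}


def same_valence(a, b):
    """True if both emotions fall on the same side of the valence axis."""
    return (CIRCUMPLEX[a].valence > 0) == (CIRCUMPLEX[b].valence > 0)


def arousal_gap(a, b):
    """Absolute difference in arousal between two emotions."""
    return abs(CIRCUMPLEX[a].arousal - CIRCUMPLEX[b].arousal)


print(same_valence("anger", "sadness"), arousal_gap("anger", "sadness"))
```

A valence-only classifier sees anger and sadness as the same label, whereas the arousal coordinate separates them.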

4.3 Sentiment of Words
Term–sentiment associations have been captured by manually created sentiment
lexicons as well as automatically generated ones.

4.3.1 Manually Generated Term-Sentiment Association Lexicons
The General Inquirer (GI) has sentiment labels for about 3,600 terms (Stone et al.
1966). Hu and Liu (2004) manually labeled about 6,800 words and used them
for detecting sentiment of customer reviews. The MPQA Subjectivity Lexicon,
which draws from the General Inquirer and other sources, has sentiment labels
for about 8,000 words (Wilson et al. 2005). The NRC Emotion Lexicon has
sentiment and emotion labels for about 14,000 words (Mohammad and Turney
2010; Mohammad and Yang 2011). These labels were compiled through Mechanical
Turk annotations.3
For people, assigning a score indicating the degree of sentiment is not natural.
Different people may assign different scores to the same target item, and it is hard
for even the same annotator to remain consistent when annotating a large number of
items. In contrast, it is easier for annotators to determine whether one word is more
positive (or more negative) than the other. However, the latter requires a much larger
number of annotations than the former (on the order of N², where N is the number
of items to be annotated).
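To give a sense of scale, the number of unordered pairs among N items is N(N − 1)/2, which grows quadratically. The short sketch below evaluates this for a few lexicon sizes; the sizes are chosen only for illustration.

```python
def pairwise_comparisons(n_items):
    """Number of unordered pairs among n_items: n(n - 1) / 2."""
    return n_items * (n_items - 1) // 2


for n in (10, 1000, 14000):
    print(n, pairwise_comparisons(n))
```

At the size of a lexicon with about 14,000 entries, exhaustive pairwise comparison would need tens of millions of judgments before any redundancy across annotators is added.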
An annotation scheme that retains the comparative aspect of annotation while
still requiring only a small number of annotations comes from survey analysis
techniques and is called MaxDiff (Louviere 1991). The annotator is presented with
four terms and asked which word is the most positive and which is the least positive.
By answering just these two questions, five out of the six inequalities are known. If
the respondent says that A is most positive and D is least positive, then:
A > B; A > C; A > D; B > D; C > D
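The inequalities above can be derived mechanically from a single answer. The following sketch enumerates them for a four-item question and reports the one pair that remains undetermined; the function name and the pair representation are assumptions made for illustration.

```python
from itertools import combinations


def known_inequalities(items, best, worst):
    """Pairs (x, y), meaning 'x is more positive than y', implied by one answer."""
    known = set()
    for x in items:
        if x != best:
            known.add((best, x))   # the best item beats every other item
        if x != best and x != worst:
            known.add((x, worst))  # every other item beats the worst item
    return known


items = ["A", "B", "C", "D"]
known = known_inequalities(items, best="A", worst="D")
all_pairs = {frozenset(pair) for pair in combinations(items, 2)}
unresolved = all_pairs - {frozenset(pair) for pair in known}

print(sorted(known))
print([sorted(pair) for pair in unresolved])
```

Only the relation between B and C is left open, which is why two questions recover five of the six pairwise orderings.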

Each of these MaxDiff questions can be presented to multiple annotators. The
responses to the MaxDiff questions can then be easily translated into a ranking of
all the terms and also a real-valued score for all the terms (Orme 2009). If two
words have very different degrees of association (for example, A ≫ D), then A will
be chosen as most positive much more often than D and D will be chosen as least
positive much more often than A. This will eventually lead to a ranked list such that
A and D are significantly farther apart, and their real-valued association scores are
also significantly different. On the other hand, if two words have similar degrees
of association with positive sentiment (for example, A and B), then it is possible
that for MaxDiff questions having both A and B, some annotators will choose A as
most positive, and some will choose B as most positive. Further, both A and B will
be chosen as most positive (or least positive) a similar number of times. This will
result in a list such that A and B are ranked close to each other and their real-valued
association scores will also be close in value.

3 https://www.mturk.com/mturk/welcome
MaxDiff, also known as Best–Worst Scaling, was used for obtaining annotations of
relation similarity of pairs of items in a SemEval-2012 shared task (Jurgens et al.
2012). Kiritchenko and Mohammad (2016a) applied it to obtain real-valued sentiment
association scores for words and phrases in three different domains: general English,
English Twitter, and Arabic Twitter. They showed that in all three domains the ranking
of words by sentiment remains remarkably consistent even when the annotation
process is repeated with a different set of annotators. They also determined the
minimum difference in sentiment association that is perceptible to native speakers
of a language.
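
One simple way to turn MaxDiff responses into real-valued scores is a counting procedure: the proportion of times a term was chosen as most positive minus the proportion of times it was chosen as least positive. The sketch below assumes this counting rule and uses made-up responses; the text above does not say which scoring procedure was used in any particular study, so this should be read as one plausible instance rather than the method of the cited work.

```python
from collections import Counter

def maxdiff_scores(responses):
    """Score each term as (times best - times worst) / times shown.

    `responses` is an iterable of (terms, best, worst) tuples; scores fall in [-1, 1].
    """
    shown, best_n, worst_n = Counter(), Counter(), Counter()
    for terms, best, worst in responses:
        shown.update(terms)
        best_n[best] += 1
        worst_n[worst] += 1
    return {t: (best_n[t] - worst_n[t]) / shown[t] for t in shown}

# Hypothetical answers from four annotators to the same 4-tuple.
responses = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "B", "D"),
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "A", "C"),
]
scores = maxdiff_scores(responses)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Terms that annotators rarely disagree on end up near the extremes, while terms that swap places across annotators receive scores close to each other.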

4.3.2 Automatically Generated Term-Sentiment Association Lexicons
Semi-supervised and automatic methods have also been proposed to detect the
polarity of words. Hatzivassiloglou and McKeown (1997) proposed an algorithm
to determine the polarity of adjectives. SentiWordNet was created using supervised
classifiers as well as manual annotation (Esuli and Sebastiani 2006). Turney and
Littman (2003) proposed a minimally supervised algorithm to calculate the polarity
of a word by determining if its tendency to co-occur with a small set of positive
seed words is greater than its tendency to co-occur with a small set of negative seed
words. Mohammad et al. (2013b) employed the Turney method to generate a lexicon
(Hashtag Sentiment Lexicon) from tweets with certain sentiment-bearing seed-word
hashtags (such as #excellent, #good, and #terrible) and another lexicon
(Sentiment140 Lexicon) from tweets with emoticons.4 Since the lexicons themselves
are generated from tweets, they even have entries for creatively spelled words
(e.g. happpeee), slang (e.g. bling), abbreviations (e.g. lol), and even hashtags and
conjoined words (e.g. #loveumom). Cambria et al. (2016) created SenticNet that has
sentiment entries for 30,000 words and multi-word expressions using information
propagation to connect various parts of common-sense knowledge representations.
Kiritchenko et al. (2014b) proposed a method to create separate lexicons for words
found in negated context and those found in affirmative context; the idea being
that the same word contributes to sentiment differently depending on whether
it is negated or not. These lexicons contain sentiment associations for hundreds
of thousands of unigrams and bigrams. However, they do not explicitly handle
combinations of terms with modals, degree adverbs, and intensifiers.

4 http://www.purl.com/net/lexicons
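
The seed-word idea described in this section can be illustrated with a small sketch that scores a word by the difference between its pointwise mutual information (PMI) with positive-seed tweets and with negative-seed tweets. The seed sets, the toy tweets, and the add-one smoothing are assumptions made for the example; the published lexicons were built from far larger corpora and may differ in detail.

```python
import math
from collections import Counter

POS_SEEDS = {"#good", "#excellent"}   # illustrative seed hashtags
NEG_SEEDS = {"#terrible", "#bad"}
SEEDS = POS_SEEDS | NEG_SEEDS

def pmi_lexicon(tweets, smoothing=1.0):
    """Score(w) = PMI(w, positive) - PMI(w, negative), from seed-labelled tweets."""
    pos_counts, neg_counts = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        has_pos = any(t in POS_SEEDS for t in tokens)
        has_neg = any(t in NEG_SEEDS for t in tokens)
        if has_pos == has_neg:   # no seed at all, or conflicting seeds: skip
            continue
        target = pos_counts if has_pos else neg_counts
        target.update(t for t in tokens if t not in SEEDS)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    lexicon = {}
    for w in set(pos_counts) | set(neg_counts):
        # The difference of the two PMI values simplifies to this log ratio;
        # smoothing (an assumption) avoids taking the log of zero.
        ratio = ((pos_counts[w] + smoothing) * n_neg) / ((neg_counts[w] + smoothing) * n_pos)
        lexicon[w] = math.log2(ratio)
    return lexicon

tweets = [
    "happpeee day lol #good",
    "lol so happpeee #excellent",
    "awful service lol #terrible",
    "awful day #bad",
    "nothing to see here",
]
lexicon = pmi_lexicon(tweets)
for word in sorted(lexicon, key=lexicon.get, reverse=True):
    print(f"{word:10s} {lexicon[word]:+.2f}")
```

Because the scores come straight from the tweets, creatively spelled tokens such as happpeee receive entries just like dictionary words do.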

4.4 Sentiment of Phrases, Sentences, and Tweets: Sentiment Composition
Semantic composition, which aims at determining a representation of the meaning
of two words through manipulations of their individual representations, has gained
substantial attention in recent years with work from Mitchell and Lapata (2010),
Baroni and Zamparelli (2010), Rudolph and Giesbrecht (2010), Yessenalina and
Cardie (2011), Grefenstette et al. (2013), Grefenstette and Sadrzadeh (2011), and
Turney (2014). Socher et al. (2012) and Mikolov et al. (2013) introduced deep
learning models and distributed word representations in vector space (word embeddings) to obtain substantial improvements over the state-of-the-art in semantic
composition. Mikolov’s word2vec tool for generating word embeddings is available
publicly.5
Sentiment of a phrase or a sentence is often not simply the sum of the sentiments
of its constituents. Sentiment composition is the task of determining the sentiment
of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents.
Lexicons that include sentiment associations for phrases as well as their constituent
words are referred to as sentiment composition lexicons (SCLs). Kiritchenko
and Mohammad created sentiment composition lexicons for English and Arabic
that included: (1) negated expressions (Kiritchenko and Mohammad 2016a,b),
(2) phrases with adverbs, modals, and intensifiers (Kiritchenko and Mohammad
2016a,b), and (3) opposing polarity phrases (where at least one word in the phrase
is positive and at least one word is negative, for example, happy accident and
dark chocolate) (Kiritchenko and Mohammad 2016c). Socher et al. (2013) took
a dataset of movie review sentences that were annotated for sentiment and further
annotated every word and phrasal constituent within those sentences for sentiment.
Such datasets where sentences, phrases, and their constituent words are annotated
for sentiment are helping foster further research on how sentiment is composed.
We discuss specific types of sentiment composition, and challenges for automatic
methods that address them, in the sub-sections below.

5 https://code.google.com/p/word2vec

4.4.1 Negated Expressions
Morante and Sporleder (2012) define negation to be “a grammatical category that
allows the changing of the truth value of a proposition”. Negation is often expressed
through the use of negative signals or negator words such as not and never, and it can
significantly affect the sentiment of its scope. Understanding the impact of negation
on sentiment improves automatic analysis of sentiment. Earlier works on negation
handling employ simple heuristics such as flipping the polarity of the words in a
negator’s scope (Kennedy and Inkpen 2005; Choi and Cardie 2008) or changing
the degree of sentiment of the modified word by a fixed constant (Taboada et al.
2011). Zhu et al. (2014) show that these simple heuristics fail to capture the true
impact of negators on the words in their scope. They show that negators
often make positive words negative (albeit with lower intensity) and make negative
words less negative (not positive). Zhu et al. also propose certain embeddings-based
recursive neural network models to capture the impact of negators more precisely.
As mentioned earlier, Kiritchenko et al. (2014b) capture the impact of negation by
creating separate sentiment lexicons for words seen in affirmative context and those
seen in negated contexts. They use a hand-chosen list of negators and determine
scope to be starting from the negator and ending at the first punctuation (or end of
sentence).
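
The heuristics discussed above can be made concrete with a short sketch. It marks negation scope from a negator to the next punctuation mark and contrasts polarity flipping with shifting by a fixed constant. The negator list, the `_NEG` suffix, the score scale, and the shift amount of 4 are all illustrative assumptions, not values taken from the cited systems.

```python
import re

NEGATORS = {"not", "never", "no", "cannot"}   # hand-chosen, illustrative list
PUNCT = re.compile(r"[.,!?;:]")

def mark_negated(text):
    """Append _NEG to tokens that follow a negator, up to the next punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;:]", text.lower())
    out, in_scope = [], False
    for tok in tokens:
        if PUNCT.fullmatch(tok):
            in_scope = False          # punctuation closes the scope
            out.append(tok)
        elif tok in NEGATORS:
            in_scope = True           # a negator opens a scope
            out.append(tok)
        else:
            out.append(tok + "_NEG" if in_scope else tok)
    return out

def flip(score):
    """Polarity reversal: the negated term gets the opposite score."""
    return -score

def shift(score, amount=4):
    """Move the score toward the opposite polarity by a fixed constant."""
    if score == 0:
        return 0
    return score - amount if score > 0 else score + amount

print(mark_negated("This is not very good, but I like it."))
print(flip(5), shift(5))     # a strongly positive word on a -5..+5 scale
print(flip(-3), shift(-3))   # a moderately negative word
```

Tokens carrying the `_NEG` suffix can then be looked up in a separate negated-context lexicon, while untagged tokens use the affirmative-context one. Neither `flip` nor `shift` reproduces the asymmetric behaviour reported by Zhu et al. (2014), which is the motivation for learning the effect from data.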
Several aspects about negation are still not understood though: for example, can
negators be ranked in terms of their average impact on the sentiment of their scopes
(which negators impact sentiment more and which impact sentiment less); in what
contexts does the same negator impact the sentiment of its scope more and in what
contexts is the impact less; how do people in different communities and cultures use
negations differently; and how negations of sentiment expressions should be dealt
with by paraphrase and textual entailment systems.

4.4.2 Phrases with Degree Adverbs, Intensifiers, and Modals
Degree adverbs such as barely, moderately, and slightly quantify the extent or
amount of the predicate. Intensifiers such as too and very are modifiers that do
not change the propositional content (or truth value) of the predicate they modify,
but they add to the emotionality. However, even linguists are hard pressed to
provide comprehensive lists of degree adverbs and intensifiers. Additionally, the
boundaries between degree adverbs and intensifiers can sometimes be blurred,
and so it is not surprising that the terms are occasionally used interchangeably.
Impacting propositional content or not, both degree adverbs and intensifiers impact
the sentiment of the predicate, and there is some work in exploring this interaction
(Zhang et al. 2008; Wang and Wang 2012; Xu et al. 2008; Lu and Tsou 2010;
Taboada et al. 2008). Most of this work focuses on identifying sentiment words
by bootstrapping over patterns involving degree adverbs and intensifiers. Thus
several areas remain unexplored, such as identifying patterns and regularities in how
different kinds of degree adverbs and intensifiers impact sentiment, ranking degree
adverbs and intensifiers in terms of how they impact sentiment, and determining
when (in what contexts) the same modifier will impact sentiment differently than
its usual behavior. (See Kiritchenko and Mohammad (2016b) for some recent work
exploring these questions in manually annotated sentiment composition lexicons.)
Modals (a kind of auxiliary verb) are used to convey the degree of confidence,
permission, or obligation to the predicate. Thus, if the predicate is sentiment bearing,
then the sentiment of the combination of the modal and the predicate can be different
from the sentiment of the predicate alone. For example, cannot work seems less
positive than work or will work (cannot and will are modals). There is little work
on automatically determining the impact of modals on sentiment.

4.4.3 Sentiment of Sentences, Tweets, and SMS messages
Bag-of-words models such as the NRC-Canada system (Mohammad et al. 2013a;
Kiritchenko et al. 2014a,b) and Unitn (Severyn and Moschitti 2015) have been very
successful in recent shared task competitions on determining sentiment of whole
tweets, SMS messages, and sentences. However, approaches that apply systematic
sentiment composition of smaller units to determine sentiment of sentences are
growing in popularity. Socher et al. (2013) proposed a word-embeddings based
model that learns the sentiment of term compositions. They obtain state-of-the-art
results in determining both the overall sentiment and sentiment of constituent
phrases in movie review sentences. This has inspired tremendous interest in more
embeddings-based work for sentiment composition (Dong et al. 2014; Kalchbrenner
et al. 2014). These recursive models do not require any hand-crafted features or
semantic knowledge, such as a list of negation words or sentiment lexicons. However, they are computationally intensive and need substantial additional annotations
(word and phrase-level sentiment labeling). Nonetheless, use of word-embeddings
in sentiment composition is still in its infancy, and we will likely see much more
work using these techniques in the future.

4.4.4 Sentiment in Figurative Expressions
Figurative expressions in text, by definition, are not compositional. That is, their
meaning cannot fully be derived from the meaning of their components in isolation.
There is growing interest in detecting figurative language, especially irony and
sarcasm (Carvalho et al. 2009; Reyes et al. 2013; Veale and Hao 2010; Filatova
2012; González-Ibánez et al. 2011). In 2015, a SemEval shared task was organized
on detecting sentiment in tweets rich in metaphor and irony (Task 11).6 Participants
were asked to determine the degree of sentiment for each tweet where the score is
a real number in the range from −5 (most negative) to +5 (most positive). One of
the characteristics of the data is that a large majority is negative, thereby suggesting
that ironic tweets are largely negative. The SemEval 2014 shared task Sentiment
Analysis in Twitter (Rosenthal et al. 2014a) had a separate test set involving
sarcastic tweets. Participants were asked not to train their system on sarcastic tweets,
but rather apply their regular sentiment system on this new test set; the goal was
to determine performance of regular sentiment systems on sarcastic tweets. It was
observed that the performances dropped by about 25% to 70%, thereby showing that
systems must be adjusted if they are to be applied to sarcastic tweets. We found little
to no work exploring automatic sentiment detection in hyperbole, understatement,
rhetorical questions, and other creative uses of language.

6 The proceedings will be released later in 2015.

4.5 Challenges in Annotating for Sentiment
Clear and simple instructions are crucial for obtaining high-quality annotations. This
is true even for seemingly simple annotation tasks, such as sentiment annotation,
where one is to label instances as positive, negative, or neutral. For word annotations,
researchers have often framed the task as ‘is this word positive, negative,
or neutral?’ (Hu and Liu 2004), ‘does this word have associations with positive,
negative, or neutral sentiment?’ (Mohammad and Turney 2013), or ‘which word
is more positive?’/‘which word has a greater association with positive sentiment?’
(Kiritchenko et al. 2016; Kiritchenko and Mohammad 2016c). Similar instructions
are also widely used for sentence-level sentiment annotations—‘is this sentence
positive, negative, or neutral?’ (Rosenthal et al. 2015, 2014b; Mohammad et al.
2016a, 2015b). We will refer to such annotation schemes as the simple sentiment
questionnaires. On the one hand, this characterization of the task is simple, terse,
and reliant on the intuitions of native speakers of a language (rather than biasing the
annotators by providing definitions of what it means to be positive, negative, and
neutral). On the other hand, the lack of specification leaves the annotator in doubt
over how to label certain kinds of instances—for example, sentences where one side
wins against another, sarcastic sentences, or retweets.
A different approach to sentiment annotation is to ask respondents to identify
the target of opinion, and the sentiment towards this target of opinion (Pontiki
et al. 2014b; Mohammad et al. 2015b; Deng and Wiebe 2014). We will refer to
such annotation schemes as the semantic-role based sentiment questionnaires. This
approach of sentiment annotation is more specific, and more involved, than the
simple sentiment questionnaire approach; however, it too is insufficient for handling
several scenarios. Most notably, the emotional state of the speaker is not under
the purview of this scheme. Many applications require that statements expressing
positive or negative emotional state of the speaker should be marked as ‘positive’
or ‘negative’, respectively. Similarly, many applications require statements that
describe positive or negative events or situations to be marked as ‘positive’ or
‘negative’, respectively. Instructions for annotating opinion towards targets do not
specify how such instances are to be annotated, and worse still, possibly imply that
such instances are to be labeled as neutral.
Some sentence types that are especially challenging for sentiment annotation
(using either the simple sentiment questionnaire or the semantic-role based sentiment questionnaire) are listed below:
• Speaker’s emotional state: The speaker’s emotional state may or may not
have the same polarity as the opinion expressed by the speaker. For example,
a politician’s tweet can imply both a negative opinion about a rival’s past
indiscretion, and a joyous mental state as the news will impact the rival adversely.
• Success or failure of one side w.r.t. another: Often sentences describe the
success or failure of one side w.r.t. another side—for example, ‘Yay! France
beat Germany 3–1’, ‘Supreme court judges in favor of gay marriage’, and ‘the
coalition captured the rebels’. If one supports France, gay marriage, and the
coalition, then these events are positive, but if one supports Germany, marriage
as a union only between man and woman, and the rebels, then these events can
be seen as negative.
Also note that the framing of an event as the success of one party (or as the failure
of another party) does not automatically imply that the speaker is expressing
positive (or negative) opinion towards the mentioned party. For example, when
Finland beat Russia in ice hockey in the 2014 Sochi Winter Olympics, the
event was tweeted around the world predominantly as “Russia lost to Finland”
as opposed to “Finland beat Russia”. This is not because the speakers were
expressing negative opinion towards the Russian team, but rather simply because
Russia, being the host nation, was the focus of attention and traditionally Russian
hockey teams have been strong.
• Neutral reporting of valenced information: If the speaker does not give any
indication of her own emotional state but describes valenced events or situations,
then it is unclear whether to consider these statements as neutral unemotional
reporting of developments or whether to assume that the speaker is in a negative
emotional state (sad, angry, etc.). Example:
The war has created millions of refugees.

• Sarcasm and ridicule: Sarcasm and ridicule are tricky from the perspective of
assigning a single label of sentiment because they can often indicate positive
emotional state of the speaker (pleasure from mocking someone or something)
even though they have a negative attitude towards someone or something.
• Different sentiment towards different targets of opinion: The speaker may express
opinion about multiple targets, and sentiment towards the different targets might
be different. The targets may be different people or objects (for example, an
iPhone vs. an Android phone), or they may be different aspects of the same entity
(for example, quality of service vs. quality of food at a restaurant).
• Precisely determining the target of opinion: Sometimes it is difficult to precisely
identify the target of opinion. For example, consider:
Glad to see Hillary’s lies being exposed.

It is unclear whether the target of opinion is ‘Hillary’, ‘Hillary’s lies’, or
‘Hillary’s lies being exposed’. One reasonable interpretation is that positive
sentiment is expressed about ‘Hillary’s lies being exposed’. However, one can
also infer that the speaker has a negative attitude towards ‘Hillary’s lies’ and
probably ‘Hillary’ in general. It is unclear whether annotators should be asked to
provide all three opinion–target pairs or only one (in which case, which one?).
• Supplications and requests: Many tweets convey positive supplications to God
or positive requests to people in the context of a (usually) negative situation.
Examples include:
May god help those displaced by war.
Let us all come together and say no to fear mongering and divisive politics.

• Rhetorical questions: Rhetorical questions can be treated simply as queries (and
thus neutral) or as utterances that give away the emotional state of the speaker.
For example, consider:
Why do we have to quibble every time?

On the one hand, this tweet can be treated as a neutral question, but on the
other hand, it can be seen as negative because the utterance betrays a sense of
frustration on the part of the speaker.
• Quoting somebody else or re-tweeting: Quotes and retweets are difficult to
annotate for sentiment because it is often not evident whether the one who
quotes (or retweets) holds the same opinions as those expressed by the quotee.
The challenges listed above can be addressed to varying degrees by providing
instructions to the annotators on how such instances are to be labeled. However,
detailed and complicated instructions can be counter-productive as the annotators
may not understand or may not have the inclination to understand the subtleties
involved. See Mohammad (2016a) for annotation schemes that address some of
these challenges.

4.6 Challenges in Multilingual Sentiment Analysis
Work on multilingual sentiment analysis has mainly addressed mapping sentiment
resources from English into morphologically complex languages. Mihalcea et al.
(2007) use English resources to automatically generate a Romanian subjectivity
lexicon using an English–Romanian dictionary. The generated lexicon is then
used to classify Romanian text. Wan (2008) translated Chinese customer reviews
to English using a machine translation system. The translated reviews are then
annotated using a rule-based system that uses English lexicons. A higher accuracy is
achieved when using ensemble methods and combining knowledge from Chinese
and English resources. Balahur and Turchi (2014) conducted a study to assess
the performance of statistical sentiment analysis techniques on machine-translated
texts. Opinion-bearing phrases from the New York Times Text (2002–2005) corpus
were automatically translated using publicly available machine-translation engines
(Google, Bing, and Moses). Then, the accuracy of a sentiment analysis system
trained on original English texts was compared to the accuracy of the system
trained on automatic translations to German, Spanish, and French. The authors
conclude that the quality of machine translation is acceptable for sentiment analysis
to be performed on automatically translated texts. Salameh et al. (2015) conducted
experiments to determine loss in sentiment predictability when they translate Arabic
social media posts into English, manually and automatically. As benchmarks,
they use manually and automatically determined sentiment labels of the Arabic
texts. They show that sentiment analysis of English translations of Arabic texts
produces competitive results, w.r.t. Arabic sentiment analysis. They also claim that
even though translation significantly reduces human ability to recover sentiment,
automatic sentiment systems are affected relatively less by this.
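
The dictionary-based mapping of resources from English into another language, mentioned at the start of this section, can be sketched as follows. This is a simplified illustration rather than the procedure of any cited study: the lexicon entries, their labels, the tiny English–Romanian dictionary, and the rule of discarding target words that inherit conflicting labels are all assumptions made for the example.

```python
def project_lexicon(source_lexicon, bilingual_dict):
    """Project labels from source-language words onto their dictionary translations."""
    candidates = {}
    for word, label in source_lexicon.items():
        for translation in bilingual_dict.get(word, []):
            candidates.setdefault(translation, set()).add(label)
    # Keep only target words whose source entries all agree (an assumption).
    return {w: next(iter(labels)) for w, labels in candidates.items() if len(labels) == 1}

# Toy English lexicon; the labels are illustrative only.
english_lexicon = {
    "good": "positive",
    "beautiful": "positive",
    "bad": "negative",
    "evil": "negative",
    "cheap": "negative",
    "inexpensive": "positive",
}

# Toy bilingual dictionary with a handful of entries.
en_ro = {
    "good": ["bun"],
    "beautiful": ["frumos"],
    "bad": ["rău"],
    "evil": ["rău"],
    "cheap": ["ieftin"],
    "inexpensive": ["ieftin"],
}

romanian_lexicon = project_lexicon(english_lexicon, en_ro)
print(romanian_lexicon)
```

The conflicting case shows one way in which sentiment can be lost in translation: two source words with different connotations may share a single target word, and a dictionary lookup alone cannot tell which sense applies.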
Some of the areas less explored in the realm of multilingual sentiment analysis
include: how to translate text so as to preserve the degree of sentiment in the
source text; how sentiment modifiers such as negators and modals differ in function
across languages; understanding how automatic translations differ from manual
translations in terms of sentiment; and how to translate figurative language without
losing its affectual gist.

4.7 Challenges in Applying Sentiment Analysis
Applications of sentiment analysis benefit from the fact that even though systems are
not extremely accurate at determining sentiment of individual sentences, they can
accurately capture significant changes in the proportion of instances that are positive
(or negative). It is also worth noting that such sentiment tracking systems are more
effective when incorporating carefully chosen baselines. For example, knowing the
percentage of tweets that are negative towards Russian President Vladimir Putin
is less useful than, for instance, knowing: the percentage of tweets that are negative
towards Putin before vs. after the invasion of Crimea; or, the percentage of tweets
that are negative towards Putin in Russia vs. the rest of the world; or, the percentage
of tweets negative towards Putin vs. Barack Obama (US president).
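
A baseline comparison of this kind amounts to contrasting proportions across two slices of the data. The sketch below uses made-up system labels for tweets about a target in two time periods; the label lists and the helper name `negative_share` are assumptions for illustration only.

```python
def negative_share(labels):
    """Fraction of instances labelled 'negative'."""
    return sum(1 for label in labels if label == "negative") / len(labels)

# Hypothetical system outputs for tweets about the same target in two periods.
before = ["negative"] * 3 + ["neutral"] * 4 + ["positive"] * 3
after = ["negative"] * 6 + ["neutral"] * 2 + ["positive"] * 2

share_before = negative_share(before)
share_after = negative_share(after)
change = share_after - share_before

print(f"negative before: {share_before:.0%}")
print(f"negative after:  {share_after:.0%}")
print(f"change:          {change:+.0%}")
```

The change between the two slices is the informative quantity here; either percentage on its own says little without the other as a reference point.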
Sentiment analysis is commonly applied in several areas including tracking
sentiment towards products, movies, politicians, and companies (O’Connor et al.
2010; Pang and Lee 2008), improving customer relation models (Bougie et al.
2003), detecting happiness and well-being (Schwartz et al. 2013), tracking the stock
market (Bollen et al. 2011), and improving automatic dialogue systems (Velásquez
1997; Ravaja et al. 2006). The sheer volume of work in this area precludes
detailed summarization here. Nonetheless, it should be noted that often the desired
application can help direct certain design choices in the sentiment analysis system.
For example, the threshold between neutral and positive sentiment and the threshold
between neutral and negative sentiment can be determined empirically by what
is most suitable for the target application. Similarly, as suggested earlier, some
applications may require only the identification of strongly positive and strongly
negative instances.
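
Choosing the two thresholds empirically can be as simple as a grid search on a small development set. The sketch below assumes a system that outputs a real-valued score per instance and picks the pair of thresholds that maximizes accuracy; the scores, labels, candidate values, and the choice of accuracy as the criterion are all illustrative assumptions, and an application may well prefer a different criterion.

```python
def label(score, neg_threshold, pos_threshold):
    """Map a real-valued sentiment score to one of three classes."""
    if score >= pos_threshold:
        return "positive"
    if score <= neg_threshold:
        return "negative"
    return "neutral"

def tune_thresholds(dev_scores, dev_labels, candidates):
    """Return (accuracy, neg_threshold, pos_threshold) of the best pair found."""
    best = None
    for neg_t in candidates:
        for pos_t in candidates:
            if neg_t >= pos_t:
                continue
            correct = sum(label(s, neg_t, pos_t) == y
                          for s, y in zip(dev_scores, dev_labels))
            accuracy = correct / len(dev_labels)
            if best is None or accuracy > best[0]:
                best = (accuracy, neg_t, pos_t)
    return best

# Hypothetical development data.
dev_scores = [0.9, 0.4, 0.1, -0.05, -0.3, -0.8]
dev_labels = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
candidates = [-0.5, -0.2, 0.0, 0.2, 0.5]

best = tune_thresholds(dev_scores, dev_labels, candidates)
print(best)
```

An application that needs only strongly positive and strongly negative instances could instead fix wide thresholds and accept that most instances fall in the neutral band.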
Abundant availability of product reviews and their ratings has powered a lot of
the initial research in sentiment analysis. However, as we look forward, one can
be optimistic that the future holds more diverse and more compelling applications
of sentiment analysis. Some recent examples include predicting heart attack rates
through sentiment word usage in tweets (Eichstaedt et al. 2015), corpus-based
poetry generation (Colton et al. 2012), generating music that captures the sentiment
in novels (Davis and Mohammad 2014), confirming theories in literary analysis
(Hassan et al. 2012), and automatically detecting cyber-bullying (Nahar et al. 2012).

References
Agarwal, A., B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. 2011. Sentiment analysis of twitter
data. In Proceedings of Language in Social Media, 30–38. Portland.
Aisopos, F., G. Papadakis, K. Tserpes, and T. Varvarigou. 2012. Textual and contextual patterns for
sentiment analysis over microblogs. In Proceedings of the 21st WWW Companion, New York,
453–454.
Aman, S., and S. Szpakowicz. 2007. Identifying expressions of emotion in text. In Text, Speech
and Dialogue, Lecture notes in computer science, vol. 4629, 196–205.
Anand, Pranav, et al. 2011. Cats rule and dogs drool!: Classifying stance in online debate. In
Proceedings of the ACL workshop on computational approaches to subjectivity and sentiment
analysis, Portland.
Baker, C.F., C.J. Fillmore, and J.B. Lowe. 1998. The Berkeley framenet project. In Proceedings of
ACL, Stroudsburg, 86–90.
Bakliwal, A., P. Arora, S. Madhappan, N. Kapre, M. Singh, and V. Varma. 2012. Mining sentiments
from tweets. In Proceedings of WASSA’12, 11–18, Jeju.
Balahur, A., and M. Turchi. 2014. Comparative experiments using supervised learning and machine
translation for multilingual sentiment analysis. Computer Speech & Language 28(1): 56–75.
Baroni, M., and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing
adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on
Empirical Methods in Natural Language Processing, 1183–1193.
Bellegarda, J. 2010. Emotion analysis using latent affective folding and embedding. In Proceedings
of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation
of Emotion in Text. Los Angeles.
Bollen, J., H. Mao, and X. Zeng. 2011. Twitter mood predicts the stock market. Journal of
Computational Science 2(1): 1–8.
Boucouvalas, A.C. 2002. Real time text-to-emotion engine for expressive internet communication.
Emerging Communication: Studies on New Technologies and Practices in Communication 5:
305–318.
Bougie, J.R.G., R. Pieters, and M. Zeelenberg. 2003. Angry customers don’t come back, they get
back: The experience and behavioral implications of anger and dissatisfaction in services. Open
access publications from Tilburg university, Tilburg University.
Brody, S., and N. Diakopoulos. 2011. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: using word
lengthening to detect sentiment in microblogs. In Proceedings of the Conference on Empirical
Methods in Natural Language Processing, EMNLP’11, 562–570.
Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for
sentiment analysis based on conceptual primitives. In COLING, 2666–2677.
Carvalho, P., L. Sarmento, M.J. Silva, and E. De Oliveira. 2009. Clues for detecting irony in
user-generated contents: oh...!! it’s so easy;-). In Proceedings of the 1st International CIKM
Workshop on Topic-Sentiment Analysis for Mass Opinion, 53–56. ACM.
Choi, Y., and C. Cardie. 2008. Learning with compositional semantics as structural inference for
subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing, EMNLP’08, Honolulu, 793–801.
Colton, S., J. Goodwin, and T. Veale. 2012. Full face poetry generation. In Proceedings of the Third
International Conference on Computational Creativity, 95–102.
Davis, H., and S. Mohammad. 2014. Generating music from literature. In Proceedings of the 3rd
Workshop on Computational Linguistics for Literature (CLFL), Gothenburg, 1–10.
Deng, L., and J. Wiebe. 2014. Sentiment propagation via implicature constraints. In EACL, 377–
385.
Ghazi, D., D. Inkpen, and S. Szpakowicz. 2015. Detecting emotion stimuli in emotion-bearing
sentences. In Proceedings of the 2015 Conference on Intelligent Text Processing and Computational Linguistics.
Dong, L., F. Wei, M. Zhou, and K. Xu. 2014. Adaptive multi-compositionality for recursive neural
models with applications to sentiment analysis. In Twenty-Eighth AAAI Conference on Artificial
Intelligence (AAAI).
Eichstaedt, J.C., H.A. Schwartz, M.L. Kern, G. Park, D.R. Labarthe, R.M. Merchant, S. Jha, M.
Agrawal, L.A. Dziurzynski, and M. Sap et al. 2015. Psychological language on twitter predicts
county-level heart disease mortality. Psychological Science 26(2): 159–169.
Esuli, A., and F. Sebastiani. 2006. SENTIWORDNET: A publicly available lexical resource
for opinion mining. In Proceedings of the 5th Conference on Language Resources and
Evaluation, LREC’06, 417–422.
Filatova, E. 2012. Irony and sarcasm: Corpus generation and analysis using crowdsourcing. In
LREC, 392–398.
Genereux, M., and R.P. Evans. 2006. Distinguishing affective states in weblogs. In AAAI-2006
Spring Symposium on Computational Approaches to Analysing Weblogs, Stanford, 27–29.
Gildea, D., and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics
28(3): 245–288.
González-Ibánez, R., S. Muresan, and N. Wacholder. 2011. Identifying sarcasm in twitter: A closer
look. In Proceedings of the ACL, 581–586.
Grefenstette, E., G. Dinu, Y.-Z. Zhang, M. Sadrzadeh, and M. Baroni. 2013. Multi-step regression
learning for compositional distributional semantics. arXiv preprint arXiv:1301.6939.
Grefenstette, E., and M. Sadrzadeh. 2011. Experimental support for a categorical compositional
distributional model of meaning. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing, 1394–1404.
Hasan, Kazi Saidul, and Vincent Ng. 2013. Stance classification of ideological debates: Data,
models, features, and constraints. In The 6th international joint conference on natural language
processing, Nagoya.
Hassan, A., A. Abu-Jbara, and D. Radev. 2012. Extracting signed social networks from text.
In Workshop Proceedings of TextGraphs-7 on Graph-Based Methods for Natural Language
Processing, 6–14.
Hatzivassiloglou, V., and K.R. McKeown. 1997. Predicting the semantic orientation of adjectives.
In Proceedings of the 8th Conference of European Chapter of the Association for Computational Linguistics, Madrid, 174–181.
Holzman, L.E., and W.M. Pottenger. 2003. Classification of emotions in internet chat: An
application of machine learning using speech phonemes. Technical report, Lehigh University.
Hu, M., and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the 10th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’04,
New York, 168–177.
John, D., A.C. Boucouvalas, and Z. Xu. 2006. Representing emotional momentum within expressive internet communication. In Proceedings of the 24th IASTED International Conference on
Internet and Multimedia Systems and Applications, 183–188. Anaheim: ACTA Press.
Jurgens, D., S.M. Mohammad, P. Turney, and K. Holyoak. 2012. Semeval-2012 task 2: Measuring
degrees of relational similarity. In Proceedings of the 6th International Workshop on Semantic
Evaluation, SemEval’12, Montréal, 356–364.
Kalchbrenner, N., E. Grefenstette, and P. Blunsom. 2014. A convolutional neural network for
modelling sentences. arXiv preprint arXiv:1404.2188.
Kennedy, A., and D. Inkpen. 2005. Sentiment classification of movie and product reviews using
contextual valence shifters. In Proceedings of the Workshop on the Analysis of Informal and
Formal Information Exchange During Negotiations, Ottawa.
Kessler, J.S., and N. Nicolov. 2009. Targeting sentiment expressions through supervised ranking of
linguistic configurations. In 3rd Int’l AAAI Conference on Weblogs and Social Media (ICWSM
2009).
Kiritchenko, S., and S.M. Mohammad. 2016a. Capturing reliable fine-grained sentiment associations by crowdsourcing and best–worst scaling. In Proceedings of the 15th Annual Conference
of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies (NAACL), San Diego.
Kiritchenko, S., and S.M. Mohammad. 2016b. The effect of negators, modals, and degree adverbs
on sentiment composition. In Proceedings of the Workshop on Computational Approaches to
Subjectivity, Sentiment and Social Media Analysis (WASSA).
Kiritchenko, S., and S.M. Mohammad. 2016c. Sentiment composition of words with opposing
polarities. In Proceedings of the 15th Annual Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies (NAACL), San
Diego.
Kiritchenko, S., S.M. Mohammad, and M. Salameh. 2016. Semeval-2016 task 7: Determining
sentiment intensity of English and Arabic phrases. In Proceedings of the International Workshop
on Semantic Evaluation, SemEval-2016, San Diego.
Kiritchenko, S., X. Zhu, C. Cherry, and S. Mohammad. 2014a. Nrc-canada-2014: Detecting aspects
and sentiment in customer reviews. In Proceedings of the 8th International Workshop on
Semantic Evaluation (SemEval 2014), Dublin, 437–442.
Kiritchenko, S., X. Zhu, and S.M. Mohammad. 2014b. Sentiment analysis of short informal texts.
Journal of Artificial Intelligence Research 50: 723–762.
Ku, L.-W., Y.-T. Liang, and H.-H. Chen. 2006. Opinion extraction, summarization and tracking in
news and blog corpora. In AAAI Spring Symposium: Computational Approaches to Analyzing
Weblogs, vol. 100107.
Li, W., and H. Xu. 2014. Text-based emotion classification using emotion cause extraction. Expert
Systems with Applications 41(4, Part 2): 1742–1749.
Liu, B., and L. Zhang. 2012. A survey of opinion mining and sentiment analysis. In Mining Text
Data, ed. C.C. Aggarwal and C. Zhai, 415–463. New York: Springer.
Liu, J., Y. Cao, C.-Y. Lin, Y. Huang, and M. Zhou. 2007. Low-quality product review detection in
opinion summarization. In EMNLP-CoNLL, 334–342.
Lloret, E., A. Balahur, M. Palomar, and A. Montoyo. 2009. Towards building a competitive
opinion summarization system: challenges and keys. In Proceedings of Human Language
Technologies: The 2009 Annual Conference of the North American Chapter of the Association
for Computational Linguistics, Companion Volume: S, 72–77.
Louviere, J.J. 1991. Best-worst scaling: A model for the largest difference judgments. Working
Paper.
Lu, B., and B.K. Tsou. 2010. Cityu-dac: Disambiguating sentiment-ambiguous adjectives within
context. In Proceedings of the 5th International Workshop on Semantic Evaluation, 292–295.
Ma, C., H. Prendinger, and M. Ishizuka. 2005. Emotion estimation and reasoning based on affective
textual interaction. In First International Conference on Affective Computing and Intelligent
Interaction (ACII-2005), ed. J. Tao, R.W. Picard, Beijing, 622–628.
Màrquez, L., X. Carreras, K.C. Litkowski, and S. Stevenson. 2008. Semantic role labeling: An
introduction to the special issue. Computational Linguistics 34(2): 145–159.


Mihalcea, R., C. Banea, and J. Wiebe. 2007. Learning multilingual subjective language via
cross-lingual projections. In Proceedings of the 45th Annual Meeting of the Association of
Computational Linguistics.
Mihalcea, R., and H. Liu. 2006. A corpus-based approach to finding happiness. In AAAI-2006
Spring Symposium on Computational Approaches to Analysing Weblogs, 139–144. AAAI
Press.
Mikolov, T., I. Sutskever, K. Chen, G.S. Corrado, and J. Dean. 2013. Distributed representations of
words and phrases and their compositionality. In Advances in Neural Information Processing
Systems, 3111–3119.
Mitchell, J., and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive
Science 34(8): 1388–1429.
Mohammad, S. 2012. Portable features for classifying emotional text. In Proceedings of the 2012
Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Montréal, 587–591.
Mohammad, S., S. Kiritchenko, and X. Zhu. 2013a. Nrc-canada: Building the state-of-the-art
in sentiment analysis of tweets. In Proceedings of the Seventh International Workshop on
Semantic Evaluation Exercises (SemEval-2013), Atlanta.
Mohammad, S., S. Kiritchenko, and X. Zhu. 2013b. NRC-Canada: Building the state-of-the-art
in sentiment analysis of tweets. In Proceedings of the International Workshop on Semantic
Evaluation, SemEval’13, Atlanta.
Mohammad, S., and T. Yang. 2011. Tracking sentiment in mail: How genders differ on emotional
axes. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and
Sentiment Analysis (WASSA 2011), Portland, 70–79.
Mohammad, S.M. 2012. #emotional tweets. In Proceedings of the First Joint Conference on
Lexical and Computational Semantics – Volume 1: Proceedings of the Main Conference and
the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic
Evaluation, SemEval’12, Stroudsburg, 246–255.
Mohammad, S.M. 2016a. A practical guide to sentiment annotation: Challenges and solutions.
In Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and
Social Media Analysis.
Mohammad, S.M. 2016b. Sentiment analysis: Detecting valence, emotions, and other affectual
states from text. In Emotion Measurement, ed. H. Meiselman. Amsterdam: Elsevier.
Mohammad, S.M., S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry. 2016a. Semeval-2016
task 6: Detecting stance in tweets. In Proceedings of the International Workshop on Semantic
Evaluation, SemEval’16, San Diego.
Mohammad, S.M., P. Sobhani, and S. Kiritchenko. 2016b, In Press. Stance and Sentiment in
Tweets. Special Section of the ACM Transactions on Internet Technology on Argumentation
in Social Media.
Mohammad, S.M., and P.D. Turney. 2010. Emotions evoked by common words and phrases:
Using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL-HLT 2010
Workshop on Computational Approaches to Analysis and Generation of Emotion in Text,
California.
Mohammad, S.M., and P.D. Turney. 2013. Crowdsourcing a word-emotion association lexicon.
Computational Intelligence 29(3): 436–465.
Mohammad, S.M., X. Zhu, S. Kiritchenko, and J. Martin. 2015a. Sentiment, emotion, purpose, and
style in electoral tweets. Information Processing & Management 51: 480–499.
Mohammad, S.M., X. Zhu, S. Kiritchenko, and J. Martin. 2015b. Sentiment, emotion, purpose, and
style in electoral tweets. Information Processing and Management 51(4): 480–499.
Murakami, Akiko, and Rudy Raymond. 2010. Support or oppose?: Classifying positions in online
debates from reply activities and opinion expressions. In Proceedings of the 23rd international
conference on computational linguistics, Beijing.
Nahar, V., S. Unankard, X. Li, and C. Pang. 2012. Sentiment analysis for effective detection of
cyber bullying. In Web Technologies and Applications, 767–774. Berlin/Heidelberg: Springer.


Nalisnick, E.T., and H.S. Baird. 2013a. Character-to-character sentiment analysis in Shakespeare’s
plays. In Proceedings of the 51st annual meeting of the association for computational
linguistics (ACL), Short Paper, Sofia, 479–483, Aug 2013.
Nalisnick, E.T., and H.S. Baird. 2013b. Extracting sentiment networks from Shakespeare’s plays.
In 2013 12th International Conference on Document Analysis and Recognition (ICDAR), 758–
762. IEEE.
Neviarouskaya, A., H. Prendinger, and M. Ishizuka. 2009. Compositionality principle in recognition of fine-grained emotions from text. In Proceedings of the Third
International Conference on Weblogs and Social Media (ICWSM-09), 278–281, San Jose.
O’Connor, B., R. Balasubramanyan, B.R. Routledge, and N.A. Smith. 2010. From tweets to polls:
Linking text sentiment to public opinion time series. In Proceedings of the International AAAI
Conference on Weblogs and Social Media.
Orme, B. 2009. Maxdiff analysis: Simple counting, individual-level logit, and HB. Orem: Sawtooth
Software, Inc.
Pak, A., and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining.
In Proceedings of the 7th Conference on International Language Resources and Evaluation,
LREC’10, Valletta.
Palmer, M., D. Gildea, and N. Xue. 2010. Semantic role labeling. Synthesis Lectures on Human
Language Technologies 3(1): 1–103.
Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2(1–2): 1–135.
Pontiki, M., D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar.
2014a. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of the International Workshop on Semantic Evaluation, SemEval’14, Dublin.
Pontiki, M., H. Papageorgiou, D. Galanis, I. Androutsopoulos, J. Pavlopoulos, and S. Manandhar.
2014b. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th
International Workshop on Semantic Evaluation, SemEval’14, Dublin.
Popescu, A.-M., and O. Etzioni. 2005. Extracting product features and opinions from reviews. In
Proceedings of the Conference on Human Language Technology and Empirical Methods in
Natural Language Processing, HLT’05, Stroudsburg, 339–346.
Qadir, A. 2009. Detecting opinion sentences specific to product features in customer reviews using
typed dependency relations. In Proceedings of the Workshop on Events in Emerging Text Types,
eETTs’09, Stroudsburg, 38–43.
Ravaja, N., T. Saari, M. Turpeinen, J. Laarni, M. Salminen, and M. Kivikangas. 2006. Spatial
presence and emotions during video game playing: Does it matter with whom you play?
Presence: Teleoperators and Virtual Environments 15(4): 381–392.
Reyes, A., P. Rosso, and T. Veale. 2013. A multidimensional approach for detecting irony in twitter.
Language Resources and Evaluation 47(1): 239–268.
Rosenthal, S., P. Nakov, S. Kiritchenko, S. Mohammad, A. Ritter, and V. Stoyanov. 2015. SemEval-2015 task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop
on Semantic Evaluation, SemEval’15, Denver, 450–462.
Rosenthal, S., P. Nakov, A. Ritter, and V. Stoyanov. 2014a. SemEval-2014 Task 9: Sentiment
analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation,
ed. P. Nakov, and T. Zesch, SemEval-2014, Dublin.
Rosenthal, S., A. Ritter, P. Nakov, and V. Stoyanov. 2014b. SemEval-2014 Task 9: Sentiment
analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation
(SemEval 2014), Dublin, 73–80.
Rudolph, S., and E. Giesbrecht. 2010. Compositional matrix-space models of language. In
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics,
907–916.
Russell, J.A. 1980. A circumplex model of affect. Journal of Personality and Social Psychology
39(6): 1161.


Salameh, M., S.M. Mohammad, and S. Kiritchenko. 2015. Sentiment after translation: A case-study on Arabic social media posts. In Proceedings of the North American Chapter of
Association of Computational Linguistics, Denver.
Schwartz, H., J. Eichstaedt, M. Kern, L. Dziurzynski, R. Lucas, M. Agrawal, G. Park, et al.
2013. Characterizing geographic variation in well-being using tweets. In Proceedings of the
International AAAI Conference on Weblogs and Social Media.
Severyn, A., and A. Moschitti. 2015. Unitn: Training deep convolutional neural network for
twitter sentiment classification. In Proceedings of the 9th International Workshop on Semantic
Evaluation (SemEval 2015), 464–469. Denver: Association for Computational Linguistics.
Socher, R., B. Huval, C.D. Manning, and A.Y. Ng. 2012. Semantic compositionality through
recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing, EMNLP’12, Jeju.
Socher, R., A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of
the Conference on Empirical Methods in Natural Language Processing, EMNLP’13, Seattle.
Somasundaran, Swapna, and Janyce Wiebe. 2009. Recognizing stances in online debates. In
Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th
international joint conference on natural language processing of the AFNLP, Singapore.
Somprasertsri, G., and P. Lalitrojwong. 2010. Mining feature-opinion in online customer reviews
for opinion summarization. Journal of Universal Computer Science 16(6): 938–955.
Sridhar, Dhanya, Lise Getoor, and Marilyn Walker. 2014. Collective stance classification of posts
in online debate forums. In Proceedings of the 52nd annual meeting of the association for
computational linguistics, Baltimore.
Stone, P., D.C. Dunphy, M.S. Smith, D.M. Ogilvie, and associates. 1966. The General Inquirer: A
Computer Approach to Content Analysis. Cambridge, MA: The MIT Press.
Stoyanov, V., and C. Cardie. 2006. Toward opinion summarization: Linking the sources. In
Proceedings of the Workshop on Sentiment and Subjectivity in Text, 9–14.
Strapparava, C., and R. Mihalcea. 2007. Semeval-2007 Task 14: Affective text. In Proceedings of
SemEval-2007, Prague, 70–74.
Su, Q., K. Xiang, H. Wang, B. Sun, and S. Yu. 2006. Using pointwise mutual information
to identify implicit features in customer reviews. In Proceedings of the 21st International
Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research
Challenges Ahead, ICCPOL’06, 22–30. Berlin/Heidelberg: Springer.
Taboada, M., J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for
sentiment analysis. Computational Linguistics 37(2): 267–307.
Taboada, M., K. Voll, and J. Brooke. 2008. Extracting sentiment as a function of discourse structure
and topicality. Simon Fraser University School of Computing Science Technical Report.
Thelwall, M., K. Buckley, and G. Paltoglou. 2011. Sentiment in Twitter events. Journal of the
American Society for Information Science and Technology 62(2): 406–418.
Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition
from congressional floor-debate transcripts. In Proceedings of the 2006 conference on empirical
methods in natural language processing. Sydney: Association for Computational Linguistics.
Tokuhisa, R., K. Inui, and Y. Matsumoto. 2008. Emotion classification using massive examples
extracted from the web. In Proceedings of the 22nd International Conference on Computational
Linguistics – Volume 1, COLING’08, 881–888.
Turney, P., and M.L. Littman. 2003. Measuring praise and criticism: Inference of semantic
orientation from association. ACM Transactions on Information Systems 21(4): 315–346.
Turney, P.D. 2014. Semantic composition and decomposition: From recognition to generation.
arXiv preprint arXiv:1405.7908.
Veale, T., and Y. Hao. 2010. Detecting ironic intent in creative comparisons. ECAI 215: 765–770.
Velásquez, J.D. 1997. Modeling emotions and other motivations in synthetic agents. In Proceedings
of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on
Innovative Applications of Artificial Intelligence, AAAI’97/IAAI’97, 10–15. AAAI Press.


Walker, Marilyn A., et al. 2012. A corpus for research on deliberation and debate. In proceedings
of the eighth international conference on language resources and evaluation (LREC), Istanbul.
Wan, X. 2008. Using bilingual knowledge and ensemble techniques for unsupervised Chinese
sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural
Language Processing, EMNLP’08, 553–561.
Wang, C., and F. Wang. 2012. A bootstrapping method for extracting sentiment words using degree
adverb patterns. In 2012 International Conference on Computer Science and Service System
(CSSS), 2173–2176. IEEE.
Wilson, T., Z. Kozareva, P. Nakov, S. Rosenthal, V. Stoyanov, and A. Ritter. 2013. SemEval-2013 Task 2: Sentiment analysis in Twitter. In Proceedings of the International Workshop on
Semantic Evaluation, SemEval’13, Atlanta.
Wilson, T., J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level
sentiment analysis. In Proceedings of the Conference on Human Language Technology and
Empirical Methods in Natural Language Processing, HLT’05, Stroudsburg, 347–354.
Xu, G., C.-R. Huang, and H. Wang. 2013. Extracting Chinese product features: Representing a
sequence by a set of skip-bigrams. In Proceedings of the 13th Chinese Conference on Chinese
Lexical Semantics, CLSW’12, 72–83. Berlin/Heidelberg: Springer.
Xu, R., K.-F. Wong, Q. Lu, Y. Xia, and W. Li. 2008. Learning knowledge from relevant webpage
for opinion analysis. In IEEE/WIC/ACM International Conference on Web Intelligence and
Intelligent Agent Technology, 2008. WI-IAT’08., vol. 1, 307–313. IEEE.
Yessenalina, A., and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 172–
182.
Zarrella, G., and A. Marsh. 2016. MITRE at SemEval-2016 Task 6: Transfer learning for Stance
detection. In Proceedings of the International Workshop on Semantic Evaluation, SemEval’16,
San Diego.
Zhang, C., D. Zeng, Q. Xu, X. Xin, W. Mao, and F.-Y. Wang. 2008. Polarity classification
of public health opinions in Chinese. In Intelligence and Security Informatics, 449–454.
Berlin/Heidelberg: Springer.
Zhang, L., and B. Liu. 2011. Identifying noun product features that imply opinions. In Proceedings
of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies: Short Papers – Volume 2, HLT’11, 575–580.
Zhang, L., B. Liu, S.H. Lim, and E. O’Brien-Strain. 2010. Extracting and ranking product features
in opinion documents. In Proceedings of the 23rd International Conference on Computational
Linguistics: Posters, COLING’10, Stroudsburg, 1462–1470.
Zhe, X., and A. Boucouvalas. 2002. Text-to-Emotion Engine for Real Time Internet Communication, 164–168.
Zhu, X., H. Guo, S. Mohammad, and S. Kiritchenko. 2014. An empirical study on the effect of
negation words on sentiment. In Proceedings of the 52nd Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), Baltimore, 304–313.

Chapter 5

Sentiment Resources: Lexicons and Datasets
Aditya Joshi, Pushpak Bhattacharyya, and Sagar Ahire

Abstract Sentiment lexicons and datasets represent the knowledge base that lies
at the foundation of a SA system. In its simplest form, a sentiment lexicon is
a repository of words/phrases labelled with sentiment. Similarly, a sentiment-annotated dataset consists of documents (tweets, sentences or longer documents)
labelled with one or more sentiment labels. This chapter explores the philosophy,
execution and utility of popular sentiment lexicons and datasets. We describe
different labelling schemes that may be used. We then provide a detailed description
of existing sentiment and emotion lexicons, and the trends underlying research in
lexicon generation. This is followed by a survey of sentiment-annotated datasets
and the nuances of labelling involved. We then show how lexicons and datasets
created for one language can be transferred to a new language. Finally, we place
these sentiment resources in the perspective of their classic applications to sentiment
analysis.
Keywords Sentiment lexicons • Sentiment datasets • Evaluation • Transfer
learning

The previous chapter shows that sentiment analysis (SA) is indeed more challenging
than it seems. The next question that arises is, where does the program ‘learn’
the sentiment from? In other words, where does the knowledge required for a SA
system come from? This chapter discusses sentiment resources as means to this
requirement of knowledge. We refer to words/phrases and documents as ‘textual
units’. In sentiment resources, it is these textual units that are annotated with
sentiment information.

A. Joshi ()
IITB-Monash Research Academy, Mumbai, India
e-mail: adityaj@cse.iitb.ac.in
P. Bhattacharyya • S. Ahire
IIT Bombay, Mumbai, India
e-mail: pb@cse.iitb.ac.in
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_5


5.1 Introduction
Sentiment resources, i.e., lexicons and datasets, represent the knowledge base of
an SA system. Thus, creation of a sentiment lexicon or a dataset is the fundamental requirement of an SA system. A lexicon consists of
simpler units such as words and phrases, whereas a dataset consists of
comparatively longer text. There exists a wide spectrum of such resources that
can be used for sentiment/emotion analysis. Before we proceed, we reiterate the
definition of sentiment and emotion analysis. We refer to sentiment analysis as a
positive/negative/neutral classification task, whereas emotion analysis deals with
a wider spectrum of emotions such as angry, excited, etc. A discussion on both
sentiment and emotion lexicons is imperative to show how different the philosophy
behind construction of the two is.
A sentiment resource is a repository of textual units marked with one or
more labels representing a sentiment state. This means that there are two driving
components of a sentiment resource: (a) the textual unit, and (b) the labels. We
discuss the second component, labels, in detail in Sect. 5.2.
In case of a sentiment lexicon, the lexical unit may be a word, a phrase or a
concept from a general purpose lexicon like WordNet. What constitutes the labels
is also important. The set of labels may be purely functional: task-based. For a
simple positive-negative classification, it is often sufficient to have a set of positive
and negative words. If the goal is a system that gives ‘magnitude’ (‘The movie was
horrible’ is more strongly negative than ‘The movie was bad’), then the lexicon
needs to capture that information in terms of a magnitude in addition to positive and
negative words.
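
The difference between a polarity-only lexicon and a lexicon with magnitudes can be sketched as follows. This is a minimal illustration, not a description of any published lexicon: the word lists, the scores and the function names are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicons; the entries and scores are made up for illustration.
POLARITY_WORDS = {"good": +1, "bad": -1, "horrible": -1}
MAGNITUDE_WORDS = {"good": 2.0, "bad": -2.0, "horrible": -4.0}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def polarity(text):
    """Classify text as positive/negative/neutral by counting signed lexicon hits."""
    total = sum(POLARITY_WORDS.get(tok, 0) for tok in tokenize(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def magnitude(text):
    """Sum the signed strengths of the lexicon words found in the text."""
    return sum(MAGNITUDE_WORDS.get(tok, 0.0) for tok in tokenize(text))
```

With the polarity-only lexicon, ‘The movie was horrible’ and ‘The movie was bad’ receive the same label; only the magnitude lexicon can rank the first as more strongly negative than the second.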
An annotated dataset consists of documents labelled with one or more output
labels. As in the case of sentiment lexicons, the two driving components of a
sentiment-annotated dataset are: (a) the textual unit, and (b) the labels. For example,
a dataset may consist of a set of movie reviews (the textual units) annotated
by human annotators as positive or negative (the labels). Datasets often contain
additional annotation in order to enrich the quality of annotation. For example,
a dataset of restaurant reviews annotated with sentiment may contain additional
annotation in the form of restaurant location. Such annotation may facilitate insights
such as: which restaurant is the most popular, what are the issues with respect to this
outlet of a restaurant that people complain the most about, etc.

5.2 Labels
A set of labels is the pre-determined set of attributes that each textual unit in a
sentiment resource will be annotated with. The process of assigning a label to a
textual unit is called annotation, and in case the label pertains to sentiment, the
process is called sentiment annotation. The goal of sentiment annotation is to assign
labels under one of three schemes: absolute, overlapping and fuzzy. The first two
are shown in Liu (2010).
Absolute labelling is when a textual unit is marked as exactly one out of multiple
labels. An example of absolute labelling may be positive versus negative – where
each document is annotated as either positive or negative. An additional label
‘neutral’ may be added. A fallback label such as ‘ambiguous’/‘unknown’/‘unsure’
may be introduced. Numeric schemes that allow labels to range between, say, +5 and
−5 also fall under this method of labelling.
Labels can be overlapping as well. A typical example of this is emotion labels.
Emotions are more complex than sentiment, because there can be more than one
emotion at a time. For example, the sentence, “Was happy to bump into my friend
at the airport this afternoon.” would be labelled as positive as a sentiment-annotated
sentence. On the other hand, an emotion annotation would require two labels to
be assigned to this text: happiness and surprise. Emotions can, in fact, be thought
of as arising from a combination of emotions and their magnitudes. This means that
while positive-negative are mutually exclusive, emotions need not be. In such cases,
each one of them must be viewed as a Boolean attribute. This means that the word
‘amazed’ will be marked as ‘happy: yes, surprised: yes’ for an emotion lexicon,
whereas the same ‘amazed’ will be marked as ‘positive’ for a sentiment lexicon. By
definition, a positive word implies that it is not negative.
Finally, the third scheme of labelling is fuzzy: where a distribution over different
labels is assigned to a textual unit. Consider the case where we assign a distribution
over ‘positive/negative’ as a label. Such a distribution implies likelihood of the
textual unit to belong to the given label. For example, a word with ‘positive:0.8,
negative:0.2’ means that the word tends to occur more frequently in a positive
sense – however, it is not completely positive and it may still be used in the negative
sense to an extent.
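
The three labelling schemes can be contrasted as data structures. The sketch below is only illustrative: the label inventories and the helper names `absolute`, `overlapping` and `fuzzy` are assumptions, not part of any existing resource.

```python
import math

# Illustrative label inventories (assumptions for this sketch).
SENTIMENT_LABELS = {"positive", "negative", "neutral"}
EMOTION_LABELS = {"happy", "surprised", "sad", "angry"}


def absolute(label):
    """Absolute scheme: exactly one label out of a fixed set."""
    if label not in SENTIMENT_LABELS:
        raise ValueError(f"unknown label: {label}")
    return label


def overlapping(*active):
    """Overlapping scheme: every label is an independent Boolean attribute."""
    unknown = set(active) - EMOTION_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {emotion: emotion in active for emotion in sorted(EMOTION_LABELS)}


def fuzzy(**weights):
    """Fuzzy scheme: a distribution over labels that must sum to 1."""
    values = list(weights.values())
    if any(w < 0 for w in values) or not math.isclose(sum(values), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return dict(weights)


# The word 'amazed' under the first two schemes, as in the running example.
amazed_sentiment = absolute("positive")
amazed_emotion = overlapping("happy", "surprised")
# A word that is mostly, but not entirely, used in a positive sense.
mostly_positive = fuzzy(positive=0.8, negative=0.2)
```

Here `amazed_emotion` marks both ‘happy’ and ‘surprised’ as true, whereas `amazed_sentiment` holds a single value; `mostly_positive` keeps the full distribution rather than committing to one label.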
Several linguistic studies have explored what constitutes basic labels for a
sentiment resource. In the next subsections, we look at three strategies.

5.2.1 Stand-Alone Labels
A sentiment resource may use two labels: positive or negative. The granularity
can be increased to strongly positive, moderately positive and so on. A positive
unit represents a desirable state, whereas a negative unit represents an undesirable
state (Liu 2010). Emotion labels are more nuanced. Basic emotions are a list of
emotions that are fundamental to human experience. Whether or not there are any
basic emotions at all, and whether it is worthwhile to discover these basic emotions
has been a matter of disagreement. Ortony and Turner (1990) state that the basic
emotion approach (i.e., stating that there are basic emotions and other emotions
evolve from them) is flawed, while Ekman (1992) supports the basic emotion theory.
Several sets of basic emotions have been suggested. Ekman suggests six basic emotions:
anger, disgust, fear, sadness, happiness and surprise. Plutchik has listed eight basic
emotions: six from Ekman’s list in addition to anticipation and trust (Plutchik 1980).

5.2.2 Dimensions
Sentiment has been defined by Liu (2010) as a 5-tuple: . This means
that sentiment in a textual unit can be captured accurately only if information
along the five dimensions is obtained. Similarly, emotions can also be looked at in
the form of two dimensions: valence and arousal (Mehrabian and Russell 1974).
Valence indicates whether an emotion is pleasant or unpleasant. Arousal indicates
the magnitude of an emotion. Happy and excited are two forms of a pleasant
emotion, but they differ along the arousal axis. Excitement indicates a state where a
person is happy, but aroused to a great degree. On the other hand, calm and content,
while still being pleasant emotions, represent a deactivated state. Corresponding
emotions in the left quadrant (that indicates unpleasant emotions) are sad, stressed,
bored and fatigued. In such a case, overlapping labelling must be used. A resource
annotated using dimensional structure will assign a value per dimension for each
textual unit.
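
A dimensional annotation can be sketched as one numeric value per dimension for each unit. In the illustration below, the coordinate values are invented and the assumed range of each axis is [-1, 1]; only the relative placement of the emotions follows the description above.

```python
# Hypothetical (valence, arousal) coordinates in [-1, 1]; the numbers are
# illustrative assumptions, not values taken from a published resource.
EMOTION_COORDS = {
    "excited": (0.7, 0.8),
    "happy": (0.8, 0.4),
    "calm": (0.6, -0.6),
    "content": (0.7, -0.4),
    "stressed": (-0.6, 0.7),
    "sad": (-0.7, -0.3),
    "bored": (-0.5, -0.6),
    "fatigued": (-0.4, -0.8),
}


def quadrant(valence, arousal):
    """Describe the quadrant of the valence-arousal plane a point falls in."""
    pleasantness = "pleasant" if valence >= 0 else "unpleasant"
    activation = "activated" if arousal >= 0 else "deactivated"
    return f"{pleasantness}, {activation}"


def describe(emotion):
    """Look up an emotion and return its quadrant description."""
    return quadrant(*EMOTION_COORDS[emotion])
```

Under these assumed coordinates, ‘happy’ and ‘excited’ share a quadrant but differ in arousal, while ‘calm’ and ‘content’ remain pleasant but fall in the deactivated half.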

5.2.3 Structures
Plutchik’s wheel of emotions (Plutchik 1982) is a popular structure that represents
basic emotions, and emotions that arise as a combination of these emotions. It
combines the notion of basic emotions with arousal, as seen in the case of
emotion dimensions. The basic emotions according to Plutchik’s wheel are joy,
trust, fear, surprise, sadness, disgust, anger and anticipation. The basic
emotions are arranged in a circular manner to indicate antonymy. The opposite of
‘joy’ is placed diametrically opposite to it: ‘sadness’. Similarly, ‘anticipation’ lies
diametrically opposite to ‘surprise’. Each ‘petal’ of the wheel indicates the arousal
of the emotion. The emotion ‘joy’ has ‘serenity’ above it and ‘ecstasy’ below it.
These emotions indicate a deactivated and activated state of arousal respectively.
Similarly, an aroused state of ‘anger’ becomes ‘rage’. Thus, the eight emotions in the
central circle are the aroused forms of the basic emotions. These are: rage, loathing,
grief, amazement, terror, admiration, ecstasy and vigilance. The wheel also allows
combination of emotions to create more nuanced emotions. A resource annotated
using a structure such as the Plutchik wheel of emotions will place every textual
unit in the space represented by the structure.
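
The circular arrangement and the intensity levels of the wheel can be sketched as a small data structure. The ordering and the mild and intense forms below follow the commonly reproduced version of Plutchik’s wheel; the function names are assumptions made for this example.

```python
# Basic emotions in wheel order; opposites sit four positions apart.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]

# (deactivated form, basic emotion, activated form) for each petal.
INTENSITY = {
    "joy": ("serenity", "joy", "ecstasy"),
    "trust": ("acceptance", "trust", "admiration"),
    "fear": ("apprehension", "fear", "terror"),
    "surprise": ("distraction", "surprise", "amazement"),
    "sadness": ("pensiveness", "sadness", "grief"),
    "disgust": ("boredom", "disgust", "loathing"),
    "anger": ("annoyance", "anger", "rage"),
    "anticipation": ("interest", "anticipation", "vigilance"),
}


def opposite(emotion):
    """Return the emotion placed diametrically opposite on the wheel."""
    index = WHEEL.index(emotion)
    return WHEEL[(index + len(WHEEL) // 2) % len(WHEEL)]


def aroused_form(emotion):
    """Return the most activated form of a basic emotion."""
    return INTENSITY[emotion][2]
```

For instance, `opposite("joy")` gives ‘sadness’ and `aroused_form("anger")` gives ‘rage’, matching the examples in the text.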


5.3 Lexicons
We now discuss sentiment lexicons: we describe them individually first, and then
show trends in lexicon generation. Words/phrases have two kinds of sentiment,
as given in Liu (2010): absolute and relative. Absolute sentiment means that
the sentiment remains the same, given the right word/phrase and meaning. For
example, the word ‘beautiful’ is a positive word. Relative sentiment means that the
sentiment changes depending on the context. For example, the words ‘increased’ and
‘fuelled’ have a positive or negative sentiment depending on what the object of the word is.
There exists a third category of sentiment: implicit sentiment. Implicit sentiment
is different from absolute sentiment. Implicit sentiment is the sentiment that is
commonly invoked in the mind of a reader when he/she reads that word/phrase.
Consider the example ‘amusement parks’. A reader typically experiences positive
sentiment on reading this word. Similarly, the phrase ‘waking up in the middle of
the night’ does involve an implicit negative sentiment.
Currently, most sentiment lexicons limit themselves to absolute sentiment words.
Extraction of implicit sentiment in phrases forms a different branch of work.
However, there exist word association lexicons that capture implied sentiment in
words (Mohammad and Turney 2010). We stick to this definition as well, and
discuss sentiment and emotion lexicons that capture absolute sentiment.

5.3.1 Sentiment Lexicons
Early development of sentiment lexicons focused on creation of sentiment dictionaries. Stone et al. (1966) present a lexicon called ‘General Inquirer’ that has been
widely used for sentiment analysis. Finn (2011) presents a lexicon called AFINN.
Like General Inquirer, it is also a manually generated lexicon. To show the general
methodology underlying sentiment lexicons, we describe some popular sentiment
lexicons in the forthcoming subsections.

5.3.1.1

SentiWordNet

SentiWordNet, described first by Esuli and Sebastiani (2006), is a sentiment lexicon
which augments WordNet (Miller 1995) with sentiment information. The labelling
is fuzzy, and is done by adding three sentiment scores to each synset in WordNet
as follows. Every synset s has three scores:
1. Pos(s): The positive score of synset s
2. Neg(s): The negative score of synset s
3. Obj(s): The objective score of synset s

Thus, in SentiWordNet, sentiment is associated with the meaning of a word rather
than the word itself. This representation allows a word to have multiple sentiments
corresponding to each meaning. Because there are three scores, each meaning in
itself can be both positive and negative, or neither positive nor negative.
The process of SentiWordNet creation is an expansion of the approach used
for the three-class sentiment classification to handle graded sentiment values. The
algorithm to create SentiWordNet can be summarized as:
1. Selection of Seed Set: Seed sets L_p and L_n, consisting of ‘paradigmatic’ positive and negative synsets respectively, were created. Each synset was represented
using the TDS. This representation converted the words in the synset, its WordNet
definition and the sample phrases, together with explicit labels for negation, into
vectors.
2. Creation of Training Set: The seed sets were expanded for k iterations using
the following relations of WordNet: Direct antonymy, Similarity, Derived from,
Pertains to, Attribute and Also see. These were the relations hypothesized to
preserve or invert the associated sentiment. After k iterations of expansion, this
gave rise to the sets Tr_p^k and Tr_n^k. The objective set L_o = Tr_o^k was
assumed to consist of all the synsets that did not belong to Tr_p^k or Tr_n^k.
3. Creation of Classifiers: A classifier can be defined as a combination of a learning
algorithm and a training set. In addition to the two choices of learning algorithms
(SVM and Rocchio), four different training sets were constructed with the
number of iterations of expansion k = 0, 2, 4, 6. The size of the training set
increased substantially with an increase in k. As a result, low values of k yielded
classifiers with low recall but high precision, while higher k led to high recall
but low precision. In total there were 8 ternary classifiers, from all
combinations of the 2 learners and 4 training sets. Each ternary classifier was
made up of two binary classifiers, positive vs. not positive and negative vs. not
negative.
4. Synset Scoring: Each synset from the WordNet was vectorized and given to the
committee of ternary classifiers as test input. Depending upon the output of the
classifiers, each synset was assigned sentiment scores by dividing the count of
classifiers that give a label by the total number of classifiers (8).
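
The scoring step can be illustrated with a small sketch. It assumes that each of the eight ternary classifiers has already output one of three labels for a synset; the function name committee_scores and the example votes are hypothetical and are not taken from Esuli and Sebastiani (2006).

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")

def committee_scores(votes):
    """Turn the labels output by a committee of ternary classifiers into
    Pos/Neg/Obj scores: the share of classifiers that chose each label."""
    counts = Counter(votes)
    total = len(votes)
    return {label: counts[label] / total for label in LABELS}

# Hypothetical votes from 8 classifiers (2 learners x 4 training sets).
votes = ["positive"] * 5 + ["objective"] * 2 + ["negative"]
scores = committee_scores(votes)
print(scores)
```

Because every classifier casts exactly one vote, the three scores of a synset always sum to 1 under this scheme.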

5.3.1.2 SO-CAL

The Semantic Orientation CALculator (SO-CAL) system (Brooke et al. 2009) is based
on a manually constructed, low-coverage resource made up of raw words. Unlike
SentiWordNet, it associates no sense information with a word. SO-CAL
uses as its basis a lexical sentiment resource consisting of about 5000 words. (In
comparison, SentiWordNet has over 38,000 polar words and several other strictly
objective words.) Each word in SO-CAL has a sentiment label which is an integer
in [−5, +5], excluding 0, as objective words are simply left out. The strengths
of SO-CAL lie in its accuracy, as it is manually annotated, and the use of detailed
features that handle sentiment in various cases in ways conforming to linguistic
phenomena.
SO-CAL uses several ‘features’ to model different word categories and the
effects they have on sentiment. In addition, a few special features operate outside the
scope of the lexicon in order to affect the sentiment on the document level. These
are some of the features of SO-CAL:
1. Adjectives: A dictionary of adjectives was created by manually tagging
all adjectives in a 500-document multidomain review corpus. The terms from
the General Inquirer dictionary were then annotated and added to the list thus obtained.
2. Nouns, Verbs and Adverbs: SO-CAL also extended the approach used for
adjectives to nouns and verbs. As a result, 1142 nouns and 903 verbs were added
to the sentiment lexicon. Adverbs were added by simply adding the -ly suffix to
adjectives and then manually altering words whose sentiment was not preserved,
such as ‘essentially’. In addition, multi-word expressions were also added, contributing
152 multiwords to the lexicon. Thus, while the adjective ‘funny’
has a sentiment of +2, the multiword ‘act funny’ has a sentiment of −1.
3. Intensifiers and Downtoners: An intensifier is a word which increases the
intensity of the phrase to which it is applied, while a downtoner is a word which
decreases the intensity of the phrase to which it is applied. For instance, the word
‘extraordinarily’ in the phrase ‘extraordinarily good’ is an intensifier, while the
word ‘somewhat’ in the phrase ‘somewhat nice’ is a downtoner.
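
A minimal sketch of how a word-level lexicon and modifiers might interact is given below. The word scores other than ‘funny’ and ‘act funny’, and the idea of treating each modifier as a percentage change, are illustrative assumptions; the names LEXICON, MODIFIERS and phrase_score are hypothetical.

```python
# Toy lexicon: integer scores in [-5, +5], with 0 excluded.
# 'funny' and 'act funny' follow the text; the other scores are assumed.
LEXICON = {"good": 3, "nice": 2, "funny": 2, "act funny": -1}

# Assumed modifier weights: positive values intensify, negative values tone down.
MODIFIERS = {"extraordinarily": 0.5, "somewhat": -0.3}

def phrase_score(phrase):
    """Score a short phrase: leading modifiers scale the score of the
    remaining word or multi-word expression found in the lexicon."""
    tokens = phrase.lower().split()
    scale = 1.0
    while tokens and tokens[0] in MODIFIERS:
        scale *= 1.0 + MODIFIERS[tokens.pop(0)]
    return scale * LEXICON.get(" ".join(tokens), 0)

for phrase in ("extraordinarily good", "somewhat nice", "act funny"):
    print(phrase, phrase_score(phrase))
```

Looking up the whole remaining phrase first is what lets a multiword such as ‘act funny’ override the score of the single word ‘funny’.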

5.3.1.3 Sentiment Treebank & Associated Lexicon

This Treebank was introduced in Socher et al. (2013). In order to create the
Treebank, the work also produced a lexicon, likewise referred to as the Sentiment Treebank,
consisting of partial parse trees annotated with sentiment.
The lexicon was created as follows. A movie review corpus was obtained
from www.rottentomatoes.com, consisting of 10,662 sentences. Each sentence was
parsed using the Stanford Parser. This gave a parse tree for each sentence. The parse
trees were split into phrases, i.e., each parse tree was split into its components,
each of which was then output as a phrase. This gave rise to 215,154 phrases.
Each of these phrases was tagged for sentiment using the Amazon Mechanical Turk
interface. The selection of labels is also described in the original paper. Initially,
the granularity of the sentiment values was 25, i.e., 25 possible values could be
given for the sentiment, but it was observed from the Mechanical
Turk data that most responses used one of only 5 values. These
5 values were then called ‘very positive’, ‘positive’, ‘neutral’, ‘negative’ and ‘very
negative’.
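
The splitting of a parse tree into phrases can be sketched as follows. The nested-tuple tree encoding, the example sentence and the function name phrases are assumptions made for illustration; this is not the Stanford Parser output format.

```python
def phrases(tree):
    """Return (words, phrase_list) for a parse tree given as nested tuples.
    Every leaf and every internal node contributes one phrase."""
    if isinstance(tree, str):            # a leaf is a single word
        return [tree], [tree]
    words, found = [], []
    for child in tree:
        child_words, child_phrases = phrases(child)
        words.extend(child_words)
        found.extend(child_phrases)
    found.append(" ".join(words))        # the phrase spanned by this node
    return words, found

# Hypothetical binarised parse of a short review sentence.
tree = (("the", "movie"), ("was", ("not", "good")))
_, all_phrases = phrases(tree)
print(all_phrases)
```

Each phrase produced this way would then be shown to annotators separately, which is how a phrase such as ‘not good’ can receive a label different from that of ‘good’.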

Table 5.1 Summary of sentiment lexicons

| Lexicon | Approach | Lexical unit | Labels | Observation |
|---|---|---|---|---|
| SO-CAL | Manual | Word | Integer in [−5, +5] | Performance can be improved by incorporating linguistic features even with low coverage |
| SentiWordNet | Automatic | WordNet synset | 3 fractional values Pos, Neg, Obj in [0, 1] | WordNet captures senses. Different senses may have different sentiment. |
| Sentiment Treebank | Manual, crowdsourced | Phrase | 5 labels ranging from “very negative” to “very positive” | Crowdsourcing can be beneficial. Tune labels according to the task. |
| Macquarie semantic orientation lexicon | Semi-supervised | Words | Positive/negative | Using links in a thesaurus to discover new words. |

5.3.1.4 Summary

Table 5.1 summarizes the sentiment lexicons described above and, in addition,
mentions some other sentiment lexicons. We compare along four parameters: the
approach used for creation, lexical units, labels and some observations. Mohammad
et al. (2009) present the Macquarie semantic orientation lexicon. This is a sentiment
lexicon that contains 76,400 terms, marked as positive or negative. In terms of
obtaining manual annotations, Louviere (1991) presents an approach called the
MaxDiff approach. In this case, instead of obtaining annotations for one word at
a time, an annotator is shown multiple words and asked to identify the least positive
and most positive word among them.
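
One simple way to turn MaxDiff judgements into word scores is to count how often a word is chosen as most positive minus how often it is chosen as least positive, divided by the number of times it was shown. The sketch below assumes this counting scheme (the text above does not specify how scores are derived); the judgements and the function name maxdiff_scores are hypothetical.

```python
from collections import Counter

def maxdiff_scores(judgements):
    """Each judgement is (words_shown, most_positive, least_positive).
    Score = (#times most positive - #times least positive) / #times shown."""
    best, worst, shown = Counter(), Counter(), Counter()
    for words, most_positive, least_positive in judgements:
        shown.update(words)
        best[most_positive] += 1
        worst[least_positive] += 1
    return {w: (best[w] - worst[w]) / shown[w] for w in shown}

# Hypothetical annotations, four words shown at a time.
judgements = [
    (("good", "okay", "bad", "awful"), "good", "awful"),
    (("good", "fine", "bad", "okay"), "good", "bad"),
]
word_scores = maxdiff_scores(judgements)
print(word_scores)
```

The resulting scores lie in [−1, 1] and give a ranking of words without asking annotators for absolute ratings.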

5.3.2 Emotion Lexicons
We now describe emotion lexicons. They have been described in this separate
subsection so as to highlight challenges and the approaches specific to emotion
lexicon generation.

5.3.2.1 LIWC

Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al. 2001) is a popular
manually created lexicon. The lexicon consists of 4500 words and word stems
(an example word stem is happ*, which covers adjectival and adverbial forms
of the word) arranged in four categories. The four categories of words in LIWC
are: linguistic processes (pronouns, prepositions, conjunctions, etc.), speaking
processes (interjections, fillers, etc.), personal concerns (words related to work,
home, etc.) and psychological processes. The words in the psychological processes
category deal with affect and opinion, and are further classified into cognitive
and affective processes. Cognitive processes include words indicating certainty
(‘definitely’), possibility (‘likely’), inhibition (‘prevention’), etc. Affective
processes include words with positive/negative emotion and words expressing anxiety,
anger and sadness. LIWC 2001 has 713 cognitive and 915 affective process words.
LIWC was manually created by three linguistic experts in two steps:
(a) Define category scales: The judges determined categories and decided how they
can be grouped into a hierarchy
(b) Manual population: The categories were manually populated with words. For
each word, three judges manually evaluated whether or not a word should be
placed in a category. In addition, they also considered if a word can be moved
higher up in the hierarchy.
LIWC now exists in multiple languages, and has been widely used by several
applications for analysis of topic as well as sentiment/emotion.
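
The sketch below shows how a lexicon of words and word stems such as happ* could be applied to count category occurrences in a text. The three entries and their category names are toy assumptions, not the actual LIWC dictionary or its software.

```python
from collections import Counter

# Toy entries in the spirit of LIWC; a trailing '*' marks a word stem.
TOY_LEXICON = {
    "happ*": ["affective", "positive_emotion"],
    "definitely": ["cognitive", "certainty"],
    "likely": ["cognitive", "possibility"],
}

def categories_for(token):
    """Return all categories whose entry matches the token."""
    matched = []
    for entry, cats in TOY_LEXICON.items():
        if entry.endswith("*"):
            if token.startswith(entry[:-1]):   # stem entry: prefix match
                matched.extend(cats)
        elif token == entry:                   # plain entry: exact match
            matched.extend(cats)
    return matched

def category_counts(text):
    counts = Counter()
    for raw in text.lower().split():
        counts.update(categories_for(raw.strip(".,!?;:")))
    return counts

counts = category_counts("She is definitely happy, happily so.")
print(counts)
```

A word can belong to several categories at once, which is why each entry maps to a list that reflects its position in the category hierarchy.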

5.3.2.2 ANEW

Affective Norms for English Words (ANEW) (Bradley and Lang 1999) is a dictionary of around 1000 words where each word is represented by a three-tuple:
pleasure, arousal and dominance. Pleasure indicates the valence of a
word, arousal its intensity, and dominance whether the emotion expressed
in the word is in control or not. Consider the example word ‘afraid’. This word is
represented by the tuple (negative, 3, not), indicating that it is a negative emotion, with
an arousal of 3, that is not in control. ANEW was manually created by 25
annotators working separately. Each annotation experiment was conducted in runs of 100–150
words. Annotators were given a sheet called the ScanSAM sheet, on which each annotator
marked values of S, A and M for each word.

5.3.2.3 Emo-Lexicon

Emo-Lexicon (Mohammad and Turney 2013) is a lexicon of 14,000 terms created
using crowd-sourcing portals like Amazon Mechanical Turk. Association with
positive and negative valence as well as with the eight Plutchik emotions is also
available. Although it is manually created, the lexicon is larger than other emotion
lexicons, a clear indication that crowdsourcing is indeed a powerful mechanism
for large-scale creation of emotion lexicons. However, because the task of lexicon
creation has been opened up to the ‘crowd’, quality control is a key challenge. To
mitigate this, the lexicon is created with additional checks, as follows:
1. A list of words is created from a thesaurus.
2. When an annotator annotates a word with emotion, he/she must first ascertain
the sense of the word. The target word is displayed along with four words. The
annotator must select one that is closest to the target word.
3. Only if the annotator was able to correctly determine the sense of the word is
his/her annotation for emotion label obtained.
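
The quality-control steps above can be sketched as a simple filter. The Response record, the gold sense answers and the example words are hypothetical; they only illustrate the idea of keeping an emotion label when the accompanying sense question was answered correctly.

```python
from dataclasses import dataclass

@dataclass
class Response:
    target: str             # word being annotated
    chosen_sense_word: str  # option picked as closest in meaning
    emotion: str            # emotion label given by the annotator

# Hypothetical gold answers for the word-choice (sense) question.
GOLD_SENSE = {"startle": "shock", "bank": "money"}

def accepted(responses, gold):
    """Keep only responses whose sense question was answered correctly."""
    return [r for r in responses if gold.get(r.target) == r.chosen_sense_word]

responses = [
    Response("startle", "shock", "surprise"),
    Response("startle", "flower", "joy"),     # wrong sense: discarded
    Response("bank", "money", "trust"),
]
kept = accepted(responses, GOLD_SENSE)
print([(r.target, r.emotion) for r in kept])
```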

5.3.2.4 WordNet-Affect

WordNet-Affect (Strapparava and Valitutti 2004), like SentiWordNet, is a resource
that annotates senses in WordNet with emotions. WordNet-Affect was created using
a semi-supervised method. It consists of 2874 synsets annotated with affective labels
(called a-labels). WordNet-Affect was created as follows:
1. A set of core synsets is created. These are synsets whose emotion has been
manually labelled in the form of a-labels.
2. These labels are projected to other synsets using WordNet relations.
3. The a-labels are then manually evaluated and corrected, wherever necessary.

5.3.2.5 Chinese Emotion Lexicon

A Chinese emotion lexicon (Xu et al. 2010) was created using a semi-supervised
approach, in the absence of a graph structure such as WordNet. There are two steps
of creation:
1. Select a core set of labelled words.
2. Expand these words using a similarity matrix. Iterate until convergence.
The similarity matrix takes three kinds of similarity into account:
1. Syntagmatic similarity: This includes co-occurrence of two words in a large text
corpus.
2. Paradigmatic similarity: This includes relations between two words in a semantic
dictionary.
3. Linguistic peculiarity: This involves syllable overlap, possibly to cover different
forms of the same word.
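
A generic version of seed expansion over a similarity matrix is sketched below. It assumes that the three kinds of similarity have already been combined into one matrix, and it uses a standard propagation update with a damping factor alpha; this is an illustrative scheme, not necessarily the update used by Xu et al. (2010). The words, similarity values and seed scores are hypothetical.

```python
def propagate(similarity, seeds, alpha=0.5, tol=1e-6, max_iter=100):
    """Spread seed scores over a word-similarity matrix until convergence.
    similarity: square list of lists; seeds: {word_index: score}."""
    n = len(similarity)
    # Row-normalise so that each word averages over its neighbours.
    rows = []
    for row in similarity:
        total = sum(row)
        rows.append([v / total if total else 0.0 for v in row])
    scores = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            spread = sum(rows[i][j] * scores[j] for j in range(n))
            # Mix neighbour evidence with the original seed score.
            updated.append(alpha * spread + (1 - alpha) * seeds.get(i, 0.0))
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores

words = ["joy", "delight", "grief", "sorrow"]
similarity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
seeds = {0: 1.0, 2: -1.0}   # 'joy' is a positive seed, 'grief' a negative one
final = propagate(similarity, seeds)
print(dict(zip(words, final)))
```

Here the unlabelled words ‘delight’ and ‘sorrow’ end up with the sign of the seed they are most similar to.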

5.3.2.6 SenticNet

SenticNet (the most recent version being SenticNet 4) by Cambria et al. (2016)
is a rich graphical repository of concepts. The resource aims to capture semantic
and sentic properties of words and phrases. The sentic properties are related to
connotations of words. A detailed discussion of SenticNet forms a forthcoming
chapter of this book.

Table 5.2 Summary of emotion lexicons

| Lexicon | Approach | Labels | Observation |
|---|---|---|---|
| LIWC | Manual | Hierarchy of categories | Decide hierarchy of categories; have judges interacting with each other |
| ANEW & ANEW for Spanish | Manual | Valence, arousal, dominance | ScanSAM lists; have a set of annotators annotating in parallel |
| Emo-Lex | Manual | Eight emotions, two valence categories | Use crowd-sourcing. Attention to quality control. |
| WordNet-Affect | Semi-supervised | Affective labels | Annotate a seed set. Expand using WordNet relations. |
| Chinese emotion lexicon | Semi-supervised | Five emotions | Annotate a seed set. Expand using similarity matrices |
| NRC Hashtag emotion lexicon | Automatic | Eight emotions | Use hashtag based supervision of tweets |
| SenticNet 4 | Semi-supervised | A larger structure | Semi-supervised graphical structure, created using techniques such as agglomerative clustering |

5.3.2.7 Summary

Table 5.2 shows a summary of emotion lexicons discussed in this section. We
observe that manual approaches dominate emotion lexicon creation. Key issues
in manual emotion annotation are ascertaining the quality of the labels and deciding hierarchies, if any. On the other hand, automatic emotion annotation is mostly semi-supervised. To expand a seed set, structures like
WordNet may be used, or similarity matrices constructed from large corpora can
be employed. Mohammad (2012) presents a hashtag emotion lexicon that consists
of 16,000+ unigrams annotated with eight emotions. The lexicon is created using
emotion-denoting hashtags present in tweets. The lexicon of Mohammad and Turney (2010) is also
an emotion lexicon created using a crowdsourcing platform. Additional useful lexicons are available at: http://www.saifmohammad.com/WebPages/lexicons.html.

5.4 Sentiment-Annotated Datasets
This section describes sentiment-annotated datasets, and is organized as follows. We
first describe sources of data, then mechanisms of annotation, and finally provide a list of
some sentiment-annotated datasets.

5.4.1 Sources of Data
The first step is to obtain raw data. The following are candidate sources of raw
data:
1. Social networking websites like Twitter are a rich source of data for sentiment
analysis applications. For example, the Twitter API (Makice 2009) is a publicly
available API that allows you to download tweets based on a range of
search criteria, such as keyword-based search, downloading timelines, downloading tweet threads, etc.
2. Competitions such as SemEval have been regularly conducting Sentiment
analysis related tasks. These competitions release a training dataset followed by
a test dataset. These datasets can be used as benchmark datasets.
3. Discussion forums are portals where users discuss topics, often in the context
of a central theme or an initial question. These discussion forums often arrange
posts in a thread-like manner. This gives the sentiment a discourse structure. However, this also introduces an additional challenge. A reply to a post could mean
one out of three possibilities: (a) The reply is an opinion with respect to the
post, offering an agreement or disagreement (example: Well-written post), (b)
The reply is an opinion towards the author of the post (example: Why do you
always post hateful things?), or (c) The reply is an opinion towards the topics
being discussed in the post. (Example: You said that the situation is bad. But do
you think that....). Reddit threads have been used as opinion datasets in several
past works.
4. Review websites: Amazon and other review websites have reviews on different
domains. Each kind of review has unique challenges of its own. In case of
movie reviews, the review often has a portion describing ‘what’ the movie is
about. It is possible to create subjective extracts before using them as done by
Mukherjee and Bhattacharyya (2012). In case of product reviews, the review
often contains sentiment towards different ‘aspects’. (‘Aspects’ of a cell phone
are battery, weight, OS, etc.).
5. Blogs are often long text describing an opinion with respect to a topic. They can
also be crawled and annotated to create a sentiment dataset. Blogs tend to be
structured narratives analyzing the topic. They may not always contain the same
sentiment throughout but can be useful sources of data that looks at different
aspects of the given topic.

5.4.2 Obtaining Labels
Once raw data has been obtained, the second step is to label this data. There are
different approaches that can be used for obtaining labels for a dataset:
1. Manual labelling: Several datasets have been created by human annotators.
The labelling can be done through crowd-sourcing applications like Amazon
Mechanical Turk. They allow obtaining large volumes of annotations by employing the ‘power of the crowds’ (Paolacci et al. 2010). One way to control the quality of
annotation is to use a seed set of gold labels. Human annotators within
the controlled setup of the experiment create a set of gold labels. Only if a crowdsourced annotator (known as a ‘worker’ in crowd-sourcing parlance) gets a
sufficient number of gold labels right is he/she permitted to perform
the task of annotation.
2. Distant supervision: Distant supervision refers to the situation where the label or
the supervision is obtained without an annotator, hence the word ‘distant’. One
way to do so is to use annotation provided by the writers themselves. However,
the question of reliability arises because not every data unit has been manually
verified by a human annotator. The reliability therefore depends on, and has to be validated for, the approach used
to obtain distant supervision. Consider the example of Amazon reviews. Each
review is often accompanied by a star rating. These star ratings can be used as
labels provided by the writer. Since these ratings are out of 5, a review with 1
star is likely to be strongly negative, whereas a review with 5 stars is likely to be
strongly positive. To improve the quality of the dataset obtained, Pang and Lee
(2005) consider reviews that are definitely positive and definitely negative, i.e.
reviews with 5 and 1 stars respectively.
Another technique to obtain distant supervision is the use of hashtags. Twitter
provides a reverse index mechanism in the form of hashtags. An example tweet
is ‘Just finished writing a 20 page long assignment. #Engineering #Boring’.
‘#Engineering’ and ‘#Boring’ are hashtags, since they are phrases preceded
by a hashtag symbol. Note that a hashtag is created by the author of the tweet
and hence can be anything: topical (i.e. identifying what the tweet is about;
Engineering, in this case) or emotion-related (i.e. expressing an opinion through
a hashtag; in this case, the author of the tweet is bored). Purver and Battersby
(2012) use emotion-related hashtags to obtain a set of labelled tweets. Thus, hashtags such as ‘#happy’, ‘#sad’, etc. are used to
download tweets using the Twitter API. The tweets are then labelled as ‘#happy’,
‘#sad’, etc. Since hashtags are user-created, they can be more nuanced than
this. For example, consider the hypothetical tweet: ‘Meeting my ex-girlfriend
after about three years. #happy #not’. The last hashtag ‘#not’ inverts the sentiment
expressed by the preceding hashtag ‘#happy’. This unique construct ‘#not’ or
‘#notserious’ or ‘#justkidding’/‘#jk’ is popular in tweets and must be handled
properly when hashtag-based supervision is used to create a dataset.
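
The two forms of distant supervision above can be sketched as small labelling functions. The tag-to-polarity map, the list of inverting hashtags and the function names are assumptions for illustration; a real dataset would need a larger, validated tag list.

```python
import re

# Assumed mapping from emotion hashtags to a polarity label.
TAG_POLARITY = {"happy": 1, "sad": -1}
# Hashtags that invert the sentiment of a preceding emotion hashtag.
INVERTERS = {"not", "notserious", "justkidding", "jk"}

def label_from_hashtags(tweet):
    """Return +1/-1 from emotion hashtags, or None if no emotion hashtag
    is present. An inverting hashtag flips the label seen so far."""
    polarity = None
    for tag in re.findall(r"#(\w+)", tweet.lower()):
        if tag in TAG_POLARITY:
            polarity = TAG_POLARITY[tag]
        elif tag in INVERTERS and polarity is not None:
            polarity = -polarity
    return polarity

def label_from_stars(stars):
    """Keep only clearly polar reviews: 5 stars -> +1, 1 star -> -1."""
    if stars == 5:
        return 1
    if stars == 1:
        return -1
    return None  # ambiguous ratings are left out of the dataset

print(label_from_hashtags("Meeting my ex-girlfriend after about three years. #happy #not"))
print(label_from_hashtags("Just finished writing a 20 page long assignment. #Engineering #Boring"))
print(label_from_stars(3))
```

Returning None for topical-only tweets and for mid-range star ratings means that such items are dropped rather than given an unreliable label.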

5.4.3 Popular Sentiment-Annotated Datasets
We now discuss some popular sentiment-annotated datasets. We divide them into
two categories: sentence-level annotation and discourse-level annotation. The latter
refers to text longer than a sentence. While tweets may contain more than one
sentence, we group them under sentence-level annotation because of the limited length
of tweets.

Sentence-Level Annotated Datasets
Niek Sanders released a dataset at http://www.sananalytics.com/lab/twittersentiment/. It consists of 5513 manually labelled tweets, classified as per four topics.

SemEval is a competition that is run for specific tasks. Sentiment analysis and related tasks have featured since 2013 (Nakov et al. 2013; Rosenthal et al. 2014, 2015). The datasets for these tasks are released online, and can be useful for sentiment applications.
SemEval 2013 dataset is at: http://www.cs.york.ac.uk/semeval-2013/semeval2013.tgz
SemEval 2014 dataset is at: http://alt.qcri.org/semeval2014/task9/
SemEval 2015 dataset is at: http://alt.qcri.org/semeval2015/task10/index.php?id=subtaske-readme

Darmstadt corpus consists of consumer reviews annotated at sentence and expression level. The dataset is available at: https://www.ukp.tu-darmstadt.de/data/sentiment-analysis/darmstadt-service-review-corpus/

Sentence annotated polarity dataset from Pang et al. (2002) is also available at: https://www.cs.cornell.edu/people/pabo/movie-review-data/

Sentiment140 (Go et al. 2009) is a corpus made available by Stanford at http://help.sentiment140.com/for-students. The dataset is of tweets and contains additional information such as timestamp, author, tweet id, etc.

Deng et al. (2013) released a goodFor/badFor corpus that is available at: http://mpqa.cs.pitt.edu/corpora/gfbf/. goodFor/badFor indicates positive/negative sentiment respectively. This corpus uses a five-tuple representation for opinion annotation. Consider this example sentence from their user manual: ‘The smell stifled his hunger.’ This sentence is marked as: ‘span: stifled, polarity: badFor, agent: the smell, object: his hunger’.
Discourse-Level Annotated Datasets
Many movie review datasets and lexicons are available at: https://www.cs.cornell.edu/people/pabo/movie-review-data/. These datasets include: sentiment annotated datasets, subjectivity annotated datasets, and sentiment scale datasets. These have been released in Pang and Lee (2004, 2005), and are widely used.

A Congressional speech dataset (Thomas et al. 2006) annotated with opinion is available at: http://www.cs.cornell.edu/home/llee/data/convote.html The labels indicate whether the speaker supported or opposed the legislation that he/she was talking about.

A corpus consisting of Amazon reviews from different domains such as electronics, movies, etc. is available at: https://snap.stanford.edu/data/web-Amazon.html (McAuley and Leskovec 2013). This dataset spans a period of 18 years, and contains information such as: product title, author name, star rating, helpful votes, etc.

The Political Debate Corpus by Somasundaran and Wiebe (2009) is a dataset of political debates that is arranged based on different topics. It is available here: http://mpqa.cs.pitt.edu/corpora/product_debates/.

MPQA Opinion Corpus (Wiebe et al. 2005) is a popular dataset that consists of news articles from different sources. Version 2.0 of the corpus contains nearly 15,000 sentences. The sentences are annotated with topics and labels. The topics are from different countries around the world. This corpus is available at http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/.

5.5 Bridging the Language Gap
Creation of a sentiment lexicon or a labelled dataset is a time- and effort-intensive
task. Since English is the dominant language in which SA research has been
carried out, it is only natural that work on many other languages has tried to leverage
resources developed for English by adapting and/or reusing them. Cross-lingual
SA refers to the use of systems and resources developed for one language to perform
SA of another. The first language (where the resources/lexicons/systems have been
developed) is called the source language, while the second language (where a new
system/resource/lexicon needs to be deployed) is called the target language. The
basis of cross-lingual SA is the availability of a lexicon or an annotated dataset in the
source language. It must be noted that several nuanced methodologies to perform
cross-lingual SA exist, but they are beyond the scope of this chapter. We
focus on cross-lingual sentiment resources.
The fundamental requirement is a mapping between the two languages. Let us
consider what happens when we wish to map a lexicon in language X to language
Y. For a lexicon, this mapping can be in the form of a parallel dictionary where
words of one language are mapped to another. Redondo et al. (2007) describe
ANEW for Spanish: ANEW was originally created for
English words, and its parallel Spanish version was created by translating the words from
English to Spanish, and then manually validating them. The mapping can also be in the form
of linked WordNets, in case the lexicons involve concepts like synsets. For Hindi
SentiWordNet, Joshi et al. (2010) map synsets in English to Hindi using a WordNet
linking, and generate a Hindi SentiWordNet from its English variant. Mahyoub et al.
(2014) describe a technique to create a sentiment lexicon for Arabic. Based on a seed
set of positive and negative words, and Arabic WordNet, they present an expansion
algorithm to create a lexicon. The algorithm uses WordNet relations in order to
propagate sentiment labels to new words/synsets. The WordNet relations they use
are divided into two categories: the ones that preserve the sentiment orientation, and
the ones that invert the sentiment orientation.
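
Expansion of a seed set through sentiment-preserving and sentiment-inverting relations can be sketched as a breadth-first traversal. The toy English words, the relation graph and the function name expand are hypothetical; the sketch keeps the first label a word receives, which is a simplification rather than the procedure of Mahyoub et al. (2014).

```python
from collections import deque

def expand(seeds, relations):
    """Propagate polarity (+1/-1) from seed words to related words.
    relations maps a word to (neighbour, kind) pairs, where kind is
    'preserve' (e.g. similarity) or 'invert' (e.g. antonymy)."""
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, kind in relations.get(word, []):
            if neighbour in labels:
                continue  # keep the first label assigned (a simplification)
            labels[neighbour] = labels[word] if kind == "preserve" else -labels[word]
            queue.append(neighbour)
    return labels

# Hypothetical seeds and relation graph.
seeds = {"good": 1, "bad": -1}
relations = {
    "good": [("fine", "preserve"), ("evil", "invert")],
    "evil": [("wicked", "preserve")],
}
labels = expand(seeds, relations)
print(labels)
```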
How is this process of mapping words in one language to another any different
for datasets? If a machine translation (MT) system is available, the task is
simple: a dataset in the source language can be translated to the target language.
This is a common strategy (Mihalcea et al. 2007; Duh
et al. 2011). However, translation may introduce additional errors into the
system, degrading the quality of the dataset. This applies particularly
to the translation of sentiment-bearing idioms. Salameh et al. (2015) perform
their experiments for Arabic, where an MT system is used to translate documents,
following which sentiment analysis is performed. An interesting observation that
the authors make is that although MT may result in a poor translation, making it
difficult for humans to identify sentiment, a classifier performs reasonably well.
However, MT systems may not exist for all language pairs. Balamurali et al. (2012)
suggest a naive replacement for an MT system. To translate a corpus from Hindi to
Marathi (and vice versa), they obtain sense annotations for words in the dataset.
Then, they use a WordNet linking to transfer each word from the source language to
the target language.
An immediate question that arises concerns the hypothesis underlying all cross-lingual approaches: sentiment is retained across languages. This means that if a
word has a sentiment s in the source language, the translated word in the target language
(with the appropriate sense recorded) also has sentiment s. How fair is the hypothesis
that words in different languages bear the same emotion? This can be assessed from
linear correlations between ratings for the three affective dimensions, as was done
for ANEW for Spanish. ANEW for Spanish (Redondo et al. 2007), as described
above, was a lexicon created using ANEW in English. The correlation values for
valence, arousal and dominance are 0.916, 0.746 and 0.720 respectively. This means
that a positive English word is very likely to be a positive Spanish word. The arousal
and dominance values are preserved to a lesser extent.
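
The linear correlation used for this kind of check is the Pearson coefficient, which can be computed as in the sketch below. The rating lists are made-up numbers for illustration and are not taken from ANEW or its Spanish version.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson linear correlation between two equally long rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical valence ratings of the same five words in two languages.
source_valence = [8.2, 7.5, 2.1, 1.6, 5.0]
target_valence = [8.0, 7.1, 2.5, 1.9, 5.3]
r = pearson(source_valence, target_valence)
print(round(r, 3))
```

A value close to 1 for a dimension suggests that ratings transfer well across the two languages for that dimension.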
Thus, we have two options now. The first option is cross-lingual SA: use
resources generated for the source language and map them to the target language.
The second option is in-language SA: create resources for the target language on
its own. Balamurali et al. (2013) weigh in-language SA against cross-lingual SA
based on machine translation. The authors show for English, German, French and
Russian that in-language SA does consistently better than cross-lingual SA relying
on translation alone.
Cross-lingual SA also benefits from additional corpora in the target language:
1. Unlabeled corpus in target language: This type of corpus is used in different
approaches, the most noteworthy being the co-training-based approach of Wan
(2009). The author assumes that a labelled corpus in the source language, an
unlabeled corpus in the target language and an MT system to translate back and forth
between the two languages are available.
2. Labelled corpus in target language: The size of this dataset is assumed to be
much smaller than the training set.
3. Pseudo-parallel data: Lu et al. (2011) describe use of pseudo-parallel data for
their experiments. Pseudo-parallel data is the set of sentences in the source
language that are translated to the target language and used as an additional
polarity-labelled data set. This allows the classifier to be trained on a larger
number of samples.

5.6 Applications of Sentiment Resources
In the preceding sections, we described sentiment resources in terms of labels,
annotation techniques and approaches to creation. We will now see how a sentiment
resource (either a lexicon or a dataset) can be used.
A lexicon is useful as a knowledge base for a rule-based SA system. A rule-based
SA system takes a textual unit as input, applies a set of pre-determined rules, and
produces a prediction. Joshi et al. (2011) present C-Feel-It, a rule-based SA system
for tweets. The workflow is as follows:
1. A user types a keyword. Tweets containing the keyword are downloaded using
the Twitter API
2. The tweets are pre-processed to correct extended words (e.g. ‘happpyyyyy’ is
replaced with two occurrences of ‘happy’; two, because the extended form of the
word ‘happy’ has a magnified sentiment).
3. The words in a tweet are looked up individually in four lexical resources.
The sentiment label of a tweet is calculated as a sum of positive and negative
words – with rules applied for conjunctions and negation. In case of negation,
the sentiment of words within a window is inverted. In case of conjunctions such
as ‘but’, the latter part of a tweet is considered.
4. The resultant prediction of a tweet is a weighted sum of prediction made by the
four lexical resources. The weights are determined experimentally by considering
how well the resources perform on an already labelled dataset of tweets.
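
The workflow above can be sketched as a small rule-based scorer. The toy lexicon, the negation list, the window size of three tokens and all function names are assumptions for illustration; this is not the C-Feel-It implementation.

```python
import re
from itertools import groupby

LEXICON = {"happy": 1, "good": 1, "bad": -1, "boring": -1}  # toy lexicon
NEGATIONS = {"not", "never", "no"}
WINDOW = 3  # assumed number of tokens affected by a negation

def elongation_pattern(word):
    """Regex matching the word with any of its letter runs stretched."""
    parts = []
    for ch, run in groupby(word):
        run_length = len(list(run))
        parts.append(f"{re.escape(ch)}{{{run_length},}}")
    return re.compile("".join(parts))

PATTERNS = {word: elongation_pattern(word) for word in LEXICON}

def lookup(token):
    """Return (polarity, weight); an extended word counts twice."""
    if token in LEXICON:
        return LEXICON[token], 1
    for word, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            return LEXICON[word], 2
    return 0, 0

def score_tweet(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    if "but" in tokens:
        # Conjunction rule: keep only the part after the last 'but'.
        tokens = tokens[len(tokens) - tokens[::-1].index("but"):]
    total, negate_left = 0, 0
    for token in tokens:
        if token in NEGATIONS:
            negate_left = WINDOW
            continue
        polarity, weight = lookup(token)
        if negate_left > 0:          # inside the negation window
            polarity = -polarity
            negate_left -= 1
        total += polarity * weight
    return total

def combine(predictions, weights):
    """Weighted sum of the predictions made with different lexicons."""
    return sum(w * p for w, p in zip(weights, predictions))

print(score_tweet("so happpyyyyy today"))
print(score_tweet("not good"))
print(score_tweet("the plot was good but the ending was boring"))
```

In a full system, score_tweet would be run once per lexical resource and the per-resource outputs passed to combine, with weights tuned on an already labelled set of tweets.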
The above approach is a common framework for rule-based SA systems.
Levallois (2013) also uses lexicons and a set of rules to perform sentiment analysis
of tweets. The goal, as stated by the author, is a system that is ‘fast and scalable’.
LIWC provides a tool which also uses the lexicon and applies a set of rules to generate
a prediction. Typically, systems that use SA as a sub-module of a larger application
can benefit greatly from a lexicon and simple hand-crafted rules.
Lexicons have also been used in topic models (Lin and He 2009) to set priors
on the word-topic distributions. A topic model takes as input a dataset (labelled
or unlabeled) and generates clusters of words called topics, such that a word may
belong to more than one topic. A topic model based on LDA (Blei et al. 2003)
samples a latent variable called topic, for every word occurrence in a document.
This results in two types of distributions over an unlabeled dataset: topic-document
distributions (the probability of seeing this topic in this document, given the words
and the topic-word assignments), and word-topic distributions (the probability of
seeing this word belonging to the topic in the entire corpus, given the words
and the topic-word assignments). The word-topic distribution is a multinomial
with a Dirichlet prior. Sentiment lexicons have commonly been used to set the Dirichlet
hyperparameters for the word-topic distribution. Consider the following example.
In a typical scenario, all words have symmetric priors over the topics. This means
that all words are equally likely to belong to a certain topic. However, if we wish
to have ‘sentiment coherence’ in topics, then setting the Dirichlet hyperparameters
appropriately can adjust the priors on topics. Let us assume that we wish the
first half of the topics to represent ‘positive’ topics, and the second half to
represent ‘negative’ topics. A ‘positive’ topic here means a topic with positive words
corresponding to a concept. Words that the lexicon marks as positive can then be
given larger hyperparameters for the first half of the topics and smaller ones for the
second half, and vice versa for negative words. More complex topic models which model additional
latent variables (such as sentiment or switch variables) also use lexicons to set priors
(Mukherjee and Bhattacharyya 2012). Lexicons have also been used to train deep
learning-based neural networks (Socher et al. 2013). A combination of datasets and
lexicons has also been used. Tao et al. (2009) propose a three-pronged factorization
method for sentiment classification. They factor in information from sentiment
lexicons (in the form of word level polarities), unlabeled datasets (in the form of
word co-occurrence) and labelled datasets (to set up the correspondences). Lexicons
can also be used to determine values of frequency-based features in a statistical
classification system. Kiritchenko et al. (2014) use features derived from a lexicon
such as: number of tokens with non-zero sentiment, total and maximal score of
sentiment, etc. This work also presents a set of ablation tests to identify value of
individual sets of features. When the lexicon-based features are removed from the
complete set, the maximum degradation is observed. Such lexicon-based features
have been used for related tasks such as sentiment annotation complexity prediction
(Joshi et al. 2014), thwarting detection (Ramteke et al. 2013) and sarcasm detection
(Joshi et al. 2015).
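The lexicon-based Dirichlet priors discussed in the topic-model example above can be sketched as follows. This is an illustrative assumption about how such priors might be laid out, not the procedure of any cited paper: the vocabulary, the lexicon and the values `base`, `boost` and `damp` are hypothetical.

```python
# Hypothetical toy vocabulary and lexicon, used only for illustration.
VOCAB = ["great", "awful", "battery", "love", "poor", "screen"]
POSITIVE = {"great", "love"}
NEGATIVE = {"awful", "poor"}


def build_word_topic_priors(vocab, num_topics, base=0.1, boost=1.0, damp=0.001):
    """Return a num_topics x len(vocab) matrix of Dirichlet hyperparameters.

    The first half of the topics is treated as 'positive', the second half as
    'negative'. Lexicon words get a large prior in matching topics and a very
    small one in the others; all remaining words keep a symmetric prior.
    """
    half = num_topics // 2
    priors = []
    for topic in range(num_topics):
        topic_is_positive = topic < half
        row = []
        for word in vocab:
            if word in POSITIVE:
                row.append(boost if topic_is_positive else damp)
            elif word in NEGATIVE:
                row.append(damp if topic_is_positive else boost)
            else:
                row.append(base)  # symmetric prior for words outside the lexicon
        priors.append(row)
    return priors


priors = build_word_topic_priors(VOCAB, num_topics=4)
```

Each row of `priors` could then serve as the hyperparameter vector of one topic's word distribution in an LDA-style sampler.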
Let us now look at how sentiment-labelled datasets can be used, especially
in machine learning (ML)-based classification systems. ML-based systems model
sentiment analysis as a classification problem. A classification model predicts the
label of a document as one among different labels. This model is learnt using a
labelled dataset as follows. A document is converted to a feature vector. The most
common form of a feature vector of a document is the unigram representation with
the length equal to the vocabulary size. The vocabulary is the set of unique words
in the labelled dataset. A Boolean or numeric feature vector of length equal to the
vocabulary size is constructed for each document where the value is set for the words
present in the document. The goal of the model is to minimize error on training
documents, with appropriate regularization for variance in unseen documents. The
labelled documents serve as a building block for an ML-based system. While the
unigram representation is common, several other features, such as word sense based
features (Balamurali et al. 2011) and qualitative features such as POS sequences (Pang
et al. 2002), have also been used in ML-based systems. The annotated
datasets form the basis for creation of feature vectors with the documents acting
as observed instances. Melville et al. (2009) combine knowledge from lexicons
and labelled datasets in a unique manner. The sentiment lexicon forms the background
knowledge about words while labelled datasets provide a domain-specific view of
the task, in a typical text classification scenario.
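The unigram representation described above can be illustrated with a short sketch. Whitespace tokenization, the two-document training set and the function names are assumptions made for the example; a real system would pass the resulting vectors to a classifier.

```python
def build_vocabulary(documents):
    """Map each unique word in the labelled dataset to a vector index."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_feature_vector(document, vocabulary, boolean=True):
    """Build a Boolean (presence) or numeric (count) unigram vector."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        index = vocabulary.get(word)
        if index is None:
            continue  # words not seen in the labelled dataset are ignored
        vector[index] = 1 if boolean else vector[index] + 1
    return vector


# Hypothetical labelled dataset.
train = [("good good movie", "pos"), ("bad movie", "neg")]
vocab = build_vocabulary([text for text, _ in train])
vectors = [(to_feature_vector(text, vocab), label) for text, label in train]
```

Every vector has the same length as the vocabulary, so documents of different lengths become comparable inputs for the classification model.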

5.7 Conclusion
This chapter described sentiment resources: specifically, sentiment lexicons and
sentiment-annotated datasets. Our focus was on the philosophy and trends in the
generation and use of sentiment lexicons and datasets. We described creation
of several popular sentiment and emotion lexicons. We then discussed different
strategies to create annotated datasets, and also presented a list of available datasets.
Finally, we added two critical points in the context of sentiment resources: how a
resource in one language can be mapped to another, and how these resources are
actually deployed in an SA system. The diversity in goals, approaches and uses of
sentiment resources highlights the value of good quality sentiment resources to
sentiment analysis.

References
Balamurali, A.R., Aditya Joshi, and Pushpak Bhattacharyya. 2011. Harnessing wordnet senses for
supervised sentiment classification. In Proceedings of the conference on empirical methods in
natural language processing. Association for Computational Linguistics.
———. 2012. Cross-lingual sentiment analysis for Indian languages using linked WordNets. In:
COLING.
Balamurali, A.R., Mitesh M. Khapra, and Pushpak Bhattacharyya. 2013. Lost in translation:
Viability of machine translation for cross language sentiment analysis. In Computational
linguistics and intelligent text processing. Berlin/Heidelberg: Springer.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. The
Journal of Machine Learning Research 3: 993–1022.
Bradley, M.M., and P.J. Lang. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical report C-1. The Center for Research in Psychophysiology,
University of Florida.
Brooke, Julian, Milan Tofiloski, and Maite Taboada. 2009. Cross-linguistic sentiment analysis:
From English to Spanish. In: RANLP.
Cambria, Erik, Soujanya Poria, Rajiv Bajpai, and Björn Schuller. 2016. SenticNet 4: A semantic
resource for sentiment analysis based on conceptual primitives. In The 26th International
conference on computational linguistics (COLING), Osaka, 2666–2677.
Deng, Lingjia, Yoonjung Choi, and Janyce Wiebe. 2013. Benefactive/Malefactive event and writer
attitude annotation. ACL (2).
Duh, Kevin, Akinori Fujino, and Masaaki Nagata. 2011. Is machine translation ripe for cross-lingual sentiment classification? In Proceedings of the 49th annual meeting of the association
for computational linguistics: Human language technologies: Short papers-volume 2. Association for Computational Linguistics.
Ekman, Paul. 1992. An argument for basic emotions. Cognition and Emotion 6 (3–4): 169–200.
Esuli, Andrea, and Fabrizio Sebastiani. 2006. Sentiwordnet: A publicly available lexical resource
for opinion mining. In Proceedings of LREC, vol. 6.
Finn, Arup. 2011. AFINN. Informatics and Mathematical Modelling, Technical University of
Denmark.
Go, Alec, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant
supervision. CS224N Project Report, Stanford 1 (2009): 12.
Joshi, Aditya, A.R. Balamurali, and Pushpak Bhattacharyya. 2010. A fall-back strategy for
sentiment analysis in hindi: A case study. In Proceedings of the 8th ICON.
Joshi, Aditya, A.R. Balamurali, Pushpak Bhattacharyya, and Rajat Mohanty. 2011. C-Feel-It: A sentiment analyzer for micro-blogs. In Proceedings of the 49th annual meeting of the
association for computational linguistics.
Joshi, Aditya, Abhijit Mishra, and Pushpak Bhattacharyya. 2014. Measuring sentiment annotation
complexity of text. In Conference for association of computational linguistics.
Joshi, Aditya, Vinita Sharma, and Pushpak Bhattacharyya. 2015. Harnessing context incongruity
for Sarcasm detection. In Conference for association of computational linguistics.
Kiritchenko, Svetlana, Xiaodan Zhu, and Saif M. Mohammad. 2014. Sentiment analysis of short
informal texts. Journal of Artificial Intelligence Research 50: 723–762.
Levallois, Clement. 2013. Umigon: Sentiment analysis for tweets based on lexicons and heuristics.
In Proceedings of the international workshop on semantic evaluation. SemEval, vol. 13.
Lin, Chenghua, and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In
Proceedings of the 18th ACM conference on Information and knowledge management. ACM.
Liu, Bing. 2010. Sentiment analysis and subjectivity. In Handbook of natural language processing,
vol. 2, 627–666.
Louviere, Jordan J. 1991. Best-worst scaling: A model for the largest difference judgments.
Technical report, University of Alberta.
Lu, Bin, et al. 2011. Joint bilingual sentiment classification with unlabeled parallel corpora.
In Proceedings of the 49th annual meeting of the association for computational linguistics:
Human language technologies-volume 1. Association for Computational Linguistics.
Mahyoub, Fawaz H.H., Muazzam A. Siddiqui, and Mohamed Y. Dahab. 2014. Building an Arabic
sentiment Lexicon using semi-supervised learning. Journal of King Saud University-Computer
and Information Sciences 26 (4): 417–424.
Makice, Kevin. 2009. Twitter API: Up and running: Learn how to build applications with the
Twitter API. Beijing: O’Reilly Media, Inc.
McAuley, Julian, and Jure Leskovec. 2013. Hidden factors and hidden topics: Understanding rating
dimensions with review text. In Proceedings of the 7th ACM conference on recommender
systems. ACM.
Mehrabian, Albert, and James A. Russell. 1974. An approach to environmental psychology.
Cambridge, MA: MIT Press.
Melville, Prem, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment analysis of blogs by
combining lexical knowledge with text classification. In Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and data mining. ACM.
Mihalcea, Rada, Carmen Banea, and Janyce M. Wiebe. 2007. Learning multilingual subjective
language via cross-lingual projections.
Miller, George A. 1995. WordNet: A lexical database for English. Communications of the ACM 38
(11): 39–41.
Mohammad, Saif. 2012. #Emotional tweets. In Proceedings of the first joint conference on lexical
and computational semantics (*Sem), June 2012.
Mohammad, Saif M., and Peter D. Turney. 2010. Emotions evoked by common words and phrases:
Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT
2010 workshop on computational approaches to analysis and generation of emotion in text.
Association for Computational Linguistics.
———. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence 29
(3): 436–465.
Mohammad, S., C. Dunne, and B. Dorr. 2009. Generating high-coverage semantic orientation
lexicons from overtly marked words and a thesaurus. In EMNLP, 599–608.
Mukherjee, S., and P. Bhattacharyya. 2012. Wikisent: Weakly supervised sentiment analysis
through extractive summarization with wikipedia. In Machine learning and knowledge discovery in databases, 774–793. Berlin/Heidelberg: Springer.
Nakov, Preslav, et al. 2013. Semeval-2013 task 2: Sentiment analysis in twitter.
Ortony, Andrew, and Terence J. Turner. 1990. What’s basic about basic emotions? Psychological
Review 97 (3): 315–331.
Pang, Bo, and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity
summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on
association for computational linguistics. Association for Computational Linguistics.
———. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect
to rating scales. In Proceedings of the 43rd annual meeting on association for computational
linguistics. Association for Computational Linguistics.
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on
empirical methods in natural language processing-volume 10. Association for Computational
Linguistics.
Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis. 2010. Running experiments on
Amazon Mechanical turk. Judgment and Decision Making 5 (5): 411–419.
Pennebaker, James W., Martha E. Francis, and Roger J. Booth. 2001. Linguistic inquiry and word
count: LIWC 2001, vol 71. Mahwah: Lawrence Erlbaum Associates.
Plutchik, Robert. 1980. Emotion: A psychoevolutionary synthesis. New York: Harpercollins
College Division.
———. 1982. A psychoevolutionary theory of emotions. Social Science Information/sur les
sciences sociales 21: 529–553.
Purver, Matthew, and Stuart Battersby. 2012. Experimenting with distant supervision for emotion
classification. In Proceedings of the 13th conference of the European chapter of the association
for computational linguistics. Association for Computational Linguistics.
Ramteke, Ankit, Pushpak Bhattacharyya, and J. Saketha Nath. 2013. Detecting Turnarounds in
sentiment analysis: Thwarting. In Conference for association of computational linguistics.
Redondo, Jaime, et al. 2007. The Spanish adaptation of ANEW (affective norms for English
words). Behavior Research Methods 39 (3): 600–605.
Rosenthal, Sara, et al. 2014. Semeval-2014 task 9: Sentiment analysis in twitter. In Proceedings of
SemEval, 73–80.
Rosenthal, Sara, Preslav Nakov, Svetlana Kiritchenko, Saif M. Mohammad, Alan Ritter, and
Veselin Stoyanov. 2015. Semeval-2015 task 10: Sentiment analysis in twitter. In Proceedings
of the 9th international workshop on semantic evaluation, SemEval.
Salameh, Mohammad, Saif Mohammad, and Svetlana Kiritchenko. 2015. Sentiment after translation: A case-study on Arabic social media posts. In Proceedings of the 2015 conference of
the North American chapter of the association for computational linguistics: Human language
technologies.
Socher, Richard, et al. 2013. Recursive deep models for semantic compositionality over a
sentiment treebank. In Proceedings of the conference on empirical methods in natural language
processing (EMNLP), vol. 1631.
Somasundaran, Swapna, and Janyce Wiebe. 2009. Recognizing stances in online debates. In
Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th
international joint conference on natural language processing of the AFNLP: Volume 1-volume
1. Association for Computational Linguistics.
Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. 1966. The general inquirer: A computer
approach to content analysis. Cambridge, MA: MIT Press.
Strapparava, Carlo, and Alessandro Valitutti. 2004. WordNet affect: An affective extension of
WordNet. LREC, vol. 4.
Tao, L., Y. Zhang, and V. Sindhwani. 2009. A non-negative matrix tri-factorization approach to
sentiment classification with lexical prior knowledge. In Proceedings of the joint conference
of the 47th annual meeting of the ACL and the 4th international joint conference on natural
language processing of the AFNLP Association for Computational Linguistics.
Thomas, Matt, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In Proceedings of the 2006 conference on
empirical methods in natural language processing. Association for Computational Linguistics.
Wan, Xiaojun. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of
the joint conference of the 47th annual meeting of the ACL and the 4th international joint
conference on natural language processing of the AFNLP: Volume 1-volume 1. Association for
Computational Linguistics.
Wiebe, Janyce, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and
emotions in language. Language Resources and Evaluation 39 (2–3): 165–210.
Xu, Ge, Xinfan Meng, and Houfeng Wang. 2010. Build Chinese emotion lexicons using a graph-based algorithm and multiple resources. In Proceedings of the 23rd international conference
on computational linguistics. Association for Computational Linguistics.

Chapter 6

Generative Models for Sentiment Analysis
and Opinion Mining
Hongning Wang and ChengXiang Zhai

Abstract This chapter provides a survey of recent work on using generative models
for sentiment analysis and opinion mining. Generative models attempt to model the
joint distribution of all the relevant data with parameters that can be interpreted as
reflecting latent structures or properties in the data. As a result of fitting such a
model to the observed data, we can obtain an estimate of these parameters, thus
“revealing” the latent structures or properties of the data to be analyzed. Such
models have already been widely used for analyzing latent topics in text data. Some
of the models have been extended to model both topics and sentiment of a topic,
thus enabling sentiment analysis at the topic level. Moreover, new generative models
have also been developed to model both opinionated text data and their companion
numerical sentiment ratings, enabling deeper analysis of sentiment and opinions to
not only obtain subtopic-level sentiment but also latent relative weights on different
subtopics. These generative models are general and robust and require no or little
human effort in model estimation. Thus they can be applied broadly to perform
sentiment analysis and opinion mining on any text data in any natural language.
Keywords Generative model • Probabilistic topic model • Topic-sentiment mixture • Latent aspect rating analysis • Latent variable analysis

H. Wang
Department of Computer Science, University of Virginia, 22903, Charlottesville, VA, USA
e-mail: hw5x@virginia.edu
C.X. Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, 61801, Urbana,
IL, USA
e-mail: czhai@illinois.edu
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_6

There are many approaches to performing sentiment analysis and opinion mining.
At a high level, we can distinguish two main families of approaches. The first is
rule-based approaches where human expertise is leveraged to create rules (e.g.,
sentiment lexicon) for determining sentiment of a text object (Ding and Liu 2007;
Ding et al. 2008; Esuli and Sebastiani 2006; Taboada et al. 2011; Cambria et al.
2016). The second is statistical model based approaches, where statistical models
are estimated on labeled data or domain-specific priors generated by humans to
essentially learn “soft” rules for sentiment prediction (Dave et al. 2003; Kim and
Hovy 2004; Leskovec et al. 2010; Pang et al. 2002; Poria et al. 2015); these are also known as
learning-based methods. Learning-based approaches usually require labeled data
for parameter estimation, while rule-based approaches depend less on
manual annotation but suffer from limited generalization capability. The
rules can also be treated as high-level features to be used in a statistical model so as
to combine the two families of approaches (Hu et al. 2013; Lu et al. 2011; Rao and
Ravichandran 2009; Melville et al. 2009).
Among the statistical approaches, we may further distinguish generative models
from discriminative models (Bishop 2006). Generative models focus on modeling
the joint probability between class labels (e.g., sentiment labels) and data instances
(e.g., text documents). Latent variables can be introduced in generative models to
capture the unobservable or missing structures, e.g., latent topics (Blei et al. 2003;
Blei 2012; Hofmann 1999). As a result, a generative model is a full probabilistic
model of both observed and unobserved variables. In general, generative models
attempt to model the joint distribution of all the relevant data with parameters that
can be interpreted as reflecting latent structures or properties in the data. As a result
of fitting such a model to the observed data, we can obtain an estimate of these
parameters, thus “revealing” the latent structures or properties of the data to be
analyzed.
In contrast, discriminative models, such as support vector machines (Hearst
et al. 1998; Joachims 1998), directly model the decision boundaries, e.g., the
conditional probability of class labels given data instances. Thus, a discriminative
model provides a model only for the target variables conditioned on the observed
variables. Flexible feature representations can be exploited in discriminative models,
and empirically they often result in better classification performance than generative
models (Jordan 2002). This category of statistical solutions for sentiment analysis
has been well discussed in the survey books by Liu and by Pang and Lee (Liu 2012, 2015; Pang
and Lee 2008), and therefore we will not cover it here.
In addition to supporting sentiment classification, one major advantage of generative models over discriminative models is the ability to express complex relationships between the observed and target variables, even when such relationships
are not directly observable. This property is of particular importance in sentiment
analysis and opinion mining, where the subtle dependency between
sentiment and text document content must be formalized to model opinions more accurately.
Promising progress in exploring generative models for sentiment analysis and
opinion mining has been achieved in recent studies (Lin and He 2009; Mei et al.
2007; Titov and McDonald 2008a; Jo and Oh 2011; Wang et al. 2010, 2011;
McAuley and Leskovec 2013; Moghaddam and Ester 2011). Previously, generative
models have already been widely used for analyzing latent topics in text documents,
e.g., topic models (Blei et al. 2003; Blei 2012; Hofmann 1999). Some of the
models have been extended to model the sentiment of a topic, thus enabling
sentiment analysis at the topic level (Lin and He 2009; Mei et al. 2007; Titov
and McDonald 2008a; Jo and Oh 2011). Moreover, new generative models have
also been developed to model both opinionated text data and their companion
numerical sentiment ratings, enabling deeper analysis of sentiment and opinions to
not only obtain subtopic-level sentiment but also latent relative weights on different
subtopics (Wang et al. 2010, 2011; McAuley and Leskovec 2013; Moghaddam
and Ester 2011). This chapter provides a survey of these recent works on using
generative models for sentiment analysis and opinion mining, and discusses various
applications of such models.
The rest of this chapter is organized as follows. In Sect. 6.1, we provide essential
background about language models and topic models, which is the basis of the
generative models that we will review in this chapter. We then present a detailed
review of the major generative models for sentiment analysis in Sect. 6.2. We will
discuss their applications in Sect. 6.3. To facilitate application development using
such models, in Sect. 6.4, we also provide a brief review of the relevant resources
on the Web.

6.1 Background: Language Models and Probabilistic Topic
Models
As background, we first introduce generative models for modeling text data, starting from N-gram language models and proceeding to probabilistic
topic models. We will introduce the two most typical topic models, i.e., the probabilistic
latent semantic indexing model (Hofmann 1999) and the latent Dirichlet allocation
model (Blei et al. 2003). We will also briefly discuss the model estimation procedure
for these generative models.

6.1.1 Language Models for Text
The simplest generative models for text data are N-gram language models, which were first introduced in speech recognition for distinguishing between
words and phrases that sound similar (Katz 1987; Rabiner and Juang 1993) and
later introduced to information retrieval for matching keyword queries with text documents (Ponte and Croft 1998; Hiemstra and Kraaij 1998; Zhai and Lafferty 2001a).
A statistical language model specifies a probability distribution over sequences
of words. For example, with a language model estimated on a collection of computer
science research papers, one can make statistical assertions about which text
sequence is more likely to be generated by a computer scientist, e.g., P(“generative
models for sentiment analysis”) > P(“the flight to Chicago is cancelled”). Formally,
a language model $P(w_1, w_2, \ldots, w_n)$ specifies the joint probability of observing
the word sequence $w_1, w_2, \ldots, w_n$. Using the chain rule of probability, it can be
written as,

$$
\begin{aligned}
P(w_1, w_2, \ldots, w_n) &= P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1}) \\
&= \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1})
\end{aligned}
\tag{6.1}
$$

where $P(w_k \mid w_1, \ldots, w_{k-1})$ is a multinomial distribution over words in the vocabulary given the word sequence $w_1, \ldots, w_{k-1}$.
The chain rule shows the link between computing the joint probability of a
sequence of words and computing the conditional probability of a word given all
preceding words. Intuitively, Eq. (6.1) defines the generation process of a word
sequence: repeatedly select the next word with regard to all the words in front of
it until meeting the predefined sequence length. For this reason, such a model is
often called a generative model.
Although Eq. (6.1) suggests that one can compute the joint probability of
an entire sequence of words by multiplying together a number of conditional
probabilities, it does not reduce the computational complexity. The bottleneck is
that we do not have any efficient way to compute the exact probability of a word
given a long sequence of preceding words. For example, with a vocabulary size
of $V$, to compute $P(w_k \mid w_1, \ldots, w_{k-1})$ one needs in total $(V-1)V^{k-1}$ elements in
the probability table (minus one because the probabilities sum up to one). This
complexity is of the same order as that of directly computing $P(w_1, \ldots, w_k)$, which is
$V^k - 1$. Since in general these probabilities must be estimated based on empirically
observed data, and in practice, we almost never have so much data to observe
all these different sequences, we must make simplifying assumptions about the
model to make it tractable and actually useful in an application.
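To make the growth of these probability tables concrete, the two counts given above can be evaluated directly. The function names are hypothetical; the formulas are the ones stated in the text.

```python
def conditional_table_size(vocab_size, k):
    """Free parameters needed for P(w_k | w_1, ..., w_{k-1}): (V - 1) * V^(k - 1)."""
    return (vocab_size - 1) * vocab_size ** (k - 1)


def joint_table_size(vocab_size, k):
    """Free parameters needed for the joint P(w_1, ..., w_k): V^k - 1."""
    return vocab_size ** k - 1


# Both counts grow exponentially with the sequence length k.
for k in (1, 2, 3):
    print(k, conditional_table_size(10, k), joint_table_size(10, k))
```

Even with a toy vocabulary of ten words the tables grow tenfold with every additional word of history, which is why the history has to be truncated in practice.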
N-gram language models provide a practical solution to this computational complexity challenge: instead of computing the probability of a word given the entire
preceding sequence, we can approximate the preceding sequence by just a finite
number of previous words, i.e., $P(w_k \mid w_1, \ldots, w_{k-1}) \approx P(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$.
The assumption that the conditional probability of a word depends only on
the previous $N-1$ words is called a Markov assumption. The unigram model is the
simplest N-gram language model, in which one assumes the current word is totally
independent of any other words in the sequence, i.e., $P(w_k \mid w_1, \ldots, w_{k-1}) = P(w_k)$;
as a result,

$$
P(w_1, w_2, \ldots, w_n) = \prod_{k=1}^{n} P(w_k) \tag{6.2}
$$

In the literature, the unigram language model is also referred to as the bag-of-words model
(Harris 1954), since the order of words is totally ignored. To capture the local
dependency between words, bigram and trigram models are usually exploited.
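The following sketch evaluates Eq. (6.2) in log space and shows why the unigram model is a bag-of-words model: any reordering of a sequence receives the same probability. Estimating P(w) by relative frequency on a toy corpus is an assumption made for the example, as are the function names.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Estimate P(w) as the relative frequency of w in the training tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def unigram_log_prob(sequence, model):
    """Log of Eq. (6.2): the sum of log P(w_k) over the sequence."""
    return sum(math.log(model[word]) for word in sequence)


model = train_unigram("the cat sat on the mat".split())
forward = unigram_log_prob(["the", "cat", "sat"], model)
shuffled = unigram_log_prob(["sat", "the", "cat"], model)
```

Here `forward` and `shuffled` are equal up to floating-point rounding. A word that never occurred in training raises a `KeyError` in this sketch, since the model assigns it no probability at all.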

One fundamental problem in applying N-gram language models is to
estimate the N-gram probabilities $P(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$. The simplest and
most intuitive way of estimating such probabilities is maximum likelihood
estimation (Bishop 2006), in which one looks for the configuration of those
unknown probabilities that maximizes the likelihood function over a given set of
training data. For the general case of maximum likelihood estimation for N-gram
language models, one estimates the conditional probability as follows,

$$
P(w_k \mid w_{k-N+1}, \ldots, w_{k-1}) = \frac{C(w_{k-N+1}, \ldots, w_{k-1} w_k)}{C(w_{k-N+1}, \ldots, w_{k-1})} \tag{6.3}
$$

where $C(w_{k-N+1}, \ldots, w_{k-1})$ is the frequency of the word sequence $w_{k-N+1}, \ldots, w_{k-1}$
in the training corpus.
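Eq. (6.3) amounts to dividing two counts, as the sketch below shows for a toy corpus. The function name and the corpus are assumptions; contexts are counted only where they are followed by a word, so that the estimates for each context sum to one.

```python
from collections import Counter


def mle_ngram_probs(tokens, n):
    """Maximum likelihood N-gram estimates following Eq. (6.3).

    Returns a dict mapping each observed N-gram (as a tuple) to
    C(context + word) / C(context).
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngram_counts[ngram] += 1
        context_counts[ngram[:-1]] += 1  # the preceding N-1 words
    return {
        ngram: count / context_counts[ngram[:-1]]
        for ngram, count in ngram_counts.items()
    }


bigram_probs = mle_ngram_probs("the cat sat on the mat".split(), n=2)
```

In this corpus ‘the’ is followed once by ‘cat’ and once by ‘mat’, so each of the two bigrams gets half of the probability mass. Unobserved bigrams such as (‘the’, ‘dog’) are absent from the table, which corresponds to a zero probability.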
One important concept in maximum likelihood estimation for N-gram language
models is “smoothing.” Because observations in the training data are sparse,
zero probability is assigned to some word sequences, so that any longer sequence
containing them also has zero probability in the estimated model. Various
types of techniques have been developed to smooth a language model, e.g., Laplace
smoothing, Good-Turing discounting and linear interpolation. Since this topic is
beyond the scope of this book, we refer readers to the following literature
for more details (Jurafsky and Martin 2009; Chen and Goodman 1996; Zhai and
Lafferty 2001b).
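As one concrete instance, Laplace (add-one) smoothing for a bigram model adds one to every bigram count and the vocabulary size to every context count. The sketch below is a minimal illustration; the function name, the toy corpus and the choice of taking the vocabulary from the training tokens are assumptions.

```python
from collections import Counter


def laplace_bigram_prob(tokens, prev, word, vocab=None):
    """Add-one smoothed estimate of P(word | prev).

    (C(prev word) + 1) / (C(prev) + |V|), where contexts are counted only
    where they are followed by another word.
    """
    vocab = set(tokens) if vocab is None else set(vocab)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


corpus = "the cat sat on the mat".split()
seen = laplace_bigram_prob(corpus, "the", "cat")    # observed bigram
unseen = laplace_bigram_prob(corpus, "the", "sat")  # never observed, yet non-zero
```

The unseen bigram now receives a small positive probability, and the smoothed probabilities for a given context still sum to one over the vocabulary.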

6.1.2 Probabilistic Topic Models
Topic models are a class of generative models for uncovering the underlying
semantic structure of a document collection. The original idea of topic
modeling is rooted in the seminal work of Deerwester et al. on latent semantic indexing
(LSI) (Deerwester et al. 1990), in which singular value decomposition is performed
to discover inter- and intra-document statistical structures in a lower dimensional
space. However, this approach is not a generative model, making it unclear how to
interpret the latent topics discovered. A significant step forward in this direction was
made by Hofmann (1999), who solved the problem of latent semantic indexing in a
probabilistic fashion (the pLSI model). In pLSI, words and documents are modeled
from a generative perspective: a document is modeled as a mixture of latent topics
and each topic is modeled as a multinomial distribution over words. However, the pLSI
model is not a complete generative model, because it does not specify the generation
process at the document level. To address this problem, a fully Bayesian probabilistic
model, the latent Dirichlet allocation (LDA) model (Blei et al. 2003), was introduced, in
which the topic proportion in each document is assumed to be drawn from a
Dirichlet distribution shared across the corpus. LDA is an important milestone which
opened up many possibilities for further development of various generative models
for modeling topics. It has served as a springboard for many other topic models in
analyzing different types of text data, including scientific literature (Steyvers et al.
2004; Blei and Lafferty 2007; Wang and Blei 2011), social media (Zhao et al. 2011;
Hong and Davison 2010) and opinionated text reviews (Titov and McDonald 2008a;
Lin and He 2009; Mei et al. 2007; Jo and Oh 2011; Wang et al. 2011).
In this section, we will briefly introduce these two basic probabilistic topic
models for text modeling, i.e., pLSI and LDA. We will focus on the basic notations,
generative assumptions, graphical model representation, and model estimation
procedure for each model.

6.1.2.1 pLSI

Probabilistic latent semantic indexing (pLSI), also known as probabilistic latent
semantic analysis (pLSA), is a generative model for document modeling. It models
a text document as a mixture over a set of latent topics, and each topic is modeled as
a probability distribution over a fixed vocabulary. To formally describe the pLSI
model, and later other more advanced topic models, we will first introduce some
notation and terminology.
Formally, a word $w$ is the basic unit defined in a fixed size vocabulary,
indexed from 1 to $V$. A document is a length-$N$ sequence of words, denoted as
$d = (w_1 w_2 \ldots w_N)$. A corpus is a collection of $M$ documents, denoted as
$D = \{d_1, d_2, \ldots, d_M\}$. In pLSI, a corpus is assumed to contain a set of $k$ latent topics, each
of which is modeled as a multinomial distribution over the vocabulary, i.e., $p(w \mid \beta_i)$,
where $\beta_i$ is the distribution parameter for topic $i$. Thus a document is modeled
as a composition of those $k$ topics: each word in a document is generated from a
single topic indexed by $z$, and different words in a document may be generated from
different topics.
An important assumption made in the pLSI model is that given the topic
assignments $z = (z_1 z_2 \ldots z_N)$ for the words in a document $d$, the words are
independent of the document index. As a result, the joint probability of document $d$
and its words $w_1 w_2 \ldots w_N$ can be computed as,

$$
P(d, w_1 w_2 \ldots w_N) = P(d) \prod_{i=1}^{N} \sum_{z_i} P(w_i \mid z_i)\, P(z_i \mid d) \tag{6.4}
$$

The decomposition of the joint probability of a document and its words in pLSI can
be described by the following generative process:
1. For each $d \in D$, sample $d$ by $d \sim p(d)$;
2. To generate each word $w_i \in d$,
a. Sample topic assignment $z_i$ by $z_i \sim p(z \mid d)$;
b. Sample word $w_i$ by $w_i \sim p(w \mid \beta, z_i)$.
Using the graphical model representation, the above generation process of a text
document defined by the pLSI model can be illustrated as in Fig. 6.1.
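The generative process above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sample_plsi_document`, the fixed document length, and the toy probability tables are assumptions made for the example, and only the standard library is used.

```python
import random

def sample_plsi_document(p_d, p_z_given_d, p_w_given_z, length, rng):
    """Sample one document index and `length` (topic, word) pairs under pLSI."""
    n_topics = len(p_w_given_z)
    vocab_size = len(p_w_given_z[0])
    # Step 1: d ~ p(d)
    d = rng.choices(range(len(p_d)), weights=p_d)[0]
    topics, words = [], []
    for _ in range(length):
        # Step 2a: z_i ~ p(z | d)
        z = rng.choices(range(n_topics), weights=p_z_given_d[d])[0]
        # Step 2b: w_i ~ p(w | beta, z_i)
        w = rng.choices(range(vocab_size), weights=p_w_given_z[z])[0]
        topics.append(z)
        words.append(w)
    return d, topics, words

# Toy model (hypothetical numbers): 2 documents, 2 topics, 4 vocabulary words.
rng = random.Random(0)
p_d = [0.5, 0.5]
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]
p_w_given_z = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]]
print(sample_plsi_document(p_d, p_z_given_d, p_w_given_z, 8, rng))
```

Note that the document variable is drawn from a table indexed by training documents, which is exactly why the sketch cannot generate a document that is not already in `p_d`.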

6 Generative Models for Sentiment Analysis and Opinion Mining

Fig. 6.1 Graphical model representation of probabilistic latent semantic indexing (pLSI) model.
The plates represent replicates, where the index on the bottom right corner indicates the number
of repetitions. The outer plate represents documents, while the inner plate represents the repeated
choice of topics and words within a document. The circles represent random variables, where
shaded circles indicate observable variables and light circles indicate latent variables

Consider the unigram language model described in Eq. (6.2), which assumes the
whole corpus only contains one topic and every word in documents is sampled from
that topic. pLSI relaxes this assumption by introducing k latent topics in a given
collection, and allows each document to be a mixture over those k topics. Hence, in
pLSI each document is represented as a list of mixing proportions for these mixture
components (i.e., $p(z \mid d)$) and thereby reduced to a probability distribution on a fixed
set of topics. Those mixing proportions can be considered as a lower dimensional
representation of a document, which can also be regarded as useful knowledge about
coverage of topics in each document.
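To make the mixture view concrete, the word distribution of a document is the topic-weighted average $p(w \mid d) = \sum_z p(w \mid z) p(z \mid d)$. The sketch below computes it for toy numbers; the function name `word_probability` and the values are illustrative assumptions.

```python
def word_probability(p_w_given_z, p_z_given_d):
    """Return p(w | d) = sum_z p(w | z) p(z | d) for every word in the vocabulary."""
    vocab_size = len(p_w_given_z[0])
    n_topics = len(p_z_given_d)
    return [
        sum(p_z_given_d[z] * p_w_given_z[z][w] for z in range(n_topics))
        for w in range(vocab_size)
    ]

# Two toy topics over a three-word vocabulary and one document's mixing proportions.
topics = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
mixing = [0.25, 0.75]
print(word_probability(topics, mixing))

# With a single topic (k = 1) the mixture collapses to the unigram language model.
print(word_probability([[0.2, 0.3, 0.5]], [1.0]))
```

The two-element vector `mixing` is the lower dimensional representation of the document mentioned above: it replaces a distribution over the whole vocabulary with one over $k$ topics.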
The pLSI model has served as a building block in many other generative models for
text documents. Brants et al. used the pLSI model to perform topic-based document
segmentation (Brants et al. 2002), Mei et al. utilized it to model the facets and
opinions in weblogs (Mei et al. 2007) and discover evolutionary theme patterns from
text (Mei and Zhai 2005), Zhai et al. used it for cross-collection comparative text
mining (Zhai et al. 2004), and Lu et al. exploited it for rated aspect summarization
of short comments (Lu et al. 2009).
The pLSI model has two sets of parameters to be estimated, i.e., the word distribution
under a given topic $i$, $p(w \mid \beta_i)$, and the topic proportions in a given document
$d$, $p(z \mid d)$. Due to the existence of latent variables in pLSI (i.e., the topic
assignments of words), the maximum likelihood estimate no longer has a closed-form
solution. The expectation maximization (EM) algorithm (Dempster et al. 1977) is
popularly used to estimate those two sets of parameters. Briefly, the EM algorithm
maximizes a lower bound of the data log-likelihood (where $p(d, w) = \sum_z p(d, w, z)$)
by computing the expectation of the complete data log-likelihood over the latent
variables (i.e., $E_z[\log p(d, w, z)]$). Two steps are alternately executed in the EM
algorithm: in the E-step, the expectation of the complete data log-likelihood over the
latent variables is computed under the current parameters; in the M-step, the model
parameters are updated to maximize this expectation. Since a principled derivation of
the EM algorithm and the proof of its convergence are beyond the scope of this book,
interested readers can refer to Dempster et al. (1977), McLachlan and Krishnan
(2007), and Wu (1983) for more details.
The EM iterations never decrease the data likelihood and are guaranteed to converge
to a local maximum (more precisely, a stationary point) of it. However, there is
no guarantee for an EM algorithm to find the global optimum. As a result, pLSI is
prone to overfitting the data, and good initialization in pLSI becomes very important.
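The E-step and M-step described above can be sketched for pLSI as follows. This is a minimal pure-Python illustration under stated assumptions: the input is a document-by-word count matrix in which every document contains at least one word, the document prior $p(d)$ is left out because it does not interact with the other parameters, and the names `plsi_em`, `normalize` and `log_likelihood` are our own.

```python
import math
import random

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the words given their documents: sum n(d,w) log p(w|d)."""
    k = len(p_w_z)
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                ll += n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(k)))
    return ll

def plsi_em(counts, k, iters=50, seed=0):
    rng = random.Random(seed)
    n_docs, vocab_size = len(counts), len(counts[0])
    # Random (strictly positive) initialization; different seeds may reach different optima.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(vocab_size)]) for _ in range(k)]
    history = []
    for _ in range(iters):
        new_w_z = [[0.0] * vocab_size for _ in range(k)]
        new_z_d = [[0.0] * k for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(vocab_size):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior p(z | d, w) under the current parameters.
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(k)])
                # Accumulate expected counts for the M-step.
                for z in range(k):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: re-normalize the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(log_likelihood(counts, p_z_d, p_w_z))
    return p_z_d, p_w_z, history

counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 1, 2, 5]]
p_z_d, p_w_z, history = plsi_em(counts, k=2, iters=30, seed=1)
print(history[0], history[-1])
```

Running the sketch with several values of `seed` and keeping the run with the highest final log-likelihood is a simple way to reduce the dependence on initialization.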

Another source of overfitting in the pLSI model is its incomplete generative
process: the document variable d is simply modeled as an index in the corpus, and
there is no generative assumption about it. As a result, the number of parameters
in the model grows linearly with the size of the corpus (each document has its own
k-dimensional topic proportion vector), and it is not clear how to assign probability
to a document outside of the training set.
To address these limitations, the latent Dirichlet allocation model was later
introduced to impose a full generative assumption about the document generation process.
We will introduce the LDA model in the next section.
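A quick count makes the growth in parameters tangible. The sketch below simply counts vector entries (not free parameters after normalization); the LDA count assumes one $k$-dimensional Dirichlet parameter shared by the corpus, and the function names and corpus sizes are illustrative assumptions.

```python
def plsi_param_count(k, vocab_size, n_docs):
    # k topic-word distributions plus one k-dimensional proportion vector per document
    return k * vocab_size + k * n_docs

def lda_param_count(k, vocab_size):
    # k topic-word distributions plus one shared k-dimensional Dirichlet parameter
    return k * vocab_size + k

for n_docs in (1_000, 100_000):
    print(n_docs, plsi_param_count(10, 5_000, n_docs), lda_param_count(10, 5_000))
```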

6.1.2.2

LDA and Advanced Topic Models

Latent Dirichlet allocation (LDA), proposed by Blei et al. (2003), introduces a shared
Dirichlet distribution over the topic proportions in each
document to control the number of parameters in a topic model. As shown in
Fig. 6.2, the topic proportion $p(z \mid \theta, d)$ in document $d$ is modeled as a multinomial
distribution parameterized by a $k$-dimensional vector $\theta$, which is assumed to be
drawn from a Dirichlet distribution with $\alpha$ as the concentration parameter,

$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1} \qquad (6.5)$$

where $\Gamma(\cdot)$ is the Gamma function.
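Eq. (6.5) is easy to evaluate numerically if it is computed in log space with the log-Gamma function. The helper below is a small sketch using only Python's standard library; the name `dirichlet_log_pdf` is our own.

```python
import math

def dirichlet_log_pdf(theta, alpha):
    """Log of Eq. (6.5): log p(theta | alpha) for a point theta on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))

# With alpha = (1, 1, 1) the density is uniform over the simplex.
print(math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])))
# Larger alpha values concentrate the mass around balanced proportions.
print(math.exp(dirichlet_log_pdf([0.5, 0.5], [2.0, 2.0])))
```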
According to Fig. 6.2, the generative process of documents specified by an LDA
model can be described as follows:
1. For each $d \in D$, sample $\theta$ by $\theta \sim \mathrm{Dir}(\alpha)$;
2. For each $w_i \in d$,
a. Sample topic assignment $z_i$ by $z_i \sim p(z \mid \theta, d)$;
b. Sample word $w_i$ by $w_i \sim p(w \mid \beta, z_i)$.
The corresponding joint probability of words $w$, latent topic assignments $z$,
and latent topic proportion $\theta$ in document $d$ specified by an LDA model can be
computed as,

$$p(w, z, \theta \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(w_n \mid \beta, z_n)\, p(z_n \mid \theta) \qquad (6.6)$$

Fig. 6.2 Graphical model representation of latent Dirichlet allocation (LDA) model. $\alpha$ and $\beta$ are
corpus-level parameters for the distribution of topic proportion in documents and word distribution
under topics (Blei et al. 2003)
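The LDA generative process and the joint probability in Eq. (6.6) can be sketched together. This is an illustrative standard-library example rather than a reference implementation: `beta` is assumed to be a list of $k$ word distributions with strictly positive entries, the document length is fixed by the caller, and all function names are our own.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def sample_lda_document(alpha, beta, length, rng):
    theta = sample_dirichlet(alpha, rng)                        # theta ~ Dir(alpha)
    topics = rng.choices(range(len(beta)), weights=theta, k=length)   # z_n ~ p(z | theta)
    words = [rng.choices(range(len(beta[z])), weights=beta[z])[0]     # w_n ~ p(w | beta, z_n)
             for z in topics]
    return theta, topics, words

def log_joint(words, topics, theta, alpha, beta):
    """Log of Eq. (6.6): log p(w, z, theta | alpha, beta)."""
    log_p = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_p += sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))
    for w, z in zip(words, topics):
        log_p += math.log(beta[z][w]) + math.log(theta[z])
    return log_p

rng = random.Random(0)
alpha = [1.0, 1.0]
beta = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
theta, topics, words = sample_lda_document(alpha, beta, 6, rng)
print(theta, topics, words, log_joint(words, topics, theta, alpha, beta))
```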

The LDA model postulates a two-layer hierarchical Bayesian assumption in the
document generation process: the topic proportion $\theta$ is drawn from a Dirichlet
distribution, and the specific topic assignment of each word is drawn from a
multinomial distribution specified by $\theta$. The conjugacy between the Dirichlet distribution
and the multinomial distribution provides additional computational advantage, which
facilitates posterior inference. Compared to the pLSI model, the topic proportion
$\theta$ is now modeled as a latent variable, rather than a model parameter. It thus makes
the number of parameters in the LDA model independent of the size of the training corpus, and
provides a principled way to estimate the topic proportion in unseen test documents,
i.e., via statistical posterior inference.
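The computational benefit of conjugacy can be seen in a few lines: if the topic assignments of a document were known, the posterior over $\theta$ would again be a Dirichlet distribution whose parameters are the prior parameters plus the topic counts. The sketch below illustrates this standard result; the function names are our own.

```python
def dirichlet_posterior(alpha, topic_assignments):
    """Posterior Dirichlet parameters of theta given observed topic assignments."""
    counts = [0] * len(alpha)
    for z in topic_assignments:
        counts[z] += 1
    return [a + c for a, c in zip(alpha, counts)]

def dirichlet_mean(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]

posterior = dirichlet_posterior([0.5, 0.5, 0.5], [0, 0, 2, 0])
print(posterior, dirichlet_mean(posterior))
```

In a real document the assignments are latent, so inference methods alternate between (or jointly approximate) the assignments and the proportions; the closed-form update above is the building block they rely on.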
Many extensions of LDA have been made. Blei and Lafferty replaced the
Dirichlet prior for the topic proportion in documents with a log-normal distribution
to model the covariance of topics in a corpus (Blei and Lafferty 2007). Temporal
dynamics of word distribution under topics in a given corpus are modeled in Blei
and Lafferty (2006). Both continuous supervision (Mcauliffe and Blei 2008), e.g.,
opinion ratings, and discrete supervision (Zhu et al. 2009; Ramage et al. 2009), e.g.,
sentiment class, are introduced into LDA. Teh et al. introduced another layer of
Bayesian hierarchy over the generation of Dirichlet parameter ˛ (Teh et al. 2006),
such that the clustering property of documents can be captured.
Because of the coupling between the continuous variable $\theta$ and discrete variable
$z$ in a document, the posterior inference in the LDA model becomes more challenging
than that in the pLSI model. The two most popular inference methods are Gibbs
sampling (Griffiths and Steyvers 2004) and variational inference (Blei et al.
2003). Both inference methods take advantage of the conjugacy between the Dirichlet
distribution and the multinomial distribution to facilitate the computation, e.g., $\theta$ can
be integrated out in Gibbs sampling and a closed form solution for $\theta$ exists in
variational inference. Further details about those two inference procedures can
be found in Andrieu et al. (2003) and Wainwright and Jordan (2008). Parallel
implementations of the LDA model for large-scale document collections can be found
in Smola and Narayanamurthy (2010), Andrieu et al. (2003), Zhai et al. (2012),
and Wang et al. (2009). The parameter estimation in an LDA model can also be
achieved via EM algorithms (Blei et al. 2003).
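As an illustration of how integrating out $\theta$ simplifies inference, the following is a compact sketch of a collapsed Gibbs sampler in the style of Griffiths and Steyvers (2004). It assumes symmetric Dirichlet priors on both the topic proportions (`alpha`) and the topic-word distributions (`eta`, a name we introduce here; the chapter treats $\beta$ as a parameter instead), and it is meant to show the bookkeeping, not to be an efficient implementation.

```python
import random

def collapsed_gibbs_lda(docs, k, vocab_size, alpha=0.1, eta=0.01, sweeps=200, seed=0):
    """docs: list of documents, each a list of word ids in range(vocab_size)."""
    rng = random.Random(seed)
    n_dk = [[0] * k for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(k)]  # word counts per topic
    n_k = [0] * k                                # total words per topic
    z = []
    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(k)
            z_d.append(t)
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1
        z.append(z_d)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                t = z[d][i]
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                # Conditional of z_i given all other assignments (theta integrated out).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + eta) / (n_k[j] + vocab_size * eta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    return z, n_dk, n_kw

docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 0, 1], [4, 3, 3]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, k=2, vocab_size=5, sweeps=50)
print(n_dk)
```

After sampling, the topic proportion of document $d$ can be estimated from the smoothed counts, e.g., proportionally to `n_dk[d][j] + alpha`.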

6.2 Generative Models for Sentiment Analysis
With the basic concepts about generative modeling of text documents introduced
in the previous section, we are now ready to discuss how to utilize generative
models for sentiment analysis. Before diving into the details of specific models, we
will first define some categorizations of generative models for sentiment analysis
to facilitate our later discussions. Following the notion proposed by Mimno and
McCallum (2008), we can categorize most existing
generative models for sentiment analysis as upstream models and downstream
models, according to their particular dependency assumption among the sentiment
label $s$, topic assignment $z$ and observed word $w$ in a given document. Using the
language of graphical models, we can illustrate these two classes of generative
models for sentiment analysis as in Fig. 6.3.

Fig. 6.3 Basic categorization of generative models for sentiment analysis (Mimno
and McCallum 2008). (a) Upstream model. (b) Downstream model
Upstream models assume that in order to generate a word $w_{d,n}$ in a given
document $d$, one needs to first decide the sentiment polarity $s_{d,n}$ of this word, and $s_{d,n}$
then determines the topic assignment $z_{d,n}$ for this word. Upstream models usually
model sentiment as discrete labels and assume there are different topic proportions
under different sentiment labels. In contrast, downstream models assume the
sentiment label $s_{d,n}$ is determined by the topic assignment $z_{d,n}$, in parallel to the word
$w_{d,n}$. Therefore, downstream models are more flexible in modeling the sentiment,
e.g., continuous ratings can also be modeled (Mcauliffe and Blei 2008; Wang et al.
2011). The key difference between the two kinds of models lies in the way we
specify the dependency.
Intuitively, in the upstream models, topics and words are potentially dependent on
the sentiment variable; the sentiment variable can thus be regarded as “upstream”, with its
influence on other variables directly captured in the model. In the downstream models, the
sentiment variable is assumed to depend on topics, so it can
be regarded as “downstream”, and the model attempts to capture how other
variables (mostly topics) influence the sentiment variable. Since a downstream model treats
sentiment as a response variable of the topic variable, it opens up many different ways to model
sentiment, and can easily model numerical ratings, which would be hard to model
with an upstream model.
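The difference between the two families is only the order in which the variables are sampled, which a short sketch makes explicit. The example uses word-level sentiment labels and lets the word depend on both sentiment and topic in the upstream case; both choices, as well as the function and argument names, are illustrative assumptions rather than a specific published model.

```python
import random

def draw(weights, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def sample_upstream(p_s, p_z_given_s, p_w_given_sz, rng):
    s = draw(p_s, rng)                   # sentiment first
    z = draw(p_z_given_s[s], rng)        # topic depends on sentiment
    w = draw(p_w_given_sz[s][z], rng)    # word depends on sentiment and topic
    return s, z, w

def sample_downstream(p_z, p_w_given_z, p_s_given_z, rng):
    z = draw(p_z, rng)                   # topic first
    w = draw(p_w_given_z[z], rng)        # word depends on topic
    s = draw(p_s_given_z[z], rng)        # sentiment is a response to the topic
    return s, z, w

rng = random.Random(0)
print(sample_upstream([0.5, 0.5], [[0.7, 0.3], [0.2, 0.8]],
                      [[[0.6, 0.4], [0.5, 0.5]], [[0.1, 0.9], [0.3, 0.7]]], rng))
print(sample_downstream([0.5, 0.5], [[0.6, 0.4], [0.2, 0.8]],
                        [[0.9, 0.1], [0.3, 0.7]], rng))
```

In the downstream sketch the last line could equally well draw a real-valued rating, which is the flexibility discussed above.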
One thing we need to emphasize about the graphical representation illustrated
in Fig. 6.3 is that we do not explicitly distinguish the scope of sentiment label $s$,
e.g., a document-level label vs. a word-level variable. In some existing models, $s$
is considered as a document-level variable, such that all $s_{d,n}$ are forced to share the
same value (Mcauliffe and Blei 2008; Wang et al. 2011); while some models treat $s$
as a word-level or sentence-level variable, so that different words or sentences in the
same document might be associated with different sentiments (Jo and Oh 2011; Lin
and He 2009; Mei et al. 2007). Another factor not specified in Fig. 6.3 is whether
$s_{d,n}$ is observable or latent. In most downstream models, $s_{d,n}$ is considered as an
observable random variable, e.g., sentiment class label for the documents (Mcauliffe
and Blei 2008). Some upstream models treat $s_{d,n}$ as latent variables, and a sentiment
prior is introduced to guide the corresponding model learning process, e.g., in Mei
et al. (2007), Lin and He (2009), and Jo and Oh (2011); while some consider it as
document-level observable variables (Ramage et al. 2009, 2011).
Following this categorization, we will introduce the basic modeling assumptions,
model specifications and interesting findings and results from upstream and
downstream models for sentiment analysis in the following sections.

6.2.1 Upstream Models for Sentiment Analysis
Upstream models assume that to generate a word in a text document, one needs to
first sample a latent sentiment label, then sample a topic label with respect to this
sentiment category, and finally sample the word from this chosen topic. One typical
upstream generative model for sentiment analysis is the Topic-Sentiment Mixture
model (TSM) proposed in Mei et al. (2007). TSM is constructed based on the pLSI
model: in addition to assuming a corpus consists of k topics with neutral sentiment,
TSM introduces two additional sentiment models, one for positive and one for
negative opinions. In TSM, the sentiment models are assumed to be orthogonal
to topic models in the sense that they would assign high probabilities to general
words that are frequently used to express sentiment polarities whereas topical
models would assign high probabilities to words representing topical contents with
neutral opinions. For example, for a collection of MP3 player reviews, the words
“nano,” “price” and “mini” are supposed to be observed more often in the neutral
topic models, “awesome,” “love” are more likely to be found in positive sentiment
models, and “hate,” “bad” are more likely to be found in negative sentiment models.
A new concept called “theme” is then introduced in TSM and it is modeled as a
compound of these three components: neutral words, positive words and negative
words, in each document. The combination of topic models and sentiment models
creates a theme about a particular aspect with certain sentiment polarity in a
given document. And such combination varies across different documents to reflect
users’ distinct sentiment polarities toward the same aspect. Once the themes are
determined, a document is modeled as a mixture over the themes, and the rest of the
generation process follows that of the pLSI model.
We followed the representation used in Mei et al. (2007) to depict the TSM model
in Fig. 6.4. We should note this representation does not follow the conventional
graphical model representation of probabilistic models. According to the figure,
the generation of words from the document-specific themes follows the same
assumption as that in a pLSI model. The themes in a particular document are
modeled as another mixture over the corpus-level neutral, positive and negative
topics. As a result, a TSM model can be considered as a three-layer Bayesian model
of documents.
Fig. 6.4 Illustration of Topic-Sentiment Mixture model. $\{\theta_1, \theta_2, \ldots, \theta_k\}$, $\theta_P$ and $\theta_N$ labeled with
“Neutral,” “Positive” and “Negative” in the dashed round box denote the neutral, positive and
negative topics in the corpus, respectively. $\{\theta_1, \theta_2, \ldots, \theta_k\}$ located in the dashed round box labeled
with “Themes” denote the themes of a particular document. A theme is modeled as a mixture
over the latent neutral, positive and negative topics; and the mixing weights are denoted as
$\{\delta_{i,d,F}, \delta_{j,d,P}, \delta_{j,d,N}\}$ for each specific theme $i$. $B$ represents the background topic model, and words
in a given document are sampled from a mixture of the themes and background topic (Mei et al.
2007)

Since the TSM model is based on the pLSI model, an EM algorithm with closed-form
posterior inference is possible. TSM is unsupervised and it does not directly
model sentiment labels. In TSM, a sentiment prior extracted from an external corpus
is introduced into the EM algorithm to guide the parameter estimation of the sentiment
models. Thus a collection of text data with sentiment labels is needed to induce
priors for effective separation of positive and negative topics, but the sample text
data does not have to be related to the opinionated text data to be analyzed. With the
learned topic models and sentiment models in TSM, topic life cycles and sentiment
dynamics can be extracted from text documents. These mining results provide
unique insights about the latent sentiment conveyed in unstructured text data.
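The layered mixture that Fig. 6.4 depicts can be written as a single word probability: a word comes from the background model with some weight, and otherwise from one of the document's themes, each of which interpolates a neutral topic with the positive and negative sentiment models. The sketch below is our reading of that structure; the argument names (`lambda_b` for the background weight, `pi_d` for the theme proportions, `delta_d` for the per-theme neutral/positive/negative weights) and the toy numbers are assumptions made for illustration.

```python
def tsm_word_probability(w, lambda_b, background, pi_d, delta_d,
                         neutral_topics, positive, negative):
    """Probability of word id `w` in one document under a TSM-style mixture."""
    theme_part = 0.0
    for j, neutral in enumerate(neutral_topics):
        d_neutral, d_pos, d_neg = delta_d[j]
        # A theme interpolates one neutral topic with the two sentiment models.
        theme = d_neutral * neutral[w] + d_pos * positive[w] + d_neg * negative[w]
        theme_part += pi_d[j] * theme
    return lambda_b * background[w] + (1.0 - lambda_b) * theme_part

# Toy vocabulary of 4 words: two topical words, one positive, one negative.
background = [0.25, 0.25, 0.25, 0.25]
neutral_topics = [[0.7, 0.3, 0.0, 0.0], [0.2, 0.8, 0.0, 0.0]]
positive = [0.0, 0.0, 0.9, 0.1]
negative = [0.0, 0.0, 0.1, 0.9]
pi_d = [0.6, 0.4]
delta_d = [(0.5, 0.4, 0.1), (0.7, 0.1, 0.2)]
probs = [tsm_word_probability(w, 0.3, background, pi_d, delta_d,
                              neutral_topics, positive, negative) for w in range(4)]
print(probs, sum(probs))
```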
Because the TSM model is based on the pLSI model, it also suffers from the same
limitations, e.g., it is prone to overfitting and can hardly generalize to unseen documents.
Several follow-up works try to address these limitations with LDA’s modeling assumptions.
Lin and He (2009) proposed a joint sentiment and topic (JST)
model for sentiment analysis. In the JST model, a corpus is assumed to contain $S \times k$
topics, where $S$ is the number of sentiment categories, e.g., positive, negative and
neutral. As a result, in JST the combination of topics and sentiments is modeled as
a Cartesian product between topic models and sentiment models, which plays a role
similar to that of the linear interpolation assumed in the TSM model.
As an upstream model, the JST model first samples a sentiment label and then samples
the topic assignment and the word from the corresponding distributions. To generate a
document with the JST model, one needs to first sample a sentiment mixture for that
document from a shared Dirichlet distribution; and under each sentiment category,
sample a topic mixing proportion from another corpus-level Dirichlet distribution.

Fig. 6.5 Graphical model representation of Joint Sentiment and Topic (JST) model. $\varphi$ is an
S-by-T matrix controlling the word distribution under each sentiment-topic combination. $\pi$ is the
sentiment mixture proportion in a given document, and it is assumed to be drawn from a Dirichlet
distribution with parameter $\gamma$. $l$ is a specific sentiment assignment for word $w$, and it also controls
the topic assignment $z$ of this word. $\theta$ is $S$ $k$-dimensional vectors, which denote the topic proportion
under each sentiment class in this document (Lin and He 2009)

Specifically, the topic proportion in each document is modeled as $S$ $k$-dimensional
vectors, which allow different topic mixtures under different sentiment categories.
Gibbs sampling is used to perform the posterior inference of latent variables in JST,
e.g., latent topic assignments, sentiment and topic mixture. The graphical model
representation of JST model is illustrated in Fig. 6.5.
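The JST generative story can be sketched as follows: one sentiment mixture per document, one topic mixture per sentiment label, and a word distribution for every sentiment-topic pair. This is a simplified illustration using only the standard library; the argument names (`gamma`, `alpha`, `phi`) are our own labels for the Dirichlet parameters and the sentiment-topic word distributions, and the word distributions are passed in rather than sampled.

```python
import random

def sample_dirichlet(alpha, rng):
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def sample_jst_document(gamma, alpha, phi, length, rng):
    """gamma: Dirichlet parameter over S sentiments; alpha: over k topics;
    phi[l][z]: word distribution for sentiment l and topic z."""
    pi = sample_dirichlet(gamma, rng)                        # sentiment mixture of the document
    theta = [sample_dirichlet(alpha, rng) for _ in gamma]    # one topic mixture per sentiment
    tokens = []
    for _ in range(length):
        l = rng.choices(range(len(pi)), weights=pi)[0]               # sentiment label
        z = rng.choices(range(len(alpha)), weights=theta[l])[0]      # topic under that sentiment
        w = rng.choices(range(len(phi[l][z])), weights=phi[l][z])[0] # word from the (l, z) pair
        tokens.append((l, z, w))
    return pi, theta, tokens

rng = random.Random(0)
phi = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
       [[0.2, 0.2, 0.6], [0.3, 0.3, 0.4]]]
print(sample_jst_document([1.0, 1.0], [0.5, 0.5], phi, 5, rng)[2])
```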
Given that the JST model is also unsupervised, a sentiment prior is vital for
it. Sentiment seed words are injected as the prior for the word distribution under
different topics in JST. The authors reported that without the sentiment prior, JST’s
performance in sentiment categorization is close to random (Lin and He 2009).
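One simple way to inject seed words, sketched below, is to start from a uniform prior weight for every sentiment-word pair and then set the weight to zero wherever a seed word is paired with a sentiment label other than its own. This is an assumption-laden illustration of the idea rather than the exact procedure of Lin and He: the helper name `seeded_word_prior`, the base value, and the tiny lexicon are all hypothetical.

```python
def seeded_word_prior(vocab, seed_words, n_sentiments, base=0.01):
    """Return an n_sentiments x |vocab| matrix of prior weights.

    seed_words maps a word to the index of its sentiment label.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(n_sentiments)]
    for word, label in seed_words.items():
        if word not in index:
            continue  # seed word does not occur in this corpus
        for s in range(n_sentiments):
            if s != label:
                prior[s][index[word]] = 0.0  # forbid the word under other sentiments
    return prior

vocab = ["good", "bad", "battery", "screen"]
seeds = {"good": 0, "bad": 1, "excellent": 0}   # 0 = positive, 1 = negative
for row in seeded_word_prior(vocab, seeds, n_sentiments=2):
    print(row)
```

Words that are not in the seed list keep the same prior weight under every sentiment, so their polarity is left to be learned from co-occurrence with the seeds.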
Jo and Oh’s Aspect and Sentiment Unification Model (ASUM) (Jo and Oh 2011) employs the same
generative assumption as that in the JST model. But to enforce the topic and sentiment
coherence inside a document, they further assumed that all the words in one sentence
share the same topic and sentiment assignment. The same posterior inference
procedure as that in the JST model is applied in ASUM, which takes the sentence as the
basic unit for inference. Because ASUM is based on the same generation assumption
as JST, it also heavily depends on sentiment seed words to differentiate different
types of sentiments.
A different variant of upstream generative model for sentiment analysis is
proposed by Zhao et al. (2010). In particular, a Maximum
Entropy (ME) model is introduced into the LDA model to control the selection of words
from background topic, aspect-specific topics and opinion-specific topics. In the
proposed ME-LDA model, a given word can be generated from five different types
of topics: background topic, general aspect topic, aspect-specific topics, general
opinion topic and aspect-specific opinion topics. A particular word’s assignment to
those five types of topics is controlled by a Maximum Entropy model based on discriminative
features extracted from the previous, current and next words’ POS tags and word
content. The authors used a set of training sentences with labeled background,
aspect and opinion words to estimate the ME model beforehand. With this ME model
pretrained on a separately labeled corpus, ME-LDA should really be regarded as a
hybrid of generative and discriminative models.
Generative topic models have been used as building blocks in many other
sentiment analysis tasks. Lu and Zhai used the pLSI model to integrate opinions expressed
in a well-written expert review with the many opinions scattered across various sources
such as blogspaces and forums (Lu and Zhai 2008). A sentiment prior is given to
the pLSI model to identify sentiment-oriented aspects from expert reviews. Such
sentiment-oriented aspects are then used to retrieve the most relevant sentences from
various sources of opinionated text data. Later on, Lu et al. used topics learned from
pLSI models as a lower dimensional representation of documents for clustering (Lu
et al. 2009). In each aspect-specific document cluster, the overall sentiment rating
is aggregated to predict aspect-level opinions.
From the discussion above, we can observe that most of the typical upstream
generative models for sentiment analysis treat the sentiment label as a latent variable
over each word, and a sentiment prior is used to inject sentiment polarity into the
models. Although such a modeling approach provides flexibility of identifying
distinct opinions on individual words, strong knowledge about sentiment is required
to ensure satisfactory analysis results. As an alternative solution, Ramage et al.’s
Labeled-LDA model provides a different perspective of modeling sentiment with
topics in an upstream model (Ramage et al. 2009). Specifically, in the Labeled-LDA
model, sentiment can be modeled as a document-level variable, which is directly
observable, and the choice of document sentiment labels affects the topic mixing
proportion in this document. Later on, the partially Labeled-LDA model was developed
to handle the situation in which some of the labels are not directly observable in a
document (Ramage et al. 2011).

6.2.2 Downstream Models for Sentiment Analysis
Downstream models reverse the generation assumption between the sentiment
labels and latent topic assignments: to generate a text document, one needs to
first select the topic assignments in this document, and sample the words and
sentiment labels with respect to those topics. One typical downstream generative
model for sentiment analysis is Blei and McAuliffe’s supervised LDA (sLDA)
model (Mcauliffe and Blei 2008). The graphical model representation of sLDA
model is illustrated in Fig. 6.6.
Fig. 6.6 Graphical model representation of supervised Latent Dirichlet Allocation
(sLDA) model. $y$ is the response variable observed in document $d$ (Mcauliffe and
Blei 2008)

The assumed generation process of text content in the sLDA model is identical to
that assumed in the LDA model. In addition to document generation, sLDA assumes the
document-level response variable $y$ is drawn from a Gaussian distribution with mean
$\eta^{\top} \bar{z}$ and standard deviation $\sigma$, in which $\bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n$, i.e., the mean vector of topic
assignments in document $d$. With this continuous assumption about the response
variable $y$, sLDA can be used as a regression model to model the opinion ratings
in text documents. The generation of $y$ can be further modeled with a generalized
linear model, e.g., a logistic model, to model discrete sentiment classes. Variational
inference similar to that used in the LDA model can be applied in the sLDA model for
posterior inference. Later on, Zhu et al. introduced the idea of maximum margin
training in the sLDA model for better predictive performance (Zhu et al. 2009). Wang
and Blei extended sLDA to a collaborative setting (Wang and Blei 2011), where
collaborative filtering based on users’ opinion ratings can be achieved in the latent
topic space.
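The response part of sLDA can be sketched in a few lines: the topic assignments of a document are averaged into an empirical topic frequency vector, and the rating is a Gaussian draw around a linear function of that vector. The function names, the coefficient values and the example assignments below are illustrative assumptions.

```python
import random

def mean_topic_vector(topics, k):
    """Empirical topic frequencies: the average of the one-hot topic assignments."""
    counts = [0] * k
    for z in topics:
        counts[z] += 1
    return [c / len(topics) for c in counts]

def response_mean(topics, eta):
    z_bar = mean_topic_vector(topics, len(eta))
    return sum(e * z for e, z in zip(eta, z_bar))

def sample_response(topics, eta, sigma, rng):
    """Draw the document-level response y ~ N(eta^T z_bar, sigma^2)."""
    return rng.gauss(response_mean(topics, eta), sigma)

topics = [0, 0, 1, 2]            # topic assignment of each word in a document
eta = [2.0, -1.0, 0.0]           # one regression coefficient per topic
print(response_mean(topics, eta), sample_response(topics, eta, 0.5, random.Random(0)))
```

A topic with a large positive coefficient pushes the predicted rating up whenever it covers a large share of the document, which is what makes the learned coefficients interpretable as topic-level sentiment.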
Boyd-Graber and Resnik further generalized sLDA model to perform holistic
sentiment analysis across languages (Boyd-Graber and Resnik 2010). In their
proposed MLSLDA model, topics are organized according to some shared semantic
structure that can be represented as a tree, and the sentiment label in a given
document is modeled as a regression response variable with respect to the topic
assignments. As a result, MLSLDA simultaneously identifies how multilingual
concepts are clustered into thematically coherent topics and how topics associated
with text connect to the sentiment ratings.
Lin et al. (2012) performed an interesting reparameterization
of JST to turn their original upstream JST model into a new downstream joint
sentiment-topic model, named Reverse-JST. In Reverse-JST, it is assumed that to
generate the word sequence in a given document, one needs to first sample topic
assignment, then sample sentiment category with respect to the selected topic,
and select a word under this topic sentiment combination. Without the sentiment
seed words being specified, the JST model and Reverse-JST model are essentially
the same, since both of them model the combination of topics and sentiments
with Cartesian product. The authors’ empirical evaluation indicates JST performs
consistently better than Reverse-JST when sentiment seed words are available.
One important line of research in downstream generative models for sentiment
analysis focuses on aspect-level understanding of opinions. Those aspect ratings
can be understood as users’ sentiment polarities over the latent topics in a given
document. This line of research exploits and analyzes user-generated opinionated
text content at the detailed topical aspect level and enables a deeper and more
detailed understanding of user opinions.
Titov and McDonald developed an LDA-based generative model called the Multi-Aspect
Sentiment (MAS) model for joint modeling of text content and aspect
ratings for sentiment summarization (Titov and McDonald 2008b). In their solution,
two types of topics, i.e., global and local topics, are explicitly modeled; and each
fraction inside a document (modeled as a moving window of sequential words
in the document) is assumed to be a mixture over those global and local topics.
Based on the latent topic assignments, aspect ratings are assumed to be determined
by a logistic regression model, which takes the topic assignments and the word
sequence in that window as input. Compared to the sLDA model, which only captures
the document-level sentiment, MAS enables the understanding of sentiment at a finer
granularity, in which the detailed prediction of aspect-level opinions is possible.
However, in MAS the aspect-level sentiment labels are assumed to be known to
the model during the training phase. This limits the application of this type of
aspect-level sentiment analysis when such detailed annotations are not available. Wang et
al.’s work on latent aspect rating analysis (LARA) (Wang et al. 2010, 2011) alleviates
the dependency on fully annotated data and enables in-depth understanding of
users’ opinions at the aspect level. In the LARA model, the overall rating is assumed
to be observable in a given document and it provides guidance for the estimation of
the corresponding latent aspect ratings. Moreover, in addition to analyzing opinions
expressed in a text document at the level of topical aspects to discover each individual
user’s latent opinion on each aspect, the LARA model also identifies the relative
preference users have placed on those different aspects when forming the overall
judgment.
A two-stage approach based on bootstrapping aspect segmentation and a latent
rating regression model was first proposed to solve the problem of LARA in Wang
et al. (2010). This solution assumes that a set of predefined keywords specifying
the latent topical aspects is available. The overall sentiment rating in a document
is assumed to be drawn from a mixture of the latent aspect ratings. Via posterior
inference, the overall rating can be decomposed into aspect ratings, and the inferred
mixing weights reflect users’ preference over those latent aspects.
However, this two-step solution is not a fully generative model, because it does
not specify the generation of text content in a document. Later on, a unified solution
based on the LDA model was introduced to jointly identify the latent topical aspects,
and infer the latent aspect weights/ratings from each user’s opinionated review
article (Wang et al. 2011). As shown in Fig. 6.7, in the unified LARA model, each
latent aspect rating in a given document is assumed to be drawn from a Gaussian
distribution with mean determined by the linear combination of words assigned to
that aspect, e.g., $s_i \sim N\left(\sum_{n=1}^{N} \eta_{ij}\, \mathbb{1}[w_n = v_j, z_n = i],\ \delta^2\right)$, where $\eta_{ij}$ denotes the
weight of vocabulary word $v_j$ under aspect $i$. Intuitively, the latent
topic assignments $z$ segment the text content into different aspects, and the observed
words in each aspect segment contribute to the sentiment polarity of the corresponding
aspect rating. Then the observable overall rating is assumed to be drawn from
another linear combination of these latent aspect ratings, i.e., $r \sim N(\gamma^{\top} s, \sigma^2)$,
where $\gamma$ denotes the aspect weights. Variational inference is used to infer the latent
topic assignments, aspect ratings and weights in a given document simultaneously.

Fig. 6.7 Latent aspect rating analysis (LARA) model. $s$ is a $K$-dimensional vector indicating
the aspect-level latent opinion ratings. $r$ denotes observable document-level opinion rating.
Specifically, the LARA model assumes the overall rating $r$ is determined by the weighted average
of aspect ratings, i.e., $r \sim N(\gamma^{\top} s, \sigma^2)$ (Wang et al. 2011)
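The two linear combinations in the unified LARA model can be illustrated with a small sketch that computes only the Gaussian means: first the mean of each aspect rating from the words assigned to that aspect, then the mean of the overall rating from the aspect ratings. The names `word_weights` and `aspect_weights`, and all numbers, are hypothetical and chosen for illustration; the Gaussian noise terms are omitted.

```python
def aspect_rating_means(words, topics, word_weights, n_aspects):
    """Mean of each aspect rating: the sum of the weights of the words assigned to it."""
    means = [0.0] * n_aspects
    for w, z in zip(words, topics):
        means[z] += word_weights[z][w]   # word w contributes under its assigned aspect z
    return means

def overall_rating_mean(aspect_weights, aspect_ratings):
    """Mean of the overall rating: a weighted combination of the aspect ratings."""
    return sum(a * s for a, s in zip(aspect_weights, aspect_ratings))

# Toy review with 4 words over a 3-word vocabulary and 2 aspects.
words = [0, 1, 2, 1]
topics = [0, 0, 1, 1]
word_weights = [[1.0, 0.5, 0.0],      # sentiment weight of each word under aspect 0
                [0.0, -1.0, 2.0]]     # and under aspect 1
ratings = aspect_rating_means(words, topics, word_weights, n_aspects=2)
print(ratings, overall_rating_mean([0.6, 0.4], ratings))
```

Note that the same word (id 1) contributes positively under one aspect and negatively under the other, which is how such a model can capture aspect-dependent polarity.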
Clearly distinct from all previous work in opinion analysis that mostly focuses
on integrated entity-level opinions, LARA reveals individual users’ latent sentiment
preferences at the level of topical aspects in an unsupervised manner. Discovering
such detailed user preferences (which are often hard to obtain by a human from
simply reading many reviews) enables many important applications. First, such
analysis facilitates in-depth understanding of user intents. For example, by mining
the product reviews, LARA recognizes which aspect influences a particular user’s
purchase decision the most. Second, by identifying each user’s latent aspect
preference in a particular domain (e.g., hotel booking), personalized result ranking
and recommendation can be achieved. Third, discovering the general population’s
sentiment preferences over different aspects of a particular product or service
provides a more effective way for businesses to manage their customer relationship
and conduct market research.
Follow-up work extended the LARA model in different directions. Diao et al.
introduced collaborative filtering into LARA modeling to uniformly model different
users’ rating preferences in a generative manner (Diao et al. 2014). Wu and Ester
also combined the LARA model with a collaborative filtering method to predict the
latent aspect ratings even when the users have not generated the review content (Wu
and Ester 2015). Both of these models enable aspect-based recommendation.

6.3 Applications of Generative Models for Sentiment Analysis
In the above discussions, we have summarized the most representative works in
modeling opinionated text documents with generative models. In this section, we
review the landscape of application opportunities of such models.

6.3.1 Sentiment Lexicon Construction
A sentiment lexicon can be directly used for sentiment tagging or suggesting
useful features for supervised learning approaches to sentiment analysis. One major
challenge in constructing a sentiment lexicon is that the polarity of a word such as

“long” highly depends on the context; for example, “long battery life” is positive,
while “long rebooting time” is negative in the same review of a laptop. Thus a
lexicon must incorporate context when specifying the polarity of a word.
A generative model can capture context by using appropriate latent variables, and
thus be useful for constructing a topic-specific sentiment lexicon. The sentiment
polarity of a word can be modeled in two different ways in a generative model.
In the first, we may explicitly have a positive or negative topic represented as a
word distribution. In such a case, the probability of a word can be regarded as an
indicator of polarity, thus a word with very high probability according to a positive
model would be tagged as a positive word and the probability can be used as a
measure of confidence which may be useful to include in the lexicon. In the second,
the sentiment of a term is modeled with a real number, which can be positive or
negative, depending on the sentiment of the word. In such a case, a high positive
weight would indicate a very positive word (for the corresponding topic).
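The first representation can be sketched in a few lines of code. This is a minimal illustration, assuming a model has already produced one positive and one negative word distribution for a topic; the toy distributions, the helper name `tag_words`, and the `min_prob` threshold are hypothetical and are not taken from any of the cited models.

```python
def tag_words(pos_dist, neg_dist, min_prob=0.05):
    """Tag each word with the polarity whose word distribution gives it more mass.

    pos_dist / neg_dist map a word to its probability under the positive /
    negative topic. The result maps a word to (label, confidence), where the
    confidence is the probability under the winning distribution.
    """
    lexicon = {}
    for word in set(pos_dist) | set(neg_dist):
        p_pos = pos_dist.get(word, 0.0)
        p_neg = neg_dist.get(word, 0.0)
        label, conf = ("positive", p_pos) if p_pos >= p_neg else ("negative", p_neg)
        if conf >= min_prob:  # drop words that neither distribution supports
            lexicon[word] = (label, conf)
    return lexicon


# Made-up distributions for a "cities" topic (illustrative numbers only).
pos = {"beautiful": 0.30, "love": 0.25, "awesome": 0.20, "traffic": 0.01}
neg = {"hate": 0.35, "traffic": 0.30, "stink": 0.15, "love": 0.02}
lexicon = tag_words(pos, neg)
```

Keeping the probability next to the label preserves the confidence measure mentioned above, so a downstream user of the lexicon can apply a stricter threshold if needed.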
One example of work in the first category is the topic-sentiment mixture model
(Mei et al. 2007). In this work, the authors presented lists of positive and
negative words specific to the topics of “movies” and “cities”: “beautiful,” “love”
and “awesome” are automatically identified as positive for “cities” while “hate,”
“traffic” and “stink” are identified as negative for this topic. Lin and
He (2009) also reported a similar sentiment lexicon learned by the JST model
on a movie review data set. However, as we discussed before, upstream models
depend on sentiment priors to determine the sentiment polarity of learned topics.
Any bias in those sentiment seed words therefore determines the quality of the learned sentiment
lexicon.
Another example of the first category is the downstream model sLDA (Mcauliffe
and Blei 2008). In general, downstream models can remove the dependency on
sentiment priors by learning directly from the given sentiment labels. Mcauliffe
and Blei (2008) applied sLDA to a set of labeled movie reviews,
where the learned topics are directly aligned with numerical sentiment polarities,
e.g., a topic represented by the words “least,” “problem” and “unfortunately” is
strongly correlated with negative opinion while the topic represented by the words
“motion,” “simple” and “perfect” is strongly correlated with positive opinion.
An example of the second category is the LARA model (Wang et al. 2010), which
is also a downstream model. In contrast with sLDA, LARA uses numerical
weights to model the sentiment of a word, and thus can learn a topic-specific lexicon
in the form of positive and negative weights for words. Table 6.1 illustrates a
sample output from the LARA model (Wang et al. 2010), where the aspect-specific
word sentiment polarity was learned from a collection of hotel reviews.
As shown in the table, words “linen”, “walk” and “beach” do not have opinion
annotations in general sentiment lexicons, e.g., SentiWordNet (Esuli and Sebastiani
2006), since they are nouns, while the LARA model automatically assigns them
positive sentiment likely because “linen” may suggest the “cleanliness” condition is
good and “walk” and “beach” might imply the location of a hotel is convenient.
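To make the use of such real-valued weights concrete, the sketch below scores a piece of text for one aspect as a weighted sum of normalized word frequencies, in the spirit of the rating regression idea behind LARA. It is only an illustration under stated assumptions: the weights are invented for this example and are not the values in Table 6.1, and `aspect_score` is a hypothetical helper, not part of any released implementation.

```python
from collections import Counter

# Hypothetical aspect-specific word weights (positive means favorable).
WEIGHTS = {
    "cleanliness": {"clean": 3.0, "linen": 1.0, "filthy": -2.5},
    "location": {"walk": 1.5, "beach": 1.2, "road": -0.5},
}


def aspect_score(tokens, aspect, weights=WEIGHTS):
    """Sum of word weights times normalized word frequency for one aspect."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return sum(weights[aspect].get(w, 0.0) * c / total for w, c in counts.items())


good = aspect_score(["clean", "linen", "clean", "room"], "cleanliness")
bad = aspect_score(["filthy", "linen"], "cleanliness")
```

Note that a noun such as “linen” contributes positively here even though it carries no polarity in a general-purpose lexicon, which is the behavior described above.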
In general, one can potentially design a generative model to embed a particular
perspective of topical context as needed for an application to automatically construct
a topic-specific lexicon that would capture the desired dependency of sentiment on
context. Such a lexicon may itself be used directly as knowledge about people’s
opinions about a topic, thus facilitating comparative analysis of opinions across
opinion holders or other interesting context variables.

Table 6.1 Estimated word sentiment polarities under different aspects. The numbers to the right
of listed words indicate their learned sentiment weight from a LARA model (Wang et al. 2010)

Value                Rooms                Location             Cleanliness
Resort       22.80   View         28.05   Restaurant   24.47   Clean        55.35
Value        19.64   Comfortable  23.15   Walk         18.89   Smell        14.38
Excellent    19.54   Modern       15.82   Bus          14.32   Linen        14.25
Worth        19.20   Quiet        15.37   Beach        14.11   Maintain     13.51
Quality      18.60   Spacious     14.25   Perfect      13.63   Spotlessly    8.95
Bad         -24.09   Carpet       -9.88   Wall        -11.70   Smelly       -0.53
Money       -11.02   Smell        -8.83   Bad          -5.40   Urine        -0.43
Terrible    -10.01   Dirty        -7.85   MRT          -4.83   Filthy       -0.42
Overprice    -9.06   Stain        -5.85   Road         -2.90   Dingy        -0.38
Cheap        -7.31   Ok           -5.46   Website      -1.67   Damp         -0.30

6.3.2 Sentiment Annotation and Pattern Discovery
Another direct application of the generative models for sentiment analysis is
sentiment annotation and pattern discovery. Sentiment annotation tags a text
object with sentiment labels, which can be categorical (e.g., positive vs. negative vs.
neutral) or numerical (i.e., ratings). Once tagging is done, we can easily examine
patterns of opinions by associating sentiment labels with context variables such
as time, location, and sources of opinions, revealing, for example,
spatiotemporal trends of opinions.
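Once labels are available, the pattern discovery step is essentially a group-by over context variables. The following is a minimal sketch, assuming each text object has already been tagged with a numerical rating; the record layout, the function name `sentiment_trend`, and the data are hypothetical.

```python
from collections import defaultdict


def sentiment_trend(records):
    """Average numerical sentiment per (location, month) bucket.

    records is an iterable of (month, location, rating) tuples, where rating
    is the sentiment label already assigned to a text object.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for month, location, rating in records:
        bucket = sums[(location, month)]
        bucket[0] += rating
        bucket[1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


# Made-up annotated records (month, location, rating).
records = [
    ("2014-01", "Seattle", 4.0),
    ("2014-01", "Seattle", 2.0),
    ("2014-02", "Seattle", 5.0),
    ("2014-01", "Boston", 3.0),
]
trend = sentiment_trend(records)
```

Replacing the key with another context variable, such as the source of the opinion, yields a different view of the same annotations.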
In (Lin and He 2009), the JST model is reported to achieve performance comparable to that of supervised statistical algorithms in binary sentiment classification.
sLDA is reported to have better predictive power than the supervised lasso least-squares regression model trained on the LDA model’s topic output (Mcauliffe and Blei
2008). With a maximum margin estimation method, the MedLDA model further improves classification
performance (Zhu et al. 2009). Aspect-level
sentiment models, e.g., MAS (Titov and McDonald 2008b) and LARA (Wang
et al. 2010, 2011), can also predict aspect-level sentiment ratings, which might be
unobservable during the training process, thus enabling discovery of latent patterns
of opinions at the level of subtopics.
Based on the sentiment polarity identified from text content, the temporal dynamics
of opinions in user-generated content are studied with the TSM model (Mei et al. 2007). A
hidden Markov model is built on the neutral, positive
and negative opinions identified by the TSM model over time to capture topic life cycles and sentiment
dynamics. A similar idea has been explored in Si et al. (2013) to leverage topic-based
sentiments from Twitter to help predict the stock market. A continuous Dirichlet
Process Mixture model is developed to estimate the daily topic set, which is mapped
to a sentiment time series according to a predefined sentiment lexicon. A regression
model is then built to predict the stock index with respect to this Twitter sentiment time
series.
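The final step of such a pipeline, regressing an index on a sentiment time series, can be illustrated with an ordinary least-squares fit. This is a deliberately simplified sketch, assuming a one-variable linear model on the previous day's sentiment score; it is not the model used by Si et al. (2013), and both series below are made-up numbers.

```python
def fit_line(x, y):
    """Closed-form least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Made-up daily series: sentiment[t] is used to predict index[t + 1].
sentiment = [0.1, 0.4, -0.2, 0.3, 0.0]
index = [100.0, 100.2, 100.8, 99.6, 100.6, 100.0]

a, b = fit_line(sentiment, index[1:])
next_day = a + b * 0.2  # forecast for the day after a sentiment score of 0.2
```

The one-day lag is an assumption of this sketch; in practice the lag and the set of predictors would be chosen on held-out data.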

6.3.3 Topic-Specific Sentiment Summarization
Yet another interesting application of the generative sentiment analysis models is to
generate topic-specific sentiment summaries. Summarization of opinions facilitates
digestion of opinions by users and also provides entry points for a user to navigate
into detailed information about a specific aspect of opinion. In (Jo and Oh 2011),
review text content is summarized according to its topic and sentiment.
Table 6.2 illustrates the aspect-specific sentiment summarization reported in Wang
et al. (2010). Such detailed aspect-level sentiment analysis and summarization
provide flexibility for ordinary users to navigate through an opinionated text corpus.
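One plausible way to assemble a contrastive, aspect-based summary is to keep the highest- and lowest-rated sentence for each aspect. The sketch below shows this selection rule only; it assumes the aspect assignment and rating of each sentence come from an upstream aspect-level sentiment model, and it is not necessarily how the cited work produced its summaries. The sentences and ratings are invented.

```python
def summarize_by_aspect(sentences):
    """Pick the highest- and lowest-rated sentence for each aspect.

    sentences is an iterable of (aspect, text, rating) tuples. The result maps
    each aspect to ((best_text, best_rating), (worst_text, worst_rating)).
    """
    summary = {}
    for aspect, text, rating in sentences:
        best, worst = summary.get(aspect, (None, None))
        if best is None or rating > best[1]:
            best = (text, rating)
        if worst is None or rating < worst[1]:
            worst = (text, rating)
        summary[aspect] = (best, worst)
    return summary


# Made-up sentences with hypothetical predicted aspect ratings.
sentences = [
    ("Value", "Great price for the location.", 4.2),
    ("Value", "Parking fees were steep.", 1.9),
    ("Value", "Fair deal overall.", 3.0),
    ("Room", "The bed was comfortable.", 3.8),
    ("Room", "The heater was noisy.", 1.4),
]
summary = summarize_by_aspect(sentences)
```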

Table 6.2 Aspect-based comparative summarization (Hotel Max in Seattle as an example) (Wang
et al. 2010)

Aspect    Rating  Summary
Value     3.1     Truly unique character and a great location at a reasonable price Hotel
                  Max was an excellent choice for our recent three night stay in Seattle
          1.7     Overall not a negative experience, however considering that the hotel
                  industry is very much in the impressing business there was a lot of room
                  for improvement
Room      3.7     We chose this hotel because there was a Travelzoo deal where the Queen
                  of Art room was $139.00/night
          1.2     Heating system is a window AC unit that has to be shut off at night or
                  guests will roast
Location  3.5     The location ‘a short walk to downtown and Pike Place market’ made the
                  hotel a good choice
          2.1     When you visit a big metropolitan city, be prepared to hear a little traffic
                  outside!

6.3.4 Deep Analysis of Latent Preferences of Opinion Holders
An important application enabled by generative models is deep analysis of latent
preferences of opinion holders. While the applications discussed above can all
be potentially supported by other approaches to sentiment analysis, the deep
analysis of latent preferences of opinion holders cannot be easily supported by
other approaches, and thus represents a unique advantage of generative models for
sentiment analysis. This unique benefit comes from the explicit use of meaningful
latent variables in a generative model to model and capture the latent information
about an opinion holder.

Table 6.3 User rating behavior analysis (Wang et al. 2010)

              Expensive hotel      Cheap hotel
Aspect        5 Stars   3 Stars    5 Stars   1 Star
Value         0.134     0.148      0.171     0.093
Room          0.098     0.162      0.126     0.121
Location      0.171     0.074      0.161     0.082
Cleanliness   0.081     0.163      0.116     0.294
Service       0.251     0.101      0.101     0.049

For example, the aspect-level sentiment analysis provided by the LARA model
enables an in-depth understanding of users’ sentiment preferences in their decision
making process. In (Wang et al. 2010), the authors demonstrated the learned aspect
weights in a hotel data set (see Table 6.3); such latent weights unveil
reviewers’ detailed sentiment preferences over those aspects.
It is interesting to note that according to the learned aspect preference weights
in Table 6.3, reviewers give the “expensive hotels” high ratings mainly due to their
nice services and locations, while they give low ratings to such hotels because of
undesirable room conditions and overpricing. In contrast, reviewers give the “cheap”
hotels high ratings mostly because of the good price/value and good location, while
giving them low ratings for poor cleanliness. Such analysis can be performed
for different groups of hotels, different groups of consumers, different time
periods, etc., thus potentially enabling many interesting applications. Note that such
a deep understanding of reviewers cannot be easily achieved by other approaches
to sentiment analysis; indeed, it cannot be easily achieved by humans even if
they read all the reviews, thus representing an important benefit of using generative
models for sentiment analysis.
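The comparison above amounts to ranking aspects by their learned weight within each group of reviews. The sketch below does this for the weights of Table 6.3, under the assumption that the table is read row by row (Value, Room, Location, Cleanliness, Service) with two columns per hotel type; the helper name `top_aspects` is hypothetical.

```python
ASPECTS = ["Value", "Room", "Location", "Cleanliness", "Service"]

# Aspect weights from Table 6.3, read row by row (an assumption about the layout).
WEIGHTS = {
    ("expensive", "5 stars"): [0.134, 0.098, 0.171, 0.081, 0.251],
    ("expensive", "3 stars"): [0.148, 0.162, 0.074, 0.163, 0.101],
    ("cheap", "5 stars"): [0.171, 0.126, 0.161, 0.116, 0.101],
    ("cheap", "1 star"): [0.093, 0.121, 0.082, 0.294, 0.049],
}


def top_aspects(group, k=2):
    """Return the k aspects with the largest weight for a (hotel type, rating) group."""
    ranked = sorted(zip(ASPECTS, WEIGHTS[group]), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


high_expensive = top_aspects(("expensive", "5 stars"))
high_cheap = top_aspects(("cheap", "5 stars"))
low_cheap = top_aspects(("cheap", "1 star"), k=1)
```

Under this reading, service and location dominate the five-star reviews of expensive hotels, value and location dominate those of cheap hotels, and cleanliness dominates the one-star reviews of cheap hotels, consistent with the discussion above.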
Such a deep understanding of latent preferences would further enable many
applications, particularly those requiring a better understanding of people’s behavior and
preferences and finding groups of people with shared preferences. Examples include
market research where we want to understand consumers’ preferences, business
intelligence where we want to understand the relative strengths and weaknesses of a
product with respect to another product for a particular group of consumers, and
targeted advertising where the goal is to discover groups of consumers that may
potentially find a product appealing.


6.3.5 Entity Ranking and Recommendation
Generative models enable detailed understanding of opinions about entities such as
products as well as detailed understanding of preferences of people such as reviewers.
Thus they can be used to generate more informative representations for both
entities and users, which further helps improve the ranking and recommendation
of entities for users.
For example, based on the identified aspect preferences, collaborative filtering
can be performed. In (Wang and Blei 2011), scientific article recommendation is
performed based on the latent topics learned for each individual user from their rating
history. Compared with traditional collaborative filtering solutions, which can only
provide item-level recommendations, the collaborative topic model enables topic-specific recommendations. Diao et al.’s JMARS model identifies users’ aspect-level
sentiment preferences and the content distribution in their review content
(Diao et al. 2014). Improved recommendation performance is reported compared
with traditional collaborative filtering solutions.
In LARA (Wang et al. 2010), the inferred reviewer preferences can be leveraged
to support personalized entity recommendation. Specifically, a user can specify his
or her preferences (e.g., price is much more important than service or location), and
the system can selectively use only those reviews that are written by reviewers
with similar preferences to recommend hotels, instead of using the generic set
of all reviews, making the recommendation more accurately reflect the specific
preferences of this particular group of users. Such a personalized recommendation
is only possible because of the inferred latent preference information, which enables
us to know which reviewers have put more weight on price than on location and
service.
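The selection step described above can be sketched as a similarity filter over inferred aspect weights. This is a minimal illustration, assuming each review comes with an inferred aspect-weight vector and that cosine similarity with a fixed threshold decides which reviewers count as like-minded; the function names, the threshold `min_sim`, and the data are hypothetical and do not reproduce the procedure of the cited work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user_pref, reviews, min_sim=0.9):
    """Rank hotels by mean rating, using only reviews from like-minded reviewers.

    user_pref is the aspect-weight vector stated by the user. reviews is a list
    of (hotel, overall_rating, aspect_weights), where aspect_weights are the
    latent preferences inferred for the reviewer of that review.
    """
    totals = {}
    for hotel, rating, weights in reviews:
        if cosine(user_pref, weights) >= min_sim:
            s, n = totals.get(hotel, (0.0, 0))
            totals[hotel] = (s + rating, n + 1)
    means = {hotel: s / n for hotel, (s, n) in totals.items()}
    return sorted(means, key=means.get, reverse=True)


# Aspect order: (value, location, service). All numbers are made up.
reviews = [
    ("A", 5.0, [0.7, 0.2, 0.1]),
    ("A", 1.0, [0.1, 0.2, 0.7]),
    ("B", 3.0, [0.8, 0.1, 0.1]),
    ("B", 4.0, [0.6, 0.3, 0.1]),
]
personalized = recommend([0.8, 0.1, 0.1], reviews)
generic = recommend([0.8, 0.1, 0.1], reviews, min_sim=0.0)
```

In this toy example the price-sensitive user is shown hotel A first, whereas averaging over all reviews would rank hotel B first, because the low rating of A comes from a reviewer who cares mostly about service.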

6.3.6 Social Network and Social Media Analysis
Generative model-based solutions for sentiment analysis have also been explored
in the context of social networks. Liu et al. explored topic modeling techniques to
study topic-level influence in heterogeneous networks (Liu et al. 2010). Rao et
al. developed a supervised topic model to analyze emotion based on social media
content (Rao et al. 2014). Xu et al. developed a pLSI-based generative model to
analyze users’ posting behaviors on Twitter: via generative modeling, the motivation
behind a user’s posting behavior is decomposed into the factors of breaking news, posts
from social friends and the user’s intrinsic interest.


6.4 Resources on the Web
Most of the aforementioned generative models for sentiment analysis have open
implementations online, and there are also publicly available sentiment data sets
on the Web. In this section, we briefly summarize some resources for this line
of research.
David M. Blei maintains a page for topic modeling, where implementations of
many LDA-based generative models (e.g., the LDA (Blei et al. 2003) and sLDA
(Mcauliffe and Blei 2008) models) are provided:
http://www.cs.princeton.edu/~blei/topicmodeling.html. The Stanford Natural Language Processing group provides a
Topic Modeling Toolbox, which can easily import and manipulate text from cells
in Excel and other spreadsheets. This toolbox focuses on helping social scientists
and others who wish to perform analysis on datasets that have a substantial textual
component. Implementations of the LDA and Labeled-LDA (Ramage et al. 2009) models are provided in this toolbox. Andrew McCallum and David Mimno developed
a Java-based package for statistical text document modeling named MALLET
(McCallum 2002), which provides implementations of several aforementioned topic
models, e.g., the LDA model. Besides those generic implementations of standard topic
models, there are also implementations of the specific generative models for
sentiment analysis introduced above. The authors of the JST model (Lin and He 2009)
provide their implementation on GitHub at https://github.com/linron84/JST, and
the authors of the LARA model (Wang et al. 2010) provide their implementation of
the two-step solution at http://www.cs.virginia.edu/~hw5x/Codes/LARA.zip.
Besides those open implementations of generative models, there are also public
sentiment data sets available on the Web. The Stanford Network Analysis Project
provides a large collection of Amazon reviews, spanning a period of 18 years,
including around 35 million reviews up to March 2013. The data can be found
at http://snap.stanford.edu/data/web-Amazon.html. The author of the book “Sentiment
Analysis and Opinion Mining” (Liu 2012) also provides a large collection of Amazon
reviews at http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html, where additional sentence-level positive and negative annotations are available for a subset of
reviews. Yelp.com hosts an annual “Yelp Dataset Challenge,” which provides more
than 1.6 million Yelp reviews from more than 366k users. Besides the text content
and opinion ratings, this Yelp data set also includes the social connections among
those reviewers. In addition to those user review data sets, Twitter data sets with
sentiment annotations are also available. Go et al. manually created a collection of
40,216 tweets with polarity sentiment labels (Go et al. 2009). This data set can be
found at http://help.sentiment140.com/for-students. Shamma et al. used Amazon
Mechanical Turk to annotate sentiment polarities in 3,269 tweets posted during
the presidential debate on September 26, 2008 between Barack Obama and John
McCain (Shamma et al. 2009). The data set can be found at
https://bitbucket.org/speriosu/updown/src/5de483437466/data/. Saif et al. provided a survey of datasets
for Twitter sentiment analysis (Saif et al. 2013).


6.5 Summary
In this chapter, we provided an introduction to and a systematic review of generative
models for sentiment analysis, which represent an important family of (mostly
unsupervised) approaches to sentiment analysis that can be potentially applied
to any opinionated text data due to their generality and robustness. They are
especially powerful in inferring latent variables about opinion holders or detailed
opinions about specific subtopics and can very effectively perform joint analysis
of both opinionated text data and the companion numerical ratings. Besides supporting common applications of sentiment analysis such as sentiment classification,
sentiment lexicon construction, and sentiment summarization, they also enable
many other interesting new applications such as topic-specific lexicon construction,
detailed opinion pattern discovery in association with context variables such as time,
location, and sources, personalized entity ranking and recommendation, and deep
analysis of latent preferences of opinion holders. When using appropriate latent
variables, such generative models can discover latent opinion patterns from large
amounts of data that are hard for humans to discover even if they have time to read
all the opinionated text data. They are thus essential tools for building intelligent systems
for opinion understanding and its related applications, as well as for research in
computational social science.

References
Andrieu, C., N. De Freitas, A. Doucet, and M.I. Jordan. 2003. An introduction to MCMC for
machine learning. Machine Learning 50(1–2): 5–43.
Bishop, C.M. 2006. Pattern recognition and machine learning. New York: Springer.
Blei, D.M. 2012. Probabilistic topic models. Communications of the ACM 55(4): 77–84.
Blei, D.M., and J.D. Lafferty. 2006. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, 113–120. ACM.
Blei, D.M., and J.D. Lafferty. 2007. A correlated topic model of science. The Annals of Applied
Statistics 1(1): 17–35.
Blei, D.M., A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine
Learning Research 3: 993–1022.
Boyd-Graber, J., and P. Resnik. 2010. Holistic sentiment analysis across languages: Multilingual
supervised latent Dirichlet allocation. In Proceedings of the 2010 Conference on Empirical
Methods in Natural Language Processing (EMNLP ’10), 45–55, Stroudsburg. Association for
Computational Linguistics.
Brants, T., F. Chen, and I. Tsochantaridis. 2002. Topic-based document segmentation with
probabilistic latent semantic analysis. In Proceedings of the Eleventh International Conference
on Information and Knowledge Management, 211–218. ACM.
Cambria, E., S. Poria, R. Bajpai, and B. Schuller. 2016. SenticNet 4: A semantic resource for
sentiment analysis based on conceptual primitives. In COLING, 2666–2677.
Chen, S.F., and J. Goodman. 1996. An empirical study of smoothing techniques for language
modeling. In Proceedings of the 34th Annual Meeting on Association for Computational
Linguistics, 310–318. Association for Computational Linguistics.


Dave, K., S. Lawrence, and D.M. Pennock. 2003. Mining the peanut gallery: Opinion extraction
and semantic classification of product reviews. In Proceedings of the 12th International
Conference on World Wide Web (WWW ’03), 519–528. New York: ACM.
Deerwester, S.C., S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman. 1990. Indexing
by latent semantic analysis. Journal of the American Society for Information Science 41(6): 391–407.
Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via
the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological) 39: 1–38.
Diao, Q., M. Qiu, C.-Y. Wu, A.J. Smola, J. Jiang, and C. Wang. 2014. Jointly modeling aspects,
ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 193–202.
ACM.
Ding, X., and B. Liu. 2007. The utility of linguistic rules in opinion mining. In Proceedings
of the 30th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, 811–812. ACM.
Ding, X., B. Liu, and P.S. Yu. 2008. A holistic lexicon-based approach to opinion mining. In
Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM
’08), 231–240. New York: ACM.
Esuli, A., and F. Sebastiani. 2006. Sentiwordnet: A publicly available lexical resource for opinion
mining. In Proceedings of LREC, vol. 6, 417–422. Citeseer.
Go, A., R. Bhayani, and L. Huang. 2009. Twitter sentiment classification using distant supervision.
CS224N Project Report, Stanford, 1: 12.
Griffiths, T.L., and M. Steyvers. 2004. Finding scientific topics. Proceedings of the National
Academy of Sciences, 101(suppl 1): 5228–5235.
Harris, Z.S. 1954. Distributional structure. Word.
Hearst, M.A., S.T. Dumais, E. Osman, J. Platt, and B. Scholkopf. 1998. Support vector machines.
Intelligent Systems and their Applications, IEEE, 13(4): 18–28.
Hiemstra, D., and W. Kraaij. 1998. Twenty-one at TREC7: ad-hoc and cross-language track. In
Proceedings of The Seventh Text REtrieval Conference (TREC 1998), Gaithersburg, 174–185,
9–11 Nov 1998.
Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual
International ACM SIGIR Conference on Research and Development in Information Retrieval,
50–57. ACM.
Hong, L., and B.D. Davison. 2010. Empirical study of topic modeling in twitter. In Proceedings of
the First Workshop on Social Media Analytics, 80–88. ACM.
Hu, X., J. Tang, H. Gao, and H. Liu. 2013. Unsupervised sentiment analysis with emotional
signals. In Proceedings of the 22nd International Conference on World Wide Web, 607–618.
International World Wide Web Conferences Steering Committee.
Jo, Y., and A.H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In
Proceedings of the Fourth ACM International Conference on Web Search and Data Mining,
815–824. ACM.
Joachims, T. 1998. Text categorization with support vector machines: Learning with many relevant
features. Berlin/New York: Springer.
Jordan, A. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression
and naive bayes. Advances in Neural Information Processing Systems 14: 841.
Jurafsky, D., and J.H. Martin. 2009. Speech and language processing: An introduction to natural
language processing, computational linguistics, and speech recognition.
Katz, S.M. 1987. Estimation of probabilities from sparse data for the language model component
of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing 35(3):
400–401.
Kim, S.-M., and E. Hovy. 2004. Determining the sentiment of opinions. In Proceedings of the 20th
International Conference on Computational Linguistics, 1367. Association for Computational
Linguistics.


Leskovec, J., D. Huttenlocher, and J. Kleinberg. 2010. Predicting positive and negative links in
online social networks. In Proceedings of the 19th International Conference on World Wide
Web, 641–650. ACM.
Lin, C., and Y. He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the
18th ACM Conference on Information and Knowledge Management, 375–384. ACM.
Lin, C., Y. He, R. Everson, and S. Rüger. 2012. Weakly supervised joint sentiment-topic detection
from text. IEEE Transactions on Knowledge and Data Engineering 24(6): 1134–1145.
Liu, B. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language
Technologies 5(1): 1–167.
Liu, B. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge:
Cambridge University Press.
Liu, L., J. Tang, J. Han, M. Jiang, and S. Yang. 2010. Mining topic-level influence in heterogeneous
networks. In Proceedings of the 19th ACM International Conference on Information and
Knowledge Management (CIKM ’10), 199–208. New York: ACM.
Lu, Y., M. Castellanos, U. Dayal, and C. Zhai. 2011. Automatic construction of a context-aware
sentiment lexicon: An optimization approach. In Proceedings of the 20th International
Conference on World Wide Web, 347–356. ACM.
Lu, Y., and C. Zhai. 2008. Opinion integration through semi-supervised topic modeling. In
Proceedings of the 17th International Conference on World Wide Web, 121–130. ACM.
Lu, Y., C. Zhai, and N. Sundaresan. 2009. Rated aspect summarization of short comments. In
Proceedings of the 18th International Conference on World Wide Web, 131–140. ACM.
McAuley, J., and J. Leskovec. 2013. Hidden factors and hidden topics: Understanding rating
dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender
Systems, 165–172. ACM.
Mcauliffe, J.D., and D.M. Blei. 2008. Supervised topic models. In Advances in Neural Information
Processing Systems, 121–128.
McCallum, A.K. 2002. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu.
McLachlan, G., and T. Krishnan. 2007. The EM algorithm and extensions, vol. 382. John Wiley &
Sons.
Mei, Q., X. Ling, M. Wondra, H. Su, and C. Zhai. 2007. Topic sentiment mixture: Modeling facets
and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide
Web, 171–180. ACM.
Mei, Q., and C. Zhai. 2005. Discovering evolutionary theme patterns from text: An exploration of
temporal text mining. In Proceedings of the Eleventh ACM SIGKDD International Conference
on Knowledge Discovery in Data Mining, 198–207. ACM.
Melville, P., W. Gryc, and R.D. Lawrence. 2009. Sentiment analysis of blogs by combining lexical
knowledge with text classification. In Proceedings of the 15th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD ’09), 1275–1284. New York:
ACM.
Mimno, D., and A. McCallum. 2008. Topic models conditioned on arbitrary features with
Dirichlet-multinomial regression. The 24th Conference on Uncertainty in Artificial Intelligence, 411–418.
Moghaddam, S., and M. Ester. 2011. ILDA: Interdependent LDA model for learning latent aspects
and their ratings from online product reviews. In Proceedings of the 34th International ACM
SIGIR Conference on Research and Development in Information Retrieval, 665–674. ACM.
Pang, B., and L. Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2(1–2): 1–135.
Pang, B., L. Lee, and S. Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine
learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in
Natural Language Processing, vol. 10, 79–86. Association for Computational Linguistics.
Ponte, J.M., and W.B. Croft. 1998. A language modeling approach to information retrieval.
In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR ’98), 24–28 Aug 1998, Melbourne, 275–281.


Poria, S., E. Cambria, A. Gelbukh, F. Bisio, and A. Hussain. 2015. Sentiment data flow analysis
by means of dynamic linguistic patterns. IEEE Computational Intelligence Magazine 10(4):
26–36.
Rabiner, L.R., and B.-H. Juang. 1993. Fundamentals of speech recognition. Upper
Saddle River: Prentice-Hall.
Ramage, D., D. Hall, R. Nallapati, and C.D. Manning. 2009. Labeled LDA: A supervised topic
model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference
on Empirical Methods in Natural Language Processing, vol. 1, 248–256. Association for
Computational Linguistics.
Ramage, D., C.D. Manning, and S. Dumais. 2011. Partially labeled topic models for interpretable
text mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 457–465. ACM.
Rao, D., and D. Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of
the 12th Conference of the European Chapter of the Association for Computational Linguistics,
675–682. Association for Computational Linguistics.
Rao, Y., Q. Li, X. Mao, and L. Wenyin. 2014. Sentiment topic models for social emotion mining.
Information Sciences 266: 90–100.
Saif, H., M. Fernandez, Y. He, and H. Alani. 2013. Evaluation datasets for Twitter sentiment
analysis: A survey and a new dataset, the STS-Gold. In Proceedings of the 1st Workshop on Emotion
and Sentiment in Social and Expressive Media (ESSEM), in conjunction with the AI*IA Conference,
Turin.
Shamma, D.A., L. Kennedy, and E.F. Churchill. 2009. Tweet the debates: Understanding community annotation of uncollected sources. In Proceedings of the First SIGMM Workshop on Social
Media, 3–10. ACM.
Si, J., A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng. 2013. Exploiting topic based twitter
sentiment for stock prediction. In ACL (2), 24–29.
Smola, A., and S. Narayanamurthy. 2010. An architecture for parallel topic models. Proceedings
of the VLDB Endowment 3(1–2): 703–710.
Steyvers, M., P. Smyth, M. Rosen-Zvi, and T. Griffiths. 2004. Probabilistic author-topic models for
information discovery. In Proceedings of the Tenth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 306–315. ACM.
Taboada, M., J. Brooke, M. Tofiloski, K. Voll, and M. Stede. 2011. Lexicon-based methods for
sentiment analysis. Computational Linguistics 37(2): 267–307.
Teh, Y.W., M.I. Jordan, M.J. Beal, and D.M. Blei. 2006. Hierarchical Dirichlet processes. Journal
of the American Statistical Association 101(476).
Titov, I., and R. McDonald. 2008a. Modeling online reviews with multi-grain topic models. In
Proceedings of the 17th International Conference on World Wide Web, 111–120. ACM.
Titov, I., and R.T. McDonald. 2008b. A joint model of text and aspect ratings for sentiment
summarization. In ACL, vol. 8, 308–316. Citeseer.
Wainwright, M.J., and M.I. Jordan. 2008. Graphical models, exponential families, and variational
inference. Foundations and Trends® in Machine Learning 1(1–2): 1–305.
Wang, C., and D.M. Blei. 2011. Collaborative topic modeling for recommending scientific articles.
In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, 448–456. ACM.
Wang, H., Y. Lu, and C. Zhai. 2010. Latent aspect rating analysis on review text data: A rating
regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 783–792. ACM.
Wang, H., Y. Lu, and C. Zhai. 2011. Latent aspect rating analysis without aspect keyword
supervision. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 618–626. ACM.
Wang, Y., H. Bai, M. Stanton, W.-Y. Chen, and E.Y. Chang. 2009. PLDA: Parallel latent Dirichlet
allocation for large-scale applications. In Algorithmic Aspects in Information and Management,
301–314. Springer.


Wu, C.J. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11:
95–103.
Wu, Y., and M. Ester. 2015. Flame: A probabilistic model combining aspect based opinion mining
and collaborative filtering. In Proceedings of the Eighth ACM International Conference on Web
Search and Data Mining, 199–208. ACM.
Zhai, C., and J. Lafferty 2001a. Model-based feedback in the language modeling approach to
information retrieval. In Proceedings of the Tenth International Conference on Information
and Knowledge Management, 403–410. ACM.
Zhai, C., and J. Lafferty 2001b. A study of smoothing methods for language models applied to
ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, 334–342. ACM.
Zhai, C., A. Velivelli, and B. Yu. 2004. A cross-collection mixture model for comparative text
mining. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 743–748. ACM.
Zhai, K., J. Boyd-Graber, N. Asadi, and M.L. Alkhouja. 2012. Mr. LDA: A flexible large scale
topic modeling package using variational inference in mapreduce. In Proceedings of the 21st
International Conference on World Wide Web, 879–888. ACM.
Zhao, W.X., J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li. 2011. Comparing twitter and
traditional media using topic models. In Advances in Information Retrieval, 338–349. Springer.
Zhao, W.X., J. Jiang, H. Yan, and X. Li. 2010. Jointly modeling aspects and opinions with a
maxent-lda hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural
Language Processing (EMNLP ’10), Stroudsburg, 56–65. Association for Computational
Linguistics.
Zhu, J., A. Ahmed, and E.P. Xing. 2009. Medlda: Maximum margin supervised topic models for
regression and classification. In Proceedings of the 26th Annual International Conference on
Machine Learning, 1257–1264. ACM.

Chapter 7

Social Media Summarization
Vasudeva Varma, Litton J. Kurisinkel, and Priya Radhakrishnan

Abstract Social media is an important venue for sharing information and for discussions
and conversations on a variety of topics and events happening across the
globe. Applying automated text summarization techniques to the large volume
of information that accumulates in social media can produce textual summaries in a variety
of flavors, depending on the difficulty of the use case. This chapter describes the
available techniques for generating summaries from different genres of social
media text, with an extensive introduction to extractive summarization techniques.
Keywords Social media summarization • Extractive summarization • Conversational summarization • Event summarization • Sentiment analysis • Attribute
extraction • Semantic similarity • Topic modeling

7.1 Introduction
Text summarization is one of the prominent areas in the domain of computational
text processing. The field is more relevant than ever in the prevailing
era of social media, given the enormous amount of data available in
diverse styles and formats, from tweets and blogs to articles and news reports. Some of
these data, such as tweets and social media posts, stand apart from conventional
formal-styled texts because of their highly informal, often ungrammatical usage.
Nevertheless, their content is no less important than that of any formal
document, because social media data are instantaneous, temporally and topically
relevant and sensitive to world affairs. This is precisely what makes social
media summarization interesting, despite the challenges posed by the data. In this
chapter we first discuss psychological perspectives on social media usage, then
discuss at length a wide range of issues pertinent to the field, present a coherent
description of the prevailing methodologies and show how the
choice of summarization technique varies with the data.

V. Varma (✉) • L.J. Kurisinkel • P. Radhakrishnan
International Institute of Information Technology-Hyderabad, Hyderabad, India
e-mail: vv@iiit.ac.in; litton.jKurisinkel@research.iiit.ac.in; priya.r@research.iiit.ac.in
© Springer International Publishing AG 2017
E. Cambria et al. (eds.), A Practical Guide to Sentiment Analysis,
Socio-Affective Computing 5, DOI 10.1007/978-3-319-55394-8_7


Section 7.2 presents an overview of general approaches to automated text
summarization, with emphasis on extractive summarization techniques. Section 7.2.1
describes recent work on extractive summarization, and Sect. 7.2.2 discusses
the nature of the scoring function for candidate summaries.
Section 7.3, the final part, covers the challenges involved in social media
summarization, general approaches to social media summarization, event summarization, sentiment analysis and summarization, conversational summarization
and emerging trends in social media summarization in Sects. 7.3.1, 7.3.2, 7.3.3,
7.3.4, 7.3.5 and 7.3.6, respectively.

7.1.1 Expressiveness of Social Media
According to Erikson’s psycho-social theory, the phases that characterize the
process of adolescent and adult development include the formation of identity and
the development of intimate relationships. Social networking sites allow people to
engage in activities that reflect their identity. Friendships, romantic relationships
and ideology remain key aspects of adolescent development. These identity
challenges of adulthood are addressed through self-disclosure, particularly with
peers.
Since online interactions offer a level of anonymity and privacy that is
quite uncommon in offline interactions, people tend to express themselves more
openly in this relatively safe environment. Kang (2000) has noted that ‘Cyberspace
makes talking with strangers easier’. People with stigmatized social identities
(such as homosexuality or fringe political beliefs) may be inspired to join and participate in
online groups devoted to that particular identity because of the relative anonymity
and safety of the internet and the shortage of such groups in the offline world (Bargh and
McKenna 2004).
Polarized political opinions, social support groups for various causes and
intimate relationships are expressed more openly online than in the offline
world. This is due to relative insulation from identity disclosure, implicit trust
in the privacy of communication, disruption of the reflexive operation of racial
stereotypes, etc. (Kang 2000). Using online networks requires considerable trust:
we trust that the information we share will not be used in
unlawful or deceitful ways. We write open and confidential messages to our friends
and colleagues and believe that they will remain confidential. For these reasons, data
obtained from social media are more expressive of people’s actual opinions
than most offline interactions.

7.1.2 Need for Text Summarization on Social Media Data
Social media interactions are instrumental in the massive production and sharing of data
in the form of video, images and text. This enormous amount of data can be used to
identify implicit patterns in social behavior, which can in turn inform social surveys,
business decisions or the framing of governmental policies.
The majority of data shared and produced by social media applications is
text, in the form of posts, comments or messages. The data produced
by social media as a consequence of a particular pattern of social behaviour can
be huge in size and noisy. This data needs to be summarized and converted into
interpretable forms so that the information it contains can be utilized for practical
purposes.
The information can be reported in graphical forms such as histograms or pie charts,
which analyze the data on various parameters and present them statistically. But
laypeople who are looking for the opinion of the masses about a movie, an incident or
a retail product may lack the background or the patience to interpret these representations. In
such a context, a noise-free textual summary generated from this huge volume of data
makes it possible to leverage the information for the benefit of a lay end user
who can afford only to skim the content to grasp the information conveyed.
In other words, while statistical representations can effectively capture the information pertaining to specific parameters from a large body of social media data,
a text summary aims to capture the content of various
topics and present a coherent overview of those topics. For example, a statistical
representation may rate the cinematography of a movie as good, with 4 on a scale
of 5, whereas a textual summary may give an overview of what is good in
the cinematography: say, ‘the veteran cinematographer Rajiv Menon has displayed
sheer brilliance in the climax which received critical acclamation’.

7.2 An Overview of Automated Text Summarization
A summary is a text that is produced from one or more texts, that conveys important
information in the original text(s), and that is no longer than half of the original text(s)
and usually significantly less than that. (Radev et al. 2002)

In an era of information explosion, where a large number of information sources co-exist and produce significantly overlapping content, there is an immense
need for automatic means of summarizing this information so that a noise-free essence of all the available information can be brought out. Automated
text summarization techniques provide means for summarizing textual content and
are broadly classified into extractive and abstractive methods. Abstractive summarization techniques convert the source text into an internal semantic representation,
which is in turn used by natural language generation techniques to generate a
summary intended to be equivalent to a human-created summary. Owing to the complexity
of abstractive techniques, the research community has been overwhelmingly
inclined towards extractive techniques. We focus on extractive summarization
techniques in the remainder of this section.
Extractive summarization approaches try to identify, from the original corpus of
textual data, a proper subset of linguistic units that best represents
the original corpus within the constraints of a stipulated summary size. The
linguistic units can be sentences, phrases or short textual entities such as tweets.
The research community has approached the problem of automated
summarization in a variety of ways, but most approaches can be generalized to follow
the three steps given below.
1. Creating an intermediate representation for the target text such that the key
textual features are captured. Possible approaches are topic signatures, word-frequency counts, latent space approaches using matrix factorisation, or
Bayesian approaches.
2. Using the intermediate representation to assign scores for individual linguistic
units within the text.
3. Selecting a set of linguistic units which maximises the total score as the summary
for target text.
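The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the works cited in this chapter: the intermediate representation is plain relative word frequency, a unit's score is the mean frequency of its words, and selection is a single greedy pass under a word budget, which only approximates maximising the total score. The names `tokenize`, `summarize` and `max_words` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(units, max_words):
    """Return the selected units, in their original order."""
    # Step 1: intermediate representation = relative word frequencies.
    counts = Counter(w for u in units for w in tokenize(u))
    total = sum(counts.values())
    freq = {w: c / total for w, c in counts.items()}

    # Step 2: score each unit by the mean frequency of its words.
    def score(unit):
        toks = tokenize(unit)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    # Step 3: take units in decreasing score order while the budget allows.
    ranked = sorted(range(len(units)), key=lambda i: score(units[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(tokenize(units[i]))
        if used + n <= max_words:
            chosen.append(i)
            used += n
    return [units[i] for i in sorted(chosen)]
```

With a budget of four words, a call such as `summarize(posts, 4)` keeps only the unit whose words are most frequent across the whole collection; the off-topic unit is dropped because its words are rare.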
Candidate summaries are those subsets of linguistic units in the original corpus
whose total size falls within the stipulated targeted summary size. The quality of a
candidate summary is estimated with a scoring function and the maximum scoring
candidate summary is chosen as the summary of the corpus. The scoring function
for candidate summaries for a generic summarization purpose is of the form:
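\[ F(S) = \lambda \cdot \mathrm{Coverage}(S) - (1 - \lambda) \cdot \mathrm{Redundancy}(S) \tag{7.1} \]
or
\[ F(S) = \lambda \cdot \mathrm{Coverage}(S) + (1 - \lambda) \cdot \mathrm{Diversity}(S) \tag{7.2} \]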

where $\lambda$ is a constant and $S$ is a candidate summary. The Coverage function
rewards a summary that covers as much information from the original text as possible, the
Redundancy function penalizes a candidate summary for carrying redundant
information, and the Diversity function assigns higher values to candidate summaries with diverse
information.
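A scoring function of the form (7.1) can be made concrete with a small sketch. The definitions below are assumptions chosen for illustration only: coverage is the fraction of distinct corpus words that appear in the summary, redundancy is the mean pairwise Jaccard overlap between selected units, the summary size is bounded by a number of units, and the default weight `lam=0.7` is arbitrary. The exhaustive search in `best_summary` is feasible only for tiny corpora.

```python
from itertools import combinations

def words(unit):
    """Set of lowercase whitespace-separated tokens in a unit."""
    return set(unit.lower().split())

def jaccard(a, b):
    """Jaccard overlap of two units; 0.0 when both are empty."""
    union = words(a) | words(b)
    return len(words(a) & words(b)) / len(union) if union else 0.0

def coverage(summary, corpus):
    """Fraction of distinct corpus words that appear in the summary."""
    vocab = set().union(*(words(u) for u in corpus))
    covered = set().union(*(words(u) for u in summary))
    return len(covered & vocab) / len(vocab) if vocab else 0.0

def redundancy(summary):
    """Mean pairwise Jaccard overlap between units of the summary."""
    pairs = list(combinations(summary, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def score(summary, corpus, lam=0.7):
    """F(S) = lam * Coverage(S) - (1 - lam) * Redundancy(S)."""
    return lam * coverage(summary, corpus) - (1 - lam) * redundancy(summary)

def best_summary(corpus, max_units, lam=0.7):
    """Score every candidate of at most max_units units and keep the best."""
    candidates = (c for r in range(1, max_units + 1)
                  for c in combinations(corpus, r))
    return max(candidates, key=lambda c: score(c, corpus, lam), default=())
```

In this sketch, two near-duplicate units score poorly as a pair because the redundancy term cancels most of the coverage they add, so the search prefers a pair that covers different parts of the vocabulary.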

7.2.1 Recent Developments in Extractive Summarization
Extensive work has been done on extractive summarization that tries to achieve
proper content coverage by scoring and selecting sentences. Typically these
methods extract candidate sentences to be included in the summary and then reorder
them separately. Most extractive summarization research aims to increase the
total salience of the selected sentences while reducing redundancy. Approaches include the
use of Maximum Marginal Relevance (Carbonell and Goldstein 1998), centroid-based summarization (Radev et al. 2002), summarization through keyphrase
extraction (Qazvinian et al. 2010) and formulation as a minimum dominating set
problem (Shen and Li 2010). Graph centrality has also been used to estimate the
salience of a sentence (Erkan and Radev 2004). Approaches to content analysis
include generative topic models (Haghighi and Vanderwende 2009; Celikyilmaz and
Hakkani-Tur 2010; Li et al. 2011a) and discriminative models (Aker et al. 2010).
ILP2 (Galanis et al. 2012) is a system that uses Integer Linear Programming (ILP)
to jointly optimize the importance of the summary’s sentences and their diversity
(non-redundancy), while also respecting the maximum allowed summary length.
It uses a Support Vector Regression model to generate a scoring function for
the sentences. Woodsend and Lapata (2012) arrived at a scoring function that
combines linear components quantifying the salience of bi-grams and the salience of parse tree
nodes with a language-model component that penalises unlikely
sentences. Berg-Kirkpatrick et al. (2011) proposed an approach based on the distribution of important concepts in the
summary, where the concepts are bi-grams
in the corpus to be summarised. They formulated an ILP objective function over the
space of candidate summaries that maximizes the total concept weight of the
summary to be chosen.
Takamura and Okumura (2009) treated multi-document summarization as
a maximum concept coverage problem with a knapsack constraint (MCKP). They
also explored decoding algorithms for solving MCKP in the
summarization task. Lin and Bilmes (2011) formulated summarization as a submodular function maximization problem over the set of candidate summaries,
subject to the space constraint. The primary goal of all these methods
is to achieve maximum content coverage.
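The idea of covering weighted concepts under a length budget can be illustrated with a simple greedy heuristic. The sketch below is not the ILP formulation or the decoding algorithms of the works cited above; it assumes that concepts are word bi-grams, that a concept's weight is the number of sentences containing it, that each concept counts only once, and that length is measured in words. The function names are hypothetical.

```python
from collections import Counter

def bigrams(sentence):
    """Set of adjacent lowercase word pairs in a sentence."""
    toks = sentence.lower().split()
    return set(zip(toks, toks[1:]))

def greedy_concept_coverage(sentences, budget):
    """Pick sentences by newly covered concept weight per word, within budget."""
    # Concept weight = number of sentences that contain the bi-gram.
    weight = Counter(b for s in sentences for b in bigrams(s))
    covered, chosen, used = set(), [], 0
    remaining = list(range(len(sentences)))

    def gain_per_word(i):
        new_concepts = bigrams(sentences[i]) - covered
        length = max(len(sentences[i].split()), 1)
        return sum(weight[b] for b in new_concepts) / length

    while remaining:
        best = max(remaining, key=gain_per_word)
        remaining.remove(best)
        length = len(sentences[best].split())
        # Accept only if it fits the budget and adds some uncovered concept.
        if used + length <= budget and gain_per_word(best) > 0:
            chosen.append(best)
            covered |= bigrams(sentences[best])
            used += length
    total = sum(weight[b] for b in covered)
    return [sentences[i] for i in chosen], total
```

The function returns the selected sentences in the order they were picked, together with the total weight of the concepts they cover. Because each concept is counted once, a second sentence that repeats already-covered bi-grams gains little, which is how coverage-based objectives discourage redundancy without a separate penalty term.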
As far as sentence ordering is concerned, Li et al. (2011b) used context inference
to achieve better sentence ordering, while McKeown et al. (2001) used a majority
ordering algorithm to sort sentences. Lapata (2013) provided an unsupervised
probabilistic model for sentence ordering, while Ji and Yu (2013) used a cluster
adjacency based approach. One disadvantage of these approaches is that although
they can achieve a topical order of sentences, the local
structural relations between sentences are not captured.
The work that pioneered a holistic approach to multi-document summarization, bringing sentence selection and coherence under a single umbrella,
is G-Flow (Janara et al. 2013). The authors built a graph that stores discourse
relations, with edge weights that quantify coherence. This coherence value is linearly
combined with salience and redundancy in the scoring function
to formulate multi-document summarization as a constrained optimization problem.
The system thus accounts for the readability of the extracted sentences in the
output summary by quantifying its coherence by means of the discourse graph, aiming for
good content coverage while keeping the resulting summary readable and coherent.
Varma et al. (2011) and Jagadeesh et al. (2007b) used the Hyperspace Analogue to
Language (HAL) model to create a semantic space of words from word co-occurrence
statistics and leveraged this information for summarization. Chandan
et al. (2008) created a scheme for generating personalised summaries of web
documents by utilizing user-specific information according to the user’s subjective
information need. Chandan et al. (2009) formulated summarization as a decision-making
problem in which the risk, in terms of information loss, associated with selecting each sentence
is estimated, and the set of sentences with the minimum total risk
forms the summary. Rahul et al. (2009) approached summarization
through a sentence position policy, on the assumption that key sentences occur at
specific locations in the text.

7.2.2 Expected Nature of Scoring Function for Candidate
Summary
The scoring function of candidate summaries designed for extractive summarization can be formalized as follows.
For a given corpus containing the set of sentences $V = \{v_1, v_2, \ldots, v_n\}$,
$F : 2^V \to \mathbb{R}$ is a function that returns a real value for any subset $S \subseteq V$. The
summarization function finds a subset of bounded size that maximises $F$, i.e.
\[ S_{sum} = \arg\max_{S \subseteq V} F(S) \tag{7.3} \]
where $|S_{sum}| \le k$ and $k$ is the targeted summary size.
This optimization problem is NP-hard in general. An automated multi-document
summarization approach is nevertheless expected to scale to large document sets and still produce a reliable summary. Lin and Bilmes (2011) observed the importance of monotone submodular functions for the extractive summarization process. Nemhauser et al. (1978) showed
that if $F$ is a monotone non-decreasing submodular function, a greedy algorithm yields an approximate summary $S_{sum}$ such
that
\[ F(S_{sum}) \ge \left(1 - \frac{1}{e}\right) F(S_{opt}) \tag{7.4} \]
where
\[ S_{opt} = \arg\max_{S \subseteq V} F(S) \]
is the optimal summary under the same size constraint.
Minoux (1978) proposed a version of this algorithm that scales to very
large data sets. Submodular functions possess an interesting property of ‘diminishing
returns’, which can be formalised as follows.
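For any $A \subseteq B \subseteq V$ and any $v \in V$ with $v \notin B$ (and hence $v \notin A$), if $F$ is submodular,
\[ F(A + v) - F(A) \ge F(B + v) - F(B) \tag{7.5} \]
where $A + v$ denotes $A \cup \{v\}$, i.e. the value added by $v$ decreases as $A$ grows to $B$. And $F$ is non-decreasing if
\[ \forall A \subseteq B,\quad F(A) \le F(B) \tag{7.6} \]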
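The greedy procedure and the diminishing-returns property can be illustrated together. The sketch below assumes a simple word-coverage objective, the number of distinct words in the selected units, which is monotone non-decreasing and submodular, and a cardinality bound `k`. It is the plain greedy algorithm, not the scalable variant mentioned above, and the names `F`, `marginal_gain` and `greedy_select` are hypothetical.

```python
def F(units):
    """Coverage objective: number of distinct words in the selected units."""
    return len(set().union(*(set(u.lower().split()) for u in units)))

def marginal_gain(selected, unit):
    """F(S + v) - F(S): value added by one more unit."""
    return F(selected + [unit]) - F(selected)

def greedy_select(corpus, k):
    """Repeatedly add the unit with the largest marginal gain, up to k units."""
    selected, candidates = [], list(corpus)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda u: marginal_gain(selected, u))
        if marginal_gain(selected, best) <= 0:
            break  # nothing left adds new words
        selected.append(best)
        candidates.remove(best)
    return selected
```

For the toy corpus `["a b c d", "d e f", "a b", "g"]` with `k = 2`, the first pick adds four words and the second adds only two, since "d" is already covered. The same unit added to a larger set never gains more than when added to one of its subsets, which is the inequality (7.5).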


Monotone non-decreasing (MND) submodular functions have the additional
property that a weighted sum of several MND
submodular functions is in turn an MND
submodular function, provided the weights
are positive real numbers, i.e. $F = \sum_i \alpha_i F_i$ is MND submodular if each
$F_i$ is an MND submodular function and $\alpha_i > 0$ for all $i$. This is of significant
importance to summarization because, in most cases, the scoring function
used for extractive summarization is a weighted sum of a function
that estimates topical coverage and another function that rewards topical
diversity.
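This closure property follows directly from the definitions in (7.5) and (7.6). A short derivation, assuming every $F_i$ is MND submodular, every $\alpha_i > 0$, $A \subseteq B \subseteq V$ and $v \notin B$:

```latex
\begin{align*}
F(A + v) - F(A) &= \sum_i \alpha_i \bigl( F_i(A + v) - F_i(A) \bigr) \\
                &\ge \sum_i \alpha_i \bigl( F_i(B + v) - F_i(B) \bigr)
                 = F(B + v) - F(B), \\
F(A) = \sum_i \alpha_i F_i(A) &\le \sum_i \alpha_i F_i(B) = F(B).
\end{align*}
```

Each inequality holds term by term for the individual $F_i$, and multiplying by a positive weight preserves its direction, so the sum satisfies both (7.5) and (7.6).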
For generic summarization, Lin and Bilmes (2011) used the following
function:
\[ F(S) = L_1(S) + R_1(S) \tag{7.7} \]

Here $L_1(S)$ and $R_1(S)$ are given by
\[ L_1(S) = \sum_{i \in V} \min \{ \ldots \]
