Sentiment analysis: what, why, how and issues


Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.
- Wikipedia [3]

Sentiment analysis aims to determine the attitude of the speaker or writer of a text. In electronic media, the “text” may be a blog post, web article, tweet, comment, or product review, and the speaker or writer is the blogger, Twitter user, commenter, or consumer.


The attitude can be:
  • The writer’s own judgment or evaluation – the writer’s judgment or evaluation of a particular aspect of a product or brand
  • Affective state – the emotional state of the writer at the time of writing
  • Emotional communication – the emotions the writer wishes to communicate to the reader [3]

Why sentiment analysis?


The explosion of social media has created unprecedented opportunities for citizens to publicly voice their opinions, but has created serious bottlenecks when it comes to making sense of those opinions [4].
Analyzing social media content matters to market researchers and to consumers who seek high-quality products and content on the web. Customer reviews are critical for a product on an e-commerce site, since sales depend heavily on them: most buyers read the product reviews before purchasing. Reviews matter not only to consumers but also to managers and market researchers, who analyze web content to make decisions about a brand, measure the impact of promotional campaigns, and identify aspects that need improvement.
Analysis need not be limited to a single site. Analyzing content across several sites helps identify the “mood” of the web during a particular event such as an election, a World Cup, the release of a new movie, or a musical show. For example, during an election period people tend to tweet or post comments giving their opinions of the candidates. The general public, and campaign organizers in particular, can get a clear indication of the “mood” of the people and act accordingly. It also helps political analysts and commentators predict election results. Since the feeds are publicly available, no one needs to ask people explicitly for their opinions or run surveys that cost money and time. The same holds for any other public event with a large number of participants.
Audience reviews are also important for a conference or lecture. They help the speaker improve and maintain a reputation, and they let organizers select the best speakers based on audience feedback.

Ways of doing sentiment analysis


The conventional way of doing sentiment analysis is through natural language processing (NLP). A text is extracted, and its subjectivity and polarity (negative or positive) are determined and expressed as a numeric value. In more fine-grained analysis, polarity strength also comes into play, e.g. weakly positive, mildly positive, strongly positive [4].
Most NLP-based applications rely on keyword-based analysis, i.e. a set of words (a word bag) is used as the reference for analysis. The words are labeled with a negative or positive polarity. For example, words such as “happy” and “great” are classified as positive, and words such as “poor” and “bad” are classified as negative. The polarities of the words found in a particular text are aggregated to determine its overall sentiment.
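As a rough illustration, keyword-based scoring can be sketched in a few lines of Python. The word bag below contains only the four example words mentioned above, and the function name `keyword_sentiment` is made up for this sketch; a real word bag would be far larger.

```python
import re

# Hypothetical word bag: +1 marks a positive word, -1 a negative word.
WORD_BAG = {"happy": 1, "great": 1, "poor": -1, "bad": -1}

def keyword_sentiment(text):
    """Sum the polarities of known words; the sign gives the overall sentiment."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(WORD_BAG.get(word, 0) for word in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(keyword_sentiment("Great phone, I am happy with it"))  # ('positive', 2)
print(keyword_sentiment("Poor battery and a bad screen"))    # ('negative', -2)
print(keyword_sentiment("I am not happy"))                   # ('positive', 1)
```

The last call shows the weakness of pure keyword counting: the negation is ignored, so a negative sentence is scored as positive.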
The keyword-based approach is primitive and brittle [6]: maintaining a bag of words is difficult, and a negation can occur anywhere in the text. Proximity analysis helps with this, using rules such as "look for any negations in the sentence within two words from happy." The problem is that there are more rules to write, and sentences where the negation is more than a few words away are still misclassified [6].
Exhaustive Extraction is a patented mechanism from Attensity [5] that is described as more reliable and accurate at extracting the indicators of sentiment and the relationships between them. It first extracts the topic-specific feature, then the sentiment associated with it, and then the relationship between the subject and the sentiment.
The word-classification approaches discussed above are simple, but their results are not very reliable or accurate. For example, the sentence “How could anyone sit through this movie?” contains no single word that is obviously negative. Thus, sentiment seems to require more understanding than the usual topic-based classification [1]. Therefore, machine learning approaches such as support vector machines (SVM) have been used for sentiment analysis [1].
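A proximity rule of this kind might look like the following sketch. The negation list, the word bag, and the choice to look only at the words before the sentiment word are assumptions made for illustration; the window of two words follows the rule quoted above.

```python
import re

# Hypothetical word bag and negation list, for illustration only.
WORD_BAG = {"happy": 1, "great": 1, "poor": -1, "bad": -1}
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "don't"}

def score_with_negation(text, window=2):
    """Sum word polarities, flipping a word's polarity when a negation
    appears within `window` words before it."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = WORD_BAG.get(word, 0)
        if polarity == 0:
            continue
        preceding = words[max(0, i - window):i]
        if any(w in NEGATIONS for w in preceding):
            polarity = -polarity
        score += polarity
    return score

print(score_with_negation("I am not happy"))                   # -1
print(score_with_negation("I would not say that I am happy"))  # 1
```

The second sentence is still scored as positive because the negation sits more than two words away from "happy", which is exactly the misclassification described above.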
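To give a feel for the machine learning route, here is a minimal pure-Python sketch of a linear SVM trained by subgradient descent on the hinge loss with an L2 penalty. It is not the setup used in [1]: the four training sentences, the binary word-presence features, and the hyperparameters are all assumptions chosen to keep the example small, and in practice a library implementation trained on thousands of labelled reviews would be used.

```python
# Hypothetical labelled training data: +1 = positive, -1 = negative.
TRAIN = [
    ("a great and moving film", 1),
    ("a dull and boring film", -1),
    ("I loved this great story", 1),
    ("I hated this boring story", -1),
]

def features(text):
    """Binary bag of words: the set of distinct lowercase tokens."""
    return set(text.lower().split())

def train_linear_svm(data, epochs=100, lr=0.1, lam=0.01):
    """Minimise hinge loss plus an L2 penalty by subgradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            margin = label * (sum(weights.get(f, 0.0) for f in x) + bias)
            # L2 penalty: shrink every weight a little on each step.
            for f in weights:
                weights[f] *= 1.0 - lr * lam
            # Hinge loss: update only when the example is inside the margin.
            if margin < 1.0:
                for f in x:
                    weights[f] = weights.get(f, 0.0) + lr * label
                bias += lr * label
    return weights, bias

def predict(model, text):
    """Return +1 or -1 from the sign of the linear score."""
    weights, bias = model
    score = sum(weights.get(f, 0.0) for f in features(text)) + bias
    return 1 if score >= 0 else -1

model = train_linear_svm(TRAIN)
print(predict(model, "a great and moving story"))
print(predict(model, "a dull and boring story"))
```

Unlike a hand-maintained word bag, the weights here are learned from labelled examples, so words that never appear in any sentiment list can still contribute to the decision.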
Pang and Lee [2] studied document-level sentiment analysis built on sentence-level subjectivity detection. A subjectivity detector extracts the subjective sentences and discards the objective (basically factual) ones. The extracted sentences are then passed to a sentiment classifier, which determines polarity from the subjective text alone [2].
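The two-stage idea can be sketched as follows. This is only a toy: the cue list and polarity table are hypothetical stand-ins for the trained subjectivity detector and sentiment classifier, and it does not reproduce the method of [2].

```python
import re

# Hypothetical polarity table standing in for a trained classifier.
POLARITY = {"love": 1, "loved": 1, "great": 1, "wonderful": 1,
            "hate": -1, "hated": -1, "boring": -1, "awful": -1}
# Hypothetical subjectivity cues: here, simply the opinion words themselves.
SUBJECTIVE_CUES = set(POLARITY)

def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def is_subjective(sentence):
    """Stage 1: keep a sentence only if it contains an opinion cue."""
    return any(word in SUBJECTIVE_CUES for word in tokens(sentence))

def sentence_polarity(sentence):
    """Stage 2: score one subjective sentence."""
    return sum(POLARITY.get(word, 0) for word in tokens(sentence))

def analyze(document):
    """Drop objective sentences, then score what remains."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [(s, sentence_polarity(s)) for s in sentences if is_subjective(s)]
    total = sum(score for _, score in scored)
    return scored, total

review = ("The film runs for two hours. It was shot in Paris. "
          "I loved the wonderful soundtrack. The ending was boring.")
scored, total = analyze(review)
print(scored)  # [('I loved the wonderful soundtrack.', 2), ('The ending was boring.', -1)]
print(total)   # 1
```

The two factual sentences never reach the polarity step, so they cannot add noise to the final score.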

Tools available in the market


The market for sentiment analysis tools is crowded with a large number of service providers. Chatterbox [8] and Viralheat [10] are commercial products for sentiment analysis that provide REST APIs. Some open source tools are also available, but the user has to host and maintain them [9].
The common feature of the available sentiment analysis platforms, whether free, commercial, or open source, is that they expose an API that takes a language and a sentence as inputs and outputs a classification for that sentence (a numerical value, possibly between 0 and 1).
There are also web sites that provide sentiment analysis as SaaS solutions. Veooz [7] is a real-time sentiment analysis tool that seems to be accurate and reliable. It was used during the last American presidential election campaign to gauge the mood of social media. It analyzes Twitter and public Facebook feeds. It appears more reliable than Twitrratr (http://twitrratr.com), which uses keyword-based sentiment analysis.
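The shape of such an API can be illustrated with a small local stand-in. Nothing here is the interface of any real product: the function names, the JSON field names `language`, `sentence`, and `sentiment`, and the tiny English lexicon are all hypothetical, and the score is simply the share of positive words among the opinion words found.

```python
import json
import re

# Hypothetical per-language lexicons; a real service would use trained models.
LEXICONS = {
    "en": {"positive": {"happy", "great"}, "negative": {"poor", "bad"}},
}

def classify(language, sentence):
    """Return a value in [0, 1]: 0 = negative, 0.5 = neutral, 1 = positive."""
    lexicon = LEXICONS.get(language)
    if lexicon is None:
        raise ValueError(f"unsupported language: {language}")
    words = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(word in lexicon["positive"] for word in words)
    neg = sum(word in lexicon["negative"] for word in words)
    total = pos + neg
    return 0.5 if total == 0 else pos / total

def handle_request(body):
    """Mimic an HTTP handler: JSON request body in, JSON response body out."""
    request = json.loads(body)
    value = classify(request["language"], request["sentence"])
    return json.dumps({"sentiment": value})

print(handle_request('{"language": "en", "sentence": "great phone but poor battery"}'))
# {"sentiment": 0.5}
```

A client only needs to send the two inputs and read back a single number, which is why switching between providers tends to be a matter of changing the endpoint and the field names.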

Issues and challenges


Because words are an ambiguous way to express ideas, accurate sentiment analysis is a hard task. In some cases, even humans have a hard time understanding the sentiment of what someone else is saying, as the questions people commonly ask each other show: "What do you mean?" "What are you trying to say?" [6] Reviews and comments are also often crowded with unnecessary information, and it is difficult to filter out the objective data. Finally, ways of communicating differ between cultures, so the same method of analysis might not be valid across nations.
Identifying fake reviews and outliers, and assessing the reputation of the reviewer, are problems in most systems.
Since sentiment analysis is mostly based on NLP techniques, it consumes considerable computing power. This is challenging for real-time analysis, and systems have to be architected so that scalability and robustness are preserved. Parallel computing is often needed. Most sentiment analysis platforms that offer hosted solutions charge high prices because of this demand for computational power.

References


[1] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, Stroudsburg, PA, USA, 2002, pp. 79–86.
[2] B. Pang and L. Lee, “A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts,” in Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Stroudsburg, PA, USA, 2004.
[3] “Sentiment analysis,” Wikipedia, the free encyclopedia. 28-Feb-2013.
[4] D. Osimo and F. Mureddu, “Research Challenge on Opinion Mining and Sentiment Analysis.”
[5] “Exhaustive Extraction | Attensity.” [Online]. Available: http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/. [Accessed: 12-Mar-2013].
[6] “Sentiment Analysis, Hard But Worth It! | CustomerThink.” [Online]. Available: http://www.customerthink.com/blog/sentiment_analysis_hard_but_worth_it. [Accessed: 12-Mar-2013].
[7] “Real-time Social Media Search and Analytics | Veooz.” [Online]. Available: http://www.veooz.com/. [Accessed: 12-Mar-2013].
[8] “Chatterbox - Sentiment Analysis for Social Media API.” [Online]. Available: http://chatterbox.co/sentiment-analysis-social-media-api/. [Accessed: 12-Mar-2013].
[9] “sentiment-analyzer,” GitHub. [Online]. Available: https://github.com/madhusudancs/sentiment-analyzer. [Accessed: 12-Mar-2013].
[10] “Social Media Simplified | Viralheat.” [Online]. Available: http://www.viralheat.com. [Accessed: 12-Mar-2013].