
Sentiment Analysis 2.0



Sentiment analysis is arguably the most common and prominent tool for analyzing web and social media data. Even before data is coded by other parameters such as topics or author groups, sentiment is usually the first thing to be mapped. Classically, mentions are rated as neutral, negative or positive. Some classification schemes also allow a mention to be rated as ambivalent when it is partly critical (negative) and partly positive. In other schemes, a mention is broken down into smaller content units, each of which receives its own rating on the classic neutral, positive and negative scale.

This also raises the question of analysis quality. Some tools offer so-called AI engines that claim a sentiment accuracy of over 95% thanks to artificial intelligence. In an internal test, our analysts manually cross-checked a sample of data processed by such an AI engine at four different points in time over a 12-month period. On average, the sentiment was correct in only just over 60% (!) of cases, not the 95% advertised.
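
A cross-check like the one described above boils down to comparing machine labels with human labels for the same mentions. The following is a minimal sketch of that comparison, assuming each checkpoint is a pair of equally long label lists (machine first, human second); the function names and the toy labels are hypothetical and are not data from our test.

```python
from statistics import mean


def agreement_rate(machine_labels, human_labels):
    """Share of mentions where the machine label matches the human label."""
    if len(machine_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not machine_labels:
        raise ValueError("no labels to compare")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(machine_labels)


def average_agreement(checkpoints):
    """Mean agreement rate over several (machine, human) checkpoints."""
    return mean(agreement_rate(m, h) for m, h in checkpoints)


# Hypothetical toy labels for two checkpoints, for illustration only.
checkpoints = [
    (["positive", "neutral", "negative", "neutral"],
     ["positive", "negative", "negative", "neutral"]),
    (["neutral", "neutral", "positive", "negative"],
     ["neutral", "positive", "positive", "positive"]),
]
print(f"average agreement: {average_agreement(checkpoints):.3f}")
```

Treating every label mismatch as equally wrong is a simplifying assumption; a stricter check could, for example, weigh a positive/negative swap more heavily than a neutral/positive one.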

We do not pursue this aspect further in this article, but mention it here as food for thought. Regardless of whether the analysis is done by humans or machines, we believe it is important to look at and evaluate sentiment in a more differentiated way. The rest of this article describes our approach and solution.



Like so many things, the idea of a more differentiated view of sentiment came from our daily practical work. Time and again, we noticed brands and products being rated very well or very poorly compared to their competitors, while the actual positive or negative effects were often less drastic than the reports suggested. So we did some research to back up our intuition with facts. We quickly found that a brand might have a relatively large amount of positive content, but that this content appeared on only a relatively small number of different domains. Other brands, by contrast, were also rated positively, but were discussed positively on a relatively large number of different domains. We observed the same pattern for the negative ratings of different brands and products.

Based on these observations, we concluded that, in addition to the number of mentions per sentiment, the spread of positive and negative mentions across different domains must also be taken into account to achieve a more differentiated sentiment assessment. Our assumptions are as follows:

  • If positive or negative mentions of a brand or product exist on a relatively large number of different domains, these are “highly positive” or “highly negative” mentions.
  • If positive or negative mentions of a brand or product exist on a relatively small number of different domains, these are “slightly positive” or “slightly negative” mentions.

These assumptions imply that positive or negative mentions spread across many different domains leave a more strongly positive or negative impression of the brand or product, because their potential audience reach is higher. Conversely, positive or negative mentions concentrated on fewer domains leave a weaker positive or negative impression, because their potential audience reach is lower.
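
Both assumptions rest on two figures per period and polarity: how many mentions there are and on how many distinct domains they appear. A minimal sketch of that aggregation follows, assuming each mention is already labeled and carries a month key and a domain; the `Mention` type, its field names and the sample records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    month: str      # hypothetical period key, e.g. "2024-05"
    domain: str     # domain the mention was found on
    sentiment: str  # "positive", "negative" or "neutral"


def monthly_stats(mentions):
    """Map (month, polarity) to (number of mentions, number of distinct domains)."""
    counts = defaultdict(int)
    domains = defaultdict(set)
    for m in mentions:
        if m.sentiment not in ("positive", "negative"):
            continue  # neutral mentions are left out of the spread analysis here
        key = (m.month, m.sentiment)
        counts[key] += 1
        domains[key].add(m.domain)
    return {key: (counts[key], len(domains[key])) for key in counts}


# Hypothetical sample: three positive mentions, but on only two domains.
mentions = [
    Mention("2024-05", "a.example", "positive"),
    Mention("2024-05", "a.example", "positive"),
    Mention("2024-05", "b.example", "positive"),
    Mention("2024-05", "c.example", "negative"),
    Mention("2024-05", "d.example", "neutral"),
]
stats = monthly_stats(mentions)
print(stats)
```

Two brands with the same positive count can thus end up with very different domain counts, which is exactly the difference the assumptions above are meant to capture.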


For the implementation, we use statistical tools. First, we calculate the arithmetic mean (μ) of the positive mentions, the negative mentions and the number of different domains over the past 24 months. We also take into account the spread (standard deviation σ) of these same values over the past 24 months.
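
The baseline for each of these metrics is just its mean and standard deviation over the reference window. A minimal sketch, assuming one value per month and using the sample standard deviation (`statistics.stdev`); whether a sample or population deviation is used is not stated above, so that choice is an assumption, as are the function name and the 24 example values.

```python
from statistics import mean, stdev


def baseline(history):
    """Return (mu, sigma) of a monthly metric over the reference window.

    history: one value per month, e.g. 24 monthly counts of positive mentions.
    Uses the sample standard deviation; statistics.pstdev would give the
    population version instead (an assumption either way).
    """
    if len(history) < 2:
        raise ValueError("need at least two periods to estimate a spread")
    return mean(history), stdev(history)


# Hypothetical 24-month history of distinct domains carrying positive mentions.
positive_domains = [12, 15, 11, 14, 13, 16, 12, 10, 14, 15, 13, 12,
                    11, 14, 16, 13, 12, 15, 14, 13, 12, 11, 15, 14]
mu, sigma = baseline(positive_domains)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
```

The same function would be applied separately to positive mentions, negative mentions and the corresponding domain counts.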

These two statistics allow us to compare a recent time period (e.g. the current month) with the corresponding values of the past 24 months and draw conclusions. Using a scoring model from 0 to 6, we also defined the thresholds above which positive or negative sentiment is rated as slightly or highly positive, or slightly or highly negative. In practice, we adapt the influence of the spread to the specific use case; in some use cases, for example, the thresholds are set at 0.5, 1 and 1.5 standard deviations.
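
The exact scoring model is not spelled out here. One plausible reading is that thresholds at ±0.5, ±1 and ±1.5 standard deviations split the range of possible values into seven bands, numbered 0 to 6. The sketch below follows that reading; the threshold tuple, the function name and the fallback for a zero spread are assumptions.

```python
from bisect import bisect_right

# Assumed thresholds in multiples of sigma, mirroring the 0.5 / 1 / 1.5 example.
THRESHOLDS = (-1.5, -1.0, -0.5, 0.5, 1.0, 1.5)


def score(current, mu, sigma, thresholds=THRESHOLDS):
    """Map the current period's value to a score from 0 to 6.

    The value is converted to a z-score, (current - mu) / sigma, and the
    score is the number of thresholds at or below it: 0 = far below the
    24-month average, 3 = close to it, 6 = far above it.
    """
    if sigma <= 0:
        return 3  # no historical spread: treat the period as average (assumption)
    z = (current - mu) / sigma
    return bisect_right(thresholds, z)


# Hypothetical example: 17 domains this month against mu = 13.2, sigma = 1.7.
print(score(17, mu=13.2, sigma=1.7))
```

Adapting the influence of the spread to a use case then means passing a different `thresholds` tuple, for example wider bands for very volatile brands.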


We assess the sentiment in five levels:

  • Neutral
  • Slightly negative
  • Highly negative
  • Slightly positive
  • Highly positive
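
How the 0 to 6 scores are combined into these five levels is not described in detail. The following sketch uses one hypothetical rule: the polarity whose mention count lies further above its own 24-month average dominates, and the domain-spread score of that polarity decides between "Highly" and "Slightly". The function name, the rule itself and the midpoint of 3 as "average" are all assumptions.

```python
def sentiment_level(pos_count_score, neg_count_score,
                    pos_domain_score, neg_domain_score, average=3):
    """Combine 0-6 scores into one of five sentiment levels (hypothetical rule).

    - If neither polarity's mention count is above average, or both exceed it
      equally, the period is rated "Neutral".
    - Otherwise the polarity with the larger excess dominates, and its
      domain-spread score decides between "Highly" (above-average spread)
      and "Slightly".
    """
    pos_excess = pos_count_score - average
    neg_excess = neg_count_score - average
    if max(pos_excess, neg_excess) <= 0 or pos_excess == neg_excess:
        return "Neutral"
    if pos_excess > neg_excess:
        polarity, spread = "positive", pos_domain_score
    else:
        polarity, spread = "negative", neg_domain_score
    strength = "Highly" if spread > average else "Slightly"
    return f"{strength} {polarity}"


# Many positive mentions, but on an average or below-average number of domains.
print(sentiment_level(5, 3, 2, 3))
```

A stricter variant could also require the count score itself to pass a second threshold before a level is rated "Highly", which is where the use-case-specific tuning mentioned above would come in.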

We apply this more complex and labor-intensive analysis mainly to all mentions of a brand or product as a whole, or selectively at the level of site types and topics, in order to deliver more differentiated statements about the prevailing sentiment.

Source: CURE S.A.

