How Sentiment is Processed

To assess sentiment at Kapiche, the uploaded data is first broken into data units. Once this is done, the data is processed by an external, neural-network-based cloud engine. This engine assigns a sentiment score and a magnitude value to each text excerpt in the data collection.

Per-Excerpt Sentiment Calculation

The sentiment score of a text excerpt indicates the overall emotion of that excerpt. It ranges from -1 to 1: a negative value indicates negative sentiment, and a positive value indicates positive sentiment. The magnitude is an absolute (unsigned) value that indicates how much emotional content is present in the excerpt, whether positive or negative. It ranges from 0 to positive infinity and tends to increase with the length of the text.
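As an illustration only, the two per-excerpt values can be modelled as a small record that enforces the ranges described above. The `ExcerptSentiment` class and its field names are hypothetical and are not part of Kapiche's product or the engine's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExcerptSentiment:
    """Hypothetical record holding the engine's output for one text excerpt."""

    text: str
    score: float      # overall emotion, from -1 (negative) to 1 (positive)
    magnitude: float  # amount of emotional content, from 0 upwards (unbounded)

    def __post_init__(self) -> None:
        # Reject values outside the ranges described in the article.
        if not -1.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [-1, 1], got {self.score}")
        if self.magnitude < 0.0:
            raise ValueError(f"magnitude must be >= 0, got {self.magnitude}")


# Illustrative values only, not real engine output.
example = ExcerptSentiment("Great support team!", score=0.9, magnitude=0.9)
```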

Labelling Excerpts

Using these values, the Kapiche analytics engine assigns a polarity label to each text excerpt: Positive, Negative, Mixed, or Neutral.

  • A Positive label is assigned to all text excerpts that have a sentiment score equal to or above 0.25 and a magnitude above 0.75.
  • A Negative label is assigned to all text excerpts that have a sentiment score equal to or below -0.25 and a magnitude above 0.75.
  • A Mixed label is assigned to all text excerpts that have a sentiment score between -0.25 and 0.25 (inclusive) and a magnitude of 1.00.
  • A Neutral label is assigned to all other text excerpts that do not meet any of the above criteria.
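The labelling rules above can be sketched as a single function. This is a minimal illustration, not Kapiche's implementation: `label_excerpt` and `Polarity` are hypothetical names, the rules are assumed to be checked in the order listed (so a score of exactly 0.25 or -0.25 is labelled Positive or Negative before Mixed is considered), and the Mixed rule's "magnitude of 1.00" is assumed to mean "at least 1.00".

```python
from enum import Enum


class Polarity(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    MIXED = "Mixed"
    NEUTRAL = "Neutral"


def label_excerpt(score: float, magnitude: float) -> Polarity:
    """Assign a polarity label from a sentiment score and magnitude.

    Assumptions (not confirmed by the article): rules are checked in the
    listed order, and "magnitude of 1.00" for Mixed means magnitude >= 1.00.
    """
    if score >= 0.25 and magnitude > 0.75:
        return Polarity.POSITIVE
    if score <= -0.25 and magnitude > 0.75:
        return Polarity.NEGATIVE
    if -0.25 <= score <= 0.25 and magnitude >= 1.0:
        return Polarity.MIXED
    # Everything that meets none of the rules above.
    return Polarity.NEUTRAL
```

Under these rules, a score of 0.6 with magnitude 1.2 is Positive, a score of 0.05 with magnitude 2.3 is Mixed, and a score of 0.6 with magnitude 0.5 is Neutral, because a positive score alone is not enough without sufficient magnitude.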

The sentiment score parameters are shown in the figure below:

Displaying Sentiment

Once these labels are assigned, we can generate the data needed to create the visualisations. Sentiment is displayed as the percentage of text excerpts with each polarity (Positive, Negative, Mixed and Neutral) on the respective page.

To determine the overall sentiment of the data collection, we count how often each sentiment label appears in the data collection. We then divide each count by the total number of text excerpts to get the relative frequency of each label in the data collection. The formula used to calculate this is below:
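Written out as an equation, using $n_\ell$ for the number of text excerpts carrying label $\ell$ and $N$ for the total number of text excerpts in the data collection (symbols introduced here for illustration), the relative frequency and the displayed percentage are:

```latex
f_\ell = \frac{n_\ell}{N}, \qquad
\text{percentage}_\ell = 100 \cdot f_\ell, \qquad
\ell \in \{\text{Positive}, \text{Negative}, \text{Mixed}, \text{Neutral}\}
```

Because every excerpt receives exactly one label, the four relative frequencies sum to 1. For example, in a hypothetical collection of 200 excerpts where 90 are labelled Positive, $f_{\text{Positive}} = 90/200 = 0.45$, shown as 45%.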

The sentiment data is also used to calculate the overall sentiment of any query result, using a similar formula. The frequency of a polarity is the number of text excerpts in the query results that carry that polarity label. This frequency is then divided by the total number of text excerpts in the query results.
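The same calculation serves both the whole data collection and a query result; only the set of excerpts passed in changes. This is a minimal sketch under stated assumptions: `polarity_percentages` is a hypothetical helper, labels are assumed to be plain strings, and returning 0% for every label on an empty query result is an assumption, since the article does not cover that case.

```python
from collections import Counter
from typing import Iterable

LABELS = ("Positive", "Negative", "Mixed", "Neutral")


def polarity_percentages(labels: Iterable[str]) -> dict[str, float]:
    """Percentage of excerpts carrying each polarity label.

    `labels` holds one label per excerpt: every excerpt in the data
    collection for the overall sentiment, or only the excerpts matched
    by a query for that query's sentiment.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # Assumption: an empty query result reports 0% for every label.
        return {label: 0.0 for label in LABELS}
    # Relative frequency of each label, expressed as a percentage.
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical query result containing five labelled excerpts.
query_labels = ["Positive", "Positive", "Negative", "Mixed", "Neutral"]
print(polarity_percentages(query_labels))
```

For the five hypothetical excerpts above, this reports 40% Positive and 20% for each of the other three labels.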
