How to do sentence-level querying?

Overview

The Query Builder now also allows you to match the query against sentences, rather than whole verbatims. The Query Builder is accessed via the Query screen, where you will see the list of Themes that you have built.

Whether matching is performed at the verbatim level or the sentence level, results on the Query Builder and Dashboard are presented as verbatims, and statistics are calculated on verbatim counts. To choose the “match” setting you would like to apply, click on the dropdown on the Query Builder screen.

Query Matching at Sentence-level and Verbatim-level

Match on Verbatim

Match on Verbatim is the mode that Kapiche used prior to the introduction of sentence-level matching. Essentially, it means that a verbatim will be considered a valid “hit” if the query criteria are met across the full verbatim text.

For example, if there is a query defined as quality AND food, the following verbatim would be considered a valid query result:

I need to say something about the food. The quality was impressive!

Even though the terms quality and food appear in different sentences, the verbatim is still a match because Match on Verbatim applies across the whole verbatim: as long as a verbatim contains both terms anywhere in the text, it will match. For the example above, this is a correct match.

However, Match on Verbatim can be undesirable in some cases. Consider the following verbatim:

I didn’t have any food so I can’t comment on that. The quality of the music was great though.

The above verbatim would also be a match, because the verbatim contains both terms. However, it is clearly an incorrect result because the term food and the term quality are being used in different contexts. This example shows why the Match on Sentence feature has been introduced.
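To make the behavior concrete, here is a minimal Python sketch of how Match on Verbatim treats an AND query. It is only an illustration of the logic described above, not Kapiche's implementation: the helper names (`terms`, `match_on_verbatim_and`) are hypothetical, and the tokenization is deliberately naive.

```python
import re

def terms(text):
    """Lowercased word tokens (a naive stand-in for real tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def match_on_verbatim_and(verbatim, required):
    """True when every required term appears anywhere in the verbatim."""
    return set(required) <= terms(verbatim)

good = "I need to say something about the food. The quality was impressive!"
bad = ("I didn’t have any food so I can’t comment on that. "
       "The quality of the music was great though.")

print(match_on_verbatim_and(good, ["quality", "food"]))  # True
print(match_on_verbatim_and(bad, ["quality", "food"]))   # True (the unwanted hit)
```

Both verbatims are hits because the check never looks at where in the text each term occurs.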

Match on Sentence

Sentence matching is a new query behavior that applies the query AND operator within the context of individual sentences. This means that you can now place tighter constraints on your queries and limit them to the text of a single sentence.

Consider a dataset containing reviews for a particular airline. We want to know about the quality of the food served by that airline, so we construct the Theme quality AND food. Now consider the following two verbatims in the dataset:

Verbatim 1 - The quality of food onboard was lackluster. The drinks selection on the other hand was rich.

Verbatim 2 - The earphones provided were of the cheapest quality possible! The cabin crew was friendly though. Could not comment on food since I didn’t have any.

In the above example, both verbatims contain the terms quality and food, but only the first verbatim matches the intention of the Theme. If we use Match on Verbatim as our query constraint, we would get both Verbatim 1 and Verbatim 2 as results, since both verbatims contain the words food and quality. However, with the new Match on Sentence option, only the first verbatim will be included in the results.
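The airline example can be sketched in a few lines of Python. This is a simplified model of the described behavior under stated assumptions, not Kapiche's actual code: the helper names are hypothetical, and sentences are split with a naive regular expression on `.`, `!`, and `?`.

```python
import re

def sentences(verbatim):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", verbatim.strip()) if s]

def terms(text):
    """Lowercased word tokens (a naive stand-in for real tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def match_on_sentence_and(verbatim, required):
    """True when at least one single sentence contains every required term."""
    return any(set(required) <= terms(s) for s in sentences(verbatim))

verbatim_1 = ("The quality of food onboard was lackluster. "
              "The drinks selection on the other hand was rich.")
verbatim_2 = ("The earphones provided were of the cheapest quality possible! "
              "The cabin crew was friendly though. "
              "Could not comment on food since I didn’t have any.")

print(match_on_sentence_and(verbatim_1, ["quality", "food"]))  # True
print(match_on_sentence_and(verbatim_2, ["quality", "food"]))  # False
```

Verbatim 2 drops out because quality and food never share a sentence, even though both appear in the verbatim.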

Sentence-matching makes a difference in AND constraints

The OR operator in Kapiche creates a union of two different sets and will not be affected by the selection of Match on Sentence. In the above example, if we change the query to quality OR food, any verbatim would be a hit if any sentence within that verbatim contained either term. With an OR constraint, there is no difference in the verbatim results regardless of whether Match on Verbatim or Match on Sentence is used.
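A quick way to convince yourself of this is to model both settings and compare them. The sketch below is illustrative only, with hypothetical helper names and naive sentence splitting; it checks that an OR query gives the same answer at the verbatim level and the sentence level.

```python
import re

def sentences(verbatim):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", verbatim.strip()) if s]

def terms(text):
    """Lowercased word tokens (a naive stand-in for real tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def or_on_verbatim(verbatim, options):
    """True when any of the terms appears anywhere in the verbatim."""
    return bool(set(options) & terms(verbatim))

def or_on_sentence(verbatim, options):
    """True when any single sentence contains any of the terms."""
    return any(set(options) & terms(s) for s in sentences(verbatim))

verbatims = [
    "The quality of food onboard was lackluster. The drinks selection on the other hand was rich.",
    "The earphones provided were of the cheapest quality possible! The cabin crew was friendly though.",
    "Great selection of movies",
]

for v in verbatims:
    # The two settings always agree for OR.
    print(or_on_verbatim(v, ["quality", "food"]), or_on_sentence(v, ["quality", "food"]))
```

A term that appears anywhere in a verbatim necessarily appears in one of its sentences, which is why the two columns always match.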

Sentence-matching and “does not include” constraints

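In the Query Builder, a “does not include” constraint is a NOT operator. When using Match on Verbatim, the NOT operator rejects verbatims that match the associated query logic. However, when using Match on Sentence, the rejection is applied at the sentence level: only the individual sentences that match the associated query logic are rejected, so a verbatim can still be returned if another of its sentences satisfies the query.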

This creates a subtle and potentially unexpected behavior, so let’s work through an example.

Suppose we are analyzing the same dataset with airline reviews as before and we want to look for comments where respondents mention being cold, but we want to make sure to exclude when they are talking about food. We construct a query as: cold AND DOES NOT INCLUDE food and apply the query across the following verbatims:

The food was cold
The cabin temperature was too cold. The food was delicious.
I was feeling cold so the cabin crew gave me an extra blanket.

Match on Sentence will provide the behavior we are looking for by returning verbatims that mention cold, but not in the same sentence as food. So in this example, the second and third verbatims would be returned, but the first wouldn't. However, if we chose Match on Verbatim, then the second verbatim would also be excluded, because it includes food and cold within the same verbatim.
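Here is the same example as a small Python sketch. It models the behavior described above under simplifying assumptions (hypothetical helper names, naive sentence splitting and tokenization) and is not how Kapiche evaluates queries internally.

```python
import re

def sentences(verbatim):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", verbatim.strip()) if s]

def terms(text):
    """Lowercased word tokens (a naive stand-in for real tokenization)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def includes_without(text, include, exclude):
    """True when the text contains `include` and does not contain `exclude`."""
    found = terms(text)
    return include in found and exclude not in found

def on_verbatim(verbatim, include, exclude):
    """Apply the whole query to the full verbatim text."""
    return includes_without(verbatim, include, exclude)

def on_sentence(verbatim, include, exclude):
    """Apply the whole query to each sentence; one matching sentence is enough."""
    return any(includes_without(s, include, exclude) for s in sentences(verbatim))

verbatims = [
    "The food was cold",
    "The cabin temperature was too cold. The food was delicious.",
    "I was feeling cold so the cabin crew gave me an extra blanket.",
]

print([on_sentence(v, "cold", "food") for v in verbatims])  # [False, True, True]
print([on_verbatim(v, "cold", "food") for v in verbatims])  # [False, False, True]
```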

Let's try another example where we would prefer the Match on Verbatim behavior. In this example we notice that cabin crew is mentioned extremely frequently in the dataset, often occurring alongside other concepts. We want to see what concepts are mentioned in the absence of cabin crew, so we construct a query NOT cabin crew and apply it across the following verbatims:

The cabin crew were calm, collected and quite comforting. They were really friendly and had everyone’s needs taken care of.
Cabin crew were friendly. The food was extremely salty.
I loved how the cabin crew treated everyone.
Great selection of movies

If Match on Verbatim is selected, the first three verbatims would be excluded from the query results, leaving only the fourth. However, if Match on Sentence is selected, then the first two verbatims would also appear in the results, because each of them contains at least one sentence that does not mention cabin crew.
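The difference is easy to see in a short Python sketch. As before, this is only an illustrative model with hypothetical helper names: the phrase check is a simple case-insensitive substring test, and sentence splitting is naive.

```python
import re

def sentences(verbatim):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", verbatim.strip()) if s]

def mentions(text, phrase):
    """Case-insensitive substring check for a phrase."""
    return phrase.lower() in text.lower()

def not_on_verbatim(verbatim, phrase):
    """Keep the verbatim only if the phrase appears nowhere in it."""
    return not mentions(verbatim, phrase)

def not_on_sentence(verbatim, phrase):
    """Keep the verbatim if at least one sentence does not mention the phrase."""
    return any(not mentions(s, phrase) for s in sentences(verbatim))

verbatims = [
    "The cabin crew were calm, collected and quite comforting. "
    "They were really friendly and had everyone’s needs taken care of.",
    "Cabin crew were friendly. The food was extremely salty.",
    "I loved how the cabin crew treated everyone.",
    "Great selection of movies",
]

print([not_on_verbatim(v, "cabin crew") for v in verbatims])  # [False, False, False, True]
print([not_on_sentence(v, "cabin crew") for v in verbatims])  # [True, True, False, True]
```

For a query that consists only of a NOT constraint, Match on Verbatim is usually the setting that matches what you mean by "in the absence of".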

Sentiment Search on Sentences

Match on Sentence will also change the behavior of sentiment search, causing it to filter results based on the sentiment value of individual sentences instead of the verbatim as a whole. This can be especially useful for looking for specific verbatims where a concept is used directly in a positive or negative context as opposed to within a generally positive, negative, or mixed verbatim. It's important to note that the Sentiment Widget will always show sentiment statistics for verbatims regardless of whether you use Match on Verbatim or Match on Sentence.

Another important detail to note is that verbatims that have been classified as mixed are made up of positive and negative sentences, so you may see an increased number of results when searching for sentiment at the sentence level.

Let us consider an example that illustrates this behavior:

The food served onboard was delightful. The cabin crew were friendly as well. The seats were however uncomfortable and there was a poor collection of drinks.
The flight was relaxing and the food was good. However the flight was delayed by 2 hours which messed up my schedule overall.
The cabin crew were extremely rude when I asked for a refill for my drink. The food was up to the mark and boarding was handled with care.

All three verbatims above are classified as having mixed sentiment, since they contain both positive and negative sentences. Now let's say we search for NEGATIVE SENTIMENT.

With Match on Verbatim, none of the three verbatims will show up in the results, since the verbatim label assigned to them is mixed. However, if we use Match on Sentence, all three of these verbatims will be part of the results. This is because each of them contains at least one sentence that has a negative sentiment attached to it.
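The sketch below models this in Python. Two things in it are assumptions made purely for illustration: the per-sentence sentiment labels are assigned by hand (Kapiche calculates these for you), and the rule that rolls sentence labels up into a verbatim label is a simplified guess at the behavior described above, not Kapiche's actual sentiment model.

```python
def verbatim_label(sentence_labels):
    """Assumed roll-up rule: positive plus negative sentences give 'mixed'."""
    has_positive = "positive" in sentence_labels
    has_negative = "negative" in sentence_labels
    if has_positive and has_negative:
        return "mixed"
    if has_negative:
        return "negative"
    if has_positive:
        return "positive"
    return "neutral"

def negative_on_verbatim(sentence_labels):
    """Match on Verbatim: the verbatim's overall label must be negative."""
    return verbatim_label(sentence_labels) == "negative"

def negative_on_sentence(sentence_labels):
    """Match on Sentence: one negative sentence is enough."""
    return "negative" in sentence_labels

# Hand-assigned sentence labels for the three example verbatims (assumption).
labelled = [
    ["positive", "positive", "negative"],
    ["positive", "negative"],
    ["negative", "positive"],
]

print([verbatim_label(v) for v in labelled])        # ['mixed', 'mixed', 'mixed']
print([negative_on_verbatim(v) for v in labelled])  # [False, False, False]
print([negative_on_sentence(v) for v in labelled])  # [True, True, True]
```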

Questions? 🤔

As you can see, this new capability greatly increases your control over queries, and we hope these examples provide the context to help you use it effectively. If you have any questions about using sentence-level matching (or anything else!) you can get in touch with us any time by hitting the blue chat button to your right 👉
