The Operators section in the Query Builder provides specialized controls and filters for use when building Themes.
Word Count Operator
The Word Count operator filters unstructured text by the number of words Kapiche has detected in a particular verbatim or sentence. It behaves like a numerical field, so you can use the standard set of filters available for numerical data.
You can choose one of the preset values or type any positive whole number into the value field.
This operator also respects the verbatim/sentence level setting. For example, if we use Word Count < 3 on the sentence level, we can expect the following results:
The same search on the verbatim level applies the filter to the entire text of each verbatim.
It's important to note that the Word Count operator acts on words detected by Kapiche, and not every sequence of characters is detected as a word. In the first verbatim above, even though there are 3 distinct character sequences, they are not detected as words, so the filter matches the verbatim.
Email addresses such as firstname.lastname@example.org, or even redacted ones such as ********@gmail.com, are valid tokens that contribute to word count.
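To make the difference between the sentence and verbatim levels more concrete, here is a minimal Python sketch of a word count filter. It is not Kapiche's implementation: the tokenization rule (a chunk counts as a word only if it contains at least one letter or digit), the naive sentence splitter, and the function names count_words, split_sentences and matching_units are all assumptions made for illustration.

```python
import re


def count_words(text: str) -> int:
    # Assumption: a "word" is a whitespace-separated chunk that contains
    # at least one letter or digit. Kapiche's real tokenizer may differ.
    return sum(1 for chunk in text.split() if any(ch.isalnum() for ch in chunk))


def split_sentences(verbatim: str) -> list[str]:
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", verbatim.strip())
    return [part for part in parts if part]


def matching_units(verbatim: str, limit: int, level: str = "verbatim") -> list[str]:
    # Return the pieces of text whose word count is below `limit`.
    # On the sentence level each sentence is tested on its own;
    # on the verbatim level the whole text is tested once.
    units = split_sentences(verbatim) if level == "sentence" else [verbatim]
    return [unit for unit in units if count_words(unit) < limit]


if __name__ == "__main__":
    text = "Great service. I would definitely come back again."
    print(matching_units(text, 3, level="sentence"))
    print(matching_units(text, 3, level="verbatim"))
    print(count_words("... ?? !!"), count_words("********@gmail.com"))
```

Under these assumptions, Word Count < 3 on the sentence level picks out only the short first sentence, while on the verbatim level the same text does not match because the whole verbatim is counted. Punctuation-only sequences count as zero words, and a redacted email address counts as one.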
When you search for, e.g., Word Count is less than 2 and your Analysis has multiple text fields, the filter matches on any of those text fields. So if one of those text fields is mostly empty, the filter will match nearly 100% of records, as in the example below.
Two numbers in the example are worth noting. The top number is the match on records (the document count), which are the rows in your data, each of which can have multiple text fields/columns. The number in the Verbatims widget is the frame (verbatim) count and percentage, which doesn't take empty frames into account.
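The gap between the record count and the frame count can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Kapiche computes its numbers: the records, the field names feedback and follow_up, and the word-counting rule are all made up for the example.

```python
def count_words(text: str) -> int:
    # Assumption: a "word" is a whitespace-separated chunk that contains
    # at least one letter or digit. An empty field therefore has 0 words.
    return sum(1 for chunk in text.split() if any(ch.isalnum() for ch in chunk))


# Hypothetical data: each record (row) has two text fields,
# and the second field is empty in most rows.
records = [
    {"feedback": "The checkout flow was quick and painless", "follow_up": ""},
    {"feedback": "Delivery took far too long this time", "follow_up": ""},
    {"feedback": "Fine", "follow_up": "No further comments from me"},
]
TEXT_FIELDS = ("feedback", "follow_up")
LIMIT = 2  # Word Count is less than 2


def record_matches(record: dict) -> bool:
    # A record matches if ANY of its text fields is below the limit.
    return any(count_words(record[field]) < LIMIT for field in TEXT_FIELDS)


matching_records = [record for record in records if record_matches(record)]
record_pct = 100 * len(matching_records) / len(records)

# Frames counted only when they actually contain text.
matching_nonempty_frames = [
    record[field]
    for record in records
    for field in TEXT_FIELDS
    if record[field].strip() and count_words(record[field]) < LIMIT
]

if __name__ == "__main__":
    print(len(matching_records), record_pct)
    print(matching_nonempty_frames)
```

In this made-up data every record matches, because each one has at least one field with fewer than 2 words (usually the empty one), yet only a single non-empty frame actually satisfies the filter.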