What's the minimum number of verbatims you can analyse in Kapiche?

The short answer is that we recommend having about 500 records or rows of unstructured data (e.g. 500 survey responses, app reviews, emails, etc.) in order to generate a robust language model that you can work with.

The longer answer is that it largely comes down to how rich the verbatims themselves are in each data set (e.g. you can run meaningful analyses on data sets with only 50 records if they're rich in language).

For example, if you run an analysis on 100 records that average only 3-5 words each, your language model might not be that valuable, and you're probably better off just reading through the data manually.

But if those same 100 records were more like sentences or paragraphs, then you've got a whole lot more data to work with in Kapiche.
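If you'd like a quick feel for how rich your data is before uploading it, a rough check along these lines can help. This is only a minimal sketch and not part of Kapiche: the function name `summarise_verbatims` and the sample responses are made up for illustration, and it simply counts records and the average number of words per record.

```python
from statistics import mean

def summarise_verbatims(records):
    """Return the number of non-empty records and the average words per record.

    Hypothetical helper for illustration only; it is not a Kapiche API.
    """
    # Drop empty or whitespace-only rows so they don't skew the average
    texts = [r.strip() for r in records if r and r.strip()]
    # Count words by splitting on whitespace
    word_counts = [len(t.split()) for t in texts]
    return {
        "records": len(texts),
        "avg_words": mean(word_counts) if word_counts else 0,
    }

# Made-up sample responses
sample = [
    "Great app",
    "Too slow to load",
    "The checkout flow kept timing out on my phone, so I gave up and ordered elsewhere.",
]
summary = summarise_verbatims(sample)
print(summary)
```

A low record count together with an average of only a few words per record suggests the data may be quicker to read by hand, while longer responses give you more language to work with even when there are fewer of them.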

With this in mind, perhaps a better rule of thumb is this: if you can't get what you need by eyeballing the data in a feasible time frame, run it through Kapiche to see how it performs!

Often it's not just the quantity that matters, it's the quality too!

Tip: The one thing to keep in mind when running an analysis on smaller data sets is the sample size: the fewer individual records you have, the smaller your sample sizes will be.


Questions? 🤔

If you have any questions about Verbatim Counts (or if you just need some help!) you can get in touch with us any time by hitting the blue chat button to your right 👉
