A list of common terms and definitions used in the Kapiche product.

Written by Josh Winters
Updated over a week ago


Analysis

An analysis is the result of running Project Data through the Kapiche Analytics engine with specified settings. Analysis settings include options such as the date range and the text fields to be analyzed. An analysis always uses all available data within the Project it is run in.


Concept

A Concept is a meaningful term that occurs relatively frequently in the dataset.


Correlations

A correlation indicates how likely a theme is to be related to a concept or segment. In other words, it shows how likely or unlikely a concept or segment is to appear in the theme.


Dashboard

A dashboard is a screen with information populated by the Themes a user has selected.

Data Coverage

Data Coverage refers to the percentage of total records that an element (query, concept, term, segment, etc.) appears in. A concept, term, or phrase is counted only once per text excerpt even if it appears multiple times within the excerpt.


Field

A field, or variable, is the label given to a column header within a data file. Examples of fields include: Age, Date, Gender, Satisfaction_Score. A field can contain multiple segments, which are the values of that field.


Frequency

Frequency is the number of times something occurs. A concept, term, or phrase is counted only once per text excerpt even if it appears multiple times within the excerpt.
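
The once-per-excerpt counting rule can be illustrated with a minimal sketch. It assumes one text excerpt per record and simple whitespace tokenization; the function names `frequency` and `data_coverage` are hypothetical and are not part of the Kapiche product.

```python
def frequency(term: str, excerpts: list[str]) -> int:
    """Count the excerpts that mention `term`, counting each excerpt once."""
    needle = term.lower()
    return sum(1 for text in excerpts if needle in text.lower().split())


def data_coverage(term: str, excerpts: list[str]) -> float:
    """Percentage of all excerpts that mention `term` at least once."""
    if not excerpts:
        return 0.0
    return 100.0 * frequency(term, excerpts) / len(excerpts)


excerpts = [
    "the app is slow and the app crashes",  # "app" appears twice, counted once
    "love the mobile app",
    "queue time was too long",
    "support was friendly",
]

print(frequency("app", excerpts))      # 2
print(data_coverage("app", excerpts))  # 50.0
```

The first excerpt mentions "app" twice but contributes only one count, so the frequency is 2 and the coverage is 2 of 4 records, or 50%.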


Influence

Influence is a measurement of how much the observed co-occurrence of two concepts or segments exceeds what would be expected.
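
This glossary does not give the formula behind Influence. As a rough illustration of the idea only, the sketch below compares the observed co-occurrence rate of two items with the rate that would be expected if they appeared independently of each other. The independence baseline and the function name `cooccurrence_excess` are assumptions made for this example, not the Kapiche calculation.

```python
def cooccurrence_excess(records: list[set[str]], a: str, b: str) -> float:
    """Observed co-occurrence rate minus the rate expected under independence.

    Assumption for illustration: "expected" means P(a) * P(b).
    """
    n = len(records)
    if n == 0:
        return 0.0
    count_a = sum(1 for r in records if a in r)
    count_b = sum(1 for r in records if b in r)
    count_both = sum(1 for r in records if a in r and b in r)
    observed = count_both / n
    expected = (count_a / n) * (count_b / n)
    return observed - expected


# Each record is the set of concepts or segments found in it.
records = [
    {"app", "slow"},
    {"app", "slow"},
    {"app"},
    {"price"},
]

print(cooccurrence_excess(records, "app", "slow"))    # 0.125
print(cooccurrence_excess(records, "price", "slow"))  # -0.125
```

A positive value means the pair appears together more often than the baseline predicts; a negative value means less often.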


N-gram

An n-gram, or phrase, is a string of two or more terms. The n-grams that appear in an analysis are the phrases that occur multiple times in the dataset. Like everything the Kapiche analytics engine does, they are derived from the data and not predetermined. Examples of n-grams: "loading screen", "queue time", "mobile app".
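
To show what "derived from the data" can look like, here is a minimal sketch that finds two-term phrases occurring more than once in a set of excerpts. The whitespace tokenization, the `min_count` threshold, and the function name `repeated_bigrams` are assumptions for this example; the Kapiche engine's actual method is not described here.

```python
from collections import Counter


def repeated_bigrams(excerpts: list[str], min_count: int = 2) -> dict[str, int]:
    """Return two-term phrases that occur at least `min_count` times."""
    counts: Counter = Counter()
    for text in excerpts:
        tokens = text.lower().split()
        # Pair each token with its successor to form bigrams.
        counts.update(zip(tokens, tokens[1:]))
    return {" ".join(pair): c for pair, c in counts.items() if c >= min_count}


excerpts = [
    "the loading screen is slow",
    "stuck on the loading screen",
    "queue time is long",
]

print(repeated_bigrams(excerpts))
# {'the loading': 2, 'loading screen': 2}
```

The phrase "the loading" also passes the threshold in this naive version, which is one reason a real pipeline would typically filter out stopwords before forming phrases.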


Normalize

Normalizing data eliminates the units of measurement so that variables are more comparable to each other. When the Normalize option is active on certain visualizations, the data is graphed in terms of its relative frequency at each data point. Since each data point may have a different total number of records, this allows you to view the underlying frequency trends regardless of the amount of data collected at each point.
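
The effect of normalizing can be seen in a small worked example. The sketch below assumes that relative frequency means matching records as a percentage of all records at that data point; the month labels, the numbers, and the function name `relative_frequency` are made up for illustration.

```python
def relative_frequency(matches: dict[str, int], totals: dict[str, int]) -> dict[str, float]:
    """Matching records as a percentage of all records at each data point."""
    result = {}
    for point, total in totals.items():
        count = matches.get(point, 0)
        result[point] = 100.0 * count / total if total else 0.0
    return result


# Hypothetical monthly data: records mentioning a concept vs. all records.
totals = {"Jan": 200, "Feb": 50}
matches = {"Jan": 40, "Feb": 20}

print(relative_frequency(matches, totals))
# {'Jan': 20.0, 'Feb': 40.0}
```

By raw count, January looks larger (40 versus 20), but after normalizing, February shows the concept in twice the share of its records.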


Project

A Project is a container for related data and its analyses. Each Project has its own user list, known as Project Members.

Project Member

A user of the Site with access to a specific Project.

Project Data

Data that has been added to a Project by uploading files or through an integration. All Project Data must share the same column structure, because every analysis uses all of the Project Data. Mixing unrelated data files in a Project will make the resulting analyses uninterpretable.

Raw Freq.

Short for "Raw Frequency"; the raw occurrence values are shown rather than a percentage.


Record

A record, or row, is one entry in a data source.


Segment

A segment is a value of a field. For example, "Male" would be a segment of the field "Gender".
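
The relationship between records, fields, and segments can be sketched with a tiny made-up dataset. Each dictionary below stands for one record, each key for a field, and the distinct values of a field for its segments; the data and the function name `segments` are hypothetical.

```python
# Each dictionary is one record (row); each key is a field (column header).
records = [
    {"Gender": "Male", "Satisfaction_Score": 7, "Comment": "queue time was long"},
    {"Gender": "Female", "Satisfaction_Score": 9, "Comment": "love the mobile app"},
    {"Gender": "Male", "Satisfaction_Score": 4, "Comment": "loading screen froze"},
]


def segments(rows: list[dict], field: str) -> list:
    """Return the distinct values (segments) of a field, sorted."""
    return sorted({row[field] for row in rows})


print(segments(records, "Gender"))  # ['Female', 'Male']
```

Here "Gender" is a structured field with two segments, while "Comment" holds the free-text data.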


Sentiment

Sentiment is an indication of the emotion within the unstructured data. See the article on Sentiment for more information.


Stopwords

Stopwords are words that are identified before analysis and excluded from it because they do not add value to the language model. Examples of stopwords include: "a", "the", "I". Analysts have full control over the stopwords at the analysis level.
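
As a minimal sketch of the idea, the snippet below drops stopwords from an excerpt before any further processing. The stopword set contains only the three examples given above, and the function name `remove_stopwords` is hypothetical; the actual list used in an analysis is whatever the analyst configures.

```python
# Illustrative stopword set taken from the examples above (lowercased).
STOPWORDS = {"a", "the", "i"}


def remove_stopwords(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Lowercase and split the text, then drop any stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


print(remove_stopwords("I waited in the queue for a refund"))
# ['waited', 'in', 'queue', 'for', 'refund']
```

Passing a different set as `stopwords` mirrors the way the list can be adjusted for each analysis.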

Structured Data

Structured data refers to any data that sits within a fixed field in a data file. Examples of structured data include: Gender, Date, and Location. Any value that is selected from a fixed set of options, rather than freely written, falls into this category.

Text Excerpt

A text excerpt is the data from a text field in one record of a data source. A record may include multiple text excerpts if the file has more than one text field. In a customer survey context, you can think of a text excerpt as a customer's written response to one question in the survey.

Unstructured Data

Unstructured data is data that does not have a predefined model, that is, data whose values are not chosen from a fixed selection. In the context of Kapiche, unstructured data is free-text data.


User

A user is any account with some form of access to your Kapiche Site. This is different from a Project Member, which is specific to a Project. All Project Members are users on a Site, but not all users are members of every Project.


Verbatim

An open-ended or unstructured verbatim is a piece of text that a human has written, such as an email message or an answer to a survey question like "Why did you give us that score?"
