Including List Fields in Your Dataset

What Are LIST Fields?

LIST fields are a flexible way to store multiple values within a single field or cell in your dataset. Unlike a regular single-value categorical field (like "Region" with values such as "City" or "Regional"), a LIST field can contain multiple values. For example, a dataset about customer support interactions could have a LIST field for "Reason for contact" with the values "account access, balance inquiry, transaction inquiry" all in one cell.

How LIST Fields Are Processed

After you upload your dataset to Kapiche, you will see the "Select Column Data Types" screen. At this stage, Kapiche automatically identifies any LIST fields and assigns them the "LIST" data type. You can also assign this data type yourself from the data type dropdown. The field then appears throughout Kapiche wherever you would expect to see a standard categorical field, available for analysis and aggregation.

While the processing is automatic, understanding the formatting rules can help you avoid common errors.

1. Formatting for Comma-Separated Values

The standard format for LIST fields is a simple comma-separated list.

Comma-Separated Values: Use commas to separate individual values, e.g. "bakery, deli, confectionery".
Natural Language Conjunctions: The word "and" and the ampersand symbol "&" are automatically treated as commas.
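
If you want to preview how a cell might be split before uploading, the rules above can be approximated in a few lines of Python. This is only a minimal sketch, not Kapiche's actual implementation: the function name split_list_value is hypothetical, and it assumes that only the lowercase word "and" counts as a conjunction.

```python
import re

# Assumed separators: a comma, an ampersand, or the standalone word "and".
# The \b word boundaries keep words such as "brand" or "candy" intact.
_SEPARATORS = re.compile(r",|&|\band\b")


def split_list_value(cell: str) -> list[str]:
    """Split a raw cell on commas, "&" and the word "and" (hypothetical helper)."""
    parts = _SEPARATORS.split(cell)
    # Trim surrounding whitespace and drop pieces that end up empty.
    return [part.strip() for part in parts if part.strip()]


print(split_list_value("bakery, deli and confectionery"))
print(split_list_value("sales & marketing, support"))
```

This sketch does not handle quoted items, which are covered in the next section.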

2. Handling Complex Values with Quotes

If a list item itself contains a comma, the word "and", or an "&", enclose the entire item in quotes to prevent it from being split incorrectly. You can use either single quotes ('...') or double quotes ("..."), but don't mix them within the same item.

Example with a Comma: To ensure "Smith, John" is treated as a single item, use quotes: "Smith, John", "Doe, Jane"
Example with a Conjunction: To keep "bread and butter" as a single item, use quotes: "bread and butter", salt, pepper
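
To see why quoting protects an item, here is a rough Python sketch of a quote-aware splitter. It is an illustration under assumptions, not the parser Kapiche uses: parse_list_value is a hypothetical name, a quoted item is only recognised when the quotes wrap the whole item, and unclosed quotes are treated as ordinary characters rather than reported as errors.

```python
import re

# A fully quoted item ('...' or "...") that runs up to the next separator or the end.
_QUOTED = re.compile(r"""\s*(?:"([^"]*)"|'([^']*)')\s*(?=,|&|\band\b|\Z)""")
# An unquoted item: everything up to the next separator or the end.
_UNQUOTED = re.compile(r"(.*?)(?=,|&|\band\b|\Z)", re.DOTALL)
_SEPARATOR = re.compile(r",|&|\band\b")


def parse_list_value(cell: str) -> list[str]:
    """Split a cell into items, keeping quoted items whole (hypothetical helper)."""
    items = []
    pos = 0
    while pos < len(cell):
        quoted = _QUOTED.match(cell, pos)
        if quoted:
            # Keep the text between the quotes exactly as written.
            first, second = quoted.group(1), quoted.group(2)
            item = first if first is not None else second
            pos = quoted.end()
        else:
            unquoted = _UNQUOTED.match(cell, pos)
            item = unquoted.group(1).strip()
            pos = unquoted.end()
        if item:
            items.append(item)
        # Step over the separator that ended this item, if there is one.
        separator = _SEPARATOR.match(cell, pos)
        if separator:
            pos = separator.end()
    return items


print(parse_list_value('"Smith, John", "Doe, Jane"'))
print(parse_list_value('"bread and butter", salt, pepper'))
```

Without the quotes, the same sketch would split "bread and butter" into two items, which is the behaviour the quoting rule is designed to prevent.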

3. Automatic Processing

Our system automatically handles common formatting issues to simplify your data preparation.

Whitespace: Extra spaces around commas are automatically removed. Example: apple , banana , orange becomes ['apple', 'banana', 'orange']
Empty Values: Any empty items in the list are automatically discarded. Example: apple, , banana, orange becomes ['apple', 'banana', 'orange']
Case Sensitivity: The original capitalization of your data is preserved. Example: Apple, BANANA, CamelCase becomes ['Apple', 'BANANA', 'CamelCase']

4. Best Practices

Use quotes when your items contain:

Commas: "Smith, John"
Conjunctions: "bread and butter"
Leading/trailing spaces you want to preserve: " padded item "
Special characters: "item@domain.com"

How LIST Fields Affect Aggregations

When you create an aggregation or a chart based on a LIST field, you might notice that the total counts are greater than the total number of records in your dataset. Don't worry, this is expected. Because a single record can have multiple values in a LIST field, it is counted multiple times in the aggregation, once for each value it contains.

For example, if you have a dataset of 1,000 survey responses and a LIST field for "Which services have you used?", an aggregation by this field might show "Online Banking (600)", "Mobile App (450)", "ATM (300)" and "Customer Support (200)", totaling 1,550. The total exceeds 1,000 because many respondents selected more than one service.
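
The counting behaviour can be reproduced with a small Python sketch. The three records below are made-up illustration data, and count_list_values is a hypothetical helper rather than anything Kapiche exposes; it simply tallies every value in every record.

```python
from collections import Counter


def count_list_values(records: list[list[str]]) -> Counter:
    """Count each LIST value once for every record that contains it."""
    counts = Counter()
    for values in records:
        counts.update(values)
    return counts


# Hypothetical survey responses, already split into lists of values.
records = [
    ["Online Banking", "Mobile App"],
    ["ATM"],
    ["Online Banking", "ATM", "Customer Support"],
]

counts = count_list_values(records)
print(counts)
print("records:", len(records), "total of counts:", sum(counts.values()))
```

Here three records produce six counted values, for the same reason the 1,000 survey responses above add up to 1,550.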

If You Experience an Error Message

If you encounter processing errors, they're usually due to:

Mismatched quotes: Check that all quotes are properly closed
Unexpected characters: Some special characters might need to be quoted
Empty fields: Make sure required LIST fields aren't completely empty
Encoding issues: Ensure your file uses consistent character encoding (UTF-8 recommended)
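
If you prepare your data with a script, you can catch some of these problems before uploading. The following Python sketch is an assumption about what such a check could look like, not a description of Kapiche's own validation: check_cell is a hypothetical helper, and it only looks at double quotes because single quotes often appear as apostrophes.

```python
def check_cell(raw: bytes) -> list[str]:
    """Return a list of likely problems for one raw LIST cell (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Nothing else can be checked reliably if the bytes are not UTF-8.
        return ["not valid UTF-8"]

    problems = []
    # An odd number of double quotes means at least one quote is never closed.
    if text.count('"') % 2 != 0:
        problems.append("unclosed double quote")
    if not text.strip():
        problems.append("empty value")
    return problems


print(check_cell(b'"Smith, John", "Doe, Jane"'))
print(check_cell(b'"Smith, John, Doe'))
```

An empty result means none of these particular issues were found; it does not guarantee that the upload will succeed.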
