We’ve put together our suggested best practices for formatting your data so that you get the best results with Kapiche.
Let's take a look!
Supported file formats
Data should be in a spreadsheet format: csv, xls or xlsx.
Note: If you are using an xls or xlsx spreadsheet with multiple sheets, the data to be analysed by Kapiche will need to be in the first sheet.
File size limits
At the moment, each uploaded file is limited to 500K records and a maximum file size of 500MB for CSV files (100MB for xlsx files). If the total number of records you would like to upload exceeds this limit, you will need to upload them in batches.
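If your data lives in one large CSV, the batching described above can be scripted. The following is a minimal sketch only, not a Kapiche tool: the function name `split_csv` and the `batch_N.csv` file names are made up for illustration, and it assumes a UTF-8 CSV whose first row is the header. It splits on the record limit only, so you would still need to check the size of each output file yourself.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, max_records=500_000):
    """Split a CSV into files of at most max_records data rows each.

    The header row is repeated at the top of every batch file.
    Returns the list of batch file paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return written
        out, writer, count = None, None, 0
        for row in reader:
            # Start a new batch file at the beginning and whenever the
            # current one has reached the record limit.
            if writer is None or count == max_records:
                if out is not None:
                    out.close()
                path = out_dir / f"batch_{len(written) + 1}.csv"
                out = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                written.append(path)
                count = 0
            writer.writerow(row)
            count += 1
        if out is not None:
            out.close()
    return written
```

For example, `split_csv("responses.csv", "batches")` would write `batches/batch_1.csv`, `batches/batch_2.csv` and so on, each ready to upload separately.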
1) Each column should correspond to a specific field (e.g. “Age”, “Location”, etc.) and fields should not be repeated or spread across multiple columns (e.g. for “Gender” there should be a single column for that field, rather than separate columns for Male, Female & Other).
2) The first row in your spreadsheet needs to be column headers (every column requires a heading - the headings cannot be blank).
3) Each row (after the first) should correspond to an individual survey response, support conversation, product review, etc.
4) There should be at least one free text (unstructured/verbatim) field and at least 500 free text responses (note: this is not a hard-and-fast rule - our system can definitely support smaller quantities, particularly if the responses are detailed and rich with text).
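If you would like to check a CSV against the points above before uploading, a short script can help. This is a rough sketch under stated assumptions, not part of Kapiche: the function names `check_headers` and `count_text_responses` are hypothetical, and the example column name “Comment” is only a stand-in for whatever your free text field is called.

```python
import csv
import io


def check_headers(csv_text):
    """Return a list of problems found in the header (first) row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    seen = set()
    for position, name in enumerate(header, start=1):
        label = name.strip()
        if not label:
            problems.append(f"column {position}: blank heading")
        elif label.lower() in seen:
            problems.append(f"column {position}: repeated heading '{label}'")
        seen.add(label.lower())
    return problems


def count_text_responses(csv_text, text_column):
    """Count the rows that have a non-empty value in the free text column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if (row.get(text_column) or "").strip())


sample = "Age,Comment\n30,Great\n41,\n25,  \n52,Slow support\n"
print(check_headers(sample))                    # an empty list means no problems found
print(count_text_responses(sample, "Comment"))  # compare against the 500 guideline
```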
Things to avoid
1) Non-numeric values in score / rating fields, e.g. “10 - Great” instead of “10”. When non-numeric text is included in the same column, our system will not be able to generate important tools & insights (such as NPS benchmarking and tracking).
2) Mixing numeric values for scores with non-numeric values (e.g. if most of your responses are numeric values but some of them also include a text label, Kapiche won't be able to calculate or include the responses which have both).
3) Ambiguous date formats (e.g. 01/10/2020 could mean the 1st of October or the 10th of January). To prevent this, we recommend using the YYYY/MM/DD format for your dates!
4) Fields that are split across multiple columns; e.g. “Gender” is split across two columns: “Gender: Male” and “Gender: Female” with values “1” and “0” instead of a single “Gender” column with values “Male” and “Female”.
5) Rows that don’t correspond to a record of customer data (e.g. titles, dates & other descriptive information about the document or data in the first rows, a “totals” row that sums up the values of numeric columns, etc.).
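Several of the issues above can be cleaned up with a small script before you upload. The helpers below are an illustrative sketch only: the names `clean_score`, `to_unambiguous_date` and `collapse_indicators` are hypothetical, and each one assumes a particular input shape (scores that start with the number, dates written with slashes, and indicator columns named like “Gender: Male” holding “1” or “0”). You need to know whether your source dates are day-first or month-first, as the script cannot work that out for you.

```python
import re
from datetime import datetime


def clean_score(value):
    """Keep only the leading number, e.g. '10 - Great' becomes '10'.

    Returns None when the value does not start with a number.
    """
    match = re.match(r"^\s*(-?\d+(?:\.\d+)?)", value)
    return match.group(1) if match else None


def to_unambiguous_date(value, day_first):
    """Rewrite a DD/MM/YYYY or MM/DD/YYYY date as YYYY/MM/DD."""
    source_format = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(value, source_format).strftime("%Y/%m/%d")


def collapse_indicators(row, field):
    """Merge columns such as 'Gender: Male' / 'Gender: Female' into one 'Gender' column.

    The merged value is left blank unless exactly one indicator is '1'.
    """
    prefix = f"{field}: "
    chosen = [
        key[len(prefix):]
        for key, value in row.items()
        if key.startswith(prefix) and value.strip() == "1"
    ]
    merged = {k: v for k, v in row.items() if not k.startswith(prefix)}
    merged[field] = chosen[0] if len(chosen) == 1 else ""
    return merged


print(clean_score("10 - Great"))
print(to_unambiguous_date("01/10/2020", day_first=True))
print(collapse_indicators({"ID": "7", "Gender: Male": "0", "Gender: Female": "1"}, "Gender"))
```

Values that `clean_score` cannot parse come back as `None`, so you can review those rows by hand rather than having them silently changed.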
Our suggestions above are quite general, as every data set is unique. If you have any questions about your specific use case, reach out to us using the chat icon towards the bottom-right of any screen so that we can offer you more personalized guidance! 👉