Kapiche can automatically transcribe audio recordings of calls, converting spoken conversations into analyzable text data. This allows you to analyze call recordings alongside your other customer feedback sources within a single project.
Supported Audio Formats
Before uploading, ensure your audio file meets the following requirements:
File formats: MP4, M4V, FLAC, MP3, WAV
Maximum file size: 550 MB per file
Files larger than 550 MB cannot be processed and will need to be split into smaller segments before upload.
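The requirements above can be checked locally before you upload. The following is a minimal sketch, not Kapiche's own validation logic: the function name `check_audio_file` is hypothetical, formats are judged by file extension only, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from pathlib import Path

# Formats and size limit as listed in this article.
SUPPORTED_EXTENSIONS = {".mp4", ".m4v", ".flac", ".mp3", ".wav"}
MAX_SIZE_MB = 550
BYTES_PER_MB = 1024 * 1024  # assumption: the article does not define "MB"


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_SIZE_MB * BYTES_PER_MB:
        size_mb = size_bytes / BYTES_PER_MB
        problems.append(f"too large: {size_mb:.1f} MB, limit {MAX_SIZE_MB} MB")
    return problems
```

For a file on disk, the size can be read with `Path(path).stat().st_size` and passed in as `size_bytes`.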
Uploading Audio Files
Audio files are uploaded using the same process as standard data files. The system automatically detects audio formats and routes them for transcription processing.
Step 1: Navigate to Your Project
Open the project where you want to add the transcribed call data. If you haven't created a project yet, see Creating a Project & Uploading Data.
Step 2: Add Your Audio File
Click Add Data and select your audio file from your computer. The system will validate the file format and size before uploading.
Step 3: Upload and Processing
Once you select a valid audio file:
- The file uploads to secure storage
- You'll receive a confirmation message: "Audio file uploaded. Transcription is processing in the background."
- The file status displays as "Transcribing"
- You can navigate away from the page while transcription continues
Step 4: Transcription Completion
Transcription processing time varies based on the audio file length. When complete:
- The file status changes from "Transcribing" to "Finished"
- The transcribed text becomes available in your project
- You can analyze the transcribed content using all standard Kapiche analysis features
Common Issues
File Size Limit Exceeded
Error message: "File exceeds size limits. File: [size] MB, Limit: 550 MB"
Resolution: Use audio editing software to split the recording into segments under 550 MB, or re-encode the audio at a lower bitrate to reduce file size.
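To decide between splitting and re-encoding, a rough calculation helps. The sketch below is illustrative only: the helper names are hypothetical, 1 MB is assumed to be 1024 × 1024 bytes, and the bitrate estimate ignores container overhead, so leave some headroom.

```python
import math

LIMIT_MB = 550
BYTES_PER_MB = 1024 * 1024  # assumption: the article does not define "MB"


def segments_needed(size_bytes: int) -> int:
    """Smallest number of equal-sized segments that each fit within the limit."""
    return max(1, math.ceil(size_bytes / (LIMIT_MB * BYTES_PER_MB)))


def max_bitrate_kbps(duration_seconds: float) -> float:
    """Highest average bitrate (kbit/s) that keeps a recording within the limit."""
    limit_bits = LIMIT_MB * BYTES_PER_MB * 8
    return limit_bits / duration_seconds / 1000
```

For example, a 1200 MB recording needs at least three segments, while a one-hour recording fits in a single file at any average bitrate below roughly 1280 kbit/s.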
Unsupported File Format
Error message: "File format not supported"
Resolution: Convert your audio file to one of the supported formats (MP4, M4V, FLAC, MP3, or WAV) using audio conversion software.
Extended Processing Time
Transcription processing time scales with audio length. As a general guideline, a 60-minute call typically processes within several minutes. If your file remains in "Transcribing" status significantly longer than expected, contact support.
Understanding Your Transcribed Data
Output Format
When transcription completes, Kapiche generates a CSV file containing your transcribed text and metadata. The output file keeps the original audio filename, with the file extension replaced by `_transcript.csv`.
Example: `customer_call.mp3` becomes `customer_call_transcript.csv`
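If you need to match transcripts back to recordings in your own scripts, the naming rule can be reproduced as follows. This is a sketch inferred from the single example above; `transcript_name` is a hypothetical helper, not part of Kapiche.

```python
from pathlib import Path


def transcript_name(audio_filename: str) -> str:
    """Drop the audio extension and append the `_transcript.csv` suffix."""
    return f"{Path(audio_filename).stem}_transcript.csv"
```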
Standard Output Fields
All transcribed files include these fields:
call_transcript: The complete transcribed conversation with speaker labels. Each speaker turn appears on its own line, prefixed with the speaker label:
Agent: How can I help you today?
Customer: I'm calling about my recent order.
Agent: I'd be happy to help with that.
filename: Your original audio filename, for traceability
format: The detected filename format (see Filename Conventions below)
transcription_id: Unique identifier for the transcription
date: Recording date extracted from your filename, or the upload date if no date pattern is detected
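Because `call_transcript` spans multiple lines, a transcript CSV is best read with a CSV parser rather than line by line. The sketch below uses Python's standard `csv` module on an in-memory sample. The column order, the `format` value `generic`, and the `transcription_id` value are made up for illustration, and the sample assumes the multi-line field is quoted in the usual CSV way.

```python
import csv
import io

# Hypothetical sample row; real values and column order may differ.
SAMPLE = (
    "call_transcript,filename,format,transcription_id,date\n"
    '"Agent: How can I help you today?\n'
    "Customer: I'm calling about my recent order.\","
    "customer_call.mp3,generic,example-id-001,2024-01-15\n"
)


def read_transcripts(csv_text: str) -> list[dict[str, str]]:
    """Parse transcript CSV text into one dict per transcribed call."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_transcripts(SAMPLE)
# The quoted field keeps its line breaks, one speaker turn per line.
turn_lines = rows[0]["call_transcript"].splitlines()
```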
Speaker Identification
Kapiche automatically identifies and labels speakers in your calls:
- Agent: Customer service or sales representatives. If a call is transferred, agents are labeled Agent 1 and Agent 2.
- Customer: The caller
- Interactive Voice Response: Automated IVR systems
- Bot: A voice AI agent, if one is used in the call
This speaker identification allows you to analyze agent responses separately from customer feedback.
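Outside Kapiche, the same separation can be approximated by splitting each transcript line at its first colon. This is a minimal sketch with a hypothetical helper name, assuming every line follows the `Speaker: text` layout shown earlier; a line without a label, or one whose text wraps onto a second line containing a colon, would need extra handling.

```python
def split_turns(call_transcript: str) -> list[tuple[str, str]]:
    """Split a transcript into (speaker, text) pairs, one per labeled line."""
    turns = []
    for line in call_transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


transcript = (
    "Agent: How can I help you today?\n"
    "Customer: I'm calling about my recent order.\n"
    "Agent: I'd be happy to help with that."
)
turns = split_turns(transcript)
customer_only = [text for speaker, text in turns if speaker == "Customer"]
```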
Filename Conventions for Enhanced Metadata
Using standardized filename formats enables Kapiche to extract additional metadata from your call recordings.
Five9 Format
If your files follow Five9 naming conventions, Kapiche automatically extracts additional fields:
Format: `phone_number by agent@email.com @ H_MM_SS AM/PM.wav`
Example: `8284931690 by roger.smith@example.com @ 2_07_35 PM.wav`
Extracted fields:
- phone_number
- agent_email
- start_time
- date
CXone Format
For CXone (NICE inContact) recordings:
Format: `CXone recording_Agent Name_YYYY-MM-DD_HH-MM[UTC]_uuid.mp4`
Example: `CXone recording_Leeann Pomeroy-Jones_2025-04-09_07-22[UTC]_abc123.mp4`
Extracted fields:
- agent_name (full name)
- agent_display (first name + last initial)
- date
- datetime
Generic Date Extraction
If your files don't match Five9 or CXone formats, Kapiche attempts to extract dates using these patterns:
Unix timestamp: `1609459200000.wav` (milliseconds since epoch)
ISO date format: `recording_2023-12-25.mp4`
Underscore format: `audio_2023_12_25.wav`
Compact format: `call_20231225.wav`
If no date pattern is detected, Kapiche uses the upload date.
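The four patterns and the fallback can be sketched as below. This is an approximation for checking your own filenames, not Kapiche's implementation: the order in which patterns are tried, the 13-digit test for millisecond timestamps, and the use of UTC when converting a timestamp to a date are all assumptions, and `extract_date` is a hypothetical name.

```python
import re
from datetime import date, datetime, timezone
from pathlib import Path

# YYYY-MM-DD, YYYY_MM_DD, or YYYYMMDD, not embedded in a longer digit run.
_DATE_RE = re.compile(r"(?<!\d)(\d{4})([-_]?)(\d{2})\2(\d{2})(?!\d)")


def extract_date(filename: str, upload_date: date) -> date:
    """Return the date found in the filename, or upload_date if none is found."""
    stem = Path(filename).stem
    # A 13-digit name is treated as milliseconds since the Unix epoch.
    if re.fullmatch(r"\d{13}", stem):
        return datetime.fromtimestamp(int(stem) / 1000, tz=timezone.utc).date()
    match = _DATE_RE.search(stem)
    if match:
        year, _, month, day = match.groups()
        try:
            return date(int(year), int(month), int(day))
        except ValueError:
            pass  # digits that do not form a real calendar date
    return upload_date
```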
Best Practices
Audio quality: Clear audio with minimal background noise produces more accurate transcriptions. Transcription accuracy ultimately depends on recording quality.
File organization: Use descriptive file names to identify calls easily within your project (e.g., "Customer-Support-Call-2024-01-15.mp3").
Multiple recordings: You can upload multiple audio files to the same project. Each file is transcribed independently and added to your project data.
---
*If you encounter issues uploading audio files for transcription, contact support with the file format, file size, and any error messages received.*
