Merge & Split Speakers
Even with automatic diarization, speaker attribution isn't always perfect. This guide shows you how to fix over-segmentation (merging speakers) and under-segmentation (splitting utterances) to create an accurate transcript.
Understanding Speaker Segmentation Errors
Over-Segmentation
What It Is:
- One person incorrectly identified as multiple speakers
- Creates duplicate speaker IDs for the same person
Example:
[00:00:05] Speaker 1
Hello, my name is John.
[00:00:15] Speaker 2
As I was saying, we need to focus on quality.
[00:00:30] Speaker 1
This is really important for our team.
Actual Reality: Speaker 1 and Speaker 2 are both John
Why It Happens:
- Speaker's voice changes (loud/soft, emotional/calm)
- Background noise interrupts voice signature
- Audio quality variations
- Coughing or throat-clearing between segments
- Long pauses causing re-identification
Under-Segmentation
What It Is:
- Multiple speakers incorrectly combined into one speaker turn
- Single utterance contains speech from different people
Example:
[00:00:05] Speaker 1
Thank you for joining us today. Thanks for having me. Let's begin.
Actual Reality:
"Thank you for joining us today" = Speaker 1 (Host)
"Thanks for having me" = Speaker 2 (Guest)
"Let's begin" = Speaker 1 (Host)
Why It Happens:
- Rapid back-and-forth conversation
- No pause between speakers
- Speakers talking over each other briefly
- Very short speaker turns missed
Merging Speakers (Fixing Over-Segmentation)
When to Merge
Indicators You Need to Merge:
- Two speaker IDs sound like the same person
- Same voice characteristics
- Logical continuity suggests same speaker
- More speaker IDs than actual people in audio
How to Identify Duplicates:
- Count speaker IDs in the dropdown (e.g., 8 speakers)
- Count actual distinct voices by listening
- If speaker IDs > actual voices, you have duplicates
Method: Rename Speaker to Merge
How It Works: When you rename any speaker, ALL utterances with that speaker's old name are automatically changed to the new name. If the new name matches an existing speaker, this effectively merges them.
Step-by-Step:
1. Identify Duplicates
- Determine which speaker IDs are the same person
- Example: "Speaker 1" and "Speaker 4" are both John
2. Choose Target Speaker
- Decide which name to keep (usually the one with more utterances)
- Example: Keep "Speaker 1", merge "Speaker 4" into it
3. Execute Merge Through Rename
- Click any utterance labeled "Speaker 4"
- Click the speaker name to open the dropdown menu
- Click the "Rename this speaker" option
- Type "Speaker 1" (the target speaker name)
- Press Enter or click Save
4. Verify
- All former "Speaker 4" utterances now show "Speaker 1"
- "Speaker 4" disappears from the speaker list
- "Speaker 1" now has all combined utterances
Example Workflow:
Before Merge:
- Speaker 1: 15 utterances
- Speaker 2: 20 utterances
- Speaker 3: 8 utterances
- Speaker 4: 12 utterances (actually same as Speaker 1)
Action:
Click any "Speaker 4" → Click speaker name → "Rename this speaker"
→ Type "Speaker 1" → Enter
After Merge:
- Speaker 1: 27 utterances (15 + 12)
- Speaker 2: 20 utterances
- Speaker 3: 8 utterances
- Speaker 4: (removed)
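The counts above follow directly from rename being a global relabel. The sketch below is a conceptual model only, not the editor's implementation: it assumes a transcript is a list of records carrying a speaker label, and `Utterance`, `rename_speaker`, and `speaker_counts` are hypothetical names. It compares names exactly, so the editor's case-insensitive matching is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str


def rename_speaker(utterances: list[Utterance], old: str, new: str) -> None:
    """Relabel every utterance whose speaker is `old` (hypothetical model).

    Because the change is global, renaming to a name that already exists
    merges the two speakers into one.
    """
    for u in utterances:
        if u.speaker == old:
            u.speaker = new


def speaker_counts(utterances: list[Utterance]) -> Counter:
    """Model the speaker list as the set of names still in use, with counts."""
    return Counter(u.speaker for u in utterances)


# Rebuild the "Before Merge" counts from the example workflow.
transcript = (
    [Utterance("Speaker 1", "...") for _ in range(15)]
    + [Utterance("Speaker 2", "...") for _ in range(20)]
    + [Utterance("Speaker 3", "...") for _ in range(8)]
    + [Utterance("Speaker 4", "...") for _ in range(12)]
)

rename_speaker(transcript, "Speaker 4", "Speaker 1")
print(dict(speaker_counts(transcript)))
```

Note that "Speaker 4" vanishes from the counts on its own: nothing deletes it explicitly, it simply has no utterances left.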
Important Notes About Merging
Automatic Global Replacement:
- When you rename a speaker, the change applies to ALL utterances with that speaker name
- Renaming can't target a single utterance; to change just one, select a different speaker from the dropdown instead
- This is the intended behavior for merging speakers
Name Matching:
- Speaker names are matched case-insensitively
- "Speaker 1", "speaker 1", and "SPEAKER 1" are treated as the same
- Renaming to any variation will merge with existing speaker
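Case-insensitive matching means a rename merges whenever the new name equals an existing one apart from capitalization. A minimal sketch, assuming the comparison behaves like Python's `str.casefold`; `same_speaker` and `merge_target` are hypothetical helpers, and they only answer "would this merge?", not which capitalization the editor displays afterwards.

```python
from typing import Optional


def same_speaker(a: str, b: str) -> bool:
    """Case-insensitive name comparison (assumed to behave like casefold)."""
    return a.casefold() == b.casefold()


def merge_target(new_name: str, existing: list[str]) -> Optional[str]:
    """Return the existing speaker a rename would merge into, or None."""
    for name in existing:
        if same_speaker(name, new_name):
            return name
    return None


speakers = ["Speaker 1", "Speaker 2", "Speaker 3"]
print(merge_target("SPEAKER 1", speakers))  # Speaker 1 -> would merge
print(merge_target("Speaker 9", speakers))  # None -> becomes a new speaker
```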
Undo Available:
- The editor supports undo (Ctrl+Z / Cmd+Z)
- Use immediately after merging if you make a mistake
- After saving the document, undo history may be cleared
Merging Multiple Speakers
Scenario: Multiple speaker IDs are all the same person
Example:
- Speaker 1, Speaker 3, and Speaker 5 are all Sarah
Approach:
1. Choose Primary ID
- Pick one to keep (e.g., Speaker 1)
2. Merge First Duplicate
- Rename "Speaker 3" to "Speaker 1"
3. Merge Second Duplicate
- Rename "Speaker 5" to "Speaker 1"
4. Result
- All three merged into "Speaker 1"
- Optionally rename "Speaker 1" to "Sarah"
- Optionally rename "Speaker 1" to "Sarah"
Order Doesn't Matter: Merge in any sequence; the final result is the same.
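Why the order doesn't matter: each merge relabels a different source ID into the same target, so the merges don't interfere with each other. A quick check under a hypothetical model where the transcript is reduced to a list of speaker labels (illustrative only, not the editor's code):

```python
def rename_speaker(labels: list[str], old: str, new: str) -> list[str]:
    """Return a copy of the labels with every `old` replaced by `new`."""
    return [new if label == old else label for label in labels]


# Speaker 1, 3 and 5 are all Sarah; Speaker 2 is someone else.
labels = ["Speaker 1", "Speaker 3", "Speaker 2", "Speaker 5", "Speaker 3"]

order_a = rename_speaker(rename_speaker(labels, "Speaker 3", "Speaker 1"), "Speaker 5", "Speaker 1")
order_b = rename_speaker(rename_speaker(labels, "Speaker 5", "Speaker 1"), "Speaker 3", "Speaker 1")
print(order_a == order_b)  # True: both orders give the same labels

final = rename_speaker(order_a, "Speaker 1", "Sarah")  # optional final rename
print(final)
```

This relies on every merge sharing one fixed target. If a target were itself renamed partway through, the order would start to matter.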
Splitting Utterances (Fixing Under-Segmentation)
When to Split
Indicators You Need to Split:
- Single utterance contains multiple speakers
- Logical speaker change within an utterance
- Different voices or perspectives in one block
- Back-and-forth dialogue combined incorrectly
How to Identify:
- Read the text - Does it sound like a conversation within one utterance?
- Listen to audio - Do you hear different voices?
- Check logic - Does one "speaker" ask and answer their own question?
Splitting an Utterance
Built-In Split Tool:
Step-by-Step:
1. Click to Focus Utterance
- Click anywhere in the text of the utterance you want to split
- The utterance must be active (focused) for the Split button to be enabled
2. Position Cursor
- Click in the text where the speaker changes
- Place the cursor at the exact point to split
- Usually at the start of the second speaker's text
3. Activate Split
- Hover over the right side of the utterance row
- Control buttons appear (Add, Split, Delete)
- Click the Split button (scissors icon)
- Keyboard shortcut: Ctrl+Shift+Down (Windows/Linux) or Cmd+Shift+Down (Mac)
4. Utterance Divides
- The original utterance breaks into two at the cursor position
- The first part keeps the original speaker
- The second part becomes a new utterance with the same speaker (initially)
5. Reassign Second Part
- Click the speaker label on the second utterance
- Select the correct speaker from the dropdown
- Attribution is now accurate
Example:
Before Split:
[00:01:00] Speaker 1
Thank you for joining us today. Thanks for having me.
Let's dive into the discussion.
Cursor Position: Before "Thanks for having me"
After Split:
[00:01:00] Speaker 1
Thank you for joining us today.
[00:01:03] Speaker 1 ← Initially same speaker
Thanks for having me. Let's dive into the discussion.
Reassign Second Part:
[00:01:00] Speaker 1
Thank you for joining us today.
[00:01:03] Speaker 2 ← Reassigned to correct speaker
Thanks for having me. Let's dive into the discussion.
(Another split is needed if "Let's dive into the discussion" was actually spoken by Speaker 1.)
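A split can be modeled as cutting the text at the cursor's character offset, with both halves keeping the original speaker until you reassign one. The sketch below reproduces the example under stated assumptions: `Utterance` and `split_utterance` are hypothetical, and trimming whitespace at the cut is an assumption about how the editor tidies the two halves.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str


def split_utterance(u: Utterance, cursor: int) -> tuple[Utterance, Utterance]:
    """Split at a character offset; both halves keep the original speaker."""
    first = Utterance(u.speaker, u.text[:cursor].rstrip())   # assumed: trailing space trimmed
    second = Utterance(u.speaker, u.text[cursor:].lstrip())  # assumed: leading space trimmed
    return first, second


original = Utterance(
    "Speaker 1",
    "Thank you for joining us today. Thanks for having me. "
    "Let's dive into the discussion.",
)
cursor = original.text.index("Thanks for having me")  # cursor placed before the reply
first, second = split_utterance(original, cursor)
second.speaker = "Speaker 2"  # reassign the second part

print(f"{first.speaker}: {first.text}")
print(f"{second.speaker}: {second.text}")
```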
Multiple Splits in One Utterance
Scenario: Rapid back-and-forth combined into single utterance
Approach:
1. Split from Beginning to End
- Work through the utterance sequentially
- Split at the first speaker change
- Then split at the next speaker change
- Continue through the utterance
2. Example Order:
- Split at the first speaker change
- Select the new second utterance and split again if needed
- Continue until all speaker changes are separated
3. Reassign Speakers
- After all splits, reassign each piece to the correct speaker
Combined Workflow: Merge + Split
Typical Scenario
Situation:
- Over-segmentation: Speaker 1 and Speaker 3 are both John
- Under-segmentation: Some "Speaker 2" utterances contain both Sarah and John
Workflow:
Step 1: Merge Duplicates
1. Rename "Speaker 3" to "Speaker 1" (merging the two John IDs)
2. You now have: Speaker 1 (John), Speaker 2 (mixed)
Step 2: Split Mixed Utterances
3. Identify "Speaker 2" utterances that contain multiple speakers
4. Split each one at its speaker change points
5. This creates additional utterances
Step 3: Reassign Split Parts
6. Reassign some split parts to "Speaker 1" (John)
7. Keep the rest as "Speaker 2" (Sarah)
Step 4: Rename for Clarity
8. Rename "Speaker 1" to "John"
9. Rename "Speaker 2" to "Sarah"
Result: Clean, accurate speaker attribution.
Best Practices
Merge Before Renaming to Real Names
Why:
- Renaming before merging can be confusing
- You end up with multiple speaker IDs carrying different custom names
- It becomes harder to track which IDs are duplicates
Recommended Order:
- Merge duplicate speaker IDs (rename to matching generic names)
- Consolidate to actual number of speakers
- Then rename each unique speaker ID to real names
- Finally edit transcript text
Verify with Audio
Don't Guess:
- Always listen to audio when uncertain
- Click utterances to hear who's speaking
- Use playback to confirm speaker identity
Especially Important:
- Similar-sounding speakers
- Rapid speaker changes
- Unclear or crosstalk sections
Work Systematically
Approach:
For Merging:
- List all speaker IDs from the dropdown
- Identify duplicates by listening to samples
- Merge all duplicates by renaming
- Verify final speaker count matches actual speakers
For Splitting:
- Read through transcript
- Note utterances that seem like dialogues
- Listen to each to confirm
- Split and reassign
Speaker Picker Features
Dropdown Menu
When you click on a speaker name, a dropdown menu appears with:
Speaker List:
- All speakers currently in the transcript
- Sorted intelligently (numbered speakers first, then alphabetical)
- Current speaker marked with checkmark
- Click any speaker to reassign current utterance
Rename Section:
- "Rename this speaker" option
- Click to edit speaker name inline
- Enter new name and press Enter or Tab
- If new name matches existing speaker, they merge automatically
Action Items:
- Add speaker - Create a new speaker name (max 40 characters)
- No speaker - Remove speaker assignment
- No name - Set speaker to "No name"
Sorting Logic
The speaker dropdown sorts speakers intelligently:
- Numbered speakers first - "Speaker 1", "Speaker 2", etc. in numerical order
- Pure numbers - "1", "2", "3" in numerical order
- Named speakers - Alphabetically sorted
- "No name" - Always appears last
- Empty names - Treated as "No name"
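The sorting rules above can be expressed as a single sort key. This is a sketch, not the editor's code: it assumes "Speaker N" is recognized case-insensitively, that named speakers compare case-insensitively, and that ties (for example "No name" and an empty name) keep their original order because Python's sort is stable.

```python
import re

SPEAKER_N = re.compile(r"speaker\s+(\d+)", re.IGNORECASE)  # assumed pattern for "Speaker N"
PURE_NUMBER = re.compile(r"\d+")


def sort_key(name: str) -> tuple:
    """Rank: "Speaker N", then pure numbers, then names, then "No name"/empty."""
    label = name.strip()
    if not label or label.casefold() == "no name":
        return (3, 0, "")                  # "No name" and empty names go last
    m = SPEAKER_N.fullmatch(label)
    if m:
        return (0, int(m.group(1)), "")    # numbered speakers, numerically
    if PURE_NUMBER.fullmatch(label):
        return (1, int(label), "")         # pure numbers, numerically
    return (2, 0, label.casefold())        # named speakers, alphabetically


names = ["Sarah", "Speaker 10", "No name", "2", "Speaker 2", "", "alex", "10"]
print(sorted(names, key=sort_key))
```

Sorting numerically rather than as text is what puts "Speaker 2" before "Speaker 10".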
Action Buttons
Each utterance has action buttons that appear when the utterance is active (focused):
- Add Utterance Above (only on first utterance) - Insert blank utterance above
- Add Utterance Below - Insert blank utterance below
- Split Utterance (scissors icon) - Split at cursor position (hotkey: Ctrl+Shift+Down on Windows/Linux, Cmd+Shift+Down on Mac)
- Delete Utterance - Remove the utterance
All buttons are enabled only when the utterance is focused (active).
Common Scenarios
Scenario 1: Speaker Coughs Mid-Speech
Problem:
[00:01:00] Speaker 1
We need to focus on quality.
[00:01:05] Speaker 3 ← System created new ID for cough
This is very important.
Reality: Both are Speaker 1 (just coughed between)
Solution:
- Click "Speaker 3" utterance
- Click speaker name → "Rename this speaker"
- Type "Speaker 1" → Enter
- Result: Continuous speech by one person
Scenario 2: Quick Question-Answer
Problem:
[00:02:00] Speaker 1
What do you think about that? I think it's great. Let's proceed.
Reality:
"What do you think about that?" = Speaker 1
"I think it's great" = Speaker 2
"Let's proceed" = Speaker 1
Solution:
- Click before "I think it's great"
- Click Split button (or Ctrl+Shift+Down)
- Click before "Let's proceed"
- Click Split button again
- Reassign middle part to Speaker 2
- Result: Proper attribution
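Repeated splits can be modeled as cutting left to right, each time just before the next speaker's words. A minimal sketch using the Scenario 2 text; `split_at_markers` is a hypothetical helper, and each marker stands in for where you would place the cursor.

```python
def split_at_markers(text: str, markers: list[str]) -> list[str]:
    """Split before each marker in turn, like repeated splits from left to right.

    Raises ValueError if a marker is not found in the remaining text.
    """
    pieces = []
    rest = text
    for marker in markers:
        cut = rest.index(marker)            # cursor placed just before the marker
        pieces.append(rest[:cut].rstrip())  # finished piece
        rest = rest[cut:]                   # keep splitting the remainder
    pieces.append(rest)
    return pieces


text = "What do you think about that? I think it's great. Let's proceed."
parts = split_at_markers(text, ["I think it's great", "Let's proceed"])
speakers = ["Speaker 1", "Speaker 2", "Speaker 1"]  # middle part reassigned
for speaker, part in zip(speakers, parts):
    print(f"{speaker}: {part}")
```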
Scenario 3: Many Short Turns
Problem:
- 20 speaker IDs for a 2-person rapid conversation
- Severe over-segmentation
Solution:
- Listen to first few utterances of each ID
- Group IDs by actual speaker (e.g., IDs 1, 3, 5, 7 are Person A)
- Rename all Person A IDs to "Speaker 1" one by one
- Rename all Person B IDs to "Speaker 2" one by one
- Result: 2 speaker IDs for 2 people
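Scenario 3 amounts to turning your listening notes into a rename plan and applying it one rename at a time. A minimal sketch, assuming for illustration that the odd IDs are Person A and the even IDs are Person B; the dictionaries and lists are hypothetical stand-ins, not the editor's data.

```python
# Hypothetical listening notes: target name -> IDs heard in that voice.
groups = {
    "Speaker 1": [f"Speaker {n}" for n in range(1, 21, 2)],  # odd IDs: Person A
    "Speaker 2": [f"Speaker {n}" for n in range(2, 21, 2)],  # even IDs: Person B
}

# One rename per duplicate ID ("one by one"); targets are not renamed.
plan = [(old, target) for target, ids in groups.items() for old in ids if old != target]

# Guard: no target name may also be scheduled to be renamed away.
assert not set(groups) & {old for old, _ in plan}

labels = [f"Speaker {n}" for n in range(1, 21)]  # one utterance per ID, for brevity
for old, new in plan:
    labels = [new if label == old else label for label in labels]

print(len(plan), "renames ->", sorted(set(labels)))
```

The guard enforces a simple rule: each target name should be one of that person's own IDs. If "Speaker 2" actually belonged to Person A, renaming Person B's IDs into it would merge two different people.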
Limitations and Considerations
Cannot Partially Rename
Important:
- Renaming a speaker affects ALL utterances with that speaker name
- No way to rename just some instances
- If you need selective changes, reassign individual utterances from the dropdown instead (split first if one utterance contains more than one speaker)
Example:
If "Speaker 4" should be:
- Speaker 1 for utterances 1-10
- Speaker 2 for utterances 11-20
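You must:
1. Reassign utterances 1-10 to "Speaker 1" individually from the dropdown (split first if an utterance contains both speakers)
2. Reassign utterances 11-20 to "Speaker 2" the same way, or, once only those utterances still say "Speaker 4", rename "Speaker 4" to "Speaker 2"
There is no batch rename option that lets you choose which utterances to change.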
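One way to combine the two documented behaviors for the "Speaker 4" case: reassign the first group one utterance at a time, after which a global rename touches only the utterances still labeled "Speaker 4". A sketch over a plain list of labels (an illustrative model, not the editor's code):

```python
labels = ["Speaker 4"] * 20  # utterances 1-20, all labeled "Speaker 4"

# Step 1: reassign utterances 1-10 individually (a dropdown pick, not a rename).
for i in range(10):
    labels[i] = "Speaker 1"

# Step 2: only utterances 11-20 still say "Speaker 4", so a global rename
# now touches exactly those and merges them into "Speaker 2".
labels = ["Speaker 2" if label == "Speaker 4" else label for label in labels]

print(labels[:10].count("Speaker 1"), labels[10:].count("Speaker 2"))
```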
Timestamps After Splitting
After Splitting:
- The new utterance receives an approximate timestamp
- It is based on the original utterance's duration and the split position
- Good enough for navigation
- Don't rely on split-generated timestamps for frame-accurate synchronization
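The editor's exact formula isn't documented. A plausible model, stated here as an assumption, is linear interpolation: if the cursor sits a fraction f of the way through the text, the second part starts at start + f * (end - start). This assumes an even speaking rate, which is exactly why such timestamps are only approximate.

```python
def split_timestamp(start: float, end: float, text: str, cursor: int) -> float:
    """Estimate the start time (seconds) of the second part of a split.

    Hypothetical: interpolates linearly by character position, assuming an
    even speaking rate. Not the editor's documented formula.
    """
    if not text:
        return start
    fraction = min(max(cursor / len(text), 0.0), 1.0)  # clamp the cursor into the text
    return start + fraction * (end - start)


text = "Thank you for joining us today. Thanks for having me."
cursor = text.index("Thanks for having me")
# Utterance assumed to run from 60 s (00:01:00) to 66 s.
print(f"{split_timestamp(60.0, 66.0, text, cursor):.1f} s")
```

Pauses, fast speech, or long words shift the real boundary away from this estimate, so confirm by listening when timing matters.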
Complex Cross-Talk
Challenging Scenarios:
- Multiple speakers talking simultaneously
- Overlapping speech for extended periods
- May require judgment calls on attribution
Approach:
- Attribute to primary speaker
- Add note in transcript text if needed: "[Multiple speakers]"
- Or split the utterance and separate the speakers as best you can
Troubleshooting
Split Button Disabled
Issue: Split button is grayed out
Cause: Utterance is not active/focused
Solution:
- Click anywhere in the utterance text to focus it
- Split button will become enabled (blue)
Merge Didn't Work as Expected
Issue: Renamed speaker but wrong utterances changed
Cause: Renaming changes every utterance that carries that speaker name, so renaming from an utterance with the wrong speaker changes the wrong set
Solution:
- Use Undo (Ctrl+Z / Cmd+Z) immediately
- Check which speaker name you actually renamed
- Try again with correct speaker
Too Many Speakers After Editing
Issue: Still have extra speaker IDs after attempting merges
Possible Causes:
- Didn't rename all duplicates
- Some utterances still use old names
- Names that differ by more than capitalization, such as a typo (capitalization alone shouldn't matter, because names are matched case-insensitively)
Solutions:
- Review speaker list in dropdown
- Click through utterances to find old IDs
- Rename remaining old IDs to target name
- Old IDs will disappear automatically when no utterances use them
Frequently Asked Questions
Can I split an utterance at multiple points at once?
No, split one at a time, working through the utterance sequentially.
Can I merge more than two speaker IDs at once?
No. Merge one ID at a time: rename "Speaker 3" to "Speaker 1", then rename "Speaker 5" to "Speaker 1", and so on.
What happens to timestamps when I split?
The new utterances inherit approximate timestamps based on the original utterance's timing and the split position. They're suitable for navigation but may not be frame-accurate.
Can I undo a merge?
Use Undo (Ctrl+Z / Cmd+Z) immediately after merging. The editor maintains undo history during your editing session. Once you save and close, undo history may be cleared.
Will merging/splitting affect my audio file?
No, these operations only affect the transcript structure. The audio file is never modified.
Should I merge before or after renaming speakers?
Merge before renaming to real names. It's easier to work with generic IDs (Speaker 1, Speaker 2) during merging, then rename to real names (John, Sarah) once you have the correct number of speakers.
How do I know how many speakers I should have?
Listen to the audio and count distinct voices. That's your target speaker count. Compare to the speaker dropdown to see if you have over-segmentation.
Why does renaming affect all utterances, not just the current one?
This is intentional behavior for merging speakers. If you need to change just one utterance, select a different speaker from the dropdown instead of renaming.
Can I rename a speaker to a blank name?
No. Use the dedicated options instead:
- "No speaker" - Removes speaker assignment
- "No name" - Sets speaker to the special "No name" label
What's the maximum length for a speaker name?
40 characters. This applies when adding new speakers.
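The naming rules from the two answers above can be sketched as a small validator. This is hypothetical, not the editor's code: it assumes whitespace-only names count as blank and that length is counted in characters (code points).

```python
MAX_SPEAKER_NAME = 40  # documented limit when adding a speaker


def validate_speaker_name(name: str) -> str:
    """Return the name if acceptable, otherwise raise ValueError."""
    if not name.strip():
        raise ValueError('blank names are not allowed; use "No speaker" or "No name"')
    if len(name) > MAX_SPEAKER_NAME:
        raise ValueError(f"speaker names are limited to {MAX_SPEAKER_NAME} characters")
    return name


print(validate_speaker_name("Sarah (Guest)"))
try:
    validate_speaker_name("A" * 41)
except ValueError as err:
    print("rejected:", err)
```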
Keyboard Shortcuts
- Split utterance: Ctrl+Shift+Down (Windows/Linux) or Cmd+Shift+Down (Mac)
- Undo: Ctrl+Z (Windows/Linux) or Cmd+Z (Mac)
Next Steps
After merging and splitting speakers:
- Edit Text - Refine transcript content
- Export Overview - Export with proper speaker names
- Comments & Review - Collaborate on speaker verification
**Perfect your