Merge & Split Speakers

Even with automatic diarization, speaker attribution isn't always perfect. This guide shows you how to fix over-segmentation by merging speakers and under-segmentation by splitting utterances, so that the transcript attributes every line to the right person.

Understanding Speaker Segmentation Errors

Over-Segmentation

What It Is:

  • One person incorrectly identified as multiple speakers
  • Creates duplicate speaker IDs for the same person

Example:

[00:00:05] Speaker 1
Hello, my name is John.

[00:00:15] Speaker 2
As I was saying, we need to focus on quality.

[00:00:30] Speaker 1
This is really important for our team.

Reality: Speaker 1 and Speaker 2 are both John

Why It Happens:

  • Speaker's voice changes (loud/soft, emotional/calm)
  • Background noise interrupts voice signature
  • Audio quality variations
  • Coughing, clearing throat between segments
  • Long pauses causing re-identification

Under-Segmentation

What It Is:

  • Multiple speakers incorrectly combined into one speaker turn
  • Single utterance contains speech from different people

Example:

[00:00:05] Speaker 1
Thank you for joining us today. Thanks for having me. Let's begin.

Reality:
"Thank you for joining us today" = Speaker 1 (Host)
"Thanks for having me" = Speaker 2 (Guest)
"Let's begin" = Speaker 1 (Host)

Why It Happens:

  • Rapid back-and-forth conversation
  • No pause between speakers
  • Speakers talking over each other briefly
  • Very short speaker turns missed

Merging Speakers (Fixing Over-Segmentation)

When to Merge

Indicators You Need to Merge:

  • Two speaker IDs sound like the same person
  • Same voice characteristics
  • Logical continuity suggests same speaker
  • More speaker IDs than actual people in audio

How to Identify Duplicates:

  1. Count speaker IDs in the dropdown (e.g., 8 speakers)
  2. Count actual distinct voices by listening
  3. If speaker IDs > actual voices, you have duplicates

Method: Rename Speaker to Merge

How It Works: When you rename any speaker, ALL utterances with that speaker's old name are automatically changed to the new name. If the new name matches an existing speaker, this effectively merges them.

Step-by-Step:

  1. Identify Duplicates

    • Determine which speaker IDs are the same person
    • Example: "Speaker 1" and "Speaker 4" are both John
  2. Choose Target Speaker

    • Decide which name to keep (usually the one with more utterances)
    • Example: Keep "Speaker 1", merge "Speaker 4" into it
  3. Execute Merge Through Rename

    • Click any utterance labeled with "Speaker 4"
    • Click on the speaker name to open the dropdown menu
    • Click the "Rename this speaker" option
    • Type "Speaker 1" (the target speaker name)
    • Press Enter or click Save
  4. Verify

    • All former "Speaker 4" utterances now show "Speaker 1"
    • "Speaker 4" disappears from speaker list
    • "Speaker 1" now has all combined utterances

Example Workflow:

Before Merge:
- Speaker 1: 15 utterances
- Speaker 2: 20 utterances
- Speaker 3: 8 utterances
- Speaker 4: 12 utterances (actually same as Speaker 1)

Action:
Click any "Speaker 4" → Click speaker name → "Rename this speaker"
→ Type "Speaker 1" → Enter

After Merge:
- Speaker 1: 27 utterances (15 + 12)
- Speaker 2: 20 utterances
- Speaker 3: 8 utterances
- Speaker 4: (removed)
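
The merge-through-rename behavior can be modeled in a few lines of Python. This is only an illustrative sketch, not the editor's actual code: the `Utterance` class and `rename_speaker` function are hypothetical names, and the utterance counts are taken from the example above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for one transcript row."""
    speaker: str
    text: str


def rename_speaker(utterances, old_name, new_name):
    """Relabel every utterance whose speaker is old_name; return how many changed."""
    changed = 0
    for utt in utterances:
        if utt.speaker == old_name:
            utt.speaker = new_name
            changed += 1
    return changed


# Rebuild the "Before Merge" counts from the example workflow.
utterances = (
    [Utterance("Speaker 1", "...") for _ in range(15)]
    + [Utterance("Speaker 2", "...") for _ in range(20)]
    + [Utterance("Speaker 3", "...") for _ in range(8)]
    + [Utterance("Speaker 4", "...") for _ in range(12)]
)

# Renaming "Speaker 4" to an existing name merges the two speakers.
merged = rename_speaker(utterances, "Speaker 4", "Speaker 1")
counts = Counter(utt.speaker for utt in utterances)
print(merged, dict(counts))
```

Because the speaker list is derived from the labels that remain in use, "Speaker 4" drops out of `counts` once no utterance carries it.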

Important Notes About Merging

Automatic Global Replacement:

  • When you rename a speaker, the change applies to ALL utterances with that speaker name
  • There's no way to rename just one utterance without affecting all others; to change a single utterance, select a different speaker from the dropdown instead
  • This is the intended behavior for merging speakers

Name Matching:

  • Speaker names are matched case-insensitively
  • "Speaker 1", "speaker 1", and "SPEAKER 1" are treated as the same
  • Renaming to any variation will merge with existing speaker
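
As a rough illustration of case-insensitive matching, the sketch below resolves a typed name against the existing speaker list. The function name `resolve_speaker_name` is hypothetical, and the choice to keep the existing speaker's casing after a match is an assumption; the editor may behave differently on that detail.

```python
def resolve_speaker_name(existing_names, typed_name):
    """Return the existing name that matches typed_name ignoring case.

    If nothing matches, the typed name is returned unchanged, which
    corresponds to a plain rename rather than a merge.
    Assumption: on a match, the existing speaker's casing is kept.
    """
    key = typed_name.casefold()
    for name in existing_names:
        if name.casefold() == key:
            return name
    return typed_name


existing = ["Speaker 1", "Speaker 2"]
print(resolve_speaker_name(existing, "SPEAKER 1"))  # matches an existing speaker, so this is a merge
print(resolve_speaker_name(existing, "Sarah"))      # no match, so this is a plain rename
```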

Undo Available:

  • The editor supports undo (Ctrl+Z / Cmd+Z)
  • Use immediately after merging if you make a mistake
  • After saving the document, undo history may be cleared

Merging Multiple Speakers

Scenario: Multiple speaker IDs are all the same person

Example:

  • Speaker 1, Speaker 3, and Speaker 5 are all Sarah

Approach:

  1. Choose Primary ID

    • Pick one to keep (e.g., Speaker 1)
  2. Merge First Duplicate

    • Rename "Speaker 3" to "Speaker 1"
  3. Merge Second Duplicate

    • Rename "Speaker 5" to "Speaker 1"
  4. Result

    • All three merged into "Speaker 1"
    • Optionally rename "Speaker 1" to "Sarah"

Order Doesn't Matter: Merge in any sequence; the final result is the same.
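
A small sketch can confirm that the merge order is irrelevant. It models speaker labels as a plain list of strings (a simplification) and tries every ordering of the duplicate IDs; `merge_labels` is a hypothetical helper, not part of the editor.

```python
from itertools import permutations


def merge_labels(labels, old_name, new_name):
    """Return a copy of labels with old_name replaced by new_name."""
    return [new_name if label == old_name else label for label in labels]


labels = ["Speaker 1", "Speaker 2", "Speaker 3", "Speaker 5", "Speaker 3", "Speaker 1"]
duplicates = ["Speaker 3", "Speaker 5"]  # both are really Sarah ("Speaker 1")

# Apply the merges in every possible order and collect the distinct outcomes.
results = set()
for order in permutations(duplicates):
    current = labels
    for old_name in order:
        current = merge_labels(current, old_name, "Speaker 1")
    results.add(tuple(current))

print(len(results))  # one distinct outcome means the order made no difference
```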

Splitting Utterances (Fixing Under-Segmentation)

When to Split

Indicators You Need to Split:

  • Single utterance contains multiple speakers
  • Logical speaker change within an utterance
  • Different voices or perspectives in one block
  • Back-and-forth dialogue combined incorrectly

How to Identify:

  1. Read the text - Does it sound like a conversation within one utterance?
  2. Listen to audio - Do you hear different voices?
  3. Check logic - Does one "speaker" ask and answer their own question?

Splitting an Utterance

Built-In Split Tool:

Step-by-Step:

  1. Click to Focus Utterance

    • Click anywhere in the text of the utterance you want to split
    • The utterance must be active (focused) for the Split button to be enabled
  2. Position Cursor

    • Click in the text where the speaker changes
    • Place cursor at the exact point to split
    • Usually at the start of the second speaker's text
  3. Activate Split

    • Hover over the right side of the utterance row
    • Control buttons appear (Add, Split, Delete)
    • Click the Split button (scissors icon)
    • Keyboard shortcut: Ctrl+Shift+Down (Windows/Linux) or Cmd+Shift+Down (Mac)
  4. Utterance Divides

    • Original utterance breaks into two at cursor position
    • First part keeps original speaker
    • Second part becomes a new utterance with same speaker (initially)
  5. Reassign Second Part

    • Click speaker label on the second utterance
    • Select the correct speaker from dropdown
    • Attribution now accurate

Example:

Before Split:
[00:01:00] Speaker 1
Thank you for joining us today. Thanks for having me.
Let's dive into the discussion.

Cursor Position: Before "Thanks for having me"

After Split:
[00:01:00] Speaker 1
Thank you for joining us today.

[00:01:03] Speaker 1 ← Initially same speaker
Thanks for having me. Let's dive into the discussion.

Reassign Second Part:
[00:01:00] Speaker 1
Thank you for joining us today.

[00:01:03] Speaker 2 ← Reassigned to correct speaker
Thanks for having me. Let's dive into the discussion.

(May need another split if "Let's dive into the discussion" is Speaker 1)
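
The split-then-reassign sequence can be sketched as follows. The `Utterance` class and `split_utterance` function are hypothetical, the cursor is modeled as a character offset, and trimming the whitespace around the split point is an assumption about how the two halves are cleaned up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Utterance:
    """Hypothetical stand-in for one transcript row."""
    speaker: str
    text: str


def split_utterance(utt, cursor):
    """Split at a character offset; both halves keep the original speaker."""
    if not 0 < cursor < len(utt.text):
        raise ValueError("cursor must fall inside the utterance text")
    first = replace(utt, text=utt.text[:cursor].rstrip())
    second = replace(utt, text=utt.text[cursor:].lstrip())
    return first, second


original = Utterance(
    "Speaker 1",
    "Thank you for joining us today. Thanks for having me. "
    "Let's dive into the discussion.",
)

# Place the "cursor" just before the second speaker's first words.
cursor = original.text.index("Thanks for having me")
first, second = split_utterance(original, cursor)

# The new utterance starts with the same speaker, so reassign it explicitly.
second = replace(second, speaker="Speaker 2")
print(first)
print(second)
```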

Multiple Splits in One Utterance

Scenario: Rapid back-and-forth combined into single utterance

Approach:

  1. Split from Beginning to End

    • Work through the utterance sequentially
    • Split at first speaker change
    • Then split at next speaker change
    • Continue through the utterance
  2. Example Order:

    • Split at first speaker change
    • Select the new second utterance and split again if needed
    • Continue until all speaker changes are separated
  3. Reassign Speakers

    • After all splits, reassign each piece to correct speaker
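
Working through an utterance from beginning to end can be sketched like this. The helper `split_sequentially` is hypothetical; it takes the first words of each new speaker turn, splits at each in order, and always continues from the most recently created piece. The sample text is the question-and-answer utterance used in Scenario 2 of this guide.

```python
def split_sequentially(text, turn_starts):
    """Split text at the start of each phrase in turn_starts, left to right."""
    pieces = []
    remaining = text
    for phrase in turn_starts:
        cut = remaining.index(phrase)  # raises ValueError if the phrase is missing
        pieces.append(remaining[:cut].strip())
        remaining = remaining[cut:]    # keep splitting the newest piece
    pieces.append(remaining.strip())
    return pieces


text = "What do you think about that? I think it's great. Let's proceed."
pieces = split_sequentially(text, ["I think it's great", "Let's proceed"])

# Reassign speakers only after all splits are done.
speakers = ["Speaker 1", "Speaker 2", "Speaker 1"]
turns = list(zip(speakers, pieces))
for speaker, piece in turns:
    print(f"{speaker}: {piece}")
```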

Combined Workflow: Merge + Split

Typical Scenario

Situation:

  • Over-segmentation: Speaker 1 and Speaker 3 are both John
  • Under-segmentation: Some "Speaker 2" utterances contain both Sarah and John

Workflow:

Step 1: Merge Duplicates

  1. Rename "Speaker 3" to "Speaker 1" (merge Johns)
  2. Now have: Speaker 1 (John), Speaker 2 (mixed)

Step 2: Split Mixed Utterances

  3. Identify "Speaker 2" utterances with multiple speakers
  4. Split each at speaker change points
  5. Creates additional utterances

Step 3: Reassign Split Parts

  6. Reassign some split parts to "Speaker 1" (John)
  7. Keep rest as "Speaker 2" (Sarah)

Step 4: Rename for Clarity

  8. Rename "Speaker 1" to "John"
  9. Rename "Speaker 2" to "Sarah"

Result: Clean, accurate speaker attribution.

Best Practices

Merge Before Rename to Real Names

Why:

  • Renaming before merging can be confusing
  • Duplicate speaker IDs end up with different custom names
  • It becomes harder to track which IDs are duplicates

Recommended Order:

  1. Merge duplicate speaker IDs (rename to matching generic names)
  2. Consolidate to actual number of speakers
  3. Then rename each unique speaker ID to real names
  4. Finally edit transcript text

Verify with Audio

Don't Guess:

  • Always listen to audio when uncertain
  • Click utterances to hear who's speaking
  • Use playback to confirm speaker identity

Especially Important:

  • Similar-sounding speakers
  • Rapid speaker changes
  • Unclear or crosstalk sections

Work Systematically

Approach:

For Merging:

  1. List all speaker IDs from the dropdown
  2. Identify duplicates by listening to samples
  3. Merge all duplicates by renaming
  4. Verify final speaker count matches actual speakers

For Splitting:

  1. Read through transcript
  2. Note utterances that seem like dialogues
  3. Listen to each to confirm
  4. Split and reassign

Speaker Picker Features

When you click on a speaker name, a dropdown menu appears with:

Speaker List:

  • All speakers currently in the transcript
  • Sorted intelligently (numbered speakers first, then alphabetical)
  • Current speaker marked with checkmark
  • Click any speaker to reassign current utterance

Rename Section:

  • "Rename this speaker" option
  • Click to edit speaker name inline
  • Enter new name and press Enter or Tab
  • If new name matches existing speaker, they merge automatically

Action Items:

  • Add speaker - Create a new speaker name (max 40 characters)
  • No speaker - Remove speaker assignment
  • No name - Set speaker to "No name"

Sorting Logic

The speaker dropdown sorts speakers in the following order:

  1. Numbered speakers first - "Speaker 1", "Speaker 2", etc. in numerical order
  2. Pure numbers - "1", "2", "3" in numerical order
  3. Named speakers - Alphabetically sorted
  4. "No name" - Always appears last
  5. Empty names - Treated as "No name"
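
One way to express this ordering is as a sort key. The sketch below is an approximation under stated assumptions: the function name `speaker_sort_key` is hypothetical, "numbered speakers" is taken to mean names of the form "Speaker N", and the comparisons ignore case. The editor's real comparison rules may differ in edge cases.

```python
import re

# Assumption: a "numbered speaker" looks like "Speaker 12" (case ignored).
NUMBERED = re.compile(r"^speaker\s+([0-9]+)$", re.IGNORECASE)


def speaker_sort_key(name):
    """Map a speaker name to a tuple that sorts in the documented order."""
    label = name.strip() or "No name"          # empty names count as "No name"
    numbered = NUMBERED.match(label)
    if numbered:
        return (0, int(numbered.group(1)), "")  # "Speaker N", numerically
    if re.fullmatch(r"[0-9]+", label):
        return (1, int(label), "")              # pure numbers, numerically
    if label.casefold() == "no name":
        return (3, 0, "")                       # always last
    return (2, 0, label.casefold())             # named speakers, alphabetically


names = ["Sarah", "Speaker 10", "No name", "2", "Speaker 2", "John", "1", ""]
ordered = sorted(names, key=speaker_sort_key)
print(ordered)
```

Sorting numerically rather than as text is what places "Speaker 2" ahead of "Speaker 10".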

Action Buttons

Each utterance has action buttons that appear when the utterance is active (focused):

  • Add Utterance Above (only on first utterance) - Insert blank utterance above
  • Add Utterance Below - Insert blank utterance below
  • Split Utterance (scissors icon) - Split at cursor position
    • Hotkey: Ctrl+Shift+Down or Cmd+Shift+Down
  • Delete Utterance - Remove the utterance

All buttons are only enabled when the utterance is focused/active.

Common Scenarios

Scenario 1: Speaker Coughs Mid-Speech

Problem:

[00:01:00] Speaker 1
We need to focus on quality.

[00:01:05] Speaker 3 ← System created new ID for cough
This is very important.

Reality: Both are Speaker 1 (just coughed between)

Solution:

  • Click "Speaker 3" utterance
  • Click speaker name → "Rename this speaker"
  • Type "Speaker 1" → Enter
  • Result: Continuous speech by one person

Scenario 2: Quick Question-Answer

Problem:

[00:02:00] Speaker 1
What do you think about that? I think it's great. Let's proceed.

Reality:
"What do you think about that?" = Speaker 1
"I think it's great" = Speaker 2
"Let's proceed" = Speaker 1

Solution:

  • Click before "I think it's great"
  • Click Split button (or Ctrl+Shift+Down)
  • Click before "Let's proceed"
  • Click Split button again
  • Reassign middle part to Speaker 2
  • Result: Proper attribution

Scenario 3: Many Short Turns

Problem:

  • 20 speaker IDs for a 2-person rapid conversation
  • Severe over-segmentation

Solution:

  1. Listen to first few utterances of each ID
  2. Group IDs by actual speaker (e.g., IDs 1,3,5,7 are Person A)
  3. Rename all Person A IDs to "Speaker 1" one by one
  4. Rename all Person B IDs to "Speaker 2" one by one
  5. Result: 2 speaker IDs for 2 people
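
When many IDs are involved, it can help to write down the grouping first and derive the list of renames from it. The sketch below does only that planning step; `plan_renames` and the sample grouping are hypothetical, and each resulting pair still corresponds to one manual "Rename this speaker" action in the editor.

```python
def plan_renames(groups):
    """Return (old_id, target_id) pairs, one rename per duplicate ID."""
    plan = []
    for target, ids in groups.items():
        for speaker_id in ids:
            if speaker_id != target:  # the target itself needs no rename
                plan.append((speaker_id, target))
    return plan


# Grouping decided by listening to a few utterances of each ID.
groups = {
    "Speaker 1": ["Speaker 1", "Speaker 3", "Speaker 5", "Speaker 7"],  # Person A
    "Speaker 2": ["Speaker 2", "Speaker 4", "Speaker 6"],               # Person B
}

plan = plan_renames(groups)
for old_id, target in plan:
    print(f'Rename "{old_id}" to "{target}"')
```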

Limitations and Considerations

Cannot Partially Rename

Important:

  • Renaming a speaker affects ALL utterances with that speaker name
  • No way to rename just some instances
  • If you need selective changes, reassign the affected utterances one by one from the speaker dropdown (split first if an utterance contains more than one speaker)

Example:

If "Speaker 4" should be:
- Speaker 1 for utterances 1-10
- Speaker 2 for utterances 11-20

You must:
1. Manually reassign utterances 1-10 to "Speaker 1" (splitting first where needed)
2. Manually reassign utterances 11-20 to "Speaker 2" (splitting first where needed)

There is no batch rename option that lets you choose which utterances are affected.
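
The difference between a global rename and per-utterance reassignment can be sketched with a plain list of labels. This is a simplified model, not the editor's data format: each list entry stands for one "Speaker 4" utterance, and the loop mirrors picking a speaker from the dropdown for each row.

```python
# Twenty utterances, all currently attributed to "Speaker 4".
labels = ["Speaker 4"] * 20

# Per-utterance reassignment: utterances 1-10 go to "Speaker 1",
# utterances 11-20 go to "Speaker 2". A rename could not do this,
# because it would move all twenty at once.
for index in range(len(labels)):
    labels[index] = "Speaker 1" if index < 10 else "Speaker 2"

print(labels.count("Speaker 1"), labels.count("Speaker 2"), labels.count("Speaker 4"))
```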

Timestamps After Splitting

After Splitting:

  • New utterance inherits approximate timestamp
  • Based on original utterance's duration and split position
  • Good enough for navigation
  • Don't rely on split-generated timestamps for frame-accurate synchronization
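
To see why such timestamps are only approximate, consider one plausible way to derive them. The sketch below assumes linear interpolation by character position, which is an illustration only; the guide does not state the editor's exact formula, and `estimate_split_time` is a hypothetical name.

```python
def estimate_split_time(start, end, text, cursor):
    """Estimate the time (in seconds) of a split at character offset cursor.

    Assumption: speech is spread evenly across the characters of the text,
    which real speech rarely is. That is the source of the inaccuracy.
    """
    if not text:
        return start
    fraction = cursor / len(text)
    return start + (end - start) * fraction


# A 10-second utterance starting at 00:01:00, split exactly halfway through its text.
text = "x" * 20
estimate = estimate_split_time(60.0, 70.0, text, cursor=10)
print(estimate)
```

A pause, a slow phrase, or a fast one shifts the true split point away from this estimate, so the value is suitable for navigation rather than precise synchronization.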

Complex Cross-Talk

Challenging Scenarios:

  • Multiple speakers talking simultaneously
  • Overlapping speech for extended periods
  • May require judgment calls on attribution

Approach:

  • Attribute to primary speaker
  • Add note in transcript text if needed: "[Multiple speakers]"
  • Or split and try to separate as best as possible

Troubleshooting

Split Button Disabled

Issue: Split button is grayed out

Cause: Utterance is not active/focused

Solution:

  • Click anywhere in the utterance text to focus it
  • Split button will become enabled (blue)

Merge Didn't Work as Expected

Issue: Renamed speaker but wrong utterances changed

Cause: A rename changes every utterance that carries the renamed speaker's name, so renaming the wrong speaker relabels the wrong utterances

Solution:

  • Use Undo (Ctrl+Z / Cmd+Z) immediately
  • Check which speaker name you actually renamed
  • Try again with correct speaker

Too Many Speakers After Editing

Issue: Still have extra speaker IDs after attempting merges

Possible Causes:

  • Didn't rename all duplicates
  • Some utterances still use old names
  • Names that differ only by case (matching is case-insensitive, so this shouldn't happen, but check)

Solutions:

  • Review speaker list in dropdown
  • Click through utterances to find old IDs
  • Rename remaining old IDs to target name
  • Old IDs will disappear automatically when no utterances use them

Frequently Asked Questions

Can I split an utterance at multiple points at once?

No. Split at one point at a time, working through the utterance sequentially.

Can I merge more than two speaker IDs at once?

No. Each rename merges one speaker ID into another. Rename "Speaker 3" to "Speaker 1", then rename "Speaker 5" to "Speaker 1", and so on.

What happens to timestamps when I split?

The new utterances inherit approximate timestamps based on the original utterance's timing and the split position. They're suitable for navigation but may not be frame-accurate.

Can I undo a merge?

Use Undo (Ctrl+Z / Cmd+Z) immediately after merging. The editor maintains undo history during your editing session. Once you save and close, undo history may be cleared.

Will merging/splitting affect my audio file?

No, these operations only affect the transcript structure. The audio file is never modified.

Should I merge before or after renaming speakers?

Merge before renaming to real names. It's easier to work with generic IDs (Speaker 1, Speaker 2) during merging, then rename to real names (John, Sarah) once you have the correct number of speakers.

How do I know how many speakers I should have?

Listen to the audio and count distinct voices. That's your target speaker count. Compare to the speaker dropdown to see if you have over-segmentation.

Why does renaming affect all utterances, not just the current one?

This is intentional behavior for merging speakers. If you need to change just one utterance, select a different speaker from the dropdown instead of renaming.

Can I rename a speaker to a blank name?

No. Use the dedicated options in the speaker dropdown instead:

  • "No speaker" - Removes speaker assignment
  • "No name" - Sets speaker to the special "No name" label

What's the maximum length for a speaker name?

40 characters. This applies when adding new speakers.

Keyboard Shortcuts

  • Split utterance: Ctrl+Shift+Down (Windows/Linux) or Cmd+Shift+Down (Mac)
  • Undo: Ctrl+Z (Windows/Linux) or Cmd+Z (Mac)

Next Steps

After merging and splitting speakers:


**Perfect your