Word Error Rate (WER)


The Word Error Rate (WER) is the key metric for measuring the quality of speech-to-text systems. It indicates how many words in a spoken sentence were transcribed incorrectly by the recognition system, expressed as a percentage of the total number of spoken words. A low WER is a prerequisite for reliable voicebots, because every recognition error subsequently degrades the classification of the request, entity extraction, and thus end-to-end automation.

 

How the Word Error Rate is Calculated

The WER sums up three types of errors and compares them to the length of the reference text:

  • Substitutions (S): A word has been replaced by another.
  • Insertions (I): An additional word has been inserted.
  • Deletions (D): A word is missing from the transcription.

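The formula is WER = (S + I + D) / N, where S, I, and D are the counts of substitutions, insertions, and deletions, and N is the number of words in the reference text. For example, a WER of 5% means that one word was recognized incorrectly in a 20-word sentence. Because insertions are counted against the length of the reference, the WER can exceed 100% in extreme cases.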
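The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that words are separated by whitespace and compared case-insensitively, and the function name word_error_rate is chosen here for illustration only. The minimum number of substitutions, insertions, and deletions is found with a word-level edit distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + I + D) / N for a reference and a recognized transcript."""
    # Assumption: whitespace tokenization and case-insensitive comparison.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


reference = ("please transfer my contract to the new address and send the "
             "confirmation by email to me before next friday morning")
hypothesis = reference.replace("contract", "contact")
# One substitution in a 20-word reference, i.e. 1 / 20.
print(word_error_rate(reference, hypothesis))
```

In practice, text normalization (punctuation, numerals, spelling variants) is usually applied to both texts before the comparison, because it strongly influences the resulting value.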

 

WER and Its Impact on Voice Bots

In the voice channel, the WER directly affects every subsequent step. If the system misrecognizes a customer number or a plan name, the entire workflow fails. That is why the WER is not just a quality metric but also an input variable for multi-agent orchestration: when recognition confidence is low, the triage agent asks the caller to repeat the input or compares the transcript with stored custom entities.
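The triage behavior described above might look roughly like the following sketch. Everything in it is hypothetical: the function name triage, the thresholds, and the assumption that the speech-to-text system returns a confidence score between 0 and 1 alongside the transcript. The fuzzy comparison uses difflib from the Python standard library as a stand-in for whatever matching a production system would use.

```python
from difflib import get_close_matches


def triage(transcript, confidence, known_entities,
           min_confidence=0.8, cutoff=0.75):
    """Decide how to handle a recognized phrase (illustrative only).

    Returns a tuple (action, value), where action is one of
    "accept", "corrected", or "ask_repeat".
    """
    # High confidence: pass the transcript on unchanged.
    if confidence >= min_confidence:
        return ("accept", transcript)

    # Low confidence: look for a close match among stored custom entities.
    by_lower = {entity.lower(): entity for entity in known_entities}
    matches = get_close_matches(transcript.lower(), list(by_lower),
                                n=1, cutoff=cutoff)
    if matches:
        return ("corrected", by_lower[matches[0]])

    # No plausible match: ask the caller to repeat the input.
    return ("ask_repeat", None)


plans = ["Premium Plus", "Basic"]  # hypothetical plan names
print(triage("premium bus", 0.4, plans))
```

The thresholds 0.8 and 0.75 are placeholders; suitable values depend on the recognizer and would have to be tuned on real call data.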

 

WER for Proper Nouns, Numbers, and Technical Terms

The average WER of modern speech-to-text systems for standard conversations is in the low single-digit percentage range. For proper nouns, addresses, numbers, or industry-specific terminology, it is often significantly higher, which is unfortunately precisely where it matters most for service processes. Custom vocabularies, industry-specific language models, and downstream plausibility checks via phonebots help to close this gap.
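A downstream plausibility check can be as simple as verifying that a recognized value has the expected shape. The sketch below is a hypothetical example, assuming English digit words and a customer number with a fixed length of eight digits; both the format and the function name extract_customer_number are assumptions made for illustration.

```python
import re

# Assumption: spoken digits arrive either as numerals or as English words.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def extract_customer_number(transcript, expected_length=8):
    """Return the digit string if it has the expected length, else None."""
    tokens = re.findall(r"\d|[a-z]+", transcript.lower())
    digits = []
    for token in tokens:
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
    number = "".join(digits)
    # Plausibility check: reject anything that does not match the format.
    return number if len(number) == expected_length else None


print(extract_customer_number("my number is four two 7 7 one zero nine three"))
print(extract_customer_number("it is four two seven"))
```

A result of None signals that the value is implausible and that the bot should ask again rather than continue with a wrong number.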

 

Frequently Asked Questions (FAQ)

In the context of dictation, a WER below 5% is considered very good. In the service sector, where proper names, addresses, and numbers are involved, realistic target rates vary by industry. The key is to ensure that critical data points (customer number, address, amount) are recognized correctly.

Any gap in recognition leads either to follow-up questions or to escalation to employees. Both reduce the automation rate. A low WER is therefore a direct driver of ROI.

A low WER alone does not make a good voicebot: it is a necessary but not sufficient condition. Only the combination of multi-agent orchestration, hybrid intelligence, and knowledge AI turns a good transcript into a robust service process.
