Welcome to the Panopto Community


Improve ASR accuracy - Recognise slide content; Upload local taxonomy/dictionary; AI learning

We use Panopto in medical education. ASR is good but struggles with any technical language (drugs, anatomy, biology) - we're getting less than 50% accuracy, which is really not good enough and is causing great concern among our faculty (they are asking if there are better providers!).

Please could you improve the accuracy of ASR? We suggest three avenues:

  1. Slide text recognition - often the spoken words also appear on the PowerPoint slide. Could ASR harvest/recognise this text and add it to the lexicon referenced by its transcription algorithm?
  2. Upload local taxonomy - could the institution (or school, or individual user) upload a text file of technical terminology to add to the ASR lexicon?
  3. AI learning - does the ASR system actually learn from corrections that are made? Are new terms added to the ASR lexicon for future reference? Colleagues are not seeing this happen, yet surely this is a key feature of machine-learning-based recognition technology.
13 votes

Under consideration

Comments

  • Tim Vincent Crackerjack

    Hi Admins, great to see the June service update and beta features. Disappointing lack of any mention of improving ASR accuracy. Any work going on in that much-needed area please?

  • I see that Speechmatics can accept an individual dictionary with each request (https://docs.speechmatics.com/en/cloud/saasv2api/#transcriptionconfig as "additional_vocab"), and this would be very useful in two ways. The first, and highest priority, would be to take the captured text from a PPT and provide that as a dictionary. Most of the important words will be included in the PPT, so this would be a quick way to improve accuracy dramatically. The other would be folder-level and site-level dictionary options for creators/admins. I think these should be cumulative. If there is a maximum word count on these requests (which I assume there is), I would like it to start with the containing folder and move up the tree. This way courses can override departments, and departments can override the central configuration.
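A rough sketch of the cumulative-dictionary idea above. The shape of the "additional_vocab" entries (plain strings, or objects with "content" and "sounds_like") follows the Speechmatics docs linked in the comment; everything else, including the merge_dictionaries helper and the override-order logic, is hypothetical:

```python
def merge_dictionaries(levels, max_words=1000):
    """Merge dictionaries from most- to least-specific level.

    `levels` is ordered from the containing folder up the tree to the
    site level, so course entries take precedence over departmental and
    central ones. Truncates at `max_words`, assuming the API caps the
    list size.
    """
    merged, seen = [], set()
    for level in levels:
        for entry in level:
            # Entries may be bare strings or {"content", "sounds_like"} dicts.
            content = entry["content"] if isinstance(entry, dict) else entry
            if content not in seen and len(merged) < max_words:
                seen.add(content)
                merged.append(entry)
    return merged

# Illustrative dictionaries at three levels of the folder tree.
course_dict = [{"content": "Panopto"},
               {"content": "ASR", "sounds_like": ["ay ess are"]}]
dept_dict = ["bronchiectasis",
             {"content": "ASR", "sounds_like": ["a s r"]}]  # overridden by course
site_dict = ["Notre Dame"]

config = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "additional_vocab": merge_dictionaries([course_dict, dept_dict, site_dict]),
    },
}
```

Because the course-level list is merged first, its "sounds_like" spelling for "ASR" wins over the departmental one, which is the override behaviour the comment describes.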

  • This would be really helpful. Reading some of the documentation, it looks like there may be some increased memory and CPU usage, but the benefits might outweigh the costs. It looks like their custom dictionary limit is currently 1,000 words. Pulling the words and phrases from OCR results of slides/screens is a perfect starting point. Do the OCR, get the results, remove all the common and easy words, add the rest as additional_vocab, then submit the audio for ASR.
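The OCR-to-vocab step described above could be sketched like this. The tiny stoplist and the build_additional_vocab name are illustrative assumptions; a real system would use a proper word-frequency list:

```python
# Illustrative stoplist only; a production filter would use a real
# frequency list of common English words.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}

def build_additional_vocab(ocr_text, limit=1000):
    """Keep uncommon OCR terms as candidate dictionary entries.

    Drops short/common words and caps the list at the assumed
    1,000-word dictionary limit.
    """
    vocab, seen = [], set()
    for token in ocr_text.split():
        word = token.strip(".,;:()[]\"'").lower()
        if len(word) > 3 and word not in COMMON_WORDS and word not in seen:
            seen.add(word)
            vocab.append(word)
            if len(vocab) >= limit:
                break
    return vocab

slide_text = ("Treatment of bronchiectasis with azithromycin "
              "and the role of ciliary dyskinesia")
print(build_additional_vocab(slide_text))
# ['treatment', 'bronchiectasis', 'azithromycin', 'role', 'ciliary', 'dyskinesia']
```

The surviving terms would then go into the request as additional_vocab before the audio is submitted for ASR.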

    I like the idea of folder-level dictionaries and rolling the results upwards. I also think there should be a site-level dictionary. (This is not simply a Panopto problem, but I can't tell you how many times I've had a speaker say "N.D.", as in Notre Dame, and seen "Andy", "and he", or "indie" displayed.)

    I also think there should be user level dictionaries and they should be used when submitting audio for captioning.

    If I edit the captions for a session and fix a term (SR -> ASR), it would be awesome if Panopto would offer to replace the rest of the occurrences in the recording. Or it could offer me the option to review (and replace) each instance of that word in the recording (Next, Next, Next...). It could even offer to add that term and the "sounds-like" phrase (ASR, sounds like: ay ess are) to the folder's and/or user's custom dictionary.
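The "fix once, replace everywhere" idea could be sketched as below. All names here (propagate_correction, dictionary_suggestion) are hypothetical, not Panopto functionality:

```python
import re

def propagate_correction(captions, wrong, corrected):
    """Replace whole-word occurrences of `wrong` in every caption line."""
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    return [pattern.sub(corrected, line) for line in captions]

def dictionary_suggestion(corrected, sounds_like):
    """Entry an editor could accept into a folder/user custom dictionary."""
    return {"content": corrected, "sounds_like": sounds_like}

captions = ["SR accuracy is improving", "the SR lexicon needs terms"]
fixed = propagate_correction(captions, "SR", "ASR")
entry = dictionary_suggestion("ASR", ["ay ess are"])
```

An interactive version would show each match for confirmation (the Next, Next, Next flow) instead of replacing unconditionally.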

    What would also be helpful, for a given recording or folder, is a list of the most common ASR results that had very low confidence. If faculty knew the top 10 low-confidence terms that were being interpreted incorrectly, they would know which terms were worth adding to the dictionary.
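Assuming the ASR engine returns a per-word confidence score (Speechmatics does in its transcript output), the low-confidence report suggested above could look like this sketch; the function name and threshold are hypothetical:

```python
from collections import Counter

def low_confidence_terms(words, threshold=0.5, top_n=10):
    """`words` is a list of (term, confidence) pairs from ASR output.

    Returns the most frequent terms recognised below `threshold`.
    """
    counts = Counter(term.lower() for term, conf in words if conf < threshold)
    return counts.most_common(top_n)

asr_output = [("bronchiectasis", 0.31), ("patient", 0.97),
              ("bronchiectasis", 0.28), ("azithromycin", 0.42)]
print(low_confidence_terms(asr_output))
# [('bronchiectasis', 2), ('azithromycin', 1)]
```

Surfacing this list per recording or folder would tell faculty exactly which terms to add to a custom dictionary.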

  • Elba Rios Whiz Kid

    @Charles Barbour , I especially liked your idea for replacing terms in the recording.... "Or it could offer me the option to review (and replace) each instance of that word in the recording (Next, Next, Next...)".

    Also, for low-confidence ASR results, we need a method for capturing measurable data to resolve ASR issues.

    1. A rating system to capture caption-accuracy satisfaction -- could be a star system or a Likert scale.
    2. Ability to generate reports based on the star-system satisfaction level.

