Better ASR Captions

Michael Espey · October 2020

In many cases the ASR available in Panopto is not as accurate as we would like, so we are starting to look into what a more accurate ASR solution for media in Panopto might look like. I have been testing the Azure Media Services offering for audio analysis. It seems to be quite accurate and very quick, and the pricing seems reasonable.

Beyond the Azure solution, the AWS offering gives timings for each word, which would allow us to reflow captions after edits (https://community.panopto.com/discussion/743/reflow-captions). It would even allow some infrastructure to be built to translate the text and convert the translated text to audio (https://aws.amazon.com/blogs/machine-learning/create-video-subtitles-with-translation-using-machine-learning/). This sort of session, with multiple audio tracks a user can pick from, isn't available in Panopto today, but supporting something like this for ASR would open the door to that development in the future. Obviously there is some cost beyond the "out of the box" experience, but having this available could enable other modes of operation in Panopto.
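To illustrate why per-word timings matter for reflowing captions, here is a minimal sketch in Python. It assumes the ASR output has already been converted into a list of words with start and end times in seconds; the `Word` and `Cue` types, the `reflow` function, and the `max_chars` and `max_gap` limits are all hypothetical names chosen for this example, not part of Panopto or any ASR provider's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the media
    end: float


@dataclass
class Cue:
    start: float
    end: float
    text: str


def _make_cue(words):
    # A cue spans from its first word's start to its last word's end.
    return Cue(words[0].start, words[-1].end, " ".join(w.text for w in words))


def reflow(words, max_chars=42, max_gap=1.0):
    """Group timed words into caption cues.

    A new cue starts when adding the next word would exceed max_chars,
    or when the pause before the next word is longer than max_gap seconds.
    Both limits are assumptions picked for illustration.
    """
    cues = []
    current = []
    for w in words:
        if current:
            candidate = " ".join(x.text for x in current) + " " + w.text
            gap = w.start - current[-1].end
            if len(candidate) > max_chars or gap > max_gap:
                cues.append(_make_cue(current))
                current = []
        current.append(w)
    if current:
        cues.append(_make_cue(current))
    return cues


def to_timestamp(seconds):
    # Format seconds as an SRT-style timestamp: HH:MM:SS,mmm
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```

Because each word keeps its own timing, an editor can correct or remove a word and simply call `reflow` again; the cue boundaries and timestamps are recomputed from the remaining words rather than being edited by hand.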

For the feature request: would it be possible to tie in other ASR platforms, namely Azure Media Services (https://docs.microsoft.com/en-us/azure/media-services/latest/media-services-apis-overview)?

If there are plans to improve the quality of the ASR that Panopto provides out of the box, that is great. If not, giving us the flexibility to tie into other ASR providers would be helpful.
