What is Panopto looking at to do AI Audio Description?

I'm just starting to play with this option with my Disabilities office. For a chalk-talk class it seems to work great so far, but with a mixed chalk/PC presentation, not so much.

So I'm curious what it's actually looking at: just the primary track, or, in the case of a Pearl capture with multiple tracks, the other sources as well? In particular, I'm thinking of descriptions of a PC PowerPoint slide, where even an OCR scrape captures only a limited amount of the slide's content.

Anyone have any information on this from your own testing?

Thanks!

Elaine

Answers

  • Brianna Parker, Administrator

    Hey Elaine!

    Our audio descriptions are generated based on two key things:

    1. Transcriptions. As with our automatic captions, audio descriptions draw on the sounds heard across all audio streams; a stream without audio contributes nothing here. For such streams, audio descriptions have to rely entirely on clear motion in that stream to determine what should and should not be described.
    2. Scene analysis. The technology "watches" the content of every stream that is actively playing at each point in the video, including streams without audio, and applies the appropriate audio description. For example, if it recognizes a person clapping their hands, it will generate an audio description for that.

    It's important to note that automatic audio descriptions may not be 100% accurate. We always encourage users to review anything that was automatically generated, both captions and audio descriptions. Audio descriptions may struggle to capture screen content that is blurry, or actions that are partially obscured, such as when a person's back is turned to the camera but their arms and shoulders are moving in a way that suggests clapping.

    So, in your specific example, a PowerPoint slide may not be interpreted as easily as a standard video stream, depending on how you have it set up. If the slide stream has no clear audio cues or significant movement, the technology may not pick up on its content.

    If streams that very clearly should have audio descriptions are not being described correctly, please submit a Support ticket. Be sure to include a link to the relevant video!
