Welcome to the Panopto Community

Scrape for captions/slide text via API?

We are working on a project to parse through caption/slide data for keywords. We are looking for any suggestions for pulling these captions via API. Our team is aware that captions aren't yet available for download via the API, but we can infer the caption download URL based on the session's deliveryID, which should work for our needs. In addition, we're hoping to obtain presentation slide data. Is there any way in the API that we could return slide images (or the entire presentation file) along with slide change time metadata?

Our goal is to supplement our existing medical dictionary of terms that students must learn with an easy way to locate where those terms come up in the curriculum. We'd like to start development as soon as possible and begin querying sessions in Panopto by mid-April. Any advice would be greatly appreciated, along with any suggestions for different ways to obtain the data we're looking for.

Answers

  • You could potentially build something to consume opensearch results.

    This would pull in OCR, ASR, captions, slide titles, slide notes, descriptions, etc. It wouldn't give you the individual slide images, nor would it give you the slide deck, but it would give you what Panopto identified and pulled out.

    I would also be curious if there would be a way to pull out the search index for a session. That should give you the data that you are looking for. I am not sure that is something Panopto would be able to provide, but it might be worth the ask.

  • I think one challenge is that the endpoints may not have been fully documented. Does anyone have a full list of what's possible via REST API?

  • Joe Malmsten (Panopto Employee)

    Hi,

    Here is a full list of our current REST API endpoints.

    https://demo.hosted.panopto.com/Panopto/Api/Docs/index.html

    While there are no endpoints to get the slide data, when you get a session using the REST API, the response includes a link to download any existing captions.

    Thanks,

    Joe Malmsten

  • Hi Joe,

    We were actually able to get the Slide and OCR data. What are the possible EventTargetType values that can be searched for in a session?

  • Hiroshi Ohno (Panopto Employee)

    Hi Matthew,

    This is Hiroshi from the Panopto engineering team, working with Joe.

    Panopto does not provide the raw Slide or OCR data, or aggregated search index data, through the API. The Panopto API provides only the search capability itself.

    I am not sure I understand your inquiry about "EventTargetType". As far as I know, that term is not mentioned in the REST API documentation. If you can point out where that term is discussed in our documentation or other material, I may be able to give more information.

    Note that Panopto provides a system-level index export, which is called federated search integration. This requires work at the Panopto system admin level, and I am not sure whether this is something you are interested in. If it is, please work with your organization's POC (Point of Contact) for Panopto so that our customer support team can discuss that option further.

  • Hiroshi Ohno (Panopto Employee)

    Matthew,

    Let me add one more thing.

    If you pass includeFields=Context to GET /api/v1/sessions/search, the response includes information about where the query term matches inside the video. The result should be equivalent to what you get from a search in the Panopto UI.

    This might satisfy your need to find out a specific term within the timeline of the videos. Please evaluate it.
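
The two suggestions in this thread (searching with includeFields=Context, and reading the caption download link from a session response) could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not confirmed in the thread: that the API lives under the /Panopto path on your server (inferred from the documentation URL above), that the search term is passed in a query parameter named searchQuery, that requests are authorized with an OAuth2 bearer token, and that the caption link appears in the session JSON under Urls.CaptionDownloadUrl. The helper names are hypothetical. Check each of these against the REST API documentation linked above before relying on them.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_search_url(server: str, query: str, include_context: bool = True) -> str:
    """Build the URL for GET /api/v1/sessions/search.

    Assumptions: the "/Panopto" path prefix and the "searchQuery" parameter
    name. Only includeFields=Context is taken from the thread.
    """
    params = {"searchQuery": query}
    if include_context:
        # Ask the server to include where the query term hits in each video.
        params["includeFields"] = "Context"
    return f"https://{server}/Panopto/api/v1/sessions/search?{urllib.parse.urlencode(params)}"


def build_session_url(server: str, session_id: str) -> str:
    """Build the URL for fetching a single session (assumed path layout)."""
    quoted_id = urllib.parse.quote(session_id, safe="")
    return f"https://{server}/Panopto/api/v1/sessions/{quoted_id}"


def fetch_json(url: str, token: str, timeout: float = 30.0) -> dict:
    """GET a URL with a bearer token (assumed auth scheme) and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def caption_url(session: dict) -> Optional[str]:
    """Return the caption download link from a session response, if present.

    Assumption: the link is stored under Urls.CaptionDownloadUrl.
    """
    return (session.get("Urls") or {}).get("CaptionDownloadUrl")
```

To use it, you would call fetch_json(build_search_url(server, term), token) once per dictionary term and inspect the returned JSON for the context information, then call fetch_json(build_session_url(server, session_id), token) and pass the result to caption_url for any session whose captions you want to download. The URL builders are kept separate from the network call so they can be checked without a live server.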
