Scrape for captions/slide text via API?

Matthew Polaniecki · March 2021

We are working on a project to parse through caption/slide data for keywords. We are looking for any suggestions for pulling these captions via API. Our team is aware that captions aren't yet available for download via the API, but we can infer the caption download URL based on the session's deliveryID, which should work for our needs. In addition, we're hoping to obtain presentation slide data. Is there any way in the API that we could return slide images (or the entire presentation file) along with slide change time metadata?

Our goal is to supplement our existing medical dictionary of terms that students must learn with an easy way to locate where those terms come up in the curriculum. We'd like to start development as soon as possible and begin querying sessions in Panopto by mid-April. Any advice would be greatly appreciated, along with any suggestions for different ways to obtain the data we're looking for.

Michael Espey · March 2021

You could potentially build something to consume opensearch results.

This would pull in OCR, ASR, captions, slide titles, slide notes, descriptions, etc. It wouldn't give you the individual slide images, nor would it give you the slide deck, but it would give you what Panopto identified and pulled out.

I would also be curious if there would be a way to pull out the search index for a session. That should give you the data that you are looking for. I am not sure that is something Panopto would be able to provide, but it might be worth the ask.

Matthew Polaniecki · March 2021

I think one challenge is that the endpoints may not have been fully documented. Does anyone have a full list of what's possible via REST API?

Joe Malmsten · March 2021

Hi,

Here is a full list of our current REST API endpoints.

https://demo.hosted.panopto.com/Panopto/Api/Docs/index.html

While there are no endpoints to get the slide data, when you get a session using the REST API the response includes a link to download any existing captions.

Thanks,

Joe Malmsten
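
As a rough illustration of Joe's point, the caption link could be read from a session response along the following lines. This is only a sketch: the `/Panopto/api/v1/sessions/{id}` path is inferred from the docs URL above and the `/api/v1` prefix used elsewhere in this thread, and the `Urls` / `CaptionDownloadUrl` field names are assumptions that should be checked against the API docs for your site. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed API root, inferred from the docs URL and the /api/v1 prefix.
API_ROOT = "/Panopto/api/v1"


def session_url(host: str, session_id: str) -> str:
    """Build the URL for fetching one session (path shape is an assumption)."""
    return f"https://{host}{API_ROOT}/sessions/{session_id}"


def fetch_session(host: str, session_id: str, token: str) -> dict:
    """GET the session as JSON, sending an OAuth2 bearer token."""
    request = urllib.request.Request(
        session_url(host, session_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def caption_link(session: dict) -> Optional[str]:
    """Return the caption download link, or None if the session has none.

    Assumes the link is stored at session["Urls"]["CaptionDownloadUrl"];
    verify the field name in the API docs before relying on it.
    """
    urls = session.get("Urls") or {}
    return urls.get("CaptionDownloadUrl") or None
```

`fetch_session` needs a valid access token for your Panopto site; `caption_link` works on any already-parsed session response, so it can be tried against saved JSON first.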

Matthew Polaniecki · June 2021

Hi Joe,

We were actually able to get the Slide and OCR data. What are the possible EventTargetType values that can be searched for in a session?

Hiroshi Ohno · June 2021

Hi Matthew,

This is Hiroshi from the Panopto engineering team, working with Joe.

Panopto does not provide raw slide or OCR data, or aggregated search index data, through the API. The Panopto API provides only the search capability itself.

I am not sure what your inquiry about "EventTargetType" refers to. As far as I know, that term is not mentioned in the REST API documentation. If you can point out where that term is discussed in our documentation or other material, I may be able to give more information.

Note that Panopto provides a system-level index export, which is called federated search integration. This requires work at the Panopto system admin level, and I am not sure whether it is something you are interested in. If it is, please work with your organization's POC (Point of Contact) for Panopto so that our customer support team can discuss that option further.

Hiroshi Ohno · June 2021

Matthew,

Let me add one more thing.

If you pass includeFields=Context to GET /api/v1/sessions/search, the response includes information about where the query word hits inside the video. The result should be equivalent to what you get from a search in the Panopto UI.

This might satisfy your need to locate a specific term within the timeline of the videos. Please evaluate it.
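
Hiroshi's suggestion could be sketched roughly as follows. The endpoint and `includeFields=Context` come from his post; the `searchQuery` parameter name, the `/Panopto` path prefix, and the `Results`, `Name`, and `Context` response fields are assumptions to confirm against the API docs. The function names are hypothetical, and the shape of each context entry is deliberately left untouched because it is not described in this thread.

```python
import urllib.parse
from typing import Iterator, Tuple


def search_url(host: str, term: str) -> str:
    """Build a sessions/search URL that asks for in-video hit context.

    "searchQuery" is an assumed parameter name; includeFields=Context
    is the option described in the post above.
    """
    query = urllib.parse.urlencode(
        {"searchQuery": term, "includeFields": "Context"}
    )
    return f"https://{host}/Panopto/api/v1/sessions/search?{query}"


def iter_hits(payload: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (session name, context entry) pairs from a search response.

    Assumes sessions are listed under "Results" and each session carries
    a "Context" list; entries are passed through unchanged.
    """
    for session in payload.get("Results") or []:
        for hit in session.get("Context") or []:
            yield session.get("Name", ""), hit
```

Running each dictionary term through `search_url`, fetching the result with an authenticated request, and feeding the parsed JSON to `iter_hits` would give one row per place a term appears, which is close to the term-to-curriculum index described at the top of the thread.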
