
Volumetric video #286

Open
xfq opened this issue Sep 3, 2021 · 10 comments

xfq commented Sep 3, 2021

During the July 2021 W3C Chinese Interest Group Meeting on Immersive Web Standards, there was a presentation from Prometheus about volumetric video.

Volumetric video is a special kind of video. Prometheus currently uses Three.js to render a 3D scene and 3D characters. A frame of the video is not an image but a 3D model. To produce such a video, they surround the actors with hundreds of cameras and reconstruct a model from the captured footage.

Currently, there is no standardized format for this kind of 3D model stream, so to compress and distribute the data together, they have created a proprietary file format. Put simply, they compress the textures into MP4, embed the model data in the supplemental enhancement information (SEI) of the MP4 file, and distribute the MP4 to a CDN so users can play it on the web (using their web video player).
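To make the SEI approach concrete, here is a minimal sketch, not Prometheus's actual code, of scanning an H.264 Annex B byte stream for SEI NAL units (type 6), where model data like theirs would live. It deliberately skips details a real demuxer must handle (emulation-prevention bytes, SEI payload type/size varints):

```javascript
// Sketch: collect the payloads of SEI NAL units (nal_unit_type == 6)
// from an H.264 Annex B stream. Simplified: no emulation-prevention
// byte removal, no SEI payload-type parsing.
function extractSeiNalUnits(bytes) {
  const units = [];
  let i = 0;
  while (i + 3 < bytes.length) {
    // Look for a 00 00 01 start code.
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      const nalStart = i + 3;
      const nalType = bytes[nalStart] & 0x1f; // low 5 bits of the NAL header
      // Scan forward to the next start code (or end of buffer).
      let j = nalStart + 1;
      while (j + 2 < bytes.length &&
             !(bytes[j] === 0 && bytes[j + 1] === 0 && bytes[j + 2] === 1)) {
        j++;
      }
      let nalEnd = (j + 2 < bytes.length) ? j : bytes.length;
      // Trim a trailing zero that belongs to a 4-byte start code.
      if (nalEnd < bytes.length && bytes[nalEnd - 1] === 0) nalEnd--;
      if (nalType === 6) units.push(bytes.slice(nalStart + 1, nalEnd));
      i = nalEnd;
    } else {
      i++;
    }
  }
  return units;
}
```

In Prometheus's scheme the extracted payload would then be decoded into the per-frame 3D model; today this scan has to happen in JS/WASM because no browser API surfaces SEI directly.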

Compared with traditional 3D modeling, the production cycle of volumetric video is shorter, the cost is lower, and the result is more photorealistic.

Use cases include fashion shows, AR advertising, immersive film, and so on.

Standardization requirements include:

  • Current volumetric videos (3D model streams) have no standardized file format; each volumetric video provider develops its own compression scheme and file format, and the bitrate is relatively high (usually more than 10 Mbps).
  • `<video>` in HTML lacks a callback with a precise timestamp (`HTMLVideoElement.requestVideoFrameCallback()` is just a CG draft).
  • There is no callback for SEI, so they can only use WebAssembly to soft-decode the video; the overall power consumption is relatively high, and it is difficult to achieve smooth playback on older mobile phones.
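For context on the second requirement, a sketch of how `requestVideoFrameCallback()` would be used to align per-frame model data with decoded pictures. `renderFrame` is a hypothetical app callback, not part of any API:

```javascript
// Sketch: drive per-frame rendering with requestVideoFrameCallback().
// metadata.mediaTime is the presentation timestamp of the frame just
// composited, which is what lets the app pick the matching 3D model.
// Browsers without rVFC would need a requestAnimationFrame fallback.
function driveRendering(video, renderFrame) {
  function onFrame(now, metadata) {
    renderFrame(metadata.mediaTime, video);
    // Re-register for the next frame (the callback fires only once).
    video.requestVideoFrameCallback(onFrame);
  }
  video.requestVideoFrameCallback(onFrame);
}
```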
xfq added the Media label Sep 3, 2021
xfq commented Nov 17, 2021

Possibly related issue on SEI: w3c/webcodecs#198

tidoust commented Nov 17, 2021

> No callback for SEI (so they can only use WebAssembly to soft-decode the video, the overall power consumption will be relatively high, and it is difficult to achieve smooth playback on older mobile phones).

That may have been discussed during the presentation, but it could be good to quantify the overhead that this triggers. Could that be provided as input to w3c/webcodecs#198?

This is the sort of consideration that will likely be used to determine whether SEI metadata should be provided in the output callback of WebCodecs, or whether it's fine to leave that up to applications (as muxing/demuxing operations for the time being). In particular, if I understand things correctly, the frames themselves don't need to be soft-decoded here, "just" the SEI information.

xfq commented Nov 18, 2021

> That may have been discussed during the presentation, but it could be good to quantify the overhead that this triggers. Could that be provided as input to w3c/webcodecs#198?

I'm not aware of any relevant quantitative tests. Is there a standardized test method? Or maybe the browser DevTools are enough?

> This is the sort of considerations that will likely be used to determine whether SEI metadata should be provided in the output callback of WebCodecs or whether it's fine to leave that up to applications (as muxing/demuxing operations for the time being). In particular, if I understand things correctly, the frames themselves don't need to be soft-decoded here, "just" the SEI information.

Unfortunately, the video itself also requires soft decoding. IIUC hard-decoded video provides no per-frame callback, so the frames cannot be aligned with the SEI data; both the video and the SEI information therefore need to be soft-decoded.

(There are some other use cases that need SEI metadata. I will share it with you in a separate thread.)

tidoust commented Nov 18, 2021

> I'm not aware of any relevant quantitative tests. Is there a standardized test method? Or maybe the browser DevTools is enough?

I don't know :) I guess anything, from CPU/memory usage and FPS counts to more fine-grained analyses, would help.

> Unfortunately, the video itself also requires soft decoding. IIUC hard-decoded video has no callback for the frame, so it is not possible to align the frame, so both the video and the SEI information need to be soft-decoded.

I meant once WebCodecs is more broadly available. In such a scenario, and unless WebCodecs gets amended to also expose the SEI metadata, the app would need to soft-decode the SEI information, but WebCodecs would still take care of decoding the frames, so the app could in theory skip that?
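The division of labor described here can be sketched with WebCodecs: the browser decodes the pictures while the app only parses the encoded bytes for SEI. In this sketch, `parseSei` and `onFrame` are hypothetical app callbacks and the codec string is just an example H.264 profile:

```javascript
// Sketch: WebCodecs decodes frames; the app scans chunks for SEI itself.
function setUpDecoder(onFrame, parseSei) {
  const decoder = new VideoDecoder({
    output: (frame) => {
      // frame.timestamp (microseconds) can be matched against the
      // timestamp of the SEI-carried model data parsed by the app.
      onFrame(frame);
      frame.close(); // release the frame once rendered
    },
    error: (e) => console.error(e),
  });
  decoder.configure({ codec: "avc1.640028" }); // example H.264 config
  return {
    push(chunk) {
      parseSei(chunk);       // app-level SEI scan of the encoded bytes
      decoder.decode(chunk); // platform (possibly hardware) decode
    },
  };
}
```

The point of the sketch: only the lightweight SEI scan stays in JS/WASM; the expensive pixel decode no longer needs to be soft-decoded.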

> (There are some other use cases that need SEI metadata. I will share it with you in a separate thread.)

Feel free to use w3c/webcodecs#198 directly, where they're trying to collect such use cases, and where you'll get much more expert feedback ;)

xfq commented Nov 20, 2021

> > I'm not aware of any relevant quantitative tests. Is there a standardized test method? Or maybe the browser DevTools is enough?
>
> I don't know :) I guess anything, from CPU/memory usage and FPS counts to more fine-grained analyses would help.

OK. I'll ask if Prometheus has any quantitative analysis.

> > Unfortunately, the video itself also requires soft decoding. IIUC hard-decoded video has no callback for the frame, so it is not possible to align the frame, so both the video and the SEI information need to be soft-decoded.
>
> I meant once WebCodecs is more broadly available. In such a scenario, and unless WebCodecs gets amended to also expose the SEI metadata, the app would need to soft-decode the SEI information, but WebCodecs would still take care of decoding the frames, so app could in theory skip that?

Indeed.

xfq commented Dec 8, 2021

Prometheus tested WebCodecs; the decoded output is a `VideoFrame`. They found that the main problem currently is that converting a `VideoFrame` to a WebGL texture is very slow.

They tested a 3840×3840 video. With decoding only, the frame rate is about 80 fps; adding `frame.clone()`, it is still about 80 fps. But after adding `drawImage` or `texSubImage2D`, the frame rate dropped to about 10 fps:

[Screenshot "sei": frame rate test results]

tidoust commented Dec 8, 2021

Thanks for the exploration. The conversion of `VideoFrame` to WebGL/WebGPU textures has been the topic of various discussions; I'm not sure where we are right now, although `drawImage` seems like a basic need and should not be slower than using a `<video>` element as the parameter instead of a `VideoFrame`. Could an issue be raised on the WebCodecs repo so that relevant experts can shed some light on reasons that may explain the drop in frame rate?

Back to requirements listed at the top of this issue, what I was more interested in is an evaluation of the time and resources needed to parse SEI metadata at the application layer, to inform the discussion on whether browsers should expose SEI metadata natively. This seems orthogonal to the problem of converting VideoFrame to textures.

xfq commented Dec 10, 2021

> Thanks for the exploration. The conversion of VideoFrame to WebGL/WebGPU textures has been the topic of various discussions, not sure where we are right now, although drawImage seems like a basic need and should not be slower than when using a `<video>` element as parameter instead of a VideoFrame. Could an issue be raised on the WebCodecs repo so that relevant experts shed some light on reasons that may explain the drop in frame rate?

Filed w3c/webcodecs#421

> Back to requirements listed at the top of this issue, what I was more interested in is an evaluation of the time and resources needed to parse SEI metadata at the application layer, to inform the discussion on whether browsers should expose SEI metadata natively. This seems orthogonal to the problem of converting VideoFrame to textures.

They haven't tested it yet, but they wonder if there's any reason not to expose SEI metadata to Web developers. Because they modified the MP4 format, they can use their customized uuid to find the SEI with MP4Box.js, which is convenient; but for general MP4 files, it is more difficult.
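A sketch of the MP4Box.js route for a general MP4 file: extract the video track's samples and scan their bytes for SEI. `findSei` is a hypothetical app-side scanner; the `MP4Box` module is passed in rather than imported:

```javascript
// Sketch: pull samples from the first video track with MP4Box.js
// (createFile / onReady / setExtractionOptions / onSamples), then let
// the app search each sample's bytes for SEI NAL units.
function extractTrackSamples(MP4Box, buffer, findSei) {
  const file = MP4Box.createFile();
  file.onReady = (info) => {
    const track = info.videoTracks[0];
    file.setExtractionOptions(track.id, null, { nbSamples: 100 });
    file.onSamples = (id, user, samples) => {
      for (const sample of samples) findSei(sample.data);
    };
    file.start();
  };
  buffer.fileStart = 0; // MP4Box.js requires a fileStart offset
  file.appendBuffer(buffer);
}
```

This works, but it means shipping a demuxer and a NAL parser to every client just to read metadata the decoder already saw, which is the cost being weighed in this thread.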

tidoust commented Dec 10, 2021

> They haven't tested it yet, but they wonder if there's any reason not to expose SEI metadata to Web developers.

I would say that, generally speaking, that is not how standardization works ;)

As far as I can tell, initial exchanges don't reveal anything that would make exposing SEI metadata a bad idea. That does not necessarily make it a good idea. The good thing about leaving it up to applications is that you then don't have to worry about interoperability issues between browser implementations: the application controls the JS/WASM code it ships, and that code should run equally well across browsers. If browsers need to support this natively, we will have to make sure that implementations are interoperable. That is certainly doable; the main question on the table, besides the level at which such an API should be exposed (media element or WebCodecs), is whether it is really needed or worth the effort if applications can already do it at a reasonable cost.

I note that the Media & Entertainment Interest Group discussed SEI metadata exposure last week (minutes not yet published), and discussions will continue in the Media Timed Events task force in that group.
