Conversation

jinhohwang-meta

This PR added the benchmark for opencv. The benchmark shows that there is gap. Hope the torchcodec team can find out where the gap is from and improve the decoding performance.

Local benchmark

|  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: --------------------------------------------------------------------------------------------------------------------------------
      OpenCV            |            22.4            |            22.6           |       6.4        |        9.3        |        18.1
      TorchCodecPublic  |            39.4            |            28.1           |       9.5        |        9.8        |        26.0

Times are in milliseconds (ms).

Image

@facebook-github-botfacebook--bot added the CLA SignedThis label is managed by the Meta Open Source bot.label May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.

Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

raise ValueError("Could not open video stream")

fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frames seems unused and gets redefined below

Comment on lines +157 to +160
if not vr.isBackendBuiltIn(backend):
_, abi, api = vr.getStreamBufferedBackendPluginVersion(backend)
if (abi < 1 or (abi == 1 and api < 2)):
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining why this is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinhohwang-meta, can we be more descriptive of what the meaning of these backends are? Why are we checking for specific ranges of the ABI and API?

I think we'll also want assert api_pref is not None after the loop.

Comment on lines 180 to 181
if not ok:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's raise here instead of a break. This will ensure that we are not reporting results in case cap.grab() fails.


fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
approx_frame_numbers = [int(pts * fps) for pts in pts_list]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/numbers/indices. We use the term indices for this.

for i in range(n):
ok = cap.grab()
if not ok:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we should raise instead of break

ret, frame = cap.retrieve()
if ret:
frames.append(frame)
cap.release()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assert that len(frames) == n to ensure no error went undetected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.

Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

@scotts

The comparison we'll probably be most interested in is this one:

python benchmarks/decoders/benchmark_decoders.py --decoders opencv,torchcodec_public:seek_mode=exact,torchcodec_public:seek_mode=approximate,torchaudio --min-run-seconds 40

That is, it compares:

  1. OpenCV
  2. TorchCodec public API, exact seek mode
  3. TorchCodec public API, approximate seek mode
  4. TorchAudio

We'll also want to use some more videos - the ones that are part of the README benchmark are good to use. We generate those on the fly here:

resolutions = ["1920x1080"]
encodings = ["libx264"]
patterns = ["mandelbrot"]
fpses = [60]
gop_sizes = [600]
durations = [10, 120]
pix_fmts = ["yuv420p"]
ffmpeg_path = "ffmpeg"
generate_videos(
resolutions,
encodings,
patterns,
fpses,
gop_sizes,
durations,
pix_fmts,
ffmpeg_path,
videos_dir_path,
)

That doesn't need to be a part of this PR. In this PR, we should focus on correct usage of the OpenCV API and then we can investigate the comparative performance in follow-ups. Just making a note. :)

@jinhohwang-meta

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.

Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

@NicolasHug I actually did compare with approximate as well, but it showed similar results. @scotts asked me to push this for the team to follow on to find out the gap and I am very much interested in to see why as well.

@Dan-FloresDan-Flores mentioned this pull request Jun 4, 2025
Sign up for free to join this conversation on . Already have an account? Sign in to comment
CLA SignedThis label is managed by the Meta Open Source bot.
None yet

Successfully merging this pull request may close these issues.