add opencv benchmark by jinhohwang-meta · Pull Request #674 · pytorch/torchcodec ·

This PR added the benchmark for opencv. The benchmark shows that there is gap. Hope the torchcodec team can find out where the gap is from and improve the decoding performance.

Local benchmark

|  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: --------------------------------------------------------------------------------------------------------------------------------
      OpenCV            |            22.4            |            22.6           |       6.4        |        9.3        |        18.1
      TorchCodecPublic  |            39.4            |            28.1           |       9.5        |        9.8        |        26.0

Times are in milliseconds (ms).

add opencv benchmark

406e336

facebook--bot added the CLA Signedlabel

NicolasHug reviewed

View reviewed changes

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.

Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

benchmarks/decoders/benchmark_decoders_library.py Outdated

+raise ValueError("Could not open video stream")
+fps = cap.get(cv2.CAP_PROP_FPS)
+frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

Member

NicolasHug May 12, 2025

frames seems unused and gets redefined below

benchmarks/decoders/benchmark_decoders_library.py

Comment on lines +157 to +160

+if not vr.isBackendBuiltIn(backend):
+_, abi, api = vr.getStreamBufferedBackendPluginVersion(backend)
+if (abi < 1 or (abi == 1 and api < 2)):
+continue

Member

NicolasHug May 12, 2025

Can you add a comment explaining why this is needed?

Contributor

scotts May 15, 2025

@jinhohwang-meta, can we be more descriptive of what the meaning of these backends are? Why are we checking for specific ranges of the ABI and API?

I think we'll also want assert api_pref is not None after the loop.

benchmarks/decoders/benchmark_decoders_library.py Outdated

Comment on lines 180 to 181

		if not ok:
		break

Member

NicolasHug May 12, 2025

Let's raise here instead of a break. This will ensure that we are not reporting results in case cap.grab() fails.

benchmarks/decoders/benchmark_decoders_library.py Outdated

+fps = cap.get(cv2.CAP_PROP_FPS)
+frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+approx_frame_numbers = [int(pts * fps) for pts in pts_list]

Member

NicolasHug May 12, 2025

Nit: s/numbers/indices. We use the term indices for this.

benchmarks/decoders/benchmark_decoders_library.py Outdated

+for i in range(n):
+ok = cap.grab()
+if not ok:
+break

Member

NicolasHug May 12, 2025

Same here, we should raise instead of break

benchmarks/decoders/benchmark_decoders_library.py

+ret, frame = cap.retrieve()
+if ret:
+frames.append(frame)
+cap.release()

Member

NicolasHug May 12, 2025

Let's assert that len(frames) == n to ensure no error went undetected

NicolasHug reviewed

View reviewed changes

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.

Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

The comparison we'll probably be most interested in is this one:

python benchmarks/decoders/benchmark_decoders.py --decoders opencv,torchcodec_public:seek_mode=exact,torchcodec_public:seek_mode=approximate,torchaudio --min-run-seconds 40

That is, it compares:

OpenCV
TorchCodec public API, exact seek mode
TorchCodec public API, approximate seek mode
TorchAudio

We'll also want to use some more videos - the ones that are part of the README benchmark are good to use. We generate those on the fly here:

resolutions = ["1920x1080"]
encodings = ["libx264"]
patterns = ["mandelbrot"]
fpses = [60]
gop_sizes = [600]
durations = [10, 120]
pix_fmts = ["yuv420p"]
ffmpeg_path = "ffmpeg"
generate_videos(
resolutions,
encodings,
patterns,
fpses,
gop_sizes,
durations,
pix_fmts,
ffmpeg_path,
videos_dir_path,
)

That doesn't need to be a part of this PR. In this PR, we should focus on correct usage of the OpenCV API and then we can investigate the comparative performance in follow-ups. Just making a note. :)

Thanks a lot for the benchmark @jinhohwang-meta , that's very useful! I made a few comments below.
Based on how the code is converting pts to indices (using the average fps), I think a fairer comparison against torchcodec would be to benchmark torchcodec with the approximate mode (instead of exact mode). I think it corresponds to torchcodec_core_nonbatch, although I'm not 100% sure (CC @scotts )

@NicolasHug I actually did compare with approximate as well, but it showed similar results. @scotts asked me to push this for the team to follow on to find out the gap and I am very much interested in to see why as well.

reflect comments

f85361a

Dan-Flores mentioned this pull request

add opencv benchmark #711

Merged

Sign up for free to join this conversation on . Already have an account? Sign in to comment