
video: VP9 fallback + BYO H.264 + PyAV 15 bump#1999

Open
suiyoubi wants to merge 3 commits into main from aot/video-encoder-unified


@suiyoubi
Contributor

Summary

Unifies three video-encoder PRs that previously landed on r1.2.0 into a single change targeting main. Each PR is cherry-picked from its r1.2.0 squash commit; the branch carries the same three commits in order.

| # | Original PR | Squash on r1.2.0 | Scope |
|---|---|---|---|
| 1 | #1930 — Remove libopenh & libx264, add VP9 | 79cacb469 | Drop libopenh264/libx264, add libvpx-vp9 as the CPU fallback encoder alongside h264_nvenc; update ClipTranscodingStage validation, docs, and tutorials. |
| 2 | #1959 — FFmpeg + video processing support | 8055f35e2 | Bring-your-own H.264 install path (install_h264_support.sh), refactor install_ffmpeg.sh, bump PyAV 13.1.0 → 15.1.0, expand decoder_utils, update CI/Dockerfile. |
| 3 | #1973 — Fix motion vector export | f07fa0e19 | Add _resolve_export_mvs_flag to handle PyAV's export_mvs flag rename across versions; covers the API change introduced by the PyAV 15 bump in #1959. |

Cherry-pick notes

suiyoubi and others added 3 commits May 19, 2026 08:05
* fix: remove libopenh264 and replace with libx264

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: install libx264

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: drop libx264 support

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Enhance video transcoding support by adding `libvpx-vp9` as a CPU fallback encoder alongside `h264_nvenc`. Update installation instructions and verification steps to reflect the new encoder options. Modify `ClipTranscodingStage` to validate encoder selection and handle encoding options for both encoders. Update relevant documentation and examples to guide users on using the new encoder.

Signed-off-by: Ao Tang <aot@nvidia.com>

* Minor update

Signed-off-by: Ao Tang <aot@nvidia.com>

* refine docs

Signed-off-by: Ao Tang <aot@nvidia.com>

* refine docs

Signed-off-by: Ao Tang <aot@nvidia.com>

* add back libopenh264 and inform users to install it themselves if needed

Signed-off-by: Ao Tang <aot@nvidia.com>

* docs format

Signed-off-by: Ao Tang <aot@nvidia.com>

---------

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update FFmpeg and video processing support

- Upgrade PyAV dependency from version 13.1.0 to 15.1.0 in `pyproject.toml`.
- Add support for software H.264/HEVC/AV1 decoders in the Curator container by introducing an opt-in script (`install_h264_support.sh`) that recompiles FFmpeg with the necessary decoders.
- Enhance error handling in `VideoReaderStage` to log warnings when software codecs are missing, improving user feedback.
- Update documentation to reflect changes in codec support and installation instructions for users needing software decoders.
- Modify Dockerfile to ensure FFmpeg is discoverable by source-built Python dependencies.

This commit aims to improve video processing capabilities and user experience when handling H.264/HEVC/AV1 inputs.

Signed-off-by: Ao Tang <aot@nvidia.com>

* Enhance CI workflows with FFmpeg library installation

- Added steps to install FFmpeg development libraries in both `cicd-main.yml` and `install-test.yml` workflows, ensuring necessary dependencies for source-built PyAV are available.
- Improved error logging in `VideoReaderStage` by simplifying the warning message for missing software codecs.
- Updated test cases in `test_decoder_utils.py` to handle codec-related error messages more accurately.

These changes aim to streamline the video processing pipeline and improve error handling for codec issues.

Signed-off-by: Ao Tang <aot@nvidia.com>

* Refactor FFmpeg installation in CI workflows and Dockerfile

- Removed FFmpeg development library installation steps from `cicd-main.yml` and `install-test.yml` workflows to streamline CI processes.
- Updated `Dockerfile` to enforce source-building of PyAV with the `--no-binary-package av` option, ensuring compatibility with the FFmpeg version used in the Docker image.
- Added explanatory comments in `pyproject.toml` regarding the handling of PyAV dependencies in Docker.

These changes aim to improve the build process and maintain consistency in dependency management across environments.

Signed-off-by: Ao Tang <aot@nvidia.com>

* wording

Signed-off-by: Ao Tang <aot@nvidia.com>

---------

Signed-off-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
* fix: update motion vector export flag handling in PyAV

Signed-off-by: Ao Tang <aot@nvidia.com>

* ruff check

Signed-off-by: Ao Tang <aot@nvidia.com>

* format

Signed-off-by: Ao Tang <aot@nvidia.com>

* feat: implement _resolve_export_mvs_flag function for PyAV compatibility

Added a new function to handle the EXPORT_MVS bitflag for different PyAV versions, ensuring compatibility and preventing silent failures in motion vector retrieval. Updated the motion vector decoding logic to utilize this new function. Added unit tests to verify the correct behavior of the flag resolution.

Signed-off-by: Ao Tang <aot@nvidia.com>

---------

Signed-off-by: Ao Tang <aot@nvidia.com>
@suiyoubi suiyoubi requested review from a team and abhinavg4 as code owners May 19, 2026 15:15
@suiyoubi
Contributor Author

/ok to test 328033c

@greptile-apps
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR cherry-picks three r1.2.0 video encoder changes onto main: it drops libx264, adds libvpx-vp9 as a CPU fallback encoder, introduces a BYO-H.264 install path (install_h264_support.sh), refactors install_ffmpeg.sh to build a minimal shared-library FFmpeg from source, bumps PyAV 13.1.0 → 15.1.0, and adds _resolve_export_mvs_flag to handle the EXPORT_MVS flag rename across PyAV versions.

  • VP9 fallback: ClipTranscodingStage now accepts libvpx-vp9 with setup-time validation, a perf-advisory log, and CRF/VBR command-building; libopenh264 remains supported, but its availability is now probed at setup time instead of at encode time.
  • BYO H.264: New install_h264_support.sh recompiles FFmpeg in-container with software h264/hevc/av1 decoders and an optional libopenh264 encoder; decoder_utils raises SoftwareCodecMissingError (with a targeted hint) when ffprobe fails due to missing NVDEC.
  • PyAV 15 compat: _resolve_export_mvs_flag tries lowercase export_mvs first (PyAV ≥15), falls back to EXPORT_MVS (PyAV ≤13); unit tests pin both branches so a future rename would surface as a test failure.
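For readers less familiar with VP9 rate control, here is a minimal sketch of what encoder validation plus "CRF/VBR command-building" can look like. It is not the code in `ClipTranscodingStage`; the helper `build_encoder_args` and the `SUPPORTED_ENCODERS` tuple are hypothetical. The one FFmpeg-specific detail it relies on is that libvpx-vp9 only runs in constant-quality mode when `-crf` is paired with `-b:v 0`, while `-b:v` alone gives bitrate-targeted VBR.

```python
from typing import List, Optional

# Hypothetical allow-list mirroring the encoders discussed in this PR.
SUPPORTED_ENCODERS = ("h264_nvenc", "libvpx-vp9", "libopenh264")


def build_encoder_args(
    encoder: str,
    crf: Optional[int] = None,
    bitrate: Optional[str] = None,
) -> List[str]:
    """Return FFmpeg video-encoder arguments (illustrative, not Curator's code).

    In this sketch `crf` is only interpreted for libvpx-vp9.
    """
    # Fail fast on an unknown encoder, in the spirit of setup-time validation.
    if encoder not in SUPPORTED_ENCODERS:
        raise ValueError(f"Unsupported encoder {encoder!r}; choose one of {SUPPORTED_ENCODERS}")
    args = ["-c:v", encoder]
    if encoder == "libvpx-vp9" and crf is not None:
        if not 0 <= crf <= 63:
            raise ValueError("libvpx-vp9 CRF must be in [0, 63]")
        # Constant-quality mode: libvpx-vp9 requires -b:v 0 together with -crf.
        args += ["-crf", str(crf), "-b:v", "0"]
    elif bitrate is not None:
        # Bitrate-targeted (VBR) mode.
        args += ["-b:v", bitrate]
    return args
```

For example, `build_encoder_args("libvpx-vp9", crf=32)` yields `["-c:v", "libvpx-vp9", "-crf", "32", "-b:v", "0"]`, which would be spliced into an `ffmpeg -i <clip> ... <out>.mp4` command.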

Confidence Score: 4/5

The core transcoding, metadata, and motion-vector paths are all covered by new unit tests; the three cherry-picked changes are logically coherent and the Docker/PyAV wiring is sound. Two narrow edge cases in error handling could mislead users but won't cause data loss or silent incorrect results.

The libopenh264 availability probe uses check=False but never inspects the return code — a broken ffmpeg binary (missing .so, bad permissions) would produce empty stdout and raise a 'codec not found' error that points the user to reinstall the codec rather than fix the real problem. Similarly, _resolve_export_mvs_flag would surface a bare AttributeError with no PyAV context if a third flag name appears in a future release. Neither path affects the default VP9 or NVENC flows.

clip_extraction_stages.py (_verify_libopenh264_available error masking) and motion_vector_backend.py (_resolve_export_mvs_flag AttributeError fallback) are worth a second look before merge.

Important Files Changed

| Filename | Overview |
|---|---|
| nemo_curator/stages/video/clipping/clip_extraction_stages.py | Adds libvpx-vp9 encoder path, validates hwaccel/encoder combo in setup(), probes for libopenh264 at setup time. Minor issue: check=False in the ffmpeg probe subprocess can mask unrelated ffmpeg failures with a misleading 'libopenh264 not found' error. |
| nemo_curator/stages/video/filtering/motion_vector_backend.py | Adds _resolve_export_mvs_flag() to handle PyAV 13→15 rename of EXPORT_MVS. Falls back to bare AttributeError if neither name exists, which would be cryptic at runtime. |
| nemo_curator/utils/decoder_utils.py | Adds SoftwareCodecMissingError with MP4 FOURCC header sniff and NVDEC failure signal detection; raises a targeted error pointing at install_h264_support.sh. Logic is sound. |
| nemo_curator/stages/video/io/video_reader.py | Adds a specific catch for SoftwareCodecMissingError logged at ERROR level (vs WARNING for generic exceptions). Clean and correctly placed. |
| docker/common/install_ffmpeg.sh | Refactored from tarball download to git clone with --disable-everything + explicit allowlist; switched to shared libs for PyAV. Cleanup at end of script is present. |
| docker/common/install_h264_support.sh | New opt-in script to recompile FFmpeg with software h264/hevc/av1 decoders and optional libopenh264 encoder. Mirrors install_ffmpeg.sh configure flags exactly, includes skip for existing nv-codec-headers, and cleans up. |
| docker/Dockerfile | Adds PKG_CONFIG_PATH for PyAV to discover system FFmpeg; forces --no-binary-package av so PyAV source-builds against the container's libav*. |
| pyproject.toml | Bumps PyAV from 13.1.0 to 15.1.0; adds comment explaining why --no-binary-package lives in the Dockerfile rather than here. |
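The `decoder_utils.py` row mentions an MP4 FOURCC header sniff. The PR's implementation is not shown here, so this is a hypothetical sketch of the idea: confirm the buffer starts with an ISO BMFF `ftyp` box, then read the FOURCC of the first sample entry in each `stsd` box until a known video codec turns up. The function names and the codec map are assumptions. A header-only scan misses files whose `moov` box sits after the media data (non-faststart MP4s), so a `None` result should be read as "unknown", not "not H.264".

```python
from typing import Dict, Optional

# Sample-entry FOURCCs as they appear inside an MP4 `stsd` box (illustrative subset).
_FOURCC_TO_CODEC: Dict[bytes, str] = {
    b"avc1": "h264",
    b"avc3": "h264",
    b"hvc1": "hevc",
    b"hev1": "hevc",
    b"av01": "av1",
    b"vp09": "vp9",
}


def sniff_mp4_video_codec(head: bytes) -> Optional[str]:
    """Guess the video codec from the first bytes of an MP4 file; None if unknown."""
    if head[4:8] != b"ftyp":  # MP4/ISO BMFF files normally start with an `ftyp` box
        return None
    start = 0
    while True:
        i = head.find(b"stsd", start)
        if i == -1:
            return None
        # After the `stsd` type: version/flags (4 bytes), entry_count (4 bytes),
        # then the first sample entry's size (4 bytes) and FOURCC (4 bytes).
        codec = _FOURCC_TO_CODEC.get(head[i + 16 : i + 20])
        if codec is not None:
            return codec
        start = i + 4  # keep scanning: audio tracks carry their own `stsd`


def sniff_mp4_file(path: str, max_bytes: int = 1 << 20) -> Optional[str]:
    """Read at most max_bytes from the start of the file and sniff them."""
    with open(path, "rb") as f:
        return sniff_mp4_video_codec(f.read(max_bytes))
```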

Reviews (1): Last reviewed commit: "fix: update motion vector export flag ha..."

Comment on lines +100 to +126
"""Probe the local FFmpeg build for libopenh264 support."""
ffmpeg_bin = shutil.which("ffmpeg")
if ffmpeg_bin is None:
    error_msg = (
        "Could not find `ffmpeg` on PATH while verifying libopenh264 support. "
        f"Install FFmpeg and ensure it is on PATH. See {_BYO_H264_DOCS_URL}"
    )
    raise RuntimeError(error_msg)
try:
    result = subprocess.run(  # noqa: S603
        [ffmpeg_bin, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
except subprocess.TimeoutExpired as e:
    error_msg = f"`ffmpeg -encoders` timed out while verifying libopenh264 support. See {_BYO_H264_DOCS_URL}"
    raise RuntimeError(error_msg) from e
if "libopenh264" not in result.stdout:
    error_msg = (
        "encoder='libopenh264' was requested but the local FFmpeg build "
        "does not include it. Curator does not ship libopenh264 due to "
        "its patent-license redistribution model. To enable it, install "
        f"a libopenh264-enabled FFmpeg yourself — see {_BYO_H264_DOCS_URL}"
    )
    raise RuntimeError(error_msg)

P2 Broken ffmpeg masks "libopenh264 not found" for unrelated failures

subprocess.run(..., check=False) means any non-zero exit from ffmpeg — e.g., a missing shared library at container boot, a corrupted binary, or a permission error — leaves result.stdout empty or truncated. The subsequent if "libopenh264" not in result.stdout check then unconditionally raises a RuntimeError blaming a missing libopenh264 codec, when the real failure is something entirely different. A user following the linked docs to reinstall the codec would not fix the problem.
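A minimal sketch of the fix this comment points toward, assuming the probe keeps `check=False`: inspect `returncode` first and surface ffmpeg's own stderr before concluding that the encoder is missing. `check_encoder_listing`, `verify_encoder_available`, and the `DOCS_URL` placeholder are hypothetical names, not Curator code. Splitting the decision out of the subprocess call keeps it unit-testable without an ffmpeg binary.

```python
import shutil
import subprocess

DOCS_URL = "<BYO H.264 docs URL>"  # hypothetical stand-in for _BYO_H264_DOCS_URL


def check_encoder_listing(returncode: int, stdout: str, stderr: str, encoder: str = "libopenh264") -> None:
    """Raise RuntimeError, distinguishing a broken ffmpeg from a missing encoder."""
    if returncode != 0:
        # ffmpeg itself failed (missing .so, bad permissions, ...): report that,
        # with the tail of its stderr, instead of blaming the encoder.
        detail = stderr.strip()[-500:] or "<no stderr>"
        raise RuntimeError(f"`ffmpeg -encoders` exited with code {returncode}: {detail}")
    if encoder not in stdout:
        raise RuntimeError(
            f"encoder={encoder!r} was requested but the local FFmpeg build "
            f"does not include it. See {DOCS_URL}"
        )


def verify_encoder_available(encoder: str = "libopenh264") -> None:
    """Run `ffmpeg -hide_banner -encoders` and validate the result."""
    ffmpeg_bin = shutil.which("ffmpeg")
    if ffmpeg_bin is None:
        raise RuntimeError(f"Could not find `ffmpeg` on PATH. See {DOCS_URL}")
    result = subprocess.run(  # noqa: S603
        [ffmpeg_bin, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    check_encoder_listing(result.returncode, result.stdout, result.stderr, encoder)
```

With this split, a dynamic-loader failure (typically exit code 127 plus an "error while loading shared libraries" message) is reported as such rather than as a missing codec.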

Comment on lines 28 to +40
_MIN_SIDE_RESOLUTION = 256


def _resolve_export_mvs_flag() -> int:
    """Return the EXPORT_MVS bitflag, accepting either the PyAV >=15 lowercase
    name (``export_mvs``) or the PyAV <=13 uppercase name (``EXPORT_MVS``).

    The enum member was renamed between PyAV 13 and 15. Tests for both branches
    pin this contract so a future PyAV bump that renames it again surfaces as
    a failed unit test rather than silently zero motion vectors at runtime.
    """
    flags2 = av.codec.context.Flags2
    flag = getattr(flags2, "export_mvs", None)

P2 Bare AttributeError if neither flag name exists

If a future PyAV release renames the flag a third time, getattr(flags2, "export_mvs", None) returns None and flags2.EXPORT_MVS raises AttributeError: type object 'Flags2' has no attribute 'EXPORT_MVS'. That message names neither PyAV nor the installed version. The docstring says "surfaces as a failed unit test", but a runtime hit (e.g., in a worker that isn't under test) would produce an opaque crash. A try/except AttributeError with a message pointing to the PyAV version would make the failure self-diagnosable.
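One way to apply the suggestion, sketched under assumptions: try both known spellings in order and, if neither exists, raise an error that names PyAV and its version. Here `flags2` is a parameter standing in for `av.codec.context.Flags2` so the logic can be tested without PyAV installed; the function name and message wording are hypothetical.

```python
from typing import Any


def resolve_export_mvs_flag(flags2: Any, pyav_version: str = "unknown") -> int:
    """Return the EXPORT_MVS bit under either PyAV spelling, or fail descriptively.

    `flags2` stands in for `av.codec.context.Flags2` so this runs without PyAV.
    """
    # PyAV >=15 uses the lowercase member name; PyAV <=13 used the uppercase one.
    for name in ("export_mvs", "EXPORT_MVS"):
        flag = getattr(flags2, name, None)
        if flag is not None:
            return int(flag)
    raise AttributeError(
        f"PyAV {pyav_version}: Flags2 exposes neither 'export_mvs' nor 'EXPORT_MVS'. "
        "The flag may have been renamed again; motion vectors cannot be exported "
        "until the lookup is updated for this PyAV version."
    )
```

A call site might then read `resolve_export_mvs_flag(av.codec.context.Flags2, av.__version__)`, assuming the installed PyAV exposes `av.__version__`. Raising `AttributeError` rather than a new exception type keeps any existing `except AttributeError` handling working while making the message self-explanatory.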


@jgerh jgerh left a comment


Completed tech pubs review and provided a few copyedits/suggested text revisions for clarity.

|----------|-------------|-----|
| **Local Development** | Minimum specs listed above | Continue below |
| **Production Clusters** | Detailed hardware, network, storage specs | [Deployment Requirements](deployment/requirements.md) |
| **Multi-node Setup** | Advanced infrastructure planning | [Deployment Options](deployment/index.md) |

Deployment Options links to prerequisites, not multi-node setup or advanced infrastructure planning. However, both topics are documented — multi-node setup lives under docs/admin/deployment/slurm/, and advanced infrastructure planning lives under docs/reference/infrastructure/. Neither is surfaced from the main /admin/deployment landing page.

Choose one of the following installation methods based on your needs:

:::{tip}
**Docker is the recommended installation method** for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, avoiding manual dependency setup. Refer to the [Container Installation](#container-installation) tab below.

Suggested change
**Docker is the recommended installation method** for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, eliminating the need for manual dependency setup. Refer to the [Container Installation](#container-installation) tab below.

- **H.264 inputs in CPU-only pipeline stages.** `VideoReader` and `ClipWriter` invoke `ffprobe` from CPU-only Ray actors that can't see the GPU; they need a software `h264`/`hevc`/`av1` decoder to extract metadata. Without it you'll get a `SoftwareCodecMissingError` pointing back here.
- **H.264 software encoding** (for example, on GPUs without an NVENC encoder block such as A100 or H100, when VP9 isn't acceptable).

#### Option 1: Run the bundled installer inside the container (Recommended)

Suggested change
#### Option 1: Run the bundled installer inside the container (Recommended)
#### Option 1: Run the Bundled Installer Inside the Container (Recommended)


The build takes ~5–10 minutes, replaces `/usr/local/bin/{ffmpeg,ffprobe}` in place, and pins to the same FFmpeg tag as the image build. Script source: [docker/common/install_h264_support.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_h264_support.sh).
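After the installer finishes, one way to confirm that the rebuilt `ffmpeg` lists the decoders named in the docs (`h264`, `hevc`, `av1`) is to parse `ffmpeg -hide_banner -decoders`. This sketch assumes the usual listing layout (a flag legend, a dashed separator, then one `<flags> <name> <description>` row per codec); the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Iterable, Set


def parse_codec_names(listing: str) -> Set[str]:
    """Extract codec names from `ffmpeg -hide_banner -decoders` or `-encoders` output."""
    names: Set[str] = set()
    in_table = False
    for line in listing.splitlines():
        stripped = line.strip()
        if stripped.startswith("------"):
            in_table = True  # everything above the dashed line is the flag legend
            continue
        parts = stripped.split()
        if in_table and len(parts) >= 2:
            names.add(parts[1])  # parts[0] is the capability flags, parts[1] the name
    return names


def missing_decoders(required: Iterable[str] = ("h264", "hevc", "av1")) -> Set[str]:
    """Return the required decoder names that the ffmpeg on PATH does not list."""
    ffmpeg_bin = shutil.which("ffmpeg")
    if ffmpeg_bin is None:
        raise RuntimeError("ffmpeg is not on PATH")
    result = subprocess.run(  # noqa: S603
        [ffmpeg_bin, "-hide_banner", "-decoders"],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return set(required) - parse_codec_names(result.stdout)
```

An empty set from `missing_decoders()` suggests the rebuild picked up all three. The same parser applied to `-encoders` output can confirm `libopenh264` after a `--with-libopenh264` build.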

License notice: the default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution.

Suggested change
License notice: the default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution.
License notice: The default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution.


## Troubleshooting

- "Encoder not found": Your `ffmpeg` build may lack the encoder; verify with `ffmpeg -encoders`.

Suggested change
- `Encoder not found`: Your `ffmpeg` build may lack the encoder; verify with `ffmpeg -encoders`.

- "No NVENC capable devices found": Install NVIDIA drivers/CUDA and ensure the GPU is visible in `nvidia-smi`.

Suggested change
- `No NVENC capable devices found`: Install NVIDIA drivers/CUDA and ensure the GPU is visible in `nvidia-smi`.

Comment thread docs/get-started/video.md
You can reuse the same `<MODEL_DIR>` across runs.
:::

2. No additional setup is required. The model will be downloaded automatically when first used.

Suggested change
2. Verify that `MODEL_DIR` is writable. No additional setup is required.
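A quick pre-flight for the suggested "verify that `MODEL_DIR` is writable" step, as a hypothetical Python helper: create the directory if needed and try to create a temporary file in it. Actually writing a file is a more reliable signal than `os.access`, whose documentation notes that I/O can still fail even when it reports success, particularly on network filesystems.

```python
import tempfile
from pathlib import Path


def model_dir_is_writable(model_dir: str) -> bool:
    """Return True if model_dir exists (or can be created) and accepts new files."""
    path = Path(model_dir).expanduser()
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating (and auto-deleting) a real file exercises the same permissions
        # a model download will need.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

Typical use would be `model_dir_is_writable(os.environ["MODEL_DIR"])` before launching the pipeline.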

@@ -1 +1 @@
# Getting Started with Video Curation

Suggested change
# Get Started with Video Curation

--fixed-stride-split-duration 10.0 \
--embedding-algorithm cosmos-embed1-224p
```
This example extends from the above example and adds an additional embedding stages using `cosmos-embed1-224p` model. Use `--model-dir "$MODEL_DIR"` if the model is predownloaded.

Suggested change
This example extends the example above and adds an embedding stage using the `cosmos-embed1-224p` model. Use `--model-dir "$MODEL_DIR"` if the model is already downloaded.

--verbose
```
This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `libopenh264` encoder, and enables verbose logging for more detailed output.
This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `h264_nvenc` encoder, and enables verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container.

Suggested change
This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `h264_nvenc` encoder, and enables verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container.
This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applying the Cosmos-Embed1 embedding model to each clip, transcoding the output using the `h264_nvenc` encoder, and enabling verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container.
