video: VP9 fallback + BYO H.264 + PyAV 15 bump #1999
* fix: remove libopenh264 and replace with libx264
  Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: install libx264
  Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: drop libx264 support
  Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Enhance video transcoding support by adding `libvpx-vp9` as a CPU fallback encoder alongside `h264_nvenc`. Update installation instructions and verification steps to reflect the new encoder options. Modify `ClipTranscodingStage` to validate encoder selection and handle encoding options for both encoders. Update relevant documentation and examples to guide users on using the new encoder.
  Signed-off-by: Ao Tang <aot@nvidia.com>
* Minor update
  Signed-off-by: Ao Tang <aot@nvidia.com>
* refine docs
  Signed-off-by: Ao Tang <aot@nvidia.com>
* refine docs
  Signed-off-by: Ao Tang <aot@nvidia.com>
* add back openlibh264 and inform user to install themself if needed
  Signed-off-by: Ao Tang <aot@nvidia.com>
* docs format
  Signed-off-by: Ao Tang <aot@nvidia.com>
---------
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update FFmpeg and video processing support
  - Upgrade PyAV dependency from version 13.1.0 to 15.1.0 in `pyproject.toml`.
  - Add support for software H.264/HEVC/AV1 decoders in the Curator container by introducing an opt-in script (`install_h264_support.sh`) that recompiles FFmpeg with the necessary decoders.
  - Enhance error handling in `VideoReaderStage` to log warnings when software codecs are missing, improving user feedback.
  - Update documentation to reflect changes in codec support and installation instructions for users needing software decoders.
  - Modify Dockerfile to ensure FFmpeg is discoverable by source-built Python dependencies.
  This commit aims to improve video processing capabilities and user experience when handling H.264/HEVC/AV1 inputs.
  Signed-off-by: Ao Tang <aot@nvidia.com>
* Enhance CI workflows with FFmpeg library installation
  - Added steps to install FFmpeg development libraries in both `cicd-main.yml` and `install-test.yml` workflows, ensuring necessary dependencies for source-built PyAV are available.
  - Improved error logging in `VideoReaderStage` by simplifying the warning message for missing software codecs.
  - Updated test cases in `test_decoder_utils.py` to handle codec-related error messages more accurately.
  These changes aim to streamline the video processing pipeline and improve error handling for codec issues.
  Signed-off-by: Ao Tang <aot@nvidia.com>
* Refactor FFmpeg installation in CI workflows and Dockerfile
  - Removed FFmpeg development library installation steps from `cicd-main.yml` and `install-test.yml` workflows to streamline CI processes.
  - Updated `Dockerfile` to enforce source-building of PyAV with the `--no-binary-package av` option, ensuring compatibility with the FFmpeg version used in the Docker image.
  - Added explanatory comments in `pyproject.toml` regarding the handling of PyAV dependencies in Docker.
  These changes aim to improve the build process and maintain consistency in dependency management across environments.
  Signed-off-by: Ao Tang <aot@nvidia.com>
* wording
  Signed-off-by: Ao Tang <aot@nvidia.com>
---------
Signed-off-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
* fix: update motion vector export flag handling in PyAV
  Signed-off-by: Ao Tang <aot@nvidia.com>
* ruff check
  Signed-off-by: Ao Tang <aot@nvidia.com>
* format
  Signed-off-by: Ao Tang <aot@nvidia.com>
* feat: implement _resolve_export_mvs_flag function for PyAV compatibility
  Added a new function to handle the EXPORT_MVS bitflag for different PyAV versions, ensuring compatibility and preventing silent failures in motion vector retrieval. Updated the motion vector decoding logic to utilize this new function. Added unit tests to verify the correct behavior of the flag resolution.
  Signed-off-by: Ao Tang <aot@nvidia.com>
---------
Signed-off-by: Ao Tang <aot@nvidia.com>
/ok to test 328033c
Greptile Summary
This PR cherry-picks three
Confidence Score: 4/5
The core transcoding, metadata, and motion-vector paths are all covered by new unit tests; the three cherry-picked changes are logically coherent and the Docker/PyAV wiring is sound. Two narrow edge cases in error handling could mislead users but won't cause data loss or silent incorrect results.
The libopenh264 availability probe uses check=False but never inspects the return code: a broken ffmpeg binary (missing .so, bad permissions) would produce empty stdout and raise a 'codec not found' error that points the user to reinstall the codec rather than fix the real problem. Similarly, _resolve_export_mvs_flag would surface a bare AttributeError with no PyAV context if a third flag name appears in a future release. Neither path affects the default VP9 or NVENC flows.
clip_extraction_stages.py (_verify_libopenh264_available error masking) and motion_vector_backend.py (_resolve_export_mvs_flag AttributeError fallback) are worth a second look before merge.
Important Files Changed
Reviews (1): Last reviewed commit: "fix: update motion vector export flag ha..."
| """Probe the local FFmpeg build for libopenh264 support.""" | ||
| ffmpeg_bin = shutil.which("ffmpeg") | ||
| if ffmpeg_bin is None: | ||
| error_msg = ( | ||
| "Could not find `ffmpeg` on PATH while verifying libopenh264 support. " | ||
| f"Install FFmpeg and ensure it is on PATH. See {_BYO_H264_DOCS_URL}" | ||
| ) | ||
| raise RuntimeError(error_msg) | ||
| try: | ||
| result = subprocess.run( # noqa: S603 | ||
| [ffmpeg_bin, "-hide_banner", "-encoders"], | ||
| capture_output=True, | ||
| text=True, | ||
| check=False, | ||
| timeout=10, | ||
| ) | ||
| except subprocess.TimeoutExpired as e: | ||
| error_msg = f"`ffmpeg -encoders` timed out while verifying libopenh264 support. See {_BYO_H264_DOCS_URL}" | ||
| raise RuntimeError(error_msg) from e | ||
| if "libopenh264" not in result.stdout: | ||
| error_msg = ( | ||
| "encoder='libopenh264' was requested but the local FFmpeg build " | ||
| "does not include it. Curator does not ship libopenh264 due to " | ||
| "its patent-license redistribution model. To enable it, install " | ||
| f"a libopenh264-enabled FFmpeg yourself — see {_BYO_H264_DOCS_URL}" | ||
| ) | ||
| raise RuntimeError(error_msg) |
Broken ffmpeg masks "libopenh264 not found" for unrelated failures
subprocess.run(..., check=False) means any non-zero exit from ffmpeg — e.g., a missing shared library at container boot, a corrupted binary, or a permission error — leaves result.stdout empty or truncated. The subsequent if "libopenh264" not in result.stdout check then unconditionally raises a RuntimeError blaming a missing libopenh264 codec, when the real failure is something entirely different. A user following the linked docs to reinstall the codec would not fix the problem.
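One possible shape for the fix, sketched under assumptions: check `returncode` before looking at `stdout`, so that a failed probe is reported as a broken `ffmpeg` rather than as a missing codec. The helper name `interpret_encoder_probe` and the message wording are hypothetical and not taken from the PR.

```python
import subprocess


def interpret_encoder_probe(result: subprocess.CompletedProcess, encoder: str) -> None:
    """Raise a cause-specific RuntimeError; return None if `encoder` is listed.

    Hypothetical helper. `result` is assumed to come from running
    `ffmpeg -hide_banner -encoders` with capture_output=True, text=True,
    check=False, as in the excerpt under review.
    """
    if result.returncode != 0:
        # The probe itself failed (missing shared library, corrupted binary, ...),
        # so stdout says nothing reliable about which encoders exist.
        stderr_tail = (result.stderr or "").strip()[-500:]
        raise RuntimeError(
            f"`ffmpeg -encoders` failed with exit code {result.returncode}; "
            f"the ffmpeg binary itself appears to be broken. stderr: {stderr_tail!r}"
        )
    if encoder not in (result.stdout or ""):
        # Only reached when the probe succeeded, so this message is trustworthy.
        raise RuntimeError(
            f"encoder={encoder!r} was requested but the local FFmpeg build "
            "does not include it."
        )
```

Note that a binary that cannot be executed at all (for example, a permission error) makes `subprocess.run` raise an `OSError` subclass before any `CompletedProcess` exists, so that case would need its own `except OSError` around the call.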
| _MIN_SIDE_RESOLUTION = 256 | ||
|
|
||
|
|
||
| def _resolve_export_mvs_flag() -> int: | ||
| """Return the EXPORT_MVS bitflag, accepting either the PyAV >=15 lowercase | ||
| name (``export_mvs``) or the PyAV <=13 uppercase name (``EXPORT_MVS``). | ||
|
|
||
| The enum member was renamed between PyAV 13 and 15. Tests for both branches | ||
| pin this contract so a future PyAV bump that renames it again surfaces as | ||
| a failed unit test rather than silently zero motion vectors at runtime. | ||
| """ | ||
| flags2 = av.codec.context.Flags2 | ||
| flag = getattr(flags2, "export_mvs", None) |
Bare AttributeError if neither flag name exists
If a future PyAV release renames the flag a third time, getattr(flags2, "export_mvs", None) returns None and flags2.EXPORT_MVS raises AttributeError: type object 'Flags2' has no attribute 'EXPORT_MVS' — no stack context, no mention of PyAV, no version hint. The docstring says "surfaces as a failed unit test", but a runtime hit (e.g., in a worker that isn't under test) would produce a completely opaque crash. A try/except AttributeError with a message pointing to the PyAV version would make the failure self-diagnosable.
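A minimal sketch of the self-diagnosing fallback described above. It assumes the flag can be looked up by name on the enum class; `resolve_flag`, its parameters, and the error text are hypothetical and not the PR's implementation.

```python
from typing import Any, Sequence

_MISSING = object()  # sentinel, so a legitimately falsy flag value is not mistaken for "absent"


def resolve_flag(flags_enum: Any, names: Sequence[str], library: str, version: str) -> int:
    """Return the integer value of the first attribute in `names` found on `flags_enum`.

    Hypothetical helper: raises a RuntimeError that names the library and its
    version instead of letting a bare AttributeError escape.
    """
    for name in names:
        member = getattr(flags_enum, name, _MISSING)
        if member is not _MISSING:
            # Enum-style members expose `.value`; plain ints are used as-is.
            return int(getattr(member, "value", member))
    owner = getattr(flags_enum, "__name__", repr(flags_enum))
    raise RuntimeError(
        f"None of {list(names)} exist on {owner} in {library} {version}; "
        "the flag may have been renamed again."
    )
```

In the stage under review, the call would presumably look like `resolve_flag(av.codec.context.Flags2, ("export_mvs", "EXPORT_MVS"), "PyAV", av.__version__)`. Whether an integer is the right return type for every supported PyAV version is an assumption to verify.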
jgerh left a comment
Completed tech pubs review and provided a few copyedits/suggested text revisions for clarity.
| |----------|-------------|-----| | ||
| | **Local Development** | Minimum specs listed above | Continue below | | ||
| | **Production Clusters** | Detailed hardware, network, storage specs | [Deployment Requirements](deployment/requirements.md) | | ||
| | **Multi-node Setup** | Advanced infrastructure planning | [Deployment Options](deployment/index.md) | |
Deployment Options links to prerequisites, not multi-node setup or advanced infrastructure planning. However, both topics are documented — multi-node setup lives under docs/admin/deployment/slurm/, and advanced infrastructure planning lives under docs/reference/infrastructure/. Neither is surfaced from the main /admin/deployment landing page.
| Choose one of the following installation methods based on your needs: | ||
|
|
||
| :::{tip} | ||
| **Docker is the recommended installation method** for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, avoiding manual dependency setup. Refer to the [Container Installation](#container-installation) tab below. |
| **Docker is the recommended installation method** for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, eliminating the need for manual dependency setup. Refer to the [Container Installation](#container-installation) tab below. |
| - **H.264 inputs in CPU-only pipeline stages.** `VideoReader` and `ClipWriter` invoke `ffprobe` from CPU-only Ray actors that can't see the GPU; they need a software `h264`/`hevc`/`av1` decoder to extract metadata. Without it you'll get a `SoftwareCodecMissingError` pointing back here. | ||
| - **H.264 software encoding** (for example, on GPUs without an NVENC encoder block such as A100 or H100, when VP9 isn't acceptable). | ||
|
|
||
| #### Option 1: Run the bundled installer inside the container (Recommended) |
| #### Option 1: Run the bundled installer inside the container (Recommended) | |
| #### Option 1: Run the Bundled Installer Inside the Container (Recommended) |
|
|
||
| The build takes ~5–10 minutes, replaces `/usr/local/bin/{ffmpeg,ffprobe}` in place, and pins to the same FFmpeg tag as the image build. Script source: [docker/common/install_h264_support.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_h264_support.sh). | ||
|
|
||
| License notice: the default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution. |
| License notice: the default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution. | |
| License notice: The default mode adds only FFmpeg-internal decoders (LGPL). With `--with-libopenh264` the binary additionally links Cisco's OpenH264 (BSD-2-Clause + Cisco-distributed binary license — see https://www.openh264.org/BINARY_LICENSE.txt). You are responsible for any license obligations the resulting binaries impose on your distribution. |
|
|
||
| ## Troubleshooting | ||
|
|
||
| - "Encoder not found": Your `ffmpeg` build may lack the encoder; verify with `ffmpeg -encoders`. |
| - `Encoder not found`: Your `ffmpeg` build may lack the encoder; verify with `ffmpeg -encoders`. |
| ## Troubleshooting | ||
|
|
||
| - "Encoder not found": Your `ffmpeg` build may lack the encoder; verify with `ffmpeg -encoders`. | ||
| - "No NVENC capable devices found": Install NVIDIA drivers/CUDA and ensure the GPU is visible in `nvidia-smi`. |
| - `No NVENC capable devices found`: Install NVIDIA drivers/CUDA and ensure the GPU is visible in `nvidia-smi`. |
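For the `ffmpeg -encoders` check in the first bullet, the listing can be parsed into exact encoder names instead of scanned for a substring. The sketch below assumes the usual layout of that listing (a legend, a `------` separator, then one `flags name description` row per encoder). `parse_ffmpeg_encoders` is a hypothetical helper, and `SAMPLE_OUTPUT` is an abridged, illustrative listing rather than captured output.

```python
def parse_ffmpeg_encoders(output: str) -> set[str]:
    """Return the encoder names listed in the text of `ffmpeg -hide_banner -encoders`."""
    names: set[str] = set()
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if not in_table:
            # Everything before the dashed separator is the capability legend.
            in_table = stripped.startswith("------")
            continue
        parts = stripped.split(None, 2)  # flags, name, description
        if len(parts) >= 2:
            names.add(parts[1])
    return names


# Abridged, illustrative listing (not real captured output).
SAMPLE_OUTPUT = """Encoders:
 V..... = Video
 A..... = Audio
 ------
 V....D libvpx-vp9           libvpx VP9 (codec vp9)
 V....D h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
"""

available = parse_ffmpeg_encoders(SAMPLE_OUTPUT)
print("libvpx-vp9" in available, "libopenh264" in available)
```

Matching on exact names avoids false positives such as `h264` matching `h264_nvenc`. A listed `h264_nvenc` only shows that the encoder was compiled in; the GPU may still lack NVENC hardware, which is the case the second bullet covers.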
| You can reuse the same `<MODEL_DIR>` across runs. | ||
| ::: | ||
|
|
||
| 2. No additional setup is required. The model will be downloaded automatically when first used. |
| 2. Verify that `MODEL_DIR` is writable. No additional setup is required. |
| @@ -1 +1 @@ | |||
| # Getting Started with Video Curation | |||
| # Get Started with Video Curation |
| --fixed-stride-split-duration 10.0 \ | ||
| --embedding-algorithm cosmos-embed1-224p | ||
| ``` | ||
| This example extends from the above example and adds an additional embedding stages using `cosmos-embed1-224p` model. Use `--model-dir "$MODEL_DIR"` if the model is predownloaded. |
| This example extends the example above and adds an embedding stage using the `cosmos-embed1-224p` model. Use `--model-dir "$MODEL_DIR"` if the model is already downloaded. |
| --verbose | ||
| ``` | ||
| This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `libopenh264` encoder, and enables verbose logging for more detailed output. | ||
| This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `h264_nvenc` encoder, and enables verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container. |
| This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applies the Cosmos-Embed1 embedding model to each clip, transcodes the output using the `h264_nvenc` encoder, and enables verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container. | |
| This example demonstrates a more advanced workflow than the minimal example by using scene-aware splitting with the TransNetV2 algorithm (which detects scene boundaries instead of fixed intervals), applying the Cosmos-Embed1 embedding model to each clip, transcoding the output using the `h264_nvenc` encoder, and enabling verbose logging for more detailed output. On GPUs without NVENC (such as A100/H100), pass `--transcode-encoder libvpx-vp9` instead — VP9 is a royalty-free CPU encoder that produces clips in the same `.mp4` container. |
Summary
Unifies three video-encoder PRs that previously landed on `r1.2.0` into a single change targeting `main`. Each PR is cherry-picked from its `r1.2.0` squash commit; the branch carries the same three commits in order.
`r1.2.0` squash commits:
- `79cacb469`: Add `libvpx-vp9` as the CPU fallback encoder alongside `h264_nvenc`; update `ClipTranscodingStage` validation, docs, and tutorials.
- `8055f35e2`: Add an opt-in software-decoder script (`install_h264_support.sh`), refactor `install_ffmpeg.sh`, bump PyAV `13.1.0 → 15.1.0`, expand `decoder_utils`, update CI/Dockerfile.
- `f07fa0e19`: Add `_resolve_export_mvs_flag` to handle PyAV's `export_mvs` flag rename across versions; covers the API change introduced by the PyAV 15 bump in #1959.
Cherry-pick notes
[…] `main`.
- `docker/Dockerfile`: kept `main`'s `--python /usr/bin/python3.13` venv pin and added the PR's `PKG_CONFIG_PATH=/usr/local/lib/pkgconfig` env so source-built PyAV links against the system FFmpeg.
- `uv.lock`: discarded the textual merge and regenerated via `uv lock`. The only net change vs. `main`'s lock is `av 13.1.0 → 15.1.0`.