Add YOLO26 pose inference support #6
Open
Tar-ive wants to merge 1 commit into
Summary
This PR adds the minimum YOLO26 pose inference path needed to run yolo26n-pose in the pure MLX runtime. This is part of the support we need for our YOLO pose model in the YOLO26 MLX Build Challenge: we need to run a camera-facing pose model on Apple Silicon through MLX, then use the decoded keypoints in an alarm/gesture application.
The change is inference-only. It does not add pose training or pose loss support.
What changed
- yolo26-pose.yaml config for the nano pose model path.
- Pose26, matching the YOLO26 pose head layout with separate keypoint and sigma branches.
- Pose26 in model parsing/imports so the YAML can build the model graph.
- *-pose models select the pose YAML.
- cv4_kpts, cv4_sigma, one2one_cv4_kpts, one2one_cv4_sigma
- Boxes and Keypoints.
Implementation steps
- Detect/Pose conventions already in this repo.
- Pose26 in the task parser.
- yolo26n-pose.pt weights load into the MLX graph.
- Keypoints alongside boxes.
- .npz loading and checked that pose-specific keys were not missing.
Local verification
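The check that pose-specific keys are present in a converted .npz file could look roughly like the sketch below. It is not the code in this PR: the helper names `missing_pose_branches` and `check_npz` are hypothetical, and it assumes weight keys are dotted paths (for example `model.23.cv4_kpts.0.weight`, an illustrative name only) in which the branch name appears as one path component.

```python
import numpy as np

# Branch names taken from the PR description.
POSE_BRANCHES = ("cv4_kpts", "cv4_sigma", "one2one_cv4_kpts", "one2one_cv4_sigma")


def missing_pose_branches(keys, branches=POSE_BRANCHES):
    """Return the branch names that no key contains as a dotted path component.

    Matching on whole components (not substrings) keeps "one2one_cv4_kpts"
    from being counted as "cv4_kpts".
    """
    components = [key.split(".") for key in keys]
    return [b for b in branches if not any(b in parts for parts in components)]


def check_npz(path):
    """Hypothetical helper: list the pose branches absent from an .npz archive."""
    with np.load(path) as archive:
        # NpzFile.files holds the array names stored in the archive.
        return missing_pose_branches(archive.files)
```

An empty list from `check_npz` would mean every pose branch has at least one tensor in the archive. It says nothing about shapes or values, so it complements rather than replaces an actual load into the MLX graph.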
Commands run locally on this branch:
Results:
I also ran the full test suite. It has one pre-existing segmentation metric assertion unrelated to this PR:
MLX vs PyTorch pose comparison
I benchmarked this on my development laptop (MacBook M3 Pro, 18GB RAM) using the same pattern as the existing inference benchmark scripts: warmup runs, timed end-to-end model.predict, calculate_stats, speedup calculation, device info, and JSON-style result output.
Dataset used for this local comparison:
Weights:
Timing summary across all 150 frames:
Pose parity summary across all 150 frames:
Key 6_7 gesture joints (part of the hackathon project), mean pixel difference vs PyTorch MPS:
This confirms that the MLX pose decode is landing keypoints on the same body parts as the PyTorch model, with small pixel-level differences across the full local 150-frame sample.
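For reference, a per-joint mean pixel difference of this kind can be computed along the following lines. This is a minimal sketch and not the benchmark script itself: the function name `mean_pixel_diff_per_joint` is hypothetical, and it assumes both backends' keypoints have already been matched to the same person and stacked into arrays of shape (frames, joints, 2) in pixel coordinates.

```python
import numpy as np


def mean_pixel_diff_per_joint(kpts_a, kpts_b):
    """Mean Euclidean pixel distance per joint between two keypoint sets.

    Both inputs must have shape (frames, joints, 2) holding (x, y) pixels.
    Returns an array of shape (joints,).
    """
    a = np.asarray(kpts_a, dtype=np.float64)
    b = np.asarray(kpts_b, dtype=np.float64)
    if a.shape != b.shape or a.ndim != 3 or a.shape[-1] != 2:
        raise ValueError("expected two arrays of shape (frames, joints, 2)")
    # Distance per frame and joint, then average over the frame axis.
    distances = np.linalg.norm(a - b, axis=-1)
    return distances.mean(axis=0)
```

Restricting the comparison to selected joints is then a matter of indexing the result, for example `mean_pixel_diff_per_joint(mlx_kpts, torch_kpts)[joint_indices]`, where `mlx_kpts`, `torch_kpts`, and `joint_indices` are placeholders for whatever the benchmark collects.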