
Commit 7ce7adf

chtruong814 and ko3n1g authored
ci: Add initial GHA (#1)
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
1 parent c43a4a5 commit 7ce7adf

20 files changed

Lines changed: 923 additions & 1 deletion
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is.

**Steps/Code to reproduce bug**

Please list *minimal* steps or a code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment overview (please complete the following information)**

- Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)]
- Method of install: [pip install or from source]. Please specify the exact commands you used to install.
- If method of install is [Docker], provide the `docker pull` & `docker run` commands used

**Environment details**

If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version

**Additional context**

Add any other context about the problem here.
Example: GPU model
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: feature request
assignees: ''

---

**Is your feature request related to a problem? Please describe.**

A clear and concise description of what the problem is. E.g., I'm always frustrated when [...]

**Describe the solution you'd like**

A clear and concise description of what you want to happen.
Provide a code snippet showing how the new APIs/changes would be used by others.

**Describe alternatives you've considered**

A clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Add any other context or screenshots about the feature request here.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# What does this PR do?

Add a one-line overview of what this PR aims to accomplish.

# Changelog
- Please update the [CHANGELOG.md](/CHANGELOG.md) under the next version with high-level changes in this PR.

# Usage
* You can optionally add a usage example below

```python
# Add a code snippet demonstrating how to use this
```

# Before your PR is "Ready for review"
**Pre-checks**:
- [ ] Make sure you read and followed the [Contributor guidelines](/CONTRIBUTING.md)
- [ ] Did you write any necessary new tests?
- [ ] Did you add or update any necessary documentation? Make sure to also update the [NeMo Framework User Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/index.html), which contains the tutorials.

# Checklist when contributing
- [ ] TBD

# Additional Information
* Related to # (issue)

.github/labeler.yml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
documentation:
  - docs/**

CI:
  - .github/**/*

.github/workflows/_run_test.yml

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: ~test template

on:
  workflow_call:
    inputs:
      RUNNER:
        type: string
        description: Runner to use for test
        required: true
      TIMEOUT:
        type: number
        description: Max runtime of test in minutes
        required: false
        default: 10
      SCRIPT:
        type: string
        description: Test script to execute
        required: true
      AFTER_SCRIPT:
        type: string
        description: Script to run after main test
        required: false
        default: ":"
      IS_OPTIONAL:
        type: boolean
        description: If true, a failure of this test will not cancel the other tests
        required: false
        default: false
    outputs:
      conclusion:
        description: Conclusion of main test step
        value: ${{ jobs.main.outputs.conclusion }}
      log:
        description: Base64-encoded last 2000 bytes of the test step's stderr log
        value: ${{ jobs.main.outputs.log }}
jobs:

  main:
    runs-on: ${{ inputs.RUNNER }}
    outputs:
      conclusion: ${{ steps.main.conclusion }}
      log: ${{ steps.main.outputs.log }}
    steps:
      - name: Docker system cleanup
        run: |
          docker system prune -a --filter "until=48h" --force || true

      - name: Docker pull image
        run: |
          docker pull nemoci.azurecr.io/nemo__placeholder_container:${{ github.run_id }}

      - name: Start container
        run: |
          docker run --rm -d --name nemo_container_${{ github.run_id }} --runtime=nvidia --gpus all --shm-size=64g \
            --env TRANSFORMERS_OFFLINE=0 \
            --env HYDRA_FULL_ERROR=1 \
            --env HF_HOME=/home/TestData/_placeholder/hf_home \
            --env _PLACEHOLDER_CI_DIR=/home/TestData/_placeholder \
            --env _PLACEHOLDER_REPO_DIR=/opt/NeMo-_Placeholder \
            --volume /mnt/datadrive/TestData/_placeholder/checkpoints:/home/TestData/_placeholder/checkpoints:ro \
            --volume /mnt/datadrive/TestData/_placeholder/hf_home/hub:/home/TestData/_placeholder/hf_home/hub:ro \
            nemoci.azurecr.io/nemo__placeholder_container:${{ github.run_id }} \
            bash -c "sleep $(( ${{ inputs.TIMEOUT }} * 60 + 60 ))"

      - id: main
        name: Run main script
        timeout-minutes: ${{ inputs.TIMEOUT }}
        run: |
          # Print the host driver for debugging
          nvidia-smi
          mkdir -p ${{ github.run_id }}
          cd ${{ github.run_id }}/

          set +e
          (
            set -e

            cmd=$(cat <<"RUN_TEST_EOF"
          nvidia-smi
          # Sanity check the driver/cuda combo
          cudaCheck
          # In case git commands need to be run inside _Placeholder
          git config --global --add safe.directory $_PLACEHOLDER_REPO_DIR
          ${{ inputs.SCRIPT }}
          RUN_TEST_EOF
            )
            docker exec nemo_container_${{ github.run_id }} bash -eux -o pipefail -c "$cmd"
          ) 2> >(tee err.log)

          EXIT_CODE=$?

          echo "log=$(tail -c 2000 err.log | base64 -w 0)" >> "$GITHUB_OUTPUT"

          exit $EXIT_CODE

      - uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
        if: failure() && inputs.IS_OPTIONAL == false

      - name: after_script
        if: always() && inputs.AFTER_SCRIPT != ':'
        run: |
          cmd=$(cat <<"RUN_TEST_EOF"
          ${{ inputs.AFTER_SCRIPT }}
          RUN_TEST_EOF
          )
          docker exec nemo_container_${{ github.run_id }} bash -eux -o pipefail -c "$cmd"

      - name: Container shutdown
        if: always()
        run: |
          docker container stop nemo_container_${{ github.run_id }} || true
          docker container rm nemo_container_${{ github.run_id }} || true
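The "Run main script" step combines a few shell idioms: a quoted heredoc (`<<"RUN_TEST_EOF"`) so that variables passed with `docker run --env`, such as `_PLACEHOLDER_REPO_DIR`, are expanded by the shell inside the container rather than on the runner; `set +e` so a failing test does not abort the step before its log is recorded; and a base64-encoded tail of stderr written to `$GITHUB_OUTPUT` before exiting with the test's status. The following is a minimal local sketch of that flow, under stated assumptions: `docker exec` is replaced by a plain `bash` child, `GITHUB_OUTPUT` points at a temporary file, `-x` tracing is dropped to keep the captured log short, and the function name `run_step`, the value `/opt/repo`, and the failing test command are hypothetical. It also redirects stderr straight to `err.log` instead of `2> >(tee err.log)`: bash does not wait for `>(...)` process substitutions, so with `tee` the file could in principle still be incomplete when `tail` reads it.

```shell
#!/usr/bin/env bash
# Local sketch of the "Run main script" step from _run_test.yml.
# Assumptions: `docker exec` is replaced by a plain `bash` child, `-x` is
# dropped, and stderr goes straight to err.log (no asynchronous `tee`).

run_step() {
  local script_body=$1 cmd
  # Quoted delimiter: this shell leaves $_PLACEHOLDER_REPO_DIR untouched;
  # the child bash expands it from its own environment.
  cmd=$(cat <<"RUN_TEST_EOF"
echo "repo dir is $_PLACEHOLDER_REPO_DIR" >&2
RUN_TEST_EOF
  )
  cmd+=$'\n'"$script_body"

  # Disable errexit so a failing test cannot skip the log capture below.
  set +e
  (
    set -e
    # Stand-in for `docker exec <container> bash ...`; /opt/repo is hypothetical.
    _PLACEHOLDER_REPO_DIR=/opt/repo bash -eu -o pipefail -c "$cmd"
  ) 2> err.log
  local exit_code=$?

  # Keep the last 2000 bytes of stderr as a single base64 line
  # (tr -d '\n' plays the role of GNU `base64 -w 0`).
  echo "log=$(tail -c 2000 err.log | base64 | tr -d '\n')" >> "$GITHUB_OUTPUT"

  # The workflow ends with `exit $EXIT_CODE`; a function returns it instead.
  return "$exit_code"
}

# Usage: simulate the runner with a scratch directory and output file.
cd "$(mktemp -d)"
GITHUB_OUTPUT=$PWD/github_output
: > "$GITHUB_OUTPUT"

status=0
run_step 'echo "tests failed" >&2; exit 3' || status=$?

# A consumer of the `log` output would decode it like this.
encoded=$(sed -n 's/^log=//p' "$GITHUB_OUTPUT")
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "exit status: $status"
printf '%s\n' "$decoded"
```

Returning the saved status at the end, rather than letting `set -e` stop the step early, is what lets the step both fail visibly and still publish its log tail; the decoding lines assume a `base64` that accepts `-d`.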
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# name: Build, test, and publish a PyPi wheel (to testpypi)

# on:
#   push:
#     branches:
#       - main
#       - 'r**'

# defaults:
#   run:
#     shell: bash -x -e -u -o pipefail {0}

# jobs:
#   build-test-publish-wheel:
#     uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_build_test_publish_wheel.yml@v0.22.3
#     with:
#       dry-run: true
#       python-package: nemo__placeholder
#       python-version: "3.12"
#     secrets:
#       TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
#       TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
#       SLACK_WEBHOOK: ${{ secrets.SLACK_RELEASE_ENDPOINT }}
#       SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Create PR to main with cherry-pick from release

on:
  push:
    branches:
      - main

jobs:
  cherry-pick:
    uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_cherry_pick.yml@v0.22.7
    secrets:
      PAT: ${{ secrets.PAT }}
      SLACK_WEBHOOK_ADMIN: ${{ secrets.SLACK_WEBHOOK_ADMIN }}
      SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
