feat(core): allow guardian prompt overrides from model metadata by charley-oai · Pull Request #13915 · openai/codex

charley-oai · 2026-03-07T23:31:14Z

Summary

add a guardian-specific developer-instructions field to model metadata
teach core guardian prompt assembly to prefer the selected guardian model's override while keeping the JSON contract appended in code
update affected test fixtures and add coverage for the override path

Testing

cargo test -p codex-protocol openai_models::tests::model_info_defaults_availability_nux_to_none_when_omitted
cargo test -p codex-core guardian_subagent_config
cargo test -p codex-api models_client_hits_models_endpoint
cargo test -p codex-app-server --test all get_auth_status_no_auth

charley-oai · 2026-03-08T00:14:52Z

@codex review this

chatgpt-codex-connector · 2026-03-08T00:16:57Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Co-authored-by: Codex <noreply@openai.com>

nidhishgajjar · 2026-04-21T04:31:06Z

Orb Code Review (powered by GLM-4.7 on Orb Cloud)

Summary

This PR introduces the ability to override guardian prompts from model metadata, allowing different models to have custom guardian instructions instead of using a single hardcoded prompt. The implementation adds a new guardian_developer_instructions field to ModelInfo and modifies the guardian prompt assembly to prefer model-specific overrides when available.

Architecture

New Component:

Model metadata field:

pub struct ModelInfo {
    // ... existing fields
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub guardian_developer_instructions: Option<String>,
    // ... more fields
}

Guardian prompt assembly with override:

fn guardian_policy_prompt(prompt_override: Option<&str>) -> String {
    let prompt = prompt_override
        .unwrap_or(include_str!("guardian_prompt.md"))
        .trim_end();
    format!("{prompt}\n\n{}", guardian_output_contract_prompt())
}
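The fallback and trimming behavior of this function can be exercised in isolation. What follows is a minimal sketch, not the PR's code: `DEFAULT_GUARDIAN_PROMPT` stands in for `include_str!("guardian_prompt.md")`, and the body of `guardian_output_contract_prompt` is a placeholder, since neither the real prompt file nor the real contract text appears in this thread.

```rust
// Stand-in for include_str!("guardian_prompt.md"); the real file is not shown in this thread.
const DEFAULT_GUARDIAN_PROMPT: &str = "default guardian policy\n";

// Placeholder for the real output contract, which the PR keeps in code.
fn guardian_output_contract_prompt() -> &'static str {
    "Respond with JSON only."
}

// Same shape as the quoted function: prefer the override, fall back to the
// default, trim trailing whitespace, then append the contract.
fn guardian_policy_prompt(prompt_override: Option<&str>) -> String {
    let prompt = prompt_override
        .unwrap_or(DEFAULT_GUARDIAN_PROMPT)
        .trim_end();
    format!("{prompt}\n\n{}", guardian_output_contract_prompt())
}
```

With this shape, `Some("")` still counts as an override, so the assembled prompt would contain only the contract preceded by two newlines. Whether that case should fall back to the default is a design decision the thread does not address.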

Retrieval from model metadata:

let guardian_model_info = session
    .services
    .models_manager
    .get_model_info(&guardian_model, turn.config.as_ref())
    .await;

let guardian_config = build_guardian_subagent_config(
    // ... other params
    guardian_model_info
        .guardian_developer_instructions
        .as_deref(),
)?;
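The `.as_deref()` call here turns the owned `Option<String>` on the model metadata into a borrowed `Option<&str>` without cloning. A reduced sketch, assuming a trimmed-down `ModelInfo` that models only the new field and a hypothetical helper named `override_for`:

```rust
// Trimmed-down stand-in for the real ModelInfo; only the new field is modeled.
struct ModelInfo {
    guardian_developer_instructions: Option<String>,
}

// Hypothetical helper: borrow the override as Option<&str> without cloning.
fn override_for(info: &ModelInfo) -> Option<&str> {
    info.guardian_developer_instructions.as_deref()
}
```

A model without the field yields `None`, which is what lets prompt assembly fall back to the built-in prompt.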

Analysis

Correctness ✓

The feature implementation:

Adds optional field to model metadata
Retrieves override from model when available
Falls back to default prompt when not specified
Preserves JSON contract appending in code

Override logic:

let prompt = prompt_override
    .unwrap_or(include_str!("guardian_prompt.md"))
    .trim_end();

This correctly implements preference for model-specific prompts while maintaining backward compatibility.

Code Quality ✓

Minimal and focused changes:

Single new field addition to ModelInfo
Modified guardian prompt assembly function
Updated guardian subagent config building
Comprehensive test coverage

Good use of Rust idioms:

Uses Option<&str> for optional overrides
as_deref() for safe option handling
#[serde(default)] for backward compatibility
Test fixtures properly updated

Testing ✓

Comprehensive test coverage:

New functionality test:

#[test]
fn guardian_subagent_config_prefers_model_prompt_override() {
    let guardian_config = build_guardian_subagent_config(
        &test_config(),
        None,
        "active-model",
        None,
        Some("override prompt"),
    )
    .expect("guardian config");

    let instructions = guardian_config
        .developer_instructions
        .expect("guardian instructions");

    assert!(instructions.starts_with("override prompt"));
    assert!(instructions.contains("\"risk_level\": \"low\" | \"medium\" | \"high\""));
}

Updated all existing fixtures:

11 test files updated to include guardian_developer_instructions: None
Ensures backward compatibility
Maintains existing test behavior

Backward Compatibility ✓

Preserves existing behavior:

Field is optional with default None
When not specified, uses existing hardcoded prompt
All existing tests pass with None value
JSON contract is still appended

Why Option is correct:

New field is optional for existing models
Existing models don't need this field
Gradual migration path for model authors

Security ⚠️

Prompt injection concerns:

Model metadata is typically controlled by the platform
Guardian prompts are critical security controls
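Consider validating guardian prompt overrides before use, since an override replaces the entire default policy text
The JSON contract is appended in code after the override, so an override cannot remove it, though it can still contradict it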
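One possible shape for such validation, offered only as a sketch: the PR itself adds no validation, and the function name, error type, and size cap below are all hypothetical. The check rejects overrides that are empty after trimming or that exceed an arbitrary length limit. It does not attempt to detect injected content.

```rust
// Hypothetical cap; the PR does not define one.
const MAX_OVERRIDE_BYTES: usize = 16 * 1024;

#[derive(Debug, PartialEq)]
enum OverrideError {
    Empty,
    TooLong { len: usize },
}

// Hypothetical check run before an override is handed to prompt assembly.
// Returns the trimmed override on success.
fn validate_guardian_override(raw: &str) -> Result<&str, OverrideError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(OverrideError::Empty);
    }
    if trimmed.len() > MAX_OVERRIDE_BYTES {
        return Err(OverrideError::TooLong { len: trimmed.len() });
    }
    Ok(trimmed)
}
```

A caller could map an `Err` to `None` so that a rejected override falls back to the default prompt, or surface the error. Which is preferable depends on how much the platform trusts its model metadata.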

Cross-file Impact

Moderate impact:

Modified files: 12 (protocol, core, tests)
Added field to ModelInfo struct
Modified guardian prompt assembly
Updated all test fixtures
No breaking changes to existing APIs

Assessment

Approve - This is a well-implemented feature that provides model flexibility:

✅ Pros:

Enables model-specific guardian prompts
Clean, minimal implementation
Good backward compatibility
Comprehensive test coverage
Follows existing code patterns
Preserves JSON contract

⚠️ Considerations:

Security: Model-specific guardian prompts could weaken security if they are not validated
Complexity: Per-model configuration increases system complexity
Documentation: Model authors need to understand the security implications of supplying an override

Recommendations:

Security validation: Consider adding validation for guardian prompt overrides to ensure they maintain security contracts
Documentation: Document the security implications and best practices for model-specific guardian prompts
Monitoring: Add logging or metrics when custom guardian prompts are used
Audit trail: Consider keeping track of which guardian prompts are used for auditing purposes
User visibility: Consider showing users when a custom guardian prompt is being used
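The monitoring and audit recommendations could be supported by having prompt selection report where the prompt came from. The sketch below is illustrative only: `GuardianPromptSource` and `select_guardian_prompt` are hypothetical names that do not appear in the PR, and no logging or metrics API is assumed.

```rust
// Hypothetical tag recording where the guardian prompt came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuardianPromptSource {
    BuiltIn,
    ModelOverride,
}

// Hypothetical helper: choose the prompt and report its source so the caller
// can log it or attach it to an audit record.
fn select_guardian_prompt<'a>(
    prompt_override: Option<&'a str>,
    built_in: &'a str,
) -> (&'a str, GuardianPromptSource) {
    match prompt_override {
        Some(p) => (p, GuardianPromptSource::ModelOverride),
        None => (built_in, GuardianPromptSource::BuiltIn),
    }
}
```

Returning the source alongside the prompt keeps the logging decision with the caller, so prompt assembly itself stays free of side effects.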

Verdict: Merge. The implementation is clean, maintains backward compatibility, and has test coverage for the override path. The main concern is that custom guardian prompts must preserve appropriate security boundaries, so validation and monitoring for overrides should follow.

charley-oai force-pushed the guardian-assessment-ui branch from 05060d9 to 04f5f35 on March 8, 2026 00:19

charley-oai force-pushed the guardian-prompt-override branch from a759c69 to 7c09e3b on March 8, 2026 00:19

feat(core): allow guardian prompt overrides from model metadata

c5fdd79

Co-authored-by: Codex <noreply@openai.com>

charley-oai force-pushed the guardian-prompt-override branch from 7c09e3b to c5fdd79 on March 8, 2026 00:48

charley-oai changed the base branch from guardian-assessment-ui to main on March 8, 2026 00:48

This was referenced Mar 8, 2026

📊 AI CLI Tools Community Daily Report 2026-03-08 rollysys/agents-radar#52

Closed

📊 AI CLI Tools Daily Bulletin 2026-03-08 compasify/agents-radar#9

Open

charley-oai wants to merge 1 commit into main from guardian-prompt-override


2 participants
