SAIGEN supports multiple package repositories to gather comprehensive software metadata. The repository system provides caching, configurable parsing, and support for various data formats.
- apt - Debian/Ubuntu APT repositories
- brew - Homebrew formulae (macOS/Linux)
- winget - Windows Package Manager
- dnf/yum - Red Hat/Fedora repositories
- pacman - Arch Linux repositories
- generic - Custom repositories with configurable parsing
Repositories are configured in the main SAIGEN configuration file under the repositories section:
repositories:
  repository-name:
    name: repository-name
    type: apt|brew|winget|dnf|generic
    platform: linux|macos|windows
    url: https://repository.url/path
    enabled: true|false
    priority: 1-100
    cache_ttl_hours: 24
    timeout: 300
    architecture: [amd64, arm64]
    parsing:
      # Parsing configuration (see below)
    credentials:
      # Optional authentication
    metadata:
      description: "Repository description"
      maintainer: "Maintainer name"

For repositories that provide plain text package lists:
parsing:
  format: text
  line_pattern: '^Package:\s*(.+)$'
  name_group: 1
  version_pattern: '^Version:\s*(.+)$'
  version_group: 1
  description_pattern: '^Description:\s*(.+)$'
  description_group: 1

For JSON-based APIs:
parsing:
  format: json
  package_path: [packages]  # Path to package array
  field_mapping:
    name: package_name
    version: latest_version
    description: summary
    homepage: project_url
    maintainer: author

For XML-based repositories:
parsing:
  format: xml
  package_xpath: './/package'
  xml_field_mapping:
    name: name
    version: 'version/@ver'
    description: description

For YAML-based repositories:
parsing:
  format: yaml
  package_path: [packages]
  field_mapping:
    name: name
    version: version
    description: description

cache:
  directory: ~/.saigen/cache
  max_size_mb: 1000
  default_ttl: 3600
  cleanup_interval: 86400

# View cache statistics
saigen repo stats
# Update specific repository cache
saigen repo update ubuntu-main
# Update all repository caches
saigen repo update --all
# Clear cache for specific repository
saigen repo clear ubuntu-main
# Clear all caches
saigen repo clear --all
# List cached repositories
saigen repo list --cached

For repositories requiring authentication:
repositories:
  private-repo:
    name: private-repo
    type: generic
    url: https://private.repo.com/api/packages
    credentials:
      username: "${REPO_USERNAME}"
      password: "${REPO_PASSWORD}"
      api_key: "${REPO_API_KEY}"

- Use environment variables for credentials
- Set appropriate cache TTL values
- Validate repository URLs and certificates
- Monitor cache size and cleanup regularly
- Use HTTPS URLs when possible
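
The first recommendation relies on placeholders such as ${REPO_USERNAME} being replaced with values from the environment. How SAIGEN performs this substitution is not described here, so the following is only a minimal Python sketch of one way to do it with string.Template from the standard library. The helper name resolve_credentials and the sample values are hypothetical.

```python
import os
from string import Template
from typing import Dict, Mapping, Optional


def resolve_credentials(
    credentials: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Replace ${VAR} placeholders with values from the environment.

    Hypothetical helper, not part of SAIGEN. Raises KeyError when a
    referenced variable is not set, so missing secrets fail loudly.
    """
    source = os.environ if env is None else env
    return {
        key: Template(value).substitute(source)
        for key, value in credentials.items()
    }


# Usage with an explicit mapping instead of the real environment
raw = {"username": "${REPO_USERNAME}", "password": "${REPO_PASSWORD}"}
resolved = resolve_credentials(
    raw, {"REPO_USERNAME": "ci-bot", "REPO_PASSWORD": "s3cret"}
)
print(resolved)
```

Failing on an unset variable, rather than silently passing the literal placeholder to the server, keeps a misconfigured secret from turning into a confusing authentication error later.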
repositories:
  custom-api:
    name: custom-api
    type: generic
    platform: linux
    url: https://api.example.com/v1/packages
    enabled: true
    priority: 5
    cache_ttl_hours: 6
    timeout: 120
    parsing:
      format: json
      package_path: [data, packages]
      field_mapping:
        name: pkg_name
        version: current_version
        description: short_desc
        homepage: website
        license: license_type
        size: package_size
        dependencies: deps
        tags: categories
    metadata:
      description: Custom Package API
      maintainer: Example Corp
      api_version: v1

For complex parsing requirements, you can implement custom parsers:
from typing import List

# RepositoryInfo and RepositoryPackage are SAIGEN's own types; import them
# from your SAIGEN installation (the module path is not shown here).


def custom_parser(content: str, repository_info: RepositoryInfo) -> List[RepositoryPackage]:
    """Custom parser for specialized repository formats."""
    packages = []
    # Custom parsing logic here
    return packages

# Register in configuration
repositories:
  custom-format:
    parsing:
      format: custom
      custom_parser: my_module.custom_parser

# Check repository configuration
saigen config show --section repositories
# Test repository connectivity
saigen repo test ubuntu-main
# Check cache status
saigen repo stats ubuntu-main

# Validate repository configuration
saigen config validate
# Check parsing with verbose output
saigen repo update ubuntu-main --verbose
# Test parsing with sample data
saigen repo parse-test ubuntu-main --sample-size 10

# Clear corrupted cache
saigen repo clear ubuntu-main
# Rebuild cache
saigen repo update ubuntu-main --force
# Check cache directory permissions
ls -la ~/.saigen/cache/

- Set appropriate cache_ttl_hours based on repository update frequency
- Monitor cache size with saigen repo stats
- Use cleanup_interval to automatically remove expired entries
generation:
  parallel_requests: 3   # Limit concurrent repository requests
  request_timeout: 120   # Timeout for repository requests

Configure repository priorities to prefer faster or more reliable sources:
repositories:
  fast-mirror:
    priority: 10  # Higher priority
  slow-mirror:
    priority: 5   # Lower priority

Repositories are automatically used during saidata generation:
# Generate using all enabled repositories
saigen generate nginx
# Generate using specific repositories
saigen generate nginx --repositories ubuntu-main,homebrew-core
# Generate with repository data context
saigen generate nginx --use-repository-context

The generation engine will:
- Query enabled repositories for package information
- Use cached data when available
- Include repository metadata in LLM context
- Validate generated saidata against repository data
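
The first two steps, picking enabled repositories and preferring cached data, can be illustrated with a short Python sketch. It assumes that a larger priority value wins (as in the fast-mirror example) and that a cache entry is fresh while its age is below cache_ttl_hours. The Repo class and both functions are hypothetical stand-ins, not SAIGEN's actual implementation.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Repo:
    """Hypothetical stand-in for a repository entry from the configuration."""
    name: str
    enabled: bool
    priority: int
    cache_ttl_hours: float


def ordered_repos(repos: List[Repo]) -> List[Repo]:
    """Keep enabled repositories, highest priority value first."""
    enabled = (r for r in repos if r.enabled)
    return sorted(enabled, key=lambda r: r.priority, reverse=True)


def cache_is_fresh(cached_at: float, ttl_hours: float,
                   now: Optional[float] = None) -> bool:
    """True if an entry stored at cached_at (epoch seconds) is within its TTL."""
    current = time.time() if now is None else now
    return (current - cached_at) < ttl_hours * 3600


repos = [
    Repo("slow-mirror", True, 5, 24),
    Repo("fast-mirror", True, 10, 24),
    Repo("disabled-repo", False, 50, 24),
]
names = [r.name for r in ordered_repos(repos)]
print(names)
```

A repository whose cache entry fails the freshness check would be refreshed (for example with saigen repo update) before its data is used.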