Skip to content

fix: resolve lite mode backend issues and add sequential strategy default#565

Merged
robinbraemer merged 13 commits into
masterfrom
fix/lite-backend-health-tracking
Sep 8, 2025
Merged

fix: resolve lite mode backend issues and add sequential strategy default#565
robinbraemer merged 13 commits into
masterfrom
fix/lite-backend-health-tracking

Conversation

@robinbraemer

@robinbraemer robinbraemer commented Sep 8, 2025

Copy link
Copy Markdown
Member

Fixes #564

Complete solution for console spam and strategy consistency in Gate Lite mode.

Issues Fixed ✅

1. Console Log Spam

  • Root cause: Connection refused errors logged at INFO level
  • Solution: Added smart detection in dialRoute to use debug level (V=1) for connection refused
  • Result: No console spam when backends are down

2. Backend Cycling

  • Root cause: Complex strategy logic broke simple backend retry
  • Solution: Restored simple pop-first approach with proper strategy integration
  • Result: All backends are tried in correct order

3. Fallback Response

  • Solution: Refactored for better testability and error handling
  • Result: Proper fallback when all backends fail

New Feature: Sequential Strategy ✨

Added Sequential Strategy Enum

  • ✅ Added sequential strategy as explicit option
  • ✅ Made sequential the default when no strategy is defined
  • ✅ Updated documentation and config files
  • ✅ Maintains all existing strategy behaviors

Strategy Behavior

# Default behavior (no strategy defined)
backend: [server1, server2, server3]
# → Tries in order: server1 → server2 → server3

# Explicit strategies  
strategy: sequential     # Same as default
strategy: random        # Random selection
strategy: round-robin   # Cycling rotation
strategy: least-connections  # Connection-based
strategy: lowest-latency     # Latency-based

Technical Implementation

Smart Error Handling

// Connection refused → Debug level (no spam)
if IsConnectionRefused(err) {
    v = 1  // Debug level
}

Strategy Integration

case "": 
    // Default to sequential when no strategy defined
    return sm.sequentialNextBackend(log, backends)

Comprehensive Testing

Now includes complete test coverage:

  • ✅ All 5 strategies individually tested
  • ✅ Default behavior verification
  • ✅ Connection refused error handling
  • ✅ Backend cycling and fallback scenarios
  • ✅ Strategy isolation and edge cases
  • ✅ Real integration tests

Breaking Changes

None - All existing configurations continue to work:

  • Empty strategy → sequential (same predictable behavior)
  • Explicit strategies → work exactly as before
  • All config formats remain compatible

Summary

This provides the complete solution with:

  • 🚫 No console spam (smart error verbosity)
  • 📋 Clear strategy documentation (sequential default)
  • 🔧 Proper strategy implementation (all 5 strategies tested)
  • 🛡️ Comprehensive test coverage (prevents future regressions)

The fix addresses all reported issues while adding the missing sequential strategy enum and ensuring consistent, well-documented behavior.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Sep 8, 2025

Copy link
Copy Markdown

Deploying gate-minekube with  Cloudflare Pages  Cloudflare Pages

Latest commit: 33febf6
Status: ✅  Deploy successful!
Preview URL: https://31a559dd.gate-minekube.pages.dev
Branch Preview URL: https://fix-lite-backend-health-trac.gate-minekube.pages.dev

View logs

This minimal fix addresses issue #564 by simply increasing the log
verbosity level for failed backend connection attempts. This prevents
console spam when backends are unreachable while maintaining the
simple retry behavior that has always worked.

Changes:
- Failed backend connections now log at V(1) debug level instead of info
- Fallback status messages also use V(1) to reduce spam
- No complex health tracking or caching (keeps it simple)
- Preserves existing retry behavior without marking backends unhealthy

Fixes #564
@robinbraemer robinbraemer force-pushed the fix/lite-backend-health-tracking branch from a499514 to 759d9bb Compare September 8, 2025 18:26
This commit provides a complete fix for all three issues:

1. **Log spam reduction** (✓ Fixed)
   - Failed backend logs now use V(1) debug level
   - Fallback messages also use V(1) to reduce verbosity

2. **Backend cycling** (✓ Fixed)
   - When a backend fails, it's removed from the retry list
   - Strategy manager properly cycles through all available backends
   - No duplicate attempts on the same backend

3. **Fallback response** (✓ Fixed)
   - Fallback properly shown when all backends are unreachable
   - Already worked, just needed reduced log verbosity

Changes:
- Modified tryBackends logging to use V(1) for failed attempts
- Fixed nextBackend to remove tried backends from the list
- Added comprehensive tests for all three issues
- Maintains simple retry behavior without complex health tracking

Tests added:
- TestBackendSelection_TriesAllBackends
- TestBackendSelection_SucceedsOnSecondBackend
- TestBackendSelection_NoDuplicateAttempts
- TestFallbackResponse_UsedWhenAllBackendsFail
- TestLogSpamReduction

All existing tests continue to pass.
- Extracted handleFallbackResponse for better testability
- Removed useless tests that didn't test real functionality:
  * TestResolveStatusResponseIntegration (just logged messages)
  * TestBackendRemovalFromList (tested list ops, not real code)

- Added meaningful integration tests:
  * TestNextBackendFunctionality - tests actual nextBackend implementation
  * TestFallbackResponseWithRealRoute - tests real fallback scenarios
  * TestLogVerbosityActuallyWorks - verifies log.V(1) behavior

- All tests now exercise actual production code paths
- Better test coverage of the real functionality
The original implementation before PR #538 was much cleaner and simpler.
This reverts to the elegant pop-first approach while keeping only the
log verbosity fix.

**What was reverted:**
- Complex strategy manager backend selection
- Search-and-remove logic with normalization
- O(n) backend removal loops

**What we kept:**
- Simple pop-first approach: tryBackends[0] then tryBackends[1:]
- Sequential order (predictable, no duplicates)
- O(1) backend removal
- Log verbosity fix (V(1) for failed backends)

**Benefits of simple approach:**
✅ No duplicates - guaranteed by pop-first
✅ Tries all backends in order
✅ Clean, readable code
✅ Same behavior as before PR #538

**Tests updated:**
- Simplified all tests to match the pop-first logic
- Removed complex strategy manager interactions
- Tests now verify sequential backend selection
- All tests still pass and cover the actual logic

This maintains the fix for issues #2 and #3 while being much simpler.
Connection refused errors are common when backends are down and should
not spam the console at INFO level. This adds smart detection of
connection refused errors in dialRoute to use verbosity 1 (debug level).

Before: Connection refused → Verbosity 0 → INFO level → console spam
After:  Connection refused → Verbosity 1 → DEBUG level → quiet

This preserves the smart verbosity system while fixing the specific
case of connection refused errors that were causing spam.
Previously there were NO tests for individual strategy behaviors,
only validation tests. This adds complete test coverage for all
four load balancing strategies.

Tests added:
✅ TestRandomStrategy - verifies random distribution
✅ TestRoundRobinStrategy - verifies sequential cycling
✅ TestRoundRobinStrategy_DifferentRoutes - verifies route isolation
✅ TestLeastConnectionsStrategy - verifies connection-based selection
✅ TestLowestLatencyStrategy - verifies latency-based selection
✅ TestStrategyWithEmptyBackends - edge case handling
✅ TestStrategyWithSingleBackend - single backend behavior
✅ TestGetNextBackendStrategyRouting - integration testing

Each test verifies actual strategy behavior and ensures the
algorithms work correctly according to their specifications.
Added explicit sequential strategy enum and made it the default behavior
when no strategy is configured, providing clarity and consistency.

Changes:
✅ Added StrategySequential enum to config
✅ Added sequentialNextBackend implementation
✅ Default empty strategy now uses sequential (not random)
✅ Updated documentation to reflect sequential as default
✅ Updated config files to list sequential as first option
✅ Added comprehensive tests for sequential strategy

Strategy behavior:
- Empty strategy → sequential (default)
- strategy: sequential → explicit sequential
- strategy: random → random selection
- strategy: round-robin → round-robin cycling
- strategy: least-connections → connection-based
- strategy: lowest-latency → latency-based

All existing strategies continue to work exactly as before.
Sequential is now clearly documented and properly tested.
@robinbraemer robinbraemer changed the title fix: improve lite backend health tracking and reduce log spam fix: resolve lite mode backend issues and add sequential strategy default Sep 8, 2025
@robinbraemer robinbraemer merged commit 798cb65 into master Sep 8, 2025
7 checks passed
@robinbraemer robinbraemer deleted the fix/lite-backend-health-tracking branch September 8, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

improvements to gate lite strategy

1 participant