Circuit Breakers
Circuit Breakers are an automatic reliability feature that protects your webhook workflows from cascading failures. When a destination is experiencing issues, circuit breakers temporarily stop sending webhooks to prevent overwhelming the failing service, giving it time to recover.
Overview
Circuit breakers work like the electrical circuit breakers in your home: when too much current flows (too many errors), the circuit "opens" to prevent damage, then automatically "closes" once conditions improve.
Key features:
- Automatic protection: No manual intervention required
- Per-destination: Each destination has its own circuit breaker
- Self-healing: Automatically tests for recovery
- Configurable thresholds: Tune sensitivity to your needs
- Real-time monitoring: View circuit breaker status in dashboard
- Manual controls: Reset circuits when needed
Why Circuit Breakers Matter
The Cascading Failure Problem
Without circuit breakers, webhook delivery failures can cause serious issues:
Scenario: Your destination API goes down
- Hooklistener keeps attempting delivery
- Each attempt times out (adds latency)
- Retry queue grows rapidly
- System resources are consumed by failing requests
- Other healthy destinations may be affected
- Recovery takes longer even after service is restored
With circuit breakers:
- After a few failures, the circuit opens
- Subsequent webhooks are blocked immediately (no timeout wait)
- System resources are preserved
- Periodic tests check for recovery
- Once healthy, the circuit closes automatically
- Normal operation resumes quickly
Benefits
- Prevent resource exhaustion: Stop wasting resources on failing destinations
- Faster failure detection: Know immediately when a service is down
- Automatic recovery: Resume once service is healthy
- Protect downstream systems: Prevent overwhelming already-struggling services
- Improve overall reliability: Isolate failures to specific destinations
Circuit Breaker States
Closed (Normal Operation)
The circuit is closed when everything is working normally.
Behavior:
- All webhook deliveries are attempted
- Failures are counted and tracked
- If failures reach the threshold within the time window, the circuit opens
Status indicator: 🟢 Healthy
Open (Service Failing)
The circuit opens when too many failures occur within a time window.
Behavior:
- Webhook deliveries are blocked immediately
- No delivery attempts are made (prevents timeouts)
- Webhooks are marked as "circuit breaker blocked"
- After a timeout period, circuit transitions to half-open
Status indicator: 🔴 Open
Threshold configuration:
- Failure threshold: 5 failures within 60 seconds (default)
- Time window: 60 seconds (default)
- Recovery timeout: 60 seconds before testing (default)
Half-Open (Testing Recovery)
The circuit enters half-open state after the recovery timeout expires.
Behavior:
- Limited webhook deliveries are attempted (test mode)
- If deliveries succeed, circuit closes
- If deliveries fail, circuit reopens immediately
- Success threshold determines when to close
Status indicator: 🟡 Testing
Recovery configuration:
- Success threshold: 2 successful deliveries required (default)
- Single failure: Immediately reopens circuit
How Circuit Breakers Work
State Transitions
Closed → Open → Half-Open → Closed
   ↑                           ↓
   └───────────────────────────┘
Closed to Open:
- Triggered by: 5+ failures within 60-second window
- Action: Block all subsequent deliveries
- Duration: Stays open for 60 seconds
Open to Half-Open:
- Triggered by: 60 seconds elapsed since last failure
- Action: Allow limited test deliveries
- Purpose: Check if service recovered
Half-Open to Closed:
- Triggered by: 2 consecutive successful deliveries
- Action: Resume normal operation
- Effect: Circuit fully recovered
Half-Open to Open:
- Triggered by: Any single failure during testing
- Action: Block deliveries again
- Duration: Wait another 60 seconds before retrying
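The transitions above can be sketched as a small state machine. The Python below is an illustrative model only, not Hooklistener's implementation: the class and method names (`CircuitBreaker`, `allow`, `record_success`, `record_failure`) are hypothetical, and the defaults simply mirror the values documented on this page.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Illustrative model of the documented transitions (hypothetical names)."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 window_s=60.0, timeout_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.window_s = window_s
        self.timeout_s = timeout_s
        self.clock = clock
        self.state = State.CLOSED
        self.failures = deque()    # timestamps of failures inside the window
        self.successes = 0         # consecutive successes while half-open
        self.last_failure = None

    def allow(self):
        """Return True if a delivery may be attempted right now."""
        if self.state is State.OPEN:
            if self.clock() - self.last_failure < self.timeout_s:
                return False                    # still open: block immediately
            self.state = State.HALF_OPEN        # Open -> Half-Open
            self.successes = 0
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = State.CLOSED       # Half-Open -> Closed
                self.failures.clear()

    def record_failure(self):
        now = self.clock()
        self.last_failure = now
        if self.state is State.HALF_OPEN:
            self.state = State.OPEN             # Half-Open -> Open
            return
        self.failures.append(now)
        # Drop failures that have aged out of the sliding window.
        while self.failures and now - self.failures[0] >= self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self.state = State.OPEN             # Closed -> Open
```

Injecting a fake `clock` makes the timing behavior easy to exercise in tests without waiting out the 60-second timeout.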
Failure Detection
Circuit breakers count these as failures:
Network Errors:
- Connection timeout
- Connection refused
- DNS resolution failure
- SSL/TLS errors
HTTP Status Codes:
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
- 429 Too Many Requests (rate limiting)
Timeouts:
- Request timeout (>30 seconds)
- Response timeout
Not counted as failures:
- 4xx errors (except 429) - these indicate client errors, not service failures
- 2xx/3xx responses - successful deliveries
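As a rough illustration of the rules above, the classification could be expressed as a single predicate. This is a sketch under assumptions: the function name and parameters are hypothetical, and only the status codes listed on this page are treated as failures (how other 5xx codes are handled is not stated here).

```python
from typing import Optional

# Status codes this page lists as failures.
FAILURE_STATUSES = {429, 500, 502, 503, 504}


def counts_as_failure(status: Optional[int],
                      network_error: bool = False,
                      timed_out: bool = False) -> bool:
    """Return True if a delivery outcome would count toward the threshold."""
    if network_error or timed_out or status is None:
        # No usable response: connection, DNS, TLS, or timeout problems.
        return True
    return status in FAILURE_STATUSES


print(counts_as_failure(503))   # True
print(counts_as_failure(404))   # False
```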
Time Window Tracking
Circuit breakers use a sliding time window to count failures:
60-second window example:
- Failure at 10:00:00
- Failure at 10:00:15
- Failure at 10:00:30
- Failure at 10:00:45
- Failure at 10:00:55
- Circuit opens (5 failures in 60 seconds)
- At 10:01:55, circuit transitions to half-open (60 seconds after last failure)
Older failures are automatically removed:
- Failure at 10:00:00 expires at 10:01:00
- Only recent failures count toward threshold
- Sliding window ensures responsive behavior
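The worked example above can be reproduced with a few lines of Python. This is only a sketch of sliding-window counting; the helper name `failures_in_window` and the calendar date are arbitrary choices for illustration, while the thresholds are the documented defaults.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)
RECOVERY_TIMEOUT = timedelta(seconds=60)
FAILURE_THRESHOLD = 5


def failures_in_window(failures, now, window=WINDOW):
    """Keep only the failures that are younger than `window` at time `now`."""
    return [t for t in failures if now - t < window]


base = datetime(2024, 1, 15, 10, 0, 0)  # arbitrary date; only the times matter
failures = [base + timedelta(seconds=s) for s in (0, 15, 30, 45, 55)]

last = failures[-1]
opened = len(failures_in_window(failures, now=last)) >= FAILURE_THRESHOLD
half_open_at = last + RECOVERY_TIMEOUT
print(opened, half_open_at.time())  # True 10:01:55

# At 10:01:00 the 10:00:00 failure has expired, leaving four in the window.
print(len(failures_in_window(failures, now=base + WINDOW)))  # 4
```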
Monitoring Circuit Breakers
Dashboard View
View circuit breaker status for all destinations:
Navigate to: Bridges → Circuit Breakers
Information displayed:
- Destination name
- Current state (Closed/Open/Half-Open)
- Failure count within window
- Last failure time
- Time until recovery attempt
- Success count (in half-open state)
Color coding:
- 🟢 Green: Closed (healthy)
- 🟡 Yellow: Half-open (testing)
- 🔴 Red: Open (failing)
Per-Destination Status
View detailed circuit breaker info for a specific destination:
Navigate to: Bridges → Select Bridge → Destinations → Circuit Breaker Status
Detailed metrics:
- Current state and transition history
- Failure timeline (last 100 failures)
- Recovery attempts and outcomes
- Configuration settings
- Manual override status
Alerts and Notifications
Configure alerts for circuit breaker events:
Alert triggers:
- Circuit opened (destination failing)
- Circuit half-open (testing recovery)
- Circuit closed (recovered)
- Circuit repeatedly opening (chronic issues)
Notification channels:
- Slack
- Discord
- Telegram
- Webhook
Managing Circuit Breakers
Automatic Behavior
Circuit breakers operate automatically without intervention:
- Monitor failures: Track delivery success/failure rates
- Open circuit: When the failure threshold is reached
- Wait for recovery: Timeout period
- Test recovery: Half-open state
- Resume or retry: Based on test results
Best practice: Let circuit breakers handle recovery automatically in most cases.
Manual Reset
Sometimes you need to manually reset a circuit breaker:
When to manually reset:
- You've fixed the destination issue
- You know the service is healthy again
- Circuit breaker timeout is too long
- You want to force immediate retry
How to reset:
Via Dashboard:
- Navigate to Circuit Breakers
- Find the destination with open circuit
- Click "Reset Circuit Breaker"
- Confirm the action
Via API:
# Replace {destination_id} with the ID of the destination to reset
curl -X POST "https://api.hooklistener.com/api/v1/circuit-breakers/{destination_id}/reset" \
  -H "Authorization: Bearer YOUR_API_KEY"
What happens on reset:
- Circuit immediately transitions to Closed state
- Failure count is cleared
- Next webhook delivery is attempted normally
- If destination still failing, circuit will reopen quickly
Caution: Don't repeatedly reset without fixing the underlying issue. This defeats the purpose of circuit breakers and can cause resource exhaustion.
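For scripts that would rather not shell out to curl, the same reset call might look like this in Python. It is a minimal sketch using only the standard library: the helper names are hypothetical, the endpoint and header are taken from the curl example on this page, and a JSON response body is assumed.

```python
import json
import urllib.request

API_BASE = "https://api.hooklistener.com/api/v1"


def build_reset_request(destination_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that resets one circuit."""
    url = f"{API_BASE}/circuit-breakers/{destination_id}/reset"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def reset_circuit(destination_id: str, api_key: str) -> dict:
    """Send the reset request and return the decoded JSON response."""
    request = build_reset_request(destination_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or log the request before anything touches the network.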
Configuration
Default circuit breaker settings work well for most use cases. Per-bridge customization is planned but not yet available.
Per-Bridge configuration (coming soon):
{
  "circuit_breaker": {
    "failure_threshold": 10,
    "success_threshold": 3,
    "timeout": 120000,
    "window_size": 60000
  }
}
Configuration options:
- failure_threshold: Failures before opening (default: 5)
- success_threshold: Successes before closing from half-open (default: 2)
- timeout: Milliseconds before testing recovery (default: 60000)
- window_size: Time window for counting failures in milliseconds (default: 60000)
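To show how the documented fields and defaults fit together, here is a hedged Python sketch that parses such a JSON fragment and fills in missing values. Since per-bridge configuration is marked as coming soon, nothing here reflects a real loading mechanism: `CircuitBreakerConfig` and `load_config` are hypothetical names, and the positive-value check is an assumption rather than a documented rule.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitBreakerConfig:
    failure_threshold: int = 5     # failures before opening
    success_threshold: int = 2     # successes before closing from half-open
    timeout: int = 60_000          # milliseconds before testing recovery
    window_size: int = 60_000      # milliseconds in the failure window

    def __post_init__(self):
        # Assumed sanity check: every setting must be a positive number.
        for name in ("failure_threshold", "success_threshold",
                     "timeout", "window_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


def load_config(raw: str) -> CircuitBreakerConfig:
    """Parse a JSON document, using defaults for any omitted option."""
    section = json.loads(raw).get("circuit_breaker", {})
    return CircuitBreakerConfig(**section)


cfg = load_config('{"circuit_breaker": {"failure_threshold": 10, "timeout": 120000}}')
print(cfg.failure_threshold, cfg.success_threshold, cfg.timeout / 1000)  # 10 2 120.0
```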
Best Practices
Monitoring
Set up alerts:
- Get notified when circuits open
- Track recovery times
- Monitor chronic failures
Review circuit breaker patterns:
- Frequent opens indicate destination issues
- Long recovery times suggest capacity problems
- Consistent failures need investigation
Track destinations separately:
- One failing destination shouldn't affect others
- Monitor each destination independently
- Identify problematic integrations
Responding to Open Circuits
When a circuit opens:
- Don't panic - circuit breakers are working as designed
- Check destination health - is the service actually down?
- Review error messages - what's causing the failures?
- Let it auto-recover - wait for automatic testing
- Fix root cause - address underlying issues
- Monitor recovery - ensure sustained health
What NOT to do:
- ❌ Repeatedly reset circuit breakers
- ❌ Disable circuit breakers to "force through"
- ❌ Ignore chronic failures
- ❌ Blame Hooklistener for destination issues
Destination Design
Design destination endpoints to work well with circuit breakers:
Fast failure:
- Return errors quickly (don't let requests hang)
- Use appropriate HTTP status codes
- Timeout internal operations
Idempotency:
- Handle duplicate deliveries gracefully
- Use unique identifiers for deduplication
- Design for at-least-once delivery
Health indicators:
- Provide health check endpoints
- Return 503 when degraded
- Implement graceful degradation
Rate limiting:
- Return 429 when rate limited
- Include Retry-After header
- Handle burst traffic
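The guidelines above can be combined in a receiver along the following lines. This is a framework-independent sketch, not a recommended implementation: the `Endpoint` class, its one-second rate window, and the idea that each delivery carries an `event_id` are all assumptions made for illustration.

```python
import time


class Endpoint:
    """Hypothetical webhook receiver logic, independent of any web framework."""

    def __init__(self, max_per_second=10, clock=time.monotonic):
        self.seen_ids = set()          # event IDs already processed
        self.max_per_second = max_per_second
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.degraded = False          # flip to True when dependencies are unhealthy

    def handle(self, event_id: str):
        """Return (status_code, headers) for one incoming delivery."""
        if self.degraded:
            return 503, {}                       # fail fast while degraded
        now = self.clock()
        if now - self.window_start >= 1.0:       # start a new one-second window
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count > self.max_per_second:
            return 429, {"Retry-After": "1"}     # ask the sender to back off
        if event_id in self.seen_ids:
            return 200, {}                       # duplicate: acknowledge, skip work
        self.seen_ids.add(event_id)
        # ... process the event here ...
        return 200, {}
```

Acknowledging duplicates with a success status keeps at-least-once delivery from turning into repeated side effects.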
Troubleshooting
Circuit Breaker Frequently Opening
Possible causes:
- Destination service is unstable
- Capacity issues at destination
- Network connectivity problems
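- Incorrect authentication (only when the destination reports it with a 5xx or 429 status; 401/403 responses are not counted as failures)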
- Rate limiting issues
Diagnosis steps:
- Review error messages in event history
- Check destination logs for clues
- Test destination directly outside Hooklistener
- Monitor destination metrics (CPU, memory, response times)
- Verify network connectivity and DNS
Solutions:
- Fix destination service issues
- Increase destination capacity
- Review and fix authentication
- Implement rate limiting or request throttling
- Use retries appropriately
Circuit Stuck in Half-Open
Possible causes:
- Destination intermittently failing
- Slow recovery time
- Partial service degradation
What's happening:
- Circuit tests recovery
- Occasional success doesn't meet threshold
- Any failure sends the circuit back to the open state
- Cycle repeats
Solutions:
- Investigate intermittent failures
- Lower success threshold (if appropriate)
- Fix underlying stability issues
- Consider temporary manual intervention
Too Many False Opens
Possible causes:
- Threshold too sensitive
- Transient network issues
- Burst failures from legitimate traffic spikes
Solutions:
- Increase failure threshold
- Increase time window
- Improve destination reliability
- Implement better error handling
Circuits Not Opening When They Should
Possible causes:
- Destination returning 4xx instead of 5xx
- Errors not being counted properly
- Threshold set too high
Diagnosis:
- Review failure types in event history
- Check HTTP status codes returned
- Verify failures are actually errors
Solutions:
- Fix destination to return correct status codes
- Lower failure threshold if needed
- Review what constitutes a failure
Circuit Breakers and Retries
Circuit breakers and retries work together:
When Circuit is Closed
- Delivery is attempted
- If it fails, retry logic kicks in
- Multiple retries may occur
- Each retry failure counts toward circuit breaker threshold
- After enough failures, circuit opens
When Circuit is Open
- Delivery is blocked immediately
- No retry attempts are made
- Webhook is marked as "circuit breaker blocked"
- Retries will resume once circuit closes
When Circuit is Half-Open
- Test deliveries are attempted
- Successful deliveries close the circuit
- Failed deliveries reopen the circuit
- Regular retries resume when circuit closes
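A small simulation can make the interaction concrete. The sketch below is deliberately simplified: `CountingBreaker` is a hypothetical stand-in that only counts failures (no time window, no half-open state), and the retry count of 3 is an arbitrary assumption, not a Hooklistener default.

```python
class CountingBreaker:
    """Simplified stand-in: opens after N failures and never recovers."""

    def __init__(self, failure_threshold=5):
        self.failure_threshold = failure_threshold
        self.failures = 0

    def allow(self):
        return self.failures < self.failure_threshold

    def record_success(self):
        pass

    def record_failure(self):
        self.failures += 1


def deliver_with_retries(send, breaker, max_retries=3):
    """Attempt one webhook, retrying on failure unless the circuit is open."""
    for _ in range(1 + max_retries):
        if not breaker.allow():
            return "circuit breaker blocked"   # no attempt, no further retries
        if send():
            breaker.record_success()
            return "delivered"
        breaker.record_failure()               # every failed attempt counts
    return "failed"


breaker = CountingBreaker()
outcomes = [deliver_with_retries(lambda: False, breaker) for _ in range(3)]
print(outcomes)
# ['failed', 'circuit breaker blocked', 'circuit breaker blocked']
```

The first webhook uses up four failed attempts, the second trips the breaker on its first attempt, and the third is blocked without any delivery attempt.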
Best Practices
Configure retries appropriately:
- More retries give the circuit more opportunities to detect failures
- Too many retries can lengthen the time before the circuit opens
- Balance the two based on your needs
Circuit breakers complement retries:
- Retries handle transient failures
- Circuit breakers handle sustained failures
- Together they provide robust error handling
Integration Examples
Monitoring Circuit Breaker Status
# Get all circuit breaker statuses
curl -X GET https://api.hooklistener.com/api/v1/circuit-breakers \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "circuit_breakers": [
    {
      "destination_id": "550e8400-e29b-41d4-a716-446655440000",
      "destination_name": "Production API",
      "state": "closed",
      "failure_count": 0,
      "success_count": 0,
      "last_failure_at": null
    },
    {
      "destination_id": "660e8400-e29b-41d4-a716-446655440001",
      "destination_name": "Slack Notifications",
      "state": "open",
      "failure_count": 7,
      "success_count": 0,
      "last_failure_at": "2024-01-15T10:30:00Z",
      "recovery_at": "2024-01-15T10:31:00Z"
    }
  ]
}
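A monitoring script might filter a response like this for open circuits. The following is a sketch that assumes the response shape shown above; the function name `open_circuits` is hypothetical, and the sample payload is a trimmed copy of the example response.

```python
import json
from datetime import datetime


def open_circuits(payload: dict):
    """Return (destination name, recovery time or None) for each open circuit."""
    result = []
    for breaker in payload.get("circuit_breakers", []):
        if breaker["state"] != "open":
            continue
        recovery = breaker.get("recovery_at")
        # Replace the trailing "Z" so older Python versions can parse it.
        when = (datetime.fromisoformat(recovery.replace("Z", "+00:00"))
                if recovery else None)
        result.append((breaker["destination_name"], when))
    return result


sample = json.loads("""
{
  "circuit_breakers": [
    {"destination_name": "Production API", "state": "closed",
     "failure_count": 0, "last_failure_at": null},
    {"destination_name": "Slack Notifications", "state": "open",
     "failure_count": 7, "last_failure_at": "2024-01-15T10:30:00Z",
     "recovery_at": "2024-01-15T10:31:00Z"}
  ]
}
""")

for name, recovery_at in open_circuits(sample):
    print(f"{name} is open; recovery test at {recovery_at}")
```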
Resetting a Circuit Breaker
# Replace {destination_id} with the ID of the destination to reset
curl -X POST "https://api.hooklistener.com/api/v1/circuit-breakers/{destination_id}/reset" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "success": true,
  "message": "Circuit breaker reset successfully",
  "destination_id": "550e8400-e29b-41d4-a716-446655440000",
  "new_state": "closed"
}
Comparison with Other Patterns
Circuit Breakers vs Retries
Retries:
- Handle transient failures
- Attempt delivery multiple times
- Useful for temporary network glitches
- Can waste resources if service is down
Circuit Breakers:
- Handle sustained failures
- Stop attempting after threshold
- Prevent resource waste
- Allow automatic recovery
Use both together for comprehensive error handling.
Circuit Breakers vs Health Checks
Health Checks:
- Proactive monitoring
- Separate from actual traffic
- Detect issues before they impact users
- Require destination support
Circuit Breakers:
- Reactive protection
- Based on actual delivery attempts
- Detect failures in real traffic
- Work with any destination
Complementary: Use health checks to prevent failures, circuit breakers to handle them.
Next Steps
Now that you understand Circuit Breakers, explore related features:
- Configure Retries to handle transient failures
- Monitor Issues to track persistent problems
- Set up Destinations with reliability in mind
- Build Bridges with circuit breaker awareness
- Track Events to understand failure patterns
Circuit Breakers are a critical reliability feature that makes Hooklistener resilient and self-healing. By automatically protecting your webhook workflows from cascading failures, circuit breakers ensure your integrations remain stable even when individual services experience issues.