
Circuit Breakers

Circuit Breakers are an automatic reliability feature that protects your webhook workflows from cascading failures. When a destination is experiencing issues, circuit breakers temporarily stop sending webhooks to prevent overwhelming the failing service, giving it time to recover.

Overview

Circuit breakers work like the electrical circuit breakers in your home: when too much current flows (too many errors), the circuit "opens" to prevent damage. Unlike a household breaker, which has to be reset by hand, a webhook circuit breaker "closes" again automatically once conditions improve.

Key features:

  • Automatic protection: No manual intervention required
  • Per-destination: Each destination has its own circuit breaker
  • Self-healing: Automatically tests for recovery
  • Configurable thresholds: Tune sensitivity to your needs
  • Real-time monitoring: View circuit breaker status in the dashboard
  • Manual controls: Reset circuits when needed

Why Circuit Breakers Matter

The Cascading Failure Problem

Without circuit breakers, webhook delivery failures can cause serious issues:

Scenario: Your destination API goes down

  • Hooklistener keeps attempting delivery
  • Each attempt times out (adds latency)
  • Retry queue grows rapidly
  • System resources are consumed by failing requests
  • Other healthy destinations may be affected
  • Recovery takes longer even after service is restored

With circuit breakers:

  • After a few failures, the circuit opens
  • Subsequent webhooks are blocked immediately (no timeout wait)
  • System resources are preserved
  • Periodic tests check for recovery
  • Once healthy, the circuit closes automatically
  • Normal operation resumes quickly

Benefits

  1. Prevent resource exhaustion: Stop wasting resources on failing destinations
  2. Faster failure detection: Know immediately when a service is down
  3. Automatic recovery: Resume once service is healthy
  4. Protect downstream systems: Prevent overwhelming already-struggling services
  5. Improve overall reliability: Isolate failures to specific destinations

Circuit Breaker States

Closed (Normal Operation)

The circuit is closed when everything is working normally.

Behavior:

  • All webhook deliveries are attempted
  • Failures are counted and tracked
  • If failures exceed threshold, circuit opens

Status indicator: 🟢 Healthy

Open (Service Failing)

The circuit opens when too many failures occur within a time window.

Behavior:

  • Webhook deliveries are blocked immediately
  • No delivery attempts are made (prevents timeouts)
  • Webhooks are marked as "circuit breaker blocked"
  • After a timeout period, circuit transitions to half-open

Status indicator: 🔴 Open

Threshold configuration:

  • Failure threshold: 5 failures within 60 seconds (default)
  • Time window: 60 seconds (default)
  • Recovery timeout: 60 seconds before testing (default)

Half-Open (Testing Recovery)

The circuit enters half-open state after the recovery timeout expires.

Behavior:

  • Limited webhook deliveries are attempted (test mode)
  • If deliveries succeed, circuit closes
  • If deliveries fail, circuit reopens immediately
  • Success threshold determines when to close

Status indicator: 🟡 Testing

Recovery configuration:

  • Success threshold: 2 successful deliveries required (default)
  • Single failure: Immediately reopens circuit

How Circuit Breakers Work

State Transitions

Closed → Open → Half-Open → Closed
Half-Open → Open (any failure during testing)

Closed to Open:

  • Triggered by: 5+ failures within 60-second window
  • Action: Block all subsequent deliveries
  • Duration: Stays open for 60 seconds

Open to Half-Open:

  • Triggered by: 60 seconds elapsed since last failure
  • Action: Allow limited test deliveries
  • Purpose: Check if service recovered

Half-Open to Closed:

  • Triggered by: 2 consecutive successful deliveries
  • Action: Resume normal operation
  • Effect: Circuit fully recovered

Half-Open to Open:

  • Triggered by: Any single failure during testing
  • Action: Block deliveries again
  • Duration: Wait another 60 seconds before retrying
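The transitions above can be sketched as a small per-destination state machine. This is a minimal illustration assuming the documented defaults (5 failures within 60 seconds, a 60-second recovery timeout, 2 successes to close); the `CircuitBreaker` class and its method names are hypothetical and not part of Hooklistener's API. The clock is injectable so the timing can be tested without sleeping.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical per-destination breaker using the documented defaults."""

    def __init__(self, failure_threshold=5, success_threshold=2,
                 timeout=60.0, window_size=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout          # seconds to stay open before testing recovery
        self.window_size = window_size  # sliding window (seconds) for counting failures
        self.clock = clock
        self.state = State.CLOSED
        self.failures = deque()         # timestamps of recent failures, oldest first
        self.successes = 0              # successful test deliveries while half-open
        self.opened_at = None

    def allow_request(self):
        """Return True if a delivery may be attempted right now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.timeout:
                return False  # blocked immediately: no network call, no timeout wait
            # Recovery timeout elapsed: start testing.
            self.state = State.HALF_OPEN
            self.successes = 0
        # Simplification: every request passes while half-open; a production
        # version would cap the number of concurrent test deliveries.
        return True

    def record_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self._close()

    def record_failure(self):
        now = self.clock()
        if self.state is State.HALF_OPEN:
            self._open(now)  # any single failure during testing reopens the circuit
            return
        self.failures.append(now)
        # Drop failures that have aged out of the sliding window.
        while self.failures and now - self.failures[0] >= self.window_size:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._open(now)

    def reset(self):
        """Manual reset: close the circuit and clear the failure count."""
        self._close()

    def _open(self, now):
        self.state = State.OPEN
        self.opened_at = now
        self.successes = 0

    def _close(self):
        self.state = State.CLOSED
        self.failures.clear()
        self.successes = 0
```

Because `allow_request` returns immediately while the circuit is open, a blocked delivery costs no network round trip, which is what keeps resources free during an outage.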

Failure Detection

Circuit breakers count these as failures:

Network Errors:

  • Connection timeout
  • Connection refused
  • DNS resolution failure
  • SSL/TLS errors

HTTP Status Codes:

  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • 429 Too Many Requests (rate limiting)

Timeouts:

  • Request timeout (>30 seconds)
  • Response timeout

Not counted as failures:

  • 4xx errors (except 429) - these indicate client errors, not service failures
  • 2xx/3xx responses - successful deliveries
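The classification rules above can be expressed as a single predicate. This sketch rests on a few assumptions: the function name and parameters are hypothetical, it treats only the 5xx codes listed on this page as failures (the page does not say how other 5xx codes such as 501 are handled), and it treats an attempt that produced no HTTP response as a network error.

```python
from typing import Optional

# Status codes this page lists as counting toward the failure threshold.
FAILURE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def counts_as_failure(status: Optional[int] = None, *, network_error: bool = False,
                      elapsed: float = 0.0, request_timeout: float = 30.0) -> bool:
    """Return True if a delivery attempt should count toward the circuit breaker threshold."""
    if network_error:
        # Connection timeout or refused, DNS resolution failure, SSL/TLS error.
        return True
    if elapsed > request_timeout:
        # The request ran longer than the 30-second limit.
        return True
    if status is None:
        # Assumption: no HTTP response at all is treated like a network error.
        return True
    # 2xx/3xx succeed; 4xx other than 429 are client errors, not service failures.
    return status in FAILURE_STATUS_CODES
```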

Time Window Tracking

Circuit breakers use a sliding time window to count failures:

60-second window example:

  • Failure at 10:00:00
  • Failure at 10:00:15
  • Failure at 10:00:30
  • Failure at 10:00:45
  • Failure at 10:00:55
  • Circuit opens (5 failures in 60 seconds)
  • At 10:01:55, circuit transitions to half-open (60 seconds after last failure)

Older failures are automatically removed:

  • Failure at 10:00:00 expires at 10:01:00
  • Only recent failures count toward threshold
  • Sliding window ensures responsive behavior
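Replaying the timeline above with a queue of timestamps shows how old failures fall out of the window. The `FailureWindow` class is hypothetical and the date is arbitrary; only the times and the 60-second/5-failure defaults come from the example.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureWindow:
    """Counts failures inside a sliding time window."""

    def __init__(self, window=timedelta(seconds=60), threshold=5):
        self.window = window
        self.threshold = threshold
        self.failures = deque()  # failure timestamps, oldest first

    def _expire(self, now):
        # A failure stops counting once it is a full window old.
        while self.failures and now - self.failures[0] >= self.window:
            self.failures.popleft()

    def add(self, ts):
        self.failures.append(ts)
        self._expire(ts)

    def count(self, now):
        self._expire(now)
        return len(self.failures)

    def should_open(self, now):
        return self.count(now) >= self.threshold


# Replay the example timeline (arbitrary date, times from the example).
start = datetime(2024, 1, 15, 10, 0, 0)
w = FailureWindow()
for seconds in (0, 15, 30, 45, 55):
    w.add(start + timedelta(seconds=seconds))

print(w.count(start + timedelta(seconds=55)))  # 5: threshold reached at 10:00:55
print(w.count(start + timedelta(seconds=60)))  # 4: the 10:00:00 failure expired at 10:01:00
```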

Monitoring Circuit Breakers

Dashboard View

View circuit breaker status for all destinations:

Navigate to: Bridges → Circuit Breakers

Information displayed:

  • Destination name
  • Current state (Closed/Open/Half-Open)
  • Failure count within window
  • Last failure time
  • Time until recovery attempt
  • Success count (in half-open state)

Color coding:

  • 🟢 Green: Closed (healthy)
  • 🟡 Yellow: Half-open (testing)
  • 🔴 Red: Open (failing)

Per-Destination Status

View detailed circuit breaker info for a specific destination:

Navigate to: Bridges → Select Bridge → Destinations → Circuit Breaker Status

Detailed metrics:

  • Current state and transition history
  • Failure timeline (last 100 failures)
  • Recovery attempts and outcomes
  • Configuration settings
  • Manual override status

Alerts and Notifications

Configure alerts for circuit breaker events:

Alert triggers:

  • Circuit opened (destination failing)
  • Circuit half-open (testing recovery)
  • Circuit closed (recovered)
  • Circuit repeatedly opening (chronic issues)

Notification channels:

  • Email
  • Slack
  • Discord
  • Telegram
  • Webhook

Managing Circuit Breakers

Automatic Behavior

Circuit breakers operate automatically without intervention:

  1. Monitor failures: Track delivery success/failure rates
  2. Open circuit: When threshold exceeded
  3. Wait for recovery: Timeout period
  4. Test recovery: Half-open state
  5. Resume or retry: Based on test results

Best practice: Let circuit breakers handle recovery automatically in most cases.

Manual Reset

Sometimes you need to manually reset a circuit breaker:

When to manually reset:

  • You've fixed the destination issue
  • You know the service is healthy again
  • Circuit breaker timeout is too long
  • You want to force immediate retry

How to reset:

Via Dashboard:

  1. Navigate to Circuit Breakers
  2. Find the destination with open circuit
  3. Click "Reset Circuit Breaker"
  4. Confirm the action

Via API:

curl -X POST https://api.hooklistener.com/api/v1/circuit-breakers/{destination_id}/reset \
-H "Authorization: Bearer YOUR_API_KEY"
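For scripted resets, the same call can be made with Python's standard library. This sketch mirrors the curl command above; the helper names are hypothetical, and the `success` and `message` fields it checks are taken from the example reset response later on this page. `build_reset_request` only constructs the request, so it can be inspected without touching the network.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.hooklistener.com/api/v1"


def build_reset_request(destination_id: str, api_key: str) -> urllib.request.Request:
    """Build, but do not send, the POST request from the curl example."""
    url = f"{API_BASE}/circuit-breakers/{urllib.parse.quote(destination_id, safe='')}/reset"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def reset_circuit_breaker(destination_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the reset request and return the decoded JSON response."""
    request = build_reset_request(destination_id, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Field names follow the example reset response on this page.
    if not body.get("success"):
        raise RuntimeError(f"Reset failed: {body.get('message')}")
    return body
```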

What happens on reset:

  • Circuit immediately transitions to Closed state
  • Failure count is cleared
  • Next webhook delivery is attempted normally
  • If destination still failing, circuit will reopen quickly

Caution: Don't repeatedly reset without fixing the underlying issue. This defeats the purpose of circuit breakers and can cause resource exhaustion.

Configuration

The default circuit breaker settings work well for most use cases. Customizing them per bridge is planned:

Per-Bridge configuration (coming soon):

{
  "circuit_breaker": {
    "failure_threshold": 10,
    "success_threshold": 3,
    "timeout": 120000,
    "window_size": 60000
  }
}

Configuration options:

  • failure_threshold: Failures before opening (default: 5)
  • success_threshold: Successes before closing from half-open (default: 2)
  • timeout: Milliseconds before testing recovery (default: 60000)
  • window_size: Time window for counting failures in milliseconds (default: 60000)
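Because per-bridge configuration is marked as coming soon, the following is only a hypothetical client-side model of the options above, useful for checking a config before submitting it. The class name and the validation rules (positive integers, no unknown keys) are assumptions; the field names and defaults come from the list above. Note that `timeout` and `window_size` are in milliseconds.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CircuitBreakerConfig:
    failure_threshold: int = 5   # failures before opening
    success_threshold: int = 2   # successes before closing from half-open
    timeout: int = 60000         # ms to wait before testing recovery
    window_size: int = 60000     # ms window for counting failures

    def __post_init__(self):
        # Assumed validation rule: every setting must be a positive integer.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{f.name} must be a positive integer, got {value!r}")

    @classmethod
    def from_json(cls, text: str) -> "CircuitBreakerConfig":
        # Unknown keys raise TypeError, which catches typos such as "failure_treshold".
        return cls(**json.loads(text).get("circuit_breaker", {}))

    @property
    def timeout_seconds(self) -> float:
        return self.timeout / 1000

    @property
    def window_seconds(self) -> float:
        return self.window_size / 1000
```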

Best Practices

Monitoring

  1. Set up alerts

    • Get notified when circuits open
    • Track recovery times
    • Monitor chronic failures
  2. Review circuit breaker patterns

    • Frequent opens indicate destination issues
    • Long recovery times suggest capacity problems
    • Consistent failures need investigation
  3. Track destinations separately

    • One failing destination shouldn't affect others
    • Monitor each destination independently
    • Identify problematic integrations

Responding to Open Circuits

When a circuit opens:

  1. Don't panic - circuit breakers are working as designed
  2. Check destination health - is the service actually down?
  3. Review error messages - what's causing the failures?
  4. Let it auto-recover - wait for automatic testing
  5. Fix root cause - address underlying issues
  6. Monitor recovery - ensure sustained health

What NOT to do:

  • ❌ Repeatedly reset circuit breakers
  • ❌ Disable circuit breakers to "force through"
  • ❌ Ignore chronic failures
  • ❌ Blame Hooklistener for destination issues

Destination Design

Design destination endpoints to work well with circuit breakers:

  1. Fast failure

    • Return errors quickly (don't let requests hang)
    • Use appropriate HTTP status codes
    • Timeout internal operations
  2. Idempotency

    • Handle duplicate deliveries gracefully
    • Use unique identifiers for deduplication
    • Design for at-least-once delivery
  3. Health indicators

    • Provide health check endpoints
    • Return 503 when degraded
    • Implement graceful degradation
  4. Rate limiting

    • Return 429 when rate limited
    • Include Retry-After header
    • Handle burst traffic
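A destination that follows these guidelines might look like the WSGI app below. It is a sketch under several assumptions: the `X-Event-Id` header used for deduplication is hypothetical (use whatever unique identifier your webhooks actually carry), the per-second limit and the in-memory `seen_ids` set stand in for real infrastructure, and `/health` is only an example path.

```python
import json
import time


class WebhookReceiver:
    """Minimal WSGI destination: fast failure, idempotency, 503 when degraded, 429 when busy."""

    def __init__(self, max_per_second=10, clock=time.monotonic):
        self.max_per_second = max_per_second  # hypothetical capacity limit
        self.clock = clock
        self.degraded = False                 # set True when a dependency is unhealthy
        self.seen_ids = set()                 # use a durable store in production
        self._second = None
        self._count = 0

    def __call__(self, environ, start_response):
        # Health check endpoint: report degradation without doing any work.
        if environ.get("PATH_INFO") == "/health":
            if self.degraded:
                return self._reply(start_response, "503 Service Unavailable", {"status": "degraded"})
            return self._reply(start_response, "200 OK", {"status": "ok"})

        # Fail fast with 503 instead of letting the request hang.
        if self.degraded:
            return self._reply(start_response, "503 Service Unavailable", {"error": "degraded"})

        # Fixed one-second window rate limit; answer 429 with Retry-After when exceeded.
        second = int(self.clock())
        if second != self._second:
            self._second, self._count = second, 0
        self._count += 1
        if self._count > self.max_per_second:
            return self._reply(start_response, "429 Too Many Requests",
                               {"error": "rate limited"}, [("Retry-After", "1")])

        # Idempotency: acknowledge duplicates without reprocessing them.
        event_id = environ.get("HTTP_X_EVENT_ID")  # hypothetical header name
        if event_id is not None and event_id in self.seen_ids:
            return self._reply(start_response, "200 OK", {"status": "duplicate"})

        # Process the payload here, applying timeouts to any internal calls.
        if event_id is not None:
            self.seen_ids.add(event_id)  # record only after successful processing
        return self._reply(start_response, "200 OK", {"status": "accepted"})

    @staticmethod
    def _reply(start_response, status, body, extra_headers=()):
        payload = json.dumps(body).encode("utf-8")
        headers = [("Content-Type", "application/json"),
                   ("Content-Length", str(len(payload)))]
        headers.extend(extra_headers)
        start_response(status, headers)
        return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, WebhookReceiver()).serve_forever()` serves the app on port 8000. Returning 503 and 429 promptly lets a circuit breaker detect trouble quickly instead of waiting for timeouts.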

Troubleshooting

Circuit Breaker Frequently Opening

Possible causes:

  • Destination service is unstable
  • Capacity issues at destination
  • Network connectivity problems
  • Incorrect authentication
  • Rate limiting issues

Diagnosis steps:

  1. Review error messages in event history
  2. Check destination logs for clues
  3. Test destination directly outside Hooklistener
  4. Monitor destination metrics (CPU, memory, response times)
  5. Verify network connectivity and DNS

Solutions:

  • Fix destination service issues
  • Increase destination capacity
  • Review and fix authentication
  • Implement rate limiting or request throttling
  • Use retries appropriately

Circuit Stuck in Half-Open

Possible causes:

  • Destination intermittently failing
  • Slow recovery time
  • Partial service degradation

What's happening:

  • Circuit tests recovery
  • Occasional successes don't reach the success threshold
  • A single failure sends the circuit back to open
  • Cycle repeats

Solutions:

  • Investigate intermittent failures
  • Lower success threshold (if appropriate)
  • Fix underlying stability issues
  • Consider temporary manual intervention

Too Many False Opens

Possible causes:

  • Threshold too sensitive
  • Transient network issues
  • Burst failures from legitimate traffic spikes

Solutions:

  • Increase failure threshold
  • Increase time window
  • Improve destination reliability
  • Implement better error handling

Circuits Not Opening When They Should

Possible causes:

  • Destination returning 4xx instead of 5xx
  • Errors not being counted properly
  • Threshold set too high

Diagnosis:

  • Review failure types in event history
  • Check HTTP status codes returned
  • Verify failures are actually errors

Solutions:

  • Fix destination to return correct status codes
  • Lower failure threshold if needed
  • Review what constitutes a failure

Circuit Breakers and Retries

Circuit breakers and retries work together:

When Circuit is Closed

  • Delivery is attempted
  • If it fails, retry logic kicks in
  • Multiple retries may occur
  • Each retry failure counts toward circuit breaker threshold
  • After enough failures, circuit opens

When Circuit is Open

  • Delivery is blocked immediately
  • No retry attempts are made
  • Webhook is marked as "circuit breaker blocked"
  • Retries will resume once circuit closes

When Circuit is Half-Open

  • Test deliveries are attempted
  • Successful deliveries close the circuit
  • Failed deliveries reopen the circuit
  • Regular retries resume when circuit closes
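The interplay described above can be sketched as a retry loop that checks the breaker before every attempt and reports every failed attempt to it. The `deliver` function and the deliberately simplified `CountingBreaker` (a plain failure count with no time window or recovery timer) are hypothetical, and a real retry policy would add backoff between attempts.

```python
from typing import Callable, Protocol


class Breaker(Protocol):
    def allow_request(self) -> bool: ...
    def record_success(self) -> None: ...
    def record_failure(self) -> None: ...


class CountingBreaker:
    """Simplified breaker for this example: opens after N failures and never recovers."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def allow_request(self) -> bool:
        return not self.open

    def record_success(self) -> None:
        # Failures are counted in a time window, not consecutively, so a
        # success does not clear the count.
        pass

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.open = True


def deliver(send: Callable[[], bool], breaker: Breaker, max_retries: int = 3) -> str:
    """Attempt one delivery plus retries; every failed attempt counts toward the breaker."""
    for _ in range(1 + max_retries):
        if not breaker.allow_request():
            return "circuit_breaker_blocked"  # no attempt made while open
        if send():
            breaker.record_success()
            return "delivered"
        breaker.record_failure()
    return "failed"
```

With a threshold of 5 and 3 retries, one webhook that always fails records 4 failures; the next webhook's first failure opens the circuit and its remaining retries are blocked.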

Best Practices

  1. Configure retries appropriately

    • More retries = more opportunities for circuit to detect failures
    • Too many retries = longer time to open circuit
    • Balance based on your needs
  2. Circuit breakers complement retries

    • Retries handle transient failures
    • Circuit breakers handle sustained failures
    • Together they provide robust error handling

Integration Examples

Monitoring Circuit Breaker Status

# Get all circuit breaker statuses
curl -X GET https://api.hooklistener.com/api/v1/circuit-breakers \
-H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "circuit_breakers": [
    {
      "destination_id": "550e8400-e29b-41d4-a716-446655440000",
      "destination_name": "Production API",
      "state": "closed",
      "failure_count": 0,
      "success_count": 0,
      "last_failure_at": null
    },
    {
      "destination_id": "660e8400-e29b-41d4-a716-446655440001",
      "destination_name": "Slack Notifications",
      "state": "open",
      "failure_count": 7,
      "success_count": 0,
      "last_failure_at": "2024-01-15T10:30:00Z",
      "recovery_at": "2024-01-15T10:31:00Z"
    }
  ]
}
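To act on this response programmatically, for example to alert on circuits that are not closed, you can filter the list and compute the time until recovery from `recovery_at`. This is a sketch: the function name is hypothetical, and because the page does not show how the half-open state is spelled in the API, it simply reports every state other than `closed`.

```python
import json
from datetime import datetime, timezone


def circuits_needing_attention(response_text, now=None):
    """Return (name, state, failure_count, seconds_until_recovery) for non-closed circuits."""
    now = now or datetime.now(timezone.utc)
    data = json.loads(response_text)
    rows = []
    for cb in data["circuit_breakers"]:
        if cb["state"] == "closed":
            continue
        wait = None  # recovery_at only appears on some entries
        recovery_at = cb.get("recovery_at")
        if recovery_at:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
            when = datetime.fromisoformat(recovery_at.replace("Z", "+00:00"))
            wait = max(0.0, (when - now).total_seconds())
        rows.append((cb["destination_name"], cb["state"], cb["failure_count"], wait))
    return rows
```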

Resetting a Circuit Breaker

curl -X POST https://api.hooklistener.com/api/v1/circuit-breakers/{destination_id}/reset \
-H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "success": true,
  "message": "Circuit breaker reset successfully",
  "destination_id": "550e8400-e29b-41d4-a716-446655440000",
  "new_state": "closed"
}

Comparison with Other Patterns

Circuit Breakers vs Retries

Retries:

  • Handle transient failures
  • Attempt delivery multiple times
  • Useful for temporary network glitches
  • Can waste resources if service is down

Circuit Breakers:

  • Handle sustained failures
  • Stop attempting after threshold
  • Prevent resource waste
  • Allow automatic recovery

Use both together for comprehensive error handling.

Circuit Breakers vs Health Checks

Health Checks:

  • Proactive monitoring
  • Separate from actual traffic
  • Detect issues before they impact users
  • Require destination support

Circuit Breakers:

  • Reactive protection
  • Based on actual delivery attempts
  • Detect failures in real traffic
  • Work with any destination

Complementary: Use health checks to prevent failures, circuit breakers to handle them.

Next Steps

Now that you understand Circuit Breakers, explore related features:

  1. Configure Retries to handle transient failures
  2. Monitor Issues to track persistent problems
  3. Set up Destinations with reliability in mind
  4. Build Bridges with circuit breaker awareness
  5. Track Events to understand failure patterns

Circuit Breakers are a critical reliability feature that makes Hooklistener resilient and self-healing. By automatically protecting your webhook workflows from cascading failures, circuit breakers ensure your integrations remain stable even when individual services experience issues.