Deduplication
Deduplication prevents processing the same webhook multiple times when providers send duplicate requests. This feature protects your systems from duplicate charges, repeated notifications, and inconsistent state.
Overview
Webhook providers often retry failed deliveries, sometimes sending the same webhook multiple times. Without deduplication, your system might process the same order twice, send duplicate notifications, or create conflicting records.
Key features:
- Automatic detection: Identifies duplicate webhooks based on your configuration
- Three strategies: Choose how to identify duplicates (single field, include list, or exclude list)
- Configurable window: Set how long to remember webhook signatures (default: 5 minutes)
- Zero overhead: Rejected duplicates don't count toward your quota
- Source-level: Each Source can have independent deduplication rules
How Deduplication Works
Detection Process
When a webhook arrives at a Source with deduplication enabled:
- Extract identifier: Based on your strategy, extract the deduplication key from the webhook
- Hash the key: Create a SHA-256 hash of the identifier
- Check cache: Look up the hash in the deduplication cache
- Accept or reject:
  - If not found: Store hash and process webhook normally
  - If found: Reject as duplicate (return 200 OK but skip processing)
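The four steps can be sketched in a few lines of Python. This is an illustration only, not Hooklistener's implementation: the `DedupCache` class, the `handle_webhook` function, and the `key_field` parameter are hypothetical names, the cache is a plain in-memory set, and expiry is ignored here.

```python
import hashlib


class DedupCache:
    """Hypothetical in-memory stand-in for the deduplication cache."""

    def __init__(self):
        self._seen = set()

    def check_and_store(self, identifier):
        """Return True if the identifier was seen before, else remember it."""
        digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


def handle_webhook(cache, payload, key_field="id"):
    """Return (status_code, processed) for one incoming webhook body."""
    identifier = str(payload[key_field])   # extract identifier
    if cache.check_and_store(identifier):  # hash the key and check the cache
        return 200, False                  # duplicate: 200 OK, processing skipped
    return 200, True                       # new: hash stored, webhook processed
```

Both outcomes return 200, so the only visible difference between a new webhook and a duplicate is whether it was processed.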
Time Window (TTL)
Deduplication uses a time-to-live (TTL) window:
- Default: 5 minutes (300 seconds)
- Configurable: Set any positive integer (in seconds)
- After TTL expires: Same webhook is treated as new
Example: With 300-second TTL, a webhook received at 10:00:00 is stored until 10:05:00. If the same webhook arrives at 10:04:00, it's rejected as a duplicate. If it arrives at 10:06:00, it's processed as new.
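The timeline in this example can be reproduced with a small TTL cache. The sketch below is an assumption-laden illustration: `TtlDedupCache` and `hms` are hypothetical names, the clock is passed in explicitly so the behavior is easy to follow, a duplicate does not extend the window, and an arrival at exactly the expiry time is treated as new (the documentation does not specify that boundary).

```python
import hashlib


def hms(hours, minutes, seconds):
    """Convert a wall-clock time to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


class TtlDedupCache:
    """Hypothetical cache that remembers each key hash for ttl_seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._expires_at = {}  # key hash -> time at which it is forgotten

    def is_duplicate(self, key, now):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        expiry = self._expires_at.get(digest)
        if expiry is not None and now < expiry:
            return True  # still inside the window; expiry is not refreshed
        self._expires_at[digest] = now + self.ttl_seconds
        return False
```

With the default 300-second TTL, a key first seen at `hms(10, 0, 0)` is reported as a duplicate at `hms(10, 4, 0)` and as new again at `hms(10, 6, 0)`.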
Duplicate Response
When a duplicate is detected:
- HTTP 200 OK is returned (success)
- Webhook is NOT processed
- Event is NOT created
- Destinations are NOT notified
- Duplicate count is tracked for monitoring
Returning a success status tells the provider that the delivery was received, so it stops retrying a webhook that has already been handled.
Deduplication Strategies
Hooklistener offers three strategies for identifying duplicate webhooks. Choose based on your webhook provider's behavior.
Strategy 1: Single Field Path (body_path)
Extract a single field from the webhook body to identify duplicates.
Best for:
- Webhooks with unique IDs (most common)
- Simple deduplication needs
- Consistent webhook structure
Configuration:
{
"deduplication_config": {
"enabled": true,
"body_path": "body.id",
"ttl_seconds": 300
}
}
Example - Stripe webhook:
{
"id": "evt_1234567890",
"type": "payment_intent.succeeded",
"data": {
"object": {
"id": "pi_1234567890",
"amount": 1000
}
}
}
Use body_path: "body.id" to deduplicate on the event ID.
Example - GitHub webhook:
{
"delivery_id": "12345678-1234-1234-1234-123456789012",
"action": "opened",
"pull_request": {
"id": 987654321
}
}
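Use body_path: "body.delivery_id" to deduplicate on the delivery ID. Note that GitHub itself sends the delivery GUID in the X-GitHub-Delivery header rather than in the body, so for real GitHub payloads use the header path shown under Examples by Provider.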
Strategy 2: Include Fields (include_fields)
Hash multiple specific fields together to identify duplicates.
Best for:
- Webhooks without single unique ID
- Composite keys
- Partial payload matching
Configuration:
{
"deduplication_config": {
"enabled": true,
"include_fields": [
"body.order_id",
"body.customer_id",
"body.timestamp"
],
"ttl_seconds": 300
}
}
Example - E-commerce webhook:
{
"order_id": "ORD-12345",
"customer_id": "CUST-67890",
"timestamp": "2024-01-15T10:30:00Z",
"items": [...],
"shipping_address": {...}
}
All three fields (order_id, customer_id, timestamp) are combined and hashed together. A webhook is only considered a duplicate if ALL three match.
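A composite key of this kind might be built as follows. This is a minimal sketch, not the service's actual algorithm: the `include_fields_key` function is hypothetical, it takes top-level body field names without the `body.` prefix, and canonical JSON with sorted keys is one assumed way to combine the values before hashing.

```python
import hashlib
import json


def include_fields_key(body, fields):
    """Hash only the selected top-level body fields into one key."""
    selected = {name: body.get(name) for name in fields}
    # Sorted keys and fixed separators give a stable serialization.
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two payloads that share order_id, customer_id, and timestamp produce the same key even if items or shipping_address differ. Changing any one of the three fields produces a different key.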
Strategy 3: Exclude Fields (exclude_fields)
Hash the entire payload except specified fields.
Best for:
- Webhooks where most fields should be considered
- Excluding timestamps, metadata, or dynamic fields
- Complex payloads
Configuration:
{
"deduplication_config": {
"enabled": true,
"exclude_fields": [
"body.metadata.timestamp",
"body.metadata.server_id",
"headers.x-request-id"
],
"ttl_seconds": 300
}
}
Example - Monitoring webhook:
{
"alert_id": "alert-123",
"severity": "critical",
"message": "High CPU usage",
"metadata": {
"timestamp": "2024-01-15T10:30:00Z",
"server_id": "srv-456"
}
}
The webhook is deduplicated on everything EXCEPT the excluded fields (timestamp and server_id). If the alert content is identical but from a different server or time, it's still considered a duplicate.
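The exclude strategy can be pictured as "remove the listed paths, then hash what is left". The sketch below assumes exactly that and nothing more: `drop_path` and `exclude_fields_key` are hypothetical helpers, paths are relative to the body (no `body.` prefix), headers are not handled, and canonical JSON is an assumed serialization.

```python
import copy
import hashlib
import json


def drop_path(doc, dotted_path):
    """Delete a nested key such as 'metadata.timestamp'; ignore missing paths."""
    parts = dotted_path.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.get(part)
        if not isinstance(node, dict):
            return
    node.pop(parts[-1], None)


def exclude_fields_key(body, excluded):
    """Hash the whole body except the excluded paths."""
    remaining = copy.deepcopy(body)  # leave the caller's payload untouched
    for path in excluded:
        drop_path(remaining, path)
    canonical = json.dumps(remaining, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two alerts that differ only in metadata.timestamp and metadata.server_id hash to the same key, while a change to severity or message yields a new key.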
Field Path Syntax
Field paths specify which fields to extract from webhooks.
Basic Paths
Root prefixes:
- body. - Access request body (JSON)
- headers. - Access HTTP headers
- query. - Access query parameters
- path. - Access URL path parameters
Examples:
"body.id" // Top-level field
"body.data.object.id" // Nested field
"headers.x-github-delivery" // Header (case-insensitive)
"query.event_type" // Query parameter
Array Access
Access specific array elements by index:
"body.items[0].id" // First item
"body.tags[2]" // Third tag
"body.data.users[0].email" // Nested array access
Wildcard Selection
Use [*] to select all array elements:
"body.items[*].id" // All item IDs
"body.tags[*]" // All tags
"body.data.orders[*].total" // All order totals
When using wildcards with body_path, all matching values are combined and hashed together.
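To make the path syntax concrete, here is a small resolver that handles dotted names, numeric indexes, and the [*] wildcard. It is a sketch of the syntax described above, not the parser Hooklistener uses: `resolve_path` is a hypothetical function, the request is modeled as a plain dictionary with `body`, `headers`, `query`, and `path` keys, and case-insensitive header matching is left out.

```python
import re

# A token is either a name (between dots) or a bracketed index / wildcard.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+|\*)\]")


def resolve_path(request, path):
    """Return the list of values matched by a path such as 'body.items[*].id'."""
    nodes = [request]
    for name, index in _TOKEN.findall(path):
        next_nodes = []
        for node in nodes:
            if name:  # dictionary key, e.g. 'body' or 'id'
                if isinstance(node, dict) and name in node:
                    next_nodes.append(node[name])
            elif index == "*":  # wildcard: every array element
                if isinstance(node, list):
                    next_nodes.extend(node)
            else:  # numeric index into an array
                position = int(index)
                if isinstance(node, list) and position < len(node):
                    next_nodes.append(node[position])
        nodes = next_nodes
    return nodes
```

A path without a wildcard yields at most one value. A wildcard path yields every match in order, and a path that does not exist yields an empty list.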
Examples by Provider
Stripe:
{
"body_path": "body.id"
}
GitHub:
{
"include_fields": [
"headers.x-github-delivery",
"body.action"
]
}
Shopify:
{
"body_path": "body.id"
}
Custom webhooks:
{
"include_fields": [
"body.transaction_id",
"body.event_type"
]
}
Configuring Deduplication
Via Dashboard
Step 1: Edit Source
- Navigate to Sources
- Select your Source
- Click "Edit"
Step 2: Enable Deduplication
- Find "Deduplication" section
- Toggle "Enable Deduplication" on
- Choose strategy:
  - Single Field: Enter body_path
  - Include Fields: Add field paths to include
  - Exclude Fields: Add field paths to exclude
- Set TTL (default: 300 seconds)
- Click "Save"
Via API
Create Source with deduplication:
curl -X POST https://api.hooklistener.com/api/v1/sources \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Stripe Webhooks",
"type": "stripe",
"deduplication_config": {
"enabled": true,
"body_path": "body.id",
"ttl_seconds": 300
}
}'
Update existing Source:
curl -X PATCH https://api.hooklistener.com/api/v1/sources/{source_id} \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"deduplication_config": {
"enabled": true,
"include_fields": ["body.order_id", "body.customer_id"],
"ttl_seconds": 600
}
}'
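The same update can be issued from Python with only the standard library. The sketch below mirrors the curl call above (URL, method, headers, and body). The `build_update_request` helper is a hypothetical name, and nothing is assumed about the shape of the API's response.

```python
import json
import urllib.request

API_BASE = "https://api.hooklistener.com/api/v1"


def build_update_request(source_id, api_key, dedup_config):
    """Build (but do not send) a PATCH request that updates deduplication_config."""
    body = json.dumps({"deduplication_config": dedup_config}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/sources/{source_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Building the request separately makes it easy to inspect the payload before anything goes over the network.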
Use Cases
Stripe Payment Webhooks
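Problem: Stripe retries webhook deliveries that fail or time out, so your handler may process the same payment event twice (for example, fulfilling an order or crediting an account twice).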
Solution:
{
"deduplication_config": {
"enabled": true,
"body_path": "body.id",
"ttl_seconds": 3600
}
}
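Stripe's id field is unique per event. Use a longer TTL (1 hour here) because Stripe spaces its retries over a longer period than most providers. Check Stripe's current retry schedule and raise the TTL if retries can arrive later than that.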
GitHub Push Events
Problem: GitHub may send duplicate push events during network issues.
Solution:
{
"deduplication_config": {
"enabled": true,
"include_fields": [
"headers.x-github-delivery",
"body.after"
],
"ttl_seconds": 300
}
}
Combine delivery ID and commit SHA to ensure uniqueness.
Shopify Order Webhooks
Problem: Shopify webhooks may arrive multiple times during processing.
Solution:
{
"deduplication_config": {
"enabled": true,
"body_path": "body.id",
"ttl_seconds": 600
}
}
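Use the order ID for deduplication with a 10-minute window. Different events for the same order (for example, a create followed by an update) carry the same order ID, so if one Source receives several order topics, add a field that distinguishes them.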
Custom Application Webhooks
Problem: Your application sends webhooks that may duplicate during retries.
Solution - Without unique ID:
{
"deduplication_config": {
"enabled": true,
"include_fields": [
"body.user_id",
"body.action",
"body.resource_id"
],
"ttl_seconds": 300
}
}
Solution - With timestamp to exclude:
{
"deduplication_config": {
"enabled": true,
"exclude_fields": [
"body.timestamp",
"body.metadata.server_id"
],
"ttl_seconds": 300
}
}
High-Frequency Event Streams
Problem: IoT devices or monitoring systems send rapid events that may duplicate.
Solution:
{
"deduplication_config": {
"enabled": true,
"include_fields": [
"body.device_id",
"body.event_type",
"body.value"
],
"ttl_seconds": 60
}
}
Use shorter TTL (1 minute) for high-frequency streams where duplicates arrive quickly.
Best Practices
Choosing a Strategy
- Use body_path when:
  - Webhook has a unique ID field
  - Structure is consistent
  - Single field is sufficient
- Use include_fields when:
  - No single unique field exists
  - Need composite key
  - Want explicit control over what's checked
- Use exclude_fields when:
  - Most fields should be considered
  - Easier to list exclusions than inclusions
  - Payload structure varies slightly
Setting TTL
Short TTL (60-120 seconds):
- High-frequency events
- Quick retry cycles
- Low memory usage priority
Medium TTL (300-600 seconds):
- Standard webhooks
- Most providers
- Balanced approach
Long TTL (3600+ seconds):
- Infrequent webhooks
- Providers with long retry windows
- Critical duplicate prevention
Rule of thumb: Set TTL to 2-3x the longest gap between your provider's retries, so that a retry always arrives while the original is still remembered.
Field Selection
- Always include unique identifiers
  - Event IDs
  - Transaction IDs
  - Delivery IDs
- Consider temporal fields
  - Include if part of uniqueness
  - Exclude if generated per request
- Test with real webhooks
  - Use sample payloads
  - Verify deduplication works
  - Check for false positives
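One way to check for false positives before enabling a configuration is to run sample payloads through the candidate field list offline and look for distinct payloads that collapse to the same key. The sketch below is a hypothetical helper (`find_collisions`), not a Hooklistener tool. It assumes top-level body field names and the same canonical-JSON hashing used for illustration elsewhere on this page.

```python
import hashlib
import json
from collections import defaultdict


def find_collisions(payloads, fields):
    """Return groups of differing payloads that share one deduplication key."""
    groups = defaultdict(list)
    for payload in payloads:
        selected = {name: payload.get(name) for name in fields}
        canonical = json.dumps(selected, sort_keys=True)
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        groups[key].append(payload)
    # Keep only groups that contain at least two different payloads.
    return [
        group
        for group in groups.values()
        if len({json.dumps(p, sort_keys=True) for p in group}) > 1
    ]
```

Every returned group is a candidate false positive to review: the payloads differ, yet the chosen fields would treat them as the same webhook.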
Performance
- Keep include_fields lists short (< 10 fields)
- Use simple paths (avoid deep nesting when possible)
- Avoid wildcards unless necessary
- Monitor deduplication metrics
Monitoring Deduplication
Metrics to Track
Duplicate rate:
duplicate_webhooks / total_webhooks * 100
Typical rates:
- 0-5%: Normal (occasional retries)
- 5-15%: Common during provider issues
- 15%+: Investigate provider or configuration
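The formula and the bands above translate directly into code. In this sketch the function names and the band labels are hypothetical, and the boundary values 5 and 15 are assigned to the higher band, which is an assumption because the ranges above overlap at those points.

```python
def duplicate_rate(duplicates, total):
    """Duplicate rate as a percentage of all webhooks received."""
    if total == 0:
        return 0.0
    return 100 * duplicates / total


def classify_rate(rate):
    """Map a duplicate rate (in percent) to one of the bands listed above."""
    if rate < 5:
        return "normal"
    if rate < 15:
        return "provider issues likely"
    return "investigate"
```

For example, 250 duplicates out of 10,000 requests gives a rate of 2.5%, which falls in the normal band.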
Dashboard Metrics
View in Sources → [Your Source] → Metrics:
- Total webhooks received
- Duplicate webhooks rejected
- Duplicate rate over time
- Deduplication hit rate
API Metrics
curl -X GET https://api.hooklistener.com/api/v1/sources/{source_id}/stats \
-H "Authorization: Bearer YOUR_API_KEY"
Response:
{
"total_requests": 10000,
"duplicates_rejected": 250,
"duplicate_rate": 2.5,
"period": "24h"
}
Troubleshooting
No Duplicates Detected
Symptoms:
- Deduplication enabled
- Expecting duplicates
- All webhooks processed
Causes:
1. Wrong field path:
# Check actual webhook payload
curl -X GET https://api.hooklistener.com/api/v1/events/{event_id} \
-H "Authorization: Bearer YOUR_API_KEY"
Verify the field path exists in the payload.
2. Field value changes:
- Timestamps in deduplication path
- Random IDs generated per request
- Dynamic content
Solution: Exclude dynamic fields or use include strategy.
3. TTL too short:
- Duplicates arrive after TTL expires
- Increase TTL to cover retry window
4. Deduplication not saved:
- Check Source configuration
- Verify enabled: true
- Confirm strategy is set
Too Many False Positives
Symptoms:
- Legitimate webhooks rejected as duplicates
- Different events marked as duplicates
Causes:
1. Too broad exclusion:
// Problem: Excludes too much
{
"exclude_fields": [
"body.data" // Excludes entire data object
]
}
Solution: Be more specific
{
"exclude_fields": [
"body.data.timestamp",
"body.data.metadata"
]
}
2. Missing unique identifier:
// Problem: Only using non-unique fields
{
"include_fields": [
"body.type",
"body.status"
]
}
Solution: Add unique field
{
"include_fields": [
"body.id", // Unique!
"body.type",
"body.status"
]
}
3. TTL too long:
- Keeping signatures too long
- Different events treated as duplicates
- Reduce TTL to appropriate window
Deduplication Not Working
Symptoms:
- Configuration looks correct
- Still processing duplicates
Debug steps:
1. Verify configuration:
curl -X GET https://api.hooklistener.com/api/v1/sources/{source_id} \
-H "Authorization: Bearer YOUR_API_KEY"
Check deduplication_config is set correctly.
2. Test field path:
# Send test webhook
curl -X POST https://api.hooklistener.com/api/v1/sources/{source_id}/ingest \
-H "Content-Type: application/json" \
-d '{
"id": "test-123",
"type": "test"
}'
# Send duplicate immediately
curl -X POST https://api.hooklistener.com/api/v1/sources/{source_id}/ingest \
-H "Content-Type: application/json" \
-d '{
"id": "test-123",
"type": "test"
}'
The second request should be detected as a duplicate: it still returns 200 OK, but no new event is created and no destination is notified.
3. Check logs: Look for deduplication messages:
- "Duplicate request payload detected, skipping processing"
- "Deduplication check failed"
4. Verify field exists:
Ensure body_path or include_fields point to existing fields in payload.
Advanced Configuration
Multiple Sources, Different Rules
Configure each Source independently:
Production Source:
{
"name": "Production Stripe",
"deduplication_config": {
"enabled": true,
"body_path": "body.id",
"ttl_seconds": 3600
}
}
Development Source:
{
"name": "Development Stripe",
"deduplication_config": {
"enabled": false
}
}
Combining with Filters
Deduplication occurs BEFORE filters:
- Webhook received
- Deduplication check (if enabled)
- If not duplicate: Apply filters
- If passes filters: Apply transformations
- Forward to destinations
This means duplicates are rejected regardless of filter configuration.
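The ordering matters because a duplicate never reaches the filter stage. The sketch below models the stages as plain callables to show that order. The `process` function, its parameters, and the returned labels are all hypothetical and do not come from the Hooklistener API.

```python
def process(webhook, is_duplicate, passes_filters, transform, forward):
    """Run one webhook through the stages in the documented order."""
    if is_duplicate(webhook):        # deduplication check comes first
        return "duplicate"           # filters never see a duplicate
    if not passes_filters(webhook):  # filters apply only to non-duplicates
        return "filtered"
    forward(transform(webhook))      # transform, then forward to destinations
    return "forwarded"
```

One consequence of this order is that a webhook dropped by a filter has already been recorded by the deduplication step, so a retry of it within the TTL is rejected as a duplicate and never re-evaluated against the filters.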
Temporary Disabling
Temporarily disable without losing configuration:
curl -X PATCH https://api.hooklistener.com/api/v1/sources/{source_id} \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"deduplication_config": {
"enabled": false
}
}'
The rest of the configuration is preserved; deduplication is simply switched off until you set enabled back to true.
Next Steps
Now that you understand deduplication:
- Configure Sources with deduplication enabled
- Monitor Events to verify deduplication is working
- Use Filters for additional webhook routing
- Track Issues if deduplication problems occur
Deduplication is essential for production webhook workflows. Configure it properly to prevent duplicate processing and ensure data consistency in your systems.