Designing a Payment System That Never Loses a Transaction
Payment processing is the one place you cannot afford bugs. Learn idempotency keys, webhook reconciliation, state machines, and the patterns that prevent double charges and lost payments.
Why Payments Are Harder Than You Think
Consider this scenario: a user clicks 'Pay Now,' the request reaches Stripe, Stripe charges the card, but your server times out before receiving the response. Did the payment succeed? Your database says no. Stripe says yes. The user clicks 'Pay Now' again. Now they're double-charged.
This isn't hypothetical. It happens in production systems every day. Here's how to prevent it.
Pattern 1: Idempotency Keys
Every payment request must include a unique idempotency key. If the same key is sent twice, the payment gateway returns the original result instead of processing again.
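To make the gateway side of this concrete, here is a minimal in-memory sketch of the timeout-and-retry scenario described earlier. `FakeGateway` and its `charge` method are hypothetical stand-ins written for illustration, not part of any real gateway SDK; a real gateway stores keys durably and expires them after a while.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeGateway:
    """Hypothetical in-memory stand-in for a payment gateway."""
    results: dict = field(default_factory=dict)  # idempotency key -> stored result
    charges: list = field(default_factory=list)  # every charge actually made

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result without charging again.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        result = {"charge_id": f"ch_{len(self.charges) + 1}", "amount": amount_cents}
        self.charges.append(result)
        self.results[idempotency_key] = result
        return result


gateway = FakeGateway()
key = str(uuid.uuid4())  # generated once per payment, then reused on every retry

first = gateway.charge(key, 4999)  # imagine this response is lost to a timeout
retry = gateway.charge(key, 4999)  # the client retries with the same key

assert retry == first
assert len(gateway.charges) == 1  # the card was charged only once
```

The key has to be created before the first attempt and persisted with the payment record; generating a fresh key on each retry defeats the purpose. Stripe, for one, also rejects a reused key whose parameters differ from the original request, a check this sketch leaves out.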
import uuid

from django.db import models


class Payment(models.Model):
    class Status(models.TextChoices):
        PENDING = 'pending'
        PROCESSING = 'processing'
        COMPLETED = 'completed'
        FAILED = 'failed'
        REFUNDED = 'refunded'  # used by the state machine in Pattern 2

    # unique=True already creates a unique index, so a separate
    # db_index or UniqueConstraint on the same field is redundant.
    idempotency_key = models.UUIDField(default=uuid.uuid4, unique=True)
    status = models.CharField(
        max_length=20, choices=Status.choices, default=Status.PENDING
    )
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    stripe_payment_intent_id = models.CharField(
        max_length=255, null=True, blank=True
    )
    attempts = models.IntegerField(default=0)

Pattern 2: Payment State Machine
Never use boolean flags like is_paid. Use a state machine with explicit transitions. This rules out invalid states and, combined with an audit log of every transition, makes debugging far easier.
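from django.db import transaction

VALID_TRANSITIONS = {
    'pending': ['processing', 'failed'],
    'processing': ['completed', 'failed', 'pending'],  # pending = retry
    'completed': ['refunded'],
    'failed': ['pending'],  # retry
    'refunded': [],  # terminal state
}


class InvalidTransition(Exception):
    """Raised when a status change is not listed in VALID_TRANSITIONS."""


def transition_payment(payment, new_status):
    allowed = VALID_TRANSITIONS.get(payment.status, [])
    if new_status not in allowed:
        raise InvalidTransition(
            f"Cannot go from {payment.status} to {new_status}"
        )
    old_status = payment.status
    payment.status = new_status
    # Write the new status and its audit record in one transaction,
    # so neither can exist without the other.
    with transaction.atomic():
        payment.save()
        # Audit log every transition. PaymentAuditLog is assumed to be
        # a model defined elsewhere with these three fields.
        PaymentAuditLog.objects.create(
            payment=payment,
            from_status=old_status,
            to_status=new_status,
        )

Pattern 3: Webhook Reconciliation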
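Never rely on the response to your own API call as payment confirmation. Treat Stripe/Razorpay webhooks as the source of truth: your API response might time out, but the gateway keeps retrying webhook delivery until your endpoint acknowledges it or the retry window runs out. Because of those retries, a handler must tolerate receiving the same event more than once.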
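Since deliveries are retried, the same event can reach your endpoint more than once. One common way to handle that is to record each event ID and skip any you have already seen. The sketch below is a minimal in-memory illustration: `handle_event` and `processed_event_ids` are hypothetical names, and a production system would presumably keep the IDs in a database table with a unique column rather than in a set.

```python
processed_event_ids = set()  # assumption: a durable, unique-indexed store in production


def handle_event(event: dict, fulfilled: list) -> bool:
    """Apply an event once; return False if it was already handled."""
    if event["id"] in processed_event_ids:
        return False  # duplicate delivery: acknowledge it and do nothing
    if event["type"] == "payment_intent.succeeded":
        fulfilled.append(event["data"]["object"]["id"])
    processed_event_ids.add(event["id"])
    return True


orders = []
event = {
    "id": "evt_123",
    "type": "payment_intent.succeeded",
    "data": {"object": {"id": "pi_abc"}},
}

assert handle_event(event, orders) is True
assert handle_event(event, orders) is False  # retried delivery is ignored
assert orders == ["pi_abc"]                  # the order is fulfilled once
```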
import stripe
from django.conf import settings
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


# Webhook handler: the source of truth
@csrf_exempt
def stripe_webhook(request):
    payload = request.body
    sig = request.headers.get('Stripe-Signature')
    try:
        event = stripe.Webhook.construct_event(
            payload, sig, settings.STRIPE_WEBHOOK_SECRET
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        return HttpResponse(status=400)

    new_status = {
        'payment_intent.succeeded': 'completed',
        'payment_intent.payment_failed': 'failed',
    }.get(event['type'])
    if new_status is None:
        return HttpResponse(status=200)  # an event type we don't handle

    intent = event['data']['object']
    try:
        payment = Payment.objects.get(stripe_payment_intent_id=intent['id'])
    except Payment.DoesNotExist:
        # A non-2xx response makes Stripe retry later, which covers the
        # case where the webhook arrives before our own write.
        return HttpResponse(status=404)

    # Deliveries are retried, so skip events we have already applied.
    if payment.status != new_status:
        transition_payment(payment, new_status)
        if new_status == 'completed':
            # Fulfill the order HERE, not in the API response
            fulfill_order(payment.order)
    return HttpResponse(status=200)

The Golden Rules of Payment Processing
1. Every payment operation must be idempotent. Retries should be safe.
2. Webhooks are the source of truth, not API responses.
3. Log everything. Every state transition, every API call, every webhook. When a customer disputes a charge, you need a complete audit trail.
4. Test failure modes, not just success. Simulate timeouts, double submissions, webhook delays, and partial failures.
5. Reconcile daily. Compare your database against Stripe's records. Catch discrepancies before customers do.
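Rule 5 can be sketched as a pure comparison between two snapshots. The `reconcile` function below is a hypothetical helper, and it assumes both sides have already been exported as `{payment_intent_id: status}` maps with statuses normalized to the same vocabulary; fetching the gateway's records is left out.

```python
def reconcile(local: dict, gateway: dict) -> dict:
    """Compare {payment_intent_id: status} maps from our database and the gateway."""
    return {
        # Charged at the gateway but unknown to us: possible lost payments.
        "missing_locally": sorted(gateway.keys() - local.keys()),
        # Recorded by us but unknown to the gateway.
        "missing_at_gateway": sorted(local.keys() - gateway.keys()),
        # Known to both sides, but the statuses disagree.
        "status_mismatch": sorted(
            pid for pid in local.keys() & gateway.keys()
            if local[pid] != gateway[pid]
        ),
    }


local_records = {"pi_1": "completed", "pi_2": "pending", "pi_3": "completed"}
gateway_records = {"pi_1": "completed", "pi_2": "completed", "pi_4": "completed"}

report = reconcile(local_records, gateway_records)
assert report["missing_locally"] == ["pi_4"]
assert report["missing_at_gateway"] == ["pi_3"]
assert report["status_mismatch"] == ["pi_2"]
```

Here `pi_2` is the scenario from the start of the article: the gateway shows a completed charge while the local record is still pending, which usually means a response or webhook was missed.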
In payment systems, optimism is a bug. Assume every network call will fail, every webhook will be delayed, and every user will click the button twice.
— Stripe Engineering Blog
