Communication Is the Hard Part

In a microservices architecture, the complexity shifts from within services to between services. How services communicate determines the system's reliability, performance, and operational complexity. At Nexis Limited, Bondorix uses a combination of synchronous REST APIs, asynchronous event processing, and gRPC for internal high-throughput communication.

Synchronous Communication

REST APIs

HTTP-based REST APIs are the most common inter-service communication pattern. Service A sends an HTTP request to Service B and waits for the response. The pattern is simple, well understood, and easy to debug. The main drawback is that the two services are coupled in time: the caller is blocked while it waits, and if Service B is down or slow, Service A's request fails or stalls.
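The blocking request-response flow can be sketched in Python with only the standard library. This is a minimal illustration, not Bondorix code: the names `fetch_json` and `DownstreamUnavailable` are hypothetical, and the sketch assumes the downstream service returns a JSON object.

```python
import json
import urllib.error
import urllib.request


class DownstreamUnavailable(Exception):
    """Raised when the downstream service cannot be reached or returns an error."""


def fetch_json(url: str, timeout_seconds: float = 2.0) -> dict:
    """Send a GET request and block until the response arrives or the call fails."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        # The calling thread waits here until Service B answers.
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError as exc:
        # URLError, HTTPError, refused connections, and socket timeouts
        # are all subclasses of OSError.
        raise DownstreamUnavailable(f"GET {url} failed: {exc}") from exc
```

If Service B is unreachable, the caller gets `DownstreamUnavailable` and has to decide what to do next: fail its own request, return a fallback, or retry.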

gRPC

gRPC uses Protocol Buffers for serialization and HTTP/2 for transport. Compact binary encoding and multiplexed connections typically give it lower latency and higher throughput than JSON over HTTP/1.1. Strongly typed contracts generated from .proto files catch many integration errors at compile time. We use gRPC for internal service-to-service calls in Bondorix where performance matters. For example, the bidding engine communicates with the pricing service via gRPC.

Asynchronous Communication

Message Queues

Services communicate by sending messages to a queue (RabbitMQ, AWS SQS). The sender puts a message on the queue and continues without waiting. The receiver processes messages from the queue at its own pace. This decouples services in time: if the receiver is down, messages queue up and are processed when it recovers.
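The queue-up-and-recover behaviour can be illustrated with a small in-process sketch. Python's `queue.Queue` stands in for a real broker here. This is an assumption made for illustration and is not the RabbitMQ or SQS client API. The names `publish`, `consume_forever`, and `order_queue` are hypothetical.

```python
import queue
import threading

order_queue = queue.Queue()
processed = []


def publish(message: dict) -> None:
    """Enqueue a message and return immediately, without waiting for a consumer."""
    order_queue.put(message)


def consume_forever() -> None:
    """Pull messages one at a time and process them at the consumer's own pace."""
    while True:
        message = order_queue.get()
        if message is None:  # sentinel used in this sketch to stop the worker
            order_queue.task_done()
            break
        processed.append(message["order_id"])
        order_queue.task_done()


# The receiver is "down": messages accumulate while nothing consumes them.
for order_id in (1, 2, 3):
    publish({"order_id": order_id})

# The receiver "recovers" and drains the backlog in arrival order.
worker = threading.Thread(target=consume_forever, daemon=True)
worker.start()
order_queue.join()     # block until every queued message has been handled
order_queue.put(None)  # ask the worker to stop
worker.join()
```

The sender never blocks on the receiver. A real broker adds durability, acknowledgements, and redelivery, none of which this in-memory stand-in provides.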

Event Streaming

Event streaming (Apache Kafka) provides durable, replayable event logs that are ordered within each partition. Services publish events to topics, and any interested service can consume them. Unlike a traditional message queue, where a message is removed once it has been consumed, events are retained for a configured retention period and can be replayed. This supports event sourcing, analytics pipelines, and audit trails.
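The difference between a retained log and a consumed queue can be shown with a toy append-only log. This sketch is a hypothetical in-memory model (`EventLog` is not a Kafka client class), and it ignores partitions, retention limits, and persistence.

```python
from collections import defaultdict


class EventLog:
    """Append-only, in-memory log per topic. Reading never removes events."""

    def __init__(self) -> None:
        self._topics = defaultdict(list)

    def publish(self, topic: str, event: dict) -> int:
        """Append an event and return its offset within the topic."""
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1

    def read(self, topic: str, from_offset: int = 0) -> list:
        """Return all events at or after from_offset, in publication order."""
        return list(self._topics[topic][from_offset:])


log = EventLog()
log.publish("bids", {"bid_id": 1, "amount": 100})
log.publish("bids", {"bid_id": 2, "amount": 150})

# Two independent consumers each see the full history.
analytics_view = log.read("bids")
audit_view = log.read("bids")

# A consumer that already handled offset 0 resumes from offset 1.
resumed = log.read("bids", from_offset=1)
```

Because each consumer tracks its own offset, a new service can be added later and replay the topic from the beginning without affecting existing consumers.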

API Gateway

An API gateway sits between external clients and internal services. It handles:

  • Request routing to the appropriate backend service.
  • Authentication and authorization.
  • Rate limiting and throttling.
  • Request/response transformation.
  • Protocol translation (REST to gRPC, for example).

Popular options include Kong, Envoy, AWS API Gateway, and custom API gateways built with frameworks like Express or Gin.
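Two of the gateway responsibilities listed above, request routing and rate limiting, can be sketched in a few lines. The route table, service addresses, and class name below are hypothetical, and a production gateway such as Kong or Envoy implements these features very differently.

```python
from typing import Optional

# Hypothetical route table: path prefix -> internal service address.
ROUTES = {
    "/bids": "http://bidding-service:8080",
    "/prices": "http://pricing-service:8080",
    "/prices/history": "http://pricing-history-service:8080",
}


def resolve_backend(path: str) -> Optional[str]:
    """Return the backend for the longest matching path prefix, or None."""
    matches = [
        prefix for prefix in ROUTES
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window index) -> request count.
        # Old windows are never evicted in this sketch.
        self._counts = {}

    def allow(self, client_id: str, now: float) -> bool:
        """Record the request and return True if it is within the limit."""
        key = (client_id, int(now // self.window_seconds))
        count = self._counts.get(key, 0)
        if count >= self.limit:
            return False
        self._counts[key] = count + 1
        return True
```

A gateway would call `allow` first, and reject the request (typically with HTTP 429) before `resolve_backend` is ever consulted.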

Service Mesh

A service mesh (Istio, Linkerd) adds a proxy sidecar to each service Pod. The sidecar handles traffic management, security (mTLS), observability, and resilience patterns transparently. The application makes ordinary network calls, which are routed through the local sidecar, and the sidecar handles the rest. This is valuable for large microservices deployments but adds significant operational complexity.

Resilience Patterns

Circuit Breaker

When a downstream service fails repeatedly, the circuit breaker "opens" and stops sending requests, returning an error or fallback response immediately. After a timeout, the circuit breaker "half-opens" and sends a test request. If it succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again and the timeout restarts. This prevents cascading failures and gives the failing service time to recover.
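The closed, open, and half-open states can be sketched as a small class. This is a simplified illustration under stated assumptions: the class and parameter names are hypothetical, the breaker counts consecutive failures, it is not thread-safe, and in the half-open state it lets every call through rather than a single probe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock      # injectable so the timeout can be tested
        self._failures = 0       # consecutive failures seen so far
        self._opened_at = None   # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("downstream marked unhealthy; failing fast")
        # In the closed or half-open state the request is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream call, for example `breaker.call(fetch_price, item_id)` with a hypothetical `fetch_price` function, and catch `CircuitOpenError` to return a fallback response.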

Retry with Backoff

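Automatically retry failed requests with exponentially increasing delays (1s, 2s, 4s, 8s). Add jitter (random delay) to prevent thundering herd problems when many clients retry simultaneously. Set a maximum retry count to prevent infinite loops, and retry only operations that are safe to repeat (idempotent), since a request that appeared to fail may already have been processed.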
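A minimal sketch of retry with exponential backoff and jitter follows. The function name and parameters are hypothetical. The sketch uses "full jitter" (a random wait between zero and the exponential ceiling), which is one common choice among several, and it retries on any exception for brevity.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call func until it succeeds or max_attempts calls have failed."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential ceiling: 1s, 2s, 4s, 8s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random amount between 0 and the ceiling.
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist only so the delays can be controlled in tests. In real code, the `except` clause should be narrowed to transient errors such as timeouts or connection failures.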

Timeout

Always set timeouts on inter-service calls. A missing timeout means a slow downstream service can block the calling service indefinitely, consuming connections and degrading performance.
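The effect of a timeout can be demonstrated without a network by simulating a slow downstream call. In this sketch, `slow_pricing_lookup` is a hypothetical stand-in for a real client call, and the fallback payload is an arbitrary choice.

```python
import asyncio


async def slow_pricing_lookup(delay_seconds: float) -> dict:
    """Stand-in for a downstream call that takes delay_seconds to answer."""
    await asyncio.sleep(delay_seconds)
    return {"price": 100}


async def get_price(delay_seconds: float, timeout_seconds: float) -> dict:
    """Wait at most timeout_seconds for the downstream call, then fall back."""
    try:
        return await asyncio.wait_for(
            slow_pricing_lookup(delay_seconds), timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        # wait_for cancels the pending call, so it stops holding resources.
        return {"price": None, "error": "pricing service timed out"}
```

For example, `asyncio.run(get_price(delay_seconds=5.0, timeout_seconds=0.5))` returns the fallback after about half a second instead of waiting the full five seconds.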

Choosing the Right Pattern

  • Use synchronous (REST/gRPC): When the caller needs the response immediately to continue processing.
  • Use asynchronous (events/queues): When the caller does not need an immediate response, or when decoupling services is more important than immediate feedback.
  • Use a hybrid: Most real systems combine both — synchronous for user-facing request-response flows and asynchronous for background processing and cross-service data propagation.

Conclusion

Microservices communication design is fundamental to system reliability. Choose patterns based on latency requirements, coupling tolerance, and operational capabilities. Implement resilience patterns (circuit breakers, retries, timeouts) for all synchronous calls, and invest in observability to diagnose communication failures quickly.
