Understanding Event-Driven Architecture from the Perspective of an API Server

Solution 1: Make APIs Asynchronous

Before processing with event-driven architecture, tasks can be handled with asynchronous APIs.

Since Go is relatively easy for concurrent processing, let’s use Go as an example:

package main

import "net/http"

func StartEc2Handler(w http.ResponseWriter, r *http.Request) {
    // ...
    go func() {
      // Code for launching an EC2 machine
		
      // Code for creating user notification about success/failure
    }()
    // ...
	w.WriteHeader(http.StatusAccepted) // 202 Accepted: Indicates the request has been received and the process has started
}

The code above processes the code for launching an EC2 machine using a goroutine and returns a 202 Accepted to indicate the request has been received and the operation has started.

Even with such a setup, the time from request to response can be greatly reduced as shown below.

As shown above, the user’s request takes only about 2 seconds, and the response can be received very quickly. However, the code for creating notifications has been added.

This is because 202 merely indicates that the request has been received and does not guarantee that the operation has been completed.

Therefore, when responding regardless of success/failure, notifications, or status codes should be also sent to inform the client about the success/failure of the task.

Of course, there is no issue code-wise even if you don’t handle notifications. However, if you’re a developer, not ensuring that users can know the success/failure of their tasks through notifications, harms the user experience greatly. It is necessary to consider adding notifications promptly.

Did we finish it…?

Not really. There are various constraints and problems in the world.

Once you start expanding code like this, you encounter several issues, and the following three were the ones I faced:

Asynchronous tasks generally involve a high computational load (otherwise it would be simple and problem-free to handle them synchronously). Thus, if traffic to the API increases, the server might easily go down.
If work is interrupted due to deployments, acts of God, etc., recovery is difficult and users do not receive notifications.
If it’s an external service API, rate limits generally exist, and if requests flood in, 429 Too Many Requests may be returned, meaning the external service might not process work properly.

Solution 2: Handling with Event-Driven Architecture

Event-driven architecture, as the name suggests, refers to an architecture that processes tasks based on events.

There is an Event Broker in the middle, and tasks are processed by having a Producer publish events and a Consumer subscribe to events.

Event Brokers include Kafka, RabbitMQ, AWS SNS, and so on, and Producers and Consumers exchange events through them.

Practically, the Consumer handles tasks (such as launching an EC2 machine), and the Producer publishes an event when a request is received, after which the API returns a 202 Accepted.

Let’s review the code.

In the example below, events are published using RabbitMQ and processed in a subscription manner.

package main

import (
    "net/http"
    
    amqp "github.com/rabbitmq/amqp091-go"
)

func PublishStartEc2EventHandler(w http.ResponseWriter, r *http.Request) {
    // Connecting to RabbitMQ and getting a channel
    conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        w.Write([]byte(`{"result": "rabbitmq connection error"}`))
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
	
    ch, err := conn.Channel()
    if err != nil {
        w.Write([]byte(`{"result": "rabbitmq channel error"}`))
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
	
    // Publishing an event
    event, err := json.Marshal(Event{
        Code: "START_EC2",
        RequestID: "some_uuid",
        Body: "some_body",
    })
    if err != nil {
        w.Write([]byte(`{"result": "json marshal error"}`))
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
    
    if err = ch.PublishWithContext(
		r.Context(),
        "",
        "aws-event-queue",
        false,
        false,
        amqp.Publishing{
            ContentType: "application/json",
            Body:        event,
    }); err != nil {
        w.Write([]byte(`{"result": "rabbitmq publish error"}`))
        w.WriteHeader(http.StatusInternalServerError)
        return
    }
	
    w.WriteHeader(http.StatusAccepted)
}

The code is somewhat lengthy. The code itself isn’t important, but the role of this API is crucial.

In this API, tasks are processed in the order of Connecting to RabbitMQ -> Creating an Event Object -> JSON Serialization -> Placing Event Object in Body and Publishing Event.

In practice, connections to the Queue are shared upon application startup and generally do not need to be established with each request, but for ease of understanding, I included it to help ease concerns about seeing conn.Channel() pop up unexpectedly.

Consumer code is quite lengthy depending on its function, so the following diagram omits it:

For routing, depending on the configuration, you can either create a separate Queue for each event and process it directly from the Queue, or as shown in the diagram above, handle routing within the Application.

However, as shown in the above diagram, the role of the Consumer can expand, and the burden of handling branches grows as the number of events increases.

As it scales up (more accurately, as it becomes difficult to manage), you might additionally set up a separate layer for event branching or even set up a load balancer to designate Consumers.

Why Lock?

In event-driven architecture, you can optionally lock during consumption.

A lock is a method to control concurrency for a specific resource, such as RedisLock or DB Lock.

Generally, holding a lock prevents another Consumer from processing a specific event or Queue. Once the task is completed, the lock is released, allowing another Consumer to process the said event.

I usually applied locks for the following reasons:

In a multi-container environment, multiple Consumers inevitably exist. Without a distinct lock, it becomes uncertain whether another container is working on the said Queue, potentially consuming simultaneously by the number of Containers. I applied locks to limit the number of events processed simultaneously to this effect.
When there are multiple Consumers, they might accidentally process the same event, so locks are applied to prevent this.
Many external services limit the number of requests processed at once to prevent overload. My case had a very restrictive Rate Limit of about 2 per second, so I applied locks to ensure only one external service API was processed at once.

Event Chain

One advantage of event-driven architecture is that it can create an event chain to connect several events together.

For instance, the EC2 handler had the following two tasks:

The task of launching an EC2 machine
Creating notifications about the launched EC2 machine

When made into an event chain, it looks like this:

When looking at the code, it becomes even simpler; a very straightforward implementation might look as follows:

func StartEc2Event(args any) {
	// Task to launch an EC2 machine
	// ... Some code
	
	// Publish a CreateAlarm event
	ch.PublishWithContext(
        r.Context(),
        "",
        "aws-event-queue",
        false,
        false,
        amqp.Publishing{
            ContentType: "application/json",
            Body:        event, // Roughly indicating that the task has been successful
    })
}

func CreateAlarm(args any) {
	// Create the notification
}

Although a Consumer function, the StartEc2Event function consumes and publishes a new event, allowing for easily implementing subsequent tasks based on the event-driven model.

The more standardized the events are, the more reusable they become. In the given example, the CreateAlarm function can be used not only in the specific case but also in various other asynchronous tasks that utilize such events.

From a productivity standpoint, the significant advantage lies in not having to implement events that have already been developed.

Disadvantages

It looks straightforward when presented in these diagrams, but depending on the situation, the code can become quite complex.

The deeper the event chain gets, the harder it becomes to follow its flow, and during debugging, understanding the relationships between events can be challenging.

Therefore, it is crucial to document the relationships between events thoroughly.

In Conclusion

Event-driven architecture, when used properly, is one of the most powerful architectures available. It allows you to control the number of consumed APIs to reduce server load and increase reusability through event chains.

From a user standpoint, it’s a grateful technology enabling them to engage in other tasks while reducing the wait time from request to response. However, monitoring also requires separate configurations to bundle things into one transaction, and improper event object creation could lead to much higher complexity compared to simply handling synchronous APIs.

Therefore, I believe it’s crucial to have comprehensive discussions with team members, establish conventions in advance, and maintain those conventions well when using event-driven architecture.

What is REST?, RESTful?, REST API?Testing Webhooks with webhook.site