Welcome to the exciting world of data engineering!
Every week, you'll get a chance to learn from real-world engineering as you delve into the fascinating world of building and maintaining data platforms.
From the technical challenges of design and implementation to the business considerations of working with clients, you'll get a behind-the-scenes look at what it takes to be a successful data engineer. So sit back, relax, and get ready to be inspired as we explore the world of data together!
Design Pattern of the week: Amazon EventBridge
Intro
Event-Driven Architecture (EDA) is a software design pattern that enables systems to respond to events in real time.
The idea is to design systems as a collection of loosely coupled components that communicate with each other through events.
This decentralization of processing allows for scalability, fault tolerance, and improved responsiveness, as events can be processed independently and asynchronously by multiple processing entities.
In the data engineering context, many workflows are triggered by events and generate events (such as status updates). These events usually need to be forwarded to one or several downstream services and/or to be formatted.
Example:
Several downstream services need to be informed about the creation of a new key in an S3 bucket.
You receive notifications from different providers in different formats, and you need a standard event envelope for your Lambda.
You want to handle S3 notifications coming from your data lake using a specific logic, depending on its prefix.
As a result, you may quickly find yourself creating many Lambda functions for handling these events.
However, in the serverless paradigm, Lambda functions represent a relatively low level of abstraction. The developer must tune many parameters to optimize their behavior, such as the polling strategy, batch size, timeout, timeout shield, memory size, and concurrency.
For this reason, a serverless best practice is to use Lambda only to encapsulate specific business logic and to use Amazon EventBridge for all the event-handling tasks.
Amazon EventBridge is well-known for its scheduler feature, which is used to trigger Lambda functions using a cron expression, but it provides other features for handling events, such as:
Event bus: for communication between N producers and N consumers.
Event pipe: for communication between 1 producer and 1 consumer (a new feature announced at re:Invent 2022).
EventBridge bus
An Amazon EventBridge bus receives events from different sources and dispatches them to different destinations.
The events can be from various sources such as custom applications, or SaaS applications, and can be dispatched to various destinations such as AWS Lambda functions, Amazon SNS topics, Amazon SQS queues, and other AWS services.
EventBridge provides a unified way to manage the routing and processing of events, making it easier to build event-driven applications.
The dispatching is defined by “rules”, with which you can:
filter events for a particular destination
transform events for a particular destination
Events are filtered by defining “event patterns”. For example, you can create an event pattern that matches S3 notifications emitted when a new key is created in the bucket “test-bucket-eventbridge”.
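Such a pattern might look like the following sketch (illustrative; it assumes the bucket has EventBridge notifications enabled, and the bucket name is the one used in this example):

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["test-bucket-eventbridge"]
    }
  }
}
```

Any event whose fields match every listed value is forwarded to the rule's targets.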
You can also build advanced filters using array matching, numeric ranges, and/or logic, etc.:
{
  "detail": {
    "$or": [
      { "c-count": [ { "numeric": [ ">", 0, "<=", 5 ] } ] },
      { "d-count": [ { "numeric": [ "<", 10 ] } ] },
      { "x-limit": [ { "numeric": [ "=", 3.018e2 ] } ] }
    ]
  }
}
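To build intuition for how such a pattern is evaluated, here is a minimal local sketch of the `$or`/`numeric` matching semantics. This is an illustration only, not EventBridge's actual matcher, which supports many more operators:

```python
def match_numeric(conditions, value):
    """Evaluate a numeric condition list such as [">", 0, "<=", 5] against a value."""
    ops = {
        ">": lambda a, b: a > b,
        ">=": lambda a, b: a >= b,
        "<": lambda a, b: a < b,
        "<=": lambda a, b: a <= b,
        "=": lambda a, b: a == b,
    }
    # Conditions alternate operator, bound; all pairs must hold.
    return all(ops[op](value, bound)
               for op, bound in zip(conditions[::2], conditions[1::2]))

def match_or(or_clauses, detail):
    """Return True if any $or clause matches the event's detail section."""
    for clause in or_clauses:
        for field, filters in clause.items():
            if field in detail and any(
                    match_numeric(f["numeric"], detail[field]) for f in filters):
                return True
    return False
```

For the pattern above, an event whose detail contains "c-count": 3 matches (0 < 3 <= 5), while one with "c-count": 7 and "d-count": 12 does not.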
When you create a rule, you attach a destination to it, and all events that match the rule's filter criteria are forwarded to the destination.
Additionally, when defining the target, you have the option to choose an "input-transformer," which allows you to modify the format of the event before it is sent to the destination.
This allows you to standardize the event data, making it easier to process and integrate with other systems.
For example, if we have the following input event:
{
  "version": "0",
  "id": "7bf73129-1428-4cd3-a780-95db273d1602",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2015-11-11T21:29:54Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:ec2:us-east-1:123456789012:instance/i-abcd1111"
  ],
  "detail": {
    "instance-id": "i-0123456789",
    "state": "RUNNING"
  }
}
First, you need to define an “Input Path” in order to parse the event and extract variables:
{
  "timestamp" : "$.time",
  "instance" : "$.detail.instance-id",
  "state" : "$.detail.state",
  "resource" : "$.resources[0]"
}
This JSON defines four variables, <timestamp>, <instance>, <state>, and <resource>, that can be reused in an “Input Template” to define the format of the output event:
{
  "timestamp" : <timestamp>,
  "message": "instance <instance> is in <state>",
  "Transformed" : "Yes"
}
The template would return the following event (note that because <timestamp> is not wrapped in quotes in the template, the resulting value is unquoted; quote string placeholders in the template to produce valid JSON):
{
  "timestamp" : 2015-11-11T21:29:54Z,
  "message": "instance i-0123456789 is in RUNNING",
  "Transformed": "Yes"
}
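To make the mechanics concrete, the substitution can be sketched locally like this. This is a simplified illustration, not EventBridge's actual implementation, which also supports predefined variables and stricter JSONPath handling:

```python
import re

def extract(event, path):
    """Resolve a simplified JSONPath such as '$.detail.instance-id' or '$.resources[0]'."""
    value = event
    for part in path.lstrip("$.").split("."):
        m = re.match(r"(.+)\[(\d+)\]$", part)  # handle a trailing [index]
        if m:
            value = value[m.group(1)][int(m.group(2))]
        else:
            value = value[part]
    return value

def apply_template(event, input_path, template):
    """Replace each <variable> placeholder in the template with its extracted value."""
    out = template
    for name, path in input_path.items():
        out = out.replace(f"<{name}>", str(extract(event, path)))
    return out
```

Feeding the EC2 event and the input path above through `apply_template` with the template "instance <instance> is in <state>" yields the message shown in the output event.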
EventBridge Pipes
Amazon EventBridge Pipes is a new feature announced by AWS during re:Invent 2022. It focuses on integration patterns between one producer and one consumer.
It is particularly useful for connecting two AWS services to each other without having to create a Lambda function in the middle.
For example, if you want to connect a DynamoDB stream to an SQS queue, you cannot do it with an Amazon EventBridge bus, because a bus does not accept a DynamoDB stream as a source. You would need a Lambda function as a middleman to read the events from the stream and publish them to the bus.
Pipes solve that problem by providing a serverless integration between two services: one source and one destination.
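As a sketch, the DynamoDB-stream-to-SQS pipe could be declared in CloudFormation roughly like this (illustrative only; the stream, queue, and role ARNs are placeholders you would replace with your own):

```json
{
  "Resources": {
    "OrdersPipe": {
      "Type": "AWS::Pipes::Pipe",
      "Properties": {
        "Name": "orders-stream-to-queue",
        "RoleArn": "arn:aws:iam::123456789012:role/orders-pipe-role",
        "Source": "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/2023-01-01T00:00:00.000",
        "SourceParameters": {
          "DynamoDBStreamParameters": { "StartingPosition": "LATEST" }
        },
        "Target": "arn:aws:sqs:us-east-1:123456789012:orders-events"
      }
    }
  }
}
```

The role must grant the pipe permission to read from the stream and send to the queue.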
With Amazon EventBridge Pipes, you have the same filtering and transforming capabilities as you do with Amazon EventBridge bus.
This means you can apply filters to the events as they flow from the source service to the destination service, and you can also use input transformers to modify the format of the events before they are sent to the destination.
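For instance, a pipe with a DynamoDB stream source could keep only newly inserted records with a source filter like the following sketch of the FilterCriteria structure (note that each Pattern is itself a JSON string):

```json
{
  "Filters": [
    { "Pattern": "{ \"eventName\": [\"INSERT\"] }" }
  ]
}
```

Records that do not match any pattern are dropped before they reach the enrichment or target step.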
A powerful feature of Amazon EventBridge Pipes is the optional enrichment step in the middle, which lets you further customize events for a specific consumer. The basic use case is a Lambda function that enriches each event.
However, the most interesting use case is to use the EventBridge API destination. With this feature, you can define an API call, including the URL, parameters, HTTP method, and authentication, and then parse the output from the API call to enrich your event. This enables you to dynamically fetch data from external systems and integrate it with your events.
The Path parameter automatically parses the JSON received from the API, extracts the event-id field, and adds it to the input event.
You don’t need to worry about setting up Lambda functions, integration roles, error handling, batch processing, etc.
Wrap-up
Amazon EventBridge is indeed a very powerful service that can greatly simplify the management of events in a serverless architecture. By providing features such as event buses, event pipes, and the ability to add enrichment Lambdas or API destinations, EventBridge enables you to centralize your event processing workflows and streamline the communication between your event producers and consumers. This helps you to reduce the number of Lambdas required in your architecture, minimize the complexity of your event processing pipelines, and improve the overall reliability and performance of your event-driven applications.
Thank you for reading.
-Ju
I would be grateful if you could help me to improve this newsletter. Don’t hesitate to share with me what you liked/disliked and the topic you would like to be tackled.
P.S. You can reply to this email; it will get to me.