Great write-up! Like Junaid mentioned, it is definitely underrated. Maybe because of the steep learning curve at the start and the other cloud-agnostic options, although it has definitely improved over the years!
This is a wonderful post, thanks for putting it together. I'm curious about your thoughts on using Step Functions to run a pipeline vs. Airflow, say MWAA. Is one better than the other in certain situations? Thanks again.
Thanks a lot, Daniel. I’m really happy you liked the article.
I think Airflow offers more advanced functionality for running pipelines.
However, MWAA is a pretty bad managed service:
• Deploying environments takes hours, and adding custom dependencies is extremely complicated.
• I’ve encountered situations where we lost workers for no apparent reason, with no way to debug, and even AWS support couldn’t help.
• MWAA generates so many metrics and alarms that, overall, the service ends up being quite expensive.
For these reasons, if you’re working within AWS, the integration and cost-effectiveness of Step Functions outweigh the more advanced data engineering features of Airflow (MWAA) in many cases.
Great one, I have used Step Functions quite a lot. I like it and I think it's underrated.
One of the e2e designs I shared also included it and covers Step Functions a bit: https://www.junaideffendi.com/cp/146962001
Triggering Step Functions is a bit tricky, though: SQS and SNS cannot trigger it directly, which is a bummer.
Yes, you’re right!
EventBridge Pipes help with that.
It is indeed strange that there isn’t a "native" integration.
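For anyone wondering what the Pipes workaround looks like, here is a minimal sketch using boto3's `pipes` client, assuming an SQS queue as the source and a Standard state machine as the target. The function names, the pipe name, and all ARNs are hypothetical placeholders, and the IAM role is assumed to already allow reading from the queue and starting executions. As far as I know, SNS is not a supported Pipes source, so a topic would first need to deliver into a queue.

```python
def build_pipe_request(name, role_arn, queue_arn, state_machine_arn, batch_size=1):
    """Build keyword arguments for the EventBridge Pipes CreatePipe call.

    All names and ARNs passed in are placeholders chosen by the caller.
    """
    return {
        "Name": name,
        "RoleArn": role_arn,  # role assumed to permit SQS reads and states:StartExecution
        "Source": queue_arn,
        "SourceParameters": {
            "SqsQueueParameters": {"BatchSize": batch_size}
        },
        "Target": state_machine_arn,
        "TargetParameters": {
            "StepFunctionStateMachineParameters": {
                # Asynchronous start; chosen here because the target is assumed
                # to be a Standard workflow.
                "InvocationType": "FIRE_AND_FORGET"
            }
        },
    }


def create_pipe(request):
    """Send the request to AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    return boto3.client("pipes").create_pipe(**request)


if __name__ == "__main__":
    request = build_pipe_request(
        name="sqs-to-sfn-example",  # hypothetical pipe name
        role_arn="arn:aws:iam::123456789012:role/example-pipe-role",
        queue_arn="arn:aws:sqs:eu-west-1:123456789012:example-queue",
        state_machine_arn="arn:aws:states:eu-west-1:123456789012:stateMachine:example",
    )
    print(request)
```

Calling `create_pipe(request)` would create the pipe. Building the request separately keeps the configuration easy to inspect before anything is sent to AWS.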