7 Comments

Very very excited that you're building content on multi-engine data stacks. Keep it up! Gets me 🧃 up

First of all, I love your posts. Your contribution to the data community is much appreciated! Considering this stack, what would you introduce to source data from a database such as SQL Server?

author

Thanks, Kent!

You would need CDC. Depending on your needs, you can opt for open source (Debezium), AWS DMS, or any other vendor in that space. Changes would then be written to S3 (possibly with compaction) before downstream consumption.
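
To make the compaction step concrete, here is a minimal sketch of collapsing a change log into the latest row per key. The event shape (`seq`, `op`, `id`, `row`) and the function name `compact_changes` are assumptions made for illustration; they are not the format emitted by Debezium or AWS DMS, and a real pipeline would read and write files on S3 rather than in-memory lists.

```python
from typing import Any


def compact_changes(events: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    """Collapse a CDC change log into the latest row per primary key.

    Assumed (hypothetical) event shape:
      {"seq": int, "op": "insert" | "update" | "delete", "id": key, "row": dict}
    """
    state: dict[Any, dict[str, Any]] = {}
    # Apply changes in commit order so later events win over earlier ones.
    for event in sorted(events, key=lambda e: e["seq"]):
        key = event["id"]
        if event["op"] == "delete":
            # A delete removes the row from the compacted snapshot.
            state.pop(key, None)
        else:
            # Inserts and updates both overwrite with the newest row image.
            state[key] = event["row"]
    return state


# Hypothetical change log for a two-row table.
events = [
    {"seq": 1, "op": "insert", "id": 1, "row": {"id": 1, "name": "a"}},
    {"seq": 2, "op": "insert", "id": 2, "row": {"id": 2, "name": "b"}},
    {"seq": 3, "op": "update", "id": 1, "row": {"id": 1, "name": "a2"}},
    {"seq": 4, "op": "delete", "id": 2, "row": {}},
]
snapshot = compact_changes(events)
```

Downstream consumers would then read the compacted snapshot instead of replaying every individual change.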

Embedding DuckDB in Lambdas as an API layer to serve Snowflake results is worth exploring. I wonder how well it scales compared to syncing data to Redis/PSQL or solutions like Airfold/Tinybird.

author

Hi Julian,

I conducted some experiments in this blog post: https://juhache.substack.com/p/exploring-duckdb-aws-lambda.

The main drawback was the cold start of the Lambda function.

Indeed, benchmarking it with the tools you mentioned is a great idea.

Do you have any figures already at Airfold?

> I have no idea how well PyIceberg can read/write large data but I will test it in a future post!

Pretty well! See iceberg-python/#428 and iceberg-python/#444

https://github.com/apache/iceberg-python/issues/428

https://github.com/apache/iceberg-python/pull/444

author

Thanks a lot, Kevin! Looks really good indeed!
