Discussion about this post

Sam Verhasselt:

Great project! It introduces me to a couple of technologies I don't get to try out in my day job.

Given the lack of predicate pushdown, would you consider connecting to the Iceberg table with a Spark Docker container? I believe you can specify a directory as your 'catalog'. The overhead of spinning up a Spark session seems worth it, since far less data needs to be read on each subsequent query.

On the other hand, if data transfer is free, you're not paying extra to scan more rows. It still requires more compute, though.
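For reference, a minimal sketch of the directory-backed ("hadoop") catalog setup Sam describes, assuming the Iceberg Spark runtime jar is on the classpath; the catalog name, warehouse path, and table name are placeholders, not anything from the original project:

```python
# Sketch: point Spark at a directory-backed ("hadoop") Iceberg catalog.
# Assumes the iceberg-spark-runtime jar is available to Spark; all
# names and paths below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-local")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/path/to/warehouse")
    .getOrCreate()
)

# Predicates in the WHERE clause are pushed down to Iceberg, which prunes
# data files via table metadata instead of scanning every row.
df = spark.sql("SELECT * FROM local.db.events WHERE event_date = '2024-01-01'")
```

This is the pushdown win: Iceberg's manifest metadata lets Spark skip whole files whose column stats can't match the filter, so subsequent filtered queries read much less data.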

Dan Goldin:

Very cool - love the example. In my mind the complexity still lies with the catalog and managing it across the different compute engines. It seems we're heading toward a world where there's a single "write" catalog and then a variety of "read" catalogs that are copies of the write one.
