This is a great post. Is the same applicable to Azure Blob? Also, even though the basic functionality of a catalog is to map a table to its latest metadata file, I still feel catalogs will play a role in much more than just mapping: access control, data governance, search and discovery, and data lineage.
Thanks, Saravan. Sure, everything you mentioned is relevant, but having the catalog out of the write path would make things more resilient.
The Iceberg catalog is about more than just managing reads/writes. It will also manage authorization/governance and handle table maintenance. That keeps your lakehouse truly open: future tools and engines can consume the data without forcing a migration of either the data or the RBAC rules.
Yes, exactly. In that context, it’s not directly tied to the concept of an open table format but rather functions as an RBAC mechanism for the files in a data lake.
So by "(..)By adding the If-Match header to a PUT operation, S3 will return an error if the object already exists.(..)", I suppose the API uses a combination of object key + version?
Yes, the object key. If it’s a versioned bucket, the new object becomes the most recent version of the key.
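For anyone curious what that compare-and-swap looks like in practice, here's a toy in-memory model of the If-None-Match semantics (not real S3 calls; the bucket class, key names, and writer labels are made up for illustration). With boto3 against real S3 this corresponds to `put_object(..., IfNoneMatch="*")`, which fails with HTTP 412 PreconditionFailed when the key already exists — which is exactly the primitive an Iceberg writer needs to claim the next metadata file:

```python
# Toy model of S3 conditional writes: a PUT with if_none_match succeeds
# only when the key does not exist yet, so two racing writers cannot both
# commit the same metadata file version.

class PreconditionFailed(Exception):
    """Stands in for S3's HTTP 412 response to a failed conditional PUT."""


class FakeBucket:
    def __init__(self):
        self._objects = {}

    def put(self, key, body, if_none_match=False):
        # With if_none_match=True, refuse to overwrite an existing key,
        # mirroring the If-None-Match: "*" header on a real S3 PUT.
        if if_none_match and key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


# Two writers race to commit metadata v43; exactly one wins.
bucket = FakeBucket()
bucket.put("table/metadata/v43.metadata.json", b"writer-A", if_none_match=True)
try:
    bucket.put("table/metadata/v43.metadata.json", b"writer-B", if_none_match=True)
except PreconditionFailed:
    print("writer B lost the race and must retry against v44")
```

Writer B's retry would re-read the table state and attempt v44, which is the optimistic-concurrency loop a catalog normally arbitrates.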
I’m curious if anyone has tried Iceberg with Azure Blob. When I last compared S3 and Azure Blob, it felt like there was still a pretty big gap between them, but I didn’t do a deep dive.
Good point. I don't know much about Azure Blob, TBH. Let me know if you find the info :)
> I’m not completely certain about this take, but given the potential implications, we should get an answer soon enough…
I have a feeling AWS re:Invent will have something to say about it this year
Great post. I get the appeal of Iceberg but also think having a standalone/dedicated catalog feels hacky. There's something elegant about Parquet files just being the source of truth!
But Iceberg also provides a bit more, for example schema management, which would still be a problem even with the new If-Match functionality, right?