10 Comments

This is a great post. Is the same applicable to Azure Blob? Also, I still feel that even though the basic function of a catalog is to map a table to its latest metadata file, catalogs will play a role in much more than mapping: access control, data governance, search and discovery, and data lineage.

Thanks, Saravan. Sure, everything you mentioned is relevant, but keeping the catalog out of the write path would be more resilient.

An Iceberg catalog does more than just manage reads and writes. It also manages authorization and governance and handles table maintenance. That keeps your lakehouse really open, so any future tools or engines can consume the data without forcing a migration of either the data or the RBAC setup.

Yes, exactly. In that context, it’s not directly tied to the concept of an open table format but rather functions as an RBAC mechanism for the files in a data lake.

So by "(..)By adding the If-Match header to a PUT operation, S3 will return an error if the object already exists.(..)" I suppose the API uses a combination of object key + version?

Yes, the object key. If it’s a versioned bucket, the new object becomes the most recent version of the key.
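
To make the "checked on the key only" point concrete, here is a minimal in-memory sketch of put-if-absent semantics. It does not call S3: `ToyBucket`, `PreconditionFailed`, and the `if_none_match` parameter are hypothetical names made up for illustration, and the model ignores details such as delete markers. As far as I know, S3 exposes this put-if-absent behavior through an `If-None-Match: *` precondition on the PUT and answers a failed precondition with HTTP 412.

```python
class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 a store returns when a precondition fails."""


class ToyBucket:
    """Toy model of a versioned bucket (hypothetical, not an S3 client)."""

    def __init__(self):
        # key -> list of stored bodies, newest version last
        self._versions = {}

    def put(self, key, body, if_none_match=None):
        # The precondition looks only at whether the key already exists,
        # not at any particular version of it.
        if if_none_match == "*" and key in self._versions:
            raise PreconditionFailed(key)
        # An accepted write becomes the most recent version of the key.
        self._versions.setdefault(key, []).append(body)
        return len(self._versions[key])  # number of versions now stored


bucket = ToyBucket()
bucket.put("metadata/v1.json", b"first", if_none_match="*")

try:
    # A second writer racing for the same key is rejected.
    bucket.put("metadata/v1.json", b"second", if_none_match="*")
    second_write_succeeded = True
except PreconditionFailed:
    second_write_succeeded = False
```

In this toy model, an unconditional `put` to the same key still succeeds and simply stacks a new version on top, which is why a writer has to opt in to the precondition to get the "fail if it already exists" behavior.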

I’m curious if anyone has tried Iceberg with Azure Blob? When I most recently compared S3 and Azure Blob, it felt like there was still a pretty big gap between them, but I didn’t do a deep dive.

Good point, don't know much about Azure blob TBH. Let me know if you find the info :)

> I’m not completely certain about this take, but given the potential implications, we should get an answer soon enough…

I have a feeling AWS re:Invent will have something to say about it this year

Great post. I get the appeal of Iceberg but also think having a standalone/dedicated catalog feels hacky. There's something elegant about Parquet files just being the source of truth!

But Iceberg does also provide a bit more - for example schema management that would still be a problem even with the new If-Match functionality, right?
