This is a great post. Is the same applicable to Azure Blob? Also, even though the basic functionality of a catalog is to map a table to its latest metadata file, I still feel catalogs will play a role in much more than just mapping: access control, data governance, search and discovery, and data lineage.
Thanks, Saravan. Sure, everything you mentioned is relevant, but having the catalog out of the write path would make things more resilient.
The Iceberg catalog is about more than just managing reads/writes. It will also manage authorization/governance and handle table maintenance. That keeps your lakehouse truly open: future tools and engines can consume the data without forcing a migration of either the data or the RBAC rules.
Yes, exactly. In that context, it’s not directly tied to the concept of an open table format but rather functions as an RBAC mechanism for the files in a data lake.
So by "(..)By adding the If-Match header to a PUT operation, S3 will return an error if the object already exists.(..)", I suppose the API uses a combination of object key + version?
Yes, the object key. If it’s a versioned bucket, the new object becomes the most recent version of the key.
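For anyone curious what that compare-and-swap looks like in practice, here's a toy in-memory model of the If-None-Match semantics (not real S3 calls; the bucket class, key names, and writer labels are made up for illustration). With boto3 against real S3 this corresponds to `put_object(..., IfNoneMatch="*")`, which fails with HTTP 412 PreconditionFailed when the key already exists — which is exactly the primitive an Iceberg writer needs to claim the next metadata file:

```python
# Toy model of S3 conditional writes: a PUT with if_none_match succeeds
# only when the key does not exist yet, so two racing writers cannot both
# commit the same metadata file version.

class PreconditionFailed(Exception):
    """Stands in for S3's HTTP 412 response to a failed conditional PUT."""


class FakeBucket:
    def __init__(self):
        self._objects = {}

    def put(self, key, body, if_none_match=False):
        # With if_none_match=True, refuse to overwrite an existing key,
        # mirroring the If-None-Match: "*" header on a real S3 PUT.
        if if_none_match and key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


# Two writers race to commit metadata v43; exactly one wins.
bucket = FakeBucket()
bucket.put("table/metadata/v43.metadata.json", b"writer-A", if_none_match=True)
try:
    bucket.put("table/metadata/v43.metadata.json", b"writer-B", if_none_match=True)
except PreconditionFailed:
    print("writer B lost the race and must retry against v44")
```

Writer B's retry would re-read the table state and attempt v44, which is the optimistic-concurrency loop a catalog normally arbitrates.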
I’m curious if anyone has tried Iceberg with Azure Blob. When I last compared S3 and Azure Blob, it felt like there was still a pretty big gap between them, but I didn’t do a deep dive.
Good point. I don't know much about Azure Blob, TBH. Let me know if you find the info :)
> I’m not completely certain about this take, but given the potential implications, we should get an answer soon enough…
I have a feeling AWS re:Invent will have something to say about it this year
Great post. I get the appeal of Iceberg but also think having a standalone/dedicated catalog feels hacky. There's something elegant about Parquet files just being the source of truth!
But Iceberg also provides a bit more, for example schema management, which would still be a problem even with the new If-Match functionality, right?