9 Comments

SQLMesh does something similar: it uses views and "points" each view at the underlying physical table you want to use for that environment.
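
Roughly, the idea looks like this (a minimal sketch with made-up schema and table names, not SQLMesh's actual naming scheme):

```sql
-- prod and dev each expose a view that simply points at a versioned physical table
CREATE OR REPLACE VIEW analytics.orders AS          -- prod environment
SELECT * FROM physical_layer.orders__v1;

CREATE OR REPLACE VIEW analytics__dev.orders AS     -- dev environment
SELECT * FROM physical_layer.orders__v2;

-- "Promoting" dev to prod is then just re-pointing the prod view:
-- CREATE OR REPLACE VIEW analytics.orders AS SELECT * FROM physical_layer.orders__v2;
```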

Another approach we used at my old company was running multiple versions of our pipeline in parallel (production + pre-release), both based on the same input data source. Jobs would then compare the results of the two versions, and if things aligned with our expectations we'd push the code to production. We only ran the pre-release version when needed, since it was an expensive process, but it worked well for our use case. It also helped that the input data source we received didn't change much, so we rarely needed to touch that step of the pipeline.
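
The comparison step was basically a diff between the two outputs, something along these lines (table names here are made up, just to illustrate the pattern):

```sql
-- Symmetric diff between production and pre-release outputs:
-- any rows returned are discrepancies to investigate before promoting the code.
(SELECT * FROM prod.daily_revenue
 EXCEPT
 SELECT * FROM prerelease.daily_revenue)
UNION ALL
(SELECT * FROM prerelease.daily_revenue
 EXCEPT
 SELECT * FROM prod.daily_revenue);
```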


Thanks Julien for the explanation here! Definitely something to explore.

Still, I'm wondering how complicated it is to implement in real life 🤔

As pointed out in the post you linked:

"And in fact, we cannot stress this enough, moving to a full Lakehouse architecture is a lot of work. Like a lot!"

Is this work really worth it when a lot of subjects like governance, GDPR, and data culture are still lagging behind, while data warehouse architectures and the tooling around them (dbt, SQLMesh, etc.) are now quite mature?

Don't get me wrong: lakehouse architectures are the future, but it might be a bit early to move now. WDYT?


Very interesting, thanks 🙏

Would you happen to have tested LakeFS for that pattern?

It seems to provide cross-table branching similar to Nessie's, so I'm not sure how the two overlap...


I'm not much of a dbt expert, but wouldn't it be clearer if the dbt clone command let you specify both the source and the target?
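
From what I understand (worth double-checking against the dbt docs), today the source comes implicitly from the --state artifacts and the destination from the active target, e.g.:

```sh
# Source: the manifest passed via --state (e.g. artifacts from a prod run; path is illustrative)
# Destination: the schema of the currently active target (dev here)
dbt clone --select dim_customers --state prod-run-artifacts --target dev
```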
