After one of my posts in June, Tarush, founder of 5X, contacted me.
We had an interesting conversation about the current data tool market and, specifically, the rebuilding of the data stack.
He offered to sponsor one of my posts, and I gladly accepted, as I had long wanted to explore the topic of bundling.
5X is offering my readers 10 free hours of consulting and implementation services.
To qualify, sign up for a demo and mention the newsletter, or simply email sales@5x.co.
Imagine you are the co-founder of a young marketplace startup.
You’ve spent the first years of your company building the product and supporting its growth.
You now have a tech team of 5-10 engineers and are increasingly being asked by the business to understand your customers’ behavior.
It’s time to get started with analytics!
You sit down and try to draft a project plan:
“
Okay, I cannot run my analytical queries on my prod instance; I should use a dedicated engine. Snowflake? BigQuery? DuckDB?
How do I get my data there? Which ELT tools support CDC from my prod DB?
I want to join this data with website event data from Google Analytics.
Which ELT tool has the right connectors?
I will also need to transform the data. I’ve heard about dbt. How does it work exactly?
And what about BI: Superset, Rill, Power BI, Tableau? Arrggh!
Oh, and I almost forgot!
My pipeline needs to be coordinated somehow.
Do I need an orchestrator?
Which one should I pick?
”
The ecosystem is highly fragmented, and setting up a new stack requires combining several tools: data ingestion, batch/streaming data processing, transformation (dbt), BI, orchestration, and a data catalog.
This leads to significant engineering costs for setting up and maintaining the platform, considerably raising the barrier to entry for analytics.
This investment is especially risky because the ROI of such an initiative is often unclear.
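To make the co-founder's "Do I need an orchestrator?" question concrete, here is a minimal sketch of why coordination matters even for the smallest version of this stack. The step names are hypothetical labels for the marketplace example, not real tool APIs; the sketch only uses Python's standard `graphlib` module to derive a valid run order from the dependencies between steps, which is the core job an orchestrator takes off your hands.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline for the marketplace example: each step maps to the
# set of steps that must finish before it can start (its predecessors).
pipeline = {
    "ingest_prod_db_cdc": set(),       # ELT/CDC from the production database
    "ingest_google_analytics": set(),  # ELT connector for website events
    "dbt_models": {"ingest_prod_db_cdc", "ingest_google_analytics"},
    "bi_dashboards": {"dbt_models"},   # refresh dashboards on fresh models
}

# Derive an order in which every step runs only after its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is exactly the extra tool the co-founder has to pick and maintain.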
Many vendors have identified this problem.
Let’s explore the various approaches they are taking to address it.
One-stop Shop Warehouses
The simplest way to solve this problem is a single tool that provides everything you need in one box.
However, building an ELT tool, a query engine, and an orchestrator into one platform requires massive capital to fund the associated R&D.
Today, the only players able to invest that much are the cloud warehouses: Snowflake, Databricks, BigQuery, and co.
They are expanding horizontally quite aggressively toward orchestration and ingestion.
For example, Snowflake supports more and more orchestration features via Tasks and Dynamic Tables.
They also push toward the ELT side by providing managed integrations with third-party services.
Although their ELT and orchestration features are less powerful than those of dedicated tools, these warehouses may soon become a one-stop shop for controlling the complete data supply chain.
Unified control plane
Another approach to simplifying the management of data platforms is to provide a control plane over several niche tools.
This is the promise of orchestrators: one place where you control them all.
Dagster, Kestra, and Orchestra are all building integrations with as many tools as possible, including ELTs, warehouses, databases, and third-party data providers.
They gather metadata from all these tools and help build a control plane over the complete stack.
This is compelling because it reduces the maintenance burden: failing pipelines can be detected, alerts set up, and failed runs relaunched, all from a single place.
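The control-plane idea can be sketched in a few lines. This is a toy model under loud assumptions: the "adapters" are hypothetical callables standing in for the integrations that Dagster, Kestra, or Orchestra build with each tool, and `relaunch` is a hypothetical hook; none of this mirrors those products' actual APIs. It only shows the pattern: gather run metadata from every tool into one place, detect failures, alert, and relaunch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    tool: str     # hypothetical tool label, e.g. "ingestion", "dbt", "bi"
    run_id: str
    status: str   # "success" or "failed"


@dataclass
class ControlPlane:
    # Each adapter returns the latest runs of one tool (hypothetical stand-in
    # for an orchestrator integration pulling metadata from that tool).
    adapters: dict[str, Callable[[], list[Run]]]
    # Hypothetical hook that re-triggers a run in the underlying tool.
    relaunch: Callable[[Run], Run]
    alerts: list[str] = field(default_factory=list)

    def collect(self) -> list[Run]:
        # Gather run metadata from every tool into a single list.
        runs: list[Run] = []
        for fetch in self.adapters.values():
            runs.extend(fetch())
        return runs

    def failed(self) -> list[Run]:
        # Detect failing pipelines across the whole stack.
        return [run for run in self.collect() if run.status == "failed"]

    def heal(self) -> list[Run]:
        # Record an alert for each failure, then relaunch it from here.
        retried = []
        for run in self.failed():
            self.alerts.append(f"[{run.tool}] run {run.run_id} failed")
            retried.append(self.relaunch(run))
        return retried


# Usage with fake adapters: only the dbt run has failed.
plane = ControlPlane(
    adapters={
        "ingestion": lambda: [Run("ingestion", "i-1", "success")],
        "dbt": lambda: [Run("dbt", "d-7", "failed")],
        "bi": lambda: [Run("bi", "b-3", "success")],
    },
    relaunch=lambda run: Run(run.tool, run.run_id + "-retry", "success"),
)
retried = plane.heal()
print(plane.alerts)                  # ['[dbt] run d-7 failed']
print([r.run_id for r in retried])   # ['d-7-retry']
```

The value lies less in the loop itself than in the adapters: each new integration an orchestrator ships is one more tool whose failures surface in the same place.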
Bundling as a Service
Orchestrators' unified control plane simplifies the management of a fragmented stack but does not help with its initial setup.
Another approach, exemplified by 5X, targets that setup cost: a managed platform that integrates a mix of existing vertical tools behind one consistent UI.
The idea is to reuse the core features of existing tools and build a common UI on top of them.
It works on top of your existing Snowflake warehouse (with BigQuery and Azure support coming soon). If you don’t have one, you can provision a Snowflake warehouse through 5X at a 5% discount on Snowflake credits.
Their platform is split into four segments:
5X Ingestion: The ingestion layer is built on Fivetran connectors, which are integrated directly into the UI.
They can also build custom connectors on demand within days, using technology from Gravity, which they acquired last year.
5X Modelling: An IDE and dbt interface, similar to dbt Cloud, for writing dbt models.
5X Orchestration: Dagster is integrated directly into the interface as the orchestrator.
5X BI: Superset powers the BI layer, with Tableau, Looker, Power BI, and Sigma coming soon.
Pricing is pay-as-you-go and based on credit consumption; the same credits can be spent across the four segments above.
The pros of such an approach are quite clear:
Consistent UI across existing data tools.
A single credit system for the entire platform: no need to manage multiple invoices.
Lower cost: 5X claims a 30% lower total cost of ownership (TCO) than building from scratch, and up to 50% lower when including the manpower needed to build and maintain the platform.
These advantages come at the cost of working with an opinionated stack with pre-selected tools and the risk of being limited by the speed at which new features are released.
This trade-off is reasonable for companies that find investing in an ecosystem-led stack too expensive or risky, and for which an integrated approach helps validate the ROI of analytics use cases quickly.
I find each of these three bundling options interesting, as organizations now have the choice of the path they want to follow: platform, ecosystem-led, or integrated.
They can find the best trade-off between lock-in, flexibility, and scalability for their context.
This probably indicates that the ecosystem is gradually maturing and now entering its democratization phase.
Thanks for reading, and thanks to Tarush and the 5X team for their support and the great collaboration.
-Ju
I would be grateful if you could help me improve this newsletter. Don’t hesitate to share what you liked or disliked and which topics you would like me to tackle.
P.S. You can reply to this email; it will get to me.