Launching: The Boring Semantic Layer

Ju Data Engineering Weekly - Ep 87

Jun 24, 2025

Bonjour!
I'm Julien, freelance data engineer based in Geneva 🇨🇭.
Every week, I research and share ideas about the data engineering craft.
Not subscribed yet?
👨🏽‍💻 echo {YOUR_INBOX} >>

This post is brought to you by Bufstream— a Kafka drop-in replacement with built-in schema enforcement.

Bufstream ensures data quality at the source by validating every message against a Protobuf schema — so bad data never enters your streams in the first place.

Yet another semantic layer!

I’m currently working with

David Krevitt

, leveraging Boring Data templates to build vertical data products.

One of our core assumptions is that MCP-only interfaces can disrupt traditional UI-based data products.

While building our first product, which manages Stripe failed payment dunning, we noticed a major limitation:

LLMs struggle to generate correct SQL queries when retrieving data from a DB.

They often:

Misinterpret the join relationships
Get confused about column meanings or context

Our key takeaway:

👉 We need a semantic layer!

By enforcing consistent data access patterns, this semantic layer helps reduce the error rate of the LLM.

I researched the available tools on the market.

I wanted a simple Python package that I can use directly within my MCP implementation:

I haven’t found anything that fully meets my needs: open, composable, and … simple.

Semantic layer tools fall into one of the following categories:

Not Open Source:
- dbt MetricFlow
- Snowflake Semantic Layer

Bundled with a larger platform:
- Rill Data
Requires hosting a dedicated service:
- Cube.dev
- DataJunction

The best option I found was Malloy.

But Malloy isn’t easily embeddable in Python as far as I know, and I’m concerned that LLMs may struggle to generate correct Malloy queries.

So…

We partnered up with Hussain Sultan from xorq-labs (a multi-engine version of ibis) and built the simplest semantic layer on earth: boring-semantic-layer.

It has two main differentiators:

Lightweight: Just pip install boring-semantic-layer
Built on top of Ibis: Supports any engine Ibis integrates with

Boring Semantic Layer

Demo Time!

Let’s take a look at how our semantic layer works.

Step 1: Define your semantic model

With boring-semantic-layer, you define your semantic model using a SemanticModel class:

Each model is built on top of an Ibis table — meaning it works with any backend Ibis supports, like DuckDB, BigQuery, Snowflake, and more.

Example with DuckDB and Parquet:

Once you’ve defined your table, you can declare your dimensions and measures:

Dimensions describe how you slice your data — e.g. country, user_cohort, or time_of_day.
Think of them as the “by” in “group by”: flights by origin, users by signup_date.
Measures define what you measure — e.g. total_revenue, unique_users, or average_order_value.
These are usually aggregations or calculations.

All dimensions and measures are expressed as Ibis expressions — which are Python functions representing database operations.

This gives you a clean, backend-agnostic way to express logic.

The t parameter represents your Ibis table — and you can chain operations naturally, like you would with pandas.

Ibis compiles these expressions into optimized SQL for the target engine — whether it’s DuckDB, Postgres, BigQuery, or others.

Step 2: Query your semantic model

Once defined, the semantic model exposes a .query() method that lets you combine dimensions and measures.

For example, if you want to count the number of flights by origin, run:

This returns a DataFrame like:

Under the hood, the Ibis expressions are compiled into SQL to query the underlying data sources.

As you can see, the .query() interface is much more restrictive by design.

By only allowing dimensions and measures to be specified, the input is tightly constrained, making it far less error-prone.

Filtering

Querying dimensions and measures is great — but you also need a way to filter data dynamically to answer more specific user questions.

The easiest way to add filters to a query is, of course, by using Ibis expressions:

However, there’s a catch: Ibis expressions are difficult to serialize, which makes them hard for an LLM to generate or interact with.

To address this, we’ve implemented an additional filtering option: JSON-based filtering.

Instead of writing Ibis expressions directly, the user can specify filters using a simple JSON format:

This filtering approach is much more LLM-friendly and covers all basic filtering needs using standard operators.

However, this solution doesn’t cover more complex filtering and transformations, especially those involving time-based logic.

To address that, we’ve implemented a dedicated time-based query mechanism, inspired by Rill’s MCP implementation.

To use it, simply define a time_dimensions in your SemanticModel that points to a timestamp column:

The user can then perform simple time-based aggregations using the time_range and time_grain parameters of the .query() method.

This makes time-based analysis incredibly easy.

Want the number of flights per day in January?

Just write:

Want the year-to-date (YTD) number of flights?

Super simple and flexible.

Join

In most cases, tables are exposed in a normalized form, split across multiple tables.

These relationships should be abstracted away from the semantic layer consumer.

To handle this, we expose a Join class that allows you to define relationships between semantic models.

Let’s stick with the flight example from earlier.

Now, we’ll add a new semantic model: carriers.

We can now enhance our flight semantic model by using the joins argument:

The end user can now query carrier information without having to think about any joins:

Next Step

We’re really excited about the boring-semantic-layer!

In the coming weeks, we’ll be adding:

A YAML interface
Charting support
MCP support
Materialization and caching support

We’ve already seen very promising results with MCP and will share more in a follow-up post.

Check out the demo video:

We’d love to hear your feedback and welcome contributions — and don’t forget to ⭐ the project!

We’ll also be sharing updates with David about the product we’re building on top of the boring-semantic-layer.

Stay tuned!

Thanks for reading,

-Ju

Ju Data Engineering Newsletter

Discussion about this post