Last Friday, I stumbled upon something that left me gobsmacked. OpenAI announced ChatGPT plugins, and I couldn't believe I hadn't seen it coming sooner.
In today's newsletter, I'll introduce you to ChatGPT plugins and explain why they're creating such a buzz in the data world.
ChatGPT plugins?!
You've probably experienced this at least once while using ChatGPT: "As of my knowledge cutoff date of September 2021…"
The model isn't connected to any data sources beyond the original dataset it was trained on, which can be quite limiting.
Plugins solve this problem by letting the user query any external API through ChatGPT.
Here's a quick breakdown of their key components:
A manifest: The blueprint that guides the plugin.
An API endpoint: The communication hub for your plugin.
An API endpoint spec: Detailed info on how the endpoint works.
With these elements in place, ChatGPT takes the wheel and does the rest:
The AI model acts as an intelligent API caller. Given an API spec and a natural-language description of when to use the API, the model proactively calls the API to perform actions.
ChatGPT engages in conversation with you, the user. As you chat, it cleverly figures out which API calls to make, all based on the information you provide.
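To make the three components concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not OpenAI's actual format: the manifest fields are simplified, the `weather` plugin, the `example.com` URL, and the helper names `fake_endpoint` and `build_call` are all hypothetical, and the arguments that the model would normally extract from the conversation are hard-coded.

```python
import json

# Hypothetical, simplified manifest: tells the model what the plugin is for.
manifest = {
    "name_for_model": "weather",
    "description_for_model": "Use this to look up the current weather for a city.",
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
}

# Hypothetical endpoint spec, reduced to a single operation.
endpoint_spec = {
    "path": "/weather",
    "method": "GET",
    "parameters": ["city"],
}


def fake_endpoint(city: str) -> dict:
    """Stand-in for the real HTTP endpoint so the sketch runs offline."""
    return {"city": city, "temperature_c": 21}


def build_call(spec: dict, arguments: dict) -> dict:
    """Check the arguments against the spec and describe the API call."""
    missing = [p for p in spec["parameters"] if p not in arguments]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return {"method": spec["method"], "path": spec["path"], "params": arguments}


# In the real flow the model derives these arguments from the chat;
# here they are hard-coded for illustration.
call = build_call(endpoint_spec, {"city": "Paris"})
result = fake_endpoint(**call["params"])

print(json.dumps(call))
print(json.dumps(result))
```

The point of the sketch is the division of labor: the manifest and spec describe what is possible, and the model's job is to turn a chat message into a valid call.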
But wait! Retrieving data from an API – isn't that what we do as data engineers?
… and then I got scared …
As a data engineer, I began to reflect on my daily tasks. I realized that about 80% of my work consists of low-level activities:
Writing data transformations in Python
Building database objects (tables/views)
Data modeling
Writing config files for infrastructure configuration
System design
The remaining 20% goes to communication and stakeholder management.
What is defensible in that 80% of hands-on work?
Probably nothing except data modeling and system design.
In a few months, will our data engineering tasks look like this?
Writing data transformations in Python
Building database objects (tables/views)
Creating config files for infrastructure configuration
Data modeling
System design
ChatGPT prompting:
“create a step function workflow running every day at 8 pm importing data from API blabla.com to s3 bucket xxx”
“create an incremental dbt model running every hour with merge key PK_XX”
Taking a step back, our profession is caught between two powerful trends:
On one hand, there's a massive surge in demand. Companies everywhere are realizing the importance of building data engineering teams to create robust data infrastructure. This is essential to extract value from data and remain competitive.
On the other hand, data engineering tasks are gradually being cannibalized by:
Smart agents like ChatGPT
Data-sharing platforms like Snowflake that simplify integration efforts, reducing the workload for data engineers.
So, we find ourselves in a unique situation: more job opportunities are emerging, but at the same time, technological advancements are reducing the scope of work in the field.
… before seeing the opportunities …
In my opinion, these rapid changes in data engineering are reshuffling the cards.
Those who were once 10x better than their peers are now at the same level.
A new race is beginning.
The modern data engineer will probably combine system design with LLM prompting skills and deliver 10x more than an engineer today.
People have begun developing "templates" that outline their expectations from GPT, aiming to enhance the quality of its output.
While today's primary use case revolves around text generation, this technique of prompt engineering holds great potential for application in engineering contexts as well.
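As a rough illustration of such a template in an engineering context, here is a small Python sketch. The template sections (role, task, constraints, output format) and the `render_prompt` helper are my own assumptions, not an established standard; the task reuses one of the example prompts from earlier in this newsletter.

```python
from string import Template

# Hypothetical prompt template: section names and wording are illustrative.
PROMPT_TEMPLATE = Template(
    "Role: you are a senior data engineer.\n"
    "Task: $task\n"
    "Constraints: $constraints\n"
    "Output format: $output_format\n"
)


def render_prompt(task: str, constraints: list[str], output_format: str) -> str:
    """Fill the template so every request states the same expectations."""
    return PROMPT_TEMPLATE.substitute(
        task=task,
        constraints="; ".join(constraints),
        output_format=output_format,
    )


prompt = render_prompt(
    task="create an incremental dbt model running every hour with merge key PK_XX",
    constraints=["use dbt SQL only", "explain each config option in a comment"],
    output_format="a single SQL file",
)
print(prompt)
```

Keeping the expectations in one reusable template means each new request only changes the task, which makes the outputs easier to compare and review.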
I'm not sure if we're experiencing an App Store moment, but being an engineer today means having tremendous leverage: armed with the right skills and tools, a single engineer can now build an extensive data platform almost single-handedly, and without any upfront cost.
I have new data product ideas flooding into my head every day, and the prospect of being able to build them almost alone is really exciting!
It's truly an exciting time to be an engineer!
Thank you for reading.
-Ju
I would be grateful if you could help me improve this newsletter. Don’t hesitate to tell me what you liked or disliked and which topics you would like me to tackle.
P.S. you can reply to this email; it will get to me.