- Tags:: 🗞️Articles, Data culture
- Author:: Jeff Magnusson
- Link:: Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department | Stitch Fix Technology – Multithreaded
- Source date:: 2016-03-16
- Finished date:: 2020-11-01
✍️ The tyranny of the Thinker
[As a thinker, you] get to sit around all day, think up better ways to do things, and then hand off your ideas to people who eagerly rush to put them into production.
In this model the Doers are solely accountable for implementation, failure, and support of other people’s ideas, while the Thinker is rewarded for their success.
This is at the heart of the contention and misalignment between the teams. It creates an IT group rather than an engineering team.
In order to attract talented engineers into a role like that, you need some really big scaling problems to serve as a distraction to the soulless, subservient role you have hired them into. You need the type of problems created by the existence of Big Data. And, I’m sorry, but you don’t have Big Data. Instead, you will hire mediocre engineers.
Everybody wants to be the “Thinker”. Because it sounds like such a cool role!
Data scientists, especially those who are newer to the industry and don’t know any better, are especially vocal about desiring such a role.
The assembly line handoff from scientist to engineer creates the polar opposite environment. (Truth is, even the Thinker resents having to rely on the Doer). The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved.
In case you did not realize it, Nobody enjoys writing and maintaining data pipelines or ETL. It’s the industry’s ultimate hot potato. It really shouldn’t come as a surprise then that ETL engineering roles are the archetypal breeding ground of mediocrity.
The alternative:
To sum it up, engineers must deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy (such as a tool, framework, or service used to build, schedule, and execute ETL). I like to think of it in terms of Lego blocks. Engineers design new Lego blocks that data scientists assemble in creative ways to create new data science. (…) The engineers own the infrastructure that they build, and the data scientists own the business logic and algorithm implementations that they provide.
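One way to picture the Lego-block split is the minimal sketch below. It is not taken from the article: the `Pipeline` class, the `step` decorator, and the example transformations are all hypothetical names invented for illustration. The framework part stands in for what engineers would own; the decorated functions stand in for the business logic a data scientist would own.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]
Step = Callable[[List[Row]], List[Row]]


class Pipeline:
    """Hypothetical engineer-owned block: registers steps and runs them in order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: List[Step] = []

    def step(self, func: Step) -> Step:
        """Decorator that registers a scientist-owned transformation."""
        self._steps.append(func)
        return func

    def run(self, rows: Iterable[Row]) -> List[Row]:
        """Apply every registered step, in registration order."""
        data = list(rows)
        for func in self._steps:
            data = func(data)
        return data


# Scientist-owned side: only business logic, no scheduling or plumbing.
pipeline = Pipeline("order_features")  # illustrative pipeline name


@pipeline.step
def drop_refunds(rows: List[Row]) -> List[Row]:
    # Keep only rows with a positive amount (assumed business rule).
    return [r for r in rows if r["amount"] > 0]


@pipeline.step
def add_tax(rows: List[Row]) -> List[Row]:
    # Add a derived column; the 1.2 multiplier is an arbitrary example value.
    return [{**r, "amount_with_tax": round(r["amount"] * 1.2, 2)} for r in rows]


result = pipeline.run([{"amount": 10.0}, {"amount": -5.0}])
print(result)
```

In this sketch the scientist never touches `run`, and the engineer never touches `drop_refunds` or `add_tax`, which mirrors the ownership boundary the quote describes. A real platform would also handle scheduling, retries, and storage, all of which are left out here.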
Other
It is absolutely essential for platform engineers to stay ahead of the data science teams.
For Data team vision and mission:
We are not optimizing the organization for efficiency, we are optimizing for autonomy. What is offered is clear ownership of ideas and accountability for their delivery.
In the absence of abstractions and frameworks for rolling out solutions, engineers partner with scientists to create solutions.