- Tags:: 📝CuratedNotes , Working in Data
The options
From Data Team Structure Embedded or Centralised:
Embedded teams have an additional disadvantage: short-term thinking. From 🗞️ How should our company structure our data team:
This misalignment in priorities caused lots of problems — data engineering was focused on building the data health of the company long term, while the DAs were pursuing short term revenue at the cost of the companies’ long term data integrity. Our solution was to take the best of both teams and combine them into one full-stack data team.
Let’s briefly define a couple of roles first, because data team roles are a mess:
- “Classic data engineers” would be the ones that do whatever is needed to move data around the way we need it. If we stick strictly to this definition, you probably don’t need them to be embedded since you could argue that business context there is not that important.
- “Analytics engineers” are the ones that make sure we model the data in a way that makes sense for analysts, scientists, product… They are also responsible for quality checks. They definitely need to know about the business, so you likely want them to be embedded.
- Now, this may be overspecialization, so you will probably want a single “data engineer” role that covers both needs. That means you want them both centralized and embedded.
The ideal end goal
Let’s imagine that you have full buy-in and unlimited money. What would you want to have?
- At least 2 DAs and 2 DEs embedded in every multidisciplinary team you have (or “stream-aligned” team in 📖 Team Topologies parlance). One of each just doesn’t cut it because of the bus factor, and you cannot fix that by asking DEs from other teams to act as pairs or peer reviewers: they don’t have that team’s business context, and acquiring it would mean too much context switching. With this setup you make sure you have business alignment and easy prioritization.
- But you still have the problem of a lack of centralized tooling and standards. What do we usually do about this problem in regular software engineering? Centralized teams, either transient or permanent: SRE, DevEx, and/or Staff engineers with chapters in which an agenda of centralized objectives is prioritized (and the development is done by the same embedded engineers).
How do we get there?
Any company would be reluctant to go straight to the end goal because it’s a large investment in people without much clarity on the value they’ll bring. So you likely won’t have enough people to embed them in all teams. In addition, you probably have a good amount of data debt that a stream-aligned team won’t really want to prioritize.
A central team would give you the flexibility to move with your workload (firefighting and feature work) while combining it with a more strategic view and tackling debt.
But as the needs grow, the team will grow to a point where the cognitive load (from all the different domains) is too high and prioritization becomes a mess. At that point, it’s time to at least divide the team into pods (see the story of 🗞️ How should our company structure our data team).