- Tags:: 📚Books, Data methodology
- Author:: Russell Jurney
- Liked:: 6
- Link:: Agile Data Science 2.0
- Source date:: 2017-06-01
- Finished date:: 2021-10-11
The Data Science Process
In other words, we can’t tell you exactly what we will ship, when. But in exchange for accepting this reality, you get a constant, shippable progress report, so that by participating in the reality of doing data science you can use this information to coordinate other efforts. That is the trade-off of Agile Data Science. (p. 16)
Adapting to Change
Continuous and iterative sharing of intermediate work is one of the things that make agility in Data Science possible. (p. 20)
In Agile Data Science, there is no unpublishable state. The rest of the team must see weekly, if not daily (or more often), updates to the state of the data. (p. 22)
Notes on Process
It is easy to detect and fix errors in parsing. Systemic errors in algorithms are much harder to detect without a second, third, fourth pair of eyes. And they need not all be data scientists—if a data scientist presents her code with an explanation of what is happening, any programmer can catch inconsistencies and make helpful suggestions. What is more, having a formal code review process sets the standard for writing code that is understandable and can be shared and explained. (p. 24)
This cultural impact is perhaps the most important aspect of code review, because it creates cross-training among team members who become proficient at understanding and fixing components of the system they don’t usually work on or maintain. You’ll be glad you have a code review process in place when a critical data scientist or data engineer is out sick and you need someone else to find and fix a bug in production. (p. 24)
It is very easy to get people excited about data across departments when they can see concrete proof of the progress of the data science team. (p. 26)