Editor’s Note: This post does not contain ANY code but describes concepts for data science which are language agnostic.
“We do not learn from experience… we learn from reflecting on experience.”
― John Dewey, American Education Reformer
After having dabbled in data science for the past few years, I decided to take the leap and attend an intensive data science bootcamp in Berlin at Data Science Retreat (DSR) from January till April.
While I’m planning on writing some posts in the future describing what I’ve learned programming-wise (e.g., pandas is the Python equivalent of dplyr), today I’m focusing on something more universal: what helps teams succeed, namely being organized, having clear objectives and roles, and maintaining open channels of communication.
After wrestling with the basics of git, bash, Anaconda virtual environments, pandas, SQL, NumPy, and some machine learning models in a little more than two and a half weeks (I did say it was intensive), my colleagues and I were divided into three-member teams and presented with a modified version of the Rossmann Kaggle Challenge, which required us to predict the sales (in euros) of a single store based on a historical data set.
Charged with our task and armed with an endless supply of coffee, we set about the following:
Step 1: Establishing a Road Map
We were given approximately 2.5 days to complete the challenge. As such, the first thing we did as a team was to define the objective of the competition which, once again, was to create a model to predict the sales of a given store with as little error as possible.
Having identified the goal, we were able to work backwards and identify the steps that would get us to our target in the time allotted, each with its own clear objective. For example, if we found ourselves starting down a rabbit hole, we could ask ourselves “how does this task help us reach our goal?” If it didn’t, it got scrapped and we refocused on the task at hand.
Step 2: Assigning Clear Roles
Our task divided naturally into three roles (data cleaning, feature engineering, and modelling), with the additional task of setting up the git repo for sharing code. Each of us took a role, and I took on the added task of setting up git since I had the most experience with it (“In the land of the blind, the one-eyed man is king.”).
Having these clear roles limited the number of problems each of us had to solve. This also greatly reduced our stress as well as our cognitive load; less to worry about means greater focus and clarity of purpose. However, being assigned a given role is far different from working in isolation. To that end, anytime any of us felt like we’d hit a roadblock, we’d ask each other for assistance, which was only possible because we had:
Step 3: Establishing and Maintaining Channels of Communication
Neither of the above steps would have been successful if we hadn’t established an open dialogue with each other from the outset. To achieve and maintain this state, we held 3 to 5 minute meetings every 45 to 90 minutes. These meetings had the following structure:

- review of the previous target
- request for assistance
- setting the next target

The purpose of these meetings was to provide a space to identify how much progress we’d made individually and as a team, ask for help, and, most importantly, identify what our next objectives should be. Additionally, by maintaining contact with one another, we avoided accidentally doubling up on work as well as unnecessary tasks (e.g., creating features which won’t make it into the model, or imputing values for features which get dropped during cleaning).
I was fortunate to be paired with Johanna Viktor and Julian Frank; they were both incredibly competent and generous with their time, more than willing to assist me when I ran into obstacles and to explain aspects of Python I had yet to encounter.
Lesson for Next Time
Create Multiple Branches on Git
While our repo (which you can find here) served its purpose, it was incredibly difficult for us to deal with multiple merges. To that end, for the next challenge, I plan on creating one repo for the team and an individual branch for each member. That way, given that each team member is working on separate tasks in separate files, we can commit and push our own files to the repo and merge the branches at the end.