Stuff I wish I knew before becoming a data scientist

September 11, 2022

One year on the job

Anniversaries are a mixed bag. From celebrations to mournings, they are a funny bunch, and as of tomorrow I will have clocked up one whole year in my current role as a Data Scientist. Now, I have had some data jobs prior to this role - some internships and consultancy work here and there - but I shamelessly admit that this is my first real job.

In this job, I feel fortunate to be surrounded by an intelligent, but most importantly kind, data team hailing from companies such as Spotify, King, and Bloomberg. They have taught me how to construct and orchestrate SQL pipelines, modularize and test my dbt code, as well as extract signal from messy industry data. However, the most valuable thing I have learnt from my team is the standards you should hold your workplace and manager to. What would my career look like if I hadn't been so lucky to learn this?

This post highlights 3 things you should evaluate before considering a role as a Data Scientist.

Manager

The relationship you build with your manager is an extremely significant factor in determining the success of your role. You could be the best Data Scientist at your company, but it counts for nothing unless your manager also agrees. Ultimately, your manager decides:

  • What you work on
  • Who you work with; and
  • When you work on a given task

These decisions directly affect:

  • The skills you develop
  • The network you build; and
  • The exposure you have to growth opportunities

Managers dictate the ceiling you can attain in your role, and you want this to be as high as possible.

This isn't an advert for grossly flattering your manager, however. Instead, the story here is to understand that managers are also human beings facing pressures from their manager that you are (hopefully) unaware of. They need work done, and they need it done well. So early on in your career, show that you don't mind sweeping the floor a little! This will earn you some brownie points for you to spend later down the line.

For those who have maybe swept the floor one too many times though, ask yourself whether your manager is aware of your career goals and interests. Do they know that you dream of serving machine learning models with a hotch-potch of MLOps tools? Or that you are dying to work with Bayesian statistics in any way you possibly can? If they don't, the chances of you doing what you want any time in the near future are incredibly slim, and you'll never stop sweeping those floors.

Giving your manager a good impression of yourself is one thing, but the relationship you have with your manager should also work both ways. It's imperative that you also have some expectations for your manager. Here are some that I have for mine:

  • Do they provide a safe environment for the team to have open communication and feedback?
  • Do they schedule regular one-on-ones with you (every 4-6 weeks) to provide guidance on how to succeed in your role, discuss your career growth and salary, and, most importantly, just connect as people?
  • Do they shield you from outside noise and prevent context switching?
  • And finally, do they recognise good work when they see it?

If your answer to most of these questions is yes, then you're probably enjoying your time at work and are in good shape for some nice career growth. If your manager is failing at these tasks though, I think it's time you look elsewhere 👀

Org structure

So you and your manager are tight? Great, but how does the rest of the org perceive the value your data team brings to the business? This is just as important as the relationship with your manager: if your manager dictates what you work on, your data team's relationship with the rest of the org dictates the space of possible work available. It doesn't matter then if you and your manager are chums 4 lyf if the work they can assign you sucks.

If you're on data Twitter, then you've probably run across some blog posts about running your data team like a product team. For the uninitiated, this probably doesn't mean much, but the TLDR on this idea is that it enables data teams to scale, and I'd strongly recommend that anyone read the linked blog post by former GitLab employees Emilie Schario and Taylor Murphy.

Assuming you want to work in a product-first data environment then, one way to quickly gauge if this is happening is to see who the data team rolls up to. If the head of data reports to a CPO, CTO, or even has their own C-level title in Chief Data Officer (CDO), then you're probably in good shape for building some nice data products at your company: dashboards, models, pipelines, experiments, the lot. Skies grow dark, however, when the data team reports to a CFO or CMO. Chances are the data team will be run more as a service, and you'll be stuck fetching numbers and shipping CSVs until the end of your days.

Besides company hierarchy, some other questions to ask your current/potential manager to gauge whether you're in a product-first data environment are:

  • What deliverables does the data team produce to deliver value to the business?
  • What would success look like for the data team?
  • Who are the main stakeholders of the data team? Or,
  • How many A/B tests did the team conduct in the last month or quarter?

If the answers to these questions are vague and unconvincing, then that's a 🚩 and another sign you should look elsewhere.

Maturity

Finally, the reality of the Data Scientist title is that a Data Scientist at one company might have completely different responsibilities to a Data Scientist at another. This has led to criticism of the role in the past (insert link here), and even calls for the title to be replaced with a combination of Data Analyst, Analytics Engineer, or Machine Learning Engineer (insert link here). Parking that discussion for the moment, I believe the underlying factor that determines what it actually means to be a Data Scientist at a particular company is the maturity of that company's data team within the organisation. This is something I think you should strongly consider before accepting a full-time role.

For example, let's start this exploration by listing the top 10 things that I think are in scope for a Data Scientist to own, ordered from low maturity to high maturity:

  1. Ad hoc data fetches
  2. Ad hoc data analyses
  3. Writing SQL pipelines (or other data pipelines)
  4. Creating metric frameworks
  5. Building dashboards
  6. Running experiments (A/B tests)
  7. Analysing experiments (A/B tests)
  8. Developing statistical and/or machine learning models
  9. Creating internal packages and tools for analyses
  10. Publishing data science research

Now, points 1 and 2 would rarely be considered Data Science, but I'm almost certain there are Data Scientists out there, from super young start-ups to giant corporations, whose role is just to get numbers to stakeholders as fast as possible. At this maturity stage, the data team are essentially stuck in the data-as-a-service model and remain in this position until the start-up gets another funding round, or the corporation wakes up to its mismanagement. With the bag secured, more members are recruited to the data team and the company begins to build up its analytics infrastructure: that's points 3, 4, and 5. If this infrastructure building is successful, a bank of robust metrics will be available to measure the company against, thus unlocking points 6, 7, and 8. Finally, with the culture of data-driven product development established, the data team have the space to begin building internal packages to enhance iteration speed, and have probably created some problems so niche along this whole journey that they require cutting-edge research to solve them: that's points 9 and 10, respectively.
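If it helps to see that grouping written down, here is a minimal Python sketch of it. The stage names and the team_stage helper are my own hypothetical labels, and the rule of "a team sits at the stage of the highest-numbered point it owns" is a simplifying assumption rather than a rigorous framework.

```python
# Groupings follow the walkthrough above; the stage names are hypothetical labels.
STAGES = [
    ("data-as-a-service", range(1, 3)),               # points 1-2
    ("analytics infrastructure", range(3, 6)),        # points 3-5
    ("experimentation and modelling", range(6, 9)),   # points 6-8
    ("tooling and research", range(9, 11)),           # points 9-10
]


def team_stage(owned_points):
    """Return the stage containing the highest-numbered point the team owns.

    Assumption: maturity is judged by the most advanced kind of work a team
    owns, which is a simplification made for this sketch.
    """
    points = set(owned_points)
    if not points:
        raise ValueError("owned_points must not be empty")
    top = max(points)
    for name, stage_points in STAGES:
        if top in stage_points:
            return name
    raise ValueError(f"point {top} is not in the 1-10 list")


if __name__ == "__main__":
    # A team that fetches numbers, runs analyses, and maintains dashboards.
    print(team_stage([1, 2, 5]))
```

When you're interviewing, you could jot down which of the 10 points the team says it owns and see where they land, keeping in mind that the boundaries between stages are fuzzier in real life.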

As a current or potential Data Scientist then, you should look at this list and consider the type of work you'd like to be doing. An immature data team will never be able to give you work on natural language processing (NLP) if no one outside the data team can tell you how many daily active users (DAU) your product has. Now, this isn't to say that working on NLP models is the pinnacle of data science. There is some extremely rewarding work to be done in building analytics infrastructure that could probably be argued as being more impactful to the business. However, if you're the type of person who wants to be in their text editor all day, then you have to ensure your role is at a company with a mature data team, or be willing to wait for your current team to grow.