What should I learn to become a Data Scientist?

I already know Python libraries like pandas and NumPy, as well as some linear algebra. #science #python #datascience

Sachin’s Answer

Hi Vladislav,

Thanks for the question. Here is a webpage that lists the steps, skills, knowledge, and training you need to become a data scientist:


Hope this helps and good luck!

Su’s Answer

Hi Vladislav,

First, here are all the materials I found useful:

Coursera (ML / Statistics / Big Data / Data Visualization)

- Machine Learning, by Stanford

- Deep Learning Specialization (5 courses), by deeplearning.ai

- Advanced Machine Learning Specialization (3/7 courses), by National Research University

- Bayesian Statistics, by University of California, Santa Cruz (check 1point3acres for more)

- Data Visualization and Communication with Tableau, by Duke

- Big Data Integration and Processing, by University of California, San Diego

Books (ML / Statistics)

- Hands-On Machine Learning with Scikit-Learn and TensorFlow

- Python Machine Learning

- Pattern Recognition and Machine Learning (PRML)

- The Elements of Statistical Learning (ESL)

- An Introduction to Statistical Learning (ISL)

- Machine Learning: A Probabilistic Perspective

- Interpretable Machine Learning

Second, the Data Scientist role in the tech industry covers several different duties:

  • Data Analytics: works with the data warehouse to discover insights; requires SQL skills
  • Machine Learning Engineer: maintains ML models that address business needs; close to a backend software engineer role
  • Machine Learning Scientist: also works on ML models, but is less involved in large-scale problems

So I'd suggest picking one particular role to start with and focusing on it.

For example, in Data Analytics it's a must-have skill to write sophisticated SQL queries and to be familiar with modern data warehouses such as Hive and Spark SQL. A great book to start with is: https://www.manning.com/books/big-data-warehousing-cx
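To give a feel for the kind of SQL an analytics role involves, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse. The `orders` table and its columns are hypothetical example data. The query uses a common table expression (CTE) to compute spend per user, then aggregates by country; this is plain SQL, so the same query shape should carry over to Hive or Spark SQL with little or no change, though dialect details may differ.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse table.
# The "orders" table and its rows are hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "US", 20.0),
        (1, "US", 35.0),
        (2, "DE", 15.0),
        (3, "US", 50.0),
        (4, "DE", 5.0),
        (4, "DE", 10.0),
    ],
)

# The CTE computes total spend per user; the outer query then
# aggregates those per-user totals by country.
query = """
WITH user_totals AS (
    SELECT user_id, country, SUM(amount) AS total_spend
    FROM orders
    GROUP BY user_id, country
)
SELECT country,
       COUNT(*) AS users,
       AVG(total_spend) AS avg_spend_per_user
FROM user_totals
GROUP BY country
ORDER BY avg_spend_per_user DESC
"""
rows = conn.execute(query).fetchall()
for country, users, avg_spend in rows:
    print(country, users, round(avg_spend, 2))
conn.close()
```

Running it prints one line per country (`US 2 52.5`, then `DE 2 15.0`). Breaking a question into a per-user step and a per-country step like this is a common pattern in analytics SQL.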

For a machine learning engineer, I'd recommend starting with machine learning fundamentals as well as general software engineering skills.
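As a rough illustration of combining ML fundamentals with software engineering habits (a reusable interface, input validation, a quick correctness check), here is a minimal sketch of one-feature linear regression fitted by ordinary least squares in plain Python. The `SimpleLinearRegression` class and its `fit`/`predict` methods are hypothetical, loosely modeled on the scikit-learn naming convention; in practice you would use an existing library rather than hand-rolling this.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimpleLinearRegression:
    """Hypothetical helper: fits y = slope * x + intercept by ordinary least squares."""

    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> SimpleLinearRegression:
        # Validate inputs before doing any math.
        if len(xs) != len(ys) or len(xs) < 2:
            raise ValueError("need at least two paired observations")
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        # Closed-form OLS: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("x values must not all be equal")
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs: Sequence[float]) -> list[float]:
        # Apply the fitted line to each input value.
        return [self.slope * x + self.intercept for x in xs]


model = SimpleLinearRegression().fit([1, 2, 3, 4], [3, 5, 7, 9])
print(model.slope, model.intercept)  # 2.0 1.0
print(model.predict([5]))  # [11.0]
```

The data here lie exactly on y = 2x + 1, so the fit recovers slope 2.0 and intercept 1.0, which makes it a convenient sanity check to keep as a unit test.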

Most companies hiring specifically for Machine Learning Scientists require a PhD or more than 5 years of experience.

Lastly, this is a fast-changing industry, and the requirements can be dramatically different in 3 to 5 years. So I'd suggest interviewing with real companies every year.



Su recommends the following next steps:

  • Find a particular area of data science to start with
  • Go through the list of books/courses I shared above (and more)
  • Big data skills (Hadoop, Spark, etc.) are a good plus
  • Interview with real companies every year
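To give a feel for the MapReduce programming model that Hadoop popularized, and that Spark generalizes, here is a toy, single-process word count in plain Python. It only mimics the map, shuffle, and reduce phases in memory. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical and are not part of Hadoop's or Spark's API; real frameworks run each phase in parallel across many machines.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all counts emitted for one word.
    return key, sum(values)


lines = ["spark and hadoop", "hadoop map reduce", "Spark SQL"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(sorted(counts.items()))
```

Because each map call and each reduce call only sees its own record or key, the work can be split across machines; that independence is what lets these frameworks scale.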