5 answers

What is the difference between data science and machine learning?

I've been looking into data science careers, and I know that the field is closely related to machine learning and big data. I'm confused about what the difference between data science and machine learning is, and also how big data plays a part in both fields. What exactly are data science and machine learning, and how are they related to each other (and how does big data tie into them)? Any help would be greatly appreciated.

#data-analysis #data-science #computer-science #computer-software #big-data #machine-learning #data-visualization #data #data-mining

Kurt’s Answer

"Data science" is the broader category, and "machine learning" is a subset, kind of like how "medicine" is a broad category and "heart surgery" is a smaller subset of that discipline. So "all machine learning is a type of data science, but not all data science involves machine learning"... ;-)

Benjamin’s Answer

Data science is a discipline that works with machine learning and Big Data, as well as many other things. I work as a Data Scientist, and while I do use machine learning and Big Data in my job, it is not all I do. Also, you need to consider that there are different types of data scientists.


Machine learning, at its most basic, means building a predictive model by feeding data to an algorithm. Let us imagine we have a list of houses that sold recently. We have two columns, one with the square footage of the house and one with the price the house sold for. We could feed this data into a machine learning algorithm and it will build a model for us. Now if I ask the model how much a 2000 sq ft house will sell for, the model will give us a price based on the list of prices we gave it. Obviously we use much more complex data sets with many, many more variables, but at the end of the day machine learning boils down to asking a computer to classify an object (is the picture a cat or a dog?), provide a numeric value (regression; think of the house price example), or cluster (see how data is best grouped based on attributes; think of all the students in your high school and how they can be grouped: jocks, drama club kids, nerds, popular crowd).
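To make the house price example concrete, here is a minimal sketch in Python. It assumes a handful of made-up sales and fits a straight line through them with ordinary least squares, which is about the simplest regression model there is. The numbers and the function name fit_line are invented for illustration; real work would use many more variables and usually a library.

```python
# Hypothetical recent sales: (square footage, sale price in dollars).
# These figures are made up purely for illustration.
sales = [
    (1000, 200_000),
    (1500, 290_000),
    (2500, 510_000),
    (3000, 600_000),
]


def fit_line(points):
    """Fit price = slope * sqft + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training" the model is just computing the two numbers that define the line.
slope, intercept = fit_line(sales)

# Ask the model about a house it has never seen.
predicted_price = slope * 2000 + intercept
print(f"Predicted price for 2000 sq ft: ${predicted_price:,.0f}")
```

The "model" here is nothing more than a slope and an intercept learned from the data. Classification and clustering follow the same pattern: learn something from examples, then apply it to new cases.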


Big Data refers to massive, fast-moving data sets. It is a popular term, but not all data science or machine learning involves Big Data. Twitter is a great example of Big Data, with millions of tweets every few minutes.


In my case, I am what you might call an operational data scientist. I work in financial compliance at Verizon, helping to hunt down people who are "gaming the system" or stealing from us by using loopholes in our policies. The biggest part of my job is finding, gathering, and cleaning data so I can analyze it. Once I have the data, I may run it through a machine learning algorithm to create a predictive model that may help us predict which people we should look at more closely (making the haystack a little smaller so the needle is easier to find).
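As a rough illustration of "making the haystack smaller", the sketch below scores accounts by how closely they resemble past confirmed cases, then sorts them so a reviewer can start at the top. Everything in it is hypothetical: the two features, the account names, the numbers, and the choice of k-nearest neighbors, which is just one simple technique and not necessarily what any real company uses.

```python
import math

# Hypothetical labeled history (made-up numbers):
# (refunds per month, device swaps per year) -> 1 if abuse was confirmed, else 0.
history = [
    ((0.0, 0.0), 0),
    ((1.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((1.0, 1.0), 0),
    ((5.0, 4.0), 1),
    ((6.0, 5.0), 1),
    ((5.0, 6.0), 1),
]


def risk_score(features, labeled, k=3):
    """Fraction of the k nearest labeled cases that were confirmed abuse."""
    nearest = sorted(labeled, key=lambda item: math.dist(features, item[0]))[:k]
    return sum(label for _, label in nearest) / k


# Hypothetical accounts we have not reviewed yet.
accounts = {
    "acct_a": (0.5, 0.5),
    "acct_b": (5.5, 5.0),
    "acct_c": (3.0, 2.5),
}

# Highest score first: these are the accounts to look at more closely.
ranked = sorted(
    accounts,
    key=lambda name: risk_score(accounts[name], history),
    reverse=True,
)

for name in ranked:
    print(name, round(risk_score(accounts[name], history), 2))
```

The model does not decide who is guilty. It only orders the list so that human attention goes where it is most likely to matter.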


A Big Data example I worked on at another company involved the voice recordings of people calling customer service. I was able to identify certain speech patterns that were more likely to be used by someone trying to commit some type of fraud. We were able to use this information to alert the customer care reps about whom to be on the lookout for.

Benjamin recommends the following next steps:

  • Check out Kaggle, a great site for learning about data science and machine learning
  • Check out my website analytics4all.org - It is an introductory website designed to give an overview of everything from databases to machine learning and Big Data

Michelle (Guqian)’s Answer

Data Science is a rather broad field that covers many areas, and machine learning is one of them. Data Science in the industry currently has three major tracks: analytics, generalist, and machine learning.


  • The analytics track requires only a minimal statistical background, but it does require keen business sense and the ability to break business problems down into different aspects and do deep dives. The major skills needed for this track are data pulling, data processing, and dashboarding.
  • The generalist track requires you to solve a business or product problem end-to-end. You need to understand the real problem, have good business sense, and be able to come up with a solution using a statistical or modeling approach.
  • The machine learning track requires you to understand the problem and figure out which ML techniques are suitable, which models you could apply, and how to fine-tune them with a reasonable performance evaluation. You would also need to know how to get your model built into the product, how to evaluate its real-time performance, etc. Sometimes it is not simply a matter of building one model; it can become an ML system design problem involving multiple components.

Michelle (Guqian) recommends the following next steps:

  • Read some articles on how data science techniques are used in industry, and get a rough idea of which area interests you most.
  • Machine learning is also a broad area, and you could go really deep with it if you want. You could take Andrew Ng's famous Machine Learning course on Coursera to see whether you like it! https://www.coursera.org/learn/machine-learning

Bonnie’s Answer

Tagging on to Benjamin’s answer, I recommend Udacity’s free online courses. They offer a free Stanford Introduction to Machine Learning course as well as Intro to Data Science courses.

Bonnie recommends the following next steps:

  • Visit Udacity.com and choose from a list of over 200 free courses

Kulwinder’s Answer

Data science covers the algorithms and the processing methodology for the entire body of data.

Machine learning involves applying different algorithms to the data to get the best output.
