What is the difference between data science and machine learning?
I've been looking into data science careers, and I know that the field is closely related to machine learning and big data. I'm confused about what the difference between data science and machine learning is, and about how big data plays a part in both fields. What exactly are data science and machine learning, and how are they related to each other (and how does big data tie into them)? Any help would be greatly appreciated.
#data-analysis #data-science #computer-science #computer-software #big-data #machine-learning #data-visualization #data #data-mining
"data science" is a more broad category, and "machine learning" is a subset. kinda like "medicine" would be a broad category and "heart surgery" would be a smaller subset of that discipline. So "all machine learning is a type of data science, but not all data science involves machine learning"... ;-)
Data science is a discipline that works with machine learning and Big Data, as well as many other things. I work as a Data Scientist, and while I do use machine learning and Big Data in my job, it is not all I do. Also, you need to consider that there are different types of data scientists.
Machine learning is, at its most basic, building a predictive model by feeding data to an algorithm. Let us imagine we have a list of houses that sold recently. We have two columns, one with the square footage of the house and one with the price the house sold for. We could feed this data into a machine learning algorithm and it will build a model for us. Now if I ask the model how much a 2000 sq ft house will sell for, the model will give us a price based on the list of prices we had given it. Obviously we use much more complex data sets with many, many more variables, but at the end of the day machine learning boils down to asking a computer to either classify an object (is the picture a cat or a dog?), provide a numeric value (regression - think of the house price example), or cluster (see how data should be best grouped based on attributes - think of all the students in your high school and how they can be grouped: jocks, drama club kids, nerds, popular crowd).
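To make the house price example concrete, here is a minimal sketch in plain Python. It assumes the simplest possible model (a straight line fitted by ordinary least squares), and the sales figures are made up for illustration; real projects would normally use a library and far more variables.

```python
# Hypothetical recent sales: (square footage, sale price in dollars).
# These numbers are invented purely for illustration.
sales = [
    (1000, 200_000),
    (1500, 275_000),
    (1800, 320_000),
    (2400, 410_000),
    (3000, 500_000),
]


def fit_line(points):
    """Ordinary least squares with one input: price = slope * sqft + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, sqft):
    """Ask the fitted model for the price of a house of the given size."""
    slope, intercept = model
    return slope * sqft + intercept


model = fit_line(sales)
print(f"Estimated price for a 2000 sq ft house: ${predict(model, 2000):,.0f}")
```

The "learning" here is just finding the slope and intercept that best match the past sales; asking about a 2000 sq ft house then means plugging that size into the fitted line.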
Big Data refers to massive, fast-moving data sets. It is a popular term, but not all data science or machine learning involves Big Data. Twitter is a great example of big data, with millions of tweets every few minutes.
In my case, I am what you might call an operational data scientist. I work in financial compliance at Verizon, helping to hunt down people who are "gaming the system" or stealing from us by using loopholes in our policies. The biggest part of my job is finding, gathering, and cleaning data so I can analyze it. Once I have the data, I may run it through a machine learning algorithm to create a predictive model that may help us predict which people we should look at more closely (make the haystack a little smaller so the needle is easier to find).
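As a rough sketch of that workflow (clean the data, score it, then shrink the haystack), something like the following could work. Everything here is hypothetical: the field names, the records, and especially `risk_score`, which is only a stand-in for whatever a trained model would output. It is not the actual process used at any company.

```python
# Hypothetical raw account records; field names and values are invented.
raw_records = [
    {"account_id": " A100 ", "refunds": "7", "tenure_months": "3"},
    {"account_id": "a100", "refunds": "7", "tenure_months": "3"},  # duplicate once cleaned
    {"account_id": "B200", "refunds": "", "tenure_months": "40"},  # missing value
    {"account_id": "C300", "refunds": "1", "tenure_months": "60"},
    {"account_id": "D400", "refunds": "5", "tenure_months": "2"},
]


def clean(records):
    """Normalize IDs, convert text to numbers, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        account_id = rec["account_id"].strip().upper()
        if not rec["refunds"] or not rec["tenure_months"]:
            continue  # skip rows with missing fields
        if account_id in seen:
            continue  # skip accounts we have already kept
        seen.add(account_id)
        cleaned.append({
            "account_id": account_id,
            "refunds": int(rec["refunds"]),
            "tenure_months": int(rec["tenure_months"]),
        })
    return cleaned


def risk_score(rec):
    """Stand-in for a trained model's output: refunds per month of tenure."""
    return rec["refunds"] / rec["tenure_months"]


def shortlist(records, top_n):
    """Keep only the top_n highest-scoring accounts for manual review."""
    return sorted(records, key=risk_score, reverse=True)[:top_n]


accounts = clean(raw_records)
review_queue = shortlist(accounts, top_n=2)
print([rec["account_id"] for rec in review_queue])
```

Note how much of the code is cleaning rather than modeling, which matches the point that gathering and cleaning data is the biggest part of the job.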
A big data example I worked on with another company used the voice recordings of people calling customer service. I was able to determine certain speech patterns that were more likely to be used by someone trying to commit some type of fraud. We were able to use this information to alert the customer care reps about who to be on the lookout for.
Benjamin recommends the following next steps:
Data Science is a rather broad field that covers many areas, and machine learning is one of them. Data Science in the industry currently has three major tracks: analytics, generalist, and machine learning.
- The analytics track requires minimal statistical background, but it does require keen business sense and the ability to break business problems down into different aspects and do deep dives. Major skills needed for this track are data pulling, data processing, and dashboarding.
- The generalist track requires you to solve a business / product problem end-to-end. You need good business sense, the ability to understand the real problem, and the ability to come up with a solution using a statistical or modeling approach.
- The machine learning track requires you to understand the problem, figure out which ML techniques and models are suitable, and fine-tune them with reasonable performance evaluation. You also need to know how to get your model built into the product, how to evaluate its real-time performance, etc. Sometimes the issue is not simply building one model; it can become an ML system design problem involving multiple components.
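To give a feel for what "performance evaluation" can mean on the machine learning track, here is a small sketch that computes precision and recall for a yes/no classifier. The labels and predictions are made up, and these two metrics are only one common choice among many; which metric is reasonable depends on the problem.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no positive predictions/labels.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical held-out labels and the model's predictions for them.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "when the model says yes, how often is it right?" and recall answers "of the real yes cases, how many did the model catch?". Tuning a model usually means trading one off against the other.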
Michelle (Guqian) recommends the following next steps:
People use the term data science more broadly: it definitely includes machine learning and AI, and it can also include more traditional modeling and statistics.
Every day new buzzwords are introduced to refer to the same technology. My advice, if you are interested in learning technology, is to stay away from tech marketing and focus on the fundamentals: mathematics, statistics, and computer science!
Bonnie recommends the following next steps:
Machine learning creates a useful model or program by autonomously testing many solutions against the available data and finding the best fit for the problem. This means machine learning is great at solving problems that are extremely labor intensive for humans. It can inform decisions and make predictions about complex topics in an efficient and reliable way.
These strengths make machine learning useful in a huge number of different industries. The possibilities for machine learning are vast. This technology has the potential to save lives and solve important problems in healthcare, computer security and more. Google, always on the cutting edge, has decided to integrate machine learning into everything they do to stay ahead of the curve.
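The idea of "testing many solutions against the available data and finding the best fit" can be illustrated with a deliberately naive sketch: try a list of candidate values for a single model parameter and keep the one with the smallest error. The data is made up, and real algorithms search far more cleverly (for example with gradient descent) over many parameters at once.

```python
# Made-up (input, output) pairs that roughly follow output = 3 * input.
data = [(1, 3.1), (2, 5.9), (3, 9.2), (4, 11.8)]


def squared_error(weight, points):
    """Total squared difference between the model's guesses and the real outputs."""
    return sum((weight * x - y) ** 2 for x, y in points)


# Candidate "solutions": weights from 0.0 to 5.0 in steps of 0.1.
candidates = [i / 10 for i in range(51)]

# Test every candidate against the data and keep the best-fitting one.
best_weight = min(candidates, key=lambda w: squared_error(w, data))
print(f"best weight: {best_weight}")
```

A human could eyeball this tiny data set, but the same try-and-score loop scales to problems with millions of rows, which is where the labor savings come from.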
Data Science Process
The proliferation of smartphones and digitization of so many parts of daily life have created massive amounts of data. At the same time, the continuation of Moore’s Law, the idea that computing would dramatically increase in power and decrease in relative cost over time, has made cheap computing power widely available. Data science exists as the link between these two innovations. By combining these components, data scientists can derive more insight from data than ever before.
The practice of data science requires a unique combination of skills and experience. A good data scientist is fluent in programming languages like R and Python, has knowledge of statistical methods, understands database architecture, and has the experience to apply these skills to real-world problems. A master's in data science may build upon existing knowledge to ensure that you are best prepared for a long career in this ever-growing field.
Data Scientist vs Machine Learning Engineer
Skills Needed for Data Scientists
- Data mining and cleaning
- Unstructured data management techniques
- Programming languages such as R and Python
- Understand SQL databases
- Use big data tools like Hadoop, Hive and Pig
Skills Needed for Machine Learning Engineers
- Computer science fundamentals
- Data evaluation and modeling
- Understanding and application of algorithms
- Natural language processing
- Data architecture design
- Text representation techniques