How do I get started in the field of data science/big data/machine learning?
I'm a third-year computer science major, and I've been trying to learn more about data science/big data/machine learning. There aren't any classes offered at the college that I'm attending that cover these topics, and I'm a little lost as to how I would get started in any of these fields. What are some resources that I could use to learn more about these fields and get a taste of what they're like? Any help would be appreciated.
#computer-software #data-science #data-analysis #big-data #machine-learning #software-engineering #engineering #analytics #computer-engineering
Excited to see you interested in data science and machine learning. It can be a daunting space to get up and running with, so I'll focus mostly on getting started with data science and machine learning. For big data, you're really looking at a set of technologies (e.g. Hadoop, various AWS tools, Spark, etc.). You can learn these on your own by downloading and experimenting with them, but realistically most new professionals are first exposed to them on the job. What's more important is learning to think about how to parallelize and scale computation tasks to take advantage of those technologies.
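To give a feel for that way of thinking, here is a minimal sketch of the map/reduce pattern using only the Python standard library. It is not how Hadoop or Spark are actually programmed; the function names (`count_words`, `merge_counts`, `parallel_word_count`) and the sample text are made up for illustration. The point is the shape of the problem: split the data into chunks, process each chunk independently, then merge the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    return Counter(chunk.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def parallel_word_count(chunks, workers=4):
    # Each chunk is processed on its own, so the map step can be spread
    # across workers (threads here; separate machines in a real cluster).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # The partial results are merged once all chunks are done.
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical sample data, standing in for files spread over many machines.
chunks = ["big data big ideas", "data beats ideas", "big big data"]
totals = parallel_word_count(chunks)
print(totals.most_common(2))
```

A thread pool in plain Python will not make this particular job faster; it only illustrates the structure. The habit worth building is asking which parts of a task can run independently and how the pieces get combined afterwards.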
On the data science/ML front, first check whether the statistics department offers classes on statistical learning. Regardless of whether data science/ML classes are available, you'll want to build a solid base in statistics and probability.
If they're offered, take the following classes to help create a solid base:
- intro to statistics and probability
- Bayesian inference
- linear algebra and differential equations
- Calculus I and II for machine learning
Machine learning builds on what you learn in the classes above and gives you a set of tools and techniques to tackle various problems. Most machine learning engineers use the same set of software packages to get started (e.g. Python and the following Python libraries: scikit-learn, NumPy, pandas, etc.). The best way to learn is by doing. I'd suggest taking Udacity's Intro to Machine Learning class or following the ML tutorials on Kaggle. The Udacity class is free and provides a great application-based approach to get you up and running with machine learning. Once you have a taste of the common techniques (e.g. regression, decision trees, k-nearest neighbors, etc.), you'll want to get more real-world experience with more complex problems and data sets. Kaggle is a great resource here. They host data science competitions where anyone can participate and the top competitors can even win prizes. The competitions will give you a fantastic introduction to a diverse set of problems, and you'll get to see what cutting-edge ML techniques are being used (hint: it's almost always XGBoost lol).
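As a taste of one of those common techniques, here is a rough sketch of k-nearest neighbors written with only the Python standard library (it needs Python 3.8+ for `math.dist`). In practice you would reach for a library implementation; the function name `knn_predict` and the toy data below are assumptions made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two made-up clusters in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.2)))
```

Once this makes sense, the library versions you meet in a course or on Kaggle should feel much less like magic.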
Finally, talk to your career office and look for internship opportunities. Good luck!
- Statistics and Probability
- Regression Analysis
- Time Series Analysis
Happy to see that you're interested in getting into the field of Data Science/ML.
To be good at your job as a data scientist (DS), you need to be great at three things.
1) Mathematics - This includes Statistics + Calculus.
3) A programming language, for example Python.
A lot of people make the mistake of jumping right into writing ML code without understanding the underlying algorithms. They focus only on calling a few packages and running code with them. This might give them quick results, but it won't benefit them in the long run when they start solving real-world problems.
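To show what understanding the underlying algorithm can look like, here is a small sketch of simple linear regression (ordinary least squares with one input) written from scratch in Python. The function name `fit_line` and the sample numbers are made up for illustration; a package would give the same answer, but writing it once by hand shows where the slope and intercept come from.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Note that this sketch assumes the x values are not all identical; otherwise the variance is zero and the division fails.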
The second key aspect is practice on real data. Go on Kaggle.com and work on some real problems.
Get access to online datasets and try to clean them. Data cleaning is often said to take up around 80% of a data scientist's usual day.
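If you have never cleaned data before, here is a minimal sketch of what it can involve, using only Python's built-in `csv` module. The dataset, the column names, and the cleaning rules (trim whitespace, normalize capitalization, drop rows with a missing or non-numeric age, drop duplicates) are all made up for illustration; real datasets will need their own rules.

```python
import csv
import io

# Hypothetical messy data: stray spaces, a missing age, a duplicate, a bad value.
RAW = """name,age,city
 Alice ,34,Boston
Bob,,Chicago
alice,34,boston
Carol,not available,Denver
Dan,41, Austin
"""


def clean_rows(raw_text):
    """Return tidy, de-duplicated rows with the age parsed as an integer."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Normalize whitespace and capitalization in the text fields.
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Skip rows whose age is missing or not a whole number.
        if not age_text.isdigit():
            continue
        record = (name, int(age_text), city)
        # Skip rows that repeat an earlier one after normalization.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "age": record[1], "city": city})
    return cleaned


cleaned = clean_rows(RAW)
for row in cleaned:
    print(row)
```

Silently dropping rows is a choice, not a rule; on a real project you would also want to count what was dropped and why.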
Intern with Startups and companies to work on Projects with them.
For everything written above, you have stuff available online for free.
With this knowledge, it would be pretty straightforward to get into a DS/ML engineer role.
I am happy to see that you have an interest in Data Science! I personally think anyone with an interest in learning more about math/statistics and computer programming can carve a path for themselves to become an incredible data scientist - even if you don't have a degree specific to that field. In your case, studying computer science is a great first step.
In addition to the advice already given here, I would highly suggest more hands-on activities that you can do anywhere, anytime (as long as you are connected to the internet).
Here are my recommendations:
1) Sign up for this Udemy course offered by Kirill Eremenko: Machine Learning A-Z: Hands on Python and R in Data Science
2) Data Science weekly newsletter: https://www.datascienceweekly.org/
This newsletter gives you a better understanding of how others in the field are using data science to solve business problems. Here you will also find career advice, job/internship opportunities, book recommendations, etc. It's really a great resource to scan weekly for additional insights from professionals in the field.
3) You can also check out Kirill Eremenko's own website where he offers free tutorials, advice, bootcamps, etc.
I personally took advantage of the resources above to get a better understanding of the skills and tools I need to succeed in Data Science. Kirill Eremenko is a great teacher and makes his lectures very easy to follow, fun to listen to, and applicable to Data Science and real-life business problems. You can also use some of the skills you learn from his Udemy course to create a side project and post it on GitHub for future employers to check out!
Best of luck and never give up! :)