4 answers

How do I get started in the field of data science/big data/machine learning?

Asked Los Angeles, California

I'm a third year computer science major, and I've been trying to learn more about data science/big data/machine learning. There aren't any classes offered at the college that I'm attending that cover these topics, and I'm a little lost as to how I would get started in any of these fields. What are some resources that I could use to learn more about these fields and get a taste of what they're like? Any help would be appreciated.

#computer-software #data-science #data-analysis #big-data #machine-learning #software-engineering #engineering #analytics #computer-engineering


Dhairya’s Answer

Updated Boston, Massachusetts
Hi Albert,

Excited to see you interested in data science and machine learning. It can be a daunting space to get up and running with. I'll talk about getting up and running with data science and machine learning.

For big data, you're really looking at a set of technologies (e.g. Hadoop, various AWS tools, Spark, etc.). You can learn these on your own by downloading and experimenting with them, but realistically most new professionals are exposed to them on the job. What's more important is learning to think about how to parallelize and scale computation tasks to take advantage of those technologies.

On the data science/ML front, first check whether the statistics department offers classes on statistical learning. Regardless of whether data science/ML classes are available, you'll want to build a solid base in statistics and probability. If offered, take the following classes to help create that base:

- Intro to statistics and probability
- Bayesian inference
- Linear algebra and differential equations (optional)
- Calculus I and II for machine learning

Machine learning builds on what you learn in the classes above and gives you a set of tools and techniques to tackle various problems. Most machine learning engineers use the same set of software packages to get started (e.g. Python and the following Python libraries: scikit-learn, NumPy, pandas, etc.).

The best way to learn is by doing. I'd suggest taking Udacity's Intro to Machine Learning class or following the ML tutorials on Kaggle. The Udacity class is free and provides a great application-based approach to get you up and running with machine learning.

Once you have a taste of the common techniques (e.g. regression, decision trees, k-nearest neighbors, etc.), you'll want to get more real-world experience with more complex problems and data sets. Kaggle is a great resource here. They host data science competitions where anyone can participate, and the top competitors can even win prizes. The competitions will give you a fantastic introduction to a diverse set of problems, and you'll get to see what cutting-edge ML techniques are being used (hint: it's almost always XGBoost lol).

Finally, talk to your career office and look for internship opportunities. Good luck!
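
To give a rough feel for one of the common techniques mentioned above, here is a minimal sketch of k-nearest neighbors in plain Python. The function name `knn_predict` and the toy data are made up for illustration. In practice you would more likely reach for a library such as scikit-learn (its `KNeighborsClassifier`) than write this by hand.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration only, not a library API.
    """
    # Euclidean distance from the query to every training point, paired with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (invented): two small clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # query near the first cluster
print(knn_predict(points, labels, (5.1, 5.0)))  # query near the second cluster
```

Writing a technique out once like this can make it easier to understand what the library version is doing when you move on to real data sets.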
Updated
Hi Dhairya, thanks for the helpful advice! I will definitely check out the courses and videos. I'm leaning toward data science/machine learning but I will also look into big data as well.
Updated
If you are checking out big data, I'd suggest experimenting with MapReduce and the Python library mrjob (https://pythonhosted.org/mrjob/). MapReduce is a fundamental technique for parallelizing computation tasks on any big data platform (e.g. Hadoop).
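
As a rough illustration of the idea, here is a minimal word-count sketch of the map, shuffle, and reduce steps in plain Python. It does not use mrjob or Hadoop, it runs on a single machine, and the names `mapper`, `reducer`, and `run_word_count` are hypothetical. On a real platform the map and reduce steps would be spread across many machines.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: combine all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Single-process stand-in for a MapReduce job (illustration only)."""
    # Shuffle step: group the mapper output by key (the word).
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    # Apply the reducer once per key.
    return dict(reducer(word, counts) for word, counts in grouped.items())


# Invented sample input.
lines = ["big data is big", "data science uses data"]
print(run_word_count(lines))
```

Because each line is mapped independently and each word is reduced independently, both steps can be split across workers, which is what makes the pattern scale.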
Updated
Hello Albert. Look into a Nanodegree in data science. This is a good way to see if you will really like the profession.

Daria’s Answer

Updated

Hello Albert,

I am happy to see that you have an interest in Data Science! I personally think anyone with an interest in learning more about math/statistics and computer programming can carve a path for themselves to become an incredible data scientist, even without a degree specific to that field. In your case, studying computer science is a great first step.


In addition to the advice already given here, I would highly suggest more hands-on activities that you can do anywhere, anytime (as long as you are connected to the internet).

Here are my recommendations:

1) Sign up for this Udemy course offered by Kirill Eremenko: Machine Learning A-Z: Hands on Python and R in Data Science

2) Data Science weekly newsletter: https://www.datascienceweekly.org/

This newsletter gives you a better understanding of how others in the field are using data science to solve business problems. Here you will also find career advice, job/internship opportunities, book recommendations, etc. It's really a great resource to scan weekly for additional insights from professionals in the field.

3) You can also check out Kirill Eremenko's own website where he offers free tutorials, advice, bootcamps, etc.


I personally took advantage of the resources above to get a better understanding of the skills and tools I need to succeed in Data Science. Kirill Eremenko is a great teacher and makes his lectures very easy to follow, fun to listen to, and applicable to Data Science and real-life business problems. You can also use some of the skills you learn from his Udemy tutorial to create a side project and post it on GitHub for future employers to check out!


Best of luck and never give up! :)

Eden’s Answer

Updated Seattle, Washington

Hi Albert,


I will be taking those classes myself in the next two months. Fortunately, my school offers a variety of computer-related courses, and Machine Learning is one of them. We are required to have a very good understanding of Discrete Mathematics and C programming before taking those classes. So, if you think you are good at those, you can easily learn from YouTube lecture tutorials. If you are not familiar with them, I would recommend first getting a good understanding of Discrete Math and C programming, which can also be learned online.


Once I register for that class, I will tell you more about it. I may also share resources and helpful sites soon. Good luck until then.


Jesmin K’s Answer

Updated Frisco, Texas

Learn the basics of data analysis and statistics using online courses. Install and gain familiarity with data tools such as Tableau, Hadoop, Python and R.

Jesmin K recommends the following next steps:

  • Do small projects to master the skills acquired.