8 answers
9
Asked 1519 views

How do I get started in the field of data science/big data/machine learning?

I'm a third year computer science major, and I've been trying to learn more about data science/big data/machine learning. There aren't any classes offered at the college that I'm attending that cover these topics, and I'm a little lost as to how I would get started in any of these fields. What are some resources that I could use to learn more about these fields and get a taste of what they're like? Any help would be appreciated.

#computer-software #data-science #data-analysis #big-data #machine-learning #software-engineering #engineering #analytics #computer-engineering

Dhairya’s Answer

Hi Albert,
Excited to see you interested in data science and machine learning. It can be a daunting space to get started in. I'll focus mostly on data science and machine learning. For big data, you're really looking at a set of technologies (e.g. Hadoop, various AWS tools, Spark, etc.). You can learn these on your own by downloading and experimenting with them, but realistically most new professionals are first exposed to them on the job. What's more important is learning to think about how to parallelize and scale computation tasks to take advantage of those technologies.

On the data science/ML front, first check whether the statistics department offers classes on statistical learning. Regardless of whether data science/ML classes are available, you'll want to build a solid base in statistics and probability.

If offered, take the following classes to help create a solid base:
- Intro to statistics and probability
- Bayesian inference
- Linear algebra and differential equations

Optional:
- Calculus I and II for machine learning

Machine learning builds on what you learn in the classes above and gives you a set of tools and techniques to tackle various problems. Most machine learning engineers use the same set of software packages to get started (e.g. Python and the following Python libraries: scikit-learn, NumPy, pandas, etc.). The best way to learn is by doing. I'd suggest taking Udacity's Intro to Machine Learning class or following the ML tutorials on Kaggle. The Udacity class is free and provides a great application-based approach to get you up and running with machine learning. Once you have a taste of the common techniques (e.g. regression, decision trees, k-nearest neighbors, etc.) you'll want to get more real-world experience with more complex problems and data sets. Kaggle is a great resource here. They host data science competitions where anyone can participate and the top competitors can even win prizes. The competitions will give you a fantastic introduction to a diverse set of problems, and you'll get to see what cutting-edge ML techniques are being used (hint: it's almost always XGBoost lol).
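To give a taste of one of the common techniques named above, here is a minimal sketch of k-nearest neighbors in plain Python. It is only an illustration: the function name `knn_predict` and the toy data are made up for this example, and in practice you would normally reach for a library such as scikit-learn (for instance its `KNeighborsClassifier`) rather than writing this yourself.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Take the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up data: (height_cm, weight_kg) -> label
train = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((160, 58), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
    ((190, 95), "large"),
]

print(knn_predict(train, (158, 57)))
print(knn_predict(train, (183, 88)))
```

Writing it out once like this makes it easier to see what a library call is doing for you: there is no training step at all, just a distance computation and a vote.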

Finally, talk to your career office and look for internship opportunities. Good luck!
Hi Dhairya, thanks for the helpful advice! I will definitely check out the courses and videos. I'm leaning toward data science/machine learning but I will also look into big data as well. - Albert
If you are checking out big data, I'd suggest experimenting with MapReduce and the Python library mrjob (https://pythonhosted.org/mrjob/). MapReduce is a fundamental technique for parallelizing computation tasks on big data platforms such as Hadoop. - Dhairya Dalal
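For anyone curious what the MapReduce idea looks like before installing anything, here is a rough single-process sketch of a word count. It deliberately does not use mrjob or Hadoop; the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are just illustrative choices for this example, and a real framework would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster the map calls run in parallel on different machines;
    # here they simply run one after another in a single process.
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data science is fun"])
print(counts)
```

The point of splitting the work this way is that each `mapper` call only needs one line and each `reducer` call only needs one key, so neither step depends on seeing the whole data set at once.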
Hello Albert. Look into a Nanodegree in data science. This is a good start to see if you will really like the profession. - Nathan Fricke

Animesh’s Answer

https://developers.google.com/machine-learning/crash-course

Coursera: Machine Learning course by Andrew Ng

Disclaimer: I haven't taken these courses myself, but I plan to in the near future.

Srini’s Answer

Coursera offers online courses at very low cost: https://www.coursera.org/

You can start with the courses offered for Data Science, Machine Learning and Big Data and later you can get into advanced courses.

Dominic’s Answer

As a computer science student, you can use your free time to expand your knowledge of data science through self-learning. There are several online courses and nanodegree-style programs you can take on Coursera, Udacity, Udemy, etc. I would advise you to start with the concepts you find easier and work your way up. Applying what you learn to real-world problems or hackathons will not only reinforce your knowledge but also help you build a good portfolio on GitHub that you can later use to demonstrate your skills to potential employers. Finally, you can join data science communities on Kaggle, LinkedIn, TowardsDataScience and other platforms to stay engaged and learn from peers. Good luck!

Eric’s Answer

You'll find that data science and machine learning packages in Python and R are becoming easier to use, but building a model is not the goal. Interpreting models and results is, and that requires fundamental knowledge of the math and statistics behind them. I'd look for classes in the Statistics department, e.g.:
- Statistics and Probability
- Regression Analysis
- Time Series Analysis
- Econometrics

Anant’s Answer

Hi Albert,

Happy to see that you're interested in getting into the field of Data Science/ML.
To be good at your job as a data scientist (DS), you need to be great at three things.

1) Mathematics: this includes statistics and calculus.
2) Databases.
3) A coding language, for example Python.

A lot of people make the mistake of jumping right into writing ML code without understanding the underlying algorithms. They focus only on calling a few packages and running code with them. This might give them quick results, but it won't help in the long run when they start solving real-world problems.
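As one small illustration of looking under the hood, here is a sketch of simple linear regression (ordinary least squares for a single input) written out by hand instead of calling a package. The function name `fit_line` and the data are made up for this example; the formula used is the standard one, slope = covariance(x, y) / variance(x).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Once you have worked through something like this, a library's regression function stops being a black box, and it becomes easier to judge when its results can be trusted.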

The second key aspect is practice on real data. Go to Kaggle.com and work on some real problems.
Get access to online datasets and try to clean them. Data cleaning is 80% of a usual day in a DS's life.
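To make "cleaning" concrete, here is a small sketch using only Python's standard library. The data, the column names, and the `clean_rows` function are all invented for this example, and the cleaning rules (trim whitespace, normalize capitalization, drop rows with an unusable age, drop duplicates) are just one reasonable set of choices. On real projects people often use pandas for this kind of work.

```python
import csv
import io

# Made-up raw data with typical problems: stray whitespace, inconsistent
# capitalization, a missing value, a non-numeric entry, and a duplicate row.
RAW = """name,age,city
 Alice ,34,boston
Bob,,Chicago
carol,twenty,Denver
 Alice ,34,boston
Dave,41, austin
"""


def clean_rows(text):
    """Return cleaned, de-duplicated rows; drop rows whose age is unusable."""
    cleaned, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        if not age_text.isdigit():
            continue  # missing or non-numeric age
        record = (name, int(age_text), city)
        if record in seen:
            continue  # exact duplicate after normalization
        seen.add(record)
        cleaned.append({"name": name, "age": record[1], "city": city})
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Note that dropping rows is itself a decision with consequences; on a real data set you would want to count how many rows were discarded and why before trusting any analysis built on what remains.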

Intern with startups and companies to work on projects with them.

For everything written above, you have stuff available online for free.

With this knowledge, it would be pretty straightforward to get into a DS/ML engineer role.

Best,
Anant

Jesmin K’s Answer

Learn the basics of data analysis and statistics using online courses. Install and gain familiarity with data tools such as Tableau, Hadoop, Python, and R.

Jesmin K recommends the following next steps:

Do small projects to master the skills acquired.

Daria’s Answer

Hello Albert,

I am happy to see that you have an interest in Data Science! I personally think anyone with an interest in learning more about math/statistics and computer programming can carve a path for themselves to become an incredible data scientist, even without a degree specific to that field. In your case, studying computer science is a great first step.


In addition to the advice already given here, I would highly suggest more hands-on activities that you can do anywhere, anytime (as long as you are connected to the internet).

Here are my recommendations:

1) Sign up for this Udemy course offered by Kirill Eremenko: Machine Learning A-Z: Hands-On Python and R in Data Science

2) Data Science weekly newsletter: https://www.datascienceweekly.org/

This newsletter gives you a better understanding of how others in the field are using data science to solve business problems. Here you will also find career advice, job/internship opportunities, book recommendations, etc. It's really a great resource to scan weekly for additional insights from professionals in the field.

3) You can also check out Kirill Eremenko's own website where he offers free tutorials, advice, bootcamps, etc.


I personally took advantage of the resources above to get a better understanding of the skills and tools I need to succeed in Data Science. Kirill Eremenko is a great teacher and makes his lectures very easy to follow, fun to listen to, and applicable to Data Science and real-life business problems. You can also use some of the skills you learn from his Udemy tutorial to create a side project and post it on GitHub for future employers to check out!


Best of luck and never give up! :)
