How to gain more data science knowledge quickly
I'm in high school and I've been going on Kaggle and doing lots of learning on Kaggle Learn, Coursera, etc. I've learned about models and fitting and datasets and cross-validation and all the beginner machine learning stuff. Now I'm onto ensembles and beyond. I need advice on how to quickly gain a lot of data science knowledge. There is a lot I want to learn, but I'm afraid I'm not making the most efficient use of my time. Right now I'm trying to apply the stuff I learn in Kaggle competitions and looking at other people's code to try to get new insights. I've also been googling and reading Stack Overflow. I just want to learn how I can make the most efficient use of my time to learn data science. #datascience #computer-science #machinelearning #kaggle
Honestly, you are on the right track. You are really lucky to be aware of the field of data science, and of all the resources for learning it, while still in high school. I wish I had been this lucky.
Okay, setting that aside, I would like to mention that you should not rush into learning new things or try to quickly gain a lot of data science knowledge. It is a very vast field and it is important to understand what is happening under the hood. Although I like the approach of learning from others' code and gaining new insights, it is also very important to understand why they used a particular model or a specific data cleaning technique. If it interests you, reading the technical papers behind some of the models might be really helpful, and you will also get to learn something new.
People often think that applying a model is the only thing that matters, but that's not true. It is also important to understand, or at least know, the math behind it. I say this because real-world data science problems are very different from the ones found online or in competitions (although I am not saying that you should not take part in them). So if you understand why data science models and techniques work the way they do, it will help you tackle real-world challenges.
That being said, as mentioned in the previous comment, it is very important to find your interest, and that should be your first step. Once you have found it, try to read up on what problems companies are facing in that area and see if you can work on one of them (you don't have to find a solution for it, just work on it). Also, keep taking part in competitions and try to build an end-to-end pipeline for every project, from data collection to exploration, cleaning, modeling, visualization and evaluation.
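To make the end-to-end idea a bit more concrete, here is a minimal sketch in Python, assuming scikit-learn is installed. The dataset (scikit-learn's built-in breast cancer data), the choice of a random forest, and the helper names `build_pipeline` and `run_project` are all just picked for illustration; a real project would start from data you collected and explored yourself.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(random_state=0):
    """Chain cleaning and modeling so both are fit on training data only."""
    return Pipeline([
        # Cleaning: fill missing values (this toy dataset has none, but real data usually does).
        ("impute", SimpleImputer(strategy="median")),
        # Preprocessing: put every feature on a comparable scale.
        ("scale", StandardScaler()),
        # Modeling: a random forest is one example of an ensemble.
        ("model", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])


def run_project(random_state=0):
    """Hypothetical helper: load data, hold out a test set, cross-validate, evaluate."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    pipeline = build_pipeline(random_state)
    # 5-fold cross-validation on the training portion only.
    cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
    # Final fit, then a single evaluation on data the model has never seen.
    pipeline.fit(X_train, y_train)
    test_accuracy = pipeline.score(X_test, y_test)
    return cv_scores, test_accuracy


if __name__ == "__main__":
    cv_scores, test_accuracy = run_project()
    print("CV accuracy per fold:", cv_scores.round(3))
    print("Held-out test accuracy:", round(test_accuracy, 3))
```

Keeping the cleaning steps inside the `Pipeline` means they are re-fit within each cross-validation fold, which helps avoid leaking information from the validation data into training.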
Lastly, all I would like to say is, stay focused, be patient and keep learning. You are doing a great job!
I've been working in medical device development as a data scientist/engineer and have also done non-profit consulting with Delta Analytics as a Data Science Lead. I worked on a recent project with CareerVillage this year!
You've definitely started on the right track! Given that you've worked on Kaggle competitions, Coursera and reading code, that should give you some insight on various areas in data science you can explore. Data Science as a field is VERY large, and you can easily specialize in one area. You can go in a lot of directions.
Do you have an idea of what areas you would be interested in, based on what you've learned so far? For example, visualization, data pipelines/engineering, text processing (NLP), computer imaging, machine learning, etc.
A few options to consider are conferences (Open Data Science Conference has one coming up later this month in Burlingame near SFO), Meetup groups and hackathons. At these you can learn about various data science topics that are used in industry and also network with other professionals.
One idea that just came to mind: have you considered doing informational interviews? I've used these over the years, and they help with networking and gaining knowledge from professionals about their work. People LOVE to talk about what they do, and it's a nice and easy way to get advice and gain some perspective on where you might want to take your data science journey.
Erik recommends the following next steps:
When it comes to modeling, as Rahul already mentioned, it definitely helps to understand the math and stats behind the models. Today, machine learning libraries like scikit-learn make it easy to run models on your data; they abstract away the math behind the models. However, spending some time learning the fundamental math and statistics used in data science will help you better tweak your model to match your needs.
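As a small illustration of what "the math behind the models" can look like, here is a sketch of simple linear regression written out by hand in plain Python, using the standard least-squares formulas (slope = covariance of x and y divided by variance of x). The function name `fit_line` and the toy numbers are made up for this example; a library's linear regression solves the same kind of least-squares problem, just for many features at once.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y vary together (unnormalized covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much x varies on its own (unnormalized variance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1, so the fit should recover those values.
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]
    slope, intercept = fit_line(xs, ys)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Working through a model like this once makes it easier to reason about things like why an outlier can pull the fitted line so strongly.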
The last step I’d focus on is data visualization. Doing all this analysis and modeling is only useful if it’s conveyed properly to the audience. Essentially, you’re telling a story using the data. So spend some time looking into different visualization libraries to translate your findings — there’s Plotly, Matplotlib, Seaborn. It’s rather fun to play around with as well.
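For instance, a quick way to compare a few models at a glance is a labeled bar chart. The sketch below uses Matplotlib and assumes it is installed; the function name `plot_model_scores`, the output filename, and the accuracy numbers are placeholders invented for the example, not real results.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt


def plot_model_scores(scores, title="Cross-validated accuracy by model"):
    """Draw a horizontal bar chart from a {model name: accuracy} dict."""
    # Sort so the best model ends up at the top of the chart.
    names = sorted(scores, key=scores.get)
    values = [scores[name] for name in names]

    fig, ax = plt.subplots(figsize=(6, 3))
    bars = ax.barh(names, values)
    ax.set_xlabel("Accuracy")
    ax.set_title(title)
    ax.set_xlim(0, 1)
    # Write the number next to each bar so the reader doesn't have to estimate it.
    for bar, value in zip(bars, values):
        ax.text(value + 0.01, bar.get_y() + bar.get_height() / 2,
                f"{value:.2f}", va="center")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    example_scores = {"Logistic regression": 0.81, "Random forest": 0.86, "Gradient boosting": 0.88}
    fig = plot_model_scores(example_scores)
    fig.savefig("model_scores.png", dpi=150)
```

Small touches like a clear title, an axis label and the values printed on the bars are often what turns a chart into something an audience can actually read.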
I’d recommend Jupyter notebooks as a platform for data analysis. They have a nice interface, which makes loading libraries and doing analysis all the easier. Plus, you can export a notebook as a report with your code, visualizations, and text to help convey your findings. Hope these tips help you get started! Good luck, and have fun exploring data :)