How do I start preparing for a career in data science?
I'm looking forward to starting a career in data science and I'm doing my master's right now. So, any information regarding this will be helpful for me. #data-science #databases #python #data-analysis #big-data #hadoop
Great to hear about your interest in a career in data science. For the sake of simplicity, I'll assume your undergrad and/or master's is in a field that equips you with a solid understanding of data and statistics (e.g. a major in math, engineering, statistics, etc.). This would be the first step.
From there, knowing what type of data scientist you'd like to be would help focus your career prep. There are many different types of data scientists working in different industries and departments, so you may be able to whittle down your options if you're leaning towards a particular industry (e.g. Financial Services, Tech) or department (e.g. marketing, operations).
After you've narrowed down the area of data science that piques your interest, I'd simply encourage you to talk to as many data scientists as you can to understand what they do and the career path that led to their position. A good start is working the grapevine of your undergrad/grad school alumni network.
In general, attributes you'll want to have to help you succeed in a data science career include:
- Constant curiosity about the world, and a desire to answer questions by mining data
- Ability to regularly form a hypothesis which you'll test using data (this ability improves and comes with time and experience)
- Ability to draw insights from large amounts of data
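To make the "form a hypothesis and test it with data" point concrete, here is a minimal sketch of a permutation test in plain Python. The scenario (order values under two page layouts), the numbers, and the function name `permutation_test` are all made up for illustration; it is one simple way to test a hypothesis, not the only one.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothesis being tested: the two groups come from the same
    distribution, so shuffling the group labels should often produce
    a difference at least as large as the one we observed.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        if mean_diff(pooled) >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical data: order values (in dollars) under two page layouts.
old_layout = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.9, 21.7]
new_layout = [23.9, 25.2, 22.8, 24.6, 26.0, 23.3, 25.1, 24.4]

p_value = permutation_test(old_layout, new_layout)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be a shuffling accident. Whether the difference matters to the business is a separate question.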
Best of luck,
I agree with the answers above; both Ken and Wilfred gave some great insight. In addition to what they have said, I would recommend starting to teach yourself programming languages like Python or R. Those are commonly used in the data world. I find myself using a tool called Informatica daily; however, that one is hard to teach yourself.
For the reporting side of things, members of my team and I use products like Tableau or IBM Cognos every day. You can download Tableau Public for free from their website and start playing around with it. It's a really cool tool and you can make some awesome reports/visualizations with it.
Ken Liu provided great input. In addition, I'd like to add that you should learn a lot about data analysis tools as well as data quality. There is a lot of "Big Data" out there that is pretty meaningless. It is often the job of a data scientist to draw meaningful conclusions from ambiguous data sets, but the conclusions are much better and more actionable (yes, we need to do something with the conclusions) if the data is of good quality. It often takes years to improve data sets, so draw initial conclusions early and support the processes that improve the information. As far as analysis tools are concerned, you'll need a broad toolbox. You should understand databases, spreadsheets and various analysis tools. Beyond that, presentation skills (along with MS PowerPoint and MS Word skills) are a great asset as well. Last but not least, work on your team-building skills. You will need to work with many different people to be effective and to keep getting funding and support for your work.
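As a rough illustration of what a first data quality check can look like, here is a small Python sketch that counts missing values and duplicate rows. The records, field names, and the `quality_report` helper are hypothetical; real checks depend on the data set and usually go much further.

```python
def quality_report(rows, required_fields):
    """Count missing values per field and rows that repeat an earlier row.

    A value counts as missing if it is None or an empty/blank string.
    Two rows count as duplicates if they match on all required fields.
    """
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical customer records with typical problems.
records = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "DE"},             # blank email
    {"id": 3, "email": "c@example.com", "country": None},  # missing country
    {"id": 1, "email": "a@example.com", "country": "US"},  # repeat of row 1
]

report = quality_report(records, ["id", "email", "country"])
print(report)
```

Running a report like this early, before any modelling, shows how far the conclusions can be trusted and where the data needs work.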
A few things you can do:
contribute to open-source ML libraries on GitHub,
create fun AI apps, such as an app that recognizes bears,
implement a research paper in code and put it on your GitHub.
-- Data Visualization
-- Foundational understanding of data modeling
-- Understanding of how data flows from systems to landing areas to data use layers
So, how do you go about becoming a data scientist? It's partly becoming a good statistician and partly becoming a good marketer.
Now, there are even discussions leading to the conclusion that data scientist is not an individual role, but a team.
Why so? Data scientist is an umbrella role: the team needs to know statistical techniques and the software tools for developing algorithms and automation, needs the business acumen to identify business problems and convert them into analytics projects, and needs to be highly adept at storytelling.
So where to start?
a. Start with the technical side: statistical algorithms.
Assuming you are comfortable with Excel, the next best step, if you are willing, can be a certification (Udacity, Udemy, Coursera, etc. offer industry-relevant training). You can do certifications in self-paced mode, and you can even participate in Kaggle competitions.
b. What tool to learn?
It depends on your inclination. If you have a coding background, Python can be a good place to start. If statistics is what you need to focus on, you can start with R, for example through RStudio. Both Python and R are open source.
c. What about soft skills?
Identifying a hypothesis and relating it to business problems, or vice versa, needs to be a regular exercise that should never be taken for granted. We should always remember that data science is just a set of tools to enable better business decisions. Those better decisions can come from the simplest data analysis or from highly complicated algorithms.
As data scientists, our success depends hugely on how we interpret analytical outcomes and convert them into business-relevant insights. This is the single most important skill for any data scientist to possess, which is again easier said than done.
Probably the best thing is to ask what interests you and what your current skills are.
Data science helps discover something not previously known, but in particular it helps answer WHY. Without the why, all we have are correlations and endless statistics runs -- an answer looking for a problem.
Preparation for a career in data science -- Immerse yourself in a problem you care about. Discover the issues -- learn to break things into components, build a framework, understand the univariate and multivariate dependencies, and then see if you can model a solution to determine the WHY and solve the problem. Going through these steps gives you real-world experience and is exactly what we do each day in practice.
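As a small illustration of the "univariate and multivariate dependencies" step, here is a Python sketch that first summarizes each variable on its own and then looks at how pairs of variables move together. The data set, the variable names, and the `pearson` helper are made up for the example, and Pearson correlation is just one simple measure of dependency.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures for a small online shop.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits": [120, 190, 310, 405, 480],
    "returns": [8, 5, 9, 4, 7],
}

# Univariate view: each variable on its own.
summary = {name: (mean(values), stdev(values)) for name, values in data.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.1f}, stdev={spread:.1f}")

# Multivariate view: how each pair of variables moves together.
names = list(data)
correlations = {
    (a, b): pearson(data[a], data[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r={r:.2f}")
```

A strong correlation only says that two things move together. Working out WHY they do is the part that needs the framework.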
When looking for talent, I do not care if you know a tool if you cannot show me how you saw the problem, how you determined what might be an adequate solution model, and then how you used the tool(s) to solve it, even if a solution was not found. Data science is not just knowing a tool (Python, R, Excel, SAS, Tableau, MS BI, etc.); data science starts with conceptualizing the problem into a framework and then working through the parts with the appropriate tool. So skill up in the tools, but focus on the framework. You will get good at a tool as you use it more and more in applying data science.