It depends quite a bit on your background in software engineering. If your background is, say, a BS or MS in Computer Science or Computer Engineering, you're likely to have a good mathematics background. You'll want to be familiar with multivariate calculus, discrete mathematics, and statistics. If you don't remember much, you'll want to refresh yourself. You'll spend a lot of time analyzing data so that you can predict outcomes using statistical methods. Some good languages to get familiar with are R and Python. Much of this training is available for free on the web. Then look into tools such as Hadoop and Splunk.
The biggie in Data Science today is AI, or more specifically Machine Learning and Deep Learning. Get familiar with ML if you're not already. Use your tutorial projects to get familiar with the practical aspects and to generate some examples to show prospective employers. You'll certainly want to be familiar with SQL and any of the SQL application environments. All this will be useful in anything computer-related anyway, as SQL is everywhere and ML is the current trend. ML will eventually be necessary for just about anyone who wants to be at the forefront of what businesses are focusing on. The nice thing is that if you have a CS degree already, you're likely familiar with many of these, and the rest can be learned for free on the web.
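To make the SQL point concrete, here is a minimal sketch of the kind of query you'd practice early on. It assumes Python's built-in sqlite3 module and a made-up "sales" table; the table, its columns, and the helper name average_price_by_city are purely illustrative, not tied to any particular SQL environment.

```python
import sqlite3


def average_price_by_city(rows):
    """Load (city, price) rows into an in-memory SQLite table and
    return the average price per city, sorted by city name."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing on disk
    try:
        conn.execute("CREATE TABLE sales (city TEXT, price REAL)")
        # Parameterized inserts: never build SQL by string concatenation.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT city, AVG(price) FROM sales GROUP BY city ORDER BY city"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data, invented for this example.
sample_rows = [("Austin", 300.0), ("Austin", 500.0), ("Boston", 700.0)]
result = average_price_by_city(sample_rows)
print(result)
```

The same GROUP BY pattern carries over to most SQL databases you're likely to meet on the job, so it's a good one to get comfortable with.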
Look at case studies online, particularly in areas like Natural Language Processing and Machine Vision. Find out what the big companies are using DS for. Finally, as a data scientist, you'll be thrown a wide range of problems that will require your own creative approaches. If you're really a CS "code head", this is probably something you already enjoy.
I've done the reverse: I am a data scientist turned software engineer.
Perhaps a few things I learned during that process would help you pursue your desired career path as well.
You mentioned that you love mathematics. To be honest, I think that's the number one ingredient for becoming an exceptional data scientist. (I love physics and psychology, and went into the "science" field by accident lol.) Depending on which kind of data scientist you'd like to become, there are usually two paths: data scientists who do research, and machine learning engineers who build products.
1) Research oriented data science
If doing research on data and extracting insight is what you like, this is your route. Here are my recommendations:
- Compete on Kaggle. The community of brilliant statisticians and scientists on there is amazing.
- Build presentable data science projects with your own tools of choice. Use any combination of R, Python, Scala, ggplot, plotly, R Shiny, Dash, etc. to build something end-to-end and host it somewhere to share with people.
- Gain a deep understanding of statistical methods. These are two great books I recommend: An Introduction to Statistical Learning with Applications in R, and The Elements of Statistical Learning. You should aim to get to the level where you can confidently say "I know my statistical methods".
- Connect with data scientists. This is just as important as having the actual expertise. Don't wait till you're "ready" to put yourself out there. Opportunities are everywhere. Ask and knock on doors.
2) Machine learning engineering
If you love building things and making them intelligent, this could be your path.
- Algorithms and data structures. You probably know the ins and outs of this already. Basically, stay competent in the foundations of software engineering.
- Machine learning algorithms under the hood. Make sure you have a solid understanding of how the algorithms work, not just what they do at a high level. How does feature selection work with different algorithms? How do you make trade-offs?
- Big data frameworks. Hadoop, Hive, Oozie, Spark, etc., along with Java. Proficiency with these tools will help you be more autonomous and build faster and better.
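As a small illustration of what "under the hood" can mean for the feature selection question above, here is a sketch of one simple approach, a filter method that ranks features by the absolute Pearson correlation with the target. This is only one of several strategies (tree-based models and L1-regularized models, for example, select features quite differently), and the function names and toy data are my own assumptions for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return feature names ordered from most to least correlated
    (in absolute value) with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration only.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, -1.0, 2.0, -1.0, 2.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
}
target = [2.1, 3.9, 6.2, 7.8, 10.1]
ranking = rank_features(features, target)
print(ranking)
```

The trade-off to notice: this filter is cheap and model-agnostic, but it only sees linear, one-feature-at-a-time relationships, so it can miss features that matter only in combination.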
This is definitely not an exhaustive list, but hopefully it gives you a general sense of direction. All the best, and good luck pursuing your dream career!
Start with the basics by taking this course: Machine Learning A-Z™: Hands-On Python & R In Data Science.
Hmm. But why do you want to be a data scientist in the first place?
Since you mentioned that you're interested in mathematics and in analyzing data, that will help you in choosing a data science career. It will also help if you want to explore AI-related careers. I would recommend exploring data science careers. As more and more industries see the benefit of using analytical data to improve business practices, big data and data science career opportunities are exploding. Data science occupations are likely to enjoy excellent job prospects, as many companies report difficulties finding highly skilled workers. The good news is that there are a number of different paths a data science career can take. The challenge is that it can sometimes be difficult to understand how these careers differ and what skill sets each one requires. Here is the link which will help you explore the suitable options.
If you want to switch your career to data science, you need to assess your current knowledge of data science technology. This will help you identify the gaps and focus on the areas where you need to spend more time.
If you are starting from scratch, I would recommend following the plan below:
• Understanding the concepts: Below are the books and courses that I recommend you study to understand how data science works. Take note that the learning resources below are shown in the exact order that I recommend you take them (based on both my experience and feedback from other people).
1. Python for Everybody Specialization — This series of courses is great for the absolute beginner who wants to get started. Best course to take in order to get you over your fear of learning how to code.
2. Machine Learning by Andrew Ng — This course gave me the core foundation of my understanding of different machine learning models. Andrew Ng literally inspired me to pursue a career in machine learning.
3. Learn Python 3 the Hard Way — This book will create a solid foundation for your Python skills (and coding skills in general). I cannot stress enough how great this book is at teaching basic concepts with practical lessons and well designed exercises.
4. Applied Data Science with Python Specialization — This series of courses is a good way to glue your understanding of machine learning models with your coding skills. I personally know people who were able to get jobs in data science right after this specialization, since by then, they already had a decent toolkit of data science skills that they could use to solve real world problems.
5. Introduction to Machine Learning for Coders (fast.ai) — This course is taught by Jeremy Howard and he gives a very practical walkthrough on how to do machine learning properly with code. Get ready to learn how to code the random forest algorithm from scratch!
6. Practical Deep Learning for Coders (fast.ai) — This two part course is the best resource out there for both 1) aspiring data scientists trying to get into deep learning and 2) more experienced data scientists trying to get deeper into what it takes to get state-of-the-art results in deep learning. In the first lesson, Jeremy Howard shows you right away how to get cutting edge accuracy in the ImageNet dataset using the fastai library. In later episodes, you will get more and more used to implementing models directly on PyTorch. Highly recommended!
• More and more practice: Some would argue that true learning only happens when you are working on a concrete project and solving real world problems with your data science skills. Below are recommended ways to gain experience by applying your knowledge (i.e. learn by doing).
1. CodeSignal — When I was new to coding, I had a difficult time understanding how my basic skills could be used to solve real world problems. Thankfully, CodeSignal (formerly called CodeFights) had fun coding challenges that allowed me to compete against bots and real people. This made me comfortable with the process of solving problems with code. The website started out as a platform for competitive coding but now focuses on preparing developers for the coding exams during interviews with tech companies.
2. Kaggle — This is a platform where data scientists come together to 1) share data and code, and 2) compete on training ML algorithms that best reach a target objective (e.g. predict housing prices most accurately). Even if you don’t explicitly compete, I think the biggest value add from Kaggle is the availability of “code solutions” from competitions. Reading the code of other more experienced data scientists is one of the fastest ways to get better because it teaches you best practices while getting you comfortable with reading and writing ML code from scratch yourself.
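To give a feel for what "predict housing prices most accurately" means in practice, here is a minimal sketch of scoring two submissions with root mean squared error (RMSE). RMSE is just one common regression metric, used here as an assumption; each competition defines its own metric, and the prices and submission names below are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical house prices (in thousands) and two made-up submissions.
actual_prices = [200.0, 300.0, 400.0]
submission_a = [210.0, 290.0, 400.0]
submission_b = [250.0, 300.0, 350.0]

score_a = rmse(actual_prices, submission_a)
score_b = rmse(actual_prices, submission_b)

# Lower RMSE is better, so the submission with the smaller score ranks higher.
best = "submission_a" if score_a < score_b else "submission_b"
print(best, round(score_a, 3), round(score_b, 3))
```

Because the errors are squared before averaging, a few large misses hurt the score more than many small ones, which is worth keeping in mind when you read other people's solutions.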
• Certification in data science: There are multiple certifications available in data science. Some renowned universities, product-based companies, and learning partners offer certifications or degrees in data science. You can choose based on your interest and flexibility.
• Solving real world problems: To test and enhance your knowledge and gain confidence in this field, I would recommend doing the following:
1. Passion projects — Even if you don’t have a data science job but want to get into the field, think of a cool project to execute! Identify a problem you want to solve or even something fun you want to do, then create a machine learning model for this. It’s even better if you decide to deploy it as an app accessible on the internet!
2. Freelance projects — This one should be obvious. The best way to learn by doing is to get yourself a job in data science. Try to get into a freelance project; it would be best if you can work alongside an experienced data science engineer.
Now, if you know Python or R, that's great. Otherwise, learn one of them quickly, as they are among the fastest growing programming languages for data science. You can do almost everything with those languages.
A programming language alone is not enough; you also need a good depth of knowledge in statistics, probability, and algebra. I believe this is the main part of being a data scientist. If you understand these things from the ground up before applying the various statistical packages, it will give you an added advantage over your competitors.
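As an example of knowing things from the ground up, here is a minimal sketch of fitting a straight line by ordinary least squares in plain Python, with no statistical package involved. The function name fit_line and the sample points are assumptions made up for this illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy points lying exactly on y = 2x + 1, invented for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Once you can write this by hand, the output of a package's linear regression routine stops being a black box: you know which quantities it is computing and when they break down (for instance, when every x is the same).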
Now take some online courses on data science. If you master the knowledge above, it will be much easier for you to master data science and get a job as a data scientist.
Hope this helps.