What kind of challenges will I be facing as a Data Scientist?
I am 24 and I really like numbers and statistics. I have a variety of experience with computers, ranging from HTML and programming to hardware and software installation. I never really considered a job in computers; however, the education required versus the salary available is very enticing. I'm always open to new avenues of adventure, so I would like to learn what kinds of things I would be up against before taking the journey down this path.
I can think of two main challenges:
1. Technical: The field of data science is wide, deep and constantly changing. To be effective you'll need to build knowledge and skills in statistics, programming and data management, among others. You'll find that there are always new modeling techniques being developed and new tools and frameworks you can use, so you'll have to be constantly learning.
2. Storytelling: Communication skills are crucial to any job, but data science in particular requires the ability to tell a story through the data. Lots of people can generate all sorts of statistics and charts, but being able to communicate insights and influence decisions is how you really deliver value. As Toyin mentioned, this is also how you stand out.
But these challenges are also what make this field exciting and rewarding! You'll never be bored and there will always be room to grow.
Hope this helps!
Identifying the Issue
The hardest challenge data scientists face when examining a real-world problem is identifying the issue. They have to not only understand the data but also make it readable for non-specialists. The insights from the analysis should help eliminate the major glitches and hiccups in the business. Data scientists can use dashboard software that offers an array of visualization widgets to make the data meaningful.
Machine learning and deep learning algorithms can outperform humans on certain well-defined tasks. Algorithms are exemplary at learning to do exactly what they are taught, but problems occur when the data they are given is poorly curated. For example, Microsoft's Tay chatbot learned from its interactions on Twitter and, within a day, was posting offensive messages and had to be taken offline. Machine learning is both a boon and a bane: models can learn remarkably quickly, but they can only reproduce what their data has taught them. Hence data quality is of prime importance, and data scientists face the herculean task of curating data.
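To make "curating data" a little more concrete, here is a minimal sketch in plain Python of the kind of first-pass audit a data scientist might run before training anything: counting missing values, out-of-range values and exact duplicate rows. The function name audit_records, the field names and the valid age range are all hypothetical, chosen only for illustration; real projects would more likely do this with a dataframe library.

```python
from collections import Counter


def audit_records(records, required_fields, valid_ranges):
    """Summarize common data-quality problems in a list of dict records.

    required_fields: fields that must not be None or absent.
    valid_ranges: {field: (low, high)} inclusive bounds (hypothetical rules).
    """
    missing = Counter()
    out_of_range = Counter()
    for row in records:
        # Count absent or None values for each required field.
        for field in required_fields:
            if row.get(field) is None:
                missing[field] += 1
        # Count present values that fall outside the allowed range.
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                out_of_range[field] += 1
    # Exact duplicate rows: count every copy beyond the first.
    seen = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "missing": dict(missing),
        "out_of_range": dict(out_of_range),
        "duplicates": duplicates,
    }


# Hypothetical example data with one of each problem.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing age
    {"id": 3, "age": 212, "income": 48000},   # implausible age
    {"id": 1, "age": 34, "income": 52000},    # exact duplicate of the first row
]
report = audit_records(records, ["id", "age", "income"], {"age": (0, 120)})
print(report)
```

An audit like this doesn't fix anything by itself, but it tells you what needs a decision (drop, impute, or go back to whoever produced the data) before a model quietly learns from the bad rows.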
For a data scientist, developing a powerful model is a top priority. A complicated problem requires a more complex model with more parameters. However, the more parameters a model has, the more data it needs. It is also quite challenging to find quality data to train such models. Even unsupervised learning algorithms demand a huge amount of data to produce meaningful output.
Multiple Data Sources
Big data allows data scientists to access a vast range of data from various platforms and software, but handling such huge volumes of data poses a challenge. This data is most useful when it is used properly. To an extent, this problem can be solved with virtual data warehouses, which use cloud-based integrated data platforms to connect data from innumerable locations. The deeper the reach of the data, the more useful the insights and conclusions.
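One everyday chore behind "connecting data from many locations" is joining records that live in different systems. Below is a minimal sketch in plain Python of an inner join on a shared key that also reports which records found no match, since silently dropped rows are a common source of misleading conclusions. The two sources (a CRM export and web analytics), the customer_id key and the function name join_sources are hypothetical; in practice this would more likely be done in SQL or with a dataframe library's merge.

```python
def join_sources(left, right, key):
    """Inner-join two lists of dict records on `key`.

    Returns (joined_rows, unmatched_left_keys) so that records which
    found no partner are visible instead of silently disappearing.
    """
    # Index the right-hand source by key; a key may appear more than once.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    joined, unmatched = [], []
    for row in left:
        matches = right_index.get(row[key])
        if not matches:
            unmatched.append(row[key])
            continue
        for match in matches:
            # Merge fields; on a name clash the right-hand value wins.
            joined.append({**row, **match})
    return joined, unmatched


# Hypothetical records from two different systems.
crm = [
    {"customer_id": "A1", "region": "North"},
    {"customer_id": "B2", "region": "South"},
]
web = [
    {"customer_id": "A1", "visits": 14},
    {"customer_id": "C3", "visits": 3},
]
joined, unmatched = join_sources(crm, web, "customer_id")
print(joined)
print(unmatched)
```

In real data the keys rarely line up this cleanly (different ID formats, casing, stray whitespace), so normalizing them before the join is often most of the work.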
Sometimes in data science you may obtain unexpected results, which may or may not lead to the right conclusions. In such a situation, a data scientist should lean on supervised learning for further exploration, model selection and the appropriate choice of algorithms. With sufficient time and computing power, a data scientist can generate models with strong predictive power but little interpretability.
The following are the major challenges:
• Dirty data
• Lack of data science talent
• Company politics
• Lack of clear question
• Inaccessible data
• Insights not used by a governing body
• Translating data science into business language
• Privacy issues
• The organization can't afford a data science team
2) being able to visually represent data in a manner that is clear, concise, accurate and easily understood - the story should be obvious
3) diversity of delivery and tools - some tools are great for some things, not so great for others
4) finishing deliverables / not delivering half-completed tasks
5) understanding analytical principles and being able to predict with some accuracy what the potential outcomes will be
There are some great responses so far; hopefully this expands on some of what has already been shared and is helpful.
If you're unsure, I recommend trying Udacity's online programs. There you get to immerse yourself in the experience of data analytics and data science. You get the help of mentors and also the social interaction of a cohort group.
Bonnie recommends the following next steps:
I'm in the same boat as you! I've been working in the software industry for a few years, but I'm very interested in data science as a field. I started taking the "Data Science: Statistics and Machine Learning" specialization on Coursera a few months ago (https://www.coursera.org/specializations/data-science-statistics-machine-learning). I'm almost done with it and have absolutely enjoyed it; maybe you will too! It covered both the technical and the storytelling aspects of a data scientist's job, and the assignments and projects give you a feel for the kinds of questions you'd be answering.
Hope this helps!
Lei recommends the following next steps:
1. Problem statement: Not every problem can be solved by data science, and in most cases applying it forcefully complicates things. Data science solves certain categories of problems, such as classification, clustering, reinforcement learning, etc.
Stakeholders need to understand whether their problem can be translated into one of these categories so that data science can solve it.
Just because a very hard problem A has been solved successfully doesn't mean that a problem B which seems pretty easy can also be solved.
2. Availability: Too much data is supposedly never a problem in data science, but sometimes it does become one. At times the data flows through various sources before finally reaching you, so when you try to extract it as a data scientist, finding and pulling out what is relevant to your problem becomes very tedious. Sometimes the data is simply unavailable.
3. Quality and integrity: The quality, content and consistency of the data are very important. For example, the more human intervention there is during data generation, the more the quality degrades, so data generation should involve minimal human intervention.
Finally, the real issue arises when you have everything but don't know how to fetch the data. For that, you must understand how the data was generated in the first place.
Hope this helps
All the Best
1. Social/listening - working with the people who need a model to understand what they need, why, how they are going to use it, what sorts of models will best meet their needs
2. Social/collaborating - working with other teams or people on my team to secure data, platform space/time, tools, to decide who will work on what part of the problem, to help colleagues solve problems
3. Analytical - delving into the data to find out where the signal is, testing and comparing different models, engineering new features
4. Technical - using my skills and experience to decide or consult on model selection, team members needed and delivery platforms; carving out time to improve my skills and learn something new
5. Problem-solving - running into snags and bugs, defining and/or diagnosing them, and coming up with solutions
6. Social/storytelling - interpreting the results of a model and telling its story in context of the business and our goals; helping colleagues build on their data skills and numeracy
Hope this helps!
* Strategy and Funding - Does that organisation have a strong and clear data strategy, and do they have adequate funding allocated to support it?
* Access to data - who REALLY owns the data? Will you get the access you need? Does the Chief Data Officer (or your boss) have final say as to who gets data access, or is data ownership fragmented?
* Where does the data reside? - On-premises data will very likely have different interrogation tools available to you compared with data stored on a public cloud. You will also learn different skills in these scenarios, which may or may not influence future career steps.
I'm sure there are many more, but these are 3 topics I've found have played a role in job satisfaction in the past.
Happy to provide some insight!
I think you already have the background and attitude to succeed as a Data Scientist. Like all professions, Data Scientists have to constantly remain flexible in their approach to interpreting and analyzing data.
This means you’re not just involved in the “how,” but also the “why” of making things happen. You’re not just randomly sifting through data looking for connections. Instead, you’re using your knowledge of various business factors to form a “mental model” which can then be validated or disproved by your data.
Also, it's helpful to have a unique background and blend of skills, which can be one of your greatest strengths, kinda like a STAND OUT strategy.
Data Scientists are not expected to be nerds who work only with numbers, but people who are able to collaborate across all functions. It's important to understand your business, your customers and the future of work so you can use your predictive analytics skills efficiently.