DS/AI-SSH: Navigating the Landscape
The last post gave the big picture of this blog series and outlined what each post covers.
This is the 3rd post in the blog series ‘DS/AI: Self-Starter Handbook’. It navigates the landscape of DS/AI for starters. The following topics are covered here:
Data Science Roles
Academia Vs Industry
DS/AI is a complex and evolving field. The first challenge a DS/AI aspirant faces is understanding the landscape and how to navigate it. Consider this: if you are travelling to a new city without a map, you will have trouble finding your way, and you will end up asking a lot of random people for directions without knowing how much they know about the place. Newcomers to data science face the same trouble, and there are two ways to deal with it: get hold of a map (or a guide), or travel on your own and learn from experience.
This post intends to serve as a map of the DS/AI field.
You might have heard terms such as data science, machine learning, deep learning and artificial intelligence, but you might not be fully aware of what each one means, when to use which, and how these topics are interconnected. After going through this post, you should understand where each of them sits in the DS/AI field.
DS/AI is a multidisciplinary field with sub-fields of study in Math/Statistics, CS/IT & Business/Domain knowledge.
Math/Statistics is required to understand the data and the relationships between data elements. CS/IT skills are required to process the data to generate insights. And business or domain knowledge is required to apply the above two skills in the context of a business problem.
Programming is an essential skill for a data scientist, but one need not be a hard-core programmer to learn DS/AI. Familiarity with the basic concepts of programming will ease the process of learning data science programming tools like Python/R. These basics should take a candidate a long way towards a career in DS/AI, since the work is about writing efficient code to analyse big data, not about being a master of programming. Individuals should learn the basics of programming in Python/R (or any relevant language) before they begin to work on DS/AI problems/projects.
Maths & Statistics
Data science teams have people from diverse backgrounds like chemical engineering, physics, economics, statistics, mathematics, operations research, computer science, etc. You will find many data scientists with a bachelor’s degree in statistics and machine learning but it is not a requirement to learn DS/AI. However, having familiarity with the basic concepts of Math and Statistics like Linear Algebra, Calculus, Probability, etc. is important to learn DS/AI.
Finally, the business knowledge a data scientist needs relates to the domain of the project or analysis. For instance, a data scientist working for the credit card department of a bank will need to understand the specific business definitions, regulations, accounting policies, international standards, processes, etc. This is the part that is more specific to the organization the data scientist is deployed in.
In my view, one thing to take care of while hiring data scientists is not to give too much preference to domain knowledge. Doing so may severely limit the supply of data science talent to the organization. You have a better chance of getting value from data science by looking for people who are strong in math and programming and who can convert business objectives into mathematical models. Based on my observation, this skill is much more difficult to find or train than domain knowledge.
As a DS/AI starter, you will come across many similar-sounding terms. The first thing you need to do is understand what each term means and where it fits in the bigger picture. Data Science, Business Intelligence, Data Mining, Machine Learning, Deep Learning, Artificial Intelligence: let’s have a look at the Wikipedia definition of each term and later see how they are interconnected.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, like data mining.
Business intelligence comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations.
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Machine learning is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task.
Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms.
Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.
Data mining uses statistics and programming to find hidden patterns in the data that explain a certain phenomenon. It helps in building an understanding of the data using both math and programming.
Machine Learning deploys data mining techniques as well as other algorithms to develop models of what is happening behind some data to forecast future outcomes.
Artificial Intelligence uses models developed by Machine Learning and other algorithms to lead to intelligent behaviour. AI is very much programming based.
Data Mining demonstrates patterns
Machine Learning forecasts with models
Artificial Intelligence shapes behaviours
So you see that these terms are different but still inter-connected.
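To make the contrast between the first two terms concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the shopping baskets, the sales figures, and the helper names `most_common_pair` and `fit_line` are all made up for this example and do not come from any real data set or library. The first part surfaces a pattern in existing data (data mining); the second fits a simple straight-line model and uses it to forecast (machine learning in its most basic form).

```python
from collections import Counter
from itertools import combinations


def most_common_pair(baskets):
    """Data mining flavour: find the pair of items bought together most often."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(1)[0]


def fit_line(xs, ys):
    """Machine learning flavour: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
pair, count = most_common_pair(baskets)
print("Pattern found:", pair, "appears together", count, "times")

# Hypothetical monthly sales (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]
slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print("Forecast for month 6:", forecast_month_6)
```

The first function only describes what is already in the data, while the second produces a model that can be applied to a month it has never seen. Real projects would use far richer techniques for both steps.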
Data Science Roles
Before looking into the skill-set of a data scientist, let’s have a look at the various roles required to deliver a data science project. After all, it is a team effort.
Every role has its own skills that are critical to data science projects at various stages.
A data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning. She spends a lot of time in the process of collecting, cleaning, and munging data. Domain knowledge is also an integral part of the skill.
Machine Learning Engineer
Machine learning engineers are sophisticated programmers who develop machines and systems that can learn and apply knowledge without specific domain requirement.
Data analysts translate numbers into plain English. Every business collects data, whether it’s sales figures, market research, logistics, or transportation costs. A data analyst’s job is to take that data and use it to help companies make better business decisions. There are many different types of data analysts in the field, including operations analysts, marketing analysts, financial analysts, etc.
Data Engineers are responsible for the creation and maintenance of analytics infrastructure that enables almost every other function in the data world. They are responsible for the development, construction, maintenance and testing of architectures, such as databases and large-scale processing systems.
Data architects build complex computer database systems, either for the general public or for individual companies. They work with a team that looks at the needs of the database and the data that is available, and creates a blueprint for creating, testing and maintaining that data architecture.
The data science manager coordinates the different tasks that the team must complete for a DS/AI project. Tasks may include researching and creating effective methods to collect data, analyzing information, and recommending solutions to the business.
A data science business analyst converts the business problem statement into a DS/AI problem statement, which means identifying what data needs to be analyzed to arrive at the insights. The data is then reviewed with the technology team, and results are delivered to the business team in the form of insights and data patterns. The business analyst should also be knowledgeable enough to apply various predictive modelling techniques and select the right model for generating insights for the problem at hand.
The job of a quality analyst includes checking the quality of the training data-set, preparing data-sets for testing, running statistics on human-labelled data-sets, evaluating precision and recall on the resulting ML model, reporting on unexpected patterns in outputs, and implementing the necessary tools to automate repetitive parts of the work. Experience in software testing with a data quality or DS/ML focus, an understanding of statistics, exposure to Data Science / Machine Learning techniques, and coding proficiency in Python are some of the skills required for the job.
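As an illustration of the precision and recall evaluation mentioned for the quality analyst role, here is a minimal Python sketch. The helper name `precision_recall` and the label lists are hypothetical and made up for this example. Precision is the share of predicted positives that are actually positive, and recall is the share of actual positives that the model found.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class.

    y_true: human-provided labels, y_pred: model outputs (same length).
    Returns 0.0 for a metric whose denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical human-labelled data and model outputs (illustrative only).
human_labels = [1, 0, 1, 1, 0, 1, 0, 0]
model_output = [1, 0, 0, 1, 1, 1, 0, 0]

precision, recall = precision_recall(human_labels, model_output)
print("precision:", precision, "recall:", recall)
```

In practice a quality analyst would more likely rely on an established library for these metrics; the hand-written version is shown only to make the definitions visible.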
To work on DS/AI projects in any of the above-mentioned roles, you need a high-level understanding of the core concepts, plus depth in the specific area you will be working in.
Academia Vs Industry
Academia and industry are different worlds with different people and cultures. People who have worked in academia for a long time may find it difficult to adjust to industry culture, and vice versa.
There is also an academic trap when your career trajectory is so specialized for academia that you’re unprepared for a job outside of it.
The academic trap happens in all areas of study, but for this post, we focus only on DS/AI students who want to leave academia for data science positions.
Further, companies are often hesitant to hire people coming straight from academia for various reasons like:
In academia, individuals prioritize writing papers over internships and writing grants over learning programming languages, and tend to skip the things that would help in industry but not in academia. The things that are important for academic hiring, such as papers, talks, and grants, are not as important in industry.
Working as a data scientist within a corporation requires an understanding of how the business world works, including how quickly deliverables need to be produced, how to craft a good presentation, and how to word an email to make a request.
In academia, you are encouraged to find the most innovative and elegant solution. In industry, you are encouraged to spend as little time as possible to find an analytical solution that just fits the need.
Salary expectations of advanced degree holders are higher than those of someone with only an undergraduate degree. This also puts recruiters off, because industry works differently and its culture is simply not the same as the academic one. People coming from academia need to learn these lessons in their first job, which means a lot of risk for the hiring company.
In the next post, we will learn about the building blocks of the DS/AI field.
Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also, feel free to listen to me on SoundCloud.