Data Science Ramblings — I
I regularly post short, useful notes on LinkedIn based on my day-to-day learning & experience. This is an attempt to publish these ramblings as a collection in the form of a blog post. Kindly let me know your feedback on this approach.
Important aspects in Data Science
What is most important in data science: maths/stats, technology or domain knowledge? IMHO, there is no generic answer. Every business problem weights these aspects differently, and we only learn the right balance while working on that specific problem. So it depends on the problem you are working on. For some business problems, domain knowledge matters far more, and elementary knowledge of maths & technology is enough to deliver a solution. Other problems are mostly statistical learning, where domain & technology matter much less.
Data Science for Business
It's very easy to get overwhelmed by data and algorithms, but we need to keep reminding ourselves why we are using the data and why we are applying the algorithms. The same data and a similar problem can mean different things to different businesses, so we should always keep the business context in mind.
Improving predictive models
After building baseline models for your DS project, the next crucial task is to improve their predictive performance, which can be done by:
Working on data (further data cleaning, feature engineering, feature selection, regularization, etc.)
Working on models (selecting an appropriate model & evaluation metrics, resampling techniques, cross-validation, etc.)
Tuning the models (optimizing hyperparameters with random & grid search, etc.)
Ensembling the models (bagging, boosting, stacking, etc.)
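To make three of these steps concrete (regularization, cross-validation and grid search over a hyperparameter), here is a minimal stdlib-only Python sketch, not a production recipe. It assumes hypothetical synthetic data (y = 3x plus Gaussian noise) and a one-feature ridge regression without intercept; the helper names `make_data`, `fit_ridge_1d`, `k_fold_indices`, `cv_mse` and `grid_search` are illustrative. In real projects a library such as scikit-learn (e.g. `GridSearchCV`) would usually do this work.

```python
import random
from statistics import mean


def make_data(n=60, seed=0):
    """Hypothetical synthetic data: y = 3x + Gaussian noise (sd 0.5)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w*x (no intercept): w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def k_fold_indices(n, k):
    """Interleaved k-fold split: fold i holds indices i, i+k, i+2k, ... (fine for i.i.d. data)."""
    return [list(range(i, n, k)) for i in range(k)]


def cv_mse(xs, ys, lam, k=5):
    """Mean held-out squared error of the ridge model across k folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training folds, then score on the held-out fold.
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(mean((ys[i] - w * xs[i]) ** 2 for i in fold))
    return mean(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Score every candidate lambda by cross-validation and keep the lowest error."""
    scores = {lam: cv_mse(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs, ys = make_data()
    best, scores = grid_search(xs, ys, grid=[0.0, 0.1, 1.0, 10.0, 100.0])
    for lam, score in scores.items():
        print(f"lambda={lam:<6} CV MSE={score:.3f}")
    print("best lambda:", best)
```

Random search would differ only in how the candidate lambdas are generated (sampled rather than enumerated); the cross-validated scoring loop stays the same.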
Data Science on Big Data
Understanding the data and its relevance to the business problem is the crucial part of data science. When we need to deal with big data, we should break the problem into two parts: first, build that understanding and a data science baseline on random samples (to iterate & learn quickly); then, scale that pipeline to handle the full data. The first part of the problem is scientific and the second part is engineering.
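A minimal sketch of the first step, drawing a uniform random sample from data too large to load at once, assuming the records can be streamed one at a time (for example, read lazily from a file). I use reservoir sampling (Algorithm R) here as one possible option; the post does not prescribe a sampling method, and `reservoir_sample` is a hypothetical helper name.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Algorithm R: one pass, O(k) memory, every item ends up in the sample with equal probability."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for a huge source, e.g. rows of a large file read lazily.
    big_stream = (row_id for row_id in range(1_000_000))
    sample = reservoir_sample(big_stream, k=1_000)
    print(len(sample), sample[:5])
```

The sample can then be explored and modelled quickly in memory; scaling the resulting pipeline to the full data is the separate engineering step.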
Estimating Data Science projects
Data science projects are different from typical IT projects: in a DS project we need to test the hypothesis first. Once it is tested and the proof-of-concept (POC) results are accepted by stakeholders, we have enough detail to estimate the remaining work like an IT project.
Data Science is Teamwork
From understanding DS requirements to building data pipelines, training models and deploying them in production, each step requires a different set of skills, and it's really difficult to find that unicorn data scientist who has them all. People with these different skill sets are needed in different phases of the project, which requires a decent amount of collaboration to make the DS project a success. We succeed or fail as a team.
Context is the key
Data scientists should know multiple ways to answer a question: analyzing the data in different ways, using different algorithms and using different evaluation metrics, each with its own advantages and disadvantages. That way they can apply the best approach for the particular business context. Most of the time, business context is the key.
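As a small illustration of why the choice of evaluation metric depends on context, consider a hypothetical imbalanced problem (say fraud detection, with 5 positives in 100 cases). The labels and the two "models" below are made up for illustration; the point is only that accuracy and recall rank them in opposite order.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Accuracy, precision and recall; undefined ratios are reported as 0.0 by convention."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 95 legitimate cases (0) followed by 5 frauds (1).
y_true = [0] * 95 + [1] * 5
# "Model" A never flags anything.
always_negative = [0] * 100
# "Model" B raises 6 false alarms, catches 4 frauds and misses 1.
cautious = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

if __name__ == "__main__":
    for name, preds in [("always_negative", always_negative), ("cautious", cautious)]:
        print(name, evaluate(y_true, preds))
```

Model A scores higher on accuracy (0.95 vs 0.93) but catches no fraud, while model B's recall is 0.8. Which one is "better" depends on whether a missed fraud or a false alarm costs the business more.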
Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also feel free to listen to me on SoundCloud & visit my website https://ankitrathi.com.