Your DS Story S01E03: Yatin Bhatia (RxLogix)
‘Your DS Story’ is my attempt to bridge the gap between data science professionals & data science aspirants. Here, a new crop of data scientists share their experiences, struggles, achievements & advice so that data science aspirants and enthusiasts can learn and get inspired.
Yatin Bhatia has 14 years of IT experience, including 6+ years in Machine Learning, Deep Learning, NLP and Big Data. His areas of interest are Machine Learning, Data Mining, Data Analytics, Recommender Systems, Enterprise Software Design, Big Data, Statistical Modeling, Deep Learning and Predictive Modeling. He has mainly worked in the Consumer Electronics, Pharma and BFSI domains.
1. Please tell us a bit about your background?
Thanks for giving me the chance to share my journey with budding data scientists. I have a Master's in Computer Applications from IIT-Delhi & a B.E. in Electronics from North Cap University, Gurgaon. I started my professional career in embedded design & spent around 5 years in the same field.
After that I moved to Informatica, where I gained expertise in developing enterprise software applications. My data science journey started when I moved to Samsung, where I worked for 4.5 years. After Samsung, I worked at Genpact for one year and then joined RxLogix, which is my current company. Overall, I have been doing data science for the last 6.5 years.
The most exciting part of data science for me is automating existing workflows, which saves manpower, time & other resources.
2. What projects are you working on these days?
The business problem I am currently working on is automating manual entity extraction from pharma data. The data can be semi-structured or unstructured text, and it contains drug information, adverse event information, patient information and reporter information. The work involves Information Retrieval, Information Extraction and Text Classification.
The major challenge, and my day-to-day activity, is continuously improving on previous accuracy and scalability benchmarks. The learning here is immense, as I got the opportunity to implement this project end to end, from gathering SQL data to building and deploying models.
3. What does your day-to-day job look like?
In a real-world data science project, only 15–20% of the work is data science; the rest is data engineering. When you are learning data science from books or courses, getting the data is the easiest part, and you are only expected to build machine learning models. In industry, the data itself is the hardest part.
My typical day breaks down like this:
Data Import — 20%
Data Cleaning, Pre-Processing — 30%
Feature Engineering — 10%
Modeling — 10%
Evaluation — 20%
Deployment — 10%
4. How did you start with DS or transition into DS?
I entered data science through a Big Data project. The task was to write Map-Reduce code in Java implementing Collaborative Filtering and Content-Based Filtering algorithms, and then to deploy that Map-Reduce code in the Hadoop ecosystem.
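For readers new to the idea, the map and reduce steps behind a simple item-based collaborative filter can be sketched in a few lines of Python. This is only an illustrative sketch, not the Java code from the project described above: the sample data, the function names and the choice of plain co-occurrence counts as the similarity measure are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: user -> items the user liked.
user_items = {
    "u1": ["tv", "soundbar", "phone"],
    "u2": ["tv", "soundbar"],
    "u3": ["phone", "tablet"],
}

def map_phase(items):
    """Map step: emit ((item_a, item_b), 1) for every item pair of one user."""
    for a, b in combinations(sorted(set(items)), 2):
        yield (a, b), 1

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each item pair."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def recommend(item, counts, top_n=2):
    """Rank other items by how often they co-occur with `item`."""
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]

emitted = [kv for items in user_items.values() for kv in map_phase(items)]
co_counts = reduce_phase(emitted)
print(recommend("tv", co_counts))
```

On a real cluster the map and reduce functions would run in parallel across many machines; here they run in a single process only to show how the data flows.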
After some initial success in Big Data, I did a Text Classification project. Later on, I got projects in Natural Language Generation, Named Entity Recognition and so on.
One of the challenges I faced was convincing stakeholders to apply automation to existing use cases. Since data science is a new field, many business stakeholders are not yet convinced about automation, but this is gradually changing.
Another problem I see is that business leaders and stakeholders consider a data science task a one-time activity, when in reality it is a process. For example, high accuracy figures cannot be achieved overnight, but a good approach can lead to good accuracy over time.
To solve a business problem, I gain the confidence of stakeholders by releasing a first end-to-end solution quickly, and then do more research on feature engineering to improve accuracy & scalability.
I continuously try to improve accuracy, and I also try to provide deliverables at important junctures.
5. What advice would you like to give to DS starters or DS transitioners?
Learn from the GitHub repositories of other data scientists. Don’t get too deep into the maths at first; start with a little maths & combine practical results with theoretical knowledge.
Data science follows RCI (Religious Continuous Improvement): accuracy & scalability are improved over time, not overnight. Data science aspirants should try to solve some real use cases, and if they cannot get access to any, they should study use cases solved by others.
I would advise DS aspirants to refer to trusted content and quality data science code. Do look into the profile or career of the person whose content you are following.