Your DS Story S01E07: Mohammed Ashiq Ariff
‘Your DS Story’ is my attempt to bridge the gap between data science professionals & data science aspirants. Here, a new crop of data scientists share their experiences, struggles, achievements & advice so that data science aspirants/enthusiasts can learn and get inspired.
Mohammed Ashiq Ariff is a techie who relies on practical experience rather than chunks of theory, and who is working towards mastering the skill of engineering rather than proving himself an engineer with just a degree.
1. Please tell us a bit about your background.
I come from a computer science background. I did my engineering in India with Information Technology as my specialization. I started working for HCL in mainframe security. But within 6 months, I had to “break up” with HCL as I fell in love with new technologies and the way start-ups use them. I left HCL convincing myself that I should not get stuck within the “service bubble” and should instead be part of a team driven by innovation. I joined an exciting start-up called Funfinity Learning Solutions (earlier 40Tables Lunchbox). I had a chance to be a full stack web application developer and experimented with a number of cloud technologies. Once the application was developed and pushed, we started focussing on building analytics around it. Those were the early days of data science (the term was not popular yet). We were among the first few Kagglers. In those days, we just had the Titanic tutorial (no kernels). We did some elementary data science like forecasting sales and orders but gradually moved into sentiment analysis and user segmentation/profiling (not the Cambridge Analytica type but the ethical kind). Understanding the power of data science, I wanted to learn more about it. It’s beautiful to see history repeat itself. It is even more beautiful to observe patterns in data, in your everyday life and even in the entire universe. Fascinated by the ability of humans to interpret and learn without being instructed, I wanted to delve a bit deeper into the ways humans learn and try to simulate them in machines. To this day, my very first thought when given an ML problem is, “How will a human solve this? What will a human look for?”. I did not have sufficient access to courses/mentors in India to gain this understanding. So, I decided to do my master’s (though I was mentally done with my education). I did my master’s in Big Data Science at Queen Mary University of London.
As it is one of the Russell Group universities, I had access to lectures, workshops, labs, libraries and some of the best professors in the UK. Uni was hard. We were pushed hard on mathematics, which we complained about at the time but are now glad we had. I worked under many professors, contributing to their projects. I was involved in projects for TfL (Transport for London), some small projects for charities, and computer vision projects for a security company operating in some airports. For my final master’s dissertation, I tried my luck/intuition at predicting stock markets. I had a good Bullish/Bearish classifier which could “probably” make me rich! It uses features like sentiment scores from recent news and other economic forums. I then joined AstraZeneca, and since then I have been working in an awesome internal consultancy primarily focused on Finance, where I also lead some exciting DS projects.
2. What projects are you working on these days?
I am currently working on projects involving Global Business Services (GBS) within AstraZeneca. We look at a wide range of financial data and try to categorize it into Tax Buckets automatically. We also developed our own Optical Character Recognition engine to extract text from tabular PDFs. We built a lot around entity recognition to extract structured information from unstructured text in documents. Being in an internal consultancy, our problems are not limited to GBS and Corporate Finance but also involve other enabling functions like HR, Legal and Global Assurance. Projects there can include identifying risks in contracts, cutting down costs for AstraZeneca and even anomaly detection in employee expense data for better compliance. The major challenge we face in most projects is the availability of labelled data. I would like to take this opportunity to thank the teams who label data for us. It is such a tough job they are doing for us!
3. What does your day-to-day job look like?
My job requires me to be very close to the business. It involves engaging with directors and project managers, trying to understand their problems, scouting for ways to improve things or doing quick proofs of concept. This is where we get to spend most of the time. Defining the problem right and defining “what success is” means I have already solved 50% of the problem. The next step is to scout for relevant data. AZ being a huge pharma company, we need to sort out our access and compliance before doing anything. We have many software systems and processes across the organization, and joining multiple data sources to get a horizontal, enriched view is a tough job. Most of this wrangling/joining is done by big-data engineers, so our job actually starts with liaising with business analysts and engineers to get the data, followed by multiple iterations of exploratory analysis and feature engineering. We spend about 30% of our time here. Finally we get to do our modelling, tweaking parameters and pushing models to production, which is the remaining 20%.
4. How did you start with DS or transition into DS?
I transitioned from software development. I was pretty fascinated by the fact that I didn’t have to write rules to tell a program to categorize something. I remember that I once thought there should be a program which does not require a series of “if.. else if.. else if…….. else if.. else“ to do something (say, categorize), and that computers should be able to learn and infer like humans. When I coded my first ML solution (after Kaggle Titanic) and it worked well, I felt this strange power of combining mathematics and computing. The biggest challenge I faced was the ability to “visualize math”. In the Indian education system, math was taught very theoretically. Having been forced to memorize formulas, it took me quite a while to figure out why in the world we need calculus and why, in probability theory, a person would always worry about tossing a coin or picking a red/blue ball from a box. (But I certainly understood why the probability of picking a card was useful when I played bluff.) To solve this, I went back to basics. I learnt linear algebra, probability and statistics from YouTube. There are recorded lectures from various universities and schools. I still do a lot of learning from YouTube.
5. What advice would you like to give to DS starters or DS transitioners?
Get your math right. There is no Data Science without math. If you are not good at it, you are definitely not a good data scientist. We test quite heavily on the math side of things in our recruitment.
If you are using an algorithm, try explaining it to someone who is non-technical. If they understand it, then use it. This technically means that you have understood it well enough.
Parameters are not always random numbers. They are there for a reason, so use them well. Don’t GridSearchCV everything! Every time!
Data Science is not about using complex algorithms. You don’t XGBoost everything. Sometimes the simplest models are the best. It takes quite a lot of practice to get that sense of maturity. So fail a lot! There is a cheat-sheet in sklearn to start with.
Accuracy can be misleading. Don’t boast about it. Look at other scores to evaluate a model. Understand how they are calculated.
Feature selection and feature engineering — Know it and practice it.
Your job does not end with cPickle-ing the model. It ends with Flask. At least understand how it is done.
Google everything which you don’t understand. Even if it is something from this article.
Data Science is not rocket science. You don’t have to start perfect. Start small and always prepare yourself to get your hands dirty.
Data Science starts with Data — Know it and exploit it well before jumping on algorithm decisions.
Stay persistent. Every single job on earth demands this and data science is no exception.
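To illustrate the point above that accuracy can be misleading, here is a minimal sketch in plain Python. The labels are made up for illustration (95 negatives and 5 positives, roughly the shape of an anomaly detection problem), and the helper names `confusion_counts` and `scores` are hypothetical, not taken from any library. A "model" that always predicts the majority class looks great on accuracy while finding none of the positives.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted positive
    # or when there are no positives at all.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up, heavily imbalanced labels: 95 normal cases (0), 5 anomalies (1).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred_majority = [0] * 100

result = scores(y_true, y_pred_majority)
print(result)
```

Here accuracy comes out at 0.95 even though recall and F1 are both 0, because not a single anomaly is caught. In practice, scikit-learn's metrics module provides these scores ready-made; writing them out by hand once is one way to understand how they are calculated.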
Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also feel free to listen to me on SoundCloud.