Who needs a Data Engineer?
Who needs a Data Engineer today, when everyone wants to hire a Data Scientist?
Let me start with a real-life situation: a new, enthusiastic data scientist joins a firm. He knows how to analyse data, how to build models around it, and how to create data stories. The business wants him to work on a use-case, so he studies the use-case and starts looking around for data to work on. And he keeps on waiting, because no ready-made data is available; the data is scattered across various data stores. Now the data scientist needs help, and here the data engineer comes to his rescue.
“A Data Engineer is responsible for the creation, processing and maintenance of data pipelines which give processed data that enables data scientists to work on their use-cases.”
So I would like to call ‘data science’ ‘data science & engineering’ instead, which gives a better idea of the engineering skills required in this field.
But not all organizations realize that they require both roles, and data scientists often end up spending most of their time on data engineering tasks.
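To make the idea of a data pipeline a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a production design: the two data stores (a CSV export and an in-memory SQLite table), the table and column names, and the sample rows are all made up for this example. It pulls data from both stores, combines them, and hands back processed data that a data scientist could start working with.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export of orders (illustrative data only).
ORDERS_CSV = """order_id,customer_id,amount
1,10,25.50
2,11,40.00
3,10,14.50
"""


def extract_orders(csv_text):
    """Read raw order rows from CSV text as a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract_customers(conn):
    """Read customer rows from a SQL store and index them by customer id."""
    rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
    return {customer_id: name for customer_id, name in rows}


def transform(orders, customers):
    """Join the two sources and total the order amount per customer name."""
    totals = {}
    for order in orders:
        name = customers.get(int(order["customer_id"]), "unknown")
        totals[name] = totals.get(name, 0.0) + float(order["amount"])
    return totals


def run_pipeline():
    """Extract from both stores, combine them, and return the processed data."""
    # Hypothetical source 2: a customer table, kept in memory for this sketch.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?)", [(10, "Asha"), (11, "Ben")]
        )
        return transform(extract_orders(ORDERS_CSV), extract_customers(conn))
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

In a real organization the stores would be databases, APIs or files owned by different teams, and the pipeline would also need scheduling, monitoring and error handling, which is exactly where the engineering effort goes.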
Skills of a Data Engineer
An article from DataQuest lists the following skills that a data engineer should have:
Architecting distributed systems
Creating reliable pipelines
Combining data sources
Architecting data stores
Collaborating with data science teams and building the right solutions for them
Panoply has published a decent article, ‘How to Become A Data Engineer’, which also highlights the skills required for the role.
Data Scientists vs Data Engineers
In general, data scientists are great at advanced analytics, while data engineers are stronger on the programming front.
The differences between data engineers and data scientists are explained in an article by DataCamp, which compares the two roles on responsibilities, tools, languages, job outlook, salary, etc.
An article on O’Reilly coins the term ‘Machine Learning Engineer’ for a role that fills the gap between a Data Scientist & a Data Engineer.
Ratios of data engineers to data scientists
Even if an organization or department realizes that it needs both roles, a common issue is figuring out the ratio of data engineers to data scientists. Considering that building data pipelines requires more effort, a common starting point is 2–3 data engineers for every data scientist.
Ankit Rathi is an AI architect, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.