DS/AI-SSH: Building Your Portfolio
In the last post, we looked at the resources you can refer to in order to build your skill-set in the DS/AI field.
This is the 6th post in the blog series ‘DS/AI: Self-Starter Handbook’. It covers how DS/AI starters can build their portfolio. The following topics are covered:
Working on Public data-sets
Participating in Competitions
Contributing on GitHub
Publishing your Blog
This blog-post talks about how you can build your DS/AI portfolio. Let’s first understand why a portfolio is important in the DS/AI field.
Building a portfolio is a way to learn, and it can also help you get hired.
For the purpose of this article, let’s define a portfolio as public evidence of your DS/AI skills.
People often forget that software engineers and data scientists also google their issues. If these same people have their problems solved by reading your public work, they might think better of you and reach out to you.
Working on Public data-sets
You can gain more DS/AI skills by working on prediction problems than by getting stuck in an endless learning loop.
You are unlikely to get a real project to work on from day one of your learning. Still, there are platforms where you can learn DS/AI by applying it.
The UCI Machine Learning Repository is a collection of data-sets that are used by the machine learning community for the analysis of machine learning algorithms. The archive was created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data-sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited “papers” in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged.
Kaggle is where many data scientists spend their nights and weekends. It’s a crowd-sourced platform to attract, nurture, train and challenge data scientists from all around the world to solve data science, machine learning and predictive analytics problems. It has over half a million active members from 190+ countries and it receives close to 150K submissions per month. Founded in Melbourne, Australia, Kaggle moved to Silicon Valley in 2011 and was ultimately acquired by Google in March 2017. Kaggle is the number one stop for data science enthusiasts all around the world who compete for prizes and boost their Kaggle rankings. There are only a handful of Kaggle Grandmasters in the world to this date.
Many aspiring data scientists know the theory but rarely get a chance to practise on real-world problems before they are employed. Kaggle addresses this by giving data science enthusiasts a platform to interact and compete in solving real-life problems. The experience you get on Kaggle is invaluable in preparing you to understand what goes into finding feasible solutions for big data.
Data.gov is the home of the U.S. Government’s open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more. Data.gov is managed and hosted by the U.S. General Services Administration, Technology Transformation Service. It is powered by two open source applications, CKAN and WordPress, and it is developed publicly on GitHub, where you can also contribute to Data.gov and these larger open source projects.
The open datasets hosted on Amazon Web Services (AWS) cover many fields, such as public transport, ecological resources and satellite images. A search box helps you find the dataset you are looking for, and the datasets come with descriptions and usage examples that are informative and easy to use.
The datasets are stored in AWS resources such as Amazon S3, a highly scalable object storage service in the cloud. If you already use AWS for machine learning experimentation and development, this is handy: transferring the datasets is quick because they stay within the AWS network.
Google’s Datasets Search Engine
In late 2018, Google launched another service: a tool that can search for datasets by name. Its aim is to unify tens of thousands of different dataset repositories and make that data discoverable.
In July 2018, Microsoft along with the external research community announced the launch of “Microsoft Research Open Data”. It contains a data repository in the cloud dedicated to facilitating collaboration across the global research community. It offers a bunch of curated datasets that were used in published research studies.
FiveThirtyEight, sometimes rendered as 538, is a website that focuses on opinion poll analysis, politics, economics and sports blogging. The website, which takes its name from the number of electors in the United States electoral college, was founded on March 7, 2008, as a polling aggregation website with a blog created by analyst Nate Silver.
FiveThirtyEight publishes the data and code behind some of its popular articles and graphics. You can use them to check others’ work and to create stories and visualizations of your own.
Participating in Competitions
Participating in DS/AI competitions is one of the most common paths taken by data scientists. While a competition does not expose you to every challenge of a real project, it can help you build your exploratory, modelling and cross-validation skills. You can also learn from fellow competitors’ approaches once the competition is over.
Kaggle runs many different kinds of competitions, each featuring problems from a different domain and with a different level of difficulty. Before you start, navigate to the Competitions listing, which shows all of the currently active competitions.
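If cross-validation is new to you, the idea is to split the data into k folds, hold out one fold at a time for scoring, and train on the rest. The sketch below shows this with only the Python standard library. The toy data, the function names and the nearest-centroid classifier are assumptions chosen to keep the example short, not a method prescribed by any competition platform.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the row indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x (single numeric feature)."""
    centroids = {
        label: mean(v for v, y in zip(train_x, train_y) if y == label)
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def cross_validate(xs, ys, k=5, seed=0):
    """Return one accuracy score per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # Train only on rows that are not in the held-out fold.
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores

# Made-up toy data: one numeric feature, two well-separated classes.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
ys = ["a"] * 5 + ["b"] * 5
scores = cross_validate(xs, ys, k=5)
print(scores, mean(scores))
```

In a competition you would swap the toy classifier for your real model and compare the mean fold score across candidate approaches before submitting.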
If you click on a specific Competition in the listing, you will go to the Competition’s homepage.
DataHack by AnalyticsVidhya
AnalyticsVidhya DataHack is also a platform where you can compete with the best in the world on real-life data science problems. You learn by working on real-world problems, upskill yourself, and showcase your expertise to get hired by the listed companies. If you finish at the top of a competition, you can also win lucrative prizes.
Machine Hack by AIM
COMPETE. CODE. COLLABORATE.
MachineHack is an online platform for Machine Learning competitions. It hosts tough business problems that can be solved using Machine Learning & Data Science techniques. Companies can use it to discover, evaluate and hire talented data scientists.
Just like Kaggle & DataHack, you can enrol in competitions here and help the host solve their business problem. In return, you get near real-world project experience and can learn from fellow competitors once the competition is over.
Publishing on GitHub
GitHub is a powerful platform for software development, but at its heart, it’s about empowering people like you by helping you learn from other developers, build the software that matters to you, and propel yourself to the next stage of your life as a software developer.
GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.
To work on GitHub, you need to learn essentials like repositories, branches, commits and pull requests. A good first exercise is to create your own Hello World repository and walk through GitHub’s pull request workflow, a popular way to create and review code.
Publish on GitHub
GitHub Pages are public webpages hosted and easily published through GitHub. The quickest way to get up and running is by using the Jekyll Theme Chooser to load a pre-made theme. You can then modify your GitHub Pages’ content and style remotely via the web or locally on your computer.
Publishing your Blog
Writing blogs is an effective way to showcase your expertise and skills. You can write about what you have learnt recently, an interesting problem you have solved, or a project you have worked on.
Writing an engaging blog-post is an art in itself. Here are a few tips for writing and promoting your blog posts.
Take notes for ideas
Start by writing down ideas as they occur to you. Make it a habit by installing a note-taking app (such as Keep or Evernote) on your mobile device.
Ideas occur to us all the time. You need a way to capture them when they do so that you can turn them into a great blog post in the future.
Build a simple outline
Before you sit down to write a blog post, it is essential to develop an easy-to-follow outline.
Once you’ve picked a topic from the list of ideas you’ve written down, create an outline. The outline contains a heading, an introduction, the major points you want to write about and a conclusion.
To get the juices flowing, you should actually write the introduction and the conclusion first, then add a list of things that you’ll cover in the body.
Start with a story
Entertainment is the biggest factor in engaging your audience. If you’re just about to start a blog, keep this in mind.
Stories draw people in and help clear their doubts. They let you set a scene that people can relate to.
Become a memorable writer by integrating stories into your blog posts. It doesn’t have to be your own story; you can tell interesting stories about others.
Solve common problems
Consistent writing is one of the easiest ways to become a better writer. The question is, what should you write about? As a beginner, write blog posts that answer questions.
Look for the problems that are common in your field, the ones most people are struggling with. Research the topic and explore the problem and its possible solutions.
Learn & Share
When I write a blog post, I read a lot about the subject. On the web and in real life, there are too many questions with too few answers.
Often, you will end up learning a lot yourself while trying to write a post on a certain topic.
Read other great writers
The truth is that if you don’t read great writers, you won’t really know how to write well, and the successful blog you dream of will elude you.
I’ve learned that I get a better education from studying authors’ best work than I do from waiting for a piece of advice from them.
Giving back can be a rewarding part of being a successful, well-rounded data scientist. Mentoring those who want to follow in your footsteps can sharpen your expertise and strengthen your credentials.
Build a Personal Brand
Building a brand is about giving yourself more opportunities to help and connect with people in your industry. And one of the best ways to build a brand is through blogging.
A blog is a hub for your advice. It also has the added benefit of helping you rank on search engines.
I hope this post inspires at least a few of you who want to become data scientists to get better day by day by following the approach described above.
In the next post, we will cover how you can build your network and start looking for a job.
Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also, feel free to listen to me on SoundCloud.