The coding round has become an integral part of Data Science interviews. As ubiquitous as it may be, it is also a dreaded round for many. With this post, I aim to fight fear with information by sharing the different types of coding interviews and questions I encountered recently.
Let us look at the different formats of execution and the questions asked, and understand which concept each question is testing.
You are asked to open an editor (Jupyter notebook) and share your screen with the interviewer. It’s appreciated when a candidate talks through their process to keep the interviewer on…
This post will help you understand the advantage of AUC over other metrics, how it’s calculated (using the ROC curve), and why it needs to be calculated that way.
If you have built a classifier, you have most certainly measured the performance of the model using metrics like accuracy, precision, recall, or F-score. But each of these metrics is calculated only after defining a cutoff probability (such as 0.5) at which to measure it.
What about when you have two competing models and want to compare their performance? What if model 1 performs best at a 0.5 cutoff, and model 2 performs best…
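One way to see why AUC sidesteps the cutoff problem is to sweep every possible cutoff yourself. The sketch below is a minimal, dependency-free illustration, not the post's own code: the function names roc_points and auc are hypothetical, and it assumes binary labels coded as 0/1, with higher scores meaning "more likely positive". Each distinct score is tried as a cutoff, the resulting (false positive rate, true positive rate) pairs trace the ROC curve, and the trapezoidal rule gives the area under it.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, starting at (0, 0) and sweeping cutoffs from high to low."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the cutoff
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: true labels and one model's predicted probabilities
y = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y, model_scores)))
```

Because the result summarizes performance across all cutoffs at once, two competing models can be compared by AUC without first agreeing on a cutoff like 0.5. In practice, scikit-learn's roc_auc_score should give the same value for the same labels and scores.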
If you have read about Bootstrap and Out of Bag (OOB) samples in Random Forest (RF), you would most certainly have read that the fraction of observations in the ‘bag’ when you build RF with bootstrap is around 63.2%.
This post is a crisp explanation of the origin of the number 63.2%.
The post is organized as:
RF is an ensemble learning technique based on bagging.
Bagging = Bootstrap + Aggregation
Bootstrap means that instead of training on all the observations, each tree of RF is trained on…
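The 63.2% figure follows from a short probability argument. A bootstrap sample draws n observations with replacement from n, so a given observation is missed on a single draw with probability 1 - 1/n, and missed on all n draws with probability (1 - 1/n)^n. As n grows, this tends to 1/e ≈ 0.368, leaving about 1 - 1/e ≈ 0.632 of the observations in the bag. The sketch below, with hypothetical helper names, compares this formula against a quick simulation using Python's standard random module.

```python
import math
import random


def expected_in_bag_fraction(n):
    """Probability that a given observation appears at least once in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n


def simulated_in_bag_fraction(n, trials=200, seed=0):
    """Average fraction of distinct observations when drawing n of n with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of row indices
        total += len(set(sample)) / n                   # share of rows that made it into the bag
    return total / trials


for n in (10, 100, 1000):
    print(n, round(expected_in_bag_fraction(n), 4), round(simulated_in_bag_fraction(n), 4))
print("limit", round(1 - 1 / math.e, 4))
```

The formula and the simulation should agree closely, and both approach the limit as n grows; the remaining roughly 36.8% of rows are the out-of-bag (OOB) samples for that tree.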
The goal of this article is to introduce the concept of POS (part-of-speech) chunking, using Amazon review tags as an example.
I am planning to upgrade from a 2017 Moto G5 Plus to a new phone. While researching, I ended up going through a lot of phones listed on Amazon and scouring their reviews.
And just like me, you may have noticed a list of tags above the verbose reviews. These tags saved me a lot of time by highlighting the most talked-about points about the phone.
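Tags like these read as short noun phrases, which is what POS chunking extracts. Below is a minimal rule-based sketch, not the method Amazon or this post necessarily uses: it assumes the tokens have already been POS-tagged (the Penn Treebank tags here are assigned by hand for illustration; a tagger such as spaCy's would normally produce them) and groups any run of adjectives followed by one or more nouns into a chunk. The function and variable names are hypothetical.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}

# Hand-tagged example review (tags assigned manually, not by a real tagger)
tagged = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("and", "CC"), ("the", "DT"), ("rear", "JJ"),
    ("camera", "NN"), ("quality", "NN"), ("is", "VBZ"), ("good", "JJ"),
    (".", "."),
]


def chunk_noun_phrases(tagged_tokens):
    """Return lowercase chunks matching the tag pattern: adjective* noun+."""
    chunks = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        j = i
        while j < n and tagged_tokens[j][1] in ADJ_TAGS:   # optional adjectives
            j += 1
        k = j
        while k < n and tagged_tokens[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:  # at least one noun found: emit the whole span as a chunk
            chunks.append(" ".join(word for word, _ in tagged_tokens[i:k]).lower())
            i = k
        else:      # no noun here: move on by one token
            i += 1
    return chunks


print(chunk_noun_phrases(tagged))
```

The pattern is roughly what NLTK's RegexpParser would express with the grammar "NP: {<JJ>*<NN.*>+}", and spaCy offers a built-in alternative through Doc.noun_chunks. Counting such chunks across many reviews, for example with collections.Counter, is one plausible way to surface the most talked-about points.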
If you’ve used SQL to perform a text search, you have probably used the LIKE operator. But the limitation of LIKE is that it only looks for exact (literal) matches of the pattern you give it. Luckily for us, SQL offers a feature, the FULL TEXT INDEX, that provides fuzzy text search on any column containing raw text. This is a godsend for NLP projects.
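To make the LIKE-versus-full-text contrast concrete, here is a small sketch that swaps in SQLite's FTS5 extension, reachable through Python's built-in sqlite3 module, as a stand-in for the full-text index described above (in SQL Server, for example, that is created with CREATE FULLTEXT INDEX and queried with CONTAINS or FREETEXT). It assumes your SQLite build includes FTS5, which most do; the table names and sample reviews are made up for illustration. With the porter tokenizer, "charge", "charges" and "charging" reduce to the same stem, so the full-text query finds word forms that a LIKE pattern misses.

```python
import sqlite3

# Made-up sample reviews for illustration
reviews = [
    "Charging is fast with the bundled adapter",
    "The phone charges overnight without heating",
    "Great camera, average battery",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (body TEXT)")
conn.executemany("INSERT INTO reviews (body) VALUES (?)", [(r,) for r in reviews])

# LIKE: literal substring match (case-insensitive for ASCII in SQLite),
# so '%charge%' finds 'charges' but misses 'Charging'
like_hits = [row[0] for row in conn.execute(
    "SELECT body FROM reviews WHERE body LIKE ?", ("%charge%",))]

# FTS5 virtual table with the porter stemmer: words are indexed by their stem
conn.execute("CREATE VIRTUAL TABLE reviews_fts USING fts5(body, tokenize='porter')")
conn.execute("INSERT INTO reviews_fts (body) SELECT body FROM reviews")
fts_hits = [row[0] for row in conn.execute(
    "SELECT body FROM reviews_fts WHERE reviews_fts MATCH ?", ("charge",))]

print("LIKE:", like_hits)
print("FTS5:", fts_hits)
```

Stemming is only one part of what full-text engines offer (ranking and phrase or proximity queries are others), but it already shows why a full-text index suits raw review text better than LIKE.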
I, for one, am a big fan of the NLP libraries available in Python: scikit-learn and spaCy.
But before stepping into the deep waters of text processing, it’s good to dip your toe…
It’s important to know the difference
Quite often, when we venture into something new, we are faced with doubts and fears. We are even tempted to give up, saying it is impossible. At such moments, if we can draw the line between the impossible and the unknown, we can transcend the fear quite easily.
More often than not, the task is only unknown. At such times, we need to list the things to do and find the best person or resource to guide us. Slowly, what seemed impossible starts becoming a reality.
On the other hand, if it really is impossible, try anyway. At least then we know that it’s impossible this way. Perhaps it’ll be possible some other way.