An explanation for why the bagging fraction is 63.2%

If you have read about Bootstrap and Out of Bag (OOB) samples in Random Forest (RF), you would most certainly have read that the fraction of observations in the ‘bag’ when you build RF with bootstrap is around 63.2%.

This post is a crisp explanation for the origins of the number 63.2%.

How Random can the Forest be? (Photo by David Kovalenko on Unsplash)

The post is organized as:

  1. Example of Bootstrap
  2. Generalizing the example
  3. Simulation
  4. Conclusion

Recap of RF terminology

RF is a technique of ensemble learning through Bagging.

Bagging = Bootstrap + Aggregation

Bootstrap means that instead of training on all the observations, each tree of RF is trained on…
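A short simulation can make the 63.2% figure concrete. This is a minimal sketch, assuming the usual bootstrap setup in which each tree draws n observations with replacement from the n available. The function names, the sample size, and the number of trials are illustrative choices, not taken from the post.

```python
import math
import random


def in_bag_fraction(n_obs, rng):
    """Draw n_obs indices with replacement; return the share of distinct ones."""
    sample = [rng.randrange(n_obs) for _ in range(n_obs)]
    return len(set(sample)) / n_obs


def mean_in_bag_fraction(n_obs=1000, n_trials=200, seed=0):
    """Average the in-bag share over several independent bootstrap samples."""
    rng = random.Random(seed)
    total = sum(in_bag_fraction(n_obs, rng) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    print(f"simulated in-bag fraction: {mean_in_bag_fraction():.4f}")
    print(f"1 - (1 - 1/n)^n for n=1000: {1 - (1 - 1 / 1000) ** 1000:.4f}")
    print(f"limit 1 - 1/e:             {1 - 1 / math.e:.4f}")
```

Under that assumption, the chance that a given observation is never drawn in n tries is (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows, so roughly 63.2% of the observations end up in the bag.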

Using a regex parser based on grammar to extract key phrases

The goal of this article is to introduce the concept of POS chunking with the example of Amazon review tags.

I am planning to upgrade from a 2017 Moto G5 plus to a new phone. In my research for a new phone, I ended up going through a lot of phones listed on Amazon and scouring through their reviews.

Screenshot captured by Author

And just like me, you’d have noticed a list of tags on top of the verbose reviews. These tags saved me a lot of time by highlighting the most talked about points regarding the phone.
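Tags like these can be approximated by running a regular expression over part-of-speech tags. The sketch below is a standard-library illustration of that idea, not the author's implementation: the review tokens are made up and tagged by hand (a real pipeline would use a POS tagger), and the grammar "optional adjectives followed by one or more nouns" is a hypothetical choice.

```python
import re

# Made-up review, hand-tagged with Penn Treebank style tags for illustration.
TAGGED_REVIEW = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("front", "JJ"),
    ("camera", "NN"), ("struggles", "VBZ"),
]

# Hypothetical grammar: zero or more adjectives, then one or more nouns.
NOUN_PHRASE = re.compile(r"(?:<JJ>)*(?:<NN[A-Z]*>)+")


def extract_phrases(tagged_tokens, pattern=NOUN_PHRASE):
    """Return the word sequences whose tag sequence matches the pattern."""
    tag_string = ""
    starts = {}  # character offset where a tag begins -> token index
    ends = {}    # character offset where a tag ends -> token index
    for index, (_, tag) in enumerate(tagged_tokens):
        starts[len(tag_string)] = index
        tag_string += f"<{tag}>"
        ends[len(tag_string)] = index

    phrases = []
    for match in pattern.finditer(tag_string):
        first, last = starts[match.start()], ends[match.end()]
        words = [word for word, _ in tagged_tokens[first:last + 1]]
        phrases.append(" ".join(words))
    return phrases


if __name__ == "__main__":
    print(extract_phrases(TAGGED_REVIEW))
```

Encoding each tag as `<TAG>` keeps every match aligned to whole tokens, which is what lets the character offsets map back to words.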

Using Full Text Search

Photo by Marten Newhall on Unsplash

If you’ve used SQL to perform a text search, you have probably used the LIKE operator. Its limitation is that it only matches the literal pattern you give it. Luckily for us, SQL offers a feature, the SQL FULL TEXT INDEX, that provides fuzzy text search capability on any column that contains raw text. This is a godsend for NLP projects.
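To see the limitation of LIKE in practice, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in chosen because it needs no setup, and the review texts and the `like_search` helper are made up for illustration.

```python
import sqlite3

# Made-up review snippets for illustration.
REVIEWS = [
    "Great battery life for the price",
    "The battery easily has a two-day life",
    "Battery-life could be better",
]


def like_search(pattern, reviews=REVIEWS):
    """Load the reviews into an in-memory table and filter them with LIKE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE reviews (body TEXT)")
        conn.executemany(
            "INSERT INTO reviews (body) VALUES (?)",
            [(review,) for review in reviews],
        )
        rows = conn.execute(
            "SELECT body FROM reviews WHERE body LIKE ? ORDER BY rowid",
            (pattern,),
        ).fetchall()
    finally:
        conn.close()
    return [body for (body,) in rows]


if __name__ == "__main__":
    # Only the review containing the literal text "battery life" comes back.
    print(like_search("%battery life%"))
```

All three reviews talk about battery life, but the pattern only finds the one that contains those two words back to back.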

I, for one, am a big fan of the NLP libraries offered by Python, such as scikit-learn and spaCy.

But before one steps into the deep waters of text processing, it’s good to dip your toe…

It’s important to know the difference

Photo by Ethan Dow on Unsplash

Quite often, when we venture into something new, we are faced with doubts and fears. We are even tempted to give up, saying it was impossible. At such a time, if we can draw the line between the impossible and the unknown, we can quite easily transcend the fear.

More often than not, the task is only unknown. At such times, we need to list the things to do and find the best person or resource to guide us. And slowly, what was impossible starts becoming a reality.

On the other hand, if it really is impossible, still try. At least now we know that it’s impossible in this way. Perhaps it’ll be possible some other way.

Divya Choudhary
