Relevance Learning Challenge - Collecting Labeled Data Part 3 of 3

In the three-part Relevance Learning Challenge series, we explore training data collection, judgment systems, and how the human nature of judges affects the results. Part 3 looks more closely at judgment collection and system design, with specific suggestions that can improve overall model accuracy and make life easier for judges.

Relevance Learning Challenge - Collecting Labeled Data Part 2 of 3

In the three-part Relevance Learning Challenge series, we explore training data collection, judgment systems, and how the human nature of judges affects the results. Part 2 explores collecting labeled data and multi-valued relevance judgments.

Relevance Learning Challenge - Collecting Labeled Data Part 1 of 3

The hardest task in applied ML, and probably the one with the least literature behind it, happens to be the one most people take for granted, and many suffer for it: problem definition. This post explores search and ranking tasks and offers advice and examples on the problem of problem definition, to help improve a product's chances of success.

Know Your Priors (Prior Probability)

My first blog post hits on a topic that is very important to me, one that is fundamental to all of machine learning, yet that so many practitioners forget about or, even worse, don’t understand: prior probability. The concept has become my go-to interview question for candidates for machine learning jobs, and I believe that failing to properly consider it is one of the largest causes of commercial failures in applied machine learning.

While the phrase “prior probability” might cause some people to cringe in horror, recalling that crazy-hard math class in college, the key lessons here are not very advanced, and I hope that even a person who doesn’t like math will embrace them. Hopefully a few minutes here will save hundreds or thousands of dollars of lost time and product launch failures.