The key challenge in on-line advertising is placing the right ad for a specific user context, such as a search query (sponsored search) or a web-page being browsed (contextual advertising). The computational techniques that find the best match, delivering the most relevant ad while maximizing value for advertisers, publishers and ad-networks, lie at the intersection of machine learning, statistics, optimization and micro-economics.
In Y! Labs, Bangalore, we are working on aspects ranging from relevant ad matching for contextual advertising to accurate click prediction for display ads.
In contextual advertising, the “best” ad is selected based on its relevance to the web-page. As pages change and new ones continuously come online, the page URL is all that is available to determine the best ad. We are developing unsupervised techniques to hierarchically model the web page inventory in a given Web site into contextually similar sub-sites, associate topics with the identified sub-sites, and use these sub-site features for better ad targeting.
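As a rough illustration of working from URLs alone, the sketch below groups pages into candidate sub-sites by a fixed-depth URL path prefix. This is only a stand-in under a strong assumption (that sub-sites line up with directory prefixes); it is not the unsupervised technique described above, and the function name `group_by_path_prefix` and the `depth` parameter are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_path_prefix(urls, depth=1):
    """Group URLs by host plus the first `depth` directory segments.

    Assumption (illustrative only): pages that share a directory prefix
    belong to the same sub-site.
    """
    groups = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        segments = [s for s in parsed.path.split("/") if s]
        # Treat the last segment as the page itself unless the path
        # ends with "/", in which case every segment is a directory.
        if segments and not parsed.path.endswith("/"):
            directories = segments[:-1]
        else:
            directories = segments
        key = "/".join([parsed.netloc] + directories[:depth])
        groups[key].append(url)
    return dict(groups)


urls = [
    "http://example.com/sports/cricket/a.html",
    "http://example.com/sports/tennis/b.html",
    "http://example.com/finance/c.html",
    "http://example.com/index.html",
]
coarse = group_by_path_prefix(urls, depth=1)  # top-level sub-sites
fine = group_by_path_prefix(urls, depth=2)    # one level deeper
```

Calling the function at increasing depths gives a simple hierarchy of sub-sites. A learned model would instead decide where to split based on the content or behaviour of the pages rather than on a fixed depth.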
An upcoming trend is that of open ad exchanges, where guaranteed and non-guaranteed ads will compete with each other for placement on pages. A key problem in this scenario is meaningfully comparing an ad that pays per impression (CPM ad) with an ad that pays per click (CPC ad). A known metric that makes the comparison fair is eCPM (effective cost per thousand impressions), which is the product of the bid amount and the probability of realizing revenue. For a CPC ad, the problem therefore boils down to accurately estimating the probability of a click, given a user, an ad and the page. Because clicks are rare events, we are developing statistical and machine learning techniques for rare-event probability estimation, with the goal of reducing the bias and variance of the estimated click probabilities.
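The eCPM comparison can be sketched in a few lines. The example assumes the CPM bid is quoted per thousand impressions and the CPC bid per click; all function names are hypothetical. The smoothed estimator is a simple additive-prior formula included only to show why raw click counts are unreliable for rare events; it is not the estimation technique referred to above.

```python
def ecpm_for_cpm_ad(bid_per_thousand_impressions):
    """A CPM ad earns revenue on every impression, so its eCPM is its bid."""
    return bid_per_thousand_impressions


def ecpm_for_cpc_ad(bid_per_click, click_probability):
    """Expected revenue per thousand impressions for a CPC ad."""
    return bid_per_click * click_probability * 1000.0


def smoothed_click_probability(clicks, impressions, prior_ctr, prior_weight):
    """Shrink the raw CTR toward a prior CTR (illustrative assumption).

    `prior_weight` acts like a number of pseudo-impressions observed at
    `prior_ctr`, so an ad with few impressions is not estimated at 0.
    """
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


def higher_value_ad(cpm_bid, cpc_bid, click_probability):
    """Return which ad type has the higher eCPM for one placement."""
    cpm_value = ecpm_for_cpm_ad(cpm_bid)
    cpc_value = ecpm_for_cpc_ad(cpc_bid, click_probability)
    return "CPC" if cpc_value > cpm_value else "CPM"


# A CPC ad bidding 0.50 per click with a 0.4% click probability has an
# eCPM of about 2.0, so it outranks a CPM ad bidding 1.50.
winner = higher_value_ad(cpm_bid=1.50, cpc_bid=0.50, click_probability=0.004)
```

The ranking is only as good as the click probability fed into it: with the same bids, halving the estimate to 0.2% flips the outcome in favour of the CPM ad, which is why bias and variance in the estimate matter.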
While monetization can be increased by placing the most-clicked ads more often on web-pages, doing so is often not desirable or even allowed. We are working on defining and computing metrics for ad-quality that encompass features beyond the short-term clickability of an ad. Diversity (in relation to other ads for a similar context) is one of them. Semantic relevance (as opposed to frequency-based matching of common terms) is another. The goal of this project is to synthesize these metrics into a single unified ad-quality score that can be used at serving time to increase CTRs.
We are working on data-driven ways to generate customized recommendations that help our advertisers improve ad performance. Different advertising models (pay-per-impression banners, pay-per-click banners and sponsored search results) pose different scientific and business challenges. We are working on a wide variety of algorithms for click-through rate prediction, statistical modeling, market-basket analysis, and keyword expansion. Solutions need to be efficient and scalable, requirements that pose unique challenges of their own.