Intent Media is now Intent

Hack Day: March 2014

  • Posted by Intent Media

One of the best things about working in tech at Intent Media is the fun we all have while working. I’m not just talking about the obviously fun stuff like playing foosball or watching the dogs chase their own shadows. We also seem to have an inordinate amount of fun battling with checkstyle and debugging EMR jobs. I think it’s a great sign for our future how much we all seem to enjoy doing the stuff that is, in fact, our real jobs.

All of this goes double for when we’re really trying to have a bit of fun with things during my favorite regular IM event, Hack Day. We have Hack Days about once every quarter. The goal is to work on something outside what we would normally do. Of course, we’ve had lots of great products come out of Hack Days and make it into customers’ hands. We’ve also had a ton of success with other Hack Day projects that focused on improving our office, our kitchen, and just our experience at work. This latest Hack Day was no exception to our long track record of successful Hack Days with tons of awesome projects. Here’s a recap.

Day 0: Continue reading

Deleting Cookies in Safari 6.0 with Selenium WebDriver

  • Posted by Intent Media

One of the many useful features of Selenium WebDriver is that it can act as a wrapper for several different browser drivers. This means (in theory, at least) that if you restrict yourself to calling methods exposed by the WebDriver API, you can reuse the same code to drive automated tests in any of the browsers WebDriver supports.

In practice, however, this is not always the case. There can be subtle differences in how WebDriver’s methods are implemented for each browser. In some cases, certain browsers may not support a given method at all.

One example is deleting cookies. In Ruby, we can tell WebDriver to delete all cookies for the current domain by calling the delete_all_cookies method on the driver’s manage interface.

This executes WebDriver’s browser-specific implementation of the delete_all_cookies method, which for most browsers works as expected.

For Safari, a message is passed to the SafariDriver telling it to execute its own cookie deletion method. On OS X, SafariDriver does this by deleting the ~/Library/Cookies/Cookies.binarycookies file where the cookie data is stored.

Unfortunately, as of Safari 6.0, cookie data is also stored in active memory by the cookied process. If cookied notices that Cookies.binarycookies is missing, it simply recreates the file from memory — … Continue reading

Random Assumptions can make an A– Part 1

  • Posted by Intent Media


Sometimes it’s good to take an in-depth look at the mistakes we make. This is an examination of an issue we were addressing, and of how a couple of assumptions we made introduced a new and devious bug.

It ain’t pretty but looking back is how we get better. And hopefully you can read this and learn something without making the same mistake.

We split this into two posts. The first post covers the original problem we encountered and our attempt to solve it. The second post will cover the bug we introduced, and our final resolution.

Background on Multivariate Testing

We here at Intent Media believe in testing and data. We have a whole system dedicated to testing the effectiveness of our ads, the Multivariate Test System. This system allows us to test multiple attributes of our ads at the same time. It’s used throughout our platform and has proven to be a very effective tool for improving ad performance.
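As a rough illustration of testing several attributes at once, here is a minimal Python sketch. It is not the Multivariate Test System itself: the attribute names, variant names, weights, and the hash-based bucketing are all assumptions made for the example. Each visitor gets one variant per attribute, and the same visitor always lands in the same variant.

```python
import hashlib

# Hypothetical test definitions: attribute -> list of (variant, weight).
# The names and the 50/50 weights are placeholders for illustration only.
TESTS = {
    "design": [("design_a", 0.5), ("design_b", 0.5)],
    "ad_copy": [("copy_a", 0.5), ("copy_b", 0.5)],
}


def bucket(visitor_id, attribute):
    """Map (visitor, attribute) to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{attribute}:{visitor_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 16 ** 8


def assign(visitor_id):
    """Pick one variant per attribute, independently for each attribute."""
    chosen = {}
    for attribute, variants in TESTS.items():
        point = bucket(visitor_id, attribute)
        cumulative = 0.0
        for name, weight in variants:
            cumulative += weight
            if point < cumulative:
                chosen[attribute] = name
                break
        else:
            # Guard against rounding when weights do not sum to exactly 1.
            chosen[attribute] = variants[-1][0]
    return chosen


if __name__ == "__main__":
    print(assign("visitor-42"))
```

Hashing the attribute name together with the visitor id keeps the assignments for different attributes independent of each other, which is one way to avoid accidentally correlating two tests.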

For example, let’s say we decide to test a new design, Design Awesome, against our current design, Design Sweet with a 50/50 split. We also want to test a new Ad Copy, “Look at me!”, vs “I am … Continue reading

Introducing Dashboard.js

  • Posted by Intent Media


At Intent Media, we have a pretty glorious amount of data coming in from our partner sites, and we needed a way to visualize all of that data, in real-time, so that we could detect and recover from issues quickly and discover patterns that aren’t apparent using alerts alone. We already had this data available, but it involved logging in to a service and navigating to many different pages, which is very different from having it always immediately available on a screen somewhere in the office, where anybody walking by might notice something interesting going on.

We originally tried using Dashing, which is nice for displaying many simple data points on one page – but we wanted to display complicated data on many pages, leveraging the full power of d3.js. So we wrote Dashboard.js. We’re using this to display dozens of charts that are specific to our company (build monitors, code coverage, and interactive area charts detailing revenue, interactions, and extranet performance, to name a few), but we also released an open-source version that includes a couple of basic example services that hopefully demonstrate what’s possible.

The source code is available at – we’d love to continue to improve it and get … Continue reading

RAID0 and stripe sizes

  • Posted by Intent Media

This will hopefully be the first in a series of posts that relate to a recent project we undertook – improving our build cycle from over an hour and a half to 30 minutes.

The project touched the full vertical build stack – hardware and hardware virtualization, CI configuration and software. Every improvement in every slice of this vertical contributed to a speed increase of some kind.

Today, the topic is RAID0 and disk stripe size. Why? The answer is simple: your hardware could be a speed bottleneck.

Let’s start off with a couple of definitions:

Q. What is RAID0?

A. “RAID 0 offers striping with no parity or mirroring. Striping means data is “split” evenly across two or more disks. For example, in a two-disk RAID 0 set up, the first, third, fifth (and so on) blocks of data would be written to the first hard disk and the second, fourth, sixth (and so on) blocks would be written to the second hard disk. RAID 0 offers very fast write times because the data is split and written to several disks in parallel. Reads are also very fast in RAID 0. In ideal scenarios, the transfer speed of the … Continue reading
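The block layout in the definition above can be sketched in a few lines of Python. This is only a model of the round-robin placement, not how any particular RAID controller is implemented; the function names and the 64 KiB stripe size used in the usage example are assumptions made for illustration.

```python
def stripe(blocks, num_disks):
    """Distribute blocks round-robin across disks, as RAID 0 striping does."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


def disk_for_offset(offset, stripe_size, num_disks):
    """Return which disk holds the byte at `offset` for a given stripe size."""
    stripe_index = offset // stripe_size
    return stripe_index % num_disks


if __name__ == "__main__":
    # Two-disk example: odd-numbered blocks on disk 0, even-numbered on disk 1.
    print(stripe(["b1", "b2", "b3", "b4", "b5", "b6"], 2))
    # With an assumed 64 KiB stripe, consecutive stripes alternate disks.
    print([disk_for_offset(n * 65536, 65536, 2) for n in range(4)])
```

The second function shows why stripe size matters: it decides how many contiguous bytes land on one disk before the next disk takes over, and therefore how a given read or write is split across the array.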

Vowpal Wabbit on Elastic MapReduce

  • Posted by Intent Media


Vowpal Wabbit is a fast machine learning package from John Langford’s group at Microsoft and Yahoo. It can be run in parallel on a cluster, allowing for implementation of e.g. the algorithms outlined in Zinkevich et al.

  • An overview of using VW on a cluster (not EMR-specific) is here
  • A good tutorial introduction to VW in general is here

The following are some notes about setting up VW to run on an EMR cluster.

Install AWS Ruby Command Line Tools

The first step is to download and install the EMR Ruby client, setting it up so that one can launch and monitor jobs from the command line.

Create Bootstrap Script

The next step is to create a bootstrap script that each of the machines in the EMR cluster will run upon launch. This script needs to download the VW source, install VW and any tools necessary to build it, and put the libraries in a place where the EMR EC2 instances will see them. By default, the Amazon Machine Image that is launched for EMR can view libraries in the /usr/local/cuda/lib path. So we compile the vw libraries and copy them there, though we are not using … Continue reading

Distributed Classification with ADMM

  • Posted by Intent Media

Today we presented our paper on ADMM for Hadoop at the IEEE BigData 2013 conference.

The paper describes our implementation of Boyd’s ADMM algorithm in Hadoop Map Reduce. We talk about the statistical details of implementing ADMM as well as the nuances of storing state on Hadoop.

Our presentation gives background on the data pipeline we have built at Intent Media and motivates why a Hadoop Map Reduce job is the appropriate run-time for us to use. We mention the alternatives for building distributed logistic regression models, such as sampling the data, Apache Mahout, Vowpal Wabbit, and Spark.

We also discuss alternatives specifically designed for iterative computation on Hadoop, such as HaLoop and Twister.

Our presentation is here:

You may also read the full paper Practical Distributed Classification using the Alternating Direction Method of Multipliers Algorithm.

The paper describes our open source Hadoop-based implementation of the ADMM algorithm and how to use it to compute a distributed logistic regression model.

Peter Lubell-Doughtie
Software Engineer … Continue reading

Animating Bayesian Bandits in Python

  • Posted by Intent Media

Randomized probability matching is a strategy for solving the explore/exploit problem that arises when running experiments. It may be useful, for example, when a publisher is evaluating a number of different ad designs to determine which has the best click-through rate. At any given time the experimenter may decide to display the ad which has performed best so far (“exploit”), or try out different ads that might perform better (“explore”).

Ted Dunning showed a video demonstration of randomized probability matching during a recent Mahout presentation in New York. The following video is based on Ted’s, and the Python code that generates it gives some insight into how randomized probability matching works.

The upper plot of the animation represents each ad design with a different color. The click/no click events for each ad are sampled from a Bernoulli distribution with $P(Click)$ for each color based on the small arrows in the top plot. In a real world experiment, these would be the unknown values that the experimenter is trying to discern. The curves in the plot represent the uncertainty about the true $P(Click)$ values for each ad, or in other words, $P(P(Click) == x)$. They change over time as experiments are run. … Continue reading
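To make the idea concrete, here is a small Python sketch of randomized probability matching. It is not the code that generates the animation; it is a simplified stand-in that assumes a uniform Beta(1, 1) prior for each ad and uses made-up click-through rates. On every round it draws one sample from each ad’s Beta posterior and shows the ad whose sample is highest.

```python
import random


def choose_ad(clicks, misses, rng):
    """Sample each ad's posterior Beta(clicks + 1, misses + 1); pick the best draw."""
    samples = [rng.betavariate(c + 1, m + 1) for c, m in zip(clicks, misses)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against Bernoulli ads with the given (hidden) click rates."""
    rng = random.Random(seed)
    clicks = [0] * len(true_rates)
    misses = [0] * len(true_rates)
    for _ in range(rounds):
        ad = choose_ad(clicks, misses, rng)
        if rng.random() < true_rates[ad]:  # Bernoulli click event
            clicks[ad] += 1
        else:
            misses[ad] += 1
    return clicks, misses


if __name__ == "__main__":
    # The rates below are invented for the demo; in a real experiment they are unknown.
    clicks, misses = simulate([0.05, 0.10, 0.50], rounds=2000, seed=1)
    for ad, (c, m) in enumerate(zip(clicks, misses)):
        print(f"ad {ad}: shown {c + m} times, {c} clicks")
```

Ads with wide posteriors still win the draw occasionally, which is the “explore” part; as evidence accumulates, the best ad wins most draws, which is the “exploit” part.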

Designing Experiments with Continuous Inputs

  • Posted by Intent Media


We are continually running experiments to optimize publisher revenue and user experience. Many of these are A/B tests that involve two or more discrete alternative treatments. Others are experiments that involve a continuous set of possible quantitative treatments. Here, we will consider the latter case.

Example Application

As an example application, consider bidding in a sequence of auctions for a set of similar items. The more that one bids in a given auction, the more likely one will be to win it. But the result of a given auction may depend on the bidders involved and on variations between the items in the auction. So our observation of a given auction result may be a noisy one.

Let’s say that we want to model the probability of winning the auction as a function of the bid: $P(Win \sim Bid)$. Running experiments in this situation will cost money as we will need to place a number of possibly suboptimal bids to learn this function. So we would like to run as few experiments as possible.

One way we could approach this problem is to discretize the bid space. By doing so, the resulting experiment will be similar to the A/B … Continue reading
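A minimal sketch of the discretization idea, assuming bids are recorded in whole cents and each observation is a (bid, won) pair. The function name, the bin width, and the sample observations are all made up for illustration and are not data from our experiments.

```python
from collections import defaultdict


def win_rate_by_bin(observations, bin_width):
    """Group (bid, won) pairs into fixed-width bid bins; return win rate per bin."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for bid, won in observations:
        bin_start = (bid // bin_width) * bin_width  # lower edge of the bid's bin
        totals[bin_start] += 1
        if won:
            wins[bin_start] += 1
    return {start: wins[start] / totals[start] for start in sorted(totals)}


if __name__ == "__main__":
    # Invented observations: bid in cents, and whether the auction was won.
    observed = [(10, False), (40, False), (60, True), (90, False), (110, True), (140, True)]
    for start, rate in win_rate_by_bin(observed, bin_width=50).items():
        print(f"bids {start}-{start + 49} cents: win rate {rate:.2f}")
```

Each bin then behaves like one discrete treatment, at the cost of losing any information about how the win probability varies inside a bin.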

Database Migration Testing

  • Posted by Intent Media

The challenge

For the past month+ I’ve been on a really interesting project converting some of our MySQL reporting tables to Vertica. Initial indications are that Vertica flies and this will be an awesome investment for the company.

The challenge for me as the QA on the team was how to test this database change. A quick Google search for ‘database migration testing’ gave me some ideas, but not any home runs and not really specific to my problem. There are also tools available which do this, but none of my research turned up anything I was particularly impressed with.

In the end, I decided to use basic tools provided by MySQL and Vertica to grab data. Then, it was a matter of brushing off my shell scripting and command line skills and putting it all together. Along the way, I was able to pair with a developer who really helped to improve the initial scripts to something kind of awesome. And, I feel really comfortable with the results of the testing.

So here is the journey

The concept and first script

Why not use command line mysql and vsql to grab table contents and then do a straight diff … Continue reading
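The dump-and-diff concept can be sketched as follows. The scripts described in this post are shell scripts; this Python version is only a stand-in to show the idea. It assumes each table has already been dumped to a list of tab-separated text rows, and it sorts both sides first because the two databases are not guaranteed to return rows in the same order. The function name and sample rows are hypothetical.

```python
import difflib


def diff_dumps(mysql_rows, vertica_rows):
    """Return unified-diff lines between two table dumps (lists of text rows)."""
    left = sorted(mysql_rows)
    right = sorted(vertica_rows)
    return list(
        difflib.unified_diff(left, right, fromfile="mysql", tofile="vertica", lineterm="")
    )


if __name__ == "__main__":
    # Invented rows standing in for the output of the two command line clients.
    mysql_dump = ["1\talice", "2\tbob", "3\tcarol"]
    vertica_dump = ["3\tcarol", "1\talice", "2\tbobby"]
    for line in diff_dumps(mysql_dump, vertica_dump):
        print(line)
```

An empty result means the two dumps contain the same rows; any line starting with a single - or + points at a row that differs between the two systems.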