- Posted by Intent Media
At Intent Media we collect vast amounts of data on travel and ecommerce, which we use to develop amazing products for the online travel marketplace. In particular, our data team develops models that predict user behaviors such as clicks or transactions. We have built up a lot of experience with the complex world of large-scale machine learning. While we have iterated on our infrastructure and implementations over the years, we have recently begun to productionize some of our models in Apache Spark. Here are a few thoughts on machine learning, based on empirical testing and experience.
1. Make production mirror development
In the past we often had a hard time testing our ML infrastructure at both small and large scale. For certain datasets, tools like SVMLight or scikit-learn are powerful, and it is not difficult to build unit and integration tests around them in a development system. The problem is that when you use a different tool at larger scale (perhaps Mahout or our ADMM Hadoop implementation), you can no longer trust your small tests to tell you whether a large-scale change will break anything. Spark MLlib works really nicely in a development setting and is …