• Posted by Intent Media 02 Jun

Data Engineering Principles & Practices

Intent Media takes its culture and values very seriously.  We want to continue to build a company full of energetic, collaborative people who are driven to make an impact.  As part of our value of openness, we on the data engineering squad decided to publicly share our thoughts around our team-level principles and practices.

What do we do on the data engineering squad?  We’re responsible for building large scale machine learning models using terabytes of eCommerce data and massive clusters of servers.  These models are used to predict user behavior on travel sites and optimize revenue for our partners.  For a deeper dive into some of what we do, check out our other blog posts.

The material below is adapted from an internal document that we drafted over the course of months, discussing it with all of our team members and stakeholders.  It’s divided into our principles, the things that we value, and our practices, the tactics we use to uphold those values in our work.

Principles and Practices

Technology

We collectively own the codebase.

Individuals are empowered to make coordinated changes anywhere in our applications, keeping our other principles in mind.

Practices:

  • We pair-program often to produce more effective implementations, to reduce bugs, and to share knowledge. We especially encourage pairing on user stories that are particularly complex, touch intricate parts of the codebase, or could benefit from knowledge sharing. For every story, we discuss whether to pair.
  • We should all understand the codebase, so we communicate our changes through documentation, tests, and docstrings.

We value automated tests.

They enable refactoring, give us trust in the application’s functionality, and allow us to work faster.

Practices:

  • Every MapReduce job owned by Data Engineering will have an integration test.
  • Automated tests are the default method for proving the functionality of a feature.
  • Tests must be reliable, meaning resilient, maintainable, and extensible. In other words, they are to be treated like any other code.
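
As an illustration of the first practice, here is a minimal sketch of what an integration test for a MapReduce-style job might look like. It is not our production code: the job (counting clicks per site), the record format, and every name in it (map_clicks, reduce_counts, run_job) are hypothetical, and the map, shuffle, and reduce steps run in memory rather than on a cluster. The point is only that the test drives the whole job from raw input records to final output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


# Hypothetical job: count click events per site.
def map_clicks(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (site, 1) for each well-formed 'site<TAB>click' record."""
    parts = record.split("\t")
    if len(parts) == 2 and parts[1] == "click":
        yield parts[0], 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum the counts emitted for one site."""
    return key, sum(values)


def run_job(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle, and reduce in memory, standing in for a cluster."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


def test_click_count_job_end_to_end() -> None:
    """Feed raw records through the whole job and check the final output."""
    records = [
        "siteA\tclick",
        "siteB\tclick",
        "siteA\tclick",
        "siteA\tview",      # not a click, so it should be ignored
        "malformed line",   # bad input should not break the job
    ]
    assert run_job(records, map_clicks, reduce_counts) == {"siteA": 2, "siteB": 1}
```

A test runner such as pytest would pick up the test function by its name. Because the test includes non-click and malformed records alongside valid ones, it checks the job's behavior on messy input as well as the happy path.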

We have collective operational ownership.

We want to take responsibility for deploying and maintaining our applications.  We all have the same freedom to use company infrastructure resources to accomplish our shared mission and thus the obligation to use them responsibly.

Practices:

  • Each team member has access rights to our AWS account to create machines and run jobs.
  • Our “you build it you run it” approach to techops encourages engineers to own their deployment infrastructure.  We set up alerts around the functionality we own, and we respond to them when they fire.

Process

We continuously, rapidly deliver business value.

Practices:

  • User stories are tightly scoped, ensured through close collaboration between developers, product owners, and QA.
  • User stories have clear exit criteria, ensured through input from QA.

We prefer direct conversation over tools when coordinating changes.

We prefer collaboration over asking for forgiveness or permission.

Practices:

  • We sit at a large common desk area to make it easy to communicate in person.
  • We use Slack throughout the day to share ideas online, in real time.

Collaboration

We consider QA concerns throughout the development process.

Practices:

  • Team members with QA expertise participate in the entire lifecycle of a user story, from scoping to development to verification.

We collaboratively deliver value with Data Science.

We continually strive to refine our modeling capability at all stages of model development, from research to prototype to production, including operation and maintenance.

Practices:

  • Since moving technology from research to production is one of the most important tasks we perform, coordination is crucial during that phase.  Concretely, a person with expertise in each of data engineering, data science, QA, and product should be involved in that transition, as early in that process as possible.  At that point, a kickoff-style conversation should usually be conducted.

We collaborate closely with a product owner.

The product owner seeks to understand constantly evolving market needs in order to prioritize what is most worthwhile for the team to build at any given moment in time.

Practices:

  • User stories and epics are kicked off by a conversation between product owners, developers, and QA, also known as the Three Amigos.

Growth

We are all students and teachers.

We all are expected to be humble enough to learn from each other, and we each bring unique backgrounds and ideas that are worth sharing.

Practices:

  • When starting work on an existing system, instead of scrapping the codebase and rewriting, we consider the good design choices that were made.
  • We listen charitably to everyone’s ideas, regardless of level of experience.
  • We offer thoughts on improving code during pair programming and code review.

We support each other in the pursuit of mastery.

Practices:

  • We support and celebrate each individual’s development through activities like blogging, speaking, and open sourcing.
  • We continue to learn and grow as engineers by studying with our coworkers in MOOC study groups, technical book study groups, and other settings.

Wrap Up

Just like our team, this document is always evolving.  So far, it seems like the principles are going to remain fairly constant, while our practices are much more fluid.  These are the principles and practices that work for our team today.  We’d love to hear what you think about them.  If this describes how you would like to work, then check out the Intent Media jobs page for our open roles.


 

John Chapin is a data engineer at Intent Media, where he is building the next generation of predictive analysis tools alongside a world-class team. When he’s not hacking on a functional programming project, he can be found running along the Hudson River or planning his next trip abroad. John has a bachelor’s degree in computer science from James Madison University.

Wesley Harris tests software with the data science team at Intent Media. He codes as @whharris and tweets (rarely) as @whharris. You can find him IRL rolling four deep (+ spouse, dog, baby) through the wilds of Prospect Park in beautiful Brooklyn, New York.

Shehzad Khan is a product manager at Intent Media, working with data engineers & scientists to build machine learning software products. He is a music enthusiast who enjoys DJing in his spare time. Shehzad has a master’s degree in business administration from Columbia University. You can find him on Soundcloud if you are interested in avant-garde electronic music.

Chet Mancini is a data engineer at Intent Media, Inc. He enjoys functional programming, downhill skiing, and cycling around Brooklyn. Chet has a master’s degree in computer science from Cornell University.  You can find him on Twitter and on Github.

Randhier Ramlachan tests software with the data science team at Intent Media.  He has a bachelor’s degree in computer science from New Jersey City University.

Phil Rene is a data engineer at Intent Media. On the weekend, he is usually playing hockey, fishing in some remote location, or enjoying good food from a farm-to-table restaurant. You can find him on Twitter and on Github.

Sebastian Rojas is a data engineer at Intent Media, where he works on solving big data processing and machine learning problems. He is constantly dreaming about dystopian retro-futuristic technology and sometimes composes music about it. You can find him on Github.

Jeff Smith is a data engineer at Intent Media working on large scale machine learning systems.  He used to build warehouses and sequence genomes (not at the same time, though).  Intent Media is the fifth startup he’s worked at, and it’s easily the most fun one.  You can find him tweeting, blogging, and drawing comics all over the internet.


 

We on the data engineering squad also want to thank our counterparts on the data science squad, who have been great partners in writing this post and building our platform.  Thanks to Jon Sondag, Saurav Pandit, Sharath Rao, Sam Ross, Brian Roland, and Eddie Liu.

 
