Posted by Intent Media, 25 Nov

Database Migration Testing

The challenge

For the past month or so I’ve been on a really interesting project converting some of our MySQL reporting tables to Vertica. Initial indications are that Vertica flies and that this will be an awesome investment for the company.

The challenge for me as the QA on the team was how to test this database change. A quick Google search for ‘database migration testing’ gave me some ideas, but no home runs, and nothing really specific to my problem. There are also tools available for this, but my research didn’t turn up anything I was particularly impressed with.

In the end, I decided to use the basic command-line tools provided by MySQL and Vertica to grab data. Then it was a matter of brushing off my shell scripting and command-line skills and putting it all together. Along the way, I was able to pair with a developer who helped improve the initial scripts into something kind of awesome. And I feel really comfortable with the results of the testing.

So here is the journey.

The concept and first script

Why not use the command-line mysql and vsql clients to grab table contents and then do a straight diff of the two outputs? Throw that into a shell script and voilà, you have a test. Sounds simple, right?

I started with something like this:

#!/bin/sh
# vertica_migration_functions.sh provides the connection settings
# (mysql_user, mysql_database, vertica_user, basedir, output_root, ...)
# and the sql_output_file_diff helper.
. ./vertica_migration_functions.sh

rm -f $basedir/output/proof_*

current_table=our_best_table

# Dump the table from mysql, convert the tab-delimited batch output to CSV,
# and strip the leading id column (the ids were not expected to match).
mysql -u $mysql_user -D $mysql_database -e "SELECT * FROM $current_table ORDER BY id" | sed 's/\t/,/g; s/^[^,]*,//' > $output_root/proof_mysql_"$current_table".out

sort $output_root/proof_mysql_"$current_table".out > $output_root/proof_mysql_"$current_table".sorted.out

# Dump the same table from vertica as unaligned, comma-separated rows,
# no footer, printing NULLs as the literal string NULL.
vsql -U $vertica_user -c "SELECT * FROM $current_table ORDER BY id" -P null='NULL' -P footer=OFF -A -F ',' -o $output_root/proof_vertica_"$current_table".original -q

# Strip the trailing id column from the vertica extract.
sed 's/,[^,]*$//' $output_root/proof_vertica_"$current_table".original > $output_root/proof_vertica_"$current_table".out

sort $output_root/proof_vertica_"$current_table".out > $output_root/proof_vertica_"$current_table".sorted.out

sql_output_file_diff $output_root/proof_mysql_"$current_table".sorted.out $output_root/proof_vertica_"$current_table".sorted.out "$current_table"

vertica_migration_functions.sh includes a bunch of variables for the database settings, like mysql_user, etc. It also includes a diff function which wraps standard diff in a SUCCESS/FAILURE message.
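sql_output_file_diff itself isn’t shown here, but a minimal sketch of that wrapper, assuming it only needs to compare the two sorted files and report per table, might look like this:

# a sketch of the sql_output_file_diff helper (details are assumptions;
# the real function lives in vertica_migration_functions.sh)
sql_output_file_diff() {
  # $1 = mysql extract, $2 = vertica extract, $3 = table name
  if diff "$1" "$2" > "$output_root/diff_$3.out"; then
    echo "SUCCESS: $3"
  else
    echo "FAILURE: $3 (see $output_root/diff_$3.out)"
  fi
}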

This script is doing a lot. This is because the mysql and vsql calls return significantly differently formatted results. For example, there is a sed on the mysql output that changes it from tab-delimited to comma-delimited. There is a sed on both files to remove the id column (which was initially loaded so that it would not be the same). Then both are sorted so that the results are comparable.
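To make that concrete, here is what those transforms do to a single made-up row (assuming the id ends up as the first column in the mysql output and the last column in the vertica extract):

# mysql raw:     42<TAB>2012-10-19<TAB>100.00
# after sed:     2012-10-19,100.00
#
# vertica raw:   2012-10-19,100.00,42
# after sed:     2012-10-19,100.00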

By the way, the mysql and vsql tools provide a multitude of ways to format their output, so it took some careful reading to figure out how to turn column names off, turn row count footers off, control how NULL is printed, and output in CSV or tab-delimited form.
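For reference, these are the formatting flags we leaned on (vsql is derived from psql and shares most of its formatting switches):

# mysql: -e runs a single query; batch output is tab-delimited,
# and --skip-column-names suppresses the header row
mysql --skip-column-names -e "SELECT ..."

# vsql: -A unaligned output, -t tuples only, -P footer=OFF drops the row
# count, -P null=NULL prints NULLs literally, -F sets the field separator
vsql -A -t -P footer=OFF -P null=NULL -F ',' -c "SELECT ..."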

The final script

After pitching this idea to the team, we decided it was a good approach and proceeded to use these scripts. As we used them and learned more, they got tweaked, and then tweaked even further as I began to pair with Ori, a developer on the project. Ori really improved them so that they were more parameterized and easier to run.

So here it is:

#!/bin/bash
. ./test_common.sh
. ./columns.sh

table_name=$1
full_extract=${2-false}

# columns.sh defines, per table, the column list, the sort key, and the
# incremental date column; look them up by indirection on the table name.
columns=$(eval "echo \$${table_name}")
order_by=$(eval "echo \$${table_name}_order")
incremental=$(eval "echo \$${table_name}_incremental")

start_date='2012-10-19'
end_date='2012-12-04'
select_clause="SELECT $columns FROM $table_name"
where_clause="WHERE $incremental >= '$start_date' AND $incremental <= '$end_date'"
order_clause="ORDER BY $order_by"

mysql_proof=$output_root/proof_mysql_${table_name}.out
vertica_proof=$output_root/proof_vertica_${table_name}.out

main() {
  echo "current table is $table_name, full extract is $full_extract"
  delete_existing_output_files
  sql=$(sql)
  # the extracts run in the background; wait for both before diffing
  execute_sql_on_vertica "$sql"
  if ! $full_extract; then execute_sql_on_mysql "$sql"; fi
  wait
  diff_extracts
}

diff_extracts() {
  ./diff_table_content.sh $table_name $full_extract
}

delete_existing_output_files() {
  rm -f $basedir/output/proof_*$table_name*
  rm -f $basedir/output/diff_*$table_name*
}

sql() {
  if $full_extract; then
    echo "${select_clause} ORDER BY id;"
  else
    echo "${select_clause} ${where_clause} ${order_clause};"
  fi
}

execute_sql_on_mysql() {
  # mysql batch output is tab-delimited; skip the header row
  $mysql_cmd -e "$1" --skip-column-names > $mysql_proof &
}

execute_sql_on_vertica() {
  # match mysql's format: unaligned (-A), tuples only (-t), no footer,
  # literal NULLs, and a tab as the field separator
  $vsql_cmd -c "$1" -P null=NULL -P footer=OFF -At -F $'\t' -o $vertica_proof -q &
}

main
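columns.sh isn’t shown here, but to make the indirection above concrete, it holds per-table metadata along these lines (the table and column names below are made up):

# columns.sh -- one set of variables per table (illustrative names only)
our_best_table='created_at, clicks, revenue'
our_best_table_order='created_at, clicks'
our_best_table_incremental='created_at'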

This looks even more complicated, but is in fact really just more cool. Let me explain one thing about the project, though. The first step was to migrate each table’s existing content from MySQL to Vertica. The second step was to turn on the actual report that used to write to MySQL and have it write to Vertica instead. This test evolved to cover both of these steps.

Running this with a table name and a full extract of true does a full diff of the table, building the select statement and ordering by the id (which we found during the project we could migrate with no difficulty). Running it with an incremental extract tests only a subset date range of the table, ordered by the table’s key (stored in columns.sh). The diff for the incremental extract (not shown here) also removes any id columns, and any date columns (like updated_at) which are not expected to match. Example invocations are shown below.
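For example, assuming the script above is saved as extract_and_diff.sh (a made-up name):

# full diff of the entire table, ordered by id
./extract_and_diff.sh our_best_table true

# incremental diff of just the configured date range
./extract_and_diff.sh our_best_table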

Diff options

The other thing we found was that the diff options -y -W $width --suppress-common-lines --speed-large-files were the best for our needs. In fact, until we found --speed-large-files, this whole approach was just too slow to be feasible.
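Putting that together, the diff inside the wrapper ends up looking something like this (the width value here is arbitrary):

width=400
diff -y -W $width --suppress-common-lines --speed-large-files \
  "$mysql_proof" "$vertica_proof" > $output_root/diff_${table_name}.out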

Conclusion

In the end, we were able to test 27 tables, the largest being 57 GB, and feel really comfortable about the results. There were a lot more details along the way, but this is a good start to show how we approached the problem.

Michele Martone
Head of Quality Assurance
