How to fix problems when migrating a Rails app from MySQL to PostgreSQL

Recently, at eet.nu we have been working on migrating a Ruby on Rails application from MySQL to PostgreSQL. Depending on the complexity of an app, there might be many caveats during migration. Here are some notes on issues that we solved during the migration.

This article assumes that we work with:

  • Ruby on Rails 4;
  • MySQL 5.6;
  • PostgreSQL 9.4.

Quoting and Boolean fields

MySQL uses backticks (`) for quoting, while in PostgreSQL double quotes () are being used.

MySQL uses TINYINT datatype to represent boolean fields. Therefore, 1 would represent true and 0 would represent false. PostgreSQL has native boolean datatype. Instead of writing where(visible: 1) in MySQL, in PostgreSQL it would be where(visible: true).

Sorting with ‘nil’

If we order by a column, which contains nil (or NULL in SQL) values, then we might get different results in MySQL and PostgreSQL. In MySQL NULL values will be put last if ORDER BY ... DESC is used. In PostgreSQL NULL values are assumed to be larger than any non-null value. Therefore, NULL values will be put first if ORDER BY ... DESC is used. Some possible solutions are:

  1. (PostgreSQL specific) Use NULLS LAST or NULLS FIRST options: order("average_rating NULLS LAST")

  2. Add a minus sign before column name and revert sorting order(e.g., from DESC to ASC): order("-average_rating ASC")

Usage of prepared and unprepared statements.

Since Rails 3.1 prepared statements are enabled by default. Currently, gem mysql2 does not support prepared statements (see this pull request) while gem pg supports them. It means, when using MySQL with rails method to_sql it will return a SQL query without bind values, but if PostgreSQL is being used then to_sql method would return a SQL query that requires bind values (e.g., "user_id" = $1). In order to use unprepared statements in Rails 4 with PostgreSQL they should be wrapped into Model.connection.unprepared_statement { query.to_sql }

Division by zero

In MySQL there is a special mode that changes the behavior of handling division by zero - ERROR_FOR_DIVISION_BY_ZERO. If this mode is disabled, then MySQL will silently insert NULL. PostgreSQL is stricter and it will produce ERROR: division by zero. The function NULLIF(value1, value2) can be used to replace zero values with NULL.

Functions

Some built-in functions have different implementations or not present in PostgreSQL.

UNIX_TIMESTAMP()

In MySQL the function UNIX_TIMESTAMP() returns a Unix timestamp for the current time. The equivalent in PostgreSQL is: extract(epoch from now())

FIELD()

The function FIELD(str, str1, str2,...) in MySQL returns a position of str in str1, str2,… (imagine that it is an array). Unfortunately, there is no such function in PostgreSQL. For arrays that contain only integer values, we can use function idx from intarray module (documentation). See this gist for more suggestions on how to implement similar behavior in PostgreSQL.

Datatypes migration

There might be a need to convert from one datatype to another during migration. For example, a column should have a bigint datatype instead of text. Since in PostgreSQL there is no implicit conversion from text to int, we need to use ALTER TABLE (PostgreSQL documentation) command or using Rails migrations:

change_column :users, :uid, 'bigint USING CAST(uid as bigint)'

Default sort order

There is no default order of records returned from a database if ORDER BY is not specified. From PostgreSQL documentation: > If sorting is not chosen, the rows will be returned in an unspecified order.

Hence, if some tests started to fail because order of records is not the same as expected, then it is a good idea to explicitly specify sorting order.

Rounding

Both database management systems have ROUND() function to round numbers. By default, they might give different results.

mysql> select round(55/10);
+----------------+
| round(55/10)   |
+----------------+
|              6 |
+----------------+

and in PostgreSQL:

psql=# select round(55/10);
round
-------
    5

It seems that in PostgreSQL by default function ROUND() assumes that result of 5510 is of datatype double precision, which is not stored as exact value. In order to get expected result from ROUND() we should pass it at least one numeric value, e.g.:

psql=# select round(55::numeric/10);
 round
-------
     6

Data migration

There are different solutions on to how migrate existing data from MySQL to PostgreSQL, but pgloader worked best for us. It is straightforward to use and it is relatively fast.

Interesting reading

Comparison of different SQL implementations (not very fresh but still interesting).