/ ruby

Winning An ORM

Object Relational Mapping promise to let us manipulate data using objects, standard coding paradigms and good old loops. The problem is they rarely deliver.

When I just started writing Rails I implemented a simple chat web service. It had a nice UI and worked well in my tests, but just a few hours after moving to production it became impossible to work with, all due to using ActiveRecord the wrong way.

ORM Anti Pattern: N+1 Queries

The main reason my chat failed was treating SQL query results like normal objects. Consider the following snippet:

messages = current_chat_room.messages
render json: {
  messages: messages.map do |msg|
      from: msg.sender.name,
      text: msg.text
  end
}

It creates an array of message objects each with a from and text fields. But to get the from field I used msg.sender.name. If msg.sender is also an ActiveRecord with a name field, then rendering message requires an extra DB query. The more messages we have the more queries we need, and this gets slow quicker than you think.

That same bug is harder to notice the fatter your models become. Consider a little convenience method in Message class:

class Message < ActiveRecord::Base
    def sender_name
        sender&.name || 'User'
    end
end

Previously we could conclude from the dot operator that something's wrong in the query, but now we have no way to know if sender_name is really an attribute in Message or its own model.

Fixing (and avoiding) The Issue

Fixing an N+1 problem once you find it is not that hard. Just use a join to collect all the fields you need in one query:

messages = current_chat_room.messages.joins(:sender).select('
  messages.id as id,
  messages.text as text,
  senders.name as sender_name
')

render json: { messages: messages }

Avoiding the problem requires some discipline: Don't use map with active records (or any ORM result object for that matter). Instead use joins and select to create the data in the format you need it on the initial query.

Aggregating Data

On some occasions you may need additional queries to collect all the data required. Consider a blog post that belongs to multiple categories. The tables are posts, categories and categories_posts. This aggregated query is not a good idea if you want to list your posts for the index page:

posts = Post.joins(:categories).select('
    posts.title,
    posts.text,
    categories.name,
    posts.published_at
').order(published_at: :desc)

Because posts belonging to multiple categories will appear multiple times in the query result.

Another bad idea is to iterate the over posts and collect the categories in a loop. This brings us back to the old N+1 problem and won't survive the real world.

The correct solution here would be to let the database do the aggregation by building a better query. Here's how a postgresql query would look like:

posts = Post.joins(:categories).select('
        posts.title,
        posts.text,
        array_agg(categories.id)')
        posts.published_at
    ').
    group('posts.id').
    order(published_at: :desc)

The result relation will have just one item per post and its array_agg field contains an array of all category ids.

TL;DR

ORMs provide us with a friendly interface to manipulate SQL data objects as if they were code. Unfortunately they're not code and using that interface too often results in poor performance.

By limiting what we do with the query objects of the ORM we can keep using it and stay relatively safe.