This gem provides a generic lazy batching mechanism to avoid N+1 DB queries, HTTP queries, etc.
- Highlights
- Usage
- Installation
- Implementation details
- Development
- Contributing
- Alternatives
- License
- Code of Conduct
- Generic utility to avoid N+1 DB queries, HTTP requests, etc.
- Adapted Ruby implementation of battle-tested tools like Haskell Haxl, JS DataLoader, etc.
- Batching is isolated and lazy, load data in batch where and when it's needed.
- Automatically caches previous queries (identity map).
- Thread-safe (loader).
- No need to share batching through variables or custom defined classes.
- No dependencies, no monkey-patches, no extra primitives such as Promises.
Let's have a look at the code with N+1 queries:
def load_posts(ids)
  Post.where(id: ids)
end
posts = load_posts([1, 2, 3])  #  Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #  _ ↓ _
                               #  ↙ ↓ ↘
users = posts.map do |post|    #  U ↓ ↓      SELECT * FROM users WHERE id = 1
  post.user                    #  ↓ U ↓      SELECT * FROM users WHERE id = 2
end                            #  ↓ ↓ U      SELECT * FROM users WHERE id = 3
                               #  ↘ ↓ ↙
                               #  ¯ ↓ ¯
puts users                     #  Users
The naive approach would be to preload dependent objects on the top level:
# With ORM in basic cases
def load_posts(ids)
  Post.where(id: ids).includes(:user)
end
# But without ORM or in more complicated cases you will have to do something like:
def load_posts(ids)
  # load posts
  posts = Post.where(id: ids)
  user_ids = posts.map(&:user_id)
  # load users
  users = User.where(id: user_ids)
  user_by_id = users.each_with_object({}) { |user, memo| memo[user.id] = user }
  # map user to post
  posts.each { |post| post.user = user_by_id[post.user_id] }
end
posts = load_posts([1, 2, 3])  #  Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #  _ ↓ _      SELECT * FROM users WHERE id IN (1, 2, 3)
                               #  ↙ ↓ ↘
users = posts.map do |post|    #  U ↓ ↓
  post.user                    #  ↓ U ↓
end                            #  ↓ ↓ U
                               #  ↘ ↓ ↙
                               #  ¯ ↓ ¯
puts users                     #  Users
But the problem here is that load_posts now depends on the child association and knows that it has to preload the data for future use. And it'll do it every time, even if it's not necessary. Can we do better? Sure!
With BatchLoader we can rewrite the code above:
def load_posts(ids)
  Post.where(id: ids)
end
def load_user(post)
  BatchLoader.for(post.user_id).batch do |user_ids, loader|
    User.where(id: user_ids).each { |user| loader.call(user.id, user) }
  end
end
posts = load_posts([1, 2, 3])  #  Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #  _ ↓ _
                               #  ↙ ↓ ↘
users = posts.map do |post|    #  BL ↓ ↓
  load_user(post)              #  ↓ BL ↓
end                            #  ↓ ↓ BL
                               #  ↘ ↓ ↙
                               #  ¯ ↓ ¯
puts users                     #  Users      SELECT * FROM users WHERE id IN (1, 2, 3)
As we can see, batching is isolated and described right in the place where it's needed.
In general, BatchLoader returns a lazy object. Each lazy object knows which data it needs to load and how to batch the query. As soon as you need to use the lazy objects, they will be automatically loaded once without N+1 queries.
So, when we call BatchLoader.for we pass an item (user_id) which should be collected and used for batching later. For the batch method, we pass a block which will use all the collected items (user_ids):
BatchLoader.for(post.user_id).batch do |user_ids, loader| ... end
Inside the block we execute a batch query for our items (User.where). After that, all we have to do is to call loader by passing an item which was used in BatchLoader.for method (user_id) and the loaded object itself (user):
BatchLoader.for(post.user_id).batch do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end
When we call any method on the lazy object, it'll be automatically loaded through batching for all instantiated BatchLoaders:
puts users # => SELECT * FROM users WHERE id IN (1, 2, 3)
For more information, see the Implementation details section.
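The collect-then-load behaviour described above can be sketched in plain Ruby. This is only an illustration and not the gem's actual code: the TinyLazy class, its class-level pending and loaded hashes, the in-memory USERS table and the QUERIES log are all hypothetical, and grouping blocks by their source location is an assumption of this sketch.

```ruby
# A simplified, hypothetical stand-in for a lazy batching object (not the gem's code).
class TinyLazy
  @pending = Hash.new { |hash, key| hash[key] = [] } # batch key => collected items
  @loaded  = Hash.new { |hash, key| hash[key] = {} } # batch key => { item => value }

  class << self
    attr_reader :pending, :loaded

    def for(item)
      new(item)
    end
  end

  def initialize(item)
    @item = item
  end

  # Register the item; the block is stored but not executed yet.
  def batch(&block)
    @block = block
    @key = block.source_location # assumption: blocks written at the same place share a batch
    TinyLazy.pending[@key] << @item
    self
  end

  # Calling a method that TinyLazy itself lacks triggers loading, then forwards the call.
  def method_missing(name, *args, &blk)
    value = __sync
    return super unless value.respond_to?(name)
    value.public_send(name, *args, &blk)
  end

  def respond_to_missing?(name, include_private = false)
    __sync.respond_to?(name, include_private) || super
  end

  private

  # Run the batch block once for all collected items, then read from the cache.
  def __sync
    cache = TinyLazy.loaded[@key]
    unless cache.key?(@item)
      items = TinyLazy.pending.delete(@key) || [@item]
      loader = ->(item, value) { cache[item] = value }
      @block.call(items.uniq, loader)
    end
    cache[@item]
  end
end

User = Struct.new(:id, :name)
USERS = { 1 => User.new(1, "Ann"), 2 => User.new(2, "Bob"), 3 => User.new(3, "Cy") }
QUERIES = [] # each entry stands in for one SELECT ... WHERE id IN (...)

def load_user(user_id)
  TinyLazy.for(user_id).batch do |user_ids, loader|
    QUERIES << user_ids
    USERS.values_at(*user_ids).compact.each { |user| loader.call(user.id, user) }
  end
end

users = [1, 2, 3].map { |id| load_user(id) } # nothing is loaded yet
names = users.map(&:name)                     # first use runs a single batch
```

In this sketch only methods that TinyLazy does not define itself are forwarded, so something like puts on the lazy object would not print the loaded value; a fuller implementation would need to proxy those calls as well.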
Now imagine we have a regular Rails app with N+1 HTTP requests:
# app/models/post.rb
class Post < ApplicationRecord
  def rating
    HttpClient.request(:get, "https://example.com/ratings/#{id}")
  end
end
# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating} } # N+1 HTTP requests for each post.rating
    render json: serialized_posts
  end
end
As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called parallel:
class Post < ApplicationRecord
  def rating_lazy
    BatchLoader.for(self).batch do |posts, loader|
      Parallel.each(posts, in_threads: 10) { |post| loader.call(post, post.rating) }
    end
  end
  # ...
end
loader is thread-safe. So, if HttpClient is also thread-safe, then with the parallel gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for concurrent HTTP requests in Ruby). Thanks to Matz, MRI releases the GIL when a thread hits blocking I/O, which is an HTTP request in our case.
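To illustrate why threads help with blocking calls, here is a minimal sketch that uses only the standard library instead of the parallel gem. The fetch_rating method is a hypothetical stand-in for the HTTP request (it just sleeps), and the Mutex plays the role that a thread-safe loader would play when several threads report results at once.

```ruby
# Hypothetical stand-in for a blocking HTTP call; in MRI, sleep releases the GIL
# in the same way blocking I/O does.
def fetch_rating(post_id)
  sleep 0.05
  post_id * 10
end

def fetch_ratings_concurrently(post_ids)
  ratings = {}
  lock = Mutex.new # guards the shared hash while threads write their results

  threads = post_ids.map do |post_id|
    Thread.new do
      rating = fetch_rating(post_id)              # the waiting overlaps across threads
      lock.synchronize { ratings[post_id] = rating }
    end
  end

  threads.each(&:join) # wait until every request has finished
  ratings
end

ratings = fetch_ratings_concurrently([1, 2, 3, 4])
```

Because each thread spends its time waiting rather than computing, the total wall time in this sketch is close to a single call instead of the sum of all calls.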
In the controller, all we have to do is to replace post.rating with the lazy post.rating_lazy:
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }
    render json: serialized_posts
  end
end
BatchLoader caches the loaded values. To ensure that the cache is purged between requests in the app, add the following middleware to your config/application.rb:
config.middleware.use BatchLoader::Middleware
See the Caching section for more information.
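The idea of purging a cache between requests can be sketched as a Rack-style middleware. This is not the gem's middleware: TinyCache and ClearCacheMiddleware are hypothetical names, the per-thread hash is an assumption about where such state could live, and the example avoids requiring Rack by calling the middleware directly.

```ruby
# Hypothetical per-thread cache standing in for a batching library's internal state.
module TinyCache
  def self.current
    Thread.current[:tiny_cache] ||= {}
  end

  def self.clear
    Thread.current[:tiny_cache] = nil
  end
end

# Rack-style middleware: an object that wraps another app and responds to call(env).
class ClearCacheMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    TinyCache.clear # runs after every request, even if the app raises
  end
end

# A tiny app that stores something in the cache while handling the request.
app = lambda do |env|
  TinyCache.current[:user] = "cached for #{env['PATH_INFO']}"
  [200, { "content-type" => "text/plain" }, [TinyCache.current[:user]]]
end

stack = ClearCacheMiddleware.new(app)
status, _headers, body = stack.call("PATH_INFO" => "/posts")
```

Clearing in an ensure clause means a failed request cannot leak cached values into the next one handled by the same thread.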
Batching is particularly useful with GraphQL. Using such techniques as preloading data in advance to avoid N+1 queries can be very complicated, since a user can ask for any available fields in a query.
Let's take a look at the simple graphql-ruby schema example:
Schema = GraphQL::Schema.define do
  query QueryType
end
QueryType = GraphQL::ObjectType.define do
  name "Query"
  field :posts, !types[PostType], resolve: ->(obj, args, ctx) { Post.all }
end
PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) { post.user } # N+1 queries
end
UserType = GraphQL::ObjectType.define do
  name "User"
  field :name, !types.String
end
If we want to execute a simple query like the following, we will get N+1 queries for each post.user:
query = "
{
  posts {
    user {
      name
    }
  }
}
"
Schema.execute(query)
To avoid this problem, all we have to do is to change the resolver to return BatchLoader:
PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) do
    BatchLoader.for(post.user_id).batch do |user_ids, loader|
      User.where(id: user_ids).each { |user| loader.call(user.id, user) }
    end
  end
end
And set up GraphQL to use the built-in lazy_resolve method:
Schema = GraphQL::Schema.define do
  query QueryType
  use BatchLoader::GraphQL
end
That's it.
By default BatchLoader caches the loaded values. You can test it by running something like:
def user_lazy(id)
  BatchLoader.for(id).batch do |ids, loader|
    User.where(id: ids).each { |user| loader.call(user.id, user) }
  end
end
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
# => <#User:...>
puts user_lazy(1) # no request
# => <#User:...>
Usually, it's enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:
use BatchLoader::Middleware
To drop the cache manually you can run:
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
puts user_lazy(1) # no request
BatchLoader::Executor.clear_current
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
In some rare cases it's useful to disable caching for BatchLoader. For example, in tests or after data mutations:
def user_lazy(id)
  BatchLoader.for(id).batch(cache: false) do |ids, loader|
    # ...
  end
end
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
Add this line to your application's Gemfile:
gem 'batch-loader'
And then execute:
$ bundle
Or install it yourself as:
$ gem install batch-loader
See the slides [37-42].
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
There are some other Ruby implementations for batching such as:
However, batch-loader has some differences:
- It is implemented for general usage and can be used not only with GraphQL. In fact, we use it for RESTful APIs and GraphQL on production at the same time.
- It doesn't try to mimic implementations in other programming languages which have an asynchronous nature. So, it doesn't load extra dependencies to bring in primitives such as Promises, which are not very popular in the Ruby community. Instead, it uses the idea of lazy objects, which are included in the Ruby standard library. These lazy objects return the necessary data at the end, when it's actually needed.
- It doesn't force you to share batching through variables or custom defined classes; just pass a block to the batch method.
- It doesn't require returning an array of the loaded objects in the same order as the passed items. I find it difficult to satisfy these constraints: to sort the loaded objects and add nil values for the missing ones. Instead, it provides the loader lambda which simply maps an item to the loaded object.
- It doesn't depend on any other external dependencies. For example, there is no need to load huge external libraries for thread-safety; the gem is thread-safe out of the box.
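The difference between the two contracts mentioned in the list above (an ordered array versus a loader that maps an item to its object) can be sketched in plain Ruby. Everything here is hypothetical and for illustration only: the Record struct, the query method that returns rows in arbitrary order and skips missing ids, and the load_ordered and load_keyed helpers.

```ruby
Record = Struct.new(:id, :name)

# Hypothetical data source: returns rows in arbitrary order and omits missing ids.
def query(ids)
  rows = { 1 => Record.new(1, "Ann"), 3 => Record.new(3, "Cy") }
  ids.reverse.filter_map { |id| rows[id] }
end

# Positional contract: the result must line up with ids, with nil filling the gaps,
# so the caller has to re-index and re-order what the data source returned.
def load_ordered(ids)
  by_id = query(ids).to_h { |record| [record.id, record] }
  ids.map { |id| by_id[id] }
end

# Keyed contract: each loaded record is reported under its own item,
# so ordering and missing records need no special handling.
def load_keyed(ids)
  results = {}
  loader = ->(id, record) { results[id] = record }
  query(ids).each { |record| loader.call(record.id, record) }
  results
end

ordered = load_ordered([1, 2, 3])
keyed = load_keyed([1, 2, 3])
```

With the keyed variant, an item that was never reported simply has no entry, which reads as nil on lookup without any extra bookkeeping.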
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the Batch::Loader project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.