Ruby DataMapper Status

In my recent post I gave you a brief overview of what I think about the state of Ruby ORMs. Since I’m involved in the development of the DataMapper project, I want to write a bit more to give you a good overview of the project’s current state and of what the upcoming version 2.0 is going to look like.

DataMapper 1.x

Let’s get this straight once and for all: DataMapper was never a pure implementation of the Data Mapper pattern. It has elements of both the Data Mapper and Active Record patterns. It has a mapping layer where you can configure mappings between model properties and database columns, but this doesn’t change the fact that your models have direct access to the persistence layer and that a lot of persistence-related functionality is mixed into them. This is what makes DataMapper an ‘ActiveRecord-ish’ library. After a few years of development it has become clear that this approach, despite its advantages over ActiveRecord, is still not good enough.

Probably one of the biggest wins of DataMapper is its support for many different kinds of data stores. Apart from the most common RDBMSs, there are adapters for various key-value stores, NoSQL databases and more. The flip side is that working with multiple data stores at once is not very stable at the moment, and that limitations in DataMapper’s API keep you from easily using the full power of some databases, like MongoDB.

The good news is that while working on DataMapper and many of its adapters we’ve learned our lessons. With all that knowledge and experience, we’ve started work on DataMapper 2.0.

DataMapper 2.0

First of all, the next major version of DataMapper will implement the Data Mapper pattern as described in PoEAA (Martin Fowler’s Patterns of Enterprise Application Architecture):

A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.

DataMapper 2 won’t be a big monolithic library; it will consist of multiple independent pieces that are glued together. Each piece of DM2 is a standalone library, so you can use it separately from DM itself.

Let me describe all the related projects.

Veritas – the new relational algebra engine

The core of DM2 is Veritas, the new relational algebra engine developed by Dan Kubb. Why not ARel? The answer is simple: ARel is designed to generate SQL, whereas DataMapper needs something more abstract, with adapters that can handle different kinds of query dialects. With that premise in mind, Veritas is a full-featured relational algebra implementation, and there’s already an SQL generator which can build pretty complex queries. More generators will be written soon.
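To make the idea of “abstract relational algebra plus pluggable generators” more concrete, here’s a minimal sketch in plain Ruby. It is not the Veritas API: the `RA` module, its `Base`, `Restriction` and `Projection` nodes, and the `SqlGenerator` visitor are hypothetical names for this example only. The query is built as a tree of operations, and turning it into SQL is a separate step, so another generator could walk the same tree and emit a different dialect.

```ruby
# Hypothetical relational algebra nodes; NOT the Veritas API.
module RA
  # A named base relation (e.g. a table) with its column names.
  Base = Struct.new(:name, :columns)

  # Restriction (selection): keep tuples where column == value.
  Restriction = Struct.new(:operand, :column, :value)

  # Projection: keep only the listed columns.
  Projection = Struct.new(:operand, :columns)
end

# One possible generator: walks the tree and emits SQL.
# A different generator could walk the same tree and emit another dialect.
class SqlGenerator
  def visit(node)
    case node
    when RA::Base
      "SELECT #{node.columns.join(', ')} FROM #{node.name}"
    when RA::Restriction
      "SELECT * FROM (#{visit(node.operand)}) AS t WHERE #{node.column} = #{quote(node.value)}"
    when RA::Projection
      "SELECT #{node.columns.join(', ')} FROM (#{visit(node.operand)}) AS t"
    else
      raise ArgumentError, "unknown node: #{node.inspect}"
    end
  end

  private

  # Naive literal quoting, for illustration only; real code should use bind parameters.
  def quote(value)
    value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : value.to_s
  end
end

users = RA::Base.new('users', %w[id name age])
query = RA::Projection.new(RA::Restriction.new(users, 'age', 30), %w[name])
puts SqlGenerator.new.visit(query)
```

Every node simply wraps its operand in a subquery, which keeps the generator trivial; a real generator would flatten the nesting into a single SELECT.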

Here’s a quick sneak-preview of how you can build relations with Veritas:

What’s great about Veritas is that its level of abstraction lets you use it for various things, not just to generate queries. For example, there’s an idea of using Veritas to build a new “migrations” library for DM2 that would easily handle even very complex scenarios.

Virtus – the new model definition and introspection layer

I’ve already introduced Virtus; it’s an extraction of DataMapper’s Property API. During the last few months we’ve made many significant improvements, and more features will be added soon. Virtus already supports defining attributes on your models and can handle many kinds of coercions. You can also implement your own coercion methods if you need them.
In the near future Virtus will support more complex functionality like EmbeddedValue and ValueObject. It’s also possible that Virtus will let you define relationships between POROs.
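As a rough illustration of what defining coercible attributes on a PORO means, here’s a tiny self-contained sketch. It is deliberately not Virtus’s API: the `Attributes` module, its `attribute` macro and the `COERCERS` table are hypothetical names used only for this example. Note that nothing in it touches persistence.

```ruby
# Hypothetical attribute DSL with coercions; NOT the Virtus API.
module Attributes
  # Maps a target type to a lambda that coerces a raw input value.
  COERCERS = {
    Integer  => ->(v) { Integer(v) },
    Float    => ->(v) { Float(v) },
    String   => ->(v) { v.to_s },
    :boolean => ->(v) { ['1', 'true', true].include?(v) }
  }.freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declares a reader plus a coercing writer; no persistence concerns at all.
    def attribute(name, type)
      coercer = COERCERS.fetch(type)
      define_method(name) { instance_variable_get("@#{name}") }
      define_method("#{name}=") { |v| instance_variable_set("@#{name}", coercer.call(v)) }
    end
  end

  # Mass-assigns from a hash, e.g. params coming from a form.
  def initialize(attrs = {})
    attrs.each { |k, v| public_send("#{k}=", v) }
  end
end

class Book
  include Attributes
  attribute :title,     String
  attribute :pages,     Integer
  attribute :published, :boolean
end

book = Book.new(title: :Refactoring, pages: '431', published: 'true')
book.pages     # => 431 (an Integer, coerced from the String '431')
book.published # => true
```

Keeping the coercion table separate from the class makes it easy to plug in your own coercion lambdas, which mirrors the “implement your own coercion methods” point above.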

The important thing about Virtus is that it’s designed to work with POROs and has nothing to do with persistence concerns. For example, some people have asked me about things like dirty attribute tracking; that is not going to be included in Virtus.

I should mention that DataMapper 2.0 will be designed to work with POROs that are not extended by Virtus. Virtus will be optional, but since it provides very common functionality I suspect most people will want to use it.

Aequitas – the new validation library

Developed by Emmanuel Gomez, Aequitas is designed to work with any Ruby object, and it also supports Virtus. The idea is similar to what we have in DataMapper 1.x, where validations can be derived from property declarations. Aequitas is based on the current dm-validations and its API is very similar, if not identical. Aequitas supports custom error messages, I18n, validation contexts and built-in data types which come with their own sets of validation rules.
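To show what validating “any Ruby object” can look like without mixing anything into it, here’s a minimal sketch in which rules are plain objects and validation returns a list of errors instead of mutating the validated object. `PresenceRule`, `LengthRule` and `Validator` are hypothetical names; this is not the Aequitas API.

```ruby
# Hypothetical rule-based validator; NOT the Aequitas API.
# Rules are plain objects; validating returns errors instead of mutating the target.
PresenceRule = Struct.new(:attribute) do
  def call(object)
    value = object.public_send(attribute)
    blank = value.nil? || (value.respond_to?(:empty?) && value.empty?)
    blank ? "#{attribute} must not be blank" : nil
  end
end

LengthRule = Struct.new(:attribute, :max) do
  def call(object)
    value = object.public_send(attribute).to_s
    value.length > max ? "#{attribute} must be at most #{max} characters" : nil
  end
end

class Validator
  def initialize(rules)
    @rules = rules.freeze
  end

  # A pure query: the same object always yields the same errors, and it is never modified.
  def errors_for(object)
    @rules.map { |rule| rule.call(object) }.compact
  end

  def valid?(object)
    errors_for(object).empty?
  end
end

# Works with any object that responds to the attribute readers.
Post = Struct.new(:title, :body)

validator = Validator.new([PresenceRule.new(:title), LengthRule.new(:title, 10)])
validator.errors_for(Post.new('', 'text'))          # => ["title must not be blank"]
validator.errors_for(Post.new('A long title!', '')) # => ["title must be at most 10 characters"]
```

Because the validator only reads from the object, the same rules can be tested in isolation and reused for any class exposing the same readers.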

Session, Dirty Tracking and Unit of Work

In the Data Mapper world every change made to the models loaded into memory is tracked by a specialized object which works like a session. When you’re done making changes you can either commit them or roll them back. The important thing is that all the complex logic behind tracking what was changed, what was deleted and what was added is handled by the session object. Your models are dumb in this regard: they don’t care about dirty tracking, because it’s not their responsibility. This is one of the major differences between AR and DM.
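Here’s a minimal sketch of dirty tracking that lives in a session rather than in the model, assuming the models are POROs whose state is held in instance variables. `Session`, `track`, `dirty`, `commit` and `rollback` are hypothetical names, not DM2’s planned API, and `commit` here only returns the detected changes instead of writing them anywhere.

```ruby
# Hypothetical session that does dirty tracking on behalf of dumb POROs.
class Session
  def initialize
    @snapshots = {} # object_id => [object, frozen hash of its state when tracked]
  end

  # Registers a loaded object and remembers its current state.
  def track(object)
    @snapshots[object.object_id] = [object, state_of(object)]
    object
  end

  # Returns [object, changed_ivars] pairs for objects that differ from their snapshot.
  def dirty
    @snapshots.values.filter_map do |object, before|
      changes = state_of(object).select { |ivar, value| before[ivar] != value }
      [object, changes] unless changes.empty?
    end
  end

  # Hands the changes over (here: just returns them) and takes fresh snapshots.
  def commit
    changes = dirty
    changes.each { |object, _| track(object) }
    changes
  end

  # Discards in-memory edits by restoring each object's snapshot.
  def rollback
    @snapshots.each_value do |object, before|
      before.each { |ivar, value| object.instance_variable_set(ivar, value) }
    end
  end

  private

  # Shallow snapshot: in-place mutation of a value (e.g. String#<<) is not detected.
  def state_of(object)
    object.instance_variables.to_h { |ivar| [ivar, object.instance_variable_get(ivar)] }.freeze
  end
end

class Person # plain Ruby object: no persistence or dirty-tracking code
  attr_accessor :name, :email

  def initialize(name, email)
    @name = name
    @email = email
  end
end

session = Session.new
person = session.track(Person.new('alice', 'a@example.com'))
person.email = 'alice@example.com'
session.dirty.size # => 1 (only @email changed)
session.rollback   # person.email is 'a@example.com' again
```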

The session object will be an implementation of the Unit of Work pattern described in PoEAA as:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

More technically speaking, a session will hold a DAG (directed acyclic graph) of commands sorted by their dependencies, so that when you commit the session it knows in what order those commands should be executed.
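Ordering such a DAG is a topological sort. The sketch below uses `TSort` from Ruby’s standard library; the `Command` struct, the `CommandGraph` class and the idea of naming commands with symbols are assumptions for this example, not DM2’s design.

```ruby
require 'tsort'

# A pending write plus the names of the commands that must run before it.
Command = Struct.new(:name, :sql, :depends_on)

# Hypothetical unit-of-work command graph, ordered with the stdlib TSort module.
class CommandGraph
  include TSort

  def initialize
    @commands = {}
  end

  def add(name, sql, depends_on: [])
    @commands[name] = Command.new(name, sql, depends_on)
    self
  end

  # Commands in an order where every dependency comes first.
  # Raises TSort::Cyclic if the graph turns out not to be acyclic,
  # and KeyError if a dependency was never added.
  def ordered
    tsort.map { |name| @commands.fetch(name) }
  end

  # TSort hook: every node in the graph.
  def tsort_each_node(&block)
    @commands.each_key(&block)
  end

  # TSort hook: the nodes a given node depends on (emitted before it).
  def tsort_each_child(name, &block)
    @commands.fetch(name).depends_on.each(&block)
  end
end

graph = CommandGraph.new
graph.add(:insert_comment, "INSERT INTO comments (post_id) VALUES (1)", depends_on: [:insert_post])
graph.add(:insert_post, "INSERT INTO posts (user_id) VALUES (1)", depends_on: [:insert_user])
graph.add(:insert_user, "INSERT INTO users (name) VALUES ('alice')")
graph.ordered.map(&:name) # => [:insert_user, :insert_post, :insert_comment]
```

The commands can be registered in any order; the sort guarantees, for example, that a row is inserted before the rows that reference it.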

When you think about it, you’ll realize that this pattern is quite common. That’s why it would probably make sense to come up with a general UoW library and then create an extended version for DataMapper’s needs.

Here’s an example of what such a session may look like:

The Mapper

Most likely dm-core, the current core DataMapper library, will be completely rewritten and become the mapper layer, with a thin query API that delegates most of the heavy work to a Veritas adapter. For your convenience there will be a Veritas mapper class that you can inherit from, but you will still be able to write custom mapper classes. The idea is really simple here: a mapper class defines mappings between POROs and the database schema. In most cases this means a direct 1:1 mapping, but the huge advantage is that once you need something custom, you will be able to define it.
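Here’s a minimal sketch of that idea, assuming rows come back as plain hashes keyed by column name. `Mapper`, `model`, `map`, `load` and `dump` are hypothetical names rather than DM2’s API, and the sketch leaves out querying entirely (in DM2 that part is expected to be delegated to a Veritas adapter). Every attribute maps 1:1 to a column of the same name unless `map` overrides it.

```ruby
# A plain domain object: no persistence code at all.
class User
  attr_reader :id, :name, :email

  def initialize(id:, name:, email:)
    @id = id
    @name = name
    @email = email
  end
end

# Hypothetical mapper base class; NOT the DM2 API.
class Mapper
  class << self
    attr_reader :model_class, :attribute_names

    # Declares which class this mapper builds and which attributes it maps.
    def model(klass, *attribute_names)
      @model_class = klass
      @attribute_names = attribute_names
    end

    # Overrides the default 1:1 attribute-to-column mapping.
    def map(attribute, to:)
      overrides[attribute] = to
    end

    def overrides
      @overrides ||= {}
    end

    def column_for(attribute)
      overrides.fetch(attribute, attribute.to_s)
    end
  end

  # Row hash (column => value) -> domain object.
  def load(row)
    klass = self.class
    attrs = klass.attribute_names.to_h { |a| [a, row.fetch(klass.column_for(a))] }
    klass.model_class.new(**attrs)
  end

  # Domain object -> row hash (column => value), e.g. for an INSERT.
  def dump(object)
    klass = self.class
    klass.attribute_names.to_h { |a| [klass.column_for(a), object.public_send(a)] }
  end
end

class UserMapper < Mapper
  model User, :id, :name, :email
  map :email, to: 'email_address' # the one column that doesn't match 1:1
end

mapper = UserMapper.new
user = mapper.load('id' => 1, 'name' => 'Jane', 'email_address' => 'jane@example.com')
user.email       # => "jane@example.com"
mapper.dump(user) # column-keyed hash again, with 'email_address' for the email
```

Since `User` knows nothing about columns, it can be unit tested without a database, and only the mapper changes when the schema does.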

This is really the key aspect of using a data mapper library: you define your domain objects so that they correspond to your real-world domain as closely as possible. It also makes practices like the “Fast Rails Tests” approach suggested by Corey Haines come naturally, because you’ll be implementing the business logic in POROs and unit testing them in isolation, without database access.

You will be able to define a mapper class more or less like this:

Roadmap? ETA?

There’s a work-in-progress roadmap for DM2 on the dm-core wiki: https://github.com/datamapper/dm-core/wiki/Roadmap. As for an ETA, it’s really hard to say. We’re taking our time to build all these libraries, with a big focus on code quality and proper test and documentation coverage. Once the roadmap is finalized we will be able to come up with an ETA.

Here’s the full list of related projects on GitHub:

…and more will come soon, so stay tuned!

Make sure to follow @datamapper on twitter too :)

If you’re eager to learn more, you can always join the #datamapper IRC channel. I also understand that this post doesn’t answer many possible questions, so feel free to ask them in the comments.

  • http://ku1ik.com/ Marcin Kulik

    Great insight into the future. And the future seems bright!

    • johan_lunds

      I agree.

  • http://twitter.com/JingwenOwenOu Owen Ou

    Nice! I have been following DM 2.0 for a while and am very glad to see the official announcement. If anyone still has questions about the Data Mapper pattern, I wrote a blog post about it a while back (http://owenou.com/2011/09/24/poeaa-on-rails.html)

    • http://solnic.eu/ solnic

      Thanks for the link!

  • http://twitter.com/rmontgomery429 Ryan Montgomery

    It’s great to see this coming to the ruby community. This reminds me of the kind of thing I used to do in .NET. We would map our domain model to NHibernate using AutoMapper and LinqToSql for queries. Everything was handled by a unit of work and some repositories. Linq could query our domain objects which allowed for isolated tests without any database backend.

    One thing I don’t see above is the idea of AutoMapping. What do you think about making things like the User::Mapper include only the exceptions to the rules, and maybe having some conventions? That way I wouldn’t have to define a field called :id which is an IdentityField and then repeat myself by telling you to map that field explicitly in a mapper.

    • http://solnic.eu/ solnic

      I completely forgot to mention that we will definitely provide an auto-mapping feature! Thanks for pointing this out – I will update the post.

  • http://twitter.com/edzhelyov Evgeni Dzhelyov

    From what I read, you will implement every part even though there are already gems that implement some of the functionality. Can you explain why you’ve chosen to implement everything?

    There are a lot of validation gems, including the functionality present in ActiveModel.
    Have you considered re-using already existing code ?

    I’m asking because I have this feeling that everyone is implementing things from scratch in Ruby.

    • Emmanuel Gomez

      Aequitas is a deep refactoring of dm-validations, so it’s not a completely from-scratch effort. 

      One of my main motivations for not using ActiveModel validations is that the ActiveModel::Validator API is based on side-effects. It makes things more difficult to test and makes assumptions about what kinds of objects can be validated. In contrast, Aequitas’ API is based on referential transparency and command-query separation. In other words, I don’t like that ActiveModel::Validation is based on adding errors to the errors collection of its arguments (eg., def validate(record) record.errors[:base] = 'watch the side-effects pile up!'; end; see: http://api.rubyonrails.org/classes/ActiveModel/Validator.html). 

    • http://github.com/dkubb Dan Kubb

      That’s a really good question, one that I wish more people were asking. Believe it or not, you’re the first person who’s mentioned this to me about DM2 as a whole even though we’ve been planning it for a while now. People have mentioned it about one piece I’ve been working on called Veritas, but I’ll get to that a bit later.

      Many of the libraries in the Ruby community are based around the idea that an object handles validation, persistence, business logic, and other responsibilities at the same time. Most ORM/ODMs in Ruby follow that pattern (including DataMapper 1), and the libraries make similar assumptions. In a proper Data Mapper the pieces should be decoupled and usable by themselves. You should be able to test your domain objects separate from the mapper and persistence layers for example.

      It’s actually kind of funny that you mention other libraries like ActiveModel and validation gems. DM1 provided separate gems for those things several years before most of the overlapping gems and gem extractions were released, as far back as 2007/2008. If you go through and look at their introduction blog posts, you’ll usually see one or two comments asking “Why didn’t you use DataMapper for this?” or “There’s a DataMapper gem that does this”.

      We’ve experimented with many different approaches for quite a while, and we know the strengths and weaknesses. What we’re planning is basically just the next step of our evolution.

      Another part that deserves attention is the part I’ve been working on called Veritas. It’s a relational algebra library which allows you to describe queries for the datastore. Some people have asked me why I didn’t just use ARel. I think it’s a powerful library, and it inspired much of my work, but for pragmatic reasons it has evolved into primarily an SQL generator rather than a relational algebra library. Internally it uses an AST that directly models an SQL SELECT statement. There’s nothing wrong with that, and it has some nice advantages, especially when it comes to generating SQL. It works great with ActiveRecord because it’s an abstraction layer tightly coupled (by design) to relational databases.

      With Veritas I wanted something that would model each relational algebra (RA) operation in an AST, and then allow me to translate those to SQL or other query languages. I think RA is absolutely beautiful, and I wanted to use that as a foundation for the queries rather than something less consistent. I did not want something tightly coupled to one datastore, but something that can be used to describe complex queries in a more abstract way.

      Veritas has an interesting design, which I don’t think has been executed in quite the same way before. In Veritas each node in the AST may be wrapped in a datastore-specific adapter object. The adapter can use the RA operation to generate an SQL query (for example) and execute it. Each node also knows how to process results in memory (so, for example, a Join knows how to join the left and right operands in memory). The beautiful thing is that we can ask the tree to evaluate itself: the native queries are executed, and as the data propagates to the “root”, we can process anything remaining in memory. Essentially we can do things like cross-datastore joins with ease.
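The in-memory fallback described above can be sketched as a natural join over two arrays of hashes, which could just as well have been fetched from two different datastores. `natural_join` is a hypothetical helper written for this example, not Veritas’s Join, which works on its own relation and header types. It assumes every tuple within one relation has the same keys.

```ruby
# Hypothetical in-memory natural join; tuples are hashes, and all tuples
# within one relation are assumed to share the same keys (the "header").
def natural_join(left, right)
  return [] if left.empty? || right.empty?

  shared = left.first.keys & right.first.keys
  # Hash join: index the right side by the values of the shared attributes.
  index = right.group_by { |tuple| tuple.values_at(*shared) }
  left.flat_map do |l|
    index.fetch(l.values_at(*shared), []).map { |r| l.merge(r) }
  end
end

users  = [{ user_id: 1, name: 'alice' }, { user_id: 2, name: 'bob' }] # e.g. from PostgreSQL
visits = [{ user_id: 1, page: '/dm2' }, { user_id: 1, page: '/veritas' }] # e.g. from a key-value store

natural_join(users, visits)
# => [{ user_id: 1, name: 'alice', page: '/dm2' }, { user_id: 1, name: 'alice', page: '/veritas' }]
```

When the two relations share no attributes, every tuple gets the same empty key and the result is the cross product, which is exactly what a natural join should return in that case.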

      This might sound really complex, but the beauty of RA is that each operation is small, simple and only has one function. Here’s an example of the Join operation, probably one of the more complex operations: https://github.com/dkubb/veritas/blob/master/lib/veritas/algebra/join.rb and for contrast here are the Intersection https://github.com/dkubb/veritas/blob/master/lib/veritas/algebra/intersection.rb and Union https://github.com/dkubb/veritas/blob/master/lib/veritas/algebra/union.rb operations. As you can see these are extremely simple, yet so powerful.

      I haven’t even mentioned the coolest part yet: the veritas-optimizer gem. It can “walk” the tree of operations and rebalance it into something equivalent but more efficient. I could say more about this, but it’s worth checking out for yourself.
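As a toy example of this kind of rewrite, the sketch below applies one classic relational algebra identity: a restriction applied on top of another restriction is equivalent to a single restriction whose predicate is the AND of both. The `Relation` and `Restriction` structs and the `optimize` function are hypothetical; this is not veritas-optimizer’s code, only an illustration of a tree-to-equivalent-tree rewrite.

```ruby
# Hypothetical RA nodes: a base relation and a restriction (selection).
Relation    = Struct.new(:name)
Restriction = Struct.new(:operand, :predicate) # predicate: ANDed list of [column, value] pairs

# Rewrites the tree bottom-up: nested restrictions collapse into one node
# whose predicate combines the inner and outer conditions.
def optimize(node)
  return node unless node.is_a?(Restriction)

  operand = optimize(node.operand)
  if operand.is_a?(Restriction)
    Restriction.new(operand.operand, operand.predicate + node.predicate)
  else
    Restriction.new(operand, node.predicate)
  end
end

tree = Restriction.new(Restriction.new(Relation.new('users'), [['active', true]]), [['age', 30]])
optimize(tree)
# => a single Restriction over Relation 'users' with predicate [['active', true], ['age', 30]]
```

The result returns the same tuples as the original tree but needs only one filtering pass, or one WHERE clause once it is turned into SQL.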

      TL;DR: DM1 already has all the pieces that implement the Active Record pattern in place, with everything that can be found in ActiveModel and other validation gems, and more. What we’re working on now is the next step in the evolution of the library.

  • http://mislav.uniqpath.com mislav

    Not a fan of the new naming scheme.

    • http://solnic.eu/ solnic

      FYI these are codenames

    • http://github.com/dkubb Dan Kubb

      We chose code names for many pieces to discourage usage except by the early adopters and the people helping get things in order. Many of the libraries will be renamed when we’re ready to actually start the alpha/beta process.

  • wojtekmach

    Hi,

    I like the idea of having separate classes of User and User::Mapper. I don’t see any examples of how to define validations in this new convention, though. Do you have any plans for that? I guess this is tricky since some of the validations are purely business requirements, and some are tied to the database (like uniqueness).

    • Emmanuel Gomez

      The example under the ‘Aequitas’ heading shows an example of inline validations. Many validations can be declared in line with the attribute declaration itself as shown in the example (eg., attribute :title, String, :required => true appends an instance of Rule::Presence::NotBlank to the Book.validation_rules collection). Additional validation rules can be added using class methods like validates_presence_of, validates_length_of, etc.

      Aequitas currently only deals with validating business-logic, per your description. Uniqueness or anything else that requires checking the datastore to verify is currently unsupported. I’m not sure yet how those types of issues will be addressed: one possibility is for them to be handled via constraints functionality built on top of Veritas. This would be part of Veritas (or a separate library) which handles data integrity constraints: uniqueness, foreign key references, etc. 

      This is an open area of discussion. Feel free to weigh in with feedback if you have an opinion on the matter!

    • http://solnic.eu/ solnic

      As Emmanuel wrote, this will be handled by the constraints layer, it’s briefly described in the roadmap: https://github.com/datamapper/dm-core/wiki/Roadmap

  • craig bowes

    I think the attributes should be defined in the model in the usual Ruby way, either as @instance variables, attr_reader / attr_accessor, or Ruby properties. The identity, int, etc. stuff should be in the mapping class, not in the model. You should be able to map any instance variable or property to a field using the mapper, not have ORM-specific constructs in your Ruby model. Looks good though.

  • sabereent

    I’ll start helping with some PRs soon. I can’t stand AR anymore. I know I could create a Mapper layer using AR, but that only makes things tougher.

  • Oren Dobzinski

    Any plans for supporting prepared statements?

    • http://github.com/dkubb Dan Kubb

      Yes. When I designed the RDBMS adapter for Veritas I was thinking about how to support prepared statements, and designed it so they should be easier to add in the future. I still need to work with Dirkjan on DataObjects (our database driver) to get them supported there, but once that’s done I don’t think it’ll be too difficult to integrate.

  • http://profiles.google.com/qertoip Piotr Włodarek

    This is exactly what we need in any non-trivial Rails app. As soon as DM2 gets alpha/beta I will happily show the middle finger to AR in the new projects. Or perhaps it is already somehow usable for PostgreSQL? If so, the alpha-tutorial on how to tie all those things together would be very helpful.

    • http://solnic.eu/ solnic

      Heh :) It’s not yet usable as Veritas and its DO adapter are “read-only”. Something usable should be ready later this year though (unfortunately I said the same thing last year heh).

  • http://twitter.com/jistr Jiří Stránský

    The architecture looks excellent. Very clean. Can’t wait to use it :)

  • Alex

    I hope Class Table Inheritance will be there. I wonder why the Ruby world ignores this pattern.

  • http://twitter.com/plexus Arne Brasseur

    I very much wish this was already here! Two small errata: in your first code example

    relation_one = Veritas::Relation::Base.new('one', header_one, tuples_two)

    I suppose that should be tuples_one?

    And the link to your post about Virtus is broken, it seems the two dashes have turned into an mdash.

    Keep up the good work!

    • http://solnic.eu/ solnic

      I just fixed the issues you mentioned. Thanks for spotting them! :)
