Get Rid of That Code Smell – Primitive Obsession

This is a post from the Get Rid of That Code Smell series.


Primitive Obsession is another popular code smell in Ruby land. It’s very easy, tempting and just feels convenient to use primitive objects to represent various concepts in our code. Here are some primitive classes in Ruby that we like to be obsessed about:

  • Array
  • Hash
  • String
  • Fixnum
  • Float

Whenever you use one of these classes in a context where they don’t actually fit being semantically incorrect, that’s when you introduce Primitive Obsession code smell. Nothing explains it better than a few simple examples:

  • a string representing a URI
  • a float number representing money
  • a hash representing a set of objects

Think about it this way: would you use a string to represent a date? You could, right? Just create a string, let’s say "2012-06-25" and you’ve got a date! Well, no, not really – it’s a string. It doesn’t have semantics of a date, it’s missing a lot of useful methods that are available in an instance of Date class. You should definitely use Date class and that’s probably obvious for everybody. This is exactly what Primitive Obsession smell is about.

Detecting Primitive Obsession Smell

Unfortunately I’m not aware of any tools that would know how to detect this smell so you need to rely on your own analysis. If you have a good understanding of the domain you’re dealing with it should be fairly easy to detect the smell. Simply look for cases where primitive ruby objects are being used intensively in domain specific contexts. Monkey patching primitive classes could also be considered as a smell indicator.

Here’s an example of the smell from Virtus project, a hash with attributes:

What happens here is that we use Hash to represent a set of attributes. It just feels convenient to use a hash here, we want to access attribute objects via #[] method and hash gives us that for free. What we forget about is that we don’t really need a hash, we need something with set semantics and methods to add and merge attributes. Building such API on top of Hash is a mistake.

Removing Primitive Obsession Smell

It’s easy, just come up with a rich object that has correct semantics and exposes API that you need. The attributes hash in Virtus was replaced with an instance of Virtus::AttributeSet.

Here’s its piece (full source is here):

The class exposes smaller API than Hash and at the same time has additional responsibility that we needed (dealing with attributes from a parent). With a primitive hash object we would have to put this responsibility in a place where it wouldn’t really belong. win-win.

Summing Up

In OOP it’s important to use rich objects representing various concepts from your domain. Implement Money class if you need to deal with money, it’s much better than using floats all over the place. Don’t use Hash for configuration objects, use a custom Configuration class instead. Have fun with GeoLocation class instances rather than having pairs of latitude and longitude values embedded inside other objects. This way you will achieve better encapsulation and slim API.

Next stop: we’re going to take a look at Feature Envy.

  • http://twitter.com/Rev_Kachowski r.kachowski

    In essence, you’d be duplicating the functionality of data structures that exist in the standard library every time a domain specific requirement arose. 

    Aside from the extra potential for defects to arise due to multiple implementations of the same functions, you’d also be requiring developers to comprehend new interfaces and structures before interacting with the functionality.

    • http://solnic.eu/ solnic

      No, of course not. It’s adding new behavior on top of what language has to offer so that using your objects is more convenient and feels “natural”, so to say.

    • Paul Leader

      With the availability of things like the Enumerable mixin I’m not sure I’d really call this duplication of functionality. It’s not as if you have to rewrite all the methods of Hash.

      The advantage is that you now have an abstraction that can be extended to provide additional facilities without totally breaking the existing interface. In a large system that could be incredibly useful.

      A simple example would be Rails’ HashWithIndifferentAccess, which is a relatively small extension on the Hash primitive, but provides a really useful improvement.

  • http://rendion.myopenid.com/ render

    I think the industry is moving in the opposite direction of this article.  First of all the date vs string notion is a fallacy.  

    For example, ALL as in ALL fields in a UI are strings, its up to the programmer to determine for validation or storage or business logic purposes when to convert that into a DATE.

    Thats a simple case, the collection case is also simple, most devs are not going to write or enhance collections in any language and thats a good thing.

    Encapsulation of collections is the standard practice.  If a devs natural tendency is to write a collection when a standard one would do just fine, I wont hire them.

    Sorry I think this article is off.

    • http://solnic.eu/ solnic

      The problem is that primitive data types are used too often in Ruby. That’s what I’m writing about here. I worked on projects where everything was either a string, number, array or hash and with a bigger system it’s really terrible. It’s hard to tell what kind of objects you’re dealing with and what interfaces you’re supposed to be using.
      I’m also not saying “wrap every collection in a custom class”, that would be silly over-engineering of course.

      I do say though that when you can identify a specific domain concept that can be represented by a specialized object that has custom behavior then it’s better to use a custom class.

      I probably should mention there has to be a balance between using primitive types and custom ones. It’s not like using what language has built-in is wrong. I never said that.

      Oh and regarding date…if a dev natural tendency is to use a string to represent a date, I won’t hire him :) Well, not in Ruby world at least.

  • Michael

    so you trade 15 lines including documentation for 35 lines without documentation which are not even integrated into the ‘original’ code.

    Don’t get me wrong, I’m not trolling around here, but I don’t think that the refactored code is any better, even worse.
    Primitives are good. They are tested and everybody knows how to use them.

    • http://solnic.eu/ solnic

      New class can be probably simplified by using a hash internally (I can’t comment on the specifics of this implementation as I’m not the original author of this code).

      This really boils down to the fact we needed a set of attributes that has custom behavior and using a primitive object to achieve that would be a code smell.

      And again, I never said primitives are bad :)

      • Michal

        As you said I’d use Hash internally in ArgumentSet (at least in the first try) and delegate calls I need to Hash API. It’d make this code only just a bit longer than original one.

  • http://henrik.nyh.se Henrik N

    You shouldn’t use a float for money because of rounding errors, but I would be fine with a BigDecimal unless you need more than it offers.

    But anyway, I wouldn’t consider it a code smell to use e.g. a string for a URL, or a BigDecimal for an amount of money. If you ever need to do URL-specific things to the string (like extracting the query parameters), or validate its format or something, then a specific object makes sense. Before that, it seems like needless complexity.

    If you take it to its extreme, you’d wrap a foo_count in a Count class and a username in a Username class. You wouldn’t, of course, unless you needed that wrapper for something, and I’d say the same applies to e.g. a URL or an amount of money.

    For contrast, I believe Rich Rickey argued for using primitives more – maybe it was in “Simple Made Easy”?

    • http://solnic.eu/ solnic

      Yes it was “Simple Made Easy” and I totally agree with that talk.

      I also agree with your comment. The need for custom, domain specific behavior is the reason why you need a richer representation of something.

  • Renato Zannon

    I’ve seen this happening so many times it’s not even funny… Mostly for domain concepts that are serialized as YAML.

    I guess it’s specially easy to fall into this trap in ruby, due to the functionality you get free from the core classes.

  • Paul Leader

    This is one of those code smells that I’ve been very guilty of over the years. I think the problem is that the Ruby primitives are so powerful that it’s easy to forget that adding an abstraction can have real benefits.

    Like all patterns you can take it too far, but it’s definitely something to consider.

  • http://www.facebook.com/bloodycelt.fb Jason Kenney

    Or you could create an accessor method that checks if the object is Date, Symbol or String, if its a string or symbol, convert to a date. And have the model return a date. Reflection is there for a reason. ex…

    def start_at=(value = nil)
    @start_at = case value.cass
    when Time, nilClass
    value
    else
    Time.parse(“#{value}”)
    end
    end