Development

Hello RowBoat: CSV Rows to DB Rows

Michael Crismali — Tandem Alum

last updated August 8, 2017

Originally I set out to write a blog post on how to combine SmarterCSV and activerecord-import to efficiently parse and import CSVs into ActiveRecord models in Ruby. After a few short paragraphs, I realized how much boiler plate there was, and I hate typing out boilerplate 😡. So, I spent a little time wrapping it up in a gem that lets folks focus more on working with their data and less on the nuts and bolts of parsing CSVs and putting records into the database 🙂👍💎.

And that, my friends, is how Row Boat was born! Check out the README and the documentation for more information or keep reading for a few quick examples (here’s the repo that contains the examples).

For both of our examples, we will be using “College Scorecard” data from data.gov to import a list of colleges.

Basic Example

To keep the example as simple as possible, let’s assume our College model has no database constraints or significant validations.

All we need to do is make a fresh class that subclasses RowBoat::Base, implement import_into and column_mapping, and call import. Let’s break it down a bit.

import_into tells RowBoat what model to import the records into.
column_mapping maps CSV column names to names you want to interact with. Here, we’re using a hash where the keys are lowercased symbol representations of the CSV headers, but there’s a lot of room for configuration here.
import is a path to a CSV or a CSV file or anything SmarterCSV accepts 😀.

https://gist.github.com/crismali/729614a0d9f1c463102b86309e179039

Now we’re ready to call BasicCollegeBoat.import(path_to_our_data) and import the colleges!

Too easy? Let’s turn up the difficulty and assume we have a unique index on ope_id. That way, if we run the code above we’ll now get an error 😞.

Not to worry! Assuming we’re using Postgres, we can just add a few lines of configuration by implementing options. This is where we can configure a lot of aspects of the underlying gems (SmarterCSV and activerecord-import). In this case, we just need on_duplicate_key_ignore and our code ends up looking like the following.

https://gist.github.com/crismali/67723c4695a5c57817b5577c964889ac

No more uniqueness violations! Of course, depending on your situation, you might prefer a different approach from ignoring the duplicates. (The complete file can be found in the example repo here.)

Advanced Example

Okay, we added a basic importer, but let’s dig a little deeper. If you ran the “Basic Import” in the example app, you may have noticed that some of our inst_urls were all uppercase and that a few npc_urls just read “NULL”. Plus, our hypothetical requirements have just changed so that we now need to ignore any rows where the ope_id doesn’t end with 00. Fear not! RowBoat makes this easy.

Let’s start with filtering based on ope_id. First off, SmarterCSV is pretty clever and will try to convert strings that look like numbers into numeric types. For this feature though, it’ll be easier if we’re always dealing with strings. So, to opt out of this behavior we can just add convert_values_to_numeric: false to our hash in the options method.

Now that we know we’re always dealing with a String, let’s implement preprocess_row. Just by returning nil from this method instead of the row we’ll be telling RowBoat to skip this row. That way we can write something like

https://gist.github.com/crismali/bc3bc50579fb54438281de381fe36077

Awesome, now we can get started on making sure the inst_urls are lowercase and that the npc_urls don’t end up containing that gross “NULL” string. While we can just add to our preprocess_row method, RowBoat encourages us to keep it lightweight by allowing us to implement value_converters. In this method, we’ll tell RowBoat to use a lambda to adjust our inst_urls and a method to use for the npc_urls (check out the docs for a complete list of everything you can do with value_converters).

https://gist.github.com/crismali/4c2a28bdceeda414d51b2d7dfe53f624

Bam! Now we’ve covered all of our bases with a fairly small amount of code. Check it out below or here in the example app.

https://gist.github.com/crismali/64eccb8fc17701e4273e0f6ad8878e4d

If you have any problems or would like to suggest (or even better, add) a feature, please open an issue or a pull request on the RowBoat repo.

Thanks for reading and have fun rowing your own boat!

Tandem – software development companies in Chicago with practice areas in digital strategy, human-centered design, UI/UX, and web application and custom mobile development.

Michael Crismali