Originally I set out to write a blog post on how to combine SmarterCSV and activerecord-import to efficiently parse and import CSVs into ActiveRecord models in Ruby. After a few short paragraphs, I realized how much boiler plate there was, and I hate typing out boilerplate 😡. So, I spent a little time wrapping it up in a gem that lets folks focus more on working with their data and less on the nuts and bolts of parsing CSVs and putting records into the database 🙂👍💎.
For both of our examples, we will be using “College Scorecard” data from data.gov to import a list of colleges.
To keep the example as simple as possible, let’s assume our
College model has no database constraints or significant validations.
All we need to do is make a fresh class that subclasses
column_mapping, and call
import. Let’s break it down a bit.
import_intotells RowBoat what model to import the records into.
column_mappingmaps CSV column names to names you want to interact with. Here, we’re using a hash where the keys are lowercased symbol representations of the CSV headers, but there’s a lot of room for configuration here.
importis a path to a CSV or a CSV file or anything SmarterCSV accepts 😀.
Now we’re ready to call
BasicCollegeBoat.import(path_to_our_data) and import the colleges!
Too easy? Let’s turn up the difficulty and assume we have a unique index on
ope_id. That way, if we run the code above we’ll now get an error 😞.
Not to worry! Assuming we’re using Postgres, we can just add a few lines of configuration by implementing
options. This is where we can configure a lot of aspects of the underlying gems (SmarterCSV and activerecord-import). In this case, we just need
on_duplicate_key_ignore and our code ends up looking like the following.
No more uniqueness violations! Of course, depending on your situation, you might prefer a different approach from ignoring the duplicates. (The complete file can be found in the example repo here.)
Okay, we added a basic importer, but let’s dig a little deeper. If you ran the “Basic Import” in the example app, you may have noticed that some of our
inst_urls were all uppercase and that a few
npc_urls just read “NULL”. Plus, our hypothetical requirements have just changed so that we now need to ignore any rows where the
ope_id doesn’t end with
00. Fear not! RowBoat makes this easy.
Let’s start with filtering based on
ope_id. First off, SmarterCSV is pretty clever and will try to convert strings that look like numbers into numeric types. For this feature though, it’ll be easier if we’re always dealing with strings. So, to opt out of this behavior we can just add
convert_values_to_numeric: false to our hash in the
Now that we know we’re always dealing with a
String, let’s implement
preprocess_row. Just by
nil from this method instead of the row we’ll be telling RowBoat to skip this row. That way we can write something like
Awesome, now we can get started on making sure the
inst_urls are lowercase and that the
npc_urls don’t end up containing that gross “NULL” string. While we can just add to our
preprocess_row method, RowBoat encourages us to keep it lightweight by allowing us to implement
value_converters. In this method, we’ll tell RowBoat to use a lambda to adjust our
inst_urls and a method to use for the
npc_urls (check out the docs for a complete list of everything you can do with
Bam! Now we’ve covered all of our bases with a fairly small amount of code. Check it out below or here in the example app.
If you have any problems or would like to suggest (or even better, add) a feature, please open an issue or a pull request on the RowBoat repo.
Thanks for reading and have fun rowing your own boat!