Recently I had a breakthrough in understanding software design. Up until then, I could confidently write code to fulfill an obvious task at hand, but I had less experience with work that spanned multiple levels of ambiguous functionality. So when I was tasked with a feature to generate an Excel report with multiple tabs of data, my instinct at the time told me, “You should make a class for each tab.” I would later realize that these tabs were more related than they were different.
We were building an application for an energy company to notify their customers (via email, text, and phone) ahead of a potential power shutoff. Additionally, the application had to create a report detailing exactly who received communication and whether they acknowledged receipt. That’s where the multi-tab spreadsheet comes in. Each tab contained a different level of granularity of the data. By zip code, how many people received communication? For each electric meter, did the customer acknowledge receiving communication, and at what time?
We decided to create service classes called “tab generators” for each tab. I was comfortable with this level of abstraction, but in a minute we’ll see how it falls short. The classes all shared the same sources of raw data: the customer information owned by the energy company and the results of the communications sent out by a third-party system for mass notification. The differences involved knowing what to do with that data and munging it to the appropriate level of granularity.
For example, we had one tab that listed every single person (a candidate) who was notified. A candidate corresponds to a meter record and a contact record in the database. At one point, we had Ruby code that looked something like this:
```ruby
 1  def call
 2    sheet.name = TAB_NAME
 3    sheet.insert_row(0, headers)
 4    ReportingHelper.format_header(sheet.row(0))
 5    rows = build_candidates_details
 6    rows.each_with_index do |row, index|
 7      sheet.insert_row(index + 1, row)
 8    end
 9  end
10
11  def build_candidates_details
12    data = []
13    activation_report_data.each do |activation_uid, candidates_data|
14      # A mass notification event is called an "activation", and there can be many activations
15      # activation_report_data is the data received from the third-party notification system
16      candidates_data.each do |candidate_data|
17        # A candidate is a single person who was notified and candidate_data is the results of the notifications to this person
18        meter_id = candidate_data["MeterId"]
19        contact_id = candidate_data["ContactId"]
20        meter = Meter.find_by(id: meter_id)
21        contact = Contact.find_by(id: contact_id)
22
23        meter_data = ReportingHelper.meter_cells(meter) # Returns an array of fields on the meter
24
25        cells = []
26        cells.push(contact.first_name, contact.last_name)
27        cells.push(*meter_data)
28        cells.push(*messages_data(candidate_data))
29
30        data.push(cells)
31      end
32    end
33
34    data
35  end
36
37  def messages_data(candidate_data)
38    [
39      ReportingHelper.pickup_datetime_for(candidate_data),
40      candidate_data["Device Type"],
41      ...
42    ]
43  end
```
This simplified snippet is just a peek into what we were dealing with. So what issues did we encounter down the line?
- The tab generator class is doing way too much. It parses data from one source (L37-43), queries more data from another (L18-21), puts it all together, and writes it to the sheet (L3-8). We’ve now set an unfortunate pattern for mixing business logic and presentational logic.
- It knows the internal details of multiple sources of data. For example, we expect fixed strings from activation_report_data, like “MeterId” and “Device Type”. If the format of that data changes, we’ll need to make adjustments all over the place.
- There is shared functionality extracted in ReportingHelper, but not in a meaningful way. That class holds hard-to-find methods for both styling the sheet and munging data.
- Not shown here, but testing became exceedingly difficult with each new use case. We had a shared context for testing that contained sample data from the third-party notification system. But as we added new test cases, it became bloated and affected specs in other files.
Everything above made the code painful to change, even for something as simple as switching the order of columns in the spreadsheet. Remember, we repeated this pattern for all the other tabs as well. There’s a reason that writing flexible code is a best practice. We encountered our first curveball when the client decided to use a different type of report in the third-party notification system, which returned differently formatted data. We found ourselves changing the hard-coded strings from the second point above across multiple files. That was our first code smell.
Next, we discovered from our client that we couldn’t query the Meter and Contact tables because of their nonstandard database practices. That’s a story for a different day, but the TL;DR is that we had to change our sources of data, and those tab generators were tightly coupled to the implementation of those models. Now we had seen this problem twice.
With the help of my teammate and manager Mercedes, we realized we needed to refactor the entire feature. The missing piece, as you may have guessed, was proper abstraction. The concept of tab generator seemed fine until we faced all the work required to get from raw data to usable metrics. Now it was stretched too thin, doing too much, and making change requests unnecessarily difficult.
While there’s too much code to show, I can tell you our approach for the refactor. We identified that the tab generator classes did the following three things instead of one:
- Finding and parsing the correct data. We solved this by creating a “data adapter” class that consumed the various sources of data and transformed them into a consistent data structure with only the necessary information to pass around.
- Doing duplicated business logic. Instead of having a ReportingHelper class that housed tenuously related functions, we designed more intentional and shareable interfaces for this work.
- Presenting the final sheet. We still had tab generator classes, but now they only focused on displaying data in the correct format and order.
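To make that split concrete, here is a minimal sketch of how the pieces might fit together. It is not our production code: the class names (CandidateRecord, ActivationReportAdapter, CandidateDetailsTab), the :number field on a meter, the header labels, and the plain hashes standing in for the customer data are all assumptions made for illustration. Only the "MeterId", "ContactId", and "Device Type" keys come from the earlier snippet.

```ruby
# Hypothetical sketch: every class name and field here is illustrative.

# A consistent structure that carries only what the report needs.
CandidateRecord = Struct.new(
  :first_name, :last_name, :meter_number, :device_type,
  keyword_init: true
)

# The adapter is the only place that knows the third-party key names
# and where the customer data comes from.
class ActivationReportAdapter
  # contacts and meters are plain hashes keyed by id in this sketch; any
  # object that responds to #fetch could be swapped in.
  def initialize(activation_report_data, contacts:, meters:)
    @activation_report_data = activation_report_data
    @contacts = contacts
    @meters = meters
  end

  # Flattens every activation into one list of CandidateRecord objects.
  def candidates
    @activation_report_data.flat_map do |_activation_uid, candidates_data|
      candidates_data.map { |candidate_data| build_record(candidate_data) }
    end
  end

  private

  def build_record(candidate_data)
    contact = @contacts.fetch(candidate_data["ContactId"])
    meter = @meters.fetch(candidate_data["MeterId"])

    CandidateRecord.new(
      first_name: contact[:first_name],
      last_name: contact[:last_name],
      meter_number: meter[:number], # assumed field name
      device_type: candidate_data["Device Type"]
    )
  end
end

# The tab generator only decides which columns appear and in what order.
class CandidateDetailsTab
  HEADERS = ["First Name", "Last Name", "Meter", "Device Type"].freeze

  def initialize(candidates)
    @candidates = candidates
  end

  # Returns the header row followed by one row per candidate.
  def rows
    body = @candidates.map do |c|
      [c.first_name, c.last_name, c.meter_number, c.device_type]
    end
    [HEADERS, *body]
  end
end

# Usage with made-up sample data
activation_report_data = {
  "activation-1" => [
    { "MeterId" => 10, "ContactId" => 1, "Device Type" => "SMS" }
  ]
}
contacts = { 1 => { first_name: "Ada", last_name: "Lovelace" } }
meters = { 10 => { number: "M-0010" } }

adapter = ActivationReportAdapter.new(activation_report_data, contacts: contacts, meters: meters)
CandidateDetailsTab.new(adapter.candidates).rows.each { |row| puts row.join(" | ") }
```

With a shape like this, a change in the third-party report format touches only the adapter, and reordering columns touches only the tab class. Each piece can also be tested with a few small hashes or structs instead of one large shared fixture.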
And that’s it! An example of learning abstraction in the real world. Around that time, I read Practical Object-Oriented Design in Ruby by Sandi Metz. Suddenly I began to see design patterns in my everyday work. However, she urges developers not to design prematurely, as reaching for the wrong abstraction is more expensive than any duplication of code. I try to have self-compassion for that original implementation, knowing that I did the best I could with the knowledge I had. This refactor came at the perfect time for me to get the most benefit, in no small part because of the support from my manager. She found the right scope of work for me to start thinking about higher-level software architecture. Most excitingly, I feel more confident in my intuition to ask, “Hey, this code isn’t working for us anymore. Why is that?” and figure it out from there!