I messed up.

I am not an intern. But, I am a relative newbie to big architecture mono-repos. And in this particular week, I was perhaps insufficiently supervised.

I messed up.

[Relative dates in the first part of the post are off by several weeks, as this post needed some time aging in my drafts folder before publication.]

There's a meme out there about interns pushing to production and breaking things.

I am not an intern. But, I am a relative newbie to big architecture mono-repos. And this week, I was perhaps insufficiently supervised.

What happened

I've written a bunch of code for Ghost over the last couple months. It's been a blast. I got to fix a bunch of things that were bugging me. It was sort of like this, but without the getting hired, paid, or quitting:

Generally, the way it goes is this: I put in a PR. Someone on the Ghost team looks at (sometimes a few weeks later) and tells me to either (1) write more tests or (2) that there's a better way to do what I've done (possibly including scrapping it and starting over). And eventually, after a few rounds of this, I get it to the point where it's hopefully-good, and then it waits for a while for some busy dev to review it, and maybe it gets merged, and I do the happy dance.

That's a simplification. Sometimes I try to put in a PR and discover that I've screwed up package dependencies and it will never pass the CI build process and scrap it and start over. Sometimes I can't get my tests to pass. Sometimes.... You get the idea. It tends to be pretty struggle-y.

But eventually, my stuff gets merged. (Well, a few times my PRs got closed and a better fix got merged. I'm OK with that. I wanted it fixed and it got fixed. Win.)

Last week, the PR I'd worked on for a couple days to add newsletter translations got merged with fairly minimal discussion. "Wow!" I thought, "I've really arrived. A PR that didn't need a bunch of work? I'm rocking it."

Readers, that was not the case.

Problem #1: Feature flags

When I first started adding i18n features, I didn't understand that the i18n feature flag (which is labeled "Portal translations" in the dashboard) was supposed to go on all i18n translations, not just the portal related features. I found out weeks later that I should have been adding it, but at that point, I didn't think much of it because we were talking about removing it a week later, so it didn't make sense to go back and add it, just to turn around and remove it right? Wrong.

That was a mistake, because it meant that if there were any problems with the new code, it wasn't all that easy to yank it temporarily.

Problem #2: Cached settings

In my struggles with dependency injection, I missed resetting newsletter translations when the user changes their language after Ghost starts. That meant that once the language was set, it wasn't changeable until Ghost was restarted. (Or like, never, for managed hosting sites that don't reboot.)

I am happy to say that I found the problem and patched it myself, before anyone else had noticed it. I'm less happy to say that this was after the code had been released.

The initial (very gracious) response from the team was "oh well, no harm no foul, it's a beta feature and the fix can go out with the release next week." Except it was actually not controlled by the beta feature flag. Oops.

Problem #3: Failing to sanitize user input

This was the big one, and the one that I'm most upset with myself about. The locale string that Ghost uses is a freeform text entry box. A week or two prior to putting in the newsletter PR, I'd actually had a Slack conversation with one of the devs about why he should not change it to a dropdown. (Because it also determines the 'lang' setting in the site's html tag, and we don't want to restrict users to only using languages with translations.) So, I absolutely knew in that moment that what was in the 'locale' value was not necessarily a supported locale.

Unfortunately, I didn't still know it a week or so later when I was deciding to translate dates. It's really really hard to hold all the context for a huge codebase in working memory, and I failed to ask myself enough questions about what I was doing to pull the relevant context back to working memory, and which would have allowed me spot the problem.

The main part of the i18n package handles nonsense locales just fine. It just falls back to the English strings, no harm, no foul. Unfortunately, when I added newsletter i18n, I also added localization of dates. And that package doesn't handle nonsense inputs without crashing.

Then it all went South

Which brings us to... Thursday afternoon, shortly after the 5.100 release had been cut 24 hours early in case there were any "y2k"-like bugs associated with the release bumping up to 5.100, so that everyone would still be at work when it happened. That's when the Ghost team discovered that nonsense locales made it impossible to send a newsletter. (The process threw an error and quit instead of sending.) And because of my failure to use the feature flag and the fact that the fix to being able to change language meant you couldn't just change your language to something valid wasn't actually released yet, there wasn't a super easy fix, and I cost two devs a couple hours of work and kept one of them up late dealing with my mess. Oops.

So... if you're wondering why there 's a 5.99.1 and 5.100.1 release? Uh.. yeah, that was my fault.

New goals

  • Move a little more slowly
  • Break fewer things.
  • Sanitize the damn user input. Seriously. I knew better.

I'm relieved to report (now several weeks after the fact) that I'm still having PRs merged, so apparently I haven't been blacklisted or anything. The dev team was very gracious about it, but ugh – new bugs are not the sort of contributions I'd like to make!