Blog: December 2020

Most of these posts were originally posted somewhere else and link to the originals. While this blog is not set up for comments, the original locations generally are, and I welcome comments there. Sorry for the inconvenience.

GitHub graph

My GitHub history got a lot less sparse in 2020, and especially in the last few months of the year. It's great to be a productive member of my first open-source team!

activity graph

2020

Somebody on Twitter asked:

What did you learn in 2020 (besides how to make bread)?

I responded there:

  • To grow food in pots.
  • To cut men's hair.
  • To cook more new things.
  • That my cat loves me being home all the time.
  • More about community-building.
  • How to set up a nonprofit foundation.
  • To cut people w/no morals or human decency out of my life.
  • And yes, sourdough.

I was up against a character limit there, but I'm not here. Read more…

Sad news from AEthelmearc

Master Remus Fletcher, who was an instigating force in music in the Debatable Lands and at events across the kingdom and beyond, died on Friday. The obituary talks some about his SCA participation, and there'll be an AEthelmearc Gazette post.

This is such sad news. Remus encouraged music and was sometimes a one-person source of ambience. During events, if there was no other entertainment happening, he would sit in a corner and play. He was happy to put instruments in curious people's hands and teach. Some of the people he drew in went on to surpass him musically, but I never got the sense that he felt threatened by that -- he just wanted there to be more music. Before there was a Debatable Consort, Remus showed up at fighting practice every week with packets of photocopied music and a bag of recorders and the Consort grew from that. He was part of the Debatable Choir during its early days, and sang individually at events frequently.

Remus was friendly and welcoming to all. He encouraged people he knew to reach higher, to stretch, but he didn't judge -- he invited, never criticized. I will miss him.

Gmail update

Someone (who can self-identify if desired) shared Google's summary of the recent email outages (PDF). This is the outage that caused my address (and many others) to start sending permanent bounce messages.

Background: The Gmail SMTP inbound service uses a configuration system that allows specific service options and flags to be changed while the service is already deployed in production. The "gmail.com" domain name is specified as one of these configuration options. An ongoing migration was in effect to update this underlying configuration system to meet Google internal best practices.

A configuration change during this migration shifted the formatting behavior of a service option so that it incorrectly provided an invalid domain name, instead of the intended "gmail.com" domain name, to the Google SMTP inbound service. As a result, the service incorrectly transformed lookups of certain email addresses ending in "(at)gmail.com" into non-existent email addresses. When the Gmail user accounts service checked each of these non-existent email addresses, the service could not detect a valid user, resulting in SMTP error code 550.

[...]

To guard against the issue recurring and to reduce the impact of similar events, we are taking the following actions:

  • Update the existing configuration difference tests to detect unexpected changes to the SMTP service configuration before applying the change.
  • Improve internal service logging to allow more accurate and faster diagnosis of similar types of errors.
  • Implement additional restrictions on configuration changes that may affect production resources globally.
  • Improve static analysis tooling for configuration differences to more accurately project differences in production behavior.

Ouch.

Fixing things in production systems is hard. I've been there; things can go wrong, sometimes badly wrong. I'm used to thinking of Google as having near-infinite resources, including a replica of their production system to test changes on. Perhaps that's unrealistic.
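
For a rough idea of what a "configuration difference test" like the one in the first action item might look like, here is a minimal sketch. Everything in it is hypothetical: the option names, the function names, and especially the malformed domain value, since Google's summary doesn't say what the invalid value actually was.

```python
def diff_config(current, proposed):
    """Return {option: (old, new)} for every option whose value would change."""
    keys = current.keys() | proposed.keys()
    return {
        key: (current.get(key), proposed.get(key))
        for key in sorted(keys)
        if current.get(key) != proposed.get(key)
    }


def unexpected_changes(current, proposed, expected_keys):
    """Return only the changes that the migration was not supposed to make."""
    changes = diff_config(current, proposed)
    return {key: change for key, change in changes.items() if key not in expected_keys}


# Hypothetical example: the migration is meant to change one option,
# but a formatting difference also alters the domain option.
current = {"inbound_domain": "gmail.com", "max_message_mb": 25}
proposed = {"inbound_domain": "'gmail.com'", "max_message_mb": 50}

surprises = unexpected_changes(current, proposed, expected_keys={"max_message_mb"})
if surprises:
    print("Refusing to apply; unexpected changes:", surprises)
```

The point of the sketch is only that the check runs before the change is applied, and that anything outside the list of intended changes blocks the rollout.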

Not the customer but the product

There's apparently another widespread Gmail outage, but this one is more harmful -- it's lying to senders about addresses being invalid (permanent error).

This might be the swift kick in the rear that I needed to figure out a different approach to email. I have a domain, so I should set up a single "collector" address there to receive everything I'm currently forwarding to Gmail (I'll have to hunt around for all the forwarders; Pobox is easy but not the only one). I hadn't done that before because I thought that relying on Google (a huge, hardened service) was a safer bet than relying on my domain -- what happens if my domain gets hijacked, my hosting company compromised, etc.? Rethinking that now...

Fortunately, I'm already forwarding Pobox to an address on my domain, a backup for Gmail, so I probably haven't lost anything. But I might be getting silently dropped from mailing lists I cared about. We'll see.


Ok, I think I now have everything going to one mailbox on my domain and, from there, mirrored to Gmail for now. I'd like to have all my mail in one place, but the last download of my Gmail mailbox was a 10G file in mbox format, which I don't know how to read or plug into something else. (I mean, obviously that's a standard format, but what can I use on my Mac to read it?) I don't really want to store all that on my domain server long-term (it'd raise my storage costs), but there's probably a lot of junk in it, mixed in with the stuff I care about. I'd already done some passes to, for example, nuke years-old mailing-list threads that I don't care about now, because Google has storage limits, but that's time-consuming.

I welcome input from people who've wrangled large mailboxes, domains, and email more generally.
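
On the question of reading an mbox file: one possible starting point is Python's standard-library `mailbox` module, which can iterate over the messages in an mbox file. The sketch below counts messages per sender, which might help spot the bulk senders worth pruning. It assumes the export is a single mbox file; the function name `summarize_mbox` is made up for illustration, and the sketch says nothing about how well this performs on a 10G file.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def summarize_mbox(path, top=10):
    """Return the `top` most frequent sender addresses in an mbox file."""
    # create=False makes a missing file an error instead of creating an empty one.
    box = mailbox.mbox(path, create=False)
    senders = Counter()
    try:
        for message in box:
            # The From header may be missing or contain a display name.
            _, address = parseaddr(str(message.get("From", "")))
            senders[address.lower() or "(unknown)"] += 1
    finally:
        box.close()
    return senders.most_common(top)
```

Calling `summarize_mbox("export.mbox")` (a placeholder file name) returns a list of `(address, count)` pairs, most frequent first.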

Learning to use the manual camera settings

When I've taken pictures of the chanukiyah in the past, I've usually been disappointed by how blurry the flames look. Photographing flames in a darkened room is apparently challenging -- it's not just me. I asked a question about it a while back on the Codidact photography community and got some interesting advice.

I've been experimenting this season. Here's one from tonight that came out decently well:

photo, 5 candles, window reflection

The camera settings were:

  • Shutter speed: 1/90
  • ISO: 1600
  • Exposure: 0 (I don't know what this means; it's a scale from -2 to +2)

The other settings I have available are named:

  • White balance: (scale of pictograms of sun, light bulbs, etc)
  • Interval(s): scale from 0 to 60
  • Focus: picture of flower, 25/50/75%, picture of mountain

I left those set to "auto".

I can make guesses (based on the scales) about white balance and focus, but "interval(s)" has me stumped.
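
A guess at the "Exposure" scale: a range of -2 to +2 is the usual convention for exposure compensation measured in stops, where each stop doubles or halves the light reaching the sensor. The camera app doesn't confirm this, so treat it as an assumption. Under that assumption, and with ISO and aperture held fixed, this small sketch shows the shutter time that one stop more or less would correspond to; the function name is made up for illustration.

```python
from fractions import Fraction


def compensated_shutter(shutter_seconds, stops):
    """Shutter time that lets in `stops` more (or fewer) stops of light.

    Assumes ISO and aperture stay fixed and `stops` is a whole number.
    """
    return Fraction(shutter_seconds) * Fraction(2) ** stops


base = Fraction(1, 90)  # the 1/90 shutter speed used for the photo
for stops in (-1, 0, 1):
    print(stops, compensated_shutter(base, stops))
```

So, if the guess is right, +1 would be the equivalent of exposing for 1/45 instead of 1/90, and -1 the equivalent of 1/180.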

Google security question

Dear brain trust,

I have an Android tablet. As with my phone, I use it with my Google account. My account confirms new sign-ins or other access grants by sending a confirmation to my phone (so I have to say "yes it was me" there before the sign-in completes on another device). This is all good.

Google also sends that confirmation to the tablet. How do I disable that part, while still remaining signed in on the tablet? I want to use it, but I don't want it to be a source of trust. I've been through the Google security settings and I don't see a way to do this -- a way to say "trust it to be signed in but don't trust it to grant trust".


From comments, apparently this is no longer possible (but once was). What gives, Google?

"Blah blah blah."

Today's bit of randomness:

When I was a young programmer I worked for an AI company on a text-categorization project -- for a commercial client, all hush-hush for a while to preserve their competitive advantage and such, apparently really innovative (didn't realize then; I was just writing code to solve a problem, y'know?). Then somebody accidentally published the training dataset. And apparently it's gotten quite a lot of use in the research community, which I was completely unaware of, having never really been that kind of researcher.

For 30+ years there's been a mystery in that dataset that people have noticed, commented on, and apparently never tried to track down...until now. This podcaster got in touch with me and some others last week, and here's the result: Underunderstood: The Case of the Blah Blah Blahs. (36 minutes; has transcript).

It was neat to hear this trip down memory lane, and also to hear other parts of the story I'd never known about before, including the discussion from a researcher from the "other side" of one of the big arguments in AI in the 80s.

Our legacies are not always what we think they will be

In the mid-80s, in my first full-time position after college, I worked for a now-defunct software company doing artificial intelligence, specifically natural-language processing. The most significant project I worked on while there was a text categorization system. I was the tech lead (this was 1987ish). The client was Reuters, who at the time had literal rooms full of people whose job was to skim news stories coming over the wire, attach categories to them, and send them back out quickly. Our job was to automate that -- or, more realistically, to automate the parts that machines could do and send a much smaller set of "don't know" cases to humans. I'm writing this from memory; it's been more than 30 years and details are fuzzy.

I left that company and went on to do other things. I was vaguely aware that, at some point, the corpus of news stories we used for training data had been released publicly, by agreement between Reuters and my then-employer. I wasn't a researcher, wasn't in the NLP business any more, and lost touch. Technology moves on, and I figured our little project had long since faded into obscurity.

Tonight I got email with a question about that data set. My name is in the README file as one of the original compilers, and somebody tracked me down.

Somebody still cares about that data set.

I Googled it. Our data set was popular for close to a decade, during which time people improved the formatting (SGML, baby!) and cleaned up some other things. It spawned a child -- the original either had, or had acquired, some duplicate entries, and the new one removed them. (The question I got was actually about the child data set.) And now I'm curious about the question I was asked too, because I either don't know or don't remember how it got that way.

Neat!