GoodJob, via its introductory blog post, was highlighted last week at Rails World. A new Active Job queue backend, Solid Queue, was announced, and I'm excited to see where it goes!

I attended Rails World in Amsterdam this past week. During the conference, a new Active Job backend was announced: Solid Queue (video), which has the potential to become the first first-party backend in Rails. Solid Queue, like my GoodJob, is backed by a relational database. I had a chance to talk to Rosa Gutierrez, who is leading the effort at 37signals, and I'm hopeful that I'll be able to contribute to Solid Queue and, who knows, maybe it could even become a successor to GoodJob. I'm very excited about this!

With that thought in mind, I reflected on some of the design decisions and motivations that became GoodJob, and that I believe are important regardless of the Active Job backend under development. These are not intended to be design documents, but more a list of things that I have learned or come across during my 3 years working on GoodJob. It would be nice to keep these in mind when designing a potential successor to GoodJob. And I hope they can be the seed of further conversations, rather than a fully realized proposal or argument.

Recovering from a SIGKILL (or someone unplugging the power cord) is always number one in my mind when thinking of GoodJob. That informed my desire to use Advisory Locks (which are automatically released on disconnect), and my future thinking about heartbeats if GoodJob switched over to using FOR UPDATE SKIP LOCKED instead of Advisory Locks. I do not think jobs should be limited to a specific timeout (as Delayed Job's design uses), as that also creates significant retry latency when resumed, and jobs definitely shouldn't be wrapped in a transaction either.

(Human) Exception and Retry Workflows.
Everybody has a different workflow for how they deal with errors, and I believe that a backend needs to track, report (e.g. send to Sentry or Bugsnag), and expose the various reasons an error appears: retried, retry stopped, explicitly discarded, SIGKILLed/interrupted, unhandled error, etc. I am still dialing this in on GoodJob because there is wide variability in how people and teams manage their error workflows. Active Job's error handling isn't clear cut either, so maybe we can make that better and come around to a more opinionated (but still inclusive) design. I haven't cracked the code on what is "ideal", or what is reasonable to say "nope, don't do it that way" about. Everyone seems to do it differently! If a job is SIGKILLed/interrupted, should it be automatically restarted or held for manual review? For example, there are very different answers on: when using retry_on SpecialError, attempts: 3, should the 4th error be reported to the exception tracker? What about an explicit discard_on? Should a discard_on error be reviewed and re-enqueued, or not?

I think it's interesting that Rails might ship with a 1st-party queue backend before it ships with a 1st-party webserver: there is a lot of operational overlap. Signal handling, timeouts, daemonization, liveness and healthcheck probes, monitoring and scaling instrumentation. There's quite a lot of ground to cover, and a lot of different systems and tooling: Kubernetes, systemd, rc.d, Heroku, Judoscale, to name just a few of the various operational targets that I've spent considerable time supporting.

It took me a while to come around to this in GoodJob, but I believe that performing work repetitively on a schedule ("cron-like") is very much in the same problem domain as background jobs. There are lots of different ways to design it that I don't feel strongly about (for example, GoodJob minimizes autoloading by keeping schedules separate from job classes), but I do think it is necessary to plan for scheduled jobs in a well-architected Rails application.

Unique Jobs, Throttles, Fuses and other Concurrency Controls.
Similarly to Repeating Jobs, demand is high for everything I'd bucket under "concurrency controls", which I'll say covers both enqueue and dequeue complexity. And these are the features that I think are legit, because there are other features below, under Queue Design, that I think are bunk. And these features are tough because they sit in counterbalance to overall performance: do you want to run jobs faster, or smarter?

Queue design and multi-queue execution pools.
I do think queue design is a place where lots of people do it wrong. I believe queues should be organized by maximum total latency SLO (latency_15s, latency_15m, latency_8h) and not by their purpose or dependencies (mailers, billing, api). And I think that informs that execution pools (e.g. thread pools) should be able to work from multiple queues and have independent concurrency configuration (e.g.
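The latency-SLO queue layout described above can be sketched with GoodJob's configuration, where semicolons separate isolated execution pools and ":N" sets each pool's thread count (check the GoodJob README for the current syntax). The thread counts here are illustrative, not a recommendation.

```ruby
# config/initializers/good_job.rb
# Each ";"-separated segment is an isolated thread pool; a pool can work
# multiple queues, and ":N" gives it its own thread count. Queue names use
# the latency-SLO naming from this post; thread counts are made up.
Rails.application.configure do
  config.good_job.queues = "latency_15s:5;latency_15m:3;latency_15m,latency_8h:1"
end
```

Note that the latency_15m queue appears in two pools: a pool dedicated to it, plus a shared pool that drains it alongside latency_8h — one way execution pools can work from multiple queues with independent concurrency.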
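The heartbeat idea mentioned above, paired with FOR UPDATE SKIP LOCKED, can be sketched in SQL. This is an illustrative pattern only, not GoodJob's schema: the jobs table, the locked_by/locked_at columns, and the 5-minute staleness window are all assumptions.

```sql
-- Illustrative dequeue using FOR UPDATE SKIP LOCKED (table and columns are
-- hypothetical). A worker claims one unclaimed job, or one whose claimant's
-- heartbeat (locked_at) has gone stale -- which is how a SIGKILLed worker's
-- job becomes runnable again without manual intervention.
UPDATE jobs
SET locked_by = 'worker-1', locked_at = now()
WHERE id = (
  SELECT id
  FROM jobs
  WHERE finished_at IS NULL
    AND (locked_at IS NULL OR locked_at < now() - interval '5 minutes')
  ORDER BY priority DESC, created_at ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id;
```

While a job runs, its worker would periodically refresh locked_at as a heartbeat; a worker that is SIGKILLed simply stops refreshing, and its job is reclaimed after the staleness window.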
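GoodJob's design of keeping schedules in configuration, separate from job classes (the autoloading-friendly approach mentioned above), looks roughly like this; the schedule key, cron expression, and job class are made up — see the GoodJob README for the exact options.

```ruby
# config/initializers/good_job.rb
# Schedules live in configuration rather than on the job class itself,
# so job classes are not autoloaded just to read their schedules.
# The entry below is illustrative.
Rails.application.configure do
  config.good_job.enable_cron = true
  config.good_job.cron = {
    nightly_cleanup: {          # arbitrary schedule key
      cron: "0 3 * * *",        # every day at 03:00
      class: "CleanupJob",      # hypothetical job class
    },
  }
end
```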
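As one concrete instance of the enqueue/dequeue controls bucketed above, GoodJob ships a Concurrency extension. This sketch uses its good_job_control_concurrency_with API (verify option names against the current README); the job class and key are hypothetical, and it requires Rails plus the good_job gem, so it is not runnable standalone.

```ruby
# A sketch of GoodJob's concurrency-control extension (job class and key
# are invented for illustration).
class BillingSyncJob < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  good_job_control_concurrency_with(
    total_limit: 1,                                # at most one enqueued-or-running job...
    key: -> { "billing_sync:#{arguments.first}" }  # ...per account argument ("unique job")
  )

  def perform(account_id)
    # sync billing for account_id
  end
end
```

A total_limit of 1 keyed per argument gives "unique jobs"; the same extension's enqueue/perform limits cover the throttling side — which is exactly the enqueue-and-dequeue complexity, and the performance trade-off, described above.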
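To make the retry_on/discard_on questions concrete, here is a hypothetical sketch (not GoodJob's or Active Job's actual implementation) of how a backend might classify a failed execution, so that each reason — retried, retry stopped, discarded, unhandled — can be reported and filtered separately in an exception tracker:

```ruby
# Hypothetical classification of a failed job execution. The constant and
# method names are invented for illustration.
SpecialError = Class.new(StandardError)

# e.g. the equivalent of `retry_on SpecialError, attempts: 3`
RETRY_LIMIT = 3

def classify_failure(error:, attempt:, retried_errors: [], discarded_errors: [])
  if discarded_errors.any? { |klass| error.is_a?(klass) }
    :discarded      # swallowed by discard_on -- should it still be reported?
  elsif retried_errors.any? { |klass| error.is_a?(klass) }
    attempt < RETRY_LIMIT ? :retried : :retry_stopped
  else
    :unhandled      # no handler matched; the job fails loudly
  end
end

# The error that exhausts the configured attempts stops retrying:
classify_failure(error: SpecialError.new, attempt: 3,
                 retried_errors: [SpecialError])
# => :retry_stopped
```

A backend that records this classification alongside each execution could then let teams choose, per reason, whether the error goes to Sentry/Bugsnag, stays in a review queue, or is silently dropped.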