Hello,
as some of you may have noticed, we just had about 25 minutes of downtime due to the update to Lemmy 0.19.10.
Lemmy release notes: https://join-lemmy.org/news/2025-03-19_-_Lemmy_Release_v0.19.10_and_Developer_AMA
This won’t fix YouTube thumbnails for us, as YouTube banned all IPs belonging to our hosting provider.
We had intended to apply this update without downtime, as we were looking to get in the database migration that allows marking PMs as removed, which we need due to the recent spam waves.
Although this update contains database migrations, we expected to still be able to apply them in the background before updating the running software, as the database schema was backwards compatible between the versions. Unfortunately, once we started the migrations, the site started going down.
In the first few minutes we assumed that the migrations contained in this upgrade were somehow blocking more than intended but still processing, but it turned out that nothing was actually happening on the database side. Our database had deadlocked due to what appeared to be an orphaned transaction, which didn’t die even after we killed all Lemmy containers other than the one running the migrations.
While the orphaned transaction was still open, the pending schema migration was waiting for it to complete or be rolled back, so nothing was moving anymore. Since that original transaction also wasn’t making any progress, everything else started to fail as well. We’re not entirely sure why the original transaction stalled, as it was started about 30 seconds before the schema migration query, and the migration running at roughly the same time shouldn’t have broken it.
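For anyone curious what this looks like from the database side, the sketch below is a generic way to ask Postgres which sessions are blocked and which session is blocking them (Python with psycopg2; the connection string is a placeholder, and this is an illustration rather than the exact queries we ran). A blocker sitting in state "idle in transaction" with an old xact_start is the kind of orphaned transaction described above.

```python
# Rough sketch: list blocked backends and the sessions blocking them.
# Connection details are placeholders; adjust for your own setup.
import psycopg2

conn = psycopg2.connect("dbname=lemmy user=lemmy host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # For every backend waiting on a lock, show which session is blocking it,
    # what state the blocker is in, and when its transaction started.
    # pg_blocking_pids() is built into PostgreSQL since 9.6.
    cur.execute("""
        SELECT blocked.pid        AS blocked_pid,
               blocked.query      AS blocked_query,
               blocker.pid        AS blocking_pid,
               blocker.state      AS blocking_state,
               blocker.xact_start AS blocking_xact_start,
               blocker.query      AS blocking_last_query
        FROM pg_stat_activity AS blocked
        JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
        JOIN pg_stat_activity AS blocker ON blocker.pid = b.pid
    """)
    for row in cur.fetchall():
        print(row)

conn.close()
```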
Lemmy has a “replaceable” schema, which is applied separately from the regular database schema migrations and is re-run every time a DB migration occurs. We unfortunately did not consider this replaceable schema step in our planning; otherwise we would have realized that it would likely have a larger impact on the overall migration.
After we identified that the database had deadlocked, we resorted to restarting our postgres container and then running the migration again. Once we restarted the database, everything was back online in less than 30 seconds, including first running the remaining migrations and then starting up all containers again.
When we tested this process on our test instance prior to deploying it to the Lemmy.World production environment, we did not run into this issue. Everything worked fine with the backend services running Lemmy 0.19.9 and the database already upgraded to the Lemmy 0.19.10 schema, but the major difference there was the lack of user activity during the migration.
Our takeaway from this is to always plan for downtime for Lemmy updates that include any database migrations, as it does not appear to be possible to apply them “safely” even when they seem small enough to theoretically be doable without downtime.
Appreciate the transparency
Interesting, I’ve got some reading to do. Thanks for the links.
Best to have a fixed downtime window for any maintenance, like banks do.
we all do this in our spare time. if we had set working hours then it would be easy to do, but even then I don’t think a daily maintenance window would be necessary when we don’t make changes that frequently.
we believed this change to be doable without downtime; otherwise we would’ve announced it ahead of time.
this change is important for our anti-spam measures. especially if we tune them to be more aggressive, which might increase the false positive rate, we need to be able to distinguish removed PMs from user-deleted PMs in case we need to restore them at a later point.
because of that it’s a somewhat urgent change that we fit in where we had spare time available, so we can keep improving our efforts to combat PM spam effectively.
I understand, I was just thinking it would give you less stress if you had a clean window. An activity that is done in spare time should be enjoyable and not stress-inducing.
:)
Nah, I’d rather the admins upgrade when they want. It’s already a free service; no need to make it a full-time unpaid job.
This won’t fix YouTube thumbnails for us, as YouTube banned all IPs belonging to our hosting provider.
Isn’t there a way that YT thumbnails could be generated locally by the person posting them, who would then upload the thumbnail to LW?
(mobile) apps could do this, but I don’t think browser-based apps would be able to. the generation of YouTube thumbnails works by requesting the HTML content of the YouTube page and then extracting a metadata element from it, in which YouTube provides the actual preview image as a link. browsers restrict how you can interact with other websites for security reasons, and I don’t think this would be allowed there.
manually this is of course doable, but it’s rather cumbersome.
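to illustrate, here’s a rough sketch of that lookup in Python (simplified, and the video URL is a placeholder). this is roughly what a server does when generating the thumbnail, and it only works when the request comes from an IP that YouTube hasn’t blocked; a browser-based app can’t make this kind of cross-origin request to youtube.com directly.

```python
# Minimal sketch of the approach described above: fetch the video page's HTML
# and read the og:image metadata tag, which points at the preview image.
# Error handling and edge cases (consent pages, rate limiting, alternative
# attribute order) are left out for brevity.
import re
import urllib.request

def youtube_thumbnail_url(video_url: str) -> str | None:
    req = urllib.request.Request(
        video_url,
        # Some sites serve different content to unknown user agents.
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    # Look for <meta property="og:image" content="...">
    match = re.search(r'<meta\s+property="og:image"\s+content="([^"]+)"', html)
    return match.group(1) if match else None

# Placeholder video ID; replace with a real watch URL.
print(youtube_thumbnail_url("https://www.youtube.com/watch?v=VIDEO_ID"))
```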
Would probably be doable with a browser extension, but that’s quite a hassle.
Bummer
Oh good. It wasn’t me. I thought I somehow broke something.
Thanks!
Well done!
I appreciate you