I just had to roll the site back 5 hours. From what I can tell, maintenance on the hypervisor led to a corrupted file system on the main instance. I tried to recover it, but it was too far gone, so I restored the latest backup and we’re running clean again.
Yep, a regular good ol’ slave database that gets replicated in real time from the master database.
But just as with RAID1, this should not be confused with a backup solution on its own, since any accidental deletion or bad write on the master gets replicated to the slave as well. It only comes in handy in situations like this one, where the master suffers some sort of hardware issue and you need an identical copy of the data ready to go.
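To make the RAID1 comparison concrete, here's a toy Python sketch. The `Primary` class and the plain dicts standing in for databases are made up for illustration; this is not how Postgres replication works internally. It only shows the principle: a replica mirrors every write, including an accidental delete, while a point-in-time backup keeps the old data.

```python
import copy


class Primary:
    """Toy 'master' database (hypothetical): a dict of rows plus attached replicas."""

    def __init__(self):
        self.rows = {}
        self.replicas = []

    def write(self, key, value):
        self.rows[key] = value
        for replica in self.replicas:
            replica[key] = value  # every write is pushed to the replicas right away

    def delete(self, key):
        del self.rows[key]
        for replica in self.replicas:
            del replica[key]  # deletes are replicated too, mistakes included


primary = Primary()
replica = {}
primary.replicas.append(replica)

primary.write("post:1", "Hello")
primary.write("post:2", "World")

# Point-in-time backup, taken before anything goes wrong.
backup = copy.deepcopy(primary.rows)

# Accidental deletion on the master: the replica follows along.
primary.delete("post:1")

print("replica:", replica)  # post:1 is gone here as well
print("backup: ", backup)   # post:1 is still recoverable here
```

The replica would save you if the master's disk died, because it holds the current data. Only the backup saves you from a bad write.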
I’ll admit I’m not quite as good with Postgres as I am with MySQL. An added layer of frustration is that Discourse is now packaged into Docker containers in a very tightly controlled setup. It’s actually a great setup and I’m quite thankful for the ease of management, but it makes redundancy a much larger task.
An alternative could be R1Soft Continuous Data Protection. It would be a bit more expensive, considering R1Soft has been a b*tch lately and keeps raising its license prices… but all in all, I can’t see HB incremental backups taking more than maybe a few seconds each, so you could practically back it up every 10 minutes.
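Why incremental backups can be that quick: after the first full copy, each run only has to store the blocks that changed. Here's a toy Python sketch of block-level incrementals, assuming fixed-size blocks compared by SHA-256 hash. `BLOCK_SIZE` and the helper functions are hypothetical, and this is not R1Soft's actual mechanism. Real CDP products may instead track changed blocks as they're written, which avoids rescanning the whole disk.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def block_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size block of the disk image."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def incremental_backup(previous_hashes: list[str], data: bytes):
    """Return only the blocks whose hash changed, plus the new hash list."""
    current = block_hashes(data)
    changed = {}
    for index, digest in enumerate(current):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
    return changed, current


def apply_increment(image: bytearray, changed: dict[int, bytes]) -> None:
    """Restore by writing the changed blocks over a copy of the last full backup."""
    for index, block in changed.items():
        start = index * BLOCK_SIZE
        image[start:start + len(block)] = block


# Initial full backup of a (tiny, fixed-size) disk.
disk = bytearray(b"AAAABBBBCCCCDDDD")
full_backup = bytes(disk)
hashes = block_hashes(full_backup)

# One byte changes on disk, so only block 1 needs to be copied next time.
disk[5] = ord("x")
changed, hashes = incremental_backup(hashes, bytes(disk))
print(sorted(changed))

# Full backup + increment reproduces the current disk.
restored = bytearray(full_backup)
apply_increment(restored, changed)
print(restored == disk)
```

The cost of each run scales with how much changed since the last one, not with the size of the whole disk. That's what makes a 10-minute interval realistic on a forum database that only sees a modest amount of writes.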