It’s possible something like syssnap might catch it. I mean, monitoring systems all work on polling as far as I know, so it’s surely happening for long enough periods that your current monitoring picks up on it.
My logic may not be sound.
Maybe try something like this: How to Monitor Disk IO on Linux Server with Iotop and Cron - BinaryTides. After you see it spike in your monitoring, check the iotop logs you’ll be keeping to track down the cause.
I’d also suggest setting up sar, if you haven’t. It’ll help you break things down further, depending on your virtualization. ioperf/iotop/binfalse may all work for you; you just need to pin it down. Have cron send you a note, etc.
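A minimal sketch of the iotop-plus-cron approach from that BinaryTides article, assuming iotop is installed and run from root’s crontab (iotop needs root); the schedule and log path are illustrative choices, not from the original article:

```shell
# Hypothetical root crontab entry (edit with `crontab -e` as root).
# Every 5 minutes: run iotop in batch mode (-b), show only processes
# actually doing I/O (-o), take 3 samples (-n 3), suppress headers
# and totals (-qqq), and append everything with a timestamp.
*/5 * * * * (date; iotop -b -o -n 3 -qqq) >> /var/log/iotop.log 2>&1
```

When your monitoring shows a spike, grep the log around that timestamp to see which process was hammering the disk.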
apt-get install glances
glances
apt-get install iotop
iotop
What do these do for your node specifically?
It’s basically a way of controlling how pages get written between RAM and disk. The lower the number, the sooner it starts thrashing a bit more, but it ensures things aren’t left in an ugly state. I run softraid on a lower-end CPU, so I’m a bit paranoid about my data safety.
I also want the most performance possible from this ~12-year-old hardware, so as long as I/O isn’t an issue, I keep swappiness set very low so it won’t page out unnecessarily.
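For reference, a low swappiness looks like this as a sysctl fragment. The filename and the value 10 are illustrative choices, not necessarily what the poster runs:

```shell
# /etc/sysctl.d/99-swappiness.conf -- hypothetical filename.
# vm.swappiness=10 tells the kernel to strongly prefer dropping
# page cache over swapping out process memory; 0-100 scale,
# distro default is usually 60.
vm.swappiness=10
```

Apply it with `sysctl --system` (as root) or a reboot, and check the live value with `cat /proc/sys/vm/swappiness`.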
I’d definitely recommend Netdata to help track it down. One of the charts is disk IO per application, which should help you figure out where the disk activity is coming from.
By default it’ll keep about an hour of data at 1 second granularity (consumes ~10 MB RAM for one hour of data for 1000 metrics) which should be sufficient to track down something like this, but you can increase that if needed, or log the data to Prometheus for long-term storage.
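If an hour isn’t enough, the retention knob lives in netdata.conf. In older Netdata versions the option is `history`, measured in entries per metric (one entry per update interval); the value below is illustrative:

```ini
# /etc/netdata/netdata.conf
[global]
    # With the default 1-second update interval, 3600 entries per
    # metric is ~1 hour. Raise for longer in-memory retention at
    # the cost of proportionally more RAM.
    history = 14400
```

Restart Netdata after editing; for genuinely long-term storage, exporting to Prometheus as mentioned above is the better route.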
Here’s an older thread about it: Netdata - Awesome System Monitoring Tool
NetData works very well; I’ve been using it for the last month.
Secondly, are you running this on the VMs and / or the host?
Hypervisor.
Interesting, why not inside the VMs?
I only control these VMs personally. I mostly aim for something that’s easy to push out and keep things playing nicely.
I don’t play with other folks’ builds; as long as you aren’t abusing services, just keep on keeping on.
it reads like a rapist’s police file
But Linux has a code of conduct now!!!
If you did control those VMs, would you change their swappiness value to the ones you posted?
Depends on their use; I run my VPS pretty tight but without overrunning them most of the time. All of the schedulers on the different services may work with, or against, each other. I tend to run my lower-end client services with default settings except basic TCP tuning and possibly the RAM/etc. scrubbing as above.
Most of the time I just tune the hypervisor to handle my VPS needs. I’ll tune it as time goes on, but the above works really well with a handful of ~2GB services and a behemoth webserver that basically caches all of the hundred-some domains, because meh.
Do you have Swap on zvols? What is the average RAM utilization %?
What is your ZFS ARC max size?
There are some known bad edge cases around OOM when using a zvol as swap. From the 5.2 or 5.3 installer onwards you can leave a couple of GB (or whatever) as un-partitioned space where you can manually create a raw swap partition (Linux).
I haven’t had the time to take a deeper look at the issue.
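The raw-swap-partition route mentioned above ends up as a plain fstab entry once the partition exists. The device name here is purely a placeholder for whatever partition you carve out of the un-partitioned space:

```shell
# /etc/fstab -- hypothetical raw swap entry.
# /dev/sda4 is a placeholder. Format the spare partition first with
# `mkswap /dev/sda4` and enable it with `swapon /dev/sda4` (as root);
# this line makes it come back after reboot.
/dev/sda4  none  swap  sw  0  0
```

Because it bypasses ZFS entirely, swap on a raw partition avoids the zvol-swap OOM edge cases at the cost of that space not being part of the pool.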
Right now memory is at 95%, but that could be because I’m taking a snapshot to move a VM. The VMs should use 14GB of RAM in total.
ZFS ARC max size is 16GB (the host has 32GB RAM)
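For context, a 16GB ARC cap on a ZFS-on-Linux host is set via a kernel module parameter; a sketch of what that looks like (17179869184 is 16 x 2^30 bytes):

```shell
# /etc/modprobe.d/zfs.conf -- limit the ARC to 16 GiB.
# Note: on a 32GB host this leaves only ~16GB for VMs and the OS,
# which lines up with the ~95% memory usage described above.
options zfs zfs_arc_max=17179869184
```

It can also be changed live (as root) with `echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max`; the modprobe file makes it persistent, typically after regenerating the initramfs (`update-initramfs -u` on Debian-based systems).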
2/2 VMs have the “discard” setting set to ON. Is that a problem?
Disks are HDDs.