How to Trace High Disk IO?

It’s possible something like syssnap might catch it. Monitoring systems all work on polling as far as I know, so if your current monitoring is picking it up, the spikes are presumably lasting long enough for another polling tool to catch them too.

My logic may not be sound.

2 Likes

Maybe try something like this: How to Monitor Disk IO on Linux Server with Iotop and Cron - BinaryTides. After you see it spike in your monitoring, check the iotop logs you’ll be keeping to track down the cause.
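Something like this in /etc/cron.d/ is a rough sketch of that approach (the file name, log path and one-minute cadence are just examples; tweak the flags to taste):

# /etc/cron.d/iotop-logging (hypothetical file name): sample active IO once a minute
# -b batch mode, -o only processes actually doing IO, -t timestamps,
# -qqq drop headers and summary lines, -n 60 take 60 one-second samples then exit
* * * * * root /usr/sbin/iotop -botqqq -n 60 >> /var/log/iotop.log 2>&1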

2 Likes

I’d also suggest setting up sar, if you haven’t. It’ll help you break it down further, depending on your virtualization. ioperf/iotop/binfalse may all work for you; you just need to pin it down. Have cron send you a note, and so on.
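A rough sketch of getting sar going, assuming Debian/Ubuntu (package names and log paths differ elsewhere):

apt-get install sysstat

# Debian/Ubuntu ship the collector disabled; turn it on
sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat

# live per-device IO, one sample per second for five seconds
sar -d -p 1 5

# look back through today's collected history for block devices
sar -d -p -f /var/log/sysstat/sa$(date +%d)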

3 Likes

apt-get install glances

glances


apt-get install iotop

iotop
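If you’re watching interactively, a couple of handy iotop variations (just my usual flags, nothing authoritative):

# only show processes/threads that are actually doing IO
iotop -o

# per-process view with accumulated totals since iotop started
iotop -oPa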


2 Likes

What do these do for your node specifically?

It’s basically a way to control how aggressively dirty pages get written from RAM out to disk. The lower the number, the sooner writeback kicks in, so you trade a bit more thrashing for never letting things get into an ugly state. I run softraid on a lower-end CPU, so I am a bit paranoid about my data safety.

I also want the most performance possible out of this ~12-year-old hardware, so as long as I/O isn’t an issue I keep swappiness set very low, so it won’t page out unless it has to.
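For reference, the knobs in question are plain sysctls; the numbers below are only illustrative, not a recommendation for your box:

# /etc/sysctl.d/99-io-tuning.conf (illustrative values only)
vm.swappiness = 1                # don't page out unless memory pressure forces it
vm.dirty_background_ratio = 5    # start background writeback earlier
vm.dirty_ratio = 10              # block writers sooner so dirty pages can't pile up

# apply without rebooting
sysctl --system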

I’d definitely recommend Netdata to help track it down. One of the charts is disk IO per application, which should help you figure out where the disk activity is coming from.

By default it’ll keep about an hour of data at 1 second granularity (consumes ~10 MB RAM for one hour of data for 1000 metrics) which should be sufficient to track down something like this, but you can increase that if needed, or log the data to Prometheus for long-term storage.
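If an hour isn’t enough, the classic round-robin database lets you raise it in netdata.conf (the value below is just an example; newer releases use the dbengine and different settings):

# /etc/netdata/netdata.conf
[global]
    # seconds of history kept at 1-second granularity (default is 3600)
    history = 14400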

Here’s an older thread about it: Netdata - Awesome System Monitoring Tool

4 Likes

Netdata works very well; I’ve been using it for the last month.

1 Like

Secondly, are you running this on the VMs and/or the host?

1 Like

Hypervisor.

1 Like

Interesting, why not inside the VMs?

1 Like

@Munzy

Glances is awesome, and InfluxDB is blazing fast while still being light on resources.
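Rough sketch of wiring the two together, assuming a recent Glances with the InfluxDB export module (section and key names follow the stock glances.conf, but check your version):

# /etc/glances/glances.conf
[influxdb]
host=localhost
port=8086
user=root
password=root
db=glances

# run glances with the export enabled
glances --export influxdb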

I only control these VMs personally, so I mostly aim for something that’s easy to push out and keeps things playing nicely.

I don’t mess with other folks’ builds; as long as you aren’t abusing services, just keep on keeping on.

1 Like

it reads like a rapist’s police file

1 Like

But Linux has a code of conduct now!!!

2 Likes

If you did control those VMs, would you change their swappiness values to the ones you posted?

Depends on their use; I run my VPSes pretty tight, but without overrunning them most of the time. The schedulers on the different services may work with, or against, each other. I tend to run my lower-end client services with default settings apart from basic TCP tuning (rough sketch below) and possibly the RAM/writeback scrubbing described above.

Most of the time I just tune the hypervisor to handle my VPS needs. I’ll keep tuning it as time goes on, but the above works really well with a handful of ~2GB services and a behemoth webserver that basically caches all of the hundred-some domains, because meh.
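To give a rough idea of what “basic TCP tuning” looks like, a sketch with purely illustrative values (not a recipe, and not necessarily what anyone here is running):

# /etc/sysctl.d/99-tcp-tuning.conf (illustrative)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216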

2 Likes

Do you have swap on zvols? What is your average RAM utilization (%)?
What is your ZFS ARC max size?
There are some known bad edge cases around OOM when using a zvol as swap. From the 5.2 or 5.3 installer onwards you can leave a couple of GB or whatever as unpartitioned space and manually create a raw swap partition there (Linux).
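A few quick checks along those lines, assuming ZFS on Linux (the 8 GB cap below is only an example value):

# is swap sitting on a zvol?
swapon --show
zfs list -t volume

# current ARC size and configured maximum (bytes)
awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# cap the ARC, e.g. at 8 GB (example value), then reload zfs or reboot
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u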

I haven’t had the time to take a deeper look at the issue.

Right now memory is at 95% :dizzy_face: But it could be because I’m taking a snapshot to move a VM. VMs should use 14GB RAM in total.

ZFS ARC max size is 16GB (the host has 32GB RAM)

2/2 VMs have the “discard” setting set to ON. Is that a problem?

Disks are HDDs.

https://www.linuxatemyram.com/
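In short, a big chunk of that 95% is likely reclaimable. A quick sanity check (note that the ZFS ARC shows up as “used” rather than cache in free, but it still shrinks under memory pressure):

# the "available" column is the one that matters, not "free"
free -h

# how much of "used" is actually ARC (reclaimable under pressure)
awk '/^size/ {printf "ARC: %.1f GiB\n", $3/1073741824}' /proc/spl/kstat/zfs/arcstats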