I have a server that suddenly freezes and stays frozen until I reboot it. It’s a VM running on Proxmox.
I changed from CentOS 7 to 8, and downgraded MariaDB 10.5 to 10.4, and it keeps happening.
This is a Proxmox host running in a Ryzen VPS.
Any ideas where to look?
Waaa, downgrading from MariaDB 10.5 to MariaDB 10.4? They introduced a new auth system in 10.4, so you’re a hero; I wouldn’t do that myself.
Check the logs?
Any idea where to find them?
MariaDB [(none)]> show global variables like 'log_error';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_error     |       |
+---------------+-------+
This is the content of /var/log/
Well, maybe it’s the host?
Or you have queries that lock your database server up.
You haven’t specified an error log location in my.cnf; you need to.
You can also check live queries using mtop or show processlist;
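A minimal sketch of what that could look like; the drop-in path and log location below are typical CentOS defaults, not something from this thread, so adjust to your layout:

```ini
# Assumed path: /etc/my.cnf.d/server.cnf (any file mysqld reads works)
[mysqld]
log_error = /var/log/mariadb/mariadb.log
```

Make sure the directory is writable by the mysql user, restart MariaDB, and `show global variables like 'log_error';` should then report the path instead of an empty value.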
Thanks will do.
Do you mean mytop? Installing… but the problem happens mostly when I’m not at the computer.
What do you mean by frozen: the service, or the whole VM?
The whole VM.
Is it still accessible via SSH or VNC in that state, or not?
No SSH connection. Ping stops. VNC shows the login screen and nothing can be done there.
Can you match the timestamp to any syslog entries, and is there anything logged after the freeze?
Can’t find /var/log/messages in CentOS 8; I’m going to read up on where it lives in this version.
What else is running on it other than MariaDB?
New Relic monitoring agent, qemu agent. But it happened when I was using CentOS 7 with MariaDB 10.5 without those agents.
Which hardware specs/options did you use, especially for disk (thin-lvm, RAID, virtio, SCSI, etc.), network (virtio, local vs public IP, etc.), and memory (swap available)?
Host (it’s a VM too):
Plenty of RAM and swap usage at zero on both. No high load detected.
I will update when I set up the logs correctly.
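Most of those hardware questions can be answered from the command line. A sketch, assuming a Proxmox node plus a Linux guest; the VMID 100 is a placeholder, not the real one:

```shell
#!/bin/sh
# On the Proxmox node (as root), dump a guest's hardware options;
# replace the placeholder VMID 100 with one from: qm list
#   qm config 100   # shows disk bus (scsi/virtio), cache mode, NIC model, RAM
# Inside the guest, read-only checks for the memory/swap questions:
cat /proc/swaps                   # any swap configured?
grep -m1 MemTotal /proc/meminfo   # RAM the guest actually sees
```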
Install atop with a 30-second granularity (set LOGINTERVAL in /etc/sysconfig/atop) to see what’s going on. Sounds like you’ve got a spike in load that you’re misattributing to MariaDB.
Next time it goes down, flip through the log with
atop -r /var/log/atop/atop_YYYYMMDD to see what was happening prior to the lockup.
Should I run systemctl enable/start atop after installing?
Yup, after dropping LOGINTERVAL from 600 to 30.
systemctl enable --now atop does both.
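Put together, the atop setup above looks roughly like this on CentOS (the install and systemctl lines are commented out so the interval edit can be dry-run on a throwaway copy of the config):

```shell
#!/bin/sh
# Real setup (as root) would be:
#   yum install -y atop
#   sed -i 's/^LOGINTERVAL=.*/LOGINTERVAL=30/' /etc/sysconfig/atop
#   systemctl enable --now atop
# Dry-run of the interval edit against a sample config:
cfg=$(mktemp)
printf 'LOGOPTS=""\nLOGINTERVAL=600\nLOGGENERATIONS=28\n' > "$cfg"
sed 's/^LOGINTERVAL=.*/LOGINTERVAL=30/' "$cfg"
rm -f "$cfg"
# After a freeze, replay that day's log and page back to just before it:
#   atop -r /var/log/atop/atop_YYYYMMDD
```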
Host (it’s a VM too):
So you are using nested virt? (I think I now remember something in the other thread about the IPs…)
Do you have other VMs running in parallel on your Proxmox?
My bet would be on something like hitting IO limits: the system is no longer able to read or write properly. MariaDB does not have to be the direct cause, as
@nem already wrote; it could simply add to the problem.
Do you know what the underlying storage system is on the real host node? How did you set up your storage? ZFS? Thin-LVM? Or just plain ext storage?
Did I use “nooice” correctly?
Yes to all, I have a web server in another guest.
Not sure. Maybe @seriesn can comment.
I installed Debian with Proxmox on top, on one big partition.
No idea about lvm stuff, that’s something pending to learn.
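Since the layout is unknown, here is a read-only sketch for identifying it; the LVM and ZFS tools are only invoked if they exist on the box:

```shell
#!/bin/sh
# Identify the storage stack without changing anything.
cat /proc/partitions                 # raw block devices and partitions
if command -v lvs >/dev/null 2>&1; then
    lvs 2>/dev/null || echo "lvs needs root"         # thin-lvm volumes show up here
fi
if command -v zpool >/dev/null 2>&1; then
    zpool status 2>/dev/null || echo "zpool needs root"   # ZFS pools, if any
fi
```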
Looks like LVM. Does anything happen when you boot via rescue mode?
It’s already a production server, I can’t restart it. I will need to migrate the database first.
I will try to configure the logs properly in a few hours; maybe they’ll catch something useful if it happens again. Fortunately I have another little server to use temporarily.
No, mtop, to monitor the MySQL queries.
I reinstalled again last night. Running CentOS 7 and MariaDB 10.5, let’s see how it goes.
All logs recommended here are configured and it’s running Nixstats agent too.
Let’s see how it goes.
Thank you, everyone!
Make sure you have a syslog daemon like rsyslog installed and running.
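A quick sketch for verifying that (assumes systemd, as on CentOS 7/8). Note that journald also keeps the previous boot’s log when persistent, so `journalctl -b -1` is worth checking right after a hard freeze:

```shell
#!/bin/sh
# Is a syslog daemon actually active?
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active rsyslog 2>/dev/null || echo "rsyslog is not active"
else
    echo "no systemctl here; try: ps -e | grep -i syslog"
fi
```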
It happened again.
This is what I got from the console:
Output of /var/log/messages:
Sep 30 08:01:01 db3 systemd: Created slice User Slice of root. (full log on Pastebin; the server went down at 11:09 AM)
First screenshot is a kernel panic. Is your microcode up to date? Firmware? Anything tasty in /var/log/boot.log? I’ve seen bad memory, for example, result in sporadic panics under load. As an example, these lines were enough to deduce the memory was bad:
[ 0.000000] gran_size: 64K chunk_size: 16M num_reg: 10 lose cover RAM: 238M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 32M num_reg: 10 lose cover RAM: -18M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 64M num_reg: 10 lose cover RAM: -18M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 128M num_reg: 10 lose cover RAM: -16M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: -16M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 512M num_reg: 10 lose cover RAM: -16M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 1G num_reg: 10 lose cover RAM: -512M
[ 0.000000] *BAD*gran_size: 64K chunk_size: 2G num_reg: 10 lose cover RAM: -1536M
Client had memory replaced and his server has been humming ever since.
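Some read-only checks on the microcode angle; the microcode_ctl package name is the CentOS/RHEL one, and memtest86+ from the boot menu remains the real memory test:

```shell
#!/bin/sh
# What the running kernel reports about microcode right now:
grep -m1 microcode /proc/cpuinfo || echo "no microcode field in /proc/cpuinfo"
dmesg 2>/dev/null | grep -i -m1 microcode || echo "no microcode lines visible (dmesg may need root)"
# To pull vendor microcode on CentOS/RHEL (as root):
#   yum install -y microcode_ctl && reboot
```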
Not sure about that. At least there are no package updates available on either the host or the guest.
Everything says OK:
[root@db3 ~]# cat /var/log/boot.log
[ OK ] Started Show Plymouth Boot Screen.
(full output on Pastebin)
Is it possible there are memory problems? This failing server is a VM, and another VM on the same host is running completely fine. The host is a VM too.