Some months ago I set up a Debian 9 virtual machine on the VMware server of the SME I work for. I planned to enable several services on it:
- an IMAP server to locally store the email archive of the administration office
- a shared calendar and address book using the standard protocols CalDAV and CardDAV; while there are leaner servers for those, I chose to install Nextcloud as it would be far easier to use and integrate.
I picked Debian 9 as the operating system because it allows for a leaner and lighter installation when you start from its netinstall version. Knowing that the 40 GB I provisioned for 4 users will surely need to be expanded sometime in the future, I used the Logical Volume Manager (LVM).
Everything looked fine and ran smoothly for some weeks, then I started to find the machine hung.
As I’m the only full-time employee with some proficiency in IT, I am the de facto help desk for anything “computer related”: remote banking issues, updating Java on Windows 10, keeping installed the latest Firefox version that still supports the Java applets required by the remote banking, handling antivirus software, updates to non-administrative applications and so on. All this to say that I have root access to the VMware console, and I needed it to reset this VM (called posta).
I had to disable screen blanking on the text console (thanks, Stack Exchange!) to catch the bug, as I just can’t stare at a console for several hours hoping to spot it. So, after removing the timeouts, I left the console open for a while.
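The post does not say which blanking method was used (a `consoleblank=0` kernel boot parameter and `setterm -blank 0` are common options). Either way, a quick sanity check is to read the kernel's current timeout before leaving the console unattended. A minimal Python sketch, assuming a Linux guest that exposes `/sys/module/kernel/parameters/consoleblank` (value in seconds, 0 meaning the console never blanks); the function names are illustrative:

```python
from pathlib import Path

# Kernel parameter holding the console blanking timeout in seconds (0 = never blank).
# Assumption: the guest kernel exposes it under /sys/module/kernel/parameters.
CONSOLEBLANK = Path("/sys/module/kernel/parameters/consoleblank")


def blanking_disabled(raw: str) -> bool:
    """Return True if the raw consoleblank value means the text console never blanks."""
    return int(raw.strip()) == 0


def check(path: Path = CONSOLEBLANK) -> str:
    """Describe the current blanking setting, or report that the file is missing."""
    if not path.exists():
        return f"{path} not found (kernel does not expose consoleblank here)"
    raw = path.read_text()
    if blanking_disabled(raw):
        return "console blanking is disabled"
    return f"console blanks after {int(raw)} s"


if __name__ == "__main__":
    print(check())
```

Reading the value only confirms the setting; it does not change it, so the actual fix still has to be applied with whichever method was chosen.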
I didn’t have to wait too long. The next day I finally got a clue to follow.
Finally, something to look for: the bug is in vmxnet3_rq_rx_complete, a function of the vmxnet3 kernel network driver. Luckily it seems to be a fairly common bug; apart from the bug reports from Red Hat, which are (rightly so!) available only to the subscribers of their services, there is an interesting article in the VMware Knowledge Base titled «Linux VM fails with the error “kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!” (2151480)».
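Before applying any workaround it is worth confirming that the guest's network interfaces are really driven by vmxnet3. A minimal Python sketch that maps each interface to its kernel driver by resolving the `/sys/class/net/<iface>/device/driver` symlink; purely virtual interfaces such as `lo` have no `device` link and are skipped. The function name `nic_drivers` is hypothetical:

```python
import os
from pathlib import Path
from typing import Dict


def nic_drivers(sys_class_net: Path = Path("/sys/class/net")) -> Dict[str, str]:
    """Map interface name -> kernel driver name, for interfaces backed by a device."""
    drivers: Dict[str, str] = {}
    if not sys_class_net.is_dir():
        return drivers  # not a Linux sysfs layout
    for iface in sorted(sys_class_net.iterdir()):
        link = iface / "device" / "driver"
        if link.exists():
            # The symlink points at the driver's sysfs directory; its basename is the driver.
            drivers[iface.name] = os.path.basename(os.path.realpath(link))
    return drivers


if __name__ == "__main__":
    for name, driver in nic_drivers().items():
        print(f"{name}: {driver}")
```

On a guest configured with a VMXNET3 virtual adapter, the entry for that interface should read `vmxnet3`.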
As the VMware server also runs the main administrative applications, I didn’t think it was worth a call to the remote management, so I applied the workaround on posta.
As this bug shows up somewhat randomly, all I can do is leave the console open and keep an eye on it.
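A hard hang may keep the kernel message from ever reaching the disk, which is why watching the console is the safest bet. When the message does get logged (for example in `/var/log/kern.log`, or in a saved copy of the previous boot's kernel log from `journalctl -k -b -1` if persistent journaling is enabled), a small scan can spot a recurrence after a reset. A minimal Python sketch; the pattern comes from the KB article title, with the line number generalised on the assumption that it may differ between kernel builds, and `find_vmxnet3_bug` and `scan_file` are hypothetical names:

```python
import re
from pathlib import Path
from typing import Iterable, List

# Signature taken from the title of VMware KB 2151480. The line number (1413 there)
# is assumed to vary between kernel builds, so any number is accepted.
VMXNET3_BUG = re.compile(r"kernel BUG at drivers/net/vmxnet3/vmxnet3_drv\.c:\d+!")


def find_vmxnet3_bug(lines: Iterable[str]) -> List[str]:
    """Return the log lines reporting the vmxnet3 kernel BUG, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if VMXNET3_BUG.search(line)]


def scan_file(path: str) -> List[str]:
    """Scan a saved kernel log (e.g. /var/log/kern.log) for the vmxnet3 BUG line."""
    with Path(path).open(encoding="utf-8", errors="replace") as log:
        return find_vmxnet3_bug(log)
```

For example, `scan_file("/var/log/kern.log")` (run with enough privileges to read the file) returns an empty list when no occurrence was logged.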
I’ll let you know if it worked.
Everything’s fine till now, sir.