Bug chasing

Some months ago I set up a Debian 9 virtual machine on the VMware server of the SME I work for. I planned to enable several services on it:

  • an IMAP server to locally store the email archive of the administration office
  • a shared calendar and address book using the standard protocols CalDAV and CardDAV; while there are leaner servers for those, I chose to install Nextcloud as it would be far easier to use and integrate.

I picked Debian 9 as the operating system because it allows for a leaner and lighter installation when you start from its netinstall image. Knowing that the 40 GB I provisioned for 4 users will surely need to be expanded sometime in the future, I used Logical Volume Manager (LVM).

Everything looked fine and ran smoothly for some weeks, then I started finding the machine hung.

As I’m the only full-time employee with some proficiency in IT, I am the de facto help desk for anything “computer related”: remote banking issues, updating Java on Windows 10, keeping installed the latest Firefox version that still supports the Java applets required by the remote banking, handling antivirus software, updating non-administrative applications and so on. All this to say that I have root access on the VMware console, which I needed in order to reset this VM (called posta).

To catch the bug I had to disable screen blanking on the text console (thanks, Stack Exchange!), as I just can’t stare at a console for several hours hoping to see it happen. So, after removing the timeouts, I left the console open for a while.
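For reference, here is a minimal sketch of one way to remove those timeouts, not necessarily the one from the Stack Exchange answer. It assumes a Linux virtual console and writes the console escape sequences documented in console_codes(4): ESC [ 9 ; n ] sets the blank timeout in minutes and ESC [ 14 ; n ] sets the VESA powerdown interval, with 0 disabling each. The function names and the default /dev/tty1 path are my own choices for illustration.

```python
ESC = "\x1b"


def blank_timeout_sequence(minutes: int) -> str:
    """Build the Linux console sequence that sets the screen blank timeout.

    A value of 0 disables blanking altogether.
    """
    if minutes < 0:
        raise ValueError("minutes must be zero or positive")
    return f"{ESC}[9;{minutes}]"


def powerdown_sequence(minutes: int) -> str:
    """Build the Linux console sequence that sets the VESA powerdown interval.

    A value of 0 disables the powerdown.
    """
    if minutes < 0:
        raise ValueError("minutes must be zero or positive")
    return f"{ESC}[14;{minutes}]"


def disable_blanking(tty_path: str = "/dev/tty1") -> None:
    """Write both sequences to the given console (assumed path, usually needs root)."""
    with open(tty_path, "w") as tty:
        tty.write(blank_timeout_sequence(0) + powerdown_sequence(0))


# Usage, as root on the VM: disable_blanking("/dev/tty1")
```

The setting does not survive a reboot, so it has to be applied again after every reset of the VM.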

I didn’t have to wait long. The day after, I finally got a clue to follow:

Oh, it looks like I’ve found a bug in a kernel driver

Finally something to look for: the bug is in the vmxnet3 kernel driver, in the function vmxnet3_rq_rx_complete. Luckily it seems to be a fairly common bug; leaving aside the bug reports from Red Hat that are (rightly so!) available only to subscribers of their services, there is an interesting article in the VMware Knowledge Base titled «Linux VM fails with the error “kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!” (2151480)».

As the VMware server also runs the main administrative applications, I didn’t think it was worth a call to the remote management, so I applied the workaround on posta.

As this bug shows up somewhat randomly, I can only leave the console open and keep an eye on it.

I’ll let you know if it worked.

