Emergency Mode Issue on SUSE Linux HPC System

Question

I have a head node and 4 worker nodes for high-performance computing (HPC).

Recently I had to shut the system down for maintenance at our data center. When I tried to turn it back on, I got this error message:

[ 5.215623][ C14] nvme0: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2)
You are in emergency mode. After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" or "exit" to boot into default mode.
Give root password for maintenance
(or press Control-D to continue):

The system seems to be stuck in a loop.

I initially pressed Ctrl+D, as the prompt suggests, to boot into the default mode, but it just cycles back to the same emergency-mode error every time.

A couple of things that might be relevant:

- Without my realizing it, an external USB device had been left plugged into the back of the system when I powered it on after maintenance. I'm not sure whether this could cause the issue, but it's worth mentioning.
- Each node requires two power cables plugged into the power adapter. While reconnecting everything, I noticed that one of the two power cables for one node was not connected to a power source. I have since fixed this, and all nodes are now receiving power as required.

I'm not a Linux expert, so I'm a bit lost as to what could be causing this problem. I've searched for solutions online, but nothing I've tried has worked.

If any of you have encountered a similar issue or have expertise with SUSE Linux and HPC systems, I would greatly appreciate any advice or guidance on how to troubleshoot and resolve this "emergency mode" problem.

I have been told that it might be caused by a misconfigured /etc/fstab. Does anybody have step-by-step instructions for testing this theory?
– Train_Learn_2350, Commented Aug 8, 2023 at 15:04
What is the full message that appears on the screen? Do you see "Give root password for maintenance (or press Control-D to continue):"? If so, type the root password instead of pressing Ctrl-D, and you'll get a root shell. From there you can view and edit /etc/fstab.
– aviro, Commented Aug 8, 2023 at 15:26
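Once aviro's suggestion gets you to a root shell, here is a minimal Python sketch for testing the /etc/fstab theory. It assumes python3 is installed and the root filesystem is mounted in emergency mode. The script name check_fstab.py is hypothetical. The script lists fstab entries that refer to a local device (UUID=, LABEL=, PARTUUID=, PARTLABEL= or a /dev/ path) which does not exist right now and which lack the `nofail` option. By default, systemd treats such a mount as required for booting, so a missing device can make local-fs.target fail and drop the system into emergency mode.

```python
#!/usr/bin/env python3
"""check_fstab.py (hypothetical name): list /etc/fstab entries whose device
is absent and that are not marked 'nofail'.

Sketch only: it checks that the device node exists, not that the filesystem
on it is healthy or mountable. Labels containing escaped characters (such as
spaces written as \\040) are not decoded.
"""
import os
import sys

# fstab "TAG=value" specs resolve through these udev-maintained symlink directories.
TAG_DIRS = {
    "UUID": "/dev/disk/by-uuid",
    "LABEL": "/dev/disk/by-label",
    "PARTUUID": "/dev/disk/by-partuuid",
    "PARTLABEL": "/dev/disk/by-partlabel",
}


def device_path(spec):
    """Return the /dev path for a local block-device spec, or None for anything
    else (NFS 'server:/export', proc, tmpfs, ...), which this sketch skips."""
    if "=" in spec:
        tag, value = spec.split("=", 1)
        if tag in TAG_DIRS:
            return os.path.join(TAG_DIRS[tag], value.strip('"'))
        return None
    if spec.startswith("/dev/"):
        return spec
    return None


def check_fstab(lines, exists=os.path.exists):
    """Return (mount_point, spec) pairs for entries without 'nofail'
    whose device path does not exist right now."""
    missing = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: not enough fields to interpret
        spec, mount_point = fields[0], fields[1]
        options = fields[3].split(",") if len(fields) > 3 else []
        path = device_path(spec)
        if path is None or "nofail" in options:
            continue
        if not exists(path):
            missing.append((mount_point, spec))
    return missing


def main(path="/etc/fstab"):
    try:
        with open(path) as f:
            missing = check_fstab(f)
    except OSError as err:
        print("cannot read %s: %s" % (path, err))
        return
    if not missing:
        print("all required devices in %s are present" % path)
    for mount_point, spec in missing:
        print("MISSING: %s (mount point %s) and no 'nofail' option" % (spec, mount_point))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/etc/fstab")
```

Run it as `python3 check_fstab.py`, or pass another fstab path as the first argument. For any entry it reports, you could comment the line out or add `nofail` to its options with an editor such as vi, then try `systemctl default` as the emergency prompt suggests. If the root filesystem is mounted read-only, you need to run `mount -o remount,rw /` before editing. If your util-linux version provides it, `findmnt --verify` performs a similar and more thorough check of /etc/fstab.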
Here is the full message that appears on the screen: the nvme0 "Invalid Field in Command" line followed by the emergency-mode text quoted in the question, ending with "Give root password for maintenance (or press Control-D to continue):". I did press Control-D, but the system reboots with the same message.
– Train_Learn_2350, Commented Aug 8, 2023 at 15:44
Read my comment again: you can just type the root password instead of pressing Ctrl-D, and you'll get into the shell.
– aviro, Commented Aug 8, 2023 at 15:47
Please add new information by editing the question; searching through comments is difficult and error-prone. After getting into the shell, did you try the suggested commands? Did journalctl -xb provide any useful information?
– doneal24, Commented Aug 8, 2023 at 17:06
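Regarding the `journalctl -xb` question: the full output can be long. The following minimal sketch relies on the same assumption that python3 is available in the emergency shell. It reads the current boot's journal with `journalctl -b --no-pager` and prints only the lines that contain messages systemd commonly logs when an fstab mount fails, such as "Timed out waiting for device" and "Dependency failed for". The pattern list is an illustrative choice, not an exhaustive one.

```python
#!/usr/bin/env python3
"""Print lines from the current boot's journal that commonly accompany a
failed /etc/fstab mount. The pattern list is an illustrative assumption."""
import subprocess

PATTERNS = (
    "Timed out waiting for device",
    "Dependency failed for",
    "Failed to mount",
    "emergency",
)


def interesting_lines(lines, patterns=PATTERNS):
    """Return the lines that contain any pattern (case-insensitive)."""
    lowered = [p.lower() for p in patterns]
    return [line for line in lines if any(p in line.lower() for p in lowered)]


def current_boot_journal():
    """Return this boot's journal (journalctl -b) as a list of lines."""
    result = subprocess.run(
        ["journalctl", "-b", "--no-pager"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output to str; subprocess.run needs Python 3.5+
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    try:
        journal = current_boot_journal()
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run journalctl: %s" % err)
    else:
        for line in interesting_lines(journal):
            print(line)
```

A simpler built-in alternative is `journalctl -b -p err`, which shows only error-priority messages from the current boot. The pattern filter above is a narrower view aimed specifically at the fstab theory.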
