25

I've moved a server from one mainboard to another due to a disk controller failure.

Since then I've noticed that about 25% of one core is constantly spent servicing IRQs, but I haven't managed to find out which IRQ is responsible.

The kernel is Linux 2.6.18-194.3.1.el5 (CentOS). mpstat -P ALL shows:

18:20:33     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
18:20:33     all    0,23    0,00    0,08    0,11    6,41    0,02    0,00   93,16   2149,29
18:20:33       0    0,25    0,00    0,12    0,07    0,01    0,05    0,00   99,49    127,08
18:20:33       1    0,14    0,00    0,03    0,04    0,00    0,00    0,00   99,78      0,00
18:20:33       2    0,23    0,00    0,02    0,03    0,00    0,00    0,00   99,72      0,02
18:20:33       3    0,28    0,00    0,15    0,28   25,63    0,03    0,00   73,64   2022,19

This is the content of /proc/interrupts:

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:        245          0          0    7134094    IO-APIC-edge  timer
  8:          0          0         49          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 66:         67          0          0          0   IO-APIC-level  ehci_hcd:usb2
 74:     902214          0          0          0         PCI-MSI  eth0
169:          0          0         79          0   IO-APIC-level  ehci_hcd:usb1
177:          0          0          0    7170885   IO-APIC-level  ata_piix, b4xxp
185:          0          0          0      59375   IO-APIC-level  ata_piix
NMI:          0          0          0          0
LOC:    7104234    7104239    7104243    7104218
ERR:          0
MIS:          0

How can I identify which IRQ is causing the high CPU usage?

Edit:

Output from dmesg | grep -i b4xxp

wcb4xxp 0000:30:00.0: probe called for b4xx...
wcb4xxp 0000:30:00.0: Identified Wildcard B410P (controller rev 1) at 00012000, IRQ 177
wcb4xxp 0000:30:00.0: VPM 0/1 init: chip ver 33
wcb4xxp 0000:30:00.0: VPM 1/1 init: chip ver 33
wcb4xxp 0000:30:00.0: Hardware echo cancellation enabled.
wcb4xxp 0000:30:00.0: Port 1: TE mode
wcb4xxp 0000:30:00.0: Port 2: TE mode
wcb4xxp 0000:30:00.0: Port 3: TE mode
wcb4xxp 0000:30:00.0: Port 4: TE mode
wcb4xxp 0000:30:00.0: Did not do the highestorder stuff
wcb4xxp 0000:30:00.0: new card sync source: port 3
2
  • 1
    Is this an Asterisk server? What does dmesg | grep -i b4xxp show? Commented Nov 23, 2011 at 17:39
  • @TimKennedy: yes it is. I've edited my question to include the dmesg output. Commented Nov 23, 2011 at 20:10

4 Answers

23

Well, since you're specifically asking how to know which IRQ is responsible for the number in mpstat, you can assume it's not the local timer interrupt (LOC), since those counts are nearly equal on every CPU, and yet mpstat shows some of those CPUs at 0 %irq.

That leaves IRQ 0, which is the system timer, and which you can't do anything about, and IRQ 177, which is tied to your b4xxp driver.

My guess is that IRQ 177 would be your culprit.
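
The elimination above can also be done mechanically. The following Python is a minimal sketch, not part of the original answer: it assumes the 2.6-era /proc/interrupts layout shown in the question, and the names `parse_interrupts`, `skewed_irqs` and `SAMPLE` are hypothetical. It keeps only the rows whose count on the chosen CPU exceeds the largest count on any other CPU, which drops evenly spread rows such as LOC. Note that the counters are totals since boot, so a large number is only a hint, not a current rate.

```python
# Snapshot copied from the question; on a live system read /proc/interrupts instead.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        245          0          0    7134094    IO-APIC-edge  timer
  8:          0          0         49          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 66:         67          0          0          0   IO-APIC-level  ehci_hcd:usb2
 74:     902214          0          0          0         PCI-MSI  eth0
169:          0          0         79          0   IO-APIC-level  ehci_hcd:usb1
177:          0          0          0    7170885   IO-APIC-level  ata_piix, b4xxp
185:          0          0          0      59375   IO-APIC-level  ata_piix
NMI:          0          0          0          0
LOC:    7104234    7104239    7104243    7104218
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())              # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        head = fields[:ncpu]
        if len(head) != ncpu or not all(f.isdigit() for f in head):
            continue                          # e.g. ERR/MIS rows hold a single total
        table[label.strip()] = ([int(f) for f in head], " ".join(fields[ncpu:]))
    return table


def skewed_irqs(text, cpu):
    """List (label, excess, description) for IRQs concentrated on one CPU.

    excess = count on `cpu` minus the largest count on any other CPU.
    Rows with no positive excess (evenly spread or idle) are dropped.
    """
    found = []
    for label, (counts, desc) in parse_interrupts(text).items():
        others = counts[:cpu] + counts[cpu + 1:]
        excess = counts[cpu] - max(others, default=0)
        if excess > 0:
            found.append((label, excess, desc))
    return sorted(found, key=lambda row: row[1], reverse=True)


# CPU3 is the core that mpstat reports at about 25 %irq.
for row in skewed_irqs(SAMPLE, cpu=3):
    print(row)
```

On the question's snapshot this leaves IRQ 177, IRQ 0 and the much smaller IRQ 185 as the only candidates for CPU3.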

If this is causing a problem, and you would like to change the behavior you see, try:

  1. disabling the software that uses that card, and seeing whether the interrupts decrease.

  2. removing that card from the system, unloading the driver, and seeing whether there's an improvement.

  3. moving that card to another slot and seeing whether that helps.

  4. checking for updated drivers or patches for the software.

If it's not a problem, and you were just curious, then carry on. :)

2
  • The problem arose after changing the MB. Maybe moving the card to another PCI slot is worth trying. Commented Nov 23, 2011 at 20:12
  • 1
    check this page: voip-info.org/wiki/view/Asterisk+PCI+bus+Troubleshooting good info for identifying problems, including IRQ issues. Commented Nov 23, 2011 at 20:35
9
watch -n1 -d cat /proc/interrupts 
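
What `watch -d` highlights visually can also be computed as a rate. The following Python is a minimal sketch under the same layout assumption as the output in the question; the function names `parse_counts`, `interrupt_rates` and `sample` are hypothetical. It takes two snapshots of /proc/interrupts and reports, per IRQ and per CPU, how many interrupts fired per second in between, which can be compared with the intr/s column of mpstat.

```python
import time


def parse_counts(text):
    """Return {label: [count per CPU]} from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())              # header row: CPU0 CPU1 ...
    counts = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        nums = rest.split()[:ncpu]
        if len(nums) == ncpu and all(n.isdigit() for n in nums):
            counts[label.strip()] = [int(n) for n in nums]
    return counts


def interrupt_rates(before, after, seconds):
    """Per-CPU interrupts per second for every IRQ whose counters changed."""
    old, new = parse_counts(before), parse_counts(after)
    rates = {}
    for label, now in new.items():
        prev = old.get(label, [0] * len(now))
        delta = [n - p for n, p in zip(now, prev)]
        if any(delta):
            rates[label] = [d / seconds for d in delta]
    return rates


def sample(seconds=1.0, path="/proc/interrupts"):
    """Read the file twice, `seconds` apart, and return the rates."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return interrupt_rates(before, after, seconds)


# Usage on a Linux host (busiest IRQ first):
#   for label, per_cpu in sorted(sample().items(), key=lambda kv: -sum(kv[1])):
#       print(label, per_cpu)
```

Working on two text snapshots rather than reading the file inside `interrupt_rates` keeps the arithmetic testable on saved output from another machine.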
2
  • 1
    This does not answer the actual question OP is asking. Commented Nov 9, 2017 at 5:36
  • 1
    That way you see which interrupt counters change the most. It helped me when troubleshooting exactly the issue described in the question. Commented Nov 9, 2017 at 23:36
5

The B410P is an ISDN card with 4 BRI lines. If all four lines are connected you should be getting four sync packets at a time, and when calls are being made you can have 8 voice channels active, all sending packets.

If you get a high IRQ count without any calls being made this could be a symptom of 2 bad things:

  1. There's a sync problem with the operator; in that case you should also get bad voice quality.
  2. IRQ lines are conflicting. In this case your ata_piix (IDE/SATA) is using the same line as the B410P card, and the drivers might not like that very much. If so, do as the previous answer suggested and try moving the card to another slot.
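
The shared line mentioned in point 2 shows up in /proc/interrupts as a comma-separated device list on one row. The following Python is a minimal sketch for spotting such rows; the name `shared_irq_lines` is hypothetical, and it assumes device names contain no spaces, as in the output in the question.

```python
def shared_irq_lines(text):
    """Return {label: [device, ...]} for rows that list more than one device."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())              # header row: CPU0 CPU1 ...
    shared = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        desc = " ".join(rest.split()[ncpu:])  # text after the per-CPU counters
        chunks = [c.strip() for c in desc.split(",") if c.strip()]
        if len(chunks) > 1:
            # The first chunk still carries the controller type, e.g.
            # "IO-APIC-level ata_piix"; keep only the device name.
            chunks[0] = chunks[0].split()[-1]
            shared[label.strip()] = chunks
    return shared


# Two rows taken from the question's output.
EXAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
177:          0          0          0    7170885   IO-APIC-level  ata_piix, b4xxp
185:          0          0          0      59375   IO-APIC-level  ata_piix
"""
print(shared_irq_lines(EXAMPLE))
```

If the row for the card no longer lists a second device after moving it to another slot, the line is no longer shared.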

To debug you can also try removing the BRI cables and see if it makes a difference.

2
  • +1 I'll follow your advice. Thanks. Commented Nov 24, 2011 at 16:29
  • 1
    Wow, shocking. The last time I had to play card-jockey was in the mid-Nineties. Haven't even used the term ‘card-jockey’ since. I thought all this was well behind us, what with APICs, MSI etc. Commented Mar 8, 2012 at 19:01
4

I found myself in such a situation some time ago, and I wrote a little irqtop tool to monitor easily what's going on. It's basically the same thing as doing a watch -n 1 cat /proc/interrupts, with a nicer output.

Source code available here: https://gitlab.com/elboulangero/irqtop
