BBB throwing NMIs and timeouts

This is an offshoot of the issue entitled “wifi unreliable”, but it is a completely different problem. I’m very pleased with the TL-WN722N wifi plug, BTW.

My setup:

  • BBB rev C
  • Debian 7.7 and 3.18-0-rc3-bone1 custom build using RCN image-builder
  • USB 4-port hub
  • Tenda W311M (Ralink RT5370) wifi plug as control interface
  • TP-Link TL-WN722N (Atheros AR9271) wifi plug as RFMON monitor interface
  • external 1.5A power supply

My application is a wifi sniffer for SOHO networks. It uses dumpcap to save captures to a file on the SD card and then offloads the file to another system for analysis. When I use a capture filter on a single wifi MAC, it works well. When I capture on all MACs, it works well for about five minutes (~50K 802.11 frames) and then the kernel starts complaining, periodically generating a mix of NMIs and bus timeouts, and my SSH session stops. It looks like the RFMON interface continues to capture 802.11 frames, though. See the attached file for the console capture. When I kill the RFMON capture, things settle down and I can reconnect via SSH.
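
For anyone wanting to reproduce the two cases, here is a rough sketch of how the dumpcap command line differs between the filtered and unfiltered capture. The interface name, output path, and MAC address are placeholders I made up for illustration, not my actual values; `-i`, `-w`, and `-f` are the standard dumpcap options for interface, output file, and capture filter, and `wlan host` is the pcap-filter form for matching a MAC address.

```python
def build_dumpcap_cmd(iface, outfile, mac=None):
    """Return the dumpcap argument list for a capture on `iface`.

    With mac=None every frame is captured (the case that triggers the
    NMIs); with a MAC address, a capture filter limits the capture to
    frames to or from that station (the case that works well).
    """
    cmd = ["dumpcap", "-i", iface, "-w", outfile]
    if mac is not None:
        # "wlan host" is an alias of "ether host" in pcap-filter syntax.
        cmd += ["-f", "wlan host {}".format(mac)]
    return cmd
```

The returned list can be handed to `subprocess.call()` as-is, e.g. `build_dumpcap_cmd("mon0", "/tmp/capture.pcapng", "00:11:22:33:44:55")` for the filtered case.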

I think this is a case of the CPU running hard handling high-priority tasks, so the watchdog and SPI bus are not serviced within the necessary time windows. I’ll dig into this more tomorrow with ftrace, dynamic debug probes in the driver, etc. (yes, that will place a greater burden on the CPU, but hopefully it will point me to something). Conceptually, the kernel shouldn’t keep scheduling the 802.11 receive tasklets if they are consuming too much of the CPU, but I’m not clear on how that throttling would happen. Any advice or experience with this will be appreciated.
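
Before reaching for ftrace, one cheap way to check the "CPU is swamped" theory is to sample the aggregate `cpu` line in `/proc/stat` twice and look at how the elapsed time splits between irq, softirq, and idle. Tasklets run in softirq context, so a large softirq share during the unfiltered capture would be consistent with the theory (it would not prove it). This is only a sketch: the function names and the 5-second interval are my own choices, and it assumes the usual field order of user, nice, system, idle, iowait, irq, softirq, steal.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (first 8 columns).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(stat_text):
    """Extract the aggregate CPU tick counters from /proc/stat contents."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")

def cpu_shares(before, after):
    """Return the percentage of elapsed ticks spent in each state."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}

def measure(interval=5.0, path="/proc/stat"):
    """Sample the counters twice, `interval` seconds apart, on a live system."""
    with open(path) as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.read())
    return cpu_shares(first, second)
```

Calling `measure()` once with the filtered capture running and once with the unfiltered capture gives two sets of percentages to compare; the `irq`, `softirq`, and `idle` entries are the interesting ones.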

Dave

issue.141206 (20.6 KB)