X9SCM / X9SCL Network Timeout

Supermicro X9SCM and X9SCL main boards will lose network connection after some heavy traffic, especially on RHEL/CentOS 6. Updating BIOS and driver will not always fix this:

Oct 19 18:32:49 zeus kernel: ------------[ cut here ]------------
Oct 19 18:32:49 zeus kernel: WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26d/0x280() (Not tainted)
Oct 19 18:32:49 zeus kernel: Hardware name: X9SCL/X9SCM
Oct 19 18:32:49 zeus kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Oct 19 18:32:49 zeus kernel: Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length vhost_net xt_hl xt_tcpmss macvtap xt_TCPMSS macvlan iptable_mangle iptable_filter xt_multiport xt_limit tun xt_dscp ipt_REJECT ip_tables kvm_intel kvm vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse snd_pcsp snd_pcm snd_timer video i2c_i801 tpm_tis tpm tpm_bios serio_raw i2c_core snd shpchp output soundcore snd_page_alloc ext3 jbd mbcache ahci e1000e [last unloaded: scsi_wait_scan]
Oct 19 18:32:49 zeus kernel: Pid: 4, comm: ksoftirqd/0 veid: 0 Not tainted 2.6.32-15-pve #1
Oct 19 18:32:49 zeus kernel: Call Trace:
Oct 19 18:32:49 zeus kernel: <IRQ> [<ffffffff8106c608>] ? warn_slowpath_common+0x88/0xc0
Oct 19 18:32:49 zeus kernel: [<ffffffff8106c6f6>] ? warn_slowpath_fmt+0x46/0x50
Oct 19 18:32:49 zeus kernel: [<ffffffff8147c6fd>] ? dev_watchdog+0x26d/0x280
Oct 19 18:32:49 zeus kernel: [<ffffffff8107fcac>] ? run_timer_softirq+0x1bc/0x380
Oct 19 18:32:49 zeus kernel: [<ffffffff8147c490>] ? dev_watchdog+0x0/0x280
Oct 19 18:32:49 zeus kernel: [<ffffffff81075413>] ? __do_softirq+0x103/0x260
Oct 19 18:32:49 zeus kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30

So far, the only fix that has worked for me is doing two things.

1. Patch the NIC firmware (do this for both eth0 and eth1):

Download Here

2. Then add the following kernel parameter in your grub.conf (or menu.lst depending on OS flavor):

pcie_aspm=off

Now reboot. You shouldn’t lose your network again.

2 Responses to X9SCM / X9SCL Network Timeout

  1. […] got the Intel Ethernet NIC fixed permanently? Unfortunately, pcie_aspm=off and this patch (http://djlab.com/2012/10/x9scm-x9scl-network-timeout/) worked only for a few […]

  2. Devator says:

    Thanks! Last week I had this issue since I did not use the eth1 port for years.. Your solution seems to have fixed the problem! 🙂

    Cheers!