OK, so it’s not really ISE per se, but clearly something is a bit foobar.
Thankfully I can Wireshark it!
Let’s start with the ISE server. This is the traffic going out:
See, nothing coming back.
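For anyone wanting to reproduce the capture, a filter along these lines narrows things down to just the authentication exchange (the subnet here is from my lab addressing, so treat it as a placeholder):

```
# Wireshark display filter: RADIUS traffic to/from the lab subnet (assumed addressing)
radius && ip.addr == 10.1.4.0/24

# Or, as a capture filter on the ISE-facing port, grab only RADIUS auth/acct:
udp port 1812 or udp port 1813
```

The display filter is the quickest way to confirm "nothing coming back": if you only ever see Access-Requests and never an Access-Accept/Reject, the replies are dying somewhere in the path.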
So let’s head over to the server and see if anything gets to that:
The server gets the traffic, and sends it out again.
The logical next step would be for SW1 to send it (directly) to SW4, but it doesn’t. It sends it to SW3:
So far, so good.
But somewhere (i.e. SW3), the traffic does not get sent back to SW4:
For some reason, the reply traffic is actually going up to SW2:
But the replies never go back down to SW4. I won’t include a screenshot of this, but the Wireshark filter was empty.
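The hop-by-hop check above boils down to asking each switch where it thinks the destination MAC lives. A rough sketch of how to do that from the CLI (the MAC addresses are placeholders, not from my lab):

```
SW3# show mac address-table address 000c.2912.3456
! Shows which port SW3 would forward frames for that MAC out of

SW3# traceroute mac 000c.29aa.bbbb 000c.2912.3456
! Layer-2 traceroute between two MACs; needs CDP running on every switch in the path
```

If the MAC table on a switch points at the wrong port (or the entry is missing entirely), that's the hop where the replies are being dropped or misdirected.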
Shutting down the connection to SW2 didn’t help either. Things did look very promising when connecting the laptop to the Wi-Fi, but then it just wouldn’t connect, and the same issues appeared again.
This isn’t working much better either…
I think I will swap the switches and see if that helps. Time to shut everything down and re-build.
After a (quick) rebuild…
It’s now running IOL images, and we STILL have the same problem.
What is making it worse is that the switches will turn themselves off:
Thanks for that, switch.
So, another rebuild, back to vIOS, a slightly older image now. Let’s see how this one fares. I will be back in a couple of hours…
So, how do things look now? Well, sadly, not all that great. I even moved the AD server to be on the same switch as the ISE;
It still craps out:
ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.
64 bytes from 10.1.4.100: icmp_seq=1 ttl=127 time=7.11 ms
64 bytes from 10.1.4.100: icmp_seq=2 ttl=127 time=7.74 ms
64 bytes from 10.1.4.100: icmp_seq=3 ttl=127 time=9.45 ms
64 bytes from 10.1.4.100: icmp_seq=4 ttl=127 time=6.95 ms

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3012ms
rtt min/avg/max/mdev = 6.954/7.816/9.456/0.997 ms

ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 33000ms

ISE20/admin# ping 10.1.4.100
PING 10.1.4.100 (10.1.4.100) 56(84) bytes of data.

--- 10.1.4.100 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 33000ms
ISE20/admin#
How bad does this suck? Even more so as I have also lost access to the vWLC and need to fix that again. So, how to fix this? The Windows firewall is off, so the problem is not that. Instead, I moved the AD server back to SW1 and dual-homed it, with a second connection (192.168.90.100) into SW4.
ISE20/admin# ping 192.168.90.100
PING 192.168.90.100 (192.168.90.100) 56(84) bytes of data.
64 bytes from 192.168.90.100: icmp_seq=1 ttl=128 time=9.79 ms
64 bytes from 192.168.90.100: icmp_seq=2 ttl=128 time=2.20 ms
64 bytes from 192.168.90.100: icmp_seq=3 ttl=128 time=2.61 ms
64 bytes from 192.168.90.100: icmp_seq=4 ttl=128 time=2.72 ms

--- 192.168.90.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3008ms
rtt min/avg/max/mdev = 2.205/4.335/9.793/3.157 ms

ISE20/admin# ping 192.168.90.100
PING 192.168.90.100 (192.168.90.100) 56(84) bytes of data.
64 bytes from 192.168.90.100: icmp_seq=1 ttl=128 time=5.93 ms
64 bytes from 192.168.90.100: icmp_seq=2 ttl=128 time=9.14 ms
64 bytes from 192.168.90.100: icmp_seq=3 ttl=128 time=3.17 ms
64 bytes from 192.168.90.100: icmp_seq=4 ttl=128 time=2.86 ms

--- 192.168.90.100 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 2.861/5.281/9.148/2.533 ms
ISE20/admin#
Let’s give it a few minutes…
Seems to be stable. I have fixed (hopefully) the vWLC (the port you can see in CDP on the switch needs to be an access port, not a trunk), and the ISE can ping the AD box, and all the checks pass.
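For reference, the vWLC-facing port ended up looking something like this (the interface number and VLAN are from my lab, so adjust to taste):

```
SW1(config)# interface GigabitEthernet0/2
SW1(config-if)# description Link to vWLC
SW1(config-if)# switchport mode access
SW1(config-if)# switchport access vlan 10
SW1(config-if)# no shutdown
```

With the port as a trunk, the vWLC never came up properly; as a plain access port in its management VLAN, it behaved.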
Changing the switch image (instead of the platform) would have been much easier, but let’s see if this is fixed before celebrating.
I did have one theory in the shower this morning: the issue could stem from HSRP. I haven’t put HSRP back in; instead, each switch gets a VIF with its own IP address (10.1.4.1 and 192.168.90.1 for SW1, and so on). HSRP is not that great in a virtualized environment!
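So instead of an HSRP pair sharing a virtual IP, each switch just carries its own SVIs, roughly like this for SW1 (the VLAN numbers are my assumption; the other switches would take .2, .3, and so on on the same subnets):

```
SW1(config)# interface Vlan4
SW1(config-if)# ip address 10.1.4.1 255.255.255.0
SW1(config-if)# no shutdown
SW1(config-if)# interface Vlan90
SW1(config-if)# ip address 192.168.90.1 255.255.255.0
SW1(config-if)# no shutdown
```

Less resilient than HSRP on paper, but one less protocol for the virtual switches to choke on.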
The vWLC can see the AP, but I am not seeing the WLANs; I think I need to switch the AP back to FlexConnect mode instead of local.
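Switching the AP mode is a one-liner on the controller CLI (the AP name is a placeholder; check the syntax against your WLC version):

```
(Cisco Controller) > config ap mode flexconnect AP01
```

Note that the AP reboots after a mode change before it rejoins the controller, so expect it to disappear for a minute or two.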
Will find out tonight when I get home.
Upon returning home, the ISE looks in a much better state. However, the issue now seems to have moved to the vWLC. I changed the AP back to FlexConnect mode and I can see the WLANs again, but the vWLC seems to lose contact. Same issue, different device. I think this might have something to do with it:
SW2(config-if)#
SW2(config-if)#
-Traceback= 1DBB7C8z 8DBFE5z 90522Ez 904F50z 904D5Dz 900F45z 901B7Bz 901B0Fz 7A8738z 7F8D8Dz - Process "Net Input", CPU hog, PC 0x008FD5AD
-Traceback= 1DBB7C8z 8DBFE5z 90522Ez 904F50z 904D5Dz 900F45z 901B7Bz 901B0Fz 7A8738z 7F8D8Dz - Process "Net Input", CPU hog, PC 0x008FD5B5
SW2(config)#
*Jun 3 18:14:17.110: %SYS-3-CPUHOG: Task is running for (1997)msecs, more than (2000)msecs (0/0),process = Net Input.
*Jun 3 18:14:19.110: %SYS-3-CPUHOG: Task is running for (3997)msecs, more than (2000)msecs (0/0),process = Net Input.
SW2(config-router)#
So I rebooted all the switches. Things look better (again). I even get so far as to be prompted for the ISE certificate when trying to connect to the CCIE.Sec-Admin WLAN. It doesn’t actually connect, but still, we do have progress. Clearly there are issues, but I think rebooting the switches (rather than wasting time redesigning the LAN) is the way around it at the moment. Until I can fix the CPU hog issue with spanning-tree, that is. Moving to a solely layer-3 design would be the best solution here.
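Going fully layer-3 would mean turning the inter-switch links into routed ports, so spanning-tree has nothing left to compute on them. A sketch of what one link might look like (the interface and the /30 are hypothetical):

```
SW1(config)# interface GigabitEthernet1/0
SW1(config-if)# description Routed link to SW3
SW1(config-if)# no switchport
SW1(config-if)# ip address 10.255.0.1 255.255.255.252
SW1(config-if)# no shutdown
```

Each link gets its own point-to-point subnet, and an IGP handles the rest; no trunks, no blocking ports, and hopefully no Net Input meltdowns.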
The final tasks are to actually get something connected via the ISE server. Which will be the next post, once I can think up a suitable ISE/ICE-based pun.
(Cisco Controller) >ping 192.168.90.205
Send count=3, Receive count=0 from 192.168.90.205
(Cisco Controller) >
I might just replace all the switches with a single Arista switch.
Enough for now. I will just leave this here to outline my current feelings.
What IOS versions were these vIOS and IOL switches? I'm still doing R&S, but I've also found issues when labbing out a few advanced scenarios.
I didn't want to invest much more time if you're seeing these kinds of issues on Cisco L2 IOL/vIOS devices. I'll have to fire up an L2 device and check my versions, but it's the one from the latest VIRL release. Just wondering if you have the same device and are seeing these issues? In fact, I have it here: it's the IOSvL2 – 15.2.4055 DSGS image.
I've had some success by disabling IGMP snooping on these virtual L2 devices and was wondering if you'd tried that.
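For context, that workaround is a single global command on these images (re-enable with the `ip igmp snooping` form):

```
SW(config)# no ip igmp snooping
! Disables IGMP snooping globally; multicast then floods within each VLAN
```

Flooding multicast everywhere is ugly in production, but in a small virtual lab it sidesteps the snooping bugs entirely.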
I used that image as well. I didn't try disabling IGMP snooping, though… But vEOS (Arista) is working well for me at the moment.
Glad to hear it's working a lot better for you; there's not much more frustrating than troubleshooting a faulty image when you're trying to get something else working. I've not tried those switches out yet, but I'd heard good things about them when run virtually, so I'll have to give them a try.
I’ve had problems like these with the virtual switches, where L3 traffic would not be forwarded.
However, entering the command no ip cef ‘fixed’ this.
I’ve also had issues where virtual devices that were formerly working in my topology seemed unreachable until I pinged them from the adjacent switch; to avoid this I added IP SLAs to the switches to act as a keepalive.
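A hedged sketch of both workarounds mentioned above (the probe target and frequency are made up; point the probe at whatever neighbour keeps going quiet):

```
SW(config)# no ip cef
! Falls back to process switching; slower, but works around the forwarding bug

SW(config)# ip sla 1
SW(config-ip-sla)# icmp-echo 10.1.4.100
SW(config-ip-sla-echo)# frequency 30
SW(config-ip-sla-echo)# exit
SW(config)# ip sla schedule 1 life forever start-time now
```

The SLA probe just keeps traffic flowing to the neighbour so its entries never age out, which is what seems to trigger the "unreachable until pinged" behaviour.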