Interesting Troubleshooting Cases, Part 3 - Breaking other Wi-Fi

Note: This article is part 3 of a 4-part troubleshooting series, with more in-depth information about a TEN talk at WLPC.
Part 1 - The RADIUS connection
Part 2 - Zoom issues
Part 4 - The suddenly weaker Wi-Fi
Video recording from WLPC Prague

Incoming Ticket: I have installed eduroam via the configuration assistant tool (CAT) on macOS without issues and eduroam connects flawlessly.
But now my home network does not work anymore. After deleting the CAT profile my Wi-Fi at home works again. Is this a known issue?

Now, my first reaction to this was “What?!” as this just sounds impossible.

If you don’t know CAT: it is a tool that provides onboarding for eduroam, with executables (Windows), scripts (Linux), apps (Android), and mobileconfig files (macOS, iOS). It does nothing more than tell your device how to configure eduroam correctly, so that it is secure. It is basically just a sort of MDM for eduroam.

It normally does not delete SSIDs (though some installers can, if instructed to), and it does not mess with IP settings, DNS settings, adapter settings, or anything else. We have thousands of installs of the macOS .mobileconfig, so you would think that if it did mess up settings, more clients would be raising issues.

If we look into the profile, we can see that there is barely anything in there. Just the CA certificate to trust, and the SSID and security settings, right?

.mobileconfig file for macOS

Wait - what is that at the bottom? Proxy Type Auto? Why is that in here?

Because this is a supported configuration in eduroam, and clients need to be configured for it in case they roam to an institution that uses it.

Source: GEANT Wiki How to deploy eduroam, highlighting by me

But why is this setting causing issues for one single person? This is not a widespread problem.

Proxy auto-detection usually works via the WPAD protocol, which has different discovery methods (DHCP and DNS). A user at home probably does not provide a proxy this way, so there had to be something off, and I guessed that it would be with the DNS detection method.
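To make the DNS detection method a bit more concrete, here is a minimal Python sketch of how a client could derive the names it asks for. It assumes the commonly described behaviour: prepend "wpad" to the DNS search domain, then strip leading labels one at a time, and fetch wpad.dat from whatever answers. The function names, the `min_labels` cut-off, and the example domain are my own illustration; the exact rules differ between operating systems, and I have not verified how macOS does it.

```python
def wpad_candidates(search_domain: str, min_labels: int = 2) -> list[str]:
    """Return the WPAD host names a client might query for a search domain.

    min_labels is an assumption: it stops the walk before a bare TLD,
    which is where many implementations stop as well.
    """
    labels = [label for label in search_domain.strip(".").split(".") if label]
    names = []
    while len(labels) >= min_labels:
        names.append("wpad." + ".".join(labels))
        labels = labels[1:]  # drop the leftmost label and try the parent domain
    return names


def pac_url(wpad_host: str) -> str:
    """Build the conventional PAC file URL for a WPAD host name."""
    return f"http://{wpad_host}/wpad.dat"


for host in wpad_candidates("dept.example.edu"):
    print(host, "->", pac_url(host))
```

At home the search domain usually comes from the router or the provider, so these lookups go straight to whatever DNS the home network hands out.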

So I did some tests. All five queries, including the ones sent to specific DNS servers with @, ended in a timeout.

That is interesting. I expected the problem to be in the search path (to the provider), but queries to external DNS servers and for different domains behaved the same way: just having “wpad” in the query string, even for a legitimate domain, would lead to a timeout, so no answer was received at all. On different providers, or on the same provider in different areas, all was fine: the query would return either an A record if the entry existed or an NXDOMAIN if it didn’t.

macOS would shut the network down as soon as the wpad query times out, so you are online for about 2 seconds, and then you are offline. The issue is that the proxy setting applies not only to the configured SSID but to the whole system, which is what causes the problems at home.
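If you want to repeat this kind of test without dig, the following is a rough Python sketch using only the standard library. It sends a single A query over UDP and reports whether the result was an answer, NXDOMAIN, REFUSED, or a timeout. The function names, the 3-second timeout, and the fixed query ID are my own choices for illustration, and it does not handle TCP fallback or retries.

```python
import socket
import struct

RCODES = {2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def classify_response(data: bytes) -> str:
    """Map a raw DNS response to a short result string."""
    if len(data) < 12:
        return "MALFORMED"
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    rcode = flags & 0x000F  # the response code is the low 4 bits of the flags
    if rcode == 0:
        return "ANSWER" if ancount else "NODATA"
    return RCODES.get(rcode, f"RCODE{rcode}")


def probe(name: str, server: str, timeout: float = 3.0) -> str:
    """Send one query to the given server and classify what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    return classify_response(data)
```

Calling `probe()` once with a name that contains "wpad" and once with a name that does not, against the same server, shows quickly whether something on the path treats the two differently.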

You could argue that this setting is a problem anyway: there were some security issues with it, and I would say that to some extent they still exist. But as it is an allowed configuration in eduroam, it is what it is.

Researching this, I found some other universities with the same problem, and there seems to be one more failure mode: they noticed that the WPAD queries were answered with REFUSED, leading to the same issue.

A GitHub issue was raised for eduroam CAT. Stefan Winter of Restena argued that turning this off would, of course, help the few people with issues, but would break the connection for a lot of others, and that there is nothing that could be done to improve this from the eduroam CAT side.

Comment from Stefan Winter on GitHub, highlighting by me

So, now we know why this happens, but what intercepts the DNS query?

There seem to be a few things. The other universities had Google Nest Wi-Fi devices that seemed to cause the issue. In our case, the student had an Arris cable modem. I did talk to the provider about what exactly was causing this, whether it is the modem or something upstream, but did not get any further than the first-level helpdesk.

How did I resolve this then?

As this was hitting just one student, I just made a .mobileconfig file by hand (you can use Apple Configurator for this) with the Auto Proxy disabled.

Auto Proxy disabled
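I made the change by hand, but the same edit could be scripted. This is only a sketch under a few assumptions: the profile is an unsigned XML plist (a signed profile would have to be unwrapped first, which is not shown), the Wi-Fi payload uses the `com.apple.wifi.managed` payload type, and the proxy setting lives in its `ProxyType` key. The helper name and the sample profile are made up for illustration.

```python
import plistlib

WIFI_PAYLOAD_TYPE = "com.apple.wifi.managed"


def disable_auto_proxy(profile: dict) -> int:
    """Set ProxyType to "None" on every Wi-Fi payload that uses "Auto".

    Modifies the profile in place and returns the number of payloads changed.
    """
    changed = 0
    for payload in profile.get("PayloadContent", []):
        if payload.get("PayloadType") != WIFI_PAYLOAD_TYPE:
            continue
        if payload.get("ProxyType") == "Auto":
            payload["ProxyType"] = "None"
            payload.pop("ProxyPACURL", None)  # only meaningful with Auto
            changed += 1
    return changed


# Hypothetical, heavily shortened profile, just enough to show the edit.
sample = {
    "PayloadType": "Configuration",
    "PayloadContent": [
        {"PayloadType": "com.apple.security.root"},
        {
            "PayloadType": WIFI_PAYLOAD_TYPE,
            "SSID_STR": "eduroam",
            "ProxyType": "Auto",
        },
    ],
}

count = disable_auto_proxy(sample)
xml_bytes = plistlib.dumps(sample)  # serialised profile, ready to be written out
print(f"changed {count} Wi-Fi payload(s)")
```

For a real file you would read it with `plistlib.load()` and write the result back with `plistlib.dump()`. The modified profile also needs its own identifiers if it is to be installed alongside the original one.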

Not ideal, but for this one case it will do.