An investigation into the MTU issues

A project to provide VPN access to the River System Raspberry Pis to allow WMT Volunteers and Staff to log in from home. Triggered by the COVID-19 lockdown.
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

An investigation into the MTU issues

Post by PatrickW »

The VPN MTU issues have rather captured my attention. I have investigated them a little.

I can't speak authoritatively on the topic, but I feel I've now reached a reasonable level of understanding.

Executive summary

I believe the MTU problems are caused by routers that interfere with path MTU discovery by not sending ICMP "fragmentation needed" packets, or by not fragmenting packets that are too large. There are some justifications for the behaviour of these routers, but the basic fact is that they exist and are relatively common.

Equipment at the model town end of the VPN seems not to exhibit that kind of behaviour and, as things currently stand, I do not believe it is at fault for the path MTU discovery problems. I think the problems we are now seeing come from our individual home internet connections. The WMT equipment could have been at fault in the past, though, prior to the internet connection upgrade.

If I am correct and your home internet connection has an MTU of 1492 or more, you should not now need to set an MTU in order to use the VPN, because with an MTU that high, your connection won't be the bottleneck, and with the WMT end working well, there isn't an opportunity for MTU issues.

If your home internet connection has an MTU of less than 1492, and you find you are having problems, then you will still need to set an MTU in order to use the VPN.

You can set a route MTU instead of a link MTU, so that only VPN traffic is affected. See the "Recommendations" section below.

The suggestion to use an MTU of 576 should work, but you can probably go a bit larger. If you take the MTU of your broadband connection and subtract 64 or 128 from it, that should give you an MTU that works with the VPN.

This will only help with applications that use TCP, such as HTTP and SSH. Applications that use UDP will still have problems. I do not believe we intend to use any applications that rely on UDP. (Except for the VPN itself, which doesn't count, as long as the applications we run through it use TCP.)

You should not change the MTU of your internet connection. Only set a link or route MTU on the machine you are using to access the VPN.

If a forum post has an executive summary, then it is probably too long. :D The middle of this post is mostly just me explaining how I reached the conclusions I reached. I am not expecting everyone to read the whole thing, unless they find it interesting or seek to reach a shared understanding. The most actionable information is at the beginning and the end.

First things first: my experience with MTU. The symptoms.

If I do nothing with regard to MTU, and I am connecting to the VPN using my home broadband, then I can connect to the VPN, and I can ping hosts in the WMT network, but I cannot SSH through the VPN. I believe this is in line with what everyone else has experienced at one time or another.

The maximum MTU that I can set on my local network interface in order to make SSH work through the VPN turns out to be equal to the path MTU between my computer and the VPN server. In my case, that is 1454. This can be set on the VPN client host itself, or on a well-behaved upstream host that reacts correctly to oversized packets and serves as the VPN client's gateway to the Internet. (I've tried both.)

However, while this is adequate for SSH, it is not sufficient to enable me to access the WMT Engineers GUI webpage. For that to work, I have to set an MTU no larger than 1390, and it must be set on the VPN client host, and not an upstream host. Setting an MTU of 1390 on an upstream host is no better than setting an MTU of 1454.

Meanwhile, if I instead connect to the VPN server using a mobile network, the VPN works seemingly perfectly, with no messing about with the MTU whatsoever. I can SSH to hosts in the WMT network, and access the Engineers GUI.

Now, some definitions.

I am going to use these terms, which sound similar to one another, but have distinct meanings. I have tried to use these terms carefully and non-interchangeably. These may not be the "de jure" definitions, but they are what I intend to mean by them:
  • MTU: Maximum Transmission Unit; the largest unit of data that can travel through a particular physical or logical networking component. In practice: the largest IP packet, including headers, that can be sent through such a component. In common usage, when used without qualification, "MTU" tends to refer to a "link MTU".
  • link MTU: The MTU of a single network link available to a host. It will correspond to a particular network interface. The term "link" should be understood roughly as in "link layer" in the IP layer model. Typically the link would correspond to a physical link, but virtual links also exist (for example, between two virtual machines, or between two hosts on a VPN).
  • path MTU: The MTU of a complete path between two hosts in a network. The path may consist of multiple links, via intermediate hosts ("routers"), each such link having its own MTU. The path MTU is equal to the minimum link MTU in the path. In routing terminology, each link in the path corresponds to a "hop" along the route.
  • route MTU: The path MTU that a given host expects to be applicable to a given IP route. This can be a static configuration, similar to a link MTU, or else the host will continually employ path MTU discovery to dynamically update route MTUs for the IP traffic it is currently handling. (See the example just below.)
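To make the distinction between the last two concrete, here is how link and route MTUs can be inspected on a typical Linux machine (the interface name "eth0" is just a placeholder):

Code: Select all

# Link MTU: a property of each network interface.
ip link show dev eth0   # output includes e.g. "... mtu 1500 ..."

# Route MTU: an attribute that an individual route may carry.
ip route show           # routes configured with "mtu N" show it here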
Attempted diagnosis

The main thing that I think is going on with the MTU is that path MTU discovery (RFC 1191/RFC 1981) is sometimes disrupted by one or more routers between the VPN client and the VPN server. In turn, this prevents the Linux kernel at one end, the other, or both from calculating an appropriate route MTU for packets that go through the VPN. In turn, that prevents application software like SSH or web browsers from sizing their packets to fit within those route MTUs. If the packets don't fit within an applicable route MTU, they may be lost. This breaks stuff. It is a simple enough concept, but the devil is in the details.

Path MTU to the VPN server

You can learn a lot about path and route MTUs using `ping` with the arguments "-c 1 -M do", trying different "-s" values to send different sized packets. If you start with a large size and decrease, you can do manual path MTU discovery, and find out whether there are any routers disrupting path MTU discovery. With a bit of deduction, you may even be able to work out which routers are causing the problem.
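For example, to test whether a 1500 byte packet can get through unfragmented (the host name here is just a placeholder; "-s" gives the ICMP payload size, to which 28 bytes of IP and ICMP headers are added):

Code: Select all

# 1472 bytes of payload + 28 bytes of headers = a 1500 byte packet.
ping -c 1 -M do -s 1472 example.com

# No reply? Step the size down until a reply comes back. The largest
# payload that succeeds, plus 28, is the path MTU.
ping -c 1 -M do -s 1426 example.com   # a 1454 byte packet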

By using ping in this way, I now know for certain that my home broadband router interferes greatly with path MTU discovery, in both directions. Unfortunately, it is not configurable in that regard.

This kind of router "misbehaviour" is apparently not uncommon, and it causes problems for people trying to use VPNs. However, there are justifications for it: it can serve as a crude defence against some security issues. It might be best to avoid disabling the misbehaviour, unless you know that the router employs some other, more sophisticated form of defence, or you are prepared to take the risk.

By contrast, when I connect through the mobile network, the path MTU between me and the VPN server is somewhat larger than it is on my home broadband: 1492, versus 1454 for the home broadband. There is still the same problem, where packets with the DF flag set (meaning "do not fragment") and larger than the path MTU simply disappear. However, unlike on my home broadband, packets larger than the MTU and without the DF flag set are successfully fragmented in transit, and I get the appropriate responses from the VPN server. This is just enough good behaviour to enable the kernel to discover the path MTU.

I can successfully use the mobile network to send 1500 byte DF-flagged packets to other hosts on the Internet, proving that the MTU of the mobile connection is 1500. Therefore, the path MTU of 1492 to the VPN server must come from the model town's broadband connection. This deduction is supported by the fact that 1492 is a common MTU for FTTC broadband. In which case, the WMT gateway router, or else the ISP supplying WMT's internet connection, must be "to blame" for ignoring the DF-flagged packets larger than 1492 bytes and "to thank" for properly fragmenting the non-DF-flagged packets. As I say, you don't necessarily want to change this behaviour of the router or ISP.

Path MTU through the VPN

It is just as interesting to use ping to send packets through the VPN, to a host within the river system, such as 192.168.0.6. Back on my home broadband, with my local interface MTU left untouched at 1500, small packets obviously get through fine. Using Wireshark, I can see that sending packets into the VPN causes corresponding, slightly larger encrypted packets to be sent out towards the Internet. If I am connected to my home broadband, and I size the packets I send into the VPN such that the corresponding encrypted packets will exceed the MTU of my broadband connection, then, as you might expect, those packets are lost. This prevents SSH from working. Evidently, initiating an SSH connection involves sending large packets from the SSH client to the SSH server.

But, if (using ping) I size a packet so that the encrypted version of it has a size equal to the path MTU to the VPN server, and I send it through the VPN with the DF flag set (meaning "do not fragment"), then it gets all the way to 192.168.0.30. Acting in its router role (as opposed to its VPN server role), 192.168.0.30 responds "Frag needed and DF set (mtu = 576)". From this, I deduce that the VPN server's link into the rest of the WMT network has an MTU of 576. Assuming that's true, then that's exactly how 192.168.0.30 should respond to my packet.

Then the clever bit: the Linux kernel on my local machine has taken note of the "Frag needed" response from the VPN server, and it has set the MTU to 576 for the route to the IP address that I was pinging. So, if I try to send exactly the same size of DF-flagged packet once more, instead of getting a response from 192.168.0.30, I get a local error "Message too long, mtu=576". If I remove the DF flag, then the packet is fragmented locally, to fit in 576 bytes, and passes successfully all the way through the VPN to the intended host (e.g. 192.168.0.6).

As a result of the kernel setting the MTU of the route to 576, I can then successfully establish an SSH connection to the IP address I was pinging, even though the MTU of my local Ethernet interface is still set to 1500. This is because the kernel is telling SSH it can't send packets bigger than 576 bytes, and, by chance, 576 happens to be small enough that they will also avoid the 1454 limit imposed by my broadband connection. A fun trick.
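You can inspect what the kernel has learned. If I query the route:

Code: Select all

ip route get 192.168.0.6

then, once the "Frag needed" response has been processed, the output looks something along these lines (the exact format varies between iproute2 versions):

Code: Select all

192.168.0.6 via <my gateway IP> dev <my dev> src <my src> uid 1000
    cache expires 571sec mtu 576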

If I disconnect from the VPN, the kernel 'forgets' the MTUs for the routes to the model town network. I can repeat the same experiment the next time I connect to the VPN.

If I connect via a mobile network, the results are, expectedly, different. If I try to send a 2000 byte, DF-flagged packet through the VPN, then, straight off the bat, I get a local error: "Message too long, mtu=1438". This is how it should work. Before I've sent a single packet through the tunnel, the kernel has already detected the path MTU to the VPN server and set the MTU for the VPN tunnel route accordingly.

If I then send a packet through the VPN that fits within that route MTU, the packet hits the VPN server and gets the same response as before about needing to fit within 576 bytes. Again, that's exactly how it should function.

This MTU of 576 is set per individual destination IP address, because it only applies if the packet is to be forwarded onwards through that low-MTU link to the rest of the WMT network. If you ping the VPN server itself, then the applicable MTU is higher.

Engineers GUI puzzle

But why can't I access the Engineers GUI if I don't set a low link MTU on the host that is acting as the VPN client?

I think the key is to realise that path MTU discovery is being done independently in both directions, because each endpoint maintains its own understanding of the path MTU. So, you have to think about the sizes of packets being sent in each direction.

With SSH, my SSH client sends a large packet to the SSH server straight off the bat. Provided that the underlying physical connection enables path MTU discovery, this immediately triggers a series of events that result in the route MTU being detected. This causes the SSH client to try again with a smaller packet, enabling the connection to succeed.

With HTTP, the browser sends only a very small packet to the web server to initiate a connection to download the page. This packet is not large enough to cause the route MTU to be detected. Neither the browser nor the web server knows at this point that there is a low MTU, so the server sends large packets in response. These get lost, because they are too big for the 1454 MTU of my ADSL link. Neither the browser nor the server can figure out what's going wrong, so the connection just hangs, repeatedly trying to re-send the packets that aren't getting through.

If I deliberately trigger a 404 error from the NAS box web server, then the page displays regardless of the link MTU I set on my local link. The HTML for the "404 - Not Found" page is only 414 bytes in size, so the server doesn't end up sending any packets bigger than 1454 bytes.
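(If you want to reproduce that test, requesting any page that doesn't exist will do it; the address and path here are placeholders, following the same convention as the commands further down:)

Code: Select all

curl -v http://<NAS box IP>/no-such-page.html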

Because HTTP is based on TCP, setting my local link MTU to 1390 will probably have the effect of limiting the TCP MSS (Maximum Segment Size) for the TCP connection to 1350 (1390 minus 40 bytes of IP and TCP headers). The TCP MSS applies to both ends of the connection, so the web server will be forced to send smaller packets.

If the encryption overhead is the same in both directions, then an MTU of 1454 on my local link should be low enough to make it all work. I suspect that the reason it isn't low enough is that the encryption overhead on packets from the VPN server to the client is marginally higher than on packets from the client to the server. Perhaps this could be due to the different names of the devices in the authentication scheme, for example, or something of that nature. Because the IPsec traffic is encrypted in 64-byte chunks, a marginal difference in overhead can end up increasing the size of encrypted packets by 64 bytes, or by a multiple of 64 bytes. So, if the TCP MSS was chosen based on the MTU and encryption overhead at the client end of the VPN, it is not necessarily small enough for the server end.

Setting the MTU at my end 64 bytes lower than the real MTU of my broadband link (1454 - 64 = 1390) results in a TCP MSS that causes the web server to produce packets sized so that, after the higher overhead at the VPN server end, their encrypted version fits within 1454 bytes!
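As a rough worked example using my numbers (the exact per-direction ESP overheads are my inference, not a measurement):

Code: Select all

link MTU set at the client:              1390
minus IP and TCP headers (20 + 20):       -40
TCP MSS advertised to the server:        1350

largest plaintext TCP packet from server:  1350 + 40 = 1390 bytes
plus server-side encryption overhead:      1390 + 64 = 1454 bytes
                                           -> just fits my 1454 path MTU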

Testing outbound traffic through the WMT office router

If I run the command

Code: Select all

ping -c 1 -M do -s 1472 isup.me

on sbuttspi, then I get a "Frag needed and DF set (mtu = 1492)" response from the office router. This shows that the office router is working perfectly to support path MTU discovery for outbound traffic, and should not be the source of any MTU problems with outbound VPN traffic. This means that the problems accessing the Engineers GUI must be caused by our own internet connections and home routers, and not at the WMT end of the connection.

Whether this was always the case (e.g. on the old WMT internet connection) is another question. I suspect that it wasn't.

I can also draw two further conclusions from this outbound ping test:
  1. Anyone who finds they need to limit their local link MTU in order to access the Engineers GUI must have a router somewhere in their route to the Internet that does not facilitate path MTU discovery for inbound packets from the Internet. This could be a router in their home network, or at their ISP. The same could also be true of outbound packets, but to find that out you would have to send outbound packets through the router in question.
  2. The MTU of 576 set on the link between the VPN server and the rest of the WMT network is technically a misconfiguration. Normally, every link that connects to the same Ethernet switch, or to the same network of switches, should have the same MTU. This is because switches cannot send ICMP responses or maintain knowledge about different MTUs, otherwise they would be routers, not switches. Everything that's connected to a switch is "on the same link" from the point of view of all the other things that are connected to the switch. In the case of the model town, the correct MTU for everything connected to the switches would be 1500.

    Fortunately, this 576 MTU seems to only apply to traffic sent from the VPN server. If I ping the VPN server from sbuttspi:

    Code: Select all

    ping -c 1 -M do -s 1472 192.168.0.30
    Then the 1500 byte packet gets through to the VPN server just fine. In other words, the Linux kernel on the VPN server ignores its own MTU setting for incoming traffic. There was a risk that the 1500 byte packet would just get lost.

    Just because it is technically incorrect to have the VPN server's link MTU to the WMT network set to 576 does not mean that it does not function as a workaround for MTU problems caused by VPN users' broadband connections. However, the effects of the workaround are likely to be somewhat unpredictable, because they depend on the interaction between the exact size of packets being sent through the VPN and the path MTU discovery algorithm employed by the OS at either end.
Other pitfalls

IPsec VPNs are not "links", and path MTU discovery is opportunistic

You might think that the kernel would perform path MTU discovery in both directions as soon as the VPN tunnel is established, in order to calculate a link MTU for what appears to be a virtual "link". I certainly did. But that's not how it works.

IPsec tunnels are not treated as "links", otherwise they would appear as virtual network interfaces. Instead, they are a lightweight wrapper and some extra routing logic for packets that get sent out over a normal physical link.

In essence, the VPN server is not so much a tunnel endpoint as it is a special kind of router, which knows how to route these individual, IPsec-encrypted packets that it receives.

This is a rather neat and efficient solution, but it is also probably the reason why IPsec VPNs are so vulnerable to MTU issues. The MTU issues affect individual packets, and therefore individual applications that run through the VPN, not the complete VPN solution.

Path MTU discovery ends up being done in the same way that it would be done for unencrypted packets sent across the Internet to a normal router. The main difference is that the kernel has to take into account the encryption overhead when it calculates the MTU for routes that go through the VPN. The kernel first assumes that the unencrypted route MTU between it and the VPN server matches the local interface MTU. Then, the first time it is called upon to send an encrypted packet larger than the true path MTU, it ends up getting a 'fragmentation needed' response, or else the packet gets fragmented along the way and the remote end acknowledges the receipt of fragments rather than a single packet. Either way, the kernel gets clued in to the fact that the true path MTU is lower than it originally predicted, and it adjusts accordingly, no longer sending packets that are too large. Periodically, the kernel will send larger, probing packets to find out if the MTU has increased.
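If you want to watch this mechanism in action, you can capture the ICMP "fragmentation needed" messages (ICMP type 3, code 4) as they arrive; the interface name is a placeholder:

Code: Select all

sudo tcpdump -n -i eth0 'icmp[0] == 3 and icmp[1] == 4'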

Capturing ESP packets

When examining MTU issues by capturing packets outbound to the VPN on the VPN client host, only the encrypted ESP (Encapsulating Security Payload) packets are captured. This is an artefact of the way IPsec is handled by the Linux kernel. The unencrypted version never exists "on the wire" for the capture software to see, so to speak. In theory you can configure Wireshark to decrypt the encrypted packets and see their contents, but in practice this requires a set of obscure configuration values, including an encryption key. In principle, those values will be available somewhere on that machine, but I baulked at the idea of figuring it all out.
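For reference, capturing just the ESP traffic looks something like this (50 is the IP protocol number assigned to ESP; the interface name is a placeholder):

Code: Select all

sudo tcpdump -n -i eth0 'ip proto 50'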

This limitation only affects outbound traffic. For inbound traffic, the decrypted inner traffic can be captured in the normal manner.

In terms of packet size, the ESP packets are padded to fit into the cipher block size of the encryption. This means that the sizes of ESP packets fall into a set of specific sizes rather than varying over the full range of possible packet sizes.

Recommendations

It looks like the WMT internet connection is in pretty good shape now. I don't think it's likely to be causing issues, although the old connection might well have done.

The main oddity at the WMT end is the MTU of 576 on the link from the VPN server to the rest of the WMT network. This shouldn't be necessary, but it may be helping to mask issues that derive from individual users' broadband connections.

If you are an individual user having problems, the cause is almost certainly now your own router or broadband connection.

Anyone with an internet connection MTU of 1492 or larger should find they have no problems connecting to the VPN, because their connection is not the bottleneck. If your broadband MTU is less than 1492, then that makes your internet connection the bottleneck, which means the VPN relies on your broadband connection to facilitate path MTU discovery. That's an expectation that may not actually be getting fulfilled.

If you can configure your router to be a bit less restrictive about packets larger than the broadband link MTU, then it might fix all the issues for you.

What you ought not to do is change the MTU of your broadband link. If you decrease it, it can only make the VPN problems worse. If you increase it, then it might exceed the true MTU of the physical connection, and that will give you all sorts of other problems.

If you can't make your router behave better, then setting a link MTU for your local link (i.e., for one of the links between your VPN client and your broadband router) is a valid workaround that should get everything working. 576 is fine, but you should be able to achieve something larger. You might try the MTU of your broadband connection, minus 64, or minus 128. (In my case, 1454-64 = 1390.)
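On Linux, setting a link MTU is a one-liner, though it will not persist across reboots unless you also add it to your network configuration (the interface name is a placeholder):

Code: Select all

sudo ip link set dev eth0 mtu 1390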

Some typical broadband MTUs are 1500, 1492 and 1454.

Or, instead of setting a link MTU, you could set a route MTU that only applies to traffic between your computer and the VPN server. For example, I can do:

Code: Select all

ip route get <WMT VPN server IP>
Which tells me what route will currently be used to get to that IP address:

Code: Select all

<WMT VPN server IP> via <my gateway IP> dev <my dev> src <my src> uid 1000 cache
If I then run:

Code: Select all

sudo ip route add <WMT VPN server IP> via <my gateway IP> mtu 1390
Then I can access the WMT Engineers GUI even though the MTU of my local link is still set to 1500. The advantage of this approach is that traffic that isn't anything to do with the VPN remains unaffected.

It is left as an exercise for the reader* to write a script which does a DNS lookup of the VPN server domain name to find out its current IP address, then finds out the current route to that IP address, then adds a new route using the same gateway and a reduced MTU.

*Meaning: "I can't be bothered to do this, but you can if you want to."
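For what it's worth, a minimal sketch of such a script might look like the following. The domain name is a placeholder, and it assumes a single IPv4 address and the iproute2 tools:

Code: Select all

#!/bin/sh
# Sketch only: resolve the VPN server, find the gateway currently used
# to reach it, then add a host route with a reduced MTU.
VPN_HOST="vpn.example.org"   # placeholder for the WMT VPN server name
MTU=1390                     # adjust to suit your broadband connection

# Resolve the server's current IPv4 address (first address returned).
VPN_IP=$(getent ahostsv4 "$VPN_HOST" | awk '{print $1; exit}')

# Extract the gateway from the route currently used to reach that IP.
GATEWAY=$(ip route get "$VPN_IP" | awk '{for (i=1; i<NF; i++) if ($i == "via") print $(i+1); exit}')

# Add (or update) a host route to the server with the reduced MTU.
sudo ip route replace "$VPN_IP" via "$GATEWAY" mtu "$MTU"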

Setting a link or route MTU at the client end is probably the most efficient way to deal with the problem.

In theory you could limit the outbound MTU at the WMT end, so that the packets are always smaller than any typical broadband MTU. However, doing so risks further complications. I would be inclined to just continue with the current set-up, setting a route MTU when you connect to the VPN.

One final point: the current workaround (setting a lower MTU at the client end) will only fully solve the problem for TCP connections. Any application using UDP to transfer data out of the model town through the VPN will still come a cropper.

I don't think we're likely to use any applications that use UDP rather than TCP, but if we ever did then the only way to make it work outbound through the VPN would be to limit the outbound MTU at the model town end, to make sure it is lower than the MTU of any typical broadband connection. (I think most broadband connections are at least 1400, so something like 1350 would do the job.)
TerryJC
Posts: 2616
Joined: 16/05/2017, 17:17

Re: An investigation into the MTU issues

Post by TerryJC »

Patrick,

That is a very impressive piece of research :!:

I mentioned to Hamish the other day that we probably don't need to use an MTU value of 576 now that we have a 'proper' broadband connection at the WMT end. It was Hamish who identified MTU as the solution to (and presumably the cause of) the problems we were having back in July. For me, the main problem was that I was only getting about 20 lines of a Results file when I tried to read it. Changing the MTU value locally completely solved that at the time, but the broadband speed that we were getting was very, very low (sub 2 Mbit/s, I recall, and it's the upload speed that's needed here).

I'll be interested to see what Hamish's thoughts are on this.
Terry
hamishmb
Posts: 1891
Joined: 16/05/2017, 16:41

Re: An investigation into the MTU issues

Post by hamishmb »

This is interesting, but unless I missed it, there isn't actually a reason for wanting a higher MTU (except for UDP, which we aren't using anyway)?

Extra information on the VPN server:

The River system facing interface on the VPN server has the default MTU (probably 1500), and the internet-facing side was set to 576 because that was the only way I could make it work.

Hamish
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

Re: An investigation into the MTU issues

Post by PatrickW »

hamishmb wrote: 04/12/2020, 12:59 This is interesting, but unless I missed it, there isn't actually a reason for wanting a higher MTU (except for UDP, which we aren't using anyway)?
In this isolated situation, probably not.

But, I am the sort of person who will prioritise the replacement of a blown lightbulb, even if the room it lights up has minimal aesthetic value and the remaining lightbulbs provide adequate functional illumination. It's both a strength and a weakness. This was a rather self-indulgent investigation.

On a big-picture scale, a larger MTU reduces the load on routers and endpoints (fewer packets to process) and makes more efficient use of bandwidth (less bandwidth taken up by protocol headers), so it becomes more important to consider the larger and more heavily loaded a network is, although there are diminishing returns the larger you make the MTU. The model town shouldn't ever have to worry about any of that, though. (Unless it somehow ends up hosting services for thousands of users.)

Setting a smaller MTU as a workaround for issues caused by differences in MTU is also something that has the potential to compound problems in a large system. E.g. if our ISPs solved a problem by decreasing MTU as a workaround, then they would probably end up causing new problems for people using IPsec VPNs. Again, not really applicable to this isolated situation. Then again, who knows what weird and wonderful MTU-sensitive protocols someone might attempt to run through the VPN in years to come.
hamishmb wrote: 04/12/2020, 12:59 The River system facing interface on the VPN server has the default MTU (probably 1500), and the internet-facing side was set to 576
That would make sense, and is essentially what I was suggesting as the solution to make UDP applications work. However, it behaves as though it has been set up the other way around, allowing large packets over the internet-facing interface, but demanding 576 byte ones over the river-system-facing interface. I draw this conclusion from the results of:

Code: Select all

ping -c 1 -M do -s 1200 192.168.0.30
and

Code: Select all

ping -c 1 -M do -s 1200 192.168.0.2
Which show that 1228 bytes is too much to reach 192.168.0.2, but not too much to reach 192.168.0.30.

In a packet capture, the response to each of these pings is a single packet, larger than 576 bytes, so there doesn't seem to be a 576 byte limit for traffic towards the Internet from the VPN server.

I don't have access to verify the VPN server's link MTU configuration myself, but the pings seem fairly conclusive.

Having the MTU set on the river system interface probably still has some effect on making it work, though.
hamishmb
Posts: 1891
Joined: 16/05/2017, 16:41

Re: An investigation into the MTU issues

Post by hamishmb »

I did it wrong, your investigation is right - I set the MTU on the river system interface, not the internet-facing one. I guess it works anyway though, because it's still limiting the MTU for data coming from the pis.
Hamish
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

Re: An investigation into the MTU issues

Post by PatrickW »

hamishmb wrote: 04/12/2020, 18:34 it's still limiting the MTU for data coming from the pis.
That depends on the application. Mainly, it limits the MTU for data going to the Pis.
hamishmb
Posts: 1891
Joined: 16/05/2017, 16:41

Re: An investigation into the MTU issues

Post by hamishmb »

I guess I could fix it, but I don't know if it will break something else if I try :lol:

One for when there's more time and headspace I guess.
Hamish
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

Re: An investigation into the MTU issues

Post by PatrickW »

Yes, I would leave it for now.

It turns out 1454 was not actually the correct MTU for our home broadband ISP. It should have been 1492. 1454 was a hangover from a previous ISP, left in place because ISPs never prompt you to change it when you switch, and because the router has a stern warning not to change it unless the ISP tells you to! With that changed, I can now access the Engineers GUI without configuring MTU workarounds. Happy days.

And if I ping 1500 byte packets from outside with DF not set, the packets get fragmented into 1492 byte pieces to get down the line, which is an improvement (previously, 1500 byte packets from the outside were just lost).