> ...ultimately the problem here has nothing to do with v6.
I agree... more or less. The remainder of this message is a reply to nyrikki, but I'm sticking it under your comment because you might also appreciate how weird this guy's setup looks.
nyrikki: The rest of this message is directed at you:
============================
Actually, what's up with your link-local addresses? They have really odd flags on them.
The only way I can figure that you got into that configuration was to remove the kernel-generated link-local address and add a new one with the arguments 'scope link noprefixroute'. Even if a router on your network advertised a fe80::/64 prefix, that does nothing at all, as hosts are supposed to [0] ignore advertised prefixes that are link-local.
Yeah. After playing around with this for a bit, I can see that your network is either at least as misconfigured as one would be if -say- your DHCP server was giving leases with an invalid default gateway, or it is very, very specially configured for very special reasons.
Starting with the ubuntu-server host in the "IPv4 traffic is REJECTed" configuration from my last comment, we do this on the host to delete the kernel-supplied link-local address and instruct the OS to create an address in the link-local address space that can be used to reach global addresses.
root@ubuntu-server:~# ip addr del fe80::5054:98ff:fe00:64a9/64 dev enp0s3
root@ubuntu-server:~# ip addr add fe80::5054:98ff:fe00:64aa/64 noprefixroute dev enp0s3
root@ubuntu-server:~#
We then configure our upstream router to either
* Send RAs on the local link without a prefix
or
* Send RAs on the local link with a link-local prefix (so they're ignored by the Ubuntu host)
or we hard-code the address of a next-hop router on our host. One (or more) of these three things sets up the host with a default route. If you do none of them, you don't get a default route, and global traffic goes nowhere.
Then -because either you or something running on the host deleted the kernel-provisioned link-local address, and then explicitly instructed the kernel to create a link-local address that can be used to reach global addresses- the local host starts emitting IPv6 traffic with a link-local source address and a global destination address.
When presented with this sort of traffic, my router immediately sends back an ICMP6 "destination unreachable, beyond scope", which immediately terminates the connection attempt on the host, so the behavior ends up being exactly the same as when the host didn't have a misconfigured link-local address. But. You claim to be having trouble.
So, there are one or more things that might be going on that explain your trouble.
1) You have a firewall on this host that is dropping important ICMP6 traffic, causing it to miss the "this destination address is beyond your scope" message from the router. Do. Not. Do. This. ICMP is network-management traffic which tells you important things. Dropping important ICMP traffic is how you have mysterious and annoying failures.
2) Your router is configured to ignore link-local traffic with non-link-local destination addresses, rather than replying that the destination is out of scope. On the one hand, this seems stupid to me, but on the other hand, we got here through a misconfiguration that seems very unlikely to me to happen often, [1] so the router admin might not have thought about it when making "locked down" firewall rules.
3) There's some middlebox on the path to the router that's dropping your traffic because not all that many folks would expect to see link-local source and global destination, and middleboxes are widely known for dropping stuff that's even a little bit abnormal.
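For anyone who wants to see the "link-local source, global destination" condition spelled out, here's a minimal Python sketch using only the standard ipaddress module. The function name scope_mismatch and the destination address are made up for illustration. This only classifies an address pair; it doesn't model what any particular router or firewall actually does with the packet.

```python
import ipaddress

def scope_mismatch(src: str, dst: str) -> bool:
    """Return True for IPv6 traffic with a link-local source and a
    destination outside the link-local range.

    Hypothetical helper: this is the pairing that a router would be
    expected to answer with ICMPv6 Destination Unreachable, code 2
    ("beyond scope of source address", RFC 4443).
    """
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if s.version != 6 or d.version != 6:
        return False
    # Multicast destinations carry their own scope, so leave them out.
    return s.is_link_local and not d.is_link_local and not d.is_multicast

# Source is the address added by hand above; destination is an arbitrary global address.
print(scope_mismatch("fe80::5054:98ff:fe00:64aa", "2001:4860:4860::8888"))
# Link-local to link-local stays on the link, so no mismatch.
print(scope_mismatch("fe80::5054:98ff:fe00:64aa", "fe80::1"))
```

If the first case never produces an error on the host, that points at one of the three explanations above: the ICMP6 reply is being filtered, never sent, or lost along the way.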
Investigating your misconfigured host (and maybe also connected network) has been interesting. I'd love to try to figure out if SystemD can be misconfigured to produce the host configuration that we're seeing (or if this misconfiguration is 100% bespoke), but I hear a hot burrito calling my name. Maybe I'll get bored and do more investigation later.
Also, you might object to my conclusion with "But this couldn't happen on IPv4! Clearly IPv6 is too complicated!". I would reply with "What would happen if your host couldn't get a lease from a DHCPv4 server, autoconfigured an address in the IPv4 link-local (169.254.0.0/16) address range, and the network's upstream router was configured to silently drop traffic from that subnet? At least the IPv6 link-local address range is prohibited from sending traffic off the local link [2] and fails the transmission attempt immediately."
[0] ...and Ubuntu questing does ignore such prefixes...
[1] ...that is, a link-local address that has been configured to handle global traffic...
[2] ...unless -as we've discovered- you specifically tell the OS otherwise...
> So, there are one or more things that might be going on that explain your trouble.
Ah, there's secret option #4:
4) This rather weird configuration has been deliberately set up by the sysadmin that manages this system and network and ordinarily works fine, but the "external transitive failure that happened on April 15th." affected both IPv4 and IPv6 traffic (which, duh, that happens frequently)... but it was an intermittent failure so unrelated changes made by OP caused him to come to the wrong conclusions and point the blame cannon at the wrong part of the system.
Okay. Burrito time!
> Actually, what's up with your link-local addresses? They have really odd flags on them.
They were probably configured by one of the fancy network config daemons (systemd-networkd, dhcpcd or similar). They like to take over RA processing, and they add IPs with "noprefixroute" so they can add the route themselves separately.
RAs have nothing to do with link-locals, but I bet one or the other of those daemons also takes over configuring link-local addresses and does the same thing there. If you look in the routing table, there'll be a prefix route for fe80::/64 that was added by the daemon.
This wouldn't affect how DNS replies are sorted though. On machines without non-link-local v6, AAAA records aren't handled by trying them first and then expecting them to quickly fail. They're handled by pushing them to the bottom of the list so that the A records are tried first.
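A rough Python sketch of that ordering behaviour, in case it helps. This is a deliberately simplified stand-in for the RFC 6724 destination-address rules, not what any real resolver library does line for line. The names has_usable_v6 and order_destinations are invented for the example, and the addresses are documentation-range placeholders.

```python
import ipaddress

def has_usable_v6(local_addrs):
    """True if the host has any IPv6 address that is neither link-local nor loopback."""
    for a in local_addrs:
        ip = ipaddress.ip_address(a)
        if ip.version == 6 and not ip.is_link_local and not ip.is_loopback:
            return True
    return False

def order_destinations(candidates, local_addrs):
    """Order resolved addresses for connection attempts.

    Simplifying assumption: with only link-local IPv6 on the host,
    AAAA results go to the bottom so the A records are tried first.
    Otherwise the resolver's order is kept as-is.
    """
    if has_usable_v6(local_addrs):
        return list(candidates)
    # sorted() is stable: IPv4 (key False) sorts ahead of IPv6 (key True).
    return sorted(candidates, key=lambda a: ipaddress.ip_address(a).version == 6)

# Host with only a link-local v6 address and an IPv4 address.
print(order_destinations(["2001:db8::1", "192.0.2.1"], ["fe80::1", "192.0.2.10"]))
```

The point is that a host with nothing but a link-local v6 address isn't expected to burn time on AAAA destinations before falling back to v4.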