Matrix Federation Issue and Cloudflare

For the duration of this post, [domain] shall have the implicit value of a FQDN.

Problem Description
Attempts to fetch the public room list of another server fail.
Initial research suggested that an improper federation setup was the cause.
However, further research and testing indicate that the Cloudflare proxy is responsible, although disabling it is not a complete solution.

Steps to Reproduce

  1. Start with a vanilla matrix-synapse install with the homeserver set up and working, and Cloudflare providing DNS
  2. From within the local (internal) network, attempt to fetch the public room list from matrix.org using the Element Matrix client
  3. Receive “Failed to fetch room list”
  4. Use a VPN or phone hotspot to reach homeserver from external network
  5. Repeat attempt to fetch public room list from matrix.org using Element matrix client
  6. Receive “request failed: CORS request rejected: [domain]_matrix/client/r0/publicRooms?server=matrix.org”

Expected Results
Receive list of public rooms

Actual Results
From the internal network: “Failed to fetch room list”
From the external network: “request failed: CORS request rejected: [domain]_matrix/client/r0/publicRooms?server=matrix.org”
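
To take Element out of the picture, the same directory request can be issued directly against the homeserver from both the internal and external path. The following is only a rough sketch: the function names are made up for illustration, the homeserver URL is a placeholder for [domain], and the optional access token is an assumption (Synapse may refuse the request without one, depending on its configuration).

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def public_rooms_url(homeserver: str, remote_server: str) -> str:
    """Build the client-API URL for the room directory of another server."""
    base = homeserver.rstrip("/")
    query = urllib.parse.urlencode({"server": remote_server})
    return f"{base}/_matrix/client/r0/publicRooms?{query}"


def fetch_public_rooms(homeserver: str, remote_server: str,
                       access_token: Optional[str] = None,
                       timeout: float = 30.0):
    """Return (HTTP status, parsed JSON body) for the directory request.

    urllib raises HTTPError for 4xx/5xx answers and URLError for
    connection problems, which is useful here: the exception shows
    whether a proxy or the homeserver itself rejected the request.
    """
    request = urllib.request.Request(public_rooms_url(homeserver, remote_server))
    if access_token:  # hypothetical token taken from an existing client session
        request.add_header("Authorization", f"Bearer {access_token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.load(response)
```

Calling `fetch_public_rooms("https://[domain]", "matrix.org")` once from the LAN and once over the VPN or hotspot should show whether the failure depends on the network path rather than on the client.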

System setup information
Router with HAProxy forwarding traffic on ports 80, 443, and 8448 to FreedomBox.
Local DNS server on network.
FreedomBox is installed in a VM with two network connections:
The first network is internal, and firewall rules allow no WAN traffic to its address.
The second is external, and firewall rules allow no LAN communication to any address except the local DNS server.

FreedomBox is NOT serving DHCP or any other networking service.

Known Goods-
FreedomBox page can be accessed from both internal and external paths; users can login from either route.
FreedomBox can resolve the matrix-federation.matrix.org domain, as records in the DNS server confirm.
Matrix server accessible inside and outside home network; normal communications between users.
[domain]/.well-known/matrix/server is set and accessible from internal and external paths.

Findings Thus Far-
Initial research into the internal-network error suggested an issue with the federation setup.
[domain]/_matrix/key/v2/server is accessible from the internal network but not from the external one. This endpoint was discussed in several federation articles, but I’m not sure what role it plays.
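
For context, other homeservers first read [domain]/.well-known/matrix/server to learn where federation traffic should go, and then fetch the server's signing keys from the key endpoint (`/_matrix/key/v2/server` in the Matrix server-server API) at that location. A minimal sketch of that lookup, assuming the well-known file holds the usual `m.server` entry; the helper names are hypothetical and SRV records are deliberately ignored:

```python
import json
from typing import Tuple

# Port used for federation when the delegation names no port.
# Assumption: no SRV records are published, so this fallback applies.
DEFAULT_FEDERATION_PORT = 8448


def parse_well_known(body: str) -> Tuple[str, int]:
    """Split the "m.server" value of .well-known/matrix/server into host and port."""
    delegated = json.loads(body)["m.server"]
    host, sep, port = delegated.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return delegated, DEFAULT_FEDERATION_PORT


def key_url(host: str, port: int) -> str:
    """URL that remote servers request to obtain this server's signing keys."""
    return f"https://{host}:{port}/_matrix/key/v2/server"
```

If the URL produced by `key_url(...)` cannot be reached from outside the network, remote servers have no way to verify events from this homeserver, which would fit joins that hang and then fail.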

But the CORS error points me to this thread, where a respondent partly blames Cloudflare and the underlying server: Room directory returns an error · Issue #5180 · matrix-org/synapse · GitHub

Setting Cloudflare to straight DNS (no proxy) allows me to see room listings, but joining still fails after a period of time, which brings me back to the _matrix/key/v2/server route mentioned above.

What logs should I check? What further information can I produce to help solve this?

Update:

I’m now passing the federation test at https://federationtester.matrix.org/ with Cloudflare set to straight DNS service (no proxy).
All green according to it.
Now able to fetch room lists. But “Joining” shows for a minute or two before throwing “Failed to join”
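
The federation tester check can also be repeated from a script, which makes it easier to compare results with the Cloudflare proxy on and off. This is a sketch under assumptions: the `/api/report` path, the `server_name` parameter and the `FederationOK` field reflect my understanding of the tester's JSON interface and should be verified against an actual response.

```python
import json
import urllib.parse
import urllib.request

# Assumed JSON endpoint of the public federation tester.
TESTER_API = "https://federationtester.matrix.org/api/report"


def report_url(server_name: str) -> str:
    """Build the tester URL for one server name."""
    return f"{TESTER_API}?{urllib.parse.urlencode({'server_name': server_name})}"


def federation_ok(report: dict) -> bool:
    """True when the report carries a truthy overall verdict (assumed field name)."""
    return bool(report.get("FederationOK", False))


def fetch_report(server_name: str, timeout: float = 60.0) -> dict:
    """Download and decode the report for server_name."""
    with urllib.request.urlopen(report_url(server_name), timeout=timeout) as response:
        return json.load(response)
```

For example, `federation_ok(fetch_report("[domain]"))` with [domain] replaced by the real server name.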

Looking at logs, a few things jump out.

1- An absurd quantity of the following in logs
synapse.http.site - 353 - INFO - PUT-5754- Connection from client lost before response was sent

Mostly PUTs, but some GETs as well.

2- A bunch of these
synapse.http.server - 690 - WARNING - PUT-6605- Not sending response to request <SynapseRequest at 0x7fb07a8454f0 method='PUT' uri='/_matrix/federation/v1/send/1651286692393' clientproto='HTTP/1.1' site='8448'>, already disconnected.
Often followed by
synapse.access.https.8448 - 427 - INFO - PUT-6605- ::ffff:192.168.1.1 - 8448 - {monero.social} Processed request: 142.479sec/-112.475sec (0.002sec, 0.000sec) (51.023sec/0.004sec/1) 0B 200! "PUT /_matrix/federation/v1/send/1651286692393 HTTP/1.1" "Synapse/1.57.1" [0 +++dbevts]
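
To get a feel for how many requests look like the one above, the “Processed request” lines can be pulled out of the log and filtered by duration. The sketch below is my own helper, not part of Synapse. It rests on two assumptions about the access-log format: that the first duration is the time spent processing the request, and that a “!” after the status code marks a response that was never fully sent.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class ProcessedRequest(NamedTuple):
    processing_sec: float  # first duration in the line (assumed: processing time)
    size_bytes: int
    status: int
    disconnected: bool     # True when the status code carries a trailing "!"
    method: str
    path: str


_PROCESSED = re.compile(
    r"Processed request: (?P<proc>-?\d+\.\d+)sec/"
    r".*?\) (?P<size>\d+)B (?P<status>\d{3})(?P<bang>!?) "
    r"\W(?P<method>[A-Z]+) (?P<path>\S+) HTTP"
)


def parse_processed(line: str) -> Optional[ProcessedRequest]:
    """Parse one access-log line; return None for any other kind of line."""
    match = _PROCESSED.search(line)
    if match is None:
        return None
    return ProcessedRequest(
        processing_sec=float(match["proc"]),
        size_bytes=int(match["size"]),
        status=int(match["status"]),
        disconnected=match["bang"] == "!",
        method=match["method"],
        path=match["path"],
    )


def slow_requests(lines: Iterable[str], threshold_sec: float = 30.0) -> List[ProcessedRequest]:
    """Collect parsed requests whose processing time reaches the threshold."""
    parsed = (parse_processed(line) for line in lines)
    return [req for req in parsed if req and req.processing_sec >= threshold_sec]
```

Feeding the homeserver log through `slow_requests(open(path))`, with `path` pointing at wherever the log lives on the system, gives a list that can be grouped by `path` or `disconnected` to see whether inbound federation transactions are the ones timing out.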

3- Not sure if this is helpful, but I see a bunch of these as well. Different domains, but the PDU and EDU values are always the same.
synapse.federation.transport.server.federation - 102 - INFO - PUT-6660- Received txn 1641052295767 from serious.im. (PDUs: 0, EDUs: 1)

4- Sometimes the lines from item 3 are followed by
synapse.util.caches.response_cache - 256 - INFO - PUT-6660- [fed_txn_handler]: using incomplete cached result for [(‘serious.im’, ‘1641052295767’)]

When I attempt to join a room on a public server, I see a lot of the above, but nothing else in the logs stands out. No ERROR or FAILURE or such.

Closest thing I get is
synapse.handlers.federation_event - 1283 - INFO - POST-3976- Persisting 1909 of 12039 remaining outliers: [’$mRfLRqTa5t6_8y0rFn-Kmzszwhu77TFQ3gxo452pzRc’, ‘$ddkZ8s_3Qczuh4c4HA86Jt5eJFPwg2Mpv4yW2KR0Dc0’, ‘$2GvN01slN8XilH1CD08KpHLjpH8Lf9cdTNisr-W3P4o’, +++’$pEYf1hAnCEn1qgfZdooseNmgouKmEAuDTwswNGbAHxg’, ‘$0erwxwskvL1NT9n29W6i7-yj__6N4224fMuZgEchtdY’, …]

New update- When checking the phone Element client this morning for messages on my own server, I noticed notifications that did NOT correspond to my server or private messages with my users.

Lo and behold, two of the three or four rooms I had tried joining yesterday (from desktop) had succeeded. Fetching messages is measurable in geological epochs though.

Starting my desktop client reveals the same situation. Two rooms visible, nonexistent loading of messages.

It’s also definitely sent the server into wonky land. Clients now regularly lose connectivity with the server, then restore it after about 15-20 secs.

Confirmed- Whatever happened with federation, it was the source of the new behavior of bad connectivity for existing clients/users, as well as slow loading of messages, outright rejection/failure to post messages, and other issues.

A restoration from a backup taken before the federation work returned service to normal.

Side note- Have backups. Weekly at the least once you get things relatively set. Then backup before any “experimentation” or “fixes”. Saves a lot of headaches.

Do you mean attempting to fetch the list of public rooms hosted on the Synapse instance on your FreedomBox?

It may be unrelated, but I tested Synapse a little and noticed that joining any room on matrix.org with an account on my FreedomBox (running on an Olinuxino Lime2) using Quaternion takes several minutes. When it comes to fetching a list of public rooms, perhaps I did not wait long enough.

That somehow encouraged me to give up matrix and stick to XMPP and IRC.

No. Public rooms of other servers.

And I traced that issue (it seems) to Cloudflare’s proxy service. Turning it off (i.e. regular DNS-only service) lets me fetch the public room list. It is fairly snappy and pops up in a second or three.