For the duration of this post, [domain] shall have the implicit value of a FQDN.
Problem Description
Attempts to fetch the public room list fail.
Research initially suggested that an improper federation setup was the cause.
However, further research and testing indicate that the Cloudflare proxy is responsible. Disabling it is not a complete solution, though.
Steps to Reproduce
On a vanilla matrix-synapse install with the homeserver set up and working, and Cloudflare used for DNS
From within the local network (internal)- Attempt to fetch the public room list from matrix.org using the Element matrix client
Receive "Failed to fetch room list"
Use a VPN or phone hotspot to reach the homeserver from an external network
Repeat the attempt to fetch the public room list from matrix.org using the Element matrix client
Receive "request failed: CORS request rejected: [domain]_matrix/client/r0/publicRooms?server=matrix.org"
Expected Results
Receive list of public rooms
Actual results
From the internal network- "Failed to fetch room list"
From external to network- "request failed: CORS request rejected: [domain]_matrix/client/r0/publicRooms?server=matrix.org"
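To see whether the request is actually passing through the Cloudflare proxy and whether a CORS header comes back, the same endpoint can be probed outside of Element. This is only a rough sketch: the function names are made up for illustration, `homeserver` stands in for [domain], and it sends no access token, so the server may well answer with an error status. The response headers are the interesting part either way.

```python
import urllib.error
import urllib.parse
import urllib.request


def public_rooms_url(homeserver: str, remote: str) -> str:
    """Build the client-API URL that Element requests for a remote room list."""
    query = urllib.parse.urlencode({"server": remote})
    return f"https://{homeserver}/_matrix/client/r0/publicRooms?{query}"


def summarize_headers(headers) -> dict:
    """Pick out the headers relevant to the CORS / Cloudflare question."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "cors_origin": lowered.get("access-control-allow-origin"),
        # Cloudflare-proxied responses normally carry a CF-RAY header
        # and "Server: cloudflare".
        "via_cloudflare": "cf-ray" in lowered
        or lowered.get("server", "").lower() == "cloudflare",
    }


def probe(homeserver: str, remote: str = "matrix.org", timeout: float = 30.0) -> dict:
    """Fetch the URL once and report status plus the headers of interest."""
    request = urllib.request.Request(public_rooms_url(homeserver, remote))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status, headers = response.status, response.headers
    except urllib.error.HTTPError as err:
        # Error responses (401, 403, 5xx) still carry headers worth inspecting.
        status, headers = err.code, err.headers
    return {"status": status, **summarize_headers(headers)}
```

Running `probe("your.domain")` once from the internal network and once from outside should show whether the two paths differ in `via_cloudflare` or in the CORS header.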
System setup information
Router with HAProxy forwarding 80,443,8448 traffic to freedombox.
Local DNS server on network.
Freedombox is installed in a VM with two network connections-
First network is set to be internal and firewall rules enforce no WAN traffic allowed to address.
Second is set to be external and firewall rules enforce no LAN communication to any address but local DNS server.
Freedombox is NOT serving DHCP or any other networking service.
Known Goods-
FreedomBox page can be accessed from both internal and external paths; users can login from either route.
FreedomBox can resolve the matrix-federation.matrix.org domain, as the records on the local DNS server confirm.
Matrix server accessible inside and outside home network; normal communications between users.
[domain]/.well-known/matrix/server is set and accessible from internal and external paths.
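Since the .well-known file decides which host and port other servers use for federation, it can help to check what it actually delegates to. A minimal sketch, assuming the file has the usual `{"m.server": "host:port"}` shape; the function name is hypothetical, and it skips the SRV lookup that a real server would try before falling back to port 8448.

```python
import json


def delegated_server(well_known_body: str, default_port: int = 8448):
    """Return (host, port) from the body of /.well-known/matrix/server."""
    target = json.loads(well_known_body)["m.server"]
    host, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No explicit port: assume the federation default (SRV lookup omitted).
    return target, default_port
```

For example, `delegated_server('{"m.server": "matrix.example.org:443"}')` gives `("matrix.example.org", 443)`. The port matters here because whatever sits in front of the homeserver has to pass that port through.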
Findings Thus Far-
Initial research into the internal network error suggested an issue with the federation setup.
[domain]/_matrix/key/v2/server is accessible from the internal network but not from external- This was discussed in several federation articles but I'm not sure what role it plays
Setting Cloudflare to straight DNS allows me to see room listings, but joining still fails after a period of time, causing me to circle back to the above regarding the _matrix/key/v2/server route
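As far as I understand it, other homeservers fetch /_matrix/key/v2/server to get this server's signing keys, so it needs to be reachable from outside for federation to work. Here is a small sketch for checking that route and the federation version route from both sides of the network. The helper names are made up, and port 8448 is assumed because that is the federation port forwarded in the setup above.

```python
import urllib.error
import urllib.request


def federation_probe_urls(host: str, port: int = 8448) -> list:
    """URLs that a remote homeserver would need to reach on this server."""
    base = f"https://{host}:{port}"
    return [
        f"{base}/_matrix/key/v2/server",
        f"{base}/_matrix/federation/v1/version",
    ]


def check(url: str, timeout: float = 10.0) -> str:
    """Return a short description of what happened when fetching url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        return f"unreachable: {err}"


def report(host: str, port: int = 8448) -> dict:
    """Check every probe URL and map it to its result."""
    return {url: check(url) for url in federation_probe_urls(host, port)}
```

Calling `report("your.domain")` from inside the LAN and again over the VPN or hotspot should make it obvious which routes only work internally.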
What logs should I check? What further information can I produce to help solve this?
I'm passing the federation test at https://federationtester.matrix.org/ with Cloudflare set to straight DNS service (no proxy).
All green according to it.
I am now able to fetch room lists. But "Joining" shows for a minute or two before throwing "Failed to join"
Looking at logs, a few things jump out.
1- An absurd quantity of the following in the logs: synapse.http.site - 353 - INFO - PUT-5754- Connection from client lost before response was sent
Mostly PUTs, but GETs as well.
2- A bunch of these: synapse.http.server - 690 - WARNING - PUT-6605- Not sending response to request <SynapseRequest at 0x7fb07a8454f0 method='PUT' uri='/_matrix/federation/v1/send/1651286692393' clientproto='HTTP/1.1' site='8448'>, already disconnected.
Often followed by: synapse.access.https.8448 - 427 - INFO - PUT-6605- ::ffff:192.168.1.1 - 8448 - {monero.social} Processed request: 142.479sec/-112.475sec (0.002sec, 0.000sec) (51.023sec/0.004sec/1) 0B 200! "PUT /_matrix/federation/v1/send/1651286692393 HTTP/1.1" "Synapse/1.57.1" [0 +++dbevts]
3- Not sure if this is helpful, but I see a bunch of these as well. Different domains, but the PDU and EDU values are always the same: synapse.federation.transport.server.federation - 102 - INFO - PUT-6660- Received txn 1641052295767 from serious.im. (PDUs: 0, EDUs: 1)
4- Sometimes the item 3 lines above are followed by: synapse.util.caches.response_cache - 256 - INFO - PUT-6660- [fed_txn_handler]: using incomplete cached result for [('serious.im', '1641052295767')]
When I attempt a join of a public server, a lot of the above but nothing else in the logs stands out. No ERROR or FAILURE or such.
Closest thing I get is synapse.handlers.federation_event - 1283 - INFO - POST-3976- Persisting 1909 of 12039 remaining outliers: ['$mRfLRqTa5t6_8y0rFn-Kmzszwhu77TFQ3gxo452pzRc', '$ddkZ8s_3Qczuh4c4HA86Jt5eJFPwg2Mpv4yW2KR0Dc0', '$2GvN01slN8XilH1CD08KpHLjpH8Lf9cdTNisr-W3P4o', +++'$pEYf1hAnCEn1qgfZdooseNmgouKmEAuDTwswNGbAHxg', '$0erwxwskvL1NT9n29W6i7-yj__6N4224fMuZgEchtdY', ...]
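With that many lines, counting is easier than eyeballing. Below is a rough sketch for tallying the "Connection from client lost" messages and listing the federation requests that took a long time to process. It assumes the first number after "Processed request:" is the processing time in seconds, which is how I read the Synapse access log format, and the 60 second threshold is just a guess at when the sending server gives up. The function and constant names are my own.

```python
import re
from collections import Counter

# Captures the origin server in {...} and the first timing value.
PROCESSED = re.compile(
    r"\{(?P<origin>[^}]*)\} Processed request: (?P<seconds>\d+(?:\.\d+)?)sec/"
)
LOST = "Connection from client lost before response was sent"


def summarize(lines, slow_after: float = 60.0):
    """Count dropped connections and collect slow requests per origin."""
    counts = Counter()
    slow = []
    for line in lines:
        if LOST in line:
            counts["client_lost"] += 1
        match = PROCESSED.search(line)
        if match:
            counts["processed"] += 1
            seconds = float(match.group("seconds"))
            if seconds >= slow_after:
                slow.append((match.group("origin"), seconds))
    return counts, slow
```

Feeding it the log file, e.g. `summarize(open(path))` with `path` pointing at wherever the homeserver log lives on your install, gives a quick picture of whether the disconnects line up with requests like the 142 second one above.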
New update- When checking the phone Element client this morning for messages on my own server, I noticed notifications that did NOT correspond to my server or private messages with my users.
Lo and behold, two of the three or four rooms I had tried joining yesterday (from desktop) had succeeded. Fetching messages is measurable in geological epochs though.
Starting my desktop client reveals the same situation. Two rooms visible, nonexistent loading of messages.
Confirmed- Whatever happened with federation, it was the source of the new bad connectivity for existing clients/users, as well as the slow loading of messages, the outright rejection/failure to post messages, and other issues.
Restoring from a backup taken before the federation work returned service to normal.
Side note- Have backups. Weekly at the least once you get things relatively set. Then back up before any "experimentation" or "fixes". Saves a lot of headaches.
Do you mean an attempt to fetch the list of public rooms hosted on the synapse setup on your Freedombox?
It may be unrelated, but I tested synapse a little and noticed that joining any room on matrix.org with an account on my freedombox (running on an Olinuxino Lime2) using Quaternion takes several minutes. When it comes to fetching a list of public rooms, perhaps I did not wait long enough.
That somehow encouraged me to give up matrix and stick to XMPP and IRC.
And I traced that issue (it seems) to the Cloudflare proxy service. Turning it off (i.e. regular DNS service) lets me fetch the public room list. It is fairly snappy and pops up in a second or three.