Pinned toot

This Mastodon account will be used for all maintenance, upgrade and downtime information about the Organise.Earth and Rebellion.Global family of servers.

Please ensure that your XR Branch designates a tech coordinator to monitor this account.

Downtime is anticipated.

Love and Code,

@xradmin

Further update from the datacenter:

"Short update: ceph cluster is almost fully recovered, all VMs we checked are reachable. There is however a network problem persisting that causes packets from the server network to the outside world (including DNS requests to our own dns servers) to randomly drop. We are currently determining which of the switches is the main cause for this."

Due to repeated downtime events, and with something of a heavy heart, we are forced to consider moving to a new datacenter. An update on the decision will be provided in Mattermost.

A further outage was experienced today, affecting all servers at the datacenter, DataCenterLight. Statement from the datacenter:

"Update: due to a cascading network error from a broken switch, DNS resolution failed in the ceph cluster, causing slow failures of OSDs. We're recovering ceph cluster now."

An outage was experienced by some users reaching Organise.Earth between 15:50 and 16:30 UTC on Jun 23. It is being investigated.

There will be some intermittent service disruption of Mattermost, Forms, Cloud and other platforms in the next 30 minutes, during a database tuning operation.

An issue was corrected at forms.organise.earth where corrupted tables were causing 501 errors when attempting to load surveys or survey results. The database has been repaired and service can now resume.

Mattermost at Organise.Earth will go down for approximately 10 minutes of scheduled maintenance at 01:00 UTC, Weds 10 Jun.

Organise.Earth was down for some hours last night (UTC) due to a disk containing an encrypted data partition locking up. The reason it ended up in exactly this state is unclear and is being investigated. No data was lost. Apologies to all teams affected by this interruption of service.

From the datacenter:

"So from what we can see at the moment what happened is: the server was upgraded, the default ip6tables was replaced by an nftables wrapper, the migration was triggered and *before* the ip6tables alternative was reset to the legacy/old ip6tables setting an error was thrown as the nftables based variant complains about old rules being present."

Server has been restored. Still investigating the actual cause. The team at the datacenter made some alterations to routing tables there that affected other servers, but it is unclear whether those caused the Organise.Earth outage.

Organise.Earth core server is down. Investigating.

Service is back up. Docker Registry (docker image repository) now enabled.

The plugin has been reactivated and seems stable.

The Mattermost BBB integration plugin has been temporarily disabled post-upgrade, due to issues with the plugin reaching the API endpoint.
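A quick way to test the kind of reachability failure described is to poll the BigBlueButton API base URL, which answers with an XML SUCCESS body when healthy; a minimal sketch with a hypothetical hostname:

    #!/usr/bin/env python3
    # Reachability check against a BigBlueButton API base URL; a healthy
    # server answers the bare api path with an XML body containing
    # <returncode>SUCCESS</returncode>. Hostname is hypothetical.
    import urllib.request

    URL = "https://bbb.example.org/bigbluebutton/api"

    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            body = resp.read().decode()
            print(resp.status, "SUCCESS" if "SUCCESS" in body else body[:200])
    except OSError as exc:
        print("unreachable:", exc)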

pad.organise.earth is down due to a corrupted table, `etherpad.store`, seemingly caused by the earlier shared-disk read I/O issues (now remedied at the datacenter). A database repair operation is underway.
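For the curious, a repair of this sort typically runs the stock mysqlcheck tool against the affected table, assuming a MySQL/MariaDB backend; note that engines without REPAIR TABLE support (e.g. InnoDB) generally need a dump and reload instead. A sketch with an illustrative invocation:

    #!/usr/bin/env python3
    # Check, then repair, the affected table with the stock mysqlcheck
    # tool. Database and table names are from the post; credentials and
    # engine support are assumptions.
    import subprocess

    for action in ("--check", "--repair"):
        subprocess.run(["mysqlcheck", action, "etherpad", "store"], check=True)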

Disruptions at the Swiss datacenter as they urgently upgrade the Ceph cluster to remedy a serious issue with shared-disk read performance discovered by XR Global Tech, affecting all servers in the datacenter. We were given no forewarning of this event, and so could not alert teams in time. Apologies for the downtime.

The Global Base is presently down for a rebuild, after a failed plugin addition that introduced no issues in staging yet did in production, for reasons as yet unknown.
