The cluster replicates changes synchronously through global ordering, but applies these changes asynchronously from the originating node out. To prevent any one node from falling too far behind the cluster, Galera Cluster implements a feedback mechanism called Flow Control.
Nodes queue the write-sets they receive in the global order and begin to apply and commit them on the database. In the event that the received queue grows too large, the node initiates Flow Control. The node pauses replication while it works the received queue. Once it reduces the received queue to a more manageable size, the node resumes replication.
Galera Cluster provides global status variables for use in monitoring Flow Control. These break down into those status variables that count Flow Control pause events and those that measure the effects of pauses.
SHOW STATUS LIKE 'wsrep_flow_control_%';
Running these status variables returns only the node’s present condition. You are likely to find the information more useful by graphing the results, so that you can better see the points where Flow Control engages.
For instance, using myq_gadgets:
$ mysql -u monitor -p -e 'FLUSH TABLES WITH READ LOCK;' \ example_database $ myq_status wsrep Wsrep Cluster Node Queue Ops Bytes Flow Conflct time name P cnf # name cmt sta Up Dn Up Dn Up Dn pau snt dst lcf bfa 09:22:17 cluster1 P 3 3 node3 Sync T/T 0 0 0 9 0 13K 0.0 0 101 0 0 09:22:18 cluster1 P 3 3 node3 Sync T/T 0 0 0 18 0 28K 0.0 0 108 0 0 09:22:19 cluster1 P 3 3 node3 Sync T/T 0 4 0 3 0 4.3K 0.0 0 109 0 0 09:22:20 cluster1 P 3 3 node3 Sync T/T 0 18 0 0 0 0 0.0 0 109 0 0 09:22:21 cluster1 P 3 3 node3 Sync T/T 0 27 0 0 0 0 0.0 0 109 0 0 09:22:22 cluster1 P 3 3 node3 Sync T/T 0 29 0 0 0 0 0.9 1 109 0 0 09:22:23 cluster1 P 3 3 node3 Sync T/T 0 29 0 0 0 0 1.0 0 109 0 0
You can find the slave queue under the Queue Dn column and FC pau refers to Flow Control pauses. When the slave queue rises to a certain point, Flow Control changes the pause value to 1.0. The node will hold to this value until the slave queue is worked down to a more manageable size.
See Also: For more information on status variables that relate to flow control, see Galera Status Variables.
When Flow Control engages, it notifies the cluster that it is pausing replication using an FC_Pause event. Galera Cluster provides two status variables that monitor for these events.
In addition to tracking Flow Control pauses, Galera Cluster also allows you to track the amount of time since the last SHOW STATUS query during which replication was paused due to Flow Control.
You can find this using one of two status variables:
Galera Cluster provides two sets of parameters that allow you to manage how nodes handle the replication rate and Flow Control. The first set controls the write-set cache, the second relates to the points at which the node engages and disengages Flow Control.
These three parameters control how nodes respond to changes in the replication rate. They allow you to manage the write-set cache on an individual node.
gcs.recv_q_hard_limit This sets the maximum write-set cache size (in bytes). The parameter value depends on the amount of RAM, swap size and performance considerations.
The default value is SSIZE_MAX minus 2 gigabytes on 32-bit systems. There is no practical limit on 64-bit systems.
gcs.max_throttle This sets the smallest fraction to the normal replication rate the node can tolerate in the cluster. If you set the parameter to 1.0 the node does not throttle the replication rate. If you set the parameter for 0.0, a complete replication stop is possible.
The default value is 0.25.
gcs.recv_q_soft_limit This serves to estimate the average replication rate for the node. It is a fraction of the gcs.recv_q_hard_limit. When the replication rate exceeds the soft limit, the node calculates the average replication rate (in bytes) during this period. After that, the node decreases the replication rate linearly with the cache size so that at the gcs.recv_q_hard_limit it reaches the value of the gcs.max_throttle times the average replication rate.
The default value is 0.25.
When the node estimates the average replication rate, it can reach a value that is way off from the sustained replication rate.
The write-set cache grows semi-logarithmically with time after the gcs.recv_q_soft_limit and the time needed for a state transfer to complete.
These parameters control the point at which the node triggers Flow Control and the factor used in determining when it should disengage Flow Control and resume replication.
gcs.fc_limit This parameter determines the point at which Flow Control engages. When the slave queue exceeds this limit, the node pauses replication.
It is essential for multi-master configurations that you keep this limit low. The certification conflict rate is proportional to the slave queue length. In master-save setups, you can use a considerably higher value to reduce Flow Control intervention.
The default value is 16.
gcs.fc_factor This parameter is used in determining when the node can disengage Flow Control. When the slave queue on the node drops below the value of gcs.fc_limit times that of gcs.fc_factor replication resumes.
The default value is 0.5.
Bear in mind that, while it is critical for multi-master operations that you use as small a slave queue as possible, the slave queue length is not so critical in master-slave setups. Depending on your application and hardware, the node can apply even 1K of write-sets in a fraction of a second. The slave queue length has no effect on master-slave failover.
Warning: Cluster nodes process transactions asynchronously with regards to each other. Nodes cannot anticipate in any way the amount of replication data. Because of this, Flow Control is always reactive. That is, it only comes into affect after the node exceeds certain limits. It cannot prevent exceeding these limits or, when they are exceeded, it cannot make any guarantee as to the degree they are exceeded.
Meaning, if you were to configure a node with:
That node can still exceed that limit from a 1Gb write-set.