Can’t Be Any Faster Than That: A Real-Life Experiment with Latency in a Geo-distributed Environment

Introduction

A frequently asked question is how much latency Galera adds in geo-distributed environments. After all, snail mail can also be used for database replication, but the latency would not be acceptable.

Let’s see how Galera performs.

A Two-Datacenter Setup

We start by creating a two-node cluster using a pair of Amazon EC2 regions that are as far apart Internet-wise as it gets – Sydney and São Paulo. The underwater cable maps do not show a direct link between Australia and South America.

The ICMP round-trip time between the two regions is a steady 316 ms:


root@ip-172-31-4-77:/home/ubuntu# ping ec2-54-94-217-199.sa-east-1.compute.amazonaws.com
PING ec2-54-94-217-199.sa-east-1.compute.amazonaws.com (54.94.217.199) 56(84) bytes of data.
64 bytes from ec2-54-94-217-199.sa-east-1.compute.amazonaws.com (54.94.217.199): icmp_seq=1 ttl=51 time=316 ms
64 bytes from ec2-54-94-217-199.sa-east-1.compute.amazonaws.com (54.94.217.199): icmp_seq=2 ttl=51 time=316 ms
...

The following bare-bones configuration lines in my.cnf are enough to get us started:


[mysqld]
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=256M
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options="gcache.size=300M;gmcast.segment="
wsrep_node_address=""
wsrep_cluster_address="gcomm://"

We pick a unique numeric gmcast.segment ID for each region so that Galera knows which nodes belong to the same region. This becomes important once a region holds more than one node.
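As an illustration of the per-region segment assignment, the following sketch builds the wsrep_provider_options line for a node from its region. The region names, the segment IDs 1 to 3, and the helper name provider_options are hypothetical and not taken from the setup above; the 0 to 255 range check is an assumption about the values gmcast.segment accepts.

```python
# Hypothetical mapping of regions to segment IDs; the actual IDs used in
# the experiment are not stated in the text.
SEGMENT_IDS = {"sydney": 1, "sao_paulo": 2, "northern_virginia": 3}


def provider_options(region: str, gcache_size: str = "300M") -> str:
    """Return a my.cnf line with the segment ID for the given region."""
    segment = SEGMENT_IDS[region]
    # Assumption: gmcast.segment is an 8-bit value (0 to 255).
    if not 0 <= segment <= 255:
        raise ValueError(f"segment ID out of range: {segment}")
    return (
        f'wsrep_provider_options="gcache.size={gcache_size};'
        f'gmcast.segment={segment}"'
    )


for region in SEGMENT_IDS:
    print(f"# {region}")
    print(provider_options(region))
```

Every node in the same region would receive the same line, while nodes in different regions differ only in the segment number.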

Let’s see how Galera deals with a simple 1,000-byte insert:


mysql> create table t1 (f1 varchar(1000)) engine=innodb;
Query OK, 0 rows affected (0.37 sec)

mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.36 sec)

The operation takes 360 ms, only about 44 ms more than the 316 ms ICMP round-trip time between the two regions. This indicates that there is no unnecessary back-and-forth when replicating the data: the Galera protocol is not chatty and commits the write in roughly a single round trip, which is close to the minimum the network allows.
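As a rough check, the measured commit time can be expressed in units of the measured round-trip time. The sketch below uses only the two figures reported above; the variable names are illustrative.

```python
# Figures taken from the ping output and the INSERT timing above.
rtt_ms = 316.0     # ICMP round-trip time between Sydney and São Paulo
commit_ms = 360.0  # time reported by the mysql client for the INSERT

overhead_ms = commit_ms - rtt_ms   # time spent beyond one round trip
round_trips = commit_ms / rtt_ms   # commit time in units of one RTT

print(f"overhead over one round trip: {overhead_ms:.0f} ms")
print(f"commit time in round trips:   {round_trips:.2f}")
```

A ratio well below 2 suggests that the commit did not need a second cross-region exchange. Note that the mysql client reports times with a resolution of 10 ms, so the overhead figure is approximate.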

A Three-Datacenter Setup

We now add another node, this time in Northern Virginia, to see how its presence will impact the overall latency of the database. The ICMP round-trip time between Sydney and Northern Virginia is 238 ms and between São Paulo and Northern Virginia is 119 ms.

Here is how Galera fares:


# Inserting from Sydney:

mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.35 sec)

# Inserting from São Paulo:

mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.34 sec)

# Inserting from Northern Virginia:

mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.34 sec)

As you can see, adding a third node halfway across the world from the first two did not increase the latency at all. The insert times stay within a few tens of milliseconds of the 316 ms round trip between Sydney and São Paulo, because Galera does not require additional sequences of round trips to keep more nodes synchronized.
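To put the three-region numbers side by side, the sketch below compares each reported insert time with the slowest link in the cluster. It uses only the round-trip times and timings quoted above; comparing against the slowest link is a simplification chosen for illustration, not a statement about Galera's internals.

```python
# ICMP round-trip times reported in the text, in milliseconds.
rtt_ms = {
    frozenset({"Sydney", "São Paulo"}): 316,
    frozenset({"Sydney", "Northern Virginia"}): 238,
    frozenset({"São Paulo", "Northern Virginia"}): 119,
}

# INSERT timings reported by the mysql client, converted to milliseconds.
insert_ms = {"Sydney": 350, "São Paulo": 340, "Northern Virginia": 340}

# The slowest link in the cluster (Sydney to São Paulo).
worst_rtt = max(rtt_ms.values())

# Extra time each insert took beyond one round trip on the slowest link.
extra_ms = {origin: t - worst_rtt for origin, t in insert_ms.items()}

for origin, t in insert_ms.items():
    print(f"{origin:18s} {t} ms ({extra_ms[origin]:+d} ms vs. slowest link)")
```

All three origins land within about 35 ms of the slowest link's round-trip time, which matches the two-node result.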

More Nodes Per Region

Let’s add another node in an existing remote region, in this case São Paulo. We configure both nodes at the same location with the same value for gmcast.segment.

And here are the numbers:


# Inserting from Sydney:
mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.35 sec)

# Inserting from São Paulo #1
mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.35 sec)

# Inserting from São Paulo #2
mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.34 sec)

# Inserting from Northern Virginia
mysql> insert into t1 values (REPEAT('a',1000));
Query OK, 1 row affected (0.32 sec)

There is no impact whatsoever from the presence of a second node in one of the most remote regions. Galera does not consume additional round-trips. Furthermore, the gmcast.segment facility ensures that all data travels only once to São Paulo and is then replicated internally to the two nodes in that region.
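The effect of segments on cross-region traffic can be illustrated with a simple counting model. This is a sketch under stated assumptions, not Galera's actual relaying logic: it assumes that with segments one copy of a write set is sent to each remote segment and fanned out locally, and that without segments one copy is sent to every remote node. The node names and segment IDs are hypothetical.

```python
def wan_copies(origin: str, segment_of: dict, use_segments: bool = True) -> int:
    """Count copies of one write set that cross region boundaries.

    Simplified model: nodes in the origin's own segment are reached
    locally and never count as cross-region traffic.
    """
    origin_segment = segment_of[origin]
    remote_nodes = [n for n, s in segment_of.items() if s != origin_segment]
    if use_segments:
        # One copy per remote segment; a node there relays it locally.
        return len({segment_of[n] for n in remote_nodes})
    # One copy per remote node.
    return len(remote_nodes)


# Hypothetical segment IDs for the four-node cluster described above.
segment_of = {
    "sydney": 1,
    "sao_paulo_1": 2,
    "sao_paulo_2": 2,
    "northern_virginia": 3,
}

print(wan_copies("sydney", segment_of, use_segments=True))   # 2
print(wan_copies("sydney", segment_of, use_segments=False))  # 3
```

In this model a write from Sydney crosses to São Paulo once rather than twice, and the saving grows with each node added to a remote region.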

Replicating Large Writes

What about large writes? Can Galera handle them? Let's find out. With some tuning, we get:


mysql> INSERT INTO t2 VALUES (REPEAT('a', 512 * 1024)),(REPEAT('a', 512 * 1024)),(REPEAT('a', 512 * 1024)),(REPEAT('a', 512 * 1024)),(REPEAT('a', 512 * 1024));
Query OK, 5 rows affected (0.67 sec)

In other words, we can ship 2.5 MB of data in less than a second within a single transaction.
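The size and effective throughput of that transaction follow directly from the statement and its timing. The sketch below only does the arithmetic; it ignores row and protocol overhead, so the figures are approximate.

```python
rows = 5
row_bytes = 512 * 1024  # each row holds REPEAT('a', 512 * 1024)
elapsed_s = 0.67        # time reported by the mysql client

total_bytes = rows * row_bytes
total_mib = total_bytes / (1024 * 1024)

# Effective rate over the whole commit, including the cross-region latency.
throughput_mbit_s = total_bytes * 8 / elapsed_s / 1_000_000

print(f"payload:    {total_bytes} bytes ({total_mib:.1f} MiB)")
print(f"throughput: {throughput_mbit_s:.1f} Mbit/s")
```

Because the elapsed time also contains at least one cross-region round trip, the rate at which the data itself moved over the link was higher than this effective figure.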

Conclusion

Latency between regions is unavoidable because of the finite speed of light and the performance of the routing hardware. However, Galera's optimized protocols ensure that no time is wasted on additional round trips between regions and that no data is duplicated unnecessarily during transmission.