Database replication refers to the frequent copying of data from one node—a database on a server—into another. Think of a database replication system as a distributed database, where all nodes share the same level of information. This system is also known as a database cluster.
The database clients, such as web browsers or computer applications, do not see the database replication system, but they benefit from close to native DBMS behavior.
Many Database Management Systems replicate the database.
The most common replication setup uses a master/slave relationship between the original data set and the copies.
In this system, the master database server logs the updates to the data and propagates those logs through the network to the slaves. The slave database servers receive a stream of updates from the master and apply those changes.
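The log-shipping flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular DBMS implements it: the `Master` and `Slave` classes, their method names, and the sequence-numbered log entries are all hypothetical, and a direct method call stands in for the network.

```python
class Master:
    """Accepts writes, records each one in an ordered log, and ships it to slaves."""

    def __init__(self):
        self.data = {}
        self.log = []      # ordered entries: (sequence_number, key, value)
        self.slaves = []

    def attach(self, slave):
        self.slaves.append(slave)

    def write(self, key, value):
        entry = (len(self.log) + 1, key, value)
        self.data[key] = value
        self.log.append(entry)
        for slave in self.slaves:
            slave.receive(entry)   # stand-in for sending the entry over the network


class Slave:
    """Applies log entries received from the master, strictly in sequence order."""

    def __init__(self):
        self.data = {}
        self.applied = 0   # sequence number of the last applied entry

    def receive(self, entry):
        seq, key, value = entry
        if seq != self.applied + 1:
            raise ValueError("log entry out of order")
        self.data[key] = value
        self.applied = seq


master = Master()
replicas = [Slave(), Slave()]
for replica in replicas:
    master.attach(replica)

master.write("user:1", "alice")
master.write("user:1", "alicia")
```

After the two writes, each replica holds the same data as the master, because every change reaches the replicas only through the master's log.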
Another common replication setup uses multi-master replication, where all nodes function as masters.
In a multi-master replication system, you can submit updates to any database node. These updates then propagate through the network to the other database nodes. All database nodes function as masters. No logs are available, and the system sends no indicators to tell you whether the updates succeeded.
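As a rough illustration of the multi-master idea, the sketch below lets any node accept a write and forward it to its peers. The `Node` class and its methods are hypothetical names chosen for this example, the propagation is a plain in-process call, and conflict handling between concurrent writes is deliberately left out.

```python
class Node:
    """A node that accepts writes locally and forwards them to every peer."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []

    def connect(self, cluster):
        # Every other node in the cluster is a peer of this one.
        self.peers = [node for node in cluster if node is not self]

    def write(self, key, value):
        """Client-facing write: apply locally, then propagate to all peers."""
        self.data[key] = value
        for peer in self.peers:
            peer.apply_remote(key, value)

    def apply_remote(self, key, value):
        """Apply an update that originated on another node, without re-forwarding it."""
        self.data[key] = value


nodes = [Node("a"), Node("b"), Node("c")]
for node in nodes:
    node.connect(nodes)

nodes[0].write("x", 1)   # submitted to node a
nodes[2].write("y", 2)   # submitted to node c
```

Both writes end up on all three nodes regardless of which node received them. A real system would also need a rule for resolving two nodes updating the same key at the same time, which this sketch does not attempt.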
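In addition to the setup of how different nodes relate to one another, there is also the protocol for how they propagate database transactions through the cluster. Synchronous replication, also called eager replication, updates all replicas as part of a single transaction, so every node holds the same value when the transaction commits. Asynchronous replication, also called lazy replication, propagates changes to the other nodes after the originating node has committed.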
In theory, there are several advantages that synchronous replication has over asynchronous replication. For instance:
Traditionally, eager replication protocols coordinate nodes one operation at a time, using a two-phase commit or distributed locking. A system with n nodes that processes o operations at a throughput of t transactions per second generates m messages per second, where:
m = n × o × t
This means that any increase in the number of nodes leads to an exponential growth in the transaction response times and in the probability of conflicts and deadlock rates.
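To get a feel for the coordination cost, the following sketch computes the message load under the assumption that messages per second equal the product of the node count, the operations per transaction, and the transactions per second. The function name and the sample figures (5 operations, 100 transactions per second) are made up for illustration.

```python
def messages_per_second(nodes, operations, throughput):
    """Message load, assuming m = n * o * t (one message per node per operation)."""
    return nodes * operations * throughput


# Hypothetical workload: 5 operations per transaction, 100 transactions per second.
for n in (2, 4, 8):
    print(n, messages_per_second(n, operations=5, throughput=100))
# 2 1000
# 4 2000
# 8 4000
```

Under this assumption the message count alone rises in step with the node count. The sketch does not model response times, conflicts, or deadlocks, which are affected further by the waiting that each additional coordination message introduces.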
For this reason, asynchronous replication remains the dominant replication protocol for database performance, scalability and availability. Widely adopted open source databases, such as MySQL and PostgreSQL, have traditionally provided only asynchronous replication solutions.
There are several issues with the traditional approach to synchronous replication systems. Over the past few years, researchers from around the world have begun to suggest alternative approaches to synchronous database replication.
In addition to theory, several prototype implementations have shown much promise. These are some of the most important improvements that these studies have brought about:
The certification-based replication system that Galera Cluster uses is built on these approaches.