There is an online college bookstore app where many students can purchase books. Every time a student logs on, it shows a list of suggestions based on their previous purchase history. The SQL Server that stores the customer data is in Seattle, but these students are logging on from all over the world. Therefore, performance may suffer and those who are farther away across the WAN may experience a time lag for queries.
Instead of having the students who are farther away suffer slow page loading times, replication may be used to copy and maintain database objects in multiple sites and sync later so that consistency is maintained. Each site keeps the portion of the database that contains the data that is most relevant to them and most frequently used. Now, each student can purchase books on the website, and the data will be synced later.
How Data Replication Works
There are several server components, and they take on different roles in order to implement replication. A publisher role is a database instance where the source of the data resides, and it contains objects that are designed as replication articles. These articles are grouped together and published in a publication so that the data is replicated as a unit. The publisher can have multiple publications.
A distributor role is a database instance that holds the distribution databases. Each publisher is mapped to a single distribution database that stores the replicated data from the publisher that is to be passed to the subscriber. The distributor could be set up as a local distributor, meaning a single server instance can serve in the roles of both the publisher and distributor. If a distributor is configured on separate servers, it is referred to as a remote distributor.
A subscriber role is the instance(s) that receives the replicated data by subscribing to the publications. The subscriber is not limited and is eligible to receive data from multiple publishers, and the objects may be updated depending on the replication type. If applicable, the publisher would receive these changes from the subscriber and republish the data.
In general, the subscriber receives changes to the data in two ways: through push subscription or pull subscription. The difference is in which server component performs the updates. With push, the distributor pushes or directly updates the subscriber database. With pull, the subscriber checks in with the distributor to see if there were any changes, and it will perform the update itself.
Three Types of Data Replication
To implement replication, several agents are used to carry out the jobs associated with copying changes, keeping track of changes, and distributing the data. What agents are needed depends on the type of replication used. There are three main types of replication.
1. Snapshot Replication
Snapshot replication is the simplest type of data replication and is used if the data is not changed as often or if small volumes of data need to be replicated. For example, if there are tables that don’t get updated a lot, then Snapshot Agents can be used to copy the entire database once or repeatedly according to a schedule. Then, a Distribution Agent is responsible for transferring these files to the subscriber.
This technique requires little maintenance because what gets distributed is a snapshot of the data at a specific moment in time. Also, there is no need to monitor for changes because every time a subscriber receives an update, it overwrites the entire copy of the data.
Unfortunately, copying an entire database may contribute to high latency or more waits than desirable. Generating snapshots requires holding locks on objects. It is not convenient if data is being changed often and likely to impact performance—for example, if the publisher has lots of insert, update, and delete activity.
In addition to using the Snapshot Agents to create the snapshots, transactional replications also take advantage of Log Reader Agents that run at the distributor. The Log Reader Agent reads transaction logs of the publisher database and only delivers the marked changes instead of waiting on an entire database. This provides flexibility because it gives you room to decide how much of the database to publish (e.g., a column). Then, the Distribution Agent moves the transactions to the subscribers, and where it runs will accommodate the push and pull subscription strategies respectively.
2. Transactional Replication
Standard transactional replication implies that the data at the subscriber is read-only. However, there are different publication types that allow modifications to be made at the subscriber. If these changes are made, they can be relayed back to the publisher to republish. The Queue Reader Agent is used for bidirectional transactional replication, and it will read changes from the queue and apply those at the publisher.
Transactional replication is very beneficial in a server-to-server environment where changes can be made at the publisher and at the subscriber in real time—for example, real-time data relating to what flights are currently available for an airline. It does not make sense to use snapshot replication in this case because updates are typically synced once per day or according to a schedule.
3. Merge Replication
Merge replication is like transaction replication, but it allows updates at both the subscriber and publisher to be merged together. Many subscribers can go offline, make updates to the data at different times, and then go back online and sync those changes later.
This type of replication is likely to be used in server-to-client environments such as mobile clients. Like snapshot and transaction replication, the initial snapshot is created by the Snapshot Agent, but then the Merge Agent will track the changes and resolve conflicts with triggers. If multiple subscribers are updating the same rows, they can cause an issue. Therefore, conflict resolution needs to be accounted for.