Although we have published several posts about replication on our blog (for instance, on replication being single-threaded, on semi-synchronous replication, and on estimating replication capacity), I don’t think we have one that covers the very basics of how MySQL replication works at a high level. Or it was written so long ago that I can no longer find it. So I decided to write one now.
Of course, there are many aspects to MySQL replication, but my main focus will be on the logistics: how replication events are written on the master, how they are transferred to the replication slave, and how they are applied there. This is not a HOWTO on setting up replication. Instead, it’s a HowStuffWorks kind of thing.
MySQL Replication events
I say replication events in this article because I want to avoid discussing the different replication formats. They are well documented in the MySQL manual. Put simply, the events can be one of two kinds:
- Statement-based: in this case, the events are write queries.
- Row-based: in this case, the events are changes to rows, row diffs if you will.
Other than that, I won’t be going back to the differences between replication formats, mainly because very little is different when it comes to transporting the data changes.
On the master
Let me begin with what happens on the master. For replication to work, the master must first write replication events to a special log called the binary log. This is generally a very lightweight operation (assuming events aren’t synchronized to disk), since the writes are buffered and sequential. The binary log files contain the data that replication slaves will read later.
When a replication slave connects to the master, the master starts a new thread to handle the connection (similar to the one used for any other server client) and then does whatever the client, the replication slave in this case, asks. Most of that will be (a) feeding the replication slave with events from the binary log and (b) notifying the slave about new events written to the master’s binary log.
Slaves that are up to date are mostly reading events that are still held in the OS cache on the master, so the master does not need any physical disk reads to send binary log events to the slave(s). If you connect a replication slave that is a few hours or even days behind, however, it will begin by reading binary logs written several hours or even days ago. The master may no longer have them cached, so disk reads will occur. If the master doesn’t have spare IO capacity, you may feel a bump at that point.
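To make the idea of feeding a slave from a given point concrete, here is a minimal toy sketch. It is not how MySQL implements this: the `events_after` helper, the list-based log, and the byte offsets are all made up for illustration. It only shows that a slave asking from a recent position gets the tail of the log, while one asking from an old position makes the master go back through everything.

```python
from typing import Iterator, List, Tuple

# A toy binary log: (end_position, payload) pairs in the order they were written.
# The positions are invented offsets, not real MySQL binary log positions.
BinlogEvent = Tuple[int, str]


def events_after(binlog: List[BinlogEvent], start_pos: int) -> Iterator[BinlogEvent]:
    """Yield every event that ends after start_pos, oldest first (hypothetical helper)."""
    for end_pos, payload in binlog:
        if end_pos > start_pos:
            yield end_pos, payload


binlog = [
    (120, "INSERT INTO t VALUES (1)"),
    (240, "UPDATE t SET a = 2"),
    (360, "DELETE FROM t WHERE a = 2"),
]

# An up-to-date slave asks only for the tail of the log...
recent = list(events_after(binlog, 240))
# ...while a slave that is far behind needs the whole log read again.
everything = list(events_after(binlog, 0))
```

In the real server the old part of the log is what may have fallen out of the OS cache, which is where the extra disk reads come from.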
On the replica
Now let’s look at what happens on the slave. When you start replication, two threads are started on the slave:
1. IO thread
This thread, called the IO thread, connects to the master, reads binary log events from the master as they come in, and copies them to a local log file called the relay log. That’s all.
Although there’s only one thread reading the binary log from the master and one writing the relay log on the slave, copying replication events is very rarely the slowest part of replication. Network latency may cause a steady delay of a few hundred milliseconds, but that’s about it.
If you want to see where the IO thread currently is, check the following fields in “show slave status\G”:
- Master_Log_File: the last file copied from the master (most of the time, it will be the same as the last binary log written by the master)
- Read_Master_Log_Pos: the binary log from the master has been copied to the relay log on the slave up to this position.
You can then compare these with the output of “show master status\G” on the master.
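As a rough sketch of that comparison, the function below takes the two outputs as plain dictionaries and reports how many bytes of binary log the IO thread has yet to copy. The function name and the sample values are hypothetical. The keys `File` and `Position` are the column names of “show master status”, and the other two come from “show slave status”. Because the two commands run at different moments, treat the result as an approximation.

```python
from typing import Mapping, Optional


def io_thread_backlog(master_status: Mapping[str, object],
                      slave_status: Mapping[str, object]) -> Optional[int]:
    """Return bytes of binary log the IO thread has not copied yet.

    Returns None when master and slave are on different binary log files,
    because the gap then depends on file sizes this function does not know.
    """
    if master_status["File"] != slave_status["Master_Log_File"]:
        return None
    return int(master_status["Position"]) - int(slave_status["Read_Master_Log_Pos"])


# Hypothetical values, as they might appear in the two command outputs.
master = {"File": "mysql-bin.000042", "Position": 1500}
slave = {"Master_Log_File": "mysql-bin.000042", "Read_Master_Log_Pos": 1200}

backlog = io_thread_backlog(master, slave)
print(backlog)
```

A small or zero value means the IO thread is keeping up. `None` means the slave is still reading an older binary log file.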
2. SQL thread
The second thread, the SQL thread, reads events from the relay log stored locally on the replication slave (the file written by the IO thread) and applies them as quickly as possible.
This is the thread that is often blamed for being single-threaded. Going back to “show slave status\G”, you can determine the current position of the SQL thread from these variables:
- Relay_Master_Log_File: the binary log from the master that the SQL thread is “working on” (in reality, it is working on the relay log, so this is just a convenient way to display the information)
- Exec_Master_Log_Pos: the position in the master’s binary log up to which the SQL thread has executed events.
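Putting the IO thread and SQL thread fields side by side shows how much has been copied to the slave but not yet applied. The sketch below assumes the “show slave status” row is available as a dictionary. The function name and the sample values are hypothetical, and the field names are the ones listed above.

```python
from typing import Mapping, Optional


def sql_thread_backlog(slave_status: Mapping[str, object]) -> Optional[int]:
    """Return bytes of master binary log copied by the IO thread but not yet applied.

    Returns None when the SQL thread is still on an older master binary log
    file than the IO thread, since the gap then spans files of unknown size.
    """
    if slave_status["Relay_Master_Log_File"] != slave_status["Master_Log_File"]:
        return None
    return (int(slave_status["Read_Master_Log_Pos"])
            - int(slave_status["Exec_Master_Log_Pos"]))


# Hypothetical values from a single "show slave status" row.
status = {
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": 1200,
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": 900,
}

pending = sql_thread_backlog(status)
print(pending)
```

Since all four fields come from the same command output, this comparison is a consistent snapshot, unlike a comparison against the master.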