Infrastructure/DB Replication Plan
Databases are currently a single point of failure in our infrastructure. We'd like to come up with something that lets us reboot a db server without downtime. Most of our databases are postgres; one application (the wiki) runs on mysql.
Owners: Toshio (abadger1999), Seth (skvidal), Kevin (nirik)
Ticket: Infra ticket 2718
Features we're looking for
These are the reasons we want db replication. Anything less than this would be unacceptable.
- Want to reboot a db server. A sysadmin manually specifies that db1 is going away and db2 should take over.
- Very short downtime
  - Less than 5 minutes on a switchover/failover event
- No loss of data. Once the db says data is committed, there must be copies on other boxes.
- Performance must meet our current demands, but only our current demands.
  - If we need to service 100 FAS commits per second but the current (unreplicated) service could theoretically handle 1000 commits per second, the replication solution only needs to handle 100, not 1000.
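The must-have list above can be read as an acceptance check to run against each candidate setup. What follows is only a minimal Python sketch: `Candidate`, its field names, and `meets_must_haves` are hypothetical names made up for illustration, and the measurements would have to come from our own testing. The 5 minute ceiling is the one stated above.

```python
from dataclasses import dataclass

MAX_SWITCHOVER_SECONDS = 5 * 60  # "less than 5 minutes" from the list above


@dataclass
class Candidate:
    """Measurements for one replication setup (hypothetical field names)."""
    name: str
    manual_switchover: bool              # sysadmin can tell db2 to take over from db1
    switchover_seconds: float            # measured downtime during a switchover
    commits_replicated_before_ack: bool  # committed data is already on another box
    commits_per_second: float            # measured throughput with replication on


def meets_must_haves(c: Candidate, required_commits_per_second: float) -> bool:
    """Return True if the candidate satisfies every must-have feature."""
    return (
        c.manual_switchover
        and c.switchover_seconds < MAX_SWITCHOVER_SECONDS
        and c.commits_replicated_before_ack
        # Only current demand matters, not what an unreplicated server could do.
        and c.commits_per_second >= required_commits_per_second
    )
```

A candidate that handles 120 commits per second passes against a demand of 100, even if the unreplicated server could do 1000. A candidate that acknowledges commits before they reach a second box fails no matter how fast it is.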
Really really want
If a solution has these and its competition doesn't, chances are we're going to go with that solution.
- Auto failover
  - db1 stops responding; db2 automatically takes over.
- No downtime (as long as one db node is up)
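For auto failover, something has to decide when db1 counts as "stopped responding". One common approach, shown here only as a sketch, is to require several consecutive failed health checks before promoting db2, so that a single dropped probe doesn't trigger a failover. The function name `should_fail_over` and the default threshold of 3 are assumptions for illustration, not something any of the tools under consideration is known to use.

```python
from typing import Iterable


def should_fail_over(health_checks: Iterable[bool], max_failures: int = 3) -> bool:
    """Return True once `max_failures` consecutive checks of db1 have failed.

    `health_checks` is a sequence of probe results, oldest first:
    True means db1 answered, False means it did not.
    """
    consecutive = 0
    for ok in health_checks:
        # A successful probe resets the streak; a failed one extends it.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_failures:
            return True
    return False
```

The threshold trades detection time against false positives: with probes every few seconds, a small threshold keeps failover well inside the 5 minute limit.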
Won't lose sleep over
May I have a pony too?
- Load balancing (reads or writes)
  - Currently we don't have load issues
- Replication to other data centers
postgres technologies to explore
synchronous streaming replication + repmgr
The core replication is built into postgres 9.x: streaming replication arrived in 9.0 and synchronous mode in 9.1. repmgr adds command-line scripts and automation that make switchover and failover much easier to manage.
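Before a planned reboot of db1, it would be worth confirming that a standby is actually replicating synchronously. The sketch below assumes the rows come from the `pg_stat_replication` view on the primary (available in postgres 9.1 and later), fetched by whatever client we prefer and passed in as dictionaries. The function name `has_sync_standby` is hypothetical.

```python
from typing import Iterable, Mapping


def has_sync_standby(replication_rows: Iterable[Mapping[str, str]]) -> bool:
    """Return True if at least one standby is streaming in synchronous mode.

    Each row is expected to carry the `state` and `sync_state` columns
    of the pg_stat_replication view, as queried on the primary.
    """
    return any(
        row.get("state") == "streaming" and row.get("sync_state") == "sync"
        for row in replication_rows
    )
```

The rows could come from a query such as `SELECT application_name, state, sync_state FROM pg_stat_replication;`. If the function returns False, committed data may exist only on db1, so the reboot should wait.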
pgpool-II
pgpool-II is very flexible. We'll have to pick the set of features that gives us what we want and test it heavily to see how it behaves.