Paxos Made Transparent: presenting CRANE, an SMR system to replicate general server programs.

Cui, Gu, Liu, Chen and Yang

Paxos Made Transparent: What we will cover

  • Introduction and problem description
  • State Machine Replication and Paxos
  • Deterministic Multithreading and PARROT
  • Time-bubbling and CRANE
  • Implementating CRANE
  • Evaluating CRANE
  • Conclusion and Class Discussion

Introduction and Problem Description

In a network we have a situation as portrayed below. Several clients making requests from a server:

A happy network is one where the server is available.

This means that replication to handle faults that may occur is a desirable feature. So let's replicate everything a server does in a simple, consistent, practical, and reliable way!

Oh that's easy! We just need to:

understand and implement Paxos.

Also to make it work more generally, combine it with a deterministic scheduler that can reliably handle many threads coming in asynchronously.

Oh, and of course, pick a cool bird acronym!

Paxos is an island off the west coast of Greece. Leslie Lamport imagined an ancient parliamentary system:

“ Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers.
As we learned in lecture on Tuesday, Paxos is a distributed algorithm where asynchronous client requests can be processed in a fault tolerant manner via consensus. It is a protocol for state machine replication or SMR.
  • State Machine $\Rightarrow$ consists of a set of states, a set of transitions between states, and a current state. Transition to a new state happens after an operation is issued. The new state can be the same state, modeling a read only operation.
  • If this machine can no longer transition states, then it has crashed. But in an asynchronous environment, we can't tell if a machine is slow or it has crashed.
Enter State Machine Replication or SMR:
  • It masks failures, like crashes.
  • Surely we want to replicate deterministic machines, so same input + same program = same output.
  • For example, a DNS server.
Ok, this is a distributed systems class, so we are getting rid of that deterministic thing right?

Yes!

..ish..

We need to distinguish determinism and stability:
Enter PARROT...not an acronym, parrots are just cool!

A practical runtime for deterministic, stable and reliable threads.

  • It reduces the number of schedules for all inputs.
  • It allows for programmer added "hints" to prioritize computations.
  • It will ignore determinism for performance critical sections.
So a Parrot in Paxos...we are done right?

Hi I'm a client! S'up, client here. Me too. Don't forget me, I'm a client.

...and we are out of sync...

Traffic is bursty:

For example: two clients make simultaneous HTTP PUT and GET requests for the same page.

Different arrival times lead to different outcomes!

Enter time-bubbling!

Two parameters for this technique:

  • $W_{timeout}$: "the physical duration that the primary's DMT scheduler waits before it request consensus on a time bubble insertion."
  • $N_{clock}$: "the number of logical clocks within each time bubble."

Correctly ReplicAting Nondeterministic Executions

Yay! They did it!

  • Uses CRIU (open-source tool) to checkpoint registers and memory.
  • Uses LXC (Linux containers - like Docker) to checkpoint file system state.
It is easy to use:

diff -ruN httpd-2.2.11/modules/generators/mod_cgid.c httpd-2.2.11-modified/modules/generators/mod_cgid.c
--- httpd-2.2.11/modules/generators/mod_cgid.c	2015-06-23 19:11:58.484786127 +0800
+++ httpd-2.2.11-modified/modules/generators/mod_cgid.c	2015-06-23 19:12:56.676785084 +0800
@@ -564,8 +564,10 @@
     ap_log_error(APLOG_MARK, APLOG_ERR, err, r->server, "%s", description);
 }
 
+#include 
 static int cgid_server(void *data)
 {
+    tern_disable_sched_paxos();  
     struct sockaddr_un unix_addr;
     int sd, sd2, rc;
     mode_t omask;
	    
an example of a performance hint
github.com/columbia/crane

Evaluation

  • They tested CRANE using a set of 3 replicas:
    • Linux 3.13.0, 1Gbps bandwidth LAN
    • 2.80 GHz dual-socket hex-core Intel Xeon with 24 hyperthreading cores
    • 64GB memory, and 1TB SSD

  • On 5 widely used server programs:
    Apache Mongoose
    MediaTomb ClamAV
    MySQL

Discussion

  • CRANE can be summarized as "PARROT (stable deterministic multithreading) with PAXOS" - why is both stability and determinism important?

  • What are some limitations of the CRANE and/or PARROT systems?
  • CRANE performance depends on adding PARROT "performance hints" to indicate "these are the major computations to line up." The authors emphasize the brevity of these hints - what would you suggest to also make it easier to diagnose and correctly insert them?

  • What suggestions for enhancement or further work would you make for the CRANE system (with a primary goal of encouraging adoption and broad utility)?

The End

nitrofix.github.io/distributed-birds