Why Chiara_PEAR_Server will not implement push-based mirroring

Gregory Beaver
Lot 49: Music, Life
May 19, 2005


While investigating unrelated code, I stumbled upon this article:

The part I find most directly applicable to the question of how to implement mirroring in Chiara_PEAR_Server is quoted below:

[quote]The push and pull methods both fail if the consumer isn’t informed of a change in the producer. The difference however is this, which party holds the responsibility for ensuring compatibility. In general the initiating party holds responsibility. That is the initiating party makes the decision as to what is the most compatible access path. Also, in general there are more client implementations than server implementations, therefore it is less demanding a task for a client implementation to upgrade than a server. From a global perspective the amount of effort is identical, however at the local level (where it really counts) the effort differs significantly. Therefore a pull based implementation has a greater chance of ensuring compatibility.

An example of asynchronous publish and subscribe implemented in RESTful fashion is the ad-hoc RSS network. Let’s compare RSS a polled pull asynchronous model versus SMTP a pushed asynchronous model. The common explanation why email spam is difficult to fix because the SMTP has no mechanism to authenticate the source of the email message. However, the problem lies deeper than that, because SMTP is a push network, for it to function correctly, all the nodes of the network need to be upgraded in unison. Furthermore, message propagation are initiated by the source and not by the consumer it becomes more difficult to avoid spoofing.

RSS by contrast has less difficulty upgrading its infrastructure, in fact today despite having multiple incompatible variants, the RSS network keeps on ticking. Spoofing is much more difficult since transfers are initiated by the consumer. One would have to hack in the access path in each and every consumer to pretend to be someone else. The network can be incrementally upgraded because a change in the producers format doesn’t break consumers, it’s simply ignored. Consumers choose the access path and therefore can choose the path that is most compatible. For example if I have a news aggregator that doesn’t understand ATOM, I can always select a path (i.e. FeedBurner) that translates the ATOM feed to an older format.

Yet again we see a pattern recurring, that is the consumer has the right to ignore certain features. It’s easy to be reminded of Postel’s Law, that is “Be liberal in what you accept, and conservative in what you send”.[/quote]

This is the primary reason that Chiara_PEAR_Server’s interaction with mirrors will always be pull-based rather than push-based. In practical terms, this means that a Chiara_PEAR_Server-based primary site will never need to know about its mirrors, and upon a data change, such as a new package release, Chiara_PEAR_Server will not attempt to notify its mirrors about the change. Instead, the mirror will be responsible for pulling the new data from the primary site.
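To make the division of responsibility concrete, here is a minimal in-memory sketch in Python. It is not Chiara_PEAR_Server code: the `Site` and `Mirror` classes, their methods, and the file path are all hypothetical names chosen for illustration. The point it tries to show is that the primary keeps no list of mirrors and notifies nobody on a release, while the mirror does all the work of asking.

```python
class Site:
    """A site that only serves read-only files. It keeps no list of mirrors."""

    def __init__(self):
        self.files = {}  # path -> content

    def publish(self, path, content):
        # A new release only changes local state; nobody is notified.
        self.files[path] = content

    def listing(self):
        return sorted(self.files)

    def fetch(self, path):
        return self.files[path]


class Mirror(Site):
    """A mirror knows its upstream; the upstream does not know the mirror."""

    def __init__(self, upstream):
        super().__init__()
        self.upstream = upstream

    def pull(self):
        """Copy every file that is missing or different; return updated paths."""
        updated = []
        for path in self.upstream.listing():
            content = self.upstream.fetch(path)
            if self.files.get(path) != content:
                self.files[path] = content
                updated.append(path)
        return updated


# Hypothetical file name, used only for the demonstration.
primary = Site()
primary.publish("foo-1.0.0.tgz", b"release data")

mirror = Mirror(primary)
first_pull = mirror.pull()   # the mirror discovers the release on its own schedule
second_pull = mirror.pull()  # nothing changed upstream, so nothing is copied
```

Note that `Site` has no attribute or method that mentions mirrors at all, which is the whole design choice in one observation.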

This will automatically allow for far more complex mirroring scenarios, such as a mirror pulling data from another mirror. Imagine a situation where the primary site, located in New York, is almost always inaccessible to a mirror site located in Papua New Guinea. However, a mirror site located in New Delhi is accessible to both the New York site and the Papua New Guinea mirror. The New Delhi mirror could simply pull data from the Chiara_PEAR_Server-based site, and the Papua New Guinea mirror could pull from the New Delhi mirror. None of them need to actively communicate back and forth (the mirrors must both be listed in channel.xml, but email should be sufficient to arrange that, as it is a one-shot event traversing through many hubs). If we implemented push-based mirroring, the New Delhi site would be forced to run Chiara_PEAR_Server as well, and there would need to be a complex arrangement of which mirrors it should push to and which it should not. With pull-based mirroring, there is no need for anything complex, and even a simple “grab the files and run” download sync-based mirror should be more than sufficient to implement complete mirroring with little effort at all. Implementing extremely complex mirroring paths is in fact a piece of cake when you take push-based mirroring out of the equation.
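The three-site scenario can be sketched the same way. This is only an illustration in Python: the site identifiers, the `REACHABLE` and `UPSTREAM` tables, and the `pull` function are assumptions of the sketch, not anything Chiara_PEAR_Server defines. Each mirror is configured with exactly one upstream, and the primary appears in no configuration at all.

```python
# Which sites each mirror can actually connect to (assumed for this scenario).
REACHABLE = {
    "new-delhi": {"new-york"},
    "papua-new-guinea": {"new-delhi"},  # cannot reach new-york directly
}

# Each mirror's only configuration: where to pull from.
# The primary ("new-york") has no entry because it pulls from nobody
# and pushes to nobody.
UPSTREAM = {
    "new-delhi": "new-york",
    "papua-new-guinea": "new-delhi",
}


def pull(site, store):
    """Copy the upstream's files into `site`, if the upstream is reachable."""
    upstream = UPSTREAM[site]
    if upstream not in REACHABLE[site]:
        raise ConnectionError(f"{site} cannot reach {upstream}")
    store[site] = dict(store[upstream])


# Hypothetical file name and contents.
store = {
    "new-york": {"foo-1.0.0.tgz": b"release data"},
    "new-delhi": {},
    "papua-new-guinea": {},
}

pull("new-delhi", store)         # New Delhi pulls from the primary
pull("papua-new-guinea", store)  # Papua New Guinea pulls from New Delhi
```

Adding a fourth mirror would mean adding one line to that mirror's own configuration; nothing upstream of it would need to change.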

However, to reduce bandwidth waste, it will be important at some point to implement a changelog XML file that mirrors can use to determine which files need to be synchronized. This will most likely be the next thing I add to the REST-based XML generation.
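Since that changelog does not exist yet, the following Python sketch only shows how a mirror might use one. The XML layout (a `<changelog>` root with `<change path="..." time="..."/>` entries carrying Unix timestamps), the file paths, and the `files_to_sync` function are all invented for illustration and should not be read as the eventual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical changelog; element names, attributes, and paths are assumptions.
CHANGELOG = """\
<changelog>
  <change path="r/foo/allreleases.xml" time="1116460800"/>
  <change path="r/foo/1.0.0.xml" time="1116460800"/>
  <change path="r/bar/0.9.0.xml" time="1116374400"/>
</changelog>
"""


def files_to_sync(changelog_xml, last_sync):
    """Return the paths changed after `last_sync` (a Unix timestamp), sorted."""
    root = ET.fromstring(changelog_xml)
    changed = {
        change.get("path")
        for change in root.iter("change")
        if int(change.get("time")) > last_sync
    }
    return sorted(changed)


# A mirror that last synchronized between the two timestamps above
# only needs to download the two files for package foo.
stale = files_to_sync(CHANGELOG, last_sync=1116400000)
```

The mirror would then download only the paths in `stale` and record the time of this sync for the next run, instead of re-fetching every file.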

Side note: Chiara_PEAR_Server’s REST implementation is not the most complex, as all data is read-only. If we needed to model any server modification via POST requests, mirroring would probably be a bit trickier, but not for the mirrors, as the primary host would simply update the static xml upon receiving a POST-based REST request. Fortunately, I don’t need to worry about this, because there are no plans in the near or far future to implement REST-based remote modification of a channel server — the security headaches involved are tremendous, and it is a lot easier to control things through the html backend.
