A good analogy can often be useful to explain intricate technical details. In an earlier article, “Wrong Fish Food”, I related an analogy that I used to describe a technical issue to a non-technical audience. This article shares an analogy I created for a technical audience, because sometimes even techies need an analogy to grasp an unfamiliar technology.
For the past few years I’ve been responsible for maintaining a subsystem that allows a point-of-sale system to communicate with external applications. These external applications may display information gathered from the POS and poke inputs back into it. I was involved in the initial development of the subsystem back in 2004 or so, and when the other developers left for other positions it fell to me to continue enhancing and supporting the subsystem.
I wasn’t happy with the first implementation because of some design problems, but I couldn’t change the design too much without impacting existing integrations with the subsystem. Instead, over the course of a year or so I made several improvements to the bits of code that integrated the subsystem with other applications and improved the API that the subsystem exposes.
Considering the number of transactions it supports every day across hundreds of installations, I think the subsystem is rather fast and stable. However, whenever there is a performance issue with an application that uses the subsystem, the subsystem itself tends to get blamed, and I am usually called upon to investigate. While some scenarios can trigger bottlenecks in the subsystem, the problems I investigate are pretty much never the fault of the subsystem.
Finally, after having to explain the intricate technical details yet again, I decided to come up with an analogy to explain how the subsystem works, how it could be a bottleneck in some scenarios, and why it usually wasn’t the bottleneck in spite of the possibility. I e-mailed the analogy to all interested parties, and it seems to have had the desired effect. I’ve edited the analogy to remove specific product and subsystem names and provided it here. In the wrap-up, I changed the name of the subsystem to NPS, for Note-Passing Subsystem.
The Story of Olympic Note Passing
We’re going to imagine a strange sport called Olympic Note Passing. Two runners, runner A and runner F, run around adjacent circular tracks. Each time runner A reaches the outside portion of his track he picks up a message that he will deliver to runner F. Runner F will then deliver this message to the other side of track F. Never mind why this happens; some sports are just unusual (think golf or curling).
Now, the first time this sport was introduced into the Olympics, the messages were passed from runner A to runner F via a specially designed, and somewhat magical, box. This box accepts messages via a device like a bill acceptor, and on the other side of the box is a device like a bill dispenser that spits out formatted, translated copies of the messages that were put into the other side. This way, runners from different nations that speak different languages can still participate in the sport and understand the messages being passed from A to F.
Runners on the two tracks could run at independent speeds, so that runner A could stop for just a moment and place a message into the box before taking off around the track again, and runner F could stop for just a moment to pick up any messages that were waiting.
There was a problem with this magic box, however. If runner A was placing messages into one side of the box, runner F could not take any messages out of the box until A was finished. Likewise, if runner F was taking messages out, runner A couldn’t place any messages into the box. This turned out to be a problem because the runners had to keep stopping and waiting for each other.
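For the programmers in the audience, the original box behaves roughly like a shared resource guarded by a single lock. The following is only a minimal sketch of that idea, not the real subsystem: the `MagicBox` class, its `busy` lock, and the `timeout` parameter are all hypothetical names made up for illustration.

```python
import threading
from collections import deque


class MagicBox:
    """Hypothetical model of the original box: one lock guards both sides."""

    def __init__(self):
        self.busy = threading.Lock()  # only one runner may use the box at a time
        self._messages = deque()

    def insert(self, message, timeout=-1):
        """Runner A's side. Waits for the box; returns False if it stayed busy
        for `timeout` seconds (-1 means wait as long as it takes)."""
        if not self.busy.acquire(timeout=timeout):
            return False
        try:
            self._messages.append(message)
            return True
        finally:
            self.busy.release()

    def extract_all(self, timeout=-1):
        """Runner F's side. Returns the waiting messages, or None if the box
        stayed busy for `timeout` seconds."""
        if not self.busy.acquire(timeout=timeout):
            return None
        try:
            taken = list(self._messages)
            self._messages.clear()
            return taken
        finally:
            self.busy.release()


if __name__ == "__main__":
    box = MagicBox()
    box.insert("order 1")
    with box.busy:  # pretend runner F is in the middle of using the box
        print(box.insert("order 2", timeout=0.05))  # runner A gives up waiting
    print(box.extract_all())
```

In this sketch, whichever runner arrives second has to stand at the box until the first one lets go, which is exactly the stopping and waiting described above.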
This state of affairs threatened to kill the fledgling sport before it ever got started, since spectators didn’t want to see the two runners stopping, starting, and waiting for each other all the time, so the Olympic committee had to come up with a solution. One intrepid committee member suggested designing new magic boxes so that messages could be inserted and extracted at the same time, but the committee had already spent a lot of money designing the existing boxes, and any new boxes would have to be the same size, shape, and color as the existing boxes and would have to work EXACTLY the same way in every other respect. Furthermore, the guy that designed the original boxes was now working for NASCAR and doing equally silly things to the Car of Tomorrow.
The way the committee decided to fix this problem was to station a person on either side of the magic box: helper A, and helper F. Now, runner A just hands messages to helper A on the way around the track, and helper A inserts the message into the machine while runner A keeps moving. Likewise, helper F extracts any waiting messages from the machine and hands them off to runner F when he comes around the track. If helper A sees that the machine is busy and he can’t insert any messages, he will just put the new messages in his pocket until the machine is available. Helper F will pull messages out of the machine as soon as they are available and keep them in his pocket until runner F comes around the track. If helper F has no messages, he will wave runner F on around the track.
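In code, the helpers and their pockets look a lot like worker threads with their own queues. Here is a rough sketch under that assumption, using Python's standard `queue` and `threading` modules. The names `helper`, `run_race`, `pocket_a`, `pocket_f`, and the `DONE` sentinel are hypothetical, and the box is modeled as a queue that holds one message at a time; none of this is taken from the actual implementation.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the race


def helper(source, sink):
    """Move messages from `source` to `sink`, one at a time.

    The helper is the one who waits on a busy box, never the runner."""
    while True:
        message = source.get()  # blocks only this helper thread
        sink.put(message)       # blocks only this helper if the sink is full
        if message is DONE:
            return


def run_race(messages):
    pocket_a = queue.Queue()       # helper A's pocket (unbounded)
    box = queue.Queue(maxsize=1)   # the magic box: one message at a time
    pocket_f = queue.Queue()       # helper F's pocket (unbounded)

    helpers = [
        threading.Thread(target=helper, args=(pocket_a, box)),  # helper A
        threading.Thread(target=helper, args=(box, pocket_f)),  # helper F
    ]
    for h in helpers:
        h.start()

    # Runner A hands each message to helper A and keeps running.
    for message in messages:
        pocket_a.put_nowait(message)
    pocket_a.put_nowait(DONE)

    for h in helpers:
        h.join()

    # Runner F comes around the track and takes whatever helper F is holding.
    delivered = []
    while True:
        try:
            message = pocket_f.get_nowait()
        except queue.Empty:
            break
        if message is not DONE:
            delivered.append(message)
    return delivered


if __name__ == "__main__":
    print(run_race(["note 1", "note 2", "note 3"]))
```

The runners only ever use the non-blocking calls (`put_nowait` and `get_nowait`), so in this sketch neither of them stops at the box. All of the waiting has moved into the helper threads.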
Now, we’ll bring all of this around to the actual NPS design. Runner A is an AwesomePOS system, and runner F is FooApp. The magic box is NPS. The messages being passed through the magic box are NPS messages. The redesigned sport, with helper A and helper F, represents the new NPS integration design introduced at SuperCustomer in late 2009 and early 2010. The people stationed on either side of the magic box are the independent threads that are now running in AwesomePOS and FooApp that service the NPS message queue.
With this picture in mind, it should be a little more apparent why NPS is pretty much never the source of performance issues in current implementations, and why it can still be a performance bottleneck if the integration is not done carefully. If AwesomePOS or FooApp has to keep stopping and starting to deal with NPS, it can impact the performance of the entire system. With the new design, however, NPS is much less of a bottleneck, since runners A and F can now keep moving around their tracks a lot more smoothly. Even so, messages can get from one side of track A to the other side of track F no more quickly than the slower runner can move.