Mar 1 ’08

SOA & Response Time: A Generational Issue?

by Editor in z/Journal

Service-Oriented Architecture (SOA) and acceptable Web service response times have always seemed to be at odds. SOA is clearly a major goal of Enterprise Information Technology (EIT), and satisfying users is what EIT is all about.

SOA is focused on asset reuse, reducing costs and risks, and making the overall business more agile. The enterprise is balancing the costs and risks against the:

• Assets’ security

• Assets’ utility aspects to the user

• Richness of the assets’ function to the user

• Ability to make the business more agile by publishing the assets so they can be categorized and reused. 

This article focuses on the utility aspects of assets and how response time affects the utility of an asset, depending on the age of the users. 

Response time is usually measured from the point the requester makes the service request until the service provider completes delivery of the first required information payload back to the requester. A payload might be the message content required for the first screen or the delivery of the messages required to trigger the next sequential operation at the requester. 
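
To make that definition concrete, the sketch below times a request from submission until the first bytes of the payload arrive, rather than until the full response has been read. It assumes a plain HTTP service; the endpoint URL and class name are purely illustrative.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseTimer {
    public static void main(String[] args) throws Exception {
        // Hypothetical service endpoint used only for illustration.
        URL serviceUrl = new URL("http://example.com/service/quote");

        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) serviceUrl.openConnection();
        conn.setRequestMethod("GET");

        try (InputStream in = conn.getInputStream()) {
            // Reading the first byte marks the arrival of the first payload,
            // which is the point this article measures response time to.
            in.read();
            long firstPayloadMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Time to first payload: " + firstPayloadMillis + " ms");
        } finally {
            conn.disconnect();
        }
    }
}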

The basic advice regarding user-perceived response times has been about the same for 40 years [Miller 1968; Doherty & Thadani 1982; Card et al. 1991; Nielsen 2006]; a simple sketch of how these limits might drive user feedback follows the list:  

• 0.1 second or less is about the limit for having a user feel the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result. This also is the response level that Doherty & Thadani dealt with in their 1982 paper, “The Economic Value of Rapid Response Time.” In that paper, they showed that when a computer and its users interact at a pace that ensures neither has to wait on the other, productivity soars, the cost of the work done on the computer tumbles, employees get more satisfaction from their work, and quality tends to improve.

• One second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 second but less than 1 second, but the user does lose the feeling of operating directly on the data, especially if the response time is inconsistent. If a user always gets a 5-second response, he becomes used to it. But when system response jumps, say from 1 to 6 seconds, errors start to occur; the user immediately thinks something is wrong.

• Ten seconds is considered the limit for keeping the user’s attention focused on the system interaction. For longer delays, users will want to perform other tasks while waiting for the system to respond, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users then won’t know what to expect. (Current studies indicate 8 seconds may be the upper limit.) 
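
These limits lend themselves to a simple feedback policy: nothing extra while the wait still feels interactive, a busy cue up to roughly ten seconds, and a percent-done indicator beyond that. The sketch below is one hypothetical way to encode that policy; the class and enum names are illustrative, and the constants come straight from the guidance above.

/** Chooses user feedback based on the expected wait, per the classic limits above. */
public class FeedbackPolicy {

    public enum Feedback { NONE, BUSY_CURSOR, PERCENT_DONE_INDICATOR }

    // Thresholds from the guidance above (0.1 s, 1 s, ~10 s), in milliseconds.
    private static final long INSTANTANEOUS_MS = 100;
    private static final long FLOW_OF_THOUGHT_MS = 1_000;
    private static final long ATTENTION_LIMIT_MS = 10_000;

    public static Feedback forExpectedWait(long expectedMillis) {
        if (expectedMillis <= INSTANTANEOUS_MS) {
            // Feels instantaneous: just display the result.
            return Feedback.NONE;
        }
        if (expectedMillis <= FLOW_OF_THOUGHT_MS) {
            // Noticeable, but flow of thought is preserved: still no special indicator.
            return Feedback.NONE;
        }
        if (expectedMillis <= ATTENTION_LIMIT_MS) {
            // Attention still holds: a simple busy cue acknowledges the wait.
            return Feedback.BUSY_CURSOR;
        }
        // Past roughly ten seconds: show a percent-done indicator.
        return Feedback.PERCENT_DONE_INDICATOR;
    }
}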

Traditional thinking has held that response times should be as fast as possible so users can maintain focus on the task at hand and not introduce errors into the current or next task. The longer the response time, the more errors might occur. Consider the merit of this perspective: 

• When you’re conducting online banking and hit the “pay” button and the system goes away for what seems an eternity, don’t you wonder if the transaction is hung up? Should you reboot? Will you pay that bill twice if you try to exit now?

• At work, you may be doing a task and, if the system response is slow, your mind may go to the vacation you’re about to take. When the system response finally arrives, you may have forgotten what you were doing. So you either enter something wrong or have to stop and review what you’ve done. 

When the system can’t provide an immediate response, the user should see some form of continuous feedback. This might be in the form of a percent-done indicator [Myers 1985]. As a rule, percent-done progress indicators should be used for operations taking more than about 10 seconds. Progress indicators have several pluses; they:

• Reassure the user that the system hasn’t crashed but is working on their task

• Indicate approximately how long the user can expect to wait, allowing the user to do other activities during long waits

• Provide something for the user to look at, making the wait somewhat less painful. 

Have you ever done a Windows update and gotten the progress bar that never moves? Having a progress bar that never progresses is worse than having no bar. You’re sure the update is broken. Should you abort, reboot, wait? 

Don’t put in a progress indicator unless it really works! Consider adding a conditional element for borderline response time assets that inspects the current average response time and interjects dynamic progress indications to the requesters when more than 10-second delays start occurring. The conditional element might report the agreed-upon Service Level Agreement (SLA) use counts against the actual use counts to show latency issues. In old terms from the ’70s and ’80s, we would have called this latency issue peak-hour demand. If the configurations are provisioned for 1,000 instances of asset use, and the actual use is 2,000, insufficient resources exist to maintain the agreed-to service level. Either additional provisioning or new SLAs are required. 
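
One hypothetical shape for such a conditional element is sketched below: it keeps a rolling average of recent response times, turns dynamic progress indications on once the average crosses the 10-second mark, and flags when the actual use count outruns the count the SLA was provisioned for. The class, field, and threshold names are illustrative only, not part of any product.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative monitor for a borderline response time asset (names are hypothetical). */
public class AssetLatencyMonitor {

    private static final long PROGRESS_THRESHOLD_MS = 10_000; // interject progress past ~10 s
    private static final int WINDOW_SIZE = 100;               // recent requests to average over

    private final long provisionedUsesPerHour;                // use count the SLA was sized for
    private final AtomicLong actualUsesThisHour = new AtomicLong();
    private final Deque<Long> recentResponseMillis = new ArrayDeque<>();

    public AssetLatencyMonitor(long provisionedUsesPerHour) {
        this.provisionedUsesPerHour = provisionedUsesPerHour;
    }

    /** Record one completed request and its observed response time. */
    public synchronized void record(long responseMillis) {
        actualUsesThisHour.incrementAndGet();
        recentResponseMillis.addLast(responseMillis);
        if (recentResponseMillis.size() > WINDOW_SIZE) {
            recentResponseMillis.removeFirst();
        }
    }

    /** Average response time over the recent window, in milliseconds. */
    public synchronized double averageResponseMillis() {
        if (recentResponseMillis.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (long t : recentResponseMillis) {
            total += t;
        }
        return (double) total / recentResponseMillis.size();
    }

    /** True when requesters should start seeing dynamic progress indications. */
    public boolean shouldShowProgress() {
        return averageResponseMillis() > PROGRESS_THRESHOLD_MS;
    }

    /** True when actual use has outrun the provisioning the SLA assumed. */
    public boolean slaUseCountExceeded() {
        return actualUsesThisHour.get() > provisionedUsesPerHour;
    }
}

With the article’s numbers, a monitor constructed with a provisioned count of 1,000 would report slaUseCountExceeded() as true once the 1,001st use of the hour is recorded, signaling that either additional provisioning or a new SLA is needed.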

Response Time Paradigm Is Generational  

The following statement may cause a furor in the IT community, but it’s still true: The view of acceptable response time is a generational issue. Some business applications of technology require real-time, sub-second responses. You probably wouldn’t like to be flying around the skies under the control of a system built on an SOA implemented with SOAP Web services! It would be far too slow for a real-time requirement. 

But consider the workplace entry of Generation Y, also known as the Millennials (see Claire Raines’ 2002 book, Managing Millennials). The term Generation Y first appeared in an August 1993 Ad Age editorial to describe children born between 1981 and 1995. The scope of the term has greatly changed since then to include, in many cases, anyone born as early as 1976 and as late as 2000. Generation Y is the end product of Generation X. The X’ers defined themselves by their tech-friendly acceptance of the Internet, desktop computers, and multitudes of other high-tech personal devices. They’re also heavily into multi-tasking and have little tolerance for impediments and delays. The Millennials take this to the extreme and wonder why a process has to run overnight when the application of technology could make its results continuously available. 

Millennials will multi-task around consistent, lengthy response times. Once they’ve experienced the delay, they’ll simply pick up another task during the wait. 

So what does that have to do with anything? OK, think about it. The original views of response time came in the era of transactional systems and involved people who thought better meant faster, smaller, and cheaper. That era projected we would have supersonic transports flying us around the world in a half-hour. So what has actually happened? The Concorde, the only supersonic airliner to see sustained commercial service, died a slow death because it could never be flown economically. We now build jumbo aircraft that carry more passengers at lower speeds and use less fuel. Economics will always win over the need for speed. 

Passengers today accept the sardine-packed seating and the actual slower transit time between airports at the price asked by the airlines. What infuriates passengers are delays exceeding the posted transit time. While our current Web services are much faster than the original analog modem sessions, users have become tolerant of requests that take several seconds to more than 10 seconds. However, they won’t tolerate erratic responsiveness. Users won’t stay “sticky”; they’ll move to another equivalent service provider if responsiveness remains inconsistent. (These users aren’t loyal to one Website or service provider. If things are too slow or cumbersome, they will be off searching for a service that’s faster or easier to navigate.) 

The newer generation of user can accept elongated response times if those response times: 

• Are consistent in nature

• Indicate progress status

• Offer reasonable value for the service being performed. 

Consider the implications. Suppose you went to a business function manager and said: “We can give you the new widget at one-third the cost and a quarter of the time of the original estimates, but you’ll have to accept that it runs 20 percent slower than the old widget.” What do you think the manager would say? If the manager is a Generation X’er or a Millennial, chances are he or she will agree to the proposal. 

Consistency Is Key  

In a June 2007 article for ComputerworldUK titled “Eight Rules of Network Performance Management,” Joel Trammel noted: 

“The best way to understand the notion that all performance is relative is to ask someone who uses a networked system or application if a 3-second application response time is good or bad. The answer is, it depends. If the normal response time is 10 seconds, a 3-second response time is very good. But if the normal response time is 1 second or less, 3 seconds isn’t very good at all. For the same measurement, different circumstances lead directly to different interpretations.” 

A user’s sense of performance is usually based on either previous experience or changing expectations. 

Consistent response is the key to SOA acceptance. The timely provisioning of session and asset resources to ensure response consistency is a critical factor. 

At the IBM 2006 SOA Architect Summit, a presentation on SOA governance discussed the impact of high transaction rates on response time. New and reasonable SLAs are required to set service requester expectations to the appropriate level and provide the correct service targets for the governance team’s operational measurements. 

A recent Microsoft manual, Principles of Service Design: Patterns and Anti-Patterns, offers some relevant points: 

“(SOA) services are dynamically addressable through URIs, enabling their underlying locations and deployment topologies to change or evolve over time with little impact upon the service itself (this is also true of a service’s communication channels). While these changes may have little impact upon the service, they can have a devastating impact upon applications consuming the service. What if a service you were using today moved to a network in New Zealand tomorrow? The change in response time may have unplanned or unexpected impacts upon the service’s consumers.  

“Service designers should adopt a pessimistic view of how their services will be consumed—services will fail and their associated behaviors (service levels) are subject to change. Appropriate levels of exception handling and compensation logic must be associated with any service invocation. Additionally, service consumers may need to modify their policies to declare minimum response times from services to be consumed. For example, consumers of a service may require varying levels of service regarding security, performance, transactions, and many other factors. 

“A configurable policy enables a single service to support multiple SLAs regarding service invocation (additional policies may focus on versioning, localization, and other issues). Communicating performance expectations at the service level preserves autonomy since services need not be familiar with the internal implementations of one another. 

“Service consumers are not the only ones who should adopt pessimistic views of performance—service providers should be just as pessimistic when anticipating how their services are to be consumed. Service consumers should be expected to fail, sometimes without notifying the service itself. Service providers also cannot trust consumers to ‘do the right thing.’ For example, consumers may attempt to communicate using malformed/malicious messages or attempt to violate other policies necessary for successful service interaction. Service internals must attempt to compensate for such inappropriate usage, regardless of user intent.” 

Both the requester/consumer and the service provider must agree to clear explanations of response time patterns. These patterns must account for the overhead inherent in Web services and for the pessimistic design stance Microsoft describes. 
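
On the consumer side, that pessimistic stance typically takes the form of a timeout tied to a declared response time expectation, plus compensation logic for calls that fail or run long. The sketch below shows one way that could look; the QuoteService interface, the 2-second figure, and the fallback behavior are all hypothetical illustrations, not anything prescribed by the Microsoft paper.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative pessimistic consumer of a remote service (all names hypothetical). */
public class DefensiveConsumer {

    /** Stand-in for the remote service call; assumed to be supplied by the consumer. */
    public interface QuoteService {
        String getQuote(String symbol) throws Exception;
    }

    private static final long DECLARED_MAX_RESPONSE_MS = 2_000; // consumer-declared expectation

    private final QuoteService service;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public DefensiveConsumer(QuoteService service) {
        this.service = service;
    }

    public String getQuoteOrFallback(String symbol) {
        Future<String> call = executor.submit(() -> service.getQuote(symbol));
        try {
            // Enforce the response time the consumer has declared it can live with.
            return call.get(DECLARED_MAX_RESPONSE_MS, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            call.cancel(true);
            return compensate(symbol, "service exceeded declared response time");
        } catch (Exception failed) {
            return compensate(symbol, "service invocation failed");
        }
    }

    /** Compensation logic: here, just a default answer plus a log line. */
    private String compensate(String symbol, String reason) {
        System.err.println("Falling back for " + symbol + ": " + reason);
        return "UNAVAILABLE";
    }
}

A consumer might pair a class like this with the SLA the provider publishes, raising or lowering DECLARED_MAX_RESPONSE_MS to match whichever service level it has contracted for.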

Users of existing green-screen 3270 legacy applications might scoff at the idea of adding 20 percent to the transactional response time, but can live with it if they: 

• Always get the same response characteristics

• Understand why these characteristics exist

• See the new corporate assets that SOA opens to them as a reasonable tradeoff for the 20 percent. 

The Millennials will more than understand; they should embrace the SOA concept. But they, too, will have to be trained as to why some response times may vary when errors are introduced into the SOA solution. 

When developing a Web services-based SOA, remember your audience. What generation are they from? Structure your SOA approach to best fit the decision makers’ and influencers’ generations.