I'm starting to hear more and more people interested in building adaptive behaviors into their SOA. By "adaptive", I mean "autonomic" or "self-healing" - control systems geeks might call this a "closed-loop non-linear control system". A simple example: when the SOA is under heavy load (e.g. gold customer SLAs are close to being breached), gold customers are rerouted to services in an alternate data center; under normal load conditions, all customers are routed to the services in the local data center.
This requires a dynamic change in the behavior of the routing, based on feedback about current values of SLA metrics. Adaptive behaviors can optimize an SOA as well as make it more reliable and robust.
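To make the routing example concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `SlaSnapshot` and `choose_data_center`, the use of 95th-percentile latency as the SLA metric, and the 90% margin that stands in for "close to being breached". A real SOA would take these from its own monitoring and routing infrastructure.

```python
from dataclasses import dataclass

# Hypothetical destination names, for illustration only.
LOCAL = "local-dc"
ALTERNATE = "alternate-dc"


@dataclass
class SlaSnapshot:
    """Current feedback about the gold SLA (assumed metric: p95 latency)."""
    gold_p95_latency_ms: float  # what we are observing right now
    gold_sla_limit_ms: float    # the contractual limit


def choose_data_center(customer_tier: str, sla: SlaSnapshot,
                       margin: float = 0.9) -> str:
    """Route gold customers away when the gold SLA is close to breach.

    `margin` is an assumed tuning knob: 0.9 means "treat 90% of the
    limit as close to being breached".
    """
    near_breach = sla.gold_p95_latency_ms >= margin * sla.gold_sla_limit_ms
    if customer_tier == "gold" and near_breach:
        return ALTERNATE
    # Normal load, or a non-gold customer: stay in the local data center.
    return LOCAL
```

With a 1000 ms limit, a gold request seen at 950 ms goes to the alternate data center, while the same request at 500 ms, or any non-gold request, stays local.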
Of course, coming up with conceptual uses for adaptive behavior is easy. The real challenge is implementing them in a way that makes things better, not worse. Unless you're careful, there can be a lot of pitfalls in making these systems robust.
There are two separate parts to implementing an adaptive system: what to do (the action), and when to do it (the trigger).
In the earlier example, the action is to toggle between two different routing behaviors. The first, and most important, point is that all actions must be testable. You can't (or at least shouldn't) use your production systems as a guinea pig. These adaptive behaviors change the way the application behaves, and you need to test the different behaviors in a safe environment (typically QA or system test) to make sure the application works correctly when the adaptation is on, when it is off, and (especially) when it changes state (something that can get tricky if you have to worry about issues like data replication to make the adaptation work).
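As a rough illustration of what "on, off, and changing state" might look like as a test, here is a small Python sketch. `ToggleRouter` and `run_state_checks` are hypothetical names, and the router is a toy stand-in; a real test would run against the actual services in QA and would also need to cover things this sketch ignores, such as in-flight requests and data replication.

```python
from typing import List

# Hypothetical destination names, for illustration only.
LOCAL = "local-dc"
ALTERNATE = "alternate-dc"


class ToggleRouter:
    """Toy router whose adaptation can be switched on and off."""

    def __init__(self) -> None:
        self.adaptation_on = False

    def route(self, customer_tier: str) -> str:
        if self.adaptation_on and customer_tier == "gold":
            return ALTERNATE
        return LOCAL


def run_state_checks() -> List[str]:
    """Exercise the action when off, when on, and across state changes."""
    router = ToggleRouter()
    passed = []

    # Adaptation off: every customer stays local.
    assert router.route("gold") == LOCAL
    assert router.route("standard") == LOCAL
    passed.append("off")

    # Adaptation on: only gold customers move.
    router.adaptation_on = True
    assert router.route("gold") == ALTERNATE
    assert router.route("standard") == LOCAL
    passed.append("on")

    # Changing state: flip before every request and check that each
    # request is routed according to the state in effect at that moment.
    for i in range(10):
        router.adaptation_on = (i % 2 == 0)
        expected = ALTERNATE if router.adaptation_on else LOCAL
        assert router.route("gold") == expected
    passed.append("transition")

    return passed
```

The point of the third check is that rapid switching is itself a scenario to test, not just the two steady states.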
Once you've worked out any kinks with the adaptive actions, the next step is to figure out when to turn them on and off. While you might have an idea of what the triggers should be, my experience shows that this is often just a guideline - you often need production experience to get it right. One way to do this is to let the operators "get the feel" for the adaptations by manually controlling them and seeing how the application responds to the changes under real load.
Once you're comfortable with how the adaptive application behaves in the real world, you can start to automate turning the adaptations on and off via policy and rules. Because you've tested the adaptations in QA, even if you encounter an issue with your trigger rules, the good news is that the application won't break.
The bad news is that you can still encounter significant and hard-to-predict issues. One example is a trigger that keeps firing, so the adaptation rapidly turns on and off. The result can range from annoying but harmless switching to severe performance degradation. This happens for the same reason that a room's temperature is always hotter or colder than the thermostat setting: even after you trigger the change, it takes time to take effect and stabilize.
In the case of a room, the heater needs to warm up and the warm air needs to reach the thermostat. In the case of a change in routing, requests that have already started are still being processed on the original system, and the first requests routed to the new system may not perform as well as later requests (e.g. because caches, threads, connection pools, etc. are still being spun up).
To be successful, you need to turn the adaptations on before you actually need them, and turn them off only after you're sure they won't just flip back on immediately. This is where getting it right requires the "feel" for how the system behaves. Once you've automated this, you may still end up with situations where the system doesn't behave as expected (an application is a complex thing -- much more complex than a room thermostat). The simplest way to address this is to provide an operator override. If the system starts behaving strangely, let the operator quickly and easily force the adaptation to stay either on or off until the system stabilizes. With these types of administrative controls in place, you'll be protected against the unknown.
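One common way to get "on early, off late" behavior is hysteresis: use a lower threshold for turning off than for turning on, and require the load to stay calm for a while before switching back. The Python sketch below is one possible shape for this, not a prescription. The class name `AdaptationTrigger`, the `Override` values, the normalized load figure, and the specific thresholds are all assumptions made up for the example; the right values are exactly what the operators' "feel" is meant to supply.

```python
from enum import Enum


class Override(Enum):
    """Operator control (hypothetical): automatic, or pinned on/off."""
    AUTO = "auto"
    FORCE_ON = "force_on"
    FORCE_OFF = "force_off"


class AdaptationTrigger:
    """Hysteresis trigger with an operator override.

    Turns on as soon as load reaches `on_threshold`; turns off only
    after load has stayed at or below `off_threshold` for
    `min_calm_samples` consecutive samples.
    """

    def __init__(self, on_threshold: float, off_threshold: float,
                 min_calm_samples: int) -> None:
        if off_threshold >= on_threshold:
            raise ValueError("off_threshold must be below on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.min_calm_samples = min_calm_samples
        self.active = False
        self.override = Override.AUTO
        self._calm = 0  # consecutive calm samples seen while active

    def update(self, load: float) -> bool:
        """Feed one load sample; return whether the adaptation is on."""
        if self.override is Override.FORCE_ON:
            self.active, self._calm = True, 0
        elif self.override is Override.FORCE_OFF:
            self.active, self._calm = False, 0
        elif not self.active:
            if load >= self.on_threshold:
                self.active, self._calm = True, 0
        elif load <= self.off_threshold:
            self._calm += 1
            if self._calm >= self.min_calm_samples:
                self.active, self._calm = False, 0
        else:
            # Load is between the thresholds or above: not calm yet.
            self._calm = 0
        return self.active
```

For example, with `AdaptationTrigger(0.8, 0.5, 3)`, a brief dip to 0.4 followed by a bounce back to 0.6 does not switch the adaptation off; it takes three consecutive samples at or below 0.5. Setting `override` to `Override.FORCE_ON` or `Override.FORCE_OFF` pins the state regardless of load until the operator sets it back to `Override.AUTO`.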
I've only scratched the surface of the value and complexity of adaptive applications, but hopefully you've gotten a feel for how you can tame the beast and get value out of adaptive applications - while controlling the risk.