Service-oriented architecture (SOA) provides an elastic and automatic way to discover, publish, and compose individual services. SOA enables faster integration of existing software components from different parties, makes fault tolerance (FT) feasible, and is one of the foundations of cloud computing. However, the unpredictable nature of SOA systems introduces new challenges for reliability evaluation, even as reliability and dependability have become basic requirements of enterprise systems. This paper proposes an SOA system reliability model that incorporates three common fault-tolerance strategies. Sensitivity analysis of SOA at both coarse-grained and fine-grained levels is also studied; it can be used to efficiently identify the critical parts of the system. Two SOA system scenarios based on real industrial practices are studied. Experimental results show that the proposed SOA model can accurately depict the behavior of SOA systems. The sensitivity analysis additionally quantifies the effects of system structure and fault tolerance on overall reliability. On the whole, the proposed reliability modeling and analysis framework may help SOA system service providers evaluate overall system reliability effectively and make smarter improvement plans by focusing resources on the reliability-sensitive parts of the system.

1. Introduction

Service-oriented architecture (SOA) has become a major distributed computing framework [1]. With characteristics such as standardized interfaces, a loosely coupled structure, cross-platform support, and elastic service discovery, deployment, and reuse, SOA opens a new door to faster integration of existing software components from different parties, especially in the context of Web services (WS). Legacy components may still live within the system via service adapters [2], which suits enterprises that prefer to upgrade their systems in a gentle and stable way.
SOA also makes fault-tolerance (FT) techniques feasible for building reliable systems. Because it is difficult to build useful, failure-free systems under limited development budgets and time-to-market pressure, software fault tolerance [3], whose concepts originated in hardware reliability assurance, was proposed as an effective way to use redundancy to mask software failures and return a long-running system to a normal operational state. However, the extra cost of producing alternative software designs (redundancy) basically limits the applications of software
References
[1]
Z. Zheng and M. R. Lyu, “Collaborative reliability prediction of service-oriented systems,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), pp. 35–44, May 2010.
[2]
H. M. Sneed, “Integrating legacy software into a service oriented architecture,” in Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR '06), pp. 11–14, March 2006.
[3]
M. Lyu, Software Fault Tolerance. Trends in Software, Wiley, 1995.
[4]
Z. Zheng and M. R. Lyu, “An adaptive QoS-aware fault tolerance strategy for web services,” Empirical Software Engineering, vol. 15, no. 4, pp. 323–345, 2010.
[5]
K. Goševa-Popstojanova, A. P. Mathur, and K. S. Trivedi, “Comparison of architecture-based software reliability models,” in Proceedings of the 12th International Symposium on Software Reliability Engineering, pp. 22–31, IEEE, November 2001.
[6]
S. S. Gokhale and K. S. Trivedi, “Analytical models for architecture-based software reliability prediction: a unification framework,” IEEE Transactions on Reliability, vol. 55, no. 4, pp. 578–590, 2006.
[7]
R. C. Cheung, “A user-oriented software reliability model,” IEEE Transactions on Software Engineering, vol. SE-6, no. 2, pp. 118–125, 1980.
[8]
V. Grassi, “Architecture-based reliability prediction for service-oriented computing,” in Architecting Dependable Systems III, pp. 279–299, Springer, 2005.
[9]
K.-L. Peng and C.-Y. Huang, “Reliability assessment and analysis of incorporating fault tolerance into service-oriented architectural systems,” in Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management (IEEM '12), 2012.
[10]
W.-L. Wang, D. Pan, and M.-H. Chen, “Architecture-based software reliability modeling,” Journal of Systems and Software, vol. 79, no. 1, pp. 132–146, 2006.
[11]
K. M. Chan, J. Bishop, J. Steyn, L. Baresi, and S. Guinea, “A fault taxonomy for web service composition,” in Proceedings of the Workshops of Service-Oriented Computing (ICSOC '07), pp. 363–375, Springer, 2009.
[12]
SOAP Version 1.2, Part 1: Messaging Framework, 2nd edition, 2007.
[13]
Web Services Description Language (WSDL) Version 2.0, Part 1: Core Language, 2007.
[14]
Web Services Reliable Messaging (WS-ReliableMessaging) Version 1.2, 2009.
[15]
Web Services Coordination (WS-Coordination) Version 1.2, 2009.
[16]
C.-L. Fang, D. Liang, F. Lin, and C.-C. Lin, “Fault tolerant web services,” Journal of Systems Architecture, vol. 53, no. 1, pp. 21–38, 2007.
[17]
P. P. W. Chan, M. R. Lyu, and M. Malek, “Reliable web services: methodology, experiment and modeling,” in Proceedings of the IEEE International Conference on Web Services (ICWS '07), pp. 679–686, July 2007.
[18]
S. Subramanian, P. Thiran, N. C. Narendra, G. K. Mostefaoui, and Z. Maamar, “On the enhancement of BPEL engines for self-healing composite Web services,” in Proceedings of the International Symposium on Applications and the Internet (SAINT '08), pp. 33–39, August 2008.
[19]
Z. Zheng, Y. Zhang, and M. R. Lyu, “Distributed QoS evaluation for real-world Web services,” in Proceedings of the IEEE 8th International Conference on Web Services (ICWS '10), pp. 83–90, July 2010.
[20]
B. Li, X. Fan, Y. Zhou, and Z. Su, “Evaluating the reliability of web services based on BPEL code structure analysis and run-time information capture,” in Proceedings of the 17th Asia Pacific Software Engineering Conference: Software for Improving Quality of Life (APSEC '10), pp. 206–215, December 2010.
[21]
K. H. Kim and H. O. Welch, “Distributed execution of recovery blocks: an approach for uniform treatment of hardware and software faults in real-time applications,” IEEE Transactions on Computers, vol. 38, no. 5, pp. 626–636, 1989.
[22]
R. K. Scott, J. W. Gault, and D. F. McAllister, “Fault-tolerant software reliability modeling,” IEEE Transactions on Software Engineering, vol. 13, no. 5, pp. 582–592, 1987.
[23]
Business Process Model and Notation (BPMN), 2011.
[24]
E. Nelson, “Estimating software reliability from test data,” Microelectronics Reliability, vol. 17, no. 1, pp. 67–73, 1978.
[25]
W. T. Tsai, D. Zhang, Y. Chen, H. Huang, R. Paul, and N. Liao, “A software reliability model for web services,” in Proceedings of the 8th IASTED International Conference on Software Engineering and Applications, pp. 144–149, November 2004.
[26]
V. Cortellessa and V. Grassi, “Reliability modeling and analysis of service-oriented architectures,” in Test and Analysis of Web Services, pp. 339–362, Springer, 2007.
[27]
Web Services Business Process Execution Language Version 2.0, 2007.
[28]
J. Musa, “Operational profiles in software-reliability engineering,” IEEE Software, vol. 10, no. 2, pp. 14–32, 1993.
[29]
M. Rosen, B. Lublinsky, K. T. Smith, and M. J. Balcer, Applied SOA: Service-Oriented Architecture and Design Strategies, Wiley, 2008.
[30]
D. Gross, Fundamentals of Queueing Theory, John Wiley & Sons, Hoboken, NJ, USA, 3rd edition, 2008.
[31]
A. Saltelli, M. Ratto, T. Andres et al., Global Sensitivity Analysis. The Primer, John Wiley & Sons, Chichester, UK, 2008.