Monday, June 15, 2009

Survivability

Jianming 2008 - A Survivable Scheme for Critical Information Systems
  • Need/motivation: "Potential threats include failures (usually generated internally) due to software design errors, hardware degeneration, human errors, or corrupted data, hardware malfunctions, software flaws, environmental hazards, malicious and accidental (generally are externally generated events) human acts"
  • "In [9], a rigorous definition of survivability was presented. The survivability specification is a six-tuple, {S, E, D, V, T, P} where: S represents the specification set, E represents the service value factors, D represents the reachable environmental states, V represents relative service values, T and P represent the set of valid transitions and service probabilities. This definition is an engineering definition of survivability."
  • "Survivability focuses whether services of the whole system can survive in malicious environment but not the individual components" -- "services" here means functional capabilities of the system, not web services (WS)
  • (1) Resistance and Recognition, (2) Recovery (checkpointing), (3) Adaptation (reconfiguration)
  • [9] Knight, Strunk, Sullivan, "Towards a rigorous definition of information system survivability", IEEE DISCEX 2003
Saridakis, "Surviving Errors in Component-Based Software", 2005 EUROMICRO-SEAA
  • In the introduction they have a pretty good breakdown of the main challenges that survivable systems must address
  • "However, fault tolerance techniques are based on some form of redundancy (e.g. service replication, data replication, state checkpoints, message logging, etc.) which makes them costly. This cost as system complexity (e.g. managing a replica group or taking checkpoints), resource consumption (e.g. additional hosts are needed to execute the service replicas and additional memory to store the replicated data, the checkpoints and the logs), and time penalty during system execution (e.g. delays in service delivery due to the time overhead in replica synchronizing or in saving checkpoints and logs on the stable storage)."
  • Their paper is about a concept known as "graceful degradation": a gradual step down in functionality that reduces the overall performance of a system but still keeps it online. While this is not exactly the same concept as a delta-federation, the two are related to some degree.
  • Also "Dependable systems" - "the capability of a system to outlast runtime errors and fulfill its mission"
  • "In other cases (e.g. [11]), mainly inspired from military and avionics domains, survivability describes the capability of a system to adjust its execution so ... can provide functionality despite damages .. due to errors"
  • Paper is on an optimistic graceful degradation approach: identification, isolation, adaptation, repair. The optimistic approach maps errors to replacement functionality designed to fix them, I think.
  • [11] Knight, Strunk, Sullivan, "Towards a rigorous definition of information system survivability", IEEE DISCEX 2003
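To make the identification / isolation / adaptation / repair cycle concrete for myself, here is a minimal sketch of one way "mapping errors to replacement functionality" could look. This is my own illustration, not code or an algorithm from the Saridakis paper; the class `DegradableService`, the `fallbacks` table, and the two lookup functions are all hypothetical names.

```python
class DegradableService:
    """Hypothetical wrapper: swaps in replacement functionality when an error occurs."""

    def __init__(self, primary, fallbacks):
        self.primary = primary      # full-functionality implementation
        self.fallbacks = fallbacks  # assumed mapping: exception type -> replacement callable
        self.active = primary
        self.degraded = False

    def call(self, *args):
        try:
            return self.active(*args)
        except Exception as err:
            # Identification: look up the error type in the fallback table.
            replacement = self.fallbacks.get(type(err))
            if replacement is None:
                raise  # no known replacement, so the error propagates
            # Isolation + adaptation: stop using the failing implementation
            # and continue with the reduced-functionality one.
            self.active = replacement
            self.degraded = True
            return self.active(*args)

    def repair(self):
        # Repair: restore the full-functionality implementation.
        self.active = self.primary
        self.degraded = False


def live_lookup(key):
    # Stand-in for a component that has failed at runtime.
    raise TimeoutError("backend unavailable")


def cached_lookup(key):
    # Reduced functionality: possibly stale data, but the service stays online.
    return {"a": 1}.get(key)


svc = DegradableService(live_lookup, {TimeoutError: cached_lookup})
result = svc.call("a")
```

After the call, the service is still answering requests, only from the degraded implementation, until `repair()` is invoked.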
Krings, "Design for Survivability: A Tradeoff Space" CSIIRW 2008
  • Survivability vs. Adaptation
  • "In [3, 19] survivability was described in terms of Resistance, Recognition, Recover, and Adaptation. Adaptation implemented the mechanism to adapt the system to knowledge gained in the prior three phases. Adaptation, in general, also encompasses movements in the tradeoff space."
  • [3] Ellison, Fisher, Linger, Lipson, Longstaff, Mead "Survivable Network Systems: An emerging discipline" Technical Report CMU
  • [19] Mead, Ellison, Linger, Longstaff, McHugh, "Survivable Network Analysis Method", Technical Report CMU
Knight, Strunk, Sullivan, "Towards a Rigorous Definition of Information System Survivability", DARPA Information Survivability Conference and Expo 2003
  • Several definitions from different sources
  • Survivability: "the capability of...", "a property of...", "... is measured by the probability...", "... qualified by specifying the range", "... the degree to which..."
  • Later definition "Survivability is the ability of a network computing system to provide essential services in the presence of attacks and failures, and recover full services in a timely manner"
  • Faults are either masked or unmasked. Fault avoidance may be used, and this is not (according to them) an aspect of survivability.
  • They claim "survivability is a measurable system characteristic" -- I am not sure about this.
  • Example is from military C2 (command and control). It is not very specific in its details, but it is generalized and can be adaptive, with different modes of operation and different needs.
  • {S, E, D, V, T, P}
  • V is user-defined and ranked: the user's perceived service value
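As a note to self, the six-tuple can be sketched as a plain data structure. This is a simplified reading based only on the quoted description above, not the formal definition from the paper: the field names, the `(spec, env_state)` keying of V, and the helper `best_spec` are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class SurvivabilitySpec:
    specs: set             # S: the specification set (acceptable forms of service)
    value_factors: set     # E: service value factors
    env_states: set        # D: reachable environmental states
    relative_values: dict  # V: assumed here as (spec, env_state) -> user-ranked value
    transitions: set       # T: valid (from_spec, to_spec) transitions
    probabilities: dict    # P: spec -> service probability


def best_spec(spec, current, env):
    """Hypothetical helper: among the current specification and those reachable
    from it by one valid transition, pick the one the user values most in `env`."""
    candidates = {current} | {dst for (src, dst) in spec.transitions if src == current}
    return max(sorted(candidates), key=lambda s: spec.relative_values.get((s, env), 0))


example = SurvivabilitySpec(
    specs={"full", "degraded"},
    value_factors={"timeliness"},
    env_states={"normal", "attack"},
    relative_values={
        ("full", "normal"): 10,
        ("degraded", "normal"): 4,
        ("full", "attack"): 1,
        ("degraded", "attack"): 6,
    },
    transitions={("full", "degraded"), ("degraded", "full")},
    probabilities={"full": 0.99, "degraded": 0.999},
)

under_attack = best_spec(example, "full", "attack")
back_to_normal = best_spec(example, "degraded", "normal")
```

All the numbers in `example` are made up; the point is only that V ranks the alternative services per environment and T limits which of them the system may move to.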

