Classify possible failure points of distributed applications

Classify the possible failure points of a distributed application.

Client cannot locate the server: return an error to the caller instead of blocking indefinitely.
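A minimal sketch of reporting a failed lookup as an error, assuming a hypothetical in-memory binder table (BINDER) and a hypothetical ServerNotFoundError exception; a real RPC system would query a binder or port mapper over the network.

```python
class ServerNotFoundError(Exception):
    """Raised when no server is registered for the requested service."""


# Hypothetical binder table mapping service names to (host, port).
BINDER = {"file_service": ("10.0.0.5", 4000)}


def locate_server(service: str) -> tuple:
    """Look up a service and fail fast with an error instead of hanging."""
    try:
        return BINDER[service]
    except KeyError:
        raise ServerNotFoundError(f"no server registered for {service!r}") from None
```

The caller can catch ServerNotFoundError and decide whether to retry later or report the problem to the user.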

Lost request messages: use a simple timeout mechanism; if no reply arrives before the timer expires, the client stub retransmits the request.
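The retransmit-on-timeout idea can be sketched as follows, assuming a hypothetical LossyChannel that silently drops the first few requests in place of a real socket. A dropped request is modeled as an immediate RequestTimeout rather than an actual wait, so the example runs deterministically.

```python
class RequestTimeout(Exception):
    """No reply arrived before the timer expired."""


class LossyChannel:
    """Hypothetical transport that silently drops the first `drops` requests."""

    def __init__(self, drops: int):
        self.drops = drops
        self.sent = 0  # number of transmissions, including retransmissions

    def send_and_wait(self, request: str, timeout: float) -> str:
        # A real stub would block on a socket for up to `timeout` seconds;
        # here a dropped request is modeled as an immediate timeout.
        self.sent += 1
        if self.sent <= self.drops:
            raise RequestTimeout
        return f"reply to {request}"


def call_with_retry(channel, request: str, timeout: float = 0.5,
                    max_attempts: int = 3) -> str:
    """Retransmit the request each time the timer expires, up to max_attempts."""
    for _ in range(max_attempts):
        try:
            return channel.send_and_wait(request, timeout)
        except RequestTimeout:
            continue  # assume the request was lost and send it again
    raise RequestTimeout(f"no reply after {max_attempts} attempts")
```

Bounding the number of attempts keeps the client from retrying forever when the server is actually down rather than merely losing messages.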

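Lost replies: timeout mechanisms again, but the client cannot tell whether the request or the reply was lost, so a retransmitted request may be executed twice. Remedies:

  • Make the operation idempotent, so that repeating it is harmless
  • Use sequence numbers and mark retransmissions, so the server can recognize a duplicate and resend its earlier reply instead of re-executing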
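A minimal sketch of duplicate detection with sequence numbers, assuming a hypothetical AccountServer whose withdraw operation is not idempotent and which caches the last (sequence number, reply) pair per client. Only requests marked as retransmissions are checked against the cache; if the original request never arrived, the cache holds an older number and the retransmission executes normally.

```python
class AccountServer:
    """Hypothetical server with a non-idempotent withdraw operation."""

    def __init__(self, balance: int):
        self.balance = balance
        # Last (sequence number, reply) per client, used to answer duplicates.
        self.last_reply = {}

    def withdraw(self, client_id: str, seq: int, amount: int,
                 retransmission: bool = False) -> int:
        if retransmission:
            cached = self.last_reply.get(client_id)
            if cached is not None and cached[0] == seq:
                # Already executed: resend the old reply instead of withdrawing again.
                return cached[1]
        self.balance -= amount
        self.last_reply[client_id] = (seq, self.balance)
        return self.balance

    def get_balance(self) -> int:
        # Idempotent: repeating this call any number of times is harmless.
        return self.balance
```

Here withdraw needs the reply cache, while get_balance needs no protection at all because it is idempotent.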

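Server failures: the client cannot tell whether the crash happened before or after the server executed the operation. Possible semantics:

  • At-least-once semantics (Sun RPC): keep retrying until a reply arrives; the operation is executed one or more times
  • At-most-once semantics: give up immediately and report the failure; the operation is executed zero or one times
  • No guarantee: the client gets no promise; the operation may have been executed any number of times, including zero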
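The difference between the first two semantics can be sketched with a hypothetical CrashingServer that executes a request and then crashes before replying. This simulation is an assumption for illustration; it only counts how often the operation actually takes effect under each client policy.

```python
class ServerCrash(Exception):
    """The server crashed; the client sees no reply."""


class CrashingServer:
    """Hypothetical server that crashes after executing, but before replying,
    for the first `crashes` requests."""

    def __init__(self, crashes: int):
        self.crashes = crashes
        self.executions = 0

    def handle(self, request: str) -> str:
        self.executions += 1  # the operation takes effect...
        if self.crashes > 0:
            self.crashes -= 1
            raise ServerCrash  # ...but the reply is lost in the crash
        return f"done: {request}"


def at_least_once(server, request: str, max_attempts: int = 5) -> str:
    """Keep retrying until a reply arrives; the operation may run several times."""
    for _ in range(max_attempts):
        try:
            return server.handle(request)
        except ServerCrash:
            continue  # server presumably rebooting; try again
    raise ServerCrash(f"no reply after {max_attempts} attempts")


def at_most_once(server, request: str):
    """Never retry; report failure, so the operation runs zero or one times."""
    try:
        return server.handle(request)
    except ServerCrash:
        return None  # failure reported; whether the operation ran is unknown
```

With one crash, the at-least-once client gets a reply but the operation ran twice; the at-most-once client gets no result and the operation ran once, though the client cannot know that.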

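Client failure: what happens to a server computation that is still running after its client has crashed?
Such a computation is referred to as an orphan. Approaches for dealing with orphans: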

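Extermination:

  • The client stub logs each RPC to disk before sending it; after a reboot, the log is checked and the orphans are explicitly killed
  • Drawback: the overhead of writing a disk log entry for every RPC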
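A minimal sketch of extermination, assuming hypothetical RpcServer and ClientStub classes. The in-memory set stands in for the disk log; a real stub would have to force each entry to stable storage so that it survives the crash, which is exactly where the overhead comes from.

```python
class RpcServer:
    """Hypothetical server tracking computations started on behalf of clients."""

    def __init__(self, name: str):
        self.name = name
        self.running = set()  # rpc_ids currently executing
        self.killed = []      # rpc_ids terminated as orphans

    def start(self, rpc_id: int) -> None:
        self.running.add(rpc_id)

    def finish(self, rpc_id: int) -> None:
        self.running.discard(rpc_id)

    def kill(self, rpc_id: int) -> None:
        if rpc_id in self.running:
            self.running.discard(rpc_id)
            self.killed.append(rpc_id)


class ClientStub:
    def __init__(self):
        # Stand-in for the disk log; assumed to survive a client crash.
        self.log = set()

    def send(self, server: RpcServer, rpc_id: int) -> None:
        self.log.add((server.name, rpc_id))  # log first, then send
        server.start(rpc_id)

    def on_reply(self, server_name: str, rpc_id: int) -> None:
        self.log.discard((server_name, rpc_id))  # completed RPCs are not orphans

    def recover(self, servers: dict) -> None:
        """Run after a reboot: explicitly kill every RPC still in the log."""
        for server_name, rpc_id in sorted(self.log):
            servers[server_name].kill(rpc_id)
        self.log.clear()
```

Only RPCs that were logged but never answered are killed during recovery.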

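Reincarnation:

  • Divide time into sequentially numbered epochs; when a client reboots, it broadcasts the start of a new epoch, and computations from older epochs are deleted
  • Gentle reincarnation: upon a new-epoch broadcast, each server first tries to locate the owner of every remote computation and deletes a computation only if its owner cannot be found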
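Both variants can be sketched as a server-side handler for the new-epoch broadcast, assuming a hypothetical EpochServer and a hypothetical owner_alive callback that stands in for the network search for a computation's owner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Computation:
    rpc_id: int
    owner: str   # client on whose behalf the computation runs
    epoch: int   # epoch in which the request was issued


class EpochServer:
    """Hypothetical server reacting to new-epoch broadcasts."""

    def __init__(self, computations: list):
        self.computations = list(computations)

    def on_new_epoch(self, epoch: int, gentle: bool = False,
                     owner_alive: Callable[[str], bool] = lambda owner: False) -> None:
        kept = []
        for c in self.computations:
            if c.epoch >= epoch:
                kept.append(c)   # current epoch: not affected
            elif gentle and owner_alive(c.owner):
                kept.append(c)   # gentle: owner located, so not an orphan
            # otherwise: old epoch and no owner found, so delete it
        self.computations = kept
```

Plain reincarnation deletes every old-epoch computation, while the gentle variant keeps those whose owner still answers, at the cost of extra lookup traffic.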

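Expiration: give each RPC a fixed time quantum T to complete; if it needs more time, the client must explicitly request an extension. An orphan therefore disappears at most T after its client stops renewing.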
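A minimal sketch of expiration, assuming a hypothetical LeasedRpc record kept by the server and a reap sweep that drops expired computations. Time is passed in explicitly as a number instead of reading a real clock, so the behavior is easy to follow.

```python
class LeasedRpc:
    """Hypothetical server-side record of an RPC running under a quantum T."""

    def __init__(self, rpc_id: int, quantum: float, now: float):
        self.rpc_id = rpc_id
        self.quantum = quantum        # T
        self.deadline = now + quantum

    def expired(self, now: float) -> bool:
        return now >= self.deadline

    def request_extension(self, now: float) -> None:
        """The client explicitly asks for another quantum before the deadline."""
        if self.expired(now):
            raise TimeoutError(f"RPC {self.rpc_id}: quantum already expired")
        self.deadline = now + self.quantum


def reap(rpcs: list, now: float) -> list:
    """Server-side sweep: drop RPCs whose client did not renew in time."""
    return [r for r in rpcs if not r.expired(now)]
```

A live client keeps its computation by renewing before each deadline; a crashed client cannot renew, so its orphan is removed by the next sweep after the deadline passes.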

©TutorsGlobe All rights reserved 2022-2023.