DS: Distributed Systems (Level 11)

Course summary/outline

A distributed system is broadly characterized as a collection or network of loosely coupled, autonomous computers that communicate with each other. Such systems typically have the following properties:

* The nodes are relatively loosely coupled.
* Each node is a self-contained autonomous computer with its own peripherals.
* The system can survive various categories of node and network failures.
* The nodes may execute logically separate computations, though these may be related to concurrent computations on other nodes.
* The system may be modeled as synchronous or asynchronous.

Distributed systems have become pervasive (many applications now require the cooperation of two or more computers), yet the design and implementation of such systems remain challenging and complex tasks. Difficulties arise from the concurrency of components, the lack of a global clock, and the possibility of independent failure of components. Moreover, designs must aim to provide interoperability, transparency, and autonomy.
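
As one illustration of working without a global clock, the following is a minimal Python sketch of a Lamport logical clock, a standard technique for ordering events by counters rather than wall-clock time. The class name `LamportClock` and its method names are hypothetical, chosen for this example only and not taken from the course material. The two nodes are simulated inside a single process.

```python
class LamportClock:
    """Logical clock for one node (illustrative helper, assumed for this sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending counts as an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two simulated nodes exchanging one message.
a, b = LamportClock(), LamportClock()
a.tick()             # local event on node a
stamp = a.send()     # node a sends a message carrying its timestamp
b.receive(stamp)     # node b receives it and updates its own clock
print(stamp, b.time)
```

The receive rule ensures that the timestamp of a receive event is always greater than that of the matching send, which is the property needed to respect causality without synchronized physical clocks.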

The emphasis of this module is on gaining an understanding of the principles and concepts used to design distributed systems, and of how network and communication facilities can be leveraged to achieve efficient distributed computing. We will cover the following topics:

  1. Introduction and overview - Need for distributed systems (Failures, consistency, delay, etc.)
  2. Architecture & Communication - Scalability, Load balancing, Partitioning, RPC
  3. Fault Tolerance - Failure models, Reliability, Recovery
  4. Coordination - Ordering & Causality, Distributed transactions, Concurrency Control, Consensus/Agreement
  5. Consistency & Replication - Epidemic algorithms, Consistency Models, Replica management
  6. Distributed Storage - File systems, Large Scale systems
  7. Issues - Energy/Power, Security, Local OS Support, Verification, Testing

Learning Outcomes

On completion of this course, the student will be able to:

  1. Explain the principles of distributed systems, demonstrating an understanding of them.
  2. Give an account of the trade-offs that must be made when designing a distributed system, and make such trade-offs in their own designs.
  3. Implement distributed algorithms in practice, taking an algorithm description and realizing it in software.
  4. Give an account of the models used to design distributed systems, and manipulate those models to reason about such systems.
  5. Design efficient algorithms for distributed computing tasks.

License
All rights reserved