DS-week5

Architectures of Distributed Systems

  • An obvious way to distinguish between distributed systems is by the organisation of their software components, in other words, their software architecture
    • Centralised architectures (eg. traditional client-server)
    • Decentralised architectures (eg. p2p)
    • Hybrid architectures

Software Architectural Styles

  • A software architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and how these elements are jointly configured into a system
  • A variety of architectural styles
    • Layered architectural styles
    • Object-based architectural styles
    • Resource-centered architectural styles
    • Event-based architectural styles

Layered Architectural Style

  • a component at layer Lj can make a downcall to a component at a lower layer Li (i < j) and expects a response

  • bottom layers provide services to top layers

  • requests flow from top to bottom, responses from bottom to top
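
The downcall pattern above can be sketched with three small classes. This is a minimal illustration only: the class names, the `render` and `lookup_user` methods, and the sample record are assumptions made for the example, not part of the notes.

```python
class DataLayer:
    """Bottom layer: stores raw records."""

    def __init__(self):
        self._records = {"42": "alice"}  # made-up sample data

    def read(self, key):
        return self._records.get(key)


class ProcessingLayer:
    """Middle layer: makes downcalls to the data layer."""

    def __init__(self, data_layer):
        self._data = data_layer

    def lookup_user(self, user_id):
        name = self._data.read(user_id)  # downcall, then wait for the response
        return name.title() if name is not None else None


class InterfaceLayer:
    """Top layer: makes downcalls to the processing layer."""

    def __init__(self, processing_layer):
        self._processing = processing_layer

    def render(self, user_id):
        name = self._processing.lookup_user(user_id)
        return f"User: {name}" if name else "User not found"


ui = InterfaceLayer(ProcessingLayer(DataLayer()))
print(ui.render("42"))
print(ui.render("7"))
```

Each layer only holds a reference to the layer directly below it, so the request travels down and the response travels back up the same chain.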

Object-Based Architectural Style

  • each object corresponds to a component, and components are connected through procedure calls, possibly over a network

  • provides a way of encapsulating data and operations into a single entity

  • Communication happens as method invocations; across machines these are Remote Procedure Calls (RPC)

Resource-Centered Architectural Style

  • A DS is viewed as a huge collection of resources that are individually managed by components

  • based on a data centre

The Event-Based Architectural Style

  • processes running on various components are referentially decoupled but temporally coupled: one process does not explicitly know any other process, but both must be running for a notification to be delivered

  • the only thing a process can do is publish a notification describing the occurrence of an event

  • processes may subscribe to a specific kind of notification
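
A minimal publish/subscribe sketch of the points above, assuming an in-process `Broker` class (a hypothetical name) that stands in for the event bus. Publishers and subscribers never reference each other, only the kind of notification.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-process event bus used only for illustration."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, callback):
        # A process registers interest in one kind of notification.
        self._subscribers[kind].append(callback)

    def publish(self, kind, payload):
        # The publisher does not know who (if anyone) receives this.
        for callback in self._subscribers[kind]:
            callback(payload)


broker = Broker()
received = []

broker.publish("temperature", 19.0)         # nobody is subscribed yet: lost
broker.subscribe("temperature", received.append)
broker.publish("temperature", 21.5)         # delivered
broker.publish("humidity", 40)              # different kind: not delivered

print(received)
```

The first notification is lost because no subscriber was active when it was published, which is what temporal coupling means here.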

Centralised System Architecture

  • A server and a client

Multi-Tiered Client-Server Architectures

  • Three logical tiers
  • Two types of machines
    • a client machine
    • a server machine
  • In the simplest organisation, all functionality is handled by the server and the client is no more than a dumb terminal
  • Many distributed applications are divided into three layers
    • user interface layer
    • processing layer
    • data layer
  • The main challenge to clients and servers is to distribute these layers across different machines
  • A server may sometimes act as a client

Decentralised System Architectures: P2P

  • better workload balance
  • a client or server may be physically split up into a number of logically equivalent parts; this is horizontal distribution
  • each process will act as a client and a server
  • processes are organised in an overlay network
  • two types of overlay networks:
    • structured
    • unstructured

Structured P2P Systems

  • nodes are organised in an overlay that adheres to a specific, deterministic topology (eg. a ring, a binary tree, a grid etc)
  • the topology is used to look up data efficiently
  • any node can be asked to look up a given key
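
As one possible illustration of a deterministic ring topology, the sketch below assigns each key to the first node whose identifier is greater than or equal to the key, wrapping around at the end of the ring. The 5-bit identifier space, the node identifiers, and the function names are assumptions for the example; the notes do not prescribe a particular scheme, and real systems also route the request between nodes, which is omitted here.

```python
import bisect
import hashlib

M = 5  # assumed identifier space of 2**M = 32 positions on the ring


def ring_id(name):
    """Hash an arbitrary name onto the ring (illustrative choice of SHA-1)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, key_id):
    """Return the node responsible for key_id; node_ids must be sorted."""
    index = bisect.bisect_left(node_ids, key_id)
    return node_ids[index % len(node_ids)]  # wrap around past the last node


nodes = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])  # made-up node identifiers

print(successor(nodes, 12))  # key 12 is stored at node 14
print(successor(nodes, 29))  # key 29 wraps around to node 1
print(successor(nodes, ring_id("file.txt")))
```

Because the mapping is deterministic, every node that knows the ring membership computes the same answer for the same key.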

Unstructured P2P Systems

  • each node maintains an ad hoc (improvised, temporary) list of neighbours, eg. forming a random graph
  • changes its local list almost continuously
  • searching for data is necessary

Examples of Searching Methods

  • Flooding
  • Random Walks
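
The two methods can be compared on a small overlay. This is a sketch under assumptions: the five-node graph, the TTL (time-to-live) hop limit for flooding, and the step limit for the random walk are all choices made for the example.

```python
import random
from collections import deque


def flood(graph, start, target, ttl):
    """Forward the query to every neighbour until the hop limit (ttl) is hit."""
    visited = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            return True, messages
        if hops == ttl:
            continue  # hop limit reached, do not forward further
        for neighbour in graph[node]:
            messages += 1  # one query message per neighbour
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return False, messages


def random_walk(graph, start, target, max_steps, rng):
    """Forward the query to one randomly chosen neighbour at a time."""
    node = start
    for steps in range(max_steps):
        if node == target:
            return True, steps
        node = rng.choice(graph[node])
    return node == target, max_steps


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(flood(overlay, "A", "E", ttl=3))
print(flood(overlay, "A", "E", ttl=2))
print(random_walk(overlay, "A", "E", max_steps=10_000, rng=random.Random(0)))
```

Flooding reaches the target quickly but sends a message along every edge it explores, whereas a random walk sends one message per step and may take many steps.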

Making Data Search more Scalable in Unstructured P2P

  • to improve the scalability of data search, the system can make use of special nodes that maintain an index of data items, creating special collaborations among nodes

Collaborative Distributed Systems

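  • BitTorrent (a user downloads chunks of a file from other users until the chunks form a complete file)
  • a global directory is used to find a file
  • the directory entry contains a link to the file's tracker, a server that keeps track of the active nodes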

Edge-Server Systems

  • have the following properties
    • are deployed on the internet
    • servers are placed “at the edge” of the network
  • help reduce latency and bandwidth usage, and improve overall performance

Week4

A Simple Machine

  • A machine makes available computing and storage resources
    • computing resource, eg. CPU
    • storage resource, eg. a given amount of primary memory
  • A process is an executing instance of a program
  • The use of computing and storage resources is typically controlled by an operating system (OS)
  • an OS aims to make the most efficient use possible of those resources
  • the OS assigns a unique identity to each process and then controls how a process is granted access to computing resources
  • the OS also controls how a process is granted access to storage resources by assigning an address space to that process
  • when the OS ensures that each process P has a single address space A that is exclusive to P, we are allowed a sequential reading of the steps that comprise the process

Sequential reading of the steps that comprise the process

  • The OS ensures that no other process tampers with x and y, and only foobar has access to x and y

Sequential processing

  • if foo and bar take a long time to run, we wish to run them concurrently, in different processes, and even better, in parallel, on different processors

Sequential, isolated processing is simple, but bounded and limiting

  • Non sequential, non isolated processing expands the bounds and limits with respect to performance

Sequential vs. Multi-Processing, Concurrent, Parallel and Distributed Computing

Multi-Processing vs. Multi-Tasking

  • Two different concepts
  • An OS multi tasks by:
    • allowing more than one process to be underway by controlling how each one makes use of the resources allocated to it
    • implementing a scheduling policy, which grants each active process a time slice during which it can access the resources allocated to it
  • In multi-tasking, processes are not really executing concurrently
  • the appearance of concurrent execution stems from an effective scheduling policy
  • If all processes get a fair share of the resources and they get it sufficiently often, it seems to users that all processes are executing concurrently
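
A small simulation can show how time slices create the appearance of concurrency. The notes do not name a specific scheduling policy; round-robin is assumed here as one simple example, and the process names and burst times are made up.

```python
from collections import deque


def round_robin(burst_times, time_slice):
    """Return the order in which processes get the CPU as (name, units_run)."""
    ready = deque(burst_times.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(time_slice, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue and wait for a new slice.
            ready.append((name, remaining - run))
    return timeline


for name, units in round_robin({"P1": 3, "P2": 5, "P3": 2}, time_slice=2):
    print(name, "runs for", units)
```

Only one process runs at any instant, but because the slices are short and handed out in turn, all three appear to make progress together.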

Multi-Processing by Forking

  • a process that forks causes two copies of itself to be active concurrently

  • The child process is given a copy of the parent process’s address space. The address spaces are distinct

    • the child process starts executing after the OS call
    • the parent can continue or wait for the child to execute
    • the parent must find out how and when the child completes execution
  • because of the copying, forking can be expensive

  • modern OSs have strategies that make the actual cost quite affordable

  • forking is reasonably safe because the address spaces are distinct

  • When forking is used in multi-processing, consistency of processing results is more likely to occur than in cases where multi-processing is achieved by threading
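
The behaviour described above can be observed with `os.fork`, which is available on POSIX systems only (not on Windows). This is a minimal sketch: the variable `counter` and the exit code 7 are arbitrary choices for the example.

```python
import os

counter = 0
pid = os.fork()  # from here on, two processes execute this program

if pid == 0:
    # Child: modifies its own copy of the address space, then exits.
    counter += 100
    os._exit(7)
else:
    # Parent: waits for the child and finds out how it completed.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)
    print("parent sees counter =", counter)
    print("child exit code =", exit_code)
```

The parent still sees `counter` as 0 because the child changed only its own copy, which is why forking is described as reasonably safe.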

Multi-Processing by Threading

  • if parent and child need to interact and share, threading may be a better approach to multi-processing

  • with threading, the address space is not copied, it is shared

    • this means that if one thread changes a variable, all other threads see the change

    • this makes threading less expensive, but also less safe than forking
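
A short sketch of the shared address space, using Python's `threading` module. The shared dictionary, the worker function, and the thread and increment counts are assumptions for the example; the lock is there because unsynchronised updates to shared data are exactly what makes threading less safe.

```python
import threading

shared = {"count": 0}    # lives in the single address space all threads share
lock = threading.Lock()


def worker(increments):
    for _ in range(increments):
        with lock:       # serialise the read-modify-write on shared data
            shared["count"] += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(shared["count"])
```

Every thread's updates are visible to the others, so the final count reflects the work of all four threads.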

Concurrent Computing

  • consider **many** application processes

  • processes are often threads (the OS schedules the execution of n copies of process Pi, 1<= i <= n, to run in the same processor, typically sharing a single address space)

Parallel Computing

  • many processors bound by an interconnect (eg. a bus)

  • there are truly many processes running at the same time

Distributed Computing

  • many independent, self-sufficient, autonomous, heterogeneous machines
  • spatial separation
  • message exchange is needed, network effects are felt
  • complexity may reach a point in which applications are not written against OS services. Instead, they are written against a middleware API. The middleware then takes some of the complexity upon itself

Week3

Naming in Distributed Systems

  1. uniquely identify entities
  2. name resolution refers to the means by which a process is allowed to access a named entity, which is supported by a naming system
  3. every computer connected to the internet needs to be “addressable”

Possible approaches to addressing mechanisms

  1. centralised
  2. Free-for-all
  3. by delegating naming responsibilities

The internet is the biggest distributed system

Addressing mechanisms

Centralised Naming Approach

  1. Any name is handed out once and only once
  2. a single point of contact

Limitation: single point of contact is not a very scalable solution, and creates a single point of failure

Free-for-All Naming Approach

  1. Allows any object that wants a name to make up its own name
  2. Although it's a **massively ‘distributed’ solution** which avoids a single point of failure, it does not guarantee uniqueness

THE ‘Delegating Naming Responsibilities’ Approach

  1. authority to allocate names is delegated to smaller parts of the system
  2. this approach better balances the conflicting issues of guaranteeing uniqueness and avoiding single points of failure
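
A minimal sketch of delegation, assuming a hypothetical `NamingAuthority` class and slash-separated names (neither comes from the notes). Each authority only has to guarantee uniqueness inside its own prefix, and the prefixes themselves are handed out once by the parent.

```python
class NamingAuthority:
    """Hypothetical authority that owns one prefix of the name space."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self._issued = set()

    def allocate(self, label):
        """Hand out a name once and only once within this authority."""
        name = f"{self.prefix}/{label}"
        if name in self._issued:
            raise ValueError(f"{name} is already allocated")
        self._issued.add(name)
        return name

    def delegate(self, label):
        """Give a smaller part of the system authority over a sub-prefix."""
        return NamingAuthority(self.allocate(label))


root = NamingAuthority()
uk = root.delegate("uk")
us = root.delegate("us")

print(uk.allocate("host1"))
print(us.allocate("host1"))
```

Both delegates can use the local label `host1` without consulting each other, and the resulting full names still differ.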

but what rules are appropriate for each system?

examples: MAC addresses, IP addresses and Domain Names

MAC Addresses

  1. a unique identifier given to each network device in a system, meaning that every Ethernet or Wi-Fi card in a computer has one MAC address
    • There are more MAC addresses than computers (since most computers have several network devices)
    • a MAC address is a 48-bit number
  2. a MAC address consists of two main parts:
    • the ‘Organisationally unique identifier’ (OUI)
    • the ‘Network interface controller’ (NIC)
  3. A MAC address does not tell you where a device is on a network
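
The two parts can be separated with a few lines of string handling. This sketch assumes the common notation of six hexadecimal octets separated by colons or hyphens; the function name and the sample address are made up for the example.

```python
def split_mac(mac):
    """Split a MAC address into its OUI half and its NIC-specific half."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError("expected six octets")
    value = int("".join(octets), 16)  # raises ValueError on non-hex input
    if value >= 2 ** 48:
        raise ValueError("a MAC address must fit in 48 bits")
    oui = ":".join(octets[:3])   # first 24 bits: assigned to the manufacturer
    nic = ":".join(octets[3:])   # last 24 bits: assigned by the manufacturer
    return oui, nic


oui, nic = split_mac("00:1A:2B:3C:4D:5E")  # made-up example address
print("OUI:", oui)
print("NIC:", nic)
```

Neither half says anything about where the device sits on a network, only who made it and which device it is.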

IP Addresses

  1. A unique identifier that also contains some information about where a device is on a network

  2. most IP addresses (IPv4) are 32-bit numbers, but are most often written as four 8-bit numbers separated by dots

  3. Top-level authority for IP addresses is the Internet Assigned Numbers Authority (IANA)

    • Unlike MAC addresses, the delegation of IP addresses takes place initially to geographical regions

    • For this reason, IP addresses can tell you some information about the location of a device on a network
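
The relationship between the 32-bit number and the dotted form can be shown directly. The helper name `to_dotted_quad` is an assumption for the example, the sample value lies in a range reserved for documentation, and the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def to_dotted_quad(value):
    """Write a 32-bit integer as four 8-bit numbers separated by dots."""
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


address = 0xC0000201  # sample 32-bit value from a documentation-only range

print(to_dotted_quad(address))
print(ipaddress.IPv4Address(address))  # standard library gives the same form
```

Each group of 8 bits becomes one decimal number between 0 and 255, with the most significant group written first.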

Domain Names

  1. created because humans find IP addresses hard to read
  2. Domain Name System (DNS) is used to create associations between human-readable names and IP addresses (eg. www.bbc.co.uk)
  3. ‘delegation model’ is complex, as it has aspects of geographical delegation
    • eg. ‘.co.uk’ are for UK-based companies
  • DNS cannot allocate batches of names upfront, and needs to respond in real time to requests to translate names into IP addresses

  • it achieves this by being a Distributed System consisting of a hierarchy of servers with the most authoritative server at the top of the hierarchy
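
The hierarchy can be modelled as nested tables that are walked from the top-level label down to the host. This is a toy, in-memory sketch: the zone contents, the host name, and the address are invented for the example, and real DNS involves network queries, caching, and record types that are left out.

```python
# Each nested dict stands in for a server one level further down the hierarchy.
ROOT = {
    "uk": {
        "co": {
            "example": {"www": "192.0.2.10"},  # made-up name and address
        },
    },
}


def resolve(name, root=ROOT):
    """Walk the hierarchy right to left; return (address or None, labels seen)."""
    node = root
    path = []
    for label in reversed(name.split(".")):
        if not isinstance(node, dict) or label not in node:
            return None, path  # no server below this point knows the name
        node = node[label]
        path.append(label)
    return (node if isinstance(node, str) else None), path


print(resolve("www.example.co.uk"))
print(resolve("www.missing.co.uk"))
```

Resolution starts at the top of the hierarchy and each level only needs to know about the level directly beneath it.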

Protocols

  • define sets of rules for how two or more objects should interact with one another

  • serve as specifications rather than implementations

    (eg. HTTP)

HTTP Protocol: Statelessness

  • Request-response protocol, transfer data between web servers and web clients (typically web browsers)

  • Content exchange protocol

  • web servers are said to be stateless, meaning that once a request from a client application is fulfilled, the web server disconnects from the client and “forgets” that the client ever connected

Email

  • more complex system than ‘the Web’
  • an email has to be in exactly one place at any one time
    • if it is in two places at the same time then it has been duplicated accidentally
    • if it is in zero places then it has been lost in the system, to the detriment of the sender and the recipient of that email

Email Associated Protocols

  • The Simple Mail Transfer Protocol (SMTP)
    • connection based
    • Content exchange protocol
    • a client (mail agent) can issue multiple consecutive commands to the SMTP server, and should explicitly terminate its connection when it's finished
    • treats a single client-server interaction as an individual and complete transaction
    • a series of client-server interactions can take place before the connection between the client and the server is explicitly terminated
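
The connection-based behaviour can be sketched with a toy session object. This is loosely modelled on SMTP and heavily simplified: the class `ToySmtpSession` is hypothetical, the message body is passed on the same line as `DATA` (real SMTP sends it separately), and no network is involved.

```python
class ToySmtpSession:
    """Hypothetical in-memory stand-in for one client-server connection."""

    def __init__(self):
        self.open = True
        self.delivered = []
        self._sender = None
        self._recipients = []

    def command(self, line):
        if not self.open:
            raise ConnectionError("session already terminated")
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if verb == "HELO":
            return "250 hello"
        if verb == "MAIL":
            self._sender, self._recipients = arg, []
            return "250 ok"
        if verb == "RCPT":
            self._recipients.append(arg)
            return "250 ok"
        if verb == "DATA":
            # One complete transaction: sender, recipients, and body together.
            self.delivered.append((self._sender, tuple(self._recipients), arg))
            self._sender, self._recipients = None, []
            return "250 message accepted"
        if verb == "QUIT":
            self.open = False  # explicit termination by the client
            return "221 bye"
        return "500 unknown command"


session = ToySmtpSession()
commands = [
    "HELO client.example",
    "MAIL FROM:<a@example.org>",
    "RCPT TO:<b@example.org>",
    "DATA first message",
    "MAIL FROM:<a@example.org>",
    "RCPT TO:<c@example.org>",
    "DATA second message",
    "QUIT",
]
replies = [session.command(line) for line in commands]
print(replies[-1])
print(len(session.delivered), "messages handed over in one connection")
```

Two separate transactions are completed over the same connection, and the connection only ends when the client sends `QUIT`.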