2023-05-09

DS-week5

Architectures of Distributed Systems

An obvious way to distinguish between distributed systems is on the organisation of their software components, in other words, their software architecture
- Centralised architectures (eg. traditional client-server)
- Decentralised architectures (eg. p2p)
- Hybrid architectures

Software Architectural Styles

A software architectural style is formulated in terms of components, the way that components are connected to each other, the data exchanged between components, and how these elements are jointly configured into a system
A variety of architectural styles
- Layered architectural styles
- Object-based architectural styles
- Resource-centered architectural styles
- Event-based architectural styles

Layered Architectural Style

A component at layer Lj can make a downcall to a component at a lower layer Li (i < j) and expects a response
lower layers provide services to the layers above them
requests flow from top to bottom, responses from bottom to top
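A minimal sketch of the downcall pattern, assuming three hypothetical layers (`InterfaceLayer`, `ProcessingLayer`, `DataLayer`) that are not part of the original notes. Each layer only calls the layer directly below it and waits for the response.

```python
class DataLayer:
    """Bottom layer: holds the data (hypothetical example content)."""

    def __init__(self):
        self._store = {"greeting": "hello"}

    def read(self, key):
        return self._store.get(key)


class ProcessingLayer:
    """Middle layer: makes a downcall to the data layer."""

    def __init__(self, lower):
        self._lower = lower

    def fetch_upper(self, key):
        value = self._lower.read(key)  # request flows down
        # response flows back up, transformed on the way
        return value.upper() if value is not None else None


class InterfaceLayer:
    """Top layer: makes a downcall to the processing layer."""

    def __init__(self, lower):
        self._lower = lower

    def show(self, key):
        return f"Result: {self._lower.fetch_upper(key)}"


ui = InterfaceLayer(ProcessingLayer(DataLayer()))
print(ui.show("greeting"))
```

Note that no layer ever calls upwards: the data layer knows nothing about the layers that use it.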

Object-Based Architectural Style

each object corresponds to a component, and components are connected through procedure calls, possibly across a network
objects provide a way of encapsulating data and operations into a single entity
communication happens as method invocations; across a network these are made as Remote Procedure Calls (RPC)

Resource-Centered Architectural Style

A DS is viewed as a huge collection of resources that are individually managed by components
based on a data centre

The Event-Based Architectural Style

processes running on various components are referentially decoupled but temporally coupled: one process does not explicitly know any other process, but both must be running at the same time to communicate
the only thing a process can do is publish a notification describing the occurrence of an event
processes may subscribe to a specific kind of notification
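The publish/subscribe idea can be sketched in a single process as follows. The `EventBus` class and the event kinds are hypothetical, and a real system would deliver notifications over a network rather than by direct function call.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process notification service (an illustrative assumption)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, callback):
        # A process registers interest in one kind of notification.
        self._subscribers[kind].append(callback)

    def publish(self, kind, payload):
        # The publisher never names the receivers: referential decoupling.
        for callback in self._subscribers.get(kind, []):
            callback(payload)


bus = EventBus()
received = []
bus.subscribe("temperature", received.append)
bus.publish("temperature", 21.5)  # delivered to the subscriber
bus.publish("humidity", 40)       # nobody subscribed, so it is dropped
print(received)
```

The dropped `humidity` notification illustrates temporal coupling: a notification only reaches subscribers that are registered at the moment it is published.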

Centralised System Architecture

A server and a client

Multi-Tiered Client-Server Architectures

Three logical tiers
Two types of machines
- a client machine
- a server machine
In the simplest organisation, all functionality is handled by the server and the client is no more than a dumb terminal
Many distributed applications are divided into three layers
- user interface layer
- processing layer
- data layer
The main challenge to clients and servers is to distribute these layers across different machines
A server may sometimes act as a client

Decentralised System Architectures: P2P

better workload balance
a client or server may be physically split up into a number of logically equivalent parts, each operating on its own share of the data; this is horizontal distribution
each process will act as a client and a server
processes are organised in an overlay network
two types of overlay networks:
- structured
- unstructured

Structured P2P Systems

nodes are organised in an overlay that adheres to a specific, deterministic topology (eg. a ring, a binary tree, a grid etc)
the topology is used to look up data efficiently
any node can be asked to look up a given key
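A minimal sketch of key lookup on a ring, assuming the common convention that a key is stored at the first node whose identifier is greater than or equal to the key, wrapping around at the end of the ring. The node identifiers are made up for illustration, and real systems route the lookup hop by hop instead of consulting a global sorted list.

```python
import bisect

# Hypothetical node identifiers on the ring, kept sorted.
NODE_IDS = [1, 4, 9, 11, 14, 18, 20, 21, 28]


def responsible_node(node_ids, key):
    """Return the node responsible for `key` on a sorted ring of ids."""
    index = bisect.bisect_left(node_ids, key)
    # Wrap around to the first node when the key is past the last id.
    return node_ids[index % len(node_ids)]


print(responsible_node(NODE_IDS, 12))  # next node clockwise from 12
print(responsible_node(NODE_IDS, 29))  # wraps around the ring
```

Because the topology is deterministic, every node computes the same answer for the same key, which is what makes lookups efficient.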

Unstructured P2P Systems

each node maintains an ad hoc (temporary) list of neighbours, so the overlay resembles e.g. a random graph
changes its local list almost continuously
searching for data is necessary

Examples of Searching Methods

Flooding
Random Walks
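Both methods can be sketched on a small in-memory graph. The graph, the `holders` map, and the function names below are illustrative assumptions; in a real overlay each hop is a network message. Flooding forwards the query to all neighbours up to a hop limit (TTL), while a random walk forwards it to one randomly chosen neighbour at a time.

```python
import random


def flood_search(graph, holders, start, item, ttl):
    """Return the set of nodes within `ttl` hops of `start` that hold `item`."""
    visited = {start}
    frontier = [start]
    found = {start} if item in holders.get(start, set()) else set()
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbour in graph[node]:
                if neighbour not in visited:  # do not re-query a node
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
                    if item in holders.get(neighbour, set()):
                        found.add(neighbour)
        frontier = next_frontier
    return found


def random_walk_search(graph, holders, start, item, max_steps, rng=random):
    """Follow one random neighbour per step; return a holder or None."""
    node = start
    for _ in range(max_steps):
        if item in holders.get(node, set()):
            return node
        node = rng.choice(graph[node])
    return node if item in holders.get(node, set()) else None


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
holders = {"e": {"song.mp3"}}
print(flood_search(graph, holders, "a", "song.mp3", ttl=3))
print(random_walk_search(graph, holders, "a", "song.mp3", max_steps=50))
```

Flooding finds the item quickly but contacts many nodes; a random walk sends far fewer messages but may take long or fail within its step budget.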

Making Data Search more Scalable in Unstructured P2P

to improve the scalability of data search, an unstructured system can make use of special nodes that maintain an index of data items, creating special collaborations among nodes

Collaborative Distributed Systems

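BitTorrent (a user downloads chunks of a file from other users until the chunks form a complete file)
a global directory is used to find the file
the directory entry contains a link to the file's tracker, a server that keeps track of the active nodes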

Edge-Server Systems

following properties
- Are deployed on the internet
- servers are placed “at the edge” of the network
this helps reduce latency and bandwidth usage, and improves overall performance

2023-05-09

Week4

A Simple Machine

A machine makes available computing and storage resources
- computing resource, eg. CPU
- storage resource, eg. a given amount of primary memory
A process is an executing instance of a program
Computing and storage resource usage is typically controlled by an operating system
an OS aims to make the most efficient use possible of those resources
the OS assigns a unique identity to each process and then controls how a process is granted access to computing resources
the OS also controls how a process is granted access to storage resources by assigning an address space to that process
when the OS ensures that each process P has a single address space A that is exclusive to P, we are allowed a sequential reading of the steps that comprise the process

Sequential reading of the steps that comprise the process

The OS ensures that no other process tampers with x and y, and only foobar has access to x and y

Sequential processing

if foo and bar take a long time to run, we wish to run them concurrently, in different processes, and even better, in parallel, on different processors

Sequential, isolated processing is simple, but bounded and limiting

Non sequential, non isolated processing expands the bounds and limits with respect to performance

Sequential vs. Multi-Processing, Concurrent, Parallel and Distributed Computing

Multi-Processing vs. Multi-Tasking

Two different concepts
An OS multi tasks by:
- allowing more than one process to be underway by controlling how each one makes use of the resources allocated to it
- implementing a scheduling policy, which grants each active process a time slice during which it can access the resources allocated to it
In multi-tasking, processes are not really executing concurrently
the appearance of concurrent execution stems from an effective scheduling policy
If all processes get a fair share of the resources and they get it sufficiently often, it seems to users that all processes are executing concurrently
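The time-slice idea can be sketched with a round-robin simulation. Round-robin is only one possible scheduling policy, chosen here as an assumption; the process names and work units are made up.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Simulate round-robin scheduling.

    `jobs` maps a process name to its remaining work units.
    Returns the order in which (name, units_run) slices were granted.
    """
    queue = deque(jobs.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished: go to the back of the queue for another slice.
            queue.append((name, remaining - run))
    return timeline


print(round_robin({"P1": 3, "P2": 5}, time_slice=2))
# [('P1', 2), ('P2', 2), ('P1', 1), ('P2', 2), ('P2', 1)]
```

Only one process runs in any slice, yet the interleaving makes both appear to progress together.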

Multi-Processing by Forking

a process that forks causes two copies of itself to be active concurrently
The child process is given a copy of the parent process’s address space. The address spaces are distinct
- the child process starts executing after the OS call
- the parent can continue or wait for the child to execute
- the parent must find out how and when the child completes execution
because of the copying, forking can be expensive
modern OSs have strategies that make the actual cost quite affordable
forking is reasonably safe because the address spaces are distinct
When forking is used in multi-processing, consistency of processing results is more likely to occur than in cases where multi-processing is achieved by threading
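A minimal sketch of forking, assuming a POSIX system (`os.fork` is not available on Windows). The `counter` variable and the function name are hypothetical. The child changes its own copy of `counter` and reports the value through a pipe; the parent waits for the child and still sees its original value.

```python
import os

counter = 0


def fork_and_modify():
    """Fork once; return (parent's counter, child's counter, child exit status)."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: runs with a copy of the parent's address space.
        try:
            os.close(read_fd)
            counter += 100  # changes only the child's copy
            os.write(write_fd, str(counter).encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # leave the child without running parent cleanup
    # Parent: read what the child reported, then wait for it to finish.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return counter, child_value, os.WEXITSTATUS(status)


print(fork_and_modify())
```

The `os.waitpid` call is how the parent finds out when and how the child completed.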

Multi-Processing by Threading

if parent and child need to interact and share, threading may be a better approach to multi-processing
with threading, the address space is not copied, it is shared
- this means that if one thread changes a variable, all other threads see the change
- this makes threading less expensive, but also less safe than forking
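A sketch of threads sharing one address space, using Python's standard `threading` module. The `shared` dictionary and `add_many` function are hypothetical. Because every thread updates the same variable, the update is guarded with a lock; without it the result could depend on how the threads interleave.

```python
import threading

shared = {"total": 0}    # visible to every thread in the process
lock = threading.Lock()  # guards updates to the shared value


def add_many(n):
    for _ in range(n):
        with lock:  # only one thread updates at a time
            shared["total"] += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(shared["total"])  # 40000
```

The lock is the price of sharing: it restores the safety that distinct address spaces give for free when forking.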

Concurrent Computing

consider **many** application processes
processes are often threads (the OS schedules the execution of n copies of process Pi, 1<= i <= n, to run in the same processor, typically sharing a single address space)

Parallel Computing

many processors bound by an interconnect (eg. a bus)
there are truly many processes running at the same time

Distributed Computing

many independent, self-sufficient, autonomous, heterogeneous machines
spatial separation
message exchange is needed, network effects are felt
complexity may reach a point in which applications are not written against OS services. Instead, they are written against a middleware API. The middleware then takes some of the complexity upon itself

2023-05-09

Week3

Naming in Distributed Systems

uniquely identify entities
name resolution refers to the means by which a process is allowed to access a named entity, which is supported by a naming system
every computer connected to the internet needs to be “addressable”

Possible approaches to addressing mechanisms

centralised
Free-for-all
by delegating naming responsibilities

The internet is the biggest distributed system

Addressing mechanisms

Centralised Naming Approach

Any name is handed out once and only once
a single point of contact

Limitation: single point of contact is not a very scalable solution, and creates a single point of failure

Free-for-All Naming Approach

Allows any object that wants a name to make up its own name
Although it's a **massively ‘distributed’ solution** which avoids a single point of failure, it does not guarantee uniqueness

The ‘Delegating Naming Responsibilities’ Approach

authority to allocate names is delegated to smaller parts of the system
this approach better balances the conflicting goals of guaranteeing uniqueness and avoiding a single point of failure

but what rules are appropriate for each system?

examples: MAC addresses, IP addresses and Domain Names

MAC Addresses

a unique identifier given to each network device in a system, meaning that every Ethernet or Wi-Fi card in a computer has one MAC address
- There are more MAC addresses than computers (since most computers have several network devices)
- a MAC address is a 48-bit number
a MAC address consists of two main parts:
- the ‘Organisationally unique identifier’ (OUI)
- the ‘Network interface controller’ (NIC)
A MAC address does not tell you where a device is on a network
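A small sketch of splitting a MAC address into its two parts, assuming the usual layout in which the first 24 bits are the OUI and the last 24 bits identify the interface. The function name and the example address are made up for illustration.

```python
def split_mac(mac):
    """Split a MAC address string into (oui, nic) as 24-bit integers."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    value = int("".join(octets), 16)  # the full 48-bit number
    oui = value >> 24                 # top 24 bits: assigned to the manufacturer
    nic = value & 0xFFFFFF            # bottom 24 bits: assigned by the manufacturer
    return oui, nic


oui, nic = split_mac("00:1A:2B:3C:4D:5E")
print(f"OUI={oui:06x} NIC={nic:06x}")
```

Neither half says anything about where the device currently sits on a network, only who made it and which unit it is.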

IP Addresses

Unique identifier and contains some information about where a device is on a network
most IP addresses (IPv4) are 32-bit numbers, but are most often written as four 8-bit numbers separated by dots
Top-level authority for IP addresses is the Internet Assigned Numbers Authority (IANA)
- Unlike MAC addresses, the delegation of IP addresses takes place initially to geographical regions
- For this reason, IP addresses can tell you some information about the location of a device on a network
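The relation between the dotted form and the underlying 32-bit number can be sketched as follows. The helper name is hypothetical, and `192.0.2.1` is an address from a range reserved for documentation; the result is cross-checked against Python's standard `ipaddress` module.

```python
import ipaddress


def dotted_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a dotted-quad IPv4 address: {dotted!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # shift in one 8-bit number at a time
    return value


address = "192.0.2.1"
print(hex(dotted_to_int(address)))  # 0xc0000201
print(dotted_to_int(address) == int(ipaddress.IPv4Address(address)))
```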

Domain Names

created because humans find IP addresses hard to read
Domain Name System (DNS) is used to create associations between human-readable names and IP addresses (eg. www.bbc.co.uk)
‘delegation model’ is complex, as it has aspects of geographical delegation
- eg. ‘.co.uk’ are for UK-based companies

DNS cannot allocate batches of names upfront, and needs to respond in real time to requests to translate names into IP addresses
It achieves this by being a Distributed System consisting of a hierarchy of servers with the most authoritative server at the top of the hierarchy
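A toy sketch of resolution through a hierarchy of servers. The server names, the zone layout, and the name `www.example.co.uk` are all hypothetical, and real DNS involves caching, record types, and network messages that are left out here. Each server either answers from its own records or delegates to a server lower in the hierarchy.

```python
# Hypothetical hierarchy: each server holds delegations and/or final records.
SERVERS = {
    "root": {"delegate": {"uk": "uk-server"}},
    "uk-server": {"delegate": {"co.uk": "co-uk-server"}},
    "co-uk-server": {"delegate": {"example.co.uk": "example-server"}},
    "example-server": {"records": {"www.example.co.uk": "192.0.2.10"}},
}


def resolve(name):
    """Walk down from the root; return (address or None, servers visited)."""
    labels = name.split(".")
    server = "root"
    path = [server]
    while True:
        info = SERVERS[server]
        if name in info.get("records", {}):
            return info["records"][name], path
        # Look for a suffix of the name that this server has delegated.
        next_server = None
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in info.get("delegate", {}):
                next_server = info["delegate"][suffix]
                break
        if next_server is None:
            return None, path  # nobody below knows this name
        server = next_server
        path.append(server)


print(resolve("www.example.co.uk"))
```

The returned path shows the query moving from the top of the hierarchy down to the server that actually holds the record.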

Protocols

define sets of rules for how two or more objects should interact with one another
serve as specifications rather than implementations

(eg. HTTP)

HTTP Protocol: Statelessness

Request-response protocol, transfer data between web servers and web clients (typically web browsers)
Content exchange protocol
web servers are said to be stateless, meaning that once a request from a client application is fulfilled, the web server disconnects from the client and “forgets” that the client ever connected
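Statelessness can be sketched as a handler whose response depends only on the request itself, never on earlier requests. The function, the `documents` mapping, and the paths are hypothetical; the status codes 200, 404 and 405 are standard HTTP codes.

```python
def handle_request(method, path, documents):
    """Return (status code, body) using only the information in this request."""
    if method != "GET":
        return 405, ""  # method not allowed in this toy server
    if path in documents:
        return 200, documents[path]
    return 404, ""


documents = {"/index.html": "<h1>Home</h1>"}

# The server remembers nothing between calls, so repeating a request
# gives exactly the same response.
first = handle_request("GET", "/index.html", documents)
second = handle_request("GET", "/index.html", documents)
print(first == second)
```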

Email

more complex system than ‘the Web’
an email has to be in exactly one place at any one time
- if it is in two places at the same time then it has been duplicated accidentally
- if it is in zero places then it has been lost in the system, to the detriment of both the sender and the recipient of that email

Email Associated Protocols

The Simple Mail Transfer Protocol (SMTP)
- connection based
- Content exchange protocol
- a client (mail agent) can issue multiple consecutive commands to the SMTP server, and should explicitly terminate its connection when it's finished
- treats a single client-server interaction as an individual and complete transaction
- a series of client-server interactions can take place before the connection between the client and the server is explicitly terminated
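The connection-based behaviour can be sketched as a toy session object. The class is hypothetical and heavily simplified: the verbs `HELO`, `MAIL FROM`, `RCPT TO`, `DATA` and `QUIT` and the reply codes are taken from SMTP, but a real server replies to `DATA` with an intermediate 354 before accepting the message body, which is collapsed into one step here.

```python
class ToySmtpSession:
    """One client connection that can carry several mail transactions."""

    def __init__(self):
        self.open = True
        self.sender = None
        self.recipients = []
        self.delivered = []

    def command(self, line, body=""):
        if not self.open:
            raise RuntimeError("connection already closed")
        upper = line.upper()
        if upper.startswith("HELO"):
            return "250 hello"
        if upper.startswith("MAIL FROM:"):
            self.sender = line[len("MAIL FROM:"):].strip()
            self.recipients = []  # a new transaction begins
            return "250 OK"
        if upper.startswith("RCPT TO:"):
            if self.sender is None:
                return "503 bad sequence of commands"
            self.recipients.append(line[len("RCPT TO:"):].strip())
            return "250 OK"
        if upper == "DATA":
            if self.sender is None or not self.recipients:
                return "503 bad sequence of commands"
            self.delivered.append((self.sender, tuple(self.recipients), body))
            self.sender, self.recipients = None, []  # transaction complete
            return "250 OK"
        if upper == "QUIT":
            self.open = False  # explicit termination by the client
            return "221 closing connection"
        return "500 unrecognised command"


session = ToySmtpSession()
session.command("HELO client.example")
for recipient in ("<bob@example.org>", "<carol@example.org>"):
    session.command("MAIL FROM:<alice@example.org>")
    session.command(f"RCPT TO:{recipient}")
    session.command("DATA", body="Hi!")
print(session.command("QUIT"), len(session.delivered))
```

Two complete transactions take place over the one connection before the client ends it with `QUIT`.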