Distributed storage

From Higher Computing Science

Key points

  • Distributed storage is a way of storing files on more than one set of backing storage devices, usually by replicating data. Reasons for doing this include increasing access speed, providing redundancy, and sharing the load of requests for data across the hardware.

Information

Further information

Distributed databases

Major online services such as Google, Facebook or Amazon need to store very large amounts of data. To meet the data needs of billions of requests, the requests are redirected to multiple servers (nodes), all of which can access the same data. In networking, this is called load balancing. For the data itself, a replication strategy must be in place so that every node holds the same data: when a user makes a change, a distributed database must copy that change to all nodes.
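
The idea of replicating every change to all nodes, then sharing requests between them, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name ReplicatedStore, the use of dictionaries as nodes and the round-robin way of choosing a node are all made up for illustration, and real distributed databases use far more sophisticated strategies.

```python
from itertools import cycle


class ReplicatedStore:
    """Toy key-value store that copies every write to all nodes (hypothetical)."""

    def __init__(self, node_count):
        # Each node is modelled as an ordinary dictionary.
        self.nodes = [{} for _ in range(node_count)]
        # Round-robin iterator used to share reads between the nodes.
        self._next_node = cycle(range(node_count))

    def write(self, key, value):
        # Replication: apply the change to every node so they all hold the same data.
        for node in self.nodes:
            node[key] = value

    def read(self, key):
        # Load balancing: each read is served by the next node in turn.
        index = next(self._next_node)
        return index, self.nodes[index][key]


store = ReplicatedStore(node_count=3)
store.write("user:1", "Alice")

# Four reads are spread across the three nodes, and each returns the same value.
served_by = [store.read("user:1") for _ in range(4)]
print(served_by)
```

Because the write was copied to every node, it does not matter which node answers a request: each one returns the same value.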

Peer to Peer distributed storage

Technologies like BitTorrent can be used to copy a file between a primary source (known as a seed) and other computers. Every computer that hosts a complete copy of the file is a seed computer, and each computer that is still downloading the file, and so has only a partial copy, is a peer computer. When a new computer requests the file from the network, the BitTorrent protocol will connect to several computers and download blocks of the file from each. This distributes the load across the computers, and allows file sharing to work without a central source, as long as at least one computer has the complete file or the connected computers hold every block of the file between them.
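
The way blocks are gathered from several computers can be illustrated with a simplified Python sketch. It is not the real BitTorrent protocol: the function name download, the computer names and the rule of asking whichever computer has served the fewest blocks so far are assumptions made for this example, and there is no networking or checking of blocks.

```python
def download(block_count, sources):
    """Rebuild a file by fetching each block from a computer that holds it.

    `sources` maps a computer name to a dict of {block_index: block_bytes}.
    Hypothetical helper for illustration only.
    """
    names = sorted(sources)
    load = {name: 0 for name in names}  # blocks served by each computer
    blocks = []
    for index in range(block_count):
        # Find every computer that has this block.
        holders = [name for name in names if index in sources[name]]
        if not holders:
            raise LookupError(f"no computer holds block {index}")
        # Share the load: ask the holder that has served the fewest blocks so far.
        chosen = min(holders, key=lambda name: load[name])
        load[chosen] += 1
        blocks.append(sources[chosen][index])
    return b"".join(blocks), load


file_blocks = [b"dist", b"ribu", b"ted!"]
swarm = {
    "seed": dict(enumerate(file_blocks)),              # complete copy
    "peer_a": {0: file_blocks[0], 1: file_blocks[1]},  # partial copy
    "peer_b": {2: file_blocks[2]},                     # partial copy
}

data, load = download(len(file_blocks), swarm)
print(data, load)

# Even without the seed, the two peers hold every block between them.
peers_only = {name: held for name, held in swarm.items() if name != "seed"}
data_without_seed, _ = download(len(file_blocks), peers_only)
```

In this run each of the three computers serves one block, and the download still succeeds when the seed is removed because the remaining peers hold all of the blocks between them.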

Test yourself

Teaching resources