Posts

Showing posts from March, 2020

[Distributed File System] MapReduce: simplified data processing on large clusters Overview

This is a summary of the paper "MapReduce: simplified data processing on large clusters". What is the paper trying to do? Inspired by the map and reduce primitives in functional languages, the paper introduces a new abstraction in which map and reduce operations allow a program to parallelize large computations easily. In short, it allows anyone to execute programs with parallelization, fault-tolerance, data distribution and load balancing without dealing with the mess that usually comes with them. What do you think is the contribution of the paper? The major contribution of this paper is an interface that distributes large-scale computations using MapReduce, which lets it achieve ‘automatic parallelization’. Another big contribution is the implementation of this interface on large clusters of commodity PCs. What are its major strengths? The model is easy to use. Because it hides away the details of parallelization, fault-tolerance, locality optimiza
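The map and reduce abstraction described in this excerpt can be illustrated with a word count. What follows is only a minimal single-process sketch, not the paper's distributed implementation: the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and the in-memory grouping step stands in for the shuffle that a real cluster would perform across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical): emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """Reduce step (hypothetical): sum all counts emitted for one word."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Run map, group intermediate values by key, then reduce.

    Everything happens in one process here; the grouping dictionary
    plays the role of the shuffle phase between map and reduce workers.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

The user only writes the two small functions; everything inside `run_mapreduce` is the part a framework would take over, which is where parallelization and fault-tolerance would live.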

[Distributed File System] A Berkeley View on Serverless Computing Overview

This post is a summary of the paper "A Berkeley View on Serverless Computing". What is the paper trying to do? The paper serves as a good introduction to serverless computing. It introduces the concept, then goes on to discuss how it started (the motivations), its limitations and what the authors predict serverless computing will become in the future. It does a good job of explaining how the serverless cloud handles virtually all system administration operations and makes it easier for programmers to do what they usually do on the cloud. What do you think is the contribution of the paper? Again, the paper’s major contribution is its very detailed introduction to serverless computing. What are its major strengths? The major strengths of serverless computing are: The appearance of infinite computing resources on demand. The elimination of an up-front commitment by cloud users. The ability to pay for use of computing resources on

[Distributed File System] Dynamo: Amazon's Highly Available Key-value Store Overview

This is a summary of the paper "Dynamo: Amazon's Highly Available Key-value Store". What is the paper trying to do? This paper is trying to “present the design and implementation of Dynamo, a highly available key-value storage system” that is used in Amazon’s core services. By sacrificing consistency under certain failure scenarios, Dynamo is able to provide an “always-on” experience in terms of availability. This allows it to handle server failures, data center failures and network partitions successfully. Additionally, Dynamo is incrementally scalable and can scale up and down while it is up and running. Dynamo uses a combination of techniques, including extensive use of object versioning, application-assisted conflict resolution, and data partitioning and replication via consistent hashing. Additionally, during updates, a quorum-like technique and a decentralized replica synchronization protocol are used to maintain consistency amongst replicas. What do yo
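The partitioning and replication via consistent hashing mentioned in this excerpt can be sketched as follows. This is a simplified illustration under stated assumptions, not Dynamo's actual code: the `HashRing` class, the `preference_list` method and the node names are hypothetical, each node gets a single position on the ring (no virtual nodes), and a key is stored on the first `replicas` nodes found walking clockwise from the key's position.

```python
import bisect
import hashlib
from typing import List


def ring_position(key: str) -> int:
    """Hash a string to a position on the ring (MD5 read as an integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with one position per node."""

    def __init__(self, nodes: List[str], replicas: int = 3) -> None:
        self.replicas = replicas
        # Sort nodes by their ring position so lookups can use binary search.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def preference_list(self, key: str) -> List[str]:
        """Return the nodes responsible for a key, walking clockwise."""
        if not self._ring:
            return []
        size = len(self._ring)
        # First node clockwise from the key; wrap around past the last node.
        start = bisect.bisect_right(self._positions, ring_position(key)) % size
        count = min(self.replicas, size)
        return [self._ring[(start + i) % size][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.preference_list("cart:42")
```

A useful property of this layout is that adding a node only affects keys whose clockwise walk passes through the new node; every other key keeps exactly the same replicas, which is what makes incremental scaling cheap.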

[Distributed File System] Introduction to Ceph

Ceph

1. Preface
This post focuses on the paper "Ceph: A Scalable, High-Performance Distributed File System". Note that the paper was written in 2006, so the current implementation may differ from what is described in the paper (and here). In-depth concepts such as CRUSH, the Metadata Server and the Object Storage Device cluster are skipped in this post (they may be covered in future posts).

2. Introduction
Ceph is a distributed file system built on several philosophical and design principles.

2.1. Philosophical Principles
First, Ceph is open source. This means that it is free to use, not tied to a specific vendor, free to modify and free to share. Second, Ceph is community-focused: anybody can help decide its future direction, fix a bug or update the documentation. The idea is that all of us together are smarter than some of us, so we end up with a better product.

2.2. Design Principles
First, Ceph is scalable.