MIT 6.824 · Lecture 3: GFS #
The 2021 course was taught online, and after two lectures it felt like less material was being covered than in the in-person version (probably because there is no blackboard), so I went back to the Spring 2020 lectures. The study plan is therefore adjusted to: 2020 Lectures + 2021 Labs.
Big Storage #
Why is it hard? #
- Performance -> sharding
- Faults -> tolerance
- Tolerance -> replication
- Replication -> inconsistency
- Consistency -> Low performance
The chain comes full circle: the pursuit of performance ends up costing performance.
Strong Consistency #
- C1: Wx1 (write x = 1)
- C2: Wx2 (write x = 2)
- C3: Rx → 2 (read x)
- C4: Rx → 2 (read x)
Clients 3 and 4 should both see the result of Wx2 and not the stale value from Wx1 (and they must never see different values from each other), as if a single server had executed Wx1 and then Wx2.
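A minimal sketch of that expectation (my own illustration in Go, not GFS code): if the store behaves like one server applying writes in a single order, every later read returns the latest write.

```go
package main

import "fmt"

// A strongly consistent store behaves like a single server: writes are
// applied in one total order, and every read that starts after a write
// completes observes that write's effect.
type Store struct {
	data map[string]int
}

func (s *Store) Write(key string, val int) { s.data[key] = val }
func (s *Store) Read(key string) int       { return s.data[key] }

func main() {
	s := &Store{data: map[string]int{}}
	s.Write("x", 1)          // C1: Wx1
	s.Write("x", 2)          // C2: Wx2, ordered after Wx1
	fmt.Println(s.Read("x")) // C3: Rx -> 2
	fmt.Println(s.Read("x")) // C4: Rx -> 2, never a stale 1
}
```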
GFS #
- Big, fast
- Global (allows sharing across applications)
- Sharding
- Automatic recovery
Single data center #
- Internal use
- For big sequential access (not for random access)
Master Data #
- Filename -> array of chunk handles (non-volatile)
- Chunk handle ->
  - list of chunk servers (volatile)
  - version # (non-volatile)
  - primary (volatile)
  - lease expiration (volatile)
- Log and checkpoints -> on disk
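A rough Go sketch of these tables (field and type names are my own; the real GFS data structures are not public). Everything marked non-volatile must also be recorded in the operation log and checkpoints on disk so it survives a master reboot.

```go
package gfsmaster

import "time"

// ChunkInfo holds the per-chunk state the master keeps in memory.
type ChunkInfo struct {
	Servers  []string  // chunk servers holding a replica (volatile)
	Version  uint64    // version number (non-volatile: also logged to disk)
	Primary  string    // current primary, empty if none (volatile)
	LeaseExp time.Time // when the primary's lease expires (volatile)
}

// Master maps filenames to chunks and chunks to their state.
type Master struct {
	files  map[string][]uint64   // filename -> array of chunk handles (non-volatile)
	chunks map[uint64]*ChunkInfo // chunk handle -> chunk state
	// plus an operation log and periodic checkpoints on disk
}
```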
Read #
- Client sends the filename and offset to the master.
- Master returns the chunk handle and the list of chunk servers holding that chunk; the client caches this answer so it does not have to ask the master again.
- Client sends the request to one of the chunk servers, which returns the data.
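A sketch of the client-side read path in Go; the RPC names (LookupChunk, ReadChunk) and types are my own stand-ins, not the actual GFS API.

```go
package gfsclient

import "math/rand"

const chunkSize = 64 << 20 // GFS uses 64 MB chunks

// ChunkLocation is what the master returns for one chunk.
type ChunkLocation struct {
	Handle  uint64
	Servers []string
}

// Master and ChunkServer stand in for the RPC stubs.
type Master interface {
	LookupChunk(filename string, chunkIndex int64) (ChunkLocation, error)
}
type ChunkServer interface {
	ReadChunk(handle uint64, offset int64, n int) ([]byte, error)
}

type cacheKey struct {
	filename   string
	chunkIndex int64
}

type Client struct {
	master  Master
	servers map[string]ChunkServer
	cache   map[cacheKey]ChunkLocation // client caches the master's replies
}

// ReadAt sketches the read path: ask the master once, cache the answer,
// then fetch the bytes from any replica of the chunk.
func (c *Client) ReadAt(filename string, offset int64, n int) ([]byte, error) {
	idx := offset / chunkSize
	loc, ok := c.cache[cacheKey{filename, idx}]
	if !ok {
		var err error
		loc, err = c.master.LookupChunk(filename, idx)
		if err != nil {
			return nil, err
		}
		c.cache[cacheKey{filename, idx}] = loc
	}
	// Any replica will do for a read.
	addr := loc.Servers[rand.Intn(len(loc.Servers))]
	return c.servers[addr].ReadChunk(loc.Handle, offset%chunkSize, n)
}
```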
Writes #
- No primary yet? On the master:
  - Find the up-to-date replicas (those whose chunk version equals the version number the master has recorded)
  - Pick a primary and secondaries from them
  - Increment the version number (v#)
  - Tell the primary and secondaries the new version number, and grant the primary a lease
  - The master writes the new version number to disk
- The primary picks an offset, and all replicas (including the primary itself) are told to write the data at that offset
- If all secondaries say "yes", the primary replies "success" to the client; otherwise it replies "no" and the client must retry (see the sketch below)
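A Go sketch of the master's "no primary" path, continuing the Master/ChunkInfo types sketched earlier. replicaVersion, notifyReplicas, and persistVersion are hypothetical helpers standing in for RPCs and the disk write; they are not real GFS calls.

```go
package gfsmaster

import (
	"errors"
	"time"
)

// ensurePrimary returns the chunk's current primary, designating one
// first if no valid primary exists.
func (m *Master) ensurePrimary(handle uint64) (*ChunkInfo, error) {
	c := m.chunks[handle]
	if c.Primary != "" && time.Now().Before(c.LeaseExp) {
		return c, nil // a valid primary already exists
	}

	// 1. Find the up-to-date replicas: those whose chunk version
	//    equals the version number the master has recorded.
	var upToDate []string
	for _, s := range c.Servers {
		if m.replicaVersion(s, handle) == c.Version {
			upToDate = append(upToDate, s)
		}
	}
	if len(upToDate) == 0 {
		return nil, errors.New("no up-to-date replica reachable; must wait")
	}

	// 2. Pick a primary (the rest become secondaries) and bump the version.
	c.Primary = upToDate[0]
	c.Version++

	// 3. Tell the replicas the new version, grant the primary a lease,
	//    and record the new version on disk so it survives a reboot.
	c.LeaseExp = time.Now().Add(60 * time.Second)
	m.notifyReplicas(upToDate, handle, c.Version, c.Primary, c.LeaseExp)
	m.persistVersion(handle, c.Version)
	return c, nil
}
```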
How do we deal with scenarios like "split brain", where a network partition could leave the system with multiple primaries?
The answer is leases: the master grants the primary a lease (e.g. 60 seconds) and never designates a new primary until the old lease has expired, so at most one valid primary exists at any time.
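A tiny sketch of that rule, again continuing the Master/ChunkInfo types above (my own illustration):

```go
package gfsmaster

import "time"

// When the master loses contact with a primary, it cannot tell
// "crashed" apart from "partitioned". Appointing a replacement while
// the old lease is still valid could leave two primaries accepting
// writes at once (split brain), so the master waits the lease out.
func (m *Master) canAppointNewPrimary(handle uint64) bool {
	c := m.chunks[handle]
	return time.Now().After(c.LeaseExp)
}
```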
What would it take to make GFS support strong consistency? #
- Duplicate detection: the primary must detect and discard duplicate client requests instead of applying a retried request twice (see the sketch below).
- Secondary confirmation: secondaries must actually apply the write and confirm it, rather than silently dropping or failing a request.
- Two-phase commit: first ask every replica "can you do it?", and only after all of them have promised does the primary tell them to commit.
- When a primary crashes, the new primary has to start by explicitly resynchronizing the tail of recent operations with the secondaries.
- Because secondaries may differ slightly, or a client may hold slightly stale cached information, the system may need to send all client reads through the primary.
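A minimal sketch of the first item, duplicate detection (my own illustration, not part of GFS): the client tags each request with an ID, and the primary remembers the outcome so a retransmitted request is answered without being applied a second time.

```go
package dedup

// Request is a client operation tagged with a unique (ClientID, Seq) pair.
type Request struct {
	ClientID string
	Seq      int64         // per-client sequence number
	Apply    func() string // the operation itself; returns its result
}

type dedupKey struct {
	client string
	seq    int64
}

// Primary remembers the result of each request it has executed.
type Primary struct {
	seen map[dedupKey]string
}

func NewPrimary() *Primary {
	return &Primary{seen: make(map[dedupKey]string)}
}

// Handle executes a request at most once: a duplicate gets the saved
// result instead of being applied again.
func (p *Primary) Handle(r Request) string {
	k := dedupKey{r.ClientID, r.Seq}
	if res, ok := p.seen[k]; ok {
		return res
	}
	res := r.Apply()
	p.seen[k] = res
	return res
}
```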
High-level summary #
- Tremendously successful: a lot of Google's infrastructure was built as a layer on top of GFS.
- Most serious limitation: the single master, which keeps all file and chunk metadata in memory and can eventually run out of RAM.
- Some applications found it hard to deal with GFS's odd (weak) consistency semantics.
- The master could not fail over automatically; recovery required human intervention and took on the order of ten minutes or more.