Degrees of Consistency

Advanced Topics in Computer Systems	Fall 2001
Joe Hellerstein & Anthony Joseph

Degrees of Consistency (a/k/a Isolation Levels)

Despite all the discussion of ACID, sometimes it's nice to sacrifice semantic guarantees for the sake of performance. The goal is to let individual transactions choose this WITHOUT messing up the database or the other transactions that do care.

Gray, et al.: Degrees of Consistency

First, a definition: A write is committed when the transaction is finished; otherwise, the write is dirty.

A Locking-Based Description of Degrees of Consistency:

Strictly speaking, this describes how to achieve the degrees via locking rather than what the degrees are, but it is the better-defined of the two descriptions.

Degree 0: set short write locks on updated items ("short" = length of action)
Degree 1: set long write locks on updated items ("long" = EOT)
Degree 2: set long write locks on updated items, and short read locks on items read
Degree 3: set long write and read locks
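
The four locking rules above can be read as a small policy table. The following is a minimal sketch of that table only, not a lock manager; the names Duration, LOCK_POLICY, and locks_held_at_eot are illustrative assumptions and do not come from Gray et al.

```python
from enum import Enum


class Duration(Enum):
    NONE = "none"    # no lock is taken at all
    SHORT = "short"  # held only for the length of the action
    LONG = "long"    # held until end of transaction (EOT)


# degree -> (write-lock duration, read-lock duration), per the list above
LOCK_POLICY = {
    0: (Duration.SHORT, Duration.NONE),
    1: (Duration.LONG, Duration.NONE),
    2: (Duration.LONG, Duration.SHORT),
    3: (Duration.LONG, Duration.LONG),
}


def locks_held_at_eot(degree, reads, writes):
    """Return the (item, mode) locks a transaction still holds just before EOT."""
    write_duration, read_duration = LOCK_POLICY[degree]
    held = set()
    if write_duration is Duration.LONG:
        held |= {(item, "X") for item in writes}
    if read_duration is Duration.LONG:
        held |= {(item, "S") for item in reads}
    return held
```

For example, a degree 2 transaction that read item a and wrote item b still holds only the X lock on b at EOT, while the same transaction at degree 3 also holds the S lock on a.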

A Dirty-Data Description of Degrees of Consistency

Transaction T sees degree X consistency if...

Degree 0: T does not overwrite dirty data of other transactions
Degree 1:

T sees degree 0 consistency, and
T does not commit any writes before EOT

Degree 2:

T sees degree 1 consistency, and
T does not read dirty data of other transactions

Degree 3:

T sees degree 2 consistency, and
Other transactions do not dirty any data read by T before T completes.

Examples of Inconsistencies prevented by Various Degrees

Garbage reads:

T1: write(X); T2: write(X)

Who knows what value X will end up being?

Solution: set short write locks (degree 0)

Lost Updates:

T1: write(X)

T2: write(X)

T1: abort (physical UNDO restores X to pre-T1 value)

At this point, T2's update is lost.

Solution: set long write locks (degree 1)
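
The lost-update schedule can be replayed by hand. This is a minimal sketch under simplifying assumptions: a single item X in an in-memory dict, physical UNDO modeled as restoring a saved before-image, and blocking on a long write lock modeled by simply deferring T2's write until after T1's EOT. The function name run_schedule and its flag are hypothetical.

```python
def run_schedule(long_write_locks):
    """Replay T1: write(X); T2: write(X); T1: abort. Return the final X."""
    db = {"X": 0}

    t1_before_image = db["X"]  # T1 saves the before-image for physical UNDO
    db["X"] = 1                # T1: write(X); X is now dirty

    # With long write locks, T2 must wait until T1 reaches EOT.
    t2_must_wait = long_write_locks
    if not t2_must_wait:
        db["X"] = 2            # T2: write(X) over T1's dirty data, then commit

    db["X"] = t1_before_image  # T1: abort, physical UNDO restores the old X

    if t2_must_wait:
        db["X"] = 2            # T2 now acquires the lock, writes, and commits

    return db["X"]
```

With short write locks the final value is T1's before-image, so T2's committed write has vanished; with long write locks T2's write survives.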

Dirty Reads:

T1: write(X)

T2: read(X)

T1: abort

Now T2’s read is bogus.

Solution: set long X (write) locks and short S (read) locks (degree 2)

Many systems do long-running queries at degree 2.

Unrepeatable reads:

T1: read(X)

T2: write(X)

T2: end transaction

T1: read(X)

Now T1 has read two different values for X.

Solution: long read locks (degree 3)

Phantoms:

T1: read range [x - y]

T2: insert z, x < z < y

T2: end transaction

T1: read range [x - y]

z is a "phantom" data item (eek!)

Solution: ??
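
To see why the degree 3 item locks do not help here, the schedule can be replayed with T1 holding long read locks on every item it actually read. This is a minimal sketch assuming integer keys in an in-memory set; range_read and phantom_demo are hypothetical names, and the sketch only shows the problem, not a solution.

```python
def range_read(table, lo, hi):
    """Return the keys in [lo, hi], in sorted order."""
    return sorted(key for key in table if lo <= key <= hi)


def phantom_demo():
    table = {10, 20, 30}

    first = range_read(table, 10, 30)  # T1: read range [x - y]
    locked_items = set(first)          # T1 holds long S locks on the items it saw

    z = 25                             # T2: insert z, x < z < y
    t2_blocked = z in locked_items     # z did not exist, so nothing locks it
    if not t2_blocked:
        table.add(z)                   # T2 inserts and ends its transaction

    second = range_read(table, 10, 30)  # T1: read range [x - y] again
    return first, second, t2_blocked
```

The second read returns one more key than the first even though no item T1 locked was ever modified.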

NOTE: if everybody is at least degree 1, then different transactions can CHOOSE what degree they wish to "see" without worry. I.e., the system can run a mixture of consistency levels.

Adya, et al. : Generalized Isolation Levels

Gray et al's definitions (and the resulting ANSI standards) are not implementation-independent, and semantics are ill-defined.

Want implementation-independent, semantic definitions of isolation levels that are as permissive as possible (allow as many schedules as possible).

Key insight: many dependencies are multi-object. Capture those, and you'll get the right semantics.

Conflicts in Adya's Serialization Graphs:

Read dependencies (WR):

Def'n: Ti changes the matches of Tj for Tj's predicate-based reads if Ti installs a new version that either adds to or deletes from one of Tj's read predicates.
Tj directly read-depends on Ti if Ti installs some version that Tj subsequently reads (item-read-depends), or if Ti changes the matches of Tj (predicate-read-depends).
A way to think about predicate-based reads or phantoms: imagine that every object is versioned, there are "ghost versions" of objects before they're born and after they die. Predicate-based reads look at all latest versions of all objects (including ghosts), and what matters is the set of objects that do or do not match. See example on page 7 of the paper for H_{pred-read}

Anti-dependencies (RW)

Def'n: Tj overwrites a predicate-based read by Ti if Tj installs a new version of an object in the read by Ti that changes the matches of Ti.
Tj directly anti-depends on Ti if Ti reads an object, and Tj installs the very next version of that object, or if Tj's install of any later version changes the matches of a read by Ti.

Write dependencies (WW)

Tj directly write-depends on Ti if Ti installs a version of an object, and Tj installs the next version. (Note there's no predicate-based version of write dependencies, since a predicate-based database write reads via the predicate and then writes individual tuples.)

Direct Serialization Graph:

nodes are committed xacts
edges are directed by time, labeled WR, RW, or WW.

Now we can talk about isolation in terms of serialization graphs and "histories" ("schedules"), NOT implementation.
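
The three edge types can be extracted mechanically from a simple history. This is a minimal sketch under loud assumptions: item reads and writes only (no predicates), every transaction commits, each transaction writes an object at most once, each read sees the latest version installed so far, and the version order is the order of writes in the history. The history encoding and the function name direct_serialization_graph are illustrative, not Adya's notation.

```python
def direct_serialization_graph(history):
    """Build DSG edges from a history of (txn, op, obj) with op in {"r", "w"}.

    Returns a set of (from_txn, to_txn, label), label in {"WR", "RW", "WW"}.
    """
    last_writer = {}  # obj -> txn that installed the current version
    readers = {}      # obj -> txns that have read the current version
    edges = set()

    for txn, op, obj in history:
        writer = last_writer.get(obj)
        if op == "r":
            # Read dependency: txn read a version installed by writer.
            if writer is not None and writer != txn:
                edges.add((writer, txn, "WR"))
            readers.setdefault(obj, set()).add(txn)
        elif op == "w":
            # Write dependency: txn installs the version right after writer's.
            if writer is not None and writer != txn:
                edges.add((writer, txn, "WW"))
            # Anti-dependency: txn overwrites a version that reader read.
            for reader in readers.get(obj, set()):
                if reader != txn:
                    edges.add((reader, txn, "RW"))
            last_writer[obj] = txn
            readers[obj] = set()
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return edges


# The unrepeatable-read schedule from earlier in these notes.
unrepeatable = [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "r", "X")]
unrepeatable_edges = direct_serialization_graph(unrepeatable)
```

For that schedule the graph has an RW edge from T1 to T2 and a WR edge from T2 back to T1, that is, a cycle containing an anti-dependency edge.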

Adya's Isolation Levels
These try to generalize Gray's degrees. PL-x = "Portable Level x".

PL-1: try to serialize based on writes alone (ignore reads) -- ensure that updates are not interleaved.

Specifically, no cycles containing only WW edges are allowed.
Note: more permissive than Gray's Degree 1: allows concurrent xacts to modify the same object...just ensure no cycles.
But obvious locking implementation of PL-1: long write locks.

PL-2: avoid aborted reads

Specifically, no aborted reads, no intermediate reads, and no circular information flow (dependency-edge cycles in serialization graph)
Note: cascaded aborts and/or commit delays prevent aborted reads
Note: no intermediate reads means that committed xactions read only committed data
More permissive than Degree 2, allows reads from uncommitted xacts
Obvious locking implementation: long write locks, short read locks

PL-3: prevent xactions from committing if they perform inconsistent reads or writes

Specifically, do PL-2 AND no anti-dependency cycles
More permissive than Degree 3, since a modifying xact can update an object previously read by another uncommitted xact
Obvious locking implementation: 2PL

PL-2.99: generalize "REPEATABLE READ"

REPEATABLE READ was long locks for everything except predicate reads (phantoms can happen)
PL-2.99 is PL-2 + no cycles with item-anti-dependency edges
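
The cycle conditions above reduce to reachability checks on the labeled graph. This is a minimal sketch assuming the graph is given as a set of (from, to, label) edges with labels "WW", "WR", and "RW". It covers only the cycle conditions: aborted reads and intermediate reads are not modeled, and since there are no predicate edges here, PL-2.99 and PL-3 coincide. The names has_cycle_through and strongest_level are hypothetical.

```python
def reachable(adjacency, start, goal):
    """Depth-first search: can goal be reached from start?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, ()))
    return False


def has_cycle_through(edges, allowed, required):
    """True if some cycle uses only `allowed` labels and at least one
    edge whose label is in `required` (required must be a subset of allowed)."""
    adjacency = {}
    for src, dst, label in edges:
        if label in allowed:
            adjacency.setdefault(src, []).append(dst)
    # An edge src -> dst lies on a cycle iff dst can get back to src.
    return any(
        label in required and reachable(adjacency, dst, src)
        for src, dst, label in edges
    )


def strongest_level(edges):
    """Return the strongest PL level whose cycle conditions the graph meets."""
    if has_cycle_through(edges, {"WW"}, {"WW"}):
        return None    # write-only cycle: not even PL-1
    if has_cycle_through(edges, {"WW", "WR"}, {"WW", "WR"}):
        return "PL-1"  # cycle of dependency edges: fails PL-2
    if has_cycle_through(edges, {"WW", "WR", "RW"}, {"RW"}):
        return "PL-2"  # cycle through an anti-dependency: fails PL-3
    return "PL-3"
```

For example, a graph with an RW edge from T1 to T2 and a WR edge from T2 to T1 (the unrepeatable-read pattern) passes the PL-2 cycle condition but not PL-3.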

Modeling Mixed-Mode Systems

A mixed serialization graph contains only the dependencies relevant to a transaction's level (or obligatory dependencies required by other transactions' modes).