Advanced Topics in Computer Systems
Fall 2001
Joe Hellerstein & Anthony Joseph
Degrees of Consistency (a/k/a Isolation Levels)
Despite all the discussion of ACID, sometimes it's nice to sacrifice semantic
guarantees for the sake of performance. The goal is to let individual
transactions choose this WITHOUT messing up the database or the other transactions
that do care.
Gray, et al.: Degrees of Consistency
First, a definition: A write is committed when its transaction is finished; otherwise,
the write is dirty.
A Locking-Based Description of Degrees of Consistency:
This is not actually a description of the degrees, but rather of how to
achieve them via locking. But it’s better defined.
- Degree 0: set short write locks on updated items ("short" = length
of action)
- Degree 1: set long write locks on updated items ("long" = EOT)
- Degree 2: set long write locks on updated items, and short read locks
on items read
- Degree 3: set long write and read locks
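The locking rules above can be written down as a small lookup table. This is only an illustrative sketch: the names Duration, LOCK_PROTOCOL, and lock_plan are hypothetical and do not come from Gray et al.

```python
from enum import Enum


class Duration(Enum):
    NONE = "none"    # no lock taken at all
    SHORT = "short"  # held only for the length of the action
    LONG = "long"    # held until end of transaction (EOT)


# degree -> (write-lock duration, read-lock duration), as in the list above
LOCK_PROTOCOL = {
    0: (Duration.SHORT, Duration.NONE),
    1: (Duration.LONG, Duration.NONE),
    2: (Duration.LONG, Duration.SHORT),
    3: (Duration.LONG, Duration.LONG),
}


def lock_plan(degree, action):
    """Return how long a lock is held for a 'read' or 'write' at this degree."""
    write_duration, read_duration = LOCK_PROTOCOL[degree]
    return write_duration if action == "write" else read_duration
```

For example, lock_plan(2, "read") yields Duration.SHORT, while lock_plan(1, "read") yields Duration.NONE because degree 1 takes no read locks.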
A Dirty-Data Description of Degrees of Consistency
Transaction T sees degree X consistency if...
- Degree 0: T does not overwrite dirty data of other transactions
- Degree 1: T sees degree 0 consistency, and T does not commit any writes before EOT
- Degree 2: T sees degree 1 consistency, and T does not read dirty data of other transactions
- Degree 3: T sees degree 2 consistency, and other transactions do not dirty any data read by T before T completes.
Examples of Inconsistencies prevented by Various Degrees
- Garbage reads:
T1: write(X); T2: write(X)
Who knows what value X will end up being?
Solution: set short write locks (degree 0)
- Lost Updates:
T1: write(X)
T2: write(X)
T1: abort (physical UNDO restores X to pre-T1 value)
At this point, T2's update is lost
Solution: set long write locks (degree 1)
- Dirty Reads:
T1: write(X)
T2: read(X)
T1: abort
Now T2’s read is bogus.
Solution: set long X locks and short S locks (degree 2)
Many systems do long-running queries at degree 2.
- Unrepeatable reads:
T1: read(X)
T2: write(X)
T2: end transaction
T1: read(X)
Now T1 has read two different values for X.
Solution: long read locks (degree 3)
- Phantoms:
T1: read range [x - y]
T2: insert z, x < z < y
T2: end transaction
T1: read range [x - y]
z is a "phantom" data item (eek!)
Solution: ??
NOTE: if everybody is at least degree 1, then different transactions
can CHOOSE what degree they wish to "see" without worry. I.e. can have
a mixture of levels of consistency.
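The lost-update example above can be replayed with a toy simulation. This is a minimal sketch under stated assumptions: a single item X, physical UNDO via before-images, and a blocked writer that simply retries once the lock is released. The function run_lost_update and its flag long_write_locks are hypothetical names used only for illustration.

```python
def run_lost_update(long_write_locks):
    """Replay T1: write(X); T2: write(X); T1: abort, and return the final X."""
    db = {"X": 0}
    lock_holder = {}  # item -> transaction holding a long write lock
    undo_log = []     # (txn, item, before_image) records for physical UNDO

    def write(txn, item, value):
        holder = lock_holder.get(item)
        if long_write_locks and holder not in (None, txn):
            return False  # blocked: another transaction holds the lock
        if long_write_locks:
            lock_holder[item] = txn  # held until the transaction ends
        undo_log.append((txn, item, db[item]))
        db[item] = value
        return True

    def abort(txn):
        # Physical UNDO: restore this transaction's before-images, newest first.
        for owner, item, before in reversed(undo_log):
            if owner == txn:
                db[item] = before
        # Release any long locks held by the aborting transaction.
        for item in [i for i, h in lock_holder.items() if h == txn]:
            del lock_holder[item]

    write("T1", "X", 1)
    t2_wrote = write("T2", "X", 2)
    abort("T1")
    if not t2_wrote:
        write("T2", "X", 2)  # T2 retries after T1 has released its lock
    return db["X"]
```

With short write locks only (long_write_locks=False), T1's abort restores X to its pre-T1 value and T2's write disappears. With long write locks, T2 waits until T1 has finished and its update survives.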
Adya, et al. : Generalized Isolation Levels
Gray et al.'s definitions (and the resulting ANSI standards) are not implementation-independent,
and their semantics are ill-defined.
Want implementation-independent semantic isolation levels that are as
permissive as possible (as many schedules allowed as possible).
Key insight: many dependencies are multi-object. Capture those,
and you'll get the right semantics.
Conflicts in Adya's Serialization Graphs:
- Read dependencies (WR):
- Def'n: Ti changes the matches of Tj for Tj's predicate-based
reads if Ti installs a new version that either adds to or deletes from one
of Tj's read predicates.
- Tj directly read depends on Ti if Ti directly installs some
version that Tj subsequently reads (item-read-depends), or if Ti changes
the matches of Tj.
- A way to think about predicate-based reads or phantoms: imagine that
every object is versioned, there are "ghost versions" of objects before they're
born and after they die. Predicate-based reads look at all latest versions
of all objects (including ghosts), and what matters is the set of objects
that do or do not match. See the example on page 7 of the paper for H_{pred-read}.
- Anti-dependencies (RW)
- Def'n: Tj overwrites a predicate-based read by Ti if Tj installs
a new version of an object in the read by Ti that changes the matches of Ti.
- Tj directly anti-depends on Ti if Ti reads an object, and Tj
installs the very next version of that object, or if Tj's install
of any later version changes the matches of a read by Ti.
- Write dependencies (WW)
- Tj directly write-depends on Ti if Ti installs a version of
an object, and Tj installs the next version. (Note there's no predicate-based
version of write dependencies, since database writes are read-predicate/write-tuple).
Direct Serialization Graph:
- nodes are committed xacts
- edges are directed by time, labeled WR, RW, or WW.
Now we can talk about isolation in terms of serialization graphs and "histories"
("schedules"), NOT implementation.
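As a rough illustration, the item-based edges of a direct serialization graph can be derived from a history as follows. This sketch makes several simplifying assumptions that are not in the paper: every transaction in the history commits, version order equals write order, a read sees the latest installed version, each transaction writes an object at most once, and predicate-based reads are ignored. The name build_dsg and the (txn, op, obj) event format are hypothetical.

```python
def build_dsg(history):
    """Return a set of (from_txn, to_txn, label) edges for a committed history.

    history is a list of (txn, op, obj) events in execution order, where op is
    "w" (install a new version of obj) or "r" (read the latest version of obj).
    """
    writers = {}  # obj -> list of writing transactions, in version order
    reads = []    # (reader, obj, index of version read); -1 = initial version
    for txn, op, obj in history:
        versions = writers.setdefault(obj, [])
        if op == "w":
            versions.append(txn)
        elif op == "r":
            reads.append((txn, obj, len(versions) - 1))

    edges = set()
    # WW: Ti installs a version and Tj installs the next one.
    for versions in writers.values():
        for ti, tj in zip(versions, versions[1:]):
            if ti != tj:
                edges.add((ti, tj, "WW"))
    for reader, obj, idx in reads:
        versions = writers[obj]
        # WR: the reader read a version installed by another transaction.
        if idx >= 0 and versions[idx] != reader:
            edges.add((versions[idx], reader, "WR"))
        # RW: another transaction installed the very next version.
        if idx + 1 < len(versions) and versions[idx + 1] != reader:
            edges.add((reader, versions[idx + 1], "RW"))
    return edges
```

Feeding in the unrepeatable-read pattern, [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "r", "X")], produces an RW edge from T1 to T2 and a WR edge from T2 to T1, i.e. a cycle that contains an anti-dependency.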
Adya's Isolation Levels
Try to Generalize Gray's. PL-x = "Portable Level x".
- PL-1: try to serialize based on writes alone (ignore reads) -- ensure
that updates are not interleaved.
- Specifically, no cycles containing only WW edges are allowed.
- Note: more permissive than Gray's Degree 1: allows concurrent
xacts to modify the same object...just ensure no cycles.
- But obvious locking implementation of PL-1: long write locks.
- PL-2: avoid aborted reads
- Specifically, no aborted reads, no intermediate reads,
and no circular information flow (dependency-edge cycles in serialization
graph)
- Note: cascaded aborts and/or commit delays prevent aborted reads
- Note: no intermediate reads means that committed xactions read only
committed data
- More permissive than Degree 2, allows reads from uncommitted
xacts
- Obvious locking implementation: long write locks, short read locks
- PL-3: prevent xactions from committing if they perform inconsistent
reads or writes
- Specifically, do PL-2 AND no anti-dependency cycles
- More permissive than Degree 3, since a modifying xact can update
an object previously read by another uncommitted xact
- Obvious locking implementation: 2PL
- PL-2.99: generalize "REPEATABLE READ"
- REPEATABLE READ was long locks for everything except predicate reads
(phantoms can happen)
- PL-2.99 is PL-2 + no cycles with item-anti-dependency edges
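The cycle conditions above can be checked mechanically on a labeled graph. The sketch below is a simplification, not Adya's full definition: it only looks at cycles, so the "no aborted reads" and "no intermediate reads" parts of PL-2 are assumed to hold, and PL-2.99 is left out because the edge labels here do not distinguish item from predicate anti-dependencies. The names has_cycle and classify are hypothetical.

```python
def has_cycle(edges, allowed_labels):
    """True if the graph restricted to edges with allowed labels has a cycle."""
    graph = {}
    for src, dst, label in edges:
        if label in allowed_labels:
            graph.setdefault(src, set()).add(dst)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # back edge: found a cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def classify(edges):
    """Strongest of PL-1 / PL-2 / PL-3 whose cycle condition the graph meets."""
    if has_cycle(edges, {"WW"}):
        return "below PL-1"  # cycle made of write dependencies only
    if has_cycle(edges, {"WW", "WR"}):
        return "PL-1"        # circular information flow
    if has_cycle(edges, {"WW", "WR", "RW"}):
        return "PL-2"        # some cycle involves an anti-dependency
    return "PL-3"
```

For instance, a graph with an RW edge from T1 to T2 and a WR edge from T2 to T1 (the unrepeatable-read pattern) is classified as PL-2: it has no dependency-only cycle, but it does have a cycle through an anti-dependency edge.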
Modeling Mixed-Mode Systems
- Mixed serialization graph, only contains dependencies relevant to a
transaction's level (or obligatory dependencies required by other transactions'
modes).