Object-Oriented DBMS
Recall our Friend, The Relational Model:
- DB = {relations}
- Relation = {tuples}
- Tuple = {named fields/columns (homogeneous)}
Relational Languages
- SQL for declarative queries (or QBE, Quel, etc.)
- C/SQL or 4GLs for applications
Other relational goodies
- Views vs. Logical vs. physical schemas (data independence)
- Triggers, authorization, constraints
- Simple algebra & query optimization
- Robust systems w/good performance
- Easily parallelizable
Q: Isn't this heaven?
A1: "A relational database is like a garage that forces you to take your
car apart and store the pieces in little drawers"
A2: E/R world, set-valued attributes, variances among entities, & SQL
limitations (expressive power)
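The "garage" complaint in A1 can be made concrete with a small sketch. The Car and Wheel classes and the shred/reassemble helpers below are hypothetical names chosen for illustration: a nested object with a set-valued attribute has to be taken apart into flat rows and then joined back together on a key.

```python
from dataclasses import dataclass, field


@dataclass
class Wheel:
    position: str
    pressure_psi: int


@dataclass
class Car:
    vin: str
    model: str
    wheels: list = field(default_factory=list)  # set-valued attribute


def shred(car):
    """Flatten one Car into rows for two flat relations."""
    car_row = (car.vin, car.model)
    wheel_rows = [(car.vin, w.position, w.pressure_psi) for w in car.wheels]
    return car_row, wheel_rows


def reassemble(car_row, wheel_rows):
    """Rebuild the Car by joining wheel rows back on the VIN key."""
    vin, model = car_row
    wheels = [Wheel(pos, psi) for (v, pos, psi) in wheel_rows if v == vin]
    return Car(vin, model, wheels)


car = Car("VIN123", "Roadster", [Wheel("front-left", 32), Wheel("front-right", 31)])
car_row, wheel_rows = shred(car)          # "store the pieces in little drawers"
rebuilt = reassemble(car_row, wheel_rows)  # put the car back together
```

The application pays for shred/reassemble on every load and store, which is one face of the impedance mismatch discussed under the goals that follow.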
OODBMS Goals
- Shrink the "impedance mismatch" problem for application programmers
  - DB vs. PL type systems
  - Declarative vs. procedural programming
  - Set-at-a-time vs. instance-at-a-time compilation
- Relax data model limitations
  - Atomic values, tuples, sets, arrays, identity
  - Classes w/methods & encapsulation
  - Subtyping/inheritance
  - Composite objects (w/sharing)
  - Versions/configurations (& long xacts)
- New language features
  - Computationally complex methods (e.g. C++)
  - Complex object "queries"
  - Integrated DBPL(s) (sometimes)
- General focus tends to be on CAx, GIS, telecomm, cooperative/collaborative
work, etc.
- Predates "Object-Relational" systems (but coeval with Postgres)
The OODBMS Manifesto (Atkinson/Bancilhon/DeWitt/Dittrich/Maier/Zdonik,
'90)
Thou shalt support:
- Complex objects (tuples, sets, bags, arrays + constructors & ops)
- Object identity (equal is not the same as identical; sharing & updates;
cf. Plutarch's Ship of Theseus)
- Encapsulation (ADTs/info hiding/implementation vs. interface)
- Class/type hierarchies (inheritance, substitution for specialization)
- Late binding (polymorphism, "virtual" functions in C++ terms)
- Computational completeness (methods)
- Extensibility (system & user types are the same)
- Persistence (orthogonal to type)
- Secondary storage (large DBs)
- Concurrency control
- Recovery
- Ad hoc query facility (declarative, optimized)
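The object identity item above (equal vs. identical, sharing & updates) can be sketched in a few lines. This is only an illustration in Python, where `==` plays the role of value equality and `is` the role of identity; the Part class and the engine/chassis/spare containers are hypothetical.

```python
class Part:
    def __init__(self, name, cost):
        self.name = name
        self.cost = cost

    def __eq__(self, other):
        # Value equality: same state, regardless of which object it is.
        return isinstance(other, Part) and (self.name, self.cost) == (other.name, other.cost)


bolt_a = Part("bolt", 5)
bolt_b = Part("bolt", 5)                            # equal to bolt_a, but a distinct object

engine = {"name": "engine", "parts": [bolt_a]}
chassis = {"name": "chassis", "parts": [bolt_a]}    # shares the very same bolt
spare = {"name": "spare", "parts": [bolt_b]}        # holds an equal copy

equal_but_not_identical = (bolt_a == bolt_b) and (bolt_a is not bolt_b)

bolt_a.cost = 7                                     # update through one reference
shared_sees_update = chassis["parts"][0].cost == 7  # sharing: the update is visible
copy_unchanged = spare["parts"][0].cost == 5        # the equal copy is unaffected
```

In a pure value-based model the shared bolt would have to be represented by a key and kept consistent by the application; with identity, sharing and update propagation come for free.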
Thou may support:
- Multiple inheritance
- Type checking (static vs. dynamic up to you)
- Distribution (client/server)
- Long xacts
- Version management
Wide open:
- Programming paradigm
- Type systems details (base + constructors)
- Type system fanciness (e.g. templates, etc.)
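Two of the mandatory items, class hierarchies with substitution and late binding, are easy to show together. A minimal sketch with hypothetical Shape classes: describe() and total_area() call area() without knowing the concrete class, and the method that actually runs is chosen at run time.

```python
class Shape:
    def area(self):
        raise NotImplementedError

    def describe(self):
        # Calls area() without knowing the concrete class: late binding.
        return f"{type(self).__name__} with area {self.area()}"


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    # Specialization: a Square can be used wherever a Rectangle is expected.
    def __init__(self, side):
        super().__init__(side, side)


def total_area(shapes):
    """Works on any mix of Shape subclasses (substitution)."""
    return sum(s.area() for s in shapes)


shapes = [Rectangle(2, 3), Square(4)]
descriptions = [s.describe() for s in shapes]
total = total_area(shapes)
```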
ObjectStore
One of the more successful vendors, both commercially and design-wise. Took
C++ type system & language constructs, added databasey features such as:
- Persistence for objects (at allocation)
- Bulk types (via templates)
- Relationships (i.e. OO referential integrity)
- Query expressions (simple, but optimizable)
- Fancy runtime system with DB goodies like CC&R, client/server,
indexes,…
Some simple DDL examples to see data model, C++ extensions, "query language",
index support.
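To make the relationship and query-expression ideas concrete, here is a minimal sketch in Python. It is not ObjectStore's C++ syntax, and every name in it (Department, Employee, the dept property) is hypothetical. It shows a relationship whose inverse side is maintained automatically, plus a simple predicate-over-collection query.

```python
class Department:
    def __init__(self, name):
        self.name = name
        self.employees = set()   # inverse side of Employee.dept


class Employee:
    def __init__(self, name, salary, dept=None):
        self.name = name
        self.salary = salary
        self._dept = None
        self.dept = dept         # goes through the setter below

    @property
    def dept(self):
        return self._dept

    @dept.setter
    def dept(self, new_dept):
        # Keep both directions of the relationship consistent.
        if self._dept is not None:
            self._dept.employees.discard(self)
        self._dept = new_dept
        if new_dept is not None:
            new_dept.employees.add(self)


toys = Department("toys")
shoes = Department("shoes")
ann = Employee("ann", 120, toys)
bob = Employee("bob", 90, toys)
bob.dept = shoes                 # both employee sets are updated automatically

# A simple "query expression": a predicate applied to a collection.
well_paid = sorted(e.name for e in toys.employees if e.salary > 100)
```

A real system would also want an index on salary so that the predicate does not have to scan the whole collection; the sketch leaves that out.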
Still in business (www.odi.com), supporting C++, Java. Started an XML
product called Excelon, and now Excelon Corp is the "parent" company,
ODI the child.
Some of the Major Research Themes in OODBMSs
- Pointers.
  - Logical pointers (i.e. disk pointers) require a level of indirection
    (hash index over the buffer pool). The level of indirection consumes
    time and space!
  - Physical pointers (i.e. memory pointers) avoid the indirection on
    each dereference, but disk pointers must first be translated into
    memory pointers: pointer swizzling. ObjectStore "fooled"
    VM to get hardware help in swizzling pointers while explicitly managing the
    buffer pool. QuickStore (see White/DeWitt in red book) goes into the
    details of what this kind of pointer swizzling requires.
  - Obviously, there are tradeoffs depending on workload. Also,
    ObjectStore "loses control" of pages in VM, so must do page-level locking
    & physical logging.
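The difference between the two pointer styles can be sketched in software. The toy below is an assumption-laden illustration, not how ObjectStore or QuickStore work (they rely on virtual-memory hardware): ToyStore, Node, and follow are hypothetical names, references start out as OIDs that go through a lookup table, and each reference is overwritten with a direct object reference the first time it is followed (lazy swizzling).

```python
class Node:
    def __init__(self, value, next_ref):
        self.value = value
        self.next_ref = next_ref  # an OID (int) until swizzled, then a Node


class ToyStore:
    """Hypothetical toy store: 'disk' maps OIDs to records."""

    def __init__(self, disk):
        self.disk = disk          # oid -> {"value": ..., "next": oid or None}
        self.resident = {}        # oid -> in-memory Node (the indirection table)
        self.lookups = 0          # counts trips through the table

    def fetch(self, oid):
        self.lookups += 1
        if oid not in self.resident:
            rec = self.disk[oid]
            self.resident[oid] = Node(rec["value"], rec["next"])
        return self.resident[oid]


def follow(store, node):
    """Return the next node, swizzling the reference on first use."""
    ref = node.next_ref
    if ref is None or isinstance(ref, Node):
        return ref                # already a direct memory reference
    target = store.fetch(ref)     # pay the table lookup once
    node.next_ref = target        # swizzle: overwrite the OID with the object
    return target


def traverse(store, head):
    values, node = [], head
    while node is not None:
        values.append(node.value)
        node = follow(store, node)
    return values


disk = {
    1: {"value": "a", "next": 2},
    2: {"value": "b", "next": 3},
    3: {"value": "c", "next": None},
}
store = ToyStore(disk)
head = store.fetch(1)
first = traverse(store, head)
lookups_after_first = store.lookups
second = traverse(store, head)    # every reference is now swizzled
lookups_after_second = store.lookups
```

The second traversal never touches the table. The sketch ignores the hard parts: unswizzling before a page is written back, and what happens to swizzled references when their target is evicted.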
- Clustering.
  - Objects are connected in a graph structure via pointers. Programs
    navigate this graph. How should you lay things out on disk to get good
    locality? (See Tsangaris/Naughton SIGMOD '92 for a survey and a
    graph-partitioning scheme.) Lots of rediscovery in file systems and web settings!
    Clustering is a classic problem, of course, but the disk version is treated
    first in OODBMSs.
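A small sketch of why layout matters. This is a simple depth-first placement heuristic, not the Tsangaris/Naughton graph-partitioning scheme, and the object graph, page size, and function names are made up for illustration. It counts how many pointers cross a page boundary under two layouts.

```python
def assign_pages(order, page_size):
    """Place objects on pages in the given order; returns obj -> page number."""
    return {obj: i // page_size for i, obj in enumerate(order)}


def crossing_edges(edges, placement):
    """Count pointers whose source and target live on different pages."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


def dfs_order(root, children):
    """Depth-first order, so objects navigated together are stored together."""
    order, stack, seen = [], [root], set()
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        stack.extend(reversed(children.get(obj, [])))
    return order


# Two assemblies under a root, each with two parts.
children = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
edges = [(parent, child) for parent, kids in children.items() for child in kids]

arrival = ["root", "a1", "b1", "A", "b2", "a2", "B"]   # arbitrary creation order
clustered = dfs_order("root", children)

naive_cost = crossing_edges(edges, assign_pages(arrival, page_size=3))
clustered_cost = crossing_edges(edges, assign_pages(clustered, page_size=3))
```

With three objects per page, the creation-order layout leaves 5 of the 6 pointers crossing pages, while the depth-first layout leaves 3. A workload-aware scheme would weight edges by how often they are actually traversed.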
- Client/Server Caching & Prefetching
  - Typically OODBMSs were to be used in a client/server environment,
    where the client would operate on a portion of the database (e.g. a piece
    of a VLSI diagram). Would like to do intelligent client/server caching.
  - Need to pay attention to transactions in this environment!
  - Page-shipping vs. object-shipping
  - Franklin et al. did good work in this area, including an excellent
    survey we'll see soon.
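The page-shipping vs. object-shipping tradeoff can be sketched by counting round trips. The Server class and the two run_* functions are hypothetical, and the sketch deliberately ignores transactions, cache consistency, and cache size limits.

```python
class Server:
    def __init__(self, pages):
        self.pages = pages        # page_id -> {oid: value}
        self.page_of = {oid: pid for pid, objs in pages.items() for oid in objs}


def run_object_shipping(server, accesses):
    """Client asks for one object per miss; returns number of round trips."""
    cache, round_trips = {}, 0
    for oid in accesses:
        if oid not in cache:
            round_trips += 1
            cache[oid] = server.pages[server.page_of[oid]][oid]
    return round_trips


def run_page_shipping(server, accesses):
    """Client asks for the whole page on a miss; co-located objects come along."""
    cache, round_trips = {}, 0
    for oid in accesses:
        if oid not in cache:
            round_trips += 1
            cache.update(server.pages[server.page_of[oid]])
    return round_trips


server = Server({0: {"a": 1, "b": 2, "c": 3}, 1: {"d": 4, "e": 5, "f": 6}})
clustered_accesses = ["a", "b", "c", "a"]   # all on page 0
scattered_accesses = ["a", "d"]             # one object from each page

obj_clustered = run_object_shipping(server, clustered_accesses)
page_clustered = run_page_shipping(server, clustered_accesses)
obj_scattered = run_object_shipping(server, scattered_accesses)
page_scattered = run_page_shipping(server, scattered_accesses)
```

When accesses follow the clustering, page shipping needs one round trip instead of three. When they do not, both need two round trips, but page shipping moves six objects to use two. This is why the choice interacts with the clustering question above.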
- Indexing
  - Over class hierarchies
  - Over path expressions
  - Some nice tradeoff papers here, nothing especially surprising.
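One of the design points for class-hierarchy indexing is a single index shared by a class and all of its subclasses, with each entry tagged by class. The sketch below illustrates that one option only; Vehicle/Car/Truck and HierarchyIndex are hypothetical names, and a dict stands in for a B-tree.

```python
from collections import defaultdict


class Vehicle:
    def __init__(self, oid, weight):
        self.oid, self.weight = oid, weight


class Car(Vehicle):
    pass


class Truck(Vehicle):
    pass


class HierarchyIndex:
    """One index on an attribute, shared by a class and all its subclasses."""

    def __init__(self, attr):
        self.attr = attr
        self.entries = defaultdict(list)   # key -> [(class, oid)]

    def insert(self, obj):
        self.entries[getattr(obj, self.attr)].append((type(obj), obj.oid))

    def lookup(self, key, cls):
        # OIDs of instances of cls, or of any subclass of cls, with this key.
        return [oid for c, oid in self.entries.get(key, []) if issubclass(c, cls)]


index = HierarchyIndex("weight")
for obj in [Car(1, 1000), Truck(2, 1000), Vehicle(3, 1000), Car(4, 900)]:
    index.insert(obj)

all_vehicles_1000 = index.lookup(1000, Vehicle)
cars_1000 = index.lookup(1000, Car)
trucks_900 = index.lookup(900, Truck)
```

The alternative is one index per class: queries on a single class touch less data, but a query on Vehicle then has to probe every subclass index. That is the kind of tradeoff the papers explore.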
- Query Processing and Optimization
  - Declarative languages (OQL)
  - Algebras that include support for complex objects (e.g. Nest/Unnest,
    other complex type constructors)
  - Path expressions
  - Extensible optimizer generators (Graefe's work on Exodus, Volcano,
    Cascades)
  - O2 is the only "real" system to explore this, and the basis of much
    of the research (which tends to be rather fussy)
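The Nest/Unnest operators mentioned above are simple to sketch over rows represented as Python dicts. The function names and sample data are illustrative, and the other attribute values are assumed to be hashable.

```python
def unnest(rows, attr):
    """Flatten a set-valued attribute: one output row per element."""
    out = []
    for row in rows:
        for element in row[attr]:
            flat = dict(row)
            flat[attr] = element
            out.append(flat)
    return out


def nest(rows, attr):
    """Group rows that agree on all other attributes; collect attr into a list."""
    groups = {}
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, []).append(row[attr])
    return [dict(key, **{attr: values}) for key, values in groups.items()]


nested = [
    {"dept": "toys", "emps": ["ann", "bob"]},
    {"dept": "shoes", "emps": ["cy"]},
]
flat = unnest(nested, "emps")
renested = nest(flat, "emps")

# Unnest drops rows whose set is empty, so nest(unnest(r)) != r in general.
lost = unnest([{"dept": "empty", "emps": []}], "emps")
```

The last line hints at why the algebra is fussier than the relational one: Nest and Unnest are not inverses in general, so rewrite rules need side conditions.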
- Schema Evolution, Bulk Loading
  - How to make this possible and efficient?
- Benchmarking
  - Wisconsin's OO7 is the de facto benchmark (pretty academic, though!)
Takeaways
- Don't compete with relational (and its evolution) for the storage/query
market. It's too big a task, and customers are too conservative in this
space. Evolvability outweighs performance for the bulk of the market.
The rest of the market rolls its own solution. (XML database vendors
beware!)
- A bunch of nice research was done in the OODB space. Some of
it is relevant for ORDBMS (e.g. clustering, indexing, query processing/opt).
Some of it is relevant in other environments as well -- e.g. pointer swizzling
is a classic problem, transactional caching may become important on the web,
etc.
- The XML and "semi-structured" research has done little systems work
that doesn't look just like OODB or relational work. (We may have a
peek later in the semester, or see CS286)
- A well-educated DB researcher should know about the bag of tricks here.