325 notes-3
DBMS architecture overview
DBMS's can be broken up into several distinct architectural classes. The first three are well-defined. After that things are a little fuzzy. The first three classes are
Database management systems evolved in the 60's and early 70's primarily to handle standard business type applications such as various accounting functions (accounts payable, general ledger, payroll, etc.), inventory control, automated banking, reservation systems, and the like. These types of applications have four requirements that characterize database systems:
Hierarchical and network systems, considered first generation database systems, were largely successful in these respects. They essentially solved the file problems that plagued data processing systems before the advent of DBMS's. However first generation systems had some fundamental disadvantages. They were very efficient for processing applications that used predetermined access paths (large scale transaction processing) but could not easily handle ad-hoc type queries, e.g. "How many employees in the widget department are due to retire in the next three years?" To answer a query such as this, an application programmer skilled in performing disk-oriented optimization would have to write a procedural program to navigate through the database. The time frame for such an activity might be anywhere from 1 day to 6 months. (In contrast, a skilled SQL user could likely answer the question within minutes, using a relational database.) First generation systems did not have as much data independence as most would find desirable. That is, structural changes to the database to accommodate one application generally required all application programs to be recompiled at the very least, and might well cause some of them to be rewritten if access paths were altered in any way. Essentially for these reasons (among others, including the ease of use of relational systems), there are no longer very many implementations of first generation systems. Let's briefly look at the first generation systems in a little more detail. The downloadable appendices of the text have a more complete description of hierarchical and network databases.
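To make the contrast concrete, here is a minimal sketch of how the retirement question above might be answered declaratively. It uses Python's built-in sqlite3 module as a stand-in for a relational DBMS. The table names, column names, sample rows, and the `count_retiring` helper are all assumptions made up for illustration; none of them come from the text.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id INTEGER REFERENCES department(dept_id),
        retirement_date TEXT  -- stored as an ISO date string (YYYY-MM-DD)
    );
    INSERT INTO department VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO employee VALUES
        (10, 'Adams', 1, '2010-03-01'),
        (11, 'Baker', 1, '2011-12-15'),
        (12, 'Chen',  1, '2020-01-01'),
        (13, 'Diaz',  2, '2010-05-01');
""")


def count_retiring(conn, dept_name, today, years):
    """Count employees in dept_name who retire within `years` years of `today`."""
    sql = """
        SELECT COUNT(*)
        FROM employee e
        JOIN department d ON e.dept_id = d.dept_id
        WHERE d.dept_name = ?
          AND e.retirement_date >= ?
          AND e.retirement_date < date(?, ?)
    """
    row = conn.execute(sql, (dept_name, today, today, f"+{years} years")).fetchone()
    return row[0]


retiring = count_retiring(conn, "Widget", "2009-07-13", 3)
print(retiring)  # 2 (Adams and Baker)
```

Notice that the query says what is wanted and nothing about how to find it. There is no mention of access paths, pointers, or file layout.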
Hierarchical databases: This approach is based on representing data in a tree structure. Information Management Systems (IMS) from IBM was the most widely used such system. It was developed in the 60's, and there may still be a few IMS installations in use today. At one time it was the most widely used DBMS in the world.
Network databases: The term "Network" should not be confused with the distributed databases in use today, many (if not most) of which use the internet to communicate among geographically distributed sites. In this context, the term "Network" referred to a data structure similar to a tree structure, except that a node could have any number of "parents". Basically, a network database allowed any record type to be associated with any other record type. In a full network model a record type can have any number of different parent types and any number of different child types. This meant that a conceptual schema could be directly modeled with very few design compromises. The main difficulty with the network approach was its complexity. It required programming users to be aware of and understand the underlying data structure, which was likely based on a sophisticated, complex model. Many of the programmers, perhaps even a majority, were not up to it. A network database solved file problems, but application development was slow in part because the typical programmer could not cope with the complexity of network databases. In addition, query facilities were primitive at best, so the only way a non-programmer could use the database was through application programs written by the programmers. Relationships between record types were almost exclusively implemented via linked lists. To get information from a first generation database it was frequently necessary to "navigate" (hence the term "navigational databases") through a complex series of linked lists, being explicitly aware of pointer fields and which record types were "pointing to" which other record types. It took very careful programming to avoid "getting lost", i.e. being unable to get back to a specific point in the database. (I speak from experience.)
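The flavor of navigational access can be suggested with a small Python sketch. This is only an analogy under stated assumptions: the `Department` and `Employee` record types, their pointer fields, and the helper functions are hypothetical, and no real first generation DBMS exposed this exact interface. The point is that the program itself must follow the pointers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    name: str
    next_in_dept: Optional["Employee"] = None  # pointer to the next member record


@dataclass
class Department:
    name: str
    first_employee: Optional[Employee] = None  # pointer to the head of the member chain
    next_dept: Optional["Department"] = None   # pointer to the next owner record


def add_employee(dept, emp):
    """Link emp in at the head of the department's member chain."""
    emp.next_in_dept = dept.first_employee
    dept.first_employee = emp


def employees_in(first_dept, dept_name):
    """Walk the owner chain, then the member chain, following pointers by hand."""
    dept = first_dept
    while dept is not None and dept.name != dept_name:
        dept = dept.next_dept
    names = []
    emp = dept.first_employee if dept is not None else None
    while emp is not None:
        names.append(emp.name)
        emp = emp.next_in_dept
    return names


gadget = Department("Gadget")
widget = Department("Widget", next_dept=gadget)
add_employee(widget, Employee("Adams"))
add_employee(widget, Employee("Baker"))
add_employee(gadget, Employee("Diaz"))

print(employees_in(widget, "Widget"))  # ['Baker', 'Adams']
```

Even in this toy version the programmer has to know which record type points to which, and a change to the pointer layout would mean rewriting `employees_in`. That is the data independence problem in miniature.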
Relational database systems (RDBMS) are generally considered second generation systems. They were pioneered by E. F. Codd in the early 70's, and offered a fundamentally different approach to data storage, at least conceptually, if not in fact. (I strongly suggest you read the vignette on page 3 of the text. It gives a very brief capsule of the development of relational databases in general and Oracle in particular. Oracle is the most widely used database software in the world and the system we will be using in this course.) RDBMS's have some limitations. For example, data types are limited to standard types (integer, floating point, character, date, and Boolean). There are no composite types, no collection types, and in general no user defined data types. There is a broader class of applications, however, for which RDBMS's are inadequate. These applications include computer aided design (CAD), computer aided software engineering (CASE), and applications which can make use of hypertext, graphics, video, sound, and perhaps other complex data types. Consider, for example, a publishing application for a newspaper layout. This requires storing text segments, graphics, icons, and many other kinds of data elements found in most hypertext environments. Supporting such data elements is usually difficult in second generation systems. A new type, a third generation system (or more appropriately a "next-generation" system, since there is more than one candidate for whatever the "third generation" would be), is needed for this type of application. The third generation system needs to be able to handle routine business data processing also. In the newspaper example, this would include handling classified and other advertisements. The text of the ad would have to be stored, along with the rate, the number of days the ad will run, the billing address of the customer, etc. Handling of the ads requires normal business transaction processing. In addition, rules might be needed. 
For example, a rule might say that competing businesses do not have their ads placed on the same page or facing pages. Of course the rule could be built into the application program which lays out the paper, but it would be better to have the rules stored within the database so that they would not be lost if the application program is changed or modified. Generally speaking, features that are built into the database are more stable than features which must be supplied by application programs. The most likely candidates for a third generation DBMS are Object-Oriented Database Management Systems (OODBMS) and/or Object-Relational Database Management Systems (ORDBMS). The latter are sometimes referred to as Extended Relational Database Systems. Chapter 2 section 2.5.5 briefly discusses the OO model, and appendix G goes into more detail. I am not sure if we will ever get to the point of having "state-of-the-art" databases which fall into the category of OODBMS. At the present time OODBMS's do exist, but they are immature and do not follow any set standard. The Object Oriented database model (OODM) seemed poised (at least according to theorists) to dislodge the Relational Data Model (RDM) in the face of increasingly complex data that included video and audio, yet the OODM fell short in the database arena. However, the OODM's basic concepts have become the basis of a wide variety of database systems analysis and design procedures. In addition, the basic OO approach has been adopted by many application generators and other development tools.
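Returning to the advertising rule for a moment, here is a minimal sketch of what "storing the rule in the database" could look like, using a trigger in SQLite through Python's sqlite3 module. Everything specific is an assumption for illustration: the `advertiser` and `placement` tables, the idea that competitors are advertisers sharing a `category`, and the convention that an even page n faces page n+1 (so two pages are on the same spread when integer division by 2 gives the same result).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: competitors are advertisers with the same category.
    CREATE TABLE advertiser (
        advertiser_id INTEGER PRIMARY KEY,
        name TEXT,
        category TEXT
    );
    CREATE TABLE placement (
        placement_id INTEGER PRIMARY KEY,
        advertiser_id INTEGER REFERENCES advertiser(advertiser_id),
        page INTEGER
    );

    -- The rule lives in the database, not in the layout program.
    -- Assumption: even page n faces page n+1, so page / 2 identifies a spread.
    CREATE TRIGGER no_competing_ads
    BEFORE INSERT ON placement
    BEGIN
        SELECT RAISE(ABORT, 'competing ad on same or facing page')
        WHERE EXISTS (
            SELECT 1
            FROM placement p
            JOIN advertiser a_old ON a_old.advertiser_id = p.advertiser_id
            JOIN advertiser a_new ON a_new.advertiser_id = NEW.advertiser_id
            WHERE a_old.category = a_new.category
              AND a_old.advertiser_id <> a_new.advertiser_id
              AND p.page / 2 = NEW.page / 2
        );
    END;

    INSERT INTO advertiser VALUES
        (1, 'Ace Motors', 'cars'),
        (2, 'Best Autos', 'cars'),
        (3, 'Corner Bakery', 'bakery');
""")


def place_ad(conn, advertiser_id, page):
    """Try to place an ad; return True if the database accepted the row."""
    try:
        conn.execute(
            "INSERT INTO placement (advertiser_id, page) VALUES (?, ?)",
            (advertiser_id, page),
        )
        return True
    except sqlite3.Error:
        return False


results = [
    place_ad(conn, 1, 4),  # first car dealer on page 4
    place_ad(conn, 2, 5),  # rival dealer on facing page 5: rejected by the trigger
    place_ad(conn, 3, 5),  # bakery on page 5: not a competitor
    place_ad(conn, 2, 6),  # rival dealer on the next spread: allowed
]
print(results)  # [True, False, True, True]
```

Any program that inserts into `placement` is subject to the rule, including one written years later by someone who never heard of it. That is the stability argument made above.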
The OODM's inability to replace the RDM is due to several factors. First, the large installed base of RDM-based databases is difficult to overcome. Change is often complex and expensive, so the prime requisite for change is an overwhelming advantage of the change agent. The OODM advantages were simply not accepted as overwhelming and were, therefore, not accepted as cost-effective. Second, compared to the RDM, the OODM's design, implementation, and management learning curves are much steeper than the RDM's. (Recall that ease of use was a strong factor in having relational systems take over from first generation systems.) Third, the RDM preempted the OODM in some important respects by adopting many of the OODM's best features, thus becoming the extended relational data model (ERDM). Because the ERDM retains the basic modeling simplicity of the RDM while being able to handle the complex data environment that was supposed to be the OODM's forte, you can have the proverbial cake and eat it, too. The OODM-ERDM battle for dominance in the database marketplace seems remarkably similar to the one waged by the hierarchical and network models against the relational model almost three decades ago. The OODM and ERDM are similar in the sense that each attempts to address the demand for more semantic information to be incorporated into the model. However, the OODM and the ERDM differ substantially both in underlying philosophy and in the nature of the problem to be addressed. Although the ERDM includes a strong semantic component, it is primarily based on the relational data model's concepts. In contrast, the OODM is wholly based on the OO and semantic data model concepts. The ERDM is primarily geared to business applications, while the OODM tends to focus on very specialized engineering and scientific applications. In the database arena, the most likely scenario appears to be an ever-increasing merging of OO and relational data model concepts and procedures. 
Oracle 11 is an ERDM. It is not truly OO (at least to the purist), but it has many OO features, including the two most important, abstract data types and type inheritance. Future versions will undoubtedly have more. In terms of data types Oracle 11 has large object types (CLOB, character large object, and BLOB, binary large object), reference types so that a field within a record can explicitly reference another record (note that this puts some navigational aspects, which were a major problem with first generation systems, back into the database arena), XML types, various media types (sound, video, etc.), nested table types, variable array types, and user defined data types which allow the user to define composite types and/or include methods (built in procedures) within a user defined data type. We will be using Oracle as if it is a straight relational database, but the OO features are there for "power users".
As was just noted, the OO model tends to be complex, which was one of the problems with first generation systems. There is no question that the object features of Oracle 11, while very powerful, are much more complex and difficult to use than if the DBMS is treated as a strictly relational database. Perhaps today's users, being better educated and more technologically sophisticated than users were 30 years ago, are ready for the complexities of the OO model.
Relational databases: In chapter 3 we will get to the details of the relational model, but at this point we will expand a little more on the overview. RDBMS's are the current state of the art in the sense that most of the time when an organization gets new DBMS software, the software would be based on the relational model (or these days an extended-relational model, but with Oracle at least, it can be treated as if it is purely a relational database). The products are mature, and because they are based on a single standard they tend to be very similar. That makes it easy to share data among systems with different software, and to move from one piece of software to another. The name, relational, comes from the fact that the underlying theory is based upon the mathematical theory of relations (which may have made practitioners think it is more complicated than it is). Think of a relation as a file in the form of a 2-dimensional table, where each row represents a record. The number of columns (fields) is fixed, but the number of rows is indefinite. No two rows should be precisely the same. (See examples on page 39.) The driving force of the relational model is simplicity. To that end there are two crucial features:
Relational databases were proposed in theory in the early 70's, but did not become commercially viable until the mid 80's. On a large mainframe with huge files, processing could still be fairly slow. The optimizer (part of every large scale relational database system) may choose to do an exhaustive search of a particular table if there were no other way. A designer can design in indices, hash fields, in some cases link fields, etc. which the optimizer would take advantage of. The SQL program would be the same no matter what. It is not necessary for the programmer to understand the underlying data structure. Perhaps the biggest advantage of the relational approach is its simplicity (at least at the user level). To some extent this is a disadvantage as well because it makes it easy to create poorly designed databases, which can be very frustrating to use.
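The claim that the SQL stays the same while the optimizer picks the access path can be tried out on a small scale. The sketch below uses SQLite (through Python's sqlite3 module) as a stand-in for a large scale RDBMS, which is an assumption; the `employee` table, its contents, and the index name `idx_employee_dept` are made up for illustration. The exact wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
# 1000 hypothetical employees; every tenth one works in the Widget department.
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [(f"emp{i}", "Widget" if i % 10 == 0 else "Other") for i in range(1000)],
)

# The query text never changes, whatever the designer does to the storage.
QUERY = "SELECT COUNT(*) FROM employee WHERE dept = ?"


def run(conn):
    """Return (answer, plan text) for the unchanged query."""
    answer = conn.execute(QUERY, ("Widget",)).fetchone()[0]
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Widget",)).fetchall()
    # The last column of each plan row is the human-readable detail string.
    plan = "; ".join(row[-1] for row in plan_rows)
    return answer, plan


answer_before, plan_before = run(conn)

# The designer adds an index; no application SQL is touched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

answer_after, plan_after = run(conn)

print(answer_before, "|", plan_before)  # plan is a full scan of the table
print(answer_after, "|", plan_after)    # plan now mentions idx_employee_dept
```

The answer is the same both times. Only the plan changes, and the person who wrote the query never has to know.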
Last update: Monday, July 13, 2009 at 2:39:56 PM.