| Home Membership
|
325 notes-7Transaction Management
A transaction is a real world event which will cause the database contents (not the structure) to be altered. As far as the database is concerned, a transaction must be a logical unit of work, i.e. it must take the database from one consistent state to another. Consider the following classic example (which happens thousands of times daily in real life): $100 is transferred from account A to account B. This is a very simple situation. It requires two update commands to be sent to the database.
Let's consider a slightly more extensive example. The real world event (which happens millions of times a day) is that a customer buys a product and charges it to his/her account. The salesman making the sale gets a commission. Even in the simplest possible case we must update at least four tables: The customer table to change the balance due field, the inventory table to reflect the number and value of the items in stock, the accounts receivable table to keep the assets of the organization current, and the employee table to add in the commission for the salesman. Other updates which might or might not happen now (but will certainly happen at some point because of an event such as this one) could involve shipping, billing, reordering for inventory, etc. For the database to accurately reflect the assets of the organization all the updates related to the event must be done, or none of them. A transaction basically consists of all the commands which update all the relevant tables. If only some of them are executed, the database would be in an inconsistent state.
There is a specific example in the text, very similar to the one described above, on page 415. It uses the database described on page 413.
The standard SQL commands which determine the end of a transaction are COMMIT and ROLLBACK. A new transaction is started automatically when a program is initiated (which could be the start of an sqlplus session), or when the previous transaction ends. There is also an optional command called SET TRANSACTION (which we will discuss shortly) which allows you to establish certain parameters about the transaction. ROLLBACK ends the current transaction and returns the database to the state it was in when the transaction started. It rolls back the updates performed within the transaction. COMMIT ends the transaction and makes permanent any updates that were performed within the transaction. There is an implicit commit if the program (which could be an embedded SQL program, an sqlplus session, the underlying code for a user front end, etc.) ends smoothly or if there is a DDL command within the program. If you are making numerous changes to the database, you do not want to have to redo them all in case of a problem. You can avoid this by issuing explicit COMMIT commands whenever you have a logically complete transaction. Implicit COMMIT commands are generated by a smooth exit, or by a DDL command. Any transaction can be killed by the user with an explicit ROLLBACK command. You might want to do this if you noted an earlier mistake, or if the situation changed in the middle of a transaction. For example, suppose you are taking on-line orders. A customer ordered something, you updated the inventory, and the customer changed his/her mind. If a transaction is particularly long, and only some of the commands need to be redone (e.g. a customer order with numerous possible products), oracle has a feature which allows rollback to a previously defined part of the transaction called a savepoint. The syntax would be ROLLBACK TO SAVEPOINT sav_1sav_1 is a user defined name placed in the midst of the transaction.
Transaction management and control is relatively straightforward with a single user database. The real problems arise with concurrent users. Consider the following scenario: Two transactions are called serializable if running them concurrently produces the same results as if they run in some order where one is completed before the other starts. That way the database will be in a consistent state. One of the main goals of transaction management is to have all transactions have the property of serializability.
Simultaneous update is a real problem, but it can be (and usually is) solved. The most common way to guard against it is to use file locking. The general idea of file locking is the following: You need a file for a potential update situation, and there are other users who might also access the file. You place a lock on the file, i.e. prevent any other user from accessing the file. Then when you are done with the file, you release the lock. This will prevent the simultaneous update problem, but it leads to some potential problems of its own.
Deadlock is a serious and difficult problem. A full discussion of it is well beyond the scope of this course (take CS 326, Operating Systems), but we will consider it a little. One relatively common way to handle potential deadlock situations is for the OS to maintain a "wait for" table to see which processes are waiting for which resources. Examination of the table looking for "cycles" (e.g. A holds X waiting for Y, B holds Y waiting for Z, C holds Z waiting for X) will usually reveal potential or actual deadlocks. Since a process can hold several resources and be waiting for several, the actual cycles are not that easy to pick up. When the OS discovers a potential deadlock situation, it will rollback one of the processes, freeing the resources held by that process. There is often a priority scheme to determine which process gets rolled back. The deadlock problem can be greatly alleviated by locking at the record level rather than the file level. It is likely that two processes will contend for the same file at the same time, but unlikely that they would contend for the same record. The difficulty is that the overhead for record locking is very high, and response time could easily become totally unacceptable. A more practical alternative, which is a compromise between the two, is to lock at the page (block) level. A page, by definition, is the amount of data read into memory with an ordinary read operation. It will usually contain several records, perhaps 20 - 200, but nowhere near the number of records in a file. In addition to the granularity (record, page, file level, etc.) of the locks, there are also different types of locks. The most common are the following:
Oracle does have a specific LOCK TABLE command.
SELECT EMP_NAME, SALARY, ISSUE_DATE, ORDER_VALUE FROM WH_EMPLOYEE E, WH_PUR_ORDER P WHERE E.EMPNUMB = P.EMPNUMB AND WHNUMB = 'wh80' FOR UPDATE;will lock the relevant rows of the employee and purchase order tables until the transaction is ended. If you only wish to update the salary field, you could make the last line FOR UPDATE OF SALARY. Then only the rows of the employee table would be locked. As this handout shows, when the SELECT....FOR UPDATE is used, we apparently do get row level locking. Another mechanism a user has for transaction management is the SET TRANSACTION command. Such a command must be the first command within its transaction. One example of such is the command: SET TRANSACTION READ ONLY; This causes all subsequent queries within the transaction to see only changes that were committed before the transaction started. Therefore you would see only consistent data (which might, however, be somewhat obsolete). You would be prohibited from doing updates within that transaction. Another example is the following command: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; Suppose a DML command within transaction T1 attempts to update an item that was updated by another transaction, say T2, and T2 was uncommitted at the start of T1, then the DML command will fail. One advantage to this is that without this SET TRANSACTION statement, an attempt to update an item that has been locked will cause your session to wait until the lock is released, so that in effect the transaction will hang. In many cases it is better to have a command fail and to know why than to have it hang without an obvious reason.
Although locking is certainly a widely used method of transaction control, it is not the only one. One methodology for avoiding concurrent updates which does not use locking is illustrated by the following handout:
Real-time Transaction Processing In some applications transactions are associated with real-time constraints, typically in the form of deadlines, i.e. the transaction must be completed by some specified time, or its value decreases. There are actually two types of real-time transactions. The application system is referred to as a hard real-time system if the value of the transaction becomes 0 if it is not completed on time. Examples might be a radar surveillance system for an airport or a patient monitoring system for a hospital. The system is referred to as a soft real-time system if the value of the transaction diminishes if it is not completed on time, but does not necessarily become zero. A computerized stock buying system might fall into such a category. Another example might be taking customer orders online. If the customer has to wait too long, he/she might hang up. We will not distinguish between the two for this discussion. In a real-time system transaction scheduling is almost always based on a priority system. Each transaction is assigned a priority. The priority can be assigned in a number of ways. The simplest way is to give the highest priority to the transaction with the closest completion deadline. In many applications it is possible to predict the execution time of a transaction. In that case a priority procedure could be to assign the highest priority to the transaction with the least amount of slack with respect to its completion time. Priority is used for CPU scheduling. These days many systems, even desktop systems, have multiple CPU's but the number of potential transactions usually far exceeds the number of CPU's, so the problems we are about to discuss remain the same, though more complicated in such situations. For simplicity, therefore, we will assume a single CPU. If transaction T1 arrives while T2 is executing, and T1 has a higher priority than T2 does, T2 is suspended and T1 assumes control of the CPU. A problem, known as the priority inversion problem, occurs when a higher priority item must wait for a lower priority item. This would occur if T2 holds a lock on a resource that T1 needs. In that case T1 must wait for T2 to complete. There are two ways that this delay can be long enough to cause T1 to miss its deadline. The first way is called data resource blocking. Suppose T1 needs an exclusive update lock on item A, an item for which T2 holds a shared read lock. T1 must wait for T2 to complete. But before T2 completes T3 comes along and requests a shared read lock on A. T2 completes, but T3 is still going, so T1 must now wait for T3. This could continue until it is too late for T1 to complete on time. The second way is called CPU resource blocking. In this case suppose T4 comes along with a higher priority than T2, but lower than T1. T4 has no lock conflicts with T2. Therefore T2 is suspended and T4 executes. T1 cannot preempt T4 because it still cannot get the resource held by T2. T1 must wait for both T2 and T4 to finish before it can start. Since this can happen more than once, it is possible for T1 to be blocked until it is too late to beat its deadline. There are potential solutions to the priority inversion problem, but they can cause problems themselves. One "solution" is called Priority Inheritance (PI). Under PI, when priority inversion occurs, the low priority transaction holding the lock (T2 in our example) would inherit the priority of the highest priority transaction waiting for the locked resource. In this case T2 would inherit T1's priority. With a higher priority, T2 might run faster and release the lock quicker. Also, when T4 comes along, it would not be able to preempt T2. When T2 completed and released the lock, T1 would have priority over T4 and could begin processing. PI eliminates the CPU resource blocking problem. PI does not affect data resource blocking, however. Also, raising T2's priority may affect other concurrent transactions, slowing down their completion, and possibly causing them to miss their deadlines. A second possible solution to the priority inversion problem is called Priority Abort (PA). In this case when a high priority transaction conflicts with a low priority item over a lock, if the low priority item has already started, it is aborted (rolled back). With this scheme the priority inversion problem goes away completely. A low priority item will never be able to block a high priority item. This is desirable for real-time transaction scheduling, but it has a negative side effect for the system. There could be a high system abort rate, which wastes system resources. Also consider the following scenario: Suppose that priority is based solely on the deadline, i.e. the transactions with the closest deadline get the highest priority. Suppose T1 takes 5 units to complete and has a deadline of 18. T2 takes 9 units to complete, and has a deadline of 20. Suppose T2 starts executing at time 0, and T1 arrives at time 8. Then under PA, since T1 has a higher priority, T2 is aborted. T1 starts at 8 and finishes at 13. T2 can start at time 13, but cannot finish until time 22, so misses its deadline.
The trade off between PI and PA is the potential long blocking of PI versus the high conflict abort rate of PA. A potential compromise, advocated by some, is called Conditional Priority Inheritance. A threshold value of h is determined. If the difference between T2's total execution time and the remaining time is greater than h, abort T2. If it is not T2 inherits T1's priority. The value of an appropriate h is not clear, and research is still trying to determine it. In the above scenario, if h = 2, T2 would be allowed to go to completion at time 9, and T1 would start at 9 and finish at 14, allowing both transactions to meet their deadline, and not wasting system resources.
Distributed Database Management Systems
A DDBMS is different from the situation of a centralized database with remote access from different locations. A DDBMS consists of a collection of sites, connected via some kind of a communications network. There are two basic conditions which must be satisfied for a setup to be considered a DDBMS:
The DDBMS can be regarded as a kind of partnership among the local DBMS's. Each local DBMS would have software, possibly an extension of the local DBMS, to provide the partnership functions. The combination of this new component, together with the existing DBMS's constitutes the DDBMS. Advantages of a DDBMS:
Disadvantages of a DDBMS:
C. J. Date, one of the big names in database theory, gives what is generally regarded as the fundamental principle of DDBMS: To the user, a distributed database should look exactly like a non-distributed system. This principle led to Date's 12 objectives of DDBMS. These are available in your text on pages 506 - 507. Date proposed these more than 20 years ago, but most database people still buy into them. The following quote in the book I find interesting. "Although no current DDBMS conforms to all of them, they constitute a useful target."
For the previous set of notes click on 325 notes-6. For the next set of notes click on 325 notes-8.
|
|||
|
Last update: Wednesday, August 12, 2009 at 11:23:46 AM. |
||||