| 1. |
Discuss The Advantages And Disadvantages Of The Two Methods That We Presented In Section 19.5.2 For Generating Globally Unique Time Stamps? |
|
Answer» The centralized approach has the problem of a possible bottleneck at the central site, and the problem of electing a new central site if it goes down. The distributed approach has the problem that many messages must be exchanged to keep the system fair, or else one site can get ahead of all the other sites and dominate the database. |
|
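The distributed method referred to above is commonly implemented by pairing a local logical counter with a unique site identifier, with messages exchanged so that a slow site does not fall behind. A hedged Python sketch (the class, method names, and the `observe` fairness mechanism are illustrative, not from the text):

```python
# Illustrative sketch: each site pairs a local logical counter with its
# unique site id, so no two sites can ever generate the same timestamp.

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self):
        # (counter, site_id) pairs compare lexicographically; the site id
        # breaks ties, guaranteeing global uniqueness.
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, ts):
        # A slow site advances its counter whenever it sees a remote
        # timestamp, so it cannot fall behind and let one site dominate.
        self.counter = max(self.counter, ts[0])

s1, s2 = SiteClock(1), SiteClock(2)
a = s1.next_timestamp()            # (1, 1)
b = s2.next_timestamp()            # (1, 2): unique even with equal counters
assert a != b and a < b
s2.observe(s1.next_timestamp())    # fairness message keeps s2 caught up
```

The message traffic needed to keep counters synchronized is exactly the overhead the answer mentions as the cost of the distributed approach.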
| 2. |
Give An Example Where Lazy Replication Can Lead To An Inconsistent Database State Even When Updates Get An Exclusive Lock On The Primary (master) Copy? |
|
Answer» Consider the balance in an account, replicated at N sites. Let the current balance be $100, consistent across all sites. Consider two transactions T1 and T2, each depositing $10 in the account; the balance should be $120 after both have executed. Let the transactions execute in sequence, T1 first and then T2. Suppose the copy of the balance at one of the sites, say s, is not yet consistent with the primary copy after T1 executes (because of the lazy replication strategy), and let T2 read this stale copy. T2 adds $10 to the stale value of $100, so the balance at the primary site is $110 at the end instead of $120, an inconsistent state. |
|
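The anomaly can be seen in a few lines of Python. This is only a sketch of the scenario above; the dictionaries standing in for the primary copy and the replica at site s are illustrative:

```python
# Lazy replication: the primary is updated eagerly, while the replica at
# site s is refreshed only later, so T2 can read a stale balance.

primary = {"balance": 100}
site_s = {"balance": 100}     # lazily refreshed replica

# T1 deposits $10, holding an exclusive lock on the primary copy.
primary["balance"] += 10      # primary: 110; site_s is still 100 (stale)

# T2 reads the replica at s, then writes through the primary.
stale = site_s["balance"]     # reads 100 instead of 110
primary["balance"] = stale + 10

print(primary["balance"])     # 110, although two $10 deposits were made
```

The exclusive lock on the primary does not help, because the stale read happened at the replica before T2 ever touched the primary.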
| 3. |
Explain The Difference Between Data Replication In A Distributed System And The Maintenance Of A Remote Backup Site? |
|
Answer» In remote backup systems, all transactions are performed at the primary site and the data is replicated at the remote backup site, which is kept synchronized with the primary by shipping all log records. Whenever the primary site fails, the remote backup site takes over processing. Distributed systems offer greater availability by having multiple copies of the data at different sites, whereas remote backup systems offer lesser availability at lower cost and execution overhead. In a distributed system, transaction code runs at all the sites, whereas in a remote backup system it runs only at the primary site. Distributed-system transactions follow two-phase commit to keep the data in a consistent state, whereas a remote backup system does not follow two-phase commit and avoids the related overhead. |
|
| 4. |
Explain The Notions Of Transparency And Autonomy. Why Are These Notions Desirable From A Human-factors Standpoint? |
|
Answer» Autonomy is the amount of control a single site has over the local database. It is important because users at that site want quick and correct access to local data items, especially since local data will be the most frequently accessed in a database. Transparency hides the distributed nature of the database. It is important because users should not be required to know about the location, replication, fragmentation, or other implementation aspects of the database. |
|
| 5. |
When Is It Useful To Have Replication Or Fragmentation Of Data? |
|
Answer» Replication is useful when there are many read-only transactions at different sites wanting access to the same data. They can all execute quickly in parallel, accessing local data. But updates become difficult with replication. Fragmentation is useful if transactions on different sites tend to access different parts of the database. |
|
| 6. |
How Might A Distributed Database Designed For A Local-area Network Differ From One Designed For A Wide-area Network? |
|
Answer» Data transfer on a local-area network (LAN) is much faster than on a wide-area network (WAN). Thus replication and fragmentation will not increase throughput and speed-up on a LAN as much as on a WAN. But even in a LAN, replication has its uses in increasing reliability and availability. |
|
| 7. |
Explain How The Following Differ: Fragmentation Transparency, Replication Transparency, And Location Transparency? |
Answer»
|
|
| 8. |
Discuss The Relative Advantages Of Centralized And Distributed Databases? |
Answer»
|
|
| 9. |
Consider A Network Based On Dial-up Phone Lines, Where Sites Communicate Periodically, Such As Every Night. Such Networks Are Often Configured With A Server Site And Multiple Client Sites. The Client Sites Connect Only To The Server, And Exchange Data With Other Clients By Storing Data At The Server And Retrieving Data Stored At The Server By Other Clients. What Is The Advantage Of Such An Architecture Over One Where A Site Can Exchange Data With Another Site Only By First Dialing It Up? |
|
Answer» With the central server, each site does not have to remember which site to contact when a particular data item is to be requested. The central server alone needs to remember this, so data items can be moved around easily, depending on which sites access which items most frequently. Other housekeeping tasks are likewise centralized rather than distributed, making the system easier to develop and maintain. Of course, there is the disadvantage of a total shutdown if the server becomes unavailable; even when it is running, it may become a bottleneck because every request must be routed through it. |
|
| 10. |
Consider A Bank That Has A Collection Of Sites, Each Running A Database System. Suppose The Only Way The Databases Interact Is By Electronic Transfer Of Money Between One Another. Would Such A System Qualify As A Distributed Database? Why? |
|
Answer» In a distributed system, all the sites typically run the same database-management software and share a global schema, and each site provides an environment for executing both local transactions and global transactions initiated at remote sites. The system described in the question does not have these properties, and hence it cannot qualify as a distributed database. |
|
| 11. |
What Are The Factors That Can Work Against Linear Scale Up In A Transaction Processing System? Which Of The Factors Are Likely To Be The Most Important In Each Of The Following Architectures: Shared Memory, Shared Disk, And Shared Nothing? |
|
Answer» Increasing contention for shared resources prevents linear scale-up with increasing parallelism. In a shared-memory system, contention for memory (which implies bus contention) causes scale-up to fall off as parallelism increases. In a shared-disk system, it is contention for disk and bus access that limits scale-up. In a shared-nothing system, interprocess-communication overhead is the main impeding factor: since there is no shared memory, acquiring locks and other activities requiring message passing between processes take more time with increased parallelism. |
|
| 12. |
Suppose A Transaction Is Written In C With Embedded Sql, And About 80 Percent Of The Time Is Spent In The Sql Code, With The Remaining 20 Percent Spent In C Code. How Much Speed Up Can One Hope To Attain If Parallelism Is Used Only For The Sql Code? |
|
Answer» Since the part that cannot be parallelized takes 20% of the total running time, Amdahl's law bounds the overall speed-up at 1/0.2 = 5; the best speed-up we can hope for is therefore less than 5. |
|
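This is the standard Amdahl's-law bound, which can be checked numerically (a sketch; the function name is mine):

```python
# Amdahl's law: with serial fraction f, the speedup on n processors is
# 1 / (f + (1 - f) / n), which approaches 1 / f = 5 for f = 0.2.

def speedup(serial_fraction, n_procs):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

for n in (2, 10, 1000):
    print(n, speedup(0.2, n))   # climbs toward, but never reaches, 5
```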
| 13. |
Suppose You Were In Charge Of The Database Operations Of A Company Whose Main Job Is To Process Transactions. Suppose The Company Is Growing Rapidly Each Year, And Has Outgrown Its Current Computer System. When You Are Choosing A New Parallel Computer, What Measure Is Most Relevant: Speed Up, Batch Scale Up, Or Transaction Scale Up? Why? |
|
Answer» With increasing scale of operations, we expect the number of transactions submitted per unit time to increase. On the other hand, we would not expect most individual transactions to grow longer, nor would we require that a given transaction execute more quickly than it did before. Hence transaction scale-up is the most relevant measure in this scenario. |
|
| 14. |
What Is Lock De-escalation, And Under What Conditions Is It Required? Why Is It Not Required If The Unit Of Data Shipping Is An Item? |
|
Answer» In a client-server system with page shipping, when a client requests an item, the server typically grants a lock not on the requested item but on the page holding the item, thus implicitly granting locks on all the items in the page. The other items in the page are said to be prefetched. If some other client subsequently requests one of the prefetched items, the server may ask the owner of the page lock to transfer back the lock on that item. If the page-lock owner does not need the item, it de-escalates the page lock it holds to item locks on all the items it is actually accessing, and then returns the locks on the unwanted items; the server can then grant the latter lock request. If the unit of data shipping is an item, there are no coarser-granularity locks; even if prefetching is used, it is typically implemented by granting individual locks on each of the prefetched items. Thus, when the server asks for the return of a lock, there is no question of de-escalation: the requested lock is simply returned if the client has no use for it. |
|
| 15. |
Instead Of Storing Shared Structures In Shared Memory, An Alternative Architecture Would Be To Store Them In The Local Memory Of A Special Process, And Access The Shared Data By Inter Process Communication With The Process. What Would Be The Drawback Of Such An Architecture? |
|
Answer» The drawback is that two interprocess messages would be required to acquire a lock: one for the request and one to confirm the grant. Interprocess communication is much more expensive than memory access, so the cost of locking would increase. The process storing the shared structures could also become a bottleneck. The benefit of this alternative is that the lock table is better protected from erroneous updates, since only one process can access it. |
|
| 16. |
Why Is It Relatively Easy To Port A Database From A Single Processor Machine To A Multiprocessor Machine If Individual Queries Need Not Be Parallelized? |
|
Answer» Porting is relatively easy to a shared-memory multiprocessor machine. Databases designed for single-processor machines already provide multitasking, allowing multiple processes to run on the same processor in a time-shared manner and giving the user a view of multiple processes running in parallel. Thus, coarse-granularity parallel machines logically appear identical to single-processor machines, making the porting relatively easy. Porting a database to a shared-disk or shared-nothing multiprocessor architecture is a little harder. |
|
| 17. |
Explain The Difference Between A System Crash And A "Disaster"? |
|
Answer» In a system crash, the CPU goes down, and the disk may also crash, but stable storage at the site is assumed to survive system crashes. In a "disaster", everything at a site is destroyed. Stable storage therefore needs to be distributed in order to survive disasters. |
|
| 18. |
Compare The Shadow-paging Recovery Scheme With The Log-based Recovery Schemes In Terms Of Ease Of Implementation And Overhead Cost? |
|
Answer» The shadow-paging scheme is easy to implement for single-transaction systems, but difficult for multiple-transaction systems; in particular, it is very hard to allow multiple updates concurrently on the same page. Shadow paging can suffer from extra space overhead, but garbage collection can take care of that. The I/O overhead of shadow paging is typically higher than that of the log-based schemes, since the log-based schemes need to write one record per update to the log, whereas the shadow-paging scheme needs to write one block per updated block. |
|
| 19. |
Explain The Difference Between The Three Storage Types (Volatile, Nonvolatile, And Stable) In Terms Of I/O Cost? |
|
Answer» Volatile storage is storage that fails when there is a power failure; cache, main memory, and registers are examples. Nonvolatile storage retains its contents despite power failures; an example is magnetic disk. Stable storage is storage that theoretically survives any kind of failure (short of a complete disaster!) and can only be approximated by replicating data. In terms of I/O cost, volatile memory is the fastest, and nonvolatile storage is typically several times slower. Stable storage is slower than nonvolatile storage because of the cost of data replication. |
|
| 20. |
Devise A Time Stamp-based Protocol That Avoids The Phantom Phenomenon? |
|
Answer» In the text, we considered two approaches to dealing with the phantom phenomenon by means of locking. The coarser-granularity approach obviously works for timestamps as well. The B+-tree index-based approach can be adapted to timestamping by treating index buckets as data items with timestamps associated with them, and requiring that all read accesses use an index. We now show that this simple method works. Suppose a transaction Ti wants to access all tuples with a particular range of search-key values, using a B+-tree index on that search key. Ti will need to read all the buckets in that index which have key values in that range. It can be seen that any delete or insert of a tuple with a key value in the same range will need to write one of the index buckets read by Ti. Thus the logical conflict is converted to a conflict on an index bucket, and the phantom phenomenon is avoided. |
|
| 21. |
Explain The Phantom Phenomenon. Why May This Phenomenon Lead To An Incorrect Concurrent Execution Despite The Use Of The Two-phase Locking Protocol? |
|
Answer» The phantom phenomenon arises when, due to an insertion or deletion, two transactions logically conflict despite not locking any data items in common. The insertion case is described in the book; deletion can also lead to the phenomenon. Suppose Ti deletes a tuple from a relation while Tj scans the relation. If Ti deletes the tuple and then Tj reads the relation, Ti should be serialized before Tj, yet there is no tuple on which Ti and Tj both conflict. An interpretation of 2PL as just locking the accessed tuples in a relation is therefore incorrect: there is also index or relation data holding information about the tuples in the relation, which is read by any transaction that scans the relation and modified by transactions that update, insert into, or delete from the relation. Hence locking must also be performed on the index or relation data, and this avoids the phantom phenomenon. |
|
| 22. |
If Deadlock Is Avoided By Deadlock Avoidance Schemes, Is Starvation Still Possible? |
|
Answer» A transaction may become the victim of deadlock-prevention rollback arbitrarily many times, thus creating a potential starvation situation. |
|
| 23. |
Under What Conditions Is It Less Expensive To Avoid Deadlock Than To Allow Deadlocks To Occur And Then To Detect Them? |
|
Answer» Deadlock avoidance is preferable if the consequences of abort are serious (as in interactive transactions), and if there is high contention and a resulting high probability of deadlock. |
|
| 24. |
Under A Modified Version Of The Timestamp Protocol, We Require That A Commit Bit Be Tested To See Whether A Read Request Must Wait. Explain How The Commit Bit Can Prevent Cascading Abort. Why Is This Test Not Necessary For Write Requests? |
|
Answer» Using the commit bit, a read request is made to wait if the transaction that wrote the data item has not yet committed. Therefore, if the writing transaction fails before commit, we can abort that transaction alone; the waiting read will then access the earlier version in a multiversion system, or the restored value of the data item after the abort in a single-version system. For writes, this commit-bit check is unnecessary, because either the write is a "blind" write and thus independent of the old value of the data item, or there was a prior read, in which case the test was already applied. |
|
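A minimal sketch of the commit-bit test for reads (the data-item representation and function names are illustrative, not from any particular system):

```python
# Each item carries a commit bit that is cleared by an uncommitted write;
# readers must wait until the bit is set again, so they never see (and
# never come to depend on) an uncommitted value.

item_q = {"value": 0, "committed": True}

def write(item, value):
    item["value"] = value
    item["committed"] = False     # clear the commit bit until commit

def can_read(item):
    return item["committed"]      # a read request waits while this is False

def commit(item):
    item["committed"] = True

write(item_q, 42)
assert not can_read(item_q)   # reader waits; an abort here cannot cascade
commit(item_q)
assert can_read(item_q)       # safe to read the committed value
```

Because no reader ever observes the value while the bit is clear, aborting the writer rolls back only the writer itself.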
| 25. |
Consider The Validation-based Concurrency-control. Show That By Choosing Validation(ti), Rather Than Start(ti), As The Time Stamp Of Transaction Ti, We Can Expect Better Response Time Provided That Conflict Rates Among Transactions Are Indeed Low? |
|
Answer» In the concurrency-control scheme, choosing Start(Ti) as the timestamp of Ti gives a subset of the schedules allowed by choosing Validation(Ti) as the timestamp. Using Start(Ti) means that whoever started first must finish first. Transactions could certainly enter the validation phase in the same order in which they began executing, but this is overly restrictive. Since choosing Validation(Ti) causes fewer non-conflicting transactions to restart, it gives the better response times. |
|
| 26. |
Use Of Multiple-granularity Locking May Require More Or Fewer Locks Than An Equivalent System With A Single Lock Granularity. Provide Examples Of Both Situations, And Compare The Relative Amount Of Concurrency Allowed. |
|
Answer» If a transaction needs to access a large set of items, multiple-granularity locking requires fewer locks, whereas if only one item needs to be accessed, the single-lock-granularity system allows this with just one lock. Because all the desired data items are locked and unlocked together in the multiple-granularity scheme, the locking overhead is low, but concurrency is also reduced. |
|
| 27. |
Although SIX Mode Is Useful In Multiple-granularity Locking, An Exclusive And Intend-shared (XIS) Mode Is Of No Use. Why Is It Useless? |
|
Answer» An exclusive lock is incompatible with any other lock mode. Once a node is locked in exclusive mode, none of its descendants can be simultaneously accessed by any other transaction in any mode. Therefore an exclusive and intend-shared declaration has no meaning. |
|
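The point is visible in the standard multiple-granularity compatibility matrix; a sketch in Python (the matrix is the textbook one, the representation is mine):

```python
# Compatibility matrix for IS, IX, S, SIX, and X. The X row (and column) is
# False everywhere: once X is held, no other transaction holds any lock on
# the node, so the "intend-shared" half of a hypothetical XIS mode could
# never matter.

MODES = ("IS", "IX", "S", "SIX", "X")
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

assert all(not COMPAT["X"][m] for m in MODES)   # X excludes everything
```

By contrast, SIX is useful because S is compatible with IS, so the shared half of SIX can coexist with intention-shared locks held by other transactions.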
| 28. |
In Multiple-granularity Locking, What Is The Difference Between Implicit And Explicit Locking? |
|
Answer» When a transaction explicitly locks a node in shared or exclusive mode, it implicitly locks all the descendants of that node in the same mode; the transaction need not explicitly lock the descendant nodes. There is no difference in the functionality of these locks; the only difference is in the way they are acquired and in how their presence is tested. |
|
| 29. |
When A Transaction Is Rolled Back Under Time Stamp Ordering, It Is Assigned A New Time Stamp. Why Can It Not Simply Keep Its Old Time Stamp? |
|
Answer» A transaction is rolled back because a newer transaction has read or written the data that it was supposed to write. If the rolled-back transaction is reintroduced with the same timestamp, the same reason for rollback is still valid, and the transaction will have to be rolled back again. This would continue indefinitely. |
|
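The basic timestamp-ordering write test makes this concrete (a sketch; the function and variable names are mine, and the Thomas write rule is deliberately omitted):

```python
# A write by a transaction with timestamp ts on item Q is rejected whenever
# a younger transaction has already read or written Q.

def write_allowed(ts, read_ts_q, write_ts_q):
    return ts >= read_ts_q and ts >= write_ts_q

read_ts_q, write_ts_q = 15, 12   # a transaction with timestamp 15 read Q

assert not write_allowed(10, read_ts_q, write_ts_q)  # rolled back
assert not write_allowed(10, read_ts_q, write_ts_q)  # same old timestamp:
                                                     # rejected again, forever
assert write_allowed(16, read_ts_q, write_ts_q)      # fresh timestamp succeeds
```

Since `read_ts_q` and `write_ts_q` never decrease, a retry with the old timestamp can never pass the test, which is why a fresh timestamp is required.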
| 30. |
In Time Stamp Ordering, W-time Stamp(Q) Denotes The Largest Time Stamp Of Any Transaction That Executed Write(Q) Successfully. Suppose That, Instead, We Defined It To Be The Time Stamp Of The Most Recent Transaction To Execute Write(Q) Successfully. Would This Change In Wording Make Any Difference? |
|
Answer» It would make no difference. The write protocol is such that the most recent transaction to write an item is also the one with the largest timestamp to have done so. |
|
| 31. |
Consider A Database Organized In The Form Of A Rooted Tree. Suppose That We Insert A Dummy Vertex Between Each Pair Of Vertices. Show That, If We Follow The Tree Protocol On The New Tree, We Get Better Concurrency Than If We Follow The Tree Protocol On The Original Tree? |
|
Answer» The proof is in Buckley and Silberschatz, “Concurrency Control in Graph Protocols by Using Edge Locks,” Proc. ACM SIGACT-SIGMOD Symposium on the Principles of Database Systems, 1984. |
|
| 32. |
Most Implementations Of Database Systems Use Strict Two-phase Locking. Suggest Three Reasons For The Popularity Of This Protocol? |
|
Answer» It is relatively simple to implement, imposes low rollback overhead because of cascadeless schedules, and usually allows an acceptable level of concurrency. |
|
| 33. |
What Benefit Does Rigorous Two-phase Locking Provide? How Does It Compare With Other Forms Of Two-phase Locking? |
|
Answer» Rigorous two-phase locking has the advantages of strict 2PL. In addition, it has the property that for two conflicting transactions, their commit order is their serializability order. In some systems users might expect this behavior. |
|
| 34. |
What Benefit Does Strict Two-phase Locking Provide? What Disadvantages Result? |
|
Answer» Because it produces only cascadeless schedules, recovery is very easy. But the set of schedules obtainable is a subset of those obtainable from plain two-phase locking, so concurrency is reduced. |
|
| 35. |
What Is A Cascadeless Schedule? Why Is Cascadelessness Of Schedules Desirable? Are There Any Circumstances Under Which It Would Be Desirable To Allow Noncascadeless Schedules? |
|
Answer» A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the read operation of Tj. Cascadeless schedules are desirable because the failure of a transaction does not lead to the aborting of any other transaction. Of course this comes at the cost of less concurrency. If failures occur rarely, so that we can pay the price of cascading aborts for the increased concurrency, noncascadeless schedules might be desirable. |
|
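As a concrete illustration, here is a hypothetical checker for the property just defined. The tuple encoding of schedules — operations written as (transaction, action, item) — is an assumption made for this sketch:

```python
# A schedule is cascadeless if every read of an item sees only data whose
# writer has already committed. Operations are (txn, action, item) tuples
# with action in {"w", "r", "commit"}; item is None for commits.

def is_cascadeless(schedule):
    last_writer = {}        # item -> txn that wrote it most recently
    committed = set()
    for txn, action, item in schedule:
        if action == "w":
            last_writer[item] = txn
        elif action == "r":
            w = last_writer.get(item)
            if w is not None and w != txn and w not in committed:
                return False        # reading uncommitted data: cascade risk
        elif action == "commit":
            committed.add(txn)
    return True

# T2 reads A written by T1 before T1 commits -> not cascadeless
bad = [("T1", "w", "A"), ("T2", "r", "A"), ("T1", "commit", None)]
good = [("T1", "w", "A"), ("T1", "commit", None), ("T2", "r", "A")]
assert not is_cascadeless(bad)
assert is_cascadeless(good)
```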
| 36. |
What Is A Recoverable Schedule? Why Is Recoverability Of Schedules Desirable? Are There Any Circumstances Under Which It Would Be Desirable To Allow Nonrecoverable Schedules? |
|
Answer» A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the commit operation of Tj. Recoverable schedules are desirable because failure of a transaction might otherwise bring the system into an irreversibly inconsistent state. Nonrecoverable schedules may sometimes be needed when updates must be made visible early due to time constraints, even if they have not yet been committed, which may be required for very long duration transactions. |
|
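The recoverability condition can also be sketched as a small checker. The (transaction, action, item) encoding of schedules is again an assumption for illustration only:

```python
# A schedule is recoverable if, whenever Tj reads an item last written by Ti,
# Ti commits before Tj commits. Operations are (txn, action, item) tuples
# with action in {"w", "r", "commit"}; item is None for commits.

def is_recoverable(schedule):
    last_writer = {}              # item -> most recent writer
    reads_from = {}               # reader -> set of writers it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "w":
            last_writer[item] = txn
        elif action == "r":
            w = last_writer.get(item)
            if w is not None and w != txn:
                reads_from.setdefault(txn, set()).add(w)
        elif action == "commit":
            # every transaction this one read from must already be committed
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

# T2 commits before T1, having read T1's uncommitted write -> not recoverable:
# if T1 later aborts, T2's commit cannot be undone.
bad = [("T1", "w", "A"), ("T2", "r", "A"), ("T2", "commit", None),
       ("T1", "commit", None)]
assert not is_recoverable(bad)
```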
| 37. |
Since Every Conflict-serializable Schedule Is View Serializable, Why Do We Emphasize Conflict Serializability Rather Than View Serializability? |
|
Answer» Most of the concurrency control protocols (protocols for ensuring that only serializable schedules are generated) used in practice are based on conflict serializability—they actually permit only a subset of conflict serializable schedules. The general form of view serializability is very expensive to test, and only a very restricted form of it is used for concurrency control. |
|
| 38. |
Explain The Distinction Between The Terms Serial Schedule And Serializable Schedule. |
|
Answer» A schedule in which all the instructions belonging to one single transaction appear together is called a serial schedule. A serializable schedule has a weaker restriction that it should be equivalent to some serial schedule. There are two definitions of schedule equivalence – conflict equivalence and view equivalence. Both of these are described in the chapter. |
|
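Conflict equivalence to some serial schedule can be tested with the standard precedence-graph construction, sketched below. The (transaction, action, item) encoding is an assumption for this example:

```python
# Precedence-graph test for conflict serializability. Operations are
# (txn, action, item) tuples with action in {"r", "w"}. Draw an edge
# Ti -> Tj for each conflicting pair where Ti's operation comes first;
# the schedule is conflict serializable iff the graph is acyclic.

def conflict_serializable(ops):
    edges = set()
    txns = {t for t, _, _ in ops}
    for i, (ti, ai, xi) in enumerate(ops):
        for tj, aj, xj in ops[i + 1:]:
            # two operations conflict if they are by different transactions,
            # touch the same item, and at least one is a write
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add((ti, tj))
    # cycle detection: repeatedly remove nodes with no incoming edges
    remaining = set(txns)
    while remaining:
        removable = {t for t in remaining
                     if not any(b == t and a in remaining for a, b in edges)}
        if not removable:
            return False          # every remaining node has an incoming edge: cycle
        remaining -= removable
    return True

serial_equiv = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A")]
cyclic = [("T1", "r", "A"), ("T2", "w", "A"), ("T2", "r", "B"), ("T1", "w", "B")]
assert conflict_serializable(serial_equiv)   # equivalent to T1 then T2
assert not conflict_serializable(cyclic)     # T1 -> T2 and T2 -> T1: cycle
```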
| 39. |
Justify The Following Statement: Concurrent Execution Of Transactions Is More Important When Data Must Be Fetched From (slow) Disk Or When Transactions Are Long, And Is Less Important When Data Is In Memory And Transactions Are Very Short. |
|
Answer» If a transaction is very long or fetches data from a slow disk, it takes a long time to complete. In the absence of concurrency, other transactions will have to wait for a longer period of time, and the average response time will increase. Also, while a transaction is reading data from disk, the CPU is idle, so resources are not properly utilized. Hence concurrent execution becomes important in this case. However, when the transactions are short or the data is available in memory, these problems do not occur. |
|
| 40. |
During Its Execution, A Transaction Passes Through Several States, Until It Finally Commits Or Aborts. List All Possible Sequences Of States Through Which A Transaction May Pass. Explain Why Each State Transition May Occur. |
|
Answer» The possible sequences of states are:
active → partially committed → committed. This is the normal sequence a successful transaction follows: after executing its final statement it enters the partially committed state, and once enough information has been written to stable storage, it commits.
active → partially committed → failed → aborted. After executing its final statement, a failure (for example, of the disk holding the log) may prevent the transaction from committing; it enters the failed state and is then rolled back and aborted.
active → failed → aborted. During normal execution, an error (or a decision by the system, for example to break a deadlock) causes the transaction to fail, after which it is rolled back and aborted. |
|
| 41. |
Database-system Implementers Have Paid Much More Attention To The Acid Properties Than Have File-system Implementers. Why Might This Be The Case? |
|
Answer» Database systems usually perform crucial tasks whose effects need to be atomic and durable, and whose outcome affects the real world in a permanent manner. Examples of such tasks are monetary transactions, seat bookings, etc. Hence the ACID properties have to be ensured. In contrast, most users of file systems would not be willing to pay the price (monetary, disk space, time) of supporting ACID properties. |
|
| 42. |
Consider A File System Such As The One On Your Favorite Operating System. What Are The Steps Involved In Creation And Deletion Of Files, And In Writing Data To A File? Explain How The Issues Of Atomicity And Durability Are Relevant To The Creation And Deletion Of Files, And To Writing Data To Files. |
|
Answer» There are several steps in the creation of a file: a storage area is assigned to the file in the file system, a unique i-number is given to the file, and an i-node entry is inserted into the i-list. Deletion of a file involves exactly the opposite steps. For the file system user in UNIX, durability is important for obvious reasons, but atomicity is not generally relevant, as the file system does not support transactions. To the file system implementor, though, many of the internal file system actions need to have transaction semantics: all the steps involved in the creation or deletion of a file must be atomic, otherwise there will be unreferenceable files or unusable areas in the file system. |
|
| 43. |
Suppose That There Is A Database System That Never Fails. Is A Recovery Manager Required For This System? |
|
Answer» Even in this case the recovery manager is needed to perform roll-back of aborted transactions. |
|
| 44. |
List The Acid Properties. Explain The Usefulness Of Each? |
|
Answer» The ACID properties, and the need for each of them, are:
Atomicity: either all the operations of a transaction are reflected in the database, or none are. This is needed so that a failure partway through a transaction does not leave the database in a partially updated, inconsistent state.
Consistency: execution of a transaction in isolation preserves the consistency of the database. This is needed so that each transaction, run by itself, takes the database from one consistent state to another.
Isolation: even though transactions execute concurrently, each transaction is unaware of the others executing in the system. This is needed so that concurrent execution yields the same result as some serial execution.
Durability: after a transaction commits, its changes persist even if the system subsequently fails. This is needed so that committed work is never lost. |
|
| 45. |
The Indexed Nested-loop Join Algorithm Can Be Inefficient If The Index Is A Secondary Index, And There Are Multiple Tuples With The Same Value For The Join Attributes. Why Is It Inefficient? Describe A Way, Using Sorting, To Reduce The Cost Of Retrieving Tuples Of The Inner Relation. Under What Conditions Would This Algorithm Be More Efficient Than Hybrid Merge-join? |
|
Answer» If there are multiple tuples in the inner relation with the same value for the join attributes, we may have to access that many blocks of the inner relation for each tuple of the outer relation. That is why it is inefficient. To reduce this cost we can perform a join of the outer relation tuples with just the secondary index leaf entries, postponing the inner relation tuple retrieval. The result file obtained is then sorted on the inner relation addresses, allowing an efficient physical order scan to complete the join. Hybrid merge–join requires the outer relation to be sorted. The above algorithm does not have this requirement, but for each tuple in the outer relation it needs to perform an index lookup on the inner relation. If the outer relation is much larger than the inner relation, this index lookup cost will be less than the sorting cost, thus this algorithm will be more efficient. |
|
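A sketch of that optimization, under the assumption that the secondary-index leaf entries are available as (join key, tuple address) pairs; all names and data structures here are hypothetical stand-ins for the real on-disk structures:

```python
# Join the outer tuples with the secondary-index leaf entries first (no
# inner-relation I/O), sort the partial result on the inner relation's
# tuple addresses, then fetch inner tuples in physical (address) order so
# each inner block is read at most once.

def sorted_index_join(outer, leaf_entries, inner_by_addr):
    # leaf_entries: list of (join_key, inner_tuple_address) pairs
    index = {}
    for key, addr in leaf_entries:
        index.setdefault(key, []).append(addr)

    # step 1: join outer tuples with leaf entries only
    partial = [(addr, o) for o in outer for addr in index.get(o["key"], [])]

    # step 2: sort on inner addresses, then retrieve inner tuples in
    # physical storage order to complete the join
    partial.sort(key=lambda p: p[0])
    return [(o, inner_by_addr[addr]) for addr, o in partial]

outer = [{"key": "k2", "v": 1}, {"key": "k1", "v": 2}]
leaves = [("k1", 7), ("k2", 3)]
inner = {3: {"key": "k2"}, 7: {"key": "k1"}}
result = sorted_index_join(outer, leaves, inner)
assert result == [({"key": "k2", "v": 1}, {"key": "k2"}),
                  ({"key": "k1", "v": 2}, {"key": "k1"})]
```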
| 46. |
Design A Variant Of The Hybrid Merge-join Algorithm For The Case Where Both Relations Are Not Physically Sorted, But Both Have A Sorted Secondary Index On The Join Attributes? |
|
Answer» We merge the leaf entries of the first sorted secondary index with the leaf entries of the second sorted secondary index. The result file contains pairs of addresses, the first address in each pair pointing to a tuple in the first relation, and the second address pointing to a tuple in the second relation. This result file is first sorted on the first relation’s addresses. The relation is then scanned in physical storage order, and addresses in the result file are replaced by the actual tuple values. Then the result file is sorted on the second relation’s addresses, allowing a scan of the second relation in physical storage order to complete the join. |
|
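The steps above can be sketched as follows; for simplicity this hypothetical version assumes unique join keys in each index, and Python dicts stand in for address-ordered file scans:

```python
# Merge two sorted secondary indexes into address pairs, then resolve each
# relation's addresses in physical storage order, one relation at a time.

def index_only_merge_join(leaves_r, leaves_s, r_by_addr, s_by_addr):
    # leaves_r, leaves_s: lists of (join_key, address), sorted on join_key
    pairs, i, j = [], 0, 0
    while i < len(leaves_r) and j < len(leaves_s):
        kr, ks = leaves_r[i][0], leaves_s[j][0]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            pairs.append((leaves_r[i][1], leaves_s[j][1]))
            i += 1
            j += 1

    # pass 1: sort on R's addresses, scan R in physical order
    pairs.sort(key=lambda p: p[0])
    resolved = [(r_by_addr[ra], sa) for ra, sa in pairs]
    # pass 2: sort on S's addresses, scan S in physical order
    resolved.sort(key=lambda p: p[1])
    return [(rt, s_by_addr[sa]) for rt, sa in resolved]

leaves_r = [("k1", 10), ("k2", 4)]       # sorted on join key
leaves_s = [("k2", 9)]
r_by_addr = {4: "r@4", 10: "r@10"}
s_by_addr = {9: "s@9"}
assert index_only_merge_join(leaves_r, leaves_s, r_by_addr, s_by_addr) \
    == [("r@4", "s@9")]
```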
| 47. |
What Are The Advantages And Disadvantages Of Hash Indices Relative To B+-tree Indices? How Might The Type Of Index Available Influence The Choice Of A Query Processing Strategy? |
|
Answer» Hash indices enable us to perform point lookups (e.g., σA=r(relation)) very fast, but for range searches a B+-tree index would be much more efficient. If there is a range query to be evaluated and only a hash index is available, the better strategy might be to perform a file scan rather than use that index. |
|
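A toy illustration of the trade-off, using a dict as a stand-in for a hash index and a sorted list with `bisect` standing in for a B+-tree's ordered leaves:

```python
# A hash index (dict) answers equality probes directly, while range
# predicates need an ordered structure (sorted list + bisect here).

import bisect

keys = [3, 8, 15, 23, 42]          # sorted, as in B+-tree leaf order
hash_index = {k: f"rec{k}" for k in keys}

# point lookup: ideal for the hash index
assert hash_index[15] == "rec15"

# range query 8 <= k <= 23: cheap on the ordered structure...
lo = bisect.bisect_left(keys, 8)
hi = bisect.bisect_right(keys, 23)
assert keys[lo:hi] == [8, 15, 23]

# ...but with only a hash index it degenerates to scanning every key
assert sorted(k for k in hash_index if 8 <= k <= 23) == [8, 15, 23]
```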
| 48. |
Why Is It Not Desirable To Force Users To Make An Explicit Choice Of A Query Processing Strategy? Are There Cases In Which It Is Desirable For Users To Be Aware Of The Costs Of Competing Query-processing Strategies? |
|
Answer» In general it is not desirable to force users to choose a query processing strategy, because naive users might select an inefficient one. Users would make poor choices about processing queries because they would not know how a relation is stored, nor about its indices; it is unreasonable to force users to be aware of these details, since ease of use is a major objective of database query languages. If users are aware of the costs of competing strategies, they can write queries efficiently, thus helping performance. This could happen if experts were using the system. |
|
| 49. |
How Does Data Encryption Affect Index Schemes? In Particular, How Might It Affect Schemes That Attempt To Store Data In Sorted Order? |
|
Answer» Indices must operate on the encrypted data, or someone could gain access to the index to interpret the data; otherwise, the index would have to be restricted so that only certain users could access it. To keep the data in sorted order, the index scheme would have to decrypt the data at each level in a tree. Note that hash systems would not be affected. |
|
| 50. |
Show How To Compute Existence Bitmaps From Other Bitmaps. Make Sure That Your Technique Works Even In The Presence Of Null Values, By Using A Bitmap For The Value Null? |
|
Answer» The existence bitmap for a relation can be calculated by taking the union (logical-or) of all the bitmaps on that attribute, including the bitmap for value null. |
|
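A short sketch of the construction, with bitmaps represented as plain lists of 0/1 bits for illustration:

```python
# OR together every value bitmap on the attribute, including the bitmap for
# null, to obtain the existence bitmap: a record exists iff it has SOME
# value (possibly null) in the attribute.

def existence_bitmap(value_bitmaps):
    """value_bitmaps: dict mapping value (null included) -> list of 0/1 bits."""
    n = len(next(iter(value_bitmaps.values())))
    exists = [0] * n
    for bits in value_bitmaps.values():
        exists = [e | b for e, b in zip(exists, bits)]
    return exists

bitmaps = {
    "a":  [1, 0, 0, 0],
    "b":  [0, 1, 0, 0],
    None: [0, 0, 1, 0],   # record 2 holds a null in this attribute
}
# record 3 is a deleted/empty slot, so every bitmap has 0 there
assert existence_bitmap(bitmaps) == [1, 1, 1, 0]
```

Without the null bitmap, record 2 would wrongly appear nonexistent, which is why the null bitmap must be included in the union.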