DBMS
CHAPTER-1
INTRODUCTION TO DBMS
System
A system is an integration of entities, alternatively designated as components, which interact with one another.
For example, consider a particular department in a college or university. The entities of the department are students, non-teaching staff, infrastructure, etc. These entities interact with one another.
The information system can be either a conventional file processing system or a database management system.
Explain the Conventional File Processing System:
In the conventional file processing system, each and every subsystem of the information system has its own set of files. As a result, there will be duplication of data between the various subsystems.
The concept of Conventional File
Processing System is shown below.
The above system consists of 3 subsystems, namely application-x, application-y and application-z.
It is clear that some of the files are duplicated in different subsystems of the conventional file processing system. This will in turn increase the data redundancy.
Example
of Conventional File Processing System:
Consider the example of a hospital system.
The following diagram of the hospital is shown below.
The patients come to the hospital from the society. Upon arrival, a preliminary registration is done by seeking information about the patient. Then, depending on the type of illness, the patient will be treated either as an out-patient or as an in-patient. In some cases a patient will initially be treated as an out-patient and then be admitted as an in-patient if necessary. Finally, the bills are to be paid before the patient is discharged. In this system we are using four files. The files are
Patient File: at the registration section.
In-patient Treatment File: at the in-patient section.
Out-patient Treatment File: at the out-patient section.
Invoice File: at the accounts section.
Since the files are maintained in different sections of the hospital in a decentralized manner, certain data items (patient number, patient name, patient address) will be duplicated. This will have some undesirable results. Some of the difficulties of the conventional file processing system of the hospital are the following.
There
may be a duplication of data in different sections of the hospital which would
lead to inconsistency of data.
A lot of paperwork and telephone calls would be required to keep the file structures synchronized.
The system cannot provide answers to complex queries.
What
are the Drawbacks of Conventional File Processing System ?
A
list of drawbacks of the Conventional File Processing System is presented
below.
Uncontrolled
Redundancy of data.
Inconsistency
of data.
Inflexibility
Lack
of backup and recovery
Limited
data sharing.
Poor
enforcement of standards.
Unable
to represent Relationships among data.
Excessive
program maintenance and low programming productivity.
Uncontrolled
Redundancy of data
Since each subsystem of an organization maintains its own set of files without data sharing, the same data will be available in different files. This results in increased disk space usage, increased data-entry time and inconsistency of data.
Inconsistency
of data
The uncontrolled redundancy of data will permit the system to have the same data in different files. As a result, a particular data element like patient name may be entered differently in different files, which is nothing but inconsistency of that particular data element. While performing the basic data operations like updates, deletions and retrievals, this inconsistency will give misleading results.
Inflexibility
In the conventional file processing system, a top-down approach is generally followed in file design. In this approach a set of meaningful entities of the proposed system is identified, along with their attributes, to create the files. The actual set of reports required from the system is not considered in this approach. As a result, it may not be possible to meet the requirements of the users fully. Also, if the user requirements change in the future, the conventional file processing system will not be flexible enough to provide the results.
Lack
of Backup and recovery
In the conventional file processing system there is no implicit facility for backup and recovery from system failure. This means that if an application program fails in the middle of updating a file, there is no automatic way to bring the file back to its previous consistent state.
Limited
data sharing
In the conventional file processing system the data is stored in a decentralized manner; hence sharing of the data is complex.
Poor
enforcement of standards
Since different applications and their respective files are developed by different groups, each group will follow its own standards for field names, field widths, field types, etc. This creates serious difficulty when programs and data structures must be modified by different groups of users, which leads to low programmer productivity.
Unable
to represent relationships among data.
In the conventional file processing system there is no implicit facility to represent relationships among data in different files of a single system.
Excessive
program maintenance & Low program productivity
Since the different applications are developed by different groups in terms of file specifications and program specifications, it is very difficult to modify the programs and data structures at a later stage by a different group. Many program variables may be defined differently in different programs. All these difficulties lead to excessive program maintenance.
Programmer productivity is a measure of
time taken to develop an application. If the time taken to develop an
application is low then the programmer productivity is high and vice versa.
What
is DATABASE?
Database
is a collection of related data (or) files.
Data
means known facts that can be recorded and that have implicit meaning.
For example, consider the names, telephone numbers and addresses of people. We may have recorded this data in an address book, or we may have stored it in a file on the hard disk using a computer and software such as Microsoft Excel or MS Access. This collection of inter-related data is a database. Generally a database contains one data file or a large number of data files. The database is organized in such a way that application programs can quickly retrieve the required data.
What
is Database Management System (DBMS)?
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates defining, constructing, manipulating and sharing databases among various users and applications.
Defining
a database involves specifying the data types, structures, and constraints for
the data to be stored in the database.
Constructing
the database is the process of storing the data itself on some storage medium
that is controlled by the DBMS.
Manipulating
a database includes such functions as querying the database to retrieve
specific data, updating the database to reflect changes in the database, and
generating reports from the data.
Sharing a database allows multiple users and programs to access the database simultaneously. Another important function provided by the DBMS is protection: unauthorized users cannot access the database.
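A minimal SQL sketch of these four functions, using a hypothetical PATIENT table (all names here are illustrative, not from any specific system):

-- Defining: specify data types, structures and constraints
CREATE TABLE patient (
    patient_no   INT PRIMARY KEY,
    patient_name VARCHAR(50) NOT NULL,
    address      VARCHAR(100)
);

-- Constructing: store the data itself
INSERT INTO patient VALUES (1, 'Manimala', 'Chennai');

-- Manipulating: query and update the stored data
SELECT patient_name FROM patient WHERE patient_no = 1;
UPDATE patient SET address = 'Hyderabad' WHERE patient_no = 1;

-- Sharing: grant another user controlled access
GRANT SELECT ON patient TO accounts_clerk;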
Example
of Database Management System
Consider the example of the hospital system, which deals with in-patients as well as out-patients. The hospital system is shown below.
In the Conventional File Processing System
there will be a separate system for in-patients, Out-patient and Accounts. But
under the database approach, all the necessary files are included in a single
database, which minimizes redundancy of data and facilitates sharing. As a
result all the drawbacks of Conventional File Processing System are eliminated
in the Database System.
What
are the Advantages of Database Management System:
The Advantages of Database Management System
are
Controlled
Redundancy of data.
Consistency
of data.
Flexibility
Backup
and recovery
Enhanced
Data sharing.
Better
enforcement of standards.
Relationships
among data.
Reduced
program maintenance and Increased programmer productivity.
1.
Controlled Redundancy of data
The database approach uses a centralized place to store data; hence the amount of data redundancy will be minimized. Data redundancy can be minimized by applying the normalization process in database design.
2.
Consistency of data
The controlled redundancy will minimize the presence of the same data in different files. This leads to consistency of data by reducing multiple inserts and updates of the same data at different places.
3.
Flexibility
In the database approach, database design is based on a bottom-up approach: all the reports that are currently used, and various expected reports, are taken into account while designing the database. When report requirements change, the database can be revised with minor changes.
4.
Backup and Recovery
In a DBMS, if a transaction fails in the middle of its execution due to a system failure, the DBMS will bring the data back to its previous consistent state.
5.
Enhanced Data Sharing
The database approach uses a centralized database; hence the same data can be shared by different applications or users simultaneously under concurrency control. In a DBMS multiple users can access the same data and make changes.
6.
Better enforcement of standards
Since the different files of the database are designed together rather than one subsystem at a time, there will be better enforcement of standards in terms of defining field names, field widths, field types, etc.
7.
Relationships among data
We can apply relationships among data to improve the performance of applications and the consistency (correctness) of data. For example, a foreign key constraint establishes a relationship between Employee and Department information.
8.
Reduced program maintenance and increase of programming productivity.
Different applications are developed under the coordination of the database administrator. As a result, there will be an integrated effort among the different groups in terms of file design and program design. This will reduce program maintenance.
Programmer productivity is a measure of the time taken to develop an application. In the database approach, data is separated from programs. There are many fourth-generation languages available to access and manipulate the database.
Because of the advanced capabilities of fourth-generation languages, the time taken to develop an application will be less when compared to the time taken to develop it using the conventional file processing system.
What are the Applications of DBMS ?
The applications of databases include:
Banking: For customer information, accounts, loans and bank transactions.
Airlines: For reservations and schedule information.
Universities: For student information, course registration and grading.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on pre-paid cards and storing information about the communication network.
What
are the Disadvantages of DBMS ?
The following are the disadvantages of
using DBMS.
Increased
complexity.
Requirement
of new and specialized manpower.
Large
size of DBMS.
Increased installation and maintenance cost.
Conversion
cost.
Increased
complexity
A multi-user DBMS becomes a complex piece of software due to the functionalities expected from it. It is necessary for database designers, developers, database administrators and end users to understand these functionalities. Failure to understand them can lead to bad design decisions.
Requirement
of new and specialized manpower
Because of rapid changes in database technology, organizations and businesses need to train manpower on a regular basis to design and implement databases and to provide database administration services; therefore an organization needs to maintain persons with specialized skills.
Large
size of DBMS
The DBMS occupies many gigabytes of storage space and requires a large amount of main memory to run efficiently.
Increased
installation and maintenance cost
The DBMS software has a high initial cost. It requires trained persons to install and operate it, and it also incurs considerable annual maintenance costs. Installing such software may also require upgrades to the hardware and related software.
Conversion
cost
The cost of converting from older database technology to a modern database environment is high.
Explain
the DBMS Architecture
The main aim of a database system is to provide an abstract view of the data, hiding certain details of how the data is stored and manipulated. To satisfy this need, an architecture for the database system is developed.
In the early days the whole DBMS was a single, monolithic package, whereas a modern DBMS is based on the client-server architecture.
Under the client-server architecture the database is not present on the client machine; instead, the client machine connects to the database server through a network.
There are two types of DBMS architecture, as shown below.

[Diagram: Two-Tier Architecture (Client - Server) and Three-Tier Architecture (Client - Application Server - Database Server)]
In the two-tier architecture, the application is a component that resides on the client machine and communicates with the server machine through query language statements.
In the three-tier architecture, the client machine does not directly communicate with the database server; the client communicates with an application server, and the application server communicates with the database to access data. Three-tier architectures are more suitable for large applications.
Explain the Database Architecture
The database architecture, based on the three-level architecture, is shown below. It consists of 3 levels.
Internal
level
The internal schema defines the internal level. The internal level is the lowest level of data abstraction. This level indicates how the data will be stored in the database and describes the file structures, data structures and access methods to be used by the database.
Conceptual
level
The conceptual schema defines the conceptual level. The conceptual level is the middle level of data abstraction. This level describes entities, attributes, and the relationships between entities and attributes.
External
level
The external schema defines the external level. The external level is the highest level of data abstraction. This level describes the part of the database that is relevant to a particular user.
What is Data Independence ?
Data independence is the characteristic of a database system that allows the schema at one level to be changed without changing the schema at the next higher level. There are two types of data independence, as shown below.
Physical independence
In physical data independence, changes to the internal schema, such as file structures, access methods and the devices used to store the data, must be possible without changing the conceptual schema and external schemas.
Logical independence
In logical data independence, changes to the conceptual schema, such as the addition and deletion of entities, the addition and deletion of attributes, and the addition and deletion of relationships, must be possible without changing the external schemas.
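A hedged SQL sketch of data independence at work, using a hypothetical EMPLOYEE table: the view plays the role of an external schema, so a later change to the conceptual schema does not disturb programs that use the view.

-- Conceptual level: the base table
CREATE TABLE employee (
    emp_no   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    salary   DECIMAL(10,2)
);

-- External level: a view exposing only what one user group needs
CREATE VIEW emp_public AS
SELECT emp_no, emp_name FROM employee;

-- A later conceptual-schema change (exact syntax varies by DBMS)...
ALTER TABLE employee ADD COLUMN dept_no INT;

-- ...leaves queries against the view unaffected: logical independence
SELECT * FROM emp_public;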
What
are the Functions or services of DBMS ?
The
functions and services of DBMS are
Data
storage Management
The DBMS creates the structure for the database on the physical storage devices. It provides a mechanism for permanent storage of data.
Data
Manipulation Management
The DBMS provides the ability to add new data into the database, and to retrieve, update and delete existing data in the database.
Data
Definition Management
The DBMS provides facilities for creating the structure in which the data is stored.
Data
dictionary
The DBMS provides a data dictionary, which stores the description of data items.
Authorization
The DBMS protects the database against unauthorized access, whether intentional or accidental.
Backup
and recovery
The DBMS provides a mechanism for backing up data periodically and recovering from different types of failures.
Concurrency
control
The
DBMS supports sharing of data among multiple users. The DBMS provides a
mechanism for concurrent access to the database.
Transaction
Management
A transaction is a series of database operations that access or change the content of the database. Ensuring that transactions execute reliably is done by the transaction-management component.
Data Independence Service
The DBMS supports the independence of programs from the structure of the database.
Integrity
Service
The DBMS provides an integrity service: data stored in the database, and changes made to that data, must follow certain rules.
What are the Functions of a Database Administrator ?
A database administrator (DBA) is an individual with an overview of one or more databases who controls the design and the use of these databases.
Functions and responsibilities of DBA
are
Defining
conceptual schema and database creation
The
DBA creates the conceptual schema such as defining entities and attributes,
deleting entities and attributes and modifying entities and attributes etc. The
DBA also creates the structure of the database.
Storage structure and access method definition
The DBA defines the storage structures and the access methods of the database.
Granting
authorization to the user
The DBA grants users access to the database. The authorization information is kept in a system structure that the database system consults whenever someone attempts to access the data in the system.
Routine
maintenance
The DBA periodically takes backups of the database, either on hard disk or on CD, to prevent loss of data in case of failure.
Job
monitoring
The DBA monitors the jobs running on the database and ensures that performance does not degrade.
What
are the various components of Database Systems
?
The database system is composed of the
five major components.
Hardware
Software
People
Procedures
Data
Hardware:
Hardware refers to physical components of the system.
E.g. storage devices, printers etc.
Software: software is a set of programs. To make the database system function fully, three types of software are needed. They are
a. Operating System Software
b.
DBMS software.
c.
Application Programs and utility software.
a. Operating system software: the operating system manages all hardware components and runs other software on the computer.
E.g. WINDOWS, LINUX
b. DBMS Software: manages the database within the database
system.
E.g.
Oracle, SQL, MS Access
c. Application programs and utility software: application programs are used to access and manipulate data, to generate reports and to support decision making.
Utilities are the software tools used to help manage the database system's computer components.
People:
There are five types of users in the database system.
System Administrator: oversees the database system's general operations.
Database Administrator: ensures that the database is functioning properly.
Database designers: design the database structure.
System analysts and programmers: design and implement the application programs.
End users: use the application programs to run the organization's daily operations.
Procedures: procedures are the set of rules that govern the design and use of the database.
Data: data are the facts stored in the database. Data are the raw material from which information is generated.
What
is Data?
A system consists of interrelated entities, and each entity has a set of attributes. The recorded values of the attributes of the entities of the system constitute the data.
What
is Information ?
Information is nothing but processed
data.
Define
Meta data
Metadata is data about the data, i.e., information for accessing the data.
Explain
the Terminology of a file
Field: A field is the lowest-level data item of an entity, alternatively called an attribute of that entity.
E.g. the Emp entity has the fields Empno, Empname, Empaddress.
Record:
Record is the collection of fields (or) attributes of an entity.
Empno | Empname | Empaddress
1 | Sweaty | Bangalore
File: A file is a collection of records having the same set of fields arranged in the same sequence.
Empno | Empname | Empaddress
1 | Manimala | Chennai
2 | Priya | Hyderabad
Key field (or) primary key: A field is said to be a key field or primary key if it can identify a record uniquely in a file.
e.g. student no in the student file, emp no in the emp file.
Non-key field (or) secondary key: A field is said to be a non-key field or secondary key if it cannot identify a record uniquely in a file.
e.g. student name in the student file, emp name in the emp file.
Schema: it is the overall view of all the files in the database.
Subschema: A portion of the schema of a database is called a subschema.
CHAPTER-2
Data
Models
Data
Modeling and Data Models:
Data
model:
A
data model is a collection of concepts that can be used to describe the structure
of a database
Data modeling, the first step in designing a database, refers to the process of creating a specific data model for a problem domain.
A model is an abstraction of a real
world object. A data model represents data structures and their
characteristics, relations, constraints and transactions.
Data modeling is an iterative process: we start with a simple understanding of the problem, refine the model as our understanding of the problem increases, and finally design a database in a specific database model.
Importance
of Data Models:
Data
Model can facilitate interaction among the designer, the application programmer
and the end user.
Applications
are used to transform data into information. But data are viewed in different
ways by different people.
For example, a manager and a clerk both work in the same company, but the manager has a wider view of company data than the clerk.
A company president has a universal view of the company's data.
Different managers view data differently in a company. The inventory manager is more concerned about inventory levels, while the purchasing manager is concerned about items and supplies.
Application programmers have yet another view of data, one concerned with data locations and formatting.
A house is a collection of rooms; if someone is going to build a house, they need an overall view, which is provided by the blueprint. Similarly, a sound data environment requires an overall database blueprint based on an appropriate data model.
When a good database blueprint is available, an application programmer's view of the data can differ from that of the managers and end users. When a good database blueprint is not available, problems are likely to ensue.
Data
Model basic building blocks:
The basic building blocks of data models are entities, attributes, relationships and constraints. An entity represents a real-world object such as a person or place.
For example, a customer entity type describes many different customers.
An attribute is a characteristic of an entity.
For example, the customer entity has attributes customer_no, customer_name, customer_address, etc. A relationship describes an association between entities. Data models use three types of associations: one-to-many, many-to-many and one-to-one.
One-to-many (1:M, 1..*): A painter paints many different paintings. Therefore, the database designer labels the relationship PAINTER PAINTS PAINTINGS as one-to-many.
Many-to-many (M:N, *..*): An employee may learn many job skills and each job skill may be learned by many employees. Therefore, the database designer labels the relationship EMPLOYEE LEARNS SKILLS as many-to-many (M:N).
One-to-one (1:1, 1..1): Each store manager manages only a single store. Therefore, the database designer labels the relationship MANAGER MANAGES STORE as one-to-one (1:1).
By using the business rules we can properly
identify entities, attributes, relationships and constraints.
Business rules:
A business rule is a description of a policy, procedure or principle within a business organization. Examples of business rules:
A customer may generate many invoices.
A training session cannot be scheduled for fewer than 10 employees or for more than 30 employees.
These business rules establish entities, relationships and constraints.
The first business rule establishes two
entities (customer, invoices) and a one-to-many relationship between these two
entities.
The second business rule establishes a constraint (no fewer than 10 people and no more than 30 people), two entities (EMPLOYEE, TRAINING) and a relationship between EMPLOYEE and TRAINING.
Discovering business rules:
The main sources of business rules are company managers, policy makers, department managers and written documents such as the company's procedures, standards or operations manuals. A faster and more direct source of business rules is direct interviews with the people concerned.
Translating
business rules into Data Model:
General rule: a noun in a business rule translates into an entity in the model, and a verb associating nouns translates into a relationship among the entities.
For example, the business rule "A customer may generate many invoices" contains two nouns (customer and invoices) and a verb (generate) that associates the nouns.
To properly identify the type of relationship, remember that relationships are bi-directional.
For example, in the business rule "A customer may generate many invoices", the relationship is one-to-many (1:M, 1..*). Customer is the "1" side and invoice is the "many" side.
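A minimal SQL sketch of this translation, assuming hypothetical CUSTOMER and INVOICE tables: each noun becomes a table, and the 1:M "generates" relationship becomes a foreign key on the many side.

CREATE TABLE customer (
    cust_no   INT PRIMARY KEY,
    cust_name VARCHAR(50)
);

-- "A customer may generate many invoices": the many side (invoice)
-- carries a foreign key pointing to the one side (customer).
CREATE TABLE invoice (
    inv_no   INT PRIMARY KEY,
    inv_date DATE,
    cust_no  INT REFERENCES customer(cust_no)
);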
Evolution of Data Models:

Generation | Time | Model | Example
First | 1960-70 | File system | VMS
Second | 1970 | Hierarchical and network data models | IMS, FOCUS, IDMS
Third | Mid 1970s-present | Relational data model | MS Access, Oracle
Fourth | Mid 1980s-present | Object-oriented model; extended relational model | Versant, Objectivity
Fifth | Present-future | XML | Oracle 10g
Hierarchical Data Model:
The hierarchical
data model is the oldest type of data model, developed by IBM in 1968.
This data model organizes the data in a tree-like structure, in which
each child node (also known as dependents) can have only
one parent node. The database based on the hierarchical data model
comprises a set of records connected to one another through links. The link is
an association between two or more records. The top of the tree structure
consists of a single node that does not have any parent and is called the root
node.
The hierarchical data model consists of a set of nested one-to-many and one-to-one associations.
In
hierarchical data model the relations are presented in the form of
tree-structure in which the root segment is kept at the top and further
branches emanate downwards from the root segment.
In this model the type of association can be one-to-one or one-to-many. This means that many-to-one associations are not permitted. This is equivalent to saying that multiple parents for a child segment are not permitted.
The above conceptual data model can be mapped in one of two ways, as shown below.
In alternative 1, the student file is kept at the root segment of the tree and the faculty file is kept at the bottom of the tree. By mapping the conceptual data model into the hierarchical data model, the following facts are observed.
The association from student to enrollment is one-to-many. This is mapped without any modifications.
The association from enrollment to subject is many-to-one. This is not permitted in the hierarchical data model. Hence it is modified into a one-to-one association.
The association from subject to faculty is many-to-one. This is not permitted in the hierarchical data model. Hence it is modified into a one-to-one association.
In alternative 1, while mapping the conceptual data model into the hierarchical data model, the many-to-one associations present at two levels are modified into one-to-one associations. These modifications will increase the data redundancy.
In alternative 2, the faculty file is kept at the root of the tree and the student file is kept at the bottom of the tree. While mapping the conceptual data model, the following facts are observed.
The
association from faculty to subject file is one-to-many. So it is mapped
without any modifications.
The association from subject to enrollment is many-to-one. This is not permitted in the hierarchical data model. Hence it is modified into a one-to-one association.
The association from enrollment to student is many-to-one. This is not permitted in the hierarchical data model. Hence it is modified into a one-to-one association.
Finally, the alternative with less redundancy should be selected for implementation.
In alternative 2, only the association between enrollment and student is changed; that is, only one association type is modified. When we compare alternative 2 with alternative 1, alternative 2 has less redundancy, so alternative 2 is implemented.
Advantages:
1.
It promotes data sharing.
2.
Parent/Child relationship promotes conceptual simplicity.
3.
Database security is provided and enforced by DBMS.
4.
Parent/Child relationship promotes data integrity.
5.
It is efficient with 1:M relationships.
Disadvantages:
Complex
implementation requires knowledge of physical data storage characteristics.
Changes
in structure require changes in all application programs.
There
are implementation limitations (no multi parent or M:N relationships).
There
is no data definition or data manipulation language in the DBMS.
There
is a lack of standards.
Network Data Model:
A network data model consists of a set of pairwise associations between the entities.
The network data model was created to improve database performance, to establish database standards and to represent complex relationships more effectively than the hierarchical data model.
To establish database standards, the Conference on Data Systems Languages (CODASYL) created the Database Task Group (DBTG). The DBTG defined standard specifications for database creation and data manipulation.
Schema: The schema provides an overall view of the database to the administrator.
Subschema: The subschema defines the portion of the database seen by the application programs.
Data Management Language: defines the environment in which data can be managed. The DBTG specified three language components:
A schema data definition language (DDL), which enables the database administrator to create the database.
A subschema DDL, which allows the application programs to define the database components that will be used by the application.
A data manipulation language (DML), to manipulate the data in the database.
In the network data model, the database is a collection of records in one-to-many relationships; unlike the hierarchical model, however, it allows a record to have more than one parent.
In a network database, a relationship is called a set. Each set contains two entities: one entity is the owner and the other entity is the member.
Sets

Set Name | Owner | Member
Ordered | Customer | Sales
Buyers | Book | Sales
Contribute | Author | Book
Catalog | Publisher | Book
Advantages:
1.
Conceptual simplicity is at least equal to that of the hierarchical model.
2.
It handles more relationship types, such as M:N and multi-parent.
3.
Data access is more flexible than in hierarchical and file system models.
4.
Data Owner/Member relationship promotes data integrity.
5.
There is conformance to standards.
6.
It includes data definition language (DDL) and data manipulation language (DML)
in DBMS
Disadvantages:
1.
System complexity limits efficiency—still
a navigational system.
2.
Navigational system yields complex implementation, application development, and
management.
3.
Structural changes require changes in all application programs.
Relational data model:
The relational model was introduced in 1970 by E.F. Codd. Its foundation is a mathematical concept known as a relation. A relation is composed of intersecting rows and columns. Each row in a relation represents a tuple. Each column represents an attribute.
The relational data model is
implemented through a Relational Database Management System (RDBMS).
Tables are related through the sharing of a common attribute. For example, consider the AGENT and CUSTOMER tables shown below.
Agent

Agent_Code | Agent_Name | Agent_Address | Agent_PhoneNo | Agent_AreaCode

Customer

Cust_No | Cust_name | Cust_Address | Phone_No | Agent_Code
By matching the Agent_Code in the CUSTOMER table with the Agent_Code in the AGENT table, we can find the agent details for that customer.
The relationship types one-to-one, one-to-many and many-to-many can all be represented in a relational data model.
A relationship diagram is a
representation of entities, the attributes within the entities and the
relationship between the entities.
[Relationship diagram: AGENT (Agent_Code, Agent_Name, Agent_Address, Agent_PhoneNo, Agent_AreaCode) linked 1..∞ to CUSTOMER (Cust_No, Cust_name, Cust_Address, Phone_No, Agent_Code) via Agent_Code]
In
the above diagram, the relationship is one-to-many. The symbol ∞ indicates
many.
The CUSTOMER represents the "many" side, because an AGENT can have many CUSTOMERS.
The AGENT represents the "1" side, because each CUSTOMER has only one AGENT.
The languages that support the relational data model are powerful and flexible, which is one reason the relational data model is popular. FoxPro, dBase, MS Access and SQL-based systems are relational database software. This software allows the user to specify what must be done without specifying how it must be done.
A SQL-based database application involves 3 parts: 1. the end-user interface, 2. a set of tables stored in the database, and 3. the SQL engine.
Advantages:
1.
Structural independence is promoted by the use of independent tables. Changes
in a table’s structure do not
affect data access or application programs.
2.
Tabular view substantially improves conceptual simplicity, thereby promoting
easier database design, implementation, management, and use.
3.
Ad hoc query capability is based on SQL.
4.
Powerful RDBMS isolates the end user from physical-level details and improves
implementation and management simplicity.
Disadvantages:
1.
The RDBMS requires substantial hardware and system software overhead.
2.
Conceptual simplicity allows relatively untrained people to use a good system poorly; if unchecked, this may produce the same data anomalies found in file systems.
3.
It may promote “islands of
information” problems as
individuals and departments can easily develop their own applications.
Entity relation model:
Peter Chen first introduced the E.R.data model in 1976; it was the
graphical representation of entities and their relationship in a database.
E.R. models are normally represented
in an entity relationship diagram.
The E.R.Model is based on the following
components.
Entity: entities are real-world objects; in a business rule they correspond to nouns. An entity is represented by a rectangle.
e.g. painter, employee, skill.
Attribute:
Attributes are the characteristics of entities.
e.g. Empno, Empname, Empaddress etc.
Relationships:
A relationship describes association among the entities. There are three types
of relationships, one-to-many, many-to-many and one-to-one.
There are two types of ER notations for representing the different types of relationships: Chen notation and Crow's Foot notation.

[Diagram: relationship types in Chen notation and Crow's Foot notation]
In Chen notation, entities are represented by rectangles and entity names are written in capital letters at the centre of the rectangle. Relationships are represented by a diamond, connected to the entities by relationship lines; the relationship name is written inside the diamond.
In the Crow's Foot notation, the name derives from the three-pronged symbol used to represent the "many" side of a relationship. In this notation, "one" is represented by a short line segment and "many" is represented by the crow's foot. The relationship name is written above the relationship line. The relationships may also be shown vertically.
Advantages:
1.
Visual modeling yields exceptional conceptual simplicity.
2.
Visual representation makes it an effective communication tool.
3.
It is integrated with dominant relational model.
Disadvantages:
1.
There is limited constraint representation.
2.
There is limited relationship representation.
3.
There is no data manipulation language.
4.
Loss of information content occurs when attributes are removed from entities to
avoid crowded displays. (This limitation has been addressed in subsequent
graphical versions)
Object Oriented Model:
In the Object Oriented Data Model
(OODM) both data and their relationships are contained in a single structure
known as an Object.
In the object-oriented data model, an object contains, along with its data, the operations that can be performed on it, such as changing data values, finding a specific data value, and printing data values.
The
OODM is based on the following components.
An
object is an abstraction of a real world entity.
Attributes describe the properties of an object.
E.g. a Person object contains the attributes name, social security number, date of birth, etc.
A class is a collection of similar objects with shared attributes and methods. The methods are used to change data values, find data values and print data values in the objects.
Classes are organized in a class hierarchy. The class hierarchy resembles an upside-down tree, in which each class has only one parent.
One of the properties of the object-oriented data model is inheritance. Through inheritance, subclasses inherit attributes and methods from their superclass.
E.g.
customer and employee sub classes of the person super class. Customer and
employee will inherit all attributes and methods from person.
Object-oriented data models are drawn using Unified Modeling Language (UML) class diagrams. The UML class diagrams are used to represent data and their relationships.
For example, consider an invoicing problem: invoices are generated by customers, each invoice contains one or more lines, and each line represents an item purchased by the customer.
The following diagram shows the object representation, the UML class diagram and the ER model for the invoice problem.
The object representation of the invoice includes all the related objects within the same object. The 1 next to the customer object indicates that each invoice is related to one customer. The M next to the line object indicates that each invoice contains a number of lines.
The UML class diagram uses 3 separate
classes (customer, invoice and line) and two relationships to represent this
problem.
The E.R Model also uses the 3 separate
entities and two relationships to represent the invoice problem.
Advantages:
1.
Semantic content is added.
2.
Visual representation includes semantic content.
3.
Inheritance promotes data integrity.
Disadvantages:
1.
Slow development of standards caused vendors to supply their own enhancements,
thus eliminating a widely accepted standard.
2.
It is a complex navigational system.
3.
There is a steep learning curve.
4.
High system overhead slows transactions
CHAPTER-3
Relational
Database Model
In the relational database model, records are stored in tables. The relational data model is easier to understand than the hierarchical and network data models. The relational data model provides a logical view of the data and of its relationships with other data.
Tables
and Characteristics:
A table is composed of rows and columns. Each row of the table is called a tuple. Each column of the table is called an attribute. A table is also called a relation. A table contains a group of related entities.
Characteristics
of a Table:
A
table is composed of rows and columns.
Each
row of the table represents one entity (tuple) in the entity set.
Each
column represents an attribute and each column has distinct name.
Each
cell represents a single value.
All values in a column must have the same data format.
Each column has a specified range of values, which is called the domain.
The
order of the rows and columns is immaterial to the DBMS.
Each table must have an attribute or group of attributes that uniquely identifies each row.
The following STUDENT table shows the above characteristics.
The STUDENT table is composed of 8 tuples (rows) and 6 attributes (columns).
In the student table the primary key is STU_NUM (student number); using this attribute we can identify a record uniquely in the student table.
Keys
Key field or Primary Key:
A key consists of one or more attributes that determine other attributes. For example, an invoice number identifies attributes such as the invoice date, customer details, item details and amount.
The statement "A DETERMINES B" indicates that if we know the value of attribute A, then we can determine the value of B.
Eg: In the student table, if we know the value of STU_NUM then we can determine STU_LNAME, STU_FNAME and STU_INIT. This can be represented in the following way:
STU_NUM → STU_LNAME, STU_FNAME, STU_INIT.
Functional
Dependency:
The term functional dependency can be defined as follows: "A DETERMINES B" if each value in column A determines only one value in column B.
Eg: STU_NUM functionally determines STU_LNAME, but STU_LNAME does not functionally determine STU_NUM.
STU_NUM → STU_LNAME.
Composite
Key or Fully Functional Dependency:
A key may be composed of more than one attribute; such a multi-attribute key is known as a composite key.
Eg: STU_LNAME, STU_FNAME, STU_INIT, PHONE_NO → STU_CLASS, STU_HOURS.
The combination of STU_LNAME, STU_FNAME, STU_INIT and PHONE_NO determines STU_CLASS and STU_HOURS. If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, then attribute B is said to be fully functionally dependent on A.
Super Key: A subset of attributes that uniquely identifies a tuple (row) in a relation (table).
Eno | Ename | Salary | Dept_no | Voter_Id
1 | Raju | 20000 | 10 | V12345
2 | Prasad | 40000 | 10 | V12222
3 | Raju | 20000 | 20 | V45666
{Eno } :No two rows have
same Eno ( Eno uniquely identifies a tuple(row) in a relation)
{Ename}: Two employees may have the same name, so Ename alone is not a super key.
{Voter_id} :No two rows
have same Voter_id (Voter_id uniquely identifies a tuple(row) in a relation)
{Eno, Ename }: Eno itself
uniquely identifies a tuple(row) in a relation, hence combination of Eno and
Ename also uniquely identifies a tuple(row) in a relation
Eg
: In a STUDENT table
STUDENT{STU_NUM,STU_LNAME,STU_FNAME,STU_INIT,STU_DOB,STU_HRS,STU_CLASS}
Super keys can be identified as follows.
{STU_NUM}
{STU_NUM,
STU_LNAME}
{STU_NUM,
STU_LNAME, STU_INIT}
*Minimal Super Key (Key) :
Definition : A Minimal
Super Key (Key) K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey any more.
{Eno}: is a minimal super key (a super key which has only one attribute is a minimal super key).
{Voter_id} : is Minimal
Super Key
{Eno, Ename}: is not a minimal super key (removing Ename from {Eno, Ename} leaves {Eno}, which is still a super key); hence {Eno, Ename} is not a minimal super key.
Candidate Key :
Definition : If a relation schema has more than one key
(Minimal Super Key) then each of them is called as candidate key.
One of the candidate keys
is arbitrarily designated to be the primary key, and the others are called
secondary keys(or Alternative key).
A key formed by combining two or more columns is called a composite key.
Primary Key :
Definition : Set of
attributes of a relation which uniquely identifies a tuple in a relation.
Note :
A Relation(table) can have
many Superkeys, and also many Minimal Superkeys.
If a relation (table) has more than one minimal superkey, each of them is called a candidate key.
One of the candidate keys
is arbitrarily designated to be the primary key, and the others are called
secondary keys(or Alternative key).
Key hierarchy: superkeys ⊇ candidate keys (minimal superkeys) ⊇ primary key.
The primary key does not allow duplicates or NULL values.
Foreign
Key:
A foreign key is an attribute whose values
match the primary key values in the related table.
Foreign Key (Referential
Integrity Constraint) :
Referential Integrity
Constraint is used to specify interdependencies between relations. This
constraint specifies a column or list of columns as a foreign key of the
referencing table.
A foreign key means that the values of a column in one table must also appear in a column of another table. The foreign key in the child table generally references a primary key in the parent table. The referencing table is called the child table and the referenced table is called the parent table.
A self-referential integrity constraint means a column in one table references the primary key column(s) of the same table.
EMP (referencing relation)

Eno | Ename | Salary | Dept_no (FK) | Voter_Id
1 | Raju | 20000 | 10 | V12345
2 | Ravi | 40000 | 10 | V12222
3 | Raju | 25000 | 20 | V45666
In EMP table Eno is Primary
Key. (Duplicates and Null values are not allowed in Eno)
In the EMP table Dept_no is a foreign key which references the Dept_no column of the DEPT table. (A value for Dept_no in the EMP table is accepted only if it exists in the Dept_no column of the DEPT table.)
DEPT (referenced relation)

Dept_no (PK) | Dname | Dloc
10 | MTech | BVRM
20 | MBA | HYD
30 | MCA | BVRM
In DEPT table Dept_no is
Primary key.
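A hedged SQL sketch of this referential integrity constraint, re-creating the EMP and DEPT tables above (column types are assumptions):

CREATE TABLE dept (
    dept_no INT PRIMARY KEY,
    dname   VARCHAR(20),
    dloc    VARCHAR(20)
);

CREATE TABLE emp (
    eno      INT PRIMARY KEY,               -- no duplicates, no NULLs
    ename    VARCHAR(30),
    salary   DECIMAL(10,2),
    dept_no  INT REFERENCES dept(dept_no),  -- foreign key to DEPT
    voter_id VARCHAR(10) UNIQUE
);

INSERT INTO dept VALUES (10, 'MTech', 'BVRM');

-- Accepted: department 10 exists in DEPT
INSERT INTO emp VALUES (1, 'Raju', 20000, 10, 'V12345');

-- Rejected by the DBMS: department 99 does not exist in DEPT
-- INSERT INTO emp VALUES (2, 'Ravi', 40000, 99, 'V12222');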
Secondary
Key:
The secondary key is defined as a key that is used strictly for data retrieval purposes.
Example:
In the customer table the data retrieval can be facilitated when CUST_LAST and
CUST_PHONE number are used.
Integrity
Rules:
Integrity
rules are used in the database design.
Entity
Integrity: All primary key entries are unique and no part of the primary key
may be NULL.
Example:
In Agent table the agent_code is primary key and this column is free from null
values.
Referential
Integrity: A foreign key is an attribute whose values match the primary key
values in the related table.
Example: The vendor_code is the primary key in the VENDOR table, and it occurs as a foreign key in the PRODUCT table.
NOT NULL: a NOT NULL constraint placed on a column ensures that every inserted row has a value in that column.
Unique: a UNIQUE constraint placed on a column ensures that every inserted row has a unique value in that column (no duplication).
Relational Set Operators:
The
relational set operators are SELECT, PROJECT, JOIN, INTERSECT, UNION,
DIFFERENCE, PRODUCT, DIVIDE.
UNION:
The
UNION operator combines all rows from two tables, excluding duplicate rows. The
tables must have the same structure.
Consider two tables PRODUCT1 and PRODUCT2 with the same structure.
Query: PRODUCT1 UNION PRODUCT2
INTERSECT:
The INTERSECT operator gives only the rows that appear in both tables. The tables must also have the same structure.
DIFFERENCE:
The DIFFERENCE operator gives all rows in one table that are not found in the other table.
Consider the STUDENT and EMPLOYEE tables.
QUERY: STUDENT MINUS EMPLOYEE
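A hedged SQL sketch of these three set operators, assuming hypothetical STUDENT and EMPLOYEE tables with the same structure (standard SQL spells DIFFERENCE as EXCEPT; Oracle calls it MINUS):

-- UNION: rows from either table, duplicates removed
SELECT name FROM student
UNION
SELECT name FROM employee;

-- INTERSECT: rows appearing in both tables
SELECT name FROM student
INTERSECT
SELECT name FROM employee;

-- DIFFERENCE: rows in STUDENT that are not in EMPLOYEE
SELECT name FROM student
EXCEPT            -- MINUS in Oracle
SELECT name FROM employee;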
PRODUCT:
The PRODUCT operator gives all possible pairs of rows from two tables.
SELECT:
The SELECT operator gives all rows that satisfy a given condition.
SQL> SELECT * FROM product WHERE P_CODE = 311452;
PROJECT:
The PROJECT operator gives all values for selected attributes; in other words, PROJECT yields a vertical subset of a table.
SQL> SELECT p_descript FROM product;
JOIN:
The JOIN operator combines rows from 2 or more tables. There are several types
of joins.
Natural
Join
A
Natural Join joins tables by selecting the rows with common values in their
common attributes. A natural join is the result of a three-stage process:
1). First, a PRODUCT of the two tables is created.
2). Second, a SELECT is performed on the output of Step 1 to yield only the rows for which the values of the common attribute (here, the vendor code) are equal. The common columns are referred to as the join columns.
3). Third, a PROJECT is performed on the results of Step 2 to yield a single copy of each attribute, thereby eliminating the duplicate column.
The final outcome of a natural join is a table that does not include unmatched pairs and provides only a single copy of each match.
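A hedged SQL sketch of the natural join, using the PRODUCT and VENDOR tables shown in the next subsection (the join column is assumed to be the vendor code):

SELECT * FROM product NATURAL JOIN vendor;

-- Equivalent explicit form, naming the join column once:
SELECT p.prod_code, p.prod_descp, p.vend_code, v.vend_name
FROM product p
JOIN vendor v ON p.vend_code = v.vend_code;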
Equijoin:
An equijoin links tables on the basis of an equality condition that compares specified columns of each table. In an equijoin the comparison operator "is equal to" is used in the condition.
Or:
An inner join produces only the set of records that match in both Table A and Table B.
Product:

Prod_code | Prod_Descp | Vend_code
123 | Tyres | V101
124 | Tubes | V102
125 | Bolts | -------

Vendor:

Vend_code | Vend_name
V101 | Ravi
V102 | Ram
V103 | Krishna
Result:

Prod_code | Prod_Descp | Vend_code | Vend_name
123 | Tyres | V101 | Ravi
124 | Tubes | V102 | Ram
Outer Join:
In an outer join, the matched pairs of records are returned, and any unmatched values from the other table are left as NULL.
Left Outer Join:
A left outer join returns all rows from the left table along with the matching records; where there is no match, the columns from the right table are NULL.
Right Outer Join:
A right outer join returns all rows from the right table along with the matching records; where there is no match, the columns from the left table are NULL.
Full
Outer Join:
Full
outer join produces the set of all records in Table A and Table B, with
matching records from both sides where available. If there is no match, the
missing side will contain null.
Product:

Prod_code | Prod_Descp | Vend_code
123 | Tyres | V101
124 | Tubes | V102
125 | Bolts | -------

Vendor:

Vend_code | Vend_name
V101 | Ravi
V102 | Ram
V103 | Krishna
Result:

Prod_code | Prod_Descp | Vend_code | Vend_name
123 | Tyres | V101 | Ravi
124 | Tubes | V102 | Ram
125 | Bolts | ------- | -------
------- | ------- | V103 | Krishna
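A hedged SQL sketch of the outer joins above, reusing the PRODUCT and VENDOR tables (the dashes in the result table stand for NULLs; FULL OUTER JOIN is standard SQL but not supported by every engine):

-- Full outer join: all products and all vendors
SELECT p.prod_code, p.prod_descp, v.vend_code, v.vend_name
FROM product p
FULL OUTER JOIN vendor v ON p.vend_code = v.vend_code;

-- LEFT JOIN would keep all products (Bolts appears with NULL vendor);
-- RIGHT JOIN would keep all vendors (Krishna appears with NULL product).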
DIVIDE:
The DIVIDE operator uses a single-column table as the divisor and a two-column table as the dividend. The output of the DIVIDE operator is a single column containing the values of column A from the dividend table, taken from the rows whose column-B values match every value in the divisor.
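Relational DIVIDE has no direct SQL keyword; a common workaround is a doubly nested NOT EXISTS. A sketch, assuming a hypothetical dividend table enrolled(stu_no, sub_code) and divisor table required(sub_code):

-- Students enrolled in EVERY required subject
SELECT DISTINCT e.stu_no
FROM enrolled e
WHERE NOT EXISTS (
    SELECT * FROM required r
    WHERE NOT EXISTS (
        SELECT * FROM enrolled e2
        WHERE e2.stu_no = e.stu_no
          AND e2.sub_code = r.sub_code
    )
);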
Data
Dictionary:
The
Data Dictionary provides a description of all tables in the database. The Data
Dictionary contains attribute names and characteristics of each table in the
system. The data dictionary contains meta data.
System
Catalog:
Like the data dictionary, the system catalog contains metadata. The system catalog describes table names, the table creator, the creation date, the number of columns in each table, the data types of the columns, authorized users and access privileges.
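A hedged sketch of querying the catalog through the standard SQL information_schema views (names differ across products; Oracle, for instance, exposes USER_TABLES and USER_TAB_COLUMNS instead):

-- Table names known to the catalog
SELECT table_name
FROM information_schema.tables;

-- Columns and data types of one table
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'emp';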
Relationships
within Relational Databases
The
relationships are classified into One-to-Many, One-to-One and Many-to-Many.
One-to-Many
(1:M) relationship
One entity of one table is associated with a number of entities of another table.
Consider "PAINTER paints PAINTINGS"; the ER model and implementation are shown below.
One-to-One
In this relationship one entity of one table is associated with one entity of another table, and vice versa.
Consider PROFESSOR and DEPARTMENT. The ER data model and implementation are shown below.
Many-to-Many (M:N) relationship
In this relationship each and every entity of one table may be associated with one or more entities of another table, and vice versa. This relationship is not supported directly in the relational environment. The many-to-many relationship can be converted into two one-to-many relationships.
Consider the student and subjects example. The ER model and implementation are shown below.
In the above, each and every entity of the student table is associated with one or more entities of the subject table, because each student will opt for one or more subjects in a semester.
Each and every entity of the subject table is associated with one or more entities of the student table, because each subject will be opted for by more than one student in a semester. Many-to-many associations are not supported.
Hence the relationship is converted into two one-to-many associations, as shown below, by introducing an intermediate table in which the common data of the original tables are stored.
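A hedged SQL sketch of this conversion, with hypothetical STUDENT, SUBJECT and ENROLL tables; the intermediate ENROLL table carries a foreign key to each original table, turning the single M:N relationship into two 1:M relationships:

CREATE TABLE student (
    stu_no   INT PRIMARY KEY,
    stu_name VARCHAR(50)
);

CREATE TABLE subject (
    sub_code VARCHAR(10) PRIMARY KEY,
    sub_name VARCHAR(50)
);

-- Intermediate table: one row per student-subject pair
CREATE TABLE enroll (
    stu_no   INT REFERENCES student(stu_no),
    sub_code VARCHAR(10) REFERENCES subject(sub_code),
    PRIMARY KEY (stu_no, sub_code)
);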
Index:
An index is composed of an index key and a set of pointers. Each key points to the location of the data identified by the key.
Example: Suppose we want to look up all of the paintings created by a given painter. Without an index, we must read each row in the PAINTING table. If we index the PAINTING table and use painter number as the index key, we need only look up the appropriate painter number in the index and follow the matching pointers.
DBMSs use indexes for many
different purposes. You just learned that an index can be used to retrieve data
more efficiently. But indexes can also be used by a DBMS to retrieve data
ordered by a specific attribute or attributes.
For example, creating an index on a customer's last name will allow you to retrieve the customer last names in alphabetical order.
Indexes play an important role in a DBMS for the implementation of primary keys. When we define a table's primary key, the DBMS automatically creates a unique index on the primary key column.
When we declare customer_code to be the primary key of the customer table, the DBMS automatically creates a unique index on that attribute. A unique index is an index in which the index key can point to at most one row.
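A hedged sketch of index creation in SQL (table, column and index names are illustrative):

-- Secondary index: speeds up lookups and alphabetical retrieval by last name
CREATE INDEX idx_cust_lname ON customer (cust_lname);

-- Unique index: each index key may point to at most one row
CREATE UNIQUE INDEX idx_cust_code ON customer (cust_code);

-- Declaring a PRIMARY KEY makes the DBMS create such a unique index implicitly.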
Codd's Relational Database Rules
Dr. E. F. Codd published a list of 12 rules to define a relational database.
Rule
1: Information:
All information in a relational database must be logically represented as column values in rows within tables.
Rule
2: Guaranteed Access:
Every
value in a table is guaranteed to be accessible through a combination of table
name, primary key and column name.
Rule
3: Systematic Treatment of NULLs:
NULL
must be represented and treated in a systematic way (Independent of data type).
Rule 4: Dynamic online catalog based on the relational model:
The
metadata must be stored as ordinary data in a table within the database. Such
data must be available to authorized users.
Rule
5: Comprehensive data sub language:
The relational database may support many languages. However, it must support data definition, view definition, data manipulation, integrity constraints, authorization and transaction management.
Rule
6: View Update:
Any view that is theoretically updatable must be updatable through the system.
Rule
7: High level Insert, Update and Delete:
The database must support set-level inserts, updates and deletes.
Rule
8: Physical data independency:
Application
programs are logically unaffected when storage structures are changed.
Rule
9: Logical data independency:
Application
programs are logically unaffected when changes are made to the table
structures.
Rule
10: Integrity Independency:
All
relational integrity constraints must be definable in the relational language
and stored in the system catalogs.
Rule
11: Distribution Independency:
The
end users and application programs are unaffected by the data locations.
Rule 12: Nonsubversion:
If the system supports low-level access to the data, there must not be a way to bypass the integrity rules of the database.
UNIT-2 CHAPTER-4
Entity
Relationship Modeling
Entity relationship modeling results in an entity relationship diagram (ERD). The ERD represents the conceptual database. The entity relationship diagram contains mainly three components: entity, attribute and relationship.
Entity:
An Entity represents a real world object.
Eg:
Student, Customer, Employee, Subject, Faculty, and Product
In both Chen and Crow's Foot notation an entity is represented by a rectangle containing the entity name. Entity names are usually written in capital letters.
Attributes: An attribute represents a characteristic or property of an entity.
Eg:
stu_name , stu_address
In Chen notation the attributes are represented by ovals connected to the entity rectangle with a line. Each oval contains the name of the attribute.
In the Crow's Foot notation the attributes are written in an attribute box below the entity rectangle.
Required and Optional Attributes:
A required attribute is an attribute that must have a value. An optional attribute is an attribute that may be left without a value.
In the Crow's Foot notation the required attributes are shown in boldface.
Domain: Attributes have a domain. A domain is the set of possible values for a given attribute.
Eg: The domain for the gender attribute consists of only two possibilities, namely male and female.
Identifiers: Identifiers consist of one or more attributes that uniquely identify an entity in the entity set. In the relational model, such identifiers are mapped to primary keys of tables. Identifiers are underlined in the ER diagram.
For Eg. STUDENT(Stu_no, Stu_fname, Stu_Lname, Stu_email);
Composite Identifier: A composite identifier is a primary key that contains more than one attribute.
For Eg. each student entity could be identified by a composite primary key composed of the combination of Stu_Lname and Stu_Fname instead of Stu_no.
Simple and Composite Attributes: Attributes are classified as simple attributes or composite attributes.
A simple attribute is an attribute that cannot be subdivided.
Eg: age, first name, gender.
A composite attribute is an attribute that can be further subdivided to give additional attributes.
Eg: address can be subdivided into street, city and pincode.
Single-valued Attribute: A single-valued attribute is an attribute that can have only one value.
Eg: STU_Number in the STUDENT table.
Multi-valued Attribute: A multi-valued attribute is an attribute that can have many values.
Eg: a car's color may consist of many colors.
In Chen notation, multi-valued attributes are shown by a double line connecting the attribute to the entity.
Note: The crows foot notation does not identify
multivalued attributes.
Derived Attribute: A derived attribute is an attribute whose value is calculated from other attributes.
Eg: an employee's age can be computed as the difference between the current date and the employee's date of birth.
A derived attribute is indicated in Chen notation by a dashed line connecting the attribute to the entity.
Relationship:
The Relationship is an Association between entities.
Eg:
A professor teaches a class.
There are three different types of relationships:
One-to-One (1:1)
One-to-Many (1:M)
Many-to-Many (M:N)
Connectivity
and cardinality:
Connectivity describes the relationship classification (1:1, 1:M, M:N). Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity.
In the ERD the cardinality is indicated by placing the appropriate numbers beside the entities, using the format (x, y).
The first value represents the minimum
number of associated entities. The second value represents the maximum number
of associated entities.
For
Eg:
The cardinality (1,1) indicates that each class is taught by one and only one professor. The cardinality (1,4) indicates that one occurrence of the PROFESSOR entity is associated with at least one and at most four occurrences of the CLASS entity.
Existence
Dependency: An entity is said to be existence dependent when it can exist only while it is associated with another related entity.
Weak
Relationship
A weak relationship exists if the primary key of the related entity does not contain a primary key component of the parent entity.
COURSE(CRS_CODE,
DEPT_CODE, CRS_DESCRIPTION, CRS_CREDIT)
CLASS(CLASS_CODE,
CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)
The relationship between COURSE and CLASS is weak because CLASS_CODE alone is the primary key of the CLASS entity, while CRS_CODE in CLASS is only a foreign key. In this example the CLASS primary key did not inherit a primary key component from the COURSE entity.
Strong Relationship
A strong relationship exists if the primary key of the related entity contains a primary key component of the parent entity. The relationship between COURSE and CLASS is strong when the CLASS entity's composite primary key is composed of CLASS_CODE + CRS_CODE; the CLASS primary key then inherits a primary key component from the COURSE entity. In the Crow's Foot notation a strong (identifying) relationship is drawn with a solid line.
Weak Entity
In a weak entity the primary key is partially or totally derived from the parent entity in the relationship. For example, the DEPENDENT entity's key is inherited from the EMPLOYEE entity as shown below.
Employee (Emp_no, Emp_LName, Emp_FName, Emp_Init, Emp_DOB)
Dependent (Emp_no, Depn_No, Depn_LName, Depn_FName, Depn_DOB)
In the Chen notation a weak entity is represented by a double-walled rectangle.
Strong Entity
In a strong entity the primary key is not partially or totally derived from the parent entity in the relationship. For example, in the COURSE and CLASS relationship, if the CLASS table's primary key is class_code, which is not derived from the COURSE parent entity, then the entity CLASS is a strong entity.
Class(class_code, course_code, class_desc, class_time, prof_code)
If, on the other hand, the CLASS primary key is composed of class_code and course_code, part of it is derived from the COURSE parent entity, and the entity CLASS is then a weak entity.
Class(class_code, course_code, class_desc, class_time, prof_code)
Relationship Participation:
Relationship participation between entities is either optional or mandatory.
Optional
participation
Optional participation means that an entity occurrence does not require a corresponding entity occurrence in a particular relationship.
Eg: In the "COURSE generates CLASS" relationship, an entity occurrence in the COURSE table does not necessarily require a corresponding entity occurrence in the CLASS table.
In the Crow's Foot notation an optional relationship between entities is shown by drawing a small circle on the side of the optional entity. The minimum cardinality is zero for the optional entity.
(Here CLASS is optional to COURSE.)
Mandatory
participation means that one entity occurrence requires a corresponding entity
occurrence in a particular relationship.
Eg 1: In the "COURSE generates CLASS" relationship, an entity occurrence in the COURSE table necessarily requires a corresponding entity occurrence in the CLASS table.
In the Crow's Foot notation there is no separate symbol for the mandatory entity. The minimum cardinality is one for the mandatory entity.
(Here the COURSE to CLASS relationship is mandatory.)
Eg 2: In the "PROFESSOR teaches CLASS" relationship, it is possible for a professor not to teach a class; therefore CLASS is optional to PROFESSOR. On the other hand, a class must be taught by a professor; therefore PROFESSOR is mandatory to CLASS.
The cardinality (1,1) represents that each class is taught by one professor. The cardinality (0,3) indicates that a professor may teach no classes or up to three classes.
The following table shows the various cardinalities that are supported by the Crow's Foot notation.
Relationship Degree
A
Relationship Degree indicates the number of entities associated with a
relationship.
Unary Relationship: A unary relationship exists when an association is maintained within a single entity.
Eg: An employee within the EMPLOYEE entity is the manager of one or more employees within that entity.
Binary
Relationship: A binary relationship exists when two entities are associated.
Eg: A professor teaches one or more classes.
Ternary
Relationship: A ternary relationship exists when three entities are associated.
Eg: A doctor writes one or more prescriptions.
A patient may receive one or more prescriptions.
A drug may appear in one or more prescriptions.
Recursive Relationship: A recursive relationship is one in which a relationship exists between occurrences of the same entity set.
There are three types of recursive relationships:
One-to-One (1:1)
One-to-Many (1:M)
Many-to-Many (M:N)
One to One: A one-to-one unary relationship may be expressed by "an employee may be married to one and only one other employee."
One to Many: A one-to-many unary relationship may be expressed by "an employee may manage many employees."
Many to Many: A many-to-many unary relationship may be expressed by "a course may be a prerequisite to many other courses."
Associative Entities: An associative entity is used to implement a many-to-many relationship between entities. The associative entity is composed of the primary keys of each of the entities to be connected, as the sketch below shows.
Example: In the Crow's Foot notation the relationships between the associative entity and its parent entities are shown as strong (identifying) relationships.
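A minimal SQL sketch of such an associative (bridge) entity, assuming STUDENT and CLASS tables already exist with the keys named below:

CREATE TABLE ENROLL (
    Stu_No       INTEGER NOT NULL,
    Class_Code   INTEGER NOT NULL,
    Enroll_Grade CHAR(1),
    PRIMARY KEY (Stu_No, Class_Code),                          -- composed of both parent keys
    FOREIGN KEY (Stu_No)     REFERENCES STUDENT (Stu_No),
    FOREIGN KEY (Class_Code) REFERENCES CLASS (Class_Code)
);

The composite primary key turns the single M:N relationship into two 1:M relationships, one from each parent to ENROLL.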
Developing an ER diagram:
The process of database design is an iterative process rather than a sequential process. Building an ER diagram usually involves the following activities.
Create a detailed description of the operations of the organization.
Identify the business rules from the description.
Identify entities, attributes and relationships from the business rules.
Develop the initial ERD.
Identify the attributes and primary keys.
Revise and review the ERD.
During the review process additional entities, attributes and relationships will be uncovered. Therefore the basic ERD will be modified to incorporate the newly discovered components.
Eg: Let us begin with initial interviews with the Tiny College administrators. The interview process yields the following business rules.
Tiny College is divided into several schools: a school of business, a school of arts and sciences, a school of education, and a school of applied sciences. Each school is administered by a dean who is a professor.
Each
school is composed of several departments.
Each
department may offer several courses.
Each course may generate several classes.
Each department may have many professors; one and only one of those professors chairs the department, and no professor is required to accept the chair position.
Each professor may teach several classes, and a professor may teach no classes at all.
A student may enroll in several classes, and each class contains several students. STUDENT is optional to CLASS in this many-to-many relationship. The many-to-many relationship must be divided into two one-to-many relationships through the ENROLL associative entity.
Each
department has several students.
Each student has an adviser in his or her department; each adviser advises several students. An adviser is also a professor, but not all professors advise students.
A building contains rooms, and a room is used for classes.
The following diagram shows the Crow's Foot ERD for Tiny College.
Database Design Challenges: Conflicting Goals
Database designers often must make design compromises that are triggered by conflicting goals, such as adherence to design standards (design elegance), processing speed, and information requirements.
Design standards:
The database design must conform to design standards. Such standards have
guided you in developing logical structures that minimize data redundancies.
In
short, design standards allow you to work with well-defined components and to
evaluate the interaction of those components with some precision.
Processing
speed:
In many organizations, particularly those generating large numbers of
transactions, high processing speeds are often a top priority in database
design. High processing speed means minimal access time. If the focus is on
data-retrieval speed, you might also be forced to include derived attributes in
the design.
Information requirements:
The quest for timely information might be the focus of database design.
Complex information requirements may dictate data transformations, and they
may expand the number of entities and attributes within the design.
Therefore, the database may have to sacrifice
some of its “clean” design structures and/or some of its high transaction speed
to ensure maximum information generation.
A
design that meets all logical requirements and design conventions is an
important goal.
However,
if this perfect design fails to meet the customer’s transaction speed and/or
information requirements, the designer will not have done a proper job from the
end user’s point of view.
Compromises
are a fact of life in the real world of database design.
Even while focusing on the entities, attributes, relationships, and
constraints, the designer should begin thinking about end-user requirements
such as performance, security, shared access, and data integrity.
Finally,
prepare the document! Put all design activities in writing. Then review what
you’ve written.
Chapter-5
Normalization of Database Tables
Database
Tables & Normalization:
In the database design process, the table is the basic building block. The ER model gives a good table structure, but it is still possible to create a poor table structure even within a good database design.
Def:
Normalization is the analysis of functional dependencies between the attributes of a relation. It reduces a complex user view into a set of small, stable subgroups of fields.
The normalization process is used to
create a good table structure to minimize data redundancy.
Normalization works through a series of stages called normal forms.
The
first three stages are
First
Normal Form(1NF)
Second
Normal Form(2NF)
Third
Normal Form(3NF)
Business Databases: For business databases it is usually sufficient to normalize to 2NF or 3NF. The higher stages are
Boyce-Codd Normal Form (BCNF)
Fourth
Normal Form(4NF)
Fifth
Normal Form (5NF)
Normalization is very important in database design. Generally, the higher the normal form, the more relational join operations are required to produce a specific output. Therefore we occasionally choose to denormalize some portions of the database to increase efficiency.
Denormalization produces a lower normal form; i.e., 3NF will be converted into 2NF, or 2NF will be converted into 1NF.
The
need for Normalization:
Consider the database activities of a construction company that manages several building projects. Each project has its own project number, project name, employees assigned to it, and so on. Each employee has an employee number, employee name, job classification, and so on.
PROJ_NUM | PROJ_NAME | EMP_NUM | EMP_NAME | JOB_CLASS | CHG_HOUR | HOURS | TOTAL_CHARGE
15 | Evergreen | 103 | June E. Arbough | Elect. Engineer | 84.50 | 23.8 | 2011.1
   |           | 101 | Swetha | Database Designer | 105.00 | 19.4 | 2037
   |           | 105 | Lakshmi | Database Designer | 105.00 | 35.7 | 3748.5
   |           | 106 | Durga | Programmer | 35.75 | 12.6 | 450.45
   |           | 102 | Ram | Systems Analyst | 96.75 | 20.8 | 2012.4
Subtotal: 10259.45
18 | Amber Wave | 114 | Harika | Applications Designer | 48.10 | 25.6 | 1231.36
   |            | 118 | Ganesh | General Support | 18.36 | 45.3 | 831.708
   |            | 104 | Sri | Systems Analyst | 96.75 | 32.4 | 3134.7
   |            | 112 | Hari | DSS Analyst | 45.95 | 44.0 | 2021.8
Subtotal: 7219.568
22 | Rolling Tide | 105 | Sruthi | Database Designer | 105.00 | 64.7 | 6793.5
   |              | 104 | Raju | Systems Analyst | 26.75 | 48.4 | 1294.7
   |              | 113 | Ravi | Applications Designer | 48.10 | 23.6 | 1135.16
   |              | 111 | Ramesh | Clerical Support | 26.87 | 22.0 | 591.14
   |              | 106 | Rao | Programmer | 35.75 | 12.8 | 457.6
Subtotal: 10272.1
25 | Starflight | 107 | Rekha | Programmer | 35.75 | 24.6 | 879.45
   |            | 115 | Rani | Systems Analyst | 96.75 | 45.8 | 4431.15
   |            | 101 | John | Database Designer | 105.00 | 56.3 | 5911.5
   |            | 114 | Manikanta | Applications Designer | 48.10 | 33.1 | 1592.11
   |            | 108 | Nalini | Systems Analyst | 96.75 | 23.6 | 2283.3
   |            | 118 | James | General Secretary | 18.36 | 30.5 | 559.98
   |            | 112 | P J | DSS Analyst | 45.95 | 41.4 | 1902.33
Subtotal: 17559.82
Total Amount: 45310.94
The easiest way to generate the required report is to create a table whose fields match those of the report.
Table_Name: CONSTRUCTION_COMPANY
An employee can be assigned to more than one project. For example, employee number 104 has been assigned to two projects. Therefore, knowing the project number and employee number lets us find the job classification and hours worked. Hence Proj_No and Emp_No will jointly be taken as the primary key.
The above table structure has the following deficiencies:
The Proj_No is part of the primary key, but it contains null values (in the rows where the project number is not repeated).
The table entries invite data inconsistency; for example, the job classification value "Elect. Engineer" might be entered as "Elec_engi" or "EE".
The
table displays data redundancy.
Update Anomalies: Modifying the job class for employee number 105 requires many alterations, one for each row in which that employee appears.
Insertion Anomalies: To complete a row definition, a new employee must be assigned to a project. If the employee is not yet assigned, a dummy project must be created to complete the row.
Deletion Anomalies: Suppose only one employee is associated with a project; if that employee leaves the company and the employee data are deleted, the project information will also be deleted.
Although the table structure appears to work, these deficiencies mean the report can give different results depending on the data.
Normalization
Process: The most common Normal forms and their characteristics are
First Normal Form (1NF): A relation is said to be in first normal form if it is derived from an unnormalized form and has no repeating groups.
Second
Normal Form (2NF): A Relation is said to be in second normal form if it is
already in first normal form and it has no partial dependency.
Third
Normal Form (3NF): A Relation is said to be in third Normal form if it is
already in second normal form and it has no transitive dependency.
Boyce-Codd Normal Form (BCNF): A relation is said to be in Boyce-Codd normal form if it is already in third normal form and every determinant is a candidate key.
Fourth Normal Form (4NF): A relation is said to be in fourth normal form if it is already in Boyce-Codd normal form and it has no multivalued dependencies.
Fifth Normal Form (5NF): A relation is said to be in fifth normal form if it is already in fourth normal form and has no join dependency; i.e., it cannot be decomposed further without loss of information.
Eg:
Normalization of construction company Report
(The construction company report shown above serves as the input to the normalization process.)
The construction company report is represented in the form of a relation. The relation, named CONSTRUCTION_COMPANY, is in unnormalized form as shown below:
CONSTRUCTION_COMPANY(Proj_No, Proj_Name, (Emp_No, Emp_Name, Job_Classification, Charge_Per_Hour, Hours_Billed)) → (1)
The fields Total_Charge, Subtotal and Grand Total are not included in the relation because they are derived attributes.
First
Normal Form:
In relation (1), the fields in the innermost set of parentheses together form what is known as a repeating group, which results in redundancy of data. Hence relation (1) is subdivided into two relations to remove the repeating group.
PROJECT(Proj_No, Proj_Name) → (2)
PROJECT_EMP(Proj_No, Emp_No, Emp_Name, Job_Class, Charge_Per_Hour, Hours_Billed) → (3)
Now relations (2) and (3) are in 1NF. In relation (3), Proj_No and Emp_No jointly serve as the key.
Second
Normal Form:
Definition of partial dependency: if a non-key attribute depends on only part of the composite primary key, the dependency is called a partial dependency.
In relation (2) there is only one key field, and hence there is no scope for partial dependency. The absence of partial dependency takes relation (2) to 2NF without any modification.
The
dependency diagram of relation 3 is shown below
In the above diagram Hours_Billed depends on both Proj_No and Emp_No, but the remaining non-key fields depend on Emp_No alone (Emp_No → Emp_Name, Job_Class, Charge_Per_Hour). This situation is an example of partial dependency, so relation (3) is divided into two relations.
Assignment(Proj_No, Emp_No, Hours_Billed) → (4)
Emp_Job(Emp_No, Emp_Name, Job_Class, Charge_Per_Hour) → (5)
Relations (4) and (5) are in 2NF.
Third
Normal Form:
Transitive dependency: if one non-prime attribute determines another non-prime attribute, the dependency is called a transitive dependency.
In relation (2) there is only one non-key field, which means it has no transitive dependency; hence relation (2) can be treated as 3NF without any modification. Similarly, relation (4) has only one non-key field, so it also has no transitive dependency and can be treated as 3NF without any modification.
In relation (5), Charge_Per_Hour depends on Job_Class; this means that relation (5) has a transitive dependency. The dependency diagram for relation (5) is shown below.
(Dependency diagram for relation (5))
Hence relation (5) is subdivided into two relations, (6) and (7), as shown below.
Job(Job_Class, Charge_Per_Hour) → (6)
Emp(Emp_No, Emp_Name, Job_Class) → (7)
For a practical application it is sufficient to normalize up to either 2NF or 3NF. Hence the process of normalization is stopped, and the final 3NF relations of the construction company are shown below.
Project(Proj_No, Proj_Name) → (1)
Assignment(Proj_No, Emp_No, Hours_Billed) → (2)
Emp(Emp_No, Emp_Name, Job_Classification) → (3)
Job(Job_Classification, Charge_Per_Hour) → (4)
Improving
Design:
How
to improve the design of the database?
Evaluate
primary key Assignment:
Each time a new employee is entered into the EMPLOYEE table, a job class value must be entered. Unfortunately, if the data entry for the job class contains an error, it leads to a referential integrity violation. For example, entering "DB Designer" instead of "Database Designer" for the Job_Class attribute in the EMPLOYEE table will trigger such a violation. Therefore it is better to add a Job_Code attribute to both the JOB relation and the EMPLOYEE relation.
Emp(Emp_No,Emp_Name,Job_Code)
Job(Job_Code,Job_Classification,Charge_Per_Hour)
Evaluate Naming Conventions:
In the JOB relation, Job_Classification will be changed to Job_Description, and Charge_Per_Hour will be changed to Job_Chg_Hour. In the ASSIGNMENT relation, Hours_Billed will be changed to Assign_Hours_Billed.
Job(Job_Code,Job_Description,Job_Chg_Hour)
Assignment(Proj_No,Emp_No,Assign_Hours_Billed)
Refine
Attribute Atomicity:
An atomic attribute is one that cannot be further subdivided; such an attribute is said to display atomicity. In the EMPLOYEE table the attribute Emp_Name is not atomic because it can be further subdivided into Emp_LName, Emp_FName and Emp_Init. These attributes are added to the EMPLOYEE table.
Emp(Emp_No,Emp_LName,Emp_FName,Emp_Init,Job_Code)
Identify
New Attribute:
If the EMPLOYEE table were used in a real-world environment, several other attributes would be added. For example, Social_Security_Number, Date_of_Joining, Date_of_Birth, Hire_Date, Gross_Salary, Net_Salary and so on would be added to improve the relation.
Emp(Emp_No, Emp_LName, Emp_FName, Emp_Init, Hire_Date, Gross_Salary, Job_Code)
Identify
New Relationship:
A new relationship between tables should be created only when it does not produce unnecessary duplication of data.
Refine
primary Keys:
The combination of Emp_No and Proj_No is the primary key of the ASSIGNMENT table. However, if we record assigned hours more than once for a particular project and employee, this violates the primary key constraint.
To avoid the violation, we add an additional attribute, Assign_Date, to the ASSIGNMENT table. Yet if we want to record assigned hours for a particular project more than once on the same day, the primary key constraint is violated again. The same data entry causes no problem when a surrogate key, Assign_No, is used as the primary key in the ASSIGNMENT relation.
Assignment(Assign_No,Assign_Date,Proj_No,Emp_No,Assig_Hour_Billed)
To
Maintain Historical Accuracy:
It is assumed that Job_Chg_Hour will change over time. The charges for each project were billed by multiplying the hours worked (from the ASSIGNMENT table) by the Job_Chg_Hour (from the JOB table). Those charges would always reflect the current charge per hour stored in the JOB table rather than the charge per hour that was in effect at the time of the assignment. Because of that, we add an attribute Assign_Chg_Hour to the ASSIGNMENT table.
Assignment(Assign_No,Assign_Date,Proj_No,Emp_No,Assig_Hour_Billed,Assign_Chg_Hour)
Evaluate
using Derived Attribute:
The derived attribute Assign_Charge is added to the ASSIGNMENT relation; Assign_Charge is computed by multiplying Assign_Chg_Hour by Assign_Hours_Billed. Storing the derived attribute Assign_Charge in the ASSIGNMENT table makes it easy to write reports or invoices.
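For example, the derived value could be refreshed with a single statement (a sketch, assuming the revised attribute names above):

UPDATE ASSIGNMENT
SET Assign_Charge = Assign_Chg_Hour * Assign_Hours_Billed;  -- recompute the derived attribute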
Boyce-Codd Normal Form (BCNF):
The BCNF can be violated only when the table contains more than one candidate key.
Candidate
key:
A key is said to be a candidate key if it is a minimal superkey, i.e., a superkey that does not contain a redundant subset of attributes.
Consider a table with the functional dependencies shown below:
A + B → C, D
C → B
The table structure has no partial dependency and no transitive dependency. But the dependency C → B indicates that a non-key attribute determines part of the primary key, which causes the table to fail to meet the BCNF requirements.
To convert the above table structure from 3NF to BCNF, first change the primary key to A + C. The dependency C → B means that C is in effect a determinant of B. The decomposition procedure then produces the results shown below.
Example
for BCNF:
Each Class_Code identifies a class uniquely. A student takes many classes and earns a grade in each. A staff member can teach many classes, but each class is taught by only one staff member.
Stu_ID + Staff_ID → Class_Code, Enroll_Grade
Class_Code → Staff_ID
The above table contains two candidate keys and violates BCNF because the determinant Class_Code is not a candidate key. The violation is removed by decomposing the table, e.g., into (Stu_ID, Class_Code, Enroll_Grade) and (Class_Code, Staff_ID).
Fourth
Normal Form (4NF):
Consider an employee who can have multiple assignments; i.e., the employee works as a volunteer in a service organization and also works on different projects, as shown below.
The above table contains two sets of independent multivalued dependencies (on Org_Code and Proj_Code). If the two tables VOLUNTEER-1 and VOLUNTEER-2 are implemented, the tables contain null values.
The VOLUNTEER-3 table has a primary key, but it is composed of all the attributes of the table. Structured like this, it produces many redundancies.
We eliminate the multivalued dependencies by creating the ASSIGNMENT and SERVICE tables as shown below.
The ASSIGNMENT and SERVICE tables do not contain multivalued dependencies.
Denormalization:
Normalization is very important in database design. Generally, the higher the normal form, the more relational join operations are required to produce a specific output. A successful design must also consider end-user requirements for fast performance. Therefore we occasionally choose to denormalize some portions of the database design in order to meet performance requirements.
Denormalization produces a lower normal form; i.e., 3NF will be converted to 2NF, or 2NF will be converted to 1NF.
Eg: The need for denormalization arises, for example, when generating a faculty evaluation report in which each row lists the scores obtained during the last four semesters taught.
Faculty
Evaluation Report:
Instructor | Dept. | Sem-1 Mean | Sem-2 Mean | Sem-3 Mean | Sem-4 Mean | Last_sem avg
(one row per instructor)
We could generate the above report easily, but a problem arises: the data are stored in a normalized table in which each row represents a different score for a given faculty member in a given semester.
EVALDATA:
ID | Instructor | Dept. | Mean | Semester
(one row per score)
It is somewhat difficult to generate the faculty evaluation report from the normalized table. The other table, FACHIST (the faculty history table), contains the last four semester means for each faculty member. The faculty history table is a temporary table created from EVALDATA as shown below.
Instructor | Dept. | Sem-1 Mean | Sem-2 Mean | Sem-3 Mean | Sem-4 Mean | Last_sem avg
(one row per instructor)
FACHIST is an unnormalized table. Using it, we can generate the faculty evaluation report very fast. After generating the report, the temporary table FACHIST is deleted. By working this way, we increase the performance of the database.
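A sketch of how the temporary FACHIST table might be built from EVALDATA in one pass, assuming the column names above, semester values such as 'SEM-1', and a DBMS that supports CREATE TABLE ... AS (the CASE pivot produces one denormalized row per instructor):

CREATE TABLE FACHIST AS
SELECT Instructor,
       Dept,
       MAX(CASE WHEN Semester = 'SEM-1' THEN Mean END) AS Sem1_Mean,
       MAX(CASE WHEN Semester = 'SEM-2' THEN Mean END) AS Sem2_Mean,
       MAX(CASE WHEN Semester = 'SEM-3' THEN Mean END) AS Sem3_Mean,
       MAX(CASE WHEN Semester = 'SEM-4' THEN Mean END) AS Sem4_Mean,
       AVG(Mean)                                       AS Last_Sem_Avg
FROM EVALDATA
GROUP BY Instructor, Dept;
-- after the report is generated: DROP TABLE FACHIST;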
Chapter
– 6
Advanced
Data Modeling
Advanced
Data Model:
The Extended Entity Relationship Model (EERM), sometimes referred to as the enhanced entity relationship model, adds more semantic constructs to the original entity relationship model. The added constructs are entity supertypes, entity subtypes and entity clustering.
Entity
Super types, Entity Sub types:
An organization contains different types of employees, and all the employees do not have the same attributes. If you create one table for all employees to store their information, many columns will have null values.
For example, a pilot shares certain characteristics with other employees, such as Emp_No, Emp_Name, Emp_Address and Emp_Hire_Date. But other pilot characteristics are not shared: the pilot-specific attributes Emp_License and Emp_Rating would generate nulls for employees who are not pilots. Therefore the PILOT entity stores only the attributes that are unique to pilots, and the EMPLOYEE entity stores the attributes that are common to all employees.
We can conclude that PILOT is a subtype of EMPLOYEE, and EMPLOYEE is a supertype of PILOT. An entity supertype is a generic entity type that is related to one or more entity subtypes; the entity supertype contains the common characteristics while the entity subtypes contain the unique characteristics.
Specialization
Hierarchy:
Entity supertypes and subtypes are organized in a specialization hierarchy, which relates the higher-level entity supertype (parent entity) to the lower-level entity subtypes (child entities).
The
following diagram shows specialization hierarchy by an employee super type and
three entity sub types.
Pilot
Mechanic
Accountant.
The specialization hierarchy reflects one-to-one relationships between the entity supertype and each entity subtype.
Eg:
A
pilot sub type is related to one instance of employee super type.
A
mechanic sub type is related to one instance of employee super type.
An accountant subtype is related to one instance of the employee supertype.
Inheritance:
The property of inheritance enables an entity subtype to inherit the attributes and relationships of the supertype.
In the above example, the EMPLOYEE entity supertype participates in a one-to-many relationship with a DEPENDENT entity; through inheritance, all subtypes also participate in that relationship.
Subtype
Discriminator:
A subtype discriminator is the attribute in the supertype entity that determines to which subtype each supertype occurrence is related.
In the above example the subtype discriminator is Emp_Type.
If Emp_Type has a value of 'P', the supertype occurrence is related to the PILOT subtype.
If Emp_Type has a value of 'A', the supertype occurrence is related to the ACCOUNTANT subtype.
If Emp_Type has a value of 'M', the supertype occurrence is related to the MECHANIC subtype.
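A minimal SQL sketch of the supertype/subtype mapping (column types are assumptions; the subtype shares the supertype's key, reflecting the 1:1 relationship):

CREATE TABLE EMPLOYEE (
    Emp_No   INTEGER PRIMARY KEY,
    Emp_Name VARCHAR(40),
    Emp_Type CHAR(1)    -- subtype discriminator: 'P', 'A' or 'M'
);
CREATE TABLE PILOT (
    Emp_No      INTEGER PRIMARY KEY REFERENCES EMPLOYEE (Emp_No),  -- inherited key
    Pil_License VARCHAR(20),
    Pil_Rating  VARCHAR(20)
);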
Disjoint
& Overlapping Constraints:
An entity supertype can have disjoint or overlapping entity subtypes.
Disjoint subtypes are subtypes that contain unique, non-overlapping subsets of the supertype entity set.
Disjoint subtypes are indicated by the letter 'd' inside the category shape. With disjoint subtypes, each supertype occurrence is related to only one subtype.
Eg: An employee who is a pilot can appear only in the PILOT subtype, not in any of the other subtypes.
Overlapping subtypes are subtypes that contain non-unique subsets of the supertype entity set.
Overlapping subtypes are indicated by the letter 'o' inside the category shape. With overlapping subtypes, a supertype occurrence may be related to more than one subtype.
Eg: An employee may be a professor as well as an administrator.
The ADMINISTRATOR and PROFESSOR subtypes overlap within the supertype EMPLOYEE.
Completeness
constraint:
The completeness constraint can be partial or total.
Partial completeness means that not every supertype occurrence is a member of a subtype. A single horizontal line under the circle represents a partial completeness constraint.
Total completeness means that every supertype occurrence must be a member of at least one subtype. A double horizontal line under the circle represents the total completeness constraint.
Specialization
and Generalization:
We can use various approaches to
develop entity super types and sub types.
Specialization is the top-down process of identifying lower-level entity subtypes from a higher-level entity supertype.
Eg: Specialization is used to identify multiple entity subtypes (PILOT, MECHANIC, ACCOUNTANT) from the supertype EMPLOYEE.
Generalization is the bottom-up process of identifying a higher-level entity supertype from lower-level entity subtypes.
Eg: Generalization is used to identify the entity supertype EMPLOYEE from the subtypes (PILOT, MECHANIC, ACCOUNTANT).
Entity
Clustering:
Generally the data modeler will develop an initial ERD containing a few entities. As the design approaches completion, the ERD may contain hundreds of entities and relationships. In those cases, we can use entity clusters to minimize the number of entities shown in the ERD.
An entity cluster is a virtual entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed by combining multiple interrelated entities into a single entity object. An entity cluster is considered virtual in the sense that it is not actually an entity in the final ERD.
Entity
Integrity:
The most important characteristic of an entity is its primary key, which uniquely identifies each entity instance. The primary key and foreign key work together to implement relationships between the tables in the relational data model.
Natural
keys & Primary keys:
Unique identifiers are commonly encountered in the real world: for example, a class_no used to register for classes, an invoice_no identifying a particular invoice, an account_no identifying a credit card, and so on. These are natural keys.
The natural keys class_no, invoice_no and account_no are used to uniquely identify real-world objects.
If an entity has a natural identifier,
a data modeler uses that natural key as the primary key of the entity.
Primary
key Guidelines:
The primary key's main function is to uniquely identify an entity: given a primary key value, the relational model can determine the values of all dependent attributes. Its second function is that primary keys and foreign keys together implement relationships between tables or entities.
Characteristics
of Primary Key:
A primary key contains unique values and does not accept null values.
The primary key should be permanent and unchangeable.
A primary key should have the minimum number of attributes.
Unique values are easier to manage when primary keys are numeric.
When
to use composite primary keys:
The
composite primary keys are particularly useful in two cases.
As
Identifier of composite entities.
As
Identifier of weak entities.
In the first case, assume that we have a STUDENT entity and a CLASS entity, and the relationship between these two entities is many-to-many via the ENROLL entity. The ENROLL entity contains the key fields of the STUDENT and CLASS entities, which jointly identify each entity instance in ENROLL.
In the second case, a weak entity in a strong (identifying) relationship with a parent entity normally uses a composite key.
Eg: The key field of the EMPLOYEE entity is used as one of the key fields of the DEPENDENT entity.
Emp(Emp_no, Emp_Fname, Emp_Lname, Email)
Dependent(Emp_no, Depn_Num, Depn_Fname, Depn_Lname, Depn_Addr)
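A sketch in SQL of the weak entity, whose composite key inherits the parent key (Depn_Num is an assumed dependent-number attribute that distinguishes the dependents of one employee):

CREATE TABLE DEPENDENT (
    Emp_No     INTEGER NOT NULL,
    Depn_Num   INTEGER NOT NULL,              -- assumed: numbers each employee's dependents
    Depn_Fname VARCHAR(30),
    Depn_Lname VARCHAR(30),
    Depn_Addr  VARCHAR(60),
    PRIMARY KEY (Emp_No, Depn_Num),           -- composite: inherits the parent key
    FOREIGN KEY (Emp_No) REFERENCES EMP (Emp_No)
);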
When to use surrogate primary keys?
There are some instances when a primary key does not exist in the real-world object, or when the existing natural key is not suitable as a primary key.
For example, consider a facility that rents rooms for small parties. The manager of the facility keeps track of all the events in the following table format.
Date | Time_Start | Time_End | Room | Event_Name | Party_Of
The above table can be represented as the EVENT entity:
EVENT(Date, Time_Start, Time_End, Room, Event_Name, Party_Of)
In the above entity there is no simple natural key that can be used as the primary key in the model.
Based on the concept of primary keys, we might suggest one of these options: (Date, Time_Start, Room) or (Date, Time_End, Room).
When the data model is implemented, a composite primary key in the EVENT entity adds complexity to the implementation and the coding.
The solution to the problem is to use a single numeric attribute as a surrogate primary key:
EVENT(Event_ID, Date, Time_Start, Time_End, Room, Event_Name, Party_Of)
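A minimal sketch of the EVENT table with a surrogate key (the types and the UNIQUE constraint are assumptions; DATE is a reserved word in SQL, so the column is written as Event_Date here):

CREATE TABLE EVENT (
    Event_ID   INTEGER PRIMARY KEY,           -- surrogate primary key
    Event_Date DATE,
    Time_Start TIME,
    Time_End   TIME,
    Room       VARCHAR(10),
    Event_Name VARCHAR(40),
    Party_Of   INTEGER,
    UNIQUE (Event_Date, Time_Start, Room)     -- still protect the natural candidate key
);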
CHAPTER
- 6
Transaction
Control and Concurrency Control
Transaction:
A Transaction is a series of actions
to be performed on the database such that either all actions are performed or
none.
Suppose that we sell a product to a customer; our sales transaction consists of at least the following parts:
We must write a new customer invoice.
We must reduce the quantity on hand in the PRODUCT entity.
We must update the ACC_TRANSACTION entity.
We must update the customer balance (CUST_BAL).
In database terms, a transaction is any action that reads from and/or writes to a database. A transaction may consist of a single SQL statement, a series of related UPDATE statements, a series of INSERT statements, or a combination of SELECT, UPDATE and INSERT statements.
A transaction is a logical unit of work that must be entirely completed or entirely aborted; no intermediate states are accepted. All of the SQL statements in a transaction must be completed successfully; if any of the SQL statements fail, the entire transaction is rolled back.
A successful transaction changes the database from one consistent state to another. A consistent database state is one in which all data integrity constraints are satisfied.
Each database request generates several I/O operations that read from or write to the physical storage medium.
Transaction
Properties
The Transaction has the following
properties.
Atomicity.
Consistency.
Isolation.
Durability.
Atomicity:
All Operations of the transaction must be completed. If not the transaction is
aborted.
Example:
If a transaction T1 has four SQL requests, all four requests must be
successfully completed otherwise the entire transaction is aborted.
Consistency: When a transaction is completed, the database must be in a consistent state; that means it must satisfy all integrity constraints, otherwise the entire transaction is aborted.
Isolation:
The data used during the execution of the transaction cannot be used by a
second transaction until the first transaction is completed.
Durability: Once the transaction changes are committed, they cannot be undone, even in the event of a system failure.
Transaction
Management in SQL
When a transaction sequence is initiated by a user, the sequence must continue through the SQL statements until one of the following four events occurs:
A
COMMIT statement is reached, in which case all changes are permanently recorded
into the database.
A ROLLBACK statement is reached, in which case all changes are aborted and the database is rolled back to its previous consistent state.
The END of the transaction is successfully reached, in which case all changes are permanently recorded within the database. This action is equivalent to COMMIT.
The transaction is abnormally terminated, in which case the changes made to the database are aborted and the database is rolled back to its previous consistent state. This action is equivalent to ROLLBACK.
The
Transaction Log
The DBMS uses a transaction log file to keep track of all transactions that update the database.
The information stored in the log file is used by the DBMS for a ROLLBACK, an abnormal termination, or a system failure.
While the DBMS executes transactions that modify the database, it also automatically updates the transaction log.
The transaction log stores:
A record for the beginning of the transaction.
For each transaction component:
The type of operation being performed.
The name of the object affected by the transaction (the name of the table).
The "before" and "after" values for the fields being updated.
Pointers to the previous and next transaction log entries for the same transaction.
A record for the end of the transaction.
Eg:
UPDATE PRODUCT SET PROD_QOH = PROD_QOH - 2 WHERE PROD_CODE = 'P1001';
UPDATE CUSTOMER SET CUST_BAL = CUST_BAL + 8000 WHERE CUST_NO = 'C1234';
The transaction log for the above transaction is:

TRL_ID | TRX_NUM | PREV_PTR | NEXT_PTR | OPERATION | TABLE | ROW_ID | ATTRIBUTE | BEFORE VALUE | AFTER VALUE
341 | 101 | Null | 352 | START | (start of transaction) | | | |
352 | 101 | 341 | 363 | UPDATE | PRODUCT | 1011 | PROD_QOH | 25 | 23
363 | 101 | 352 | 365 | UPDATE | CUSTOMER | 363 | CUST_BALANCE | 525.15 | 615.75
365 | 101 | 363 | Null | COMMIT | (end of transaction) | | | |
If a system failure occurs, the DBMS will examine the transaction log for incomplete transactions and roll back the database to its previous consistent state.
Q.
Explain Concurrency control in transaction management.
Concurrency control is important because the simultaneous execution of transactions over a shared database can create several data integrity and consistency problems:
Lost updates.
Uncommitted data.
Inconsistent retrievals.
Lost
Updates:
The lost update problem occurs when two concurrent transactions T1 and T2 are updating the same data element and one of the updates is lost.
Eg: Two concurrent transactions T1 and T2 update the PROD_QOH value for the same item in the PRODUCT table. Assume that the current PROD_QOH value is 35.
Transaction | Computation
T1: Purchase 100 units | PROD_QOH = PROD_QOH + 100
T2: Sale 30 units | PROD_QOH = PROD_QOH - 30
The following table shows the serial execution of those transactions; under normal circumstances this gives the correct result.
Time | Transaction | Step | Stored Value
1 | T1 | READ PROD_QOH | 35
2 | T1 | PROD_QOH = 35 + 100 |
3 | T1 | WRITE PROD_QOH | 135
4 | T2 | READ PROD_QOH | 135
5 | T2 | PROD_QOH = 135 - 30 |
6 | T2 | WRITE PROD_QOH | 105
Suppose now that a transaction reads the PROD_QOH value from the table before the previous transaction has been committed. The sequence in the following table shows how the lost update problem can occur.
Time | Transaction | Step | Stored Value
1 | T1 | READ PROD_QOH | 35
2 | T2 | READ PROD_QOH | 35
3 | T1 | PROD_QOH = 35 + 100 |
4 | T2 | PROD_QOH = 35 - 30 |
5 | T1 | WRITE PROD_QOH | 135
6 | T2 | WRITE PROD_QOH | 5
Uncommitted
Data
The uncommitted data problem occurs when two transactions T1 and T2 are executed concurrently and the first transaction T1 is rolled back after the second transaction T2 has already accessed the uncommitted data.
Eg: Two concurrent transactions T1 and T2 update the PROD_QOH value for the same item in the PRODUCT table. Assume that the current PROD_QOH value is 35.
Transaction | Computation
T1: Purchase 100 units | PROD_QOH = PROD_QOH + 100
T2: Sale 30 units | PROD_QOH = PROD_QOH - 30
The following table shows the serial execution of these transactions under normal circumstances.
Time | Transaction | Step | Stored Value
1 | T1 | READ PROD_QOH | 35
2 | T1 | PROD_QOH = 35 + 100 |
3 | T1 | WRITE PROD_QOH | 135
4 | T1 | ROLLBACK | 35
5 | T2 | READ PROD_QOH | 35
6 | T2 | PROD_QOH = 35 - 30 |
7 | T2 | WRITE PROD_QOH | 5
The following table shows how the uncommitted data problem can arise when the ROLLBACK is completed after T2 has begun its execution.
Time | Transaction | Step | Stored Value
1 | T1 | READ PROD_QOH | 35
2 | T1 | PROD_QOH = 35 + 100 |
3 | T1 | WRITE PROD_QOH | 135
4 | T2 | READ PROD_QOH (reads uncommitted data) | 135
5 | T2 | PROD_QOH = 135 - 30 |
6 | T1 | ROLLBACK | 35
7 | T2 | WRITE PROD_QOH | 105
Inconsistency
Retrievals
Inconsistent retrievals occur when a transaction accesses data before and after another transaction finishes working with the same data.
For example, transaction T1 calculates the total PROD_QOH of the products stored in the PRODUCT table. At the same time, T2 updates the PROD_QOH for two products in the PRODUCT table.
TRANSACTION 1 | TRANSACTION 2
SELECT SUM(PROD_QOH) FROM PRODUCT; | UPDATE PRODUCT SET PROD_QOH = PROD_QOH + 10 WHERE PROD_CODE = 1003;
 | UPDATE PRODUCT SET PROD_QOH = PROD_QOH - 10 WHERE PROD_CODE = 1004;
 | COMMIT;
The following table shows the results of the above two transactions.

Prod_code | BEFORE PROD_QOH | AFTER PROD_QOH
1001 | 8 | 8
1002 | 32 | 32
1003 | 15 | 15 + 10 = 25
1004 | 23 | 23 - 10 = 13
1005 | 8 | 8
1006 | 6 | 6
Total | 92 | 92
The final results shown in the above table are correct.
The following demonstrates that inconsistent retrievals are possible while transaction T1 executes in the absence of concurrency control: while summing the PROD_QOH values, T1 reads the "after" value 25 for PROD_CODE = 1003 but the "before" value 23 for PROD_CODE = 1004, giving the incorrect total 102.
The following table shows the inconsistent retrievals.

TIME | TRANSACTION | STEP | VALUE | TOTAL
1 | T1 | READ PROD_QOH for PROD_CODE = 1001 | 8 | 8
2 | T1 | READ PROD_QOH for PROD_CODE = 1002 | 32 | 40
3 | T2 | READ PROD_QOH for PROD_CODE = 1003 | 15 |
4 | T2 | PROD_QOH = 15 + 10 | |
5 | T2 | WRITE PROD_QOH for PROD_CODE = 1003 | 25 |
6 | T1 | READ PROD_QOH for PROD_CODE = 1003 | 25 | 65 (after)
7 | T1 | READ PROD_QOH for PROD_CODE = 1004 | 23 | 88 (before)
8 | T2 | READ PROD_QOH for PROD_CODE = 1004 | 23 |
9 | T2 | PROD_QOH = 23 - 10 | |
10 | T2 | WRITE PROD_QOH for PROD_CODE = 1004 | 13 |
11 | T2 | COMMIT | |
12 | T1 | READ PROD_QOH for PROD_CODE = 1005 | 8 | 96
13 | T1 | READ PROD_QOH for PROD_CODE = 1006 | 6 | 102
The Scheduler
The scheduler is a special DBMS process. The scheduler uses concurrency control algorithms, such as locking or timestamping methods, to control the concurrent execution of transactions on the same database.
Concurrency
Control with Locking methods
A transaction acquires a lock prior to data access; the lock is released when the transaction is completed, so that another transaction can then lock the data item for its exclusive use. All lock information is managed by a lock manager.
Lock
Granularity
Lock granularity indicates the level of lock use. Locking can take place at the following levels:
Database
level lock
Table
level lock
Page
level lock
Row
level lock
Field
level lock.
Database
level:
In a database-level lock, the entire database is locked, preventing the use of any table in the database by transaction T2 while transaction T1 is being executed.
This type of locking is unsuitable for multiuser DBMSs because thousands of transactions would have to wait for the previous transaction to be completed.
With a database-level lock, transactions cannot access the same database concurrently even when they use different tables.
Table
Level Lock
In a table-level lock, the entire table is locked, preventing access to any row by transaction T2 while transaction T1 is using the table.
If a transaction requires several tables, each table may be locked. Two transactions can access the same database as long as they access different tables.
Table-level locks can also cause a traffic jam when many transactions are waiting to access the same table.
The following shows that transactions T1 and T2 cannot access the same table even when they try to use different rows; T2 must wait until T1 unlocks the table.
Page
Level
A page has a fixed size, such as 4 KB, 8 KB or 16 KB. A table can span several pages, and a page can contain several rows. Page-level locks are currently the most frequently used locking method in multiuser DBMSs.
The following shows transactions accessing the same table while locking different pages. If T2 requires the use of a row located on a page that is locked by T1, T2 must wait until the page is unlocked by T1.
Row
level
Row-level locking allows concurrent transactions to access different rows of the same table even when the rows are located on the same page. Although row-level locking improves the availability of data, its management requires high overhead.
The
following shows the row level lock.
In the above figure, both transactions execute concurrently, even when the requested rows are on the same page. T2 must wait only if it requests the same row as T1.
Field
Level
A field-level lock allows concurrent transactions to access the same row as long as they use different fields. Field-level locking gives the most flexible multiuser data access, but it requires an extremely high level of overhead.
Lock Types:
The DBMS may use different lock types.
Binary
lock
Shared/
Exclusive lock.
Binary Lock: A binary lock has two states: locked (1) and unlocked (0). Every transaction requires a lock and an unlock operation for each data item that is accessed; such operations are automatically managed by the DBMS.
An example of a binary lock solving the lost update problem:
TIME | TRANSACTION | STEP | STORED VALUE
1 | T1 | LOCK PRODUCT |
2 | T1 | READ PROD_QOH | 35
3 | T1 | PROD_QOH = 35 + 100 |
4 | T1 | WRITE PROD_QOH | 135
5 | T1 | UNLOCK PRODUCT |
6 | T2 | LOCK PRODUCT |
7 | T2 | READ PROD_QOH | 135
8 | T2 | PROD_QOH = 135 - 30 |
9 | T2 | WRITE PROD_QOH | 105
10 | T2 | UNLOCK PRODUCT |
Shared
or Exclusive lock
A shared lock is issued when a transaction wants to read a data item from the database and no exclusive lock is held on that data item.
An exclusive lock is issued when a transaction wants to update a data item and no locks are held on that data item by any other transaction.
The following table shows that a conflict exists when at least one of the concurrent operations is a WRITE.
T1 OPERATION | T2 OPERATION | RESULT
READ | READ | No conflict
READ | WRITE | Conflict
WRITE | READ | Conflict
WRITE | WRITE | Conflict