Friday, June 8, 2012

E-R Model Concepts



E-R Model Concepts

Entities
Entities are specific objects or things in the mini-world that are represented in the database

Ex :-a company, a job, or a university course.



What Should an Entity Be?

SHOULD BE:

An object that will have many instances in the database

An object that will be composed of multiple attributes

An object that we are trying to model

SHOULD NOT BE:

A user of the database system

An output of the database system (e.g., a report)


Characteristics of Bad Database


Characteristics of Bad Database

Not satisfy all user requirements

Would contain inconsistent and invalid data

Would require excessive effort to change

Would be slow and clumsy to use for achieving a desired outcome.

When trying to decide if some choices are better than others, you need to consider the key desirable properties of a database.

Classification of DBMSs


Classification of DBMSs

Based on the data model used:

Traditional: Relational, Network, Hierarchical.

Emerging: Object-oriented, Object-relational.

Other classifications:

Single-user (typically used with micro-computers)vs. multi-user (most DBMSs).

Centralized (uses a single computer with one database)vs. distributed (uses multiple computers, multiple databases)

Classification of DBMSs

Distributed Database Systems have now come to be known as client server based database systems because they do not support a totally distributed environment, but rather a set of database

Two Tier Client-Server Architecture-Three Tier Client-Server Architecture


Two Tier Client-Server Architecture

User Interface Programs and Application Programs run on the client side

Interface called ODBC (Open Database Connectivity ) provides an Application program interface (API) allow client side programs to call the DBMS. Most DBMS vendors provide ODBC drivers.

A client program may connect to several DBMSs.

Other variations of clients are possible: e.g., in some DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc.

In such situations the server may be called the Data Server.

Three Tier Client-Server Architecture

Common for Web applications

Intermediate Layer called Application Server or Web Server:

stores the web connectivity software and the rules and business logic (constraints) part of the application used to access the right amount of data from the database server

acts like a conduit for sending partially processed data between the database server and the client.

Additional Features- Security:

encrypt the data at the server before transmission

decrypt data at the client

Other Tools


Other Tools


Data dictionary / repository

Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.

Active data dictionary is accessed by DBMS software and users/DBA.

Passive data dictionary is accessed by users/DBA only.

Application Development Environments and CASE (computer-aided software engineering) tools:

Examples –Power builder (Sybase), Builder (Borland)

communications software


function is to allow users at locations remote from the database system site to access the database through computer terminals, workstations, or their local personal computers.

Database System Utilities


Database System Utilities

To perform certain functions such as:

Loading data stored in files into a database. Includes data conversion tools.

Backing up the database periodically on tape.

Reorganizing database file structures.

Report generation utilities.

Performance monitoring utilities.

Other functions, such as sorting, user monitoring, data compression, etc.

DBMS Interfaces


DBMS Interfaces


Menu-Based Interfaces for Browsing :- These interfaces present the user with lists of options, called menus, that lead the user through the formulation of a request .

Forms-Based Interfaces :- A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries.

Graphical User Interfaces :- A graphical interface (GUI) typically displays a schema to the user in diagrammatic form. The user can then specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms.

Natural Language Interfaces :- These interfaces accept requests written in English or some other language and attempt to "understand" them.

Interfaces for Parametric Users :- Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly.

Data Independence


Data Independence

Capacity to change the schema at one level of a database system without having to change the schema at the next higher level.

Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs.

Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema.

Data independence is accomplished because, when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed.

However, the two levels of mappings create an overhead during compilation or execution of a query or program, leading to inefficiencies in the DBMS.

DBMS Languages


DBMS Languages

Data Definition Language(DDL): Used by the DBA and database designers to specify the conceptual schema of a database.

In many DBMSs, the DDL is also used to define internal and external schemas (views).

In some DBMSs, separate storage definition language(SDL) and view definition language(VDL) are used to define internal and external schemas.

Data Manipulation Language(DML): Used to specify database retrievals and updates.

DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C or an Assembly Language.

Alternatively, stand-alone DML commands can be applied directly (query language).

High Level or Non-procedural Languages : e.g., SQL, are set-oriented and specify what data to retrieve than how to retrieve. Also called declarative languages.

Low Level or Procedural Languages: record-at-a-time ;they specify how to retrieve data and include constructs such as looping.

Schema VS Instances


Schema VS Instances


Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database.

Schema Diagram: A diagrammatic display of (some aspects of) a database schema.

Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE.

Database Instance: The actual data stored in a database at a particular moment in time. Also called database state(or occurrence).



Database Schema Vs. Database State

Database State: Refers to the content of a database at a moment in time.

Initial Database State: Refers to the database when it is loaded

Valid State: A state that satisfies the structure and constraints of the database.
Distinction

The database schema changes very infrequently. The database state changes every time the database is updated.

Schema is also called intension, whereas state is called extension

Three-Schema Architecture


Three-Schema Architecture


Proposed to support DBMS characteristics of:

Program-data independence.

Support of multiple views of the data.

Use of a catalog to store the database description (schema).

Internal schema at the internal level to describe physical storage structures and access paths. Typically uses a physical data model.

Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users. Uses a conceptual or an implementation data model.

External schemas at the external level to describe the various user views. Usually uses the same data model as the conceptual level.

Network Model


Network Model


Some data were more naturally modeled with more than one parent per child.

Permitted the modeling of many-to-many relationships in data.

Consists of an owner record type, a set name, and a member record type.

A member record type can have that role in more than one set

 An owner record type can also be a member or owner in another set.



Network Model

ADVANTAGES:

Network Model is able to model complex relationships and represents semantics of add/delete on the relationships.

Can handle most situations for modeling using record types and relationship types.

Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database

DISADVANTAGES:

Navigational and procedural nature of processing

Database contains a complex array of pointers that thread through a set of records.

Little scope for automated "query optimization”

Categories of data models


Categories of data models

Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)

Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.

Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details

Hierarchical Model


Hierarchical Model


Organizes data in a tree structure.

This structure implies that a record can have repeating information, generally in the child data segments.

Collects all the instances of a specific record together as a record type.

To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types.

Hierarchical Model


ADVANTAGES:

Hierarchical Model is simple to construct and operate on

Corresponds to a number of natural hierarchically organized domains -e.g., assemblies in manufacturing, personnel organization in companies

Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.

DISADVANTAGES:

Navigational and procedural nature of processing

Database is visualized as a linear arrangement of records

Little scope for "query optimization"

Data Models


Data Models


Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey.

Data Model Operations: Operations for specifying database retrievals and updates by referring to the
concepts of the data model. Operations on the data model may include basic operations and user-defined operations.

Example of User Defined Operation

Calculate GPA
Sum(Result GPA * No. of Credits)/Sum(Credits)

Typical DBMS Functionality


Typical DBMS Functionality


Define a database : in terms of data types, structures and constraints

Construct or Load the Database on a secondary storage medium

Manipulating the database : querying, generating reports, insertions, deletions and modifications to its content

Concurrent Processing and Sharing by a set of users and programs – yet, keeping all data valid and consistent

Protection or Security measures to prevent unauthorized access

Presentation and Visualization of data

Extending Database Capabilities


Extending Database Capabilities


New functionality is being added to DBMSs in the following areas:

Scientific Applications

Image Storage and Management

Audio and Video data management

Data Mining

Spatial data management

Time Series and Historical Data Management

The above gives rise to new research and development in incorporating new data types, complex data structures, new operations and storage and indexing schemes in database systems.

When not to use a DBMS


When not to use a DBMS

Main inhibitors (costs) of using a DBMS:

High initial investment and possible need for additional hardware.

Overhead for providing generality, security, concurrency control, recovery, and integrity functions.

When a DBMS may be unnecessary:

If the database and applications are simple, well defined, and not expected to change.

If there are stringent real-time requirements that may not be met because of DBMS overhead.

If access to data by multiple users is not required.

When no DBMS may suffice:

If the database system is not able to handle the complexity of data because of modeling limitations

If the database users need special operations not supported by the DBMS.

Historical Development of Database Technology


Historical Development of Database Technology

Early Database Applications

The Hierarchical and Network Models were introduced in mid 1960’s and dominated during the seventies. A bulk of the worldwide database processing still occurs using these models.

Relational Model based Systems

The model that was originally introduced in 1970 was heavily researched and experimented with in IBM and the universities. Relational DBMS Products emerged in the 1980’s.

Object-oriented applications

OODBMSs were introduced in late 1980’s and early 1990’s to cater to the need of complex data processing in CAD and other applications. Their use has not taken off much.

Data on the Web and E-commerce Applications

Web contains data in HTML (Hypertext markup language) with links among pages. This has given rise to a  new set of applications and E-commerce is using new standards like XML (eXtended Markup Language).

Advantages of Using a DBMS


Advantages of Using a DBMS


Controlling Redundancy
Restricting Unauthorized Access
Providing Persistent Storage for Program Objects and Data Structures
Permitting Inferencing and Actions Using Rules
Providing Multiple User Interfaces
Representing Complex Relationships Among Data
Enforcing Integrity Constraints
Providing Backup and Recovery

Implications of the Database Approach


Implications of the Database Approach


Potential for Enforcing Standards
This is very crucial for the success of database applications in large organizations Standards refer to data item   names, display formats, screens, report structures, meta-data (description of data) etc
Reduced Application Development Time
Incremental time to add each new application is reduced
Flexibility to change data-structures
Database structure may evolve as new requirements are defined
Availability of Up-to-Date Information
Very important for on-line transaction systems such as airline, hotel, car reservations
Economies of Scale
By consolidating data and applications across departments wasteful overlap of resources and personnel can be avoided

System Analysts & Application Programmers (Software Engineers)


System Analysts & Application Programmers (Software Engineers)

System analysts
determine the requirements of end users, especially naive and parametric end users, and develop specifications for canned transactions that meet these requirements.

Application programmers
implement these specifications as programs; then they test, debug, document, and maintain these canned transactions.

Such analysts and programmers (nowadays called software engineers) should be familiar with the full range of capabilities provided by the DBMS to accomplish their tasks

Categories of End-users


Categories of End-users


People whose jobs require access to the database for querying, updating, and generating reports
Casual
Access database occasionally when needed

Naïve or Parametric
They make up a large section of the end-user population.
 They use previously well-defined functions in the form of “canned transactions” against the database.
Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.

Sophisticated
These include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities.
Many use tools in the form of software packages that work closely with the stored database.

Stand-alone
Mostly maintain personal databases using ready-to-use packaged applications.
An example is a tax program user that creates his or her own internal database.

Database Users


Database Users


Users may be divided into those who actually use and control the content (called “Actors on the Scene”) and those who enable the database to be developed and the DBMS software to be designed and implemented (called “Workers Behind the Scene”).


Actors on the Scene
Database Administrators
Administrating the database & the DBMS

Responsibilities :-
Authorizing access to the database
Coordinating and monitoring the use of database
Acquiring software and hardware resources as needed
Accountable for problems such as breach of security or poor system response time

Database Designers
Responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data

Responsibilities :-
 Communicate with all prospective database users, in order to understand their requirement
Identifying the data to be stored in the database
Choosing appropriate structures to represent and store this data
Develop a view of the database that meets the data and processing requirements

Characteristics of the Database Approach


Characteristics of the Database Approach


Self-Describing Nature of a Database System
Insulation between Programs and Data, and Data Abstraction
Support of Multiple Views of the Data
Sharing of Data and Multiuser Transaction Processing

Self-Describing Nature of a Database System


A complete definition or description of the database structure and constraints
DBMS software works equally well with any number of database applications
DBMS catalog stores the description of the database. The description is called meta-data


Insulation between Programs and Data, and Data Abstraction
Program-data independence
Allows changing data storage structures and operations without having to change the DBMS access

programs
Program - operation independence :-
The interface (or signature) of an operation includes the operation name and the data types of its

arguments (or parameters).
The implementation (or method) of the operation is specified separately and can be changed without affecting the interface.
Data Abstraction
A data model is used to hide storage details and present the users with a conceptual view of the database

Support of Multiple Views of the Data


A database typically has many users, each of whom may require a different perspective or view of the database.
A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored
A multiuser DBMS whose users have a variety of applications must provide facilities for defining multiple views

Sharing of Data and Multiuser Transaction Processing

Allow multiple users to access the database at the same time
Concurrency control
Allowing a set of concurrent users to retrieve and to update the database.
Concurrency control within the DBMS guarantees that each transaction is correctly executed or completely aborted.
OLTP (Online Transaction Processing) is a major part of database applications.

What is Database Management System?


What is Database Management System?

  • A database management system (DBMS) is a collection of programs that enables users to create and maintain a database.
  • The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications.
What is Database Management System?
  • Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.
Example of a Database (with a Conceptual Data Model)
Mini-world for the example: Part of a UNIVERSITY environment.
Some mini-world entities:
STUDENTs
COURSEs
SECTIONs (of COURSEs)
(academic) DEPARTMENTs
INSTRUCTORs

Example of a Database (with a Conceptual Data Model)
Some mini-world relationships:
SECTIONs are of specific COURSEs
STUDENTs take SECTIONs
COURSEs have prerequisite COURSEs
INSTRUCTORs teach SECTIONs
COURSEs are offered by DEPARTMENTs
STUDENTs major in DEPARTMENTs

Note: The above could be expressed in the ENTITY-RELATIONSHIP data model.

What is a Database?


What is a Database?

  • A database is a collection of related data
  • A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
Data & Mini World
  • Data: Known facts that can be recorded and have an implicit meaning.
  • Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university
Types of Databases and Database Applications
  • Numeric and Textual Databases
  • Multimedia Databases
  • Geographic Information Systems (GIS)
  • Data Warehouses
  • Real-time and Active Databases