Data are raw facts that constitute the building blocks of information. A database is a collection of information together with a means of manipulating the data in a useful way; it must provide proper storage for large amounts of data, easy and fast access, and facilities for processing the data. A Database Management System (DBMS) is a set of software used to define, store, manipulate and control the data in a database. From the early flat-file systems to relational and object-relational systems, database technology has gone through several generations in its 40-year history.
The Evolution of the Database
Ancient History: Data are not stored on disk; the programmer defines both the logical data structure and the physical structure, such as storage structure, access methods, I/O modes, etc. One data set per program: high data redundancy. There is no persistence; random access memory (RAM) is expensive and limited, and programmer productivity is low.
1968 File-Based: The predecessor of the database; data are maintained in flat files. Processing characteristics are determined by the common use of magnetic tape as the storage medium.
- Data are stored in files, with an interface between programs and files. Mapping happens between logical files and physical files; one file corresponds to one or several programs.
- Various access methods exist, e.g., sequential, indexed, random.
- Requires extensive programming in a third-generation language such as COBOL or BASIC.
- Limitations:
  - Separation and isolation: Each program maintains its own set of data; users of one program may not be aware of data held or blocked by other programs.
  - Duplication: The same data are held by different programs, which wastes space and resources.
  - High maintenance costs, such as ensuring data consistency and controlling access
  - Sharing granularity is very coarse
  - Weak security
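The sequential, indexed and random access methods mentioned above can be illustrated with a small sketch. It assumes a hypothetical fixed-length record layout (a 4-byte id followed by a 20-byte name) and uses an in-memory buffer in place of a real file; none of these names come from a specific historical system.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte integer id + 20-byte name.
RECORD = struct.Struct("<i20s")


def write_records(f, rows):
    """Append records one after another, as in a flat file."""
    for rec_id, name in rows:
        f.write(RECORD.pack(rec_id, name.encode("ascii")))


def sequential_find(f, wanted_id):
    """Sequential access: read from the start until the id matches."""
    f.seek(0)
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return None  # reached end of file without a match
        rec_id, raw = RECORD.unpack(chunk)
        if rec_id == wanted_id:
            return raw.rstrip(b"\x00").decode("ascii")


def random_read(f, position):
    """Random access: jump directly to the n-th record by byte offset."""
    f.seek(position * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


def build_index(f):
    """Indexed access: scan once and remember each id's record position."""
    index = {}
    f.seek(0)
    position = 0
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return index
        rec_id, _ = RECORD.unpack(chunk)
        index[rec_id] = position
        position += 1


flat_file = io.BytesIO()
write_records(flat_file, [(7, "Ada"), (3, "Grace"), (9, "Edgar")])
index = build_index(flat_file)

print(sequential_find(flat_file, 3))
print(random_read(flat_file, 2))
print(random_read(flat_file, index[3]))
```

Every program that reads this file has to know the record layout, which is one reason file-based systems tie data so closely to the programs that use them.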
1968-1980 Era of non-relational database: A database provides an integrated and structured collection of stored operational data which can be used or shared by application systems. The prominent hierarchical database system was IMS, IBM's first DBMS. The prominent network database model was the CODASYL DBTG model; IDMS was the most popular network DBMS.
Hierarchical data model
- Mid 1960s: Rockwell partnered with IBM to create the Information Management System (IMS); IMS DB/DC led the mainframe database market in the 1970s and early 1980s.
- Based on a tree structure. Logically represented by an upside-down tree, with one-to-many relationships between parent and child records.
- Advantages: efficient searching; less redundant data; data independence; database security and integrity
- Disadvantages:
  - Complex implementation
  - Difficult to manage and lacking standards; for example, it is hard to add empty nodes, and many-to-many relationships cannot be handled easily.
  - Lacks structural independence, which adds to the complexity of application programming and use.
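The parent-child structure and its trouble with many-to-many relationships can be sketched in a few lines. This is an illustration only: the `Segment` class and the school example are hypothetical and do not reflect the actual IMS interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A record in the hierarchy; each segment has at most one parent."""
    name: str
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        # One-to-many: a parent may own any number of child segments.
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Access path from the root down to this segment."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Segment("School")
alice = root.add_child("Student:Alice")
bob = root.add_child("Student:Bob")

# Many-to-many (students <-> courses) does not fit a tree: the same
# course has to be stored once under each student who takes it.
alice_db = alice.add_child("Course:Databases")
bob_db = bob.add_child("Course:Databases")

print(alice_db.path())
print(alice_db is bob_db)
```

The two `Course:Databases` segments are separate copies, so a change to the course must be applied in both places. Reaching a record also requires knowing its path from the root, which is the lack of structural independence noted above.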
Network data model
- Early 1960s: Charles Bachman developed the first DBMS, the Integrated Data Store (IDS), at General Electric (whose computer business Honeywell later acquired).
- Standardized in 1971 by the CODASYL group (Conference on Data Systems Languages)
- Directed graph with nodes (records) and edges (relationships)
- Identified 3 database components: network schema (database organization); subschema (views of the database per user); data management language (low level and procedural)
- Each record can have multiple parents:
  - Composed of set relationships; a set represents a one-to-many relationship between the owner and the member
  - Each set has an owner record and member records
  - A member may have several owners
- Main problems: system complexity, which makes it difficult to design and maintain; lack of structural independence
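The owner/member sets described above can be sketched as follows. The `NetworkDB` class, the set names and the order-line example are hypothetical, and the rule that a member has only one owner within a given set type is an assumption of this sketch, not a quotation from the CODASYL report.

```python
from collections import defaultdict


class NetworkDB:
    """Toy store of named set types linking owner records to member records."""

    def __init__(self):
        # set name -> owner -> list of members (one-to-many within a set)
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        # Assumption: within one set type, a member has a single owner.
        for existing_owner, members in self.sets[set_name].items():
            if member in members and existing_owner != owner:
                raise ValueError(f"{member} already has an owner in {set_name}")
        if member not in self.sets[set_name][owner]:
            self.sets[set_name][owner].append(member)

    def members_of(self, set_name, owner):
        return list(self.sets[set_name].get(owner, []))

    def owners_of(self, member):
        """A member can have several owners, one per set it belongs to."""
        return {
            set_name: owner
            for set_name, owners in self.sets.items()
            for owner, members in owners.items()
            if member in members
        }


db = NetworkDB()
db.connect("ORDER_LINES", "Order#1", "Line#1")
db.connect("ORDER_LINES", "Order#1", "Line#2")
db.connect("PRODUCT_LINES", "Product:Widget", "Line#1")

print(db.members_of("ORDER_LINES", "Order#1"))
print(db.owners_of("Line#1"))
```

Here `Line#1` is owned by both an order and a product, which a strict hierarchy could not express without duplication. The cost is that the application has to navigate the sets by name, which is where the complexity and the lack of structural independence come from.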
The distinction between storing data in files and storing data in databases is that databases are intended to be used by multiple programs and multiple types of users.
1970-present Era of relational database and Database Management System (DBMS): Based on relational calculus, a database is a shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization. The system catalog (metadata) provides a description of the data to enable program-data independence; the logically related data comprise the entities, attributes, and relationships of an organization's information. Data abstraction provides a view level, a way of presenting data to a group of users, and a logical level, how the data are understood to be when writing queries.
- 1970: Ted Codd at IBM's San Jose Lab proposed the relational model.
- Two major projects started, and both were operational in the late 1970s:
  - INGRES at the University of California, Berkeley became commercial and was followed by POSTGRES, which was incorporated into Informix.
  - System R at IBM's San Jose Lab later evolved into DB2, which became one of the first DBMS products based on the relational model. (Oracle produced a similar product just prior to DB2.)
- 1976: Peter Chen defined the entity-relationship (ER) model.
- 1980s: Maturation of relational database technology; more relational DBMSs were developed, and the SQL standard was adopted by ISO and ANSI.
- 1985: Object-oriented DBMSs (OODBMS) are developed. They had little commercial success because their advantages did not justify the cost of converting billions of bytes of data to a new format.
- 1990s: Incorporation of object orientation into relational DBMSs; new application areas such as data warehousing and OLAP, the web and the Internet, interest in text and multimedia, enterprise resource planning (ERP) and material requirements planning (MRP).
- 1992: Microsoft ships Access, a personal DBMS built for Windows that gradually supplanted all other personal DBMS products.
- 1995: First Internet database applications
- 1997: XML is applied to database processing, addressing some long-standing database problems. Major vendors begin to integrate XML into DBMS products.
Relational DBMS at a glance:
Fundamental Relational Database Characteristics
- The internal structure of an operating database is basically fixed in the "row" direction.
- The user interacts with a logical view of the data and need not know anything about the actual internal structure.
Database Schema (the description of the user data in the database)
- Conceptual schema: logically describes all data in the database
- Internal schema (physical schema): describes how data are actually stored
- External schema (user view): describes the data that are of interest to a particular user
DBMS Functions
- Data dictionary management
- Data storage management
- Data transformation and presentation
- Security management
- Multi-user access control
- Backup and recovery management
- Data integrity management
- Database language and application programming interfaces
- Database communication interfaces
Database Approach
- Data definition language (DDL): defines database schemas
- Data manipulation language (DML): retrieves, inserts, deletes and updates data in the database. Query languages are part of the DML.
- Data control language (DCL): controls access to data
Advantages and disadvantages of DBMSs
- Advantages:
  - Control of data redundancy, consistency, abstraction, sharing
  - Improved data integrity, security, enforcement of standards and economy of scale
  - Balanced conflicting requirements
  - Improved data accessibility, responsiveness, maintenance
  - Increased productivity, concurrency, backup and recovery services
- Disadvantages:
  - Complexity, size, cost of DBMSs
  - Higher impact of a failure
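The data definition and data manipulation languages, the external schema and the data dictionary can be illustrated with a short sketch that uses Python's built-in `sqlite3` module. The `employee` table, the `employee_directory` view and the sample rows are hypothetical. SQLite does not implement DCL statements such as GRANT and REVOKE, so access control is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the conceptual schema (table and columns are made up).
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)

# External schema: a view that exposes only part of the data to a user group.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, dept FROM employee"
)

# DML: insert, update and delete rows.
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [
        ("Ada", "Engineering", 120.0),
        ("Grace", "Engineering", 130.0),
        ("Edgar", "Sales", 90.0),
    ],
)
conn.execute("UPDATE employee SET dept = ? WHERE name = ?", ("Research", "Grace"))
conn.execute("DELETE FROM employee WHERE name = ?", ("Edgar",))

# DML (query): the user works with the logical view, not the storage layout.
directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name"
).fetchall()

# Data dictionary: SQLite keeps its own description of the schema.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

print(directory)
print(catalog)
```

The query against `employee_directory` never mentions salaries or how rows are stored, which is the separation between external, conceptual and internal schemas in miniature.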
- The main players:
  - Microsoft Corp: SQL Server
  - Oracle: Oracle 9i
  - IBM: IMS/DB, DB2
Relational companies were challenged by "object-oriented DB" companies and countered with "object-relational" systems, which retain the relational core while allowing type extension as in OO systems.
Advanced database technology, together with the Internet, has provided faster communication and worldwide connectivity, yet ubiquitous publishing seems to have led to information overload, and still, I can't find a thing!