History of Database Systems

Data are raw facts that constitute the building blocks of information. A database is a collection of information together with a means to manipulate the data in a useful way; it must provide proper storage for large amounts of data, easy and fast access, and facilities for processing the data. A Database Management System (DBMS) is a set of software used to define, store, manipulate, and control the data in a database. From the early flat-file systems to relational and object-relational systems, database technology has gone through several generations in its 40-year history.



 The Evolution of the Database
 Ancient History: Data are not stored on disk; the programmer defines both the logical data structure and the physical structure, such as the storage structure, access methods, and I/O modes. There is one data set per program, so data redundancy is high. There is no persistence; random access memory (RAM) is expensive and limited, and programmer productivity is low.
1968 File-Based: The predecessor of the database. Data are maintained in flat files. Processing characteristics are determined by the common use of magnetic tape as the storage medium.
  • Data are stored in files, with an interface between programs and files. A mapping exists between logical files and physical files; one file corresponds to one or several programs
  • Various access methods exist, e.g., sequential, indexed, random
  • Requires extensive programming in a third-generation language such as COBOL or BASIC.
  • Limitations:
    • Separation and isolation: Each program maintains its own set of data; users of one program may not be aware of data held or blocked by other programs.
    • Duplication: The same data is held by different programs, which wastes space and resources.
    • High maintenance costs, such as ensuring data consistency and controlling access
    • Sharing granularity is very coarse
    • Weak security
1968-1980 Era of non-relational database: A database provides an integrated and structured collection of stored operational data that can be used or shared by application systems. The prominent hierarchical database system was IMS, IBM's first DBMS. The prominent network database model was the CODASYL DBTG model; IDMS was the most popular network DBMS.
 Hierarchical data model
  • Mid-1960s: Rockwell partnered with IBM to create the Information Management System (IMS); IMS DB/DC led the mainframe database market in the 1970s and early 1980s.
  • Based on trees (a parent may have any number of children, not just two). Logically represented by an upside-down tree, with a one-to-many relationship between parent and child records.
  • Efficient searching; less redundant data; data independence; database security and integrity
  • Disadvantages:
    • Complex implementation
    • Difficult to manage and lacking standards; for example, it is hard to add empty nodes, and many-to-many relationships cannot be handled easily.
    • Lacks structural independence, which adds to the complexity of application programming and use.
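
The parent-child structure described above can be illustrated with a minimal Python sketch. It is not how IMS is implemented; the `Record` class, the `path_to` helper, and the sample department and employee names are all hypothetical. It only shows that each record has exactly one parent, which is why many-to-many relationships do not fit naturally.

```python
class Record:
    """One node in a toy hierarchical database: at most one parent, many children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # One-to-many only: a child that already has a parent cannot get a second one.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child


def path_to(record):
    """Return the access path from the root down to the given record."""
    names = []
    node = record
    while node is not None:
        names.append(node.name)
        node = node.parent
    return list(reversed(names))


# Hypothetical sample data: departments own employees.
sales = Record("Sales")
research = Record("Research")
alice = sales.add_child(Record("Alice"))

print(path_to(alice))  # ['Sales', 'Alice']
```

Calling `research.add_child(alice)` raises `ValueError` in this sketch. To place Alice under both departments, the record would have to be duplicated, which is the many-to-many limitation listed above.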
 Network data model
  • Early 1960s: Charles Bachman developed the first DBMS, the Integrated Data Store (IDS), at General Electric (whose computer business was later acquired by Honeywell)
  • The model was standardized in 1971 by the CODASYL group (Conference on Data Systems Languages)
  • Directed graph with nodes (records) and edges (relationships)
  • Identified three database components: network schema (database organization); subschema (views of the database per user); data management language (low-level and procedural)
  • Each record can have multiple parents:
    • Composed of set relationships; a set represents a one-to-many relationship between the owner and the members
    • Each set has an owner record and member records
    • A member may have several owners
  • Main problems: system complexity, which makes it difficult to design and maintain; lack of structural independence
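
The owner/member sets described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CODASYL data management language: the `NetworkDatabase` class, the set names `WORKS_IN` and `ASSIGNED_TO`, and the sample records are hypothetical. The sketch assumes a record has at most one owner per set type but may take part in several set types, which is how one record ends up with several owners.

```python
class NetworkDatabase:
    """Toy network-model store: each named set links one owner to many members."""

    def __init__(self):
        self._members = {}  # (set_name, owner) -> list of member records
        self._owner = {}    # (set_name, member) -> owner record

    def connect(self, set_name, owner, member):
        key = (set_name, member)
        # Assumption of this sketch: one owner per member within a given set type.
        if key in self._owner:
            raise ValueError(f"{member} already has an owner in set {set_name}")
        self._owner[key] = owner
        self._members.setdefault((set_name, owner), []).append(member)

    def members(self, set_name, owner):
        """Navigate from an owner to its members within one set."""
        return list(self._members.get((set_name, owner), []))

    def owners(self, member):
        """Collect every owner of a record, keyed by set name."""
        return {s: o for (s, m), o in self._owner.items() if m == member}


# Hypothetical sample data.
db = NetworkDatabase()
db.connect("WORKS_IN", "Sales", "Alice")
db.connect("WORKS_IN", "Sales", "Bob")
db.connect("ASSIGNED_TO", "Apollo", "Alice")

print(db.members("WORKS_IN", "Sales"))  # ['Alice', 'Bob']
print(db.owners("Alice"))               # {'WORKS_IN': 'Sales', 'ASSIGNED_TO': 'Apollo'}
```

Even in this small sketch the application has to know the set names and navigate them explicitly, which hints at the lack of structural independence noted above.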

The distinction between storing data in files and storing it in databases is that databases are intended to be used by multiple programs and types of users.

1970-present Era of relational database and Database Management System (DBMS): Based on the relational model and its formal query languages (relational algebra and relational calculus). A database is a shared collection of logically related data and a description of this data, designed to meet the information needs of an organization. The system catalog (metadata) provides a description of the data to enable program-data independence. Logically related data comprises the entities, attributes, and relationships of an organization's information. Data abstraction provides a view level, a way of presenting data to a group of users, and a logical level, how the data is understood when writing queries.

  • 1970: Ted Codd at IBM's San Jose Lab proposed the relational model.
  • Two major projects started, and both were operational in the late 1970s
    • INGRES at the University of California, Berkeley became commercial and was followed by POSTGRES, which was incorporated into Informix.
    • System R at IBM's San Jose Lab later evolved into DB2, which became one of the first DBMS products based on the relational model. (Oracle produced a similar product just prior to DB2.)
  • 1976: Peter Chen defined the Entity-Relationship (ER) model
  • 1980s: Maturation of relational database technology; more relational DBMSs were developed, and the SQL standard was adopted by ISO and ANSI.
  • 1985: Object-oriented DBMSs (OODBMS) develop. Little commercial success, because the advantages did not justify the cost of converting billions of bytes of data to a new format.
  • 1990s: Incorporation of object orientation in relational DBMSs; new application areas such as data warehousing and OLAP, the web and the Internet, text and multimedia, enterprise resource planning (ERP), and manufacturing resource planning (MRP)
    • 1992: Microsoft ships Access, a personal DBMS created as an element of Windows, which gradually supplanted all other personal DBMS products.
    • 1995: First Internet database applications
    • 1997: XML is applied to database processing, addressing long-standing database problems. Major vendors begin to integrate XML into DBMS products.
  
Relational DBMS at a glance:
Fundamental Relational Database Characteristics
  • The internal structure of an operating database is basically fixed in the "row" direction
  • The user interacts with a logical view of the data and need not know anything about the actual internal structure.
Database Schema (the description of the user data in the database)
  • Conceptual schema: logically describes all data in the database
  • Internal schema (physical schema): describes how the data are actually stored.
  • External schema (user view): describes the data that are of interest to a user.
DBMS Functions
  • Data dictionary management
  • Data storage management
  • Data transformation and presentation
  • Security management
  • Multi-user access control
  • Backup and recovery management
  • Data integrity management
  • Database language and application programming interfaces
  • Database communication interfaces
Database Approach
  • Data definition language (DDL): defines database schemas
  • Data manipulation language (DML): retrieves, inserts, deletes, and updates data in the database. Query languages are part of the DML
  • Data control language (DCL): controls access to the data.
Advantages and disadvantages of DBMSs
Advantages:
  • Control of data redundancy, consistency, abstraction, sharing
  • Improved data integrity, security, enforcement of standards, and economy of scale.
  • Balancing of conflicting requirements
  • Improved data accessibility, responsiveness, maintenance
  • Increased productivity, concurrency, and backup and recovery services.
Disadvantages:
  • Complexity, size, and cost of DBMSs
  • Higher impact of a failure
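
The difference between DDL and DML, and the idea of an external schema (user view), can be made concrete with a short sketch using Python's built-in `sqlite3` module. The `employee` table, the `employee_directory` view, and the sample rows are hypothetical. DCL statements such as GRANT and REVOKE are omitted because SQLite does not implement them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema (table and view names are made up for this example).
conn.executescript("""
CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL,
    salary REAL NOT NULL
);
-- External schema (user view): exposes names and departments, hides salaries.
CREATE VIEW employee_directory AS
    SELECT name, dept FROM employee;
""")

# DML: insert, update, and delete rows.
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Alice", "Sales", 5000.0), ("Bob", "Sales", 4500.0), ("Carol", "Research", 6000.0)],
)
conn.execute("UPDATE employee SET dept = ? WHERE name = ?", ("Research", "Bob"))
conn.execute("DELETE FROM employee WHERE name = ?", ("Alice",))

# DML (query): the user reads through the view, not the underlying table.
directory = conn.execute(
    "SELECT name, dept FROM employee_directory ORDER BY name"
).fetchall()
print(directory)  # [('Bob', 'Research'), ('Carol', 'Research')]
```

The query against `employee_directory` never mentions how rows are stored, which is the program-data independence the schema levels are meant to provide.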

  • The main players:
    • Microsoft Corp - SQL Server
    • Oracle - Oracle 9i
    • IBM - IMS/DB, DB2
Relational companies were challenged by "object-oriented DB" companies and countered with "object-relational" systems, which retain the relational core while allowing type extension as in OO systems.
 Advanced database technology, along with the Internet, has provided faster communication and worldwide connectivity; ubiquitous publishing seems to have led to information overload, and still, I can't find a thing!
