How to Design a Database

Databases exist because of the need to change data into information. Data are the raw and unprocessed facts. Information is obtained by processing the data into something useful. A database is a large repository of facts, designed in such a way that processing the facts into information is easy.

Database Life Cycle

Like everything else, databases have a finite lifespan. Even the most successful database at some time is replaced by another, more flexible and up-to-date structure. There are generally six stages of the database life cycle:

1. Analysis: The analysis phase is where the stakeholders are interviewed and any existing system is examined to identify problems, possibilities and constraints. The objectives and scope of the new system are determined.

2. Design: The design phase is where a conceptual design is created from the previously determined requirements, and a logical and physical design are created that will ready the database for implementation.

3. Implementation: The implementation phase is where the database management system (DBMS) is installed, the databases are created, and the data are loaded or imported.

4. Testing: The testing phase is where the database is tested and fine-tuned, usually in conjunction with the associated applications.

5. Operation: The operation phase is where the database is working normally, producing information for its users.

6. Maintenance: The maintenance phase is where changes are made to the database in response to new requirements or changed operating conditions (such as heavier load).

Conceptual Design

The design phase is where the requirements identified in the previous phase are used as the basis to develop the new system. A commonly-used conceptual model is called an entity-relationship model.

Entities are basically people, places, or things you want to keep information about. For example, a library system may have the book, library and borrower entities. The following questions can help you to identify whether something is an entity:

  • Can it vary in number independently of other entities? For example, person height is probably not an entity, as it cannot vary in number independently of person. It is not fundamental, so it cannot be an entity in this case.
  • Is it important enough to warrant the effort of maintaining. For example customer may not be important for a small grocery store and will not be an entity in that case, but it will be important for a video store, and will be an entity in that case.
  • Is it its own thing that cannot be separated into subcategories? For example, a car-rental agency may have different criteria and storage requirements for different kinds of vehicles. Vehicle may not be an entity, as it can be broken up into car and boat, which are the entities.
  • Does it list a type of thing, not an instance? The video game is not an entity, rather an instance of the game entity.
  • Does it have many associated facts? If it only contains one attribute, it is unlikely to be an entity. For example, city may be an entity in some cases, but if it contains only one attribute, city name, it is more likely to be an attribute of another entity, such as customer.

The following are examples of entities involving a university with possible attributes in parentheses.

  • Course (name, code, course prerequisites)
  • Student (first_name, surname, address, age)
  • Book (title, ISBN, price, quantity in stock)

Relationships: Entities are related in certain ways. For example, a borrower may belong to a library and can take out books. A book can be found in a particular library. There are a number of possible relationships (Click here for meaning of terms):

  • Mandatory
  • Optional
  • One-to-one (1:1)
  • One-to-many (1:M)
  • Many-to-many (M:N)

An entity can also have a relationship with itself. Such an entity is called a recursive entity. Take a person entity. If you're interested in storing data about which people are brothers, you wlll have an "is brother to" relationship. In this case, the relationship is a M:N relationship.

The term connectivity refers to the relationship classification.

The term cardinality refers to the specific number of instances possible for a relationship. Cardinality limits list the minimum and maximum possible occurrences of the associated entity. In the husband and wife example, the cardinality limit is (1,1), and in the case of a student who can take between one and eight courses, the cardinality limits would be represented as (1,8).

How To Develop Entity Relationship Model

An entity-relationship diagram models how the entities relate to each other. In general, these entities go on to become the database tables. The first step in developing the diagram is to identify all the entities in the system. In the initial stage, it is not necessary to identify the attributes. Once the entities are listed, relationships between these entities are identified and modelled according to their type: one-to-many, optional and so on.

The next stage is to replace many-to-many relationships with two one-to-many relationships. A DBMS cannot directly implement many-to-many relationships, so they are decomposed into two smaller relationships. To achieve this, you have to create an intersection, or composite entity type. You can name them according to the two entities being intersected. For example, you can intersect the many-to-many relationship between student and course by a student-course entity.

The same applies even if the entity is recursive. The person entity that has an M:N relationship "is brother to" also needs an intersection entity. The intersection entity in this case would contain two fields, one for each person of the brother relationship - the primary key of the first brother and the primary key of the other brother.

Logical and Physical Design

Once the conceptual design is finalised, it is time to convert this to the logical and physical design. Each entity will become a database table, and each attribute will become a field of this table. 

Foreign keys are created, if required. If the relationship is mandatory, the foreign key must be defined as NOT NULL, and if it's optional, the foreign key can allow nulls.

The ON DELETE CASCADE and ON DELETE RESTRICT clauses are used to support foreign keys. ON DELETE RESTRICT means that records cannot be deleted unless all records associated with the foreign key are also deleted. 

Normalising tables is an important step when designing the database. This process helps avoid data redundancy and improves your data integrity.

Common Errors in Database Design

  1. Keep unrelated data in different tables.
  2. Don't store values you can calculate. It wastes space and can easily be calculated if you need it.
  3. Design should cater to all the conditions you have analysed.
  4. Attributes, which are about to become field names, should be well chosen and clearly named.
  5. Don't create too many relationships. Conversely, check whether you have created necessary relations correctly.
  6. Don't combine fields that should be separate. For example, combining first name and surname into one field is a common mistake.
  7. Every table should get a primary key. Creating a system-defined primary key ensures it will always be unique.
  8. Place foreign keys correctly. In a one-to-many relationship, the foreign key appears in the many table, and the associated primary key in the one table. Mixing these up can cause errors.
  9. Ensure referential integrity. Foreign keys should not relate to a primary key in another table that no longer exists.