What IS a Database, anyway?

The term "database" gets thrown around a lot. But what exactly does it mean?

The term “database” gets thrown around a lot. But what exactly does it mean? Usually, programmers are referring to a specific type of database called “relational”. A newer, but rapidly growing, type is a “document” database. In this article, I will be describing Relational databases.

Basically, a relational database is a computer version of a filing cabinet plus a library card-catalog. Information is described as “records” equivalent to a single paper document stored in “tables” (file folders) within a “schema” (file cabinet). Related documents are cross-referenced via an “index” (card catalog).

The computer files are managed by software products known as DBMS, for Database Management System. Two major relational DBMS vendors are Microsoft SQL Server and MySQL. (A couple of document DBMS vendors are Redis and MongoDB.) The industry has agreed upon a set of common commands called SQL, for Structured Query Language, as a consistent way to read and update files in a relational database. All major relational DBMS vendors support basic SQL commands.


A relational database is known for its…wait for it…relationships. The defining feature is that it tries to formally describe what people usually understand intuitively. We all know what we mean by “the Marketing employees” of a company. But a computer has to have that spelled out in great detail.

For instance: “One department has many employees” but “An employee is a member of one and only one department”. The one-to-many relationships can then be expressed by having each “employee” record contain the identifier for that employee’s “department”. In the two tables below, can you find which employees are in the Marketing department?

Departments

Dept IDDepartment Name
10Marketing
20Administration
30Payroll
40Accounting
50Operations

Employees

Employee IDDept IDEmployee Name
110John
220Jane
330Robert
410Mary
520Michael

A later SQL command can then ask to SELECT EMPLOYEE-NAME FROM EMPLOYEES WHERE DEPT-ID = 10 to get a list of all employees in the Marketing department. In the examples above, it would find John and Mary, but not Jane or Robert or Michael.

Why use “Dept ID” in the Employees table? Why not just say “Marketing” or “Administration”? There are two main reasons.

  • One, there is probably more information to be kept about each Department. For instance, who is its Manager? Where is its Office located? What are its Business Hours? We need a home for that extra information.
  • Two, trying to consistently type the same word or phrase into multiple records is prone to error. If one Employee is put into the “Markting” Department, they will get left out of queries that look for all “Marketing” employees.
    • Similarly, if the Department name changes from “Marketing” to “Sales”, all the employees would have to be changed. It’s much easier to just change the name associated with record ID #10 in a single Department record.

Almost every database table will have an ID that is unique to that one specific record. Normally, it is a meaningless sequential number not to be confused with any of the “real” information being stored about the employee, department, etc. This explains most of those Account numbers and Confirmation numbers that we encounter all around us.


A large database trying to model complex real-world relationships can grow to hundreds of these Tables related to one another. Here are some examples:

  • A college offers Courses, scheduled as Classes, attended by Students and taught by Professors, organized into Degree programs.
  • A retail chain has Stores who hire Employees, and that sell Products, which are purchased from Suppliers and stored in Warehouses.
  • A Restaurant may have multiple Menus, each with Dishes that contain Ingredients that come from Suppliers.

But wait, there’s more!

  • A Class has multiple Students
    • but a Student may attend multiple Classes.
  • A Store has multiple Products
    • but the same Product may be sold in multiple Stores.
  • A restaurant Dish has multiple Ingredients
    • but the same Ingredient may be used in multiple Dishes.

These “many-to-many” relationships tell us that there is more information to be found.

At the intersection of two Tables, we can look for “What grade did this particular Student make in this specific Class?” or “How many of this particular Product was sold by this specific Store?” or “How much of this particular Ingredient is used in this specific Dish?”.

Now we end up with new Tables like the two below. What classes did Mary take? What grade did she make in Reading? Which students took Writing class? What products were sold in the Downtown store? How much bread was sold in all stores put together (add up the Quantities)?

Report Cards

ClassStudentGrade
ReadingMaryA
WritingJohnB
ArithmeticMaryC
WritingJohnA+
SELECT GRADE FROM REPORT-CARDS WHERE STUDENT = “Mary” AND CLASS = “Reading”

Product Sales

StoreProductQuantity
DowntownCereal100
EastsideBread200
DowntownBread300
WestsideMilk250
SELECT SUM(QUANTITY) FROM PRODUCT-SALES WHERE PRODUCT = “Bread”

By combining this idea of “Junction” tables with the original “Master” tables and their unique ID numbers, we end up with the three tables below for our Classes, Students, and Report Cards. What does the SQL statement below the Report Card table tell us?*

Classes

Class IDClass Name
1Reading
2Writing
3Arithmetic
4Recess

Students

Student IDStudent Name
1Mary
2John
3Jane
4Michael

Report Cards

Class IDStudent IDGrade
11A
24B
31C
42A+
SELECT GRADE FROM REPORT-CARDS WHERE CLASS-ID = 4 AND STUDENT-ID = 2

Database design is both art and science, and one of my favorite things to do. Try it for yourself: As you go about your day, identify the objects around you and their one-to-many or many-to-many relationships. I have an annoying tendency to look over the cashier’s shoulder in a store, watching how their register screen is displaying and trying to guess what tables are behind it!

When all the objects and their relationships are defined correctly, the database will be able to answer any business query that it is given. When a query cannot be answered, it’s probably a sign that something in the database does not yet reflect the real world accurately.


*John is very good at recess!

One Response

Leave a Reply

Your email address will not be published. Required fields are marked *