Asia/Jakarta
Posts

Understanding Database Fundamentals: From Tables to Transactions

Understanding Database Fundamentals: From Tables to Transactions
August 16, 2024
Storing data in a plain text file works fine for 100 records. Making that same system handle millions of records with fast queries, zero data loss, and multiple users writing simultaneously — that's where databases begin. Relational Databases bridge this gap — they're structured storage systems designed to organize data into tables, enforce relationships, and guarantee data integrity. In Database Fundamentals, we explore the core concepts that power every modern application: from a simple blog to a large-scale e-commerce platform. This article covers the essential database concepts: from data modeling and normalization to SQL queries and transaction management.
+----------------+     +----------------+     +----------------+
|    Data        |     |    Query       |     | Transaction    |
|   Modeling     |     |  & Retrieval   |     |  & Integrity   |
|                |     |                |     |                |
| Tables &       |---->| SQL SELECT /   |---->| ACID           |
| Relations      |     | JOIN / GROUP   |     | Properties     |
|                |     |                |     |                |
| Normalization  |     | Index &        |     | Concurrency    |
| ERD            |     | Optimization   |     | Control        |
+----------------+     +----------------+     +----------------+

A database organizes data into tables (also called relations), where each table represents a single entity. Each row is a record, and each column is an attribute.
Sql
-- A simple table representing a Student entity
CREATE TABLE students (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    name        VARCHAR(100) NOT NULL,
    email       VARCHAR(100) UNIQUE NOT NULL,
    birthdate   DATE,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
One-to-One
  One student has one student card
  students -------- student_cards

One-to-Many
  One teacher teaches many students
  teachers -------- students
              |---- student_1
              |---- student_2
              |---- student_N

Many-to-Many
  Many students enroll in many courses
  students ---- enrollments ---- courses
Sql
-- One-to-Many: courses belong to a department
CREATE TABLE departments (
    id    INT PRIMARY KEY AUTO_INCREMENT,
    name  VARCHAR(100) NOT NULL
);

CREATE TABLE courses (
    id            INT PRIMARY KEY AUTO_INCREMENT,
    name          VARCHAR(100) NOT NULL,
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(id)
);

-- Many-to-Many: students enroll in courses
CREATE TABLE enrollments (
    student_id  INT,
    course_id   INT,
    enrolled_at DATE,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(id),
    FOREIGN KEY (course_id)  REFERENCES courses(id)
);

Normalization is the process of organizing a database to reduce data redundancy and improve data integrity. It is divided into Normal Forms (NF).
order_id | customer | email          | products
---------|----------|----------------|------------------
1        | Afrizal  | a@mail.com     | Laptop, Mouse
2        | Afrizal  | a@mail.com     | Keyboard
3        | Budi     | b@mail.com     | Monitor, Mouse

Problems: duplicate customer data, multiple values in one column
Sql
-- First Normal Form: Eliminate repeating groups, one value per cell
-- Second Normal Form: Remove partial dependencies, split into tables
-- Third Normal Form: Remove transitive dependencies

CREATE TABLE customers (
    id    INT PRIMARY KEY,
    name  VARCHAR(100),
    email VARCHAR(100) UNIQUE
);

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

CREATE TABLE order_items (
    order_id   INT,
    product_id INT,
    quantity   INT,
    PRIMARY KEY (order_id, product_id)
);
| Normal Form | Rule | Problem Solved | |-------------|------|----------------| | First NF | Each column has atomic values | Repeating groups | | Second NF | No partial dependency on PK | Redundant data in composite keys | | Third NF | No transitive dependency | Indirect column dependencies | | BCNF | Every determinant is a candidate key | Anomalies in third NF edge cases |
SQL (Structured Query Language) is the standard language for interacting with relational databases.
Sql
-- CREATE
INSERT INTO students (name, email, birthdate)
VALUES ('Afrizal', 'afrizal@mail.com', '2003-05-12');

-- READ
SELECT * FROM students WHERE name = 'Afrizal';

-- UPDATE
UPDATE students SET email = 'new@mail.com' WHERE id = 1;

-- DELETE
DELETE FROM students WHERE id = 1;
Sql
-- INNER JOIN: only matching rows from both tables
SELECT students.name, courses.name AS course
FROM enrollments
INNER JOIN students ON enrollments.student_id = students.id
INNER JOIN courses  ON enrollments.course_id  = courses.id;

-- LEFT JOIN: all rows from left table, matched rows from right
SELECT students.name, enrollments.course_id
FROM students
LEFT JOIN enrollments ON students.id = enrollments.student_id;
Sql
-- Count students per department
SELECT departments.name, COUNT(students.id) AS total_students
FROM students
INNER JOIN departments ON students.department_id = departments.id
GROUP BY departments.name
HAVING COUNT(students.id) > 10
ORDER BY total_students DESC;

Without indexes, every query scans the entire table — fine for 100 rows, catastrophic for 10 million rows.
Sql
-- Single column index
CREATE INDEX idx_students_email ON students(email);

-- Composite index for queries filtering by multiple columns
CREATE INDEX idx_enrollments_student_course
ON enrollments(student_id, course_id);
| Condition | Rows Scanned | Query Time | |-----------|-------------|------------| | No index (full table scan) | 1,000,000 | ~800ms | | With single index | ~100 | ~2ms | | With composite index | ~1 | less than 1ms |
Sql
-- Check how MySQL executes a query
EXPLAIN SELECT * FROM students WHERE email = 'afrizal@mail.com';

-- Output:
-- type: ref  means index is used  (GOOD)
-- type: ALL  means full table scan (BAD)
-- rows: 1    means only one row scanned (GOOD)

A transaction is a sequence of operations treated as a single unit. ACID properties guarantee reliability even when things go wrong.
A — Atomicity    : All operations succeed, or none of them do
C — Consistency  : Data always moves from one valid state to another
I — Isolation    : Concurrent transactions do not interfere with each other
D — Durability   : Committed data survives system crashes
Sql
-- Transfer balance between two accounts
-- Without transaction: a crash halfway leaves inconsistent data
-- With transaction: either both operations complete, or neither does

START TRANSACTION;

UPDATE accounts SET balance = balance - 500000 WHERE id = 1;
UPDATE accounts SET balance = balance + 500000 WHERE id = 2;

-- If both succeed
COMMIT;

-- If anything fails
ROLLBACK;

Using the wrong data type wastes storage and slows queries:
Sql
-- Bad: storing numbers as strings
CREATE TABLE products (
    price VARCHAR(20)
    -- BAD: cannot do math operations, wastes space
);

-- Good: use appropriate numeric types
CREATE TABLE products (
    price  DECIMAL(10, 2),
    -- GOOD: exact decimal for money
    stock  INT UNSIGNED,
    -- GOOD: no negative stock values
    rating TINYINT
    -- GOOD: small range (1-5), saves space
);
Sql
-- Bad: 1 query to get all students + N queries for each enrollment
SELECT * FROM students;
-- then for each student:
SELECT * FROM enrollments WHERE student_id = ?;
-- this runs N times, once per student

-- Good: 1 query using JOIN
SELECT students.name, COUNT(enrollments.course_id) AS total_courses
FROM students
LEFT JOIN enrollments ON students.id = enrollments.student_id
GROUP BY students.id;
Opening a new database connection for every request is expensive. Connection pooling reuses existing connections:
Python
# Using SQLAlchemy connection pool in Python
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://user:pass@localhost/dbname",
    pool_size=10,       # maintain 10 persistent connections
    max_overflow=20,    # allow up to 20 extra connections
    pool_timeout=30     # wait max 30s for an available connection
)

The journey from a flat text file to a well-structured relational database requires careful thinking at every stage. Normalization, proper indexing, and ACID transactions are what separate a database that works in development from one that holds up in production. The key lesson from database fundamentals: structure is everything. A poorly designed schema that works for 1,000 users will completely fall apart at 1,000,000. Design for relationships first, optimize for performance second, and never skip transactions when data integrity matters.
On this page