Data Engineering -Day1:

5 min readOct 24, 2024

We will start with core module that is Databases, so lets understand what are databases.

What is a Database and what are its types?

A database is an organized collection of structured information or data, typically stored electronically in a computer system. Databases are managed by a Database Management System (DBMS), which allows users and applications to interact with the data efficiently

Key Features of Databases:

Data Storage: Databases store data in a structured format, making it easy to retrieve, manage, and update information
Data Retrieval: They enable quick and efficient access to data through queries, often using languages like SQL (Structured Query Language)
Data Management: Databases support various operations such as inserting, updating, deleting, and searching data
Data Integrity: They ensure the accuracy and consistency of data through constraints and rules
Security: Databases provide mechanisms to control access and protect data from unauthorized users

Types of Databases:

Relational Databases: Use tables to store data and SQL for querying (e.g., MySQL, PostgreSQL)
NoSQL Databases: Use various data models like key-value pairs, documents, or graphs (e.g., MongoDB, Cassandra)
In-Memory Databases: Store data in memory for faster access (e.g., Redis)
Cloud Databases: Hosted on cloud platforms, offering scalability and flexibility (e.g., Amazon RDS, Google Cloud Spanner)

Databases are essential for modern applications, enabling efficient data management and supporting various business and technological needs.

What is DBMS & RDBMS ?

Database Management System (DBMS) is software that allows you to create, manage and manipulate databases. It provides tools to store, retrieve and manage data efficiently. You can think of it as a digital filing cabinet where you can organize and access your data easily.

Relational Database Management System (RDBMS)

A Relational Database Management System (RDBMS) is a type of DBMS that organizes data into tables (or relations). Each table consists of rows and columns, similar to a spreadsheet. The key feature of an RDBMS is that it uses a structured query language (SQL) to interact with the data and supports relationships between tables.

Key Differences:

Data Structure:

DBMS: Can store data in various formats such as files, hierarchical or network models.
RDBMS: Specifically stores data in tabular form with predefined relationships between tables.

Schema:
DBMS: May not enforce a strict schema allowing for more flexibility but less consistency.
RDBMS: Enforces a strict schema, ensuring data integrity and consistency.

Transactions:
DBMS: May not fully support ACID (Atomicity, Consistency, Isolation, Durability) properties.
RDBMS: Fully supports ACID properties, making it reliable for complex transactions.

Scalability:
DBMS: Typically scales vertically (adding more power to a single server).
RDBMS: Can scale both vertically and horizontally (adding more servers), though horizontal scaling can be more complex.

Use Cases:
DBMS: Suitable for smaller, less complex applications where data relationships are not crucial.
RDBMS: Ideal for applications requiring complex queries, transactions, and data integrity, such as financial systems and enterprise applications.

What is the difference between Relational Databases and NoSQL databases?

Data Model:
Relational Databases (SQL): Use a tabular data model where data is organized into tables with rows and columns. Each table has a predefined schema, and relationships between tables are established using foreign keys.
NoSQL Databases: Use various data models, including key-value pairs, documents, wide-column stores, and graphs. They are schema-less, allowing for more flexible and dynamic data structures.

Schema:
Relational Databases (SQL): Have a fixed schema that must be defined before data can be inserted. This ensures data integrity and consistency.
NoSQL Databases: Have a dynamic schema, allowing for the storage of unstructured or semi-structured data. This flexibility makes it easier to handle varying data types.

Scalability:
Relational Databases (SQL): Typically scale vertically by adding more power to a single server (CPU, RAM). Horizontal scaling (adding more servers) is possible but more complex.
NoSQL Databases: Designed for horizontal scaling, making it easier to distribute data across multiple servers. This is ideal for handling large volumes of data and high-traffic applications.

Transactions:
Relational Databases (SQL): Support ACID (Atomicity, Consistency, Isolation, Durability) transactions, ensuring reliable and consistent transactions.
NoSQL Databases: Often prioritize availability and partition tolerance over strict consistency, following the BASE (Basically Available, Soft state, Eventual consistency) model.

Use Cases:
Relational Databases (SQL): Best suited for applications requiring complex queries, transactions, and data integrity, such as financial systems, ERP, and CRM.
NoSQL Databases: Ideal for applications needing to handle large volumes of unstructured data, real-time analytics, and rapid development cycles, such as social media platforms, IoT, and big data applications.

What are Transactions?

A transaction in a database context is a sequence of operations performed as a single logical unit of work. These operations can include reading, writing, updating or deleting data. The key aspect of a transaction is that it must be completed entirely or not at all ensuring the database remains in a consistent state.

Concept of ACID Properties:

ACID is an acronym that stands for Atomicity, Consistency, Isolation and Durability. These properties ensure that database transactions are processed reliably and maintain data integrity. Let’s break down each property:

Atomicity:

Ensures that all operations within a transaction are completed successfully. If any part of the transaction fails, the entire transaction is rolled back, leaving the database unchanged.
Example: If you transfer money from one bank account to another, both the debit and credit operations must succeed. If one fails, neither should be applied.

Consistency:

Ensures that a transaction brings the database from one valid state to another, maintaining all predefined rules and constraints.
Example: If a transaction violates a database constraint (like a unique key), it will be rolled back to maintain consistency.

Isolation:

Ensures that transactions are executed in isolation from one another. Intermediate states of a transaction are not visible to other transactions until the transaction is complete.
Example: If two users are updating the same data simultaneously, isolation ensures that their transactions do not interfere with each other.

Durability:

Ensures that once a transaction is committed, it remains so, even in the event of a system failure.
Example: Once a bank transfer is confirmed, the changes are permanent, even if the system crashes immediately afterward.

These properties are crucial for maintaining the reliability and integrity of databases especially in applications that require high levels of data accuracy and consistency such as financial systems.

To continue learning and get updates on my next free Blog on Data Engineering Day-2 please follow and like since lot of topics are coming up every day to learn. Happy Learning!!!!!