Databases are not merely storage systems; they should accurately represent an enterprise’s current state. This means that all users, irrespective of their interactions, should view and engage with the same data simultaneously seeing the results of each other’s interactions without breaking data integrity. This principle is known as data consistency.
Data Consistency: A database’s capability to present a singular, valid, and current version of its data to all users, even during concurrent access and modifications. Successful read queries should reflect the database’s most recent state, while successful writes should immediately influence all subsequent read actions.
The underlying data may be distributed and true consistency may be deferred but the system
Understanding data consistency becomes clearer when examining its breaches. For instance, during early morning hours, I’ve observed my bank’s website displaying the previous day’s pending transactions, but the account balance doesn’t reflect these changes until a couple of hours later. This discrepancy between transaction views and account balances exemplifies data inconsistency. Fortunately, such inconsistencies, in this case, seem to be confined to the web interface, as the system eventually reaches a consistent state.
Ensuring data consistency is straightforward in certain scenarios. By avoiding conditions that might compromise it, consistency is preserved. For example, if only one party generates data and the rest merely access it, the likelihood of conflicts leading to inconsistency is minimal. Delayed queries still provide a consistent, albeit older, state. This is typical in scientific projects, where one lab produces data while others analyze it.
Complexities arise when multiple entities, be they human or digital, access and modify data simultaneously. Maintaining consistency amidst such concurrent interactions becomes challenging. To achieve this, databases might temporarily limit access for some users during another’s transaction or force users to resolve discrepancies before data integration.
Modern relational databases adhere to the ACID model to maintain consistency:
- Atomic
- Consistent
- Isolated
- Durable
Ensuring consistency becomes notably challenging in geographically dispersed systems with distributed data storage, especially when faced with slow or intermittent network connections. Historically, it was believed that data systems spanning vast areas couldn’t maintain consistency. The CAP Theorem suggested that in such systems, there’s an irreconcilable trade-off between system responsiveness (availability) and data consistency.
Traditional relational database systems, like Oracle, MySQL, and others, maintained strong consistency but weren’t tailored for distributed setups. This limitation spurred the rise of NoSQL in the 2000s and 2010s, emphasizing responsiveness in distributed systems, albeit with weaker consistency.
However, recent advancements have bridged this gap. Modern distributed systems, like Spanner and CockroachDB, leverage data replication and consensus algorithms (e.g., Paxos, Raft) to offer high availability while maintaining strict consistency.
DataJoint adheres to the classic ACID consistency model, leveraging serializable transactions or the master-part relationship, detailed further in the “Transactions” section.
[Some sequences of operations must be performed carefully with isolation from outside interventions and must not be left incomplete.
- A = Atomic
- C = Consistent
- I = Isolated
- D = Durable
Transaction serialization: operations performed concurrently but ensuring the same effect if they were executed sequentially.
from faker import Faker
fake = Faker()
import datajoint as dj
schema = dj.Schema("bank")
dj.Diagram(schema)
@schema
class Account(dj.Manual):
definition = """
account_number : int
---
customer_name : varchar(60)
balance : decimal(9, 2)
"""
Account.insert(
dict(
account_number=fake.random.randint(10_000_000, 99_999_999),
customer_name=fake.name(),
balance=fake.random.randint(0, 100_000_00) / 100,
)
for i in range(100)
)
Account()
keys = Account.fetch("KEY")
account1 = fake.random.choice(keys)
account2 = fake.random.choice(keys)
account1, account2
({'account_number': 86719375}, {'account_number': 32673111})
Account & account1
Account.update1(dict(account1, balance=120.00))
def transfer_bad(account1, account2, amount):
current_balance = (Account & account1).fetch1("balance")
if current_balance < amount:
raise RuntimeError("Insufficient funds")
Account.update1(dict(account1, balance=float(current_balance) - amount))
raise RuntimeError
b = (Account & account2).fetch1("balance")
Account.update1(dict(account2, balance=float(b) + amount))
def transfer_good(account1, account2, amount):
conn = dj.conn()
with conn.transaction:
current_balance = (Account & account1).fetch1("balance")
if current_balance < amount:
raise RuntimeError("Insufficient funds")
Account.update1(dict(account1, balance=float(current_balance) - amount))
b = (Account & account2).fetch1("balance")
Account.update1(dict(account2, balance=float(b) + amount))
Account & [account1, account2]
transfer(account1, account2, 100.00)
Account & [account1, account2]
import pymysql
pymysql.install_as_MySQLdb()
%load_ext sql
%config SqlMagic.autocommit=True
%sql mysql://root:simple@127.0.0.1
%%sql
use bank;
SHOW CREATE TABLE account;
%%sql
SELECT * FROM account;
%%sql
BEGIN TRANSACTION;
UPDATE account SET balance = balance + 100
WHERE account = 98230343;
UPDATE account SET balance = balance - 100
WHERE account 95440048;
COMMIT
import pymysql
conn = pymysql.connect(
user="root", host="127.0.0.1", password="simple", autocommit=True
)
cursor = conn.cursor()
cursor.execute(
"""
SELECT balance FROM bank.account
WHERE account_number = %s
""",
(account1["account_number"],),
)
amount = 100
current_balance = cursor.fetchone()
cursor.execute(
"""
UPDATE bank.account
SET balance = balance - %s
WHERE account_number = %s
""",
(
amount,
account1["account_number"],
),
)
cursor.execute(
"""
UPDATE bank.account
SET balance = balance + %s
WHERE account_number = %s
""",
(
amount,
account2["account_number"],
),
)
def transfer(cursor, account1, account2, amount):
cursor.execute("BEGIN TRANSACTION")
try:
cursor.execute(
"""
SELECT balance FROM bank.account
WHERE account_number = %s
""",
(account1["account_number"],),
)
current_balance = cursor.fetchone()
if current_balance < amount:
raise RuntimeError("Insufficient funds")
cursor.execute(
"""
UPDATE shared_bank.account
SET balance = balance - %s
WHERE account_number = %s
""",
(
amount,
account1["account_number"],
),
)
cursor.execute(
"""
UPDATE shared_bank.account
SET balance = balance + %s
WHERE account_number = %s
""",
(
amount,
account2["account_number"],
),
)
except:
cursor.execute("CANCEL TRANSACTION")
raise
else:
cursor.execute("COMMIT")
Design Patterns¶
- Sequence
- workflows
- Specialization / Generalization
- student / faculty / staff
- Hierarchies
- Ownership
- Using composite primary keys
- Secondary keys
- Parameterization *
- Associations
- Many-to-many relationships
- Directed graphs
- Trees
- Undirected graphs
- Master-part
import datajoint as dj
schema = dj.schema("dimitri_patterns")
schema.drop()
schema = dj.schema("dimitri_patterns")
Sequence / Workflows¶
# E.g. order / shipment / confirmation
@schema
class Order(dj.Manual):
definition = """
order_number : int
---
order_date : date
"""
@schema
class Shipment(dj.Manual):
definition = """
-> Order
---
ship_date : date
"""
@schema
class Confirm(dj.Manual):
definition = """
-> Shipment
---
confirm_date : date
"""
dj.Diagram(schema)
Order * Shipment * Confirm
@schema
class Order2(dj.Manual):
definition = """
order_number : int
---
order_date : date
"""
@schema
class Shipment2(dj.Manual):
definition = """
shipment_id : int
---
->[unique] Order2
ship_date : date
"""
@schema
class Confirm2(dj.Manual):
definition = """
confirm_id : int
---
-> [unique] Shipment2
confirm_date : date
"""
dj.Diagram(schema)
Order * Confirm
Order * Shipment * Confirm
@schema
class Subject(dj.Manual):
definition = """
# Experiment Subject
subject_id : int
---
species = 'mouse' : enum('human', 'mouse', 'rat', 'worm')
sex : enum('F', 'M', 'unknown')
"""
Subject()
Subject.insert1(
dict(subject_id=1, species="human", sex="unknown"), skip_duplicates=True
)
Subject.insert1(dict(subject_id=2, species="mouse", sex="F"), skip_duplicates=True)
Subject.insert1(dict(subject_id=3, species="worm", sex="M"), skip_duplicates=True)
@schema
class Session(dj.Manual):
definition = """
-> Subject
session : int
---
session_timestamp = CURRENT_TIMESTAMP : timestamp
"""
Session.insert1(dict(session=1, subject_id=2), skip_duplicates=True)
Session.insert1(dict(session=2, subject_id=2), skip_duplicates=True)
Session.insert1(dict(session=3, subject_id=3), skip_duplicates=True)
Session()
@schema
class Scan(dj.Manual):
definition = """
-> Session
scan_id : int
---
laser_power : float # mW
"""
Scan()
Scan.insert1(dict(subject_id=2, scan_id=1, session=1, laser_power=3200))
@schema
class Cell(dj.Manual):
definition = """
-> Scan
cell_id : int
---
cell_type : enum('E', 'I') # excitatory or inhibitory
"""
dj.Diagram(schema)
# Give me excitatory cells for all males
Cell & (Subject & {"sex": "M"})
@schema
class Subject2(dj.Manual):
definition = """
# Experiment Subject
subject_id : int
---
species = 'mouse' : enum('human', 'mouse', 'rat', 'worm')
sex : enum('F', 'M', 'unknown')
"""
@schema
class Session2(dj.Manual):
definition = """
session : int
---
-> Subject2
session_timestamp = CURRENT_TIMESTAMP : timestamp
"""
@schema
class Scan2(dj.Manual):
definition = """
scan_id : int
---
-> Session2
laser_power : float # mW
"""
@schema
class Cell2(dj.Manual):
definition = """
cell_id : int
---
-> Scan2
cell_type : enum('E', 'I') # excitatory or inhibitory
"""
dj.Diagram(schema)
Cell.insert1(dict(cell_id=1, scan_id=1, cell_type="E"))
Cell()
# Give me cells for subject_id=1
Cell2 & (Scan2 & (Session2 & "subject_id=2"))
# Give me cells for all males
(Cell2 & (Scan2 & (Session2 & (Subject2 & 'sex="M"')))).make_sql()
(Cell & (Subject & 'sex="M"')).make_sql()
dj.Diagram(schema)
Parameterization¶
@schema
class Image(dj.Manual):
definition = """
image_id : int
---
image : longblob
"""
@schema
class EnhanceMethod(dj.Lookup):
definition = """
enhance_method : int
---
method_name : varchar(16)
"""
contents = ((1, "sharpen"), (2, "contrast"))
@schema
class EnhancedImage(dj.Manual):
definition = """
-> Image
-> EnhanceMethod
---
enhanced_image : longblob
"""
dj.Diagram(schema)
@schema
class Book(dj.Manual):
definition = """
isbn : int
---
title : varchar(300)
"""
@schema
class Author(dj.Manual):
definition = """
author_id : int
---
name : varchar(300)
"""
@schema
class AuthorBook(dj.Manual):
definition = """
-> Author
-> Book
---
order : tinyint unsigned
unique index(isbn, order)
"""
dj.Diagram(schema)
Generalization / specialization¶
Employee, student, instructor
@schema
class Person(dj.Manual):
definition = """
person_id : int
---
date_of_birth : date
gender : enum("male", "female", "unknown")
"""
@schema
class Employee(dj.Manual):
definition = """
-> Person
---
hire_date : date
"""
@schema
class Instructor(dj.Manual):
definition = """
-> Employee
---
department : varchar(30)
"""
@schema
class Student(dj.Manual):
definition = """
-> Person
---
admission_date : date
"""
dj.Diagram(schema)
Directed graphs¶
@schema
class Subordinate(dj.Manual):
definition = """
-> Employee
---
-> Employee.proj(manager_id="person_id")
"""
dj.Diagram(schema)
@schema
class Neuron(dj.Manual):
definition = """
neuron : int
"""
@schema
class Synapse(dj.Manual):
definition = """
synapse_id : int
---
-> Neuron.proj(pre="neuron")
-> Neuron.proj(post="neuron")
"""
dj.Diagram(schema)
CREATE TABLE managed_by (
person_id : int NOT NULL,
manager_id : int NOT NULL,
PRIMARY KEY (person_id),
FOREIGN KEY (person_id) REFERENCES employee (person_id),
FOREIGN KEY (manager_id) reference employee (person_id))
Undirected graphs¶
# direcated friendship = full directed graph capability
@schema
class Friendship(dj.Manual):
definition = """
-> Person.proj(friend1 = "person_id")
-> Person.proj(friend2 = "person_id")
"""
dj.Diagram(schema)
@schema
class Order(dj.Manual):
definition = """
order_id : int
---
order_date : date
"""
class Item(dj.Part):
definition = """
-> master
order_item : int
---
"""
Puzzle:¶
Cities and states.
- Each city belongs to one state.
- Each state has one capital.
- A capital is a city.
- A capital must be in the same state.
- Tables
- Primary keys
- Foreign keys
@schema
class State(dj.Manual):
definition = """
st : char(2)
---
state : varchar(30)
"""
State.insert(
(("WA", "Washington"), ("TX", "Texas"), ("AK", "Alaska"), ("LA", "Louisiana"))
)
@schema
class City(dj.Manual):
definition = """
-> State
city_name : varchar(30)
---
capital = null : enum("YES")
unique index(st, capital)
"""
City.delete_quick()
City.insert1(("WA", "Seattle", None))
City.insert1(("TX", "Austin", "YES"))
City.insert1(("TX", "Houston", None))
City.insert1(("WA", "Olympia", "YES"))
City()
@schema
class State2(dj.Manual):
definition = """
state : char (2)
---
state_name : varchar(30)
"""
@schema
class City2(dj.Manual):
definition = """
-> State2
city_name : varchar(30)
"""
@schema
class Capital2(dj.Manual):
definition = """
-> State2
---
-> City2
"""
State2.delete_quick()
City2.delete_quick()
State2.insert(
(("WA", "Washington"), ("TX", "Texas"), ("AK", "Alaska"), ("LA", "Louisiana"))
)
City2.insert1(("WA", "Seattle"))
City2.insert1(("TX", "Austin"))
City2.insert1(("TX", "Houston"))
City2.insert1(("WA", "Olympia"))
dj.Diagram(schema)
Capital2.insert1(("TX", "Austin"))
Capital2.insert1(("TX", "Houston"))
City2()
In SQL¶
CREATE TABLE state (
state char(2) NOT NULL,
state_name varchar(30) NOT NULL,
PRIMARY KEY (state))
CREATE TABLE city (
state char(2) NOT NULL,
city_name varchar(30) NOT NULL,
PRIMARY KEY (state, city_name),
FOREIGN KEY (state) REFERENCES state(state))
CREATE TABLE capital (
state char(2) NOT NULL,
city_name varchar(30) NOT NULL,
PRIMARY KEY (state),
FOREIGN KEY (state, city_name) REFERENCES city (state, city_name))
CREATE TABLE state (
state char(2),
state_name varchar(30),
capital varchar(30),
PRIMARY KEY (state),
FOREIGN KEY (state, capital) REFERENCES city (state, city_name))
CREATE TABLE city (
state char(2),
city_name varchar(30),
PRIMARY KEY (state, city_name))
FOREIGN KEY (state) REFERENCES state(state)
In SQL¶
CREATE TABLE state (
state char(2) NOT NULL,
state_name varchar(30) NOT NULL,
PRIMARY KEY (state))
CREATE TABLE city (
city_id int NOT NULL,
state char(2) NOT NULL,
city_name varchar(30) NOT NULL,
is_capital enum('yes'),
PRIMARY KEY (state_id),
UNIQUE INDEX(state, is_capital),
FOREIGN KEY (state) REFERENCES state(state))
Problem¶
Model a vet clinic.
Customers bring in pets. Customers are identified by their cell phones. Pets are identified by their nicknames for that customer.
Pets have a date of birth, species, and date of birth.
Pets have a list of vaccinations that must be performed for their species.
Pets have vaccination administration, shot date.
schema = dj.Schema("shared_vet")
schema.drop()
schema = dj.Schema("shared_vet")
@schema
class Owner(dj.Manual):
definition = """
cell_phone : char(10)
---
full_name : varchar(16)
"""
@schema
class Species(dj.Lookup):
definition = """
species : varchar(30)
"""
contents = (("cat",), ("dog",), ("ferret",), ("parrot",))
Species()
@schema
class Pet(dj.Manual):
definition = """
-> Owner
-> Species
nickname : varchar(30)
---
birthdate : date
"""
@schema
class RequiredVaccine(dj.Manual):
definition = """
-> Species
vaccine : varchar(10)
"""
dj.Diagram(schema)
@schema
class Shot(dj.Manual):
definition = """
-> Pet
-> RequiredVaccine
---
shot_date : date
"""
dj.Diagram(schema)
Shot()
create table shot (
cell_phone char(10) NOT NULL,
nickname varchar(16) NOT NULL,
species varchar(20) NOT NULL,
vaccine varchar(10) NOT NULL,
PRIMARY KEY (cell_phone, nickname, species, vaccine),
FOREIGN KEY (cell_phone, nickname, species) REFERENCES pet(cell_phone, nickname, species),
FOREIGN KEY (species, vaccine) REFERENCES required_vaccine(species, vaccine)
)
Homework homework¶
Homework assignments, students, grades
- Homework is given with a due date.
- Students submit homework, we record the submit date
- Submitted homework gets a grade
@schema
class Assignment(dj.Manual):
definition = """
assignment : int
---
due_date : date
"""
@schema
class Student(dj.Manual):
definition = """
student_id : int
---
student_name : varchar(30)
"""
@schema
class Submission(dj.Manual):
definition = """
-> Student
-> Assignment
---
submit_date : date
"""
@schema
class Grade(dj.Manual):
definition = """
-> Submission
---
grade : char(1)
"""
dj.Diagram(schema)
Messaging App (Slack, Telegram, Signal)¶
- Users can create channels. Each channel belongs to one user.
- Channel names are globally unique
- A user can create a post in their channels only
- A user can be a guest in another person’s channel.
- Guest can reply to posts
@schema
class User(dj.Manual):
definition = """
username : varchar(12)
---
irl_name : varchar(30)
"""
@schema
class Channel(dj.Manual):
definition = """
channel : varchar(12)
---
-> User
"""
@schema
class Guest(dj.Manual):
definition = """
-> Channel
-> User
"""
dj.Diagram(schema)
@schema
class Post(dj.Manual):
definition = """
-> Channel
post : int
---
message : varchar(1024)
"""
Post * Channel
@schema
class Response(dj.Manual):
definition = """
-> Post
-> Guest
---
response : varchar(1024)
"""
dj.Diagram(schema)