Transactions - The DataJoint Book

Databases are not merely storage systems; they should accurately represent an enterprise’s current state. This means that all users, irrespective of their interactions, should view and engage with the same data simultaneously seeing the results of each other’s interactions without breaking data integrity. This principle is known as data consistency.

Data Consistency

Data Consistency: A database’s capability to present a singular, valid, and current version of its data to all users, even during concurrent access and modifications. Successful read queries should reflect the database’s most recent state, while successful writes should immediately influence all subsequent read actions.

The underlying data may be distributed and true consistency may be deferred but the system

Understanding data consistency becomes clearer when examining its breaches. For instance, during early morning hours, I’ve observed my bank’s website displaying the previous day’s pending transactions, but the account balance doesn’t reflect these changes until a couple of hours later. This discrepancy between transaction views and account balances exemplifies data inconsistency. Fortunately, such inconsistencies, in this case, seem to be confined to the web interface, as the system eventually reaches a consistent state.

Ensuring data consistency is straightforward in certain scenarios. By avoiding conditions that might compromise it, consistency is preserved. For example, if only one party generates data and the rest merely access it, the likelihood of conflicts leading to inconsistency is minimal. Delayed queries still provide a consistent, albeit older, state. This is typical in scientific projects, where one lab produces data while others analyze it.

Complexities arise when multiple entities, be they human or digital, access and modify data simultaneously. Maintaining consistency amidst such concurrent interactions becomes challenging. To achieve this, databases might temporarily limit access for some users during another’s transaction or force users to resolve discrepancies before data integration.

Modern relational databases adhere to the ACID model to maintain consistency:

ACID Model for Database Transactions

Atomic
Consistent
Isolated
Durable

Ensuring consistency becomes notably challenging in geographically dispersed systems with distributed data storage, especially when faced with slow or intermittent network connections. Historically, it was believed that data systems spanning vast areas couldn’t maintain consistency. The CAP Theorem suggested that in such systems, there’s an irreconcilable trade-off between system responsiveness (availability) and data consistency.

Traditional relational database systems, like Oracle, MySQL, and others, maintained strong consistency but weren’t tailored for distributed setups. This limitation spurred the rise of NoSQL in the 2000s and 2010s, emphasizing responsiveness in distributed systems, albeit with weaker consistency.

However, recent advancements have bridged this gap. Modern distributed systems, like Spanner and CockroachDB, leverage data replication and consensus algorithms (e.g., Paxos, Raft) to offer high availability while maintaining strict consistency.

DataJoint adheres to the classic ACID consistency model, leveraging serializable transactions or the master-part relationship, detailed further in the “Transactions” section.

[Some sequences of operations must be performed carefully with isolation from outside interventions and must not be left incomplete.

A = Atomic
C = Consistent
I = Isolated
D = Durable

Transaction serialization: operations performed concurrently but ensuring the same effect if they were executed sequentially.

from faker import Faker

fake = Faker()

import datajoint as dj

schema = dj.Schema("bank")

dj.Diagram(schema)

@schema
class Account(dj.Manual):
    definition = """
    account_number : int
    ---
    customer_name : varchar(60) 
    balance : decimal(9, 2)
    """

Account.insert(
    dict(
        account_number=fake.random.randint(10_000_000, 99_999_999),
        customer_name=fake.name(),
        balance=fake.random.randint(0, 100_000_00) / 100,
    )
    for i in range(100)
)

Account()

keys = Account.fetch("KEY")
account1 = fake.random.choice(keys)
account2 = fake.random.choice(keys)

account1, account2

({'account_number': 86719375}, {'account_number': 32673111})

Account & account1

Account.update1(dict(account1, balance=120.00))

def transfer_bad(account1, account2, amount):
    current_balance = (Account & account1).fetch1("balance")
    if current_balance < amount:
        raise RuntimeError("Insufficient funds")

    Account.update1(dict(account1, balance=float(current_balance) - amount))

    raise RuntimeError

    b = (Account & account2).fetch1("balance")

    Account.update1(dict(account2, balance=float(b) + amount))

def transfer_good(account1, account2, amount):
    conn = dj.conn()
    with conn.transaction:
        current_balance = (Account & account1).fetch1("balance")
        if current_balance < amount:
            raise RuntimeError("Insufficient funds")
        
        

        Account.update1(dict(account1, balance=float(current_balance) - amount))

        b = (Account & account2).fetch1("balance")

        Account.update1(dict(account2, balance=float(b) + amount))

Account & [account1, account2]

transfer(account1, account2, 100.00)

Account & [account1, account2]

import pymysql

pymysql.install_as_MySQLdb()
%load_ext sql
%config SqlMagic.autocommit=True
%sql mysql://root:simple@127.0.0.1

%%sql

use bank;

SHOW CREATE TABLE account;

%%sql

SELECT * FROM account;

%%sql

BEGIN TRANSACTION;
  
    UPDATE account SET balance = balance + 100
    WHERE account = 98230343;
    
    
    UPDATE account SET balance = balance - 100
    WHERE account 95440048;

COMMIT

import pymysql

conn = pymysql.connect(
    user="root", host="127.0.0.1", password="simple", autocommit=True
)

cursor = conn.cursor()
cursor.execute(
    """
    SELECT balance FROM bank.account 
    WHERE account_number = %s
    """,
    (account1["account_number"],),
)

amount = 100

current_balance = cursor.fetchone()

cursor.execute(
    """
    UPDATE bank.account 
    SET balance = balance - %s
    WHERE account_number = %s
    """,
    (
        amount,
        account1["account_number"],
    ),
)

cursor.execute(
    """
    UPDATE bank.account 
    SET balance = balance + %s
    WHERE account_number = %s
    """,
    (
        amount,
        account2["account_number"],
    ),
)

def transfer(cursor, account1, account2, amount):
    cursor.execute("BEGIN TRANSACTION")

    try:
        cursor.execute(
            """
            SELECT balance FROM bank.account 
            WHERE account_number = %s
            """,
            (account1["account_number"],),
        )

        current_balance = cursor.fetchone()

        if current_balance < amount:
            raise RuntimeError("Insufficient funds")

        cursor.execute(
            """
            UPDATE shared_bank.account 
            SET balance = balance - %s
            WHERE account_number = %s
            """,
            (
                amount,
                account1["account_number"],
            ),
        )

        cursor.execute(
            """
            UPDATE shared_bank.account 
            SET balance = balance + %s
            WHERE account_number = %s
            """,
            (
                amount,
                account2["account_number"],
            ),
        )

    except:
        cursor.execute("CANCEL TRANSACTION")
        raise

    else:
        cursor.execute("COMMIT")

Design Patterns¶

Sequence
- workflows
Specialization / Generalization
- student / faculty / staff
Hierarchies
- Ownership
- Using composite primary keys
- Secondary keys
Parameterization *
Associations
- Many-to-many relationships
- Directed graphs
- Trees
- Undirected graphs
Master-part

import datajoint as dj

schema = dj.schema("dimitri_patterns")
schema.drop()
schema = dj.schema("dimitri_patterns")

Sequence / Workflows¶

# E.g. order / shipment / confirmation

@schema
class Order(dj.Manual):
    definition = """
    order_number : int
    ---
    order_date : date
    """

@schema
class Shipment(dj.Manual):
    definition = """
    -> Order
    ---
    ship_date : date
    """

@schema
class Confirm(dj.Manual):
    definition = """
    -> Shipment
    ---
    confirm_date : date
    """

dj.Diagram(schema)

Order * Shipment * Confirm

@schema
class Order2(dj.Manual):
    definition = """
    order_number : int
    ---
    order_date : date
    """


@schema
class Shipment2(dj.Manual):
    definition = """
    shipment_id : int
    ---
    ->[unique] Order2
    ship_date : date
    """


@schema
class Confirm2(dj.Manual):
    definition = """
    confirm_id : int
    ---
    -> [unique] Shipment2
    confirm_date : date
    """

dj.Diagram(schema)

Order * Confirm

Order * Shipment * Confirm

Two ways to model hierarchies¶

Approach 1: Simple primary keys with secondary dependencies¶

@schema
class Subject(dj.Manual):
    definition = """
    # Experiment Subject
    subject_id : int
    ---
    species = 'mouse' : enum('human', 'mouse', 'rat', 'worm')
    sex : enum('F', 'M', 'unknown')
    """

Subject()

Subject.insert1(
    dict(subject_id=1, species="human", sex="unknown"), skip_duplicates=True
)
Subject.insert1(dict(subject_id=2, species="mouse", sex="F"), skip_duplicates=True)
Subject.insert1(dict(subject_id=3, species="worm", sex="M"), skip_duplicates=True)

@schema
class Session(dj.Manual):
    definition = """
    -> Subject
    session : int
    ---
    session_timestamp = CURRENT_TIMESTAMP : timestamp
    """

Session.insert1(dict(session=1, subject_id=2), skip_duplicates=True)
Session.insert1(dict(session=2, subject_id=2), skip_duplicates=True)
Session.insert1(dict(session=3, subject_id=3), skip_duplicates=True)

Session()

@schema
class Scan(dj.Manual):
    definition = """
    -> Session
    scan_id : int
    ---
    laser_power : float  # mW
    """

Scan()

Scan.insert1(dict(subject_id=2, scan_id=1, session=1, laser_power=3200))

@schema
class Cell(dj.Manual):
    definition = """
    -> Scan
    cell_id : int
    ---
    cell_type : enum('E', 'I')   # excitatory or inhibitory
    """

dj.Diagram(schema)

# Give me excitatory cells for all males

Cell & (Subject & {"sex": "M"})

@schema
class Subject2(dj.Manual):
    definition = """
    # Experiment Subject
    subject_id : int
    ---
    species = 'mouse' : enum('human', 'mouse', 'rat', 'worm')
    sex : enum('F', 'M', 'unknown')
    """


@schema
class Session2(dj.Manual):
    definition = """
    session : int
    ---
    -> Subject2
    session_timestamp = CURRENT_TIMESTAMP : timestamp
    """


@schema
class Scan2(dj.Manual):
    definition = """
    scan_id : int
    ---
    -> Session2
    laser_power : float  # mW
    """


@schema
class Cell2(dj.Manual):
    definition = """
    cell_id : int
    ---
    -> Scan2
    cell_type : enum('E', 'I')   # excitatory or inhibitory
    """

dj.Diagram(schema)

Cell.insert1(dict(cell_id=1, scan_id=1, cell_type="E"))

Cell()

# Give me cells for subject_id=1

Cell2 & (Scan2 & (Session2 & "subject_id=2"))

# Give me cells for all males

(Cell2 & (Scan2 & (Session2 & (Subject2 & 'sex="M"')))).make_sql()

(Cell & (Subject & 'sex="M"')).make_sql()

dj.Diagram(schema)

Parameterization¶

@schema
class Image(dj.Manual):
    definition = """
    image_id : int
    ---
    image : longblob
    """

@schema
class EnhanceMethod(dj.Lookup):
    definition = """
    enhance_method : int
    ---
    method_name : varchar(16)
    """

    contents = ((1, "sharpen"), (2, "contrast"))

@schema
class EnhancedImage(dj.Manual):
    definition = """
    -> Image
    -> EnhanceMethod
    ---
    enhanced_image : longblob
    """

dj.Diagram(schema)

Association¶

Books and authors

Checking accounts and account owners

@schema
class Book(dj.Manual):
    definition = """
    isbn : int
    ---
    title : varchar(300)
    """

@schema
class Author(dj.Manual):
    definition = """
    author_id : int
    ---
    name : varchar(300)
    """

@schema
class AuthorBook(dj.Manual):
    definition = """
    -> Author
    -> Book
    ---
    order : tinyint unsigned 
    unique index(isbn, order)
    """

dj.Diagram(schema)

Generalization / specialization¶

Employee, student, instructor

@schema
class Person(dj.Manual):
    definition = """
    person_id : int
    ---
    date_of_birth : date
    gender : enum("male", "female", "unknown")
    """

@schema
class Employee(dj.Manual):
    definition = """
    -> Person
    ---
    hire_date : date    
    """


@schema
class Instructor(dj.Manual):
    definition = """
    -> Employee
    ---
    department : varchar(30)
    """


@schema
class Student(dj.Manual):
    definition = """
    -> Person
    ---
    admission_date : date
    """

dj.Diagram(schema)

Directed graphs¶

@schema
class Subordinate(dj.Manual):
    definition = """
    -> Employee
    ---
    -> Employee.proj(manager_id="person_id")
    """

dj.Diagram(schema)

@schema
class Neuron(dj.Manual):
    definition = """
    neuron : int
    """


@schema
class Synapse(dj.Manual):
    definition = """
    synapse_id : int
    ---
    -> Neuron.proj(pre="neuron")
    -> Neuron.proj(post="neuron")    
    """

dj.Diagram(schema)


CREATE TABLE managed_by (
   person_id : int NOT NULL,
   manager_id : int NOT NULL,
   
   PRIMARY KEY (person_id),
   
   FOREIGN KEY (person_id) REFERENCES employee (person_id),
   FOREIGN KEY (manager_id) reference employee (person_id))

Undirected graphs¶

# direcated friendship = full directed graph capability
@schema
class Friendship(dj.Manual):
    definition = """
    -> Person.proj(friend1 = "person_id")
    -> Person.proj(friend2 = "person_id")
    """

dj.Diagram(schema)

@schema
class Order(dj.Manual):
    definition = """
    order_id : int
    ---
    order_date : date
    """

    class Item(dj.Part):
        definition = """
        -> master
        order_item : int
        ---
        
        """

Puzzle:¶

Cities and states.

Each city belongs to one state.
Each state has one capital.
A capital is a city.
A capital must be in the same state.

Tables
Primary keys
Foreign keys

@schema
class State(dj.Manual):
    definition = """
    st : char(2)
    ---
    state : varchar(30)
    """


State.insert(
    (("WA", "Washington"), ("TX", "Texas"), ("AK", "Alaska"), ("LA", "Louisiana"))
)

@schema
class City(dj.Manual):
    definition = """
    -> State
    city_name : varchar(30)
    ---
    capital = null : enum("YES")
    unique index(st, capital)
    """

City.delete_quick()

City.insert1(("WA", "Seattle", None))
City.insert1(("TX", "Austin", "YES"))
City.insert1(("TX", "Houston", None))
City.insert1(("WA", "Olympia", "YES"))

City()

@schema
class State2(dj.Manual):
    definition = """
    state : char (2)
    ---
    state_name : varchar(30)
    """


@schema
class City2(dj.Manual):
    definition = """
    -> State2
    city_name : varchar(30)
    """


@schema
class Capital2(dj.Manual):
    definition = """
    -> State2
    ---
    -> City2
    """

State2.delete_quick()
City2.delete_quick()

State2.insert(
    (("WA", "Washington"), ("TX", "Texas"), ("AK", "Alaska"), ("LA", "Louisiana"))
)

City2.insert1(("WA", "Seattle"))
City2.insert1(("TX", "Austin"))
City2.insert1(("TX", "Houston"))
City2.insert1(("WA", "Olympia"))

dj.Diagram(schema)

Capital2.insert1(("TX", "Austin"))

Capital2.insert1(("TX", "Houston"))

City2()

In SQL¶

CREATE TABLE state (
    state char(2) NOT NULL,
    state_name varchar(30) NOT NULL,
    PRIMARY KEY (state))
    
CREATE TABLE city (
    state char(2) NOT NULL,
    city_name varchar(30) NOT NULL,
    PRIMARY KEY (state, city_name),
    FOREIGN KEY (state) REFERENCES state(state))

CREATE TABLE capital (
    state char(2) NOT NULL,
    city_name varchar(30) NOT NULL,
    PRIMARY KEY (state),
    FOREIGN KEY (state, city_name) REFERENCES city (state, city_name))

CREATE TABLE state (
    state char(2),
    state_name varchar(30),
    capital varchar(30),
    PRIMARY KEY (state),
    FOREIGN KEY (state, capital) REFERENCES city (state, city_name))
    
CREATE TABLE city (
    state char(2),
    city_name varchar(30),
    PRIMARY KEY (state, city_name))
    FOREIGN KEY (state) REFERENCES state(state)

In SQL¶

CREATE TABLE state (
    state char(2) NOT NULL,
    state_name varchar(30) NOT NULL,
    PRIMARY KEY (state))
    
CREATE TABLE city (
    city_id int NOT NULL,
    state char(2) NOT NULL,
    city_name varchar(30) NOT NULL,
    is_capital enum('yes'),
    PRIMARY KEY (state_id),
    UNIQUE INDEX(state, is_capital),
    FOREIGN KEY (state) REFERENCES state(state))

Problem¶

Model a vet clinic.

Customers bring in pets. Customers are identified by their cell phones. Pets are identified by their nicknames for that customer.
Pets have a date of birth, species, and date of birth.
Pets have a list of vaccinations that must be performed for their species.
Pets have vaccination administration, shot date.

schema = dj.Schema("shared_vet")
schema.drop()
schema = dj.Schema("shared_vet")

@schema
class Owner(dj.Manual):
    definition = """
    cell_phone : char(10) 
    ---
    full_name : varchar(16)
    """

@schema
class Species(dj.Lookup):
    definition = """
    species : varchar(30)
    """
    contents = (("cat",), ("dog",), ("ferret",), ("parrot",))

Species()

@schema
class Pet(dj.Manual):
    definition = """
    -> Owner
    -> Species
    nickname : varchar(30)
    ---
    birthdate : date
    """

@schema
class RequiredVaccine(dj.Manual):
    definition = """
    -> Species
    vaccine  : varchar(10)
    """

dj.Diagram(schema)

@schema
class Shot(dj.Manual):
    definition = """
    -> Pet
    -> RequiredVaccine
    ---
    shot_date : date
    """

dj.Diagram(schema)

Shot()

create table shot (
    cell_phone char(10) NOT NULL,
    nickname varchar(16) NOT NULL,
    species varchar(20) NOT NULL,
    vaccine varchar(10) NOT NULL,
    PRIMARY KEY (cell_phone, nickname, species, vaccine),
    FOREIGN KEY (cell_phone, nickname, species) REFERENCES                         pet(cell_phone, nickname, species),
    FOREIGN KEY (species, vaccine) REFERENCES required_vaccine(species, vaccine)
)

Homework homework¶

Homework assignments, students, grades

Homework is given with a due date.
Students submit homework, we record the submit date
Submitted homework gets a grade

@schema
class Assignment(dj.Manual):
    definition = """
    assignment : int
    ---
    due_date : date
    """

@schema
class Student(dj.Manual):
    definition = """
    student_id : int
    ---
    student_name : varchar(30)
    """

@schema
class Submission(dj.Manual):
    definition = """
    -> Student
    -> Assignment
    ---
    submit_date : date
    """

@schema
class Grade(dj.Manual):
    definition = """
    -> Submission
    ---
    grade : char(1)
    """

dj.Diagram(schema)

Messaging App (Slack, Telegram, Signal)¶

Users can create channels. Each channel belongs to one user.
Channel names are globally unique
A user can create a post in their channels only
A user can be a guest in another person’s channel.
Guest can reply to posts

@schema
class User(dj.Manual):
    definition = """
    username : varchar(12)
    ---
    irl_name : varchar(30)
    """

@schema
class Channel(dj.Manual):
    definition = """
    channel : varchar(12)
    ---
    -> User
    """

@schema
class Guest(dj.Manual):
    definition = """
    -> Channel
    -> User
    """

dj.Diagram(schema)

@schema
class Post(dj.Manual):
    definition = """
    -> Channel
    post : int
    ---
    message : varchar(1024)
    """

Post * Channel

@schema
class Response(dj.Manual):
    definition = """
    -> Post
    -> Guest
    ---
    response : varchar(1024)
    """

dj.Diagram(schema)