In this section, we’ll explore how to build complex relationships between entities using a combination of uniqueness constraints and referential constraints.
Uniqueness Constraints¶
Uniqueness constraints are typically set through primary keys, but tables can also have additional unique indexes beyond the primary key.
Referential Constraints¶
Referential constraints establish relationships between tables and are enforced by foreign keys.
Foreign Keys Establish 1:N or 1:1 Relationships¶
When a child table defines a foreign key constraint to a parent table, it creates a relationship between the entities in the parent and child tables. The cardinality of this relationship is always 1 on the parent side: each entry in the child table must refer to a single entity in the parent table.
On the child side, the relationship can have different cardinalities:
- 0–1: if the foreign key field in the child table has a unique constraint.
- N: if no uniqueness constraint is applied to the foreign key field.
Let’s illustrate these possibilities with examples of bank customers and their accounts.
import datajoint as dj
One-to-Many Relationships¶
In the first example, let the rule be that customers are independent entities and accounts have exactly one owner but customers can have any number of accounts. This is an example of an 1:N relationship between customers and their accounts.
Then the foreign key is declared in the Account
table.
schema1 = dj.Schema('bank1')
@schema1
class Customer1(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned
unique index(ssn)
"""
@schema1
class Account1(dj.Manual):
definition = """
account : int unsigned
---
-> Customer1
open_date : date
"""
dj.Diagram(schema1)
In this design:
- Each account is linked to a single customer through a foreign key referencing the primary key in Customer1, ensuring that each account has one, and only one, owner.
- This setup allows each customer to own multiple accounts, as there is no unique constraint on the foreign key in Account1.
- Customers may have zero or more accounts, as there’s no requirement for every customer to have an associated account.
- Every account must have an owner, since the foreign key reference to Customer1 is mandatory (non-nullable).
This structure establishes a one-to-many relationship between customers and accounts: one customer can own multiple accounts, but each account belongs to only one customer.
To allow some accounts without an assigned owner, we can modify the design to make the foreign key nullable:
schema2 = dj.Schema('bank2')
@schema2
class Customer2(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned
unique index(ssn)
"""
@schema2
class Account2(dj.Manual):
definition = """
account : int unsigned
---
-> [nullable] Customer2
open_date : date
"""
dj.Diagram(schema2)
In this modified design:
- Accounts without owners are allowed by setting the foreign key to Customer2 as nullable.
- The schema diagram does not visually distinguish between required and optional dependencies, so the nullable nature of the foreign key is not visible in the diagram.
- This configuration supports cases where accounts may or may not be assigned to a customer, adding flexibility to the data model.
Consider a third design where the foreign key is part of a composite primary key:
schema3 = dj.Schema('bank3')
@schema3
class Customer3(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned # Optional SSN with unique constraint
unique index(ssn)
"""
@schema3
class Account3(dj.Manual):
definition = """
-> Customer3
account : int unsigned
---
open_date : date
"""
dj.Diagram(schema3)
In this design:
- Composite Primary Key: The primary key for
Account3
is a combination ofcustomer_id
andaccount
, meaning each account is uniquely identified by both the customer and account number together and neither of the two fields separately has to be unique across accounts. - One-to-Many Relationship: Since
customer_id
is only part of the primary key (not the entire primary key), it doesn’t need to be unique withinAccount3
. This allows each customer to have multiple accounts, preserving the one-to-many relationship between Customer3 and Account3. - Foreign Key Reference: The foreign key to
Customer3
establishes the relationship, ensuring that each entry in Account3 references a valid customer in Customer3.
This setup maintains the one-to-many relationship while allowing for each account to be uniquely identified by a combination of customer and account identifiers.
In the diagram, solid lines indicate a dependency where the foreign key is part of the primary key, signifying a stronger relationship than a secondary reference. This stronger relationship ensures that any foreign keys pointing to Account3
will also transitively reference Customer3
.
Exercise: Analyzing Bank Design¶
Review the database design below and consider how this structure might reflect the bank’s operations.
schema4 = dj.Schema('bank4')
@schema4
class Account4(dj.Manual):
definition = """
account : int unsigned
---
open_date : date
"""
@schema4
class Customer4(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned # Optional SSN with unique constraint
unique index(ssn)
-> Account4
"""
dj.Diagram(schema4)
Interpretation¶
Consider the implications of this setup:
- Each customer entry contains a reference to a single account in
Account4
, suggesting that each customer is linked to one specific account. - Since there is no constraint on the number of customers who can point to the same account, this design may allow multiple customers to be associated with a single account, indicating the possibility of shared accounts.
- However, the structure does not allow an customer to exist without being associated with an account, as each customer record must reference an existing account.
These choices might reflect the bank’s operations and policies, such as whether joint accounts are supported, and how account ownership is managed.
One-to-One Relationships¶
A one-to-one relationship is created when a foreign key in the child table is also designated as either the primary key or a unique index. This ensures that each entry in the child table corresponds to a single, unique entry in the parent table, and no parent entry is linked to more than one child entry.
In a one-to-one relationship, the connection is always optional on the child side: a child entry is not required for every parent entry. Therefore, the cardinality on the child side is 0..1—each parent may have either zero or one associated child entry.
In the following example, the foreign key in Account
is also its primary key, resulting in a one-to-one relationship:
schema5 = dj.Schema('bank5')
@schema5
class Customer5(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned # Optional SSN with unique constraint
unique index(ssn)
"""
@schema5
class Account5(dj.Manual):
definition = """
-> Customer5
---
open_date : date
"""
dj.Diagram(schema5)
The diagramming notation represents this relationship with a thick solid line, which indicates the strongest type of dependency between two entities.
In this setup, Customer5
and Account5
share the same identity because Account5
inherits its primary key from Customer5
.
This setup creates a strict one-to-one relationship between Customer5 and Account5, where each account is uniquely and exclusively linked to a single customer.
Characteristics of This Structure¶
Unified Identity: Since
Account5
shares the primary key withCustomer5
, eachAccount5
record is uniquely identified by the same key asCustomer5
. This enforces the rule that each account cannot exist without an associated customer.Conflated Entities: In the diagram, the name
Account5
is no longer underscored, indicating it has ceased to function as a separate “dimension” or independent entity.Account5
is now fully conflated withCustomer5
, meaning it effectively serves as an extension of theCustomer5
entity, rather than an independent table with its own identity.
Why Keep Separate Tables?¶
Although this design could allow us to simply merge all account-related data into the Customer5
table, there are reasons we may choose to keep Account5
as a separate table:
Modularity and Clarity: Separating
Account5
fromCustomer5
keeps the structure modular, which can clarify different aspects of customer and account data in queries and during development.Data Management: By keeping account information in a separate table, we can manage and update account-related data independently from customer information. This separation can be beneficial for tasks such as auditing, logging, or updating fields associated with only account data.
Avoiding Optional Fields: In cases where certain fields are only relevant to accounts (e.g., open_date, account-specific details), keeping them in a separate table prevents having unused or irrelevant fields in the main
Customer5
table.Access Control: When account information is sensitive or needs restricted access, placing it in a separate table can simplify access control, allowing finer-grained security policies around account data.
Scalability and Maintenance: Over time, this separation can support scalability as customer and account data expand. If we anticipate adding extensive account-specific data or if account records will be managed differently from customer records, the separate tables facilitate maintenance and future-proof the structure.
Schema Evolution: Separate tables provide flexibility to adapt or expand either the Customer5 or Account5 table independently, without altering the other table. This flexibility is especially useful if the schema is expected to evolve over time.
Another approach to establishing a one-to-one relationship is to use a secondary foreign key with an additional uniqueness constraint. DataJoint’s foreign key syntax supports both unique
and nullable
modifiers on foreign keys, providing flexibility in how relationships are structured.
unique
Modifier: Adding the unique modifier to a foreign key converts a one-to-many relationship into a one-to-one relationship. This ensures that each entry in the child table corresponds to only one entry in the parent table and vice versa, enforcing a strict one-to-one link.nullable
Modifier: Thenullable
modifier allows the relationship to be optional on the child side, meaning that not every child entry must reference a parent entry. (Relationships are already optional on the parent side, as parent entries don’t depend on children.)
One-to-One Relationship with Unique and Nullable Modifiers¶
The following example demonstrates how to model a one-to-one relationship using a secondary unique constraint:
schema6 = dj.Schema('bank6')
@schema6
class Customer6(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
ssn = null : int unsigned # Optional SSN with unique constraint
unique index(ssn)
"""
@schema6
class Account6(dj.Manual):
definition = """
account : int unsigned
---
-> [unique, nullable] Customer6
open_date : date
"""
dj.Diagram(schema6)
In this design:
- The
Account6
table includes a foreign key reference toCustomer6
, modified with bothunique
andnullable
modifications. - The
unique
constraint ensures that eachAccount6
entry is exclusively linked to a singleCustomer6
entry, establishing a one-to-one relationship. - The
nullable
constraint allows accounts to exist without necessarily being tied to a customer, making the relationship optional from the child’s perspective.
When to Use this Approach¶
Using a secondary unique constraint on a foreign key is helpful when:
- Optional Relationships: You want flexibility to create child entries without always requiring a parent reference.
- Separate, Modular Tables: Keeping entities modular and maintaining a strict one-to-one relationship without merging the tables or merging the entity identities in the child table with those in the parent.
This method provides flexibility and maintains clear separation between entities while enforcing a one-to-one association, even if the relationship isn’t visually highlighted in the diagram.
Diagram Representation Limitations¶
This dependency-and-constraint pattern doesn’t visually convey the close, exclusive association created by the unique and nullable modifiers. The diagram will show a basic line for the foreign key, lacking any specific notation to indicate that the relationship is both unique and optional.
The diagram only reflect the relationships formed through the the structure of primary keys and foreign keys, without taking into account the additional constraints imposed by secondary unique indexes. While solid think lines indicate a one-to-one relationship, additional uniqueness constraints may be in force that are not evident from the diagram alone.
Consider all the diagrams side-by-side and recall which ones are one-to-one and which are one-to-many:
dj.Diagram(schema1) + dj.Diagram(schema2) + dj.Diagram(schema3) + dj.Diagram(schema4) + dj.Diagram(schema5) + dj.Diagram(schema6)
By examining the diagrams alone, only one (schema5
) clearly shows a one-to-one relationship. For the others, additional constraints may modify the relationships, though these may not be immediately visible. In Jupyter, you can hover over a diagram element to view its full definition, including any secondary uniqueness constraints.
In practice, DataJoint users generally avoid secondary unique constraints when the primary key can enforce uniqueness. This approach has the added benefit of making relationships more transparent in the diagrams, providing a clearer representation of the schema’s structure.
Many-to-Many Relationships¶
In relational databases, a single foreign key between two tables can only establish one-to-many or one-to-one relationships. To create a many-to-many (M:N) relationship between two entities, a third table is required, with each entry in this table linking one instance from each of the two related tables. This third table is commonly referred to as an association table or join table.
Structure of Many-to-Many Relationships¶
An M:N relationship can be visualized as two one-to-many (1:N and 1:M) relationships with the association table.
The association table contains:
- A foreign key referencing each of the two related entities, establishing connections to instances of both tables.
- Composite primary key or a secondary unique constraint on the two foreign keys to ensure each combination of entities is unique.
This structure allows each entity to link to multiple instances of the other entity through the association table.
Example: Bank Customers and Bank Accounts¶
Consider a bank system where customers can have multiple accounts, and accounts can be jointly owned by multiple customers. To represent this many-to-many relationship, an association table is used to link Customer
and Account
:
schema7 = dj.Schema('bank7')
@schema7
class Customer7(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
"""
@schema7
class Account7(dj.Manual):
definition = """
account_id : int unsigned
---
open_date : date
"""
@schema7
class CustomerAccount7(dj.Manual):
definition = """
-> Customer7
-> Account7
"""
dj.Diagram(schema7)
DataJoint’s diagramming language does not use special notation for association tables; they appear identical to other tables. By contrast, other diagramming styles, such as Chen’s Entity-Relationship (ER) notation, represent associations—often called “relationship sets”—with diamond shapes to distinguish them from entity sets.
DataJoint purposefully avoids this strict conceptual distinction between entities and relationships, as the boundary between them is often blurred. For instance, a synapse between two neurons could be considered an entity, storing specific data about the synapse itself, or it might be viewed as an association linking two neurons. Additionally, some relationships can even link other relationships, a complexity not easily captured in Chen’s notation.
In DataJoint, you can often recognize an association table by its converging pattern of foreign keys, which reference multiple tables to form a many-to-many relationship. This flexible approach supports various interpretations of relationships, making DataJoint schemas particularly adaptable for complex scientific data, where associations may themselves hold meaningful attributes.
Association tables are primarily used to establish many-to-many relationships, but they also offer the flexibility to model one-to-many and even one-to-one relationships by applying additional uniqueness constraints. By controlling the uniqueness on the foreign keys within the association table, you can fine-tune the type of relationship between entities.
Example, enforcing One-to-Many with Shared Accounts¶
In the following example, we model a scenario where each customer can have only one account, but each account may be shared among multiple customers.
This structure enforces a one-to-many relationship between Customer8
and Account8
via the CustomerAccount8
association table.
schema8 = dj.Schema('bank8')
@schema8
class Customer8(dj.Manual):
definition = """
customer_id : int unsigned
---
full_name : varchar(30)
"""
@schema8
class Account8(dj.Manual):
definition = """
account_id : int unsigned
---
open_date : date
"""
@schema8
class CustomerAccount8(dj.Manual):
definition = """
-> Customer8
---
-> Account8
"""
dj.Diagram(schema8)
Explanation of the Design¶
- Association Table (
CustomerAccount8
): TheCustomerAccount8
association table linksCustomer8
andAccount8
through foreign keys. Although it resembles a many-to-many structure, by making the foreign key-> Customer8
unique (it’s the primary key), we ensure that each customer is associated with only one account. However, we leave the-> Account8
foreign key unconstrained, allowing multiple customers to link to the same account, which enables account sharing.
Versatility of Association Tables¶
While association tables are necessary for modeling many-to-many relationships, they can also model one-to-many and even one-to-one relationships.
This is accomplished by by altering their primary key or adding additional uniqueness constraints. If the association table links tables A
and B
, then:
- One-to-Many:
-> B
---
-> A
Any number of B
s are each matched to at most one A
.
- One-to-One:
-> A
---
-> [unique] B
With uniqueness constraints on both A
and B
, each entry in A
is matched to at most one entry in B
and vice versa.
- Many-to-Many
-> A
-> B
---
Leave both foreign keys in the primary key, allowing each entity to associate freely with multiple instances of the other.
This approach makes association tables a powerful tool for defining relationships of varying cardinality, adding flexibility and adaptability to DataJoint schemas. By managing uniqueness constraints directly in the association table, you can model complex relationships while keeping the primary entities’ structures simple and intuitive.
The schema diagram indicates the cardinality of these associations with thick lines corresponding to one-to-one relationship and thin lines indicating one-to-many:
dj.Diagram(schema7) + dj.Diagram(schema8)
More Design Patterns¶
Here we will consider several other common patterns that make use of uniqueness constraints (primary keys and unique indexes) and referential constraints (foreign keys) to design more complex relationships.
Sequences¶
DataJoint schemas are directional: dependencies form a directed-acyclic graph (DAG) representing sequences of steps or operations. The diagrams are plotted with all the dependencies pointing in the same direction (top-to-bottom or left-to-right), so that a schema diagram can be understood as an operational workflow.
Let’s model a simple sequence of operations such as placing an order, shipping, and delivery.
The three entities: Order
, Shipment
, and Delivery
form a sequence of one-to-one relationships:
schema = dj.Schema('orders')
@schema
class Order(dj.Manual):
definition = """
order_number : int
---
order_date : date
"""
@schema
class Shipment(dj.Manual):
definition = """
-> Order
---
ship_date : date
"""
@schema
class Delivery(dj.Manual):
definition = """
-> Shipment
---
delivery_date : date
"""
dj.Diagram(schema)
In this design, Order
, Shipment
, and Delivery
use the same primary key that cascades through the sequence. This makes it straightforward to perform relationonal operations that skip steps. For example, joining information from Order
and Delivery
does not require the inclusion of Shipment
in the query: Order * Delivery
is a well-formed query:
Order * Delivery
Hieararchies¶
Several 1:N relationships in sequence form a hierachy. Many data standards are defined as hierarchies.
Filesystems of files and folders with standardized naming conventions are examples of hiearchical structures. Many file formats such as HDF5 are also hiearchical.
Example: Brain Imaging Database¶
Consider the hierarchical structure of the Brain Imaging Data Standard — Gorgolewski et al., 2016, which is used for brain imaging data.
In BIDS, a neuroimaging study is organized around experiment subjects, imaging sessions for each subject, and then specific types of brain scans within each sessios: anatomical scans, diffusion-weighted imaging (DWI) scans, and functional imaging.
Study
, Subject
, Session
, Scan
form a hierarchical relationship.
Let’s design a relational schema for the BIDS hiearchical format:
schema = dj.Schema('bids')
@schema
class Study(dj.Manual):
definition = """
study : varchar(6) # study unique code
---
investigator : varchar(60) # primary investigator
study_description : varchar(255)
"""
@schema
class Study(dj.Manual):
definition = """
study : varchar(6) # study unique code
---
investigator : varchar(60) # primary investigator
study_description : varchar(255)
"""
@schema
class Subject(dj.Manual):
definition = """
-> Study
subject_id : varchar(8)
---
subject_species : enum('human', 'primate', 'rodent')
date_of_birth = null : date
subject_notes : varchar(2000)
"""
@schema
class Session(dj.Manual):
definition = """
-> Session
session : smallint unsigned
---
session_date : date
operator : varchar(60)
aim : varchar(60)
"""
@schema
class Func(dj.Manual):
definition = """
-> Session
session : smallint unsigned
---
session_date : date
operator : varchar(60)
aim : varchar(60)
"""
- Gorgolewski, K. J., Auer, T., Calhoun, V. D., Craddock, R. C., Das, S., Duff, E. P., Flandin, G., Ghosh, S. S., Glatard, T., Halchenko, Y. O., Handwerker, D. A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B. N., Nichols, T. E., Pellman, J., … Poldrack, R. A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1). 10.1038/sdata.2016.44