Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Diagramming

Diagramming: Visualizing Schema Structure

Schema diagrams are essential tools for understanding and designing DataJoint pipelines. They provide a visual representation of tables and their dependencies, making complex workflows comprehensible at a glance.

As introduced in Relational Workflows, DataJoint schemas form Directed Acyclic Graphs (DAGs) where:

This DAG structure embodies a core principle of the Relational Workflow Model: the schema is an executable specification. Tables at the top are independent entities; tables below depend on tables above them. Reading the diagram top-to-bottom reveals the workflow execution order.

Quick Reference

Line StyleAppearanceRelationshipChild’s Primary KeyCardinality
Thick Solid━━━ExtensionParent PK onlyOne-to-one
Thin Solid───ContainmentParent PK + own field(s)One-to-many
Dashed┄┄┄ReferenceOwn independent PKOne-to-many
Key Principle

Solid lines mean the parent’s identity becomes part of the child’s identity. Dashed lines mean the child maintains independent identity.

Visual Indicators:

  • Underlined table name: Independent entity introducing a new schema dimension

  • Non-underlined name: Dependent entity whose identity derives from parent(s)

  • Orange dots: Renamed foreign keys (via .proj())

  • Table colors: Green (Manual), Blue (Imported), Red (Computed), Gray (Lookup)

Setup

First, we import DataJoint and create a schema for our examples:

import datajoint as dj

schema = dj.Schema('diagrams_tutorial')

The Three Line Styles

Line styles convey the semantic relationship between parent and child tables. The choice is determined by where the foreign key appears in the child’s definition.

Thick Solid Line: Extension (One-to-One)

The foreign key is the entire primary key of the child table.

DataJoint
SQL
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int unsigned
    ---
    customer_name : varchar(60)
    """

@schema
class CustomerPreferences(dj.Manual):
    definition = """
    -> Customer                   # This IS the entire primary key
    ---
    theme : varchar(20)
    notifications : enum('on', 'off')
    """

Semantics: The child extends or specializes the parent. They share the same identity—at most one child exists for each parent.

Use cases: Workflow sequences (Order → Shipment → Delivery), optional extensions (Customer → CustomerPreferences), modular data organization.

In diagrams: Notice that CustomerPreferences is not underlined—it doesn’t introduce a new dimension.

# Define tables for thick solid line example
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int unsigned
    ---
    customer_name : varchar(60)
    """

@schema
class CustomerPreferences(dj.Manual):
    definition = """
    -> Customer                   # This IS the entire primary key
    ---
    theme : varchar(20)
    notifications : enum('on', 'off')
    """
# View the diagram - notice the thick solid line
dj.Diagram(Customer) + dj.Diagram(CustomerPreferences)

Thin Solid Line: Containment (One-to-Many)

The foreign key is part of (but not all of) the child’s primary key.

DataJoint
SQL
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int unsigned
    ---
    customer_name : varchar(60)
    """

@schema
class Account(dj.Manual):
    definition = """
    -> Customer                   # Part of primary key
    account_num : int unsigned    # Additional PK component
    ---
    balance : decimal(10,2)
    """

Semantics: The child belongs to or is contained within the parent. Multiple children can exist for each parent, each identified within the parent’s context.

Use cases: Hierarchies (Study → Subject → Session), ownership (Customer → Account), containment (Order → OrderItem).

In diagrams: Account is underlined because account_num introduces a new dimension.

# Define tables for thin solid line example
@schema
class Account(dj.Manual):
    definition = """
    -> Customer                   # Part of primary key
    account_num : int unsigned    # Additional PK component
    ---
    balance : decimal(10,2)
    """
# View the diagram - notice the thin solid line
dj.Diagram(Customer) + dj.Diagram(Account)

Dashed Line: Reference (One-to-Many)

The foreign key is a secondary attribute (below the --- line).

DataJoint
SQL
@schema
class Department(dj.Manual):
    definition = """
    dept_id : int unsigned
    ---
    dept_name : varchar(60)
    """

@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int unsigned    # Own independent PK
    ---
    -> Department                 # Secondary attribute
    employee_name : varchar(60)
    """

Semantics: The child references or associates with the parent but maintains independent identity. The parent is just one attribute describing the child.

Use cases: Loose associations (Product → Category), references that might change (Employee → Department), when child has independent identity.

# Define tables for dashed line example
@schema
class Department(dj.Manual):
    definition = """
    dept_id : int unsigned
    ---
    dept_name : varchar(60)
    """

@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int unsigned    # Own independent PK
    ---
    -> Department                 # Secondary attribute
    employee_name : varchar(60)
    """
# View the diagram - notice the dashed line
dj.Diagram(Department) + dj.Diagram(Employee)

What Diagrams Show and Don’t Show

Understanding the limitations of diagram notation is crucial for accurate schema interpretation.

Clearly Indicated in Diagrams

FeatureHow It’s Shown
Foreign key in primary keySolid line (thick or thin)
Foreign key as secondary attributeDashed line
One-to-one via shared identityThick solid line
One-to-many via containmentThin solid line
Independent entity (new dimension)Underlined table name
Dependent entity (shared dimension)Non-underlined table name
Table tierColors (Green/Blue/Red/Gray)
Many-to-many patternsConverging solid lines into association table
Renamed foreign keysOrange dots on connection

NOT Visible in Diagrams

FeatureMust Check Table Definition
nullable foreign keysDefinition shows -> [nullable] Parent
unique foreign keysDefinition shows -> [unique] Parent
Combined modifiersDefinition shows -> [nullable, unique] Parent
Secondary unique indexesDefinition shows unique index(...)
CHECK constraintsDefinition shows constraint
Attribute names and typesHover tooltip or inspect definition
Default valuesDefinition shows = value

Example: Hidden Uniqueness

Consider these two schemas—they produce identical diagrams:

Many Spots per Employee
One Spot per Employee
@schema
class ParkingSpot(dj.Manual):
    definition = """
    spot_id : int unsigned
    ---
    -> Employee                   # many spots per employee allowed
    location : varchar(30)
    """

Both show a dashed line from ParkingSpot to Employee. Only by examining the definition can you see the unique constraint.

Association Tables and Many-to-Many

Many-to-many relationships appear as tables with converging foreign keys—multiple solid lines pointing into a single table.

DataJoint
SQL
@schema
class Student(dj.Manual):
    definition = """
    student_id : int unsigned
    ---
    student_name : varchar(60)
    """

@schema
class Course(dj.Manual):
    definition = """
    course_code : char(8)
    ---
    course_title : varchar(100)
    """

@schema
class Enrollment(dj.Manual):
    definition = """
    -> Student
    -> Course
    ---
    grade : enum('A', 'B', 'C', 'D', 'F')
    """

Reading the diagram:

  • Student and Course are independent entities (underlined, at top)

  • Enrollment has two thin solid lines converging into it

  • Its primary key is (student_id, course_code)—the combination of both parents

  • This creates a many-to-many: each student can take multiple courses, each course can have multiple students

# Define tables for many-to-many example
@schema
class Student(dj.Manual):
    definition = """
    student_id : int unsigned
    ---
    student_name : varchar(60)
    """

@schema
class Course(dj.Manual):
    definition = """
    course_code : char(8)
    ---
    course_title : varchar(100)
    """

@schema
class Enrollment(dj.Manual):
    definition = """
    -> Student
    -> Course
    ---
    grade : enum('A', 'B', 'C', 'D', 'F')
    """
# View the many-to-many diagram
dj.Diagram(Student) + dj.Diagram(Course) + dj.Diagram(Enrollment)

Renamed Foreign Keys and Orange Dots

When you reference the same parent table multiple times, or need semantic clarity, use .proj() to rename foreign key attributes.

DataJoint
SQL
@schema
class Neuron(dj.Manual):
    definition = """
    neuron_id : int unsigned
    ---
    neuron_type : enum('excitatory', 'inhibitory')
    """

@schema
class Synapse(dj.Manual):
    definition = """
    synapse_id : int unsigned
    ---
    -> Neuron.proj(presynaptic='neuron_id')
    -> Neuron.proj(postsynaptic='neuron_id')
    strength : float
    """

Orange dots appear between Neuron and Synapse, indicating:

  • A projection has renamed the foreign key attributes

  • Two distinct foreign keys connect the same pair of tables

  • presynaptic and postsynaptic both reference Neuron.neuron_id

In interactive Jupyter notebooks, hovering over orange dots reveals the projection expression.

Common patterns using renamed foreign keys:

  • Neural networks: presynaptic and postsynaptic neurons

  • Organizational hierarchies: employee and manager

  • Transportation: origin and destination airports

# Define tables for renamed foreign key example
@schema
class Neuron(dj.Manual):
    definition = """
    neuron_id : int unsigned
    ---
    neuron_type : enum('excitatory', 'inhibitory')
    """

@schema
class Synapse(dj.Manual):
    definition = """
    synapse_id : int unsigned
    ---
    -> Neuron.proj(presynaptic='neuron_id')
    -> Neuron.proj(postsynaptic='neuron_id')
    strength : float
    """
# View the diagram - notice the orange dots indicating renamed foreign keys
dj.Diagram(Neuron) + dj.Diagram(Synapse)

Diagram Operations

DataJoint provides operators to filter and combine diagrams for exploring large schemas:

# Show entire schema
dj.Diagram(schema)

# Show specific tables
dj.Diagram(Table1) + dj.Diagram(Table2)

# Show table and N levels of upstream dependencies
dj.Diagram(Table) - N

# Show table and N levels of downstream dependents
dj.Diagram(Table) + N

# Combine operations
(dj.Diagram(Table1) - 2) + (dj.Diagram(Table2) + 1)

# Intersection: show only common nodes between two diagrams
dj.Diagram(Table1) * dj.Diagram(Table2)

Finding Paths Between Tables

The intersection operator * is particularly useful for finding connection paths between two tables in a large schema. By expanding one table downstream and another upstream, then taking the intersection, you reveal only the tables that form the path(s) between them:

# Find all paths connecting table1 to table2 (where table2 is downstream from table1)
(dj.Diagram(table1) + 100) * (dj.Diagram(table2) - 100)

This works because:

  • dj.Diagram(table1) + 100 includes table1 and up to 100 levels of downstream dependents

  • dj.Diagram(table2) - 100 includes table2 and up to 100 levels of upstream dependencies

  • The intersection * shows only tables that appear in both diagrams—the connecting path(s)

# View the entire schema we've built
dj.Diagram(schema)
# Find the path from Student to Enrollment using intersection
# Expand Student downstream and Enrollment upstream, then intersect
(dj.Diagram(Student) + 100) * (dj.Diagram(Enrollment) - 100)

Diagrams and Queries

The diagram structure directly informs query patterns.

Solid line paths enable direct joins:

# If A → B → C are connected by solid lines:
A * C  # Valid—primary keys cascade through solid lines

Dashed lines require intermediate tables:

# If A ---> B (dashed), B → C (solid):
A * B * C  # Must include B to connect A and C

This is why solid lines are preferred when the relationship supports it—they simplify queries by allowing you to join non-adjacent tables directly.

Comparison to Other Notations

DataJoint’s notation differs significantly from traditional database diagramming:

FeatureChen’s ERCrow’s FootDataJoint
CardinalityNumbers near entitiesSymbols at line endsLine thickness/style
DirectionNo inherent directionNo inherent directionTop-to-bottom (DAG)
Cycles allowedYesYesNo
Entity vs. relationshipDistinct symbolsNot distinguishedNot distinguished
Primary key cascadeNot shownNot shownSolid lines show this
Identity sharingNot indicatedNot indicatedThick solid line

Why DataJoint differs:

  1. DAG structure: No cycles means schemas are readable as workflows (top-to-bottom execution order)

  2. Line style semantics: Immediately reveals relationship type without reading labels

  3. Primary key cascade visibility: Solid lines show which tables can be joined directly

  4. Unified entity treatment: No artificial distinction between “entities” and “relationships”

Best Practices

Reading Diagrams

  1. Start at the top: Identify independent entities (underlined)

  2. Follow solid lines: Trace primary key cascades downward

  3. Spot convergence patterns: Multiple lines into a table indicate associations

  4. Check line thickness: Thick = one-to-one, Thin = one-to-many containment

  5. Note dashed lines: These don’t cascade identity

  6. Check definitions: For nullable, unique, and other constraints not visible in diagrams

Designing with Diagrams

  1. Choose solid lines when:

    • Building hierarchies (Study → Subject → Session)

    • Creating workflow sequences (Order → Ship → Deliver)

    • You want direct joins across levels

  2. Choose dashed lines when:

    • Child has independent identity from parent

    • Reference might change or is optional

    • You don’t need primary key cascade

  3. Choose thick lines when:

    • Extending entities with optional information

    • Modeling workflow steps (one output per input)

    • Creating true one-to-one relationships

Interactive Tips

  • Hover over tables to see complete definitions (works in Jupyter and SVG exports)

  • Hover over orange dots to see projection expressions

  • Use + and - operators to focus on specific parts of large schemas

Summary

ConceptKey Points
Line StylesThick solid (extension), thin solid (containment), dashed (reference)
Underlined NamesIndicate tables introducing new schema dimensions
Orange DotsIndicate renamed foreign keys via .proj()
Diagram ShowsForeign key placement, relationship structure, table tiers
Diagram Doesn’t Shownullable, unique modifiers, secondary indexes

Cleanup

Optionally drop the tutorial schema when done:

# Uncomment to drop the schema
# schema.drop()