Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Operator: Join

The join operator combines data from two tables based on their shared attributes. It produces a new table containing all attributes from both input tables, with rows matched according to DataJoint’s semantic matching rules.

Understanding Join

Join combines attributes from two tables, matching rows where shared attributes have equal values. The result contains all columns from both tables (with shared columns appearing once) and only the rows where the matching attributes align.

Key Concepts

  • Semantic matching: Rows are matched on attributes that share both the same name and the same lineage through foreign keys

  • Algebraic closure: The result is a valid relation with a well-defined primary key

  • Attribute combination: The result contains all attributes from both tables

Basic Syntax

# Join two tables
result = TableA * TableB

The * operator performs a natural join on semantically matched attributes.

Semantic Matching

DataJoint’s join differs from SQL’s NATURAL JOIN in an important way. Two attributes are matched only when they satisfy both conditions:

  1. They have the same name in both tables

  2. They trace to the same original definition through an uninterrupted chain of foreign keys

This prevents accidental joins on attributes that happen to share the same name but have different meanings.

Example: Semantic Matching in Action

Consider tables Student(student_id, name) and Course(course_id, name):

  • Both have an attribute called name

  • But Student.name refers to a person’s name while Course.name refers to a course title

  • These attributes do not share lineage through foreign keys

  • DataJoint will raise an error if you attempt Student * Course because name collides

Resolution: Use projection to rename the colliding attribute:

# Rename 'name' in Course before joining
Student * Course.proj(course_name='name')

Types of Joins

1. Join with Foreign Key Relationship

The most common join connects tables linked by foreign keys. When table B has a foreign key referencing table A, joining them combines their attributes for each matching pair.

# Join students with their enrollments
# Enroll has a foreign key -> Student
Student * Enroll

Result structure:

  • Primary key: The union of primary keys from both tables (with shared attributes appearing once)

  • Attributes: All attributes from both tables

2. Join without Direct Relationship (Cartesian Product)

When two tables share no common attributes, the join produces a Cartesian product—every row from the first table paired with every row from the second.

# All combinations of students and departments
Student.proj() * Department.proj()

Use Cartesian products deliberately and with caution, as they can produce very large result sets.

3. Chained Joins

Multiple tables can be joined in sequence:

# Join students with enrollments and course information
Student * Enroll * Course

Join is associative: (A * B) * C produces the same result as A * (B * C).

Primary Key of Join Results

The primary key of a join result depends on the relationship between the tables:

Case 1: One-to-Many Relationship

When joining a parent table with a child table (child has foreign key to parent):

  • The result’s primary key is the child table’s primary key

  • Each child row appears with its matching parent’s attributes

# Student (parent) * Enroll (child with FK to Student)
# Result primary key: (student_id, course_id, section_id) from Enroll
Student * Enroll

Case 2: Independent Tables

When joining tables with no shared attributes:

  • The result’s primary key is the union of both primary keys

  • Every combination is included

# Result primary key: (student_id, dept)
Student.proj() * Department.proj()

Examples from the University Database

The following examples use the university database schema with Student, Department, Course, Section, Enroll, and Grade tables.

Example 1: Students with Their Majors

# Join Student with StudentMajor to see each student's declared major
Student.proj('first_name', 'last_name') * StudentMajor

SQL Equivalent:

SELECT s.student_id, s.first_name, s.last_name, m.dept, m.declare_date
FROM student s
JOIN student_major m ON s.student_id = m.student_id;

Example 2: Enrollment Details with Course Names

# Combine enrollment records with course information
Enroll * Course

SQL Equivalent:

SELECT e.*, c.course_name, c.credits
FROM enroll e
JOIN course c ON e.dept = c.dept AND e.course = c.course;

Example 3: Complete Grade Report

# Join multiple tables to create a comprehensive grade report
Student.proj('first_name', 'last_name') * Grade * Course * LetterGrade

This produces a table with student names, course details, grades, and grade point values.

Join vs. Restriction by Subquery

Join and restriction serve different purposes:

OperationPurposeResult Attributes
A * B (Join)Combine data from both tablesAll attributes from A and B
A & B (Restriction)Filter A based on matching keys in BOnly attributes from A

Example Comparison

# Join: Get student info WITH their enrollment details
Student * Enroll  # Result has student AND enrollment attributes

# Restriction: Get students WHO have enrollments
Student & Enroll  # Result has only student attributes

Use join when you need data from both tables.
Use restriction when you only need to filter one table based on another.

Left Join

DataJoint’s standard join (*) is an inner join—only rows with matches in both tables appear in the result. For cases where you need to include all rows from the left table regardless of matches, use the .join() method with left=True.

# Include all students, even those without declared majors
Student.proj('first_name', 'last_name').join(StudentMajor, left=True)

Result: All students appear in the result. For students without majors, the major-related attributes contain None.

SQL Equivalent:

SELECT s.student_id, s.first_name, s.last_name, m.dept, m.declare_date
FROM student s
LEFT JOIN student_major m ON s.student_id = m.student_id;

Note: Left joins can produce results that don’t represent a single well-defined entity type. Use them when necessary, but prefer inner joins when the semantics fit your query.

Combining Join with Other Operators

Join works seamlessly with restriction and projection in query expressions.

Restrict Before Joining

For efficiency, apply restrictions before joins to reduce the data being combined:

# Find enrollments for math courses only
(Course & {'dept': 'MATH'}) * Enroll

# Find grades for current term only
Student * (Grade & CurrentTerm)

Project After Joining

Use projection to select only the attributes you need from the combined result:

# Get student names with their enrolled course names
(Student * Enroll * Course).proj('first_name', 'last_name', 'course_name')

Complex Query Example

# Find students enrolled in MATH courses during the current term,
# showing their names and course details
(
    Student.proj('first_name', 'last_name') 
    * (Enroll & CurrentTerm & {'dept': 'MATH'}) 
    * Course.proj('course_name', 'credits')
)

SQL Translation

DataJoint’s join translates to SQL JOIN operations:

Basic Join

# DataJoint
Student * Enroll
-- SQL
SELECT s.*, e.dept, e.course, e.section_id
FROM student s
JOIN enroll e ON s.student_id = e.student_id;

Multi-Table Join

# DataJoint
Student * Enroll * Course
-- SQL
SELECT s.*, e.section_id, c.course_name, c.credits
FROM student s
JOIN enroll e ON s.student_id = e.student_id
JOIN course c ON e.dept = c.dept AND e.course = c.course;

Left Join

# DataJoint
Student.join(StudentMajor, left=True)
-- SQL
SELECT s.*, m.dept, m.declare_date
FROM student s
LEFT JOIN student_major m ON s.student_id = m.student_id;

Best Practices

1. Understand Foreign Key Relationships

Before joining tables, understand how they’re connected:

  • Check the schema diagram (dj.Diagram(schema))

  • Identify which attributes will be matched

  • Predict the primary key of the result

2. Restrict Before Joining

Apply restrictions early to minimize intermediate result sizes:

# Better: restrict first, then join
(Student & {'home_state': 'CA'}) * Enroll

# Less efficient: join first, then restrict
(Student * Enroll) & {'home_state': 'CA'}

3. Resolve Name Collisions

If two tables have attributes with the same name but different meanings, rename them before joining:

# If TableA and TableB both have 'name' with different meanings
TableA * TableB.proj(b_name='name')

4. Use Projection to Keep Results Clean

Project after joining to select only the attributes you need:

(Student * Enroll).proj('first_name', 'last_name', 'dept', 'course')

5. Be Cautious with Cartesian Products

Joining tables with no shared attributes creates a Cartesian product. This is occasionally useful but can produce very large results:

# This creates 2000 students × 4 departments = 8000 rows
Student.proj() * Department.proj()

Summary

The join operator combines data from multiple tables:

  1. Syntax: TableA * TableB performs a natural join on semantically matched attributes

  2. Semantic matching: Attributes must share both name and lineage through foreign keys

  3. Result: Contains all attributes from both tables with matching rows combined

  4. Primary key: Determined by the relationship between the joined tables

  5. Left join: Use .join(other, left=True) to include all rows from the left table

  6. Composition: Join works with restriction and projection to build complex queries

Join is essential for combining related data across tables. Use it when you need attributes from multiple tables in your result.

Practice Exercises

Using the university database, try these exercises:

Exercise 1: Basic Join

Task: Get all students with their enrolled courses, showing student name and course name.

(Student.proj('first_name', 'last_name') * Enroll * Course.proj('course_name'))

Exercise 2: Join with Restriction

Task: Find all CS majors enrolled in current term courses.

(StudentMajor & {'dept': 'CS'}) * Student.proj('first_name', 'last_name') * (Enroll & CurrentTerm)

Exercise 3: Multi-Table Join

Task: Create a complete transcript showing student name, course name, credits, and grade.

Student.proj('first_name', 'last_name') * Grade * Course.proj('course_name', 'credits')

Exercise 4: Left Join

Task: List all students with their majors, including students who haven’t declared a major.

Student.proj('first_name', 'last_name').join(StudentMajor, left=True)