The union operator combines rows from two tables that represent the same entity type. It produces a table containing all unique rows from both input tables.
Understanding Union¶
Union combines rows from two tables into a single result. Unlike join (which combines columns), union stacks the rows of compatible tables.
Key Concepts¶
Same entity type: Both tables must represent the same kind of entity with the same primary key
Semantic compatibility: All shared attributes must be semantically matched
Deduplication: Duplicate rows (based on primary key) are included only once
Algebraic closure: The result has the same primary key as the input tables
Basic Syntax¶
# Combine rows from two tables
result = TableA + TableBThe + operator performs a union on tables with compatible schemas.
Requirements for Union¶
For a union to be valid, the two operands must satisfy these conditions:
1. Same Primary Key¶
Both tables must have the same primary key attributes—identical names, types, and semantic meaning. They must represent the same entity type.
# Valid: Both represent students with student_id as primary key
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
physics_majors = Student & (StudentMajor & {'dept': 'PHYS'})
stem_majors = math_majors + physics_majors2. Semantic Compatibility¶
All attributes shared between the two tables must be semantically compatible—they must trace to the same original definition through foreign keys.
# Invalid: Cannot union Student and Course—different entity types
# Student + Course # This would raise an errorHow Union Handles Attributes¶
The result of a union includes:
Primary Key Attributes¶
The result’s primary key is identical to that of both operands. All primary key entries from either table are included.
Secondary Attributes¶
| Scenario | Result |
|---|---|
| Attribute in both tables | Included; value from left operand takes precedence for overlapping keys |
| Attribute only in left table | Included; NULL for rows from right table |
| Attribute only in right table | Included; NULL for rows from left table |
Handling Overlapping Keys¶
When the same primary key exists in both tables:
The row appears once in the result
Secondary attribute values come from the left operand (the first table in
A + B)
# If student 1000 exists in both math_majors and physics_majors,
# the secondary attributes will come from math_majors
result = math_majors + physics_majorsCommon Use Cases¶
1. Combining Query Results with OR Logic¶
Union is useful when you need entities that satisfy one condition OR another, especially when those conditions involve different related tables.
# Students who speak English OR Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual_candidates = english_speakers + spanish_speakers2. Merging Subsets of the Same Table¶
When you’ve created different filtered views of the same table, union combines them:
# Students from California or New York
ca_students = Student & {'home_state': 'CA'}
ny_students = Student & {'home_state': 'NY'}
coastal_students = ca_students + ny_studentsNote: For simple OR conditions on the same table, restriction with a list is often cleaner:
# Equivalent and more concise
coastal_students = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]
# Or using SQL syntax
coastal_students = Student & 'home_state IN ("CA", "NY")'3. Combining Results from Different Foreign Key Paths¶
Union shines when the OR conditions involve different relationship paths:
# Students who either major in CS or are enrolled in a CS course
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
cs_enrolled = Student & (Enroll & {'dept': 'CS'})
cs_students = cs_majors + cs_enrolledExamples from the University Database¶
Example 1: STEM Majors¶
Find all students majoring in any STEM field:
# Students in STEM departments
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
physics_majors = Student & (StudentMajor & {'dept': 'PHYS'})
bio_majors = Student & (StudentMajor & {'dept': 'BIOL'})
stem_students = math_majors + cs_majors + physics_majors + bio_majorsExample 2: Students with Academic Activity¶
Find students who are either currently enrolled or have received grades:
# Students with enrollments in current term
currently_enrolled = Student & (Enroll & CurrentTerm)
# Students with any grades on record
students_with_grades = Student & Grade
# All academically active students
active_students = currently_enrolled + students_with_gradesExample 3: Honor Students¶
Find students who either have high GPAs or are in the honors program:
# High GPA students (3.5+)
high_gpa = Student.aggr(
Course * Grade * LetterGrade,
gpa='SUM(points * credits) / SUM(credits)'
) & 'gpa >= 3.5'
# Students in honors program (assuming an HonorsStudent table)
honors_enrolled = Student & HonorsStudent
# All honor students
all_honors = (Student & high_gpa) + honors_enrolledUnion with Projection¶
When unioning query expressions, often you’ll work with projections to ensure the tables have compatible structures:
Projecting to Primary Key Only¶
The simplest union uses only primary keys:
# Get unique student IDs from multiple sources
math_students = (Student & (StudentMajor & {'dept': 'MATH'})).proj()
enrolled_students = (Student & Enroll).proj()
all_relevant = math_students + enrolled_studentsEnsuring Attribute Compatibility¶
If the queries have different secondary attributes, project to a common set:
# Both restricted to same attributes for clean union
ca_names = (Student & {'home_state': 'CA'}).proj('first_name', 'last_name')
ny_names = (Student & {'home_state': 'NY'}).proj('first_name', 'last_name')
coastal_names = ca_names + ny_namesSQL Translation¶
DataJoint’s union translates to SQL UNION operations:
Basic Union¶
# DataJoint
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
stem_students = math_majors.proj() + cs_majors.proj()-- SQL
SELECT student_id FROM student
WHERE student_id IN (SELECT student_id FROM student_major WHERE dept = 'MATH')
UNION
SELECT student_id FROM student
WHERE student_id IN (SELECT student_id FROM student_major WHERE dept = 'CS');Union with Attributes¶
# DataJoint
ca_students = (Student & {'home_state': 'CA'}).proj('first_name', 'last_name')
ny_students = (Student & {'home_state': 'NY'}).proj('first_name', 'last_name')
result = ca_students + ny_students-- SQL
SELECT student_id, first_name, last_name FROM student WHERE home_state = 'CA'
UNION
SELECT student_id, first_name, last_name FROM student WHERE home_state = 'NY';Union vs. Other Approaches¶
Union vs. OR in Restriction¶
For simple conditions on the same table, use OR (list restriction):
# Using OR (preferred for simple cases)
coastal = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]
# Using union (equivalent but more verbose)
coastal = (Student & {'home_state': 'CA'}) + (Student & {'home_state': 'NY'})When Union is Necessary¶
Use union when:
Conditions involve different related tables
Queries have different computation paths
You’re combining results from separate query expressions
# This requires union—can't express with simple OR
honors_or_dean_list = (Student & HonorsProgram) + (Student & DeansList)Best Practices¶
1. Verify Entity Type Compatibility¶
Before unioning, confirm both operands represent the same entity:
# Check primary keys match
print(query_a.primary_key)
print(query_b.primary_key)2. Use Projection for Cleaner Results¶
Project to common attributes when operands have different secondary attributes:
# Project both to same structure
result = query_a.proj('name') + query_b.proj('name')3. Consider Alternatives for Simple Cases¶
For simple OR conditions, restriction with a list is cleaner:
# Instead of union for simple cases
Student & 'home_state IN ("CA", "NY", "TX")'4. Be Aware of Left Precedence¶
Remember that for overlapping primary keys, secondary attributes come from the left operand:
# order_a's attributes take precedence for shared keys
result = order_a + order_bSummary¶
The union operator combines rows from compatible tables:
Syntax:
TableA + TableBcombines rows from both tablesRequirements: Same primary key and entity type; semantically compatible attributes
Deduplication: Each primary key appears once; left operand takes precedence
Use cases: OR logic across different relationships, combining filtered subsets
Alternatives: For simple OR on the same table, use list restriction instead
Union is useful for combining query results that represent the same entity type from different filtering paths.
Practice Exercises¶
Exercise 1: Simple Union¶
Task: Find all students majoring in either Math or CS.
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
math_or_cs = math_majors + cs_majorsExercise 2: Union Across Relationships¶
Task: Find students who either have a declared major or are enrolled in at least one course.
students_with_major = Student & StudentMajor
students_enrolled = Student & Enroll
active_students = students_with_major + students_enrolledExercise 3: Union with Projection¶
Task: Get names of students from western states (CA, OR, WA).
western_students = (
(Student & {'home_state': 'CA'}).proj('first_name', 'last_name') +
(Student & {'home_state': 'OR'}).proj('first_name', 'last_name') +
(Student & {'home_state': 'WA'}).proj('first_name', 'last_name')
)
# Or more simply:
western_students = (Student & 'home_state IN ("CA", "OR", "WA")').proj('first_name', 'last_name')