Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Operator: Union

The union operator combines rows from two tables that represent the same entity type. It produces a table containing all unique rows from both input tables.

Understanding Union

Union combines rows from two tables into a single result. Unlike join (which combines columns), union stacks the rows of compatible tables.

Key Concepts

  • Same entity type: Both tables must represent the same kind of entity with the same primary key

  • Semantic compatibility: All shared attributes must be semantically matched

  • Deduplication: Duplicate rows (based on primary key) are included only once

  • Algebraic closure: The result has the same primary key as the input tables

Basic Syntax

# Combine rows from two tables
result = TableA + TableB

The + operator performs a union on tables with compatible schemas.

Requirements for Union

For a union to be valid, the two operands must satisfy these conditions:

1. Same Primary Key

Both tables must have the same primary key attributes—identical names, types, and semantic meaning. They must represent the same entity type.

# Valid: Both represent students with student_id as primary key
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
physics_majors = Student & (StudentMajor & {'dept': 'PHYS'})
stem_majors = math_majors + physics_majors

2. Semantic Compatibility

All attributes shared between the two tables must be semantically compatible—they must trace to the same original definition through foreign keys.

# Invalid: Cannot union Student and Course—different entity types
# Student + Course  # This would raise an error

How Union Handles Attributes

The result of a union includes:

Primary Key Attributes

The result’s primary key is identical to that of both operands. All primary key entries from either table are included.

Secondary Attributes

ScenarioResult
Attribute in both tablesIncluded; value from left operand takes precedence for overlapping keys
Attribute only in left tableIncluded; NULL for rows from right table
Attribute only in right tableIncluded; NULL for rows from left table

Handling Overlapping Keys

When the same primary key exists in both tables:

  • The row appears once in the result

  • Secondary attribute values come from the left operand (the first table in A + B)

# If student 1000 exists in both math_majors and physics_majors,
# the secondary attributes will come from math_majors
result = math_majors + physics_majors

Common Use Cases

1. Combining Query Results with OR Logic

Union is useful when you need entities that satisfy one condition OR another, especially when those conditions involve different related tables.

# Students who speak English OR Spanish
english_speakers = Person & (Fluency & {'lang_code': 'en'})
spanish_speakers = Person & (Fluency & {'lang_code': 'es'})
bilingual_candidates = english_speakers + spanish_speakers

2. Merging Subsets of the Same Table

When you’ve created different filtered views of the same table, union combines them:

# Students from California or New York
ca_students = Student & {'home_state': 'CA'}
ny_students = Student & {'home_state': 'NY'}
coastal_students = ca_students + ny_students

Note: For simple OR conditions on the same table, restriction with a list is often cleaner:

# Equivalent and more concise
coastal_students = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]
# Or using SQL syntax
coastal_students = Student & 'home_state IN ("CA", "NY")'

3. Combining Results from Different Foreign Key Paths

Union shines when the OR conditions involve different relationship paths:

# Students who either major in CS or are enrolled in a CS course
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
cs_enrolled = Student & (Enroll & {'dept': 'CS'})
cs_students = cs_majors + cs_enrolled

Examples from the University Database

Example 1: STEM Majors

Find all students majoring in any STEM field:

# Students in STEM departments
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
physics_majors = Student & (StudentMajor & {'dept': 'PHYS'})
bio_majors = Student & (StudentMajor & {'dept': 'BIOL'})

stem_students = math_majors + cs_majors + physics_majors + bio_majors

Example 2: Students with Academic Activity

Find students who are either currently enrolled or have received grades:

# Students with enrollments in current term
currently_enrolled = Student & (Enroll & CurrentTerm)

# Students with any grades on record
students_with_grades = Student & Grade

# All academically active students
active_students = currently_enrolled + students_with_grades

Example 3: Honor Students

Find students who either have high GPAs or are in the honors program:

# High GPA students (3.5+)
high_gpa = Student.aggr(
    Course * Grade * LetterGrade,
    gpa='SUM(points * credits) / SUM(credits)'
) & 'gpa >= 3.5'

# Students in honors program (assuming an HonorsStudent table)
honors_enrolled = Student & HonorsStudent

# All honor students
all_honors = (Student & high_gpa) + honors_enrolled

Union with Projection

When unioning query expressions, often you’ll work with projections to ensure the tables have compatible structures:

Projecting to Primary Key Only

The simplest union uses only primary keys:

# Get unique student IDs from multiple sources
math_students = (Student & (StudentMajor & {'dept': 'MATH'})).proj()
enrolled_students = (Student & Enroll).proj()
all_relevant = math_students + enrolled_students

Ensuring Attribute Compatibility

If the queries have different secondary attributes, project to a common set:

# Both restricted to same attributes for clean union
ca_names = (Student & {'home_state': 'CA'}).proj('first_name', 'last_name')
ny_names = (Student & {'home_state': 'NY'}).proj('first_name', 'last_name')
coastal_names = ca_names + ny_names

SQL Translation

DataJoint’s union translates to SQL UNION operations:

Basic Union

# DataJoint
math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
stem_students = math_majors.proj() + cs_majors.proj()
-- SQL
SELECT student_id FROM student
WHERE student_id IN (SELECT student_id FROM student_major WHERE dept = 'MATH')
UNION
SELECT student_id FROM student
WHERE student_id IN (SELECT student_id FROM student_major WHERE dept = 'CS');

Union with Attributes

# DataJoint
ca_students = (Student & {'home_state': 'CA'}).proj('first_name', 'last_name')
ny_students = (Student & {'home_state': 'NY'}).proj('first_name', 'last_name')
result = ca_students + ny_students
-- SQL
SELECT student_id, first_name, last_name FROM student WHERE home_state = 'CA'
UNION
SELECT student_id, first_name, last_name FROM student WHERE home_state = 'NY';

Union vs. Other Approaches

Union vs. OR in Restriction

For simple conditions on the same table, use OR (list restriction):

# Using OR (preferred for simple cases)
coastal = Student & [{'home_state': 'CA'}, {'home_state': 'NY'}]

# Using union (equivalent but more verbose)
coastal = (Student & {'home_state': 'CA'}) + (Student & {'home_state': 'NY'})

When Union is Necessary

Use union when:

  1. Conditions involve different related tables

  2. Queries have different computation paths

  3. You’re combining results from separate query expressions

# This requires union—can't express with simple OR
honors_or_dean_list = (Student & HonorsProgram) + (Student & DeansList)

Best Practices

1. Verify Entity Type Compatibility

Before unioning, confirm both operands represent the same entity:

# Check primary keys match
print(query_a.primary_key)
print(query_b.primary_key)

2. Use Projection for Cleaner Results

Project to common attributes when operands have different secondary attributes:

# Project both to same structure
result = query_a.proj('name') + query_b.proj('name')

3. Consider Alternatives for Simple Cases

For simple OR conditions, restriction with a list is cleaner:

# Instead of union for simple cases
Student & 'home_state IN ("CA", "NY", "TX")'

4. Be Aware of Left Precedence

Remember that for overlapping primary keys, secondary attributes come from the left operand:

# order_a's attributes take precedence for shared keys
result = order_a + order_b

Summary

The union operator combines rows from compatible tables:

  1. Syntax: TableA + TableB combines rows from both tables

  2. Requirements: Same primary key and entity type; semantically compatible attributes

  3. Deduplication: Each primary key appears once; left operand takes precedence

  4. Use cases: OR logic across different relationships, combining filtered subsets

  5. Alternatives: For simple OR on the same table, use list restriction instead

Union is useful for combining query results that represent the same entity type from different filtering paths.

Practice Exercises

Exercise 1: Simple Union

Task: Find all students majoring in either Math or CS.

math_majors = Student & (StudentMajor & {'dept': 'MATH'})
cs_majors = Student & (StudentMajor & {'dept': 'CS'})
math_or_cs = math_majors + cs_majors

Exercise 2: Union Across Relationships

Task: Find students who either have a declared major or are enrolled in at least one course.

students_with_major = Student & StudentMajor
students_enrolled = Student & Enroll
active_students = students_with_major + students_enrolled

Exercise 3: Union with Projection

Task: Get names of students from western states (CA, OR, WA).

western_students = (
    (Student & {'home_state': 'CA'}).proj('first_name', 'last_name') +
    (Student & {'home_state': 'OR'}).proj('first_name', 'last_name') +
    (Student & {'home_state': 'WA'}).proj('first_name', 'last_name')
)

# Or more simply:
western_students = (Student & 'home_state IN ("CA", "OR", "WA")').proj('first_name', 'last_name')