Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Query Operators

DataJoint provides five operators for building queries. These operators form a complete query language—any question you can ask of your data can be expressed using these five tools.

OperatorSymbolPurpose
Restriction&, -Filter entities by conditions
Projection.proj()Select and compute attributes
Join*Combine related entities
Aggregation.aggr()Summarize related data
Union+Combine entity sets of the same type

Core Principles

Before diving into the operators, two principles guide how DataJoint queries work:

Entity Integrity

Every query result is a well-formed entity set with:

  • A clear entity type (what kind of things are in the result)

  • A defined primary key (how entities are uniquely identified)

  • Attributes that directly describe each entity

This means you always know what you’re working with. A query on Student entities returns Student entities—not some ambiguous collection of data.

Algebraic Closure

The output of any operator is itself a valid entity set that can be used as input to another operator. This enables unlimited composition:

# Chain operators freely
result = ((Student & 'gpa > 3.5') * Enrollment).proj('course_name')

Each intermediate step produces a valid result you can inspect, debug, or use further.

Quick Reference

Here’s a summary of what each operator does and when to use it:

Restriction (&, -)

Use when: You want to filter entities based on conditions.

# Keep entities matching a condition
young_mice = Mouse & 'age < 30'

# Keep entities matching values from another table (semijoin)
mice_with_sessions = Mouse & Session

# Exclude entities (antijoin)
mice_without_sessions = Mouse - Session

Result: Same entity type, same primary key, fewer entities.


Projection (.proj())

Use when: You want to select, rename, or compute attributes.

# Select specific attributes
Mouse.proj('sex', 'date_of_birth')

# Rename an attribute  
Mouse.proj(dob='date_of_birth')

# Compute a new attribute
Mouse.proj(age_days='DATEDIFF(NOW(), date_of_birth)')

Result: Same entity type, same primary key, same number of entities, different attributes.


Join (*)

Use when: You want to combine information from related tables.

# Combine mouse info with their sessions
Mouse * Session

# Chain multiple joins
Mouse * Session * Scan

Result: Combined entity type, combined primary key, only matching combinations.


Aggregation (.aggr())

Use when: You want to summarize related data for each entity.

# Count sessions per mouse
Mouse.aggr(Session, n_sessions='COUNT(*)')

# Average score per student
Student.aggr(Grade, avg_grade='AVG(score)')

Result: Same entity type as the first operand, enriched with summary attributes.


Union (+)

Use when: You want to combine entities from compatible tables.

# Combine two groups of mice
experimental_mice = Mouse & 'group="experimental"'
control_mice = Mouse & 'group="control"'
all_selected = experimental_mice + control_mice

Result: Same entity type, combined entities (duplicates removed).

How DataJoint Differs from SQL

If you’re familiar with SQL, here are the key differences:

AspectSQLDataJoint
ResultsCan produce arbitrary column setsAlways produces well-formed entity sets
JoinsAny columns can be joinedOnly semantically related attributes
AggregationTransforms entity typeEnriches existing entities
Order of operationsHidden execution orderExplicit, left-to-right
Primary keyCan be lost in queriesAlways preserved or well-defined

Semantic Joins

DataJoint’s join operator (*) only matches attributes that share a semantic relationship—they must trace back to the same source through the schema’s dependencies. This prevents accidental joins on coincidentally named columns.

# DataJoint: Only joins on schema-defined relationships
Session * Scan  # Works: scan depends on session

# SQL NATURAL JOIN: Joins on ANY matching column names
# Can produce wrong results if tables share unrelated column names

Aggregation Preserves Identity

SQL’s GROUP BY creates a new entity type. DataJoint’s .aggr() enriches existing entities:

# DataJoint: Still Mouse entities, now with session counts
Mouse.aggr(Session, n='COUNT(*)')

# SQL: Creates "mouse summary" entities, losing direct connection to Mouse
# SELECT mouse_id, COUNT(*) FROM session GROUP BY mouse_id

Building Complex Queries

The power of these operators comes from composition. Build complex queries step by step:

# Goal: Find adult mice with more than 5 sessions, showing their average session duration

# Step 1: Filter to adult mice
adults = Mouse & 'age > 90'

# Step 2: Add session statistics
with_stats = adults.aggr(
    Session,
    n_sessions='COUNT(*)',
    avg_duration='AVG(duration)'
)

# Step 3: Filter to mice with many sessions
result = with_stats & 'n_sessions > 5'

Each step produces a valid entity set you can examine:

adults  # Check: which mice are adults?
with_stats  # Check: what are the session counts?
result  # Final answer

Next Steps

The following chapters explore each operator in detail:

  • Restriction — Filtering with conditions, semijoins, and antijoins

  • Projection — Selecting, renaming, and computing attributes

  • Join — Combining related tables

  • Union — Merging compatible entity sets

  • Aggregation — Computing summaries across related data

Summary

DataJoint’s five query operators provide a complete, composable query language:

  1. Restriction (&, -) — Filter entities

  2. Projection (.proj()) — Shape attributes

  3. Join (*) — Combine related data

  4. Aggregation (.aggr()) — Summarize relationships

  5. Union (+) — Merge entity sets

Every operation preserves entity integrity, ensuring results are always meaningful and can be used in further operations. This makes queries predictable, debuggable, and composable.