Lookup Tables - The DataJoint Book

Lookup tables are a special kind of tables in DataJoint that are used to store static data or enumerations that don’t change frequently.

The contents of lookup tables is considered part of schema design: they are populated at the time of table declaration. When a new instance of the database is initiated, lookup tables get populated immediately and the database is still considered “empty.”

Although technically the contents of lookup tables can change overtime, this should be considered a rare even, part of schema design alteration and not a normal database operation.

Lookup tables are particularly useful for maintaining data consistency and enforcing referential integrity in your database.

Lookup tables are created by subclassing the dj.Lookup class. Rather than using insert commands to populate lookup tables, its contents should be provided in the contents member of the table class. The @schema decorator declares the table and populates its from the contents class member.

Examples¶

The following are some examples of lookup tables.

import datajoint as dj
schema = dj.Schema('learning_lookups')

[2024-08-27 03:02:03,038][INFO]: Connecting root@localhost:3306
[2024-08-27 03:02:03,055][INFO]: Connected root@localhost:3306

Example: Letter grades¶

Consider the lookup table that converts letter grades into their numerical point values. The primary key is the letter grade. The points are of type decimal(3,2).

@schema
class LetterGrade(dj.Lookup):
    definition = """
    grade_letter: char(2)  # Letter grade
    ----
    grade_point = null: decimal(3,2) unsigned     # Corresponding grade point
    """
    contents = [
        ('A', 4.0),
        ('A-', 3.67),
        ('B+', 3.33),
        ('B', 3.00),
        ('B-', 2.67),
        ('C+', 2.33),
        ('C', 2.00),
        ('C-', 1.67),
        ('D+', 1.33),
        ('D', 1.00),
        ('F', 0.00),
        ('I', None)
    ]

LetterGrade()

Note that the point value is nullable, it can be set to NULL, which is represented by None in Python.

Example: Gene Information in Genomics Research¶

In genetics or genomics research, a lookup table can hold gene information to ensure consistency across datasets.

@schema
class Gene(dj.Lookup):
    definition = """
    gene_name: varchar(32)  # Gene symbol or name
    ----
    full_name: varchar(255)  # Full name or description of the gene
    """
    contents = [
        ('BRCA1', 'Breast cancer type 1 susceptibility protein'),
        ('TP53', 'Tumor protein p53')
    ]

Example: Mouse Brain Regions¶

Here is how you can create a lookup table for mouse brain regions according to the Allen Institute Common Coordinate Framework Wang et al., 2020 using DataJoint:

@schema
class BrainRegion(dj.Lookup):
    definition = """
    region_id: int               # Unique identifier for the brain region (e.g., from the Allen Institute CCF)
    ----
    acronym: varchar(32)         # Acronym for the brain region
    name: varchar(255)           # Full name of the brain region
    parent_id: int               # Parent region ID for hierarchical structure (0 if no parent)
    structure_order: int         # Structure order in the hierarchy
    color_hex: varchar(7)        # Hex color code for visualizing the region
    """
    
    contents = [
        (997, 'root', 'Root', 0, 1, '#FFFFFF'),
        (688, 'OLF', 'Olfactory areas', 997, 2, '#A9A9A9'),
        (315, 'HPF', 'Hippocampal formation', 997, 3, '#FF0000'),
        (549, 'CTXsp', 'Cortical subplate', 997, 4, '#0000FF'),
        (567, 'STR', 'Striatum', 997, 5, '#00FF00'),
        (803, 'PAL', 'Pallidum', 997, 6, '#FFFF00'),
        (961, 'TH', 'Thalamus', 997, 7, '#FFA500'),
        (1097, 'HY', 'Hypothalamus', 997, 8, '#800080'),
        (313, 'MB', 'Midbrain', 997, 9, '#008000'),
        (803, 'HB', 'Hindbrain', 997, 10, '#00FFFF'),
        # Add more regions as needed
    ]

BrainRegion()

Example: Table of Elements¶

For a more substantial example, consider the lookup table for the chemical elements.

# Splitting the data into smaller parts to reduce errors

# First set of elements
part1 = {
    "atomic_number": list(range(1, 31)),
    "element": ["Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine", "Neon",
                "Sodium", "Magnesium", "Aluminum", "Silicon", "Phosphorus", "Sulfur", "Chlorine", "Argon",
                "Potassium", "Calcium", "Scandium", "Titanium", "Vanadium", "Chromium", "Manganese", "Iron", 
                "Cobalt", "Nickel", "Copper", "Zinc"],
    "symbol": ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", 
               "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", 
               "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", 
               "Co", "Ni", "Cu", "Zn"],
    "atomic_weight": [1.008, 4.0026, 6.94, 9.0122, 10.81, 12.011, 14.007, 15.999, 18.998, 20.180, 
                      22.990, 24.305, 26.982, 28.085, 30.974, 32.06, 35.45, 39.948, 
                      39.098, 40.078, 44.956, 47.867, 50.942, 51.996, 54.938, 55.845, 
                      58.933, 58.693, 63.546, 65.38],
    "electron_orbitals": ["1s1", "1s2", "[He] 2s1", "[He] 2s2", "[He] 2s2 2p1", "[He] 2s2 2p2", "[He] 2s2 2p3", "[He] 2s2 2p4", "[He] 2s2 2p5", "[He] 2s2 2p6", 
                          "[Ne] 3s1", "[Ne] 3s2", "[Ne] 3s2 3p1", "[Ne] 3s2 3p2", "[Ne] 3s2 3p3", "[Ne] 3s2 3p4", "[Ne] 3s2 3p5", "[Ne] 3s2 3p6", 
                          "[Ar] 4s1", "[Ar] 4s2", "[Ar] 3d1 4s2", "[Ar] 3d2 4s2", "[Ar] 3d3 4s2", "[Ar] 3d5 4s1", "[Ar] 3d5 4s2", "[Ar] 3d6 4s2", 
                          "[Ar] 3d7 4s2", "[Ar] 3d8 4s2", "[Ar] 3d10 4s1", "[Ar] 3d10 4s2"]
}

# Second set of elements
part2 = {
    "atomic_number": list(range(31, 61)),
    "element": ["Gallium", "Germanium", "Arsenic", "Selenium", "Bromine", "Krypton", "Rubidium", "Strontium", "Yttrium", 
                "Zirconium", "Niobium", "Molybdenum", "Technetium", "Ruthenium", "Rhodium", "Palladium", "Silver", 
                "Cadmium", "Indium", "Tin", "Antimony", "Tellurium", "Iodine", "Xenon", "Cesium", "Barium", "Lanthanum", 
                "Cerium", "Praseodymium", "Neodymium"],
    "symbol": ["Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", 
               "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", 
               "Cd", "In", "Sn", "Sb", "Te", "I", "Xe", "Cs", "Ba", "La", 
               "Ce", "Pr", "Nd"],
    "atomic_weight": [69.723, 72.63, 74.922, 78.971, 79.904, 83.798, 85.468, 87.62, 88.906, 91.224, 92.906, 95.95, 
                      98, 101.07, 102.91, 106.42, 107.87, 112.41, 114.82, 118.71, 121.76, 127.6, 126.9, 131.29, 
                      132.91, 137.33, 138.91, 140.12, 140.91, 144.24],
    "electron_orbitals": ["[Ar] 3d10 4s2 4p1", "[Ar] 3d10 4s2 4p2", "[Ar] 3d10 4s2 4p3", "[Ar] 3d10 4s2 4p4", "[Ar] 3d10 4s2 4p5", 
                          "[Ar] 3d10 4s2 4p6", "[Kr] 5s1", "[Kr] 5s2", "[Kr] 4d1 5s2", "[Kr] 4d2 5s2", "[Kr] 4d3 5s2", "[Kr] 4d4 5s2", 
                          "[Kr] 4d5 5s2", "[Kr] 4d6 5s2", "[Kr] 4d7 5s2", "[Kr] 4d8 5s2", "[Kr] 4d10", "[Kr] 4d10 5s2", 
                          "[Kr] 4d10 5s2 5p1", "[Kr] 4d10 5s2 5p2", "[Kr] 4d10 5s2 5p3", "[Kr] 4d10 5s2 5p4", "[Kr] 4d10 5s2 5p5", 
                          "[Kr] 4d10 5s2 5p6", "[Xe] 6s1", "[Xe] 6s2", "[Xe] 4f1 5d1 6s2", "[Xe] 4f2 6s2", "[Xe] 4f3 6s2", 
                          "[Xe] 4f4 6s2"]
}

@schema
class Element(dj.Lookup):
    definition = """
    atomic_number : tinyint unsigned 
    ---
    symbol  : char(2)
    element : varchar(30)
    atomic_weight  : float 
    electron_orbitals : varchar(20)
    """
    contents = [dict(zip(part1, r)) for r in zip(*part1.values())] + [
        dict(zip(part2, r)) for r in zip(*part2.values())
        ]

Element()

Why Use Lookup Tables?¶

Consistency: Lookup tables enforce consistent use of values across your database, reducing the risk of data entry errors.
Referential Integrity: By linking other tables to a lookup table, you ensure that only valid, predefined values are used.
Ease of Updates: If a standard value needs to change (e.g., correcting a typo in a species name), it only needs to be updated in one place.

In summary, lookup tables are powerful tools in DataJoint for maintaining the consistency and integrity of your data, especially when dealing with standardized or repeated information across multiple tables in your schema.

References¶

Wang, Q., Ding, S.-L., Li, Y., Royall, J., Feng, D., Lesnar, P., Graddis, N., Naeemi, M., Facer, B., Ho, A., Dolbeare, T., Blanchard, B., Dee, N., Wakeman, W., Hirokawa, K. E., Szafer, A., Sunkin, S. M., Oh, S. W., Bernard, A., … Ng, L. (2020). The Allen Mouse Brain Common Coordinate Framework: A 3D Reference Atlas. Cell, 181(4), 936-953.e20. 10.1016/j.cell.2020.04.007