Skip to article frontmatterSkip to article content

Reverse Engineering

This and other notebooks for this video series are found at

https://github.com/vathes/db-programming-with-datajoint

This notebook describes how to work with database schemas without access to the original code that generated the schema. These situations often arise when the database is created by another user who has not shared the generating code yet or when the database schema is created from a programming language other than Python.

import datajoint as dj

Working with schemas and their modules

Typically a DataJoint schema is created as a dedicated Python module. This module defines a schema object that is used to link classes declared in the module to tables in the database schema. As an example, examine the university module in this folder (./university.py).

You may then import the module to interact with its tables:

import university as uni
Connecting dimitri@localhost:3306
dj.Diagram(uni)
Loading...

Note that dj.Diagram can extract the diagram from a schema object or from a python module containing its schema object, lending further support to the convention of one-to-one correspondence between database schemas and Python modules in a datajoint project:

dj.Diagram(uni)

is equvalent to

dj.Diagram(uni.schema)
# students without majors 
uni.Student - uni.StudentMajor
Loading...

Spawning missing classes

Now imagine that you do not have access to university.py or you do not have its latest version. You can still connect to the database schema but you will not have classes declared to interact with it.

So let’s start over in this scenario.

!!! Restart the kernel here to remove the previous class definitions !!!

You can may use the dj.list_schemas function (new in datajoint 0.12.0) to list the names of database schemas available to you.

import datajoint as dj
dj.list_schemas()
Connecting dimitri@localhost:3306
['dimitri_alter', 'dimitri_attach', 'dimitri_blob', 'dimitri_blobs', 'dimitri_nphoton', 'dimitri_schema', 'dimitri_university', 'dimitri_uuid', 'university']

Just as with a new schema, we start by creating a schema object to connect to the chosen database schema:

schema = dj.schema('dimitri_university')

If the schema already exists, dj.schema is initialized as usual and you may plot the schema diagram. But instead of seeing class names, you will see the raw table names as they appear in the database.

# let's plot its diagram
dj.Diagram(schema)
Loading...

You may view the diagram but, at this point, there is now way to interact with these tables. A similar situation arises when another developer has added new tables to the schema but has not yet shared the updated module code with you. Then the diagram will show a mixture of class names and database table names.

Now you may use the schema.spawn_missing_classes method to spawn classes into the local namespace for any tables missing their classes:

schema.spawn_missing_classes()
dj.Di(schema)
Loading...

Now you may interact with these tables as if they were declared right here in this namespace:

# students without majors 
Student - StudentMajor
Loading...

!!! Restart the kernel here to remove the previous class definitions !!!

Creating a virtual module

Now spawn_missing_classes creates the new classes in the local namespace. However, it is often more convenient to import a schema with its python module, equivalent to the python the python command

import university as uni

We can mimick this import without having access to university.py using the create_virtual_module function:

import datajoint as dj

uni = dj.create_virtual_module('university', 'dimitri_university')
Connecting dimitri@localhost:3306

Now uni behaves as an imported module complete with the schema object and all the table classes.

dj.Di(uni)
Loading...
uni.Student - uni.StudentMajor
Loading...

dj.create_virtual_module takes optional arguments.

First, create_schema=False assures that an error is raised when the schema does not already exist. Set it to True if you want to create an empty schema.

dj.create_virtual_module('what', 'nonexistent')
---------------------------------------------------------------------------
DataJointError                            Traceback (most recent call last)
<ipython-input-4-b316110c06f3> in <module>
----> 1 dj.create_virtual_module('what', 'nonexistent')

~/dev/datajoint-python/datajoint/schema.py in create_virtual_module(module_name, schema_name, create_schema, create_tables, connection)
    242     """
    243     module = types.ModuleType(module_name)
--> 244     _schema = Schema(schema_name, create_schema=create_schema, create_tables=create_tables, connection=connection)
    245     _schema.spawn_missing_classes(context=module.__dict__)
    246     module.__dict__['schema'] = _schema

~/dev/datajoint-python/datajoint/schema.py in __init__(self, schema_name, context, connection, create_schema, create_tables)
     65                 raise DataJointError(
     66                     "Database named `{name}` was not defined. "
---> 67                     "Set argument create_schema=True to create it.".format(name=schema_name))
     68             else:
     69                 # create database

DataJointError: Database named `nonexistent` was not defined. Set argument create_schema=True to create it.

The other optional argument, create_tables=False is passed to the schema object. It prevents the use of the schema obect of the virtual module for creating new tables in the existing schema. This is a precautionary measure since virtual modules are often used for completed schemas. You may set this argument to True if you wish to add new tables to the existing schema. A more common approach in this scenario would be to create a new schema object and to use the spawn_missing_classes function to make the classes available.

However, you if do decide to create new tables in an existing tables using the virtual module, you may do so by using the schema object from the module as the decorator for declaring new tables:

uni = dj.create_virtual_module('university.py', 'dimitri_university', create_tables=True)
@uni.schema
class Example(dj.Manual):
    definition = """
    -> uni.Student 
    ---
    example : varchar(255)
    """
dj.Di(uni)
Loading...
uni.Enroll.describe();
-> uni.Student
-> uni.Section

uni.save()