Skip to article frontmatterSkip to article content

Attachnments

This notebooks demonstrates the storaage and retrieval of complex datatypes (blobs) and file attachments in DataJoint.

A blob refers to an attribute in a table that can store complex data structures such as numeric arrays.

An attachment refers to an attribute that can store an entire file with its filename, etc.

Both blobs and attachments can be stored directly in the tables of the relational database or in configurable external “stores” such as network-attached storage servers or object storage systems such Amazon S3 and Minio.

Many of these features existing in prior releases of datajoint but have been substantially expanded in version 0.12.0.

%matplotlib inline
from IPython import display
from matplotlib import pyplot as plt
import os
import imageio
import requests
from ipywidgets import Image
import ipywidgets
import numpy as np

import datajoint as dj
dj.__version__
'0.12.dev7'

Configure stores

The following is a configuration defining two external stores. This should only be done once for all users and the configuration file must be saved and provided to all users.

The first store is named "shared" and is hosted on an S3 endpoint.

The second store is named "local" and it uses the local path ./dj-store.

Now these repositories can be used for blobs and attachments.

## Storage configuration

# set up stores
dj.config['stores'] = {
    'shared': dict(
        protocol='s3',
        endpoint='localhost:9000',
        access_key='datajoint',
        secret_key='datajoint',
        bucket='datajoint-demo',        
        location=''
    ),       
    'local': {  # store in files
        'protocol': 'file',
        'location': os.path.abspath('./dj-store')
    }}
# create a schema for this demo
schema = dj.schema('test_attach')
Connecting dimitri@localhost:3306
schema.drop()  # drop if exists to create anew
Proceed to delete entire schema `test_attach`? [yes, No]: yes
# create a schema for this demo
schema = dj.schema('test_attach')

A Minimal example of blobs and configurable blobs

Let’s declear the table Test with blobs and attachments stored intrnally and externally.

@schema
class Test(dj.Manual):
    definition = """
    # Test blob and attachments
    id : int
    ---
    b0 : longblob       # a python object stored internally in the table
    b1 : blob@shared    # a python object stored on S3
    b2 : blob@local     # a python object store on the file system
    a0 : attach         # a file attachment stored internally in the table
    a1 : attach@shared  # a file attachment stored on s3
    a2 : attach@local   # a file attachment stored on the file system
    """
# Create three numpy arrays as save them in different files
q0, q1, q2 = np.random.randn(3,4), np.random.randn(7), np.random.randn(2, 3, 4)
f0, f1, f2 = './outfile0.npy', './outfile1.npy', './outfile2.npy'
np.save(f0, q0)
np.save(f1, q1)
np.save(f2, q2)
Test.delete()
About to delete:
Nothing to delete
# insert the blobs and the attachments into the table
Test.insert1(dict(id=1, b0=q0, b1=q1, b2=q2, a0=f0, a1=f1, a2=f2))
Test()
Loading...
# delete the attached files
os.remove(f0)
os.remove(f1)
os.remove(f2)
# now fetch them and verify that they retrieved correctly
result = Test.fetch(as_dict=True)
result[0]['b0']
array([[-0.03270063, 1.48132664, -0.47460454, 0.53584166], [-0.54839705, 0.06459001, -0.12448809, 0.1326574 ], [ 0.68458856, 0.36013697, 0.00544484, 0.30176943]])
q0
array([[-0.03270063, 1.48132664, -0.47460454, 0.53584166], [-0.54839705, 0.06459001, -0.12448809, 0.1326574 ], [ 0.68458856, 0.36013697, 0.00544484, 0.30176943]])
np.array_equal(q0, result[0]['b0'])
True
result[0]['a1']
PosixPath('outfile1.npy')
np.load(result[0]['a1'])
array([ 1.24891915, -1.16716244, -0.55109167, -0.45738901, -0.68241222, -0.23278837, 1.23499246])
q1
array([ 1.24891915, -1.16716244, -0.55109167, -0.45738901, -0.68241222, -0.23278837, 1.23499246])
schema.external
External file tables for schema `test_attach`: "shared" s3:" "local" file:/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store"
schema.external['local']
Loading...
schema.external['shared']
Loading...
schema.external['local'].fetch_external_paths()
[(UUID('1faa3094-d718-075c-9c94-f1a6c0505c55'), PurePosixPath('/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store/test_attach/1f/aa/1faa3094d718075c9c94f1a6c0505c55.outfile2.npy')), (UUID('784db39e-7622-9539-a896-d3afb854250c'), PurePosixPath('/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store/test_attach/78/4d/784db39e76229539a896d3afb854250c'))]
schema.external['shared'].fetch_external_paths()
[(UUID('013ac743-ddf5-afc8-bf7d-40fcf3754758'), PurePosixPath('test_attach/01/3a/013ac743ddf5afc8bf7d40fcf3754758.outfile1.npy')), (UUID('90dccd2e-fb2a-02c5-8556-f494b7a4ac57'), PurePosixPath('test_attach/90/dc/90dccd2efb2a02c58556f494b7a4ac57'))]
schema.external['shared'].used()
Loading...
schema.external['local'].used()
Loading...
schema.external['shared'].unused()
Loading...
Test.delete()
About to delete:
`test_attach`.`test`: 1 items
Proceed? [yes, No]: yes
Committed.
schema.external['shared']
Loading...
schema.external['shared'].delete()  # deleted
100%|██████████| 2/2 [00:00<00:00, 74.11it/s]
[]
# cleanup
for s in schema.external.values():
    s.delete()
0it [00:00, ?it/s]
100%|██████████| 2/2 [00:00<00:00, 70.79it/s]

Lookup of images on the web

We create a lookup table, WebImage to point to some images available on the web

@schema
class WebImage(dj.Lookup):
    definition = """
    # A reference to a web image
    image_number : int
    ---
    image_name : varchar(30)
    image_description : varchar(1024)
    image_url : varchar(1024)
    
    unique index(image_name)
    """
    contents = [
        (0, "pyramidal", 
         
         'Coronal section containing the chronically imaged pyramidal neuron "dow" '\
         '(visualized by green GFP) does not stain for GABA (visualized by antibody staining in red). '\
         'Confocal image stack, overlay of GFP and GABA channels. Scale bar: 100 um',
         
         "https://upload.wikimedia.org/wikipedia/commons/d/dc/PLoSBiol4.e126.Fig6fNeuron.jpg"
        ),
        (1, "striatal", 
         
         "Mouse spiny striatal projection neuron expressing a transgenic fluorescent protein "\
         "(colored yellow) delivered by a recombinant virus (AAV). "\
         "The striatal interneuron are stainerd in green for the neurokinin-1 receptor.",
         
         "https://upload.wikimedia.org/wikipedia/commons/e/e8/Striatal_neuron_in_an_interneuron_cage.jpg"
        )
    ]

Preview the images directly from the web

Image.from_url((WebImage & 'image_number=0').fetch1('image_url'))
Loading...
Image.from_url((WebImage & 'image_number=1').fetch1('image_url'))
Loading...

Define a table with attachments

Now we can use the stores to define attachment attributes in the form attribute_name : attach@store # comment where the store is either @local or @shared as defined above.

Let’s define the table OriginalFile to automatically download and attach files from WebImage and stores the attachments in the shared store.

@schema
class OriginalFile(dj.Imported):
    definition = """
    -> WebImage
    ---
    image_file : attach@shared
    """
    
    def make(self, key):
        # get the URL
        url = (WebImage & key).fetch1('image_url')
        
        # download the file from the web
        local_file = os.path.join(os.path.abspath('.'), url.split('/')[-1])
        with open(local_file, 'wb') as f:
            f.write(requests.get(url).content)
            
        # attach the file
        self.insert1(dict(key, image_file=local_file))
        
        # delete the downloaded file
        os.remove(local_file)
dj.Diagram(schema)
Loading...
# perform the download
OriginalFile.populate()
OriginalFile()
Loading...
# preview downloaded attachment
file = (OriginalFile & 'image_number=1').fetch1('image_file')
Image.from_file(file)
Loading...
os.remove(file)

Extract images into blobs

Now let’s define another class that extracts imags from attached files and stores as blobs in the local store.

# Declare a table with a configurable blob
@schema
class Slide(dj.Computed):
    definition = """
    -> OriginalFile
    ---
    image_array : blob@local  # array in specified store
    """
    
    def make(self, key):
        # get the attached file
        file = (OriginalFile & key).fetch1('image_file')
        
        # save image data
        self.insert1(dict(key, image_array=imageio.imread(file)))
        
        # remove the downloaded file
        os.remove(file)
Slide.populate()
Slide()
Loading...
schema.external
External file tables for schema `test_attach`: "shared" s3:" "local" file:/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store"
schema.external['local']
Loading...
# Plot image form a blob
plt.imshow((Slide & 'image_number=0').fetch1('image_array'));
<Figure size 432x288 with 1 Axes>

Caching

By default, the data from blobs and attachments are retrieved from remote stores with every fetch command. For repeated queries, a cache folder may be specified to improve performance and reduce cost of operations. After the first fetch of a given blob or attachment, it will be read from the cache.

# configure the cache
dj.config['cache'] = './dj-cache'
# clear the cache for the timing test
import shutil
if os.path.isdir(dj.config['cache']):
    shutil.rmtree(dj.config['cache'])
%%timeit -n1 -r1

# first time no cache
files = OriginalFile.fetch('image_file')
17.7 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
%%timeit -n1 -r1

# now with cache
files = OriginalFile.fetch('image_file')
11.4 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Deleting

Deleting from tables using external storage is just as simple and transaction-safe as with all other kinds of attributes. Simply use the delete method:

schema.external
External file tables for schema `test_attach`: "shared" s3:" "local" file:/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store"
schema.external['shared']
Loading...
schema.external['shared'].unused()
Loading...
(WebImage & 'image_number=0').delete()
About to delete:
`test_attach`.`__slide`: 1 items
`test_attach`.`_original_file`: 1 items
`test_attach`.`#web_image`: 1 items
Proceed? [yes, No]: yes
Committed.
schema.external['shared'].unused()
Loading...

Deleting

For the sake of performance, deleting from the data tables does not remove the data from external storage.

The delete method of the external table deletes its unused entries and their corresponding external files.

schema.external
External file tables for schema `test_attach`: "shared" s3:" "local" file:/home/dimitri/dev/db-programming-with-datajoint/notebooks/dj-store"

You may cleanup the external table using its delete method. It is a transaction-safe operation and can be performed at any time.

schema.external['local'].delete()
100%|██████████| 1/1 [00:00<00:00, 47.74it/s]
[]
schema.external['shared'].delete()
100%|██████████| 1/1 [00:00<00:00, 66.55it/s]
[]
for s in schema.external.values():
    s.delete()
0it [00:00, ?it/s]
0it [00:00, ?it/s]
schema.external['shared'].used()
Loading...
schema.external['shared'].unused()
Loading...