migration

Migration

Let's say we have created a blog post which look like this:

>>> from mongokit import *
>>> con = Connection()

class BlogPost(Document):
    structure = {
        "blog_post":{
            "title": unicode,
            "created_at": datetime,
            "body": unicode,
        }
    }
    default_values = {'blog_post.created_at':datetime.utcnow}

Let's create some blog posts:

>>> for i in range(10):
...     con.test.tutorial.BlogPost({'title':u'hello %s' % i, 'body': u'I the post number %s' % i}).save()

Now, developpment goes on and we add a 'tags' field in our BlogPost:

class BlogPost(Document):
    structure = {
        "blog_post":{
            "title": unicode,
            "created_at": datetime,
            "body": unicode,
            "tags":  [unicode],
        }
    }
    default_values = {'blog_post.created_at':datetime.utcnow}

We're gonna be in trouble when we'll save a fetched document because the structure don't match:

>>> blog_post = con.test.tutorial.BlogPost.find_one()
>>> blog_post['blog_post']['title'] = u'Hello World'
>>> blog_post.save()
Traceback (most recent call last):
    ...
StructureError: missed fields : ['tags']

If we want to fix this issue, we have to add the 'tags' field manually in all BlogPost of the database:

>>> con.test.tutorial.update({'blog_post':{'$exists':True}, 'blog_post.tags':{'$exists':False}},
...    {'$set':{'blog_post.tags':[]}}, multi=True)

and now we can save our blog_post:

>>> blog_post.reload()
>>> blog_post['blog_post']['title'] = u'Hello World'
>>> blog_post.save()

Lazy migration

IMPORTANT: You cannot use this feature if use_schemaless is set to True

Mongokit provides a convenient way to set migration rules an apply them lazily. Here's how to do, we use the previous example.

Let's create a BlogPostMigration which inherit from DocumentMigration:

class BlogPostMigration(DocumentMigration):
    def migration01__add_tags_field(self):
        self.target = {'blog_post':{'$exists':True}, 'blog_post.tags':{'$exists':False}}
        self.update = {'$set':{'blog_post.tags':[]}}

How does it work ? All migration rules are simple method in the BlogPostMigration. They must begin by migration and be numeroted (so they can be applied in certain order). The rest of the name should describes the rules. Here, we create our first rule (migration01) which add a 'tags' field into our BlogPost.

Then you must set two attribute : self.target and self.update. There's both mongodb regular query.

self.target will tell mongokit which document will match this rule. So, any document which match this query will would get a migration.

self.update is a mongodb update query with modifiers. This will describes what update shoud be apply to the matching document.

Now that our BlogPostMigration is created, we have to tell Mongokit to what document thoses migration rules should be applied. To do that, we have to set the migration_handler in BlogPost:

class BlogPost(Document):
    structure = {
        "blog_post":{
            "title": unicode,
            "created_at": datetime,
            "body": unicode,
            "tags": [unicode],
        }
    }
    default_values = {'blog_post.created_at':datetime.utcnow}
    migration_handler = BlogPostMigration

Each time that an error is raised while validating a document, migration rules are applied to the object and the document is reloaded.

CAUTION: if migration_handler is set then skip_validation is desactivated. Validation must be on to allow lazy migration.

Bulk migration

Lazy migration is usefull if you have many document to migrate because update will lock the database. But sometime, you might want to make a migration on few documents and you don't want slow down your application with validation. You should then use bulk migration.

Bulk migration work like lazy migration but DocumentMigration method must start with allmigration. Because lazy migration add document _id to self.target, with bulk migration, you should provide more informations on self.target. Here's an example of bulk migration, finally, we wan't to remove the tags field from BlogPost:

class BlogPost(Document):
    structure = {
        "blog_post":{
            "title": unicode,
            "creation_date": datetime,
            "body": unicode,
        }
    }
    default_values = {'blog_post.created_at':datetime.utcnow}

Note that we don't need to add the migration_hanlder, it is required only for lazy migration.

Let's edit the BlogPostMigration:

class BlogPostMigration(DocumentMigration):
    def allmigration01_remove_tags(self):
        self.target = {'blog_post.tags':{'$exists':True}}
        self.update = {'$unset':{'blog_post.tags':[]}}

To apply the migration, instanciate the BlogPostMigration and call the migrate_all method:

>>> migration = BlogPostMigration(BlogPost)
>>> migration.migrate_all(collection=con.test.tutorial)

NOTE: because migration_* methods are not called with migrate_all(), you can mix migration_* and allmigration_* methods.

Migration status

Once all your documents have been migrated, some migration rules could become deprecated. To know wich rules are deprecated, use the get_deprecated method:

>>>> migration = BlogPostMigration(BlogPost)
>>> migration.get_deprecated(collection=con.test.tutorial)
{'deprecated':['allmigration01__remove_tags'], 'active':['migration02__rename_created_at']}

Here, we can remove the rule allmigration01__remove_tags.

Advanced migration

Lazy migration

Sometime, we might want to build more advanced migration. For instance, say you want to copy a field value into another field, you can have access to the current doc value via self.doc. In the following example, we want to add an update_date field and copy the creation_date value into it:

class BlogPostMigration(DocumentMigration):
    def migration01__add_update_field_and_fill_it(self):
        self.target = {'blog_post.update_date':{'$exists':False}, 'blog_post':{'$exists':True}}
        self.update = {'$set':{'blog_post.update_date': self.doc['blog_post']['creation_date']}}

Advanced and bulk migration

If you want to do the same thing with bulk migration, things are a little differents:

class BlogPostMigration(DocumentMigration):
    def allmigration01__add_update_field_and_fill_it(self):
        self.target = {'blog_post.update_date':{'$exists':False}, 'blog_post':{'$exists':True}}
        if not self.status:
            for doc in self.collection.find(self.target):
                self.update = {'$set':{'blog_post.update_date': doc['blog_post']['creation_date']}}
                self.collection.update(self.target, self.update, multi=True, safe=True)

In this example, the method allmigration01__add_update_field_and_fill_it will directly modify the database and will be called by get_deprecated(). But calling get_deprecated() should not arm the database so, we need to specify what portion of the code must be ignored when calling get_deprecated(). This explain the second line.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

migration

Migration

Lazy migration

Bulk migration

Migration status

Advanced migration

Lazy migration

Advanced and bulk migration

Clone this wiki locally