-
Notifications
You must be signed in to change notification settings - Fork 132
migration
Let's say we have created a blog post which look like this:
>>> from mongokit import *
>>> con = Connection()
class BlogPost(Document):
structure = {
"blog_post":{
"title": unicode,
"created_at": datetime,
"body": unicode,
}
}
default_values = {'blog_post.created_at':datetime.utcnow}
Let's create some blog posts:
>>> for i in range(10):
... con.test.tutorial.BlogPost({'title':u'hello %s' % i, 'body': u'I the post number %s' % i}).save()
Now, developpment goes on and we add a 'tags' field in our BlogPost
:
class BlogPost(Document):
structure = {
"blog_post":{
"title": unicode,
"created_at": datetime,
"body": unicode,
"tags": [unicode],
}
}
default_values = {'blog_post.created_at':datetime.utcnow}
We're gonna be in trouble when we'll save a fetched document because the structure don't match:
>>> blog_post = con.test.tutorial.BlogPost.find_one()
>>> blog_post['blog_post']['title'] = u'Hello World'
>>> blog_post.save()
Traceback (most recent call last):
...
StructureError: missed fields : ['tags']
If we want to fix this issue, we have to add the 'tags' field manually in all BlogPost
of the database:
>>> con.test.tutorial.update({'blog_post':{'$exists':True}, 'blog_post.tags':{'$exists':False}},
... {'$set':{'blog_post.tags':[]}}, multi=True)
and now we can save our blog_post:
>>> blog_post.reload()
>>> blog_post['blog_post']['title'] = u'Hello World'
>>> blog_post.save()
IMPORTANT: You cannot use this feature if use_schemaless
is set to True
Mongokit provides a convenient way to set migration rules an apply them lazily. Here's how to do, we use the previous example.
Let's create a BlogPostMigration
which inherit from DocumentMigration
:
class BlogPostMigration(DocumentMigration):
def migration01__add_tags_field(self):
self.target = {'blog_post':{'$exists':True}, 'blog_post.tags':{'$exists':False}}
self.update = {'$set':{'blog_post.tags':[]}}
How does it work ? All migration rules are simple method in the BlogPostMigration
. They must begin by migration
and be numeroted (so they can be applied in certain order). The rest of the name should describes the rules. Here, we create our first rule (migration01
) which add a 'tags' field into our BlogPost
.
Then you must set two attribute : self.target
and self.update
. There's both mongodb regular query.
self.target
will tell mongokit which document will match this rule. So, any document which match this query will would get a migration.
self.update
is a mongodb update query with modifiers. This will describes what update shoud be apply to the matching document.
Now that our BlogPostMigration
is created, we have to tell Mongokit to what document thoses migration rules should be applied. To do that, we have to set the migration_handler
in BlogPost
:
class BlogPost(Document):
structure = {
"blog_post":{
"title": unicode,
"created_at": datetime,
"body": unicode,
"tags": [unicode],
}
}
default_values = {'blog_post.created_at':datetime.utcnow}
migration_handler = BlogPostMigration
Each time that an error is raised while validating a document, migration rules are applied to the object and the document is reloaded.
CAUTION:
if migration_handler
is set then skip_validation
is desactivated.
Validation must be on to allow lazy migration.
Lazy migration is usefull if you have many document to migrate because update will lock the database. But sometime, you might want to make a migration on few documents and you don't want slow down your application with validation. You should then use bulk migration.
Bulk migration work like lazy migration but DocumentMigration
method must start with allmigration
. Because lazy migration add document _id
to self.target
, with bulk migration, you should provide more informations on self.target
. Here's an example of bulk migration, finally, we wan't to remove
the tags
field from BlogPost
:
class BlogPost(Document):
structure = {
"blog_post":{
"title": unicode,
"creation_date": datetime,
"body": unicode,
}
}
default_values = {'blog_post.created_at':datetime.utcnow}
Note that we don't need to add the migration_hanlder
, it is required only for lazy migration.
Let's edit the BlogPostMigration
:
class BlogPostMigration(DocumentMigration):
def allmigration01_remove_tags(self):
self.target = {'blog_post.tags':{'$exists':True}}
self.update = {'$unset':{'blog_post.tags':[]}}
To apply the migration, instanciate the BlogPostMigration
and call the migrate_all
method:
>>> migration = BlogPostMigration(BlogPost)
>>> migration.migrate_all(collection=con.test.tutorial)
NOTE:
because migration_*
methods are not called with migrate_all()
, you can mix migration_*
and allmigration_*
methods.
Once all your documents have been migrated, some migration rules could become deprecated. To know wich rules are deprecated, use the get_deprecated
method:
>>>> migration = BlogPostMigration(BlogPost)
>>> migration.get_deprecated(collection=con.test.tutorial)
{'deprecated':['allmigration01__remove_tags'], 'active':['migration02__rename_created_at']}
Here, we can remove the rule allmigration01__remove_tags
.
Sometime, we might want to build more advanced migration. For instance, say you
want to copy a field value into another field, you can have access to the
current doc value via self.doc
. In the following example, we want to add an
update_date
field and copy the creation_date
value into it:
class BlogPostMigration(DocumentMigration):
def migration01__add_update_field_and_fill_it(self):
self.target = {'blog_post.update_date':{'$exists':False}, 'blog_post':{'$exists':True}}
self.update = {'$set':{'blog_post.update_date': self.doc['blog_post']['creation_date']}}
If you want to do the same thing with bulk migration, things are a little differents:
class BlogPostMigration(DocumentMigration):
def allmigration01__add_update_field_and_fill_it(self):
self.target = {'blog_post.update_date':{'$exists':False}, 'blog_post':{'$exists':True}}
if not self.status:
for doc in self.collection.find(self.target):
self.update = {'$set':{'blog_post.update_date': doc['blog_post']['creation_date']}}
self.collection.update(self.target, self.update, multi=True, safe=True)
In this example, the method allmigration01__add_update_field_and_fill_it
will
directly modify the database and will be called by get_deprecated()
. But calling
get_deprecated()
should not arm the database so, we need to specify what portion
of the code must be ignored when calling get_deprecated()
. This explain the
second line.