A guide to testing Drupal data migrations with CasperJS

At DesignHammer, we've used the Migrate module on many projects, with a variety of use cases: Drupal-to-Drupal migrations; importing content from CSVs or XML to populate a new site build; importing XML from an eBook as nodes for an online version of the book; and so on. It's a flexible, stable module that provides a great API for getting content from point A to point B. (Note: if you are not familiar with the Migrate module, check out its documentation. For the purposes of this post, you just need to know that it's a framework that helps developers move content from an external source into Drupal.)

But while Migrate lets you automate the process of importing the content, validating what gets migrated is often a manual review process. In most of the projects we've worked on, if our migration code gets article #123 migrated correctly, then we are pretty confident that article #456 will look fine too, because the data is in a consistent format.

Clearly, this is an optimistic assumption and it doesn't always hold up. On a recent project, the source data is anything but consistent, so while a manual review of one migrated item might look correct, any number of other items could have problems. Adding to the challenge is that this is not a one-time migration, but a regular, nightly import of data, so we can't afford to have errors from the migration showing up in production.

In this post, we'll look at how we can implement a testing framework for ongoing imports of data into Drupal using some of our favorite tools: CasperJS, Migrate module, and Jenkins.

The Scenario

We have a Drupal 7 site where about 90% of the content is pulled from an external data source once per day. The external data source is itself supplying data based on an import from a CRM. So, the workflow is something like: the client edits content in a CRM, that data gets exported to another server as XML, then that server provides a SOAP API for our Drupal site to pull down XML. Finally, for reasons that I won't go into here, we need to convert the XML to CSV format.

There are a couple of points where data can get munged on its way from the CRM to the Drupal site. More importantly, the CRM data is not always correct to begin with: sometimes invalid cross-references are created between different entities in the CRM. For example, if we are importing Authors and the Publications they have written, we would expect the Publication XML to contain valid Author IDs, and those IDs to point to the correct Authors. That's a nice assumption, but we ran into cases over and over again where it didn't hold. In other cases, fields that should not contain HTML (e.g. fields that map to node titles) sometimes do.

We can't deploy a site where a nightly import could break a lot of the content, so we need a process that keeps problems from ever reaching production.

Validating the migration

Preparation

The first thing we need to do is get the source data into a format that CasperJS can read: JSON. As I mentioned earlier, we import XML from the data source and convert it to CSV for our Migrate module code. This is done by a custom module that grabs the XML from the external source and then converts it to CSV (here's a snippet for anyone interested). To get JSON, we read the CSV back into an array, then write the array out as JSON.
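Our module does this in PHP and isn't reproduced here. Purely as an illustration of the CSV-to-JSON step, here is a minimal JavaScript sketch (the same language as the CasperJS tests). It assumes the first CSV row holds column names and turns each data row into an object keyed by those names; the function names parseCsv and csvToJson are hypothetical.

```javascript
// Minimal CSV-to-JSON sketch (hypothetical helper names).
// Assumes the first row is a header; handles quoted fields that contain
// commas, doubled quotes ("") or line breaks.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (c === '"') {
        inQuotes = false; // closing quote
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ',') {
      row.push(field);
      field = '';
    } else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += c;
    }
  }
  // Flush the last row if the file does not end with a newline.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

function csvToJson(text) {
  const [header, ...body] = parseCsv(text);
  const records = body
    .filter((cells) => !(cells.length === 1 && cells[0] === '')) // skip blank lines
    .map((cells) => {
      const record = {};
      header.forEach((name, idx) => {
        record[name] = idx < cells.length ? cells[idx] : '';
      });
      return record;
    });
  return JSON.stringify(records, null, 2);
}
```

The parser copes with commas, doubled quotes and line breaks inside quoted fields, which tend to appear once free-form CRM text lands in a CSV; a production version would more likely lean on an established CSV library.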

API module for locating migrated data

The second thing we need is an API on our Drupal site that returns content to CasperJS based on the source ID. For example, say we have a Publication with ID 123. After running a migration, we have a Publication node with a node ID of 456. We want CasperJS to be able to find the migrated entity from the source ID, so we can check whether the data was migrated accurately. To do this, I wrote an API module with a hook_menu() entry for returning content. The idea is that I can visit http://example.com/api/publication/123 and the API module will redirect me to http://example.com/node/456. What's more, the API module also sets messages or errors if more than one Drupal entity matches a source ID, or if the item is not found, so CasperJS can run assertions based on those status messages.
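The module itself is PHP, so its code isn't shown. The sketch below only illustrates the lookup decision in JavaScript, assuming a hypothetical sourceIndex that maps each source ID to the node IDs storing that ID; resolveSourceId and the message strings are placeholders, not the module's actual names or output.

```javascript
// Hypothetical sketch of the lookup behind /api/publication/{sourceId}.
// sourceIndex is an assumed Map from source ID (as a string) to the node IDs
// whose remote-ID field holds that value. The real module is PHP and would
// query Drupal instead; the message strings here are placeholders.
function resolveSourceId(sourceIndex, sourceId) {
  const nids = sourceIndex.get(String(sourceId)) || [];

  if (nids.length === 0) {
    // Nothing was migrated for this ID: report an error and don't redirect.
    return {
      redirect: null,
      messages: [],
      errors: [`No content found for source ID ${sourceId}.`],
    };
  }

  const result = { redirect: `/node/${nids[0]}`, messages: ['Success'], errors: [] };
  if (nids.length > 1) {
    // Several entities claim the same source ID: still redirect to the first,
    // but flag the duplicates so a CasperJS assertion can fail on them.
    result.errors.push(`Duplicate content: nodes ${nids.join(', ')} match source ID ${sourceId}.`);
  }
  return result;
}
```

In this sketch, redirecting to the first match while still flagging duplicates keeps the page loadable, so a test can report the duplicate and go on checking fields instead of stopping at a missing page.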

Dynamic migration

Next, we need to write a dynamic migration test in CasperJS.

I recently wrote about using CasperJS, Drush and Jenkins to test Drupal, and I recommend glancing through that post for an example of how a typical test is written: you define some behaviors that run on a predefined set of paths, and then test them.

The problem is that the tests described in that post perform a predefined number of steps. In our case, we have a variable number of items (in fact, thousands of them) to check. Fortunately, CasperJS supports dynamic tests, which is just what we need.

Let's look at some code to implement this.

Validating entities

In this example, we loop through source data containing rows of Publication Authors that are migrated into Drupal as entities. Each Publication Author entity contains references to Publication entities. The test verifies that the publications linked to each publication author in the source data have been migrated as entities and referenced correctly in Drupal. It also verifies that a few Publication Author fields (name, initials, remote ID, etc.) were migrated accurately.

The full test is a bit lengthy, so for this post I'll break down the individual parts.

Kicking things off

At the top of the test script, we have this code:

casper.test.begin('Verifying Publication Authors entity migration', function suite(test) {
    // ...the rest of the test, shown in the snippets below, goes here...
});

This is just telling CasperJS about our test.

Defining variables

Next, we define a few variables we'll use in the test.

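// Load the JSON documents. json_dir and datestamp are defined elsewhere
// in the full test (not shown here).
// One row per publication author.
var json = require(json_dir + '/' + datestamp + '/PUB_AUTHORS2.json');
// Rows linking publication authors to their publications.
var pubAuthorJoin = require(json_dir + '/' + datestamp + '/PUB_AUTHORS.json');
// Index of the source row currently being verified.
var currentRow = 0;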

Running the test

Now it's time to kick off the test! The test starts with a command telling CasperJS to run the check() method:

casper.run(check);

So, what's in check()?

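function check () {
    // Any source rows left to verify?
    if (json[currentRow] && currentRow < json.length) {
        // Queue the navigation steps and assertions for this row...
        verifyData.call(this, json[currentRow]);
        currentRow++;
        // ...then run them, calling check() again when they finish.
        this.run(check);
    } else {
        // Every row has been processed: mark the test suite as complete.
        this.test.done();
    }
}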

json is the variable we defined above; it contains rows of publication author data. This snippet loops through the entire set of publication authors, calling the verifyData() method with each row in turn, and once it runs out of source data it tells CasperJS that the test is complete.
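To make the control flow concrete without a PhantomJS install, here is a self-contained JavaScript sketch. FakeCasper is a hypothetical stand-in that mimics only the two behaviors check() relies on: then() queues a step, and run(onComplete) executes the queued steps and then calls onComplete. The real CasperJS runner is asynchronous; this stand-in is synchronous, so it recurses once per row and is only suitable for small inputs.

```javascript
// Hypothetical synchronous stand-in for CasperJS's step queue.
class FakeCasper {
  constructor() {
    this.steps = [];
  }
  // Queue a step to run on the next run() call.
  then(step) {
    this.steps.push(step);
    return this;
  }
  // Drain the queue, then hand control to the completion callback.
  run(onComplete) {
    while (this.steps.length > 0) {
      this.steps.shift().call(this);
    }
    onComplete.call(this);
  }
}

const rows = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
let currentRow = 0;
const visited = [];
let done = false;

function verifyData(row) {
  // In the real test this would queue page loads and assertions.
  this.then(function () {
    visited.push(row.id);
  });
}

function check() {
  if (currentRow < rows.length) {
    verifyData.call(this, rows[currentRow]); // queue steps for one row
    currentRow++;
    this.run(check); // run them, then come back for the next row
  } else {
    done = true; // stands in for this.test.done()
  }
}

new FakeCasper().run(check);
```

Each pass queues only one row's steps before running them, which is what lets a single test walk a data set of any size.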

Verifying the row

Now for the core of the test, the verifyData() method. I won't go into great detail, since the full gist is commented, but here's the basic overview:

  1. For the current publication author that verifyData() is looking at, grab a list of publication IDs from our other source data (var pubAuthorJoin).
  2. Visit our site's API (/api/publication-author/%) to load the publication author entity in Drupal. Note that if response.status isn't 200, we retry a few times (waiting 5 seconds each time) to make sure that there's an actual problem with loading the entity, as opposed to a hiccup on the server that didn't return the page when CasperJS requested it.
  3. Run the assertions. In this example we are verifying that the last name, initials, author ID, and external ID were migrated correctly. We also loop through the pubsForAuthor variable (from step #1) and make sure that all the publications that should be linked with this author have been migrated and referenced correctly in Drupal.
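Here is a hedged JavaScript sketch of the logic behind steps 1 to 3, with the browser parts stripped out so it runs on its own. The join-row column names (AUTHOR_ID, PUB_ID), the three-attempt limit, and all function names are assumptions; in the real test, loading the page and waiting would go through CasperJS steps rather than the injected open and wait callbacks used here.

```javascript
// Hypothetical sketch of three pieces of verifyData().
// Assumed join-row shape: { AUTHOR_ID: '2751', PUB_ID: '10828' };
// the real column names in PUB_AUTHORS.json may differ.

// Step 1: collect the publication IDs the source data links to an author.
function pubsForAuthor(joinRows, authorId) {
  return joinRows
    .filter((row) => row.AUTHOR_ID === String(authorId))
    .map((row) => row.PUB_ID);
}

// Step 2: retry a page load before treating it as a real failure.
// open() returns an HTTP status; wait(ms) pauses between attempts.
// Both are injected so the policy can be exercised without a browser.
function openWithRetry(open, wait, maxAttempts = 3, delayMs = 5000) {
  let status = open();
  let attempts = 1;
  while (status !== 200 && attempts < maxAttempts) {
    wait(delayMs);
    status = open();
    attempts++;
  }
  return { status, attempts };
}

// Step 3 (partial): list expected publications that are missing from the
// entity references found on the Drupal page.
function missingPubLinks(expectedPubIds, linkedPubIds) {
  const linked = new Set(linkedPubIds.map(String));
  return expectedPubIds.filter((id) => !linked.has(String(id)));
}
```

Returning the missing IDs, rather than a single pass/fail flag, is what makes per-publication lines like "Pub ID 2516 is linked." possible in the report.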

Success

And that's it! Here's what some of the output from the test looks like:

# Checking content returned by /api/publication-author/2751
PASS Success message present
PASS No duplicate content found.
PASS Titles match
PASS Last name is present.
PASS Initials present.
PASS Publication Author ID exists.
PASS Pub ID 10828 is linked.
PASS Pub ID 2516 is linked.

And if there are failures, we'll get a log stating which assertion failed, the relevant ID associated with the failure, and so on. You can even configure CasperJS to take a screenshot when there is a failure to help debug what is going on.

More complex tests

Each content migration for Drupal has its own set of CasperJS tests. Some are much more straightforward than the one above, while others are quite complex. For example, a staff member page has views showing the staff member's publications, the services they provide, and so on, and we have CasperJS check all of those links to make sure the cross-references are accurate. There aren't really many limits on this, other than the time your tests take to run.

Implementing the testing and deployment process with Jenkins

A continuous integration and delivery system is essential to making this all work well. (For an overview of Jenkins, check out this Slideshare.) Here's the general order of operations for the work Jenkins does:

  1. Rebuild @stage by importing the DB from @production, disabling a couple of modules and tweaking a few settings in the process
  2. Run an import of the third-party data source XML and store it on the server
  3. Run a Drush script that does an initial validation pass over the source data: some basic checks can be done well before we run the content migrations, e.g. checking each file for required fields, unique values, etc.
  4. Kick off a migration on @stage using the latest data set
  5. Run the CasperJS tests (one at a time, so that we can fail quickly if needed) to iterate over the source data and verify that content and fields migrated correctly to Drupal
  6. If all tests pass, initiate a data pull and migration on @prod
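The Drush script isn't shown (and would be PHP). As a rough JavaScript sketch of the kind of checks step 3 describes, the function below reports missing required fields and duplicate values, and also flags the two problems described in The Scenario: HTML in fields that map to titles, and IDs that don't match any known entity. The column names, the validateRows name, and the shape of the rules object are hypothetical.

```javascript
// Hypothetical sketch of pre-migration checks on source rows.
// rules: { required: [cols], unique: [cols], noHtml: [cols],
//          references: { col: Set of valid IDs } }
function validateRows(rows, rules) {
  const problems = [];
  const seen = new Map(); // column -> Set of values seen so far

  rows.forEach((row, index) => {
    const line = index + 1; // 1-based data row number for the report

    for (const col of rules.required || []) {
      if (row[col] === undefined || String(row[col]).trim() === '') {
        problems.push(`Row ${line}: missing required field ${col}`);
      }
    }

    for (const col of rules.unique || []) {
      if (!seen.has(col)) seen.set(col, new Set());
      const value = row[col];
      if (value !== undefined && value !== '') {
        if (seen.get(col).has(value)) {
          problems.push(`Row ${line}: duplicate ${col} "${value}"`);
        }
        seen.get(col).add(value);
      }
    }

    for (const col of rules.noHtml || []) {
      // Crude tag detection: anything shaped like <tag ...> or </tag>.
      if (/<\/?[a-z][^>]*>/i.test(String(row[col] || ''))) {
        problems.push(`Row ${line}: HTML found in ${col}`);
      }
    }

    for (const [col, validIds] of Object.entries(rules.references || {})) {
      if (row[col] !== undefined && !validIds.has(row[col])) {
        problems.push(`Row ${line}: ${col} "${row[col]}" does not match any known ID`);
      }
    }
  });

  return problems;
}
```

Collecting every problem instead of stopping at the first one means a single run can report all the bad rows, which matches the goal of knowing exactly which source rows failed.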

If there are errors, we get a report stating exactly which rows of the source data did not pass the tests, and Jenkins stores a copy of the imported content, so we can quickly locate the source of any issues. This also gives us a framework for testing changes to the migration code: if we need to adjust a field mapping, or add a new migration that ties into the existing ones, we can run the tests against the new code and see if there are any issues.

The whole process takes two to three hours. While that is time-consuming, the end result is that we can run drush migrate-import in production with confidence, in an automated way, without worrying about data corruption or failures.

So, that's the gist of how we are validating our Drupal migrations and preventing problems from reaching production. If you have questions or suggestions please let us know in the comments below.


Khanh Hoang - Kenn

Kenn is a user experience designer and front end developer who enjoys creating beautiful and usable web and mobile experiences.
