Drupal 8 has a wonderful migration system that generally boils a migration into a particular entity type down to a single YAML file. This file specifies the data source, the output entity type, how to process the input fields, and where in the entity the results get stored.
However, there can be a number of complications. Two in particular are bad or unexpected input data values, and uncertainty about exactly what form the input data (or even the partially processed data) is in: a scalar, an array, a nested array, and so on. Because the migration system doesn’t provide a way to look at what’s going on inside the process pipeline, it can be difficult and frustrating to debug these kinds of issues.
One way to quickly see what’s going on, without having to get into Xdebug and grope around in the code for the various plugins provided by the migrate, migrate_drupal, migrate_plus, and migrate_tools modules, is to create a debug plugin and place it wherever you need it in the process pipeline.
Usually, the custom migration module code will be structured like this:
migration/
├── config
│   └── install
│       ...
│       ├── migrate_plus.migration.node_article.yml
│       ...
├── migration.info.yml
├── README.txt
└── src
    └── Plugin
        └── migrate
            ├── destination
            ├── process
            │   ...
            │   ├── Debug.php
            │   ...
            └── source
We’re going to look at the node_article migration pass for an example. The YAML file could look something like this:
id: node_article
label: Node - Article
migration_group: migration
migration_tags:
  - Drupal 6
source:
  plugin: node
  node_type: blog
  constants:
    type: article
process:
  nid: tnid
  type: constants/type
  langcode:
    plugin: default_value
    source: language
    default_value: "und"
  title: title
  uid: node_uid
  status: status
  created: created
  changed: changed
  promote: promote
  sticky: sticky
  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text' # Filtered HTML "no links"
        3: 'html_text' # HTML
        5: 'plain_text' # PHP code
        6: 'filtered_text' # Filtered HTML
        7: 'plain_text' # Email Filter
    -
      plugin: default_value
      default_value: 'html_text'
  'body/value': body
  'body/summary': teaser
  revision_uid: revision_uid
  revision_log: log
  revision_timestamp: timestamp
destination:
  plugin: entity:node
But suppose it turns out that body/format is not being translated correctly when the node_article migration pass is run. How can this be debugged to find out whether the YAML structure is wrong or the data values are wrong?
Let’s have a look at migration/src/Plugin/migrate/process/Debug.php:
<?php

namespace Drupal\migration\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Debug process pipeline.
 *
 * @MigrateProcessPlugin(
 *   id = "debug"
 * )
 */
class Debug extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // "message" is optional in the YAML, so avoid an undefined index notice.
    $message = isset($this->configuration['message']) ? $this->configuration['message'] : '';
    echo "DEBUG: " . $message . "\n";
    // Dump the value exactly as it arrives at this point in the pipeline.
    print_r(['value' => $value]);
    // Dump the whole row only when "row" is set to a truthy value.
    if (!empty($this->configuration['row'])) {
      print_r(['row' => $row]);
    }
    // Pass the value through unchanged.
    return $value;
  }

}
Debug.php provides a process plugin that can be placed “invisibly” into a process pipeline and gives details about what’s happening inside it. The transform method for all process plugins receives the $value being processed and the $row of input values, among other things. It should return the value that results from the plugin’s processing.
The ProcessPluginBase object also contains a configuration array, where named parameters that appear in the process pipeline YAML code are stored. For example, the static_map plugin can find the source value in $this->configuration['source'].
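To make the mapping from YAML to configuration concrete, here is a minimal standalone sketch. It writes the static_map step from the migration above as a plain PHP array, which is roughly the shape a plugin would see in $this->configuration (an assumption about the exact keys, not a dump from a real run). The config_get() helper is hypothetical and only illustrates reading a named parameter that may or may not be present.

```php
<?php

// The static_map step from the article, written as the array a plugin
// would roughly see in $this->configuration (assumed shape).
$configuration = [
  'plugin' => 'static_map',
  'source' => 'format',
  'map' => [
    2 => 'filtered_text',
    3 => 'html_text',
    5 => 'plain_text',
    6 => 'filtered_text',
    7 => 'plain_text',
  ],
];

/**
 * Reads a named parameter, falling back to a default when it is absent.
 *
 * Hypothetical helper for illustration; not part of the migrate API.
 */
function config_get(array $configuration, $key, $default = NULL) {
  return array_key_exists($key, $configuration) ? $configuration[$key] : $default;
}

// A parameter that is present in the YAML step.
echo config_get($configuration, 'source') . "\n";
// A parameter that is missing, so the default is used instead.
echo config_get($configuration, 'message', '(no message)') . "\n";
```

Reading optional parameters with a default matters for a debug plugin, since it is usually dropped into a pipeline with as few parameters as possible.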
In this case, the debug plugin returns exactly the value it receives, so it doesn’t change anything. At base, it prints a message parameter and dumps the $value so it can show what actual data is being passed and what format it is in. It has been useful to also be able to look at the $row values. To see them, simply add the parameter row: 1.
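One caveat when dumping values with print_r is that NULL, FALSE, and the empty string all print as nothing, so they cannot be told apart in the output. The following standalone sketch shows one possible way to summarise the type and shape of a value on a single line. The debug_describe() function is a hypothetical helper, not part of the migrate API; something like it could be called from a debug plugin alongside or instead of print_r.

```php
<?php

/**
 * Summarises the type and shape of a pipeline value on one line.
 *
 * Hypothetical helper for illustration; not part of the migrate API.
 */
function debug_describe($value) {
  if (is_array($value)) {
    // Report the element count and whether any element is itself an array.
    $nested = count(array_filter($value, 'is_array')) > 0;
    return sprintf('array(%d)%s', count($value), $nested ? ' nested' : '');
  }
  if (is_scalar($value) || $value === NULL) {
    // var_export() keeps NULL, FALSE and '' distinguishable.
    return gettype($value) . ' ' . var_export($value, TRUE);
  }
  // Objects and resources: the type name alone is enough for a summary.
  return gettype($value);
}

echo debug_describe('html_text') . "\n";          // string 'html_text'
echo debug_describe(NULL) . "\n";                 // NULL NULL
echo debug_describe([['value' => 'x']]) . "\n";   // array(1) nested
```

A one-line summary like this also keeps the output manageable when the migration processes many rows.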
For example, to see what the result of the body/format mapping is before passing it along to the default_value plugin:
  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text' # Filtered HTML "no links"
        3: 'html_text' # HTML
        5: 'plain_text' # PHP code
        6: 'filtered_text' # Filtered HTML
        7: 'plain_text' # Email Filter
    -
      plugin: debug
    -
      plugin: default_value
      default_value: 'html_text'
If it would also be useful to see what the input row values looked like:
  'body/format':
    -
      plugin: static_map
      source: format
      map:
        2: 'filtered_text' # Filtered HTML "no links"
        3: 'html_text' # HTML
        5: 'plain_text' # PHP code
        6: 'filtered_text' # Filtered HTML
        7: 'plain_text' # Email Filter
    -
      plugin: debug
      row: 1
    -
      plugin: default_value
      default_value: 'html_text'
This produces output for each row processed, which can get quite large. If you know which rows are causing problems, the migration can be run with the --idlist="<source keys>" parameter. If not, simply redirect the output into a file and use an editor like vim to hunt through it for the problematic cases.
This idea can be expanded: if there is some logic to the problem, this can be added so that output is only created for rows that create an issue. It’s possible to create more than one debug plugin (with different names, of course) if there are multiple special purpose needs.
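As a rough illustration of that idea, the sketch below only produces output when a value looks suspect, meaning it is not a scalar or is not one of an expected set of strings. It is written as standalone functions so it can be tried outside Drupal; both function names and the notion of an "expected" list are assumptions made for this example, not part of the migrate API.

```php
<?php

/**
 * Decides whether a pipeline value is worth printing.
 *
 * Hypothetical helper for illustration; not part of the migrate API.
 */
function debug_is_suspect($value, array $expected) {
  // Arrays, objects and NULL are unexpected where a text format is wanted.
  if (!is_scalar($value)) {
    return TRUE;
  }
  // Strict comparison on the string form avoids PHP's loose matching.
  return !in_array((string) $value, $expected, TRUE);
}

/**
 * Prints debug output only for suspect values; returns the value unchanged.
 */
function debug_filtered($value, array $expected, $message = '') {
  if (debug_is_suspect($value, $expected)) {
    echo "DEBUG: " . $message . "\n";
    print_r(['value' => $value]);
  }
  return $value;
}

$formats = ['filtered_text', 'html_text', 'plain_text'];
// An expected value passes through silently.
debug_filtered('html_text', $formats, 'body/format');
// An unexpected value is reported.
debug_filtered('full_html', $formats, 'body/format');
```

Inside a process plugin, the body of transform() could call a check like this, with the expected list supplied as an extra (hypothetical) parameter in the YAML step and read from the configuration.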