i18n in CakePHP 1.2 - database content translation, Part 2

In my opinion, translating database content with the core CakePHP ‘translate’ behavior is painful when it comes to HABTM relations and bulk record manipulation. It’s no secret that this is also painful for Java ORM engines such as Hibernate.
So the situation calls for a simple solution that just works and meets the following requirements.

  1. Users want to add, delete and amend classifiers in every language the application is available in. This means the default.po file is not our friend any more, and neither is the magic __() function. Of course it’s possible to modify the PO files, but why would we when there is a database to store this kind of data?
  2. Database content localization should not complicate bulk record manipulation with subqueries.
  3. Localizing single-record CRUD should be simple for the developer. The perfect solution: change nothing.
  4. There are legacy database fields that should remain unchanged, to keep current functionality up and running after the localization functionality is deployed. These field values should also be used as defaults when there is no translation.
  5. The development contract defines which languages the application must support, so we know them from the beginning. If the user’s locale does not match a supported locale, the application should fall back to the default locale.

In a couple of minutes I came up with at least three approaches to storing the data. For each one I looked at how it handles bulk manipulations and single-record CRUD.

1. Store localized data in a separate table, as the CakePHP translate behavior does (i18n table).
   Bulk manipulations: can be done at the programming-language level with loops. Direct SQL queries need an additional join to the table with localized data. The localized data field has to be a memo (text) type for all kinds of data.
   Single CRUD: model object interceptors can amend the SQL to read data from the external table before the CRUD request, and manipulate data after the request to set the localized values. In the CakePHP implementation I found that if there is no translated content in the i18n table, no record is selected at all.

2. Store all data in one field in some XML or separator-delimited format, e.g. English String; Russian String.
   Bulk manipulations: because all data is in one field, it’s hard to manipulate localized values with SQL. There is also an overhead of unnecessary data loading that is hard to fix. For example, you have 28 languages and the user wants to see data in the selected locale. The other 27 localized values just waste memory.
   Single CRUD: on read we have to fetch the whole field value and pick the value for the current locale in code. On save we have to do the reverse.

3. Store localized data in the same table: for each localizable field create fields for the default value and the localized values, e.g. create fields with _eng and _rus suffixes and map _eng as the default.
   Bulk manipulations: simple, because you can access a localized value by its SQL column name without joining an external table, and SQL queries are easy to localize with a text parser.
   Single CRUD: requires intercepting the request before a) the read operation, to select only localized values and localize SQL field names; b) the save operation, to write content to the field of the current locale. If you need to read values for more than one language, you can either a) select multiple columns, passing an array of locales to read and the locale to use as default; or b) read the same model object multiple times, once per selected locale.
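
To illustrate the “text parser” remark in approach 3, here is a minimal sketch of localizing a hand-written SQL string by rewriting column names. The function name localizeSql and the sample query are hypothetical; this is not part of CakePHP or of the behavior in this post, and it naively rewrites matches inside string literals too.

```php
<?php
// Hypothetical helper: rewrite logical column names (e.g. "name") to the
// locale-specific column (e.g. "name_rus") in a raw SQL string.
function localizeSql($sql, array $fields, $locale)
{
    foreach ($fields as $field) {
        // Match the field only as a whole identifier, so that "name"
        // does not touch "name_eng" or "username".
        $pattern = '/(?<![A-Za-z0-9_])' . preg_quote($field, '/') . '(?![A-Za-z0-9_])/';
        $sql = preg_replace($pattern, $field . '_' . $locale, $sql);
    }
    return $sql;
}

// A bulk update written against the logical column name.
$sql = "UPDATE countries SET name = UPPER(name) WHERE name LIKE 'A%'";
echo localizeSql($sql, array('name'), 'rus'), "\n";
```

With approach 1 the same bulk update would need a join or a subquery against the translation table, which is exactly what requirement 2 tries to avoid.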

I’ve selected approach 3 because the second one looks bad to me, and I already had experience with the first one. Maybe there are other solutions, but I feel comfortable with the third one: it looks simple to me and can meet all the requirements. Let’s go deeper into implementing the third approach for CakePHP…

The i18n model behavior

A weekend has passed, and I have finally finished the i18n behavior prototype. Here is the code:

<?php
/* SVN FILE: $Id: $ */
/**
 * Requires:
 * CakePHP 1.2.1.8004
 *
 * I18n behavior for database content internationalization using locale dependent table field names.  
 *
 * I18n behavior integration steps:
 * 1. Identify which languages you are going to use 
 *	(e.g. English and Russian)
 * 2. Identify your default language 
 *	(e.g. English);
 * 3. Identify fields of your models to be internationalized
 *	(e.g. model Country field 'name' should be i18n compatible);
 * 4. Update your database tables for each model field to be i18n compatible 
 *	(e.g. rename the 'name' field to 'name_'.DEFAULT_LANGUAGE, the default, and create a 'name_rus' field for Russian content); 
 * 5. Add this behavior to your model;
 *	(e.g. var $actsAs = array('i18n' => array('fields' => array('name')));) 
 * 6. Add this behavior to all models that are associated with i18n compatible models;
 *	(e.g. var $actsAs = array('i18n'); //you can simply add this to each model )
 *	It's necessary because beforeFind and afterFind are invoked for the behavior of the model that calls the find method. 
 *	During beforeFind and afterFind the behavior looks for any i18n behaviors, see _localizeScheme and _unlocalizeResults.
 * 7. In your model you can set $displayField as usual. The i18n behavior will unlocalize result field names in afterFind. Default $displayField is 'name'.
 * 8. In your model you can set $order as usual. The i18n behavior will localize your order field name in beforeFind.
 * 9. In your relations you can set order attribute for one field and it will be localized.
 * 10. To save multiple locales pass data with database field names.
 *  (e.g. 'name_rus', 'name_eng');
 * 11. To save data to the current locale, pass data without the locale suffix.
 *  (e.g. 'name' will be saved to 'name_eng' if current locale is 'eng');
 * 12. To load values for all locales detach the i18n behavior before calling model read.
 * (e.g. $this->MyModel->Behaviors->detach('i18n'); $this->MyModel->read();)
 * 13. i18n can be used with the Containable behavior, but because it relies on recursion while searching for localizable 
 * fields through relations, check that you have a sufficient recursion level (default recursion=1);
 *
 * PHP version 5 (the code uses a static property and foreach by reference)
 *
 * Copyright 2008, Palivoda IT Solutions, Inc.
 *
 * Licensed under The MIT License
 * Redistributions of files must retain the above copyright notice.
 *
 * @filesource
 * @copyright		Copyright 2008, Palivoda IT Solutions, Inc.
 * @link			http://www.palivoda.eu
 * @package		app
 * @subpackage		app.models.behaviors
 * @since			CakePHP(tm) v 1.2
 * @version			$Revision:  $
 * @modifiedby		$LastChangedBy:  $
 * @lastmodified		$Date: $
 * @license			http://www.opensource.org/licenses/mit-license.php The MIT License
 */
class I18nBehavior extends ModelBehavior {
 
	//for each model, stores localizable field names and their aliases for the current locale
	var $fields = array();
 
	/** 
	 * Reads configuration of behavior.
	 * Allowed values:
	 * fields - array of i18n compatible field names;
	 */
	function setup(&$model, $config = array()) {
		if (!defined('DEFAULT_LANGUAGE')) {
			trigger_error("Add to bootstrap.php line: define('DEFAULT_LANGUAGE', 'eng');");
		}
		if (!empty($config['fields'])) {
			$this->fields[$model->alias] = array_fill_keys($config['fields'], null);
		}
	}
 
	function cleanup(&$model) {
		$this->_refreshSchema($model);
		//debug('I18n behaviour detached from '.$model->alias.' model.');
	}
 
	function beforeFind(&$model, &$query) {
 
		$locale = $this->_getLocale($model);
		//debug('i18n-'.$model->alias.'-beforeFind-'.$locale);
		//debug($query);
 
		//reset schema if the model locale is set and has changed since the last query
		if (isset($model->locale) && $locale != $model->locale) $this->_refreshSchema($model);
 
		$recursive = empty($query['recursive']) ? 
			(empty($model->recursive) ? 0 : $model->recursive) 
				: $query['recursive']; //during 'delete' there are queries with empty recursive
 
		$this->_localizeScheme($model, $locale, $recursive);
		$this->_localizeQuery($model, $query, $recursive, true);
 
		//debug($query);
		return $query;
	}
 
	//Recursively replaces $localField values to $localAlias in $section array (or string)
	function __localizeArrayInQuery(&$model, &$section, $localField, $localAlias, $isPrimary, &$level) {
 
		if ($level <= 0) return; //restrict recursion level
 
		//multiple fields as an array
		if (is_array($section)) {
 
			//localize array values 
			foreach($section as $queryAlias => &$queryField) {
				if (is_array($queryField)) {
					//for containable [model] => array('fields'=>array(...)), all sub calls will localize by short name too
					if ($queryAlias == $model->alias) $isPrimary = true;
					//localize array values in sub section (like contain, order)
					$this->__localizeArrayInQuery($model, $queryField, $localField, $localAlias, $isPrimary, $level);
				}
				else {
					//full name
					if (preg_match('/(^|,| )('.$model->alias.'.'.$localField.')(,| |$)/i', $queryField))
						$queryField = preg_replace('/(^|,| )('.$model->alias.'.'.$localField.')(,| |$)/i', 
							'$1'.$model->alias.'.'.$localAlias.'$3', $queryField);
					//short name
					else if ($isPrimary && preg_match('/(^|,| )('.$localField.')(,| |$)/i', $queryField))
						$queryField = preg_replace('/(^|,| )('.$localField.')(,| |$)/i', 
							'$1'.$localAlias.'$3', $queryField);
				}
			}
 
			//localize array keys
			$oldKeys = array();
			foreach($section as $queryAlias => &$queryField) {
				//full name
				if (preg_match('/(^|,| )('.$model->alias.'.'.$localField.')(,| |$)/i', $queryAlias)) {
					$newKey = preg_replace('/(^|,| )('.$model->alias.'.'.$localField.')(,| |$)/i', 
							'$1'.$model->alias.'.'.$localAlias.'$3', $queryAlias);
					$section[$newKey] = $queryField;
					$oldKeys[] = $queryAlias;
					//debug($queryAlias.''.$newKey);
				}
				//short name
				else if ($isPrimary && preg_match('/(^|,| )('.$localField.')(,| |$)/i', $queryAlias)) {
					$newKey = preg_replace('/(^|,| )('.$localField.')(,| |$)/i', 
						'$1'.$localAlias.'$3', $queryAlias);
					$section[$newKey] = $queryField;
					$oldKeys[] = $queryAlias;
					//debug($queryAlias.''.$newKey);
				}
			}
			foreach($oldKeys as $removeKey) {
				unset($section[$removeKey]);
			}
 
			unset($queryAlias); unset($queryField); unset($section);
		}
		//multiple fields in one string, comma separated
		else {
			//full name
			if (strstr($section, $model->alias.'.'.$localField) != false)
				$section = str_replace($model->alias.'.'.$localField, $model->alias.'.'.$localAlias, $section);
			//short name
			else if ($isPrimary && strstr($section, $localField) != false)
				$section = str_replace($localField, $localAlias, $section);
		}
 
	}
 
	/**
	* Modifies query fields to load localized content for the current locale.
	* isPrimary should be true only when localizing the model that receives the afterFind event
	*/
	function _localizeQuery(&$model, &$query, $recursive, $isPrimary) {
 
		if (isset($model->Behaviors->i18n) && isset($model->Behaviors->i18n->fields[$model->alias])) {
			foreach($model->Behaviors->i18n->fields[$model->alias] as $localField => $localAlias) { //$localAlias set by _localizeScheme
 
				//localize field names in query sections:
				//1. fields - localize full and short array values
				//2. contain - localize full array values
				//3. conditions - localize array keys, localize array values
				//4. order - localize array values as comma separated string
				foreach(array('fields', 'contain', 'conditions', 'order') as $section) {
					if (isset($query[$section])) {
						$level = 3; //recursion level for __localizeArrayInQuery only
						$this->__localizeArrayInQuery($model, $query[$section], $localField, $localAlias, $isPrimary, $level);
					}
				}
 
				//on primary model append default display name to query if not exists
				if ($isPrimary && 
					is_array($query['fields']) &&
					$model->displayField == $localField &&
					!in_array($model->alias.'.'.$localAlias,  $query['fields']) &&
					!in_array($localAlias,  $query['fields']) ) {
						//keep only one Id column in query
						$query['fields'] = array_values(array_unique($query['fields']));
						$query['fields'][] = $model->alias.'.'.$localAlias;
						//set displayField for the list type of query
						$query['list']['valuePath'] = '{n}.'.$model->alias.'.'.$localField; 
 
				}
			}
		}
 
		//if no recursive set then localize fields of related models
		if (empty($recursive)) $recursive = 0;
 
		if ($recursive < 0) return;
 
		//go through related models and, if they have the i18n behavior, localize them
		//Note: models A-B-C, if B is not i18n then C will not be localized, even if it has i18n behaviour
 
		foreach(array('belongsTo','hasOne','hasMany','hasAndBelongsToMany') as $relationGroup) {
			if (isset($model->$relationGroup)) {
				foreach ($model->$relationGroup as $name => &$relation) {
					if (isset($model->Behaviors->i18n)) {
						$model->Behaviors->i18n->_localizeQuery($model->$name, $query, $recursive-1, false);
					}
				}
			}
		}
 
	}
 
	/**
	* Modifies the schema to load localized content only for the default and the current locale.
	*/
	function _localizeScheme(&$model, $locale, $recursive, &$relation = null) {
 
		$model->locale = $locale;
 
		if (isset($model->Behaviors->i18n) && isset($model->Behaviors->i18n->fields[$model->alias])) {
			foreach($model->Behaviors->i18n->fields[$model->alias] as $configName => &$configAlias) {
 
				//amend schema and store the localized field name in config: <field>_<locale> or <field>_<DEFAULT_LANGUAGE>
				$foundSpecific = false;
				foreach($model->_schema as $shemaName => $v) {
					if (strpos('_'.$shemaName, $configName) == 1) { //is one of i18n fields
						if ($configName.'_'.DEFAULT_LANGUAGE != $shemaName) { //not for default locale
							if ($configName.'_'.$locale != $shemaName) { //not for current locale
								unset($model->_schema[$shemaName]);
							}
							else {
								$foundSpecific = true;
								$configAlias = $configName.'_'.$locale;
							}
						}
					}
				}
				unset($shemaName); unset($v);
				if ($foundSpecific) { //found locale specific content, no need in default content
					unset($model->_schema[$configName.'_'.DEFAULT_LANGUAGE]);
				}
				else {
					$configAlias = $configName.'_'.DEFAULT_LANGUAGE;
				}
 
				//set default display field to i18n name or title
				if (empty($model->displayField) || $model->displayField == 'id') {
					if (isset($this->fields[$model->alias]['name'])) {
						$model->displayField = 'name';
					}
					if (isset($this->fields[$model->alias]['title'])) {
						$model->displayField = 'title';
					}
				}
 
				//localize relations
				if (isset($relation)) {
 
					// localize other relation attributes: 'conditions', 'fields', 'order', //TODO: 'finderQuery', 'deleteQuery', 'insertQuery'.
					$sections = array(&$relation['fields'], &$relation['order'], &$relation['conditions']);
					foreach ($sections as &$section) {
						//do not localize more than once
						if (isset($section)) {
							if (is_array($section)) {
								foreach ($section as &$subSection) {
									if (substr_count($subSection, $configAlias) == 0)
										$subSection = str_replace($configName, $configAlias, $subSection);
								}
							} 
							else { 
								if (strlen($section) > 0 && substr_count($section, $configAlias) == 0)
									$section = str_replace($configName, $configAlias, $section);
							}
						}
					}
 
				}
 
 
			}
		}
 
 
		//if no recursive set then update schema of related models
		if (empty($recursive)) $recursive = 0;
 
		if ($recursive < 0) return;
 
		//go through related models and, if they have the i18n behavior, localize them
		//Note: models A-B-C, if B is not i18n then C will not be localized, even if it has i18n behaviour
 
		foreach(array('belongsTo','hasOne','hasMany','hasAndBelongsToMany') as $relationGroup) {
			if (isset($model->$relationGroup)) {
				foreach ($model->$relationGroup as $name => &$relation) {
					if (isset($model->Behaviors->i18n)) {
						$model->Behaviors->i18n->_localizeScheme($model->$name, $locale, $recursive-1, $relation);
					}
				}
			}
		}
 
	}
 
	function afterFind(&$model, &$results, &$primary) {
		//debug('i18n-'.$model->alias.'-afterFind');
		if (is_array($results)) {
			foreach ($results as &$result) {
				$this->_unlocalizeResults($model, $result, $this->_getLocale($model));
			}
		}
		return $results;
	}
 
	/**
	* Renames fields of loaded data to locale independent names, e.g. fields <field>_def and <field>_eng become just <field>.
	* It recurses as deep as the results go. If you ran find with recursive 2 then it recurses down to the second level of results.
	* TODO: The reverse process should be made before model saved.
	*/
	function _unlocalizeResults(&$model, &$result, &$locale) {
 
		if (isset($model->Behaviors->i18n) && isset($model->Behaviors->i18n->fields[$model->alias])) {
 
			//collection of models
			if (!empty($result[$model->alias])) {
				$data = &$result[$model->alias];
			}
			//single model
			else {
				$data = &$result;
			}
 
			foreach($model->Behaviors->i18n->fields[$model->alias] as $name => $alias) { //alias set in _localizeScheme
				//unlocalize field name
				if (is_array($data) && array_key_exists($alias, $data)) {
					$data[$name] = $data[$alias];
					unset($data[$alias]);
				}
			}
 
			unset($data);
		}
 
		if (isset($model->belongsTo)) {
			foreach ($model->belongsTo as $name => $relation) {
				$behaviors = $model->$name->Behaviors;
				if (isset($result[$name]) && isset($model->Behaviors->i18n)) {
					$model->Behaviors->i18n->_unlocalizeResults($model->$name, $result[$name], $locale);
				}
			}
		}
 
		if (isset($model->hasOne)) {
			foreach ($model->hasOne as $name => $relation) {
				$behaviors = $model->$name->Behaviors;
				if (isset($result[$name]) && isset($model->Behaviors->i18n)) {
					$model->Behaviors->i18n->_unlocalizeResults($model->$name, $result[$name], $locale);
				}
			}
		}
 
		if (isset($model->hasMany)) {
			foreach ($model->hasMany as $name => $relation) {
				$behaviors = $model->$name->Behaviors;
				if (isset($result[$name]) && isset($model->Behaviors->i18n)) {
					foreach ($result[$name] as &$record) {
						$model->Behaviors->i18n->_unlocalizeResults($model->$name, $record, $locale);
					}
				}
			}
		}
 
		if (isset($model->hasAndBelongsToMany)) {
			foreach ($model->hasAndBelongsToMany as $name => $relation) {
				$behaviors = $model->$name->Behaviors;
				if (isset($result[$name]) && isset($model->Behaviors->i18n)) {
					foreach ($result[$name] as &$record) {
						$model->Behaviors->i18n->_unlocalizeResults($model->$name, $record, $locale);
					}
				}
			}
		}
 
	}
 
	function beforeSave(&$model) {
 
		//get current locale
		$locale = $this->_getLocale($model);
 
		//if the user is saving unlocalized values then reset the schema and do not localize any value
		foreach($this->fields as $modelAlias => $modelFields){
			foreach($modelFields as $fieldName => $fieldAlias){
				if(isset($model->data[$modelAlias][$fieldAlias])) {
					$this->_refreshSchema($model);
					return true; //exit
				}
			}
		}
 
		//save localized value to alias database field
		foreach($this->fields as $modelAlias => $modelFields){
			foreach($modelFields as $fieldName => $fieldAlias){
				if(!empty($model->data[$modelAlias][$fieldName])){				
					$model->data[$modelAlias][$fieldAlias] = $model->data[$modelAlias][$fieldName];
					unset($model->data[$modelAlias][$fieldName]);
				}
			}
		}
		//debug($model->data);
 
		return true;
	}	
 
	public static $_i18n = null;
 
	function _getLocale(&$model) {
 
		//instantiate current locale storage class
		if (self::$_i18n == null) {
			if (!class_exists('I18n')) {
				uses('i18n');
			}
			self::$_i18n =& I18n::getInstance();
		}
 
		//retrieve current locale
		$locale = self::$_i18n->l10n->locale;
		//debug($model->alias.' get locale '.$locale);
 
		return $locale;
	}
 
	function _refreshSchema(&$model) {
		$model->_schema = null;
		$model->schema();
		//debug($model->alias.' schema renewed');
	}
 
}
 
?>
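
Steps 10 and 11 of the integration notes can be tried outside CakePHP with a small standalone sketch. It mirrors the beforeSave logic above but is not the behavior itself: the function name routeSaveData and the plain-array input are assumptions made for illustration.

```php
<?php
// Hypothetical standalone version of the beforeSave idea.
// $aliases maps logical field => column of the current locale,
// e.g. array('name' => 'name_rus').
function routeSaveData(array $data, array $aliases)
{
    // Step 10: the caller already passed real column names, so leave the data alone.
    foreach ($aliases as $field => $column) {
        if (isset($data[$column])) {
            return $data;
        }
    }
    // Step 11: move each non-empty logical field to its locale-specific column.
    foreach ($aliases as $field => $column) {
        if (!empty($data[$field])) {
            $data[$column] = $data[$field];
            unset($data[$field]);
        }
    }
    return $data;
}

// Current locale is Russian, so 'name' is routed to 'name_rus'.
print_r(routeSaveData(array('id' => 2, 'name' => 'Австрия'), array('name' => 'name_rus')));
```

Like the behavior, the sketch uses !empty(), so an empty string or '0' passed under the logical name is not routed to the locale column.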

 

Localized find operation

Let’s imagine we have the following table in the database, with fields for localized content in English and Russian:

CREATE TABLE IF NOT EXISTS `countries` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name_eng` varchar(64) character SET utf8 collate utf8_unicode_ci NOT NULL,
  `name_rus` varchar(64) character SET utf8 collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`id`)
);
 
-- and here are some sample records
INSERT INTO `countries` (`id`, `name_eng`, `name_rus`) VALUES
(1, 'Australia', 'Австралия'),
(2, 'Austria', 'Австрия');

Next we have the model class /app/models/country.php:

<?php
class Country extends AppModel {
	var $name = 'Country';
	var $useTable = 'countries';
	var $actsAs   = array('i18n' => array('fields'=>array('name')) ); 
	var $displayField = 'name';
	var $order = 'name';
}
?>

Note that we have i18n in the behaviors list ($actsAs) along with the name of the field we want to localize, “name”. In our example we have only one field, but you can add as many fields to the array as you wish, like

var $actsAs   = array('i18n' => array('fields'=>array('name', 'shortname', 'firstname', 'mydrandmaname')) );

CakePHP uses “name” or “title” as the default display field and the i18n behavior does the same, so we could skip the definition of the $displayField variable, but I will leave it in for the example.
Now that we have our model, we can drop the i18n behavior into /app/models/behaviors/i18n.php (the full source code of the behavior is listed above).
Next we have to define English as our default language in /app/config/bootstrap.php:

define('DEFAULT_LANGUAGE', 'eng');

Fallback to the default language happens when the database field for the specific locale is not found. E.g. the user wants the Lithuanian locale, but the database has the fields name_eng and name_rus and no name_lit field; in that case values from the name_eng field are selected.
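
The fallback rule can be shown in isolation with a minimal sketch. The function name resolveLocalizedColumn is hypothetical; in the behavior the same decision is made inside _localizeScheme while it walks the model schema.

```php
<?php
// Hypothetical helper: pick the column that holds $field for $locale,
// falling back to the default-language column when no locale column exists.
function resolveLocalizedColumn($field, $locale, array $columns, $defaultLanguage)
{
    $wanted = $field . '_' . $locale;
    if (in_array($wanted, $columns, true)) {
        return $wanted;
    }
    return $field . '_' . $defaultLanguage;
}

$columns = array('id', 'name_eng', 'name_rus');
echo resolveLocalizedColumn('name', 'rus', $columns, 'eng'), "\n"; // name_rus
echo resolveLocalizedColumn('name', 'lit', $columns, 'eng'), "\n"; // name_eng
```

Note that the decision depends only on which columns exist, not on whether the stored value is empty.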
Now we are ready to read our localized model from the database, and here is how we do that:

//output localized list of countries
debug($this->Country->find('list'));
//output all countries with all fields, using localized names
debug($this->Country->find('all'));
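
What happens to each row after the query can be sketched without CakePHP. The helper below is a hypothetical, simplified stand-in for _unlocalizeResults: it takes one flat row and the field-to-column map for the current locale, and it ignores related models.

```php
<?php
// Hypothetical helper mirroring the afterFind step: rename locale-specific
// keys back to their logical names in one result row.
function unlocalizeRow(array $row, array $aliases)
{
    foreach ($aliases as $field => $column) {
        if (array_key_exists($column, $row)) {
            $row[$field] = $row[$column];
            unset($row[$column]);
        }
    }
    return $row;
}

// Row as selected for the Russian locale; the caller sees plain 'name'.
$row = array('id' => 2, 'name_rus' => 'Австрия');
print_r(unlocalizeRow($row, array('name' => 'name_rus')));
```

This is why views and controllers can keep using 'name' regardless of the active locale.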

That’s all about find operation localization.

I tried making one of the fields empty, but it does not fall back to the default language. I want to select the default value if the column exists but the value is empty. For example, the user has the Russian locale and name_rus exists, but its values are NULL in the database. Then the app should select values from name_eng.

This would increase the amount of data selected from the database (we would need both the default and the locale-specific fields), and in the afterFind event of the behavior we would have to parse all results to check for empty locale-specific values. That’s really bad from a performance point of view. My idea is to write the default value into the empty locale-specific field during the save operation. E.g. we have name_eng and name_rus, the user has the Russian locale and saves the model; if the value is empty, the value of name_eng is written to name_rus. Note that this functionality is not implemented yet.
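
As a rough sketch of that idea only (it is not part of the behavior), the copy could look like the function below. The name fillEmptyLocaleValues is hypothetical, and the sketch assumes that both the default and the locale-specific columns are present in the data being saved.

```php
<?php
// Hypothetical helper for the idea above: before saving, copy the
// default-language value into an empty locale-specific column, so that
// reads never need to select a second column.
function fillEmptyLocaleValues(array $data, array $fields, $locale, $defaultLanguage)
{
    foreach ($fields as $field) {
        $localized = $field . '_' . $locale;
        $default   = $field . '_' . $defaultLanguage;
        // isset() is false for NULL, so NULL and '' both count as empty.
        $isEmpty = !isset($data[$localized]) || $data[$localized] === '';
        if ($isEmpty && isset($data[$default]) && $data[$default] !== '') {
            $data[$localized] = $data[$default];
        }
    }
    return $data;
}

$data = array('id' => 2, 'name_eng' => 'Austria', 'name_rus' => null);
print_r(fillEmptyLocaleValues($data, array('name'), 'rus', 'eng'));
```

The trade-off is that a copied default can no longer be told apart from a real translation, so later changes to name_eng would not propagate to name_rus.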

I have a one-to-many relationship: a category table that has many products. The Product model has an id field and an i18n ‘name’ field (name_eng and name_zh_tw), and it belongs to category through the category_id field. When I retrieve the category table with recursive set to two, I want to retrieve the product name according to the locale.

What you should keep in mind is that the i18n behavior has to be attached to both:
a) the model that has the localizable fields;
b) the model that performs the find operation.
In your case you have: category (no i18n fields) 1 - M product (i18n fields: name_eng, name_zh_tw).
For this case you should add the i18n behavior:
a) to the product model, to mark the fields that you are going to localize;

var $actsAs   = array('i18n' => array('fields'=>array('name')) );

b) to the category model, so that the i18n behavior can intercept the beforeFind and afterFind calls:

var $actsAs   = array('i18n');

 

What’s next?

As you can see in the comments, the save interception is not implemented yet, and if you try to save a localized model you will get an error message that the field was not found.
I will keep this post up to date and commit updates to the code as I progress in this area. Right now this functionality is enough for me and it works great.
Do not hesitate to post comments and code improvements.
As soon as the code becomes more or less functional and stable enough, it will be introduced to the CakePHP community.
That’s all for now, happy coding. :)