Summary: A collection of UTF-8 related tips and fixes
Description
A collection of UTF-8 related tips and fixes for PmWiki
- Please add your comments/questions about a section at the end of that section (to keep them together).
- Feel free to contribute with other tips or propose improvements to the existing ones.
How to enable UTF-8 support in PmWiki
Note that the 2.2 beta versions of PmWiki have had much better Unicode support for more than a year now. Using a recent beta is highly recommended for an international UTF-8 wiki.
Inside your (farm)config.php add the following line:
include_once($FarmD.'/scripts/xlpage-utf-8.php');
It is recommended to enable UTF-8 for the entire site, since this encoding allows any language or alphabet to be used, rather than for some Groups only, because in that case cross-group links may not work properly.
Internationalization and XLPage
Add in (farm)config.php:
include_once($FarmD.'/scripts/xlpage-utf-8.php');
XLPage('fr','Site.XLPage-fr');
See PmWiki.Internationalizations. Basically, you create an "XLPage" (eXtra Language page) containing the strings to translate (Edit - History - Print - Recent Changes - Search). You can use PmWiki.XLPageTemplate as a base for your translations. Copy this page, for example as Site.XLPage-fr, then fill in the translations.
You can use some ready translations from other PmWiki users, see PmWiki.OtherLanguages.
Note that you should first add the above lines to config.php, then open the selected page (in our example "Site.XLPage-fr") for editing and paste the translated page there. The reason is that some currently available translations are in another encoding, e.g. iso8859-1; if you just download the i18n.zip archive and copy the pages, some characters may not work.
To properly display dates, numbers and other localized content, make sure you set the correct locale, with the .UTF-8 suffix, in your Site/XLPage-fr. Example:
'Locale' => 'fr_FR.UTF-8',
Otherwise some of the accented letters may disappear, e.g. for French: février. (On some systems the suffix is not .UTF-8 but .utf8; try both.)
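To find out which spelling of the locale name your server accepts, a small standalone PHP script can ask the system directly. This is only a sketch: `first_available_locale()` is a hypothetical helper name, not a PmWiki function, and the script is meant to be run on its own (for example from the command line), not placed in config.php.

```php
<?php
// Hypothetical helper (not part of PmWiki): returns the first locale name
// from $candidates that this system accepts, or false if none is installed.
function first_available_locale(array $candidates) {
    $previous = setlocale(LC_TIME, '0');       // '0' only queries the current locale
    $found = setlocale(LC_TIME, $candidates);  // tries each name in order
    setlocale(LC_TIME, $previous);             // restore the earlier setting
    return $found;
}

// Print the spelling to use in the 'Locale' entry, if any is available.
$locale = first_available_locale(array('fr_FR.UTF-8', 'fr_FR.utf8'));
echo $locale === false ? "No French UTF-8 locale installed\n" : "Use: $locale\n";
```

Whatever name the script reports is the value to try in the 'Locale' line of your XLPage. If it reports that no locale is installed, the locale has to be generated on the server first.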
Broken {$Namespaced}, {$Titlespaced}, {$Groupspaced}
This was fixed in PmWiki version 2.2.0-beta30 (2007-02-09), but if you cannot upgrade, or wish to keep the latest stable version, take a look here:
Using a different PageStore object (PerGroupSubDirectories...)
If you are using an alternative page storing format/function/filename (examples: SQLite, PerGroupSubDirectories, CompressedPageStore), you must place the include_once($FarmD.'/scripts/xlpage-utf-8.php'); and XLPage() calls after the declaration of the alternative PageStore object (new PageStore() or include_once(recipe)).
Page names not properly resolved (pages disappear, titles break...)
Any call to ResolvePageName() must be made after the include_once($FarmD.'/scripts/xlpage-utf-8.php'); and XLPage() calls. This function may be called by you in config.php and also by some recipes, so you should include any recipes after the xlpage-utf-8.php include and the XLPage() call.
Order: In the best case, you should
- first declare the PageStore object (or recipe that declares it),
- next set internationalizations (xlpage-utf-8.php and XLPage()) and
- then all other recipes.
This tip will save you days of headaches wondering why your pages disappear and titles break!
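The recommended order can be sketched as a config.php fragment. The PageStore path below follows the PerGroupSubDirectories pattern and is an assumption about your setup, and the recipe filename is hypothetical; substitute whatever your wiki actually uses.

```php
<?php if (!defined('PmWiki')) exit();

# 1. Declare the alternative PageStore first (assumed example:
#    one subdirectory per group, as in PerGroupSubDirectories).
$WikiDir = new PageStore('wiki.d/$Group/$FullName');

# 2. Then set up internationalization.
include_once($FarmD.'/scripts/xlpage-utf-8.php');
XLPage('fr','Site.XLPage-fr');

# 3. Only now include other recipes and resolve page names.
include_once($FarmD.'/cookbook/some-recipe.php');  # hypothetical recipe file
$pagename = ResolvePageName($pagename);
```

If you use the default page store, step 1 is simply omitted; the point is that steps 2 and 3 never swap places.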
Other comments
UTF8 should be enabled when installing PmWiki, to avoid page content problems. This is not said in the install instructions AFAIK - jdd
If you have problems with old pages written in another encoding after enabling UTF-8, use the program 'recode' on Debian Linux distributions to migrate the pages. The same applies to the i18n packages from PmWiki.
jesus2099 (2008-10-13): This is not enough. You still have to save your local/config.php file in UTF-8 without BOM (if it contains strings like titles etc.) and to add the following line to the same config file (Cookbook.ContentType):
$HTTPHeaders[] = 'Content-type: text/html; charset=utf-8;';
This should be $HTTPHeaders['utf-8'] = 'Content-type: text/html; charset=UTF-8';, and it is not needed in the 2.2 beta version. --Petko
Page names didn't look right after redirecting out of an action=edit session, so I added "Charset=$Charset" to the headers, declared the $Charset variable as global, and also included a meta tag in the redirection page HTML to indicate the charset. There are still some problems for me: it looks like my filesystem is running in ISO-8859-1 and KDE in UTF-8, and after a redirection page names still look odd in the browser's URL box. If I remove the "Location:" header field, everything looks right.
function Redirect($pagename, $urlfmt='$PageUrl') {
  # redirect the browser to $pagename
  global $EnableRedirect, $RedirectDelay, $EnableStopWatch, $Charset;
  SDV($RedirectDelay, 0);
  clearstatcache();
  $pageurl = FmtPageName($urlfmt,$pagename);
  if (IsEnabled($EnableRedirect,1) &&
      (!isset($_REQUEST['redirect']) || $_REQUEST['redirect'])) {
    header("Location: $pageurl");
    header("Content-type: text/html;Charset=$Charset");
    echo "<html><head>
      <meta http-equiv='content-type' content='text/html; charset=$Charset' />
      <meta http-equiv='Refresh' Content='$RedirectDelay; URL=$pageurl' />
      <title>Redirect</title></head><body>".$pageurl."</body></html>";
    exit;
  }
  echo "<a href='$pageurl'>Redirect to $pageurl</a>";
  if (@$EnableStopWatch && function_exists('StopWatchHTML'))
    StopWatchHTML($pagename, 1);
  exit;
}
CarlosAB
I feel ugly urls can happen with redirects disabled, but even then, the link should point to the right destination. On most UTF-8 wikis, the 2.2 version should work out of the box, without the need to modify core scripts. --Petko
Another tip, if you are using KDE with OpenBSD: export KDE_UTF8_FILENAMES=1 in your environment, so you will be able to see UTF-8 filenames.
CarlosAB December 22, 2008, at 05:00 PM
See Also
Contributors