Summary: Convert an HTML page to PmWiki markup
Version: 2008-10-07
Prerequisites: PmWiki 2.2.0-beta45
Status: beta
Questions answered by this recipe
- How do I convert an HTML page to PmWiki markup?
- How can I migrate a site to PmWiki or import HTML pages?
Description
PmWiki markup cannot express everything HTML can, so a complete conversion is not possible. However, PmWiki can apply replacements to the text while it is being edited or saved, and ConvertHTML uses this to apply a fairly comprehensive set of rules for converting HTML tags to wiki markup.
To install this recipe:
- download convert-html.php to your cookbook directory
- add the following line to your configuration file:
if ($action=='edit') include_once("$FarmD/cookbook/convert-html.php");
What it does
ConvertHTML uses the $ROEPatterns array to translate most HTML tags, leaving the rest intact. All replacements are case-insensitive, and attribute values may be enclosed in single or double quotes or, in some cases, left unquoted. The trailing XHTML slash on an empty tag (as in <br />) is always optional.
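To illustrate the kind of pattern => replacement entries described above, here is a minimal sketch. The three patterns are hypothetical examples written for this page, not copied from convert-html.php, and apply_patterns() is a hypothetical stand-in for the loop in the PmWiki core that applies $ROEPatterns to the page text. It shows case-insensitive matching, an optional trailing XHTML slash, and an attribute value that may be double-quoted, single-quoted or unquoted.

```php
<?php
// Sketch only: these entries illustrate the style of pattern the recipe
// relies on; they are NOT copied from convert-html.php.
$ROEPatterns = array();

// <b>...</b> with no attributes -> '''...''' ("i" = any case, "s" = may span lines)
$ROEPatterns['!<b>(.*?)</b>!is'] = "'''$1'''";

// <hr>, <hr/> and <hr /> -> ---- (the trailing XHTML slash is optional)
$ROEPatterns['!<hr\s*/?>!i'] = '----';

// <meta name="description" content="..."> with double, single or no quotes;
// \1 and \2 require the closing quote to match the opening one.
$ROEPatterns['!<meta\s+name=(["\']?)description\1\s+content=(["\']?)(.*?)\2\s*/?>!i']
    = '(:description $3:)';

// Hypothetical stand-in for the PmWiki core loop that applies each
// pattern => replacement pair to the text of the page being edited.
function apply_patterns(array $patterns, string $text): string {
    foreach ($patterns as $pattern => $replacement) {
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

echo apply_patterns($ROEPatterns, "<META NAME=description content='Demo' />\n<B>bold</B>\n<HR>"), "\n";
```

Running it prints (:description Demo:), '''bold''' and ---- on three lines. The real recipe has many more entries and handles attributes per tag as described below.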
The following tags are converted only if they have no attributes: TITLE, I, EM, B, STRONG, TT, CODE, PRE, BIG, SMALL, SUP, SUB, INS, DEL, HR, BLOCKQUOTE, DD.
The following tags are converted even if they have attributes: P, H1..6, BR, DIV, SPAN, TABLE, TD, UL, OL, LI, DL, DT, IMG. Their attributes are carried over into the corresponding (:...:) directive or %...% ... %% style markup. For the most part, the recipe does not check whether these attributes are valid or effective as PmWiki markup.
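A minimal sketch of carrying attributes over, using DIV as the example. The helper names parse_attributes(), attrs_to_wiki() and convert_div() are hypothetical and not part of the recipe; the sketch assumes attributes are re-emitted verbatim as name="value" pairs inside (:div ...:), with </div> mapped to (:divend:). Like the recipe, it does not check whether the attributes mean anything to PmWiki. It needs PHP 7.2 or later for PREG_UNMATCHED_AS_NULL.

```php
<?php
// Hypothetical helpers, not taken from convert-html.php: they show one way
// to carry HTML attributes over into a (:div ...:) directive.

// Parse an attribute string whose values may be double-, single- or unquoted.
function parse_attributes(string $attrs): array {
    preg_match_all(
        '/([a-z][\w-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i',
        $attrs, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
    );
    $result = array();
    foreach ($matches as $m) {
        // Exactly one of groups 2-4 holds the value; the others are null or absent.
        $result[strtolower($m[1])] = $m[2] ?? $m[3] ?? $m[4] ?? '';
    }
    return $result;
}

// Re-emit the attributes as name="value" pairs (values are copied unchecked).
function attrs_to_wiki(array $attrs): string {
    $parts = array();
    foreach ($attrs as $name => $value) {
        $parts[] = $name . '="' . $value . '"';
    }
    return $parts ? ' ' . implode(' ', $parts) : '';
}

// <div ...> -> (:div ...:) and </div> -> (:divend:)
function convert_div(string $html): string {
    $html = preg_replace_callback('/<div\b([^>]*)>/i', function (array $m): string {
        return '(:div' . attrs_to_wiki(parse_attributes($m[1])) . ':)';
    }, $html);
    return preg_replace('!</div\s*>!i', '(:divend:)', $html);
}

echo convert_div("<DIV class='box' id=intro>\nHello\n</div>"), "\n";
```

This prints (:div class="box" id="intro":), Hello and (:divend:) on separate lines. A value that itself contains a double quote would need escaping, which this sketch does not attempt.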
Some additional notes:
- <meta name="description|keywords" content="..." /> is also recognised, as are HTML comments <!-- ... -->.
- Attributes other than href are lost for A links, partly because PmWiki link markup does not support attributes. <a name="..."></a> anchors are handled, though. Since PmWiki does not allow spaces in the resulting [[#...]] markup, spaces are replaced with the _ character. Link targets that start with . or / are prefixed with Path:, and link targets that contain neither / nor : are prefixed with Attach:.
- IMG tags with alt or title attributes are handled correctly, and align=left|right on an image produces %lfloat% or %rfloat% at the beginning of the line.
- Ordered and unordered lists are supported to arbitrary depth.
- Attributes defined for a TR are applied only to the first TD of that row.
- Only the clear attribute is supported for BR; setting it to all, left or right produces [[<<]] instead of \\ markup.
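The link and anchor rules in the notes above can be restated as a minimal sketch. The function names wiki_link_target(), wiki_link() and wiki_anchor() are hypothetical; the recipe itself applies these rules with regular expressions. The sketch follows the two stated prefix rules literally, so an in-page link such as #top would also receive the Attach: prefix; it makes no attempt to guess how the recipe treats fragments.

```php
<?php
// Hypothetical re-implementation of the link rules listed above, written
// as plain functions; the recipe itself applies them with regular expressions.

// Choose the PmWiki link target for an <a href="...">.
function wiki_link_target(string $href): string {
    $first = substr($href, 0, 1);
    if ($first === '.' || $first === '/') {
        return 'Path:' . $href;            // relative or site-absolute path
    }
    if (strpos($href, '/') === false && strpos($href, ':') === false) {
        return 'Attach:' . $href;          // bare file name: treat as attachment
    }
    return $href;                          // full URL or already-prefixed target
}

// <a href="...">text</a> -> [[target|text]] (other attributes are dropped)
function wiki_link(string $href, string $text): string {
    return '[[' . wiki_link_target($href) . '|' . $text . ']]';
}

// <a name="..."></a> -> [[#...]], with spaces replaced by underscores
function wiki_anchor(string $name): string {
    return '[[#' . str_replace(' ', '_', $name) . ']]';
}

echo wiki_link('report.pdf', 'Annual report'), "\n";
echo wiki_link('/docs/index.html', 'Docs'), "\n";
echo wiki_anchor('Getting started'), "\n";
```

This prints [[Attach:report.pdf|Annual report]], [[Path:/docs/index.html|Docs]] and [[#Getting_started]].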
Usage
- Install the recipe
- Paste HTML into a PmWiki edit box
- Press "Preview" or "Save and edit"
- Verify the resulting markup
Notes
The $ROEPatterns array is available in the PmWiki core starting with pmwiki-2.2.0-beta45. For earlier versions, you'll need to install Cookbook.ROEPatterns or change the recipe to use $ROSPatterns instead.
Suggestions, fixes and improvements to the regular expressions involved are very welcome.
I am aware that converted <p>...</p> blocks end up separated by two empty lines. This shouldn't affect how the page renders, and I'm not sure how to fix it robustly.
I haven't actually tested the html2wiki program suggested below, but as far as I can tell from its source files, this recipe handles all of the markup that html2wiki handles.
Release Notes
- 2008-10-07
- better documentation
- bugfixes: white space in output, DL lists
- IMG alt/title and align attributes
- better A names and targets
- 2008-10-05: first public release
See Also
Contributors
Alternative: html2wiki
There is a Perl program, html2wiki, which does a good job. You can use the converter on its web page, or install the program.
It can be installed from CPAN in the usual Perl way, and some Linux distributions package it separately, for example as libhtml-wikiconverter-perl.
You need to install both the HTML::WikiConverter module and HTML::WikiConverter::PmWiki, which is the PmWiki "dialect" module.
The html2wiki script is a standalone program that takes an HTML input file and writes wikified output. You can then cut and paste the output into the wiki (or use your favourite editor; see EmacsPmWikiMode and Pywe).
For example:
html2wiki --dialect=PmWiki input.html >output.wiki
Comments