TinyMCE and Legacy Content: Protecting Your Ordered Lists


May 24

As you work in TinyMCE, what you see is not always what you started with.

credit: https://www.youtube.com/watch?v=mjlm2B9umPA

TinyMCE, the Web’s leading rich text editor, serves authors well when generating new content – but it can disrupt legacy and migrated content, especially if it contains ordered lists.

A sophisticated and capable tool, TinyMCE provides immediacy and responsiveness when authoring Web content, all while watchfully enforcing compliant markup underneath. Though prized, that emphasis on compliance can present disadvantages when it comes to handling legacy content.

Yes, as you work in TinyMCE, “What You See Is What You Get”. But it turns out, what you see is not always what you started with. It may come as a surprise that TinyMCE can mercilessly transform the markup of your legacy content in unanticipated ways.

Here, “legacy content” means it was created or previously managed in another editor – or outside an editor altogether – then later surfaced in TinyMCE as part of a new authoring experience.

Legacy content may have originated long ago as raw markup in a simple text application like Notepad or BBEdit, been generated by discontinued software such as Adobe PageMill, or been exported from Microsoft Word, which yields bloated markup.

Because legacy content is often just old, it is likely to contain deprecated markup – or markup that never was correct – and to be non-compliant with current Web standards.

This non-compliance does not always break the front-end display of Web pages. As long as browsers remain backward compatible, legacy content may still render elegantly.

The issue with legacy content vis-à-vis TinyMCE, however, is that TinyMCE is by its nature a rich text editor, not a free-range markup tool, and is ruthless about enforcing compliant HTML.

When legacy content is accessed in TinyMCE for the first time, the editor will wrestle any invalid markup into compliance before the content ever reaches your front-end site.

Normally, we would want our editor to rigorously enforce compliant markup so newly authored content can be expected to consistently validate as well-formed HTML.

But in what can be a dismaying discovery, when legacy content is opened in TinyMCE, its markup can be transformed silently and significantly.

Merely viewing legacy content in design mode in TinyMCE can trigger transformations to legacy markup. In fact, it is not necessary to use “View” > “<> Source code” for transformations to occur. Changes can happen subtly and without the user noticing until later, if at all.

The transformations TinyMCE makes to legacy markup include smaller, helpful conversions of unclosed BR tags to closed BR tags, FONT tags to styled SPANs, B to STRONG, I to EM, etc. These formatting changes are generally harmless.

But TinyMCE will also intervene in more complex markup. And in the case of ordered and nested lists, we have a potential worst-case scenario wherein the transformations TinyMCE makes can change the meaning of content.

If TinyMCE detects incorrectly positioned elements within an ordered list, it will close the list just before each offending element and open a new list after it, splitting what was one list into several. This extends to nested ordered lists as well.

The result can be pronounced, incorrect fragmentation and renumbering of content. And where legal or historical material is involved, changes to list numbering have the damaging effect of altering its meaning:

Section “4” subsection “d” becomes section “2” subsection “d”.

Section “5” subsection “xi” becomes section “3” subsection “iv”.

As an illustration, the markup below comes from a sample legal page and includes both BLOCKQUOTE and P elements that are disallowed as direct children of the OL element:

<ol>
  <li>Legally binding terms</li>
  <li>Important clause</li>
  <blockquote>Be advised that …</blockquote>
  <li>Notwithstanding the aforementioned</li>
  <p>The precedents for the determination …</p>
  <li>An exception being</li>
  <li>Under these conditions</li>
  <li>Completely and irreparably</li>
  <li>In conclusion</li>
</ol>

Before any transformations by TinyMCE, these list items display following a sequence from 1 to 7, interspersed with a blockquote and paragraph:

1. Legally binding terms

2. Important clause

Be advised that …

3. Notwithstanding the aforementioned

The precedents for the determination …

4. An exception being

5. Under these conditions

6. Completely and irreparably

7. In conclusion

After being loaded in TinyMCE, the BLOCKQUOTE and P elements no longer sit inside an OL, but the list numbering is thrown off and is no longer legally accurate:

<ol>
  <ol>
    <li>Legally binding terms</li>
    <li>Important clause</li>
  </ol>
</ol>
<blockquote>Be advised that &hellip;</blockquote>
<ol>
  <ol>
    <li>Notwithstanding the aforementioned</li>
  </ol>
</ol>
<p>The precedents for the determination &hellip;</p>
<ol>
  <li>An exception being</li>
  <li>Under these conditions</li>
  <li>Completely and irreparably</li>
  <li>In conclusion</li>
</ol>

Note the renumbering that now disrupts the original legal references:

1. Legally binding terms
2. Important clause

Be advised that …

1. Notwithstanding the aforementioned

The precedents for the determination …

1. An exception being
2. Under these conditions
3. Completely and irreparably
4. In conclusion
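
The renumbering can be reproduced outside the editor with a short script. The following is a minimal sketch, not a browser-accurate renderer: it uses Python's standard html.parser module, the names ListNumberer and li_numbers are hypothetical, and it assumes plain OL lists with no start, value or reversed attributes. It reports the number that would appear beside each list item, first for the original markup and then for the markup as transformed by TinyMCE.

```python
from html.parser import HTMLParser

ORIGINAL = """
<ol>
<li>Legally binding terms</li>
<li>Important clause</li>
<blockquote>Be advised that …</blockquote>
<li>Notwithstanding the aforementioned</li>
<p>The precedents for the determination …</p>
<li>An exception being</li>
<li>Under these conditions</li>
<li>Completely and irreparably</li>
<li>In conclusion</li>
</ol>
"""

TRANSFORMED = """
<ol><ol>
<li>Legally binding terms</li>
<li>Important clause</li>
</ol></ol>
<blockquote>Be advised that &hellip;</blockquote>
<ol><ol>
<li>Notwithstanding the aforementioned</li>
</ol></ol>
<p>The precedents for the determination &hellip;</p>
<ol>
<li>An exception being</li>
<li>Under these conditions</li>
<li>Completely and irreparably</li>
<li>In conclusion</li>
</ol>
"""


class ListNumberer(HTMLParser):
    """Record the number shown beside each <li> inside an <ol>.

    Simplifying assumptions: only <ol> lists are present, and the
    start, value and reversed attributes are not used.
    """

    def __init__(self):
        super().__init__()
        self.counters = []  # one running counter per currently open <ol>
        self.numbers = []   # number given to each <li>, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.counters.append(0)      # every new list starts counting again
        elif tag == "li" and self.counters:
            self.counters[-1] += 1
            self.numbers.append(self.counters[-1])

    def handle_endtag(self, tag):
        if tag == "ol" and self.counters:
            self.counters.pop()


def li_numbers(markup):
    """Return the list-item numbers for the given markup."""
    parser = ListNumberer()
    parser.feed(markup)
    parser.close()
    return parser.numbers


print(li_numbers(ORIGINAL))     # [1, 2, 3, 4, 5, 6, 7]
print(li_numbers(TRANSFORMED))  # [1, 2, 1, 1, 2, 3, 4]
```

Comparing the two results before and after content passes through an editor is one way to spot a numbering change that would otherwise go unnoticed.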

When the original markup is tested in the W3C Markup Validation Service, these two errors account for the list disruption we see:

Error: Element blockquote not allowed as child of element ol in this context.
Contexts in which element blockquote may be used:
Where flow content is expected.
Content model for element ol:
Zero or more li and script-supporting elements.

Error: Element p not allowed as child of element ol in this context.
Contexts in which element p may be used:
Where flow content is expected.
Content model for element ol:
Zero or more li and script-supporting elements.
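
A rough version of this check can also be run locally across a large body of legacy content before any of it is opened in an editor. The sketch below is an illustration under stated assumptions rather than a replacement for the validator: it relies on Python's standard html.parser module, the names StrayListChildFinder and find_stray_list_children are hypothetical, and it assumes that LI elements are explicitly closed. It flags any element that is a direct child of an OL or UL and is not permitted by the content model quoted above.

```python
from html.parser import HTMLParser

# Per the content model quoted above: li and script-supporting elements.
ALLOWED_IN_LIST = {"li", "script", "template"}
LISTS = {"ol", "ul"}
# Void elements never get an end tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StrayListChildFinder(HTMLParser):
    """Collect elements that sit directly inside a list but are not allowed there."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # names of currently open elements
        self.strays = []     # (line number, tag name) for each offender

    def handle_starttag(self, tag, attrs):
        parent = self.open_tags[-1] if self.open_tags else None
        if parent in LISTS and tag not in ALLOWED_IN_LIST:
            self.strays.append((self.getpos()[0], tag))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed descendants.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_stray_list_children(markup):
    """Return (line, tag) pairs for disallowed direct children of lists."""
    finder = StrayListChildFinder()
    finder.feed(markup)
    finder.close()
    return finder.strays


SAMPLE = """<ol>
<li>Legally binding terms</li>
<li>Important clause</li>
<blockquote>Be advised that …</blockquote>
<li>Notwithstanding the aforementioned</li>
<p>The precedents for the determination …</p>
<li>An exception being</li>
</ol>"""

for line, tag in find_stray_list_children(SAMPLE):
    print(f"line {line}: <{tag}> is not allowed as a direct child of a list")
```

For the sample above, the two elements reported are the blockquote and the p, matching the two validator errors.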

Again, at their root these transformations are caused by the inherent invalidity of the original markup in the legacy content opened by TinyMCE. The editor is simply doing its job as the chosen vehicle to render the content.

As is now apparent, when viewing or editing legacy content – especially very old migrated content – TinyMCE could present significant headaches due to the transformations it imposes, and more so if your legacy content carries legal or historical weight.

A particular challenge arises when new and legacy content are commingled within the same project, website or content management ecosystem.

So how can we address this challenge?

Possible Solutions

Consider an alternate editor: If you have not yet begun your project or migrated your legacy content, consider using a very basic editor or even an unrestricted text field to access and manage older markup. As for alternatives to TinyMCE, some may pose the same risk to the integrity of legacy content. The most prominent competitor, CKEditor, offers a promising HtmlEmbed plugin. But in an echo of the issue TinyMCE presents, the CKEditor documentation flat-out states:

“CKEditor 4 is not a tool that will let you input invalid HTML code.”

“CKEditor 4 abides by W3C standards so it will modify HTML code if it is invalid.”

Override defaults where possible: The default settings that ship with the TinyMCE editor largely lock in place certain transformations we want to avoid, and few settings are available to affect the editor’s behavior in these respects. Depending on the composition of your legacy content and your tolerance for transformations, you might be able to configure your way around the predicament that TinyMCE presents. Note that TinyMCE’s Legacy Output Plugin “will greatly reduce the functionality of the editor” and is discouraged “for use in producing normal web content”.

Update legacy markup en masse: With an experienced team of HTML and CSS specialists, a large collection of non-compliant legacy content could be brought into current markup compliance, but at painstaking manual effort and great expense. Updates could also be scripted programmatically. However, the range of variations and edge cases that coding and testing need to cover quickly becomes extreme. If legal or historical content is involved, such a project would presumably need review by a subject matter expert such as an archivist or lawyer.

Quarantine legacy content: From the point that legacy content migrates into a system where TinyMCE is in use, it would need to be strictly segregated from any access point (authoring form, editor window, WYSIWYG field, etc.) where TinyMCE could alter it. This approach might prove delicate to maintain and would need to be preserved for as long as TinyMCE is in use. No solution is future-proof, but an architecturally capable product and development team could take steps to effectively insulate legacy content from inadvertent changes by TinyMCE.
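
As one illustration of the scripted route, the sketch below shows a single possible repair: moving a stray BLOCKQUOTE or P into the list item that precedes it, so that the list stays in one piece and keeps its numbering. This is an assumption about what the content intends, not a general fix, and the names StrayBlockFolder and fold_stray_blocks are hypothetical. It uses Python's standard html.parser module, assumes LI elements are explicitly closed, leaves a stray element untouched when no list item precedes it, and re-emits end tags in lowercase. Whether a stray block truly belongs to the preceding item is exactly the kind of question a subject matter expert would need to confirm.

```python
from html.parser import HTMLParser

LISTS = {"ol", "ul"}
ALLOWED_IN_LIST = {"li", "script", "template"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StrayBlockFolder(HTMLParser):
    """Re-emit markup, folding stray list children into the preceding <li>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out = []    # output fragments
        self.stack = []  # [tag, has_deferred_li_close] per open element

    def _before_child(self, tag):
        # A held-back </li> is released only when a permitted child follows.
        parent = self.stack[-1] if self.stack else None
        if parent and parent[0] in LISTS and parent[1] and tag in ALLOWED_IN_LIST:
            self.out.append("</li>")
            parent[1] = False

    def handle_starttag(self, tag, attrs):
        self._before_child(tag)
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append([tag, False])

    def handle_startendtag(self, tag, attrs):
        self._before_child(tag)
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if all(name != tag for name, _ in self.stack):
            self.out.append(f"</{tag}>")  # unmatched end tag: pass through
            return
        while True:
            name, deferred = self.stack.pop()
            if deferred:
                self.out.append("</li>")  # release before the list closes
            if name == tag:
                break
        parent = self.stack[-1] if self.stack else None
        if tag == "li" and parent and parent[0] in LISTS:
            parent[1] = True  # hold </li> until we see what follows
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def fold_stray_blocks(markup):
    """Return markup with stray list children moved into the preceding <li>."""
    folder = StrayBlockFolder()
    folder.feed(markup)
    folder.close()
    return "".join(folder.out)


SAMPLE = ("<ol><li>Important clause</li>"
          "<blockquote>Be advised that …</blockquote>"
          "<li>Notwithstanding the aforementioned</li>"
          "<p>The precedents for the determination …</p>"
          "<li>An exception being</li></ol>")

print(fold_stray_blocks(SAMPLE))
```

In the result, the blockquote sits inside the “Important clause” item and the paragraph inside the “Notwithstanding the aforementioned” item, so the OL contains only LI children and remains a single list.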

The impressive power that TinyMCE provides necessitates discretion as to which content we surface within it. Whatever text editor we use, we don’t want it to change legal contracts or rewrite historical records. And we certainly don’t want it to make such changes unbeknownst to us.


Marc Salvatierra
