Content Archetype: Modeling the Base Content Object

June 26, 2021 Marc Salvatierra

The base content object presented here empowers you to understand and model your content types across multiple domains, in whatever dimensions and direction your Web presence requires.

Construction of the Trylon and Perisphere, 1939 New York World's Fair — by Hugh Ferriss

From page to podcast, every piece of content has its origin story. A given Web article, streaming video, or downloaded PDF likely came into being and exists somewhere, in someone’s system, as a content type. There are thousands of content types represented across the Web today.

While many of the content types you encounter on the Web are identical or very similar, they vary in how they structure content, depending on individual communication, authoring, governance, storage, addressing and delivery needs, as well as on underlying systems and services.

How do we unify our understanding of this universe of content types? By remaining agnostic of any one particular approach and incorporating all identifiable components of these many types into a base content object, we can more effectively model, connect and re-use our own more distinct content types.

A single, all-encompassing content type – especially the interface overlaying it – would intimidate and overwhelm most authors were it offered to them for daily practical use. The feature set presented would be overkill to handle any single use case.

Instead, this base content object exists at a more foundational, fundamental level, as an archetype from which permutations of other, more narrow and practical content types can be derived and extended.

Within the base content object depicted, components are grouped by their functional properties, such as format and metadata. Each component is annotated and described vis-à-vis its relationships to other components and its role within the flow of functionality. Relative emphasis is given to certain components where multiple relationships intersect.

This representation is not intended to suggest any specific management technology to administer and support the model and resulting content types. It would understandably be distributed and implemented among a number of platforms and microservices.

Content Archetype: Modeling the Base Content Object - by Marc Salvatierra

Base Content Object: Main Container

The main container retains all pertinent components of the base content object.

Material Components: Link, Article, Asset, Compound

The initial determination to be made is: what direct, material component makes up the content housed by the object? That is, what is the essence of the content itself, whether in the form of an external resource, managed text, or a hard asset?

Link: The Link serves as a full, structured and managed object, as opposed to the simpler, more commonly understood inline hyperlink. The Link wraps a URL of an externally managed resource and manages it as an object, incorporating other components. The target resource is typically content at a URL outside our own domain that we do not control, but that we want to reference via the Link. This resource is ultimately an Article, Asset or Compound, as described immediately below.

Article: Internally managed text, typically handled in a WYSIWYG editor, which yields copy as or within a Web page. It could ultimately be published as text, a binary, an image or another transformed format. “Article” already carries a specific meaning in the HTML5 sense, but here we are using the label broadly to encapsulate all forms of authored text that do not currently reside in hard asset form, as in the Asset below. Article content is essentially liquid text within the confines of an editor.

Asset: An internally managed hard asset – document, image, audio file, video file, etc. The Asset is typically created and managed by some technology that is not a WYSIWYG editor within a Web form. Assets are the weak point in many content management systems in that they are seldom given original, first-class status and are instead treated as supplemental appendages. Here, Asset is elevated to object level where it can be tagged, translated, aggregated, etc., in the same way as Link and Article.

Compound: Provides flexibility in that it can combine the simplest, material components of Link, Article and Asset in more complex arrangements. In contrast to an Aggregation, which pools individual content types on the fly in shifting representations such as views, search results, tag clouds, etc., a Compound rigorously encapsulates and binds multiple material components so that the larger content object can be handled in the same way as any of its individual material components, i.e., published, assigned, addressed.

Note that while Link, Article, Asset and Compound can carry unique addresses implicitly, there is no requirement that any of these material components display at a URL. Rather, they can exist at their own dedicated URL, or can be streamed from or embedded within a larger container, or both.
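As a minimal sketch of how these four material components might be expressed in code, the Python below models each as a small data class inside a main container. All class names, field names and the `describe` helper are hypothetical, chosen only for illustration. The sketch assumes a Compound binds only the three simple components, and that a dedicated URL is optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Link:
    """Managed object wrapping the URL of an externally managed resource."""
    target_url: str


@dataclass
class Article:
    """Internally managed, editor-authored text."""
    body: str


@dataclass
class Asset:
    """Internally managed hard asset: document, image, audio file, video file."""
    file_name: str
    media_type: str


@dataclass
class Compound:
    """Binds several simple material components into one larger unit."""
    parts: List[Union[Link, Article, Asset]] = field(default_factory=list)


@dataclass
class ContentObject:
    """Main container: one material component plus an optional dedicated URL."""
    material: Union[Link, Article, Asset, Compound]
    url: Optional[str] = None  # None means embedded or streamed elsewhere


def describe(obj: ContentObject) -> str:
    """Treat every material component, Compound included, the same way."""
    kind = type(obj.material).__name__
    where = obj.url if obj.url else "no dedicated URL"
    return f"{kind} ({where})"
```

Because `describe` only ever receives a `ContentObject`, a Compound of an Article, an Asset and a Link is addressed exactly as a lone Article would be, which is the property the Compound is meant to guarantee.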

Further Components

In addition to the material components described above, further components enhance the base content object in other important aspects, describing and controlling its format, metadata, language, permissions, workflows, positioning, aggregation, storage and dissemination into the World Wide Web.

Format: Composition of Content

Having determined which material component makes up the content, we can now consider in what formats those material components exist. Content can be constituted in the following formats:

plain text
rich text
HTML
raw markup
binary
dynamically generated, such as a PDF rendered on-the-fly and provided for download
data source – such as a key-value pair or a stock price API
multimedia – such as a video blob or file, an audio podcast file, or an SVG image

Metadata: Information about Content

Metadata describes the content. Common properties include:

Title
Contextual
Display Title
Legal Date
Display Date
Published Date
Last Modified Date
Last Accessed Date
Revision identifier
Version identifier
Taxonomy
Vocabulary
Tags
Related Objects
Correlated Objects
custom fields

Language: Multilingual Aspects of Content

Content and its metadata can have language aspects. Among these:

authoritative language – the primary language of the object
supported languages – allowable translation languages
language versions – translations that have been populated and correlated
translation status – an indicator of whether translations remain up-to-date
internationalization – ways in which localization is enabled, e.g. date, time, locale, etc.
localization – specific ways content is localized, e.g. date and time format, interface, etc.
layout flow – affects template designs and front-end presentation
text direction – affects template designs and front-end presentation
encoding – vital for setting and tracking, especially as content is migrated and transformed
targeted domains – language-specific domains assignable by language, country or region
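One way the authoritative language, language versions and translation status could fit together is sketched below in Python. The class and field names are hypothetical. The sketch assumes each translation records the revision number of the authoritative text it was made from, which is only one possible way to track whether translations remain up-to-date.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageVersions:
    authoritative_language: str
    authoritative_revision: int
    # language code -> revision of the authoritative text that was translated
    translated_from: Dict[str, int] = field(default_factory=dict)

    def stale_languages(self) -> List[str]:
        """Languages whose translation lags behind the authoritative text."""
        return sorted(
            lang
            for lang, revision in self.translated_from.items()
            if revision < self.authoritative_revision
        )
```

A translation made from revision 2 would be reported as stale once the authoritative text reaches revision 3.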

Permissions: Controls on Content

The base content object grants permissions over the content. Permissions can be based upon:

team – whether mapped to a business unit or labelled according to system function
role – typically an internal system grouping within the permissions scheme
aspect – such as archivable, translatable, classifiable, versionable
property – such as template, date, language, author, format
content type – noting that for compound content objects, inheritability should be determined
location within the site storage, taxonomy or directory, AKA folder structure
container – as in a parent container from which permissions are inherited
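The role-based and container-based permissions listed above can be combined, as in the following Python sketch. The `Container` class, its `grants` mapping and the `allowed_actions` function are hypothetical names. The sketch assumes inheritance is purely additive, meaning a child container can add to what its parents grant but cannot revoke it. A real permissions scheme may well differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Container:
    name: str
    parent: Optional["Container"] = None
    # role -> actions granted directly on this container (assumed shape)
    grants: Dict[str, Set[str]] = field(default_factory=dict)


def allowed_actions(container: Container, role: str) -> Set[str]:
    """Collect a role's grants on this container and on every ancestor."""
    actions: Set[str] = set()
    node: Optional[Container] = container
    while node is not None:
        actions |= node.grants.get(role, set())
        node = node.parent
    return actions
```

Here an editor granted "edit" on a site-level container and "translate" on a nested legal container would hold both actions within the legal container, and only "edit" elsewhere on the site.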

Workflows: Actions on Content

In turn, depending on permissions, workflows govern actions that can be taken on the content and that update the content object:

Create Draft – precedes any of the downstream workflow actions
Edit Revision – an unpublished, incremented set of content changes
Publish Version – a published incremented set of content changes
Clone Version – useful for re-using and re-purposing complex content
Unpublish Version – withdrawing an incremented set of content changes
Archive Version – denoting as immutable a particular content increment
Delete Version – removing from the system a particular content increment
Translate – where applying aspects of the Language component above
Approve – when pushing content and changes through gated processes
Lock – when necessary to temporarily prevent additional or outside incremental content changes
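A subset of these workflow actions can be read as a small state machine, sketched below in Python. The state names, the transition table and `apply_action` are assumptions made for illustration. The table covers only some of the actions listed above, and the permitted transitions in a real system would depend on its permissions and approval gates.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    UNPUBLISHED = "unpublished"
    ARCHIVED = "archived"


# (current state, action) -> next state; an assumed, partial transition table
TRANSITIONS: Dict[Tuple[Optional[State], str], State] = {
    (None, "create_draft"): State.DRAFT,  # precedes every other action
    (State.DRAFT, "edit_revision"): State.DRAFT,
    (State.DRAFT, "publish_version"): State.PUBLISHED,
    (State.PUBLISHED, "unpublish_version"): State.UNPUBLISHED,
    (State.PUBLISHED, "archive_version"): State.ARCHIVED,
    (State.UNPUBLISHED, "publish_version"): State.PUBLISHED,
    (State.UNPUBLISHED, "archive_version"): State.ARCHIVED,
}


def apply_action(state: Optional[State], action: str) -> State:
    """Return the next state, or raise if the action is not allowed."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"{action!r} is not allowed from {state}")
    return TRANSITIONS[key]
```

Because `State.ARCHIVED` has no outgoing transitions in this table, an archived version cannot be edited or republished, reflecting its immutability.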

Assignment: Placement of Content

Workflow actions extend to the assignment of content in terms of where it can be positioned within a larger Web presence.

channels – such as sites, apps, devices
interface – such as menus, breadcrumbs, header, footer, sidebars
contextual displays – such as personalized experiences by location and demographic

Aggregation: Grouping of Content

Workflow actions further extend to the aggregation and bulk handling of content, how the content object is assembled en masse among other content objects which may be the same, similar or altogether different:

association – such as application of taxonomy values and free tagging
relation – such as the creation of parent-child relationships within the content, e.g., a legal case object with a legal document object
correlation – such as a sibling relationship used in a draft-final, redline-plain or translation set
views – such as coded or configured combinations of paginated listings
queues – manually ordered lists of content, as seen in Drupal
search results – such as front-end results centered around a key search term, metadata facet, or other content aspect

A Note on Aggregations: An Aggregation such as a search results page or paginated product listing can be uniquely addressable and manageable in its own right, through configuration or code. An Aggregation pools individual content objects assembled within it dynamically in ever-shifting representations. In contrast, the content objects themselves maintain a more rigid, stable and granular configuration of content through their structure as defined in the content object.
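The contrast drawn in this note can be sketched in Python as follows. The names `ContentItem`, `Aggregation`, `matches` and `resolve` are hypothetical. The sketch assumes an Aggregation stores only its own address and a selection rule, and looks up its members each time it is requested.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ContentItem:
    """A stable content object with its own fixed structure."""
    title: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Aggregation:
    """Addressable in its own right; membership is computed on request."""
    url: str
    matches: Callable[[ContentItem], bool]

    def resolve(self, pool: List[ContentItem]) -> List[ContentItem]:
        # Nothing is stored here: the result shifts as the pool changes.
        return [item for item in pool if self.matches(item)]


pool = [ContentItem("Case brief", {"legal"}), ContentItem("Press note", {"news"})]
legal_listing = Aggregation("/tags/legal", lambda item: "legal" in item.tags)
```

Adding another item tagged "legal" to `pool` changes what `legal_listing.resolve(pool)` returns without any change to the Aggregation itself, whereas each `ContentItem` keeps the structure it was created with.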

Storage: How Content and Associated Data are Stored

Depending on its origin and format as described above, content and its associated data can reside in one or more of the following repositories:

database
flat file storage
internal file system
content delivery network (CDN)
object storage
caching – where and for how long the content is cached, including session-specific caching
search indices – such as dedicated back-end Solr and Elasticsearch indices that store content by purpose
management system – for example: digital asset management platform (DAM), content management system (CMS), document management system (DMS), knowledge management system (KMS), digital experience management platform (DXM / DEM / DXP), product information management system (PIM), etc.

Ultimately, all content, metadata, translations, permissions, assignments, aggregations and modifications need to write to some form of storage.

Namespace: How Content is Addressed and Accessed

Finally, data in storage informs the namespace through which content will be accessed, whether internally or across the World Wide Web. Namespace considerations include:

URL conventions, incorporating slugs, naming practices, directory structure, taxonomy, etc.
system UUIDs – such as those used internally or for embedding and streaming
canonical URLs – the definitive public address of each piece of content
shortened URLs – if maintaining a record of them is useful for future redirection
aliasing – for evergreen and vanity URLs
redirection – critical for post-migration and post-transformation publication of content
chain flattening – useful to optimize redirection to remove intermediary hops
sites, apps and devices – the channels to which we intend to publish content
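Chain flattening, as listed above, can be illustrated with a short Python sketch. It assumes redirects are held as a simple mapping from source path to target path. The function name and that mapping are hypothetical and not tied to any particular server or CMS.

```python
from typing import Dict


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every source directly at its final target, removing hops."""
    flattened: Dict[str, str] = {}
    for source, target in redirects.items():
        seen = {source}
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target!r}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened
```

Given the chain `/old` to `/interim` to `/new`, the flattened map sends both `/old` and `/interim` straight to `/new`, so a visitor arriving at a post-migration address needs only a single redirect.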

World Wide Web

Completing the base content object, content is expressed through the namespace to the World Wide Web – the Internet’s content layer – where it is accessed in the forms covered earlier:

Article
File
Compound
Aggregation

At this point, hyperlinks can be applied within content of the same domain or among multiple domains.