An Abstract View on Data Structures in WordPress

This weekend, I gave a talk at WordCamp Portland, looking into data structures in WordPress. While the session will soon be available on wordpress.tv, in this post I will provide a written version of it. I recommend you to read this alongside the original slides.

When referring to data structures here, it’s not data structures as in computer science. You won’t hear about arrays, doubly linked lists, binary trees. Instead we will look at object types, metadata and options, in other words, how data is organized in WordPress. We will do so from both the database and the code points of view. Knowing the ins and outs of this is crucial for both core contributors and plugin developers.

The WordPress Database Schema

Let’s first have a look at the WordPress database schema. In the default setup, WordPress contains 12 tables. Although the prefix does vary across installs, we’ll use wp_ here for convenience and to differentiate. Here’s the list:

wp_commentmeta
wp_comments
wp_links
wp_options
wp_postmeta
wp_posts
wp_termmeta
wp_terms
wp_term_relationships
wp_term_taxonomy
wp_usermeta
wp_users

In multisite, 7 more tables are added:

wp_blogmeta (coming in WordPress 5.1)
wp_blogs
wp_blog_versions
wp_registration_log
wp_signups
wp_sitemeta
wp_site

When looking at these 19 tables, there is at least one group that we can identify at first glance: The tables wp_commentmeta, wp_postmeta, wp_termmeta, wp_usermeta, wp_blogmeta and wp_sitemeta are clearly somewhat related. We can also see that each of these tables has another table that goes together with it, for example wp_posts for wp_postmeta or wp_terms for wp_termmeta. We will later see (and you may already know) that it is actually the other way around, the meta tables accompany their respective other table.

For further assumptions, we need to look at the structures of the individual tables. We can see that, for almost all tables, rows are uniquely identified by a single numeric ID, that is automatically incremented with each new entry, and furthermore these columns are the only unique ones in the respective table. The naming for these columns (for example ID, term_id, comment_ID) is inconsistent, because naming is hard especially with history, but the structure is similar. While it’s hard to figure out why all these tables have that setup, maybe we can instead find out something by looking at the three exceptions:

Rows in the wp_options table can alternatively be uniquely identified through another way, namely its option_name column. We will look at this further towards the end of this post since it is rather special.
The wp_term_taxonomy table can alternatively be uniquely identified through another way as well, namely its term_id and taxonomy columns together. As a teaser I’ll tell you right away that this table is actually almost unnecessary nowadays, but was more meaningful a couple years ago – we’ll investigate shortly.
The wp_term_relationships table is the only one that does not have a unique numeric ID column, instead its rows are identified by both an object ID and a term taxonomy ID. This lets us guess that the table defines relationships between two other rows from other tables. Let’s look closer at this one now.

Relationships between Tables

wp_term_relationships stores relations between posts and terms, for example which post is assigned which tag(s). It is actually the only dedicated relationships database table in WordPress. Because it allows for multiple entries in both identifying columns, we can tell that this is a many-to-many relationship, in other words a post can be assigned multiple terms, and a term can be assigned to multiple posts.

`wp_term_relationships`
`object_id bigint(20) unsigned primary` `term_taxonomy_id bigint(20) unsigned primary` `term_order int(11)`

However, there are of course still other relationships between entities in WordPress, which we can also partly distill from the database schema. Several tables contain a column to point to another entry’s ID, which may be an entry from the same table or from another one. Where the entry is stored, we can only guess from the database schema, this is then defined in the WordPress codebase. Some examples:

In wp_posts, there is a post_parent column pointing to another entry from the same table.
In wp_posts, there is a post_author column pointing to an entry from the wp_users table.
In wp_comments, there is a comment_post_ID column pointing to an entry from the wp_posts table.
In wp_blogs, there is a site_id column pointing to an entry from the wp_site table.
Also very important, all of the aforementioned meta tables have a column that points to an entry from the respective related table, which is crucial for their functionality as we will find out later.

All of these are always one-to-many relationships, for example a post can only be assigned a single post_parent, but a post can be assigned as parent to multiple posts. While in this specific case, the column actually carries the word “parent” in its name, in most other cases the nature of the relationship is parent-like as well. For example, a site has a (parent) network that it belongs to – sites are stored in wp_blogs, networks are stored in wp_site – also, don’t question this naming, it is naming backward-compatibility at its finest.

All tables that we haven’t particularly addressed pretty much live by themselves, except the wp_term_taxonomy table which stands out too, as we noticed before…

Term Splitting

You may remember that a couple years ago there was this developer buzz about term splitting in WordPress. Let’s briefly recap what this was about, as it will explain the nature and relationship of the aforementioned wp_terms and wp_term_taxonomy tables.

Historically, an entry in wp_terms could be associated with multiple entries in wp_term_taxonomy, in other words a single term could be associated with multiple taxonomies. Over time, this caused several bugs and many unexpected scenarios, so in WordPress 4.2 a mechanism was implemented to split these wp_terms entries that had multiple wp_term_taxonomy connections into separate entries. Since then, nowadays there should always be only one wp_terms entry associated with one wp_term_taxonomy entry. Due to that, we can essentially consider these two tables as a single table now. Once again, backward-compatibility is the only reason that we still have two separate tables.

How WordPress Uses its Database Tables

So far, for the most part we focused on the database schema of WordPress. Let’s now look at how these database tables are actually used in the codebase. I hope things have been rather clear so far, because now they will become (more) confusing. Please bear with me.

The Post ≠ Post Dilemma

`wp_posts`
`ID bigint(20) unsigned auto_increment primary` `post_author bigint(20) unsigned` `post_date datetime` `post_date_gmt datetime` `post_content longtext` `post_title text` … `post_type varchar(20)` …

Let’s look at usage of the wp_posts table. This is probably the most commonly known database table as it contains posts, the essence of the original blogging paradigm of WordPress. However, we need to clarify this a bit: You may have noticed or already known that there is no wp_pages database table – so where are pages stored then?

The answer is that a page, as well as essentially every piece of content in WordPress, is a post. They are all stored in the wp_posts table.

To clarify this, when speaking about what we typically refer to as posts, we should be more precise: We would actually need to call them posts of the post type “post”. When we speak about pages, we would need to call them posts of the post type “page”. This is obviously inconvenient, but it is very important to know that “post” can refer to either all pieces of content authored in WordPress or the post type “post” specifically. Bringing up the topic of “custom post types” may help understanding this. These are not other types of the post type “post”, as in blog posts. These are entirely different post types, in addition to the existing ones “post”, “page”, “attachment” (which is for files in the media library) and a few internal ones.

WordPress post types, both built-in and custom ones, are basically defined entirely in the codebase, most importantly through the register_post_type() function. The only indicator in the database schema is the post_type column in the wp_posts table, which stores which post type a post belongs to. How that post type actually works can then only be determined by looking at the code.

The same applies to terms and their taxonomies. You can use register_taxonomy() to define how a taxonomy works, while in the database there is simply a single taxonomy column indicating to which taxonomy a term belongs to. At least here the naming confusion is not present, since we don’t have a taxonomy called “term” (thank God!), but instead “category” and “post_tag” (the latter being commonly referred to just as tags).

How Terms and Posts Relate

As you may know, terms, for example categories and tags, provide a way to group posts together by certain taxonomies. A term does not typically contain original content, instead, when looking at a term view in the frontend, we will see a list of posts which are assigned that term. These relationships are stored in the aforementioned wp_term_relationships table: A post can be assigned multiple terms, and a term can be assigned to multiple posts, hence the many-to-many relationship and dedicated database table. To clarify, the terms that are assigned to a post can have multiple taxonomies. For example, by default posts of the post type “post” can be assigned to terms of the taxonomy “category” and of the taxonomy “post_tag”. Actually, the terms which are assigned to a post are identified not via the term_id column as you may think, but via the term_taxonomy_id column. This is again due to the historic circumstances before term splitting happened. Nowadays, the term_taxonomy_id column is still used for that, but otherwise it is pretty much redundant.

Another interesting trait of the wp_term_relationships table is that the column that stores the post ID is called object_id. This is because, while terms of a taxonomy are typically associated with posts, they could in theory also be associated with other object types. Most of the codebase of WordPress however expects the related objects to be posts, so it would be rather involved to actually do so in a consistent way.

The previous observation indicated that, while we sometimes say that “everything in WordPress is a post”, there are still objects of other types than posts. And to clarify, we are not talking about post types here. A post in the most global sense possible is still a piece of data in a certain structure. Terms for example are also a piece of data, but their data structure is different from posts. We have so far looked at them distinctly, but there are actual quite a few similarities.

What are Object Types?

Object types in WordPress are something that has been around for a very long time, but hasn’t seen any real exposure in developer circles. Just recently the topic has become more important due to development of more abstract WordPress core APIs, particularly because of the REST API, about which I always like to highlight its benefits in bringing more order to the previous (internal) core API mess. We will get to a prime example in just a bit. But first, let’s look into how posts and terms relate.

When we look at posts and terms, we’ll notice that they have very similar APIs:

There is a WP_Post / WP_Term class representing a single entity.
There is a get_post() / get_term() function to retrieve a single entity’s class instance.
There is a wp_insert_post() / wp_insert_term() function to add a new entity.
There is a wp_update_post() / wp_update_term() function to modify an existing entity.
There is a wp_delete_post() / wp_delete_term() function to delete an existing entity.
There is a get_posts() / get_terms() function to query multiple entities.
There is a WP_Query / WP_Term_Query class that actually run the respective queries. WP_Query should actually be named WP_Post_Query, but when it was introduced many many years ago, it wasn’t clear that there would be a need for similar query classes for other object types in the future.

While there are slight nuances between the functions or classes in every mentioned set, the APIs follow a general structure. I’d go as far as to say: If WordPress core used interfaces, there would be one for all of them.

When looking into the core code further, we can actually see that there are more than these two that follow similar behavior: comments and users have matching APIs as well – and in multisite, sites and networks are fairly similar too (at least in their classes and in retrieving entities).

So far we have discovered that object types all provide similar APIs for interacting with entities of the respective object type. For a long time, it was just silently accepted and not much talked about that these follow similar architectures. However, as mentioned before, thinking about it became necessary once an interesting challenge regarding the REST API and metadata came up. Before we get to that though, let’s have a brief look at what metadata actually is and how it works.

What is Metadata?

Let’s take a step back to the database structure. Earlier we discovered that there are a couple of database tables whose names end in meta. These tables host so-called metadata. Here they are listed again:

wp_commentmeta
wp_postmeta
wp_termmeta
wp_usermeta
wp_blogmeta (multisite only, coming in WordPress 5.1)
wp_sitemeta (multisite only)

Metadata are arbitrary key-value pairs that are associated with an entry from another specific database table: For example, wp_postmeta contains key-value pairs associated with entries from the wp_posts table, in other words it hosts metadata for posts – in other words for objects of the object type “post” – and, as a reminder, that means for posts of all post types. Another trait of metadata is that there can be multiple values for a single key. While most of the time we use metadata for single key-value pairs, multiple values are supported. This can be beneficial especially for storing array-like data.

`wp_termmeta`
`meta_id bigint(20) unsigned auto_increment primary` `term_id bigint(20) unsigned` `meta_key varchar(255)` `meta_value longtext`

These metadata tables, while not being great in terms of database architecture, allow entries from their respective main table to store any extra data they need, in addition to what there is room for in the original table. This is where the whole magic of custom post types has historically come to work: If your custom post type is called “event” and it needs to store a start_date and end_date for each entry, this is where these fields would be stored – unless you use a custom database table, which is nowadays strongly recommended, but on the other hand more involved to develop and integrate. WordPress core even uses metadata itself, for example to store the featured image of a post: Since featured images were implemented much later than when the wp_posts database table schema was set up, and since WordPress does not like altering its database structure, metadata was an acceptable alternative for storing them.

The Abstract Meta API

When looking at the different meta database tables, we notice that they follow a standard pattern:

They all are named wp_{$object_type}meta.
They all have a meta_id column, except wp_usermeta where that column is inconsistently named umeta_id.
They all have column for the related object ID, named {$object_type}_id.
They all have a meta_key column.
They all have a meta_value column.
They all have their related objects stored in another table called wp_{$object_type}s, except wp_site which should have been called wp_sites.

As it makes sense due to this common setup, there is a common Meta API for CRUD (create, read, update, delete) operations:

get_metadata( $object_type, $object_id, $meta_key, $single )
add_metadata( $object_type, $object_id, $meta_key, $meta_value )
update_metadata( $object_type, $object_id, $meta_key, $meta_value, $prev_value )
delete_metadata( $object_type, $object_id, $meta_key, $delete_all )
In addition, there are a couple more specific metadata functions that are less commonly used.

As developers, we don’t usually interact with these functions directly because there are object type-specific wrappers for all of them, such as get_post_meta(), get_term_meta(), get_comment_meta() and so on. These wrapper functions exist for convenience so that we don’t always need to pass the $object_type argument, and they also obfuscate a bit that there even is the concept of an object type. To be completely honest, the parameter of the low-level functions is actually named $meta_type instead of $object_type, but that is because the decision on using the object type naming consistently came up just a few months ago.

How Metadata Caused us to Think about Object Types

The challenge that caused us to think about object types further was the introduction of the register_meta() function. I’ll try to briefly explain the underlying problem:

The REST API has since its introduction been dealing with metadata in an abstract way, so registering metadata needed to be implemented through an abstract API, just like the rest of the Meta API.
With the introduction of the REST API-supportive register_meta() function in WordPress 4.6, metadata could only be registered for an entire object type.
However, in the real world, metadata is typically used for data specific to a certain post type. Therefore the API was barely usable: Going back to our example of the custom “event” post type, we wouldn’t have been able to register our meta keys start_date and end_date for only that post type, instead we would only have been able to register them for the entire object type “post” – which would have meant it is exposed for all existing post types, which would obviously not have been what we wanted, and it would likely have caused trouble.
To address this shortcoming in the API, we essentially determined that, if “post”, “term”, “comment” and “user” are object types, then we could call the hierarchical level below object subtypes.

This is how we got to the need to talk about object types, and in some cases their object subtypes – in some cases, because not all object types have subtypes. Let’s look at the specific cases to clarify this:

For the object type “post”, the object subtype is essentially the same as the post type. In other words, “post”, “page”, “attachment” etc. are subtypes of the “post” object type.
For the object type “term”, the object subtype is essentially the same as the taxonomy. In other words, “category”, “post_tag” etc. are subtypes of the “term” object type.
The object types “comment”, “user”, “site” (internally called blog) and “network” (internally called site) do not have subtypes. While the wp_comments database tables contains a comment_type field which is currently used to distinguish regular comments from pingbacks and trackbacks, there is no API around it, and the behavior is rather undefined at this point.

Note that, for object types that do not have subtypes, we consider themselves the only subtype that exists at the moment. For example, “user” is the only subtype of the object type “user”. This decision was made for forward-compatibility, should we ever want to introduce actual object subtypes for the respective object type.

WordPress object types, and their subtypes where applicable

The register_meta() function as of WordPress 4.9.8 exposes the concept of object types and object subtypes to developers; while we added wrappers for this function as well, these only exist for the two object types that support subtypes (“post” and “term”). Developers should however still be aware of what object types and object subtypes are, how they relate, and most importantly how they integrate with the historically more commonly known terms:

The term “object type” is not a synonym of the term “post type”. Instead, “post type” is a synonym for the subtype of the “post” object type.

Metadata registration was just the first example where object types and subtypes are exposed due to an abstract implementation. We are now already starting to see more APIs being built in an abstract way, for example WordPress 5.0 will add a global search endpoint to the REST API, for all content in a WordPress installation, regardless of their object type. Since the APIs for the object types are slightly different though due to history, each object type that should be supported requires an integration with the search controller to be implemented. Initially, the only such integration is for searching the object type “post”, so that it is possible to search content independently from the post type. In theory though it can support searching content of any other object type, and we will likely see this being leveraged in the future. There is already a proof-of-concept plugin for how to add support for searching terms. When interacting with the API endpoint, users can specify the object_type and object_subtype to search, indicating its abstract nature to the client side as well.

Definition of an Object Type

After having looked into the origin of the term and why it is becoming necessary to be aware of it, let’s get to a definition of an object type.

An object type in WordPress specifies a group of entities of a similar nature. The APIs of different object types follow a common structure, even in case they are technically independent. Entities of an object type are identified by a numeric ID that is set via an auto-incremented database column. Object types may or may not support subtypes, and they may or may not support metadata.

It should be added that the technical independence of the APIs is largely due to historic reasons, and they might as well have been implemented using a common interface.

The following table shows you all object types in WordPress with and their common APIs and behavior:

	comment	post	term	user	site	network
class	`WP_Comment`	`WP_Post`	`WP_Term`	`WP_User`	`WP_Site`	`WP_Network`
retrieval	`get_comment()`	`get_post()`	`get_term()`	`get_userdata()`	`get_site()`	`get_network()`
addition	`wp_insert_comment()`	`wp_insert_post()`	`wp_insert_term()`	`wp_insert_user()`	`wp_insert_site()` ¹	`wp_insert_network()` ²
modification	`wp_update_comment()`	`wp_update_post()`	`wp_update_term()`	`wp_update_user()`	`wp_update_site()` ¹	`wp_update_network()` ²
deletion	`wp_delete_comment()`	`wp_delete_post()`	`wp_delete_term()`	`wp_delete_user()`	`wp_delete_site()` ¹	`wp_delete_network()` ²
querying	`get_comments()`	`get_posts()`	`get_terms()`	`get_users()`	`get_sites()`	`get_networks()`
query class	`WP_Comment_Query`	`WP_Post_Query`	`WP_Term_Query`	`WP_User_Query`	`WP_Site_Query`	`WP_Network_Query`
subtypes	kinda	yes (post types)	yes (taxonomies)	no	no	no
metadata	yes	yes	yes	yes	yes ¹	kinda

¹ coming in WordPress 5.1
² coming in the future

It’s worth noting that there are no APIs in WordPress to easily implement custom object types. However, by adding a custom database table, implementing surrounding classes and APIs of a similar nature, and hooking into the few common integration points in core, it is definitely possible to have custom object types. The only part that can be quickly developed is metadata support: When adding a custom metadata table for a custom object type, it needs to follow the same structure as the existing ones, and the out-of-the-box Meta API can simply be used for it.

Going from there, another thought to play with is that even some of the other core database tables could be implemented as object types, as most of them have their entries identified by a numeric ID. I know of at least one example for this, which is the “WP User Signups” plugin by JJJ, which provides an object type API for the wp_signups database table in multisite, with classes such as WP_Signup and WP_Signup_Query, plus related functions.

Comparing Options and Site Metadata

Most of this post has revolved around object types, simply because they are the rather new concept that needs most explanation. However, we have so far ignored one specific database table and its API: the wp_options table. Let’s dedicate a bit of time to cover this interesting table that is sort of a special case in the database schema.

Since entries in wp_options are identified by a numeric ID, it could in theory be accessed through an object type API. However, the entries in that table are also uniquely identified by the option_name column. Essentially, the table is another key-value storage, just like metadata tables. A key difference is that the wp_options table does not allow multiple values per key, so here the numeric ID column is actually unnecessary.

Thinking about the table in a multisite scope becomes more interesting as every site has its own options table there. However, sites also have (or rather will soon have) their own metadata, via the wp_blogmeta table that will be introduced in WordPress 5.1. The main reason for introducing that table is that it allows sites to be queried by arbitrary key-value pairs (aided by WP_Meta_Query), which would be impossible to accomplish through separate options tables. That is because there is only one global wp_blogmeta table which contains the metadata for all sites, similar to the other metadata tables.

We could now ask ourselves the following: If multisite had existed and been part of the thought process from the very beginning, would there only be a wp_blogmeta table and no wp_options? Given the above reasoning, our first response would be yes, since metadata appears to only have advantages over options. On the other hand, it is rather common knowledge that the wp_options table is very much over- and misused by plugins, which causes it to easily reach a massive size. Following this thought, if all data in all options tables were in wp_blogmeta, the table size would become insane (I don’t have any other word for it) in large multisites. Another argument for the distinction between options and metadata could be that site metadata should be data for a site that is only relevant in the network context (for example querying or listing sites in multisite), while options should be relevant for the site’s own context (in other words, actual settings that we wouldn’t ever want to filter sites by). However, there is some overlap here: The site title for example, which is stored in wp_options, is relevant for both contexts.

Aside: Network Options or Network Meta?

The problem of the differentiation between metadata and options gets worse when we look at how network options are implemented: Internally, they are stored in a wp_sitemeta table, which, as its name and schema indicate, is actually a table for network metadata. However, it does not use the Meta API, but a custom Network Option API:

get_network_option( $network_id, $option_name )
add_network_option( $network_id, $option_name, $option_value )
update_network_option( $network_id, $option_name, $option_value, $prev_value )
delete_network_option( $network_id, $option_name )

It looks actually quite similar to the Meta API in terms of signatures, however a difference is that it restricts usage so that only a single value per key is allowed, which is similar to how regular options work – although the table itself would allow multiple values per key. Furthermore the $network_id parameter will fall back to the current network if null is provided, different from the Meta APIs $object_id which must always be provided a valid ID manually. Internally, the functions use custom code, not even reusing parts of the Meta API in any way. There have been discussions for a while whether the API should be migrated to an actual Network Meta API, whether it should stay as it is but use the Meta API internally, or whether it should remain untouched. While this table is a special case, it just adds to the confusion of how metadata and options differ.

The truth is that we haven’t figured out a clear separation between metadata and options yet. We know why we introduced wp_blogmeta; what we don’t know is whether wp_options would still exist if wp_blogmeta had been thought of early on. We also don’t know what would make a clear policy for deciding whether a piece of data should live in wp_options or wp_blogmeta.

Organization and relations of WordPress data structures

With this post, I wanted to provide a detailed overview about how data is structured in WordPress, and I hope that you don’t leave this in total confusion. Instead I hope that the confusion that you do have leads to further communication and discussion about approaching these cases. We touched on object types, object subtypes, relationships, metadata and options, and most of these are either rather new concepts or rather unclear still. If you have ideas, please share them, in conversation, in a blog post, in a Trac ticket. I’m looking forward to your thoughts.

Types, Subtypes, Meta, Options – An Abstract View On Data Structures in WordPress