WordPress Multi-Multisite: A Case Study


The mission: Provide a dashboard within the WordPress admin area for browsing Google Analytics data for all your blogs.

The catch? You’ve got about 900 live blogs, spread across about 25 WordPress multisite instances. Some instances have just one blog, others have as many as 250. In other words, what you need is to compress a data set that normally takes a very long time to compile into a single user-friendly screen.

The implementation details are entirely up to you, but the final result should look like this Figma comp:

Design courtesy of the incomparable Brian Biddle.

I want to walk you through my approach and some of the interesting challenges I faced coming up with it, as well as the occasional nitty-gritty detail in between. I’ll cover topics like the WordPress REST API, choosing between a JavaScript or PHP approach, rate/time limits in production web environments, security, custom database design — and even a touch of AI. But first, a little orientation.

Let’s define some terms

We’re about to cover a lot of ground, so it’s worth spending a couple of moments reviewing some key terms we’ll be using throughout this post.

What is WordPress multisite?

WordPress Multisite is a feature of WordPress core — no plugins required — whereby you can run multiple blogs (or websites, or stores, or what have you) from a single WordPress installation. All the blogs share the same WordPress core files, wp-content folder, and MySQL database. However, each blog gets its own folder within wp-content/uploads for its uploaded media, and its own set of database tables for its posts, categories, options, etc. Users can be members of some or all blogs within the multisite installation.
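To make that concrete, here's a minimal sketch of how code can walk every blog in a multisite instance. It leans on two core functions, get_sites() and switch_to_blog(), and assumes nothing beyond a standard multisite install:

<?php

// Fetch every blog in this multisite instance ('number' => 0 removes
// the default limit of 100 results).
$sites = get_sites( array( 'number' => 0 ) );

foreach ( $sites as $site ) {

  // Point WordPress at this blog's database tables.
  switch_to_blog( $site->blog_id );

  // From here, get_option(), WP_Query, etc. all read from this blog.
  $name = get_bloginfo( 'name' );

  // Undo the switch before moving on.
  restore_current_blog();
}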

What is WordPress multi-multisite?

It’s just a nickname for managing multiple instances of WordPress multisite. It can get messy to have different customers share one multisite instance, so I prefer to break it up so that each customer has their own multisite, but they can have many blogs within their multisite.

So that’s different from a “Network of Networks”?

It’s apparently possible to run multiple instances of WordPress multisite against the same WordPress core installation. I’ve never looked into this, but I recall hearing about it over the years. I’ve heard the term “Network of Networks” and I like it, but that is not the scenario I’m covering in this article.

Why do you keep saying “blogs”? Do people still blog?

You betcha! And people read them, too. You’re reading one right now. Hence, the need for a robust analytics solution. But this article could just as easily be about any sort of WordPress site. I happen to be dealing with blogs, and the word “blog” is a concise way to express “a subsite within a WordPress multisite instance”.

One more thing: In this article, I’ll use the term dashboard site to refer to the site from which I observe the compiled analytics data. I’ll use the term client sites to refer to the 25 multisites I pull data from.

My implementation

My strategy was to write one WordPress plugin that is installed on all 25 client sites, as well as on the dashboard site. The plugin serves two purposes:

  • Expose data at API endpoints on the client sites
  • From the dashboard site, scrape that data from the client sites, cache it in the database, and display it in a dashboard (a rough sketch of this dual role follows this list)
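As a rough sketch of that dual role — the class names and option flag here are hypothetical, not lifted from the actual plugin — the plugin's bootstrap might branch on which kind of site it finds itself on:

<?php

// Hypothetical bootstrap: the same plugin runs everywhere and decides
// at runtime whether this is the dashboard site or a client site.
add_action( 'plugins_loaded', function() {

  // Every site, dashboard and client alike, exposes REST endpoints.
  new LXB_DBA\Endpoints();

  if ( get_option( 'lxb_dba_is_dashboard_site' ) ) {
    // Only the dashboard site scrapes, caches, and renders the data.
    new LXB_DBA\Dashboard();
  }
} );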

The WordPress REST API is the Backbone

The WordPress REST API is my favorite part of WordPress. Out of the box, WordPress exposes default WordPress stuff like posts, authors, comments, media files, etc., via the WordPress REST API. You can see an example of this by navigating to /wp-json from any WordPress site, including CSS-Tricks. Here’s the REST API root for the WordPress Developer Resources site:

The root URL for the WordPress REST API exposes structured JSON data, such as this example from the WordPress Developer Resources website.

What’s so great about this? WordPress ships with everything developers need to extend the WordPress REST API and publish custom endpoints. Exposing data via an API endpoint is a fantastic way to share it with other websites that need to consume it, and that’s exactly what I did:


<?php

[...]

// Register each of our custom endpoints with the WordPress REST API.
function register( WP_REST_Server $server ) {
  $endpoints = $this->get();

  foreach ( $endpoints as $endpoint_slug => $endpoint ) {
    register_rest_route(
      $endpoint['namespace'],
      $endpoint['route'],
      $endpoint['args']
    );
  }
}
You’ll note that on the client sites I follow the principle of least privilege, taking steps to only allow CORS where it’s necessary.
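The post doesn't need a full tour of those rules, but the general shape is an allow-list hooked to core's rest_pre_serve_request filter — the same hook WordPress uses for its own CORS headers. Here's a minimal sketch, with the dashboard origin as a placeholder:

<?php

// Only emit CORS headers when the request comes from the dashboard site.
add_filter( 'rest_pre_serve_request', function( $served ) {

  $allowed_origin = 'https://dashboard.example.com'; // hypothetical
  $origin         = get_http_origin();

  if ( $origin === $allowed_origin ) {
    header( 'Access-Control-Allow-Origin: ' . esc_url_raw( $origin ) );
    header( 'Access-Control-Allow-Methods: GET' );
  }

  return $served;
} );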

Auth, Part 2: I’ve been known to auth myself

At this point, I had authenticated an Ajax call from the dashboard site to the client sites, and registered some logic on all the client sites to let that request pass CORS. But then, back on the dashboard site, I had to get that response from the browser to the server.

The answer, again, was to make an Ajax call to the WordPress REST API endpoint for storing the data. But since this was an actual database write, not merely a read, it was more important than ever to authenticate. I did this by requiring that the current user be logged into WordPress and possess sufficient privileges. But how would the browser know about this?

In PHP, when registering our endpoints, we provide a permissions callback to make sure the current user is an admin:

<?php

// ...

// Define our endpoints: their routes, callbacks, and permissions.
function get() {
  $version = 'v1';
  return array(

    'update_blogs' => array(
      'namespace' => 'LXB_DBA/' . $version,
      'route'     => '/update_blogs',
      'args'      => array(
        'methods'             => array( 'PATCH' ),
        'callback'            => array( $this, 'update_blogs_cb' ),
        'permission_callback' => array( $this, 'is_admin' ),
      ),
    ),
    // ...
  );
}

// Only users who can update WordPress core (i.e., admins) may proceed.
function is_admin() {
  $out = current_user_can( 'update_core' );
  return $out;
}
JavaScript can use this — it’s able to identify the current user — because, once again, that data is localized. The current user is represented by their nonce:

// Send one blog's data to the dashboard site's REST endpoint.
async function insertBlog( data ) {

  let url = lexblog_network_analytics.endpoint_urls.insert_blog;

  try {
    await $.ajax( {
      url: url,
      method: 'POST',
      dataType: 'json',
      data: data,
      headers: {
        'X-WP-Nonce': getNonce()
      }
    } );
  } catch ( error ) {
    console.error( 'Failed to store blogs:', error );
  }
}

function getNonce() {
  if ( typeof wpApiSettings.nonce == 'undefined' ) { return false; }
  return wpApiSettings.nonce;
}
The wpApiSettings.nonce global variable is automatically present in all WordPress admin screens. I didn’t have to localize that. WordPress core did it for me.
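For completeness, here's roughly how that lexblog_network_analytics object reaches the browser in the first place. The script handle and the exact array shape are my assumptions, but the mechanism is core's wp_localize_script():

<?php

// Runs on the dashboard site's admin screens.
add_action( 'admin_enqueue_scripts', function() {

  wp_enqueue_script(
    'lexblog-network-analytics',               // hypothetical handle
    plugins_url( 'js/dashboard.js', __FILE__ ),
    array( 'jquery', 'wp-api' ),               // 'wp-api' ensures wpApiSettings exists
    '1.0.0',
    true
  );

  // Expose the endpoint URLs to JavaScript as a global object.
  wp_localize_script(
    'lexblog-network-analytics',
    'lexblog_network_analytics',
    array(
      'endpoint_urls' => array(
        'insert_blog' => rest_url( 'LXB_DBA/v1/insert_blog' ),
      ),
    )
  );
} );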

Cache is King

Compressing the Google Analytics data from 900 domains into a three-minute loading .gif is decent, but it would be totally unacceptable to wait that long multiple times per work session. Therefore, I cache the results from all 25 client sites in the database of the dashboard site.

I’ve written before about using the WordPress Transients API for caching data, and I could have used it on this project. However, something about the tremendous volume of data and the complexity implied within the Figma design made me consider a different approach. I like the saying, “The wider the base, the higher the peak,” and it applies here. Given that the user needs to query and sort the data by date, author, and metadata, I think stashing everything into a single database cell — which is what a transient is — would feel a little claustrophobic. Instead, I dialed up E.F. Codd and used a relational database model via custom tables:

On the dashboard site, I created seven custom database tables, including one relational table, to cache the data from the 25 client sites.

It’s been years since I’ve paged through Larry Ullman’s career-defining (as in, my career) books on database design, but I came into this project with a general idea of what a good architecture would look like. As for the specific details — things like column types — I foresaw a lot of Stack Overflow time in my future. Fortunately, LLMs love MySQL and I was able to scaffold out my requirements using DocBlocks and let Sam Altman fill in the blanks:

<?php 

/**
* Provides the SQL code for creating the Blogs table.  It has columns for:
* - ID: The ID for the blog.  This should just autoincrement and is the primary key.
* - name: The name of the blog.  Required.
* - slug: A machine-friendly version of the blog name.  Required.
* - url:  The url of the blog.  Required.
* - mapped_domain: The vanity domain name of the blog.  Optional.
* - install: The name of the Multisite install where this blog was scraped from.  Required.
* - registered:  The date on which this blog began publishing posts.  Optional.
* - firm_id:  The ID of the firm that publishes this blog.  This will be used as a foreign key to relate to the Firms table.  Optional.
* - practice_area_id:  The ID of the practice area that this blog covers.  This will be used as a foreign key to relate to the PracticeAreas table.  Optional.
* - amlaw:  Either a 0 or a 1, to indicate if the blog comes from an AmLaw firm.  Required.
* - subscriber_count:  The number of email subscribers for this blog.  Optional.
* - day_view_count:  The number of views for this blog today.  Optional.
* - week_view_count:  The number of views for this blog this week.  Optional.
* - month_view_count:  The number of views for this blog this month.  Optional.
* - year_view_count:  The number of views for this blog this year.  Optional.
* 
* @return string The SQL for generating the blogs table.
*/
function get_blogs_table_sql() {
  $slug = 'blogs';
  $out  = "CREATE TABLE {$this->get_prefix()}_{$slug} (
      id BIGINT NOT NULL AUTO_INCREMENT,
      slug VARCHAR(255) NOT NULL,
      name VARCHAR(255) NOT NULL,
      url VARCHAR(255) NOT NULL UNIQUE, /* adding unique constraint */
      mapped_domain VARCHAR(255) UNIQUE,
      install VARCHAR(255) NOT NULL,
      registered DATE DEFAULT NULL,
      firm_id BIGINT,
      practice_area_id BIGINT,
      amlaw TINYINT NOT NULL,
      subscriber_count BIGINT,
      day_view_count BIGINT,
      week_view_count BIGINT,
      month_view_count BIGINT,
      year_view_count BIGINT,
      PRIMARY KEY (id),
      FOREIGN KEY (firm_id) REFERENCES {$this->get_prefix()}_firms(id),
      FOREIGN KEY (practice_area_id) REFERENCES {$this->get_prefix()}_practice_areas(id)
  ) DEFAULT CHARSET=utf8mb4;";
  return $out;
}
In that file, I quickly wrote a DocBlock for each function, and let the OpenAI playground spit out the SQL. I tested the result and suggested some rigorous type-checking for values that should always be formatted as numbers or dates, but that was the only adjustment I had to make. I think that’s the correct use of AI at this moment: You come in with a strong idea of what the result should be, AI fills in the details, and you debate with it until the details reflect what you mostly already knew.
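One detail that's glossed over is how that SQL actually runs. Core's dbDelta() is the usual tool for creating custom tables, but it's picky about formatting and doesn't reliably handle FOREIGN KEY clauses, so a direct $wpdb->query() on activation is a reasonable alternative. Here's a sketch, assuming a hypothetical Schema class whose other get_*_table_sql() methods mirror the one above:

<?php

register_activation_hook( __FILE__, function() {
  global $wpdb;

  $schema = new LXB_DBA\Schema(); // hypothetical class

  // Order matters: the firms and practice-areas tables must exist
  // before the blogs table declares foreign keys against them.
  $statements = array(
    $schema->get_firms_table_sql(),
    $schema->get_practice_areas_table_sql(),
    $schema->get_blogs_table_sql(),
  );

  foreach ( $statements as $sql ) {
    $wpdb->query( $sql );
  }
} );

(Writing the statements as CREATE TABLE IF NOT EXISTS would also make re-activation harmless.)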

How it’s going

I’ve implemented most of the user stories now. Certainly enough to release an MVP and begin gathering whatever insights this data might have for us:

Screenshot of the final dashboard which looks similar to the Figma mockups from earlier.
It’s working!

One interesting data point thus far: Although all the blogs are on the topic of legal matters (they are lawyer blogs, after all), blogs that cover topics with a more general appeal seem to drive more traffic. Blogs about the law as it pertains to food, cruise ships, germs, and cannabis, for example. Furthermore, the largest law firms on our network don’t seem to have much of a foothold there. Smaller firms are doing a better job of connecting with a wider audience. I’m positive that other insights will emerge as we work more deeply with this.

Regrets? I’ve had a few.

This project probably would have been a nice opportunity to apply a modern JavaScript framework, or just no framework at all. I like React, and I can imagine how cool it would be to have this application driven by the various changes in state rather than… drumroll… a couple thousand lines of jQuery!

I like jQuery’s ajax() method, and I like the jQuery UI autocomplete component. Also, there’s less of a performance concern here than on a public-facing front end. Since this screen is in the WordPress admin area, I’m not concerned about Google admonishing me for using an extra library. And I’m just faster with jQuery. Use whatever you want.

I also think it would be interesting to put AWS to work here and see what could be done through Lambda functions. Maybe I could get Lambda to make all 25 plus 900 requests concurrently with no worries about browser limitations. Heck, maybe I could get it to cycle through IP addresses and sidestep the 429 rate limit as well.

And what about cron? Cron could do a lot of work for us here. It could compile the data on each of the 25 client sites ahead of time, meaning the initial three-minute refresh time goes away. Writing an application around cron is fine at first; coming back six months later to debug it is another matter. Not my favorite. I might revisit this later on, but for now, the cron-free implementation meets the MVP goal.
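If I do revisit it, the plumbing would be WP-Cron on each client site, so the heavy compilation happens before the dashboard ever asks. A sketch, with a hypothetical hook name and callback:

<?php

// On each client site: compile analytics data ahead of time so the
// dashboard's request becomes a fast read.
add_action( 'lxb_dba_compile_analytics', 'lxb_dba_compile_analytics_cb' );

function lxb_dba_compile_analytics_cb() {
  // Gather and cache this site's analytics data here.
}

// Schedule the recurring event once (e.g., on plugin activation).
if ( ! wp_next_scheduled( 'lxb_dba_compile_analytics' ) ) {
  wp_schedule_event( time(), 'hourly', 'lxb_dba_compile_analytics' );
}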

I have not provided a line-by-line tutorial here, or even a working repo for you to download, and that level of detail was never my intention. I wanted to share high-level strategy decisions that might be of interest to fellow Multi-Multisite people. Have you faced a similar challenge? I’d love to hear about it in the comments!

