Multiple indexes and modifying search queries for each in InstantSearch.js

Documenting my solution, because it took a LONG time to get there.


I've done a number of Ghost and Algolia integrations. Generally things go fine, and I'd have told you I could rip through one in a couple hours no problem.

🔮
Tangential tip: the default Ghost/Algolia integration overwrites a number of database settings when run. My write-up on how to improve the behavior is a work in progress.

Then I got to this one. My client maintains a MySQL database of text with links and brief commentary. Then he runs some beast of a build step that assembles everything from the database and builds pages organized by topic and by region. All of those go into Ghost via the Admin API.

He wanted those library entries to be individually searchable, although there can be dozens or hundreds on a single Ghost page. And because they're robustly tagged for topic and region, he wanted faceting for that search. He also publishes more traditional blog posts, and he wanted those to be full-text searchable. Oh, and he wanted the category and region pages to appear in searches, too.

I spent too much time (and my client was very patient) flailing at trying to run search in the browser. I got it working, but the load time on my beefy development computer was about five seconds of totally locked up. I might have offloaded building the index to a web worker or something, but the odds of it being functional on a low-end computer or a phone seemed to be basically nil. So, (drumroll please!) Algolia to the rescue!

I got his content loaded without too much fuss, once I stripped out the signup cards and some boilerplate link collections that I didn't want indexed. (I typically use a lightly-modified version of Ghost's packages.) It ended up being three separate Algolia indices, because the content types were just too different, and I needed to search them differently.

I started with the link library collection and had it searching with facets pretty quickly, thanks to using InstantSearch.js. I set up facets to be searchable, since there are over a hundred of them! (I added a little extra CSS to improve the behavior.) So you don't have to scroll through 100 possible facets to find the one about housing, you just start typing 'housing' in the facet search box. Pretty cool.
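
For reference, here is a minimal sketch of what a searchable facet list setup can look like. The container and attribute names match the ones used later in this post; the placeholder text and limit are assumptions for illustration. Facet search also needs the attribute marked as searchable in the index settings.

```javascript
// Options for instantsearch.widgets.refinementList().
// 'Search topics' and limit: 10 are assumed values, not from the real site.
const topicListOptions = {
  container: '#topic-list',
  attribute: 'topicarray',
  searchable: true,                       // adds a search box above the facet list
  searchablePlaceholder: 'Search topics', // assumed placeholder text
  limit: 10,
  showMore: true,
};

// Index setting required before facet values can be searched.
const indexSettings = {
  attributesForFaceting: ['searchable(topicarray)'],
};

// In the browser this would be used as:
//   algsearch.addWidgets([ instantsearch.widgets.refinementList(topicListOptions) ]);
```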

Then I added searching of the Ghost posts, and things got messy fast. The posts don't have tags matching the topics for the link library collection. They're rather lightly tagged, and tend to be long and span multiple topics. My initial attempts applied the facets to the posts, but the posts didn't have the right tags, so that didn't work.

Attempt #1 (the wrong way)

I figured out that I could apply facets to just one of my indices, but I still wanted to capture the user's intent to restrict the search to a topic or region in the other one, by adding the facet term to its query string. Lots of head scratching ensued. Here's what worked first: I'm using multiple indices, modifying the query string for each index, and applying the facets to only one of them.

/* This is the 'wrong' way to do it in terms of minimizing "search operations", which Algolia bills on.  Read on for a better way! */ 

const searchClient1 = {
  search(requests) {
    return algoliaClient.search(requests.map(request => {
      // I thought I was going to modify params here, but in the end, I didn't.
      return request;
    }))
  }
}

// set up the first index.
const algsearch = instantsearch( {
    indexName: 'databrief',
    searchClient: searchClient1,
    searchFunction(helper) {
      // Guard against the refinement array not existing yet.
      const topics = helper.state.disjunctiveFacetsRefinements['topicarray'] || [];
      if (helper.state.query != '' || topics.length > 0) {
        // this is an actual search, not the initial load in.
        // various CSS adjustments
      } else {
        // Everything is blank, so hide the search results, which are worthless.
        // various CSS adjustments
      }
      helper.search();
      // here's my big trick.  I'm calling the algsearch2 searches from the context of algsearch,
      // which lets me grab values from all its refinement widgets.
      algsearch2.helper.setQuery(helper.state.query + ' ' + topics.join(' ')).search();
    }
  } );
// set up the second index
  const algsearch2 = instantsearch( {
    indexName: 'posts_index',
    searchClient: searchClient2, // a second client (definition not shown)
    searchFunction(helper) {
      helper.search();
    }
  } );
// add widgets for the first index.
algsearch.addWidgets( [
    instantsearch.widgets.searchBox( {
      container: '#searchbox',
    }),
    instantsearch.widgets.hits( {
      container: '#hits-1',
    }),
    instantsearch.widgets.refinementList( {
      container: '#topic-list',
    // various settings omitted
    }),
    // additional widgets omitted
    ])

algsearch2.addWidgets( [
   // no search box!  Shares the algsearch searchbox -- sort of.
        instantsearch.widgets.hits( {
          container: '#hits-posts',
        } )
      ])

  algsearch.start();
  algsearch2.start();

Please forgive me - in trying to produce a small example, I've almost certainly mangled a bracket somewhere.
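
The query-building step inside searchFunction can also be pulled out into a small helper, which makes it easier to check on its own. This is only a sketch: `buildPostsQuery` is a hypothetical name, and it assumes the refinements object has the same shape as `helper.state.disjunctiveFacetsRefinements`.

```javascript
// Hypothetical helper: combine the typed query with the selected topic
// refinements so the posts index sees them as extra search terms.
function buildPostsQuery(query, refinements) {
  const topics = (refinements && refinements.topicarray) || [];
  return [query || '', ...topics].join(' ').trim();
}

// Inside searchFunction(helper), it would be used roughly like this:
//   algsearch2.helper
//     .setQuery(buildPostsQuery(helper.state.query, helper.state.disjunctiveFacetsRefinements))
//     .search();
```

Trimming the result keeps the posts query empty when nothing has been typed or checked, instead of sending a lone space.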

I suspect my use case is weird, because I couldn't find an example that did this anywhere. So there you go, internet! That's how you set up InstantSearch.js for vanilla JavaScript to use multiple indices when you need to use a different query string and facets for each. Enjoy!

💡
Update: The method above is a suboptimal way to do it, because it causes two queries to run, and query operations cost money. Better option below! Keep reading!

The right way to do it?

So instead, I need to set up just one search client (so that it batches up search calls). And I need to do my rewriting not in the 'helper', but in the search() method of a custom search client, like this:

const searchClient1 = {
  search(requests) {
    return algoliaClient.search(requests.map(request => {
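      // modify parameters here
      request.params.facetFilters = request.params.facetFilters || [];
      if (request.indexName == 'myfirstindex') {
        // possibilities for rewriting the query
        request.params.facets = [ "*" ];
        // Refinements arrive here as facetFilters entries like 'topicarray:housing';
        // disjunctiveFacetsRefinements lives on the helper state, not on request.params.
        const topics = request.params.facetFilters.flat()
          .filter(f => f.startsWith('topicarray:'))
          .map(f => f.slice('topicarray:'.length));
        request.params.query = [request.params.query || '', ...topics].join(' ').trim();
      } else if (request.indexName == 'mysecondindex') {
        // do some other rewriting
      }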
      return request;
    }))
  }
}
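
As a variation on the client above, here is a sketch of how the topic refinements from the faceted index could be carried over into the posts query inside one batched call. It assumes the refinements show up as `facetFilters` strings of the form `topicarray:value` on the `databrief` requests; `topicsFrom`, `rewriteRequests`, and `makeSearchClient` are hypothetical names.

```javascript
// Pull topic values out of a request's facetFilters, e.g.
// [['topicarray:housing', 'topicarray:health']] -> ['housing', 'health'].
function topicsFrom(request) {
  const filters = (request.params && request.params.facetFilters) || [];
  return filters
    .flat()
    .filter(f => typeof f === 'string' && f.startsWith('topicarray:'))
    .map(f => f.slice('topicarray:'.length));
}

// Rewrite a batch: posts_index gets the topics as extra query terms,
// every other request passes through unchanged.
function rewriteRequests(requests) {
  const topics = [...new Set(
    requests.filter(r => r.indexName === 'databrief').flatMap(topicsFrom)
  )];
  return requests.map(request => {
    if (request.indexName !== 'posts_index') return request;
    const query = [request.params.query || '', ...topics].join(' ').trim();
    return { ...request, params: { ...request.params, query } };
  });
}

// Wrap any client that exposes search(requests), such as algoliaClient.
function makeSearchClient(client) {
  return { search: requests => client.search(rewriteRequests(requests)) };
}
```

Because the rewrite happens on the whole batch, the posts request can see refinements that belong to a different index, which the per-request `map` callback alone does not make obvious.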

// here's the widgets to go with it.

algsearch.addWidgets( [
    // shared by all indices:
    instantsearch.widgets.searchBox( {
      container: '#searchbox',
    } ),
    instantsearch.widgets.index( {
      indexName: 'databrief',
    }).addWidgets([
      // these widgets are attached to only the databrief index.
        instantsearch.widgets.hits( {
          container: '#hits',          
        } ),
        instantsearch.widgets.refinementList( {
          container: '#topic-list',
          searchable: false,
          limit: 200,
          attribute: 'topicarray',
          showMore: true,
          showMoreLimit: 201,
          sortBy: [ 'name:asc' ],
        }),
     ]),
     instantsearch.widgets.index( {
      indexName: 'posts_index',
      // these widgets are only attached to the posts_index index.
      }).addWidgets([

        instantsearch.widgets.hits( {
          container: '#hits-posts'
        } ),
      ])
])

It's not a terribly big rewrite, but it allows each search to only be one search operation.

Why so many operations?

I was really happy with my first (suboptimal) implementation in terms of behavior, except it was absolutely eating the search budget for lunch. Thousands of operations in an hour of light testing? Ouch. That wasn't going to work.

I switched the search box over to searchAsYouType: false, overriding the default (really cool) behavior. Now it doesn't search until the user presses enter or clicks submit. I thought that was going to fix it, but the operation count was still huge. I had searching of facets turned on, and facet search is always search-as-you-type; as far as I could tell, that can't be changed. So if I wanted to search for Kenya in the country box, it did a search for 'K', then 'KE', then 'KEN', etcetera. This was really frustrating, because there are only a hundred or two possible values for those facets, and paying a whole search operation for each letter was nuts. So I stopped using the InstantSearch refinementList widget. Instead, I'm grabbing all the facet values once and then searching in the browser for matches. Yes, I lose typo tolerance and fuzzy matching on the facets, but I'm willing to take the loss for 10x fewer operations. It's plenty fast and I can retain the searchAsYouType behavior for free.
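
The in-browser matching can be very simple. Here is a minimal sketch, assuming the facet values are already loaded into an array of strings; `matchFacetValues` is a hypothetical name and plain substring matching is just one reasonable choice.

```javascript
// Case-insensitive substring match over facet values that were fetched once.
// No typo tolerance: 'Kenia' will not find 'Kenya'.
function matchFacetValues(values, typed) {
  const needle = typed.trim().toLowerCase();
  if (needle === '') return values.slice(); // nothing typed: show everything
  return values.filter(v => v.toLowerCase().includes(needle));
}

// Example wiring (browser only), with assumed element and function names:
//   input.addEventListener('input', () => render(matchFacetValues(allRegions, input.value)));
```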

So my new flow is:

  • One initial request on page load to get the facets.
  • Refinements have a dedicated search box and checkboxes. InstantSearch is not used here; I'm just using JavaScript to find matches.
  • When the user clicks either "Apply Filters" or submits a search in the main search box (controlled by InstantSearch), I read the refinement checkboxes and add them onto the search terms for the posts. I use the refinement checkboxes to generate the facetRefinements for the search on the link library items. All of that is one search operation.
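
The first and last steps of that flow can be sketched as two small functions. This is an illustration under assumptions, not the production code: `facetValues` and `buildSearches` are hypothetical names, `regionarray` is an assumed attribute name, and the response shape relies on Algolia returning facets as `{ attribute: { value: count } }`.

```javascript
// Step 1: turn the `facets` object of one search response into a sorted list.
// The initial request could be an empty query with hitsPerPage: 0 and
// facets: ['topicarray'] so that only facet data comes back.
function facetValues(response, attribute) {
  const counts = (response.facets && response.facets[attribute]) || {};
  return Object.keys(counts).sort((a, b) => a.localeCompare(b));
}

// Step 3: build both searches from the main query and the checked boxes.
// `checked` maps attribute name -> array of selected values.
function buildSearches(mainQuery, checked) {
  const selected = Object.entries(checked).filter(([, values]) => values.length > 0);
  // One inner array per attribute: OR within an attribute, AND across attributes.
  const facetFilters = selected.map(([attr, values]) => values.map(v => `${attr}:${v}`));
  const extraTerms = selected.flatMap(([, values]) => values);
  return [
    { indexName: 'databrief', params: { query: mainQuery, facetFilters } },
    { indexName: 'posts_index', params: { query: [mainQuery, ...extraTerms].join(' ').trim() } },
  ];
}
```

Sending both entries to the client in a single search() call is what keeps the whole thing to one batched request.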

That's a huge gain in terms of search operations used. I'm hoping that once we get it deployed and see how much real users are using it, maybe we can turn searchAsYouType back on for the main search box. But pressing return, while a little bit early 2000s, is going to be OK if that's what we need to do.


Next up: Dealing with this theme's weird form CSS settings that are messing with the otherwise reasonable Algolia defaults. No, I don't want my checkboxes set to 100% width!!


No link to the client site yet. We're still working through accessibility questions and bug fixes.

