Ranking Make API Search Results

I've recently been exploring how to resolve Bugzilla bug 887568, which requests a way to promote original makes in the Make API's search results. That means implementing a way to "boost" results based on some kind of rule so that they appear at the top of the search results. Besides solving the problem in that bug, this feature will enable far more powerful searching capabilities for Make API consumers.

Let me illustrate the problem with a hypothetical example. Let's pretend there is a make titled "How to do a Backflip". Now, let's imagine that this original make gets remixed nine times, and all nine remixes have the same title as the original. That gives us ten makes: the original and nine remixes.

If I were to do a simple search for makes titled "How To", the search results would likely contain the remixed makes before the original. I'm not entirely sure what determines the order, but my best guess is that hits are ordered on a "first matched, first returned" basis. This happens because the default searches the Make API uses are filtered searches. Filtered searches do not rank hits; they just match values and return results.

My first step was to identify the correct Elasticsearch query for the job. I picked through the Elasticsearch documentation and selected several query and filter types that looked promising. The first was the Boosting Query, which was able to solve the issue reported in bug 887568. However, it wasn't useful for special use cases like script-based score boosting.
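
The Boosting Query works by demoting hits rather than promoting them, so the idea is to push remixes down instead of pulling originals up. Here is a minimal sketch in Python that builds such a request body. The "title" field, the helper name and the negative_boost of 0.5 are assumptions for illustration, and the exact filter syntax depends on the Elasticsearch version in use.

```python
import json


def build_boosting_query(title, negative_boost=0.5):
    """Build a boosting query that demotes remixes (hypothetical helper)."""
    return {
        "query": {
            "boosting": {
                # hits must match this query to be returned at all
                "positive": {"match": {"title": title}},
                # hits that also match this query get their score reduced;
                # a remix is assumed to be any make with a remixedFrom value
                "negative": {
                    "constant_score": {
                        "filter": {"exists": {"field": "remixedFrom"}}
                    }
                },
                # multiplier applied to the score of "negative" hits
                "negative_boost": negative_boost,
            }
        }
    }


body = build_boosting_query("How To")
print(json.dumps(body, indent=2))
```

Because the only lever here is a fixed negative_boost, there is no obvious place to plug in a script, which is the limitation mentioned above.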

From there, my attention turned to the Custom Filters Score Query. It lets you apply one or more filters to the results of a query, and for each matching hit you can boost its score by a boost value or a script. I actually worked with this query for a while, but eventually discovered an annoying problem: I needed a way to apply a script to all the search hits, and when I attempted to use a "match_all" filter (which does what you'd expect), I got an error from Elasticsearch.
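
A request of this kind might look roughly like the following Python sketch, which builds the request body. The shape follows my reading of the custom_filters_score documentation. The helper name and the boost of 2.0 are assumptions for illustration, not the exact request I was running.

```python
import json


def build_custom_filters_score(boost=2.0):
    """Build a custom_filters_score body that boosts original makes
    (hypothetical helper, illustrative values)."""
    return {
        "query": {
            "custom_filters_score": {
                # the query whose hits get re-scored
                "query": {"match_all": {}},
                # each entry pairs a filter with a boost for matching hits
                "filters": [
                    {
                        # originals have no remixedFrom value
                        "filter": {"missing": {"field": "remixedFrom"}},
                        "boost": boost,
                    }
                ],
                # use the first matching filter's boost
                "score_mode": "first",
            }
        }
    }


body = build_custom_filters_score()
print(json.dumps(body, indent=2))
```

As far as I can tell, an entry can carry a "script" in place of "boost", but that script only ever runs for hits that match the entry's filter.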

I ditched that query for a newer, better one: the Function Score Query. It provides the same functionality as the two mentioned above, but it also allows "match_all" as a filter! Here's what the solution to bug 887568 looks like:

{
  "query": {
    // a function score query!
    "function_score": {
      // functions are filters/queries that apply
      // boosts to search hits that match them
      "functions": [
        {
          // this function is a filter
          "filter": {
            // a missing filter
            "missing": {
              // if the remixedFrom field is null...
              "field": "remixedFrom"
            }
          },
          // BOOST THE SCORE!
          "boost_factor": 2.0
        }
      ],
      // score mode, used when there are multiple functions.
      // "sum" means add the results of the function scores.
      // can also be "first", "avg", "max", "min" and "multiply"
      "score_mode": "sum",

      // The actual query to run and apply the scoring functions to.
      // In this case it's a generic match all query that the Make API runs,
      // with some special filters for deleted and unpublished records.
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": [
                {
                  "match_all": {}
                }
              ],
              "should": []
            }
          },
          "filter": {
            "bool": {
              "must": [
                {
                  "missing": {
                    "field": "deletedAt",
                    "null_value": true
                  }
                },
                {
                  "term": {
                    "published": true
                  }
                }
              ],
              "should": []
            }
          }
        }
      }
    }
  },
  "size": 100,
  "from": 0
}
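
To see why this pushes the original to the top, here is a toy model in Python of how the scores combine for the backflip example. It is only a sketch of my understanding, not Elasticsearch's actual implementation. It assumes every hit of a match_all query starts with a score of 1.0, that the combined function value is multiplied into the query score (the default boost_mode, as far as I know), and that hits matching no function keep their score. The make ids and the is_original helper are made up for the example.

```python
import math


def is_original(make):
    # stand-in for the "missing" filter on the remixedFrom field
    return make.get("remixedFrom") is None


def function_score(base_score, make, functions, score_mode="sum"):
    """Toy model: combine the boost factors of all matching functions,
    then multiply the result into the query score."""
    matched = [f["boost_factor"] for f in functions if f["filter"](make)]
    if not matched:
        return base_score  # no function matched, score is left alone
    if score_mode == "sum":
        combined = sum(matched)
    elif score_mode == "multiply":
        combined = math.prod(matched)
    elif score_mode == "max":
        combined = max(matched)
    elif score_mode == "min":
        combined = min(matched)
    elif score_mode == "first":
        combined = matched[0]
    else:
        raise ValueError("score_mode not modelled here: %s" % score_mode)
    return base_score * combined


# nine remixes indexed first, then the original, all with the same title
title = "How to do a Backflip"
makes = [
    {"id": "remix-%d" % i, "title": title, "remixedFrom": "original"}
    for i in range(1, 10)
]
makes.append({"id": "original", "title": title, "remixedFrom": None})

functions = [{"filter": is_original, "boost_factor": 2.0}]

scored = [(function_score(1.0, make, functions), make["id"]) for make in makes]
ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

print(ranked[0])  # (2.0, 'original')
```

In this model the original ends up with a score of 2.0 while every remix stays at 1.0, so sorting by score puts the original first regardless of the order the makes were indexed in.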

Now that I know what kind of Query DSL I need to build, my next step is to design an API that is simple enough to use for this type of search. Unfortunately, the search endpoint of the Make API uses a query string API for building searches. Before I can continue, I've decided to implement a new endpoint for searching. This will be v3.0 of the Make Search API. Check back soon for a post explaining how it will function!