Interoperability with Lunr.js

A key goal of Lunr.py is interoperability with Lunr.js: building an index with Lunr.py and reading it with Lunr.js, so the index does not have to be rebuilt on the client on each visit.

The key step in this process is index serialization, which is possible thanks to lunr-schema.

The serialization process in Lunr.py consists of calling Index.serialize. Here is a complete example using the data from the introduction:

>>> import json
>>> from lunr import lunr
>>> documents = [{
...         'id': 'a',
...         'title': 'Mr. Green kills Colonel Mustard',
...         'body': """Mr. Green killed Colonel Mustard in the study with the
... candlestick. Mr. Green is not a very nice fellow."""
...     }, {
...         'id': 'b',
...         'title': 'Plumb waters plant',
...         'body': 'Professor Plumb has a green and a yellow plant in his study',
...     }, {
...         'id': 'c',
...         'title': 'Scarlett helps Professor',
...         'body': """Miss Scarlett watered Professor Plumbs green plant
... while he was away on his murdering holiday.""",
...     }]
>>> idx = lunr(
...     ref='id',
...     fields=[dict(field_name='title', boost=10), 'body'],
...     documents=documents
... )
>>> serialized_idx = idx.serialize()
>>> with open('idx.json', 'w') as fd:
...     json.dump(serialized_idx, fd)  # json.dump writes straight to the file object
...

As you can see, serialize produces a JSON-friendly dict that you can write to disk and read from Lunr.js. The following snippet shows how to read the index using Node.js:

> const fs = require('fs')
> const lunr = require('lunr')
> const serializedIndex = JSON.parse(fs.readFileSync('idx.json'))
> let idx = lunr.Index.load(serializedIndex)
> idx.search('plant')
[
  {
    ref: 'b',
    score: 1.599,
    matchData: { metadata: [Object: null prototype] }
  },
  {
    ref: 'c',
    score: 0.13,
    matchData: { metadata: [Object: null prototype] }
  }
]
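Because Lunr.js typically downloads the serialized index in the browser, it can be worth writing it without the spaces that json.dumps inserts by default. The sketch below assumes the input is the dict returned by idx.serialize(); `write_index_json` is a hypothetical helper, not part of Lunr.py, and the demo uses a small stand-in dict rather than a real index.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_index_json(serialized_idx: Dict[str, Any], path: str) -> int:
    """Write a serialized index as compact JSON; return the size in bytes.

    Hypothetical helper for illustration, not part of Lunr.py.
    """
    # separators=(",", ":") removes the spaces json.dumps adds after
    # commas and colons by default, shrinking the file the browser fetches.
    text = json.dumps(serialized_idx, separators=(",", ":"))
    with open(path, "w", encoding="utf-8") as fd:
        fd.write(text)
    return len(text.encode("utf-8"))


# Usage with a small stand-in dict; in practice pass idx.serialize().
demo_idx = {"version": "0.0.0", "fields": ["title", "body"], "pipeline": []}
demo_path = os.path.join(tempfile.mkdtemp(), "idx.json")
demo_size = write_index_json(demo_idx, demo_path)
```

The output is still plain JSON, so JSON.parse in Lunr.js reads it exactly like the indented form.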

!!! Note The search will only return the references of the matching documents. It is up to you to keep a mapping of the documents in memory to show richer results, which means that in a web environment you will need to serve two files: one for the index and another for the collection of documents.
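A minimal sketch of mapping search references back to documents, written in Python for consistency with the rest of this page (in the browser the same logic would live in JavaScript). It assumes the document collection is a list of dicts keyed by the same field passed as `ref` when building the index (`id` here); `hydrate_results` is a hypothetical helper, not part of Lunr.py or Lunr.js.

```python
from typing import Any, Dict, Iterable, List, Mapping


def hydrate_results(
    results: Iterable[Mapping[str, Any]],
    documents: Iterable[Mapping[str, Any]],
    ref: str = "id",
) -> List[Dict[str, Any]]:
    """Pair each search hit with its full document, preserving result order.

    Hypothetical helper for illustration, not part of Lunr.py or Lunr.js.
    """
    # Build the ref -> document lookup once; this plays the role of the
    # second file (the document collection) served next to the index.
    by_ref = {str(doc[ref]): doc for doc in documents}
    hydrated = []
    for hit in results:
        doc = by_ref.get(hit["ref"])
        if doc is None:
            # The index and the document collection are out of sync: skip the hit.
            continue
        # Put the score last so it always reflects the search result.
        hydrated.append({**doc, "score": hit["score"]})
    return hydrated


documents = [
    {"id": "a", "title": "Mr. Green kills Colonel Mustard"},
    {"id": "b", "title": "Plumb waters plant"},
    {"id": "c", "title": "Scarlett helps Professor"},
]
# Hits shaped like the search('plant') output above: a ref and a score each.
hits = [{"ref": "b", "score": 1.599}, {"ref": "c", "score": 0.13}]
titles = [doc["title"] for doc in hydrate_results(hits, documents)]
```

Building the lookup dict once keeps each hit resolution constant-time, which matters when the same collection serves many queries.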

Loading a serialized index

You can also do the reverse operation, reading a serialized index produced by Lunr.py or Lunr.js, using the Index.load class method:

>>> import json
>>> from lunr.index import Index
>>> with open("idx.json") as fd:
...     serialized_idx = json.loads(fd.read())
...
>>> idx = Index.load(serialized_idx)
>>> idx.search("plant")
[{'ref': 'b', 'score': 1.599, 'match_data': <MatchData "plant">}, {'ref': 'c', 'score': 0.13, 'match_data': <MatchData "plant">}]
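When idx.json comes from another build step or another library, a quick sanity check before Index.load can turn a confusing failure into a clear error. A minimal sketch, assuming the top-level keys follow the lunr-schema layout (`version`, `fields`, `fieldVectors`, `invertedIndex`, `pipeline`); `check_serialized_index` and its `expected_version` parameter are hypothetical, not part of Lunr.py.

```python
import warnings
from typing import Any, Dict, Optional

# Top-level keys of a serialized index. Assumption: this follows the
# lunr-schema layout; check it against the schema version you target.
EXPECTED_KEYS = frozenset(
    {"version", "fields", "fieldVectors", "invertedIndex", "pipeline"}
)


def check_serialized_index(
    data: Any, expected_version: Optional[str] = None
) -> Dict[str, Any]:
    """Sanity-check parsed JSON before passing it to Index.load.

    Hypothetical helper for illustration, not part of Lunr.py.
    """
    if not isinstance(data, dict):
        raise ValueError("a serialized index must be a JSON object")
    # difference() accepts any iterable; iterating a dict yields its keys.
    missing = EXPECTED_KEYS.difference(data)
    if missing:
        raise ValueError(f"not a serialized Lunr index, missing: {sorted(missing)}")
    if expected_version is not None and data["version"] != expected_version:
        # Report a version mismatch as a warning rather than an error:
        # the index may still load, but results could differ.
        warnings.warn(
            f"index version {data['version']!r} != expected {expected_version!r}"
        )
    return data


# Usage (requires the lunr package and an idx.json on disk):
#     with open("idx.json") as fd:
#         idx = Index.load(check_serialized_index(json.load(fd)))
```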

Language support

Lunr.js uses the lunr-languages package, a community-driven collection of stemmers and trimmers for many languages.

Porting each of those into Python was not feasible, so Lunr.py uses NLTK for language support and configures the serialized index as Lunr.js expects, to ensure compatibility.

However, because of inherent differences in how these stemmers and trimmers are implemented, loading a Lunr.py index with language support into Lunr.js produces larger scoring differences than those observed with the base English implementation.
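To judge whether those scoring differences matter for your content, you can run the same queries against the index in both libraries and compare the outcome. A minimal sketch, assuming the Lunr.js results have been exported (for example with JSON.stringify) and loaded as a list of dicts with `ref` and `score` keys, like the Lunr.py results; `compare_results` is a hypothetical helper, not part of Lunr.py.

```python
from typing import Any, Dict, Iterable, Mapping


def compare_results(
    py_results: Iterable[Mapping[str, Any]],
    js_results: Iterable[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Summarize how two result lists for the same query differ.

    Hypothetical helper for illustration, not part of Lunr.py.
    """
    py_list, js_list = list(py_results), list(js_results)
    py_scores = {hit["ref"]: hit["score"] for hit in py_list}
    js_scores = {hit["ref"]: hit["score"] for hit in js_list}
    common = py_scores.keys() & js_scores.keys()
    return {
        # Identical ref order means users see the same ranking even when
        # the raw scores differ.
        "same_order": [h["ref"] for h in py_list] == [h["ref"] for h in js_list],
        "only_in_py": sorted(py_scores.keys() - js_scores.keys()),
        "only_in_js": sorted(js_scores.keys() - py_scores.keys()),
        # Largest absolute score gap among refs returned by both libraries.
        "max_score_diff": max(
            (abs(py_scores[ref] - js_scores[ref]) for ref in common), default=0.0
        ),
    }
```

Checking the ref order first is usually the more useful signal, since users see the ranking rather than the raw scores.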