sqlite has fairly decent full-text search, if you have it compiled in (and most reasonable places you can get your sqlites, they will have it). Only problem is that in order to rank the results—a rather common task when you’re doing full-text search in the first place—you need a bit of rocketsciencing. All very nicely documented mind you, but writing C for a web search app is not really my thing. Fortunately creating user-defined functions is also very easy to do from the python sqlite db-api interface, and also well documented. However, it’s not immediately obvious, and after having to re-learn how to do it, I thought I’d share.
So, here’s how I do it:
def make_rank_func(weights): def rank(matchinfo): # matchinfo is defined as returning 32-bit unsigned integers # in machine byte order # http://www.sqlite.org/fts3.html#matchinfo # and struct defaults to machine byte order matchinfo = struct.unpack("I"*(len(matchinfo)/4), matchinfo) it = iter(matchinfo[2:]) return sum(x*w/x for x, w in zip(zip(it, it, it), weights) if x) return rank def get_results(query): db = sqlite3.connect('tv.db') db.row_factory = sqlite3.Row db.create_function("rank", 1, make_rank_func((1., .1, 0, 0))) c = db.cursor() c.execute("select title, img, url from" " (select rank(matchinfo(tv)) as r, title, img, url" " from tv where tv match ?)" " where r > .03 order by r desc limit 10", (query,)) return c.fetchall()
the nested select there is because I don’t want to call the rank function multiple times per row.