cross-posted from: https://piefed.zip/c/commandline/p/1389995/cli-based-bookmark-manager-based-on-indexing-visited-sites-and-search-engine-like-queries

Funny thing: I just discovered Piefed doesn’t implement cross-posting. So I’m cross-posting from an alt.

In most of the cases where people use bookmarks, what I want is a search engine restricted to sites I’ve visited. I don’t know whether I’m dramatically different from other people, but by the time I’m looking for a site, I’ve forgotten its most distinctive attributes, and even my own tags often end up describing the wrong sorts of attributes. Tagging still works better for me than hierarchical organization, but what I really want is a sort of command-line search engine that searches only sites I’ve visited before.

I’ve frequently thought about building such a thing, but every time I do I think, “someone must have already built this.” So:

Does anyone know of a tool like bmm or buku, but which indexes the content of the page at the bookmarked URL, and has a command-line tool for querying the DB by keyword like a search engine, including stemming and lemmatization? It’d be like bmm/buku’s tag search, only the tags would be a search engine index of the page.

What I do not want:

  • a self-hosted, web-based UI search engine
  • a self-hosted bookmark manager; buku and bmm are already both fine tools, and I’m not trying to solve “access all my bookmarks from everywhere”. The latter I can do with rsync or syncthing.
  • a command-line bookmark manager… unless it conforms to the constraints above: queries should function on a full-text index of selected web pages. Again, buku and bmm would be fine if my tagging skills were better.
  • a crawler-based search engine

I do want:

  • the convenience of giving the tool a URL and having it auto-tag. buku does this, except that IME the resulting tags match how I remember things at search time even less well than my manual tags do.
  • some fuzziness in the search; my current problem is how constrained the searches are. This isn’t a failing of the tools; I simply have obtuse recollection skills. I tag “dog,pet,animal”, but when I’m looking for it, what I remember is “it’s got four legs”.
  • local, command-line
  • indexing the page at a given URL. Recursive indexing is optional; I probably wouldn’t use it, but if it’s there that’s fine. I just want to be able to limit the indexing to a single page.
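
To make the request concrete, here is a rough sketch of the kind of thing I mean, not an existing tool. It assumes Python whose bundled SQLite was built with FTS5 (common, but not guaranteed), and the names `open_index`, `index_html`, `add_url`, and `search` are made up for illustration. FTS5’s `porter` tokenizer gives stemming only, not true lemmatization, and the terms are ORed together to keep matching loose.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects page title and visible text, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.title = ""
        self._skip = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_title:
            self.title += data
        else:
            self.chunks.append(data)


def open_index(path=":memory:"):
    """Open (or create) the index; requires SQLite compiled with FTS5."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(url UNINDEXED, title, body, tokenize='porter unicode61')"
    )
    return conn


def index_html(conn, url, html):
    """Index one page's text under its URL, replacing any earlier copy."""
    parser = _TextExtractor()
    parser.feed(html)
    body = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    conn.execute("DELETE FROM pages WHERE url = ?", (url,))
    conn.execute(
        "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, parser.title.strip(), body),
    )
    conn.commit()


def add_url(conn, url):
    """Fetch a single page (no recursion) and index it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    index_html(conn, url, html)


def search(conn, query, limit=10):
    """Return (url, title) rows, best match first, for any of the query words."""
    # Quote each word so punctuation is not parsed as FTS5 query syntax.
    terms = ['"' + t.replace('"', '""') + '"' for t in query.split()]
    if not terms:
        return []
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? "
        "ORDER BY bm25(pages) LIMIT ?",
        (" OR ".join(terms), limit),
    )
    return rows.fetchall()
```

With something like this, `add_url(conn, "https://...")` would replace manual tagging, and `search(conn, "four legs")` would find the dog page as long as the page text itself mentions legs. It would not help when the words I remember appear nowhere on the page; that would need synonyms or embeddings on top.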

This is my last-ditch effort to find an existing tool; otherwise, I’m going to build it, because it’s not a hard problem. That is partly why I’m having trouble believing someone hasn’t already built it.