Great idea, and great suggestions, I must say. You really put a lot of effort into this, and you even offered a possible solution, which is something I rarely see in the support forums 🙂
I really like your idea of finding related items; it would actually be very useful. One huge issue I always face is that I have to consider what an average user's server architecture looks like, and tailor the plugin to it, balancing performance against the relevance of the results.
It's fun in a way, because I have to come up with interesting ways of getting around complicated queries to maintain as much performance as possible. One of these solutions is the index table architecture, which is similar to what the Relevanssi plugin uses. It generates lots of rows (I've seen installations with over a million records in the index table), yet the performance scales linearly, and the table is very efficient to query if it's done right. It's one of the things I learned about databases: in terms of performance, the amount of data matters less than a proper architecture and proper ways of querying that data.
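To make the index table idea a bit more concrete, here is a rough sketch in Python with SQLite. It is only an illustration, not the plugin's actual code: the table name `search_index`, its columns, and the helper functions `build_index` and `search` are all assumptions made up for this example.

```python
import sqlite3
from collections import Counter


def build_index(conn, posts):
    """Fill a hypothetical keyword index from (post_id, lang, text) tuples.

    Each row stores one term for one post: the table gets long,
    but every lookup is a simple match on the leading key column.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_index ("
        " term TEXT NOT NULL,"
        " post_id INTEGER NOT NULL,"
        " lang TEXT NOT NULL,"
        " occurrences INTEGER NOT NULL,"
        " PRIMARY KEY (term, lang, post_id))"
    )
    for post_id, lang, text in posts:
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        counts = Counter(text.lower().split())
        conn.executemany(
            "INSERT OR REPLACE INTO search_index"
            " (term, post_id, lang, occurrences) VALUES (?, ?, ?, ?)",
            [(term, post_id, lang, n) for term, n in counts.items()],
        )
    conn.commit()


def search(conn, term, lang):
    """Return (post_id, occurrences) pairs for a term, most frequent first."""
    rows = conn.execute(
        "SELECT post_id, occurrences FROM search_index"
        " WHERE term = ? AND lang = ?"
        " ORDER BY occurrences DESC, post_id",
        (term.lower(), lang),
    )
    return rows.fetchall()
```

Because the primary key starts with `term` and `lang`, the lookup in `search` can use the key's index instead of scanning post content, which is the main reason this kind of layout stays fast even with many rows.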
Anyway, I can see a possible way to achieve something similar to what you described using the index table, as it holds all the data required: keyword occurrence data, post IDs, language information, etc.
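As a rough illustration of how related items might be derived from that kind of data, here is a small sketch in Python with SQLite. It is one possible approach under my own assumptions, not a description of how the feature will be implemented: the `search_index` table layout, the `make_demo_index` and `related_posts` helpers, and the "count shared keywords" scoring are all hypothetical.

```python
import sqlite3


def make_demo_index(rows):
    """Build an in-memory stand-in for the index table.

    rows: (term, post_id, lang, occurrences) tuples. Schema is an assumption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_index ("
        " term TEXT NOT NULL,"
        " post_id INTEGER NOT NULL,"
        " lang TEXT NOT NULL,"
        " occurrences INTEGER NOT NULL,"
        " PRIMARY KEY (term, lang, post_id))"
    )
    conn.executemany("INSERT INTO search_index VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def related_posts(conn, post_id, lang, limit=5):
    """Rank other posts in the same language by number of shared keywords."""
    rows = conn.execute(
        """
        SELECT other.post_id, COUNT(*) AS shared_terms
        FROM search_index AS src
        JOIN search_index AS other
          ON other.term = src.term
         AND other.lang = src.lang
         AND other.post_id != src.post_id
        WHERE src.post_id = ? AND src.lang = ?
        GROUP BY other.post_id
        ORDER BY shared_terms DESC, other.post_id
        LIMIT ?
        """,
        (post_id, lang, limit),
    )
    return rows.fetchall()
```

A real implementation would likely need to weight terms (for example, ignoring very common words) rather than just counting overlaps, but the self-join on the term column shows why the index table already contains everything needed.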
This is going to happen sooner or later. I've already noted it as a feature for a future release, and everything needed to implement it in some way, without hurting performance, seems to be in place.
Ordering by clickthrough: I'm actually planning to implement this alongside the statistics rework. It should be available in the upcoming (major) release.