Hacker News
Mojeek – The alternative search engine that puts the people who use it first (mojeek.com)
110 points by red369 on July 21, 2024 | 58 comments


In their article “Independent and Unbiased Search Results” I believe Mojeek fundamentally misunderstands the goal of a search engine. There is no such thing as an unbiased search engine. The entire purpose of a search engine is to be biased towards relevance. The opposite of biased isn’t fair, it is random.

If you search for “cars” what should the engine show you? There are likely billions of pages containing the word. (And for some queries, the correct page won’t even contain an exact string match.) If you want to create a useful ranking, your first few results need to have identified the correct intent but also be sensitive to timeliness and location.

https://blog.mojeek.com/2018/08/independent-and-unbiased-sea...


> The entire purpose of a search engine is to be biased towards relevance. The opposite of biased isn’t fair, it is random.

The opposite of "biased" is "unbiased" - as in the title - meaning neutral or impartial here and I think you are possibly misinterpreting the context.

The blog entry you link to is from 2018, at the height of the culture wars when the big tech players including Google were being criticised for pushing political agendas in what they displayed to the user.

The (marketing) claim here is clearly suggesting Mojeek does not use an ideological filter.


I understand that the term “unbiased” exists, and that this article was written promising a lack of political bias. I purposefully chose an anodyne example of “cars” to illustrate that even in a context that’s not politically charged, promising “unbiased” results is a fool’s errand.

In particular, if you have any distribution of political opinion that is correlated with any other factor you consider in relevance calculations, there’s no way of avoiding accusations of bias. I’ll give two more examples.

If I search for “best baseball team” (or if you like, “mlb game today”) how many results should be from Red Sox fans? If I’m in Boston? If I’m in LA? There is obviously a desirable geographic bias to the query.

To take a more politically charged query, if I search for “Afghanistan war” there is a geographic bias (whether to show results for Soviet or American invasions depends somewhat on location), but also a temporal bias. Political support for American occupation decreased over time. So if your search engine favors recent results, they will carry more negative sentiment than the overall sentiment of all the articles it could have shown.

The final point is that these are not systems which are easily interpretable. Aside from explicit filtering, it’s entirely possible to start from reasonable rules like timeliness, text matching, and location matching, but still end up with skewed results and be accused of bias, because they are biased, just not in the way people think.
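To make the temporal point concrete, here is a minimal sketch with an invented toy corpus. The sentiment scores, the years, and the `half_life_years` parameter are all assumptions for illustration, not measurements. It compares the plain average sentiment of a corpus with the average a recency-weighted ranking would tend to surface.

```python
# Toy corpus: (publication year, sentiment score in [-1, 1]).
# Every value here is invented purely for illustration.
ARTICLES = [
    (2002, 0.6), (2003, 0.5), (2005, 0.3), (2008, 0.0),
    (2011, -0.2), (2014, -0.4), (2018, -0.5), (2021, -0.7),
]


def mean_sentiment(articles):
    """Unweighted average sentiment over the whole corpus."""
    return sum(s for _, s in articles) / len(articles)


def recency_weighted_sentiment(articles, now=2024, half_life_years=5.0):
    """Average sentiment with each article weighted by exponential recency decay."""
    weights = [0.5 ** ((now - year) / half_life_years) for year, _ in articles]
    total = sum(weights)
    return sum(w * s for w, (_, s) in zip(weights, articles)) / total


if __name__ == "__main__":
    print(f"corpus mean:      {mean_sentiment(ARTICLES):+.2f}")
    print(f"recency-weighted: {recency_weighted_sentiment(ARTICLES):+.2f}")
```

No rule in this sketch mentions politics, yet the recency-weighted figure comes out more negative than the corpus mean, simply because sentiment and publication date are correlated in the toy data.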


I'm not a sports fan, but if I search "best baseball team" I would expect to find the team with the most wins, or the best current set of players based on their historical stats, or the team most likely to win the current or next season. So maybe a time bias (want to know what team is best now, not 10 years ago), but not a geography bias. I want to avoid echo chambers. I don't want to be told the team closest to me is best just because it might make me feel happy.


Then why not search for "the baseball team with the most wins" or "best baseball team historically", "baseball team most likely to win in 2025", etc.?

With a query like the one in your example, someone else might be completely justified in expecting to find the team with the best financials, the best little-league option for their child, etc.

I believe such queries are a perfect case of GIGO, and the amount of effort spent on optimizing for them instead of giving users more agency is what's driving down both the experience (bad results) and usability (few or no operators and explicit filters) of search engines as tools.


> With a query like the one in your example, someone else might be completely justified in expecting to find the team with the best financials, the best little-league option for their child, etc.

They would be, yes. My point is: don't optimize for emotions, optimize for the most probable interpretation of the query, and then yes, if it can be misconstrued, give me options, and give me a boatload of filters. I've been using the time filter in Google recently; I didn't even realize they had it because it's so well hidden.


Yes, sadly, hiding functions and ignoring search operators have been a trend for Google.


I don't think things are this simple though. Spaces that attempt to have no moderation end up over-representing those who hit up against moderation elsewhere. Having moderation requires having an opinion, and the opinion ultimately becomes the product.


A search engine isn’t really a space though, it’s a tool. It’s not a social network.

The job of a search engine is to return relevant (by some metric) results given a query string, not to show you posts you’ll agree with or which the authors of the search engine find tasteful.

You might say that the relevance metric is a kind of bias, and you’re not wrong, but those biases can usually be empirically measured and understood, unlike the biases of unmoderated people in social networks.


I would indeed say that relevance measurement is the bias. You can build a bad search engine with simple rules like TF-IDF, but pretty much anything more than that needs opinion and bias.
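For readers unfamiliar with TF-IDF, a minimal sketch follows. It uses one common variant (term frequency normalized by document length, multiplied by `log(N / df)`); other weightings exist, and the three sample documents are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; deliberately naive."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Score each document against the query with plain TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append(score)
    return scores


# Invented sample documents; the second one is keyword-stuffed.
DOCS = [
    "used cars for sale near you",
    "cars cars cars cheap cars best cars",
    "the history of the bicycle",
]

if __name__ == "__main__":
    for doc, score in zip(DOCS, tf_idf_scores("cars", DOCS)):
        print(f"{score:.3f}  {doc}")
```

In this toy run the keyword-stuffed page scores highest for "cars", which is one reason a ranker built only on such rules tends to be a poor one.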

For example, if there's a content farm that rips off content from another site, assuming you don't want to show both the original and the farmed content (which would be a bad search engine by most people's metrics), you need an opinion on which the original was. Sometimes you might know for sure, but often you won't. Even if you do know who the original was, you need the opinion that the original is better than the copy. When you have content farms providing summarised versions, now you need an opinion on whether summarised content has value instead of/alongside the original – summaries do provide value, but how much?
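A rough sketch of the mechanical half of that problem, using word shingles and Jaccard similarity. This is one standard near-duplicate technique, not a claim about how any particular engine does it, and the page texts, the `k=3` shingle size, and the `0.5` threshold are assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.5):
    """Return pairs of page ids whose shingle sets overlap at or above the threshold."""
    sets = {pid: shingles(text) for pid, text in pages.items()}
    ids = sorted(sets)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if jaccard(sets[a], sets[b]) >= threshold
    ]


# Invented example pages.
PAGES = {
    "original": "how to change a flat tyre on a car in ten minutes",
    "farm": "how to change a flat tyre on a car in five minutes",
    "other": "a review of the best electric bikes this year",
}

if __name__ == "__main__":
    print(near_duplicates(PAGES))
```

The code can flag that two pages are near-copies, but nothing in it says which one is the original or which deserves to rank; that choice is still the opinion.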

None of this gets to politics or the legal requirements for handling things like hate speech or other content that is illegal in some/all regions, and you've already got essentially editorial control being exerted.

Social media moderation and search engine results are fundamentally the same problem, it's just that the base interaction is a search rather than viewing a following feed or viewing a video.


Yes, exactly. If a group of anarchists puts up spammy websites optimized for TF-IDF, blocking or downranking them suddenly means you are biased against anarchists.

The flip side is that history is littered with examples of rules which seem neutral on their face but which have very biased intentions. For example “literacy tests” for voting, or poll taxes.


Undoubtedly it's a difficult task, but I think there are degrees to which you can at least have a decent go at good faith moderation (eg see HN) and the article is appealing to a perspective that that good faith isn't there.


> If you want to create a useful ranking, your first few results need to have identified the correct intent but also be sensitive to timeliness and location.

This fundamentally misunderstands the goal of a search engine too. This part is mainly for helping to sell ads and not rank by relevance.

The correct way to rank results for a query like Rome is to return results related to a modern city, then ancient times, and keep the local pizza place at the bottom. Yelp/maps/search really don’t need to be all wrapped up in the same utility function, and honestly no one asked for this. It’s just been that way for so long that people have internalized the idea that it must be essential.

Localization, trending content, etc. mainly serve the needs of advertisers, trackers and marketers, not searchers.

News and product search can/should probably be orthogonal, but instead we’re going the opposite direction and increasingly removing distinctions between maps/video/image/text search. This helps platforms to erode the distinction between user-directed organic discovery vs promoted content.


We now prefer "information neutrality" to "unbiased" (which we used in 2018), but it is a bit of a mouthful and not widely understood. For more recent and nuanced views on this key topic see: https://blog.mojeek.com/2022/05/freedom-to-seek-matters.html


Agreed with this. Rewarding users doesn't make much sense in this context: I don't want to know what other users are publishing, I want to know what's the best result across the internet.

How much will "wikipedia" be using this search engine? They should probably be on the first page of a lot of queries but they wouldn't ever be using this.

It's definitely a cool idea, but I wouldn't call it a search engine; maybe something like a "community search engine"? There's definitely a good use for this, similar to Stack Overflow for Teams, where you can bias results towards your own network.


> If you search for “cars” what should the engine show you?

A definition of the word car, and a wikipedia link, should suffice. The more generic a search term is the fewer results it should surface, IMO.


There is a bias in who defines "relevancy". If the search engine implicitly sets the indexing and ranking algorithms, filters, etc., the results are biased.

If the user can control the filters, set their ranking algorithm, then the relevancy (bias) is in the user's control. I can tell the search engine what it needs to show.

There are billions of pages with the word "cars"? Alright, I want to see 20 pages that were indexed most recently, excluding sites that are flagged as spam/malware/SEO scams by a blocklist of my choice. I'd also add a contextual filter/category to exclude the animated movie and mechanical means of transport, as my intent is news about the band. No semantic engine, no synonyms this time, I only need exact matches, thanks.

Let me do all this with a GUI or at least with search operators that are always respected. And allow me to save these settings as bookmarklets.
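As a rough sketch of what such user-stated controls could look like in code: the `Result` fields, the `user_search` function, and the sample data are all hypothetical and describe no existing search engine's API. Only filters the user asked for are applied: exact word match, a chosen blocklist, excluded categories, newest first.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Result:
    """Hypothetical index entry; field names are assumptions for this sketch."""
    url: str
    text: str
    indexed_on: date
    category: str


def user_search(results, term, blocklist, excluded_categories, limit=20):
    """Apply only the filters the user stated, then sort by index date, newest first."""
    term = term.lower()
    kept = [
        r for r in results
        if term in r.text.lower().split()              # exact word match, no synonyms
        and urlparse(r.url).hostname not in blocklist  # user-chosen blocklist
        and r.category not in excluded_categories      # user-chosen context filter
    ]
    kept.sort(key=lambda r: r.indexed_on, reverse=True)
    return kept[:limit]


# Invented sample index.
RESULTS = [
    Result("https://music.example/cars-reissue", "The Cars announce reissue", date(2024, 7, 1), "music"),
    Result("https://spam.example/cars", "cheap cars cars cars", date(2024, 7, 20), "automotive"),
    Result("https://film.example/cars-3", "Cars sequel review", date(2024, 6, 1), "film"),
    Result("https://music.example/cars-tour", "Remembering The Cars on tour", date(2024, 7, 10), "music"),
    Result("https://auto.example/carsharing", "carsharing is growing", date(2024, 7, 15), "automotive"),
]

if __name__ == "__main__":
    hits = user_search(RESULTS, "cars", {"spam.example"}, {"film", "automotive"})
    for r in hits:
        print(r.indexed_on, r.url)
```

The whitespace tokenizer is deliberately naive (punctuation attached to a word would defeat the exact match), but every decision about what is shown is visible in the call's arguments.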

Instead, there's way too much effort going into these black boxes of manure, which results in groupthink and uncritical repetition of claims such as "you want to create a useful ranking, your first few results need to have identified the correct intent but also be sensitive to timeliness and location"

Here's the real bias: the company behind the search engine and its product managers are defining what is "useful", that it needs to "identify the correct intent" without giving the user any way to clearly state the intent, and that it needs to be "sensitive to timeliness and location" (what if it doesn't?). This is what Google does and SEOs keep parroting, while we look at the decrepit state of Google Search without wondering whether there might be a better way to build a search engine.

Sure, a search engine like that would not be a mass market product, but it would not give its users results that fit the lowest common denominator of a population with a median reading comprehension level of an 8th grader.


Any alternative search engine that has its own, independent index is worth supporting, imo.

Mojeek results have overall been useful the last couple of times I used it and the team actually responds to search feedback.


I'm very happy to see this exists! While I support Kagi, I believe they just use the Bing API for most search results. We need real alternatives with their own search index.

Of course, we need to temper our expectations. Google reportedly indexed over 400 billion pages in 2020, whereas Mojeek reported just 6 billion pages in 2023. So I'm not expecting to be wowed by Mojeek, but so far it doesn't seem too bad.

Actually what excites me about it is it seems to use a more basic keyword search, unlike Google and Bing which are quite liberal in how they interpret your search query. If you're looking for specific keywords or a specific string, you may have better luck using Mojeek.


Kagi displays what portion of results are from their own index - I almost always see 50% or higher on this. I think they also use a mix of several external indexes rather than solely Bing.

Kagi also has the advantage of letting you customise rankings to your preferences: I've got mine set to always pin a relevant wikipedia article to the very top, and to bubble MDN pages up and w3schools pages down.


> Kagi also has the advantage of letting you customise rankings to your preferences: I've got mine set to always pin a relevant wikipedia article to the very top, and to bubble MDN pages up and w3schools pages down.

How do you do this? I don't see it in the options.


For a search result you can click on the shield icon, and select the ranking preference for the results of that domain.

You can also manually add domains here: https://kagi.com/settings?p=user_ranked
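Conceptually, per-domain preferences like these amount to re-ranking results by a user-supplied table. The sketch below is purely hypothetical: the level names, the multipliers, and the base scores are assumptions, and it is not a description of how Kagi implements the feature.

```python
from urllib.parse import urlparse

# Hypothetical preference levels and multipliers (assumptions for this sketch).
BOOST = {"pin": float("inf"), "raise": 2.0, "normal": 1.0, "lower": 0.5, "block": 0.0}


def rerank(results, prefs):
    """Re-rank (url, base_score) pairs using per-domain preferences.

    Base scores are assumed to be positive; blocked domains are dropped.
    """
    def adjusted(item):
        url, score = item
        host = urlparse(url).hostname or ""
        return score * BOOST[prefs.get(host, "normal")]

    kept = [r for r in results if adjusted(r) > 0]
    return sorted(kept, key=adjusted, reverse=True)


# Invented results and preferences.
RESULTS = [
    ("https://www.w3schools.com/js/", 0.9),
    ("https://developer.mozilla.org/en-US/docs/Web/JavaScript", 0.6),
    ("https://en.wikipedia.org/wiki/JavaScript", 0.5),
    ("https://example.com/js", 0.7),
]
PREFS = {
    "en.wikipedia.org": "pin",
    "developer.mozilla.org": "raise",
    "www.w3schools.com": "lower",
}

if __name__ == "__main__":
    for url, _ in rerank(RESULTS, PREFS):
        print(url)
```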



Mojeek isn't mentioned on the up-to-date version of this page, so not really an indication whether they still use their index.


We do.


Ah, cool! Thanks for the clarification! :)


So... an ad-based search engine? Sounds familiar. The alternative needed is an alternative business model


Kagi exists. If you're not already using that then I'm not sure what other alternative business model you can ask for.


Indeed, I use Kagi daily. My point above was simply that this company looks like it will be subject to the same incentive problems as almost all other search engines, Kagi being the exception.


All we need is to show less SEO and more sites made with passion, that have communities involved in them, etc.


Aka things made by real people and not marketing drones.

Unfortunately, all the "real people" are only publishing things on closed-off social media sites instead of the wider internet, so the majority of things outside of those sites is either crap nobody wants to read, or mono-culture nerd/tech blogs from people like us.


Feels pretty good from just a few searches. I used DDG for some time as an alternative to Google. In the end I had to switch back to Google - the quality just wasn't as good as Google's. Mojeek appears to have been around for 20+ years... is the business model just to sell search APIs to others?


I've been using DDG for a few years now, but while I think the quality isn't there compared to Google 8 years ago, it seems reasonably similar to Google as of now.


That says more about Google dropping in quality than about ddg becoming any better.


Also, DDG has !bangs

Even if I were to use Google, I'd remain with DDG (using '!g'!) so I can use other bangs
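For anyone who hasn't used bangs: a `!keyword` in the query redirects the search to another site. A minimal sketch of the idea follows; the two-entry `BANGS` table and the `resolve` function are illustrative assumptions, and DDG's real table is far larger and is not reproduced here.

```python
from urllib.parse import quote_plus

# Tiny illustrative bang table (an assumption for this sketch, not DDG's actual list).
BANGS = {
    "g": "https://www.google.com/search?q={}",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
}
DEFAULT = "https://duckduckgo.com/?q={}"


def resolve(query):
    """Return the URL a query should go to, honoring the first recognized !bang."""
    words = query.split()
    for i, word in enumerate(words):
        if word.startswith("!") and word[1:] in BANGS:
            # Remove the bang itself and send the rest to the target site.
            rest = " ".join(words[:i] + words[i + 1:])
            return BANGS[word[1:]].format(quote_plus(rest))
    return DEFAULT.format(quote_plus(query))


if __name__ == "__main__":
    print(resolve("!g mojeek search"))
    print(resolve("rome history"))
```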


The business model includes ads, according to their website.


Lots of HN Q&A with their CEO from 4 years ago: https://news.ycombinator.com/item?id=25372401

I was a bit surprised that on my first visit to the main mojeek search page the "No tracking. Just search" text entry field was prepopulated with a few suggestions that are personally relevant, possibly from reddit searches I've made!

Is this something the browser manages, or is there some tracking after all, I wonder?


That's your browser


There certainly is a need for an alternative to Google search, maybe one that isn’t ad-based. It’s something that’s often discussed here on HN and elsewhere, whether it’s Brave Search, DDG, Kagi or whatever. The thing is, I doubt the next “Google search” killer will be a traditional search engine at all, like this Mojeek, and I can’t see something like Kagi attracting the masses (who have never paid for search) with a paid subscription; it’s very clear to me that theirs is a niche business model.

I don’t want to use the most overhyped word in today’s news, but my feeling is that the only way to kill Google search is not to compete with them directly but to offer something disruptive, like an LLM. Direct competition is a dirty and expensive business, and sooner or later you must turn a profit; who do you think the company will turn against in order to make one? The funny thing is that I believe Google knows this and is jeopardizing its own business by cramming Gemini into search.


> I can’t see something like Kagi attracting the masses

For me it’s less about paying and more about having to log in and attach a payment profile. I’m sure the creators of Kagi are nice people, I just don’t trust whoever they may someday sell it to.


I pay their highest package on one account which has hardly a search run with it, and just make new trial accounts in VMs for my searching generally. If they ever plug the hole, I stop paying and move on. I am paying max to tip them, maintain privacy, and not be a mooching degenerate.


Although niche, and not aiming to be a Google killer, Kagi is profitable (according to a blog post from May this year https://blog.kagi.com/what-is-next-for-kagi#1 ). So, who knows. Other search engines may actually live in a market owned by Google...


Mojeek is now my default search engine on most of my computers.

As Mojeek has its own crawler, one often gets completely different results to, say, Google, Bing and DuckDuckGo.

I've found stuff on it that I've not found on those others. I'm not suggesting you give up the others completely in favor of Mojeek, rather that you use it as an adjunct to them.


I put in a word that I know would bring back at least 1,000 results that use that word precisely, and in its entirety.

It didn’t bring back a single one. I got a whole lot of results that had parts of that word, or looked similar to that word, but nothing with an exact match.

That word is not only a well-used German slang for a common vegetable (so, present in hundreds of online recipes), which is also used throughout the Middle East (food reviews and online menus), but it’s also a part of several companies’ names (not related to the slang term) mainly because it’s also a German last name (online biographies and Wikipedia entries).

Clearly, this search engine has a long way to go before it comes close to being minimally functional.


Link straight to the search page: https://www.mojeek.com/

I'm definitely going to try this out along with my Google search, since Google is feeding me so much shit every day.


I’ve tried quotation marks and “inurl:” but couldn’t find the string of the dailymotion video it showed me from the previous search.

I really hope there is a spreadsheet somebody can link me to?

Edit: All the features work without JS, even the pictures, which is really nice, and it is fast.


Just testing it out, I searched "linux", and results are:

unix.stackexchange.com

Wikipedia article

archlinux.org

ubuntu.com

linux.com (finally)

It seems to favor online exposure over relevance a bit too much IMO.

For example Ubuntu is more popular than Arch, but Arch users visit their distro's website much more frequently.

And few Linux users ever visit linux.com.


This is a kind of query you'd never use in real life though. What is even your expectation when you search "Linux"? If there isn't a specific one, it's not a good test.


Good point. Now try "Linux download", and the first result is https://www.opera.com/download. While DuckDuckGo gives you https://www.linux.org/pages/download/

My point still stands (even more).


I will keep using it and comparing it with DuckDuckGo! It's kinda fun!


DuckDuckGo results:

linux.com

linux.com

Wikipedia article

redhat.com

ubuntu.com

linux.com (again)


Until the people with money offer them enough of it


I'm currently searching for an alternative to Google or Bing. I used DDG for a while but found Brave Search's [1] structured results more useful, so it is now my default. So far so good, but if Brave enshittifies their results I may consider paying for kagi.com.

[1] https://search.brave.com


Try Kagi for sure. It’s a tool in my belt.


Beyond a tool in my belt, Kagi has been my go to since the day I heard about it.


Search engines go through cycles of usefulness. I was using qwant for over a year, but their relevance has dropped off. I'm very happy with startpage at the moment as my Google searches continue to drop. Perplexity (AI) is providing better augmentation than Google.


yuk

I typed in the query "gambling industry news".

1st result was a category page on a junky affiliate site, with last post date Sunday 25 July 2010


Mojeek employee here, will raise this to be looked into, can you let me know which country you're searching from so we can replicate it? Cheers



