Searx is Awesome, but the Default Settings? Not So Much

Note: This list is accurate as of June 25, 2023. Please remember that the search engine landscape may have changed if you’re referring to this list in the future. Some engines might not work, new ones may have been added, and the ones listed here may behave differently. Feel free to submit any updates or improvements you find. While I might revise this list over time, be sure to customize your Searx instance to best fit your needs or your users’ needs.

Hello everyone,

I decided to write this guide after I saw on the forum that some people weren’t too happy with Searx. I’ve been there too. When I first shared my Searx instance with my family and friends, they weren’t exactly thrilled. They thought Google and DuckDuckGo did a better job. I felt a bit bummed out and wasn’t too keen on sharing my instance anymore.

One night, I decided to get serious about tweaking my Searx setup. I tested each engine, activated some, deactivated others, and the result was amazing. Now my Searx instance is top-notch and it’s my default homepage because I use it non-stop.

That being said, it is really important to manage expectations. Even after all my tweaks, Searx handles around 90% of my searches. The other 10%? Well, there are times when other search engines might do a better job. For example, if I’m trying to find a local restaurant or shop, I just search on Google or Google Maps. Searx just can’t compete with them on that. But I don’t let that get to me. I mean, Google is really good at those kinds of searches. For pretty much everything else, though, I’ve got Google beat.

Next, I share my list of activated engines. But remember, this is what works for me. I really encourage you to spend some time testing each engine with its shortcut (like !bi for ‘bing’) and figure out what fits your instance the best. Consider what your users might be after and make that a priority.

Before we dive into the engines, here are some quick tips:

  • If your users aren’t very tech-savvy, switch on the ‘Show advanced settings’ under the ‘General’ options tab to make the search categories visible. This could be handy because a lot of people might not even notice them otherwise.
  • I’ve experienced some downtime on a few engines, possibly because I was using an old version of Searx pre-upgrade. Some issues have been sorted, but you should definitely run your own tests.
  • Make changes to Searx’s settings.yml file for default settings that apply to new users, and for configuring your instance to open specific links on open-source web clients. I might share more about this in another post.

Now, let’s get into the engine specifics. Below, I’ve categorized the engines based on their utility and noted my experience with each. I’ve divided them into three categories - ‘Recommended,’ ‘Optional,’ and ‘Notes’.

General

This category is my go-to for most searches. I like a mix of results, including mainstream and alternative sources, as well as quick definitions, descriptions and links.

General - Recommended

  • ddg definitions: Handy for speedy definitions, descriptions, and related links - think Wikipedia and such.
  • wikidata: Great for getting some quick facts on whatever topic I’m digging into.
  • yahoo: Good mainstream results. Plus, the yahoo engine supports “Language” and “Time range” settings.
  • mojeek: This one’s a gem for getting some unique results. Mojeek does its own thing and doesn’t pull from other search engines. I’ve found a bunch of cool, smaller blogs through this.

General - Optional

  • wikipedia: I’ve turned this off since I already get Wikipedia results from ddg definitions. Turn it on if it works for you.
  • erowid: This one’s off my general search but handy with the shortcut for specialized searches on plants and plant medicines.
  • etymonline: This engine isn’t in my General search. I use the shortcut when I need information on word etymology.
  • duckduckgo: I’ve got this one off since I don’t see a huge benefit. DuckDuckGo’s web results come largely from Bing, the same source that powers Yahoo!, so the yahoo engine already covers most of it. Check if it gives you a wider range of results.
  • qwant: Solid search results. Had some downtime recently, but it’s back up and running now.
  • reddit: I’ve got this on because I find useful stuff now and then. But depending on what you’re after, you might want to keep it under ‘Social Media’.
  • wikibooks to wikivoyage: Off my general search but useful with the shortcut for specific searches. See which fit the profile you are going for.
  • dictzone: Also off my general search but handy with the shortcut when needed.

General - Notes

  • etools: I’ve got this on because it pulls good results from a bunch of other search engines. But as of writing this, it has stopped working. Hopefully, it will come back in a future update.

Files

This category helps me quickly find torrents and launch my torrent client right from the search results.

Files - Recommended

  • piratebay: It is my main file search engine. I get very good results for torrents.

Files - Optional

  • apk mirror: I have it deactivated because I don’t usually download apk files. But it works well and I recommend that you enable it if it suits your profile.
  • fdroid: Similar to apk mirror, I don’t need to download apk files often and I use the F-Droid app on my phone, so I keep it off. However, it performs well when needed, so consider activating it if it aligns with your requirements.
  • nyaa: It offers good results. I have it activated because I like having East Asian results, but it might not suit your profile, as it is a bit niche.
  • openrepos: It works but it is very situational. I have it deactivated because it doesn’t fit my profile.
  • tokyotoshokan: I keep this activated for its Japanese media results, which I like. If you enjoy Japanese media, I’d say it’s worth activating.

Files - Notes

  • btdigg: It provides solid results and, as of now, it’s running smoothly. However, until recently, it wasn’t working properly. Now, I have it activated.

Images

Just regular image search. Nothing fancy.

Images - Recommended

  • bing images: My go-to engine for image searches. Provides solid results and supports both “SafeSearch” and “Time range”.

Images - Optional

  • frinkiac: It works well but it is too situational for me. If you’re a big fan of “The Simpsons”, consider turning this one on.
  • nyaa: I’ve got this activated in “Files”, so I figured why not here too. Check out my notes under the “Files” section for more info. Just note that it operates just like it does in “Files”. Consider if it makes sense to have it in both categories for your needs.
  • reddit: I keep this on as I frequently find relevant images. Depending on what you’re looking for, you might find it more useful to just keep it in the ‘Social Media’ category.

Images - Notes

  • qwant images: Delivers quality results and it’s currently up and running, despite some recent downtime. It also supports “Language” search.
  • unsplash: Similar to qwant, it provides good results and is functional at the time of writing, despite some recent downtime.

IT

I use this category for troubleshooting programs and code whenever something breaks. And also to search the Arch wiki.

IT - Recommended

  • arch linux wiki: It is a great source for most Linux-related issues.
    I also keep the following engines activated to get more results:
  • bitbucket
  • free software directory
  • gentoo
  • gitlab
  • github
  • codeberg
  • searchcode code: I like it because it returns code snippets.

IT - Optional

  • npm: Currently it’s off as I find it kind of specific. But when needed, I just call it up with the shortcut.
  • framalibre: I have it deactivated because it doesn’t suit my use case. If you are interested in French content, go ahead.
  • rubygems: This one’s also switched off due to its situational use. I just use the shortcut when needed.

IT - Notes

  • lobste.rs: I keep it turned on since it fetches articles on IT subjects. It can be a bit finicky sometimes and may need a retry for results.

Map

Haven’t used this one as much as I’d like, as it doesn’t quite align with what I need.

Map - Recommended

  • openstreetmap
  • photon

Music

I use this all the time to grab links for mpv, listen to a track straight from the search results or even find lyrics.

Music - Recommended

  • genius: I activated it to find information about songs and lyrics.
  • soundcloud: You can listen to songs directly from your search results, which is pretty neat.

Music - Optional

  • deezer: It’s currently off, but if you have a subscription, it could be worth turning on.
  • mixcloud: Also turned off at the moment, but I’d suggest giving it a shot if you’re on the lookout for new artists.
  • youtube: I’ve got YouTube on as I like having music video results in my ‘Music’ category.

Music - Notes

The engines listed below work pretty much the same way as they do in “Files”. I personally keep them on, but have a look and see if it’s worth it for your needs, since they mostly provide file downloads and might make more sense to keep under “Files”.

  • btdigg
  • nyaa
  • piratebay
  • tokyotoshokan

News

Regular news search.

News - Recommended

  • bing news: Solid choice for mainstream news. Plus, it works with a “Time range” filter, which is handy.
  • wikinews: Great for historical news search.

News - Notes

  • google news: Also good for mainstream news and works with a “Time range” filter. Recently, it had issues on my instance.
  • qwant news: Good for mainstream news and supports “Language”, which can be super useful. Had some trouble with it recently, though.
  • reddit: It provides user-created, last-minute updates on news. I find it a decent resource for news.
  • yahoo news: Another good source for mainstream news. It had some downtime recently on my instance.

Science

I don’t really use this one a lot, but I want to offer my users as many research resources as I can. Mostly, I want them to have access to papers and articles.

Science - Recommended

  • arxiv
  • google scholar
  • openairedatasets
  • openairepublications
  • pdbe
  • pubmed
  • wolframalpha: Pretty good to research quick facts.

Social Media

Most of the time, I’m using this to search Reddit, with the added benefit of getting Voat posts.

Social Media - Recommended

  • reddit
  • voat

Videos

This category is my go-to for video content that I can play directly from search results.

Videos - Recommended

  • bing videos: Decent results. Supports “SafeSearch” and “Time range”.
  • sepiasearch: I have it activated to get alternative results. It supports “Language”, “SafeSearch” and “Time range”. Also, you can play videos from the search results.
  • youtube: This is my primary video search engine. Has a “Time range” option. Plus, you can watch the video right from the search results without opening the YouTube page.
  • dailymotion: I have it to get alternative results. It supports “Language”. You can play videos from the search results.
  • vimeo: I have it to get alternative results. You can play videos from the search results.
  • peertube: I have it to get alternative results. It supports “Language”, and videos are playable directly from search results.
  • 1337x: This one’s activated for downloading video files since 1337x isn’t included in the “Files” engines.

Videos - Optional

  • ccc-tv: I don’t have this one activated. But if you’re looking for German content, give it a go.

Videos - Notes

The following engines behave just like in “Files”. Personally, I have them activated. Check if it suits your use case. Remember, they only provide file downloads, so consider whether you want to keep them just under “Files”.

  • btdigg
  • nyaa
  • piratebay
  • tokyotoshokan

Web

As of now, the “Web” category only has one engine - DuckDuckGo. I’ve got it switched off, but even if I had it on, I’d put it under General.


Broken Engines

From time to time, a search engine will stop returning results for your Searx instance. There could be many reasons. As Searx serves search results, it collects per-engine statistics, which you can view under Preferences → Engines.

For example, I have currency activated, but its average time shows N/A. This tells me that I am not getting results from this engine. I would disable it for now so that Searx does not wait on an engine that never answers. If I like this engine, I can always try it again later.


500 Internal Server Error related to uwsgi buffer size

I reinstalled Searx with the default configuration after the Bookworm upgrade, and I can confirm that this problem may occur from time to time with the current versions of FreedomBox and Searx. After saving changes to Searx preferences in the web UI, such as enabling a search engine, you may see:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator at webmaster@localhost to inform them of the time this error occurred, and the actions you performed just before this error.

More information about this error may be available in the server error log.


Apache/2.4.57 (Debian) Server at myfreedombox.freedombox.rocks Port 443

The problem will persist in your web browser from this point forward.

There may be many reasons for a 500 Internal Server Error. This post documents one that can be clearly identified and corrected. There are two options: work around the problem in the web browser, or correct the problem in the system to reduce the chance that it happens again.

Root Cause: uwsgi buffer size is too small.

If you are not eager to do this diagnosis and correction, you may skip to the Web Browser Workaround at the bottom of this post. The Web Browser Workaround is helpful, but these instructions can completely prevent the problem from occurring in the future.

You may confirm this issue by checking the log at /var/log/uwsgi/app/searx.log after the problem occurs. In my case this log entry is not available through journalctl, so you’ll have to look at the log file directly. In a shell as root (or using sudo), run this command:

# grep skip /var/log/uwsgi/app/searx.log

This will return log entries similar to these…

Tue Jul 25 07:04:55 2023 - invalid request block size: 6103 (max 5632)...skip
Tue Jul 25 07:07:17 2023 - invalid request block size: 5657 (max 5632)...skip
Tue Jul 25 07:43:45 2023 - invalid request block size: 4993 (max 4096)...skip

… indicating that the uwsgi buffer-size is too small to handle the request made through searx. If you want to see the largest request which has failed in the log you can use this command:

# grep skip /var/log/uwsgi/app/searx.log | awk '{print $11}' | sort -n

This lists the failed request sizes in ascending numeric order, so the last line is the largest Searx request that has failed. Use this value as a guide when setting the uwsgi buffer-size under Repair Configuration.

Repair Configuration

Modify the /etc/uwsgi/apps-available/searx.ini file in a text editor as root adding this line:

buffer-size = 5632

Choose this value (5632 shown here) carefully, for system security reasons. It should be just large enough to handle your failed but legitimate Searx requests. Adding 10% or so to the largest failed request found with the commands in Root Cause is a reasonable starting point.
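If you would rather not do that arithmetic by hand, here is a minimal sketch of a shell helper. The function name suggest_buffer_size is my own invention, not part of Searx or uwsgi, and it assumes the log lines have the same layout as the entries shown in Root Cause, with the rejected request size in field 11. It prints the largest failed request size plus roughly 10%, rounded up.

```shell
# suggest_buffer_size LOGFILE
# Hypothetical helper: prints the largest rejected request size plus ~10%.
# Assumes the "...skip" log layout shown in Root Cause (size in field 11).
suggest_buffer_size() {
    grep skip "$1" | awk '
        $11 + 0 > max { max = $11 + 0 }   # remember the largest rejected size
        END {
            # add 10% using integer math, rounding up; print nothing if no match
            if (max > 0) print int((max * 11 + 9) / 10)
        }
    '
}
```

Run as root, for example: suggest_buffer_size /var/log/uwsgi/app/searx.log. For the three sample log entries above it prints 6714. Treat the result as a starting point and still weigh it against the security concern mentioned above.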

The top seven lines of my configuration file look like this, with the changes I made in lines 6 & 7. The remainder of the file is kept unmodified.

[uwsgi]
# Who will run the code
uid = www-data
gid = www-data

# increase allowable request size
buffer-size = 5632

After saving this change, restart Searx from Plinth (the FreedomBox web interface) and you should see these errors less frequently. If the problem happens again, repeat these steps and pick a larger buffer-size value to accommodate your requests.
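If you prefer to script the edit instead of opening a text editor, something like the following could work. This is only a sketch: set_buffer_size is a hypothetical helper name, and it assumes the file contains a [uwsgi] section header on its own line, as mine does. It removes any existing buffer-size line, adds the new one directly under [uwsgi], and leaves a copy of the original next to it with a .bak suffix.

```shell
# set_buffer_size FILE SIZE
# Hypothetical helper: sets "buffer-size = SIZE" in a uwsgi ini file.
# Assumes FILE has a "[uwsgi]" header line; without one, nothing is added.
set_buffer_size() {
    file=$1
    size=$2
    tmp=$(mktemp) || return 1
    awk -v size="$size" '
        /^buffer-size[ \t]*=/ { next }                 # drop any existing setting
        { print }                                      # keep every other line as-is
        /^\[uwsgi\]/ { print "buffer-size = " size }   # add the new value under the header
    ' "$file" > "$tmp" &&
        cp -p "$file" "$file.bak" &&                   # keep a backup of the original
        cat "$tmp" > "$file"                           # overwrite in place, keeping ownership and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Run as root, for example: set_buffer_size /etc/uwsgi/apps-available/searx.ini 6714. Restart Searx afterwards as described above, and restore the .bak copy if anything looks wrong.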

Web Browser Workaround

For users who don’t wish to modify the system configuration files, or do not have permission to do so, there is a partial solution that can be done from the web browser. I suspect that the request size increases with the number of enabled engines. You can make the problem less likely by enabling fewer engines through the Searx preferences web UI. However, once the problem occurs, you will continue to receive 500 Internal Server Errors until you apply this workaround.

  1. In your web browser settings page go to stored cookies.
  2. Find cookies set by your Freedombox.
  3. Remove all of them (This will require you to log in with your user name and password when you next connect).
  4. Log in to Freedombox again.
  5. Select the Searx app.
  6. You will no longer get the 500 Internal Server Error, and you will be using the system default Searx configuration.
  7. Enable fewer search engines in the Searx UI.

It looks like Searx sets many different cookies, so for simplicity I recommend removing all the cookies for your Freedombox. Don’t forget your password!
