RAM, swap, & stalling crawls: how much memory is enough and what to do about it?

WARNING: This is a stub begun by a non-expert. Use the ideas herein at your own risk. Correction & amplification welcomed.

TL;DR: 4 GiB of RAM apparently wasn't enough for me. Enabling an 8 GiB swap file didn't completely solve the problem, but it did seem to help.

For me, on an ultralight Linux built from the mini.iso of Ubuntu 16.04 LTS Xenial 64-bit with X & Openbox, 4 GiB of RAM wasn't enough for prolonged crawls. I monitored with htop and never SAW a low-memory condition, but I wasn't watching continuously, only now and then. Still, I'd find the crawl stalled, with an error message on the crawl monitoring page http://localhost:8090/Crawler_p.html to the effect that crawling was paused for low memory. Drive space wasn't the issue, so I presume RAM usage peaked while I wasn't looking, the crawl paused, and RAM use dropped back down before I checked again. I still can't PROVE that was the cause, but that's the way to bet.

For years I've routinely run without swap (that's "paging" in Windows-speak, I think) because my normal use never needed it. So I enabled an 8 GiB swap file, and my crawling has been stable since. The traditional recommendation to put swap on a separate partition is a very persistent idea, but on a modern 'nix there is no clear advantage over a swap file. The only way to make swap significantly faster, AFAIK, is to put it on a drive behind a separate controller, and that helps whether you use a dedicated swap partition or a swap file on that drive's filesystem.
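For anyone who wants to try the same fix, here is a rough sketch of how I'd set up an 8 GiB swap file on an Ubuntu-ish system. The path /swapfile and the 8G size are just examples; adapt them to your own setup, and double-check against your distro's documentation.

```sh
# Create an 8 GiB swap file (example path: /swapfile)
sudo fallocate -l 8G /swapfile
# Restrict permissions so only root can read/write it
sudo chmod 600 /swapfile
# Format it as swap and enable it immediately
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Verify that swap is active
swapon --show
free -h
```

Note that on some filesystems (btrfs in particular) swap files need extra care, so if you're not on a plain ext4 root, check your filesystem's notes first.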

Diagnostic Tools

Anyone exploring similar issues on a 'nix might want to try atop, which logs process and resource usage and lets you figure out what happened while you weren't looking. Its log files grow large over time, so you'll need some mechanism in place to deal with that, even if it's just manually zeroing or deleting them now and then.
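A quick sketch of how that might look on Ubuntu/Debian; the log path /var/log/atop/ and the example date are assumptions that may differ on your distro, and the 14-day retention is just a suggestion:

```sh
# Install atop and start its background logging daemon
sudo apt-get install atop
sudo systemctl enable --now atop

# Replay a past day's log to see what memory looked like while you weren't watching
atop -r /var/log/atop/atop_20190515

# Simple cleanup: delete logs older than 14 days (adjust retention to taste)
sudo find /var/log/atop/ -name 'atop_*' -mtime +14 -delete
```

Inside the replay, pressing 'm' sorts processes by memory use, which is handy for spotting what was eating RAM when the crawl paused.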