yacy_search_server/source/de/anomic/crawler
sixcooler 5f8a5ca32d - not doing merge-jobs while short on Memory
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-24 12:07:53 +00:00
..
retrieval less byte-arrays of response-content, less byte-array <-> stream conversion 2011-08-01 23:31:08 +00:00
Balancer.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 2011-07-15 08:38:10 +00:00
CrawlProfile.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 2011-07-03 23:55:55 +00:00
CrawlQueues.java - not doing merge-jobs while short on Memory 2011-08-24 12:07:53 +00:00
CrawlStacker.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 2011-05-27 08:24:54 +00:00
CrawlSwitchboard.java *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. 2011-07-03 23:55:55 +00:00
ImporterException.java added final where possible 2008-08-02 12:12:04 +00:00
Latency.java - refactoring of robots 2011-05-02 14:05:51 +00:00
NoticedURL.java added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer 2011-04-03 23:39:45 +00:00
ResourceObserver.java Implementation of strategies for controlling memory resources. 2011-08-22 17:50:03 +00:00
ResultImages.java - fixed a bug in crawl start with file name (npe in new url) 2011-04-18 16:11:16 +00:00
ResultURLs.java refactoring: moved all score-related classes to new ranking package 2011-08-22 22:37:53 +00:00
RobotsTxt.java - enhanced ybr ranking computation 2011-05-26 10:57:02 +00:00
RobotsTxtEntry.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 2011-05-27 08:24:54 +00:00
RobotsTxtParser.java - refactoring of robots 2011-05-02 14:05:51 +00:00
RSSLoader.java stop loading via http at defined maximum of bytes - even if size is unknown before loading 2011-08-01 23:28:23 +00:00
SitemapImporter.java hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: 2011-05-27 08:24:54 +00:00
ZURL.java changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. 2011-07-15 08:38:10 +00:00