yacy_search_server/source/de/anomic/crawler
Michael Peter Christen 659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD queue and not from virtual documents generated by the parser.
- The parser now generates description texts for NOLOAD entries, which should make it possible to find media content through the search index rather than through the media prefetch algorithm during search (which was costly).
- Removed the media-search prefetch process from image search
2012-04-24 16:07:03 +02:00
retrieval - Redesigned crawler and parser to accept embedded links from the NOLOAD 2012-04-24 16:07:03 +02:00
Balancer.java - Redesigned crawler and parser to accept embedded links from the NOLOAD 2012-04-24 16:07:03 +02:00
Cache.java refactoring 2012-04-24 12:54:41 +02:00
CrawlProfile.java fixed default must-match filter for full domain crawls - the old filter 2012-03-28 21:50:00 +02:00
CrawlQueues.java - Redesigned crawler and parser to accept embedded links from the NOLOAD 2012-04-24 16:07:03 +02:00
CrawlStacker.java new indexing strategy: ALL links that appear anywhere are indexed, not 2012-04-22 02:05:17 +02:00
CrawlSwitchboard.java - less automatic indexing after a search (needs to reset the default 2011-12-05 16:22:11 +01:00
ImporterException.java
Latency.java refactoring: removed dependency from switchboard in Balancer/CrawlQueues 2012-04-21 13:47:48 +02:00
NoticedURL.java refactoring: removed dependency from switchboard in Balancer/CrawlQueues 2012-04-21 13:47:48 +02:00
ResourceObserver.java more logging in resource observer 2012-02-23 01:20:42 +01:00
ResultImages.java ConcurrentLinkedQueue has a VERY long return time on the .size() method. 2012-02-27 00:42:32 +01:00
ResultURLs.java Removed handling of components objects for URIMetadataRows. 2011-12-17 01:27:08 +01:00
RobotsTxt.java produce a bookmark entry from every crawl start. these bookmarks are always private. 2011-11-21 23:10:29 +00:00
RobotsTxtEntry.java - enhanced logging in robots.txt parser for remote debugging 2011-11-16 01:03:49 +00:00
RobotsTxtParser.java fix for robot parser 2011-11-16 13:12:46 +00:00
RSSLoader.java refactoring: 2011-09-25 16:59:06 +00:00
SitemapImporter.java refactoring: 2011-09-25 16:59:06 +00:00
ZURL.java ConcurrentLinkedQueue has a VERY long return time on the .size() method. 2012-02-27 00:42:32 +01:00