yacy_search_server/source/net/yacy/crawler
Michael Peter Christen 8c3e5b7b6d added experimental pdf splitting which enables YaCy to split pdfs during
parsing into individual pages and add them all using different URLs.
These constructed URLs are generated from the source URL with an
appended page=<pagenumber> attribute in the URL's GET/POST properties,
which distinguishes the individual page entries. The search result list
then replaces that parameter with a URL anchor (#), so that
the original URL is presented in the search result. These
URLs can be opened directly on the correct page using pdf.js, which is
now built into Firefox. That means: if you find a search hit on page
5 and click on the search result, Firefox will open the PDF viewer and
show page 5.
2014-12-21 18:10:15 +01:00
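
The URL scheme described in the commit message above can be illustrated with a minimal sketch. This is not the code from the commit: the class name PdfPageUrls and the methods pageUrl and displayUrl are hypothetical, and the sketch assumes that the page attribute is appended as the last query parameter and that the anchor takes the form #page=<pagenumber>, which pdf.js accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCy: illustrates the page=<pagenumber> scheme.
final class PdfPageUrls {

    // Matches a trailing "page=<n>" query parameter, e.g. "?page=5" or "&page=5".
    private static final Pattern PAGE_PARAM = Pattern.compile("[?&]page=(\\d+)$");

    /** Builds the per-page URL by appending page=<n> to the query string. */
    static String pageUrl(String sourceUrl, int page) {
        String separator = sourceUrl.indexOf('?') >= 0 ? "&" : "?";
        return sourceUrl + separator + "page=" + page;
    }

    /** Replaces a trailing page=<n> parameter with a #page=<n> anchor for display. */
    static String displayUrl(String indexedUrl) {
        Matcher m = PAGE_PARAM.matcher(indexedUrl);
        if (!m.find()) {
            return indexedUrl; // no page parameter: leave the URL unchanged
        }
        return indexedUrl.substring(0, m.start()) + "#page=" + m.group(1);
    }

    public static void main(String[] args) {
        String indexed = pageUrl("http://example.org/doc.pdf", 5);
        System.out.println(indexed);             // http://example.org/doc.pdf?page=5
        System.out.println(displayUrl(indexed)); // http://example.org/doc.pdf#page=5
    }
}
```

Because the anchor is not sent to the server, the display URL still fetches the original document, and the viewer alone uses the page number.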
data reactivated on-demand snapshot loading 2014-12-16 12:09:57 +01:00
retrieval added experimental pdf splitting which enables YaCy to split pdfs during 2014-12-21 18:10:15 +01:00
robots more ipv6 bugfixes 2014-10-08 15:21:49 +02:00
Balancer.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
CrawlStacker.java ViewFile servlet: update index if newer, 2014-12-05 01:13:37 +01:00
CrawlSwitchboard.java enhanced the snapshot functionality: 2014-12-09 16:20:34 +01:00
HarvestProcess.java fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
HostBalancer.java reduce number of calls to queue.size() because that may be a bottleneck 2014-11-23 20:09:32 +01:00
HostQueue.java more stacks shall be considered for on-demand loading, not only 2014-11-23 20:11:23 +01:00
LegacyBalancer.java special strategy for balancer: do not remove targets with zero wait time 2014-04-18 06:50:07 +02:00