yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later be transcoded into jpg for image previews. To create such pdfs you must do: Add wkhtmltopdf and imagemagick to your OS, which you can do: On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from http://wkhtmltopdf.org/downloads.html and download http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip In Debian do "apt-get install wkhtmltopdf imagemagick". Then check in /Settings_p.html?page=ProxyAccess: "Transparent Proxy" and "Always Fresh" - this is used by wkhtmltopdf to fetch web pages using the YaCy proxy. Using "Always Fresh" it is possible to get all pages from the proxy cache. Finally, you will see a new option when starting an expert web crawl. You can set a maximum depth for crawling which should cause a pdf generation. The resulting pdfs are then available in DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf		2014-12-01 15:03:09 +01:00
Cache.java	more IPv6 bugfixes	2014-10-06 17:44:27 +02:00
CrawlProfile.java	YaCy can now create web page snapshots as pdf documents which can later	2014-12-01 15:03:09 +01:00
CrawlQueues.java	reduce number of calls to queue.size() because that may be a bottleneck	2014-11-23 20:09:32 +01:00
Latency.java	- added a new Crawler Balancer: HostBalancer and HostQueues:	2014-04-16 21:34:28 +02:00
NoticedURL.java	reduce number of calls to queue.size() because that may be a bottleneck	2014-11-23 20:09:32 +01:00
ResultImages.java	fix for image alt attachment to AnchorURLs in html parser.	2014-08-01 12:04:15 +02:00
ResultURLs.java	migrated the index export methods from the old metadata to solr. Now	2013-01-24 12:39:19 +01:00
Snapshots.java	YaCy can now create web page snapshots as pdf documents which can later	2014-12-01 15:03:09 +01:00
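
The snapshot layout and tool chain described in the latest commit message can be sketched as follows. This is a minimal illustration and not the code in Snapshots.java: the class name `SnapshotSketch` and its methods are hypothetical, every path segment is passed in as a plain string because the listing does not say how `<depth>`, `<shard>` or `<date>` are derived or formatted, and the proxy address is an assumption that depends on the local YaCy configuration. The only external options used are `--proxy` of wkhtmltopdf and the plain `convert <input> <output>` form of ImageMagick.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of YaCy: shows the path layout and command lines only.
public class SnapshotSketch {

    // Composes <dataRoot>/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf
    // from already-formatted segments (how YaCy formats them is not shown in the listing).
    static Path snapshotPath(Path dataRoot, String host, String port, String depth,
                             String shard, String urlHash, String date) {
        return dataRoot.resolve("HTCACHE").resolve("SNAPSHOTS")
                .resolve(host + "." + port)
                .resolve(depth)
                .resolve(shard)
                .resolve(urlHash + "." + date + ".pdf");
    }

    // Command line that lets wkhtmltopdf fetch the page through an HTTP proxy
    // (here assumed to be the YaCy proxy) and write the pdf to the target path.
    static List<String> wkhtmltopdfCommand(String proxyHost, int proxyPort, String url, Path pdf) {
        return Arrays.asList("wkhtmltopdf", "--proxy",
                "http://" + proxyHost + ":" + proxyPort, url, pdf.toString());
    }

    // Command line that transcodes the pdf into a jpg preview with ImageMagick.
    static List<String> convertCommand(Path pdf, Path jpg) {
        return Arrays.asList("convert", pdf.toString(), jpg.toString());
    }

    public static void main(String[] args) {
        // All values below are made-up placeholders for illustration.
        Path pdf = snapshotPath(Paths.get("DATA"), "example.org", "80", "0",
                "ab", "hashvalue", "20141201");
        System.out.println(pdf);
        System.out.println(wkhtmltopdfCommand("localhost", 8090, "http://example.org/", pdf));
        System.out.println(convertCommand(pdf, Paths.get("preview.jpg")));
        // To actually run a command: new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

The sketch only builds the command lists, so the path logic can be checked without wkhtmltopdf or ImageMagick installed. Handing a list to `ProcessBuilder` runs the tool.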