yacy_search_server/source/net/yacy/data
Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
be transcoded into jpg for image previews. To create such pdfs, do the
following:

Install wkhtmltopdf and imagemagick on your OS:
On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from
http://wkhtmltopdf.org/downloads.html and download
http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip
In Debian do "apt-get install wkhtmltopdf imagemagick"

Then enable "Transparent Proxy" and "Always Fresh" in
/Settings_p.html?page=ProxyAccess. wkhtmltopdf relies on these settings
to fetch web pages through the YaCy proxy. With "Always Fresh" enabled
it is possible to get all pages from the proxy cache.
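
The two external tools could be driven from Java roughly as sketched below. This is only an illustration, not the code YaCy itself uses: the class name SnapshotCommand, the proxy address localhost:8090 and the file names are assumptions. The sketch relies on wkhtmltopdf's --proxy option and on ImageMagick's convert command.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the YaCy source tree.
class SnapshotCommand {

    // Argument list for rendering a URL to a pdf through an HTTP proxy.
    static List<String> wkhtmltopdfArgs(String proxyHostPort, String url, String pdfPath) {
        return Arrays.asList("wkhtmltopdf", "--proxy", proxyHostPort, url, pdfPath);
    }

    // Argument list for transcoding a pdf into a jpg with ImageMagick.
    static List<String> convertArgs(String pdfPath, String jpgPath) {
        return Arrays.asList("convert", pdfPath, jpgPath);
    }

    // Runs a command and returns its exit code; requires the tool to be installed.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Assumed values: a YaCy proxy on localhost:8090 and example file names.
        List<String> render = wkhtmltopdfArgs("localhost:8090", "http://example.org/", "example.pdf");
        List<String> preview = convertArgs("example.pdf", "example.jpg");
        // Only print the commands here; call run(...) to actually execute them.
        System.out.println(String.join(" ", render));
        System.out.println(String.join(" ", preview));
    }
}
```

Building the argument lists separately from running them keeps the sketch usable on machines where the tools are not installed.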

Finally, you will see a new option when starting an expert web crawl.
You can set a maximum crawl depth up to which pdfs should be generated.
The resulting pdfs are then available in
DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf
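
As a minimal sketch of the layout above, the path can be assembled from its components as follows. The class name SnapshotPath and all sample values are hypothetical. The actual formats of the depth, shard, urlhash and date fields are not stated here, so they are treated as opaque strings.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the YaCy source tree.
class SnapshotPath {

    // Follows DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf
    static Path of(String host, int port, String depth, String shard,
                   String urlhash, String date) {
        return Paths.get("DATA", "HTCACHE", "SNAPSHOTS",
                host + "." + port,               // <host>.<port>
                depth,                           // <depth>
                shard,                           // <shard>
                urlhash + "." + date + ".pdf");  // <urlhash>.<date>.pdf
    }

    public static void main(String[] args) {
        // All values below are placeholders, not real YaCy data.
        System.out.println(of("example.org", 80, "DEPTH", "SHARD", "URLHASH", "DATE"));
    }
}
```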
2014-12-01 15:03:09 +01:00
..
list fix for xml blacklist import 2013-02-08 15:12:10 +01:00
wiki - the webgraph shall store all links which appear on a web page and not 2013-09-15 00:30:23 +02:00
ymark YaCy can now create web page snapshots as pdf documents which can later 2014-12-01 15:03:09 +01:00
BlogBoard.java - the webgraph shall store all links which appear on a web page and not 2013-09-15 00:30:23 +02:00
BlogBoardComments.java - the webgraph shall store all links which appear on a web page and not 2013-09-15 00:30:23 +02:00
BookmarkDate.java - the webgraph shall store all links which appear on a web page and not 2013-09-15 00:30:23 +02:00
BookmarkHelper.java added an option to set 'obey nofollow' for links with rel="nofollow" 2014-07-18 12:43:01 +02:00
BookmarksDB.java - removed old metadata database and all migration code 2014-01-20 18:31:46 +01:00
DidYouMean.java enhanced didyoumean 2014-02-04 00:18:11 +01:00
Diff.java fixed generics warnings for generic array instantiation that appeared 2014-05-20 21:50:16 +02:00
ListManager.java refactoring 2012-09-21 15:48:16 +02:00
MessageBoard.java added missing @Override annotation 2014-03-28 13:48:37 +01:00
Translator.java use config value htroot in Jetty init (was hardcoded) 2014-02-27 00:23:34 +01:00
URLLicense.java - the webgraph shall store all links which appear on a web page and not 2013-09-15 00:30:23 +02:00
UserDB.java fix ConfigAccounts del user with uppercase letter in name 2014-08-05 01:27:27 +02:00
WorkTables.java toString fixes 2014-10-05 11:03:57 +02:00