Commit Graph

221 Commits

Author SHA1 Message Date
reger
de56d934b2 apply query parameter getQueryFields() to GSA servlet 2015-02-27 00:53:20 +01:00
reger
9b0de2de64 introduce getQueryFields to return default query fields (query parameter QF)
calculated from boostfields config, making sure title, description, keywords and content are always searched.
- apply change to solrServlet to make sure every remote query uses at least all locally defined boost fields for search
- apply to local solr search
- simplify select query by using QF defaults
2015-02-23 23:12:07 +01:00
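The idea behind getQueryFields() can be illustrated with a minimal sketch. It is not the code from this commit: the class name, the method name, the field names and the neutral boost of 1.0 are assumptions chosen for illustration, and the real schema field names may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryFieldsSketch {
    // Hypothetical field names standing in for title, description, keywords and content.
    static final String[] ALWAYS_SEARCHED = {"title", "description", "keywords", "content"};

    /** Builds a Solr-style qf value ("field^boost field^boost ...") from configured boosts. */
    static String buildQueryFields(Map<String, Float> configuredBoosts) {
        Map<String, Float> fields = new LinkedHashMap<>(configuredBoosts);
        for (String field : ALWAYS_SEARCHED) {
            // Assumption: fields missing from the boost config get a neutral boost of 1.0.
            fields.putIfAbsent(field, 1.0f);
        }
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : fields.entrySet()) {
            qf.add(e.getKey() + "^" + e.getValue());
        }
        return qf.toString();
    }
}
```

With only a title boost configured, the other three fields are appended, so a query that names no fields still searches all four.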
reger
23924348e2 url with semicolon or comma handling in proxy request
apply patch supplied with bugreport http://mantis.tokeek.de/view.php?id=540
2015-02-07 22:01:54 +01:00
reger
9025fe3518 upd error message for proxy
fix http://mantis.tokeek.de/view.php?id=539
2015-02-07 00:37:43 +01:00
Michael Peter Christen
b5ac29c9a5 added a html field scraper which reads text from html entities of a
given css class and extends a given vocabulary with a term consisting
of the text content of the html class tag. Additionally, the term is
included into the semantic facet of the document. This allows the
creation of faceted search to documents without the pre-creation of
vocabularies; instead, the vocabulary is created on-the-fly, possibly
for use in other crawls. If any of the term scraping for a specific
vocabulary is successful on a document, this vocabulary is excluded for
auto-annotation on the page.

To use this feature, do the following:
- create a vocabulary on /Vocabulary_p.html (if not existent)
- in /CrawlStartExpert.html you will now see the vocabularies as column
in a table. The second column provides text fields where you can name
the class of html entities where the literal of the corresponding
vocabulary shall be scraped out
- when doing a search, you will see the content of the scraped fields in
a navigation facet for the given vocabulary
2015-01-30 13:20:56 +01:00
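The scraping step described in this commit can be sketched roughly as follows. This is an illustration only, not YaCy's scraper: it uses a regular expression instead of a real HTML parser, so it breaks on nested elements with the same tag name, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassScraperSketch {
    // Matches <tag ... class="..." ...>inner</tag>; deliberately simplistic.
    private static final Pattern ELEMENT = Pattern.compile(
            "<(\\w+)[^>]*\\bclass=\"([^\"]*)\"[^>]*>(.*?)</\\1>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the text content of every element that carries the given css class. */
    static List<String> scrape(String html, String cssClass) {
        List<String> terms = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        while (m.find()) {
            // The class attribute may list several classes separated by whitespace.
            boolean hasClass = Arrays.asList(m.group(2).trim().split("\\s+")).contains(cssClass);
            if (!hasClass) continue;
            // Strip inner tags and collapse whitespace to get the literal term.
            String text = m.group(3).replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) terms.add(text);
        }
        return terms;
    }
}
```

Each returned string would then be a candidate term for the vocabulary assigned to that class in /CrawlStartExpert.html.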
Michael Peter Christen
bee5ee7cce removed some warnings 2015-01-27 17:00:20 +01:00
Michael Peter Christen
4c9d2a7c64 reverted 'do not show all options' strategy. This is actually confusing
new users. May be activated again if there is an optional
tutorial mode which can be switched on for this special purpose of
running a tutorial.
2015-01-20 18:18:12 +01:00
reger
4eb89d7f15 revert clickservlet
(default was indeed a mistake)
2015-01-05 09:10:20 +01:00
Michael Peter Christen
c9e2128260 please commit new files under your own name, this file was not created
by me.
2015-01-05 08:18:19 +01:00
reger
d44d8996d0 Added a “don't store remote search results” option
This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. 
The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules).
Downside for the local peer is that search speed will not improve if search terms are only available remotely or by quick hits in the local index.

To still be able to improve the local index, a Click-Servlet option was added.
If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index.
The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks)

The option check-boxes are placed in ConfigPortal.html
2015-01-04 11:10:45 +01:00
reger
6a04563578 Init Jetty using setDefaultDescriptor (web.xml) to defaults/web.xml
so web.xml in defaults dir is applied first and optional DATA/SETTINGS/web.xml loaded on top.
By using this Jetty feature (default web.xml) we ensure that changes to the default are applied to existing installations
and individual additions/changes are still respected.
2014-12-27 00:10:14 +01:00
reger
1f9389396a fix NPE related 500 (Bad Request) response of UrlProxy on blacklisted urls,
by adding parameter HTTPDeamon and removing unused hostAddress lookup code in sendRespondError
2014-12-25 02:21:45 +01:00
reger
f856edecb6 fix proxy redirect (http status 302) response
fixes http://mantis.tokeek.de/view.php?id=517

The url given in the bug report uses a gzip input stream which causes HTTPClient.writeto() to throw an IOException due to an incomplete input stream. This in turn prevents the 302 response to the client browser.
Serving the target content only on httpstatus=200 lets the proxy pass on the header response, so the client browser's redirect settings can be honored.
2014-12-23 02:01:03 +01:00
Michael Peter Christen
28683530cd fixes to usage of no-cache: use and recognize also the no-store
directive
2014-12-19 17:37:58 +01:00
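A minimal sketch of recognizing both directives in a Cache-Control header value. The class and method names are hypothetical and this is not the code from the commit; it only shows the kind of check the message describes.

```java
import java.util.Locale;

class CacheControlSketch {
    /** Returns true if the header value contains a no-cache or no-store directive. */
    static boolean forbidsCachedCopy(String cacheControl) {
        if (cacheControl == null) return false;
        for (String directive : cacheControl.split(",")) {
            String name = directive.trim().toLowerCase(Locale.ROOT);
            // Drop an optional argument, as in no-cache="Set-Cookie".
            int eq = name.indexOf('=');
            if (eq >= 0) name = name.substring(0, eq).trim();
            if (name.equals("no-cache") || name.equals("no-store")) return true;
        }
        return false;
    }
}
```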
Michael Peter Christen
c9c700b510 reduction of http requests to YaCy using the correct cache-control,
expires and last-modified headers in http response.
2014-12-19 11:51:14 +01:00
Michael Peter Christen
1cfddea578 added (very experimental) Solr response writer for snapshot image
results
2014-12-16 13:18:49 +01:00
Michael Peter Christen
3354cd63be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-12-15 23:32:57 +01:00
reger
63846ddb89 add final SolrQueryRequest.close to SolrServlet 2014-12-15 22:54:49 +01:00
Michael Peter Christen
578ae29f1e added a note that the servlet is linked using web.xml 2014-12-15 05:56:12 +01:00
reger
6c3f36def1 - fix path to default heuristic.cfg
- deprecate unused ProxyServlet
2014-12-14 21:27:45 +01:00
reger
ff18129def ViewFile servlet: update index if newer,
so viewed text and metadata (stored) info is similar
- to achieve it, use request with profile to allow indexing (defaultglobaltext) and update index
   (the resource is loaded, parsed anyway, so it's not an expensive operation)

Request: remove 2 unused init parameters
- number of anchors of the parent
- forkfactor sum of anchors of all ancestors
2014-12-05 01:13:37 +01:00
Michael Peter Christen
226aea5914 added a servlet which can create preview images, preview thumbnails and
preview pdfs from web pages, i.e.:
http://localhost:8090/api/snapshot.png?url=http://yacy.net/en/&width=128&height=128
http://localhost:8090/api/snapshot.jpg?url=http://yacy.net/en/&width=128&height=128
http://localhost:8090/api/snapshot.pdf?url=http://yacy.net/en/

This supports also an on-the-fly generation of the preview documents if
the user is an administrator. Otherwise, the servlet fails.
To enable this, you must add wkhtmltopdf, imagemagick and (on headless
servers) xvfb to your operating system.

for detailed instructions, see
97f6089a41
2014-12-03 11:45:48 +01:00
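The request URLs listed in this commit can be assembled programmatically. The sketch below follows the three example URLs (width and height only for png and jpg). The helper name is hypothetical, and percent-encoding the url parameter is an assumption: the examples in the commit message pass the target URL unencoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SnapshotUrlSketch {
    /** Builds a snapshot request URL such as <peerBase>/api/snapshot.png?url=...&width=..&height=.. */
    static String snapshotUrl(String peerBase, String ext, String pageUrl, int width, int height) {
        // Assumption: the servlet decodes a percent-encoded url parameter.
        String encoded = URLEncoder.encode(pageUrl, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(peerBase)
                .append("/api/snapshot.").append(ext)
                .append("?url=").append(encoded);
        // The pdf example in the commit message carries no size parameters.
        if (!ext.equals("pdf")) {
            sb.append("&width=").append(width).append("&height=").append(height);
        }
        return sb.toString();
    }
}
```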
Michael Peter Christen
97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
be transcoded into jpg for image previews. To create such pdfs you must
do:

Add wkhtmltopdf and imagemagick to your OS, which you can do:
On a Mac download wkhtmltox-0.12.1_osx-cocoa-x86-64.pkg from
http://wkhtmltopdf.org/downloads.html and download
http://cactuslab.com/imagemagick/assets/ImageMagick-6.8.9-9.pkg.zip
In Debian do "apt-get install wkhtmltopdf imagemagick"

Then check in /Settings_p.html?page=ProxyAccess: "Transparent Proxy" and
"Always Fresh" - this is used by wkhtmltopdf to fetch web pages using
the YaCy proxy. Using "Always Fresh" it is possible to get all pages
from the proxy cache.

Finally, you will see a new option when starting an expert web crawl.
You can set a maximum depth for crawling which should cause a pdf
generation. The resulting pdfs are then available in
DATA/HTCACHE/SNAPSHOTS/<host>.<port>/<depth>/<shard>/<urlhash>.<date>.pdf
2014-12-01 15:03:09 +01:00
Michael Peter Christen
c0f9f6ac66 added option to change the navbar-default, i.e. usable for dark skins 2014-11-26 18:01:35 +01:00
Michael Peter Christen
f1f74e8626 toString fix 2014-11-24 20:53:40 +01:00
reger
e4316e2d74 skip creation of local var in proxyhandler.storetocache 2014-11-09 04:17:14 +01:00
sixcooler
d8fcc4a2f5 added a timeout on Jetty connectors 2014-10-16 20:36:12 +02:00
Michael Peter Christen
92c5d97486 fix for bad node flag setting with IPv6 2014-10-07 22:16:18 +02:00
Marc Nause
1e6e69bc40 Finished implementation of UPNP:
*) will try other ports if YaCy standard ports are not available
*) distinguish between internal and external port (not sure if this
works 100%)

Still to add: property in config to enter own external port (in case of
manually configured NAT)
2014-10-07 13:10:06 +02:00
reger
fe9f1c594e fix char encoding parameter in UrlProxy 2014-10-03 08:51:23 +02:00
Michael Peter Christen
528f583d72 ipv6 fixes 2014-10-01 15:32:10 +02:00
Michael Peter Christen
247e626083 IPv6 host parsing bugfixes 2014-10-01 10:21:03 +02:00
Michael Peter Christen
6491270b3a large IPv6 redesign of peer ping methods!
removed preferred IPv4 in start options and added a new field IP6 in
peer seeds which will contain one or more IPv6 addresses. Now every peer
has one or more IP addresses assigned, even several IPv6 addresses are
possible. The peer-ping process must check all given and possible IP
addresses for a backping and return the one IP which was successful when
pinging the peer. The ping-ing peer must be able to recognize which of
the given IPs are available for outside access of the peer and store
this accordingly. If only one IPv6 address is available and no IPv4,
then the IPv6 is stored in the old IP field of the seed DNA.
Many methods in Seed.java are now marked as @deprecated because they had
been used for a single IP only. There is still a large construction site
left in YaCy now where all these deprecated methods must be replaced
with new method calls. The 'extra'-IPs, used by cluster assignment had
been removed since that can be replaced with IPv6 usage in p2p clusters.
All clusters must now use IPv6 if they want an intranet-routing.
2014-09-30 14:53:52 +02:00
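The backping selection described in this commit can be sketched as below. The names are hypothetical and the actual network check is abstracted into a predicate; this is not the code in Seed.java or the peer-ping process.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class BackpingSketch {
    /**
     * Tries every candidate address of a peer (IPv4 and IPv6) and returns the
     * first one for which the backping succeeded, or empty if none did.
     */
    static Optional<String> firstReachable(List<String> candidateIps, Predicate<String> backping) {
        for (String ip : candidateIps) {
            if (backping.test(ip)) return Optional.of(ip);
        }
        return Optional.empty();
    }
}
```

A caller would pass all addresses from the seed and a predicate that performs the real ping, then store the returned address as the one reachable from outside.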
orbiter
a922b122a3 added a hack to forward solr search results from an external attached
solr to the YaCy built-in solr search servlet. It's not complete and not
fully correct (there is still a utf8 encoding problem) but it is a way
to easily get requests forwarded through YaCy to an external Solr.
2014-09-22 15:28:54 +02:00
Michael Peter Christen
0838326a76 changed error message, see http://mantis.tokeek.de/view.php?id=439 2014-09-13 17:02:26 +02:00
orbiter
aa6cdc4ab5 speed-up of start process if remote DNS waits for timeout 2014-09-07 12:28:19 +02:00
Michael Peter Christen
57ce7eeff3 fixed localhost authorization and replaced the adminRealm with an info
string which is visible in the browser. That makes it possible that the
browser instructs the user how to change a forgotten admin password
(during runtime).
2014-09-02 13:15:19 +02:00
reger
c7335318eb remove unused legacy procedure from httpserver
(deleted  generateSocketAddress(port) )
2014-08-31 00:33:05 +02:00
Michael Peter Christen
eab0d3e1a9 bugfix for wrong lock display, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5321&p=30484#p30484
2014-08-28 12:50:45 +02:00
orbiter
49d4f95faf bugfix to latest commit 2014-08-27 00:16:50 +02:00
orbiter
68211f8244 enable Crawler_p servlet if a rss feed or a wiki dump import was
submitted.
2014-08-27 00:15:31 +02:00
orbiter
b4f2a1db6e added an unlock icon for all protected pages that are unlocked because
the administrator is logged in.
2014-08-19 19:58:31 +02:00
Michael Peter Christen
6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
reger
47f201a6b8 Add Solr default query fields (&qf) to select servlet
according to the ranking profile's boost fields defined by the peer (if df/qf is not specified in query).
This allows for pretty simple queries (q=word) without the need to know about the specific index configuration.
It makes sure all relevant fields (as determined by the index owner) are searched, while still maintaining the option to query specific fields,
and does not rely on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
2014-07-21 00:47:14 +02:00
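The fallback rule from this commit (apply default query fields only if neither df nor qf is given) could look roughly like this. The class and method names are hypothetical, and request parameters are modelled as a plain map rather than Solr's own parameter classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DefaultQfSketch {
    /** Returns a copy of the request parameters with qf filled in when df and qf are both absent. */
    static Map<String, String> withDefaultQf(Map<String, String> params, String defaultQf) {
        Map<String, String> out = new LinkedHashMap<>(params);
        // An explicit df or qf from the caller always wins over the peer's defaults.
        if (!out.containsKey("df") && !out.containsKey("qf")) {
            out.put("qf", defaultQf);
        }
        return out;
    }
}
```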
reger
b24572f304 fix GSA filter query assignment
- use more parameter constants
2014-07-13 00:11:17 +02:00
reger
665e12f88e move startup time from old serverCore to switchboard (most used here)
to make servercore eventually obsolete.
2014-07-10 02:17:56 +02:00
reger
32bd2a61c1 add local ip to AbstractRemoteHandler local hostname cache 2014-07-10 02:09:26 +02:00
Michael Peter Christen
c7995d3e2a increased fixed limit for http POST request sizes to 100MB 2014-06-26 11:58:07 +02:00
Michael Peter Christen
2626c8f6db using concurrency to do base64 encoding in file POST commands 2014-06-20 13:55:15 +02:00
orbiter
0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-06-15 12:38:52 +02:00