Commit Graph

137 Commits

Author SHA1 Message Date
Michael Peter Christen
d2b8f2b477 enhancements for staticIP and ipv6 handling 2014-01-27 13:48:20 +01:00
Michael Peter Christen
6ada0daae9 making latency_factor and maximum number of same hosts in loader queue
settings available in Crawler_p.html servlet for steering.
2014-01-21 19:28:00 +01:00
sixcooler
40a4030b55 configurable max-load values for YaCy-Threads:
try lower values on smal systems like a Pi
2014-01-21 17:04:22 +01:00
Michael Peter Christen
022c6d3ce1 do YaCy p2p connections using a timeout-request which covers the http
request into a separate thread and ignores the furthure result of a
request if that does not answer within the requested time-out. This is a
try to solve a problem with the peer-ping, which hangs whenever a peer
appears to be dead or blocked.
2014-01-19 15:21:23 +01:00
reger
ea7cef5d05 fix NPE in TemplateEngine
StackTrace For input string: ""
java.lang.NumberFormatException: For input string: ""
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:504)
	at java.lang.Integer.parseInt(Integer.java:527)
	at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:241)
	at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:199)
	at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:896)
2014-01-10 18:11:32 +01:00
reger
6932aa4d7a use configured admin-username for api calls
- the admin user name can be configured, in apiExec calls the default "admin" username is used. 

TODO: the bin/apicall.sh script should likely take that into account.
2014-01-07 21:26:50 +01:00
orbiter
3cb6c7861f fixed shutdown authenticaton problem 2014-01-06 01:48:54 +01:00
Michael Peter Christen
7005ecdabd cleanup 2014-01-05 15:06:40 +01:00
Michael Peter Christen
9bd71fdbb4 made the access tracker class static because it shall be used by the
jetty auth module
2014-01-05 05:04:28 +01:00
reger
4c38bceafc handle http connect for proxy
refactor header cleanup (reuse existing code)
2014-01-04 13:09:34 +01:00
reger
0583f44306 reimplement proxy access log (to Jetty ProxyHandler)
- using existing HTTPDProxyHandler logger
- allow local loopback ip to access proxy
2014-01-02 03:37:33 +01:00
reger
8eaabb9600 remove dependency from old serverCore.java
- remaining getPortNr not needed 
  (as current release allows only to set plain integer as port,
   see ConfigBasic)
2013-12-29 02:00:44 +01:00
Michael Peter Christen
667a6adddb - use default files from yacy.init property "defaultFiles" if no
jetty-configuration is given for default files.
- fix a problem with default paths if no path is given (i.e.
http://localhost:8090 instead of http://localhost:8090/). Without this
patch the path was resolved automatically to http://localhost:8090//
2013-12-26 23:59:04 +01:00
Michael Peter Christen
e17624b6dd added html retrieval from alternative DATA/HTDOCS path 2013-12-23 02:06:33 +01:00
Michael Peter Christen
07cee6b99c removed more unused code 2013-12-23 01:51:48 +01:00
Michael Peter Christen
84167adb49 removed unused anomichttpd code after migration to jetty 2013-12-23 01:23:40 +01:00
Michael Peter Christen
7603e879dc Merge branch 'master' into HEAD
Conflicts:
	.classpath
	source/net/yacy/cora/federate/solr/SolrServlet.java
2013-12-20 01:19:06 +01:00
Michael Peter Christen
25250405f1 solr servlet preparation for join with jetty branch 2013-12-20 00:45:58 +01:00
reger
f111f30ace Merge origin/master into jetty 2013-11-17 00:18:25 +01:00
orbiter
ff86cb683f fixed some XSS bugs reported by Marius from http://ctf365.com/ 2013-11-16 20:34:31 +01:00
Michael Peter Christen
087df05e24 added option to Config_Network_p.html to enable remote search while
DHT-Receive is switched off.
2013-11-13 13:38:01 +01:00
reger
b29d262e70 implement Jetty8HttpServerImpl.generateSocketAddress
(code 1:1 copied from serverCore)
2013-11-10 18:59:18 +01:00
reger
f017066197 Merge origin/master into jetty 2013-10-27 15:09:24 +01:00
reger
69599566f9 catch one more malformed url in proxy url rewrite 2013-10-27 04:42:33 +01:00
reger
605530fec5 catch proxy url rewrite exception
malformed url (" http:\/\/" ) may cause error response
 testcase http://localhost:8090/proxy.html?url=http://dictionary.reference.com/browse/test
2013-10-27 04:06:11 +01:00
reger
1adb4b8741 merge rc1/master 2013-10-16 03:02:21 +02:00
reger
0d4efabaa8 fix YaCy version string in proxy headers
(config parameter vString not longer used)
2013-10-13 17:56:53 +02:00
reger
a44eede8b8 merge rc1/master 2013-10-11 01:50:25 +02:00
sixcooler
61f627eb85 fix for ssl-connections from proxy-usage staying in close-wait-state
+ some extra 'close' in HttpClient
2013-10-10 20:57:37 +02:00
reger
e74f548551 make legacy http server (serverCore) implement YaCyHttpServer interface 2013-10-09 01:07:22 +02:00
reger
71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6
- introduce a YaCyHttp interface to modulize/separate http server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
     - putting the version specific code in classes starting with Jetty8xxxx
     - moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets

- adjust other test cases/classes
2013-10-09 00:40:48 +02:00
reger
c7c706fd9f merge with rc1/master 2013-09-30 03:46:39 +02:00
Michael Peter Christen
82bfd9e00a - crawl profiles shall be deleted from active and passive stacks if they
are deleted to terminate the crawl because otherwise the crawl will go
on after the load-from-passive stack policy.
- better check if a crawl is terminated using the loader queue.
2013-09-26 10:22:31 +02:00
Michael Peter Christen
179ad281f9 close include byte buffer after usage 2013-09-23 12:19:51 +02:00
reger
70c51775ae Merge remote-tracking branch 'origin/master' into jetty 2013-09-22 02:09:02 +02:00
Michael Peter Christen
5e31bad711 - the webgraph shall store all links which appear on a web page and not
all unique links! This made it necessary, that a large portion of the
parser and link processing classes must be adopted to carry a different
type of link collection which carry a property attribute which are
attached to web anchors.
- introduction of a new URL class, AnchorURL
- the other url classes, DigestURI and MultiProtocolURI had been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
2013-09-15 00:30:23 +02:00
reger
127adbf5cf remove references to 10_http thread (legacy http server)
and add needed get/set function to jetty http server wrapper
2013-09-12 22:02:11 +02:00
Michael Peter Christen
765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
in intranets and the internet can now choose to appear as Googlebot.
This is an essential necessity to be able to compete in the field of
commercial search appliances, since most web pages are these days
optimized only for Google and no other search platform any more. All
commercial search engine providers have a built-in fake-Google User
Agent to be able to get the same search index as Google can do. Without
the resistance against obeying to robots.txt in this case, no
competition is possible any more. YaCy will always obey the robots.txt
when it is used for crawling the web in a peer-to-peer network, but to
establish a Search Appliance (like a Google Search Appliance, GSA) it is
necessary to be able to behave exactly like a Google crawler.
With this change, you will be able to switch the user agent when portal
or intranet mode is selected on per-crawl-start basis. Every crawl start
can have a different user agent.
2013-08-22 14:23:47 +02:00
Michael Peter Christen
47b1c81d08 - refactoring
- generalized writing of url attributes to solr documents
- added more url attributes to error documents
2013-08-20 15:46:04 +02:00
Michael Peter Christen
76afcccaaf fix for default boolean post values: the default value MUST NOT be TRUE,
because it's normal that a boolean value is missing in the post argument
if a checkbox is not selected.
Added also some style enhancements to IndexFederated, removed the Solr
attachment manual and replaced it with a link to the wiki which explains
this in more detail.
2013-07-31 10:49:26 +02:00
Michael Peter Christen
31902f54df fix for NPE which happens within solr code at MultiMapSolrParams.java,
line 52 in case that the array arr.length == 0
2013-07-30 14:32:59 +02:00
Michael Peter Christen
cf12835f20 replaced the single-text description solr field with a multi-value
description_txt text field
2013-07-30 12:48:57 +02:00
Michael Peter Christen
4c242f9af9 always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
2013-07-25 12:17:29 +02:00
Michael Peter Christen
336f86394c replaced StringBuffer with StringBuilder 2013-07-23 12:21:27 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
jdk-based logger tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is a add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
Michael Peter Christen
64140f35cd fix for solr requests if no query part is given (prevent npe) 2013-06-28 13:16:25 +02:00
Michael Peter Christen
e441a9d4c8 to avoid confusion, the gsa api is available at /search? and
/searchresult?
2013-06-18 16:22:06 +02:00
Michael Peter Christen
291912ee52 removed misleading http accessGranted message (this is only for
debugging)
2013-06-12 00:16:28 +02:00
Michael Peter Christen
8dbc80da70 redesign of index.exist-test: this shall now not be done using a single
id to be tested, but with a collection of ids. This will cause only a
single call to solr instead of many. The result is a much better
performace when testing the existence of many urls. The effect should
cause very much less IO during index transmission, both on sender and
receiver side.
2013-05-17 13:59:37 +02:00
Michael Peter Christen
c91c67c3cd reject bad solr requests 2013-05-15 22:42:05 +02:00
orbiter
4baa0d4a97 Added a default keystore for ssl encryption of the YaCy web interface.
This will enable https-access to YaCy, but this feature is disabled by
default using the new server.https=false attribute. This has two
purposes:
- make it easier for everyone to use https (just set server.https=true)
- provide the basis for secure yacy-to-yacy communication in the future
2013-05-10 12:02:31 +02:00
Michael Peter Christen
ad050ec88d - upgraded httpclient, httpcore and httpmime
- removed httpclient 3.1 which has been used by solrj < 4.x.x and is now
not used any more
- fixed some parts in YaCy which used methods from httpclient 3.1
2013-05-09 00:22:45 +02:00
Michael Peter Christen
cc90f82dbb increased default proxy client timeout to one minute 2013-05-06 14:58:18 +02:00
Michael Peter Christen
0af7803367 added more features to ScoreMap (pretty toString) 2013-04-29 19:28:17 +02:00
Michael Peter Christen
c1a2175fbc added transparency to gif image animation and the integration to the
YaCy httpd for on-the-fly generated gifs (including animated gifs)
2013-04-21 12:29:05 +02:00
Michael Peter Christen
b3a54d5b1c fix for wrong class name in log 2013-03-15 09:35:57 +01:00
reger
6ae30f9d0f replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter 2013-03-10 05:22:18 +01:00
Michael Peter Christen
35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full
success: try to comment out the solr search request in
yacy-portalsearch.js
2013-02-25 14:31:50 +01:00
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
The default schema uses only some of them and the resting search index
has now the following properties:
- webgraph size will have about 40 times as much entries as default
index
- the complete index size will increase and may be about the double size
of current amount
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index will cause that some old parts in YaCy can be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times of document index in size!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get the full access to the new index, the API to solr has now two
access points: one with attribute core=collection1 for the default
search index and core=webgraph to the new webgraph search index. This is
also avaiable for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
Michael Peter Christen
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
structure, but is not filled yet. To have the opportunity of a second
core, multi-core functionality had to be implemented to the
deep-embedded solr:
- migrated the solr_40 directory content to a subdirectory
'collection1'; the previously used default core is now called
collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in
/IndexSchema_p.html
- added instance handling as addition to solr connections: all solr
connectors are now instances of an solr 'instance' object; this required
a complete re-design of the solr embedding
- migrated also caching and sharding ontop of new instance handling
- migrated the search apis to handle now the access to a specific core,
the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores;
for the yacy remote search the default core is now called 'solr'; using
the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be
used after this migration!
- redesign of solr instance handling in all methods which access the
instances: they cannot hold copies of these instances any more; the must
retrieve the actuall connection object every time they want to write to
it (this solves also some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema', the old solr.keys.list is
replaced by solr.collection.schema
2013-02-21 13:23:55 +01:00
Michael Peter Christen
d3508fa8ff fixed json search, quotes, auto-facets, urls etc. for
yacyinteractive.html
2013-02-13 00:01:38 +01:00
Michael Peter Christen
16d90859b7 reverted put-semantics back to as-usual in serverObjects and introduced
an add-method to put in several objects for the same key
2013-02-12 11:52:33 +01:00
Michael Peter Christen
762b687e47 extended the serverObjects to be able to hold multipel values for a
single key. This is done using the solr class MultiMapSolrParams. That
class is needed in the OpensearchResultWriter to get multiple facet
requests.
2013-02-11 22:12:15 +01:00
Michael Peter Christen
d70d99fab5 added more metadata fields and facets to OpensearchResponseWriter.
This should make it possible to replace the original and enriched yacy
opensearch result with a solr output in opensearch format.
2013-02-11 22:10:14 +01:00
Michael Peter Christen
f5fd2aea18 removed archaic migration code 2013-01-21 17:59:42 +01:00
orbiter
1f33c30d7b re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy
2012-12-10 20:44:29 +01:00
reger
f1a9c2e604 fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215
2012-12-10 20:02:35 +01:00
Michael Peter Christen
72f165d58b added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.
2012-12-02 16:54:29 +01:00
Michael Peter Christen
8fc3679c66 using more pre-compile pattern for split methods 2012-11-26 13:11:55 +01:00
Michael Peter Christen
d6b82840f8 added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignatue.
As a result, a signature of the document is written to the solr search
index. Additionally for each time when a signature is written, it is
checked if the singature exists already in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because here the first time a long number is used as value in the Solr
store, also the value naming in the YaCySchema had to be adopted and
normalized. This caused that many files had to be changed.
2012-11-21 18:46:49 +01:00
Michael Peter Christen
a15819fbec fix for some interface problems 2012-11-05 22:14:52 +01:00
Michael Peter Christen
0833937c1c better balancing and duetime-cumputation also for no-delay intranet
hosts
2012-10-30 11:28:49 +01:00
Michael Peter Christen
3d3d654e88 if a network configuration is choosed which does not allow DHT and no
P2P communication is in robinson mode) then some menu entries are
disabled which have no use in this mode.
2012-10-29 01:51:19 +01:00
Michael Peter Christen
a33e2742cb - removed unnecessary synchronized and deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
2012-10-28 19:56:02 +01:00
Michael Peter Christen
f2d0418218 because the new PngEncoder had a problem with the PixelGrabber which is
caused by a JRE bug, the PixelGrabber had to be circumvented using an
own frame buffer which can be read without a PixelGrabber. This resulted
in ultra-fast and much less memory-consuming transformation. YaCy images
are now generated really fast!
2012-10-25 17:59:20 +02:00
Michael Peter Christen
21fe8339b4 - enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
613cf7da7f enhancement to post argument parsing - possible fix to zero-filled
parameter values
2012-10-11 10:46:06 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of &amp; parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
f8a3ab2d82 added the usage of synonyms to the GSA search interface 2012-10-02 14:29:45 +02:00
Michael Peter Christen
280e36c90b allow Cross-Origin Resource Sharing for all stream servlets, that is the
solr and the gsa search interface. That means that all JavaScript in
browsers now can Cross-Origin access all YaCy search interfaces, which
opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over'
concepts.
2012-09-27 12:02:24 +02:00
Michael Peter Christen
1533bfd63b refactoring 2012-09-25 21:20:03 +02:00
Michael Peter Christen
872f83ebe0 refactoring 2012-09-25 21:04:58 +02:00
Michael Peter Christen
d9ebf4a40f a bit more logging 2012-09-24 15:01:44 +02:00
Michael Peter Christen
5683162bd3 simplifications in DHT Distribution class and more documentation 2012-09-24 12:01:09 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00