8673 Commits

Author SHA1 Message Date
sixcooler
83c93e9209 no translation of queue-links 2012-07-25 15:35:13 +02:00
Michael Peter Christen
6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is
written. Also done some code-cleanup.
2012-07-25 01:53:47 +02:00
Michael Peter Christen
315d83cfa0 cleanup 2012-07-24 22:16:56 +02:00
Michael Peter Christen
1f41d9c6f5 bugfix for a NPE 2012-07-24 17:29:32 +02:00
Michael Peter Christen
76202f068e extended abstraction of local and remote solr index using one front-end
for index administration and querying.
2012-07-24 17:23:29 +02:00
Michael Peter Christen
d3f243e2e1 fixed node type calculation for principal peers 2012-07-23 23:40:50 +02:00
Michael Peter Christen
7ec7341f60 added user-authentication protection to solr search (same as implemented
for yacysearch)
2012-07-23 21:43:14 +02:00
Michael Peter Christen
e2a97ef8f6 better explain how to access the embedded solr 2012-07-23 21:31:12 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
Michael Peter Christen
cba4ab862e fix for http://bugs.yacy.net/view.php?id=202 2012-07-23 00:36:18 +02:00
Michael Peter Christen
b76836db7b Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-07-23 00:35:14 +02:00
reger
36c9875b6e removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid) 2012-07-23 00:00:40 +02:00
Michael Peter Christen
0640a6f7e6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-22 21:50:44 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
6cc5d1094e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-21 13:34:57 +02:00
orbiter
05a3ffd03a patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa
2012-07-20 11:47:50 +02:00
orbiter
5a3c829872 embedded solr is only initiated if it is activated with
IndexFederated_p.html
2012-07-20 11:40:33 +02:00
Michael Peter Christen
161005ceaa Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-20 09:04:14 +02:00
Michael Peter Christen
bf4968d748 source change in classpath 2012-07-20 09:04:02 +02:00
Lotus
3a350a2f83 partial html fix for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454
2012-07-20 08:53:12 +02:00
orbiter
49ee31f837 added classpath for htroot/solr 2012-07-20 00:59:58 +02:00
Michael Peter Christen
97b7bcf2a6 added a solr search index
- by default, an (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. this fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
play with different search fields as you can see in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
2012-07-19 11:34:05 +02:00
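
The test queries in the commit message above can also be assembled programmatically. The following is a minimal sketch, not code from the YaCy repository: the class `SolrSelectUrl` and its `build` method are hypothetical names, and only the base address and the `q` parameter come from the commit message. Because the query is URL-encoded, the colon appears as `%3A`, which decodes to the same `field:term` query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of YaCy): builds a select URL for the
// embedded solr servlet described in the commit message.
final class SolrSelectUrl {
    // Base address as given in the commit message; adjust host and port as needed.
    static final String BASE = "http://127.0.0.1:8090/solr/select";

    // Builds a "field:term" query and URL-encodes it into the q parameter.
    static String build(String field, String term) {
        String query = field + ":" + term;
        try {
            return BASE + "?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("*", "*"));         // match-all query
        System.out.println(build("text_t", "Help")); // search in the text_t field
    }
}
```

Other field names listed in /IndexFederated_p.html can be passed as the first argument in the same way.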
Michael Peter Christen
f0a079ac9f allow larger log entries 2012-07-14 16:28:14 +02:00
Michael Peter Christen
9b48c9fe2e removed a crawler overhead (terminated loop which searches greatest
stack that has zero-waiting urls). This should cause a slightly faster
crawl for crawl stacks with many different domains in the crawl queue.
2012-07-14 13:11:04 +02:00
Michael Peter Christen
784a4abb18 enhancement in internal data organization which should generate less
synchronizations in database access
2012-07-14 13:09:44 +02:00
Michael Peter Christen
f78ce93a80 collection of speed and memory saving hacks 2012-07-13 21:15:38 +02:00
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
a196f24f60 prevent enqueueing of non-loggable logging entries 2012-07-12 19:42:42 +02:00
orbiter
482afed07c reduced logging overhead (a bit) 2012-07-12 19:23:40 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
58e7d1952f reduction of logging to prevent too much IO caused by logging 2012-07-12 02:08:11 +02:00
Michael Peter Christen
83da68c4c1 fixed a memory leak inside the logger which appeared if the log was
written faster than the logger is able to print it to its output
stream. A very large collection of unwritten log outputs had been seen
during strong crawling. The new ArrayBlockingQueue is limited to prevent
this case.
2012-07-12 01:23:04 +02:00
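
The idea of a size-limited `ArrayBlockingQueue` in the commit above can be illustrated as follows. This is a sketch under assumptions, not the YaCy logger: the class `BoundedLogQueue` is a hypothetical name, and the choice to drop entries when the queue is full is an assumption, since the commit message does not say whether the real logger drops or blocks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch (not the YaCy logger): a log buffer with a fixed
// capacity, so unwritten entries cannot pile up without limit.
final class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private long dropped = 0;

    BoundedLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() returns false when the queue is full instead of growing it.
    // Assumption: a rejected entry is simply counted and discarded.
    synchronized boolean enqueue(String entry) {
        boolean accepted = queue.offer(entry);
        if (!accepted) dropped++;
        return accepted;
    }

    // Returns the oldest buffered entry, or null if none is waiting.
    String poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    synchronized long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        BoundedLogQueue log = new BoundedLogQueue(2);
        log.enqueue("first");
        log.enqueue("second");
        boolean accepted = log.enqueue("third"); // queue is already full
        System.out.println("accepted=" + accepted + " size=" + log.size()
                + " dropped=" + log.dropped());
    }
}
```

A blocking `put()` would be the alternative design; it preserves every entry but slows the producer down to the speed of the writer.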
Michael Peter Christen
e3aa05b9dd added creation of subpath pattern when crawl start is 'from file' 2012-07-11 23:18:57 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
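
The replacement described in the commit above can be sketched like this. The class `EmptyCheck` and its method names are hypothetical and do not come from the YaCy source. Both checks give the same answer; `isEmpty()` states the intent directly, and for some collections such as `ConcurrentLinkedQueue`, whose `size()` has to traverse the queue, it also avoids unnecessary work.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration (not YaCy code) of the size() > 0 to !isEmpty() change.
final class EmptyCheck {
    // Old style: asks for the full element count just to compare it with zero.
    static boolean hasEntriesBySize(Collection<?> c) {
        return c.size() > 0;
    }

    // New style: asks only whether at least one element exists.
    static boolean hasEntries(Collection<?> c) {
        return !c.isEmpty();
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        System.out.println(hasEntriesBySize(queue) + " " + hasEntries(queue));
        queue.add("url");
        System.out.println(hasEntriesBySize(queue) + " " + hasEntries(queue));
    }
}
```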
orbiter
28b30231c3 fix for url matcher of multiple amp& in an url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650
2012-07-10 17:39:56 +02:00
Roland 'Quix0r' Haeder
aef9dd0350 - removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
2012-07-10 13:08:16 +02:00
orbiter
c7afa8bc48 using SwitchboardConstants for solr attributes 2012-07-10 12:01:20 +02:00
sixcooler
a99ef68422 bump to httpclient-4.2.1 2012-07-09 18:58:33 +02:00
orbiter
c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-09 14:33:11 +02:00
orbiter
5f3b8dc040 fix for RSS reader 2012-07-09 14:32:35 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
2160f9a819 Release 1.04 2012-07-09 00:13:59 +02:00
Michael Peter Christen
1addbc792c use less memory for md5 cache 2012-07-08 22:05:04 +02:00
Michael Peter Christen
f32de94723 more logging 2012-07-08 22:04:36 +02:00
Michael Peter Christen
d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
2012-07-08 21:25:22 +02:00
Michael Peter Christen
434ee90c59 added classification for control file types which shall not be loaded
but placed onto the noload-queue
2012-07-08 21:17:33 +02:00
Michael Peter Christen
1517a3b7b9 added webm mime-type 2012-07-08 17:59:20 +02:00
Michael Peter Christen
a90bcb48f6 added webm 2012-07-08 17:58:05 +02:00
Michael Peter Christen
801972fe6f fix for url camel case parser and sentence reader 2012-07-08 16:48:09 +02:00