Commit Graph

8666 Commits

Author SHA1 Message Date
Michael Peter Christen
e2a97ef8f6 better explain how to access the embedded solr 2012-07-23 21:31:12 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
Michael Peter Christen
cba4ab862e fix for http://bugs.yacy.net/view.php?id=202 2012-07-23 00:36:18 +02:00
Michael Peter Christen
b76836db7b Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-07-23 00:35:14 +02:00
reger
36c9875b6e removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid) 2012-07-23 00:00:40 +02:00
Michael Peter Christen
0640a6f7e6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-22 21:50:44 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
6cc5d1094e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-21 13:34:57 +02:00
orbiter
05a3ffd03a patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa
2012-07-20 11:47:50 +02:00
orbiter
5a3c829872 embedded solr is only initiated if it is activated with
IndexFederated_p.html
2012-07-20 11:40:33 +02:00
Michael Peter Christen
161005ceaa Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-20 09:04:14 +02:00
Michael Peter Christen
bf4968d748 source change in classpath 2012-07-20 09:04:02 +02:00
Lotus
3a350a2f83 partial html fix for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454
2012-07-20 08:53:12 +02:00
orbiter
49ee31f837 added classpath for htroot/solr 2012-07-20 00:59:58 +02:00
Michael Peter Christen
97b7bcf2a6 added a solr search index
- by default, an (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. This fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
play with different search fields as you can see in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
2012-07-19 11:34:05 +02:00
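The test queries in the commit above can also be assembled programmatically. The following is a minimal sketch, not code from YaCy: the class name `SolrSelectUrl` and the method `build` are hypothetical, the base URL is the one quoted in the commit message, and `rows` is assumed here only as an example of a standard Solr query attribute.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of YaCy): builds a query URL for the
// /solr/select servlet mentioned in the commit message above.
public class SolrSelectUrl {
    // Base URL as quoted in the commit message; the port may differ per peer.
    static final String BASE = "http://127.0.0.1:8090/solr/select";

    // Percent-encodes the query string and appends the standard 'rows' attribute.
    public static String build(String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE + "?q=" + q + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // The two example queries from the commit message.
        System.out.println(build("*:*", 10));
        System.out.println(build("text_t:Help", 10));
    }
}
```

The colon between field name and search term is percent-encoded by `URLEncoder`; a browser address bar usually accepts the unencoded form shown in the commit message as well.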
Michael Peter Christen
f0a079ac9f allow larger log entries 2012-07-14 16:28:14 +02:00
Michael Peter Christen
9b48c9fe2e removed a crawler overhead (terminated loop which searches greatest
stack that has zero-waiting urls). This should cause a slightly faster
crawl for crawl stacks with many different domains in the crawl queue.
2012-07-14 13:11:04 +02:00
Michael Peter Christen
784a4abb18 enhancement in internal data organization which should generate less
synchronizations in database access
2012-07-14 13:09:44 +02:00
Michael Peter Christen
f78ce93a80 collection of speed and memory saving hacks 2012-07-13 21:15:38 +02:00
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
a196f24f60 prevent enqueueing of non-loggeable logging entries 2012-07-12 19:42:42 +02:00
orbiter
482afed07c reduced logging overhead (a bit) 2012-07-12 19:23:40 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
58e7d1952f reduction of logging to prevent too much IO caused by logging 2012-07-12 02:08:11 +02:00
Michael Peter Christen
83da68c4c1 fixed a memory leak inside the logger which appeared if the log was
written faster than the logger is able to print it out to its output
stream. A very large collection of unwritten log outputs had been seen
during strong crawling. The new ArrayBlockingQueue is limited to prevent
this case.
2012-07-12 01:23:04 +02:00
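The idea of limiting the log queue, as described in the commit above, can be illustrated with a small sketch. This is an assumption about the general technique and not YaCy's logger: the class `BoundedLogQueue`, its capacity, and the choice to drop entries with `offer()` when the queue is full (rather than block with `put()`) are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch (not YaCy's logger): a log queue with a fixed
// capacity, so unwritten entries cannot pile up without limit.
public class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private int dropped = 0;

    public BoundedLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() returns false instead of blocking when the queue is full;
    // in that case the entry is counted as dropped.
    public synchronized boolean enqueue(String message) {
        boolean accepted = queue.offer(message);
        if (!accepted) dropped++;
        return accepted;
    }

    public int size() { return queue.size(); }

    public synchronized int dropped() { return dropped; }

    public static void main(String[] args) {
        BoundedLogQueue log = new BoundedLogQueue(2);
        for (int i = 0; i < 5; i++) log.enqueue("entry " + i);
        System.out.println("queued=" + log.size() + " dropped=" + log.dropped());
    }
}
```

Whether a full queue should drop entries or make the producer wait is a design choice; dropping keeps a busy crawler from stalling on log output, while blocking keeps every entry.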
Michael Peter Christen
e3aa05b9dd added creation of subpath pattern when crawl start is 'from file' 2012-07-11 23:18:57 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
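The kind of replacement described in the commit above looks roughly like this. The sketch is illustrative only; the class `EmptyChecks` and its methods are hypothetical and do not come from the YaCy sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not YaCy code) of the isEmpty() rewrite.
public class EmptyChecks {
    // before: s.length() == 0
    public static boolean isBlankName(String s) {
        return s.isEmpty();
    }

    // before: items.size() > 0
    public static boolean hasItems(List<String> items) {
        return !items.isEmpty();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        System.out.println(isBlankName("") + " " + hasItems(urls));
        urls.add("http://127.0.0.1:8090/");
        System.out.println(hasItems(urls));
    }
}
```

Besides reading more directly, `isEmpty()` can be cheaper on some collections: `java.util.concurrent.ConcurrentLinkedQueue.size()`, for example, traverses the whole queue, while `isEmpty()` does not.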
orbiter
28b30231c3 fix for url matcher of multiple amp& in an url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650
2012-07-10 17:39:56 +02:00
Roland 'Quix0r' Haeder
aef9dd0350 - removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
2012-07-10 13:08:16 +02:00
orbiter
c7afa8bc48 using SwitchboardConstants for solr attributes 2012-07-10 12:01:20 +02:00
sixcooler
a99ef68422 bump to httpclient-4.2.1 2012-07-09 18:58:33 +02:00
orbiter
c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-09 14:33:11 +02:00
orbiter
5f3b8dc040 fix for RSS reader 2012-07-09 14:32:35 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
2160f9a819 Release 1.04 2012-07-09 00:13:59 +02:00
Michael Peter Christen
1addbc792c use less memory for md5 cache 2012-07-08 22:05:04 +02:00
Michael Peter Christen
f32de94723 more logging 2012-07-08 22:04:36 +02:00
Michael Peter Christen
d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
2012-07-08 21:25:22 +02:00
Michael Peter Christen
434ee90c59 added classification for control file types which shall not be loaded
but placed onto the noload-queue
2012-07-08 21:17:33 +02:00
Michael Peter Christen
1517a3b7b9 added webm mime-type 2012-07-08 17:59:20 +02:00
Michael Peter Christen
a90bcb48f6 added webm 2012-07-08 17:58:05 +02:00
Michael Peter Christen
801972fe6f fix for url camel case parser and sentence reader 2012-07-08 16:48:09 +02:00
Michael Peter Christen
fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within
small memory configurations
2012-07-08 16:11:50 +02:00
Michael Peter Christen
92731e5287 fix for sevenzip parser 2012-07-08 16:11:19 +02:00
Michael Peter Christen
45641b0c23 catch and log a warning in RasterPlotter 2012-07-06 09:21:12 +02:00
Michael Peter Christen
8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
2012-07-06 09:05:41 +02:00
Michael Peter Christen
c3db015410 prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
2012-07-06 08:29:41 +02:00
Michael Peter Christen
91f14ea38e fix to solr configuration (case where the external solr was not online) 2012-07-06 01:29:13 +02:00
sixcooler
2c5b68d932 more abstraction of error message 2012-07-05 14:50:37 +02:00