Commit Graph

1279 Commits

Author SHA1 Message Date
sixcooler
76b037a20a check content domain fix:
search image/media should not show pages containing image/media
search text should show all/text but image/media
2012-07-27 04:11:52 +02:00
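The commit above describes a matching rule between the content domain of a query and the content domain of a result. A minimal sketch, assuming a hypothetical `ContentDomainFilter` class and reading the message as "media searches return only media items; text searches return text/all documents but not media items". The enum values are modeled loosely on YaCy's content domains and are an assumption, not the actual implementation.

```java
/**
 * Sketch of a content-domain check for search results.
 * Class, enum values and the matching rule are assumptions based on the
 * commit message, not YaCy's actual code.
 */
public class ContentDomainFilter {

    enum ContentDomain { ALL, TEXT, IMAGE, AUDIO, VIDEO, APP }

    /** Returns true if a document of domain {@code doc} may appear in a search for {@code query}. */
    static boolean accept(ContentDomain query, ContentDomain doc) {
        switch (query) {
            case TEXT:
                // text search: plain pages (ALL/TEXT), but not media items
                return doc == ContentDomain.ALL || doc == ContentDomain.TEXT;
            case IMAGE:
            case AUDIO:
            case VIDEO:
            case APP:
                // media search: only the media items themselves, not pages that merely contain them
                return doc == query;
            default:
                return true; // ALL: no restriction
        }
    }

    public static void main(String[] args) {
        System.out.println(accept(ContentDomain.IMAGE, ContentDomain.TEXT));  // false
        System.out.println(accept(ContentDomain.IMAGE, ContentDomain.IMAGE)); // true
        System.out.println(accept(ContentDomain.TEXT, ContentDomain.VIDEO));  // false
    }
}
```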
Michael Peter Christen
e432bb9cd9 better calculation of possible saving in HeapReader index data structure 2012-07-26 10:05:06 +02:00
Michael Peter Christen
9549984c65 documentation/comments 2012-07-25 21:34:23 +02:00
Michael Peter Christen
3bcd9d622b cleaned up classes and methods which are either superfluous at this time
or will become superfluous or subject to complete redesign after the
migration to solr. Removing them now will make the transition to
solr simpler.
2012-07-25 14:31:54 +02:00
Michael Peter Christen
6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is
written. Also done some code-cleanup.
2012-07-25 01:53:47 +02:00
Michael Peter Christen
315d83cfa0 cleanup 2012-07-24 22:16:56 +02:00
Michael Peter Christen
1f41d9c6f5 bugfix for a NPE 2012-07-24 17:29:32 +02:00
Michael Peter Christen
76202f068e extended abstraction of local and remote solr index using one front-end
for index administration and querying.
2012-07-24 17:23:29 +02:00
Michael Peter Christen
d3f243e2e1 fixed node type calculation for principal peers 2012-07-23 23:40:50 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
Michael Peter Christen
cba4ab862e fix for http://bugs.yacy.net/view.php?id=202 2012-07-23 00:36:18 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
05a3ffd03a patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa
2012-07-20 11:47:50 +02:00
orbiter
5a3c829872 embedded solr is only initiated if it is activated with
IndexFederated_p.html
2012-07-20 11:40:33 +02:00
Michael Peter Christen
97b7bcf2a6 added a solr search index
- by default, an (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. This fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
Play with the different search fields listed in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
2012-07-19 11:34:05 +02:00
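The test queries above are plain HTTP GET requests against the new `/solr/select` servlet. The sketch below only builds such URLs with correct percent-encoding (the `:` in `text_t:Help` becomes `%3A`); `q`, `start` and `rows` are standard Solr SearchHandler parameters, while the class name `SolrSelectUrl` is hypothetical. Whether a given YaCy build honours every standard parameter is an assumption to check against the running peer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds a query URL for the /solr/select servlet. */
public class SolrSelectUrl {

    /** Appends URL-encoded key=value pairs to the servlet base URL, in map order. */
    static String build(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "text_t:Help"); // field query, as in the example above
        p.put("start", "0");       // paging offset
        p.put("rows", "10");       // page size
        System.out.println(build("http://127.0.0.1:8090/solr/select", p));
        // http://127.0.0.1:8090/solr/select?q=text_t%3AHelp&start=0&rows=10
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client, exactly like the hand-written examples in the commit message.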
Michael Peter Christen
f0a079ac9f allow larger log entries 2012-07-14 16:28:14 +02:00
Michael Peter Christen
784a4abb18 enhancement in internal data organization which should cause fewer
synchronizations in database access
2012-07-14 13:09:44 +02:00
Michael Peter Christen
f78ce93a80 collection of speed and memory saving hacks 2012-07-13 21:15:38 +02:00
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
a196f24f60 prevent enqueueing of non-loggable logging entries 2012-07-12 19:42:42 +02:00
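The idea of not enqueueing entries that would never be printed can be sketched with `java.util.logging`: test `Logger.isLoggable(level)` before building and queueing the entry. The class `LevelGuardedQueue` and its queue are hypothetical illustrations; only the `java.util.logging` and `java.util.concurrent` calls are real APIs, and YaCy's own logger may be structured differently.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: filter by level before an entry reaches the output queue. */
public class LevelGuardedQueue {
    private final Logger logger;
    private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(1000);

    public LevelGuardedQueue(Logger logger) {
        this.logger = logger;
    }

    /** Returns true if the entry was queued for output. */
    public boolean enqueue(Level level, String message) {
        // Cheap level test first: entries below the threshold never take a queue slot.
        if (!logger.isLoggable(level)) return false;
        return pending.offer(level.getName() + " " + message);
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch.crawler");
        log.setLevel(Level.WARNING);
        LevelGuardedQueue q = new LevelGuardedQueue(log);
        q.enqueue(Level.FINE, "parsed page");  // skipped: below WARNING
        q.enqueue(Level.SEVERE, "disk full");  // queued
        System.out.println(q.pendingCount());  // 1
    }
}
```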
orbiter
482afed07c reduced logging overhead (a bit) 2012-07-12 19:23:40 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
58e7d1952f reduction of logging to prevent too much IO caused by logging 2012-07-12 02:08:11 +02:00
Michael Peter Christen
83da68c4c1 fixed a memory leak inside the logger which appeared if the log was
written faster than the logger was able to print it to its output
stream. A very large collection of unwritten log outputs had been seen
during heavy crawling. The new ArrayBlockingQueue is bounded to prevent
this case.
2012-07-12 01:23:04 +02:00
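A bounded `ArrayBlockingQueue` caps the memory a log backlog can occupy. The sketch below assumes a drop-when-full policy via `offer()`, with a counter for discarded lines; the commit does not say whether YaCy drops or blocks, and the class name `BoundedLogBuffer` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded log buffer between log producers and a printer thread. */
public class BoundedLogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    public BoundedLogBuffer(int capacity) {
        // Fixed capacity: the backlog can never grow beyond this many entries.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: never blocks; counts and discards the entry if the buffer is full. */
    public boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    /** Consumer side (e.g. the printer thread): takes everything currently buffered. */
    public List<String> drain() {
        List<String> out = new ArrayList<>();
        queue.drainTo(out);
        return out;
    }

    public long droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        BoundedLogBuffer buf = new BoundedLogBuffer(3);
        for (int i = 0; i < 5; i++) buf.log("line " + i);
        System.out.println(buf.drain());        // [line 0, line 1, line 2]
        System.out.println(buf.droppedCount()); // 2
    }
}
```

Using `put()` instead of `offer()` would block the crawler threads until the printer catches up (backpressure) instead of losing lines; which trade-off fits depends on whether log completeness or crawl throughput matters more.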
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
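The `isEmpty()` refactoring is more than cosmetic for some collections: the JDK documents `ConcurrentLinkedQueue.size()` as not constant-time, because it walks the queue, whereas `isEmpty()` only has to look at the head. A minimal sketch of a container that exposes both, with the old comparison shown as a comment; `IsEmptyIdiom`, `UrlQueue` and `hasWork` are hypothetical names.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class IsEmptyIdiom {

    /** Hypothetical container that offers isEmpty() alongside size(). */
    static final class UrlQueue {
        private final ConcurrentLinkedQueue<String> urls = new ConcurrentLinkedQueue<>();

        void push(String url) { urls.add(url); }

        // For ConcurrentLinkedQueue, size() traverses all elements ...
        int size() { return urls.size(); }

        // ... while isEmpty() only needs to find the first live element.
        boolean isEmpty() { return urls.isEmpty(); }
    }

    static boolean hasWork(String host, UrlQueue queue) {
        // before: host.length() > 0 && queue.size() > 0
        return !host.isEmpty() && !queue.isEmpty();
    }
}
```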
orbiter
28b30231c3 fix for url matcher of multiple amp& in an url, see:
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4439&p=26650#p26650
2012-07-10 17:39:56 +02:00
Roland 'Quix0r' Haeder
aef9dd0350 - removed cleaning of blacklist cache on startup
- added cleaning of blacklist cache if cache is modified in interface
- extended cache saving to all cache types
- moved cache location to DATA/LISTS
- fixed static file path which was relative to the application path but
should be relative to data path - which is different in debian and mac
implementations
2012-07-10 13:08:16 +02:00
orbiter
c7afa8bc48 using SwitchboardConstants for solr attributes 2012-07-10 12:01:20 +02:00
orbiter
c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-09 14:33:11 +02:00
orbiter
5f3b8dc040 fix for RSS reader 2012-07-09 14:32:35 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
1addbc792c use less memory for md5 cache 2012-07-08 22:05:04 +02:00
Michael Peter Christen
f32de94723 more logging 2012-07-08 22:04:36 +02:00
Michael Peter Christen
d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
2012-07-08 21:25:22 +02:00
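Filtering bootstrap candidates by age amounts to a cutoff on the last-seen timestamp. A sketch assuming a hypothetical `Peer` type with a `lastSeenMillis` field and an inclusive 60-minute cutoff (the commit gives only the 60/240 minute values, not the comparison details).

```java
import java.util.ArrayList;
import java.util.List;

public class BootstrapFilter {
    // 60 minutes, per the commit message (previously 240)
    static final long MAX_PEER_AGE_MILLIS = 60L * 60L * 1000L;

    /** Hypothetical peer record: a name plus the time it was last seen. */
    static final class Peer {
        final String name;
        final long lastSeenMillis;

        Peer(String name, long lastSeenMillis) {
            this.name = name;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    /** Keeps only peers seen within the last MAX_PEER_AGE_MILLIS, preserving order. */
    static List<Peer> recentPeers(List<Peer> candidates, long nowMillis) {
        long cutoff = nowMillis - MAX_PEER_AGE_MILLIS;
        List<Peer> result = new ArrayList<>();
        for (Peer p : candidates) {
            if (p.lastSeenMillis >= cutoff) result.add(p);
        }
        return result;
    }
}
```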
Michael Peter Christen
434ee90c59 added classification for control file types which shall not be loaded
but placed onto the noload-queue
2012-07-08 21:17:33 +02:00
Michael Peter Christen
a90bcb48f6 added webm 2012-07-08 17:58:05 +02:00
Michael Peter Christen
801972fe6f fix for url camel case parser and sentence reader 2012-07-08 16:48:09 +02:00
Michael Peter Christen
fbc1a2030d fix for sitemap importer: can now also import very large sitemaps within
small memory configurations
2012-07-08 16:11:50 +02:00
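One common way to import very large sitemaps in little memory is to stream the XML instead of building a DOM tree. The sketch below uses the JDK's StAX `XMLStreamReader` and hands each `<loc>` value to a callback as soon as it is read; whether YaCy's importer uses StAX is an assumption, and `SitemapStream`/`readLocs` are hypothetical names.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical streaming sitemap reader: memory use does not grow with sitemap size. */
public class SitemapStream {

    /** Passes every <loc> URL to {@code sink}; returns the number of URLs seen. */
    static int readLocs(Reader in, Consumer<String> sink) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs or external entities
        try {
            XMLStreamReader r = factory.createXMLStreamReader(in);
            int count = 0;
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "loc".equals(r.getLocalName())) {
                        sink.accept(r.getElementText().trim()); // one URL at a time
                        count++;
                    }
                }
            } finally {
                r.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed sitemap", e);
        }
    }
}
```

Because each URL is passed on immediately (for example to the crawl queue), only the current element is held in memory, regardless of how many entries the sitemap contains.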
Michael Peter Christen
92731e5287 fix for sevenzip parser 2012-07-08 16:11:19 +02:00
Michael Peter Christen
45641b0c23 catch and log a warning in RasterPlotter 2012-07-06 09:21:12 +02:00
Michael Peter Christen
8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
2012-07-06 09:05:41 +02:00
Michael Peter Christen
c3db015410 prevent loading of content from the cache when retrieval with IFFRESH is
used and cache is stale. Should speed up snippet generation when cache
strategy is IFFRESH.
2012-07-06 08:29:41 +02:00
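The IFFRESH change boils down to deciding from metadata alone (cache date and maximum age) whether a cached copy may be used, before its content is loaded. A sketch with a hypothetical `CacheDecision` class; only IFFRESH appears in the commit message, so the other strategy names and the `maxAge` parameter are assumptions.

```java
import java.time.Duration;
import java.time.Instant;

public class CacheDecision {
    // IFFRESH is named in the commit message; the other values are assumptions.
    enum CacheStrategy { NOCACHE, IFFRESH, IFEXIST, CACHEONLY }

    /** Decides whether the cached copy should be loaded at all, without reading its content. */
    static boolean useCache(CacheStrategy s, Instant cachedAt, Instant now, Duration maxAge) {
        if (cachedAt == null) return false; // nothing cached
        switch (s) {
            case NOCACHE:
                return false;
            case IFFRESH:
                // stale entry: skip the cache entirely instead of loading and discarding it
                return Duration.between(cachedAt, now).compareTo(maxAge) <= 0;
            case IFEXIST:
            case CACHEONLY:
                return true;
            default:
                return false;
        }
    }
}
```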
Michael Peter Christen
b1e7c11fba fix for pattern matcher in html parser 2012-07-05 14:24:03 +02:00
Michael Peter Christen
8a6edc0031 fix for solr shutdown 2012-07-05 14:23:43 +02:00
Michael Peter Christen
b8bcc06283 fix for urls beginning with "//" 2012-07-05 14:23:29 +02:00
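Links beginning with "//" are protocol-relative (network-path) references: the host comes from the link and only the scheme comes from the page. If such a link is treated as a path, the result is something like `http://example.org//cdn.example.net/x.js`. A sketch of the correct handling, with the hypothetical helper `absolutize` and `java.net.URI` used for all other relative links.

```java
import java.net.URI;

public class ProtocolRelativeUrl {

    /** Resolves a link found on page {@code base}; handles "//host/path" explicitly. */
    static String absolutize(String base, String link) {
        if (link.startsWith("//")) {
            // network-path reference: keep host and path from the link, borrow the page's scheme
            String scheme = URI.create(base).getScheme();
            return (scheme == null ? "http" : scheme) + ":" + link;
        }
        // ordinary relative reference, resolved against the page URL
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        System.out.println(absolutize("https://example.org/a/b.html", "//cdn.example.net/x.js"));
        // https://cdn.example.net/x.js
    }
}
```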
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
5bd3c90907 - removed unnecessary semicolons
- added default case for switch
2012-07-05 11:18:31 +02:00
Michael Peter Christen
132afaf687 removed inaccessible code 2012-07-05 11:09:44 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00