Commit Graph

9398 Commits

Author SHA1 Message Date
Michael Peter Christen
5512be6673 fix in GSA result writer which evaluates result context fields as
String. After the migration to Solr 4.1.0 'some' of these fields
suddenly are stored as String[]; this patch compensates for this confusion.
2013-03-19 10:33:35 +01:00
Michael Peter Christen
342ba1049b - callback fix
- memory allocation problem in RowCollection: if memory is too low, do
not try to increase by 1 because this leads to very long execution
time and at the end to the same OOM as if we allocate the memory at the
moment we need it even if the resource observer states that this memory
is not there. To compensate for this, the increase size is reduced.
2013-03-19 10:32:01 +01:00
orbiter
65d73e5652 renamed callback function to 'callback' because that is a standard for
jsonp which is also used in backbone.js/jquery
2013-03-19 00:59:47 +01:00
orbiter
17ae51e741 increased number of links limitation from 1000 to 10000 for rss feeds
and html documents
2013-03-17 22:13:56 +01:00
orbiter
243b66ae6d Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy 2013-03-17 13:39:31 +01:00
Frank
7763f2554f add the new PPMbar in Crawler_p for a better style and better use. 2013-03-17 11:43:12 +01:00
orbiter
e4d26d1cb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-17 10:52:42 +01:00
orbiter
940c6849ee enhanced did-you-mean (a bit): can now remember previously searched
words (plus small enhancements)
2013-03-17 10:52:31 +01:00
reger
d57b221921 add: reset Solr schema field selection to default button in IndexSchema_p 2013-03-17 03:46:29 +01:00
Michael Peter Christen
a725a4242f main release 1.4 2013-03-15 10:25:47 +01:00
Michael Peter Christen
9406a2e438 fixed NPE during index abstract computation 2013-03-15 10:04:27 +01:00
Michael Peter Christen
16e9d4d1dd added a restart hint 2013-03-15 10:00:06 +01:00
Michael Peter Christen
d725782440 turned severe message to warning message about network failure events 2013-03-15 09:40:02 +01:00
Michael Peter Christen
b3a54d5b1c fix for wrong class name in log 2013-03-15 09:35:57 +01:00
Michael Peter Christen
2d36a7eaf5 - do not create a new query for all remote peers
- no document search this time
- adjusted banner and network to not show 'WORDS' but DHT Chunks. This
is to avoid confusion for robinson peers which do not create Word
Entries
2013-03-15 00:14:28 +01:00
Michael Peter Christen
4af0839be2 use appropriate ranking for each search situation:
- when using the /date modifier, a date ranking profile is used
- when using a site: modifier, a ranking profile supporting longer urls
is used
2013-03-14 21:13:12 +01:00
Michael Peter Christen
b8ed66a55d added all clickdepth computations for source and target paths in
webstructure core
2013-03-14 17:54:33 +01:00
Michael Peter Christen
6300730d7f refactoring of clickdepth computation as preparation for clickdepth
computation of webgraph links
2013-03-14 12:13:02 +01:00
Michael Peter Christen
2080fc7406 removed unused tag fields 2013-03-14 10:35:21 +01:00
reger
7804c12976 fix error msg in ConfigHeuristics_p 2013-03-14 03:30:25 +01:00
reger
230a12bfe2 adjust Opensearch discover function to new webgraph Solr schema 2013-03-14 03:10:54 +01:00
orbiter
6b13dd0d3d added clickdepth field writing for webgraph core (unfinished) 2013-03-14 01:35:38 +01:00
orbiter
47114910d5 fix for possible memory leaks 2013-03-13 17:55:37 +01:00
Michael Peter Christen
addba047e2 changes in ranking computation
- an existing ranking servlet for solr was extended. It is now possible
to set boost values for fields, boost functions and boost queries.
- The ranking can have different instances, but currently only the first
one is used
- added an abstraction layer for fields which can be used for search and
those fields can be edited in the solr ranking configuration
- the ranking value from solr within the field score is used to combine
remote search requests, which all are created using the same locally
defined boost values
- reduced the number of fields which are used for search (makes it
faster)
- replaced some text fields by string fields (makes indexing faster)
- removed classes which had no use
- made a large number of experiments for a better ranking and created a
temporary setting which prefers hits inside titles
- adjusted also the RWI-based ranking computation to 'prefer title'
- made special cases like for portal search where no post-processing and
post-ranking is wanted: this keeps the original ranking order as done by
Solr
- fixed many bugs with old settings for ranking
2013-03-13 14:47:00 +01:00
reger
38f46eb33d set RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries) 2013-03-12 03:13:14 +01:00
reger
2962f2b9e9 Merge branch 'master' of git://gitorious.org/yacy/rc1.git 2013-03-12 02:51:17 +01:00
orbiter
ab74d559fb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-11 18:23:43 +01:00
Michael Peter Christen
4490133909 removed target_tag_s (superfluous) 2013-03-11 10:46:29 +01:00
orbiter
cd197bb555 fix for NPE if surrogates do not exist 2013-03-10 19:46:06 +01:00
reger
6ae30f9d0f replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter 2013-03-10 05:22:18 +01:00
Michael Peter Christen
68e739a90b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-10 02:29:38 +01:00
Michael Peter Christen
3d9ce9cd04 - added more selection criteria for network seed list
- enhanced up script
2013-03-10 02:26:24 +01:00
orbiter
168e8d9b4d added/fixed missing DOCTYPE line (submitted by Thomas) 2013-03-08 14:40:09 +01:00
Michael Peter Christen
252bb51f98 fix for wrong mime type in noload crawler 2013-03-07 15:31:00 +01:00
Michael Peter Christen
25300913fa fixes to search debugging after testing with the different search
debugging options
2013-03-05 21:28:22 +01:00
Michael Peter Christen
81380ae5c8 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-05 12:24:10 +01:00
Michael Peter Christen
c2fde018b5 concurrent snippet fetching from solr results which do not have snippets 2013-03-05 12:24:01 +01:00
orbiter
b1140e3d82 added debug switches for detailed search testing 2013-03-05 12:19:32 +01:00
orbiter
cdbfddf091 added filter queries for better image, audio and video results 2013-03-04 21:18:54 +01:00
Michael Peter Christen
587ef83eab added missing cleanup statements for short memory cases during search 2013-03-04 13:01:24 +01:00
orbiter
2562f052b9 do not put the fulltext field text_t into the search cache because it is
not used there and uses a lot of memory
2013-03-04 12:01:10 +01:00
Michael Peter Christen
2b6c79d347 in method exists() also use the new caching-stacks for
documents/metadata
2013-03-04 01:13:17 +01:00
Michael Peter Christen
ae734b3f8d enhanced the search result processing
- no waiting time at the end
- switched on 'classic' snippet production and verification (again)
2013-03-04 00:17:29 +01:00
Michael Peter Christen
2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed
search cache to prevent too much IO load on the peer during
search.
2013-03-04 00:07:52 +01:00
Michael Peter Christen
0d7b4bc891 better protection against OOM during search flush and fixed missing
result push
2013-03-03 23:45:47 +01:00
Michael Peter Christen
221ed7d764 - enhanced concurrency during search without IO blocking
- introduced a second queue to flush remote search results (now: old
metadata structure from DHT peers)
- fixed result counters
2013-03-03 22:38:50 +01:00
Marc Nause
2714b59f38 *) For some reason this seems to fix a ClassCastException on my system
(OpenJDK).
2013-03-03 20:38:20 +01:00
Michael Peter Christen
3b1d9dc884 made index storage from DHT search results concurrent. This prevents
blocking by high CPU usage during search. Also: removed query from Solr
for DHT search results; results are taken from the pending queue.
2013-03-02 10:25:52 +01:00
orbiter
f13c0b2abd fix for search 2013-03-01 19:18:16 +01:00
orbiter
0f7ea7ad9f - enhanced solr.add procedure for mass adds
- removed unused solr access classes
- made snippet generation for documents from YaCy RWI/DHT concurrent (as
it was before the removal of the search process)
- reduced the number of remote results in settings file because the
processing of such mass document adds is too CPU-intensive (in Solr)
2013-03-01 15:27:17 +01:00