Commit Graph

4263 Commits

Author SHA1 Message Date
reger
566a3b0294 fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search
- removed duplicate QueryParams.hashes2Handles , redundant  with .hashes2Set
2013-04-08 21:25:21 +02:00
reger
40b3f2c5fe comment out dead menue link 2013-04-06 02:34:56 +02:00
reger
bf1e1ddca1 fix typo in prev commit 2013-04-06 02:29:49 +02:00
reger
d4d93be779 uncomment "used time" calculation for remote search log 2013-04-06 02:08:01 +02:00
reger
36202f27b0 improve remote search log, set "Returned Results" to transmitcount (instead of no value) 2013-04-05 03:33:33 +02:00
reger
254074b11d Merge branch 'master' of git://gitorious.org/yacy/rc1.git 2013-03-22 03:46:26 +01:00
Michael Peter Christen
870aedf3c6 fixes for better search interface integration in yaml templates 2013-03-20 16:19:49 +01:00
Michael Peter Christen
735eb70525 better search timing; prevents '0 results' for very large local
indexes >> 10 mio documents
2013-03-19 11:23:18 +01:00
Michael Peter Christen
342ba1049b - callback fix
- memory allocation problem in RowCollection: if memory is too low, do
not to try to increase by 1 because this leads to very long execution
time and at the end to the same OOM as if we allocate the memory at the
moment we need it even if the resource observer states that this memory
is not there. To compensate this, the increase size is reduced.
2013-03-19 10:32:01 +01:00
reger
31d16f20d7 fix invisible icon not found 2013-03-18 00:10:23 +01:00
orbiter
243b66ae6d Merge branch 'master' of git://gitorious.org/~frankensteen91/yacy/frankensteen91s-yacy 2013-03-17 13:39:31 +01:00
Frank
7763f2554f add the new PPMbar in Crawler_p for a better style and better use. 2013-03-17 11:43:12 +01:00
orbiter
e4d26d1cb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-17 10:52:42 +01:00
orbiter
940c6849ee enhanced did-you-mean (a bit): can now remember previously searched
words (plus small enhancements)
2013-03-17 10:52:31 +01:00
reger
d57b221921 add: reset Solr schema filed selection to default button in IndexSchema_p 2013-03-17 03:46:29 +01:00
Michael Peter Christen
9406a2e438 fixed NPE during index abstract computation 2013-03-15 10:04:27 +01:00
Michael Peter Christen
d725782440 turned severe message to warning message about network failure events 2013-03-15 09:40:02 +01:00
Michael Peter Christen
2d36a7eaf5 - do not create a new query for all remote peers
- no document search this time
- adjusted banner and network to not show 'WORDS' but DHT Chunks. This
is to avoid confusion for robinson peers which do not create Word
Entries
2013-03-15 00:14:28 +01:00
Michael Peter Christen
2080fc7406 removed unused tag fields 2013-03-14 10:35:21 +01:00
reger
7804c12976 fix error msg in ConfigHeuristics_p 2013-03-14 03:30:25 +01:00
reger
230a12bfe2 adjust Opensearch discover function to new webgraph Solr schema 2013-03-14 03:10:54 +01:00
orbiter
47114910d5 fix for possible memory leaks 2013-03-13 17:55:37 +01:00
Michael Peter Christen
addba047e2 changes in ranking computation
- an existing ranking servlet for solr was extended. It is now possible
to set boost values for fields, boost functions and boost queries.
- The ranking can have different instances, but currently only the first
one is used
- added an abstraction layer for fields which can be used for search and
those fields can be edited in the solr ranking configruation
- the ranking value from solr within the field score is used to combine
remote search requests, which all are created using the same locally
defined boost values
- reduced the number of fields which are used for search (makes it
faster)
- replaced some text fields by string fields (makes indexing faster)
- removed classes which had no use
- made a large number of experiments for a better ranking and created a
temporary setting which prefers hits inside titles
- adjusted also the RWI-based ranking computation to 'prefer title'
- made special cases like for portal search where no post-processing and
post-ranking is wanted: this keeps the original ranking order as done by
Solr
- fixed many bugs with old settings for ranking
2013-03-13 14:47:00 +01:00
Michael Peter Christen
68e739a90b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-03-10 02:29:38 +01:00
Michael Peter Christen
3d9ce9cd04 - added more selection criteria for network seed list
- enhanced up script
2013-03-10 02:26:24 +01:00
orbiter
168e8d9b4d added/fixed missing DOCTYPE line (submitted by Thomas) 2013-03-08 14:40:09 +01:00
Michael Peter Christen
25300913fa fixes to search debugging after testing with the different search
debugging options
2013-03-05 21:28:22 +01:00
Michael Peter Christen
2d472a39f4 DHT-transferred metadata and crawl receipts now also use the delayed
search cache to prevent that too much IO load is on the peer during
search.
2013-03-04 00:07:52 +01:00
Michael Peter Christen
221ed7d764 - enhanced concurrency during search without IO blocking
- introduced a second queue to flush remote search results (now: old
metadata structure from DHT peers)
- fixed result counters
2013-03-03 22:38:50 +01:00
Marc Nause
2714b59f38 *) For some reason this seems to fix a ClassCastException on my system
(OpenJDK).
2013-03-03 20:38:20 +01:00
orbiter
0f7ea7ad9f - enhanced solr.add procedure for mass adds
- removed unused solr access classes
- made snippet generation for documents aus YaCy RWI/DHT concurrent (as
it was before the search process removation)
- reduced the number of remote results in settings file because the
processing of such mass documents add is too CPU-intensive (in Solr)
2013-03-01 15:27:17 +01:00
orbiter
7ff10bdb1b fix of page navigation for formatted totalcount numbers 2013-03-01 00:48:28 +01:00
orbiter
a734fbc4a5 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-27 22:44:57 +01:00
orbiter
d74472f562 corrected result counter 2013-02-27 22:40:23 +01:00
orbiter
aa3c26c62e added recrawl/reload to CrawlStartSite for a timeout of 3 days 2013-02-27 11:43:36 +01:00
orbiter
c1b7e61882 added option to create empty vocabularies 2013-02-27 08:24:37 +01:00
bubu
e0edad689d fix link to IndexSchema_p.html 2013-02-26 21:12:44 +01:00
Michael Peter Christen
c95a84103a complete redesign of search process:
- removed 'worker' processes
- no internal time-out behaviour: methods either are successful or
return null
- waiting is only done on top-level
- removed snippet-production; this is replaced by solr snippets
- removed statistics based on solr size queries (they had been VERY
long); the statistics (like suggestions or tag cloud) are now again
based on the old but very fast RWI index. In portal or intranet mode the
RWI index is usually switched off; if you like to have statistics again
then you must switch on the rwis again in this mode.
- fixed many bugs regarding correct page counter
2013-02-26 17:16:31 +01:00
Michael Peter Christen
35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full
success: try to comment out the solr search request in
yacy-portalsearch.js
2013-02-25 14:31:50 +01:00
Michael Peter Christen
008288719c fix for schema export to consider also automatically generated
coordinate fields
2013-02-25 01:13:03 +01:00
Michael Peter Christen
089dee1770 - generalized SchemaConfiguration into super-class Configuration and
adopted other classes which used the configuration-only access for that
class
- removed many warnings
- adjusted logging
2013-02-25 00:09:41 +01:00
Michael Peter Christen
56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph
index (new solr core webgraph) .. this is now off by default
- completely redesigned this servlet
- added description how to attach a remote solr
- adjusted naming of servlet and menues
- moved 'lazy initialization' attribut from IndexSchema to
IndexFederated (this is a general option) back again.
2013-02-24 18:09:34 +01:00
Michael Peter Christen
14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	htroot/IndexFederated_p.html
	source/net/yacy/cora/federate/solr/YaCySchema.java
	source/net/yacy/peers/Protocol.java
	source/net/yacy/search/Switchboard.java
	source/net/yacy/search/index/Segment.java

also moved portalsearch-dev to yacy-portalsearch to be able to fix
problems with new attachment to solr of the search widget
2013-02-23 08:48:33 +01:00
Michael Peter Christen
58e1e6fa2b fixes to schema 2013-02-23 08:14:10 +01:00
reger
d31a109efe remove obsolete Solr "commit within" input field from IndexFederated
see 4111606654
2013-02-22 22:03:32 +01:00
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
The default schema uses only some of them and the resting search index
has now the following properties:
- webgraph size will have about 40 times as much entries as default
index
- the complete index size will increase and may be about the double size
of current amount
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index will cause that some old parts in YaCy can be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times of document index in size!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get the full access to the new index, the API to solr has now two
access points: one with attribute core=collection1 for the default
search index and core=webgraph to the new webgraph search index. This is
also avaiable for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
Michael Peter Christen
89ede0fe84 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-21 13:24:10 +01:00
Michael Peter Christen
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
structure, but is not filled yet. To have the opportunity of a second
core, multi-core functionality had to be implemented to the
deep-embedded solr:
- migrated the solr_40 directory content to a subdirectory
'collection1'; the previously used default core is now called
collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in
/IndexSchema_p.html
- added instance handling as addition to solr connections: all solr
connectors are now instances of an solr 'instance' object; this required
a complete re-design of the solr embedding
- migrated also caching and sharding ontop of new instance handling
- migrated the search apis to handle now the access to a specific core,
the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores;
for the yacy remote search the default core is now called 'solr'; using
the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be
used after this migration!
- redesign of solr instance handling in all methods which access the
instances: they cannot hold copies of these instances any more; the must
retrieve the actuall connection object every time they want to write to
it (this solves also some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema', the old solr.keys.list is
replaced by solr.collection.schema
2013-02-21 13:23:55 +01:00
orbiter
594ed63f2a fixed interactive search which caused an error if pubDate is not present
in a search result
2013-02-16 20:33:27 +01:00
Michael Peter Christen
98a4a4aa97 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-15 01:38:23 +01:00