Michael Peter Christen
72f165d58b
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is read
from the configuration each time a query is sent to solr.
2012-12-02 16:54:29 +01:00
Michael Peter Christen
b5ee88c6af
added more logging to get info which url causes performance problems
2012-12-02 16:52:12 +01:00
reger
1faa045dc1
fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
2012-12-01 22:41:21 +01:00
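A minimal sketch of the problem this fix addresses, assuming the blacklist path is compiled as a Java regular expression: a bare '*' is a dangling quantifier and fails to compile, while '.*' is valid. The class and method names below are hypothetical and not taken from the YaCy source.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class BlacklistPathSketch {

    // Hypothetical helper: widen a bare "*" to ".*" before compiling,
    // because "*" alone has nothing to repeat and is not a valid regex.
    static String normalizePath(String path) {
        return "*".equals(path) ? ".*" : path;
    }

    // Reports whether the given expression compiles as a Java regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("raw '*' compiles: " + compiles("*"));
        System.out.println("normalized compiles: " + compiles(normalizePath("*")));
    }
}
```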
reger
6cf33f899c
prevent Solr "version conflict" on update by setting the Solr "_version_" field to 0 (=no version check)
2012-11-28 00:09:53 +01:00
Michael Peter Christen
acd98bebb7
improvements in GSA result writer
2012-11-26 15:18:51 +01:00
Michael Peter Christen
3de784c8dd
replaced more split and replaceAll calls that lacked pattern
pre-compilation with pre-compiled patterns
2012-11-26 13:40:53 +01:00
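The idea behind this kind of change can be sketched as follows: String.replaceAll, and String.split for non-trivial delimiters, compile their regular expression on every call, whereas a static Pattern is compiled once and reused. The class, constants and methods below are illustrative assumptions, not code from the commit.

```java
import java.util.regex.Pattern;

final class PrecompiledPatternSketch {

    // Compiled once when the class is loaded and reused for every call.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile("\\s*,\\s*");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Replaces line.split("\\s*,\\s*"), which would compile the regex per call.
    static String[] splitFields(String line) {
        return FIELD_SEPARATOR.split(line);
    }

    // Replaces text.replaceAll("\\s+", " "), which would compile the regex per call.
    static String collapseWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", splitFields("a , b,c")));
        System.out.println(collapseWhitespace("a   b\t c"));
    }
}
```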
Michael Peter Christen
8fc3679c66
using more pre-compiled patterns for split methods
2012-11-26 13:11:55 +01:00
Michael Peter Christen
d48e9788d2
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
2012-11-26 12:24:35 +01:00
Michael Peter Christen
bf512e6350
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-11-26 00:14:57 +01:00
reger
469efcdb9d
fix: display and calculate authors and namespace search navigator if configured (otherwise skip overhead)
(leave the hosts and topics navigators, and the filetype and protocol navigators not included in ConfigPortal, untouched)
2012-11-25 22:49:26 +01:00
Michael Peter Christen
eca68fa197
added debug code to crawler monitor
2012-11-25 15:43:42 +01:00
Michael Peter Christen
205f8b222b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-11-25 14:41:49 +01:00
orbiter
ee612e8b93
start the local search only if this peer is doing a remote search or
when it is doing a local search and the peer is old
2012-11-25 11:58:57 +01:00
Michael Peter Christen
d465773a37
- removed multi-add of documents (not used)
- inserted specialized code for size request
2012-11-25 01:34:39 +01:00
Michael Peter Christen
a1a4d9aa94
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java
2012-11-24 22:31:46 +01:00
Michael Peter Christen
b7004043ea
- added a field cache for solr queries which call only for a single
value
- fixed a version conflict exception within a solr add request
2012-11-24 22:30:05 +01:00
orbiter
5aa5202adf
fixes for filesystem indexing
2012-11-24 10:27:29 +01:00
Michael Peter Christen
efd2c4622d
added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).
2012-11-23 14:00:30 +01:00
Michael Peter Christen
5e182a566f
- added another enumeration method in kelondro data structure to get a
more random access to data for the balancer
- added random access inside the balancer
2012-11-23 13:58:39 +01:00
Michael Peter Christen
4eab3aae60
removed overhead by preventing generation of full search results when
only the url is requested
2012-11-23 01:35:28 +01:00
Michael Peter Christen
a114bb23bb
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
2012-11-22 13:03:33 +01:00
Michael Peter Christen
d6b82840f8
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is checked
whether the signature already exists in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This mainly affected the caching of 'exists' searches, which
speeds up the check for existing signatures by avoiding an actual
solr query.
Because this is the first time a long number is used as a value in the
Solr store, the value naming in the YaCySchema also had to be adapted
and normalized. As a result, many files had to be changed.
2012-11-21 18:46:49 +01:00
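The write-then-check flow described in this commit can be illustrated with a small sketch. It makes two loud assumptions: the signature is a plain hash over normalized text (it is not the TextProfileSignature algorithm), and an in-memory set stands in for the cached 'exists' lookup against the index. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class SignatureIndexSketch {

    private static final Pattern NON_ALNUM = Pattern.compile("[^a-z0-9]+");

    // Stand-in for the signatures already stored in the search index.
    private final Set<Long> knownSignatures = new HashSet<>();

    // Simplified signature (assumption): a 64-bit polynomial hash over the
    // lower-cased text with punctuation and whitespace runs collapsed.
    static long signature(String text) {
        String normalized = NON_ALNUM
                .matcher(text.toLowerCase(Locale.ROOT))
                .replaceAll(" ")
                .trim();
        long hash = 1125899906842597L;
        for (int i = 0; i < normalized.length(); i++) {
            hash = 31 * hash + normalized.charAt(i);
        }
        return hash;
    }

    // Stores the signature and returns true if it was not present before,
    // which is the condition for marking the document as unique.
    boolean addAndCheckUnique(String text) {
        return knownSignatures.add(signature(text));
    }

    public static void main(String[] args) {
        SignatureIndexSketch index = new SignatureIndexSketch();
        System.out.println(index.addAndCheckUnique("Hello, World!"));
        System.out.println(index.addAndCheckUnique("hello   world"));
    }
}
```

A long-valued signature of this kind is what makes a sortable "unique" flag cheap to compute: only the first document with a given signature gets the flag.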
Michael Peter Christen
f5ca5cea44
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
2012-11-19 17:24:34 +01:00
Michael Peter Christen
46be4af5b9
Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'
2012-11-18 22:11:04 +01:00
Michael Peter Christen
832eead998
Merge remote-tracking branch 'regerdev/master'
2012-11-18 22:04:11 +01:00
Michael Peter Christen
952e143580
FINALLY YaCy can now search for full strings using double- or
single-quoted strings in the search query line!!!
2012-11-18 16:03:34 +01:00
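One way to split a query line into quoted full strings and bare terms is sketched below. This is an assumption about the general technique, not the actual parser; the class name and the regular expression are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedQuerySketch {

    // Group 1: double-quoted phrase, group 2: single-quoted phrase,
    // group 3: bare term (any run of non-whitespace characters).
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]*)\"|'([^']*)'|(\\S+)");

    // Returns phrases (without their quotes) and bare terms in query order.
    static List<String> tokens(String query) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String token = m.group(1) != null ? m.group(1)
                         : m.group(2) != null ? m.group(2)
                         : m.group(3);
            if (!token.isEmpty()) {
                result.add(token); // skip empty quotes such as ""
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("yacy \"peer to peer\" 'full string' search"));
    }
}
```

Because the bare-term alternative is tried last, an apostrophe inside a word such as don't stays part of that word instead of opening a phrase.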
orbiter
5dfd6359cb
redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
2012-11-18 01:22:41 +01:00
cominch
2bb8f045cc
content control: use up-to-date definitions
2012-11-13 17:32:19 +01:00
Michael Peter Christen
5fd3b93661
added deletion of hosts during crawl start if deleteold option was given
2012-11-13 16:54:28 +01:00
Michael Peter Christen
d64445c3cb
because we have the inurl:<term> - searchmodifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as a regular expression filter
query. If the expression selects a specific protocol, host or filetype,
this is then translated into a faceted query.
2012-11-13 11:45:56 +01:00
cominch
a67ff1c8ac
SMW Import: replaced JSON import routines with stable ones
2012-11-12 11:17:50 +01:00
cominch
d2a94cc55e
refactor package
2012-11-09 16:22:24 +01:00
cominch
05742b4562
remove old SMW importer which was part of the ymarks package
2012-11-09 15:44:59 +01:00
cominch
21df1ad9e0
update and generalization of the SMW import and content control routines
2012-11-09 13:48:40 +01:00
Michael Peter Christen
842faf96a2
fixed media search
2012-11-07 17:27:13 +01:00
Michael Peter Christen
93001586a0
removed warnings, removed too-fast pausing of crawls
2012-11-07 15:37:14 +01:00
Michael Peter Christen
8041742e48
added matching of path to query pattern
2012-11-07 15:06:13 +01:00
Michael Peter Christen
8b1c9cba3d
fixed a problem with non-terminating crawls
2012-11-07 15:05:44 +01:00
Michael Peter Christen
61a1d32356
fix to ftp client
2012-11-07 14:58:28 +01:00
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
2012-11-07 14:15:27 +01:00
Michael Peter Christen
570e42c4e3
fix for filetype navigator
2012-11-07 13:53:29 +01:00
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
2012-11-07 12:52:19 +01:00
Michael Peter Christen
12c0db20e5
fixed npe for surrogate import
2012-11-07 02:46:51 +01:00
Michael Peter Christen
52df6ee369
more logging
2012-11-07 02:04:08 +01:00
Michael Peter Christen
158732af37
automatically delete entries from the crawl profile list if crawl is
terminated.
2012-11-07 02:03:44 +01:00
Michael Peter Christen
15d1460b40
added information about the reason for pausing of crawls
2012-11-06 15:21:56 +01:00
Michael Peter Christen
2371ef031c
added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
2012-11-06 14:32:08 +01:00
Michael Peter Christen
b30a7162fa
added more thread-renaming for search processes
2012-11-06 12:31:23 +01:00
Michael Peter Christen
900445d8e9
set the thread name during solr queries to the solr query to get better
debugging options
2012-11-06 11:48:04 +01:00
Michael Peter Christen
d481abd087
added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
2012-11-06 00:29:37 +01:00