Commit Graph

1622 Commits

Author SHA1 Message Date
reger
ad71747525 fix: set defaul language to "en" 2012-12-16 20:53:45 +01:00
Michael Peter Christen
9319b90d8a - fixes for host navigation
- fixes for filetype navigation
- removed unused code
2012-12-15 09:14:49 +01:00
Michael Peter Christen
cb5cbec14d distinguishing modified query string and original query string 2012-12-15 00:05:46 +01:00
Michael Peter Christen
fb0fa9a102 - fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details
2012-12-11 13:38:28 +01:00
orbiter
712cc37c40 if maxFileSize < 0 then the file size limit is without limit. 2012-12-10 21:17:45 +01:00
orbiter
1f33c30d7b re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy
2012-12-10 20:44:29 +01:00
reger
f1a9c2e604 fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215
2012-12-10 20:02:35 +01:00
orbiter
a4a780b871 - fix for bad url conversion in bookmarks when using smb urls
- fix for localhost hosts in solr schema host handling
2012-12-10 07:22:42 +01:00
reger
e80dfeca23 - making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171)
- blacklist test adding explicite response text "not blocked" if no blacklist match
2012-12-08 06:34:48 +01:00
reger
e2d499be9e remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list). 2012-12-08 00:19:20 +01:00
Michael Peter Christen
a3cd3852ab introduced a better place to update the lastacc time value in latency 2012-12-07 15:49:23 +01:00
Michael Peter Christen
864abcd33d removed Latency update after URL selection because that causes
a completely wrong behaviour when cache fresh cases appear. Makes
re-crawling MUCH faster!
2012-12-07 15:35:44 +01:00
Michael Peter Christen
dd241d03bb latency fix: only set last-visit time if access was actually by the
robot
2012-12-07 02:00:12 +01:00
Michael Peter Christen
118233a7e6 fix for bad xml in gsa result when doing a query with quotes 2012-12-07 01:35:02 +01:00
Michael Peter Christen
1e002ab18e added another blacklist-cleaner into balancer 2012-12-07 01:27:24 +01:00
Michael Peter Christen
10527e28ae fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
Michael Peter Christen
756772fbd3 fix for waitingtime computation for intranet configuration 2012-12-06 17:40:52 +01:00
Michael Peter Christen
fa27e5820f - check blacklist (again) when taking urls from the crawl stack because
the blacklist may get extended during crawling
- removed debug output
2012-12-06 00:12:16 +01:00
Michael Peter Christen
adfecc6ba8 more robustness during shutdown 2012-12-05 18:20:43 +01:00
Michael Peter Christen
d4bfe9339e Brute-force attempt to start solr in case of a memory problem.
I don't actually know if this is correct. It is a desperate try to get
YaCy running on production servers which must get alive even with
strange hacks like this. This is also related to a forum posting in
http://forum.yacy-websuche.de/viewtopic.php?t=4528&p=27135#p27135
2012-12-05 18:16:06 +01:00
Michael Peter Christen
8aa08261a7 update to Solr Boost handling 2012-12-05 12:26:42 +01:00
Michael Peter Christen
908ad2f174 Added a new servlet to configure the solr ranking using field boosts 2012-12-03 17:01:19 +01:00
Michael Peter Christen
a01e47b992 enhanced exists()-method for solr; should reduce a lot of IO during DHT
target selection
2012-12-02 17:29:37 +01:00
Michael Peter Christen
72f165d58b added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.
2012-12-02 16:54:29 +01:00
Michael Peter Christen
b5ee88c6af added more logging to get info which url causes performance problems 2012-12-02 16:52:12 +01:00
reger
1faa045dc1 fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*') 2012-12-01 22:41:21 +01:00
reger
6cf33f899c prevent Solr "version conflict" on update by set Solr "_version_" field to 0 (=no version check) 2012-11-28 00:09:53 +01:00
Michael Peter Christen
acd98bebb7 improvements in GSA result writer 2012-11-26 15:18:51 +01:00
Michael Peter Christen
3de784c8dd replaced more split and replaceAll missing pattern pre-compilation with
pre-compiled pattern
2012-11-26 13:40:53 +01:00
Michael Peter Christen
8fc3679c66 using more pre-compile pattern for split methods 2012-11-26 13:11:55 +01:00
Michael Peter Christen
d48e9788d2 enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
2012-11-26 12:24:35 +01:00
Michael Peter Christen
bf512e6350 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-11-26 00:14:57 +01:00
reger
469efcdb9d fix: display and calculate authors and namespace search navigator if configured (otherwise skip overhead)
(leave hosts, topics and  not in ConfigPortal included filetype,  protocoll navigator untouched)
2012-11-25 22:49:26 +01:00
Michael Peter Christen
eca68fa197 added debug code to crawler monitor 2012-11-25 15:43:42 +01:00
Michael Peter Christen
205f8b222b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-11-25 14:41:49 +01:00
orbiter
ee612e8b93 start the local search only if this peer is doing a remote search or
when it is doing a local search and the peer is old
2012-11-25 11:58:57 +01:00
Michael Peter Christen
d465773a37 - removed multi-add of documents (no used)
- inserted specialized code for size request
2012-11-25 01:34:39 +01:00
Michael Peter Christen
a1a4d9aa94 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java
2012-11-24 22:31:46 +01:00
Michael Peter Christen
b7004043ea - added a field cache for solr queries which call only for a single
value
- fixed a version conflict exception within a solr add request
2012-11-24 22:30:05 +01:00
orbiter
5aa5202adf fixes for filesystem indexing 2012-11-24 10:27:29 +01:00
Michael Peter Christen
efd2c4622d added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).
2012-11-23 14:00:30 +01:00
Michael Peter Christen
5e182a566f - added another enumeration method in kelondro data structure to get a
more random access to data for the balancer
- added random access inside the balancer
2012-11-23 13:58:39 +01:00
Michael Peter Christen
4eab3aae60 removed overhead by preventing generation of full search results when
only the url is requested
2012-11-23 01:35:28 +01:00
Michael Peter Christen
a114bb23bb - using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
2012-11-22 13:03:33 +01:00
Michael Peter Christen
d6b82840f8 added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignatue.
As a result, a signature of the document is written to the solr search
index. Additionally for each time when a signature is written, it is
checked if the singature exists already in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because here the first time a long number is used as value in the Solr
store, also the value naming in the YaCySchema had to be adopted and
normalized. This caused that many files had to be changed.
2012-11-21 18:46:49 +01:00
Michael Peter Christen
f5ca5cea44 - added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during aquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
2012-11-19 17:24:34 +01:00
Michael Peter Christen
46be4af5b9 Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890' 2012-11-18 22:11:04 +01:00
Michael Peter Christen
832eead998 Merge remote-tracking branch 'regerdev/master' 2012-11-18 22:04:11 +01:00
Michael Peter Christen
952e143580 FINALLY YaCy can now search for full strings using double- or
singlequoted strings in the search query line!!!
2012-11-18 16:03:34 +01:00
orbiter
5dfd6359cb redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
2012-11-18 01:22:41 +01:00