orbiter
c79f687110
enhanced the network scanner: find more hosts automatically by removal
of common subdomains before application of protocol-specific prefix
2013-07-09 11:42:13 +02:00
orbiter
f4f6551c66
better handling of time-out at solrj in case that a commit is done in a
fail-over case during add
2013-07-09 11:01:37 +02:00
orbiter
b4677d1cad
fix for bug #252
the naming of the servlet was wrong, the bug may not be present on
systems where upper/lowercase matching is lazy (windows)
2013-07-09 10:50:47 +02:00
Michael Peter Christen
2716dfc46c
increase crawler speed by reduction of the busysleep time
2013-07-08 23:40:31 +02:00
Michael Peter Christen
07261fe274
Merge remote-tracking branch 'nutomics/blacklist_structure'
2013-07-08 23:32:15 +02:00
Michael Peter Christen
dea71851d2
- better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
2013-07-08 16:29:30 +02:00
Michael Peter Christen
a34e137e27
fix for citation index generation in case that entry.referrerhash() is
null. This is especially the case if ftp sites are crawled
2013-07-08 16:26:11 +02:00
Michael Peter Christen
a2c8116a8f
accept (but ignore) a '+' sign in front of search words
2013-07-08 16:20:40 +02:00
orbiter
9f0cc9b401
enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
2013-07-08 13:17:09 +02:00
orbiter
d8354a389c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-07-07 21:31:28 +02:00
Lotus
6e120e90fe
do not cut text on submit buttons
2013-07-07 19:17:29 +02:00
orbiter
f8c28efd66
fix for rssTerminal coloring
2013-07-04 21:46:46 +02:00
sixcooler
308d73f855
do not use remote proxy if not switched on - regardless of the proto
2013-07-04 19:16:13 +02:00
sixcooler
69906b1d2e
Revert "do not use remote proxy if not switched on - regardless of the proto"
This reverts commit 20f452d228.
2013-07-04 19:13:51 +02:00
sixcooler
20f452d228
do not use remote proxy if not switched on - regardless of the proto
2013-07-04 19:12:50 +02:00
sixcooler
9551720d5c
re-enable saved setting for proxy-crawl-profile
2013-07-04 19:10:57 +02:00
sixcooler
d5d8936f9d
For indexes that are changing rapidly in NRT situations, fcs (stands for
Field Cache per Segment) may be a better choice than the default fc.
(saves memory)
see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
2013-07-04 19:08:53 +02:00
Felix Ableitner
44f8fcf62e
Changed class structure of Blacklist.
2013-07-04 18:37:57 +02:00
Michael Peter Christen
3054a6d4b9
added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
2013-07-04 17:12:19 +02:00
Michael Peter Christen
78af998f8f
Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'
2013-07-04 16:56:54 +02:00
Michael Peter Christen
57ffdfad4c
added a crawl option to obey html-meta-robots-noindex. This is on by
default.
2013-07-03 14:50:06 +02:00
Felix Ableitner
fd90fcc4e0
Fixes #196.
2013-07-02 20:45:41 +02:00
Michael Peter Christen
5a5d411ec0
new robots_i attribute fields
2013-07-02 14:29:13 +02:00
Michael Peter Christen
fa08bd9d5a
hack to prevent long waiting times in crawler
2013-07-01 13:24:52 +02:00
Michael Peter Christen
f1c5338210
preparation for greedy crawl profiles and refactoring
2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474
adding the canonical tag to crawl queues
2013-07-01 13:09:41 +02:00
orbiter
40c5ee47c1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-30 12:07:25 +02:00
orbiter
ae23a0badb
updated copyright message; included LGPL for 'cora' and a warranty
warning.
2013-06-30 11:30:39 +02:00
reger
a6bf44212e
bugfix: location (lat/lon) meta data retrieval (Double.NaN check)
2013-06-30 03:50:07 +02:00
Michael Peter Christen
203921006a
redesign of citation index storage
2013-06-30 02:11:46 +02:00
orbiter
7c6ccc426c
set crawlingQ to true by default because most webpages are dynamic and
crawlingQ should only be switched off in case of crawler traps
2013-06-29 20:28:14 +02:00
Lotus
5de4267a9d
windows installer: update to latest jre
2013-06-29 18:54:30 +02:00
reger
83763ee4a4
jpeg parser: extract GPS location from meta data
2013-06-29 00:35:43 +02:00
Michael Peter Christen
e92b9275ce
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-28 15:33:29 +02:00
Michael Peter Christen
56cdcfa2fa
fixed greedy learning mode - global is not a search attribute in
searchitems
2013-06-28 15:33:19 +02:00
Michael Peter Christen
32aa1d4569
removed unused option for queries
2013-06-28 15:32:36 +02:00
Michael Peter Christen
0c5bed7e2c
added configuration option for greedy learning function to ConfigPortal
servlet
2013-06-28 15:31:36 +02:00
sixcooler
5d1f619f07
possibly helpful closing of solr-requests
2013-06-28 15:19:50 +02:00
Michael Peter Christen
9d291764d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-28 15:03:25 +02:00
sixcooler
e5abccdfe4
added optimize-option
2013-06-28 14:51:37 +02:00
Michael Peter Christen
8ea6ddf636
removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
2013-06-28 14:17:14 +02:00
Michael Peter Christen
64140f35cd
fix for solr requests if no query part is given (prevent npe)
2013-06-28 13:16:25 +02:00
Michael Peter Christen
8caaf6203a
fixed false multiple-generation of remote facet search which
caused high cpu usage on remote side.
2013-06-28 12:39:36 +02:00
Michael Peter Christen
23fb458963
- fix to gsa searchresult answer in case that no query part is given
- fix to gsa default number of results (is 'num')
2013-06-28 12:22:33 +02:00
Michael Peter Christen
823ae4d6a7
added url_protocol_s to error documents
2013-06-26 16:51:36 +02:00
Michael Peter Christen
660a196989
refactoring
2013-06-26 09:27:22 +02:00
Michael Peter Christen
c4538d8d91
added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib
2013-06-26 09:26:34 +02:00
reger
3760e2616b
bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments
2013-06-25 23:24:02 +02:00
Michael Peter Christen
9a6fcdf597
npe fix
2013-06-25 16:36:16 +02:00
Michael Peter Christen
54024958ac
added url_file_name_s in query for live-search of urls
2013-06-25 16:36:05 +02:00