orbiter
9f0cc9b401
enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
2013-07-08 13:17:09 +02:00
orbiter
d8354a389c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-07-07 21:31:28 +02:00
Lotus
6e120e90fe
do not cut text on submit buttons
2013-07-07 19:17:29 +02:00
orbiter
f8c28efd66
fix for rssTerminal coloring
2013-07-04 21:46:46 +02:00
sixcooler
308d73f855
do not use remote proxy if not switched on - regardless of the proto
2013-07-04 19:16:13 +02:00
sixcooler
69906b1d2e
Revert "do not use remote proxy if not switched on - regardless of the proto"
This reverts commit 20f452d228.
2013-07-04 19:13:51 +02:00
sixcooler
20f452d228
do not use remote proxy if not switched on - regardless of the proto
2013-07-04 19:12:50 +02:00
sixcooler
9551720d5c
re-enable saved setting for proxy-crawl-profile
2013-07-04 19:10:57 +02:00
sixcooler
d5d8936f9d
For indexes that are changing rapidly in NRT situations, fcs (stands for
Field Cache per Segment) may be a better choice than the default fc.
(saves memory)
see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
2013-07-04 19:08:53 +02:00
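For illustration only, a minimal sketch of how the facet method mentioned in the commit above could be selected on a Solr request. `facet.method` and its values `fc` and `fcs` are standard Solr facet parameters; the helper name `facet_query` and the facet field `host_s` are assumptions made for this example and are not taken from the commit.

```python
from urllib.parse import urlencode


def facet_query(field, method="fcs"):
    """Build a Solr query string that facets on `field` using `method`.

    `method` is passed through as facet.method ("fc" is the field cache,
    "fcs" the per-segment field cache). This helper is hypothetical.
    """
    params = [
        ("q", "*:*"),           # match all documents
        ("rows", "0"),          # only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
    ]
    return urlencode(params)


# Example: request per-segment faceting on an assumed field name.
query_string = facet_query("host_s")
```

The resulting string would be appended to a select URL of a Solr core; which handler and core name apply depends on the installation.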
Felix Ableitner
44f8fcf62e
Changed class structure of Blacklist.
2013-07-04 18:37:57 +02:00
Michael Peter Christen
3054a6d4b9
added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
2013-07-04 17:12:19 +02:00
Michael Peter Christen
78af998f8f
Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'
2013-07-04 16:56:54 +02:00
Michael Peter Christen
57ffdfad4c
added a crawl option to obey html-meta-robots-noindex. This is on by
default.
2013-07-03 14:50:06 +02:00
Felix Ableitner
fd90fcc4e0
Fixes #196.
2013-07-02 20:45:41 +02:00
Michael Peter Christen
5a5d411ec0
new robots_i attribute fields
2013-07-02 14:29:13 +02:00
Michael Peter Christen
fa08bd9d5a
hack to prevent long waiting times in crawler
2013-07-01 13:24:52 +02:00
Michael Peter Christen
f1c5338210
preparation for greedy crawl profiles and refactoring
2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474
adding the canonical tag to crawl queues
2013-07-01 13:09:41 +02:00
orbiter
40c5ee47c1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-30 12:07:25 +02:00
orbiter
ae23a0badb
updated copyright message; included LGPL for 'cora' and a warranty
warning.
2013-06-30 11:30:39 +02:00
reger
a6bf44212e
bugfix: location (lat/lon) meta data retrieval (Double.NaN check)
2013-06-30 03:50:07 +02:00
Michael Peter Christen
203921006a
redesign of citation index storage
2013-06-30 02:11:46 +02:00
orbiter
7c6ccc426c
set crawlingQ to true by default because most webpages are dynamic and
crawlingQ should only be switched off in case of crawler traps
2013-06-29 20:28:14 +02:00
Lotus
5de4267a9d
windows installer: update to latest jre
2013-06-29 18:54:30 +02:00
reger
83763ee4a4
jpeg parser: extract GPS location from meta data
2013-06-29 00:35:43 +02:00
Michael Peter Christen
e92b9275ce
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-28 15:33:29 +02:00
Michael Peter Christen
56cdcfa2fa
fixed greedy learning mode - global is not a search attribute in
searchitems
2013-06-28 15:33:19 +02:00
Michael Peter Christen
32aa1d4569
removed unused option for queries
2013-06-28 15:32:36 +02:00
Michael Peter Christen
0c5bed7e2c
added configuration option for greedy learning function to ConfigPortal
servlet
2013-06-28 15:31:36 +02:00
sixcooler
5d1f619f07
possibly helpful closing of solr requests
2013-06-28 15:19:50 +02:00
Michael Peter Christen
9d291764d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-06-28 15:03:25 +02:00
sixcooler
e5abccdfe4
added optimize-option
2013-06-28 14:51:37 +02:00
Michael Peter Christen
8ea6ddf636
removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
2013-06-28 14:17:14 +02:00
Michael Peter Christen
64140f35cd
fix for solr requests if no query part is given (prevent npe)
2013-06-28 13:16:25 +02:00
Michael Peter Christen
8caaf6203a
fixed false multiple-generation of remote facet search which
caused high cpu usage on remote side.
2013-06-28 12:39:36 +02:00
Michael Peter Christen
23fb458963
- fix to gsa searchresult answer in case that no query part is given
- fix to gsa default number of results (is 'num')
2013-06-28 12:22:33 +02:00
Michael Peter Christen
823ae4d6a7
added url_protocol_s to error documents
2013-06-26 16:51:36 +02:00
Michael Peter Christen
660a196989
refactoring
2013-06-26 09:27:22 +02:00
Michael Peter Christen
c4538d8d91
added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib
2013-06-26 09:26:34 +02:00
reger
3760e2616b
bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments
2013-06-25 23:24:02 +02:00
Michael Peter Christen
9a6fcdf597
npe fix
2013-06-25 16:36:16 +02:00
Michael Peter Christen
54024958ac
added url_file_name_s in query for live-search of urls
2013-06-25 16:36:05 +02:00
Michael Peter Christen
16d1d744fa
added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which now no longer has the file name as
the last part of the path list.
The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
2013-06-25 16:27:20 +02:00
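A minimal sketch of the split described in the commit above, assuming the path list holds only the directory segments and the file name is stored without its extension. The function name `split_url_path` is hypothetical and the code is an illustration, not the project's implementation.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def split_url_path(url):
    """Return (directory segments, file name without extension) for a URL.

    The first value mirrors what a multi-valued path field such as
    url_paths_sxt would hold, the second what url_file_name_s would hold.
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # A path ending in "/" (or an empty path) names no file.
    if not segments or path.endswith("/"):
        return segments, ""
    file_name, _extension = splitext(segments[-1])
    return segments[:-1], file_name


# Example: the file name is kept out of the path list.
paths, file_name = split_url_path("http://example.org/docs/2013/report.pdf")
```

With this split, a search on the file name field does not have to scan the last entry of the path list.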
reger
8d1c4c423d
make imageparser file extension detection case-insensitive (extensions are often upper case)
2013-06-23 00:39:15 +02:00
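As a rough illustration of the case-insensitive check described in the commit above: the extension is lower-cased before it is compared. The set `IMAGE_EXTENSIONS` and the function `is_image_file` are assumptions for this sketch, not names from the project.

```python
from posixpath import splitext

# Hypothetical list of extensions an image parser might accept.
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def is_image_file(name):
    """Return True if the file name has a known image extension, ignoring case."""
    extension = splitext(name)[1].lstrip(".").lower()
    return extension in IMAGE_EXTENSIONS
```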
Michael Peter Christen
f542cf7d9c
fix for daterange: the to-date is inclusive
2013-06-21 15:47:12 +02:00
Michael Peter Christen
f9d859f5dc
now writing image alt texts and (camelcase-)parsed urls into a text
search field for a better image retrieval
2013-06-18 16:51:56 +02:00
Michael Peter Christen
c36720d45f
added daterange option to gsa api
2013-06-18 16:25:00 +02:00
Michael Peter Christen
e441a9d4c8
to avoid confusion, the gsa api is available at /search? and
/searchresult?
2013-06-18 16:22:06 +02:00
orbiter
8792e6c6e9
stub for better image indexing
2013-06-18 13:28:30 +02:00
orbiter
97f2ac9091
added hint to gsa response writer that the result comes from a yacy peer
2013-06-17 13:29:03 +02:00