Michael Peter Christen
9dfc9c95d8
updated slf4j and log4j
2012-12-27 04:37:21 +01:00
Michael Peter Christen
95712fdc8b
update to pdf parser
2012-12-27 04:16:31 +01:00
Michael Peter Christen
4a9182ae16
use the search configuration to default the cacheStrategy to the value
as given in the search configuration
2012-12-27 03:19:21 +01:00
Michael Peter Christen
98819ec3d9
use solr boost configuration to select search fields. At this time it is
possible to enter a negative boost value to switch that value off. This
might be different in the future with a better input interface.
2012-12-27 03:17:45 +01:00
Michael Peter Christen
a6ad1d6fd1
update to search tests (use yacy interface and a bugfix)
2012-12-27 03:15:50 +01:00
Michael Peter Christen
e1f89efd0d
- made image search in interactive search using the ViewImage servlet -
that enables viewing of images for intranet SMB servers.
- added a filter search for protocol, tld and ext again; otherwise p2p
search produces a lot of rubbish
2012-12-26 21:25:27 +01:00
Michael Peter Christen
8f3bd0c387
fix for smb crawl situation (lost too many urls)
2012-12-26 19:15:11 +01:00
reger
d456f69381
SeedUpload url: included a check in saveSeedList that rejects localhost urls (same check as in / copied from Seed.isProper()), to prevent an identity change on the next startup (due to a rejected seeduploadurl).
2012-12-24 23:29:02 +01:00
reger
fbf84e9ff3
fix SeedUpload setting property name for include template file
2012-12-24 04:13:38 +01:00
reger
4987caf1c9
- apply fix for localhost handling (from yacy2solr) also to metadata2solr
2012-12-23 01:30:52 +01:00
reger
0148f1bb8c
fix: exception if default work files don't exist
2012-12-22 23:03:39 +01:00
Michael Peter Christen
9e4033f229
fix for event starter: delete start time when event is removed
2012-12-22 21:16:22 +01:00
Michael Peter Christen
99271ffd13
copy work tables from defaults/data/work if exist there and not in
DATA/WORK
This can be used to create start-up behavior work scripts in the
api.bheap table
2012-12-22 20:54:05 +01:00
Michael Peter Christen
99edbf6f14
fix for config basic: do not accept empty peer names
2012-12-22 20:52:52 +01:00
Michael Peter Christen
24c9bb35f7
extended the Scheduler: introduced scheduled events
- an event type (once, regular) can be selected
- for this event type, a fixed time can be selected. This may be either
directly after startup or at one of the full hours of a day (==25
options)
The main point about this feature is the opportunity to start an action
directly after startup. That makes it possible to create YaCy
distributions which, when started for the first time, begin to index
parts of the intranet/internet by themselves.
2012-12-22 16:27:14 +01:00
Michael Peter Christen
433143ba40
removed protocol, tld, ext from the urlmask and created specific
navigation field for these
2012-12-19 12:45:40 +01:00
Michael Peter Christen
84f82541e8
search process enhancements
2012-12-19 10:41:22 +01:00
Michael Peter Christen
02020b590b
- removed all extension types from extension navigation which are not
proper/known
- automatically show the protocol navigation if there is more than http
and https
- automatically show the extension navigation if there is some media
content
2012-12-19 02:38:05 +01:00
Michael Peter Christen
01200f06cc
using the author field as solr-native facet. this makes it necessary to
introduce a copy-field for the author field to be copied to a string
field. This field is then used to generate facets. Without this field,
the facet would consist only of the words of the author names, not of
the full author string.
2012-12-19 01:56:33 +01:00
Michael Peter Christen
2a4c064c89
using the publisher information for the author field if no author is
given. This applies to cases where only the copyright field in the html
header is filled but not the author field
2012-12-19 01:54:35 +01:00
Michael Peter Christen
bab573361f
- using a filter query for facet restriction
- calculating the whole search result in at most two sub-queries from
solr
2012-12-19 01:00:57 +01:00
Michael Peter Christen
7ad5457db0
using the solr facets as navigation in yacyinteractive.html instead of
counting locally result types
2012-12-19 00:59:40 +01:00
Michael Peter Christen
eac9650b31
added another solr field clickdepth_i which reflects the number of
clicks which are necessary to get from the portal of a host to a
specific document. At this time, only the start document is flagged with
clickdepth '0', all others with '-1'. To get the actual clickdepth, a
process must use crawled information to collect the actual number of
clicks. This will be added in another/next step.
2012-12-18 17:20:42 +01:00
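A minimal sketch of how such a later process could derive the click depth from crawled link data, assuming the links of one host are available as a map from a page to the pages it links to. This is a plain breadth-first search, not the code from this repository; the class name `ClickDepthSketch`, the method names and the map layout are hypothetical. Pages that are never reached keep the '-1' flag described in the commit message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of YaCy.
final class ClickDepthSketch {

    /**
     * Breadth-first search over a link graph (page -> pages it links to).
     * Returns the click depth of every page reachable from the start page.
     */
    static Map<String, Integer> clickDepths(Map<String, List<String>> links, String start) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0); // the start document has clickdepth 0
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int next = depth.get(page) + 1;
            for (String target : links.getOrDefault(page, List.of())) {
                // the first visit in a BFS is always the shortest click path
                if (!depth.containsKey(target)) {
                    depth.put(target, next);
                    queue.add(target);
                }
            }
        }
        return depth;
    }

    /** Click depth of one page, or -1 if it was not reached from the start page. */
    static int clickDepth(Map<String, Integer> depths, String page) {
        return depths.getOrDefault(page, -1);
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/c"),
            "/c", List.of("/"));
        Map<String, Integer> depths = clickDepths(links, "/");
        System.out.println(clickDepth(depths, "/c"));      // 2
        System.out.println(clickDepth(depths, "/orphan")); // -1
    }
}
```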
Michael Peter Christen
1052263af3
- added a new solr field references_i which stores the number of
INCOMING links to the corresponding web page. This information is taken
from the reverse link index (a 'little sister' of the RWI index).
- this field can be of use to enhance the ranking because a web page
with more incoming links can be more important than others. But
this is not true for typical link pages like menus. Therefore the
number of outgoing links is needed.
- added a new solr attribute 'bf' to solr queries which is a boost
function extension. This field can contain a formula which computes the
boost according to given field values. After some experiments the
following formula is now the default:
div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4
This takes the number of references and the inbound links. Further
experiments are needed to enhance that formula.
2012-12-18 14:42:35 +01:00
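To see how the default boost function behaves, the formula can be evaluated outside of solr. The sketch below assumes that the trailing ^0.4 is a weight that multiplies the function value (the usual meaning of a boost suffix in a solr bf parameter) and not an exponent; the class and method names are hypothetical and only the two field names come from the commit message.

```java
// Hypothetical helper for illustration only; not part of YaCy or solr.
final class BoostSketch {

    /** div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6)) */
    static double function(int references, int inboundLinksCount) {
        return (1.0 + references) / Math.pow(1.0 + inboundLinksCount, 1.6);
    }

    /** Function value scaled by the trailing ^0.4, read here as a weight (assumption). */
    static double boost(int references, int inboundLinksCount) {
        return 0.4 * function(references, inboundLinksCount);
    }

    public static void main(String[] args) {
        // same number of references, growing inboundlinkscount_i value
        System.out.println(boost(50, 0));
        System.out.println(boost(50, 10));
        System.out.println(boost(50, 200));
    }
}
```

With a fixed number of references the value shrinks as inboundlinkscount_i grows, which is the damping for link-heavy pages that the commit message asks for.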
Michael Peter Christen
7c3de8b4cd
- fix for localhost detection
- added IPv6 patterns for localhost detection
2012-12-18 12:52:20 +01:00
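A rough sketch of what pattern-based localhost detection with IPv6 support can look like. The patterns below are assumptions chosen for illustration and are not the list used in this commit; `LocalhostSketch` and `isLocalhost` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the YaCy implementation.
final class LocalhostSketch {

    // Assumed patterns: host name, IPv4 loopback block, IPv6 loopback (short and full form).
    private static final Pattern LOCALHOST = Pattern.compile(
        "localhost"
        + "|127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"
        + "|::1"
        + "|(0{1,4}:){7}0{0,3}1",
        Pattern.CASE_INSENSITIVE);

    static boolean isLocalhost(String host) {
        if (host == null) return false;
        String h = host.trim();
        // IPv6 literals appear in brackets inside URLs, e.g. http://[::1]:8090/
        if (h.startsWith("[") && h.endsWith("]")) {
            h = h.substring(1, h.length() - 1);
        }
        return LOCALHOST.matcher(h).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLocalhost("[::1]"));       // true
        System.out.println(isLocalhost("example.org")); // false
    }
}
```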
Michael Peter Christen
34f8786508
removed dependency of vocabulary navigation from Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.
2012-12-18 02:29:03 +01:00
reger
664499bb10
PerformanceQueues: disable input for hardcoded httpd performance values
2012-12-16 21:01:13 +01:00
reger
ad71747525
fix: set default language to "en"
2012-12-16 20:53:45 +01:00
Michael Peter Christen
9319b90d8a
- fixes for host navigation
- fixes for filetype navigation
- removed unused code
2012-12-15 09:14:49 +01:00
Michael Peter Christen
cb5cbec14d
distinguishing modified query string and original query string
2012-12-15 00:05:46 +01:00
Michael Peter Christen
fb0fa9a102
- fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details
2012-12-11 13:38:28 +01:00
orbiter
899fd8b62d
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-12-10 21:18:56 +01:00
orbiter
712cc37c40
if maxFileSize < 0 then there is no file size limit.
2012-12-10 21:17:45 +01:00
reger
3f26aabfb3
quickfix for translated link containing word "browse" in ru & uk, see http://bugs.yacy.net/view.php?id=213
2012-12-10 21:08:04 +01:00
orbiter
f86d469973
more search command tools
2012-12-10 21:01:14 +01:00
orbiter
54e193a2b8
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.
2012-12-10 21:00:30 +01:00
orbiter
7f5526e6ef
allow larger no-proxy expressions
2012-12-10 20:59:43 +01:00
orbiter
1228a5798d
you can now search for '*' to get just ALL entries in the search index
as result list. This makes sense if you intend to search just by using
the navigation tools to cut the data set into navigation 'slices'.
2012-12-10 20:55:11 +01:00
orbiter
1f33c30d7b
re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy
2012-12-10 20:44:29 +01:00
reger
f1a9c2e604
fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215
2012-12-10 20:02:35 +01:00
orbiter
a4a780b871
- fix for bad url conversion in bookmarks when using smb urls
- fix for localhost hosts in solr schema host handling
2012-12-10 07:22:42 +01:00
reger
e80dfeca23
- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171 )
- blacklist test: adding explicit response text "not blocked" if no blacklist match
2012-12-08 06:34:48 +01:00
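A small sketch of the idea behind a case-insensitive path part, assuming a blacklist entry consists of a host and a regular expression for the path. The class `BlacklistSketch`, its methods and the entry layout are hypothetical and only illustrate the behaviour described above, including the explicit "not blocked" response of the blacklist test.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical blacklist entry for illustration only; not the YaCy implementation.
final class BlacklistSketch {
    private final String host;
    private final Pattern path;

    BlacklistSketch(String host, String pathRegex) {
        this.host = host.toLowerCase(Locale.ROOT);
        // the path part is compiled case insensitive, so /ADS/ and /ads/ are treated alike
        this.path = Pattern.compile(pathRegex, Pattern.CASE_INSENSITIVE);
    }

    boolean blocks(String urlHost, String urlPath) {
        return host.equalsIgnoreCase(urlHost) && path.matcher(urlPath).matches();
    }

    /** Response text of a blacklist test, with an explicit answer when nothing matches. */
    static String report(BlacklistSketch entry, String urlHost, String urlPath) {
        return entry.blocks(urlHost, urlPath) ? "blocked" : "not blocked";
    }

    public static void main(String[] args) {
        BlacklistSketch entry = new BlacklistSketch("example.org", "/ads/.*");
        System.out.println(report(entry, "example.org", "/ADS/banner.gif")); // blocked
        System.out.println(report(entry, "example.org", "/news/"));          // not blocked
    }
}
```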
reger
e2d499be9e
remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list).
2012-12-08 00:19:20 +01:00
Michael Peter Christen
a3cd3852ab
introduced a better place to update the lastacc time value in latency
2012-12-07 15:49:23 +01:00
Michael Peter Christen
864abcd33d
removed Latency update after URL selection because that causes
a completely wrong behaviour when cache fresh cases appear. Makes
re-crawling MUCH faster!
2012-12-07 15:35:44 +01:00
Michael Peter Christen
4491072256
- clear the search cache when altering the solr boosts
- better positions for submit buttons
2012-12-07 14:56:34 +01:00
Michael Peter Christen
2b7d46bc1f
using a filter query for the site parameter in GSA api
2012-12-07 14:54:49 +01:00
Michael Peter Christen
dd241d03bb
latency fix: only set last-visit time if access was actually by the
robot
2012-12-07 02:00:12 +01:00
Michael Peter Christen
118233a7e6
fix for bad xml in gsa result when doing a query with quotes
2012-12-07 01:35:02 +01:00
Michael Peter Christen
1e002ab18e
added another blacklist-cleaner into balancer
2012-12-07 01:27:24 +01:00