Michael Peter Christen
4fbc4740df
removed warnings
2013-10-07 23:41:50 +02:00
Lotus
202a9fbdad
adding synonyms from German OpenThesaurus ready for use in YaCy
2013-10-07 22:02:42 +02:00
Michael Peter Christen
21aa6a0321
migration to Solr 4.5.0
2013-10-07 17:09:40 +02:00
bhoerdzn
45cf553bc3
try to guess default crawling mode, if none set
2013-10-07 13:13:22 +02:00
bhoerdzn
b4f0c822f2
assign strings before checking contents
2013-10-07 13:01:39 +02:00
Michael Peter Christen
ef31d0f279
fix for rss reader, see http://bugs.yacy.net/view.php?id=294
2013-10-07 12:59:54 +02:00
bhoerdzn
499abe8f91
set default values for string parameters
2013-10-07 12:32:23 +02:00
Jens Bertram
85316b3ac6
Merge branch 'master' into crawlexpert-post
2013-10-07 12:02:52 +02:00
bhoerdzn
42ea56eaad
made CrawlStartExpert_p aware of post variables; extended template where needed
2013-10-07 11:25:59 +02:00
Michael Peter Christen
101a6e6e14
Patch the citation index for links with canonical tags.
This shall fulfill the following requirement:
If a document A links to B and B contains a 'canonical C', then the
citation rank computation shall consider that A links to C and B does
not link to C.
To do so, we first must collect all canonical links, find all references
to them, get the anchor list of the documents and patch the citation
reference of these links.
2013-10-07 11:15:58 +02:00
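The requirement stated in commit 101a6e6e14 can be illustrated with a minimal sketch. This is not the YaCy implementation: the function name `patch_citations` and the plain dictionaries standing in for the link graph and the canonical declarations are assumptions made for illustration only, and chained canonicals (C itself declaring a canonical D) are not followed.

```python
def patch_citations(links, canonical):
    """Return a patched copy of the link graph.

    links:     dict mapping a source document to the set of documents it links to
               (hypothetical stand-in for the citation index)
    canonical: dict mapping a document to the target of its canonical tag
               (hypothetical stand-in for the collected canonical links)
    """
    patched = {}
    for source, targets in links.items():
        new_targets = set()
        for target in targets:
            # "B does not link to C": a document's reference to its own
            # canonical target is not counted as a citation.
            if canonical.get(source) == target:
                continue
            # "A links to C": a link to B is redirected to B's canonical
            # target if B declares one, otherwise it is kept unchanged.
            new_targets.add(canonical.get(target, target))
        patched[source] = new_targets
    return patched


links = {"A": {"B"}, "B": {"C", "D"}}
canonical = {"B": "C"}
patched = patch_citations(links, canonical)
```

In this example, A ends up citing C instead of B, and B keeps only its link to D.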
reger
daebeb93aa
add call to AccessTracker to jetty security handler
2013-10-04 01:16:17 +02:00
reger
172aefaeeb
adjust YaCySecurityHandler to Jetty 9 conventions
- mainly adjust prepareConstraintInfo to use the RoleInfo.setChecked as in Jetty Source distribution
- use constraint check behavior as in ConstraintSecurityHandler
see http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-security/src/main/java/org/eclipse/jetty/security/ConstraintSecurityHandler.java?id=jetty-9.0.5.v20130813
2013-10-03 19:38:03 +02:00
orbiter
ba3c173077
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-10-03 18:19:02 +02:00
reger
6f9ed439d3
- expand localHostName check of AbstractRemoteHandler
to prevent the request from being handled as a proxy request
- make domain handler not rely on an included path in the resolved .yacy address
2013-10-01 03:04:32 +02:00
reger
561ea135af
fix: forgot to add security handler
2013-09-30 04:35:17 +02:00
reger
f46771bdf5
update build script from rc1/master
2013-09-30 03:47:55 +02:00
reger
c7c706fd9f
merge with rc1/master
2013-09-30 03:46:39 +02:00
reger
272b196d05
update Jetty server init() to activate yacy-domain and transparent proxy handler
- adding domain & proxy handler to a context (as it was in initial design)
(context required for dispatcher)
- make handler context and servlet context parallel available
(to allow use of YaCyDefaultServlet to handle legacyServlets)
- set transparent proxy request handled after dispatch.forward to skip further handling for .yacy domain requests
2013-09-30 03:12:52 +02:00
reger
fd119deb00
fix NPE on modified-since check (Response.requestHeader is allowed to be null)
2013-09-30 02:50:53 +02:00
reger
66145a0410
- add welcome file (index.html) support to YaCyDefaultServlet
- change SolrServlet default search field (&df) to text_t
2013-09-29 03:34:00 +02:00
orbiter
a3b5d84c81
Merge remote-tracking branch 'origin/master'
Conflicts:
.classpath
2013-09-28 15:46:59 +02:00
orbiter
adfae074cf
added classpath for debugging
2013-09-28 15:45:33 +02:00
Michael Peter Christen
b28d43decc
added two more fields source_cr_host_norm_i,target_cr_host_norm_i in
webgraph and an addition to postprocessing to copy all cr ranking
attributes to the link edges associated to the postprocessing documents
2013-09-27 16:57:05 +02:00
Michael Peter Christen
a52f3a597e
fix for canonical-from-http-header feature
2013-09-27 15:09:04 +02:00
Michael Peter Christen
2dd7c5be44
added parsing of http-canonical tags (untested, could not find an
example page)
2013-09-27 13:17:50 +02:00
Michael Peter Christen
4476dea5ba
do not fail if a wrong boost key is used; instead, print only a warning
See also: http://bugs.yacy.net/view.php?id=293
2013-09-27 12:28:09 +02:00
reger
ab9583d429
add default field (&df) to SolrServlet query if missing
2013-09-26 22:20:35 +02:00
Michael Peter Christen
3bf0104199
fix for crawl domain counter limitation (limit was reached too early)
2013-09-26 13:41:52 +02:00
Michael Peter Christen
82bfd9e00a
- crawl profiles shall be deleted from active and passive stacks if they
are deleted to terminate the crawl because otherwise the crawl will go
on after the load-from-passive stack policy.
- better check if a crawl is terminated using the loader queue.
2013-09-26 10:22:31 +02:00
Michael Peter Christen
1b3d26dd23
hack to remove most of the "warning: deprecated" messages (but not all,
one is left)
2013-09-25 21:14:52 +02:00
Michael Peter Christen
a496313248
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-09-25 20:41:02 +02:00
sixcooler
3c48fc65fd
reverted RemoteInstance to deprecated methods of httpClient-4.2
this should work with current remote-Solr-Instances
2013-09-25 18:45:16 +02:00
Michael Peter Christen
91a875dff5
self-healing of mistakenly deactivated crawl profiles. This fixes a bug
which can happen in rare cases when a crawl start and a cleanup process
happen at the same time.
2013-09-25 18:27:54 +02:00
Michael Peter Christen
095053a9b4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-09-25 17:32:52 +02:00
sixcooler
0cae420d8e
some dns-timing changes:
since httpclient uses the domain cache, it is useful not to clean the
domain cache while crawling is running (domains are filled into this
cache)
On huge crawl starts (e.g. from file) my DNS could not keep up with the
high rates, so I reduced the rate and allowed more time (a longer timeout)
2013-09-25 15:01:28 +02:00
sixcooler
15b1bb2513
bump to httpClient-4.3
2013-09-25 14:48:37 +02:00
Michael Peter Christen
4f83d5f18c
added the new field harvestkey_s to the collection index and the
webgraph index which is temporarily filled with the crawl profile key.
This is used to select a set of documents for post-processing as soon as
a crawl is finished. Now the postprocessing for a specific crawl is
started when that specific crawl is finished and not at the end of all
post-processing steps.
2013-09-25 14:38:24 +02:00
orbiter
14442efa6d
when profiles are cleaned, there shall be first a callback showing which
profiles are cleaned. This shall enable a profile-termination-driven
postprocessing. To do this, index writings must carry the profile key
which will be implemented in another (next) step.
2013-09-25 11:04:12 +02:00
orbiter
0013d0d0bb
removed superfluous class
2013-09-24 21:18:37 +02:00
orbiter
f90d5296cb
Added new data structure to be used by the balancer (not used yet).
These data structures will enable the balancer to store the crawl queue
into individual queues, one each for a single host.
2013-09-24 21:08:40 +02:00
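The idea described in commit f90d5296cb, one queue per host, can be sketched as follows. This is only an illustration under assumptions: the class name `HostQueues` and its methods are hypothetical and do not reflect the actual YaCy data structures, which the commit message does not describe in detail.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class HostQueues:
    """Hypothetical crawl queue split into one FIFO queue per host."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def push(self, url):
        # URLs are grouped by host name; URLs without a host share the "" key.
        host = urlsplit(url).hostname or ""
        self._queues[host].append(url)

    def pop(self, host):
        # Return the oldest URL queued for the host, or None if there is none.
        queue = self._queues.get(host)
        if not queue:
            return None
        url = queue.popleft()
        if not queue:
            # Drop empty queues so hosts() lists only hosts with pending URLs.
            del self._queues[host]
        return url

    def hosts(self):
        return sorted(self._queues)


queues = HostQueues()
for url in ("http://a.example/1", "http://b.example/1", "http://a.example/2"):
    queues.push(url)
```

Keeping the queues separate lets a balancer decide per host when the next URL may be fetched, without scanning one large mixed queue.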
orbiter
0e8d752462
refactoring
2013-09-24 19:55:59 +02:00
orbiter
8ac2e8c8c9
added a location navigator, which shows the image linking to the map
search whenever a location is available in the search result.
To activate this, the search.navigation property in yacy.conf must be
modified to the new default values.
2013-09-24 11:26:51 +02:00
orbiter
d86d2be5c3
automatically removed Places autotagging if no location library is
wanted
2013-09-24 11:23:45 +02:00
orbiter
214a087cdf
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2013-09-23 20:59:03 +02:00
Michael Peter Christen
96ed0c980e
- added hosthash to all documents (also fail documents which is needed
there for deletion), this fixes a problem for the deletion of old
documents for new crawl starts
- added clickdepth and citation computation for fail documents
2013-09-23 18:09:42 +02:00
Michael Peter Christen
179ad281f9
close include byte buffer after usage
2013-09-23 12:19:51 +02:00
reger
52dd491c04
fix unnecessary use of DigestURL
2013-09-23 03:05:09 +02:00
reger
6b9a624808
remove double declaration of TLD_any_zone_filter
2013-09-23 03:01:08 +02:00
reger
5111841e5b
- reduce Jetty debug logging
- fix Context path initialization
2013-09-23 01:30:45 +02:00
reger
bc6ebb3c06
adjust to DigestURI changes from master to DigestURL
2013-09-22 20:57:50 +02:00