Michael Peter Christen
c1c1be8f02
fix for slow crawling and better logging in balancer
2014-04-29 19:50:33 +02:00
Michael Peter Christen
3acf416335
npe fix
2014-04-29 19:24:05 +02:00
Michael Peter Christen
7d37e74b44
fix to menu colours
2014-04-29 19:13:54 +02:00
Michael Peter Christen
3d5e354471
small changes to search headline colour
2014-04-29 18:46:50 +02:00
Michael Peter Christen
d79d7dde55
fix for result display
2014-04-29 16:24:21 +02:00
Michael Peter Christen
362c988c05
design fixes to better use the new colours
2014-04-29 16:24:01 +02:00
Michael Peter Christen
71efc76170
new default skin pdbootstrap which keeps the design shapes but slightly
changes the colours to match with bootstrap colours
2014-04-29 16:23:42 +02:00
Michael Peter Christen
bbadccbd8d
better buttons
2014-04-29 16:22:31 +02:00
reger
2eb7682772
add html5 audio/video <source> tag to html content scraper
- <source src=.. type=..> tag content is added to embed collection
2014-04-29 00:41:29 +02:00
Michael Peter Christen
a9963d5c95
bootstrap update
2014-04-28 11:52:13 +02:00
Michael Peter Christen
b2bbb9a0b5
Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
2014-04-28 09:17:21 +02:00
reger
0b6db04e40
fix contentscraper img height/width parsing
prevent NumberFormatException on the common "100px" property value
- include in test case
2014-04-28 04:59:47 +02:00
malykhin.dmitry
37424b0c42
Update russian translation
2014-04-28 01:54:34 +04:00
reger
4e57000a40
remove redundant javascript & id in index.html
to set focus to query field in IE11
2014-04-27 22:22:00 +02:00
reger
ffc5b75c73
optimize and fix lat / lon assignment
2014-04-27 20:52:06 +02:00
reger
9313447de2
reimplement tighter lat/lon calc in URIMetadataNode
from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272
2014-04-27 18:20:33 +02:00
reger
d812f80784
add exit proxy link to UrlProxy
on proxied pages a link to exit proxy is added to top of page.
Link text can be configured in web.xml init-parameter (see default/web.xml). If missing no link is displayed.
2014-04-26 22:27:59 +02:00
reger
78d08998db
throw MalformedURLException on unknown protocol
for protocols other than the supported http, https, ftp, file, smb, \\ and mailto
2014-04-26 01:30:51 +02:00
reger
bb8181b2be
fix: resolve url without path but searchpart
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47
added test case for getHost
2014-04-25 20:15:55 +02:00
orbiter
a3542f29b4
npe fix
2014-04-25 09:26:20 +02:00
orbiter
c48d2a2a02
npe fix
2014-04-25 09:23:10 +02:00
reger
121d25be38
recover sax fatal error on OAI-PMH import of xml with entity error
this allows loading of the next resumptionToken to continue even if the import file caused a SAX parser error
fix http://mantis.tokeek.de/view.php?id=63
2014-04-25 01:05:28 +02:00
reger
81dc2aa536
add current css to HTMLResponseWriter to fix metadata view
(using css from metas.template except js links)
2014-04-23 23:41:10 +02:00
orbiter
2fd8a0ead6
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-04-23 23:13:23 +02:00
orbiter
8e5ce7cd51
fixed a situation where finished crawls had not been detected.
2014-04-23 23:13:07 +02:00
orbiter
c6f0bd05f8
better removal of stored urls when doing a crawl start
2014-04-23 23:12:08 +02:00
orbiter
2f63bd0261
enhanced Host Balancer strategy: fair round robin
2014-04-23 23:11:37 +02:00
orbiter
0c88a32c36
do not apply lazy value instantiation for numeric or boolean values
because that is misleading and confusing in case of 0- or false-values
and may cause NPEs in retrieval functions.
2014-04-23 08:41:36 +02:00
orbiter
8e04030596
in case of short memory, do not cut down robinson peers to 1, just
reduce by 50%
2014-04-23 08:37:19 +02:00
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
...
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
2014-04-23 00:55:16 +02:00
orbiter
469e0a62f1
added new button to terminate all crawls
2014-04-22 23:14:54 +02:00
orbiter
ccb1864d55
catch IllegalArgumentException for wrong process types (that is needed
for migrations when new process types are introduced or disappear)
2014-04-22 23:14:05 +02:00
orbiter
4ee4ba1576
fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of
lazy value instantiation of 0-value in crawldepth_i
2014-04-22 19:48:49 +02:00
orbiter
12ba890205
removed warnings
2014-04-22 19:35:15 +02:00
reger
d51f9cc863
add custom Jetty errorhandler
to provide custom error page footer line
- remove redundant mime check in UrlProxyServlet
2014-04-21 17:28:21 +02:00
reger
c193a02023
defer creation of new ArrayList after possible early return
(to skip allocating an unused object)
2014-04-21 17:16:06 +02:00
reger
727dfb5875
refactor URIMetadataNode to further unify interaction with index
- URIMetadataNode extending SolrDocument
- use language as stored (String), reducing conversion to string
- optimize debug code in transferIndex
2014-04-20 01:41:30 +02:00
reger
79e7947442
- remove empty http0_9 status text array
and unused default_charset = ISO-8859-1
2014-04-18 22:03:16 +02:00
reger
2dabe2009d
- remove unused manual http KeepAlive config
(reducing references to obsolete httpdemon)
- add port info to settings_http
2014-04-18 19:57:35 +02:00
Michael Peter Christen
5746aae3db
add canonical links to the same crawldepth, not the next crawldepth
2014-04-18 06:51:46 +02:00
Michael Peter Christen
74ab5ef9fa
increased runtime for postprocessing query job
2014-04-18 06:51:10 +02:00
Michael Peter Christen
8b32dd5f9e
special strategy for balancer: do not remove targets with zero wait time
from the queue
2014-04-18 06:50:07 +02:00
Michael Peter Christen
9c6228d948
fix for deadlocks in crawler
2014-04-17 16:58:17 +02:00
Michael Peter Christen
7a2f3e2353
increased resource.disk.used.max.steadystate and
resource.disk.used.max.overshot by 4 times because first users reached
that limit and wondered why the crawler was paused automatically :)
The crawler will now stop at 2TB disk usage :)
2014-04-17 16:19:38 +02:00
Michael Peter Christen
10cf8215bd
added crawl depth for failed documents
2014-04-17 13:21:43 +02:00
Michael Peter Christen
7fefebaeca
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-17 12:55:38 +02:00
Michael Peter Christen
c2f62e783f
- better subgraph handling, less overhead for crawls without the
webgraph
- usage of crawler crawldepth cache for the linkgraph target depth
computation
2014-04-17 12:54:18 +02:00
Michael Peter Christen
06afb568e2
new Strategies in Balancer:
- doublecheck cache now records the crawl depth as well
- doublecheck cache is available from the outside (made static)
- no more need to crawl hosts with lowest depth first, instead all hosts
which have only singleton entries are preferred to reduce the number of
files.
2014-04-17 12:52:54 +02:00
Michael Peter Christen
1aea01fe5b
fix for Table in case that requested file does not exist and paths also
do not exist
2014-04-17 12:44:05 +02:00
reger
710054bb37
implement gzip input handling directly in defaultservlet
(making reference to legacy httpdemon obsolete)
2014-04-17 03:20:29 +02:00