sixcooler
830057d788
lower Segment-size (hope to get Segments of 10GB)
...
see:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5216&p=30036#p30034
2014-05-19 17:55:03 +02:00
orbiter
c028ae9b09
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-05-18 21:21:17 +02:00
reger
e31493e139
"Use remote proxy for yacy" has no function, remove option and related config item
...
see/fix bug http://mantis.tokeek.de/view.php?id=23
http://mantis.tokeek.de/view.php?id=189
2014-05-17 23:36:59 +02:00
orbiter
181784a5cb
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-05-15 08:06:59 +02:00
reger
0587077d06
cleanup obsolete and not used serverswitch Authentify code
...
as auth is mostly delegated to Jetty container.
2014-05-14 23:13:49 +02:00
orbiter
c9f66be20b
move unnecessary nested else out of condition
2014-05-13 22:31:12 +02:00
orbiter
0d8072aa99
removed warnings
2014-05-13 22:29:05 +02:00
orbiter
88f4af90da
removed warnings
2014-05-13 22:27:31 +02:00
orbiter
0f425e01ca
another circle computation enhancement
2014-05-13 21:30:47 +02:00
reger
a8d162810c
Exclude = from percent-encoding in MultiProtocolURL
...
fix http://mantis.tokeek.de/view.php?id=185 and http://mantis.tokeek.de/view.php?id=280
2014-05-13 02:33:35 +02:00
reger
024f8e9b33
fix truncated urls containing ","
...
addressing http://mantis.tokeek.de/view.php?id=58
Exclude comma from percent-encoding in MultiProtocolURL (see RFC 1738 2.2 and RFC 3986 2.2)
2014-05-13 01:50:15 +02:00
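Note on the two percent-encoding commits above (a8d162810c and 024f8e9b33): the following is only a minimal sketch of the idea, not the MultiProtocolURL code. It assumes an escape routine that keeps ASCII letters, digits and the RFC 3986 unreserved marks, and additionally leaves "," and "=" untouched. The class name PercentEncodeSketch and the method escape are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not taken from the YaCy source.
final class PercentEncodeSketch {

    // Characters left unescaped in this sketch besides letters and digits:
    // the RFC 3986 unreserved marks plus ',' and '='.
    private static final String KEEP = "-._~,=";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        // Encode byte-wise over UTF-8 so non-ASCII characters become %XX sequences.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || KEEP.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a,b=c d"));
    }
}
```

With such a rule a URL part like "a,b=c d" keeps its comma and equals sign and only the space is escaped.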
Michael Peter Christen
9112f0a2df
enhanced circle tool initialization
2014-05-12 16:21:24 +02:00
Michael Peter Christen
a1ac4c3b76
automatically clear graphics cache
2014-05-12 15:45:25 +02:00
Michael Peter Christen
505f58c79c
enhanced circle computation time and memory footprint
2014-05-12 15:34:56 +02:00
reger
cd8c0dbda9
assign serialVersionUID for proxyservlet, too.
2014-05-11 03:51:47 +02:00
reger
b300d7f4ce
set serialVersionUID on urlproxyservlet to skip compiler warning
...
- remove commented out code
2014-05-11 03:31:07 +02:00
reger
e9060d31bd
update to Jetty 9
...
besides adjustments in code it makes the servlet settings in web.xml significant.
This applies to solr, gsa and proxy servlet. There is no longer a default setup in code during init (as jetty 9 checks for double definition).
2014-05-11 01:53:11 +02:00
reger
1432a817dd
respect "index media" switched off in CrawlStartExpert.html
...
fix http://mantis.tokeek.de/view.php?id=64
2014-05-08 22:21:24 +02:00
orbiter
39e1913585
next development step: migration to java 1.7
...
This includes also a small code change to test generic type inference, a
java 1.7 feature
2014-05-08 07:41:11 +02:00
Michael Peter Christen
4e734815e8
enhanced snippets: remove lines which are identical to the title and
...
choose longer versions if possible. Prefer the description part.
2014-05-06 16:48:50 +02:00
Michael Peter Christen
e84e07399a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-05-06 14:51:57 +02:00
orbiter
89f76da24b
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-05-06 05:38:38 +02:00
sixcooler
390f03e041
do not check for segments-count on optimize:
...
this is also done in Solr and our getSegmentsCount() does not return
up-to-date values
2014-05-05 13:24:41 +02:00
reger
8a7c68e4c7
content of surrogates/out never accessed (remove)
...
After import the content is never accessed but may take up a lot of disk space,
also the getLoadedOAIServer (which lists the files in surrogate out) is not used.
Making the surrogate.out obsolete. Removed keeping of xmls after import.
2014-05-04 09:29:07 +02:00
sixcooler
b8cee9b7d8
remove tables from tabletracker on close to avoid lots of dead entries in
...
/PerformanceMemory_p.html
2014-05-02 22:55:47 +02:00
reger
1600414450
fix NPE on continuing crawls after YaCy restart
...
(Agent is then null)
2014-05-02 19:32:09 +02:00
Michael Peter Christen
229f2248b8
added configuration option for maximum load and minimum ram for
...
postprocessing
2014-04-30 13:26:32 +02:00
orbiter
f15c832587
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-04-30 07:42:52 +02:00
Marc Nause
c97da1a0d8
First draft of a blacklist API.
2014-04-30 00:48:38 +02:00
Michael Peter Christen
d4f65833a1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-29 19:51:01 +02:00
Michael Peter Christen
c1c1be8f02
fix for slow crawling and better logging in balancer
2014-04-29 19:50:33 +02:00
Michael Peter Christen
3acf416335
npe fix
2014-04-29 19:24:05 +02:00
reger
2eb7682772
add html5 audio/video <source> tag to html content scraper
...
- <source src=.. type=..> tag content is added to embed collection
2014-04-29 00:41:29 +02:00
reger
0b6db04e40
fix contentscraper img height/width parsing
...
prevent numberformat exception on common "100px" property
- include in test case
2014-04-28 04:59:47 +02:00
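Note on commit 0b6db04e40 above: a value such as "100px" makes a plain Integer.parseInt call throw a NumberFormatException. The sketch below shows one tolerant way to read such an attribute. It is an assumption about the approach, not the actual content scraper code; the class DimensionSketch, the method parseDimension and the fallback value -1 are hypothetical.

```java
import java.util.Locale;

// Hypothetical illustration; names are not taken from the YaCy source.
final class DimensionSketch {

    /** Parses an img width/height attribute; returns -1 if no pixel value can be read. */
    static int parseDimension(String value) {
        if (value == null) return -1;
        String s = value.trim().toLowerCase(Locale.ROOT);
        // Accept the common "100px" form by dropping the unit suffix.
        if (s.endsWith("px")) {
            s = s.substring(0, s.length() - 2).trim();
        }
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // Anything else ("auto", "50%", empty string) is treated as unknown.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDimension("100px"));
    }
}
```

Returning a sentinel instead of throwing keeps one odd attribute from aborting the parse of the whole page.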
reger
ffc5b75c73
optimize and fix lat / lon assignment
2014-04-27 20:52:06 +02:00
reger
9313447de2
reimplement tighter lat/lon calc in URIMetadataNode
...
from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272
2014-04-27 18:20:33 +02:00
reger
d812f80784
add exit proxy link to UrlProxy
...
on proxied pages a link to exit proxy is added to top of page.
Link text can be configured in web.xml init-parameter (see default/web.xml). If missing no link is displayed.
2014-04-26 22:27:59 +02:00
reger
78d08998db
throw MalformedURLException on unknown protocol
...
on other than the supported http https ftp file smb \\ mailto
2014-04-26 01:30:51 +02:00
reger
bb8181b2be
fix: resolve url without path but searchpart
...
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47
added test case for getHost
2014-04-25 20:15:55 +02:00
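Note on commit bb8181b2be above: the fix amounts to ending the host at the first "/" or "?", whichever comes first, and defaulting the path to "/". The following sketch illustrates that rule under simplifying assumptions (the input is the part after "scheme://", no port, user info or fragment handling). The class HostSplitSketch and the method split are hypothetical and not the MultiProtocolURL implementation.

```java
// Hypothetical illustration; names are not taken from the YaCy source.
final class HostSplitSketch {

    /** Returns {host, path, query} for the text following "scheme://". */
    static String[] split(String afterScheme) {
        // The host ends at the first '/' or '?', whichever comes first.
        int end = afterScheme.length();
        for (int i = 0; i < afterScheme.length(); i++) {
            char c = afterScheme.charAt(i);
            if (c == '/' || c == '?') {
                end = i;
                break;
            }
        }
        String host = afterScheme.substring(0, end);
        String rest = afterScheme.substring(end);

        // Separate the search part from the path.
        String query = "";
        int q = rest.indexOf('?');
        if (q >= 0) {
            query = rest.substring(q + 1);
            rest = rest.substring(0, q);
        }
        // A missing path is normalized to "/".
        String path = rest.startsWith("/") ? rest : "/";
        return new String[] { host, path, query };
    }

    public static void main(String[] args) {
        String[] parts = split("yacy.net?q=test");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
    }
}
```

For the example from the commit message, "yacy.net?q=test" splits into host "yacy.net", path "/" and search part "q=test".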
orbiter
a3542f29b4
npe fix
2014-04-25 09:26:20 +02:00
orbiter
c48d2a2a02
npe fix
2014-04-25 09:23:10 +02:00
reger
121d25be38
recover sax fatal error on OAI-PMH import of xml with entity error
...
this allows to continue loading next resumptionToken even if import file caused sax parser error
fix http://mantis.tokeek.de/view.php?id=63
2014-04-25 01:05:28 +02:00
reger
81dc2aa536
add current css to HTMLResponseWriter to fix metadata view
...
(using css from metas.template except js links)
2014-04-23 23:41:10 +02:00
orbiter
2fd8a0ead6
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-04-23 23:13:23 +02:00
orbiter
8e5ce7cd51
fixed a situation where finished crawls had not been detected.
2014-04-23 23:13:07 +02:00
orbiter
2f63bd0261
enhanced Host Balancer strategy: fair round robin
2014-04-23 23:11:37 +02:00
orbiter
0c88a32c36
do not apply lazy value instantiation for numeric or boolean values
...
because that is misleading and confusing in case of 0- or false-values
and may cause NPEs in retrieval functions.
2014-04-23 08:41:36 +02:00
orbiter
8e04030596
in case of short memory, do not cut down robinson peers to 1, just
...
reduce by 50%
2014-04-23 08:37:19 +02:00
reger
86f6975edc
exclude html tags in in/outboundlinks_anchortext_txt parsed text
...
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
2014-04-23 00:55:16 +02:00
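Note on commit 86f6975edc above: the effect on anchor text such as "<span>text</span>" can be illustrated with a simple regex-based tag removal. This is only a rough sketch under the assumption that a regular expression is good enough for short anchor fragments; the actual parser works differently and still evaluates inline img tags, which this sketch does not. The class AnchorTextSketch and the method stripTags are hypothetical.

```java
// Hypothetical illustration; names are not taken from the YaCy source.
final class AnchorTextSketch {

    /** Removes everything that looks like an HTML tag and trims the result. */
    static String stripTags(String anchorHtml) {
        return anchorHtml.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<span>text</span>"));
    }
}
```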
orbiter
ccb1864d55
catch IllegalArgumentException for wrong process types (that is needed
...
for migrations when new process types are introduced or disappear)
2014-04-22 23:14:05 +02:00