Commit Graph

8233 Commits

Author SHA1 Message Date
Michael Peter Christen
096c17e7cd added test code 2012-02-25 12:42:13 +01:00
Lotus
84f506da68 update installed jre version 2012-02-24 09:11:48 +01:00
Michael Peter Christen
6e51a00a2f Revert "fix for page navigation: show only as many pages as are available for given navigation constraints, not as given by total results size"
This reverts commit 73f5a9e8b3.
2012-02-24 02:46:56 +01:00
Michael Peter Christen
73f5a9e8b3 fix for page navigation: show only as many pages as are available for
given navigation constraints, not as given by total results size
2012-02-24 02:31:03 +01:00
Michael Peter Christen
9c51dc0f13 fixed a bug with navigation: if a navigation was applied to file type or
protocol, then it was not possible to remove that again. This is the fix
for that.
2012-02-24 02:28:40 +01:00
Michael Peter Christen
665626a51b catch OOM errors during scanning 2012-02-24 02:15:27 +01:00
Michael Peter Christen
8bfc987374 enhanced hint how to enter file:// urls 2012-02-24 02:14:54 +01:00
Michael Peter Christen
f838997126 updated commons io from 2.0.1 to 2.1 2012-02-24 01:35:01 +01:00
Michael Peter Christen
1cd711d005 added classes for citation references (for new citation ranking) 2012-02-24 01:07:15 +01:00
Michael Peter Christen
eeb57ae824 updated http client libraries 2012-02-24 01:06:30 +01:00
Michael Peter Christen
33a405dab8 ipv6 bugfix 2012-02-24 00:50:46 +01:00
Michael Peter Christen
c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148 2012-02-24 00:38:57 +01:00
Michael Peter Christen
edaa8ac94c Merge commit 'e15e633a0128b8d31011283a65b4ef26a6dddcd8' 2012-02-23 10:07:13 +01:00
reger
e15e633a01 Bugfix for IE9 (doesn't accept an html form within a form)
changes of API schedule row data: changed from per-row input forms to unique field names
using the row pk.
Fix for issue 96 http://bugs.yacy.net/view.php?id=96

IE9-64bit doesn't interpret an iframe with an align parameter as desired and
misaligns the following content (in CrawlProfileEditor_p.html)
2012-02-23 02:40:07 +01:00
Michael Peter Christen
716db3b79a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-02-23 01:26:09 +01:00
Michael Peter Christen
e0f1e7d904 added new citation reference data structure that shall be used for a
citation ranking
2012-02-23 01:22:29 +01:00
Michael Peter Christen
e18a4f6b74 more tolerant merge iterator 2012-02-23 01:21:24 +01:00
Michael Peter Christen
0d148c3353 more logging in resource observer 2012-02-23 01:20:42 +01:00
Michael Peter Christen
2fa037ae1d enhanced crawler 2012-02-23 01:20:24 +01:00
Lotus
43ffae6590 delete yacy.running after kill as requested in
http://forum.yacy-websuche.de/viewtopic.php?t=3835
2012-02-22 18:41:32 +01:00
Michael Peter Christen
e101c2e0e2 added changes from copperdust (submitted by email):
1. Improved and fixed language detection:
	1.1 Identificator.java - recognition fix (improved)
	1.2 DCEntry.java - fix (changed detection order because detection from the TLD is incorrect in many cases)
	1.3 MultiProtocolURI.java - fixed and enhanced language detection from the TLD (all currently used top-level domains; ccTLDs added but not tested).
2. Ukrainian language update.
3. Main Slavic languages langstats (tested and works fine).
2012-02-22 12:21:27 +01:00
Michael Peter Christen
58e08b1211 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-02-21 22:31:49 +01:00
Michael Peter Christen
a9b4d49b75 removed debug output 2012-02-21 22:31:14 +01:00
low012
2120db289a *) Small change which should solve a problem with the cgitb module in Python CGI scripts. 2012-02-14 20:54:19 +01:00
Lotus
ee89cf5ae5 fix must match filter for full domain crawl
allow:
http://www.example.com
http://www.example.com/
http://www.example.com/abc.html?xyz=q
block:
http://www.example.com.cn
http://www.example.com.cn/dsf
2012-02-07 16:13:13 +01:00
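The allow/block examples above come down to anchoring the host. A filter that only requires the host to appear somewhere in the URL, such as `.*www\.example\.com.*`, also accepts `http://www.example.com.cn`. Below is a minimal Java sketch. It assumes a regular-expression must-match filter evaluated against the whole URL with `Matcher.matches()`. The class and method names are hypothetical, and the pattern is not necessarily the one YaCy generates.

```java
import java.util.regex.Pattern;

/** Hypothetical helper building a must-match filter for a full-domain crawl. */
final class DomainFilterSketch {

    /**
     * Scheme, the literal host, an optional port, then either the end of the URL
     * or a path/query/fragment. Quoting the host stops '.' from matching any character.
     */
    static Pattern mustMatchForDomain(String host) {
        return Pattern.compile("https?://" + Pattern.quote(host) + "(:\\d+)?([/?#].*)?");
    }

    /** matches() requires the whole URL to match, so "www.example.com.cn" is rejected. */
    static boolean allowed(Pattern filter, String url) {
        return filter.matcher(url).matches();
    }
}
```

With `mustMatchForDomain("www.example.com")`, the three "allow" URLs listed in the commit match and the two "block" URLs do not. The `.cn` suffix is neither a port, a path separator, a query, a fragment nor the end of the input.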
reger
4f92389550 Merge branch 'master' of git://gitorious.org/yacy/rc1.git 2012-02-03 23:34:24 +01:00
Michael Peter Christen
8d63a5887c bugfixes 2012-02-02 23:38:23 +01:00
Michael Peter Christen
9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer, since it
no longer needs to prepare the pre-selected crawl list for monitoring.
As an effect:
- it is no longer possible to see the correct order of the next
to-be-crawled links, since that depends on the actual state of the
balancer stack at the moment another url is requested for loading
- the balancer works better, since the next url can be selected according
to the current situation and not according to a pre-selected order.
2012-02-02 21:33:42 +01:00
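The redesign selects the next URL from per-domain stacks at the moment it is requested, instead of reading it from a pre-sorted list. Below is a minimal Java sketch of that idea, under these assumptions:
- it keeps one FIFO queue per host (the commit calls them stacks);
- it enforces a fixed minimum delay between requests to the same host;
- it prefers the host that was accessed longest ago.

The class, its methods and this selection policy are hypothetical and are not YaCy's Balancer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical balancer: one URL queue per host, next URL chosen only when requested. */
final class DomainStackBalancerSketch {
    private final Map<String, Deque<String>> stacks = new LinkedHashMap<>();
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final long minDelayMillis;

    DomainStackBalancerSketch(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /** Stores a URL on its host's queue; no global order is computed here. */
    void push(String host, String url) {
        stacks.computeIfAbsent(host, h -> new ArrayDeque<>()).addLast(url);
    }

    /**
     * Picks the next URL from the current state: among hosts that have URLs and are not
     * inside their politeness delay, the one accessed longest ago (never-accessed hosts first).
     * Returns null if no host is eligible right now.
     */
    String next(long nowMillis) {
        String bestHost = null;
        long bestLast = Long.MAX_VALUE;
        for (Map.Entry<String, Deque<String>> e : stacks.entrySet()) {
            if (e.getValue().isEmpty()) continue;
            Long last = lastAccess.get(e.getKey());
            if (last != null && nowMillis - last < minDelayMillis) continue; // too soon for this host
            long key = (last == null) ? Long.MIN_VALUE : last;
            if (bestHost == null || key < bestLast) {
                bestHost = e.getKey();
                bestLast = key;
            }
        }
        if (bestHost == null) return null;
        lastAccess.put(bestHost, nowMillis);
        return stacks.get(bestHost).pollFirst();
    }

    /** Monitoring view: how many URLs each host holds, but not the future fetch order. */
    Map<String, Integer> snapshot() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        stacks.forEach((host, queue) -> sizes.put(host, queue.size()));
        return sizes;
    }
}
```

Because `next` decides only when it is called, `snapshot` can report how many URLs each domain holds but not the order in which they will be fetched. That is the trade-off the commit message describes.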
Michael Peter Christen
7e4e3fe5b6 free some memory after parsing html 2012-02-02 09:55:27 +01:00
Michael Peter Christen
4540174fe0 memory hacks 2012-02-02 07:37:00 +01:00
Michael Peter Christen
b4409cc803 small redesign of blob column index and usage 2012-02-02 06:43:57 +01:00
Michael Peter Christen
d5c1f2746e performance hack 2012-02-02 06:43:15 +01:00
Michael Peter Christen
803963aebd performance hack: better space growth in CharBuffer (speeds up the html parser)
2012-02-01 23:27:59 +01:00
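The commit message does not say which growth policy CharBuffer adopted. As an assumption, the sketch below shows geometric growth: each resize at least doubles the capacity. This keeps the total copying cost of repeated appends linear in the final length. The class name is hypothetical and does not reflect YaCy's CharBuffer implementation.

```java
import java.util.Arrays;

/** Hypothetical append-only char buffer with geometric (doubling) growth. */
final class GrowableCharBufferSketch {
    private char[] buf;
    private int length;

    GrowableCharBufferSketch(int initialCapacity) {
        buf = new char[Math.max(1, initialCapacity)];
    }

    /** Grows to at least twice the old capacity, or straight to what is needed if that is larger. */
    private void ensureCapacity(int needed) {
        if (needed <= buf.length) return;
        int doubled = buf.length > Integer.MAX_VALUE / 2 ? Integer.MAX_VALUE : buf.length * 2;
        buf = Arrays.copyOf(buf, Math.max(needed, doubled));
    }

    void append(char[] chars, int off, int len) {
        ensureCapacity(length + len);
        System.arraycopy(chars, off, buf, length, len);
        length += len;
    }

    void append(String s) {
        append(s.toCharArray(), 0, s.length());
    }

    int length() { return length; }

    int capacity() { return buf.length; }

    @Override
    public String toString() {
        return new String(buf, 0, length);
    }
}
```

Growing by a fixed increment of k chars would copy roughly n²/(2k) chars while filling n chars. Doubling keeps the total below about 2n.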
Michael Peter Christen
8b0920b0b5 tried to fix the ipv6 problem as reported in bug
but this did not solve all problems, because a bug in the apache http
client prevented it from working. Stack trace:
Caused by: java.lang.NumberFormatException: For input string: "1450:400c:c01:0:0:0:69"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Integer.parseInt(Integer.java:458)
	at java.lang.Integer.parseInt(Integer.java:499)
	at org.apache.http.client.utils.URIUtils.extractHost(URIUtils.java:310)
	at org.apache.http.impl.client.AbstractHttpClient.determineTarget(AbstractHttpClient.java:764)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
	at net.yacy.cora.protocol.http.HTTPClient.execute(HTTPClient.java:597)
	at net.yacy.cora.protocol.http.HTTPClient.getContentBytes(HTTPClient.java:558)
	at net.yacy.cora.protocol.http.HTTPClient.GETbytes(HTTPClient.java:341)
	at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:131)
	at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:74)
	at net.yacy.repository.LoaderDispatcher.loadInternal(LoaderDispatcher.java:274)
	at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:164)
	at net.yacy.repository.LoaderDispatcher.load(LoaderDispatcher.java:150)
	at net.yacy.repository.LoaderDispatcher.loadDocument(LoaderDispatcher.java:355)
	at getpageinfo_p.respond(getpageinfo_p.java:97)
2012-02-01 22:26:19 +01:00
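The stack trace shows `URIUtils.extractHost` handing an IPv6 fragment to `Integer.parseInt`. It is apparently everything after the first colon of an IPv6 literal that was not enclosed in brackets. The Java sketch below reproduces that failure mode with a naive split at the first colon. It contrasts this with a bracket-aware split following the RFC 3986 `[address]:port` form. The class and method names are hypothetical. The example address `2a00:1450:400c:c01:0:0:0:69` is an assumption, chosen so that the text after its first colon equals the string in the trace.

```java
/** Hypothetical host/port splitter illustrating the IPv6 failure in the trace above. */
final class HostPortSketch {

    /** Parsed authority; port is -1 when none was given. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Naive split at the first ':'; throws NumberFormatException for a bare IPv6 literal. */
    static HostPort naive(String authority) {
        int colon = authority.indexOf(':');
        if (colon < 0) {
            return new HostPort(authority, -1);
        }
        return new HostPort(authority.substring(0, colon),
                Integer.parseInt(authority.substring(colon + 1)));
    }

    /** Bracket-aware split: "[v6]:port", "[v6]", "host:port", "host", or a bare IPv6 literal. */
    static HostPort parse(String authority) {
        if (authority.startsWith("[")) {
            int close = authority.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unterminated IPv6 literal: " + authority);
            }
            String host = authority.substring(1, close);
            String rest = authority.substring(close + 1);
            if (rest.isEmpty()) {
                return new HostPort(host, -1);
            }
            if (!rest.startsWith(":")) {
                throw new IllegalArgumentException("unexpected text after IPv6 literal: " + authority);
            }
            return new HostPort(host, Integer.parseInt(rest.substring(1)));
        }
        int first = authority.indexOf(':');
        if (first < 0) {
            return new HostPort(authority, -1);
        }
        if (first != authority.lastIndexOf(':')) {
            // Several colons without brackets: assume a bare IPv6 address with no port.
            return new HostPort(authority, -1);
        }
        return new HostPort(authority.substring(0, first),
                Integer.parseInt(authority.substring(first + 1)));
    }
}
```

Treating an unbracketed host with several colons as a portless IPv6 address is a design choice of this sketch. Well-formed URLs must bracket IPv6 literals anyway.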
Michael Peter Christen
e2f8f263e8 changed storage of search words: keep order 2012-02-01 18:13:31 +01:00
Michael Peter Christen
ed39ef2890 changed generation of protocol information 2012-02-01 18:12:59 +01:00
Michael Peter Christen
0b67a0a5d8 added a column index for tables in blob files. This is heavily used
when receiving DHT submissions and when answering remote search
requests. Both events occurring together may have caused IO deadlocks, and this
commit is intended to fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
ffb72249ea added missing apicat.sh 2012-02-01 00:49:40 +01:00
Michael Peter Christen
c166eb68b6 fixes in solr schema file 2012-02-01 00:22:43 +01:00
Michael Peter Christen
2e5cd6a1b2 fixed parser extension deny list generation and usage 2012-02-01 00:15:59 +01:00
Michael Peter Christen
8bee1472c9 there is no noindex, only nofollow in links 2012-01-31 23:46:35 +01:00
Michael Peter Christen
5e18f54a8c added shell script to get a servlet. This is the same as apicall.sh, but it prints the result to stdout 2012-01-31 23:21:49 +01:00
Michael Peter Christen
3cd6dcd352 do not add new solr fields as activated fields 2012-01-31 22:21:48 +01:00
Michael Peter Christen
e3bb73c3d6 serialized some database access methods 2012-01-31 21:13:49 +01:00
Michael Peter Christen
9727015213 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-01-31 18:18:13 +01:00
Michael Peter Christen
7e728867e5 added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
2012-01-31 18:17:25 +01:00
david
f077b11d38 Merge branch 'master' of git://git.gitorious.org/yacy/rc1.git 2012-01-30 20:02:11 +01:00
Lotus
29675d9766 more labels on search options (usability) 2012-01-30 20:02:02 +01:00
Michael Peter Christen
355ecf330f reduced target file size to 64mb 2012-01-29 20:35:48 +01:00
reger
fa1f35b0c8 Merge rc1/master 2012-01-29 20:06:10 +01:00