Michael Peter Christen
4d5da75814
fix for parser problem if an <a>-tag is 'within' html tags with unclosed
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
2012-04-18 10:30:04 +02:00
Michael Peter Christen
eb2c8ffa62
display is not used any more
2012-04-17 12:30:14 +02:00
Michael Peter Christen
91a86f0b06
fixed to network graph testing
2012-04-17 11:46:14 +02:00
Michael Peter Christen
f31ad84d98
automatic generation of blacklist pattern, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2685&p=25305#p25305
2012-04-17 11:22:19 +02:00
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
2012-04-16 23:43:50 +02:00
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
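A minimal sketch of the check described in the commit above, assuming the decision is made by looking at the first bytes of the downloaded file: a gzip stream starts with the magic bytes 0x1f 0x8b, so a `.tar.gz` file without them has presumably already been decompressed in transit. The class and method names are hypothetical and not taken from the YaCy sources.

```java
import java.nio.charset.StandardCharsets;

public class ReleaseFileName {

    // gzip streams start with the two magic bytes 0x1f 0x8b
    static boolean looksGzipped(byte[] head) {
        return head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }

    // returns the name the file should carry, given its first bytes:
    // a ".tar.gz" file that is no longer gzipped is renamed to ".tar"
    static String correctedName(String name, byte[] head) {
        if (name.endsWith(".tar.gz") && !looksGzipped(head)) {
            return name.substring(0, name.length() - ".gz".length());
        }
        return name;
    }

    public static void main(String[] args) {
        byte[] gzipHead = {(byte) 0x1f, (byte) 0x8b, 8, 0};
        byte[] plainHead = "yacy/".getBytes(StandardCharsets.US_ASCII);
        System.out.println(correctedName("release.tar.gz", gzipHead));
        System.out.println(correctedName("release.tar.gz", plainHead));
    }
}
```

The actual rename on disk could then be done with `java.nio.file.Files.move`; it is left out here to keep the sketch free of file system side effects.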
reger
06951ef751
remove heuristic scroogle from search option help text in index.html
2012-04-16 04:00:04 +02:00
Michael Peter Christen
e377092198
fix to xml output format
2012-04-13 09:02:18 +02:00
Michael Christen
41be98dc9d
extended webstructure api to show outgoing links together with incoming
links
2012-04-13 11:53:34 +02:00
Michael Christen
02e4dedff2
fix to url citation collection
2012-04-13 11:52:59 +02:00
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
2012-04-13 07:09:15 +02:00
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method. (ranking coefficient fixed, needs to be made configurable)
2012-04-13 06:47:33 +02:00
Michael Christen
8f89c8ef07
added information about inbound, outbound and citation links into
yacydoc api servlet
2012-03-31 07:38:49 +02:00
Michael Christen
71649a1296
added an api to retrieve the new citation.index with the
webstructure.xml api. This api will respond with details about a single
URL if requested with 'webstructure.xml?about=[url|urlhash|host]'.
2012-03-29 17:22:31 +02:00
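A small sketch of how a request to the `webstructure.xml?about=` api from the commit above might be assembled. Only the servlet name and the `about` parameter come from the commit message; the base address `http://localhost:8090/api/` and the helper name `aboutUrl` are assumptions for illustration and have to be adapted to the peer in question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WebstructureQuery {

    // builds the request url; 'about' may be a url, a url hash or a host name
    // and is percent-encoded so that a full url can be passed as a parameter
    static String aboutUrl(String apiBase, String about) {
        return apiBase + "webstructure.xml?about="
                + URLEncoder.encode(about, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // the base address is an assumption, not taken from the commit
        String base = "http://localhost:8090/api/";
        System.out.println(aboutUrl(base, "example.org"));
        System.out.println(aboutUrl(base, "http://example.org/index.html"));
    }
}
```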
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
2012-03-29 17:20:14 +02:00
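The idea of a reverse link index from the commit above can be sketched in a few lines. This is an in-memory illustration only, assuming plain url strings as keys; the real citation.index is stored in blob files and reuses the word index structure, which this sketch does not attempt to reproduce. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReverseLinkIndex {

    // maps a link target to the set of pages that link to it
    private final Map<String, Set<String>> citedBy = new HashMap<>();

    // records one anchor: 'source' contains a link to 'target'
    void addLink(String source, String target) {
        citedBy.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
    }

    // all known pages that cite 'target'; empty if there are none
    Set<String> referrers(String target) {
        return citedBy.getOrDefault(target, Collections.emptySet());
    }

    public static void main(String[] args) {
        ReverseLinkIndex index = new ReverseLinkIndex();
        index.addLink("http://a.example/", "http://c.example/");
        index.addLink("http://b.example/", "http://c.example/");
        index.addLink("http://a.example/", "http://b.example/");
        System.out.println(index.referrers("http://c.example/"));
    }
}
```

The number of referrers per target is the kind of figure a citation ranking could start from.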
Michael Christen
22f05c83ff
fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
2012-03-28 21:50:00 +02:00
Lotus
3e61287326
some better feedback on properties change
2012-03-25 22:21:42 +02:00
Lotus
96ac95cff9
added hint how to change integration options
2012-03-23 17:02:50 +01:00
Thomas
4f61b8fd82
Fixes for compare-search
2012-03-21 21:43:47 +01:00
Thomas
e0680de7b3
Remove Scroogle from compare-search, Scroogle is dead
2012-03-20 23:00:06 +01:00
Lotus
78f0d8f046
no focus on preview frames for search integration
fixes bug http://bugs.yacy.net/view.php?id=161
2012-03-17 21:10:29 +01:00
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
2012-03-17 20:43:19 +01:00
Lotus
e14eb9de82
checkalive.sh: try to fetch only once (default: 20)
2012-03-12 09:30:44 +01:00
Lotus
7792ac6406
fix links & bug #163
2012-03-10 10:59:56 +01:00
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
2012-02-28 13:18:46 +01:00
Michael Peter Christen
aba9b1bfa0
better names for elements of a linked graph
2012-02-27 21:27:17 +01:00
Michael Peter Christen
0cc0290978
bugfix for a must-not-match pattern check. This bug did not make the
check semantically wrong, but a trick that skips the IP lookup when the
filter is not used did not work. With this fix, crawling gets a huge
speed boost for noload urls!
2012-02-27 00:52:44 +01:00
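The trick mentioned in the commit above can be illustrated as follows. This is a sketch under assumptions, not the YaCy code: it assumes the must-not-match pattern is applied to the IP address of the host, that an unused filter is represented by `null`, and that the lookup is passed in lazily as a `Supplier`. All names are hypothetical.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

public class LazyIpFilter {

    // returns true if the url must be rejected by the must-not-match filter;
    // the (expensive) ip lookup is only performed if a filter is actually set
    static boolean rejected(Pattern ipMustNotMatch, Supplier<String> ipLookup) {
        if (ipMustNotMatch == null) {
            return false; // filter not used: skip the lookup entirely
        }
        return ipMustNotMatch.matcher(ipLookup.get()).matches();
    }

    public static void main(String[] args) {
        // stand-in for a DNS lookup; a real one would be slow
        Supplier<String> lookup = () -> "10.0.0.1";
        System.out.println(rejected(null, lookup));
        System.out.println(rejected(Pattern.compile("10\\..*"), lookup));
    }
}
```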
Michael Peter Christen
2fc8ecee36
ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
and the following test program:
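import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // adds c elements, calling size() before each add;
    // returns the elapsed time in milliseconds
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // c doubles in every round; the original loop bound of 100 would
        // overflow the int after 31 doublings, 16 rounds show the effect
        for (int i = 0; i < 16; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);
            System.out.println("count = " + c
                    + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}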
2012-02-27 00:42:32 +01:00
Michael Peter Christen
8aba045ba1
if a new pop-up page is set in config portal, then this page applies
also to the default page configuration for the httpd if no path is
given.
2012-02-26 20:53:32 +01:00
Michael Peter Christen
fa7b3481b3
better navigation in file search: fewer results on the first try, but
much faster. after the first search is done, buttons appear to get more
results for the same search
2012-02-26 17:32:45 +01:00
reger
5fd2c30318
adjust Netbeans project class path settings to updated httpclient and commons jars
2012-02-26 00:06:57 +01:00
reger
aae75def69
fix: prevent logging of Solr doc content
with an attached Solr server, transferred content is written to the log
despite log level = off
fixed naming of httpclient logger
2012-02-26 00:04:25 +01:00
Michael Peter Christen
8c06925984
animation of the web structure picture
2012-02-25 15:42:29 +01:00
Michael Peter Christen
898fa7c3f3
use tld heuristic to check if a domain is local or global
2012-02-25 15:41:20 +01:00
Michael Peter Christen
213c8d97f2
use fewer processes in process pool
2012-02-25 14:07:20 +01:00
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
2012-02-25 14:07:02 +01:00
Michael Peter Christen
9c51db4243
Release_1.02
2012-02-25 12:59:19 +01:00
Michael Peter Christen
36e4d82b27
changed ranking
2012-02-25 12:58:12 +01:00
Michael Peter Christen
99c74699de
removed scroogle (scroogle is dead)
2012-02-25 12:57:59 +01:00
Michael Peter Christen
f7ed050771
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-02-25 12:44:02 +01:00
Michael Peter Christen
096c17e7cd
added test code
2012-02-25 12:42:13 +01:00
Lotus
84f506da68
update installed jre version
2012-02-24 09:11:48 +01:00
Michael Peter Christen
6e51a00a2f
Revert "fix for page navigation: show only as many pages as are available for given navigation constraints, not as given by total results size"
This reverts commit 73f5a9e8b3.
2012-02-24 02:46:56 +01:00
Michael Peter Christen
73f5a9e8b3
fix for page navigation: show only as many pages as are available for
given navigation constraints, not as given by total results size
2012-02-24 02:31:03 +01:00
Michael Peter Christen
9c51dc0f13
fixed a bug with navigation: if a navigation was applied to file type or
protocol, then it was not possible to remove that again. This is the fix
for that.
2012-02-24 02:28:40 +01:00
Michael Peter Christen
665626a51b
catch OOM errors during scanning
2012-02-24 02:15:27 +01:00
Michael Peter Christen
8bfc987374
enhanced hint how to enter file:// urls
2012-02-24 02:14:54 +01:00
Michael Peter Christen
f838997126
updated commons io from 2.0.1 to 2.1
2012-02-24 01:35:01 +01:00
Michael Peter Christen
1cd711d005
added classes for citation references (for new citation ranking)
2012-02-24 01:07:15 +01:00
Michael Peter Christen
eeb57ae824
updated http client libraries
2012-02-24 01:06:30 +01:00