Commit Graph

8410 Commits

Author SHA1 Message Date
Michael Christen
71649a1296 added an api to retrieve the new citation.index with the
webstructure.xml api. This api will respond with details about a single
URL if requested with 'webstructure.xml?about=[url|urlhash|host]'.
2012-03-29 17:22:31 +02:00
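As a rough illustration of the request format named in commit 71649a1296, the sketch below builds a `webstructure.xml?about=...` URL. Only the file name and the `about` parameter come from the commit message; the peer address `http://localhost:8090`, the class name and the helper name are assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of YaCy: builds a request URL for the
// webstructure.xml api described in the commit message.
class WebStructureRequest {

    // base is the address of a peer, e.g. "http://localhost:8090" (assumed here).
    // about may be a url, a url hash or a host, as the commit message states.
    static String aboutUrl(String base, String about) {
        try {
            return base + "/webstructure.xml?about=" + URLEncoder.encode(about, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8090"; // assumption: address of a local peer
        System.out.println(aboutUrl(base, "example.org"));
        System.out.println(aboutUrl(base, "http://example.org/index.html"));
    }
}
```

The value is percent-encoded so that a full URL can be passed as the `about` argument without breaking the query string.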
Michael Christen
8fc86fe397 added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
2012-03-29 17:20:14 +02:00
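The reverse link index described in commit 8fc86fe397 can be pictured with a small in-memory sketch: every link is stored under its target, so all pages citing a URL can be looked up directly. This is only an illustration of the idea; it is not the on-disk citation.index blob format, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory reverse link index: target url -> set of citing urls.
class ReverseLinkIndex {

    private final Map<String, Set<String>> citations = new HashMap<>();

    // Record that the page at source links to the page at target.
    void addLink(String source, String target) {
        citations.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
    }

    // All known pages that link to target (empty if none are known).
    Set<String> citedBy(String target) {
        return citations.getOrDefault(target, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        ReverseLinkIndex index = new ReverseLinkIndex();
        index.addLink("http://a.example/", "http://c.example/");
        index.addLink("http://b.example/", "http://c.example/");
        System.out.println(index.citedBy("http://c.example/"));
    }
}
```

A ranking could then, for example, weight a page by the number of distinct pages citing it, which is `citedBy(url).size()` in this sketch.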
Michael Christen
22f05c83ff fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
2012-03-28 21:50:00 +02:00
Lotus
3e61287326 some better feedback on properties change 2012-03-25 22:21:42 +02:00
Lotus
96ac95cff9 added hint how to change integration options 2012-03-23 17:02:50 +01:00
Thomas
4f61b8fd82 Fixes for compare-search 2012-03-21 21:43:47 +01:00
Thomas
e0680de7b3 Remove Scroogle from compare-search, Scroogle is dead 2012-03-20 23:00:06 +01:00
Lotus
78f0d8f046 no focus on preview frames for search integration
fixes bug http://bugs.yacy.net/view.php?id=161
2012-03-17 21:10:29 +01:00
Lotus
0b3f39136e allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
2012-03-17 20:43:19 +01:00
Lotus
e14eb9de82 checkalive.sh: try to fetch only once (default: 20) 2012-03-12 09:30:44 +01:00
Lotus
7792ac6406 fix links & bug #163 2012-03-10 10:59:56 +01:00
Michael Peter Christen
532c7cf827 added physics experiment to the graph plotter. not active by default 2012-02-28 13:18:46 +01:00
Michael Peter Christen
aba9b1bfa0 better names for elements of a linked graph 2012-02-27 21:27:17 +01:00
Michael Peter Christen
0cc0290978 bugfix for a must-not-match pattern check. This bug did not make the
check semantically wrong, but a trick that should prevent an IP lookup
when the filter is not used did not work. With this bugfix,
crawling gets a huge speed boost for noload urls!
2012-02-27 00:52:44 +01:00
Michael Peter Christen
2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

and the following test program:

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // Adds c elements to q, calling q.size() before every add,
    // and returns the elapsed time in milliseconds.
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // c doubles every round; interrupt the run once the trend is clear
        for (int i = 0; i < 100; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);

            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
2012-02-27 00:42:32 +01:00
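The message of commit 2fc8ecee36 does not say how the code was changed. One common way to avoid the linear-time `size()` of ConcurrentLinkedQueue is to keep a separate counter next to the queue; the sketch below shows that technique under the assumption that an approximate size is acceptable. The class `CountedQueue` is hypothetical and is not taken from the YaCy sources.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: a ConcurrentLinkedQueue with a constant-time size().
class CountedQueue<E> {

    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<E>();
    private final AtomicInteger count = new AtomicInteger();

    void add(E e) {
        queue.add(e);
        count.incrementAndGet();
    }

    // Returns the head of the queue, or null if the queue is empty.
    E poll() {
        E e = queue.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    // Reads the counter instead of traversing the queue.
    int size() {
        return count.get();
    }

    public static void main(String[] args) {
        CountedQueue<Integer> q = new CountedQueue<Integer>();
        for (int i = 0; i < 1000; i++) {
            q.add(q.size());
        }
        System.out.println("size = " + q.size());
    }
}
```

Under concurrent access the counter can briefly differ from the real queue length, because the queue operation and the counter update are two separate steps. Where an exact, constant-time size is required, LinkedBlockingQueue from the test above is an alternative.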
Michael Peter Christen
8aba045ba1 if a new pop-up page is set in config portal, then this page applies
also to the default page configuration for the httpd if no path is
given.
2012-02-26 20:53:32 +01:00
Michael Peter Christen
fa7b3481b3 better navigation in file search: fewer results on the first try, but much
faster. After the first search is done, buttons appear to get more
results for the same search
2012-02-26 17:32:45 +01:00
reger
5fd2c30318 adjust Netbeans project class path settings to updated httpclient and commons jars 2012-02-26 00:06:57 +01:00
reger
aae75def69 fix: prevent logging of Solr doc content
with a Solr server attached, transferred content was written to the log despite
log level = off
fixed naming of httpclient logger
2012-02-26 00:04:25 +01:00
Michael Peter Christen
8c06925984 animation of the web structure picture 2012-02-25 15:42:29 +01:00
Michael Peter Christen
898fa7c3f3 use tld heuristic to check if a domain is local or global 2012-02-25 15:41:20 +01:00
Michael Peter Christen
213c8d97f2 use fewer processes in process pool 2012-02-25 14:07:20 +01:00
Michael Peter Christen
c639248c23 protection against strange answers from remote peers during search 2012-02-25 14:07:02 +01:00
Michael Peter Christen
9c51db4243 Release_1.02 2012-02-25 12:59:19 +01:00
Michael Peter Christen
36e4d82b27 changed ranking 2012-02-25 12:58:12 +01:00
Michael Peter Christen
99c74699de removed scroogle (scroogle is dead) 2012-02-25 12:57:59 +01:00
Michael Peter Christen
f7ed050771 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-02-25 12:44:02 +01:00
Michael Peter Christen
096c17e7cd added test code 2012-02-25 12:42:13 +01:00
Lotus
84f506da68 update installed jre version 2012-02-24 09:11:48 +01:00
Michael Peter Christen
6e51a00a2f Revert "fix for page navigation: show only as much pages as are available for given navigation constraints, not as given by total results size"
This reverts commit 73f5a9e8b3.
2012-02-24 02:46:56 +01:00
Michael Peter Christen
73f5a9e8b3 fix for page navigation: show only as much pages as are available for
given navigation constraints, not as given by total results size
2012-02-24 02:31:03 +01:00
Michael Peter Christen
9c51dc0f13 fixed a bug with navigation: if a navigation was applied to file type or
protocol, then it was not possible to remove that again. This is the fix
for that.
2012-02-24 02:28:40 +01:00
Michael Peter Christen
665626a51b catch OOM errors during scanning 2012-02-24 02:15:27 +01:00
Michael Peter Christen
8bfc987374 enhanced hint how to enter file:// urls 2012-02-24 02:14:54 +01:00
Michael Peter Christen
f838997126 updated commons io from 2.0.1 to 2.1 2012-02-24 01:35:01 +01:00
Michael Peter Christen
1cd711d005 added classes for citation references (for new citation ranking) 2012-02-24 01:07:15 +01:00
Michael Peter Christen
eeb57ae824 updated http client libraries 2012-02-24 01:06:30 +01:00
Michael Peter Christen
33a405dab8 ipv6 bugfix 2012-02-24 00:50:46 +01:00
Michael Peter Christen
c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148 2012-02-24 00:38:57 +01:00
Michael Peter Christen
edaa8ac94c Merge commit 'e15e633a0128b8d31011283a65b4ef26a6dddcd8' 2012-02-23 10:07:13 +01:00
reger
e15e633a01 Bugfix for IE9 (doesn't accept html form within form)
changed the form inputs of the API schedule row data to unique field names
using row pk.
Fix for issue 96 http://bugs.yacy.net/view.php?id=96

IE9-64bit doesn't interpret an iframe with align parameter as desired and
misaligns the following content (in CrawlProfileEditor_p.html)
2012-02-23 02:40:07 +01:00
Michael Peter Christen
716db3b79a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-02-23 01:26:09 +01:00
Michael Peter Christen
e0f1e7d904 added new citation reference data structure that shall be used for a
citation ranking
2012-02-23 01:22:29 +01:00
Michael Peter Christen
e18a4f6b74 more tolerant merge iterator 2012-02-23 01:21:24 +01:00
Michael Peter Christen
0d148c3353 more logging in resource observer 2012-02-23 01:20:42 +01:00
Michael Peter Christen
2fa037ae1d enhanced crawler 2012-02-23 01:20:24 +01:00
Lotus
43ffae6590 delete yacy.running after kill as requested in
http://forum.yacy-websuche.de/viewtopic.php?t=3835
2012-02-22 18:41:32 +01:00
Michael Peter Christen
e101c2e0e2 added changes from copperdust (submitted by email):
1. Improved and fixed language detection:
	1.1 Identificator.java - recognition fix (improved)
	1.2 DCEntry.java - fix (changed detection order because detection from
tld is incorrect in many cases)
	1.3 MultiProtocolURI.java - fixed and enhanced language from tld
detection (all currently used top-level domains; ccTLD added but not
tested).
2. Ukrainian language update.
3. Main Slavic languages langstats (tested and works fine).
2012-02-22 12:21:27 +01:00
Michael Peter Christen
58e08b1211 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-02-21 22:31:49 +01:00
Michael Peter Christen
a9b4d49b75 removed debug output 2012-02-21 22:31:14 +01:00