Commit Graph

502 Commits

Author SHA1 Message Date
Michael Peter Christen
0cf3d36eae more tolerance in case of corrupted file 2012-05-11 20:46:50 +02:00
Michael Peter Christen
34f4225d7e less 'wellformed' calls without asserts 2012-05-08 23:24:39 +02:00
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Christen
e32055aa15 added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
2012-04-13 07:09:15 +02:00
Michael Peter Christen
2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

and the following test program:

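import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // adds c elements to q, calling q.size() before each add;
    // returns the elapsed time in milliseconds
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // c doubles each round, so later rounds get very slow;
        // interrupt the run once the trend is visible
        for (int i = 0; i < 100; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);

            System.out.println("count = " + c
                + ": ArrayBlockingQueue = " + t1
                + ", LinkedBlockingQueue = " + t2
                + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}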
2012-02-27 00:42:32 +01:00
Michael Peter Christen
213c8d97f2 use fewer processes in process pool 2012-02-25 14:07:20 +01:00
Michael Peter Christen
c639248c23 protection against strange answers from remote peers during search 2012-02-25 14:07:02 +01:00
Michael Peter Christen
1cd711d005 added classes for citation references (for new citation ranking) 2012-02-24 01:07:15 +01:00
Michael Peter Christen
e0f1e7d904 added new citation reference data structure that shall be used for a
citation ranking
2012-02-23 01:22:29 +01:00
Michael Peter Christen
e18a4f6b74 more tolerant merge iterator 2012-02-23 01:21:24 +01:00
Michael Peter Christen
7e4e3fe5b6 free some memory after parsing html 2012-02-02 09:55:27 +01:00
Michael Peter Christen
4540174fe0 memory hacks 2012-02-02 07:37:00 +01:00
Michael Peter Christen
b4409cc803 small redesign of blob column index and usage 2012-02-02 06:43:57 +01:00
Michael Peter Christen
d5c1f2746e performance hack 2012-02-02 06:43:15 +01:00
Michael Peter Christen
803963aebd performance hack: better space growth in CharBuffer (speeds up html
parser)
2012-02-01 23:27:59 +01:00
Michael Peter Christen
e2f8f263e8 changed storage of search words: keep order 2012-02-01 18:13:31 +01:00
Michael Peter Christen
0b67a0a5d8 added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
e3bb73c3d6 serialized some database access methods 2012-01-31 21:13:49 +01:00
Michael Peter Christen
2ea585d616 fix for host navigator 2012-01-26 18:10:34 +01:00
Michael Peter Christen
ef78f22ee1 performance hack 2012-01-25 12:48:48 +01:00
Michael Peter Christen
a02fdf8625 better error messages 2012-01-23 00:47:25 +01:00
Michael Peter Christen
c6ba44468e timeout = 5000 instead of 3000 2012-01-23 00:45:32 +01:00
low012
8776b84c10 *) small fix to make password change function of reconfigureYACY.sh work
again
2012-01-17 20:43:19 +01:00
Michael Peter Christen
4901cee3cc suppress auto-tagged subject entries when sending out or receiving
metadata from other peers
2012-01-17 02:10:05 +01:00
sixcooler
985b78cf89 correct 'avaiable()' to use max of young / eden 2012-01-16 16:59:58 +01:00
sixcooler
4da8746275 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-01-16 01:48:36 +01:00
sixcooler
c9aaa9e00a respect non-reserved Memory in GenerationMemoryStrategy
and enable it again
2012-01-16 01:46:12 +01:00
Michael Peter Christen
37f2d1b3e9 replaced Thread initialization with ExecutorService pool for delete
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
2012-01-16 01:05:30 +01:00
Michael Peter Christen
0d6176804b emergency disabling of GenerationMemoryStrategy because of non-working
available-method
2012-01-15 21:58:18 +01:00
Michael Peter Christen
87f0210480 enriched log output to find NPE in HeapReader 2012-01-15 12:08:46 +01:00
Michael Peter Christen
254adea51c small fixes 2012-01-13 11:24:08 +01:00
Michael Peter Christen
49be60a7c8 WorkflowProcess is forced to make small pauses if shortMemoryStatus is
reached.
2012-01-10 03:03:12 +01:00
Michael Peter Christen
b7bb84c0bb set a limit to CharBuffer object size to fight against bad/too large
content
2012-01-10 03:02:17 +01:00
Marek Otahal
72adbeae90 !Important: move from Hashtable to HashMap
Hashtable is an obsolete v1 collection; since v2 the collections offer HashMap with the same or better
functionality. Please review: almost all code had already been moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for functionality missing from Hashtable. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits.

Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:29:18 +01:00
Marek Otahal
f75b5e40e0 little fix in copy()
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:16:46 +01:00
Michael Christen
216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
Michael Christen
20962a4ed7 added metadata node stub for metadata from blobs 2012-01-03 14:38:03 +01:00
Michael Christen
575dbbaa93 enhancements in Blob retrieval: try to use less CPU resources by testing
a blob first that most certainly has wanted entries.
2012-01-02 02:14:05 +01:00
Roland 'Quix0r' Haeder
6d4e08ed06 Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME 2011-12-29 03:42:38 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check 2011-12-29 00:33:16 +01:00
Michael Christen
9e5894c784 Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
c04bfaa51b refactoring 2011-12-16 23:59:29 +01:00
Michael Peter Christen
613ab6a69d added BEncodedHeapBag and BEncodedHeapShard which are storage containers
for a new metadata store. An abstraction of the content for this storage
is defined with MapStore. A MapStore is an abstraction of an RDF Node
store.
2011-12-16 23:00:50 +01:00
Michael Christen
6fecd0db88 one more performance hack to prevent costly md5 computation 2011-12-15 23:33:41 +01:00
Michael Christen
e13441b069 better digest pool size (smaller by default but unlimited) 2011-12-15 17:45:46 +01:00
Michael Christen
1f4afb4dc0 performance hacks 2011-12-15 15:15:53 +01:00
Michael Christen
e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
2011-12-14 22:15:51 +01:00
Michael Peter Christen
0bcef2d156 added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the given list of hosts unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
Michael Christen
204c29f010 small bugfixes for search result display and cache display 2011-12-10 01:35:38 +01:00
Michael Christen
078fcde0dd bad initialization 2011-12-07 01:02:23 +01:00