Commit Graph

700 Commits

Author SHA1 Message Date
Michael Peter Christen
d8425e6809 added collections to crawl monitor 2012-09-04 14:47:53 +02:00
Michael Peter Christen
528d6763fa - added new solr fields:
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
2012-08-31 10:30:43 +02:00
Michael Peter Christen
316b5fe116 - added a solr type definition verifier
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes
2012-08-28 16:58:06 +02:00
Michael Peter Christen
e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
- added a new geolocation field to index schema, the old values are
migrated if possible
2012-08-27 14:41:33 +02:00
orbiter
2094df2e4e - correct length computation for BStringObject (bugfix suggested by
apfelmaennchen)
- using ASCII for string conversion for Strings generated from Integer
2012-08-26 17:46:40 +02:00
Michael Peter Christen
4716546ef5 - reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)
2012-08-22 16:30:33 +02:00
Michael Peter Christen
06b0081fdc fix for NPE during host navigation computation 2012-08-22 01:55:39 +02:00
orbiter
acb9f04e80 removed unused classes 2012-08-21 18:18:30 +02:00
Michael Peter Christen
755f5e76cf removed strange assert statements and simplified code in metadata
transformation
2012-08-19 08:44:39 +02:00
orbiter
ee01c12e56 fixes for putDocument and putMetadata 2012-08-18 13:05:27 +02:00
Michael Peter Christen
f9fc5cfaba better check for bad urls in url transmission 2012-08-17 17:17:00 +02:00
Michael Peter Christen
40c0856489 refactoring 2012-08-17 15:33:02 +02:00
Michael Peter Christen
9bece5ac5f enhanced snippet fetch - removed a bug that caused documents to be
parsed even if a solr text was available
2012-08-17 14:22:07 +02:00
Michael Peter Christen
395b78a0d8 using the solr search index to concurrently search within solr and the
rwis during local search requests.
2012-08-17 01:21:56 +02:00
Michael Peter Christen
e5ef840f40 - renamed DoubleSolrConnector to MirrorSolrConnector and added a
hit/miss/document cache to the MirrorSolrConnector.
- more abstraction to SolrDocument in Connector interface
- bugfixes in Solr field reader
2012-08-13 13:32:32 +02:00
Michael Peter Christen
94a334f128 another fix to the Solr metadata reading process and to the shutdown
process
2012-08-13 11:13:53 +02:00
Michael Peter Christen
b51df6c7e8 - added coordinate storage in solr schema
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html
2012-08-13 10:40:04 +02:00
Michael Peter Christen
f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.
2012-08-10 13:26:51 +02:00
Michael Peter Christen
dcc72799c4 better abstraction for result writers using controlled vocabularies and
URIRefs
2012-08-10 07:45:43 +02:00
Michael Peter Christen
a12f693ec9 added two response writers for the embedded solr interface:
a rss/opensearch writer and an enhanced solr xml writer.
The enhanced solr writer has less configuration overhead than the
original writer and should be slightly faster. The rss/opensearch writer
is at this time slightly incomplete compared with the already existing
rss search result from YaCy and also snippets are missing at this time.
To test the new interface, open for example:
http://localhost:8090/solr/select?wt=rss&q=olympia
The wt-codes for the new result writers are:
wt=rss for opensearch
wt=exml for the enhanced solr xml writer.
Additionally, the SRU search parameters have been added to the solr
interface which can now also be used for a normal solr/xml search.
2012-08-09 18:06:48 +02:00
sixcooler
f32aa9a49c prevent merge of blobs that can't be handled in memory 2012-07-31 23:23:16 +02:00
Michael Peter Christen
1687737771 Abstraction of HandleMap and HandleSet 2012-07-27 12:13:53 +02:00
Michael Peter Christen
e432bb9cd9 better calculation of possible saving in HeapReader index data structure 2012-07-26 10:05:06 +02:00
Michael Peter Christen
9549984c65 documentation/comments 2012-07-25 21:34:23 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
Michael Peter Christen
f0a079ac9f allow larger log entries 2012-07-14 16:28:14 +02:00
Michael Peter Christen
784a4abb18 enhancement in internal data organization which should generate less
synchronizations in database access
2012-07-14 13:09:44 +02:00
Michael Peter Christen
f78ce93a80 collection of speed and memory saving hacks 2012-07-13 21:15:38 +02:00
orbiter
a196f24f60 prevent enqueueing of non-loggeable logging entries 2012-07-12 19:42:42 +02:00
orbiter
482afed07c reduced logging overhead (a bit) 2012-07-12 19:23:40 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
83da68c4c1 fixed a memory leak inside the logger which appeared if the log was
written faster than the logger is able to print it out to its out
stream. A very large collection of unwritten log outputs had been seen
during strong crawling. The new ArrayBlockingQueue is limited to prevent
this case.
2012-07-12 01:23:04 +02:00
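The bounded-queue idea described in this commit can be illustrated with a minimal sketch. The class and method names (`BoundedLogBuffer`, `enqueue`, `dropped`) and the capacity are hypothetical and not taken from the YaCy sources. Dropping entries when the queue is full is also an assumption; the commit message only states that the queue is limited.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a size-limited log buffer; not the YaCy logger.
final class BoundedLogBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedLogBuffer(int capacity) {
        // The fixed capacity is what keeps unwritten entries from piling up.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking enqueue: returns false (and counts the drop) when the queue is full.
    boolean enqueue(String message) {
        boolean accepted = queue.offer(message);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    // Returns the oldest queued entry, or null if the queue is empty.
    String poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    long dropped() {
        return dropped.get();
    }

    public static void main(String[] args) {
        BoundedLogBuffer buffer = new BoundedLogBuffer(2);
        for (int i = 0; i < 5; i++) buffer.enqueue("entry " + i);
        System.out.println("queued=" + buffer.size() + ", dropped=" + buffer.dropped());
    }
}
```

With `offer()` the producer never blocks; a logger that must not lose entries could use `put()` instead and accept that fast writers are slowed down.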
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
Michael Peter Christen
1addbc792c use less memory for md5 cache 2012-07-08 22:05:04 +02:00
Michael Peter Christen
f32de94723 more logging 2012-07-08 22:04:36 +02:00
Michael Peter Christen
8efc1c1078 - fixed a memory leak (or bad usage) during parsing/snippet fetch
- more logging for errors
2012-07-06 09:05:41 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
5bd3c90907 - removed unnecessary semicolons
- added default case for switch
2012-07-05 11:18:31 +02:00
Michael Peter Christen
132afaf687 removed unaccessible code 2012-07-05 11:09:44 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
83701a1b4c removed unused ImageReference package 2012-07-05 10:24:52 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
Michael Peter Christen
1481037820 replaced non-generic array with collection 2012-07-05 01:02:51 +02:00
Michael Peter Christen
613b45f604 - better data structures in secondary search
- fixed a big memory leak in secondary search
2012-07-03 07:12:20 +02:00
Michael Peter Christen
8a82609360 - smaller caches to save memory
- close cloneable iterators to free memory
2012-07-02 15:40:40 +02:00
Michael Peter Christen
ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. 2012-07-02 10:27:46 +02:00
Michael Peter Christen
0c345d1559 giving threads names so it's easier to see what's happening during
debugging and within a thread dump
2012-07-02 09:51:43 +02:00
Michael Peter Christen
b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods 2012-06-22 11:39:17 +02:00
Michael Peter Christen
de3ef8ad73 removed unimportant warnings 2012-06-19 08:45:34 +02:00
Michael Peter Christen
9264d8b4af removed old navigation practice using subject tags in favor of
triplestore-tags
2012-06-17 00:33:40 +02:00
Michael Peter Christen
61bb52d55c - using http://purl.org/dc/terms/references to refer from an
auto-annotated document to a 'pseudo-linked' document which has an url
created with an object-prefix as defined in the vocabulary file
2012-06-12 14:23:51 +02:00
Michael Peter Christen
8b53771db2 changed behavior of navigation processing:
- vocabulary annotation is not done any more into the metadata of urldb
- vocabularies are written into the jena triplestore using a rdf
vocabulary
- vocabularies for rdf triples must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- added also a Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
2012-06-11 23:49:30 +02:00
Michael Peter Christen
bef823c247 close the reader if finished 2012-06-11 01:20:54 +02:00
cominch
9cbfc1a1c0 augmentedProxy, which forwards every proxy request to a
rewrite engine to customize existing webpages. originally implemented by
Florian Richter.

Conflicts:
	source/de/anomic/http/server/HTTPDProxyHandler.java
2012-06-10 10:15:34 +02:00
Michael Peter Christen
3b992e6b00 using utf8 String compression in Webstructure database 2012-06-09 11:00:33 +02:00
Michael Peter Christen
2280a7b276 - changed initialization order to prefer allocation of memory for table
files first
- bugfixes in memory amount calculation
2012-06-09 09:05:47 +02:00
Michael Peter Christen
0746308bc2 only the metadata tables shall be able to use the tail cache 2012-06-08 18:36:11 +02:00
Michael Peter Christen
7ec9bef0c3 fix for OOM 2012-06-08 17:14:09 +02:00
Michael Peter Christen
41c02cb10e - less restrictions for usage of Table RAM copy
- new limit to use the table copy (instead of flag): 400MB available. If
less is available, then a copy is never used. If more is available, then
it can be used if there is a remaining space of at least 200MB
- flush caches more often: flush the Digest cache
2012-06-08 12:48:25 +02:00
Michael Peter Christen
b8f56a9803 npe bugfix 2012-06-08 10:20:43 +02:00
Michael Peter Christen
ba10caf89a lazy initialization of database tables 2012-06-08 09:30:51 +02:00
Michael Peter Christen
701b9a28a0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	htroot/PerformanceMemory_p.java
2012-06-08 09:16:16 +02:00
Michael Peter Christen
10c9c17d51 fixed handlemap spread factor and null iterator handling 2012-06-08 09:13:41 +02:00
Michael Peter Christen
b0095c8d3c flush the compressor cache when a cleanup is done 2012-06-07 19:42:33 +02:00
Michael Peter Christen
96e9d77270 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java
2012-06-06 20:13:28 +02:00
Michael Peter Christen
00f2df1120 a variety of possible memory leak fixes 2012-06-06 18:23:18 +02:00
Michael Peter Christen
3dd8376825 added automatic cleaning of the cache if the metadata and file database
sizes are not equal. These sizes may differ because one of
the caches is cleaned after a while or when it is too big. The metadata
is then not cleaned, but is now wiped after a checkup process at every
application start. This should cause a bit less memory usage.
2012-06-06 14:15:24 +02:00
Michael Peter Christen
6bb07afcc3 accept also files with other file prefix; used to read 'foreign' cache
files
2012-06-06 13:36:10 +02:00
Michael Peter Christen
461a0ce052 removed warnings 2012-06-05 20:03:43 +02:00
Michael Peter Christen
407fdf6968 more bug fixes and performance hacks for search process 2012-06-05 15:04:23 +02:00
Michael Peter Christen
a1fe65b115 performance hacks 2012-06-05 12:06:26 +02:00
Michael Peter Christen
e0d8643226 - performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
2012-06-04 15:37:39 +02:00
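The "poison pill" mentioned in this commit is a general queue-termination technique: the producer puts a sentinel element on the queue, and the consumer stops when it takes that sentinel. The sketch below shows the pattern in isolation. All names (`PoisonPillDemo`, `POISON`, `drain`) are hypothetical and do not come from the YaCy search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of the poison-pill pattern; not the YaCy search code.
final class PoisonPillDemo {
    // Sentinel object; it is compared by identity, so no real entry can match it.
    static final String POISON = new String("POISON");

    // Consumes entries until the poison pill arrives, then returns what was collected.
    static List<String> drain(BlockingQueue<String> queue) {
        List<String> results = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take(); // blocks until an entry is available
                if (item == POISON) break;  // without the pill this loop would wait forever
                results.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag and stop
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            queue.add("result-1");
            queue.add("result-2");
            queue.add(POISON); // signals that no more results will follow
        }, "producer");
        producer.start();
        System.out.println(drain(queue));
        producer.join();
    }
}
```

If the producer forgets the final `queue.add(POISON)`, the consumer blocks in `take()` indefinitely, which is the kind of non-terminating search the commit describes.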
Michael Peter Christen
9b4c699526 enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
2012-05-31 22:39:53 +02:00
Michael Peter Christen
1f48d1528b performance hacks 2012-05-31 00:46:30 +02:00
Michael Peter Christen
10da7335ea performance hack: use a hash cache for all hashes that are computed by a
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
2012-05-30 16:59:13 +02:00
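One common way to avoid recomputing a hash over a byte array is to wrap the array in a key object that stores the hash after the first computation. The sketch below shows that approach. It is an assumption about the general technique, not the implementation from this commit, and the class name `CachedHashKey` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of caching a byte-array hash; not the YaCy class.
final class CachedHashKey {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet", the same convention String uses

    CachedHashKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the cached hash stays valid
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // Computed on first use; recomputed only in the rare case the real hash is 0.
            h = Arrays.hashCode(bytes);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashKey
                && Arrays.equals(bytes, ((CachedHashKey) other).bytes);
    }

    public static void main(String[] args) {
        Map<CachedHashKey, String> map = new HashMap<>();
        map.put(new CachedHashKey(new byte[] {1, 2, 3}), "value");
        System.out.println(map.get(new CachedHashKey(new byte[] {1, 2, 3})));
    }
}
```

A HashMap calls `hashCode()` on every `get`, `put` and resize, so caching pays off when the same key object is reused many times.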
Michael Peter Christen
7c1feefb28 introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
2012-05-30 16:26:05 +02:00
Michael Peter Christen
8d997d55b6 better logging 2012-05-30 15:47:35 +02:00
Michael Peter Christen
43c2c6e588 better logging 2012-05-30 15:27:45 +02:00
Michael Peter Christen
c15fcde1c8 add-on to latest commit 2012-05-21 17:52:30 +02:00
Michael Peter Christen
cf47d94888 performance hack to parse numbers inside of substrings without actually
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should affect CPU load during RWI
transmission.
2012-05-21 13:40:46 +02:00
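Parsing a number directly from a character range, without first calling `substring()`, can be sketched as follows. The class and method names (`RangeParser.parseInt`) are hypothetical and this is not the code from the commit; it only illustrates the allocation-free idea for signed decimal integers.

```java
// Hypothetical illustration of parsing an int from a character range; not the YaCy code.
final class RangeParser {
    // Parses the decimal integer in s[start, end) without creating a substring.
    static int parseInt(CharSequence s, int start, int end) {
        if (start < 0 || end > s.length() || start >= end) {
            throw new NumberFormatException("empty or invalid range");
        }
        int i = start;
        boolean negative = false;
        char first = s.charAt(i);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) throw new NumberFormatException("sign without digits");
        }
        long value = 0; // long accumulator so overflow can be detected safely
        for (; i < end; i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            value = value * 10 + digit;
            if (value > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("value out of int range");
            }
        }
        if (negative) value = -value;
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("value out of int range");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        String row = "abc123def";
        System.out.println(parseInt(row, 3, 6));
    }
}
```

Since Java 9 the standard library offers `Integer.parseInt(CharSequence, int, int, int)` for the same purpose; that method did not exist when this commit was written.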
Michael Peter Christen
7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
check during snippet generation. This should cause less blockings
2012-05-21 03:03:47 +02:00
Michael Peter Christen
c6a09eab0b synchronization needed 2012-05-21 00:58:29 +02:00
reger
6696cb1313 bugfix: lookup of peer names returned no result for an active peer on page IndexControlRWIs_p.html -> Transfer RWI to other Peer
SeedDB.lookupByName searches for lowercase peer names, while MapColumnIndex.getIndex uses the peer name as is in the key set.
Changed the index init to insert lowercase peer names as key
2012-05-20 05:25:16 +02:00
Michael Peter Christen
f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' to dns server

also included: less processes in snippet fetch to reduce load during
search on small computers
2012-05-19 01:06:33 +02:00
Michael Peter Christen
acf8d521a2 fix for http://bugs.yacy.net/view.php?id=126 2012-05-19 00:21:03 +02:00
Michael Peter Christen
fa735f4f04 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-05-17 23:40:08 +02:00
Michael Peter Christen
3e1bc9477f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-05-17 13:58:09 +02:00
Michael Peter Christen
6f8a2fef1f small speed enhancement using a column factory 2012-05-17 11:08:48 +02:00
Roland 'Quix0r' Haeder
d10627d591 More sync in close() methods
Conflicts:
	source/net/yacy/kelondro/logging/GuiHandler.java
	source/net/yacy/kelondro/workflow/InstantBusyThread.java
2012-05-17 06:03:18 +02:00
Roland 'Quix0r' Haeder
fbb946f913 Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile 2012-05-17 05:55:44 +02:00
Michael Peter Christen
89142d1e8d removed (not all) warnings 2012-05-16 13:42:32 +02:00
Michael Peter Christen
15db703808 added missing serialization to remove all warnings 2012-05-15 13:13:07 +02:00
Michael Peter Christen
1795a7325b made HandleSet serializable 2012-05-15 12:55:15 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb Now used synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way

Conflicts:
	source/de/anomic/http/server/ChunkedInputStream.java
	source/de/anomic/http/server/ChunkedOutputStream.java
	source/de/anomic/http/server/ContentLengthInputStream.java
	source/net/yacy/cora/protocol/Domains.java
	source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
	source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
	source/net/yacy/document/content/dao/PhpBB3Dao.java
	source/net/yacy/document/parser/html/AbstractTransformer.java
	source/net/yacy/kelondro/blob/BEncodedHeap.java
	source/net/yacy/kelondro/blob/HeapReader.java
	source/net/yacy/kelondro/index/RAMIndexCluster.java
	source/net/yacy/kelondro/io/ByteCountInputStream.java
	source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
	source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
Michael Peter Christen
0cf3d36eae more tolerance in case of corrupted file 2012-05-11 20:46:50 +02:00
Michael Peter Christen
34f4225d7e less 'wellformed' calls without asserts 2012-05-08 23:24:39 +02:00
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Christen
e32055aa15 added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database if it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
2012-04-13 07:09:15 +02:00
Michael Peter Christen
2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

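and the following test program:

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // Adds c elements to q, calling q.size() before each add,
    // and returns the elapsed time in milliseconds.
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        // c doubles on every pass. The loop is limited to 17 passes (c up to 65536)
        // because 100 passes would overflow the int counter and exhaust memory.
        for (int i = 0; i < 17; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);

            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2
                    + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}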
2012-02-27 00:42:32 +01:00
Michael Peter Christen
213c8d97f2 use fewer processes in process pool 2012-02-25 14:07:20 +01:00
Michael Peter Christen
c639248c23 protection against strange answers from remote peers during search 2012-02-25 14:07:02 +01:00
Michael Peter Christen
1cd711d005 added classes for citation references (for new citation ranking) 2012-02-24 01:07:15 +01:00
Michael Peter Christen
e0f1e7d904 added new citation reference data structure that shall be used for a
citation ranking
2012-02-23 01:22:29 +01:00
Michael Peter Christen
e18a4f6b74 more tolerant merge iterator 2012-02-23 01:21:24 +01:00
Michael Peter Christen
7e4e3fe5b6 free some memory after parsing html 2012-02-02 09:55:27 +01:00
Michael Peter Christen
4540174fe0 memory hacks 2012-02-02 07:37:00 +01:00
Michael Peter Christen
b4409cc803 small redesign of blob column index and usage 2012-02-02 06:43:57 +01:00
Michael Peter Christen
d5c1f2746e performance hack 2012-02-02 06:43:15 +01:00
Michael Peter Christen
803963aebd performance hack: better space grow in CharBuffer (speeds up html
parser)
2012-02-01 23:27:59 +01:00
Michael Peter Christen
e2f8f263e8 changed storage of search words: keep order 2012-02-01 18:13:31 +01:00
Michael Peter Christen
0b67a0a5d8 added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
e3bb73c3d6 serialized some database access methods 2012-01-31 21:13:49 +01:00
Michael Peter Christen
2ea585d616 fix for host navigator 2012-01-26 18:10:34 +01:00
Michael Peter Christen
ef78f22ee1 performance hack 2012-01-25 12:48:48 +01:00
Michael Peter Christen
a02fdf8625 better error messages 2012-01-23 00:47:25 +01:00
Michael Peter Christen
c6ba44468e timeout = 5000 instead of 3000 2012-01-23 00:45:32 +01:00
low012
8776b84c10 *) small fix to make password change function of reconfigureYACY.sh work
again
2012-01-17 20:43:19 +01:00
Michael Peter Christen
4901cee3cc suppress auto-tagged subject entries when sending out or receiving
metadata from other peers
2012-01-17 02:10:05 +01:00
sixcooler
985b78cf89 correct 'available()' to use max of young / eden 2012-01-16 16:59:58 +01:00
sixcooler
4da8746275 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-01-16 01:48:36 +01:00
sixcooler
c9aaa9e00a respect non-reserved Memory in GenerationMemoryStrategy
and enable it again
2012-01-16 01:46:12 +01:00
Michael Peter Christen
37f2d1b3e9 replaced Thread initialization with ExecutorService pool for delete
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
2012-01-16 01:05:30 +01:00
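The difference between starting a new Thread per task and submitting tasks to a shared ExecutorService can be sketched as follows. The names (`PooledDeleteDemo`, `runDeletes`) and the pool size are hypothetical, and the "delete" task is only a counter increment; this is not the Compressor code from the commit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of pooled task execution; not the YaCy Compressor class.
final class PooledDeleteDemo {
    // Runs the given number of placeholder "delete" tasks on a fixed-size pool
    // and returns how many of them completed.
    static int runDeletes(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            // Each task reuses one of the pool's threads instead of creating a new Thread.
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed deletes: " + runDeletes(100, 4));
    }
}
```

Reusing a small number of worker threads avoids the per-task cost of thread creation, which matters when many short tasks are issued in quick succession.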
Michael Peter Christen
0d6176804b emergency disabling of GenerationMemoryStrategy because of non-working
available-method
2012-01-15 21:58:18 +01:00
Michael Peter Christen
87f0210480 enriched log output to find NPE in HeapReader 2012-01-15 12:08:46 +01:00
Michael Peter Christen
254adea51c small fixes 2012-01-13 11:24:08 +01:00
Michael Peter Christen
49be60a7c8 WorkflowProcess is forced to make small pauses if shortMemoryStatus is
reached.
2012-01-10 03:03:12 +01:00
Michael Peter Christen
b7bb84c0bb set a limit to CharBuffer object size to fight against bad/too large
content
2012-01-10 03:02:17 +01:00
Marek Otahal
72adbeae90 !Important: move from Hashtable to HashMap
Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better
functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue,
but I found notices that some (ugly big) helper classes had to be created in past
to compensate missing Hashtable's functionality. I'd like input if we can remove some of them.
look for //FIX: in these commits

Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:29:18 +01:00
Marek Otahal
f75b5e40e0 little fix in copy()
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:16:46 +01:00
Michael Christen
216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
Michael Christen
20962a4ed7 added metadata node stub for metadata from blobs 2012-01-03 14:38:03 +01:00
Michael Christen
575dbbaa93 enhancements in Blob retrieval: try to use less CPU resources by testing
a blob first that most certainly has wanted entries.
2012-01-02 02:14:05 +01:00
Roland 'Quix0r' Haeder
6d4e08ed06 Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME 2011-12-29 03:42:38 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check 2011-12-29 00:33:16 +01:00
Michael Christen
9e5894c784 Removed handling of components objects for URIMetadataRows.
This is a preparation to replace this rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
c04bfaa51b refactoring 2011-12-16 23:59:29 +01:00
Michael Peter Christen
613ab6a69d added BEncodedHeapBag and BEncodedHeapShard which are storage container
for a new metadata store. An abstraction of the content for this storage
is defined with MapStore. A MapStore is an abstraction of a RDF Node
store.
2011-12-16 23:00:50 +01:00
Michael Christen
6fecd0db88 one more performance hack to prevent costly md5 computation 2011-12-15 23:33:41 +01:00
Michael Christen
e13441b069 better digest pool size (smaller by default but unlimited) 2011-12-15 17:45:46 +01:00
Michael Christen
1f4afb4dc0 performance hacks 2011-12-15 15:15:53 +01:00
Michael Christen
e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
2011-12-14 22:15:51 +01:00
Michael Peter Christen
0bcef2d156 added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the given list of hosts unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
Michael Christen
204c29f010 small bugfixes for search result display and cache display 2011-12-10 01:35:38 +01:00
Michael Christen
078fcde0dd bad initialization 2011-12-07 01:02:23 +01:00
Michael Christen
14e45e90fd patch for a bug that I don't understand by now. 2011-12-07 00:52:04 +01:00
Michael Christen
86b3385847 fixed a deadlock during secondary remote search 2011-12-07 00:18:34 +01:00