Commit Graph

90 Commits

Author SHA1 Message Date
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
241dd8410a removed snippet pattern filter - it was not used 2012-07-05 09:21:27 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
Michael Peter Christen
1481037820 replaced non-generic array with collection 2012-07-05 01:02:51 +02:00
Michael Peter Christen
613b45f604 - better data structures in secondary search
- fixed a big memory leak in secondary search
2012-07-03 07:12:20 +02:00
Michael Peter Christen
1825f165b8 better integration of blacklist according to use case 2012-07-02 13:57:29 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods 2012-06-22 11:39:17 +02:00
Michael Peter Christen
3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00
Michael Peter Christen
82a682b31d fixed problem with seed when switching network 2012-06-19 07:44:44 +02:00
Michael Peter Christen
8e97ada7c9 IPv6 bugfix 2012-06-18 00:33:32 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143

Conflicts:
	htroot/Supporter.java
	htroot/yacy/crawlReceipt.java
	htroot/yacy/transferRWI.java
	htroot/yacy/transferURL.java
	source/de/anomic/crawler/CrawlStacker.java
	source/de/anomic/data/ListManager.java
	source/net/yacy/peers/Protocol.java
	source/net/yacy/repository/Blacklist.java
	source/net/yacy/repository/LoaderDispatcher.java
	source/net/yacy/search/Switchboard.java
	source/net/yacy/search/index/MetadataRepository.java
	source/net/yacy/search/index/Segment.java
	source/net/yacy/search/query/RWIProcess.java
	source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Michael Peter Christen
3b992e6b00 using utf8 String compression in Webstructure database 2012-06-09 11:00:33 +02:00
Michael Peter Christen
a1fe65b115 performance hacks 2012-06-05 12:06:26 +02:00
Michael Peter Christen
e0d8643226 - performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
2012-06-04 15:37:39 +02:00
Michael Peter Christen
7c1feefb28 introduced a default 10 second time-out in rwi normalization time
uring search process to prevent endless deadlocks after a very long
running search
2012-05-30 16:26:05 +02:00
Michael Peter Christen
65d37e6a20 only ASCII needed in seed bitflags 2012-05-30 15:42:28 +02:00
Michael Peter Christen
0f82fb3628 using double instead float for a better release ordering 2012-05-30 15:28:20 +02:00
Michael Peter Christen
71c3163f3d - fixes to node identification
- added link to node in network list
- added marking of portal search node peers
2012-05-29 01:38:54 +02:00
Michael Peter Christen
ad222be7f8 added node state icon in network list 2012-05-25 17:29:54 +02:00
Michael Peter Christen
3c2bec681f added a root node flag: identifies peers with short ping time 2012-05-25 15:33:02 +02:00
Michael Peter Christen
f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' to dns server

also included: less processes in snippet fetch to reduce load during
search on small computers
2012-05-19 01:06:33 +02:00
Michael Peter Christen
89142d1e8d removed (not all) warnings 2012-05-16 13:42:32 +02:00
Michael Peter Christen
15db703808 added missing serialization to remove all warnings 2012-05-15 13:13:07 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb Now used synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way

Conflicts:
	source/de/anomic/http/server/ChunkedInputStream.java
	source/de/anomic/http/server/ChunkedOutputStream.java
	source/de/anomic/http/server/ContentLengthInputStream.java
	source/net/yacy/cora/protocol/Domains.java
	source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
	source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
	source/net/yacy/document/content/dao/PhpBB3Dao.java
	source/net/yacy/document/parser/html/AbstractTransformer.java
	source/net/yacy/kelondro/blob/BEncodedHeap.java
	source/net/yacy/kelondro/blob/HeapReader.java
	source/net/yacy/kelondro/index/RAMIndexCluster.java
	source/net/yacy/kelondro/io/ByteCountInputStream.java
	source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
	source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
Marc Nause
a691023d04 *) better formatting for network QPM
*) refactoring
2012-05-08 20:07:34 +02:00
Michael Peter Christen
77f8e9fb9b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-05-04 17:29:16 +02:00
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Peter Christen
2a0434efa4 Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff' 2012-04-29 21:21:49 +02:00
reger
c1f6b4fb52 lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown) 2012-04-24 00:05:01 +02:00
Michael Peter Christen
f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c refactoring of ContentDomain: now subclass of Classification 2012-04-22 00:04:36 +02:00
Michael Peter Christen
33d1062c79 refactoring: the cache belongs to the crawler 2012-04-21 13:34:07 +02:00
Michael Peter Christen
046f3a7e8d check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
Michael Peter Christen
8c06925984 animation of the web structure picture 2012-02-25 15:42:29 +01:00
Michael Peter Christen
c639248c23 protection against strange answers from remote peers during search 2012-02-25 14:07:02 +01:00
Michael Peter Christen
7e4e3fe5b6 free some memory after parsing html 2012-02-02 09:55:27 +01:00
Michael Peter Christen
b4409cc803 small redesign of blob column index and usage 2012-02-02 06:43:57 +01:00
Michael Peter Christen
0b67a0a5d8 added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
7e728867e5 added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
2012-01-31 18:17:25 +01:00
Michael Peter Christen
ef5192f8c9 using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Testet
this with a pdf document.
2012-01-23 17:27:29 +01:00
Marek Otahal
72adbeae90 !Important: move from Hashtable to HashMap
Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better
functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue,
but I found notices that some (ugly big) helper classes had to be created in past
to compensate missing Hashtable's functionality. I'd like input if we can remove some of them.
look for //FIX: if these commits

Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:29:18 +01:00
Michael Christen
216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
stbrumm
d18095dc48 Patch fuer Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
2012-01-03 17:49:37 +01:00