Commit Graph

76 Commits

Author SHA1 Message Date
Michael Peter Christen
3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00
Michael Peter Christen
82a682b31d fixed problem with seed when switching network 2012-06-19 07:44:44 +02:00
Michael Peter Christen
8e97ada7c9 IPv6 bugfix 2012-06-18 00:33:32 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143

Conflicts:
	htroot/Supporter.java
	htroot/yacy/crawlReceipt.java
	htroot/yacy/transferRWI.java
	htroot/yacy/transferURL.java
	source/de/anomic/crawler/CrawlStacker.java
	source/de/anomic/data/ListManager.java
	source/net/yacy/peers/Protocol.java
	source/net/yacy/repository/Blacklist.java
	source/net/yacy/repository/LoaderDispatcher.java
	source/net/yacy/search/Switchboard.java
	source/net/yacy/search/index/MetadataRepository.java
	source/net/yacy/search/index/Segment.java
	source/net/yacy/search/query/RWIProcess.java
	source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Michael Peter Christen
3b992e6b00 using utf8 String compression in Webstructure database 2012-06-09 11:00:33 +02:00
Michael Peter Christen
a1fe65b115 performance hacks 2012-06-05 12:06:26 +02:00
Michael Peter Christen
e0d8643226 - performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again); the goal is that they
are never reached.
2012-06-04 15:37:39 +02:00
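The "poison pill" named in the search-termination bugfix above is a general producer/consumer pattern: the producer puts a sentinel object into the queue as its last item, and the consumer stops when it sees it. The following is only a minimal sketch of that pattern, not the YaCy code; the class `PoisonPillSketch`, the `POISON` sentinel and the result strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of the poison-pill pattern; not taken from the YaCy sources.
public class PoisonPillSketch {
    // Sentinel object; compared by identity so no real result can be mistaken for it.
    static final String POISON = new String("POISON");

    // Takes items from the queue until the poison pill arrives, then returns them.
    static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> results = new ArrayList<>();
        while (true) {
            String item = queue.take();   // blocks until an item is available
            if (item == POISON) break;    // identity check against the sentinel
            results.add(item);
        }
        return results;
    }

    // Starts one producer thread that emits two results followed by the pill.
    static List<String> runDemo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            queue.add("result-1");
            queue.add("result-2");
            queue.add(POISON);            // if this is missing, consume() never returns
        });
        producer.start();
        try {
            List<String> results = consume(queue);
            producer.join();
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

If the producer forgets the final `queue.add(POISON)`, the consumer blocks in `take()` indefinitely, which is the kind of hang a missing pill would cause.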
Michael Peter Christen
7c1feefb28 introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
2012-05-30 16:26:05 +02:00
Michael Peter Christen
65d37e6a20 only ASCII needed in seed bitflags 2012-05-30 15:42:28 +02:00
Michael Peter Christen
0f82fb3628 using double instead of float for a better release ordering 2012-05-30 15:28:20 +02:00
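Why a wider floating-point type can matter for ordering: a float has a 24-bit significand, so two distinct numeric keys can collapse to the same value and compare as equal, while a double (53-bit significand) still tells them apart. The sketch below only demonstrates that effect. How YaCy encodes a release as a number is not stated here, so the two keys and the class `ReleaseOrderSketch` are assumptions chosen for illustration.

```java
// Hypothetical illustration; the numeric encoding of a release is an assumption,
// not the scheme used by YaCy.
public class ReleaseOrderSketch {
    // Two distinct keys that differ only in the lowest digit.
    static final long OLDER = 16_777_216L; // 2^24
    static final long NEWER = 16_777_217L; // 2^24 + 1

    // Compares the keys after converting them to float.
    static int compareAsFloat() {
        return Float.compare((float) OLDER, (float) NEWER);
    }

    // Compares the keys after converting them to double.
    static int compareAsDouble() {
        return Double.compare((double) OLDER, (double) NEWER);
    }

    public static void main(String[] args) {
        System.out.println("float comparison:  " + compareAsFloat());
        System.out.println("double comparison: " + compareAsDouble());
    }
}
```

With float, both keys round to the same value and the comparison reports a tie; with double, the older key sorts first.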
Michael Peter Christen
71c3163f3d - fixes to node identification
- added link to node in network list
- added marking of portal search node peers
2012-05-29 01:38:54 +02:00
Michael Peter Christen
ad222be7f8 added node state icon in network list 2012-05-25 17:29:54 +02:00
Michael Peter Christen
3c2bec681f added a root node flag: identifies peers with short ping time 2012-05-25 15:33:02 +02:00
Michael Peter Christen
f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' to dns server

also included: fewer processes in snippet fetch to reduce load during
search on small computers
2012-05-19 01:06:33 +02:00
Michael Peter Christen
89142d1e8d removed (not all) warnings 2012-05-16 13:42:32 +02:00
Michael Peter Christen
15db703808 added missing serialization to remove all warnings 2012-05-15 13:13:07 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb Now used synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way

Conflicts:
	source/de/anomic/http/server/ChunkedInputStream.java
	source/de/anomic/http/server/ChunkedOutputStream.java
	source/de/anomic/http/server/ContentLengthInputStream.java
	source/net/yacy/cora/protocol/Domains.java
	source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
	source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
	source/net/yacy/document/content/dao/PhpBB3Dao.java
	source/net/yacy/document/parser/html/AbstractTransformer.java
	source/net/yacy/kelondro/blob/BEncodedHeap.java
	source/net/yacy/kelondro/blob/HeapReader.java
	source/net/yacy/kelondro/index/RAMIndexCluster.java
	source/net/yacy/kelondro/io/ByteCountInputStream.java
	source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
	source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
Marc Nause
a691023d04 *) better formatting for network QPM
*) refactoring
2012-05-08 20:07:34 +02:00
Michael Peter Christen
77f8e9fb9b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-05-04 17:29:16 +02:00
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Peter Christen
2a0434efa4 Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff' 2012-04-29 21:21:49 +02:00
reger
c1f6b4fb52 lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown) 2012-04-24 00:05:01 +02:00
Michael Peter Christen
f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c refactoring of ContentDomain: now subclass of Classification 2012-04-22 00:04:36 +02:00
Michael Peter Christen
33d1062c79 refactoring: the cache belongs to the crawler 2012-04-21 13:34:07 +02:00
Michael Peter Christen
046f3a7e8d check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
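One way to detect that a downloaded `.tar.gz` has already been decompressed is to look at its first two bytes: gzip data starts with the magic bytes 0x1f 0x8b. The sketch below illustrates that check and the resulting rename decision. It is not the code from this commit; the class `GzipCheckSketch`, its methods and the file name used in the example are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; not taken from the YaCy sources.
public class GzipCheckSketch {
    // True if the given leading bytes start with the gzip magic number 0x1f 0x8b.
    static boolean looksGzipped(byte[] head) {
        return head != null && head.length >= 2
            && (head[0] & 0xff) == 0x1f
            && (head[1] & 0xff) == 0x8b;
    }

    // Returns the name the file should carry: ".gz" is dropped when the
    // content of a ".tar.gz" file is no longer gzip-compressed.
    static String correctedName(String name, byte[] head) {
        if (name.endsWith(".tar.gz") && !looksGzipped(head)) {
            return name.substring(0, name.length() - ".gz".length());
        }
        return name;
    }

    // Helper for the example: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(data);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "not compressed".getBytes();
        System.out.println(correctedName("release.tar.gz", plain));
        System.out.println(correctedName("release.tar.gz", gzip(plain)));
    }
}
```

In real use only the first two bytes of the file would need to be read before deciding on the rename.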
Michael Peter Christen
8c06925984 animation of the web structure picture 2012-02-25 15:42:29 +01:00
Michael Peter Christen
c639248c23 protection against strange answers from remote peers during search 2012-02-25 14:07:02 +01:00
Michael Peter Christen
7e4e3fe5b6 free some memory after parsing html 2012-02-02 09:55:27 +01:00
Michael Peter Christen
b4409cc803 small redesign of blob column index and usage 2012-02-02 06:43:57 +01:00
Michael Peter Christen
0b67a0a5d8 added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
7e728867e5 added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
2012-01-31 18:17:25 +01:00
Michael Peter Christen
ef5192f8c9 using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
2012-01-23 17:27:29 +01:00
Marek Otahal
72adbeae90 !Important: move from Hashtable to HashMap
Hashtable is an obsolete v1 collection; since v2, HashMap offers the same or better
functionality. Please review: almost all code was already moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for missing Hashtable functionality. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits.

Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:29:18 +01:00
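A point worth checking in a Hashtable-to-HashMap migration is that the two classes are not drop-in equivalents: Hashtable synchronizes its methods and rejects null keys and values, while HashMap is unsynchronized and accepts nulls. The sketch below shows the null-handling difference using only standard `java.util` classes; the class `MapMigrationSketch` is hypothetical and not part of this commit.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical illustration; not taken from the YaCy sources.
public class MapMigrationSketch {
    // Returns true if the map accepts a null value, false if it throws.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Hashtable accepts null value: "
            + acceptsNullValue(new Hashtable<>()));
        System.out.println("HashMap accepts null value:   "
            + acceptsNullValue(new HashMap<>()));
    }
}
```

Where the replaced Hashtable was shared between threads, `Collections.synchronizedMap` or `ConcurrentHashMap` would be the usual candidates to keep access thread-safe.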
Michael Christen
216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
stbrumm
d18095dc48 Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
2012-01-03 17:49:37 +01:00
stbrumm
9f1b1b4604 Type for Robinson-Mode/Private Peer added 2012-01-03 14:43:17 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check 2011-12-29 00:33:16 +01:00
Michael Christen
85bd4cc8bc better lookup for peer names 2011-12-25 10:14:15 +01:00
Michael Christen
20e3084bd4 redesign of finding peers by IP: more lightweight method to read the
seed databases
2011-12-21 01:14:43 +01:00
Michael Christen
0797b0de99 new handling of remote search processes: looking for seeds will now not
block the whole search process any more. A deadlock with a DHT selection
process may have been the cause for interface lockings in the past.
2011-12-21 00:32:03 +01:00
Michael Christen
9e5894c784 Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
c04bfaa51b refactoring 2011-12-16 23:59:29 +01:00
Michael Christen
1f4afb4dc0 performance hacks 2011-12-15 15:15:53 +01:00
Michael Christen
675d557e88 removed debug logging 2011-12-14 22:21:19 +01:00
Michael Christen
e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
2011-12-14 22:15:51 +01:00
Michael Christen
044f83feed added some pauses into the search process which shall produce
better-ranked search results. Without these pauses the result page will
only contain links from the peer that answers first, which is not a good
average picture of all the peers that provided results
2011-12-06 15:28:48 +01:00
Michael Christen
f14faf503b better ranking because we wait a little longer during the search
process to get better remote search results into the ranking priority
stack
2011-12-06 02:24:51 +01:00
Michael Christen
e7e429705a - less automatic indexing after a search (needs to reset the default
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
2011-12-05 16:22:11 +01:00
admin
484c4ad339 Merge branch 'master' of git://github.com/f1ori/yacy 2011-12-04 09:01:05 +01:00