Commit Graph

638 Commits

Author SHA1 Message Date
orbiter
be15874be1 added request line in http which can support better debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 11:00:38 +00:00
orbiter
bd55dcee50 - commented out experimental distributed ranking loading
- less threads for blocking threads
- disable all threads for DHT transmission for networks with zero peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 21:08:01 +00:00
orbiter
3ed4a09368 small features, some bug fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-23 21:08:04 +00:00
orbiter
b45701d20f this is a re-implementation of the YaCy Block Rank feature
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribition and commit it here also
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit but it does not affect your current installation only fresh peers
- a recursive block rank refinement is implemented but disabled at this point. it needs more testing

Please play around with the ranking settings and see if this helped to make search results better.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-18 14:26:28 +00:00
orbiter
8879cc1db2 removed System.out.println
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 14:08:02 +00:00
orbiter
d8e934c085 better abstraction of http client identification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-26 13:35:29 +00:00
orbiter
a50f28e6e7 - fixed missing save operation for peer name change
- fixed import of mediawiki dump files
- added script to add mediawiki dump files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-19 23:52:09 +00:00
low012
2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
orbiter
e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 09:29:05 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
38dce547c0 better concurrency (less locking on date formatting) more logging and minor bug fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-28 06:28:29 +00:00
orbiter
89d337841c more logging for OOMs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7534 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-26 09:27:15 +00:00
orbiter
af87af0d4c - removed synchronization in serverSwitch which should improve speed
- fixed wrong assert in network graph
- enhanced double check method in table class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 12:56:25 +00:00
orbiter
d84b4a072e healing for some OOM problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 00:38:49 +00:00
orbiter
d58071947a maybe terminateOldSessions is too slow, removed sleep
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-17 23:36:46 +00:00
orbiter
fe93caac5a added flags and administration options to show advanced search and to show search result attributes (for each search result)
Administration can be done at ConfigPortal.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:54:13 +00:00
orbiter
0cdfb82963 replaced more appearance of double values by float values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:06:29 +00:00
orbiter
eb12e15738 moved all Double values to Float values because of
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-01 23:49:11 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
f1f03d8c90 more logging for strange network loading bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-19 09:31:56 +00:00
f1ori
a321c7673d * adminAccountForLocalhost only for localhost
* yacy crawls local domains also, if no password is set (the interface is already protected)
* it's not required anymore, to set a password in intranet mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 11:37:30 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
orbiter
c54170421a fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 11:19:22 +00:00
f1ori
acd93b1b31 * add failsafe mechanisme to domainlist retrieval
domainlist is saved locally, if none of the given urls in network.unit.domainlist
  could be retrieved, the file from the last boot is used instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-02 17:57:48 +00:00
f1ori
def4253555 * add option to network definition to provide a domainlist (syntax like in blacklists)
* crawler and search allow only urls matching one in domainlist (if list is provided)
* this may be useful to prevent dedicated networks from being "polluted"
* FilterEngine is improved Backlist-object, Blacklist may inherit from FilterEngine in the future

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-30 14:44:33 +00:00
orbiter
155d556568 - better memory protection
- more logging
- little bit of refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-27 13:21:18 +00:00
f1ori
60fd2e549d * log failures when writing config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7259 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-18 15:00:29 +00:00
orbiter
0bc6284e27 - added bugfix for access tracker in case of concurrency conflicts
- added missing entry for new icu4j path in Mac App

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7188 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-25 21:10:50 +00:00
lotus
4450c240b7 npe fix http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2982
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-22 20:24:07 +00:00
orbiter
a2f9974745 some redesign in the access tracker to realize sixcoolers question about "smartes way for deleting the first Object":
- not so much abstraction for a collection, makes use of remove() (no operands) possible
- different way to delete elements in track (destructive, not constructive (less copies of elements in new queue))
- more abstraction for class api since no static class must be used any more

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-19 23:00:24 +00:00
sixcooler
03f0414025 some minor correction of my last commit
sorry for the noise

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7168 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-19 20:57:25 +00:00
sixcooler
42fa0eadb1 fix endless loop:
Collection does not support remove(int)
(isn't there a smartes way for deleting the first Object?)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-19 20:41:44 +00:00
orbiter
37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
- added link to yacystats peer when peer is more than one day old

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-17 10:20:04 +00:00
orbiter
29fe401f93 - some layout and text enhancement for site crawl start
- Quix0rs patch from http://forum.yacy-websuche.de/viewtopic.php?p=20839#p20839 (parts)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 23:00:07 +00:00
orbiter
22047ffad5 enhanced computation speed of many replaceAll string operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 13:19:42 +00:00
orbiter
9c0c94683c because of a bug in search result caching count search results had not been generated as fast as possible.
with this fix search results are (even) faster.
Also enhanced: image search. This is now speeded up using a image search result look-ahead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-04 22:57:12 +00:00
orbiter
1da5241c2d do not block server session if maximum number of sessions is reached, just try to clean up once
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7095 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 12:05:37 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
c60d0282fd more abstraction for tables stored in heaps:
the BEncodedHeap now implements Map<byte[], Map<String, byte[]>>
This will make it possible that also different database storage types may be added that implement also the same Map<byte[], Map<String, byte[]>> interface.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:27:58 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
a cache access shall not made directly to the cache any more, all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, a enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods without the usage of a cache strategy have been removed. This affects also the verify-option
of the yacysearch servlet. If there is a 'verify=false' now after this commit this does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved using the cache only are presented. This still means that the search hit was not verified because the snippet was generated using the cache. If a cache-based generation of snippets is not possible, then the verify=false causes that the link is not rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
orbiter
fbf021bb50 redesign of index abstract processing - currently disabled until enough peers have fix in SVN 6928
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 09:44:21 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

The finalize method prevents that the memory, used by the objects containing the finalize method, is collected and available for the garbage collector. Instead, the memory allocated by such classes are enqueued to a java-internal finalize queue runner. This slows down all operations that uses a lot of object containing finalize methods.

this fix does not remove all finalize method, but such that may be used for throw-away objects that are allocated many times. This should cause a better run-time performance and less OutOfMemoryErrors 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00