Commit Graph

589 Commits

Author SHA1 Message Date
orbiter
670244849d fix for http://forum.yacy-websuche.de/viewtopic.php?p=9835#p9835
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 18:29:37 +00:00
orbiter
1fb1665e71 increased dht interval to avoid peer selection failure
(maybe too less peers available to fill the big gaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 13:38:27 +00:00
lotus
b68d06a6e8 performance settings based on network's remote crawl speed
removed some _pro values from config

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 12:52:17 +00:00
orbiter
3c6e8d2015 set default ppm when network is switched
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 18:20:05 +00:00
orbiter
3288c19c1a reduce remote crawl PPM for fresh peers in freeworld to 6 PPM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 09:49:08 +00:00
orbiter
05dbba4bab added logging conditions to all fine and finest log line calls
this will prevent an overhead for the generation of the log lines in case that they then are not printed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:30:21 +00:00
danielr
9ff4fc11da partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 16:34:24 +00:00
lotus
d9d9c522a1 addendum to last commit
moved recrawl times for standard profiles to constants
calculate new specific dates in cleanup job

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 13:20:18 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
danielr
753a1ae430 - changed default browser from netscape to firefox
- fixed "Inefficient use of keySet iterator instead of entrySet iterator" [WMI_WRONG_MAP_ITERATOR, FindBugs]
- fixed some possible null pointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 07:54:56 +00:00
orbiter
7989335ed6 Preparations to replace the HTCache with a new storage data structure:
- refactoring of the HTCache (separation of cache entry)
- added new storage class for BLOBs. (not used yet, this is half-way to a new structure)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-19 14:10:40 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to finals variables)
- removed some unused import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
a087090bbb fixed starting crawl results in "No parser available to parse mimetype 'application/octet-stream'"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-10 11:31:40 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
orbiter
ebb40d324b enhanced memory chart: shows now also the size of the word cache as third vector.
The PPM is now shown without a scale, but with a new anotation at the chart entry.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-04 10:47:26 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
50ef5c406f - refactoring of robots parser (removed opaque Objects[] result vector)
- added Allow-component to robots result object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 11:54:37 +00:00
lotus
62afea0c9f some improvements for yacyTray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5008 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-18 14:17:52 +00:00
lotus
fa695c2d9f tray is now only shown on Windows and doesn't block on linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 19:03:38 +00:00
lotus
d77ed28e2f temporary disabled tray because of flaws on only-shell-linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 08:41:39 +00:00
lotus
f8a1e3175e new yacyTray
this will make a YaCy icon in the tray area on supported platforms
enabled by default
the search page will open on double click

used JDIC 0.9.4 from https://jdic.dev.java.net/

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 07:51:45 +00:00
orbiter
1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, a file-AVL-based index data structure.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (bugfixing for double-loading of the same robots.txt)
- other BLOB-dependent data structures were prepared to use also the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less IO during snippet fetch and during crawling


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
orbiter
1400cdc91e - refactoring of resourceObserver (moved it to crawler)
- partly redesign of diskUsage: little bit more functional behavior, less side effects, better error case handling
- the resourceObserver can now show a error message if the diskUsage is 'out of order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:03:37 +00:00
orbiter
a6719dfd2b - refactoring of robots parser
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-05 00:35:20 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
orbiter
474659a71f - modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order
- added option to set minimum crawl delta for domains in balancer
- added default values to crawl deltas in yacy.init
- added configuration for these deltas in performance queues
- enhanced performance setting computation (more time for indexing queue for a faster flush
- remote crawling is now enabled during local crawling if indexer has space and time for more links
- added database stub for new distributed file system
- refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 13:08:37 +00:00
orbiter
69aac0d74c modified the diskUsage class regarding the following two aspects:
1. The usage and dependency of the plasmaSwitchboad was used many times in the past but this was
a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over.
2. the class is considered as a tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hadware should not support dynamic object allocation, but should be coded as collection of private static methods. Therefore all class objects had been transformed into static private objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 21:47:53 +00:00
danielr
0c1dc703e4 - set staticIP at startUp
- added setting for reduced menu (simpleMenu)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-29 18:35:15 +00:00
orbiter
c998dc6556 - added security functions to flush url and search caches in case that memory is full
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-16 21:39:58 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
det
f597185026 Initial import of the resource observer framework
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 13:10:21 +00:00
orbiter
e0e7f86f82 some bugfixes for the peer-ping process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-05 12:52:27 +00:00
orbiter
40d7f485f3 - fixed several NPE bugs
- fixed loosing of own seed hash (hopefully)
- fixed a bug with crawl start s beginning with (bookmark) files
- added better IP recognition during hello process


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-04 22:24:00 +00:00
orbiter
2f381b8d7a - fixed at least two causes for a NPE after a use case switch.
A large refactoring was neccessary
- added another crawl start option: automatic restriction to sub-path
- removed crawlStartSimple and renamed crawl start expert
   to crawl start (without expert)
- some changes to texts in crawl start
- added some more deletions when an web index is deleted:
   delete also queues and robots cache


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-04 21:34:57 +00:00
orbiter
2a604b7402 added superfast search result computation which can be obtained for local search when snippet fetching is disabled. An example search for the rss interface would be:
http://localhost:8080/yacysearch.rss?query=yacy&Enter=Search&contentdom=text&count=10&resource=local&verify=false
(just add "&verify=false")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-03 23:06:01 +00:00
orbiter
9bef20b537 - added cleanup for unused server loggings: they are removed after the client had not been seen since one hour
- removed configBasic popup trigger when no password is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-02 21:49:59 +00:00
orbiter
1a1841392c small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-26 22:55:55 +00:00
orbiter
faed00d75d added use cases to basic configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-20 13:21:55 +00:00
orbiter
4229cd275c fixed several details about network switching, default password, random password and localhost authentification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-20 09:29:01 +00:00
orbiter
c1d721dd2d fix for attacks on localhost-authorized peers from web pages with links to localhost addresses:
checking of referer in access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-19 22:17:53 +00:00
orbiter
3bd1db776a implemented switch for admin authorization from localhost:
- access is granted for localhost users to administration pages by default
- the default setting can be changed in the BasicConfig.html page
- if the BasicConfig page was accessed with post and no password was submitted, a random password is generated
- a headless installation MUST give a password upon first call of the configuration page, otherwise they will not be able to access it again
- if no password is given within 10 minutes after start-up, a random password is generated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-15 11:26:43 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
78087da287 - changed seed file storage to clear text
- fixed kill script
- fixed saving of seed file (had been corrupted by latest changes)
- some refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 20:30:44 +00:00
orbiter
5fde679acb - fixed problem in performance configuration
- extended rss fetch size for rssTerminal


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-13 15:28:55 +00:00
orbiter
239cc4428d - better domain graph, faster when more links exist, looks better
- new authorization rule: localhost is always authorized for administration. This solves many problems with ajax, and also fixed a problem in rssTerminal
- fix bug in RSSFeed which prevented that entries had been recognized as individual, new entries
- added reloading/updating of status image on status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 22:23:29 +00:00
orbiter
dd75b3cabc - patch for bad profiles
- time-out when deleting profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 14:58:56 +00:00
orbiter
b32736762c enhanced rssTerminal
- 3 lines possible
- distinguishing of private and public data, if not authorized only public data is shown
- shows now more events, including local searches in clear text if user is logged in
- simplyfied peer events
- better recognition of 'real' new peers
- presentation of peer pings from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 23:05:48 +00:00
orbiter
fbb712c669 refactoring:
moved importer classes to crawler and plasma package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 13:44:38 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00