Commit Graph

973 Commits

Author SHA1 Message Date
orbiter
461a2a6ec7 enhanced remote crawling:
- 300 ppm is default now (but this is switched off by default; if you switch it on you may want more traffic?)
- better timing for busy queue
- better amount of remote url retrieval
- better time-out values
- better tracking of availability of remote crawl urls
- more logging for result of receipt sending

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 09:34:17 +00:00
orbiter
670ba4d52b - removed the remote crawl option from the network configuration submenu and
- added a remote crawl menu item to the index create menu. This menu also shows a list of peers that provide remote crawl urls
- set remote crawl option by default to off. This option may be important but it also confuses first-time users


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 00:39:05 +00:00
orbiter
89c2d8b81e better initial hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7157 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 22:11:52 +00:00
orbiter
0cf006865e refactoring and enhanced concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 11:38:03 +00:00
orbiter
83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before)
- added use of LookAheadIterator which should prevent mistakes when coding iterators with embedded iterators
- added a fail-safe reaction in case of database corruption using iterators over database elements (no interruption then)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 10:43:14 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
ac1c08924e more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 15:27:27 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
orbiter
7ebef56add - redesign of a part of the remote search client to make it possible to have a test environment for remote search performance tests
- added a remote search test main methods in yacyClient

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7146 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 13:35:47 +00:00
orbiter
3c0e07ba72 removed all delays in shutdown process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 09:13:28 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
32f73d1aaa added copy for Info.plist for Mac application release updates (this file contains class paths and start parameters)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 09:48:09 +00:00
orbiter
570ca577c6 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 22:42:54 +00:00
orbiter
5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
- added some clear statements that shall clear static cache size within the pdfbox library
- the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-07 17:13:47 +00:00
orbiter
b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by:
- storing the startup properties when yacy is started
- using the properties in the restart-script again. this transports also the DATA directory location as parameter of the -gui option that is used when the Mac version of YaCy is started

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 23:08:43 +00:00
orbiter
d4e4967e19 cleaned up code in yacyRelease (there will be work to do there)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 22:35:48 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
f1ori
55da979291 disable revision detection for git
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:11:19 +00:00
orbiter
0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
- refactoring of method names to comply with Map method names

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-24 12:36:56 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
86d7f8a989 - the web visualization can now be generated in custom color
- added input fields in WatchWebStructure_p.html
- introduced enum classes for Draw Mode and Filter Mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 10:44:00 +00:00
orbiter
64d4204f44 fix for NPE in network image computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 08:18:17 +00:00
sixcooler
a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
make the occurrence of multiple header-keys possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-10 21:22:30 +00:00
orbiter
b480b7a4d0 fix for bug in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 00:13:32 +00:00
orbiter
b12bfe1f91 better usage of OSM tile cache and YaCy cache by usage of better tile server computation based on a coordinate hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-08 23:51:37 +00:00
orbiter
388aa021c2 - concurrent loading of OSM tiles
- added  a 4-time re-try in case that tile server does not respond

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-08 23:14:08 +00:00
orbiter
301a59e07f moved browser access method from kelondro/util/OS to gui/framework/Browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-05 10:49:58 +00:00
orbiter
6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
- confirm that database objects that are not supposed to grow do not have a index memory management that is designed for growth
- changed index sorting method in such a way that it allocates less objects during quicksort
- database classes classes renaming (shorter, naming addresses that objects hold in RAM)
- added a large number of asserts to check if objects actually take the RAM that they should have


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 13:33:12 +00:00
orbiter
5924a0d851 - enhanced concurrency in database index access for multicore
- added statistics about database index caches in PerformanceMemory_p.html
- adoped many classes to use the new statistics
- added missing close statements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 04:58:48 +00:00
orbiter
610855e362 do not use network graph cache if called from authorized account
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:43:15 +00:00
sixcooler
39d96abbb5 fix yacyRelease download
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2920&p=20545#p20545)
better cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 20:13:20 +00:00
sixcooler
c29f24a519 ... migrating to HttpComponents-Client-4.x ...
- Proxy
- Release-download

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-01 22:35:11 +00:00
orbiter
e7ea3b3cc5 added a buffer for network images to reduced load on yacy.net network image server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 12:45:53 +00:00
orbiter
d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 11:40:58 +00:00
sixcooler
b7102eff92 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 23:08:37 +00:00
lotus
74f6fd229e some comments + debug code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 08:12:58 +00:00
sixcooler
5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
monitoring and first try to use remoteProxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-20 01:14:28 +00:00
orbiter
22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:14:20 +00:00
sixcooler
0e56d29335 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-15 00:59:53 +00:00
sixcooler
e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-13 22:10:24 +00:00
sixcooler
c5c67f0504 start migrating to HttpComponents-Client-4.x
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-12 23:07:05 +00:00
orbiter
7188c54ddb patch to get dht access to developer peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 08:42:29 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents.
To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00
orbiter
bf25407fdd added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 23:10:57 +00:00