Commit Graph

3309 Commits

Author SHA1 Message Date
danielr
c612046e5e r5278 java 1.5 compatible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-17 09:59:59 +00:00
f1ori
af71ec93bf oops, forgot to import something
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5279 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 22:44:25 +00:00
f1ori
9e65e9141c * always use UTF-8 for encoding hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 22:35:27 +00:00
orbiter
826ca79735 refactoring and new architecture to store the files of the web cache:
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods

this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 21:24:09 +00:00
danielr
f095137238 - respecting httpdMaxBusySessions (refusing new connections if limit is hit)
- comments in serverBusyThread converted to JavaDoc
- better debug output for npe-case in diskUsage


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 10:53:32 +00:00
orbiter
8ba33f104e fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:59:53 +00:00
orbiter
998861acfd - some refactoring in BLOBHeap to enable more gap processing functions
- better gap merging in BLOBHeap
- shrinking of heap file if gap is at end of file when file is closed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:15:54 +00:00
lotus
9d50bfd0b3 fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 09:09:53 +00:00
orbiter
766cad6e93 enhancement in memory management of BLOB Heap files / merging of deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5266 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 22:15:01 +00:00
orbiter
7860d5d632 fix for bug in seed list management (cause was bad class overloading, only visual effects!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 19:51:53 +00:00
orbiter
ffed5fc415 fixed problem with lost peers in database
migrated seedDB from BLOBTree to BLOBHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 14:40:02 +00:00
orbiter
6fb865fbdc - fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing
- some refactoring of classes that use kelondroMap (Map instead of HashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 08:39:11 +00:00
orbiter
2d65887723 - fix for bug in new profile handling
- added a new feature in ymageChart (cannot be seen yet, just wait... will be used in profiling chart)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5261 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-09 22:31:43 +00:00
orbiter
ff68f394dd fix for problem with balancer and lost crawl profiles:
if crawl profile is lost, no robots.txt is loaded any more

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5258 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-08 18:26:36 +00:00
lotus
fb8d9850ea fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1462
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-05 10:03:02 +00:00
lotus
0d1a2f6183 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1461
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-04 12:36:11 +00:00
orbiter
9ac16f565b - fixed several bugs in database management functions
- fixed a display bug for the performance graph
- fixed deadlock when initialization of awt happens simultaneously
- removed some debugging output

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-03 18:57:02 +00:00
orbiter
820a03f9d6 - removed some warnings
- used fix in SVN 5233 for ysearch.java and search.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5237 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 20:20:39 +00:00
lotus
fe2792e9ce use accept-language header instead of user agent for language detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 17:47:11 +00:00
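Note on fe2792e9ce: the commit message does not show how the Accept-Language header is evaluated. The following is only a minimal sketch of one way to pick a preferred language from that header; the class name `AcceptLanguageSketch`, the method `preferredLanguage` and the fallback parameter are hypothetical and are not taken from the YaCy sources.

```java
import java.util.Locale;

// Hypothetical helper, not YaCy code: picks the primary language subtag
// with the highest q-value from an Accept-Language header.
final class AcceptLanguageSketch {

    static String preferredLanguage(String header, String fallback) {
        if (header == null) return fallback;
        String best = fallback;
        double bestQ = -1.0;
        for (String part : header.split(",")) {
            String[] fields = part.split(";");
            if (fields.length == 0) continue;
            String tag = fields[0].trim();
            if (tag.isEmpty() || tag.equals("*")) continue;
            double q = 1.0; // a missing q parameter means weight 1.0
            for (int i = 1; i < fields.length; i++) {
                String f = fields[i].trim();
                if (f.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(f.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed weight as "not acceptable"
                    }
                }
            }
            if (q <= 0.0) continue; // q=0 marks a language as unwanted
            if (q > bestQ) {
                bestQ = q;
                int dash = tag.indexOf('-');
                String primary = dash < 0 ? tag : tag.substring(0, dash);
                best = primary.toLowerCase(Locale.ROOT);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(preferredLanguage("de-DE,de;q=0.9,en;q=0.8", "en"));
    }
}
```

Unlike the user-agent string, this header carries the user's explicit language preferences, which is presumably why the commit switched to it.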
orbiter
c8bdd965ec - larger update time for status page
- balancer writes cause of robots.txt in log file for crawl delay
- removed log output for forced GC
- smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-30 11:09:46 +00:00
lotus
dda771db9d - search result layout
- tray only for windows

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-29 12:39:57 +00:00
orbiter
ce4715e305 removed indexing of anchor links and tagging such words as part of urls (that was wrong)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 21:12:26 +00:00
orbiter
ce57de6cb3 - fixed re-setting of DHT Send/Receive settings
- small change to network graphics: smaller circles / more URLs necessary for full radius; more PPM necessary for full crawling circles
- fixed exclusion search ('-' did not work any more)
- fixed NPE bug when FTP loader wrote to the error-db

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 20:01:10 +00:00
lotus
31c31e54e4 new tray icon image for different icon sizes (e.g. linux)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 08:54:33 +00:00
f1ori
9589dfe080 * removed trayicon popupmenu title
* added some menu items to trayicon


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-27 08:25:16 +00:00
lotus
5a637f004d localized tray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 11:09:54 +00:00
lotus
9d4f0325e1 - removed shutdown from search page (we have it in tray now!)
- fixed doubleclick action for tray

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 10:55:08 +00:00
lotus
214277dad6 - revert r5202
- cleanup
- installer checks for JRE 1.6 only

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 06:52:36 +00:00
f1ori
7afa084207 * add native java trayicon, using reflection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-25 19:36:49 +00:00
apfelmaennchen
b97ff24b43 bookmarksDB / xbel.xml:
- added support for folder=/foldername
- it crashes if foldername ends with /

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-24 21:16:13 +00:00
orbiter
6e7d113eac fix for wrong index initialization after network switch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 23:30:25 +00:00
lotus
0a0cc3bf67 added missing classes to build target "run"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5201 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 15:54:12 +00:00
orbiter
7b35d54c6c fixed some problems with network switching (was not completely 'clean')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 12:11:19 +00:00
orbiter
f0b42e5a98 fixed NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-22 14:04:38 +00:00
orbiter
8e0de7f180 update to language statistic evaluation:
- the condenser does not abandon too small words any more before feeding the statistics
- for text indexing no more urls are used to feed the index (this was wrong, but in contrast the indexing of urls for media search is necessary)
- urls are not used any more to feed the statistics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 20:25:47 +00:00
orbiter
1198eeecc7 added language selection to search query:
- the language can be selected using a LANGUAGE:<language> element in the query line, i.e.:
java LANGUAGE:en
- the language can be selected with a post element in google-style syntax with the 'lr' element:
?lr=lang_en&query=java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 07:28:57 +00:00
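Note on 1198eeecc7: the two syntaxes named in the commit message (`LANGUAGE:<language>` inside the query line and the `lr=lang_en` parameter) could be handled roughly as sketched below. This is an illustration under assumptions: the class `QueryLanguageSketch` and its methods are hypothetical, and the actual YaCy parsing may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not YaCy code: separates a LANGUAGE:<code> modifier
// from the remaining search terms.
final class QueryLanguageSketch {
    private static final String PREFIX = "LANGUAGE:";

    final List<String> terms = new ArrayList<>();
    String language; // null when the query carries no language modifier

    static QueryLanguageSketch parse(String query) {
        QueryLanguageSketch q = new QueryLanguageSketch();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (token.startsWith(PREFIX) && token.length() > PREFIX.length()) {
                q.language = token.substring(PREFIX.length()).toLowerCase(Locale.ROOT);
            } else {
                q.terms.add(token);
            }
        }
        return q;
    }

    // Maps a google-style value such as "lang_en" to "en"; returns null otherwise.
    static String fromLrParameter(String lr) {
        if (lr == null || !lr.startsWith("lang_") || lr.length() == 5) return null;
        return lr.substring(5).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        QueryLanguageSketch q = parse("java LANGUAGE:en");
        System.out.println(q.terms + " / " + q.language);
        System.out.println(fromLrParameter("lang_en"));
    }
}
```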
orbiter
00c1535f84 added ranking and evaluation of language type in a search
the wanted language is taken from the browser user-agent string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 00:04:42 +00:00
lotus
a81cb78211 finally some putHTML on htroot/xml/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5188 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-20 07:55:30 +00:00
orbiter
bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information
- metadata and results from statistical analysis are compared and result is printed out as debug lines
- added ranking profile for wanted language
- added class with ISO 639 table, a list of all valid language codes that will be used for the language identification

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 22:19:11 +00:00
apfelmaennchen
5e8bd0f29c small fixes to getpageinfo_p.xml and htmlFilterContentScraper.java with respect to keyword extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5185 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 14:27:44 +00:00
apfelmaennchen
5b2a57bfd0 - /xml/util/getpageinfo_p.xml added <desc> and <lang> tags
- changed htmlFilterContentScraper.getKeywords() to split on either the space or the comma character, not both

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5183 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 21:01:23 +00:00
orbiter
e1f67262f7 - added and removed some debugging output
- fixed a bug with merge method
- patched wrong output of language identification (not fixed, only patched!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 14:12:15 +00:00
orbiter
ce2a7ed116 integrated language detection classes into condenser environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 13:12:33 +00:00
orbiter
2b13705839 fixed a mistake in indexing queue processing: documents had been parsed before it was checked if they should be indexed or not. parsing was not necessary for this check, so the check was moved in the queue in front of the document parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 11:36:09 +00:00
orbiter
21dbb39afa switched two balancer cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-17 22:13:25 +00:00
orbiter
1bbf362cef update to the crawl balancer: better organization and better crawl delay prediction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-17 21:45:21 +00:00
orbiter
ddcf285499 - fixed a bug in performance setting (did not work with german translation)
- reduced maximum number of error url references to save some memory (this was actually a small memory leak)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5174 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-16 23:04:24 +00:00
orbiter
0cd0fee546 fixed bug with wrong proxy result enqueueing. See:
http://forum.yacy-websuche.de/viewtopic.php?p=8130#p8130
- removed the online status property. This influenced the proxy behavior and created some complexity that was not needed because the online status was never used for what it was created for (offline browsing)
- checked all proxy identification procedures during crawling and enhanced transparency and error checking
- fixed a proxy identification routine that caused the wrong selection of the proxy result queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-16 21:56:23 +00:00
orbiter
670244849d fix for http://forum.yacy-websuche.de/viewtopic.php?p=9835#p9835
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 18:29:37 +00:00
lotus
fd9233244e configurable free disk space via disk.free
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 17:33:06 +00:00
orbiter
25a62cdc3f small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 15:11:59 +00:00
lotus
73f233bb11 * set resource observer to 1000MB
* transparent favicon

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5160 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 12:41:27 +00:00
orbiter
5fbccfd75e fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1366&p=9348#p9348
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-14 20:10:43 +00:00
orbiter
a28faabfd2 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1351&p=9242#p9242
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-14 20:03:59 +00:00
apfelmaennchen
7b63c66a08 - bugfix in bookmarksDB.Tag.hasPublicItems()
- this annoying little bug prevented display of public items without admin login for /xml/bookmarks/...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-14 18:45:08 +00:00
orbiter
1fb1665e71 increased dht interval to avoid peer selection failure
(maybe too few peers available to fill the big gaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 13:38:27 +00:00
orbiter
1eb813bd43 shifted index deletion-on-exit rule to the class where the errors are produced
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 11:51:48 +00:00
f1ori
ba76995d2c * fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1415
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 10:54:11 +00:00
f1ori
bea6c13139 * with r5137 robotParser didn't work at all -> fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5139 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 09:06:38 +00:00
lotus
3ded1efe84 kelondroExceptionCounter didn't work
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-11 18:51:47 +00:00
f1ori
ae677e1738 * fix problem in robotparser, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1421&p=9742
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5137 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-11 18:12:17 +00:00
lotus
383d89481e count errors before deleting collection.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 16:40:20 +00:00
lotus
0bb4fbc403 delete corrupted collection.index on exit for rebuild on next start
see http://forum.yacy-websuche.de/viewtopic.php?p=9725#p9725

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 12:55:14 +00:00
lotus
b68d06a6e8 performance settings based on network's remote crawl speed
removed some _pro values from config

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 12:52:17 +00:00
danielr
d60b2b198d proxy fixed 'not modified' http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1419
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 11:06:22 +00:00
f1ori
bd0318ba81 * YaCy only supports gzip-encoding, so remove any other encoding from request
* fixes http://www.yacy-forum.org/viewtopic.php?f=2&t=163


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-09 14:04:52 +00:00
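Note on bd0318ba81: restricting a request to gzip could look like the sketch below. The helper `restrictToGzip` is hypothetical and only illustrates the idea from the commit message (forward "gzip" if the client offered it, otherwise drop the header). It deliberately ignores q-values, which a complete implementation would have to consider.

```java
// Hypothetical sketch, not YaCy code: reduces an Accept-Encoding request
// header to the only content coding the proxy is assumed to support.
final class AcceptEncodingSketch {

    // Returns "gzip" if the client listed gzip, otherwise null, meaning
    // that the header should be left out of the forwarded request.
    static String restrictToGzip(String acceptEncoding) {
        if (acceptEncoding == null) return null;
        for (String part : acceptEncoding.split(",")) {
            int semi = part.indexOf(';');
            String coding = (semi < 0 ? part : part.substring(0, semi)).trim();
            if (coding.equalsIgnoreCase("gzip")) return "gzip";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(restrictToGzip("gzip, deflate, br"));
        System.out.println(restrictToGzip("deflate"));
    }
}
```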
orbiter
bb5c898441 enhancements to localsearch behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-09 10:24:42 +00:00
orbiter
42e2d195ac added hint from http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1294
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 22:37:58 +00:00
orbiter
39964e88fa fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1329#p9121
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 22:06:45 +00:00
orbiter
3f3673b6e5 extended balancer:
- added automatic time delay in case that a large number of urls come from the same domain
- added additional time delay in case that an url is a dynamic (CGI) url. This shall cause less IO on targets


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 21:50:37 +00:00
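Note on 3f3673b6e5: the two delay rules described in the commit message (extra delay when many URLs come from the same domain, and extra delay for dynamic/CGI URLs) can be pictured as follows. Everything in this sketch is an assumption made for illustration: the class name, the thresholds, the delay values and the heuristic for recognizing a dynamic URL are not taken from the YaCy balancer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the YaCy balancer: computes a per-request delay.
final class CrawlDelaySketch {
    // All constants below are invented for this example.
    static final long BASE_DELAY_MS = 500;
    static final int MANY_URLS_THRESHOLD = 100;
    static final long SAME_DOMAIN_EXTRA_MS = 1000;
    static final long DYNAMIC_EXTRA_MS = 2000;

    private final Map<String, Integer> queuedPerHost = new HashMap<>();

    // Records that one more URL of the given host is waiting in the queue.
    void enqueue(String host) {
        queuedPerHost.merge(host, 1, Integer::sum);
    }

    // Simple assumed heuristic: query strings and cgi-bin paths count as dynamic.
    static boolean isDynamic(String url) {
        return url.indexOf('?') >= 0 || url.contains("/cgi-bin/");
    }

    long delayFor(String host, String url) {
        long delay = BASE_DELAY_MS;
        if (queuedPerHost.getOrDefault(host, 0) > MANY_URLS_THRESHOLD) {
            delay += SAME_DOMAIN_EXTRA_MS; // many URLs from one domain
        }
        if (isDynamic(url)) {
            delay += DYNAMIC_EXTRA_MS; // dynamic pages cost the target more IO
        }
        return delay;
    }

    public static void main(String[] args) {
        CrawlDelaySketch b = new CrawlDelaySketch();
        System.out.println(b.delayFor("example.org", "http://example.org/search?q=x"));
    }
}
```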
orbiter
3c6e8d2015 set default ppm when network is switched
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 18:20:05 +00:00
orbiter
3288c19c1a reduce remote crawl PPM for fresh peers in freeworld to 6 PPM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 09:49:08 +00:00
lotus
5ce9a100bb fix(2) for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1416
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-07 13:57:53 +00:00
danielr
cf29ca19d4 possible fix for POST character encoding http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-07 13:10:46 +00:00
danielr
a2eeb6138c fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1416
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-07 13:04:17 +00:00
orbiter
d09ddabd09 corrected a design mistake (5-byte hashes not necessary)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 21:28:00 +00:00
orbiter
c97d0fcee7 modified the domain list export function:
- used the new superfast domain list generation from the domain statistics
- better interactive behavior

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 20:28:36 +00:00
orbiter
77ee0765a4 - added domain statistic generation to IndexControlURLs_p.html servlet
- added 'delete all' button to all results of such a domain statistic output which causes that all urls to this domain are deleted
- extended stack cleaner to clean also the statistics: they are not completely destroyed, only the smallest counting domains are removed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 19:41:57 +00:00
orbiter
80a7bc93d6 - added statistical evaluation about domains that appear during crawling
- added tables that show these statistics in CrawlResults web pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 09:59:17 +00:00
orbiter
4fbee21cea - added fetch-ahead again (had been removed in last commit)
- reverted default query mode to verify=false

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 23:50:13 +00:00
lotus
423a89ebe8 * fix if yacy was installed to a path with whitespace
* show nice dots when waiting for restart/update

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 18:49:02 +00:00
orbiter
fc03b0437a fixed an error case where a second search after a first search with a different search word failed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5109 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 15:55:25 +00:00
orbiter
eca171ba2e fix for case where javascript was not filtered by the html parser
see http://forum.yacy-websuche.de/viewtopic.php?p=9667#p9667

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 14:41:20 +00:00
lotus
e645bae29f display table in log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 13:14:01 +00:00
orbiter
ead39064c5 fixed problem with wrong result number calculation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 10:04:46 +00:00
hermens
2437beb96c fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1360&p=9321#p9321
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 07:39:03 +00:00
orbiter
7b12e77a63 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1393&hilit=&p=9655#p9655
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:50:42 +00:00
orbiter
05dbba4bab added logging conditions to all fine and finest log line calls
this will prevent an overhead for the generation of the log lines in case that they then are not printed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:30:21 +00:00
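Note on 05dbba4bab: the pattern described in the commit message is commonly written as a level check in front of the log call. The sketch below uses `java.util.logging` for illustration; YaCy's own logging wrapper is not shown in this log, so the class `GuardedLoggingSketch` and its counter are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: shows why a level check avoids building log lines
// that would be discarded anyway.
final class GuardedLoggingSketch {
    static int buildCount = 0; // counts how often the message was assembled

    static String expensiveMessage() {
        buildCount++;
        return "queue state at " + System.nanoTime();
    }

    // Guarded: the message is only built when FINE is actually enabled.
    static void logGuarded(Logger log) {
        if (log.isLoggable(Level.FINE)) {
            log.fine(expensiveMessage());
        }
    }

    // Unguarded: the message is always built, even if it is never printed.
    static void logUnguarded(Logger log) {
        log.fine(expensiveMessage());
    }

    public static void main(String[] args) {
        Logger log = Logger.getAnonymousLogger();
        log.setLevel(Level.INFO);
        logGuarded(log);
        logUnguarded(log);
        System.out.println("messages built: " + buildCount);
    }
}
```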
orbiter
d3d41e2ee4 - fixed problem with searching with quotes (still not complete, but not as bad as before)
- fixed parsing of crawl-delay statements when seconds were given with float numbers
- enhanced performance of profiling (not too many loggings; not more than one per second)
- removed some debug output
- fixed wrong return type in logging
- added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!)
- fixed wrong word distance computation in RWI management


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-02 23:49:48 +00:00
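Note on d3d41e2ee4: a robots.txt `Crawl-delay` value given in fractional seconds can be read as a floating-point number and converted to milliseconds. The parser below is a sketch under that assumption; the class and method names are hypothetical, and returning -1 for "no usable value" is a choice made for this example only.

```java
import java.util.Locale;

// Hypothetical sketch, not the YaCy robots parser: reads one robots.txt line.
final class CrawlDelayParserSketch {

    // Returns the delay in milliseconds, or -1 if the line is not a
    // well-formed, non-negative Crawl-delay statement.
    static long parseCrawlDelayMillis(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) return -1;
        String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        if (!key.equals("crawl-delay")) return -1;
        try {
            double seconds = Double.parseDouble(line.substring(colon + 1).trim());
            if (Double.isNaN(seconds) || Double.isInfinite(seconds) || seconds < 0) {
                return -1;
            }
            return Math.round(seconds * 1000.0);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCrawlDelayMillis("Crawl-delay: 1.5"));
    }
}
```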
lotus
3fbfd5a78b * fix for non-changing offset on new search term
* dht-heap doesn't have to be deleted (5097), we simply write a new one on exit
* do not install YaCy in startup because a Windows-shutdown might corrupt something. Installing YaCy as a service would solve this.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-02 15:09:31 +00:00
danielr
219b93df6a - fixed internal error after receiving chunked POST
- removed debug output
- added info for "501 Unknown" messages



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-29 13:51:22 +00:00
lotus
c245c7a45e delete index.dhtin/out.heap if restore fails
see http://forum.yacy-websuche.de/viewtopic.php?p=9613#p9613

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-29 13:10:41 +00:00
danielr
cd19d0aee6 - added warnings for failed transferRWI (dht-in)
- fixed parseMultipart (uncompress gzipped body) (dht-in)
- fixed parseMultipart (using content-length only if uncompressed)
- better gzipped POST (chunked instead of content-length) (dht-out)



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-29 09:42:39 +00:00
orbiter
df4ff423c4 added additional properties to query id's to distinguish search events better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-28 21:15:59 +00:00
danielr
d6d9b0f14a fixed transferRWI.html 'Read timed out'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-28 08:37:51 +00:00
danielr
e503158527 Proxy: fix for never ending loading after POST
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-27 20:46:34 +00:00
danielr
1a1d57e449 Proxy: added binary passthrough for POST
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-27 08:07:18 +00:00
apfelmaennchen
aa6ae77e5e - autoReCrawl: fix for filter settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 21:51:05 +00:00
apfelmaennchen
8ae29bad57 - fix to previous change of Crawl Profile Names
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 20:42:29 +00:00
apfelmaennchen
434104e4a0 - change Crawl profile name for autoreCrawl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 18:08:48 +00:00
danielr
9ff4fc11da partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 16:34:24 +00:00
lotus
0df2e47012 changed auto recrawl to comply with new date format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 13:36:10 +00:00
lotus
d9d9c522a1 addendum to last commit
moved recrawl times for standard profiles to constants
calculate new specific dates in cleanup job

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 13:20:18 +00:00
lotus
480497f7c9 changed recrawl
use a specific date to define old documents
this solves an unwanted recrawl-loop during a running crawl

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 20:31:32 +00:00
orbiter
da1b0b2fc6 added two new classes that will be used for the new htcache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:22:23 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
borg-0300
08cdf6db8a fix for wrong "VegaYacyB" peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-24 11:30:00 +00:00
danielr
4d937f6b21 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1396
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-22 23:46:32 +00:00
apfelmaennchen
bd931a82f7 - added dynamic filters to autoReCrawl.conf
- Restrict to sub-path: sub
- Restrict to start-domain: dom

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-22 18:05:05 +00:00
apfelmaennchen
b3fc5e96a3 - removed unused import from bookmarksDB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 21:26:06 +00:00
apfelmaennchen
bc048db7b6 - bugfix for bookmarksDB's rebuildDates()
- dates are now saved as String.valueOf(TimeStamp)
- it might be a good idea to delete (backup) bookmarkDates.db and restart YaCy to rebuild it 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 21:25:05 +00:00
danielr
3c68905540 remove redundant null checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 08:37:39 +00:00
danielr
753a1ae430 - changed default browser from netscape to firefox
- fixed "Inefficient use of keySet iterator instead of entrySet iterator" [WMI_WRONG_MAP_ITERATOR, FindBugs]
- fixed some possible null pointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 07:54:56 +00:00
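Note on 753a1ae430: the FindBugs warning WMI_WRONG_MAP_ITERATOR refers to loops that iterate over `keySet()` and then call `get()` for every key. The sketch below contrasts both forms on a made-up map; the class `MapIterationSketch` is hypothetical and does not reproduce the code that was changed in the commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: two ways to visit all key/value pairs of a map.
final class MapIterationSketch {

    // Flagged form: one extra lookup per key.
    static long sumViaKeySet(Map<String, Integer> counts) {
        long sum = 0;
        for (String key : counts.keySet()) {
            sum += counts.get(key);
        }
        return sum;
    }

    // Preferred form: key and value come from the same entry, no lookup.
    static long sumViaEntrySet(Map<String, Integer> counts) {
        long sum = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);
        counts.put("c", 3);
        System.out.println(sumViaKeySet(counts) + " " + sumViaEntrySet(counts));
    }
}
```

Both methods return the same result; the difference is only the avoided lookups.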
orbiter
7989335ed6 Preparations to replace the HTCache with a new storage data structure:
- refactoring of the HTCache (separation of cache entry)
- added new storage class for BLOBs. (not used yet, this is half-way to a new structure)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-19 14:10:40 +00:00
danielr
be28af50f5 - fixed "yacy2yacy no proxy"-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-17 10:16:32 +00:00
f1ori
f99c307eff * correct debian build dependencies
* add huge mem page detection in general initscript
* disable logging completely in jmimemagic-library


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-14 21:01:21 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to final variables)
- removed some unused imports

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
d9cea5ff23 removed annotations which broke the build with java 1.5
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 09:07:23 +00:00
danielr
a087090bbb fixed starting crawl results in "No parser available to parse mimetype 'application/octet-stream'"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-10 11:31:40 +00:00
danielr
7e7e6a099a undo 5044
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-10 10:54:13 +00:00
danielr
f2d0bd7790 fix for NPE in JakartaHttpClient.setProxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-10 09:37:32 +00:00
danielr
bb6a6fc233 fixed 'FileUploadException Stream ended unexpectedly'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-09 22:44:17 +00:00
danielr
8422ee5ec4 - fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined
- serverFileUtils.copy* now use Charset instead of String
- added some warnings for ignored exceptions


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-09 12:00:31 +00:00
hermens
3ac1988059 Add some sanity checks for invalid seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-08 13:56:29 +00:00
hermens
cff4393f0c Fix HTCache so oldest Files get deleted first
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5041 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-08 08:06:06 +00:00
danielr
31d97f2b9f replaced httpd.parseMultipart() by a 'right' implementation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-08 01:40:28 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- slightly improved performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
apfelmaennchen
0500b1179e added a 2 min start up delay to serverBusyThread autoReCrawl to avoid a Null Pointer Exception...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5035 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-05 05:50:15 +00:00
apfelmaennchen
e1574fe02e - added autoReCrawl folders to bookmarks (DATA/SETTINGS/autoReCrawl.conf)
- the serverBusyThread checks folders every 60 min. (==> autoReCrawl_idlesleep in yacy.conf)
- added option to create bookmarks from CrawlStart URL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-04 20:43:36 +00:00
orbiter
ebb40d324b enhanced memory chart: now also shows the size of the word cache as third vector.
The PPM is now shown without a scale, but with a new annotation at the chart entry.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-04 10:47:26 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
lotus
7e92484400 fix for open browser on windows 2000
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-01 12:58:36 +00:00
f1ori
b0724e5ec0 * add config option to disable cookie monitoring (disabled by default)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-30 21:19:06 +00:00
lotus
0b2f67577e Index Transfer:
- fix for chunk size calculation
- fix: if chunk size was 1, an infinite selection loop ran because no entries were found. if chunk size falls to <=3 it will be set back to 500

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-27 18:53:51 +00:00
lotus
694084c570 fix for NPE on shutdown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5021 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-27 06:59:56 +00:00
lotus
5f77f55ed7 possible fix for negative speed values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-27 06:58:35 +00:00
orbiter
50ef5c406f - refactoring of robots parser (removed opaque Objects[] result vector)
- added Allow-component to robots result object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 11:54:37 +00:00
danielr
7913bdb75b Flextable: filename in errormessage if inconsistent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 08:22:36 +00:00
lotus
d42eae25f8 yacyTray:
fix for improper shutdown
some messages

installer:
start shortcuts minimized

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 06:49:30 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
orbiter
3ca98fee42 removed superfluous copyright statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5010 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 00:21:07 +00:00
danielr
c049d80fbd fixed login problem with yacy as proxy (POST and Cookies)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-19 15:10:00 +00:00
lotus
62afea0c9f some improvements for yacyTray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5008 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-18 14:17:52 +00:00
danielr
7c110e07f0 removed debug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-16 12:39:38 +00:00
danielr
eadc204130 made gzip POST repeatable (makes transferURL more stable)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-16 09:46:25 +00:00
lotus
28c39e2aa4 fix for new starter files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5002 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-14 18:41:11 +00:00
lotus
fa695c2d9f tray is now only shown on Windows and doesn't block on linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 19:03:38 +00:00
lotus
d77ed28e2f temporarily disabled tray because of flaws on shell-only linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 08:41:39 +00:00
lotus
f8a1e3175e new yacyTray
this will make a YaCy icon in the tray area on supported platforms
enabled by default
the search page will open on double click

used JDIC 0.9.4 from https://jdic.dev.java.net/

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 07:51:45 +00:00
orbiter
05c26d58d9 fixed missing remove operation in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 12:03:18 +00:00
orbiter
606b323a2d fixed bug that appeared when a new crawl is started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 10:59:06 +00:00
orbiter
38eb5bd1ee fixed a bug in kelondroBLOBHeap. The following files are probably inconsistent and should be deleted:
DATA/HTCACHE/responseHeader.heap
DATA/PLASMADB/crawlRobotsTxt.heap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 10:38:20 +00:00
orbiter
28d5703f8a - fixed a bug in Robots.txt loader which could cause robots.txt files to be loaded from the same domain more than once
- patch in BLOBHeap to prevent OOM during startup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 09:12:54 +00:00
orbiter
7b1c9e6aee discovered and removed a (possibly large) memory leak:
many classes used the kelondroMapDataMining (was: kelondroMapObjects) which adds statistical
functions to the kelondroMap (was: kelondroObjects), but these functions were not used by these
classes. Especially the HTCACHE and robots.txt database allocate a very large number of objects
for statistical use, but never used them. By replacing the kelondroMapDataMining with the
kelondroMap object for these classes now less memory is allocated.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:34:48 +00:00
orbiter
0f5fe8cc53 refactoring of method calling for objects from kelondroMapDataMining
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:15:46 +00:00
orbiter
01d1ae6676 patch for negative time in case that the time of the computer is changed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4984 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:05:08 +00:00
orbiter
4acf0a61cd refactoring of kelondroObjects (mainly renaming to kelondroMap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 22:08:16 +00:00
orbiter
441e9c861e fix for npe in HTCache cleaning process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4981 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 21:30:39 +00:00
orbiter
f7aaeb3fad created new main menu entry 'Customization and Integration'
- moved some already existing servlets to this menu
- renamed the skin servlet to appearance
- added a set-to-default-button to the search page appearance setting
- removed the peer profile servlet which is now replaced by a field in the new appearance servlet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 19:57:09 +00:00
lotus
5488543b8f disabled disk usage logpoints
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 07:30:50 +00:00
orbiter
1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, a file-AVL-based index data structure.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (bugfixing for double-loading of the same robots.txt)
- other BLOB-dependent data structures were prepared to use also the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less IO during snippet fetch and during crawling


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
orbiter
81f75f5056 - removed unnecessary classes (these objects are much easier to handle using generics)
- generalized BLOB referencing. This is the preparation to use another BLOB class, the kelondroHeap


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 23:52:53 +00:00
orbiter
b38f467e3c better SRU compliance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 21:50:24 +00:00
orbiter
7052f2f61f - added copyright header of ResourceObserver
- commented/removed some code to eliminate code warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:40:45 +00:00
orbiter
1400cdc91e - refactoring of resourceObserver (moved it to crawler)
- partial redesign of diskUsage: slightly more functional behavior, fewer side effects, better error case handling
- the resourceObserver can now show an error message if the diskUsage is 'out of order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:03:37 +00:00
f1ori
b6301a54fa * added class ListDirs to provide generic listing of directories in system directories and jar-files
* yacy runs, when classes are in a jar-file (->build-jar ant-target)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 14:11:40 +00:00
lotus
f2e2d09916 - fix for index transfer
- imported a random startpoint function from plasmaDHTChunk
in case there was already a gap at the beginning of the index, the transfer process kept selecting endlessly from the first startpoint
tested & working on my index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 13:16:17 +00:00
orbiter
a6719dfd2b - refactoring of robots parser
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-05 00:35:20 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
orbiter
474e29ce4a added options to configure the 'corporate identity'-icons, the home page link and the greeting line from
the skin menu. Additionally an example is given there how to integrate a search page with an iframe.
Please see the skin menu.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 23:37:04 +00:00
orbiter
474659a71f - modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order
- added option to set minimum crawl delta for domains in balancer
- added default values to crawl deltas in yacy.init
- added configuration for these deltas in performance queues
- enhanced performance setting computation (more time for indexing queue for a faster flush)
- remote crawling is now enabled during local crawling if indexer has space and time for more links
- added database stub for new distributed file system
- refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 13:08:37 +00:00
orbiter
080cda97ef added another peer selection rule:
- select also non-robinson (dht-) peers if their peer tags match with search words
- the peer tag '*' can now act as catch-all rule: shall be selected always

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 23:04:32 +00:00
orbiter
d37fd064f9 changed peer selection for search targets:
- less dht targets are selected
- more other peers are selected: all robinson peers with more than one million urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4962 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 22:42:52 +00:00
orbiter
69aac0d74c modified the diskUsage class regarding the following two aspects:
1. The usage and dependency of the plasmaSwitchboard was used many times in the past but this was
a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over.
2. the class is considered tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hardware should not support dynamic object allocation, but should be coded as a collection of private static methods. Therefore all class objects were transformed into static private objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 21:47:53 +00:00
danielr
da917cf4b1 undo reduced menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4960 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 07:11:13 +00:00
danielr
0c1dc703e4 - set staticIP at startUp
- added setting for reduced menu (simpleMenu)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-29 18:35:15 +00:00
danielr
f7f9ceb967 diskUsage: replaced blocking sleep with semaphore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-26 12:05:12 +00:00
lotus
4a53649ee7 fixed dht-urls and ranking distribution log statistics
* NOTE: please have in mind that there can be whitespaces in pathnames

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-26 07:12:03 +00:00
lotus
8d83185cb4 fixed dht-chunks/protocol log statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-25 08:15:07 +00:00
danielr
63eadfdf84 fixed unlimited FileSizeLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-24 19:11:27 +00:00
lotus
2dc7c00c1c fixed indexing log statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-24 07:01:04 +00:00
danielr
dba7ba079e fixed NPE seen with queues_p.xml (serverClassLoader finds already loaded classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-23 16:55:46 +00:00
det
273fb01142 revert last fix; was wrong
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-21 21:07:28 +00:00
det
b6f50851fa fix memory requirement calculation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-21 20:58:57 +00:00
lotus
ac85c52bae better readability for MIN_FREE_DISK_SPACE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-21 10:20:36 +00:00
lotus
54a73b58cf fixed restart on Windows when directory had spaces in its name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-21 09:19:26 +00:00
det
609aaf0df3 rework of the windows part
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 12:13:06 +00:00
det
1a4f26ba30 exclude HTDOCS from recursive scan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 10:03:49 +00:00
det
6c07e894d9 add needed sleep
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 09:53:23 +00:00
hermens
d742cc080c Fix for RAMCache not flushing
see: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1255



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-19 18:27:48 +00:00
danielr
6b7e873962 resourceObserver refactoring and some synchronisation for console output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4939 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-19 12:40:44 +00:00
orbiter
6bdd99e065 - more asserts to solve the ooB-problem
- better caching (?), let's see how it behaves

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-18 21:08:56 +00:00
orbiter
b928ae492a some code-cleanup and possible speed enhancements in different core methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-17 23:56:39 +00:00
danielr
6a9cc29cdd workaround for IndexOutOfBoundsException in ResultURLs.getExecutorHash() seen @ CrawlResults.html?process=4
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-17 18:56:04 +00:00
orbiter
c998dc6556 - added security functions to flush url and search caches in case that memory is full
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-16 21:39:58 +00:00
orbiter
f4ae8082c3 - better error analysis for ooRange Exception in kelondroBase64Ordering
- quadcore support for kelondroRowSet array ordering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-15 23:25:57 +00:00
orbiter
84cbe75005 more asserts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-15 00:04:59 +00:00
orbiter
e269c12710 small changes in partition routine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 23:17:56 +00:00
orbiter
31efb8fbee - fix for LOG path generation when the DATA/LOG does not exist (fix for bug introduced in SVN 4923)
- some more/better asserts
- slight performance enhancements in remove method in index management. Works for all who do not run using asserts (the majority)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 22:51:47 +00:00