Commit Graph

5027 Commits

Author SHA1 Message Date
orbiter
867d0f2f56 removed some unnecessary pause delays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 23:36:33 +00:00
f1ori
d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092
  and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 20:49:16 +00:00
orbiter
8c96bc2ac1 do not use proxy caching rules for crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 16:31:04 +00:00
lotus
fd83e59f8e new remote search average
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 11:50:46 +00:00
orbiter
dba7ef5144 extended crawling constraints:
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should by default be the empty string (match-never)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 09:58:56 +00:00
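The must-match / must-not-match constraint described in the commit above can be sketched roughly as follows. This is a minimal illustration, not YaCy's actual code: the class name CrawlFilterSketch and the method accepts are hypothetical, and whole-string matching is an assumption. Under that assumption an empty 'mustnotmatch' pattern matches only the empty string, so it never excludes a real url.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a crawl filter with a must-match and a must-not-match pattern.
final class CrawlFilterSketch {
    private final Pattern mustMatch;
    private final Pattern mustNotMatch;

    CrawlFilterSketch(String mustMatch, String mustNotMatch) {
        // Compile once so that every url check reuses the same patterns.
        this.mustMatch = Pattern.compile(mustMatch);
        this.mustNotMatch = Pattern.compile(mustNotMatch);
    }

    // A url is accepted if it fully matches 'mustmatch' and does not fully match 'mustnotmatch'.
    boolean accepts(String url) {
        return mustMatch.matcher(url).matches()
            && !mustNotMatch.matcher(url).matches();
    }
}
```

With mustmatch set to ".*" and mustnotmatch left empty, every non-empty url passes; a mustnotmatch of ".*\.pdf" would exclude urls ending in .pdf.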
orbiter
96174b2b56 more debugging / better result status logging for parser/caching errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:41:43 +00:00
orbiter
84185baa81 added more test files for windows from lulabad
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:17:30 +00:00
lotus
73c44573e8 revert; used to hide memory tables and thread timings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 21:02:18 +00:00
danielr
8e1636e6d0 removed unused config-option
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5338 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 13:08:08 +00:00
f1ori
90e78b2cf6 * improve encoding detection of http service
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 21:06:32 +00:00
orbiter
3246358485 mistake -> rename
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:10:52 +00:00
orbiter
55ec57d27f added linux umlaute test files from low012
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:02:19 +00:00
orbiter
0ae84f4f8e set some default values for a crawl start that should cause less confusion and mistakes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:48:22 +00:00
orbiter
e9262b3890 re-named old test files
added more mac test files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:41:48 +00:00
orbiter
ff2a54da68 added more umlaute test files: mac
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:33:48 +00:00
lotus
4745e89451 auto-choose crawl type
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5331 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 14:44:23 +00:00
low012
421d056550 *) changed layout of blacklist administration (less cluttered)
*) it is possible to move/edit/delete more than one entry at a time now
*) it is easier to choose a target for blacklist import now
*) fixed several bugs
*) to be continued...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 00:47:54 +00:00
orbiter
ef66438662 - more space in error db to store larger error messages
- added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 21:42:12 +00:00
orbiter
674ad2d55b different handling of error cases that occur during loading files with http or ftp:
methods throw exception instead of returning an error string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 21:33:40 +00:00
danielr
538359a0ff simple fix to get DHT working again (maybe something more has to be done ;)
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1578



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 18:55:16 +00:00
lotus
8ba0c9d1e9 cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-10 15:35:05 +00:00
lotus
a94b3be80b - modern ui for Windows installer
- installer has native system language now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-10 15:33:54 +00:00
f1ori
ae80f3e6a5 * extend opensearchdescription to support compare_yacy.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-09 00:23:19 +00:00
f1ori
7e1fe05e3c * added utf8-encoding to many getBytes-calls
* utf8 should work now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-08 20:24:31 +00:00
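For context on the getBytes change above: String.getBytes() without an argument uses the platform default charset, so the resulting bytes can differ between systems. A minimal sketch of the explicit-charset variant, assuming a modern JDK (StandardCharsets was added in Java 7; code from 2008 would have passed the charset name "UTF-8" instead). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing charset-explicit conversion between String and bytes.
final class Utf8BytesSketch {
    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset so that the round trip is lossless.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

An umlaut such as "\u00e4" encodes to two bytes in UTF-8, whereas a single-byte default charset would produce one byte.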
lotus
fad044fb54 update to snippet marker:
- do not display indexed html (solves xss issues)
the single words are analyzed for already marked parts. this is needed to avoid false encoding of the marker (<b>) tags.
- improved speed for existing routine
heavily used regex patterns are precompiled now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-08 10:08:53 +00:00
lotus
16723d0fa6 ask another peer if crawljob loading fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 14:14:34 +00:00
orbiter
1b18d4bcf3 enhancement to crawling and remote crawling:
- for redirector and remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue
- when a request to a remote crawl provider fails, remove the peer from the network so that the url fetcher does not get stuck again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 12:30:55 +00:00
orbiter
3f746be5d4 - consolidation and refactoring of many DHT target - computing methods
- implemented vertical DHT acceptance ("my own DHT") to accept new targets
- added new target computation for global search: addresses vertical targets also
- enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries)
- better performance value computations for PPM selection in network configuration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 10:07:53 +00:00
orbiter
d014b2728a Design-check, Extension and Refactoring of DHT target position computation:
- two different (but mathematically equivalent) computations of the DHT distance have been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long - to - hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-03 00:27:23 +00:00
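To illustrate the move from a 0.0 .. 1.0 double range to a 0 .. Long.MAX_VALUE range, here is a sketch of a distance on a closed ring of long positions. It is an assumption that the distance is measured in one direction around the ring, from one position forward to the other, and the names below are hypothetical rather than taken from the YaCy source.

```java
// Hypothetical sketch: forward distance between two positions on a ring of size 2^63.
final class DhtDistanceSketch {
    // Both arguments are expected to lie in 0 .. Long.MAX_VALUE.
    static long distance(long from, long to) {
        if (to >= from) {
            return to - from;
        }
        // Wrap around the end of the ring; the operand order avoids overflow.
        return (Long.MAX_VALUE - from) + to + 1;
    }
}
```

Integer positions avoid the rounding errors of double arithmetic, which matters when gaps between neighbouring peers become very small.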
orbiter
dd27ce7216 added control logic to ECO tables that deletes ram copies of the tables if they get too large
table copies in ram are now abandoned if less than 20 MB ram is left

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:53:09 +00:00
orbiter
38e6ba5d00 forgot to re-rename commonsPath
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:39:02 +00:00
orbiter
22989d0d8a added property index.storeCommons to switch commons storage on or off
with index.storeCommons=false all currently stored commons are deleted!
Default is now 'true', but in future full releases it will be switched to 'false'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:30:09 +00:00
f1ori
4b4ce75396 * http-server: submit charset from html metatags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5314 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 23:17:51 +00:00
f1ori
69e695bd4b * detect charset for directory index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 22:14:31 +00:00
f1ori
340ecd919d * include non ascii characters in visible characters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5312 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 21:13:57 +00:00
lotus
5cf0cbb47e javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 08:56:58 +00:00
lotus
8d07607d1d update to resource observer:
- returns high/medium/low disk space
- pauses crawling on medium disk space
- disables index receive on low disk space

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-31 11:33:17 +00:00
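The three-level disk space check described above might look roughly like this. The thresholds are parameters because the commit message does not state them, all names are hypothetical, and it is an assumption that crawling stays paused on low disk space as well.

```java
// Hypothetical sketch of a three-level disk space classification.
final class DiskSpaceSketch {
    enum Level { HIGH, MEDIUM, LOW }

    // Thresholds are in bytes; lowThreshold is expected to be below mediumThreshold.
    static Level classify(long freeBytes, long mediumThreshold, long lowThreshold) {
        if (freeBytes < lowThreshold) {
            return Level.LOW;
        }
        if (freeBytes < mediumThreshold) {
            return Level.MEDIUM;
        }
        return Level.HIGH;
    }

    // Crawling is paused from MEDIUM downwards (assumed to include LOW).
    static boolean crawlingAllowed(Level level) {
        return level == Level.HIGH;
    }

    // Index receive is disabled only on LOW.
    static boolean indexReceiveAllowed(Level level) {
        return level != Level.LOW;
    }
}
```

The free space itself could be read with java.io.File.getUsableSpace() on the data directory.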
low012
83967f8c77 *) servlet does not forget chosen blacklist anymore when editing, moving or deleting an entry
*) move or edit will only be performed if new value actually differs from old one

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-30 00:03:14 +00:00
low012
04e41a392f *) fixed bug where RegExes were not deleted and even added to the list a second time when the user tried to edit them
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-29 22:49:44 +00:00
f1ori
d0543a7c39 * fix the debug ant-target
* fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 22:16:56 +00:00
low012
7bac4796d2 *) added servlet which returns all shared blacklists of a peer without information about which part of YaCy (crawler, proxy, ...) each blacklist is activated for (to be used for better online import)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5306 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 17:33:43 +00:00
low012
baae3d91b1 *) fixed warning when compiling listManager
*) fixed display of values of information for which part of YaCy (crawler, proxy, ...) blacklist is activated for
*) replaced regular put() with putXML() in several cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 16:56:19 +00:00
low012
444575e33d *) prevent XSS when importing blacklist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 11:06:38 +00:00
danielr
a4fb76e93c undo r5300 (not fixed as seen after longer run)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 23:20:09 +00:00
low012
a99a629ed4 *) quick fix to prevent comments for blog entries which don't exist (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1554)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 12:04:10 +00:00
low012
00e27e5050 *) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist
*) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this)
*) further minor fixes
*) to be continued...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 00:11:03 +00:00
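A bug of the kind fixed above (writing files outside of the DATA/LIST directory) is commonly prevented by normalizing the requested path and checking that it still lies below the base directory. The sketch below shows that general technique using java.nio.file, which did not exist in 2008; it is not the fix that was actually committed, and the names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of a containment check for user-supplied file names.
final class BlacklistPathSketch {
    // Returns true only if requestedName resolves to a location strictly below listDir.
    static boolean isInsideListDir(Path listDir, String requestedName) {
        Path base = listDir.toAbsolutePath().normalize();
        // normalize() collapses ".." segments, so an escape attempt becomes visible.
        Path candidate = base.resolve(requestedName).normalize();
        return candidate.startsWith(base) && !candidate.equals(base);
    }
}
```

A name such as "../../yacy.conf" is rejected because the normalized path no longer starts with the base directory.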
danielr
0f9c0bd0d5 fix for ConcurrentModificationException at de.anomic.index.indexContainerHeap$heapCacheIterator.next(indexContainerHeap.java:324)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5300 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-24 14:00:41 +00:00
danielr
103ad2a437 some javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-24 13:58:26 +00:00
orbiter
b098522977 some very small advances to index utf-8 (not working yet), inserted also debugging code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 22:04:13 +00:00
orbiter
2f49666908 integrated the character decoding into the parser, removed old code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 20:56:13 +00:00