
2311 Commits

Author SHA1 Message Date
orbiter
80b6c90d54 enhancements to prevent blocking while receiving DHT transfers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:49:39 +00:00
auron_x
4fb8fddd99 *) made the domain list of the blacklist sorted
If a new domain is added, it is still appended to the end of the list and only sorted in on the next refresh; this may need a fix.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 17:34:27 +00:00
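The commit above notes that a newly added domain is still appended to the end of the list until the next refresh. One possible fix, sketched here under the assumption that the domain list is a plain `java.util.List<String>`, is to insert each new domain at its sorted position using `Collections.binarySearch`. `SortedDomainList` and its methods are hypothetical names, not YaCy classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: keeps a blacklist domain list sorted on every insert,
// so no refresh is needed to restore the order.
class SortedDomainList {
    private final List<String> domains = new ArrayList<>();

    /** Inserts the domain at its sorted position; returns false if it is already listed. */
    boolean add(String domain) {
        int pos = Collections.binarySearch(domains, domain);
        if (pos >= 0) {
            return false; // already present, nothing to do
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent
        domains.add(-pos - 1, domain);
        return true;
    }

    List<String> asList() {
        return Collections.unmodifiableList(domains);
    }

    public static void main(String[] args) {
        SortedDomainList list = new SortedDomainList();
        list.add("example.org");
        list.add("abc.net");
        list.add("zzz.com");
        list.add("abc.net"); // duplicate is rejected
        System.out.println(list.asList()); // prints [abc.net, example.org, zzz.com]
    }
}
```

Each insert costs one binary search plus an `ArrayList` shift, which should be acceptable for lists of blacklist size; a `TreeSet` would be an alternative if index-based access is not needed.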
theli
9f298083cd *) adding more urls to the error url
- old error strings were replaced with their corresponding constants
   See: http://www.yacy-forum.de/viewtopic.php?t=2638

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 15:11:14 +00:00
hermens
d56f06401e - Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
- Small logging updates

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:42:00 +00:00
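A minimal sketch of the caching idea in the commit above, assuming the loaded-URL database can be reduced to a hypothetical `UrlDatabase` interface whose `exists(urlHash)` call may block: URL hashes that were found once are remembered in a concurrent set, so repeated lookups during a receive skip the database. `KnownUrlCache` and `UrlDatabase` are illustrative names only, not the YaCy API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: skip the potentially blocking database check
// for URL hashes that are already known to exist.
class KnownUrlCache {
    /** Stand-in for the loaded-URL database (assumption, not the YaCy API). */
    interface UrlDatabase {
        boolean exists(String urlHash);
    }

    private final UrlDatabase db;
    private final Set<String> known = ConcurrentHashMap.newKeySet();
    private final AtomicInteger dbLookups = new AtomicInteger();

    KnownUrlCache(UrlDatabase db) {
        this.db = db;
    }

    /** Answers from the cache when possible; consults the database only on a miss. */
    boolean exists(String urlHash) {
        if (known.contains(urlHash)) {
            return true;
        }
        dbLookups.incrementAndGet();
        boolean found = db.exists(urlHash);
        if (found) {
            known.add(urlHash); // cache positive answers only
        }
        return found;
    }

    int dbLookups() {
        return dbLookups.get();
    }

    public static void main(String[] args) {
        Set<String> stored = new HashSet<>();
        stored.add("hashA");
        stored.add("hashB");
        KnownUrlCache cache = new KnownUrlCache(stored::contains);
        for (String h : new String[] {"hashA", "hashA", "hashC", "hashB", "hashA"}) {
            cache.exists(h);
        }
        System.out.println("database lookups: " + cache.dbLookups()); // prints 3
    }
}
```

Only positive answers are cached here, because a negative answer may become stale as soon as the URL is stored; the commit does not say how the actual change handles this.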
theli
c09f734d06 *) offer router configuration on ConfigBasic.html
- checkbox to allow router configuration is shown if
   - a) the UPnP forwarder is installed
   - b) a UPnP enabled router was found
   - c) no other forwarder was configured
   See: http://www.yacy-forum.de/viewtopic.php?p=24264

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:31:18 +00:00
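Reading the three conditions in the commit above as a conjunction, the visibility rule for the checkbox reduces to a single boolean check. The class and parameter names are hypothetical; the commit does not say how each condition is detected.

```java
// Hypothetical sketch: visibility rule for the router-configuration checkbox
class RouterConfigOption {
    /** True only if all three conditions listed in the commit message hold. */
    static boolean showRouterCheckbox(boolean upnpForwarderInstalled,
                                      boolean upnpRouterFound,
                                      boolean otherForwarderConfigured) {
        return upnpForwarderInstalled && upnpRouterFound && !otherForwarderConfigured;
    }

    public static void main(String[] args) {
        System.out.println(showRouterCheckbox(true, true, false));  // prints true
        System.out.println(showRouterCheckbox(true, false, false)); // prints false
    }
}
```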
hermens
dcbb4d0a6b Display the size of HashBlacklistedCache on PerformanceMemory page.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:19:54 +00:00
theli
c7b6389ca1 *) renaming indexDistribution.dhtReceiptLimitEnabled property to indexDistribution.transferRWIReceiptLimitEnabled
so that the default value is adopted by all peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:01:01 +00:00
theli
0baadcadca *) enable the indexDistribution.dhtReceiptLimitEnabled limit by default
See: http://www.yacy-forum.de/viewtopic.php?p=24425

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 10:51:24 +00:00
orbiter
d799622da1 better flush limit for index collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:44:43 +00:00
orbiter
d468d665c9 some changes that may help to prevent deadlocks that cause an OutOfMemoryError
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:19:01 +00:00
theli
988341cf81 *) some comments added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-06 14:44:04 +00:00
theli
d54767f634 *) last step of removing embedded html from dir class
- migration finished
*) dir list now sorts the dirlist entries. 
   - directories are listed before files
   - files are sorted alphabetically, case insensitive 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-06 14:38:07 +00:00
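The ordering described in the commit above (directories before files, files alphabetical and case-insensitive) maps onto a chained `Comparator`. This sketch assumes directories are ordered the same way as files, which the commit does not state, and uses a hypothetical `Entry` class instead of the real listing data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DirListOrder {
    /** Minimal stand-in for a directory-listing entry (hypothetical). */
    static final class Entry {
        final String name;
        final boolean isDirectory;

        Entry(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }

        @Override
        public String toString() {
            return isDirectory ? name + "/" : name;
        }
    }

    // false sorts before true, so the key "!isDirectory" puts directories first;
    // ties are broken by a case-insensitive comparison of the names.
    static final Comparator<Entry> ORDER =
        Comparator.comparing((Entry e) -> !e.isDirectory)
                  .thenComparing(e -> e.name, String.CASE_INSENSITIVE_ORDER);

    static List<Entry> sorted(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Entry> listing = Arrays.asList(
            new Entry("readme.txt", false), new Entry("Zeta", true),
            new Entry("Index.html", false), new Entry("docs", true));
        System.out.println(sorted(listing)); // prints [docs/, Zeta/, Index.html, readme.txt]
    }
}
```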
rramthun
96b774e427 Adding link to newsletters as agreed in forum.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-06 10:22:53 +00:00
theli
8283df2d77 *) first step of removing embedded html from dir class
- dir list generation uses templates now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-06 08:09:39 +00:00
orbiter
279b1d969d Integrated new indexing data structure 'collections' into the main class
for indexing, the plasmaWordIndex.

The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static field
plasmaWordIndex.useCollectionIndex
to true. This should be done for testing purposes only.

The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for the crawler in the future.

Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become useless.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 22:22:14 +00:00
orbiter
4ff742e42d implemented indexCollectionRI
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 19:18:33 +00:00
allo
132cd7da45 no need to copy dir.*
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 18:39:25 +00:00
allo
0164321160 fix for the actions (uploading/deleting, logging in, ...)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 18:34:31 +00:00
orbiter
01f95eccd3 re-write of kelondroCollectionIndex. This is the data structure that
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-04 23:04:03 +00:00
orbiter
ebc2233092 * implemented (finished) class indexRowSetContainer
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 23:20:03 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step towards the new database structure.
A side effect of this change is that index storage uses much less RAM,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
0b7112f8b2 fix for missing topLevelClone in indexRAMCacheRI.wordContainerIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 00:43:03 +00:00
orbiter
e357599f92 * fixed problem with indexContainer iteration from RAM:
  indexContainers from RAM must be cloned explicitly to prevent
  side effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
  deletion does not use retrieval of all elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-01 10:30:55 +00:00
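The cloning fix in the first item of the commit above can be illustrated with an iterator that hands out copies of cached containers, so that a caller modifying a container cannot alter the cached object. This is a sketch with a hypothetical `TreeMap` of URL-hash sets standing in for the RAM cache, not the actual indexRAMCacheRI code.

```java
import java.util.Iterator;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a RAM index cache: word -> set of URL hashes
class RamCacheCloneDemo {
    private final TreeMap<String, TreeSet<String>> cache = new TreeMap<>();

    void add(String word, String urlHash) {
        cache.computeIfAbsent(word, k -> new TreeSet<>()).add(urlHash);
    }

    int containerSize(String word) {
        TreeSet<String> container = cache.get(word);
        return container == null ? 0 : container.size();
    }

    /** Lazily yields a copy of each cached container, so callers cannot change the cache. */
    Iterator<TreeSet<String>> containerIterator() {
        Iterator<TreeSet<String>> it = cache.values().iterator();
        return new Iterator<TreeSet<String>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public TreeSet<String> next() {
                return new TreeSet<>(it.next()); // explicit clone of the cached container
            }
        };
    }

    public static void main(String[] args) {
        RamCacheCloneDemo ram = new RamCacheCloneDemo();
        ram.add("yacy", "hash1");
        ram.add("yacy", "hash2");
        Iterator<TreeSet<String>> it = ram.containerIterator();
        it.next().clear(); // modifies only the copy
        System.out.println(ram.containerSize("yacy")); // prints 2
    }
}
```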
theli
57fe5cc671 *) code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2338 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-30 06:25:40 +00:00
allo
4e9f02c8ec integration of Michael's string-extraction.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 23:11:15 +00:00
orbiter
8b77afd72c some fixes to new container merger
and some code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 22:40:11 +00:00
allo
d7e8e7da1e more flexible classpath in linuxscript
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 20:21:24 +00:00
allo
4435d916d4 print-only option for use in other scripts.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 18:22:14 +00:00
orbiter
830167596a bugfix for
http://www.yacy-forum.de/viewtopic.php?p=24127#24127

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 18:16:33 +00:00
theli
839806a775 *) serverPortForwardingUpnp.java: code cleanup, license header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 15:32:35 +00:00
theli
523e80445f *) adding libs to eclipse classpath file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2331 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 15:25:44 +00:00
theli
03230cd887 *) removing old port forwarding classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:42:12 +00:00
theli
4fd8449918 *) adding some license files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:37:02 +00:00
theli
6e676224d0 *) adding support for upnp
A new port forwarding method for upnp was added.
   If this method is enabled, YaCy automatically detects a UPnP-capable
   internet gateway and configures the gateway's port forwarding
   settings accordingly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:26:45 +00:00
orbiter
417ed5102e redesign of database iterators:
Iteration over key elements in kelondroTree databases is no longer supported;
it is replaced by iteration over kelondroRow.Entry objects from the database.
Iterating keys was mostly followed by a retrieval of the corresponding row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; the most important is the DHT selection routine, which is now much faster.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
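The cost difference described in the commit above can be illustrated with a `TreeMap` standing in for a kelondroTree table (an assumption for illustration only): iterating keys and then fetching each row performs one extra point lookup per key, while iterating row entries directly performs none. `Table`, `sumViaKeys` and `sumViaRows` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

class RowIterationDemo {
    /** Hypothetical stand-in for a tree-based table that counts point lookups. */
    static final class Table {
        private final TreeMap<String, Long> data = new TreeMap<>();
        int pointLookups = 0;

        void put(String key, long value) { data.put(key, value); }
        Iterable<String> keys() { return data.keySet(); }
        Iterable<Map.Entry<String, Long>> rows() { return data.entrySet(); }

        long get(String key) {
            pointLookups++; // each call is a separate search in the tree
            return data.get(key);
        }
    }

    // Old pattern: iterate keys, then look up each row again.
    static long sumViaKeys(Table t) {
        long sum = 0;
        for (String key : t.keys()) sum += t.get(key);
        return sum;
    }

    // New pattern: iterate complete rows; no second lookup per row.
    static long sumViaRows(Table t) {
        long sum = 0;
        for (Map.Entry<String, Long> row : t.rows()) sum += row.getValue();
        return sum;
    }

    public static void main(String[] args) {
        Table t = new Table();
        for (int i = 0; i < 1000; i++) t.put("key" + i, i);
        long viaKeys = sumViaKeys(t);
        int lookupsAfterKeys = t.pointLookups;
        long viaRows = sumViaRows(t);
        System.out.println(viaKeys + " " + viaRows + ", extra lookups: "
            + lookupsAfterKeys + " vs " + (t.pointLookups - lookupsAfterKeys));
    }
}
```

Both passes compute the same result; only the key-based pass pays for a second tree search per row, which is the load the redesign removes.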
theli
0db237467f *) bugfix for URL generation from file
see: http://www.yacy-forum.de/viewtopic.php?p=24116

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-25 16:18:45 +00:00
orbiter
ad692fc6c7 implemented option to extract nurls from the database
(plus some iteration enhancements for nurls)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:40:59 +00:00
orbiter
7fd90ca7c8 * strict handling of NURL entry element generation, storage and stacking
* more space for EURL reason strings (you must delete the EURL db to use this)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:04:14 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
1ed3e2daef added option to extract domains and/or urls from the eurl database
When extracting from eurl, the html output format is recommended, since
this format also adds the failure reason to the domain/url.
The complete syntax for domain extraction is now
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text | zip | gzip | html } ] [ <path to DATA folder> ]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 08:08:33 +00:00
orbiter
7e0a130fb5 new indexURLEntry class 'indexURLEntryNew', to replace old class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:43:36 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
orbiter
493b1cd2bf better logging for domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:43:56 +00:00
orbiter
e20ff77c10 another bugfix in new url class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:37:22 +00:00
orbiter
685430a1b5 bugfix in new URL class, better logging for domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:33:01 +00:00
orbiter
c57b78722b added some more logging to domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 10:56:40 +00:00
orbiter
79af283f6c better debugging in new URL class for wrong port numbers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 10:21:24 +00:00
orbiter
cc2be7fb43 fix for genurllist in case of bad urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2314 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 10:00:21 +00:00
allo
1b2ea58ee9 wrong substring invocation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-20 13:49:38 +00:00