Commit Graph

75 Commits

Author SHA1 Message Date
orbiter
9340dbb501 fixed all possible problems with nullpointer exception for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
theli
dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542 2006-09-06 14:31:17 +00:00
theli
e3f0136606 *) next step of restructuring for new crawlers
- adding function isSupportedProcotol to plasmaCrawlLoader.java
   - disabling robots.txt check for protocols other than http(s)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:46:17 +00:00
theli
1c8300fcec *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:23:57 +00:00
theli
09b106eb04 *) next step of restructuring for new crawlers
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads 
   - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
   - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
     to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:00:18 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
orbiter
985dcbde7f changed some parameters that may cause better memory usage and more indexing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 23:39:52 +00:00
orbiter
b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 22:21:22 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
orbiter
23dd972608 fixed memory calculation in performanceMemory web page
fixed also maximum cache size computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-20 01:20:34 +00:00
orbiter
1ce3c22761 better memory control:
- added memory monitor for preNURL-db in performanceMemory
- changed default memory assignments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 13:09:04 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
- now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
689bbcf9cd replaced kelondroTree db for NURLs by new kelondroFlexTable
The new database is only created if the old is deleted or does not exist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 23:36:58 +00:00
orbiter
740d49751d * strict type and size check in kelondroRow handling
* adopted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of encoding type
* the bug caused bad database behaviour in new indexCollection data structure.
  because of this bug, all test databases are now already void. A new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
  into a properties file along the database files.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 03:20:44 +00:00
theli
9f298083cd *) adding more urls to the error url
- old error strings where replaced with there corresponding constants   
   See: http://www.yacy-forum.de/viewtopic.php?t=2638

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 15:11:14 +00:00
orbiter
417ed5102e redesign of database iterators:
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, whcih caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many funktions, most important is the DHT selection routine which is now much faster.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
orbiter
7fd90ca7c8 * strict handling of NURL entry element generation, storage and stacking
* more space for EURL reason strings (you must delete the EURL db to use this)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:04:14 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitely
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparisment.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
c36e9fc8d3 full integration of kelondroRow
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-02 12:45:57 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
rramthun
f08e33680c Added Blog-news-symbol as requested.
I think I will change the character distance a little bit later.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 15:42:06 +00:00
theli
ddfe0f0e27 *) don't try to parse referer string if it's null
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-13 15:45:04 +00:00
orbiter
3e31820c3d - corrections to PerformanceMemory display of object cache
- configuration of object cache size in kelondroTree initializer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 09:08:42 +00:00
orbiter
00a5d435e2 - fixed some bugs with domain filter
- added new ranking filter "prefermask": urls that match the filter are ranked better


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-13 23:19:36 +00:00
orbiter
7a650d0023 several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-27 16:45:29 +00:00
orbiter
59d52fb4a9 fixed some problems with crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-26 14:52:01 +00:00
orbiter
63f39ac7b5 added 3 new crawling steering options:
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1f4412a146 adopted isListed to discussed new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
488a0ed580 replaced old keyIterator and rowIterator by buffered iterators
that are synchronized with database access
Main change is done in kelondroTree, other classes are only adoptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 23:43:24 +00:00
orbiter
dba02f399f starting of re-design of kelondroTree iterator
- new access to iterator
- added many IOException handling in other Classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 20:52:43 +00:00
orbiter
f02b426073 made kelondroTree.nodeIterator private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 18:10:48 +00:00
theli
89286478e7 *) removing thread pool eviction for now. Not needed at the moment
See: http://www.yacy-forum.de/viewtopic.php?p=18290#18290

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 09:09:36 +00:00
theli
fbbbf5f411 *) remote trigger for proxy-crawl
- remote crawling can now be enabled for the proxy crawling profile
   See: http://www.yacy-forum.de/viewtopic.php?p=17753#17753
   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-24 09:35:54 +00:00
theli
2da18ab359 *) correcting logging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:56:17 +00:00
theli
8ffc6e35ad *) correcting logging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:28:54 +00:00
theli
ab7a911bb3 *) Trying to solve pool not open problem
See: http://www.yacy-forum.de/viewtopic.php?t=1798

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-29 08:54:19 +00:00
hydrox
d665f3c39c *) fixed Threadnames for stackCrawl-Threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 19:06:21 +00:00
theli
3d5347bc8e *) changing loglevel for some messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 17:40:24 +00:00
theli
b9c9eaeb44 *) next try todo a bugfix :-((
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 17:05:47 +00:00
theli
4b4b93c413 *) next try todo a bugfix :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:55:05 +00:00
theli
d9fbad71b9 *) next try todo a bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:38:25 +00:00
theli
6da97bd2e4 *) next bugfix for threadpool problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:31:40 +00:00
theli
bea2b9edee *) further redesign of threadpools to solve too many thread problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:18:07 +00:00
theli
784fd50437 *) more verbose thread names
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1471 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 15:26:47 +00:00
theli
56e4dbeb71 *) displaying current active + current idle threads in PerformanceQueues_p.html now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 15:17:04 +00:00
theli
859c6a88f5 *) testing various thread pool eviction settings to avoid outOfMemory - Thread creation problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-27 16:51:29 +00:00