Commit Graph

117 Commits

Author SHA1 Message Date
orbiter
b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 22:21:22 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
orbiter
23dd972608 fixed memory calculation in performanceMemory web page
fixed also maximum cache size computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-20 01:20:34 +00:00
orbiter
1ce3c22761 better memory control:
- added memory monitor for preNURL-db in performanceMemory
- changed default memory assignments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 13:09:04 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
- now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
689bbcf9cd replaced kelondroTree db for NURLs by new kelondroFlexTable
The new database is only created if the old is deleted or does not exist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 23:36:58 +00:00
orbiter
740d49751d * strict type and size check in kelondroRow handling
* adopted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of encoding type
* the bug caused bad database behaviour in new indexCollection data structure.
  because of this bug, all test databases are now already void. A new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
  into a properties file along the database files.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 03:20:44 +00:00
theli
9f298083cd *) adding more urls to the error url
- old error strings where replaced with there corresponding constants   
   See: http://www.yacy-forum.de/viewtopic.php?t=2638

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 15:11:14 +00:00
orbiter
417ed5102e redesign of database iterators:
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, whcih caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many funktions, most important is the DHT selection routine which is now much faster.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
orbiter
7fd90ca7c8 * strict handling of NURL entry element generation, storage and stacking
* more space for EURL reason strings (you must delete the EURL db to use this)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:04:14 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitely
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparisment.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
c36e9fc8d3 full integration of kelondroRow
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-02 12:45:57 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
rramthun
f08e33680c Added Blog-news-symbol as requested.
I think I will change the character distance a little bit later.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 15:42:06 +00:00
theli
ddfe0f0e27 *) don't try to parse referer string if it's null
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-13 15:45:04 +00:00
orbiter
3e31820c3d - corrections to PerformanceMemory display of object cache
- configuration of object cache size in kelondroTree initializer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 09:08:42 +00:00
orbiter
00a5d435e2 - fixed some bugs with domain filter
- added new ranking filter "prefermask": urls that match the filter are ranked better


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-13 23:19:36 +00:00
orbiter
7a650d0023 several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-27 16:45:29 +00:00
orbiter
59d52fb4a9 fixed some problems with crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-26 14:52:01 +00:00
orbiter
63f39ac7b5 added 3 new crawling steering options:
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1f4412a146 adopted isListed to discussed new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
488a0ed580 replaced old keyIterator and rowIterator by buffered iterators
that are synchronized with database access
Main change is done in kelondroTree, other classes are only adoptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 23:43:24 +00:00
orbiter
dba02f399f starting of re-design of kelondroTree iterator
- new access to iterator
- added many IOException handling in other Classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 20:52:43 +00:00
orbiter
f02b426073 made kelondroTree.nodeIterator private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 18:10:48 +00:00
theli
89286478e7 *) removing thread pool eviction for now. Not needed at the moment
See: http://www.yacy-forum.de/viewtopic.php?p=18290#18290

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 09:09:36 +00:00
theli
fbbbf5f411 *) remote trigger for proxy-crawl
- remote crawling can now be enabled for the proxy crawling profile
   See: http://www.yacy-forum.de/viewtopic.php?p=17753#17753
   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-24 09:35:54 +00:00
theli
2da18ab359 *) correcting logging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:56:17 +00:00
theli
8ffc6e35ad *) correcting logging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:28:54 +00:00
theli
ab7a911bb3 *) Trying to solve pool not open problem
See: http://www.yacy-forum.de/viewtopic.php?t=1798

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-29 08:54:19 +00:00
hydrox
d665f3c39c *) fixed Threadnames for stackCrawl-Threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 19:06:21 +00:00
theli
3d5347bc8e *) changing loglevel for some messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 17:40:24 +00:00
theli
b9c9eaeb44 *) next try todo a bugfix :-((
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 17:05:47 +00:00
theli
4b4b93c413 *) next try todo a bugfix :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:55:05 +00:00
theli
d9fbad71b9 *) next try todo a bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:38:25 +00:00
theli
6da97bd2e4 *) next bugfix for threadpool problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:31:40 +00:00
theli
bea2b9edee *) further redesign of threadpools to solve too many thread problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 16:18:07 +00:00
theli
784fd50437 *) more verbose thread names
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1471 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 15:26:47 +00:00
theli
56e4dbeb71 *) displaying current active + current idle threads in PerformanceQueues_p.html now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-28 15:17:04 +00:00
theli
859c6a88f5 *) testing various thread pool eviction settings to avoid outOfMemory - Thread creation problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-27 16:51:29 +00:00
rramthun
6c02f889f7 Cosmetic changes.
Corrected version numbering as described in http://www.yacy-websuche.de/wiki/index.php/De:Versionsnummern

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-26 15:12:44 +00:00
theli
b191f06d16 *) Adding additional logging message to locate problems with stackcrawl threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-26 14:24:29 +00:00
theli
d9bcd73d93 *) Bugfix for exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-26 12:15:59 +00:00
theli
f5abfe8d57 *) more failsafe threadpools
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-26 09:37:43 +00:00
rramthun
c4487deba9 Minor changes collected over some time.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-12 19:15:31 +00:00
orbiter
9544c47684 added some UTF-8 handling.
hope this will help somehow.. for shure not THE solution to our UTF-8 problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
orbiter
9086261476 refactoring of base64 encoding:
the kelondro database needs specific information about the order of
base64-encoded keys. Since no other package depends on base64
(only the httpd uses base64 for encryption, but does not need to encode these strings)
it is good to move base64 encoding to the new ordering classes in kelondro.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 00:39:00 +00:00
orbiter
0c762daf4b better startup failure handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 23:59:58 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
3d8a5ae652 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 14:24:13 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
low012
90b0eb144e just a typo...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-03 09:58:00 +00:00
theli
b35c5a48bf *) First version of urlRedirector.pl script
- with this script it's possible to pass URLs from squid
     to yacy via the squid redirector interface
   - this URLs are then used by YaCy to feed the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:27:03 +00:00
theli
d0dfccdb77 *) Making CrawlStacker pool configurable via GUI and config file
See: http://www.yacy-forum.de/viewtopic.php?t=1448

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:46:22 +00:00
theli
6f9f8ed8f8 *) Automatic Reset of Stack Crawler DB on startup errors
See: http://www.yacy-forum.de/viewtopic.php?t=1432

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:19:05 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
borg-0300
00ab4d8723 cleaned, small change, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 13:41:51 +00:00
rramthun
27f180f24b Update of YaWoStat to 0.2.
Now does not try to make 400000! operations to load a 4MB textfile :-/

Program is not finished yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1000 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-28 16:27:49 +00:00
rramthun
a98bafb939 Changes to german language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 20:36:45 +00:00
theli
c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
*) Displaying statistic for incoming connections on status page
*) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy
   See: http://www.yacy-forum.de/viewtopic.php?p=6826
*) Bugfix for Referer Bug
   See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098
*) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 08:17:43 +00:00
orbiter
579b22d8ff small update to network drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 23:11:17 +00:00
orbiter
2b5829c3da small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 19:29:25 +00:00
orbiter
4c7918f5b5 added shotdown to crawl stacker (moved from 882)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 16:40:44 +00:00
orbiter
2851658c2a re-integrated Martins last change to crawl stacker from svn 882 that I had deleted accidently
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 16:11:41 +00:00
orbiter
c83594528c integrated crawl stacker into thread control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 15:59:09 +00:00