Commit Graph

300 Commits

Author SHA1 Message Date
orbiter
a19541e563 code-enhancements after analysis with AppPerfect
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-20 16:36:31 +00:00
orbiter
85075269a6 extended fail-safe memory management; prevents excessive allocation and overly frequent GC, and should help with the 100%-CPU bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-20 00:46:23 +00:00
orbiter
e3c92818db avoiding OutOfMemoryError routines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-19 13:37:17 +00:00
orbiter
3e8ee5a46d enhanced caching in kelondroRecords and added better synchronization/finalizer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-19 05:27:42 +00:00
orbiter
5d06ded005 enhanced html parser speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-17 01:26:51 +00:00
orbiter
5a490aa065 fixed html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 21:49:56 +00:00
orbiter
a25b5b4986 fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 18:31:28 +00:00
theli
9e47ba5ad6 *) adding missing calls to close() to avoid the "too many open files" bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@282 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 08:34:52 +00:00
orbiter
a1ffc27041 preparations for image/movie/music indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 00:31:13 +00:00
orbiter
a5b40923b6 added word migration to assortments (start with 'java -classpath classes yacy -migratewords')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-15 01:22:07 +00:00
theli
ee9e110366 *) removing old logging configuration properties from yacy.init
*) serverLog.java logging functions now also accept exceptions as
   additional parameters.
   The stack trace of these exceptions will then be appended to the
   logging message and can e.g. be viewed on the GUI logging page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-10 09:19:24 +00:00
theli
c1a4e0dc28 *) changing reference to logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:44:55 +00:00
orbiter
4574fa4ce7 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@224 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 15:28:29 +00:00
orbiter
33f9315e58 implemented multithreading of indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 13:19:05 +00:00
orbiter
ca3b4ccaf4 added snippet-routines (not yet finished)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 00:52:24 +00:00
orbiter
594c591223 changes towards 0.38
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@208 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-03 02:43:35 +00:00
orbiter
d8fdc2526e added experimental snippet generation (to be disabled for 0.38)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@206 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-02 01:33:10 +00:00
orbiter
3771b10b89 implemented automated migration indexCache 0.37 -> indexAssortmentCluster
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-01 14:24:25 +00:00
orbiter
e89ded9e41 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-31 22:12:43 +00:00
orbiter
3d8a2ff937 enhanced parallelization of local/global/remote crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-29 11:56:40 +00:00
orbiter
21110dcd5e fixed bugs with open files and caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-25 13:48:48 +00:00
theli
74eb21f62e *) adding image tag into rss template
*) adding an XSLT stylesheet so that the RSS document can be viewed in a normal web browser
*) adding pubDate tag to each search item

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-25 08:47:34 +00:00
orbiter
5c6147a54c introduced assortment structure (generalization of singletons)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@139 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-18 21:52:17 +00:00
theli
73e297f30f *) adding proper default values for RealtimeParsableMimeTypes if something goes wrong with the configuration file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-18 07:40:48 +00:00
theli
361f05978d Multiple updates regarding the yacy seedUpload facility,
optional content parsers, thread pool configuration ...

Please help me test whether everything works correctly.

*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be easily introduced because of a new modular uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be 
  included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
  available, the uploader is deactivated automatically.

*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define their libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
  this is needed by the bzip/gzip Parser.java
- adding parser for rtf files
- removing extra configuration file yacy.parser
  the list of enabled parsers is now stored in the main config file

*) Adding configuration option in the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool

*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631

*) Replacing some hardcoded strings with the proper constants of the httpHeader class

*) Adding new libs to libx directory. These libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-17 08:25:04 +00:00
theli
ddc5675781 *) Correcting typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-14 11:14:34 +00:00
theli
d2c4e9a55e *) Implementing yacy forum wishlist item: "Pause Crawling"
see: http://www.yacy-forum.de/viewtopic.php?t=48

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-14 09:41:05 +00:00
orbiter
b4030e5023 implemented serverSwitchActions - action-hooks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-11 14:58:03 +00:00
orbiter
1d7fed87dc redesign of index caching - removed indexCache.db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@86 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-07 21:11:18 +00:00
rramthun
3f85978519 Fixed one spelling mistake, limited input for ICQ numbers to 9 digits and made ICQ number in peer profiles clickable.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@85 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-07 21:07:43 +00:00
theli
2aa5fe8f50 *) Import statements reorganized
Now it's easier to determine which class really uses which other class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@82 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-05 05:32:19 +00:00
orbiter
48650c082c fixed 100%-CPU-Bug in plasmaCondenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@72 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-29 12:07:13 +00:00
orbiter
995673d795 several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@71 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-28 22:04:57 +00:00
orbiter
2de90020ed fixed caching+synchronization+brute-force-denial
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@67 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-27 21:09:40 +00:00
orbiter
9156fd53bc fixed bugs in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@65 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 15:47:33 +00:00
orbiter
e25f2354c2 removed synchronization and thread blockings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@63 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 14:19:44 +00:00
theli
58a65b60bd *) synchronized keyword removed from function processLocalCrawling to avoid deadlocks.
This synchronized keyword is not needed anymore because of the crawler jobqueue which
   is responsible for the synchronization now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@60 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 06:59:36 +00:00
theli
65fc650109 *) plasmaCrawlLoader shutdown problem fixed (hopefully)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@59 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 16:34:16 +00:00
orbiter
ba16da72b4 fixed not-working kelondroRecords-Cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@56 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 14:46:59 +00:00
orbiter
7fb645b0ab enhanced crawling performance, changed memory settings, new performance options
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@51 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 23:15:40 +00:00
theli
58b1a0ba40 *) adding a new package for extra content parsers
*) adding content parser for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding an interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
*) Semaphore class was moved and renamed to serverSemaphore
*) Changing yacy shutdown behaviour
The busy-waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:24:53 +00:00
orbiter
8b31f9e202 enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@44 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-23 13:00:56 +00:00
orbiter
00f223cfc1 fixed post-parsing (a case when the bluelist is empty)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@41 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-21 17:13:43 +00:00
orbiter
97ec8d65e4 fixed makerelease & clean-up of dead code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@33 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-19 14:04:16 +00:00
orbiter
b9203bdb50 bug fixes and code cleaning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@22 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-15 14:18:14 +00:00
orbiter
c0807abd33 new crawl/proxy/cache design + fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@18 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 23:00:20 +00:00
orbiter
e7d055b98e very experimental integration of the new generic parser and optional disabling of bluelist filtering in proxy. Does not yet work properly. To disable the disable-feature, the presence of a non-empty bluelist is necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@17 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 15:52:00 +00:00
orbiter
a87a17a3c8 prepared generic text parser environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@15 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-12 22:57:54 +00:00
orbiter
89eb9a2292 fixed bug with crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@12 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-10 23:51:42 +00:00
orbiter
248077d3f0 initial load with yacy 0.36
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-07 19:19:42 +00:00