Commit Graph

80 Commits

Author SHA1 Message Date
theli
2b3f964037 *) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 08:42:55 +00:00
theli
b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
   make many methods for httpHeader/Requestline parsing
   reusable for new icap implementation
*) adding chunked input stream support
   - needed by new icap implementation
   - needed by future httpc HTTP/1.1 support 
*) httpd.java
   - moving all connection property contants to class httpHeader
   - moving readHeader function to class httpHeader
   - moving parseQuery function to class httpHeader
   - moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
   - adding new fuction to parse the http response line
   - adding new function to converte http headers to a string that
     can be send to the client
   - adding a function that generates a proper url using all parsed
     connection properties
*) ICAP Support
   - yacy now supports handling of icap response modification requests
   - this feature can be used by other icap enabled proxies to contact 
     yacy as icap server, and to handover the downloaded content to yacy.logging
     for indexing
   - functionality was successfully tested with squid 2.5Stable 10 + icap patch
   - further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
   - htcache entries that are still needed for indexing are now properly registered 
     as in use after system restart
   - extended logging: log message now shows parsing and indexing time for each sb. entry
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 21:49:47 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
rramthun
4cb382decb Adding changes by borg-0300 from http://www.yacy-forum.de/viewtopic.php?t=997
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-20 17:05:01 +00:00
orbiter
ba0a486328 moved printStackTrace() to logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 23:35:18 +00:00
theli
470839a16a *) Crawler/Session pool settings will now be stored properly into configfile
Bugfix for:
- http://www.yacy-forum.de/viewtopic.php?t=502
- http://www.yacy-forum.de/viewtopic.php?t=778

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 12:20:03 +00:00
orbiter
858cd94299 replaced indexing ram-queue by file-based stack-queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-06 14:48:41 +00:00
orbiter
712fe9ef18 bugfixed utf-8 decoding and parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-29 22:55:37 +00:00
theli
6697d5e52e *) correcting fkt. mediaExtContains
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 06:44:31 +00:00
theli
aae9a433a6 *) correcting usage of supportedFileExt-List
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 07:43:59 +00:00
orbiter
1e7f062350 many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 02:07:45 +00:00
orbiter
a25b5b4986 fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 18:31:28 +00:00
theli
9e47ba5ad6 *) adding missing calls for function close() to avoid "too many open file" bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@282 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 08:34:52 +00:00
theli
9a98988c3c *) Bugfix for SSL/NIO Bug
See: http://www.yacy-forum.de/viewtopic.php?t=516
   - removing NIO from server/serverCore.java because of massive problems
     with socket close issues
*) Adding support for remote port forwarding via sch
   @Orbiter: Please take a look into
   - hello.java
   - server/serverCore.java.publicIP()
   - yacy/yacyClient.java.publishMySeed(...)
*) Making startup loading of additional content parsers more failsafe


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 07:28:07 +00:00
theli
890e3f4d4a *) adding missing calls for function close() to avoid "too many open file" bug*) adding
*) bugfix in plasma/plasmaParser.java:
   - parsers with missing dependencies wehre not ignored correctly
*) passing a logger instance to the parsers modules which can be used 
   for logging purposes by the parsers (not done yet)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-13 13:49:17 +00:00
theli
1b5ae054f8 *) changing reference to logger
*) parser will not be returned into pool if the parser was deactivated
   via gui

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@250 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:38:00 +00:00
theli
0484c41a84 *) replacing system.xxx.println with logging statements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-23 09:08:36 +00:00
theli
893a662329 *) Adding missing cast statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-17 08:40:27 +00:00
theli
361f05978d Multiple updates regarding the yacy seedUpload facility,
optional content parsers, thread pool configuration ...

Please help me testing if everything works correct.

*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be easily introduced because of a new modulare uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be 
  included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
  available, the uploader is deactivated automatically.

*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define there libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
  this is needed by the bzip/gzip Parser.java
- adding parser for rtf files
- removing extra configuration file yacy.parser
  the list of enabled parsers is now stored in the main config file

*) Adding configuration option in the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool

*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631

*) Replacing some hardcoded strings with the proper constants of the httpHeader class

*) Adding new libs to libx directory. This libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-17 08:25:04 +00:00
theli
2aa5fe8f50 *) Import statements reorganized
Now it's easier to determine which class really uses which other class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@82 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-05 05:32:19 +00:00
theli
351c86d5d9 *) Migration of optional Content Parser integration
- each additional parser must be in a subpackage 
  of plasma.parser
- each parser must have its own ant build file (which will 
  be called automatically from the main build file)
- Calling the main build file results in building a separate 
  zip file for each optional parser. This zip file includes:
  + sources of the Parser.java
  + compiled classes of the Parser.java
  + needed additional libs (libx)
- To install an additional parser the user simply needs to
  extract the zip file listed above into his/her yacy directory.
- The configuration (enabling/disabling) of a parser can be done
  via the webinterface (currently the settings dialoge) and is
  done "on-the-fly". The installation can not be done "on-the-fly"
  at the moment because of classpath issues.
- The classpath of the linux startup/stop scripts is generated 
  automatically now (including all libraries from lib and libx).

*) Bugfix: File Extension was not calculated correctly by the crawler
   e.g.: file extension was accidentally: .php?param=value
   Corrected.

*) Adding additional parser for parsing of rss/atom feeds
- added needed libs to do this.

TODO:
- automatic building classpath for windows startup scripts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@78 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-03 09:47:56 +00:00
orbiter
48650c082c fixed 100%-CPU-Bug in plasmaCondenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@72 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-29 12:07:13 +00:00
orbiter
995673d795 several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@71 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-28 22:04:57 +00:00
theli
f44b219e44 *) Eclipse has accidentally copied in the wrong file header into the new files (because these headers were accidentally set as default for the whole workspace instead of the project)
Fixed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@48 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:47:34 +00:00
theli
58b1a0ba40 *) adding an new package for extra content parsers
*) adding content parser for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding a Interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
*) Sempahore class was moved and renamed to serverSemaphore
*) Changing yacy shutdown behaviour
Buzy waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:24:53 +00:00
orbiter
8b31f9e202 enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@44 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-23 13:00:56 +00:00
orbiter
00f223cfc1 fixed post-parsing (a case when the bluelist is empty)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@41 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-21 17:13:43 +00:00
orbiter
e7d055b98e very experimental integration of the new generic parser and optional disabling of bluelist filtering in proxy. Does not yet work properly. To disable the disable-feature, the presence of a non-empty bluelist is necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@17 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 15:52:00 +00:00
orbiter
a87a17a3c8 prepared generic text parser environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@15 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-12 22:57:54 +00:00