Commit Graph

14 Commits

Author SHA1 Message Date
theli
1586d57187 *) odtParser: better handling of large files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 12:00:26 +00:00
theli
f17ce28b6d *) plasmaHTCache:
- method loadResourceContent defined as deprecated. 
     Please do not use this function to avoid OutOfMemory Exceptions 
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from stream the content length must be passed to the parser function now.
     this length value is needed by the parsers to decide if the parsed resource content is to large
     to hold it in memory and must be stored to file 
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
theli
b73efd5565 *) missing changes needed because of last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 05:48:28 +00:00
theli
b6c7b91582 *) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
   - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 12:25:07 +00:00
orbiter
3aac5b26da - added automatic tag generation when a web page from the search results is added
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 00:37:02 +00:00
theli
74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 13:18:12 +00:00
theli
d0a5a53789 *) changes needed for multi-language support
- parsers may need to know the charset of the byte stream 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 12:52:46 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparisment.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
theli
45b39ee1be *) solving unpacking problems with to long filename by
a) renaming the parent folder in the tgz file to yacy
      (can be configured via build properties file)
   b) reconfiguring build file to throw an error if a file
      name is too long 
Please note that currently there is _no_ proplem with too long
class names because of step a.

      

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-14 15:18:41 +00:00
orbiter
015d044c25 tried to fix some problems with latest changes to httpc
very experimental!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 16:01:14 +00:00
orbiter
9544c47684 added some UTF-8 handling.
hope this will help somehow.. for shure not THE solution to our UTF-8 problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
theli
bdf30117c1 *) Redesign of parser configuration
- restructuring of mimeTypes based on the parsers
   - displaying parser usage count
   - displaying human readably parser names
   - displaying parser version information

*) httpdFileHandler.java
   - adding possibility to support "streaming" servlets
     which are special servlets that can communicate with
     the client via the connection streams autonomous
   - the name of these new servlet types must end with the 
     file extension .stream
   - this feature will be needed by the yacy ScreenSaver
     class to fetch statistic data from the peer without the
     need to reconnect to the server all the time

*) Adding human readable names and version information for
   all supported parsers

*) plasmaParser.java
   - adding new structure to store parser statistic data

*) Adding openDocument parser
   - can be used to parse odt files

*) jmimemagic
   - adding rules to detect openDocument formats properly

*) serverLog.java
   - adding functions that can be used to query if a given
     logging level is enabled or not.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 07:27:58 +00:00