Commit Graph

2597 Commits

Author SHA1 Message Date
allo
226f2c5b2c first version of the Serverlet Debugger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 14:25:54 +00:00
orbiter
adf1f74ab2 bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 10:35:35 +00:00
orbiter
fd61209797 lines inside tags without punctuation are extended by a single dot.
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 01:24:00 +00:00
allo
e25172853a fixed license notice
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:25:05 +00:00
allo
1d0c0edda3 first version of posts/get from the del.icio.us api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:16:09 +00:00
orbiter
1969522dc1 removed lowercase of snippets (and other things):
- added new sentence parser to condenser
- sentence parsing can now handle charsets

to do: charsets must be handed over to new sentence parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 00:06:09 +00:00
orbiter
43614f1b36 bugfix in collection index. The index for collections was not created correctly.
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies to you only if you are using the new collection index.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-05 23:47:08 +00:00
low012
07155ef3b0 *) added a few constraints to prevent exceptions when clicking on stop or pause on IndexCleaner_p.html when no thread is started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-05 21:32:07 +00:00
orbiter
1dfab1abe3 more control for seed receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-04 08:55:01 +00:00
theli
1c0e65f55f *) Bugfix for problems with charset detection
See: http://www.yacy-forum.de/viewtopic.php?p=26196

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-04 04:54:21 +00:00
orbiter
db294687ea enhanced logging
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 22:55:59 +00:00
borg-0300
08aa9d4c07 duplicate removes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 19:55:28 +00:00
theli
a9a0f51303 *) suppressing InterruptedException errormessage
See: http://www.yacy-forum.de/viewtopic.php?t=2915

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 15:40:18 +00:00
theli
ce7ee74316 *) better errorhandling in filehandler (try catch block now starts before argument parsing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 14:21:46 +00:00
theli
1d4fb680ce *) CrawlWorker.java: only keep content in memory if its size is less than or equal to 5MB
TODO: make this limit configurable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 12:16:25 +00:00
theli
1586d57187 *) odtParser: better handling of large files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 12:00:26 +00:00
theli
f17ce28b6d *) plasmaHTCache:
   - method loadResourceContent defined as deprecated.
     Please do not use this function to avoid OutOfMemory Exceptions
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from a stream, the content length must now be passed to the parser function.
     This length value is needed by the parsers to decide whether the parsed resource content is too large
     to hold in memory and must be stored to a file
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
orbiter
630a955674 read snippets from cache in case they are not provided in RAM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 17:18:24 +00:00
allo
b114def2f8 duplicate classpath entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 16:09:54 +00:00
allo
2ab09e71a7 removing absolute Classpaths
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 16:07:52 +00:00
allo
a723c2809d -t(aillog) option, to start monitoring the log after startup. So you see the log, but can stop viewing it with ctrl+c, without stopping yacy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 16:00:04 +00:00
allo
fda7031991 further cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 15:52:20 +00:00
allo
f0ed7f43c4 more sh (i.e. /bin/dash instead of /bin/bash as sh) compatibility
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 15:29:52 +00:00
orbiter
bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 13:35:38 +00:00
orbiter
c40fca08a2 fixed bad handling of string separation
you can now use a new encoding attribute to create strings from byte arrays

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 10:21:14 +00:00
orbiter
5a40ea7866 refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:59:20 +00:00
orbiter
dbc2e039bb added time-out option parameter to call hierarchy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:40:18 +00:00
orbiter
b59d4576af increased version number to emphasise that the snippet fix
_dramatically_ increased search speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:50:57 +00:00
orbiter
d4c239e4be - fixed problem in collection index with deletion of single url references
- added automatic deletion of not-found snippets after search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:40:52 +00:00
orbiter
00746ca232 identified and fixed search performance problem caused by
snippet loading. Some accesses to the header-db were made twice or even
more often in some cases. Snippet resource loading fixed.
Furthermore, snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
the new flag 'includesnippet=true'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:15:02 +00:00
orbiter
4d9e1b43dd surftipps appearance update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 00:13:59 +00:00
orbiter
b033a80750 better control of failure in node seek of kelondroTree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 00:13:19 +00:00
rramthun
ca8ef0ca9f *)Documented the lng-file format
*)Updated language files to the new standard, especially German
*)Wrote language highlighting definition for Notepad++
*)Corrected News.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-01 12:23:35 +00:00
orbiter
310f1c41cd added option to see ranking scores in surftipps
and some cleanups

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:28:03 +00:00
orbiter
7c0e6de366 bugfix for surftipps votes (wrong page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:06:38 +00:00
orbiter
3ad0709b53 added a delete button to crawl profile list.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 22:35:59 +00:00
allo
971bfc6f15 added ChangeLog based on Roland's Newsletter.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 11:07:32 +00:00
theli
a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 10:09:01 +00:00
theli
cd5f349666 *) Better handling of large files during parsing
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
   Attention: the caller of this function has to ensure that enough memory is available to do this 
   to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the SOAP handler is not installed
*) pdfParser.java: 
   - better handling of documents with exotic charsets
   - better handling of large documents
   - better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
   to this object as byte array or temp file   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 09:31:53 +00:00
theli
8b2ceddb91 *) Displaying severe and warning logging messages in different colors on ViewLog_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 08:12:22 +00:00
low012
f8ac694e51 *) fixed a bug where search words in snippets were not displayed in bold when directly in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 00:27:42 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
allo
b78d171b1e Windows installer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 21:13:56 +00:00
theli
c665f6cddb *) handling of quotes in charset string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 06:29:15 +00:00
theli
b73efd5565 *) missing changes needed because of last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 05:48:28 +00:00
theli
65c1f13d11 *) migration to newer odt parser lib
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 04:47:39 +00:00
theli
140ddba93f *) adding soap functions to pause and resume the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-27 05:22:43 +00:00
theli
ed8227d222 *) Bugfix for NullPointerException in IndexCreateIndexingQueue_p.java
See: http://www.yacy-forum.de/viewtopic.php?p=25874

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-27 04:35:02 +00:00
theli
c0f7a4124c *) Bugfix for soap templates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-27 04:24:32 +00:00
orbiter
2463e5624a 'quick' release 0.47
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 23:41:54 +00:00