Commit Graph

1066 Commits

Author SHA1 Message Date
orbiter
b79e06615d - added new LURL.Entry class for next database migration
- refactoring of affected classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 22:25:07 +00:00
daburna
c97984bbac -corrected link and updated language file for simpleheader.template
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 18:41:46 +00:00
karlchenofhell
b14a500b88 - removed debug output from PerformanceMemory_p
- added URL escaping (tested, nevertheless watch out for possibly broken URLs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 14:51:37 +00:00
karlchenofhell
ebf0da2a45 - now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 12:07:17 +00:00
karlchenofhell
98a84ddb12 - reverted last change partly, can't handle the template system
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-17 22:22:36 +00:00
karlchenofhell
b5e40e2fa2 - fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-17 21:01:35 +00:00
daburna
6d1db21d0b -updated German language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-17 11:08:06 +00:00
orbiter
77a59a115d refactoring of indexing methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 15:04:16 +00:00
orbiter
688cbfb776 - bugfixing for flextable bug
- bugfixing for collection index bug
- several other bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 00:27:25 +00:00
allo
74f09a0510 some more xml-backend files.
ConfigAdvanced_p.java: list settings after changing.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-15 21:41:47 +00:00
allo
a29b4d4fb5 extended Supertemplates for Headerincludes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-15 13:43:46 +00:00
theli
3bebe72544 *) Default Reg.Exp. changed back to .*.*
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-15 08:37:19 +00:00
daburna
ea9411f9d2 -surftips now working correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-14 14:13:12 +00:00
theli
5b114249ce *) Bugfix for ViewLog problem with multiline logging messages
See: http://www.yacy-forum.de/viewtopic.php?t=2972

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-14 13:21:07 +00:00
daburna
a1736675ca -ups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2769 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-14 11:19:43 +00:00
daburna
2de939f544 -updated translation
-removed wrong spelling; there is only 1 p in the English tip. I think the surftipps.java has to be updated.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-14 11:17:14 +00:00
auron_x
c628df43a4 *) removed unused image-file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 09:23:55 +00:00
orbiter
50f2578c55 - some bugfixing and code cleanup
- now assortments can be completely left out if they do not exist
  before startup and collection index is selected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 01:19:26 +00:00
orbiter
a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables for each month
this solves several problems:
- the biggest table in YaCy is split into different parts and can
  also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
  deleted
- The complete database can be limited to a specific size (as wanted many times)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 23:14:41 +00:00
auron_x
e126598a0f *) small enhancement to webinterface, progressbars are now not stretched images, but <div>'s with colored background
-> all skin files were set to use green progressbars (should be changed to colors fitting the skins' appearance)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 17:10:28 +00:00
rramthun
581dd2ec72 *)Proper arrow-function on Network.html, but ordering is still broken. Perhaps someone could fix that?
*)Removed double creation of DATA directory. New warning message in case of insufficient rights.
*) Removed roland-ramthun.de-seedlist temporarily, because of server changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-11 18:27:38 +00:00
orbiter
918b59dc5e - bugfix for snippet profile (no delete button)
- bugfix for search process (avoided null pointer exception in case other peer does not respond)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 20:16:20 +00:00
orbiter
2bb529cedb added peer tags for peers in robinson mode
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 20:09:26 +00:00
low012
f7447894f1 *) fixed link to WatchCrawler_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2740 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 12:39:29 +00:00
orbiter
afbb547f3d extended options for abstracts generation in remote search interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 12:22:16 +00:00
allo
3730ec3440 moving to a _p page.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 10:31:21 +00:00
orbiter
c8f3a7d363 added snippet-url re-indexing
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 23:07:10 +00:00
orbiter
2e4aa6a170 refactoring of Advanced Config:
- removed settings that are in Basic Settings
- joined pages that belong together
- moved include pages from yacy/ to /

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 10:24:54 +00:00
orbiter
0f10bdde22 more generic cache methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 02:13:13 +00:00
hermens
440c6ee657 Implement alternative htcache layout
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 17:25:19 +00:00
allo
226f2c5b2c first version of the Servlet Debugger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 14:25:54 +00:00
allo
e25172853a fixed license notice
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:25:05 +00:00
allo
1d0c0edda3 first version of posts/get from the del.icio.us api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:16:09 +00:00
orbiter
1969522dc1 removed lowercase of snippets (and other things):
- added new sentence parser to condenser
- sentence parsing can now handle charsets

to do: charsets must be handed over to new sentence parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 00:06:09 +00:00
low012
07155ef3b0 *) added a few constraints to prevent exceptions when clicking on stop or pause on IndexCleaner_p.html when no thread is started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-05 21:32:07 +00:00
orbiter
db294687ea enhanced logging
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 22:55:59 +00:00
theli
f17ce28b6d *) plasmaHTCache:
   - method loadResourceContent defined as deprecated.
     Please do not use this function to avoid OutOfMemory Exceptions 
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from stream the content length must be passed to the parser function now.
     this length value is needed by the parsers to decide if the parsed resource content is too large
     to hold it in memory and must be stored to file 
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
orbiter
bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 13:35:38 +00:00
orbiter
5a40ea7866 refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:59:20 +00:00
orbiter
dbc2e039bb added time-out option parameter to call hierarchy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:40:18 +00:00
orbiter
b59d4576af increased version number to emphasise that the snippet fix
_dramatically_ increased search speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:50:57 +00:00
orbiter
d4c239e4be - fixed problem in collection index with deletion of single url references
- added automatic deletion of not-found snippets after search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:40:52 +00:00
orbiter
00746ca232 identified and fixed search performance problem caused by
snippet loading. Some accesses to header-db had been made twice or
even more times in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:15:02 +00:00
orbiter
4d9e1b43dd surftipps appearance update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 00:13:59 +00:00
orbiter
310f1c41cd added option to see ranking scores in surftipps
and some cleanups

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:28:03 +00:00
orbiter
7c0e6de366 bugfix for surftipps votes (wrong page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:06:38 +00:00
orbiter
3ad0709b53 added a delete button to crawl profile list.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 22:35:59 +00:00
theli
a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 10:09:01 +00:00
theli
cd5f349666 *) Better handling of large files during parsing
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an inputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
   Attention: the caller of this function has to ensure that enough memory is available to do this 
   to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the soaphandler is not installed
*) pdfParser.java: 
   - better handling of documents with exotic charsets
   - better handling of large documents
   - better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
   to this object as byte array or temp file   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 09:31:53 +00:00
theli
8b2ceddb91 *) Displaying severe and warning logging messages in different colors on ViewLog_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 08:12:22 +00:00