Commit Graph

96 Commits

Author SHA1 Message Date
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
apfelmaennchen
bc3d3b4c97 fixed rebuildTags() to correctly rebuild folders...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-03 22:36:27 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
7d875290b2 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 22:13:30 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
orbiter
0f5c4abaca more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 10:12:48 +00:00
orbiter
4a80902081 - added ViewProfile as rdf in foaf syntax
- added link to rdf and vCard version on html page
- can be seen on http://localhost:8080/ViewProfile.html?hash=localhash
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 18:21:08 +00:00
apfelmaennchen
b1fae9b5af fixed import Netscape Bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 19:22:36 +00:00
apfelmaennchen
f3a9e9c542 added getFolderList() to bookmarksDB
added cleanTagsString() to bookmarksDB
added getFoldersString() to Bookmark
modified getTagsString() to exclude folderTags

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 20:11:57 +00:00
apfelmaennchen
e81bced2bd reorganized the code and adjusted getTagIterator() to suit folders
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 19:08:32 +00:00
apfelmaennchen
704de4dee8 Neue Funktion angelegt - notwendig für Einschränkung der Tagwolke
public Iterator getTagIterator(String tagName, boolean priv)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 15:58:47 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
fuchsi
1cb6e431a6 Replace the ISO8601 aka W3C datetime parser by one that supports every representation allowed by this standard, see http://www.w3.org/TR/NOTE-datetime
- useful expecially for sitemaps parsing, where this date format is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-19 22:45:58 +00:00
fuchsi
5b0c1449e1 various fixes and cleanups for blacklist handling:
1. avoid adding duplicate file name entries in config properties for lists, 
2. correctly merge all path masks from all list files for the same host masks,
3. rewrite helper methods standard java methods for Collection transformations,
4. merged various methods with identical functionality for different Collection implementations into one,
5. minor refactoring to improve code readability.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 06:20:27 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
40b0547611 - documentaton changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
allo
7a5b22a0b8 Integration of FeedReader in Bookmarks.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 23:27:42 +00:00
allo
65a8a9fc58 fix for nullpointer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-14 16:56:13 +00:00
orbiter
139c59ebbd - fixed dht selction problem: the seed tables used a wrong ordering
- cleaned some code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 17:59:36 +00:00
theli
0b5fc3c28c *) moving date functions to serverDate class
*) Sitemap-parser
   - logging added
   - parsing of modDate added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 12:36:49 +00:00
orbiter
40c14a4f0e - better implementation of search query properties
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 10:14:48 +00:00
allo
f4af360f7c bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-20 15:37:19 +00:00
orbiter
6ad39bae1e fixed shutdown problem
this fixes the 'inconsistency' messages during start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 08:48:47 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of new splittet index, a huge number of classes needed to be adopted for new iterator classes
- added new iterator classes that support cloneable iterators
- adopted all iterator classes to implement cloneable itarators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
1cba31de43 redesigned ram organization for database caches
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of ram allocation is done with two simple limits that are ram availability-relative


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 22:43:32 +00:00
theli
bd03c6b874 *) bugfix in bookmarksDB:
- NullpointerException when trying to get an unknown bookmark
   - bookmarks can either start with http or https

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 11:56:46 +00:00
karlchenofhell
9623bf7bbe - removed call of java 1.5 method
- added config servlet for local robots.txt
- removed YPStats_p as it is of no use anymore
- supertemplates use XHTML now
- quick-fix for http://www.yacy-forum.de/viewtopic.php?p=32296#32296

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3422 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-01 13:54:14 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- added deletion from loadedURLs of URLs to be blacklisted in IndexControl_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
e4910f03d1 tag storage fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 11:52:15 +00:00
orbiter
991182b29b more space for bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 00:20:03 +00:00
orbiter
88fa764b64 implemented new kelondroObjects into bookmarkDB
- Bookmark-Objects are stored inside the kelondroObjects cache
- removed superfluous classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 00:17:55 +00:00
orbiter
9c05e2a820 re-design ob kelondroMap
- this class is replaced by an object that can hold any type of object
- this object must be defined as a class that implements kelondroObjectsEntry
- the kelodroMap is now implemented as kelondroMapObjects

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 23:51:10 +00:00
allo
669c21db05 first version of abstracted kelondroMap Cache.
get returns a kelondroCachedObject(or in most cases a subclass of it),
or a map, which can be used to construct a kelondroCachedObject.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 19:10:55 +00:00
allo
14f2068daf some more bookmark changes towards multiuser bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3291 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-28 17:38:43 +00:00
allo
f40169fcd7 preparing multiuser bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 19:42:50 +00:00
karlchenofhell
3c43e605ba - don't accept malformed bookmarks, fix for: http://www.yacy-forum.de/viewtopic.php?t=3414 (first report)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 23:39:03 +00:00
orbiter
d07b132a0d - fixed colors of network grafic
- added option to activate write cache for seed-db
- did not activate write cache because it did not work

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 19:39:31 +00:00
(no author)
37e53b4a6a replaced tree database structure for seed db by flex data structure
I don't know if this helps, we will find out...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-07 23:34:13 +00:00
orbiter
30888e7a2f implementation of search constraints
Such constraints may formulate specific restrictions to web searches
This is implemented by scraping information for constraints from a web
page during parsing, and storing flags to the pages within the web index.

In this first step, only information for index pages ("index of", directory listings)
are scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-23 02:16:30 +00:00
orbiter
497428c8ec refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-10 01:13:33 +00:00
theli
f37e2041e8 *) adding soap function to import yacy bookmarks from xml or html (transfered via soap attachments)
*) soapHandler: code cleanup for service deployment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2915 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 09:56:39 +00:00
allo
1d0c0edda3 first version of posts/get from the del.icio.us api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:16:09 +00:00
theli
97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
- serverFileUtils.java: 
   -- adding methods to copy from stream to writer and readers to writers
   -- moving httpc writeX methods into serverFileUtils class
   - serverCharBuffer.java: removing inheritance from Writer class
   - replacing htmlFilterOutputStream by htmlFilterWriter class which handles
     content as char stream
   - htmlFilterContentTransformer.java: deactivating getText mode 
    (still needs to be migrated to use char streams instead of byte streams)
   - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
   - changes in Scraper and Transformer classes to operate on chars instead of bytes
   - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 10:12:11 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparisment.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
allo
bd13bd78ad Bugfix for broken Tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-10 17:41:59 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00