Commit Graph

872 Commits

Author SHA1 Message Date
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We stil have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
b332dfad67 - inserted request object into response object which carries this now instead generating new objects
- fixed a problem with the crawler introduced in SVN 6216

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 23:08:35 +00:00
orbiter
ca72ed7526 -removed superfluous crawl cache
-refactoring of crawler classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 21:07:46 +00:00
orbiter
3f113f38a8 removed unused imports
removed unused libs from eclipse class path

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6201 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 10:19:10 +00:00
f1ori
f814e0fa81 enable warnings and fix most of it
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-11 21:01:27 +00:00
orbiter
21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6188 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-09 22:25:31 +00:00
orbiter
0e8647d62f refactoring of search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 22:14:57 +00:00
orbiter
dafffd0153 refactoring of parsers and document processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 21:48:08 +00:00
orbiter
77d2a3782c removed strange debugging strings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-06 15:21:43 +00:00
orbiter
16efcd0366 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2252&hilit=&p=16389#p16389
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-04 06:24:58 +00:00
orbiter
24cb6d68bc - renamed Stack to RecordStack to avoid name confusion with new classes
- added new Stack class that implements a stack on BLOB files
- added new Stacks class that can be used for a set of Stacks (a 'Stack Database')
- added methods to other classes to support the new stacks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-03 16:35:34 +00:00
orbiter
409538e17a code cleanup and code simplifcation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 22:20:55 +00:00
orbiter
1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:25:46 +00:00
orbiter
222850414e simplification of the code: removed unused classes, methods and variables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 09:27:46 +00:00
orbiter
93dfb51fd4 problems with code style
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6153 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-29 22:22:35 +00:00
orbiter
adf01c676e reduce lookup time when merging a large number of BLOBs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6152 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-29 16:07:10 +00:00
orbiter
9a674d8047 - After the removal of the Tree class some code simplifications are possible. This affects mostly the Records class, which can be refactored and the result of the refactoring results in a reduced number of classes.
- The EcoTable was renamed to Table.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-28 21:37:37 +00:00
orbiter
c5122d6836 completed migration of BLOBTree to BLOBHeaps:
- removed migration code
- removed BLOBTree
after the removal of the BLOBTree, a lot of dead code appeared:
- removed dead code that was needed for BLOBTree
Some more classes may have not much use any more after the removal of BLOBTree, but still have some component that are needed elsewhere. Additional Refactoring steps are needed to clean up dependencies and then more code may appear that is unused and can be removed as well.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-28 21:02:56 +00:00
orbiter
6b307d6d59 more tolerance for corrupted index entries in exported row sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-19 21:35:44 +00:00
orbiter
33aafa9b4b better logging when writing merged dumps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-19 17:02:50 +00:00
orbiter
4d29e90708 uaeh
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-19 14:54:39 +00:00
orbiter
3c3e6499ae added more logging for merge operation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6095 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-19 14:51:35 +00:00
orbiter
15180fc95e - patch for future computation in SplitTable
- added same concurrent process for has() from SPlitTable in ArrayStack

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-18 23:24:23 +00:00
orbiter
9a5ec20b3c avoid merge during startup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-18 21:18:56 +00:00
orbiter
ae015e8e98 refactoring of blob package classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-17 09:58:15 +00:00
orbiter
be1c7ddc64 refactoring of search classes -- moved Ranking Profile to search package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-16 21:45:40 +00:00
orbiter
5a7fd6b4c8 just some comment lines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-16 06:38:26 +00:00
orbiter
ce1adf9955 serialized all logging using concurrency:
high-performance search query situations as seen in yacy-metager integration showed deadlock situation caused by synchronization effects inside of sun.java code. It appears that the logger is not completely safe against deadlock situations in concurrent calls of the logger. One possible solution would be a outside-synchronization with 'synchronized' statements, but that would further apply blocking on all high-efficient methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off log entries with a concurrent log writer. This also disconnects IO operations from logging, which can also cause IO operation when a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods to realize non-blocking logging.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 21:19:54 +00:00
orbiter
bc6dd8194b refactoring: moved search query class to new search package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 11:49:00 +00:00
orbiter
b8e738a7be a collection of
- small bug fixes
- better/more comments
- more asserts
- fixed synchronization
- test case enhancements
- code cleanup
- performance hacks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 22:09:08 +00:00
orbiter
945777aa80 replaced rwi term counting method by one that computes the maximum of the blobs that contibute to the RWI. An addition of the blob sizes is wrong/incorrect and does not reflect the real size. Truncation the size operation to the maximum of all blobs is also incorrect, but not as wrong as the sum of all blob sizes wich double-counts many rwi entries.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 22:59:54 +00:00
orbiter
733385cdd7 enahnced database access times by removal of unnecessary synchronization.
added also more hacks that resulted from high-volum query testing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-10 23:02:42 +00:00
orbiter
1c54ae4a63 some small changes in HandleMap Testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-10 15:02:52 +00:00
orbiter
2c5554c912 small enhancements in search result computation speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-09 15:22:23 +00:00
orbiter
27fa6a66ad - completed the author navigation
- removed some unused variables

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-08 23:30:12 +00:00
orbiter
c079b18ee7 - refactoring of IntegerHandleIndex and LongHandleIndex: both classes had been merged into the new HandleMap class, which handles (key<byte[]>,n-byte-long) pairs with arbitraty key and value length. This will be useful to get a memory-enhanced/minimized database table indexing.
- added a analysis method that counts bytes that could be saved in case the new HandleMap can be applied in the most efficient way. Look for the log messages beginning with "HeapReader saturation": in most cases we could save about 30% RAM!
- removed the old FlexTable database structure. It was not used any more.
- removed memory statistics in PerformanceMemory about flex tables and node caches (node caches were used by Tree Tables, which are also not used any more)
- add a stub for a steering of navigation functions. That should help to switch off naviagtion computation in cases where it is not demanded by a client

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6034 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-07 21:48:01 +00:00
orbiter
bead0006da replaced tmp file extensions by prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 18:09:58 +00:00
orbiter
a704d82280 patch for problem with digest
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 16:53:16 +00:00
orbiter
3029ef6eb3 fixed a bug that was recently inserted which caused that no idx and gap files were written.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 16:43:58 +00:00
orbiter
95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 01:56:31 +00:00
orbiter
42ae40b9f6 some bugfixes to database close() methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 22:43:46 +00:00
orbiter
9bfd22f65d fix for http://forum.yacy-websuche.de/viewtopic.php?p=15523#p15523
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 19:57:25 +00:00
orbiter
cc49aedf12 - fixed problem with remote search NPE
- more abstraction for search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-03 08:49:54 +00:00
orbiter
c38c852090 modified access method to get index entries out of a array of BLOBs:
iterate them, then merge; not collect them and merge then.
This should use less memory and may behave better in an environment with many queries.
To ensure that too many queries will not cause total blocking,
a time-out of one second was also added. After the time-out
the index data that was collected so far is returned.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-02 16:53:45 +00:00
orbiter
a5d481eab1 enhanced navigation
- fixed too early computation of navigation
- moved navigation rendering to yacysearchtrailer
- added more asserts

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-01 22:45:28 +00:00
orbiter
1c69d9b8b6 more refactoring of the index classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 14:16:41 +00:00
orbiter
88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 10:03:35 +00:00
lotus
d813fd26ed reset sent/received counters on index delete
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5991 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 15:49:42 +00:00
orbiter
99bf0b8e41 refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment

The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 14:26:05 +00:00