Commit Graph

1643 Commits

Author SHA1 Message Date
danielr
74b1a60043 fixed "java.lang.NoClassDefFoundError: org/a"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-10 08:42:31 +00:00
orbiter
f42c8cf69c updated terminal and dynamic webstructure applet: can now change when crawl is running
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-09 00:01:47 +00:00
orbiter
7ec01d444a fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-08 20:25:11 +00:00
danielr
ae03a54d23 pdfParser: updated lib, fixed ClassNotFoundException: CMSError
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-08 16:55:45 +00:00
orbiter
719f5defb1 updated some graphics at new terminal_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 23:42:14 +00:00
lotus
9bc56a9edc xss protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 16:37:13 +00:00
orbiter
b32736762c enhanced rssTerminal
- 3 lines possible
- distinguishing of private and public data, if not authorized only public data is shown
- shows now more events, including local searches in clear text if user is logged in
- simplified peer events
- better recognition of 'real' new peers
- presentation of peer pings from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 23:05:48 +00:00
orbiter
fbb712c669 refactoring:
moved importer classes to crawler and plasma package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 13:44:38 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in an indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods using static access to yacySeedDB had to be rewritten. A special problem was the port forwarding methods, which were tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding was deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. As a new effect, every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switching are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
d0678f7ab9 refactoring as result of
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=959&p=7560#p7560

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4752 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-01 22:40:42 +00:00
orbiter
483e9a2066 - shifted tld recognition methods from yacyURL to serverDomains
- changed isLocal Property in such a way that it is possible to see if a domain is in the internet (and not intranet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-30 23:06:42 +00:00
orbiter
a3df23659c re-implementation of charset checking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-30 13:23:05 +00:00
orbiter
32b5b057b9 - modified, simplified old kelondroHTCache object; I believe it should be replaced by something completely new
- removed tree data type in kelondroHTCache
- added new class kelondroHeap; may be the core for a storage object that will once replace the many-files strategy of kelondroHTCache
- removed compatibility mode in indexRAMRI


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-29 22:31:05 +00:00
orbiter
88216c1f1f fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1103&hilit=&p=7362#p7362
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:59:20 +00:00
orbiter
d0b893523e - protection against RAM overflow caused by new peer rss news
- more XSS protection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:53:04 +00:00
orbiter
685794e7e7 fix for parser/encoding Exception
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111&hilit=&sid=55a320b54e1e3bda9410e7c50b5147f1&p=7431#p7431

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:14:45 +00:00
orbiter
9935e83c86 added new news window into the status page. At this moment it is just a test.
The news inside the window are about peer arrivals and departures, remote search accesses and crawls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 01:00:10 +00:00
orbiter
bac38cfa18 added very rudimentary peer news as rss feed. An example can be retrieved with
http://localhost:8080/xml/feed.rss?channel=PEERNEWS
to be extended and integrated in interface ...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 23:30:13 +00:00
orbiter
724bbdf9b2 refactoring of RSS reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4736 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 21:31:07 +00:00
orbiter
b9a2a2d287 more search performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 15:09:06 +00:00
orbiter
ff755fb858 small corrections and enhancements after search timing profiling
search should be a little bit faster now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 13:31:55 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is that a local search will not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00
orbiter
1995faef8d - refactoring of Collage back-end: move to plasma package
- renamed also the plasmaCrawlResults to have a consistent naming for url and image queues
- added a double-check for the images
- added additional queues for the images: all worse-quality images go there, so the queue can be used also if no sizes are given; no image is lost
- added a cleanup for the stacks so they cannot flood the memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-21 22:42:49 +00:00
orbiter
d7e89c2aca fixed near-deadlock situation when deleting crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-20 22:10:26 +00:00
orbiter
5e3ce46339 - better logging when rejecting a url because it is not in declared domain
- more XSS attack protection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-20 21:36:25 +00:00
orbiter
512f48e7d6 - removed unused methods
- fixed xss attack on peer list in CrawlStartSimple

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 03:33:07 +00:00
danielr
96e39b297a reduced StackTraces (by connect timed out)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:50:49 +00:00
orbiter
8313d58ae7 - integrated the collage into the Web Visualization menu
- added a counter for the public and private queue on the page (testing..)
- fixed wrong public/private categorization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 15:45:57 +00:00
orbiter
82bf9ac1c8 - added Collage servlet from datengrab and modified it:
* all images are queued
* private/public is respected
* inserted into switchboard
* added collageQueue class that stores all the queued images

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 13:24:21 +00:00
danielr
959f448e5f - disabled redirects in proxy (so client sees real path)
- added connection stats (only connections currently in use)
- remove "old" connections (closed or idle for some time)
- synchronized shared parts of proxyHandler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 11:39:48 +00:00
orbiter
5d1fbb25e7 fix for bad deploy:
- the name of downloaded release files is adapted if the httpc delivers uncompressed tar.gz files (the .gz is removed from the file name)
- the deploy method is able to handle tar files (not only tar.gz files)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-11 12:37:17 +00:00
orbiter
202a3adb3e refactoring of HttpClient Writer processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 22:47:05 +00:00
orbiter
2c2dcd12a2 - enhanced performance of Eco-Tables: less time-consuming size() - operations
- will increase speed of indexing and collection.index creation


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 13:24:55 +00:00
orbiter
c3342e1178 - removed class with only one static method
- removed connection method with too long time-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 23:35:20 +00:00
orbiter
2c1c3bb6eb - some refactoring (sorry Daniel, I rampaged through your code)
- fixed broken downloads (flush was missing)
- different problem handling when download is corrupted
- different default values in yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:36:33 +00:00
orbiter
14404d31a8 - enhanced performance graph (more info)
- added conditions for rarely used logging lines to prevent unnecessary CPU usage for non-printed info

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 14:44:39 +00:00
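The logging guard mentioned in commit 14404d31a8 can be illustrated with a minimal sketch. It assumes java.util.logging; the class GuardedLogging and the helper expensiveDetail are hypothetical names for illustration and are not taken from the YaCy sources. The point is that the message string is only built when a record at that level would actually be published.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    public static final Logger LOG = Logger.getLogger("GuardedLogging");

    // Counts how often the expensive message was actually built.
    public static int buildCount = 0;

    // Hypothetical stand-in for a costly string construction.
    public static String expensiveDetail(int[] values) {
        buildCount++;
        StringBuilder sb = new StringBuilder("values:");
        for (int v : values) {
            sb.append(' ').append(v);
        }
        return sb.toString();
    }

    public static void logDetail(int[] values) {
        // Build the message only if FINE records are enabled for this logger.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDetail(values));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);            // FINE disabled: message is never built
        logDetail(new int[] {1, 2, 3});
        System.out.println("built " + buildCount + " message(s)");

        LOG.setLevel(Level.FINE);            // FINE enabled: message is built once
        logDetail(new int[] {1, 2, 3});
        System.out.println("built " + buildCount + " message(s)");
    }
}
```

Without the guard, the string concatenation would run on every call even when nothing is printed, which is the CPU cost the commit message refers to.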
orbiter
696b8ee3f5 fix for http://forum.yacy-websuche.de/viewtopic.php?p=6806#p6806
- removed all InputStream.available() because this does not work for files > 2GB
- iterators terminate when an IOException occurs
- added handling of non-executing index.add methods to enhance assert usage
- added index for file indexes > 2GB, to be used in new indexHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 11:55:59 +00:00
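Commit 696b8ee3f5 removes InputStream.available() because it fails for files larger than 2 GB. The method returns an int and is only an estimate of the bytes readable without blocking, so it cannot express such sizes. The following is a sketch of one possible replacement pattern, not the code from the commit: read until end-of-stream and keep the running total in a long. The class name StreamLength is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StreamLength {
    // Reads until read() signals end-of-stream (-1) and counts bytes in a long,
    // so the total is not limited to Integer.MAX_VALUE.
    public static long countBytes(InputStream in) {
        byte[] buffer = new byte[8192];
        long total = 0L;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            // Wrapped so that callers in this sketch need no checked exception handling.
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[20000]);
        System.out.println(countBytes(in));
    }
}
```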
danielr
081ed1d3ec HTTPLoader: reduced stackTraces
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 16:56:15 +00:00
orbiter
225f9fd429 various fixes
- shutdown behavior (killing of client sessions)
- EcoFS reading better
- another synchronization in balancer.size()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 13:12:58 +00:00
orbiter
f099061944 protection against bad dht-flush word selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 20:25:05 +00:00
orbiter
117ae78001 speed enhancement for reading of eco-table indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4647 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 11:50:15 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
783a4c9edb strong speed enhancements for the index cache dump and restore:
storage and loading is 30 times faster! a cache of 100000 RWIs needed 180 seconds
to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and
3 seconds to restore. The cache size has decreased now by 30% (95 MB instead of 150 MB).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-02 13:18:23 +00:00
orbiter
442204a1c8 fix for concurrentModificationException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-01 21:21:37 +00:00
orbiter
d2f4926951 - more logging for balancer to get a hint where the problem is
- fix for new concurrency method in kelondroSplitTable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 18:45:27 +00:00
orbiter
20dadba426 - added a deadlock prevention function in cache flushing
- removed unused methods in collection index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 17:51:51 +00:00
orbiter
764a40e37d speed enhancements for crawler and url retrieval (affects also search speed)
- concurrency for LURL-fetching: this can be done using a concurrent lookup into the separated url databases. Concurrency is possible because there is no IO during lookup. The more LURL-Tables are present, the better is the speedup. More CPUs will increase speed
- because a large number of LURL-lookups are made during crawling (for double-check), the LURL-Lookup speed enhancements enhances also crawling speed
- search speed also profits from LURL-lookup enhancement
- changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing
- removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 15:41:19 +00:00
orbiter
368593e449 enhanced the concurrency handling of indexing process (better queue size control, better data concept, better shutdown behavior)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 00:03:44 +00:00
orbiter
be58135b3e possible fix for deadlock in search execution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-29 07:50:37 +00:00
orbiter
0241d070bc added concurrency to indexing process:
- the methods {parsing, semantic analysis (condensing), structure analysis (web structure)} in the serialized indexing path had been made concurrent.
- four BlockingQueues handle concurrency and hand-over of the indexing objects, the last object in the queue is stored into a blockingQueue of maximum size 1 to serialize the process for storage (which uses IO and therefore here should not be deserialized)
- a concurrency of (CPUs + 1) is default. Single-CPU users will profit from the change because large files cannot block the indexing process any more.
- removed the secondary indexing thread, which is superfluous now. Concurrency is default for all users.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-28 11:56:28 +00:00
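The hand-over scheme described in commit 0241d070bc can be sketched in a reduced form. This is an illustration under stated assumptions, not the YaCy implementation: it collapses the concurrent stages into one worker pool of (CPUs + 1) threads, and it uses a second BlockingQueue of capacity 1 so that a single thread performs the storage step serially. The class IndexingPipelineSketch, the method process and the end marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IndexingPipelineSketch {
    // Unique end marker; compared by reference on purpose.
    private static final String POISON = new String("POISON");

    // Hypothetical stand-in for parsing/condensing a document.
    public static String process(String doc) {
        return doc.toUpperCase();
    }

    public static List<String> run(List<String> docs) {
        int workers = Runtime.getRuntime().availableProcessors() + 1;
        BlockingQueue<String> input = new LinkedBlockingQueue<>();
        // Capacity 1: processed items are handed to storage one at a time.
        BlockingQueue<String> storage = new ArrayBlockingQueue<>(1);
        List<String> stored = Collections.synchronizedList(new ArrayList<>());

        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (String d = input.take(); d != POISON; d = input.take()) {
                        storage.put(process(d));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }

        // Single storage thread: the only consumer of the size-1 queue.
        Thread writer = new Thread(() -> {
            try {
                for (String s = storage.take(); s != POISON; s = storage.take()) {
                    stored.add(s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            for (String d : docs) {
                input.put(d);
            }
            for (int i = 0; i < workers; i++) {
                input.put(POISON);      // one end marker per worker
            }
            for (Thread t : pool) {
                t.join();
            }
            storage.put(POISON);        // all workers done: stop the writer
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("alpha", "beta", "gamma")));
    }
}
```

Because several workers run at once, the order of the stored items is not guaranteed to match the input order. A large document only occupies one worker, so the remaining workers keep the pipeline moving.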
orbiter
bca87f1e38 - refactoring of serverThreads: renaming to distinguish busy-threads and blocking-threads
- added blockingThreads which are threads that are not driven by pause times but by BlockingQueue lookup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-27 12:03:16 +00:00
orbiter
968c775025 - preparation of parsing/indexing queue for concurrent execution
- remote crawl receipts are now transmitted concurrently in separate threads (makes remote crawls much faster!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 22:43:38 +00:00
orbiter
9b0e20fb06 next refactoring step in document indexing to prepare concurrency environment for document parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4604 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 19:51:05 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
8d6a13bc07 refactoring of parsing-condensing-indexing process:
- separated parts
- removed storagePeer function
next step will be parallelization of processes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-24 22:51:26 +00:00
orbiter
5aa96dbc36 fix for shutdown configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 13:14:57 +00:00
orbiter
93633abed8 - removed some debugging code from search process - should speed up now
- added some profiling code to search event - more time details in PerformanceSearch_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 00:55:04 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
433ff855f7 - fixed another concurrency problem in collection sorting
- fixed a typing problem that was introduced in svn 4579 and caused the crawl monitor to fail

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4585 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 23:47:24 +00:00
danielr
7008a218b3 avoid ConcurrentModificationException in plasmaCrawlerQueues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-17 13:51:56 +00:00
orbiter
7150b463ff changed handling of default values and database paths:
- the default files (yacy.init and the network definition) are now moved to the path defaults
- the httpProxy.conf is renamed to yacy.conf
- the DATA/INDEX/PUBLIC is renamed to the actual network nickname, which should be freeworld or sciencenet
more menu entries
- added apfelmaennchens alternative search page to the menu
- added the new thread dump page to the server log menu point as submenu
modifications
- modified the thread dump page: sorting by thread type

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-16 22:31:54 +00:00
danielr
f51bad8ae5 FTP:
- report connection status (to break if no connection possible)
- fixed isFolder()
- additional error output
- fixed paths with encoded symbols (ie. a%20file.txt)
- refactoring


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 21:57:55 +00:00
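The encoded-path fix in commit f51bad8ae5 concerns names such as a%20file.txt. As a small sketch of the underlying idea, assuming java.net.URI is used for decoding (the commit does not say how the FTP client does it), getPath() returns the decoded path while getRawPath() keeps the percent escapes. The host example.org and the class name EncodedFtpPath are placeholders.

```java
import java.net.URI;

public class EncodedFtpPath {
    // Returns the decoded path component, e.g. "%20" becomes a space.
    public static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    // Returns the path exactly as written in the URL, escapes included.
    public static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    public static void main(String[] args) {
        String url = "ftp://example.org/pub/a%20file.txt";
        System.out.println(decodedPath(url));
        System.out.println(rawPath(url));
    }
}
```

The decoded form is the one an FTP server expects as a file name; the raw form is what belongs back in a link.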
orbiter
9c989fe5f7 fixed deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4562 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 00:49:16 +00:00
danielr
c565906050 FTP:
- added maxFileSize-check
- added timeout for download
- fixed dirlist (when all filenames have spaces, change to absolute links)
- enhanced isFolder()
- make sure data connection is closed, so a new can be opened
- refactoring


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4561 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 16:28:27 +00:00
danielr
1a7870df0d FTP: source cleanup (added finals, indentation for easier diffs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4559 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 12:35:53 +00:00
orbiter
fa1090113d - next try to fix the networking problem:
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 00:16:04 +00:00
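The buffer limit in commit fa1090113d follows from the arithmetic 1500 - 52 = 1448. The sketch below only illustrates that calculation and how a payload could be split into pieces no larger than the limit; the class MtuChunks and its constants are hypothetical, and the commit itself does not show how the transfer methods apply the limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MtuChunks {
    public static final int MTU = 1500;             // value from the commit message
    public static final int HEADER_OVERHEAD = 52;   // value from the commit message
    public static final int MAX_CHUNK = MTU - HEADER_OVERHEAD; // 1448

    // Splits data into consecutive pieces of at most MAX_CHUNK bytes.
    public static List<byte[]> split(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += MAX_CHUNK) {
            int to = Math.min(from + MAX_CHUNK, data.length);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (byte[] chunk : split(new byte[3000])) {
            System.out.println(chunk.length);
        }
    }
}
```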
orbiter
b664a53553 fix for NPE during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4552 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 15:30:26 +00:00
orbiter
b4ed937f1e - modified zone navigation (does still not work correctly)
- added dht switch in network definition
- 0.574

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4550 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 11:09:38 +00:00
orbiter
8d0470a5c6 new method to compute search history IDs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-10 23:40:56 +00:00
orbiter
9eddc1506b - one try to fix the httpd problem
- fix for handling of collection index that appears when removing elements
- added another navigation method (stub, not working yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:58:22 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
danielr
6788f8f7c1 fixed error 'FTPC cannot change directory'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-06 11:59:23 +00:00
orbiter
bfed9c2da6 - some refactoring in search process
- separated sidebars in new search interface and placed them in their own files
  which can be put in into the search page like plug-ins

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4529 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 21:46:55 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
danielr
fbe335db73 consistent use of de.anomic.server.serverMemory to get information about memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4522 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-02 15:42:50 +00:00
orbiter
8c06436c4a removing the error-db upon each time a start-up is made.
This is necessary because the table uses a lot of RAM and the content is never re-used after Start-Up.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-01 09:44:33 +00:00
orbiter
4fdf695064 - fixed a bug in remote search that prevented any results from being generated (!)
- added a great number of printStackTrace and new exceptions that shall be used to find the cause
  for a bug in yacy client-server communication which causes the interruption of data transfer
  which then causes the parser bug for the seed strings.
- tried to fix the communication bug on server-side (copy functions)
Be aware that the log may be full of errors and bugs - there should not be more bugs but there is more to see


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 23:12:43 +00:00
orbiter
1dce2f1079 more multithreading support:
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
  the very basic kelondroRowCollection which supports sorting with a second thread
  in case that a double-core processing CPU is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 15:16:47 +00:00
orbiter
253a453413 removed possible synchronization deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 14:05:43 +00:00
orbiter
c48e25d784 - fixed selection box for topwords
- fixed parser detail in condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-25 19:00:11 +00:00
orbiter
87a8747ce3 - enhanced recognition, parsing, management and double-occurrence-handling of image tags
- enhanced text parser (condenser): found and eliminated bad code parts; increase of speed
- added handling of image preview using the image cache from HTCACHE
- some other minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-25 14:08:15 +00:00
orbiter
a7abee3578 - fixed some data types in new search stack
- added image domain presentation to image preview
- added new search page to menu
- added automatic re-search when an old search profile is requested and a crawl is ongoing,
  to fetch newly crawled entries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 23:40:38 +00:00
orbiter
81687b6bd5 added missing hashCode computation for previous feature
this also solves the missing image double-check feature!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4500 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 15:37:46 +00:00
orbiter
bedd8dfbe2 - added image sorting by image size. This is the default now.
This is performed using a 3-stage sorting process:
  - sort by relevance, then do snippet-fetch
  - sort snippets by relevance then do image link extraction
  - sort image links by image size; unknown sizes are handled like small sizes
- only the exact amount of images as requested are shown

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 14:53:51 +00:00
orbiter
727feb4358 - fixed some bugs in ranking computation
- introduced generalized method to organize ranked results (2 new classes)
- added a post-ranking after snippet-fetch (before: only listed) using the new ranking data structures
- fixed some missing data fields in RWI ranking attributes and correct hand-over between data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 10:06:57 +00:00
orbiter
f4c73d8c68 - fixed highslide usage
- some enhancement to index management, better types

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 14:13:35 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
3441ec3928 - some small changes to highslide integration to get it working... (does not work yet)
- performance enhancement for url list parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-18 23:49:03 +00:00
orbiter
61a81820e3 - refactoring of search tracker
- added link to search history to repeat the search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 23:35:48 +00:00
orbiter
4079c38ce0 - probably slightly better default ranking
- added experimental right column to new search page (no function, only container)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 12:13:38 +00:00
orbiter
ff5969901c modified dir servlet to cooperate with intranet indexing from the own HTDOCS repository:
- removed md5 file generation (spoils the own repository)
- removed comments in file share (was never used)
- moved dir list comparator to other place (maybe solves problem, lets see)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-15 13:12:25 +00:00
orbiter
7f445f34a6 please switch on the Java 5-typical warnings!
(an unboxing error pointed to a program bug, and a type declaration was missing)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-10 22:50:09 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiple calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
159aaf8889 re-introduced global search limitation when index receive is switched off
this was necessary because otherwise robinson peers also did global searches, which cannot be a wanted effect

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-06 20:29:22 +00:00
orbiter
efd5807a7c - some renaming of variables to support DC
- initial 120mb RAM for fresh peers
- release 0.57

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4445 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 22:58:40 +00:00
orbiter
ff6b69b37e fix for NPE in access tracker
fix for NPE in word index


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 21:47:27 +00:00
orbiter
42c1e11f2b added another link double-check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 12:40:40 +00:00
orbiter
a5d388bfff fix for HTCache organisation that may have caused unlimited growth of the cache
appeared only for tree-caches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 11:21:50 +00:00
orbiter
96c5e6acc7 added a double-check for search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:55:21 +00:00
orbiter
a1e9e6e2e6 fix for search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:23:04 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
08a12e9bb5 - removed dashed line from default skin (looks much better!)
- better timing when displaying results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 11:30:47 +00:00
orbiter
89169d54fd fixed search result preparation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 00:16:00 +00:00
orbiter
acf771d5e1 - fixed bug with too much RAM in crawler queue
- fixed dir bug
- better calculation of TF for join
- better waiting-on-result logic

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4424 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-31 23:40:47 +00:00
orbiter
a8a5df4a51 - more dublin core naming of page metadata
- better presentation of result counters in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 21:58:30 +00:00
orbiter
fa3b8f0ae1 fixed bug in remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 00:15:43 +00:00
orbiter
7d875290b2 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 22:13:30 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
orbiter
0f5c4abaca more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 10:12:48 +00:00
orbiter
974fea7933 added term-frequency ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4413 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 23:41:39 +00:00
orbiter
1a296af6ff more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 20:08:32 +00:00
orbiter
4a80902081 - added ViewProfile as rdf in foaf syntax
- added link to rdf and vCard version on html page
- can be seen on http://localhost:8080/ViewProfile.html?hash=localhash
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 18:21:08 +00:00
hermens
d177ceb3b3 Fix for growing responseHeader[12].db when using proxyCacheLayout = hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 21:56:25 +00:00
orbiter
2485681002 added termination control for RotateIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 11:44:27 +00:00
orbiter
e2e7f065e9 minor fixes, some generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 23:58:18 +00:00
orbiter
15397298dc - refactoring of indexControlRWIs: moved statics to own class; better Dublin Core naming
- fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=759&hilit=&p=4866#p4866
- some bugfixes in the EcoTable remove method
- switched more tables to Eco: crawl Profiles, htcache, seeddb, newsdb

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 22:49:00 +00:00
orbiter
db25425893 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 23:08:32 +00:00
orbiter
0b4205eb5a - fix double-deletion in eco tables
- changed behaviour of sort moment (not during a get)
- added some asserts in snippet cache for debugging

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 11:13:39 +00:00
orbiter
002a109c4d patch for http://forum.yacy-websuche.de/viewtopic.php?p=4597#p4597
(urls that have no protocol but start with www will be treated as http://www...)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:49:26 +00:00
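The rule described in commit 002a109c4d (a URL without a protocol that starts with www is treated as http://www...) could look roughly like the sketch below. This is not the actual YaCy patch: the class name `UrlPrefixSketch`, the method name `withDefaultProtocol`, and the choice to match on the prefix "www." (including the dot) are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical sketch; names are illustrative and not taken from the YaCy sources.
final class UrlPrefixSketch {

    // Returns the input unchanged if it already carries a protocol,
    // prepends "http://" if it starts with "www.", and otherwise leaves it alone.
    static String withDefaultProtocol(String url) {
        String trimmed = url.trim();
        if (trimmed.contains("://")) {
            return trimmed; // a protocol is already present
        }
        if (trimmed.toLowerCase(Locale.ROOT).startsWith("www.")) {
            return "http://" + trimmed; // assumed default protocol
        }
        return trimmed; // no guess is made for anything else
    }

    public static void main(String[] args) {
        System.out.println(withDefaultProtocol("www.example.org/index.html"));
    }
}
```

Inputs that neither carry a protocol nor start with "www." are returned as they are, so a caller can still reject them.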
orbiter
85dc62c16f refactoring: more dublin core - compliant naming
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 19:03:47 +00:00
orbiter
efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
- refactoring of plasmaParserDocument to use Dublin Core - compatible property names
- redesign of url handling in parser and condenser (less String-to-yacyURL conversion)
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 11:51:43 +00:00
orbiter
2f3b2f3481 - extended dbtest for comparison tests
- added initial space option for eco tables
- used initial space value in initialization of collectionIndex, this should avoid OOM failures
- added index consistency check (checks for double-occurrences of primary keys in file)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 21:42:35 +00:00
orbiter
9eb746863d interface enhancements for eco records memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 01:51:02 +00:00
orbiter
58a1f518f8 fixed some problems with eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 12:23:56 +00:00
orbiter
d4d07802ac better RAM protection using eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 01:50:24 +00:00
orbiter
f4e9ff6ce9 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 00:40:19 +00:00
orbiter
cbefc651ac more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-18 18:43:56 +00:00
orbiter
45339c3db5 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-18 17:14:02 +00:00
orbiter
94f21d9403 activated new kelondroEcoTable file structure.
This data structure replaces almost all files in the PLASMA directory
also the collection.index and the LURL-db will be created as Eco-DB, if they do not exist before
existing Flex-databases will be used as they are (there is no data loss)
If you want to force the creation of an Eco-collection.index, simply delete the old index.
The Eco file system will only be used if there is enough memory.
The collection.index RAM limit is 200MB; if you have less, a flex-Table is created.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 21:48:08 +00:00
orbiter
a0f7f2faad some more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4338 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 18:43:01 +00:00
orbiter
dc26d6262b - removed write buffer from kelondroCache (was never used because buggy; will now be replaced by new EcoBuffer)
- added new data structure 'eco' for an index file that should use only 50% of write-IO compared to kelondroFlex
The new eco index is not used yet, but already successfully tested with the collectionIndex
The main purpose is to replace the kelondroFlex at every point when enough RAM is available.
Otherwise, the kelondroFlex stays as an option in case of low memory (which then can even use a file-index)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 12:12:52 +00:00
orbiter
dbdec0f4d3 another fix for the "too many processes in loader queue, dismissed" - problem:
this was probably caused by http-forward cases; which are cases when urls from the loader queue change
and it was not possible to remove the old urls from the queue because they had been based on url hashes.
The queue is now again stored using the entry.hashCode, which does not change even if the url changes.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-13 23:10:09 +00:00
orbiter
065ba2d60f fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=719&hilit=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-13 00:21:47 +00:00
borg-0300
3cab85158c update for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:41:45 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (omitting the read on the index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
71bcf02d3a - removed pro-version (is the same as standard version, use the standard instead)
- changed yacy logo
- removed crawlOrder protocol (unused)
- removed file index in kelondroFlex (will not work, it takes too long to maintain)
- fixed remote crawl for clusters (now denies remote crawls from peers outside cluster)
- 0.562

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 23:05:52 +00:00
orbiter
ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
- more generics
- removed unused commons classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-08 20:12:31 +00:00
orbiter
3e3d2e39a4 - some refactoring and redesign of kelondroBytesIntMap (created new class kelondroRAMIndex)
- more generics
- preparation to extend the balancer for flexible forced delay times
- set different random-access type, should now omit update of metadata in file and could be a bit faster (let's see)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-07 22:36:48 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
orbiter
df2a7a8ac8 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 18:47:45 +00:00
orbiter
9d8b17188a more generics, bugfixes for wrong cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 03:39:36 +00:00
low012
b08f877e97 *) tried to get rid of warnings when compiling parsers (http://forum.yacy-websuche.de/viewtopic.php?t=660)
lots of warnings are gone, new one in htmlFilterContentScraper


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4293 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 22:37:02 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first step to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
orbiter
db0d3d5e54 release 0.56 (and some last fixes)
- fixed bad peer hash computation in case no peer list is available upon first startup
- security minimum waiting time in search result preparation
- removed dead superseed link

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-22 02:58:38 +00:00
fuchsi
d517e96714 last cleanup bits to serverDate before the release. only safe refactoring (method renaming) changes outside of serverDate.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-21 00:53:46 +00:00
orbiter
52dd015218 new release strategy: the standard release is now built the same way as the pro release
a new release type was added: 'embedded', which is the same as what the standard release was until now
this will not have any effect on the next release 0.56, which will still be a pro-release on public download
the transition to the new release strategy must be done now to enable automatic updates by the updater in future releases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-20 02:46:41 +00:00
fuchsi
33ee6745f6 more cleanup in serverDate
- remove direct accesses to SimpleDateFormat fields in serverDate and use the static parse... methods instead
- remove nowDate() as a Date doesn't store timezone information and a new Date() is always faster
- default formatter methods use a GMT timezone by default now, this is important for interchangeability as some date formats we use don't include a timezone offset.
- continued renaming and rearranging (formatter) methods. all should follow the general naming scheme formatWHAT(...)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-19 19:39:19 +00:00
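The GMT default described in commit 33ee6745f6 matters because a pattern without a timezone offset is ambiguous unless both sides agree on the zone. The sketch below illustrates the idea only. The class `GmtFormatSketch`, the methods `formatCompact` and `parseCompact`, and the pattern `yyyyMMddHHmmss` are assumptions and are not taken from serverDate.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; not the serverDate implementation.
final class GmtFormatSketch {

    // Assumed example pattern that carries no timezone offset.
    private static final String PATTERN = "yyyyMMddHHmmss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    private static SimpleDateFormat newFormatter() {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // fixed zone instead of the JVM default
        return fmt;
    }

    static String formatCompact(Date date) {
        return newFormatter().format(date);
    }

    static Date parseCompact(String text) {
        try {
            return newFormatter().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a " + PATTERN + " date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatCompact(new Date(0L)));
    }
}
```

Because both directions use the same fixed zone, a value written on one peer parses to the same instant on another peer regardless of their local timezone settings.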
fuchsi
a52681dd49 add buffering for the performance graph to avoid ConcurrentModificationException
closes: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=628

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-18 15:59:35 +00:00
orbiter
814aff60bd - (re-)activated ftp protocol. see discussion here: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=623&hilit=&p=3875#p3875
- set default-flushsize for pro to 500

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-18 00:14:44 +00:00
fuchsi
21b8d1b918 small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-14 19:17:54 +00:00
orbiter
270d016d89 fix for missing anonymization in search profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 18:57:43 +00:00
orbiter
e3e4f06be4 enhanced search result preparation in the case that no result is found (fast abandon of search)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 14:18:42 +00:00
orbiter
01554f4012 fixed bug with double-check in crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 01:32:25 +00:00
orbiter
b1e08d354c repaired indexing after search snippet loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 00:33:26 +00:00
orbiter
48138952ff added memory measurement for index recreation to avoid OOM during index RAM space extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-11 15:07:03 +00:00
orbiter
9e23acf2d6 introduced new 'authority' ranking property
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-11 01:32:58 +00:00
orbiter
2954f96fae - removed public peer info box on status page, this info can now be seen in the status banner
- added peer count to banner
- added some values to protected status box

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4257 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-08 01:39:59 +00:00
low012
4eb40c4f61 *) added 2 filters: blur and antialiasing (which in fact is nothing more than a mild blur) to ymageMatrix
*) antialiasing is used for logo in banner


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-07 22:51:13 +00:00
orbiter
aeb1cf83a6 - corrected banner link (relative now)
- changed color mode (replace) for banner
- changed default color (fits to default skin) of banner in status

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4255 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-07 21:25:36 +00:00
orbiter
e22014dc83 some memory enhancements when generating and displaying ymage objects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4253 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-07 02:15:12 +00:00
orbiter
f243e338cf implemented online caution also for local and remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-06 21:53:17 +00:00
orbiter
c57eb76b13 removed CMY color model from ymage classes and re-introduced RGB color model
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-06 01:06:17 +00:00
orbiter
b46bcaa5d8 changed method of profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-04 20:19:13 +00:00
low012
76cd6ed6f6 *) New methods to insert bitmaps that feature transparencies.
*) Logo background is transparent now. (Using pixel at (0,0) to determine which color is transparent. Too dirty?)
*) Logo is loaded through filesystem instead of HTTP now.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-04 19:45:50 +00:00
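The transparency approach mentioned in commit 76cd6ed6f6 (the pixel at (0,0) decides which color counts as transparent) can be sketched with the standard `java.awt.image.BufferedImage` class. This is an illustration under assumptions: the ymage classes use their own matrix type, and the names `TransparentInsertSketch` and `insertBitmap` with this signature are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical sketch using BufferedImage instead of the ymage matrix classes.
final class TransparentInsertSketch {

    // Copies 'bitmap' onto 'target' at (x, y), skipping every pixel whose color
    // equals the color of the bitmap's top-left pixel.
    static void insertBitmap(BufferedImage target, BufferedImage bitmap, int x, int y) {
        int transparent = bitmap.getRGB(0, 0); // color key taken from pixel (0,0)
        for (int by = 0; by < bitmap.getHeight(); by++) {
            for (int bx = 0; bx < bitmap.getWidth(); bx++) {
                int tx = x + bx;
                int ty = y + by;
                if (tx < 0 || ty < 0 || tx >= target.getWidth() || ty >= target.getHeight()) {
                    continue; // clip pixels that fall outside the target
                }
                int rgb = bitmap.getRGB(bx, by);
                if (rgb != transparent) {
                    target.setRGB(tx, ty, rgb);
                }
            }
        }
    }
}
```

The drawback hinted at in the commit message is visible here: any pixel inside the logo that happens to share the corner color is dropped as well.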
orbiter
be214e594f - generalized ymage initialization options
- auto-adaptation of performance memory graph to needed dimension

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4246 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-03 02:35:28 +00:00
low012
ee8a177c26 *) Logo is in the middle of free space now.
*) Fixed bugs in insertBitmap()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-02 21:20:11 +00:00
low012
72698fcd36 *) Banner features a logo now. It does not look nice, but at least it works. Banner is not finished yet.
Which path do I have to set for IMAGE (htroot/env/grafics/yacy.gi) if I want to load it through the file system and not via HTTP?


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-02 20:37:12 +00:00
orbiter
aefb3f7765 added memory graph picture to PerformanceMemory_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-30 03:22:42 +00:00
orbiter
9b0ae4b989 added referrer to remote crawl url list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-29 13:58:00 +00:00
orbiter
7d5544e9b1 added some security checks to new remote crawl pull method to prevent that indexer is overloaded
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-29 02:54:59 +00:00
orbiter
89b9b2b02a redesigned remote crawl process:
- instead of pushing urls to other peers, the urls are actively pulled
  by the peer that wants to do a remote crawl
- the remote crawl push process has been removed
- a process that adds urls from remote peers has been added
- the server-side interface for providing 'limit'-urls exists since 0.55 and works with this version
- the list-interface has been removed
- servlets using the list-interface have been removed (this implementation did not properly manage double-check)
- changes in configuration file to support new pull-process
- fixed a bug in crawl balancer (status was not saved/closed properly)
- the yacy/urls-protocol was extended to support different networks/clusters
- many interface adaptations to new stack counters

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-29 02:07:37 +00:00
fuchsi
69521d92e5 Add another external dependency from PDFBox package ("Bouncy Castle"). This is necessary for parsing of some encrypted PDF files.
bcprov-jdk14-132.jar is the binary jar as it is provided in the PDFBox-0.7.3 package (same as our FontBox, PDFBox packages).

Closes: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=453


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-27 23:13:26 +00:00
orbiter
90a02990d2 NPE fix, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=549&hilit=&p=3383#p3383
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4230 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-23 09:26:35 +00:00
orbiter
2fcd18a972 - fixed bad behaviour of search event worker processes
- fixed export of url lists in xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-23 01:08:16 +00:00
orbiter
445c0b5333 added domain list extraction and html export format
to URL administration menu http://localhost:8080/IndexControlURLs_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 20:47:06 +00:00
orbiter
d8d77fc4b2 fix for NPE, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=549&hilit=&p=3368#p3368
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4227 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 18:15:28 +00:00
orbiter
bf6952abe7 - added url export to http://localhost:8080/IndexControlURLs_p.html
- removed command-line option to export urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4226 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 16:52:44 +00:00
orbiter
af10f729df fixed image search and favicon loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 01:34:29 +00:00
orbiter
c48b73cda2 redesign of ranking data structure
- the index administration now uses the same code base for url selection and collection
  as the search interface. The index administration is therefore a good test environment for
  ranking order control
- removed old postsorting-algorithms, will be replaced with new one
- fixed many bugs that occurred before during ranking; especially the constraint filtering method
  removed too many links
- fixed media search flags; had been attached to too many urls. The effect should be a better
  pre-sorting before media load within snippet fetch

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-21 23:14:57 +00:00
orbiter
6f1308da2f - some enhancements to IndexControlURLs (shows more links, connects referrer to another query)
- some refactoring to search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-17 01:53:02 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
bd5673efbe added cleaning of search event before opening the index administration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-15 12:49:13 +00:00
orbiter
55da871211 preparations for better ranking: better debugging of index properties
to do this, the index administration interface was extended.
It is now possible to select parts of an index.
See properties shown in interface after a word search for details.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-15 03:03:18 +00:00
orbiter
3491531cea - fixed 'appears in url' flag in index generation
- extended index administration page, shows some properties to the web links now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-14 01:15:28 +00:00
orbiter
bc2368e907 fix for problem with remote crawl referrers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 16:32:50 +00:00
orbiter
875096552f fix for NPE in case that remote search results are empty
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 15:54:50 +00:00
orbiter
0abf33ed03 - tried to remove deadlock
- enhanced searchtime in kelondroRowSets
- enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 01:14:51 +00:00
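One plausible reading of the uniq() note in commit 0abf33ed03: in an array-backed, sorted collection every removal shifts the elements behind it, so walking from the end means each removal only moves the tail that has already been deduplicated. The sketch below shows that idea on a `java.util.List`. It is not the kelondroRowSet code, and the class name `ReverseUniqSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; operates on a sorted List rather than a kelondro row set.
final class ReverseUniqSketch {

    // Removes adjacent duplicates from a sorted list, enumerating backwards.
    // Returns the number of removed entries.
    static <T> int uniq(List<T> sorted) {
        int removed = 0;
        for (int i = sorted.size() - 1; i > 0; i--) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                sorted.remove(i); // index-based removal; only the cleaned tail is shifted
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>(Arrays.asList(1, 1, 2, 2, 2, 3));
        int removed = uniq(rows);
        System.out.println(rows + " removed=" + removed);
    }
}
```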
low012
a4010f7dc8 *) fixed bug where dots were added after numbers < 1000: "123" was transformed to "123." which is undesirable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4206 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-11 21:42:50 +00:00
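The bug fixed in commit a4010f7dc8 ("123" rendered as "123.") is typical of grouping code that emits a separator at a group boundary without checking that digits precede it. A minimal sketch of grouping with dots that avoids this follows. The class `GroupingSketch` and the method `groupThousands` are hypothetical names, not the YaCy formatter.

```java
// Hypothetical sketch of dot-separated digit grouping.
final class GroupingSketch {

    // Inserts a '.' before every complete group of three digits,
    // but never at the very start and never at the end.
    static String groupThousands(long n) {
        String s = Long.toString(n);
        boolean negative = s.startsWith("-");
        String digits = negative ? s.substring(1) : s;
        StringBuilder sb = new StringBuilder();
        int len = digits.length();
        for (int i = 0; i < len; i++) {
            // (len - i) digits remain, including this one; a separator goes in front
            // of this digit only when that count is a multiple of three.
            if (i > 0 && (len - i) % 3 == 0) {
                sb.append('.');
            }
            sb.append(digits.charAt(i));
        }
        return (negative ? "-" : "") + sb;
    }

    public static void main(String[] args) {
        System.out.println(groupThousands(123));
        System.out.println(groupThousands(1234567));
    }
}
```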
orbiter
ecba35de72 enhanced computing speed of kelondro core function: sorting
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without enhancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.

For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 00:51:38 +00:00
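The scheme described in commit ecba35de72 (split the sort into two processes after the first quicksort partition, and only when the data amount is large enough to pay for the thread start) can be sketched as below. This is an illustration on a plain `int[]`, not the kelondroRowCollection code. The class name `TwoThreadQuicksortSketch`, the threshold value, and the middle-element pivot choice are assumptions.

```java
// Hypothetical sketch; sorts an int[] instead of kelondro rows.
final class TwoThreadQuicksortSketch {

    // Assumed cut-off below which starting a thread is not worth its cost.
    static final int PARALLEL_THRESHOLD = 10_000;

    static void sort(int[] a) {
        if (a.length < 2) return;
        if (a.length < PARALLEL_THRESHOLD) {
            quicksort(a, 0, a.length - 1);
            return;
        }
        // The first partition is done once; both halves are disjoint afterwards.
        final int p = partition(a, 0, a.length - 1);
        Thread left = new Thread(() -> quicksort(a, 0, p - 1));
        left.start();
        quicksort(a, p + 1, a.length - 1); // right half on the calling thread
        try {
            left.join(); // also makes the left half's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sort interrupted", e);
        }
    }

    // Recurses into the smaller side and loops on the larger one,
    // which keeps the stack depth logarithmic.
    static void quicksort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quicksort(a, lo, p - 1);
                lo = p + 1;
            } else {
                quicksort(a, p + 1, hi);
                hi = p - 1;
            }
        }
    }

    // Lomuto partition with the middle element as pivot; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The gain from the second thread depends on how evenly the first partition splits the data, since the two halves are sorted independently and the slower one determines the total time.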
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster than before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- previously, absolute paths were expanded incorrectly, e.g. fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be placed outside of YaCy's root dir without symlinks (probably good for third-party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
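The path resolution described in commit 425e4ead66 (a configured path is used as-is when absolute and is otherwise taken relative to the application root) can be sketched as follows. The commit names `abstractServerSwitch.getConfigPath(setting, default)`. The standalone class `ConfigPathSketch`, the static method `resolveConfigPath`, and its use of `java.util.Properties` are assumptions for illustration and do not reproduce the real signature.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical sketch of absolute-or-relative path resolution.
final class ConfigPathSketch {

    // Looks up 'key' in the configuration (falling back to 'defaultValue') and
    // returns an absolute path unchanged, or a relative one below 'rootPath'.
    static File resolveConfigPath(Properties config, File rootPath, String key, String defaultValue) {
        File configured = new File(config.getProperty(key, defaultValue));
        if (configured.isAbsolute()) {
            return configured; // do not prepend the application root
        }
        return new File(rootPath, configured.getPath());
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("fooPath", "/a/b/c");
        File root = new File("/path/to/yacy/root");
        System.out.println(resolveConfigPath(config, root, "fooPath", "DATA/FOO"));
        System.out.println(resolveConfigPath(config, root, "missingPath", "DATA/FOO"));
    }
}
```

Whether a string counts as absolute is decided by `File.isAbsolute()`, so the result is platform-dependent: "/a/b/c" is absolute on Unix-like systems but not on Windows.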
borg-0300
a5d28785b1 less OOM (works for me)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4194 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-02 14:55:46 +00:00
orbiter
ccbfb15b6b enhancement to crawl stacker enqueue order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 00:57:32 +00:00
hermens
35cf196204 transferRanking(): Do not flush more ranking files than requested by caller.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4189 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 15:55:52 +00:00
hermens
8f9d65da67 Small corrections to dhtFlushControl()
- Test wCacheMaxChunk against maxURLinCache(), not getMaxWordCount(). This triggered a flush every time dhtFlushControl() was called.
- If triggered, flush at least 1 entry.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 14:21:58 +00:00
orbiter
55c87b3b12 changed behavior of crawl stacker
- final flush only when tabletype = RAM
- prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100
- maximum number of entries in stacker is configurable in yacy.init (stacker.slots)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 11:32:40 +00:00
orbiter
4fefa53135 removed parser object pool, see also svn 4106
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 12:14:18 +00:00