Commit Graph

79 Commits

Author SHA1 Message Date
orbiter
78d4c45d09 enhancement during search process: fail fast when all index feeders have terminated.
This change should affect filtering and navigators and should make search navigation faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 13:05:51 +00:00
orbiter
8f11d3a5bb redesigned the ScoreMap classes:
- new concurrent score map using atomic operations from the Java concurrency classes
- redesigned the difference between StaticScore and DynamicScore into ScoreMap and ReversibleScoreMap, so that many classes can now use simple ScoreMap objects, which work better in concurrent environments through the ConcurrentScoreMap
- switched from DynamicScore to ConcurrentScoreMap usage wherever possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-13 01:41:44 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' calls that use the default encoding (encoding argument missing) or the UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
5905f912c5 replaced more double types with float
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7462 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:22:00 +00:00
orbiter
89ae6101b9 fix for NPE and added comment in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 14:51:07 +00:00
orbiter
6b70393d1d - new java version 1.6
- replaced old gif animator by java 1.6 gif animator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-20 22:51:50 +00:00
orbiter
9b25a33fd9 - fixed numerous bugs
- better document names
- fixed problem with ftp crawling
- added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just hides the results. After the next network scan, when the server is available again, the results are shown again.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-17 17:30:09 +00:00
orbiter
db99db4be9 some redesign of the search-fail-response mechanism:
when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesigned and enhanced. The process now also writes into the table searchfl of the work tables to prepare a re-indexing mechanism.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-06 14:34:58 +00:00
orbiter
18d33b5c6d fixed several search result navigation bugs
fixed bad behaviours during search result collection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-05 23:54:00 +00:00
orbiter
49b5a206cd - better calculation of search result size
- predefined search recommendations

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-02 12:19:59 +00:00
orbiter
cc6499bf8d - added http://blekko.com as a search heuristic (like scroogle). This was easy since they also deliver their search results as an RSS feed
- renamed YaCy's search result modifier keywords RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like built-in slashtags:
/date
 - for search results ordered by date (most recent up)
/near
 - for search results where search words appear near to each other (closest up)
/language/<lang>
 - for sorting by language, where the wanted language comes up first. Example: /language/de

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 18:08:20 +00:00
orbiter
a9f754c45f removed unused CR accumulation and distribution process
this was neither used nor extended in recent years. The resulting YBR ranking criterion
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 11:07:42 +00:00
low012
9b3fae9496 *) cleaning up the code a little bit
*) program to interface, not implementation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-28 02:57:31 +00:00
orbiter
ed4371dcf3 enhanced navigation implementation and enhanced tag cloud computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 23:45:12 +00:00
orbiter
ca738ac924 - added a tag cloud to search results (using the topics)
- some refactoring of score classes
- added default package for new classes add_ymark and delete_ymark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 22:01:39 +00:00
orbiter
fcd40cd30f - disabled domZones (buggy, must think about better solution)
- increased time-out for dns resolver and isLocal property

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7233 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-09 10:17:50 +00:00
orbiter
091dd3f6ec - enhanced intranet search speed
- enhanced intranet portscan speed (better time-out)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7227 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-08 10:54:13 +00:00
orbiter
6e6994e328 latest bugfixes to search and indexing function after test of demo presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-05 17:49:53 +00:00
orbiter
aacf572a26 - enhancements for search speed
- bug fixes in many classes including basic data structure classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-04 11:54:48 +00:00
orbiter
84a023cbc8 fixed several search bugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-21 21:48:42 +00:00
orbiter
97ee278931 enhanced search speed:
- better control of number of running search threads
- no time-out waiting time when no ranking feeding takes place
- local search queries by a remote peer may be up to 300 milliseconds faster
- a local search may even be faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 13:17:25 +00:00
orbiter
29fe401f93 - some layout and text enhancement for site crawl start
- Quix0rs patch from http://forum.yacy-websuche.de/viewtopic.php?p=20839#p20839 (parts)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 23:00:07 +00:00
orbiter
34e2f7f487 enhanced snippet fetch strategy: concurrent snippet fetch even for offline-snippet searches. This improves speed since it is now possible to fetch snippets offline and parsing of source files from the htcache can be enhanced using concurrency. This improves local and remote search.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 21:09:14 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
14c843d364 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7148 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 15:00:34 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
570ca577c6 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 22:42:54 +00:00
orbiter
348dece62f redesign of the SortStack and SortStore classes:
created a WeakPriorityBlockingQueue as special implementation
of a PriorityBlockingQueue with a weak object binding.
- better abstraction of ordering technique
- fixed some bugs related to result numbering (distinguish different counters in Queue)
- fixed an ordering bug in post-ranking (ordering was decreased instead of increased)
- reversed ordering numbering using a reversed ordering. The higher the ranking number the better (now).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 15:30:25 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api', which may be used externally by other applications without yacy core elements because it has no dependencies on other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
b18a7606a0 some performance hacks and fixes after reading dump in
http://forum.yacy-websuche.de/viewtopic.php?p=19920#p19920

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:37:36 +00:00
orbiter
64f29f990e a collection of performance hacks and code cleanup:
- removed usage of URL-Caches which could have been a memory leak
- removed unused classes and methods
- removed unnecessary synchronizations
- added synchronization hacks where possible
- fine-tuned crawling speed to prevent IO of balancer
- fixed a bug in IODispatcher that may have prevented merges from being done
- reduced number of parameters in very frequently called methods (compare methods)
- reduced complexity of data structures of the now massively used HandleSet class
- reduction of new String() and getBytes() usage / new methods to support this transition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-19 16:42:37 +00:00
orbiter
1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
The result should be less usage of new String() and lower memory usage (since a String-encapsulated byte[] has 40 bytes overhead)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass value as byte[], not as String. This should result in fewer
byte[] <-> String conversions during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
6c093d6aed - enhanced domain navigator computation
- fixed domain navigator content in case a mustmatch constraint was given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075 using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: should speed up pre-sorting of search results and should cause richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
orbiter
f561e340c6 show more results of single domains when not authorized fully (up to 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 00:12:58 +00:00
orbiter
884b262130 - added a new Wiki Namespace Navigator
- some redesign of Navigator data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:25:49 +00:00
orbiter
7fdf59a77f misc NPE check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 15:59:24 +00:00
orbiter
5d930c96f0 more fixes to search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 00:04:37 +00:00
orbiter
8c520f128d reverted a change in ranking process committed this afternoon
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 20:56:37 +00:00
orbiter
18172451a0 better search computation:
- increased sort limit, now 3000 entries, before: 1000
  this should allow more results to be shown in case
  of strongly limiting constraints, like domain navigation
- enhanced the sort process
- check against domain navigator bugs
- fix in sort stack
- showing now all navigation pages at first search (not only next page)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6569 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 15:01:44 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
orbiter
bb2e03761c - fix for deadlock with 100% CPU during search
- fix for failure of ranking because of a ConcurrentModificationException

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6553 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 12:41:43 +00:00
orbiter
a37878b7d5 url parser regex performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-10 14:40:32 +00:00
orbiter
8281e29963 - more configuration for profiling graph (number of events)
- more logging for a shutdown: print reason and accessing IP into log


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-08 14:25:51 +00:00
orbiter
4782d2c438 fix for search bug that appeared when looking at page 3 of results or further
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6515 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-03 12:25:03 +00:00
orbiter
29fde9ed49 better control of ranking order in sort stack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-03 00:36:07 +00:00