Commit Graph

267 Commits

Author SHA1 Message Date
orbiter
863065abc4 added user agent logging to access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-18 08:09:59 +00:00
orbiter
ed4371dcf3 enhanced navigation implementation and enhanced tag cloud computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 23:45:12 +00:00
orbiter
ca738ac924 - added a tag cloud to search results (using the topics)
- some refactoring of score classes
- added default package for new classes add_ymark and delete_ymark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 22:01:39 +00:00
orbiter
e4d561971e added more score cluster options and made score cluster usage more transparent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-14 11:40:02 +00:00
orbiter
b7acd92ce4 Auto-Suggestions for YaCy Search:
- added a suggest servlet according to opensearch and firefox standard
- integrated the suggest servlet into opensearch description file
- integrated a autocomplete plugin for jquery
- added a autocomplete addition to the yacy search windows showing autosuggest queries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-12 01:23:49 +00:00
orbiter
fcd40cd30f - disabled domZones (buggy, must think about better solution)
- increased time-out for dns resolver and isLocal property

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7233 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-09 10:17:50 +00:00
orbiter
0d363a94d7 more performance hacks
this makes YaCy search results VERY fast for all verify=false search cases
and it enhances the search speed also for all other snippet-fetch cases.
With this change my peer performed 100 Queries Per Second (!!!) while doing 10 queries simultanously (!!!)
in an intranet index of 20000 URLs on my 16-core Mac

Check this yourself by doing:
cd bin
./searchtestmulti.sh
after finishing the run, divide 1000 by the given time per query (which is the qps for one thread)
and then multiply again by 10 (because 10 search threads has been started)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-09 08:55:57 +00:00
orbiter
b8aee6d402 performance hacks for better search performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7230 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-08 23:50:28 +00:00
orbiter
091dd3f6ec - enhanced intranet search speed
- enhanced intranet portscan speed (better time-out)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7227 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-08 10:54:13 +00:00
orbiter
6e6994e328 latest bugfixes to search and indexing function after test of demo presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-05 17:49:53 +00:00
orbiter
aacf572a26 - enhancements for search speed
- bug fixes in many classes including basic data structure classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-04 11:54:48 +00:00
orbiter
2c549ae341 fixed a number of small bugs:
- better crawl star for files paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blockings
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixes sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 23:57:58 +00:00
orbiter
e63896f2a8 added an intranet scanner and a servlet which shows all intranet addresses and an option to start a site-crawl for all these addresses at once.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-28 12:18:54 +00:00
orbiter
e54cb7fb0c more bugfixes (also for latest commit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7202 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-28 10:20:46 +00:00
orbiter
d2fd93135c - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed
- migrated the 'yacy' user agent to 'yacybot' in many client methods since the 'yacy' user agent is only used for the proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-27 14:54:32 +00:00
orbiter
a83186ac7d fix for bug in cytrails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-26 10:32:40 +00:00
orbiter
48c0d508ac fixes for crawling of smb links (file length not always available)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7190 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-25 22:32:26 +00:00
orbiter
10a9cb1971 simplified snippet computation process and separated the algorithm into two classes
also enhances selection criteria for best snippet line computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-22 20:50:02 +00:00
orbiter
84a023cbc8 fixed several search bugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-21 21:48:42 +00:00
orbiter
97ee278931 enhanced search speed:
- better control of number of running search threads
- no time-out waiting time when no ranking feeding takes place
- local search queries by a remote peer may be faster up to 300 milliseconds
- a local search may even be faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 13:17:25 +00:00
orbiter
29fe401f93 - some layout and text enhancement for site crawl start
- Quix0rs patch from http://forum.yacy-websuche.de/viewtopic.php?p=20839#p20839 (parts)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 23:00:07 +00:00
orbiter
461a2a6ec7 enhanced remote crawling:
- 300 ppm is default now (but this is switched off by default; if you switch it on you may want more traffic?)
- better timing for busy queue
- better amount of remote url retrieval
- better time-out values
- better tracking of availability of remote crawl urls
- more logging for result of receipt sending

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 09:34:17 +00:00
orbiter
34e2f7f487 enhanced snippet fetch strategy: concurrent snippet fetch even for offline-snippet searches. This improves speed since it is now possible to fetch snippets offline and parsing of source files from the htcache can be enhanced using concurrency. This improves local and remote search.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 21:09:14 +00:00
orbiter
0cf006865e refactoring and enhanced concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 11:38:03 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
14c843d364 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7148 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 15:00:34 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
orbiter
7ebef56add - redesign of a part of the remote search client to make it possible to have a test environment for remote search performance tests
- added a remote search test main methods in yacyClient

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7146 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 13:35:47 +00:00
orbiter
3c0e07ba72 removed all delays in shutdown process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 09:13:28 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more)
- fixed bug in WeakPriorityBlockingQueue
- show favicon faster using pre-loading (same technique as used for fast image search)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 23:25:19 +00:00
orbiter
570ca577c6 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 22:42:54 +00:00
orbiter
348dece62f redesign of the SortStack and SortStore classes:
created a WeakPriorityBlockingQueue as special implementation
of a PriorityBlockingQueue with a weak object binding.
- better abstraction of ordering technique
- fixed some bugs according to result numbering (distinguish different counters in Queue)
- fixed a ordering bug in post-ranking (ordering was decreased instead of increased)
- reversed ordering numbering using a reversed ordering. The higher the ranking number the better (now).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 15:30:25 +00:00
orbiter
114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
- removed old parser
- removed old importer framework (was only used by removed old parser)
- added a new sitemap parser in parser framework
- linked new parser with parser access in old sitemap processing routines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 14:13:15 +00:00
orbiter
5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
- added some clear statements that shall clear static cache size within the pdfbox library
- the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-07 17:13:47 +00:00
orbiter
c757a4aa9f - corrected lifetime computation for search events
- made search event cache cleanup concurrent because cleanup may cause index modifications

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 16:05:19 +00:00
orbiter
fb828f3767 - performance enhancements in search response time using faster query ID computation and an ID cache
- code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 10:00:07 +00:00
orbiter
e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 11:13:12 +00:00
orbiter
9c0c94683c because of a bug in search result caching count search results had not been generated as fast as possible.
with this fix search results are (even) faster.
Also enhanced: image search. This is now speeded up using a image search result look-ahead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-04 22:57:12 +00:00
orbiter
5de70c3d7c changed way of storage for search requests:
- the search request cache can now get as large as 1000 entries
- if more entries arrive, unused are deleted
- the elements may stay in the cache up to 10 minutes and longer if they are used
- the elements are deleted earlier that 10 minutes if the memory gets low
This commit was mainly done for metager-feeding peers that have a query load of 50000 queries each day. Also added:
- a monitor for cache hit/cache miss in PerformanceMemory_p.html (see at bottom of page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 21:52:45 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
orbiter
4f22e2df41 bugfixes for
- next-execution-time in scheduler
- deletion of scheduled rss feed loading (now deletes also the scheduling entry)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:42:00 +00:00
orbiter
42414a6ae3 added two more tables in rss reader interface:
- fresh recorded rss feeds (not yet loaded or in scheduler)
- rss feeds in scheduler
The first list has a button that can be used to place rss feeds into the scheduler
The second list has a button to delete rss feeds from the scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:01:45 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added a scanning in html parser for rss feeds
- storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- a rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
7aa860c505 - more logging
- more stability for database heap in case of buffer failure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:16:05 +00:00