Commit Graph

3820 Commits

Author SHA1 Message Date
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since it had been decided to give up some parts, they were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
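
The idea in daf0f74361 of attaching the hash to the URL data structure can be pictured with a minimal sketch. This is not the YaCy code: the class name `HashedUrl`, both constructors and the pluggable `hasher` function are assumptions made for illustration, and the real hash computation (with its intranet check) is not reproduced here.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical URL wrapper that carries its hash alongside the URL string. */
final class HashedUrl {
    private final String url;
    // Stands in for the expensive hash computation (assumption, not YaCy's algorithm).
    private final Function<String, String> hasher;
    private String hash; // null until the hash is known

    /** Used when the hash still has to be computed on demand. */
    HashedUrl(String url, Function<String, String> hasher) {
        this.url = Objects.requireNonNull(url);
        this.hasher = Objects.requireNonNull(hasher);
    }

    /** Used when the hash was stored together with the URL, e.g. after a database read. */
    HashedUrl(String url, String knownHash) {
        this.url = Objects.requireNonNull(url);
        this.hasher = null;
        this.hash = Objects.requireNonNull(knownHash);
    }

    String url() {
        return url;
    }

    /** Computes the hash at most once; later calls return the stored value. */
    synchronized String hash() {
        if (hash == null) {
            hash = hasher.apply(url);
        }
        return hash;
    }
}
```

With this shape, a URL loaded from the database together with its stored hash never triggers the expensive computation, and a freshly built URL pays for it only once.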
daburna
66905b7c97 update because of the new search page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-04 10:03:10 +00:00
orbiter
e90afa9483 fixed search access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-04 09:04:47 +00:00
orbiter
4779f314fe first version of next-generation search interface:
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internal threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servlet (which had also been a security leak btw)
- media search is broken now, will be redesigned and fixed in another step


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-03 23:43:55 +00:00
orbiter
34858be5ef added option to simple crawl start: complete domain crawl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-02 19:55:14 +00:00
michitux
d2360eaf68 - removed enctype="multipart/form-data" from the get-form of the peer-selection in Messages_p.html (in Konqueror this didn't work and multipart/form-data is only for post)
- removed name="searchresults" from the searchresults (seems to be no longer needed and is invalid)
- moved the favicons in the searchresults to the left side as requested in http://forum.yacy-websuche.de/viewtopic.php?f=5&t=268
- added alt-attributes for the favicons (images must have alt-attributes to be valid)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-01 23:43:08 +00:00
low012
0e27febe47 *) fixed more links
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-01 11:24:23 +00:00
low012
01ac8c8f6a *) fixed dead link
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-01 09:15:55 +00:00
low012
a493bd88b6 *) updated a few links
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-29 16:19:00 +00:00
orbiter
6d759ad0a7 - new bot address
- removed unused skins

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-29 11:46:42 +00:00
orbiter
71e5d24f4a fix for watch crawler, see http://forum.yacy-websuche.de/viewtopic.php?p=1771#p1771
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-28 12:20:19 +00:00
orbiter
f9e6cf6a3d more refactoring of search:
integrated first version of ssi-using search interface,
but the function is currently disabled


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-28 12:15:46 +00:00
orbiter
f81ef40cc4 no dht activity for small networks; this is not needed if the network is small
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 22:35:26 +00:00
orbiter
d9472b6a3a * fixed problem with watch crawler
* added new column to network table (remote crawl urls):
  the new value for provided URLs will be used for new remote crawl method


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4061 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 22:06:58 +00:00
orbiter
e332b844b2 - enhanced remote search: during waiting time for remote crawls
some urls are fetched so the url cache can be filled with these urls
- the url-prefetch is used to sort out some unresolved urls
- the snippet-fetcher is triggered with the search event id. This is used
  to remove missing snippets from the search cache so they will not be displayed again


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 18:18:35 +00:00
orbiter
a34d9b8609 * added a search history cache that maintains search results for 10 minutes
it is necessary for the new search process that will do automatic re-searches
a positive effect is that when a re-search is done it can be monitored how many
results had been contributed from other peers. The message for this contribution
was moved from the end of the result page to the top.
* enhanced re-search time when a global search was done and the local index has
already a great number of results for this word
* re-organised presearch computation; must be further enhanced

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 23:12:59 +00:00
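
A cache that keeps search results for a fixed lifetime, as described in a34d9b8609, could look roughly like the sketch below. It is an illustration under assumptions, not the YaCy implementation: the class name `ExpiringSearchCache`, the use of the query string as key and the explicit `nowMillis` parameter (passed in so the behaviour is easy to test) are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache that keeps one value per query for a fixed lifetime. */
final class ExpiringSearchCache<V> {
    private static final class Entry<V> {
        final V value;
        final long storedAtMillis;

        Entry(V value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long lifetimeMillis;

    ExpiringSearchCache(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    /** Stores (or replaces) the results for a query with the given timestamp. */
    void put(String query, V results, long nowMillis) {
        entries.put(query, new Entry<>(results, nowMillis));
    }

    /** Returns the cached results, or null if absent or older than the lifetime. */
    V get(String query, long nowMillis) {
        Entry<V> e = entries.get(query);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.storedAtMillis >= lifetimeMillis) {
            entries.remove(query, e); // drop only if it is still this entry
            return null;
        }
        return e.value;
    }

    int size() {
        return entries.size();
    }
}
```

For the 10 minutes mentioned in the commit message, the cache would be created with `new ExpiringSearchCache<>(10 * 60 * 1000L)` and queried with `System.currentTimeMillis()`.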
orbiter
ae86d010bb more refactoring of search processes; also some small speed enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 08:41:52 +00:00
orbiter
b3c830271c fix in xml header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-23 16:58:40 +00:00
orbiter
bb426565f0 added new yacy protocol for mass url-pull for better remote crawling distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-22 00:59:05 +00:00
borg-0300
4f6d56330d Bugfix for truncated headings - http://forum.yacy-websuche.de/viewtopic.php?f=6&t=273
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-21 22:07:23 +00:00
low012
54004e929b *) Better Bourne-Shell (OpenSolaris) compatibility, update and restart really work now. As the Bourne-Shell is the grandfather of most modern shells, it should also work with Linux (tested with Mandriva, works) and OSX (Please test!).
*) Fixed a typo.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 21:52:52 +00:00
orbiter
72752bb503 because of a new database structure handling, the memory need for accessing
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
   try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:36:43 +00:00
orbiter
9afd65bf82 small fixes: recommendation in performance queues and network unit domain
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:23:07 +00:00
orbiter
0ad8499e66 - all parsers are activated by default for pro releases
- slightly higher file size limits for parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 12:18:38 +00:00
low012
694defb257 *) better compatibility with OpenSolaris 5/07, updates should work now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-17 15:26:34 +00:00
michitux
3ea42f34bd - fixed a layout-bug in MessageSend_p.html (for details see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=255)
- fixed two bugs with hasLayout/percentage widths in InternetExplorer in MessageSend_p.html and Messages_p.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-16 21:34:29 +00:00
orbiter
16c203f759 fixed remote search access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4048 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-16 11:44:18 +00:00
orbiter
344911bfaa shorter minimum delay values for intranet crawl targets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 23:18:12 +00:00
orbiter
f890cc86aa inserted forwarding patch from fuchs
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=233

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 22:25:48 +00:00
orbiter
b5346141b3 made the plasmaHTCache static (there is only one internet, so we need only one cache)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 21:31:31 +00:00
daburna
aa9a4c1dea #update of de.lng and fra.lng
- fra: french translation taken out of the wiki
- de: small cleanup: removed or updated unused strings
- de: added translation for CrawlProfileEditor_p.html and Supporter.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 20:51:51 +00:00
orbiter
947fc46904 refactoring of search process:
- re-designed remote request result processing
- re-designed local result accumulation, will be further enhanced with snippet fetcher
- removed search process handling in switchboad
- made snippet class static (there is no need for multiple snippet objects)
- removed some redundant tasks in server-side search process, should be a little bit faster now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 11:36:59 +00:00
orbiter
3ca8f71cbb refactoring of dbtest to create separated kelondro sql connector interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-11 22:40:24 +00:00
michitux
5cf634a4a4 New media-search ui:
- uses the progressbar introduced in the image-search
- results are displayed using the same layout as the text-search
- results are displayed in the order they arrive


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4041 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-11 22:20:01 +00:00
orbiter
61f93cbf14 some code-cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-11 00:42:04 +00:00
orbiter
e76e996737 fixed umlaute-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-10 14:10:57 +00:00
orbiter
4798044708 fixed compile problem with svn 4037
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4038 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-10 14:03:07 +00:00
orbiter
24e25e1141 enhanced SSI server-side support:
- SSIs may now refer to servlets, not only files
- calling a servlet, the servlet/SSI engine is called recursively
- SSIs now work also for non-chunked-encoding supporting clients
This will support the new search page functionality, to show search results
dynamically without using javascript. To test this method, a test page has been added
http://localhost:8080/ssitest.html
..dynamically calls 3 servlets, which produce some delays during their execution
please verify that you can see the result step-by-step on your browser
To implement this feature, some refactoring has taken place; mostly, code
has been made static and will execute faster.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-09 21:58:38 +00:00
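
The recursive include handling described in 24e25e1141 can be illustrated with a small sketch. It is not the YaCy SSI engine: the class `SsiSketch`, the `resolver` function (standing in for "call the servlet or read the file") and the depth limit are assumptions, and only the common `<!--#include virtual="..." -->` directive is handled.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified SSI expander that resolves includes recursively. */
final class SsiSketch {
    private static final Pattern INCLUDE =
            Pattern.compile("<!--#include virtual=\"([^\"]+)\"\\s*-->");

    /**
     * Replaces every include directive in the page with the resolver's output.
     * Included content is expanded again until depthLeft reaches zero, so a
     * resource that itself contains includes is handled by the same engine.
     */
    static String expand(String page, Function<String, String> resolver, int depthLeft) {
        Matcher m = INCLUDE.matcher(page);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(page, last, m.start());       // text before the directive
            String included = resolver.apply(m.group(1));
            if (included != null) {
                out.append(depthLeft > 0
                        ? expand(included, resolver, depthLeft - 1)
                        : included);                 // depth exhausted: insert as-is
            }
            last = m.end();
        }
        out.append(page, last, page.length());       // text after the last directive
        return out.toString();
    }
}
```

A real server would stream each part to the client as soon as it is available, which is what makes results appear step by step; the sketch only shows the recursion.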
low012
c8e5a4a6b7 *) fixed bug described by Huppi in http://forum.yacy-websuche.de/viewtopic.php?t=239
*) added a preview function to message system
*) removed some old comments, I hope that's OK


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-08 18:23:45 +00:00
daburna
f77898748b # update of de.lng
- small cleanup: removed or updated unused strings
- added translation for IndexCreateWWWRemoteQueue_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4035 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-07 15:27:12 +00:00
daburna
6dd674bb53 # outstanding update for German translation file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4034 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 16:38:05 +00:00
orbiter
5c1b444690 some redesign of min/max and normalization computation during search result ordering
this saves about 1 millisecond for each URL reference, which has some good effect
on the search result computation if a word is searched that appears very often
(speed-up of 1 second and more)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 12:50:11 +00:00
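
As a generic illustration of min/max normalization of the kind 5c1b444690 refers to, the sketch below finds minimum and maximum in a single pass and then scales every value into a fixed range. What exactly was changed in YaCy's ranking code is not stated in the message, so the class `NormalizerSketch`, its method and the integer scale are purely hypothetical.

```java
/** Hypothetical min/max normalization over a batch of ranking attributes. */
final class NormalizerSketch {
    /**
     * Maps each value linearly into 0..scale, where the smallest input
     * becomes 0 and the largest becomes scale. Min and max are found in one
     * pass up front instead of being recomputed per value.
     */
    static int[] normalize(int[] values, int scale) {
        if (values.length == 0) {
            return new int[0];
        }
        int min = values[0];
        int max = values[0];
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        long range = (long) max - min; // long avoids overflow for extreme inputs
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0 : (int) (((long) values[i] - min) * scale / range);
        }
        return out;
    }
}
```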
orbiter
9678d1b282 fixed new EcoRecords-Nodes. Here I had previously omitted copying the object content
to avoid massive System.arraycopy. That obviously did not protect the Node objects enough


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 10:10:33 +00:00
orbiter
1af0e3bd84 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 00:56:56 +00:00
orbiter
5605887571 refactoring of search processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-05 23:57:25 +00:00
low012
5dee7e9c29 *) addition to Rev 4028, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=204
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-04 08:26:53 +00:00
orbiter
d3e777a98d bugfix for build bug, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=204&hilit=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-04 00:44:15 +00:00
orbiter
62347b50f4 added security layer for ViewImage:
- images may be requested by localhost and authorized users only, if the request is done using a clear-text URL
- the image may be requested also using a code that can be a license to retrieve a URL for everyone
- some servlets produce URL licenses for ViewImage, like image search results


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 23:06:53 +00:00
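
The licensing scheme from 62347b50f4 can be sketched as a table that maps random codes to URLs: a servlet that wants to show an image issues a code, and the image servlet later resolves it. This is an assumption-based illustration, not the YaCy code; the class `UrlLicenseSketch`, its method names and the code format are hypothetical, and expiry of old codes is left out.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that hands out codes permitting access to one URL each. */
final class UrlLicenseSketch {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> licensed = new ConcurrentHashMap<>();

    /** Issues a hard-to-guess code for the URL; called by the page that embeds the image. */
    String acquire(String url) {
        String code = Long.toHexString(random.nextLong());
        licensed.put(code, url);
        return code;
    }

    /** Returns the URL behind a code, or null if the code was never issued. */
    String resolve(String code) {
        return licensed.get(code);
    }

    /**
     * Decides which URL, if any, a request may load: a clear-text URL is
     * honoured only for trusted callers, a code is honoured for everyone.
     */
    String permittedUrl(String clearTextUrl, String code, boolean trustedCaller) {
        if (code != null) {
            return resolve(code);
        }
        return trustedCaller ? clearTextUrl : null;
    }
}
```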
orbiter
9a860cf397 bugfix for wrong record tracker message
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 12:24:52 +00:00
orbiter
69d640b041 added missing synchronization in crawl balancer
to avoid triggering the synchronization in the frequently used size() operation,
a notEmpty method was added that avoids the synchronization in many cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 12:21:46 +00:00
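
One way to offer a cheap emptiness check next to synchronized queue operations, in the spirit of 69d640b041, is a volatile flag that is updated under the lock and read without it. The sketch below is hypothetical and is not the YaCy balancer: the class `BalancerSketch`, the string entries and the flag are assumptions chosen only to show the pattern.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical queue with synchronized mutation and a lock-free emptiness check. */
final class BalancerSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    // Written only while holding the lock, read without it by notEmpty().
    private volatile boolean hasEntries = false;

    synchronized void push(String urlHash) {
        queue.addLast(urlHash);
        hasEntries = true;
    }

    /** Returns the oldest entry, or null if the queue is empty. */
    synchronized String pop() {
        String head = queue.pollFirst();
        hasEntries = !queue.isEmpty();
        return head;
    }

    /** Exact size; takes the lock, so it is comparatively expensive when called often. */
    synchronized int size() {
        return queue.size();
    }

    /** Cheap check for callers that only need to know whether work is available. */
    boolean notEmpty() {
        return hasEntries;
    }
}
```

Callers that poll frequently can use `notEmpty()` and fall back to the synchronized methods only when there is something to fetch.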