Commit Graph

58 Commits

Author SHA1 Message Date
orbiter
8d14916c74 more patches for a better out-of-memory management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 01:45:11 +00:00
f1ori
59dea3a284 * implement url proxy, a proxy via the url http://peer:port/proxy.html?url=http://domain.tld/path
* enable with proxyURL = true
* could be useful to browse specific pages with proxy or use own improvements in proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7538 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 21:39:38 +00:00
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
orbiter
d28f8040e0 removed unnecessary recording function that caused also a performance problem after serving too much files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 13:33:28 +00:00
orbiter
70ca7cec8c fix for http://forum.yacy-websuche.de/viewtopic.php?p=21763#p21763
and another fix for non-working global search when search options are switched off

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-03 10:43:09 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the own seed file gets the lead for storage of the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to BasicConfig in case that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page BasicConfig.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
f1ori
a321c7673d * adminAccountForLocalhost only for localhost
* yacy crawls local domains also, if no password is set (the interface is already protected)
* it's not required anymore, to set a password in intranet mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 11:37:30 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
f1ori
2521677a45 * deny adminForLocalhost and intranet network setup also on bootup and not only on network switch
* require authentication for yacybot what ever adminForLocalhost is set to
  (after this patch, is the rule from above really nesseccary,
  the crawler also checks the robots.txt)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 21:39:02 +00:00
f1ori
9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 19:20:00 +00:00
orbiter
4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html
instead, a setting at ConfigPortal.html can be made to define if the topmenu shall be shown at these pages or if there is no naviagtion at all. 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-08 10:50:23 +00:00
orbiter
7bb4b001ed - view image files from cache
- fixed generic header settings; affects CORS functionality

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-27 09:16:16 +00:00
orbiter
70c95608d4 Added CORS Access header for yacysearch.rss output
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-02 16:28:40 +00:00
f1ori
7d8de34778 * add a bit documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-26 16:10:20 +00:00
orbiter
45b1ab3d07 custom + generic skins:
- added a generic skin which is filled with actual color assignment using a servlet
- enabled css servlets
- added a generic color scheme in configuration file
- added configuration input in Customization/Appearance servlet
- added a jquery color picker widget
- placed color picked widget to input field of generic colour definition input fields

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-11 00:00:10 +00:00
orbiter
aacf572a26 - enhancements for search speed
- bug fixes in many classes including basic data structure classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-04 11:54:48 +00:00
orbiter
c60aed4435 no caching in browser of dynamic web pages sent by YaCy http
this may prevent unnecessary IO caused by cache storage of the browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-29 19:56:42 +00:00
orbiter
ee3820c9cc more logging for strange "java.lang.NoClassDefFoundError: de/anomic/http/server/RequestHeader" error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 11:01:44 +00:00
orbiter
37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
- added link to yacystats peer when peer is more than one day old

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-17 10:20:04 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
orbiter
e7ea3b3cc5 added a buffer for network images to reduced load on yacy.net network image server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 12:45:53 +00:00
orbiter
dcd01698b4 added a 'transition feature' that shall lower the barrier to move from g**gle to yacy (yes!):
Here a new concept called 'search heuristics' is introduced. A heuristic is a kind of 'shortcut' to good results in IT, here for good search results. In this case it will be used to get a very transparent way to compare what YaCy is able to produce as search result and what g**gle produces as search result. Here is what your can do now:
- add the phrase 'heuristic:scroogle' to your search query, like 'oil spill heuristic:scroogle' and then a call to scroogle is made to get anonymous search results from g**gle.
- these results are _not_ taken as meta-search results, but are used to instantly feed a crawling and indexing process. This happens very fast, here 20 results from scroogle are taken and loaded all simultanously, parsed and indexed immediately and from the results of the parsed content the search result is feeded, along to the normal p2p search
- when new results from that heuristic (more to come) get part of the search results, then it is verified if such results are redundant to existing (they had been part of the normal YaCy search result anyway) or if they had been completely new to YaCy.
- in the search results the new search results from heuristics are marked with a 'H ++' and search results from heuristics that had been already found by YaCy are marked with a 'H ='. That means:
- you can now see YaCy and Scroogle search results in one result page but you also see that you would not have 'missed' the g**gle results when you would only have used YaCy.

- to make it short: YaCy now subsumes g**gle results. If you use only YaCy, you miss nothing.

to come: a configuration page that let you configure the usage of heuristics and get this feature by default.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-25 16:44:57 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
90fa8fd4d4 - support gpx file extension
- non-blocking location search (time-out handling was wrong)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-12 08:49:20 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
orbiter
e820ed061a avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
orbiter
48995e71c4 added soft-auth to general authentication scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 00:07:17 +00:00
orbiter
3889438db6 fix for bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 14:20:24 +00:00
orbiter
23bcca07a3 removed directly linked servlets that had been there to test memory failures that appeared in that servlets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 13:30:20 +00:00
orbiter
82f57f79e5 more PMD enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 00:23:07 +00:00
orbiter
d126d6c1b5 renamed the servlet WatchCrawler_p to Crawler_p
this was done because that servlet may be used for wget/cronjob
triggered crawl starts and it appears to be confusing that the
name of the crawl start servlet looks like a pure monitoring tool.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6568 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 10:05:28 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
orbiter
d77a8f3b3e added some modifications recommended by PMD for better performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6560 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 01:40:26 +00:00
orbiter
4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where an size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 00:37:59 +00:00
orbiter
969123385b added json and rss output for image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-23 16:10:50 +00:00
orbiter
94b2a664f3 - use a static DiskFileItemFactory (one instantiation is enough)
- use more memory for the DiskFileItemFactory to avoid IO when POST commands come

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-18 15:05:51 +00:00
orbiter
013f337d3f - avoid unnecessary host name lookups for localhost
- avoid unnecessary reverse domain name lookups for remote access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-16 23:00:54 +00:00
orbiter
7144d2df6e added crawlReceipt servlet as individual class to examine OOM problem as documented in
http://forum.yacy-websuche.de/viewtopic.php?p=18120#p18120

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-11 16:12:00 +00:00
orbiter
4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 20:28:37 +00:00
orbiter
e3025ee691 - new icon for OAI-PMH loading action
- added many stack trace outputs for exceptions in crawl profile handler to find the 'missing profile handle' bug
- catched one more timeout exception in httpd file loader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 16:40:15 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
orbiter
c864901087 - moved httpd.mime to defaults path
- some documentation fixes
- adopted a default setting for the search window: moves css setting to base.css
- some enhancements for the DocumentIndex class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 13:29:09 +00:00