Commit Graph

83 Commits

Author SHA1 Message Date
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
f1ori
87e6abd168 * fix urls containing a port number in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 15:02:15 +00:00
f1ori
97045022fa * pass cookies to Server Side Includes
* User.html a bit more usable


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 14:54:14 +00:00
orbiter
610b01e1c3 - added an 'add every media object linked in an HTML document as a new document' feature to the HTML parser. As a result, every image, app, video or audio file that is linked in an HTML file is added as a document, so parsing a single HTML document may insert several documents into the search index.
- some refactoring for mime type discovery

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 16:05:00 +00:00
sixcooler
a311596881 finishing up my commits (7855-7858) which could be helpful for
not declaring inside loops (helps GC of some VMs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:35:24 +00:00
f1ori
3a5fa73008 * revert parts of previous commit, because it breaks the trickle-feature
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 12:04:40 +00:00
f1ori
6e79675ff3 * use gzip-encoding in more cases
* send Expires header for static content
* should improve webserver-performance for slow connections
* fixes #37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 11:47:53 +00:00
cominch
09bb7a390c do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-12 07:44:23 +00:00
f1ori
96957375cc * fix url proxy for relative links and chromium
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 09:32:02 +00:00
orbiter
7db208c992 performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-21 23:10:50 +00:00
orbiter
87bd559c42 fixed warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:53:43 +00:00
f1ori
900dacbf97 * improve link rewriting in proxy-url
* only rewrite links that are in the current search domain

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-01 13:27:04 +00:00
f1ori
dc855d881b * further improve proxyurl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 21:25:20 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
use an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computational overhead than the UTF8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
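The ASCII conversion described in commit 4bea3f9714 can be sketched as follows. This is a minimal illustration, not the code from that commit: the class name `AsciiCodec` and its method names are hypothetical, and the strict rejection of non-ASCII input is an assumption made here for safety.

```java
// Hypothetical helper; names are illustrative and not taken from the YaCy source.
final class AsciiCodec {
    private AsciiCodec() {}

    /** Encodes a pure-ASCII string by narrowing each char to one byte. */
    static byte[] encode(final String s) {
        final byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            final char c = s.charAt(i);
            // Assumption: fail loudly instead of silently corrupting non-ASCII input.
            if (c > 0x7F) throw new IllegalArgumentException("not ASCII at index " + i);
            b[i] = (byte) c;
        }
        return b;
    }

    /** Decodes pure-ASCII bytes by widening each byte to one char. */
    static String decode(final byte[] b) {
        final char[] c = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            // Bytes above 0x7F are negative in Java and are not ASCII.
            if (b[i] < 0) throw new IllegalArgumentException("not ASCII at index " + i);
            c[i] = (char) b[i];
        }
        return new String(c);
    }

    public static void main(final String[] args) {
        final byte[] raw = encode("AbC0_-");
        System.out.println(raw.length + " bytes, decoded: " + decode(raw));
    }
}
```

For pure-ASCII input the result is byte-identical to the UTF-8 encoding, which is why such a shortcut is safe for base64 hashes.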
orbiter
746e3c3b06 Replaced a widely-used Properties object in the httpd with a HashMap<String, Object> which, unlike Properties, is not synchronized.
Synchronization is not needed here; it added overhead to the httpd process, which is now removed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 16:34:35 +00:00
f1ori
14e1666b21 * fix replacing regexes in url proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 16:09:29 +00:00
orbiter
d1dbbd956a always use a template method cache even if the template cache flag is set to false. This flag only enables dynamic updates to the template files, not dynamic updates to the rewrite methods (which are not possible without recompiling). Low memory usage is guaranteed by the use of soft references, which are dropped before an OOM is thrown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 09:31:07 +00:00
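The soft-reference caching mentioned in commit d1dbbd956a can be illustrated with a small sketch. The class `SoftCache` and its loader-based `get` method are hypothetical and are not the template cache from the commit. The sketch relies only on the documented JVM guarantee that softly reachable objects are cleared before an OutOfMemoryError is thrown.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache; an illustration of the technique, not YaCy's template cache.
final class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    /** Returns the cached value, recomputing it if it was never loaded or was reclaimed. */
    V get(final K key, final Function<K, V> loader) {
        final SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either a first access or the garbage collector cleared the reference.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(final String[] args) {
        final SoftCache<String, String> cache = new SoftCache<>();
        System.out.println(cache.get("Status.html", name -> "compiled:" + name));
    }
}
```

Two threads may load the same key at once in this sketch; that is harmless when the loader is idempotent, and it avoids holding a lock while loading.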
orbiter
9248a4eef4 reduce the effect of 'image search finds generated HTML pages as images'
see http://bugs.yacy.net/view.php?id=9

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 07:37:46 +00:00
orbiter
6e42d4de88 - added full-String search function: find things that match exactly what is quoted in the query
- restructured authentication methods to fix a problem with API steering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 00:25:14 +00:00
orbiter
6fa439c82b - refactoring of robots
- added option to crawler to send error-URLs to solr
- changed the solr schema slightly (no multi-valued fields where there are no multiple values)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 14:05:51 +00:00
orbiter
4c013d9088 more UTF8 getBytes() performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-12 05:02:36 +00:00
orbiter
1989ebc24b removed more warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 22:52:30 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
orbiter
3820525464 more memory protection: auto-flush of caches in case of memory shortage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 16:32:34 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' calls that used the default encoding (encoding missing) or the UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache lookup for the Charset object, which hashes the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
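The idea in commit cb1f49d0f2, passing a pre-defined Charset constant instead of the charset name, might look like the sketch below. The class `Utf8` and its methods are hypothetical names. `StandardCharsets.UTF_8` is used here for brevity and is only available from Java 7 on, so the original change presumably defined its own constant.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the class introduced by the commit.
final class Utf8 {
    /** Resolved once, so no per-call lookup by charset name is needed. */
    static final Charset CHARSET = StandardCharsets.UTF_8;

    private Utf8() {}

    /** Replaces new String(bytes) and new String(bytes, "UTF-8"). */
    static String string(final byte[] bytes) {
        return new String(bytes, CHARSET);
    }

    /** Replaces s.getBytes() and s.getBytes("UTF-8"). */
    static byte[] bytes(final String s) {
        return s.getBytes(CHARSET);
    }

    public static void main(final String[] args) {
        System.out.println(string(bytes("YaCy")));
    }
}
```

Besides skipping the name lookup, the Charset overloads do not declare `UnsupportedEncodingException`, and the result no longer depends on the platform default encoding.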
orbiter
8d14916c74 more patches for a better out-of-memory management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 01:45:11 +00:00
f1ori
59dea3a284 * implement url proxy, a proxy via the url http://peer:port/proxy.html?url=http://domain.tld/path
* enable with proxyURL = true
* could be useful to browse specific pages with proxy or use own improvements in proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7538 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 21:39:38 +00:00
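Building a URL of the form given in commit 59dea3a284 could be sketched as below. The class `ProxyUrl`, the host `localhost` and the port `8090` are illustrative assumptions. Percent-encoding the target URL is also an assumption, since the commit message shows it unencoded. The `URLEncoder.encode(String, Charset)` overload requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the YaCy code base.
final class ProxyUrl {
    private ProxyUrl() {}

    /** Builds http://peer:port/proxy.html?url=<target> for the given peer. */
    static String build(final String peerHost, final int peerPort, final String targetUrl) {
        // Assumption: the target is percent-encoded so its own query string stays intact.
        final String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        return "http://" + peerHost + ":" + peerPort + "/proxy.html?url=" + encoded;
    }

    public static void main(final String[] args) {
        System.out.println(build("localhost", 8090, "http://domain.tld/path"));
    }
}
```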
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
orbiter
d28f8040e0 removed an unnecessary recording function that also caused a performance problem after serving too many files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 13:33:28 +00:00
orbiter
70ca7cec8c fix for http://forum.yacy-websuche.de/viewtopic.php?p=21763#p21763
and another fix for non-working global search when search options are switched off

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-03 10:43:09 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the own seed file gets the lead for storage of the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to BasicConfig if that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page BasicConfig.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
f1ori
a321c7673d * adminAccountForLocalhost only for localhost
* yacy also crawls local domains if no password is set (the interface is already protected)
* setting a password is no longer required in intranet mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 11:37:30 +00:00
orbiter
10ae8d961b - cora package now has no dependencies on other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
f1ori
2521677a45 * deny adminForLocalhost and intranet network setup also on bootup and not only on network switch
* require authentication for yacybot whatever adminForLocalhost is set to
  (after this patch, is the rule from above really necessary?
  the crawler also checks the robots.txt)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 21:39:02 +00:00
f1ori
9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 19:20:00 +00:00
orbiter
4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html
instead, a setting at ConfigPortal.html defines whether the topmenu is shown on these pages or whether there is no navigation at all.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-08 10:50:23 +00:00
orbiter
7bb4b001ed - view image files from cache
- fixed generic header settings; affects CORS functionality

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-27 09:16:16 +00:00
orbiter
70c95608d4 Added CORS Access header for yacysearch.rss output
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-02 16:28:40 +00:00
f1ori
7d8de34778 * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-26 16:10:20 +00:00
orbiter
45b1ab3d07 custom + generic skins:
- added a generic skin which is filled with actual color assignment using a servlet
- enabled css servlets
- added a generic color scheme in configuration file
- added configuration input in Customization/Appearance servlet
- added a jquery color picker widget
- attached the color picker widget to the generic colour definition input fields

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-11 00:00:10 +00:00
orbiter
aacf572a26 - enhancements for search speed
- bug fixes in many classes including basic data structure classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-04 11:54:48 +00:00
orbiter
c60aed4435 no caching in browser of dynamic web pages sent by YaCy http
this may prevent unnecessary IO caused by cache storage of the browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-29 19:56:42 +00:00
orbiter
ee3820c9cc more logging for strange "java.lang.NoClassDefFoundError: de/anomic/http/server/RequestHeader" error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 11:01:44 +00:00
orbiter
37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
- added link to yacystats peer when peer is more than one day old

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-17 10:20:04 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for Debian and Windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00