Commit Graph

1619 Commits

Author SHA1 Message Date
orbiter
3288c19c1a reduce remote crawl PPM for fresh peers in freeworld to 6 PPM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-08 09:49:08 +00:00
orbiter
77ee0765a4 - added domain statistic generation to IndexControlURLs_p.html servlet
- added a 'delete all' button to each result of such a domain statistic output; it deletes all urls of that domain
- extended stack cleaner to clean also the statistics: they are not completely destroyed, only the smallest counting domains are removed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 19:41:57 +00:00
orbiter
4fbee21cea - added fetch-ahead again (had been removed in last commit)
- reverted default query mode to verify=false

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 23:50:13 +00:00
orbiter
fc03b0437a fixed an error case where a second search after a first search with a different search word failed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5109 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 15:55:25 +00:00
orbiter
ead39064c5 fixed problem with wrong result number calculation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 10:04:46 +00:00
orbiter
05dbba4bab added logging conditions to all fine and finest log line calls
this avoids the overhead of generating log lines that are then not printed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:30:21 +00:00
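Note on 05dbba4bab: the logging condition described above can be sketched as follows. This is a minimal illustration using java.util.logging, not the code from the commit; the class GuardedLogging, the method names and the counter are hypothetical and only exist to show that the message is not built when the level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; none of these names come from the YaCy sources.
final class GuardedLogging {
    static final Logger LOG = Logger.getLogger("GuardedLogging");
    // Counts how often the (potentially expensive) message string was built.
    static int messagesBuilt = 0;

    static String describe(int entries) {
        messagesBuilt++;
        return "processed " + entries + " entries";
    }

    static void logFine(int entries) {
        // Guard: the message string is only built when FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(entries));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);   // FINE is disabled at this level
        logFine(42);
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

Without the guard, the string concatenation in describe() would run on every call, even when the FINE line is discarded.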
orbiter
d3d41e2ee4 - fixed problem with searching with quotes (still not complete, but not as bad as before)
- fixed parsing of crawl-delay statements when seconds were given with float numbers
- enhanced performance of profiling (not too many loggings; not more than one per second)
- removed some debug output
- fixed wrong return type in logging
- added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!)
- fixed wrong word distance computation in RWI management


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-02 23:49:48 +00:00
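Note on d3d41e2ee4: the crawl-delay item above concerns values such as "0.5" that an integer parser rejects. The following is a sketch of one way to accept fractional seconds, not the code from the commit. The class CrawlDelay, the method parseMillis and the choice to return 0 for unusable values are assumptions made for illustration.

```java
// Hypothetical helper; the real robots parser in YaCy may look different.
final class CrawlDelay {
    /**
     * Converts the value part of a "Crawl-delay:" line to milliseconds.
     * Returns 0 when the value is missing, negative or not a finite number
     * (an assumption of this sketch, not documented behavior).
     */
    static long parseMillis(String value) {
        if (value == null) return 0L;
        try {
            double seconds = Double.parseDouble(value.trim());
            if (Double.isNaN(seconds) || Double.isInfinite(seconds) || seconds < 0) {
                return 0L;
            }
            return Math.round(seconds * 1000.0);
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("0.5"));
        System.out.println(parseMillis("10"));
    }
}
```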
orbiter
df4ff423c4 added additional properties to query id's to distinguish search events better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-28 21:15:59 +00:00
danielr
9ff4fc11da partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 16:34:24 +00:00
lotus
d9d9c522a1 addendum to last commit
moved recrawl times for standard profiles to constants
calculate new specific dates in cleanup job

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 13:20:18 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
danielr
3c68905540 remove redundant null checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 08:37:39 +00:00
danielr
753a1ae430 - changed default browser from netscape to firefox
- fixed "Inefficient use of keySet iterator instead of entrySet iterator" [WMI_WRONG_MAP_ITERATOR, FindBugs]
- fixed some possible null pointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 07:54:56 +00:00
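Note on 753a1ae430: the FindBugs warning WMI_WRONG_MAP_ITERATOR flags loops that iterate over keySet() and then call get() for every key. A minimal sketch of the entrySet() form is shown here; the class MapIteration and its method are hypothetical and not taken from the commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class for the keySet/entrySet pattern.
final class MapIteration {
    /** Sums all values using entrySet(), so no extra lookup per key is needed. */
    static int sumValues(Map<String, Integer> counts) {
        int sum = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("example.org", 3);
        counts.put("example.net", 4);
        System.out.println(sumValues(counts));
    }
}
```

The keySet() variant would call counts.get(key) inside the loop, which costs one additional map lookup per element.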
orbiter
7989335ed6 Preparations to replace the HTCache with a new storage data structure:
- refactoring of the HTCache (separation of cache entry)
- added new storage class for BLOBs. (not used yet, this is half-way to a new structure)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-19 14:10:40 +00:00
danielr
be28af50f5 - fixed "yacy2yacy no proxy"-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-17 10:16:32 +00:00
f1ori
f99c307eff * correct debian build dependencies
* add huge mem page detection in general initscript
* disable logging completely in jmimemagic-library


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-14 21:01:21 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to final variables)
- removed some unused imports

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
a087090bbb fixed starting crawl results in "No parser available to parse mimetype 'application/octet-stream'"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-10 11:31:40 +00:00
danielr
8422ee5ec4 - fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined
- serverFileUtils.copy* use now Charset instead of String
- added some warnings for ignored exceptions


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-09 12:00:31 +00:00
hermens
cff4393f0c Fix HTCache so oldest Files get deleted first
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5041 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-08 08:06:06 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- slightly improved performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
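Note on 621b473b18: the "cast int to double or long" item refers to a common class of findbugs findings, where an int expression is evaluated before it is widened. The sketch below shows the two typical cases; the class SafeArithmetic and its methods are hypothetical examples, not code from the commit.

```java
// Hypothetical example class; shows where the cast has to go.
final class SafeArithmetic {
    /** Casting one operand first forces floating-point division. */
    static double ratio(int part, int total) {
        return (double) part / total;   // part / total alone would truncate
    }

    /** Casting one operand first forces 64-bit multiplication. */
    static long product(int a, int b) {
        return (long) a * b;            // a * b alone could overflow int
    }

    public static void main(String[] args) {
        System.out.println(ratio(1, 4));
        System.out.println(product(100000, 100000));
    }
}
```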
orbiter
ebb40d324b enhanced memory chart: shows now also the size of the word cache as third vector.
The PPM is now shown without a scale, but with a new annotation at the chart entry.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-04 10:47:26 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
lotus
0b2f67577e Index Transfer:
- fix for chunk size calculation
- fix: if chunk size was 1, an infinite selection loop ran because no entries were found. if chunk size falls to <=3 it will be set back to 500

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-27 18:53:51 +00:00
lotus
5f77f55ed7 possible fix for negative speed values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-27 06:58:35 +00:00
orbiter
50ef5c406f - refactoring of robots parser (removed opaque Objects[] result vector)
- added Allow-component to robots result object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 11:54:37 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
lotus
62afea0c9f some improvements for yacyTray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5008 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-18 14:17:52 +00:00
lotus
fa695c2d9f tray is now only shown on Windows and doesn't block on linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 19:03:38 +00:00
lotus
d77ed28e2f temporarily disabled tray because of flaws on only-shell-linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 08:41:39 +00:00
lotus
f8a1e3175e new yacyTray
this will make a YaCy icon in the tray area on supported platforms
enabled by default
the search page will open on double click

used JDIC 0.9.4 from https://jdic.dev.java.net/

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-13 07:51:45 +00:00
orbiter
7b1c9e6aee discovered and removed a (possibly large) memory leak:
many classes used the kelondroMapDataMining (was: kelondroMapObjects) which adds statistical
functions to the kelondroMap (was: kelondroObjects), but these functions were not used by these
classes. Especially the HTCACHE and robots.txt database allocate a very large number of objects
for statistical use, but never used them. By replacing the kelondroMapDataMining with the
kelondroMap object for these classes now less memory is allocated.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:34:48 +00:00
orbiter
0f5fe8cc53 refactoring of method calling for objects from kelondroMapDataMining
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:15:46 +00:00
orbiter
4acf0a61cd refactoring of kelondroObjects (mainly renaming to kelondroMap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 22:08:16 +00:00
orbiter
441e9c861e fix for npe in HTCache cleaning process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4981 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 21:30:39 +00:00
orbiter
1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, a file-AVL-based index data structure.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (bugfixing for double-loading of the same robots.txt)
- other BLOB-dependent data structures were prepared to use also the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less IO during snippet fetch and during crawling


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
orbiter
b38f467e3c better SRU compliance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 21:50:24 +00:00
orbiter
7052f2f61f - added copyright header of ResourceObserver
- commented/removed some code to eliminate code warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:40:45 +00:00
orbiter
1400cdc91e - refactoring of resourceObserver (moved it to crawler)
- partial redesign of diskUsage: little bit more functional behavior, less side effects, better error case handling
- the resourceObserver can now show an error message if the diskUsage is 'out of order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:03:37 +00:00
f1ori
b6301a54fa * added class ListDirs to provide generic listing of directories in system directories and jar-files
* yacy runs, when classes are in a jar-file (->build-jar ant-target)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 14:11:40 +00:00
lotus
f2e2d09916 - fix for index transfer
- imported a random startpoint function from plasmaDHTChunk
in case there was already a gap at the beginning of the index, the transfer process kept selecting endlessly from the first startpoint
tested & working on my index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 13:16:17 +00:00
orbiter
a6719dfd2b - refactoring of robots parser
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-05 00:35:20 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
orbiter
474659a71f - modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order
- added option to set minimum crawl delta for domains in balancer
- added default values to crawl deltas in yacy.init
- added configuration for these deltas in performance queues
- enhanced performance setting computation (more time for indexing queue for a faster flush)
- remote crawling is now enabled during local crawling if indexer has space and time for more links
- added database stub for new distributed file system
- refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 13:08:37 +00:00
orbiter
d37fd064f9 changed peer selection for search targets:
- less dht targets are selected
- more other peers are selected: all robinson peers with more than one million urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4962 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 22:42:52 +00:00
orbiter
69aac0d74c modified the diskUsage class regarding the following two aspects:
1. The dependency on the plasmaSwitchboard was used many times in the past, but this was
a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over.
2. The class is considered tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hardware should not support dynamic object allocation, but should be coded as a collection of private static methods. Therefore all class objects have been transformed into static private objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 21:47:53 +00:00
danielr
0c1dc703e4 - set staticIP at startUp
- added setting for reduced menu (simpleMenu)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-29 18:35:15 +00:00
orbiter
b928ae492a some code-cleanup and possible speed enhancements in different core methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-17 23:56:39 +00:00
orbiter
c998dc6556 - added security functions to flush url and search caches in case that memory is full
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-16 21:39:58 +00:00
danielr
68c38c2d34 - WatchCrawler shows status without JavaScript
- Performance can be scaled + DHT-profile
- names for pool-threads
- some small refactorings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 10:24:58 +00:00
orbiter
f5ef7f222e - fixed a bug in parser (directory paths had not been recognized)
- no access check when a search is made only local without snippet fetch
- added comment and status message in resourceObserver (this takes very long at startup time!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 09:54:58 +00:00
orbiter
3330181aa0 refactoring:
find a better way to store BLOBs; generalize current BLOB data structure (kelondroDyn)
and prepare it to replace it with something better. The best candidate is the kelondroHeap,
which will become the kelondroBLOBHeap;
removed also some never-used classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 23:12:24 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
det
f597185026 Initial import of the resource observer framework
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 13:10:21 +00:00
orbiter
e0e7f86f82 some bugfixes for the peer-ping process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-05 12:52:27 +00:00
orbiter
40d7f485f3 - fixed several NPE bugs
- fixed losing of own seed hash (hopefully)
- fixed a bug with crawl starts beginning with (bookmark) files
- added better IP recognition during hello process


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-04 22:24:00 +00:00
orbiter
2f381b8d7a - fixed at least two causes for a NPE after a use case switch.
A large refactoring was necessary
- added another crawl start option: automatic restriction to sub-path
- removed crawlStartSimple and renamed crawl start expert
   to crawl start (without expert)
- some changes to texts in crawl start
- added some more deletions when a web index is deleted:
   delete also queues and robots cache


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-04 21:34:57 +00:00
orbiter
2a604b7402 added superfast search result computation which can be obtained for local search when snippet fetching is disabled. An example search for the rss interface would be:
http://localhost:8080/yacysearch.rss?query=yacy&Enter=Search&contentdom=text&count=10&resource=local&verify=false
(just add "&verify=false")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-03 23:06:01 +00:00
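Note on 2a604b7402: the request from the commit message can also be assembled in code. This is only a sketch: the host, port and parameter names are copied from the URL quoted above, while the class LocalSearchUrl and the method build are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that rebuilds the example URL from the commit message.
final class LocalSearchUrl {
    /** Returns the rss search URL for a local search without snippet verification. */
    static String build(String query) {
        try {
            return "http://localhost:8080/yacysearch.rss?query="
                    + URLEncoder.encode(query, "UTF-8")
                    + "&Enter=Search&contentdom=text&count=10"
                    + "&resource=local&verify=false";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("yacy"));
    }
}
```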
orbiter
9bef20b537 - added cleanup for unused server loggings: they are removed once the client has not been seen for one hour
- removed configBasic popup trigger when no password is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-02 21:49:59 +00:00
orbiter
1a1841392c small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-26 22:55:55 +00:00
orbiter
25192e0d36 added a deletion button to indexControlRWIs that deletes the complete web index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-24 12:30:50 +00:00
orbiter
0c173821fd more access security regarding database access and snippet retrieval: restrict number of results for not-authorized searchers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-23 09:45:33 +00:00
orbiter
faed00d75d added use cases to basic configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-20 13:21:55 +00:00
orbiter
4229cd275c fixed several details about network switching, default password, random password and localhost authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-20 09:29:01 +00:00
orbiter
c1d721dd2d fix for attacks on localhost-authorized peers from web pages with links to localhost addresses:
checking of referer in access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-19 22:17:53 +00:00
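Note on c1d721dd2d: the commit message only says that the referer is checked. One possible form of such a check is sketched below. Everything in it is an assumption for illustration: the class RefererCheck, the rule that a missing referer is accepted, and the list of host names treated as local.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical check; the actual rule used in YaCy is not given in the log.
final class RefererCheck {
    /**
     * Returns true if a request may use localhost authorization:
     * either no referer was sent (assumed to be direct navigation)
     * or the referring page itself was served from localhost.
     */
    static boolean refererIsLocal(String referer) {
        if (referer == null || referer.isEmpty()) return true;
        try {
            String host = new URI(referer).getHost();
            return host != null
                    && (host.equalsIgnoreCase("localhost") || host.equals("127.0.0.1"));
        } catch (URISyntaxException e) {
            return false;   // unparseable referers are rejected
        }
    }

    public static void main(String[] args) {
        System.out.println(refererIsLocal("http://localhost:8080/Status.html"));
        System.out.println(refererIsLocal("http://evil.example/page.html"));
    }
}
```

A link on a foreign web page that points to a localhost address carries that page as referer, so a check of this kind would refuse it.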
orbiter
56a300f92a bugfix / local-search predicate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-16 21:17:55 +00:00
orbiter
2f29ab8779 more target server access security
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-16 19:50:28 +00:00
orbiter
3bd1db776a implemented switch for admin authorization from localhost:
- access is granted for localhost users to administration pages by default
- the default setting can be changed in the BasicConfig.html page
- if the BasicConfig page was accessed with post and no password was submitted, a random password is generated
- a headless installation MUST give a password upon first call of the configuration page, otherwise they will not be able to access it again
- if no password is given within 10 minutes after start-up, a random password is generated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-15 11:26:43 +00:00
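Note on 3bd1db776a: the commit message states that a random password is generated when none is given. How it is generated is not stated. A minimal sketch with java.security.SecureRandom follows; the class RandomPassword, the alphabet and the length parameter are assumptions.

```java
import java.security.SecureRandom;

// Hypothetical generator; alphabet and length are choices of this sketch.
final class RandomPassword {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns a password of the given length drawn uniformly from ALPHABET. */
    static String generate(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(16));
    }
}
```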
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
78087da287 - changed seed file storage to clear text
- fixed kill script
- fixed saving of seed file (had been corrupted by latest changes)
- some refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 20:30:44 +00:00
orbiter
5fde679acb - fixed problem in performance configuration
- extended rss fetch size for rssTerminal


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-13 15:28:55 +00:00
orbiter
239cc4428d - better domain graph, faster when more links exist, looks better
- new authorization rule: localhost is always authorized for administration. This solves many problems with ajax, and also fixed a problem in rssTerminal
- fixed a bug in RSSFeed which prevented entries from being recognized as individual, new entries
- added reloading/updating of status image on status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 22:23:29 +00:00
orbiter
415b92bb07 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1148&hilit=&p=7711#p7711
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 15:06:04 +00:00
orbiter
dd75b3cabc - patch for bad profiles
- time-out when deleting profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 14:58:56 +00:00
lotus
e021278bf0 unescape link display in search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-10 11:25:54 +00:00
danielr
74b1a60043 fixed "java.lang.NoClassDefFoundError: org/a"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-10 08:42:31 +00:00
orbiter
f42c8cf69c updated terminal and dynamic webstructure applet: can now change when crawl is running
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-09 00:01:47 +00:00
orbiter
7ec01d444a fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-08 20:25:11 +00:00
danielr
ae03a54d23 pdfParser: updated lib, fixed ClassNotFoundException: CMSError
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-08 16:55:45 +00:00
orbiter
719f5defb1 updated some graphics at new terminal_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 23:42:14 +00:00
lotus
9bc56a9edc xss protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 16:37:13 +00:00
orbiter
b32736762c enhanced rssTerminal
- 3 lines possible
- distinguishing of private and public data, if not authorized only public data is shown
- shows now more events, including local searches in clear text if user is logged in
- simplified peer events
- better recognition of 'real' new peers
- presentation of peer pings from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 23:05:48 +00:00
orbiter
fbb712c669 refactoring:
moved importer classes to crawler and plasma package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 13:44:38 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in an indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switching are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
d0678f7ab9 refactoring as result of
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=959&p=7560#p7560

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4752 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-01 22:40:42 +00:00
orbiter
483e9a2066 - shifted tld recognition methods from yacyURL to serverDomains
- changed isLocal Property in such a way that it is possible to see if a domain is in the internet (and not intranet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-30 23:06:42 +00:00
orbiter
a3df23659c re-implementation of charset checking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-30 13:23:05 +00:00
orbiter
32b5b057b9 - modified, simplified old kelondroHTCache object; I believe it should be replaced by something completely new
- removed tree data type in kelondroHTCache
- added new class kelondroHeap; may be the core for a storage object that will once replace the many-files strategy of kelondroHTCache
- removed compatibility mode in indexRAMRI


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-29 22:31:05 +00:00
orbiter
88216c1f1f fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1103&hilit=&p=7362#p7362
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:59:20 +00:00
orbiter
d0b893523e - protection against RAM overflow caused by new peer rss news
- more XSS protection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:53:04 +00:00
orbiter
685794e7e7 fix for parser/encoding Exception
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111&hilit=&sid=55a320b54e1e3bda9410e7c50b5147f1&p=7431#p7431

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 22:14:45 +00:00
orbiter
9935e83c86 added new news window into the status page. At this moment it is just a test.
The news inside the window are about peer arrivals and departures, remote search accesses and crawls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-26 01:00:10 +00:00
orbiter
bac38cfa18 added very rudimentary peer news as rss feed. An example can be retrieved with
http://localhost:8080/xml/feed.rss?channel=PEERNEWS
to be extended and integrated in interface ...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 23:30:13 +00:00
orbiter
724bbdf9b2 refactoring of RSS reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4736 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 21:31:07 +00:00
orbiter
b9a2a2d287 more search performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 15:09:06 +00:00
orbiter
ff755fb858 small corrections and enhancements after search timing profiling
search should be a little bit faster now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 13:31:55 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is that a local search will not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00