Commit Graph

512 Commits

Author SHA1 Message Date
orbiter
4acf0a61cd refactoring of kelondroObjects (mainly renaming to kelondroMap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 22:08:16 +00:00
orbiter
f7aaeb3fad created new main menu entry 'Customization and Integration'
- moved some already existing servlets to this menu
- renamed the skin servlet to appearance
- added a set-to-default-button to the search page appearance setting
- removed the peer profile servlet which is now replaced by a field in the new appearance servlet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 19:57:09 +00:00
orbiter
1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, a file-AVL-based index data structure.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (bugfixing for double-loading of the same robots.txt)
- other BLOB-dependent data structures were prepared to use also the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less IO during snippet fetch and during crawling


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
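The "flat file with an index in RAM" idea from 1e6d12f146 could look roughly like the sketch below. The class name FlatBlobHeap and its methods are hypothetical, and this is not the kelondroBLOBHeap file format. In particular, the sketch does not persist the index, so a real implementation would have to store keys and lengths in the file and rebuild the index on startup.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: variable-length values appended to one flat file,
// with the key -> (offset, length) index held only in RAM.
class FlatBlobHeap implements AutoCloseable {
    private final RandomAccessFile file;
    private final Map<String, long[]> index = new HashMap<>();

    FlatBlobHeap(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Append the value at the end of the file and remember where it lives.
    synchronized void put(String key, byte[] value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(value);
        index.put(key, new long[] { offset, value.length });
    }

    // One lookup in RAM, then a single seek and read on disk.
    synchronized byte[] get(String key) throws IOException {
        long[] entry = index.get(key);
        if (entry == null) return null;
        byte[] value = new byte[(int) entry[1]];
        file.seek(entry[0]);
        file.readFully(value);
        return value;
    }

    @Override
    public synchronized void close() throws IOException {
        file.close();
    }
}
```

Because a lookup touches the disk only once, a structure like this is consistent with the commit's remark that there should be less IO during snippet fetch and crawling.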
orbiter
81f75f5056 - removed unnecessary classes (these objects are much easier to handle using generics)
- generalized BLOB referencing. This is the preparation to use another BLOB class, the kelondroHeap


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 23:52:53 +00:00
orbiter
a6719dfd2b - refactoring of robots parser
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-05 00:35:20 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
danielr
68c38c2d34 - WatchCrawler shows status without JavaScript
- Performance can be scaled + DHT-profile
- names for pool-threads
- some small refactorings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 10:24:58 +00:00
orbiter
3330181aa0 refactoring:
find a better way to store BLOBs; generalize current BLOB data structure (kelondroDyn)
and prepare it to replace it with something better. The best candidate is the kelondroHeap,
which will become the kelondroBLOBHeap;
removed also some never-used classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 23:12:24 +00:00
danielr
4b71912e76 fixed wrong class name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 17:13:31 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
apfelmaennchen
2113672bf2 small fix on tag comparator functions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-12 15:05:27 +00:00
orbiter
fbb712c669 refactoring:
moved importer classes to crawler and plasma package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 13:44:38 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in an indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods using static access to yacySeedDB had to be rewritten. A particular problem was the port forwarding methods, which were tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore port forwarding has been deleted (I guess nobody used it, and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A side effect is that every network will create a different local seed file, which is OK, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switching are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
1995faef8d - refactoring of Collage back-end: move to plasma package
- renamed also the plasmaCrawlResults to have a consistent naming for url and image queues
- added a double-check for the images
- added additional queues for the images: all worse-quality images go there, so the queue can be used also if no sizes are given; no image is lost
- added a cleanup for the stacks so they cannot flood the memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-21 22:42:49 +00:00
orbiter
8313d58ae7 - integrated the collage into the Web Visualization menu
- added a counter for the public and private queue on the page (testing..)
- fixed wrong public/private categorization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 15:45:57 +00:00
orbiter
82bf9ac1c8 - added Collage servlet from datengrab and modified it:
* all images are queued
* private/public is respected
* inserted into switchboard
* added collageQueue class that stores all the queued images

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 13:24:21 +00:00
orbiter
202a3adb3e refactoring of HttpClient Writer processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 22:47:05 +00:00
orbiter
e356625b22 - refactoring of stream copy handling to support time-consuming operations
- made usage of BufferedStreams explicit to distinguish the different copy methods in serverFileUtils (byte-by-byte and using its own buffer)
- introduced another timeout setting (java internal property)
- more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 09:53:07 +00:00
orbiter
c3342e1178 - removed class with only one static method
- removed connection method with too long time-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 23:35:20 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
apfelmaennchen
bc3d3b4c97 fixed rebuildTags() to correctly rebuild folders...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-03 22:36:27 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
lulabad
9ecc17baef fixed double Blog entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 13:03:40 +00:00
lulabad
94e256e13b * removed single Blogview, now links direct to BlogComments.html
* some other small changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 09:32:29 +00:00
lulabad
00f5f917de - more refactoring to blog
- fixed moderate comment bug. see http://forum.yacy-websuche.de/viewtopic.php?f=9&t=860

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-12 19:17:17 +00:00
orbiter
7f445f34a6 please enable the Java 5-specific warnings!
(an unboxing warning pointed to a program bug, and a type declaration was missing)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-10 22:50:09 +00:00
lulabad
c1b9a03304 * some refactoring to Blog
* changed default sort order to reverse (newest first)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-10 21:31:11 +00:00
lulabad
766a04bc06 fixed sort problem in Blog. see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=639
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-10 18:35:28 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiple calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
lulabad
8358652fa9 some small changes to blog
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-06 21:15:24 +00:00
lulabad
6a85764e1a Second bugfix for numberbug in Blog.
This update automatically fixes existing blog entries.
A backup is not needed but is always a good idea ;)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-05 21:52:14 +00:00
lulabad
40a0591942 Fixed numberbug in Blog, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=639. This won't fix existing blog entries (comes later).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4443 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 21:46:18 +00:00
orbiter
7d875290b2 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 22:13:30 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
orbiter
0f5c4abaca more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 10:12:48 +00:00
orbiter
4a80902081 - added ViewProfile as rdf in foaf syntax
- added link to rdf and vCard version on html page
- can be seen on http://localhost:8080/ViewProfile.html?hash=localhash
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 18:21:08 +00:00
apfelmaennchen
b1fae9b5af fixed import Netscape Bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 19:22:36 +00:00
apfelmaennchen
f3a9e9c542 added getFolderList() to bookmarksDB
added cleanTagsString() to bookmarksDB
added getFoldersString() to Bookmark
modified getTagsString() to exclude folderTags

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 20:11:57 +00:00
apfelmaennchen
e81bced2bd reorganized the code and adjusted getTagIterator() to suit folders
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 19:08:32 +00:00
borg-0300
53367d941a more information (BASE64)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:24:24 +00:00
apfelmaennchen
704de4dee8 Added new function - needed for restricting the tag cloud
public Iterator getTagIterator(String tagName, boolean priv)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 15:58:47 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
fuchsi
d517e96714 last cleanup bits to serverDate before the release. only safe refactoring (method renaming) changes outside of serverDate.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-21 00:53:46 +00:00
hermens
4748d5c1ab Some enhancements to time management:
- remove unnecessary generation of Calendar and Date objects
- synchronized SimpleDateFormat objects in blog-, message- and wikiBoard
- correct use of TimeZones and SimpleDateFormats



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-20 17:11:35 +00:00
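SimpleDateFormat is not thread-safe, which is presumably why 4748d5c1ab synchronizes the shared instances. A minimal sketch of that pattern follows; the class name SharedDateFormat and the pattern string are assumptions for illustration, not the code in blog-, message- or wikiBoard.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: one shared SimpleDateFormat with a fixed time zone,
// guarded by a lock because SimpleDateFormat is not thread-safe.
class SharedDateFormat {
    private static final SimpleDateFormat FORMAT =
            new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);

    static {
        // A fixed zone keeps the output independent of the machine's default.
        FORMAT.setTimeZone(TimeZone.getTimeZone("GMT"));
    }

    static String format(Date date) {
        synchronized (FORMAT) {
            return FORMAT.format(date);
        }
    }

    static Date parse(String text) throws ParseException {
        synchronized (FORMAT) {
            return FORMAT.parse(text);
        }
    }
}
```

Sharing one instance avoids creating a new formatter per call, at the cost of a short lock around each format or parse.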
fuchsi
1cb6e431a6 Replace the ISO8601 aka W3C datetime parser by one that supports every representation allowed by this standard, see http://www.w3.org/TR/NOTE-datetime
- especially useful for sitemap parsing, where this date format is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-19 22:45:58 +00:00
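The W3C note referenced in 1cb6e431a6 allows six granularities, from a bare year up to fractional seconds with a time zone designator. The sketch below shows one way to accept all of them with a single regular expression. The class name W3CDateTime is hypothetical, this is not the parser from the commit, and treating date-only forms as midnight UTC is an assumption.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a parser for the W3C datetime profile of ISO 8601.
class W3CDateTime {
    // YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]], where TZD is Z or +hh:mm or -hh:mm.
    private static final Pattern FORMAT = Pattern.compile(
            "(\\d{4})(?:-(\\d{2})(?:-(\\d{2})"
            + "(?:T(\\d{2}):(\\d{2})(?::(\\d{2})(?:\\.(\\d+))?)?(Z|[+-]\\d{2}:\\d{2}))?)?)?");

    static Instant parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a W3C datetime: " + text);
        }
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);
        int nanos = 0;
        if (m.group(7) != null) {
            // Pad or truncate the fraction to exactly nine digits (nanoseconds).
            nanos = Integer.parseInt((m.group(7) + "000000000").substring(0, 9));
        }
        // Assumption: forms without a time are interpreted as midnight UTC.
        ZoneOffset offset = m.group(8) == null ? ZoneOffset.UTC : ZoneOffset.of(m.group(8));
        return LocalDateTime.of(year, month, day, hour, minute, second, nanos).toInstant(offset);
    }

    private static int intOr(String group, int fallback) {
        return group == null ? fallback : Integer.parseInt(group);
    }
}
```

A time without a zone designator, such as 1997-07-16T19:20, is rejected here because the note requires the designator whenever a time is present.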
fuchsi
33ee6745f6 more cleanup in serverDate
- remove direct accesses to SimpleDateFormat fields in serverDate and use the static parse... methods instead
- remove nowDate() as a Date doesn't store timezone information and a new Date() is always faster
- default formatter methods use a GMT timezone by default now; this is important for interchangeability, as some date formats we use don't include a timezone offset.
- continued renaming and rearranging (formatter) methods. All should follow the general naming scheme formatWHAT(...)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-19 19:39:19 +00:00
fuchsi
21b8d1b918 small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-14 19:17:54 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster than before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- previously, absolute paths would be expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now you can put nearly all dynamically generated data with a configurable path in a location outside of YaCy's root dir without having to use symlinks (probably good for third party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
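The path rule described in 425e4ead66 (absolute values are used as they are, relative values are resolved against the application root) can be sketched as follows. ConfigPaths.resolve is a hypothetical stand-in and does not reproduce the signature or behaviour of abstractServerSwitch.getConfigPath.

```java
import java.io.File;

// Hypothetical sketch of resolving a configured path against the application root.
class ConfigPaths {
    static File resolve(File appRoot, String configured, String defaultValue) {
        // Fall back to the default when the setting is missing or empty.
        String value = (configured == null || configured.isEmpty()) ? defaultValue : configured;
        File path = new File(value);
        // Absolute paths are returned unchanged; relative ones hang off the root.
        return path.isAbsolute() ? path : new File(appRoot, value);
    }
}
```

With a root of /path/to/yacy/root on a Unix-like system, a value of /a/b/c stays /a/b/c, while DATA/WORK becomes /path/to/yacy/root/DATA/WORK.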
orbiter
a31b9097a4 preparations for mass remote crawls:
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
  crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused
  as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls

After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. Also the thread pools for the crawl threads had been removed, since
creation of these threads is not resource-consuming, for a detailed explanation see svn 4106

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 01:43:20 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'), which was done with put(...) before and so was called too often, because it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_; report anything overlooked at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
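The split between put, putNum and putHTML described in 0e1738899f can be illustrated with a small value map. TemplateValues is a hypothetical class written for this illustration; it borrows the three method names from the commit message but is not the serverObjects implementation, and the escape method covers only the four characters listed for the "forXML" mode.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a template value map with three kinds of put.
class TemplateValues {
    private final Map<String, String> values = new HashMap<>();
    private final Locale locale;

    TemplateValues(Locale locale) {
        this.locale = locale;
    }

    // Stored exactly as given.
    void put(String key, String value) {
        values.put(key, value);
    }

    // Numbers are converted to a plain string, without locale formatting.
    void put(String key, long value) {
        values.put(key, Long.toString(value));
    }

    // Numbers are formatted with the grouping rules of the configured locale.
    void putNum(String key, long value) {
        values.put(key, NumberFormat.getIntegerInstance(locale).format(value));
    }

    // Markup characters are escaped before storing.
    void putHTML(String key, String value) {
        values.put(key, escape(value));
    }

    String get(String key) {
        return values.get(key);
    }

    // '&' must be replaced first so already-produced entities are not escaped twice.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }
}
```

Escaping only in putHTML keeps the common put path cheap, which matches the commit's observation that escaping was previously applied far more often than needed.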
fuchsi
06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 02:12:31 +00:00
fuchsi
9b0948cb4c gnarf. mixed up the positions. finally fixed...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 10:58:01 +00:00
fuchsi
c0f5fc51ef bugfix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4142 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 10:47:48 +00:00
fuchsi
c5a8585ac6 fix more encoding problems in yacysearch.rss.
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encoded in a wrong way.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 09:21:03 +00:00
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced old first hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstrapping is done before the own hash is computed
this made further redesigns in peer initialization order necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
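"Finding a gap in the current dht", as mentioned in 01e0669264, can be pictured as choosing the middle of the largest unoccupied arc on a ring of peer positions. The sketch below assumes positions are plain long values in the range [0, ringSize) and that ringSize is small enough for the additions not to overflow. DhtGap is a hypothetical name, and YaCy's actual mapping from seed hashes to positions is not reproduced here.

```java
import java.util.Arrays;

// Hypothetical sketch: pick the midpoint of the largest gap between peers on a ring.
class DhtGap {
    static long middleOfLargestGap(long[] positions, long ringSize) {
        if (positions.length == 0) return 0;
        long[] sorted = positions.clone();
        Arrays.sort(sorted);
        int last = sorted.length - 1;
        // Start with the gap that wraps around from the last peer to the first.
        long bestStart = sorted[last];
        long bestGap = sorted[0] + ringSize - sorted[last];
        for (int i = 1; i < sorted.length; i++) {
            long gap = sorted[i] - sorted[i - 1];
            if (gap > bestGap) {
                bestGap = gap;
                bestStart = sorted[i - 1];
            }
        }
        return (bestStart + bestGap / 2) % ringSize;
    }
}
```

This also shows why bootstrapping has to happen first: without knowing the other peers' positions there is no gap to look for.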
orbiter
842308ea97 - redesigned crawl start menu, integrated monitoring pages
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed various problems with indexing queues
- enhanced indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown

attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start a new crawl from them. This is useful if one wants to re-crawl specific pages and wants to use an old crawl profile.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:21:31 +00:00
orbiter
11b4f80bde - fixed non-closing client connections
- added client connection tracker in connections servlet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 21:36:08 +00:00
orbiter
1488769e1f cleanup of unmaintained and outdated performance methods:
removed object pools in httpc. Object pooling is not recommended,
if the creation of the object is not time-intensive. Object pools are only useful,
if there is much computation necessary to create some basic data that is stored
in the object pool and can be re-used. This does not apply to object pools in YaCy.
Object pooling of client sessions would make sense if they would allow re-use of
living connections to other yacy clients. But every connection is closed after usage
of an object in the client pool, therefore the YaCy server client objects are not such
that hold hardware/network-allocated entities.
See:
http://www.javaperformancetuning.com/news/qotm033.shtml
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling
http://docs.sun.com/source/816-7159-10/pt_chap5.html
http://www.microjava.com/articles/techtalk/recylcle2


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 20:49:52 +00:00
fuchsi
5b0c1449e1 various fixes and cleanups for blacklist handling:
1. avoid adding duplicate file name entries in config properties for lists, 
2. correctly merge all path masks from all list files for the same host masks,
3. rewrite helper methods using standard java methods for Collection transformations,
4. merged various methods with identical functionality for different Collection implementations into one,
5. minor refactoring to improve code readability.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 06:20:27 +00:00
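Item 2 of 5b0c1449e1, merging all path masks from all list files for the same host mask, might be sketched as below. The line format "hostmask/pathmask" and the class name BlacklistMerge are assumptions made for this illustration, not the blacklist code from the commit.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: collect path masks per host mask across several list files.
class BlacklistMerge {
    static Map<String, Set<String>> merge(List<List<String>> files) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (List<String> file : files) {
            for (String line : file) {
                // Assumed entry format: everything before the first '/' is the host mask.
                int slash = line.indexOf('/');
                if (slash < 0) continue;
                String host = line.substring(0, slash);
                String path = line.substring(slash + 1);
                // A set per host drops duplicates that appear in several files.
                merged.computeIfAbsent(host, k -> new TreeSet<>()).add(path);
            }
        }
        return merged;
    }
}
```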
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Some parts that had already been slated for abandonment were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
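The fix described in daf0f74361, carrying the hash inside the URL object so it can be filled from the database instead of recomputed, is essentially a cached field. The sketch below uses a hypothetical HashedUrl class and a cheap placeholder instead of the real hash function, which according to the commit involves a DNS lookup. The counter exists only to make the saving visible.

```java
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a URL object that carries its hash and computes it at most once.
class HashedUrl {
    // Counts how often the (supposedly expensive) computation actually ran.
    static final AtomicInteger computations = new AtomicInteger();

    private final URI uri;
    private String hash; // null until computed or supplied

    HashedUrl(URI uri) {
        this(uri, null);
    }

    // Used when the hash is already known, e.g. read together with the URL from storage.
    HashedUrl(URI uri, String storedHash) {
        this.uri = uri;
        this.hash = storedHash;
    }

    synchronized String hash() {
        if (hash == null) {
            hash = computeHash(uri);
        }
        return hash;
    }

    // Placeholder only: stands in for the expensive real hash computation.
    private static String computeHash(URI uri) {
        computations.incrementAndGet();
        return Integer.toHexString(uri.toString().hashCode());
    }
}
```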
orbiter
f9e6cf6a3d more refactoring of search:
integrated first version of ssi-using search interface,
but the function is currently disabled


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-28 12:15:46 +00:00
orbiter
e76e996737 fixed umlaut problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-10 14:10:57 +00:00
orbiter
62347b50f4 added security layer for ViewImage:
- images may be requested by localhost and authorized users only, if the request is done using a clear-text URL
- the image may be requested also using a code that can be a license to retrieve a URL for everyone
- some servlets produce URL licenses for ViewImage, like image search results


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 23:06:53 +00:00
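The URL license idea in 62347b50f4 can be read as handing out an unguessable code that stands in for a clear-text URL. A minimal sketch under that reading follows. UrlLicenses, grant and lookup are hypothetical names, and the sketch has no expiry or one-time-use rule, since the commit message does not say whether the original has either.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: random codes that license retrieval of a specific URL.
class UrlLicenses {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> licensed = new ConcurrentHashMap<>();

    // Called by a trusted page (e.g. a search result list) for each image it embeds.
    String grant(String url) {
        String code = Long.toHexString(random.nextLong());
        licensed.put(code, url);
        return code;
    }

    // Returns the licensed URL, or null if the code is unknown.
    String lookup(String code) {
        return code == null ? null : licensed.get(code);
    }
}
```

A request carrying a valid code can then be served to anyone, while a request carrying a clear-text URL would still need the localhost or authorization check.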
orbiter
57a5b6fa71 some generalization of remote proxy configuration and setting handling in httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-02 00:42:37 +00:00
orbiter
367fc28928 corrected Brausse->Brausze
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-01 22:15:51 +00:00
orbiter
e76fe1c078 - replaced unicode characters in copyright holder name ('Brausse')
- more logging for bootstrap seedlist loading
- larger DHT chunks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-31 10:00:17 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
b6d9cca67e - fixed problem with yacyVersion and own version generation
- within this context: generalized date format handling
- extended Update interface:
 * a version lookup can be triggered manually
 * a complete lookup + download + re-boot process can be triggered with one click

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 23:47:21 +00:00
orbiter
9da0e53fe8 repaired rss feed reader
- removed old rss parser
- removed unused rss parser libraries
- added new rss reader
- added previously removed FeedReader_p.java and adapted it to new rss parser
- adapted parser interface for rss indexing to new rss parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 10:07:48 +00:00
orbiter
bec4dbc753 added options and execution methods for automated updates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-12 16:23:33 +00:00
orbiter
a9e73b6852 fixed great mess with localization paths. the problem was:
automatic re-translation after update did not work. hopefully it does now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-04 10:32:30 +00:00
orbiter
36a37f758b fix for oom exception during release download
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=101&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-03 22:55:47 +00:00
low012
2158f83d43 *) cosmetics, changed a character to get rid of "warning: unmappable character for encoding UTF8" during compilation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3946 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-01 17:17:29 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
allo
6074264267 dynamic rights.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-09 19:34:09 +00:00
allo
854eb1492f .yacy /.yacyh urls for the feedreader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-09 12:56:08 +00:00
allo
7a5b22a0b8 Integration of FeedReader in Bookmarks.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 23:27:42 +00:00
allo
7921f07c9d userDB fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 16:11:10 +00:00
allo
7b2e1bb8f2 Feedparser with reflection.
TODO: This needs a special build.xml entry


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 14:31:09 +00:00
karlchenofhell
8bff810d19 - fixed logging output of serverMemory.request()
- don't start up if DATA/yacy.running exists as this is usually a sign of an already started yacy-instance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 12:45:03 +00:00
karlchenofhell
f05ca43780 - the wiki-parser works for remote wiki-code now, not displaying links anymore as if they were local (ViewProfile comment)
- fixed wrong link to CrawlStart on Status-page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 11:35:48 +00:00
karlchenofhell
30c3d909b1 - fixed charset problem in ConfigProfil_p.html (use accept-charset="UTF-8" in forms)
- fixed wrong XML output if no peers are known in Network.xml
- simplified parsing of table properties in wikiCode and ZTableToken
- reimplemented GC heuristics. They are needed to constantly ensure that an amount of free memory is available which is higher than Java's max. limit for performing a Full GC (please use serverMemory.request(long, boolean) rather than serverMemory.available(long, boolean) to provide data for averaging over the last GCs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-05 11:37:19 +00:00
allo
4392ee0c51 BugFix for typo and wrong include
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3789 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-04 16:06:58 +00:00
allo
d1e1580223 Surftips Blacklist
Blacklists List Hardcoded instead of only updated on firststart / migration.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-04 15:36:10 +00:00
hydrox
44bac7dea1 *) blog-comments can now be moderated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-01 06:02:55 +00:00
allo
957a25afff getRight(rightName) instead of get...Right()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-31 14:48:20 +00:00
low012
a0149317ac *) fixed bug where headlines were added to directory of a wiki page multiple times (http://www.yacy-forum.de/viewtopic.php?t=4034)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-25 16:36:09 +00:00
karlchenofhell
baa9402b97 - wiki-parser is now configurable via the config setting wikiParser.class which holds the class-name for the parser to use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 16:19:25 +00:00
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and its license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
theli
b1680ab71f *) bugfix for ArrayIndexOutOfBoundsException in robots-parser (thanks to low012)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 13:39:08 +00:00
theli
9a4375b115 *) robots.txt: adding support for crawl-delay
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 13:00:42 +00:00
allo
65a8a9fc58 fix for nullpointer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-14 16:56:13 +00:00
orbiter
139c59ebbd - fixed dht selection problem: the seed tables used a wrong ordering
- cleaned some code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 17:59:36 +00:00
theli
cb43ae11ba *) Bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 12:57:22 +00:00
theli
0b5fc3c28c *) moving date functions to serverDate class
*) Sitemap-parser
   - logging added
   - parsing of modDate added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 12:36:49 +00:00
theli
6f46245a51 *) Bookmarks: Ajax icon is displayed while loading title
*) First version of a sitemap parser added
   - currently only autodetection of sitemap files is supported
*) DB-Import restructured
   - pause/resume should work again now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 09:52:04 +00:00
orbiter
e48189c710 enhanced cluster routing
- cluster definitions can now contain an addition for local ip addresses
- cluster-cluster communication uses the local ip address instead of the global address, if one is given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 22:05:34 +00:00
theli
2399ed817c *) robots.txt parser now extracts the sitemap-URL (will be used later)
*) some javadoc added
*) junit testclass for robots.txt parser added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 15:42:38 +00:00
rramthun
e6fb6426a3 *) Some cosmetic changes and corrections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3582 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-19 16:16:54 +00:00
orbiter
40c14a4f0e - better implementation of search query properties
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 10:14:48 +00:00
allo
f4af360f7c bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-20 15:37:19 +00:00
hydrox
9b5fb3908d *) a peer-message is now created when a blog-comment is written
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-15 12:58:17 +00:00
orbiter
6ad39bae1e fixed shutdown problem
this fixes the 'inconsistency' messages during start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 08:48:47 +00:00
karlchenofhell
264a82eec8 - fix for http://www.yacy-forum.de/viewtopic.php?t=3657
- fix for http://www.yacy-forum.de/viewtopic.php?p=32758#32758
- Diff takes any objects now, not only strings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3455 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 22:04:15 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration over the newly split index, a large number of classes needed to be adapted to the new iterator classes
- added new iterator classes that support cloneable iterators
- adapted all iterator classes to implement cloneable iterators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
1cba31de43 redesigned ram organization for database caches
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of ram allocation is done with two simple limits that are relative to ram availability


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 22:43:32 +00:00
theli
bd03c6b874 *) bugfix in bookmarksDB:
   - NullPointerException when trying to get an unknown bookmark
   - bookmarks can either start with http or https

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 11:56:46 +00:00
karlchenofhell
9623bf7bbe - removed call of java 1.5 method
- added config servlet for local robots.txt
- removed YPStats_p as it is of no use anymore
- supertemplates use XHTML now
- quick-fix for http://www.yacy-forum.de/viewtopic.php?p=32296#32296

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3422 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-01 13:54:14 +00:00
karlchenofhell
a1d68fe092 - use .class rather than Class.forName for classes in class-path
- added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt
- fixed minor bugs in Blog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 22:52:22 +00:00
hydrox
54fef3574f *) missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:38:34 +00:00
hydrox
cb89c74d52 *) added blog-comments
*) removed debug-output when deleting news

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:36:01 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- added deletion from loadedURLs of URLs to be blacklisted in IndexControl_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
e3480d4ad3 fix for warning in crawl balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:54:43 +00:00
karlchenofhell
39a2000d8b - added support for [[Bookmark:$bookmarkTag|description]]-link-listings (requested by theli) to wiki-parser
- added support for <pre>-tags to wiki-parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 21:26:48 +00:00
karlchenofhell
619653c054 - fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 15:40:56 +00:00
karlchenofhell
a5a36d9252 - hopefully last fix for 1.5 methods (sorry for that, eclipse isn't that helpful in identifying those methods)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-22 08:04:09 +00:00
karlchenofhell
e97b6f0458 - we still use Java 1.4 ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:43:31 +00:00
karlchenofhell
0c7b8cf632 - added first version of new wiki-parser
- added blacklist support to manual URLFetcher stack fill
- fix for NPE: http://www.yacy-forum.de/viewtopic.php?t=3559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:31:36 +00:00
low012
801eea8849 *) Fixed bug where pairReplace() got caught in infinite recursion. (http://www.yacy-forum.de/viewtopic.php?t=3466)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 22:07:59 +00:00
karlchenofhell
d114a0136e - crawl profile: don't add null-values
- added some settings and statistics for url-fetcher 'server'-mode
- added own stack for fetchable URLs
- added possibility to fill stack via shift from peer's queues, via POST (addurls=$count and url$num=$url) or via file-upload
- added "htroot" to classpath of linux start-script

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-17 19:16:53 +00:00
orbiter
c464157a6e replaced some toString()
see http://www.yacy-forum.de/viewtopic.php?p=31151#31151

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:26:56 +00:00
(no author)
e218940293 The copyright sign "\u00A9" is already replaced by "&copy;". String "(C)" is not a unicode sequence!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 18:16:27 +00:00
low012
1bc4d8d470 *) If there is more than one pair of patterns in a line, all of them (and not only one pair) will be replaced.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 15:53:40 +00:00
low012
ea7a8cf7aa *) <hr> and <br> tags are XHTML compliant now.
*) Avoid superfluous trailing blank in non-proportional sections.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 15:03:13 +00:00
karlchenofhell
f2e6f19b90 - added versioning to Wiki
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-03 15:20:12 +00:00
karlchenofhell
02a73dce87 - added Diff-class for wiki-versioning (forthcoming, first need suitable serverObjects.put() for it)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-03 05:24:44 +00:00
orbiter
e4910f03d1 tag storage fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 11:52:15 +00:00
orbiter
991182b29b more space for bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 00:20:03 +00:00
orbiter
88fa764b64 implemented new kelondroObjects into bookmarkDB
- Bookmark-Objects are stored inside the kelondroObjects cache
- removed superfluous classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 00:17:55 +00:00
orbiter
9c05e2a820 re-design of kelondroMap
- this class is replaced by an object that can hold any type of object
- this object must be defined as a class that implements kelondroObjectsEntry
- the kelondroMap is now implemented as kelondroMapObjects

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 23:51:10 +00:00
allo
669c21db05 first version of abstracted kelondroMap Cache.
get returns a kelondroCachedObject (or in most cases a subclass of it),
or a map, which can be used to construct a kelondroCachedObject.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 19:10:55 +00:00
allo
14f2068daf some more bookmark changes towards multiuser bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3291 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-28 17:38:43 +00:00
allo
ff79c52fc0 bookmark users can now edit bookmarks.
TO COME: tag bookmarks with username, list bookmarks of a special user, filter private bookmarks for users.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-23 10:24:26 +00:00
allo
f40169fcd7 preparing multiuser bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 19:42:50 +00:00
orbiter
c0851ee943 refactoring: moved and renamed de.anomic.data.searchResults to plasma package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 00:38:03 +00:00
allo
c39dda2374 finished refactoring of searchtemplates.
now plasmaSwitchboard.searchFromLocal calculates a searchResults structure,
which is parsed in the yacysearch/detailedSearch Servlets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 10:42:36 +00:00
allo
35039982da refactoring of search process: store results in a searchResults structure. At the moment, it's just stored in it, and read from it again.
Next step: return searchResults instead of serverObjects, and parse the results in the servlets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 07:41:15 +00:00
karlchenofhell
3c43e605ba - don't accept malformed bookmarks, fix for: http://www.yacy-forum.de/viewtopic.php?t=3414 (first report)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 23:39:03 +00:00
orbiter
d07b132a0d - fixed colors of network graphic
- added option to activate write cache for seed-db
- did not activate write cache because it did not work

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 19:39:31 +00:00
allo
0c81bd39d4 XSS-safe put as default.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 14:07:54 +00:00
(no author)
37e53b4a6a replaced tree database structure for seed db by flex data structure
I don't know if this helps, we will find out...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-07 23:34:13 +00:00