Commit Graph

347 Commits

Author SHA1 Message Date
orbiter
017a01714d - enhanced logging in robots.txt parser for remote debugging
- robots.txt is now more robust against database operations

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 01:03:49 +00:00
orbiter
5a7cec59f3 moved ynetSearch to get all files out of htroot/api/util/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 00:21:56 +00:00
apfelmaennchen
a8dfe787ed - updated to jquery flexigrid 1.1
- YMarks.html automatically recognizes if a bookmark is a crawl start


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 21:45:17 +00:00
cominch
cef8ebc41d getpageinfo: checks whether there is an OAI repository behind the URL.
This check is only performed if the oai action is requested when calling getpageinfo_p.xml, e.g. getpageinfo_p.xml?actions=oai

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 12:22:19 +00:00
orbiter
eb1c7c041d write info about robots.txt evaluation into getpageinfo_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8038 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:33:54 +00:00
orbiter
f8b8c82421 - refactoring of getpageinfo_p.xml (moved out of util)
- added more logging in getpageinfo_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:22:40 +00:00
apfelmaennchen
abba31f02e - bugfix for correctly sorting ymarks
- some tuning for the autotagger (still not perfect)
- /api/ymarks/get_metadata.xml now provides info for crawlstarts
- removed unused code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-14 22:00:44 +00:00
apfelmaennchen
5f7dbe1c42 - some refactoring (ymarks)
- improvement for the autotagger (it can now create/detect multi-word tags, e.g. 'open source')



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-13 23:19:47 +00:00
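The multi-word tag detection mentioned in the commit above could, for instance, be based on counting adjacent word pairs (bigrams) and keeping those that occur often enough. The following is a minimal sketch under that assumption; the class `BigramTagger`, its tokenization and the `minCount` threshold are hypothetical and not taken from the YMarks code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class BigramTagger {
    /** Returns the word pairs occurring at least minCount times, sorted alphabetically. */
    static SortedSet<String> multiWordTags(String text, int minCount) {
        // Hypothetical tokenization: lower-case, split on anything that is not a letter or digit
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            // A leading separator produces an empty first token; skip pairs containing it
            if (words[i].isEmpty() || words[i + 1].isEmpty()) continue;
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        SortedSet<String> tags = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) tags.add(e.getKey());
        }
        return tags;
    }
}
```

With `minCount = 2`, the text "Open source search. YaCy is open source software." yields only the tag `open source`.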
apfelmaennchen
4d7ae76017 - update to jquery 1.7 (does not apply to all jquery code; the old version is also kept for compatibility)
- update to jquery-ui 1.8.16 (includes themes)
- introduced new portalsearch (as default)
- old portalsearch is still available and accessible, but will eventually be removed
- jquery and the portal search are now loaded by special header templates for maintenance reasons
- update to new autocomplete, solves bug: http://bugs.yacy.net/view.php?id=29
- many improvements to the YMarks GUI and API... more to come soon

Sorry, this is a rather large commit. I hope it doesn't break anything essential, but I need to consolidate some of my efforts in order to move ahead. The update to the portalsearch widget in particular might not be welcome, but the old one is simply incompatible with newer jquery and jquery-ui libraries, sorry. The code tree /yacy/ui/... is obsolete and will be removed in the future; by then, all productive portal searches should have migrated to the new version.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-07 20:44:58 +00:00
orbiter
a7df70221e refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-04 09:06:24 +00:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
2d03dc1804 removed unnecessary warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:37:14 +00:00
orbiter
cf8e3b0df8 small fix for count: overXX includes the count
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7915 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:25:27 +00:00
orbiter
6db8921a0f enhanced termlist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 10:23:22 +00:00
sixcooler
d40a177c05 Generation Memory Strategy fine tuning
add some log-output in termlist_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-27 15:23:24 +00:00
orbiter
a5541751a8 - added memory computation to termlist_p.xml
- added option to delete terms in termlist_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 19:13:45 +00:00
orbiter
9bdee5c71c added a servlet that produces a list of term hashes that appear more than 10000 times
see /api/termlist_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7898 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 16:49:20 +00:00
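The termlist servlet described above lists term hashes whose occurrence count exceeds 10000. The filtering step can be sketched as a simple threshold over a map from term hash to count; the class `TermList`, the in-memory map and its contents are hypothetical, since the real servlet reads the counts from YaCy's index.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TermList {
    /** Keeps only the term hashes whose count is strictly greater than threshold, sorted by hash. */
    static SortedMap<String, Integer> frequentTerms(Map<String, Integer> counts, int threshold) {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```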
sixcooler
916d79111e Runtime.maxMemory() DOES change at runtime:
I was surprised to see Total-ram > Max-ram and MemoryControl.available() < 0.
MemoryControl.available() < 0 causes errors wherever its value is used, e.g. for dimensioning buffers.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-19 12:48:50 +00:00
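The problem described in this commit shows up if the available memory is computed as `maxMemory() - totalMemory() + freeMemory()` and the JVM reports a total larger than the maximum. The sketch below assumes that formula (it is not necessarily how `MemoryControl.available()` is implemented) and clamps the result so it can never become negative; the class `MemoryEstimate` is hypothetical.

```java
class MemoryEstimate {
    /** Estimated allocatable bytes; never negative, even if total > max is reported. */
    static long available(long max, long total, long free) {
        // If total exceeds max, use total as the effective ceiling,
        // so that only the currently free memory counts as available.
        long ceiling = Math.max(max, total);
        return Math.max(0L, ceiling - total + free);
    }

    static long available() {
        Runtime rt = Runtime.getRuntime();
        // Read maxMemory() on every call instead of caching it, because it may change at runtime.
        return available(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }
}
```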
orbiter
9ebc75db4b fix for channel authorization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 23:14:02 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF-8 decoding, which uses java.nio resources:
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computational overhead than the UTF-8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
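For pure ASCII data such as base64 hashes, each character corresponds to exactly one byte, so the conversion can skip the charset encoder and decoder entirely. A minimal sketch, assuming the input really is 7-bit ASCII; the class `ASCIIConversion` is hypothetical and not YaCy's actual helper.

```java
class ASCIIConversion {
    /** Converts an ASCII-only String to bytes; characters above 0x7F are not handled correctly. */
    static byte[] getBytes(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) s.charAt(i); // valid only for 7-bit ASCII input
        }
        return b;
    }

    /** Converts ASCII bytes back to a String without a CharsetDecoder. */
    static String string(byte[] b) {
        char[] c = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            c[i] = (char) (b[i] & 0xFF); // & 0xFF avoids sign extension (ISO-8859-1 semantics)
        }
        return new String(c);
    }
}
```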
orbiter
123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently only serves indexes of the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces.

The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface already provides a partial backward index, so all these indexes only need to be joined, which is very easy. The result should be new block rank tables that all peers can compute.

To reduce the load on peers, this servlet buffers all data and refreshes it only once every 12 hours. This very slow update cycle is needed because the interface will be called round-robin by all peers once after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-15 22:57:31 +00:00
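The buffering described above (compute the data once and serve the cached copy until it is older than 12 hours) can be sketched as a small time-based cache. The class `RefreshingBuffer`, the `Supplier` producer and the injectable clock are assumptions for illustration, not the servlet's actual code.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class RefreshingBuffer<T> {
    static final long TWELVE_HOURS_MS = 12L * 60 * 60 * 1000;

    private final Supplier<T> producer;   // expensive computation, e.g. building the host index
    private final LongSupplier clock;     // current time in ms; injectable to keep the logic testable
    private final long maxAgeMs;
    private T value;
    private long createdAt;

    RefreshingBuffer(Supplier<T> producer, LongSupplier clock, long maxAgeMs) {
        this.producer = producer;
        this.clock = clock;
        this.maxAgeMs = maxAgeMs;
    }

    /** Returns the buffered value, recomputing it only if it is missing or too old. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (value == null || now - createdAt >= maxAgeMs) {
            value = producer.get();
            createdAt = now;
        }
        return value;
    }
}
```

In a servlet one would typically pass `System::currentTimeMillis` as the clock and `RefreshingBuffer.TWELVE_HOURS_MS` as the maximum age.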
orbiter
5b579e21a3 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:21:40 +00:00
apfelmaennchen
61c9a791c4 YMarks: sidebar with tabs for tags and folders
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-06 21:36:35 +00:00
apfelmaennchen
8b8db2aaba YMarks: some small changes/fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-03 21:21:06 +00:00
apfelmaennchen
441035f1f4 YMarks: some improvements to flexigrid quick search on YMarks.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 20:11:58 +00:00
orbiter
6fa439c82b - refactoring of robots
- added option to crawler to send error-URLs to solr
- changed solr scheme slightly (no multi-value fields where there are no multiple values)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 14:05:51 +00:00
apfelmaennchen
e7c2ea193b YMark:
- general improvements on importers, especially on auto tagging
- added get_tags (needed for tag clouds etc.)
- improved flexigrid support
- added YMarks.html (not fully working) that will eventually replace Bookmarks.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-01 21:42:48 +00:00
apfelmaennchen
b2281f0b7d YMark: intermediate work towards flexigrid support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-20 22:33:01 +00:00
apfelmaennchen
60412d2bb3 YMark:
- more refactoring >> YMarkEntry
- integration of SurrogateReader as bookmark importer
- various small bug fixes e.g. get_xbel.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-18 21:42:14 +00:00
apfelmaennchen
62855f9567 YMark: code clean up and some small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 21:19:42 +00:00
apfelmaennchen
667e912b19 YMark:
- some improvements to firefox json bookmark importer
- test import with: /api/ymarks/test_import.html
- view ymarks with: /api/ymarks/test_treeview.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 09:09:33 +00:00
apfelmaennchen
a0e4960a4d YMark:
- first attempt for a firefox json bookmark importer
- added JSON library json-simple-1.1.jar

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 20:58:58 +00:00
orbiter
19fd13d3bc Added federated index storage to solr.
YaCy now supports storage to remote solr indexes.
More federated storage (and search) methods may follow.

The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can serve as the solr configuration.
This is also the same scheme that solr uses if documents are imported with apache tika.

federated solr storage is switched off by default.

To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both the YaCy and the solr index.
- to check what's in solr after indexing, open http://localhost:8983/solr/admin/

So far it is not possible to use YaCy to search within that solr index.
The storage functionality is nevertheless made available for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
   that YaCy provides (like Dublin Core reading, wikimedia dump reading, rss feed reading etc.) if people still
   want to use solr instead of YaCy.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-14 20:05:04 +00:00
apfelmaennchen
78d6d6ca06 refactoring for ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-08 21:15:10 +00:00
orbiter
b2fe4b7b1a added handling of yacybot entries in robots.txt: if such an entry addresses the YaCy peer
(directly or indirectly) and grants a crawl-delay of 0, all forced pause mechanisms in YaCy are switched off and the domain is crawled at full speed.
Crawl-delay values can be assigned to
- all yacy peers using the user-agent yacybot
- a specific peer with peer name <peer-name>.yacy or
- a specific peer with peer hash <peer-hash>.yacyh


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-03 23:39:45 +00:00
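The three addressing forms listed in the commit above can be sketched as a check of a robots.txt `User-agent` value against this peer's name and hash, combined with the `Crawl-delay: 0` condition. The class and method names are hypothetical; case-insensitive matching for `yacybot` and peer names, but case-sensitive matching for the base64 peer hash, are assumptions rather than documented YaCy behaviour.

```java
import java.util.Locale;

class YacyBotMatcher {
    /**
     * True if the robots.txt User-agent value addresses this peer:
     * "yacybot" (all peers), "<peer-name>.yacy" or "<peer-hash>.yacyh".
     */
    static boolean addressesPeer(String userAgent, String peerName, String peerHash) {
        String raw = userAgent.trim();
        String ua = raw.toLowerCase(Locale.ROOT);
        return ua.equals("yacybot")
            || ua.equals(peerName.toLowerCase(Locale.ROOT) + ".yacy")
            || raw.equals(peerHash + ".yacyh"); // base64 hashes are case-sensitive
    }

    /** Forced pauses are switched off only for a matching entry with Crawl-delay: 0. */
    static boolean fullSpeed(String userAgent, int crawlDelaySeconds, String peerName, String peerHash) {
        return crawlDelaySeconds == 0 && addressesPeer(userAgent, peerName, peerHash);
    }
}
```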
low012
1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
*) cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-01 23:32:40 +00:00
orbiter
9b25d07295 - added geo information parsing to html parser
- extended metadata information in index with geolocalisation
- added display of location in yacydoc and ViewFile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 00:49:47 +00:00
orbiter
b1a8d0c020 enhancements to web cache and less strict caching rules
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 10:35:26 +00:00
low012
2861d0888a *) simplified code
*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
orbiter
dc0db3550e avoid string conversion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-11 00:59:27 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions with a predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
orbiter
e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 09:29:05 +00:00
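Pre-sizing a `StringBuilder` means its internal buffer is allocated once instead of being copied each time it grows. A minimal sketch of the idea; the class `CapacityGuess`, the method `joinHashes` and its `hashLength` parameter are hypothetical.

```java
import java.util.List;

class CapacityGuess {
    /** Joins fixed-length hashes with commas, pre-sizing the builder from the expected length. */
    static String joinHashes(List<String> hashes, int hashLength) {
        // Each entry needs hashLength chars plus at most one separator.
        StringBuilder sb = new StringBuilder(Math.max(16, hashes.size() * (hashLength + 1)));
        for (int i = 0; i < hashes.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(hashes.get(i));
        }
        return sb.toString();
    }
}
```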
orbiter
cb1f49d0f2 replaced all 'new String' calls that used the default encoding (encoding missing) or the 'UTF-8' encoding name with a String generation method that uses a predefined Charset constant for UTF-8. This avoids a cache lookup of the Charset object, which requires hashing the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
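The difference is between naming the charset on every call, which forces a lookup of the `Charset` object by its name, and passing a `Charset` constant directly. A minimal sketch, assuming a hypothetical helper class `UTF8Strings`; on Java 7 and later, `StandardCharsets.UTF_8` provides such a constant out of the box.

```java
import java.nio.charset.Charset;

class UTF8Strings {
    // Resolved once; later calls do not need a by-name Charset lookup.
    static final Charset UTF8 = Charset.forName("UTF-8");

    static String string(byte[] b) {
        return new String(b, UTF8);   // instead of new String(b, "UTF-8")
    }

    static byte[] getBytes(String s) {
        return s.getBytes(UTF8);      // instead of s.getBytes("UTF-8")
    }
}
```

A side effect is that the `Charset` overloads do not throw the checked `UnsupportedEncodingException` that `new String(bytes, "UTF-8")` declares.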
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
low012
c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
*) cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-18 01:25:46 +00:00
orbiter
4473cf8c61 replaced utf-8 with UTF-8
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-16 13:51:30 +00:00
orbiter
c93f4dda72 - cleaned up yacy news
- removed unused methods
- avoid news generation when the peer runs in robinson mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:00:14 +00:00
orbiter
10ae8d961b - the cora package now has no dependencies on other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
orbiter
c288fcf634 redesigned CrawlStartScanner user interface and added more features:
- multiple hosts for environment scans can be given (comma-separated)
- each service (ftp, smb, http, https) for the scan can be selected
- the scan result can be accumulated or refreshed each time a network scan is made
- a scheduler was added to repeat a scan and add all found urls to the indexer automatically

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 02:15:20 +00:00
low012
6f4f957e50 *) cleaning up the code a little bit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 00:18:05 +00:00
f1ori
9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 19:20:00 +00:00
orbiter
a563b05b60 enhanced crawler:
- added a new queue 'noload' which can be filled with urls whose content is already known to be unloadable, for example because no parser is available or the file is too big
- the noload queue is emptied by the parser process, which indexes only the file names
- the 'start from file' functionality now also reads from ftp crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-11 00:31:57 +00:00
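The routing decision described in the commit above can be sketched as follows: a URL goes to the noload queue when no parser is known for its file extension or its announced size exceeds the limit, so that only its file name gets indexed. The class `CrawlRouter`, the extension-based parser check and the size limit are hypothetical choices for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;

class CrawlRouter {
    final Deque<String> loadQueue = new ArrayDeque<>();
    final Deque<String> noloadQueue = new ArrayDeque<>();
    private final Set<String> parsableExtensions;
    private final long maxFileSize;

    CrawlRouter(Set<String> parsableExtensions, long maxFileSize) {
        this.parsableExtensions = parsableExtensions;
        this.maxFileSize = maxFileSize;
    }

    /** Queues a URL for loading, or for name-only indexing if its content cannot be loaded usefully. */
    void enqueue(String url, long announcedSize) {
        String path = url.toLowerCase(Locale.ROOT);
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        // Only a dot after the last slash marks a file extension
        String ext = (dot > slash) ? path.substring(dot + 1) : "";
        // URLs without an extension (e.g. directories, pages) are assumed to be parsable
        boolean parsable = ext.isEmpty() || parsableExtensions.contains(ext);
        if (!parsable || announcedSize > maxFileSize) {
            noloadQueue.add(url);
        } else {
            loadQueue.add(url);
        }
    }
}
```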
apfelmaennchen
737aaf6952 various small changes to ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-25 21:16:47 +00:00
apfelmaennchen
8a50670546 some code clean up for the last post
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7338 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-24 23:40:55 +00:00
apfelmaennchen
442497868d another step towards an auto tagging function for YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-24 23:26:29 +00:00
low012
dad5818b40 *) cleaning up the code a little bit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-24 01:31:41 +00:00
low012
eb79b952ef *) cleaner code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7331 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-21 03:39:53 +00:00
low012
38fdf43587 *) renamed classes according to standard Java coding conventions
*) String.isEmpty() was introduced in Java 1.6, but we still use Java 1.5

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-21 01:29:32 +00:00
apfelmaennchen
54e63b556e intermediate step for a YMark auto-tagging function based on word frequencies.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-17 15:17:29 +00:00
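A word-frequency based auto-tagger can, in its simplest form, count the words of a document, ignore very short ones and keep the most frequent ones as tags. A minimal sketch under those assumptions; the class `FrequencyTagger`, the minimum word length and the alphabetical tie-breaking are hypothetical, not the YMarks implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class FrequencyTagger {
    /** Returns up to n most frequent words with at least minLength chars; ties sorted alphabetically. */
    static List<String> topWords(String text, int n, int minLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (w.length() >= minLength) counts.merge(w, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            tags.add(entries.get(i).getKey());
        }
        return tags;
    }
}
```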
apfelmaennchen
403ee9c014 added a drill-down for metadata and word count to /api/ymarks/test_treeview.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-16 00:48:38 +00:00
apfelmaennchen
f147a022f8 enabled YMark Import for /Table_YMark_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-13 10:32:37 +00:00
apfelmaennchen
94a9be18a4 added a ymark table administration: /Table_YMark_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-10 22:53:27 +00:00
apfelmaennchen
25339f93c7 more updates to ymarks
- working xbel import/export
- exported xbel includes yacy specific metadata but still validates against PUBLIC DTD


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-09 17:01:31 +00:00
apfelmaennchen
cdd65aca71 update to ymarks
- get_xbel.xml is almost working
- started ymark api documentation info.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-07 20:03:01 +00:00
apfelmaennchen
808edffaf6 ymarks
- some refactoring
- working xbel and html import (/api/ymarks/test_import.html)
- working treeview (/api/ymarks/test_treeview.html)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7312 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-06 20:26:13 +00:00
apfelmaennchen
43586a2ace an update to ymarks (please test if you wish):
- import HTML (e.g. FF export) via /api/ymarks/import.html
- view your import via /api/ymarks/test.html
- get an xml list via /api/ymarks/get_ymark_list.xml?tags=&folders=
- delete bookmark tables via standard interface /Tables_p.html
it is still very experimental!! 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-03 22:52:03 +00:00
apfelmaennchen
f5324b27f2 more updates to the new bookmarks (ymarks)....
- split YMarkTables and YMarkIndex in two different classes
- HTML import is working properly
- XBEL import is still broken


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-03 06:47:02 +00:00
orbiter
70c95608d4 Added CORS Access header for yacysearch.rss output
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-02 16:28:40 +00:00
apfelmaennchen
efe0667fdd more new bookmark (ymarks) code with experimental html and xbel import
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-28 15:24:15 +00:00
f1ori
7d8de34778 * added a bit of documentation to DigestURI; use DigestURI(string) instead of DigestURI(string, null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-26 16:10:20 +00:00
apfelmaennchen
d0e6c03b51 some updates to the new bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7272 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-25 22:44:05 +00:00
apfelmaennchen
9c94ebdee4 small changes to new bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-22 13:14:09 +00:00
apfelmaennchen
244b56e9d3 an update to the new bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7264 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-21 19:18:17 +00:00
apfelmaennchen
f035f257da added some more bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7261 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-18 21:09:41 +00:00
apfelmaennchen
a79728b97d some updates to experimental bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7254 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-17 09:58:50 +00:00
apfelmaennchen
ef782cd026 and even more experimental bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7253 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-16 10:20:41 +00:00
apfelmaennchen
7aca763ca8 Some more experimental bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7250 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 12:53:41 +00:00
apfelmaennchen
4270ed696c Experimental code (I need to transfer the code to my macbook, sorry) for the new bookmarks API based on the Tables concept (same as for crawl starts). Currently you can add a bookmark with api/ymarks/add_ymark.xml?url=http://www.yacy.net&title=YaCy and view the result in the standard view Tables_p.html.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 05:40:19 +00:00
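Since the `url` parameter of `add_ymark.xml` is itself a URL containing `:` and `/`, a client should URL-encode the parameter values. A minimal sketch that only builds the request URL; the base address `http://localhost:8090` in the example is an assumption (the port depends on the peer configuration), and the class `YMarkRequest` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class YMarkRequest {
    /** Builds an add_ymark.xml request URL with URL-encoded parameter values. */
    static String addYmarkUrl(String peerBase, String url, String title) {
        try {
            return peerBase + "/api/ymarks/add_ymark.xml?url=" + URLEncoder.encode(url, "UTF-8")
                    + "&title=" + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory in every Java runtime
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `addYmarkUrl("http://localhost:8090", "http://www.yacy.net", "YaCy")` returns `http://localhost:8090/api/ymarks/add_ymark.xml?url=http%3A%2F%2Fwww.yacy.net&title=YaCy`.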
orbiter
2c549ae341 fixed a number of small bugs:
- better crawl start for file paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blocking
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixed sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 23:57:58 +00:00
orbiter
f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
- nobody understood the auto-dom filter without a lengthy introduction to how a crawler works
- nobody ever used the auto-dom filter other than with a crawl depth of 1
- the auto-dom filter was buggy: the filter did not survive a restart, and the search index then contained junk
- the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain
- the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives transparent feedback about what it does before it can be used
- the new option also fits into the easy site-crawl start menu

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 12:50:34 +00:00
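The Site Link-List behaviour described above (take the links found on the start URL and start a separate crawl for each, restricted to its domain) can be sketched as deriving one host-restricted filter per distinct host. The class `SiteLinkList` and the regular-expression form of the filter are assumptions; YaCy's real crawl profiles may express the restriction differently.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class SiteLinkList {
    /** Maps each distinct host in the link list to a must-match pattern restricting a crawl to it. */
    static Map<String, String> crawlFilters(List<String> links) {
        Map<String, String> filters = new LinkedHashMap<>();
        for (String link : links) {
            URI uri = URI.create(link);
            String host = uri.getHost();
            if (host == null) continue; // skip links without a host, e.g. relative links
            String prefix = uri.getScheme() + "://" + host;
            // Prefix match on scheme and host, optionally followed by a path
            // (explicit ports are not handled in this sketch)
            filters.putIfAbsent(host, Pattern.quote(prefix) + "(/.*)?");
        }
        return filters;
    }
}
```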
mikeworks
ad7efe6016 rssTerminal.html: fixed the error "'null' is null or not an object" in rss2.js when viewing the YaCy default Status page http://localhost:8080/Status.html with Internet Explorer
feed.xml: copy of feed.rss that lets Internet Explorer read the feed as well; a workaround for the fix above
The problem is described in the forum and should be fixed properly ;-( (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2766&p=20702)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-26 22:55:52 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
sixcooler
17eebd4ef8 counting crawler traffic again:
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2808

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-11 15:58:15 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
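The separation of data home and application home can be sketched as resolving the data directory from the optional path after `-start` or `-gui`, with the application home as fallback. Only the two flags and the DATA directory name come from the commit message; the class `HomePaths` and the fallback rule are assumptions.

```java
import java.io.File;

class HomePaths {
    /** Returns the data home: the path following -start or -gui if given, else appHome. */
    static File dataHome(String[] args, File appHome) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-start") || args[i].equals("-gui")) {
                return new File(args[i + 1]);
            }
        }
        return appHome;
    }

    /** The DATA directory is created below the data home. */
    static File dataDir(String[] args, File appHome) {
        return new File(dataHome(args, appHome), "DATA");
    }
}
```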
orbiter
875741bcff fix for http://forum.yacy-websuche.de/viewtopic.php?p=20657#p20657
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 10:05:04 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added scanning for rss feeds to the html parser
- rss feed addresses are stored and can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- a rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
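HTML pages usually announce their feeds with a `<link rel="alternate" type="application/rss+xml" href="...">` element in the page head. The regex-based sketch below finds such links; it is only an illustration (a real HTML parser would inspect tag attributes rather than use regular expressions), and the class `FeedLinkScanner` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeedLinkScanner {
    private static final Pattern LINK_TAG = Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the href values of <link> tags that declare an RSS feed. */
    static List<String> feedUrls(String html) {
        List<String> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String t = tag.group().toLowerCase(Locale.ROOT);
            if (!t.contains("alternate") || !t.contains("application/rss+xml")) continue;
            Matcher href = HREF.matcher(tag.group()); // match on the original text to keep the URL's case
            if (href.find()) feeds.add(href.group(1));
        }
        return feeds;
    }
}
```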
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
- added recording date, last execution date and next execution date for a scheduler (scheduler to be implemented next)
- extended database access methods for more data formats, especially for date insert/retrieval
- extended 'Steering' interface to show new database fields
- migrated Steering to new http client
- extended cora http client to transmit authentication and also added some convenience methods (http response code)
- simplified database back-end (fewer specialized methods for multiple properties)
- extended date formatter to produce a special format to show dates in html (&nbsp; in spaces of date format)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 15:56:38 +00:00
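The HTML date format mentioned in the last item can be sketched by formatting the date as usual and replacing each space with `&nbsp;`, so that browsers do not wrap the date across lines. The pattern string, the UTC time zone and the class `HtmlDate` are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class HtmlDate {
    /** Formats a date for HTML output, using non-breaking spaces. */
    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat f = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(date).replace(" ", "&nbsp;");
    }
}
```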
sixcooler
15e8c13526 ... migrating to HttpComponents-Client-4.x ...
(gzip decompression, httploader, robots, ...)

+ enable proxy-crawling while log is fine

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 01:16:26 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. With that parser infrastructure it was not possible to treat the files inside a container as individual documents. An example is an rss file, where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file, which was handled by a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. For single documents the parser method returns a set containing one document.
To comply with the new interface, the zip and tar parsers have also been completely redesigned. All parsers are now much simpler and cleaner in structure. The switchboard operations have been extended to operate on sets of parsed files, not single parsed files.
Additionally, parsing of jar manifest files has been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
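The redesigned contract (a single entry point that always returns a collection of documents, with a one-element collection for ordinary files) can be sketched as an interface. All names are hypothetical simplifications rather than YaCy's actual parser API, a `List` stands in for the set of documents, and the toy container format with one `link|text` entry per line is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ParsedDocument {
    final String url;
    final String text;
    ParsedDocument(String url, String text) { this.url = url; this.text = text; }
}

interface DocumentParser {
    /** Single entry point: containers yield many documents, plain files yield exactly one. */
    List<ParsedDocument> parse(String url, String content);
}

class PlainTextParser implements DocumentParser {
    public List<ParsedDocument> parse(String url, String content) {
        List<ParsedDocument> docs = new ArrayList<>();
        docs.add(new ParsedDocument(url, content));
        return docs;
    }
}

class LineFeedParser implements DocumentParser {
    /** Toy container format (hypothetical): each line "link|text" becomes its own document with its own URL. */
    public List<ParsedDocument> parse(String url, String content) {
        List<ParsedDocument> docs = new ArrayList<>();
        for (String line : content.split("\n")) {
            int sep = line.indexOf('|');
            if (sep > 0) docs.add(new ParsedDocument(line.substring(0, sep), line.substring(sep + 1)));
        }
        return docs;
    }
}
```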
orbiter
1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
- added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 22:39:27 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
56ff9d5fd4 - extended news size from 512 to 1024 characters
- a new news db will be created (news1024.db), the old one (news.db) can be deleted
- peers with a too large news payload are no longer ignored (they may have been invisible because of that!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 10:43:47 +00:00
orbiter
3f93a0cc8f redesign of remote proxy settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-26 00:01:16 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created the LGPLed package cora ('content retrieval api'), which may be used externally by other applications without YaCy core elements because it has no dependencies on other parts of YaCy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
9842fab6e4 - fixes to query parameter
- replaced/removed search query attribute (was old style, new is 'query' according to SRU)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-20 22:05:04 +00:00
orbiter
1defd580bc - added option to localization search to distinguish between a search for a location according to the search word only or for the relation between a web search results and locations found in the metadata fields
- used that to display two layers on map: cities and search result locations
- added many marker grafics for the display of the markers on the map
- some refactoring of the yacy news code plus bugfixes for latest move from Tree to Table data structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-19 12:53:09 +00:00
orbiter
2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
- migrated the opengeodb downloader to a new version of the opengeodb-dump


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 18:30:11 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parsers
- fixed bug in metadata read procedure
- enhanced dublin core and rss parser to understand more fields, and to understand them more accurately
- enhanced url selection when multiple urls are given in surrogates
- fix for condenser failure when the last word does not end with a termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
c45117f81f fixed dates in metadata
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-08 22:09:36 +00:00
orbiter
06ff0c5b06 fixes for metadata retrieval and presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-05 22:45:54 +00:00
suessthomas
5c5e6accdb Fixes for (X)HTML compatibility.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-05 21:12:58 +00:00
orbiter
7ab207d93a better presentation of search result metadata and fixes to htcache loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-04 20:57:09 +00:00
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused an out-of-bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
1a8a134e0c continuing the String-hash to byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
This should result in fewer calls to new String() and lower memory usage (since a String-encapsulated byte[] has 40 bytes of overhead)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
orbiter
25aef069a6 continuing the String-hash to byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
mikeworks
7a3c19846f Updated German translation de.lng: added new Table_RobotsTxt_p.html and some other changes
Changed 'Sprache' -> 'Language' in yacydoc.html and added translation in de.lng

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-29 00:56:51 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass value as byte[], not as String. This should mean that fewer
byte[] <-> String conversions are made during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
0018163c07 moved table row/column matching method from front-end to back-end
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:01:27 +00:00
orbiter
3300930fc5 - (almost) fixed FTP crawler
- integrated/fixed SMB crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
27b2998eb4 added searchtable function to more tables in interface
you can now sort by any column in most tables in YaCy just by clicking on the column heading

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 10:05:41 +00:00
orbiter
3014e5f6f9 - integrated live search in the IndexControlURLs input window for URLs:
  this searches for occurrences of the given word in URLs and presents them
  in a pop-up list below the input line
- some bugfixes for the new robots table viewer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 15:44:11 +00:00
orbiter
0769517129 added a robots.txt monitor in the crawler monitor submenu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 11:31:15 +00:00
orbiter
840527689b more simplification of bookmark class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 23:04:52 +00:00
orbiter
ada0ce9de3 refactoring of bookmarks: there is a big performance problem in the bookmarks code, and furthermore the bookmarks
will lose their leading role for the re-crawl function once the new api tables are working. To prepare for a replacement
of such functions, the bookmark class is reorganised.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 22:18:56 +00:00
orbiter
2113fcd7e5 - fixed usage of isEmpty() which is not available in java 1.5
- increased visibility of some methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6564 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 12:33:40 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
orbiter
362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6521 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-09 23:27:26 +00:00
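The memory-protection idea from the commit above (check free heap before a large allocation instead of running into an OutOfMemoryError) can be sketched as below. `MemoryGuard`, `available`, `canAllocate` and `allocateChecked` are hypothetical names, not YaCy's actual memory-control API; only standard `Runtime` calls are used.

```java
// Hypothetical guard illustrating a pre-allocation memory check; not YaCy's actual API.
final class MemoryGuard {
    // Heap the JVM could still hand out: max heap minus what is currently in use.
    // Conservative: garbage that a GC could reclaim is still counted as "used".
    static long available() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    static boolean canAllocate(long bytes) {
        return bytes <= available();
    }

    // Refuse a large array allocation up front instead of failing with OutOfMemoryError mid-operation.
    static byte[] allocateChecked(int bytes) {
        if (!canAllocate(bytes)) {
            throw new IllegalStateException("not enough memory for " + bytes + " bytes");
        }
        return new byte[bytes];
    }
}
```

Callers can catch the IllegalStateException and fall back to a smaller or disk-based strategy, which is usually easier to recover from than an OutOfMemoryError.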
orbiter
e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 14:01:19 +00:00
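Presetting HashMap dimensions, as in the commit above, comes down to choosing an initial capacity large enough that the map never crosses its resize threshold (capacity times load factor, 0.75 by default). A minimal sketch; `Presized` and its methods are hypothetical helpers. On Java 19 and later, `HashMap.newHashMap(int)` performs a comparable computation.

```java
import java.util.HashMap;
import java.util.Map;

final class Presized {
    // HashMap rehashes once size exceeds capacity * loadFactor (0.75 by default),
    // so the initial capacity must be at least expectedEntries / 0.75.
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    static <K, V> Map<K, V> newMap(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries));
    }
}
```

For example, `capacityFor(12)` yields 17; HashMap rounds that up to 32 internally, whose threshold of 24 comfortably holds 12 entries without rehashing.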
orbiter
4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 00:37:59 +00:00
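The size() versus isEmpty() point from the commit above is easy to see with a JDK collection whose size() is documented as not constant-time: `ConcurrentLinkedQueue`. The sketch uses that class only as a stand-in for the structures the commit refers to; `EmptinessCheck` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

final class EmptinessCheck {
    // Old style: ConcurrentLinkedQueue.size() traverses the whole queue (O(n)).
    static boolean hasWorkSlow(ConcurrentLinkedQueue<String> q) {
        return q.size() > 0;
    }

    // New style: isEmpty() only needs to find the first live element.
    static boolean hasWork(ConcurrentLinkedQueue<String> q) {
        return !q.isEmpty();
    }
}
```

Both methods return the same answer; only the cost differs, which matters when the check sits in a busy loop.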
orbiter
5399d1e2bc refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6471 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-08 22:58:57 +00:00
orbiter
4c99d4683d possible fix for lost crawl profile handles: the clean-up job used a wrong measurement to determine whether a crawl is still running.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 23:15:20 +00:00
orbiter
4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 20:28:37 +00:00
orbiter
5e8038ac4d - refactoring of blacklists
- refactoring of event origin encoding


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 20:14:30 +00:00
orbiter
26fafd85a5 - more refactoring
- fixed problem with parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 15:12:34 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
orbiter
5841ee83d3 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 21:29:18 +00:00
orbiter
ce8dc575ca refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:12:19 +00:00
orbiter
bea3b99aff moved table and util classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:14:19 +00:00
orbiter
1e4f8b56ed accumulated classes from different packages into the new rwi package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6394 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 00:39:15 +00:00
orbiter
4446acc8cd moved kelondro order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:22:22 +00:00
orbiter
735e2737e3 * added index segments
This is a major change in the organization of indexes.
Please consider backing up your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and it will be possible to have a distinction between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning, many servlets must be adapted after this change, but all functions that were there before should still work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 14:44:20 +00:00
orbiter
031e6eefbd some updates to dublin core, metadata browsing, file indexing and parser stability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-24 12:54:45 +00:00
orbiter
c0e17de2fb - fixes for some problems with the new crawling/caching strategies
- speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer
- fixed some deadlock- and 100% CPU problems in the balancer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-25 21:38:57 +00:00
orbiter
634a01a9a4 replaced wget-requests with caching requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 14:52:27 +00:00
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that package and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue has been superfluous since the introduction of the blocking queues last year, in which documents are parsed, analysed and stored in the index concurrently.
- The indexing queue was a historic data structure that was introduced at the very beginning of the project as part of the switchboard organisation object structure. Without the indexing queue, the switchboard queue also becomes superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). As a result, all servlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In recent months the indexer was so fast that the indexing queue mostly appeared empty, so it was no longer useful. Because the queue has been removed, the display in the servlets had to be removed as well.
- The surrogate work task had been part of the indexing queue control structure. Without the indexing queue, the surrogates need their own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object with properties attached to it, those properties had to be moved to the queue entry object of the new indexing queue within the blocking queue, the Response object. That object now also has the properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
ca72ed7526 -removed superfluous crawl cache
-refactoring of crawler classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 21:07:46 +00:00
orbiter
13c63f4082 a set of small fixes to crawling behaviour
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 14:15:51 +00:00
f1ori
8931c8d6b4 improvements to the Debian package:
* autoupdate completely disabled, display hint
* restart-button in interface works!

* moved all build-Variables to yacyBuildProperties
* fixed some warnings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6195 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-11 17:03:22 +00:00
orbiter
0e8647d62f refactoring of search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 22:14:57 +00:00
orbiter
dafffd0153 refactoring of parsers and document processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 21:48:08 +00:00
orbiter
154bbc3364 code cleanup: static methods are now called directly on the class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:01:35 +00:00
orbiter
bc6dd8194b refactoring: moved search query class to new search package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 11:49:00 +00:00
orbiter
945777aa80 replaced the rwi term counting method by one that computes the maximum over the blobs that contribute to the RWI. Adding up the blob sizes is incorrect and does not reflect the real size. Truncating the size to the maximum of all blobs is also incorrect, but not as wrong as the sum of all blob sizes, which double-counts many rwi entries.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 22:59:54 +00:00
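The counting change in the commit above can be illustrated with per-blob entry counts for one term. The sum double-counts entries stored in several blobs, so it is an upper bound; the maximum never double-counts, so it is a lower bound; the true count of distinct entries lies between the two. `TermCount` is a hypothetical helper operating on plain integer counts, not on YaCy's BLOB classes.

```java
import java.util.Arrays;

final class TermCount {
    // Old approach: sum of per-blob counts; entries present in several blobs are counted repeatedly.
    static int sumOfBlobs(int[] perBlobCounts) {
        return Arrays.stream(perBlobCounts).sum();
    }

    // New approach: the largest single-blob count; never double-counts, may undercount.
    static int maxOfBlobs(int[] perBlobCounts) {
        return Arrays.stream(perBlobCounts).max().orElse(0);
    }
}
```

For counts {3, 5, 2} the sum is 10 and the maximum is 5.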
orbiter
cc49aedf12 - fixed problem with remote search NPE
- more abstraction for search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-03 08:49:54 +00:00
orbiter
88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 10:03:35 +00:00
orbiter
99bf0b8e41 refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment

The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 14:26:05 +00:00
orbiter
fec6f9054f some refactoring of search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 23:51:34 +00:00
orbiter
63a0255166 - refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
- refactoring: migrated data objects for the new connector classes
- added a DAO interface class to specify an abstract interface for database retrieval connector methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 07:44:22 +00:00
orbiter
e16c25ddf7 (peak-) performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 22:45:39 +00:00
orbiter
c8624903c6 full redesign of index access data model:
terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 15:29:00 +00:00
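The byte[]-keyed index access from the commit above needs a comparator, because byte[] has no natural ordering and its equals() is identity-based. A minimal sketch with a TreeMap and an unsigned lexicographic comparator; `ByteKeys` is a hypothetical name, and the ordering YaCy actually uses for its hashes may differ from this one.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class ByteKeys {
    // Compare bytes as unsigned values, lexicographically; shorter prefix sorts first.
    // For ASCII-only hashes this matches String.compareTo order, without any new String().
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // A sorted index keyed directly by hash bytes; lookups compare contents, not references.
    static <V> TreeMap<byte[], V> newIndex() {
        return new TreeMap<>(UNSIGNED_LEX);
    }
}
```

Since the comparator looks at array contents, `get` with a freshly created byte[] finds the stored entry, which a plain HashMap with byte[] keys would not.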
f1ori
dd6b5005ff * fix missing charset handling in getpageinfo_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 12:31:28 +00:00
orbiter
89ec3acb3e - full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
- during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 06:34:27 +00:00
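Keeping word positions during index joins, as in the commit above, makes checks like the following possible. This is a hypothetical sketch, assuming positions are word offsets within one document; `PositionJoin` and its methods are illustrative names, not YaCy code.

```java
import java.util.Set;
import java.util.TreeSet;

final class PositionJoin {
    // Exact phrase "word1 word2": some occurrence of word2 directly follows an occurrence of word1.
    static boolean phraseMatch(Set<Integer> pos1, Set<Integer> pos2) {
        for (int p : pos1) {
            if (pos2.contains(p + 1)) return true;
        }
        return false;
    }

    // Smallest distance between any two occurrences (smaller means a better ranking signal).
    // Returns Integer.MAX_VALUE if either position set is empty.
    static int minDistance(TreeSet<Integer> pos1, TreeSet<Integer> pos2) {
        int best = Integer.MAX_VALUE;
        for (int p : pos1) {
            Integer lo = pos2.floor(p);
            Integer hi = pos2.ceiling(p);
            if (lo != null) best = Math.min(best, p - lo);
            if (hi != null) best = Math.min(best, hi - p);
        }
        return best;
    }
}
```

Using a TreeSet for the second word's positions lets each lookup find the nearest neighbour in logarithmic time.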
orbiter
c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
This is a preparation for introducing index tables other than the reverse text indexes used so far. The next application of the reverse index is a citation index.
Moved to version 0.74

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-03 13:23:45 +00:00
orbiter
a29a11e526 added evaluation of incoming links in webstructure api
the api has changed, new XML schema.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-03 07:59:49 +00:00
orbiter
7ba078daa1 - added fast site-operator
- refactoring merge into BLOBArray

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-02 13:26:47 +00:00
orbiter
bd409fb7ba added web structure analysis for a special domain that can be requested from the api.
Example:
http://localhost:8080/api/webstructure.xml?about=www.yacy.net
returns an XML document with the following content:

<?xml version="1.0"?>
<webstructure>
<domains reference="reverse" count="1" maxref="300">
<domain host="www.yacy.net" id="FXg39Q" date="20090401">
  <citation host="java.sun.com" id="o-R3yY" count="1" />
  <citation host="yacy-suche.de" id="-KCLaB" count="1" />
  <citation host="suma-ev.de" id="VRAHIA" count="1" />
  <citation host="www.kit.edu" id="EMaLDQ" count="1" />
  <citation host="yacy.net" id="Fh1hyQ" count="1" />
  <citation host="www.fzk.de" id="V2Kl-A" count="1" />
  <citation host="en.wikipedia.org" id="rwtdfR" count="3" />
  <citation host="vimeo.com" id="MmdQDY" count="3" />
  <citation host="liebel.fzk.de" id="sX4ozA" count="6" />
</domain>
</domains>
</webstructure>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-01 14:53:23 +00:00
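A client for the webstructure API shown above could read the response with the standard JAXP DOM parser. This sketch only sums the `count` attributes of all `<citation>` elements; `WebStructureClient` and `totalCitations` are hypothetical names, and the element and attribute names are taken from the sample response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class WebStructureClient {
    // Sums the count attribute over all <citation> elements of a webstructure.xml response.
    static int totalCitations(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList citations = doc.getElementsByTagName("citation");
            int total = 0;
            for (int i = 0; i < citations.getLength(); i++) {
                Element citation = (Element) citations.item(i);
                total += Integer.parseInt(citation.getAttribute("count"));
            }
            return total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse webstructure XML", e);
        }
    }
}
```

For the full sample response above this would return 18. Against a running peer, `DocumentBuilder.parse(String uri)` could be given the API URL directly instead of a string.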
borg-0300
8c494afcfe svn attributes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-20 11:21:32 +00:00
orbiter
67aaffc0a2 - added Latency control to the crawler:
because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in some intranet indexing use cases).
The latency factor in crawl delay times is derived from the time that a target host takes to answer HTTP requests. For internet domains, the crawl delay is at least twice the response time; in intranet cases the delay time is now half of the response time.

- added API to monitor the latency times of the crawler:
a new api at /api/latency_p.xml returns the current response times of domains, the time when the crawler last accessed each domain, and many more attributes.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-20 10:21:23 +00:00
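The delay rule from the latency-control commit above can be written as a small function. This is a sketch under assumptions: `CrawlDelay` and `delayMs` are hypothetical names, and the configured `minimumDelayMs` floor for internet hosts is an assumption not named in the commit.

```java
final class CrawlDelay {
    // Delay before the next request to the same host, derived from its measured response time.
    // minimumDelayMs is an assumed configured floor for internet hosts (not named in the commit).
    static long delayMs(long responseTimeMs, boolean intranet, long minimumDelayMs) {
        if (intranet) {
            return responseTimeMs / 2; // intranet: half the response time
        }
        return Math.max(minimumDelayMs, 2 * responseTimeMs); // internet: at least twice the response time
    }
}
```

With a 300 ms response time and a 500 ms floor, an internet host would wait 600 ms and an intranet host 150 ms.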
orbiter
61f9dbf0cc - fixed a display problem in watch crawler
- another small enhancement in balancer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5729 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-18 21:25:52 +00:00
orbiter
83792d9233 more refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-16 16:24:53 +00:00
orbiter
7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 14:56:25 +00:00
orbiter
14a1c33823 refactoring of wordIndex class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 10:34:51 +00:00
orbiter
d7a493b4f5 added experimental timeline api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 16:01:29 +00:00
orbiter
efcd95dc37 simplification of (internal) query process / refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 15:53:20 +00:00
orbiter
aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 11:04:13 +00:00
orbiter
76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:58:14 +00:00
orbiter
c12bb8a6d0 - refactoring of the http client
- added a protection against memory leaks for the access tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-19 16:24:46 +00:00
orbiter
62505bb3cb more bugfixes as recommended by findbugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-17 09:12:47 +00:00
orbiter
6a32193916 - refactoring of cache naming in web index cache (no more dht semantics there)
- activating a feature in the thread dump that cuts off dumping of a trace of inside-java-core events

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 23:28:13 +00:00
orbiter
c25c334b75 replaced old DHT transmission method with new method. Many things have changed! Some of them:
- after an index selection is made, the index is split into its vertical components
- the split components from different index selections can be accumulated before they are placed into the transmission queue
- each split chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as an interface. The freeworld network now has a configuration
of two vertical partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, search will speed up: with two partitions, two times; with sixteen, sixteen times.
- the vertical DHT will apply a semi-DHT for URLs, and peers will receive only a fraction of the overall URLs they received before:
  with two partitions, the fraction will be halved; with sixteen partitions, 1/16 of the previous number of URLs.
BE CAREFUL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 00:06:59 +00:00
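The vertical split described in the DHT commit above can be pictured as assigning each URL reference to one partition and sending each partition as its own chunk. The rule used here (first hash byte modulo the partition count) is a hypothetical stand-in, not the partitioning function YaCy actually uses; `VerticalPartition` is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

final class VerticalPartition {
    // Hypothetical rule: choose the partition from the first byte of the URL hash.
    static int partitionOf(byte[] urlHash, int partitions) {
        return (urlHash[0] & 0xff) % partitions;
    }

    // Split the URL references of one word into per-partition chunks,
    // each of which could then be handed to its own transmission thread.
    static List<List<byte[]>> split(List<byte[]> urlHashes, int partitions) {
        List<List<byte[]>> chunks = new ArrayList<>();
        for (int i = 0; i < partitions; i++) chunks.add(new ArrayList<>());
        for (byte[] h : urlHashes) chunks.get(partitionOf(h, partitions)).add(h);
        return chunks;
    }
}
```

If hash bytes are spread uniformly, each partition holds roughly 1/n of the URLs, which matches the halving (two partitions) and 1/16 (sixteen partitions) figures in the commit.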
orbiter
01b97ef3f8 added new cybertag-tracking feature that was inspired by itgrl
from the forum discussion in
http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612

The feature will provide two basic entities:
- you can integrate image links which point to your yacy installation anywhere in the web.
  the image can be loaded with
  <img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>">
  This will place an invisible 1-pixel image. If you change icon=invisible to icon=redpill, you will see a red pill.
  Use this, to track your activity in the web.
- you can view your tracks at
  http://localhost:8080/Tracks.html
- There is a public api to your tracks at
  http://localhost:8080/api/tracks_p.json
  which needs authentication


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-06 15:06:19 +00:00
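The cytag image link from the commit above can be generated programmatically so that nick and tag values are URL-encoded. `CytagLink` and its methods are hypothetical helpers; the path and parameter names come from the commit text. `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CytagLink {
    // Builds the tracking-image URL; icon is "invisible" or "redpill" per the commit text.
    static String url(String peerHost, int port, String icon, String nick, String tag) {
        return "http://" + peerHost + ":" + port + "/cytag.png"
                + "?icon=" + URLEncoder.encode(icon, StandardCharsets.UTF_8)
                + "&nick=" + URLEncoder.encode(nick, StandardCharsets.UTF_8)
                + "&tag=" + URLEncoder.encode(tag, StandardCharsets.UTF_8);
    }

    // Wraps the URL in an <img> element, escaping '&' as required inside an HTML attribute.
    static String imgTag(String url) {
        return "<img src=\"" + url.replace("&", "&amp;") + "\">";
    }
}
```

URLEncoder uses form encoding, so a space in a tag becomes `+`.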
orbiter
b57c9da1f8 - fixes to doc, ppt, xls parser: better title
- fixes to httpd server response header generation
- fixes to a server date computation bug
- new Button in indexControl to view content of url in ViewFile


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 15:15:13 +00:00
orbiter
75bef03ac6 fix for bad encoding in yacydoc.html and yacydoc.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-02 15:55:45 +00:00
apfelmaennchen
ee3fe19c0b added /api/bookmarks/get_folders.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5559 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 20:41:37 +00:00
apfelmaennchen
7a159dc745 update for api/bookmarks/get_folders
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 20:40:51 +00:00
f1ori
bacccda6d7 * blacklist_p.xml: attrOnly = only give parameters of blacklists, no content
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 13:33:08 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- separated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
apfelmaennchen
3905caf8a1
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-29 22:07:18 +00:00
apfelmaennchen
08ed14603e - fixed YaCy-UI sciencenet search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5535 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-29 22:06:06 +00:00
apfelmaennchen
9bd9ccade2 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5530 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-28 22:47:03 +00:00
apfelmaennchen
96684df1a9 - security fix for addTag.java and editTag.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-28 06:43:13 +00:00
apfelmaennchen
6dd52422ea - added two dialogs to manage bookmark tags in YaCy-UI
- fixed renameTag() in bookmarksDB
- added /api/bookmarks/tags/addTag.xml
- added /api/bookmarks/tags/editTag.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5525 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-28 00:15:43 +00:00
apfelmaennchen
9317650272 forgot to post this one...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-26 18:40:56 +00:00
apfelmaennchen
92d77c3bef Major update to YaCy-UI... still not perfect... but I thought I'd share my progress :-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-26 18:38:58 +00:00
orbiter
dedfc7df7f removed distinction between DHT-in and DHT-out. This is necessary to make room for the new cell data structure, which cannot use this distinction in the first place but will achieve the same effect through different mechanisms (segments, later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-22 00:03:54 +00:00
f1ori
34da04c7dd * fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1754
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-21 21:30:20 +00:00
orbiter
b423d0a036 moved all servlets from htroot/xml to htroot/api
The file server contains a temporary patch that maps all xml paths to api,
so all interfaces still work. Please adapt your interfaces to the new path.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-15 23:52:58 +00:00
orbiter
4bd927d513 the Semantic Web moves in!
- added two new api files for document metadata:
  - added an XHTML+RDFa HTML file that shows the document metadata in a format suitable both for rendering and for metadata retrieval. This is a typical document format for semantic web data; the RDF vocabulary used is Dublin Core
  - added an XML file that shows the same data as pure DC metadata
- integrated the API into the existing IndexControlURLs interface

With about one billion metadata files (URL metadata), this extension makes the freeworld YaCy network
probably one of the largest metadata document providers for the semantic web!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5490 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-13 22:04:38 +00:00