Commit Graph

41 Commits

Author SHA1 Message Date
Michael Peter Christen
f1c5338210 prepartion for greedy crawl profiles and refactoring 2013-07-01 13:10:09 +02:00
Michael Peter Christen
906e51214a the web structure image shows the pivot dot in a different color 2012-10-25 10:18:28 +02:00
Michael Peter Christen
9eaede50e7 enhanced web structure images 2012-10-23 18:11:19 +02:00
Michael Peter Christen
21fe8339b4 - enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
8c06925984 animation of the web structure picture 2012-02-25 15:42:29 +01:00
orbiter
e799ee55b9 fix for web structure if there is no web structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 10:29:20 +00:00
orbiter
6cd27473f5 - better default values for caching and cache usage
- set new caching and verification behavior according to use case automatically

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 10:22:02 +00:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently only serves for indexes to the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces.

The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface provides already a partial backward index and therefore a collection of all these indexes needs only to be joined which is very easy. The result should be the computation of new block rank tables that all peers can perform.

To reduce load from peers this servlet buffers all data and refreshes it only once in 12 hours. This very slow update cycle is needed because the interface will be called round-robin from all peers once after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-15 22:57:31 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
orbiter
28f669bf0b - fixed/enhanced move to SD/16:9 images (network, web structure)
- added logging in peer ping to analyse time-consuming elements which could be cause for disappearing peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7450 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 10:16:14 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
86d7f8a989 - the web visualization can now be generated in custom color
- added input fields in WatchWebStructure_p.html
- introduced enum classes for Draw Mode and Filter Mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 10:44:00 +00:00
orbiter
5841ee83d3 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 21:29:18 +00:00
low012
5e4f267a36 *) added subversion properties and edited a few comments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-26 22:07:40 +00:00
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We stil have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
fc69a76197 update to web structure picture:
- allow bigger size
- better instructions for api usage

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-17 12:37:31 +00:00
orbiter
99bf0b8e41 refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment

The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 14:26:05 +00:00
orbiter
bd5f4c78d8 - added default profile for surrogate indexing
- integrated surrogate indexing into indexing queue process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 08:01:38 +00:00
lotus
029e16b653 replaced some put(String, String) by putHTML(String, String) on serverObjects respond
in htroot/ root
didn't touch htroot/xml/
this should solve potential xss issues

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 11:45:11 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is, that a local search will no not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
orbiter
842308ea97 - redesigned crawl start menu, integrated monitoring pages
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhances indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown

attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:21:31 +00:00
orbiter
e332b844b2 - enhanced remote search: during waiting time for remote crawls
some urls are fetched so the url cache can be filled with these urls
- the url-prefetch is used to sort out some unresolved urls
- the snippet-fetcher is triggered with the search event id. This is used
  to remove missing snippets from the search cache so they will not be displayed again


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 18:18:35 +00:00
orbiter
5fd1d5a58e added time configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-10 20:25:33 +00:00
orbiter
e07458bad4 added time-out function to web analysis
the default time-out is 1 second

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-10 20:00:44 +00:00
allo
5cae5e79c2 nice +/- buttons
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 23:11:13 +00:00
orbiter
8f00c1e896 restrict number of nodes in web structure picture
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 22:56:42 +00:00
orbiter
7fdf31093b automatic focus on one of the crawl starts when web structure is shown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-06 20:18:39 +00:00
orbiter
0ae6664ad8 enhanced web structure picture
- hand-over of get properties from web front-end to graphics generation
- added depth-control buttons
- added marking of anchor-points to highlight relation order
- enhanced ymage graphics library

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-24 15:28:03 +00:00
orbiter
a585b4d41b added web structure image
see http://localhost:8080/WatchWebStructure_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 15:20:50 +00:00