orbiter
8b0aea6910
fixed automatic deletion of too many referenced hosts in web structure
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 21:51:56 +00:00
orbiter
9a8a87612d
added new qph column to search tracker servlet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-10 22:02:17 +00:00
orbiter
e07458bad4
added time-out function to web analysis
...
the default time-out is 1 second
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-10 20:00:44 +00:00
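
A minimal sketch of the kind of time-out the commit above describes, assuming the analysis is a loop over hosts that can stop early. The class and method names are hypothetical and not taken from the YaCy sources; only the 1 second default comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual YaCy web analysis code.
final class TimedAnalysisSketch {
    // Default budget named in the commit message: 1 second.
    static final long DEFAULT_TIMEOUT_MS = 1000L;

    // Processes hosts in order and stops as soon as the time budget is used up.
    // Returns the hosts that were handled within the budget.
    static List<String> analyse(List<String> hosts, long timeoutMs) {
        final long start = System.nanoTime();
        final long budgetNanos = timeoutMs * 1_000_000L;
        final List<String> done = new ArrayList<>();
        for (String host : hosts) {
            if (System.nanoTime() - start >= budgetNanos) {
                break; // budget exhausted: return the partial result
            }
            done.add(host); // placeholder for the real per-host work
        }
        return done;
    }
}
```

The sketch returns a partial result instead of throwing, which seems a natural fit for an analysis whose output is only used for display.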
hydrox
4a1bc4743a
*)News-entries with blacklisted URLs are now ignored
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-10 08:05:18 +00:00
theli
339153d40e
*) favicons that are specified in the document content via html link-tags
...
are now detected and displayed on the search page (requested by allo).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-09 15:22:37 +00:00
karlchenofhell
6265d321bd
- more constants
...
- display why global search is not available on search page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 20:01:16 +00:00
rramthun
18a5380ee3
*) situation-dependent lock-buttons for search-page
...
*) removed one unused import and a double definition of "ogg" as media-type
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 15:26:41 +00:00
karlchenofhell
9d6605a83c
- fixed NPE in Blacklist Cleaner when deleting more than one duplicate entry
...
- don't display responseHeader1.db in CacheAdmin_p anymore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-06 23:36:38 +00:00
orbiter
594ff95955
:-(
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-06 11:34:39 +00:00
orbiter
4ca797401e
fix for ConcurrentModificationException
...
see http://www.yacy-forum.de/viewtopic.php?p=36566#36566
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-06 10:36:04 +00:00
orbiter
7b904e0077
integrated robots.txt crawlDelay into the crawl balancer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-06 07:53:56 +00:00
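
One way a crawl balancer could honour a robots.txt crawl delay, sketched under assumptions: the balancer keeps the last access time per host and waits for the larger of its own minimum delay and the delay requested by the host. All names here are hypothetical; the commit does not show how the real balancer does it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the actual YaCy crawl balancer.
final class CrawlDelaySketch {
    private final long minDelayMs;                         // balancer's own minimum delay
    private final Map<String, Long> lastAccess = new HashMap<>();

    CrawlDelaySketch(long minDelayMs) {
        this.minDelayMs = minDelayMs;
    }

    // Milliseconds the caller should still wait before fetching from this host.
    // robotsDelayMs is the crawl delay from robots.txt (0 if none was given).
    long waitingTime(String host, long robotsDelayMs, long nowMs) {
        final Long last = lastAccess.get(host);
        if (last == null) {
            return 0L; // host was never accessed, no need to wait
        }
        final long delay = Math.max(minDelayMs, robotsDelayMs);
        return Math.max(0L, last + delay - nowMs);
    }

    // Remembers when the host was last fetched.
    void recordAccess(String host, long nowMs) {
        lastAccess.put(host, nowMs);
    }
}
```

Passing the current time as a parameter keeps the sketch deterministic and easy to test.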
orbiter
52cb033f01
- slightly different painting of web structure picture:
...
hosts that have many connections of their own are painted farther away (this is not yet cato's idea; that will be implemented in another step)
- doc update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-05 15:32:43 +00:00
allo
6c9df13552
more debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-04 20:30:40 +00:00
allo
d1e1580223
Surftips Blacklist
...
List of blacklists is now hardcoded instead of only being updated on first start / in migration.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-04 15:36:10 +00:00
(no author)
94cc9f05f5
*) Improvements for restart via update wrapper
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-02 15:25:13 +00:00
borg-0300
2ab020445a
bugfix, i think - http://www.yacy-forum.de/viewtopic.php?t=4059
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-31 17:03:02 +00:00
(no author)
ef24bed406
Sorry...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-24 16:25:07 +00:00
(no author)
a29cb2e1af
blupp
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-24 16:14:46 +00:00
orbiter
a585b4d41b
added web structure image
...
see http://localhost:8080/WatchWebStructure_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 15:20:50 +00:00
orbiter
33ad0c8246
added a web structure computation and logging:
...
- all web page parsing operations now add to a web structure file
- the file is computed in memory and dumped at shutdown-time to PLASMASB/webStructure.map in readable form (not a database)
- the file can be used externally to analyse the link structure of the crawled pages
- the web structure can also be retrieved using an XML interface at http://localhost:8080/xml/webstructure.xml
- the short-term purpose is the computation of a link-graph image (before linuxtag!)
- a long-term purpose could be a decentralized computation of the citation rank
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 08:13:48 +00:00
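
The commit above describes a link structure that is built in memory and dumped in readable form. A rough sketch of that idea, assuming a host-to-host reference counter; the class name and the `source=target:count,...` line format are invented for illustration and are not the real webStructure.map format.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the actual YaCy web structure code.
final class WebStructureSketch {
    // source host -> (target host -> number of links seen)
    private final Map<String, Map<String, Integer>> links = new TreeMap<>();

    // Called once per link found while parsing a page.
    void addLink(String sourceHost, String targetHost) {
        links.computeIfAbsent(sourceHost, k -> new TreeMap<>())
             .merge(targetHost, 1, Integer::sum);
    }

    // Renders the structure as text, one source host per line (invented format).
    String dump() {
        final StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, Integer>> source : links.entrySet()) {
            sb.append(source.getKey()).append('=');
            boolean first = true;
            for (Map.Entry<String, Integer> target : source.getValue().entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(target.getKey()).append(':').append(target.getValue());
                first = false;
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Sorted maps are used so that the dump is stable between runs, which makes a readable file easier to compare externally.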
karlchenofhell
7904175338
- sorry for typos
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 16:22:46 +00:00
karlchenofhell
baa9402b97
- wiki-parser is now configurable via the config setting wikiParser.class which holds the class-name for the parser to use
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 16:19:25 +00:00
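
A setting that holds a class name is typically resolved by reflection. The following sketch shows that pattern; only the key `wikiParser.class` comes from the commit message, while the `WikiParser` interface, the `PlainParser` fallback and the use of `java.util.Properties` are assumptions made for the example.

```java
import java.util.Properties;

// Hypothetical illustration; not the actual YaCy parser loading code.
final class ParserLoaderSketch {
    // Assumed minimal parser contract.
    interface WikiParser {
        String transform(String text);
    }

    // Assumed fallback parser that returns the text unchanged.
    static final class PlainParser implements WikiParser {
        public String transform(String text) {
            return text;
        }
    }

    // Instantiates the parser class named by the "wikiParser.class" setting.
    static WikiParser load(Properties config) {
        final String className =
                config.getProperty("wikiParser.class", PlainParser.class.getName());
        try {
            final Class<?> cls = Class.forName(className);
            return (WikiParser) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("cannot load wiki parser " + className, e);
        }
    }
}
```

Failing with a clear exception when the class is missing or of the wrong type makes a misconfigured setting visible at start-up.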
karlchenofhell
0a64047081
- plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
...
- added 7zip parser
- added 'text/sgml' to realtime parseable mimetypes (sometimes returned by the mime type parser)
- added new cached output stream class, very suitable for parsers because of limited memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3740 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 23:13:44 +00:00
theli
9a4375b115
*) robots.txt: adding support for crawl-delay
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 13:00:42 +00:00
karlchenofhell
086239da36
- added servlet: remote crawler queue overview
...
- added servlet: crawl profile editor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 10:11:25 +00:00
orbiter
b05e2314cf
another dht selection fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-14 12:52:39 +00:00
orbiter
b28e5d0ee9
protection against wrong word hash length
...
see http://www.yacy-forum.de/viewtopic.php?p=35657#35657
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-14 10:00:23 +00:00
orbiter
0384b8771b
fix for http://www.yacy-forum.de/viewtopic.php?p=35700#35700
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-13 19:37:16 +00:00
orbiter
578c2ef130
release 0.52
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-11 22:12:29 +00:00
orbiter
46367afaaa
update of memory-protection values
...
see http://www.yacy-forum.de/viewtopic.php?p=35539#35539
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-11 18:02:48 +00:00
rramthun
ea87fe5d78
*) Updated German translation
...
*) Changed "Lost Handle" error to a warning (masses of them appear when deleting a crawl profile)
*) Removed unnecessary code from Windows script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-11 17:48:22 +00:00
orbiter
26f05d1fd0
avoid division by zero if search is done for no words
...
this case is relevant if the bluewords (yacy.blue) are used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 22:10:12 +00:00
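
The commit does not say where the division happens. As a generic sketch of the guard it describes, assume a score that is averaged over the number of query words; the method name and the scoring are hypothetical.

```java
// Hypothetical illustration; not the actual YaCy ranking code.
final class RankingSketch {
    // Averages per-word scores; an empty query yields 0 instead of dividing by zero.
    static int averageScore(int[] perWordScores) {
        if (perWordScores.length == 0) {
            return 0; // search for no words: nothing to average
        }
        long sum = 0;
        for (int score : perWordScores) {
            sum += score;
        }
        return (int) (sum / perWordScores.length);
    }
}
```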
orbiter
139c59ebbd
- fixed DHT selection problem: the seed tables used a wrong ordering
...
- cleaned some code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 17:59:36 +00:00
orbiter
e602436fda
fixed problem with cluster routing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-07 20:48:24 +00:00
orbiter
d6480dc670
fix for long transfer pauses
...
see http://www.yacy-forum.de/viewtopic.php?p=35243#35243
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 21:43:20 +00:00
theli
6f46245a51
*) Bookmarks: Ajax icon is displayed while loading title
...
*) First version of a sitemap parser added
- currently only autodetection of sitemap files is supported
*) DB-Import restructured
- pause/resume should work again now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 09:52:04 +00:00
theli
74dd6cac95
*) signal yacy shutdown to updater
...
*) some javadoc added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-05 16:20:01 +00:00
theli
43748f87fb
*) changes required for the uploader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3655 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-05 15:41:05 +00:00
rramthun
e12e934ade
*) Fixed broken compile process.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-04 21:33:37 +00:00
orbiter
7cf8981a98
- added debugging code for wrong DHT target iterator
...
- restricted distance constraint from 0.4 to 0.2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-03 22:57:55 +00:00
orbiter
dd44a1394f
disabled automatic performance setting change
...
- during crawl start
- each indexing cycle
- for delay values
- for short memory cycles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 15:39:27 +00:00
orbiter
b9add5cf37
some bugfixes:
...
- dht iterator start point
- wordIndex synchronization
- surftipps url check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 14:20:43 +00:00
orbiter
06b6e35484
fix for a null pointer exception if clusters are not defined
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3632 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 12:26:29 +00:00
orbiter
47e90f31b2
fix for deadlock in plasmaWordIndex.addPageIndex
...
synchronization for class method not necessary
see also: http://www.yacy-forum.de/viewtopic.php?p=34959#34959
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 22:30:09 +00:00
orbiter
81844e85b2
- fixed more cluster routing problems
...
- fixed a problem in remote search when balancer caused shift process to wait too long
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 00:39:53 +00:00
orbiter
304ed3f4d2
fix for remote crawl requests in clusters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 22:52:07 +00:00
orbiter
e48189c710
enhanced cluster routing
...
- cluster definitions can now additionally specify local IP addresses
- cluster-cluster communication uses the local IP address instead of the global address, if one is given
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 22:05:34 +00:00
orbiter
485bf1ea83
bugfix for robinson/remote crawl bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-27 21:35:43 +00:00
orbiter
62c947b4aa
next try to fix deadlock in plasmaWordIndex
...
see also:
http://www.yacy-forum.de/viewtopic.php?p=34821#34821
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3607 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-27 12:26:36 +00:00
orbiter
871ee1ce0f
one step closer to automatic updates:
...
automatically acquire release information from download archives
web pages from latest.yacy-forum.net and yacy.net are retrieved, parsed,
links within are analysed, sorted and the most recent developer and main
releases are provided as direct download links on the status page, if it was
discovered that a more recent version than the current version is available.
This process is done only once during run-time of a peer, to protect our
download archives from DoS by YaCy peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-27 09:23:44 +00:00
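
The link analysis described in the commit above can be sketched as follows. The file name pattern `yacy_v<major>.<minor>_<date>_<svn revision>.tar.gz` and the comparison by svn revision are assumptions for the example, as are all class and method names; the commit does not show the actual parsing rules.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual YaCy release discovery code.
final class ReleaseFinderSketch {
    // Assumed archive name layout: version, build date, svn revision.
    private static final Pattern RELEASE = Pattern.compile(
            "yacy_v(\\d+)\\.(\\d+)_(\\d{8})_(\\d{1,9})\\.tar\\.gz");

    // Returns the link with the highest svn revision above currentRevision,
    // or null if no link points to a more recent release.
    static String newerRelease(List<String> links, int currentRevision) {
        String best = null;
        int bestRevision = currentRevision;
        for (String link : links) {
            final Matcher m = RELEASE.matcher(link);
            if (!m.find()) {
                continue; // not a release archive
            }
            final int revision = Integer.parseInt(m.group(4));
            if (revision > bestRevision) {
                bestRevision = revision;
                best = link;
            }
        }
        return best;
    }
}
```

A caller that wants the "only once during run-time" behaviour from the commit message could cache the result of the first lookup and never fetch the archive pages again.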