4414 Commits

Author SHA1 Message Date
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to clean up the LibraryProvider
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
low012
64f32e8f00 *) replaced all IPs in IP filters for proxy with the proper regular expression
*) some cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-11 23:37:13 +00:00
orbiter
93732d6773 increased number of target peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-04 13:25:28 +00:00
orbiter
70ca7cec8c fix for http://forum.yacy-websuche.de/viewtopic.php?p=21763#p21763
and another fix for non-working global search when search options are switched off

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-03 10:43:09 +00:00
orbiter
fe93caac5a added flags and administration options to show advanced search and to show search result attributes (for each search result)
Administration can be done at ConfigPortal.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:54:13 +00:00
orbiter
5905f912c5 replaced more double types with float
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7462 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:22:00 +00:00
orbiter
0cdfb82963 replaced more occurrences of double values with float values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:06:29 +00:00
orbiter
eb12e15738 moved all Double values to Float values because of
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-01 23:49:11 +00:00
f1ori
982aa689ef * fix StringIndexOutOfBoundsException in WebStructureGraph
* add better escaping to saveMap and loadMap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-31 14:25:09 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the own seed file gets the lead for storage of the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to ConfigBasic in case that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page ConfigBasic.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
orbiter
28f669bf0b - fixed/enhanced move to SD/16:9 images (network, web structure)
- added logging in peer ping to analyse time-consuming elements which could be cause for disappearing peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7450 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 10:16:14 +00:00
orbiter
0376f73fdb extended seed list uploader: do not only upload all active peers but also some more peers that are passive but had been active in the last 24 hours
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7449 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 23:21:33 +00:00
orbiter
991b92f4ae enhanced network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 13:52:46 +00:00
orbiter
3ae8f40fc8 removed yacy.network.group - this feature was never used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:50:36 +00:00
orbiter
efb4ca8fa8 modified auto-delete of search failure-words:
- words are now not deleted from the search index automatically if index receive is switched off
- a flag in the network definition defines if this feature is switched on at all
- the search filter for not-found word references is switched off for server-side remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:46:00 +00:00
orbiter
f1f03d8c90 more logging for strange network loading bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-19 09:31:56 +00:00
f1ori
4e29e9712a * create cleanupjob for cached failed urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 15:04:00 +00:00
f1ori
a321c7673d * adminAccountForLocalhost only for localhost
* yacy crawls local domains also, if no password is set (the interface is already protected)
* setting a password is no longer required in intranet mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 11:37:30 +00:00
low012
48463c4507 *) General private License? ;-)
*) minor code changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:28:08 +00:00
orbiter
c93f4dda72 - cleaned up yacy news
- removed unused methods
- avoid news generation in case that the peer runs in robinson mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:00:14 +00:00
orbiter
6c1b14c8e1 - more control in access tracker: count number of returned search results (not only info how much is in the index)
- extended query params for this
- enhanced cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-11 22:58:14 +00:00
low012
9f38c0023d *) Minor changes, mainly cleaning up a little bit, no functional changes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 20:24:52 +00:00
orbiter
54e77e6255 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 08:40:41 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
lotus
0e54233408 UPnP: map port again if we are not reachable (e.g. when router rebooted)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 21:17:21 +00:00
lotus
b1484299b2 same units for memory observer configuration (MiB)
old setting for DHT (RAM) will be lost after update
can be set on /Performance_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 20:38:01 +00:00
orbiter
89ae6101b9 fix for NPE and added comment in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 14:51:07 +00:00
orbiter
0769f4caa6 added search suggestions for interactive search: is only shown if there are no search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 14:30:25 +00:00
orbiter
a4c9d27287 - moved some variables from Switchboard to new class AccessTracker
- added a limitation in access tracking to delete queries which are older than 10 minutes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 01:54:27 +00:00
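The age limit described in the commit above can be illustrated with a minimal sketch. This is not the AccessTracker code from the repository: the class `QueryLogSketch`, its `Entry` type, and the method names are hypothetical, and only the 10-minute cutoff comes from the commit message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an age-limited query log; not the actual AccessTracker class.
class QueryLogSketch {
    // 10 minutes, the limit named in the commit message.
    static final long MAX_AGE_MS = 10 * 60 * 1000L;

    // One tracked query with the time it was recorded (milliseconds).
    static final class Entry {
        final long time;
        final String query;
        Entry(long time, String query) { this.time = time; this.query = query; }
    }

    // Entries are appended in time order, so the oldest is always first.
    private final Deque<Entry> entries = new ArrayDeque<>();

    // Records a query and drops everything older than MAX_AGE_MS.
    void add(String query, long now) {
        entries.addLast(new Entry(now, query));
        prune(now);
    }

    // Removes entries from the front until the oldest one is young enough.
    void prune(long now) {
        while (!entries.isEmpty() && now - entries.peekFirst().time > MAX_AGE_MS) {
            entries.removeFirst();
        }
    }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        QueryLogSketch log = new QueryLogSketch();
        log.add("first", 0L);
        log.add("second", 300_000L);
        log.add("third", 600_001L); // "first" is now older than 10 minutes
        System.out.println(log.size());
    }
}
```

Because entries arrive in time order, pruning only ever has to look at the front of the queue, which keeps each insert cheap.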
f1ori
e4aabaa1c3 * fix negative filelength for files >2G
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 17:25:39 +00:00
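The commit message does not say what caused the negative lengths. One common cause in Java is narrowing a `long` byte count to `int`, which wraps around above 2 GiB. The sketch below only demonstrates that general effect; the class and method names are hypothetical and not taken from the YaCy code.

```java
// Hypothetical sketch of int overflow in file lengths; not the code changed by this commit.
class FileLengthSketch {
    // Narrowing a long to int keeps only the low 32 bits, so large values wrap around.
    static int lengthAsInt(long bytes) {
        return (int) bytes;
    }

    // Keeping the value as long preserves sizes above Integer.MAX_VALUE.
    static long lengthAsLong(long bytes) {
        return bytes;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024; // 3221225472 bytes
        System.out.println(lengthAsInt(threeGiB));  // prints -1073741824
        System.out.println(lengthAsLong(threeGiB)); // prints 3221225472
    }
}
```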
orbiter
cdfe8afe3f fix for really bad table iteration implementation: reduction of IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 16:44:55 +00:00
f1ori
ee3cef91e8 * fix filesize in ftp crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 02:15:22 +00:00
orbiter
b2ed4cfaf8 more small bugfixes and light refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 01:57:05 +00:00
low012
3d95981f7d *) cleaning up the code a little bit
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 17:07:21 +00:00
orbiter
6b70393d1d - new java version 1.6
- replaced old gif animator by java 1.6 gif animator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-20 22:51:50 +00:00
orbiter
e88c428008 fix to ftp loader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-18 10:22:54 +00:00
orbiter
9b25a33fd9 - fixed numerous bugs
- better document names
- fixed problem with ftp crawling
- added automatic removal of search results from services that are not online according to the latest network scan: this does not delete the index but just does not show them. After the next network scan, when the server is available again, the results are shown again.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-17 17:30:09 +00:00
orbiter
7bdb13bf7f more fixes to smb crawling: better file names
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-17 00:52:24 +00:00
orbiter
94c48500cc several fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-17 00:11:42 +00:00
orbiter
58b59f9bc8 - a collection of bug fixes and some redesign of the Scanner class
- fixed smb crawling
- added smbget to download script generation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 23:37:21 +00:00
orbiter
c54170421a fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 11:19:22 +00:00
low012
6f4f957e50 *) cleaning up the code a little bit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 00:18:05 +00:00
f1ori
2521677a45 * deny adminForLocalhost and intranet network setup also on bootup and not only on network switch
* require authentication for yacybot whatever adminForLocalhost is set to
  (after this patch, is the rule from above really necessary?
  the crawler also checks the robots.txt)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 21:39:02 +00:00
f1ori
9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 19:20:00 +00:00
orbiter
56264dcc17 - added CamelCase parser to MultiProtocolURI: generate better to-be-indexed words from urls
- integrated new parser into loader processes: enrich document parser
- fixed a concurrent modification exception in kelondro iterator
- hand-over of document size from crawler to indexer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-15 00:03:19 +00:00
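To illustrate what a CamelCase parser can contribute to indexing, here is a minimal sketch that splits an identifier-like URL token into words. It is not the parser added to MultiProtocolURI: the class `CamelCaseSketch`, the method `split`, and the splitting rule are assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of CamelCase word splitting; not the MultiProtocolURI implementation.
class CamelCaseSketch {
    // Splits at a lower-case letter or digit followed by an upper-case letter,
    // and before the last capital of an acronym that is followed by a lower-case letter.
    static List<String> split(String token) {
        List<String> words = new ArrayList<>();
        for (String part : token.split("(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("MultiProtocolURI")); // prints [Multi, Protocol, URI]
        System.out.println(split("HTMLParser"));       // prints [HTML, Parser]
    }
}
```

With a rule like this, a path segment such as `ConfigBasic` yields the separate words `Config` and `Basic`, which can then be indexed individually.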
orbiter
acab6801d9 added new network scanner
- you can scan any ip or host in the internet for services
- this replaces the intranet scanner

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-13 18:19:37 +00:00
orbiter
a563b05b60 enhanced crawler:
- added a new queue 'noload' which can be filled with urls where it is already known that the content cannot be loaded. This may be because there is no parser available or the file is too big
- the noload queue is emptied with the parser process which indexes the file names only
- the 'start from file' functionality now also reads from ftp crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-11 00:31:57 +00:00
orbiter
c36da90261 added a very fast ftp file list generator to site crawler:
- when a site-crawl for ftp sites is now started, then a special directory-tree harvester gets the complete directory structure of a ftp server at once
- the harvester runs concurrently and feeds into the normal crawl queue

also in this:
- fixed the 'start from file' crawl function
- added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags.
- as a result, a crawl can now also be started from plain-text link files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-09 17:17:25 +00:00
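Detecting links outside of `<a>` tags, as described in the commit above, can be sketched with a regular expression over plain text. This is an assumption-laden illustration, not the detector added to the html parser: the class `PlainTextLinkSketch`, the method `extract`, the pattern, and the list of schemes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of link detection in plain text; not the YaCy html parser code.
class PlainTextLinkSketch {
    // Matches a scheme followed by any run of characters that are not
    // whitespace, angle brackets, or quotes. The scheme list is an assumption.
    private static final Pattern URL =
            Pattern.compile("\\b(?:https?|ftp|smb)://[^\\s<>\"']+");

    // Returns every URL-like substring in the order it appears.
    static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "see http://example.org/a.html and ftp://example.org/pub/ now";
        System.out.println(extract(text));
    }
}
```

A pattern this simple also captures trailing punctuation, such as a full stop directly after a URL, so a real detector would need additional trimming.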