orbiter
b3238bec83
performance hack for httpd
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6776 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 07:09:55 +00:00
orbiter
1e8e79b9ef
redesign of reference hash (URL-hash) parameter hand-over:
pass the value as byte[], not as String. This should result in fewer
byte[] <-> String conversions during time-critical tasks.
This redesign is not yet complete; more to come.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
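Passing hashes as byte[] has one Java-specific pitfall: arrays inherit identity-based equals() and hashCode(), so a byte[] cannot be used directly as a HashMap key. A minimal sketch, assuming a sorted map with a content-based comparator is acceptable (the class UrlHashIndex and the sample hash are hypothetical, not YaCy code), keeps the lookup key as raw bytes with no String conversion:

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration: a map keyed by raw URL-hash bytes.
final class UrlHashIndex {

    // Unsigned lexicographic comparison of two byte arrays; a shorter prefix sorts first.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // The comparator compares contents, so two distinct arrays with equal bytes
    // address the same entry (a HashMap<byte[], V> would treat them as different keys).
    private final TreeMap<byte[], String> urls = new TreeMap<>(BYTE_ORDER);

    void put(byte[] urlHash, String url) { urls.put(urlHash, url); }

    String get(byte[] urlHash) { return urls.get(urlHash); }

    public static void main(String[] args) {
        UrlHashIndex index = new UrlHashIndex();
        byte[] hash = "AAAAAAAAAAAA".getBytes(StandardCharsets.US_ASCII); // made-up hash
        index.put(hash, "http://example.org/");
        // A different array instance with the same content finds the same entry.
        byte[] sameHash = "AAAAAAAAAAAA".getBytes(StandardCharsets.US_ASCII);
        System.out.println(index.get(sameHash));
    }
}
```

An alternative is a small key class whose equals() and hashCode() delegate to java.util.Arrays.equals and Arrays.hashCode; the comparator approach additionally gives a stable ordering of hashes.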
orbiter
749ffbd642
- added another catch case to the index dump and index merge processes so that they stay non-blocking if the index dump and/or index merge throws an unexpected exception.
- reverted SVN 6766; it is too dangerous (it may cause unexpected memory usage) and should not be necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:46:40 +00:00
orbiter
9ddb8e4a43
set an option for the Java-internal image parser that prevents it from caching images in a temporary file on the file system. This should speed up image parsing during image indexing dramatically and should also improve performance when showing the YaCy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
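The message does not name the option. Assuming it is the standard javax.imageio switch ImageIO.setUseCache(false), which makes ImageIO buffer input streams in memory instead of in temporary files, a minimal sketch looks like this (the class name InMemoryImageParsing is hypothetical):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class InMemoryImageParsing {

    static {
        // Use in-memory stream caches instead of temporary files on disk.
        ImageIO.setUseCache(false);
    }

    // Decodes an image from raw bytes; returns null if no registered reader understands the format.
    static BufferedImage parse(byte[] data) throws IOException {
        return ImageIO.read(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory so the example needs no input file.
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);

        BufferedImage decoded = parse(out.toByteArray());
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()
                + ", useCache=" + ImageIO.getUseCache());
    }
}
```

Note that ImageIO.setUseCache is a JVM-wide static setting, so it affects every ImageIO reader and writer in the process, not only the indexer.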
orbiter
e12f1fd821
- added setting of access rights for executable scripts after auto-installation
The correct access right was missing, especially for bin/apicall.sh
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6769 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 09:51:01 +00:00
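The commit does not show how the rights are set. A minimal sketch, assuming Java's File.setExecutable (available since Java 6) is applied to every *.sh file in the bin directory; the class and method names are hypothetical:

```java
import java.io.File;

final class ScriptPermissions {

    // Marks every *.sh file in the given directory as executable for its owner.
    // Returns the number of files whose permission could be set.
    static int makeScriptsExecutable(File binDir) {
        File[] scripts = binDir.listFiles((dir, name) -> name.endsWith(".sh"));
        if (scripts == null) return 0; // not a directory, or an I/O error occurred
        int changed = 0;
        for (File script : scripts) {
            if (script.setExecutable(true, true)) changed++;
        }
        return changed;
    }

    public static void main(String[] args) {
        File bin = new File(args.length > 0 ? args[0] : "bin");
        System.out.println(makeScriptsExecutable(bin) + " script(s) made executable in " + bin);
    }
}
```

setExecutable(true, true) grants the permission to the owner only; on file systems without POSIX permissions the call may return false.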
orbiter
95f31da8da
increase dump cache queue length from 1 to 2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6766 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-24 20:36:35 +00:00
orbiter
6c093d6aed
- enhanced domain navigator computation
- fixed domain navigator content in case a mustmatch constraint was given
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075
using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: this should speed up pre-sorting of search results and should produce richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
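Precompiling matters because String.matches(regex) compiles the expression again on every call, while a Pattern compiled once can be reused for every result. A minimal sketch, assuming a full-string match against the result URL (the commit does not say whether matches() or find() is used); the class name MustMatchFilter is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MustMatchFilter {

    // Compiled once; Pattern objects are immutable and safe to share between threads.
    private final Pattern mustMatch;

    MustMatchFilter(String regex) {
        this.mustMatch = Pattern.compile(regex);
    }

    // Keeps only the URLs that match the constraint in full.
    List<String> filter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (mustMatch.matcher(url).matches()) kept.add(url);
        }
        return kept;
    }

    public static void main(String[] args) {
        MustMatchFilter f = new MustMatchFilter(".*\\.de/.*");
        System.out.println(f.filter(List.of("http://yacy.de/index.html", "http://example.org/")));
    }
}
```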
orbiter
90dd197ae7
- no latency for local crawls
- catch interrupted exception during 'fast' crawls in workflow processor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:18 +00:00
orbiter
bfb518cd47
some refactoring to make the LoaderDispatcher a little more independent of the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:28:03 +00:00
orbiter
c855fc48c6
only load robots.txt for the http and https protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:15:11 +00:00
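The message presumably means http and https, since other crawler protocols (ftp, smb) have no robots.txt. A minimal sketch of such a scheme check using java.net.URI; the class RobotsPolicy is hypothetical:

```java
import java.net.URI;

final class RobotsPolicy {

    // Only http and https URLs are checked against robots.txt; other schemes are skipped.
    static boolean needsRobotsTxt(URI url) {
        String scheme = url.getScheme();
        return scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
    }

    public static void main(String[] args) {
        for (String u : new String[] {"https://yacy.net/", "ftp://ftp.example.org/", "smb://server/share/"}) {
            System.out.println(u + " -> " + needsRobotsTxt(URI.create(u)));
        }
    }
}
```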
orbiter
748abfcffa
added patches to prevent yacy-protocol DoS settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 15:31:15 +00:00
orbiter
e820ed061a
avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
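The commit does not describe the technique. One way to avoid repeated lookups, sketched here under that assumption, is to answer the obvious cases (the name localhost, 127.x.x.x, ::1) without any lookup, collect the machine's own addresses once from NetworkInterface, and cache the verdict per host name. Class and method names are hypothetical:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class LocalHostCheck {

    // Addresses bound to this machine, collected once from the network interfaces (no DNS).
    private static final Set<InetAddress> LOCAL_ADDRESSES = collectLocalAddresses();

    // Per-host cache of earlier answers, so a host name is resolved at most once.
    private static final ConcurrentHashMap<String, Boolean> CACHE = new ConcurrentHashMap<>();

    private static Set<InetAddress> collectLocalAddresses() {
        Set<InetAddress> result = new HashSet<>();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs != null) {
                for (NetworkInterface nif : Collections.list(nifs)) {
                    result.addAll(Collections.list(nif.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            // no interface list available: rely on the loopback checks only
        }
        return result;
    }

    static boolean isLocal(String host) {
        // Cheap string checks first: these never need a lookup.
        if (host.equalsIgnoreCase("localhost") || host.startsWith("127.")
                || host.equals("::1") || host.equals("[::1]")) {
            return true;
        }
        return CACHE.computeIfAbsent(host, LocalHostCheck::lookup);
    }

    private static boolean lookup(String host) {
        try {
            // For IP literals this parses the address; only real host names trigger DNS.
            InetAddress a = InetAddress.getByName(host);
            return a.isLoopbackAddress() || a.isAnyLocalAddress() || LOCAL_ADDRESSES.contains(a);
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLocal("localhost") + " " + isLocal("127.0.0.1") + " " + isLocal("192.0.2.1"));
    }
}
```

The cache in this sketch never expires; a long-running peer whose addresses change would need some form of eviction.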
orbiter
de88200e11
- added Byte Order Mark recognition to serverObjects
The BOM character U+FEFF may appear at the beginning of strings when some browsers prepend the bytes %EF%BB%BF to input values.
see http://en.wikipedia.org/wiki/Byte_order_mark
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6748 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 10:58:40 +00:00
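A minimal sketch of such a check, assuming parameter values have already been URL-decoded as UTF-8 (the helper name stripBom is hypothetical); %EF%BB%BF is the UTF-8 encoding of U+FEFF, so after decoding it becomes a single leading char:

```java
import java.net.URLDecoder;

final class BomStripper {

    private static final char BOM = '\uFEFF';

    // Removes one leading byte order mark from a decoded parameter value.
    static String stripBom(String value) {
        if (value != null && !value.isEmpty() && value.charAt(0) == BOM) {
            return value.substring(1);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        String decoded = URLDecoder.decode("%EF%BB%BFyacy", "UTF-8");
        // prints "5 -> 4": the BOM counts as one char before stripping
        System.out.println(decoded.length() + " -> " + stripBom(decoded).length());
    }
}
```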
orbiter
3300930fc5
- (almost) fixed FTP crawler
- integrated/fixed SMB crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
9623d9e6d2
added an SMB loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
48995e71c4
added soft-auth to general authentication scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 00:07:17 +00:00
orbiter
72f00dee59
removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
57e1eae95e
longer time-out for URL fetching; this may help to show all the links that the statistics report for a search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6727 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 22:23:08 +00:00
orbiter
9e639603e3
after frequent occurrences of 100% CPU usage and permanent blocking, I am trying to disable a function in a method that may cause the problem when calling an external library (Apache Commons HttpClient 3.x). The thread dump that shows the problem is attached here.
at java.lang.StringCoding.encode(StringCoding.java:266)
at java.lang.String.getBytes(String.java:946)
at org.apache.commons.httpclient.util.EncodingUtil.getAsciiBytes(EncodingUtil.java:237)
at org.apache.commons.httpclient.methods.multipart.Part.sendDispositionHeader(Part.java:220)
at org.apache.commons.httpclient.methods.multipart.Part.send(Part.java:308)
at org.apache.commons.httpclient.methods.multipart.Part.sendParts(Part.java:385)
at org.apache.commons.httpclient.methods.multipart.MultipartRequestEntity.writeRequest(MultipartRequestEntity.java:164)
at de.anomic.http.client.Client.zipRequest(Client.java:364)
at de.anomic.http.client.Client.POST(Client.java:339)
at de.anomic.yacy.yacyClient.wput(yacyClient.java:285)
at de.anomic.yacy.yacyClient.transferURL(yacyClient.java:1053)
at de.anomic.yacy.yacyClient.transferIndex(yacyClient.java:942)
at de.anomic.yacy.dht.Transmission$Chunk.transmit(Transmission.java:200)
at de.anomic.yacy.dht.Dispatcher.storeDocumentIndex(Dispatcher.java:397)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:103)
at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:66)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:19:23 +00:00
orbiter
4144927d94
show fewer errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 21:02:08 +00:00
orbiter
b88f5fbb4b
slightly changed crawling policy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:46:08 +00:00
orbiter
7684a575c4
fix for deletion of the error database each time YaCy starts up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 00:33:39 +00:00
orbiter
f561e340c6
show more results from single domains (up to 100) when not fully authorized
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 00:12:58 +00:00
orbiter
c4bdb1e7f2
added one more option in ViewFile to show an iframe, as for the original web page content, but using the cache instead of the direct link to the content on the web. Upgraded the very old CacheResource_p servlet, which was no longer used, to a new and working version.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 23:41:51 +00:00
orbiter
c09a995930
better logging of duplicate occurrences of URLs in the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 20:31:30 +00:00
orbiter
884b262130
- added a new Wiki Namespace Navigator
- some redesign of Navigator data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:25:49 +00:00
orbiter
617dfbbd06
allow 'authorization by encoded password' also if the requesting client connects not from localhost but from the same host that YaCy is running on.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 16:03:55 +00:00
orbiter
599c3766c4
added authentication to automated API call
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 14:10:03 +00:00
orbiter
727dd9b193
- fixed a bug in robots.txt parser
- moved storage of robots.txt entries to WorkTables, so it is now possible to browse the robots entries with the table browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 11:58:07 +00:00
orbiter
54af9e6b49
- added parsing of the robots meta tag in HTML headers to detect a noindex request
- added evaluation and indexing prevention in case a noindex request is given in an HTML file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-03 23:32:56 +00:00
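The commit does not show the parser. A minimal regex-based sketch, assuming the name attribute precedes a quoted content attribute (a real HTML parser would handle any attribute order); it treats both "noindex" and "none" as an indexing ban. The class RobotsMetaTag is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RobotsMetaTag {

    // Matches <meta ... name="robots" ... content="..."> and captures the content value.
    private static final Pattern META_ROBOTS = Pattern.compile(
            "<meta\\s+[^>]*name\\s*=\\s*[\"']?robots[\"'\\s][^>]*content\\s*=\\s*[\"']([^\"'>]*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // True if any robots meta tag carries a noindex (or none) directive.
    static boolean isNoIndex(String html) {
        Matcher m = META_ROBOTS.matcher(html);
        while (m.find()) {
            for (String directive : m.group(1).split(",")) {
                String d = directive.trim();
                if (d.equalsIgnoreCase("noindex") || d.equalsIgnoreCase("none")) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNoIndex("<head><meta name=\"robots\" content=\"noindex, follow\"></head>"));
    }
}
```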
orbiter
46c4f8b68a
better look-ahead into the crawl queue: show more on crawl monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 23:11:58 +00:00
lotus
7b546415dc
added svn6695 for windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 14:58:53 +00:00
orbiter
f175f9a2d3
changed the way the number of search requests is counted:
So far, only search requests at the remote search interface had been counted.
This was done to protect the privacy of searchers, because searches at a peer's own search interface were neither counted nor published.
As a consequence, no search requests on robinson peers had been counted, because they cannot be counted at a remote peer.
This change distinguishes search requests made locally at the local search interface from search requests that arrive at the local interface but were submitted from a remote IP without authentication.
Now 3 counters are maintained:
- partial count of remote searches
- total count of local searches on robinson peers from non-authenticated clients
- total count of local searches on robinson peers from localhost or authenticated clients
The global search request statistic now adds up the first two of these three counters.
Because we have a large number of robinson peers with a large number of remote non-authenticated requests, the statistic should now show at least three times as many search requests as before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 13:53:55 +00:00
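The counting scheme described above can be sketched as three atomic counters, of which only the first two contribute to the global statistic. Field, enum, and method names are hypothetical; the commit does not show YaCy's actual code:

```java
import java.util.concurrent.atomic.AtomicLong;

final class SearchRequestCounters {

    enum Origin { REMOTE_INTERFACE, LOCAL_UNAUTHENTICATED, LOCAL_AUTHENTICATED }

    // partial count of searches at the remote search interface
    final AtomicLong remoteSearches = new AtomicLong();
    // local interface (robinson peers), request from a remote IP without authentication
    final AtomicLong localUnauthenticated = new AtomicLong();
    // local interface (robinson peers), request from localhost or an authenticated client
    final AtomicLong localAuthenticated = new AtomicLong();

    void count(Origin origin) {
        switch (origin) {
            case REMOTE_INTERFACE: remoteSearches.incrementAndGet(); break;
            case LOCAL_UNAUTHENTICATED: localUnauthenticated.incrementAndGet(); break;
            case LOCAL_AUTHENTICATED: localAuthenticated.incrementAndGet(); break;
        }
    }

    // Classifies a request that arrived at the local search interface.
    static Origin classifyLocal(boolean fromLocalhost, boolean authenticated) {
        return (fromLocalhost || authenticated) ? Origin.LOCAL_AUTHENTICATED : Origin.LOCAL_UNAUTHENTICATED;
    }

    // Value published to the global statistic: only the first two counters are added,
    // so searches by the peer owner stay private.
    long publishedTotal() {
        return remoteSearches.get() + localUnauthenticated.get();
    }
}
```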
orbiter
84222e3b4f
fix for auto-updater: delete old libraries before copying the new ones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-24 13:46:50 +00:00
orbiter
93b7ddc27d
fix for http://forum.yacy-websuche.de/viewtopic.php?p=19376#p19376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-21 22:49:35 +00:00
orbiter
8030ed3319
self-healing for lost crawl profile handles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-18 21:55:45 +00:00
orbiter
e3e5e05ec2
fix for problem in ranking setting which was caused by the introduction of a toString() method in serverObjects
see also: http://forum.yacy-websuche.de/viewtopic.php?p=19310#p19310
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-17 21:31:08 +00:00
orbiter
e3ccfb54aa
fix for display problem in Firefox on MacOS X
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-17 09:08:16 +00:00
orbiter
564927ce72
redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-16 23:06:04 +00:00
orbiter
ef62d017e5
integrated session id filtering for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:15:17 +00:00
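The commit does not say which parameters are treated as session ids. The sketch below assumes a small, hypothetical list (jsessionid, PHPSESSID, sid, ...) and removes them from a URL before it is queued, so that one page reached with different session ids maps to a single URL. The class name is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Pattern;

final class SessionIdFilter {

    // Hypothetical list of query parameter names that often carry session ids.
    private static final Set<String> SESSION_PARAMS = new HashSet<>(Arrays.asList(
            "jsessionid", "phpsessid", "sid", "sessionid", "aspsessionid"));

    // Servlet-style path parameter, e.g. /shop;jsessionid=ABC
    private static final Pattern JSESSIONID_PATH = Pattern.compile("(?i);jsessionid=[^/?#]*");

    static String strip(String url) {
        int hash = url.indexOf('#');
        String frag = hash >= 0 ? url.substring(hash) : "";
        String rest = hash >= 0 ? url.substring(0, hash) : url;

        int q = rest.indexOf('?');
        String path = JSESSIONID_PATH.matcher(q >= 0 ? rest.substring(0, q) : rest).replaceAll("");
        if (q < 0) return path + frag;

        // Keep every query parameter whose name is not a known session id.
        StringJoiner kept = new StringJoiner("&");
        for (String pair : rest.substring(q + 1).split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = (eq >= 0 ? pair.substring(0, eq) : pair).toLowerCase(Locale.ROOT);
            if (!SESSION_PARAMS.contains(name)) kept.add(pair);
        }
        return kept.length() == 0 ? path + frag : path + "?" + kept + frag;
    }

    public static void main(String[] args) {
        System.out.println(strip("http://example.org/shop;jsessionid=XYZ?item=5&PHPSESSID=abc"));
    }
}
```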
orbiter
d8d9984913
added framework for session id filtering (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:30:41 +00:00
orbiter
2bc36de336
- fix for bug in svn 6669
- cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:06:13 +00:00
orbiter
d378ca4604
better handling of concurrency in seed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:57:35 +00:00
orbiter
6538043d89
fix for http://forum.yacy-websuche.de/viewtopic.php?p=19189#p19189
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:45:31 +00:00
lotus
945e0ba5a5
allow global search if the resource observer disabled index transmission
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 17:14:16 +00:00
lotus
8faeedd99a
not a fix! for:
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2679
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 09:33:30 +00:00
lotus
11188cd7eb
resource observer now uses the Java 6 method to check for free space. Thus, disk observation now requires Java 6.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-06 18:48:06 +00:00
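Java 6 added File.getFreeSpace() and File.getUsableSpace(); the commit does not say which one the resource observer calls. A minimal sketch assuming getUsableSpace(), which reports the bytes available to this JVM on the partition holding a path; the class, method names and threshold parameter are hypothetical:

```java
import java.io.File;

final class DiskSpaceCheck {

    // Usable space on the partition of the given path, in whole megabytes.
    // getUsableSpace() returns 0 if the path does not name a partition.
    static long usableMegabytes(File path) {
        return path.getUsableSpace() / (1024L * 1024L);
    }

    // True if at least minMegabytes are still usable at the given path.
    static boolean enoughSpace(File path, long minMegabytes) {
        return usableMegabytes(path) >= minMegabytes;
    }

    public static void main(String[] args) {
        File data = new File(args.length > 0 ? args[0] : ".");
        System.out.println(usableMegabytes(data) + " MB usable at " + data.getAbsolutePath());
    }
}
```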
orbiter
be18b5d8cd
fix for 'cannot switch back to default language'-bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 23:53:02 +00:00
orbiter
74e736c903
missing file for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6645 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 14:52:58 +00:00