Commit Graph

7549 Commits

Author SHA1 Message Date
cominch
09bb7a390c do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-12 07:44:23 +00:00
orbiter
c0d9474b31 update to eclipse class path environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-06 14:29:17 +00:00
orbiter
52d799e7c8 fix for solr auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7833 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-05 09:21:30 +00:00
orbiter
9eb8e9acd9 no error message about missing browser in headless environments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-05 06:54:05 +00:00
orbiter
d3c89b90ce temporarily added the old httpclient-3.1 again because the solrj classes need it. should be removed as soon as solrj supports httpclient-4
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 17:04:49 +00:00
orbiter
bd99969758 fixed bad query
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:53:18 +00:00
orbiter
768c59740c - replaced solrj 3.1 with solrj 3.3
- updated also slf4j
- added authentication for solrj


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:35:30 +00:00
orbiter
e7c7598923 docfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 10:48:01 +00:00
low012
c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
*) A corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but it will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 23:55:55 +00:00
orbiter
b84089ff04 fix for solr scheme list definition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 22:59:43 +00:00
orbiter
fd02d6d9f8 fixed solr scheme table view
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7825 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 22:55:36 +00:00
orbiter
4f730a711b same for debian as for latest commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 21:40:12 +00:00
orbiter
60ee245486 setting startup options:
-Xss256k
and
-XX:ReservedCodeCacheSize=1024m 
after a malloc error occurred together with a crash of the jvm, which stated at the end of the log:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32756 bytes for ChunkPool::allocate
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=

this follows the last two points in the list of recommendations. To set appropriate values, the default values from
http://www.oracle.com/technetwork/java/hotspotfaq-138619.html
and
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
were considered

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 21:33:24 +00:00
orbiter
6d2e252bcf fix for:
java.lang.NullPointerException
	at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97)
	at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48)
	at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58)
	at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69)
	at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43)
	at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023)
	at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922)
	at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869)
	at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267)
	at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239)
	at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 20:44:33 +00:00
orbiter
719777b2a7 replaced the reflection-based call to getUsableSpace with a direct call since we now use java 1.6
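
The difference between the two call styles can be illustrated with a minimal sketch. The class and method names below (UsableSpaceSketch, viaReflection, direct) are hypothetical and are not taken from the YaCy sources; only java.io.File.getUsableSpace() is a real API, available since Java 1.6.

```java
import java.io.File;
import java.lang.reflect.Method;

/** Hypothetical illustration only; not the code changed by this commit. */
final class UsableSpaceSketch {

    /** Looks the method up by name at runtime, so the class still loads on a JVM that lacks it. */
    static long viaReflection(final File f) {
        try {
            final Method m = File.class.getMethod("getUsableSpace");
            return ((Long) m.invoke(f)).longValue();
        } catch (final Exception e) {
            return -1L; // method not present or the call failed
        }
    }

    /** Direct call; requires Java 1.6 or later at compile time and at runtime. */
    static long direct(final File f) {
        return f.getUsableSpace();
    }
}
```

Both variants report the usable bytes of the partition that holds the given path; the direct call is checked by the compiler and avoids the lookup and exception handling.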
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 18:13:37 +00:00
orbiter
2d4bb139d3 - added counting of links with noindex tag for solr index
- bugfixes for solr index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 06:40:05 +00:00
orbiter
528b59e078 replaced the xerces.jar library that was originally added in 2005 with SVN 126 to the libx directory and that was moved to lib in SVN 5781
the new replacement is taken from http://xerces.apache.org, has the version 2.11.0, was inside the file Xerces-J-bin.2.11.0.tar.gz
and consists of two files named xercesImpl.jar and xml-apis.jar
The original purpose of that library was to support:
- content parsers
- optional seed uploader
- SOAP API (which will be committed later)
Since the SOAP API does not exist any more, the remaining purpose is to support content parsers and an optional seed uploader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 22:33:35 +00:00
orbiter
e7e1a0f328 replaced commons-io v1.4 with v2.0.1
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7818 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 21:10:13 +00:00
orbiter
5092a14bcb replaced fontbox, jempbox, pdfbox v 1.5 with v1.6
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 20:52:33 +00:00
lotus
68681a9576 hint for proxy scraping
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 17:23:37 +00:00
lotus
fa6f2c2b44 use proxy accounts by default for more security
http://bugs.yacy.net/view.php?id=45

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 17:16:00 +00:00
orbiter
892caccdca added default configuration in ConfigurationSet in case of new values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 00:09:49 +00:00
orbiter
7bf39c8bcf added XX:MaxPermSize to debian and mac start scripts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-01 22:50:46 +00:00
orbiter
bda3eec0ff added parsing of canonical link element to html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-01 16:38:01 +00:00
orbiter
b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:49:21 +00:00
orbiter
214ea005cf added "-XX:MaxPermSize=256m" to start script
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:44:06 +00:00
orbiter
b666a929e7 fixed Semaphore handling in case of interruptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:37:14 +00:00
orbiter
de7a054d77 added a parser for files like the new solr.key.list
it parses text files with the following syntax:
 - all lines beginning with '##' are comments
 - all non-empty lines not beginning with '#' are keyword lines
 - all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines
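
The three rules above can be sketched as follows. This is a hypothetical illustration, not the parser added by this commit: the class name KeywordListSketch, the trimming of whitespace, and the decision to skip a lone '#' are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the line rules; not the parser from this commit. */
final class KeywordListSketch {

    /**
     * Classifies each line and returns keyword -> enabled flag in file order.
     * Commented-out keywords are kept with the flag set to false.
     */
    static Map<String, Boolean> parse(final List<String> lines) {
        final Map<String, Boolean> keywords = new LinkedHashMap<String, Boolean>();
        for (final String raw : lines) {
            final String line = raw.trim();            // assumption: surrounding whitespace is ignored
            if (line.isEmpty()) continue;              // empty lines carry no keyword
            if (line.startsWith("##")) continue;       // '##' marks a comment
            if (line.charAt(0) == '#') {               // single '#': commented-out keyword
                final String key = line.substring(1).trim();
                if (!key.isEmpty()) keywords.put(key, Boolean.FALSE); // assumption: a lone '#' is skipped
            } else {
                keywords.put(line, Boolean.TRUE);      // active keyword line
            }
        }
        return keywords;
    }
}
```

Keeping the commented-out keywords in the result, instead of dropping them, would let an editor show them as switchable entries.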


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:35:45 +00:00
orbiter
6deef60bc0 added keyword list for solr index attributes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:33:27 +00:00
f1ori
a17351dcfe * navigation bar for filetype constraints
javascript interpreted backslashes from urlmask as escaping and didn't forward them to yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:30:24 +00:00
f1ori
96957375cc * fix url proxy for relative links and chromium
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 09:32:02 +00:00
f1ori
fdc84d8319 small pi link on index page to administration pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 09:32:00 +00:00
orbiter
9ebc75db4b fix for channel authorization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 23:14:02 +00:00
orbiter
267290a821 removed the semaphores from the cache dump process because I believe some of the semaphores may get lost somewhere, so that the cache is never flushed and the peer then dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:45:04 +00:00
orbiter
6d9e5865ee faster appearance of search result page (but complete search time is the same)
this was inspired by http://bugs.yacy.net/view.php?id=37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:17:02 +00:00
orbiter
f7ca84cfc0 enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:15:13 +00:00
low012
4fe1329de2 *) trying to at least fix symptoms of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3293#p22791
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7799 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-25 10:15:42 +00:00
orbiter
d8072d1866 added more info to DNS cache in /PerformanceMemory_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-24 08:27:36 +00:00
orbiter
f803da8aae code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-24 00:24:00 +00:00
orbiter
4999740790 added the new navigators to the search trailer json and xml files, which makes them available in the search widget as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-24 00:22:57 +00:00
orbiter
84c9658644 added a file type navigator
added a protocol navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 15:39:52 +00:00
orbiter
31283ecd07 - added a search option to filter for specific network protocols, e.g. to get only results from ftp servers: just add '/ftp' to your search.
For example, search for "passwd /ftp". This can also be done with /http, /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load
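
How a query such as "passwd /ftp" might be split into search terms and a protocol filter can be sketched as below. This is an assumption-laden illustration, not the code from this commit: the class name ProtocolModifierSketch, the method split and the "last modifier wins" behaviour are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of extracting a '/protocol' modifier from a query string. */
final class ProtocolModifierSketch {

    // the four protocols named in the commit message
    private static final List<String> PROTOCOLS = Arrays.asList("http", "https", "ftp", "smb");

    /** Returns a two-element array: the remaining query, and the protocol or null if none was given. */
    static String[] split(final String query) {
        String protocol = null;
        final StringBuilder rest = new StringBuilder(query.length());
        for (final String token : query.trim().split("\\s+")) {
            if (token.startsWith("/")) {
                final String candidate = token.substring(1).toLowerCase(Locale.ROOT);
                if (PROTOCOLS.contains(candidate)) {
                    protocol = candidate;              // assumption: the last modifier wins
                    continue;                          // the modifier is not a search term
                }
            }
            if (rest.length() > 0) rest.append(' ');
            rest.append(token);
        }
        return new String[] { rest.toString(), protocol };
    }
}
```

Tokens that start with '/' but do not name a known protocol are left in the query untouched, so ordinary searches containing a slash are not affected.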

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 11:57:17 +00:00
orbiter
4b425ffdd2 fix for http://bugs.yacy.net/view.php?id=41
added another RSS channel "PROXY". The rss feed for peer news filters out this channel if the access to that channel is not authorized


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-22 10:19:32 +00:00
orbiter
a65ecffef6 fix for http://bugs.yacy.net/view.php?id=42
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-22 10:04:30 +00:00
orbiter
7db208c992 performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-21 23:10:50 +00:00
orbiter
87bd559c42 fixed warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:53:43 +00:00
orbiter
07e89a7ae5 added @Deprecated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:33:45 +00:00
orbiter
9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-20 22:29:45 +00:00
orbiter
996f0a8764 disabled assert in Base64Order which eats away too much performance during testing with -l
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:34:55 +00:00
orbiter
f667b9c289 enhanced identificator: using AtomicInteger for counter
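
Why an AtomicInteger suits a shared counter can be shown with a small sketch. The class IdCounterSketch and its methods are hypothetical and do not reproduce the code changed here; only java.util.concurrent.atomic.AtomicInteger is a real API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a thread-safe id counter; not the code from this commit. */
final class IdCounterSketch {

    private final AtomicInteger counter = new AtomicInteger(0);

    /** Increments and reads in one atomic step, so no two callers see the same value. */
    int nextId() {
        return counter.incrementAndGet();
    }

    int current() {
        return counter.get();
    }

    /** Starts the given number of threads, each drawing perThread ids, and returns the final count. */
    static int hammer(final int threads, final int perThread) {
        final IdCounterSketch ids = new IdCounterSketch();
        final Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) ids.nextId();
                }
            });
            workers[i].start();
        }
        try {
            for (final Thread t : workers) t.join();
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();        // preserve the interrupt status
            throw new IllegalStateException(e);
        }
        return ids.current();
    }
}
```

With a plain int field and counter++, concurrent increments could be lost; with incrementAndGet the final count always equals threads * perThread.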
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-19 13:31:10 +00:00