Commit Graph

5137 Commits

Author SHA1 Message Date
orbiter
279482a76d fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-29 08:45:43 +00:00
orbiter
1b86d06d1e fix for http://bugs.yacy.net/view.php?id=62
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-26 10:07:16 +00:00
orbiter
9e4875230f performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-20 23:06:49 +00:00
orbiter
eb9c9edb01 enhanced table method (used by almost all yacy api interfaces)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8000 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-18 23:38:19 +00:00
orbiter
4ad9fc2bff new snippet strategy for search hits in metadata: show beginning of text instead of hit position
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-13 00:34:52 +00:00
orbiter
a9838f8b99 fix for http://bugs.yacy.net/view.php?id=59
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-12 22:26:48 +00:00
hermens
d3df03838a make sure myself-target is always inserted at its appropriate position
this was previously omitted if the own peer should have been the first target
or the peer was the last peer before the rotation to AAAAAAAAAAAA


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-10 15:23:37 +00:00
hermens
c3e7efa846 added sender side prevention of rwi flooding as mentioned in SVN 7993
saves memory and speeds up enqueueContainers by limiting the size of transfer.Chunk
saves network bandwidth by not transmitting RWIs that would get discarded at the target anyway


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-10 14:35:03 +00:00
orbiter
5af9598bd1 enhanced exported row parsing during row import
this affects the search and dht receive speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7994 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-10 09:46:38 +00:00
orbiter
7598a9e26b fix for thread dump
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-07 23:23:49 +00:00
orbiter
8eef8722d1 update to ThreadDump analysis: freerunner and thread state recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-07 22:53:14 +00:00
orbiter
1df43b137d another performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-06 23:35:14 +00:00
orbiter
7df0643f0e performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-06 23:31:04 +00:00
orbiter
a7df70221e refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-04 09:06:24 +00:00
orbiter
1b45e33f04 added robots tag parser to solr scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 13:39:01 +00:00
orbiter
cf4fd525ee added directDocByURL attribute in crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 12:38:28 +00:00
orbiter
c61e4cfd78 - fix for incomplete clear() in balancer
- renamed Parser Errors to Rejected URLs

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7984 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 10:27:14 +00:00
orbiter
813f297a95 another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 08:26:31 +00:00
orbiter
035ebfbf3b - performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
- this may have also (good) performance side effects on other parts of YaCy


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 07:57:50 +00:00
orbiter
b250e6466d implemented crawl restrictions for IP pattern and country lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-29 15:17:39 +00:00
f1ori
e207c41c8e * fix urlproxy for urls containing dolar signs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-29 12:53:55 +00:00
orbiter
57d5529a01 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-28 21:16:40 +00:00
orbiter
5ad7f9612b added crawl settings for three new filters for each crawl:
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (allows only loading from hosts that are hostet in given countries)

note: the settings and input environment is there with that commit, but the values are not yet evaluated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-27 21:58:18 +00:00
orbiter
47a8c69745 added a new feature to MultiProtocolURIs to get the locale for each url:
This is done using a new library InetAddressLocator.jar which is NOT added by default to YaCy because it is very old and with that library we will never get a debian package. However, some people want that functionality and it can be made available if the library is taken from http://javainetlocator.sourceforge.net/ and placed into the /lib directory where it will be found using reflection.
The new feature will be used to extend the crawler steering.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7975 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-27 15:26:14 +00:00
orbiter
2c3161b4ac refactoring:
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-26 21:42:28 +00:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
low012
42b5f09f68 *) this should fix a bug in snippet creation (also cleaned up a little bit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:07:22 +00:00
low012
277b454a62 *) added comments
*) minor refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 13:16:52 +00:00
orbiter
6b22865dbc - removed some warinings
- removed a dead update location

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-24 01:58:54 +00:00
orbiter
0c6d95e57b - more tolerance against failure of table opening
- more connections for solrj

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-21 15:08:05 +00:00
orbiter
4f31869c5a enhanced search result timing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-21 10:43:08 +00:00
orbiter
6b02b696b0 - add number of search results to end of rss and json output to reflect latest status of retrieval
- distinguish search access with different verify state in access of search cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 19:41:44 +00:00
f1ori
87e6abd168 * fix urls containing a port number in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 15:02:15 +00:00
f1ori
97045022fa * pass cookies to Server Side Includes
* User.html a bit more usable


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-20 14:54:14 +00:00
orbiter
ce2a76d603 performance hack for search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-16 10:00:51 +00:00
orbiter
aaf7a0feaa yet another cache strategy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-15 22:40:01 +00:00
orbiter
8a428d3e77 ensure termination of pdf parser to avoid deadlocking of other processes during search result preparation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-15 11:17:38 +00:00
orbiter
2c4a672fe2 bugfixes and performance hacks for tabe index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-15 11:17:02 +00:00
orbiter
dad5b586a4 added a concurrent warmin-up of Table data structures. that should speed-up the start-up process but may also cause stronger CPU load at that time.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-15 10:01:21 +00:00
orbiter
734059d33e performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 23:34:05 +00:00
orbiter
23e81b28b2 synchronization enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 21:19:02 +00:00
orbiter
dd4635e323 patches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 20:11:27 +00:00
orbiter
bb0c045036 fix for problem with relocation of network
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-13 18:46:11 +00:00
orbiter
85a5487d6d YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-13 14:39:41 +00:00
orbiter
0819e1d397 protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-09 23:00:45 +00:00
orbiter
52a2b3f110 try to fix bug http://bugs.yacy.net/view.php?id=26
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-08 19:13:19 +00:00
orbiter
2cba860693 - fix for wrong entries in NOLOAD indexing queue (that caused that urls had been only indexed based on their url and not loaded)
- patch for better urls to solr admin interface

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-08 12:23:55 +00:00
orbiter
2842ce30d6 added synchronization in ReferenceContainer and logging for shrinking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 22:15:01 +00:00
orbiter
cec3836e73 added reference limitation to IndexControlRWIs_p.html servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 21:47:54 +00:00
sixcooler
ecb4986b38 refactored stuff from last commit to ReferenceContainer
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled per default
to enable this set yacy.conf - index.maxReferences to a value of e.g. 100000

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 18:55:16 +00:00