Commit Graph

86 Commits

Author SHA1 Message Date
orbiter
09987e93fd fixed some more bad handling of byte[]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 22:02:12 +00:00
orbiter
9e853e1977 partly reverting SVN 5818: identical comparator required for join operator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 08:18:01 +00:00
orbiter
63cd152969 fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 22:18:35 +00:00
orbiter
c8624903c6 full redesign of index access data model:
terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 15:29:00 +00:00
orbiter
12d81e98eb - fixed bad search results when searching for empty string
- simplified result handling and page composition in case that nothing was searched


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 11:22:43 +00:00
orbiter
dc2af61bc9 allow up to 50 results from remote peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-13 21:47:57 +00:00
orbiter
c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index.
Moved to version 0.74

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-03 13:23:45 +00:00
orbiter
7ba078daa1 - added fast site-operator
- refactoring merge into BLOBArray

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-02 13:26:47 +00:00
orbiter
6ffc6e3389 more refactoring of indexer and kelondro classes;
- integrating the indexer into kelondro as package 'text'
- renaming of classes in kelondro.index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 10:00:32 +00:00
orbiter
76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:58:14 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- seperated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
lotus
1951d30a62 addendum to last commit
handle words with length < 3 correctly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:43:40 +00:00
lotus
325ba7bfb8 only query words with length > 2
this is not complete, yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 16:41:38 +00:00
orbiter
2f49666908 integrated the character decoding into the parser, removed old code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 20:56:13 +00:00
orbiter
0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
The old process used a not really efficient way to detect html encoding strings in texts.
All calling methods had been adoped to call the new class in an enhanced way with less parameters.

Many classes in interfaces used a XML encoding only (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be controlled again since it points out possible XSS leaks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 18:59:04 +00:00
orbiter
820a03f9d6 - removed some warnings
- used fix in SVN 5233 for ysearch.java and search.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5237 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 20:20:39 +00:00
orbiter
ce57de6cb3 - fixed re-setting of DHT Send/Receive settings
- small change to network grafics: smaller circles / more URLs necessary for full radius; more PPM necessary for full crawling circles
- fixed exclusion search ('-' did not work any more)
- fixed NPE bug when FTP loader wrote to the error-db

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 20:01:10 +00:00
orbiter
1198eeecc7 added language selection to search query:
- the language can be selected using a LANGUAGE:<language> element in the query line, i.e.:
java LANGUAGE:en
- the language can be selected with a post element in google-style syntax with the 'rl' element:
?lr=lang_en&query=java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 07:28:57 +00:00
orbiter
00c1535f84 added ranking and evaluation of language type in a search
the wanted language is taken from the browser user-agent string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 00:04:42 +00:00
orbiter
d3d41e2ee4 - fixed problem with searching with quotes (still not complete, but not as bad as before)
- fixed parsing of crawl-delay statements when seconds were given with float numbers
- enhanced performance of profiling (not too many loggings; not more than one per second)
- removed some debug output
- fixed wrong return type in logging
- added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!)
- fixed wrong word distance computation in RWI management


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-02 23:49:48 +00:00
orbiter
df4ff423c4 added additional properties to query id's to distinguish search events better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-28 21:15:59 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
orbiter
0c173821fd more access security regarding database access and snippet retrieval: restrict number of results for not-authorized searchers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4838 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-23 09:45:33 +00:00
orbiter
56a300f92a bugfix / local-search predicate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-16 21:17:55 +00:00
orbiter
2f29ab8779 more target server access security
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-16 19:50:28 +00:00
lotus
9bc56a9edc xss protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-07 16:37:13 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is, that a local search will no not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
93633abed8 - removed some debugging code from search process - should speed up now
- added some profiling code to search event - more time details in PerformanceSearch_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 00:55:04 +00:00
orbiter
b4ed937f1e - modified zone navigation (does still not work correctly)
- added dht switch in network definition
- 0.574

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4550 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 11:09:38 +00:00
orbiter
8d0470a5c6 new method to compute search history IDs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-10 23:40:56 +00:00
orbiter
bfed9c2da6 - some refactoring in search process
- separated sidebars in new search interface and placed them in their own files
  which can be put in into the search page like plug-ins

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4529 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 21:46:55 +00:00
orbiter
61a81820e3 - refactoring of search tracker
- added link to search history to repeat the search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 23:35:48 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
45339c3db5 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-18 17:14:02 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
ecd7f8ba4e - added NEAR operator (must be written in UPPERCASE in search query)
- more generics
- removed unused commons classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-08 20:12:31 +00:00
orbiter
9d8b17188a more generics, bugfixes for wrong cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 03:39:36 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
orbiter
270d016d89 fix for missing anonymization in search profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 18:57:43 +00:00
orbiter
e3e4f06be4 enhanced search result preparation in the case that no result is found (fast abandon of search)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-12 14:18:42 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
341f7cb327 steps to enhance remote search performance:
- added a file size limitation, that disallows parsing of large documents during (offline-) remote search
- added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-26 10:11:50 +00:00
orbiter
f4a5c287fe re-implemented post-ranking of search results
(should enhanced search result quality)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-08 11:50:19 +00:00