Commit Graph

191 Commits

Author SHA1 Message Date
orbiter
9248a4eef4 reduce teh effect of 'Bildersuche findet generierte HTML-Seiten als Bilder'
see http://bugs.yacy.net/view.php?id=9

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 07:37:46 +00:00
orbiter
76f2817e00 a fix for the snippet computation and hopefully better snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 23:05:38 +00:00
orbiter
deda54d684 - relaxed matching of string-search (this is now case-insensitive)
- added transport of string-search pattern to remote search protocol
- fixed a problem parsing snippets with a '-' inside

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 22:37:06 +00:00
orbiter
15e3a57b4e removed unused functions in condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 09:23:10 +00:00
orbiter
e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-01 15:49:04 +00:00
orbiter
4e8fa03514 added more attributes to html evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 15:36:44 +00:00
orbiter
528da7c9ea removed unused class and added license header for new class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 13:14:30 +00:00
orbiter
f6077b3cc0 added more attributes for html parser and enhanced data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-28 13:09:01 +00:00
orbiter
d8e934c085 better abstraction of http client identification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-26 13:35:29 +00:00
orbiter
b77b8cac0c - enhanced html parser: recognized much more details in the content
- added more properties to solr index
- refactoring
- more constants in switchboard
- fix for some NPEs
- recognition of more images
- removed synchronization in HandleMap (obviously not necessary?)
- added a nolocal configuration to remove excessive dns lookup (works only on allip - default off). Indexes produced with this setting are all flagged with 'local' and are (on purpose) not usable for freeworld because they will be rejected as beeing local.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-21 13:58:49 +00:00
orbiter
3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
- added deletion of solr index in IndexControlRWIs
- added asynchronous adding of large url lists (happens when crawls are startet with file)
- fixed npe in Image display
- replaced language warning with fine logging
- added a domain name cache in Domains that helps to speed up the isLocal property (less DNS lookups)
- added a new storage class for this new cache: KeyList. The domain key list is stored in DATA/WORK/globalhosts.list
- added concurrent solr updates and chunked transfers (50 documents until a commit is done) for high-speed feeding (> 40000 ppm)
- fixed a bug in content scraper that chopped off large parts of crawl lists (using crawl start from file)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-18 16:11:16 +00:00
orbiter
958ff4778e enhanced location search:
search is now done using verify=false (instead of verify=cacheonly) which will cause that much more targets can be found.
This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed.

Added also location parsing from wikimedia dumps. A wikipedia dump can now also be a source for a location search.
Fixed many smaller bugs in connection with location search.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 15:54:19 +00:00
orbiter
c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparisment in concurrent environments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 22:04:23 +00:00
orbiter
b788182954 some enhancements to scoring speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 15:17:00 +00:00
orbiter
01690eab86 fix for mediawiki importer and wikicode parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-13 13:22:27 +00:00
orbiter
4c013d9088 more UTF8 getBytes() performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-12 05:02:36 +00:00
orbiter
564184909a enhanced the surrogate parser: better reading of UTF-8 characters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-01 11:05:42 +00:00
orbiter
156cf02703 - added an index constraint 'has location' to the condenser
- added evaluation of the 'has location' constraint to search using the /location operator


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-31 09:41:30 +00:00
orbiter
0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
- added parser for in-text appearing geo-locations
- added geo-locations to rss search result
- added evaluation of metadata-attached geo-locations in yacysearch_location to show search results within a map


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 23:26:36 +00:00
orbiter
9b25d07295 - added geo information parsing to html parser
- extended metadata information in index with geolocalisation
- added display of location in yacydoc and ViewFile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 00:49:47 +00:00
orbiter
f3baaca920 - enhancements to DNS IP caching and crawler speed
- bugfixes (NPEs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 09:34:10 +00:00
orbiter
78d4c45d09 enhancement during search process: fast fail of search in case that all index feeder have terminated.
This change should affect filtering and navigators and should cause that search navigation gets faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 13:05:51 +00:00
orbiter
a50f28e6e7 - fixed missing save operation for peer name change
- fixed import of mediawiki dump files
- added script to add mediawiki dump files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-19 23:52:09 +00:00
orbiter
1989ebc24b removed more warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 22:52:30 +00:00
orbiter
8f11d3a5bb redesigned the ScoreMap classes:
- new concurrent score map using atom operation from java concurrency classes
- redesigned difference beween StaticScore and Dynamic Score into ScoreMap and ReversibleScoreMap allowed that many classes can now use simple ScoreMap Objects which can be used better in concurrent environments using the ConcurrentScoreMap
- switched from DynamicScore to ConcurrentScoreMap usage wherever possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-13 01:41:44 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
orbiter
30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7580 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 12:35:32 +00:00
lotus
cb6d307bba adding extension for parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 20:36:01 +00:00
orbiter
3820525464 more memory protection: auto-flush of caches in case of memory shortage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 16:32:34 +00:00
orbiter
e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 09:29:05 +00:00
low012
3b40b98256 *) set SVN properties
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 01:51:51 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
8d14916c74 more patches for a better out-of-memory management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 01:45:11 +00:00
orbiter
f8d0454c53 small bug fixes and experiments with search speed enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 14:29:22 +00:00
orbiter
5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 21:11:53 +00:00
orbiter
a92d80a545 performance enhancements using an alternative to a insensitive collator (a complex string compare):
- less synchronizations
- better speed
..at most important and commonly used classes: http headers, url parsing and html parsing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-25 11:23:57 +00:00
orbiter
e717bf74ba more logging, more care about OOMs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 09:51:05 +00:00
orbiter
5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a muti-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants that all robinson peers are asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased.
Please see new coments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-13 17:37:28 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
orbiter
0cdfb82963 replaced more appearance of double values by float values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:06:29 +00:00
orbiter
eb12e15738 moved all Double values to Float values because of
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
YaCy does not really need double-precision floating point computation anywhere, so this should not affect any feature

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-01 23:49:11 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
low012
9f38c0023d *) Minor changes, mainly cleaning up a little bit, no functional changes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 20:24:52 +00:00
orbiter
54e77e6255 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-10 08:40:41 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
low012
9eae33f886 *) Ooops...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 13:04:48 +00:00
low012
a001e8075c *) minor enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 13:03:49 +00:00
low012
11ea966f9e *) added SID file (Commodore 64) sound file parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 12:06:04 +00:00
orbiter
3ca06d6290 patch for http://forum.yacy-websuche.de/viewtopic.php?p=21460#p21460
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 23:57:29 +00:00
low012
936e976c23 *) added FreeMind (http://freemind.sourceforge.net/) mindmap parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 20:13:31 +00:00