44 Commits

Author SHA1 Message Date
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused an out-of-bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

Objects that declare a finalize method cannot be reclaimed by the garbage collector right away. Instead, they are enqueued on a Java-internal finalizer queue, and their memory stays allocated until the finalizer has run. This slows down all operations that use many objects with finalize methods.

This fix does not remove all finalize methods, only those that may be used for throw-away objects that are allocated many times. This should improve run-time performance and cause fewer OutOfMemoryErrors.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00
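A minimal sketch of the general alternative to a finalizer: release resources through an explicit close() call so that instances stay ordinary, cheaply collectable garbage. The class and method names here (ExplicitCloseSketch, Buffer, useAndRelease) are hypothetical and not taken from the YaCy sources, and try-with-resources requires Java 7 or later, which is newer than the code base at the time of this commit.

```java
public class ExplicitCloseSketch {

    // Hypothetical throw-away object: no finalize() method, cleanup is explicit.
    static final class Buffer implements AutoCloseable {
        private byte[] data = new byte[1024];

        boolean isOpen() {
            return data != null;
        }

        @Override
        public void close() {
            // Drop the reference deterministically instead of waiting for a finalizer.
            data = null;
        }
    }

    // Uses a Buffer and reports whether it is still open afterwards.
    static boolean useAndRelease() {
        Buffer b = new Buffer();
        try (Buffer scoped = b) {
            if (!scoped.isOpen()) {
                throw new IllegalStateException("buffer should be open while in use");
            }
        }
        // close() has been called by try-with-resources at this point.
        return b.isOpen();
    }
}
```

Because Buffer declares no finalize method, the JVM never has to route its instances through the finalizer queue described in the commit message.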
orbiter
f204076d25 removed usage of temporary files: causes too much IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 22:17:18 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
orbiter
9ddb8e4a43 set an option for the java-internal image parser that prevents the image from being cached on the file system in a temporary file. This should speed up image parsing during image indexing dramatically and should also improve performance when showing the yacy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
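The commit message does not name the option. A plausible candidate, stated here as an assumption, is the standard javax.imageio.ImageIO.setUseCache(false) switch, which tells ImageIO to use memory-based rather than disk-based caching for image streams. The sketch below only illustrates that call; the class name ImageCacheSketch and the round-trip helper are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ImageCacheSketch {

    // Encodes an image to PNG in memory and decodes it again,
    // with the disk-based ImageIO cache switched off.
    static BufferedImage roundTrip(BufferedImage in) {
        // Assumption: this is the kind of option the commit refers to.
        ImageIO.setUseCache(false);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(in, "png", out);
            return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new IllegalStateException("in-memory image round trip failed", e);
        }
    }
}
```

The setting is global to ImageIO, so it is normally applied once at start-up rather than per image as in this sketch.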
orbiter
e0da0a84b0 performance fix in http parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:52 +00:00
orbiter
89b4fff1c2 adapted ant script for new exif library
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:36:38 +00:00
orbiter
24e5faee75 added exif parsing for jpg images
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:23:38 +00:00
orbiter
82f76e1296 removed log line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 20:31:38 +00:00
orbiter
0f8004f9da enhanced html parser to recognize a href tags inside header tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 17:52:07 +00:00
orbiter
54af9e6b49 - added parsing of the robots meta-tag in html headers to detect a noindex request
- added evaluation of that request: a document is not indexed if its html file asks for noindex

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-03 23:32:56 +00:00
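As a rough illustration of what detecting a noindex request involves, the sketch below scans an HTML string for a robots meta-tag and checks its comma-separated directives. It is not the YaCy parser: the class name RobotsMetaSketch, the isNoIndex helper, and the regular expression are assumptions, and the pattern only handles the common attribute order name before content.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RobotsMetaSketch {

    // Matches <meta name="robots" content="..."> with name before content (simplification).
    private static final Pattern ROBOTS = Pattern.compile(
            "<meta\\s+name\\s*=\\s*[\"']robots[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns true if any robots meta-tag in the document lists the noindex directive.
    static boolean isNoIndex(String html) {
        Matcher m = ROBOTS.matcher(html);
        while (m.find()) {
            for (String token : m.group(1).split(",")) {
                if (token.trim().equalsIgnoreCase("noindex")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

An indexer would call such a check after parsing and skip storing the document when it returns true.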
lotus
38a3d55afd added more possible php extensions for html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-24 20:04:31 +00:00
orbiter
56e0d9bd01 - tests with image parser
- added image size as part of parsed text in images
- avoid unnecessary error messages if parsing of documents failed but one succeeded


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6597 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-19 14:59:58 +00:00
orbiter
7d400b17d0 html parser support for .cfm files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6590 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-17 16:29:49 +00:00
orbiter
f6731c6240 more logging etc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-17 00:41:50 +00:00
orbiter
007f8297de added php3 as extension type for html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-16 15:53:18 +00:00
orbiter
5df628a2a4 - added BEncoder class
- added BEncodedHeap class that encodes B data structures and stores them in a heap
- refactoring of MapView; it is now named MapHeap to fit the naming scheme of BEncodedHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-13 16:21:37 +00:00
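Assuming "B data structures" refers to bencoding, the format used in torrent metadata files, a minimal encoder can be sketched as follows. This is not the BEncoder class from the commit; the class name BencodeSketch is hypothetical, and strings are assumed to be ASCII so that character count equals byte count.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BencodeSketch {

    // Encodes integers, ASCII strings, lists and string-keyed maps in bencode notation.
    static String encode(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "i" + value + "e";               // integer: i<digits>e
        }
        if (value instanceof String) {
            String s = (String) value;
            return s.length() + ":" + s;            // string: <length>:<data>, ASCII assumed
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("l");
            for (Object item : (List<?>) value) {
                sb.append(encode(item));            // list: l<items>e
            }
            return sb.append('e').toString();
        }
        if (value instanceof Map) {
            // Dictionary keys must appear in sorted order, so copy into a TreeMap.
            TreeMap<String, Object> sorted = new TreeMap<String, Object>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            StringBuilder sb = new StringBuilder("d");
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                sb.append(encode(e.getKey())).append(encode(e.getValue()));
            }
            return sb.append('e').toString();       // dictionary: d<key><value>...e
        }
        throw new IllegalArgumentException("unsupported type: " + value);
    }
}
```

For example, the string spam encodes to 4:spam and the integer 42 to i42e.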
orbiter
2113fcd7e5 - fixed usage of isEmpty() which is not available in java 1.5
- increased visibility of some methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6564 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 12:33:40 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
orbiter
3f771d2a16 fix for rss parser: be lazy when rss is not well-formed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6552 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 01:02:23 +00:00
orbiter
dff4f95c78 some patches to get the torrent parser working
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 00:42:12 +00:00
orbiter
fbd24c2d84 integrated the torrent parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-04 16:07:31 +00:00
orbiter
bd32f8b8cb added a torrent metadata file parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6546 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-03 23:26:15 +00:00
orbiter
a37878b7d5 url parser regex performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-10 14:40:32 +00:00
orbiter
8281e29963 - more configuration for profiling graph (number of events)
- more logging for a shutdown: print reason and accessing IP into log


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-08 14:25:51 +00:00
orbiter
e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 14:01:19 +00:00
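The idea of presetting HashMap dimensions can be sketched as below, assuming the default load factor of 0.75: a map created with enough initial capacity for the expected number of entries does not need to grow and re-hash while it is filled. The class name PresizedMapSketch and both helpers are hypothetical and not taken from the commit.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapSketch {

    // Initial capacity large enough that `expected` entries stay below
    // the resize threshold (capacity * 0.75) of a default HashMap.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    // Fills a presized map; no re-hashing should be triggered while inserting.
    static Map<String, Integer> build(int expected) {
        Map<String, Integer> map = new HashMap<String, Integer>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        return map;
    }
}
```

The benefit only appears when the expected size is known or can be estimated before the map is filled.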
orbiter
4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 00:37:59 +00:00
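The replacement pattern can be illustrated with a small sketch. The collection used here, java.util.concurrent.ConcurrentLinkedQueue, is chosen for illustration and is not named in the commit; its documentation states that size() is not a constant-time operation, while isEmpty() only has to look at the head of the queue. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

public class IsEmptySketch {

    // Old style: may traverse the whole collection just to compare with zero.
    static boolean hasEntriesSlow(Collection<?> c) {
        return c.size() > 0;
    }

    // New style: same answer, without counting the elements.
    static boolean hasEntries(Collection<?> c) {
        return !c.isEmpty();
    }

    // Builds a queue with n elements for demonstration purposes.
    static ConcurrentLinkedQueue<Integer> queueOf(int n) {
        ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<Integer>();
        for (int i = 0; i < n; i++) {
            q.add(i);
        }
        return q;
    }
}
```

Both methods always agree on the result, so the rewrite is safe wherever only emptiness is tested.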
orbiter
969123385b added json and rss output for image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-23 16:10:50 +00:00
orbiter
d183f8d980 refactoring (moved code from ContentTransformer to TemplateEngine)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 14:57:00 +00:00
orbiter
dbdf2570ba added comparator and more fixes for SortStack/SortStore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 03:30:48 +00:00
orbiter
d2938c44a1 - added bmp parser to the document parsers
- image parsers that implement the document parser interface return the image itself in the document's list of images, so that the parsed images should contribute to the image search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-19 23:22:53 +00:00
orbiter
06d0dcde20 more enhancements to image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6490 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-19 00:43:42 +00:00
orbiter
2d8f3ee301 some performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-18 16:03:28 +00:00
orbiter
a97fdb4566 catch for NPE in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-07 23:39:31 +00:00
orbiter
cd6745b292 accept rss feeds without channel descriptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6464 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 22:46:21 +00:00
orbiter
08f1cbb125 another update to the pdf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 22:41:37 +00:00
orbiter
605e896d6c more details for exception catching when parsing pdfs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 19:47:24 +00:00
orbiter
4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 20:28:37 +00:00
orbiter
11f7da06ed - fixes to csv parser
- automatic OAI-PMH import by just clicking on one link from the provided resource list

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6449 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 21:18:19 +00:00
orbiter
9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from
http://roar.eprints.org/index.php?action=csv
- integrated the csv parser into the crawler's parser list
- added an extension to the OAI-PMH import function to download and show the roar csv file using the csv parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 20:10:59 +00:00
orbiter
52470d0de4 - fix for xls parser
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-22 22:38:04 +00:00
orbiter
26fafd85a5 - more refactoring
- fixed problem with parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 15:12:34 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they now depend only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00