yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
reger	c6687dd560	fix a system.out to log.fine in bmpParser	2015-10-25 00:26:45 +02:00
Michael Peter Christen	ac034db8bc	Merge branch 'master' of https://github.com/luccioman/yacy_search_server # Conflicts: # htroot/js/highslide/highslide.js # source/net/yacy/document/ImageParser.java	2015-10-24 11:22:35 +08:00
luc	5902ce032e	Corrected NullPointerException case when ImageIO reader is not found for image format.	2015-10-19 14:11:26 +02:00
reger	c6495a5b62	add a log entry on parsing ajax crawling scheme snapshot (prev. commit `9252e36aeb`)	2015-10-18 06:19:12 +02:00
reger	9252e36aeb	implement ajax crawling scheme for ajax sites which adhere to the proposed use of hash-bangs to provide html content see freshly deprecated https://developers.google.com/webmasters/ajax-crawling/ Implementation improves parsing of the homepage (ajax page) which uses metatag "fragment" in header and parses supplied html snapshot instead of mostly empty ajax/scripted page. Implementation supports also hash-bang urls (url with anchor starting with ! like ...path#!hashfragment) but our crawler filters it (use of hash-bang is controversially discussed and proposal is deprecated, makes no sense to adjust the crawler, but as long as it is used by some sites the minor change/improvement in htmlparser is good for some time). Quick - how does it work - if metatag fragment with content "!" is found - htmlparser tries to get content of html snapshot (using a different url) - htmlparser returns 2 documents (original url and snapshot content - but using same original url) - after parsing result documents are joined (and stored to index containing content also from snapshot page... as the original ajax page contains typically no parseable html content)	2015-10-18 05:51:01 +02:00
Michael Peter Christen	7d075a1d76	added log lines	2015-10-16 23:30:04 +02:00
luc	d6522fa4a2	Integrated haraldk/TwelveMonkeys library to first add TIF image format support.	2015-10-15 10:06:51 +02:00
reger	78e8c6f3e5	refactor special handling (static override) of SUPPORTED_EXTENSIONS/MIME_TYPES not used for genericImageParser	2015-10-11 01:23:52 +02:00
reger	d54c5d310a	add links with image extension not automatically to image links. With the wide spread use e.g. of Wikimedia the url file extension of links with image extension often point to html.	2015-10-10 23:49:58 +02:00
reger	851e8f6c8a	check jpeg file signature in genericImageParser to fail early without further object allocation if source is not a jpeg.	2015-10-05 01:58:31 +02:00
reger	d5330391de	remove some unused var allocation in parser	2015-10-01 23:11:58 +02:00
reger	7c82cd4415	add an end condition to svgParser for wrong content (if parser chosen just by file extension)	2015-09-29 22:57:33 +02:00
reger	356d4d1301	remove rdfParser from init (current function identical with genericParser)	2015-09-26 17:30:34 +02:00
reger	c647d899e3	add svgParser to parse metadata from svg images Reads document level included title and description and skips the graphic content to save bandwidth. svg metadata element is not interpreted - remove rdfParser from init (current function identical with genericParser)	2015-09-26 17:27:33 +02:00
reger	bad34804fe	optimize parseInt for <img> tag attribute parsing Performance better as using Numberformat.parse or parseInt(substring())	2015-09-26 15:42:23 +02:00
reger	2f51baff4f	check for loading error (includes unsupported formats) to prevent blank thumbnail display in image search because of not handled source which don't load on click. Now the cross icon indicates the problem (including not supported format)	2015-09-24 01:58:19 +02:00
reger	a3195d78ae	add Portuguese month names to date recognition	2015-09-20 23:28:42 +02:00
reger	d2cc11ea8f	fix html parser taking <style> content as text. Noticed some result description contain css content from style tag. Added <style> to tag list to scrape its content not as text + test case included	2015-09-19 05:30:55 +02:00
reger	1e8369e18b	use a parsed date in Document.toString	2015-09-12 22:00:40 +02:00
reger	41c4eade51	extract modification date from vCard (vcfParser)	2015-09-06 04:28:27 +02:00
reger	8768896975	extract lastmodified from openoffice doc set lastmod date in office document parsers	2015-09-06 00:04:54 +02:00
sixcooler	a3dd4be749	added / corrected charset to be 1.7 compatible. @Orbiter: please check if this is ok for you	2015-08-10 20:53:20 +02:00
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-defined bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every category. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	2015-08-10 14:27:44 +02:00
Michael Peter Christen	7b412e8c07	added msg (text emails) format; should be handled by html parser.	2015-07-08 17:36:37 +02:00
Ryszard Goń	59096935d0	Use language-detection library for increased accuracy	2015-07-02 18:41:13 +02:00
Michael Peter Christen	90f75c8c3d	added enrichment of synonyms and vocabularies for imported documents during surrogate reading: those attributes from the dump are removed during the import process and replaced by new detected attributes according to the setting of the YaCy peer. This may cause that all such attributes are removed if the importing peer has no synonyms and/or no vocabularies defined.	2015-07-02 00:23:50 +02:00
Michael Peter Christen	7829480b82	refactoring: separated condenser and tokenizer	2015-07-01 18:28:18 +02:00
Michael Peter Christen	593de05922	enhanced surrogate import process speed (dramatically!)	2015-06-29 12:28:34 +02:00
reger	7478338a40	remove augmented parsing activation from frontend experimental implementation not used and based on error prone experimental rdfaparser	2015-06-05 00:51:00 +02:00
reger	11aa2edfe1	remove RDFa parser activation from frontend reason: experimental implementation of RDFa parser not executed (limited to special urls) but may cause error on normal html parsing due to an inputstream.reset	2015-06-05 00:15:16 +02:00
Michael Peter Christen	d0aff91f23	fix for index import	2015-06-01 01:56:09 +02:00
Michael Peter Christen	b43811d38c	added surrogate import process for exported solr dumps. Just throw your solr dump file into DATA/SURROGATES/in/ and it will be imported!	2015-05-30 13:19:59 +02:00
reger	8a9622c31c	fix string OoB on getImagelinks with long alttext in description calculation	2015-05-24 01:59:40 +02:00
Michael Peter Christen	ff29b0e503	added option to re-index exported xml snapshot dumps to HTCACHE/snapshots by just placing them in the SURROGATES/in path	2015-05-08 15:30:26 +02:00
Michael Peter Christen	6f4fe4b175	revert of `8a7c68e4c7` keeping surrogates after processing is essential for some users. If the space they are taking is too high, please set up an automatic deletion process (like a cronjob).	2015-05-08 14:01:30 +02:00
Michael Peter Christen	fed26f33a8	enhanced timezone management for indexed data: to support the new time parser and search functions in YaCy a high precision detection of date and time on the day is necessary. That requires that the time zone of the document content and the time zone of the user, doing a search, is detected. The time zone of the search request is done automatically using the browsers time zone offset which is delivered to the search request automatically and invisible to the user. The time zone for the content of web pages cannot be detected automatically and must be an attribute of crawl starts. The advanced crawl start now provides an input field to set the time zone in minutes as an offset number. All parsers must get a time zone offset passed, so this required the change of the parser java api. A lot of other changes had been made which corrects the wrong handling of dates in YaCy which was to add a correction based on the time zone of the server. Now no correction is added and all dates in YaCy are UTC/GMT time zone, a normalized time zone for all peers.	2015-04-15 13:17:23 +02:00
Michael Peter Christen	b060ba900d	added parsing of contentprop attribute in html tags for content='startDate' and content='endDate'. The value of these field is now written to new solr fields startDates_dts and endDates_dts.	2015-04-13 16:20:00 +02:00
Michael Peter Christen	4cb4f67f38	added parsing of dd, dt and article html fields. The parsed result is written to special solr fields which are deactivated by default.	2015-04-12 22:02:45 +02:00
Michael Peter Christen	4d00175157	<experimental> added parsing of <article> html element. Whenever such an element occurs, the complete content of all article elements replaces the parsed <content> part of documents.	2015-04-10 16:16:20 +02:00
reger	2e8c24e02a	fix link to DeReWo download file	2015-03-11 20:02:23 +01:00
Michael Peter Christen	893889bc7b	added special terms for on: - Date modifier: tomorrow, today; i.e.: search for: "Berlin on:tomorrow" to find events happening tomorrow in Berlin	2015-03-02 13:10:05 +01:00
Michael Peter Christen	535f1ebe3b	added a new way of content browsing in search results: - date navigation The date is taken from the CONTENT of the documents / web pages, NOT from a date submitted in the context of metadata (i.e. http header or html head form). This makes it possible to search for documents in the future, i.e. when documents contain event descriptions for future events. The date is written to an index field which is now enabled by default. All documents are scanned for contained date mentions. To visualize the dates for a specific search results, a histogram showing the number of documents for each day is displayed. To render these histograms the morris.js library is used. Morris.js requires also raphael.js which is now also integrated in YaCy. The histogram is now also displayed in the index browser by default. To select a specific range from a search result, the following modifiers had been introduced: from:<date> to:<date> These modifiers can be used separately (i.e. only 'from' or only 'to') to describe an open interval or combined to have a closed interval. Both dates are inclusive. To select a specific single date only, use the 'to:' - modifier. The histogram shows blue and green lines; the green lines denote weekend days (saturday and sunday). Clicking on bars in the histogram has the following reaction: 1st click: add a from:<date> modifier for the date of the bar 2nd click: add a to:<date> modifier for the date of the bar 3rd click: remove from and date modifier and set a on:<date> for the bar When the on:<date> modifier is used, the histogram shows an unlimited time period. This makes it possible to click again (4th click) which is then interpreted as a 1st click again (sets a from modifier). The display feature is NOT switched on by default; to switch it on use the /ConfigSearchPage_p.html servlet.	2015-03-02 04:30:10 +01:00
reger	2d2299f484	fix mimetype of rss items in rss parser - remove self reference as anchor for items	2015-02-25 01:58:42 +01:00
Michael Peter Christen	b432049d59	enhanced date parsing time	2015-02-25 01:05:46 +01:00
reger	a0f04db9ea	add extracted description/subject to pptParser	2015-02-22 05:31:56 +01:00
reger	7e35518787	add extracted description/subject to docParser	2015-02-16 00:50:16 +01:00
Michael Peter Christen	1f5b5c0111	npe fix for latest scraper feature	2015-02-10 08:33:30 +01:00
Michael Peter Christen	ee97302a23	hack to make date detection faster (while it becomes a bit incomplete regarding language alternatives)	2015-02-09 18:46:06 +01:00
Michael Peter Christen	b5ac29c9a5	added a html field scraper which reads text from html entities of a given css class and extends a given vocabulary with a term consisting with the text content of the html class tag. Additionally, the term is included into the semantic facet of the document. This allows the creation of faceted search to documents without the pre-creation of vocabularies; instead, the vocabulary is created on-the-fly, possibly for use in other crawls. If any of the term scraping for a specific vocabulary is successful on a document, this vocabulary is excluded for auto-annotation on the page. To use this feature, do the following: - create a vocabulary on /Vocabulary_p.html (if not existent) - in /CrawlStartExpert.html you will now see the vocabularies as column in a table. The second column provides text fields where you can name the class of html entities where the literal of the corresponding vocabulary shall be scraped out - when doing a search, you will see the content of the scraped fields in a navigation facet for the given vocabulary	2015-01-30 13:20:56 +01:00
Michael Peter Christen	de3e373913	using precompiled CommonPattern.TAB for split	2015-01-29 02:22:28 +01:00
Michael Peter Christen	1f5047b15f	using precompiled pattern CommonPattern.SEMICOLON for splits	2015-01-29 02:19:41 +01:00
Michael Peter Christen	69eacdf4eb	applying precompiled CommonPattern.COMMA.split to all places where split(",") was used	2015-01-29 01:46:22 +01:00
reger	5ca0762179	fix: oom on parsing ico file by genericImageParser trace: java.lang.OutOfMemoryError: Java heap space at java.awt.image.DataBufferInt.<init>(DataBufferInt.java:75) at java.awt.image.Raster.createPackedRaster(Raster.java:467) at java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:1032) at java.awt.image.BufferedImage.<init>(BufferedImage.java:331) at net.yacy.document.parser.images.bmpParser$IMAGEMAP.<init>(bmpParser.java:149) at net.yacy.document.parser.images.bmpParser.parse(bmpParser.java:69) at net.yacy.document.parser.images.genericImageParser.parse(genericImageParser.java:116)	2015-01-24 23:17:07 +01:00
Michael Peter Christen	4144c7cc52	do not write frame links to webgraph	2015-01-06 14:14:25 +01:00
reger	3ac1d14a21	improve TexParser.mimeOf( fileextension ) by returning 1st defined in supported list. This prevents unusual mapping of supported fileextension -> mimetype (like htm=application/x-tex)	2015-01-02 04:20:02 +01:00
Michael Peter Christen	d2792a43fd	do not write iframe and embed links into webgraph, but use them anyway for crawling	2015-01-02 02:44:03 +01:00
Michael Peter Christen	6ad43c4a8b	removed debug code	2014-12-22 14:24:09 +01:00
Michael Peter Christen	9e588944fa	prevent NPE during initialization of very large vocabularies	2014-12-21 19:02:36 +01:00
Michael Peter Christen	8c3e5b7b6d	added experimental pdf splitting which enables YaCy to split pdfs during parsing into individual pages and add them all using different URLs. These constructed urls are generated from the source url with an appended page=<pagenumber> attribute to the url get/post properties. This will distinguish the different page entries. The search result list will then replace the post parameter with a url anchor # mark which causes that the original url is presented in the search result. These URLs can be opened directly on the correct page using pdf.js which is now built-in into firefox. That means: if you find a search hit on page 5 and click on the search result, firefox will open the pdf viewer and shows page 5.	2014-12-21 18:10:15 +01:00
Michael Peter Christen	65125439fe	added query modifier 'on'. This makes it possible to search for date occurrences within the (web) page documents (not the document last-modified!). This works only if the solr field dates_in_content_sxt is enabled. A search request may then have the form "term on:<date>", like gift on:24.12.2014 gift on:2014/12/24 * on:2014/12/31 For the date format you may use any kind of human-readable date representation(!yes!) - the on:<date> parser tries to identify language and also knows event names, like: bunny on:eastern .. as long as the date term has no spaces inside (use a dot). Further enhancement will be made to accept also strings encapsulated with quotes.	2014-12-16 13:53:12 +01:00
Michael Peter Christen	8b5d074715	fix for image parser (there is a class missing!)	2014-12-16 12:10:15 +01:00
reger	9edc7308aa	update to metadata-extractor-2.7.0.jar add 2 simple JUnit test cases for jpeg and tif parsing	2014-12-15 20:45:05 +01:00
Michael Peter Christen	bbf0ac40c3	add the actual DateDetection class... (missed in latest commit)	2014-12-14 13:43:30 +01:00
Michael Peter Christen	66b5a56976	Added and integrated new date detection class which can identify date notions within the fulltext of a document. This class attempts to identify also dates given abbreviated or with missing year or described with names for special days, like 'Halloween'. In case that a date has no year given, the current year and following years are considered. This process is therefore able to identify a large set of dates to a document, either because there are several dates given in the document or the date is ambiguous. Four new Solr fields are used to store the parsing result: dates_in_content_sxt: if date expressions can be found in the content, these dates are listed here in order of the appearances dates_in_content_count_i: the number of entries in dates_in_content_sxt date_in_content_min_dt: if dates_in_content_sxt is filled, this contains the oldest date from the list of available dates #date_in_content_max_dt: if dates_in_content_sxt is filled, this contains the youngest date from the list of available dates, that may also be possibly in the future These fields are deactivated by default because the evaluation of regular expressions to detect the date is yet too CPU intensive. Maybe future enhancements will cause that this is switched on by default. The purpose of these fields is the creation of calendar-like search facets, to be implemented next.	2014-12-14 13:40:45 +01:00
Michael Peter Christen	6a1865f507	refactoring date -> lastModified	2014-12-11 23:37:41 +01:00
Michael Peter Christen	8df8ffbb6d	enhanced the snapshot functionality: - snapshots can now also be xml files which are extracted from the solr index and stored as individual xml files in the snapshot directory along the pdf and jpg images - a transaction layer was placed above of the snapshot directory to distinguish snapshots into 'inventory' and 'archive'. This may be used to do transactions of index fragments using archived solr search results between peers. This is currently unfinished, we need a protocol to move snapshots from inventory to archive - the SNAPSHOT directory was renamed to snapshot and contains now two snapshot subdirectories: inventory and archive - snapshots may now be generated by everyone, not only such peers running on a server with xkhtml2pdf installed. The expert crawl starts provides the option for snapshots to everyone. PDF snapshots are now optional and the option is only shown if xkhtml2pdf is installed. - the snapshot api now provides the request for historised xml files, i.e. call: http://localhost:8090/api/snapshot.xml?urlhash=Q3dQopFh1hyQ The result of such xml files is identical with solr search results with only one hit. The pdf generation has been moved from the http loading process to the solr document storage process. This may slow down the process a lot and a different version of the process may be needed.	2014-12-09 16:20:34 +01:00
reger	28456dfc09	skip creation of unused Bluelist contenttransformer	2014-12-02 21:03:00 +01:00
Michael Peter Christen	321840fde3	Replaced all fixed thread pools with cached thread pools. The cached thread pools will flush their cached (dead) threads after 60 seconds. This will cause that YaCy now runs constantly with about 50 threads, about 100 at peak times. Previously, about 400 threads had been cached and kept in a hibernation state, which caused that the numproc counter in /proc/user_beancounters (exists only in VM-hosted linux) was as high as the cached number of threads. This caused that VM supervisors terminated whole VM sessions if a limit was reached. Many VM providers have limits of numproc=96 which made it virtually impossible to run YaCy on such machines. With this change, it will be possible to run many YaCy instances even on VM hosts.	2014-12-02 16:26:07 +01:00
Michael Peter Christen	a1ee101079	recognize more html file extensions	2014-12-02 12:10:44 +01:00
reger	0c97cc2440	skip unused call parameter for hashSentence()	2014-11-30 19:42:33 +01:00
reger	5790c7242e	skip to tokenize punctuation as word in WordTokenizer remove unused variables in condenser related to Tokenizer	2014-11-29 17:16:05 +01:00
Michael Peter Christen	6a2a669db4	added loading of the synonyms file from addon/synonyms into the knowledge loader	2014-11-19 17:36:56 +01:00
Michael Peter Christen	07c5b57953	removed warnings	2014-10-15 11:19:25 +02:00
reger	59c6532a65	add link extraction to pdfParser this extracts clickable links in pdf and adds it to the list of links include a test case for this function this is the corrected comment for commit: `aa2e15d846`	2014-10-06 04:51:31 +02:00
reger	aa2e15d846	allow url parameter in worktable apicall allow url=wwwl?param=a&param=b (with ?, & encoded) fix: http://mantis.tokeek.de/view.php?id=100 fix double adding of '&' in MultiProtocolURL.escape()	2014-10-05 20:05:03 +02:00
reger	b0c87d8240	fix image search expand box, cut-off of 2nd capture line height tested with IE11 and Firefox 32 (change worked for both to show 2nd line without cutting off height) +fix charset parameter in metadataImageParser +update start errMsgTxt to "java 1.7"	2014-10-03 01:43:05 +02:00
Michael Peter Christen	3073c69aee	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-09-30 14:54:06 +02:00
reger	eaccce3467	added metadataImageParser for tif and psd (Photoshop) images. This is a modified genericImageParser adding tif (and psd) support even if java ImageIO plugin for tif is not installed in JDK. Adds just tif and psd to the available parsers. Uses the same library to extract metadata, so could eventually be merged with genericImageParser. All detected metadata are added to the parsed document (potentially some more as with genericImageParser)	2014-09-30 05:04:47 +02:00
reger	a69f5358ff	use javax ImageIO getReader to add supported image extension/mime genericImageParser uses javax ImageIO, supported images depend on available plugins for ImageIO package (this is JDK installation specific). Jpeg, png and gif are available by default. Tif and others only on available plugin (in classpath). Add supported image type dynamically on startup.	2014-09-29 07:42:51 +02:00
Michael Peter Christen	67cd4c37bd	activated the new apk parser which was already ready but not included in the parser initialization. To make the apk parser usable, the handling of application type links had to be modified. Now all documents which have not a parser attached are placed to the noload-queue while all other documents are parsed using the associated parser class. This may have side-Effects on other parsers and the display of different file classes (images, apps, videos).	2014-09-24 13:32:58 +02:00
reger	03a7a29db3	limit OAI import urn resolver try for Deutsche National Library The resolver service of National Library uses name space nbn, limit use of nbn-resolving.de accordingly to urn:nbn: - add resolver for rfc's	2014-09-14 01:38:27 +02:00
orbiter	b6d57f06eb	enhanced the apk parser (up to being production-ready). The parser is not yet activated and will be after the next release step.	2014-09-04 09:41:42 +02:00
orbiter	c9e593cf78	removed warnings	2014-08-11 23:53:12 +02:00
reger	e9eae45b55	simplify rssreader and improve atom feed link extraction - type detection (rss/atom) - init type parameter overwritten during parse, parameter obsolete - detection by endtag changed to simpler first-tag evaluation - channel image not used, removed related extra parser handling - remove unused code (set/getImage) in rssfeed - atom link extraction to account for possible multiple link tags - spec limits link to one with rel="alternate" or one without rel attribute not accounting for the following type & hreflang exception yet: o atom:entry elements MUST NOT contain more than one atom:link element with a rel attribute value of "alternate" that has the same combination of type and hreflang attribute values.	2014-08-10 01:29:16 +02:00
reger	8f77719091	fix "Ljava.lang.String" in crawl queue anchor name (e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)	2014-08-04 02:38:58 +02:00
Michael Peter Christen	98f45c9032	fix for image alt attachment to AnchorURLs in html parser.	2014-08-01 12:04:15 +02:00
orbiter	08409ec680	no idea why the words max was an ordered one. This change increases speed during document processing a bit	2014-07-23 17:54:16 +02:00
Michael Peter Christen	b44626e55b	fixed target_alt_t in webgraph	2014-07-22 18:24:10 +02:00
Michael Peter Christen	2de159719b	added an option to set 'obey nofollow' for links with rel="nofollow" attribute in the <a> tag for each crawl. This introduces a lot of changes because it extends the usage of the AnchorURL Object type which now also has a different toString method than the underlying DigestURL.toString. It is therefore not advised to use .toString at all for urls, just use toNormalform(false) instead.	2014-07-18 12:43:01 +02:00
Michael Peter Christen	e039e78210	small bugfixes	2014-07-16 16:04:38 +02:00
Michael Peter Christen	fb3dd56b02	fix for processing of noindex flag in http header	2014-07-10 17:13:35 +02:00
Michael Peter Christen	f3a6b6e21e	fix for bad URL decoding	2014-07-10 01:59:29 +02:00
Michael Peter Christen	aee5b108e5	added linkScraperParser, a parser which ignores the text like the generic parser but extracts links like the htmlParser. This should be used for ASCII documents without known text format annotation like source code files or json documents. Probably also good for xml files without known schema.	2014-07-07 13:37:17 +02:00
reger	40133ba2d0	fix NPE in Condenser, discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"	2014-07-06 13:24:36 +02:00
reger	cb2c17d236	extract author and keywords in .doc and .ppt parser	2014-06-29 02:54:09 +02:00
orbiter	fec673c9d1	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	2014-06-27 10:15:37 +02:00
orbiter	4a66af716d	added apkParser stub (work in progress)	2014-06-27 10:15:01 +02:00
reger	2d67f29244	adjust mergeDocument after parsing to - preserve charset and languages - fix merge of author	2014-06-26 22:16:15 +02:00
Michael Peter Christen	0d29b972cc	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-06-26 13:02:56 +02:00
reger	7847a93558	fix AbstractParser.singleList not adding null strings - prevents null titles in oo... parser (as detected by ParserTest) - correct ParserTest dc_description check (dc_description allowed to return 0 length array)	2014-06-26 02:56:45 +02:00
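
Commit `321840fde3` above replaces fixed thread pools with cached ones and mentions that idle threads are flushed after 60 seconds. The sketch below only illustrates that difference using the standard `java.util.concurrent.Executors` factories; it is not taken from the YaCy sources. The class name `PoolSketch`, the helper methods and the pool size of 8 are hypothetical, and the cast to `ThreadPoolExecutor` assumes the usual JDK implementation of these factories.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of YaCy.
public class PoolSketch {

    /** Idle keep-alive in seconds (assumes the pool is a ThreadPoolExecutor, as in the JDK factories). */
    static long keepAliveSeconds(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getKeepAliveTime(TimeUnit.SECONDS);
    }

    /** Number of threads the pool keeps around even when they are idle. */
    static int coreSize(ExecutorService pool) {
        return ((ThreadPoolExecutor) pool).getCorePoolSize();
    }

    public static void main(String[] args) {
        // A fixed pool retains its worker threads once they have been started.
        ExecutorService fixed = Executors.newFixedThreadPool(8);
        // A cached pool has no core threads and lets idle workers expire.
        ExecutorService cached = Executors.newCachedThreadPool();

        System.out.println("fixed:  core=" + coreSize(fixed)
                + " keepAlive=" + keepAliveSeconds(fixed) + "s");
        System.out.println("cached: core=" + coreSize(cached)
                + " keepAlive=" + keepAliveSeconds(cached) + "s");

        fixed.shutdown();
        cached.shutdown();
    }
}
```

With a cached pool the thread count follows the current load, which matches the drop in constantly held threads that the commit message describes.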

627 Commits
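
Commit `9252e36aeb` in the listing says the html parser fetches an html snapshot "using a different url" when a page uses hash-bangs or the "fragment" metatag. The sketch below shows how such a snapshot URL can be derived under the (now deprecated) AJAX crawling proposal, as far as that proposal is remembered here: the `#!` part is moved into an `_escaped_fragment_` query parameter. It is an illustration of the scheme, not YaCy's actual code; the class name `EscapedFragmentSketch`, the method names and the example.com URLs are hypothetical.

```java
// Hypothetical illustration class, not part of YaCy.
public class EscapedFragmentSketch {

    /** Percent-encodes the characters the scheme reserves: controls/space, '#', '%', '&', '+'. */
    static String escapeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (char c : fragment.toCharArray()) {
            if (c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+') {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Maps a hash-bang URL to its snapshot URL; returns null if the URL contains no "#!". */
    static String snapshotUrl(String url) {
        int p = url.indexOf("#!");
        if (p < 0) return null;
        String base = url.substring(0, p);
        String fragment = url.substring(p + 2);
        // Append to an existing query string with '&', otherwise start one with '?'.
        String sep = base.indexOf('?') >= 0 ? "&" : "?";
        return base + sep + "_escaped_fragment_=" + escapeFragment(fragment);
    }

    /** Snapshot URL for a page that opts in via a "fragment" metatag with content "!". */
    static String snapshotUrlForMetaFragment(String url) {
        String sep = url.indexOf('?') >= 0 ? "&" : "?";
        return url + sep + "_escaped_fragment_=";
    }

    public static void main(String[] args) {
        System.out.println(snapshotUrl("http://example.com/path#!key1=value1&key2=value2"));
        System.out.println(snapshotUrlForMetaFragment("http://example.com/"));
    }
}
```

The metatag case uses an empty fragment value, which corresponds to the "metatag fragment with content "!"" step listed in the commit message.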