
user.nutch.apache.org



  • Moderate traffic list: up to 30 messages per day
  • This list contains about 9,501 messages, beginning May 2010

March 2011 - page 1
hemantverma09 02 Mar 2011* I am using nutch 1.1 for crawling. I am able to crawl so many site without any issue but when I am crawling www.magicbricks.com it is stopping at dept...
Rida Benjelloun 03 Mar 2011 Constellio Enterprise Search team is proud to launch the version 1.2 of its powerful Open Source search engine software. Based on Google Search Applia...
mohammad amin golshani 03 Mar 2011 [ArchiveOrange]: There doesn't seem to be anything here...
McGibbney, Lewis John 03 Mar 2011 Hi list, I have trawled the mail archives for something which could help me on this one, and although there is some interesting past use cases I have ...
Otis Gospodnetic 04 Mar 2011* Hi, People often compare Rackspace and EC2 instances and often show Rackspace has faster servers (for about the same price, I would imagine). But has ...
Klemens Muthmann 04 Mar 2011 Hi, I am trying to configure my nutch crawler with the runbot script from the wiki. http://wiki.apache.org/nutch/Crawl I tried to insert regular expre...
Otis Gospodnetic 06 Mar 2011* Hi, I'm trying to do some basic calculations trying to figure out what, in terms of time, resources, and cost, it would take to crawl 500M URLs. T...
MilleBii 07 Mar 2011* I kind of remember that you can load a single indexed segment, instead of the full index, with Luke, but I can find back how....
MilleBii 07 Mar 2011* Randomly I now seem to get this error in production where it was working fine for more than a year.... java.lang.NullPointerException+ for some querie...
bhawna singh 07 Mar 2011* Hi, I am getting ClassNotFoundException exception for the admin command. I searched on web and I see many other see this error too. I checked Nutch 1....
chidu r 07 Mar 2011* Hi all I am trying to setup nutch 1.2 on Hadoop and used the instructions at http://wiki.apache.org/nutch/NutchHadoopTutor... it has been very useful....
Drew Kutcharian 08 Mar 2011 Hi Everyone, We are looking for someone to help us build a similarity engine. Here are some preliminary specs for the project. 1) We want to be able t...
bhawna singh 08 Mar 2011* Yes I have compiled the java files. I do not see Analyze class in nutch1.2. The nutch 1.2 APIs also do not have Analyze class. What am I missing here?...
Jason 08 Mar 2011 hi guys,by default,Hits.getTotal() returns only an estimate of the total number of hits,I want the exact total number of hits,so I I use the function ...
Volos Stavros 08 Mar 2011 Hi, I am trying to setup a search cluster. Each node has 12 cores and 24 GBytes of memory. Distributed search works properly. However, when I stress m...
Paul Rogers 08 Mar 2011 Hi all I have installed nutch trunk from svn. It's installed (under Linux) in /opt/nutch-svn/trunk with a soft link to this directory as /opt/nutc...
Amin Bandeali 09 Mar 2011* How can I tell after my hadoop configuration that the crawl (fetch, merge, etc) is running on the slave machines? I currently have two nodes. First on...
Otis Gospodnetic 09 Mar 2011* Hi, Here's another Q about a wide, large-scale crawl resource requirements on EC2 - primarily storage and bandwidth needs. Please correct any mist...
jianpeng sun 09 Mar 2011 I am not sure why class WeakHashMap but HashMap is used in the implementation of class PluginRepository. I have never used this kind of map and as far...
McGibbney, Lewis John 10 Mar 2011* Hello list, Straightforward question... I am attempting to run the recrawl script on the wiki with a few minor changes to suit my configuration. Nutch...
bhawna singh 10 Mar 2011* Hi All, I am crawling a URL list of 300K and after fetching around 200K I see IOException: Spill Failed error. Below is the stack trace. Would anyone ...
Jason 13 Mar 2011* hi guys,I want to change the value of a field in index,as far as I know ,I can make a copy of the document,and change the field value,delete the origi...
webdev1977 14 Mar 2011 Any ideas would be greatly appreciated! All of the examples of custom implementations of HtmlParseFilter seems to be suited to one pattern matching fo...
webdev1977 14 Mar 2011 All of the examples of custom implementations of HtmlParseFilter seems to be suited to one pattern matching for one tag in an html page.. For instance...
Jonathan Oulds 14 Mar 2011* Hello there, This is my first foray into managing a search engine so please bear with me. I am trying to index all our in house documentation that we ...
Paul Rogers 14 Mar 2011* Dear All I'm having trouble building nutch from the trunk svn. Having built it using ant when I issue the command src/bin/nutch inject crawl/crawl...
Abdulelah almubarak 15 Mar 2011* Hi every body i have some problem when i set up hadoop i have 1 master and 3 slave my problem TaskTracker appear on slave while running $bin/start-all...
Abdulelah almubarak 15 Mar 2011* Hi Everybody. i have some problem when running crawling with hadoop. Error appear in terminal : naba@naba01:~/nutch-1.2$ bin/nutch crawl yahooUrl -dir...
Paul Rogers 15 Mar 2011 Hi Markus Many thanks for the reply I have nutch installed in /opt/nutch/trunk but the nutch executable was in src/bin so I was issuing the command sr...
Gabriele Kahlout 15 Mar 2011* $ export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home Gabriele-Kahlouts-MacBook:~ simpatico$ echo $JAVA_HOME/System/Libr...
Nemani, Raj 15 Mar 2011* All, I use Nutch to crawl couple of internal websites and index the crawl results into Solr. Periodically Urls get removed from these websites and I a...
Paul Rogers 15 Mar 2011* Dear All I'm currently having great difficulty building nutch from trunk. The reason that I'm attempting this is that I wish to use the latest...
Gabriele Kahlout 15 Mar 2011* Hello, I'm trying to build a custom search engine using nutch + solr and looked forward to try LinkAnalysisTool, as described here<http://today...
McGibbney, Lewis John 15 Mar 2011 Hello list, I have been trying tirelessly to crawl a specific domain. The domain has a redirect therefore I have tried experimenting with various prop...
Paul Rogers 17 Mar 2011* Hi All having fixed the problem with nutch and the InjectorJob Issuing the command runtime/deploy/bin/nutch inject crawl/crawldb urls now gives the er...
McGibbney, Lewis John 17 Mar 2011* Hi list, OK I have seen quite a few threads on this topic as well as a couple of comments appended to the blog entries provided on the wiki. I also po...
Ron Berkle 18 Mar 2011* I would like to upgrade my Nutch version 1.0 to 1.2. I don't see any information in the Nutch tutorials or wiki. Can anyone tell me the steps of u...
Patricio Galeas 19 Mar 2011 Hi, I'm testing the web-crawl using the tutorial from xxxx. When I merge de segments I get a HDFS error (see below). What I'm doing wrong? Tha...
Patricio Galeas 19 Mar 2011 Hi, I'm testing the web-crawl using the tutorial from "How to Setup Nutch (V1.1) and Hadoop". When I merge de segments I get a HDFS erro...
vijaymhaskar 19 Mar 2011* Whenever i trying to make connection through nutch to internet Connection refused exception coming, my internet also working i'm not able to find ...
ramires 21 Mar 2011 hi i use nutch-1.2 on debian with 5 server. My index distributed to 5 server. when i querying on nutch result's contents mixing each other which a...
ramires 21 Mar 2011 i use nutch-1.2 on debian with 5 server. My index distributed to 5 server. when i querying on nutch result's contents mixing each other which are ...
Patricio Galeas 22 Mar 2011 Hi, I have observer a strange results by running "bin/nutch readdb crawl/crawldb -stats", after I moved the index to HDFS. In the last three...
Gabriele Kahlout 23 Mar 2011* Hello, I've downloaded and wrote a simple parser to give me pedia urls from this dbpedia file <http://downloads.dbpedia.org/3.6/en/wikipedia......
Volos Stavros 23 Mar 2011 Hi all, I have been using Nutch 1.1 with hadoop 0.20.2. I was able to achieve 90% utilization on a two-node cluster. Each node has 12 cores. I am tryi...
vijaymhaskar 24 Mar 2011* Sir, I am Vijay Mhaskar and i am new to this technology , i just want to know about how the indexes are managed in case of distributed nutch because i...
徐厚道 24 Mar 2011* i have used nutch to crawl web-content sometime, now my crawldb has ten milion pages . but i don't know what urls in it. i want to manage the craw...
Gabriele Kahlout 24 Mar 2011* $ bin/nutch inject crawl/crawldb dmoz Injector: starting at 2011-03-15 22:17:40 Injector: crawlDb: crawl/crawldb Injector: urlDir: dmoz Injector: Conv...
McGibbney, Lewis John 26 Mar 2011* Hi list, Just returned from a couple of weeks working solely with Solr and am experiencing an exact replication of the problem as per thread below, al...
iacueva 27 Mar 2011* Anyone have the book: "Building Search Applications with Lucene and Nutch "...... we are supposed to open source community ..... I need to k...
