pig-user.hadoop.apache.org


This list contains about 3,324 messages, beginning Nov 2008.

Using Pig with HBase

Dmitry Demeshchuk - Wed, 21 Jul 2010 17:42:47 +0000 (UTC)
Greetings.

I'm trying to query HBase from Pig, but I'm doing something wrong and
can't figure out what exactly.

1. First, I create a table in HBase:

hbase(main):001:0> create 'test_table', 'test_family'

and add values to it:

hbase(main):002:0> put 'test_table', '1', 'test_family:body', 'body1'
hbase(main):003:0> put 'test_table', '1', 'test_family:value', 'value1'
hbase(main):009:0> scan 'test_table'

ROW                          COLUMN+CELL
 1                           column=test_family:body,
timestamp=1279710032517, value=body1
 1                           column=test_family:value,
timestamp=1279710094584, value=value1

So now I have some data in the table.


2. After that, I try to get data from HBase using Pig:

grunt> A = load 'test_table' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_family:body
test_family:value');
grunt> DUMP A;

Then I get an error message:

2010-07-21 06:01:58,387 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2118: Unable to create input splits for: test_table



Could you please help me figure out where I'm going wrong?

Thank you.
Dmitriy Ryaboy - Wed, 21 Jul 2010 18:52:46 +0000 (UTC)
Which version of Pig are you using? If 0.6, have you tried the
elephant-bird HBase loader?
Is there a more detailed stack trace in the pig log?

-Dmitriy
Dmitry Demeshchuk - Wed, 21 Jul 2010 19:33:08 +0000 (UTC)
I tried Pig 0.7 with HBase 0.20.5. The HBase jar bundled with the Pig
source is for 0.20.0, but I hoped that wouldn't make a big difference.
As for elephant-bird, I downloaded it but it didn't work for me; I guess
that was a version problem, since you mentioned Pig 0.6.
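The classpath in the dump below shows Pig loading its bundled hbase-0.20.0.jar while the cluster runs HBase 0.20.5, which matches the "version mismatch between client and server" error in the trace. A minimal sanity check might look like the following sketch (the jar versions are taken from this thread; the `$PIG_HOME`/`$HBASE_HOME` paths in the comments are illustrative, not from the thread):

```shell
# Jar versions from this thread: Pig 0.7 ships hbase-0.20.0.jar in its
# lib/ directory, while the cluster runs HBase 0.20.5.
client_jar=hbase-0.20.0.jar
server_jar=hbase-0.20.5.jar

# Strip the "hbase-" prefix and ".jar" suffix to get bare version strings.
client_ver=${client_jar#hbase-}; client_ver=${client_ver%.jar}
server_ver=${server_jar#hbase-}; server_ver=${server_ver%.jar}

if [ "$client_ver" != "$server_ver" ]; then
  echo "version mismatch: client $client_ver vs server $server_ver"
  # One likely fix: swap Pig's bundled jar for the cluster's, e.g.
  #   cp "$HBASE_HOME/hbase-$server_ver.jar" "$PIG_HOME/lib/"
  #   rm "$PIG_HOME/lib/hbase-$client_ver.jar"
fi
```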


Here's the full dump from console:

grunt> A = load 'test_table' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_family:body
test_family:value');
grunt> DUMP A;
2010-07-21 14:27:37,414 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2010-07-21 14:27:37,474 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: Store(file:/tmp/temp-2060573535/tmp1000611667:org.apache.pig.builtin.BinStorage)
- 1-4 Operator Key: 1-4)
2010-07-21 14:27:37,507 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2010-07-21 14:27:37,507 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2010-07-21 14:27:37,527 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:37,533 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:37,534 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2010-07-21 14:27:38,995 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2010-07-21 14:27:39,040 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,041 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2010-07-21 14:27:39,049 [Thread-5] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
2010-07-21 14:27:39,164 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,183 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:host.name=docspider.pravo.ru
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.version=1.6.0_20
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun
Microsystems Inc.
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.20/jre
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.class.path=/home/dem/pig-0.7.0/bin/../conf:/usr/lib/jvm/java-6-sun-1.6.0.20//lib/tools.jar:/home/dem/pig-0.7.0/bin/../build/classes:/home/dem/pig-0.7.0/bin/../build/test/classes:/home/dem/pig-0.7.0/bin/../pig-0.7.0-core.jar:/home/dem/pig-0.7.0/bin/../build/pig-0.7.1-dev-core.jar:/home/dem/pig-0.7.0/bin/../lib/automaton.jar:/home/dem/pig-0.7.0/bin/../lib/hadoop20.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0-test.jar:/home/dem/pig-0.7.0/bin/../lib/zookeeper-hbase-1329.jar
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.io.tmpdir=/tmp
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:os.version=2.6.31-20-server
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:user.name=dem
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:user.home=/home/dem
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:user.dir=/home/dem/pig-0.7.0
2010-07-21 14:27:39,291 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
host=localhost:2181 sessionTimeout=60000
watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@6536d9d8
2010-07-21 14:27:39,292 [Thread-5] INFO
org.apache.zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is
false
2010-07-21 14:27:39,315 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Attempting connection to server
localhost/0:0:0:0:0:0:0:1:2181
2010-07-21 14:27:39,318 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Priming connection to
java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1:43793
remote=localhost/0:0:0:0:0:0:0:1:2181]
2010-07-21 14:27:39,321 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Server connection successful
2010-07-21 14:27:39,404 [Thread-5] ERROR
org.apache.hadoop.hbase.mapreduce.TableInputFormat -
java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.getRegionInfo(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:931)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:573)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:623)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:582)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:623)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:125)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:103)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:73)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getInputFormat(HBaseStorage.java:96)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:257)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
java.io.IOException: Could not find requested method, the usual cause
is a version mismatch between client and server.
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:723)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
	... 22 more

2010-07-21 14:27:39,542 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-07-21 14:27:39,542 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-07-21 14:27:39,542 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2010-07-21 14:27:39,548 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in:
"file:/tmp/temp-2060573535/tmp1000611667"
2010-07-21 14:27:39,548 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2010-07-21 14:27:39,565 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,568 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2118: Unable to create input splits for: test_table
Details at logfile: /home/dem/pig-0.7.0/pig_1279740439519.log


And the details from the log file:

Pig Stack Trace
---------------
ERROR 2118: Unable to create input splits for: test_table

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable
to open iterator for alias A
        at org.apache.pig.PigServer.openIterator(PigServer.java:521)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
6015: During execution, encountered a Hadoop error.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
2118: Unable to create input splits for: test_table
        ... 8 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
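The NullPointerException in TableInputFormatBase.getSplits is consistent with the client never establishing a working connection to the cluster because of the jar mismatch. Once the client and server HBase jars agree, the original script should run unchanged; a sketch with an explicit schema added (the AS clause is an addition for readability, not from the thread):

```pig
-- Assumes Pig's lib/ now carries an HBase client jar matching the cluster
-- (0.20.5 in this thread), so TableInputFormat can reach the region servers.
A = LOAD 'test_table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_family:body test_family:value')
    AS (body:chararray, value:chararray);
DUMP A;
```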