zookeeper-user.hadoop.apache.org

How to reestablish a session

Gustavo Niemeyer, Thu, 18 Nov 2010 15:24:23 +0000 (UTC)
Greetings,

As some of you already know, we've been using ZooKeeper at Canonical
for a project we've been pushing (Ensemble, http://j.mp/dql6Fu).
We've already written txzookeeper (http://j.mp/d3Zx7z) to
integrate the Python bindings with Twisted, and we're also in the
process of creating a Go binding for the C ZooKeeper library (to be
released soon).

Yesterday, while working on the Go bindings, a test made me wonder
about the correct way to reestablish a session with ZooKeeper.

In another thread a couple of months ago, Ben mentioned:

> i'm a bit skeptical that this is going to work out properly. a server may
> receive a socket reset even though the client is still alive:
>
> 1) client sends a request to a server
> 2) client is partitioned from the server
> 3) server starts trying to send response
> 4) client reconnects to a different server
> 5) partition heals
> 6) server gets a reset from client
>
> at step 6 i don't think you want to delete the ephemeral nodes.

I also don't think it should delete ephemeral nodes.  While performing
some tests, though, I noticed that something similar to this may
happen.

The following sequence was performed in the test:

1) Establish connection A to ZK
2) Create an ephemeral node with A
3) Establish connection B to ZK, reusing the session from A
4) Close connection A
5) The ephemeral node from (2) got deleted.
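The test sequence above can be reproduced in a toy, in-memory model. This is a sketch under stated assumptions, not any real binding: ToyEnsemble, connect, create_ephemeral and close are hypothetical names, and the model only encodes the rule discussed in this thread, namely that ephemeral nodes belong to a session rather than to a TCP connection, and that a clean close from any connection ends that session.

```python
import itertools


class ToyEnsemble:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble.

    Ephemeral nodes are owned by a *session*; any number of connections may
    present the same session id and password.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.sessions = {}    # session_id -> password
        self.ephemerals = {}  # path -> owning session_id

    def connect(self, session_id=None, password=None):
        """Open a connection, creating a new session or reusing an existing one."""
        if session_id is None:
            session_id = next(self._ids)
            password = f"secret-{session_id}"
            self.sessions[session_id] = password
        elif self.sessions.get(session_id) != password:
            raise RuntimeError("session expired or bad password")
        return {"session_id": session_id, "password": password}

    def create_ephemeral(self, conn, path):
        self.ephemerals[path] = conn["session_id"]

    def close(self, conn):
        """A clean close ends the whole session, whichever connection sends it."""
        sid = conn["session_id"]
        self.sessions.pop(sid, None)
        self.ephemerals = {p: s for p, s in self.ephemerals.items() if s != sid}


zk = ToyEnsemble()
a = zk.connect()                                # 1) connection A, new session
zk.create_ephemeral(a, "/worker")               # 2) ephemeral node via A
b = zk.connect(a["session_id"], a["password"])  # 3) connection B reuses A's session
zk.close(a)                                     # 4) clean close of A
print("/worker" in zk.ephemerals)               # 5) False: the node is gone
print(b["session_id"] in zk.sessions)           # False: B's session died too
```

In this model, connection B is never touched, yet it loses its session, because the close request from A is interpreted as "end this session".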

So, this made me wonder what the proper way is to reestablish a
session in practice when facing a network partition. Imagine that the
reconnection in (3) was an attempt by the client to restore
communication with the ZK cluster during a partition.  Once the new
connection succeeds, the old resources from connection A should be
disposed of, but how can this be done without risking killing the
healthy connection on B (imagine that the network comes back between
(3) and (4))?

Does anyone have thoughts on that?
Fournier, Camille F. [Tech], Thu, 18 Nov 2010 18:07:38 +0000 (UTC)
This is exactly the scenario you use to test session expiration: make one connection to a ZK server, then another with the same session ID and password, and close the second connection, which causes the first to expire. Only a clean close causes this, though (one where the client calls close to end the connection).

Right now, if you have a partition between client and server A, I would not expect server A to see a clean close from the client, but rather one of the various exceptions that cause the socket to close. These currently do nothing to change the state of the session, and if the client connects elsewhere before the session timeout expires, the session remains active.

C
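Camille's distinction can be captured as a small state model. In this hypothetical sketch (ToySession and its methods are illustrative names, and the expiry arithmetic is a simplification of whatever bookkeeping the real server does), an explicit close ends the session immediately, while a socket error on one server changes nothing as long as the client reconnects somewhere before the session timeout elapses.

```python
from dataclasses import dataclass


@dataclass
class ToySession:
    """Hypothetical model of one session as seen by the ensemble."""
    timeout: float           # negotiated session timeout, in seconds
    last_heard: float = 0.0  # last time any server heard from the client
    closed: bool = False

    def heartbeat(self, now: float) -> None:
        """Client traffic (or a reconnect) on any server refreshes the session."""
        self.last_heard = now

    def on_clean_close(self) -> None:
        """The client explicitly asked to close: the session ends at once."""
        self.closed = True

    def on_socket_error(self) -> None:
        """A reset or timeout on one connection does not touch session state."""

    def alive(self, now: float) -> bool:
        return not self.closed and now - self.last_heard <= self.timeout


s = ToySession(timeout=10.0)
s.heartbeat(now=0.0)
s.on_socket_error()       # server A sees the partition only as a socket error
s.heartbeat(now=4.0)      # client reconnects to server B within the timeout
print(s.alive(now=12.0))  # True: 12 - 4 <= 10
s.on_clean_close()        # but an explicit close from any connection...
print(s.alive(now=12.0))  # False: ...ends the session immediately
```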


Gustavo Niemeyer, Thu, 18 Nov 2010 20:02:28 +0000 (UTC)
> Right now, if you have a partition between client and server A, I would not expect
> server A to see a clean close from the client, but one of the various exceptions
> that cause the socket to close.

Please don't get me wrong, but I find it very odd to rely on the
stability of a network partition to avoid having a session killed.

Either way, that's not a big deal for me, now that I understand the
problem.  Knowing about it, I can simply postpone the close() until a
safe time.  It just felt worth pointing out, since this will arguably
be *very* hard to track down in practice.
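One way to read "postpone the close() until a safe time" is sketched below: superseded handles are parked rather than closed, and only closed once the application has decided to end the session anyway. DeferredCloser is a hypothetical helper, and `handle` is assumed to be any object with a close() method (for instance a wrapper around a C zhandle_t); this illustrates the idea and is not an API offered by any ZooKeeper binding.

```python
class DeferredCloser:
    """Hypothetical helper that never closes a superseded handle early.

    A clean close ends the shared session, so an old handle is parked when a
    new one takes over, and is only closed after the session is ended on purpose.
    """

    def __init__(self, active):
        self.active = active
        self.parked = []

    def replace(self, new_handle):
        """Switch to new_handle without closing the old one (that would kill the session)."""
        self.parked.append(self.active)
        self.active = new_handle

    def end_session(self):
        """Deliberately end the session, then release the parked handles."""
        self.active.close()  # this is the close that ends the session
        for handle in self.parked:
            handle.close()   # session is already over, so nothing else can be killed
        self.parked.clear()


class FakeHandle:
    """Stand-in handle that records the order of close() calls."""
    log = []

    def __init__(self, name):
        self.name = name

    def close(self):
        FakeHandle.log.append(self.name)


closer = DeferredCloser(FakeHandle("A"))
closer.replace(FakeHandle("B"))  # A is parked, not closed
print(FakeHandle.log)            # []
closer.end_session()
print(FakeHandle.log)            # ['B', 'A']
```

The cost of this design is that the parked handles keep their local resources until the session ends.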
Benjamin Reed, Thu, 18 Nov 2010 20:45:44 +0000 (UTC)
oops, sorry camille, i didn't mean to replicate your answer. you 
explained it better than me :)

ben
Benjamin Reed, Thu, 18 Nov 2010 19:47:25 +0000 (UTC)
that quote is a bit out of context. it was with respect to a proposed 
change.

in your scenario can you explain step 4)? what are you closing?

ben
Gustavo Niemeyer, Thu, 18 Nov 2010 19:52:51 +0000 (UTC)
Hi Ben,

> that quote is a bit out of context. it was with respect to a proposed
> change.

My point was just that the reasoning behind your belief that killing
ephemerals wasn't a good approach in that earlier case also applies to
the new cases I'm pointing out.  I wasn't suggesting you agreed with my
new reasoning upfront.

> in your scenario can you explain step 4)? what are you closing?

I'm closing the old ZooKeeper handle (zh), after a new one was
established with the same client id.
Benjamin Reed, Thu, 18 Nov 2010 20:40:09 +0000 (UTC)
ah i see. you are manually reestablishing the connection to B using the 
session identifier for the session with A.

the problem is that when you call "close" on a session, it kills the
session. we don't really have a way to close a handle without doing that.
(actually there is a test class that does it in java.)

if you want this, you should open a JIRA issue for a close() that doesn't
kill the session.

why don't you let the client library do the move for you?
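What "letting the client library do the move" amounts to can be sketched as follows: on connection loss the library tries another server from its host list and re-presents the same session id and password, and it never sends a close request, so the session and its ephemerals survive. ToyClient and the try_connect callback are hypothetical; round-robin server selection is an assumption made for simplicity, and the real client libraries may pick servers differently.

```python
import itertools
from typing import Callable, Sequence


class ToyClient:
    """Hypothetical client that fails over between servers on its own."""

    def __init__(self, hosts: Sequence[str], session_id: int, password: bytes,
                 try_connect: Callable[[str, int, bytes], bool]):
        self._hosts = itertools.cycle(hosts)  # round-robin, for simplicity
        self.session_id = session_id
        self.password = password
        self._try_connect = try_connect       # returns True if the server accepted us
        self.host = None

    def reconnect(self, max_attempts: int) -> str:
        """Reattach the *same* session to the next reachable server."""
        for _ in range(max_attempts):
            host = next(self._hosts)
            if self._try_connect(host, self.session_id, self.password):
                self.host = host
                return host
        # No close request is ever sent here, so the session is left alone.
        raise ConnectionError("no server reachable; session may still be alive")


def partitioned_from_zk1(host, session_id, password):
    return host != "zk1:2181"  # pretend the client is cut off from zk1


client = ToyClient(["zk1:2181", "zk2:2181", "zk3:2181"], 0x1234, b"secret",
                   partitioned_from_zk1)
print(client.reconnect(max_attempts=3))  # zk2:2181
```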

ben
Gustavo Niemeyer, Fri, 19 Nov 2010 00:12:54 +0000 (UTC)
> why don't you let the client library do the move for you?

Maybe there's no need to reestablish the session manually, but a few
details in the API hint that this should be supported.  The strongest
one is that zookeeper_init() has a parameter for reestablishing an
existing session.  Without the ability to reliably close a previous
connection without killing the existing session, how can we use this
parameter together with the function that retrieves the existing
client id?  Another hint is in is_unrecoverable(), which says the
application must close the zhandle and try to reconnect if it returns
true.  Maybe I misinterpreted it, and it actually means the *session*
is dead, rather than just the connection?
Benjamin Reed, Fri, 19 Nov 2010 07:51:51 +0000 (UTC)
is_unrecoverable() means exactly that: the session is toast. nothing you 
do will get it back.
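Given that reading of is_unrecoverable(), a recovery routine has to treat it as the end of the session, not of one connection. The sketch below assumes a hypothetical handle object whose is_unrecoverable() method mirrors the C call is_unrecoverable(zh), plus hypothetical open_session and create_ephemeral helpers: the dead handle is closed, a brand-new session is opened (no client id), and the ephemerals the application needs are created again, because the old ones died with the old session.

```python
def ensure_session(handle, open_session, ephemeral_paths):
    """Return a usable handle, starting a new session only if the old one is gone.

    handle.is_unrecoverable() stands in for the C library's is_unrecoverable(zh);
    open_session() and handle.create_ephemeral() are hypothetical helpers.
    """
    if not handle.is_unrecoverable():
        # Only the connection is in trouble; the library keeps retrying
        # with the same session, so leave it alone.
        return handle
    handle.close()          # release the dead handle; the session is already toast
    fresh = open_session()  # brand-new session: no client id is passed
    for path in ephemeral_paths:
        fresh.create_ephemeral(path)  # old ephemerals vanished with the old session
    return fresh


class FakeZHandle:
    """Stand-in handle used only to exercise ensure_session()."""

    def __init__(self, dead):
        self.dead, self.closed, self.created = dead, False, []

    def is_unrecoverable(self):
        return self.dead

    def close(self):
        self.closed = True

    def create_ephemeral(self, path):
        self.created.append(path)


old = FakeZHandle(dead=True)
new = ensure_session(old, lambda: FakeZHandle(dead=False), ["/workers/w1"])
print(old.closed, new.created)  # True ['/workers/w1']
```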

zookeeper_init is almost never used with a non-null client_id. the main 
use case for it is crash recovery. i've rarely seen it used, but you can 
start a session, save off the client_id to disk, create ephemerals etc., 
then if your program crashes, you can restart and recover the session 
and pick back up where you left off. in this case we don't worry about 
the session being closed by the previous instance of the program because 
it crashed. it's pretty tricky to use.
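The crash-recovery pattern described above needs the session id and password to outlive the process. A minimal sketch, assuming the two values are available as an integer and raw bytes (in the C library they live in the clientid_t returned by zoo_client_id(); here they are plain Python values, and the JSON file format is an arbitrary choice): write them atomically, and on restart hand them back to whatever call opens the session. If the session expired while the program was down, the server will refuse it and the program has to start a fresh one.

```python
import json
import os
import tempfile


def save_session(path, session_id, password):
    """Persist the session id and password so a restarted process can reuse them."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"session_id": session_id, "password": password.hex()}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file


def load_session(path):
    """Return (session_id, password) from a previous run, or None if there is none."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    return data["session_id"], bytes.fromhex(data["password"])


with tempfile.TemporaryDirectory() as d:
    state = os.path.join(d, "zk-session.json")
    print(load_session(state))  # None: first start, open a new session
    save_session(state, 0x12C4F3A0001, b"\x01" * 16)
    restored = load_session(state)
    print(restored == (0x12C4F3A0001, b"\x01" * 16))  # True
```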

ben
Gustavo Niemeyer, Fri, 19 Nov 2010 20:15:51 +0000 (UTC)
> is_unrecoverable() means exactly that: the session is toast. nothing you do
> will get it back.

Ok, I was indeed wondering what exactly was unrecoverable.

> zookeeper_init is almost never used with a non-null client_id. the main use
> case for it is crash recovery. i've rarely seen it used, but you can start a
> session, save off the client_id to disk, create ephemerals etc., then if
> your program crashes, you can restart and recover the session and pick back
> up where you left off. in this case we don't worry about the session being
> closed by the previous instance of the program because it crashed. it's
> pretty tricky to use.

Understood.  I agree this is a pretty unique case, and a very hard one
to get right by itself (how do you get the app into the proper state to
receive watches after the whole application has crashed?).