
Open SA Forum AIS Services mailing list


openais.lists.linux-foundation.org

corosync offline

th.schreiber, Tue, 06 Jul 2010 11:54:40 +0000 (UTC)
Hello,

I've built a cluster with just two nodes. Both of them see each other, but
they won't go online. This is my config:

interface {
        bindnetaddr:    172.28.87.0
        mcastaddr:      226.94.1.1
        mcastport:      5420
        ringnumber:     0
}
Both nodes have the same config.
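
A quick sanity check on this kind of interface block is whether bindnetaddr really is the network address that both nodes live in. The sketch below is only an illustration: the /24 prefix length is an assumption (the netmask is not given in this thread), the helper name bindnet_covers is hypothetical, and the node addresses are the ones that appear in the tcpdump output further down.

```python
import ipaddress

def bindnet_covers(bindnetaddr, prefix_len, node_ips):
    """Return {ip: bool}: does each node IP lie inside bindnetaddr/prefix_len?

    strict=True makes ip_network() raise ValueError if bindnetaddr has host
    bits set, i.e. if it is a host address rather than a network address.
    """
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefix_len}", strict=True)
    return {ip: ipaddress.ip_address(ip) in network for ip in node_ips}

# Prefix length 24 is an ASSUMPTION; adjust it to the real netmask.
result = bindnet_covers("172.28.87.0", 24, ["172.28.87.64", "172.28.87.66"])
print(result)
```

If the real netmask is narrower than /24, the same call with the right prefix length shows whether 172.28.87.0 is still the correct network address for both nodes.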
..

# crm_mon --one-shot
============
Last updated: Tue Jul  6 13:38:39 2010
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
1 Resources configured.
============

OFFLINE: [ lis01 lis11 ]
..


I made a tcpdump:
...
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
....

Most of the time, only the .64 node is sending packets. This excerpt
happens to show the .66 node sending after a long pause.
The tcpdump on the other node looks nearly the same; there, too, .64 sends
most of the packets.

When I stop openais (corosync) on .64, the other node sends continuously
until .64 is online again.
So it seems that both nodes see each other.
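
The "who is sending how much" observation can be made from a saved capture instead of by eye. This is a minimal sketch, assuming tcpdump text output in exactly the format shown above; the function name packets_per_source is hypothetical, and the sample lines are copied from this message.

```python
import re
from collections import Counter

# Matches lines such as:
# 13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
LINE = re.compile(
    r"^\S+ IP (\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+): UDP, length (\d+)"
)

def packets_per_source(lines):
    """Count UDP packets per source IP; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """\
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
"""

counts = packets_per_source(sample.splitlines())
for source, count in counts.most_common():
    print(source, count)
```

The source port 5419 in the capture is one below the configured mcastport of 5420, which matches the corosync.conf man page description of sending from mcastport - 1.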


The syslog output:

 # tail -f /var/log/messages
Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
Jul  6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul  6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
Jul  6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul  6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
... and so on
Jul  6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x68eba0, async-conn=0x68eba0) left
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=15909, rc=2)
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child respawn count exceeded by crmd
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: update_member: Node hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul  6 13:47:06 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul  6 13:47:54 lis11 cib: [13507]: info: cib_stats: Processed 28 operations (1071.00us average, 0% utilization) in the last 10min
....
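
To see the sequence in a log like this at a glance (crmd failing to sign on to the LRM, then exiting, then corosync giving up on respawning it), the relevant entries can be pulled out with a few regular expressions. This is a rough sketch, assuming each syslog entry sits on a single line (mail-wrapped lines rejoined); the patterns are derived only from the messages quoted in this post, and the function name summarize is hypothetical.

```python
import re

SIGNON = re.compile(
    r"crmd: \[(\d+)\]: WARN: do_lrm_control: "
    r"Failed to sign on to the LRM (\d+) \((\d+) max\) times"
)
CHILD_EXIT = re.compile(r"Child process (\w+) exited \(pid=(\d+), rc=(-?\d+)\)")
RESPAWN = re.compile(r"Child respawn count exceeded by (\w+)")

def summarize(lines):
    """Collect LRM sign-on failures, child exits and respawn give-ups."""
    summary = {"signon_attempts": [], "child_exits": [], "respawn_exceeded": []}
    for line in lines:
        match = SIGNON.search(line)
        if match:
            # (crmd pid, attempt number, maximum attempts)
            summary["signon_attempts"].append(tuple(int(g) for g in match.groups()))
            continue
        match = CHILD_EXIT.search(line)
        if match:
            # (child name, pid, exit code)
            summary["child_exits"].append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
            continue
        match = RESPAWN.search(line)
        if match:
            summary["respawn_exceeded"].append(match.group(1))
    return summary

sample = [
    "Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times",
    "Jul  6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection",
    "Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times",
    "Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=15909, rc=2)",
    "Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child respawn count exceeded by crmd",
]

summary = summarize(sample)
print(summary)
```

Against the real file it could be called as summarize(open("/var/log/messages")), since the function only iterates over lines.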



OS is SuSE SLES11 SP1

pacemaker-1.1.2-0.2.1
pacemaker-mgmt-2.0.0-0.2.19
corosync-1.2.1-0.5.1
libcorosync4-1.2.1-0.5.1
openais-1.1.2-0.5.19
libopenais3-1.1.2-0.5.19

openais config is empty.


Kernel: 2.6.32.12-0.7-default      x86_64


Any help?


Thomas Schreiber
Andrew Beekhof, Wed, 07 Jul 2010 06:39:31 +0000 (UTC)
On Tue, Jul 6, 2010 at 1:53 PM,   wrote:
> The syslog output:
>
>  # tail -f /var/log/messages
> Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on
> to the LRM 6 (30 max) times
> Jul  6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> Jul  6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate
> connection
> Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on
> to the LRM 7 (30 max) times
> Jul  6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> Jul  6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate
> connection
> ... and so on

So did you check if the lrmd was running (and if not, why not)?

> Jul  6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to
> crmd failed: reply failed
> _______________________________________________
> Openais mailing list
> 
> https://lists.linux-foundation.org/mailman/li...
>
th.schreiber, Wed, 07 Jul 2010 08:06:42 +0000 (UTC)
Hello Andrew,
Yes, lrmd is running, but one instance is defunct:

root      6068  6044  0 Jul06 ?        00:00:00 [lrmd] <defunct>
root      6076  6044  0 Jul06 ?        00:00:00 /usr/lib64/heartbeat/lrmd



Thomas Schreiber




Andrew Beekhof
07.07.2010 08:38

To:
Cc: "Openais@lists.linux-foundation.org"
Subject: Re: [Openais] corosync offline

On Tue, Jul 6, 2010 at 1:53 PM,   wrote:
> [...]
> Jul  6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate
> connection
> ... and so on

So did you check if the lrmd was running (and if not, why not)?
Dejan Muhamedagic, Wed, 07 Jul 2010 08:45:07 +0000 (UTC)
Hi,

On Wed, Jul 07, 2010 at 10:04:34AM +0200,  wrote:
> Hello Andrew,
> Yes, lrmd is running, but one instance is defunct:
> 
> root      6068  6044  0 Jul06 ?        00:00:00 [lrmd] <defunct>
> root      6076  6044  0 Jul06 ?        00:00:00 /usr/lib64/heartbeat/lrmd

The first instance of lrmd exited. We'd need the full logs to say
what happened. Since this is SLE11, you can open a call with
Novell for the incident. BTW, it is strange that there's still a
zombie; corosync should've collected the status.
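
For readers unfamiliar with the term: a process shows up as <defunct> (a zombie) between the moment it exits and the moment its parent collects the exit status with a wait call. The sketch below only illustrates that general mechanism on Linux and says nothing about how corosync itself is implemented; the helper names proc_state and wait_for_state are hypothetical.

```python
import os
import time

def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat (Linux only), 'Z' = zombie."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field follows the closing parenthesis of the command name.
    return data.rsplit(")", 1)[1].split()[0]

def wait_for_state(pid, wanted, timeout=2.0):
    """Poll until the process reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return state

pid = os.fork()
if pid == 0:
    os._exit(0)  # child: exit at once; it stays a zombie until the parent waits

state_before = wait_for_state(pid, "Z")   # child has exited but is not yet reaped
reaped_pid, status = os.waitpid(pid, 0)   # parent collects the exit status
still_listed = os.path.exists(f"/proc/{pid}")

print(state_before, os.WEXITSTATUS(status), still_listed)
```

A zombie that persists, as in the ps output above, therefore suggests that the parent process (PID 6044 there) has not yet waited for that child.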

Thanks,

Dejan