WebSphere Concepts: Cell, Node, Cluster, Server…

May

Categories: JavaEE
Tags: Cell, Cluster, concepts, Deployment Manager, dmgr, Node, Node Agent, Server, WebSphere

Quick post… If you are not familiar with WebSphere, its concepts can be confusing at first: cell, deployment manager, node, node agent, cluster, server, …

First of all, let’s start with the concept of a Cell:

A Cell is a virtual unit that is built of a Deployment Manager and one or more nodes. I guess a picture will help make things clearer:

WebSphere Cell

But still there are a few concepts that need to be explained. The next obvious one is the Deployment Manager.

The Deployment Manager is a process (in fact, a special WebSphere instance) responsible for managing the installation and maintenance of applications, connection pools and other resources related to a J2EE environment. It also centralizes the user repositories used for application authentication and authorization as well as for WebSphere’s own.

The Deployment Manager communicates with the Nodes through another special WebSphere process, the Node Agent.

The Node is another virtual unit that is built of a Node Agent and one or more Server instances.

The Node Agent is the process responsible for spawning and killing server processes, and also for synchronizing configuration between the Deployment Manager and the Node. Extra care must be taken when changing security configuration for the cell: communication between the Deployment Manager and the Node Agent is encrypted and secured when security is enabled, so the Node Agent needs its configuration fully resynchronized whenever impacting changes are made to the Cell security configuration.

Servers are regular Java processes responsible for serving J2EE requests (e.g. serving JSP/JSF pages, serving EJB calls, consuming JMS queues, etc.).

And to finish, Clusters are also virtual units: they group Servers so that resources added to the Cluster are propagated to every Server that makes up the cluster. Since cluster members can live on different nodes, such a change usually affects more than a single Node.
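To make the containment relationships concrete, here is a minimal sketch of the concepts as plain Python data classes. It is only an illustration of the vocabulary used in this post, not WebSphere's configuration model; all class, field and instance names (`cell1`, `node1`, `server1`, ...) are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    name: str
    node: str  # name of the Node that hosts this server


@dataclass
class Node:
    name: str
    # Servers spawned and synchronized by this node's Node Agent.
    servers: list = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    # Member servers; they may live on different nodes.
    members: list = field(default_factory=list)

    def nodes_touched(self):
        """Names of the nodes hosting at least one cluster member."""
        return sorted({server.node for server in self.members})


@dataclass
class Cell:
    name: str
    # All nodes below are managed by the cell's single Deployment Manager.
    nodes: list = field(default_factory=list)
    clusters: list = field(default_factory=list)


def build_example_cell():
    s1 = Server("server1", "node1")
    s2 = Server("server2", "node1")
    s3 = Server("server3", "node2")
    node1 = Node("node1", [s1, s2])
    node2 = Node("node2", [s3])
    cluster = Cluster("cluster1", [s1, s3])  # spans both nodes
    return Cell("cell1", [node1, node2], [cluster])


cell = build_example_cell()
print(cell.clusters[0].nodes_touched())  # ['node1', 'node2']
```

The point of `nodes_touched` is the last sentence above: a resource added to `cluster1` ends up on every node that hosts one of its members, here both `node1` and `node2`.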

Let’s finish this post with another diagram to illustrate all those concepts.

WebSphere Concepts

66 Responses to “WebSphere Concepts: Cell, Node, Cluster, Server…”

1 websphere
May 27, 2009 at 5:58 pm

Thanks for this info

- 2 Suji
  July 14, 2009 at 8:50 am
  
  Good explanation. I have one question: do the servers in a node share the server file system (server-related files like wpconfig.properties etc.) or do they have separate file systems?
  
  - 3 rafaelri
    July 14, 2009 at 12:01 pm
    
    Suji,
    they have separate file systems. But if your server is in a cluster, the node agent synchronizes cluster-related configuration using a file replication mechanism. That’s the reason you should keep the system clocks in sync across the nodes in a cell.
    
    regards,
    Rafael
  - 4 Suji
    July 16, 2009 at 9:54 am
    
    Thanks for the explanation, now I know that each server in a cluster has its own separate, synchronized binaries.
    Now what happens when we deploy an application in the cluster environment? Will a copy of the EAR be created on all the servers?
  - 5 rafaelri
    July 16, 2009 at 12:54 pm
    
    Exactly, the Dmgr will send copies of the EAR to all nodes that own a server in the cluster the application is deployed to. It handles the EAR the same way it handles configuration files.
  - 6 Suji
    July 16, 2009 at 1:55 pm
    
    I am getting more and more information from your post.
    Now my question is: how is a request handled in a cluster? Is a request pertaining to one user directed to a particular node, or does it float between the nodes?
    If it floats between the nodes, how is session management done?
    
    -Thanks
  - 7 rafaelri
    July 16, 2009 at 3:06 pm
    
    I guess you mean a web request, right?
    It boils down to the spec, which only says that if requests are sent to multiple nodes in a cluster the session MUST be propagated.
    WebSphere by itself does not load-balance web requests between the servers in a cluster; you need to install IHS or Edge components for that, and how requests are spread is the responsibility of IHS and Edge. Which strategy to use is also up to you: it depends on your load scenario, application requirements, etc. There are cases where affinity is the way to go, but affinity comes at the price that you might overload one server. Edge components provide an (almost) smart tool that can monitor IHS to check its load, but it still can’t check WebSphere load.
    
    regards,
    Rafael Ribeiro
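The trade-off between spreading requests and session affinity described in the reply above can be sketched with a toy router. This is not how IHS or the Edge components are implemented; the `Router` class and the server names are hypothetical, and the only goal is to show why affinity keeps a session on one server while new sessions are still spread round-robin.

```python
import itertools


class Router:
    """Toy front-end router: sticky sessions with round-robin fallback."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = itertools.cycle(self.servers)
        self._affinity = {}  # session id -> server that first handled it

    def route(self, session_id=None):
        # Known session: keep sending it to the same server (affinity).
        if session_id in self._affinity:
            return self._affinity[session_id]
        # New or anonymous request: pick the next server round-robin.
        server = next(self._next)
        if session_id is not None:
            self._affinity[session_id] = server
        return server


router = Router(["server1", "server2"])
first = router.route("alice")
# Every later request of the same session lands on the same server.
print(all(router.route("alice") == first for _ in range(5)))  # True
```

With affinity, a single very active session keeps hitting one server no matter how busy it is, which is the overload risk mentioned above; without it, the session data has to be available on every server the request may reach.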
8 sri
August 19, 2009 at 12:55 pm

Hi,
I’m getting more information from your post. I have a question.
The node agent is connected to the servers, so when a node agent is down or not working, will the servers work or not?

- 9 rafaelri
  September 6, 2009 at 2:05 am
  
  No, it won’t work. Another mandatory process is the dmgr.
  It seems the node agent plays part of the role of a name server for the node.
  
  - 10 Garry
    October 20, 2010 at 6:44 pm
    
    Wrong. Servers will work without the node agent or deployment manager. They cannot, however, be managed with the nice web-based GUI without the node agents and dmgr running. All of your traffic routing takes place within the web server plugin. If you have no node agents running, you cannot update the application from the dmgr, or start or stop the servers remotely. The node agents are responsible for inter-node communications only. If I have fifteen JVMs per node, all fifteen will run, but they won’t share any information when the node agent goes down.
  - 11 rafaelri
    October 21, 2010 at 12:07 am
    
    Garry,
    
    Are you saying IBM documentation is wrong? This Redbook on PDF page 35 is rather clear:
    “The Node Agent must be running when starting the application server on that node so the application server can register with the Location Service Daemon (LSD). In WebSphere V6 the LSD is HAManager enabled therefore you only need one running Node Agent in the cluster to provide the LSD when the application servers are started on the node.”.
- 12 Srini
  November 17, 2011 at 12:43 am
  
  Rafaelri,
  
  The node agent process is mandatory to start the application server on that node, but once the application server has been started, your application server will still be able to serve requests even if the node agent and DMGR go down.
  
13 sri
August 20, 2009 at 9:48 am

Hi,
Please help me solve my problem. I created a bus member, a destination and also a messaging engine. The messaging engine should start automatically but it did not; the unavailable symbol is showing. Please help me resolve this.

Below is the error I get when I try to start the messaging engine.

“The messaging engine jaya-e6885a8c72Node01.server1-MsgBus cannot be started as there is no runtime initialized for it yet, retry the operation once it has initialized. If dynamic configuration reload is not enabled for this bus then the server will need to be restarted”.

- 14 rafaelri
  September 6, 2009 at 2:04 am
  
  Hi Sri,
  
  You probably have something wrong with your messaging engine runtime (e.g. your storage folder has files from a previously running engine, or database tables are shared by multiple engines [tables for storage are exclusive]). Check these common problems.
  
15 Samba
November 7, 2009 at 1:42 am

Thanks, this info, especially the diagram, gave me some basic ideas about the concepts.
You may want to include some description of deployment manager, as it definitely works hand in hand with the node, cell concepts.

16 Shrinivas
November 30, 2009 at 12:43 pm

Hi,

Nice post, very informative. I have a question for you. I have installed multiple server instances under one profile. How do I manage those servers, I mean, how do I start and stop a server? Do I need to have a Node Agent and Node configured for that? I am using WebSphere v6.1 Network Deployment with the advanced install option.
When I try to start an application configured on a different server I get the error

“Activities can not be started because it is installed on a different server”

Kindly let me know your inputs

Thanks
Shrini

- 17 rafaelri
  December 1, 2009 at 1:08 am
  
  Hi Shrini,
  
  Honestly, I have never seen this error message… Regarding multiple servers on a single node, no problem, it is perfectly fine: you’ll see the multiple servers in the DMgr profile and you’ll also be able to select on which of them (server or cluster) your application module will be installed (you can even change it after it is installed, under the manage modules option).
  Then, once the application is installed and set to auto start with the server, you only need to start the corresponding server; for this to happen the node agent needs to be running (so the node is active).
  
  best regards,
  Rafael Ribeiro
  
18 sai pavan
December 8, 2009 at 3:49 pm

Hi Rafaelri,
I have one question which was asked in an interview.
There are 5 cells on a node. In order to do single sign-on, how many times do we have to exchange the keys between these 5 cells?

- 19 rafaelri
  December 11, 2009 at 11:13 pm
  
  Hi Sai,
  
  Sorry but I sincerely don’t know. I’d suggest that you ask this on WebSphere Application Server Forum.
  
  best regards
  
20 satyendra
January 25, 2010 at 1:11 pm

Hello,

I am very keen on the IBM WebSphere ‘Test 000-253: IBM WebSphere Application Server Network Deployment V6.1, Core Administration’. Can you please suggest the best book for this test, other than the Redbook?

Thanks and Regards,
Satyendra

- 21 rafaelri
  January 27, 2010 at 11:49 pm
  
  Satyendra,
  
  Sorry for not being able to answer this. Although it might seem from some of my posts that I am a WAS admin, and I do sometimes play that role, my main role is as an architect for J2EE solutions (nowadays most of them run on WAS), so I can’t suggest any book for this particular WAS administration exam.
  
  regards,
  Rafael Ribeiro
  
22 pandi
February 10, 2010 at 8:23 am

thanks it’s easy to learn

23 sree
March 20, 2010 at 7:35 am

Can you have more than one server within a node?

Regards
Sree

- 24 rafaelri
  March 20, 2010 at 1:49 pm
  
  Yes. As I said on the post: “The Node is […] built of […] and one or more Server instances.”.
  
25 DavidS@IBM
June 8, 2010 at 5:59 am

This is not the case for any of the currently supported releases of WebSphere Application Server (v6, v6.1, v7).

The deployment manager is only needed for centralised administration & management of a WAS cell. Not for name server or any run-time service. To start servers on the nodes, the node agent must be running, but the deployment manager doesn’t need to be running. Started application servers on a node no longer require the node agent. They can continue to function correctly with all name services, session management & fail-over capability without node agent or deployment manager.
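The rules stated in this comment can be written down as a tiny truth table. The sketch below only encodes what the commenters report for WAS 6.x in this thread (the node agent is needed to start a server, neither process is needed by a server that is already running, and central administration needs both); the `CellState` type and the function names are invented for the illustration and do not correspond to any WebSphere API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    dmgr_up: bool
    node_agent_up: bool


def can_start_server(state):
    # Starting a server requires the node agent; the dmgr is not needed.
    return state.node_agent_up


def running_server_keeps_working(state):
    # An already started server depends on neither process.
    return True


def can_administer_centrally(state):
    # Central administration goes dmgr -> node agent -> server.
    return state.dmgr_up and state.node_agent_up


for state in (CellState(True, True), CellState(False, True), CellState(False, False)):
    print(
        state,
        can_start_server(state),
        running_server_keeps_working(state),
        can_administer_centrally(state),
    )
```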

- 26 rafaelri
  June 8, 2010 at 10:39 pm
  
  David,
  
  I’ll retry the tests I did… it was quite a long time ago, and what you described is honestly what I expected, but I clearly remember seeing a server malfunction while the nodeagent or dmgr was down.
  And next time please provide a real e-mail, since I’m the only one who will see it… it helps me have more confidence in approving comments…
  
  best regards,
  Rafael Ribeiro
  
  - 27 SANDHYARAJ
    December 14, 2010 at 12:27 pm
    
    These notes are very good, but I need more clarification and explanation.
    So please provide more notes regarding the WebSphere server.
28 Ritesh@LnT
June 10, 2010 at 12:54 am

Rafaelri,

As David has explained, the node agent is no longer needed for the servers to be up and running unless you make changes in the config repository.
The deployment manager is needed only to centrally administer all servers and nodes. It’s not required to keep the DMGR running for the servers to be up and running.

Warmest Regards,
Ritesh Vyas

- 29 rafaelri
  June 11, 2010 at 1:55 am
  
  Hi Ritesh!
  
  I did a very simple test here and noticed one interesting behaviour. My setup was simple: I had a cluster with a bus that in turn had a cluster member with a file system storage and a queue destination. The cluster had a single server and I installed an application that had a single EJB with startup capabilities. Upon startup this EJB scheduled a timer to send every 3s an ObjectMessage (cause I remembered a friend complained about not being able to send JMS messages over WAS default JMS provider when dmgr was down). My first test was to bring the whole cell up: dmgr, node, server and then check if everything was working as expected and in fact it was working (what a surprise :D). Then I shut down dmgr and finally nodeagent. Upon dmgr shutdown a few DCS messages were seen as follows:
  [6/10/10 22:20:33:916 BRT] 00000040 DiscoveryServ W DCSV1115W: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Member was1Cell01\was1CellManager01\dmgr connection was closed. Member will be removed from view. DCS connection status is Discovery|Ptp, receiver closed.
  [6/10/10 22:20:33:917 BRT] 00000014 RmmPtpGroup W DCSV1115W: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Member was1Cell01\was1CellManager01\dmgr connection was closed. Member will be removed from view. DCS connection status is View|Ptp, receiver closed.
  [6/10/10 22:20:34:093 BRT] 0000002d SystemOut O TO Begin
  [6/10/10 22:20:34:094 BRT] 0000002d SystemOut O TO end
  [6/10/10 22:20:34:243 BRT] 00000036 SystemOut O onMSG
  [6/10/10 22:20:34:243 BRT] 00000036 SystemOut O Thu Jun 10 22:20:34 BRT 2010
  [6/10/10 22:20:34:331 BRT] 00000014 VSyncAlgo1 I DCSV2004I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: View synchronization completed successfully. The View Identifier is (9:0.was1Cell01\was1CellManager01\dmgr). The internal details are None.
  [6/10/10 22:20:34:575 BRT] 00000017 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Confirmed all new view members in view identifier (10:0.was1Cell01\was1Node01\nodeagent). View channel type is View|Ptp.
  [6/10/10 22:20:34:650 BRT] 00000017 CoordinatorIm I HMGR0228I: The Coordinator is not an Active Coordinator for core group DefaultCoreGroup.
  [6/10/10 22:20:34:889 BRT] 00000017 CoordinatorIm I HMGR0218I: A new core group view has been installed. The core group is DefaultCoreGroup. The view identifier is (10:0.was1Cell01\was1Node01\nodeagent). The number of members in the new view is 2.
  [6/10/10 22:20:34:895 BRT] 00000017 CoreGroupMemb I DCSV8050I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: New view installed, identifier (10:0.was1Cell01\was1Node01\nodeagent), view size is 2 (AV=2, CD=2, CN=2, DF=3)
  but application was still functioning… then I shut down the nodeagent and as expected a few other DCS messages:
  [6/10/10 22:22:00:070 BRT] 00000014 RmmPtpGroup W DCSV1115W: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Member was1Cell01\was1Node01\nodeagent connection was closed. Member will be removed from view. DCS connection status is View|Ptp, receiver closed.
  [6/10/10 22:22:00:068 BRT] 00000040 DiscoveryServ W DCSV1115W: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Member was1Cell01\was1Node01\nodeagent connection was closed. Member will be removed from view. DCS connection status is Discovery|Ptp, receiver closed.
  [6/10/10 22:22:00:160 BRT] 00000017 RoleViewLeade I DCSV8053I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: View change in process. Excluded members are [was1Cell01\was1Node01\nodeagent].
  [6/10/10 22:22:00:185 BRT] 00000014 VSyncAlgo1 I DCSV2004I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: View synchronization completed successfully. The View Identifier is (10:0.was1Cell01\was1Node01\nodeagent). The internal details are None.
  [6/10/10 22:22:00:201 BRT] 00000017 ViewReceiver I DCSV1033I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: Confirmed all new view members in view identifier (11:0.was1Cell01\was1Node01\server1). View channel type is View|Ptp.
  [6/10/10 22:22:00:220 BRT] 00000017 CoordinatorIm I HMGR0218I: A new core group view has been installed. The core group is DefaultCoreGroup. The view identifier is (11:0.was1Cell01\was1Node01\server1). The number of members in the new view is 1.
  [6/10/10 22:22:00:270 BRT] 00000017 CoreGroupMemb I DCSV8050I: DCS Stack DefaultCoreGroup at Member was1Cell01\was1Node01\server1: New view installed, identifier (11:0.was1Cell01\was1Node01\server1), view size is 1 (AV=1, CD=1, CN=1, DF=3)
  [6/10/10 22:22:00:751 BRT] 00000017 CoordinatorIm I HMGR0206I: The Coordinator is an Active Coordinator for core group DefaultCoreGroup.
  But application was still functioning as you and David pointed. Then I tried another test. As you both mentioned that dmgr and node serve only for administration and synchronization of configuration changes… I tried bringing up the server on the command line with both dmgr and nodeagent down and surprisingly it did not work. As result of the startServer command it said: “Server launched but failed initialization” and over SystemOut.log I could see:
  [6/10/10 22:23:48:749 BRT] 0000000a UserManagemen I CWWIM6001I Waiting for notification that the server has finished starting.
  [6/10/10 22:23:48:800 BRT] 0000000a ThreadPoolMgr W WSVR0626W: The ThreadPool setting on the ObjectRequestBroker service is deprecated.
  [6/10/10 22:23:48:830 BRT] 0000000a SSLComponentI I CWPKI0003I: SSL service is starting
  [6/10/10 22:23:48:842 BRT] 0000000a SSLComponentI I CWPKI0004I: SSL service started successfully
  [6/10/10 22:23:49:086 BRT] 0000000a SASRas A JSAS0006I: Security connection interceptor initialized.
  [6/10/10 22:23:49:128 BRT] 0000000a SASRas A JSAS0001I: Security configuration initialized.
  [6/10/10 22:23:49:140 BRT] 0000000a SASRas A JSAS0003I: Authentication mechanism: LTPA
  [6/10/10 22:23:49:142 BRT] 0000000a SASRas A JSAS0004I: Principal name: defaultWIMFileBasedRealm/
  [6/10/10 22:23:50:735 BRT] 0000000a SASRas A JSAS0007I: Client request interceptor registered.
  [6/10/10 22:23:50:795 BRT] 0000000a SASRas A JSAS0008I: Server request interceptor registered.
  [6/10/10 22:23:50:816 BRT] 0000000a SecurityCompo A JSAS0009I: IOR interceptor registered.
  [6/10/10 22:23:51:699 BRT] 0000000a ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl open FFDC0009I: FFDC opened incident stream file /home/rafaelri/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/server1_7d427d42_10.06.10_22.23.51_0.txt
  [6/10/10 22:23:51:889 BRT] 0000000a ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl resetIncidentStream FFDC0010I: FFDC closed incident stream file /home/rafaelri/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/server1_7d427d42_10.06.10_22.23.51_0.txt
  [6/10/10 22:23:57:129 BRT] 0000000a StartUpServic I STUP0013I: The Startup Beans service has stopped.
  [6/10/10 22:23:57:149 BRT] 0000000a SchedulerServ I SCHD0040I: The Scheduler Service is stopping.
  [6/10/10 22:23:57:171 BRT] 0000000a SchedulerServ I SCHD0002I: The Scheduler Service has stopped.
  [6/10/10 22:23:57:228 BRT] 0000000a AppProfileCom I ACIN0009I: The application profiling service is stopping.
  [6/10/10 22:23:57:231 BRT] 0000000a ActivitySessi I WACS0049I: The ActivitySession service is stopping.
  [6/10/10 22:23:57:430 BRT] 0000000a CGBridgeServi I CWRCB0103I: The core group bridge service has stopped.
  [6/10/10 22:23:57:570 BRT] 0000000a WsServerImpl E WSVR0009E: Error occurred during startup
  META-INF/ws-server-components.xml
  [6/10/10 22:23:57:738 BRT] 0000000a WsServerImpl E WSVR0009E: Error occurred during startup
  com.ibm.ws.exception.RuntimeError: com.ibm.ws.exception.RuntimeError: com.ibm.ejs.EJSException: Could not register with Location Service Daemon, which could only reside in the NodeAgent. Make sure the NodeAgent for this node is up an running.; nested exception is:
  org.omg.CORBA.ORBPackage.InvalidName: LocationService:org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101 vmcid: IBM minor code: E02 completed: No
  at com.ibm.ws.runtime.WsServerImpl.bootServerContainer(WsServerImpl.java:198)
  at com.ibm.ws.runtime.WsServerImpl.start(WsServerImpl.java:139)
  at com.ibm.ws.runtime.WsServerImpl.main(WsServerImpl.java:460)
  at com.ibm.ws.runtime.WsServer.main(WsServer.java:59)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:615)
  at com.ibm.wsspi.bootstrap.WSLauncher.launchMain(WSLauncher.java:183)
  at com.ibm.wsspi.bootstrap.WSLauncher.main(WSLauncher.java:90)
  at com.ibm.wsspi.bootstrap.WSLauncher.run(WSLauncher.java:72)
  at org.eclipse.core.internal.runtime.PlatformActivator$1.run(PlatformActivator.java:226)
  at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:376)
  at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:163)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:615)
  at org.eclipse.core.launcher.Main.invokeFramework(Main.java:334)
  at org.eclipse.core.launcher.Main.basicRun(Main.java:278)
  at org.eclipse.core.launcher.Main.run(Main.java:973)
  at com.ibm.wsspi.bootstrap.WSPreLauncher.launchEclipse(WSPreLauncher.java:245)
  at com.ibm.wsspi.bootstrap.WSPreLauncher.main(WSPreLauncher.java:73)
  Caused by: com.ibm.ws.exception.RuntimeError: com.ibm.ejs.EJSException: Could not register with Location Service Daemon, which could only reside in the NodeAgent. Make sure the NodeAgent for this node is up an running.; nested exception is:
  org.omg.CORBA.ORBPackage.InvalidName: LocationService:org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101 vmcid: IBM minor code: E02 completed: No
  at com.ibm.ws.runtime.component.ORBImpl.start(ORBImpl.java:382)
  at com.ibm.ws.runtime.component.ContainerImpl.startComponents(ContainerImpl.java:977)
  at com.ibm.ws.runtime.component.ContainerImpl.start(ContainerImpl.java:673)
  at com.ibm.ws.runtime.component.ServerImpl.start(ServerImpl.java:485)
  at com.ibm.ws.runtime.WsServerImpl.bootServerContainer(WsServerImpl.java:191)
  … 22 more
  Caused by: com.ibm.ejs.EJSException: Could not register with Location Service Daemon, which could only reside in the NodeAgent. Make sure the NodeAgent for this node is up an running.; nested exception is:
  org.omg.CORBA.ORBPackage.InvalidName: LocationService:org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101 vmcid: IBM minor code: E02 completed: No
  at com.ibm.ejs.oa.LocationService.register(LocationService.java:150)
  at com.ibm.ejs.oa.EJSServerORBImpl.<init>(EJSServerORBImpl.java:164)
  at com.ibm.ejs.oa.EJSORB.init(EJSORB.java:55)
  at com.ibm.ws.runtime.component.ORBImpl.start(ORBImpl.java:379)
  … 26 more
  Caused by: org.omg.CORBA.ORBPackage.InvalidName: LocationService:org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101 vmcid: IBM minor code: E02 completed: No
  at com.ibm.rmi.corba.InitialReferenceClient.resolve_initial_references(InitialReferenceClient.java:220)
  at com.ibm.rmi.corba.ORB.resolve_initial_references(ORB.java:4150)
  at com.ibm.rmi.iiop.ORB.resolve_initial_references(ORB.java:670)
  at com.ibm.CORBA.iiop.ORB.resolve_initial_references(ORB.java:3184)
  at com.ibm.ejs.oa.LocationService.register(LocationService.java:106)
  … 29 more
  Caused by: org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101 vmcid: IBM minor code: E02 completed: No
  at com.ibm.CORBA.transport.TransportConnectionBase.connect(TransportConnectionBase.java:425)
  at com.ibm.ws.orbimpl.transport.WSTransport.getConnection(WSTransport.java:423)
  at com.ibm.CORBA.transport.TransportBase.getConnection(TransportBase.java:163)
  at com.ibm.rmi.iiop.TransportManager.get(TransportManager.java:93)
  at com.ibm.rmi.iiop.GIOPImpl.getConnection(GIOPImpl.java:115)
  at com.ibm.rmi.iiop.GIOPImpl.locate(GIOPImpl.java:189)
  at com.ibm.rmi.corba.Corbaloc.locateUsingINS(Corbaloc.java:307)
  at com.ibm.rmi.corba.Corbaloc.resolve(Corbaloc.java:378)
  at com.ibm.rmi.corba.ORB.objectURLToObject(ORB.java:3680)
  at com.ibm.CORBA.iiop.ORB.objectURLToObject(ORB.java:3227)
  at com.ibm.rmi.corba.InitialReferenceClient.resolve_initial_references(InitialReferenceClient.java:159)
  … 33 more
  Caused by: java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:372)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:233)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:220)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:378)
  at java.net.Socket.connect(Socket.java:536)
  at java.net.Socket.connect(Socket.java:486)
  at java.net.Socket.<init>(Socket.java:394)
  at java.net.Socket.<init>(Socket.java:207)
  at com.ibm.ws.orbimpl.transport.WSTCPTransportConnection.createSocket(WSTCPTransportConnection.java:308)
  at com.ibm.CORBA.transport.TransportConnectionBase.connect(TransportConnectionBase.java:356)
  … 43 more
  
  The mentioned FFDC had the same ConnectException error message.
  I honestly guess that what we saw has something to do with this: HAManager: Design discussion, Why you need two JVMs to start an ND cell
  
  best regards,
  Rafael Ribeiro
  
  - 30 rafaelri
    June 11, 2010 at 2:02 am
    
    I forgot to say: I am running 6.1.0.0 (no FP applied, since it would take me too much time to download the FP applier and the FPs only for such a test).
    I am now trying to start the nodeagent, then the server (I guess it’ll work), and then shut the nodeagent down later (that will also work, since I did a similar test as mentioned in my previous comment). I’ll also try to contact the friend that complained about a malfunctioning bus when the DMGR is down, and I’ll also check at work whether authentication (VMM and ActiveDirectory integration) works while the DMGR is down, because I guess this is another thing that fails when the dmgr is down… but that is only a suspicion.
    As I suspected… the nodeagent needs to be up when the server is going up, so it isn’t only for configuration as we suspected… it seems to play some role upon server startup even if the server is brought up through the command line.
    
    best regards,
    Rafael Ribeiro
  - 31 rafaelri
    July 11, 2010 at 12:29 am
    
    As a complement… I am still trying to figure out the issue a colleague faced with the dmgr down, but as slide 8 of http://publib.boulder.ibm.com/infocenter/ieduasst/v1r1m0/index.jsp?topic=/com.ibm.iea.was_v6/was/6.0/WLM-HA/WASv6_HA_Overview/player.html suggests, the node agent needs to be running for HA.
  - 32 rafaelri
    July 11, 2010 at 1:07 am
    
    This RedBook explains the node agent dependency better: http://www.redbooks.ibm.com/redbooks/pdfs/sg246688.pdf on page 13. And after reading a lot I agree that the dmgr dependency my colleague saw was probably a misconfiguration somewhere, or he was performing an external connection to a JMS queue or something similar and was probably specifying the dmgr bootstrap port.
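When reading a long Java stack trace like the one posted in comment 29 above, the useful part is usually the last `Caused by:` entry. The following is a small, generic sketch for pulling that root cause (and the refused host and port) out of such a trace. The helper names are made up, and the `excerpt` is an abridged copy of the trace from that comment, with the first line shortened.

```python
import re


def root_cause(trace_text):
    """Return the last 'Caused by:' entry, or the first line if there is none."""
    lines = [ln.strip() for ln in trace_text.splitlines() if ln.strip()]
    causes = [ln for ln in lines if ln.startswith("Caused by:")]
    if causes:
        return causes[-1][len("Caused by:"):].strip()
    return lines[0] if lines else ""


def refused_endpoint(trace_text):
    """Return (host, port) from a 'Connection refused:host=...,port=...' message."""
    match = re.search(r"Connection refused:host=([^,\s]+),port=(\d+)", trace_text)
    return (match.group(1), int(match.group(2))) if match else None


# Abridged excerpt of the trace posted in the thread.
excerpt = """\
com.ibm.ws.exception.RuntimeError: com.ibm.ejs.EJSException: Could not register with Location Service Daemon
at com.ibm.ws.runtime.WsServerImpl.bootServerContainer(WsServerImpl.java:198)
Caused by: org.omg.CORBA.TRANSIENT: java.net.ConnectException: Connection refused:host=was1,port=9101
at com.ibm.CORBA.transport.TransportConnectionBase.connect(TransportConnectionBase.java:425)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
"""

print(root_cause(excerpt))        # java.net.ConnectException: Connection refused
print(refused_endpoint(excerpt))  # ('was1', 9101)
```

Read this way, the trace says the server could not open a socket to `was1:9101`, the endpoint where it expected to find the Location Service Daemon of the node agent, which matches the conclusion reached in the thread.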
33 rafaelri
July 11, 2010 at 1:33 am

Indeed I guess he probably did what is described on page 67 of the RedBook pdf mentioned above

34 sombriks
July 16, 2010 at 3:12 am

excellent post!

Actually, some of you may have issues with WebSphere variables that exist at different scopes. Pay attention to whether any variable is created at more than one scope, and also make sure that your environment is really synchronized.

These days I’ve hit an issue with the Oracle OCI driver running under a WebSphere 6.1 cluster (2 HP-UX machines, 2 nodes/profiles on each, and one also hosting the Dmgr): if I publish my EAR under node a or node b, it works fine, but when I publish under the cluster (which has nodes from both physical servers) the deployment fails to define the OracleDriver for the app.
Maybe some node doesn’t have the proper configuration, but if anyone has already tried and (hopefully) solved the cluster issue, please contribute here on this amazing post 🙂

35 rafaelri
July 16, 2010 at 7:12 pm

Hi Sombriks!

You only need to configure the Oracle driver on the cluster… just the same way you’d configure it at server scope or at node scope… once you have a cluster you’ll soon notice that there is a cluster scope where you can install JDBC drivers, JCA drivers, specify WebSphere variables, etc… and there is even a well-defined order of precedence for those variables… give this a try (define the JDBC provider for the cluster and you’ll notice that it automagically defines the classpath for the JDBC driver at the cluster scope…) and please let me know if this worked…

regards,
Rafael Ribeiro

36 Abhishek@UST Global
August 6, 2010 at 8:25 pm

Fantastic post. Many thanks for your efforts. The pics are helpful and really help the blog.

Thanks,
Abhishek

37 sombriks
August 20, 2010 at 5:12 pm

Hey rafaelri!

Indeed it works, and it follows the very same rules as any other scoped resource, i.e. if the same driver is created at a more specific scope, that one will override the cell-defined value.
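The override rule described here ("the most specific scope that defines the name wins") can be sketched as a lookup over an ordered list of scopes. This is an illustration only: the `resolve` helper and the path values are hypothetical, and the precedence list is passed in by the caller rather than hard-coded, because the thread does not spell out the complete order (node scope is deliberately left out of the example).

```python
def resolve(name, definitions, precedence):
    """Return (scope, value) from the first scope in `precedence` defining `name`.

    definitions: {scope_name: {variable_name: value}}
    precedence:  scope names, most specific first
    """
    for scope in precedence:
        scoped = definitions.get(scope, {})
        if name in scoped:
            return scope, scoped[name]
    raise KeyError(name)


# Assumed order for the example, most specific scope first.
PRECEDENCE = ["server", "cluster", "cell"]

# Hypothetical values for a driver path variable defined at two scopes.
definitions = {
    "cell": {"ORACLE_JDBC_DRIVER_PATH": "/opt/oracle/jdbc"},
    "cluster": {"ORACLE_JDBC_DRIVER_PATH": "/opt/oracle/jdbc-cluster"},
}

print(resolve("ORACLE_JDBC_DRIVER_PATH", definitions, PRECEDENCE))
# ('cluster', '/opt/oracle/jdbc-cluster')
```

A definition that exists only at cell scope is still found, because the lookup falls through the more specific scopes until it reaches one that defines the name.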

38 shwetha
November 10, 2010 at 5:09 am

Nice post. Many, many thanks… the pics were really helpful.

39 Hareesh
November 23, 2010 at 12:53 pm

Hi, Nice post. Thanks.

How about profiles as a candidate to fit into the picture? Where would that fit in?

40 G.B.D.J.
December 8, 2010 at 12:26 pm

Regarding your answer to Garry:

“Garry,

Are you saying IBM documentation is wrong? This Redbook on PDF page 35 is rather clear:
“The Node Agent must be running when starting the application server on that node so the application server can register with the Location Service Daemon (LSD). In WebSphere V6 the LSD is HAManager enabled therefore you only need one running Node Agent in the cluster to provide the LSD when the application servers are started on the node.”.”

For me, the documentation is saying that the node agent must be running to start the applications servers, but it says nothing about turning off the node agents after the servers started.

I do not know if this is a stable environment or if it happens commonly, but I have seen servers running and no node agent process at all….

- 41 rafaelri
  December 8, 2010 at 9:05 pm
  
  In fact that’s the point: the nodeagent plays a role in server startup, whether the server is started from the command line or from the web-based admin console. What I tried to emphasize is that it isn’t bound only to server administration.
  
42 senthil
December 13, 2010 at 4:10 pm

Really good website for WebSphere admins

43 Sreeni
December 24, 2010 at 10:06 am

Hi Rafael,

The image makes the basic WebSphere concepts of Node, Cell, Cluster etc. very easy to understand. Thanks for that. Can you please clarify the link between a Node, the OS (e.g. Windows) and the physical hardware machines (e.g. x86)?

Regards,
Sreeni

- 44 rafaelri
  December 24, 2010 at 12:35 pm
  
  The node is a virtual unit made up of the node agent and zero (it does not make much sense, but it is still a node) or more servers. It is hardware independent, so you can have a cell with nodes from multiple architectures. You can even have multiple nodes from multiple cells on a single physical server (which makes sense, for example, for hardware consolidation).
  
45 Sreeni
December 28, 2010 at 8:16 am

Thanks for your quick & prompt reply.

Regards,
Sreeni

46 papu
February 23, 2011 at 12:19 pm

Hi Rafael –

I am new to WAS and am trying to set up a cluster with 4 members [App servers] (2 from the node on machine1 and 2 from the node on machine2). What I don’t understand is whether adding a cluster member is like adding an App server to the cluster?

This is what I did so far:

1. Installed a cell profile on machine1 (so I have a dmgr on a node and another node with an app server federated to the cell).

2. Installed an App server profile on machine2 and federated the node it belongs to into the cell on machine1.

Now do I just go ahead and create the cluster via the Dmgr and add 4 members? Or do I have to install one more app server profile for each of the existing nodes on both machines?

thanks,

- 47 rafaelri
  February 24, 2011 at 1:21 am
  
  Hi Papu!
  
  Exactly: a cluster member is simply an AppServer that has been added to the cluster. But if you want any fault tolerance, you’ll need a node on machine1 as well, federated to the cell, with 2 servers on that node and 2 on the node on machine2.
  Let me know if you still have questions.
  
  regards,
  Rafael Ribeiro
  
48 FedrBodr
May 6, 2011 at 6:08 am

Hello Rafael!
I am looking for a solution and found your blog. Maybe you can help me, please?
How can I get information (host, etc.) about all cluster members (all servers)?
I need to run environment diagnostics software on all servers (checking things like directories and MQ queues), then gather all the information on one server and show it in a web interface.
I would be very glad for any advice or direction for study…

- 49 FedrBodr
  May 6, 2011 at 6:47 am
  
  And I forgot to say: not by AdminConfig.
  
  - 50 rafaelri
    May 21, 2011 at 1:03 pm
    
    When you say not by AdminConfig, do you mean only invoking the binary, or even importing the classes and invoking them directly inside your application?
    I am asking because I think that’s the best approach you have… you are almost isolated from protocol changes by IBM… they usually don’t change the API that much, so this way your application is pretty safe from WAS changes…
51 james
May 20, 2011 at 3:41 pm

We have a business application packaged as an .ear. Now the users want to roll the application out in another country with a different configuration. Both applications are on the same server; there isn’t another server. I have created a new cell because I need to isolate the two applications.

Is this correct? Since I have used a different cell, can each one have, for example, different JVM arguments and a different directory to deploy the ear?
If I create a new profile with the -addprofile action, then when I deploy, it deploys to all profiles, both the old and the new one.

- 52 rafaelri
  May 21, 2011 at 1:01 pm
  
  Yes, this is correct, but with a minor clarification: you’ll have “different servers” from WebSphere’s perspective. You may have the same PHYSICAL server, but for WebSphere, since it is a different cell, you’ll end up with a different server. However, you could also have a single cell with different servers: create another server instance and specify its JVM options (which you should try to avoid, or at least use only for JVM parameters, never for application parameters…), libraries, etc…
  
53 james
May 24, 2011 at 8:34 pm

When is the AddProfile action useful? Can you give a brief example, please?

54 nani
August 5, 2011 at 10:54 am

I want to know about vertical clustering in WebSphere 7.0

- 55 rafaelri
  August 5, 2011 at 11:38 am
  
  http://www.google.com/search?q=vertical+clustering+in+websphere+7.0
  
56 WAS-WLM@IBM
September 16, 2011 at 9:46 pm

A couple of comments to make on some of the posts above:

#7
rafaelri
July 16,2009
“WebSphere by itself does not do load balancing for web request between servers in a cluster”

This actually isn’t true. Starting in v6.1 you can create a process called a WebSphere Proxy Server to act as a client for web requests. The WAS Proxy Server is actually supposed to be the strategic replacement for the Edge Product, most of the functionality that existed in Edge should also exist in the WAS Proxy Server. WAS Proxy is also, in a simplistic sense, extremely easy to configure for simple routing, really just paying attention to the virtual host aliases is all that should be needed. Because the WAS Proxy is a WAS process, all of the data about the applications can simply be pulled directly from the cluster members hosting those applications. There are also a variety of options for affinity, and the capability for the WAS Proxy to route web requests to non-WAS processes (using Generic Clusters).

If you are familiar with WebSphere EJB load balancing, the core of WAS Proxy functionality is pretty much the same, with a few implementation details that are different.

#8-11,24-31,39-40
in short: “When do I need a deployment manager and a node agent”

As a preface for the various comments to come after this, most of my answers are centric to WebSphere EJB/IIOP routing and HTTP routing using the WAS Proxy Server, there may be differences in IHS/plug-in behaviors, as well as requirement differences in other areas of WAS, such as administration. I’m also only referencing v6+ behaviors, in part because v5 is now out of service and because v5 also has an entirely different set of needs (in short, just have the dmgr and NA’s running).

#10
“The node agents are responsible for inter-node communications only. If I have fifteen JVMs per node, all fifteen will run, but they won’t share any information when the node agent goes down.”

Depends what is defined as information. From a config-file system sense this is probably correct. However, data can be shared through the HAManager component (v6+) between the application servers even without a deployment manager and a nodeagent. Although there probably aren’t too many scenarios in which this would really be useful.

#11
“The Node Agent must be running when starting the application server on that node so the application server can register with the Location Service Daemon (LSD)”

This is correct, but you can configure the application server to not require the nodeagent by specifying the following property as part of the startServer command;

-Dcom.ibm.websphere.management.registerServerIORWithLSD=false

along with a few other changes. More info here: http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=%2Fcom.ibm.websphere.nd.multiplatform.doc%2Finfo%2Fae%2Fae%2Fxrun_jvm.html

#24
“The deployment manager is only needed for centralised administration & management of a WAS cell. Not for name server or any run-time service. To start servers on the nodes, the node agent must be running, but the deployment manager doesn’t need to be running. Started application servers on a node no longer require the node agent. They can continue to function correctly with all name services, session management & fail-over capability without node agent or deployment manager.”

This is not entirely correct (you lose some failover capabilities with no nodeagents), but to really answer the root question of when do I need a NA or a Dmgr, let’s discuss what capabilities are covered by each and what you may lose if they are not running.

Deployment Manager –
As mentioned previously, primarily for administration, but does own some additional capabilities with respect to EJB/IIOP routing, specifically the PreferLocal settings for all configured clusters and the weight tables for all of the members in all of the clusters. If the deployment manager is not running, the PreferLocal setting will not be honored, the default of false will be used. Changes to member weights will be ignored and all routing will take place as if the members had a weight of 2. In addition to losing the admin console, you’d also lose the ability to use the cluster mbean to invoke specific administrative actions on the clusters and cluster members, such as explicitly marking a server unavailable so that no requests will be routed to it. All of the Cluster-based MBeans reside on the deployment manager, so it must be running to use any of that functionality. Apart from that, you don’t lose much from a request routing perspective by the deployment manager being shut down.
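
To make the weight fallback concrete, here is a minimal Python sketch of a naive weighted round-robin selector. It is a toy model and not WLM's actual implementation: the helper names `build_schedule` and `route`, the member names and the configured weights are all hypothetical. It only illustrates why ignoring the configured weights and treating every member as weight 2 gives an even spread of requests.

```python
from collections import Counter
from itertools import cycle, islice

# Per the comment above: the weight every member is treated as having
# when the deployment manager is not running.
DEFAULT_WEIGHT = 2


def build_schedule(weights):
    """Return one round of a naive weighted round-robin schedule.

    Each member appears as many times as its weight. Hypothetical helper,
    not a WebSphere API.
    """
    schedule = []
    remaining = dict(weights)
    # Interleave members so a heavily weighted member is not hit in one burst.
    while any(count > 0 for count in remaining.values()):
        for member in sorted(remaining):
            if remaining[member] > 0:
                schedule.append(member)
                remaining[member] -= 1
    return schedule


def route(weights, requests):
    """Count how many of `requests` each member would receive."""
    return Counter(islice(cycle(build_schedule(weights)), requests))


# Hypothetical configured weights (deployment manager running).
configured = {"member1": 6, "member2": 2, "member3": 2}
# Fallback described above (deployment manager down): everyone is weight 2.
fallback = {name: DEFAULT_WEIGHT for name in configured}

with_dmgr = route(configured, 100)
without_dmgr = route(fallback, 120)
print(with_dmgr)
print(without_dmgr)
```

With the made-up weights above, `member1` receives 60 of 100 requests while the dmgr-provided weights are in effect, and each member receives 40 of 120 requests once every weight is treated as 2.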

Node Agents –
This is a little more tricky. Let’s assume default, out of the box behavior. The nodeagent has 2 primary uses with respect to routing: resolving indirect IORs into direct IORs when a client is initially attempting to route to a Clustered EJB, and acting as a failover mechanism for EJB requests. This article is one of the better ones at explaining the indirect/direct path:

http://www.ibm.com/developerworks/websphere/techjournal/0807_pape/0807_pape.html

In short, the resolve from an indirectIOR to a directIOR is the reason a nodeagent must be running when the client first attempts to route to the cluster. After the resolve has occurred, the nodeagents can be shut down and routing can still function.

The other main use of the nodeagent process with respect to EJB/IIOP routing is as a failover mechanism by the WAS routing code (WLM – Workload Management). In the event, for whatever reason, the client can no longer successfully reach a cluster member, the client will route the request to one of the nodeagents in the target cell as a failover mechanism. Because, by default, the cluster members must register with the nodeagent on startup, the nodeagents will always have the latest data about the cluster members and can update the client if its information has become stale or unusable. An example of this would be if a client process in CellA routes a number of requests to CellB:ClusterA, stops routing requests, all of the cluster members in CellB:ClusterA are shut down and restarted, and then the client process in CellA tries to route again. When the cluster members are shut down and restarted, they will come up on different dynamic ports, so the port information the client is holding onto is currently stale. It will attempt to route requests using the old data, get connection refused exceptions on each attempt, and when it runs out of members to try, the request will go to the nodeagent. The nodeagent will have the most recent port data, update the client, and the client can then resume routing successfully (all of this will be transparent to a user, they wouldn’t even know the client had to fail over to the nodeagent). If no nodeagents happened to be running, the client would not be able to get the new port information and would continuously fail in its attempts to route.
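
The stale-port scenario above can be sketched as a small simulation. This is only a toy model under stated assumptions: the `NodeAgent` and `Client` classes, the member names and the port numbers are all invented for illustration and do not correspond to any WebSphere API. A "connection refused" is modelled simply as a cached port that no longer matches the member's live port.

```python
class NodeAgent:
    """Toy stand-in for a node agent: keeps the latest port of each member."""

    def __init__(self):
        self.registry = {}

    def register(self, member, port):
        # Members register on startup, so the registry is always current.
        self.registry[member] = port

    def current_endpoints(self):
        return dict(self.registry)


class Client:
    """Toy client that caches member ports and falls back to a node agent."""

    def __init__(self, endpoints, node_agent=None):
        self.cached = dict(endpoints)
        self.node_agent = node_agent  # None models "no node agent running"

    def _try_cached(self, live_ports):
        # A cached port that differs from the live one means "connection refused".
        for member, port in self.cached.items():
            if live_ports.get(member) == port:
                return member, port
        return None

    def call(self, live_ports):
        hit = self._try_cached(live_ports)
        if hit:
            return "served by %s:%d" % hit
        # Ran out of members to try: fail over to the node agent for fresh data.
        if self.node_agent is None:
            return "failed: no reachable member and no node agent"
        self.cached = self.node_agent.current_endpoints()
        hit = self._try_cached(live_ports)
        if hit:
            return "served by %s:%d after refresh" % hit
        return "failed: node agent data did not help"


agent = NodeAgent()
live = {"memberA": 9101, "memberB": 9102}  # hypothetical dynamic ports
for name, port in live.items():
    agent.register(name, port)

client = Client(live, node_agent=agent)
orphan = Client(live, node_agent=None)
first = client.call(live)

# All members restart and come up on different dynamic ports.
live = {"memberA": 9201, "memberB": 9202}
for name, port in live.items():
    agent.register(name, port)

after_restart = client.call(live)  # stale cache, refreshed via the node agent
again = client.call(live)          # cache is now current
orphan_result = orphan.call(live)  # no node agent to ask, keeps failing
print(first, after_restart, again, orphan_result, sep="\n")
```

The client with a node agent recovers after one refresh, while the client without one keeps failing, which mirrors the behavior described in the paragraph above.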

So in summary, you need a Nodeagent to start up application servers, to resolve the initial request to a cluster, and the nodeagent can be useful in some failover scenarios.

However, this all assumes default config. I don’t want to muddy the waters by bringing up thin client vs thick client, cross-cell vs. single-cell, multiple coregroups and core group bridges, static ORB ports and static EJB routing. The above is a good general guide, but you may run into slightly different behaviors or requirements once you start configuring your topology outside of what would be considered the “default”, or using other technologies such as SIB/JMS, SIP, and some of the stack products on top of WAS, so there really isn’t one answer, it’s always going to be an “it depends” 😛

- 57 rafaelri
  September 17, 2011 at 12:46 am
  
  Tks a lot for the complete and deeply explained reply.
  I wouldn’t expect less from an IBMer.
  Regards,
  Rafael
  
58 Devendra Koli
November 2, 2011 at 6:32 am

Thanks for diagrammatic explanation

59 Sudhindra
November 10, 2011 at 7:58 pm

I am WebSphere Admin newbie. Thanks for the informative discussion.

60 Kingsly
December 15, 2011 at 8:15 pm

Excellent succinct post on WebSphere cell, node, server, cluster concepts.

I wonder if it is possible to have one deployment manager shared among multiple cells?

61 Tom Purl
January 9, 2012 at 4:36 pm

Great explanation. This is *much* better than the IBM documentation that I’ve seen. Thanks!

62 Kingsly
January 18, 2012 at 11:13 pm

rafaelri – Please clarify if it is possible to have one deployment manager shared among multiple cells?

63 Ashok
November 14, 2012 at 3:16 pm

Very well described… Thanks to everyone

64 Francois
November 26, 2012 at 9:10 pm

Rafael, I enjoyed your explanation and I am hoping you could answer a question for me. Can members of a WAS cluster reside on different network segments, or must they all be on the same network segment?

I know this can be done with recent versions of WebLogic but I am totally unable to find an answer for WebSphere Application Server. We want to set up a WAS cluster spanning two different datacenters, but stretching VLANs across datacenters is a nightmare (say my telco experts)… so we’re trying not to go down that road.

We have load balancers in both sites (with a stretched vlan of their own) so we can load-balance and pin users to a WAS server. We will stick to standalone WAS servers on each site if we have no other choice…

Any input or ideas appreciated! Thanks.
Francois

- 65 rafaelri
  November 26, 2012 at 9:36 pm
  
  Francois,
  
  I guess they can, as long as they can reach each other through TCP (as I remember, nodeagent communication was over TCP), but I also guess they need a link between them with good capacity, otherwise synchronization would be poor. But that’s only guessing; I have never done this, and it’s been a while since I last used WAS.
  
  best regards,
  Rafael
  
  P.S.: have you tried asking this on IBM forums? I used to ask a lot there and if you post like you posted over here (showing that you have researched before asking) people tend to answer!
  
  - 66 Francois
    November 26, 2012 at 9:42 pm
    
    Bandwidth and latency are not an issue between our DCs – we have an obscenely fat pipe between them! 10 Gbps, less than 5 ms round-trip time…
    
    But you’re absolutely right, IBM Forums is probably the place I should go for answers. I’m on it (I’ll post my findings here)
    
    Thanks for the answer!