SLAC ESD Software Engineering Group
UNIX SYSTEM ADMIN

NFS Server Setup on Sun Storage 7310

 


 


 

 

Infrastructure preparation

 

1) Sun Unified Storage 7310 cluster with one J4400 JBOD cabled

2) allocate three nodenames/IPs for each node

 

node 0

one for NFS server (node 0): node0_nfs_ip/x
connect to node0's NET port 2

one for ILOM/SP (remote console) on LCLS-UTIL: node0_sp_ip/x
connect to node0's NET MGT port

one for BUI on LCLS-UTIL: node0_bui_ip/x
connect to node0's NET port 0


Actual assignments (mccfs4 will later be renamed to mccfs2; see the NFS Server Migration Plan below):

  • mccfs4        172.27.8.13   NGE2
  • mccfs4-mgt    172.27.7.16   NET MGT
  • mccfs4-bui    172.27.7.25   NGE0

node 1

one for NFS server (same as node0): node0_nfs_ip/x
connect to node1's NET port 2

one for ILOM/SP (remote console) on LCLS-UTIL: node1_sp_ip/x
connect to node1's NET MGT port

one for BUI on LCLS-UTIL: node1_bui_ip/x
connect to node1's NET port 1

Actual assignments:

  • mccfs4        (same as node0) for active/passive clustering
  • mccfs3-mgt    172.27.7.15   NET MGT
  • mccfs3-bui    172.27.7.24   NGE1
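
For reference, the allocations above can be summarized in /etc/hosts format (at SLAC these names would normally live in DNS; this listing is just a convenience sketch of the same data):

  172.27.8.13   mccfs4        # node0 NFS service, NGE2 (later renamed to mccfs2; shared with node1 for failover)
  172.27.7.16   mccfs4-mgt    # node0 ILOM/SP, NET MGT
  172.27.7.25   mccfs4-bui    # node0 BUI, NGE0
  172.27.7.15   mccfs3-mgt    # node1 ILOM/SP, NET MGT
  172.27.7.24   mccfs3-bui    # node1 BUI, NGE1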

 

 

Infrastructure setup

1) set up DHCP for ILOM/SP

Configure the DHCP servers (mccsrv01 and mccsrv02) and add reservations for node0_sp_ip and node1_sp_ip.

This makes the service processors reachable so the BUI network can be configured next.
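
The exact DHCP configuration depends on how mccsrv01/mccsrv02 are managed; as a rough sketch, assuming ISC dhcpd and placeholder MAC addresses (read the real NET MGT MACs off each node's SP label or console), the reservations might look like:

  # dhcpd.conf fragment on mccsrv01/mccsrv02 (sketch only)
  host node0-sp {
      hardware ethernet xx:xx:xx:xx:xx:xx;   # node0 NET MGT MAC (placeholder)
      fixed-address 172.27.7.16;             # node0_sp_ip (mccfs4-mgt)
  }
  host node1-sp {
      hardware ethernet yy:yy:yy:yy:yy:yy;   # node1 NET MGT MAC (placeholder)
      fixed-address 172.27.7.15;             # node1_sp_ip (mccfs3-mgt)
  }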

2) set up BUI connection
ssh to node0_sp_ip
start /SP/console to access node0's console
configure the network for node0's BUI (node0_bui_ip)

Enter the hostname, DNS servers, IP address, netmask, etc.:

  • nodename: node0_bui
  • domain: slac.stanford.edu
  • default gateway: 172.27.4.1
  • mask: 255.255.252.0      
  • 134.79.151.12  mccsrv01
  • 134.79.151.13  mccsrv02 (later)
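
A minimal sketch of this console step from an admin host (assuming the standard Oracle/Sun ILOM interface; prompts may vary with firmware):

  ssh root@node0_sp_ip          # log in to node0's ILOM/SP from an LCLS-UTIL host
  -> start /SP/console          # attach to the host console (escape back to ILOM with "#."),
                                # then answer the hostname/DNS/IP/netmask prompts of the
                                # appliance's initial configuration dialog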

 

Cluster setup

use node0's BUI (https://node0_bui:215) for cluster setup

ssh node0's SP to monitor node0's console
ssh node1's SP to monitor node1's console

ensure that node1 hasn't been initially set up; otherwise, perform a factory reset

Click 'START' to begin. Set up the cluster using the displayed cluster
setup screen.

1) Enter node1's bui name (node1_bui) when prompted.

2) set up the network configuration

nge0: node0 bui Datalink           node0 bui Interface                (BUI for node0)
(lock the bui Network port)

nge1: node1 bui Datalink          node1 bui Interface                 (BUI for node1)

         Hit Apply

         Configure Routing to add a default route for node1_bui

         family: IPv4

         kind: Default

        Gateway: 172.27.4.1

        Interface: node1_bui (nge1)

nge2: node0 nfs Datalink           node0 nfs Interface               (ip for node0 nfs server)

         Hit Apply

         Configure Routing to add a default route for node0

         family: IPv4

         kind: Default

        Gateway: 172.27.8.1

        Interface: node0 (nge2)

All IPs must be static; otherwise, the cluster will not work.

Enter DNS, NTP, NIS, etc

  • 134.79.151.12
  • 134.79.151.13
 

Skip NIS, LDAP, and AD
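
Before moving on, it is worth confirming from a Linux admin host that the static names and addresses above actually resolve and answer (a quick sanity sketch):

  host mccfs4-bui               # should return 172.27.7.25
  host mccfs3-bui               # should return 172.27.7.24
  ping -c 3 mccfs4-bui          # node0 BUI interface reachable
  ping -c 3 mccfs3-bui          # node1 BUI interface reachable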

3) Storage Pool Setup

Create one storage pool

Pool name: mccsp
Data Profile: Mirrored

Log Profile: Striped

Skip "Phone Home" setup
Perform SCRUB

4) assign ownership

Assign node0 bui network to node0_bui and set as private  NGE0
Assign node1 bui network to node1_bui                             NGE1
Assign node0 nfs to node0_bui                                           NGE2

Assign mccsp to node0_bui

Hit Apply

Don't select failback when asked to confirm the change.

5) perform a "Failback"
Verify that all "singleton" resources have relocated to their host (owner)
nodes.

6) BUI to node1
Go to "Configuration Cluster" and make node1's bui network interface private
on node1 (this is the only time node1 needs to be configured directly during the whole cluster setup)

7) Validate Cluster Setup

On node1's Configuration Cluster Window, perform a "Takeover". Monitor
node0's console. Verify that all of node0's singleton resources have failed
over to node1. Use "ping mccfs2" and "arp mccfs2" to find mccfs2's MAC address and make sure it belongs to node1.

On node1's Configuration Cluster Window, perform a "Failback". Monitor
node0 and node1 consoles. Verify that all of node0's singleton resources
have failed back to node0.  Use "ping mccfs2" and "arp mccfs2" to find mccfs2's MAC address and make sure it belongs to node0.
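
From a Linux host on the same subnet, the MAC check can be as simple as the following (deciding which head currently owns the address means comparing the MAC against each head's NGE2 MAC, readable from the BUI or the consoles):

  ping -c 1 mccfs2              # refresh the ARP entry for the NFS service address
  arp mccfs2                    # note the hardware address: node1's NGE2 after a takeover,
                                # node0's NGE2 after a failback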

BUI to node0, commit (this step is only for initial cluster setup and validation)

Project setup

Shares -> PROJECTS (+ Add Project)
Name: mccfs

Edit mccfs
Shares -> General

Space Usage
Data (default)

Inherited Properties

Mountpoint /export/mccfs
Read only (unchecked)
Updated access time on read (unchecked)
Non-blocking mandatory locking (unchecked)
Data deduplication (unchecked)
Data compression (off)
Checksum (Fletcher4 (Standard))
Cache device usage (All data and metadata)
Synchronous write bias (Latency)
Database record size (8k)
Additional replication (Normal (Single Copy))
Virus scan (unchecked)
Prevent destruction (unchecked)

Default Settings

FILESYSTEMS

User (root)
Group (root)
Permissions (RWX  RWX  RX)

LUNS (use default)

Shares -> PROJECTS -> Protocols
NFS
Share Mode (None)
Disable setuid/setgid file creation (unchecked)
Prevent clients from mounting subdirectories (unchecked)
Leaving this unchecked (allowing clients to mount subdirectories) is required for IOC NFS mounting.
Anonymous user mapping (nobody)
Character encoding (default)
Security mode (Default (AUTH_SYS))

NFS Exceptions


Network LCLSCA (RW), PCD(RW), FACETCA (RW), DMZ (R)
CHARSET (default)
ROOT ACCESS (unchecked, except for lcls-daemon3.slac.stanford.edu and mccfs5.slac.stanford.edu)

Note that the nodename must be fully qualified. The nodes with root access will be used to configure the shared filesystems with root privilege.

Note:

a) Root access for 172.27.8.0/22 has been enabled to help with data migration. It will be disabled afterwards.

b) Configuration -> Services -> NFS: Minimum supported version set to NFSv2 to support RTEMS-based hard IOCs, which run NFSv2.
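
Once the protocol settings are applied, the exports and supported NFS versions can be sanity-checked from any client on an allowed network using standard NFS client tools, e.g.:

  showmount -e mccfs4           # list exported filesystems (expect entries under /export/mccfs)
  rpcinfo -p mccfs4 | grep nfs  # confirm NFS versions 2, 3 and 4 are registered,
                                # since the RTEMS hard IOCs need NFSv2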

HTTP
Share mode (Read only)

Shares -> PROJECTS -> Access
Shares -> PROJECTS -> Snapshots
Shares -> PROJECTS -> Replication

Create shares in project mccfs


Shares -> PROJECTS -> mccfs
+
Project (mccfs)
Name (usrlocal)

(Note: for /usr/local, uncheck "Inherit from project" and manually set the mountpoint to /export/mccfs/usr/local; for all other shares, check "Inherit from project")
Data migration source (None)
User (nobody)
Group (other)
Permissions (RWX RX RX)
Inherit mountpoint (checked)
Reject non UTF-8 (checked)
Case sensitivity (Mixed)
Normalization (None)

Create shares: home, u1, u2, u3

 For each share (or filesystem created), be sure to update its access: Shares -> Shares, click each filesystem, select Access:

Root Directory Access:

User: root

Group: root

Permissions RWX RX RX
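
After each share is created, a quick check from one of the root-access clients (e.g. lcls-daemon3) confirms the mountpoint, ownership, and permissions; a sketch using an arbitrary temporary mountpoint:

  mkdir -p /mnt/nfstest                         # temporary test mountpoint (any empty directory)
  mount mccfs4:/export/mccfs/home /mnt/nfstest  # mount the new share over NFS
  ls -ld /mnt/nfstest                           # expect root:root and drwxr-xr-x (RWX RX RX)
  umount /mnt/nfstest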

Services

Below is a snapshot of services enabled/disabled on our SS7310 system.

 

25-Oct-2013 Turned on SYSLOG

Click on "Configuration"

Click on the button to the left of Syslog (edit Service Configuration)

Selected Protocol: Updated Syslog (RFC 5424)

Added mccsyslog: 134.79.151.40

Clicked Apply

Changes are propagated to the other cluster head automatically
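
To confirm the forwarding works, trigger an event on the appliance (e.g. a BUI login) and watch for it on the syslog host; a sketch, assuming mccsyslog writes remote messages to /var/log/messages (the actual file depends on its syslog configuration):

  # on mccsyslog (134.79.151.40)
  tail -f /var/log/messages | grep -i mccfs     # appliance entries should start appearing here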

 

 

 

 

Test

In the BUI (http://node0), go to Maintenance/System.

1. Firmware Upgrade Procedure

Takeover and Failback:

A takeover takes all of the resources from the partner head, regardless of which head those resources are assigned to.  A failback merely gives back any resources assigned to the partner head.

To upgrade node0, do a takeover from the partner head (for example, node1). This will cause node0 to reboot, and when it comes back up it is ready for an upgrade. When upgrading node1, do a takeover from node0, which will cause node1 to reboot; once it comes back up it is ready for an upgrade.

 

 

 

2. Support Bundles

  • Create a Support Bundle and send it to TechSupport

            Go to Maintenance -> System in the BUI. To generate a support bundle, click the (+) icon next to Support Bundles. It will first
            generate the bundle and then try to send it to TechSupport. Since we don't have a connection to TechSupport, the upload will
            fail and retry. At this point, click Cancel to stop the upload; instead, download the bundle to the local desktop machine
            and ftp it to TechSupport. In case we want to check the bundle file ourselves, ftp the bundle file to public
            (ftp ftp.slac.stanford.edu, use binary mode), unzip and untar it on a Solaris machine, and examine e.g. the cifs/cifs.out file.
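
            The unpack-and-inspect step on a Solaris machine is the usual gunzip/tar sequence; the bundle filename below is a placeholder:

              gunzip support-bundle.tar.gz      # placeholder name; use the downloaded bundle's real name
              tar xvf support-bundle.tar
              more cifs/cifs.out                # example: inspect the CIFS diagnostics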

3. Configuration Backup

           Configuration Backup tasks can be accomplished using the Configuration Backup area near the bottom of the

           Maintenance > System screen in the BUI.

           To create a backup, click the "Backup" button above the list of saved configurations and follow the instructions.

           You will be prompted to enter a descriptive comment for the backup.

       

           This configuration file can be sent to TechSupport for debugging, but we should NEVER use it to restore the system,
            as I found it is very destructive - it can wipe out all configurations in Storage Pool, Projects and Shares.

Note:

a. All bundle files and configuration backup files are saved to Z:\unixadmin\SS7310 for safety.

b. Snapshots of system configurations are also kept in Z:\unixadmin\SS7310\Configuration Snapshoot for reference.

How to upgrade the ILOM/BIOS


Unconfiguring Clustering

Don't do it, as it is a very destructive operation. Do it only when one of the clustered nodes (or heads) fails and needs to be replaced with a new one.

  1. Select the failed/questionable head (and reset it to its factory configuration, if needed)
  2. From the system console of the head that will be reset to its factory configuration, perform a
    factory reset ( CLI > maintenance > system > factoryreset)
  3. The head will reboot, and its peer will take over normally. When the head reboots, power it
    off and wait for its peer to complete takeover.
  4. Detach the cluster interconnect and JBOD cables (see above) from the powered-off head (This is very important; otherwise, it will cause unconfiguring to fail)
  5. On the surviving head, click the Unconfig button on the Configuration -> Clustering
    screen. All resources will become assigned to the surviving head, and that head will no
    longer be a member of any cluster.

 

NFS Server Migration Plan

  1. configure mccfs2 to mount /newu1, /newlocal and /newhome on mccfs4

mount mccfs4:/export/mccfs/home /newhome

mount mccfs4:/export/mccfs/usr/local /newlocal

mount mccfs4:/export/mccfs/u1 /newu1

2. on mccfs2 as root, run the following (one at a time) and make sure each completes successfully

          rsync -avSH /usr/local/ /newlocal

          rsync -avSH /u1/ /newu1

          rsync -avSH /home/ /newhome

3. on lcls-srv20

          mount /u1, /home, /usr/local on mccfs4

  • reconfigure all IOCs to use the new mount path (in $IOC/All/Prod and in DHCP for FACET and LCLS)
  • rename mccfs2 (nodename/IP) to mccfs5

This is equivalent to disabling the NFS server, but has additional advantages. All NFS clients should stop writing to the NFS server. mccfs5 will be kept to continue hosting the Matlab license server, printing server, account management, and system file distribution. We can test all these functions on mccfs5.

  • on mccfs5 as root, make a final data migration (again, one at a time; a dry-run verification sketch follows this plan)

mount /newu1, /newlocal, /newhome on mccfs4
rsync -avSH --delete /u1/ /newu1
rsync -avSH --delete /usr/local/ /newlocal
rsync -avSH --delete /home/ /newhome

  • rename mccfs4 to mccfs2 (nodename/IP)
  • reboot all NFS clients and IOCs in an orderly fashion
  • reconfigure mccfs5 to mount mccfs2 and reboot mccfs5
  • test applications on Sunray, OPIs, and servers (daemon and interactive)
  • test IOC applications (check screen logging, save/restore, edm, etc.)
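
Before the rename in the final migration step, a dry run of the same rsync commands is a cheap way to confirm the old and new trees are in sync: with -n nothing is copied, and an (almost) empty file list means there is nothing left to transfer. A sketch, run on mccfs5 as root with the /new* mounts still in place:

  rsync -avSHn --delete /u1/ /newu1             # -n = dry run; should report little or nothing to send
  rsync -avSHn --delete /usr/local/ /newlocal
  rsync -avSHn --delete /home/ /newhome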

Backup setup
-?

Issues

It is a restriction of the 7000 Series Appliance that all shares must be exported for NFS out of the base directory of /export. The restriction is imposed due to the underlying OS implementation and the way NFS works. Because of this, we are limited to having all shares be under the /export directory with no way to change this.

 

Miscellaneous


          drwxr-xr-x+ 3 root root 5 Feb 24 15:53 .


The plus sign (+) indicates the presence of an ACL (access control list). ACL is an extension to the normal *nix permissions system which increases security by allowing the system more fine-tuning in who is permitted to access specific files.
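
To see the actual ACL entries behind the "+", list them explicitly; a sketch (the directory is a placeholder - Solaris clients print each ACL entry with "ls -v", while Linux clients use getfacl for POSIX ACLs or nfs4_getfacl where NFSv4 ACLs are in use):

  ls -v /some/dir               # Solaris: long listing with one line per ACL entry
  getfacl /some/dir             # Linux: list POSIX ACL entries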

 

 

Authors: Ken Brobeck and Jingchen Zhou. Last edited on 02/10/11