Networking Power Requirements to upgrade to 10Gbits/s
Gary Buhrmaster, Yang Wei, John Weisskopf, Ron Barrett, Charley Granieri,
Ken Martell, Les Cottrell, Boris Ilinets, Karl Amrhein, Teresa Downey
Introduction
LHC USATLAS needs to upgrade the networking capacity for their servers at SLAC to
10Gbits/s. This is a requirement to test and validate the performance of the
SLAC tier2 site in preparation for the LHC turn-on later this year. The
requirement has a desired deadline of end of January. The purpose of the first
meeting on 1/7/2008 was to understand the current situation, evaluate what
needs to be done and what can be done to enable 10Gbits/s to the SLAC core
network, and to identify the next steps. These notes are not meant to capture
all the details of this lively meeting but rather to identify the next steps.
The new 10Gbits/s equipment (8 switches) is already on-site and installed in the
racks. The basic requirement for the networking equipment is to have house and
UPS power (for backup) and to stay within the cutover limits. This is complicated
by the problems of balancing load, staying within limits during cutover,
harmonics, and UPS1 (the networking UPS) being 17 years old and past end
of life.
Actions proposed from 1/7/08 Meeting
Attendees included all from the above list except Karl Amrhein and Teresa.
The following actions will be taken in the short term to get UPS power for the
first 4 new switch/routers and the Nokia firewall (for business services, which
has the same power issues) up and running:
Turn off the UPS to the console and
flora servers in the network row. This should free up a couple of amps. This
will be done by Ron by close of business (COB) 1/9/08.
Rebalance the network equipment to
see the effects.
Inventory network equipment to see
what is connected to UPS and identify what can be turned off, removed from the
UPS or moved to another UPS. This will be done by Ken and Ron and will be
ready by January 14th at 9:00am.
Verify that FARM13 is using house
power only. John/Ron did this following the meeting.
Move FARM12 equipment to another
switch (FARM16) so FARM12 can be removed. Some of the machines are in the
klyster cluster administered by Bob Steel and others are Kavli (Orange)
servers administered by Stuart Marshall. The systems will need new IP
addresses (re-IP) since they need Gbit/s. John will coordinate
the move, to be completed on Tuesday 1/15/08. Wei will send the alert and
coordinate the re-IP. Teresa is prepared to do the CANDO updates; Neal is aware
of the LSF needs. We will bring up a klyster and an Orange
host in advance of the cutover to ensure the final cutover will be smooth.
Remove equipment from FARM01 (2
sulkies are critical so must be moved). They can be moved to FARM11 (in
the same row) without a new IP address. John will coordinate; it will be done
soon after Mike Hogaboom returns on January 22nd.
New nodes need to be added to FARM16
which will require re-addressing hosts. Wei will talk to systems to get a
schedule for this.
Run the border router on one side
only (i.e. only house or only UPS), then run the 2nd border on the other power
source. This will normally use the same amount of power but may help in case of
a loss of power on one side. Gary & Charley will review this after the
above items have been accomplished.
Charley will get the first 4 new
switches (BORDER1 and 2, new CORE1 and 2) running on house power (assuming it
is available) to provide burn in. Then they will be configured.
Longer term
we have to address:
Bringing up the remaining 4 new
switches.
Additional power will be needed for
the 4 new switches. Boris can provide this from the 75KVA Power Management
Module (PMM), which was purchased (along with a 125KVA PMM at the same time) to
provide power to move the Windows systems to where the VAXen
used to be (part of the floor replacement project). An executive decision will
be required on this (i.e. who gets the PMM).
Creating a plan to provide
sufficient clean power with backup for the network equipment. We should replace
the 17-year-old UPS1, which is running hot, introducing harmonics and could
fail. Also consider whether to get lots of small UPSs to provide backup power.
Accomplishments/Questions
- Tuesday:
- Need to expedite a plan to
move the systems on the farm09/10 switches to the (new) farm16 (in fact,
that is what farm16 was put in place to do). We also need to move
the ports from farm07, which had most of its systems turned off at one
point. This will allow the turn-off of two or three additional old
switches (and for *simple* values, for every two-to-three old switches
you can turn off, you get the equivalent for a new style (much more power
hungry) switch). As with the moves of systems from the (temp)
farm16 to the (real/new) farm16, this requires a reboot of the servers,
along with some configuration/cabling moves. Wei will need to get
the systems groups schedule for accomplishing these moves. This is
a new action item for Wei.
- Some load removed in Sacramento
row. PP-UPS1 decreased to Ia = 80.5A, Ib = 65A, Ic = 60.1A.
Unfortunately the phase imbalance is still the same and In = 32A. It would
be very helpful to redistribute some load from phase A to the other two
phases and so lower In.
- Wednesday:
- Notified Ted Shab that we are getting close to having power for the new EPN2
firewall.
- IP addresses for the Klyster and Orange machines allocated.
- Thursday:
- Sulky16, 19 have cables in
place, moved from FARM01 to FARM11 and FARM01 turned off.
- Have green light from Bob
Steel and Stuart Marshall for Tuesday move of Orange & Klyster machines from FARM12 to FARM16.
- Karl Amrhein
taking care of Orange cluster testing/Infiniband issues.
- Friday:
- All in-use equipment in 4
network rows labeled, power cables labeled at both ends of the cable.
Orange = UPS, white = house power. Needs to be entered into inventory.
- Orange-nfs
needs to be tested. It has an IP address (172.23.32.64) but is not pingable; is it connected? Stuart Marshall agrees there is no
need to pre-test with a single host in preparation for the Orange
cluster move.
- Ports on FARM16 assigned for
Orange & Klysters
- Follow-on meeting arranged for
1/15/08.
- Monday:
- Completed removal of equipment
from Sacramento
row. Boris checking load & temp of UPS1 cabinet in preparation for
Row 8 energization.
- Two new PMMs
arrived.
- Need to get network cables for
first 4 new switches installed. Gary
has assigned. Charley reviewed and sent email to Ron etc.
- Need to move Nospam3’s second
power source to house power. Ron will make it happen. It has been renamed
from mailgate03; its console is acting up.
- Tuesday:
- Orange-nfs
reported (via RT) connected (John) and switch configured (Charley). Karl
does OS installation.
- Second house power supply
moved from old SWH-FARM16 to new SWH-FARM16; new SWH-FARM16 now has both
house & UPS.
- FARM12 machines moved to
FARM16. Klyster and Orange clusters back up (Lustre awaits
Stuart Marshall). SWH-FARM12 shut down.
- Spreadsheets of device, power
source(s), location and notes created for 4 network rows.
- Wei achieves file transfer of
> 1 Gbits/s for 6 hours from BNL to SLAC.
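The PP-UPS1 phase currents recorded under Tuesday above (Ia = 80.5A, Ib = 65A, Ic = 60.1A, In = 32A) can be checked with a short calculation. The following is a minimal sketch, not a measurement procedure: it assumes a three-phase wye system with phases exactly 120 degrees apart, purely sinusoidal currents, and the same power factor on each phase. None of these conditions is stated in the notes, and the function name is illustrative only.

```python
import cmath
import math


def fundamental_neutral_current(ia, ib, ic):
    """Magnitude of the neutral current caused by phase imbalance alone.

    Assumption (not from the notes): sinusoidal currents, phases 120 degrees
    apart, identical power factor on every phase, so only the magnitudes differ.
    """
    a = cmath.rect(ia, 0.0)
    b = cmath.rect(ib, math.radians(-120.0))
    c = cmath.rect(ic, math.radians(120.0))
    return abs(a + b + c)


# Values recorded in the notes for PP-UPS1.
ia, ib, ic = 80.5, 65.0, 60.1
measured_in = 32.0

imbalance_in = fundamental_neutral_current(ia, ib, ic)
print(f"Neutral current from imbalance alone: {imbalance_in:.1f} A")
print(f"Measured neutral current:             {measured_in:.1f} A")

# If the same total load were spread evenly, the imbalance term vanishes.
mean = (ia + ib + ic) / 3.0
print(f"Balanced load would be {mean:.1f} A per phase, "
      f"neutral {fundamental_neutral_current(mean, mean, mean):.1f} A")
```

Under these assumptions the imbalance accounts for roughly 18 A of neutral current, less than the 32 A measured. The difference would be consistent with the harmonics mentioned earlier in these notes, since harmonic currents can add in the neutral, but the notes do not confirm that, and rebalancing phase A would only reduce the imbalance part.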
Action Items from meeting 1/16/08:
Attendees: Ken, Wei, Charley, Boris, Gary, Antonio, John, Ron, Richard, Les
John noted
that his team will be under pressure to install Dells. Not sure of impact on
10G project.
- EPN2 power is not a priority at
this time; it has not been decided when to do it.
- Move FARM16 old to FARM16 new.
- They are close, believe we can
reuse cables, Ron will check
- Wei will schedule Re-IP,
inform users, move.
- Ron will move cables
- Scheduled tentatively for
Friday 1/18/08
- Bring up 10G core
- We have enough power.
- Charley will take lead, before
Jan 28th
- There needs to be a 30 min
outage of PPUPS1. This will require electricians in the scheduling.
Probably in early Feb.
- Move one of each of
RTR-BSDNET1 and 2 power supplies to house power, Charley will schedule
this
- Boris, Charley, Ron discuss
how and when to make this happen and report on plan by Tuesday 22 Jan.
- Charley will verify FARM7 can
move to FARM16.
- Get update on next scheduled
power outage.
Next
meeting in 1 week to review progress on 10G move and FARM16, and look at
FARM7. Boris may have jury duty, Monday is Martin Luther King’s birthday.
Accomplishments/Questions
- Wednesday:
- Halimede, RTR-MON powered off.
- Ron updated and made Excel
power spreadsheets available via Sharepoint
- Wei contacts users of hosts on
FARM16 to schedule outage
- Thursday
- Tuesday-Thursday
- First four switches powered
up. Now drawing 72A on phase A (the load used to be 65.7A). Netdev met and is working on a plan to bring up the new
core: initially interconnect the switches (RTR-CORE1,2 and RTR-BORDER1,2
for redundancy), then connect up ESnet &
Stanford, plus RTR-CORE1,2-OLD & FARM-CORE1. When done we can power
down RTR-DMZ1 and RTR-CORE.
- Completed outage to move
machines from FARM16 old to new. Machines moved & restored to service
except yakut12.
- Next meeting scheduled for 2pm
Friday. Next switches are FARM9 & 10 (fibre
connections are tricky, and these are big outages) and FARM7 (lots of old
cabling & random hosts). FARM7 can move to FARM16, which is probably easier
than FARM9 & 10.
- Identified the need to move Bbr-xfer12..17
from FARM09 to FARM16. Wei gets agreement from Wilko
and Shirley to make the move on Monday at 10:30am.
- FARM16 old powered off.
- Jean Pierre asks when
there will be power for the new firewall, so he can replace the Nokia IP740
RTR-FW01. The old firewall has both house power and UPS and is advertised at
3A/100-120V. The new Nokia IP 1270 firewall draws 300W. During the
cutover, both firewalls need to be connected
for about 8 hours. Once the new firewall is in production the old
RTR-FW01 and BSDNET1 can be removed. They are all in Walkabout Creek
2BK-03.
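The firewall figures above give a rough upper bound on the extra load while both units are connected during the roughly 8 hour cutover. The sketch below treats the old firewall's advertised 3A as a worst-case draw and assumes a power factor of 1 for the new unit. Both are assumptions, not measurements from these notes, and the constant and function names are illustrative only.

```python
OLD_FW_RATED_AMPS = 3.0   # Nokia IP740 RTR-FW01, advertised 3A at 100-120V
NEW_FW_WATTS = 300.0      # Nokia IP 1270, stated draw


def cutover_load_va(volts, power_factor=1.0):
    """Apparent power (VA) with both firewalls connected.

    Assumption: the old unit draws its full advertised current, and the new
    unit's 300W converts to VA with the given (assumed) power factor.
    """
    old_va = OLD_FW_RATED_AMPS * volts
    new_va = NEW_FW_WATTS / power_factor
    return old_va + new_va


def cutover_load_amps(volts, power_factor=1.0):
    """Total current at the given supply voltage during the cutover."""
    return cutover_load_va(volts, power_factor) / volts


for volts in (100.0, 120.0):
    print(f"{volts:.0f} V: {cutover_load_va(volts):.0f} VA, "
          f"{cutover_load_amps(volts):.1f} A with both firewalls connected")
```

Under these assumptions the cutover needs up to about 5.5A at 120V (6A at 100V), dropping to the new firewall's share once RTR-FW01 is removed. Actual draw is usually below the advertised rating, so this should be read as a ceiling for planning rather than an expected value.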
Meeting Friday 1/25/08
In
attendance: Len Moss, Chuck Boeheim, Randy Melen, John Bartelt, Boris Ilinets, Gary Buhrmaster, Shirley Melen, Wei Yang, Charley
Granieri, Les Cottrell.
Wei needs a
date when the ATLAS hosts will have a 10Gbps path to the outside. He wants to
participate in the ATLAS full dress rehearsal, which happens in February.
He is also on vacation in China from Feb 1 – 18. Gary reported he will be doing link testing
with ESnet on the evening of Jan 28th. He hopes
to advertise BGP after a couple of days, and to be able to report good
news by Friday 1st February.
We looked
at the requirements to move machines off FARM07, 09 and 10. The contents of
these switches can be found via http://www.slac.stanford.edu/comp/net/mon-slaconly/lanmon/cathtml/switch-index.netmaster.html
For these switches there are many hosts to be moved, and each move requires opening
a ticket, coordinating with users, setting a date, getting cabling in place,
getting IP addresses and making the move.
- BBR-XFER12..20 on FARM9 are
being moved to FARM16 by Shirley on Monday 28th January.
- Shirley will look after the Sulkys.
- Len will look after the Yakuts, BBR-SIMUL, BBR-EVDISP,
BLDLNX, and tentatively some of the OBJY hosts.
- Neal will need to be involved
in moving the GRIDDEVs and MORABs. The GRIs can be done at any time. I am not sure who will
coordinate/do this.
- John Bartelt
will look after the GLASTLNX01-15 hosts. Two of them are known on the
Internet and so will need extra care.
- Yacek needs to be contacted for the
DATADEVSOL and DATADEVLNX hosts. I am unclear who will lead this.
The next
meeting will be on February 29th, Charley will organize.