[Xastir] No radar

Gerry Creager gerry.creager at tamu.edu
Sun Sep 27 10:33:32 EDT 2009


I'm trying rather desperately to get that back on-line.  Several things 
happened to cause this failure. I got the University to put money into 
new hardware to make things somewhat better, I hope.

1.  We spin a large disk farm of RAID5 arrays.  We did perform some 
upgrades and improvements over the last year. One of these involved 
using LVM2 to allow us to span multiple RAID shelves to form a single 
volume. The idea was to better be able to handle our data assets. Simply 
put, we didn't do a good job of implementing LVM2 on this hardware set, 
and we caused some crashes.  I have a new sys admin working with me who 
has a better appreciation of LVM than I do, and we're trying to get 
things back in order.  Part of that, however, is to create a RAID shelf 
that's isolated, and dedicated solely to radar data.

2.  The machine that's been serving out these data, mesonet.tamu.edu, 
started showing signs of hardware failure: system disk, memory and CPU 
errors in the logs. This could be the result of a simple power supply 
failure, or of the fact that a sorely underpowered 8-year-old webserver 
with 600k (or so) hits/day might just be getting old.  We've invested in a newer 
webserver, and I'm trying to get permission to order a 4-server system 
that'll provide load-balancing and newer hardware for this.

3.  In my lab, I've got a fair bit of hardware, power capacity and 
cooling.  What has happened is, while I've got lots of current available 
at the load center, I don't have a way to distribute it to the racks. 
I've max'ed out all the circuit breakers (don't ask how this happened: 
It's too funny to even recount, in a sad sorta way...). We're working 
now to get additional current in place but this will require taking the 
data center down while I go from a single 20-amp 110 VAC breaker/rack to 
two 30-amp, 208-volt, 3-phase AC feeds/rack.  I'll then put in, as needed, 
3-phase power distribution units for each rack, and will have the ability 
to power up systems without compromising power, breakers, etc.

4.  The one thing I try to keep up at all costs is the Unidata Local 
Data Manager (LDM) data distribution feeds, for which Texas A&M is a 
top-tier provider.  This service is offered to .edu's and a bunch of 
others around the country, and indeed, the world.  If I have to stop 
doing something and fix one of those systems, almost everything, save a 
family emergency, takes 2nd place to getting those back up.

That said, getting radar data back up is a large part of our mission, 
but we've had budget hits and personnel problems, and things have been 
tight.  I'm currently 5 months into what I'd planned as a 2-week 
deployment schedule for a new supercomputer. Hardware, not software, has 
been the problem.  We've gotten good support from SuperMicro but have 
found some problems with their production/delivery systems, which they 
have now corrected, as well as discovering what happens when someone puts a bunch 
of bad chip caps on DIMM memory modules and they don't fail "hard", but 
their failure mode causes the motherboard to change voltage supply 
levels... and then roll over dead.

I also try to do my own research as a weather modeler, with particular 
interests in tropical cyclones, and boundary layer weather phenomena, 
including wind forecasts for wind energy (turbine) farms. Neither of 
these has seen any of my effort this summer, because of all the other 
issues.

So: I'm trying to get things back together, and apologize for the 
problems. We actually hope to have mesonet.tamu.edu back available this 
week, as well as the boxes I use to produce the radar graphics.  As soon 
as they're back I'll try to remember to announce their return, and ask 
for problem reports so we can see what the issues are, and repair them.

73, gerry n5jxs

Curt, WE7U wrote:
> On Sun, 27 Sep 2009, Rick Green wrote:
> 
>>  I think the server must be down.  In his last post here, Gerry said 
>> something about alligators up to his eyeballs...  I wish I was in a 
>> position to help.  But at this distance, I can only be patient.
> 
> Yes, that came across some days (weeks?) back, but I don't try to
> pressure people who are doing things for free for us out of the
> goodness of their hearts.
> 
> There are replacement methods that you can use which don't depend on
> the TAMU servers.  Perhaps someone on here can refresh our memories.
> I think they were RIDGE radars available from regional NWS sites?
> 

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


