[Xastir] No radar
Gerry Creager
gerry.creager at tamu.edu
Sun Sep 27 10:33:32 EDT 2009
I'm trying rather desperately to get that back on-line. Several things
happened to cause this failure. I got the University to put money into
new hardware to make things somewhat better, I hope.
1. We spin a large disk farm of RAID5 arrays. We did perform some
upgrades and improvements over the last year. One of these involved
using LVM2 to allow us to span multiple RAID shelves to form a single
volume. The idea was to better be able to handle our data assets. Simply
put, we didn't do a good job of implementing LVM2 on this hardware set,
and we caused some crashes. I have a new sys admin working with me who
has a better appreciation of LVM than I do, and we're trying to get
things back in order. Part of that, however, is to create a RAID shelf
that's isolated and dedicated solely to radar data.
2. The machine that's been serving out these data, mesonet.tamu.edu,
started showing signs of hardware failure: system disk, memory and CPU
errors in the logs. This could be the result of a simple power supply
failure, or of the fact that a sorely underpowered 8-year-old webserver
with 600k (or so) hits/day might just be getting old. We've invested in a newer
webserver, and I'm trying to get permission to order a 4-server system
that'll provide load-balancing and newer hardware for this.
3. In my lab, I've got a fair bit of hardware, power capacity and
cooling. What has happened is, while I've got lots of current available
at the load center, I don't have a way to distribute it to the racks.
I've max'ed out all the circuit breakers (don't ask how this happened:
It's too funny to even recount, in a sad sorta way...). We're working
now to get additional current in place but this will require taking the
data center down while I go from a single 20-amp 110 VAC breaker per rack
to two 30-amp, 208-volt, 3-phase AC feeds per rack. I'll then put in, as
needed, 3-phase power distribution units for each rack, and will have the ability
to power up systems without compromising power, breakers, etc.
4. The one thing I try to keep up at all costs is the Unidata Local
Data Manager (LDM) data distribution feeds, for which Texas A&M is a
top-tier provider. This service is offered to .edu's and a bunch of
others around the country, and indeed, the world. If I have to stop
doing something and fix one of those systems, almost everything, save a
family emergency, takes 2nd place to getting those back up.
That said, getting radar data back up is a large part of our mission,
but we've had budget hits and personnel problems, and things have been
tight. I'm currently 5 months into what I'd planned as a 2-week
deployment schedule for a new supercomputer. Hardware, not software, has
been the problem. We've gotten good support from SuperMicro but have
found some problems with their production/delivery systems, which they
have now corrected. We've also discovered what happens when someone puts a bunch
of bad chip caps on DIMM memory modules and they don't fail "hard", but
their failure mode causes the motherboard to change voltage supply
levels... and then roll over dead.
I also try to do my own research as a weather modeler, with particular
interests in tropical cyclones, and boundary layer weather phenomena,
including wind forecasts for wind energy (turbine) farms. Neither of
these has seen any of my effort this summer, because of all the other
issues.
So: I'm trying to get things back together, and apologize for the
problems. We actually hope to have mesonet.tamu.edu back available this
week, as well as the boxes I use to produce the radar graphics. As soon
as they're back I'll try to remember to announce their return, and ask
for problem reports so we can see what the issues are, and repair them.
73, gerry n5jxs
Curt, WE7U wrote:
> On Sun, 27 Sep 2009, Rick Green wrote:
>
>> I think the server must be down. In his last post here, Gerry said
>> something about alligators up to his eyeballs... I wish I was in a
>> position to help. But at this distance, I can only be patient.
>
> Yes, that came across some days (weeks?) back, but I don't try to
> pressure people who are doing things for free for us out of the
> goodness of their hearts.
>
> There are replacement methods that you can use which don't depend on
> the servers at TAMU. Perhaps someone on here can refresh our memories.
> I think they were RIDGE radars available from regional NWS sites?
>
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843