Saturday, April 12, 2008
serious problems with www/ftp server
the server began crashing yesterday afternoon (2008.04.11) but recovered automatically each time throughout the day. last night the machine crashed and would not restart. it looks very much like a trip to the colo is in order to fix it.
as this is a very old box running with very old disks in it, i am considering retiring it completely. the websites will all move to a new server but the news server will have to be retired as i do not have the bits to run it on other hardware.
more on that as things progress. i won't really know anything until i'm able to look at the machine in person.
update @ 13:22PDT: i'm in the colo now and the machine seems to have recovered after a lengthy disk repair task. it's extremely hot in here and i'm concerned that it's the heat in the cabinet that is causing problems with the hardware. the colo operators are working on bringing up another cooling unit as the place is definitely too warm for these machines. i'm leaving the server alone for now. if it crashes again i will take the disks out and bring them home and move all the data to a new machine. stay tuned.
update @ 14:33PDT: well that was wishful thinking. as soon as i got home the server went and died again. i will attempt a remote restart but i will not be going back to the colo until tomorrow, and then i'm not sure how long it will be before the www stuff is up again. note that the major www sites such as rideontwo.com are not at all affected by this outage.
update @ 09:57PDT on Monday the 14th: i think the colo facility's temperature has dropped significantly and this server has been able to recover for the time being, which points to an almost certain heat-related issue with the hardware. i'm still planning on moving the data to a new machine but maybe it doesn't need to be so urgent.