This forum is now closed; only this static archive remains available.
  FORUM Matbe.com
  OS, Software, Networks
  Other

  Problem: Opteron 248 + 8 GB RAM + Caché database

n°45734
Xam
I want to be reincarnated as a cat
Posted on 06-07-2006 at 16:22:33
 

So, we have a headache at work.

Config: dual Opteron 248, 8 GB of PC3200 RAM.

The machine has been in production for 15 days.
Memtest+ (64-bit): OK

Since going into production, it has already crashed 3 times.
Our supplier first asked us to check the RAM...
-> OK.

Now they seem to be leaning toward... well, I'll let you read on.
 
 
Our problem:


Quote:

Hi,
The new server from ** with the new Caché database environment was installed on the 18th of June.
The first system hang was on the 22nd of June. Afterwards it ran well for over 9 days, until the second hang on the 5th of July.
I never said that Caché is causing the problem, but to find the cause as quickly as possible we have raised both hardware and software issues.
After the first hang (22.06) we were advised to update the kernel to the latest version, which we did.
After the second hang, once the system had been rebooted and had been running for some hours, we saw new processes generating the 'Unable to handle kernel paging request' error in the messages log. They were caused by Caché processes that were blocked while opening the printing device. The users, thinking the sessions were hanging, had closed them with the x-button, but the processes remained visible in the control panel and gave a <NOJOB> error when the 'details' or 'terminate' option was pressed.
Five jobs had this trouble, though at different times and toward different printing devices.
As I understand Linux, opening the printing device should normally not cause any problem, since the job is first transferred to the local queue and CUPS then handles the transfer to the print servers. In Caché, our applications even have a timeout in the library that handles the opening of printing devices.
 
Actions taken:
- On the 5th of July we tested the RAM for 15 minutes, which did not reveal any problems.
- Yesterday we told our users to stop closing SSH windows with the x-button and instead to inform the IT department when they seem to be stuck in an application.
- Today, from 12:00 until 13:00, we ran a second RAM test, which did not reveal any problems.
- We also changed the memory clock rate from 200 to 166 MHz, as advised by ttec in a mail of the 26th of June.

- Each Thursday for the coming weeks, the server will be shut down from 12:00 until 12:15/13:00 at the latest to do some testing and to refresh the machine. This way we will see whether the problem also occurs over short running periods.

For the moment the machine has not died again.
Since it is the top sales season, we can only stop the servers each Thursday to do some testing. For the moment we are up and running again and will see whether the problem returns with the measures taken.
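
For anyone wanting to track how often the kernel error from the mail shows up, here is a minimal Python sketch. The error text is quoted from the mail above. The function names and the `/var/log/messages` path mentioned below are assumptions for illustration, not something the mail specifies.

```python
from typing import Iterable, List, Tuple

# Error text as quoted in the mail; everything else here is illustrative.
PAGING_ERROR = "Unable to handle kernel paging request"


def find_paging_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every log line containing the error text."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PAGING_ERROR in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan a log file at `path` (hypothetical helper, path chosen by the caller)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return find_paging_errors(log)
```

Calling `scan_log("/var/log/messages")` (assuming that is where the messages log lives on this machine) returns the matching lines with their line numbers, which can then be compared against the times of the hangs.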


 
 
Their reply:


Quote:

Well, we also don't think that the RAM is faulty! We just have reason to believe
that the system may have a problem with all RAM-slots being occupied and a
set frequency of 400 MHz. If I understood my engineers correctly this could be
in connection with a design flaw as far as Opteron systems ( in general! ) are
concerned.
A clock rate of 333 MHz would be an appropriate workaround here.


 
 
Personally, I think this comes from the Caché database, not from the hardware...

Anyone have an idea?
Especially about the last sentence: this could be in connection with a design flaw as far as Opteron systems ( in general! ) are concerned.


---------------
Eater of chicons and drinker of (good) beers.
The church is a sect that made it. And like all sects, it loves money.
The rule of the powerful rests on the disinformation of the masses
 

n°45746
mustang21
yes indeed, boss!
Posted on 06-07-2006 at 18:28:47


If you give me your work config, I'll give you a complete 2.4 GHz P4 that runs perfectly, though it might be a bit tight on RAM (1 GB)
 
:p

n°45747
apache02
Posted on 06-07-2006 at 18:38:44


Xam wrote:

Surtout sur la dernière phrase: this could be in connection with a design flaw as far as Opteron systems ( in general! ) are concerned.


 
It's true that this kind of reputation tends to stick to non-Intel systems. Given the relatively small number of AMD-based servers (compared to Xeon-based solutions), a bug gets spotted, and above all fixed, less quickly.

Where I work, if you're unlucky enough to put the words "server" and "AMD" together, the next word is "out" ;)

Intel still keeps a lead in the world of "low cost" servers.

That said, I don't know whether the sales rep who wrote that to you realizes he's sticking a knife into his own right arm ;)


Message edited by apache02 on 06-07-2006 at 18:43:54
n°45749
Orphen
Talk to the hand!
Posted on 06-07-2006 at 19:03:20


No big deal, he can still move his fingers, even if it hurts :jap:

n°46087
Ashe
reenignE esreveR
Posted on 22-07-2006 at 01:59:32


Weee, showing up 2 years after the war :sol:
What's the OS?
(otherwise, whoever wrote the first quote needs to go back to English lessons :D)


---------------
pcx360 | Binary Genetics | Dreaming Prophet
“Entropy isn’t what it used to be.”
n°46092
Naunaud128
Crawling Up A Hill
Posted on 22-07-2006 at 11:59:52


And passing memtest doesn't mean the RAM is stable under the OS :)


---------------
Let the changes in