Graphics card dying?

Hello,

I am asking here as a last resort about a problem I have found no solution to so far.

First: I run Windows XP SP2 on a Dell Latitude D830 (laptop), whose warranty expired a few months ago, of course. My graphics card is an Nvidia Quadro NVS 135M. The processor is an Intel Core 2 Duo T7500 or T7700, I don't remember which.

Now, the problem: when I start my computer in normal mode, at the moment when Windows should display the "welcome screen" (and, at the same time, activate the graphics card and its driver), I get a beautiful random screen made of strips of different colors that look very "computerized", and nothing else. If I leave it like that for too long, the system sometimes stops, as if it had noticed there is a problem.

At first the problem took some time to appear (5-20 minutes of normal operation, then gradual malfunctioning until it reached the complete graphics bug described above), then it became instantaneous. I then opened the laptop and removed the huge pile of dust from the fan: the laptop worked again for a few sessions, totalling several hours. But then the bug came back, and neither updating the graphics card driver nor reopening the laptop to check the thermal-cooling assembly and the heat sink compound (thermal paste) could fix it. Now my computer cannot start in normal mode, no matter what weird ideas I try to get around it.

Now, the other modes: in safe mode, the computer works completely normally. In VGA mode, after 1 second of coloured stripes I get a black screen, but Windows keeps starting up without noticing that the screen no longer responds.

About the driver updates: first, I tried the only update available for my graphics card on the Dell site. I installed it just before cleaning the dust out of my fan. The update corrected a minor malfunction of the card (horizontal lines while playing videos with fast-changing images), but it did not stop the bug from coming back after a while. Then, in safe mode, I installed the update available on the Nvidia web site (the "Verde" driver, which was not on the Dell site), but there was still no way to start normally or in VGA mode. Then I updated the chipset driver, the BIOS, and some "netbook software" marked as "urgent" by Dell. None of it helped.

What I suspect is some kind of "degenerative disease" of some component of the graphics card, that would be exacerbated by overheating. My computer probably overheated slightly over a long period of time while the fan was almost blocked by dust, and this overheating may have started the problem. But now it looks like the maximum temperature the graphics card tolerates is getting lower and lower. Or maybe it has come to a point where the failure is no longer triggered by temperature. I cannot know for certain, since I have not yet run my computer in abnormally cold conditions.

There are two possible outcomes to this situation: either the graphics card is already beyond any repair or workaround, and I just have to wait for my first income to buy another one (which I would really like to avoid: I am in a "hole with no income" right now, and it's been 3 months), or I can still do something with it, downclock the GPU or whatever, and still get some use out of that old thing. The problem is, I found no way to downclock the GPU or lower any parameter of the graphics card, since every NVIDIA utility is disabled in safe mode, because the graphics card itself is disabled in safe mode! This is why I tried VGA mode, but it does not work, which means the problem really comes from using the GPU at all, whatever the driver may be, and something in it is weak and maybe already broken.

So if anyone could help, or tell me about any similar story (and its final cause: too powerful a graphics card for a small laptop? A compatibility problem? Special care by a manufacturer to make his computers die completely so that we consumers buy new ones? Just bad luck, and I should have cleaned my fan earlier?), it would help me out a lot: I am at a dead end right now...

Thanks!

Quote
What I suspect is some kind of "degenerative disease" of some component of the graphics card, that would be exacerbated by overheating.


That sums it up. You tried drivers and they did not really resolve the issue. Dust and heat can cause permanent damage. Or not.

Here is something you can try. Put the laptop in your refrigerator for an hour. Cover it so that moisture will not enter. DO NOT let the laptop get down to freezing. About 50 F or 10 C is enough. Now turn the laptop back on and see if there is a noticeable improvement in normal boot mode. If so, then yes, it is an overheating problem.
The logic to this is that it will take longer for the GPU to get to the critical temperature where it fails. This is not a cure. Your GPU has a thing going on internally called 'thermal runaway', which happens after damage is done to the silicon. If it were a desktop, you could try extra cooling, but that is not easy to do on a laptop.
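
To put rough numbers on that logic, here is a minimal Python sketch. It assumes the GPU warms up along a simple first-order curve toward a steady-state temperature. The function name `time_to_critical` and every number in it (failure temperature, steady-state temperature, time constant) are made up for illustration, not measured values for the NVS 135M.

```python
import math


def time_to_critical(t_start_c, t_crit_c, t_steady_c, tau_s):
    """Seconds until a first-order heating curve crosses t_crit_c.

    Assumed model: T(t) = t_steady_c - (t_steady_c - t_start_c) * exp(-t / tau_s)
    """
    if t_start_c >= t_crit_c:
        return 0.0  # already at or above the failure temperature
    if t_steady_c <= t_crit_c:
        return math.inf  # the chip never gets hot enough to fail
    return tau_s * math.log((t_steady_c - t_start_c) / (t_steady_c - t_crit_c))


# Hypothetical values: fails at 70 C, settles at 90 C, 120 s time constant.
room = time_to_critical(25.0, 70.0, 90.0, 120.0)    # normal start at 25 C
fridge = time_to_critical(10.0, 70.0, 90.0, 120.0)  # chilled start at 10 C
print(f"room start:   {room:.0f} s")
print(f"fridge start: {fridge:.0f} s")
```

With these invented values the chilled start only buys a little extra time before the failure temperature is reached, which fits the point that this is a test, not a cure.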

Here is a video. This guy says it is easy. I don't believe it. But see for yourself.
How to remove and install a laptop graphics card. (Part 1)

Yes... Someone on the chat suggested that I should also use a better paste than the one I used when replacing the thermal cooling assembly (a silicone paste, white in colour: would Arctic paste be better?). I will do both at once (the paste and the fridge), and yes, maybe it will start up better. What I do not understand yet is how a GPU can be irreversibly damaged but still work sometimes. Is it a matter of the distance between conductors, which allows tunneling at higher temperatures but not at lower ones, with the maximum temperature getting lower and lower?
So it seems this laptop is beyond any durable workaround. It will teach me 2 lessons: first, get 5 years of warranty, not 3, and second, clean your fan more often!

Thanks for the reply, it puts words on it!

And by the way: I cannot replace only my graphics card, as it is integrated with the motherboard, on the same board! And the GPU chip cannot be taken off my motherboard from what I saw... maybe I'm wrong...

A Dell Latitude D830 motherboard is about $200 as a used or refurbished item. They have a limited warranty.

The CPU has built-in protection to prevent thermal runaway. Plus it has a fan. And you can replace the CPU.

The graphics processor should not generate as much heat. Dust will shorten its life. Dust impedes air flow. Too bad they don't put fans on laptop GPUs.

Maybe soon we will have mobile devices with better thermal control. Are you interested in thermal runaway? Some of the research gets kind of deep. Here is a sample of what they want to do:

Quote
Abstract— Preventing silicon chips from negative, even disastrous thermal hazards has become increasingly challenging these days; considering thermal effects early in the design cycle is thus required. To achieve this, an accurate yet fast temperature model together with an early-stage, thermally optimized, design flow are needed. In this paper, we present an improved block-based compact thermal model (HotSpot 4.0) that automatically achieves good accuracy even under extreme conditions. The model has been extensively validated with detailed finite-element thermal simulation tools. We also show that properly modeling package components and applying the right boundary conditions are crucial to making full-chip thermal models like HotSpot accurately resemble what happens in the real world. Ignoring or over-simplifying package components can lead to inaccurate temperature estimations and potential thermal hazards that are costly to fix in later design stages. Such a full-chip and package thermal model can then be incorporated into a thermally optimized design flow where it acts as an efficient communication medium among computer architects, circuit designers and package designers in early microprocessor design stages, to achieve early and accurate design decisions and also faster design convergence. For example, the temperature-leakage interaction can be readily analyzed within such a design flow to predict potential thermal hazards such as thermal runaway. An example SoC design illustrates the importance of adopting such a thermally optimized design flow in early design stages.
..
http://citeseerx.ist.psu.edu/
Pretty heavy stuff.
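
For what it's worth, the "temperature-leakage interaction" in that abstract can be sketched in a few lines of Python. This is only a toy lumped model, not the HotSpot model from the paper: leakage power is assumed to grow exponentially with temperature, and the chip temperature is taken as ambient temperature plus thermal resistance times total power. The function name `settle_temperature` and all the parameter values are assumptions chosen for illustration.

```python
import math


def settle_temperature(p_dynamic_w, p_leak_ref_w, t_ambient_c=25.0,
                       r_thermal_c_per_w=3.0, leak_coeff_per_c=0.03,
                       t_ref_c=25.0, t_limit_c=150.0, max_steps=1000):
    """Iterate T -> T_ambient + R * (P_dynamic + P_leak(T)) until it settles.

    Returns the settled temperature in C, or None if the temperature
    climbs past t_limit_c (treated here as thermal runaway).
    """
    temp = t_ambient_c
    for _ in range(max_steps):
        # Assumed leakage law: exponential growth with temperature.
        p_leak = p_leak_ref_w * math.exp(leak_coeff_per_c * (temp - t_ref_c))
        new_temp = t_ambient_c + r_thermal_c_per_w * (p_dynamic_w + p_leak)
        if new_temp > t_limit_c:
            return None  # heat and leakage keep feeding each other
        if abs(new_temp - temp) < 1e-6:
            return new_temp  # heat removed balances heat generated
        temp = new_temp
    return temp


print(settle_temperature(10.0, 1.0))  # small leakage: settles at a finite temperature
print(settle_temperature(10.0, 5.0))  # large leakage: None, the feedback loop runs away
```

The only difference between the two calls is the leakage at the reference temperature. Once leakage is large enough, no temperature exists at which cooling keeps up, which is the hazard the abstract calls thermal runaway.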
EDIT: That link is hard to find. Here is another that sounds like your problem. Notice they focus on the CPU and ignore the other overheating issues, such as the GPU overheating.
http://www.howtofixcomputers.com/forums/dell/dell-inspiron-1525-overheating-296811.html

About motherboard replacement, I came to the same conclusion: it costs far less than a new laptop, and if I use good thermal paste and clean the fan regularly, my laptop should last this time.

The problem is: I am afraid of compatibility issues. Like:
- Will my processor (T7500 Core 2 Duo) fit on every D830 motherboard?
- Will the GPU always be aligned with the cooling assembly I have?
- Will it always support my WSXGA+ screen (1680x1050)?

None of the links I saw said anything about such compatibility... And I cannot get Dell to tell me the Dell part number of the motherboard I already have (that way I could be sure). Is there a spot on the motherboard where this part number might be written? Or some hardware-recognition utility on the web?

Once I am sure, I will buy the motherboard and the thermal paste, and then pray to whatever computer god that all my guesses were right...

BTW thanks for the link :-)

In my opinion you are throwing money away resuscitating a laptop for any reason and looking down the road for things to get better or stay the same is wishful thinking at best...

Others here may have differing opinions however...

I have thought about that. Although this laptop is old, it has a lot of advantages that modern "cheap" laptops lack:
- a good screen with good resolution and good refresh time, very clear: you can see everything _ really everything _ no matter what angle you are looking at it from;
- decent computing power;
- the GPU, when working, can cope with 60 fps HD videos and works well with Cinema 4D and other stuff;
- the system is still XP and works well (I have some scientific computing software that would not work on Windows 7);
- I recently purchased a second battery;
- and... I won't have enough money to buy a good new one for another 2 months, because of a 3-month hole with no income at all... and this computer is also kind of my workstation (I have a computer at work, but a PC: I like taking my data with me, like on the train as I travel every weekend, or at home, and I have no PC at home!)

And compared to all that, now that I understand the problem that killed this laptop, for about $200 I could refurbish it... if I find something trustworthy, why not!

EDIT: after running the lshw command from a live CD I have (a pretty useful command, this is), I knew everything I needed to know. From my motherboard part number (OHN338: apparently Google too is hearing it almost for the first time, and has never heard of resellers), to the true name of the enemy inside my PC: the G86M chip. Browsing about it, I found a lot of complaints about overheating, even someone who had his motherboard changed 5 times by Dell because of that (he had warranty, of course), until I stumbled upon this:
story
I don't know how reliable this is, but it might explain a lot... about Precision laptops failing the same way, etc... or maybe it is just that today's laptops always want too much graphics performance for the cooling and thermal stress they can handle... So maybe you're right about that motherboard stuff, maybe it's still too much money for what it is... I will keep searching just in case, and just to try, but this is quite embarrassing... I did not want to buy a cheap computer just to bridge the gap between today and 4 months from now, when I will have all the money needed to buy a truly good laptop... But *censored* if I can live with no PC for all this time!

Thanks for sharing your research. The reference implies that you will have the same issue again with a replacement motherboard. It seems that the designers did not pay enough attention to thermal design.
Here is a partial quote of the three-year-old article:

Quote
All Nvidia G84 and G86s are bad
Comment No word on MCPs yet
By Charlie Demerjian
Wed Jul 09 2008, 18:43


THE BURNING QUESTION on everyone's mind is what Nvidia parts are failing in the field? No GT200 jokes here, NV personnel are still quite sensitive about that, but our moles have told us about the bum GPUs.

The short story is that all the G84 and G86 parts are bad. Period. No exceptions. All of them, mobile and desktop, use the exact same ASIC, so expect them to go south in inordinate numbers as well. There are caveats however, and we will detail those in a bit.

Both of these ASICs have a rather terminal problem with unnamed substrate or bumping material, and it is heat related. If you ask Nvidia officially, you will get no reason why this happened, and no list of parts affected, we tried. Unofficially, they will blame everyone under the sun, and trash their suppliers in very colourful language.   ...

http://www.theinquirer.net/inquirer/news/1028703/all-nvidia-g84-g86s-bad

Quote from: patio on October 27, 2011, 02:51:53 PM
In my opinion you are throwing money away resuscitating a laptop for any reason and looking down the road for things to get better or stay the same is wishful thinking at best...

Others here may have differing opinions however...
If you can fix it for less than $100, then OK; otherwise, I agree with patio, new laptops have never been cheaper.

