I want to log GPU temperatures to a file, probably once a second. What's the best way to handle it?
- open the file, keep it open while logging, close after logging is finished
- open, write that second's temp, close - every second
- fill a memory buffer then open/create, write, close - every buffer
There is the possibility of overheating and Windows freezing (that's what this is for, to see if the GPU or CPU is too hot).
For that reason, I am looking at the second option, but is that too much overhead? A game will be thrashing everything.
Opinions sought.
Writing and closing immediately seems the most usable option, and it can be fast.
A DLL with shared memory, which avoids disk access, would also be usable.
>Writing and closing immediately seems the most usable option, and it can be fast.
That's what I am leaning towards.
>A DLL with shared memory, which avoids disk access, would also be usable.
Because Windows may freeze, it needs to be a file I can look at later.
If overhead with the logger is a factor, I would be inclined to keep an app open, log results into a buffer, then lazy-write the buffer content to disk at a convenient interval. If the app itself is at risk of locking up, I would use a remote app and send data to it.
Hi,
I would buffer a minute's worth (or 5 minutes) and
then write it out to minimize any effects on the system.
If worried about freezing, give each ~10 minute interval
a separate file.
Regards,
Steve N.
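One way to read the separate-file suggestion above is to derive the log file name from the current 10-minute interval, so a freeze can only damage the newest file. This is a minimal sketch in Python rather than assembly, purely for illustration; the `gputemp` prefix and the name format are assumptions, not anything Steve specified.

```python
import time

def interval_filename(epoch_seconds, minutes=10, prefix="gputemp"):
    """Return a log file name shared by all samples in the same interval.

    prefix and the timestamp layout are hypothetical choices.
    """
    span = minutes * 60
    # Round the time down to the start of its interval
    bucket = int(epoch_seconds) // span * span
    stamp = time.strftime("%Y%m%d_%H%M", time.gmtime(bucket))
    return f"{prefix}_{stamp}.log"

# Samples taken within the same 10 minutes map to the same file
current_name = interval_filename(time.time())
```

Each buffered write would then go to `interval_filename(time.time())`, so a new file starts automatically every ten minutes.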
Open the file with FILE_FLAG_WRITE_THROUGH, keep the file open;
Fill up your own data buffer and flush when full;
Adjust the 'full' threshold according to the current CPU temperature (lazy when low temp, crazy when high temp);
Close the file when finished (or a freeze and reboot will have the same effect.)
No repeated open-close, and no loss of data :greenclp:
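The lazy/crazy idea above could be sketched roughly like this. It is Python rather than assembly, and it makes several assumptions: `os.fsync` after an unbuffered write stands in for FILE_FLAG_WRITE_THROUGH (which Python does not expose directly), and the class name, the 80 degree trip point and the buffer limits are all invented for illustration.

```python
import os
import tempfile

class AdaptiveLogger:
    """Keep the file open and flush sooner as the temperature rises."""

    def __init__(self, path, cool_limit=16, hot_limit=1, hot_temp=80):
        # buffering=0 (binary mode only) hands every write straight to the OS
        self.f = open(path, "ab", buffering=0)
        self.pending = []
        self.cool_limit = cool_limit  # lazy: entries to collect when cool
        self.hot_limit = hot_limit    # crazy: entries to collect when hot
        self.hot_temp = hot_temp      # assumed trip point in degrees C

    def threshold(self, temp_c):
        return self.hot_limit if temp_c >= self.hot_temp else self.cool_limit

    def log(self, temp_c):
        self.pending.append(f"{temp_c}\n".encode("ascii"))
        if len(self.pending) >= self.threshold(temp_c):
            self.flush()

    def flush(self):
        if self.pending:
            self.f.write(b"".join(self.pending))
            os.fsync(self.f.fileno())  # ask the OS to push the data to disk
            self.pending.clear()

    def close(self):
        self.flush()
        self.f.close()

# Demo with made-up readings: three cool samples stay in memory,
# the hot one forces everything out at once.
path = os.path.join(tempfile.mkdtemp(), "gpu.log")
log = AdaptiveLogger(path, cool_limit=4)
for t in (50, 51, 52):
    log.log(t)
size_cool = os.path.getsize(path)
log.log(85)
size_hot = os.path.getsize(path)
log.close()
```

Whether the file survives a hard freeze still depends on the file system, so this only shows the buffering logic, not a guarantee.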
MSI Afterburner (http://event.msi.com/vga/afterburner/download.htm) v2.2.4
(http://s14.postimage.org/p2fl7iia5/msiafterburner.jpg) (http://postimage.org/image/p2fl7iia5/)
has an option to monitor and log to file, and can display GPU temp whilst a game is running (along with other stuff - I usually just have FPS & Time displayed alongside GPU temp)
Works for both NVIDIA and ATI cards.
For CPU temps I use CoreTemp to keep an eye on them - http://www.alcpu.com/CoreTemp/
Does it matter?
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  .Repeat
    NanoTimer()                         ; start the timer
    Open "A", #1, "test.tmp"            ; open file in Append mode
    Print #1, Time$, CrLf$              ; write the current time plus a line break
    Close #1
    ; Print Str$("Writing took %i µs\n", NanoTimer(µs))
    Print Str$("Writing took %i ms\n", NanoTimer(ms))
    invoke Sleep, 500
    invoke GetKeyState, VK_SHIFT
  .Until sword ptr ax<0                 ; loop until Shift is pressed
  Inkey "bye"
  Exit
end start
Output:
Writing took 18 ms
Writing took 10 ms
Writing took 8 ms
Writing took 5 ms
Writing took 19 ms
Writing took 9 ms
Writing took 9 ms
bye
the time consumed is not as much an issue as having the most recent data available for forensics
i like Tedd's method :P
Tedd, so if Windows freezes the file is still OK? The main problem is not the game freezing but Windows itself, with no time to close the file.
I like the idea of lazy/crazy writes too 8)
Sinsi,
Keep it simple. I did the above test on my fast office machine, now again on my slow Celeron, and surprise, opening, writing and closing is done in less than a millisecond.
Even if it takes ten, and you do it once per minute, that slows down your game by 0.02%. I guess the player could live with that ;-)
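For anyone who wants to check the figures above without MasmBasic, here is a rough Python equivalent of the timing test plus the overhead arithmetic. The function names are made up for this sketch, and the timings will obviously differ from machine to machine.

```python
import os
import tempfile
import time

def timed_append(path, line):
    """Open in append mode, write one line, close; return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "a") as f:
        f.write(line)
    return time.perf_counter() - start

def overhead_percent(write_seconds, interval_seconds):
    """Share of wall-clock time spent writing, as a percentage."""
    return 100.0 * write_seconds / interval_seconds

path = os.path.join(tempfile.mkdtemp(), "test.tmp")
samples = [timed_append(path, time.strftime("%H:%M:%S") + "\n") for _ in range(5)]
print(f"worst write: {max(samples) * 1000:.3f} ms")

# 10 ms of writing once per 60 s interval
print(f"10 ms per minute: {overhead_percent(0.010, 60):.3f} %")
```

Ten milliseconds out of sixty seconds works out to about 0.017 %, which rounds to the 0.02% quoted above. At one write per second the same 10 ms would be 1 %.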
I didn't think there would be much, if any, overhead; I am from the old school of "if you open a handle you close it".
I will try Tedd's way and deliberately kill Windows (reset button should do it) and see about the logfile.
If Windows can't close it then it should show up as a lost cluster (by my way of thinking).
The file size should be updated with each flush - which would be every write if there's no buffering - so the file should be okay as long as writes get to finish.
Closing is more of a formality, and to ensure the data has been flushed; the file still needs to be in a consistent state as much as possible, since there is always the possibility of a crash.
You will get inconsistencies if the write gets interrupted before it finishes, but there's no way around that whatever method you use.
I still basically like the idea of using another app to perform the file IO so that even if the source app crashes the file data is not lost or left incomplete. Sending a HWND_BROADCAST is hardly a problem when you are talking about 1 second intervals. The slave app could also keep track of the main app sending the message to identify if it was still running or not.
CoreTemp does not support Pentium IV processors.
Sinsi,
how much data do you need to pass to the log file each interval? Is it 1 or 2 DWORD values, or does it need to be a more complex form of data?
Not too much, just the basics
;right now
gputemp dd ?
fanrpm dd ?
;later
cputemp dd cores dup (?)
I think the core temps might use privileged instructions, is rdmsr one?
I might put the time too, once I figure time functions out...
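A fixed-size record along those lines could look like the following sketch. It is Python rather than MASM, and the field order, the 32-bit timestamp and the count of six cores are assumptions made only so the example is concrete.

```python
import struct
import time

CORES = 6  # assumption: stands in for the "cores" constant

# Little-endian DWORDs: time, gputemp, fanrpm, then one cputemp per core
RECORD = struct.Struct(f"<III{CORES}I")

def pack_record(timestamp, gputemp, fanrpm, cputemps):
    """Pack one sample into a fixed-size binary record."""
    return RECORD.pack(timestamp, gputemp, fanrpm, *cputemps)

def unpack_record(data):
    """Return (timestamp, gputemp, fanrpm, [cputemps]) from one record."""
    fields = RECORD.unpack(data)
    return fields[0], fields[1], fields[2], list(fields[3:])

# Made-up readings, just to show the round trip
rec = pack_record(int(time.time()), 62, 2100, [55, 54, 56, 55, 53, 54])
stamp, gpu, fan, cpus = unpack_record(rec)
```

With three header fields and six cores each record is 36 bytes, so a whole day at one sample per second stays around 3 MB.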
I have sketched up a basic logger that registers a custom Windows message and creates a memory mapped file. This will allow any sized data to be passed. What I need to know is what data format you want to pass: numbers, strings, or any combination of both in a structure. I have made the memory mapped file 64k but it can be any size, so passing large amounts of data is no big deal.
RDMSR is definitely a privileged instruction
Hi Dave,
Quote from: dedndave on November 10, 2012, 11:04:06 PM
RDMSR is definitely a privileged instruction
Yes, it is. There are Linux tools available to use that instruction: http://linux.koolsolutions.com/2009/09/19/howto-using-cpu-msr-tools-rdmsrwrmsr-in-debian-linux/ But take care.
Gunther
Agner Fog has a 32/64 driver set that can be used...
http://www.agner.org/optimize/#testp
you want testp.zip - inside that, you'll find DriverSrcWin.zip
it includes the drivers and a .H file that is pretty simple to convert to .INC
i also found this link
which, after reading it, leads me to believe it isn't a simple task to get the temp
(referring to the reply by uvts_cvs dated Jul 24, 2011)
but, it may be misinformation - lol
http://stackoverflow.com/questions/5327203/how-to-access-cpus-heat-sensors
Sinsi,
This is a scruffy example of the suggestion I made to use a remote app to log the data from your test app.
It comes as 2 test pieces: one sends the data, and the other is a remote application that receives it. The data is passed through a memory mapped file, and a custom message sent with the HWND_BROADCAST handle tells the receiver it is there.
It only displays the sent data in 4 message boxes but the idea works and it would be easily fast enough to do what you want.
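The data handoff part of that idea can be sketched in a few lines. This is Python rather than the MASM test pieces, and it only covers the shared mapping: a temp file stands in for the shared memory, the two-DWORD record layout is an assumption, and the custom broadcast message that tells the receiver to look is left out entirely.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<II")  # assumed layout: gputemp, fanrpm
MAP_SIZE = 64 * 1024           # 64k mapping, the size mentioned in the thread

def create_backing_file(path):
    """Create a file of MAP_SIZE bytes to back the mapping."""
    with open(path, "wb") as f:
        f.truncate(MAP_SIZE)

def send(path, gputemp, fanrpm):
    """Sender side: write one record at the start of the mapping."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), MAP_SIZE) as view:
        RECORD.pack_into(view, 0, gputemp, fanrpm)

def receive(path):
    """Receiver side: read the record back out of the mapping."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), MAP_SIZE) as view:
        return RECORD.unpack_from(view, 0)

# Both sides run in one process here just to show the round trip
path = os.path.join(tempfile.mkdtemp(), "shared.map")
create_backing_file(path)
send(path, 62, 2100)
received = receive(path)
```

In the real two-app setup the receiver would do its `receive` step when the notification message arrives, then append the values to the log file.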
You can't think of this problem in human terms: a second for us is fast; for a computer it is dead slow.
I like Hutch's 'push/pull' method; it is very efficient.
Hi everybody,
Paul
Why use a cannon to kill a fly?
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=94.0)
  Init
  Kill "test.tmp"
  mov ebx, 100
  .Repeat
    Print "*"
    Open "A", #1, "test.tmp"            ; open file in Append mode
    mov eax, 1000
    cdq
    div ebx
    Print #1, Str$(eax), " "            ; write 1000/ebx, will crash for ebx=0
    Close
    dec ebx
  .Until Sign?
  Inkey "You will never see this, haha"
  Exit
end start
Last entries in test.tmp: 333 500 1000
@Paul: Welcome back, nice to see you!
IMHO the fastest scheme would be to map the file into memory, possibly using WRITE_COPY so that errors can be fixed up when they occur (a sort of backup of the file), and once everything has gone fine, just write the image to disk ;) (I'm not an expert, just trying to help, don't flame me too much... :P)
:biggrin:
JJ,
Nuking flies is good practice, but in the case sinsi described, where he wanted to log data right up to the end with some risk of the source crashing, the basic idea was a remote app doing the logging so that it stays up even if the source app goes down. A memory mapped file and a SendMessage using the HWND_BROADCAST handle is simple enough to code, and it is genuinely fast, which may help if a higher sampling rate is required.
Quote from: hutch-- on November 17, 2012, 09:55:38 AM
... if a higher sampling rate is required.
Hutch, he wants to monitor CPU temperature. Once a minute would be generous :biggrin:
On my slow old Celeron notebook, the open, write & close sequence takes less than 0.4 milliseconds...
It depends on the task. I would happily write directly to disk in most instances, but the task as defined carries a risk of the source crashing, which is why it's done as a remote logger and not a direct process. I think he was after logging at about 1 second intervals; the memory mapped file and SendMessage will handle much faster rates than that, but the big win is that it will handle much larger amounts of data each interval than file IO will.
the temp probably doesn't change all that fast
i would think once every 5 or 10 seconds would be ample :P
he seems to only want to write a few bytes each time - keep it simple
The GPU temps can go from 52 to 62 in 2-3 seconds, this is normal for the card (GTX 580).
What NVidia won't say is the upper limit, so 80C might be OK on one system but overheat on another.
It looks like, in this case, the GPU isn't the problem. I used the motherboard program to check CPU temps by alt-tabbing, and as far as I can tell the CPU gets up to 90C. Six cores working flat out gives off a lot of heat.
AMD CPUs are notorious for running hot then freezing, Intel seem to slow down until the system shuts off.
Having both fans (CPU and case) running high from start keeps it to 85C max, but is bloody noisy.
The method I finally used was open>setfilepointer>write>close every 500ms.
A couple of times, using Tedd's way, chkdsk found free space marked as used and the logfile was 0KB.
Got some good ideas from this discussion, so thanks, you blokes :t
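For reference, the open > setfilepointer > write > close sequence can be sketched like this. It is Python rather than the original assembly, `os.lseek` stands in for SetFilePointer, and the `read_temp` callback, the CSV line format and the fake readings are all invented for the example.

```python
import os
import tempfile
import time

def append_sample(path, text):
    """open > seek to end > write > close, once per sample."""
    # O_BINARY only exists on Windows; it avoids newline translation there
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        os.lseek(fd, 0, os.SEEK_END)  # move to the end of the file
        os.write(fd, text.encode("ascii"))
    finally:
        os.close(fd)                  # handle is closed even if the write fails

def run_logger(path, read_temp, samples, interval=0.5):
    """Log 'HH:MM:SS,temp' lines, one every interval seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        append_sample(path, f"{stamp},{read_temp()}\n")
        time.sleep(interval)

# Demo with fake readings and a short interval so it finishes quickly
readings = iter([52, 57, 62])
path = os.path.join(tempfile.mkdtemp(), "gpu.log")
run_logger(path, lambda: next(readings), samples=3, interval=0.01)
```

Because the file is closed after every sample, the worst case after a freeze is losing the one write that was in flight.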
Quote from: sinsi on November 17, 2012, 05:01:16 PM
The method I finally used was open>setfilepointer>write>close every 500ms.
Wise decision ;)
Quote from: sinsi on November 17, 2012, 05:01:16 PM
The GPU temps can go from 52 to 62 in 2-3 seconds, this is normal for the card (GTX 580).
What NVidia won't say is the upper limit, so 80C might be OK on one system but overheat on another.
It's 97 celsius... just below the boiling point of water.
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-580/specifications
In my experience NVidia chips are furnaces. They naturally run very hot, heat up very quickly and cool rather slowly.
For the AMD temps ... 60C is the recommended max on most of their chips. 90C can kill it.
http://products.amd.com/pages/desktopcpudetail.aspx?id=34&AspxAutoDetectCookieSupport=1
There are several things you can do to improve the temperature problem...
1) When it's warm, shut it down and immediately (as in within a few seconds) remove the heat sink... carefully clean away all that silly rubber crap from the cooler and the chip, let it cool, and redo the thermal junction with a thin layer of this stuff (http://www.stanleysupplyservices.com/product-detail.aspx?pn=441-302) (one stick does about 150 chips). The Thermstrate will take a couple of days to reach maximum effectiveness, but it works wonders. My dual core CPU runs about 5 to 10C above ambient with the stock cooler.
2) Make sure the bottom of the heat sink is flat! I use a "draw file" technique to clean and level the bottoms of heatsinks on systems that overheat. Some people will lay down 100 grit sandpaper on a flat surface and run the heat sink back and forth, to get the same effect... turn often and stop when the entire surface is evenly patterned. Clean it gently with steel wool to remove fine particles and then do step 1.
Note: DO NOT polish it! Leave it with the scratches... the scratching holds thermal compound in place.
3) Enable "Cool and Quiet" on the CPU. This is load driven throttling. The CPU runs at about 40% speed until it needs to go faster. This will bring your temperatures down considerably.
4) Go aftermarket for your cooler... If you are overclocking AMD chips you have no choice but to go with liquid cooling...
Like This (http://www.newegg.com/Product/Product.aspx?Item=N82E16835209054) ... or better.
Hope that helps....
it would have been helpful if they had put a bit of heat sink on there ::)
(http://techgage.com/reviews/nvidia/geforce_gtx_580/nvidia_geforce_gtx_580_01_thumb.jpg)
at any rate, going from 52 to 62 C in 2 or 3 seconds is a bit surprising
it takes a certain amount of energy to change the temperature of a certain mass by some delta
Quote from: dedndave on November 18, 2012, 12:50:07 AM
it would have been helpful if they had put a bit of heat sink on there ::)
(http://techgage.com/reviews/nvidia/geforce_gtx_580/nvidia_geforce_gtx_580_01_thumb.jpg)
at any rate, going from 52 to 62 C in 2 or 3 seconds is a bit surprising
it takes a certain amount of energy to change the temperature of a certain mass by some delta
That's an OEM board, intended to be sold to manufacturers who would then mount their own brand name covers and heatsinks and resell the board.
I hope you didn't power it up like that... damage can occur in just a few seconds...
crappy engineering, then
i would never put a product on the market that will fry itself - lol
Quote from: dedndave on November 18, 2012, 01:04:12 AM
crappy engineering, then
i would never put a product on the market that will fry itself - lol
I think you're missing the point, Dave...
That board, without a heat sink and cooling covers, was never intended for retail sale.
It's a wholesale item intended to be sold to other computer companies, not end users.
Ok, you're "Joe's PC Emporium" and you want to sell video cards with your name on them... So you order 5,000 of these things with no cooling solution... and you order 5,000 cooling setups with your company logo on them. You do a little assembly and sell the boards under your own name.
It's not bad engineering... it's just a level of the business most people never see.
Whoever put that up for retail sale was irresponsible in the extreme ...
i get that
but - i say nvidia is missing the point...
they should put the heatsinks and fans on there and mark up the cost
they are missing out on a piece of profit
not to mention, they maintain control over the thermal considerations
if the OEM wants their logo on there, sell it to them that way and mark it up again :t
Quote from: dedndave on November 18, 2012, 01:14:56 AM
but - i say nvidia is missing the point...
That may be true ... but it isn't how the industry works.
Actually ... I hope you don't mind but I'm going to expand on this a bit...
Right now, there are versions of that board with Gigabyte, Asus and MSI brand names on them. All sell cheaper than the NVidia branded board. This is why they do that... NVidia makes a lot more money selling that barebones board than they do retailing their own products... by orders of magnitude more. Wholesaling in batches of 1,000 or more is far more lucrative than retail could ever be.
It's a real hoot listening to two geeks in a coffee shop arguing if the Gigabyte version is better than the Asus one. "The Gigabyte outperforms Asus every time"... "No the Asus has a better display".... Ummm guys... it's the SAME BOARD!
This kind of rebranding happens all the time...
At one time Radio Shack's CB radios were made by Cobra... and people would tell me RS is crap get a Cobra if you want a REAL radio.
Sears Canada used to sell a whole line of stereos that were just JVC boards in Sears cases... People would tell me that JVC is so much better. (the real difference was the speakers.)
The Radio Shack CB was half the price of the equivalent Cobra model.
The Sears stereos were often 1/3 or less the price of the equivalent JVC component.
This could not happen without rebranding.
it is common to remove the cooler for photos because people want to see what they pay for ( ::)) - such 'heating elements' can't work without a cooler :icon_cool:
Quote from: qWord on November 18, 2012, 01:34:51 AM
it is common to remove the cooler for photos because people want to see what they pay for ( ::)) - such 'heating elements' can't work without a cooler :icon_cool:
Yes they can... For a few seconds until they overheat and self-destruct.
I once worked for a company that got a contract to supply 300+ computers to a large office building. NONE of the computers worked on delivery and it fell to me to fix them... Every one of them had a damaged CPU chip on the motherboard. What they did in the assembly plant was to fire them up "bareback" for a quick check, then they finished the assembly adding pre-loaded hard drives, fans, coolers, etc. and shipped them out without turning them on again. They had gotten enough time to default the BIOS... and had damaged every CPU chip in the process.
Dave already indicated he had the version without the heat sink.
And... yes you can order that barebones board from NVidia in quantities of 100 or more...
Funny enough, I lost a good quality video card some years ago because one of the clips that held the heat sink onto the main video chip pulled out and it fried the chip very quickly. It was no joy to diagnose as the machine would not even boot, so I had to do it the hard way: pulled all of the cards out of it and tested different cards one at a time. I had an ancient PCI video card to test with and it worked, so I then carefully pulled the video card apart and found the heat sink had moved just enough to not seat on the chip.
I made sure the next one did not have any heat sink mounting problems. :P
Well, the 90C was reported by the motherboard utility (ASRock OC Tuner) but I finally got Core Temp to work.
When ASRock says 85, Core Temp says 55, so I am more inclined to believe Core Temp (AMD say 67 is top).
I think ASRock uses motherboard sensors whereas Core Temp has a driver that queries the CPU cores.
Just installed Core Temp out of curiosity. It says core #0 30°, core #1 24° - and that doesn't change when running the proggie below. CPU is AMD 4450B.
include \masm32\include\masm32rt.inc
.code
start:
  .Repeat
    xor ecx, ecx
    .Repeat
      dec ecx
    .Until Zero?
    print "*"
    invoke Sleep, 1
    invoke GetKeyState, VK_SHIFT
  .Until sword ptr ax<0
  print "bye"
  exit
end start
Single threaded, your proggie is a consistent 17% CPU usage (1 out of 6 cores), no effect on temps.
Spawn 6 threads though, I will give it a go.