Beta testing - Calibrated Boinc core

International discussions and queries, preferably in English. Deutsch willkommen. Français bienvenu.

Moderators: zdespi, Moderators

trux
Banana Reward
Posts: 874
Joined: Thu May 12, 2005 2:16 am

Post by trux » Mon Jan 16, 2006 6:47 pm

Mr.Pernod wrote: If you happen to know which processor(s) they use as reference, I can try to locate one and compare numbers, no problem.
It is a hypothetical machine; I do not think one like that exists. I think they simply run a reference unit on some machine(s) and then recalculate its CPU time value accordingly. That value is then passed with every WU, but the real value can vary, depending on the character of the unit. It is really valid for the reference WU only.

Mr.Pernod wrote: Note about the NNW with BoincManager 5.2.13 and core client 5.3.11: BoincManager does not display a message in the message tab when toggling the function, but it does work.
What is NNW? No messages at all with that manager? Well, I advise installing it over a matching BOINC version, so that may fix the problem. The BOINC dev team changed the RPC ports lately, so mismatched versions of the manager and the core may indeed fail to communicate.

Mr.Pernod
Newbie
Posts: 31
Joined: Fri Jan 13, 2006 6:55 pm

Post by Mr.Pernod » Mon Jan 16, 2006 6:51 pm

No New Work / Allow New Work button. This function does not give feedback with this particular combination, but the actual functionality is intact.

Post by trux » Mon Jan 16, 2006 6:56 pm

Mr.Pernod wrote: No New Work / Allow New Work button. This function does not give feedback with this particular combination, but the actual functionality is intact.
Yes, I saw the official version had some changes in the messaging. Some messages were removed, but I have no idea why. This may well be one of them. Maybe it was just a mistake, but I did not want to diverge from the official modifications.

Post by Mr.Pernod » Mon Jan 16, 2006 7:12 pm

don't worry about it, it's not a real problem, just something people might notice and panic about. :wink:

btw, how is this for strange behaviour....
5.3.11.tx31, dual Athlon MP2400+

Code:

1/16/2006 6:01:35 PM||Suspending network activity - running CPU benchmarks
1/16/2006 6:01:37 PM||Running CPU benchmarks
1/16/2006 6:02:37 PM||Benchmark results:
1/16/2006 6:02:37 PM||   Number of CPUs: 2
1/16/2006 6:02:37 PM||   1959 double precision MIPS (Whetstone) per CPU
1/16/2006 6:02:37 PM||   2711 integer MIPS (Dhrystone) per CPU
1/16/2006 6:02:37 PM||Finished CPU benchmarks
1/16/2006 6:02:38 PM||Resuming computation
1/16/2006 6:02:38 PM||Rescheduling CPU: Resuming computation
1/16/2006 6:02:38 PM||Resuming network activity

1/16/2006 6:03:13 PM||Suspending network activity - running CPU benchmarks
1/16/2006 6:03:15 PM||Running CPU benchmarks
1/16/2006 6:04:14 PM||Benchmark results:
1/16/2006 6:04:14 PM||   Number of CPUs: 2
1/16/2006 6:04:14 PM||   1959 double precision MIPS (Whetstone) per CPU
1/16/2006 6:04:14 PM||   1561 integer MIPS (Dhrystone) per CPU
1/16/2006 6:04:14 PM||Finished CPU benchmarks
1/16/2006 6:04:15 PM||Resuming computation
1/16/2006 6:04:15 PM||Rescheduling CPU: Resuming computation
1/16/2006 6:04:15 PM||Resuming network activity

1/16/2006 6:10:26 PM||Suspending network activity - running CPU benchmarks
1/16/2006 6:10:28 PM||Running CPU benchmarks
1/16/2006 6:11:27 PM||Benchmark results:
1/16/2006 6:11:27 PM||   Number of CPUs: 2
1/16/2006 6:11:27 PM||   1957 double precision MIPS (Whetstone) per CPU
1/16/2006 6:11:27 PM||   5534 integer MIPS (Dhrystone) per CPU
1/16/2006 6:11:27 PM||Finished CPU benchmarks
1/16/2006 6:11:28 PM||Resuming computation
1/16/2006 6:11:28 PM||Rescheduling CPU: Resuming computation
1/16/2006 6:11:28 PM||Resuming network activity
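
To put a number on the fluctuation, here is a minimal sketch that pulls the per-CPU benchmark figures out of the three runs above and compares the highest and lowest value of each benchmark. The regular expression assumes the exact message layout shown in this log, and the function names are illustrative only.

```python
import re

# Benchmark lines copied from the three runs in the log above.
LOG = """\
1/16/2006 6:02:37 PM||   1959 double precision MIPS (Whetstone) per CPU
1/16/2006 6:02:37 PM||   2711 integer MIPS (Dhrystone) per CPU
1/16/2006 6:04:14 PM||   1959 double precision MIPS (Whetstone) per CPU
1/16/2006 6:04:14 PM||   1561 integer MIPS (Dhrystone) per CPU
1/16/2006 6:11:27 PM||   1957 double precision MIPS (Whetstone) per CPU
1/16/2006 6:11:27 PM||   5534 integer MIPS (Dhrystone) per CPU
"""

# Assumed layout: "<timestamp>||   <value> ... (<benchmark name>) per CPU"
PATTERN = re.compile(r"\|\|\s*(\d+) .*\((Whetstone|Dhrystone)\) per CPU")


def benchmark_values(log: str) -> dict[str, list[int]]:
    """Collect the reported values per benchmark, in log order."""
    values: dict[str, list[int]] = {"Whetstone": [], "Dhrystone": []}
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            values[match.group(2)].append(int(match.group(1)))
    return values


def spread(samples: list[int]) -> float:
    """Ratio between the largest and smallest sample."""
    return max(samples) / min(samples)


values = benchmark_values(LOG)
for name, samples in values.items():
    print(f"{name}: {samples}, max/min = {spread(samples):.2f}")
```

On these three runs the Whetstone figure barely moves, while the Dhrystone figure differs by a factor of roughly 3.5 between the lowest and highest run.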

Post by trux » Mon Jan 16, 2006 7:41 pm

Hmm, that's odd. Did it then stop on its own, or did you have to restart the client?

Post by Mr.Pernod » Mon Jan 16, 2006 7:58 pm

didn't stop the client, just ran the benchmark a few times.

I just restarted BOINC on the machine, but the benchmarks keep going up and down like crazy.

on the single-CPU Athlon XPs it gave around 2 GFlops/5.9 GIops the first time round and remained close to those numbers when rerunning them,
the Xeons also reported good scores; only the dual Athlon MP gives me these weird results.
I have seen fluctuating benchmarks before, both with standard and optimized BOINC clients, but never this far apart.

Post by trux » Mon Jan 16, 2006 8:15 pm

Ah, so it was you who started the benchmarks manually! I thought it was the problem you spoke about, and that the client ran the benchmarks unattended in a loop :)
Greatly varying benchmarks are very common with all versions I have seen (not only mine). This was also one of the reasons for introducing the calibration. With calibration in use, the benchmarks are irrelevant, and you should avoid running them too often: each benchmark run then requires additional time and WUs to readjust the calibration.
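
To see why the swings matter when no calibration is applied, here is a minimal sketch of a benchmark-based credit claim. It assumes the stock definition of 100 credits per CPU-day on a host that benchmarks at 1000 Whetstone MIPS and 1000 Dhrystone MIPS; the function name is made up for illustration. As a sanity check, the same formula gives about 8.31 for 1924 s with the p_fpops and p_iops values from the client_state.xml that Mr.Pernod posts later in this thread, which matches the uncalibrated figure in the first line of his calibration history.

```python
SECONDS_PER_DAY = 86400.0
CREDITS_PER_REFERENCE_DAY = 100.0  # assumed credit definition, see the text above


def claimed_credit(cpu_seconds: float, whetstone_mips: float, dhrystone_mips: float) -> float:
    """Benchmark-based claim: CPU days times the mean of both benchmarks (in thousands of MIPS)."""
    mean_giga_ops = (whetstone_mips + dhrystone_mips) / 2.0 / 1000.0
    return cpu_seconds / SECONDS_PER_DAY * CREDITS_PER_REFERENCE_DAY * mean_giga_ops


# (Whetstone, Dhrystone) pairs from the three benchmark runs on the dual Athlon MP.
runs = [(1959, 2711), (1959, 1561), (1957, 5534)]
for whetstone, dhrystone in runs:
    credit = claimed_credit(3600, whetstone, dhrystone)
    print(f"Dhrystone {dhrystone:>4}: {credit:.2f} credits claimed per CPU hour")
```

Under that assumption, identical work on the same machine would be claimed at anywhere between roughly 7.3 and 15.6 credits per CPU hour, depending only on which benchmark run happened to be the latest.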

Post by Mr.Pernod » Mon Jan 16, 2006 8:21 pm

I just looked up some old benchmark scores from the stock 5.2.13 and an optimized 5.3.2, but those were pretty consistent over several runs.
a fluctuation this big will require recalibration every 5 days when the automated benchmarks are run :cry:

Post by Mr.Pernod » Mon Jan 16, 2006 8:59 pm

Dorsai posted a strange thing concerning "suspend network activity" in 5.3.11.tx31 in the thread over on the SETI boards.
Dorsai wrote: Re the network thing, very odd, but Boinc just popped up a message, "16/01/2006 18:43:11||Suspending network activity - user request"...Odd...I turn it off, it turns back on, then a few mins later turns it'self back off..Odd.. :-/
I have not been able to reproduce this behaviour, but this is what happens when I suspend network activity through the menu in BOINCManager and then select a project and click the update button.

Code:

1/16/2006 7:58:15 PM||Suspending network activity - user request
1/16/2006 7:58:23 PM||Resuming network activity
1/16/2006 7:58:23 PM|seti@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
1/16/2006 7:58:23 PM|seti@home|Reason: Requested by user
1/16/2006 7:58:23 PM|seti@home|(not requesting new work or reporting results)
1/16/2006 7:58:34 PM|seti@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded

I have suspended network activity again and will check what happens when the 10 minute time-out finishes.

Post by Mr.Pernod » Mon Jan 16, 2006 10:25 pm

I can't reproduce Dorsai's exact issue.
when I suspend network activity, it stays suspended when a project is trying to upload, but when I force an update with the update-button network activity is resumed.
there must be some serious changes in 5.3.11 when compared to the recommended 5.2.13

Post by Mr.Pernod » Mon Jan 16, 2006 11:16 pm

ok, the dual athlon MP went crazy.
In addition to the extreme variations in benchmarks, it also started reporting extreme benchmark values to the SIMAP project

Code:

Measured floating point speed 8214.27 million ops/sec 
Measured integer speed 5948.17 million ops/sec

I uninstalled BOINC, reinstalled 5.2.13 and then upgraded boinc.exe to 5.3.11.tx31.
Benchmarks are now normal and reported correctly to the project.

For SIMAP I currently have this in the client_state.xml

Code:

    <credit_correction_factor>1.000000</credit_correction_factor>
    <credit_correction_checksum></credit_correction_checksum>

I assume this will change until the calibration has settled down.

Post by trux » Tue Jan 17, 2006 12:55 am

As I already explained several times, the benchmarks are completely irrelevant and you do not need to worry about them. Also, when I run 5 consecutive benchmarks with the same client (of any version and any author), I usually get 3- to 5-fold differences between the extreme results. And I am definitely not alone: you can see it reported many times on the S@H board. Still, as I said, it is completely irrelevant, and you can safely ignore it.

As for the network connectivity - it is a known bug in the official version, but I assumed it was fixed in the official 5.3.11. Apparently not yet completely.

Post by Mr.Pernod » Tue Jan 17, 2006 8:19 am

trux, I am having a serious problem with SIMAP here
look at this host and its results.
the benchmark values in the host properties matched the values in the client_state.xml last night, but seem to increase with every result returned.

manager 5.2.13, core 5.3.11.tx31

client_state.xml:

Code:

    <p_ncpus>2</p_ncpus>
    <p_vendor>AuthenticAMD</p_vendor>
    <p_model>AMD Athlon(tm) MP 2400+</p_model>
    <p_fpops>1945469920.975126</p_fpops>
    <p_iops>5518443152.832880</p_iops>
..
..
<project>
    <master_url>http://boinc.bio.wzw.tum.de/boincsimap/</master_url>
    <project_name>boincsimap</project_name>
..
    <credit_correction_factor>0.132821</credit_correction_factor>
    <credit_correction_checksum>32865cfcb9c93af14fbdc883b9b17cf8</credit_correction_checksum>

calibration history since BOINC reinstall last night:

Code:

2006-01-16 22:29:42 [boincsimap] CC calibration: 8.31 >> 5.03 (time: 1924s >> 1118s / Gfpops: 1.95 >> 2.26)
2006-01-16 22:34:11 [boincsimap] CC calibration: 8.26 >> 5.00 (time: 1913s >> 1112s / Gfpops: 1.95 >> 2.26)
2006-01-16 23:02:38 [boincsimap] CC calibration: 8.44 >> 5.91 (time: 1954s >> 1256s / Gfpops: 1.95 >> 2.61)
2006-01-16 23:07:57 [boincsimap] CC calibration: 8.66 >> 6.98 (time: 2004s >> 1414s / Gfpops: 1.95 >> 3.01)
2006-01-16 23:36:41 [boincsimap] CC calibration: 8.74 >> 8.07 (time: 2023s >> 1555s / Gfpops: 1.95 >> 3.45)
2006-01-16 23:40:00 [boincsimap] CC calibration: 8.24 >> 8.66 (time: 1908s >> 1586s / Gfpops: 1.95 >> 3.92)
2006-01-17 00:09:06 [boincsimap] CC calibration: 8.34 >> 9.92 (time: 1930s >> 1721s / Gfpops: 1.95 >> 4.44)
2006-01-17 00:13:13 [boincsimap] CC calibration: 8.54 >> 11.42 (time: 1976s >> 1877s / Gfpops: 1.95 >> 4.99)
2006-01-17 00:43:12 [boincsimap] CC calibration: 8.69 >> 12.99 (time: 2013s >> 2024s / Gfpops: 1.95 >> 5.58)
2006-01-17 00:45:09 [boincsimap] CC calibration: 8.10 >> 13.41 (time: 1874s >> 1981s / Gfpops: 1.95 >> 6.18)
2006-01-17 01:15:47 [boincsimap] CC calibration: 8.38 >> 15.29 (time: 1940s >> 2143s / Gfpops: 1.95 >> 6.81)
2006-01-17 01:17:14 [boincsimap] CC calibration: 8.24 >> 15.02 (time: 1907s >> 2106s / Gfpops: 1.95 >> 6.81)
2006-01-17 01:48:18 [boincsimap] CC calibration: 8.36 >> 18.09 (time: 1936s >> 2300s / Gfpops: 1.95 >> 8.07)
2006-01-17 01:50:44 [boincsimap] CC calibration: 8.60 >> 20.05 (time: 1992s >> 2437s / Gfpops: 1.95 >> 8.70)
2006-01-17 02:21:48 [boincsimap] CC calibration: 8.61 >> 21.47 (time: 1994s >> 2503s / Gfpops: 1.95 >> 9.31)
2006-01-17 02:22:49 [boincsimap] CC calibration: 8.24 >> 21.81 (time: 1907s >> 2447s / Gfpops: 1.95 >> 9.88)
2006-01-17 02:53:48 [boincsimap] CC calibration: 8.20 >> 24.08 (time: 1899s >> 2526s / Gfpops: 1.95 >> 10.96)
2006-01-17 02:57:01 [boincsimap] CC calibration: 8.63 >> 25.34 (time: 1998s >> 2658s / Gfpops: 1.95 >> 10.96)
2006-01-17 03:26:44 [boincsimap] CC calibration: 8.35 >> 25.60 (time: 1934s >> 2609s / Gfpops: 1.95 >> 11.44)
2006-01-17 03:30:21 [boincsimap] CC calibration: 8.56 >> 27.25 (time: 1982s >> 2707s / Gfpops: 1.95 >> 11.88)
2006-01-17 04:00:41 [boincsimap] CC calibration: 8.73 >> 28.73 (time: 2022s >> 2790s / Gfpops: 1.95 >> 12.28)
2006-01-17 04:02:19 [boincsimap] CC calibration: 8.21 >> 27.80 (time: 1901s >> 2646s / Gfpops: 1.95 >> 12.63)
2006-01-17 04:34:03 [boincsimap] CC calibration: 8.17 >> 28.38 (time: 1892s >> 2655s / Gfpops: 1.95 >> 12.96)
2006-01-17 04:34:36 [boincsimap] CC calibration: 8.71 >> 30.95 (time: 2018s >> 2849s / Gfpops: 1.95 >> 13.25)
2006-01-17 05:06:12 [boincsimap] CC calibration: 8.24 >> 29.82 (time: 1909s >> 2709s / Gfpops: 1.95 >> 13.50)
2006-01-17 05:08:28 [boincsimap] CC calibration: 8.69 >> 31.93 (time: 2011s >> 2868s / Gfpops: 1.95 >> 13.72)
2006-01-17 05:39:56 [boincsimap] CC calibration: 8.66 >> 32.27 (time: 2005s >> 2870s / Gfpops: 1.95 >> 13.91)
2006-01-17 05:41:46 [boincsimap] CC calibration: 8.55 >> 32.24 (time: 1980s >> 2844s / Gfpops: 1.95 >> 14.07)
2006-01-17 06:13:38 [boincsimap] CC calibration: 8.18 >> 31.12 (time: 1893s >> 2726s / Gfpops: 1.95 >> 14.21)
2006-01-17 06:13:50 [boincsimap] CC calibration: 8.70 >> 33.11 (time: 2014s >> 2901s / Gfpops: 1.95 >> 14.21)
2006-01-17 06:45:45 [boincsimap] CC calibration: 8.20 >> 31.70 (time: 1899s >> 2747s / Gfpops: 1.95 >> 14.43)
2006-01-17 06:47:23 [boincsimap] CC calibration: 8.67 >> 33.71 (time: 2006s >> 2907s / Gfpops: 1.95 >> 14.52)

I have pulled this host from SIMAP and am going to test this one to make sure it's not host-related.

host details at this moment:

Code:

CPU type AuthenticAMD 
AMD Athlon(tm) XP 2800+ 
Number of CPUs 1 
Measured floating point speed 2072.87 million ops/sec 
Measured integer speed 6004.86 million ops/sec 
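
The drift in the calibration history above is easier to follow once the lines are parsed. The following is a minimal sketch, assuming the line layout shown in that log; the field names are illustrative labels, not names used by the client. It also checks an inference drawn purely from the posted numbers, not from the client source: the calibrated claim on each line is consistent with 100 credits per CPU-day at 1 Gfpops and 1 Giops, applied to the calibrated time and calibrated Gfpops together with the p_iops value from client_state.xml.

```python
import re

# First two and last two lines of the calibration history above.
LOG = """\
2006-01-16 22:29:42 [boincsimap] CC calibration: 8.31 >> 5.03 (time: 1924s >> 1118s / Gfpops: 1.95 >> 2.26)
2006-01-16 22:34:11 [boincsimap] CC calibration: 8.26 >> 5.00 (time: 1913s >> 1112s / Gfpops: 1.95 >> 2.26)
2006-01-17 06:45:45 [boincsimap] CC calibration: 8.20 >> 31.70 (time: 1899s >> 2747s / Gfpops: 1.95 >> 14.43)
2006-01-17 06:47:23 [boincsimap] CC calibration: 8.67 >> 33.71 (time: 2006s >> 2907s / Gfpops: 1.95 >> 14.52)
"""

LINE = re.compile(
    r"\[(?P<project>[^\]]+)\] CC calibration: "
    r"(?P<raw_claim>[\d.]+) >> (?P<cal_claim>[\d.]+) "
    r"\(time: (?P<raw_time>\d+)s >> (?P<cal_time>\d+)s / "
    r"Gfpops: (?P<raw_gfpops>[\d.]+) >> (?P<cal_gfpops>[\d.]+)\)"
)

P_GIOPS = 5.518443152832880  # p_iops from client_state.xml above, in Giops


def parse(log: str) -> list[dict[str, float]]:
    """Turn each calibration line into a dict of numeric fields."""
    rows = []
    for line in log.splitlines():
        match = LINE.search(line)
        if match:
            fields = match.groupdict()
            fields.pop("project")  # the only non-numeric field
            rows.append({name: float(value) for name, value in fields.items()})
    return rows


def claim(seconds: float, gfpops: float, giops: float = P_GIOPS) -> float:
    """Assumed formula: 100 credits per CPU-day at 1 Gfpops / 1 Giops."""
    return seconds / 86400.0 * 100.0 * (gfpops + giops) / 2.0


rows = parse(LOG)
for row in rows:
    recomputed = claim(row["cal_time"], row["cal_gfpops"])
    print(f"logged {row['cal_claim']:.2f}, recomputed {recomputed:.2f}")
print(f"calibrated Gfpops grew by a factor of {rows[-1]['cal_gfpops'] / rows[0]['cal_gfpops']:.2f}")
```

The recomputed values agree with the logged ones to within the rounding of the logged fields, and the calibrated Gfpops figure grows more than sixfold over the night while the raw benchmark stays at 1.95.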

Post by trux » Tue Jan 17, 2006 3:32 pm

As I already mentioned, it is quite possible that it does not work well with some projects. I know nothing about SIMAP, but I can imagine this happening if the project reports an estimated value that is too high (a theoretical maximum) and the client ends practically every WU early, because it finds the final result or hits another stop condition before the full run. If no unit ever runs to completion, the client has no chance to estimate the full length; it starts treating the short, early-terminated units as full ones and claims full credit for them. In that case only calibrating with a reference unit of known value would help, but since I have such a unit only for S@H, I am afraid it would not be easy. I might create an empirical table with additional coefficients for such improperly behaving projects, but that may be rather difficult too, especially because I do not participate in many projects and do not know the behaviour of all of them.

I'll probably add a table with some limiting values for individual projects later, but I may need the help of a larger number of users participating in all those projects to collect the necessary statistical data.

Post by Mr.Pernod » Tue Jan 17, 2006 5:18 pm

The second machine is bottoming out at about 50% claimed compared to average granted; I'm pulling that one from SIMAP for now as soon as I get home.
The same machines are doing fine on LHC@home, even though that project has wildly varying result-runtimes (from 10 to 25.000 seconds).

I posted this issue on the SIMAP forums, asking if they were running anything special server-side.
On the server side we are running the standard daemons from the boinc distribution except on the validator.
For now the conclusion is that the estimated runtimes registered server-side are way off (8 hours as opposed to 1 hour actual).
The project has corrected this and I will attempt another test tomorrow evening, when the results with the new estimated runtimes should be available.
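
For scale, here is a rough sketch of what an 8 hour estimate against a 1 hour actual runtime means in terms of the work estimate. It assumes the runtime estimate is simply the work unit's floating-point operation estimate divided by the host's benchmarked speed; the helper names are hypothetical, and the p_fpops value is the one from the dual Athlon MP's client_state.xml posted earlier.

```python
P_FPOPS = 1945469920.975126  # p_fpops from the client_state.xml posted earlier


def estimated_runtime_s(fpops_estimate: float, p_fpops: float = P_FPOPS) -> float:
    """Assumed estimate: floating-point operations divided by benchmarked speed."""
    return fpops_estimate / p_fpops


def fpops_for_runtime(seconds: float, p_fpops: float = P_FPOPS) -> float:
    """Inverse: the work estimate that would yield the given runtime on this host."""
    return seconds * p_fpops


old_estimate = fpops_for_runtime(8 * 3600)  # what an 8 hour estimate implies
new_estimate = fpops_for_runtime(1 * 3600)  # what a 1 hour estimate implies
print(f"old: {old_estimate:.3e} fpops, new: {new_estimate:.3e} fpops")
print(f"overestimate factor: {old_estimate / new_estimate:.1f}")
```

Under that assumption, the old server-side estimate overstated the work per result by about a factor of eight on this host.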
