6_6_6

NCQ: Best Upgrade For a Power User!

@FAT_Punisher

You are an idiot.

Sometimes I really have the feeling that you are trying to kill this thread. If you are so convinced that this thread leads nowhere, please look elsewhere. Otherwise, avoid comments that could be needlessly upsetting.

This changes the requirements for NCQ as well (if it works).

I think you misunderstood. The aim of NCQ is to limit the number of rotations/movements necessary to reach data from two or more locations.

I think it's clearer in the picture from Wikipedia:

300px-NCQ.svg.png
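
As a rough illustration of what the picture shows, here is a toy Python sketch. It is my own simplification, not how any drive firmware actually works: it treats the disk as a line of positions and compares the total head travel when requests are served in arrival order with the travel after one reordered sweep. The function names and the sample queue are made up for the example.

```python
def head_travel(start, positions):
    """Total distance the head moves when visiting positions in the given order."""
    travel = 0
    current = start
    for pos in positions:
        travel += abs(pos - current)
        current = pos
    return travel


def reorder_sweep(start, positions):
    """Toy reordering: serve everything at or beyond the head first (ascending),
    then come back for the rest (descending). Real NCQ also takes the
    rotational position into account, which this sketch ignores."""
    ahead = sorted(p for p in positions if p >= start)
    behind = sorted((p for p in positions if p < start), reverse=True)
    return ahead + behind


# Hypothetical queue of four requests, head starting at position 0.
queue = [90, 10, 80, 20]
fifo = head_travel(0, queue)
swept = head_travel(0, reorder_sweep(0, queue))
print(fifo, swept)  # prints "300 90"
```

In this made-up case the reordered sweep needs less than a third of the head movement, which is the whole point of queuing the requests in the drive.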

A big block size helps to minimize the number of operations required for a single sequential read. It's like walking: you move faster when you take big "jumps" with your feet.

I don't know if we could find a better comparison.

In principle, there's no relationship between NCQ and block size.

However, it seems people mentioned a big drop in performance at 512k block size, and that's what I found weird.

with huge block sizes (1 MB and up to 16 MB) even drives with NCQ disabled begin to work better with 2x HD_Speed (meaning higher total throughput)[...]Which is perfectly clear: the drive has to jump less often

Maybe the gap in performance between NCQ enabled and disabled decreases when using big block sizes. But that doesn't make NCQ less useful. I really doubt that Windows would use such huge blocks.

So we really should stop testing with 256k only.

I only need one test: is there unexpected behavior at 512k block size with the Seagate 7200.11 and NCQ enabled (compared with 256k blocks)? If there's a big drop in performance, that wouldn't be logical.

Edited by extrabigmehdi

The author writes:
When copying within the same drive, a larger buffer size should be chosen than when copying between two drives, so the read/write head doesn't have to jump between source and target all the time.

This changes the requirements for NCQ as well (if it works).

But buffer is not the same thing as block size. I think that even with a very large TC buffer, the filesystem still accesses HD data with a certain block size. But what that size is, and whether it is possible to change it, is a mystery to me. I seem to recall seeing a regedit tweak for something like this.


Okay,

Can we get a statement of the current position/understanding on this NCQ issue, then? And what do we need forum readers to test in order to help move towards understanding it?

It seems that there is a lot of information, but it's becoming hard to digest because it's so spread out.

Thanks,

LittleJhon.


FYI: Read the post directly above this before commencing.

oh, and FAT Punisher

could you please share with me the communications you had with WD regarding the WD6400AAKS? I PM'd you on your Hexus account. It would help in building an understanding of where WD and Intel stand on this issue.

LittleJhon

Sometimes I really have the feeling that you are trying to kill this thread. If you are so convinced that this thread leads nowhere, please look elsewhere. Otherwise, avoid comments that could be needlessly upsetting.

By no means do I want to kill this thread. It is very interesting.

I am only trying to look deeper than 6_6_6 does, because I am not convinced, especially since I tested the 7200.11 myself. He does not like that, obviously.

I do not want everybody who reads this thread to buy a 7200.11 and be disappointed afterwards, like I was.

And I am trying to find out if I simply missed something and the 7200.11 is perhaps working as 6_6_6 claims. Because I want to have this huge performance boost!

But in real-world situations, not in benchmarks.

The "idiot" statement was provoked by 6_6_6's behaviour towards me. I am posting personal experiences, thoughts, arguments, not "random noise".

I think you misunderstood. The aim of NCQ is to limit the number of rotations/movements necessary to reach data from two or more locations.

[...]

In principle, there's no relationship between NCQ and block size.

I know that.

What I was trying to say: If two concurrently running applications (let's say 2x HD_Speed) read huge chunks of data (e.g. 16 MB block size in HD_Speed, or bigger buffer size in Total Commander), the head movements of the drive are already greatly reduced. Therefore, NCQ can't do as much as it could when using smaller block sizes anymore.

One could try it: 2x HD_Speed, 16 MB block size, with NCQ enabled and with NCQ disabled. The difference should be at least smaller than with 256k.
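
To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where the drive alternates between the two readers after every single request. That is an assumption for illustration only; real drives, caches and read-ahead behave less neatly. The function name is made up.

```python
def head_switches(total_bytes, block_size):
    """Number of requests (and, in the assumed worst case, jumps between
    the two streams) needed to read total_bytes with the given block size."""
    # Ceiling division: a partial final block still costs one request.
    return -(-total_bytes // block_size)


KIB = 1024
MIB = 1024 * KIB

total = 1024 * MIB  # 1 GiB read in total by both HD_Speed instances
for block in (64 * KIB, 256 * KIB, 16 * MIB):
    print(block // KIB, "KiB blocks ->", head_switches(total, block), "jumps")
```

Under this assumption, 16 MB blocks leave only 64 jumps per GiB for NCQ to optimise, compared with 4096 at 256k, so there is simply less room for it to help.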

But you're right: 16 MB blocks will probably be used very seldom.

But buffer is not the same thing as block size. I think that even with a very large TC buffer, the filesystem still accesses HD data with a certain block size. But what that size is, and whether it is possible to change it, is a mystery to me. I seem to recall seeing a regedit tweak for something like this.

This is exactly the problem.

Myself included, we simply don't know if the buffer size in TC is the same as the block size in HD_Speed, or what block/buffer sizes applications use. We do not even know if the file system really reads/writes with a fixed block size. And this is important for the HD_Speed tests! If (only an example) the file system uses 128k blocks, HD_Speed tests with 256k blocks are not very useful.

My guess (only a guess) is that there is no fixed block size. HD_Speed can choose a block size, so can other applications.

I tried to find information about this, but I wasn't yet successful.

It would also be helpful to know if what HD_Speed calls "block size" is the same as "buffer size" in Total Commander, or the buffer size you can freely choose when you write a program (e.g. in C++) yourself, using CreateFile() and ReadFile() (Windows only).

The Microsoft Platform SDK says something about opening a file without buffering (caching):

FILE_FLAG_NO_BUFFERING: The system opens a file with no system caching. This flag does not affect hard disk caching. [...]

An application must meet certain requirements when working with files that are opened with FILE_FLAG_NO_BUFFERING:

File access must begin at byte offsets within a file that are integer multiples of the volume sector size.

File access must be for numbers of bytes that are integer multiples of the volume sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1024, or 2048 bytes, but not of 335, 981, or 7171 bytes.

Buffer addresses for read and write operations should be sector aligned, which means aligned on addresses in memory that are integer multiples of the volume sector size. Depending on the disk, this requirement may not be enforced.

This at least suggests (because of the relation to the sector size) that the block/buffer size can be chosen freely (if multiples of the sector size) and is indeed used on a very low level to fetch the data from disc. This would contradict the theory that there is a "fixed file system block size" (speaking of not fragmented files).
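
The two size rules from the SDK text can be checked with simple arithmetic. This is only a sketch in Python with a made-up function name; it does not call CreateFile() or ReadFile(), and it does not model the third rule about buffer addresses in memory.

```python
def is_valid_unbuffered_request(offset, length, sector_size=512):
    """Check the two documented size rules for unbuffered file access:
    offset and length must both be whole multiples of the sector size."""
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return length > 0 and offset % sector_size == 0 and length % sector_size == 0


# The request sizes from the SDK example, assuming a 512-byte sector.
for size in (512, 1024, 2048, 335, 981, 7171):
    print(size, is_valid_unbuffered_request(0, size))
```

The first three sizes pass and the last three fail, matching the SDK example. Any multiple of 512 would pass, including 64k, 256k or 16 MB, which fits the guess that the request size is up to the application.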

What can we do?

  • People with a 7200.11 can test.
  • If you use HD_Speed, do not only use 256k block size. It's very unrealistic to assume that Windows and all applications will use 256k as well.
  • Conduct real-world tests, not only HD_Speed. Copy two files at the same time (or better: read two files at the same time). Copy a file and open an application, and measure how long it takes while copying at the same time, and how long without copying. Open 10 applications together and see if it's faster with NCQ enabled (7200.11 only!) than it is with NCQ disabled. Watch the total throughput of the disc while doing all this ("Control Panel / Administrative tools / Performance", click the + icon, choose "physical drives" (or similar), and look for "Bytes/s"). Always try with NCQ and without.
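
For the "read two files at the same time" test, a small script can do the timing. The following is a minimal Python sketch, not a proper benchmark. The function names are made up, and it reads through the normal Windows file cache, so the files should be much larger than RAM (or the machine freshly rebooted) for the numbers to mean anything.

```python
import threading
import time


def read_file(path, block_size, counter, index):
    """Read the whole file in block_size chunks and record the bytes read."""
    total = 0
    # buffering=0 disables Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    counter[index] = total


def concurrent_read_throughput(paths, block_size):
    """Read all paths at the same time; return (total_bytes, seconds)."""
    counter = [0] * len(paths)
    threads = [
        threading.Thread(target=read_file, args=(p, block_size, counter, i))
        for i, p in enumerate(paths)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return sum(counter), elapsed
```

Usage would look like `total, secs = concurrent_read_throughput(["a.bin", "b.bin"], 64 * 1024)` with two large files of your own (the names here are placeholders), then `total / secs` gives bytes per second. Run it once with NCQ enabled and once with it disabled.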

oh, and FAT Punisher

could you please share with me the communications you had with WD regarding the WD6400AAKS?

I'll try to translate the relevant part. I wrote to the WD support because of applications temporarily hanging/freezing with NCQ enabled on an ICH9R (and I couldn't disable it, because I needed the RAID mode of the ICH9R to be active).

This is what I wrote:

I have posted my problem in a discussion board already (as "|FAT|Punisher"), and since I am not the only one who experiences this behaviour, you might perhaps consider that thread interesting:

http://forums.hexus.net/hexus-hardware/138...tml#post1474001

The first posting in that thread describes probably the same problem, the remaining postings are more or less specs about NCQ and not that interesting.

In short: when reading a big file sequentially, so that the hard drive is under full load, other applications that try to access the same drive stop working completely for tens of seconds. They cannot move the slightest amount of data, not even write some 100 kB/s, while they're freezing this way. The GUI of those apps even hangs, and it happens to all sort of apps, even the Windows Explorer. But there are never error messages, including the Windows Event Log.

I am aware that multiple concurrent accesses to the same hard drive will of course slow down all participating processes, but the observed behaviour is worse by far than anything I have seen during my 15+ years experience with PCs.

Thank you for your time.

The WD support wasn't exactly talkative, and they didn't give real explanations or statements apart from the following (freely translated from German):

This problem is well-known to us, that is what I have tried to explain to you, that there are problems with some controllers.

But it lies in the responsibility of the chipset manufacturers to provide working drivers for the respective systems.

They wanted me to check the "hard drive compatibility list" of the mainboard if the WD6400AAKS is listed there. :blink:

Have you ever seen a "hard drive compatibility list" for your mainboard... <_<


About the "block size" thing: read here. :)

Stumbled across it by pure accident.

You can get DiskMon here.

It clearly shows that choosing a block size of 16MB in HD_Speed results in 32768 sectors being read from the hard drive at the same time. One sector is 512 bytes (usually), 32768 * 512 = 16777216 = 16MB. :)

Copying big files in the Windows Explorer shows the read/write request to be only 128 sectors long (at least on my system), equaling to 64k blocks!!
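
For anyone who wants to convert their own DiskMon readings, the arithmetic above fits in a few lines of Python. This is just a sketch: the helper names are made up, and it assumes 512-byte sectors, which is the usual value but not guaranteed for every drive.

```python
SECTOR_SIZE = 512  # bytes; the usual value, but check your own drive


def sectors_to_bytes(sectors, sector_size=SECTOR_SIZE):
    """Convert a DiskMon request length (in sectors) to bytes."""
    return sectors * sector_size


def human_size(num_bytes):
    """Format a byte count as KB or MB (binary units, as used in this thread)."""
    if num_bytes >= 1024 * 1024:
        return f"{num_bytes / (1024 * 1024):g} MB"
    return f"{num_bytes / 1024:g} KB"


print(human_size(sectors_to_bytes(32768)))  # HD_Speed at 16 MB: prints "16 MB"
print(human_size(sectors_to_bytes(128)))    # Explorer copy: prints "64 KB"
```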

Now you can check the block sizes of your favourite application and do the corresponding HD_Speed tests. :)


Question to the 7200.11 owners:

Could you make a series of HD_Speed tests with every block size HD_Speed offers?

  • 2x HD_Speed
  • At positions 0% and 50%
  • test every block size
  • measure the total throughput (as discussed)
  • if you have the time: with and without NCQ

I know this is much work to do.

But since an application could use every block size it likes (as long as it is a multiple of the sector size), this would help a lot.

Since the Windows Explorer uses 64k blocks (128 sectors, on my system! Try it on yours!), this might be a "standard" block size for more applications; we would have to verify that.

About the "block size" thing: read here. :)

Stumbled across it by pure accident.

You can get DiskMon here.

It clearly shows that choosing a block size of 16MB in HD_Speed results in 32768 sectors being read from the hard drive at the same time. One sector is 512 bytes (usually), 32768 * 512 = 16777216 = 16MB. :)

Copying big files in the Windows Explorer shows the read/write request to be only 128 sectors long (at least on my system), equaling to 64k blocks!!

This was a great find! I checked and Total Commander's copy settings/buffer size seems to change this, 256k buffer uses 256k block size, 1020k buffer uses 1024k block etc. But Windows seems to load files/programs with 16k, 32k or 64k blocks, so I'm not sure how this should be tested to really simulate this. Maybe 4 HD_Speeds with 16k, 32k, 64k and 128k block sizes?

Btw. I will soon test/compare this NCQ stuff with a 3ware 9650SE 4 port RAID-card. I'll test in single drive mode and also raid modes with Seagate 7200.11/7200.10 and Samsung F1 and T166 drives.


Here we go again: 32k, 64k and 128k Seagate 7200.11 750GB tests with NCQ on and off.

32k NCQ off:

Seagate-32k-NoNCQ.png

32k NCQ on:

Seagate-32k-NCQ.png

64k NCQ off:

Seagate-64k-NoNCQ.png

64k NCQ on:

Seagate-64k-NCQ.png

128k NCQ off:

Seagate-128k-NoNCQ.png

128k NCQ on:

Seagate-128k-NCQ.png

The 128k NCQ result is weird and looks bad; for some reason it's always like this.


@DVB2100

thanks for posting your results.

What I take from these graphs is that the Seagate is able to sustain a minimum throughput of 40 MB/s with two concurrent threads, whether NCQ is on or off.

The peaks that appear when NCQ is on tend to improve average throughput, although with the 128k block size it's not obvious (minimum throughput is worse with NCQ on).

I'd advise you to post results for at least the 512k block size too.

My conclusion is that the Seagate has a much better minimum throughput (when there are concurrent threads) than the Samsung HDD, or even a VelociRaptor in IDE mode.

Now if someone could test the velociraptor with NCQ ON, this would be nice too.

Edited by extrabigmehdi

Now if someone could test the velociraptor with NCQ ON, this would be nice too.

Yes, I would very much like to see that. :)

One other question: what does it mean in HD Tune when all of the feature boxes for the drive are greyed out but have checks in them?


@Atamido

unchecked --> unsupported

checked --> supported.

grayed --> HD Tune is probably unable to get the status of the feature.

HD Tune works better for me under Vista than XP, especially with a SATA HDD.

BTW, you might be interested to know that VelociRaptors have worse read/write performance when NCQ is enabled in a RAID0. I don't know about a single disk. See the review:

http://www.maxishine.com.au/documents/wd_velocitaptors.html

Well, not very useful, but I found a "funny" discussion about NCQ here:

http://forums.guru3d.com/showthread.php?t=254481

The guy absolutely wanted NCQ, and here's the advice he finally got:

so try this. open notepad and type "NCQ = ON" and save it to your windows directory. presto! you now have NCQ performance

Btw. I will soon test/compare this NCQ stuff with a 3ware 9650SE 4 port RAID-card. I'll test in single drive mode and also raid modes with Seagate 7200.11/7200.10 and Samsung F1 and T166 drives.

I'm looking forward to that!

The 9650SE has a feature called "StreamFusion":

StreamFusion optimizes I/O accesses to maximize application performance under multiple stream loads

This sounds like something similar to NCQ done by the controller, so I am very very curious if it works, especially with non-7200.11 drives.

Thanks for your many screenshots, by the way. They show that anything other than 256k blocks isn't really good for NCQ with the 7200.11.


@fat punisher

This sounds like something similar to NCQ done by the controller, so I am very very curious if it works, especially with non-7200.11 drives.

Interesting, but I doubt this would be similar to NCQ. Another optimisation, I guess.

They show that anything other than 256k blocks isn't really good for NCQ with the 7200.11.

But the results show that the Seagate still outperforms other drives, although they are much less exciting than with 256k.

My Samsung doesn't even sustain 12 MB/s with two "concurrent threads".

A minimum of 40 MB/s with the Seagate is still better ...

And this minimum is reached even with NCQ off.

Edited by extrabigmehdi


Here are some tests with a VelociRaptor on my home PC's ICH7R. (I temporarily stole the drive from work to perform some non-destructive tests. :ph34r: ) However, I may have an issue with my controller not being in the correct mode as I discussed in detail here.

Here is the drive with the Intel controller in IDE mode.

untitled2iw3.png

untitled3dz0.png

Here is the drive with the Intel controller in AHCI mode.

untitled4uu4.png

untitled5as0.png

Here is the drive attached to the Marvell controller in AHCI mode.

untitled6vt0.png

untitled7cy8.png

For whatever reason, it appears that NCQ is not working. Perhaps it's my system, or the drive?

It was fairly consistent: Intel AHCI was a tiny bit faster than Marvell AHCI, which was a tiny bit faster than Intel IDE.

The 9650SE has a feature called "StreamFusion":
StreamFusion optimizes I/O accesses to maximize application performance under multiple stream loads

This sounds like something similar to NCQ done by the controller, so I am very very curious if it works, especially with non-7200.11 drives.

But it remains to be seen if it does anything in single drive mode; maybe StreamFusion works only with RAID.

But the results show that the Seagate still outperforms other drives, although they are much less exciting than with 256k.

My Samsung doesn't even sustain 12 MB/s with two "concurrent threads".

A minimum of 40 MB/s with the Seagate is still better ...

And this minimum is reached even with NCQ off.

I'll have to do the same tests with my Samsung F1... I remember that I did one NCQ-off test with it, and it performed better (55-60 MB/s) than the Seagate, but I don't remember what block size I used; maybe it was 512k. But: I previously had the F1 1TB as my system drive and now I have the Seagate, and I could swear that the system feels faster when loading programs, even without NCQ.


@DVB2100

now I have the Seagate and I could swear that the system feels faster when loading programs, even without NCQ

I bet the improvements seen with the Seagate have nothing to do with NCQ.

I get the feeling that all these tests don't help us see how the user experience with a Seagate 7200.11 compares with the VelociRaptor.

I bet the improvements seen with the Seagate have nothing to do with NCQ.

I get the feeling that all these tests don't help us see how the user experience with a Seagate 7200.11 compares with the VelociRaptor.

Yes, that is also my feeling about this. I think that the Seagate just has better cache management. My system disk has changed from Samsung HD401LJ -> HD501LJ -> F1 750GB -> F1 1TB -> Seagate 7200.11. With those Samsung disks, Windows load times got better but the user experience remained about the same, and only the Seagate seems to have some effect on it. Then again, I have not really tested/timed this, so maybe it's just my imagination...

User experience testing is very difficult, maybe we should run 4-6 HD_Speeds all at different block sizes between 16-256k and set positions to 0-20%. Maybe that could simulate normal Windows usage better?

User experience testing is very difficult, maybe we should run 4-6 HD_Speeds all at different block sizes between 16-256k and set positions to 0-20%. Maybe that could simulate normal Windows usage better?

Not really. In real use, there are almost never 4-6 processes all trying to load gigabytes from the disk.

My conclusion is that the Seagate has a much better minimum throughput (when there are concurrent threads) than the Samsung HDD, or even a VelociRaptor in IDE mode.

And as fast as my RAID 0 with two WD6400AAKS. I get ~40 MB/s total throughput with >= 64kB blocks.

Admittedly, the 7200.11's 40 MB/s with two concurrent HD_Speeds is very, very good!

With those Samsung disks, Windows load times got better but the user experience remained about the same, and only the Seagate seems to have some effect on it. Then again, I have not really tested/timed this, so maybe it's just my imagination...

Yeah, it's all about how the system "feels", that was the reason why I came here in the first place. ;)

Unfortunately, feelings are hard to measure. :(


Using Diskmon, here is the distribution of requests that I came up with. I am assuming that the "length" column represents request sizes in 512-byte units. Also, for scaling purposes, the largest requests are cut off. These seem to scale fairly linearly to around 10MB.

On this, after logging on to my system, I started Diskmon as soon as I could click on the desktop icon, then let it log until the harddisk settled down. Not very scientific, but it shows some basic startup usage. (Applications include Microsoft Firewall Agent, Citrix PNA, Windows Desktop Search 4, Trend Micro C/S Security Agent, One Note.)

diskmonstartupxq2.png

w630.png

The majority of the requests are 16, 128, and 256KB in size.

This is Diskmon for opening Outlook/Word/Excel 2007, Firefox 3 with about 10 tabs, Internet Explorer 7, Windows Media Player 11, and a Citrix session.

diskmonapplicationsfr1.png

w691.png

The majority of these requests are 128KB in size, followed by 16, and then a bottom weighted curve between the two.

It seems like testing HD Speed with multiple block sizes between 16 and 256KB is going to more accurately represent real world usage.
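
If anyone wants to build the same kind of distribution from their own log, here is a minimal Python sketch of the counting step. It assumes, as I did above, that the lengths are in 512-byte units. It does not parse Diskmon's log file; it just takes a plain list of lengths. The function name is made up and the sample numbers are invented for the example, not taken from my graphs.

```python
from collections import Counter


def request_size_histogram(lengths_in_sectors, sector_size=512):
    """Count requests per size in KB, given request lengths in sectors."""
    counts = Counter()
    for sectors in lengths_in_sectors:
        size_kb = sectors * sector_size / 1024
        counts[size_kb] += 1
    return counts


# Made-up sample lengths, NOT data from the graphs above.
sample = [32, 256, 256, 512, 32, 256, 8]
for size_kb, count in sorted(request_size_histogram(sample).items()):
    print(f"{size_kb:g} KB: {count}")
```

The sizes that come out on top are the block sizes worth testing in HD_Speed.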

