Category: Technology (Page 1 of 9)

ASV Scans /= PCI Compliance

There is an old story about a chicken and eagle. I hear this story being told by life coaches or motivational trainers trying to get through to our thick, jaded, technical skull that there is something more to life than coding and technology.

The abbreviated version is this: A farmer was walking and finds an eagle’s egg fallen out of the nest. He picks it up, brings it back to his farm, and puts it into the chicken coop. Soon, it hatches, and joins the other chickens in the farm and learns how to be a chicken, even though its an eagle. So this is where some of the version diverges.

a) The chicken and the eagle starts talking one day and the eagle notices another eagle flying high in the sky and he goes, “Dang, I wish I could be an eagle,” and his chicken-pal looks at him scornfully and says, “You are a chicken. How can you be like the king of all birds, soaring through the sky?” So the eagle keeps thinking he is a chicken and the next day he gets roasted for dinner. And the farmer finds his meat a bit tough and doesn’t taste like chicken at all. The moral here is: Don’t let your limitations inhibit you or you will end up a cooked and eaten. This is probably the original version before the other two came along below:

b) The farmer is visited by a naturalist who observes this ‘chicken’ and immediately knows he is an eagle. So he takes this chicken up to a high cliff, and throws him over, shouting: “Spread your wings and fly! Soar like the eagle you are meant to be!” And the eagle soars through the clouds and sky and become the king of all birds. The moral of the story: All of us are eagles, even if you think you are a chicken. All you need is a life coach or a motivational trainer to throw you off the ledge and you will soar. This is the preferred version for life coaches and motivational speakers. For obvious reason.

c) Same as story b) above, but instead of soaring, the naturalist throws the ‘chicken’ off the ledge, and it falls 100 feet and splatters its brains all over the bottom of the ledge and dies since it doesn’t know how to fly. And gets cooked and roasted for dinner. The moral of the story (and this is by far, our more preferred, realistic and risk-averse version): Don’t do something you may be destined for but not ready for. Or you will end up smashed, cooked and eaten.

All three versions have this theme in common: The eagle isn’t a chicken and the chicken isn’t an eagle. The chicken may have commonalities of an eagle, like wings and a beak, but just because it has those doesn’t make it an eagle.


Yes, I am aware that the anecdote above isn’t a very good illustration of the point I am trying to make, but I couldn’t think of a better one. And in a roundabout way, what I want to illustrate here is that ASV scans do not make you PCI Compliant.

We get this a lot.

A company would come and say they are PCI-compliant. Or we have a client who outsources certain portion of their operations to another company and that company comes back and shows us their ASV compliant scan and says this is all they need to show us. We (The auditors/consultants) are compelled to accept this because the ASV scans demonstrate their PCI Compliance, they say.

Let’s make a point here: ASV questions and subquestions in the SAQ D covers around 14 queries. Out of around 600. That means ASV covers 2.33% of PCI-DSS. There is a massive load of other controls and items covering PCI-DSS Other than those precious ASV quarterly scans. What about your patching? Hardening? Firewall security? HR policies? Logging and monitoring? Logical access? MFA? Hardening of systems? Anti-virus and host firewalls? What about service provider management? What about vendor default passwords? What about storage, encryption, key management? Software development? Application and penetration testing? Internal vulnerability scans? Training?

You can see how impossible it is to accept just the ASV report as an evidence of PCI compliance, much like how we cannot accept the chicken as an eagle, but yet, we are constantly berated upon that we don’t know what we are doing and that their Banks have accepted their ASV scans as a sign of PCI compliance, so we should to. But we can’t. We can’t accept 2.33% as a 100% of something. It’s simply mathematically not possible.

So there you go – banks. Why do banks perpetuate this myth that PCI compliance = ASV scans? Why? It’s 2.33% of PCI-DSS! You can’t accept something as an eagle just because it has wings and a beak! There’s really no argument about it.

Here is what 2.3% feels like:

a) The number of Jazz music of all US Music sales in 2013

b) Increase in slot machine spending in New Zealand in 2018 Q1

c) Auto parts industry against the US GDP in 2013

d) Android 6.0 Marshmallow installation for all Android devices in July 2016

e) Thats lesser than the % of freshwater we have on this planet (2.5% of water on the planet is freshwater)

I am sure there’s a lot of 2.33% out there on this planet, but the point we are making is this: It’s not compliance. It’s a small but important part of compliance but it’s not compliance. So no matter what your banks tell you, we can never accept the ASV scan as a sign of PCI compliance. It can be accepted as one of the evidences of PCI compliance amongst many, but not as an evidence of complete compliance.

Now, stop calling a chicken an eagle. Let us know about your questions for PCI or any compliance at pcidss@pkfmalaysia.com.

PCI-DSS:Say No to Certificates

For those who have been reading this blog long enough, you would know that we are absolutely, completely, mind-numbingly devoted to the anti-certificate movement within the PCI-DSS. Really. And every single month, almost, it never fails that we get enquiry that our customer or their acquirer are demanding to see the precious certificate of compliance. And rejecting the AoC. Rejecting the RoC.

It has truly become so farcical in PCI-DSS when acquirers – banks! – demand this of our customers. To an extent that even our customers give a wry shrug at us, the way my wife and I would shrug at each other when my kid tells us that he just witnessed an elephant doing hoola-hoops in a tutu in his kindergarten that morning.

We have written it before and will keep writing it till the horn sounds for the second coming: Compliance ‘certificates’ are NOT recognised by the PCI-SSC! PCI-DSS seals with those wondrous badges (like the police etc) are not recognised by the PCI-SSC. In the words of the council:


The only documentation recognized for PCI DSS validation are the official documents from the PCI SSC website. Any other form of certificate or documentation issued for the purposes of illustrating compliance to PCI DSS or any other PCI standard are not authorized or validated, and their use is not acceptable for evidencing compliance. 

PCI COUNCIL REMONSTRATING TO ALL PRO-CERTIFICATES TO STOP DOING THIS NONSENSE

So banks – please, please, for the sake of all that is good and worthy in our God given Earth – DO NOT demand your providers/customers/merchants to show the certificate of compliance. It’s ridiculous and it demonstrates that you, an entity that should know PCI first and foremost, are absolutely not doing your job well. You are making demands for things that are considered unauthorized and unacceptable!

We are not saying certificates are illegal or those peddling these certificates are cheating anyone. By far and large, all QSAs generally provide these so called certificates as an easy way to illustrate compliance, or just to have the customer frame it up and put it onto their wall. This is perfectly, absolutely fine. Even our QSAs do it. It’s not the problem with the QSAs putting these certificates out. The problem are with the acquirers or those demanding to see PCI compliance from their merchants/providers etc. Banks, financial institutions etc who refuses to see anything else but the ‘certificate’ as evidence of PCI-DSS compliance. It’s frustrating. Yes, most clients will be able to provide these ‘certificates’, but where it boils us up is when the acquirer refuses to accept the RoC and AoC as evidence of compliance! WHY NOT? Because likely the person in the bank requesting PCI-DSS have zero clue what PCI-DSS is , or what it’s supposed to be, in the first place.

Banks, here is a simple illustration:

Would you accept the below as proof of compliance:

Or would you accept the one below:

If you answer the first one, then the question is why do you reject the second one?

“Well because it looks fake and it looks like its scrawled by a two year old, or a random hamster running around on a paper with ink on its paws,” you reply.

Well, guess what?

Both should either be rejected, or accepted because both are of the same value. SAME VALUE. Just because one certificate isn’t designed as aesthetically nicer than the other doesn’t make it less of a certificate. Why? Because the baseline worth of the certificate is zero. There is ZERO value to the certificate on paper. The only value attached to it is from the viewpoint of the person looking at this worthless piece of paper and going, “Hum, that looks nice.” or “Hmm, that color looks off.”

I know this may sound like an over-reaction, because at the end, since Certificate of Compliance is now the norm (due to these demands) – everyone who has an AoC would probably have a certificate as well, right? Well, what about those doing their own SAQ? Do they design their own Certificate then and say this is a self attested cert? So, Mr Bank, how do you wriggle yourself out of this scaffafle? Why do you place so much value into something that (according to PCI SSC) is absolutely worthless, and do not focus on the actual documents that are worth something? And worse of all – to actually reject the documents that are formally from PCI-SSC and accept only these glorified certificates that are worth as much as the paper its printed on!

I think, the only resolution to this is to completely do away with PCI certificates. The next person touting these certificates as the only means of PCI validation, we are going to show them that certificate that’s drawn by the hamster and see what they say.

Jokes aside – let us know if you have any questions on PCI-DSS or any security compliance in your company – we are always willing to help out – drop us a note at avantedge@pkfmalaysia.com.

SAQ A and A-EP, the eternal struggle

Another week, another lockdown struggle, another political instability and another question on the eternal confusion called the SAQ A and A-EP. And this time, it wasn’t so much of us trying to clarify with the customer on this – but us trying to explain to QSAs on it. It just shows how much confusion there is to this thing even after all these years, that even auditors, whose bread and butter is literally on PCI-DSS still struggle to understand it. I don’t blame them – it’s the way that the SAQs are worded, and the confusion that surrounds it that makes it so frustratingly difficult to interpret.

SAQ A by far gets the most mileage. Not because a lot of people are eligible for it, but because at 20+ questions, it’s by default the go-to SAQ for most organisations, whether they are eligible for it or not. I mean, comparing the SAQ D and the A is like scaling Everest vs the little sand hill that your 5 year old kid just built on the beach. Something like that. So everyone (even non-eligible Service Providers) always default to the Open Shortest Path First, which is the SAQ A when they need to choose an SAQ to be “PCI-Compliant”.

However, SAQ A is notoriously difficult to be eligible for and today we are going to look at it. Again. We have often seen everyone anything from storing card information, to hardcopy storage of insurance policies, to doing outsourced call center picketing in front of our office shouting for their SAQ A rights. I mean, let’s start here with SAQ A and A-EP and the difference.

We are not going to focus on the controls in these SAQ, but rather the ‘eligibility’ of it, meaning, on Page 4 of both SAQ under “Before You Begin”. Instead of just repeating all that is typed in there in this article, I will assume those reading this article is keen for a deeper dive into the murky waters of SAQ and not here for a shallow wade – so I am going to assume you have those SAQs right in front of you and I don’t have to delve into the history much, ok?

SAQ A’s story starts off by stating there are TWO types of business who are eligible for it.

a) E-Commerce Merchants

b) MOTO (mail order/telephone order) – card not present

c) Of course, those who do not STORE, PROCESS or TRANSMIT card holder data in ANY electronic format on their system and premise.

Let’s start with MOTO first, because this often confuses people. Straight away those doing MOTO will dance a jig in front of me and gleefully points out that they deserve the SAQ A. All your base are belong to SAQ A, if those nerds like me would understand. Because I usually move them over to SAQ C or C-VT depending on how their call center/MOTO transactions are set up (even B-IP may apply e.g MOTO function on terminal, but mostly MOTO ends up being in SAQ D because they often store card data on file).

Hold on there though. Eligibility of MOTO is tied to the eligibility of c) – i.e you do not store (OK), process (erm, yeah ok, sometimes) or TRANSMIT card holder. Often the transmit and process part is done when you have people on your premise doing MOTO. The moment a phone call comes in – BAM! you are hit. You are done for.

So the ONLY time MOTO is eligible for SAQ A is later described in the SAQ:

Mail order/telephone order (MOTO) or e-commerce merchants that have completely outsourced all operations (where there is no redirection mechanism from the merchant to the third party) and therefore do not have any systems in scope for this SAQ, would consider these requirements to be “not applicable.”

SAQ A

The above is talking about how we can mark Requirement 2,6 and 8 as Non applicable. But notice where it states: COMPLETELY OUTSOURCED ALL OPERATIONS. This means, your company’s MOTO transaction is never done by your own company or on your own premise or by your people or using your technology. You have Completely, irrevocably, irreversibly outsourced the entire function to someone else who is PCI-DSS compliant. Then OK, you cool.

So now we know how to deal with that MOTO part. Oh wait, wait. There is a scenario from one client, where customers actually come over to the counter and try to make payment. However, because they have upgraded everything, instead of dipping or waving that card into a terminal for a POS payment, the counter person whips up a high tech iPad, connects to the companies website, looks at the credit card (while the customer is in front of them) and type out the transaction onto the e-commerce site itself for the transaction. How do we deal with this?

Well. This certainly doesn’t qualify for SAQ A in a strict sense, since this is considered a ‘face-to-face’ channel. However, logically, that transaction is made as an e-commerce, card non present transaction, because the CVV is entered as well and on the merchant end, it qualifies as a e-commerce transaction based on the flow and the fee they are paying. This is an interesting scenario as I would likely look at it as an e-commerce flow, since technically, the customer can do it themselves, but its just that for some reason, maybe they don’t know how, or they can’t figure it out, they go over to the counter to do it. The acquirer certainly doesn’t know about it. But because the information is going through the company’s asset, the company’s line, the company’s network, there would be additional risk they need to consider. In the end, it would be the call of the QSA on how they view this, however, I don’t think this could qualify for an SAQ A channel. It could be technically treated as a SAQ B-IP as we can assume its a terminal, but most auditors, to err on the side of caution may just opt for the full SAQ D on this.

OK, MOTO done.

Now for the e-commerce. I am not going to repeat what I’ve written some years back: https://www.pkfavantedge.com/it-audit/pci-dss-saq-a-and-saq-a-ep-differences-in-a-nutshell/

But I am just going to dive right in where the confusion begins. SAQ A-EP is written in a way that confuses people, and requires some sort of Indiana Jones exploration to figure out what in tarnation they are trying to get at.

So, under Before you begin, the second eligibility point (we call this ITEM 2):

Item 2: “All processing of cardholder data, with the exception of the payment page, is entirely outsourced to a PCI DSS validated third-party payment processor;”

This is confusing. They say – “All processing of cardholder data EXCEPT the payment page”. This means, the payment page actually SITS with the merchant, while everything else is outsourced to PCI third party. This means, this SAQ is eligible for merchants with PAYMENT PAGE (where credit card is entered) residing in their premise. So therefore, if the PAYMENT PAGE is also outsourced, immediately, this SAQ is no longer eligible. In a simple logic:

if (paymentpage) then { read next line;} elseif (!paymentpage) { exit();}

That means, SAQ A-EP doesn’t apply anymore to us if we have outsourced the payment page because this condition is not met, and therefore the if statement should exit, or if you are in a loop, it should end. SAQ ENDS.

The problem is auditors are generally non-programmers and even when the condition is no longer eligible, they keep going!!

And it’s really, the next line that is the confusion of all confusion:

Item 3: “Your e-commerce website does not receive cardholder data but controls how consumers, or their cardholder data, are redirected to a PCI DSS validated third-party payment processor;”

I mean, if we had exited the SAQ loop on the second condition, we won’t need to deal with this nonsense. So let’s break it down. YES, my e-commerce website does not receive card holder data, since I outsourced ALL MY PAYMENT page already to third party. But wait, you are saying “CONTROLS’ how consumers or data are ‘redirected’ to a third party? What? Obviously there is an element of control here, so how do we define ‘control’? Isn’t redirecting to an outsource payment page CONTROL?

The next confusion is the next line:

Item 4: “If merchant website is hosted by a third-party provider, the provider is validated to all applicable PCI DSS requirements (e.g., including PCI DSS Appendix A if the provider is a shared hosting provider);”

Hold up – didn’t we already agree that if the merchant entire website is hosted by a third party PCI provider, this would already not be in SAQ A-EP (see the exit rule of item 2). In fact, isn’t completely outsourcing the website the whole point of SAQ A? What sort of black magic is this?

Item 5: “Each element of the payment page(s) delivered to the consumer’s browser originates from either the merchant’s website OR a PCI DSS compliant service provider(s);”

Look at this wording. Look at it. Tell me that this is not contradicting item 2, the ‘with exception of the payment page’ condition. Let me rephrase item 2:

“You can go for SAQ A-EP if you host your payment page and have outsourced your processing to a PCI third party” –therefore implying that if you don’t host payment page and outsource everything, then another SAQ (SAQ A) applies.

Item 5 slaps Item 2 in the face and goes, “No. SAQ A-EP for you if you host the payment page, or the payment page is hosted by your PCI-DSS service provider. So no, Item 2, you wrong. You dead wrong.”

That usage of the word “OR” in that sentence confuses programmers or those with IT background, I think. This is a logical connector where if condition A OR condition B, if any of this is TRUE or both TRUE, we enter into the loop. Compared with the AND connector, where both needs to be true, otherwise we don’t process the loop. So the above statement is stating “ANY CONDITION WHATSOEVER” since it uses “OR”, will need to apply SAQ A-EP.

In fact, if they had clarified if all of these conditions are connected to each other either through the AND or OR operator, it would makes much more sense to us. It’s like the question, “Are you going to do it now OR do it later?” and we answer “Yes!” because we are indeed doing it now or later, and the question didn’t specify which condition as long as we are doing it.

Anyway, back to the story. The note in SAQ A-EP states:


For the purposes of this SAQ, PCI DSS requirements that refer to the “cardholder data environment” are applicable to the merchant website(s). This is because the merchant website directly impacts how the payment card data is transmitted, even though the website itself does not receive cardholder data.

SAQ A-EP OMINOUS NOTES

It is very ominous. It states, even if your website does not receive card holder data, you still impact or ‘control’ how the payment card is transmitted.

All is not lost though, because if you flip back to SAQ, under the SAQ A notes:

For this SAQ, PCI DSS Requirements that address the protection of computer systems (for example, Requirements 2, 6, and 8) apply to e-commerce merchants that redirect customers from their website to a third party for payment processing, and specifically to the merchant web server upon which the redirection mechanism is located

SAQ A OPTIMISTIC NOTES

I mean, I don’t know how clear it needs to be. It states in SAQ A “FOR THIS SAQ” – apply to merchants that ‘REDIRECT’ customers FROM their website (merchant website) to 3rd party for payment processing and specifically TO the merchant web server where the redirection occurs.

I am going to clarify the phrase that is underlined. the word “TO” is a preposition of the verb “apply to” in the earlier sentence, i.e this applies to merchants, specifically to their web server etc etc. Why its confusing here is because some may read it as a preposition to indicate direction , i.e redirect customers from their website to a 3rd party, specifically TO a merchant web server etc etc, which basically indicates the redirect is going into a loop (from merchant site to third party back to the merchant site) which doesn’t make sense.

I just want to point this out as I may not be the only one confused with this play of words and irresponsible usage of the preposition “TO”. Only me? Ok, fine.

Anyway – long story short, we used the notes in SAQ A to get out of jail for our client, and the QSA seemed to be resigned to that, noting there is a huge huge confusion with how A-EP is written. You do need to know, A-EP was born after A, so definitely, there would be some contradiction since they weren’t written together. SAQ A-EP is like the grumpy uncle that sits in the corner in your Christmas party and snaps at you when you ask him how he’s doing, while SAQ A is like the uncle with all the presents and all the children running around him and laughing with him as he tells a joke. Ah, SAQ A, we like you a lot.

Anyway – a final note on us, while we can state on PCI side, a full outsourcing of e-commerce payment page to third party qualifies for SAQ A, you do need to think – SAQ A-EP makes sense. The page doing the re-direct could be attacked and compromise and the redirect sent to another ‘payment page’ that looks exactly the same as the actual one. So while you are laughing with SAQ A, you need to take into account not to ignore the reasonable requirements that A-EP puts to you – vulnerability scan, firewall rules, penetration testing – i.e these are all best practice baselines that should be practiced regardless of compliance conditions etc. I would recommend a middle ground and take up a risk approach to it – implement these controls based on a good risk assessment and not just ignore the poor, grumpy SAQ A-EP uncle sitting in the corner. Because he may have a point in terms of security, after all.

Let us know about your experience or questions on PCI, SAQs or any other compliance questions at avantedge@pkfmalaysia.com!

Hard Drive Cloning with Clonezilla

Covid-19 has caused a lot of disruption (and not in a good way) to many companies and industries, ours included. One thing that it has done, whether in a detrimental or constructive way, is to force many of us to slow down in almost every aspects of our lives. While working from home does have its benefits, there is a difference between working from home with and without a bunch of yelling kids. While there may be those who can serenely drink their cappuccino while their kids are swinging like primates from the ceiling fan, many of us are juggling trying to get work done and trying to survive the gauntlet of emotions when it comes to educating your own kids, depending on their age. For me, this is the age where they think running around the garden stark naked screaming in the middle of the afternoon is their God given right.

But the flip side of it is that, once they are all slumbering away their misdeeds of the day, there is indeed that peace between 11 pm and 3 am that one can truly feel the serenity of a cloistered monk, and get things done. One of the things that I would suggest to do, is to probably do a backup of your data and laptop while you have time.

For me, one of the easiest and quickest way to do this is to just clone the entire drive you want to backup. In this short article, we will see how we can clone to a larger or equivalent drive (HDD or SSD) using this great software called Clonezilla.

Now, if you want to clone a larger drive to a SMALLER one, that might be a bit trickier, but it can possibly be done but that’s beyond this article. This is straight up backing up by cloning.

One of the frustrating thing is that there are so many software out there claiming to get this done. Minitool Partition Wizard used to be a good tool which I used previously, but the new version 12 now tricks you into doing all the steps for cloning and when you click ‘Start’, it pops up that this feature is only available in the Paid version. The previous versions (if you have) allows you to do it, but unfortunately I removed it earlier and I don’t have the offline installer for it, as all installers now automatically download the new version. Other software like Aomei Partition Assistant actually gets to the point where you can execute the cloning, but frustratingly tries to install Windows PE and fails, and then tries to go Pre-Os mode and just hangs there not knowing what to do.

Previously I did an extremely difficult cloning of a 1TB drive full of errors using DDRescue, a very, very good tool especially for drives that are strewn with bad sectors and errors which none of the bloatware out there can resolve and even Clonezilla had issues with. It took me a good part of one day just to get it done but it finally did and I managed to save all data from a drive that was dying. Again, DDRescue is a good tool, but for non-error, straight up clone, I think Clonezilla is the easiest and best.

First of all, for Clonezilla, the easiest way is to create a Live USB flash drive and use that. I used Rufus USB which is pretty straightforward but there are other ways to get it done as well. Go on to
https://clonezilla.org/liveusb.php to get a full idea of how to do this.

Secondly, prepare the target drive where you want to dump your drive to. A HDD/SSD would likely need an enclosure to house it. For the SSD, make sure you know which type it is and get the enclosure from Lazada or Shopee. These are fairly cheap . I got mine for around RM25 – RM30 or somewhere there. Mine looked like this:

Plug the new drive into the enclosure.

Once done, just plug in the Clonezilla USB in and reboot and ensure you are booting from USB over your BIOS. This may or may not be tricky if you are booting with UEFI with secure boot, and you may need to disable it and use legacy boot to get the USB to boot for now.

Once booted, you are welcomed to the tiny Clonezilla OS, which is Debian I believe.

Go ahead with the default settings, but now you can plug in your enclosure USB drive, which Clonezilla should be able to detect as it boots up. It takes less than a minute and we are now dump into the main screen. Select the keyboard settings as it is, and start Clonezilla.

Select device-device as this is what you are looking at. I would opt to select “Expert mode” as this provides you with more options as you are cloning to a larger storage. If you are doing a direct mirror to a similar sized storage, using beginner mode may be less work.

Select Disk to Local Disk and at this point Clonezilla should be able to see a few things: your current drive and the drive that is plugged in using the enclosure. As in all things, you need to be wary which is which and not copy it wrongly.

Select your current drive as source and the enclosure drive as the destination.

In the expert mode, you have quite a fair bit of options. Just make sure -r option is there as this resizes the filesystem automatically and in the other part of expert parameters to select -k1 which is to create partition table proportionally.

If you are sure that the drive is fine and you are just doing a backup, then skip the checks for errors of the source file system which would save some time. If you do have errors, my suggestion would be to go to DDRescue option as opposed to Clonezilla. I tried Clonezilla on a dying drive and it didn’t work too well.

Finally, select what you’d like to do once the whole thing is done, to reboot, shutdown etc.

We later have a whole bunch of funny excerpt to confirm if we actually know what we are doing. Because I was cloning the OS drive, I cloned the boot loader along with it.

And finally we are off. It takes only a short while – I guess less than half an hour to get it done.I didn’t take time of it, I just let it run and did my own things and within an hour checked again and found it was completed. Based on the average rate and my 250GB, it’s roughly around 20 minutes or so.

Once that is completed, shut down your laptop, take out the old SSD and swap it with a new one and voila you now have a new expanded SSD on your OS drive. You can do frequent backups to your laptop as well by doing the same method – but getting a same size SSD – and it would save you years of trouble (if you don’t change your laptop). Even if you upgrade your laptop, you can still use the backup as an external USB drive and copy the data there on your own time.

If you plugin the enclosure that had your previous OS image, you may get the problem of Windows making it offline due to signature collisions. Just right-click and select Online for the drive and you should be good to go.

Well, that’s it.

The reason for this article is that it’s frustrating working through software like Aomei Partition or Minitool (latest version) Partition Wizard etc because there are so many things that cannot work – e.g installing Windows PE, going into Pre-Os mode and in Aomei Partition, just hanging my laptop and not moving forward. The preference is just to get a simple solution that works without forcing us to buy professional versions etc or wasting our time with software gymnastics that we don’t need. Clonezilla (or the even better DDRescue) would be the go-to software for this.

Now, stay safe, and get your kids down from the ceiling fan!

The Trouble with UASP

A break from our compliance articles, here is a simple hack for a problem plaguing us for some time. Our backup program is fairly comprehensive…we use a lot of different backup options. We are of the philosophy that you can never back up enough. Never. And anyway, PKF Malaysia is pretty big. We have offices in KL, Penang, Ipoh, KK, Sandakan, KB, Cambodia under us. That’s a lot of data. One of the backup options we use is automated to large external USB drives for archival.

While not the primary means of backup, it’s still pretty frustrating when we faced a strange issue with two of our external drives acting up all of a sudden. We don’t know why. They were working well enough, but suddenly, just stopped functioning well. Like a teenage kid. What happened was, we found that backups would regularly hang to the external drives (we have 2 of the same brand). At first, we thought it was just a connection issue (USB failing etc) and we took them out and tried it on various Windows servers (not *nix as we didn’t want to mess around with those, as they were critical). The idea anyway was to have these function on Windows only. Same issue was observed on all Windows servers and even on workstations and laptops.

The issue was simply excruciatingly slow transfer speeds and a graph looking similar to this:

Now that doesn’t look like a healthy drive.

What is happening here was that it would start whirring up once we tried to transfer files and the active time of the disk would be 1%. In a second, it would hit 50% and then within 2-3 seconds hit 100%. At that moment, the transfer speed had probably crawled up to around 15-20 MB/s and it would drop again to zero as the disk starts to recover. And it would happen again and again. This made data transfers impossible, especially big backups we did.

Troubleshooting it, after we tried updating firmware to no success:

a) Switched cables for the drives: Same issue

b) Switched servers / workstations : Same issue. We note we didn’t try it on *nix because those were critical to us but as an afterthought, it was probably something we could have done to isolate if the problem was the drive or the OS.

c) Switched other external drives: Works fine. So we kept one of these older drives as a control group so we know what is working

d) Switched USB3.0 to USB2.0: This is where we began to see something. The USB2.0 switch made the drives work again. However, transferring at USB2.0 speeds wasn’t why we purchased these drives anyway, but again, this is a control group, which is very important in troubleshooting as it provides a horizon reference and pinpoint to us where the issue is.

We ran all the disk checks on it and it turns out great. Its a healthy drive. So what is the issue?

We opened a support ticket to the vendor of this drive and surprisingly got a response within 24 hours asking us the issue asking us for the normal things like serial number, OS, error screens , disk management readings etc. So we gave and explained everything and then followed up with:

We have already run chkdsk, scannow, defrag etc and everything seems fine. Crystaldisk also gave us normal readings (good) so we don’t think there is anything wrong with the actual drive itself. We suspect it’s the UASP controller – as we have an older drive running USB (Serial ATA) which was supposed to be replaced by your disk – and it runs fine with fast transfer speed. We suspect maybe there is to do with the UASP connection. Also we have no issues with the active time when running on the slower 2.0 USB. Is there a way to throttle the speed?

We did suspect it was the UASP (USB Attached SCSI Protocol) that this drive was using. Our older drives were using traditional BOT (Bulk Only Transport) and the USB3.0 was working fine. Apparently UASP was developed to take advantage of the speed increases of USB3.0. I am not a storage guy, but I would think UASP is able to handle transfers in parallel while BOT handles in sequential. I would think UASP is like my wife in conversation, taking care of the kids, cooking, looking at the news, answering an email to her colleague, sending a whatsapp text to her mum and solving world hunger. All at once. While BOT is me, watching TV, unable to do anything else until the football game is over. That’s about right. So it’s supposed to be a lot faster.

How UASP works though is it requires both the OS and the drive to support it. The USB controller supporting this in Windows is the uaspstor.sys. You can check this when you look at the advanced properties in device manager and clicking on the connected drive or through disk management screen and view the detailed properties. Interestingly, our older drives loaded usbstor.sys which is the traditional BOT.

UASP is backward compatible to USB2.0, so we don’t really know why USB2.0 worked while USB3.0 didn’t. The symptoms were curiously the same. Even at USB2.0, the active time would ramp up, but we think, because USB2.0 didn’t have enough pipe to transfer, the transfer rates were around 30 MB/s and the active time of the drive peaked around 95%. So because it never hit the 100% threshold for the drives to dial down, we don’t see the spike up and down like we see in USB3.0. However, active time of 90+% still isn’t that healthy, from a storage non-expert perspective.

The frustrating thing was, the support came back with

Thank you very much for the feedback. Regarding to the HDD transferring issue, we have to let you know that the performance of USB 3.0 will be worse when there is a large amount of data and fragmented transmission. The main reason is the situation caused by the transmission technology not our drive. For our consumer product, if the testing result shown normal by crystaldiskinfo, we would judge that there is no functional issue. Furthermore, we also need to inform you that there is no UASP function design for our consumer product. If there are any further question, please feel free to contact us.

One of the key things I always tell my team is that tech support, as the first line of defence to your company MUST always know how to handle a support request. The above is an example how NOT to be a tech support.

Firstly, deflecting the issue from your product. Yes, it may be so that your product is not the issue. But when a customer comes to you asking for help, the last thing they want to hear is, “Not my problem, fix it yourself.” That’s predominantly how I see tech support, having worked there for many years ourselves. We have a secret script where we need to segue the complaint to where we are no longer accountable. For instance – did you patch to latest level? Did you change something to break the warranty? Did you do something we told you not to do in one of the lines amongst the trillion lines in our user license agreement? HA!

So no, don’t blame the transmission technology. Deflecting the issue, and saying the product is blameless is what we in tech support call the “I’ll-do-you-a-favor” manoeuvre. Because here, they establish that since they are not obligated to assist anymore, any further discussion on this topic is a ‘favor’ they are doing for you and they can literally exit at any point of time. It makes tech support look good when the issue is resolved and if the going gets too tough, it’s not too hard to say goodbye.

Secondly, don’t think all your clients are idiots. With the advent of the internet, the effectiveness of bullshitting has decreased dramatically from the times of charlatans and hustlers peddling urine as a form of teeth whitening product. They really did. Look it up. Maybe, a full drive would have some small impact to the performance. But a quick look at the graph shows a drive that is absolutely useless, not due to a minuscule performance issue but to an obvious bigger problem. I mean how is it that we can make a logic that once it reaches a certain percentage of drive storage it is rendered useless?

Lastly, know your product. Saying there is no UASP function is like us telling our clients PCI-DSS isn’t about credit cards but about, um, the mating rituals of tapirs. The tech support unfortunately did not bother to get facts correct, and the whole response came across as condescending, defensive and uninformed.

So now, we responded back:

I am not sure if this is correct, as we have another brand external drive working perfectly fine with USB3.0 transmission rates, whether its running a file or transferring a file to and from the external drive. Your drive performance is problematic when a file was run directly from the drive, and also transmission to copy file TO the drive. As we say, the disk itself seems ok but regardless, the disk is not usable when connecting to our USB 3.0 port, which most of our systems have, that means the your external drive can only work ok for USB 2.0. We suggest you to focus the troubleshooting on the UASP controller.

Further on, after a few back and forth where they told us they will recheck we responded

Additional observation: Why we don't think its a transmission problem, aside from the fact other drives have no issues, is that when we run a short orientation video file from your drive, your drive active time ramps up to 100% quickly - it goes from 1% - 50% - 100%, then the light stops blinking, and it drops back to 1% again, and it ramps up again.  This happens over and over. We switched to other laptops and observed the same issue. On desktops as well, different Windows systems.  What we don't want to do is to reformat the whole thing and observe, because, really, the whole reason for this drive is for us to store large files in it as a backup, and not have a backup of backup. We have also switched settings to enable caching (and also to disable it) - same results. The drives are in NTFS and the USB drivers have been updated accordingly. We have check the disk, ran crystaldisk, WMIC, defrag (not really needed as these are fairly new), but all with same results.  We dont find any similar issues online, of consistent active time spikes like we have shown you, so hopefully support can assist us as well.  

After that, they still came back saying they needed more time with their engineers and kept asking us whether other activities were going on with the server and observing the disk was almost full. We did a full 6 page report for them, comparing crystaldiskinfo results of all our other drives (WD, Seagate) and point out specifically their disks were the ones having the spike issues and requested them again to check the controllers.

After days of delay, stating their engineers were looking into it, and our backups were stalled they came back with:

Regarding to the HDD speed performance, we kindly inform you that the speed (read/write) of products will be limited by different testing devices, software, components and testing platforms. The speed (read/write) of products is only for reference. From the print screen that you provide,  we have to let you know that the write speed performance is slow because there are data stored in this drive. Kindly to be noted that the speed may vary when transferring huge data as storage drive or processing heavy working load as storage drive. Besides, please refer to the photo that we circled in red, both different drive have different data percentage.  For our drive, the capacity is near full, then it is normal to see the write speed performance slower than other drive.

This response was baffling . It wasn’t just slow. It was unusable because there is data stored in this drive? Normal to see write speed slower?

After almost a week and half talking to this tech support, they surmise (with their engineers) that their drives cannot perform because there is data stored in it. It makes one wonder then why are disk drives created if not to store data.

Our final response was:

We respectfully disagree with you. Your drive is unusable with those crystaldisk numbers and I am sure everyone will agree to that. You are stating your drive is useless once it starts storing data, which is strange since your product is created to store data. Whether the drive is half full or completely full is not the point, we have run other drives which are 95% full and which are almost 100% full with no issues. Its your drive active time spiking up to 100% for unknown reasons, and we have not just one but two of your drives doing this. We have insisted you to look at the controller but it seems you have not been able to troubleshoot that.  I believe you have gone the limit in your technical ability and you are simply unable to give anymore meaningful and useful support, and I don't think there's anything else you can do at that will be of use for us .We will have to mark it out as a product that we cannot purchase and revert back to other drives for future hard drive purchase and take note of your defective product to our partners. 

And that was the end. Their tech support simply refused to assist and kept blaming a non-existent event (data are stored in the disk), which had absolutely zero logical sense. It was a BIZARRE tech conclusion that they came to and an absolute lesson of what not to do for tech support.

That being said, we temporary resolved the issue with a workaround by disabling one of our servers using UASP and force it to use BOT. This hack was taken from this link, so we don’t get the credit for this workaround.

Basically you go to C:\Windows\System32\drivers and just rename uaspstor.sys to uaspstor.sys.bk or whatever to back it up. Then copy usbstor.sys to uaspstor.sys.

Depending on your system, you might have to do this in safe mode. We managed to do it without. Reboot and plugged in the troublesome drive and now it works. Some other forums says to go to the registry and basically redirect the UASPStor Imagepath to usbstor.sys instead of uaspstor.sys. However, this is problematic as when you try to plugin a traditional external drive using BOT, it doesn’t get recognised, because we think that the usbstor.sys is locked for usage somehow. So having a usbstor.sys copied to uaspstor.sys seems to trick Windows into using the BOT drivers instead while thinking it’s using this troublesome UASP driver.

Now obviously this isn’t a solution, but it’s a workaround. For us, we just plug these drives into one of the older servers and used it as a temporary backup server until we figure out this thing in the long run.

But yeah, the lesson was really in our interaction with tech support and hopefully we all get better because of this! For technical solutions and support, please drop us a email or the comment below and we will get to you quickly and in parallel (not sequentially)!

« Older posts

© 2020 PKF AvantEdge

Theme by Anders NorenUp ↑