Category: PKF Avant Edge (Page 1 of 13)

The Trouble with UASP

A break from our compliance articles, here is a simple hack for a problem plaguing us for some time. Our backup program is fairly comprehensive…we use a lot of different backup options. We are of the philosophy that you can never back up enough. Never. And anyway, PKF Malaysia is pretty big. We have offices in KL, Penang, Ipoh, KK, Sandakan, KB, Cambodia under us. That’s a lot of data. One of the backup options we use is automated to large external USB drives for archival.

While not the primary means of backup, it’s still pretty frustrating when we faced a strange issue with two of our external drives acting up all of a sudden. We don’t know why. They were working well enough, but suddenly, just stopped functioning well. Like a teenage kid. What happened was, we found that backups would regularly hang to the external drives (we have 2 of the same brand). At first, we thought it was just a connection issue (USB failing etc) and we took them out and tried it on various Windows servers (not *nix as we didn’t want to mess around with those, as they were critical). The idea anyway was to have these function on Windows only. Same issue was observed on all Windows servers and even on workstations and laptops.

The issue was simply excruciatingly slow transfer speeds and a graph looking similar to this:

Now that doesn’t look like a healthy drive.

What is happening here was that it would start whirring up once we tried to transfer files and the active time of the disk would be 1%. In a second, it would hit 50% and then within 2-3 seconds hit 100%. At that moment, the transfer speed had probably crawled up to around 15-20 MB/s and it would drop again to zero as the disk starts to recover. And it would happen again and again. This made data transfers impossible, especially big backups we did.

Troubleshooting it, after we tried updating firmware to no success:

a) Switched cables for the drives: Same issue

b) Switched servers / workstations : Same issue. We note we didn’t try it on *nix because those were critical to us but as an afterthought, it was probably something we could have done to isolate if the problem was the drive or the OS.

c) Switched other external drives: Works fine. So we kept one of these older drives as a control group so we know what is working

d) Switched USB3.0 to USB2.0: This is where we began to see something. The USB2.0 switch made the drives work again. However, transferring at USB2.0 speeds wasn’t why we purchased these drives anyway, but again, this is a control group, which is very important in troubleshooting as it provides a horizon reference and pinpoint to us where the issue is.

We ran all the disk checks on it and it turns out great. Its a healthy drive. So what is the issue?

We opened a support ticket to the vendor of this drive and surprisingly got a response within 24 hours asking us the issue asking us for the normal things like serial number, OS, error screens , disk management readings etc. So we gave and explained everything and then followed up with:

We have already run chkdsk, scannow, defrag etc and everything seems fine. Crystaldisk also gave us normal readings (good) so we don’t think there is anything wrong with the actual drive itself. We suspect it’s the UASP controller – as we have an older drive running USB (Serial ATA) which was supposed to be replaced by your disk – and it runs fine with fast transfer speed. We suspect maybe there is to do with the UASP connection. Also we have no issues with the active time when running on the slower 2.0 USB. Is there a way to throttle the speed?

We did suspect it was the UASP (USB Attached SCSI Protocol) that this drive was using. Our older drives were using traditional BOT (Bulk Only Transport) and the USB3.0 was working fine. Apparently UASP was developed to take advantage of the speed increases of USB3.0. I am not a storage guy, but I would think UASP is able to handle transfers in parallel while BOT handles in sequential. I would think UASP is like my wife in conversation, taking care of the kids, cooking, looking at the news, answering an email to her colleague, sending a whatsapp text to her mum and solving world hunger. All at once. While BOT is me, watching TV, unable to do anything else until the football game is over. That’s about right. So it’s supposed to be a lot faster.

How UASP works though is it requires both the OS and the drive to support it. The USB controller supporting this in Windows is the uaspstor.sys. You can check this when you look at the advanced properties in device manager and clicking on the connected drive or through disk management screen and view the detailed properties. Interestingly, our older drives loaded usbstor.sys which is the traditional BOT.

UASP is backward compatible to USB2.0, so we don’t really know why USB2.0 worked while USB3.0 didn’t. The symptoms were curiously the same. Even at USB2.0, the active time would ramp up, but we think, because USB2.0 didn’t have enough pipe to transfer, the transfer rates were around 30 MB/s and the active time of the drive peaked around 95%. So because it never hit the 100% threshold for the drives to dial down, we don’t see the spike up and down like we see in USB3.0. However, active time of 90+% still isn’t that healthy, from a storage non-expert perspective.

The frustrating thing was, the support came back with

Thank you very much for the feedback. Regarding to the HDD transferring issue, we have to let you know that the performance of USB 3.0 will be worse when there is a large amount of data and fragmented transmission. The main reason is the situation caused by the transmission technology not our drive. For our consumer product, if the testing result shown normal by crystaldiskinfo, we would judge that there is no functional issue. Furthermore, we also need to inform you that there is no UASP function design for our consumer product. If there are any further question, please feel free to contact us.

One of the key things I always tell my team is that tech support, as the first line of defence to your company MUST always know how to handle a support request. The above is an example how NOT to be a tech support.

Firstly, deflecting the issue from your product. Yes, it may be so that your product is not the issue. But when a customer comes to you asking for help, the last thing they want to hear is, “Not my problem, fix it yourself.” That’s predominantly how I see tech support, having worked there for many years ourselves. We have a secret script where we need to segue the complaint to where we are no longer accountable. For instance – did you patch to latest level? Did you change something to break the warranty? Did you do something we told you not to do in one of the lines amongst the trillion lines in our user license agreement? HA!

So no, don’t blame the transmission technology. Deflecting the issue, and saying the product is blameless is what we in tech support call the “I’ll-do-you-a-favor” manoeuvre. Because here, they establish that since they are not obligated to assist anymore, any further discussion on this topic is a ‘favor’ they are doing for you and they can literally exit at any point of time. It makes tech support look good when the issue is resolved and if the going gets too tough, it’s not too hard to say goodbye.

Secondly, don’t think all your clients are idiots. With the advent of the internet, the effectiveness of bullshitting has decreased dramatically from the times of charlatans and hustlers peddling urine as a form of teeth whitening product. They really did. Look it up. Maybe, a full drive would have some small impact to the performance. But a quick look at the graph shows a drive that is absolutely useless, not due to a minuscule performance issue but to an obvious bigger problem. I mean how is it that we can make a logic that once it reaches a certain percentage of drive storage it is rendered useless?

Lastly, know your product. Saying there is no UASP function is like us telling our clients PCI-DSS isn’t about credit cards but about, um, the mating rituals of tapirs. The tech support unfortunately did not bother to get facts correct, and the whole response came across as condescending, defensive and uninformed.

So now, we responded back:

I am not sure if this is correct, as we have another brand external drive working perfectly fine with USB3.0 transmission rates, whether its running a file or transferring a file to and from the external drive. Your drive performance is problematic when a file was run directly from the drive, and also transmission to copy file TO the drive. As we say, the disk itself seems ok but regardless, the disk is not usable when connecting to our USB 3.0 port, which most of our systems have, that means the your external drive can only work ok for USB 2.0. We suggest you to focus the troubleshooting on the UASP controller.

Further on, after a few back and forth where they told us they will recheck we responded

Additional observation: Why we don't think its a transmission problem, aside from the fact other drives have no issues, is that when we run a short orientation video file from your drive, your drive active time ramps up to 100% quickly - it goes from 1% - 50% - 100%, then the light stops blinking, and it drops back to 1% again, and it ramps up again.  This happens over and over. We switched to other laptops and observed the same issue. On desktops as well, different Windows systems.  What we don't want to do is to reformat the whole thing and observe, because, really, the whole reason for this drive is for us to store large files in it as a backup, and not have a backup of backup. We have also switched settings to enable caching (and also to disable it) - same results. The drives are in NTFS and the USB drivers have been updated accordingly. We have check the disk, ran crystaldisk, WMIC, defrag (not really needed as these are fairly new), but all with same results.  We dont find any similar issues online, of consistent active time spikes like we have shown you, so hopefully support can assist us as well.  

After that, they still came back saying they needed more time with their engineers and kept asking us whether other activities were going on with the server and observing the disk was almost full. We did a full 6 page report for them, comparing crystaldiskinfo results of all our other drives (WD, Seagate) and point out specifically their disks were the ones having the spike issues and requested them again to check the controllers.

After days of delay, stating their engineers were looking into it, and our backups were stalled they came back with:

Regarding to the HDD speed performance, we kindly inform you that the speed (read/write) of products will be limited by different testing devices, software, components and testing platforms. The speed (read/write) of products is only for reference. From the print screen that you provide,  we have to let you know that the write speed performance is slow because there are data stored in this drive. Kindly to be noted that the speed may vary when transferring huge data as storage drive or processing heavy working load as storage drive. Besides, please refer to the photo that we circled in red, both different drive have different data percentage.  For our drive, the capacity is near full, then it is normal to see the write speed performance slower than other drive.

This response was baffling . It wasn’t just slow. It was unusable because there is data stored in this drive? Normal to see write speed slower?

After almost a week and half talking to this tech support, they surmise (with their engineers) that their drives cannot perform because there is data stored in it. It makes one wonder then why are disk drives created if not to store data.

Our final response was:

We respectfully disagree with you. Your drive is unusable with those crystaldisk numbers and I am sure everyone will agree to that. You are stating your drive is useless once it starts storing data, which is strange since your product is created to store data. Whether the drive is half full or completely full is not the point, we have run other drives which are 95% full and which are almost 100% full with no issues. Its your drive active time spiking up to 100% for unknown reasons, and we have not just one but two of your drives doing this. We have insisted you to look at the controller but it seems you have not been able to troubleshoot that.  I believe you have gone the limit in your technical ability and you are simply unable to give anymore meaningful and useful support, and I don't think there's anything else you can do at that will be of use for us .We will have to mark it out as a product that we cannot purchase and revert back to other drives for future hard drive purchase and take note of your defective product to our partners. 

And that was the end. Their tech support simply refused to assist and kept blaming a non-existent event (data are stored in the disk), which had absolutely zero logical sense. It was a BIZARRE tech conclusion that they came to and an absolute lesson of what not to do for tech support.

That being said, we temporary resolved the issue with a workaround by disabling one of our servers using UASP and force it to use BOT. This hack was taken from this link, so we don’t get the credit for this workaround.

Basically you go to C:\Windows\System32\drivers and just rename uaspstor.sys to uaspstor.sys.bk or whatever to back it up. Then copy usbstor.sys to uaspstor.sys.

Depending on your system, you might have to do this in safe mode. We managed to do it without. Reboot and plugged in the troublesome drive and now it works. Some other forums says to go to the registry and basically redirect the UASPStor Imagepath to usbstor.sys instead of uaspstor.sys. However, this is problematic as when you try to plugin a traditional external drive using BOT, it doesn’t get recognised, because we think that the usbstor.sys is locked for usage somehow. So having a usbstor.sys copied to uaspstor.sys seems to trick Windows into using the BOT drivers instead while thinking it’s using this troublesome UASP driver.

Now obviously this isn’t a solution, but it’s a workaround. For us, we just plug these drives into one of the older servers and used it as a temporary backup server until we figure out this thing in the long run.

But yeah, the lesson was really in our interaction with tech support and hopefully we all get better because of this! For technical solutions and support, please drop us a email or the comment below and we will get to you quickly and in parallel (not sequentially)!

PCI-DSS and Vendors

One of the things that advisors and consultants do, as part of our journey to get our clients to comply to PCI-DSS is the inevitable (and unenviable) task of dealing with vendors. A vendor can be classified as anyone or any company that is selling a service or a product to our client, that directly or indirectly relates or affect their PCI-DSS compliance. Examples include firewall vendors, encryption technology vendors, HSM vendors, server vendors, Virtual solution vendors, SIEM vendors, SOC providers, call center solution vendors, telemarketing services, hosting providers, cloud providers and the list goes on. Having dealt with hundreds of vendors over the course of the decade we have come across all kinds: some are understanding, some are hostile, some are dismissive, some are helpful and the list goes on.

But there is always a common denominator in vendors: They all start by justifying why their product or service is:

a) Not relevant to PCI-DSS compliance (because they don’t store card data, usually)

b) Why their product is PCI acceptable (but it’s really not, or when we have questions on certain aspects of it)

It always begins with these two start points and it can then branch off into a myriad of different plots, twists, turns and endings, very much like a prolonged Korean drama.

With this in mind, we recently had an interesting call with one of such vendor, who basically runs a fairly important PCI subsystem for one of our clients. The problem was that their logs and console had two things that we sometimes find: the combination of truncated and hashed values of a credit card information, grouped together.

Now, just a very quick recap:

a) Truncated card data – This is where you see parts of the card replaced by XXXX characters (or any character) where the full card number is not STOREDNow it must be noted that TRUNCATED and MASKED are treated differently, although oftentimes confused and used interchangeably. When we say something is MASKED, it generally means the PAN (Primary account number) is stored in full but not displayed in full on the console/application etc. This applies sometimes to call centers or outsourced services where full PAN is not required for back office operations but for reconciliation or references. TRUNCATED here means even in storage, the full PAN is not present.

b) Hashed Card Data – Hashing means its a one-way transformation of card data into a hash value with no way to reverse it (Unlike encryption). If we use a SHA-256 hash algorithm on a PAN, you get a fixed result. Fraud management systems may store this hash PAN in order to identify transactions by that particular card (after hashing), and not worry about the actual card data being stored. It’s like hashing of passwords where the actual password isn’t known.

It’s to be noted, when done properly, these two instances of data may even be considered entirely out of scope of PCI-DSS. The problem here is when you have both of these stored together and correlated together, it renders the data protection weaker than just having one control available. This is probably where the concept usually gets lost on clients implementing these controls, as we have seen many times before – for example, tokenized information being stored together with truncated values.

Even PCI-DSS itself states clearly in the standard item 3.4 in the Note, that “Where hashed and truncated versions of the same PAN are present in an entity’s environment, additional controls must be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.”

To clarify, it doesn’t mean that it CANNOT be done, but additional controls must be in place. A further look at this is found in the Tokenization Product Security Guidelines Supplementary document:

IT 1A-3.b: Verify that the coexistence of a truncated PAN and a token does not provide a statistical advantage greater than the probably of correctly guessing the PAN based on the truncated value alone.

Further on:

…then the vendor provides documentation to validate the security strength (see Annex C – Minimum Key Sizes and Equivalent Key Strengths for Cryptographic Primitives) for each respective mechanism. The vendor should provide a truncated PAN and irreversible token sample for each.

And furthermore in Tokenization_Guidelines_Info_Supplement.pdf:

Note: If a token is generated as a result of using a hash function, then it is relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of the PAN. Where hashed and truncated versions of the same PAN are present in the environment, additional controls should be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN.

So in short, at anytime we see there are hashed values and truncated value together, we need to validated further on the controls. A good writeup is found here at another blog which summarises the issues surrounding this.

However, as our call with this particular vendor continued on, he demonstrated just how vendors should or should NOT approach PCI-DSS compliance, which sort of inspired this post:

A) DON’T place yourself as the topical expert in PCI-DSS: Don’t. Not because you are not, but because you are representing a product or a service, so you always view certain things through a lense you have been trained on. I know, because I was with vendors for many years and most of our consultants are from vendor backgrounds. He immediately started by stating, he is extremely well verse with section 3.4 of PCI-DSS (which basically talks about the 4 options of protecting card holder data stored), and that he has gone through this conversation many times with consultants. This immediately sends the QSA red flags, once the vendor starts moving away from what they know (their product) to what they may think they know but generally may not (PCI-DSS), and in general we don’t want to put the auditor on defence once vendors sound defensive. It should be collaborative. DO state clearly that we are subject matter experts in our own field and we are open to discussions.

B) DON’T recover by going ‘technical’: In his eagerness to demonstrate his opinion on PCI, he insisted that we all should know what 3.4 is about. Concerning the four controls stated in PCI-DSS (token, truncation, hashing, encryption), he claimed that their product is superior to what we are used to because his product has implemented 3 out of 4 of these controls (hashing, truncation and encryption) and he claims this makes it even more compliant to PCI-DSS. At this point, someone is going to call you out, which is what we did reluctantly as we were all staring at each other quizzically. We had to emphasize we really can’t bring this to the auditor or justify this to our client who was also on the call, as this is an absolute misinterpretation of PCI-DSS, no matter what angle you spin. PCI never told us to implement as many of these options as possible. In fact, clearly stating if more than one of these are introduced, extra care must be taken in terms of controls that these cannot be correlated back to the PAN. We told him this was a clear misinterpretation to which his response was going into a long discourse of where we consultants were always ‘harping’ on impractical suggestions of security and where we always think it’s easy to crack hashes just because we know a little bit about ‘rainbow tables’. We call this “going technical”. As Herman Melville, the dude that wrote Moby Dick puts it:

“A man of true Science uses but few hard words and those only when none others will serve his purpose; whereas the smatterer in Science… thinks that by mouthing hard words he understands hard things”. – Dude that wrote Moby Dick.

Our job is really to uncomplicate things and not to make it sound MORE complicated, because there may always be someone in the room (or video conference) who knows a little more than what they let on.

DO avoid jargonizing the entire conversation as it is very awkward for everyone, especially for those who really know the subject. DO allow input from others and see from the point of view of the standard, whether you agree or disagree or not and keep in mind the goal is common: to make our client compliant.


C) DO find a solution together. As a vendor, we must remember, the team is with the client. The consultant is (usually) with the client. So its the same team. A good consultant will always want vendors to work together. We always try to work out an understanding if vendors cannot implement certain things, then let’s see what we can work on, and we can then talk to the the QSA and reason things out. Compensating controls etc. So the solution needs to be together, and finally, after all those awkward moments of mansplaining everything to us, we just went: “OK, let’s move on, these are the limitations, let’s see where the solution is.” And after around 5 minutes or so, we had a workaround sorted out. Done. No need to fuss. So next step is to get this workaround passed by the auditor for this round and if not, then we are back again to discuss, if yes, then done, everyone is moving out to other issues. Time is of essence, and the last thing we need is each of us trying to show the size of our brains to each other.

D) Don’t namedrop and look for shorter ways to resolve issues. One of the weirdest thing that was said in the conversation after all our solution discussion was when the vendor said that he knew who the QSA was and he dropped a few names and said, just tell the QSA it’s so and so, and we’ve worked together and he will understand. Firstly, it doesn’t work like that. Namedropping doesn’t allow you to pass PCI. Secondly, no matter how long you have worked with someone, remember, another guy in the room may know that someone longer than you. We’ve been working with the QSA since the day they were not even in the country and for a decade, so we know everyone there. If namedropping was going to pass PCI, we would be passing PCI to every Tom, Dick, Harry and Sally around the world. No, it doesn’t work that way, we need to resolve the issues.

So there you have it. This may sound like a rant, but the end of the conversation was actually somewhat amicable. Firstly, I was genuinely appreciative of the time he gave us. Some vendors don’t even get to the table to talk and the fact that he did, I really think its a good step forward and made our jobs easier. Secondly, we did find the workaround together and that he was willing to even agree to a workaround, that’s a hard battle won. Countless vendors have stood their ground and stubbornly refused to budge even when PCI non-compliance was screaming at their faces. Thirdly, I think, after all the “wayang“, I believe he actually truly believed in helping our client and really thought that his product was actually compliant in all aspects. Of course, his delivery was awkward, but the intention was never to make life difficult for everyone, but to be of assistance.

At the end, the experience was a positive one, given how many discussions with vendors go south. We knew more of their solution, we worked out a solution together and more importantly, we think this will pass PCI for our client. So everyone wins. In this case, the Korean Drama ended well!

For more information on PCI-DSS, drop us a line at pcidss@pkfmalaysia.com and we will get back to you immediately! Stay safe!

Clarifying ASV Scans

It has been a while since our last post but as we are getting back up to speed to restart our work, our email engines are churning again with a lot of queries and questions from clients and the public on PCI-DSS, ISMS, ITSM, GDPR matters. We even have an odd question or two popping up regarding COVID-19 and how to secure against that virus. I don’t know. It’s a multi-billion dollar question which nobody can answer.

So while all these things are going, the one relentless constant we are still facing is: PCI-DSS deadlines. Despite the worldwide pandemic, we still get clients telling us they need to get their certificate renewed, their ASV scans done, their penetration testing sorted within x number of days. The reality of course is a bit more difficult. For example, once you have tested or scan, how does one remediate the issue when we cannot even get onsite to do proper testing? What about the development team, or the patching process, or the testing procedures and change management that needs to be done? The reality is simply, due to the pandemic, DELAYS will occur.

One of the main concerns are ASV scans, because ASV scans need to be done quarterly, there may be actual issues in remediation delays that may cause the company to miss the quarter.

How do we overcome this?

The main step is to always check with your QSA on this. I cannot repeat this ENOUGH. An organisation undergoing PCI-DSS, no matter what your size, especially if you are undergoing QSA certified program (Level 1 or Level 2 SAQ signoff from QSA) – ENGAGE your QSA to assist you. The QSA isn’t just supposed to come in at the end of your certification cycle, start poking holes into all your problems and tell you – you can’t pass because you missed our your internal VA back in Quarter 1. Or state your segmentation testing is insufficient at the end of your certification cycle. Or tell you that your hardening procedures are inadequate, with 1 month left to your certification cycle. The QSA needs to be in engagement at all times – or at the very least on a quarterly basis. Get them to do a healthcheck for you – all QSAs worth their salt should be able to do this. The mistake here is to treat your QSA as just an auditor and not onboard them throughout your certification cycle. An example is in the supplementary document from the council “Penetration-Testing-Guidance-v1_1” shows the possible involvement of the QSA:

In order to effectively validate the segmentation methodologies, it is expected that the penetration tester has worked with the organization (or the organization’s QSA) to clearly understand all methodologies in use in order to provide complete coverage when testing.

Pg 10 PCI Data Security Standard (PCI DSS) v1.1

It’s essentially critical to understand the relationship the QSA must have and the involvement they have, especially in the scoping part of PCI-DSS. The problem we often see is there is a disconnect between the company and their QSAs in terms of scope, or expectation, or evidences, which generally leads to A. LOT. OF. PAIN.

For ASV scans, a QSA may also provide ASV services provided these are properly controlled that there is proper segregation of duties and independence within the QSA/ASV company itself.

However, we have also done many companies whereby we provide the ASV scan and another QSA does the audit. Or the other way where we provide the QSA audit, and ASV is done by another company.

There is one example whereby we were auditing a company, and the ASV scans were done by another firm. We have been engaged from the start on a quarter basis and we highlighted to them that their Q1 ASV scan isn’t clean. We got on a call with the ASV company and worked together to ensure that the next quarter, these non compliant items would be remediated. So even with Q1 ASV not passed, at the end as QSA we still accepted the PCI recertification. PCI Council addressed this in FAQ 1152 – “Can an entity be PCI DSS compliant if they have performed quarterly scans, but do not have four “passing” scans?”

Without early engagement of the QSA and ASV, there would be a lot of problems once the recert audit comes around. In this case we could set the proper expectation early in the cycle for the customer to address.

Another possible instance is whereby the ASV themselves can pass a quarter scan with non compliant findings with compensating controls. This procedure is detailed out in section 7.8 of the ASV program guide, whereby within the quarter scan itself, before the expiry of that quarter, compensating controls are provided and validated and the ASV is able to issue an acceptable report for that quarter. This is important, because QSAs like to see 4 quarterly clean reports, and they throw a tantrum if they don”t get what they want. So in short, for ASV scans, do the following in this order:

a) Remediate all and get a clean report for the quarter; or

b) If you have non compliant for the quarter, engage your ASV, provide acceptable compensating controls, and attempt (not influence) with the ASV to accept/validate these controls and provide a clean report for the quarter but documented under Appendix B of the scan report summary; or

c) If for whatever reason, a clean report cannot be provided for the quarter, work closely with the ASV and the QSA to ensure that at least the next quarter or quarter after next remediation is correctly done. This is tricky because once the quarter report is out, it’s out of the ASV’s hands and into the QSA – on whether they can accept these reports or not. You can hang on to FAQ 1152 – but remember, FAQs are NOT the standard, so you are essentially in the hands of the QSA.

Those are your options for ASV, if there are any delays. DO NOT, in ANY CIRCUMSTANCE, MISS Your quarterly scan. Missing your scan is NOT THE SAME as getting a non compliant report. Missing your scan means there is no recourse but to delay your certification until you can get your 4 quarters in.

Finally before we sign off – let’s clarify here what a ‘quarter’ means. Some clients consider ‘quarterly’ scans to be their actual calendar year quarter. No. It’s not. Essentially a quarter is 3 months of a cycle of 12 months compliance year. A compliance year is not your calendar year (it could be, but it doesn’t have to be). So let’s divide this into two scenarios:

a) Where the ASV scans are required for the compliance year

In the case – the compliance year first needs to be defined, and this is usually done by identifying the signoff date of your AoC. For example if the QSA signed off your certification on April 1st, then that is where your quarter 1 begins. April – June; July – September; October – December; January – March. 4 quarters. You need to perform your ASV scan within the quarter, resolve the issues, and get the clean report out. This is CRITICAL to understand. Because many organisation fail this portion where they do not even perform any scans for the first few quarters and only pick up their PCI-DSS again mid way through and everyone is like: “Oops.” So while drinks and celebration are in the works once you signoff the AoC – your quarter 1 has also begun, so don’t drink too much yet.

So know your quarters. Start your scan early in the quarter, rescans must be done after remediation, and in case you need compensating controls, you need to get ALL THESE DONE within the quarter. If you perform your rescans in the next quarter, you are doomed. You MAY perform the rescan in this quarter and the clean report comes out next quarter for the current quarter – but all scans must be done within the quarter itself.

a) Where we have NO clue when the quarters are

As funny as this may sound (in a tragic way), there are many instances where we (wearing the ASV hat) gets plopped into situations where the client HAS NO CLUE when their compliance quarters are. I don’t know why this occurs. When I request them to check their AoC, or their QSAs for guidance, some can’t provide it. This is as great a mystery as the Sphinx itself. We call these internally, ‘Orphaned ASV scans’. These are projects where we are given the IPs and just told to shut up and scan the IPs. In this case because we onboard all ASV scans with quarters to define when we need to remind our customers, or escalate issues if the quarter runs out – we generally just use the date of the scan as a reference for quarters. So for instance, we provide a clean scan on April 31st. Since they are orphaned scans, without a compliance year/cycle for reference, we use the date of the scan report itself – meaning this scan expires 31st July.

By and large, we are seeing less and less of these orphaned ASV scans issues. Because QSAs these days are more engaged with customers and their customer service has also improved, it’s rare we find a client who isn’t aware of these cyclical requirements. Most clients, not just the large ones, are serviced by QSAs who themselves are reinventing themselves not just as auditors coming in once a year to observe and audit, but provide separate, independent units/consultants to assist healthchecks and support as well to enquiries pertaining to clients.

And a final note on this article – when we refer to ‘QSA’ or ‘ASV’ under our umbrella, we mean ControlCase International (QSA and ASV), whom PKF have been working with for close to a decade. As to why we do not want to become QSAs ourselves, we take independence and segregation of audit and operations seriously, as accounting and audit is our DNA. An article has been written at lenght on this:
http://www.pkfavantedge.com/it-audit/pci-dss-so-why-arent-we-qsa/

So – drop us a note at pcidss@pkfmalaysia.com for any queries on ASV scans, PCI-DSS or compliance in general. And no, we don’t know how to solve the resolve the Coronavirus yet, but I hope we get there soon. Stay safe and stay well!

The Art of Compensating Controls

In our many advisories over the years with companies going through PCI, some of the challenges we face include this little thing called compensating controls.

It’s a PCI term in many sense. Same as ‘segmentation penetration testing’. It’s when you can’t address the actual control in PCI-DSS for a reason and you need to put other controls to address the ‘spirit’ of the control. The spirit means: why did PCI-DSS put that control in, in the first place?

Immediately, this becomes an outlet of some of the greatest creativity ever existed in our technology field. If everyone involved can channel such creativity of controls into developing innovative tech, we would be far ahead achieving the technology vision 2020 that our country is lagging behind in. As they say: Necessity becomes the mother of invention. So everyone starts inventing controls.

Unfortunately such invention of controls in thin air will likely face a bounced rejection from the QSA which would further cause stress due to the impending deadline and the admission that the compliance will be delayed. So it’s in everyone’s interest that compensating controls are done correctly from the beginning.

Now this article’s purpose is not to go through the whole list of what compensating controls are. There are already thousands of articles that do that. Suffice to say, a compensating control is when you cannot meet the PCI requirements and you still want to be compliant.

So to make it simple:

a. Write which control you can’t address. It may be logging and monitoring, complex password etc.

b. Why can’t you do it? Some good reasons are that the legacy system that runs on the factory store churning out a thousand printed statements a minute, that cannot be patched due to patches no longer being produced. A bad reason is, Bob is new to the company and has no clue what does patching means.

c. Risk Analysis. This isn’t something natural many companies do for IT, but it has to be done. If the system cannot be patched what are the risks? Infection of malware? Internal exploit? Availability problem? Loss of power?

d. Document the control. This has to be done. You can’t just go and say, “OK, QSA, trust us, it’s in.” It needs to be detailed procedures on how the organisation will carry out these compensating controls.

e. Ensure it’s implemented and validate it with the QSA. In fact, we would suggest to bring in the QSA early in the process. Since they are the ones validating your controls, it makes sense to onboard them with the fact that you are doing compensating controls. What you may find acceptable to your risk may not be acceptable to the QSA. It’s not a matter of, “Hey, let me bear the risk this year!”. It’s a matter of “If this thing goes south, who else is going to be affected?” – QSA, banks, companies – because credit card information isn’t just the domain of the company handling it – it has upstream and downstream repercussions – from the payment brands, banks, acquirers TPA, service providers to the customers.

To be honest, compensating is a major pain in the butt. There is no way to describe it better. It’s actually worse that the actual controls. Plus its not a given each year – QSA may decide due to next year’s evolving risks your controls are no longer acceptable!

An example here: say you can’t patch your pre-historic system for a good reason. The vendor has since announced they no longer support that system and a new release is imminent in a year’s time and the notice is for all customers to bear down and wait for the release. At this moment it’s completely out of the customer’s hands.

The compensating controls could be

a) Having a documented notice from the vendor that security patching cannot be done

b) Hosting the system in an isolated VLAN that does not have any other CDE/non-cde systems

c) MFA needs for ALL access, not just administrative

d) Firewall rules must be specific to port/source/destination

e) Copy/paste, USB etc are disallowed in said system, and any attempts to do that is logged through DLP.

f) Antimalware must be installed and logs monitored specifically

g) Logging and monitoring needs to be reviewed daily specifically for this system and a report daily separately for incidents to this system

Taking a look at the above examples, these are controls that are considered above and beyond what is necessary for PCI requirements. In short, its a lot harder or more expensive to get compensating controls done than for the actual controls, no matter what you may think.

We once had a conversation with a company who were thinking of switching to us from previous consultants. When asked, we realised that many of their systems were not PCI compliant and they had put in compensating controls. Their compensating controls were: “Mitigation plan is in place to replace these systems in a year’s time.”

That’s not a compensating control. That’s something you plan to do in 365 days time. In the meantime, what are you proposing to lower the risk? That’s your compensating controls.

Don’t use the compensating controls as a get out of jail free card. Any consultant/QSA worth their salt would know how difficult it is to get these controls done and more importantly passed for PCI-DSS. In other words, instead of looking at it as a convenient shortcut or workaround, it should be viewed as the last resort.

Drop us a note at pcidss@pkfmalaysia.com for any queries you may have on PCI-DSS and we will respond to that immediately!

Alienvault USM – Flat File Log Capture – Part 1

We’ve been working with and on Alienvault since the beginning of 2016 and a lot has changed since then. When we started out with Alienvault, they were a small-ish company still, with big ambitions, working with a very technical group out of Cork, Ireland. We had direct access to their technical team (I think even to one engineer) and the amount of knowledge we got from those early days are pretty much invaluable to where we are right now. Of course, Alienvault has changed a lot since then, and now being part of AT&T – for the record, we believe they have the right roadmap to go into cloud with their USM Anywhere concept, and their product right now is much more robust and enterprise ready. They are on the right trajectory.

However, back in the days, for Alienvault USM Appliance (not Anywhere), which is their Appliance offering, we could literally ‘jailbreak’ the system and go into the underlying OS and do stuff to Alienvault that we can’t do anymore in Anywhere. Some of the changes we made were to increase optimisation, put in our own scripts to clean up the system, troubleshoot the system and of course, create plugins for custom applications. We would write custom plugins in 1 – 2 days for multiple applications because of deadlines, I remember and had to do so much in so little time – but we did it anyway. We had to write a plugin for one of the oldest mainframes for a financial institution that was so difficult to interpret, we had to dig up old manuals to sort out the entries for log and events. It was like we were interpreting Egyptian hieroglyphs. But that’s what it took – 2 days, I think because of compliance requirements and customer breathing down our neck to get it done.

Writing plugins was the easier part of the battle – in some old machines or legacy applications, getting the logs was the problem. If Alienvault doesn’t get the logs, it can’t do anything with it. One solution was to leverage on the HIDs (Host IDS), or OSSEC as it was known, to grab log files from systems. It wasn’t so elegant, and we still had to end up writing plugins for it to normalise, but it resolve the issue where application was not able to forward logs to the SIEM, or not able to write the logs to the Windows Event service, or any other way to get logs out to a syslogger. So the solution here is for the application to just write the logs to a file, and Alienvault go ahead and grab this and interpret it. It may not be real time, but it works.

There’s a good write up over in Alienvault at
https://www.alienvault.com/documentation/usm-appliance/ids-configuration/process-reading-log-file-with-hids-agent-windows.htm. So a lot of it is just a repeat and probably an exposition on why we are doing certain things in a certain way.

So the first thing to do here is to ensure that you are able to install HIDs on the server. HIDs will be the key to get this file out to Alienvault. Technically, you could actually use NXLog as well but let’s explore that another time.

Once HIDs is installed, get into the ossec configuration file to define the <localfile> location. Now assuming that you have configured your application to write to a flat file called database.log.txt.

Go ahead and restart OSSEC. That’s pretty much what you need to do to start off so it’s pretty simple.

The rest of it is all done on Alienvault.

To summarise the steps:

Enable “logall” on your USM Appliance. You want to dump whatever you are getting in that flat file database log to a log inside your Alienvault so you can start doing stuff to it. In this case, in your AV User Interface:

Environment > Detection > HIDS > Config > Configuration.

Add <logall>yes</logall> to the <global> section of the file .

You are dumping these logs into /var/ossec/logs/archives/archives.log.

Restart the HIDs service through UI.

You should be able to see new logs coming into archives.log. Just do a tail -f on it, edit the log file (database.log.txt) in your remote system (just write something on it) and see if it appears in your archives.log. Once you see it, you are almost done. Very simple.

So for now, you have customised logs coming into your Alienvault. The next thing to do is to interpret these logs and make sure events are able to be derived from these logs to something that is useful to you!

Drop us an email at alienvault@pkfmalaysia.com for more information on Alienvault or any technical queries you have, and we will attend to it.

« Older posts

© 2020 PKF AvantEdge

Theme by Anders NorenUp ↑