One fine day, we got the error "An error occurred in the secure channel support" when trying to connect an IBM InfoSphere DataStage rich client to its services tier. It took me a long time to resolve, but the debugging path was so interesting that I felt the need to document it here.
One of the teams I work with uses IBM InfoSphere DataStage for Business Intelligence (BI). Here's our software stack:
- Windows Server 2012 R2 on Private Cloud
- InfoSphere DataStage 18.104.22.168
- WebSphere Application Server 8.5.5
After a bunch of Windows updates, and some maintenance done on the OS by the Private Cloud support team, the BI team could not start any of the InfoSphere DataStage rich clients (Administrator, Director, Designer). Each one failed at the login screen with the secure channel error above.
Fortunately, the problem was limited to the development environment, and production was still working. This gave me the chance to compare a working environment with a broken one while debugging.
Any debugging starts with listing where things could have gone wrong, then slowly and steadily eliminating those possibilities through intelligent trial and error. Here were my candidates:
- The BI team told me the clients stopped working right after some recent Windows Updates, so the updates could be the problem.
- The private cloud support team had recently told us that support for TLS 1.0 was being removed. Could this have caused the issue?
- Had someone made changes to the OS that broke this? This was too vague to pursue directly.
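The TLS 1.0 theory can be tested directly by attempting a handshake pinned to one protocol version at a time against the services tier. Below is a minimal Python sketch of that idea, not something we ran at the time. The host and port are placeholders passed on the command line. Python's `ssl` module cannot negotiate SSL 2.0 at all, and a modern OpenSSL build may refuse TLS 1.0/1.1 on the client side, so a failure for those versions does not by itself prove that the server rejects them.

```python
import socket
import ssl
import sys

# Versions to try, oldest first. SSL 2.0 is not listed because Python's ssl
# module cannot negotiate it, and SSL 3.0 is compiled out of most OpenSSL builds.
VERSIONS = [
    ssl.TLSVersion.TLSv1,
    ssl.TLSVersion.TLSv1_1,
    ssl.TLSVersion.TLSv1_2,
    ssl.TLSVersion.TLSv1_3,
]


def pinned_context(version):
    """Client context that can negotiate exactly one protocol version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostic probe only: we care whether the handshake succeeds,
    # not whether the certificate is trusted, so verification is off.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, port, timeout=5.0):
    """Map each version name to the negotiated version or a failure note."""
    results = {}
    for version in VERSIONS:
        try:
            ctx = pinned_context(version)
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    results[version.name] = tls.version()
        except (OSError, ValueError) as exc:
            # ssl.SSLError subclasses OSError, so handshake failures land here.
            # For TLSv1/TLSv1_1 the refusal may come from the local OpenSSL
            # policy rather than from the server.
            results[version.name] = f"failed: {type(exc).__name__}"
    return results


if __name__ == "__main__":
    # Hypothetical usage: python tls_probe.py <services-tier-host> <https-port>
    host, port = sys.argv[1], int(sys.argv[2])
    for name, outcome in probe(host, port).items():
        print(f"{name:8} {outcome}")
```

If TLS 1.2 succeeds while TLS 1.0 fails, and the client is known to speak only TLS 1.0, that would support the theory.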
A colleague had already tried to debug this issue. Searching on Google, I came across an IBM product support page describing this problem and its solution. The page was from 2017 and covered the software version we had, so it seemed relevant. It recommended enabling SSL 2.0, which my colleague had tried by adding these registry keys:
    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0]
    @="0"

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Client]
    "DisabledByDefault"=dword:00000000
    "Enabled"=dword:00000001
    @="0"

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Server]
    "DisabledByDefault"=dword:00000000
    "Enabled"=dword:00000001
    @="0"
It didn't work.
I am not a security person, but from an unrelated recent issue I had a rough idea of the protocol versions, ordered from least to most secure:
- SSL 2.0
- SSL 3.0
- TLS 1.0 - 1999 standard, very old and weak
- TLS 1.1 - never very popular; few products adopted it, since most jumped straight to the next version
- TLS 1.2 - the current standard, strong enough for today's environments and computing power; as of 2019, all current middleware supports it
- TLS 1.3 - I haven't looked into it in detail, and I haven't yet seen middleware that supports it (I have dabbled with WebSphere Application Server, MySQL, DB2, PHP and Apache HTTPD)
We eventually set the SSL 2.0 DisabledByDefault values back to 1 and Enabled to 0, so that the machine would no longer accept SSL 2.0 connections.
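When you're comparing the working and broken machines, it helps to export the `SCHANNEL\Protocols` key from each one with Registry Editor and read back what the values mean. Below is a minimal Python sketch that parses such an export and classifies each Client/Server key. The classification follows my understanding of Microsoft's documented meaning of `Enabled` and `DisabledByDefault`, so treat it as an assumption. The `AFTER` sample reproduces the final SSL 2.0 settings described above.

```python
import re

# One DWORD value line from a .reg export, e.g.  "Enabled"=dword:00000001
DWORD_LINE = re.compile(r'^"([^"]+)"=dword:([0-9a-fA-F]{8})$')


def parse_reg(text):
    """Map each [key path] in a .reg export to {value name: int} for its DWORDs."""
    keys, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = keys.setdefault(line[1:-1], {})
        elif current is not None:
            match = DWORD_LINE.match(line)
            if match:  # string values such as @="0" are skipped
                current[match.group(1)] = int(match.group(2), 16)
    return keys


def schannel_state(keys):
    """Classify every ...\\<protocol>\\Client and ...\\<protocol>\\Server key."""
    states = {}
    for path, values in keys.items():
        parts = path.split("\\")
        if len(parts) < 2 or parts[-1] not in ("Client", "Server"):
            continue
        enabled = values.get("Enabled")
        if enabled is None:
            state = "not set (OS default)"
        elif enabled == 0:
            state = "disabled"
        elif values.get("DisabledByDefault") == 1:
            # Any non-zero Enabled counts as on, but not offered by default.
            state = "enabled only when an application explicitly requests it"
        else:
            state = "enabled"
        states[(parts[-2], parts[-1])] = state
    return states


# The SSL 2.0 values after turning it back off.
AFTER = r"""Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Client]
"DisabledByDefault"=dword:00000001
"Enabled"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Server]
"DisabledByDefault"=dword:00000001
"Enabled"=dword:00000000
"""

if __name__ == "__main__":
    print(schannel_state(parse_reg(AFTER)))
    # {('SSL 2.0', 'Client'): 'disabled', ('SSL 2.0', 'Server'): 'disabled'}
```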
My colleague mentioned that in the past, when he hit issues like this, he would check the recent Windows Updates and uninstall them, and things would work again. We found 3 recent successful updates, KB4493467, KB4493435 and KB4489883, and uninstalled them all. Still no luck. We later reinstalled KB4493467 and KB4489883. We could not find KB4493435 under Check for Updates, but read that its changes were included in the April 2019 Security Monthly Quality Rollup (KB4493467), so we were covered.
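To check which of those KB IDs are actually installed before and after an uninstall, you can list hotfixes from PowerShell. Below is a minimal Python sketch that wraps `Get-HotFix` and looks for specific IDs. It only runs the PowerShell part on Windows. `Get-HotFix` is backed by `Win32_QuickFixEngineering`, which may not report every update, so a missing ID is a hint rather than proof.

```python
import re
import subprocess
import sys

# Update IDs look like KB4493467 (six or seven digits after "KB").
KB_ID = re.compile(r"\bKB\d{6,7}\b", re.IGNORECASE)


def installed_kbs(listing):
    """Collect the KB IDs that appear anywhere in a hotfix listing."""
    return {kb.upper() for kb in KB_ID.findall(listing)}


def check_updates(expected, listing):
    """Report, for each expected KB ID, whether the listing contains it."""
    found = installed_kbs(listing)
    return {kb: kb.upper() in found for kb in expected}


def hotfix_listing():
    """Windows only: list installed update IDs via PowerShell's Get-HotFix."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-HotFix | Select-Object -ExpandProperty HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    print(check_updates(["KB4493467", "KB4493435", "KB4489883"], hotfix_listing()))
```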
While searching for information on the error, I came across a suggestion to check the Event Viewer. There, I found a related error logged by SChannel.
Okay, so now I had two messages to search for: one from the login screen and one from the Event Viewer log. The wording of the log suggested the error was at the OS level rather than the application level, but it was still very vague, and I had no idea what SChannel was.
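The same SChannel entries can be pulled from the command line, which makes it easier to diff them between the working and broken machines. Below is a minimal Python sketch that builds a `wevtutil qe` query filtered on the `Schannel` provider. It assumes the events are in the System log, which is where I would expect Schannel to write them, and it only executes the query on Windows.

```python
import subprocess
import sys


def schannel_events_command(count=10):
    """wevtutil command that prints the newest Schannel events from the System log."""
    # XPath filter on the event provider name (assumed to be "Schannel").
    xpath = "*[System[Provider[@Name='Schannel']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{count}",  # at most this many events
        "/rd:true",     # newest first
        "/f:text",      # human-readable output instead of XML
    ]


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(schannel_events_command(),
                         capture_output=True, text=True, check=True)
    print(out.stdout)
```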
I had no way of looking at the product's source code (it is proprietary). Even if I had, it would have taken me years to figure anything out, especially since the code was most likely written in C++, which I have only recently started learning. I had no access to product support either. Keep in mind, too, that I had never written code that handles TLS connections myself.
So I had to install Wireshark and, true to its tagline, go deep! I saw some interesting things happening between the client and the server (both on the same machine):