All posts by bondy

Azure Site-To-Site VPN – Lab Setup

In the fine traditions of this site, I am not going to go into the minutiae of every aspect of this or why we do it. The goal here is to get it up and running as quickly as possible with as few steps as possible. Whether I achieve this or not, you’ll have to be the judge. Suffice to say there will be some basic steps I assume you will be able to do. So let’s get cracking.

1. Create a resource Group (eg RG_S2SVPN)
2. Create a VNet (eg vnet_s2svpn – 10.0.0.0/16)
3. Create a Subnet (eg Subnet1, 10.0.0.0/24)
4. Create a VM on the subnet you just created (this will be used for testing connectivity later)
5. Create a Gateway Subnet (eg GatewaySubnet, 10.0.1.0/29)
6. Create a Virtual Network Gateway. For most SKUs this can be done manually in the portal like anything else, but the portal can no longer be used if you want the Basic SKU. If you wish to use the Basic SKU, update the code below as necessary and run it in Cloud Shell:

$location = "east us"
$resourceGroup = "RG_S2SVPN"
$VNetName = "vnet_s2svpn"
$VNGWPIPName = "s2svnetgw-ip"
$vnetgwipconfig = "vnetgwipconfig1"
$VNetGWName = "s2svnetgw-gw"
$vnet = Get-AzVirtualNetwork -name $VNetName -ResourceGroupName $resourceGroup
$subnet = Get-AzVirtualNetworkSubnetConfig -Name GatewaySubnet -VirtualNetwork $vnet
$vnetgwPIP = New-AzPublicIpAddress -Name $VNGWPIPName -ResourceGroupName $resourceGroup -Location $location -Sku Basic -AllocationMethod Dynamic
$vnetgwIpConfig = New-AzVirtualNetworkGatewayIpConfig -Name $vnetgwipconfig -SubnetId $subnet.Id -PublicIpAddressId $vnetgwPIP.Id
New-AzVirtualNetworkGateway -Name $VNetGWName -ResourceGroupName $resourceGroup -Location $location -IpConfigurations $vnetgwIpConfig -GatewayType Vpn -VpnType RouteBased -GatewaySku Basic

7. Create a Local Network Gateway (eg OnPremGateway, IP = <Physical Internet Router IP> – hint: search ‘What’s my IP’ in Google, Address Space = your on-prem network range, eg 192.168.0.0/24)
8. Create local VPN router (typically on a server OS VM on your home network)
  – ‘Configure and enable Routing and Remote Access’
  – Custom Configuration
  – Select ‘VPN’ and ‘LAN routing’
  – Start Service
  – Click Network Interfaces | New Demand-Dial Interface
  – Configure:
    Name (‘AzureS2S’)
    Connect using VPN
    IKEv2
    Public IP of your VPNGW in Azure
    Route IP packets on this Interface
    Static route w/metric of your azure subnet, eg 10.0.0.0 / 255.255.255.0 / Metric (eg 5)
    No need to specify any credentials
  – Click the new connection (AzureS2S) | Options | Persistent Connection
  – Click the new connection (AzureS2S) | Security | Specify a password for the pre-shared key
9. You will need to set up port forwarding on your physical network/broadband router, pointing to the software router you created above. Different routers will have slightly different options but you should aim to provide the information below:
  – On WAN options, you will need to select port forwarding
  – Enable this, add ports 500/1701/4500 (UDP)
  – For the internal IP address, give the IP of the router you created in (8)
10. In the portal, search for ‘connections’
  – Basics: Create Site to Site (IPSec), bi-directional connectivity, name and region
  – Settings: Select the virtual and on prem gateways and the pre-shared key from above. Leave defaults, Create
11. From the local VPN router you set up in (8), right click the connection you created and click ‘connect’. If all is well ‘connection state’ should now change to ‘connected’ after a few seconds. The Azure portal connection should also now show a ‘connected’ status after a refresh.
12. Now you have the connection in place, log into your azure VM. For the purposes of testing, turn off the firewall (or at least let ICMP traffic through). You should be able to ping the VM on its local network IP (eg 10.0.0.4) from the router computer.
13. In order to be able to communicate with your Azure VM from other machines on your local (‘on prem’/lab) network, you will need to create a static route on those machine(s), using the VPN router from (8) as the gateway:
  • On the local machine in question, get an admin cmd prompt up
  • ROUTE ADD 10.0.0.0 MASK 255.255.255.0 <IP of the VPN router from (8)> METRIC 5

Try pinging the VM again… the local machine should now be able to communicate with your azure VM. You can browse shares on it if you want, drop files etc just as you would on a machine on your local network/lab (you’ll need to provide the appropriate credentials, obvs).
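If you would rather script the Azure side than click through the portal, the Local Network Gateway and the connection can be created from Cloud Shell as well. The following is only a sketch: the connection name, the public IP (a documentation-range placeholder) and the pre-shared key are assumptions you will need to replace, and the other names simply reuse the examples from earlier in this post.

```powershell
# Sketch only: creates the Local Network Gateway and the S2S connection with the Az module.
# $onPremIP, $sharedKey and the connection name are placeholders, not values from this lab.
$location      = "east us"
$resourceGroup = "RG_S2SVPN"
$VNetGWName    = "s2svnetgw-gw"
$lngName       = "OnPremGateway"
$onPremIP      = "203.0.113.10"           # placeholder: public IP of your internet router
$onPremPrefix  = "192.168.0.0/24"         # your on-prem address space
$sharedKey     = "<your pre-shared key>"  # must match the key entered on the local VPN router

# Local Network Gateway = Azure's description of the on-prem end of the tunnel
$lng = New-AzLocalNetworkGateway -Name $lngName -ResourceGroupName $resourceGroup -Location $location -GatewayIpAddress $onPremIP -AddressPrefix $onPremPrefix

# Fetch the Virtual Network Gateway created earlier
$gw = Get-AzVirtualNetworkGateway -Name $VNetGWName -ResourceGroupName $resourceGroup

# Tie the two together with an IPsec site-to-site connection
New-AzVirtualNetworkGatewayConnection -Name "S2SConnection" -ResourceGroupName $resourceGroup -Location $location -VirtualNetworkGateway1 $gw -LocalNetworkGateway2 $lng -ConnectionType IPsec -SharedKey $sharedKey
```

Once it has been created, Get-AzVirtualNetworkGatewayConnection with the same name and resource group should report the connection status.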

Relaxing Intune Bitlocker policy for removable disks

A quick post about an issue I ran into today whilst trying to create an OSDCloud USB stick (BTW anyone interested in cloud imaging, I thoroughly recommend checking out David Segura’s site : OSDCloud.com).

Anyway, I created a new image on my Intune-managed laptop and, as you might typically expect, we have BitLocker policies for drive encryption. However, by default these also prompt the user either to encrypt a removable drive to allow writing to it, or to leave it unencrypted and open it in read-only mode. Given that I needed a bootable USB drive, the encryption option wasn’t going to work for me. Digging through the policy settings, I eventually came to a setting called Deny write access to removable drives not protected by BitLocker, which needed to be set to disabled. After several syncs and a few restarts (yes, I’m impatient), the previously greyed out ‘paste’ option appeared when I selected the drive, and to all intents and purposes I figured all would now be well. Unfortunately not.

At this point I was scratching my head a bit until I noticed a file on my desktop called WriteAccessUSB.reg. I guess I must have run into a similar issue in the past and this did the trick. Open regedit and browse to the following location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FVE

Add/change the following setting:

"RDVDenyWriteAccess"=dword:00000000

Finally, just remove and reinsert your USB drive (no need to restart) and it should be writable.
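The same registry change can be made from an elevated PowerShell prompt rather than regedit. This is a minimal sketch of the setting described above, nothing more; bear in mind that if the Intune policy is still assigned it may well put the value back at the next sync.

```powershell
# Run elevated. Sets RDVDenyWriteAccess to 0 under the FVE policy key.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FVE'

# Create the key if it does not exist yet
if (-not (Test-Path $key)) {
    New-Item -Path $key -Force | Out-Null
}

# Create or overwrite the DWORD value
New-ItemProperty -Path $key -Name 'RDVDenyWriteAccess' -PropertyType DWord -Value 0 -Force | Out-Null

# Read it back to confirm
Get-ItemProperty -Path $key -Name 'RDVDenyWriteAccess'
```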

Generic exception : ImportUpdateFromCatalogSite failed. Arg = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. Error =The underlying connection was closed: An unexpected error occurred on a send.

I recently rebuilt my WSUS/SUP server and after running a sync, was presented with a sea of red giving me an error for each and every update it tried to sync.

Transpires this is a result of the (relatively) recent enforced strengthening of the TLS protocol by Microsoft. The fix is pretty simple though. Jump onto your WSUS server and just run the commandline below to configure .NET Framework to support strong cryptography:

reg add HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319 /V SchUseStrongCrypto /T REG_DWORD /D 1

Now resync your updates and all should be well.
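For anyone who prefers PowerShell, here is a rough equivalent of the reg add command. Setting the Wow6432Node key as well is my assumption rather than something I needed here; it covers 32-bit .NET applications on a 64-bit OS and can be left out if you only want to mirror the command above.

```powershell
# Run elevated on the WSUS server. Enables strong crypto for .NET Framework 4.x.
$keys = @(
    'HKLM:\SOFTWARE\Microsoft\.NETFramework\v4.0.30319',
    'HKLM:\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319'   # assumption: 32-bit apps too
)

foreach ($key in $keys) {
    # Only touch keys that exist on this machine
    if (Test-Path $key) {
        New-ItemProperty -Path $key -Name 'SchUseStrongCrypto' -PropertyType DWord -Value 1 -Force | Out-Null
    }
}
```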

NB

I also ran into an issue after this whereby the wsyncmgr.log ‘synchronized’ all the updates (well, appeared to do so) but no metadata appeared in the console. To fix this I unchecked all products and categories, sync’d again, then rechecked those I needed. I ran the sync once again and they all appeared.

SCCM Content Distribution Broken – WMI?

There can of course be many reasons for broken ConfigMgr content distribution – lack of space, physical security on disks and many, many others. This is one possibility though – can the site server actually reach the DP through WMI? If not, then this will undoubtedly cause problems.

This happened to my infrastructure, I suspect, through a patch deployment. See here for more information. Anyway, to test if this is an issue, run up a session of WBEMTEST and connect to the DP in question from your site server via:

\\<ConfigMgr DP>\root\CimV2

Assuming you’re getting ‘Access Denied’ (typically with an 80004005 error) this may well be the fix you’re looking for. You will also see the following in the SYSTEM eventlog of the DP:

The server-side authentication level policy does not allow the user XXX from address XXX to activate DCOM server. Please raise the activation authentication level at least to RPC_C_AUTHN_LEVEL_PKT_INTEGRITY in client application.

You’ll likely see the following in Distribution Manager status messages:

SOLUTION:

In REGEDIT, browse to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Ole\AppCompat

Create a DWORD value:

RequireIntegrityActivationAuthenticationLevel

Give this a value of 0, then restart the machine.

You should now be able to successfully connect via WMI, as will your site server.
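If you want a quick before-and-after check without opening WBEMTEST, something like the following run from the site server should exercise the same DCOM path. It is only a sketch: the DP name is a made-up example and I am assuming the account you run it under has rights on the DP.

```powershell
# Hypothetical DP name: replace with your own
$dp = 'DP01.contoso.com'

try {
    # Force DCOM rather than WinRM so the test uses the same transport as WBEMTEST
    $option  = New-CimSessionOption -Protocol Dcom
    $session = New-CimSession -ComputerName $dp -SessionOption $option -ErrorAction Stop

    # Any simple query against root\cimv2 will do
    Get-CimInstance -CimSession $session -Namespace 'root\cimv2' -ClassName Win32_OperatingSystem -ErrorAction Stop |
        Select-Object CSName, Caption, LastBootUpTime

    Remove-CimSession -CimSession $session
}
catch {
    # An 'Access denied' style message here points at the DCOM hardening issue described above
    Write-Warning "WMI over DCOM to $dp failed: $($_.Exception.Message)"
}
```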

Configuration Manager can’t connect to the administration service

The configuration manager console can’t connect to the site database through the administration service on server

I am looking to test out one or two features which rely on MECM’s Administration Service so was somewhat disappointed when I got the error above whenever I clicked on the respective nodes. Mine is a fully PKI environment and my initial suspicion was that it was certificate-related. Having spent several hours tinkering with the certificates and messing with IIS and getting nowhere I decided to sleep on it…

The first thing I noticed was that the SMS_REST_PROVIDER.log hadn’t logged anything for over a month so something must be broken. I went to the SMS_REST_PROVIDER component on the server with the SMS Provider role and noticed I was unable to query/start the component. Looking at the status messages, it was constantly trying to reinstall and failing. A little more detective work and I found a possible lead to DCOM security, so I opened DCOMCNFG, expanded Component Services, expanded Computers, and then expanded My Computer. First red flag I saw was that there was clearly an error ‘bang’ on the My Computer icon. Anyway, I persevered and right-clicked it and selected MSDTC whereby I got an error:

“The remote server has been paused or is in the process of being started.”

This led me to another post which was talking about a cluster configuration which was receiving the same error message. This got me thinking…I don’t have a cluster, what’s this on about? Anyway, I went back and checked the MECM box and it transpired I did have an old cluster I’d set up ages ago which I’d forgotten about and had since deleted one of the nodes! This was no longer required, so I simply ran a couple of Powershell commands:

Remove-Cluster -force

Remove-windowsFeature failover-clustering -restart

After restarting, I checked DCOMCNFG and the My Computer icon no longer had the bang in place. Nice. Looked at the console but still no joy. It was still telling me the Admin Service was unavailable 🙁

I nonetheless sensed I was close. I went back to the DCOMCNFG applet and went down to the Distributed Transaction Coordinator node, under which there is another node called Local DTC. I right-clicked this and went to the security tab. I was interested to see whether the DTC logon account was correct. Unfortunately, it was (it should be NT AUTHORITY\NetworkService by the way). Another dead end. This time however I tried selecting the Network DTC Access check box and opened up the MECM console again. I clicked on the Console extensions node and this time there was a short pause and everything appeared!

One weird thing I noticed: I was able to uncheck the Network DTC Access check box and my admin service seemed to remain in place without error. I will monitor this, but from my observations so far it seems that it just needed temporary access here.

UPDATE:

Following the above, I found that a remote console I was using kept crashing. I had to re-select the Network DTC Access check box before it would load correctly. Further, it appears this checkbox should be kept checked, as over time the console will begin to crash again when opened without it.

XML Parsing Error at line XXXXX char : Operation Aborted: MaxXMLSize constraint violated

Been a while since I last posted but ran into an issue today that had everyone confused as it was quite a tricky one to track down. There had been a number of changes in the environment over the last few weeks so each and every one of them was examined in microscopic detail. Let me explain…

We started to see a few hundred or so machines start to fail on a certain application (actually it was just a simple Powershell script packaged as an application) during the OSD build task sequence. As it happens this app was part of a nested TS but this is probably irrelevant. In any case, some machines were fine, others were failing. Nobody had touched the app in any way for several months.

After much digging and many red herrings, tucked away in the SMSTSLOG.log was the following message :

XML Parsing Error at line 473689 char 52: Operation Aborted: MaxXMLSize constraint violated.

The cause of this error was down to ‘too much policy’. Basically the affected machines had a lot of Defender Updates deployed to them and it was essentially too much for the machines to handle. Once removed everything started to work again.

If you’re pulling your hair out and can’t figure out why something is failing, then there are thousands of possibilities, admittedly. But it might be worth a quick search for the words XML Parsing Error.
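A quick way to do that search on an affected machine is sketched below. The two log locations are assumptions on my part (the usual client log folder and the folder used while a task sequence is still running), so adjust the paths to wherever your smsts logs actually live.

```powershell
# Assumed log locations: change these if your logs are elsewhere
$logs = @(
    'C:\Windows\CCM\Logs\smsts*.log',
    'C:\_SMSTaskSequence\Logs\Smstslog\smsts*.log'
)

# Search every matching log for the literal text and show where it was found
Get-ChildItem -Path $logs -ErrorAction SilentlyContinue |
    Select-String -Pattern 'XML Parsing Error' -SimpleMatch |
    Select-Object Path, LineNumber, Line
```

No output simply means the text was not found in the logs that exist at those paths.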

Script Status Message queries!

If, like me, you spend more than your fair share of time searching through status messages to figure out what broke in the deployment over the weekend, then you’ll know what an arduous process it can be putting the criteria into each query. If you have a good few machines to check then you literally spend half your time typing in machine names and times.

Well no more, because did you know it is perfectly possible to script this? Status Message Viewer (statview.exe) is simply an executable and with the right parameters and the correct time format applied, you can simply call the status messages from as many machines as you see fit (although I’d recommend you limit this to no more than 15-20 at a time).

One observation when running this against multiple machines is that you’ll notice some of the status messages won’t always contain as much info as you expect – simply refresh the status message and all info will display as expected.

Finally, create a text file containing a list of the machines you wish to take status messages from and use the path as a parameter along with the date from which you wish to obtain the messages, in the format YYYY-MM-DD.

Please note this script assumes you have installed the ConfigMgr admin console on the machine on which you run the script, and in the default location. If you have installed it elsewhere please change statview.exe path accordingly.

Param(
 [string]$path,
 [string]$date
 )
If($date -eq "" -or $path -eq "") 
 { 
     Write-Host "File path and date must be supplied as parameters.
     Example: 
     -path C:\Temp\Computers.txt
     -date 2021-04-09"
     exit
 } 
$command = "C:\Program Files (x86)\Microsoft Configuration Manager\AdminConsole\bin\i386\statview.exe"
$siteServer = "SCCMSiteSvr.contoso.com"
$startDate = Get-Date -Format "yyyyMMddHH:mmm.000" -Date $date
$Computers = Get-Content $path
foreach($compName in $Computers)
{
    $commandArgs = "/SMS:Server=\\$siteServer\ /SMS:SYSTEM=$compName /SMS:COMPONENT=Task Sequence Engine /SMS:COMPONENT=Task Sequence Action /SMS:SEVERITY=ERROR /SMS:SEVERITY=WARNING /SMS:SEVERITY=INFORMATION /SMSSTARTTIME=$startDate"
    & "$command" $commandArgs
} 
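As a usage example, assuming you saved the script above as Get-StatusMessages.ps1 (the file name is just my choice here, as are the machine names), you could build the machine list and call it like this:

```powershell
# Hypothetical machine names, one per line
Set-Content -Path 'C:\Temp\Computers.txt' -Value 'PC001', 'PC002', 'PC003'

# Opens one Status Message Viewer window per machine, showing messages from 9 April 2021 onwards
.\Get-StatusMessages.ps1 -path 'C:\Temp\Computers.txt' -date '2021-04-09'
```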

in-line script execution time-out…

Had this recently on a machine we were upgrading to Win 10 1909. Initially it looked as though there was an issue detecting the application being installed correctly but on closer inspection, the AppDiscovery log file revealed that the same timeout issue was happening on several applications. Googling about, there were quite a few posts on how later versions of ConfigMgr now incorporated a client property to change the script timeout setting but this sadly appeared not to be the case. Other posts suggested a script that could be run at server level to fix this. Not really the short-term fix I needed to sort my issue as it would doubtless take weeks to get the change through at work.

Then I found what I needed – a client-side script which I have now lost the source to, so really sorry if this came from you. I’m happy to set the record straight and link as needed. In any case, I do have the script itself, see below. This will set the timeout to 1200 seconds (from the 60s default). This fixed my issue. I would imagine this could be added to the start of a task sequence if required. Note it’s a VBScript…old skool.

On Error Resume Next
strQuery = "SELECT * FROM CCM_ConfigurationManagementClientConfig"
Set objWMIService = GetObject("winmgmts:\\" & "." & "\ROOT\ccm\Policy\Machine\ActualConfig")
Set colItems = objWMIService.ExecQuery(strQuery, "WQL")
For Each objItem in colItems
objItem.ScriptExecutionTimeOut=1200
objItem.put_()
Next

Set objWMIService = GetObject("winmgmts:\\" & "." & "\ROOT\ccm\Policy\Machine\ActualConfig")
Set colItems = objWMIService.ExecQuery(strQuery, "WQL")
For Each objItem in colItems
If 1200 = objItem.ScriptExecutionTimeOut Then
WScript.Echo "True"
Else
WScript.Echo "False"
End if
Next 
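For those who would rather avoid VBScript, the sketch below is my attempt at a PowerShell equivalent. It targets the same WMI namespace, class and property as the script above, but I have only used the VBScript version in anger, so treat this as an untested translation. Run it elevated on the client.

```powershell
# Same namespace, class and property as the VBScript above
$namespace = 'root\ccm\Policy\Machine\ActualConfig'
$class     = 'CCM_ConfigurationManagementClientConfig'

# Set the timeout to 1200 seconds on every instance returned
Get-CimInstance -Namespace $namespace -ClassName $class |
    Set-CimInstance -Property @{ ScriptExecutionTimeOut = [uint32]1200 }

# Read the value back to confirm the change
Get-CimInstance -Namespace $namespace -ClassName $class |
    Select-Object ScriptExecutionTimeOut
```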

Timed out waiting for CcmExex service to be fully operational

The mis-spelling above is intentional BTW, this is how it appears in the SMSTSLOG.log file. Typically a task sequence will be ticking along then something will happen after which the above error is displayed, almost always when it is trying to install either a ConfigMgr package or a ConfigMgr application (ie not a command-line). This is because the client isn’t actually required to execute, for example, a Powershell command line, but it must be initiated if a package or application is called on.

Ultimately, in my experience, this always comes down to one issue – an inability to reach the Management Point. There may be various reasons for this and one such reason is described here.

However in my case this wasn’t the problem. Following a task sequence upgrade to 1909, I found all our laptops failing with this same error. These laptops were all on the internal network (ie not connected via VPN/DA, etc). If I logged on locally, I found that while I was able to ping machines, I was unable to reach shares from other machines on the network. Something was very wrong with networking (and obviously why the build was failing).

Running an IPCONFIG /all revealed that the broken laptops were all trying to use the IP/HTTPS Tunnel Adapter Interface. This synthetic interface is typically used for Direct Access which these laptops were certainly not using at the time. On further investigation, if I removed the group policies the laptops inherited for the duration of the upgrade I was then able to complete the upgrade without issue.

THE SOLUTION

The group policies are causing the tunnel adaptor to become active albeit briefly (haven’t got to the bottom of why this happens BTW). Unfortunately the adaptor isn’t able to communicate with the MP as it should. Then I read an article about a bug found in the 1909/2004 OS’s. You must ensure this patch (Oct 2020 update) is applied to the installation media prior to upgrade. Essentially, the certificates were disappearing causing the communication problem with the Tunnel adaptors on the laptop models. Once the patch was added, all was well.

Configmgr client application stuck at 0% downloading

I’ve been meaning to do a quick blog about this issue for some time now especially as I have witnessed this incredibly frustrating problem at two separate clients during roll-outs, particularly OS deployments that also require updated applications, etc. A Google search will reveal many different ‘fixes’ for this issue, most common of which tends to involve re-installing the ConfigMgr client. Indeed this is an approach I took first time round and usually seemed to work but it isn’t really a fix as such.

In the latest instalment, this approach just didn’t work at all so I needed to find what else was causing the issue. If I left the offending application to time out – this might take several hours btw – I could usually restart it and it would go ahead and download. However this was still unacceptable.

SOLUTION

Turns out the answer, with hindsight, was the same in both instances – too many policies. Essentially in both circumstances, the machines experiencing the issue had a lot of applications and/or task sequences deployed to them. Task sequences in particular usually have many steps and this seems to flood the client causing it to display this ‘stuck at 0%’ behaviour. Try removing the machine from all unnecessary collections, leave for an hour or so for the machine to expunge any of the old deployments and try again.

This still feels like a Microsoft issue to me but until it’s addressed this is the workaround.