
Fixing the AppFabric Cache Cluster in SharePoint 2013

July 10, 2013

I ran into this at a client site recently and wanted to blog about my experience. A number of things were not working as expected in the cache (including Event ID 1000 and Event ID 1026), and at the end of the day, it appears to have boiled down to two things. First, the cache cluster was improperly configured, so I ended up wiping out the cluster and rebuilding it. Then, after much pain, I found that one of the servers in the cluster, which kept complaining that it couldn’t start properly, was still misconfigured (it was using the wrong account). Stopping the cluster, exporting the config, fixing it, reimporting the config, and then restarting the cluster finally solved the problem for good.

I started with this blog post http://www.sharepointconsultant.ch/2013/03/07/adding-a-local-sharepoint-2013-development-server-as-a-cache-host-to-appfabrics-cache-cluster/.

That gave me the following knowledge:

1) There’s a Windows service named “AppFabric Caching Service” that maps 1:1 to the servers in the cluster (i.e., every server that’s part of the cluster has this service on it, and on a healthy server it is set to “Automatic” and running).

2) The key PowerShell commands you’ll need to know are as follows.

** Always run your PowerShell window as Administrator when working with the AppFabric Cache **

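Start with the following line of PowerShell to let it know who’s boss. Use-CacheCluster sets the context of your PowerShell session to a cache cluster. With no parameters, it uses the cluster configuration stored on the machine you’re on.

PS C:\> Use-CacheCluster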

Next, find out the details of your individual host. (It’s most likely configured on port 22233.)

PS C:\> Get-CacheHostConfig -ComputerName [yourServerName] -CachePort 22233

That should return the details for this server in the cluster, something like the following:

HostName        : [Your Server Name]
ClusterPort     : 22234
CachePort       : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size            : 400 MB
ServiceName     : AppFabricCachingService
HighWatermark   : 99%
LowWatermark    : 90%
IsLeadHost      : True
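If you’re comparing several hosts, a small script can save some squinting. The Python sketch below is only an illustration and is not part of AppFabric. It assumes you’ve copied the text that Get-CacheHostConfig prints into a string. It turns the “Key : Value” lines into a dictionary and flags any port that differs from the defaults shown above. The helper names are hypothetical.

```python
# Illustrative sketch: parse the "Key : Value" text printed by Get-CacheHostConfig
# and compare its ports with the defaults shown in this article.
DEFAULT_PORTS = {
    "CachePort": "22233",
    "ClusterPort": "22234",
    "ArbitrationPort": "22235",
    "ReplicationPort": "22236",
}


def parse_host_config(text):
    """Turn 'Key   : Value' lines into a dict, trimming whitespace on both sides."""
    config = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        key, _, value = line.partition(":")  # split on the first colon only
        config[key.strip()] = value.strip()
    return config


def port_mismatches(config):
    """Return {port_name: (expected, actual)} for each port that is not the default."""
    return {
        name: (expected, config.get(name))
        for name, expected in DEFAULT_PORTS.items()
        if config.get(name) != expected
    }


sample = """\
HostName        : MyServer1.domain.com
ClusterPort     : 22234
CachePort       : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size            : 400 MB
IsLeadHost      : True
"""

cfg = parse_host_config(sample)
print(cfg["HostName"], cfg["Size"])  # MyServer1.domain.com 400 MB
print(port_mismatches(cfg))          # {}
```

Running the same check against the output from each server is a quick way to spot a host whose ports don’t line up with the rest of the cluster.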

If, however, you’re getting an error along the lines of:

Get-AFCacheHostConfiguration : ErrorCode<ERRCAdmin010>:SubStatus<ES0001>:Specified host is not present in cluster.

You can register your host in the cluster as follows.

PS C:\> Register-CacheHost -Provider [yourProvider] -ConnectionString [yourConnectionString] -Account "NT Authority\Network Service" -CachePort 22233 -ClusterPort 22234 -ArbitrationPort 22235 -ReplicationPort 22236 -HostName [yourServerName]

You’ll need 3 pieces of information to run the command above properly:

yourProvider and yourConnectionString – These can be found in the registry under HKLM\Software\Microsoft\AppFabric\V1.0\Configuration, or in the DistributedCacheService.exe.config file in C:\Program Files\AppFabric 1.1 for Windows Server.

yourServerName – The name of your server.

(Optionally, you can change the account, but I recommend leaving the Network Service account in place; this seems to keep SharePoint 2013 happy.)
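Rather than copying the provider and connection string by hand, you could script it. Here’s a minimal Python sketch. It assumes the config file holds both values in a clusterConfig element with provider and connectionString attributes, so verify that against your own file. The sample values (including the SQL server name) and the helper names are hypothetical. It only prints the command; you still run it yourself in an elevated PowerShell window.

```python
# Sketch: read the provider and connection string from DistributedCacheService.exe.config
# and assemble the Register-CacheHost command used in this article.
# ASSUMPTION: the file contains <clusterConfig provider="..." connectionString="..." />;
# check your own copy of the file before relying on this.
import xml.etree.ElementTree as ET


def read_cluster_config(root):
    """Return (provider, connectionString) from the first <clusterConfig> element."""
    node = root.find(".//clusterConfig")
    if node is None:
        raise ValueError("no <clusterConfig> element found")
    return node.get("provider"), node.get("connectionString")


def register_command(provider, connection_string, host_name):
    """Build the Register-CacheHost line with the same account and ports as the article.

    Single quotes stop PowerShell from expanding anything inside the values.
    """
    return (
        f"Register-CacheHost -Provider '{provider}' "
        f"-ConnectionString '{connection_string}' "
        "-Account 'NT Authority\\Network Service' "
        "-CachePort 22233 -ClusterPort 22234 -ArbitrationPort 22235 -ReplicationPort 22236 "
        f"-HostName {host_name}"
    )


# Example values only (hypothetical SQL server name).
SAMPLE = """<configuration>
  <dataCacheConfig cacheHostName="AppFabricCachingService">
    <clusterConfig provider="SPDistributedCacheClusterProvider"
      connectionString="Data Source=SQL01;Initial Catalog=SharePoint_Config;Integrated Security=True" />
  </dataCacheConfig>
</configuration>"""

# For the real file, use ET.parse(path).getroot() instead of ET.fromstring(SAMPLE).
provider, conn = read_cluster_config(ET.fromstring(SAMPLE))
print(register_command(provider, conn, "MyServer1.domain.com"))
```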

Now when you run this command:

PS C:\> Get-CacheHost

You should see something like the following:

HostName : CachePort         Service Name            Service Status Version Info
--------------------         ------------            -------------- ------------
MyServer1.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]
MyServer2.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]

At the very least, you should see both servers in the cluster at this point. If you see output like the above, with every host UP, you’re done and don’t need the rest of this article. However, if you’re unlucky and one or more of the servers are not up (Service Status = DOWN or STARTING), keep reading.
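On a bigger farm, a script can pick the unhealthy hosts out of that table for you. This is a Python sketch with a hypothetical helper name. It assumes you’ve captured the Get-CacheHost text and that each host sits on a single, unwrapped line. It lists every host whose Service Status is anything other than UP.

```python
# Sketch: scan the text printed by Get-CacheHost and list every host whose
# Service Status is anything other than UP (e.g. DOWN, STARTING, UNKNOWN).
def hosts_not_up(output):
    problems = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 3 or fields[0] == "HostName" or set(fields[0]) == {"-"}:
            continue
        host_port, status = fields[0], fields[2]
        if status.upper() != "UP":
            host, _, port = host_port.rpartition(":")  # split "name:22233"
            problems.append((host, port, status))
    return problems


sample = """\
HostName : CachePort         Service Name            Service Status Version Info
--------------------         ------------            -------------- ------------
MyServer1.domain.com:22233   AppFabricCachingService UP             3 [3,3][1,3]
MyServer2.domain.com:22233   AppFabricCachingService DOWN           3 [3,3][1,3]
"""

print(hosts_not_up(sample))  # [('MyServer2.domain.com', '22233', 'DOWN')]
```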

At this point, one of my servers was not started (DOWN), so I went ahead and ran the following.

PS C:\> Start-CacheHost -ComputerName [yourServerName] -CachePort 22233

If that fails, as it did for me, I recommend exporting your cache cluster configuration and checking whether anything is wrong. To do this, run the following:

PS C:\> Export-CacheClusterConfig [path to output filename]

So, for example…

PS C:\> Export-CacheClusterConfig c:\file.txt

When I looked at the file, I noticed near the bottom that the account MyServer1 was running under was all goofy (usernames shouldn’t have tildes in them).

<hosts>
     <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
         hostId="1909348767" size="800" leadHost="true" account="DOMAIN\appsrv1~"
         cacheHostName="AppFabricCachingService" name="MyServer1.domain.com"
         cachePort="22233" />
     <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
         hostId="1634054989" size="400" leadHost="true" account="DOMAIN\spService"
         cacheHostName="AppFabricCachingService" name="MyServer2.domain.com"
         cachePort="22233" />
</hosts>
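With more than two hosts, mismatches like this are easy to miss by eye. The following Python sketch uses hypothetical helper names and trimmed sample data, and it is not an AppFabric tool. It parses the exported configuration and reports any account or port attribute that differs between hosts, plus any account name containing a tilde. It ignores XML namespaces, since it is not certain whether the real export declares one.

```python
# Sketch: flag host attributes that differ between hosts in an exported cluster
# config, plus account names containing a tilde (the symptom described above).
import xml.etree.ElementTree as ET

CHECKED_ATTRS = ("account", "cachePort", "clusterPort", "arbitrationPort", "replicationPort")


def local_name(tag):
    """Drop any '{namespace}' prefix so the check works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def audit_hosts(config_xml):
    root = ET.fromstring(config_xml)
    hosts = [el for el in root.iter() if local_name(el.tag) == "host"]
    findings = []
    for attr in CHECKED_ATTRS:
        values = {h.get("name"): h.get(attr) for h in hosts}
        if len(set(values.values())) > 1:
            findings.append(f"{attr} differs between hosts: {values}")
    for h in hosts:
        account = h.get("account") or ""
        if "~" in account:
            findings.append(f"{h.get('name')}: suspicious account '{account}'")
    return findings


# Raw string so the backslashes in DOMAIN\... are kept literally.
SAMPLE = r"""<hosts>
  <host name="MyServer1.domain.com" account="DOMAIN\appsrv1~" cachePort="22233"
        clusterPort="22234" arbitrationPort="22235" replicationPort="22236" />
  <host name="MyServer2.domain.com" account="DOMAIN\spService" cachePort="22233"
        clusterPort="22234" arbitrationPort="22235" replicationPort="22236" />
</hosts>"""

for finding in audit_hosts(SAMPLE):
    print(finding)
```

With only two hosts, the script can’t tell you which value is the “right” one. It just shows where they disagree, and you still decide which account the cluster should use.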

WARNING: MAKE A BACKUP BEFORE YOU MAKE ANY CHANGES!!!

I fixed the account name (to match the service account on the other server, DOMAIN\spService) and then had to import the configuration back in.
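If you’d rather script the edit, here’s a minimal Python sketch. The helper name is hypothetical; this is not an AppFabric tool. It checks that the bad account value appears exactly once, takes a backup, and then swaps that one value using plain text replacement, so everything else in the file is left as exported. It assumes the export is UTF-8 text; pass a different encoding if yours isn’t.

```python
# Sketch: back up the exported config, then change exactly one account value.
# Plain text replacement leaves the rest of the file untouched (line endings too).
# ASSUMPTION: the export is UTF-8 text; pass a different encoding if yours isn't.
import shutil


def fix_account(config_path, bad_account, good_account, encoding="utf-8"):
    with open(config_path, encoding=encoding, newline="") as f:
        text = f.read()
    old = f'account="{bad_account}"'
    count = text.count(old)
    if count != 1:
        # Validate before touching anything, so a re-run can't clobber the backup.
        raise ValueError(f"expected exactly one {old}, found {count}")
    backup_path = config_path + ".bak"
    shutil.copy2(config_path, backup_path)  # make a backup before any change
    with open(config_path, "w", encoding=encoding, newline="") as f:
        f.write(text.replace(old, f'account="{good_account}"'))
    return backup_path


# Usage (hypothetical path and values; try it on a copy first):
# fix_account(r"C:\file.txt", r"DOMAIN\appsrv1~", r"DOMAIN\spService")
```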

** BUT WAIT – There’s more! **

Before you try to import your configuration, you’ll need to go into your Windows “Services” application and disable the “AppFabric Caching Service”, and then stop the service on each server in the cluster.

To do this, find the “AppFabric Caching Service” in the list of services and double-click it.

Next, follow this order exactly: set the startup type to Disabled, then stop the service (this is the same as running the PowerShell to shut down the AppFabric host).

Repeat the above steps on each server in the cache cluster.
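With more than a couple of servers, clicking through the Services console gets tedious. This sketch swaps in Windows’ built-in sc.exe for the GUI steps. It is not the method used above, and the helper names are hypothetical. For each server, it sets the service’s startup type to Disabled and then stops it, in that order. It also includes the reverse step that comes after the import: setting the service back to Automatic. It assumes your account has admin rights on every host, and by default it only prints the commands.

```python
# Sketch: build sc.exe commands that disable and then stop the AppFabric Caching
# Service on each cache host, plus the reverse step (back to Automatic) for after
# the import. Dry run by default: it only prints the commands.
import subprocess

SERVICE = "AppFabricCachingService"  # ServiceName reported by Get-CacheHostConfig


def shutdown_commands(servers):
    """Per server: set the startup type to Disabled first, then stop the service."""
    cmds = []
    for server in servers:
        cmds.append(["sc.exe", rf"\\{server}", "config", SERVICE, "start=", "disabled"])
        cmds.append(["sc.exe", rf"\\{server}", "stop", SERVICE])
    return cmds


def restore_commands(servers):
    """After Import-CacheClusterConfig: set the startup type back to Automatic."""
    return [["sc.exe", rf"\\{server}", "config", SERVICE, "start=", "auto"] for server in servers]


def run(cmds, dry_run=True):
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises if sc.exe exits with a non-zero code (for example,
            # when the service is already stopped). "sc stop" only sends the stop
            # request, so confirm the services are stopped before importing.
            subprocess.run(cmd, check=True)


run(shutdown_commands(["MyServer1.domain.com", "MyServer2.domain.com"]))
```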

Finally, once that’s done, you can import the file as follows:

PS C:\> Import-CacheClusterConfig C:\file.txt

Confirm
Are you sure you want to perform this action?
Performing operation "Replace cluster configuration." on Target "Cluster configuration.".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): y

If you shut down the cluster properly (as described above), your new configuration should take effect at this point.

If you see the following error, make sure you’ve disabled and stopped the service on all servers in the cluster (as described above).

Import-AFCacheClusterConfiguration : ErrorCode<ERRCAdmin001>:SubStatus<ES0001>:Hosts are already running in the cluster.
At line:1 char:1
+ Import-AFCacheClusterConfiguration C:\file.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     + CategoryInfo          : NotSpecified: (:) [Import-AFCacheClusterConfiguration], DataCacheException
     + FullyQualifiedErrorId : ERRCAdmin001,Microsoft.ApplicationServer.Caching.Commands.ImportAFCacheClusterConfigurationCommand
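When you’re working through several of these errors, the ErrorCode in angle brackets is the useful part. Here is a small Python sketch that extracts it, with a hypothetical helper name. Its lookup table only contains the two codes that come up in this article, paired with the fixes that worked here. It is not an official list of AppFabric error codes.

```python
# Sketch: pull the ErrorCode/SubStatus out of an AppFabric cmdlet error and map
# the two codes seen in this article to the fix that worked here.
import re

KNOWN_FIXES = {
    "ERRCAdmin010": "Host is not present in the cluster: register it with Register-CacheHost.",
    "ERRCAdmin001": "Hosts are still running: disable and stop AppFabricCachingService "
                    "on every server, then import again.",
}

ERROR_PATTERN = re.compile(r"ErrorCode<(\w+)>:SubStatus<(\w+)>")


def diagnose(message):
    match = ERROR_PATTERN.search(message)
    if match is None:
        return "No ErrorCode<...>:SubStatus<...> found in the message."
    code, sub_status = match.groups()
    hint = KNOWN_FIXES.get(code, "Not covered in this article.")
    return f"{code} ({sub_status}): {hint}"


print(diagnose("Import-AFCacheClusterConfiguration : ErrorCode<ERRCAdmin001>:"
               "SubStatus<ES0001>:Hosts are already running in the cluster."))
```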

Go back to the Services window on each server and set the AppFabric Caching Service back to Automatic. Now all you should need to do is start the cluster, and you’ll be good.

PS C:\> Start-CacheCluster

And all your servers should be UP at this point.  You can also check the cluster health with the following.

PS C:\> Get-CacheClusterHealth

You can also check the Cache status with the following command.

PS C:\> Get-Cache

Don’t forget, you can always see all the valid PowerShell commands using the following.

PS C:\> Get-Help *Cache*

I hope this saves others some of the hair-pulling I went through.

29 Responses

  1. Peter Rees says:

    This is a fantastic article and very well written. Thanks – helped me solve my problem.

    • Mike says:

      This article literally saved me a tremendous amount of time!! Thank you to the author!

    • Dan N says:

      I agree! I’ve spent days trying to fix this. For me, just exporting and re-importing the config file, then starting things back up, finally fixed the problem. GREAT blog post!!

  2. phani kumar says:

    Hi,
    All my settings mentioned in the article are fine on my server, but I’m still getting the error below very frequently. Can you please help me with this?

    ViewStateLog: Failed to write to the velocity cache: http://server:2731/default.aspx

    Unexpected Exception in SPDistributedCachePointerWrapper::InitializeDataCacheFactory for usage ‘DistributedViewStateCache’ – Exception ‘Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode:SubStatus:The request timed out.. Additional Information : The client was trying to communicate with the server : net.tcp://server:22233 at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCacheProperties(RequestBody request, IClientChannel channel) at Microsoft.ApplicationServer.Caching.DataCacheFactory.GetCache(String cacheName) at Microsoft.SharePoint.DistributedCaching.SPDistributedCachePointerWrapper.InitializeDataCacheFactory()’.

    Below is the info from my dev server

    ps>> Get-Cachehostconfig with host details gives me

    HostName : server.corp.domain.com
    ClusterPort : 22234
    CachePort : 22233
    ArbitrationPort : 22235
    ReplicationPort : 22236
    Size : 819 MB
    ServiceName : AppFabricCachingService
    HighWatermark : 99%
    LowWatermark : 90%
    IsLeadHost : True

    ps>>get-cachehost

    PS C:\Users\gnfoip02> Get-CacheHost

    HostName : CachePort         Service Name            Service Status Version Info
    --------------------         ------------            -------------- ------------
    server.corp.domain.com:22233 AppFabricCachingService UP             3 [3,3][1,3]

    Hosts under exported files is having

  3. Pieterjan Spoelders says:

    Great article, helped me out with a quirk in our SP installation! Thanks a lot!

  4. Rob says:

    Thanks for this, I was drawing a blank not knowing that you need to execute Use-CacheCluster first. Why oh why does the Technet documentation not mention this? It turned out after exporting the config that I also had the wrong account configured

  5. Kristian says:

    Great post, helped me solve an annoying issue.

    What resolved my issue: the cache host causing problems had the pre-Windows 2000 hostname in the host name attribute. Changing that to the FQDN and importing the config fixed the problem.

  6. Manoj says:

    Thank you Colin, these steps helped me solve a similar issue.

  7. Daniel says:

    This post is truly a gold mine.
    I had this issue and found a couple of things

    When you register, use the FQDN for the server … otherwise it creates a dummy entry which you need to remove. Also, if you accidentally create another service instance, you get an error which is easy to fix:

    PS C:\Users\dwesterdale> Add-SPDistributedCacheServiceInstance
    Add-SPDistributedCacheServiceInstance : Cannot start service AppFabricCachingService on computer '.'.
    At line:1 char:1
    + Add-SPDistributedCacheServiceInstance
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidData: (Microsoft.Share…ServiceInstance:SPCmdletAddDist…ServiceInstance) [Add-SPDistributedCacheServiceInstance], InvalidOperationException
    + FullyQualifiedErrorId : Microsoft.SharePoint.PowerShell.SPCmdletAddDistributedCacheServiceInstance

    # now delete the old instance
    PS C:\Users\dwesterdale> Remove-SPDistributedCacheServiceInstance

    PS C:\Users\dwesterdale> Add-SPDistributedCacheServiceInstance

    PS C:\Users\dwesterdale> Start-CacheCluster

    (This shows an 'error' because the host has already started, but I think there are no real issues.)

    Thereafter I unregister the dummy entry I mentioned above ..
    AppFabric service is now happy!

  8. Daryl says:

    Great post, like Daniel said above – this is gold!

  9. Sachindra says:

    This post really nails this problem. Great. Why doesn’t Microsoft have blogs like this?

    • MMMan says:

      My response would be that’s why they have the MVP program. It helps them get the best of the community contributors and reward them by saying, you helped us out and figured this out on our behalf.

  10. Yas Mad says:

    Excellent post; it helped solve the exact same issue.

  11. Jon says:

    Solved my problem! Thanks for taking the time to blog about this!

  12. Chad says:

    Colin – thank you, the Export-CacheClusterConfig helped immensely. I used the MS powershell to change the Cache Svc from the farm account to a managed service account, and everything was fine until I removed the farm account from local admin – at which point, the Dynamic Cache service crashed continually.
    When I ran the export I found that 2 groups were involved: securityProperties>

    The managed account was already in the WSS_WPG group but not in WSS_ADMIN_WPG
    I added the account and all is well.
    That still doesn’t explain why the farm account being in the local admin group would affect the DCS (which was running under another service account).

  13. Simon C says:

    Great write-up, helped me enormously. I had two cache hosts in the cluster, one of which was starting (forever) and the other down. Exporting the config, manually matching up the service account and port values, and then importing the single file to both hosts did the trick.
    Interestingly, my dodgy config came from an AutoSPInstaller installation of the service!

  14. Ravi says:

    Excellent post and great time and life saver 🙂

    It did help me understand as well as troubleshoot the problem. My problem was a bit different. One of the hosts in the cluster was suffering from ping loss, and that had put the services down.

    Is it okay to have only one host running the cache service while I disable the Windows service on the others?

    Thank you very much Colin!!

  15. Ankit says:

    Hi,
    I am working on Windows Server AppFabric. I have been trying to add a 2nd cache host to my cluster.
    What I did:
    I created a cluster and added my local machine as a cache host on cache port 22233.
    Then I installed AppFabric on a 2nd machine and, while configuring it, joined it to the previous cluster.
    When I used the command Get-CacheHost, it showed my machine with service status UP and the 2nd machine with service status UNKNOWN.
    Also, when I viewed the config of each host, both were shown as lead host.

    Please help me in adding 2nd cache host in a cluster.

  16. ghost says:

    Hi
    I’m missing the AppFabric services in SharePoint.
    Will this help?

  17. Paul Gemme says:

    Colin,

    Excellent Post… Thanks for sharing the knowledge.
    This helped me solve a rather perplexing issue!

  18. Marcel says:

    Hello – thank you for your article which almost helped me. 😉 I have 2 servers in a cluster where one has a status of UNKNOWN. The App Fabric service stops and disables itself immediately on being started. The event viewer shows a failure in KERNELBASE.dll. My errors have resisted your fixes. Any ideas?
    1. Get-CacheHostConfig : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found
    2. Register-CacheHost : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found
    3. Start-CacheHost : ErrorCode:SubStatus:The requested name is valid, but no data of the requested type was found

    • Marcel says:

      I must use the NetBIOS name of the server, so if your server name is over 15 characters, truncate it in the PowerShell commands. Also check you have a DNS entry for the shorter name.

  19. […] itgroove.net: Fixing the AppFabric Cache Cluster in SharePoint 2013 […]

  20. Karim says:

    You are a super hero.

  21. Jeff Fink says:

    Excellent article, thank you very much!

  22. Arthur says:

    Thank you!

  23. Juha Metsäkallas says:

    This all requires that your cluster somehow runs, i.e. there is at least one server that is OK. If that is not the case, say you have a one-server SharePoint farm and the distributed cache is broken on it, then the very first command “Use-CacheCluster” already fails (“Failed to connect hosts in the cluster.”). I’m looking for help on how to rebuild the cluster from absolute scratch, where the original, one and only SharePoint server has gone and you have a clone of it under a different name. (This was *not* my idea.)

    Any pointers appreciated,

    TIA


Colin Phillips

itgroove Alumni
