While working at my last job, we began using a new product for our backups and replication called AppAssure. Currently owned by Dell, AppAssure is a utility along the same lines as EMC's Avamar or several other vendors' backup products that promise you the world but fall exceedingly short. Our main problem lay with several of our rollup jobs not completing. A missed rollup here and there would not have been an issue, because the rollups were performed every night, but it became increasingly problematic when the rollup jobs stopped working for roughly a year on several of our VM instances. Even with de-duplication we sat at several terabytes' worth of backups for roughly 50 virtual machines. This caused AppAssure to hang on certain processes, one of which was rollups, and on occasion stopped the GUI from responding. One surefire way of bringing the product back in order, however, was to fire up PowerShell and give the core processes a much-needed reboot. As for how we fixed the rollup jobs, that's a story for a whole other post.
First we tried to manually stop all of our processes through the AppAssure GUI, which, as hinted at earlier, was a complete bust. After giving up on that in about two minutes, I pulled up PowerShell in administrator mode and got to work.
All of the following is done in a PowerShell window running as administrator, which you can open by right-clicking PowerShell and clicking Run as administrator.
NOTE: All commands must be run on the core.
- First, import the PowerShell module that we will need to do the work.
  - Import-Module AppAssurePowerShellModule
- Pause protection on all machines.
  - Suspend-Snapshot -All
- Pause replication from one core to the other (if multiple cores are at work).
  - From current core to remote core:
    - Suspend-Replication -Outgoing -All
  - From remote core to current core:
    - Suspend-Replication -Incoming -All
- If you are using virtual standby for any of your machines, stop this process as well.
  - Suspend-Vmexport -All
- Now check that all of the current jobs have stopped running.
  - Get-ActiveJobs -All
- If active jobs are running, let them complete or try to cancel them. You can cancel any job in the Web UI except rollups or repository checks. This was always funny to me because guess what one of our leading points of contention was... that's right, rollups not completing. BUT DON'T YOU DARE STOP THEM!
- Once Get-ActiveJobs -All shows that all the processes have completed, you can stop the core process.
- First disable the core service. This isn't vital, but it can be very helpful if you restart the server and need to do work on AppAssure before the core starts itself back up.
  - Set-Service AppAssureCore -StartupType Disabled
- Stop the core service.
  - Stop-Service AppAssureCore
- Check the status of the core service.
  - Get-Service AppAssureCore
- Now this can take some time, so go out, have a coffee, pull up Netflix and watch a couple of mind-numbing episodes of your favorite show. Pull up YouTube and tell your coworkers you're trying to verify bandwidth throughput. Just don't be thinking this is going to go quickly. If all else fails you can kill the process; just know that you will have to sit through the mind-numbing awesomeness of the repository rebuilding itself when you pull everything back up, which can take FOOOOOOOORRRREEEEEEEVVVVEEER.
- Kill the process with fire.
  - Stop-Process -ProcessName Core.Service -Force
- Check the state of the process again just to make sure it's dead (an error saying the process cannot be found is the result you want).
  - Get-Process Core.Service
- Now you can go about your merry way. I generally restarted the server at this point and installed any updates it required. It really doesn't matter what you do here, since my problem was most often resolved by killing the core service in the first place.
- Now to restart everything. First re-enable the core service with a delayed start.
  - sc.exe config AppAssureCore start= delayed-auto
- Then start the core service.
  - Start-Service AppAssureCore
- From here it's basically just the opposite of what we have already done.
- Resume protection.
  - Resume-Snapshot -All
- Resume replication.
  - From current core to remote core:
    - Resume-Replication -Outgoing -All
  - From remote core to current core:
    - Resume-Replication -Incoming -All
- Resume virtual standby.
  - Resume-Vmexport -All
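If you end up doing this dance more than once, the steps above can be strung together into a script. What follows is only a rough sketch under a few assumptions, not anything AppAssure ships: the function names (Wait-Until, Stop-CoreGracefully, Start-CoreAndResume) and the timeout and poll values are made up for illustration. The AppAssure cmdlets are exactly the ones from the steps above. The one deliberate swap is that it calls the service object's Stop() method instead of Stop-Service, so that the script can apply its own timeout before reaching for Stop-Process. It also just prints the active jobs rather than trying to interpret them, since you still need to eyeball those yourself.

```powershell
# Sketch only. Wait-Until, Stop-CoreGracefully and Start-CoreAndResume are
# hypothetical helper names, and the timeout and poll values are arbitrary.

function Wait-Until {
    # Runs $Condition every $PollSeconds until it returns $true or the timeout
    # passes. Returns $true if the condition was met, $false on timeout.
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 3600,
        [int] $PollSeconds = 15
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

function Stop-CoreGracefully {
    param([int] $StopTimeoutSeconds = 3600)

    Import-Module AppAssurePowerShellModule

    # Pause protection, replication and virtual standby, as in the steps above.
    Suspend-Snapshot -All
    Suspend-Replication -Outgoing -All
    Suspend-Replication -Incoming -All
    Suspend-Vmexport -All

    # Print whatever is still running; deciding to wait or cancel is up to you.
    Get-ActiveJobs -All

    # Keep the core from coming back on a reboot, then ask it to stop.
    Set-Service AppAssureCore -StartupType Disabled
    $service = Get-Service AppAssureCore
    if ($service.Status -eq 'Running') { $service.Stop() }

    # Poll the service status instead of blocking with no upper bound.
    $stopped = Wait-Until -TimeoutSeconds $StopTimeoutSeconds -Condition {
        (Get-Service AppAssureCore).Status -eq 'Stopped'
    }

    if (-not $stopped) {
        # Last resort from the steps above; expect a repository rebuild later.
        Stop-Process -ProcessName Core.Service -Force
    }
    Get-Service AppAssureCore
}

function Start-CoreAndResume {
    # Re-enable the service with a delayed start, start it, then undo the
    # suspends from Stop-CoreGracefully.
    sc.exe config AppAssureCore start= delayed-auto
    Start-Service AppAssureCore

    Import-Module AppAssurePowerShellModule
    Resume-Snapshot -All
    Resume-Replication -Outgoing -All
    Resume-Replication -Incoming -All
    Resume-Vmexport -All
}
```

Usage would be Stop-CoreGracefully, then whatever reboot or patching you had planned, then Start-CoreAndResume, all from an administrator PowerShell window on the core. Pick a stop timeout you can live with, because a forced kill is what triggers the long repository rebuild.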
From here you can go back to the AppAssure GUI and verify that the repository recovery processes are being run. This part can take a while, as all the gremlins are putting all the information back into a usable state, and gremlins, as we know, work at their own pace and take union-mandated smoke breaks. After this has completed, the process that was seemingly stuck in a never-ending loop should be dead as a doornail, and things should be running again as the good lord Michael Dell intended.