Most system admin tasks include some sort of backup or disaster recovery requirement. Whether your environment is virtual or physical, there are few enterprise-quality solutions that fit your needs exactly. These are my opinions, so use them to help make your selection, but spread your research across other sources on the web as well. I do not go into great depth here; I simply share my experience with Veeam over the years. The standard requirements below are the basis for choosing any backup/recovery solution.
- Ease of implementation
- Backup recovery time
- Physical/Virtual machine backups
- Disaster Recovery/Replication Fail-over
- 321 Rule
Ease of Implementation
The ease of implementation should always be included in your decision making. Although it isn’t a major factor, it can provide a streamlined path into a solid backup/recovery solution.
Veeam is relatively easy to implement, with a bunch of caveats. For starters, you need to provide a good amount of resources if you have a large environment. This includes your DR site, so whatever you build in Production should also be built in DR. If your guests are in different domains, have different admin accounts, run older operating systems, or are just finicky in general, give Veeam time. Initially you may see a bunch of failures, strange errors, and resource contention. These things take time to fix, and manually running jobs doesn’t really simulate real life. So plan on a week minimum, maybe more, for your backups to level out.
As for replication, this can take even longer. Bandwidth, disk I/O, and free resources all play a huge role when using Veeam Replication. Plan on at least 2 weeks for replication to level out.
Backup Recovery Time
Now comes the million-dollar question: how quickly can I recover a machine? It would be nice to provide a real number, but the true answer is that it depends. Your backup repository is the key limiting factor when it comes to recovery time. If you are backing up to slow disk, recovery will naturally be just as slow. Instant VM Recovery is probably the best method to recover a critical machine in full. How long does it take, you may ask? We usually see it take around 5-10 minutes from start to finish if everything is local. But this is just to get the machine back online. The full recovery takes however long it takes to migrate the data from the backup repository storage to the production VMFS datastore. While this occurs, your machine keeps running as usual, so I suggest making sure your backup repository storage has enough resources to run guest VMs during this process.
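To put a rough number on "it depends", here is a minimal sketch of the arithmetic behind the migration step. It assumes the migration time is simply the VM size divided by a sustained throughput that you measure yourself; the function name and the example sizes and rates are made up for illustration and are not figures from Veeam or from my environment.

```python
def full_restore_hours(vm_size_gb: float, throughput_mb_per_s: float) -> float:
    """Rough hours to move a VM's data from the backup repository to production storage.

    Assumption: time = size / sustained throughput. Real migrations also depend on
    disk contention, network load, and how busy the guest is while it runs.
    """
    if vm_size_gb < 0:
        raise ValueError("vm_size_gb must not be negative")
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    seconds = vm_size_gb * 1024 / throughput_mb_per_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 500 GB guest at a few hypothetical repository speeds.
    for rate in (50, 200, 800):
        print(f"500 GB at {rate} MB/s: {full_restore_hours(500, rate):.1f} h")
```

The point is only that the repository's sustained read speed, not the 5-10 minute boot, decides how long the machine lives on backup storage.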
I will not spend a lot of time talking about retention, as it can be very complex with Veeam. The simple answer is to throw the calendar out of the window. Veeam doesn’t keep data based on date. It treats each backup as a restore point, which in all reality is a snapshot. So 30 snapshots in one day equals 30 restore points, which will blow your 30-day retention away. If you need 30 days, just set your restore points to 30, but be aware that if the job runs more than once a day you will not get 30 days of retention. For example, if you have a backup job that runs 3 times a day and you have retention set to 30, you will only get a 10-day recovery window by default. There are other settings to create fulls from time to time and to build a more detailed retention policy. These can make things very complicated and prone to issues, so be prepared to become a Veeam guru. I typically lean towards the old Ronco tagline “Set it and forget it” as well as KISS (Keep It Simple Stupid). I have neither the time nor the resources to worry about backups, so I tend to stay out of the box as much as possible.
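The restore point math above is easy to get wrong in your head, so here is a small sketch of it. It assumes plain restore-point retention with evenly spaced runs and ignores periodic fulls and any more detailed retention policy; the function names are my own and are not Veeam settings.

```python
def retention_window_days(restore_points: int, runs_per_day: int) -> float:
    """Days of history you actually get from a restore-point count."""
    if restore_points < 0:
        raise ValueError("restore_points must not be negative")
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return restore_points / runs_per_day


def restore_points_needed(days: int, runs_per_day: int) -> int:
    """Restore points to configure if you want a given number of days."""
    if days < 0:
        raise ValueError("days must not be negative")
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return days * runs_per_day


if __name__ == "__main__":
    # The example from the text: 30 restore points, job runs 3 times a day.
    print(retention_window_days(30, 3))   # 10.0
    # To really keep 30 days with that schedule you would need 90 points.
    print(restore_points_needed(30, 3))   # 90
```

Keep in mind that raising the restore point count also raises the repository space you need, so check the storage before simply multiplying.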
Physical/Virtual Machine Backups
If you still have physical machines, Veeam may not be the right fit. However, Veeam has been working on its Endpoint backup solution to make it more reliable and usable for enterprise use. Does it work well enough, you ask? If you have only a few physical machines, yes, but any more may cause issues with your peace of mind. It does work, but I wouldn’t rely on it for critical systems for now.
Disaster Recovery/Replication Fail-over
So you need replication to your DR site? Veeam works really well, but if you need real-time recovery objectives it isn’t your solution. The shortest recovery window you will get is 15 minutes. This will go downhill fast depending on the resources, bandwidth, and size of your environment. If you have a few machines out of 100 that need 15 minutes, fine, that is doable, but your true recovery objectives will only be known once you test. High-change guests will have a hard time hitting the 15 minute mark. If you want to use Veeam for replication and need the 15 minute window for everything, I would look at moving replication to the storage layer instead of Veeam. Veeam only supports a few storage systems, so if you have one of them you are lucky. If not, I would look at something else like Zerto. The goal in my environment is to use Veeam for 3 years (we saved on costs by buying 3 years of support) and then move to Zerto. I also lacked the time and resources to move to Zerto first, as we needed a quick solution I was familiar with. Doing this has allowed me to focus on other areas of the business while we plan the implementation of Zerto.
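Before you test, you can at least sanity-check whether a guest has any chance of hitting the 15 minute mark. The sketch below assumes the only cost is pushing the changed data over the link at its full rated speed; it ignores compression, snapshot handling, and disk I/O, so treat the result as a best case. The function names and the sample numbers are hypothetical.

```python
def transfer_minutes(changed_gb: float, link_mbps: float) -> float:
    """Best-case minutes to send the changed data across the replication link."""
    if changed_gb < 0:
        raise ValueError("changed_gb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    megabits = changed_gb * 8 * 1000  # decimal GB -> megabits
    return megabits / link_mbps / 60


def fits_window(changed_gb: float, link_mbps: float, window_minutes: float = 15.0) -> bool:
    """True if the changes from one interval can be shipped before the next one starts."""
    return transfer_minutes(changed_gb, link_mbps) <= window_minutes


if __name__ == "__main__":
    # Hypothetical guests on a hypothetical 100 Mbps link.
    for changed in (5, 20):
        minutes = transfer_minutes(changed, 100)
        print(f"{changed} GB changed: {minutes:.1f} min, fits: {fits_window(changed, 100)}")
```

If a guest fails even this best-case check, no amount of tuning inside Veeam will get it to 15 minutes on that link.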
321 Rule
So what is the 321 rule? You should keep 3 different backup sets, on 2 different media types, with 1 being off-site. I conveniently do this in three ways. My normal 30 day backup is stored locally to provide the recovery objectives. My replicated data is my second copy, but I only keep a single restore point. My third copy is in the cloud (Veeam Cloud Copy with Singlehop), and I keep 7 days due to the cost. As you can see, this does follow the 321 rule. If I lose my production site, I have DR with the latest replicated data. If I lose my production backup data, I have 7 days in the cloud, which should be ample to recover something that was deleted. It is probably too much, but 7 days is what we chose. Veeam Cloud Copy does work, but be sure you don’t get mad at failures. It will fail regularly but picks right back up during the retry. Since the chance that we will ever need the cloud copy of the data is very low, I do not worry too much about failures unless they are consistent.
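If you want to check your own layout against the rule, it boils down to three simple tests. Here is a minimal sketch; the `BackupCopy` type and the `follows_321` function are names I made up, and the media labels in the example are illustrative guesses, so describe your own copies honestly when you try it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str      # free-form label, e.g. "local disk" or "cloud"
    offsite: bool


def follows_321(copies: list[BackupCopy]) -> bool:
    """3 copies, on at least 2 media types, with at least 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    # Roughly the layout described above; media labels are assumptions.
    copies = [
        BackupCopy("30 day local backup", media="local disk", offsite=False),
        BackupCopy("DR replica", media="dr storage", offsite=True),
        BackupCopy("7 day cloud copy", media="cloud", offsite=True),
    ]
    print(follows_321(copies))
```

Note that the check only counts copies; it says nothing about whether any of them can actually be restored, which still needs testing.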
Hopefully I have provided some good info to help you move to a decision. While Veeam does work, and works well, there are other options. I would stay away from Unitrends, Barracuda, Symantec, etc., as in my experience none of them are enterprise class. I have used them all, and you will hate yourself for spending the money on something that does not work consistently. If you have issues with Barracuda/Unitrends, their answer 99% of the time will be a reboot of the guest. Veeam and VMware also have a strong relationship that the others do not. Do you wonder why VMware still hasn’t made many changes to SRM? My take is that Veeam does it better and VMware would rather not compete. VMware has focused on beefing up its API to provide a stable platform for third-party tools. This is a good thing, as it gives us more resources to do our jobs successfully. If you have the money and time, I would strongly suggest looking at Zerto as well. It is more expensive, but it is the future.