What does a DBA need to check for daily, weekly, monthly, quarterly and yearly.
Introduction
This is just a quick checklist of all the things that a DBA needs to monitor on a daily, weekly, monthly, quarterly and yearly basis. Of course there are events that you want to know about immediately, for those events you can setup alerts or you can invest in a tool which will monitor SQL Server and alert you when something goes wrong
Monitor Daily
Check the SQL Server error log
Check the error log daily or better yet several times per day. If you can set up alerts when things get written to the error log with a certain severity then try to do so
Check for failed backups
The last thing you want to know is that your backup process has failed the day that you need to restore a database
Check for free disk/file space
You don't want users to tell you that their transactions are failing because the log or data file is full
Error log/windows event logs
Catch problems early, don't wait for the whole system to crash, if you catch it early enough you could prevent a disaster
Monitor Buffer cache hit ratio and Page life expectancy
Both of these counters are an indicator that you need more memory
Check for failed SQL Agent jobs
Make sure that you get notified when a job fails, some jobs might be time sensitive or a pain in the neck to run the next day because they might be configured in such a way that they expect to work only for the current date
Monitor Deadlocks
You can capture deadlocks in the errorlog by enabling the following trace flag DBCC TRACEON (3605,1204,1222,-1)
Check the error log daily or better yet several times per day. If you can set up alerts when things get written to the error log with a certain severity then try to do so
Check for failed backups
The last thing you want to know is that your backup process has failed the day that you need to restore a database
Check for free disk/file space
You don't want users to tell you that their transactions are failing because the log or data file is full
Error log/windows event logs
Catch problems early, don't wait for the whole system to crash, if you catch it early enough you could prevent a disaster
Monitor Buffer cache hit ratio and Page life expectancy
Both of these counters are an indicator that you need more memory
Check for failed SQL Agent jobs
Make sure that you get notified when a job fails, some jobs might be time sensitive or a pain in the neck to run the next day because they might be configured in such a way that they expect to work only for the current date
Monitor Deadlocks
You can capture deadlocks in the errorlog by enabling the following trace flag DBCC TRACEON (3605,1204,1222,-1)
Monitor Weekly
Cycle The SQL Server Error Log
You don't want files that are huge, keep your log files small by recycling them once a week
Test your Full Recovery model by restoring backups
This is similar to make sure that your backups work, how do you know that files even if completed successfully can actually be restored
DBCC checks
Make sure that you do a DBCC CHECKDB regularly, this will catch corrupted tables and indexes and the overall health of your databases
Update statistics, check if statistics are stale
If you don't have auto update statistics enabled then you need to make sure that your statitics are not stale, otherwise your queries might be slow
Index maintenance
Make sure that indexes are not fragmented,
You don't want files that are huge, keep your log files small by recycling them once a week
Test your Full Recovery model by restoring backups
This is similar to make sure that your backups work, how do you know that files even if completed successfully can actually be restored
DBCC checks
Make sure that you do a DBCC CHECKDB regularly, this will catch corrupted tables and indexes and the overall health of your databases
Update statistics, check if statistics are stale
If you don't have auto update statistics enabled then you need to make sure that your statitics are not stale, otherwise your queries might be slow
Index maintenance
Make sure that indexes are not fragmented,
Monitor Monthly
Disaster Recovery testing
Does your fail over strategy work, have you tried a mirror fail over? What happens if your whole datacenter goes down, do you have redundancy?
Does your fail over strategy work, have you tried a mirror fail over? What happens if your whole datacenter goes down, do you have redundancy?
Check for Service Packs and Cumulative Updates
Make sure that your systems are up to date. Remember the SQL Slammer worm, people who did not apply the latest SQL Server Service Pack got hit. Besides security, Service Pack contain bug fixes, improvements and sometimes even new features. Before you apply Service Packs and Cumulative Updates to you production servers make sure that you have tested them on your staging/QA servers.
Make sure that your systems are up to date. Remember the SQL Slammer worm, people who did not apply the latest SQL Server Service Pack got hit. Besides security, Service Pack contain bug fixes, improvements and sometimes even new features. Before you apply Service Packs and Cumulative Updates to you production servers make sure that you have tested them on your staging/QA servers.
Monitor Quarterly
Capacity planning
Is your server in good shape to handle the extra data that will be stored in the next 6 months, do you have enough disk space, are the CPUs fast enough to hanle the load, do you need more CPUs? These are all question that you have to ask yourself before it is too late
Perfmon metrics (trending)
Did you create a baseline for your server, do you know what a normal load is? The Creating a baseline for SQL Server has a couple of ideas
Security audit
Are users still allowed in that should not have, are users account still on the server even though they have left the company?
Is your server in good shape to handle the extra data that will be stored in the next 6 months, do you have enough disk space, are the CPUs fast enough to hanle the load, do you need more CPUs? These are all question that you have to ask yourself before it is too late
Perfmon metrics (trending)
Did you create a baseline for your server, do you know what a normal load is? The Creating a baseline for SQL Server has a couple of ideas
Security audit
Are users still allowed in that should not have, are users account still on the server even though they have left the company?
Monitor Yearly
Data center/Hardware/Server planning
Do you have enough rack space and servers for the next two years?
Do you have enough rack space and servers for the next two years?
No comments:
Post a Comment