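Operations and Monitoring - Discussion Board

Title: A Software and Hardware Maintenance Checklist for System Administrators

Saturday, January 29, 2011, 09:14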

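With the Spring Festival holiday approaching, some system administrators have been asked by their bosses to write up a software and hardware maintenance checklist for the company. For operations staff who have never written this kind of document, that can be a real headache.

How should a system maintenance checklist be written?

This is not only a concern around long holidays. System administrators should make a habit of carrying out software and hardware maintenance against a checklist on a regular schedule (for example daily, weekly, and monthly).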

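In short, system maintenance mainly covers the following areas:

  1. Keep software and systems up to date. Updates usually contain bug fixes and security vulnerability fixes, so applying them protects you.
  2. Keep antivirus software updated and scan for viruses regularly.
  3. Check that your system monitoring data, of every kind, is being saved intact.
  4. Check that system backups are being saved intact. The importance of backups needs no further emphasis!
  5. Check the physical environment of the server room, such as temperature and humidity.
  6. Check the state of the hard disks/RAID, disk usage, and whether there are bad sectors.
  7. ……

In a sense, the system maintenance checklist should be an iron rule that every system administrator must follow.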
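The daily/weekly/monthly routine described above can be sketched as a small schedule lookup. This is only an illustration: the task names are paraphrased from the list, and the convention that weekly tasks fall on Monday and monthly tasks on the first of the month is an assumption, not something the original post prescribes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    name: str
    frequency: str  # "daily", "weekly", or "monthly"


# Hypothetical schedule; the frequencies are illustrative assumptions.
CHECKLIST = [
    Task("Check that backups were saved intact", "daily"),
    Task("Check disk/RAID status and disk usage", "daily"),
    Task("Update antivirus definitions and run a scan", "weekly"),
    Task("Apply software and system updates", "monthly"),
]


def tasks_due(today, checklist=CHECKLIST):
    """Return the names of the tasks due on the given date.

    Assumed convention: weekly tasks run on Monday, monthly tasks on
    the first day of the month.
    """
    due = []
    for task in checklist:
        if task.frequency == "daily":
            due.append(task.name)
        elif task.frequency == "weekly" and today.weekday() == 0:
            due.append(task.name)
        elif task.frequency == "monthly" and today.day == 1:
            due.append(task.name)
    return due


for name in tasks_due(date.today()):
    print(name)
```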

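As for concrete checklists, quite a few vendors (Microsoft and IBM in particular) publish reference documents for software and hardware maintenance checklists. Unfortunately, most of them have not been translated into Chinese (which is also why it matters that technical people learn English well: too many manuals are English only). Excerpts from some of these documents follow for reference.

Microsoft BizTalk Server Maintenance Checklist Reference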

Daily Checklist

Step: Check for failed disks in the hardware RAID (reliability check).
Reference: "View Disk Properties" in the Windows Server 2003 product Help at http://go.microsoft.com/fwlink/?linkid=104161

Step: Check for messages requiring manual intervention, such as suspended messages (reliability check).
Reference: For information about manually checking for suspended messages, see "Investigating Orchestration, Port, and Message Failures" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?linkid=104169
Reference: For information about performing automated monitoring using Microsoft Operations Manager 2005, see "Suspended Message Alerts" at http://go.microsoft.com/fwlink/?linkid=105059

Step: Check the event logs for errors and warnings (administration check).
Reference: BizTalk Server 2006 R2 error and warning events are saved in the application log. The event source is "BizTalk Server 2006". We recommend that you monitor the event log using an automated solution such as Microsoft System Center Operations Manager. For more information, see Monitoring with MOM 2005 or Operations Manager 2007.
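As a rough illustration of the event log check, the sketch below filters a list of already-exported log records down to errors and warnings from the "BizTalk Server 2006" event source. The record layout (dictionaries with `source`, `level`, and `message` keys) is an assumption made for the example; it is not the format of any particular export tool.

```python
def biztalk_problems(events, source="BizTalk Server 2006"):
    """Keep error and warning events logged by the given event source."""
    return [
        e for e in events
        if e["source"] == source and e["level"] in ("Error", "Warning")
    ]


# Hypothetical records, e.g. parsed from an exported application log.
sample = [
    {"source": "BizTalk Server 2006", "level": "Error",
     "message": "Message suspended"},
    {"source": "BizTalk Server 2006", "level": "Information",
     "message": "Host instance started"},
    {"source": "MSSQLSERVER", "level": "Warning",
     "message": "Unrelated warning"},
]

for event in biztalk_problems(sample):
    print(event["level"], event["message"])
```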

 

Weekly Checklist

Step: Ensure that each host has an instance running on at least two physical BizTalk servers (reliability check).
Reference: High Availability for BizTalk Hosts

Step: Ensure that each receive location is redundant (reliability check).
Reference: Scaling Out Receiving Hosts

Step: Ensure that the SQL Server Agent service is running on the SQL server (administration check).

Step: Ensure that all SQL Server jobs related to BizTalk Server are working properly (administration check).

Step: Ensure that the SQL Server jobs responsible for backing up BizTalk Server databases are running normally (administration check).

Step: Ensure that the latest security updates are installed (security check).
Reference: Microsoft Update site at http://update.microsoft.com/microsoftupdate/v6/default.aspx

Step: Analyze weekly performance monitoring logs against baseline and thresholds (performance check).

Step: Ensure that the system is not experiencing frequent auto-growth of BizTalk Server databases (performance check).

Step: Run SQL Server Profiler during high load to check for long response times and high resource usage (performance check).
Reference: "Using SQL Server Profiler" in the SQL Server 2005 Books Online at http://go.microsoft.com/fwlink/?LinkID=106720

Step: Ensure that message batching for all adapters is appropriate for resource consumption or latency (performance check).

Step: Ensure that the large message threshold is appropriate for resource consumption (performance check).
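The step "analyze weekly performance monitoring logs against baseline and thresholds" could look roughly like the following. The counter names, the sample values, and the 20% tolerance are all placeholders chosen for the example; a real baseline would come from your own measurements.

```python
from statistics import mean


def counters_over_baseline(samples, baseline, tolerance=0.20):
    """Return {counter: (weekly_mean, baseline_value)} for every counter
    whose weekly mean exceeds its baseline by more than `tolerance`
    (expressed as a fraction of the baseline)."""
    flagged = {}
    for counter, values in samples.items():
        weekly_mean = mean(values)
        if weekly_mean > baseline[counter] * (1 + tolerance):
            flagged[counter] = (weekly_mean, baseline[counter])
    return flagged


# Hypothetical weekly samples and baseline values.
weekly_samples = {
    "cpu_percent": [40, 55, 70, 75],
    "disk_queue_length": [1.0, 1.2, 0.8, 1.0],
}
baseline_values = {"cpu_percent": 45, "disk_queue_length": 1.0}

for name, (observed, expected) in counters_over_baseline(
        weekly_samples, baseline_values).items():
    print(f"{name}: weekly mean {observed} vs baseline {expected}")
```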

 

Monthly Checklist

Step: Ensure the master secret key is backed up and readily available on offline storage (reliability check).
Reference: How to Back Up the Master Secret

Step: Ensure that failover for all clustered services has been tested (reliability check).
Reference: How to Test Group Failover

Step: Ensure that the Enterprise SSO service is clustered (reliability check).
Reference: Clustering the Master Secret Server

Step: Ensure that the BizTalk Server databases are clustered under SQL Server services (reliability check).
Reference: Clustering the BizTalk Server Databases

Step: Ensure that at least two physical BizTalk servers are part of the BizTalk group (reliability check).
Reference: How to Ensure Multiple Servers Are Part of a BizTalk Group

Step: Determine whether any unstable code is being used, and if so, use separate hosts (reliability check).
Reference: High Availability for BizTalk Hosts

Step: Perform functional testing of all new BizTalk applications (reliability check).

Step: Determine whether there are any unnecessary BizTalk applications, artifacts, and configurations (administration check).
Reference:
  • Remove all unnecessary BizTalk applications, artifacts, and configurations.
  • For more information about removing a BizTalk application or artifact using the BTSTask command-line tool, see "RemoveApp Command" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkID=106721 .
  • For more information about removing an artifact from an application using either the BizTalk Server Administration console or the BTSTask command-line tool, see "How to Remove an Artifact from an Application" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106722 .

Step: Check the BizTalk Server Administration console for any non-approved changes (administration check).
Reference: "Using the BizTalk Server Administration Console" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106723 .

Step: Check BTSNTSvc.exe.config for any non-approved modifications (administration check).
Reference: "BTSNTSvc.exe.config File" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106724 .
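One simple way to notice non-approved modifications to a file such as BTSNTSvc.exe.config is to compare its hash with a digest recorded when the last change was approved. The sketch below is an assumption about how that could be done with Python's standard library; it is not a procedure from the Microsoft documentation, and where the approved digest is stored is left to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_unmodified(path, approved_digest):
    """True if the file still matches the digest recorded at approval time."""
    return sha256_of(path) == approved_digest
```

Record `sha256_of(path)` after each approved change, then call `is_unmodified(path, recorded_digest)` during the monthly check. A mismatch only says that the file changed, not what changed or who changed it.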

Step: Check the BizTalk Server-related registry keys for any non-approved modifications (administration check).
Reference: "Windows registry information for advanced users" article at http://support.microsoft.com/kb/256986

Step: Run the Best Practices Analyzer for BizTalk Server (administration check).
Reference: "BizTalk Server 2006 Best Practices Analyzer" article at http://go.microsoft.com/fwlink/?LinkId=83317

Step: Ensure that the latest service packs and updates are installed (administration and security check).
Reference: Microsoft Update site at http://update.microsoft.com/microsoftupdate/v6/default.aspx

Step: Ensure that the artifacts for different trading partners are not installed on the same host (security check).
Reference: Configuring Hosts and Host Instances

Step: Ensure that BizTalk Server is using only domain-level users and groups (security check).
Reference: "Domain Groups" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106725 .

Step: Ensure that the MSDTC Security Configuration is enabled (security check).
Reference: "Set the appropriate MSDTC Security Configuration options on Windows Server 2003 SP1 and Windows XP SP2" entry in "Troubleshooting Problems with MSDTC" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkID=101609 .

Step: Determine whether the BizTalk Server cache refresh interval needs to be increased (performance check).
Reference: How to Adjust the Cache Refresh Interval

Step: Determine whether the throttling options of each host need to be adjusted (performance check).
Reference: Inbound Host Throttling
Reference: Outbound Host Throttling

Step: Determine whether unnecessary tracking is enabled, such as orchestration, shape, and Business Rule Engine (BRE) event tracking (performance check).

Step: Determine whether you are using a dedicated host for tracking maintenance (performance check).
Reference: How to Use a Dedicated Host for Tracking Maintenance

Step: Determine whether the default XML send pipeline is being used instead of the PassThrough send pipeline (performance check).
Reference: "Managing Send Ports Using BizTalk Explorer" in BizTalk Server 2006 R2 Help at http://go.microsoft.com/fwlink/?LinkId=106727 .

Step: Check the BizTalk Server database sizes for an increasing trend (performance check).

Step: Determine whether the system is encountering database contention (performance check).
Reference: For more information about avoiding contention in the MessageBox database, see Avoiding Disk Contention.
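For the step "check the BizTalk Server database sizes for an increasing trend", one possible approach is to fit a straight line through regularly recorded sizes and look at its slope. The sketch assumes you already collect one size reading per interval (for example one per week); how the sizes are obtained is outside its scope.

```python
def growth_per_sample(sizes_mb):
    """Least-squares slope of size against sample index (MB per sample).

    A clearly positive value suggests the database is growing.
    """
    n = len(sizes_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(sizes_mb) / n
    numerator = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(sizes_mb)
    )
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical weekly size readings in MB.
print(growth_per_sample([100, 110, 120, 130]))
```

A slope is less sensitive to a single unusual reading than comparing only the first and last values, which is the reason for using it here.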

 

 

IBM Lotus Domino Server Maintenance Checklist

Task                                                      Frequency
Back up the server                                        Daily, weekly, monthly
Monitor mail routing                                      Daily
Run Fixup to fix any corrupted databases *                At server startup and as needed
Monitor Administration Requests database (ADMIN4.NSF)     Weekly
Monitor databases that need maintenance                   Weekly
Monitor replication                                       Daily
Monitor modem communications                              Daily
Monitor memory                                            Monthly
Monitor disk space                                        Daily, weekly, monthly
Monitor server load                                       Monthly
Monitor server performance                                Monthly
Monitor Web server requests                               Monthly
Monitor server first domino servers                       Daily

Saturday, January 29, 2011, 09:16

SQL Server Hardware Checklist

 

The Basics    
Hardware Manufacturer:    
Model Number:    
Serial Number:    
Tower/Rack/Blade    
Physical Location of Server:    
Purchase Date:    
Warranty/Service Contract Number:    
Warranty/Service Telephone Number:    
Date Warranty Expires:    
     
CPU    
Number of CPU Sockets:    
Number of Installed CPUs:    
CPU Model:    
CPU GHz Speed:    
Number of Cores per CPU:    
Type of Hyperthreading:    
Is Hyperthreading on or off:    
CPU L2 Cache Size:    
CPU Bus Speed:    
Motherboard BIOS Version:    
Is BIOS Version Current:    
     
Memory    
Current Amount of RAM:    
Additional RAM Capacity Available:    
Number of Memory Slots Used:    
Number of Memory Slots Available:    
ECC Memory:    
     
Network Adapter    
Hardware Manufacturer:    
Model Number:    
Speed:    
Number of Ports per Card:    
Number of Cards:    
BIOS Version Number:    
Is BIOS Version Current:    
NIC Speed/Duplex Setting:    
Is the NIC Power Saving Feature Off:    
     
Storage    
Type: Local, DAS, SAN, Combo:    
     
Local/Integrated RAID Controller    
Number of Local RAID Controllers:    
Type: SCSI, SAS, etc.    
Controller Hardware Manufacturer:    
Number of Ports:    
Controller Model Number:    
Controller Cache Size:    
Is There a Cache Battery:    
Is Write Back Caching On:    
Controller BIOS Version Number:    
Is Controller BIOS Version Current:    
     
External RAID Controllers    
Number of External RAID Controllers:    
Type: SCSI, SAS, etc.    
Controller Hardware Manufacturer:    
Controller Model Number:    
Number of External Ports:    
Controller Cache Size:    
Is There a Cache Battery:    
Is Write Back Caching On:    
Controller BIOS Version Number:    
Is Controller BIOS Version Current:    
     
Local Disk Configuration    
RAID Configuration:    
Number of Physical Drives:    
Physical Dimension of Drives:    
Drive Capacity:    
Drive Speed/RPM:    
Total Available Disk Space:    
     
HBAs for External Storage    
Number of HBAs:    
Type: iSCSI, Fibre Channel, etc:    
Type of Connectors:    
HBA Hardware Manufacturer:    
HBA Model Number:    
HBA BIOS Version Number:    
Is HBA BIOS Version Current:    
     
DAS Disk Configuration    
RAID Configuration:    
Number of Drives:    
Physical Dimension of Drives:    
Drive Capacity:    
Drive Speed/RPM:    
Total Available Disk Space:    
     
SAN Disk Configuration    
SAN Manufacturer:    
SAN Model:    
iSCSI, Fibre Channel, etc:    
SAN Cache Capacity:    
SAN Software Version:    
Is SAN Software Current:    
Number of Attached LUNs:    
RAID Configuration per LUN:    
Number of Drives Used per LUN:    
Capacity of Drives Used in LUNs:    
Speed of Drives Used in LUNs:    
Available Disk Space per LUN:    
Are LUNs Shared or Dedicated:    
     
High Availability    
Redundant Power Supplies:    
Redundant NICs:    
Redundant Controllers:    
All Components Connected to UPS:    
Is Server Physically Secure:    
If Cooling Required, is it Redundant:    
     
Clustering    
Number of Cluster Nodes:    
Number of Active Nodes:    
Number of Passive Nodes:    
Type of Quorum:    
Type of Shared Storage:    
Are HBAs Redundant:    
Are Storage Switches Redundant:    
Are NIC Switches Redundant:    
Are NICs Redundant:    
     
Backup    
Tape Drive: Internal/External:    
Tape Drive Manufacturer:    
Tape Drive Model:    
Local Disk:    
DAS Disk:    
SAN Disk:    
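A form like the one above is only useful once it is filled in, so a small helper that lists the entries still left blank can be handy. The sketch below assumes the completed form is kept as a nested dictionary (section, then field); that storage format and the sample values are hypothetical, while the field names are taken from the form.

```python
def blank_fields(inventory):
    """Return "Section / Field" labels for entries that have no value yet."""
    return [
        f"{section} / {field}"
        for section, fields in inventory.items()
        for field, value in fields.items()
        if value is None or not str(value).strip()
    ]


# Hypothetical, partially completed record for one server.
server = {
    "The Basics": {"Hardware Manufacturer": "ExampleCorp",
                   "Serial Number": ""},
    "CPU": {"Number of CPU Sockets": 2, "CPU Model": None},
    "Memory": {"Current Amount of RAM": "16 GB", "ECC Memory": "Yes"},
}

for label in blank_fields(server):
    print(label)
```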

Windows Server 2003 System Maintenance Checklist

 

Daily Operations Checklist

Checklist: Performing Physical Environmental Checks

Use this checklist to ensure that physical environment checks are completed.

Task:

·         Verify that environmental conditions are tracked and maintained.

·         Check temperature and humidity to ensure that environmental systems such as heating and air conditioning settings are within acceptable conditions, and that they function within the hardware manufacturer's specifications.

·         Verify that physical security measures such as locks, dongles, and access codes have not been breached and that they function correctly.

·         Ensure that your physical network and related hardware such as routers, switches, hubs, physical cables, and connectors are operational.
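The temperature and humidity check above amounts to comparing readings against an allowed range. A minimal sketch follows; the limits shown are placeholders, since the real values should come from the hardware manufacturer's specifications, and the way readings are collected is not covered.

```python
def out_of_range(readings, limits):
    """Return the names of readings outside their (low, high) limits."""
    return [
        name for name, value in readings.items()
        if not limits[name][0] <= value <= limits[name][1]
    ]


# Placeholder limits; replace them with the manufacturer's figures.
LIMITS = {"temperature_c": (18.0, 27.0), "humidity_percent": (40.0, 60.0)}

# Hypothetical readings from the server room.
print(out_of_range({"temperature_c": 31.5, "humidity_percent": 45.0}, LIMITS))
```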

Checklist: Check Backups

Task:

·         Make sure that the recommended minimum backup strategy of a daily online backup is completed.

·         Verify that the previous backup operation completed.

·         Analyze and respond to errors and warnings during the backup operation.

·         Follow the established procedure for tape rotation, labeling, and storage.

·         Verify that the transaction logs were successfully purged (if your backup type is purging logs).

·         Make sure that backups complete under service level agreements (SLA).
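Part of the backup checklist, namely confirming that a daily backup was actually produced, can be approximated by checking the age of the newest backup file. The sketch assumes backups land as files in a single directory and match a `*.bak` pattern; both are assumptions for the example. It does not verify that the backup is free of errors or restorable, which still has to be checked separately.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, pattern="*.bak", now=None):
    """Age in hours of the newest matching file, or None if none exist."""
    files = list(Path(backup_dir).glob(pattern))
    if not files:
        return None
    newest_mtime = max(f.stat().st_mtime for f in files)
    now = time.time() if now is None else now
    return (now - newest_mtime) / 3600


def daily_backup_ok(backup_dir, max_age_hours=24, **kwargs):
    """True if a backup file was written within the last `max_age_hours`."""
    age = newest_backup_age_hours(backup_dir, **kwargs)
    return age is not None and age <= max_age_hours
```

The `now` parameter exists only so the check can be exercised with a fixed clock.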

Checklist: Check CPU and Memory Use

Use this checklist to record the sampling time of each counter.

Checklist: Check Disk Use

Follow the checklist and record the drive letter, designation, and available disk space.

Task

·         Create a list of all drives and label them in three categories: drives with transaction logs, drives with queues, and other drives.

·         Check disks with transaction log files.

·         Check disks with SMTP queues.

·         Check other disks.

·         Use server monitors to check free disk space.

·         Check performance on disks.

 

Drive Letter      Designation      Available space (MB)      Available % free
Your data here
Your data here
Your data here

Designation is one of: drives with transaction logs, drives with queues, and other drives.
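The rows of the disk table can be filled in with Python's standard `shutil.disk_usage`. In this sketch the list of paths and their designations is made up; on a Windows server the paths would be drive roots such as `C:\`, labelled according to the three categories in the checklist.

```python
import shutil


def disk_row(path, designation="other"):
    """Build one table row: drive, designation, free MB, free percent."""
    usage = shutil.disk_usage(path)
    return {
        "drive": path,
        "designation": designation,
        "available_mb": usage.free // (1024 * 1024),
        "available_percent": round(100 * usage.free / usage.total, 1),
    }


# Hypothetical drive list; "." stands in for a real drive root.
for path, designation in [(".", "other")]:
    row = disk_row(path, designation)
    print(f'{row["drive"]:<8}{row["designation"]:<20}'
          f'{row["available_mb"]:>12}{row["available_percent"]:>8}')
```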

 

 

 

 

Checklist: Event Logs

Check event logs using the following checklist.

Task

·         Check application and system logs on the server to see all errors.

·         Check application and system logs on the Exchange server to see all warnings.

·         Note repetitive warning and error logs.

·         Respond to discovered failures and problems.
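To "note repetitive warning and error logs", the events can be counted by source and event ID. The record layout and the sample entries below are assumptions for illustration and do not correspond to a specific export format.

```python
from collections import Counter


def repeated_events(events, min_count=2):
    """Return ((source, event_id), count) pairs seen at least
    `min_count` times, most frequent first."""
    counts = Counter((e["source"], e["event_id"]) for e in events)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical log records.
log = [
    {"source": "Disk", "event_id": 7},
    {"source": "Disk", "event_id": 7},
    {"source": "Disk", "event_id": 7},
    {"source": "W32Time", "event_id": 36},
]

for (source, event_id), n in repeated_events(log):
    print(f"{source} event {event_id} occurred {n} times")
```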

Weekly Maintenance Checklist

Checklist: Create Reports

Use this checklist to create status reports to help with capacity planning, service level agreement (SLA) reviews, and performance analysis.

Task: 

·         Use daily data from event log and System Monitor to create reports.

·         Report on disk usage.

·         Create reports on memory and CPU usage.

·         Generate uptime and availability reports.
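The uptime and availability figure for a weekly report is usually computed as the share of the reporting period without outages. A minimal sketch, assuming outage durations are already known in hours:

```python
def availability_percent(period_hours, outage_hours):
    """Percentage of the reporting period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period must be positive")
    downtime = sum(outage_hours)
    if not 0 <= downtime <= period_hours:
        raise ValueError("downtime must lie between 0 and the period length")
    return 100 * (period_hours - downtime) / period_hours


# Hypothetical week with two outages of 1.5 and 0.5 hours.
print(round(availability_percent(7 * 24, [1.5, 0.5]), 2))
```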

Checklist: Incident Reports

Use this checklist to create incident reports.

Task 

·         List the top generated, resolved, and pending incidents.

·         Create solutions for unresolved incidents.

·         Update reports to include new trouble tickets.

·         Create a document repository for troubleshooting guides and post-mortems about outages.
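Listing the top generated, resolved, and pending incidents is essentially a counting exercise. The sketch below assumes each ticket is a dictionary with `category` and `status` keys; that structure and the sample tickets are hypothetical.

```python
from collections import Counter


def incident_summary(incidents, top=3):
    """Count incidents by status and list the most frequent categories."""
    by_status = Counter(i["status"] for i in incidents)
    top_categories = Counter(
        i["category"] for i in incidents
    ).most_common(top)
    return by_status, top_categories


# Hypothetical trouble tickets for the week.
tickets = [
    {"category": "disk", "status": "resolved"},
    {"category": "disk", "status": "pending"},
    {"category": "mail", "status": "resolved"},
]

by_status, top_categories = incident_summary(tickets)
print(dict(by_status))
print(top_categories)
```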

Checklist: Antivirus Defense

Use this checklist to perform your antivirus defense.

Task 

·         Perform a virus scan on each computer.

·         Check for antivirus definition updates promptly.

Checklist: Status Meeting

Use this checklist to conduct weekly status meetings during which the tasks are reviewed.

Task 

·         Server and network status for the overall organization and segments.

·         Organizational performance and availability.

·         Overview reports and incidents.

·         Risk analysis and evaluation including upcoming changes.

·         Capacity, availability, and performance reviews.

·         Service level agreement (SLA) performance, and review items that have not met target objectives.

Source: http://51cto.com/
