Advanced Topics in Computer Systems

9/24/01

Anthony Joseph & Joe Hellerstein

 

Information Technology Disaster Planning, Management, and Recovery

Areas

·        Systems: HW, OS, Applications, …

·        Communications: Network, Phones, …

·        Data: storage, management, electronic versus non-electronic format

 

How to prepare?

    Some issues:

·        Real-time vs non-real-time business

o       Factory with just-in-time (JIT) inventory, financial, tax accounting

o       Healthcare versus fast-food

o       Governmental agencies

·        Amount and type of business data

o       Paper - what to do with it? Scan it!

o       Bits and bytes - how accessed?

§         Read-only or read/write?

§         Sharing?

·        Scope and scale of expected disaster

o       Local, state, regional

·        Type of disaster and disruption

o       Earthquake, Flood, Hurricane, Rolling power outage, Terrorist, …

·        Size of business

o       One location versus multi-national (can leverage other locations)

·        Type of communication

o       Telephone, fax, internet, cellular, wireless device

 

Types of preparation

·        Hot site (typically affordable only for larger companies)

o       Identical hardware (tape drive, backup software)

o       Can range from shared room to entire building/campus

o       Can be expensive (value of WTC infrastructure >$7 billion)

·        Cold site

o       Private or rented

o       IBM's facility received first call at 9:10 (22 minutes after first plane hit)

o       IBM: 175K sq ft, four stories

o       Fees range from $100/month to $1 million for recovery plan

·        Leverage other locations of your company

·        What can smaller businesses do?

·        Data management options

o       Real-time data replication (East coast / west coast)

§         Expensive! $100K/month at least

§         Most common for financial companies

§         Only about 250 companies nationwide

o       Instant fail-over (activated by many large WTC companies)

§         Example: Morgan Stanley has a hot site in Teaneck, NJ

o       Nightly tape (or physical disk) backup with off-site storage

§         Iron Mountain has 33 WTC customers storing data in Moonachie, NJ (13 miles away); it sent their drives to the IBM cold site

§         How to reach storage facility?

§         Time to recover?

o       Less frequent backup

§         Empire Blue Cross Blue Shield (largest health insurer in NY)

§         Scans all documents; lost the previous two weeks' work
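The data management options above trade cost against how much recent work a disaster destroys: everything written since the last off-site copy. The sketch below is a minimal illustration of that arithmetic, assuming copies are taken at a fixed interval from a hypothetical start time; the function names, the schedule, and the 1-minute figure used to approximate continuous replication are all illustrative assumptions, not numbers from these notes.

```python
from datetime import datetime, timedelta

def last_copy_before(disaster: datetime, first_copy: datetime, interval: timedelta) -> datetime:
    """Most recent periodic off-site copy taken at or before the disaster."""
    if disaster < first_copy:
        raise ValueError("disaster precedes the first copy")
    whole_intervals = (disaster - first_copy) // interval  # timedelta // timedelta -> int
    return first_copy + whole_intervals * interval

def data_lost(disaster: datetime, first_copy: datetime, interval: timedelta) -> timedelta:
    """Work created since the last off-site copy, i.e. what the disaster destroys."""
    return disaster - last_copy_before(disaster, first_copy, interval)

# Hypothetical schedule: copies every `interval`, starting 2:00 a.m. on Sept. 1.
disaster = datetime(2001, 9, 11, 8, 46, 40)
start = datetime(2001, 9, 1, 2, 0)
strategies = {
    "replication, modeled as a 1-minute lag (assumption)": timedelta(minutes=1),
    "nightly tape": timedelta(days=1),
    "backup every two weeks": timedelta(weeks=2),
}
for name, interval in strategies.items():
    print(f"{name}: {data_lost(disaster, start, interval)}")
```

With a two-week interval the loss can approach the full two weeks, as in the Empire example; nightly tape bounds it at about a day, and real-time replication shrinks it to the replication lag, at the cost listed above.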

 

Communication issues

·        Volume of calls, types of lines

o       Verizon's West Street facility supported 200K voice lines, 3 million private lines (ring-down circuits and leased data lines)

§         Ring-down circuits: dedicated point-to-point voice lines that ring the far end as soon as the handset is lifted (versus speed dial)

o       Verizon rerouted 24 OC-48 circuits (20 Million T1s)

·        Dynamic fail-over at physical layer

o       SONET (Synchronous Optical Network)

o       Dual redundant optical links arranged in a bidirectional ring topology

o       Doesn't help when attached equipment is destroyed

·        WTC buildings contained two phone switches, each capable of supporting a large city
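The SONET ring fail-over described above can be illustrated with a toy model: nodes sit on a ring, traffic normally takes the shorter direction, and when a fiber span is cut it is switched to the other direction. This is only a sketch under simplifying assumptions (real SONET rings use automatic protection switching signaling, which is not modeled here); the node numbering and the names `ring_path` and `route` are hypothetical. It also shows the limitation noted above: if an endpoint's equipment is destroyed, no rerouting helps.

```python
from typing import Optional

def ring_path(n: int, src: int, dst: int, clockwise: bool) -> list[int]:
    """Nodes visited going around an n-node ring from src to dst in one direction."""
    step = 1 if clockwise else -1
    path = [src]
    node = src
    while node != dst:
        node = (node + step) % n  # Python's % keeps the index in 0..n-1
        path.append(node)
    return path

def route(n: int, src: int, dst: int,
          cut_spans: set[frozenset[int]], dead_nodes: set[int]) -> Optional[list[int]]:
    """Pick a surviving direction around the ring, preferring the shorter one.

    cut_spans holds severed fiber spans as {a, b} pairs of adjacent nodes;
    dead_nodes holds nodes whose attached equipment was destroyed.
    Returns None when neither direction survives.
    """
    if src in dead_nodes or dst in dead_nodes:
        return None  # ring redundancy cannot help if an endpoint is gone
    candidates = sorted((ring_path(n, src, dst, cw) for cw in (True, False)), key=len)
    for path in candidates:
        spans_ok = all(frozenset(hop) not in cut_spans for hop in zip(path, path[1:]))
        transit_ok = not any(node in dead_nodes for node in path[1:-1])
        if spans_ok and transit_ok:
            return path
    return None

print(route(6, 0, 2, set(), set()))                 # normal: [0, 1, 2]
print(route(6, 0, 2, {frozenset({1, 2})}, set()))   # span cut: [0, 5, 4, 3, 2]
print(route(6, 0, 2, set(), {2}))                   # endpoint destroyed: None
```

The second call shows the fail-over: the cut span between nodes 1 and 2 forces traffic the long way around the ring. The third shows why the ring alone did not save circuits that terminated in destroyed equipment.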

 

Disaster management/recovery

·        Real-time recovery versus after-the-fact recovery

o       Government agencies

o       White House Web site traffic increased to 160K/day

o       Navy and Army sites hit 205K and 137K, respectively

o       FBI collected 66K tips in 10 days

o       FEMA (no real traffic normally) hit 88K

·        Many large companies were operating again within one day

·        Healthcare, insurance, financial, …

·        Network management

o       Very little Internet infrastructure damaged in WTC disaster

o       But, incredible demand for information

o       E-mail, IM as communication alternatives

o       Loss of two of the three over-the-air networks in NYC

§         NBC and ABC had facilities only on the WTC

§         CBS had a backup on the Empire State Building

§         WNBC streamed 300K feed

§         ABCNEWS streamed 56K feed

·        Cellular

o       40% US penetration rate

o       Played pivotal role in WTC disaster

o       Withstood high demand and significant damage

§         More resilient than fixed infrastructure – mobile!

o       Verizon alone lost 10 cellular towers

o       Using Cells On Wheels, replaced seven sites in NYC and added two in DC and one in PA within hours

§         Versus a predicted 90 days to replace fixed residential lines

o       Incorporates priority scheduling for overload conditions

§         911 ranked higher than ordinary calls

o       High demand for satellite phones (DISA bought 20K Iridium handhelds last year)

·        Telephone

o       Verizon lost major telco exchange and routing capability

o       Also lost its Broad Street office (80% of Nasdaq lines)

o       Traffic of 340 million calls/day (double normal)

o       Prioritized outgoing traffic over incoming traffic

o       Two-fifths of the Pentagon was damaged, but its internal PBX remained operational

§         But the Pentagon was still affected by Verizon cellular problems
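The cellular priority scheduling (911 ahead of ordinary calls) and Verizon's preference for outgoing over incoming traffic both amount to priority admission when demand exceeds capacity. The sketch below is a minimal illustration, assuming a fixed number of free channels and three hypothetical classes that combine the two policies into one ranking; the ranks, the `CallRequest` type, and the `admit` function are illustrative only and do not describe how any carrier's switch actually works.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical ranking combining the two policies in the notes; lower rank is served first.
PRIORITY = {"911": 0, "outgoing": 1, "incoming": 2}

@dataclass(order=True)
class CallRequest:
    rank: int
    seq: int                                # arrival order breaks ties (FIFO within a class)
    caller: str = field(compare=False)
    kind: str = field(compare=False)

def admit(requests: list[tuple[str, str]], channels: int) -> tuple[list[str], list[str]]:
    """Admit up to `channels` calls in priority order; return (admitted, blocked) caller ids."""
    seq = count()
    heap = [CallRequest(PRIORITY[kind], next(seq), caller, kind) for caller, kind in requests]
    heapq.heapify(heap)
    admitted = [heapq.heappop(heap).caller for _ in range(min(channels, len(heap)))]
    blocked = [req.caller for req in sorted(heap)]
    return admitted, blocked

calls = [("a", "incoming"), ("b", "outgoing"), ("c", "911"), ("d", "incoming"), ("e", "outgoing")]
admitted, blocked = admit(calls, channels=3)
print(admitted)  # ['c', 'b', 'e']
print(blocked)   # ['a', 'd']
```

Under overload the 911 call is admitted first even though it arrived third, and the incoming calls are the ones blocked; within a class, earlier arrivals win.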