Published by Addison-Wesley Professional (May 23, 2015) © 2015
Martin Abbott | Michael Fisher
The Comprehensive, Proven Approach to IT Scalability–Updated with New Strategies, Technologies, and Case Studies
In The Art of Scalability, Second Edition, leading scalability consultants Martin L. Abbott and Michael T. Fisher cover everything you need to know to smoothly scale products and services for any requirement. This extensively revised edition reflects new technologies, strategies, and lessons, as well as new case studies from the authors’ pioneering consulting practice, AKF Partners.
Writing for technical and nontechnical decision-makers, Abbott and Fisher cover everything that impacts scalability, including architecture, process, people, organization, and technology. Their insights and recommendations reflect more than thirty years of experience at companies ranging from eBay to Visa, and Salesforce.com to Apple.
You’ll find updated strategies for structuring organizations to maximize agility and scalability, as well as new insights into the cloud (IaaS/PaaS) transition, NoSQL, DevOps, business metrics, and more. Using this guide’s tools and advice, you can systematically clear away obstacles to scalability–and achieve unprecedented IT and business performance.
Coverage includes
• Why scalability problems start with organizations and people, not technology, and what to do about it
• Actionable lessons from real successes and failures
• Staffing, structuring, and leading the agile, scalable organization
• Scaling processes for hyper-growth environments
• Architecting scalability: proprietary models for clarifying needs and making choices–including 15 key success principles
• Emerging technologies and challenges: data cost, datacenter planning, cloud evolution, and customer-aligned monitoring
• Measuring availability, capacity, load, and performance
Foreword
Acknowledgments
About the Authors
Introduction
Part I: Staffing a Scalable Organization
Chapter 1: The Impact of People and Leadership on Scalability
The Case Method
Why People?
Why Organizations?
Why Management and Leadership?
Conclusion
Chapter 2: Roles for the Scalable Technology Organization
The Effects of Failure
Defining Roles
Executive Responsibilities
Individual Contributor Responsibilities
A Tool for Defining Responsibilities
Conclusion
Chapter 3: Designing Organizations
Organizational Influences That Affect Scalability
Team Size
Organizational Structure
Conclusion
Chapter 4: Leadership 101
What Is Leadership?
Leadership: A Conceptual Model
Taking Stock of Who You Are
Leading from the Front
Checking Your Ego at the Door
Mission First, People Always
Making Timely, Sound, and Morally Correct Decisions
Empowering Teams and Scalability
Alignment with Shareholder Value
Transformational Leadership
Vision
Mission
Goals
Putting It All Together
The Causal Roadmap to Success
Conclusion
Chapter 5: Management 101
What Is Management?
Project and Task Management
Building Teams: A Sports Analogy
Upgrading Teams: A Garden Analogy
Measurement, Metrics, and Goal Evaluation
The Goal Tree
Paving the Path for Success
Conclusion
Chapter 6: Relationships, Mindset, and the Business Case
Understanding the Experiential Chasm
Defeating the IT Mindset
The Business Case for Scale
Conclusion
Part II: Building Processes for Scale
Chapter 7: Why Processes Are Critical to Scale
The Purpose of Process
Right Time, Right Process
When Good Processes Go Bad
Conclusion
Chapter 8: Managing Incidents and Problems
What Is an Incident?
What Is a Problem?
The Components of Incident Management
The Components of Problem Management
Resolving Conflicts Between Incident and Problem Management
Incident and Problem Life Cycles
Implementing the Daily Incident Meeting
Implementing the Quarterly Incident Review
The Postmortem Process
Putting It All Together
Conclusion
Chapter 9: Managing Crises and Escalations
What Is a Crisis?
Why Differentiate a Crisis from Any Other Incident?
How Crises Can Change a Company
Order Out of Chaos
Communications and Control
The War Room
Escalations
Status Communications
Crisis Postmortem and Communication
Conclusion
Chapter 10: Controlling Change in Production Environments
What Is a Change?
Change Identification
Change Management
The Change Control Meeting
Continuous Process Improvement
Conclusion
Chapter 11: Determining Headroom for Applications
Purpose of the Process
Structure of the Process
Ideal Usage Percentage
A Quick Example Using Spreadsheets
Conclusion
Chapter 12: Establishing Architectural Principles
Principles and Goals
Principle Selection
AKF’s Most Commonly Adopted Architectural Principles
Conclusion
Chapter 13: Joint Architecture Design and Architecture Review Board
Fixing Organizational Dysfunction
Designing for Scale Cross-Functionally
JAD Entry and Exit Criteria
From JAD to ARB
Conducting the Meeting
ARB Entry and Exit Criteria
Conclusion
Chapter 14: Agile Architecture Design
Architecture in Agile Organizations
Ownership of Architecture
Limited Resources
Standards
ARB in the Agile Organization
Conclusion
Chapter 15: Focus on Core Competencies: Build Versus Buy
Building Versus Buying, and Scalability
Focusing on Cost
Focusing on Strategy
“Not Built Here” Phenomenon
Merging Cost and Strategy
Does This Component Create Strategic Competitive Differentiation?
Are We the Best Owners of This Component or Asset?
What Is the Competition for This Component?
Can We Build This Component Cost-Effectively?
The Best Buy Decision Ever
Anatomy of a Build-It-Yourself Failure
Conclusion
Chapter 16: Determining Risk
Importance of Risk Management to Scale
Measuring Risk
Managing Risk
Conclusion
Chapter 17: Performance and Stress Testing
Performing Performance Testing
Don’t Stress over Stress Testing
Performance and Stress Testing for Scalability
Conclusion
Chapter 18: Barrier Conditions and Rollback
Barrier Conditions
Rollback Capabilities
Markdown Functionality: Design to Be Disabled
Conclusion
Chapter 19: Fast or Right?
Tradeoffs in Business
Relation to Scalability
How to Think About the Decision
Conclusion
Part III: Architecting Scalable Solutions
Chapter 20: Designing for Any Technology
An Implementation Is Not an Architecture
Technology-Agnostic Design
The TAD Approach
Conclusion
Chapter 21: Creating Fault-Isolative Architectural Structures
Fault-Isolative Architecture Terms
Benefits of Fault Isolation
How to Approach Fault Isolation
When to Implement Fault Isolation
How to Test Fault-Isolative Designs
Conclusion
Chapter 22: Introduction to the AKF Scale Cube
The AKF Scale Cube
The x-Axis of the Cube
The y-Axis of the Cube
The z-Axis of the Cube
Putting It All Together
When and Where to Use the Cube
Conclusion
Chapter 23: Splitting Applications for Scale
The AKF Scale Cube for Applications
The x-Axis of the AKF Application Scale Cube
The y-Axis of the AKF Application Scale Cube
The z-Axis of the AKF Application Scale Cube
Putting It All Together
Practical Use of the Application Cube
Conclusion
Chapter 24: Splitting Databases for Scale
Applying the AKF Scale Cube to Databases
The x-Axis of the AKF Database Scale Cube
The y-Axis of the AKF Database Scale Cube
The z-Axis of the AKF Database Scale Cube
Putting It All Together
Practical Use of the Database Cube
Conclusion
Chapter 25: Caching for Performance and Scale
Caching Defined
Object Caches
Application Caches
Content Delivery Networks
Conclusion
Chapter 26: Asynchronous Design for Scale
Synching Up on Synchronization
Synchronous Versus Asynchronous Calls
Defining State
Conclusion
Part IV: Solving Other Issues and Challenges
Chapter 27: Too Much Data
The Cost of Data
The Value of Data and the Cost-Value Dilemma
Making Data Profitable
Handling Large Amounts of Data
Conclusion
Chapter 28: Grid Computing
History of Grid Computing
Pros and Cons of Grids
Different Uses for Grid Computing
Conclusion
Chapter 29: Soaring in the Clouds
History and Definitions
Characteristics and Architecture of Clouds
Differences Between Clouds and Grids
Pros and Cons of Cloud Computing
Where Clouds Fit in Different Companies
Decision Process
Conclusion
Chapter 30: Making Applications Cloud Ready
The Scale Cube in a Cloud
Overcoming Challenges
Intuit Case Study
Conclusion
Chapter 31: Monitoring Applications
“Why Didn’t We Catch That Earlier?”
A Framework for Monitoring
Measuring Monitoring: What Is and Isn’t Valuable?
Monitoring and Processes
Conclusion
Chapter 32: Planning Data Centers
Data Center Costs and Constraints
Location, Location, Location
Data Centers and Incremental Growth
When Do I Consider IaaS?
Three Magic Rules of Three
Multiple Active Data Center Considerations
Conclusion
Chapter 33: Putting It All Together
What to Do Now?
Further Resources on Scalability
Part V: Appendices
Appendix A: Calculating Availability
Hardware Uptime
Customer Complaints
Portion of Site Down
Third-Party Monitoring Service
Business Graph
Appendix B: Capacity Planning Calculations
Appendix C: Load and Performance Calculations
Index