InfoQ

The Software Architects' Newsletter
August 2021

Welcome to the InfoQ Software Architects' Newsletter! Each month, we bring you essential news and experience from industry peers on emerging patterns and technologies.

This month, we focus on the topic of "Architecting for Resilience". Designing for resilience, identified as an "early adopter" trend in the recent InfoQ Architecture and Design Trends Report, is becoming increasingly popular, particularly in cloud and microservice-based systems, and supporting technologies are emerging to match. However, architects designing such systems still face key challenges, both human and technical.

News

A Sticky Situation: How Netflix Gains Confidence in Changes

In this recording from QCon Plus, Haley Tucker, a member of the Resilience Engineering team at Netflix, explains what "sticky canaries" are and how they can help build confidence in changes and increase resilience.

Key takeaways from the session included:

  • Build the hooks into your platform that enable the types of experiments you need.
  • Build guardrails and safety mechanisms when you are testing in production.
  • Monitor customer KPIs, not proxy metrics.

What Have We Learned Over the Last Decade of Microservices?

This episode of the InfoQ podcast is a panel discussion from the microservices track at QCon Plus, held in May 2021. Track host Nicki Watt asks, "What have we learned over the last decade of microservices?"

The panelists included Chris Richardson, James Lewis, and Katie Gamanji. The discussion offered several insights into how successfully developing, deploying, and maintaining resilient software depends at least as much on cultural and environmental factors as on simply adopting microservices and the many tools and technologies that now exist.

Learning from Incidents

Jessica DeVita and Nick Stenning have been improving how software teams learn from incidents in production. In this article, they share what they’ve learned from the research community and offer advice on the practical application of this work.

Key takeaways included:

  • More and more software systems are becoming "safety-critical."
  • Software teams generally analyze failure in ways that are simplistic or blameful.
  • Teams may fall into many intuitive traps, particularly the idea that human error is to blame for incidents.
  • The language used by investigators and facilitators is crucial to learning.
  • Teams can protect their learning by keeping repair planning discussions separate from an investigation.

5 Proven Patterns for Resilient Software Architecture Design

In this SearchAppArchitecture article, Priyank Gupta, a partner at Sahaj Software, discusses a set of coding and design patterns that ease the path to failure mitigation. His goal is to help architects "get in front of failures, prevent them from running rampantly through a distributed software system, and -- when needed -- gradually decommission problematic components without disturbing the whole operation."

Patterns discussed included bulkhead, backpressure, circuit breaker, batch-to-stream, and graceful degradation. For readers keen to explore these concepts in more depth, Michael Nygard’s book "Release It!" is highly recommended by the InfoQ team.
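As one illustration of these patterns, here is a minimal circuit breaker sketch in Python. It is our own generic example, not code from Gupta's article or from "Release It!"; the class name, the CLOSED/OPEN/HALF_OPEN state names, and the `failure_threshold` and `reset_timeout` parameters are assumptions chosen for illustration. The idea: after a run of consecutive failures, the breaker "opens" and fails fast instead of hammering a struggling dependency, then lets a trial call through after a cool-down period.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical, single-threaded sketch (not thread-safe):
    CLOSED -> OPEN after `failure_threshold` consecutive failures,
    OPEN -> HALF_OPEN once `reset_timeout` seconds have elapsed,
    HALF_OPEN -> CLOSED on a successful trial call, or back to OPEN on failure."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None
        self.state = "CLOSED"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # cool-down over: let this call through as a trial
            else:
                raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise  # the caller still sees the original error
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()

    def _record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
        self.state = "CLOSED"


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def flaky() -> str:
        raise ConnectionError("downstream unavailable")

    for _ in range(2):
        try:
            breaker.call(flaky)
        except ConnectionError:
            pass
    print(breaker.state)  # OPEN

    now[0] = 10.0  # cool-down elapsed; the next call is a trial
    print(breaker.call(lambda: "ok"), breaker.state)  # ok CLOSED
```

Production systems typically add concurrency control, limits on how many trial calls a half-open breaker admits, and metrics; pairing a breaker with a fallback response is one way to achieve the graceful degradation mentioned above.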

The Importance of Automated Testing for Microservices

This Software Testing News article from Constance Drugeot aggregates the experiences of several practitioners who have implemented automated testing for microservices. Key takeaways included focusing the right amount of testing at the right time, using the most suitable tools, and applying a "consumer-driven approach" to testing:

"The different types of tests require a big effort from the engineering and technical team. However, Guillermo warns, a robust testing strategy is not only affecting technical people within an organization. It will require product owners and different stakeholders to get involved with different inputs on how the microservices should work in a given scenario, how the whole digital strategy should be affected by one change in a microservice."

 

Case Study

Continuous Learning as a Tool for Adaptation

The InfoQ Covid Resilience series has focused on distilling sources of organizational resilience through learning from unexpected events. In this capstone article, Nora Jones explores key themes from each article in the series, offering a practical view of how organizational resilience helps companies adapt to surprises and continue to thrive despite uncertainty and unexpected events.

Key takeaways included:

  • Engineering leadership should emphasize learning over action items to maximize the value of time spent on post-incident activities.
  • When space and time are given to learning after an incident, action items become more likely to be completed, more collaborative, and more productive.
  • Performance improvement is best achieved by increasing and disseminating insights, not just reducing errors.
  • Asking questions after incidents using techniques like Cognitive Interviewing can increase insights and allow post-incident meetings to be more worthwhile.

Developing a learning organization is a competitive advantage. Leaders who study, encourage, and understand what it means to create this kind of organization are well positioned to achieve stronger collaboration and team dynamics under pressure. As a result, incidents may be less detrimental and may become just another avenue that your people and business can cope with and learn from.

This content is an excerpt from a recent InfoQ article written by Nora Jones: "Continuous Learning as a Tool for Adaptation".

To get notifications when InfoQ publishes content on these topics, follow "resilience", "chaos engineering", and "continuous improvement" on InfoQ.

Missed a newsletter? You can find all of the previous issues on InfoQ.

Sponsored

LaunchDarkly

“The tools and practices software developers have created to release slowly and safely don’t translate to a world in which dozens of changes are released each day. What if you could fully decouple the act of delivering software from the act of releasing features?” Learn how feature management can help you “progressively deliver new code and features to end users, run experiments and A/B tests, customize the user experience, and maintain highly reliable applications—all while the application is running and without the need to deploy new code.”

Learn more about this topic with our free book, "Effective Feature Management" (O'Reilly), co-written by LaunchDarkly CTO and co-founder John Kodumal.

Upcoming Events

Discover events for senior software engineers by senior software engineers


QCon Plus Online Software Conference Nov 1-12: Uncover What's Next for Software Engineering

What’s your path to production? How are companies like Netflix, Fastly, or Google thinking about edge and architecture? How will you and your company manage the return to the office in a post-COVID world? What does it look like to fully embrace a cloud operating model?

You'll find these questions and more on the schedule at QCon Plus, an immersive, multi-discipline online conference running November 1-12. Save $250 on the full price if you register before August 31.

InfoQ Live September 21st: Container Security and Observability in Kubernetes Environments

As containers become the default component for modern applications and Kubernetes the main container-orchestration system, developers and architects must secure and monitor workloads on distributed systems and public cloud environments. With enterprise adoption of DevOps and Kubernetes continuing to grow, learn how to secure and monitor your Kubernetes workloads.

InfoQ Live October 19th: Improving Application Security & Deployment Speed with Microservices and DevSecOps

Maintaining the quality of your software while moving away from a monolith or other legacy system can be tough. Attend InfoQ Live in October to learn from industry experts how to identify the challenges of planning a move to microservices, and what the best practices are for achieving software quality throughout the process.

 

Senior software developers rely on the InfoQ community to keep ahead of the adoption curve. One of the main reasons software architects and engineers tell us they keep coming back to InfoQ is that they trust the information provided and selected by their peers.

We’ve been helping software development teams adopt new technologies and practices for over 15 years through InfoQ articles, news items, podcasts, tech talks, trends reports, and QCon software development conferences.

We hope you find this newsletter useful. If not, you can unsubscribe using the link below.

Unsubscribe

Forwarded email? Subscribe and get your own copy.

Subscribe