Announcing Envoy Mobile

Today we are thrilled to announce the initial OSS preview release of Envoy Mobile, an iOS and Android client network library that brings the power of Envoy Proxy to mobile platforms. This is the beginning of a journey that we hope mobile developers around the industry will join us on. When Lyft originally announced Envoy in 2016, the project goal was simply stated as: The network should be transparent to applications.

Learning AWS App Mesh

At re:Invent 2018, AWS announced AWS App Mesh, a service mesh that provides application-level networking. App Mesh makes it easy for your services to communicate with each other across multiple types of compute infrastructure. App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high availability for your applications. Service meshes like App Mesh help you run and monitor HTTP and TCP services at scale.

How to run evolution strategies on Google Kubernetes Engine

Reinforcement learning (RL) has become popular in the machine learning community as more and more people have seen its amazing performance in games, chess and robotics. In previous blog posts we’ve shown you how to run RL algorithms on AI Platform, utilizing both Google’s powerful computing infrastructure and intelligently managed training services such as Bayesian hyperparameter optimization. In this blog, we introduce Evolution Strategies (ES) and show how to run ES algorithms on Google Kubernetes Engine (GKE).

No Coding Required: Training Models with Ludwig, Uber’s Open Source Deep Learning Toolbox

Uber AI’s Piero Molino discusses Ludwig’s origin story, common use cases, and how others can get started with this powerful deep learning framework built on top of TensorFlow. Machine learning models perform a diversity of tasks at Uber, from improving our maps to streamlining chat communications and even preventing fraud. In addition to serving a variety of use cases, it is important that we make machine learning as accessible as possible for experts and non-experts alike so it can improve areas across our business.

Kubernetes Operators Best Practices

Kubernetes Operators are processes connecting to the master API and watching for events, typically on a limited number of resource types. When a relevant event occurs, the operator reacts and performs a specific action. This may be limited to interacting with the master API only, but will often involve performing some action on some other systems (this could be either in cluster or off cluster resources). Operators are implemented as a collection of controllers where each controller watches a specific resource type.
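The watch-and-react pattern described above can be sketched in plain Python. This is only an illustration, not the Kubernetes client API: the `Event` and `Operator` classes, the `database_controller` function, and the "Database" resource type are all hypothetical names, and the event list stands in for the stream a real operator would receive from the master API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # resource type the event refers to (hypothetical, e.g. "Database")
    action: str  # "ADDED", "MODIFIED" or "DELETED"
    name: str    # name of the affected resource


class Operator:
    """Dispatches each event to the controllers watching its resource type."""

    def __init__(self):
        # One list of controllers per watched resource type.
        self._controllers = defaultdict(list)

    def watch(self, kind, controller):
        # Register a controller for a single resource type.
        self._controllers[kind].append(controller)

    def run(self, events):
        # React to relevant events; events for unwatched types are ignored.
        results = []
        for event in events:
            for controller in self._controllers.get(event.kind, []):
                results.append(controller(event))
        return results


def database_controller(event):
    # Hypothetical reaction: describe the action to take on an external system.
    if event.action == "DELETED":
        return f"deprovision {event.name}"
    return f"provision {event.name}"


operator = Operator()
operator.watch("Database", database_controller)

# Simulated event stream; a real operator would receive these from the API server.
events = [
    Event("Database", "ADDED", "orders-db"),
    Event("Pod", "ADDED", "web-1"),
    Event("Database", "DELETED", "orders-db"),
]
actions = operator.run(events)
```

Here the operator watches only one resource type, so the "Pod" event produces no action. A real operator would typically reconcile desired and actual state rather than return strings.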

How to monitor Golden signals in Kubernetes

What are golden signals metrics? How do you monitor golden signals in Kubernetes applications? Golden signals can help detect issues in a microservices application. These signals are a reduced set of metrics that offer a wide view of a service from a user or consumer perspective, so you can detect potential problems that might be directly affecting the behaviour of the application.

Square Case Study

Since 2009, Square has enabled quick and easy credit card payments to small businesses. Four years ago, the company branched out into peer-to-peer transactions via its Cash App. After some steady growth, the app rocketed in popularity in 2016, reaching millions of users over just a few months and landing at the top of the App Store downloads. The only problem? “We had a large monolith of a few hundred thousand lines of code that was built on the assumption of one single MySQL database; it was never really designed to scale from the start,” says Engineering Manager Jon Tirsen.

Aria Presto: Making table scan more efficient

Aria is a set of initiatives to dramatically increase PrestoDB efficiency. Our goal is to achieve a 2-3x decrease in CPU time for Hive queries against tables stored in ORC format. For Aria, we are pursuing improvements in three areas: table scan, repartitioning (exchange, shuffle), and hash join. Nearly 60 percent of our global Presto CPU time is attributed to table scan, making scan improvements high leverage and thus the area we chose to focus on first.

Building Lyft’s Marketing Automation Platform

We take pride in our mission to improve people’s lives with the world’s best transportation. More than 50 million carbon neutral Lyft rides happen every month across the US and Canada, and we’ve barely scratched the surface of the potential for rideshare. Part of our growth comes from improvements in our acquisition process, like launching region-specific ad campaigns that increase awareness and consideration of our multi-modal offerings. Coordinating these campaigns to acquire new users at scale has become time-consuming, leading us to take on the challenge of automation.

Secure Control of Egress Traffic in Istio, part 1

This is part 1 in a new series about secure control of egress traffic in Istio that I am going to publish. In this installment, I explain why you should apply egress traffic control to your cluster, the attacks involving egress traffic you want to prevent, and the requirements for your system to do so. Once you agree that you should control the egress traffic coming from your cluster, the following questions arise: What requirements does a system have for secure control of egress traffic?