News
    Uber’s software architecture consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products, along with complex configurations that affect these products at city and sub-city levels. To maintain our growth and architecture, Uber’s Observability team built a robust, scalable metrics and alerting pipeline responsible for detecting, mitigating, and notifying engineers of issues with their services as soon as they occur.
  
  
  
Scaling Spark Streaming for Logging Event Ingestion
    Walking over a stream during an Airbnb Experience in Kuala Lumpur. Searching, viewing, and booking such Experiences will all produce logging events that will be processed by our stream processing framework. Logging events are emitted from clients (such as mobile apps and web browsers) and online services with key information and context about the actions or operations.
Each event carries a specific piece of information. For example, when a guest searches for a beach house in Malibu on Airbnb.com, a search event containing the location, checkin and checkout dates, etc. would be generated (and anonymized for privacy protection). At Airbnb, event logging is crucial for us to understand guests and hosts and then provide them with a better experience.
  
  
  
Stateful Service Design Considerations for the Kubernetes Stack
    At this summer’s QCon in New York, Jonas Bonér delivered one of the most popular talks of the conference with his focus on Designing Events-First Microservices. In this InfoQ Q&A, we asked Bonér to explain how “bringing bad habits from monolithic design” is a road to nowhere for service design, and where he sees his Akka framework fitting in the cloud-native stack. In this article, author Amit Baghel discusses how to monitor the performance of Apache Spark based applications using technologies like Uber JVM Profiler, InfluxDB database and Grafana data visualization tool.
  
  
  
Some notes about HTTP/3
    HTTP/3 is going to be standardized. As an old protocol guy, I thought I’d write up some comments. Google (pbuh) has both the most popular web browser (Chrome) and the two most popular websites (#1 Google.com #2 Youtube.com).
Therefore, they are in control of future web protocol development. Their first upgrade they called SPDY (pronounced ‘speedy’), which was eventually standardized as the second version of HTTP, or HTTP/2. Their second upgrade they called QUIC (pronounced ‘quick’), which is being standardized as HTTP/3.
  
  
  
Rookout launches its live Kubernetes debugger
    Rookout, a startup that offers debugging tools for applications that run on modern container and serverless platforms, is launching a new feature today that brings the equivalent of breakpoints to Kubernetes. To get around this, developers tend to go for a more indirect way of diagnosing issues and debugging their apps running on Kubernetes. That mostly means logging and distributed tracing, both of which have spawned their own ecosystems of open-source projects and startups.
  
  
  
Decision Tree in Machine Learning
    A decision tree is a flowchart-like structure in which each internal node represents a test on a feature (e.g. whether a coin flip comes up heads or tails), each leaf node represents a class label (the decision taken after computing all features), and branches represent conjunctions of features that lead to those class labels. The paths from root to leaf represent classification rules. The diagram below illustrates the basic flow of a decision tree for decision making with the labels Rain (Yes) and No Rain (No).
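
    As a rough illustration of that structure, the sketch below models internal nodes as feature tests and leaves as class labels, then follows one root-to-leaf path to classify a sample. The feature names (`sky`, `humidity`), their values, and the tree itself are hypothetical and chosen only to match the Rain (Yes) / No Rain (No) labels; they are not taken from the article.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the class label reached at the end of a path."""
    label: str


@dataclass
class Node:
    """Internal node: tests one feature and branches on its value."""
    feature: str
    branches: Dict[str, Union["Node", Leaf]]


def classify(tree: Union[Node, Leaf], sample: Dict[str, str]) -> str:
    """Follow the branch matching each tested feature until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[sample[tree.feature]]
    return tree.label


# Hypothetical tree: the features and thresholds are illustrative assumptions.
TREE = Node("sky", {
    "cloudy": Node("humidity", {"high": Leaf("Yes"), "low": Leaf("No")}),
    "clear": Leaf("No"),
})

if __name__ == "__main__":
    print(classify(TREE, {"sky": "cloudy", "humidity": "high"}))
```

    Each root-to-leaf path here reads as one classification rule, for example "sky is cloudy and humidity is high, so Rain (Yes)".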
  
  
  
Project Calico, the CNI way
    When it comes to Kubernetes networking, Calico is widely used. One of the main reasons is its ease of use and the way it shapes the network fabric. Calico is a pure L3 solution, where packets are routed in just the same manner as on the regular Internet.
Each node (e.g. a VM) acts like a vRouter, which means tools like traceroute, ping, tcpdump, etc. just work as expected! Whether the packet is flowing from one container to another or from a container to another node (or vice versa), it's just treated as a flat network route (L3 hops).
  
  
  
Five Lessons From the First Three Years of Michelangelo
    Uber has been one of the most active contributors to open source machine learning technologies in the last few years. While companies like Google or Facebook have focused their contributions on new deep learning stacks like TensorFlow, Caffe2 or PyTorch, the Uber engineering team has really focused on tools and best practices for building machine learning at scale in the real world. Technologies such as Michelangelo, Horovod, PyML, and Pyro are some examples of Uber’s contributions to the machine learning ecosystem.
  
  
  
Real Time Facial Expression Recognition
    Computer-animated agents and robots bring a new dimension to human-computer interaction, which makes it vital to understand how computers can affect our social life in day-to-day activities. Face-to-face communication is a real-time process operating at a time scale on the order of milliseconds. The level of uncertainty at this time scale is considerable, making it necessary for humans and machines to rely on sensory-rich perceptual primitives rather than slow symbolic inference processes.
  
  
  
How we spent two weeks hunting an NFS bug in the Linux kernel
    On Sep. 14, the GitLab support team escalated a critical problem encountered by one of our customers: GitLab would run fine for a while, but after some time users encountered errors. When attempting to clone certain repositories via Git, users would see an opaque Stale file error message. The error message persisted for a long time, blocking employees from being able to work, unless a system administrator intervened manually by running ls in the directory itself.
  
  