news

How should pipelines be monitored?

For online serving systems, it’s fairly well known that you should watch request rate, errors, and duration. What about offline processing pipelines, though? For a typical web application, high latency or error rates are the sort of thing you want to wake someone up about, as they usually hurt the end user’s experience. Request rate isn’t something to alert on in and of itself; however, it’s important to know, as it’s often related to errors and latency, and you’ll want it for capacity planning.
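As a rough illustration of those three signals for an online service, here is a minimal sketch that summarizes a window of completed requests into a rate, an error ratio, and a 99th-percentile latency. The `Request` record and `red_summary` function are hypothetical names, not part of any monitoring library; a real system would compute these from a metrics backend rather than from raw records in memory.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    completed_at: float  # seconds since the epoch when the request finished
    duration: float      # request latency in seconds
    failed: bool         # True if the request returned an error


def red_summary(requests: List[Request], start: float, end: float) -> Tuple[float, float, float]:
    """Return (requests per second, error ratio, p99 latency) for requests completed in [start, end)."""
    window = [r for r in requests if start <= r.completed_at < end]
    if not window or end <= start:
        return 0.0, 0.0, 0.0
    rate = len(window) / (end - start)
    error_ratio = sum(1 for r in window if r.failed) / len(window)
    latencies = sorted(r.duration for r in window)
    # Nearest-rank percentile: the smallest latency with at least 99% of samples at or below it.
    rank = -(-len(latencies) * 99 // 100)  # integer ceil(n * 0.99)
    p99 = latencies[rank - 1]
    return rate, error_ratio, p99
```

In keeping with the reasoning above, the error ratio and p99 latency are the values you would page on, while the rate is mainly recorded for context and capacity planning.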

Panel: First Steps with Machine Learning

This panel is a very diverse group, and I’m actually going to let the panelists introduce themselves rather than risk butchering anyone’s names. This session is all about answering my own need: literally, my first steps. What should I focus on as a software engineer who wants to get into ML, start using it more, and convince leadership of the things I want to do? For example, I work for an edge company deploying use cases at the edge, so I want to be able to use machine learning to detect anomalies at the edge.

Remote-controlled Salmon Farms to Operate Off Norway by 2020

Tucked within Norway’s fjord-riddled coast, nearly 3,500 fish pens corral upwards of 400 million salmon and trout. Not only does the country raise and export more salmonids than any other country in the world (1.1 million tons in 2018), but farmed salmon is also Norway’s third-largest export, behind crude petroleum and natural gas. In a global industry expected to quintuple by 2050, farmed salmon is a fine kettle of fish.

Deprecated APIs Removed In Kubernetes 1.16

As the Kubernetes API evolves, APIs are periodically reorganized or upgraded. When APIs evolve, the old API is deprecated and eventually removed. The 1.16 release will stop serving deprecated API versions for four groups of resources. The resources themselves will not be removed from Kubernetes or deprecated in any way; however, to continue using them, you must use a current version of the Kubernetes API. For example, NetworkPolicy will no longer be served from extensions/v1beta1 in v1.16.
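To find affected workloads before upgrading, one option is to scan your manifests for API versions that 1.16 no longer serves. The sketch below assumes the manifests have already been parsed into Python dicts (for example with a YAML library), and its lookup table is a partial, hand-written list that you should verify against the official Kubernetes deprecation guide. `find_removed_apis` and `REMOVED_IN_1_16` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Partial map of (apiVersion, kind) pairs that Kubernetes v1.16 stops serving,
# to the API version to migrate to. Not exhaustive; verify before relying on it.
REMOVED_IN_1_16: Dict[Tuple[str, str], str] = {
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "PodSecurityPolicy"): "policy/v1beta1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta2", "DaemonSet"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("apps/v1beta2", "StatefulSet"): "apps/v1",
    ("apps/v1beta2", "ReplicaSet"): "apps/v1",
}


def find_removed_apis(manifests: Iterable[dict]) -> List[Tuple[str, str, str]]:
    """Return (kind/name, old apiVersion, suggested apiVersion) for each manifest that needs migrating."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion", ""), manifest.get("kind", ""))
        if key in REMOVED_IN_1_16:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append((f"{key[1]}/{name}", key[0], REMOVED_IN_1_16[key]))
    return findings
```

The function only reports findings instead of rewriting `apiVersion` in place, because migrating can involve more than a version bump: for instance, `apps/v1` requires an explicit `spec.selector` on Deployments, which the older `extensions/v1beta1` API could default.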

Making Apache Spark Effortless for All of Uber

Apache Spark is a foundational piece of Uber’s Big Data infrastructure that powers many critical aspects of our business. We currently run more than one hundred thousand Spark applications per day, across multiple compute environments. Spark’s versatility, which allows us to build applications and run them everywhere we need, makes this scale possible. However, our ever-growing infrastructure means that these environments are constantly changing, making it increasingly difficult for both new and existing users to give their applications reliable access to data sources, compute resources, and supporting tools.

Secure your service mesh with Istio and keep an eye on it with Kiali

It is important to fine-tune the set of services that a workload has access to. It is good practice to apply the principle of least privilege: we should grant each workload permission to communicate with exactly the services it needs to access. This also helps reduce the attack surface if a workload in our mesh is compromised. Unwanted requests between services are one example of what this prevents: a developer could contact the ratings service directly instead of going through the review service.
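The least-privilege idea can be modeled as a default-deny allow-list of service-to-service calls. The sketch below is only a conceptual model, not Istio’s API; in a real mesh this would be expressed as Istio authorization policies. The service names loosely follow Istio’s Bookinfo sample and, like `ALLOWED_CALLS`, `is_allowed`, and `unwanted_edges`, are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical allow-list: each source workload maps to the only services it may call.
ALLOWED_CALLS: Dict[str, Set[str]] = {
    "productpage": {"details", "reviews"},
    "reviews": {"ratings"},
}


def is_allowed(source: str, destination: str) -> bool:
    """Default deny: a call is allowed only if it is explicitly listed."""
    return destination in ALLOWED_CALLS.get(source, set())


def unwanted_edges(observed: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the observed (source, destination) calls that the allow-list does not permit."""
    return [edge for edge in observed if not is_allowed(*edge)]
```

Feeding `unwanted_edges` the call edges you observe in the mesh (the kind of traffic graph Kiali visualizes) would flag a direct `("developer", "ratings")` call while leaving `("reviews", "ratings")` alone.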

Amenity Detection and Beyond—New Frontiers of Computer Vision at Airbnb

In 2018, we published a blog post titled Categorizing Listing Photos at Airbnb. In that post, we introduced an image classification model which categorized listing photos into different room types and helped organize hundreds of millions of listing photos on the Airbnb platform. Since then, the technology has been powering a wide range of internal content moderation tools, as well as some consumer-facing features on the Airbnb website. We hope such an image classification technology makes our business more efficient, and our products more pleasant to use.

Can We Trust GitHub Stars?

GitHub stars are an essential growth factor for many open source projects, but they can easily come from bot accounts. How can we trust GitHub stars again? For open source GitHub projects, stars are a key metric. Of course, there are ways to abuse this system, as you may have heard recently. As an open source company, we want our community’s legitimacy to be transparent, and we want to help the open source community do the same for other projects.

Signal Sciences brings real-time web attack visibility to Datadog

Signal Sciences is proud to announce our integration with the Datadog platform. This integration furthers our mission of producing the leading application security offering that empowers operations and development teams to proactively see and respond to web attacks—wherever and however they deploy their apps, APIs, and microservices. As the only next-gen WAF (web application firewall) built for today’s rapid development and deployment environments, Signal Sciences has integrated with Datadog to allow users to visualize and analyze web application activity in their Datadog dashboards, and to receive alerts about potential attacks.

How to Upgrade Your PostgreSQL Passwords to SCRAM

In many PostgreSQL environments, it’s common practice to protect user accounts with a password. Starting with PostgreSQL 10, password-based authentication got a major upgrade with the introduction of SCRAM authentication, a well-defined standard that is a significant improvement over PostgreSQL’s existing MD5-based method. Better still, almost all PostgreSQL drivers now support this method of password authentication, which should help drive further adoption.
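To see what the server actually stores, here is a minimal sketch that builds a SCRAM-SHA-256 verifier using the key derivation from the SCRAM standard (RFC 5802 and RFC 7677) and the `SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>` layout PostgreSQL uses. It assumes a plain ASCII password: PostgreSQL additionally normalizes passwords with SASLprep, which this sketch skips. `scram_sha256_verifier` is a hypothetical helper, not a PostgreSQL or driver API.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def scram_sha256_verifier(password: str, salt: Optional[bytes] = None, iterations: int = 4096) -> str:
    """Build a SCRAM-SHA-256 verifier string (sketch; SASLprep normalization is skipped)."""
    if salt is None:
        salt = os.urandom(16)
    # SaltedPassword: PBKDF2 with HMAC-SHA-256, the "Hi" function from RFC 5802.
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()  # only a hash of ClientKey is stored
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()

    def b64(raw: bytes) -> str:
        return base64.b64encode(raw).decode("ascii")

    return f"SCRAM-SHA-256${iterations}:{b64(salt)}${b64(stored_key)}:{b64(server_key)}"
```

The server keeps only derived keys, never the password itself. In practice you would not compute this by hand: set `password_encryption = 'scram-sha-256'`, re-enter each user’s password (for example with psql’s `\password` command) so a SCRAM verifier is stored, and then change the authentication method in `pg_hba.conf` from `md5` to `scram-sha-256`.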