Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing

At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process data at a massive scale in real time with exactly-once semantics, and the emergence of these […]

How Amazon is solving big-data challenges with data lakes

Back when Jeff Bezos filled orders in his garage and drove packages to the post office himself, crunching the numbers on costs, tracking inventory, and forecasting future demand was relatively simple. Fast-forward 25 years, and Amazon’s retail business has more than 175 fulfillment centers (FC) worldwide with over 250,000 full-time associates shipping millions of items per […]

2019 in Review: 10 AI Papers That Made an Impact

Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of papers. CVPR submissions spiked to 5,165, a […]

How we 30x’d our Node parallelism

What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago. We were running 4,000 Node containers (or ‘workers’) for our bank integration service. The service was originally designed such that each worker would process only a single request at […]

Serving 100µs reads with 100% availability

This is the story of how we built ctlstore, a distributed multi-tenant data store that features effectively infinite read scalability, serves queries in 100µs, and can withstand the failure of any component. Highly reliable systems need highly reliable data sources. Segment’s stream processing pipeline is no different. Pipeline components need not only the data that they process, […]

Making the LinkedIn experimentation engine 20x faster

At LinkedIn, we like to say that experimentation is in our blood because no production release at the company happens without experimentation; by “experimentation,” we typically mean “A/B testing.” The company relies on employees to make decisions by analyzing data. Experimentation is a data-driven foundation of the decision-making process, which helps with measuring the precise […]

2020 Cloud Report

Read the 2020 Cloud Report from Cockroach Labs, and learn which cloud platform performs best for transactional workloads across TPC-C, Network Throughput, CPU, and Storage benchmarks. If there’s one thing we’ve learned in our three years of benchmarking cloud providers on transactional workloads, it’s this: the results change often. Last year’s report showed AWS dramatically […]