Library of Congress Storage Architecture

In 2026, will there be demand for 7X more manufactured storage annually, and will that storage be valuable enough to justify spending $122B more annually (2.4X)? Unlike with HDD, magnetic physics is not the limiting issue for tape, since tape bit cells are 60X larger than HDD bit cells … The projected tape areal density for 2025 (90 Gbit/in²) is 13X smaller than today’s HDD areal density and has already been demonstrated in laboratory environments.
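The density ratio can be sanity-checked with simple arithmetic. This minimal sketch assumes "13X smaller" means today's HDD areal density is 13 times the projected 2025 tape figure. The source does not state the HDD number itself, and the constant and function names are illustrative.

```python
# Back-of-envelope check of the areal-density ratio quoted above.
# Assumption: "13X smaller" means HDD density = 13 * projected tape density.
TAPE_2025_GBIT_PER_IN2 = 90  # projected tape areal density (from the text)
HDD_TO_TAPE_RATIO = 13       # "13X smaller" (from the text)


def implied_hdd_density_gbit(tape_gbit: float, ratio: float) -> float:
    """Return the HDD areal density implied by a tape density and a ratio."""
    return tape_gbit * ratio


hdd = implied_hdd_density_gbit(TAPE_2025_GBIT_PER_IN2, HDD_TO_TAPE_RATIO)
print(f"Implied HDD areal density: {hdd:.0f} Gbit/in^2 (~{hdd / 1000:.2f} Tbit/in^2)")
```

Under that reading, today's HDDs sit at roughly 1.17 Tbit/in², which is the gap the projected tape roadmap is measured against.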

Artificial intelligence yields new antibiotic

Using a machine-learning algorithm, MIT researchers have identified a powerful new antibiotic compound. In laboratory tests, the drug killed many of the world’s most problematic disease-causing bacteria, including some strains that are resistant to all known antibiotics. It also cleared infections in two different mouse models. The computer model, which can screen more than a hundred million chemical compounds in a matter of days, is designed to pick out potential antibiotics that kill bacteria using different mechanisms than those of existing drugs.
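The screening step described above amounts to scoring a large compound library with a trained model and keeping the high-scoring candidates. The minimal sketch below assumes the model's scores are already computed. It uses structural dissimilarity to known antibiotics (Tanimoto similarity on fingerprint bits) as a rough proxy for "different mechanisms." That proxy, the thresholds, and the names `shortlist` and `tanimoto` are illustrative assumptions, not the MIT team's pipeline.

```python
from typing import Dict, FrozenSet, List

# A molecular fingerprint represented as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def shortlist(
    scores: Dict[str, float],             # hypothetical model-predicted activity per compound
    fingerprints: Dict[str, Fingerprint],
    known_antibiotics: List[Fingerprint],
    min_score: float = 0.5,               # illustrative cutoff
    max_similarity: float = 0.4,          # illustrative cutoff
) -> List[str]:
    """Keep high-scoring compounds that look structurally unlike known drugs."""
    picked: List[str] = []
    for cid, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < min_score:
            break  # scores are sorted descending, so nothing further qualifies
        nearest = max(
            (tanimoto(fingerprints[cid], k) for k in known_antibiotics), default=0.0
        )
        if nearest <= max_similarity:
            picked.append(cid)
    return picked


# Toy example: c1 scores well but resembles a known drug; c3 scores too low.
fps = {"c1": frozenset({1, 2, 3}), "c2": frozenset({7, 8, 9}), "c3": frozenset({1, 2})}
print(shortlist({"c1": 0.9, "c2": 0.8, "c3": 0.3}, fps, [frozenset({1, 2, 3, 4})]))
```

Sorting once and stopping at the first below-threshold score keeps the pass over a very large library linear after the sort.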

Deep Learning for Anomaly Detection

Anomalies, often referred to as outliers, are data points or patterns in data that do not conform to a notion of normal behavior. Anomaly detection, then, is the task of finding those patterns in data that do not adhere to expected norms. The capability to recognize or detect anomalous behavior can provide highly useful insights across industries. Flagging or enacting a planned response when these unusual cases occur can save businesses time, money, and customers.
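A classical baseline is a useful reference point before reaching for a deep model. It flags points whose z-score (distance from the mean, in standard deviations) exceeds a threshold. This is a swapped-in statistical technique, not a deep-learning method, and `zscore_anomalies` and its default threshold are illustrative choices.

```python
import statistics
from typing import List, Sequence


def zscore_anomalies(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates from the norm
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


print(zscore_anomalies([10, 11, 9, 10, 12, 10, 50], threshold=2.0))  # → [6]
```

In the example, the outlier also inflates the standard deviation. A single outlier among seven points can never exceed a z-score of about 2.45, so it clears 2.0 but not the default 3.0. Masking effects like this are one motivation for the learned approaches the article covers.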

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing

At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process data at a massive scale in real time with exactly-once semantics, and the emergence of these systems over the past several years has unlocked an industry-wide ability to write streaming data processing applications at low latencies, a capability previously impossible to achieve at scale.
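The core idea of a Kappa architecture is a single streaming code path, with reprocessing done by replaying the retained event log through that same code. The minimal sketch below illustrates this with a tumbling-window count keyed by event time. The `Event` schema and `windowed_counts` are hypothetical. De-duplicating by event ID is only one simple way to get effectively-once results. Flink and Spark rely on checkpointed state instead, which this sketch does not model, and its in-memory `seen` set would grow without bound in production.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique ID, used for de-duplication
    city: str
    ts: int        # event time, in seconds


def windowed_counts(events: Iterable[Event], window_s: int = 60) -> Dict[Tuple[str, int], int]:
    """Count events per (city, tumbling-window start), ignoring duplicate IDs.

    The same function serves live processing and a backfill that replays
    the retained log, so both paths produce identical results.
    """
    seen: Set[str] = set()
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for e in events:
        if e.event_id in seen:
            continue  # redelivered event: skip it so each event counts once
        seen.add(e.event_id)
        window_start = e.ts - e.ts % window_s  # windows keyed by event time
        counts[(e.city, window_start)] += 1
    return dict(counts)


log = [
    Event("a", "sf", 5),
    Event("b", "sf", 59),
    Event("a", "sf", 5),  # duplicate delivery of event "a"
    Event("c", "sf", 61),
    Event("d", "nyc", 10),
]
live = windowed_counts(log)
backfill = windowed_counts(reversed(log))  # replay arriving in a different order
print(live == backfill)  # → True
```

Because windows are keyed by event time rather than arrival time, a replay yields the same counts even when events arrive in a different order.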

How Amazon is solving big-data challenges with data lakes

Back when Jeff Bezos filled orders in his garage and drove packages to the post office himself, crunching the numbers on costs, tracking inventory, and forecasting future demand was relatively simple. Fast-forward 25 years, and Amazon’s retail business has more than 175 fulfillment centers (FC) worldwide with over 250,000 full-time associates shipping millions of items per day. Amazon’s worldwide financial operations team has the incredible task of tracking all of that data (think petabytes).

2019 in Review: 10 AI Papers That Made an Impact

Synced spotlights 10 artificial intelligence papers that garnered extraordinary attention and accolades in 2019. The volume of peer-reviewed AI research papers has grown by more than 300 percent over the past three decades (Stanford AI Index 2019), and the top AI conferences in 2019 saw a deluge of papers. CVPR submissions spiked to 5,165, a 56 percent increase over 2018; ICLR received 1,591 main conference paper submissions, up 60 percent over last year; ACL reported a record-breaking 2,906 submissions, almost doubling last year’s 1,544; and ICCV 2019 received 4,303 submissions, more than twice the 2017 total.
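The growth figures can be inverted to estimate the previous totals. This is a rough sketch. The percentages are presumably rounded, so the implied counts are approximate, and `prior_count` is an illustrative helper.

```python
def prior_count(current: int, pct_increase: float) -> float:
    """Estimate the prior total given the current total and its percent increase."""
    return current / (1 + pct_increase / 100)


# Figures quoted in the text above.
cvpr_2018 = prior_count(5165, 56)   # CVPR 2019: 5,165 submissions, +56%
iclr_prior = prior_count(1591, 60)  # ICLR: 1,591 submissions, +60%
acl_ratio = 2906 / 1544             # ACL: "almost doubling" last year's 1,544
iccv_2017_max = 4303 / 2            # ICCV 2019 was "more than twice" the 2017 total

print(f"CVPR 2018 ~ {cvpr_2018:.0f}")
print(f"ICLR previous year ~ {iclr_prior:.0f}")
print(f"ACL growth factor ~ {acl_ratio:.2f}")
print(f"ICCV 2017 < {iccv_2017_max:.1f}")
```

The numbers imply roughly 3,311 CVPR submissions in 2018, about 994 for the previous ICLR, a 1.88X ACL jump, and fewer than 2,151.5 ICCV submissions in 2017.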

How we 30x’d our Node parallelism

What’s the best way to safely increase parallelism in a production Node service? That’s a question my team needed to answer a couple of months ago. We were running 4,000 Node containers (or ‘workers’) for our bank integration service. The service was originally designed such that each worker would process only a single request at a time. This design lessened the impact of integrations that accidentally blocked the event loop, and allowed us to ignore the variability in resource usage across different integrations.
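Raising per-worker concurrency from one request to many is essentially a bounded-concurrency problem. The sketch below uses Python's asyncio as a stand-in for Node's event loop. `run_bounded`, the limit of 5, and the simulated requests are illustrative assumptions, not the team's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


async def run_bounded(factories: List[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run coroutine factories concurrently, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # waits here once `limit` requests are already running
            return await factory()

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(guarded(f) for f in factories))


async def demo(limit: int) -> Tuple[List[int], int]:
    """Simulate 20 I/O-bound requests and record the peak concurrency."""
    state: Dict[str, int] = {"now": 0, "peak": 0}

    async def fake_request(i: int) -> int:
        state["now"] += 1
        state["peak"] = max(state["peak"], state["now"])
        await asyncio.sleep(0.01)  # stands in for non-blocking network I/O
        state["now"] -= 1
        return i

    results = await run_bounded([lambda i=i: fake_request(i) for i in range(20)], limit)
    return results, state["peak"]


results, peak = asyncio.run(demo(limit=5))
print(f"completed {len(results)} requests, peak concurrency {peak}")
```

As in Node, a handler that blocks the event loop stalls every request in flight on that worker. A fixed cap limits that blast radius and bounds per-worker resource use, but it does not remove the risk.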

Serving 100µs reads with 100% availability

This is the story of how we built ctlstore, a distributed multi-tenant data store that features effectively infinite read scalability, serves queries in 100µs, and can withstand the failure of any component. Highly-reliable systems need highly-reliable data sources. Segment’s stream processing pipeline is no different. Pipeline components need not only the data that they process, but additional control data that specifies how the data is to be processed. End users configure some settings in a UI or via our API, which in turn manipulates the behavior of the pipeline.
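One common way to get fast, highly available reads of control data is to keep a read-only copy on every node and feed it from an ordered change log, so reads never leave the host. The sketch below illustrates that general pattern with an in-memory SQLite table. `LocalReplica`, the `Change` tuple, and the key names are hypothetical and are not ctlstore's actual API. The sketch makes no claim about reaching 100µs latency.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical change-log entry: (sequence number, key, value or None for delete).
Change = Tuple[int, str, Optional[str]]


class LocalReplica:
    """Node-local, read-optimized copy of control data (illustrative only)."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        self.applied_seq = 0  # highest log sequence number already applied

    def apply(self, changes: Iterable[Change]) -> None:
        """Apply changes in order; entries at or below applied_seq are skipped."""
        seq_after = self.applied_seq
        with self.db:  # one transaction per batch; rolls back on error
            for seq, key, value in changes:
                if seq <= seq_after:
                    continue  # already applied: makes replays idempotent
                if value is None:
                    self.db.execute("DELETE FROM kv WHERE k = ?", (key,))
                else:
                    self.db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
                seq_after = seq
        self.applied_seq = seq_after  # advance only after the batch commits

    def get(self, key: str) -> Optional[str]:
        """Serve a read entirely from local storage, with no network hop."""
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


replica = LocalReplica()
replica.apply([(1, "source:a", "enabled"), (2, "source:b", "disabled")])
replica.apply([(2, "source:b", "stale"), (3, "source:a", None)])  # seq 2 is a replay
print(replica.get("source:b"), replica.get("source:a"))
```

Tracking the last applied sequence number lets a replica safely re-read overlapping parts of the log after a restart. Because each node answers reads from its own copy, losing the central writer stalls updates but not reads.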

From 15,000 database connections to under 100: DigitalOcean’s tale of tech debt

A new hire recently asked me over lunch, “What does DigitalOcean’s tech debt look like?” I could not help but smile when I heard the question. Software engineers asking about a company’s tech debt is the equivalent of asking about a credit score. It’s their way of … As a cloud provider that manages our own servers and hardware, we have faced complications that many other startups have not encountered in this new era of cloud computing.

Boeing’s Starliner won’t make it to the ISS now because its internal clock went wrong

Boeing’s CST-100 Starliner launched into space for the first time today, but the spacecraft failed to make it into a stable orbit that would allow it to rendezvous with the International Space Station. What happened: An Atlas V rocket safely carried Starliner into space from Cape Canaveral Air Force Station on Friday, but the capsule had an anomaly with its internal system timer. A faulty internal clock means Starliner won’t rendezvous with the ISS, a massive setback for NASA and Boeing.