Word2Vec: A Comparison Between CBOW, SkipGram & SkipGramSI

Learn how different Word2Vec architectures behave in practice. This is to help you make an informed decision on which architecture to use given the problem you are trying to solve. In this article, we will look at how the different neural network architectures for training a Word2Vec model behave in practice. The idea here is […]

The Dark Secrets Of BERT

BERT stands for Bidirectional Encoder Representations from Transformers. This model is basically a multi-layer bidirectional Transformer encoder(Devlin, Chang, Lee, & Toutanova, 2019), and there are multiple excellent guides about how it works generally, includingthe Illustrated Transformer. What we focus on is one specific component of Transformer architecture known as self-attention. In a nutshell, it is […]

10 Best Machine Learning Textbooks that All Data Scientists Should Read

Machine learning is an intimidating topic to tackle for the first time. The term encompasses so many fields, research topics and business use cases, that it can be difficult to even know where to start. To combat this, it’s often a good idea to turn to textbooks that will introduce you to the basic principles […]

The Best NLP Papers From ICLR 2020

I went through 687 papers that were accepted to ICLR 2020 virtual conference (out of 2594 submitted – up 63% since 2019!) and identified 9 papers with the potential to advance the use of deep learning NLP models in everyday use cases. Source: topbots

A Hacker’s Guide to Efficiently Train Deep Learning Models

Three months ago, I participated in a data science challenge that took place at my company. The goal was to help a marine researcher better identify whales based on the appearance of their flukes. More specifically, we were asked to predict for each image of a test set, the top 20 most similar images from […]

Millions of tiny databases

The Physalia configuration store for chain replication of EBS is implemented as key-value stores maintained over a large number of these cells. They built a test harness, called SimWorld, which abstracts networking, performance, and other systems concepts. The goal of this approach is to allow developers to write distributed systems tests, including tests that simulate […]

NTP: Building a more accurate time service at Facebook scale

Almost all of the billions of devices connected to the internet have onboard clocks, which need to be accurate to properly perform their functions. Many clocks contain inaccurate internal oscillators, which can cause seconds of inaccuracy per day and need to be periodically corrected. Incorrect time can lead to issues, such as missing an important […]

Reducing UDP latency

Hi! I’m one of Embox RTOS developers, and in this article I’ll tell you about one of the typical problems in the world of embedded systems and how we were solving it. Control and responsibility is a key point for a wide range of embedded systems. On the one hand, sensors and detectors must notify […]

Cloudflare’s Current Expansion Is Different from the Others

The company is expanding its US network in a big way, and it’s turned to two edge data center startups for help. In January, Cloudflare, which helps companies make their web services run faster and be more secure – and which more recently started to use its global data center network to also provide cloud […]