Progress has been rapidly accelerating in machine learning models that process language over the last couple of years. This progress has left the research lab and started powering some of the leading digital products. A great example of this is the . Google believes this step (or progress in natural language understanding as applied in search) represents “the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of search”.
This post is a simple tutorial for how to use a variant of BERT to classify sentences. This is an example that is basic enough as a first intro, yet advanced enough to showcase some of the key concepts involved.
This year, we saw a dazzling application of machine learning. The GPT-2 exhibited an impressive ability to write coherent and passionate essays that exceeded what we anticipated current language models were able to produce. The GPT-2 wasn’t a particularly novel architecture – its architecture is very similar to the decoder-only transformer. The GPT-2 was, however, a very large, transformer-based language model trained on a massive dataset. In this post, we’ll look at the architecture that enabled the model to produce its results. We will go into the depths of its self-attention layer. And then we’ll look at applications for the decoder-only transformer beyond language modeling.
My goal here is also to supplement my earlier post, The Illustrated Transformer, with more visuals explaining the inner workings of transformers, and how they’ve evolved since the original paper. My hope is that this visual language will make it easier to explain later Transformer-based models as their inner workings continue to evolve.
The NumPy package is the workhorse of data analysis, machine learning, and scientific computing in the Python ecosystem. It vastly simplifies manipulating and crunching vectors and matrices. Some of Python’s leading packages rely on NumPy as a fundamental piece of their infrastructure (examples include scikit-learn, SciPy, pandas, and TensorFlow). Beyond the ability to slice and dice numeric data, mastering NumPy will give you an edge when dealing with and debugging advanced use cases in these libraries.
In this post, we’ll look at some of the main ways to use NumPy and how it can represent different types of data (tables, images, text, etc.) before we can serve them to machine learning models.
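To make the idea of slicing and dicing numeric data a little more concrete, here is a minimal sketch. The values and variable names are made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

# A small table: rows are samples, columns are features (made-up values).
table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

first_column = table[:, 0]         # slice one feature across all rows
column_means = table.mean(axis=0)  # aggregate down each column
centered = table - column_means    # broadcasting subtracts the means from every row

# A tiny grayscale "image" is just a 2-D array of pixel intensities
# (a hypothetical 2x2 example, not taken from any real picture).
image = np.array([[0, 255],
                  [128, 64]], dtype=np.uint8)

print(first_column, column_means, centered, image.shape)
```

The same array type holds both the table and the image; only the shape and the meaning we attach to each axis differ.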
In this video, I introduce word embeddings and the word2vec algorithm. I then discuss how the word2vec algorithm is used to create recommendation engines in companies like Airbnb and Alibaba. I close by glancing at real-world consequences of popular recommendation systems like those of YouTube and Facebook.
My Illustrated Word2vec post used and built on the materials I created for this talk (but didn’t include anything on the recommender application of word2vec). This was my first talk at a technical conference and I spent quite a bit of time preparing for it. In the six weeks prior to the conference I spent about 100 hours working on the presentation and ended up with 200 slides. It was an interesting balancing act of trying to make it introductory but not shallow, suitable for senior engineers and architects yet not necessarily ones who have machine learning experience.
“There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote
bush or the pattern of its leaves.
We try to copy these patterns in our lives and our society,
seeking the rhythms, the dances, the forms that comfort.
Yet, it is possible to see peril in the finding of
ultimate perfection. It is clear that the ultimate
pattern contains its own fixity. In such
perfection, all things move toward death.”
~ Dune (1965)
I find the concept of embeddings to be one of the most fascinating ideas in machine learning. If you’ve ever used Siri, Google Assistant, Alexa, Google Translate, or even a smartphone keyboard with next-word prediction, then chances are you’ve benefitted from this idea that has become central to Natural Language Processing models. There has been quite a development over the last couple of decades in using embeddings for neural models (recent developments include contextualized word embeddings leading to cutting-edge models like BERT and GPT2).
Word2vec is a method to efficiently create word embeddings and has been around since 2013. But in addition to its utility as a word-embedding method, some of its concepts have been shown to be effective in creating recommendation engines and making sense of sequential data even in commercial, non-language tasks. Companies like , , , and have all benefitted from carving out this brilliant piece of machinery from the world of NLP and using it in production to empower a new breed of recommendation engines.
In this post, we’ll go over the concept of embedding, and the mechanics of generating embeddings with word2vec. But let’s start with an example to get familiar with using vectors to represent things. Did you know that a list of five numbers (a vector) can represent so much about your personality?
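As a rough sketch of what representing a person as a list of five numbers can buy us, the snippet below compares such vectors. The trait scores are invented, and cosine similarity is my own choice of comparison here (an assumption, since this excerpt does not name a measure).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical five-trait scores in the range [-1, 1]; not real data.
person_a = [-0.4, 0.8, 0.5, -0.2, 0.3]
person_b = [-0.3, 0.2, 0.3, -0.4, 0.9]
person_c = [0.4, -0.8, -0.5, 0.2, -0.3]  # the exact opposite of person_a

print(cosine_similarity(person_a, person_b))
print(cosine_similarity(person_a, person_c))
```

A score close to 1 means the two vectors point in nearly the same direction, while a score close to -1 means they point in opposite directions, so person_b comes out as more similar to person_a than person_c does.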
The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing or NLP for short). Our conceptual understanding of how best to represent words and sentences in a way that best captures underlying meanings and relationships is rapidly evolving. Moreover, the NLP community has been putting forward incredibly powerful components that you can freely download and use in your own models and pipelines (It’s been referred to as , referencing how years ago similar developments accelerated the development of machine learning in Computer Vision tasks).
If you’re planning to learn data analysis, machine learning, or data science tools in Python, you’re most likely going to be using the wonderful pandas library. Pandas is an open source library for data manipulation and analysis in Python.
One of the easiest ways to think about it is that you can load tables (and Excel files) and then slice and dice them in multiple ways:
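Here is a small sketch of a few of those ways. The table is built in memory with made-up values so the snippet runs on its own; in practice you would more likely load it with something like `pd.read_csv` or `pd.read_excel`.

```python
import pandas as pd

# A made-up table; the column names and values are purely illustrative.
df = pd.DataFrame({
    "artist": ["A", "B", "C", "D"],
    "genre": ["rock", "jazz", "rock", "pop"],
    "listeners": [1200, 300, 800, 500],
})

genre_column = df["genre"]             # select one column
rock_rows = df[df["genre"] == "rock"]  # filter rows by a condition
first_two = df.iloc[:2]                # slice rows by position
listeners_by_genre = df.groupby("genre")["listeners"].sum()  # aggregate per group

print(rock_rows)
print(listeners_by_genre)
```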
Watch: MIT’s lecture referencing this post
In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper . A TensorFlow implementation of it is available as a part of the package. Harvard’s NLP group created a . In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand for people without in-depth knowledge of the subject matter.
A High-Level Look
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Watch: MIT’s lecture referencing this post
May 25th update: New graphics (RNN animation, word embedding graph), color coding, elaborated on the final attention example.
Note: The animations below are videos. Touch or hover on them (if you’re using a mouse) to get play controls so you can pause if needed.
Sequence-to-sequence models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning. Google Translate started using such a model in production in late 2016. These models are explained in the two pioneering papers (, ).
I found, however, that understanding the model well enough to implement it requires unraveling a series of concepts that build on top of each other. I thought that a bunch of these ideas would be more accessible if expressed visually. That’s what I aim to do in this post. You’ll need some previous understanding of deep learning to get through this post. I hope it can be a useful companion to reading the papers mentioned above (and the attention papers linked later in the post).
A sequence-to-sequence model is a model that takes a sequence of items (words, letters, features of an image, etc.) and outputs another sequence of items. A trained model would work like this:
I love using Python’s pandas package for data analysis. The is a great place to start learning how to use it for data analysis.
Things get a lot more interesting once you’re comfortable with the fundamentals and start with . That guide shows some of the more interesting functions of reshaping data. Below are some visualizations to go along with the pandas reshaping guide.
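For a feel of what reshaping means in code, here is a minimal sketch using `pivot` and `melt`, two real pandas functions. The small table and its column names are invented for illustration and are not taken from the guide.

```python
import pandas as pd

# A made-up "long" table: one row per (date, variable) pair.
long_df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "variable": ["a", "b", "a", "b"],
    "value": [1, 2, 3, 4],
})

# pivot: long -> wide, one row per date and one column per variable.
wide_df = long_df.pivot(index="date", columns="variable", values="value")

# melt: wide -> long again, undoing the pivot.
back_to_long = wide_df.reset_index().melt(
    id_vars="date", var_name="variable", value_name="value"
)

print(wide_df)
print(back_to_long)
```

Pivoting spreads the values of one column out into new columns, and melting gathers columns back into rows, so the two calls are roughly inverses of each other.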
I’m not a machine learning expert. I’m a software engineer by training and I’ve had little interaction with AI. I had always wanted to delve deeper into machine learning, but never really found my “in”. That’s why when Google open sourced TensorFlow in November 2015, I got super excited and knew it was time to jump in and start the learning journey. Not to sound dramatic, but to me, it actually felt kind of like Prometheus handing down fire to mankind from the Mount Olympus of machine learning. In the back of my head was the idea that the entire field of Big Data and technologies like Hadoop were vastly accelerated when Google researchers released their MapReduce paper. This time it’s not a paper – it’s the actual software they use internally after years and years of evolution.
So I started learning what I could about the basics of the topic, and saw the need for gentler resources for people with no experience in the field. This is my attempt at that.
In November 2015, Google open sourced TensorFlow, its latest and greatest machine learning library. This is a big deal for three reasons:
Machine Learning expertise: Google is a dominant force in machine learning. Its prominence in search owes a lot to the strides it achieved in machine learning.
Scalability: the announcement noted that TensorFlow was initially designed for internal use and that it’s already in production for some live product features.
Ability to run on Mobile.
This last reason is the operating reason for this post since we’ll be focusing on Android. If you examine the , you’ll find a little directory. I’ll try to shed some light on the Android TensorFlow example and some of the things going on under the hood.