Highlights of SIGIR 2016

26 Jul 2016 - 13 minute read

This note contains highlights of SIGIR 2016 held at Pisa, Italy. SIGIR (Special Interest Group for Information Retrieval) is an annual conference related to new advances in Information Retrieval. I haven’t gone through all the papers yet, but sharing some highlights of the parts that I could attend / go through offline via the slides / proceedings.

Tourists posing near the Leaning Tower.

Day 1

The first day was dedicated to tutorials. Most tutorials were ‘survey’ like in the content, in that they did not present anything new, but were good if you want to get an idea about what’s happening in a particular area of IR.

Deep Learning Tutorial

This tutorial was conducted by members of the Huawei Labs, China. It was about the current state of Deep Learning in the industry.

Went over basics like Convolutional Networks, Long Short Term Memory (LSTM), Embeddings, Recurrent Neural Networks, etc.
Application of these techniques to Classification, Matching, Translation, Q&A answering, etc.
Slides here

Succinct Data Structures

This tutorial was about data-structures related to inverted indices. It went over both, theory (minimal and easy to understand), as well as a hands-on session. I really enjoyed this tutorial.

Related to minimizing the space occupied by an inverted index, specifically about optimizing storage and lookups in posting lists. Link to the tutorial
Talked about Elias-Fano & Partitioned-Elias Fano encoding.
Practical task of constructing an inverted index for suggesting top-k keywords for a typeahead (based on wikipedia page titles).
Demo-ed the use of their Succinct Data Structures Library (SDSL).
Described different data structures such as Wavelet trees, Suffix Arrays, Tries, etc.

Day 2 & Day 3

The second day started with a keynote from Christopher Manning. Some highlights from the talk:

His talk was about Natural Language Inference, and how Deep Learning can help with that.
The primary goal was to be able to answer 4th grade science examinations.
He talked about employing semantic embeddings between sentences and phrases for this.
More detail about his talk here.

Listing notes from few talks which were interesting:

Learning to Rank with Selection Bias in Personal Search (Google)

Personal Search is hard because human judgement is not possible on personal corpora such as GMail, GDrive.
Personal Search is much sparser than web-search, as in for each searcher, the corpus is almost entirely different. Hence we cannot use models which use a large number of click data for each query-document pair.
Went over the explanation of Selection Bias: How queries with equal probabilities of appearing in the universe of all queries, can have different probabilities of appearing in the training set, because of selection bias.
Introduced the inverse propensity bias, which is the ratio between the probability of the given query appearing in a random sample to the probability of the query appearing in the training set, and use that as a multiplicative weight for the loss function.
They try to learn these weights in a global as well as segmented fashion. Where segmentation happens on query categories.
They show improvements in online metrics as a result of accounting for selection biases.

Fast and Compact Hamming Distance Index

This was another interesting talk, if you are interested in data-structures and efficiency work. The problem was to find all keys (assume fixed-size integers) in a given set $S$, which are at most distance $k$ away from a given key $Q$. Distance here is defined as the number of bits which differ.
Applications are similarity detection / finding near duplications, image similarity. They use simhash to compute a hash of the document under question. Simhash has a property which is atypical of hash functions, in that it generates similar hashes for similar documents.
They discussed several approaches like breaking each key into blocks of size $k$, and checking if at least one block matches with $Q$ (each key which passes this test is a candidate).
Presented a Triangle Inequality approach, where $H(Q, P)$ $\leq$ $H(Q, K) + H(K, P)$. Where $H(A, B)$ is the hamming distance between $A$ & $B$. This is helpful in clustering together similar keys, and avoiding wasteful comparisons.

Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising

The problem here was to match search queries to relevant ads.
Used semantic embeddings to match queries to ads.
All relevant ads had to have a minimum cosine similarity score, which was decided using offline ratings. Use skipped over ads as a negative signal.
For training over large data-sets, they parallelize the training not by sharding the data, but by splitting the learning of contiguous non-overlapping range of the dense vector representations. That is, one trainer machine can learn [d0,…,d15], the other can learn [d16,..d31], and so on, but on the entire training data, using different random seeds.
These partial embeddings are then stitched together.

Day 4

Day 4 had talks from the Industry which talked about IR systems at scale (big / small). I found these to be very interesting. It’s sad that they were not recorded.

Search Is Not a Box - Hadar Shemtov (Google)

This talk went over Google Voice Search.
About 50% of all searches come from Mobile. I found this number to be smaller than expected, given that Android + iOS usually account for significantly more than 50% in several metrics for other big products.
20% queries now come from voice. This number is huge, given voice is still relatively new.
The speaker then went over the differences between Voice & Typed queries: Intent mix is different for voice v/s typed. The former has more of informational queries, like weather, movie times, game scores, etc.
The results for these queries are built via curated knowledge graphs, featured snippets, etc.
The searchers don’t look at multiple results for voice. This could be because they use voice in a hands-free mode, like when driving / cooking, etc.
Voice has to support a conversational mode. Voice Search cannot be stateless, as there would be follow-up questions.
Modules for Voice Search cannot optimize for clicks anymore.
They support the ability to zoom into excerpts from matching documents, to be able to zoom into the precise answer. This was really cool.
They went one level deeper with Google Now to proactively present results for queries you are likely to perform.

There was a related paper presented by Ido Guy from Yahoo! Research.

Searching by Talking: Analysis of Voice Queries on Mobile Web Search

This paper has a more deeper analysis of voice v/s web queries, along with backing data. I really recommend reading this paper, as it is very well written and received one of the several Best Papers award.
More Q&A + audio/video sort of queries.
Found that voice queries tend to be longer.
Language used in voice search is very different:
- Richer language such as “i’m looking for”, “what is the”, and even “please”.
- Higher Use of words which are harder to spell, but easy to say, such as “Mississippi”, and avoidance of terms which are easier to spell, but harder to say, such as “2015”.

When Watson Went to Work - Aya Soffer (IBM Research)

IBM Research built Watson, and trained it to play Jeopardy as a way to demonstrate success in Q&A.
They then exposed parts of Watson as cloud APIs (retrieval & ranking, sentiment analysis, entity tagging, etc.)
Multiple applications within IBM for these modular APIs.
She then talked in detail about how they are being used in Customer Care to build intelligent bots.

Ask Your TV: Real-Time Question Answering with Recurrent Neural Networks - (Ferhan Ture, Comcast)

Comcast has built a ‘Voice Remote’, which should basically free the user from actually knowing the channel numbers / looking up the on-screen guide. It can take commands such as “HBO” (which will switch the channel to HBO).
They also want it to support Q&As. Such as, “Tom Hanks Birthday”, “Who played Donovan in Bridge of Spies?”.
Created synthetic Q&As using Knowledge Graphs. Like if they know attributes like { “person”: “Tom Hanks”, “birthday”: “01/01/1950”, “place-of-birth”: “USA” }, they can create questions like “Where was Tom Hanks born?”, “When was Tom Hanks born?”, etc.
They then use RNNs for tagging entities, and classifying the question into a certain question type.
They also added diversity into the way a question was phrased, using synonyms, to get a well-trained model independent of the way the searcher articulates. Also added noise into the training data.
Used the keras toolkit which uses theano underneath, to write a quick script to train a model (apparently 5 lines to get boot-strapped).

Amazon Search: The Joy of Ranking Products (Daria Sorokina)

Amazon has about 100 ML models for ranking products. They have one model per category (Electronics, Books, etc.) per site (US, Japan, India, etc.).
They use GBDT as the training algorithm, with about 200 trees / model, and ~ 150 features per model.
They use data from searcher behavior to train the models. Positive Labels are like: Item was Clicked, Added to Cart, Purchased, Consumed (Applicable for Video / Music).
Negative Labels are like: Ignored, Clicks on a random result placed in the top few results (indicates positional bias).
Their result blending across different categories / verticals works as follows:
- They use the Vertical Score as a feature
- Relevance of the query for the given category / vertical.
- Hunger Score: To introduce diversity in the results, each vertical has a score which is proportional to how long its results have not been included in the blended results.
They use Probabilistic Context Free Grammars (PCFG) for query understanding. For example, “dress” in “dress shoes” is a modifier, but not in “casual dress”.
They use reformulations as a signal to find related products.
They semi-curate / bias results for certain queries, such as “diamond jewelry”.
- In this case, if they optimize for CTR, products such as fake/cheap diamond jewelry would always get placed at the top.
- They find certain queries for which they want to
- They categorize their users into “fashionable” users, and use their training data to rank results for such queries.

Learning to Rank Personalized Search Results in Professional Networks (Viet Ha-Thuc)

LinkedIn Search has different use-cases (recruiting, connecting, job seeking, sales, research, etc.)
They would want to personalize the results for recruiters, job seekers, etc.
Use LinkedIn “skills” as a way to cluster users, underlying assumption being that people with similar skills are likely to connect (while also removing unhelpful skills).
Getting Intent Estimations for different intents.
Use intent estimations for their federated search (people result, job result, group result, etc.)
Slides here

Day 5

The last day had workshops. I attended the first few talks of the NeuIR (Neural Network IR) workshop, before I had to leave to catch my flight. The keynote was given by Tomas Mikolov from FAIR. His slides are here.

Key points:

Gave a brief history of Recurrent Neural Networks (RNN), and their resurgence after 2010, when a key problem with the gradient descent (“exploding” gradients) was fixed. Also went through their applications.
Simple Recurrent Networks (SRN) and Structurally Constrained Recurrent Nets (SCRN) as a way to store Longer Short Term Memory.
SCRNs perform better than traditional LSTMs for large data. I couldn’t get much in terms of intuition as to why what he said was true.
Rest of the talk comprised of discussing what an Intelligent Machine would comprise of.
The idea is to train the machine (learner) on simple tasks, and it should be able to learn how to perform similar tasks without supervision, but through rewards. The motivation is to be able to do well in general tasks, not just a specific application (like, say playing Go).
The learner should be able to interact with humans and be able to search the internet and learn how to perform certain tasks. These are very ambitious goals. Presented Stack RNNs, which can be used for tasks such as recognition of patterns like a^n b^n, or learning how to perform binary addition.

Miscellaneous Papers I Glanced Over

Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015

A thorough analysis of papers in the last 10 years, and a report regarding whether or not they used significance tests, which tests were used, and whether or not p-values and/or test statistics were reported.
While number of papers where significance tests were being used has increased, the proportion of papers which report p-values and/or test statistics hasn’t. The authors also mention ‘overpowered’ cases, where the test/control populations were so large, a very tiny movement was also stat-sig (0.007% in the case of this case)

Query to Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor Grammars

They try to extract out entities (product and brand) from shopping queries in an unsupervised fashion.
They employ several steps to clean up the data (removing stop-words, re-ordering words to match known distributions of word groupings, i.e., replacing “head bobble” with “booble head”, for example).
They use Probabilistic Context Free Grammars, which have several derivation trees for each rule, with associated probability to work on this cleaned up data.

Explicit In Situ User Feedback for Search Results

Microsoft had an interesting poster, where they demo-ed that they now ask for feedback regarding a certain result, after the searcher clicks on that result, but clicks on the back button.
They then use these results along with offline raters, and try to correlate their offline relevance v/s user reports. Though there is inherent selection bias in this set, since searchers are likely to click on the back button mostly for unsatisfactory results and/or browsy behaviors.

Overall

This was my first IR conference, and an academic conference in a long time. These are the key takeaways:

Deep Learning has several applications in IR. This is true both for Industry as well as Academia.
- Industrial talks were straight-forward applications around semantic understanding, query understanding, Q&A answering, etc.
- Fun tidbit: It seems constructing very deep networks is in vogue, and a good way to get a paper accepted. One particular architecture presented during the conference had a convolutional layer, an LSTM layer, max-pooling, and a fully connected layer. When asked by the audience, if they tried a less deep network (because their problem didn’t seem complex enough), they responded that trying to go less deep was Future Work :-)
I enjoyed the more ‘applied’ aspects of the conference, such as the tutorials, workshops, industrial talks and poster presentations.
A significant fraction of the academic papers that I could go through, had very specific scope, not always well-thought in terms of scale, and sometimes not thorough in terms of stat-signess.
I felt presentations could have been more effective with examples. I liked how some of efficiency papers, posters and industrial track talks were more driven by examples. People needing deeply technical details, can always refer to the papers.
Enjoyed interacting with people from different places (such as Microsoft, Academia, small startups (scale small enough that they can take short-cuts)), and understand their perspective.

Lambda: A LISP interpreter written in Go for an iOS app

16 Jan 2016 - 15 minute read

In the summer of 2015, I wanted to work on a side-project, that I can quickly use, instead of spending lots of weekends, and it getting no where. Luckily, I found the right project on Peter Norvig’s website. Here I will try to describe, how to write a simple LISP interpreter, and how to use that in your iOS app.

To those who are new to LISP, it is pretty simple to explain. LISP programs are based on something called ‘s-expressions’. An s-expression looks like this: $(\mathrm{operator}\ \mathrm{operand_1}\ \mathrm{operand_2}\ \mathrm{operand_3}\ …)$.

For example:

$(+\ 1\ 2)$ is an s-expression, where $+$ is the operator, and $1$ and $2$ are the operands. Hence, it is equivalent to $1\ +\ 2$.
Similarly, $(*\ 2\ 3\ 4)$ in LISP evaluates to $2\ *\ 3\ *\ 4$.

The operands can themselves be recursively computed too.

For example, $(+$ $1$ $(*$ $2$ $3))$ is a valid expression. First we evaluate the inner $(*$ $2$ $3)$ part, then the original expression resolves to $(+$ $1$ $6)$, which then evaluates to $7$. This can go on recursively.

For someone who wants to design an interpreter, LISP is the ideal real-life language to start with. This is for two reasons:

There are minimal symbols to interpret. Only $($ and $)$ and the operators that you define.
The parsing is straight-forward and recursive. Zero syntactic-sugar.

I could only stay motivated, and bring this project to a closure, because you can pick a very small subset of LISP, and still do a lot of interesting things.

What I Built

To keep you motivated about reading the article, lets do a demo first and then we can go into details about how I built this.

Here is the GitHub repository for the interpreter, and the app on iTunes (Lambda Lisp). Feel free to file issues / contribute.

If you are still reading: Let’s build an interpreter!

Lexing

Lexing involves finding lexemes, or syntactical tokens, which can then be combined to interpret a grammatical sentence. In the expression $(+$ $1$ $2)$, the tokens are [$($, $+$, $1$, $2$, $)$]. Sophisticated compilers use lex or flex for finding these tokens, handling white-space, attaching a token type to each of them, etc.

I did not want to bloat up my simple interpreter by using lex / flex. I found this nifty one-line bare-bones Lexer in Peter Norvig’s article:

def tokenize(chars):
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

Essentially, what this does is to handle white-space (somewhat). It basically adds spaces around the brackets, and then splits the expression on white-space.

We need to do the replacement for all operators, but otherwise it works well. This is because LISP is simple enough that attaching types to tokens (and error-ing out, if required) can be done at the time of parsing. This is how I did it in Go, just for completeness sake.

func tokenize(exp string) []string {
  return strings.Fields(
    strings.Replace(strings.Replace(exp, "(", " ( ", -1), ")", " ) ", -1),
  )
}

The recurrent theme in the design of this interpreter, is to be lazy and push the harder problems to the next layer.

Constructing an AST

Given an expression, we would need to make sure that the expression follows a structure, or a Grammar. This means two things in our case:

The s-expression should be well formed. The brackets should be balanced, should not be out of order, etc.
Operators such as $+$, $*$, etc. get the right number of operands, have the right type of operands, etc.

At this stage, we are only concerned about well-formedness of the s-expression. We don’t care if the $+$ operator received incompatible operands, for instance. This means that given an expression like $(+$ $1)$, we would mark this expression to be okay at this point, because the expression is well-formed. We will catch the problem of too few operands to $+$, at a later time.

We can start solving the problem of checking well-formedness of the expression by using an Abstract Syntax Tree (or AST). An AST is a way of representing the syntactic structure of code. In this tree, the leaf nodes are atomic values, and all the non-leaf nodes are operators. Recursion can be naturally expressed using an AST.

This is how we can represent a node of this tree in the code:

type ASTNode struct {
  value    string     // Only valid for leaf nodes.
  isValue  bool       // Checks if this is a value (also if children == nil).  
  children []*ASTNode // Children of this AST Node.
}

To actually verify the well-formedness of the expression and build the AST, we would go about it this way:

// This method gets a list of tokens, and returns:
// 1. The ASTNode of the tree so formed.
// 2. Unused tokens in the end of the array.
// 3. Any error while constructing the tree.
// Removed some error handling to make it a little brief.
func buildAST(tokens []string) (*ASTNode, []string, error) {
  var token = ""
  tokensLen := len(tokens)
  // If it is an empty list of tokens, the AST is a nil node
  if tokensLen == 1 {
    token, tokens = pop(tokens)
    if !isValue(token) {
      return nil, tokens, errStr("value", token)
    }

    // Create a one-node AST.
    node := new(ASTNode)
    node.isValue = true
    node.value = token
    node.children = nil
    return node, tokens, nil
  } else {
    token, tokens = pop(tokens)
    // Expect an opening bracket to start.
    if token != openBracket {
      return nil, tokens, errStr(openBracket, token)
    }

    node := new(ASTNode)
    node.isValue = false
    // Create a slice with 0 length initially.
    node.children = make([]*ASTNode, 0)

    tokensLen = len(tokens)
    for len(tokens) != 0 && tokens[0] != closedBracket {
      var childNode *ASTNode = nil
      var err error = nil
      // If we get an opening brace, it means we need to recursively get the
      // AST of the sub-expression from here on.
      if tokens[0] != openBracket {
        token, tokens = pop(tokens)
        childNode, _, err = buildAST([]string{token})
      } else {
        childNode, tokens, err = buildAST(tokens)
      }

      if err != nil {
        return nil, tokens, err
      }
      node.children = append(node.children, childNode)
    }

    // Expect a closing bracket when ending.
    token, tokens = pop(tokens)
    if token != closedBracket {
      return nil, tokens, errStr(token, closedBracket)
    }
    return node, tokens, nil
  }
}

You can see how the grammar for interpreting the s-expression grammar is hard-coded in the code here. We expect the expression to be either a single value, or something like $(\mathrm{operator}\ \mathrm{o_1}\ \mathrm{o_2}\ …\ )$, where $\mathrm{o_i}$ can be an atomic value, or a nested expression.

Note that we construct the AST slightly differently. The operator is also part of the children.

Parsing & Evaluation

We combine the parsing and evaluation of the AST into one stage. The result of evaluating an AST is an Atom, which can either have a Value or an errror.

type Atom struct {
  Err error
  Val Value
}

Here is a stripped down AST evaluation code:

func evalAST(env *LangEnv, node *ASTNode) Atom {
  var retVal Atom
  if node.isValue {
    retVal.Value, retVal.Err = getValue(env, node.value)
    return retVal
  }

  // Assuming that the first child is an operand
  symbol := node.children[0].value
  operator := env.getOperator(symbol)

  // Skipped error handling related to operator & arguments for brevity.
  operands := make([]Atom, 0)
  for i := 1; i < len(node.children); i++ {
    v := evalAST(env, node.children[i])
    if v.Err != nil {
      return v
    }
    operands = append(operands, v)
  }

  v := operator.handler(env, operands)
  if v.Err != nil {
    return v
  }
  retVal.Val = v.Val
  return retVal
}

Basic evaluation is very simple. We have a struct called LangEnv, which is the ‘environment’ data-structure storing amongst other things, defined operators. When evaluating an AST, if it is a single node, the value of the node is the result. Otherwise, we simply lookup the operator in the environment using getOperator, then resolve the operands recursively, and pass the operands to the operator. The operand deals with making sure that the operands are sane.

An operator looks something like this:

type Operator struct {
  symbol           string
  minArgCount      int
  maxArgCount      int
  handler          (func(*LangEnv, []Atom) Atom)
}

As seen, symbol is the name of operator, so for the binary addition it can be “+”. handler is the function which will actually do the heavy lifting we have been avoiding all this while. It takes in a slice of Atoms (and a LangEnv, more on that later) and returns an Atom as a result.

Now, the fun stuff.

Type System

Remember Atom has a Value inside? Value is an interface, and any type which wants to be a Value, needs to implement the following methods.

type Value interface {
  Str() string                 // Returns a string representation of the value.
  getValueType() valueType     // Return the valueType (enum of all Values).
  to(valueType) (Value, error) // Convert to a different Value type.  
  ofType(string) bool          // Check if a given string is of this type.
  newValue(string) Value       // Create a new Value of this type.
}

This is enough power to figure out which value is of which type. In LangEnv we keep a list of builtin Value types, such as intValue, floatValue, stringValue, etc.

To deduce the type of a value, we simply do this:

func getValue(env *LangEnv, token string) (Value, error) {
  types := builtinTypes()
  for _, t := range types {
    if t.ofType(token) {
      return t.newValue(token), nil
    }
  }
  return nil, errors.New(fmt.Sprintf("Could not get type for token: %s", token))
}

Now imagine an expression like $(+$ $1.2$ $3)$.

$1.2$ resolves to floatValue, and $3$ resolves to intValue. How would we implement the handler method for the $+$ operator to add these two different types? You might say, that this would involve casting of $3$ to floatValue. Now how do we decide what values to cast, and what type should we cast them to?

This is how we do it. In the method, typeCoerce, we try to find out which is the right value type to cast all our values to. It is declared like:

func typeCoerce(operatorName string, operands *[]Atom, typePrecendenceMap map[valueType]int)
  (valueType, error)

This is what we do inside typeCoerce:

We get the type : count mapping for all the param values.
If there is only one type, there is nothing to do, return the corresponding type. Every value belongs to the same type.
If there are multiple types in step 1, pick the one with the highest precedence.
Try and cast all operand values to that type. Error out if any of them resists.

Hence, the $+$ operator could be implemented this way:

op :=
  &Operator{
    symbol:      add,
    minArgCount: 2,
    maxArgCount: 100,     // Just an arbit large value.
    handler: func(env *LangEnv, operands []Atom) Atom {
      var retVal Atom
      var finalType valueType
      finalType, retVal.Err = typeCoerce(add, &operands, numValPrecedenceMap)
      if retVal.Err != nil {
        return retVal
      }

      switch finalType {
      case intType:
        var finalVal intValue
        finalVal.value = 0
        for _, o := range operands {
          v, _ := o.Val.(intValue)
          finalVal.value = finalVal.value + v.value
        }
        retVal.Val = finalVal
        break

      case floatType:
        var finalVal floatValue
        finalVal.value = 0
        for _, o := range operands {
          v, _ := o.Val.(floatValue)
          finalVal.value = finalVal.value + v.value
        }
        retVal.Val = finalVal
        break
      }
      return retVal
    },
  }
// Add the operator to the LangEnv's opMap (operator map)
addOperator(env.opMap, op)

Here we basically just call typeCoerce on the operands, and if its possible to cast them to one single type, we do that casting, and actually perform the addition in the new type.

The $+$ operator can be used to add strings as well. However, we don’t want something like $($$+$ $3.14\ \mathrm{“foo”})$. The typeCoerce method can be trivially extended to support a list of type valid type precedence maps, and all operands need to belong to the same precedence map. In this case, it could be { {intType: 1, floatType: 2}, {stringType: 1 } }. This particular list ensures that we don’t add ints and strings for example, because they don’t belong to the same precedence map.

Note that the entire implementation of the operator is defined in the Operator struct’s handler method. Whether or not the operator supports this sort of casting, or decides to roll its own, or not use it at all, is the prerogative of the operator.

Defining Variables

A typical variable definition could look like this: $(\mathrm{defvar}\ \mathrm{x}\ 3.0)$.

Now, defvar is an operator too. It expects the first argument to be of varType (matches the regex of a variable), and the value can be anything (except varType). We just need to check if the type conditions match, and the variable is not a defined operator. If both are okay, we define the variable in our LangEnv’s varMap.

We need to change the part in our evalAST method which to support variable lookup.

func evalAST(env *LangEnv, node *ASTNode) Atom {
  // ...
    for i := 1; i < len(node.children); i++ {
      v := evalAST(env, node.children[i])
      if v.Err != nil {
        return v
      }
      // <!-- Lookup code begins
      if v.Val.getValueType() == varType {
        v.Val, v.Err = getVarValue(env, v.Val)
        if v.Err != nil {
          return v
        }
      }
      // Lookup code ends --!>
      operands = append(operands, v)
    }
  // ...
}

Here we can assume we have a helper method called getVarValue, which looks up the value of a variable in the varMap, or throws an error if required (this is also simple to implement).

Defining Methods

Defining methods is even more fun! We use the defun operator. Consider this:

(defun circle-area (r) (* 3.14 r r))

The first operand is the name of the method, the second is a list of variables that you would pass to the method, and the last is the actual method, assuming that the variables you need are already defined. (circle-area 10) after this definition should return 314.

Calculating the area of a rectangle is pretty much similar.

(defun rect-area (x y) (* x y))

We need a couple of things for function calls to work fine:

Create a new astValue which can be used for keeping ASTs. So far we were keeping ints, floats and so on.
Have a way to tell evalAST to not evaluate the AST in the defun arguments. This is because in circle-area, the (* 3.14 r r) itself is the value (AST value).
The defun operator needs to add an operator to the opMap, with the same name as the method, and define its handler method.
The handler would need to expect the same number of params as specified in the definition.
Up until now, our variables have been global in scope. Assume that there is a (defvar x 3.0) defining the variable x, followed by (defun foo (x) (+ 1 x)) which defines a method uses a param labelled x. The interpreter may look at the varMap and think that the programmer wants to use the global x, which is $3.0$. The actual intention of the programmer is to use the parameter x.

For this to work correctly, we would need:

A new LangEnv to be created, inside the handler.
First copy the same varMap as the parent LangEnv passed to the handler.
Then copy the params passed to the handler. Any duplicates will be overwritten, but all global definitions would be preserved. The variable defined in the inner scope would be visible.
Inside the handler, we will call evalAST to evaluate the AST we were provided in the method definition with the new LangEnv
We also keep track of the recursion depth in LangEnv, and it is incremented every time a recursive call is made. If it exceeds a large value (100000 for now), we can error out, so as to salvage the interpreter at least.

This is the only complicated part of the interpreter. Those interested in the code can check it out here.

Recursion

Apart from making sure that we have some sort of recursion depth limit enforced, recursion does not need any special handling. Except, defining some new operators like cond (the if-else equivalent), which are required for writing something useful.

Here is the implementation of the factorial function:

(defun fact (x)
  (cond                     ; conditional block
    ((= x 0) 1)             ; if x == 0, return 1
    (true                   ; else
      (* x (fact (- x 1)))  ; return x * fact(x - 1)
    )
  )
)

fact(10) returns 3628800 as expected.

Porting to iOS

Once I had the interpreter working fine, I wanted to run this on an iOS app. Why? Just for fun. It turns out with Go 1.5, a new tool called gomobile. Hana Kim gave a talk about this at GopherCon 2015.

What it does is, it compiles your Go package into a static library, and generates Objective-C bindings for it, and wraps them together in a nice iOS friendly .framework package.

There are a few restrictions regarding not being able to return complex types such as structs within structs, but apart from that it was fairly easy to use in my bare-bones app. Here is the code for the app, and we have already seen the demo earlier.

(Thanks to Dhruv Matani for reviewing this blog-post.)

A gentle intro to libevent

15 Dec 2014 - 5 minute read

These past two days in some free time, I decided to explore this nifty C library called libevent.

Following the theme from the previous post, the first question is: ‘Why do we need it?’. If you are familiar with network programming, or any multithreaded programming which involves blocking IO, you already know the problem at hand. Right from the hardware level to the software level, a common problem that happens is: IO is slower than the CPU. If we have several tasks to finish, and the current task being executed is waiting for a blocking IO to finish, we should ideally work on the other tasks and let that blocking IO finish in the background, and check on it later.

When we have several such operations happening in the background, we need a way to figure out when a particular operation (such as read, write, accept a connection), can be performed without blocking, so that we can quickly do that, and return to other things. select(2), poll(2), epoll(4), kqueue(2) (on *BSD systems), are one of the several ways to do it. In essence, you register a set of file descriptors that you are interested in, and then call one of these ‘backends’. They would usually block until either one of the fd-s that you are interested in, is ready for data to be read or written to it. If none of them is ready, it would block for a configured amount of time and then return.

The problem that libevent solves is, it provides an easy to use library for notifying when an event happens on the file descriptors which you consider interesting. It also hides the real backend (select, epoll, kqueue) being used, and this helps you avoid writing platform-dependent code (eg., kqueue works only on *BSD) and if there were a new and improved backend in the future, your code would not change. It is like the JVM for asynchronous event notification system.

I only have experience with select, so my context is limited. Using select is very tedious.

// Essentially a bitmask
fd_set readfds;

while (true) {
  // Zero out the bitmask
  FD_ZERO(&readfds);
  
  // Enable the fd that we are interested in
  FD_SET(sockfd, &readfds);

  int ret = select(sockfd + 1, &readfds, NULL, NULL, &timeout);
  if (ret < 0) {
    fprintf(stderr, "select() returned %d\n", ret);
  }
  
  // Is the bit corresponding to the fd enabled?
  // If yes, that fd is ready to be read from.
  if (FD_ISSET(sockfd, &readfds)) {
    // Do what we want here
  }
}

In essence, what we do here is to create a set of file descriptors (fd_set). We then run a loop where every time we set all the file descriptors we are interested in, into that set. Then we call select(), and it either times out, or one of the bits in that set would be set. We have to check for each of the file descriptors. This makes it a little ugly. Other backends might be more pleasant to use, but libevent is way ahead of select in terms of usability. Here is some sample code:

event_base* eb = event_base_new();
event* t = event_new(eb, sockfd, EV_READ | EV_PERSIST, acceptConnCob, NULL);
event_add(t, NULL);
event_base_dispatch(eb);

An event_base represents a single event loop like the one that we saw in the select example. To subscribe to changes in a file descriptor, we will first create an event. This can be done using event_new, which takes in the event base we created, the file descriptor, flags signalling when the event is active, the callback method and arguments to the callback method. In this particular example, we ask that the acceptConnCob method be called when the file descriptor is ready for reading (EV_READ) and persist this event, i.e, even when the event fires, automatically add it for the next time (EV_PERSIST is to be used here). Note that we had to add the file descriptors in every iteration of the while loop of the select example, so using the EV_PERSIST flag is a slight convenience here. Once, I have created the event, I need to add that event to the event_base it should be processed by, along with a timeout, using the event_add method. If the fd doesn’t become active by the specified time, the callback will be called anyways. Finally, to get things running, we will ‘dispatch’ the event base, which will spawn a new thread to run the event loop.

Note that nowhere have I specified which backend to use. I don’t need to. However, there are ways to prefer or avoid certain backends in libevent, using the event_config struct. Refer to the link in the end.

I can add multiple events, and there are a lot of options that you can use. For instance, I can create a timer by passing -1 as a file descriptor with the EV_TIMEOUT and EV_PERSIST flags, and the required timeout in event_add. This would call the timeout callback every ‘timeout’ seconds. Example:

event* e = event_new(eb, -1, EV_TIMEOUT | EV_PERSIST, pickFortuneCookieCob, NULL);
timeval twoSec = {2, 0};
event_add(e, &twoSec);

I created a simple fortune cookie server (one of my favorite demo examples), where I have a set of messages, and if someone asks me for a fortune cookie, I will give them the current fortune cookie. Every few seconds, I will pick a new fortune cookie to be returned. This is implemented by having two events, one for accepting connections and the other for having a timer. The code is here.

One small thing to keep in mind is that if the callbacks themselves to do something heavy, then it defeats the purpose of using libevent. This is because the callbacks are called from the same thread as the actual event loop. The longer the callback runs, the longer the event loop is not able to process other events.

libevent allows you to do a lot of customizations. In the above example, I have added callbacks to override the logging mechanism, so that I can use glog (Google’s logging library). There are several other features such buffered events and a lot of utility methods (such as creating simple http servers), that you can find in the libevent book.

There are other similar async event notification systems such as libev, and libuv, etc. but I haven’t had the time to figure out the differences. I hope to cover the very interesting wrapper around libevent in folly, Facebook’s open source C++ library, in a future post.

Newer Older