This blog has relocated to https://coolbutuseless.github.io and associated packages are now hosted at https://github.com/coolbutuseless.

29 April 2018

mikefc

In this post, as an example of using minilexer, I’ll parse the Stanford bunny 3D object into an R data structure and display it.

In a prior post, I introduced the minilexer package, and showed some basic uses of the core functions.

In subsequent posts, I used minilexer to write:

Example parser: obj format for 3d objects

A simple text format for storing 3d objects is the Wavefront obj format. The format is well documented on the internet (e.g. 1, 2, 3), and an example octahedron object with 6 vertices and 8 faces is shown below.

octahedron_obj <- '
# OBJ file created by ply_to_obj.c
#
g Object001

v  1  0  0
v  0  -1  0
v  -1  0  0
v  0  1  0
v  0  0  1
v  0  0  -1

f  2  1  5
f  3  2  5
f  4  3  5
f  1  4  5
f  1  2  6
f  2  3  6
f  3  4  6
f  4  1  6
'

The basic structure of a .obj file is:

  • Comments start with # and continue to the end of the line
  • There are symbols at the start of each line telling us what the data on the rest of the line represents, e.g.
    • v means this line defines a vertex and will be followed by 3 numbers representing the x, y, z coordinates.
    • f means this line defines a triangular face and the following 3 numbers indicate the 3 vertices which make up this face
    • vn means this line defines a vector for the direction of the normal at a vertex
  • The format is more complicated than this, and I’m leaving out a lot of details, but this is enough to get the general idea.
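
To make the line structure above concrete, here is a minimal base R sketch that sorts lines by their leading symbol. It is a line-based shortcut for illustration only, not the lexer approach used in the rest of this post, and the names `sample_obj`, `fields`, `kinds` and `vertices` are made up for this example.

```r
# A tiny, made-up obj snippet (a subset of the octahedron)
sample_obj <- '
# a comment
g Object001
v  1  0  0
v  0  -1  0
f  2  1  5
'

# Split into lines, then drop blank lines and comment lines
lines <- trimws(strsplit(sample_obj, '\n')[[1]])
lines <- lines[nzchar(lines) & !startsWith(lines, '#')]

# The first field on each line is the symbol; the rest is its data
fields <- strsplit(lines, '\\s+')
kinds  <- vapply(fields, `[`, character(1), 1)

# Collect the numbers from the 'v' lines into a matrix, one row per vertex
vertices <- do.call(rbind, lapply(fields[kinds == 'v'], function(x) as.numeric(x[-1])))
```

Here `kinds` holds one symbol per remaining line, and `vertices` has one row of x, y, z values per `v` line.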

Use lex() to turn the text into tokens

  1. Start by defining the regular expression patterns for each element in the obj file.
  2. Use minilexer::lex() to turn the obj text into tokens
  3. Throw away whitespace, newlines and comments, since I’m not interested in them.
obj_patterns <- c(
  comment    = '(#.*?)\n',  # assume comments take up the whole line
  number     = pattern_number,  # This regex is defined in `minilexer` and matches most numeric values
  symbol     = '\\w+',
  newline    = '\n',
  whitespace = '\\s+'
)
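
To give a feel for what a regex-driven lexer does with a named pattern vector like this, here is a rough base R sketch. It is not how `minilexer::lex()` is implemented. `simple_lex()`, `sketch_patterns` and the number regex are hypothetical stand-ins, since `pattern_number` lives inside the package.

```r
# Hypothetical stand-in patterns; the number regex is an assumption, not minilexer's
sketch_patterns <- c(
  comment    = '(#.*?)\n',
  number     = '[-+]?\\d+(?:\\.\\d+)?',
  symbol     = '\\w+',
  newline    = '\n',
  whitespace = '\\s+'
)

# Hypothetical lexer: returns a character vector of tokens named by pattern
simple_lex <- function(text, patterns) {
  # Join all patterns with alternation; earlier patterns win at each position
  combined <- paste0('(', patterns, ')', collapse = '|')
  values   <- regmatches(text, gregexpr(combined, text, perl = TRUE))[[1]]

  # Label each token with the first pattern that matches it in full
  types <- vapply(values, function(value) {
    hits <- vapply(patterns, function(p) {
      grepl(paste0('^(?:', p, ')$'), value, perl = TRUE)
    }, logical(1))
    names(patterns)[which(hits)[1]]
  }, character(1), USE.NAMES = FALSE)

  setNames(values, types)
}

sketch_tokens <- simple_lex('# note\nv 1 0 -1\n', sketch_patterns)
sketch_tokens <- sketch_tokens[!(names(sketch_tokens) %in% c('whitespace', 'newline', 'comment'))]
```

Pattern order matters in this sketch: `number` is listed before `symbol` so that a bare digit is labelled as a number even though `\\w+` would also match it.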

Tokenising the obj

Split the obj text data into tokens, but then remove anything that we don’t need to create the actual data structure representing the 3d object.

tokens <- lex(octahedron_obj, obj_patterns)
tokens <- tokens[!(names(tokens) %in% c('whitespace', 'newline', 'comment'))]
tokens
##      symbol      symbol      symbol      number      number      number 
##         "g" "Object001"         "v"         "1"         "0"         "0" 
##      symbol      number      number      number      symbol      number 
##         "v"         "0"        "-1"         "0"         "v"        "-1" 
##      number      number      symbol      number      number      number 
##         "0"         "0"         "v"         "0"         "1"         "0" 
##      symbol      number      number      number      symbol      number 
##         "v"         "0"         "0"         "1"         "v"         "0" 
##      number      number      symbol      number      number      number 
##         "0"        "-1"         "f"         "2"         "1"         "5" 
##      symbol      number      number      number      symbol      number 
##         "f"         "3"         "2"         "5"         "f"         "4" 
##      number      number      symbol      number      number      number 
##         "3"         "5"         "f"         "1"         "4"         "5" 
##      symbol      number      number      number      symbol      number 
##         "f"         "1"         "2"         "6"         "f"         "2" 
##      number      number      symbol      number      number      number 
##         "3"         "6"         "f"         "3"         "4"         "6" 
##      symbol      number      number      number 
##         "f"         "4"         "1"         "6"

Use a TokenStream to help turn the tokens into data

Initialise a TokenStream object to help us manipulate/interrogate the list of tokens we have.

stream <- TokenStream$new(tokens)
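
Conceptually, a token stream is just the token vector plus a cursor. The sketch below shows that idea with a base R closure. `make_stream()` is a hypothetical helper written for this illustration, not the real `TokenStream` class, and its methods only loosely mirror the ones used in this post.

```r
# Hypothetical, simplified token stream built from a closure
make_stream <- function(tokens) {
  position <- 1L

  # Value of the token under the cursor, or NA once we run off the end
  current_value <- function() {
    if (position > length(tokens)) NA_character_ else unname(tokens[position])
  }

  # Type (name) of the token under the cursor
  current_type <- function() {
    if (position > length(tokens)) NA_character_ else names(tokens)[position]
  }

  # Optionally check the type/value, then return the token and advance the cursor
  consume_token <- function(type = NULL, value = NULL) {
    if (!is.null(type) && !identical(current_type(), type)) {
      stop("Expected type '", type, "' at position ", position)
    }
    if (!is.null(value) && !identical(current_value(), value)) {
      stop("Expected value '", value, "' at position ", position)
    }
    result   <- current_value()
    position <<- position + 1L
    result
  }

  list(current_value = current_value, current_type = current_type,
       consume_token = consume_token)
}

demo_stream <- make_stream(c(symbol = 'f', number = '2'))
```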

Write a function to parse the lines which start with f

The lines which start with f encode a single triangular face. The numbers which follow the f are the indices of the vertices which make up the face.

To parse the lines which start with f:

  • make sure the current token is f
  • keep consuming tokens as long as they are numbers
  • when we run out of numbers, we consider this object parsed and return the data, in this case a numeric vector.
parse_f <- function() {
  
  # make sure the current token is `f`
  stream$consume_token('symbol', 'f')
  
  # keep consuming tokens as long as they are `numbers`
  values <- stream$consume_tokens_of_type('number', c(3, 4))
  
  # when we run out of numbers, we consider this object parsed and 
  # return the data, in this case a numeric vector.
  as.numeric(values)
}
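
The "keep consuming while the tokens are numbers" step can be sketched in base R as below. `take_numbers()` is a hypothetical helper, and it assumes that the `c(3, 4)` argument above means "accept a run of either 3 or 4 numbers". That reading is a guess based on the comments in this post, not something confirmed against the package.

```r
# Hypothetical helper: read a run of 'number' tokens starting at `position`
take_numbers <- function(tokens, position, allowed = c(3, 4)) {
  # Advance `end` past every consecutive token whose type is 'number'
  end <- position
  while (end <= length(tokens) && names(tokens)[end] == 'number') {
    end <- end + 1L
  }

  # Refuse runs whose length is not one of the allowed counts
  count <- end - position
  if (!(count %in% allowed)) {
    stop("Expected ", paste(allowed, collapse = ' or '), " numbers, found ", count)
  }

  list(
    values        = as.numeric(tokens[seq(position, length.out = count)]),
    next_position = end
  )
}

demo_tokens <- c(symbol = 'f', number = '2', number = '1', number = '5', symbol = 'f')
demo_face   <- take_numbers(demo_tokens, 2L)
```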

Write functions to parse the lines which start with g, v and vn

#-----------------------------------------------------------------------------
# Parse the 'group name' specification
#-----------------------------------------------------------------------------
parse_g <- function() {
  stream$consume_token('symbol', 'g')
  stream$consume_token()
}


#-----------------------------------------------------------------------------
# Parse the coordinates for a vertex. This may be 3 or 4 values, but 
# I'm just ignoring the 4th, and keeping the (x, y, z) 
#-----------------------------------------------------------------------------
parse_v <- function() {
  start_position <- stream$position
  stream$consume_token('symbol', 'v')
  values <- stream$consume_tokens_of_type('number', c(3, 4))
  as.numeric(values[1:3])
}


#-----------------------------------------------------------------------------
# Parse the vector that represents the normal at a vertex. 
# This may be 3 or 4 values, but I'm just ignoring the 4th, and keeping the (x, y, z) 
#-----------------------------------------------------------------------------
parse_vn <- function() {
  start_position <- stream$position
  stream$consume_token('symbol', 'vn')
  values <- stream$consume_tokens_of_type('number', c(3, 4))
  as.numeric(values[1:3])
}

Write a top-level function containing a parse loop to keep extracting objects until we’re done

  • Check the current token
  • Call the parser for that token
  • Repeat
parse_obj <- function() {
  obj   <- list()  # This is where we'll hold the parsed data.
  
  while (!is.na(stream$current_value())) {
    cv <- stream$current_value()
    if (cv == 'g') {
      parse_g()
    } else if (cv == 'v') {
      v <- parse_v()
      obj$v <- rbind(obj$v, v)
    } else if (cv == 'f') {
      f <- parse_f()
      obj$f <- rbind(obj$f, f)
    } else if (cv == 'vn') {
      vn <- parse_vn()
      obj$vn <- rbind(obj$vn, vn)
    } else {
      message <- glue("Parse error at position {stream$position}. Not understood: {stream$current_value()}")
      stop(message)
    }
  }
  
  obj
}

obj <- parse_obj()
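
One thing to keep in mind for larger files: calling rbind() inside a loop copies the growing matrix on every iteration. A common alternative, sketched here in base R with made-up input lines rather than the token stream, is to collect the rows in a list and bind them once at the end. The names `vertex_lines`, `vertex_rows` and `vertex_matrix` are hypothetical.

```r
# Made-up vertex data standing in for the output of repeated parse_v() calls
vertex_lines <- c('1 0 0', '0 -1 0', '-1 0 0')

# Append each parsed row to a list instead of rbind()-ing as we go
vertex_rows <- list()
for (line in vertex_lines) {
  vertex_rows[[length(vertex_rows) + 1]] <- as.numeric(strsplit(line, ' +')[[1]])
}

# Bind all rows into a matrix in a single step
vertex_matrix <- do.call(rbind, vertex_rows)
```

For an object as small as the octahedron the difference is unlikely to matter.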

The 3d object now exists as a list of matrices (one for vertices and one for faces)

obj
## $v
##   [,1] [,2] [,3]
## v    1    0    0
## v    0   -1    0
## v   -1    0    0
## v    0    1    0
## v    0    0    1
## v    0    0   -1
## 
## $f
##   [,1] [,2] [,3]
## f    2    1    5
## f    3    2    5
## f    4    3    5
## f    1    4    5
## f    1    2    6
## f    2    3    6
## f    3    4    6
## f    4    1    6

Post processing the data: Fortify/denormalise/tidy.

The text representation of the obj data is quite compact and avoids repetition, but it isn’t quite in the right form for us to manipulate in R.

The following code turns this data into the faces data.frame, which is slightly more useful: each face has an actual ID, and the x, y and z co-ordinates of its 3 vertices (a, b, c) are explicitly listed, one vertex per row, i.e. we’ve created a tidy data.frame!

#-----------------------------------------------------------------------------
# Fortify/denormalise/tidy the `f` and `v` data into `faces`
#-----------------------------------------------------------------------------
create_faces <- function(obj) {
  suppressWarnings({
    faces <- data.frame(obj$f) %>%
      set_names(c('a', 'b', 'c')) %>%
      mutate(face_id = seq(n())) %>%
      gather(idx, vert_id, -face_id) %>%
      arrange(face_id, idx) %>%
      as.tbl()
    
    verts <- data.frame(obj$v) %>%
      set_names(c('x', 'y', 'z')) %>%
      mutate(vert_id = seq(n())) %>%
      as.tbl()
  })
  
  faces %<>% left_join(verts, by='vert_id')
  
  faces
}

faces <- create_faces(obj)
faces %>% knitr::kable(caption='tidy faces data.structure')

Table 1: tidy faces data.structure

face_id  idx  vert_id   x   y   z
      1  a          2   0  -1   0
      1  b          1   1   0   0
      1  c          5   0   0   1
      2  a          3  -1   0   0
      2  b          2   0  -1   0
      2  c          5   0   0   1
      3  a          4   0   1   0
      3  b          3  -1   0   0
      3  c          5   0   0   1
      4  a          1   1   0   0
      4  b          4   0   1   0
      4  c          5   0   0   1
      5  a          1   1   0   0
      5  b          2   0  -1   0
      5  c          6   0   0  -1
      6  a          2   0  -1   0
      6  b          3  -1   0   0
      6  c          6   0   0  -1
      7  a          3  -1   0   0
      7  b          4   0   1   0
      7  c          6   0   0  -1
      8  a          4   0   1   0
      8  b          1   1   0   0
      8  c          6   0   0  -1
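
For comparison, the same denormalisation can be sketched in base R by using the face matrix to index straight into the vertex matrix. This is an alternative illustration rather than the code used in this post, and `v_mat`, `f_mat` and `faces_base` are hypothetical names.

```r
# The octahedron's vertices and faces as plain matrices
v_mat <- rbind(c(1, 0, 0), c(0, -1, 0), c(-1, 0, 0),
               c(0, 1, 0), c(0, 0, 1),  c(0, 0, -1))
f_mat <- rbind(c(2, 1, 5), c(3, 2, 5), c(4, 3, 5), c(1, 4, 5),
               c(1, 2, 6), c(2, 3, 6), c(3, 4, 6), c(4, 1, 6))

# Flatten row by row: face 1's a, b, c, then face 2's a, b, c, ...
vert_id <- as.vector(t(f_mat))

# Look up each vertex's coordinates by its index
faces_base <- data.frame(
  face_id = rep(seq_len(nrow(f_mat)), each = ncol(f_mat)),
  idx     = rep(c('a', 'b', 'c'), times = nrow(f_mat)),
  vert_id = vert_id,
  x       = v_mat[vert_id, 1],
  y       = v_mat[vert_id, 2],
  z       = v_mat[vert_id, 3]
)
```

Each run of three consecutive rows describes one triangle, which is the layout the plotting step relies on.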

Let’s view the object!

Use your mouse to rotate and zoom the object.

rgl::view3d(theta = 10, phi = 15)
rgl::triangles3d(faces$x, faces$y, faces$z, col='grey')


Bunny!

The exact same code was used to parse a much more interesting object: bunny.obj.

Use your mouse to rotate and zoom the object.

obj_file <- '../../data/obj/bunny.obj'
obj_text <- readLines(obj_file) %>% paste(collapse="\n")
tokens   <- lex(obj_text, obj_patterns)
tokens   <- tokens[!(names(tokens) %in% c('whitespace', 'newline', 'comment'))]
stream   <- TokenStream$new(tokens)
obj      <- parse_obj()
faces    <- create_faces(obj)