Download

Visit our Downloads page to get tagged versions of Sirius-suite [1] or clone the latest from the repo (instructions below).

Set up Sirius-suite

The Sirius-suite is a collection of algorithmic components (kernels) extracted from the end-to-end Sirius system. Every kernel comes with a baseline version, a pthread version, a GPU version (except CRF and Regex, which were not ported to the GPU), and an input set.

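Each kernel is part of a specific service from Sirius:

Automatic Speech Recognition: GMM Scoring, DNN Scoring
Image Matching: Feature Extraction, Feature Description
Question-Answering (QA): Regular Expression, Stemmer, Conditional Random Fields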

Prerequisites

Sirius-suite depends on the following libraries:

Building Caffe

Caffe is under active development, and some of the latest changes may break downstream projects. We therefore provide a snapshot version on our Downloads page that is verified to build with Sirius-suite. Caffe and Sirius-suite have some dependencies, for which we provide installation scripts:

$ tar xzf sirius-caffe.tar.gz
$ cd sirius-caffe
$ sudo ./get-opencv.sh
$ sudo ./get-libs.sh
$ sudo ./make-and-install.sh

In Makefile.config, make sure BLAS := open is set so that Caffe links against OpenBLAS. If you have a GPU, also set the GPU-related flags in the same file accordingly.

Building the kernels

After Caffe and OpenCV are built and installed correctly, running make from the top directory of sirius-suite/ builds the kernels for all supported platforms. Alternatively, make test builds and runs each kernel and checks its output against the baseline.

$ tar xzf sirius-suite-1.1.tar.gz
$ cd sirius-suite
$ ./prepare.sh
$ make test

The kernels produce JSON-formatted output consisting of the kernel name, the input data sizes, and the timing of each stage of the kernel. The scripts/ folder includes Python scripts that run the kernels multiple times and parse the resulting output.
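As a rough illustration of how such output could be post-processed, the sketch below averages stage timings over several runs. The key names "kernel" and "timing" and the sample records are assumptions made for this example, not the suite's actual schema; check the real kernel output and the scripts/ folder before relying on them.

```python
import json
from collections import defaultdict
from statistics import mean


def average_timings(json_lines):
    """Average every timing field across runs, grouped by kernel name."""
    samples = defaultdict(lambda: defaultdict(list))
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between runs
        record = json.loads(line)
        kernel = record["kernel"]  # hypothetical key name
        for stage, value in record["timing"].items():  # hypothetical key name
            samples[kernel][stage].append(float(value))
    return {
        kernel: {stage: mean(values) for stage, values in stages.items()}
        for kernel, stages in samples.items()
    }


if __name__ == "__main__":
    # Made-up records standing in for two runs of one kernel.
    runs = [
        '{"kernel": "gmm", "input_size": 100, "timing": {"load": 2.0, "score": 10.0}}',
        '{"kernel": "gmm", "input_size": 100, "timing": {"load": 4.0, "score": 14.0}}',
    ]
    print(average_timings(runs))
```

Grouping by kernel name first keeps results from different kernels separate when several outputs are concatenated into one file.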

Automatic Speech Recognition

Gaussian Mixture Model (GMM) Scoring

This is a Gaussian Mixture Model (GMM) kernel, commonly used in ASR applications to score Hidden Markov Model (HMM) state transitions. The kernel is extracted from PocketSphinx, the embedded C version of Sphinx. The input file includes the acoustic model and one set of HMM state transitions for the GMM to score.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./gmm_scoring ../input/gmm_data.txt

The kernel does the following:

  1. Reads the acoustic model features,
  2. Reads the states that need to be scored,
  3. Scores each HMM state transition.
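The scoring step can be pictured with the following minimal sketch, which evaluates the log-likelihood of one feature vector under a diagonal-covariance GMM. It is a textbook formulation written for illustration, assuming one mixture per state; it is not the PocketSphinx code, and the function names and data layout are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_scores = []
    for w, mu, var in zip(weights, means, variances):
        log_density = 0.0
        for x_d, mu_d, var_d in zip(x, mu, var):
            # Log of a 1-D Gaussian density, accumulated over dimensions.
            log_density -= 0.5 * (math.log(2.0 * math.pi * var_d)
                                  + (x_d - mu_d) ** 2 / var_d)
        component_scores.append(math.log(w) + log_density)
    # Log-sum-exp keeps the sum over components numerically stable.
    top = max(component_scores)
    return top + math.log(sum(math.exp(s - top) for s in component_scores))


def score_states(x, states):
    """Score one feature vector against every state's mixture.

    Each state is a (weights, means, variances) tuple (assumed layout).
    """
    return [gmm_log_likelihood(x, *state) for state in states]
```

Each state is scored independently of the others, which is what makes this step a natural candidate for the pthread and GPU versions.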

Deep Neural Network (DNN) Scoring

This is a DNN-based Automatic Speech Recognition (ASR) kernel that executes one forward pass. The kernel takes voice feature vectors as input and generates probabilities as output. The kernel uses Caffe for the DNN forward pass. Make sure Caffe is linked against OpenBLAS to run the multithreaded version of this kernel.

Directory structure

./model/ contains the network configuration file and pre-trained model file.
./input/ contains an input file of features and the corresponding expected output file. The included input is a sentence of 548 feature vectors, each consisting of 440 floating-point numbers.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./dnn_asr ../model/asr.prototxt ../model/asr.caffemodel ../input/features.in

The kernel does the following:

  1. Initializes the network from the model configuration and the pretrained weights (Kaldi’s network implemented in Caffe),
  2. Loads the feature input,
  3. Executes a DNN forward pass.
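To make the forward pass concrete, here is a small pure-Python sketch of a fully connected network that turns one feature vector into output probabilities. The kernel itself delegates this work to Caffe; the sigmoid hidden activation, the layer layout, and all function names below are assumptions chosen for illustration and do not describe the actual Kaldi network.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    # Subtracting the maximum avoids overflow in exp().
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, layers):
    """Run one forward pass; layers is a list of (weights, biases) pairs."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = sigmoid(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))
```

For the included input, such a pass would run once per feature vector, that is, 548 times over vectors of 440 values each.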

Image Matching

Feature Extraction

The Feature Extraction (fe) kernel, the first step of the image matching pipeline in Sirius, extracts interesting keypoints from the input image. The baseline and GPU versions use OpenCV’s SURF implementations. The pthread version tiles the image, and each thread is responsible for one or more tiles.

Directory structure

./input/ contains images of various sizes. When using more threads, consider using a larger image.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./surf-fe ../input/2048x2048.jpg

The kernel does the following:

  1. (pthread) The image is tiled,
  2. SURF FE generates keypoints.
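The tiling step can be sketched as follows. This is only an illustration of splitting an image into rectangles and handing them out to threads; the grid shape, the round-robin assignment, and the function names are assumptions, and the actual kernel may partition the image differently.

```python
def tile_image(width, height, tiles_x, tiles_y):
    """Split a width x height image into tiles_x * tiles_y rectangles.

    Each tile is returned as (x, y, tile_width, tile_height).
    """
    tiles = []
    for row in range(tiles_y):
        y0 = row * height // tiles_y
        y1 = (row + 1) * height // tiles_y
        for col in range(tiles_x):
            x0 = col * width // tiles_x
            x1 = (col + 1) * width // tiles_x
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles


def assign_tiles(tiles, num_threads):
    """Round-robin assignment, so a thread may receive several tiles."""
    return [tiles[t::num_threads] for t in range(num_threads)]


if __name__ == "__main__":
    tiles = tile_image(2048, 2048, 4, 4)
    for thread_id, work in enumerate(assign_tiles(tiles, 3)):
        print(thread_id, len(work))
```

Keypoints found inside a tile would normally be reported relative to that tile, so a tiled scheme typically adds the tile origin back to each keypoint position. Smaller tiles also leave less room for features near tile borders, which is one reason a larger image suits higher thread counts.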

Feature Description

The Feature Description (fd) kernel, the second step of the image matching pipeline in Sirius, receives image keypoints, which are clustered into robust descriptors representing interesting areas of the image. The baseline and GPU versions use OpenCV’s SURF implementations. The pthread version tiles the image, and each thread is responsible for one or more tiles.

Directory structure

./input/ contains images of various sizes. When using more threads, consider using a larger image.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./surf-fd ../input/2048x2048.jpg

The kernel does the following:

  1. (pthread) The image is tiled,
  2. SURF FE generates keypoints,
  3. SURF FD generates feature descriptors.

Question-Answering (QA)

Regular Expression

This is a regular expression kernel reflective of OpenEphyra’s question-answering system. The kernel uses SLRE for the regular expression matching.

Directory structure

./input/ contains a list of patterns and questions to match, taken from OpenEphyra’s question-answering system. There is a list of 100 patterns and two question files, one with 200 sentences and one with 300.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./regex_slre ../input/list ../input/questions

The kernel does the following:

  1. Reads and compiles the list of regular expressions,
  2. Reads in the list of questions,
  3. Applies each regexp to each sentence.
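The structure of these steps is a simple nested loop, sketched below. Python's standard re module stands in for SLRE here, so pattern syntax and matching behavior may differ from the real kernel; the function name and the sample data are hypothetical.

```python
import re


def count_matches(patterns, sentences):
    """Compile every pattern once, then test it against every sentence."""
    compiled = [re.compile(p) for p in patterns]
    hits = 0
    for regex in compiled:
        for sentence in sentences:
            if regex.search(sentence):
                hits += 1
    return hits


if __name__ == "__main__":
    # Made-up patterns and questions, not the suite's input files.
    patterns = [r"^What", r"\bwho\b"]
    sentences = ["What is Sirius?", "Tell me who wrote it", "Nothing here"]
    print(count_matches(patterns, sentences))
```

Compiling outside the inner loop matters because every pattern is reused for every sentence, so 100 patterns against 200 sentences means 20,000 match attempts but only 100 compilations.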

Stemmer

This is a word stemming kernel used in OpenEphyra’s question-answering system. The stemming algorithm attempts to extract the root of each word by matching common word endings. For example, adaptability becomes adapt. The stemming kernel is based on the original Porter stemming algorithm.

Directory structure

./input/ contains the original list of 29,401 words that can be stemmed. The larger input files are copies of this original file.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./stem_porter ../input/voc.txt

The kernel does the following:

  1. Reads in the list of words,
  2. Matches each word against common word endings and stems it in six successive steps.
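The idea of matching word endings can be shown with a deliberately simplified sketch. It applies a single pass of a few suffix rules with a minimum stem length; it is not the kernel's algorithm, which chains several steps with extra conditions on the remaining stem, and the rule table and function name are assumptions made for this example.

```python
# (suffix, replacement) pairs, checked in order; longer suffixes come first.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for word in ["caresses", "ponies", "walking", "jumped", "cats", "is"]:
        print(word, "->", toy_stem(word))
```

Because each word is stemmed independently, a word list like the one in ./input/ can be split evenly across threads.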

Conditional Random Fields

The Conditional Random Fields (CRF) algorithm assigns each word in OpenEphyra’s question-answering service a part-of-speech (POS) tag, which is influenced by the neighboring words. The implementation uses the LAPOS tagger.

Directory structure

./input/ contains a pretrained model on the WSJ corpus and test input data.

Running the kernel

  1. Build the kernel using make or make test
  2. Execute the kernel:
$ ./crf_tag ../input/model.la ../input/test-input.txt

The kernel does the following:

  1. Initializes an instance of the tagger and reads the pretrained model,
  2. Tags each word with a part-of-speech.
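To show how neighboring words can influence a tag, the sketch below decodes the best tag sequence for a linear-chain model with the Viterbi algorithm. This is a generic illustration rather than the LAPOS tagger's own procedure, and the score tables, tag names, and function name are made up for the example.

```python
def viterbi(words, tags, emission, transition):
    """Return the highest-scoring tag sequence for words.

    emission[(word, tag)] and transition[(prev_tag, tag)] are additive
    scores; missing entries count as 0.0.
    """
    if not words:
        return []
    best = [{t: emission.get((words[0], t), 0.0) for t in tags}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Best previous tag given that the current word gets tag t.
            prev = max(tags,
                       key=lambda p: best[-1][p] + transition.get((p, t), 0.0))
            scores[t] = (best[-1][prev] + transition.get((prev, t), 0.0)
                         + emission.get((word, t), 0.0))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]
```

In the test data one might use, a word such as "dog" can score equally as a noun or a verb on its own; the transition scores from the preceding determiner are what settle the choice.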

Citing Sirius-suite

If you use Sirius-suite in your research, please cite the official publication [1].

[1] Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski, Arjun Khurana, Ron Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, and Jason Mars. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ASPLOS ’15, New York, NY, USA, 2015. ACM. Acceptance Rate: 17%
[Bibtex]
@inproceedings{hauswald15asplos,
author = {Hauswald, Johann and Laurenzano, Michael A. and Zhang, Yunqi and Li, Cheng and Rovinski, Austin and Khurana, Arjun and Dreslinski, Ron and Mudge, Trevor and Petrucci, Vinicius and Tang, Lingjia and Mars, Jason},
title = {Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers},
booktitle = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
series = {ASPLOS '15},
year = {2015},
numpages = {13},
publisher = {ACM},
address = {New York, NY, USA},
note = {Acceptance Rate: 17\%},
}