Sirius [1] is an open-source end-to-end standalone intelligent personal assistant (IPA) service. Sirius receives queries in the form of speech or images and returns results in the form of natural language. Sirius implements the core functionalities of an IPA including speech recognition, image matching, natural language processing and a question-and-answer system.

Sirius Intelligent Personal Assistant

Download

Visit our Downloads page to get the latest version of Sirius [1].

Install

Sirius [1] has been tested on Ubuntu 12.04 and 14.04. The paths in the following instructions are relative to the sirius-application/ directory.

Prerequisites

Sirius (and Sirius-suite) [1] has several dependencies, which can be installed with the included get-dependencies.sh script. The full list of dependencies is summarized below:

Setting up Sirius

$ tar xzf sirius-1.0.1.tar.gz
$ cd sirius/sirius-application
$ tar xzf question-answer.tar.gz
$ sudo ./get-dependencies.sh
$ sudo ./get-opencv.sh
$ ./get-kaldi.sh
$ ./compile-sirius-servers.sh

Once all dependencies have been resolved, compile-sirius-servers.sh compiles all the Sirius servers.
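The setup steps above can also be chained in a small script that stops at the first failing step. This is only a sketch: run_step, setup_sirius, and the DRY_RUN switch are hypothetical names introduced here, not part of the Sirius distribution.

```shell
#!/bin/sh
# Sketch: run the setup commands in order and stop at the first failure.
# DRY_RUN is a hypothetical switch: DRY_RUN=1 prints each command without running it.

run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

setup_sirius() {
    # Each step runs only if the previous one succeeded.
    run_step tar xzf sirius-1.0.1.tar.gz &&
    run_step cd sirius/sirius-application &&
    run_step tar xzf question-answer.tar.gz &&
    run_step sudo ./get-dependencies.sh &&
    run_step sudo ./get-opencv.sh &&
    run_step ./get-kaldi.sh &&
    run_step ./compile-sirius-servers.sh
}
```

Calling setup_sirius with DRY_RUN=1 set prints the plan without touching the system, which can be a convenient check before the sudo steps run.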

The scripts mentioned in the following sections are located in the run-scripts/ directory unless otherwise noted.

Automatic Speech Recognition (ASR)

Sirius supports three Automatic Speech Recognition backends: Kaldi (DNN/HMM based), and Pocketsphinx and Sphinx4 (both GMM/HMM based).

To open an ASR server:

$ ./start-asr-server.sh
or
$ ./start-asr-server.sh pocketsphinx

or specify an ASR backend, hostname, and port:

$ ./start-asr-server.sh pocketsphinx localhost 8080

In a separate terminal, to test ASR:

$ ./sirius-asr-test.sh ../inputs/questions/what.is.the.speed.of.light.wav
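To exercise the ASR server on more than one recording, the test command can be wrapped in a loop. The sketch below uses a hypothetical helper, asr_test_all, and assumes that the test script exits with a non-zero status on failure, which the instructions above do not confirm.

```shell
#!/bin/sh
# Sketch: run an ASR test command on every .wav file in a directory.
# asr_test_all is a hypothetical helper; it is not one of the Sirius scripts.
# $1 = directory of recordings, remaining arguments = command to run per file.

asr_test_all() {
    dir=$1
    shift
    failed=0
    for wav in "$dir"/*.wav; do
        [ -e "$wav" ] || continue      # the directory contains no .wav files
        echo "== $wav"
        "$@" "$wav" || failed=$((failed + 1))
    done
    echo "failed: $failed"
    [ "$failed" -eq 0 ]                # non-zero status if any file failed
}
```

From run-scripts/, with an ASR server already running, a call might look like: asr_test_all ../inputs/questions ./sirius-asr-test.sh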

Image Matching (IMM)

Image Matching uses SURF to match query images to a stored database.

In image-matching/, first build a database of descriptors, which is stored in protobuf format. The arguments are the name of the database and the directory containing the images:

$ ./make-db.py landmarks matching/landmarks/db/

To change the database used by the IMM service, change the name in start-imm-server.py.

In run-scripts/, open the IMM server:

$ ./start-imm-server.sh

In a separate terminal, test IMM using:

$ ./sirius-imm-test.sh ../image-matching/matching/landmarks/query/query.jpg

Question-Answering System (QA)

The Question-Answering system uses OpenEphyra and a Wikipedia database stored in Lemur’s Indri format.

Download and extract the Wikipedia database (after untarring and building question-answer):

$ wget http://web.eecs.umich.edu/~jahausw/download/wiki_indri_index.tar.gz
$ tar xzvf wiki_indri_index.tar.gz -C question-answer/
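Because the index is a separate download, it may be worth skipping the download when it has already been extracted. The following is a minimal sketch, assuming a hypothetical helper fetch_index and a caller-supplied marker path; the name of the directory that the archive actually creates is not stated here, so the marker is something you would need to check yourself.

```shell
#!/bin/sh
# Sketch: download and extract the index only when it is not already present.
# fetch_index is a hypothetical helper, not part of Sirius.

fetch_index() {
    url=$1       # archive URL
    dest=$2      # directory to extract into, e.g. question-answer/
    marker=$3    # assumed path that exists once the archive has been extracted
    if [ -e "$marker" ]; then
        echo "index already present: $marker"
        return 0
    fi
    archive=$(basename "$url")
    # wget -c resumes a partially downloaded archive instead of starting over.
    wget -c "$url" && tar xzf "$archive" -C "$dest"
}
```

The first two arguments would be the URL and the question-answer/ directory from the commands above; the third is whichever path you find inside question-answer/ after a first successful extraction.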

In run-scripts/, open the QA server:

$ ./start-qa-server.sh

In a separate terminal, test QA using:

$ ./sirius-qa-test.sh "what is the speed of light"

Combining Services

Sirius can combine ASR and QA to form the full intelligent personal assistant pipeline. After opening the required servers using ./start-<service>-server.sh, test an ASR-QA query using:

$ ./sirius-asr-qa-test.sh ../inputs/real/what.is.the.capital.of.italy.wav
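Instead of using separate terminals, the servers can be started in the background from one shell and stopped together afterwards. The sketch below uses two hypothetical helpers, start_bg and stop_all. It assumes that the default ports of the servers do not conflict and that killing a start script also stops the server it launched; neither is confirmed by the instructions above.

```shell
#!/bin/sh
# Sketch: start several servers in the background and stop them together.
# start_bg and stop_all are hypothetical helpers, not part of Sirius.

PIDS=""

start_bg() {
    "$@" &                 # run the given command in the background
    PIDS="$PIDS $!"        # remember its process ID
}

stop_all() {
    for pid in $PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true   # reap the process, ignore its status
    done
    PIDS=""
}

# Usage, from run-scripts/:
#   trap stop_all EXIT
#   start_bg ./start-asr-server.sh
#   start_bg ./start-qa-server.sh
#   sleep 10    # crude wait for startup; the delay is a guess
#   ./sirius-asr-qa-test.sh ../inputs/real/what.is.the.capital.of.italy.wav
```

If a start script spawns a child process that outlives it, stop_all will not reach that child, and the server would have to be stopped by other means.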

Contact

We would love to have your help in improving Sirius. For questions and comments, post to sirius-users.

Citing Sirius

If you use Sirius in your research, please cite the official publication [1].

[1] Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski, Arjun Khurana, Ron Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, and Jason Mars. Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ASPLOS ’15, New York, NY, USA, 2015. ACM. Acceptance Rate: 17%
BibTeX:
@inproceedings{hauswald15asplos,
author = {Hauswald, Johann and Laurenzano, Michael A. and Zhang, Yunqi and Li, Cheng and Rovinski, Austin and Khurana, Arjun and Dreslinski, Ron and Mudge, Trevor and Petrucci, Vinicius and Tang, Lingjia and Mars, Jason},
title = {Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers},
booktitle = {Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
series = {ASPLOS '15},
year = {2015},
numpages = {13},
publisher = {ACM},
address = {New York, NY, USA},
note = {Acceptance Rate: 17\%},
}