Paul Butler


    • Two Sigma

    • New York, NY
    • October, 2016 to present
    • Quantitative Analyst
    • I work with fundamentals-related data to create stock forecasts.

    • Google

    • New York, NY
    • October, 2014 to October, 2016
    • Software Engineer
    • Developed features for Search based on the Knowledge Graph, and the infrastructure that supports them.

    • Chango

    • Toronto, ON
    • July, 2013 to July, 2014
    • Data Scientist
    • Built tools for detecting botnets and ad fraud hidden in terabytes of real-time ad market data.

    • Bit Aesthetics

    • Toronto, ON
    • November, 2012 to July, 2013
    • Data Hacker
    • Consulted on data projects under the name Bit Aesthetics.


    • New York City, NY
    • August, 2011 to November, 2012
    • Data Scientist and Product Developer
    • Built and maintained a system to automatically optimize bids in ad placement auctions. Began as an intern and continued part-time remotely.

    • Facebook

    • Palo Alto, CA
    • September, 2010 to December, 2010
    • Software Engineering Intern
    • Worked on internal data infrastructure projects as well as open-source projects in the Hadoop family. Also worked on self-initiated visualization projects, including “Visualizing Facebook Friends”.


    • Stanford University

    • April, 2018 to June, 2018
    • SCPD (off-campus)
    • CS231n: Convolutional Neural Networks for Visual Recognition

    • Stanford University

    • September, 2013 to December, 2014
    • SCPD (off-campus)
    • Mining Massive Data Sets Certificate


    • Kontagent Big Data Challenge

    • September, 2012
    • Won a $10k cash prize for creating an interactive HTML5 visualization of Toronto transit schedule data.

Selected Work

    • Visualizing Facebook Friends

    • December, 2010
    • Created a visualization that draws a world map from a sample of ten million friend pairs. Published by Facebook and featured by the BBC and The Economist.

Open Source

    • runipy

    • Headless execution engine for IPython notebooks.

    • simplediff

    • Simple and popular diff algorithm with implementations provided in four languages.

    • sklearn-pandas

    • Allows the scikit-learn machine learning library to build and cross-validate models directly on pandas data frames.