I am Leiyu Zhao, a Master of Computational Data Science (MCDS) student at the School of Computer Science, Carnegie Mellon University.
Refer to my Curriculum Vitae (PDF) for a brief introduction.
Cumulative GPA: 4.11/4.33; Advisor: Garth Gibson
Relevant Courses: Introduction to Computer Systems, Cloud Computing, Distributed Systems, Storage Systems, Database Applications, Parallel Computer Architecture and Programming, Advanced Cloud Computing, Machine Learning
Cumulative GPA: 94/100 (3.94/4.00), ranking 2/63
Major GPA: 97/100 (4.00/4.00), ranking 1/63
Developed built-in universal apps for the Windows 10 Chinese market release, such as Nearby Numbers
Developed non-built-in universal apps for Windows 10, such as Microsoft How-old and Microsoft Couplet
Deployed OpenStack Keystone, Swift and Nova on a cluster to support the THU Cloud Computing Platform.
Designed H2Cloud to host a filesystem in a single object storage cloud such as OpenStack Swift. For details, refer to the PROJECTS section.
Submitted the paper "Maintaining Filesystem in Object Storage Cloud" to IEEE INFOCOM'17 (pending).
Coding Languages and Debugging
Algorithm and Data Structure
Distributed and Parallel Computing Systems
MapReduce, Apache Spark, CUDA
HTML, CSS, Bootstrap, Nginx, ExpressJS, Django, ASP.NET
MySQL, Apache HBase, MongoDB
CloudFS is a FUSE-based filesystem on Linux that stores files in hybrid mode. Small files and metadata are stored on a local SSD, while big files are stored on Amazon S3. This combines the speed of the SSD with the high capacity of cloud storage.
Differences in data location and access pattern stay hidden behind the POSIX filesystem functions, so they are transparent to users.
It fully supports deduplication by Rabin-fingerprint-anchored chunking, coarse-grained (chunk-level) local caching, instant snapshotting (with copy-on-write), and online backup and recovery.
Modules and data accesses
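The deduplication above depends on content-defined chunk boundaries. A boundary is placed where a rolling hash over the last few bytes matches a bit pattern, so inserting bytes near the start of a file shifts only nearby boundaries. Below is a minimal sketch of that idea. It uses a Rabin-Karp-style polynomial rolling hash modulo a prime as a stand-in for true Rabin fingerprints over GF(2). The window size, mask and chunk bounds are hypothetical, not CloudFS's real settings. Unique chunks are stored once, keyed by SHA-256.

```python
from __future__ import annotations

import hashlib

# Hypothetical parameters: CloudFS's real window size, mask and chunk bounds are not stated.
WINDOW = 48              # bytes covered by the rolling hash
BASE = 257               # polynomial base
MOD = (1 << 61) - 1      # Mersenne prime modulus
MASK = (1 << 12) - 1     # cut when the low 12 bits are all ones (~4 KiB expected gap)
MIN_CHUNK = 1024
MAX_CHUNK = 16384
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of every content-defined chunk in data."""
    boundaries = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: remove the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            boundaries.append(i + 1)
            start, h = i + 1, 0
    if start < len(data):
        boundaries.append(len(data))
    return boundaries


def dedup_store(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each unique chunk once (keyed by SHA-256) and return the file's chunk recipe."""
    recipe = []
    start = 0
    for end in chunk_boundaries(data):
        chunk = data[start:end]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
        start = end
    return recipe
```

Storing the same content a second time with a short prefix shares most chunks, because boundaries realign after the first cut. The hybrid placement rule (small files on SSD, big files on S3) is not shown here.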
We designed H2Cloud, a cloud storage system consisting of H2Layer and OpenStack Swift, to maintain users' filesystems on a single object storage cloud. This greatly reduces system complexity compared to dual-cloud solutions like Dropbox. It also makes filesystem operations substantially faster than consistent hashing solutions like Amazon S3.
Dual-cloud Solutions like Dropbox
Consistent Hashing Solutions like Amazon S3
Performance comparison of MOVE and DELETE
Filesystem Structure in H2Cloud
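The MOVE/DELETE comparison comes down to how objects are named. The sketch below contrasts two hypothetical namespace layouts. This is not H2Cloud's actual on-Swift format, which is not described here. In the first layout every object is keyed by its full path, as in path-hashed stores. In the second, entries are keyed by (parent directory id, name). Renaming a directory rewrites every descendant in the first layout, but only one entry in the second. The class and method names are illustrative only.

```python
import itertools


class PathKeyedStore:
    """Every object is keyed by its full path, as when the path is hashed to place it."""

    def __init__(self):
        self.objects = {}

    def put(self, path, data):
        self.objects[path] = data

    def move_dir(self, src, dst):
        """Rename directory src to dst; returns how many objects had to be rewritten."""
        prefix, new_prefix = src.rstrip("/") + "/", dst.rstrip("/") + "/"
        keys = [k for k in self.objects if k.startswith(prefix)]
        for key in keys:
            self.objects[new_prefix + key[len(prefix):]] = self.objects.pop(key)
        return len(keys)


class DirIdStore:
    """Entries keyed by (parent directory id, name); directory ids never change."""

    ROOT = 0

    def __init__(self):
        self.entries = {}               # (parent_id, name) -> ("dir", id) | ("file", data)
        self._ids = itertools.count(1)

    def _dir_id(self, path, create=False):
        dir_id = self.ROOT
        for name in filter(None, path.split("/")):
            entry = self.entries.get((dir_id, name))
            if entry is None:
                if not create:
                    raise FileNotFoundError(path)
                entry = ("dir", next(self._ids))
                self.entries[(dir_id, name)] = entry
            elif entry[0] != "dir":
                raise NotADirectoryError(path)
            dir_id = entry[1]
        return dir_id

    def put(self, path, data):
        parent, _, name = path.rpartition("/")
        self.entries[(self._dir_id(parent, create=True), name)] = ("file", data)

    def get(self, path):
        parent, _, name = path.rpartition("/")
        return self.entries[(self._dir_id(parent), name)][1]

    def move_dir(self, src, dst):
        """Re-link one directory entry; the subtree below it is untouched."""
        src_parent, _, src_name = src.rstrip("/").rpartition("/")
        dst_parent, _, dst_name = dst.rstrip("/").rpartition("/")
        entry = self.entries.pop((self._dir_id(src_parent), src_name))
        self.entries[(self._dir_id(dst_parent, create=True), dst_name)] = entry
        return 1
```

DELETE behaves the same way. The path-keyed layout must remove every descendant object, while the id-based layout can unlink one entry and reclaim the subtree later.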
Gurgling is a lightweight, configurable web framework with an API similar to ExpressJS.
It fully supports HTTP/1.1, HTTPS, WebSocket, etc. Refer to the README on the
GitHub page for more information.
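The central idea of an ExpressJS-style API is a middleware stack. Each handler receives (req, res, next) and either finishes the response or calls next() to pass control down the stack. The sketch below shows that dispatch loop with mount-path prefixes. The names App, use and handle are hypothetical; Gurgling's real interface is documented in its README and is not reproduced here.

```python
def _matches(prefix: str, path: str) -> bool:
    """Express-style mount matching: '/api' matches '/api' and '/api/x' but not '/apix'."""
    prefix = prefix.rstrip("/")
    return prefix == "" or path == prefix or path.startswith(prefix + "/")


class App:
    """Hypothetical Express-like app: app.use([prefix], handler) with handler(req, res, next)."""

    def __init__(self):
        self.stack = []  # list of (prefix, handler) in registration order

    def use(self, prefix_or_handler, handler=None):
        if handler is None:
            prefix, handler = "/", prefix_or_handler
        else:
            prefix = prefix_or_handler
        self.stack.append((prefix, handler))
        return self  # allow chaining: app.use(a).use("/api", b)

    def handle(self, req: dict, res: dict) -> dict:
        index = 0

        def next_():
            nonlocal index
            # Advance to the next handler whose prefix matches this request.
            while index < len(self.stack):
                prefix, handler = self.stack[index]
                index += 1
                if _matches(prefix, req["path"]):
                    handler(req, res, next_)
                    return
            res.setdefault("status", 404)  # the stack ran out without a response

        next_()
        return res
```

A logging middleware registered with app.use(logger) runs for every request and calls next(). A handler mounted with app.use("/api", api) only sees paths under /api.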
The Sound of Tsinghua is a WeChat-based interaction platform that offers ticket booking and information services to everyone at Tsinghua University.
Since its release in December 2014, it has gained more than 21,000 subscribers (as of October 2015). It has also distributed tickets for dozens of
activities, including popular ones with more than 2,000 participants and 3,000 subscribers requesting tickets concurrently.
UI for Administration console
Subscriber increase statistics diagram
Readers per broadcast statistics diagram
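When 3,000 subscribers request 2,000 tickets at the same moment, two requirements matter. The system must never oversell, and it must not give one subscriber two tickets. The sketch below shows the required atomic check-and-decrement with a single in-process lock. It is an assumption, not the platform's actual code. A deployed WeChat backend would more likely do the same step inside a database transaction or an atomic cache operation.

```python
import threading


class TicketPool:
    """Hands out at most `total` tickets, one per subscriber, under concurrent requests."""

    def __init__(self, total: int):
        self._remaining = total
        self._holders = set()
        self._lock = threading.Lock()

    def grab(self, user_id: str) -> bool:
        # The check and the decrement happen under one lock, so two requests
        # can never both take the last ticket (no overselling).
        with self._lock:
            if user_id in self._holders:
                return True          # repeated request: already holds a ticket
            if self._remaining == 0:
                return False
            self._remaining -= 1
            self._holders.add(user_id)
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining

    def holder_count(self) -> int:
        with self._lock:
            return len(self._holders)
```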
Based on the UNIX v6 kernel, the GUI module implements a graphics card driver, a complete GUI framework (including a kernel module and syscalls) and a set of executable GUI programs. The architecture was originally designed by Leiyu.
Core data structure: Visible Tree
GUI Start Screen
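The "Visible Tree" is the core structure of the GUI framework. Its exact layout is not described here. As a hypothetical illustration, window systems usually keep windows in a tree. Each child is positioned relative to its parent, and children are stored back-to-front. Routing a mouse click is then a recursive hit test that returns the front-most visible window under the pointer. The Window fields and hit_test below are assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Window:
    name: str
    x: int   # position relative to the parent window
    y: int
    w: int
    h: int
    visible: bool = True
    children: list[Window] = field(default_factory=list)  # back-to-front order

    def add(self, child: Window) -> Window:
        self.children.append(child)  # newly added windows are drawn on top
        return child


def hit_test(win: Window, px: int, py: int) -> Window | None:
    """Front-most visible window containing (px, py), given in win's parent coordinates."""
    if not win.visible:
        return None
    if not (win.x <= px < win.x + win.w and win.y <= py < win.y + win.h):
        return None
    lx, ly = px - win.x, py - win.y           # convert to win's own coordinates
    for child in reversed(win.children):      # check the top-most child first
        hit = hit_test(child, lx, ly)
        if hit is not None:
            return hit
    return win
```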
CoDrawboard is a highly fault-tolerant, distributed, real-time collaborative drawboard. It
allows all users connected to the same arbitrarily large backend cluster to share the same drawboard.
Thanks to Paxos, the system keeps working as long as a quorum (more than half) of the backends is
alive. With live switching and an offline mode, users may not even notice a failure, no matter
how many backend servers crash.
Refer to the README for more information.
Connecting to cluster
UI - the drawboard
Backend failure tolerance
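The quorum rule above follows from majority intersection. With N backends, a quorum is floor(N/2) + 1 nodes, so floor((N-1)/2) crashes can be tolerated. For example, 5 backends survive 2 failures. The sketch below is a minimal single-decree Paxos over in-memory acceptors. It shows why a value, once chosen, survives failures. CoDrawboard presumably runs a replicated log of drawing operations rather than a single decision, and its message format is not shown here. The stroke values in the usage below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Acceptor:
    alive: bool = True
    promised: int = -1        # highest proposal number promised so far
    accepted_n: int = -1      # number of the last accepted proposal
    accepted_v: Any = None    # value of the last accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals below n; report any accepted value."""
        if self.alive and n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_v)
        return None

    def accept(self, n, v):
        """Phase 2b: accept (n, v) unless a higher-numbered promise was made."""
        if self.alive and n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """Run one Paxos round with proposal number n; return the chosen value or None."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None  # not enough live backends
    # Safety rule: adopt the value of the highest-numbered proposal already accepted.
    prior_n, prior_v = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None
```

With five acceptors, a value chosen in round 1 is still returned after two of them crash. Once a third crashes, no quorum can form and propose returns None.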
Television Audience Analytics Web Service performs real-time analytics on ~1 billion
system log entries collected from TV sets in Tianjin, China. Backed by Apache
Spark and the Hadoop Distributed File System (HDFS), the system handles customized
analytics requests about television audiences in seconds.
DEMO Active days distribution
DEMO Total watchtime analytics
DEMO Tuning times distribution
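The "active days distribution" demo reduces to a simple aggregation: how many TV sets were active on exactly k distinct days. The sketch below writes it in plain Python so it runs without a cluster. The comment shows the equivalent chain of PySpark RDD transformations. The (device_id, date, event) log format is a hypothetical assumption, not the real TV log schema.

```python
from collections import Counter


def active_days_distribution(records):
    """records: iterable of (device_id, date_str, event); returns {active_days: num_devices}.

    Roughly equivalent PySpark RDD chain (assuming an RDD of such tuples):
        rdd.map(lambda r: (r[0], r[1])).distinct() \
           .map(lambda p: (p[0], 1)).reduceByKey(operator.add) \
           .map(lambda kv: (kv[1], 1)).reduceByKey(operator.add).collectAsMap()
    """
    # Stage 1 (map + distinct): each device counts once per day, however many events.
    device_days = {(device, day) for device, day, _event in records}
    # Stage 2 (reduceByKey): number of distinct active days per device.
    days_per_device = Counter(device for device, _day in device_days)
    # Stage 3 (reduceByKey again): histogram of devices by active-day count.
    return dict(Counter(days_per_device.values()))
```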
and a runtime to execute it in JVM. Advanced features of
Overall working procedure
JS types implementation
Asynchronous model implementation
Original vs generated TAC
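The caption above refers to three-address code (TAC), where every instruction has at most one operator and compound expressions are split into temporaries. As a hypothetical illustration of that lowering step, the sketch below reuses Python's ast module as a parser. It covers only simple assignments of arithmetic expressions, whose syntax happens to match JavaScript. The project's own front end, JS type handling and JVM runtime are not reproduced.

```python
import ast
import itertools


def to_tac(source):
    """Lower simple assignments like 'x = a * b + c' to a list of TAC instructions."""
    temps = (f"t{i}" for i in itertools.count(1))   # fresh temporaries t1, t2, ...
    code = []
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def lower(node):
        # Returns the name (variable, literal or temporary) holding node's value.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left, right = lower(node.left), lower(node.right)
            tmp = next(temps)
            code.append(f"{tmp} = {left} {ops[type(node.op)]} {right}")
            return tmp
        raise NotImplementedError(ast.dump(node))

    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise NotImplementedError(ast.dump(stmt))
        code.append(f"{stmt.targets[0].id} = {lower(stmt.value)}")
    return code
```

For "x = a * b + c" this yields t1 = a * b, t2 = t1 + c, x = t2. A later copy-propagation pass could fold the final copy.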
Rousey is a wristband-like device that uses an infrared camera to recognize gestures. It provides an invisible mouse
as an input device for PCs, tablets and Ultrabooks, combining accuracy with portability. It was highly commended by Intel engineers and won
the Excellent Works award in the Intel Mobile Computing Innovation Contest.
Image capture & processing
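One common way to turn infrared frames into a pointer is to threshold the image and take the centroid of the bright blob, for example an IR-lit fingertip. That centroid is then scaled from camera coordinates to screen coordinates. The sketch below shows only this pointer-tracking step on a plain 2-D intensity grid. The threshold, the frame format and the linear mapping are assumptions. Rousey's actual processing and gesture classification are not described here.

```python
def bright_centroid(frame, threshold):
    """Centroid (x, y) of all pixels >= threshold in a 2-D intensity grid, or None."""
    sx = sy = n = 0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value >= threshold:
                sx += x
                sy += y
                n += 1
    if n == 0:
        return None  # nothing bright enough: no pointer in view
    return sx / n, sy / n


def to_screen(point, cam_size, screen_size):
    """Linearly map a camera-space point to integer screen pixels."""
    (cw, ch), (sw, sh) = cam_size, screen_size
    return round(point[0] * (sw - 1) / (cw - 1)), round(point[1] * (sh - 1) / (ch - 1))
```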
The network is an ad-hoc cluster of self-made nodes that collect data from distributed sensors for analysis.
To ensure both data quality and quantity, a ZigBee-based protocol was designed for distributed data analysis
and reliable data transfer.
Distributed data analysis process
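Reliable transfer over a lossy radio link is typically built from acknowledgements and retransmission. The sketch below shows stop-and-wait ARQ with a 1-bit sequence number over a simulated link that drops data frames and ACKs at random. The receiver discards the duplicates that lost ACKs cause. This is a generic technique chosen for illustration, not the protocol actually designed for this network. The loss rate and retry limit are hypothetical.

```python
import random


def send_reliably(payloads, loss_rate, seed=0, max_tries=50):
    """Stop-and-wait ARQ with alternating-bit sequence numbers over a simulated lossy link.

    Returns (payloads delivered to the receiver, in order without duplicates,
    total number of data-frame transmissions).
    """
    rng = random.Random(seed)
    delivered = []
    expected_seq = 0      # receiver state: the sequence bit it expects next
    transmissions = 0
    for i, payload in enumerate(payloads):
        seq = i % 2
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_rate:
                continue  # data frame lost: sender times out and retransmits
            if seq == expected_seq:
                delivered.append(payload)   # new frame: deliver it
                expected_seq ^= 1
            # A duplicate (seq != expected_seq) is dropped but still acknowledged.
            if rng.random() < loss_rate:
                continue  # ACK lost: sender retransmits the same frame
            break         # ACK received: move on to the next payload
        else:
            raise TimeoutError(f"frame {i} was never acknowledged")
    return delivered, transmissions
```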
Smartdental is a product developed by the School of Software, Tsinghua University and the School of Stomatology, Peking University.
The annotation module is an add-on that supports handwritten annotations on pathological pictures. The module has been checked in to
the original version and is under review for release.
UI: Editing annotation
UI: SYNC to client
The system works on a paper database to recommend content to potential readers. It uses a hybrid
algorithm (combining collaborative filtering and TF-IDF) together with dynamic estimation, and its recommendation
accuracy is estimated at more than 71% (AP@5). A technical report describes the algorithm in detail.
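The three ingredients named above can each be shown in a few lines: TF-IDF content similarity, a blend with a collaborative-filtering score, and the AP@5 metric used to report accuracy. The sketch below uses a simple linear blend with a hypothetical weight alpha. The project's actual combination and its "dynamic estimation" are not reproduced. AP@k here divides by min(|relevant|, k), one common convention.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {doc_id: list of tokens} -> {doc_id: {term: tf * idf}}."""
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    return {
        doc_id: {t: (c / len(toks)) * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for doc_id, toks in docs.items()
    }


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_scores(content_sim, cf_score, alpha=0.5):
    """Linear blend of two {item: score} maps; alpha is a hypothetical mixing weight."""
    items = set(content_sim) | set(cf_score)
    return {i: alpha * content_sim.get(i, 0.0) + (1 - alpha) * cf_score.get(i, 0.0)
            for i in items}


def average_precision_at_k(ranked, relevant, k=5):
    """AP@k: mean of precision@i over the ranks i <= k that hold a relevant item."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / min(len(relevant), k) if relevant else 0.0
```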
The tool stitches several related photos into one panorama. It matches SURF descriptors between images,
computes transformation matrices, searches for the optimal seam with a minimum-cut algorithm and
reduces color discrepancies, so the photos are stitched together naturally.
Optimized Algorithm for keypoint matching
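Keypoint matching pairs each descriptor in one image with its nearest neighbor in the other. The sketch below adds the nearest-neighbor ratio test, which keeps a match only when the best candidate is clearly closer than the second best. This rejects ambiguous matches before the transformation matrix is estimated. The ratio test is a standard filter chosen for illustration, not necessarily the "optimized algorithm" in the figure. The 0.75 ratio is a hypothetical default. SURF descriptors are 64-dimensional, but any equal-length vectors work here.

```python
import math


def nearest_two(vec, candidates):
    """Return ((dist, index), (dist, index)) of the two closest candidates to vec."""
    best = second = (math.inf, -1)
    for j, cand in enumerate(candidates):
        d = math.dist(vec, cand)  # Euclidean distance
        if d < best[0]:
            best, second = (d, j), best
        elif d < second[0]:
            second = (d, j)
    return best, second


def match_descriptors(desc_a, desc_b, ratio=0.75):
    """List of (i, j) pairs where desc_a[i] matches desc_b[j] and passes the ratio test."""
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, vec in enumerate(desc_a):
        (d1, j), (d2, _) = nearest_two(vec, desc_b)
        if d1 < ratio * d2:  # best match is clearly better than the runner-up
            matches.append((i, j))
    return matches
```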