Bojan Karlaš / PhD student at ETH

I am a PhD student in the Systems Group at ETH Zurich, advised by Prof. Ce Zhang. My research revolves around data management systems for machine learning. I have done internships at Microsoft, Oracle, and Logitech.


Incomplete Databases
Studying how incomplete and inconsistent data affects machine learning, extending the database notion of certain answers over Codd tables to "certain predictions".
Read More
Testing of Machine Learning Models
Building continuous integration and testing methods that give rigorous statistical guarantees for machine learning models.
Read More



Ease.ML: A Lifecycle Management System for MLDev and MLOps
LA Melgar, D Dao, S Gan, NM Gürel, N Hollenstein, J Jiang, B Karlaš, T Lemmin, T Li, Y Li, S Rao, J Rausch, C Renggli, L Rimanic, M Weber, S Zhang, Z Zhao, K Schawinski, W Wu, C Zhang
[CIDR] Conference on Innovative Data Systems Research

We present Ease.ML, a lifecycle management system for machine learning (ML). Unlike many existing works, which focus on improving individual steps during the lifecycle of ML application development, Ease.ML focuses on managing and automating the entire lifecycle itself. We present user scenarios that have motivated the development of Ease.ML; the eight-step Ease.ML process that covers the lifecycle of ML application development; the foundation of Ease.ML in terms of a probabilistic database model and its connection to information theory; and our lessons learned, which we hope can inspire future research.

Paper Video BibTeX


RAB: Provable Robustness Against Backdoor Attacks
M Weber, X Xu, B Karlaš, C Zhang, B Li
[arXiv] arXiv preprint arXiv:2003.08904

Recent studies have shown that deep neural networks (DNNs) are vulnerable to various attacks, including evasion attacks and poisoning attacks. On the defense side, there has been intensive interest in provable robustness against evasion attacks. In this paper, we focus on improving model robustness against more diverse threat models. Specifically, we provide the first unified framework using smoothing functionals to certify model robustness against general adversarial attacks. In particular, we propose the first robust training process, RAB, to certify against backdoor attacks. We theoretically prove the robustness bound for machine learning models based on the RAB training process, analyze the tightness of the bound, and propose different smoothing noise distributions, such as Gaussian and uniform distributions. Moreover, we evaluate the certified robustness of a family of "smoothed" DNNs trained in a differentially private fashion. In addition, we theoretically show that for simpler models, such as K-nearest neighbor (KNN) models, it is possible to train robust smoothed models efficiently. For K = 1, we propose an exact algorithm to smooth the training process, eliminating the need to sample from a noise distribution. Empirically, we conduct comprehensive experiments on different machine learning models (DNNs, differentially private DNNs, and KNN models) on the MNIST, CIFAR-10, and ImageNet datasets to provide the first benchmark for certified robustness against backdoor attacks. We also evaluate KNN models on a spambase tabular dataset to demonstrate their advantages. Both the theoretic …
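For intuition, the general randomized-smoothing recipe behind such certificates replaces a base classifier with the majority vote of its predictions on noisy copies of the input. The sketch below shows plain test-time smoothing with Gaussian noise, a simplification of the paper's training-time RAB process; `smoothed_predict` and the toy `clf` are hypothetical names, not the paper's implementation:

```python
import random

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=1000, seed=None):
    """Majority vote of base_classifier over Gaussian perturbations of x."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        label = base_classifier(noisy)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy stand-in base classifier (hypothetical): sign of the feature sum.
clf = lambda x: int(sum(x) > 0)
```

The larger the vote margin between the top two labels, the larger the perturbation that provably cannot flip the smoothed prediction.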

Paper BibTeX
Building continuous integration services for machine learning
B Karlaš, M Interlandi, C Renggli, W Wu, C Zhang, DMI Babu, J Edwards, C Lauren, A Xu, M Weimer
[SIGKDD] Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Continuous integration (CI) has been a de facto standard for building industrial-strength software. Yet, there is little attention towards applying CI to the development of machine learning (ML) applications until the very recent effort on the theoretical side. In this paper, we take a step forward to bring the theory into practice.

Paper Promo Video Talk Video BibTeX
Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
B Karlaš, P Li, R Wu, NM Gürel, X Chu, W Wu, C Zhang
[VLDB] Proceedings of the VLDB Endowment

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistent and incomplete information is ubiquitous in real-world datasets, and its impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP): a test data example can be certainly predicted (CP'ed) if all possible classifiers trained on top of all possible worlds induced by the incompleteness of data would yield the same prediction. We study two fundamental CP queries: (Q1) a checking query that determines whether a data example can be CP'ed, and (Q2) a counting query that computes the number of classifiers that support a particular prediction (i.e., label). Given that general solutions to CP queries are, not surprisingly, hard without assumptions about the type of classifier, we further present a case study in the context of nearest neighbor (NN) classifiers, where efficient solutions to CP queries can be developed: we show that it is possible to answer both queries in linear or polynomial time over exponentially many possible worlds.
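To illustrate the checking query (Q1) for a 1-NN classifier, here is a hypothetical brute-force sketch: it enumerates every possible world induced by the missing cells, assuming each missing cell ranges over a small finite candidate domain, and reports a label only if all worlds agree. This is exponential in the number of missing cells, which is exactly what the paper's efficient algorithms avoid:

```python
from itertools import product

def certain_prediction_1nn(train, test_point, candidate_values):
    """Q1 by brute force: train is a list of ([features, possibly None], label);
    each None cell is completed with every value in candidate_values.
    Returns the certain 1-NN label if every possible world agrees, else None."""
    missing = [(i, j) for i, (feats, _) in enumerate(train)
               for j, v in enumerate(feats) if v is None]
    predictions = set()
    for fill in product(candidate_values, repeat=len(missing)):
        # Materialize one possible world.
        world = [(list(f), y) for f, y in train]
        for (i, j), v in zip(missing, fill):
            world[i][0][j] = v
        # 1-NN prediction in this world (squared Euclidean distance).
        _, label = min(world, key=lambda r: sum((a - b) ** 2
                                                for a, b in zip(r[0], test_point)))
        predictions.add(label)
    return predictions.pop() if len(predictions) == 1 else None

# 'a' is nearest in every completion, so the example is CP'ed:
print(certain_prediction_1nn([([0.0], "a"), ([None], "b")], [0.1], [3.0, 5.0]))
```

With candidate domain [0.05, 5.0] instead, the two worlds disagree ("b" vs. "a") and the function returns None, i.e., the example cannot be certainly predicted.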

Paper BibTeX
End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines
Z Yang, Z Zhao, H Pei, B Wang, B Karlaš, J Liu, H Guo, B Li, C Zhang
[arXiv] arXiv preprint arXiv:2003.00120

As machine learning (ML) is being applied to many mission-critical scenarios, certifying ML model robustness becomes increasingly important. Many previous works focus on the robustness of independent ML and ensemble models, and can only certify a very small magnitude of adversarial perturbation. In this paper, we take a different viewpoint and improve learning robustness by going beyond independent ML and ensemble models. We aim at promoting the generic Sensing-Reasoning machine learning pipeline, which contains both sensing components (e.g., deep neural networks) and reasoning components (e.g., Markov logic networks (MLN)) enriched with domain knowledge. Can domain knowledge help improve learning robustness? Can we formally certify the end-to-end robustness of such an ML pipeline? We first theoretically analyze the computational complexity of checking provable robustness in the reasoning component. We then derive provable robustness bounds for several concrete reasoning components. We show that for reasoning components such as MLN and a specific family of Bayesian networks, it is possible to certify the robustness of the whole pipeline even with a large magnitude of perturbation that cannot be certified by existing work. Finally, we conduct extensive real-world experiments on large-scale datasets to evaluate the certified robustness of Sensing-Reasoning ML pipelines.

Paper BibTeX
Online Active Model Selection for Pre-trained Classifiers
MR Karimi, NM Gürel, B Karlaš, J Rausch, C Zhang, A Krause
[arXiv] arXiv preprint arXiv:2010.09818

Given pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making a small number of queries? Answering this question has a profound impact on a range of practical scenarios. In this work, we design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round. Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams. We establish several theoretical guarantees for our algorithm and extensively demonstrate its effectiveness in our experimental studies.

Paper BibTeX


Is advance knowledge of flow sizes a plausible assumption?
V Ðukić, SA Jyothi, B Karlaš, M Owaida, C Zhang, A Singla
[NSDI] 16th USENIX Symposium on Networked Systems Design and Implementation

Recent research has proposed several packet, flow, and coflow scheduling methods that could substantially improve data center network performance. Most of this work assumes advance knowledge of flow sizes. However, the lack of a clear path to obtaining such knowledge has also prompted some work on non-clairvoyant scheduling, albeit with more limited performance benefits.

Paper Video BibTeX
Continuous integration of machine learning models with ease.ml/ci: Towards a rigorous yet practical treatment
C Renggli, B Karlaš, B Ding, F Liu, K Schawinski, W Wu, C Zhang
[arXiv] arXiv preprint arXiv:1903.00278

Continuous integration is an indispensable step of modern software engineering practices to systematically manage the life cycles of system development. Developing a machine learning model is no different: it is an engineering process with a life cycle, including design, implementation, tuning, testing, and deployment. However, most, if not all, existing continuous integration engines do not support machine learning as a first-class citizen. In this paper, we present ease.ml/ci, to the best of our knowledge the first continuous integration system for machine learning. The challenge of building ease.ml/ci is to provide rigorous guarantees, e.g., a single-accuracy-point error tolerance with 0.999 reliability, with a practical amount of labeling effort, e.g., 2K labels per test. We design a domain-specific language that allows users to specify integration conditions with reliability constraints, and develop simple novel optimizations that can lower the number of labels required by up to two orders of magnitude for test conditions popularly used in real production systems.
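For intuition on the labeling effort involved, a naive baseline follows from the standard two-sided Hoeffding bound: to estimate a model's accuracy within ±ε with probability at least 1 − δ, about ln(2/δ)/(2ε²) i.i.d. labeled test examples suffice. The sketch below is that textbook baseline, not ease.ml/ci's optimized estimators, and `labels_needed` is a hypothetical helper name:

```python
import math

def labels_needed(eps, delta):
    """Two-sided Hoeffding bound: number of i.i.d. labeled examples that
    suffice to estimate accuracy within +/- eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# One accuracy point (eps = 0.01) at 0.999 reliability (delta = 0.001):
print(labels_needed(0.01, 0.001))
```

This baseline demands about 38,005 labels per test at those settings, which is why optimizations that bring the cost down toward the ~2K labels mentioned above matter in practice.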

Paper BibTeX
Data Science through the looking glass and what we found there
F Psallidas, Y Zhu, B Karlaš, M Interlandi, A Floratou, K Karanasos, W Wu, C Zhang, S Krishnan, C Curino, M Weimer
[arXiv] arXiv preprint arXiv:1912.09536

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, by performing the largest analysis of DS projects to date, focusing on questions that can help determine investments on either side. Specifically, we download and analyze: (a) over 6M Python notebooks publicly available on GitHub, (b) over 2M enterprise DS pipelines developed within COMPANYX, and (c) the source code and metadata of over 900 releases from 12 important DS libraries. The analysis we perform ranges from coarse-grained statistical characterizations to analysis of library imports, pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret, and dare to draw a few (actionable, yet subjective) conclusions on (a) what systems builders should focus on to better serve practitioners, and (b) what technologies practitioners should bet on given current trends. We plan to automate this analysis and release associated tools and results periodically.

Paper BibTeX
AutoML from the service provider's perspective: Multi-device, multi-tenant model selection with GP-EI
C Yu, B Karlaš, J Zhong, C Zhang, J Liu
[AISTATS] 22nd International Conference on Artificial Intelligence and Statistics

AutoML has become a popular service provided by most leading cloud service providers today. In this paper, we focus on the AutoML problem from the service provider's perspective, motivated by the following practical consideration: when an AutoML service needs to serve multiple users with multiple devices at the same time, how can we allocate these devices to users in an efficient way? We focus on GP-EI, one of the most popular algorithms for automatic model selection and hyperparameter tuning, used by systems such as Google Vizier. The technical contribution of this paper is the first multi-device, multi-tenant algorithm for GP-EI that is aware of multiple computation devices and multiple users sharing the same set of computation devices. Theoretically, given N users and M devices, we obtain a regret bound of O((MIU(T, K) + M) · N^2 / M), where MIU(T, K) refers to the maximal incremental uncertainty up to time T for the covariance matrix K. Empirically, we evaluate our algorithm on two applications of automatic model selection, and show that it significantly outperforms the strategy of serving users independently. Moreover, when multiple computation devices are available, we achieve near-linear speedup when the number of users is much larger than the number of devices.
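GP-EI scores each candidate model or hyperparameter configuration by its expected improvement over the best observation so far under the Gaussian-process posterior. Below is a minimal sketch of that standard closed-form acquisition function (not the paper's multi-tenant scheduler; `expected_improvement` is an illustrative helper, not a library call):

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Closed-form EI for maximization, given a Gaussian posterior N(mu, sigma^2)
    at a candidate point and the best objective value observed so far."""
    if sigma <= 0.0:
        return max(0.0, mu - best - xi)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return (mu - best - xi) * cdf + sigma * pdf

# A scheduler would evaluate the candidate with the largest EI next:
candidates = [(0.70, 0.05), (0.80, 0.10), (0.76, 0.02)]  # (posterior mean, std)
best_seen = 0.75
print(max(candidates, key=lambda c: expected_improvement(c[0], c[1], best_seen)))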

Paper BibTeX
Ease.ml/ci and Ease.ml/meter in action: towards data management for statistical generalization
C Renggli, FA Hubis, B Karlaš, K Schawinski, W Wu, C Zhang
[VLDB Demo] Proceedings of the VLDB Endowment

Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate within a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. However, as of today, the counterpart of "software engineering for ML" is largely missing: developers of ML applications are left with powerful tools (e.g., TensorFlow and PyTorch) but little guidance regarding the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ease.ml/ci is a continuous …

Paper BibTeX

2018

Ease.ml in action: towards multi-tenant declarative learning services
B Karlaš, J Liu, W Wu, C Zhang
[VLDB Demo] Proceedings of the VLDB Endowment

We demonstrate ease.ml, a multi-tenant machine learning service we host at ETH Zurich for various research groups. Unlike existing machine learning services, ease.ml presents a novel architecture that supports multi-tenant, cost-aware model selection that optimizes for minimizing the total regret of all users. Moreover, it provides a novel user interface that enables declarative machine learning at a higher level: users only need to specify the input/output schemata of their learning tasks, and ease.ml can handle the rest. In this demonstration, we present the design principles of ease.ml, highlight the implementation of its key components, and showcase how ease.ml can help ease machine learning tasks that often perplex even experienced users.

Code Paper BibTeX
Network Scheduling in the Dark
V Đukić, SA Jyothi, B Karlaš, M Owaida, C Zhang, A Singla
[SoCC] Proceedings of the ACM Symposium on Cloud Computing

Motivation. Advance knowledge of future events in a dynamic system can often be used to take actions that improve system performance. In data center networks, such knowledge could potentially benefit many problems, including routing and flow scheduling, circuit switching, packet scheduling in switch queues, and transport protocols. Indeed, past work on each of these topics has explored this, and in many cases, claimed significant improvements [1–3]. Nevertheless, little of this work has achieved deployment in data centers, which largely use techniques that are agnostic to traffic information, such as shortest path routing with randomization, and first-in-first-out queueing at switches. A significant roadblock for traffic-aware scheduling is that in practice, traffic characteristics can be hard to ascertain accurately in a timely fashion. In particular, past work on network flow and packet scheduling has assumed advance …

Paper BibTeX


The curious case of the PDF converter that likes Mozart: Dissecting and mitigating the privacy risk of personal cloud apps
H Harkous, R Rahman, B Karlaš, K Aberer
Proceedings on Privacy Enhancing Technologies

Third-party apps that work on top of personal cloud services, such as Google Drive and Dropbox, require access to the user's data in order to provide some functionality. Through detailed analysis of a hundred popular Google Drive apps from Google's Chrome store, we discover that the existing permission model is quite often misused: around two-thirds of analyzed apps are over-privileged, i.e., they access more data than they need in order to function. In this work, we analyze three different permission models that aim to discourage users from installing over-privileged apps. In experiments with 210 real users, we discover that the most successful permission model is our novel ensemble method that we call Far-reaching Insights. Far-reaching Insights inform users about the data-driven insights that apps can make about them (e.g., their topics of interest, collaboration and activity patterns, etc.). Thus, they seek to bridge the gap between what third parties can actually know about users and users' perception of their privacy leakage. The efficacy of Far-reaching Insights in bridging this gap is demonstrated by our results, as Far-reaching Insights prove to be, on average, twice as effective as the current model in discouraging users from installing over-privileged apps. In an effort to promote general privacy awareness, we deployed PrivySeal, a publicly available privacy-focused app store that uses Far-reaching Insights. Based on the knowledge extracted from the data of the store's users (over 115 gigabytes of Google Drive data from 1440 users with 662 installed apps), we also delineate the ecosystem for third-party cloud apps from the standpoint of …

Paper Video Website BibTeX



Research Intern / Microsoft Gray Systems Lab / Redmond, USA
Building a testing tool for ML models. Researching usage data from ML.NET feature engineering pipelines.


Research Intern / Oracle Labs / San Francisco Bay Area, USA
Developing an automated ensemble construction method for the Oracle Auto-ML system.

2016 - 2017

Research and Development Intern / Logitech / Lausanne, Switzerland
Applying machine learning and signal processing techniques to detect filler words in speech audio.

2015 - 2016

EPFL Research Scholar Program
Research Scholar / LCBB Lab / Lausanne, Switzerland
Applying graph theory and developing algorithms for assigning absolute orientations to genetic markers.

2014 - 2015

EPFL Research Scholar Program
Research Scholar / LSIR Lab / Lausanne, Switzerland
Designing, developing, and validating an improved app permissions dialog for Google Drive.

2012 - 2014

Software Design Engineer / Microsoft Development Center Serbia / Belgrade, Serbia
Worked in the SQL Server Parallel Data Warehouse (PDW) team on the development of a Microsoft big data solution. Participated in all phases of the software development cycle, collaborated with various teams in the US, worked with a large code base, and wrote maintainable production-quality code.


2018 - Present

Eidgenössische Technische Hochschule (ETH)
PhD in Computer Science / DS3 Lab / Zürich, Switzerland
Working on data management systems for machine learning.

2014 - 2017

École polytechnique fédérale de Lausanne (EPFL)
Master in Computer Science / Lausanne, Switzerland
Worked on many interesting and important projects.

2008 - 2014

School of Electrical Engineering, Belgrade University (ETF)
Bachelor in Software Engineering / Belgrade, Serbia
Worked on many interesting and important projects.


I speak English fluently, as well as intermediate German and French.
My native language is Serbian.
I enjoy running, hiking, books, and video games.