New Jersey Programming Languages and Systems Seminar

Dec 6, 2024

Cornell Tech, New York, NY

The New Jersey Programming Languages and Systems Seminar (NJPLS) brings researchers in the New Jersey area together for a day of informal talks and discussions. We accept talks on complete research and work in progress, dealing with topics of interest to the programming languages and systems communities. See the talk proposal form for more information.

When: Friday, Dec 6, 2024
Where: Bloomberg Center · 2 West Loop Rd, New York, NY 10044
What: Research talks and discussions. Breakfast, lunch, and coffee provided.

Registration is free thanks to Cornell Tech and Jane Street!

Unfortunately, we have reached capacity for this event. Please join the waitlist if you are interested in attending. We will select people from the waitlist on a first-come, first-served basis and will inform you if you are selected.

Any questions or concerns should be directed to Noam Zilberstein (noamz@cs.cornell.edu).


Program

8:30 AM Breakfast and Sign-in
9:25 AM Welcome
9:30 AM Named Commutative Blocks for Verified Parallelization
Parisa Fathololumi (Stevens Institute of Technology)
Abstract

Recent works increasingly leverage code commutativity to parallelize sequential programs. For instance, CommSet introduced pragmas that allow programmers to specify commutativity for function calls, enabling compilers to better perform whole-program, dependency-based parallelization. Veracity goes further, parallelizing sequential code segments with formal correctness guarantees and automatic synthesis of commute conditions. However, no current language allows programmers to select arbitrary code regions that may only commute under specific conditions, synthesize those conditions, and dynamically exploit them in whole-program parallelization.

We present a method of whole-program parallelization with support for conditional commutativity. First, programmers specify commutativity of disparate code fragments by giving parameterized names to code blocks anywhere in their program and providing commutativity condition expressions over those parameters and the global state. Our algorithms then analyze dependence and commutativity, construct a program-specific parallelization scheme, and translate the code into asynchronous tasks. These tasks are input into a scheduler, which respects dependencies and dynamically checks commutativity conditions to determine if tasks can run concurrently. We demonstrate that these conditions can often be verified or synthesized automatically.

Our implementation, Veracity2G, adapts benchmarks from CommSet, Veracity, and other literature. We show Veracity2G can automatically synthesize commutativity conditions, enabling parallelization with speedups attributable to conditional commutativity.
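
As a rough illustration of the idea (this is a hypothetical Python sketch, not Veracity2G's actual surface syntax or scheduler), two named blocks can be guarded by a commutativity condition over their parameters, and a tiny scheduler can run them concurrently only when that condition holds:

    # Illustrative sketch only: hypothetical analogue of conditionally
    # commutative blocks; names and condition are made up for this example.
    from concurrent.futures import ThreadPoolExecutor

    state = {"x": 0, "y": 0}

    def block_a(key):           # a named block, parameterized by `key`
        state[key] += 1

    def block_b(key):
        state[key] *= 2

    # Commutativity condition over the block parameters and global state:
    # the two blocks commute whenever they touch different keys.
    def commute(key_a, key_b):
        return key_a != key_b

    def schedule(key_a, key_b):
        if commute(key_a, key_b):
            # Condition holds: run the blocks concurrently.
            with ThreadPoolExecutor(max_workers=2) as pool:
                pool.submit(block_a, key_a)
                pool.submit(block_b, key_b)
        else:
            # Condition fails: fall back to the original sequential order.
            block_a(key_a)
            block_b(key_b)

    schedule("x", "y")   # parallel: order does not affect the final state
    schedule("x", "x")   # sequential: (x+1)*2 differs from x*2+1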

9:55 AM Opportunistic Lambda Calculus: Automatic Parallelization and Streaming for Scripting
Stephen Mell (University of Pennsylvania)
Abstract

Scripting languages are widely used to compose external calls, such as foreign function calls that perform expensive computations, calls to remote APIs, or more recently, calls to machine learning models such as large language models (LLMs). Since external calls are typically the performance bottleneck in scripts, we design a novel scripting language to automatically exploit external-call parallelism. While our language is general, it is especially suited to scripting LLMs (since calls commute and are very slow), and includes support for streaming APIs. We call our key idea "opportunistic evaluation": instead of waiting for a function’s input to be available, we evaluate the function as much as possible on the known portion of the input, continuing the process when more input data is received. Crucially, functions are written as if the entire input is available. We show how opportunistic evaluation achieves both automatic parallelization of blocking external calls and automatic streaming. We define a lambda calculus with a non-strict, non-lazy evaluation order and with syntactic placeholders for unknown values, and we prove it confluent. Notably, we represent data with Church encodings and show how the conventional constructor/destructor approach gets stuck on partial data. We implement our approach as a Python library and show that it yields 1.4x to 37.2x speedups on five LLM scripting case-studies.
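
The following Python sketch is only meant to convey the flavor of that idea using plain futures; it is not the paper's calculus, library, or API. It dispatches each slow external call as soon as the corresponding piece of the input stream is known, rather than waiting for the whole input to be built:

    # Illustrative sketch only: names below are placeholders, and the
    # "external call" is simulated with a sleep.
    import time
    from concurrent.futures import ThreadPoolExecutor

    def slow_external_call(item):        # stand-in for an LLM or remote API call
        time.sleep(1)
        return item.upper()

    def stream_of_items():               # input arrives incrementally
        for item in ["alpha", "beta", "gamma"]:
            time.sleep(0.2)              # simulate a slow producer
            yield item

    with ThreadPoolExecutor(max_workers=8) as pool:
        # Dispatch each call as soon as its input element exists, instead of
        # first collecting the whole list and then mapping over it.
        futures = [pool.submit(slow_external_call, item) for item in stream_of_items()]
        results = [f.result() for f in futures]

    print(results)   # ['ALPHA', 'BETA', 'GAMMA']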

10:20 AM Expressive Tree Policies for Microservice Networks
Karuna Grewal (Cornell University)
Abstract

A microservice-based application is composed of self-contained components called services that collectively process a request. Controlling inter-service communications is key to enforcing safety properties in such applications. Presently, only a limited class of single-hop properties can be specified. These policies can be overly permissive because they ignore the rich service tree structure of the inter-service communications. Policies that can express the service tree structure will offer fine-grained control over communication patterns to development and security teams.

In this talk, I will first present an expressive policy language for specifying service tree structures. Then, I will focus on our runtime monitoring technique, based on visibly pushdown automata, for enforcing service tree policies. Finally, I will present our engineering solution for a distributed monitor built on top of an existing microservice deployment framework.
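
As a toy illustration (far simpler than the visibly-pushdown-automaton monitors in the talk, and with a made-up policy), one can picture a request's trace as a call tree and a service tree policy as a predicate over that tree:

    # Illustrative sketch only: a request's trace as (service, subtrees).
    trace = ("frontend", [("checkout", [("payments", []), ("inventory", [])])])

    # Hypothetical policy: "payments" may only be called beneath "checkout".
    def payments_under_checkout(tree, under_checkout=False):
        service, children = tree
        if service == "payments" and not under_checkout:
            return False
        under = under_checkout or service == "checkout"
        return all(payments_under_checkout(c, under) for c in children)

    print(payments_under_checkout(trace))                             # True
    print(payments_under_checkout(("frontend", [("payments", [])])))  # False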

10:45 AM Coffee Break
11:15 AM Raven: An SMT-Based Deductive Verifier for Concurrent Separation Logic
Ekansh Gupta (New York University)
Abstract

The current state of the art in using separation logic to prove linearizability of fine-grained concurrent data structures is the higher-order concurrent separation logic framework Iris, which is mechanized in Coq/Rocq. While Iris’s logic is highly expressive, its Coq mechanization is cumbersome to use, which slows down efforts to prove the correctness of widely used, real-world concurrent data structures.

We leverage advances in automated reasoning to implement Raven, a deductive verifier for an Iris-compatible concurrent separation logic with powerful features such as user-defined resource algebras, invariants, atomic specifications, and higher-order modules. Raven achieves this in a few ways:

  1. By carefully simplifying the Iris logic and restricting its most higher-order features, such as impredicativity and step-indexing;
  2. By using simple heuristics to automatically infer elements of the logic, specifically mask annotations; and
  3. By encoding the logical reasoning into first-order logic and using a powerful SMT solver (Z3) for highly automated reasoning, which removes the burden of manually justifying each proof step.

The end result is a user-friendly tool that can be used by non-experts in Iris to rapidly verify specifications such as functional correctness and linearizability of complex fine-grained concurrent data structures.
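
To give a sense of the last step only (this is not Raven's actual encoding of separation logic), an SMT-backed verifier can discharge an individual proof obligation by asking Z3 whether the obligation's negation is unsatisfiable, for example via Z3's Python bindings:

    # Illustrative sketch only; requires the z3-solver package. The obligation
    # below is a made-up example, not a Raven verification condition.
    from z3 import Ints, And, Implies, Not, Solver, unsat

    x, y = Ints("x y")
    # Obligation: from precondition x > 0 and effect y == x + 1,
    # conclude postcondition y > 1.
    obligation = Implies(And(x > 0, y == x + 1), y > 1)

    s = Solver()
    s.add(Not(obligation))          # valid iff the negation is unsatisfiable
    print("verified" if s.check() == unsat else "not verified")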

11:40 AM Non-Termination Proving At Scale
Julien Vanegue (Bloomberg & Imperial College London)
Abstract

Program termination is a classic non-safety property whose falsification cannot in general be witnessed by a finite trace. This makes testing for non-termination challenging, and makes it a natural target for symbolic proof. A number of works in the literature apply non-termination proving to small, self-contained benchmarks, but it has not been developed for large, real-world projects; as a consequence, despite its allure, non-termination proving has had limited practical impact. In this paper we develop a compositional theory for non-termination proving: compositionality opens the way to scalable application to large codebases. Discovering non-termination is an under-approximate problem, and we present UNTer, a sound and complete under-approximate logic for proving non-termination. We then extend UNTer with separation logic and develop UNTersl for heap-manipulating programs, yielding a compositional proof method amenable to automation via under-approximation and bi-abduction. We extend the Pulse analyser from Facebook and develop Pulse∞, an automated, compositional prover for non-termination based on UNTersl. We have run Pulse∞ on large codebases and libraries, each comprising hundreds of thousands of lines of code (LOC), including OpenSSL, libxml2, libxpm and CryptoPP; it discovered several non-termination bugs which have been reported to developers of these libraries.
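
For a concrete (hypothetical) picture of the kind of bug such a prover targets, consider a loop whose divergence no finite test run can witness, but which a recurrent-set style argument proves non-terminating:

    # Illustrative sketch only: a made-up divergence bug, not taken from the
    # libraries analyzed in the talk.
    def countdown_by_two(x):
        # If x is odd on entry, it stays odd after every iteration, so the
        # guard x != 0 never becomes false and the loop diverges. A
        # non-termination prover certifies such a "recurrent set" (here, the
        # odd integers) rather than waiting for a test to hang.
        while x != 0:
            x -= 2
        return x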

12:05 PM Monotone Procedure Summarization via Vector Addition Systems and Inductive Potentials
Nikhil Pimpalkhare (Princeton University)
Abstract

This talk presents a technique for summarizing recursive procedures operating on integer variables. The motivation of this work is to create more predictable program analyzers, and in particular to formally guarantee compositionality and monotonicity of procedure summarization. To summarize a procedure, we compute its best abstraction as a vector addition system with resets (VASR) and exactly summarize the executions of this VASR over the context-free language of syntactic paths through the procedure. We improve upon this technique by refining the language of syntactic paths using (automatically synthesized) linear potential functions that bound the number of recursive calls within valid executions of the input program. We implemented our summarization technique in an automated program verification tool; our experimental evaluation demonstrates that our technique computes more precise summaries than existing abstract interpreters and that our tool’s verification capabilities are comparable with state-of-the-art software model checkers.
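
As a small, hypothetical example of the flavor of procedure and summary involved (simplified well beyond anything in the talk), consider a recursive procedure whose only effect on an integer variable is to add a constant per call:

    # Illustrative sketch only: example procedure and summary are made up.
    calls = 0

    def work(n):
        global calls
        if n <= 0:
            return
        calls += 2          # each executed call adds 2 to `calls` ...
        work(n - 1)         # ... and makes one recursive call

    # A vector-addition-style summary "calls' = calls + 2 * k", where k counts
    # the executed increments, combined with a potential function showing
    # k <= n, lets an analyzer conclude calls' <= calls + 2 * n without
    # unrolling the recursion.
    work(5)
    print(calls)            # 10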

12:30 PM Lunch
2:00 PM SyFi: Symbolic Reasoning for File System Effects
Eric Zhao (Brown University)
Abstract

Shell programs are routinely used by software developers, scientists, system administrators, and many others to automate a variety of tasks. Regrettably, the shell’s complex semantics and the use of black-box components make authorship of shell programs highly error-prone. At best, buggy programs may panic unexpectedly; at worst, they mutate the file system and broader environment in catastrophic and irreversible ways. Yet, automated reasoning about the effects of realistic shell programs on the file system is frustrated by a rich set of features for describing file paths, including non-linear references, glob patterns, and symbolic variables.

In this talk, we outline the design of SyFi, a new Hoare-style logic for file system effects and an associated solving procedure. It is the first model to reason about symbolic variables, non-linear references, and globs altogether—and thus is able to automatically detect potentially catastrophic file manipulation errors in a broad subset of realistic shell programs. Our in-progress Rust implementation of SyFi powers the core of a novel static analyser for the POSIX shell, which we evaluate on an extensive collection of real-world shell scripts.
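
As a loose analogy only (nothing like SyFi's logic or implementation), the underlying hazard can be pictured as asking whether a path pattern containing a symbolic variable and a glob could ever cover a path we care about:

    # Illustrative sketch only: a toy over-approximation of which paths a
    # command like `rm -rf "$DIR"/*.log` might remove.
    from fnmatch import fnmatch

    def may_delete(pattern, path, possible_dirs):
        # `pattern` contains the symbolic variable $DIR; try every value the
        # analysis considers possible for it.
        return any(fnmatch(path, pattern.replace("$DIR", d))
                   for d in possible_dirs)

    print(may_delete("$DIR/*.log", "/var/log/syslog.log", ["/var/log", "/tmp"]))  # True
    print(may_delete("$DIR/*.log", "/etc/passwd", ["/var/log", "/tmp"]))          # False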

2:25 PM Lexical Effect Handlers, Directly
Cong Ma (University of Waterloo)
Abstract

Lexically scoping effect handlers is a language-design idea that equips algebraic effects with a modular semantics: it enables local-reasoning principles without giving up on the control-flow expressiveness that makes effect handlers powerful. However, existing implementations risk incurring costs akin to the run-time search for dynamically scoped handlers.

This talk will present a compilation strategy for lexical effect handlers, adhering to the lexical scoping principle and targeting a language with low-level control over stack layout. Key aspects of this approach are formalized and proven correct. We embody the ideas in a language called Lexa: the Lexa compiler translates high-level effect handling to low-level stack switching. We evaluate the Lexa compiler on a set of benchmarks; the results suggest that it generates efficient code, reducing running-time complexity from quadratic to linear in some cases.

2:50 PM Correctly Rounded Math Libraries Without Worrying about the Application’s Rounding Mode
Sehyeok Park (Rutgers University)
Abstract

The RLibm project has recently proposed methods to generate fast and correctly rounded math libraries. In particular, its method for generating a single implementation of an elementary function that produces correctly rounded results for multiple representations and rounding modes is appealing for developing fast reference libraries without double rounding issues. The key insight is to build polynomial approximations that produce correctly rounded results for a representation with two additional bits of precision compared to the largest target representation, using the non-standard round-to-odd rounding mode; this makes it innocuous to double round the RLibm result to any smaller target representation.
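
A small worked example may help make the round-to-odd insight concrete. This is a hand-rolled sketch on fixed-point integers, not RLibm's code: round-to-odd keeps a sticky low bit whenever any discarded bits were nonzero, and rounding through an intermediate format with two extra bits then agrees with rounding directly to the target.

    # Illustrative sketch only: round-to-odd on non-negative fixed-point
    # integers, plus a brute-force check that double rounding through it
    # matches direct rounding.
    def round_to_odd(value, drop_bits):
        # Drop `drop_bits` low bits; if anything nonzero was dropped, force
        # the lowest kept bit to 1 (the "sticky" odd bit).
        kept = value >> drop_bits
        if value & ((1 << drop_bits) - 1):
            kept |= 1
        return kept

    def round_to_nearest_even(value, drop_bits):
        # Standard round-to-nearest, ties-to-even (drop_bits >= 1).
        kept = value >> drop_bits
        rem = value & ((1 << drop_bits) - 1)
        half = 1 << (drop_bits - 1)
        if rem > half or (rem == half and kept & 1):
            kept += 1
        return kept

    # Target drops 6 bits; the intermediate drops only 4, i.e. keeps 2 extra.
    for v in range(1 << 12):
        direct = round_to_nearest_even(v, 6)
        via_odd = round_to_nearest_even(round_to_odd(v, 4), 2)
        assert direct == via_odd
    print("double rounding through round-to-odd is harmless on this range")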

3:15 PM Coffee Break
3:45 PM A Non-allocating Option in OCaml
Diana Kalinichenko (Jane Street)
Abstract

Functional programmers use data types like option and result to safely represent computations with possible errors. However, in languages like OCaml, these abstractions come with a performance cost. Every function that returns a value of type 'a option must use the Some constructor to represent successful results. This process involves additional memory allocation, which can eventually trigger latency-inducing garbage collection.

Is it possible to avoid this allocation? In general, no: if every possible return value already has a meaning, extra memory must be used to signal exceptional states. However, this is not the case in OCaml. Every OCaml value is either a pointer to a heap-allocated block or an immediate value with the bottom bit set to 1. This representation leaves one bit pattern unused: all 0s, which corresponds to the null pointer value.

This talk explores the design and implementation challenges of using the all-0, or null, value to denote error conditions while keeping strong type guarantees. We introduce a new optional type, or_null, with constructors This and Null. A key part of the design is distinguishing maybe-null types from non-null types. We have a prototype implementation in our branch of the OCaml compiler.

4:10 PM Inferring Pluggable Type Qualifiers for Legacy Programs
Martin Kellogg (New Jersey Institute of Technology)
Abstract

Pluggable type systems augment a host language with a set of type qualifiers that enable extended typechecking, e.g., to prevent null pointer exceptions or resource leaks. The main obstacle to developers adopting these tools in practice is writing annotations; type inference is the classical solution to this problem. However, classic type inference approaches make assumptions that do not hold when annotating legacy codebases with type qualifiers, such as that a valid typing exists. In this talk, I will give an overview of our work to address this problem, first describing several recent works that propose novel type qualifier inference strategies. Then, I will report on a direct comparison between three techniques that infer nullability annotations for Java programs. The biggest takeaway is that developers, while annotating their code, make semantic changes that make inference easier; as a result, many prior evaluations (including our own!) were unfairly biased in favor of the tools they evaluated, since they performed “type reconstruction” experiments using human-written annotations as ground truth for inference. Finally, I will discuss our ongoing and future work to address these problems by making semantically meaningful changes during inference, in the same ways that human developers do.

4:35 PM Formalizing Erlang's Success Typings: Proof of Reverse Type Safety
Elan Semenova (University of Maryland)
Abstract

Erlang, a dynamically typed programming language often used in distributed systems, features the idea of success types. Success types make an assertion dual to that of most type systems: if a program is ill-typed, then the program will not evaluate. Dialyzer, a static analysis tool widely used in practice, implements these types, but a theoretical basis for them has not yet been developed. This project focuses on rigorously specifying the type system and creating a proof of correctness for it.

We first build precise definitions of ill-typed and stuck terms on top of existing definitions. Because “stuck” has a different meaning for ill-typed abstractions, doing so requires a doubly step-indexed logical relation, the novel contribution of this work. We then use these definitions in a proof by induction to establish reverse type safety.

5:00 PM Closing Remarks