This run of NJPLS will be in-person only, with no accommodations for hybrid or remote participation. If circumstances change (e.g., new variants or university policy changes), we will simply cancel the event.

Per Stevens policy, masks will be required.

If you're driving to Hoboken and need parking, download and print a parking permit before you come. Park in the Babbio garage and walk to Howe. Please register by Friday, April 29!

Please distribute and post our flyer, too!

Program

9:00am–10:00am: Breakfast (catered)
10:00am–11:00am:
  Xiaodong Jia (Penn State)
  Konstantinos Kallas (UPenn)
  Peter Thiemann (Uni. Freiburg)
11:00am–12:00pm:
  Santosh Nagarakatte (Rutgers)
12:00pm–1:30pm: Lunch (catered)
1:30pm–2:30pm:
  Sebastian Wolff (NYU)
  Ryan Kavanagh (McGill)
  Adam Chen (Stevens)
2:30pm–3:00pm: Coffee break (catered)
3:00pm–4:00pm:
  Sankha Narayan Guria (UMD)
  Li-yao Xia (UPenn)
  Nisarg Patel (NYU)
4:00pm–4:15pm: Break
4:15pm–5:15pm:
  Harrison Goldstein (UPenn)
  Joshua Cohen (Princeton)
  Jacob Prinz (UMD)
5:15pm–5:30pm: Business Meeting
5:30pm: Drinks and dinner out on the town

Abstracts

A Verified Parser Generator for Visibly Pushdown Grammars

Xiaodong Jia (Penn State)
This talk presents a functional, derivative-based, efficient, and formally verified parser generator for visibly pushdown grammars. The generated parsers outperform many hand-crafted parsers at parsing JSON, XML, and HTML; the correctness and time complexity of the parsers are formally verified in Coq.
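
For intuition, here is the derivative-based approach in the much simpler regular-language setting, as a minimal OCaml sketch (classic Brzozowski derivatives, not the paper's visibly pushdown algorithm): differentiate the expression by each input symbol, then accept if the result is nullable.

    type re = Empty | Eps | Chr of char
            | Alt of re * re | Seq of re * re | Star of re

    let rec nullable = function
      | Empty | Chr _ -> false
      | Eps | Star _ -> true
      | Alt (a, b) -> nullable a || nullable b
      | Seq (a, b) -> nullable a && nullable b

    (* deriv c r matches exactly the strings s such that r matches c followed by s *)
    let rec deriv c = function
      | Empty | Eps -> Empty
      | Chr c' -> if Char.equal c c' then Eps else Empty
      | Alt (a, b) -> Alt (deriv c a, deriv c b)
      | Seq (a, b) ->
        let d = Seq (deriv c a, b) in
        if nullable a then Alt (d, deriv c b) else d
      | Star a as s -> Seq (deriv c a, s)

    let matches r s =
      nullable (String.fold_left (fun r c -> deriv c r) r s)

    let () = assert (matches (Seq (Chr 'a', Star (Chr 'b'))) "abbb")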

PaSh: Light-touch Parallelization for Shell Scripts

Konstantinos Kallas (UPenn)
This talk describes PaSh, a system for parallelizing shell scripts. PaSh addresses several challenges by proposing: (i) an order-aware dataflow model that captures a data-processing fragment of the shell, (ii) a set of provably correct dataflow transformations that extract parallelism in this fragment, (iii) a lightweight framework capturing the correspondence of shell commands and order-aware dataflow nodes, and (iv) a just-in-time compiler that enables sound and effective parallelization despite the dynamic nature of the shell. Applying PaSh to a variety of unmodified shell scripts achieves significant speedups (max: 33x, avg: 5.8x, on 64 cores), while remaining virtually indistinguishable from Bash, a result confirmed by the complete POSIX shell test suite. These results indicate that PaSh can be used as a drop-in replacement for any non-interactive shell use, providing significant speedups without risk of breakage. PaSh is open source and hosted by the Linux Foundation at https://github.com/binpash/.
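
To give a flavor of the data-parallel fragment PaSh targets, here is a minimal OCaml 5 sketch (names and structure are ours, not PaSh's): a stateless per-item stage, in the spirit of tr or grep, split into chunks that run in parallel and are reassembled in input order.

    (* requires OCaml 5 for the Domain module *)
    let par_map_chunks (f : 'a -> 'b) (chunks : 'a list list) : 'b list =
      chunks
      |> List.map (fun chunk -> Domain.spawn (fun () -> List.map f chunk))
      |> List.map Domain.join   (* joining in order preserves input order *)
      |> List.concat

    (* e.g., an uppercasing "stage" over two chunks of lines *)
    let () =
      par_map_chunks String.uppercase_ascii [["foo"; "bar"]; ["baz"]]
      |> List.iter print_endline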

Polymorphic Imperative Session Types

Peter Thiemann (Uni. Freiburg)
The predominant formalizations of session-typed communication are based either on process calculi or on linear lambda calculus. An alternative model, which treats sessions using context-sensitive typing and typestate, has been neglected due to its apparent restrictions. We call this model imperative because it treats communication channels like mutable variables, and it may look more familiar to imperative programmers.

We show that higher-order polymorphism and existential types are key to lifting the restrictions imposed by previous work. On this basis, we define PolyVGR, a system of polymorphic imperative session types, and establish its basic metatheory: type preservation and progress.
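
To illustrate the imperative flavor, here is a dynamic OCaml sketch of ours, in which runtime assertions stand in for the static typestate that PolyVGR provides: the channel is a mutable variable, and each operation advances its protocol state in place instead of returning a new channel.

    type state = SendInt | RecvBool | Closed

    type chan = {
      mutable st : state;   (* where we are in the protocol *)
      ints : int Queue.t;   (* stand-in transport *)
      bools : bool Queue.t;
    }

    let make () =
      { st = SendInt; ints = Queue.create (); bools = Queue.create () }

    let send_int c n =
      assert (c.st = SendInt);   (* protocol: first send an int... *)
      Queue.add n c.ints;
      c.st <- RecvBool

    let recv_bool c =
      assert (c.st = RecvBool);  (* ...then receive a bool *)
      c.st <- Closed;
      Queue.take c.bools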

KEYNOTE: A Case for Correctly Rounded Elementary Functions

Santosh Nagarakatte (Rutgers)
This talk will provide an overview of the RLIBM project, in which we are building a collection of correctly rounded elementary functions for multiple representations and rounding modes. Historically, polynomial approximations for elementary functions have been designed by approximating the real value of the function.

In contrast, in the RLIBM project we make a case for approximating the correctly rounded result of an elementary function rather than its real value. Once we approximate the correctly rounded result, there is an interval of real values around it such that producing any real value in this interval rounds to the correct result. This interval is the freedom that the polynomial approximation has for an input, and it is larger than the interval afforded by the minimax approach. Using these intervals, we structure the problem of generating polynomial approximations that produce correctly rounded results for all inputs as a linear programming problem. The results from the RLIBM project make a strong case for mandating correctly rounded results for any representation with 32 or fewer bits.
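
A worked sketch of that linear-programming formulation (notation ours, not RLIBM's): given sampled inputs x_i, each with a rounding interval [l_i, h_i], finding a degree-d polynomial P(x) = c_0 + c_1 x + ... + c_d x^d that rounds correctly on all of them amounts to

    \text{find } c_0, \dots, c_d
    \quad \text{subject to} \quad
    l_i \;\le\; \sum_{j=0}^{d} c_j x_i^{j} \;\le\; h_i
    \quad \text{for every sampled input } x_i,

where each constraint is linear in the unknown coefficients c_j, so an off-the-shelf LP solver applies.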

Bio: Santosh Nagarakatte is an Associate Professor and Undergraduate Program Director of Computer Science at Rutgers University. He obtained his PhD from the University of Pennsylvania in 2012. His research interests are in Hardware-Software Interfaces spanning Programming Languages, Compilers, Software Engineering, and Computer Architecture. His papers have been selected as IEEE MICRO Top Picks papers of computer architecture conferences in 2010 and 2013. He received the NSF CAREER Award in 2015, the ACM SIGPLAN PLDI 2015 Distinguished Paper Award, and the ACM SIGSOFT ICSE 2016 Distinguished Paper Award for his research on LLVM compiler verification. His papers have also been selected as a 2016 SIGPLAN Research Highlights Paper and a 2018 Communications of the ACM Research Highlights Paper. His PhD student David Menendez’s dissertation was awarded the John C. Reynolds ACM SIGPLAN Outstanding Dissertation Award in 2018. His papers on correctly rounded elementary functions have been recognized with the ACM SIGPLAN PLDI 2021 Distinguished Paper Award and the ACM SIGPLAN POPL 2022 Distinguished Paper Award.

A Concurrent Program Logic with a Future and History

Sebastian Wolff (NYU)
We devise a new separation logic geared towards automation. While local reasoning is known to be crucial for automation, we show how to retain this locality for (i) reasoning about inductive properties without the need for ghost code, and (ii) reasoning about computation histories with hindsight. We use our new logic to automatically verify challenging concurrent search structures, such as Harris' set.

Channel-Dependent Session Types

Ryan Kavanagh (McGill)
Session types provide a discipline for statically reasoning about communicating systems. However, we cannot presently use session types to reason about individual messages carried by a channel. This prevents us from capturing more precise invariants about communicating processes, in particular, invariants that constrain what a process may send based on what it has so far received. Fundamentally, the problem is that binary session types do not capture the flow of information through a system or the environment in which a channel is used.

In this talk, we present preliminary results on channel-dependent session types (CDSTs). CDSTs use a restricted form of concurrent type-level computation to capture a richer class of communication protocols than allowed by traditional binary session types. In particular, CDSTs can observe communications on other channels used by a process and then use these observations to restrict the communications permitted on session-typed channels. We show how to use CDSTs to specify communication protocols with various desirable causal and temporal properties.

Veracity: Declarative Multicore Programming with Commutativity

Adam Chen (Stevens)
There is an ongoing effort to provide programming abstractions that ease the burden of exploiting multicore hardware. Many programming abstractions (e.g., concurrent objects, transactional memory) simplify matters, but still involve intricate engineering. We argue that some of the difficulty of multicore programming can be ameliorated through a declarative programming style in which programmers directly express the independence of fragments of sequential programs.

In this talk, we will discuss our proposed paradigm, in which programmers write programs in a familiar, sequential manner, with the added ability to explicitly express the conditions under which code fragments commute. Putting such commutativity conditions into source code offers a new entry point for exploiting the known connection between commutativity and parallelism. We give sequential and concurrent semantics for commute statements, as well as a correctness condition called scoped serializability, and discuss how it can be enforced when combined with commutativity conditions.
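
As a toy analogue of the paradigm (a sketch of ours in OCaml 5, not Veracity syntax), a commutativity condition can decide at run time whether two fragments execute in parallel or fall back to the sequential order:

    let accounts = Array.make 2 100

    (* deposits commute exactly when they touch distinct accounts *)
    let commute_cond i j = i <> j

    let deposit i amt = accounts.(i) <- accounts.(i) + amt

    let run_pair (i, a) (j, b) =
      if commute_cond i j then begin
        (* condition holds: exploit commutativity as parallelism *)
        let d = Domain.spawn (fun () -> deposit i a) in
        deposit j b;
        Domain.join d
      end else begin
        (* condition fails: preserve the sequential semantics *)
        deposit i a;
        deposit j b
      end

    let () =
      run_pair (0, 10) (1, 20);
      Printf.printf "%d %d\n" accounts.(0) accounts.(1)  (* 110 120 *)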

We will next describe a technique for automatically synthesizing commute conditions directly from source code. We will demonstrate our new language Veracity, commutativity synthesis tools, and runtime implemented in Multicore OCaml on top of Servois and libcuckoo, a state-of-the-art concurrent hashtable. We will show how to verify and synthesize commute conditions for a variety of programs and the speedup that can be obtained from exploiting commutativity.

Absynthe: From Abstract Interpretation to Program Synthesis

Sankha Narayan Guria (UMD)
Tests that check program behavior on concrete inputs, combined with formal properties that reason about all possible behaviors, form an attractive way to specify program functionality. In this talk, we propose Absynthe, a tool that uses such specifications to guide program synthesis via the framework of abstract interpretation. Absynthe takes as input abstract domains and program analyzers that check a property of a program in those domains, and systematically yields a program synthesis tool. Abstract domains serve as a way to tune the precision of the analysis, which in turn directs how efficiently Absynthe searches the space of programs. Final correctness checking is left to running tests in a concrete interpreter. We evaluate Absynthe by synthesizing string-manipulation programs from the SyGuS benchmark and Pandas data frame manipulation programs in Python.
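
A mini instance of the idea, with hypothetical names rather than Absynthe's actual interface: candidates are pruned with an abstract string-length domain, and only the survivors are run in the concrete interpreter.

    type expr = Input | Lit of string | Concat of expr * expr

    (* abstract semantics: track only lengths *)
    let rec alen n = function
      | Input -> n
      | Lit s -> String.length s
      | Concat (a, b) -> alen n a + alen n b

    (* concrete semantics *)
    let rec eval x = function
      | Input -> x
      | Lit s -> s
      | Concat (a, b) -> eval x a ^ eval x b

    let synth input output =
      let atoms = [Input; Lit "!"; Lit ", world"] in
      let cands =
        atoms
        @ List.concat_map
            (fun a -> List.map (fun b -> Concat (a, b)) atoms) atoms
      in
      cands
      |> List.filter (fun e ->   (* cheap abstract prune *)
             alen (String.length input) e = String.length output)
      |> List.find_opt (fun e -> eval input e = output)  (* concrete test *)

    (* synth "hello" "hello, world" finds Concat (Input, Lit ", world") *)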

The lazy demands and the clairvoyant costs

Li-yao Xia (UPenn)
Laziness is a powerful tool for building efficient purely functional data structures, but it also makes it challenging to reason about performance rigorously. While laziness is conventionally modeled statefully—the "natural semantics"—recent work has shown that laziness can also be modeled non-deterministically—the "clairvoyant semantics"—leading to a logic for reasoning about the cost of lazy programs that is quite simple to formalize in a proof assistant such as Coq. Using that logic, this talk presents an approach to analyze the cost of lazy data structures, such as Okasaki's queues. To formalize the concept of amortization, a key idea is to explicitly represent the "demand" of a lazy function.
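
A minimal OCaml sketch of the phenomenon such a cost logic reasons about, under the assumption that one "tick" is charged per list element processed: cost follows demand.

    type 'a stream = 'a cell Lazy.t
    and 'a cell = Nil | Cons of 'a * 'a stream

    let cost = ref 0

    let rec of_list = function
      | [] -> lazy Nil
      | x :: xs -> lazy (Cons (x, of_list xs))

    let rec map f s =
      lazy (match Lazy.force s with
            | Nil -> Nil
            | Cons (x, s') -> incr cost; Cons (f x, map f s'))

    (* take demands only an n-element prefix *)
    let rec take n s =
      if n = 0 then []
      else match Lazy.force s with
        | Nil -> []
        | Cons (x, s') -> x :: take (n - 1) s'

    let () =
      let s = map (fun x -> x * x) (of_list [1; 2; 3; 4; 5; 6; 7; 8]) in
      ignore (take 3 s);
      Printf.printf "cost = %d\n" !cost  (* 3, not 8 *)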

Verifying Concurrent Multicopy Search Structures

Nisarg Patel (NYU)
Multicopy search structures such as log-structured merge (LSM) trees are optimized for high insert/update/delete (collectively known as upsert) performance. In such data structures, an upsert on key k adds the pair (k, v), where v may be a value or a tombstone, to the root node, even if k is already present in other nodes. Thus there may be multiple copies of k in the search structure. A search on k aims to return the value associated with the most recent upsert. We present a general framework for verifying linearizability of concurrent multicopy search structures that abstracts from the underlying representation of the data structure in memory, enabling proof reuse across diverse implementations. Based on our framework, we propose template algorithms for (a) LSM structures forming arbitrary directed acyclic graphs and (b) differential file structures, and formally verify these templates in the concurrent separation logic Iris. We also instantiate the LSM template to obtain the first verified concurrent in-memory LSM tree implementation.
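
A sketch of exactly this semantics (ours, far simpler than the verified templates): levels are ordered newest-first, an upsert goes to the root, and a search returns the most recent entry for the key.

    type 'a entry = Value of 'a | Tombstone

    type 'a lsm = (string * 'a entry) list list  (* head = root node *)

    let upsert k e = function
      | root :: rest -> ((k, e) :: root) :: rest
      | [] -> [[(k, e)]]

    let rec search k = function
      | [] -> None
      | level :: older ->
        (match List.assoc_opt k level with
         | Some (Value v) -> Some v
         | Some Tombstone -> None  (* a delete hides older copies *)
         | None -> search k older)

    let () =
      let t = [] |> upsert "a" (Value 1) |> upsert "a" (Value 2) in
      assert (search "a" t = Some 2)  (* the most recent upsert wins *)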

Reflecting on Random Generation

Harrison Goldstein (UPenn)
Frameworks like QuickCheck provide domain specific languages (DSLs) for writing generators, programs that randomly generate program inputs that are useful for testing. Generators provide lots of utility for testers, enabling thorough testing of functions with hard-to-satisfy preconditions, but it turns out that there is even more potential utility hiding in plain sight.

In this talk, I'll present reflective generators, a modified DSL for random generation that builds on ideas from bidirectional programming to improve workflows throughout the testing process. Reflective generators can do more than just sample input values: they can validate their own correctness, adapt their distributions based on user-specified examples, and even mutate test inputs while maintaining invariants. These new capabilities come at a relatively low cost and unlock state-of-the-art testing strategies that normally require entirely separate testing infrastructure.
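
Here is a sketch of the underlying bidirectional idea (hypothetical types of ours, not the paper's DSL): a reflective generator interprets a sequence of random choices into a value, and can also map a value back to choices that would produce it.

    type 'a refl = {
      generate : bool list -> 'a * bool list;  (* consume coin flips *)
      reflect : 'a -> bool list;               (* recover the flips *)
    }

    (* unary naturals: true means "one more", false means "stop" *)
    let nat : int refl = {
      generate =
        (let rec go n = function
           | true :: rest -> go (n + 1) rest
           | false :: rest -> (n, rest)
           | [] -> (n, [])
         in
         go 0);
      reflect = (fun n -> List.init n (fun _ -> true) @ [false]);
    }

    (* the round trip hints at the new capabilities: reflect-then-generate
       recovers the value (self-validation), and perturbing the reflected
       choices yields nearby values that stay in the support (mutation) *)
    let () = assert (fst (nat.generate (nat.reflect 5)) = 5)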

Verified Erasure Correction in Coq with MathComp and VST

Joshua Cohen (Princeton)
Most methods of data transmission and storage are prone to errors, leading to data loss. Error-correcting codes (ECCs) encode the data with redundant parity information so that the original data can be recovered even in the presence of a small number of errors. An erasure code uses an ECC to correct erasures: errors whose locations are known. There are dozens of classes of error-correcting and erasure codes, many based on sophisticated mathematics, making them difficult to verify using automated tools.
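
The simplest instance of the idea, far simpler than the Reed-Solomon code discussed below, is single-parity XOR: the parity block is the XOR of all data blocks, so any one erased block at a known position is the XOR of the parity with the surviving blocks. A minimal OCaml sketch:

    let xor = Array.map2 (lxor)

    let encode (blocks : int array array) =
      Array.fold_left xor (Array.make (Array.length blocks.(0)) 0) blocks

    (* reconstruct the block at index [erased] from parity + survivors *)
    let recover blocks parity erased =
      let acc = ref parity in
      Array.iteri (fun i b -> if i <> erased then acc := xor !acc b) blocks;
      !acc

    let () =
      let data = [| [| 1; 2 |]; [| 3; 4 |]; [| 5; 6 |] |] in
      let parity = encode data in
      assert (recover data parity 1 = [| 3; 4 |])  (* block 1 recovered *)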

In this talk, we present a formal, machine-checked proof of a C implementation of erasure correction based on Reed-Solomon coding. The C code has been actively used in network defenses for over 25 years, but the correctness of certain optimizations it uses was previously unknown. To prove this code correct, we had to both reason about the algorithm, which is based on finite fields, linear algebra, and polynomials, and prove that the C program correctly implements this algorithm. Both proofs are highly nontrivial, but they are quite different; we use Coq's Mathematical Components library for the former and the Verified Software Toolchain for the latter (after correcting a latent bug). This is the first end-to-end formal proof of a real-world erasure code, and the first time that MathComp and VST, two very different Coq libraries with conflicting types, tactics, and proof styles, have been used together.

Deeper Shallow Embeddings

Jacob Prinz (UMD)
Deep and shallow embeddings are two popular techniques for embedding a language in a host language with complementary strengths and weaknesses. In a deep embedding, embedded constructs are defined as data in the host: this allows for syntax manipulation and facilitates metatheoretic reasoning, but is challenging to implement—especially in the case of dependently typed embedded languages. In a shallow embedding, by contrast, constructs are encoded using features of the host: this makes them quite straightforward to implement, but limits their use in practice.

In this talk, we attempt to bridge the gap between the two by presenting a general technique for extending a shallow embedding of a type theory with a deep embedding of its typing derivations. Such embeddings are almost as straightforward to implement as shallow ones, but come with capabilities traditionally associated with deep ones. We demonstrate these increased capabilities in a number of case studies, including a DSL that admits only affine terms and a dependently typed core language with computational beta reduction that leverages function extensionality.
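
The baseline contrast, as a minimal OCaml sketch of ours (only the two standard styles are shown; the talk's technique then layers a deep embedding of typing derivations over the shallow style):

    (* Deep: syntax is data, so programs can be inspected, transformed,
       and reasoned about, at the cost of writing an interpreter. *)
    type expr = Lit of int | Add of expr * expr

    let rec eval = function
      | Lit n -> n
      | Add (a, b) -> eval a + eval b

    (* Shallow: constructs are ordinary host values, trivial to
       implement, but there is no syntax left to analyze. *)
    let lit n = n
    let add a b = a + b

    let () = assert (eval (Add (Lit 1, Lit 2)) = add (lit 1) (lit 2))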

Talk proposals

Talk proposals were due on Wednesday, April 6th. There were twenty on-time proposals.

Contact info

You can join the NJPLS mailing list to stay up-to-date on future events. If you have questions about this run of NJPLS at Stevens, contact Michael Greenberg.