Kevin Chen

Real estate is one of the hardest open problems in scaled self driving

2024-09-03T16:24:33-07:00

I’ve had a minor obsession with Waymo’s autonomous vehicle depots recently.

Over the past few months, I’ve flown a drone as part of a stakeout to understand how they work. And I’ve taken a deep dive into an apparent Waymo outage to find the company charging its electric vehicles from temporary diesel generators.

The reason for my obsession? I believe depot buildouts will be one of the last hard problems in scaled autonomous driving. Long after the hardware, software, and AI have been perfected, real estate acquisition will remain a limiting factor in large-scale AV deployment.

Waymo’s main depot at 201 Toland Street, San Francisco.

Will self driving follow software scaling laws?

In 2021, Elon Musk claimed that Tesla FSD’s release will be “one of the biggest asset value increases in history.”

The day FSD goes to wide release will be one of the biggest asset value increases in history
— Elon Musk (@elonmusk) October 20, 2021

Musk is arguing that, once autonomous driving has been solved, it can be instantly rolled out at the push of a button. Nearly all of Tesla’s fleet could be put to productive use without humans behind the wheel.

While Musk’s viewpoint is on the extreme end, it’s a sentiment shared by many who have worked on or invested in autonomous driving over the years. Once you have hardware capable of supporting safe driverless operation, it’s just a matter of developing the right software. Software can be replicated infinitely at zero marginal cost. Could autonomous driving therefore scale as quickly as software platforms like Uber or DoorDash?

The answer is not so simple. Self-driving cars are still cars — cars that exist in the physical world and need to be parked, fueled, cleaned, and repaired. Uber and other multi-sided marketplace platforms have been able to grow exponentially because they distribute these responsibilities to the individual drivers — many Uber drivers park at their own homes — allowing the platform provider to focus on developing the software pieces.

So far, AV companies like Waymo and Cruise have taken a different approach. They’ve preferred to centralize these operational tasks in large depots staffed with their own personnel. This is because AV technology is still maturing and cannot be easily productized in the short term. Additionally, Timothy B. Lee notes in Understanding AI that “having hardware, software, and support services all under one roof makes it easier for Waymo to experiment with different technologies and business models.” When the kinks are still being worked out, it is more straightforward to vertically integrate everything in a single organization.

The many jobs to be done of a robotaxi depot

Depots for human-driven fleets, such as rental cars or delivery vans, only require a parking area with minimal additional infrastructure. This enables a fairly straightforward trade-off between location and cost: the fleet operator seeks a location close to customer demand while minimizing rent. For example, a logistics company participating in Amazon’s Delivery Service Partner program can run its depot from any sufficiently cheap parking lot near the local Amazon warehouse.

The same constraints affect depot selection for autonomous vehicles. However, the depot also needs to be more than just a convenient parking lot to store off-duty cars.

Because AVs are often also EVs, the ideal site also has electric vehicle charging.
Because AVs need to upload driving logs to the cloud, it should have a high-speed Internet connection too.

Let’s explore these constraints in detail.

Location

Depots should be placed close to customer demand to minimize deadheading (non-revenue driving), which would raise costs while degrading the customer experience with longer pickup times. Ideal depots are therefore located in desirable residential or commercial areas, where there is more competition among potential tenants.

Placing depots in high-demand neighborhoods instead of industrial areas can also increase the probability of local opposition. Waymo has already encountered opposition during a proposed expansion of their main depot, even though it is located in an industrial neighborhood with many similar facilities. Again from Timothy B. Lee:

Waymo sought a permit to convert the warehouse next door into some office space and a parking lot for Waymo employees. San Francisco’s Board of Supervisors unanimously rejected Waymo’s application.

The rejection was partly based on fears that Waymo would eventually use the space to launch a delivery service in the city (Waymo hasn’t announced any plans to do this so far). But it also reflected city leaders’ frustration with their general lack of power over Waymo.

Now consider the recent incident in which driverless Waymo vehicles honked at each other while entering a depot near residential buildings in San Francisco, often well into the early hours of the morning. While residents and the company resolved the situation amicably, it will surely be raised in future discussions of new Waymo depots in residential areas should they come before the Board of Supervisors or Planning Commission.

Electric vehicle charging

AV developers have preferred to run their services with electric vehicles. Although AV and EV technologies are not inherently coupled, running a fully electric fleet adds an environmental angle to the AV sales pitch, allowing the companies to claim that AV rides reduce emissions by displacing gas-powered driving. An EV fleet also lowers vehicle maintenance costs.

Waymo and Cruise each have locations with DC fast charging capability. This approach avoids relying on public chargers. Taking Waymo’s primary San Francisco depot as an example, the company installed 38 chargers of approximately 60 kW each, implying a total site power of around 2.3 MW.

Waymo vehicles charging in San Francisco. Approximately one-third of parking spots in the main depot have charging.

Bringing in so many high-power chargers likely added significant complexity to Waymo’s depot construction. While we don’t know Waymo’s process, we have a fairly good benchmark from the Tesla community, which tracks Tesla Supercharger installations closely. From Bruce Mah, a seasoned EV charging observer, the construction process is:

As with any construction project, things usually start with selecting a site and permitting. There will often be some demolition / excavation of part of a parking lot (Superchargers are often built in existing parking lots). Tesla equipment such as charging cabinets, posts, etc. will usually be installed next (see T1 below). Eventually there will be some inspections from the local Authority Having Jurisdiction (AHJ). A utility transformer (from PG&E, SCE, etc.) is usually the last piece of equipment to be installed. Repaving, painting, and installation of parking stops will also usually happen late in the process, as well as landscaping and lighting enhancements.

Of these steps, permitting and utility work are not within the charging operator’s control. California municipalities, especially San Francisco, have a notoriously slow and political permitting process. With PG&E, the utility serving much of the state, electrical service upgrades involving a new distribution transformer can take months.

Timelines aside, building out a charging site is also expensive. For example, an agreement between Tesla and the City of West Hollywood values an eight-plug location at $482,942 for both equipment and construction.

Data offload

The final piece of the puzzle is data offload. Autonomous vehicles log vast amounts of data as they drive, measured in hundreds of GBs to TBs per hour.

Some of the data is subject to mandatory retention and must be uploaded for later review. At a minimum, all AV collisions in California must be reported to the DMV. Regulators at all levels of government expect the AV developer to present analyses of serious incidents, including recordings from the vehicle and explanations of the AV’s decisions.
In addition to regulatory requirements, the AV developer often wants to return much more data for engineering purposes: near misses, stuck events, novel or interesting scenarios, and more.

The upshot is that the AV operator needs to upload a substantial portion of the hundreds of GBs to TBs logged per hour of driving. Uploading over cellular networks would not be cost effective. These transfers must occur at a depot.

Today, it’s likely that Waymo and Cruise use disk swapping for data offload. When a car fills up its internal logging disk, it notifies an operator to plug in a fresh one. The full disks may be uploaded directly from the depot or shipped to a datacenter. This whole process is labor intensive and, over time, may pose a reliability concern due to dust or water ingress.

A Waymo operator performs a possible disk swap.

Many AV developers are moving toward direct data transfer from the vehicle using Ethernet, Wi-Fi, or a private 5G network, which reduces the number of manual touch points and moving parts. Charging is a great time to perform these transfers. However, this imposes an additional requirement on the depot: a fast upload speed, probably a fiber connection of at least 10 Gbps.
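To get a feel for what that upload requirement means, here is a back-of-the-envelope sketch. The fleet size, driving hours, log volume per hour, retained fraction, and link efficiency are all made-up assumptions for illustration, not figures from Waymo or Cruise.

```python
def upload_hours(data_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to upload `data_gb` gigabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of the line rate achieved in practice.
    """
    gigabits = data_gb * 8
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600


# Hypothetical fleet: 100 cars, 10 driving hours per car per day,
# 500 GB logged per driving hour, 20% of the log kept for upload.
daily_gb = 100 * 10 * 500 * 0.20
hours = upload_hours(daily_gb, link_gbps=10)
print(f"{daily_gb:,.0f} GB/day needs about {hours:.1f} h on a 10 Gbps link")
```

Under these assumptions, a single 10 Gbps link would need roughly 28 hours to move one day of logs, so it would fall behind. The exact numbers matter less than the shape of the problem: upload capacity has to be sized against fleet-wide log volume, not per-car volume.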

Where do we go from here?

When we put all three requirements together (great location, high-power EV charging, and high-speed Internet), there may be few or no sites that fit the bill. This would require the AV operator to take on site-specific construction projects to add amenities like charging and Internet, a strategy that sits in direct opposition to rapid and cost-effective scaling.

Another possibility is to engineer ways to relax the constraints.

Decoupling the requirements

Waymo and Cruise do not require all of their locations to have charging and data offload. For example, Waymo operates satellite lots in downtown San Francisco only for storing their off-hail vehicles. Every night, fleet management software instructs the cars to travel back to the main depot for charging and data offload.

A Waymo satellite location in San Francisco with minimal staffing, no charging, and apparently no data offloading.

This solution works as long as the total charging and data transfer capacity across all locations exceeds the average throughput required to keep the fleet in working order. However, the lack of redundancy can lead to cascading failures, such as the apparent power outage at Waymo’s main depot that led the company to shut down many vehicles during a Friday evening rush hour.
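The capacity condition can be written down as a simple energy balance. This is only a sketch: the fleet size, energy use per car, and the idea that chargers are usable around the clock are assumptions I picked for illustration. The 2,280 kW figure is just 38 chargers at roughly 60 kW each.

```python
def daily_energy_margin_kwh(site_power_kw, fleet_size, kwh_per_car_per_day,
                            usable_hours=24.0):
    """Daily charging supply minus fleet demand, in kWh.

    A negative result means the fleet cannot be kept charged.
    """
    supply = sum(site_power_kw.values()) * usable_hours
    demand = fleet_size * kwh_per_car_per_day
    return supply - demand


# Hypothetical network: one depot with chargers, one satellite lot without.
sites = {"main_depot": 2280.0, "satellite_lot": 0.0}
print(daily_energy_margin_kwh(sites, fleet_size=300, kwh_per_car_per_day=60.0))

# Simulate an outage at the only site that has charging.
outage = dict(sites, main_depot=0.0)
print(daily_energy_margin_kwh(outage, fleet_size=300, kwh_per_car_per_day=60.0))
```

With every charger in one place, the margin goes from comfortably positive to fully negative the moment that site loses power. There is no second location to absorb the load.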

Reducing charging power

Waymo and Cruise currently use DC fast charging (DCFC) for their fleets. Level 2 (L2, also known as AC) charging could reduce the cost of buildouts because the equipment is cheaper and can often be installed without bringing in a new utility transformer. This could enable overnight charging in satellite locations that do not currently have any charging capacity. Imagine an operator arriving at night to plug in all the cars when there is little demand, then returning in the morning to unplug them before customers wake up.

There is an order of magnitude speed difference between L2 and DCFC.

This is important for consumer charging, where the consumer cares about the time to get a single car back on the road.
However, charging power for any individual car becomes less important when charging a large fleet. Fleet operators care about the total throughput of turning around cars, which is proportional to total power delivered across all chargers.

In other words, assuming an autonomous ride hailing service will always have overnight lulls in demand and enough parking spots during those times, the most scalable strategy is to procure your desired total charging power at the lowest price.

DCFC equipment costs disproportionately more per kW due to the additional complexity of the charging equipment — and that doesn’t include the additional maintenance complexity. The table below compares ChargePoint’s cheapest L2 and DCFC units:

Charger	Power (kW)	Price ($)	Unit Price ($/kW)
ChargePoint CPF50	9.6 kW	$1,299	$135/kW
ChargePoint CPE250	62.5 kW	$52,000	$832/kW
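
To make the comparison concrete, this sketch reuses the two rows of the table and asks what it would cost in equipment alone to reach a given total charging power. The 1 MW target is an arbitrary assumption, and installation, utility work, and maintenance are not modeled.

```python
import math

# Power and price figures are taken from the table above.
chargers = {
    "ChargePoint CPF50 (L2)": {"power_kw": 9.6, "price_usd": 1_299},
    "ChargePoint CPE250 (DCFC)": {"power_kw": 62.5, "price_usd": 52_000},
}


def equipment_cost(target_kw, power_kw, price_usd):
    """Return (units needed, total equipment price) to reach target_kw."""
    units = math.ceil(target_kw / power_kw)
    return units, units * price_usd


for name, spec in chargers.items():
    unit_price = spec["price_usd"] / spec["power_kw"]
    units, cost = equipment_cost(1_000, **spec)
    print(f"{name}: ${unit_price:,.0f}/kW, {units} units, ${cost:,} for 1 MW")
```

Reaching 1 MW takes 105 L2 units (about $136,000) or 16 DCFC units ($832,000). The catch is that the L2 option also needs 105 parking spots with a car plugged in at each one, which is why it only works if there are enough spots and a long enough lull in demand.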

In addition to more scalable depot buildouts, reduced charging power can also improve the longevity of the vehicle’s traction battery, which is an important factor in managing vehicle depreciation.

Reducing data logging rate

Most AV developers start out by logging and uploading all data generated on their vehicles. This makes development easy because the data is always there when you need it. This assumption needs to be revisited when a growing fleet generates proportionally more log data, most of which contains routine driving and is not very interesting.

We can split the data logged by AVs into two categories:

Raw sensor data, such as lidar point clouds, camera images, radar returns, and audio.
Derived data, such as detections from the perception system or motion plans from the behavior system.

One approach is to keep only one category of data. Retaining only the derived data can still enable debugging of serious incidents, as long as the perception system can be trusted to provide a faithful representation of the raw sensor data. On the other hand, retaining only the raw sensor data makes the logs more useful for developing the mapping and perception system. An approximation of the derived data can be generated by running a replay simulator as needed, but it is challenging to reproduce the exact same outputs as those on the vehicle unless the AV software is fully deterministic.

Data retention decisions can also be made temporally. The key challenge here is high-recall classification of which time ranges in the log must be retained. For example, if a DMV-reportable collision occurs, the associated log data must never be discarded. These decisions can happen either on-device or in the cloud, but they must be made without uploading the full log to the cloud, since our bottleneck is the connection from the vehicle to the Internet.
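
As a minimal sketch of the temporal approach, assume some on-device detector has already produced timestamps for events worth keeping (collisions, hard braking, and so on). That detector is the hard, high-recall part and is not shown here. The padding values are arbitrary assumptions. What remains is turning the events into a set of time ranges to retain:

```python
def retention_ranges(event_times, before_s=30.0, after_s=30.0):
    """Pad each event with context and merge overlapping windows.

    Returns a sorted list of (start, end) ranges, in seconds, to keep.
    """
    windows = sorted((t - before_s, t + after_s) for t in event_times)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical event timestamps, in seconds since the start of the log.
print(retention_ranges([100, 120, 500], before_s=30, after_s=30))
```

Everything outside the returned ranges is a candidate for deletion before upload. Erring toward longer padding trades bandwidth for a lower chance of discarding something that later turns out to matter.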

Conclusion

The current trajectory for scaled autonomous driving would require desirable depot locations to include charging and Internet, making real estate acquisition challenging. There exist opportunities to reduce the additional requirements over time with the goal of making the problem closer to “rent a bunch of conveniently located parking lots.” While these are not traditionally considered autonomous driving problems, solving them will be key to unlocking the next phase of scaling.

Large language models are a sustaining innovation for Siri

2024-06-11T11:37:00-07:00

Many people assume that large language models (LLMs) will disrupt existing consumer voice assistants. While today’s ChatGPT is largely unable to complete real-world tasks like hailing an Uber, it’s far better than Siri at understanding and generating language, especially in response to novel requests.

From Tom’s Hardware, this captures the sentiment I see among tech commentators:

GPT-4o will enable ChatGPT to become a legitimate Siri competitor, with real-time conversations via voice that are responded to instantly without lag time. […] ChatGPT’s new real-time responses make tools like Siri and Echo seem lethargic. And although ChatGPT likely won’t be able to schedule your haircuts like Google Assistant can, it did put up admirable real-time translating chops to challenge Google.

Last year, there were rumors that OpenAI was working on its own hardware, which would open the possibility of integrating ChatGPT at the system level along the lines of the Humane Ai Pin. Would such a product be able to mount a successful challenge against Siri, Alexa, and Google Assistant?

After Apple’s WWDC keynote yesterday and seeing the updated Siri APIs, I think it’s more likely that LLMs are a sustaining innovation for Siri — a technological innovation that strengthens the position of incumbent voice assistants.

Apple promised to increase the number of ways in which Siri can take action in apps. Source: Apple.

Traditional voice assistants work by matching the user’s queries to a fixed set of intents. Pierce Freeman explains the general approach:

The previous generation of personal assistants had control logic that was largely hard-coded. They revolved around the idea of an intent — a known task that a user wanted to do like sending a message, searching for weather, etc. Detecting this intent might be keyword based or trained into a model that converts a sequence to a one-hot class space. But generally speaking there were discrete tasks and the job of the NLU pipeline was to delegate it to sub-modules.

Once the query has been matched to an intent, the next step is to “fill in the blanks” for any inputs needed by the intent:

If it believes you’re looking for weather, a sub-module would attempt to detect what city you’re asking about. This motivated a lot of the research into NER (named entity recognition) to detect the more specific objects of interest and map them to real world quantities. city:San Francisco and city:SF to id:4467 for instance.

Conversational history was implemented by keeping track of what the user had wanted in previous steps. If a new message is missing some intent, it would assume that a previous message in the flow had a relevant intent. This process of back-detecting the relevant intent was mostly hard-coded or involved a shallow model.
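
Here is a toy sketch of the pipeline Freeman describes: keyword-based intent detection, a lookup table standing in for NER, and a fallback to the previous intent for conversational history. The intent names, keyword lists, and function names are invented for illustration and are not how Siri or any real assistant is implemented. The city ID comes from the example in the quote.

```python
import re

# Hypothetical intents and trigger keywords.
INTENT_KEYWORDS = {
    "get_weather": ["weather", "forecast", "rain"],
    "send_message": ["text", "message", "tell"],
}

# Stand-in for NER: map surface forms to a canonical entity ID.
CITY_IDS = {"san francisco": 4467, "sf": 4467}


def match_intent(query):
    words = set(re.findall(r"[a-z]+", query.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words.intersection(keywords):
            return intent
    return None


def extract_city(query):
    text = query.lower()
    # Try longer aliases first so "san francisco" wins over shorter ones.
    for alias in sorted(CITY_IDS, key=len, reverse=True):
        if re.search(rf"\b{re.escape(alias)}\b", text):
            return CITY_IDS[alias]
    return None


def handle(query, previous_intent=None):
    # Fall back to the previous turn's intent if none is detected.
    intent = match_intent(query) or previous_intent
    slots = {}
    if intent == "get_weather":
        slots["city_id"] = extract_city(query)
    return intent, slots


print(handle("What's the weather in SF?"))
print(handle("How about San Francisco?", previous_intent="get_weather"))
print(handle("Tell me a joke"))
```

The first two queries resolve to the weather intent with the same city ID. The third is routed to send_message simply because it contains the word "tell", which is the kind of misfire a hard-coded matcher produces on requests it was not designed for.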

A natural outcome is that Apple is forced to develop an expansive and complicated system of intents because it is the only way to expand the assistant’s capabilities. In 2016, Apple also allowed third-party developers to integrate their apps’ functionality by providing intents through SiriKit. Once the developer defines the inputs and outputs, the intents could appear in Siri, the Shortcuts app, proactive notifications, etc. alongside first-party intents by Apple. Similar frameworks exist on other platforms: Android App Actions and Alexa Skills.

However, no matter how rich the intent library becomes, the overall user experience can still suffer if access is gated by a brittle intent-matching process: either (1) matching to the incorrect intent, or (2) after matching the correct intent, parsing the request parameters incorrectly. In my opinion, the intent matching stage is the primary source of users’ frustration with Siri.

Incorrect named entity recognition by Siri.

Contrast this with ChatGPT plugins, a similar system that allows the model to interact with external APIs by determining which plugin might be relevant to the user’s request, then reading the plugin’s API specification to determine the input and output parameters. In other words, the intent matching is performed by an LLM. The generalist nature of LLMs seems to reduce brittleness. For example, when using the code interpreter plugin, the model can write arbitrary Python code and fix resulting runtime errors.

The main issue for challengers (OpenAI, Humane, and Rabbit) is the lack of third-party integrations to make their assistants helpful in consumers’ digital lives, extending beyond general knowledge tasks. For example:

The Humane Ai Pin only streams music from Tidal, not Spotify or Apple Music.
The Rabbit R1 “large action model” is, in reality, just a few handwritten UI automation scripts for the applications shown in their demo. The system does not appear to generalize to unseen applications.
- In general, while companies working on UI agents have shown some limited demos, I’m not aware of any that run with high reliability and scale. Even if they achieve scale and generalization, their agents could be made less reliable by app developers using anti-scraping techniques because the developers prefer to own the customer relationship, or as leverage for future partnership negotiations. This type of system is probably a few years out at a minimum.

Without a large user base, developers have no incentive to port their apps, leaving the integration work to the platform owner, as in the cases of Humane and Rabbit. Meanwhile, Apple, Amazon, and Google each have a pre-existing app ecosystem. Their position as aggregators means developers are highly motivated to access the enormous installed base of iOS, Alexa, and Android devices.

Assuming LLM technology will become a commodity, the incumbents’ in-house LLMs ought to be able to provide decent intent matching and language skills. Combined with an expansive library of intents, it seems very possible that integrating LLMs will cement the incumbent voice assistants as the dominant platforms. Challengers might be better off building products in areas that don’t depend on third-party integrations for key functionality.

How autonomous vehicle simulation works

2024-06-06T18:57:40-07:00

When autonomous vehicle developers justify the safety of their driverless vehicle deployments, they lean heavily on their testing in simulation. Common talking points take the form of “we made our car drive X billion miles in simulation.” From these vague statements, it’s challenging to determine what a simulator is, or how it works.

There’s more to simulation than endless driving in a virtual environment.

For example, Waymo’s technology overview page says (emphasis mine):

We’ve driven more than 20 billion miles in simulation to help identify the most challenging situations our vehicles will encounter on public roads. We can either replay and tweak real-world miles or build completely new virtual scenarios, for our autonomous driving software to practice again and again.

Cruise’s safety page contains similar language:¹

Before setting out on public roads, Cruise vehicles complete more than 250,000 simulations and closed course testing during everyday and extreme conditions.

The main impression one gets from these overviews is that (1) simulation can test many driving scenarios, and (2) everyone will be super impressed if you use it a lot.

Going one layer deeper to the few blog posts and talks full of slick GIFs, you might reach the conclusion that simulation is like a video game for the autonomous vehicle in the vein of Grand Theft Auto (GTA): a fully generated 3D environment complete with textures, lighting, and non-player characters (NPCs). Much like human players of GTA, the autonomous vehicle would be able to drive however it likes, freed from real-world consequences.

Source: Cruise.

While this type of fully synthetic simulation exists in the world of autonomous driving, it’s actually the least commonly used type of simulation.²

Instead, just as a software developer leans on many kinds of testing before releasing an application, an AV developer runs many types of simulation before deploying an autonomous vehicle. Each type of simulation is best suited for a particular use case, with trade-offs between realism, coverage, technical complexity, and cost to operate.

In this post, we’ll walk through the system design of a simulator at a hypothetical AV company, starting from first principles.

We may never know the details of the actual simulator architecture used by any particular AV developer. However, by exploring the design trade-offs from first principles, I hope to shed some light on how this key system works.

Our imaginary self-driving car
Replay simulation
- Interactivity and the pose divergence problem
Synthetic simulation
Hybrid simulation
Conclusion

Our imaginary self-driving car

Let’s begin by defining our hypothetical autonomous driving software, which will help us illustrate how simulation fits into the development process.

Imagine it’s 2015, the peak of self-driving hype, and our team has raised a vast sum of money to develop an autonomous vehicle. Like a human driver, our software drives by continuously performing a few basic tasks:

It makes observations about the road and other road users.
It reasons about what others might do and plans how it should drive.
Finally, it executes those planned motions by steering, accelerating, and braking.
Rinse and repeat.

This mental model helps us group related code into modules, enabling them to be developed and tested independently. There will be four modules in our system:³

Sensor Interface: Take in raw sensor data such as camera images and lidar point clouds.
Sensing: Detect objects such as vehicles, pedestrians, lane lines, and curbs.
Behavior: Determine the best trajectory (path) for the vehicle to drive.
Vehicle Interface: Convert the trajectory into steering, accelerator, and brake commands to control the vehicle’s drive-by-wire (DBW) system.

We connect our modules to each other using an inter-process communication framework (“middleware”) such as ROS, which provides a publish–subscribe system (pubsub) for our modules to talk to each other.

Here’s a concrete example of our module-based encapsulation system in action:

The sensing module publishes a message containing the positions of other road users.
The behavior module subscribes to this message when it wants to know whether there are pedestrians nearby.

The behavior module doesn’t know and doesn’t care how the sensing module detected those pedestrians; it just needs to see a message that conforms to the agreed-upon API schema.

Defining a schema for each message also allows us to store a copy of everything sent through the pubsub system. These driving logs will come in handy for debugging because they allow us to inspect the system with module-level granularity.
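
To make the idea concrete, here is a toy in-process pubsub bus that records every message it delivers. It is a stand-in I wrote for illustration, not the ROS API. The topic name and message fields are hypothetical.

```python
import time
from collections import defaultdict


class PubSub:
    """Minimal publish-subscribe bus that logs every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # entries are (timestamp, topic, message)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message, timestamp=None):
        stamp = time.time() if timestamp is None else timestamp
        self.log.append((stamp, topic, message))
        for callback in self._subscribers[topic]:
            callback(message)


bus = PubSub()

# The behavior module subscribes without knowing who publishes.
seen_by_behavior = []
bus.subscribe("/sensing/road_users", seen_by_behavior.append)

# The sensing module publishes a detection.
bus.publish("/sensing/road_users",
            {"type": "pedestrian", "x": 12.0, "y": 3.5}, timestamp=0.1)

print(seen_by_behavior)
print(bus.log)
```

Because the log is captured at the bus rather than inside any one module, it contains the inputs and outputs of every module with timestamps.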

Our full system looks like this:

Simplified architecture diagram for an autonomous vehicle.

Now it’s time to take our autonomous vehicle for a spin. We drive around our neighborhood, encountering some scenarios in which our vehicle drives incorrectly, which cause our in-car safety driver to take over driving from the autonomous vehicle. Each disengagement gets reviewed by our engineering team. They analyze the vehicle’s logs and propose some software changes.

Now we need a way to prove our changes have actually improved performance. We need the ability to compare the effectiveness of multiple proposed fixes. We need to do this quickly so our engineers can receive timely feedback. We need a simulator!

Replay simulation

Motivated by the desire to make progress quickly, we try the simplest solution first. The key insight: our software modules don’t care where the incoming messages come from. Could we simulate a past scenario by simply replaying messages from our log as if they were being sent in real time?

As the name suggests, this is exactly how replay simulation works.

Under normal operation, the input to our software is sensor data captured from real sensors. The simulator replaces this by replaying sensor data from an existing log.
Under normal operation, the output of our software is a trajectory (or a set of accelerator and steering commands) that the real car executes. The simulator intercepts the output to control the simulated vehicle’s position instead.

Modified architecture diagram for running replay simulation.

There are two primary ways we can use this type of simulator, depending on whether we run the same software version as the on-road drive or a different one:

Different software: By running modified versions of our modules in the simulator, we can get a rough idea of how the changes will affect the vehicle’s behavior. This can provide early feedback on whether a change improves the vehicle’s behavior or successfully fixes a bug.
Same software: After a disengagement, we may want to know what would have happened if the autonomous vehicle were allowed to continue driving without human input. Simulation can provide this counterfactual by continuing to play back messages as if the disengagement never happened.

We’ve gained these important testing capabilities with relatively little effort. Rather than take on the complexity of a fully generated 3D environment, we got away with a few modifications to our pubsub framework.
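
A replay loop along these lines can be sketched in a few lines. The log format, topic names, and the "fixed" behavior function are all hypothetical. The point is only that the module under test receives logged messages as if they were live, while the outputs recorded on the road are ignored.

```python
def replay(log, input_topics, handler):
    """Feed logged messages on `input_topics` to `handler` in time order.

    Returns the handler's outputs as (timestamp, output) pairs.
    """
    outputs = []
    for stamp, topic, message in sorted(log, key=lambda entry: entry[0]):
        if topic in input_topics:
            outputs.append((stamp, handler(topic, message)))
    return outputs


# Hypothetical log: (timestamp, topic, message).
log = [
    (0.0, "/sensing/road_users", {"pedestrian_distance_m": 40.0}),
    (0.1, "/behavior/trajectory", {"target_speed_mps": 10.0}),  # on-road output
    (0.2, "/sensing/road_users", {"pedestrian_distance_m": 12.0}),
]


def new_behavior(topic, message):
    # Hypothetical fix under test: slow down for nearby pedestrians.
    slow = message["pedestrian_distance_m"] < 15.0
    return {"target_speed_mps": 3.0 if slow else 10.0}


result = replay(log, {"/sensing/road_users"}, new_behavior)
print(result)
```

Swapping `new_behavior` for the version that ran on the road gives the "same software" counterfactual. Swapping in a modified version gives early feedback on a proposed fix.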

Interactivity and the pose divergence problem

The simplicity of a pure replay simulator also leads to its key weakness: a complete lack of interactivity. Everything in the simulated environment was loaded verbatim from a log. Therefore, the environment does not respond to the simulated vehicle’s behavior, which can lead to unrealistic interactions with other road users.

This classic example demonstrates what can happen when the simulated vehicle’s behavior changes too much:

Dragomir Anguelov’s guest lecture at MIT. Source: Lex Fridman.

Our vehicle, when it drove in the real world, was where the green vehicle is. Now, in simulation, we drove differently and we have the blue vehicle.

So we’re driving…bam. What happened?

Well, there is a purple agent over there — a pesky purple agent — who, in the real world, saw that we passed them safely. And so it was safe for them to go, but it’s no longer safe, because we changed what we did.

So the insight is: in simulation, our actions affect the environment and needed to be accounted for.

Anguelov’s video shows the simulated vehicle driving slower than the real vehicle. This kind of problem is called pose divergence, a term that covers any simulation where differences in the simulated vehicle’s driving decisions cause its position to differ from the real-world vehicle’s position.

In the video, the pose divergence leads to an unrealistic collision in simulation. A reasonable driver in the purple vehicle’s position would have observed the autonomous vehicle and waited for it to pass before entering the intersection.⁴ However, in replay simulation, all we can do is play back the other driver’s actions verbatim.

In general, problems arising from the lack of interactivity mean the simulated scenario no longer provides useful feedback to the AV developer. This is a pretty serious limitation! The whole point of the simulator is to allow the simulated vehicle to make different driving decisions. If we cannot trust the realism of our simulations anytime there is an interaction with another road user, it rules out a lot of valuable use cases.
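
One simple way a replay simulator could flag this condition is to compare the logged and simulated positions at each timestamp and report when they drift apart. This is a sketch under my own assumptions: poses are reduced to 2D positions in meters, and the 1 m threshold is arbitrary.

```python
import math


def pose_divergence(logged, simulated):
    """Distance in meters between logged and simulated positions per timestamp."""
    shared = sorted(logged.keys() & simulated.keys())
    return {t: math.dist(logged[t], simulated[t]) for t in shared}


def first_divergence(logged, simulated, threshold_m=1.0):
    """Earliest timestamp at which divergence exceeds the threshold, if any."""
    for t, distance in pose_divergence(logged, simulated).items():
        if distance > threshold_m:
            return t
    return None


# Hypothetical poses: the simulated vehicle drives slower than the logged one.
logged = {0.0: (0.0, 0.0), 1.0: (10.0, 0.0), 2.0: (20.0, 0.0)}
simulated = {0.0: (0.0, 0.0), 1.0: (9.5, 0.0), 2.0: (17.0, 0.0)}

print(pose_divergence(logged, simulated))
print(first_divergence(logged, simulated))
```

Past the flagged timestamp, the replayed road users are reacting to a vehicle that is no longer where they saw it, so any interaction after that point should be treated with suspicion.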

Synthetic simulation

We can solve these interactivity problems by using a simulated environment to generate synthetic inputs that respond to our vehicle’s actions. Creating a synthetic simulation usually starts with a high-level scene description containing:

Agents: fully interactive NPCs that react to our vehicle’s behavior.
Environments: 3D models of roads, signs, buildings, weather, etc. that can be rendered from any viewpoint.

From the scene description, we can generate different types of synthetic inputs and inject them at different layers of our vehicle’s software stack, depending on which modules we want to test.
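
A scene description might be as simple as a small data structure. The field names, the map name, and the `runs_red_light` policy label below are all hypothetical. Real scene formats carry far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    kind: str        # e.g. "vehicle" or "pedestrian"
    start_xy: tuple  # starting position in meters
    policy: str      # name of a hypothetical reactive behavior model


@dataclass
class Scene:
    map_name: str
    weather: str = "clear"
    agents: list = field(default_factory=list)


# A scenario that would be hard to collect on purpose in the real world.
red_light_runner = Scene(
    map_name="four_way_intersection",
    agents=[Agent("vehicle", (30.0, -4.0), "runs_red_light")],
)
print(red_light_runner)
```

The same description can then feed different generators: one that renders sensor data for the full stack, or one that emits object detections directly for testing the behavior module alone.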

In synthetic sensor simulation, the simulator uses a game engine to render the scene description into fake sensor data, such as camera images, lidar point clouds, and radar returns. The simulator sets up our software modules to receive the generated imagery instead of sensor data logged from real-world driving.

Modified architecture diagram for running synthetic simulation with generated sensors.

The same game engine can render the scene from any arbitrary perspective, including third-person views. This is how they make all those slick highlight reels.

The high cost of realistic imagery

Simulations that generate fake sensor data can be quite expensive, both to develop and to run. The developer needs to create a high-quality 3D environment with realistic object models and lighting rivaling AAA games.

Example of Cruise’s synthetic simulation showing the same scene rendered into synthetic camera, lidar, and radar data. Source: Cruise.

For example, a Cruise blog post mentions some elements of their synthetic simulation roadmap (emphasis mine):

With limited time and resources, we have to make choices. For example, we ask how accurately we should model tires, and whether or not it is more important than other factors we have in our queue, like modeling LiDAR reflections off of car windshields and rearview mirrors or correctly modeling radar multipath returns.

Even if rendering reflections and translucent surfaces is already well understood in computer graphics, Cruise may still need to make sure their renderer generates realistic reflections that resemble what their lidar records. This challenge gives a sense of the attention to detail required. It’s only one of many that need to be solved when building a synthetic sensor simulator.

So far, we have only covered the high development costs. Synthetic sensor simulation also incurs high variable costs every time simulation is run.

Round-trip conversions to pixels and back

By its nature, synthetic sensor simulation performs a round-trip conversion to and from synthetic imagery to test the perception system. The game engine first renders its scene description to synthetic imagery for each sensor on the simulated vehicle, burning many precious GPU-hours in the process, only to have the perception system perform the inverse operation when it detects the objects in the scene to produce the autonomous vehicle’s internal scene representation.⁵ Every time you launch a synthetic sensor simulation, NVIDIA, Intel, and/or AWS are laughing all the way to the bank.

Despite its expense, testing the perception system with synthetic simulation is arguably less effective than testing with real-world imagery paired with ground truth labels. With real imagery, there can be no question about its realism. Synthetic imagery never looks quite right.

These practical limitations mean that synthetic sensor simulation ends up as the least used simulator type in AV companies. Usually, it’s also the last type of simulator to be built at a new company. Developers don’t need synthetic imagery most of the time, especially when they have at their disposal a fleet of vehicles that can record the real thing.

On the other hand, we cannot easily test risky driving behavior in the real world. For example, it is better to synthesize a bunch of red light runners than try to find them in the real world. This means we are primarily using synthetic simulation to test the behavior system.

Skipping the sensor data

In synthetic agent simulation, the simulator uses a high-level scene description to generate synthetic outputs from the perception/sensing system. In software development terms, it’s like replacing the perception system with a mock to focus on testing downstream components.

This type of simulation requires fewer computational resources to run because the scene description doesn’t need to make a round-trip conversion to sensor data.
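To make the mock analogy concrete, here is a minimal sketch. `MockPerception`, `TrackedObject`, and the toy `plan_speed` planner are all made-up names for illustration; a real stack would have far richer interfaces.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    object_id: str
    x: float      # meters ahead of our vehicle
    speed: float  # m/s relative to our vehicle (negative means closing in)


class MockPerception:
    """Stands in for the real perception system: emits tracked objects
    straight from the scene description, with no images or point clouds."""

    def __init__(self, scene_objects):
        self._objects = scene_objects

    def detect(self, t):
        # Move each object at constant relative speed for t seconds.
        return [TrackedObject(o.object_id, o.x + o.speed * t, o.speed)
                for o in self._objects]


def plan_speed(tracked_objects, cruise_speed=15.0, min_gap=10.0):
    """Toy downstream component under test: stop if anything is within
    min_gap meters ahead, otherwise cruise."""
    ahead = [o for o in tracked_objects if 0.0 <= o.x <= min_gap]
    return 0.0 if ahead else cruise_speed


perception = MockPerception([TrackedObject("stalled_car", x=40.0, speed=-15.0)])
for t in (0.0, 1.0, 2.0):
    print(t, plan_speed(perception.detect(t)))
```

The planner never learns that its inputs were not derived from pixels, which is exactly the point of the mock.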

Modified architecture diagram for running synthetic simulation with generated agents.

With image quality out of the picture, the value of synthetic simulation rests solely on the quality of the scenarios it can create. We can split this into two main challenges:

designing agents with realistic behaviors
generating the scene descriptions containing various agents, street layouts, and environmental conditions

Making smart agents

You could start developing the control policy for a smart agent the same way NPCs were designed in early video games.

A basic smart agent could simply follow a line or a path without reacting to anyone else, which could be used to test the autonomous vehicle’s reaction to a right of way violation.
A fancier smart agent could follow a path while also maintaining a safe following distance from the vehicle in front. This type of agent could be placed behind our simulated vehicle, resolving the rear-ending problem mentioned above.
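Both kinds of agent fit in a few lines if we reduce the road to a single dimension. This is only a sketch: the class names and the headway and acceleration limits are assumptions picked for the example, not values from a production simulator.

```python
from dataclasses import dataclass


@dataclass
class PathFollower:
    """Basic agent: drives along a 1D path at constant speed, ignoring everyone."""
    position: float  # meters along the path
    speed: float     # m/s

    def step(self, dt, lead_position=None):
        self.position += self.speed * dt


@dataclass
class GapKeeper(PathFollower):
    """Fancier agent: also slows down to keep a time gap behind a lead vehicle."""
    time_gap: float = 2.0    # desired headway in seconds (assumed)
    max_speed: float = 15.0  # m/s (assumed)
    max_decel: float = 4.0   # m/s^2 (assumed)
    max_accel: float = 2.0   # m/s^2 (assumed)

    def step(self, dt, lead_position=None):
        if lead_position is not None:
            gap = lead_position - self.position
            # Speed at which the current gap equals the desired headway.
            target = min(self.max_speed, max(0.0, gap / self.time_gap))
            # Move toward the target speed within the acceleration limits.
            delta = target - self.speed
            delta = max(-self.max_decel * dt, min(self.max_accel * dt, delta))
            self.speed += delta
        self.position += self.speed * dt


ego_position = 50.0  # our simulated vehicle, stopped in the lane
runner = PathFollower(position=0.0, speed=10.0)
tailgater = GapKeeper(position=0.0, speed=10.0, max_speed=10.0)
for _ in range(300):  # 30 seconds at dt = 0.1 s
    runner.step(0.1, lead_position=ego_position)
    tailgater.step(0.1, lead_position=ego_position)

print(f"path follower ended at {runner.position:.1f} m (drove through the ego)")
print(f"gap keeper ended at {tailgater.position:.1f} m (stopped behind the ego)")
```

The path follower is useful precisely because it misbehaves, while the gap keeper is the one you want behind the simulated vehicle when it brakes earlier than the logged vehicle did.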

Like an audience of demanding gamers, the users of our simulator quickly expect increasingly complex and intelligent behaviors from the smart agents. An ideal smart agent system would capture the full spectrum of every action that other road users could possibly take. This system would also generate realistic behaviors, including realistic-looking trajectories and reaction times, so that we can trust the outcomes of simulations involving smart agents. Finally, our smart agents need to be controllable: they can be given destinations or intents, enabling developers to design simulations that test specific scenarios.

Two Cruise simulations in which smart agents (orange boxes) interact with the autonomous vehicle. In the second simulation, two parked cars have been inserted into the bottom of the visualization. Notice how the smart agents and the autonomous vehicle drive differently in the two simulations as they interact with each other and the additional parked cars. Source: Cruise.

Developing a great smart agent policy ends up falling in the same difficulty ballpark as developing a great autonomous driving policy. The two systems may even share technical foundations. For example, they may have a shared component that is trained to predict the behaviors of other road users, which can be used for both planning our vehicle’s actions and for generating realistic agents in simulation.

Generating scene descriptions

Even with the ability to generate realistic synthetic imagery and realistic smart agent behaviors, our synthetic simulation is not complete. We still need a broad and diverse dataset of scene descriptions that can thoroughly test our vehicle.

These scene descriptions usually come from a mix of sources:

Automatic conversion from onroad scenarios: We can write a program that takes a logged real-world drive, guesses the intent of other road users, and stores those intents as a synthetic simulation scenario.
Manual design: Analogous to a level editor in a video game. A human either builds the whole scenario from scratch or makes manual edits to an automatic conversion. For example, a human can design a scenario based on a police report of a human-on-human-driver collision to simulate what the vehicle might have done in that scenario.
Generative AI: Recent work from Zoox uses diffusion models trained on a large dataset of onroad scenarios.

Example of a real-world log (top) converted to a synthetic simulation scenario, then rendered into synthetic camera images (bottom). Notice how some elements, such as the protest signs, are not carried over, perhaps because they are not supported by the perception system or the scene converter. Source: Cruise.

Scenarios can also be fuzzed, where the simulator adds random noise to the scene parameters, such as the speed limit of the road or the goals of simulated agents. This can upsample a small number of converted or manually designed scenes to a larger set that can be used to check for robustness and prevent overfitting.

Fuzzing can also help developers understand the space of possible outcomes, as shown in the example below, which fuzzes the reaction time of a synthetic tailgater:

An example of fuzzing tailgater reaction time. Source: Waymo.

The distribution on the right shows a dot for each variant of the scenario, colored green or red depending on whether a simulated collision occurred. In this experiment, the collision becomes unavoidable once the simulated tailgater’s reaction time exceeds about 1 second.
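A toy version of this experiment is easy to write down. The sketch below assumes both vehicles start at the same speed and brake equally hard, so the only thing that varies is the tailgater's reaction time. The gap and speed are hypothetical values picked so the threshold lands near 1 second; they are not Waymo's parameters.

```python
import random


def tailgater_collides(reaction_time, gap=20.0, speed=20.0):
    """Both vehicles start at `speed` (m/s), separated by `gap` (m). The lead
    brakes to a stop and the tailgater brakes equally hard after
    `reaction_time` seconds. With equal braking, the tailgater travels
    speed * reaction_time farther than the lead, so a collision occurs when
    that extra distance exceeds the gap."""
    return speed * reaction_time > gap


def fuzz_reaction_time(n_variants=200, low=0.2, high=2.0, seed=0):
    """Generate scenario variants by sampling the reaction time uniformly."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_variants):
        reaction_time = rng.uniform(low, high)
        results.append((reaction_time, tailgater_collides(reaction_time)))
    return results


results = fuzz_reaction_time()
safe = [t for t, hit in results if not hit]
crashed = [t for t, hit in results if hit]
print(f"{len(crashed)}/{len(results)} variants collide")
print(f"slowest safe reaction time: {max(safe):.2f} s")
```

A real fuzzer would run each variant through the full simulator rather than a closed-form check, but the output has the same shape: one pass/fail dot per variant.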

Limitations of pure synthetic simulation

With these sources plus fuzzing, we’ve ensured the quantity of scenarios in our library, but we still don’t have any guarantees on the quality.

Perhaps the scenarios we (and maybe our generative AI tools) invent are too hard or too easy compared to the distribution of onroad driving our vehicle encounters.

If our vehicle drives poorly in a synthetic scenario, does the autonomous driving system need improvement? Or is the scenario unrealistically hard, perhaps because the behavior of its smart agents is too unreasonable?
If our vehicle passes with flying colors, is it doing a good job? Or is the scenario library missing some challenging scenarios simply because we did not imagine that they could happen?

This is a fundamental problem of pure synthetic simulation. Once we start modifying and fuzzing our simulated scenarios, there isn’t a straightforward way to know whether they remain representative of the real world. And we still need to collect a large quantity of real-world mileage to ensure that we have not missed any rare scenarios.

Hybrid simulation

We can combine our two types of simulator into a hybrid simulator that takes advantage of the strengths of each, providing an environment that is both realistic and interactive without breaking the bank.

From replay simulation, use log replay to ensure every simulated scenario is rooted in a real-world scenario and has perfectly realistic sensor data.
From synthetic simulation, make the simulation interactive by selectively replacing other road users with smart agents if they could interact with our vehicle.⁶
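One simple way to decide which road users "could interact" is a proximity check against the simulated vehicle's path. This is an assumed heuristic for illustration only; a production system would likely use richer criteria such as lane relationships or predicted conflicts.

```python
import math


def select_interactive_agents(ego_path, logged_tracks, radius=15.0):
    """Return ids of logged road users that come within `radius` meters of
    our simulated vehicle at the same timestep. These get replaced by smart
    agents; everyone else is replayed exactly as logged."""
    interactive = set()
    for agent_id, track in logged_tracks.items():
        for (ex, ey), (ax, ay) in zip(ego_path, track):
            if math.hypot(ex - ax, ey - ay) <= radius:
                interactive.add(agent_id)
                break
    return interactive


# Positions sampled once per second, in meters (made-up data).
ego_path = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
logged_tracks = {
    "tailgater": [(-12.0, 0.0), (-2.0, 0.0), (8.0, 0.0), (18.0, 0.0)],
    "far_truck": [(0.0, 200.0), (10.0, 200.0), (20.0, 200.0), (30.0, 200.0)],
}
smart = select_interactive_agents(ego_path, logged_tracks)
replayed = set(logged_tracks) - smart
print("smart agents:", sorted(smart))
print("replayed from log:", sorted(replayed))
```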

Modified architecture diagram merging parts of replay and synthetic simulation.

Hybrid simulation usually serves as the default type of simulation that works well for most use cases. One convenient interpretation is that hybrid simulation is a worry-free replacement for replay simulation: anytime the developer would have used replay, they can absentmindedly switch to hybrid simulation to take care of the most common simulation artifacts while retaining most of the benefits of replay simulation.

Conclusion

We’ve seen that there are many types of simulation used in autonomous driving. They exist on a spectrum from purely replaying onroad scenarios to fully synthesized environments. The ideal simulation platform allows developers to pick an operating point on that spectrum that fits their use case. Hybrid simulation based on a large volume of real-world miles satisfies most testing needs at a reasonable cost, while fully synthetic modes serve niche use cases that can justify the higher development and operating costs.

Cruise has written several deep dives about the usage and scaling of their simulation platform. However, neither Cruise nor Waymo provides many details on the construction of their simulator. ↩
I’ve even heard arguments that it’s only good for making videos. ↩
There exist architectures that are more end-to-end. However, to the best of my knowledge, those systems do not have driverless deployments with nontrivial mileage, making simulation testing less relevant. ↩
Another interactivity problem arises from the replay simulator’s inability to simulate different points of view as the simulated vehicle moves. A large pose divergence often causes the simulated vehicle to drive into an area not observed by the vehicle that produced the onroad log. For example, a simulated vehicle could decide to drive around a corner much earlier. But it wouldn’t be able to see anything until the log data also rounds the corner. No matter where the simulated vehicle drives, it will always be limited to what the logged vehicle saw. ↩
“Computer vision is inverse computer graphics.” ↩
As a nice bonus, because the irrelevant road users are replayed exactly as they drove in real life, this may reduce the compute cost of simulation. ↩

Why autonomous trucking is harder than autonomous rideshare

2024-01-10T15:22:43-08:00

Recently, The Verge asked, “where are all the robot trucks?”

It’s a good question.

Trucking was supposed to be the ideal first application of autonomous driving. Freeways contain predictable, highly structured driving scenarios. An autonomous truck would not have to deal with the complexities of intersections and two-way traffic. It could easily drive hundreds of miles without encountering a single pedestrian.

DALL-E 3 prompt: “Generate an artistic, landscape aspect ratio watercolor painting of a truck with a bright red cab, pulling a white trailer. The truck drives uphill on an empty, rural highway during wintertime, lined with evergreen trees and a snow bank on a foggy, cloudy day.”

The trucks could also be commercially viable with only freeway driving capability, or freeways plus a short segment of surface streets needed to reach a transfer hub. The AV company would only need to deal with a limited set of businesses as customers, bypassing the messiness of supporting a large pool of consumers inherent to the B2C model.

Autonomous trucks would not be subject to rest requirements. As The Verge notes, “truck operators are allowed to drive a maximum of 11 hours a day and have to take a 30-minute rest after eight consecutive hours behind the wheel. Autonomous trucks would face no such restrictions,” enabling them to provide a service that would be literally unbeatable by a human driver.

If you had asked me in 2018, when I first started working in the AV industry, I would’ve bet that driverless trucks would be the first vehicle type to achieve a million-mile driverless deployment. Aurora even pivoted their entire company to trucking in 2020, believing it to be easier than city driving.

Yet sitting here in 2024, we know that both Waymo and Cruise have driven millions of miles on city streets — a large portion in the dense urban environment of San Francisco — and there are no driverless truck deployments. What happened?

I think the problem is that driverless autonomous trucking is simply harder than driverless rideshare.

The trucking problem appears easier at the outset, and indeed many AV developers quickly reach their initial milestones, giving them false confidence. But the difficulty ramps up sharply when the developer starts working on the last bit of polish. They encounter thorny problems related to the high speeds on freeways and trucks’ size, which must be solved before taking the human out of the driver’s seat.

What is the driverless bar?

Here’s a simplistic framework:

No driver in the vehicle.
No guarantee of a timely response from remote operators or backend services.
Therefore, all safety-critical decisions must be made by the onboard computer alone.
Under these constraints, the system still meets or exceeds human safety level.

This is a really, really high bar. For example, on surface streets, this means the system on its own is capable of driving at least 100k miles without property damage and 40M miles without fatality.¹

The system can still have flaws, but virtually all of those problems must result in a lack of progress, rather than collision or injury. In short, while the system may not know the right thing to do in every scenario, it should never do the wrong thing.

(There are several high quality safety frameworks for those interested in a rigorous definition.²³ It’s beyond the scope of this post.)

Now, let’s look at each aspect of trucking to see how it exacerbates these challenges.

Truck-specific challenges

Stopping distance vs. sensing range

The required sensor capability for an autonomous vehicle is determined by the most challenging scenario that the vehicle needs to handle. A major challenge in trucking is stopping behind a stalled vehicle or large debris in a travel lane. To avoid collision, the autonomous vehicle would need a sensing range greater than or equal to its stopping distance.

We’ll make a simplifying assumption that stopping distance defines the minimum detection range requirements. A driverless-quality perception system needs perfect recall on other vehicles within the vehicle’s worst-case stopping distance.

Passenger vehicles can decelerate up to –8 m/s². Trucks can only achieve around –4 m/s², which increases the stopping distance and puts the sensing range requirement right at the edge of what today’s sensors can deliver.

Here are the sight stopping distances for an empty truck in dry conditions on roads of varying grade:⁴

Speed (mph)	0% Grade (m)	–3% Grade (m)	–6% Grade (m)
50	115–141	124–150	136–162
70	122–178	136–162	236–305

Sight stopping distance is defined as the distance needed to stop, assuming a 2.5-second reaction time with no braking followed by maximum braking. The distances are computed for an empty truck in dry conditions on roads of varying grade. Stopping distance increases in wet weather or when driving downhill with a load (not shown).
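For a back-of-the-envelope check, the definition above can be written as reaction distance plus constant-deceleration braking distance. This is a simplified sketch, not the method used in the cited paper: it assumes a single constant deceleration (the 4 m/s² and 8 m/s² figures quoted earlier) and models grade as a simple reduction in effective deceleration.

```python
MPH_TO_MPS = 0.44704
G = 9.81  # gravitational acceleration, m/s^2


def stopping_distance_m(speed_mph, decel=4.0, reaction_time=2.5, grade=0.0):
    """Distance covered during the reaction time plus constant-deceleration
    braking. `grade` is rise over run, negative for downhill, which reduces
    the effective deceleration."""
    v = speed_mph * MPH_TO_MPS
    effective_decel = decel + G * grade
    if effective_decel <= 0:
        raise ValueError("vehicle cannot stop on this grade")
    return v * reaction_time + v ** 2 / (2 * effective_decel)


for mph in (50, 70):
    truck = stopping_distance_m(mph, decel=4.0)
    car = stopping_distance_m(mph, decel=8.0)
    print(f"{mph} mph: truck {truck:.0f} m, car {car:.0f} m")
```

At 50 mph on flat ground, this simple model gives a truck stopping distance of roughly 118 m, inside the 115–141 m range in the table. The same reaction time is applied to the passenger car purely so the two numbers are comparable.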

Now let’s compare these distances with the capabilities of various sensors:

Lidar sensors provide trustworthy 3D data because they take direct measurements based on physical principles. They have a usable range of around 200–250 meters, plenty for city driving but not enough for every truck use case. Lidar detection models may also need to accumulate multiple scans/frames over time to detect faraway objects reliably, especially for smaller items like debris, further decreasing the usable detection range.

Note that some solid-state lidars claim significantly more range than 250 meters. These numbers are collected under ideal conditions; for computing minimum sensing capability, we are interested in the range that can provide perfect recall and really great precision. For example, the lidar may be unable to reach its maximum range over the entire field of view, or may require undesirable trade-offs like a scan pattern that reduces point density and field of view to achieve more range.

Radar can see farther than lidar. For example, this high-end ZF radar claims vehicle detections up to 350 meters away. Radar is great for tracking moving vehicles, but has trouble distinguishing between stationary vehicles and other background objects. Tesla Autopilot has infamously shown this problem by braking for overpasses and running into stalled vehicles. “Imaging” radars like the ZF device will do better than the radars on production vehicles. They still do not have the azimuth resolution to separate objects beyond 200 meters, where radar input is most needed.

Cameras can detect faraway objects as long as there are enough pixels on the object, which leads to the selection of cameras with high resolution and a narrow field of view (telephoto lens). A vehicle will carry multiple narrow cameras for full coverage during turns. However, cameras cannot measure distance or speed directly.

A combined camera + radar system using machine learning probably has the best chance here, especially with recent advances in ML-based early fusion, but would need to perform well enough to serve as the primary detection source beyond 200 meters. Training such a model is closer to an open problem than simply receiving that data from a lidar.
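To see why long-range detection pushes camera selection toward telephoto lenses, we can estimate the number of pixels an object covers under a pinhole camera model. The sensor width and fields of view below are hypothetical, chosen only to show the trade-off.

```python
import math


def pixels_on_target(object_width_m, distance_m, image_width_px, hfov_deg):
    """Approximate horizontal pixels covering an object under a pinhole
    camera model, for an object near the image center."""
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * object_width_m / distance_m


# Hypothetical cameras: same 3840-pixel-wide sensor, different lenses.
for name, hfov in (("wide, 120 degrees", 120.0), ("telephoto, 30 degrees", 30.0)):
    px = pixels_on_target(1.8, 300.0, 3840, hfov)
    print(f"{name}: {px:.0f} px across a 1.8 m wide car at 300 m")
```

Narrowing the field of view buys pixels on distant objects at the cost of coverage, which is why a vehicle ends up carrying several narrow cameras.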

In summary, we don’t appear to have any sensing solutions with the performance needed for trucks to meet the driverless bar.

Controls

Controlling a passenger vehicle — determining the amount of steering and throttle input to make the vehicle follow a trajectory — is a simpler problem than controlling a truck. For example, passenger vehicles are generally modeled as a single rigid body, while a truck and its trailer can move separately. The planner and controller need to account for this when making sharp turns and, in extreme low-friction conditions, to avoid jackknifing.

These features come in addition to all the usual controls challenges that also apply to passenger vehicles. They can be built but require additional development and validation time.

Freeway-specific challenges

OK, so trucks are hard, but what about the freeway part? It may now sound appealing to build L4 freeway autonomy for passenger vehicles. However, driving on freeways also brings additional challenges on top of what is needed for city streets.

Achieving the minimal risk condition on freeways

Autonomous vehicles are supposed to stop when they detect an internal fault or driving situation that they can’t handle. This is called the minimal risk condition (MRC). For example, an autonomous passenger vehicle that detects an error in the HD map or a sensor failure might be programmed to execute a pullover or stop in lane depending on the problem severity.

While MRC behaviors are annoying for other road users and embarrassing for the AV developer, they do not add undue risk on surface streets given the low speeds and already chaotic nature of city driving. This gives the AV developer more breathing room (within reason) to deploy a system that does not know how to handle every driving scenario perfectly, but knows enough to stay out of trouble.

It’s a different story on the freeway. Stopping in lane becomes much more dangerous with the possibility of a rear-end collision at high speed. All stopping should be planned well in advance, ideally exiting at the next ramp, or at least driving to the closest shoulder with enough room to park.

This greatly increases the scope of edge cases that need to be handled autonomously and at freeway speeds. For example:

Scene understanding: If the vehicle encounters an unexpected construction zone, crash site, or other non-nominal driving scenario, it’s not enough to detect and stop. Rerouting, while a viable option on surface streets, usually isn’t an option on freeways because it may be difficult or illegal to make a u-turn by the time the vehicle can see the construction. A freeway under construction is also more likely to be the only path to the destination, especially if the autonomous vehicle in question is not designed to drive on city streets.

Operational solutions are also not enough for a scaled deployment. AV developers often disallow their vehicles from routing through known problem areas gathered from manually driven scouting vehicles or announcements made by authorities. For a scaled deployment, however, it’s not reasonable to know the status of every mile of road at all times.

Therefore, the system needs to find the right path through unstructured scenarios, possibly following instructions from police directing traffic, even if it involves traffic violations such as driving on the wrong side of the road. We know that current state-of-the-art autonomous vehicles still occasionally drive into wet concrete and trenches, which shows it is nontrivial to make a correct decision.

Mapping: If the lane lines have been repainted, and the system normally uses an HD map, it needs to ignore the map and build a new one on-the-fly from the perception system’s output. It needs to distinguish between mapping and perception errors.

Uptime: Sensor, computer, and software failures need to be virtually eliminated through redundancy and/or engineering elbow grease. The system needs almost perfect uptime. For example, it’s fine to enter a max-braking MRC when losing a sensor or restarting a software module on surface streets, provided those failures are rare. The same maneuver would be dangerous on the freeway, so the failure must be eliminated, or a fallback/redundancy developed.

These problems are not impossible to overcome. Every autonomous passenger vehicle has solved them to some extent, with the remaining edge cases punted to some combination of MRC and remote operators. The difference is that, on freeways, they need to be solved with a very high level of reliability to meet the driverless bar.

Freeways are boring

The features that make freeways simpler — controlled access, no intersections, one-way traffic — also make “interesting” events more rare. This is a double-edged sword. While the simpler environment reduces the number of software features to be developed, it also increases the iteration time and cost.

During development, “interesting” events are needed to train data-hungry ML models. For validation, each new software version to be qualified for driverless operation needs to encounter a minimum number of “interesting” events before comparisons to a human safety level can have statistical significance. Overall, iteration becomes more expensive when it takes more vehicle-hours to collect each event.

AV developers can only respond by increasing the size of their operations teams or accepting more time between software releases. (Note that simulation is not a perfect solution either. The rarity of events increases vehicle-hours run in simulation, and so far, nobody has shown a substitute for real-world miles in the context of driverless software validation.)
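A standard zero-failure bound gives a feel for the mileage involved. If events are assumed to follow a Poisson process, the chance of seeing no events in n miles at a rate of one per m miles is exp(-n / m), so claiming a rate no worse than the benchmark at confidence C takes at least -ln(1 - C) × m event-free miles. The sketch below plugs in the rough surface-street benchmarks quoted earlier; the function name and the 95% confidence level are my own choices.

```python
import math


def failure_free_miles_needed(human_miles_per_event, confidence=0.95):
    """Miles that must be driven with zero events to claim, at the given
    confidence, that the event rate is no worse than the human benchmark.
    Assumes a Poisson process: P(0 events in n miles) = exp(-n / m)."""
    return -math.log(1.0 - confidence) * human_miles_per_event


for label, miles_per_event in (("property damage", 100e3), ("fatality", 40e6)):
    needed = failure_free_miles_needed(miles_per_event)
    print(f"{label}: {needed / 1e6:.1f}M event-free miles")
```

Every software release that needs requalification resets part of this clock, which is why rarer events translate directly into slower iteration.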

Is it ever going to happen?

Trucking requires longer range sensing and more complex controls, increasing system complexity and pushing the problem to the bleeding edge of current sensing capabilities. At the same time, driving on freeways brings additional reliability requirements, raising the quality bar on every software component from mapping to scene understanding.

If both the truck form factor and the freeway domain increase the level of difficulty, then driverless trucking might be the hardest application of autonomous driving:

	City	Freeway
Cars	Baseline	Harder
Trucks	Harder	Hardest

Now that scaled rideshare is mostly working in cities, I expect to see scaled freeway rideshare next.

Does this mean driverless trucking will never happen? No, I still believe AV developers will overcome these challenges eventually. Aurora, Kodiak, and Gatik have all promised some form of driverless deployment by the end of the year. We probably won’t see anything close to a million-mile deployment in 2024 though. Getting there will require advances in sensing, machine learning, and a lot of hard work.

Thanks to Steven W. and others for the discussions and feedback.

This should be considered a bare minimum because humans perform much better on freeways, raising the bar for AVs. Rough numbers taken from Table 3, passenger vehicle national average on surface streets: Scanlon, J. M., Kusano, K. D., Fraade-Blanar, L. A., McMurry, T. L., Chen, Y. H., & Victor, T. (2023). Benchmarks for Retrospective Automated Driving System Crash Rate Analysis Using Police-Reported Crash Data. arXiv preprint arXiv:2312.13228. (blog) ↩
Kalra, N., & Paddock, S. M. (2016). Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94, 182-193. ↩
Favaro, F., Fraade-Blanar, L., Schnelle, S., Victor, T., Peña, M., Engstrom, J., … & Smith, D. (2023). Building a Credible Case for Safety: Waymo’s Approach for the Determination of Absence of Unreasonable Risk. arXiv preprint arXiv:2306.01917. (blog) ↩
Computed from tables 1 and 2: Harwood, D. W., Glauz, W. D., & Mason, J. M. (1989). Stopping sight distance design for large trucks. Transportation Research Record, 1208, 36-46. ↩

How Cruise vehicles return to the garage autonomously in heavy rain

2023-01-16T20:54:09-08:00

Cruise doesn’t carry passengers in heavy rain. The operational design domain (ODD) in their CPUC permit (PDF) only allows services in light rain.

I’ve always wondered how they implement this operationally. For example, Waymo preemptively launches all cars with operators in the driver’s seat anytime there’s rain in the forecast. Cruise has no such policy: I have never seen them assign operators to customer-facing vehicles.

Yet Cruise claims to run up to 100 driverless vehicles concurrently. It would be impractical to dispatch a human driver to each vehicle whenever it starts raining. When the latest atmospheric river hit San Francisco, I knew it was my chance to find out how it worked.

Monitoring the Cruise app

As the rain intensified, as expected, all cars disappeared from Cruise’s app and the weather pause icon appeared.

But then something unusual happened. The app returned to its normal state. A few cars showed up near a hole in the geofence — and they were actually hailable.

Visiting the garage

I drove over to find that this street is the entrance to one of Cruise’s garages. The same location has been featured in Cruise executives’ past tweets promoting the service.¹

Despite the heavy rain and gusts strong enough to blow my hat/jacket off, a steady stream of Cruise vehicles were returning themselves to the garage in driverless mode.

Driverless Cruise vehicles enter the garage during heavy rain.

A member of Cruise’s operations team enters the vehicle to drive it into the garage.

In total, I observed:

8 driverless vehicles
1 manually driven vehicle
1 support vehicle (unmodified Chevy Bolt not capable of autonomous driving)

Two vehicles skip the garage

After the first six driverless vehicles returned, the next two kept driving past the garage. I followed them in my own car. They drove for about 16 minutes, handling large puddles and road spray without noticeable comfort issues. Eventually they looped back to the garage and successfully entered.

A Cruise vehicle drives through a puddle during its detour.

I’m not totally sure what happened here. I can think of two reasonable explanations:

Boring: The cars missed the turn for some unknown reason.
Exciting: Cruise has implemented logic to avoid overwhelming the operations team’s ability to put cars back in the garage. If there are too many vehicles waiting to return, subsequent cars take a detour to kill time instead of blocking the driveway.

Key take-aways

Cruise is capable of handling heavy rain in driverless mode.
The majority of Cruise vehicles returned to the garage autonomously. This enables them to handle correlated events, such as rain, without deploying a large operations team.
Cruise may have implemented “take a lap around the block” logic to avoid congestion at the garage entrance.

~~I can’t find the timelapse of Cruise launching their driverless cars anymore. I’m pretty sure it was posted to Twitter. Please let me know if you have the link!~~ Update: Link to tweet by @kvogt. ↩

Inferring Cruise occupancy from Kyle Vogt’s fleet dashboard screenshot

2023-01-04T19:56:15-08:00

Background

Last month, Kyle Vogt (CEO of Cruise) tweeted a screenshot of Cruise’s fleet management dashboard:

The press: AVs are over-hyped and are still 5-10 years away.

Us: At this moment there are 100 @Cruise AVs in driverless mode in SF, and many are currently carrying passengers.

(from two nights ago) pic.twitter.com/rV6gWRgK0q
— Kyle Vogt (@kvogt) December 1, 2022

Eagle-eyed commenters noticed there were several types of vehicle icons and wondered whether they had any meaning.

Full-resolution image from Kyle’s tweet.

Cruise’s letter to the CPUC

The answer has been hiding in plain sight. Two weeks later, on December 16, 2022, Cruise submitted an Advice Letter (PDF, 23 MB) to the California Public Utilities Commission. They were requesting an expansion of Cruise’s driverless deployment permit.

Page 48 contains a screenshot of Cruise’s fleet monitoring dashboard:

6.3. Fleet Monitoring and Learning

Cruise continuously monitors its driverless fleet while it is in operation. Cruise uses a suite of internal tools to oversee its fleet of Cruise AVs, including information about each Cruise AV on the road, such as their current location, operating condition and passenger states.

Figure 24: Cruise internal fleet monitoring tool (provided as example; actual may vary)

The key shows:

Failed: red
Healthy: green
Recovery: blue
Occupied: filled
Vacant: outline

I’m assuming occupied vs. vacant indicates whether the vehicle currently carries a passenger.

Note that the numbers in this image don’t add up, suggesting it might be a mockup rather than the live app. For example:

12 failed + 160 driverless + 4 recovery + 24 manual = 200 vehicles total
56 awaiting ride assignment + 13 assigned a ride + 131 assignment in progress + 20 returning to facility + 30 unavailable = 250 vehicles total
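The mismatch is easy to confirm by summing each breakdown. The dictionary keys below simply restate the labels from the screenshot.

```python
by_mode = {"failed": 12, "driverless": 160, "recovery": 4, "manual": 24}
by_ride_state = {
    "awaiting ride assignment": 56,
    "assigned a ride": 13,
    "assignment in progress": 131,
    "returning to facility": 20,
    "unavailable": 30,
}

total_by_mode = sum(by_mode.values())
total_by_ride_state = sum(by_ride_state.values())
print(total_by_mode, total_by_ride_state)
print("consistent" if total_by_mode == total_by_ride_state else "mismatch")
```

If both panels described the same live fleet, the two totals should agree.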

Interpreting Kyle’s screenshot

We now have a snapshot of Cruise’s ridership at some unknown time on the night of November 29, 2022.

Healthy, occupied: 2 vehicles
Healthy, vacant: 85 vehicles
Total: 87 vehicles

Note that the number of vehicles may be undercounted because:

The viewport doesn’t show Cruise’s entire service area.
Some map icons overlap.
We don’t know whether the filter by AV mode has been set to show only healthy (green) vehicles.

Tearing down the Rewind app

2022-12-24T23:59:00-08:00

Rewind is a Mac app that records your computer’s screen and audio, allowing the user to scroll through a timeline of past screen recordings. Rewind also recognizes text, including text in videos and Zoom calls, allowing the user to perform full-text search on anything that has been displayed. Rewind is developed by the same team as Scribe.

Rewind also records Zoom meetings as a first-class feature. Whenever the user enters a meeting, Rewind asks to record and transcribe audio from all participants.

A Zoom call shown in the Rewind app’s history browser.

All indexing, including OCR and speech-to-text, happens locally. The developers claim that Rewind “doesn’t tax system resources, like CPU and memory, while recording” by taking advantage of accelerators built into the Apple M1 and M2. Given these claims, I was eager to sign up for the beta ($20/month, first month free) to find out how they pulled this off.

How it works: Overview
Analyzing the Rewind app
Ideas for Improvement
Conclusion
Appendix
- Rewind App
- Database Schema

How it works: Overview

Use accessibility APIs to identify the frontmost window.
- Store the timestamps to a SQLite database in the user’s Library folder.
Take a screenshot of the screen that contains the frontmost window.
- If there are multiple screens, only the currently focused screen will be captured.
- Use ScreenCaptureKit to hide disallowed windows, including private browser windows and a user-defined exclusion list.
OCR the screenshot on-device using Apple’s Vision framework, the same pipeline that powers Live Text.
- Store the inference results to a SQLite database.
Compress the screenshot sequence to an H.264 video with FFmpeg.
- Store videos in the user’s Library folder.

Additionally, if the user joins a Zoom call and enables transcription through Rewind:

Transcribe the audio on-device using the OpenAI Whisper model.
- Store the transcripts and speaker information in a SQLite database.

In the following sections, I’ll provide details on how I came to these conclusions.

Analyzing the Rewind app

Application Bundle

After installation, I poked through the Rewind.app bundle.

Executables:

Rewind: the Cocoa application
Rewind Helper: a non-UI binary

Other notable files include:

Resources/ggml-base.en.bin: The model weights for the OpenAI Whisper transcription model in the format used by whisper.cpp, likely for Zoom call transcription. The SHA-1 taken on my local machine (137c40403d78fd54d454da0f9bd998f78703390c) matches the file from the weights repo.
Resources/favicons: A directory of 912 favicons for popular websites, stored as PNG images. Examples include amazon_com.png, dropbox_com.png, youtu_be.png, etc. These are used in the timeline view when the frontmost app is a web browser, instead of showing the browser’s app icon.
Frameworks/Sparkle.framework: Despite installing through a .pkg, Rewind uses Sparkle for updates.
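For anyone who wants to repeat the checksum comparison on the Whisper weights, here is a minimal sketch using Python's standard `hashlib`. The helper names are mine, and the path to the file inside your own Rewind.app bundle is something you would need to supply.

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Digest quoted in the list above for ggml-base.en.bin.
EXPECTED = "137c40403d78fd54d454da0f9bd998f78703390c"

def matches_expected(path, expected=EXPECTED):
    """True if the file at `path` hashes to the expected digest."""
    return sha1_of_file(path) == expected.lower()
```

Calling `matches_expected(...)` on the bundled `Resources/ggml-base.en.bin` should return `True` if the file is the same one I examined; a different app version could ship different weights.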

Frameworks

After loading into a disassembler, the Rewind binary contains references to:

whisper.cpp
VisionKit
- ImageAnalyzer — API for running Live Text inference and accessing results
- ImageAnalyzerOverlayView — displays Live Text results and allows the user to copy text
Sentry — telemetry

The Rewind Helper contains a statically linked FFmpeg.

Permissions

Upon launch, Rewind requests permissions to:

record the screen
record the microphone
control other applications (accessibility).

Recording the microphone is optional if the user doesn’t want to transcribe Zoom meeting audio.

Excluded Applications & Private Browsing

Rewind allows the user to exclude apps from its recording. By default, Rewind also excludes private windows from several popular browsers.

In this example, I’ve opened three windows (listed from back to front):

TextEdit
Safari (regular)
Safari (Private Browsing)

Top: My actual desktop. Bottom: What Rewind sees.

Rewind’s screenshot excludes the private browsing window and the menu bar — and reveals additional content previously occluded by the private window.

This suggests that Rewind uses Apple’s ScreenCaptureKit, which allows filtering by window and can recomposite the desktop based on the input parameters. (The legacy CGDisplayCapture API can only capture the entire screen or individual windows. The app developer would have to perform their own compositing.)

Storage Format

Rewind stores all screen recordings and transcripts to:

~/Library/Application Support/com.memoryvault.MemoryVault

There are three items in this directory:

(1) `chunks`: H.264 videos

A directory of timestamped movie files. Surprisingly, the date format contains colons.

The chunks directory.

Each chunk file is an MP4 container:

$ cd ~/Library/Application\ Support/com.memoryvault.MemoryVault/chunks/2022-12-19T00:57:32
$ file chunk
chunk: ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]
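The same check can be done without the `file` utility. ISO base media files start with a box whose type is `ftyp`, followed by a four-character major brand. The sketch below reads only that first box; the function names are my own, and it assumes `ftyp` is the very first box, which is the usual layout but not something I verified for every chunk.

```python
import struct

def read_ftyp(header: bytes):
    """Parse a leading `ftyp` box from the first 16 bytes of a file.

    Returns (major_brand, minor_version), or None if the data does not
    start with an `ftyp` box.
    """
    if len(header) < 16:
        return None
    # Box header: 4-byte big-endian size, then a 4-byte type code.
    _size, box_type = struct.unpack(">I4s", header[:8])
    if box_type != b"ftyp":
        return None
    major_brand, minor_version = struct.unpack(">4sI", header[8:16])
    return major_brand.decode("ascii", "replace"), minor_version

def sniff_mp4(path):
    """Read just enough of `path` to identify its major brand."""
    with open(path, "rb") as f:
        return read_ftyp(f.read(16))
```

Running `sniff_mp4("chunk")` inside one of the timestamped directories should return a brand tuple for an MP4 container and `None` for anything else, such as the PNG screenshots.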

Inspecting the file with VLC, we find that it contains a single H.264 stream, about 5 minutes long at 0.5 fps.

VLC inspector on a chunk file.

(2) `temp`: PNG screenshots

This directory contains a series of screenshots. A new screenshot is created every two seconds.

With a screen resolution of 3024 × 1964 (14-inch MacBook Pro), the app writes about 1–2 MB/s of PNGs when displaying text. The exact throughput depends on the total screen size and content being shown. In the worst case, such as when playing back a full-screen movie, each screenshot might exceed 10 MB.
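To put the capture cadence in perspective, here is the back-of-the-envelope arithmetic as a short script. The inputs are the figures reported in this section; the hourly total assumes no PNG is ever deleted, which is an upper bound, since I did not measure how long files stay in the temp directory.

```python
CAPTURE_INTERVAL_S = 2        # one screenshot every two seconds
CHUNK_LENGTH_S = 5 * 60       # each video chunk is about five minutes long
PNG_RATE_MB_PER_S = (1, 2)    # observed write rate while displaying text

frames_per_second = 1 / CAPTURE_INTERVAL_S                # matches the 0.5 fps stream
frames_per_chunk = CHUNK_LENGTH_S // CAPTURE_INTERVAL_S   # frames in one chunk

# Average PNG size implied by the write rate (low and high estimates, in MB).
png_size_mb = tuple(rate * CAPTURE_INTERVAL_S for rate in PNG_RATE_MB_PER_S)

# Upper bound on PNG data written per hour, in GB, if nothing were cleaned up.
gb_per_hour = tuple(rate * 3600 / 1000 for rate in PNG_RATE_MB_PER_S)

print(frames_per_second, frames_per_chunk, png_size_mb, gb_per_hour)
```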


A PNG file gets created every two seconds.

(3) `db.sqlite3`: Metadata

Rewind uses this SQLite database to store video file metadata, focused app metadata, OCR results, and call transcripts. Here are the key tables:

frame contains a row for each video frame.

SELECT * FROM frame LIMIT 10

Query Results
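If you want to run the same query yourself, a minimal sketch with Python's built-in `sqlite3` module follows. It opens the database read-only so that poking around cannot disturb the app. The helper names are mine, and I make no assumptions about the column names beyond the `frame` table itself.

```python
import sqlite3
from urllib.parse import quote

def open_read_only(path):
    """Open a SQLite file in read-only mode via a file: URI."""
    return sqlite3.connect(f"file:{quote(str(path))}?mode=ro", uri=True)

def first_rows(conn, table, limit=10):
    """Return (column_names, rows) for the first `limit` rows of `table`."""
    # Table names cannot be bound as parameters, so check against the schema.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"no such table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()
```

For example, `first_rows(open_read_only(db_path), "frame")` with `db_path` pointing at the `db.sqlite3` file in the storage directory described above is equivalent to the `SELECT * FROM frame LIMIT 10` query.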

What’s on my ballot: November 2022 California general election

2022-10-22T01:07:39-07:00

Here’s how I’m voting in the November 2022 general election. While preparing for this election, I consulted:

San Francisco Chronicle voter guide
SPUR endorsements
GrowSF voter guide, which links to questionnaires from many local candidates that are far more detailed than their campaign websites

I also referred to my research for the June 2022 primary.

California: Voter-nominated Offices
Federal
- United States Senator
- United States Representative, District 11
California: Non-partisan Offices
- Judges
- Superintendent of Public Education
San Francisco: Education
San Francisco
California Ballot Measures
San Francisco Ballot Measures

California: Voter-nominated Offices

Governor

➡️ Gavin Newsom

Newsom supports housing, although his effectiveness has been questionable. His challenger, Brian Dahle, has some questionable environmental positions on allowing oil drilling and desalination.

Lieutenant Governor

➡️ Eleni Kounalakis

I’m voting for Kounalakis with reservations. Kounalakis is less pro-housing than I’d prefer. For example, she prefers to reform the CEQA without legislative changes.

Secretary of State

➡️ Shirley Weber

The other candidate, Rob Bernosky, has a platform with several euphemisms, including “cleaning California’s voter rolls.”

Controller

➡️ Lanhee Chen

Unfortunately, Ron Galperin (my preferred primary candidate) didn’t make it to the general election. The candidates are now:

Malia Cohen, who leans more left compared to most Californians
Lanhee Chen, who leans more right

Cohen’s platform contains several social policies around healthcare, minimum wages, and gun violence that are better addressed by the legislature.

I’ll vote for Chen because he seems more focused on the State Controller’s watchdog role. This is a weak endorsement because Chen’s top policy priority is reducing fraud in the application processes for public benefits, such as Medi-Cal. Depending on implementation, this may also lead to fewer qualified people receiving public benefits.

Treasurer

➡️ Fiona Ma

Attorney General

➡️ Rob Bonta

Bonta has been actively enforcing statewide housing production laws to ensure municipalities’ compliance.

Insurance Commissioner

➡️ none

Unfortunately, Marc Levine didn’t make it to the general election, leaving us with two less-than-ideal candidates:

The Chronicle notes incumbent Ricardo Lara’s ethical lapses while in office, including swaying decisions to favor campaign donors.
Like many other primary candidates, Robert Howell touts his background in cybersecurity rather than insurance. This should not be a selling point. It means he’s unlikely to have the domain knowledge needed to do the job effectively.

Board of Equalization, District 2

➡️ none

The Chronicle points out that the Board of Equalization doesn’t do much anymore. The vast majority of responsibilities have been moved to other bodies.

State Assembly, District 17

➡️ Matt Haney

Haney supports housing development. Many local governments in California obstruct development, so it’s important to have housing advocates in the state legislature.

Federal

United States Senator

➡️ Alex Padilla (both terms)

Padilla performed well as California’s Secretary of State. His no-nonsense approach to election trust included prosecuting those setting up fake ballot boxes.

United States Representative, District 11

➡️ Nancy Pelosi

It would be great to bring in some younger politicians so they can start building their influence, etc. Unfortunately, Pelosi decided to run one more time.

California: Non-partisan Offices

Judges

➡️ Yes to all

Philosophically, I don’t think it makes sense to elect judges as they are supposed to be appointed by the executive. But given that this is the system we have in California, I treat these retention elections like recalls: keep the candidate in office except in cases of serious ethical lapses.

Superintendent of Public Education

➡️ Tony Thurmond

Thurmond’s track record isn’t great and he received endorsements from problematic Californian teachers’ unions. However, Lance Christensen supported school vouchers in a Chronicle endorsement interview. This position is far to the right of most Californians.

San Francisco: Education

Member, Board of Education

➡️ Ann Hsu
➡️ Lainie Motamedi
➡️ Lisa Weissman-Ward

In February, San Francisco voted to recall several members from the school board in a landslide election. The interest group SF Guardians led the recall campaign. I consider them experts on the workings of the school board. They are now endorsing the interim, mayor-appointed candidates to be elected to the school board.

The candidates I didn’t select:

Gabriela López is running again in this election. Voters recalled her earlier this year with 72 percent of votes.
Karen Fleshman opposed the school board recall in an interview with GrowSF. On the bright side, she supports accelerated learning programs, such as reintroducing middle-school algebra.
Alida Fisher opposed the school board recall in an interview with GrowSF.

Member, Community College Board (term ending 2027)

➡️ Jill Yee
➡️ Marie Hurabiell
➡️ John Rizzo

The City College of San Francisco continues to face a budget crisis, aging facilities, and low enrollment. It is also on the verge of losing accreditation, which threatens to take away state funding and further decrease enrollment. My ideal board candidate would solve these problems by taking an analytical, detail-oriented approach and not being afraid to make big changes.

Making this determination isn’t easy because it’s hard to find details about the candidates’ platforms. The Chronicle hasn’t completed its endorsement interviews yet. Even the candidates’ own websites only have vague descriptions at best.

To choose these candidates, I read the entire GrowSF interview for each candidate. I didn’t consider anyone who declined to answer the questionnaire.

I decided to vote for these candidates:

Jill Yee is a former student, current professor, and dean at CCSF. The depth of her experience at CCSF, her analytical approach, and no-nonsense mindset shine in the interview. For example, she proposes addressing the college’s budget crisis by cutting low-enrollment programs to focus resources on in-demand vocational programs. Her methodology involved reading the course catalog to identify overlapping curriculum. She also understands the details of CCSF’s funding sources, which will help with the upcoming accreditation.
Marie Hurabiell has personal experience with CCSF’s problems from the students’ perspective. She also brings experience from nearly a decade on the Board of Regents for Georgetown.
John Rizzo is an incumbent board member with 15 years of experience. I believe this context will be useful to keep on the board. His top priority is accreditation.

These incumbents seem fine and could be substituted for John Rizzo, although they have less experience on the board:

Brigitte Davila is an incumbent board member with seven years of experience. Her top priority is accreditation.
Thea Selby is an incumbent board member with seven years of experience. Her top priority is accreditation.

I don’t believe these candidates are qualified:

Jason Zeng’s responses sound good, but they’re too vague to predict his concrete actions if elected.
William Walker only has experience as a CCSF student. His policy positions are vague (“Identify areas where the college might be willing to shrink”). Other positions are impractical. His top priority is enrollment growth through community partnership, which can’t be achieved without accreditation.

Member, Community College Board (term ending 2025)

➡️ Murrell Green

No candidate responded to GrowSF’s survey. Green was recently appointed by London Breed.

San Francisco

Assessor–Recorder

➡️ Joaquín Torres

There are no other candidates.

District Attorney

➡️ Brooke Jenkins

I am generally okay with incumbent Brooke Jenkins’ policies.

In GrowSF interviews, challengers Maurice Chenier and Joe Alioto Veronese could have provided more thoughtful opinions instead of buzzwords. For example, Chenier repeats “victim-based administration” to nearly every question.

John Hamasaki supports concealed carry, which is a non-starter. He also conducts himself poorly on social media.

Public Defender

➡️ Mano Raju

Both candidates come with great qualifications and seem to have reasonable viewpoints. Raju has focused on activism outside the courtroom, while challenger Rebecca Susan Feng Young wants the office to focus on trials.

Board of Supervisors, District 10

➡️ Brian Sam Adam

Incumbent Shamann Walton is one of the worst supervisors in SF. He opposes housing development, which is a non-starter.

On other issues, he holds nonsensical positions. For example, he opposed closing JFK Drive to cars on the grounds that it is “segregationist” toward District 10 residents who typically drive to Golden Gate Park. He responded to criticism by doubling down, directly comparing the street to the Jim Crow–era South during a city hearing.

From his GrowSF interview, I found that I agree with challenger Brian Sam Adam on several issues:

Supports car-free JFK and car-free Great Highway on weekends
Supports repealing Proposition 13

I don’t agree with these positions on education and housing:

Opposed the school board recall
Opposes accelerated learning programs
Doesn’t believe upzoning will significantly improve the housing crisis
Wants to limit upzoning to commercial and industrial areas

It is understandable that both SF YIMBY and GrowSF declined to provide an endorsement in this race. However, I’ll still vote for Adam as the candidate with more reasonable policies and public conduct.

California Ballot Measures

A note on navigating ballot propositions

California’s ballot proposition system requires voter approval in certain situations, such as issuing bonds or amending the state constitution. Voters can also introduce new laws and veto laws already passed by the legislature. Ballot measures can only be repealed by another ballot measure, unless otherwise specified.

The downside of direct democracy is that most voters are less informed than their representatives. Voters don’t spend time talking to constituents and can’t request analysis from staffers. As a result, the proposition section of the ballot has become a prime target of astroturfing campaigns and populist policies like Proposition 13.

Because the proposition system tends to produce bad ideas and make them hard to undo, my heuristic is to vote “no” by default. I’ll also watch out for propositions that could be passed as normal legislation. They tend to be put on the ballot by special interests or astroturf campaigns looking for hard-to-repeal regulatory capture.

1: Constitutional Right to Reproductive Freedom

✅ Yes

This proposition would amend the state constitution to make the existing reproductive rights protection unambiguous. The existing language could be susceptible to future reinterpretation by the courts, similar to what happened in Dobbs v. Jackson Women’s Health Organization.

26: In-person Roulette, Dice Games, Sports Wagering on Tribal Lands

❎ No

Sports gambling is fine, but this could have been submitted by the legislature, as SPUR and the Chronicle note. This proposition is written by industry players with large amounts of revenue at stake. It’s likely to lead to regulatory capture.

For example, the tax rate is fixed at 10 percent. A good-faith attempt at policymaking would have allowed the legislature to adjust the tax rates and other parameters.

I’d like the industry to try again in two years. Given the size of the gambling market in California, they’re all but guaranteed to submit another ballot proposition if this one does not pass.

27: Online Sports Gambling Outside Tribal Lands

❎ No

Generally the same reason as 26.

This proposition is even more clearly trying to accelerate regulatory capture. Its primary sponsors are DraftKings and FanDuel. To participate, gaming companies must be qualified to offer online sports betting in at least 10 states, or in five states while also operating 12 brick-and-mortar casinos.

28: Additional Funding for Arts and Music in Public Schools

❎ Weak No

This measure would reallocate existing public education funding so that at least 1 percent goes toward arts and music. While increasing funding and removing local school district control are generally good ideas, I think the legislature should have more flexibility to adjust the funding amount.

29: On-site Licensed Medical Professional at Kidney Dialysis Clinics

❎ No

This measure is a rerun of 2020 Proposition 23, which was a rerun of 2018 Proposition 8. Every other year in California, we need to vote down another nonsense dialysis proposition. Sorry, I don’t make the rules…this is just how voting works now.

SEIU-UHW West, the union representing dialysis clinic workers, wants to increase minimum staffing levels. They have been unable to achieve this in negotiations.

It’s not appropriate to make these decisions through ballot propositions. The public shouldn’t be tiebreaking union negotiations. We’re also not qualified to decide whether there’s a medical reason to require a licensed professional at dialysis clinics.

30: Electric Vehicle Subsidies, Income Tax above $2M

✅ Weak Yes

I strongly support increasing electric vehicle incentives. The measure generally seems well designed too.

It explicitly allocates funds to low-income car owners.
It balances funding between at-home charging (30 percent) and public fast-chargers (10 percent), which avoids the misconception that fast chargers are a drop-in replacement for gas stations.
It provides general guidelines for programs and budget allocations, while delegating the specific implementation to public agencies.

However, I don’t think all provisions need to be done through ballot propositions. The EV market is changing rapidly, including key factors like battery mineral availability and consumer interest. We don’t know how our needs will change between now and the proposition’s expiration date in 2043. It’s also unclear how much value this measure adds beyond the federal EV rebates that take effect next year and also target low-income drivers through used car subsidies.

Common Misconceptions

I also want to correct two common misconceptions about this measure:

It’s fine that this measure is primarily funded by Lyft. Lyft is interested because it is required to serve 90 percent of its California mileage using electric vehicles by 2030. That doesn’t necessarily mean there is something nefarious going on. Any vehicle subsidy also indirectly subsidizes Lyft’s service. Plus, all internal combustion vehicles pollute the same air regardless of whether they are on Lyft’s platform.

This measure may not include e-bikes. Co-sponsor SPUR says it includes e-bike purchase subsidies, while the Chronicle says it doesn’t.

The text of the measure, section 80220, is ambiguous:

80220. Eligible Programs.

Programs eligible for funding pursuant to this chapter may include, but are not limited to, those that:

[…]

(b) Provide block grants, grants, loans, or other incentives for zero-emission transit buses so people get to where they need to go in ZEVs.

(c) Provide incentives, grants, and block grants for governments and businesses to buy medium-, heavy-duty, and off-road agricultural and construction ZEVs.

(d) Provide financing assistance to help those without access to capital or high credit acquire new and used ZEVs.

(e) Help people retire old polluting vehicles and replace them with new and used ZEVs or other zero-emission mobility options.

[…]

(h) Increase access to clean mobility options, including but not limited to:

(1) Electric bikes.

(2) Bike-sharing.

(3) Protected bike lanes.

(4) Transit passes.

Note that (b) through (d) explicitly call out financial incentives and assistance for ZEVs, which are defined as motor vehicles that also meet the zero-emissions requirements. E-bikes, on the other hand, would fall under (e) and (h), which use weaker language. It’s not clear to me how this will be interpreted.

31: Flavored Tobacco Referendum

✅ Yes

This proposition is a referendum on the legislature’s 2020 ban on flavored tobacco products, such as vape liquids. A yes vote will uphold the existing law and allow it to take effect.

Several tobacco companies spent a total of $23 million to support the repeal campaign.

San Francisco Ballot Measures

A: City Retiree Cost of Living Adjustment

✅ Yes

This provides city retirees the same COLA treatment regardless of retirement date.

Also, it doesn’t make sense to have a hard requirement on whether the retirement system is fully funded. This changes depending on market conditions and isn’t the best indicator of system health.

B: Department of Sanitation and Streets

✅ Yes

This measure would move the responsibilities of the Department of Sanitation and Streets back to the Department of Public Works, partially repealing 2020 Measure B. At the time, I opposed the 2020 measure because:

It’s not clear to me that this reorganization would fix San Francisco’s sanitation problems. For example, the argument in favor says that this will allow “data-driven cleaning.” But they haven’t shown why that practice is impossible to implement under the current organizational structure.

Since then, we’ve learned that duplicating administrative roles comes with significant cost. The Controller reports that recombining the departments will save $3.5 million in FY23 and $2.5 million in FY24.

C: Establish Homelessness Oversight Commission

❎ Weak No

The city spends over $700 million per year to reduce homelessness. I agree that this spending needs oversight to ensure effectiveness. However, the board needs the right background to oversee the budget.

Based on the legal text, the commission members are:

Four seats appointed by the Mayor and confirmed by the Board of Supervisors, with these profiles:
- Person who has personally experienced homelessness
- Service provider or advocate
- Mental health or substance abuse expert
- Neighborhood or small business association member
Three seats appointed by the Board of Supervisors:
- Person who has personally experienced homelessness
- Service provider or advocate
- Social worker for homeless families with children or homeless youth

The commission isn’t required to have any members with experience in government or managing a $700 million budget. The current composition represents the stakeholders well, so it would be more suitable as an advisory board than as a body that approves budgets.

D: Streamline Affordable Housing Approval (Voter Initiative)

✅ Yes

San Francisco’s long lead times for building permit approval are well known. It makes sense to streamline the approval process and protect the projects from objections through abuse of the California Environmental Quality Act review process.

E: Streamline Affordable Housing Approval (Board of Supervisors)

❎ No

This is a weaker version of Proposition D placed on the ballot by the Board of Supervisors so they can retain the ability to deny new housing. If both D and E pass, the proposition with the most votes will take effect.

F: Renew Library Preservation Fund

✅ Yes

The fund provides nearly all of the library’s budget. Renewing the fund maintains the status quo of public library funding.

G: Establish Student Success Fund

❎ Weak No

I generally support using excess property taxes to fund public schools. I’m not sure this measure is the best way to increase funding.

SPUR’s analysis summarizes the requirements to receive funds:

The three minimum criteria are: that the school has a school council composed of administration, students, parents and other key stakeholders to support grant implementation, that the school has or hires a full-time community school coordinator and that the school agrees to coordinate its services with the city and school district.

Schools that meet the requirements can apply for grants up to $1 million per school. It’s not clear that this will make a difference, given the high overhead for implementation. The alternative is keeping the money in the General Fund.

H: City Elections in Even-numbered Years

✅ Yes

I: Allow Motor Vehicles on JFK Drive, Great Highway

❎ No

JFK Drive and the Great Highway have been SF’s most successful slow streets. I bike on these streets multiple times per week. The Board of Supervisors even voted to make them permanent. This measure would return these streets to the pre-pandemic status quo.

Additionally, the Chronicle points out that Proposition I would require the city to maintain indefinitely the Great Highway between Sloat and Skyline Boulevards. This segment was already scheduled to be closed next year because of natural erosion and sea level rise.

For more details, I found these endorsements helpful and I completely agree with them:

J: Recreational Use of JFK Drive

✅ Yes

This measure will reaffirm the Board of Supervisors’ decisions on JFK Drive, keeping the status quo. If both Propositions I and J pass, we need more votes on J to keep JFK Drive closed to motor vehicles.

L: Reauthorize Sales Tax for Public Transportation

✅ Yes

This measure reauthorizes the existing sales tax until 2053. I’m always supportive of funding transit because it’s important for a healthy city.

M: Vacancy Tax

❎ No

This measure would impose a tax on unoccupied apartments, but exempts single-family homes and duplexes for some reason. I generally support vacancy taxes to discourage owners from withholding investment properties from the market. This implementation seems ineffective.

N: Golden Gate Park Underground Parking

✅ Yes

The parking garage is currently operated by a private entity, which has decided to repay the construction costs by setting high prices. It currently costs $6.25 per hour on weekends, which is similar to other private lots in SF but may be prohibitively high for low-income visitors.

The proposition would allow the city to acquire the garage and to set or subsidize parking rates, without obligating it to take any action.

O: City College Parcel Tax

❎ No

While CCSF needs more funding, this parcel tax is unlikely to help in the long term. The tax is expected to raise $37 million per year. For comparison, CCSF anticipates $316 million in revenue for 2022–2023.

Another budget risk is a transition in the state’s apportionment formula, which will require big changes at CCSF regardless of whether Proposition O passes.

In FY19, California adopted a new “Student Centered Funding Formula” for apportionment. The nonpartisan Legislative Analyst’s Office describes its components as:

(1) a base allocation linked to enrollment, (2) a supplemental allocation linked to low‑income student counts, and (3) a student success allocation linked to specified student outcomes.

[…]

The new funding formula included a temporary “hold harmless” provision for those districts that would have received more funding under the former apportionment formula. The intent of the hold harmless protection was to provide time for those districts to ramp down their budgets…

CCSF enrollment has been declining since 2018–2019, which should decrease funding from the SCFF’s enrollment component. The “hold harmless” grace period delays this decline until 2024–2025.

Analyzing Tesla AI Day 2022

2022-10-01T21:49:21-07:00

I was fortunate enough to attend this year’s Tesla AI Day. Tesla’s architecture has always been interesting to me because their hardware limitations drive advances in their software. Relative to other players in the industry:

Low-spec sensors require ML- and data-first approaches, instead of adding more sensor hardware.
Low-spec compute requires efficient model architectures. For example, preferring to share backbones among several tasks rather than adding more and larger task-specific models.

These are good directions that will help reduce the cost of machine learning and robotics, making them more relevant for everyday applications.

Elon Musk opens the presentation.

Below, I’ll discuss three topics that stood out to me:

An LLM-inspired lane graph regression model
Running the lane model efficiently on the Autopilot computer
Increasing Dojo training tile utilization

Online lane graph predictions with a language model

Tesla described their approach for producing sparse lane graph predictions from a dense feature map produced by their 2D to 3D vision model. This is a key component in the autonomous vehicle software stack that enables the behavior system to reason about the possible actions of the vehicle and other agents.

We approached this problem like an image captioning task, where the input is a dense tensor, and the output text is predicted into a special language that we developed at Tesla for encoding lanes and their connectivities. In this language of lanes, the words and tokens are the lane positions in 3D space. The ordering of the tokens and predicted modifiers in the tokens encode the connective relationships between these lanes.

The architecture is inspired by transformer-based language models:

Tesla’s lane prediction model architecture.

Producing a lane graph from the model output sentence requires less postprocessing than parsing a segmentation mask or a heatmap. The DSL directly encodes the data structure needed by downstream consumers.

Recent advances in large language models and stable diffusion have produced impressive results approaching the quality and creativity of human-generated text and images. While they have clear applications in creative tools, they are limited by the need to have a human interpret the generated output.

I think the biggest impact of generative models will be in ingesting and producing structured data, such as Tesla’s lane graph DSL. This allows the learned component to be integrated into a larger software system. The models’ impact can grow with compute capacity rather than with the number of person-hours available to prepare inputs or view outputs.

Deploying the lane model: array indexing with dot products

Deploying the language model to the vehicle posed a challenge: Tesla’s neural network accelerator (“TRIP”) only supports multiply–accumulate operations.

When we built this hardware, we kept it simple and made sure it can do one thing ridiculously fast: dense dot products. But this architecture is autoregressive and iterative. […] The challenge here was: how can we do this sparse point prediction and sparse computation on a dense dot product engine?

A matrix multiplication operation selects an entry from the embedding table.

They are rewriting indexing operations as a dot product with a one-hot vector. As an illustration, here’s how they might select the second item from a 1D lookup table with four entries:

\[\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ \end{bmatrix}^\intercal \begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \\ \end{bmatrix}\]
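Here is a minimal C++ sketch of that trick, assuming a plain table of floats. The names `OneHot`, `Dot`, and `LookupViaDot` are mine for illustration. This is not Tesla's code, and it says nothing about how TRIP actually schedules the work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a one-hot selector: all zeros except a 1 at `index`.
std::vector<float> OneHot(std::size_t index, std::size_t size) {
  assert(index < size);
  std::vector<float> selector(size, 0.0f);
  selector[index] = 1.0f;
  return selector;
}

// Dense dot product. It touches every entry, which is the only kind of
// operation a multiply-accumulate engine offers.
float Dot(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Equivalent to table[index] for finite entries, but expressed without any
// indexing on the data path.
float LookupViaDot(const std::vector<float>& table, std::size_t index) {
  return Dot(OneHot(index, table.size()), table);
}
```

With a four-entry table holding 10, 20, 30, and 40, `LookupViaDot(table, 1)` returns 20, the second entry.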

This feels like a workaround to get the model running on existing accelerator hardware developed with dense convolutions in mind. The current Autopilot computer, HW3, shipped in 2019. It was not clear that sparse operations would become so important to vision tasks. Unlike other companies, Tesla doesn’t operate its own fleet, and unlike smartphone manufacturers, Tesla does not currently have the volume to spin new chips each year. They’ve made it work with the hardware they have.

The downside is that this implementation wastes FLOPs linearly with the number of entries in the lookup table. Tesla’s real tables definitely contain more than four items, and the illustration suggests they might have two dimensions.

Transformers are becoming more prevalent in Tesla’s model architecture. It will be interesting to see whether future hardware runs these models more natively.

Dojo: Dealing with density

In last year’s presentation, Tesla introduced Dojo, the most unconventional system in Tesla’s tech stack.

Its D1 accelerator chips contain a custom CPU with vector instructions, SRAM, and a chip-to-chip interconnect.
The chips are arranged in a 2D grid and connected with their neighbors, forming a tile.
Tiles are then packaged into a rack mount to connect to their neighbors.
Several cabinets combine to form a cluster.

The overall strategy is to train arbitrarily large models by greatly increasing bandwidth and reducing latency between processors. This requires increasing density. All other design decisions are downstream of this strategy:

Enclosure: 15 kW per tile, six tiles per tray. I don’t think any other accelerator comes close to this W/m³.
Memory: SRAM instead of DRAM for storing weights and activations (nearly 700 MB per chip). On a more traditional computer architecture, this would be like storing a program’s working set in the CPU cache.
Microarchitecture: The D1 accelerator chip doesn’t have much logic compared to other processors. They push the complexity to software. For example, Tesla performs static scheduling at compile time because the D1 CPU only supports in-order dispatch.

A blast from the past: VLIW

D1’s architecture reminds me of very long instruction word (VLIW) designs from the 1990s and early 2000s, such as Intel Itanium. Those processors failed in the market because instruction set compatibility dominated. When it came to improving performance, software developers would rather wait for next year’s microarchitecture than recompile their binaries for a new instruction set.

Deep learning workloads are different. Developers are highly motivated to increase performance. So far, they have been willing to adapt to architecture changes.

Feeding examples to the D1 accelerator

This year, Tesla provided additional details on how the training tiles are integrated into the rest of the datacenter.

There are several interface processors mounted below each tile. These standard x86 machines contain network interfaces and video decoding hardware to load training examples.

Interface processor details.

Tesla faced a problem when training a video model on Dojo. The utilization of the accelerator chips was only 4 percent.

With our dense ML compute, Dojo hosts effectively have 10x more ML compute than the GPU hosts. The data loaders running on this one host simply couldn’t keep up with all that ML hardware.

That’s right: the Dojo tiles achieved too much density relative to the x86 machines keeping them fed with training examples. The Dojo tiles spent most of their time waiting for data. For the system to run efficiently, the two machine types need to have a similar throughput.

They developed a custom DMA over Ethernet protocol that allows adding additional, external data loading hosts to communicate with the Dojo tiles. This improved utilization to an impressive 97 percent.

External data loading hosts connected over Ethernet. The diagram suggests the additional machines might be located in another rack.
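A rough way to see why the two machine types need similar throughput is to assume (my simplification, not Tesla’s model) that utilization is capped by whichever side is slower. Here \(R_\text{load}\) and \(R_\text{train}\) are hypothetical symbols for the rate at which hosts can supply examples and the rate at which the tiles can consume them:

\[u = \min\left(1, \frac{R_\text{load}}{R_\text{train}}\right)\]

At \(u = 0.04\), the tiles could consume about \(1 / 0.04 = 25\) times what the loaders supplied. If loading throughput scaled linearly with the number of hosts, reaching \(u = 0.97\) would take roughly \(0.97 / 0.04 \approx 24\) times the original loading capacity. Tesla didn’t say how many hosts they added, so treat this as back-of-the-envelope arithmetic rather than a description of their setup.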

However, the root cause of an unbalanced system remains unaddressed. While it is convenient to change the ratio between data loading machines and training accelerators by adding more machines to the network, I doubt this is the long-term architecture for Dojo. I would expect a future design to make the system more balanced.

The Cybertruck was the only show car we weren’t allowed to approach.

What’s on my ballot: June 2022 California primary election

2022-06-05T20:37:36-07:00

Here’s how I’m voting in the June 2022 primary election. While preparing for this election, I consulted the San Francisco Chronicle and SPUR endorsements.

California
San Francisco
- City Attorney
San Francisco Ballot Measures

California

Governor

➡️ Gavin Newsom

Newsom supports housing, although his effectiveness has been questionable.

Lieutenant Governor

➡️ Eleni Kounalakis

Kounalakis is less pro-housing than I’d prefer. For example, she prefers to reform the CEQA without legislative changes. The other candidates are not qualified.

Secretary of State

➡️ Shirley Weber

There are no other reasonable candidates in this race. For example, many other candidates believe in widespread voter fraud.

Controller

➡️ Ron Galperin

Galperin promises to improve the efficiency of housing and homeless programs, an important area for California.

Lanhee Chen’s top policy priority is reducing fraud in the application processes for public benefits, such as Medi-Cal. It will likely lead to more complicated applications. This is a situation where the optimal amount of fraud is non-zero: I prefer to accept some fraud in exchange for e.g. a lower-friction application process that allows more people to access public benefits.

Steve Glazer also seems fine but has a less clear platform. Some items (gun control) could be better handled through legislation.

Treasurer

➡️ Fiona Ma

Although the Chronicle notes Ma’s “series of scandals,” there are no other qualified candidates.

Insurance Commissioner

➡️ Marc Levine

The Chronicle notes incumbent Ricardo Lara’s ethical lapses while in office, including swaying decisions to favor campaign donors.

The other candidates are not qualified because they’re unlikely to have the right domain knowledge for insurance.

United States Senator

➡️ Alex Padilla (both terms)

Padilla performed well as California’s Secretary of State. His no-nonsense approach to election trust included prosecuting those setting up fake ballot boxes.

United States Representative, District 11

➡️ Nancy Pelosi

It would be great to bring in some younger politicians so they can start building their influence, etc. Unfortunately, Pelosi decided to run so we’re stuck voting for the boomer again.

State Assembly, District 17

➡️ Matt Haney

Haney supports housing development. Many local governments in California obstruct development, so it’s important to have housing advocates in the state legislature.

San Francisco

City Attorney

➡️ David Chiu

There are no other candidates.

San Francisco Ballot Measures

Navigating ballot propositions

Here’s what I wrote last year about California ballot measures. Similar dynamics appear in local elections.

California’s ballot proposition system requires voter approval for certain kinds of bills, including issuing bonds, amending the state constitution, and amending previously passed propositions. Voters can also introduce new laws and veto laws already passed by the legislature.

There is a problem with direct democracy: people typically aren’t as informed as their representatives. Suppose there is a measure to issue $5 billion in bonds. How do I know that’s the right amount? Why is it not $5.1 or $4.9 billion? Because few voters are public policy experts, the proposition section of the ballot has become a prime target of astroturfing campaigns and populist policies.

Because of its tendency to produce bad ideas and make them hard to undo, my heuristic is to vote “no” by default, especially when the proposition in question seems complicated or has received funding from interest groups. I’ll also watch out for propositions that could be passed as normal legislation and hold them to a higher standard. They tend to be put on the ballot by special interests or astroturf campaigns trying to trick voters into passing favorable regulation.

A: MUNI Bond

✅ Yes

MUNI is not the most efficient when spending its budget. For example, the Van Ness BRT project overran its budget multiple times. However, funding transit is essential for a city where not everyone owns a car. We are probably below the optimal amount of spending on MUNI.

B: Department of Building Inspection appointment process

❎ No

Although reforming the Department of Building Inspection is important, giving the control to the Board of Supervisors seems like it could cause gridlock.

C: Limit recall period

❎ No

Since I voted in favor of recalling the three Board of Education members, I feel that the recall is still being properly used.

D: Office of Victim and Witness Rights

❎ No

Although it is important to guide victims in the crime reporting process, this does not need to be a ballot measure. If the office ends up not working out, we would need another proposition to remove it. (Note that the responsibilities can be changed by the Board of Supervisors. I wasn’t sure what this meant concretely.)

E: Behested payments

❎ No

This also doesn’t need to be a ballot measure. In the meantime, behested donations don’t directly benefit the politician and need to be reported above a certain threshold.

F: Refuse Rate Board

✅ Yes

The rate is currently renegotiated every five years. It seems correct to do so more frequently.

G: Public health emergency leave

❎ No

Although the idea makes sense — codifying COVID-19 protections for a future pandemic — the definition of public health emergencies is too broad. For example, SPUR notes that Spare the Air days count as emergencies even though they are triggered fairly often.

H: Chesa Boudin recall

❎ No

Voters recalled three members of the school board for incompetence. Boudin’s case is different: he ran on a progressive agenda and delivered what he promised.

Boudin has also been scapegoated for property crime rates. The reality is that it’s a complicated problem with many causes, including the SFPD and broader trends such as a high cost of living and income inequality.

What’s on my ballot: November 2020 general election

2020-11-01T18:48:06-08:00

Here’s how I’m voting in the November 2020 general election in San Francisco:

Federal
- President & Vice President
- US Representative, District 12
California State Legislature
- State Senator, District 11
- State Assembly, District 17
San Francisco City & County
State ballot measures
San Francisco ballot measures
District ballot measures
- RR: Caltrain Sales Tax

Federal

President & Vice President

➡️ Joseph R. Biden, Kamala D. Harris

You already know why.

US Representative, District 12

➡️ Nancy Pelosi

Pelosi has a lot of clout in Congress. It would be against our interests to vote her out.

California State Legislature

State Senator, District 11

➡️ Scott Wiener

Wiener is a high-profile housing advocate and housing is California’s top issue in my opinion.

State Assembly, District 17

➡️ David Chiu

The other candidate, who I won’t link here, is an extreme libertarian. Among other things, the candidate’s platform includes the belief that “secession is a civil right.”

San Francisco City & County

Member, Board of Education

➡️ Kevine Boggess
➡️ Alida Fisher
➡️ Jenny Lam
➡️ Michelle Parker

I’ll defer to the San Francisco Chronicle’s endorsements on this one. Apart from the anti-vaxxer, it’s difficult to distinguish the candidates in this race.

Member, Community College Board

➡️ Shanell Williams
➡️ Tom Temprano
➡️ Jeanette Quick
➡️ Marie Hurabiell

I’m deferring again to the San Francisco Chronicle’s endorsements. They’ve picked candidates with a good mix of backgrounds: incumbents who have experience with the school board (Williams, Temprano), a recent CCSF student (Quick), and a finance-focused lawyer (Hurabiell).

BART Director, District 9

➡️ Patrick Mortiere

Bevan Dufty is the incumbent and has the most experience, but hasn’t provided any platform or proposed policies. He spent half of his candidate statement telling a nice story about that time he personally cleaned a BART station. I do not consider candidates unless they share meaningful information.

Similarly, Michael Petrelis (no website) has only provided vague policy proposals. To use COVID-19 as an example, the candidate’s statement says simply: “Enhancing Covid-19 [sic] protections for all workers and riders.”

The remaining candidates are Patrick Mortiere and David Wei Wen Young. Their platforms are fairly similar. Mortiere wants more bike-friendly stations and an expansion of discounted fares for low-income customers. Young plans to keep drug abusers out of BART with new fare gates, which I don’t think will really solve the problem, and promises not to accept campaign contributions from BART vendors and unions. Both candidates mention the need to cut spending to make up for ridership decline, but neither offers the specifics of which items they would cut.

State ballot measures

Navigating California’s ballot proposition system

California’s ballot proposition system requires voter approval for certain kinds of bills, including issuing bonds, amending the state constitution, and amending previously passed propositions. Voters can also introduce new laws and veto laws already passed by the legislature.

There is a problem with direct democracy: people typically aren’t as informed as their representatives. Suppose there is a measure to issue $5 billion in bonds. How do I know that’s the right amount? Why is it not $5.1 or $4.9 billion? Because few voters are public policy experts, the proposition section of the ballot has become a prime target of astroturfing campaigns and populist policies.

Because of its tendency to produce bad ideas and make them hard to undo, my heuristic is to vote “no” by default, especially when the proposition in question seems complicated or has received funding from interest groups. I’ll also watch out for propositions that could be passed as normal legislation and hold them to a higher standard. They tend to be put on the ballot by special interests or astroturf campaigns trying to trick voters into passing favorable regulation.

These resources can help too:

Ballotpedia provides neutral summaries of each proposition, often in far greater detail than provided by the official voter guide. Ballotpedia also lists the top donors and their spending, which can be used as a proxy for whether to support a proposition.
Newspaper endorsements by the San Francisco Chronicle and the Los Angeles Times.
SPUR, a member-funded think tank that publishes in-depth analyses with citations. Although they are opinionated, they also do a great job of summarizing the context surrounding each issue.

14: Stem Cell Research Bonds

❎ No

Basic research, including stem-cell research, should be funded by the federal government. I would prefer for a federal agency to fund projects on a more granular basis, rather than as a huge lump sum committed up front by voters.

15: Assess Commercial Property Tax with Current Value

✅ Yes

California’s Proposition 13 is one of the worst tax policies in history. It effectively locks in the property tax rate based on the sale price, not the fair market value of the property. This simple rule causes a variety of poor second-order effects in the residential market, including:

Making California more dependent on income and sales taxes, which are more volatile.
Regressive tax: the homeowners whose property values have increased the most are taxed the least.
Incentivizes holding onto property and increases “tenure” of homeowners trying to avoid property tax reassessment. This reduces the market’s ability to increase housing supply in popular areas, such as Silicon Valley, because current residents don’t want to move out. For commercial properties, it incentivizes businesses to structure their property deals to avoid transferring ownership.

Proposition 15 would phase out this policy for commercial properties worth more than $3 million, starting in 2022. It’s an important first step in reversing the harms done by Proposition 13.

16: Diversity as a Factor in Public Employment, Education, Contracting

✅ Yes

In 1996, Proposition 209 banned affirmative action in public employment, contracting, and education by constitutional amendment. This proposition would allow public entities in the state to use affirmative action, while not mandating that they do so.

To me, this seems like an appropriate cleanup of legislation that should not have been created through propositions.

17: Restore Right to Vote after Completion of Prison Term

✅ Yes

Currently, people who have completed a prison term must also finish parole to be allowed to vote again. I’m in favor of anything that expands the right to vote.

18: Permit 17-Year-Olds to Vote in Primary

✅ Yes

This proposition would allow underage voters to vote in a primary or special election if they would turn 18 by the date of the following general election. It seems like an easy way to increase young voters’ engagement.

19: Expand Elderly Property Tax Transfer and Reduce Property Tax Inheritance Loopholes

❎ No

This proposition allows homeowners who are over 55, disabled, or disaster victims to transfer their low property tax base (from Proposition 13) to their new primary residence without restrictions. (They must currently move within the same county and to a home with lower value.) They would be able to transfer their low property taxes three times instead of just once. This may reduce the cost of moving away from a high-demand area after retirement, freeing up housing supply, and provides a path for wildfire victims to move to a more defensible location.

Parents can pass their low Prop 13 tax rates to their children through inheritance. In exchange for giving the elderly an additional tax break, this proposition would limit such inheritance to the primary residence.

On balance, it seems like we’re not getting enough reform in exchange for increased tax breaks for older homeowners who don’t need it. Presumably, if they really wanted to move to a cheaper area, they could use the appreciation of their current home’s value to pay for the increased property taxes.

I also don’t like that this proposition is funded by realtors, who have spent over $40 million in cash contributions to “Yes on 19” committees. Realtors are already doing pretty well on the regulatory capture front.

20: Restrict Parole for Certain Non-Violent Offenses

❎ No

This proposition would limit parole for some non-violent offenses, such as shoplifting. It would also allow some misdemeanors to receive felony sentences. In addition, it disallows early release for child sex trafficking and felony domestic violence.

Passing this proposition would reverse recent progress toward reducing the prison population. Imprisonment doesn’t make sense for minor offenses, yet it is very expensive for taxpayers. Disallowing early release for certain crimes can be implemented through the normal legislative process.

The donors in favor, according to Ballotpedia, are mostly police officers’ associations, plus a $300,000 contribution from Devin Nunes.

21: Expand Rent Control

❎ No

The Costa–Hawkins Housing Act limited the ways cities can enact rent control. This proposition would amend it in the following ways:

Instead of limiting rent control to properties built before 1995, allow cities to enact rent control on any property that was built more than 15 years ago.
Allow “vacancy control,” where the city limits the increase in rent charged to a new tenant moving into a vacant unit. Currently, cities can’t place limits on the rent charged to new tenants.

Rent control raises housing prices in the long run by reducing new housing construction and discouraging renters from moving. In addition, there is no reason this needs to be a proposition, which makes it difficult to undo in case of unintended consequences in the future. Costa–Hawkins could be amended or repealed by the legislature instead.

22: Exempt App-based Transportation from Employee Benefits

❎ No

AB 5 classified many independent contractors as employees. The state later issued hundreds of exemptions, which means AB 5 basically only applies to gig economy apps. Now Uber, Lyft, and DoorDash are funding a proposition to exempt themselves too.

In exchange, drivers would receive a wage floor set at 1.2 times the minimum wage, compensation for work-related injuries, and health insurance contributions.

The proposition could be amended by seven-eighths of the legislature, rather than requiring another proposition. This is somewhat meaningless as the threshold is set so high that it is unlikely to be reached.

In my opinion, Uber, Lyft, and DoorDash employ a spectrum of drivers. Some are more employee-like while others are more contractor-like. I disagree with AB 5’s classification of all drivers as employees.

But I also don’t think a ballot proposition is the right way to address this issue. The regulation needs to change as we learn more about the economics of these operations. For example, Uber CEO Dara Khosrowshahi claimed that prices in San Francisco would rise by 20 percent if California drivers were employees. I’d like to run the experiment and see some evidence before making permanent changes.

Finally, Proposition 22 also acts as a referendum on whether $200 million in campaign contributions can allow an industry to write its own regulations in California. This is an absurd amount of money. To put it into perspective, every second YouTube ad in the past couple months has been paid for by the Uber–Lyft–DoorDash interest group. I don’t want to reward this type of behavior.

23: Requirements for Dialysis Clinics

❎ No

This is a rerun of 2018 Proposition 8. It is also the result of a dialysis clinic labor dispute. While it sucks to be on the same side as DaVita, the question of dialysis clinic regulation should be resolved through the normal legislative process.

24: Amend Consumer Privacy Laws

❎ No

This proposition would expand the GDPR-like California Consumer Privacy Act that passed in 2018. It’s not clear to me why this needs to be a ballot proposition. As I mentioned in the section about Proposition 22, I believe privacy regulation needs to change as we learn more about the effects of Internet services.

25: Referendum on Replacing Cash Bail with Risk Assessment Procedure

✅ Yes

The current cash bail system keeps defendants in jail if they are accused, but not convicted, of a crime and can’t come up with enough money. Being stuck in jail can cause further financial stress, such as losing one’s job, even for defendants who are ultimately proven innocent. It is a regressive tax.

SB 10 ended cash bail and replaced it with a system of risk assessments to decide whether a defendant should be released. Risk is assessed based on many factors, including whether the crime is violent and whether the defendant has a history of violence.

This proposition is a referendum on SB 10. A “yes” vote puts the (already passed) law into effect, while a “no” vote repeals the law. It was placed onto the ballot by interest groups representing the commercial bail bond industry.

San Francisco ballot measures

A: Health and Recovery Bonds

✅ Yes

Allows the city to issue $488 million in bonds to pay for parks, housing/drug services, and infrastructure. The bonds would be repaid by increasing property taxes.

B: Create Department of Sanitation and Streets

❎ No

This proposition would split operations and cleaning responsibilities into the Department of Sanitation and Streets, while the design and construction responsibilities remain with the Department of Public Works. It also creates two five-member oversight commissions, one for each department. Members would be appointed by the Board of Supervisors.

It’s not clear to me that this reorganization would fix San Francisco’s sanitation problems. For example, the argument in favor says that this will allow “data-driven cleaning.” But they haven’t shown why that practice is impossible to implement under the current organizational structure.

C: Remove Citizenship Requirement for Members of City Bodies

✅ Yes

Currently, one must be a registered voter and US citizen to serve on city boards and commissions. This proposition would remove that requirement.

San Francisco has a lot of immigrants and it’s a long, difficult process to receive US citizenship. There could exist many non-citizens who have great ideas on how to run our city.

D: Create Sheriff Inspector General and Oversight Board

✅ Yes

I’m deferring to SPUR’s analysis because I don’t know much about how we handle police misconduct currently.

This detail stuck out to me:

In May of 2019, the Sheriff’s Department entered into an agreement with the Department of Police Accountability (DPA) to investigate several existing high-profile allegations of misconduct

[…]

In August of 2020, the relationship between DPA and Sheriff’s Department was modified in an effort to strengthen the provision of oversight. A key addition to the agreement includes the ability for incarcerated people and the public to file complaints directly with DPA, as opposed to the sheriff assigning cases to DPA.

It seems better to codify the agreement between the Sheriff and DPA into law.

E: Remove Police Department Minimum Staffing Requirement

✅ Yes

The City Charter requires San Francisco to maintain at least 1,971 police officers. This proposition would remove the requirement.

Regardless of your opinions on policing, having a fixed headcount number is a nonsensical way to make staffing decisions.

F: Various Business Tax Changes

✅ Yes

Among other things, this proposition primarily eliminates the payroll expense tax in exchange for increasing the gross receipts tax. The city would make an additional $97 million per year.

According to SPUR and the San Francisco Chronicle, this is a surprisingly good tax reform since the payroll tax can discourage hiring, while the gross receipts tax is progressive to reduce the burden on small businesses. It’s part of a long-term shift away from payroll taxes.

G: Permit 16-Year-Olds to Vote on Local Issues

✅ Yes

I am generally in favor of expanding voting. This seems like an easy way to increase engagement among young people.

H: Expedite Planning Process in Commercial Districts

✅ Yes

San Francisco’s approval process is disgustingly bad. It makes sense to expedite this process and remove the notification requirement for some uses, reducing the barrier to entry for new businesses.

I: Real Estate Transfer Tax Increase

❎ No

The city levies a transfer tax on real estate sales. This proposition would increase the tax on property valued at $10+ million from approximately 3 percent to approximately 6 percent. According to SPUR, the transfer tax is a highly volatile income source because it depends on both property value and transaction volume. And San Francisco already has a high transfer tax rate.

Some proceeds would be spent on compensating landlords whose tenants didn’t pay rent due to COVID-19 hardships. In my opinion, this spending is dubious at best. Being a professional landlord is a speculative business: those who can’t weather a market downturn shouldn’t get into it.

J: School District Parcel Tax

✅ Yes

In 2018, voters passed a tax of $320 per parcel to fund the school district. The measure was challenged in court over the required voter threshold to pass, and until that dispute is resolved, the school district might not be allowed to spend the money raised by the tax.

This proposition reduces the tax to $288 per parcel. It also requires a two-thirds majority to pass, which would avoid the issue with the 2018 tax.

K: Authorize City-developed Affordable Housing

✅ Yes

Article 34 of California’s Constitution requires cities to receive voter approval before constructing low-income housing projects or paying private organizations to do so. This proposition authorizes San Francisco to construct up to 10,000 units under Article 34.

L: Tax Companies with Large Executive Pay Disparity

❎ No

This proposition increases the payroll and gross receipts tax rate of companies whose ratio of executive to worker pay for workers in San Francisco exceeds 100 to 1. The Controller notes that this would increase tax revenues by $60 to $140 million. SPUR notes that the increase is small enough not to affect the decision to do business in San Francisco and that companies may find other ways to compensate executives.

I am voting no because it seems to be difficult to enforce, while not bringing in much tax revenue or offering much of an incentive for more equal pay.

District ballot measures

RR: Caltrain Sales Tax

✅ Yes

The US-101 corridor is the most economically productive road in the United States. Caltrain is essential for moving workers between San Francisco and Santa Clara counties, which will keep the region productive after the pandemic. Caltrain currently faces a large budget shortfall from a decline in ridership.

This proposition would increase sales tax by 0.125 percentage points to fund Caltrain’s operations and upcoming electrification.

Speeding up code with vectorization

2020-04-17T01:42:28-07:00

I’ve been writing a lot of math code with latency requirements these days. When I talk to people about my problems, they usually suggest multithreading and general-purpose GPU computing.

These both have downsides. Multithreading might not be the best option in systems that already have a lot of concurrent workloads. And GPU computing adds the overhead of a round-trip copy to GPU memory, which might end up increasing latency depending on the problem size.

Vectorization can offer a way out. Modern CPUs have several copies of key components like arithmetic logic units (ALUs), allowing them to execute multiple operations in parallel — even on the same CPU.¹ Developers can take advantage of this functionality through special vector instructions.

For the rest of this article, I’ll use Clang 10 and Intel’s AVX2 extension in all examples. However, the same ideas apply to other compilers and architectures.

Here are the rules I’ve developed for vectorization:

Option 1: Make a library do it for you

If we use a library like Eigen, chances are it’s already vectorizing code for us. We can write maintainable, high-level code using Eigen’s Matrix type.

Let’s say we want to compute the Euclidean distance between two vectors:

#include <Eigen/Dense>

using MyVector = Eigen::Matrix<float, 16, 1>;

float Distance(const MyVector& a, const MyVector& b) {
  return (a - b).norm();
}

Eigen overloads the subtraction operator with template magic that automatically uses vector routines when they’re enabled.² On x64, passing -mavx2 tells the compiler that it’s safe to generate a binary that uses AVX2 vector instructions.³

We can now check the assembly to see whether Eigen vectorized for us. If you’re not familiar with reading x64 assembly, that’s okay. All you need to know in this step is:

Vector instruction names begin with the letter “v.”
Vector registers begin with “xmm,” “ymm,” and “zmm.”
Loops are implemented using branch (“b”) or jump (“j”) instructions.

Notice that the entire function uses vector instructions and doesn’t contain any loops despite working with 16-length vectors.

Eigen even decided to use the vsqrtss instruction instead of calling the slower std::sqrt, which has the appropriate error handling for negative inputs.

Distance(Eigen::Matrix<float, 16, 1, 0, 16, 1> const&, Eigen::Matrix<float, 16, 1, 0, 16, 1> const&):
  vmovaps ymm0, ymmword ptr [rdi]
  vmovaps ymm1, ymmword ptr [rdi + 32]
  vsubps ymm0, ymm0, ymmword ptr [rsi]
  vmulps ymm0, ymm0, ymm0
  vsubps ymm1, ymm1, ymmword ptr [rsi + 32]
  vmulps ymm1, ymm1, ymm1
  vaddps ymm0, ymm0, ymm1
  vextractf128 xmm1, ymm0, 1
  vaddps xmm0, xmm0, xmm1
  vpermilpd xmm1, xmm0, 1
  vaddps xmm0, xmm0, xmm1
  vmovshdup xmm1, xmm0
  vaddss xmm0, xmm0, xmm1
  vsqrtss xmm0, xmm0, xmm0
  vzeroupper
  ret

View on Compiler Explorer

Option 2: Make the compiler do it for you

Sometimes it’s not convenient to express an algorithm using an existing library, or maybe the code is small enough that it doesn’t justify the work to bring in another dependency.

In this case, we can write a normal loop and let Clang or GCC translate it into platform-specific vector instructions. The resulting code is extremely flexible, because it can be understood by anyone who writes C++, no library knowledge required. However, the compiler’s vectorizer isn’t as smart as library authors, and there are many situations where it doesn’t generate great code.

This code computes the same distance, but with a handwritten loop:

#include <array>
#include <cmath>
#include <numeric>

float Distance(const float* a, const float* b) {
  // Compute the squared differences between a and b.
  std::array<float, 16> diffs;
  for (int i = 0; i < 16; i++) {
    const float diff = a[i] - b[i];
    diffs[i] = diff * diff;
  }
  // Sum the squares.
  float dist = std::accumulate(diffs.begin(), diffs.end(), 0.0f);
  dist = std::sqrt(dist);
  return dist;
}

We can again read the assembly to find out whether the compiler could vectorize our code. Clang 10 figured it out, mostly.

It successfully vectorized the subtraction and multiplication to compute squared differences.
Unfortunately, Clang wasn’t smart enough to vectorize the summation of the array of squares, so it generated a scalar add (vaddss) instruction for each of the 16 items.
It still generates a branch with call sqrtf for the case where the sum of squares is negative, which can never happen here.

Distance(float const*, float const*):
  sub rsp, 72
  vmovups ymm0, ymmword ptr [rdi]
  vsubps ymm0, ymm0, ymmword ptr [rsi]
  vmulps ymm0, ymm0, ymm0
  vmovups ymmword ptr [rsp + 8], ymm0
  vmovups ymm0, ymmword ptr [rdi + 32]
  vsubps ymm0, ymm0, ymmword ptr [rsi + 32]
  vmulps ymm0, ymm0, ymm0
  vmovups ymmword ptr [rsp + 40], ymm0
  vxorps xmm1, xmm1, xmm1
  vaddss xmm2, xmm1, dword ptr [rsp + 8]
  vaddss xmm2, xmm2, dword ptr [rsp + 12]
  vaddss xmm2, xmm2, dword ptr [rsp + 16]
  vaddss xmm2, xmm2, dword ptr [rsp + 20]
  vaddss xmm2, xmm2, dword ptr [rsp + 24]
  vaddss xmm2, xmm2, dword ptr [rsp + 28]
  vaddss xmm2, xmm2, dword ptr [rsp + 32]
  vaddss xmm2, xmm2, dword ptr [rsp + 36]
  vaddss xmm2, xmm2, dword ptr [rsp + 40]
  vaddss xmm2, xmm2, dword ptr [rsp + 44]
  vaddss xmm2, xmm2, dword ptr [rsp + 48]
  vaddss xmm2, xmm2, dword ptr [rsp + 52]
  vaddss xmm2, xmm2, dword ptr [rsp + 56]
  vaddss xmm2, xmm2, dword ptr [rsp + 60]
  vaddss xmm2, xmm2, dword ptr [rsp + 64]
  vextractf128 xmm0, ymm0, 1
  vpermilps xmm0, xmm0, 231
  vaddss xmm0, xmm2, xmm0
  vucomiss xmm0, xmm1
  jb .LBB0_2
  vsqrtss xmm0, xmm0, xmm0
  add rsp, 72
  vzeroupper
  ret
.LBB0_2:
  vzeroupper
  call sqrtf
  add rsp, 72
  ret

View on Compiler Explorer

Preventing auto-vectorizer regressions

One downside of auto-vectorization is that it’s not obvious to the reader that the compiler is generating vector instructions instead of the usual loop. We can consider using compiler extensions to ensure that our code stays vectorized.

In Clang, we write #pragma clang loop vectorize(enable)⁴ above any loop we want the compiler to vectorize. The pragma also emits a warning if the code fails to vectorize. For example, someone might inadvertently prevent vectorization by adding a print statement inside the loop. To avoid merging such a change, we can configure our build system to treat this warning as an error.

LLVM’s auto-vectorization documentation has more information about the supported flags and use cases.
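
As a rough sketch of how the pragma sits in real code, here is the squared-difference loop from earlier with the hint applied. The function names PragmaSquaredDiffs and PragmaDistance are made up for this example, and the __clang__ guard is my own addition so that other compilers don’t complain about an unknown pragma.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Writes (a[i] - b[i])^2 into out[i] for i in [0, n).
void PragmaSquaredDiffs(const float* a, const float* b, float* out,
                        std::size_t n) {
  // Ask Clang to vectorize the loop below and to warn if it cannot.
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
  for (std::size_t i = 0; i < n; i++) {
    const float diff = a[i] - b[i];
    out[i] = diff * diff;
  }
}

// Euclidean distance built on top of the hinted loop.
float PragmaDistance(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> diffs(a.size());
  PragmaSquaredDiffs(a.data(), b.data(), diffs.data(), a.size());
  return std::sqrt(std::accumulate(diffs.begin(), diffs.end(), 0.0f));
}
```

I believe the warning belongs to Clang’s -Wpass-failed group, so -Werror=pass-failed should turn it into a build failure, but treat that flag name as an assumption and check it against your Clang version.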

Designing data structures for easier vectorization

When we vectorized with Eigen, the library’s organization around its Matrix type made vectorization easier. This is because all the values for each operand are stored in contiguous memory.

Writing normal C++ code doesn’t force us to comply with this constraint. We could have stored each pair of values from a and b as members of a struct:

struct Input {
  float a = 0.0f;
  float b = 0.0f;
};
std::vector<Input> my_inputs = ...;

View on Compiler Explorer

The vector my_inputs stores the values interleaved: a, b, a, b, and so on. These values will need to be deinterleaved when they are loaded into a vector register so that we can have one register which only contains values from a, and another with values from b.

In the best case, the compiler wastes a few instructions shuffling data around. In the worst case, the interleaved data prevents the compiler from vectorizing the code at all.
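
One way around this is to store each operand in its own contiguous array. The sketch below is only an illustration: InputColumns, Deinterleave, and ColumnDistance are hypothetical names, and whether the compiler actually vectorizes the loop still depends on flags and compiler version.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Interleaved layout from the text: a, b, a, b, ...
struct Input {
  float a = 0.0f;
  float b = 0.0f;
};

// Hypothetical alternative layout: all a values are contiguous in one
// array, and all b values are contiguous in another.
struct InputColumns {
  std::vector<float> a;
  std::vector<float> b;
};

// Copies interleaved inputs into the contiguous layout.
InputColumns Deinterleave(const std::vector<Input>& inputs) {
  InputColumns columns;
  columns.a.reserve(inputs.size());
  columns.b.reserve(inputs.size());
  for (const Input& input : inputs) {
    columns.a.push_back(input.a);
    columns.b.push_back(input.b);
  }
  return columns;
}

// Each iteration reads consecutive floats from a and from b, so a vector
// load can fill a register without any shuffling.
float ColumnDistance(const InputColumns& columns) {
  assert(columns.a.size() == columns.b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < columns.a.size(); i++) {
    const float diff = columns.a[i] - columns.b[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}
```

If the data starts out interleaved, converting once up front only pays off when the same inputs are processed many times.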

Option 3: Write vector code by hand

If all else fails, then we’re stuck with writing vector code by hand. This doesn’t mean writing assembly. Compilers offer built-in functions, most of which map directly to vector instructions.

The benefit of this approach is fine-grained control: for example, we can choose faster math instructions if our algorithm can tolerate an approximate answer. This comes at the cost of flexibility. We must reimplement the algorithm for each target architecture. We also need to rewrite it if we want the performance improvements offered by future instruction set updates.

The process usually starts with opening the Intel Intrinsics Guide to find the relevant functions. We can filter only instructions available on our target processor, then use search to find the ones (subtraction, multiplication, and square root) needed to implement our algorithm.

This handwritten code generates the same assembly as the first example:

#include <immintrin.h>

float Distance(const float* a, const float* b) {
  // Load the 16-length inputs into two 8-length registers each.
  // _mm256_load_ps requires a and b to be 32-byte aligned.
  __m256 a1 = _mm256_load_ps(a);
  __m256 a2 = _mm256_load_ps(a + 8);
  __m256 b1 = _mm256_load_ps(b);
  __m256 b2 = _mm256_load_ps(b + 8);

  // Take the square of difference of the first 8 items.
  __m256 diff1 = _mm256_sub_ps(a1, b1);
  diff1 = _mm256_mul_ps(diff1, diff1);
  // Take the square of difference of the second 8 items.
  __m256 diff2 = _mm256_sub_ps(a2, b2);
  diff2 = _mm256_mul_ps(diff2, diff2);

  // Sum the first and second differences.
  __m256 sum_diffs = _mm256_add_ps(diff1, diff2);

  // Split 8-length register into two 4-length registers.
  __m128 upper = _mm256_extractf128_ps(sum_diffs, 1);
  __m128 lower = _mm256_extractf128_ps(sum_diffs, 0);
  // Add the upper and lower sums.
  __m128 sum1 = _mm_add_ps(lower, upper);

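  // Swap the upper and lower two items. The casts only reinterpret the
  // register type for _mm_permute_pd and generate no instructions.
  __m128 sum2 = _mm_castpd_ps(_mm_permute_pd(_mm_castps_pd(sum1), 1));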
  // Add the swapped vectors.
  sum1 = _mm_add_ps(sum1, sum2);

  // Move the 2nd item into the 1st position.
  sum2 = _mm_movehdup_ps(sum1);
  // Add the swapped vectors.
  sum1 = _mm_add_ps(sum1, sum2);

  // Take square root.
  sum1 = _mm_sqrt_ps(sum1);

  // Extract the result to a scalar array.
  float result[4];
  _mm_storeu_ps(&result[0], sum1);
  return result[0];
}

View on Compiler Explorer

It’s a good idea to keep the scalar implementation around so that a unit test can verify that we didn’t introduce any bugs during manual vectorization.
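
A minimal sketch of such a check might look like the following. DistanceScalar, DistanceFn, and MatchesScalar are names invented for this example, and the tolerance is a guess that would need tuning for a real algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Any implementation that takes two 16-element inputs.
using DistanceFn = float (*)(const float*, const float*);

// Scalar reference implementation for 16-element inputs.
float DistanceScalar(const float* a, const float* b) {
  float sum = 0.0f;
  for (int i = 0; i < 16; i++) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Returns true if candidate agrees with the scalar reference on `trials`
// random inputs, within a relative tolerance.
bool MatchesScalar(DistanceFn candidate, int trials, float rel_tol) {
  assert(trials > 0);
  std::mt19937 rng(12345);  // Fixed seed keeps failures reproducible.
  std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
  for (int t = 0; t < trials; t++) {
    // 32-byte alignment so candidates may use aligned vector loads.
    alignas(32) float a[16];
    alignas(32) float b[16];
    for (int i = 0; i < 16; i++) {
      a[i] = dist(rng);
      b[i] = dist(rng);
    }
    const float expected = DistanceScalar(a, b);
    const float actual = candidate(a, b);
    const float scale = std::max(1.0f, std::fabs(expected));
    if (std::fabs(actual - expected) > rel_tol * scale) {
      return false;
    }
  }
  return true;
}
```

A relative tolerance is used instead of exact equality because a vectorized sum adds the squares in a different order, which can change the last few bits of the result.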

We can also consider a portable SIMD wrapper library like xsimd or MIPP. These libraries provide their own vector types, which the library maps to architecture-specific intrinsics. Keep in mind that libraries might not expose obscure operations that are only available on one architecture, so the portability can have a performance cost.

Conclusion

There are many ways to use vector instructions on modern processors. To write the most maintainable code possible for an application, try the options in this order:

Use an existing library, like Eigen or OpenCV. The code is extremely readable and fast, if the algorithm can be written in terms of the library’s high-level operations.
Use the compiler’s auto-vectorizer. The code is mostly readable and mostly fast, though complicated operations can confuse the compiler.
Write the vector code by hand. The code is tedious to write and maintain, but can express exactly what we want.

Superscalar processor, Wikipedia. ↩
For more information on how Eigen is implemented, check out What happens inside Eigen, on a simple example. ↩
See Eigen Vectorization FAQ for details on how to enable vectorization. ↩
Auto-Vectorization in LLVM. ↩

!!Con West 2019 Notes

2019-02-23T16:51:21-08:00

!!Con is a conference held every spring in New York City. It’s two days of lightning talks that can be about anything related to computers!

At the beginning of last year’s !!Con, I wrote:

This conference is a great showcase of the diverse backgrounds of the NYC tech scene. I’m really going to miss it when I move back to the Bay Area.

Fortunately, I spoke too soon! This year, we got !!Con West, held at UC Santa Cruz, which might be the UC campus with the most redwood trees. Here are my notes.

Day 1 Keynote
- The Best Parts! Of My Favorite Things!
Session 1
Day 2 Keynote
- Glitch Nuggets of Resistance!
Session 5
Session 6
Session 7
Session 8

Day 1 Keynote

The Best Parts! Of My Favorite Things!

Lynn Cyrin

Version control
- Branches like having three thoughts at once
Semantic versioning
- Quickly tell which releases are likely to be broken
Public and private APIs
- Progressive disclosure: read public APIs to understand quickly; private when you need to know the details
Infrastructure as code
- Communicate changes by diffing code
Automated documentation generators
- “Trick programmers into writing more” because you just write directly into the code
Copy paste
- Put all the potentially copyable code into a repo so you can copy-paste them in the future
Autocomplete
Postgres
- Document model and relational model
- Popular so you can always get help
Package Managers
- “Automated copy pasters”
- Node business cards: “websites but for terminals”
Web micro-frameworks
- Just do one job and not try to take over the whole project
Web (macro-)frameworks
- When you need most of the website done for you
Error contexts
- Good messages save time
Containers
Programming languages
- Python: “My literal mom”
- Ruby on Rails: Makes your code look cool
- JavaScript: Learning async
- Rust: learn memory by fighting the borrow checker (hope you don’t have a deadline)
- Golang: binaries in regular build process
Conclusion: tell your friends about your favorite things

Session 1

IMUs FTW!! Building IMU-based gesture recognition!

Jennifer Wang

Harry Potter gesture recognition wand
Compute to run the algorithms
- Raspberry Pi is big but runs TensorFlow, easy to prototype
- Cannot run Python on Arduino
In hardware, experimentation costs money instead of just time

EarthBound’s almost-Turing-complete text system!

Alex Rasmussen

EarthBound’s text system
- EarthBound is a Super Nintendo role-playing game
- Looks like normal game with normal text options
- Actually a tiny virtual machine that’s quite complicated for its job
Instruction set
- Registers: working, argumentary, secondary. 4 bytes each. 2 sets of registers that can be swapped.
- Normal instructions: text input/output, set flags, jump if flag set, jump and return
- Game-specific: set HP/level of players, movement in the game world, show sprites
- Extremely CISC: summon bicycle, summon photographer, teleport
Why study it?
- Mod the game
- Make a patch that turns it into a totally different game
- Build high-level language that compiles to this virtual machine

`/etc/services` is made of people! (and also ports!)

Breanne Boland

List of services, their ports, protocols, and… email address?
- Email addresses are the people who wrote the RFCs to reserve those ports for their services
API for getting service names: can map back and forth
- telnet lets you use both 22 and ssh
How do you get into /etc/services?
- Becoming less well known because many services restrict themselves to 80 and 443 to get through firewalls
- About 400 ports left – not too late!
- Fill out a form

“Wheels within whiles!” or possibly “Whiles within wheels!”

Michael Albaugh

“I’m about the same age as the stored-program electronic computer”
“Could the analytical engine emulate the difference engine?”
- Difference engine: stack of adding machines for producing math lookup tables of 7th order polynomial approximation
- Analytical engine: more complex, closer to modern computers
Why emulate?
- PowerPC Mac to Intel: run your old software on your new computer
- To learn about old computers or games that no longer exist
- As an easier way to study old computers even if they exist
- Try out future designs

Sadly, I had to leave early on the first day.

Day 2 Keynote

Glitch Nuggets of Resistance!

VJ Um Amel

Data body (as opposed to real body): made of SSN, GPA, SAT score, immigration papers, …
- “VJ Um Amel” is the name of Laila’s data body, which is her performative project over the past few years
Media theory and practice professor
Design
- Studies the artificial world
- Concerned with suitability for a purpose
Arab Spring: young people designing society through social media
- Group of young Arab techies developing things
Glitch: problem in a computer system
- Maspero incident: protesters interrupted (“glitched”) Egyptian state’s propaganda broadcast

Session 5

Guiding a starship with noise! And blinking!

Simon Porter

How does New Horizons navigate to take pictures of things?
- 1.6 kbps downlink best case, 0.5 kbps worst, 6 hours one way
Picked the first Kuiper Belt Object they saw
Determining the flyby parameters
- Need to preprogram the flight plan
- Spacecraft location from radio tracking
- Onboard optical navigation to control perpendicular motion
- Don’t know the range! Guess based on limited orbit data for the object
Using Hubble images to find the orbit
- Combine uncertainty of Hubble direction & uncertainty of object location in image
- Combine multiple observations from many amateur telescopes on earth to get the size uncertainty down to kilometers

The secret life of Not-a-Number!

Annie Cherkaev

IEEE 754 allows implementers to provide diagnostic information
NaN:
- Set all exponent bits to 1
- For the mantissa, set the first bit to 0 if it’s a signaling NaN, which should cause an exception when touched
- Other bits can provide diagnostic information (51 bits of space for a double!)
Use case: JavaScriptCore NaN Boxing
- JavaScript needs to track the types of variables
- If valid floating point, do the computation
- Otherwise, look at the NaN bits which tell the type of the variable
  - Booleans will fit because only have two values
  - Pointers: 48 bits max, so it works
  - Integers: JavaScript arrays can only have 2³² values so they can be indexed by a 32-bit integer
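
A small sketch of the bit layout described above, assuming 64-bit IEEE 754 doubles. MakeNaN and NaNPayload are hypothetical helpers written for these notes; this is not JavaScriptCore’s actual encoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// A double has 1 sign bit, 11 exponent bits, and 52 mantissa bits.
constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
constexpr std::uint64_t kQuietBit = 0x0008000000000000ULL;     // First mantissa bit.
constexpr std::uint64_t kPayloadMask = 0x0007FFFFFFFFFFFFULL;  // Other 51 bits.

// Builds a quiet NaN that carries `payload` in its low 51 mantissa bits.
double MakeNaN(std::uint64_t payload) {
  assert((payload & ~kPayloadMask) == 0);
  const std::uint64_t bits = kExponentMask | kQuietBit | payload;
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Reads the 51 payload bits back out of a double.
std::uint64_t NaNPayload(double value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits & kPayloadMask;
}
```

The sketch sets the first mantissa bit to make a quiet NaN, so the value can be copied around without raising an exception.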

Hacking Lego! Computer generated Lego instructions!

Michael Knowles

Likes working with Lego, but unfortunately it doesn’t pay well
Lego mosaics: building low bit depth images with Lego
- Can we do this in 3D?
- Build a tool to convert models to Lego using voxel grid
- Merge voxels with the same color into taller bricks
- Export each layer to Lego CAD software

The world’s first racing-the-beam ray tracer on discarded FPGA hardware!!!

Tom Verbeure

Pano Logic G1 thin client
- Had no CPU! Everything is “in hardware” (FPGA)
- Very cheap because the company went out of business
Reverse engineer the board & want to render graphics on it
- “When you buy a big GPU, you have this urge to make it work at capacity. It doesn’t matter what it does.”
- Racing the beam: Atari frame buffer had only two lines & had to render one line while the other got sent to the CRT
- Prototype in C code to count how many operations needed per pixel and know how much hardware you need

Session 6

Robots, rockets, and more! Control theory in 10 minutes

Wesley Aptekar-Cassels

Control theory is everywhere, even non-obvious applications like queue length in a network

Minimax search and the structure of cognition!

Zack M. Davis

Minimax search
- Could look at all possible moves & pick the one that results in the best board position
- For better long-term decision making, also search the opponent’s best possible responses, then your best possible responses, …
- Simple algorithm that implicitly encodes many chess behaviors
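
The search described above can be sketched on a toy game tree. Node and Minimax are names invented for these notes; a real chess engine would generate moves and score positions instead of reading a prebuilt tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical game tree: a leaf holds a score from the first player's
// point of view; an inner node holds the positions reachable in one move.
struct Node {
  int score;
  std::vector<Node> children;
};

// Returns the best score the player to move can guarantee, assuming the
// opponent also plays their best response at every level.
int Minimax(const Node& node, bool maximizing) {
  if (node.children.empty()) {
    return node.score;
  }
  int best = Minimax(node.children[0], !maximizing);
  for (std::size_t i = 1; i < node.children.size(); i++) {
    const int value = Minimax(node.children[i], !maximizing);
    best = maximizing ? std::max(best, value) : std::min(best, value);
  }
  return best;
}
```

The players alternate at each level of the tree, which is why the recursive call flips the maximizing flag.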

Postgres plays Pokémon!

Liz Frost

Query your Pokémon in Postgres
- Read the Gameboy emulator memory in a Postgres extension
- Works because Pokémon info is stored at known addresses

Software patterns… from the 9th century?!!

Michael Arntzenius

70s garbage collection paper: once half of memory is used, copy all the live objects to the other half of memory
Medieval farmers: split field into two parts, alternating back and forth to allow farmland to recover
- Why should it be half? Why not plant 2/3?
- Dependencies of this change: plant legumes to restore nitrogen, deeper plows, horses, oats for horses…which needs better farm productivity
- Cycle in dependency graph
Not just a technological change — important social considerations
- Rearrange field ownership to allow longer fields for easier plowing
- People don’t want to make a change that risks starvation
- Our technology can support many things, but people don’t always want to make big changes

Session 7

How to throw out 95% of pixels in virtual reality, without anyone noticing!!

Amrita Mazumdar

95% of the eye’s photoreceptors are concentrated at the fovea
In virtual reality, your fovea covers a small percentage of the screen size (compared to phones)
Use neural net to guess where people might look, then compress everything else

How to calculate the phase of the moon very, very badly!

André Arko

Calendar app for werewolves required calculating the phase of the moon
Look up the day of a full moon, then add 27.321661 days every time
- Worked for a while!
- 2 years later, it was about 3 days wrong
- Using the wrong number & can’t use the average
User interface
- 28 moon icons, so just bucket the floating point moon phase
- Multiple days could have full moon icon depending on rounding
- Solution: assign full moon icon to the day that contains the full moon

Value Your Types!

Eric Weinstein

Dependent type: a type whose definition depends on a value
- Example: list of integers where each value is larger than the value before it

The Conjuring: ransomware edition!!

Pranshu Bajpai

Goal: show that demonic possession (e.g. in The Conjuring) is similar to ransomware
- Initial entry: always looking to (possess, infect) vulnerable hosts
- Needs a unique (item, encryption key) to perform the ritual
  - Symmetric key: malevolent entity uses a secret word to possess host, where the same key is used to decrypt
  - Some ransomware uses the date and time to generate keys, which are easy to guess
- (Demons, malware authors) don’t like (exorcists, malware analysts)
Ransomware developers sell their malware to ransomware operators, either one-time fee or revenue share

Session 8

Observability in the Kitchen: Improve Your Breadmaking Skills with Open-Source Monitoring!!

Daisy Tsang

Sourdough healthier than regular bread, but requires long fermentation process (about 1 day) and could mold if you’re not careful
Prometheus: open source system monitoring server written in Go
Raspberry Pi with temperature & humidity sensors
Write exporter for sensor to Prometheus
Highly over-engineered sourdough starter process!

Computers are fast! But how come they sometimes feel slow?

Mike Lazer-Walker

Latency matters to making computers feel instant, letting people use computers more effectively
Input latency of a game depends on many things: human reaction time, keyboard interrupt, OS, application drawing stuff, display buffer & response

My, my, TTY!

Tabitha Sable

1960s–70s: paper teletypes let you write on one typewriter and print on another connected typewriter
1970s: UNIX written on PDP-11; login always local because teletype was directly wired to server
1970s–80s: video display terminals (“glass teletypes”) emulated paper teletypes with more features, like control sequences, introducing complexity
- BSD shipped database of terminals and supported features
1980: BSD added curses library, allowing easy GUI building in terminal
Late 1980s: Stanford University Network workstation (or SUN for short), Macs, Windows PCs, X11 killed video terminals by bundling terminal emulator apps
- New features: setting title, colors, mouse clicks
- “If you absent mindedly click on the menu, it actually works, which was kind of horrifying to me the first time it happened”
People switch from telnet to SSH
Stuff we use today is influenced by a long history
- Bell used paper teletypes, which is why they made ed (actually a good editor when you have paper)
- Berkeley bought video terminals, which is why we got vi

Supporting macOS Mojave’s Dark Mode on the web

2018-10-25T23:51:59-07:00

macOS Mojave adds a Dark Mode for native apps that makes you look approximately 78 percent cooler when using the computer. In Safari Technology Preview 68, it’s now available on webpages too! Here’s how I added support to this website.

Download video

Using the `prefers-color-scheme` CSS media query

The release notes mention a new CSS media query for Dark Mode without saying how to use it. We can try to use a unit test from the WebKit repo as sample code instead.

In revision r237156, a test named prefers-color-scheme.html looks promising. It shows that the new prefers-color-scheme media query can either be light or dark.

Let’s assume our CSS already looks good in light mode. We can use the media query to add overrides for some rules in Dark Mode:

/* The CSS rules we had before. */
body {
    line-height: 1.2em;
    color: black;
    background: white;
}
@media (prefers-color-scheme: dark) {
    /* Overrides for Dark Mode. */
    body {
        color: white;
        background: black;
    }
}

We can even put CSS variables in the media query. Unfortunately, it’s a pretty cutting edge feature: as of October 2018, only 91 percent of US web traffic supports it.¹ That’s not enough for something as fundamental as setting the colors on a website.

Why don’t we use the media query for light mode too? Most browsers don’t know about prefers-color-scheme yet. If we had enclosed the light mode rules with @media (prefers-color-scheme: light) { }, none of the CSS rules would apply in those browsers. Our styles would only show up in Safari!

Designing for Dark Mode

In the code above, we simply swapped the text and background colors. But that isn’t enough to make your site look great in Dark Mode. Apple’s WWDC talk Introducing Dark Mode is a fun, lightweight video that outlines their design philosophy and offers helpful app design tips. The same rules apply to websites.

Here are my main takeaways (but you should really watch the video because the presenter is more eloquent):

Since text becomes white, links and other colored text should also become lighter.
- Similarly, visual cues that normally become darker (like an active button) should become lighter in Dark Mode.
Don’t mindlessly flip all the colors — not everything looks good inverted.
Dark Mode is supposed to let the content shine, so don’t darken or invert things like images.
Redraw icons to fill in areas that should be white.

Optional: Add more JavaScript

On this website, I already had a dark mode for the photo gallery. My site generator would create the gallery page with <body id="dark"> to trigger the dark CSS rules:

/* How Kevin's photo gallery works. */
body {
    line-height: 1.2em;
    color: black;
    background: white;
}
body#dark {
    /* Overrides for dark mode. */
    color: white;
    background: black;
}

I wanted to use the CSS rules I already had — duplicating all the rules from body#dark into a media query would be tedious and error-prone, especially since the media query is an experimental feature.

Experts agree that any computer science problem can be solved by adding more JavaScript. So I’ll listen to changes in the media query using JavaScript, then modify the <body> tag to match.

First, do the media query:

var mql = window.matchMedia('(prefers-color-scheme: dark)');

The mql.matches flag will be true when Dark Mode is set. Add a callback to mql that runs when the media query changes. (We don’t have to animate the transition. The system captures a screenshot of the entire desktop before the transition and gracefully fades to the new appearance.)

function setDark(e) {
    document.body.id = (e.matches ? "dark" : "");
}
mql.addListener(setDark);

So far, our code only runs when the Dark Mode setting changes. We also need to set the initial light/dark state when the page loads:

document.addEventListener("DOMContentLoaded", function() {
    setDark(mql);
});

Here’s the whole thing:

var mql = window.matchMedia("(prefers-color-scheme: dark)");
function setDark(e) {
    document.body.id = (e.matches ? "dark" : "");
}
mql.addListener(setDark);
document.addEventListener("DOMContentLoaded", function() {
    setDark(mql);
});

That’s it!

Can I use…CSS Variables? ↩

What I learned in 2017

2018-06-08T00:00:00-07:00

At the end of 2017, I wrote down a few important things I’d learned that year. And now that we’re more than halfway through 2018, I decided to stop procrastinating and flesh out the details. Some of these things are probably obvious to you, but they were new to me!

Being right is worth nothing

School rewards us for being right: for knowing the right facts and using them to come to the right conclusions. I personally also put a lot of value on being right. But that’s not how the world works.

People might value the emotional aspect of a decision. They might prefer not to change because of inertia, or because the matter at hand isn’t very important. They might not want to admit that they were wrong. There are many other things people consider before the hard facts.

So it’s not always useful to point out the facts. Things often turn out better if I don’t. And in situations where correctness matters, because being right is worth nothing, being right needs to be combined with something valuable, like an emotionally intelligent presentation.

And even if other people don’t value being right, I can still value it if that makes me happy.

Curiosity before criticism

In a new situation, I find it easy to fall into the trap of pointing out things that are wrong. But it doesn’t help me learn. And the people who can change things usually aren’t receptive to hearing feedback in this way, because being right is worth nothing. (That saying really does apply to everything.)

I can rewrite each criticism of the format “you shouldn’t do this” into a question — “why is it done this way?” This changes the tone from confrontational to collaborative, which leads to better conversations.

Everyone’s a role model

Over the summer, I watched an interview with Michelle Obama where they asked her how to bring about change without influence. She pointed out that everyone already has lots of influence. It doesn’t seem obvious, but we each have people who are silently watching and learning from us.

It sounded corny then, and it still does now. But it’s true.

As far back as I can remember, there have always been examples of people who were watching and learning from me. This seems to be true for my friends too. It means that my words and behavior have more potential for positive or negative consequences than I think.

Vulnerability leads to closeness

This seems obvious in the context of one’s personal life, but I’d never considered applying the same idea to the workplace.

I had the opportunity to work on a really special team last year. Members of this team were comfortable with sharing just about anything over lunch: the usual workplace gossip, sure, but also life philosophy, deep-seated anxieties, horror stories of weekend trips gone wrong…you name it, we talked about it.

Sharing so much about yourself in front of people that you don’t know very well takes a lot of courage. But bringing your whole self to work also builds trust and connection. Watching this happen in an unexpected environment (the workplace) helped me realize how well it works.

Conclusions

I wrote these ideas out individually. But looking back now, an unexpected thread ties these lessons together: I learned all of them at work. I was super lucky to have worked alongside great people who taught me things in the context of the workplace that also apply to life in general.

If you’d told me these things at the beginning of 2017, I would’ve agreed with all of them. But it took the specific experiences of the past year to make me really understand.

!!Con 2018 Notes

2018-05-12T14:16:17-07:00

!!Con is a conference held every spring in New York City. It’s two days of lightning talks that can be about anything related to computers!

This conference is a great showcase of the diverse backgrounds of the NYC tech scene. I’m really going to miss it when I move back to the Bay Area.

Day 1 Keynote
- { }
Session 1
Session 2
Session 3
Session 4
Day 2 Keynote
- Build skills through hobbies! Bring them to work!
Session 5
Session 6
Session 7
Session 8
- The itty, bitty, tiny bytes that make up a Pokémon!
- Pseudofractals! Accidental aesthetics where math meets pixels

Day 1 Keynote

{ }

Mimi Onuoha

Exploring the implications of machine-readable world through art & programming
“Embedded in every technology there is a powerful idea” —Neil Postman
- Oral culture: prioritize memory.
- Written culture: prioritize logical organization of thought.
- Digital culture: prioritize data!
Obsession with data as the object
- Does it mean what you think it means?
- Hard to tell when you don’t have context
Pathways
- Tracked location data of several groups for 2 weeks
- Met in person to collect: make the collection relationship explicit, forcing people to think about it
Unable to collect? Common reasons of missing data
- Those with resources to collect don’t have an incentive to know
- Resists “metrification”: things are hard to categorize or don’t inherently generate data
- Nonexistence benefits or protects someone
Art piece: cabinet of missing datasets
- Filing cabinet with empty folders labeled with names of datasets that don’t exist
- Flip through all the things we don’t know

Session 1

Telling stories with traceroute!

Karla Burnett

How traceroute works
- Packets have a time-to-live (TTL) to prevent getting stuck in routing loops
- traceroute is a hack that sends packets with different TTLs to make different routers along the path respond
Telling stories through traceroute
- Send fake TTL expired messages from spoofed IPs
- Reverse DNS maps IP to hostname, but the hostname doesn’t have to be real
- Each line of the story needs a new IP: $3 per month
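
To make the TTL trick concrete, here is a toy simulation with no real networking. Probe and Traceroute are hypothetical names, and the path is just a list of labels; real traceroute sends actual packets and the destination replies differently from the routers along the way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the hop that answers a probe sent with the given TTL. Each hop
// decrements the TTL, and the hop that brings it to zero replies with a
// "TTL expired" message. If the TTL outlasts the path, the last hop
// (the destination) answers.
std::string Probe(const std::vector<std::string>& path, int ttl) {
  assert(ttl >= 1 && !path.empty());
  for (std::size_t hop = 0; hop < path.size(); hop++) {
    ttl--;
    if (ttl == 0) {
      return path[hop];
    }
  }
  return path.back();
}

// Sends probes with TTL 1, 2, 3, ... and records who answered each one.
std::vector<std::string> Traceroute(const std::vector<std::string>& path) {
  std::vector<std::string> seen;
  for (std::size_t ttl = 1; ttl <= path.size(); ttl++) {
    seen.push_back(Probe(path, static_cast<int>(ttl)));
  }
  return seen;
}
```

In this simplified model, the storytelling hack amounts to one machine answering every probe itself and choosing a different label, backed by a different spoofed IP, for each TTL.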

Tales of ⌧! Can You Tell Your Story When Your Character Is Undefined?!

Persa Zula

Small boxes that show up in text when characters not defined
- [ ]: “tofu,” so [X]: “not tofu”
Displaying characters
- Characters are encoded as codepoints in Unicode
- Font’s cmap table maps codepoints to glyph identifier

Turning Google Earth into SimCity 2000! (From Light to Pixels to Impossible Perspectives!)

Logan Williams

Perspective projection
- Farther objects are smaller
Orthographic projection
- Satellite photo pointing sideways has no perspective: all lines are parallel
- Used in drafting, art, and SimCity 2000
- Only possible when image sensor is same size as object (flatbed scanner)
- In rendering software, set field of view very small so input rays of light are almost parallel
Doing this in real life?
- Fly over the scene and take the same row of pixels from each frame
- Ensures all pixels formed by light coming from the same direction
- Produce images that represent time in addition to space
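
The "same row from each frame" idea can be sketched with nested lists standing in for video frames. The `slit_scan` name and the tiny frames are mine, not from the talk.

```python
def slit_scan(frames, row_index):
    """Build an image from one fixed row of every frame.

    frames: list of frames, each a list of rows, each row a list of pixels.
    Row k of the output comes from frame k, so the vertical axis is time.
    """
    return [list(frame[row_index]) for frame in frames]

# Three tiny 2x3 "frames" standing in for video from a moving camera.
frames = [
    [[0, 1, 2], [3, 4, 5]],
    [[10, 11, 12], [13, 14, 15]],
    [[20, 21, 22], [23, 24, 25]],
]
image = slit_scan(frames, row_index=1)
```

Because every output row was captured at a different moment, anything that moves during the flyover gets stretched or squashed in the final image.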

We built a map to aggregate real-time flood data in under two days!

Aruna Sankaranarayanan

Chennai floods in 2015
- Bad maps made it hard to communicate which roads were navigable
- Google only lets you annotate locations, not streets
- Used MapBox to crowdsource mapping during the floods
- 600,000 people used the software, showing that simple tools can be really useful

Session 2

Moving towards dialogue: collaborating with your computer using typed holes!

Vaibhav Sagar

Typed hole: something where you know the type but not the contents of the value
- Can specify in Haskell using underscore _1, _2
- Type inference tells you what the type should be
- In Idris, a more powerful type system also includes list length, so type inference can even help you write snippets of code
Untyped holes: don’t know what the type is, but still want help writing program
- Open research topic

Compressing the Library of Babel!

David Turner

Compressing the infinite library described in Borges’ short story
gzip?
- Works by finding identical sequences and inserting references back to them
- On Babel: compression ratio is 1:1 because of pigeonhole principle
Write a program?
- Kolmogorov Complexity = size of compressed data + size of decompressor
- Generation program only a few lines of Python
- Suppose some pages are missing: pigeonhole principle means representing missing books requires storing entire book
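
Here is a small sketch of the pigeonhole argument, plus a sanity check with `zlib` (which uses DEFLATE, the same algorithm as gzip). Random bytes stand in for the library's pages, which is a simplification of the story, and the page sizes are arbitrary.

```python
import random
import zlib

def count_shorter_strings(n_bits):
    """Number of bit strings strictly shorter than n_bits (lengths 0..n-1)."""
    return sum(2 ** k for k in range(n_bits))

# Pigeonhole: there are 2**n inputs of n bits but only 2**n - 1 shorter
# outputs, so no lossless compressor can shrink every input.
n = 16
shorter = count_shorter_strings(n)

# Compare random bytes against highly repetitive text.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_page = bytes(rng.getrandbits(8) for _ in range(20_000))
repetitive_page = b"abc" * 6_000

random_ratio = len(zlib.compress(random_page)) / len(random_page)
repetitive_ratio = len(zlib.compress(repetitive_page)) / len(repetitive_page)
```

The repetitive page shrinks to a tiny fraction of its size, while the random page does not shrink at all.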

It’s super effective! Solving Pokémon Blue with a single, huge regular expression

Alex Clemmer

Pokémon Blue
- Little boy leaves home to wander the world and catch Pokémon
- Mostly wandering around in the grass (very tedious)
- Create a regex which accepts iff the moves are a winning game
How to solve?
- Pokémon Blue is a finite state machine
- Equivalence between FSM and regex (according to “nice book” by “this guy” Michael Sipser)
- Simplify Pokémon to use ASCII world maps
- World map → FSM → regex
Differences from actual Pokémon game
- Items can have state, player interaction
- Players can hold items
- Game has random encounters (use probabilistic regex)
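
To make the FSM-to-regex equivalence concrete, here is a toy world much smaller than anything in the talk: two tiles, start on the left, goal on the right. The map, the state names, and the hand-derived regex are all my own illustration.

```python
import itertools
import re

# Toy world: two tiles in a row, start on the left, goal on the right.
# "R" moves right, "L" moves left; walking into a wall does nothing.
TRANSITIONS = {
    ("left", "L"): "left",
    ("left", "R"): "right",
    ("right", "L"): "left",
    ("right", "R"): "right",
}

def fsm_accepts(moves):
    """Run the state machine and accept if we finish on the goal tile."""
    state = "left"
    for move in moves:
        state = TRANSITIONS[(state, move)]
    return state == "right"

# Regex derived by hand from the machine: any moves, then a final "R".
WINNING = re.compile(r"[LR]*R")

def regex_accepts(moves):
    return WINNING.fullmatch(moves) is not None

# Check that both agree on every move sequence up to length 8.
all_agree = all(
    fsm_accepts("".join(m)) == regex_accepts("".join(m))
    for n in range(9)
    for m in itertools.product("LR", repeat=n)
)
```

A real map has far more states, so the equivalent regex gets huge, but the construction is the same idea.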

Transform live video streams with code and a REPL!!

Mark Wunsch

Live coding: code improv! Code as the user interface, applied to video streaming.
Use Racket language and GStreamer multimedia framework
Write in Racket REPL to manipulate video stream, draw stuff, PIP

Session 3

Creating an Arabic Programming Language!

Ahmed Abdalla

Learning to program requires English proficiency
- Not just keywords, but also error messages & documentation
- English proficiency requires privileged backgrounds in some countries
Noor: Imperative, Algol-style programming language for kids
- Keywords are simple & informal so kids understand
- Used his Sudanese dialect because languages reflect the aesthetics of their creators
Arabic editor challenges
- Bidirectional language is overall right-to-left, but English words within Arabic text are left-to-right
- Ligatures required in the language, not just aesthetic

Evil Twins and the Secret Lives of Linkers!

Josh Bowman-Matthews

Symbol defined twice in different .o files. What happens when you link?
- If functions are not inline, linker complains about duplicate symbols
- Inlining makes a copy of the function inside each .o file
- Therefore, linker is allowed to pick one arbitrarily & not complain
Don’t accidentally write the same function twice

Satellites are talking to us! Let’s hear them out!

Ed Medvedev

“There are only a handful of things cooler than satellites”
Radio amateur satellites: tiny satellites for ham radio
Listening to satellites
- You will need radio dongle for computer & antenna
- Look up online when the ISS will be overhead
ARISS: Amateur Radio on ISS
- Runs BBS so you can chat with people relayed through space
- Broadcast images & talk to astronauts sometimes

The joys of PICO-8 token crunching!! Or: what I learned about programming from being restricted on every side!

Ayla Myers

PICO-8: game engine with retro graphics & sound, but modern programming
- Artificial limits on bit depth, memory, code size (tokens), …
- Built-in editors for everything
Economics of the PICO-8 device
- Trade-offs: smaller sprite sheet leads to more complex code
- “Spent” code tokens to “buy” more sprites
- Can draw PPF curve for tradeoff between e.g. code complexity & number of sprites

Session 4

UX for Cats and Dogs!

Joel Potischman

Problem: daughter moved to college but misses cat
- Manually send cat photos
- Can the cat send us selfies?
Solution
- Train cats to come eat when the sound is played
- Take a picture of cats while they wait for food
- Feed the cats

If you could solve this word tile puzzle, you could solve the halting problem! (Too bad you can’t!)

Kamal Marhubi

Tile puzzle (Post Correspondence Problem)
- Tiles have symbols on top and bottom
- Goal: order the tiles so that top and bottom say the same thing (can repeat tiles)
- Not possible to tell whether a solution exists
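
A bounded brute-force search shows what "solving" the puzzle means. This is a sketch with my own function names and a small textbook-style set of tiles. The `max_tiles` bound is the catch: no bound is large enough to rule out a solution in general.

```python
from itertools import product

def solves(tiles, order):
    """True if the chosen tiles spell the same string on top and bottom."""
    top = "".join(tiles[i][0] for i in order)
    bottom = "".join(tiles[i][1] for i in order)
    return top == bottom

def search(tiles, max_tiles):
    """Try every ordering (with repeats) of up to max_tiles tiles."""
    for length in range(1, max_tiles + 1):
        for order in product(range(len(tiles)), repeat=length):
            if solves(tiles, order):
                return order
    return None  # nothing found within the bound; a longer solution may exist

# Each tile is (top, bottom).
tiles = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
solution = search(tiles, max_tiles=4)
```

Returning `None` only means "not found yet," which is exactly why this search can't decide the problem.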
Reducing Turing machine to tile puzzle
- Make a tile from each state somehow
- ???
- Solving the tile puzzle is the same as running the Turing machine
Wikipedia has a really unclear explanation
- “Some things are actually hard, and some things just have too many Greek letters”

So THAT’S how my phone knows where I am!

Mike Lazer-Walker

Positioning throughout history
- Latitude is easy because you can look at the sun
- Longitude much harder, so people dead reckoned based on speed and time
- Solved with accurate clock that allowed using time zone as a proxy for distance
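
The clock trick is just arithmetic: the Earth turns 360 degrees in 24 hours, so each hour of time difference is 15 degrees of longitude. A minimal sketch, with a function name of my own and ignoring small corrections like the equation of time:

```python
DEGREES_PER_HOUR = 360 / 24  # the Earth turns 15 degrees every hour

def longitude_from_clock(local_noon_on_reference_clock):
    """Longitude in degrees (east positive) from a clock kept on reference time.

    local_noon_on_reference_clock: the reading, in hours, of a clock set to
    the reference meridian at the moment the sun is highest locally.
    """
    hours_behind = local_noon_on_reference_clock - 12.0
    return -hours_behind * DEGREES_PER_HOUR

# Local noon happens when the reference clock reads 17:00, so we are
# 5 hours behind the reference meridian: 75 degrees west.
lon = longitude_from_clock(17.0)
```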
Urban canyon problem: GPS error gets really bad in cities
- Skyscrapers bounce the signal, messing with the distance

Day 2 Keynote

Build skills through hobbies! Bring them to work!

Liz Fong-Jones

Many years ago, a disowned trans woman who “almost didn’t make it”
Growth comes from learning new skills
Playing Puzzle Pirates game (2004–2008)
- Some people made new players work without teaching the game
- Learning mentorship: encouraging people to help newcomers by rewarding them
- Organizing people: conquering an island requires multiple levels of coordinators to solve puzzles & distribute booty as reward
- Being active on community forums and building diagnostic tools led to her first tech job at Puzzle Pirates!
World of Warcraft (2008–2012)
- Building up new players’ skills to have a stronger team in next raid
- Not all leaders are blameless after the team loses
Eve Online (2012–present)
- After fleet commander died, there was no communication & everyone else died
- In incident response, “any call is better than no call”
- Value your trustworthiness: not scamming in the game pays off when other players trust you with other opportunities
Factorio (2017–present)
- Learning what’s the most important to automate right now
- Refactoring a complex, running system
Playing these games taught her management skills in a safe environment before she became a manager for real
- Don’t be afraid to list non-traditional background on your resume
Intentional skill building
- What skills will your next project require?
- Does this activity help me get closer? (Or just mindless play?)

Session 5

Relativistic Software Calendars: It’s About Time!

John Feminella

Software makes assumptions
- About the machine (“integers are 32 bits”)
- About the world (“everyone has a real name”)
Time is one of these assumptions
What is time?
- Originally measured by motion of earth & moon
- Calendar is an abstract representation of this motion
- Software calendars bake in assumptions about being on earth
Enter relativity
- Time distorts depending on the velocity of observer
- GPS is one of the few programs that accounts for relativity

Undo all the things!

Tom Ballinger

bpython is a Python interpreter that can undo the effects of expressions
- Works by saving commands & rerunning only some of them
- Could be slow
- Doesn’t work if there are side effects
  - Random numbers, time, reading files
  - Non-idempotent actions: saving to files, buying shoes online
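
The save-and-rerun approach can be sketched in a few lines. This is not bpython's actual code: `ReplaySession` is a made-up class, and it naively replays the whole history rather than only part of it.

```python
class ReplaySession:
    """Undo by re-running history (a sketch, not bpython's actual code)."""

    def __init__(self):
        self.history = []
        self.namespace = {}

    def run(self, source):
        exec(source, self.namespace)   # execute one statement
        self.history.append(source)

    def undo(self):
        if not self.history:
            return
        self.history.pop()             # forget the last command
        self.namespace = {}            # start from a clean slate
        for source in self.history:    # replay everything that remains
            exec(source, self.namespace)

session = ReplaySession()
session.run("x = 1")
session.run("x = x + 10")
session.undo()
```

After the undo, `x` is back to 1. Both weaknesses from the list are visible here: replay cost grows with the history, and any command with side effects runs again on every undo.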
Use fork() to save state
- Fork every time you run a command
- Undo goes back to previous process
- Build this into readline() & run with LD_PRELOAD
  - Works with unmodified interpreters!
  - readline() now forks your program lol
Building an undo-able Lisp interpreter from scratch
- Save periodic snapshots
- When code changes, roll back to the snapshot before that code was run
- JavaScript version because “the word has to be on the left side of the parentheses or the programming language will never be popular”

Estimating the Value of Pi with a Dartboard and (Not so Much) Luck!

Stephen Tu

Estimating pi
- Sample a square uniformly at random
- Fraction inside the inscribed circle is $ \pi / 4 $
Estimating with error
- Define an epsilon $ \epsilon $ which is the error on our estimate
- Define a probability (95%) that the estimate is within $ \epsilon $
- Need 100x more samples for 10x reduction in $ \epsilon $
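
A minimal sketch of the dartboard estimate. For a circle inscribed in the square, the hit fraction approaches $ \pi / 4 $, so multiplying by 4 gives the estimate. The sample-size function uses Hoeffding's inequality, which is my assumption for the bound and not necessarily the one from the talk. It does reproduce the $ 1 / \epsilon^2 $ scaling.

```python
import math
import random

def estimate_pi(n_samples, seed=0):
    """Throw darts at the square [-1, 1]^2 and count hits in the unit circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            hits += 1
    return 4 * hits / n_samples    # hit fraction approximates pi / 4

def samples_needed(epsilon, delta=0.05):
    """Hoeffding bound: samples so that |estimate - pi| <= epsilon
    with probability at least 1 - delta."""
    return math.ceil(8 * math.log(2 / delta) / epsilon ** 2)

pi_hat = estimate_pi(100_000)
```

Comparing `samples_needed(0.01)` with `samples_needed(0.1)` shows the 100x cost of a 10x tighter error.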

Ray-tracing and special relativity: Rendering objects near the speed of light!

Lucy Zhang

If an object is moving near the speed of light, what would a photo look like?
Computer graphics
- Ray casting: for each pixel, figure out where the light ray from the camera hits in the scene
Relativity
- Light always travels at the speed of light, even if its source is on a fast-moving train
- Lorentz transformation converts between two coordinate systems that are moving relative to each other
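
For reference, a sketch of the Lorentz transformation in one space dimension, with units where $ c = 1 $ by default. The function names are mine. The check at the end relies on the standard fact that a boost preserves the spacetime interval.

```python
import math

def lorentz_boost(t, x, v, c=1.0):
    """Transform event (t, x) into a frame moving at velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    t_prime = gamma * (t - v * x / c ** 2)
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime

def interval(t, x, c=1.0):
    """Spacetime interval, which every boost leaves unchanged."""
    return (c * t) ** 2 - x ** 2

t2, x2 = lorentz_boost(t=2.0, x=1.0, v=0.6)
```

An event on a light ray (`x == t`) stays on a light ray after the boost, which matches the first bullet.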
Relativistic raytracing
- Define the rays toward the object with a 4th parameter, time
- Use Lorentz transformation
- Do intersection normally
- As cube moves faster, it compresses more (expected) & appears to rotate (why?)
Terrell rotation
- Rotates because farther light takes longer to reach the camera
  - Like rolling shutter effect but for relativity!
- Anyone with a computer can “rediscover” Terrell rotation with a bit of code!

Session 6

Talking to my past self (without introducing temporal paradoxes!)

Andrew Louis

Stores all data about himself
- “MSN was this cool software package that taught high schoolers how to type really fast”
Training a bot on chat logs
- Train RNN to reproduce chat logs from high school
- Sequence-to-sequence to respond to other messages
  - Originally developed for machine translation
- Bot tends to converge on safe responses like “lol” and “ya” which work in any situation
- Tends to respond with nonsense
Why is it hard?
- Training data won’t capture general knowledge about the world
- Needs working memory, not just the previous message
- “Before submitting a conference talk, make sure you’re not committing to solving an open research problem”

Four fake filesystems!

Omar Rizwan

Files are anything that you can open, read, write, close using a path
GrabFS
- Contains screenshots of each process’s windows
- Use shell script with cp to take screenshot every second
btfs (BitTorrent filesystem)
- Downloads blocks as requested
- Opening part of a file doesn’t require downloading whole thing
ytfs (YouTube filesystem)
- Making directory is a search query
- Directory populated with movie files of search results
Git filesystem
- Exposes remote repository as a directory
Get to reuse existing tools & UI
- Plan 9 operating system takes filesystem idea to the extreme

Using Postgres to `\watch` Star Wars!

Will Leinweber

Using psql, the command-line Postgres program, to watch ASCII Star Wars
- Import frames as database rows
- Store a function in the database that prints each row

If at first you don’t succeed at beating HQ Trivia, try cheating!!

Hung Truong

HQ Trivia is fun but we want to win
Pipeline
- iOS Vision framework to detect text
- Tesseract library for OCR
- Google the question with all answers
- Pick the answer with the most occurrences
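
The last step of the pipeline is simple enough to sketch. The search text below is invented for the example, and `pick_answer` is a hypothetical name. The OCR and search steps are left out entirely.

```python
def pick_answer(search_text, answers):
    """Return the answer that appears most often in the search results."""
    text = search_text.lower()
    counts = {answer: text.count(answer.lower()) for answer in answers}
    return max(counts, key=counts.get), counts

# Stand-in for text scraped from a results page (invented for this example).
results = "Paris is the capital of France. Paris hosted the 1900 Olympics. Lyon is nearby."
best, counts = pick_answer(results, ["Paris", "Lyon", "Nice"])
```

Plain substring counting is crude: it miscounts answers that are common words or that appear inside longer words.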

Session 7

The Man Comes Around: and so does his sound!

Vince Allen

Johnny Cash did gospel music at the beginning & end of his career
- What are the differences between the two periods?
Identifying gospel songs
- Topic modeling: assigns words to categories to classify the entire text
- Apply topic modeling to lyrics
- Pick the category with religious terms
Comparing songs acoustically
- Take a spectrogram of the song
- Convert this to a vector
- I didn’t really understand the rest :-(

Step by Step: Algorithms that teach you math!

Evy Kassirer

Building a step-by-step solver for teaching math
Computer algebra systems
- SymPy can simplify expressions symbolically, which sounds like a solver
- Goal is to get the answer, so they represent division $ a / b $ as $ a \times b^{-1} $
- This would be confusing for explaining stuff
Building their own solver
- Search the parse tree for patterns like “number + number”
- Apply rules to transform to equivalent tree
- Identified which rules were applied to find relevant lecture videos
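
A rough sketch of the pattern-matching idea, using nested tuples as the parse tree. The two rules and their names are my own minimal examples, not the speaker's rule set.

```python
def simplify_once(node):
    """Apply the first matching rule; return (new_node, rule_name or None)."""
    if not isinstance(node, tuple):
        return node, None
    op, left, right = node
    # Rule: "number + number" or "number * number" collapses to one number.
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right, "add numbers"
        if op == "*":
            return left * right, "multiply numbers"
    # Otherwise look for a match deeper in the tree, left side first.
    new_left, rule = simplify_once(left)
    if rule:
        return (op, new_left, right), rule
    new_right, rule = simplify_once(right)
    if rule:
        return (op, left, new_right), rule
    return node, None

def solve_steps(node):
    """Repeat until no rule applies, recording every intermediate tree."""
    steps = []
    while True:
        node, rule = simplify_once(node)
        if rule is None:
            return node, steps
        steps.append((rule, node))

# (2 + 3) * x + (4 * 5)
expr = ("+", ("*", ("+", 2, 3), "x"), ("*", 4, 5))
result, steps = solve_steps(expr)
```

The recorded rule names are what make this useful for teaching: each step can be explained, or matched to a lesson, instead of jumping straight to the answer.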

Whoa, pictures! A visual history of visual programming languages!

Emily Nakashima

Visual Programming Languages (VPLs)
- Programming & documentation are all in words
- Our brains are really good at visual stuff
Examples
- Scratch: Build programs with blocks
- GRAIL (1968): Draw flowcharts which are converted to programs with OCR
- Pygmalion (1975): Visually show state on screen
- Cube (1995): 3D flowcharts?
Challenges
- Diffing & merging visual programs
- Humans can handle more unique words (symbol names) than unique shapes
- Hard to represent complexity without losing the simplicity of visual programming
- Unclear whether VPLs actually make you program differently
  - People prefer whatever they learned first
Good applications of VPLs
- Simple systems: Twilio phone tree designer
- Kids & learning

Fast, but not too fast! What 17th-century windmills can teach us about database migrations

Wander Hillen

Windmill design
- Whole windmill rotates to match wind direction
- When windmill moves too quickly, slats automatically open to catch less wind
- Big changes in load: wind energy increases with $ \vert v \vert^2 $
WeTransfer depends on automation to handle load changes
- Throttle expensive background tasks when the system is busy

Session 8

The itty, bitty, tiny bytes that make up a Pokémon!

Jan Mitsuko Cash

Pokémon data structure
- 232 bytes each
- Stores properties like species ID, experience, nickname
- Backward compatible between generations 1–2, and gen 3 on
- Gen 3 added encryption

Pseudofractals! Accidental aesthetics where math meets pixels

Jes Wolfe

Generate topographical map
- For each pixel, compute color as $ (x^2 + y^2) \bmod 2 $
- Weird artifacts! Doesn’t look like a parabolic topo map at all
Aliasing
- Caused by Nyquist limit
- If you zoom out too far, frequency of pixels (sampling) is no longer high enough to show frequency of topo map lines
Making art
- Zoom out slowly to watch the aliasing pattern change
- Use $ (x^2 + y^2) \bmod N $ and assign to colormap of N colors
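
The formula is short enough to try in a terminal. In this sketch the `scale` parameter stands in for zooming out (sampling the function more coarsely), and the text palette is my own choice.

```python
def pseudofractal(width, height, n_colors, scale=1):
    """Color index for each pixel: (x^2 + y^2) mod N, sampled every `scale` units."""
    return [
        [((x * scale) ** 2 + (y * scale) ** 2) % n_colors for x in range(width)]
        for y in range(height)
    ]

def render(grid, palette=" .:*#"):
    """Draw the grid as text, one palette character per color index."""
    return "\n".join("".join(palette[v % len(palette)] for v in row) for row in grid)

grid = pseudofractal(width=8, height=4, n_colors=5, scale=3)
picture = render(grid)
```

Printing `picture` at a few different `scale` values is a quick way to watch the pattern change.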

Why we still can’t stop plagiarism in undergraduate computer science

2018-03-22T09:35:20-07:00

Imagine that you’re hired to work at your local public library. As an eagle-eyed checkout clerk, you soon realize that half the patrons leave without actually checking out their books! This leaves everyone else scratching their heads when the catalog doesn’t match the shelves. But conveniently, the library has an unused anti-theft alarm sitting in the back room.

There’s just one problem: your supervisor, though sympathetic to your cause, doesn’t want you using the alarm. You see, people take books for many reasons, and not all of them are malicious. Besides, who cares if the shelves are in disarray? They’ve been like that for years, but everyone who works at the library still gets their paychecks all the same. If you want to set up the alarm, you’ll have to do so on your own time, in addition to your other responsibilities.

In many undergraduate computer science programs, this is the absurd reality we face when trying to combat plagiarism. Everyone agrees plagiarism is wrong. Everyone wishes they could stop it. Everyone has access to the tools that find it. But no one seems willing to take any action.

Why bother?

The most important goal is to keep the course fair for students who do honest work. Instructors must assign grades that accurately reflect performance. A student who grapples with a problem — becoming a stronger programmer in the process — should never receive a lower grade than one who copies and pastes.

Finally, as educators, we also hope that the accused student can learn difficult lessons about ethical behavior in the classroom rather than the workplace.

Understanding the scope of plagiarism

I’ve been a teaching assistant in a lower-division computer science course for the past two years. Each semester, in a class of 200 to 300 students, we typically discover 20 to 40 blatant cases of plagiarism on homework. And because of the nature of our process, even more cases go undetected. Here’s how it works:

We begin by uploading our students’ code to an online plagiarism detection tool. The tool compares the submissions with each other and with our massive back catalog of previous solutions, flagging pairs of similar programs. We’ll manually review nearly 1,000 of these pairs over the course of the semester, throwing out all but the most suspicious cases. These cases end up representing about 100 students.

Then, we apply another filter, keeping only the cases that contain indisputable evidence — for example, hundreds of lines copied right down to the last whitespace error. We have virtually eliminated false positives at this point. (In the course’s entire history, only one such example exists.)

That means we aren’t catching cases where there’s plausible deniability. And we certainly can’t find students who transform someone else’s homework beyond recognition. (They’ve demonstrated a better grasp of programming than your average copy-paste-rename job, but still haven’t learned what the homework was trying to teach.)

Because our process heavily favors precision over recall, the 20 to 40 cases at the end represent a lower bound on plagiarism in the course. This is corroborated by the results from Lisa Yan et al., the authors of a new plagiarism tool named TMOSS. In a Stanford course, 43 percent of true positives detected by TMOSS wouldn’t have been discovered by a traditional tool similar to ours.¹

The large number of students suspected of plagiarism isn’t unique to our course. In fact, if a university or course doesn’t have comparable numbers, it most likely reflects their (lack of) plagiarism detection, rather than the actual rate of plagiarism.

According to a 2017 New York Times report,² a course at UC Berkeley found about 1 in 7 students in violation of policies on copying code. At Stanford, one course suspected up to 20 percent of its students. And in spring 2017, Harvard’s CS50 reported nearly 60 of 600 students for cheating.

The aftermath

What happens after an instructor confronts a student about plagiarism?

Some admit their mistake right away. Others deny it for a while before coming clean. And every semester, there will be a few conversations that go something like this:

“The TAs think you copied a few homeworks. You need to tell me what’s going on.”

“I didn’t cheat. I don’t know what you’re talking about.”

“Look, Bob, you have three screens of code that are exactly the same as Alice’s, except she called the parameter searchQuery and you renamed it to info_strings in your final commit.”

“Those similarities are a coincidence that happens fairly often in programming. Stop harassing me!” *files complaint with department*

Trying to gaslight the computer science department about computer science makes no sense, but let’s move on.

The long tail of case resolution times. After confronting students, conversations like the above take up the vast majority of the teaching staff’s time. A small but vocal minority of students hope to avoid consequences by dragging out their cases.

They send endless emails to the teaching staff, sometimes even getting parents involved. They appeal to the relevant offices in the university, claiming that in computer programming, independently writing hundreds of identical lines of code is a common occurrence. To refute these claims, the teaching staff might then be asked to provide a nontechnical explanation of the similarities, sometimes months after the semester has ended.

Combined with the time it takes to find cases in the first place, the time sink alone is enough to discourage many instructors from looking. But don’t worry, because there’s more:

Lack of support from the university. When there’s enough noise, administrators in high places start getting spooked. No one wants to end up on the front page of the New York Times. So rather than offer their support and influence, they do nothing. Teaching staffs are left to deal with the issue using their own time and resources.

Becoming the bad guy. At the end of it all, students take out their anger in course evaluations. Instructors become “terrible human beings” and worse for daring to find out why so many programs written by different people have the exact same bugs.

These reviews matter: the hiring process for faculty positions often requires candidates to submit their past student evaluations. And like any other form of anonymous online abuse, repeatedly reading personal attacks and other vitriol about yourself as part of your job carries a large mental health cost. (Imagine if your boss outsourced a portion of your quarterly performance review to a panel of YouTube commenters and Twitter trolls.)

On a more personal level, it hurts to know which students are plagiarizing. We teach because we want to help people learn. It might be a little disappointing when, for instance, a student comes to office hours at the end of a semester not knowing how to compile programs. But it’s far worse to find out that they’d asked for compilation help because they were trying to submit a friend’s solution.

Competing against inaction

It’s clear that instructors who choose to pursue plagiarism cases encounter a wide variety of disincentives, ranging from slightly unpleasant to truly dreadful. But on an individual level, nothing bad really happens to those who don’t bother confronting plagiarism directly. Research-track faculty can still publish their papers. And with fewer negative reviews, teaching-track lecturers might even improve their chances of being hired in the future.

So it should come as no surprise that most professors avoid doing anything about plagiarism directly.

Many creative solutions

I attended a discussion on plagiarism at this year’s SIGCSE, a computer science education conference. It opened my eyes to all the ideas faculty have to avoid facing plagiarism directly.

Some of them simply don’t generalize, like creating new homework assignments from scratch each semester. People who bring this up have likely never asked an instructor of an introductory course whether they’d like to spend large chunks of time rushing out assignments in exchange for also reducing their quality. And it still wouldn’t address plagiarism among students in the same semester.

But what really bothered me were all the moralistic solutions.

Making students sign an honor pledge. Offering leniency to those who turn themselves in during the 72-hour “regret period” after the deadline. Or even inviting students to email the instructor at any time if they are about to cheat, so that the instructor can talk them out of it one-on-one.

These are great ideas — for supporting students in courses with large enrollments. But they cannot be the only solution to combating plagiarism, because they do nothing to address the underlying incentive structure:

Benefit: Copying and pasting someone else’s code saves a ton of development time, and ensures a higher grade. Good grades might be tied to desirable outcomes like graduation, financial aid, and job prospects.
Cost: Zero! If the teaching staff doesn’t look for plagiarism directly, students know they’ll never be caught.

Empirical evidence shows that this is true. Our students sign an honor pledge at the beginning of the semester, but we did not see any change in the rate of plagiarism. Even for Harvard’s CS50, the birthplace of the “regret period,” David Malan reports that it “has not materially impacted CS50’s number of cases.”³

Finding our way out

Student incentives come from faculty

Student incentives are set by the faculty: for example, if the instructor assigns a larger weight to some assignment, students usually spend more time working on it. This means instructors have a lot of influence here! They can combat plagiarism by making it costly.

If students knew that there would be a consistent, non-zero cost to plagiarizing, many of them wouldn’t do it. Copying homework would no longer be in their best interest. To have an effect, the policy would have to be implemented consistently across all courses, beginning with the introductory courses, which play the role of communicating the department’s standards to new students.

Achieving this level of uniformity means we need to provide a strong incentive for every instructor, not just the ones who care the most, to enforce the rules.

Fixing faculty incentives

University administrators should communicate their support. Instructors should know not only that they will suffer no retaliation, but also that the university encourages them to enforce its policies. This might require administrators to acknowledge the inconvenient truth of widespread plagiarism.

Next, we need to reduce the cost of looking for plagiarism.

Efficiently deploying plagiarism detection software. Most instructors I’ve spoken with use MOSS (if they use automated detection at all). However, each teaching staff must learn to use the software on their own, sometimes writing additional code to format the inputs or aggregate the results. We’ve somehow managed to turn plagiarism software — which should be written once and deployed widely at zero marginal cost — into something that is very expensive to use!

Instead of deploying the software on a per-class basis, the university should pay to integrate it into their learning management systems. The software could even automatically run on all programming assignments by default.

Spreading the workload among more TAs. Universities can further decrease the burden by increasing the TA headcounts of courses that enforce plagiarism rules. This ensures that time spent combing through the results of the software doesn’t come at the cost of teaching.

Improving the quality of results from plagiarism detection software. The software returns more false positives as class sizes increase, since the number of pairs of students grows quadratically. But because MOSS is proprietary, the community’s open-source development revolves around creating tools that wrap MOSS, rather than improving the core algorithm. Anyone who has an idea for improving the algorithm must first reimplement all existing functionality starting from the pseudocode in the paper.⁴ There is an opportunity to have a big impact by creating a free and open source alternative.

We won’t be able to fix everything. For example, personal attacks in course evaluations may be interleaved with genuine criticisms, making them difficult for moderators to separate.

I’m not smart enough to have all the answers. These are just some of my ideas for moving incentives in the right direction. It’s not even clear which ideas will work. But one thing’s for certain: students will learn to care when their teachers start caring.

Thanks to Nelson Gomez, Joshua Zweig, John Hui, Vivian Shen, Edward Wang, and two anonymous readers for their feedback and discussion.

¹ Lisa Yan, Nick McKeown, Mehran Sahami, and Chris Piech. TMOSS: Using Intermediate Assignment Work to Understand Excessive Collaboration in Large Classes. SIGCSE 2018.
² Jess Bidgood and Jeremy B. Merrill. As Computer Coding Classes Swell, So Does Cheating. The New York Times. 29 May 2017.
³ David J. Malan. Teaching Academic Honesty in CS50. 6 March 2018.
⁴ Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken. Winnowing: local algorithms for document fingerprinting. SIGMOD 2003.

SIGCSE 2018 notes

2018-02-22T14:59:03-08:00

SIGCSE attendees from Columbia’s Computer Science department.

Over the weekend, I attended SIGCSE — the ACM’s conference on computer science education — with the teaching staff of Columbia’s Advanced Programming course. We learned about everything from rubric design to creating community in large classes to catching plagiarism at scale. These are the notes from the more interesting sessions I attended!

Thursday, February 22
Friday, February 23

Thursday, February 22

TMOSS: Using Intermediate Assignment Work to Understand Excessive Collaboration in Large Classes

Lisa Yan, Nick McKeown, Mehran Sahami, Chris Piech

“Excessive collaboration” is cheating, such as copying too much from online solutions or classmates
Cheating detection is hard because of different patterns
- Submitting an online solution
- Viewing online solution, then modifying it
- Viewing online solution, then working independently
Intermediate code snapshots captured through Eclipse plugin at every compilation
TMOSS comparison algorithm
- MOSS does $ O(N^2) $ pairwise comparison of tokenized programs
- At average $ M = 250 $ snapshots per student, $ O(N^2 M^2) $ is too much
- For each student, compare each snapshot against all final submissions: $ O(N^2 M) $
- Plot similarity score over “time” (aka snapshot index)
- In TMOSS, the distribution of scores overlaps much less than in normal MOSS, showing it’s better at distinguishing cheaters
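
To get a feel for the difference between $ O(N^2 M^2) $ and $ O(N^2 M) $, here is a quick count. $ M = 250 $ is the average from the talk. The class size of 500 is a number I made up for illustration, and the counts ignore constant factors.

```python
def comparison_counts(n_students, snapshots_per_student):
    """Rough comparison counts for the three strategies in the notes."""
    n, m = n_students, snapshots_per_student
    return {
        "finals_only": n * n,                 # final submissions only: O(N^2)
        "all_snapshot_pairs": (n * m) ** 2,   # every snapshot vs. every snapshot
        "snapshots_vs_finals": n * m * n,     # TMOSS: each snapshot vs. each final
    }

# N = 500 is a made-up class size; M = 250 is the average from the talk.
counts = comparison_counts(n_students=500, snapshots_per_student=250)
```

Comparing snapshots only against final submissions saves a factor of $ M $, so 250x fewer comparisons here.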
Applications of TMOSS data
- Students who cheat also start homework later, do worse on exams
- Online matches start even later than classmate matches
- TMOSS-only students (those who modified the code to fool MOSS) are 43 percent of the students TMOSS caught!
TMOSS on GitHub

Lightweight Techniques to Support Students in Large Classes

Mia Minnes, Christine Alvarado, Leo Porter

Huge class sizes
- Enrollment is growing but we can’t hire faculty to match
- Students feel anonymous, perhaps more for URMs
Their previous work
- Enhance discussion sections as “micro-classes” with 1 TA and 2-3 tutors, who built a relationship with students over the entire semester
- Students felt more included, but academic performance didn’t change!
- Very expensive so couldn’t justify poor ROI
This work: lightweight (cheap or free) first order approximations
From “micro-classes,” students liked having a smaller community
- Students liked familiar faces in assigned zone seating, about 50 per zone
- Assigned specific seating led to backlash
  - Zones give some choice
  - Can also say “next time, your seat choice is your zone for the rest of the term”
- Create physical space between zones, using walkways or barriers
- Took some convincing, for example asking students to move away from the empty area, reminding people about zones
Personal tutors (not TAs)
- In class: tutor stays in the same part of the classroom
  - Listen to students whispering in class to ask questions or provoke a discussion if students are confused
  - Continuity from tutors coming to class helps build connection
  - “Remember when we talked about XYZ in class?”
- Out of class: personalized congrats emails throughout the term
- Students reported a more welcoming environment
- Some tutors offered 15-minute 1:1 sessions and cut back on other responsibilities
- Tutors have “their own” students
TA said it wasn’t much more work and helped get to know students and class dynamics better

PVC: Visualizing Memory Space on Web Browsers for C Novices

Ryosuke Ishizue, Kazunori Sakamoto, Hironori Washizaki, Yoshiaki Fukazawa

Why visualization?
- Hard for newcomers to use debugging tools like gdb because they’re text-only
- Existing tools: Python Tutor, SeeC
- When a program changes, SeeC requires a compile and execute cycle to update the visualization
PlayVisualizerC (PVC)
- Web-based interpreter that doesn’t support full C syntax
  - Interpreter is an interesting trade-off — they don’t need to modify C compiler, but also don’t get all potentially helpful diagnostics for free
  - I wonder if this would be easier in clang
- Interpreter runs on server and sends program state to client

Special Session: Watch them Teach

Abstraction

Mehran Sahami

Context: CS1, where students have written small programs using Karel the Robot, but only methods with no parameters or return values
Abstraction: “Civilization advances by extending the number of operations we can perform without thinking about them.” —Alfred Whitehead
- Use the sqrt() function without remembering how it works
- Toaster analogy: put in bread, get out toast
- One toaster works by heating coils, but if it worked by magic, we wouldn’t use it any differently
Reuse
- CD player does not require us to change the electronics to play a new song
- If you buy an iPod, can reuse the same headphones rather than replacing whole system
- Then show the code
Teaching abstraction in a concrete way
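
A minimal Python sketch of the toaster point, not the code shown in the session. The names `sqrt_newton`, `sqrt_library`, and `hypotenuse` are made up for illustration. Both square root functions take the same input and return the same output, so the caller works with either one without knowing how it works inside.

```python
import math

def sqrt_newton(x, tolerance=1e-12):
    """Square root by Newton's method (the "heating coils")."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(100):  # cap the iterations so the loop always ends
        guess = (guess + x / guess) / 2
        if abs(guess * guess - x) <= tolerance * x:
            break
    return guess

def sqrt_library(x):
    """Square root from the standard library (the "magic")."""
    return math.sqrt(x)

def hypotenuse(a, b, sqrt=sqrt_library):
    """The caller depends only on what sqrt does, not on how it does it."""
    return sqrt(a * a + b * b)

print(hypotenuse(3, 4))                    # uses the library version
print(hypotenuse(3, 4, sqrt=sqrt_newton))  # same call, different internals
```

Swapping the implementation changes nothing for the caller, which is the same property the CD player and headphones examples rely on.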

We Should Give Messy Problems and Make Students Reflect on What They Learn

Paul Dickson

We give students large projects because we want them to learn by figuring it out
People learn a lot from reflection, but everyone hates it
- Only done once
- Not a big part of the grade

Teaching Students a Systematic Approach to Debugging

Roman Lysecky, Frank Vahid

Students encounter problems and try to fix them by changing random stuff
- Instructors try to teach how to use a debugger, but the problem isn’t debugging
- Students should learn a systematic approach to troubleshooting
Teach troubleshooting with everyday things — no code
- “My lights do not turn on”
- Create a hypothesis about the cause
- Design a test to validate or reject the hypothesis
Then move to programming
- Experiments include reading the code, executing the code, etc
- Web environment for writing and testing
Book: Troubleshooting Basics

Providing Meaningful Feedback for Autograding of Programming Assignments

Georgiana Haldeman, Andrew Tjang, Monica Babes-Vroman, Stephen Bartos, Jay Shah, Danielle Yucht, Thu Nguyen

Instructors use automatic grading to deal with CS enrollment growth
- Students code for a test harness
- Instructors write test cases, which the student perceives as feedback
Impact of feedback: past research
- “Binary instant feedback,” like unit tests, leads to trial-and-error programming and more cheating
- Students and instructors both benefit from “directed feedback”
Concepts and Skills Based Feedback Generation Framework (CSF^2)
- Cluster student submissions based on unit test pass/fail vectors
- Manually inspect the programs in each cluster and write a hint
  - Mapping of test vector to hint is known as knowledge map
- During the next term, classify the pass/fail vector of the test cases to decide which hint to display to the student
- Classification accuracy around 90%
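
A rough Python sketch of the knowledge map idea, under my own assumptions. The test names, hints, and student data are hypothetical, and the nearest-match rule (Hamming distance) is a stand-in I picked for illustration, not necessarily the classifier the authors used.

```python
from collections import defaultdict

def cluster_by_vector(submissions):
    """Group student IDs by their unit test pass/fail vector."""
    clusters = defaultdict(list)
    for student, vector in submissions.items():
        clusters[tuple(vector)].append(student)
    return dict(clusters)

def hamming(a, b):
    """Count the positions where two pass/fail vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def pick_hint(vector, knowledge_map):
    """Return the hint for the known vector closest to this one (assumed rule)."""
    best = min(knowledge_map, key=lambda known: hamming(known, vector))
    return knowledge_map[best]

# Hypothetical results from a previous term, four unit tests per submission:
# (empty input, single item, many items, duplicates)
last_term = {
    "s1": (False, True, True, True),
    "s2": (False, True, True, True),
    "s3": (True, True, False, False),
}
clusters = cluster_by_vector(last_term)

# Hints written by hand after inspecting the programs in each cluster
knowledge_map = {
    (False, True, True, True): "Check how your code handles an empty list.",
    (True, True, False, False): "Does your loop visit every element?",
}

# Next term: a new submission with a vector nobody has seen before
print(pick_hint((True, False, False, False), knowledge_map))
```

The expensive step is the manual inspection, and it only happens once per cluster rather than once per student.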

An Explicit Strategy to Scaffold Novice Program Tracing

Benjamin Xie, Greg L. Nelson, Andrew J. Ko

Programming instruction lacks strategy to apply learned knowledge
- Their approach: teach people to trace code line by line
Why do people struggle to trace?
- Inaccurate knowledge of programming language semantics
- Don’t use external representations of program state (hold all the variables in your head)
- Weak problem solving strategies
- Not familiar enough to put all the above pieces together
Previous work
- Visualization tools: require learning another piece of software
- “Sketching,” or writing down state of variables: students don’t use this on their own even after seeing it during lecture
Their explicit instruction approach
- Give explicit steps of how to solve the problem using line-by-line tracing
  - Don’t depend on students coming up with these strategies ad-hoc
  - Student: having a strategy “forces you not to skip around”
- Use memory table, which is sketching with a printed template (columns for variable name and value history)
  - Not going to magically fix poor knowledge of programming semantics
- Those who learned the strategy did better on practice exam questions, but did not do significantly better on the midterm
Possible extensions
- Grade the memory table on an exam? (To provide partial credit?)
- Online tool to fill in the memory table?
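
To make the memory table concrete, here is a small Python sketch of what an online version might store. The `MemoryTable` class is hypothetical and not from the paper. It keeps one row per variable, with the name and the history of values, and the trace below fills it in line by line for a three-line loop.

```python
class MemoryTable:
    """One row per variable: its name and the history of values it has held."""

    def __init__(self):
        self.rows = {}

    def record(self, name, value):
        """Append a new value to the variable's history."""
        self.rows.setdefault(name, []).append(value)

    def current(self, name):
        """Look up the latest value, as a student would read the last entry."""
        return self.rows[name][-1]

    def render(self):
        """Format the table as text, one variable per line."""
        return "\n".join(
            f"{name}: " + " -> ".join(str(v) for v in history)
            for name, history in self.rows.items()
        )

# Program being traced:
#   total = 0
#   for i in [1, 2, 3]:
#       total = total + i
table = MemoryTable()
table.record("total", 0)
for i in [1, 2, 3]:
    table.record("i", i)
    table.record("total", table.current("total") + i)

print(table.render())
# total: 0 -> 1 -> 3 -> 6
# i: 1 -> 2 -> 3
```

Because the full history is kept rather than just the final values, a grader could give partial credit for a trace that goes wrong partway through.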

Discussion: Combating the Wide Web of Plagiarism

Solutions may appear on websites outside the purview of US copyright law
Tutors who give away the answers, maybe because they’re trying to save time with the student
Is cheating a problem to be solved, or a fact of life?
Exams in a computer lab with real compilers and documentation (Google cache), which makes the homework exactly like the exam
Grading system where you cannot pass the course without having a passing grade in the exam category
Intro students believe there is only one solution to a programming problem, so have them do a peer review after turning in the homework
Students who design their own project seem to cheat less
- Maybe harder to find a solution after writing a unique problem

Friday, February 23

How am I Going to Grade All These Assignments? Thinking About Rubrics in the Large

College Board has experience grading massive Advanced Placement CS exams consistently
- Can apply this reading experience in the classroom
Question development
- Development meeting is 3 days long!
- AP exam questions give away a lot of information because students can’t ask questions of the proctors
- This seems like the easy part
Rubric development
- Decide what skills you are testing and want to reward
- Write a canonical solution
- Give points for each part of the algorithm that the student writes
- Write general scoring guidelines to discourage shotgunning and cover cases outside the rubric
  - Extraneous code with side effects
  - Not declaring local variables
- Don’t start grading — test the rubric on a sample of student work, focusing on solutions that test your edge cases
  - Simple oversights: forgetting a semicolon, incorrect syntax, misspelling
  - Incorrect execution
  - Correct solution but different approach, which may or may not violate the assignment spec
  - Student wrote almost nothing but demonstrated some knowledge
- Write down how to handle those edge cases in the rubric
Applying the rubric
- Training to ensure consistency among graders, and over time (first/last exam graded)
- Grade each individual point on the rubric, not looking at the whole thing and making up a number
- Once a student loses points for something, continue evaluating the rest of the code as if they’d gotten that part
- Review samples with graders first
- Grade in pairs so people can ask questions
- Start with grading each solution twice, then only spot check
  - Positive feedback after double checking is important but often overlooked
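
A minimal Python sketch of point-by-point grading, assuming a made-up rubric for a "find the maximum" question. The `RubricPoint` type and the rubric items are mine, not the College Board's. Each item is judged on its own, so losing one point does not affect the others.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RubricPoint:
    description: str
    points: int

def score(rubric, earned):
    """Add up the rubric items the grader marked as earned.

    `earned` maps an item description to True or False. Each item is
    judged independently, so a missed item does not zero out later ones.
    """
    return sum(p.points for p in rubric if earned.get(p.description, False))

# Hypothetical rubric for a "find the maximum" question
rubric = [
    RubricPoint("Loops over every element", 2),
    RubricPoint("Compares against the current maximum", 2),
    RubricPoint("Returns the maximum", 1),
]

# The student got the loop bounds wrong, but the grader evaluates the
# rest of the code as if the loop had been correct.
earned = {
    "Loops over every element": False,
    "Compares against the current maximum": True,
    "Returns the maximum": True,
}
total = score(rubric, earned)
print(total)  # 3
```

Writing the rubric down as data also makes the double-grading step easy to check: two graders agree when their `earned` maps match item by item.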

BlueBook: A Computerized Replacement for Paper Tests in Computer Science

Chris Piech, Chris Gregg

Qualitatively, there are students who understand the homework but do poorly on handwritten exams
- Writing solutions from top to bottom, rather than iteratively
- “Strange transfer task” to perform well on exams
BlueBook app user experience
- Takes over screen to prevent looking at notes
  - Detects when you switch to another window
  - No defense against VMs — relies on trusting that students won’t be “truly adversarial”
- Enter password to decrypt exam
- Editor with syntax highlighting
  - Periodic snapshots submitted to server in case of hardware failure
  - Doesn’t compile code: found that students would spend too much time trying to get it perfect
- Problem spec in HTML, CSS, JS
  - Can include interactive examples and even a demo solution
- Submit electronically and no way to go back
Potential extensions
- Automatic grading
  - 98% of student code doesn’t compile: TAs fix the syntax errors
  - Could let students compile but not run
  - Automatic syntax correction
- Better security
  - Nothing prevents students from sending exam to a friend offsite
  - Laptops make it easier to look at screens in front of you
- Equation editor, drawing diagrams

Using a computer-based testing facility to improve student learning in a programming languages and compilers course

Terence Nip, Elsa Gunter, Geoffrey Herman, Jason Morphew, Matthew West

PrairieLearn: a platform similar to BlueBook
Testing model
- Students schedule exam appointment at computer lab over multi-day exam period
  - Take the exam at any time
  - Generic proctor, since students from multiple classes can take the exam at the same time
  - Allows preventing internet access more effectively, unlike BYO hardware
- Numbers in exam questions are randomized
- Grades go down over the exam period, showing that questions may not be leaking
  - Selection bias issue: students who need to cheat won’t come on the first few days
Results
- Instructors give more exams (quizzes) because they’re cheaper now
  - Achieves scale because the exams and automatic graders become a fixed cost for the instructor
- Grades have gone up at the low end
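
A small Python sketch of how randomized numbers and automatic grading could fit together. This is a generic illustration, not PrairieLearn's API. The function names, the seed format, and the question itself are all assumptions.

```python
import random

def make_variant(student_id, exam_id):
    """Generate one student's numbers for a hypothetical remainder question."""
    # Seeding with the exam and student means the same student always
    # sees the same variant, while different students usually see
    # different numbers.
    rng = random.Random(f"{exam_id}:{student_id}")
    a = rng.randint(10, 99)
    b = rng.randint(2, 9)
    return {
        "prompt": f"What is {a} % {b}?",
        "answer": a % b,
    }

def grade(variant, response):
    """Automatic grader: compare the response to this variant's answer."""
    return response == variant["answer"]

variant = make_variant("alice", "quiz1")
print(variant["prompt"])
print(grade(variant, variant["answer"]))  # True
```

Once the generator and grader exist, giving the quiz to one more student or one more section costs almost nothing, which matches the fixed-cost point above.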

Ideas for fooling Amazon Go

2018-01-23T20:19:42-08:00

Amazon Go is a grocery store that does away with checkout lines by using computer vision to figure out what you purchased.

What happens if you…

Take an item off the shelf and give it to someone else.
Go shopping with your identical twin.
Use the restroom to put on a face mask.
Take two products off the shelf, swap the labels, and put the cheaper one back.
Bring an empty food container from your last visit, take the same item from the shelf, and put the empty container back.
- An optimization: open prepackaged food, eat it in the store, then reseal the packaging and put it back.

Thanks Chaiwen Chou and Mitchell Gouzenko for the inspiration.

How to create a digital Suica card in Apple Pay (2024 Update)

2017-11-05T23:28:23-08:00

Suica is a smart card used to pay at train stations and convenience stores in Japan. In 2016, Apple added support for Suica to Apple Pay on iPhone 7 devices sold in Japan. In 2017, Apple added support to all iPhone 8/X and later, regardless of where they’re sold.¹

Virtual Suica card in the iOS Wallet app.

But there are some limitations. When a physical Suica is transferred to Apple Pay, the physical card cannot be used anymore. This could be inconvenient if you prefer to keep your physical card active as a backup. To have the best of both worlds, you can generate a new Suica just for Apple Pay.

Generating a new card with the Suica app (2024)

This section was last updated on 21 Mar 2024.

Download the Suica app on the App Store.²

Note that the Suica app is only available in Japanese as the English version has been discontinued. However, you can still use the app by following the screenshots below.

The app opens to a grayed-out Suica card. Tap the plus (+) button in the top right corner.

You can now choose between several types of Suica. Scroll to the last option, which is the anonymous (無記名) Suica. Tap the green button at the bottom of the screen:

The next screen reminds you about the restrictions of anonymous cards. You will not be able to receive these services at the customer service counter:

Suspending the card in case of a lost or stolen device
Refunds
Purchasing commuter passes

Tap the button in the top right corner to continue.

The next screen shows the terms of service. There are two agreements in the list. Tap each one to read it.

Scroll to the bottom of each agreement, then tap the button in the top right corner to accept the terms.

Tap the white table cell to set the card’s initial balance.

Once you have selected a balance, tap the Apple Pay button to load the initial funds. You will be charged in Japanese yen. If possible, use a card without foreign transaction fees.

You will then be prompted to add the card to the Wallet app. It might take a few minutes before the card is ready to use.

Generating a new card with the Suica app (2017)

⚠️ The information in this section is out of date. See 2024 update above.

Update: There’s now an English version of the Suica app, SuicaEng. You can download that instead of following these instructions for using the Japanese app.

Update 2: The SuicaEng app has been discontinued.

Download the Suica app on the App Store, then follow the screenshots below because the app doesn’t have an English localization.

Tap the + button in the top right corner. It will take you to the page below. Tap the first option:

Tap the button in the top right corner to continue. In the alert that appears, tap the second option.

Tap the table cell to set the initial balance of the card.

After this, tap the Apple Pay button to pay for your new card. You’ll be charged for the initial balance, and maybe a foreign transaction fee depending on your bank. Follow the instructions to add the card to the Wallet app.

If you need to move your Suica to a new phone

On the old phone:

In the Wallet app, tap the (i) button next to the Suica.
Scroll to the bottom and delete the Suica from this phone.

On the new phone:

Open Settings > General > Language & Region. Make sure your Region is set to Japan.
- You don’t need to change the Language setting.
In the Wallet app, tap the + button.
Don’t forget to change your region back to what it was before!

Use Suica, PASMO, or ICOCA cards on iPhone or Apple Watch in Japan. Apple Support. ↩
The iOS Wallet app can also create a new Suica card without needing to install the Suica app. However, the Wallet app requires a Japanese payment card to load the Suica, making this approach not suitable for travelers. ↩

