!!Con is a conference held every spring in New York City. It’s filled with all kinds of hilarious lightning talks that celebrate the joy, excitement, and surprise of programming – you might’ve heard of it from the talk “I made a cell phone! (DON’T TELL THE FCC KTHX!)” given at last year’s conference.
I attended with some of the teaching staff from the Advanced Programming course at school. Even though our class is about systems programming, we still learned a ton and we’ll definitely be back next year!
Contents
- Catt Small: The Creative Programmer (Keynote Speaker)
- Kiran Bhattaram: The tales of the cursed operating systems textbook!
- Mark Allen: values of β may give rise to dom!
- Lydia Gu: Ping! Are you there?
- Katie Bechtold: Code in Spaaaaaace!!!
- Allison Parrish: Lossy text compression, for some reason?!
- Brendan Curran-Johnson: A Shot in the Dark!
- Sher Minn Chong: Plants are Recursive!!: Using L-Systems to Generate Realistic Weeds
- Diana Liao: Mixing Paint! With Computers!
- Andreas Fuchs: I’m not a number, I’m a free file descriptor!!1 (Our protagonist promptly disappears down a wormhole)
- Kamal Marhubi: Storing your data in kernel space: an excellent bad idea!
- Samy Al Bahra: Debugging debuggers!!!
- Laura Lindzey: Convolution and the Fourier Transform: Math! (in pictures!!)
- Jennifer Fernick: All Together Now! Programming the Quantum Computer
- Mark Phillips: Upstream/Downstream: Discovering and Displaying Watershed Topology!
- Mark Dominus: My favorite NP-complete problem!
Catt Small: The Creative Programmer (Keynote Speaker)
Code culture and her path into programming
- Programmers focus on the technology to use, rather than what the result should be.
- Consequence of programming education without projects
- Everyone has their favorite language
- That language isn’t necessarily the least frustrating for beginners to start with, or the best tool for the job at hand
- The technology becomes a distraction rather than a tool for making things
- “The real programmer thing” — impostor syndrome
- You can be a ~*~real programmer~*~ even if you didn’t “make your own operating system from your own blood”
- Hurts people through burnout (pressure to avoid taking a break)
- Keeps people out of the industry because we portray programming as all-in?
- Catt’s path to programming
- As a child, obsessed with all things anime
- Otaku World had tutorials for buiding KISS dols through text files (they’re still on the website!)
- Much more interactive than paper
- Needed a website to show off dolls => learned web design
- Xanga and LiveJournal let you change the code
- First job: building site with menu and scrolling inspirational quotes
- Learned JavaScript from a book she found in a box on the street
- => Celebrate what you make even if it’s not perfect
Kiran Bhattaram: The tales of the cursed operating systems textbook!
- Familiarity with OS (Dad handed out Ubuntu CDs at dinner parties), but never knew the whole thing
- Every chapter she read in this textbook unearthed a new kernel bug
- Chapter 1: Bunch of slab memory allocated
- This was how they reloaded a process gracefully
- When you fork and kill the parent, the kernel fails to free the
anon_vma
object
- Chapter 2: Sending to
localhost
took 40 ms.- Nagle’s algorithm on the client: don’t send a small TCP packet until you have all the data, or until 40 ms pass
- Server had delayed acks
- They were just waiting for each other
- Chapter 3: Failed to restore backup from database (“oints” error message)
- Kernel buffers disk writes, so need to fsync and stop writes before imaging the database server
- Chapter 1: Bunch of slab memory allocated
Mark Allen: values of β may give rise to dom!
One of the strangest error messages in an operating system
- Backstory: K & R at Bell Labs
- Multics: trying to build a massively concurrent (25 users) operating system for phone switches, but the project failed
- After Multics canceled, wanted to develop on PDP-11: $483,000, 256 kB main memory and less powerful than TI-84 graphing calculator
runoff
: typesetting program, now distributed asgroff
(GNU Runoff)- Convinced management to buy PDP-11 as a project for typesetting patent applications on the computer
- “Values of β may give rise to dom!”
- They came into the office and found this phrase on something that’d been typeset
ed
inserted the exclamation mark for syntax errors- They put the phrase into a text-to-speech synthesizer and thought it was the funniest thing ever!
- So they put it into the
mv
program for when move dotfiles
- System 6 kernel: “If you don’t want to read an operating systems textbook, just read the source” (10,000 lines of code)
- Ethos of Unix: do one thing at a time, tool composition, everything is text, document everything (including bugs)
- Simplicity, elegance, openness
Lydia Gu: Ping! Are you there?
The ping
tool: “Whenever I’m feeling lonely and wondering if I have a connection…to the Internet”
- History of
ping
- Mike Muss wrote it and the required kernel support to debug a networking problem
ping
tool itself- Layered networking model: link layer (Wi-Fi, Ethernet), network (IP), data (TCP)
- Sockets are TCP, but raw sockets are to the network layer (required by
ping
) - ICMP is a messaging protocol for sending debugging messages between servers
- Security of
ping
- Smurf attack: ping the broadcast address with the victim’s IP as the return address
Katie Bechtold: Code in Spaaaaaace!!!
What it’s like to program for spacecraft
- Developed code for New Horizons, which has flown past Pluto
- What the environment looks like
- Spacecraft is a network of embedded computers
- High latency for talking to spacecraft
- Lot of energy and money to reach interplanetary space (Earth is a gravity well)
- Expenses leads to risk averse behavior by management
- Resulting engineering decisions: deterministic techniques
- Realtime operating systems
- Not necessarily fast – most important need is predictable timing to use instruments and thrusters at the right time
- No dynamic memory allocation after initialization (memory leaks, heap fragmentation, unpredictable timing)
- No recursion, loops must have a statically determined upper bounds
- Languages: C, Ada, ???
- Bus standards are deterministic (no random backoff), usually by assigning time slots by some schedule
- Modern standards are packet-switched and point-to-point, but still avoid randomness
- Hardware: low power, radiation (shielding, voting logic, watchdog timers)
- Use older hardware: Voyager had 8,000 instructions per second, New Horizons was 12 MHz
Allison Parrish: Lossy text compression, for some reason?!
Poetry and programming
- JPEG compression relies on discrete cosine transform (DCT)
- Key insight is that time series can be represented with 100 percent fidelity by the sum of the same number of cosines, so just store the coefficients
- Most information is carried by the high-frequency waves, so throw away the low-frequency waves’ coefficients to save space
- Using DCT to compress text represented as bytes
- Represent some lines from Genesis using waves
- Gibberish if we throw away too many high-frequency waves
- Surprisingly, the stenographer transcribed it
- Better: map words using
word2vec
so similar meanings are near each other in the vector space- Can leave out 80 percent of coefficients before compression changes source text
- Still basically the same text, but with strange synonyms substituted
- Only use the first cosine => “diary measuring diary measuring”
- “I promised to talk about the practical implications of this, but the truth is I don’t really know”
- Future work with text as a waveform: auto-tune the text?
Brendan Curran-Johnson: A Shot in the Dark!
Reverse engineering a camera
- Camera has tons of features, including “live composite” which blends images together as if it’s a long exposure
- Pointed it at a screen to experiment with how it took pictures (it was faster than 30 fps)
- Connect to camera over HTTP, and found a command that listed all the private APIs and how to use them
- “I’m not even sure if message is the right word to use for TCP, so you don’t need to know a lot to use Wireshark” => don’t need an expert to do things, and they might still work
Sher Minn Chong: Plants are Recursive!!: Using L-Systems to Generate Realistic Weeds
- Fractals: a shape that contains a tiny version of of itself
- Koch snowflake: drawn by taking a line, replacing every straight line with a V recursively
- In nature: fern leaves, broccoli, trees
- L-systems: represent fractal as a set of string rewriting rules (like a formal grammar)
- Map string to drawing by letting each character represent an action
- Draw a weed by specifying a different set of rules
- Deterministic: we get the same structure over and over again
- Adding randomness
- Goal: look like the same species, but not identical
- Randomly decide between several very similar rules
Diana Liao: Mixing Paint! With Computers!
- Elementary school: you can make any color by mixing red, yellow, blue (subtractive color)
- Interpolation using normal color spaces
- Mix colors by linear interpolation (weighted average)? Usually works, except in RGB blue + yellow = gray :(
- Use CMYK instead? Usually works, except blue (1, 1, 0) + yellow (0, 0, 1) = black (1, 1, 1)
- There are other color spaces, but won’t go over them
- Find a physics equation to explain it
- Kubelka–Munk theory: given light and surface pigments, returns the reflection of light from a surface
- Previously used in Krita art application, but too complicated to maintain
- Need to know how transparent your pigments are, etc
- Hacky solution: manually set a midpoint
- If you want a blue to yellow gradient, manually add a green midpoint
- Paper iPad app uses this
- They also implemented Kubelka–Munk, but it was too realistic (didn’t always do what users expected, even though it was right)
- Summary: mixing colors is complicated and not intuitive
Andreas Fuchs: I’m not a number, I’m a free file descriptor!!1 (Our protagonist promptly disappears down a wormhole)
- File descriptors are an index into a table with bookkeeping information about the open file
lsof
: shows the open files- File descriptors that are automatically opened: 0 (stdin), 1 (stdout), 2 (stderr)
- Sharing file descriptors
- Inherit them by forking
- Pass them with a Unix domain socket, which the receiving process opens as soon as they read the message
- This includes sending Unix domain sockets through Unix domain sockets
- Circumventing file limit with domain sockets: what if you want to open more files without talking to sysadmin?
- Make a pair of domain sockets, with each end connected to the same process
- Send file descriptor into domain socket
- Close the file
- When you need it back, read it out of the domain socket
- Limit to the number of descriptors inflight in socket: 5,000 on Linux and 1,500 on OS X
- Can we take this further? Put the domain socket into another domain socket => 200,000 open files (then the OS itself cannot open files)
- On Linux,
lsof
only shows the domain sockets! - Linux does mark-and-sweep on file descriptors in sockets to know when to close
- Download test program
Kamal Marhubi: Storing your data in kernel space: an excellent bad idea!
- We want memory usage of our proess to appear low
- User programs normally cannot access kernel memory (enforced by memory management unit)
- Store the data in pipes and have an interface that looks like a continuous buffer
- When you need data, read out the entire pipe, get what you want, write it back in
- Linux max pipe buffer size is 1 MB, up to 64 MB total
- Exception:
pipe()
system call never fails, it just might give you less (4 kB) than you ask for - 64k file descriptors per process, so up to 256 MB (close the writing end of the pipe when done storing data)
- Exception:
- When you run the test program, the out-of-memory killer will kill the display process eventually
Samy Al Bahra: Debugging debuggers!!!
- Programs are compiled into Executable and Linking Format (ELF) on Linux, which kernel parses
- Machine code: no types, no variables (only registers and addresses)
- Symbol table maps these back to variable names and types
.debug_line
,.debug_frame
, and.debug_info
: code that’s executed to build data structures with information about how the executable is laid out- DWARF needs to support crazy compiler optimizations, so it’s complicated and Turing-complete
- Some compilers emit crappy DWARF, which might make it look like more variables are optimized out
- Debuggability (DWARF quality) is a really important factor in choosing compilers
- Some debuggers rely on reading thread data structures, which are easy to corrupt
- Things can actually be optimized out:
- Usually loop counters are optimized to save a register
- Tail-call optimization can remove all but the outermost stack frame
- Function arguments can be removed
Laura Lindzey: Convolution and the Fourier Transform: Math! (in pictures!!)
- Motivation: mapping ice sheets
- Fly plane that sends radar through ice
- Cross-correlation (f ★ g): difference between two functions
- Correlation or convolution: In convolution, need to multiply g by –1 before running cross-correlation
- Applications of cross-correlation
- In the radar application, the transmitted and received signal are convolved
- CDMA
- Image processing: Gaussian blur, edge extraction are just convolution with a special image
- Fourier transform tells the frequency components of a time series
- Application of Fourier transform: noise reduction
- Once we know the noise profile (frequencies) of equipment, we can cancel it out
- Take the Fourier transform data with only noise => amplitude and phase of the noise
- On image data, subtract those from transformed data, then run inverse Fourier transform
- Convolution is normally O(N^2), but can be O(N log(N)) if we use Fourier transform
- Makes a lot of image processing computationally feasible
Jennifer Fernick: All Together Now! Programming the Quantum Computer
- Today, quantum computers are mostly imaginary: we can only have ~15 qbits at a time
- Need specialized buildings, or someone walking can disturb the state
- Measuring state disturbs it (how to make memory?)
- 64 qbit quantum computer = 18 quintillion parallel operations = 5,000 Earths covered with supercomputers
- Qbits are composed into quantum logic gates, which are composed into quantum circuits
- Qbits are very small, but the machines around them are very big
- Note: D-WAVE uses quantum annealing, which isn’t the kind of quantum computer that will break cryptography, but can apply to optimization problems
- Quantum programming languages: want a classical-looking programming interface, but run on a quantum computer
- Challenging because the computers don’t exist yet
- What should the language’s semantics be?
- Speed up traditional applications by providing quantum algorithms with better scaling
- Break RSA encryption with Shor’s algorithm (exponential speedup of factorization)
- Previously: designing crypto asks, what problems can computers not solve?
- Now: what problems can computers and quantum computers not solve?
Mark Phillips: Upstream/Downstream: Discovering and Displaying Watershed Topology!
- Problem: given a point on a map, where will the water flow from/to that point?
- Solution: The watersheds website shows blue line for the path of water from the point, and red polygon for areas that drain to the point
- Basically all of the midwest flow to New Orleans
- In Utah and Nevada, many areas never flow to the ocean
- Can find the continental divide by seeing
- USGS: watershed boundary dataset
- The boundaries follow local ridge lines
- They’re carefully chosen so that the water in that region leaves one point (each polygon drains to another polygon)
- Now it’s a directed acyclic graph, and we know how to work with those!
- Other problems
- Dataset is 2 GB: how do you do anything interactively? => simplify the polygons
- Simplifying the polygons can lead to gaps between adjacent simplified polygons on the shared border => TopoJS format to store the polygons
- DOM manipulation is slow => use
<canvas>
to avoid making a bunch of DOM nodes - Rendering still slow because O(N) => precompute the region so now rendering is O(sqrt(N))
Mark Dominus: My favorite NP-complete problem!
- P problems are “easy” to solve and “easy” to check
- Sorting is so easy that you can check by just sorting the original list
- NP problems are “hard” to solve but “easy” to check
- Knapsack problem: if something doesn’t fit, don’t know if you made a mistake in the arrangement of the van’s contents. But if it fits (if you are given a solution), it’s easy to see.
- SAT (satisfiable) is a “special” NP-complete problem because any problem in NP can be converted into a SAT problem
- Solving SAT or any of hundreds of NP-complete problem efficiently will solve the rest of NP
- No known algorithm that solves these problems quickly for every input
- Mark’s favorite NP-complete problem: Elmo’s World
- Each episode is 20 minutes long, about some baby topic: babies, bicycles, lifeguards, mail, weather
- They are distributed on DVD: 3 episodes per video release, such as “Firefighters, Lifeguards, and Nurses”
- How do you pick the groups of three? Constraints: groups can’t overlap, all groups must be on some DVD, and some combinations are weird (getting dressed, putting on shoes, and sleeping)
- This is exact cover by 3-sets (X3C) Richard Karp
- Conclusions
- One of the canonical, original, NP-complete problems is the same as planning Elmo video releases
- No good algorithm, so how did they solve it?
- They didn’t: “Flowers, Bananas, and Hair” was a DVD