Great multi-touch experience: can pass dragged items between hands, and hold
onto them while interacting with other UI elements, as if manipulating
physical objects
Drag and drop objects
UIDragItem is the model object for the data you’re dragging
UIPasteConfiguration can be used to support both paste and drop with
the same code
UIDropInteraction for more customization: can provide a delegate that
accepts/rejects drops, requests formats
Animations, etc. can be customized with the delegates (see the sketch below)
Timeline for a drag and drop interaction
Drag starts: app needs to provide the items to be dragged
Default image is a snapshot of the dragged view
Drop proposal: .cancel, .copy (should be default), .move,
.forbidden
.move only works within the same app
Get another callback when the user lifts their finger, to complete or fail the
drop session
Sample code
Bulletin board app for dragging around photos
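Not the sample project itself, but a minimal sketch of the drop side described above (class and property names are mine): a UIDropInteraction delegate that accepts image drops, proposes .copy, and loads the images when the finger lifts.

```swift
import UIKit

// Hypothetical drop target, not the session's bulletin board sample.
class PhotoDropViewController: UIViewController, UIDropInteractionDelegate {
    let imageView = UIImageView()

    override func viewDidLoad() {
        super.viewDidLoad()
        imageView.frame = view.bounds
        imageView.isUserInteractionEnabled = true   // image views default to false
        view.addSubview(imageView)
        // Attach the drop interaction to the view that should accept drops.
        imageView.addInteraction(UIDropInteraction(delegate: self))
    }

    // Accept or reject the session based on what it can deliver.
    func dropInteraction(_ interaction: UIDropInteraction,
                         canHandle session: UIDropSession) -> Bool {
        return session.canLoadObjects(ofClass: UIImage.self)
    }

    // Propose how the drop would be handled (.copy is the usual default).
    func dropInteraction(_ interaction: UIDropInteraction,
                         sessionDidUpdate session: UIDropSession) -> UIDropProposal {
        return UIDropProposal(operation: .copy)
    }

    // Called when the user lifts their finger to complete the drop.
    func dropInteraction(_ interaction: UIDropInteraction,
                         performDrop session: UIDropSession) {
        session.loadObjects(ofClass: UIImage.self) { items in
            self.imageView.image = items.compactMap { $0 as? UIImage }.first
        }
    }
}
```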
From Monroe to NASA (Session 106)
Dr. Christine Darden
In high school, geometry was her first exposure to math. Ended up studying
math in college and going into teaching, out of concern that nobody would hire
a black female mathematician.
Ended up in graduate school as a research assistant for aerosol physics.
Earned a degree in applied math and recruited by NASA.
After five years of implementing math on the computer, she pushed management
to give her a promotion. Previously, men and women with the same background
were assigned different positions: the men became researchers and published
papers, while the women worked as computers.
Researched ways to reduce sonic boom for supersonic aircraft.
Went into management at Langley and retired in 2007.
CoreML provides inference implementations of many learning algorithms,
including deep learning, allowing developers to focus on app features
Vision, NLP built on top of CoreML
High performance, on-device implementations with Accelerate framework
and Metal Performance Shaders, which you can use to build custom models
Storing models in CoreML
Many models, including feed forward and recurrent neural networks (30+
layer types)
Developers aren’t required to work with the models directly — CoreML
can handle these “low level details”
Models are stored and shared as .mlmodel files (a protobuf-based open format)
containing the “function signature,” structure, and weights. The format is
optimized for “openness,” and Xcode compiles it into something more performant.
Apple provides some models, and you can import from other open source
learning frameworks using Core ML Tools (open source)
Using models
All models have the same interface, like scikit-learn
Xcode generates Swift code so that you can write things like
myModel.prediction(input: myInput)
You can also access the lower-level MLModel object if you want
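A minimal sketch of what that looks like, assuming a hypothetical FlowerClassifier.mlmodel has been added to the project (the model name and its image/classLabel fields are made up; Xcode generates the wrapper class).

```swift
import CoreML
import CoreVideo

// Hypothetical model: Xcode generates a FlowerClassifier class from
// FlowerClassifier.mlmodel, with typed inputs and outputs.
func classify(_ pixelBuffer: CVPixelBuffer) -> String? {
    let model = FlowerClassifier()
    // Same shape of call for every generated model: myModel.prediction(input:)
    guard let output = try? model.prediction(image: pixelBuffer) else { return nil }
    return output.classLabel
}

// The lower-level MLModel object is still reachable when you need it.
let underlyingModel: MLModel = FlowerClassifier().model
```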
Context: the quality and popularity of photos/videos continue to increase, but
bandwidth is still expensive
High Efficiency Video Coding (HEVC or H.265) uses up to 40% less space
compared to H.264 (50% for iOS camera)
Variable block sizes allow larger blocks, which provides compression
benefits for high-resolution videos
Uses the Discrete Sine Transform (DST) as well as the DCT, also with variable
transform sizes
More directions for intra prediction
Filters to do sub-pixel (not pixel aligned) motion estimation?
Sample adaptive offset filter for deblocking
High Efficiency Image File Format (HEIF) is an image container format that
can use HEVC as the codec
Multiple images: sequences, alpha/depth channel
Usually uses HEVC for compression, but can support others
HEIF photos on Apple devices
Images are encoded as 512x512 tiles to support partial downloading
.heic extension indicates the HEIF file was encoded with HEVC (this is
always the case)
Includes EXIF and thumbnail as usual
Hardware decoding: A9 (iPhone 6s and later), Skylake (ex: Touch Bar
MacBook Pro)
Hardware encoding: A10 (iPhone 7)
HEVC movies on Apple devices
QuickTime is used as the container
Also supports 10-bit color (software encoder available)
Hardware decoding: A9, Macs?
Hardware encoding: A10, Intel 6th generation Core
Decodable vs. playable
Decodable: it is possible to decode the movies, such as for a
transcoding or non-realtime workflow (true on all devices)
Playable: can be played in realtime
Sharing
Transcode to JPEG or H.264 if you’re not sure about the receiver’s
decoding capabilities, or if the server cannot transcode
Ex: Mail
Try to avoid transcoding (and the increased file size) by doing a capabilities
exchange
Ex: AirDrop conditionally encodes depending on the receiving device
Introducing ARKit: Augmented Reality for iOS (Session 603)
Sneak peek: applications of ARKit
Tell 3D stories that you can explore by moving the device
IKEA puts furniture in your living room
Pokémon Go improved tracking by adopting ARKit
What’s provided by ARKit?
Visual–inertial odometry (physical distances with no camera
calibration)
Finding surfaces and hit testing (intersecting?) them
Lighting estimation
Custom AR views to help with rendering
ARKit API
Create an ARSession object and run it with an ARSessionConfiguration (sketch
below). The session creates an AVCaptureSession and CMMotionManager for you,
then starts tracking.
Grab frames from the ARSession or get notified via delegate
Calling run() again updates the configuration
ARFrame: image, tracking points, and scene information
ARAnchor represents a real-world position, which you can add and
remove
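A minimal sketch of that flow; in the shipping SDK the world-tracking configuration class is named ARWorldTrackingConfiguration, and the controller/helper names here are mine.

```swift
import ARKit
import simd

final class ARController: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        session.delegate = self
        // ARWorldTrackingConfiguration is the session configuration subclass.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = .horizontal
        // Calling run(_:) again later just updates the configuration.
        session.run(configuration)
    }

    // Frames arrive via the delegate (or can be read from session.currentFrame).
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let transform = frame.camera.transform        // device pose
        let intrinsics = frame.camera.intrinsics      // camera intrinsics
        _ = (transform, intrinsics)
    }

    // Anchors represent real-world positions you can add and remove.
    func place(anchorAt transform: simd_float4x4) {
        session.add(anchor: ARAnchor(transform: transform))
    }
}
```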
Tracking details
Find the same feature in multiple frames to triangulate device’s
position
Pose estimation saves power by fusing high-frequency motion data
(relative) with low-frequency camera data (absolute)
The transform and camera intrinsics are returned as an ARCamera
Plane detection details
Finds planes that are horizontal with respect to gravity
Multiple similar planes are merged
Hit testing
Given a ray from the camera through some visible point in the scene, returns
every intersection of the ray with planes in the scene, along with its distance
Option to estimate planes by finding coplanar feature points
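A sketch of a hit test through an ARSCNView, assuming one of the AR views mentioned above is in use (the function name is mine).

```swift
import ARKit

// Hit test from the centre of the view and drop an anchor on the first plane
// (detected or estimated) that the ray intersects.
func addAnchorAtCenter(of sceneView: ARSCNView) {
    let center = CGPoint(x: sceneView.bounds.midX, y: sceneView.bounds.midY)
    let results = sceneView.hitTest(center,
                                    types: [.existingPlaneUsingExtent,
                                            .estimatedHorizontalPlane])
    // Results are ordered by distance along the ray.
    guard let nearest = results.first else { return }
    sceneView.session.add(anchor: ARAnchor(transform: nearest.worldTransform))
}
```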
Wednesday, June 7
Capturing Depth in iPhone Photography (Session 507)
Dual camera
iPhone 7 Plus sold much better than 6s Plus, and Brad attributes this to
the dual camera
Dual camera (virtual device) seamlessly matches exposure and compensates
for parallax when zooming
Depth and disparity
Value of depth map at each location indicates distance to that object
Stereo-rectified: multiple cameras that have the same focal length and
parallel optical axes
Disparity: the inverse of depth. For the same object, the change in position
between the images goes down as the object gets farther away.
AVDepthData can be either depth or disparity map, because they’re both
depth-related data
Holes are represented as NaN. This happens when no features can be found, or
for points that are not visible in both images.
Calibration errors (unknown baseline) can come from OIS, gravity pulling on
the lens, or focusing. This leads to a constant offset error in the depth
data, so we only have relative depth (it cannot be compared between images).
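(Standard stereo geometry makes the inverse relationship concrete: for a rectified pair, disparity = baseline × focal length / depth, so disparity falls off as 1/depth, and an imprecisely known baseline rescales the whole map, which is why only relative depth is available.)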
Streaming depth data
AVCaptureDepthDataOutput can give raw depth data during preview, or
smooth it between frames
Maximum resolution 320x240 at 24 fps
AVCaptureDataOutputSynchronizer helps synchronize multiple outputs of
a capture session
Can opt into receiving camera intrinsics matrix with focal length and
optical center (used by ARKit)
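A rough sketch of that pipeline, assuming `device` is the dual camera and skipping the usual canAddInput/canAddOutput checks (class and queue names are mine).

```swift
import AVFoundation

final class DepthStreamer: NSObject, AVCaptureDataOutputSynchronizerDelegate {
    let session = AVCaptureSession()
    let videoOutput = AVCaptureVideoDataOutput()
    let depthOutput = AVCaptureDepthDataOutput()
    var synchronizer: AVCaptureDataOutputSynchronizer?
    let queue = DispatchQueue(label: "depth.stream")

    func configure(with device: AVCaptureDevice) throws {
        session.beginConfiguration()
        session.addInput(try AVCaptureDeviceInput(device: device))
        session.addOutput(videoOutput)
        session.addOutput(depthOutput)
        depthOutput.isFilteringEnabled = true   // temporally smoothed depth
        // Deliver video + depth together, matched by timestamp.
        let sync = AVCaptureDataOutputSynchronizer(dataOutputs: [videoOutput, depthOutput])
        sync.setDelegate(self, queue: queue)
        synchronizer = sync
        session.commitConfiguration()
        session.startRunning()
    }

    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                                didOutput collection: AVCaptureSynchronizedDataCollection) {
        guard let synced = collection.synchronizedData(for: depthOutput)
                as? AVCaptureSynchronizedDepthData,
              !synced.depthDataWasDropped else { return }
        let depthData = synced.depthData      // AVDepthData (depth or disparity)
        _ = depthData.depthDataMap            // CVPixelBuffer, up to 320x240
    }
}
```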
Capturing depth data
In iOS 11, all Portrait Mode photos contain depth data embedded
Lenses have distortion because they're not pinholes: straight lines in the
scene don't necessarily stay straight in the image. Features in different
parts of the frame may be warped differently (especially because the two
cameras have different focal lengths).
Depth maps are computed as rectilinear, then distortion is added so they
correspond directly to the RGB image. This is good for photo editing,
but you need to make it rectilinear for scene reconstruction.
AVCameraCalibrationData: intrinsics, extrinsics (camera’s pose), lens
distortion center (not always the same as optical axis), radial
distortion lookup table
Dual photo capture
Capturing separate images from both cameras in a single request
Use depth data to apply different effects to different parts of the scene
Preparing depth data
Read it into CIImage from the photo’s auxiliary data
Upscale with edge-preserving algorithm
Normalize by finding min and max, because dual camera can only provide
relative depth data
Higher quality by using disparity instead of depth (why?)
Editing with depth data
Common pattern: filter the image, then use a modified depth map as a
blend mask between original and filtered
Pass nil as color space when working with disparity maps so Core Image
doesn’t try to color manage
CIDepthBlurEffect is the same filter used in Portrait Mode. You can
adjust the size of the simulated aperture and the focused area (either
by rect or facial landmark locations).
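A minimal sketch of that blend pattern with Core Image; the blur radius and function name are mine, and in practice you would also normalize (and edge-preservingly upscale) the disparity map as described above.

```swift
import CoreImage

// Assumes `url` points to a Portrait Mode photo with embedded disparity data.
func backgroundBlur(url: URL) -> CIImage? {
    guard let image = CIImage(contentsOf: url),
          // Load the auxiliary disparity map as its own CIImage.
          let disparity = CIImage(contentsOf: url, options: [.auxiliaryDisparity: true])
    else { return nil }

    // The disparity map is much smaller than the photo; scale it to match.
    let scaleX = image.extent.width / disparity.extent.width
    let scaleY = image.extent.height / disparity.extent.height
    let mask = disparity.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // Filter the whole image, then blend original and filtered using the
    // disparity map as the mask: near (high-disparity) areas stay sharp.
    let blurred = image.applyingFilter("CIGaussianBlur",
                                       parameters: [kCIInputRadiusKey: 10])
    return image.applyingFilter("CIBlendWithMask", parameters: [
        kCIInputBackgroundImageKey: blurred,
        kCIInputMaskImageKey: mask
    ])
}
```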
APFS was already the default on iOS, watchOS, and tvOS; now it comes to macOS too
Dry run conversions in iOS 10.0, 10.1, and 10.2 to test robustness of
the conversion process
Many devices gained free space because LwVM (volume manager) is no
longer needed
COW snapshots ensure consistent state when making iCloud backups
Unicode normalization
Unicode characters can be represented in many ways, such as ñ = n + ~
Native normalization for erase-restored iOS 11 devices
Runtime normalization (in the filesystem driver?) for 10.3.3 and 11
devices that are not erase-restored
Future update to convert all devices to native normalization. Maybe
because APFS compares filenames by hash, so they need to redo all the
hashes on disk.
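A tiny illustration of why normalization matters, in Swift (which, unlike a byte-comparing filesystem, compares strings by canonical equivalence).

```swift
let precomposed = "\u{00F1}"   // "ñ" as a single code point
let decomposed = "n\u{0303}"   // "n" followed by a combining tilde

print(precomposed == decomposed)          // true: canonically equivalent in Swift
print(precomposed.unicodeScalars.count)   // 1
print(decomposed.unicodeScalars.count)    // 2: same-looking name, different bytes
```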
APFS on macOS
System volume will be automatically converted by High Sierra release
installer (it’s optional during the beta). Other volumes can be manually
converted. Boot volume must be converted by the installer to be
bootable, so don’t do it manually.
Multiple volumes on existing drives do not use space sharing — they
are converted independently. Suggest adding APFS volumes to an existing
container and manually copying the files over.
EFI driver embedded into each APFS volume, allowing boot support even
for encrypted drives in virtual machines
FileVault conversion preserves existing recovery key and passwords.
Snapshots are encrypted even if they were taken before enabling
FileVault.
All Fusion Drive metadata is pinned to the SSD
Defragmentation support for spinning hard drives only (never on SSDs)
APFS and Time Machine
Since Lion, Mobile Time Machine locally caches some backups so you don’t
need the external drive all the time
Was implemented with 2 daemons, including a filesystem overlay. Lots of
complexity: O(10,000) lines of code.
In High Sierra, it’s been reimplemented on top of APFS snapshots
Make a backup in O(1) time: tmutil snapshot
List the volumes with mount. The ones beginning with
com.apple.TimeMachine are the hourly APFS snapshots being taken by
Time Machine.
Unmount all snapshots to put it into a cold state:
tmutil unmountLocalSnapshots /. Then try to enter Time Machine and it
will load very fast (they are mounted lazily).
Local restores are also O(1) because we just make a COW reference to
existing blocks on disk
APFS and Finder
Fast copying using COW clones
If a bunch of clones referencing the same blocks are copied to another
APFS container, the cloning relationship is preserved
Check for the available video codec types (for example, on
AVCapturePhotoOutput; see the sketch below). If HEVC is supported, it will be
the first item in the array, so it gets used by default.
Get used to dealing with HEVC content
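A small sketch of that check on AVCapturePhotoOutput, falling back to JPEG (the function name is mine).

```swift
import AVFoundation

// Prefer HEVC when the device/OS supports it; otherwise fall back to JPEG.
func makePhotoSettings(for output: AVCapturePhotoOutput) -> AVCapturePhotoSettings {
    // availablePhotoCodecTypes lists supported codecs; HEVC comes first when available.
    if output.availablePhotoCodecTypes.contains(.hevc) {
        return AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
    }
    return AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.jpeg])
}
```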
Encoding HEVC
AVAssetWriter has new presets
Hierarchical frame encoding
I-frames contain full information. P-frames refer to previous frames, and
B-frames can refer to both previous and future frames.
When dropping frames, a frame can only be dropped if no other frame depends on it
If you want to be able to drop a lot of frames, the P-frames must depend on
frames far in the past, which makes compression suffer
Hierarchical encoding in HEVC makes the referenced frames closer
Opt into compatible high frame rate content in Video Toolbox: set the
base layer frame rate (30 fps) and expected frame rate (the actual
content frame rate, such as 120 fps).
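A sketch of opting into HEVC when writing a movie with AVAssetWriter; the 1080p dimensions and function name are my own choices.

```swift
import AVFoundation

func makeHEVCWriter(to url: URL) throws -> (AVAssetWriter, AVAssetWriterInput) {
    let writer = try AVAssetWriter(outputURL: url, fileType: .mov)
    // Build settings by hand with the new AVVideoCodecType.hevc...
    let settings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.hevc,
        AVVideoWidthKey: 1920,
        AVVideoHeightKey: 1080
    ]
    // ...or ask AVOutputSettingsAssistant for one of the new HEVC presets, e.g.
    // AVOutputSettingsAssistant(preset: .hevc1920x1080)?.videoSettings
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
    input.expectsMediaDataInRealTime = true
    writer.add(input)
    return (writer, input)
}
```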
What is High Efficiency Image File Format (HEIF)?
Proposed 2013, ratified in summer 2015 (only 1.5 years later!)
Uses HEVC intra-encoding, so images are on average 2x smaller than JPEG
Supports auxiliary images, such as alpha or depth maps
Store image sequences, such as bursts or focus stacks
EXIF and XMP for metadata
Arbitrarily large files with efficient tile loading, allowing gigapixel panos
to be loaded while using O(100 MB) of memory
Low level access to HEIF
CGImageSource supports HEIF images, and infers type from file
extension
Tiling support: get metadata from CGImageSourceCopyPropertiesAtIndex,
then specify area with CGImage.cropping(). CG will only decode the
relevant tiles.
CGImageDestinationCreate() will infer the format from the extension, though it
returns nil if the device does not have a hardware encoder
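A sketch of that tiled-decode path (the crop rectangle is arbitrary).

```swift
import ImageIO
import CoreGraphics

// Load just one region of a large HEIF, so only the overlapping 512x512 tiles
// need to be decoded.
func loadRegion(of url: URL) -> CGImage? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil) else { return nil }
    // Image properties (pixel size, orientation, ...) without decoding pixels.
    let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any]
    _ = properties?[kCGImagePropertyPixelWidth]

    guard let image = CGImageSourceCreateImageAtIndex(source, 0, nil) else { return nil }
    // Cropping only touches the tiles that overlap this rect.
    return image.cropping(to: CGRect(x: 0, y: 0, width: 1024, height: 1024))
}
```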
High level access to HEIF (PhotoKit)
Required to render output as JPEG or H.264, regardless of input format
Don’t need to do anything for Live Photo editing, since the editing code
doesn’t render anything directly
Capturing HEIF
Must use AVCapturePhotoOutput to get HEIF
It used to deliver a CMSampleBuffer, but unlike JPEG/JFIF, HEIF is a
container format that may hold many codecs
Introducing AVCapturePhoto, which encapsulates the photo, preview,
auxiliary data, settings, and more (see the sketch at the end of these notes)
Every captured image is now automatically put into a container (HEIF,
JFIF, TIFF, or DNG)
Photos captured during movie recording will contend for the same HEVC encoder
block. Video is given priority, so photos will take longer and be less
compressed. Recommendation: JPEG for photos captured during video.
HEVC encode takes longer than JPEG, but it’s still fast enough for 10
fps burst. However, faster bursts should use JPEG.
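A minimal sketch of the new AVCapturePhoto delivery path mentioned above (the delegate class name and output path are mine).

```swift
import AVFoundation

// The capture delegate now gets a single AVCapturePhoto instead of a
// CMSampleBuffer, regardless of the codec/container in use.
final class PhotoCaptureDelegate: NSObject, AVCapturePhotoCaptureDelegate {
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard error == nil else { return }
        // The container (HEIF/JFIF/TIFF/DNG) is flattened to Data for you.
        if let data = photo.fileDataRepresentation() {
            try? data.write(to: URL(fileURLWithPath: NSTemporaryDirectory() + "photo.heic"))
        }
        // Depth data, preview image, and resolved settings ride along too.
        _ = photo.depthData
        _ = photo.previewPixelBuffer
        _ = photo.resolvedSettings
    }
}
```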