A week on with a Vision Pro

There are excellent reviews of the Vision Pro “out there”; this post isn’t meant to be another. It’s a record of my first experiences, thoughts, and scribbled notes for a future me to look back on after a few iterations of the product.

I had been planning to get a Vision Pro since it was first rumored. I put away funds from contracts and gigs, and when the time came and it was available to order, I still had sticker shock. When I bought one, I didn’t skimp, but I didn’t blow it out either. My goal is to learn this product – how it works and how to work with it – and to write apps that work beautifully on it. When the available-to-developers-only head-strap extension was announced, I grabbed it too. My only prior experience with a headset was an Oculus (now Meta) Quest 2, which was fun and illustrative – but I couldn’t use it for more than a few hours before nausea would start to catch up with me.

Right off, the visual clarity of the Vision Pro blew me away. The displays are mind-bogglingly good, and the 3D effect is instantly crisp and clear. I found myself exploring the nooks and corners of the product that first evening, without a hint of the nausea I’d feared might appear. The two and a half hours of battery life went by quickly.

Beyond the stunning visuals, I wanted to really understand and use the interaction model. From the API, I know it supports both indirect and direct interaction using hand tracking. Most of the examples and interactions I had at the start were “indirect” – meaning that where I looked was where actions would trigger (or not) when I tapped my fingers together. It’s intuitive, easy to get started with very quickly, and (sometimes too) easy to forget it’s a control and accidentally invoke it.
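As a note for future me, here’s a minimal sketch (my own reading of the API, not anything from a sample project) of why the indirect model is so easy to adopt: a stock SwiftUI control on visionOS responds to both input styles without extra work – a pinch while gazing at it, or a direct poke with a finger. The view and the lamp-toggling action below are made up for illustration.

```swift
import SwiftUI

// Minimal sketch: a standard SwiftUI control on visionOS fires its action
// for both indirect input (look at it, then pinch) and direct input
// (reach out and touch it). The view and action are placeholders.
struct LampToggleView: View {
    @State private var isOn = false

    var body: some View {
        Button {
            // Triggered by either interaction style.
            isOn.toggle()
        } label: {
            Label(isOn ? "Lamp On" : "Lamp Off",
                  systemImage: isOn ? "lightbulb.fill" : "lightbulb")
        }
    }
}
```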

In early window managers on desktop computers, there was a pattern of usage called “focus follows mouse” (which Apple pushed hard to move away from). The idea was that whichever window your mouse cursor was over was where keyboard input would be directed. The indirect interaction mode on Vision Pro is that on steroids, and it takes some getting used to. In several cases, I found myself looking away from a control while wanting to continue using it, with messy results – activating other buttons, and so on.

Most of the apps (even iOS apps “just” running on Vision Pro) worked flawlessly and easily, and refreshingly didn’t feel as out of place as iOS-designed apps feel on an iPad (looking at you, Instagram). One of the most useful visual affordances is a slight sheen that the OS plays over areas that are clearly buttons or targeted controls, which makes a wonderful feedback loop so that you know you’re looking at the right control. The gaze tracking is astoundingly good – so much better than I thought it would be – but it still needs some space for grace. iOS default distances mostly work, although in a densely packed field of controls I’d want just a touch more space between them myself. After wearing the device for a couple of hours, I’d find the tracking not as crisp and I’d have a bit more error. Apps that eschew accessible buttons in favor of random visuals and tap targets are deeply annoying in Vision Pro: you get no feedback affordances to let you know whether you’re on target or not. (D&D Beyond… I’ve got to say, you’ve got some WORK to do.)
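For my own notes, a sketch of how I understand that sheen gets applied: stock buttons get it for free, but a custom tap target has to opt in with a hover effect to give the wearer that on-target feedback. The asset name and tap handling below are placeholders, not anything from a shipping app.

```swift
import SwiftUI

// Sketch: opting a custom tap target into the system highlight ("sheen")
// so there's visual feedback when the wearer's gaze lands on it.
// The image asset and the tap action are placeholders.
struct ArtworkTile: View {
    var body: some View {
        Image("map-tile")                          // placeholder asset
            .resizable()
            .frame(width: 120, height: 120)
            .contentShape(.rect(cornerRadius: 12)) // define the tappable region
            .hoverEffect()                         // system highlight on gaze
            .onTapGesture {
                // Without the hover effect above, there's no cue that
                // the gaze is on target before this fires.
                print("tile tapped")
            }
    }
}
```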

Targeting actions (or not) gets even more complicated when you’re looking at touchable targets in a web browser. Video players in general are a bit of a tar pit in terms of useful controls and feedback. YouTube’s video player was better than some of the others, but web pages in general were a notable challenge – especially the ones flooded with ads, pop-overs, and shit moving around to “catch your eye”. That term becomes far more literal and relevant when some side movement shifts my gaze and triggers an errant click, and now I’m looking at some *%&$!!# video ad that I want nothing to do with.

In a win for my potential productivity, you can have windows everywhere. The currently-narrowish field of view constrains it: you have to move your head – instead of just glancing – to see some side windows. It’s a refreshing break from the “do one thing at a time” metaphor that never existed on macOS, pervades iOS, and lives in some level of Dante’s Inferno on iPadOS. I can see a path to being more productive with the visionOS “spatial computer” than I ever would be with an iPad. The real kicker for me (not yet explored) will be text selection – specifically, selecting a subrange of a bit of text. That use case is absolutely dreadful in Safari on iOS. For example, try to select the portion of the URL after the host name in the Safari address bar. That seemingly simple task is a huge linchpin to my ability to work productively.

The weight and battery life of this first product release are definitely suboptimal – easily survivable for me, but sometimes annoying. Given the outstanding technology packed into this device, that’s not surprising. The headset sometimes feels like it’s slipping down my face, or I need to lift and reset it a bit to make it comfortable. For wearing the device over an hour or so while sitting upright, I definitely prefer the over-the-head strap – and I don’t give a shit what my hair looks like.

Speaking of caring what I look like – I despise the “Persona” feature and won’t be using it. It’s straight into the gaping canyon of the uncanny valley. I went through the process to set one up and took a look at it. I tried to be dispassionate about it, but ultimately fled in horror and don’t want a damn thing to do with it. I don’t even want to deal with FaceTime if that’s the only option. I’d far prefer to use one of those stylized Memoji, or to be able to provide my own 3D animation puppet mapped to my facial expressions. I can make a more meaningful connection to a stylized image or puppet than I can to the necrotic apparition of the current Persona.

And a weird quirk: I have a very mobile and expressive face, and can raise and lower either eyebrow easily. I use that a lot in my facial expressions. The FaceTime facial-expression tracking can’t clue in to that – it’s either both eyebrows or none at all. While I’m impressed it can read anything about my eyebrows while I’m wearing the Vision Pro, that’s a deal-killer for representing my facial expressions.

Jumping back to something more positive – in terms of consuming media, the Vision Pro is a killer device right where it is now. The whole space of viewing and watching photos and video is amazing. The panoramas I’ve collected while traveling are everything I hoped for. The immersive 180° videos made me want to learn how to make some of those, and the stereoscopic images and video (smaller field of view, but same gist) are wonderful. It’s a potent upgrade to the clicking wheels of the 3D viewfinder from my childhood. Just watching a movie was amazing – either small and convenient off to the side, or huge in my field of view – at my control – with a truly impressive “immersive theater” mode that’s really effective. It’s definitely a solo experience in that respect – I can’t share watching a movie cuddled up on the couch – but even at the high price point, the video (and audio) quality of the Vision Pro makes a massive theater out of the tightest cubby. In that respect, the current Vision Pro is a very comparable value to a large home theater.

Add on the environments (I’m digging Mt Hood a lot) – with slightly variable weather and environmental acoustics, and day and night transitions – and it’s a tremendous break. I’d love to author a few of those: a sort of crazy, dynamic stage/set-design problem with a mix of lighting, sounds, supportive visual effects, and high-definition photography to backdrop it all. I was familiar with the concept from the Quest, but the production quality in the Vision Pro is miles ahead, and so much more inviting because of it.

I looked at my M1 MacBook Pro, tapped the connect button, and instantly loved it. The screen on the laptop blanked out, replaced by a much larger, high-resolution floating display above it. I need to transition my workspace to really work this angle, as it’s a bit tight for a Vision Pro. Where I work currently, there are overhead pieces nearby that impinge on the upper visual space, prompting warnings and visual intrusions to keep me from hitting anything when I look around. Using the trackpad on the Mac as a pointer within Vision Pro is effective, and the keyboard is amazing. Without a laptop nearby, I’d need (or want) at least a keyboard connected – the pop-up keyboard can get the job done (using either direct or indirect interaction), but it’s horrible for anything beyond a few words.

I have a PS5 controller that I paired with my iPad for playing games, and later paired with the Mac to navigate in the Vision Pro simulator in Xcode. I haven’t paired it with the Vision Pro yet, but that’s something I’d really like to try – especially for a game. For the “immerse you in an amazing world” games that I enjoy, I can imagine the result. With the impressive results of the immersive environments, there’s a “something” there that I’d like to see – something from Rockstar, Ubisoft, Hello Games, or the Sony or Microsoft studios. No idea if that’ll appear as something streamed from a console or running locally, but the possibilities are huge if it leverages the high visual production values that the Vision Pro provides. I’m especially curious what Disney and Epic Games might do together – an expansion or side-track from their virtual sets, creating environments and scenes that physically couldn’t otherwise exist, and then interacting within them. I’m sure they’re thinking about the same thing. (Hey, No Man’s Sky – I’m ready over here!)
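For when I do get around to trying it, my understanding is that the same GameController framework I used with the iPad carries over, so watching for and reading the controller should look something like the sketch below – untested on the headset, and the class name and print statements are just placeholders.

```swift
import GameController

// Sketch (untested on the device): watching for a paired game controller
// with the GameController framework and reading its inputs.
final class ControllerWatcher {
    init() {
        NotificationCenter.default.addObserver(
            forName: .GCControllerDidConnect,
            object: nil,
            queue: .main
        ) { notification in
            guard let controller = notification.object as? GCController else { return }
            print("Connected: \(controller.vendorName ?? "unknown controller")")

            // Observe the extended gamepad profile for stick/button changes.
            controller.extendedGamepad?.valueChangedHandler = { _, element in
                print("Input changed: \(element)")
            }
        }
        // Look for wireless controllers that are already in pairing mode.
        GCController.startWirelessControllerDiscovery(completionHandler: nil)
    }
}
```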

As a wrap-up, my head’s been flooded with ideas for apps that lean into the capabilities of the Vision Pro. Most are of the “wouldn’t it be cool!” variety; a few are insanely outlandish and would take a huge team of both artists and developers to assemble. Of the ones that aren’t so completely insane, the common theme is the visualization and presentation of information. A large part of my earlier career was more operationally focused: understanding large, distributed systems, managing services running on them, and debugging things when “shit went wrong” (such as a DC bus bar in a data center exploding when a water leak dripped on it and shorted it out, scattering copper droplets everywhere). I believe there’s a real potential benefit to seeing information with another dimension added to it, especially when you want to look at what would classically be exposed as a chart, but with values that change over time. There’s a whole crazy world of software debugging and performance analysis, distributed tracing, and correlation with logging and metrics, all of which benefit from anything that makes it easier to quickly identify failures and resolve them.

I really want to push what’s available now in a volume 3D view. That’s the most heavily constrained 3D representation in visionOS today, primarily to keep anyone from knowing where you’re gazing as a matter of privacy. Rendering and updating 3D visualizations in a volume lets you “place” it anywhere nearby, change your position around it, and ideally interact with it to explore the information. I think that’s my first real target to explore.
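To sketch out where I think I’m headed (my own guess at the shape of it, not working code from a project yet): a volumetric window group hosting a RealityView, with entities standing in for the data marks I’d eventually generate.

```swift
import SwiftUI
import RealityKit

// Sketch of the starting point I have in mind: a volumetric window hosting
// a RealityView. The single sphere is a stand-in for generated data marks.
@main
struct VolumeChartApp: App {
    var body: some Scene {
        WindowGroup(id: "chart-volume") {
            RealityView { content in
                let mark = ModelEntity(
                    mesh: .generateSphere(radius: 0.05),
                    materials: [SimpleMaterial(color: .cyan, isMetallic: false)]
                )
                mark.position = [0, 0, 0]   // meters, centered in the volume
                content.add(mark)
            }
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.6, height: 0.6, depth: 0.6, in: .meters)
    }
}
```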

I’m curious where the overlap with WebGL will appear and how that presents into the visionOS spatial repertoire. I haven’t yet explored that avenue, but it’s intriguing, especially for the data-visualization use case.
