Exciting Augmented Reality Applications with Apple ARKit3

In a previous post, Try Before You Buy with Augmented Reality, we discussed the practical benefits of AR. For example, according to an article on Digital Commerce 360, Build.com shared that the return rate for AR shoppers is 22 percent lower than that of shoppers who bought the same product without using the tool.

Another post, Web AR for Android Devices on SOLIDWORKS Sell, outlined the practicality and ease of use of Web AR. For instance, it no longer needs a separate QR code to indicate a physical anchor point for a digital model. Plus, you can simply kick off the AR experience from a browser. You do not need a long list of custom apps on iOS or Android, with one exception: Google’s ARCore library on Android devices. As a consumer or a marketer using SOLIDWORKS Sell, you do not have to code at all to enable the Web AR experiences.

Growing adoption comes with growing demands. It is great that we do not have to print and place a special QR code sheet to mark an anchoring surface, especially since this prior Web AR anchor only supported an upward-facing horizontal surface such as a floor or a table top, which does not accommodate all the various products.

A common request has been to place digital models on non-floor planes such as a wall or a ceiling. Examples include an outlet as shown in Figure 1, or a ceiling lighting fixture as shown in Figure 2.

Figure 1. An outlet plate mounted vertically on a wall.
Figure 2. A lighting fixture mounted to a ceiling.

Now with ARKit3, these recognizable anchors have been expanded to support more planes, such as walls and ceilings as shown in Figure 3.

Figure 3. ARKit3 recognizes floors, walls and ceilings as potential anchors. 

According to the 2019 Apple Worldwide Developers Conference presentation, here is the current list of plane classifications supported in ARKit3: wall, floor, ceiling, seat, table, door and window. I hope developers can take advantage of these engine enhancements in the future.
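To illustrate how an application might use these classifications, here is a minimal Python sketch that filters detected planes by the surface a product belongs on. It is not ARKit code: the `DetectedPlane` record, the `ALLOWED_SURFACES` table and the product names are all hypothetical, and only the seven classification names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class PlaneClass(Enum):
    """The seven plane classifications listed above."""
    WALL = "wall"
    FLOOR = "floor"
    CEILING = "ceiling"
    SEAT = "seat"
    TABLE = "table"
    DOOR = "door"
    WINDOW = "window"


@dataclass
class DetectedPlane:
    """Hypothetical record for a plane reported by an AR engine."""
    plane_id: str
    classification: PlaneClass


# Hypothetical product-to-surface rules, chosen only for illustration.
ALLOWED_SURFACES = {
    "outlet_plate": {PlaneClass.WALL},
    "ceiling_light": {PlaneClass.CEILING},
    "coffee_maker": {PlaneClass.TABLE},
}


def candidate_anchors(product, planes):
    """Return the detected planes that the given product may be anchored to."""
    allowed = ALLOWED_SURFACES.get(product, set())
    return [p for p in planes if p.classification in allowed]


planes = [
    DetectedPlane("p1", PlaneClass.FLOOR),
    DetectedPlane("p2", PlaneClass.WALL),
    DetectedPlane("p3", PlaneClass.CEILING),
]
print([p.plane_id for p in candidate_anchors("outlet_plate", planes)])  # ['p2']
```

With rules like these, an outlet plate would only be offered wall planes and a lighting fixture only ceiling planes, which matches the requests described earlier.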

Speaking of anchoring, Figure 4 shows another common challenge: the bottom face of a digital object may not be modeled at the zero-height origin. When the object is placed in AR mode, it appears to float in the air, which compromises realism.

Figure 4. A flower vase model is floating above an anchoring plane.

With ARKit3, a developer can easily enable the automatic transformation to place a digital object more precisely against its anchoring plane, as shown in Figure 5.

Figure 5. ARKit3 automatically transforms the floating vase to align with the anchoring plane.
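The idea behind this kind of alignment can be sketched in a few lines of Python. This is only an illustration of the geometry, not a description of how ARKit3 implements it: it assumes a y-up coordinate system, an upward-facing horizontal plane, and a model given as a plain list of vertices. The function names are hypothetical.

```python
def offset_to_plane(vertices, plane_height=0.0):
    """Return the vertical shift that rests the model's lowest point on the plane.

    vertices is a list of (x, y, z) tuples with y pointing up.
    """
    lowest_y = min(y for _, y, _ in vertices)
    return plane_height - lowest_y


def snap_to_plane(vertices, plane_height=0.0):
    """Return a copy of the vertices shifted so the model sits on the plane."""
    dy = offset_to_plane(vertices, plane_height)
    return [(x, y + dy, z) for x, y, z in vertices]


# A vase whose bottom face was modeled 0.15 units above the origin.
vase = [(0.0, 0.15, 0.0), (0.1, 0.15, 0.1), (0.0, 0.45, 0.0)]
snapped = snap_to_plane(vase)
print(min(y for _, y, _ in snapped))  # 0.0
```

A wall or ceiling anchor would need the same shift applied along the plane's normal instead of the vertical axis.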

On the topic of realism, you may have run into the unnatural scenario shown in Figure 6. A digital coffee maker looks great on a physical table, until multiple objects come into the camera feed. The previous algorithm always rendered a digital model on the top layer in front of the feed. It could not tell that a person standing closer to the camera than the coffee maker should appear in front of it, as the natural depths of the objects would dictate.

Figure 6. A coffee maker is unnaturally placed in front of the camera feed, clipping a person closer to the camera.

With ARKit3, Apple provides a more immersive experience, such as that shown in Figure 7. The person in front clips the coffee maker, rather than the other way around. The person in the back also stays correctly behind the digital model, as if the model knows that its accurate depth in the field is between the two people.

Figure 7. A coffee maker is correctly placed behind a foreground person, and in front of a person further away.

You may ask, “How did Apple do that?” Well, the illustration in Figure 8 shows the video composition process, which Apple calls “People Occlusion”.

Figure 8. The “People Occlusion” composition process arranges the layers in the correct occlusion order by depth.

The first step is called Segmentation, at the bottom left in Figure 8. The machine learning neural network recognizes people in the camera feed and creates a separate pixel layer containing the people. However, simply putting one layer of people in front of the digital coffee maker doesn’t cut it, as shown in Figure 9.

Figure 9. An AR object is incorrectly occluded by a person in the back.

The person standing behind is incorrectly occluding the virtual object. So, the next step is to break down the people layer into multiple layers according to the depth estimations powered by machine learning. Next, insert the virtual object at the correct depth. The last step is to only place people who are closer to the camera in front of an AR object.
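The steps above amount to a per-pixel depth comparison, which can be sketched in Python as follows. This is a simplified illustration rather than Apple's implementation: images are small nested lists, the person mask and depth values are assumed to be supplied by the segmentation and depth estimation steps, and all function and variable names are hypothetical.

```python
def composite_pixel(camera, virtual, virtual_depth, is_person, person_depth):
    """Choose what to show at one pixel.

    virtual is None where the virtual object does not cover the pixel.
    Depths are distances from the camera, so smaller means closer.
    """
    if virtual is None:
        return camera  # nothing virtual here, show the camera feed
    if is_person and person_depth < virtual_depth:
        return camera  # the person is closer, so the person occludes the object
    return virtual     # the object is in front of the background or a farther person


def composite(camera, virtual, virtual_depth, person_mask, person_depth):
    """Apply composite_pixel to every pixel of equally sized 2D lists."""
    return [
        [composite_pixel(c, v, vd, m, pd) for c, v, vd, m, pd in zip(*rows)]
        for rows in zip(camera, virtual, virtual_depth, person_mask, person_depth)
    ]


# One row of four pixels: background, a near person, a far person, background.
camera = [["bg", "near", "far", "bg"]]
virtual = [[None, "obj", "obj", "obj"]]
virtual_depth = [[None, 2.0, 2.0, 2.0]]
person_mask = [[False, True, True, False]]
person_depth = [[None, 1.0, 3.0, None]]

print(composite(camera, virtual, virtual_depth, person_mask, person_depth))
# [['bg', 'near', 'obj', 'obj']]
```

The near person at depth 1.0 stays in front of the object at depth 2.0, while the far person at depth 3.0 is covered by it, which is the behavior shown in Figure 7.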

It is even more impressive to watch a person walking around an AR object and see how “People Occlusion” works dynamically.

To put it simply, it’s similar to adjusting the display order of objects on a Microsoft PowerPoint slide, except that all the steps have to be done in real time without manual intervention. That is partly why ARKit3 is only available in iOS 13, and why “People Occlusion” requires the Apple Neural Engine in the A12 chip or later, found in the latest devices such as the iPhone XS and XR.

“People Occlusion” enables virtual content to be rendered behind people, and it can handle multiple people dynamically. The neural engine also recognizes people who are only partially visible, such as the person in the back or even merely the palm of someone’s hand.

It’s worth noting that the people layer does not have to be broken down all the time. A developer may choose to keep all the people together in a single layer to enable green screen-style effects. For example, Figure 10 shows a breathtaking scene from the movie “The Hobbit.”

Figure 10. A breathtaking scene from the movie “The Hobbit.”

As a matter of fact, the background of the beautiful distant mountains and waterfalls was inserted into the green panel areas in a studio as shown in Figure 11.

Figure 11. Green screens in a movie studio.

The power of “People Occlusion” can help extract actors and actresses much more easily, and allow all kinds of background images or videos to be used to achieve the desired visual effects. With all the visual magic and inspiring music in the movie, it is easy to simply enjoy the scene in Figure 10 and forget about its humble beginning in Figure 11.
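A green screen-style effect built on a people layer can be sketched in Python like this. It is a toy illustration under simple assumptions: images are nested lists of labels rather than real pixels, the person mask is assumed to come from the segmentation step, and `replace_background` is a hypothetical helper, not an ARKit function.

```python
def replace_background(camera, person_mask, background):
    """Keep the camera pixel where a person was detected, else use the new background."""
    return [
        [c if m else b for c, m, b in zip(camera_row, mask_row, background_row)]
        for camera_row, mask_row, background_row in zip(camera, person_mask, background)
    ]


# One row of three pixels: studio wall, an actor, studio wall.
camera = [["wall", "actor", "wall"]]
person_mask = [[False, True, False]]
background = [["mountain", "waterfall", "mountain"]]

print(replace_background(camera, person_mask, background))
# [['mountain', 'actor', 'mountain']]
```

Unlike a physical green screen, this approach keys on who is in the frame rather than on a backdrop color, so no depth comparison is needed.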

It is worth noting that “People Occlusion” is only the first step in solving the camera feed depth problem. For the wide variety of physical objects other than people, such as a piano, a cabinet or a door, the recognition has to be more accommodating and versatile as well. Furthermore, occlusions based on depth estimations and depth relationships can get more complicated. I look forward to Apple’s future advancements.

To learn more about how SOLIDWORKS Sell Web AR can help promote your ideas and products, please visit its product page. The best way to learn is to play with the live examples featured on a demo site, including actual client webpages. Have fun and leave your thoughts below.

About the Author

Oboe Wu is a product management professional with 20 years of experience in engineering and software. He is an advocate of 3D technologies and practical applications.
