Read This Before Building Your First Zoom App

Helping developers focus on delivering value, while architecting a better Zoom application.

Published in Prezi Engineering · 9 min read · Oct 9, 2023

I take it that if you’re reading this, you’ve decided to build a Zoom app, or are at the very least toying with the idea. That’s great news for Zoom users like myself, who rely on the extra functionality these apps bring to our daily experience in and out of meetings. As a software engineer, though, you’ll first want to understand what architectural and coding decisions you need to make to build the best possible app.

Over the last year, we have built and released a Zoom app and added countless features, all of which taught us things we wish we had known upfront. In this article, I’ll share everything the team and I have bumped into along the way and, where appropriate, provide code examples.

A quick primer on the Layers API

The best way to visualise the Layers API is to compare it to Photoshop. I know. It’s an unexpected comparison, but it makes a lot of sense. In Photoshop, everything is a layer. The same applies to Zoom’s Layers API. You have the background layer, then the camera layer, and finally a web layer at the top. This web layer is what makes the Layers API really cool, and it is where the JS SDK runs.

And now onto the meaty part of this article. Prepare for some aha moments and interesting snippets of code. For context, we used React with hooks, a bit of Context API here and there, no state management libraries (Redux, yuck), and Radix components for most of the UI. For tests, we went with Cypress, but more on that in a section of its own.

External integrations

It’s nigh on impossible these days to build a web app and not rely on some sort of external integration, even if that’s something as simple as pulling a library from a CDN that isn’t on your domain.

Examples of whitelisted URLs in the Zoom app settings. Screenshot by author.

While in the Marketplace app settings panel under App Credentials > Domain Allow List you’ll find the option to add the required domains, you’ll quickly find that not every domain is treated the same way. Here are a few examples:

google.com—straight up not allowed because it’s too broad. This one is important as it can affect your ability to easily integrate Login with Google, use YouTube in the app, etc.
*.giphy.com—we found that the wildcard may or may not work, so we ended up adding all the 5 Giphy subdomains to the allow list like media0.giphy.com, media1.giphy.com, etc.
img.icons8.com—this one isn’t disallowed, but it requires extra information as to why the app needs it.

Unfortunately, you won’t always know what domains you will need to allow, so planning ahead may not be possible, but if it is, I would recommend adding all the required domains and subdomains to the Domain Allow List as soon as you can, as adding new ones also requires an app review by Zoom.
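
One way to plan ahead is to audit the URLs your app loads and derive the exact hostnames from them, since wildcards can't be relied on. Here is a minimal sketch, assuming you can collect those URLs yourself. The collectHostnames helper and the example URLs are hypothetical and not part of any Zoom tooling.

```javascript
// Hypothetical helper: derive the unique hostnames to add to the
// Domain Allow List from the URLs your app is known to load.
function collectHostnames(urls) {
    const hostnames = new Set();
    for (const url of urls) {
        // URL is a standard global in browsers and Node.js
        hostnames.add(new URL(url).hostname);
    }
    return [...hostnames].sort();
}

// Example URLs, for illustration only
const hostnames = collectHostnames([
    'https://media0.giphy.com/media/abc/giphy.gif',
    'https://media1.giphy.com/media/def/giphy.gif',
    'https://img.icons8.com/color/48/zoom.png',
    'https://media0.giphy.com/media/ghi/giphy.gif',
]);
console.log(hostnames);
```

Running something like this against your asset list early gives you one batch of domains to submit, rather than a trickle of review requests.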

Multiple contexts, two browsers with limitations

Anyone who isn’t familiar with CEF (Chromium Embedded Framework) should start reading up on it right away. The camera view uses CEF on all operating systems, and it has a number of otherwise very useful APIs disabled, such as navigator.mediaDevices and all of its methods. The same is true for the inMeeting and inClient contexts, which on a Mac run in the native browser, Safari. This means that getting extra control over audio, video or screen sharing isn’t possible. You also cannot show popups or request permissions to access various APIs like you would in the browser. In the camera view, even a simple alert('Y U No work?') won’t work.
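
Because these APIs can be missing, it is safer to feature-detect than to assume. The sketch below shows one defensive check. canUseMediaDevices is a hypothetical helper name, and the check only relies on standard browser properties.

```javascript
// Returns true only when the current webview exposes getUserMedia.
// `nav` defaults to the global navigator when one exists, and can be
// replaced with a fake object in tests.
function canUseMediaDevices(nav = globalThis.navigator) {
    return Boolean(
        nav &&
        nav.mediaDevices &&
        typeof nav.mediaDevices.getUserMedia === 'function'
    );
}

// A webview with the API disabled simply reports false
console.log(canUseMediaDevices({}));
```

Guarding UI such as a device picker behind a check like this avoids runtime errors in the camera view.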

In terms of working with the various contexts, you will want to keep the context change in a state variable, so you can watch it in your useEffect. A mature app will end up looking something like this:

useEffect(() => {
    // An effect must return nothing or a cleanup function, so wrap the async work
    (async () => {
        // initiate zoom SDK, determine contexts, and do other stuff
    })();
}, [appContextChanged]);

if (component === 'Loading') {
    // Only display spinner in sidebar and not camera webview
    return !navigator.userAgentData || navigator.userAgentData.platform === '' ? (
        <SpinnerWrapper>
            <Spinner size={40} width={4} marginTop="0px" />
        </SpinnerWrapper>
    ) : null; // render nothing in the camera webview
} else if (component === 'camera') {
    return <CameraView zoomSdk={zoomSdk} />;
} else if (component === 'sidepanel') {
    return <MeetingView zoomSdk={zoomSdk} initialCameraState={initialCameraState} />;
} else if (component === 'sandbox') {
    return <Sandbox />;
} else if (component === 'preMeeting') {
    return <PreMeeting appContextChanged={appContextChanged} zoomSdk={zoomSdk} />;
} else if (component === 'unsupported') {
    return <UnsupportedView />;
}

Watching appContextChanged is important, as we found that users often open the app from the main client rather than the meeting, which means our UI had to update to accommodate that change. Of course, if you don’t have a different UI for the two contexts, you don’t have to watch it, but I’d argue that you’re then not taking advantage of the full capabilities of the Zoom SDK. Many APIs are not available in the main client, which is itself something to keep in mind.
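
The step from running context to component can be kept in one small, testable function. This is a sketch rather than our production code. The context strings ('inCamera', 'inMeeting', 'inMainClient') and the mapping itself are assumptions to check against the SDK version you target.

```javascript
// Assumed running-context strings mapped to the component names used above;
// verify the strings against the SDK version you target.
const COMPONENT_BY_CONTEXT = new Map([
    ['inCamera', 'camera'],
    ['inMeeting', 'sidepanel'],
    ['inMainClient', 'preMeeting'],
]);

function componentForContext(context) {
    // Anything unrecognised, including undefined, falls back to 'unsupported'
    return COMPONENT_BY_CONTEXT.get(context) ?? 'unsupported';
}

console.log(componentForContext('inMeeting'));
```

Keeping the mapping in one place means a newly supported context is a one-line change.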

Inter-context communication

Speaking of context, the Zoom SDK offers a handy postMessage method and, paired with the onMessage event, you essentially have a communication channel between the camera and the side panel, aka between CEF and Safari on a Mac for instance. That’s handy for countless things. You can send data, state changes, etc.

It does, however, come with one important limitation: the payload cannot exceed 512 KB, and that’s not a lot. In our case, that meant we couldn’t send blobs, Base64 strings or data URI images across contexts. It also meant we had to be really efficient with the data we were sending. At one point, our app was sending all slide data to the camera. Imagine the data for 100 slides, especially when it includes rich text as HTML strings. That quickly adds up, as we found out once we added the import feature for PDFs and PPTs, which generates a slide on the fly for each page. It quickly turned into a bottleneck that we had to work around.
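
A cheap safeguard is to measure a payload before sending it, and to send only the slide the camera needs. The sketch below assumes the limit is counted as 512 * 1024 bytes of serialised JSON, which I haven’t verified. The helper names and the 'showSlide' message shape are hypothetical.

```javascript
// Assumption: the 512 KB limit is counted as 512 * 1024 bytes of serialised JSON.
const MAX_PAYLOAD_BYTES = 512 * 1024;

function payloadSizeInBytes(payload) {
    // TextEncoder yields UTF-8 bytes, so multi-byte characters are counted correctly
    return new TextEncoder().encode(JSON.stringify(payload)).length;
}

function fitsInPostMessage(payload, limit = MAX_PAYLOAD_BYTES) {
    return payloadSizeInBytes(payload) <= limit;
}

// Send only what the camera needs right now instead of the whole deck
function currentSlideMessage(slides, index) {
    return { type: 'showSlide', slide: slides[index] };
}

const slides = [{ html: '<p>Intro</p>' }, { html: '<p>Agenda</p>' }];
console.log(fitsInPostMessage(currentSlideMessage(slides, 1)));
```

If a single slide still doesn’t fit, that is a signal to pass a reference, such as an ID or URL, and let the camera fetch the content itself.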

Debugging

There is one unexpected positive use though for postMessage and onMessage. You can use these for debugging. Given the fact that the CEF view cannot be inspected in the browser as it runs in the camera, we had to get creative and send all error messages from the camera to the meeting view. Maybe not ideal, but it certainly works and saved us countless hours of debugging.

It can be something as simple as telling the side-panel view that the camera is ready:

zoomSdk.postMessage({
    message: 'camera ready',
});

Or handling unhandled rejections like so:

window.onunhandledrejection = event => {
    zoomSdk.postMessage({
        message: `UNHANDLED PROMISE REJECTION: ${event.reason}`,
    });
};
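
On the receiving end, the side panel only needs to turn those messages into something readable in its own console. A minimal sketch follows, assuming the onMessage event wraps the sent object in a payload property. describeCameraMessage is a hypothetical helper.

```javascript
// Assumption: the onMessage event wraps the sent object in `payload`.
// The function also tolerates receiving the raw object directly.
function describeCameraMessage(event) {
    const payload = event && event.payload !== undefined ? event.payload : event;
    const text = payload && typeof payload.message === 'string'
        ? payload.message
        : JSON.stringify(payload);
    return `[camera] ${text}`;
}

// Wire-up in the side panel, assuming `zoomSdk` is configured and
// onMessage accepts a handler:
// zoomSdk.onMessage(event => console.log(describeCameraMessage(event)));

console.log(describeCameraMessage({ payload: { message: 'camera ready' } }));
```

The prefix makes camera output easy to filter in the side panel’s console.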

Local storage isn’t quite…

We all love local storage, don’t we? It’s very useful to keep track of certain app states, information, etc. As long as you don’t delete the browser or deliberately clear local storage, the data is reliably accessible. That’s not quite the case when building Zoom apps.

Something we found out the hard way — but in hindsight it makes all the sense in the world — is that, once the user logs out of Zoom or switches to another user, that local storage is cleared, or to be more accurate, you’re getting a new instance of the embedded browser, so all that data is gone. Logging back in with the previous user also doesn’t bring back the data.

What this all means for your app is that you’ll have no choice but to save some information in a database. That can be something as simple and easy to spin up as Firebase, but it is an extra lift, and you need to account for it. On that note, yes, Firebase is fully supported by Zoom apps.
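
One way to structure this is to treat local storage as a cache in front of the database. The sketch below is only an illustration: local and remote are hypothetical stores exposing async get and set, where remote could wrap Firebase or any other database, and keying by userId is my assumption.

```javascript
// `local` and `remote` are hypothetical stores exposing async get/set;
// `remote` could wrap Firebase or any other database.
async function loadSettings({ local, remote, userId }) {
    const key = `settings:${userId}`;
    const cached = await local.get(key);
    if (cached !== undefined) {
        return cached;
    }
    // Cache miss, e.g. after a Zoom logout created a fresh embedded browser
    const stored = await remote.get(key);
    if (stored !== undefined) {
        await local.set(key, stored);
    }
    return stored;
}

async function saveSettings({ local, remote, userId }, settings) {
    const key = `settings:${userId}`;
    // Write to the database first: it is the only copy that survives a logout
    await remote.set(key, settings);
    await local.set(key, settings);
}

// In-memory stand-in for either store, handy for tests
function createMemoryStore() {
    const data = new Map();
    return {
        get: async key => data.get(key),
        set: async (key, value) => { data.set(key, value); },
    };
}
```

Keying by user also prevents one Zoom user from reading another’s cached data on a shared machine.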

App state

Speaking of state and data, while we didn’t use any state management libraries in our app, apart from React’s own context API, we did make another pragmatic decision around where to handle state. The constraints were:

local storage isn’t really persistent
there are two separate browsers that can only communicate via postMessage
and we cannot really inspect the camera view

We decided that all the app state variables should stay in the side-panel view. This decision effectively rendered the camera view a passive component that only reacts to messages it receives from the inMeeting context (side-panel).
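
In practice, a passive camera view boils down to a pure function that applies incoming messages to whatever it renders. The message types below ('showSlide', 'setVisible') are hypothetical and only illustrate the pattern.

```javascript
// Hypothetical message types; the side panel is the only place that creates them.
function applyMessage(view, message) {
    switch (message.type) {
        case 'showSlide':
            return { ...view, slide: message.slide };
        case 'setVisible':
            return { ...view, visible: message.visible };
        default:
            // Unknown messages leave the view untouched
            return view;
    }
}

let view = { slide: null, visible: true };
view = applyMessage(view, { type: 'showSlide', slide: { html: '<p>Hi</p>' } });
console.log(view.slide.html);
```

Because the camera holds no state of its own beyond what it was last told, you can recover from a glitch by having the side panel resend the latest message.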

Sidebar size

The inMeeting context is basically the side-panel or sidebar view. By default, its width is practically that of a mobile viewport. That’s great for many things, but every so often you might find yourself wishing for more real estate, which Zoom gives you through expandApp. In fact, you can even toggle programmatically between collapsed and expanded modes, like so:

await zoomSdk.expandApp({
    action: 'expand', // or 'collapse'
});

There is but one twist to this, though. It’s either or. There is no in-between width or a set percentage / pixel width option. This means that your designs have to be clever enough to work around that limitation. We opted to stick to the collapse mode, as the expand mode would have been far too wide.
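
If you do use both modes, a small wrapper keeps the toggle logic in one place. In this sketch, toggleSidebar is a hypothetical helper, and the SDK instance is passed in so the function can be tested with a fake.

```javascript
// `sdk` is the configured zoomSdk instance; passing it in keeps this testable.
async function toggleSidebar(sdk, isExpanded) {
    const action = isExpanded ? 'collapse' : 'expand';
    await sdk.expandApp({ action });
    // Return the new state so the caller can store it
    return !isExpanded;
}

// Usage inside an async handler, assuming `expanded` is tracked in state:
// const next = await toggleSidebar(zoomSdk, expanded);
```

The returned flag is what you would keep in a state variable to decide which layout to render.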

Mobile

Within the app settings under Features, you will also find a toggle for mobile. You will likely — but wrongly — assume that turning it on is all that’s required to have your app running on phones and tablets. Besides the additional information we had to provide for Apple devices, we also quickly found out that the Layers API is only partially supported on iOS, iPadOS, and Android. Given how many people use these devices nowadays, you need to account for this in your design, development, and app distribution strategy. We opted to stick to just desktop.

Testing

I briefly alluded to testing, but it deserves a couple more paragraphs. Honestly, this will probably be the biggest bottleneck you’ll have to account for, and there are several reasons for that:

Your local browser sandbox is not a 1:1 replica of the app. It can’t be, as the browser can’t run Zoom’s APIs.
Your Zoom environment has its own limitations due to the two separate browsers and multiple running contexts, many of which I have already mentioned.
Testing in the client requires either every developer to have a working local setup with ngrok, with multiple development copies of the app set up in exactly the same way, or a CI that generates builds on every push. The downside of the first approach is that it simply isn’t scalable. Someone on the team will always have some tiny difference in their app settings in the Zoom Marketplace that makes things fall over. The latter works, but it’s slow, as you have to wait for every build. In our case that took about 5 minutes per build, and when you do that 10–20 times a day, it adds up.
Finally, automated testing can’t be done well, or at all, on the native instance, as the flow includes logging into an account, adding an app, running it, then removing the app. Just logging in requires verification via a code sent by email, so we found that certain tests had to stay manual and be run every day to avoid surprises.
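
To make the browser sandbox more useful, you can stand in for the SDK with a small fake. The sketch below is a hypothetical stub that covers only the methods mentioned in this article. It is not a replica of the real SDK, and the { payload } event shape is an assumption.

```javascript
// Hypothetical stand-in for zoomSdk, covering only the methods used in this article.
function createMockZoomSdk() {
    const listeners = [];
    const calls = [];
    return {
        calls,
        async postMessage(payload) {
            calls.push({ method: 'postMessage', args: payload });
            // Loop the message back, mimicking the other webview receiving it
            listeners.forEach(listener => listener({ payload }));
        },
        onMessage(listener) {
            listeners.push(listener);
        },
        async expandApp(options) {
            calls.push({ method: 'expandApp', args: options });
        },
    };
}

const sdk = createMockZoomSdk();
sdk.onMessage(event => console.log('received:', event.payload.message));
sdk.postMessage({ message: 'camera ready' });
```

A stub like this lets component tests run in a plain browser or in Cypress, while the flows that need the real client stay on the manual checklist.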

It’s safe to say that developers on our team spent about 10–20% of their time dealing with testing complexities.

App reviews

While in our experience, app reviews by Zoom were always swift — up to 48 hours — you will want to strategise what changes you make and when in the app settings on the Zoom Marketplace. All changes require a manual review by Zoom, so perhaps try to add as many APIs, events and whitelisted URLs as you can upfront. Work closely with your product manager and UX designer, and add everything you need before the story even gets into your backlog to save time while in development.

So, which aspect of building a Zoom app surprised you the most? For us, building Prezi Video for Zoom was a very rewarding experience. We had to learn a lot, and do it quickly, while delivering features that half the time brought about challenges of their own. Had someone told us everything you’ve just read upfront, we would probably have achieved even more. But now, armed with all this knowledge, you can architect and design your new Zoom app idea with fewer unknown unknowns, focus on the fun stuff and make virtual meetings better for everyone.

Attila Vago: Software Engineer improving the world one line of code at a time. Cool nerd since forever, writer of codes and blogs. Web accessibility advocate, LEGO fan, vinyl record collector. Loves craft beer!