Viggle AI: Revolutionising Digital Creativity with Controllable Video Generation
How a 15-Member Startup is Empowering Millions to Turn Imagination into Reality with AI-Driven Visual Content
Introduction
Viggle AI, the startup behind the eponymous video generation service, recently announced the completion of a $19 million early-stage funding round led by Andreessen Horowitz.
The company’s founder and CEO, Chu Hang, has previously worked at Google, NVIDIA, Facebook, and Autodesk.
You might not be familiar with Viggle AI, but you’ve probably seen its viral memes online. These include the widely circulated meme where rapper Lil Yachty is jumping at a summer music festival, but his face is replaced with Joaquin Phoenix’s Joker from the movie “Joker.”

Jesus seems to be cheering on the crowd:

Viggle started in a Discord community and currently has over 4.3 million users. In March, the team launched a standalone app. They also developed their own JST-1 model, which produces more realistic character movements and expressions.
In today’s article, we will explore how Viggle, a small team founded by Chinese entrepreneurs, achieved viral user growth, and share insights from the founder’s interview with a16z.
How a 15-member Team Created a Global Community with Viral Growth
In March 2024, Viggle launched its beta testing phase, attracting over 4 million users to its Discord community, where they gathered to create and play with derivative works.
Currently, Viggle’s user base consists of two main groups:
- Social Media Enthusiasts: These users want to create entertaining memes and seek social engagement. Viggle’s highly appealing special effects inspire these users to experiment and then share their creations through social channels, driving viral growth. This is one of the best ways to increase product visibility.
- Professional Creators: These users utilise Viggle to design games and create visual effects. For example, animation engineers can quickly turn concepts into rough animations, visualising ideas and feelings, significantly reducing design draft time, streamlining workflows, and boosting efficiency.
Viggle’s team members are passionate and highly skilled. Founder Chu Hang has held AI researcher positions at leading global tech companies such as Autodesk, Facebook, Nvidia, and Google. He earned a bachelor’s degree in Information Engineering from Shanghai Jiao Tong University, pursued a master’s degree in Electrical and Computer Engineering at Cornell University, and conducted research at the Advanced Multimedia Processing Lab. In 2016, Chu Hang joined the University of Toronto to pursue a Ph.D. in Computer Science, focusing on machine learning research.
Over the past eight months, Nan Ha has served as the head of product growth at Viggle. She is an expert in SEO, content marketing, and affiliate marketing partnerships, having graduated from the joint USC and LSE master’s program in Global Communication and Media. Under her leadership, Viggle’s Discord community rapidly expanded from 500 members to over 4 million, making it the second-largest community on Discord.
By operating on the Discord platform, Viggle quickly expanded its user base and leveraged Discord’s content moderation and community management tools to handle its massive user group. Discord’s immediacy and interactivity heavily supported Viggle’s viral spread and user growth.
For startups like Viggle and Midjourney, operating on Discord means they don’t need to build a separate user platform. Instead, they can tap into Discord’s tech-savvy user base and built-in content moderation tools. For Viggle, with only 15 employees, this support is crucial.
Ben Shanken, Discord’s VP of Product, commented, “No one is prepared for such growth. We chose to partner with them during their wide dissemination phase because they are still a startup. In fact, there’s a lot of generative AI content on Discord.”
Viggle has also achieved widespread reach on TikTok.
Under the hashtag #Viggle, there are over 40,000 videos, and under #viggleai, there are more than 33,000 videos, showing strong user engagement.
For instance, TikTok influencer Geirill, with just 6,030 followers, has received 416,000 likes in total, 314,000 of which came from Viggle AI videos alone. Promotions by key opinion leaders (KOLs) and influencers have driven significant traffic to Viggle, and users enjoy creating videos with it, further amplifying the product’s viral spread.

Viggle’s low barrier to entry allows both ordinary users and amateur creators to easily get started. This broad user base and convenient creation experience are among the key reasons for Viggle’s explosive growth within the Discord community and on TikTok.
Memes Are Just a Small Part of What Users Create
These many meme variations are all created by users, but the underlying tools and templates are provided by Viggle AI.
“Our model is fundamentally different from traditional video generators. Existing video generators are primarily pixel-based and do not understand physical structures. We enable our model to understand these, which significantly improves controllability and generation efficiency,” said CEO Chu Hang.
For example, to create a video of a character like the Joker singing and dancing, you only need to upload a video containing the singing and dancing actions, along with an image of the character.

Alternatively, users can upload a character image and add a text prompt directly.

It’s also possible to create animated characters entirely using text prompts.

Additionally, you can stylize real photos and add animations to them.
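The creation modes above can be sketched as a small request builder. To be clear, the endpoint shape, field names, and mode labels below are illustrative assumptions, not Viggle's documented API:

```python
# Hypothetical sketch of how a client might assemble a Viggle-style
# generation request. The field names and modes are assumptions for
# illustration only -- they are not Viggle's actual API.

def build_request(character_image=None, motion_video=None, text_prompt=None):
    """Assemble a generation request from the supported input combinations."""
    if motion_video and character_image:
        mode = "video_plus_image"   # mimic the motion in an uploaded video
    elif character_image and text_prompt:
        mode = "image_plus_prompt"  # animate an uploaded character via a prompt
    elif text_prompt:
        mode = "text_only"          # create character and motion from text alone
    else:
        raise ValueError("need a motion video + image, an image + prompt, or a prompt")
    return {
        "mode": mode,
        "character_image": character_image,
        "motion_video": motion_video,
        "text_prompt": text_prompt,
    }

req = build_request(character_image="joker.png", motion_video="dance.mp4")
print(req["mode"])  # video_plus_image
```

The point of the sketch is simply that every mode the article lists reduces to a different combination of the same three inputs.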

Creating memes is only a small part of Viggle users’ needs. Currently, the videos it generates are far from perfect: characters often twitch incessantly and show no facial expressions. However, Viggle has already become a favoured visualisation tool for creative professionals. For filmmakers, animators, and video game designers, Viggle allows them to directly transform their ideas into visual effects.
One source of training material for Viggle’s AI model is YouTube videos. In media interviews, the CEO revealed that the company has so far relied on public data, including YouTube videos, a statement reminiscent of OpenAI CTO Mira Murati’s remarks about Sora’s training data.
This could become an issue. In April of this year, YouTube’s CEO stated that using YouTube videos to train text-to-video AI “clearly violates” their terms of service. This is true for Sora and might also be the case for Viggle.
Subsequently, a Viggle spokesperson clarified to the media: “Viggle utilises various public resources, including YouTube, to generate AI content. Our training data is meticulously curated and refined to ensure the entire process complies with all terms of service. We prioritise maintaining good relationships with platforms like YouTube and are committed to adhering to their terms, avoiding mass downloads and any other actions involving unauthorised video downloads.”
This still seems to contradict YouTube’s comments from April. Reportedly, many AI model developers, including OpenAI, Nvidia, Apple, and Anthropic, have used YouTube video transcriptions or clips to train their models.
This might be one of Silicon Valley’s not-so-secret secrets: everyone might be doing it, but few are willing to say it out loud.
Highly Controllable and Timely Video Generation
Traditionally, video and 3D generation have been seen as two separate challenges. Viggle has successfully addressed two key issues in generative video technology — high latency and low controllability — by adopting an innovative joint solution.
Controllability
Compared to other purely generative AI products like Runway and Sora, Viggle offers higher controllability and predictability. When using Runway, users generate videos by inputting a prompt, but they cannot predict the final result and often need multiple attempts to achieve the desired effect, lacking control over the generation process.
Viggle, by contrast, allows users to upload existing videos and images, clearly indicating their expectations for the final generated video. By learning from the templates and actions in the uploaded videos, Viggle quickly and accurately produces the video content users envision, addressing the controllability issues common to other AI video generation tools. This makes Viggle a better choice for video professionals and AI creators, especially in scenarios where quality and realistic physical effects are crucial.
Low Latency
Viggle’s unique JST-1 technology unifies video and 3D generation within a single foundational model, significantly reducing video generation latency. This unified model can effectively utilise 3D spatial information and temporal dynamics, minimising the redundancy and delay associated with traditional methods that handle these processes separately. Consequently, users no longer need to wait minutes or hours to obtain a few seconds of video.
JST-1 serves as the driving force, enabling the video-3D foundational model to analyse actions and poses from existing 2D video materials and construct 3D models. This process not only involves shape and appearance transformation but also includes a physical understanding of character movements and environmental interactions.
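The pipeline described above — estimating poses from 2D frames, lifting them to 3D, and retargeting them onto a new character — can be illustrated with a toy sketch. Every function here is a stand-in for a learned component; none of this is JST-1 itself:

```python
# Conceptual sketch of the motion-transfer pipeline the article describes:
# estimate a pose from each 2D frame, lift it to 3D, then retarget it onto
# a new character. All functions are toy stand-ins, not JST-1's actual model.

def estimate_pose_2d(frame):
    # Stand-in: pretend each frame already carries 2D joint positions.
    return frame["joints_2d"]

def lift_to_3d(joints_2d, depth=1.0):
    # Stand-in for learned 2D-to-3D lifting: append an assumed depth.
    return [(x, y, depth) for (x, y) in joints_2d]

def retarget(joints_3d, scale):
    # Apply the driving motion to a character with a different skeleton scale.
    return [(x * scale, y * scale, z * scale) for (x, y, z) in joints_3d]

def transfer_motion(video_frames, character_scale):
    """Turn a driving video into a 3D motion sequence for a new character."""
    return [
        retarget(lift_to_3d(estimate_pose_2d(f)), character_scale)
        for f in video_frames
    ]

frames = [{"joints_2d": [(0.0, 1.0), (0.5, 0.5)]}]
motion = transfer_motion(frames, character_scale=2.0)
print(motion[0][0])  # (0.0, 2.0, 2.0)
```

The design point the article makes is that working in this explicit 3D space, rather than pixel space, is what gives the model its physical understanding of movement.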
Exploring Cross-Domain Applications and Enhancing Animation Production Efficiency
Viggle also plans to continue improving its technology and expanding its functionalities. Chu stated, “We focus on building the backend service model while leveraging Discord’s frontend infrastructure. This approach allows us to iterate faster and concentrate on developing the most advanced AI systems.”
Additionally, the company is exploring multiple application scenarios beyond entertainment, such as in game design and visual effects. With Viggle, animation teams can quickly generate preliminary animation assets from concept designs, saving time and effort. This has the potential to revolutionise animation production, making it more efficient and user-friendly.

Founder: Beyond Pixels, Focus on Precision in AI Generation Process

Chu Hang says Viggle is built around a focus on “controllable video generation.”
Unlike existing text-to-video conversion tools, Viggle is designed to provide precise control, allowing actions and characters to be specified accurately. The project was initially intended for filmmakers and game developers, but its ease of use and practical applications have led to broader use, including meme creation.
The challenge with existing tools was their steep learning curve, so Viggle has been developed to be simple and user-friendly, enabling users to start without complex learning processes. By uploading an image to define the character and either text or a video to specify the action, Viggle generates footage of the character performing the action. The tool has gained popularity not only in its intended applications but also in creative and entertainment contexts, such as meme creation.
Millions of characters have been generated, mimicking various templates, like a clown taking the stage. The convenience and diversity offered by Viggle have led to its adaptation for various use cases. Content creators have shown interest in showcasing their work on Viggle, with collaboration opportunities for promotion being explored.
Viggle is not limited to video uploads for content creation. Various templates, including dance moves and sports event scenes, are available. Even with just a frontal image, the model attempts to generate a 360-degree full-body view. The creator community has contributed many excellent templates, and diversity in content creation is driven by these creative ideas.
The project prioritises supporting creative communities by providing them with the best tools and early access to new features, ensuring they have the strongest support. The application of character modelling to real-life scenarios, objects, and environments is a significant focus. Two main paths to real-world modelling are considered: a pixel-level approach, where diffusion and Transformer-based models have been effective, and a path focused on precise controllability, akin to a graphics engine. Viggle has chosen the latter.
Viggle introduces a new content consumption experience, allowing deeper interaction with moments of interest. Users can insert their virtual selves into scenes, experiencing them as if in a parallel universe. This approach not only enhances entertainment value but also offers a personalised experience. The humorous aspects of content creation are taken seriously, with rigorous research being conducted to ensure the technology performs well and delivers real entertainment value.
Conclusion
In conclusion, Viggle AI is not just pushing the boundaries of video generation technology; it’s reshaping how we interact with digital content. By combining high controllability with low latency, Viggle enables creators — from casual meme enthusiasts to professional animators — to bring their ideas to life in ways previously unimaginable. The startup’s rapid growth, fueled by its community-driven approach on platforms like Discord and TikTok, underscores the growing demand for accessible, yet powerful, creative tools. As Viggle continues to refine its technology and explore new applications, it stands at the forefront of a new era in content creation, where the lines between professional and amateur, reality and imagination, continue to blur. The future of digital creativity is here, and Viggle is leading the charge, turning pixels into possibilities.
References
- Zeff, M. (2024). Viggle makes controllable AI characters for memes and visualizing ideas | TechCrunch. https://techcrunch.com/2024/08/26/viggle-makes-controllable-ai-characters-for-memes-and-visualizing-ideas/.
- Maxwell (2021). https://mp.weixin.qq.com/s/kHNmPYHNhhw5WngvzRcpgg.
- a16z (2024). Viggle: When Memes Become 3D. https://www.youtube.com/watch?v=IebslwhPzFo.
- Z Potentials (2021). Z Product | Discord. https://mp.weixin.qq.com/s/VGEcjWQ6pNPB1qO8eANtaA.