Tara Meyer
February 17, 2026
A Guide To ComfyUI Workflows, Open-source AI Tools, And Benefits to the Creative Pipeline
There’s a quiet revolution happening in mobile game studios, and it’s starting in China. Teams there are scaling user acquisition (UA) 10x without additional headcount by leveraging open-source AI tools, testing hundreds of ad creatives weekly instead of monthly.
But adoption differs sharply across the globe: Chinese developers have been refining open-source AI workflows, while Western studios are still debating which subscription service to buy.
Jakub from Two & a Half Gamers predicts that, “by the end of 2026, there will be around 50% of all UA creatives either having AI hooks or completely done by AI.” He brings over a decade of mobile gaming experience, specializing in system design, monetization, and scaling user acquisition for studios worldwide. For the last three years, he’s worked as an independent consultant, advising everyone from indie studios to major publishers on creative workflow optimization.
“I work with multiple gaming studios around the world, or even non-gaming people these days, because based on the whole Duolingo—you know, apps pretty much taking over the App Store—they’re looking for our know-how.”
What sets Jakub and his team apart? He’s implementing ComfyUI workflows daily for real clients with real budgets. And the studios investing in ComfyUI workflows and similar creative automation tools are building competitive moats that subscription-based tools can’t replicate.
Jakub sits down with Roman, Marketing Director at Tenjin, to share a guide to growing your UA and creative output with open-source AI tools. It’s made for UA managers drowning in requests and testing backlogs, studio founders who want to scale without linear cost increases, creative directors tired of repetitive work and burnout, and small indie and solo developers who want professional-grade creatives without the expense.
This episode of Tenjin ROI 101 is for anyone who wants practical tools to grow their mobile app.
What You’ll Learn
- Why Open-Source AI Workflows Beat Black-Box Tools
- Tools to Grow: ComfyUI Hardware & Software Requirements
- Creative Automation Tools Comparison: ComfyUI vs Alternatives
- ComfyUI Tutorial: Image-to-Video Workflow
- How Creative Automation Tools Benefit Teams
- Velocity and Volume Determine Mobile Game Success
- From Creative Generation to Performance Metrics
Why Open-Source AI Workflows Beat Black-Box Tools
Before diving into our ComfyUI workflow tutorial and technical specifications, you need to understand the fundamental difference between the Western and Chinese approaches to AI tools. Western AI tools typically charge a monthly subscription; many Chinese open-source AI tools cost nothing after the initial setup.
Western “Black-Box” Approach
Examples include OpenAI, Anthropic, and Midjourney.
- Easy to start with minimal learning curve
- Closed source, subscription-dependent
- “Prompt in, result out” with minimal control
“The Western approach of blackbox AI tools, which are, again, completely closed—you can only do positive prompts, negative prompts, then some very small customization to it.”
According to Jakub, these top AI tools for generating UGC video content work great, until you require:
- Consistent character designs across hundreds of ad variations
- Precise control over compositions for specific hooks
- Integration with existing creative pipelines
- Budget predictability (no per-generation costs)
If you’re scaling ad creatives globally, Jakub argues, many of these black-box tools end up becoming bottlenecks. That’s why he’s a fan of open-source AI solutions, especially when it comes to creative iteration.
The Chinese Open-Source AI Ecosystem
China’s AI strategy deliberately mirrors successful gaming modding communities:
“China is currently flooding the market with all these open-source models because it’s their kind of political policy of, ‘We’ll get these models in the people’s hands. Therefore, we control the ecosystem.'”
This has resulted in a thriving culture and ecosystem where:
- Various AI models are constantly improved through community contribution
- Unlimited customization (if you invest the effort)
- No subscription costs; you pay only for hardware
- Workflows become proprietary advantages
Jakub’s analogy to the game Skyrim is perfect:
“Imagine it basically like Skyrim. Skyrim is played to this day and is one of the best RPGs in the world. Why? Because it has a giant modding community that revives it, patches it, improves it, so on and so forth. So that’s their approach, basically.”
Why This Matters for Mobile Game UA
ComfyUI workflows bring a modding mindset to creative production, letting teams remix community work and use open-source AI models to rapidly generate whatever assets they need across multiple formats.
“Open-source AI generation is not confined only to images and video. You can generate whatever you want, basically, in any modality, as long as you have the open-source model for it… You can do audio, 3D assets, 2D assets, 2D sprites—like, you can generate whatever you want.”
Ultimately, this is what your creative workflow grows into: a compounding tool for growth that gets more capable over time, a layer of proprietary IP that competitors can’t easily replicate, and an appreciating asset, rather than a recurring expense. These are some of the main reasons why forward-thinking mobile studios are investing now.
Tools to Grow: ComfyUI Hardware & Software Requirements
According to Jakub, there’s a practical shopping list for setting up automated creative production with ComfyUI workflows (plus models from sites like CivitAI).
“You need a good computer. So we need at least something like, I would say, 8 to 10GB of VRAM, NVIDIA GPU. This stuff won’t work on AMD. Maybe in some experimental form it will, but you need a CUDA core GPU. That’s the first step. Once you have this, you need ComfyUI. Again, you can get it on the internet, very easily.”
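Before buying anything, it’s worth verifying that your machine meets that bar. Below is a minimal sketch, assuming PyTorch is installed, that checks for a CUDA-capable NVIDIA GPU and reports its VRAM:

```python
# Minimal environment check before installing ComfyUI (assumes PyTorch is installed).
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable NVIDIA GPU detected; ComfyUI's standard models need one.")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
if vram_gb < 8:
    print("Warning: below the ~8-10 GB VRAM floor Jakub recommends; consider quantized models.")
```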
Hardware Investment
Unlike cloud services, open-source AI workflow tools run locally. This requires upfront investment but eliminates ongoing costs.
Minimum Specs:
- GPU: NVIDIA RTX 3060 (12GB VRAM)
- RAM: 16GB system memory
- Storage: 512GB SSD (for models and workflows)
Recommended Specs:
- GPU: NVIDIA RTX 4070 or 4080 (16GB+ VRAM)
- RAM: 32GB system memory
- Storage: 1TB NVMe SSD
ROI Calculation:
- Midjourney subscription: $60/month = $720/year
- Runway video generation: $95/month = $1,140/year
- Total avoided cost: $1,860/year
- Hardware payback: 6-16 months
That puts hardware payback at roughly 6–16 months, depending on which build you choose. After year one, each new generation is effectively free because you’re no longer paying per month, per seat, or per output.
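The arithmetic behind that payback window is simple enough to sanity-check yourself. In the sketch below, the subscription figures come from the list above, while the two hardware prices are illustrative assumptions chosen to bracket the minimum and recommended builds:

```python
# Payback estimate: subscription costs avoided vs. one-time hardware spend.
midjourney_yearly = 60 * 12   # $720/year (figure from above)
runway_yearly = 95 * 12       # $1,140/year (figure from above)
avoided_monthly = (midjourney_yearly + runway_yearly) / 12  # $155/month

# Hardware prices below are hypothetical ballpark figures, not quotes.
for label, hardware_cost in [("minimum build", 900), ("recommended build", 2500)]:
    months = hardware_cost / avoided_monthly
    print(f"{label}: ~{months:.0f} months to payback")
# minimum build: ~6 months to payback
# recommended build: ~16 months to payback
```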
Software Stack (All Free)
- ComfyUI – Core creative workflow software framework
- Stable Diffusion models – SDXL, SD 1.5, specialized models
- LoRA models – Character consistency, style control
- ControlNet – Compositional precision
- AnimateDiff/Video extensions – Image-to-video capabilities in ComfyUI
- Face restoration models – Professional quality finishing
Download sources
- CivitAI (models and workflows)
- Hugging Face (base models)
- GitHub (ComfyUI and extensions)
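As one example of how simple the model side of the stack is to assemble, here’s a minimal sketch using the huggingface_hub package to pull a base SDXL checkpoint straight into ComfyUI’s checkpoints folder (the repo and destination path are one common choice, not a requirement):

```python
# Download a base model from Hugging Face into ComfyUI's model directory.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    filename="sd_xl_base_1.0.safetensors",
    local_dir="ComfyUI/models/checkpoints",  # adjust to your install path
)
```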
Take Your Time To Learn
“It’s effort-based. You need to put in some effort, and then you have it. I can do it. I’m not a programmer. I’m a game designer. I can do Excel sheets like maths and economy, but I can’t code, and I was able to do all these things. So it’s not that hard.”
Jakub’s key insight: the barrier isn’t technical skill. It’s dedication to the process and the motivation to create your own assets and unique ad creatives.
Creative Automation Tools Comparison: ComfyUI vs Alternatives
| Feature | ComfyUI | Midjourney | Runway | Traditional |
| --- | --- | --- | --- | --- |
| Monthly cost | $0 | $60-$120 | $95-$600 | $5,000-$15,000 |
| Setup time | 2-4 hours | 5 minutes | 5 minutes | Weeks |
| Control level | Complete | Limited | Medium | Complete |
| Character consistency | Excellent | Poor | Medium | Excellent |
| Video generation | Yes | No | Yes | Yes |
| Iteration speed | Very fast | Fast | Medium | Slow |
| Learning curve | Steep | Easy | Easy | Steep |
| Best for | High-volume UA teams | Quick concepts | Video polish | Hero assets |
The Verdict for Mobile Game UA
A ComfyUI workflow is the clear winner for teams producing 50 or more creative variations weekly. The upfront setup investment pays off in the long run with unlimited generation capacity and granular control, a must for branding.
“What I’m saying is that the teams of the future will be building their own tools and own data models and own datasets that they will be then pretty much using through these open-source AI models.”
ComfyUI Tutorial: Image-to-Video Workflow
This is where ComfyUI workflows really start to shine for scaling UA creatives. Plus, there’s a core insight that pros figure out pretty quickly:
“The key to video generation, anything, is image generation. That’s the number one rule that you learn with these things.”
Why Text-To-Video Doesn’t Scale
The workflow seems intuitive: type a prompt, get an immediate output. For one-offs this might work, but there’s a major issue if you’re trying to scale or present similar options to a client.
“Lots of times, people just go text-to-video. Like, you go to an image generator, and you do something and just input some text, and it just generates something, which is great, but you don’t have control. That’s the big problem. You don’t have control of how it looks, how the characters look, how the environment—how anything looks.”
When you’re testing dozens, hundreds, or even thousands of UA creatives a month, this lack of control kills you. You can’t isolate what’s working in your A/B tests. And you definitely can’t iterate fast enough to stay competitive.
An Image-First Professional Pipeline
Phase 1: Base Image Generation
- Precise prompt engineering
- ControlNet for compositional control
- Initial generation batch (20-50 variants)
Phase 2: Refinement
- Face restoration (Detailers)
- Hand correction (critical for UGC realism)
- Background enhancement
- Quality upscaling
Phase 3: Animation
- Image-to-video ComfyUI conversion
- Character consistency maintained
- Motion parameters fine-tuned
- Duration and pacing control
Phase 4: Post-Processing
- Final color grading
- Text/UI overlays
- Export optimization
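To make Phase 1 concrete: ComfyUI exposes a local HTTP API, so a batch of seed-randomized variants can be queued with a few lines of Python. The sketch below assumes a ComfyUI server on its default port and a workflow exported via “Save (API Format)”; the file name and the KSampler node id are specific to your own export, so treat them as placeholders:

```python
# A minimal sketch of Phase 1 batching against a local ComfyUI server.
import json
import random
import urllib.request

with open("base_image_workflow_api.json") as f:  # hypothetical API-format export
    workflow = json.load(f)

for _ in range(30):  # an initial batch in the 20-50 variant range
    # Randomize the sampler seed so every queued job yields a new variant.
    workflow["3"]["inputs"]["seed"] = random.randint(0, 2**32 - 1)  # "3" = KSampler id in this export
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Each queued job lands in ComfyUI’s output folder, which is what turns the 20-50 variant batches in Phase 1 into an overnight task rather than a manual one.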
Structured this way, ComfyUI workflows deliver real leverage for creative and UA teams, because every video begins from an image you fully control.
“You can generate whatever you want, basically, in any modality, as long as you have the open-source model for it. The ComfyUI thing that I will be showing is just like the, let’s say, the frame for it. But you can do it from audio, 3D assets, 2D assets, 2D sprites—like, you can generate whatever you want.”
This positions ComfyUI not only as a creative tool, but as creative workflow infrastructure.
You can control how characters look, how environments render, how brand elements appear frame-to-frame. For teams doing A/B testing and iterating on hundreds of assets a month, this is necessary.
Consistency is key for building brands. If you can’t maintain consistency, you can’t isolate certain variables, and you can’t move fast enough to stay competitive.
How Creative Automation Tools Benefit Teams
While the output advantages are obvious, the impact on creative teams might be more significant.
Avoid Creative Fatigue and Burnout
Producing high-volume creatives through traditional methods takes a toll on teams. Minute variations and repetitive changes erode team spirit and motivation, and invite creative fatigue and burnout.
That’s because testing many variations means more production time, more analysis work, and often overtime. That pressure degrades the quality of creative output and puts unhealthy stress on the team, and it can be avoided with current tools and the right pipeline.
Creative automation helps by eliminating repetitive work cycles, freeing creators to focus on strategy and implementation. It also turns high-volume production and testing into a technical problem rather than a human one.
“The teams of the future will be building their own tools and own data models and own datasets that they will be then pretty much using through these open-source AI models.”
According to Jakub, expect more UA teams to become tool-builders rather than pixel-pushers, evolving to create more engaging, valuable, and sustainable content.
New Pipelines Build A Competitive Moat
The real competitive advantage comes from building custom creative pipelines that competitors cannot buy. When studios invest the time to train LoRAs on their specific character designs, develop brand-matched style models, and curate libraries of their highest-performing creative elements, something fundamental shifts.
An open-source AI workflow stops being just another tool in the stack and becomes actual intellectual property.
We’re talking about proprietary workflows that achieve brand-specific quality levels that generic tools can’t replicate, institutional knowledge baked directly into your creative infrastructure, and appreciating assets that get better with every generation.
Unlike subscription services that vanish the moment you stop paying, these custom pipelines compound in value over time. They learn your studio’s aesthetic preferences, optimize for your specific UA metrics, and become increasingly difficult for competitors to reverse-engineer. This is why the smartest mobile gaming teams are now tooling up for the long run.
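For a sense of what training LoRAs on your own character designs buys in practice, here’s a minimal sketch, assuming the Hugging Face diffusers library, of applying a studio-trained LoRA on top of a base SDXL checkpoint. Inside ComfyUI you’d use a LoraLoader node instead; the LoRA path and trigger tag here are hypothetical:

```python
# Apply a custom, studio-trained LoRA to a base model for brand-matched output.
# Requires: pip install diffusers transformers accelerate
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Hypothetical LoRA file trained on the studio's own character art.
pipe.load_lora_weights("./loras/studio_character_style.safetensors")

image = pipe(
    "hero character, key art, studio_character_style",  # hypothetical trigger tag
    num_inference_steps=30,
).images[0]
image.save("branded_variant.png")
```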
Velocity and Volume Determine Mobile Game Success
The clearest evidence of this change is the story of King Shot. Launched in February 2025, the game rapidly scaled to generate approximately $1.5-2 million daily, a trajectory that would have been nearly impossible just two years ago. As Jakub explains:
“King Shot is the biggest game of this year. It was launched somewhere like February, and currently it’s doing something like one and a half, nearly two million a day.”
What makes King Shot’s success particularly instructive isn’t just revenue. The game’s UA strategy relies on a sophisticated bait-and-switch approach that presents approachable, puzzle-style gameplay in advertising (inspired by the Steam game Thronefall), then seamlessly transitions players into a deeper 4X strategy experience once they’ve installed.
This isn’t deceptive advertising in the traditional sense; rather, it’s a carefully engineered funnel that dramatically widens top-of-funnel acquisition while maintaining strong retention metrics.
“It’s all based on this kind of bait-and-switch fake ads, fake onboarding, real gameplay, 4X-style thing… They widen the funnel so much because it’s so approachable.”
The brilliance is in execution: users see engaging puzzle mechanics in ads, experience those same mechanics in the initial onboarding, and gradually discover the game’s more complex 4X systems as they progress. The “fake ads” and “real gameplay” align closely enough that user trust remains intact, while the accessible entry point captures audiences who might otherwise never consider a traditional 4X strategy game.
But here’s the critical insight that explains why ComfyUI workflows and creative automation tools have become essential: this strategy only works at massive creative volume.
King Shot isn’t running five or ten ad creatives. They’re testing hundreds of variations simultaneously, each targeting slightly different audience segments, creative hooks, and messaging angles. They’re iterating on winning concepts daily, not weekly or monthly.
This volume-dependent approach is now proliferating across mobile gaming genres. Social casino games are adopting similar strategies, and even puzzle games are using them. Traditional RPG and strategy titles are also exploring how creative-first UA can widen their acquisition funnels without compromising their core gameplay identity.
The implication is that automated creative production isn’t a nice-to-have optimization anymore; it’s become table stakes for competitive UA in 2026. Studios that can generate, test, and iterate on hundreds of creative variations weekly are building insurmountable advantages over those still relying on traditional production timelines. When your competitor can test 50 new creative concepts in the time it takes you to produce five, they’re not just moving faster—they’re learning exponentially more about what resonates with audiences, which hooks drive performance, and how to optimize every stage of the creative funnel.
From Creative Generation to Performance Metrics
Jakub’s work with open-source AI tools like ComfyUI is more than a technical roadmap; it points to a structural transformation of mobile gaming creative teams. But generating hundreds of creative variations means nothing without accurate attribution to measure performance.
Leading studios are integrating their AI pipelines directly with mobile measurement platforms like Tenjin to measure:
- Creative-level ROAS, using creative ID tagging via file naming conventions and granular attribution (see the sketch below this list)
- Click-to-install conversion rates segmented by creative-level data
- Cohort analytics to refine creative performance on high-volume platforms like Meta
- LTV trajectories for AI-generated vs. traditional creatives
These measurements demonstrate which specific creative and model combinations, and which strategies, deliver returns.
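As a concrete illustration of the creative ID tagging mentioned above, here’s a minimal sketch of a file-naming convention that an attribution pipeline could parse. The field layout is hypothetical, one of many workable schemes rather than a format Tenjin requires:

```python
# Parse creative metadata out of a (hypothetical) file-naming convention.
from pathlib import Path

def parse_creative_id(path: str) -> dict:
    # e.g. "kingshot_hook-puzzle_model-zimage_seed-91234_v03.mp4"
    stem = Path(path).stem
    parts = stem.split("_")
    tags = {"game": parts[0], "variant": parts[-1]}
    for part in parts[1:-1]:
        key, _, value = part.partition("-")
        tags[key] = value
    return tags

print(parse_creative_id("kingshot_hook-puzzle_model-zimage_seed-91234_v03.mp4"))
# {'game': 'kingshot', 'variant': 'v03', 'hook': 'puzzle', 'model': 'zimage', 'seed': '91234'}
```

Because every generated asset carries its game, hook, model, and seed in the filename, creative-level ROAS reports can be joined back to the exact workflow settings that produced each winner.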
Using these tools to grow also requires accurate attribution. The investment you make in open-source AI workflow infrastructure only pays off when paired with a mobile measurement partner that closes the loop between creative production and performance outcomes.
Read the full transcript
In this video, we cover:
• 🇨🇳 The difference between Western and Chinese AI adoption and open-source models.
• 🖥️ The hardware and software you need (GPU requirements & ComfyUI).
• 🎨 A live breakdown of image generation workflows, including “Detailers” and specific rendering techniques.
Leveraging Open-Source AI for Mobile Game User Acquisition
Roman: Hi everyone, welcome to another episode of ROI 101. I’m Roman from Tenjin, and today I’m joined by Jakub from Two and a Half Gamers. Hi, Jakub!
Jakub: Hi, hello there. Nice, thanks for having me. I’m Jakub from Two and a Half Gamers for those who don’t know.
Roman: What do you do there, Jakub? A super quick intro for people who might not know who you are.
Jakub: So currently I’m like 10 years plus in the game industry, mainly mobile game industry. Lately, I’ve been working for the last three years, I guess, as an independent consultant, pretty much. But of course, yeah, we run the Two and a Half Gamers podcast with Felix and Matteo, which will be four years next month, so quite some time, I guess.
Roman: Feels a lot longer, dude. It feels a lot longer. I’m not sure how you feel.
Jakub: Yeah, yeah, yeah. That’s the grind there. But yeah, I work with multiple gaming studios around the world, or even non-gaming people these days, because based on the whole Duolingo—you know, apps pretty much taking over the App Store—they’re looking for our know-how. And it’s a perfect match a lot of times where, you know, they need progressions, monetization, and all these other things like system design, basically.
Roman: Yeah, yeah. We’ve seen the same with apps—like a huge amount. But anyway, we met at Mobidictum with Jakub, and we decided that we want to talk about AI. Of course, it’s still 2025, so we have to talk about AI.
Let’s just jump in, Jakub. It’s going to be like free flow. We don’t have an agenda. We’ll just see what Jakub has to show, and I’ll ask plenty of questions.
Jakub: Yeah, yeah. There’s lots of stuff, and yeah, I guess this will hopefully be as practical as possible because this won’t be one of those discussions that like, “AI will replace your job, AI will be this, AI will be that,” and so on and so forth. This will be like, what can you do now, completely free, and it’s extremely impactful. So let’s start there.
Jakub: Okay, so I guess, yeah, for those listening, best case scenario, you can probably watch this on YouTube or somewhere there, because we’ll be sharing the screen, and I guess it’ll be from now on some kind of a workflow.
So yeah, before we get to this nice image, which we’ll get to in a second, let’s first look at some of the actual stuff that’s currently completely taking over the market, which is basically AI creatives.
AI creatives are actually the most impactful, let’s say, surface-level view of AI that we see in the market. It’s one of the most important things in the current environment because UA is more important than product this year and next year even more, and so on and so forth.
It was not like this a few years before, but now it is. And if you want to give the best example, just look at King Shot. King Shot is the biggest game of this year. It was launched somewhere like February, and currently it’s doing something like one and a half, nearly two million a day.
And it’s all based on this kind of bait-and-switch fake ads, fake onboarding, real gameplay, 4X-style thing, where it was actually taken from Thronefall, which was the game on Steam.
(There we go.) That was pretty much very good but, again, very approachable.
But what happens is basically they widen the funnel so much because it’s so approachable. Users get to see these fake ads. Then when they go into the game, they see the gameplay which is the same as the one in the ads, which means like the fake ads, fake onboarding kinda equalizes itself. Therefore, nothing’s fake anymore, and it’s exactly the thing that you’ve seen in the ads. But slowly, the game unfolds you into 4X or some other high-LTV engine that we see.
It’s proliferating also to other genres, like Social Casino. Like, just wait when we release the next episode on the channel. You’ll see how this bait-and-switch also works there.
And all of this is, again, possible because creatives and marketing is the key in this whole setup. And AI creatives—I’m not saying you can’t do this without the AI creatives—but it’s enabling it in a very, very big way that, again, it gives you volume because you need volume for this.
And AI creatives these days are extremely prevalent. And we think that our prediction is basically that by the end of 2026, there will be around 50% of all UA creatives either having AI hooks or completely done by AI. Like, here you have an example. The one that I showed before, it was actually like a hook, and there was the creative real gameplay and so on. This is the fully generated one where you would have stuff like—you see here, completely generated in an image and video editor, and you just run it as your creative, and that’s it, basically.
So, again, we won’t talk about “AI takes your job, AI does this.” We’re literally talking about what’s currently trending in the market now and how to get this. So if your creative team is not using AI, you’re already behind. That’s basically the state of it.
So how do we actually get to this? And how are these things done? And like, a little bit more nitty-gritty stuff of generation?
Because, as I said, I won’t talk about any other use cases for AI these days, because in my opinion, mastering the UA pipeline, and using this to boost your volume, is the key.
Of course, there are stuff like—let’s say, you know, it’s just an example here. Here’s an example from YouTube that I found where, again, you can use the ComfyUI thing, which I’m using today, and generate 3D assets through it. Again, open-source AI generation is not confined only to images and video. You can generate whatever you want, basically, in any modality, as long as you have the open-source model for it. The ComfyUI thing that I will be showing is just like the, let’s say, the frame for it. But you can do it from audio, 3D assets, 2D assets, 2D sprites—like, you can generate whatever you want, basically, and completely for free, as I said, as long as your graphics card is able to handle it.
So that’s there. So don’t just think, “Oh yeah, this is just images and videos, and it won’t help us through.” We can do pretty much everything, because how I think the teams of the future will be going is that they will all be making this custom. Because that’s the biggest difference between the Western approach of like blackbox AI tools, which are, again, completely closed—as for you can only do, I don’t know, positive prompt, negative prompt, then like some very small customization to it—whereas if we go actually to what we can do today…Yeah, it’s kind of very heavy what you can do and what you can actually create and check and stuff like that.
It gives you completely free hands, incomparable. And as I said, what I’m saying is that the teams of the future will be building their own tools and own data models and own datasets that they will be then pretty much using through these open-source AI models. Because that’s the attitude, or let’s say that’s the way that China handles it.
Like, China is currently flooding the market with all these open-source models because it’s their kind of political policy of, “We’ll get these models in the people’s hands. Therefore, we control the ecosystem.” Instead of the Western approach, which is like, “We have these giant OpenAI companies that are doing like the best of everything,” but again, it’s not that supportive as in China.
In China, the community is also driving these models because they’re adding all of these additions and stuff. Imagine it basically like Skyrim. Skyrim is played to this day and is one of the best RPGs in the world. Why? Because it has a giant modding community that revives it, patches it, improves it, so on and so forth. So that’s their approach, basically.
Roman: …Your first creative when we started. It had the Chinese characters, and I already—because I also follow the channel—I know that you have some folks from China, and they’re like sharing some crazy stuff.
And that leads me to my first question: Do you feel like they’re further ahead than, like, everyone else with this AI adoption? And like, clearly you’re saying yes, right?
Jakub: I would say so. Not only are their models—again, they’re open source, so you can go customize and use them for yourself—but the approach and pipeline is, again, different in China.
Because, again, this is the big difference between the West and the East: user acquisition is the most important job in the mobile game industry in China. In the West, it’s not.
In the West, it’s a product, usually. Product—as for either design or, you know, live ops, PM, monetization stuff like that. That’s the most important part, the core of it. User acquisition for them [China] is, again, as I said, the most important part, because also the product is so up to par across the whole industry there. So their product is great to begin with. But yeah, that’s another discussion for some different time.
Roman: But can the folks from the West adopt this kind of—like, the models are open source, as you said?
Jakub: Yeah. Again, they can. Like, you know, we have AIs all over the place, so there’s basically no language barrier if you know how to use them. It’s just artificial. It’s like, you know, effort-based. Like, you need to put in some effort, and then you have it.
But other than that, like, yeah, it’s quite easy. Like, I can do it. I’m not a programmer. Like, I’m a game designer. I can do Excel sheets like maths and economy, but I can’t code, and I was able to do all these things. So it’s not that hard. Yeah, everybody can do that.
And it’s, again, just people in the West kind of sleeping on themselves, whereas they should be doing these things all over the place. But yeah, we’ll get to it.
So, as I said, how to do these creatives and how to pretty much even get to some of these things. Because, again, you can still do this pretty easily, through like Nano Banana or ChatGPT, or any other image generator in the West. You can still do great. Like, don’t get me wrong.
This is more hardcore and, let’s say, more customizable stuff because of what you can do and what you can create. You can, for instance, create your own LoRA. We’ll get to it—what that means. But basically, what it means is that you create your own dataset from your art, your custom art, your whatever you want to do, and you add it onto a model. Therefore, the model suddenly spits out like an art that would be coming from your artist, which isn’t really the thing that you can do with GPT or these other tools.
Because currently, as I’m seeing it, for instance, every big company—and I mean like companies like, I don’t know, Blizzard, CD Projekt Red, and all these other guys—they’re probably already creating their own models, which are completely fed only on their own data, meaning that they’re, again, creating the armies of these artists that they’ll suddenly be able to do and use, which is completely legally okay. That’s because there’s no copyright so far, and they’re just using the model, not the training material. But yeah, that’s again one of these things.
So how does it look, and what’s there? So this is ComfyUI. Let’s start maybe from a little bit easier workflow until we get to the hard stuff. Again, it’s quite easy. It’s visual prompting once you get into it. So you just download the thing from Hugging Face. Hugging Face is the big programmer repo with all the databases and models and everything. It’s all open source on the internet.
And the important part—like, you’re looking at this like, “Oh, this is so—like, how did you create?” No, you don’t. You don’t need to. It’s very easy because all of these things that you see here, for instance, these workflows that I have here, you just take from someone else.
Like, if you’re hardcore, you can literally go and like, “Okay, add a node and like edge spaghetti here and do this visual coding thing,” that, you know, goes from here, from here, from here. You can do it yourself, but I don’t. Because, for instance, this one that I have here—the big one—yeah, no chance for me.
But again, what you do: You go on the internet, you read the guide, and on the guide you have like this whole thing, pretty much. And again, somebody did it for you. So don’t get—maybe let’s get rid of this so it’s a little bit more easy on the eyes. Don’t get scared and don’t think, “Oh, this is just horrible.” As I said, I went through these. I didn’t know shit about all of this, and pretty much by trial and error, you can figure it out quite fast. It’s not that hard.
And my number one advice when working with these tools: Whatever errors or stuff that you have there, just throw it into ChatGPT, and it will just tell you in layman’s terms like, “You need to do this, you need to do that, you need to do this.” And it’s great because, again, we need to realize that suddenly we have this AI that’s literally right there sitting in the corner for us, which we can ask anything, and it will do anything for us.
So all of these things—like, “I don’t understand this, I don’t understand that”—doesn’t matter, because again, you slap it into AI, it will tell you. And especially programming code. Immediately, it’ll fix errors and do stuff for you. So it’s, again, an effort-based barrier, no other barrier.
So if we go into the basics…
Roman: So maybe we can clarify, maybe for the small one. This is what was used to generate one of those creatives that you’re showing at the bottom [of the screen]?
Jakub: Yeah, yeah. So let’s say this one. So how do you use this? How do you generate those?
So, for instance, this one—this was an image, and you run the image through a video generator which then animates it, and then you stitch it into a movie, or like a creative, basically. Because all of these kinds of cuts, that means that it’s another image and another generation, usually. So in order to do these—for instance, this one already requires a little bit more advanced workflow because one thing that we have here is a consistent character, which is like, yeah, it’s not something that you see every day.
So, again, for this you use ComfyUI, where you have workflows for consistent character. Literally, create a character, and from that point on, you kind of save it like, “This is my character.” And then all the generations can go through that character. Therefore, you end up with something like this, where I said like, “Okay, let my character sit in the evening in the office,” and there it goes.
And the video generator is just kind of a cherry on top. It’s not that hard. The important part of, let’s say, creative video generation is actually the image itself. That’s because the workflow that you always go to is image-to-video, not text-to-video.
Lots of times, people just go text-to-video. Like, you go to an image generator, and you do something and just input some text, and it just generates something, which is great, but you don’t have control. That’s the big problem. You don’t have control of how it looks, how the characters look, how the environment—how anything looks.
So again, the key to video generation, anything, is image generation. That’s the number one rule that you learn with these things.
Therefore, if you want to have great creatives, you first need to master the image generation. Once you master the image generation, then always the first frame starts with your image, and from that image you go and create the creative, and you can do pretty much whatever you want.
So how do we get to image generation? So, as I said, you install stuff like ComfyUI. You can do Nano Banana or whatever—anything is good. But this is just a much better way of having controllability. So let’s just go over this very simple workflow and how it works and what we have here.
So this is the Z Image Turbo, which is the latest model from Alibaba that is literally taking over the internet in the last month. For those who don’t know, it’s unheard of because this is a very small model—literally like 6.1 billion parameters—and it’s outstandingly good. But yeah, I’ll just go very fast through it.
So here, for instance, we have the base model which is quantized. Quantized means that in order for these—some of these models—we don’t really have the top-of-the-line graphics cards, so the community, again, creates lower versions of these models to cut down on the VRAM requirement but also a little bit on the quality. So that means that I can run this on my 3080 Ti GPU graphics card, which has 12GB of VRAM, even though the base version of this model requires 16.
So you literally go on the internet, and again, in the guide itself—I have here, for instance—you can get and find. So you have these repositories. For instance, the quantized version of the model—you go all the way into small ones, which is like 2 gigs or whatever, and you can run this even on 6 gigs VRAM card.
Roman: So the first step is actually to buy a good computer. Is that what it is? Haha.
Jakub: Haha, yes, you need a good computer. So we need at least something like, I would say, 8 to 10GB of VRAM, NVIDIA GPU. This stuff won’t work on AMD. Maybe in some experimental form it will, but you need a CUDA core GPU. That’s the first step. Once you have this, you need ComfyUI. Again, you can get it on the internet, very easy. It’s just one repository from Hugging Face.
Also, I recommend installing ComfyUI Manager, which is just the UI add-ons stuff, pretty much a utility that, again, you don’t need to go on the website, download manually. You can just click on it, and it will download it from GitHub immediately.
And once you have this, again, you just drag stuff. You can literally go here and drag an image here, and the image and its metadata will then create the workflow if it’s embedded in it. So that’s the beauty of it. Like, you don’t really need to create all this spaghetti visual coding stuff. It will just have the—for instance, this one is an example workflow on the site. It was just like, throw in an image, here we go.
So again, what we have here and what are some of the things that you can control here and what gives you the things. So here we have the base model, as I said—the text encoder and the model itself. It’s quantized, so it’s lower quality, lower VRAM, so we can actually run it. Then we have stuff like “shift.” This is specific for the model. It’s more of like a contrast slider. So less shift means more contrast. More shifts means less contrast. That’s there.
Then we have the positive prompt. Yeah, I’ll get to it—how I got it. And the negative prompt. If I understand correctly, this one doesn’t really work with negative prompts that much. It’s, again, some image generators don’t even have that. Like Flux, for instance—they don’t have a negative prompt. Then we have the image size, which is just like a square of 1024 pixels times the same. We could pump it up to 2K, even higher. The problem is that it will just load longer, and we don’t need it for the sake of this video. So that’s there.
Roman: Jakub, quick question. Is it also effort-based, as you said at the beginning, in order to understand everything you actually—
Jakub: Yeah. As I said, no programming skills on my side, no computer science, no nothing. My background is psychology. Like, you don’t need anything. You can get these things still. As I said, for instance, we can literally link the how-to guide tutorial in the video. There’s like a 40-minute tutorial, but most of the stuff—it’s not even a tutorial. It’s just the guy going over the comparison between these models—Z Image, Flux, and Qwen. It’s more of a comparison.
Jakub: So really, the part where he goes through the file manager and just tells you how to install it—this takes like 10 minutes, honestly. It’s not like it will do this and that and it will be super hard. No, it won’t. It will be just like four or five clicks. Again, you have ChatGPT sitting right next to you, so that if you don’t understand, you just tell it, “I don’t understand this. What should I do?” It will tell you. It’s that easy.
Like, for instance, I didn’t understand which quantized model I should pick for my graphics card. And yeah, so this is what it told me. So I just literally pasted the repository from the thing, and it told me like, “Okay, so you go here, and these are the models. So if you have 10 to 12GB VRAM, pick this one because this one will probably be enough for your memory.” That’s it. And you do all these steps like this. It’s super easy. So nothing really to it.
So once we have all these fixed, let’s just finish the last step. So steps are very important. This is the setting that tells you how many actual passes the generation goes through, because all these images—usually with diffusion models—start from noise. So imagine a black-and-white grainy picture; all these pictures start like that. And this is how many steps the noise will be run through.
Then we have CFG value, which is how much prompt adherence compared to creativity we let the model do. Meaning, how much more creative we let it be compared to how it must be exactly as we prompted. Again, a value that you can play with. And then some base stuff that you don’t really need there.
So if we go here and run it, we have this kind of a demon guy, which is like hyper-realistic—a line drawing of a furious forest spirit. Da-da-da-da-da. Let’s run it.
Jakub: We have the same seed. Yeah, we need to change the seed to random because we don’t want to have the same seed each time.
Roman: Prompt. Mhm, I see a lot of text in there. Yeah. How do we get this?
Jakub: Yeah, exactly. Yeah, let me just generate the thing so you see it. Went through the prompt, now it’s in the K sampler, and then from K sampler, it goes to decode, and then there we have our image. So we have, instead of this guy, we have this guy. That’s quite easy.
How do we get this giant prompt? So prompting is kind of another way of learning these things. So, for instance, this prompt I got from CivitAI. CivitAI, again, is one of those things that I would recommend you go check it out. It’s pretty much the biggest open-source community website on the internet. Think of it literally as an Instagram. So it’s just basically images and videos of other creators that people vote on and then can check and do stuff.
The very important part about this site is that you can go there and learn and get stuff for yourself. So, for instance, our forest spirit is just—I was just browsing here. For instance, everyone, today’s images—what’s that? Generated. You have some very interesting stuff that you can get here.
By the way, spoiler alert: I’m using the CivitAI Green site because there’s also the Civitai.com site, which is like 90% porn, because that’s what people generate with user-generated content. So just saying, if you want the one without it, it’s the Green one. If you want the one with it, it’s the base one.
Roman: Thanks for picking the right one for this recording. I appreciate it.
Jakub: No worries. So again, I just found the image from a creator, and the key part here is not the image itself, but again, this thing on the right, which we can zoom on a little bit.
Jakub: So what we have here is that it tells us actually how this was created. And we can even run it on the site itself and generate it there if you really want. The site allows it if you buy literally through credits. But again, why should we do it if we have it open source?
So what this tells us: It’s using the Z Image Turbo generator. So I can literally just go here, click here, and then I have the model. It was released November 26th, and I can download it or create with it or basically get stuff from it. You also have some kind of current generations and what people are doing there and stuff like that. But again, we already know the model.
Then we have the prompt. So we have the prompt. We can take the prompt, and you can play with it and use it. Prompts have very specific setups. Again, we would probably need a different podcast for it. But again, you don’t need to create this stuff yourself from scratch. You can learn from other people. This is why this site is so important.
It comes into the formula because you can create amazing stuff just by copying other people’s work and reverse-engineering it and seeing how it works. And therefore, you learn, and you learn very, very fast.
Then we also have some other important things, which is the metadata—basically how the guy specified his sliders in ComfyUI. So we see, as we talked, CFG scales a little bit more to the adherence, so it’s 1.1 only, eight steps. The sampler—we can even take the same seed and generate the same exact image if we want. That’s also possible because he left the seed here.
Some people don’t share their generation metadata because they’re very—you know, want to stay confidential and stuff like that because some people work very hard on their workflow. But most of the stuff that you see here, you can do, and you can just take and learn from it. This is the beauty of the site—that you learn so much.
Jakub: So again, this was pretty easy to do, and we can do whatever we want, actually. Just for the sake of it—so if we go here, we can leave our fire guy and—
Jakub: Roman, tell me, what do you want to generate?
Roman: Let’s do something Christmas-related.
Roman: Christmas. Zombie.
Jakub: Zombie.
Roman: Like, do you remember Plants vs. Zombies?
Jakub: Okay, this is what immediately sparked for me. By the way, good that you’re saying it. The beauty of these models—uhhh Christmas postcard…
Yeah, let’s try this one. The beauty of these models—good that you mentioned—is that they’re completely uncensored, which is, again, the big advantage of it. Because if you go into ChatGPT or, again, one of these kinds of main models, you can’t generate IP-based stuff. For instance, my son asked me, like, “Oh, can you—can I have, like, Olaf or whoever from Frozen?” Or like, no, you can’t, because these models have other AIs that are censoring the output of them so that you can’t do it. It’s impossible here.
Roman: Quick!
Jakub: Yeah, it’s very quick. Again, as I said, I’m using a lower-quality one, so this would be a little bit different than the usual quality; you can pump it up, and there are still better models. This is the Turbo one, so speed is more important than quality itself. But again, whatever you do here, you see, you still can get amazing quality.
But again, if I would go and, as I said, Elsa and Anna from Frozen standing in front of a giant frozen castle, cinematic, high quality, realistic—let’s try. Yeah, the more of these tags and words you add to it, the better the image will be, of course. That goes without saying. As I said, I would recommend for anyone to learn just the process.
Oh, there we go! See?
Roman: Oh, that’s literally—well, yeah, like 95%.
Jakub: It’s like if we would fine-tune it a little bit more with details and—you see, the ice maybe needs a little bit of stuff like that here and there. And yeah, we can get to it very easily.
Roman: Your legal department is not going to be happy about—
Jakub: Yeah, yeah, yeah. But again, you can do whatever you want. That’s the beauty of it. So it gives you—and it’s completely free. You know, just take electricity and your GPU, nothing really to it.
But again, I would recommend for anyone just to kind of touch this, run through it, and just learn it. Because, again, you can apply this same process—how this works—to any modality, to like, as I said, text-to-video, image-to-video, 2D art, 3D art, voice, you know, whatever. It works the same. And I think it’s important for people to understand what’s under the hood and how much control they can actually have. Because it’s amazing.
And we’ll probably end up with this last thing, which is my signature stuff that I was working on. And yeah, this gives you much, much, much more control. This is a very advanced workflow that—not this one, sorry, this one. There we go. Let it run because this one is actually 240 steps.
Roman: What does it do? I didn’t understand. What does it do?
Jakub: Yeah, yeah. So what we have here is that we are actually using an Ion on Justice anime model, and we are using the model only for 140 steps. And what we’re trying to achieve—we’re trying to generate a snow leopard anthropomorphic warrior in a realistic style for our game. There’s a pretty big prompt here, pretty big negative prompt also.
It took some time for me to do this. But we want this to be realistic, and the anime model that I’m using here is not able to do realistic stuff. So what’s happening here?
So what you do: You use a refiner. So what it does—after 140 steps, this model stops, and I actually plug in a different model.
So now we’re doing a two-model generation now through Fennekin, which is a realistic model, which finishes the generation, the denoising of the noise from the image for another 100 steps. So it goes all the way to 240. That’s why it’s taking like 3 minutes. And then it basically creates something that each of these models couldn’t create on their own. Because we want, again, a fantasy-style snow leopard warrior guy that—again, I was not satisfied with anything I found on the internet, so I just dug deeper and deeper and deeper and deeper and got into it.
The very important thing is that this model and the workflow that we have here—and by the way, everything that you see here, we’re not using even half of it. Here, we have basically the possibilities of this workflow, and you can just plug them out like functions. You know, just click here and enable it or not. All the violet stuff that you see means that that’s inactive. We’re not using an OpenPose, IP adapter, or ControlNet upscaler, all these other things. It can do so many things that, again, would take a different podcast to do.
But what it can do is still—we’re using the after-generation corrections, like Detailers. This is the really important part. Because in the image that we generated here, for instance, you see that, yeah, they’re great, but there’s something strange about these two. It’s not that, you know, the position of their eyes and everything—it’s like they look like they’re from Wish.
So what happens here is we can look at it in real time, actually, as the workflow is continuing. And okay, it’s already on ADetailer. So we have the base image here, and you see, it’s not perfect. It’s like the face is kind of distorted. Yeah, we don’t really want this. So what’s happening? We have a face detailer, and the face detailer actually fixes only the face. So we are putting another generation on the image that we already have here. And not only that, we’re also fixing the eyes to make them a little bit better.
Roman: Oh, yeah, yeah, yeah. I see, I see.
Jakub: Basically. And you can do—again, there are like four passes we can do, both hand and body kind of setup. You see how the body is kind of, again, fixed a little bit.
So last time I was checking some stuff on the internet, a professional from an AI agency that was sharing his workflow said that it takes him something like 20 hours on an image and 500 generations to kind of get it where he wants it to be—like top quality. So just to give you an example, from the really, really basic stuff, like “Let me generate Olaf for my son,” to very, very advanced stuff like this is how it works. Because, again, this is something that needs to be kind of perfect, because it, again, defines what you want to do.
Jakub: And if we go, again, somewhere here—not this one, but the one creative that I got really, really—not this one. Yeah, there we go. So you see how beautiful these creatives are? Literally like a Pixar movie. And again, you get to this quality by being able to use advanced workflow. And what you end up with are these perfect creatives afterwards.
So, again, that’s the beauty of it. Because this looks literally like a high-level cinematic. It’s like something that somebody would take, again, lots and lots and lots of work and time to kind of get and generate—I mean, draw. But then, again, you can just generate it through pretty much an advanced workflow timeline. And yeah, it would go, and you need consistent characters and all these other things.
But as I said, it’s like step one to getting all these things. So for any creative team that is making creatives, yeah, I think AI—the image generation and video generation mastery—is like an existential problem next year, basically. Because you just won’t be able to keep up with the volume. It’s just very, very hard. And having enough creative volume means that your CPI is getting low as it should be, instead of getting higher. So eventually, all of these things translate basically into that.
And as I said, even though this looks super complex and stuff, it’s not. Like, I’m not even scratching the surface of it—how complex it can get. It’s just some basic stuff that I’m showing, pretty much, nothing really to it.
Roman: Yeah, I really like how we look at both of the schemas. At the end of the day, we generate an image.
Jakub: Yeah, yeah, yeah, yeah, right. Yeah, we have a nice warrior here. But again, you can do whatever you want in the end. And that’s, again, as I said, that’s the beauty. You can play with it and try it for yourself. There’s not some money-hungry website consuming credits or whatever. As I said, the only thing that you need is your GPU and electricity. So you can do whatever. I’m currently past something like 12K, 13K generations, probably.
So yeah, sometimes, even here, for instance, there’s a setting that—run it in batches of eight or whatever. You can just go and sleep and let it run and then pick the best one. I always do that sometimes. Yeah, it’s like an idle game, so, you know, you come back and you collect your rewards basically after it.
But I think it’s very fulfilling in order to be able to know how this stuff works. Because then also what it gives you is, once you go and once you see these creatives and all these other things that are currently trending on the—most of the times, I can even spot the generation model just by looking at it. Because some models are very specific—you know, you can see the giveaway.
Like, for instance, all the ChatGPT images—they have this orange tint behind them. So if we go, I think, one of the last ones here—yeah, see, this one. So this is a creative that Candy Crush is running, and you can see in the end that there’s this kind of orange aroma around the image, which means that it was based on ChatGPT as the base image. So here, it’s kind of very, very, very visible. Yeah. But again, it’s very subtle. But once you see 10,000 of these images, it’s very obvious afterwards.
So therefore, you can immediately see how these teams are working, how they are generating these things, and you can do it yourself. Again, it gives you a lot of edge over the process.
Roman: And it’s really a skill, right, for probably 2026—if you’re doing creatives, whether you’re like a UA manager and that’s part of your responsibility, or you’re part of the creative team.
And just trying to summarize at the end: The cloud-based services like ChatGPT or Claude will give you less flexibility with what you can do. Therefore, we would like to use open-source models.
Jakub: Like, honestly, you can even combine those in a way that, for instance, you can do the image in your open-source model to kind of refine it better, and then use a video generator. Video generator is quite important, but it’s like the final part of the thing where, you know, what you want to generate, which starts from the base image, which can be a combination of like, “I don’t know, use the ComfyUI—I want to generate the image,” and then like Veo 3 for finishing, you know, the video and stuff like that. Again, you can do whatever you want.
Best-case scenario, you test all of it, and you figure out which one works best for you. That’s the beauty of it. But again, by knowing these things, we even have the option to test it yourself. Otherwise, you just like, “Oh, we can just use Veo 3. That’s the only thing we know.” And that’s it.
Roman: I see.
Jakub: And the other big problem with these—these things degrade very fast. And not really degrade, but pretty much new stuff gets released all the time, and you need to keep up. I’ve seen some of the creatives here, for instance—I think, yeah, these ones, or maybe a little bit older ones—that you can see some of those are just running on old models. People just haven’t updated yet, which is, again, normal, because this was—the update cycle here is like 3 months or something. But you want to be using cutting-edge stuff because, again, it gives you an edge on quality, stuff like that, and all these other things. So yeah, just kind of moves very, very fast.
Roman: How do you keep up, Jakub? You personally? Do you have time?
Jakub: I don’t know. Okay, yeah, understandable. So I listen to a few podcasts, of course, based on AI. Literally, we can link—this one is, in my opinion, the—this kind of AI Search YouTube channel is literally a guy that just does the news. Every week, he goes, “What was released this week?” and just goes through all the models. And they have specific videos on specific stuff, like comparisons, stuff like that. So this one kind of debrief I watch very regularly.
Jakub: Then I have a few podcasts that are industry-based, like what’s the latest of ChatGPT versus Microsoft and all these other things. And then, of course, you go on the Civitai site, and you just go and see what people generate.
So, for instance, here you could clearly see that lots of these things like, okay, there’s a lot of ChatGPT actually in it. There’s a lot of—what’s this? Yeah, Google Nano Banana, pretty much is trending really high these days. And you just see what people generate and what’s pretty much there in the market. And this just tells you, “Okay, yeah, that’s basically it.” So, you know who generates what, right?
Roman: And this is so interesting that you’re actually a game design expert, and you are now fully into this creative part of the game. Full in. How does it feel, Jakub?
Jakub: Oh, it’s great, you know. I’m always kind of obsessed in—but again, the biggest problem was I didn’t understand how this thing works, which for me is the biggest itch of like, “I need to do something about it,” because I don’t feel safe, or how do we say it? I don’t feel on top of it if I don’t understand how it works.
So it kind of drives me to kind of go into this rabbit hole and learn about this. Because, again, if you don’t know something, at least know how it works. You don’t need to specifically do it, but at least know that there are these options. Because that way, you won’t get sidelined. You won’t get in a situation where somebody tells you something, you can’t call their bullshit, and you don’t know if this is the best, not the best, or are they even telling you the truth, stuff like that.
So, again, AI is just moving so fast these days in a way, and it’s one of the most important technologies of our lives currently. So why not, you know, slap two birds with one stone? Where, again, we need this because of our game industry professional expertise. And on the other hand, of course, AI will be there, and it will change stuff. That’s for sure.
But again, the edge that it gives me now is I know, for instance, gaming-wise—no, gameplay-wise, it won’t change stuff that much. It will maybe help with some optimizations, like matchmaking or whatever bots, I don’t know what. But we still haven’t reached the point that AI is doing the, you know, the AI-enabled games—games that won’t be working without AI capabilities. We still didn’t hit that inflection point.
It’s not something like it was, for instance, in, I don’t know, 1999 or something, where Doom 3D was released. Because the 3D-ness of it enabled you to do stuff that you couldn’t do in 2D. We haven’t reached this point yet. Again, why? Because you learn about this, and you know that, like, “Oh, this doesn’t make sense. It cannot even code properly yet, so much hallucinations.”
Roman: Everything is connected. All right, Jakub, this was super insightful. I’m sure the guys will have a lot of questions. I’ll ask everyone to leave their questions in the comments. We’ll ask Jakub to answer them once he has time.
Any parting thoughts? Or we will, of course, leave all the links in the descriptions to Two and a Half Gamers and the stuff that we mentioned during the videos. But I also want Jakub to say something, especially at the end of the year. Last parting thoughts from you, Jakub?
Jakub: Yeah, yeah. As I said, if you have any questions or any thoughts, comments, feel free to leave them under the video, or you can join the Two and a Half Gamers Slack. That’s also open for all the people to kind of share their knowledge and talk with others.
Jakub: Yeah, I would—as I said, parting line is: Go and learn it. Don’t wait for it until it will kind of, you know, it’s too late to kind of catch up.
Roman: Well said, well said. We’ll end on this point. Like and subscribe. I’m sure you liked this episode. And thanks a lot, Jakub.
Jakub: Yeah, no worries. See you there. Cheers.
Roman: Bye-bye.
Marketing Content Manager
Tara Meyer