Tara Meyer
February 17, 2026
ComfyUI Workflows: An Open-Source AI Tooling Guide for Scaling Creative Production
Mobile game studios are quietly going through a transformation, and it started in China. Chinese teams are using open-source AI tools to scale user acquisition (UA) tenfold without adding headcount. While others produce test creatives by the month, they batch out hundreds of creatives per week.
The contrast across global markets is stark: Chinese developers keep refining open-source AI workflows, while Western studios are still debating which third-party subscription service to buy.
Jakub of the Two & a Half Gamers podcast has spent over a decade in mobile gaming, specializing in system design, monetization, and scaling user acquisition for studios worldwide. For the last three years, he has worked as an independent consultant, advising everyone from indie studios to major publishers on creative workflow optimization. His prediction: "By the end of 2026, there will be around 50% of all UA creatives either having AI hooks or completely done by AI."
"The people coming to me now aren't just game companies. Even clients outside gaming, like Duolingo-style apps, need this kind of expertise."
What sets Jakub apart? He deploys ComfyUI workflows for real clients in live markets every day, with tangible returns on ad budgets. Studios that invest in ComfyUI workflows and similar creative automation tools are building a competitive moat that subscription tools cannot replicate.
In this episode of Tenjin ROI 101, Jakub talks with Tenjin's Marketing Director, Roman, and shares a guide to growing UA and creative output with open-source AI tools. It is written for: UA managers buried under demand with testing backlogs piling up; owners and founders who want to scale without adding headcount costs; creative directors tired of repetitive work and burnout; and indie developers who can't afford a big team but still want high-quality creatives.
This episode of Tenjin ROI 101 is for any team eager to drive mobile app installs with practical tools.
What you'll learn
- Why open-source tools beat black-box tools
- What you need to get started
- Creative automation tools compared: ComfyUI vs. the alternatives
- How professionals use ComfyUI for image-to-video
- How automation tools make teams more effective
- Why speed and volume decide success in mobile gaming
- From creative generation to measurement: closing the loop
Why open-source tools beat black-box tools
Before getting into the technical details of ComfyUI workflows, we need to understand the fundamental difference between how the West and China approach AI tooling. Western AI tools require paid memberships billed monthly; many Chinese open-source AI tools cost essentially nothing after the initial setup.
The Western "black box" model
Representative tools: OpenAI, Anthropic, and Midjourney.
- Easy to get started, low learning curve
- Closed source, dependent on paid subscriptions
- "Prompt in, result out," with little controllability
"Western 'black box' AI tools are completely closed: you can tweak the positive and negative prompts, but deep customization is simply impossible."
Jakub notes that these top-tier AI tools perform well for everyday UGC-style video content, but fall apart the moment you need:
- Precise control over compositions with specific hooks
- Seamless integration with an existing creative production workflow
- Predictable budgets (costs that aren't billed per generation)
If you need to scale spend globally, these black-box tools eventually become the bottleneck. That's why Jakub champions open-source AI solutions, especially for creatives that need high-frequency iteration.
China's open-source AI ecosystem
China's AI strategy borrows from the game modding communities of years past.
"China is currently flooding the market with all these open-source models. Their logic is: 'Get these models into everyone's hands, and we control the ecosystem.'"
The strategy has produced a thriving community-driven ecosystem:
- Community power users iterate and improve the models constantly
- Unlimited customization, as long as you put in the effort
- No subscription fees, only hardware costs
- A finely tuned workflow competitors can't copy
Jakub draws a vivid comparison to Skyrim:
"It's just like Skyrim. Why is that old game still a masterpiece today? Because a massive modding community keeps patching it, adding content, and fixing bugs. China's AI ecosystem is on the same path."
What this means for UA
ComfyUI workflows bring that modding mindset to creative production. Teams can mix and match nodes the way a modding community would, using open-source models to quickly generate whatever they need: images, videos, or anything else.
Open-source AI generation isn't limited to images and video. As long as you have the corresponding model, you can generate anything you want: audio, 3D assets, 2D assets, 2D sprites, whatever you need.
Ultimately, your creative workflow becomes a compounding engine that grows more capable over time; a moat of proprietary know-how that competitors struggle to replicate; an asset that keeps appreciating. That's the core reason forward-looking game studios are investing now.
The growth toolkit: a ComfyUI hardware and software checklist
Jakub laid out a practical checklist for building an automated creative production line on ComfyUI (paired with models from CivitAI and similar sources).
"You need a decent computer. An NVIDIA GPU with at least 8-10GB of VRAM. Stay away from AMD for now; some experimental builds run, but they're very unstable. It has to be a CUDA-capable NVIDIA card. Once the hardware's sorted, download ComfyUI from the internet and install it; it only takes a few steps."
Hardware investment
Unlike cloud services that charge per generation, the open-source stack runs locally. Buy some hardware up front, and ongoing costs drop to nearly zero.
Minimum spec:
- GPU: NVIDIA RTX 3060 (12GB VRAM)
- RAM: 16GB
- Storage: 512GB SSD (for models and workflow files)
Recommended spec:
- GPU: NVIDIA RTX 4070 or 4080 (16GB+ VRAM)
- RAM: 32GB
- Storage: 1TB NVMe SSD
The ROI math:
- Midjourney subscription: $60/month ≈ $720/year
- Runway video generation subscription: $95/month ≈ $1,140/year
- Potential savings: $1,860/year
- Hardware payback period: 6-16 months
Depending on your hardware, you break even in roughly 6 to 16 months. After the first year, every image and video you generate is essentially free: no monthly fees, no seat licenses, no per-generation charges.
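As a sanity check, here is a minimal sketch of that payback math in Python. The hardware prices are illustrative assumptions, not figures from the episode:

```python
# Back-of-the-envelope payback period for a local ComfyUI rig.
# Subscription figures come from the list above; the hardware
# prices below are rough assumptions for prebuilt desktops.
SUBSCRIPTIONS_PER_MONTH = 60 + 95  # Midjourney + Runway = $155/month

for rig, hardware_cost in {"RTX 3060 build": 900, "RTX 4080 build": 2500}.items():
    payback_months = hardware_cost / SUBSCRIPTIONS_PER_MONTH
    print(f"{rig}: ~{payback_months:.0f} months to break even")
```

With those assumed prices, the two builds land at roughly 6 and 16 months, which is where the range above comes from.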
The software stack (all free)
- ComfyUI - the core creative workflow framework
- Stable Diffusion models - SDXL, SD 1.5, and assorted specialized models
- LoRA models - for character consistency and art-style control
- ControlNet - for precise compositional control
- AnimateDiff / video extensions - image-to-video capability for ComfyUI
- Face restoration models - for professional-grade retouching
Where to download
- CivitAI - a treasure trove of models and preset workflows
- Hugging Face - base models
- GitHub - ComfyUI itself and its extensions
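For the Hugging Face route, a minimal sketch of pulling a base model straight into ComfyUI's checkpoint folder with the `huggingface_hub` client. The repo and filename here are just one example (SDXL base); substitute whatever model the guide you follow points at:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download SDXL base into the folder ComfyUI scans for checkpoints.
# Adjust local_dir to wherever your ComfyUI install lives.
path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-xl-base-1.0",
    filename="sd_xl_base_1.0.safetensors",
    local_dir="ComfyUI/models/checkpoints",
)
print("Saved to", path)
```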
Put in the time, learn the skill
"You get out what you put in. Put in the work and you can master it. I'm not a programmer; I'm a game designer. I can do Excel sheets, math, and economy models, and I can't write a line of code, but I managed it. So it's really not that hard."
Jakub's core point: what matters isn't your technical background, but whether you're willing to commit to the process and motivated to build a creative library that's yours alone and can't be copied.
Creative automation tools compared: ComfyUI vs. the alternatives
| Feature | ComfyUI | Midjourney | Runway | Traditional production |
| --- | --- | --- | --- | --- |
| Monthly cost | $0 | $60-$120 | $95-$600 | $5,000-$15,000 |
| Setup time | 2-4 hours | 5 minutes | 5 minutes | Weeks |
| Level of control | Full | Limited | Moderate | Full |
| Character consistency | Excellent | Poor | Moderate | Excellent |
| Video generation | Yes | No | Yes | Yes |
| Iteration speed | Very fast | Fast | Moderate | Slow |
| Learning curve | Steep | Easy | Easy | Steep |
| Best for | High-volume UA teams | Quick concept design | Video polish | Premium hero assets |
The verdict: for mobile UA, is the ComfyUI workflow the way to go?
For teams producing 50 or more creative variants a week, the ComfyUI workflow is clearly the better choice. It demands some setup time up front, but pays off long term with unlimited generation capacity and fine-grained control, which is essential for building consistent brand assets.
As Jakub puts it: "The teams of the future will build their own tools, train their own models, assemble their own datasets, and push efficiency to the limit with these open-source models."
ComfyUI tutorial: image-to-video is the way
This is where ComfyUI workflows truly shine for scaled UA creative production. Professionals learn one core rule early:
"Whatever you're trying to make, the core of video generation is always getting the image right first. That's rule number one!"
Why text-to-video is a trap
The process looks intuitive: type a prompt, get a video instantly. For one-off creations, that might work. But the moment you need volume, or need to show a client multiple options, you hit a serious problem.
"A lot of people take the shortcut and go straight to text-to-video. Type a few words, the AI spits out a video. Looks great, but you have no real control over the asset, and that's the big problem. What does the character look like? What style is the background? It's not up to you. That's the fatal flaw."
When you're testing hundreds or even thousands of creatives a month, that loss of control is a disaster. You can't isolate the effective variables in A/B tests, which means you can't iterate fast enough to stay competitive.
The image-first pipeline
Phase 1: Base image generation
- Precise prompting
- Composition control with ControlNet
- Initial batch generation (20-50 variants)
Phase 2: Refinement
- Face restoration
- Hand correction (critical for UGC realism)
- Background enhancement
- Lossless upscaling
Phase 3: Animation
- Image-to-video via ComfyUI
- Character consistency maintained
- Motion parameters fine-tuned
- Duration and pacing control
Phase 4: Post-production
- Final color grading
- Copy/UI overlays
- Export optimized deliverables
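To make Phase 1 concrete: ComfyUI runs a local HTTP server, and a workflow exported via "Save (API Format)" can be queued programmatically. Below is a minimal sketch of batching seed variants; the node id "3" for the KSampler is an assumption about your particular graph, so check your own exported JSON:

```python
import copy
import json
import random
from urllib import request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # ComfyUI's default local endpoint

# Workflow exported from ComfyUI via "Save (API Format)".
with open("workflow_api.json") as f:
    base_workflow = json.load(f)

for _ in range(20):  # initial batch of 20 variants, per Phase 1
    wf = copy.deepcopy(base_workflow)
    wf["3"]["inputs"]["seed"] = random.randint(0, 2**32 - 1)  # fresh seed per variant
    payload = json.dumps({"prompt": wf}).encode()
    req = request.Request(COMFY_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)  # ComfyUI queues the job and renders in the background
```

The same pattern extends to Phases 2-4: each stage is just more nodes in the graph, so one queued prompt can carry an image from generation through detailing to the image-to-video pass.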
This is the real leverage ComfyUI workflows give creative and UA teams. Professionals quickly internalize the rule: the key to video generation is image generation.
"As long as you have the corresponding open-source model, you can generate anything you want. ComfyUI is the framework around it: you can work through audio, 3D assets, 2D assets, 2D sprites. The only limit is your imagination."
ComfyUI is therefore positioned not just as a creative generation tool, but as the foundational infrastructure for creative workflow software.
You control how characters look, how environments render, and how brand elements appear from frame to frame. For teams A/B testing and iterating on hundreds of creatives a month, that fine-grained control is a hard requirement.
Consistency is the core of brand building. If you can't keep your creatives stable and consistent, you can't isolate test variables, let alone iterate fast enough to stay ahead of the competition.
How automation tools make teams more effective
The output gains are obvious, but automation's deeper impact on the creative team itself is often underestimated.
Goodbye to creative burnout and mechanical grind
Under the traditional model, high-volume creative production wears teams down. Endless micro-adjustments and repetitive revisions sap morale and breed creative exhaustion, even burnout.
Testing large numbers of variants also means enormous amounts of time spent on data analysis, which makes overtime the norm. That pressure drags creative quality down, creating a vicious cycle. With the right tools and processes, all of these downsides are avoidable.
Creative automation removes the repetitive labor, freeing creators from the assembly-line role and returning them to the core work of strategy and creative execution. It shifts high-volume generation and testing from people onto technology: humans do the creative work they're good at, machines do the repetition they're good at.
As Jakub puts it: "The teams of the future will all be building their own tools, their own data models and datasets, and driving them through these open-source AI models."
He predicts that tomorrow's UA teams will no longer be pixel-focused craftspeople but tool-builders, creating content systems that are more engaging, higher-value, and sustainable.
New pipelines build competitive moats
The real competitive advantage comes from building a custom creative production pipeline that competitors can't buy. When a studio invests the time to train LoRA models on its own character designs, develop models tuned to its brand style, and curate its best-performing creative library, something fundamental changes.
The open-source AI workflow stops being just another item in the tool stack and becomes real intellectual property (IP).
A proprietary workflow like this enforces brand-quality standards that generic tools can't replicate, bakes the team's experience directly into the underlying framework, and compounds in value: with every generation cycle, the system gets better at matching your aesthetics and your metrics.
Unlike subscription services that stop working the moment you stop paying, these custom pipelines are assets that grow stronger with use and are extremely hard to reverse-engineer. That's why the smartest mobile teams are tooling up for the long game.
Speed and volume decide success in mobile gaming
The strongest evidence for this shift is the story of King Shot. Launched in February 2025, the game quickly climbed to roughly $1.5-2 million in daily revenue, a growth rate that would have been nearly impossible two years ago.
"King Shot is the biggest hit of 2025. It launched around February, and currently it's doing about one and a half to nearly two million dollars a day."
What makes King Shot especially instructive isn't just the revenue. Its UA strategy runs an evolved bait-and-switch: the ads show easy, approachable puzzle-style gameplay (borrowed from the Steam hit Thronefall) to attract a broad audience, and once players install, the game transitions them seamlessly into a deeper 4X strategy experience.
This isn't deceptive advertising in the traditional sense. It's a carefully engineered funnel: a low barrier to entry widens the top of the acquisition funnel, while the seamless hand-off to real gameplay keeps retention high.
"It's all built on this evolved bait-and-switch to pull users in. Fake ads, fake onboarding, real gameplay, 4X-style. It widens the audience enormously, because the format is so approachable."
The elegance is in the hand-off: users are drawn in by a compelling puzzle mechanic in the ad, experience that same mechanic in the initial onboarding, and gradually discover the more complex 4X game underneath. This tight "the ad is the gameplay" match preserves player trust, while the low barrier captures audiences who would never have considered a 4X strategy game.
Here is the key insight, and it explains why ComfyUI workflow automation is indispensable: this strategy only works at massive creative volume.
King Shot didn't run five or ten ad creatives. The team tested hundreds of variants simultaneously, each targeting a different audience segment, creative hook, and copy angle, and iterated on winners daily rather than weekly or monthly.
This volume-driven playbook is spreading fast across genres. Social casino games are adopting similar strategies, even puzzle games are using it, and traditional RPGs and strategy games are exploring how to widen the top of the funnel without compromising their core gameplay identity.
What this means: automated creative production is no longer a nice-to-have; it's the price of entry for UA competition in 2026. Studios that can generate, test, and iterate on hundreds of creative variants per week hold an enormous advantage. When a competitor tests 50 creatives in the time it takes you to make 5, they're not just moving faster; they're learning audience preferences, validating hooks, and optimizing funnel conversion at an exponential rate. That gap is devastating.
From creative generation to measurement: closing the loop
Jakub's work with open-source AI tools like ComfyUI is more than a technical roadmap for transforming mobile creative teams, because even the ability to generate hundreds of creative variants means nothing without accurate attribution to measure their performance.
Leading studios are integrating their AI workflows directly with mobile attribution platforms like Tenjin, monitoring the metrics that matter:
- Creative-level ROAS attribution via creative ID tags embedded in file names
- Click-to-install conversion rates analyzed with creative-level data
- Cohort analysis to optimize creative performance on high-volume channels like Meta
- LTV comparisons between users acquired by AI-generated and traditional creatives
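A hypothetical sketch of the first bullet, in Python: encode a creative ID into every exported filename, then roll up spend and revenue per creative from a report export. The CSV column names are illustrative, not a real Tenjin schema:

```python
import csv
from collections import defaultdict

def creative_filename(game: str, hook: str, variant: int) -> str:
    """Embed a parseable creative ID in the exported file name."""
    return f"{game}_{hook}_v{variant:03d}.mp4"  # e.g. kingshot_puzzle_v042.mp4

# Aggregate ROAS per creative ID from a (hypothetical) report export.
totals = defaultdict(lambda: {"spend": 0.0, "revenue": 0.0})
with open("creative_report.csv") as f:
    for row in csv.DictReader(f):
        creative_id = row["creative_name"].rsplit(".", 1)[0]
        totals[creative_id]["spend"] += float(row["spend"])
        totals[creative_id]["revenue"] += float(row["revenue"])

for cid, t in sorted(totals.items(), key=lambda kv: -kv[1]["revenue"]):
    roas = t["revenue"] / t["spend"] if t["spend"] else 0.0
    print(f"{cid}: ROAS {roas:.2f}")
```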
This data tells you exactly which model combinations and which strategies pay off.
Accurate attribution is a prerequisite for AI-driven growth. Your investment in open-source AI infrastructure only converts into business value when paired with an MMP that closes the loop from creative production to performance validation.
Watch the full episode
In this video, we cover:
• 🇨🇳 The difference between Western and Chinese AI adoption and open-source models.
• 🖥️ The hardware and software you need (GPU requirements & ComfyUI).
• 🎨 A live breakdown of image generation workflows, including “Detailers” and specific rendering techniques.
Leveraging Open-Source AI for Mobile Game User Acquisition
Roman: Hi everyone, welcome to another episode of ROI 101. I’m Roman from Tenjin, and today I’m joined by Jakub from Two and a Half Gamers. Hi, Jakub!
Jakub: Hi, hello there. Nice, thanks for having me. I’m Jakub from Two and a Half Gamers for those who don’t know.
Roman: What do you do there, Jakub? A super quick intro for people who might not know who you are.
Jakub: So currently I’m like 10 years plus in the game industry, mainly mobile game industry. Lately, I’ve been working for the last three years, I guess, as an independent consultant, pretty much. But of course, yeah, we run the Two and a Half Gamers podcast with Felix and Matteo, which will be four years next month, so quite some time, I guess.
Roman: Feels a lot longer, dude. It feels a lot longer. I’m not sure how you feel.
Jakub: Yeah, yeah, yeah. That’s the grind there. But yeah, I work with multiple gaming studios around the world, or even non-gaming people these days, because based on the whole Duolingo—you know, apps pretty much taking over the App Store—they’re looking for our know-how. And it’s a perfect match a lot of times where, you know, they need progressions, monetization, and all these other things like system design, basically.
Roman: Yeah, yeah. We’ve seen the same with apps—like a huge amount. But anyway, we met at Modictum with Jakub, and we decided that we want to talk about AI. Of course, it’s still 2025, so we have to talk about AI.
Let’s just jump in, Jakub. It’s going to be like free flow. We don’t have an agenda. We’ll just see what Jakub has to show, and I’ll ask plenty of questions.
Jakub: Yeah, yeah. There’s lots of stuff, and yeah, I guess this will hopefully be as practical as possible because this won’t be one of those discussions that like, “AI will replace your job, AI will be this, AI will be that,” and so on and so forth. This will be like, what can you do now, completely free, and it’s extremely impactful. So let’s start there.
Jakub: Okay, so I guess, yeah, for those listening, best case scenario, you can probably watch this on YouTube or somewhere there, because we’ll be sharing the screen, and I guess it’ll be from now on some kind of a workflow.
So yeah, before we get to this nice image, which we’ll get to in a second, let’s first look at some of the actual stuff that’s currently completely taking over the market, which is basically AI creatives.
AI creatives are actually the most impactful, let’s say, surface-level view of AI that we see in the market. It’s one of the most important things in the current environment because UA is more important than product this year and next year even more, and so on and so forth.
It was not like this a few years before, but now it is. And if you want to give the best example, just look at King Shot. King Shot is the biggest game of this year. It was launched somewhere like February, and currently it’s doing something like one and a half, nearly two million a day.
And it’s all based on this kind of bait-and-switch fake ads, fake onboarding, real gameplay, 4X-style thing, where it was actually taken from Thronefall, which was the game on Steam.
(There we go.) That was pretty much very good but, again, very approachable.
But what happens is basically they widen the funnel so much because it’s so approachable. Users get to see these fake ads. Then when they go into the game, they see the gameplay which is the same as the one in the ads, which means like the fake ads, fake onboarding kinda equalizes itself. Therefore, nothing’s fake anymore, and it’s exactly the thing that you’ve seen in the ads. But slowly, the game unfolds you into 4X or some other high-LTV engine that we see.
It’s proliferating also to other genres, like Social Casino. Like, just wait when we release the next episode on the channel. You’ll see how this bait-and-switch also works there.
And all of this is, again, possible because creatives and marketing is the key in this whole setup. And AI creatives—I’m not saying you can’t do this without the AI creatives—but it’s enabling it in a very, very big way that, again, it gives you volume because you need volume for this.
And AI creatives these days are extremely prevalent. And we think that our prediction is basically that by the end of 2026, there will be around 50% of all UA creatives either having AI hooks or completely done by AI. Like, here you have an example. The one that I showed before, it was actually like a hook, and there was the creative real gameplay and so on. This is the fully generated one where you would have stuff like—you see here, completely generated in an image and video editor, and you just run it as your creative, and that’s it, basically.
So, again, we won’t talk about “AI takes your job, AI does this.” We’re literally talking about what’s currently trending in the market now and how to get this. So if your creative team is not using AI, you’re already behind. That’s basically the state of it.
So how do we actually get to this? And how are these things done? And like, a little bit more nitty-gritty stuff of generation?
Because, as I said, I won’t talk about any other use cases about AI these days, because in my opinion, mastering the UA pipeline, and mastering this in addition to boost your volume, is the key.
Of course, there are stuff like—let’s say, you know, it’s just an example here. Here’s an example from YouTube that I found where, again, you can use the ComfyUI thing, which I’m using today, and generate 3D assets through it. Again, open-source AI generation is not confined only to images and video. You can generate whatever you want, basically, in any modality, as long as you have the open-source model for it. The ComfyUI thing that I will be showing is just like the, let’s say, the frame for it. But you can do from audio, 3D assets, 2D assets, 2D sprites—like, you can generate whatever you want, basically, and completely for free, as I said, as long as your graphics card is able to handle it.
So that’s there. So don’t just think, “Oh yeah, this is just images and videos, and it won’t help us through.” We can do pretty much everything, because how I think the teams of the future will be going is that they will all be making this custom. Because that’s the biggest difference between the Western approach of like blackbox AI tools, which are, again, completely closed—as for you can only do, I don’t know, positive prompt, negative prompt, then like some very small customization to it—whereas if we go actually to what we can do today…Yeah, it’s kind of very heavy what you can do and what you can actually create and check and stuff like that.
It gives you completely free hands, uncomparable. And as I said, what I’m saying is that the teams of the future will be building their own tools and own data models and old datasets that they will be then pretty much using through these open-source AI models. Because that’s the attitude, or let’s say that’s the way that China handles it.
Like, China is currently flooding the market with all these open-source models because it’s their kind of political policy of, “We’ll get these models in the people’s hands. Therefore, we control the ecosystem.” Instead of the Western approach, which is like, “We have these giant OpenAI companies that are doing like the best of everything,” but again, it’s not that supportive as in China.
In China, the community is also driving these models because they’re adding all of these additions and stuff. Imagine it basically like Skyrim. Skyrim is played to this day and is one of the best RPGs in the world. Why? Because it has a giant modding community that revives it, patches it, improves it, so on and so forth. So that’s their approach, basically.
Roman: …Your first creative when we started. It had the Chinese characters, and I already—because I also follow the channel—I know that you have some folks from China, and they’re like sharing some crazy stuff.
And that leads me to my first question: Do you feel like they’re further ahead than, like, everyone else with this AI adoption? And like, clearly you’re saying yes, right?
Jakub: I would say so. Not only are their models—again, they’re open source, so you can go customize and use them for yourself—but the approach and pipeline is, again, different in China.
Because, again, this is the big difference between the West and the East: user acquisition is the most important job in the mobile game industry in China. In the West, it’s not.
In the West, it’s a product, usually. Product—as for either design or, you know, live ops, PM, monetization stuff like that. That’s the most important part, the core of it. User acquisition for them [China] is, again, as I said, the most important part, because also the product is so up to par across the whole industry there. So their product is great to begin with. But yeah, that’s another discussion for some different time.
Roman: But can the folks from the West adopt this kind of—like, the models are open source, as you said?
Jakub: Yeah. Again, they can. Like, you know, we have AIs all over the place, so there’s basically no language barrier if you know how to use them. It’s just artificial. It’s like, you know, effort-based. Like, you need to put in some effort, and then you have it.
But other than that, like, yeah, it’s quite easy. Like, I can do it. I’m not a programmer. Like, I’m a game designer. I can do Excel sheets like maths and economy, but I can’t code, and I was able to do all these things. So it’s not that hard. Yeah, everybody can do that.
And it’s, again, just people in the West kind of sleeping on themselves, whereas they should be doing these things all over the place. But yeah, we’ll get to it.
So, as I said, how to do these creatives and how to pretty much even get to some of these things. Because, again, you can still do this pretty easily, through like Nano Banana or ChatGPT, or any other image generator in the West. You can still do great. Like, don’t get me wrong.
This is more hardcore and, let’s say, more customizable stuff because of what you can do and what you can create. You can, for instance, create your own LoRA. We’ll get to it—what that means. But basically, what it means is that you create your own dataset from your art, your custom art, your whatever you want to do, and you add it onto a model. Therefore, the model suddenly spits out like an art that would be coming from your artist, which isn’t really the thing that you can do with GPT or these other tools.
Because currently, as I’m seeing it, for instance, every big company—and I mean like companies like, I don’t know, Blizzard, CD Projekt Red, and all these other guys—they’re probably already creating their own models, which are completely fed only on their own data, meaning that they’re, again, creating the armies of these artists that they’ll suddenly be able to do and use, which is completely legally okay. That’s because there’s no copyright so far, and they’re just using the model, not the training material. But yeah, that’s again one of these things.
So how does it look, and what’s there? So this is ComfyUI. Let’s start maybe from a little bit easier workflow until we get to the hard stuff. Again, it’s quite easy. It’s visual prompting once you get into it. So you just download the thing from Hugging Face. Hugging Face is the big programmer repo with all the databases and models and everything. It’s all open source on the internet.
And the important part—like, you’re looking at this like, “Oh, this is so—like, how did you create?” No, you don’t. You don’t need to. It’s very easy because all of these things that you see here, for instance, these workflows that I have here, you just take from someone else.
Like, if you’re hardcore, you can literally go and like, “Okay, add a node and like edge spaghetti here and do this visual coding thing,” that, you know, goes from here, from here, from here. You can do it yourself, but I don’t. Because, for instance, this one that I have here—the big one—yeah, no chance for me.
But again, what you do: You go on the internet, you read the guide, and on the guide you have like this whole thing, pretty much. And again, somebody did it for you. So don’t get—maybe let’s get rid of this so it’s a little bit more easy on the eyes. Don’t get scared and don’t think, “Oh, this is just horrible.” As I said, I went through these. I didn’t know shit about all of this, and pretty much by trial and error, you can figure it out quite fast. It’s not that hard.
And my number one advice when working with these tools: Whatever errors or stuff that you have there, just throw it into ChatGPT, and it will just tell you in layman’s terms like, “You need to do this, you need to do that, you need to do this.” And it’s great because, again, we need to realize that suddenly we have this AI that’s literally right there sitting in the corner for us, which we can ask anything, and it will do anything for us.
So all of these things—like, “I don’t understand this, I don’t understand that”—doesn’t matter, because again, you slap it into AI, it will tell you. And especially programming code. Immediately, it’ll fix errors and do stuff for you. So it’s, again, an effort-based barrier, no other barrier.
So if we go into the basics…
Roman: So maybe we can clarify, maybe for the small one. This is what was used to generate one of those creatives that you’re showing at the bottom [of the screen]?
Jakub: Yeah, yeah. So let’s say this one. So how do you use this? How do you generate those?
So, for instance, this one—this was an image, and you run the image through a video generator which then animates it, and then you stitch it into a movie, or like a creative, basically. Because all of these kinds of cuts, that means that it’s another image and another generation, usually. So in order to do these—for instance, this one already requires a little bit more advanced workflow because one thing that we have here is a consistent character, which is like, yeah, it’s not something that you see every day.
So, again, for this you use ComfyUI, where you have workflows for consistent character. Literally, create a character, and from that point on, you kind of save it like, “This is my character.” And then all the generations can go through that character. Therefore, you end up with something like this, where I said like, “Okay, let my character sit in the evening in the office,” and there it goes.
And the video generator is just kind of a cherry on top. It’s not that hard. The important part of, let’s say, creative video generation is actually the image itself. That’s because the workflow that you always go to is image-to-video, not text-to-video.
Lots of times, people just go text-to-video. Like, you go to an image generator, and you do something and just input some text, and it just generates something, which is great, but you don’t have control. That’s the big problem. You don’t have control of how it looks, how the characters look, how the environment—how anything looks.
So again, the key to video generation, anything, is image generation. That’s the number one rule that you learn with these things.
Therefore, if you want to have great creatives, you first need to master the image generation. Once you master the image generation, then always the first frame starts with your image, and from that image you go and create the creative, and you can do pretty much whatever you want.
So how do we get to image generation? So, as I said, you install stuff like ComfyUI. You can do Nano Banana or whatever—anything is good. But this is just a much better way of having controllability. So let’s just go over this very simple workflow and how it works and what we have here.
So this is the Z Image Turbo, which is the latest model from Alibaba that is literally taking over the internet in the last month. For those who don’t know, it’s unheard of because this is a very small model—literally like 6.1 billion parameters—and it’s outstandingly good. But yeah, I’ll just go very fast through it.
So here, for instance, we have the base model which is quantized. Quantized means that for some of these models, we don’t really have the top-of-the-line graphics cards, so the community, again, creates lower versions of these models to cut down on the VRAM requirement but also a little bit on the quality. So that means that I can run this on my 3080 Ti graphics card, which has 12GB of VRAM, even though the base version of this model requires 16.
So you literally go on the internet, and again, in the guide itself—I have here, for instance—you can get and find. So you have these repositories. For instance, the quantized version of the model—you go all the way into small ones, which is like 2 gigs or whatever, and you can run this even on 6 gigs VRAM card.
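For reference, a quick way to check the VRAM figure Jakub is working from, using PyTorch (which any standard ComfyUI install already depends on); this is just a sketch, not something shown in the episode:

```python
import torch  # ships with any standard ComfyUI install

# Print the GPU name and total VRAM so you know which quantized
# variant of a model your card can realistically load.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; ComfyUI would fall back to CPU (very slow).")
```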
Roman: So the first step is actually to buy a good computer. Is that what it is? Haha.
Jakub: Haha, yes, you need a good computer. So we need at least something like, I would say, 8 to 10GB of VRAM, NVIDIA GPU. This stuff won’t work on AMD. Maybe in some experimental form it will, but you need a CUDA core GPU. That’s the first step. Once you have this, you need ComfyUI. Again, you can get it on the internet, very easy. It’s just one repository from Hugging Face.
Also, I recommend installing ComfyUI Manager, which is just the UI add-ons stuff, pretty much a utility that, again, you don’t need to go on the website, download manually. You can just click on it, and it will download it from GitHub immediately.
And once you have this, again, you just drag stuff. You can literally go here and drag an image here, and the image and its metadata will then create the workflow if it’s embedded in it. So that’s the beauty of it. Like, you don’t really need to create all this spaghetti visual coding stuff. It will just have the—for instance, this one is an example workflow on the site. It was just like, throw in an image, here we go.
So again, what we have here and what are some of the things that you can control here and what gives you the things. So here we have the base model, as I said—the text encoder and the model itself. It’s quantized, so it’s lower quality, lower VRAM, so we can actually run it. Then we have stuff like “shift.” This is specific for the model. It’s more of like a contrast slider. So less shift means more contrast. More shifts means less contrast. That’s there.
Then we have the positive prompt. Yeah, I’ll get to it—how I got it. And the negative prompt. If I understand correctly, this one doesn’t really work with negative prompts that much. It’s, again, some image generators don’t even have that. Like Flux, for instance—they don’t have a negative prompt. Then we have the image size, which is just a square of 1024 by 1024 pixels. We could pump it up to 2K, even higher. The problem is that it will just load longer, and we don’t need it for the sake of this video. So that’s there.
Roman: Jakub, quick question. Is it also effort-based, as you said at the beginning, in order to understand everything you actually—
Jakub: Yeah. As I said, no programming skills on my side, no computer science, no nothing. My background is psychology. Like, you don’t need anything. You can get these things still. As I said, for instance, we can link literally the how-to guide tutorial in the video. There’s like a 40-minute tutorial, but most of the stuff—it’s not even a tutorial. It’s just the guy going over the comparison between these models—Z Image, Flux, and Qwen—it’s more of a comparison.
Jakub: So really, where he goes through file manager and just tells you how to install it—this takes like 10 minutes, honestly. It’s not like it will do this and that and it will be super hard. No, it won’t. It will be just like four or five clicks. Again, you have ChatGPT sitting right next to you that if you don’t understand, you just tell it, “I don’t understand this. What should I do?” It will tell you. It’s that easy.
Like, for instance, I didn’t understand which quantized model I should pick for my graphics card. And yeah, so this is what it told me. So I just literally pasted the repository from the thing, and it told me like, “Okay, so you go here, and these are the models. So if you have 10 to 12GB VRAM, pick this one because this one will probably be enough for your memory.” That’s it. And you do all these steps like this. It’s super easy. So nothing really to it.
So once we have all these fixed, let’s just finish the last step. So steps are very important. This is the setting that tells you how many actual passes of generation it goes through, because all these images—usually with the diffusion models—start from noise. So imagine just a black-and-white grainy picture; all these pictures start like that. And this will be how many steps the noise will be run through.
Then we have CFG value, which is how much prompt adherence compared to creativity we let the model do. Meaning, how much more creative we let it be compared to how it must be exactly as we prompted. Again, a value that you can play with. And then some base stuff that you don’t really need there.
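For readers following along, the sliders described here map onto a single sampler node in ComfyUI's exported API-format JSON. A sketch as a Python dict; the node ids on the connection links ("4", "5", ...) are placeholders for whatever your own graph uses:

```python
# One KSampler node as it appears in an API-format workflow export.
ksampler = {
    "class_type": "KSampler",
    "inputs": {
        "seed": 123456789,        # fix it to reproduce an image, randomize for variants
        "steps": 8,               # how many denoising steps the noise runs through
        "cfg": 1.1,               # prompt adherence vs. creative freedom
        "sampler_name": "euler",
        "scheduler": "simple",
        "denoise": 1.0,
        "model": ["4", 0],        # link to the checkpoint loader node
        "positive": ["6", 0],     # positive prompt encoder
        "negative": ["7", 0],     # negative prompt encoder
        "latent_image": ["5", 0], # the empty latent that sets image size
    },
}
```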
So if we go here and run it, we have this kind of a demon guy, which is like hyper-realistic—a line drawing of a furious forest spirit. Da-da-da-da-da. Let’s run it.
Jakub: We have the same seed. Yeah, we need to change the seed to random because we don’t want to have the same seed each time.
Roman: Prompt. Mhm, I see a lot of text in there. Yeah. How do we get this?
Jakub: Yeah, exactly. Yeah, let me just generate the thing so you see it. Went through the prompt, now it’s in the K sampler, and then from K sampler, it goes to decode, and then there we have our image. So we have, instead of this guy, we have this guy. That’s quite easy.
How do we get this giant prompt? So prompting is kind of another way of learning these things. So, for instance, this prompt I got from CivitAI. CivitAI, again, is one of those things that I would recommend you go check it out. It’s pretty much the biggest open-source community website on the internet. Think of it literally as an Instagram. So it’s just basically images and videos of other creators that people vote on and then can check and do stuff.
The very important part about this site is that you can go there and learn and get stuff for yourself. So, for instance, our forest spirit is just—I was just browsing here. For instance, everyone, today’s images—what’s that? Generated. You have some very interesting stuff that you can get here.
By the way, spoiler alert: I’m using the CivitAI Green site because there’s also the Civitai.com site, which is like 90% porn, because that’s what people generate with user-generated content. So just saying, if you want the one without it, it’s the Green one. If you want the one with it, it’s the base one.
Roman: Thanks for picking the right one for this recording. I appreciate it.
Jakub: No worries. So again, I just found the image from a creator, and the key part here is not the image itself, but again, this thing on the right, which we can zoom on a little bit.
Jakub: So what we have here is that it tells us actually how this was created. And we can even run it on the site itself and generate it there if you really want. The site allows it if you buy literally through credits. But again, why should we do it if we have it open source?
So what this tells us: It’s using the Z Image Turbo generator. So I can literally just go here, click here, and then I have the model. It was released November 26th, and I can download it or create with it or basically get stuff from it. You also have some kind of current generations and what people are doing there and stuff like that. But again, we already know the model.
Then we have the prompt. So we have the prompt. We can take the prompt, and you can play with it and use it. Prompts have very specific setups. Again, we would probably need a different podcast for it. But again, you don’t need to create this stuff yourself from scratch. You can learn from other people. This is why this site is so important.
It comes into the formula because you can create amazing stuff just by copying other people’s work and reverse-engineering it and seeing how it works. And therefore, you learn, and you learn very, very fast.
Then we also have some other important things, which is the metadata—basically how the guy specified his sliders in ComfyUI. So we see, as we talked, CFG scales a little bit more to the adherence, so it’s 1.1 only, eight steps. The sampler—we can even take the same seed and generate the same exact image if we want. That’s also possible because he left the seed here.
Some people don’t share their generation metadata because they’re very—you know, want to stay confidential and stuff like that because some people work very hard on their workflow. But most of the stuff that you see here, you can do, and you can just take and learn from it. This is the beauty of the site—that you learn so much.
Jakub: So again, this was pretty easy to do, and we can do whatever we want, actually. Just for the sake of it—so if we go here, we can leave our fire guy and—
Jakub: Roman, tell me, what do you want to generate?
Roman: Let’s do something Christmas-related. Christmas. Zombie.
Jakub: Zombie.
Roman: Like, do you remember Plants vs. Zombies?
Jakub: Okay, this is what immediately sparked for me. By the way, good that you’re saying it. The beauty of these models—uhhh Christmas postcard…
Yeah, let’s try this one. The beauty of these models—good that you mentioned—is that they’re completely uncensored, which is, again, the big advantage of it. Because if you go into ChatGPT or, again, one of these kinds of main models, you can’t generate IP-based stuff. For instance, my son asked me, like, “Oh, can you—can I have, like, Olaf or whoever from Frozen?” Or like, no, you can’t, because these models have other AIs that are censoring the output of them so that you can’t do it. It’s impossible here.
Roman: Quick!
Jakub: Yeah, it’s very quick. Again, as I said, I’m using a lower-quality one, so this would be a little bit different than the usual quality; you can pump it up, and there are still better models. This is the Turbo one, so speed is more important than quality itself. But again, whatever you do here, you see, you can still get amazing quality.
But again, if I would go and, as I said, Elsa and Anna from Frozen standing in front of a giant frozen castle, cinematic, high quality, realistic—let’s try. Yeah, the more of these tags and words you add to it, the better the image will be, of course. That goes without saying. As I said, I would recommend for anyone to learn just the process.
Oh, there we go! See?
Roman: Oh, that’s literally—well, yeah, like 95%.
Jakub: It’s like if we would fine-tune it a little bit more with details and—you see, the ice maybe needs a little bit of stuff like that here and there. And yeah, we can get to it very easily.
Roman: Your legal department is not going to be happy about—
Jakub: Yeah, yeah, yeah. But again, you can do whatever you want. That’s the beauty of it. So it gives you—and it’s completely free. You know, just take electricity and your GPU, nothing really to it.
But again, I would recommend for anyone just to kind of touch this, run through it, and just learn it. Because, again, you can apply this same process—how this works—to any modality, to like, as I said, text-to-video, image-to-video, 2D art, 3D art, voice, you know, whatever. It works the same. And I think it’s important for people to understand what’s under the hood and how much control they can actually have. Because it’s amazing.
And we’ll probably end up with this last thing, which is my signature stuff that I was working on. And yeah, this gives you much, much, much more control. This is a very advanced workflow that—not this one, sorry, this one. There we go. Let it run because this one is actually 240 steps.
Roman: What does it do? I didn’t understand. What does it do?
Jakub: Yeah, yeah. So what we have here is that we are actually using an Illustrious anime model, and we are using the model only for 140 steps. And what we’re trying to achieve—we’re trying to generate a snow leopard anthropomorphic warrior in a realistic style for our game. There’s a pretty big prompt here, pretty big negative prompt also.
It took some time for me to do this. But we want this to be realistic, and the anime model that I’m using here is not able to do realistic stuff. So what’s happening here?
So what you do: You use a refiner. So what it does—after 140 steps, this model stops, and I actually plug in a different model.
So we’re now doing a two-model generation through Fennekin, which is a realistic model, which finishes the generation—the denoising of the noise from the image—for another 100 steps. So it goes all the way to 240. That’s why it’s taking like 3 minutes. And then it basically creates something that each of these models couldn’t create on their own. Because we want, again, a fantasy-style snow leopard warrior guy that—again, I was not satisfied with anything I found on the internet, so I just dug deeper and deeper and deeper and got into it.
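The two-model hand-off Jakub describes corresponds to ComfyUI's KSamplerAdvanced node, which can stop denoising partway and pass the leftover noise to a second model. A sketch in API format; the model and node names here are placeholders:

```python
# First pass: the anime model denoises steps 0-140 and keeps leftover noise.
base_pass = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["anime_checkpoint", 0],
        "add_noise": "enable", "noise_seed": 42,
        "steps": 240, "cfg": 7.0,
        "sampler_name": "euler", "scheduler": "normal",
        "start_at_step": 0, "end_at_step": 140,
        "return_with_leftover_noise": "enable",  # hand off a partially denoised latent
        "positive": ["pos", 0], "negative": ["neg", 0],
        "latent_image": ["empty_latent", 0],
    },
}

# Refiner pass: the realistic model finishes steps 140-240 on that latent.
refiner_pass = {
    "class_type": "KSamplerAdvanced",
    "inputs": {
        "model": ["realistic_checkpoint", 0],
        "add_noise": "disable",  # continue from the first pass instead of re-noising
        "noise_seed": 42,
        "steps": 240, "cfg": 7.0,
        "sampler_name": "euler", "scheduler": "normal",
        "start_at_step": 140, "end_at_step": 240,
        "return_with_leftover_noise": "disable",
        "positive": ["pos", 0], "negative": ["neg", 0],
        "latent_image": ["base_pass_node_id", 0],  # latent output of the first pass
    },
}
```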
The very important thing is that this model and the workflow that we have here—and by the way, everything that you see here, we’re not using even half of it. Here, we have basically the possibilities of this workflow, and you can just plug them out like functions. You know, just click here and enable it or not. All the violet stuff that you see means that that’s inactive. We’re not using an OpenPose, IP adapter, or ControlNet upscaler, all these other things. It can do so many things that, again, would take a different podcast to do.
But what it can do is still—we’re using the after-generation corrections, like Detailers. This is the really important part. Because in the image that we generated here, for instance, you see that, yeah, they’re great, but there’s something strange about these two. It’s not just, you know, the position of their eyes and everything—it’s like they look like they’re from Wish.
So what happens here is we can look at it in real time, actually, as the workflow is continuing. And okay, it’s already on ADetailer. So we have the base image here, and you see, it’s not perfect. It’s like the face is kind of distorted. Yeah, we don’t really want this. So what’s happening? We have a face detailer, and the face detailer actually fixes only the face. So we are putting another generation on the image that we already have here. And not only that, we’re also fixing the eyes to make them a little bit better.
Roman: Oh, yeah, yeah, yeah. I see, I see.
Jakub: Basically. And you can do—again, there are like four passes we can do, both hand and body kind of setup. You see how the body is kind of, again, fixed a little bit.
So last time I was checking some stuff on the internet, a professional from an AI agency that was sharing his workflow said that it takes him something like 20 hours on an image and 500 generations to get it where he wants it to be—like top quality. So just to give you an example, from the really, really basic stuff, like “Let me generate Olaf for my son,” to very, very advanced stuff like this is how it works. Because, again, this is something that needs to be kind of perfect, because it, again, defines what you want to do.
Jakub: And if we go, again, somewhere here—not this one, but the one creative that I got really, really—not this one. Yeah, there we go. So you see how beautiful these creatives are? Literally like a Pixar movie. And again, you get to this quality by being able to use advanced workflow. And what you end up with are these perfect creatives afterwards.
So, again, that’s the beauty of it. Because this looks literally like a high-level cinematic. It’s like something that somebody would take, again, lots and lots and lots of work and time to kind of get and generate—I mean, draw. But then, again, you can just generate it through pretty much an advanced workflow timeline. And yeah, it would go, and you need consistent characters and all these other things.
But as I said, it’s like step one to getting all these things. So for any creative team that is making creatives, yeah, I think AI—the image generation and video generation mastery—is like an existential problem next year, basically. Because you just won’t be able to keep up with the volume. It’s just very, very hard. And having enough creative volume means that your CPI stays as low as it should be instead of creeping higher. So eventually, all of these things translate basically into that.
And as I said, even though this looks super complex and stuff, it’s not. Like, I’m not even scratching the surface of it—how complex it can get. It’s just some basic stuff that I’m showing, pretty much, not really to it.
Roman: Yeah, I really like how we look at both of the schemas. At the end of the day, we generate an image.
Jakub: Yeah, yeah, yeah, yeah, right. Yeah, we have a nice warrior here. But again, you can do whatever you want in the end. And that’s, again, as I said, that’s the beauty. You can play with it and try it for yourself. There’s not some money-hungry website consuming credits or whatever. As I said, the only thing that you need is your GPU and electricity. So you can do whatever. I’m currently past something like 12K, 13K generations, probably.
So yeah, sometimes, even here, for instance, there’s a setting that—run it in batches of eight or whatever. You can just go and sleep and let it run and then pick the best one. I do that sometimes. Yeah, it’s like an idle game, so, you know, you come back and you collect your rewards basically after it.
But I think it’s very fulfilling in order to be able to know how this stuff works. Because then also what it gives you is, once you go and once you see these creatives and all these other things that are currently trending on the—most of the times, I can even spot the generation model just by looking at it. Because some models are very specific—you know, you can see the giveaway.
Like, for instance, all the ChatGPT images—they have this orange tint behind them. So if we go, I think, one of the last ones here—yeah, see, this one. So this is a creative that Candy Crush is running, and you can see in the end that there’s this kind of orange aura around the image, which means that it was based on ChatGPT as the base image. So here, it’s kind of very, very, very visible. Yeah. But again, it’s very subtle. But once you see 10,000 of these images, it’s very obvious afterwards.
So therefore, you can immediately see how these teams are working, how they are generating these things, and you can do it yourself. Again, it gives you a lot of edge over the process.
Roman: And it’s really a skill, right, for probably 2026—if you’re doing creatives, whether you’re like a UA manager and that’s part of your responsibility, or you’re part of the creative team.
And just trying to summarize at the end: The cloud-based services like ChatGPT or Claude will give you less flexibility with what you can do. Therefore, we would like to use open-source models.
Jakub: Like, honestly, you can even combine those in a way that, for instance, you can do the image in your open-source model to kind of refine it better, and then use a video generator. Video generator is quite important, but it’s like the final part of the thing where, you know, what you want to generate, which starts from the base image, which can be a combination of like, “I don’t know, use the ComfyUI—I want to generate the image,” and then like Veo 3 for finishing, you know, the video and stuff like that. Again, you can do whatever you want.
Best-case scenario, you test all of it, and you figure out which one works best for you. That’s the beauty of it. But again, by knowing these things, we even have the option to test it yourself. Otherwise, you just like, “Oh, we can just use Veo 3. That’s the only thing we know.” And that’s it.
Roman: I see.
Jakub: And the other big problem with these—these things degrade very fast. And not really degrade, but pretty much new stuff gets released all the time, and you need to keep up. I’ve seen some of the creatives here, for instance—I think, yeah, these ones, or maybe a little bit older ones—that you can see some of those are just running on old models. People just haven’t updated yet, which is, again, normal, because this was—the update cycle here is like 3 months or something. But you want to be using cutting-edge stuff because, again, it gives you an edge on quality, stuff like that, and all these other things. So yeah, just kind of moves very, very fast.
Roman: How do you keep up, Jakub? You personally? Do you have time?
Jakub: I don’t know. Okay, yeah, understandable. So I listen to a few podcasts, of course, based on AI. Literally, we can link—this one is, in my opinion, the—this kind of AI Search YouTube channel is literally a guy that just does the news. Every week, he goes, “What was released this week?” and just goes through all the models. And they have specific videos on specific stuff, like comparisons, stuff like that. So this one kind of debrief I watch very regularly.
Jakub: Then I have a few podcasts that are industry-based, like what’s the latest of ChatGPT versus Microsoft and all these other things. And then, of course, you go on the Civitai site, and you just go and see what people generate.
So, for instance, here you could clearly see that lots of these things like, okay, there’s a lot of ChatGPT actually in it. There’s a lot of—what’s this? Yeah, Google Nano Banana, pretty much is trending really high these days. And you just see what people generate and what’s pretty much there in the market. And this just tells you, “Okay, yeah, that’s basically it.” So, you know who generates what, right?
Roman: And this is so interesting that you’re actually a game design expert, and you are now fully into this creative part of the game. Full in. How does it feel, Jakub?
Jakub: Oh, it’s great, you know. I’m always kind of obsessed in—but again, the biggest problem was I didn’t understand how this thing works, which for me is the biggest itch of like, “I need to do something about it,” because I don’t feel safe, or how do we say it? I don’t feel on top of it if I don’t understand how it works.
So it kind of drives me to kind of go into this rabbit hole and learn about this. Because, again, if you don’t know something, at least know how it works. You don’t need to specifically do it, but at least know that there are these options. Because that way, you won’t get sidelined. You won’t get in a situation where somebody tells you something, you can’t call their bullshit, and you don’t know if this is the best, not the best, or are they even telling you the truth, stuff like that.
So, again, AI is just moving so fast these days in a way, and it’s one of the most important technologies of our lives currently. So why not, you know, slap two birds with one stone? Where, again, we need this because of our game industry professional expertise. And on the other hand, of course, AI will be there, and it will change stuff. That’s for sure.
But again, the edge that it gives me now is I know, for instance, gaming-wise—no, gameplay-wise, it won’t change stuff that much. It will maybe help with some optimizations, like matchmaking or bots, I don’t know what. But we still haven’t reached the point of, you know, AI-enabled games—games that won’t be working without AI capabilities. We still didn’t hit that inflection point.
It’s not something like it was, for instance, in, I don’t know, 1999 or something, where Doom 3D was released. Because the 3D-ness of it enabled you to do stuff that you couldn’t do in 2D. We haven’t reached this point yet. Again, why? Because you learn about this, and you know that, like, “Oh, this doesn’t make sense. It cannot even code properly yet, so much hallucinations.”
Roman: Everything is connected. All right, Jakub, this was super insightful. I’m sure the guys will have a lot of questions. I’ll ask everyone to leave their questions in the comments. We’ll ask Jakub to answer them once he has time.
Any parting thoughts? Or we will, of course, leave all the links in the descriptions to Two and a Half Gamers and the stuff that we mentioned during the videos. But I also want Jakub to say something, especially at the end of the year. Last parting thoughts from you, Jakub?
Jakub: Yeah, yeah. As I said, if you have any questions or any thoughts, comments, feel free to leave them under the video, or you can join the Two and a Half Gamers Slack. That’s also open for all the people to kind of share their knowledge and talk with others.
Jakub: Yeah, I would—as I said, the parting line is: Go and learn it. Don’t wait until it’s, you know, too late to catch up.
Roman: Well said, well said. We’ll end on this point. Like and subscribe. I’m sure you liked this episode. And thanks a lot, Jakub.
Jakub: Yeah, no worries. See you there. Cheers.
Roman: Bye-bye.
Content Marketing Manager
Tara Meyer