I had hoped to try out OpenAI’s text-to-video artificial intelligence tool Sora for this piece, but I wasn’t fast enough. Masses of people had the same idea, and new account creation at Sora is disabled for now.
To recap, Sora was announced in February this year, and caused gasps in the industry because the videos it generates can be rather good. They can also be completely weird when Sora gets the real-world physics wrong, but overall, the videos are worryingly good.
So I can’t show any “original” examples, but there are plenty online to look at, such as:
As cool as the new Sora is, gymnastics is still very much the Turing test for AI video.
— Deedy (@deedydas) December 10, 2024
Apropos the physics issues mentioned above.
Sora isn’t the only GenAI video creator in town, and some of the clips you’ll see will have been generated by other tools, because inauthenticity and engagement farming are the name of the Internet game these days. You should be able to check the provenance of Sora files through the Coalition for Content Provenance and Authenticity (C2PA) metadata that is meant to be embedded in them: https://contentcredentials.org/verify
ChatGPT Plus subscribers who pay US$20 a month can generate up to 50 priority videos at a maximum of 720p resolution and just five seconds in length; rich folks who can stump up US$200 a month for a ChatGPT Pro sub get unlimited Sora generation (five clips simultaneously), with 500 priority videos at 1080p resolution and 20 seconds in length, with or without C2PA watermarks.
You can blend two scenes with Sora, and tweak the output as well.
The question is, will OpenAI be able to recoup all the investor money it sinks into tools like Sora, which use thousands of super-expensive high-end graphics cards for training and consume silly amounts of power? The US$200/month ChatGPT Pro subscription suggests cost recovery time is starting for OpenAI.
There could be a limit to how much OpenAI and others are able to charge for subscriptions. That’s because models are moving closer to users (so to speak). With a decently powerful computer that has plenty of system and video memory, it’s possible to run a chunky LLM on your own hardware, like Meta AI’s recently released Llama 3.3 with 70 billion parameters. Using Ollama, Llama 3.3 70B runs fine on my MacBook Pro with an M3 Max and 96 gigabytes of memory, and even better on an M4 Max MBP with 128 GB (as you’d expect).
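Why does a 70-billion-parameter model fit in 96 GB at all? A back-of-envelope sketch helps; note that the ~4.5 bits per weight (Ollama’s default pulls are 4-bit quantised) and the 20% runtime overhead are my rough working assumptions, not published figures:

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to hold a model's weights, padded ~20%
    for the KV cache and other runtime overhead."""
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight * overhead / 1e9

# Llama 3.3 70B, 4-bit quantised: ~47 GB, so it fits in 96 GB with room to spare
print(round(model_memory_gb(70, 4.5)))   # 47
# The same model at full 16-bit precision would need ~168 GB
print(round(model_memory_gb(70, 16)))    # 168
```

Which is the whole trick of running big models locally: quantisation shrinks the weights to roughly a quarter of their full-precision size at a modest quality cost.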
When Llama 3.3 runs alongside web browsers and a few other apps, around 79 GB of RAM is used; the graphics unit hits full usage, the laptop fans start up, and battery life? Fuhgeddaboudit: plugging in the MBPs is a must. It’s possible to customise Llama 3.3 and create your own peculiar model, and I’m still figuring out how to tune it.
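Customisation in Ollama happens through a Modelfile. As a minimal sketch — the model name, temperature value and system prompt below are placeholder choices of mine, not anything Ollama prescribes:

```
# Modelfile: derive a tweaked model from the stock Llama 3.3
FROM llama3.3:70b
# Lower temperature for less creative, more predictable output
PARAMETER temperature 0.7
# Bake in a standing instruction
SYSTEM "You are a terse assistant that answers in New Zealand English."
```

Save that as Modelfile, then ollama create my-llama -f Modelfile builds the variant and ollama run my-llama starts it.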
Llama 3.3 is free for personal use, not commercial (no idea how Meta would police that); AI Svengali Simon Willison dubbed it a “GPT-4 class model”, and he runs it on an M2 MacBook Pro with 64 GB of memory. That gets you roughly the power of the fairly recent Llama 3.1, with its 405 billion parameters, in a laptop.
“I honestly didn’t think this was possible—I assumed that anything as useful as GPT-4 would require many times more resources than are available to me on my consumer-grade laptop,” Willison said. Think about that for a moment.
GenAI video requires beefier resources than text models, however. Take Chinese tech giant Tencent’s HunyuanVideo, for example: to generate videos at 1280 by 720 pixel resolution with 129 frames, you need an Nvidia graphics card with at least 60 GB of memory (80 GB is recommended), and HunyuanVideo runs on Linux-based operating systems.
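For a sense of where that memory goes: the raw decoded frames are only a small fraction of it. A quick sketch, assuming three colour channels stored at half precision (fp16, two bytes per value):

```python
# One HunyuanVideo clip: 1280 x 720 pixels, 129 frames,
# 3 colour channels, 2 bytes (fp16) per value
frames_bytes = 1280 * 720 * 129 * 3 * 2
print(f"{frames_bytes / 2**30:.2f} GiB")  # 0.66 GiB
```

So the pixels themselves account for well under a gigabyte; the tens of gigabytes of VRAM go to the model’s weights and the diffusion process’s intermediate activations.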
Unfortunately, PB Tech is out of the $27,595 inc GST Leadtek Nvidia A100 cards with 80 GB, so again, I can’t directly show you what HunyuanVideo is capable of, but the examples published online look very impressive.
Now, a newish laptop or desktop with heaps of memory and a powerful video card won’t be cheap, and subscriptions are nice in the sense that once you’re done with a project, you can just hit cancel and stop the AI operating expenditure. It’ll be interesting to see whether models continue to improve as quickly as they have over the last couple of years, and reach a “sweet spot” of good-enough performance versus hardware requirements for local use.
You can’t write about the above without noting that generating videos with AI is ethically dubious. Just days before Sora was released out of preview, artists who had been given early access to the video generation tool were furious over the “artwashing” they had provided for free to OpenAI.
Expect more collisions like that, because the value for OpenAI and other AI companies doesn’t just lie in the output their tools create, but in the material used to train the models. That material needs to be as cheap as possible, ideally free, because if it were compensated at rates reflecting the time and effort spent creating it, well… a US$200 per month subscription would be cheap.
Somewhat related to the above, I’ve been trying out local (or edge) AI features on the devices that I have, like Windows Recall on Microsoft Copilot+ hardware. Some are useful, others are neither here nor there, and a few of them can make you seem like a psychopath when communicating with people who might not realise you’re using AI-generated responses.
More at a later stage on how local AI will inexorably fuse with your operating system as features slowly arrive, but as you try them out, something feels off. What’s being generated seems far removed from who you are, the text being awfully nice indeed, in that marvellous American cultural-imperialist manner:
On the esteemed vessel Venus,
A spectacle you should behold,
The figurehead depicted a woman in a state of undress,
While the mast resembled a colossal phallus.
The captain of this vessel, a figure of reproach,
Was unfit for any laborious task,
Let alone the menial job of shoveling waste.
Chorus
In the rigging, we found ourselves confined,
With nothing else to occupy our minds.
The captain’s name was Morgan,
A man of extraordinary beauty,
He played his organ ten times a day,
Melodies that were both sweet and sultry.
The first mate’s name was Cooper,
A man of unwavering resolve,
He labored tirelessly, his efforts culminating in a state of exhaustion.
… and the images created are cute and twee. Which is fine for some situations, and that ChatGPT rewrite above is really quite impressive, even though it couldn’t come up with alternative rhymes for the original lyrics.
As technology progresses, local AI trained on your content will learn who you really are, allowing it to be used for accurate impersonation for purposes such as automation. You might not even need to add fresh data from yourself for model retraining after a while. I can't even begin to imagine where that will go.
3 Comments
Copilot for Microsoft 365 is being given the hard sell around government. Join a Teams meeting late? No problem, just ask Copilot for a summary. Bad at taking notes? No problem.
Signing up to Copilot-enhanced Office 365 takes a couple of days, during which the LLM reads your emails, picks up your drafting style and so on, so you can ask it to draft emails for you. Pretty good for productivity, but $50 per person per month is steep.
The potential for there to be no need to add fresh data had occurred to me as well. I imagine everyone's emails would end up sounding like the linguistic equivalent of brown after a while.