Enterprise AI, Augmented Employees, AGI and the Future of Work with Charlie Newark-French, CEO of Hyperscience

23 mar 2023 · The Hitchhiker's Guide to AI

Hi Hitchhikers!

I’m excited to share this latest podcast episode, where I interview Charlie Newark-French, CEO of Hyperscience, which provides AI-powered automation solutions for enterprise customers. This is a must-listen if you are either a founder considering starting an AI startup for Enterprise or an Enterprise leader thinking about investing in AI.

Charlie has a background in economics, management, and investing. Prior to Hyperscience, he was a late-stage venture investor and management consultant, so he also has some really interesting views on how AI will impact industry, employment, and society in the future.

In this podcast, Charlie and I talk about how Hyperscience uses machine learning to automate document collection and data extraction in legacy industries like banking and insurance. We discuss how the latest large language models like GPT-4 can be leveraged in the enterprise, and he shares his thoughts on the future of work, where every employee is augmented by AI. We also touch on how AI startups should approach solving problems in the enterprise space and how enterprise buyers think about investing in AI and measuring ROI.

Finally, I get Charlie’s perspective on Artificial General Intelligence or AGI, how it might change our future, and the responsibility of governments to prepare us for this future.

I hope you enjoy the episode!

Please don’t forget to subscribe @ http://hitchhikersguidetoai.com

Episode Notes

Links:

* Charlie on LinkedIn: https://www.linkedin.com/in/charlienewarkfrench/

* Hyperscience: http://hyperscience.com

* New York Times article on automation: https://www.nytimes.com/2022/10/07/opinion/machines-ai-employment.html?smid=nytcore-ios-share

Episode Contents:

00:00 Intro

01:56 Hyperscience

04:52 GPT-4

09:41 Legacy businesses

11:13 Augmenting employees with AI

15:48 Tips for founders thinking about AI for enterprise

20:34 Tips for enterprise execs considering AI

23:49 Artificial General Intelligence

29:41 AI Agents Everywhere

32:12 The future of society with AI

37:44 Closing remarks

Transcript:

HGAI: Charlie Newark-French

Intro

AJ Asver: Hey everyone, and welcome to The Hitchhiker's Guide to AI. I am so happy for you to join me for this episode. The Hitchhiker's Guide to AI is a podcast where I explore the world of artificial intelligence and help you understand how it's gonna change the way we live, work, and play. Now for today's episode, I'm really excited to be joined by a friend of mine, Charlie Newark-French.

AJ Asver: Charlie is the CEO of Hyperscience, a company that is working to bring AI into the enterprise. Now, Charlie's gonna talk a lot about what Hyperscience is and what they do, but what I'm really excited to hear Charlie's opinions on is how he sees automation impacting our future, both economically and as a society.

AJ Asver: As you've seen with the recent launch of GPT-4 and all the progress that's happening in AI, there are a lot of questions around what this means for everyday knowledge workers and what it means for jobs in the future. And Charlie has some really interesting ideas about this. He's been sharing a lot of them on his LinkedIn, and I've been really excited to finally get him on the show so we can talk. Charlie also has a background in economics and management. He did an MBA at Harvard and previously was at McKinsey, so he has a ton of experience thinking about industry as a whole, enterprise and economics, and how these kinds of technology waves can impact us as a society.

AJ Asver: If you are excited to hear about how AI is gonna impact our economy and our society, and how automation is gonna change the way we work, then you are gonna love this episode of The Hitchhiker's Guide to AI.

AJ Asver: Hey Charlie, so great to have you on the podcast. Thank you so much for joining me.

Charlie: AJ, thank you for having me. I'm excited to discuss everything you just talked about.

AJ Asver: Maybe to start off, one of the things I'm really excited to understand is how did you end up at Hyperscience and what exactly do they do?

Hyperscience

Charlie: Yeah, Hyperscience was founded in 2014. It was founded by three machine learning engineers, so we've been an ML company for a long time. My background before Hyperscience was in late-stage investing. I had sort of the full spectrum of outcomes there.

Charlie: Some wildly successful IPOs, some strategic acquisitions, and then a lot of miserable, sleepless nights on some of the other areas. I found Hyperscience and was incredibly impressed with their ability to take cutting-edge technology and apply it to real-world problems. We use machine vision, we use large language models, and we use natural language processing, and we use those technologies to speed up back-office processes.

Charlie: The best examples here are loan origination, insurance claims processing, and customer onboarding. These are sort of miserable, long processes with a lot of manual steps, and we speed those up, with some partners taking it down from about 15 days to four hours.

Charlie: So all of that data that's flowing in: this is who I am, this is what's happened, this is the supporting evidence. We ingest that. It might be an email, it might be a document. It's some human-readable data. We ingest that, we process it, and then ultimately the claims administrator can say, yes, pay out this claim, or no, there's something.

AJ Asver: Yeah, so what you guys are doing essentially is, you had folks that were previously looking at these documents, assessing these documents, maybe extracting the data out of these forms, maybe it was emails, and entering those into some database, right? And then a decision was made, and now your technology's basically automating that. It's kind of sucking up all these documents and basically extracting all that information, helping make those decisions. My understanding is that with machine learning, what you're really doing is you've kind of trained on this data set, right, in a supervised way, which means you've said, like, this is what good looks like.

AJ Asver: This is what, you know, extracting data from this form looks like; now we're gonna teach this machine learning algorithm how to do it itself. Now what I found really interesting is that that was kind of where we made the most advancements in AI over the last decade, I would say.

AJ Asver: Right? It's like these deeper and deeper neural networks. They could do machine learning in very supervised ways, but what's recently happened with large language models especially is that we've now got this, like, general-purpose AI. You know, GPT-4, for example, just launched, and there was an amazing demo where I think the CTO of OpenAI basically sketched on, like, the back of a napkin a mockup for a website, and then he put it in GPT and it was able to, like, make the code for it.

AJ Asver: Right. So when you think about a general-purpose large language model like that, compared to the machine learning y'all are using, do you consider that to be a tool that you'll eventually use? Do you think it's kind of a threat to the companies that have spent the last, you know, 5, 6, 7 years, decades maybe, kind of perfecting these machine learning tools? Or, you know, is it something that's gonna be more like different use cases that won't be used, you know, by your customers?

GPT-4

Charlie: OpenAI, ChatGPT, GPT-4: the technology you're speaking about has really had two fundamental impacts. There's been the technology. It's just very, very cutting-edge, advanced technology. And then you've got the adoption side of it. And I think both sides are as interesting as each other.

Charlie: On the adoption side, I sort of like to compare it to the iPhone, in that there was a lot of cutting-edge technology, but what they did is they made that technology incredibly easy to use. There are a few things that OpenAI has done here that have been insanely impressive. First, they use human language. Humans will always assign a higher level of intelligence to something that speaks in their language.

Charlie: The other thing, it's a very small thing, but I love the way that it streams answers, so it doesn't have a little loading sign that goes around and then dumps an answer on you. It's almost like it's communicating with you. It allows you to read in real time, and it feels more like a conversation.

Charlie: Obviously the APIs have been a huge addition. It's just super easy to use, so that's been one big step forward. But it's a large language model. It's a chatbot. I don't wanna underestimate the impact of that technology, but my thoughts are AI will be everywhere. It's gonna be pervasive in every single thing we do.

Charlie: And I hope that chatbots and large language models aren't the limit of AI. I'd sort of like to compare chatbots and large language models to search. The internet is this huge thing, and search is one giant use case, so if you ask people what the internet is, they think it's Google. And that's the sort of way I think this will play out with AI, with whichever large language model and chatbot wins becoming the Google of that world, which at the moment appears very clearly to be OpenAI.

Charlie: But there are some examples of stuff that, certainly right now, that approach wouldn't solve. I'll give you a few, but this list is tens of thousands of use cases long. We spoke about autonomous vehicles earlier; I suspect LLMs are not the approach for that. Physical robotics. Healthcare, detecting diseases in radiology. Fraud detection.

Charlie: I'm sure if you put a fake check in front of GPT-4 right now, one written on a napkin, it might be able to say, okay, this is what the word fraud means, this is what a check looks like. But you've got substantially more advanced AI out there right now that looks at: what was the exact check that Citibank had in 2021?

Charlie: Does this link up with other patterns that should be happening with this kind of transaction? And so I think that you are gonna have more dedicated solutions out there than just sort of one chat interface to rule them all, would be my guess. At Hyperscience, there are things that ChatGPT or GPT-4 does right now that we do.

Charlie: Machine vision is one; that's an addition that appears to be working alongside their large language model. So they're combining different technologies versus just a large language model, is my guess. I obviously don't know the workings. But we build a platform here at Hyperscience that builds workflows, that enriches data, validates data, compares data, and looks at data that's historically come into an organization and that might not be accessible to sort of public chatbots or large language models.

Charlie: I think the question that you sort of asked at the beginning was, could we be using ChatGPT or GPT-4? Absolutely. And I think that a lot of startups could. But I suspect that what you'll see here is a lot of startups that spin up leveraging this and building something far greater from a user experience for a very specific use case, versus OpenAI solving all the sort of various small problems along the way, if that makes sense.

AJ Asver: Yeah, I think that makes a lot of sense. And it's one thing I've been thinking a lot about. I actually wrote a blog post recently about how these foundational models are gonna become more and more commoditized, and how that's gonna create this massive influx of products built on top of them.

AJ Asver: What I find really interesting is that you know, GPT, you can actually use that transformer for a lot of different things that aren't necessarily just a chatbot.

AJ Asver: Right. So you mentioned the fraud use case. If you send a bunch of patterns of fraud to a large transformer, its ability to actually remember things makes it very good at identifying fraud. And in fact, I was talking to a friend that worked at Coinbase on their most recent fraud control mechanism.

AJ Asver: They went from kind of linear regression models to deep learning models, to eventually actually using transformers, and it was far more effective. So I guess coming back to the question, do you see a world where, instead of building these focused machine learning models for particular use cases, like, you know, ingesting documents or maybe making sense of data and extracting that data and tabulating it into a database, you might one day end up actually just having a general pre-trained transformer that you are then essentially fine-tuning with kind of one-shot tuning? Like, this is how you extract a document for one of our clients. This is how you, you know, organize this information into, like, loan data. Is that a world we could move to? That's probably different from where we are today, and maybe a different world for Hyperscience too.

Legacy businesses

Charlie: Look, it would be a very different world. I think the next five, 10 years are about leveraging the kind of technology that OpenAI is building, and maybe that specific technology, as you sort of say, as some commoditized layer, and building workflows on top of that. I'll give you just the harsh reality of what the world looks like out there.

Charlie: Right? So this isn't just a single use case where I go and type something in as a consumer on the internet. At a bank in the UK, they have a piece of COBOL that was written in the 1960s that is still live in their mortgage processing.

AJ Asver: Wow.

Charlie: Rolling out, even just from a compliance level, any change to that mortgage processing that isn't done in piecemeal fashion, that doesn't think about the implications, that doesn't think about customer interactions in a week timeframe or a year or three-year timeframe, is just not dealing with the reality of the situation on the ground.

Charlie: These are complex processes. People get fired if you take a mortgage processing system down for minutes. And they're complex. So do I think that's a possibility in the future? It's absolutely possible. I think the best use of GPT-4 right now is to go and build the extensive workflows that require a little bit of old-fashioned AI, as well as cutting-edge AI, to have an end-to-end solution for a specific problem, versus assuming that we're ever gonna get something where you just say, okay, I'd like to know, should I give this customer this mortgage?

Charlie: And you get an answer back. To answer that question is still a very complex process.

Augmenting employees with AI

AJ Asver: Yeah, and I think we, especially in the technology industry, especially someone like me that spends so much time thinking about, talking about, and reading about AI, kind of forget that a lot of these legacy businesses can't move as fast as we think. I mean, we see, like, Microsoft moving quickly and Slack moving quickly, for example.

AJ Asver: But those are all, like, very software-focused, consumery businesses that, you know, aren't necessarily touching, like, hard cash and stuff like that, where there's a lot more compliance and regulation. So that makes a lot of sense. So then what we really are thinking about is, like, you kind of have humans that can be, as you've put it before in some of your predictions around AI, augmented, right?

AJ Asver: This idea of, like, an augmented employee that can use AI to help them get things done, but we're not necessarily replacing them straight away. Like, talk to me about what you see as a future of kind of augmented employees and kind of co-pilots, as they're also called.

Charlie: Totally. So the augmented employee is a phrase that I've been using for about 10 years. It's been a prediction for a while now. It didn't used to be a particularly popular one. You would get a whole load of reports, even from the big consultancy groups, that said these five jobs are definitely gone in five to 10 years.

Charlie: That five to 10 years has come and gone. Over that period of time, or I'll give you a longer period of time: over the last 30 years, we've added 30 million jobs here in the US, about a million a year on average. Obviously, there's been a bit of fluctuation. There's no good sign, on a short-term decision-making time horizon, that jobs are gonna be wildly quickly displaced.

Charlie: There's very little evidence. That's my view. What do I mean by short-term horizon? I really mean the horizon on which a large enterprise, which is what my company serves and what I'm interested in serving, makes decisions: 5, 10, maybe even 20 years, and that's the sort of edge of where I think things start to really change.

Charlie: Fundamentally, you should make decisions around software, and AI in this case, substantially helping people do a better job. The first time I read this becoming sort of a mainstream idea was about a year ago. And by mainstream, I mean outside of the tech industry. The New York Times wrote an article where the title was something like, in the fight between humans and robots, humans seem to be winning.

Charlie: I think that was just a very interesting change of thought. And there was a line in there that says the question used to be, when will robots replace humans? The better question is, and I absolutely love this phrasing of it, when will humans that leverage robots replace humans that don't leverage robots?

Charlie: And I think that's the right way to think about it. I'll give you a couple of examples, one with sort of non-OpenAI technology, and then one with ChatGPT specifically, or GPT. Radiology is something that's been talked about for a while. This was a giant step forward, where software AI could detect most cancers, most diseases, basic diseases, better than humans could. It just had higher accuracy.

Charlie: And the prediction for five, seven years was, this is the end of radiologists. We've seen no decrease in radiologists. If you want to go and get a cancer screening now, you're gonna probably look at a six to nine month wait.

Charlie: I don't have any issues, but I'm waiting for a cancer screening right now, just a nice safety check that I want for my own benefit, and because of the sheer backlog of work, I can't get that done. I can't get it done for a while. So the future for me is, in two or three years' time, there aren't fewer radiologists.

Charlie: There's just much higher accuracy and much shorter wait times. And maybe the future, as I say, which I'm sure we'll speak about, 20 years down the line, is I can just go to a website, they can do some scan of me, and then they can give me the answer within seconds. I can't wait for that, but it's just not here today.

Charlie: And I'll give you, AJ, one quick OpenAI example. When ChatGPT came out, there were so many sort of "this is not ready for mainstream" takes that went round. And the way that I thought about it is, if you want ChatGPT to write you a sublease because you want to lease your apartment, and you want it to be flawless, so you just click send to the person that's taking the sublease on, it's nowhere near ready for mainstream. If you wanna cut down a legal person's work by 90% because the first draft is done, and they're gonna apply local laws, they're gonna look at a few nuances, they're gonna understand the building a bit, then it's so far beyond ready for mainstream. It should be used by everybody for every small use case it can. So I think it's human augmentation for a while. I think that jobs don't go away for a while, and I sort of like to compare it to the internet a little bit here, which is: we use the internet today in every single thing we do, and it makes our jobs substantially easier to do. It makes us more effective at them, and that's what I think the next sort of 10 years at least looks like for AI within the workplace.

Tips for founders thinking about AI for enterprise

AJ Asver: The thing you mentioned there that I find really fascinating is this idea that, you know, we're not gonna replace humans immediately. That's not happening. But people thought that for a long time, right? And it almost feels like with every new wave of technology, there's this new hum of people saying, like, we're gonna replace humans.

AJ Asver: Right. But at the same time, I kind of agree with you. Having spent a lot of time using ChatGPT and working with it, I found that it certainly augments my life in writing my Substack. In fact, preparing for this interview, I actually asked it to help me think about some interesting questions to ask you based on some of the posts you'd written.

AJ Asver: Because I'd read some of your posts on LinkedIn fairly regularly, but I couldn't remember all of them, so I actually asked the Bing AI chat to help me. Right. And then when you think about these especially regulated environments, where the difference between right and wrong is actually someone's life, or a large sum of money, or breaking the law, then it really matters. And in that case, augmentation makes a lot of sense. Now, the reality is, with AI, especially large language models and building on top of OpenAI, there's a fairly low barrier to entry right now. That's why we're seeing a lot of companies in copywriting, in collaboration, in presentations, and the challenge with that is, if there's an incumbent in the space that already exists, it's very hard to beat them on distribution. Where I did see an interesting opportunity is exactly what you are talking about: going deep into a fairly fragmented industry, which maybe has a lot of regulation or a lot of complexity, maybe disparate data systems.

AJ Asver: You mentioned kind of the 30-year-old, like, COBOL data system, right? Like, that is a perfect place where you can go in and really understand it deeply. Now, as someone that's running a company that does that, I'm curious, what advice do you have for founders or startups that wanna take that path of, like, hey, we're gonna take this AI technology that we think is extremely powerful, but go apply it in some deep industry where you really have to understand the ins and outs of that industry, the regulation, the way people operate in that industry, and the problems?

Charlie: Absolutely. A few thoughts. Firstly, make sure that you are trying to solve a problem. This is just general advice for setting up a business. What is the problem you're trying to solve? What is the end consumer pain point? For us here at Hyperscience, it's that people wait for their mortgage to be processed for six weeks.

Charlie: No good reason why that's happening. People wait for their insurance claims to be paid out sometimes for two years. No good reason for that to be happening. So always start with the customer pain point, and then decide, does the current technology, which is AI in this case, allow you to solve it?

Charlie: And then that gets you to the question, does it allow you to solve for it? And what I've looked for here is this: OpenAI, or GPT-4, can do a whole load of diverse stuff, but your highest-value things are gonna be what's just happening time and time again. For us, there is just a whole load of mortgages that are processed right now, or there is just a whole load of insurance claims that are processed, and they're relatively similar.

Charlie: Although we think of them as different, they've got a lot of similarities, certainly to a machine. So I'd look for volume. You can think of this as your total addressable market in terms of traditional VC, non-AI speak. But this is: is the opportunity big enough? And then the next thing I'd look for is repetitive tasks.

Charlie: So the more repetitive it is, the easier it is. You can go out and solve something really, really hard with a large language model. But there are probably even easier applications that you can solve that are just super repetitive, where you can take steps out. So I think that's it. Have the problem in mind.

Charlie: Think about volume, think about repetitive natures, and then one of the key things, once you've got all of that set and you've got, okay, this is an industry that's rife with customer pain, ripe for disruption, this is definitely a good application of where AI is today: I would think about ease of use above everything.

Charlie: My thinking is, and I spoke about this with OpenAI, one of the biggest things they've done is they've just taken exceptional technology but made it so, so simple for someone that doesn't know AI to interact with. And the question I always get asked as the CEO of an enterprise software AI company is, how can we upskill all of our employees?

Charlie: The answer to that is you shouldn't have to. This software should be so easy to engage with that your current employees should seamlessly be able to do it. If there is rudimentary training needed, your software should do that. And again, I like to compare this to the internet. We use the internet day in, day out.

Charlie: There has been 20 years of upskilling, but it's not really been that hard. Like, I think if you took today's internet and you gave it to somebody 20 years ago, it might be a little bit advanced for them, but we've made software, internet software, so easy to work with that you don't need to know how the internet works.

Charlie: The number of people that know how the internet works, even the number of people that know how to truly code a website, is an absolutely tiny fraction of the number of people that use the internet to improve their daily lives. So I'd say ease of use for AI is possibly as important as the technology.

Tips for enterprise execs considering AI

AJ Asver: I love those insights, by the way. And just to recap, you said go for a problem that has a lot of volume, where you're solving a real problem for end users, but there's clear volume or, like, you know, a large addressable market. The other thing you mentioned was make sure it's repetitive tasks. Like, with LLMs there's a temptation to go after these really complex problems, but repetitive tasks are the ones where there's probably the most incentive to solve as well.

AJ Asver: Right. And then the last thing you mentioned is, like, it should be really intuitive for an end user to use, to the point where they don't have to feel like they need to be upskilled. Now, if you are a founder or a startup going down this path, the other thing you're thinking about is, like, how do you sell into these companies?

AJ Asver: So maybe taking the flip side of that, if you are in the enterprise and you're getting approached by all these AI startups that just got funded this year, being like, we're gonna help you do this, we're gonna help you do that, we're gonna automate this, how do you decide when it's the right time to make that decision?

AJ Asver: How do you decide, kind of, the investment on that and whether it's worth it? Like, what are your thoughts on that?

Charlie: My thoughts on that are linked directly to the economic cycle we're in right now, which is not a pretty one. Somewhat of a, maybe a mild recession, maybe the edge of a recession. And I see this from all of the CIOs and CEOs that we work with at the sort of large banks and large insurance companies.

Charlie: And my suggestion is this: I tell them to create a two by two matrix. You told everyone earlier, I started my career at McKinsey.

AJ Asver: Classic two by two.

Charlie: Love it. Two by two matrix. On one of the axes is short-term ROI, on the other axis is long-term ROI, and you want to get as much into the top right as possible and as little into the bottom left as possible.

Charlie: And for a while, AI, or artificial intelligence, was considered long-term ROI and not short-term ROI, which is to say these projects were treated by these large organizations as science experiments. You saw these whole roles form, these whole departments form, around transformation. The digital transformation officer: that is a role that just didn't exist five or 10 years ago, and these people were there to go and innovate within the organization, and largely speaking, it wasn't wildly successful. A lot of these roles are sort of spinning down. You need to solve a business problem that the technology solves today and that gives you a path to the long run. So at Hyperscience, I'll give you an example here: we add value out of the box, but we also understand where people are today and try to get them to where they want to go.

Charlie: So for one of our customers, 2% of what they do is process faxes. I hope that they are not processing faxes in five years' time, and I hope that we are giving them the bridge to that, but we'd better be able to do that today, and also paint them a sort of what I refer to as a future-proof journey to where they want to head.

Charlie: So I think it's really about: don't do any five-year projects. Like, if a company comes to you and, when you say, can you do this, they say, well, we could do that, I would run. Or if they're just saying yes to everything versus yes to something quite specific. And a good startup, you know this better than anyone, AJ, a good startup does something specific really well and then they build out from that. They have an MVP is one way of phrasing it.

Charlie: They have a niche is another way of phrasing it. Yeah, go and sell something today really well, and all of those sort of long-tail features around it, people will forgo for a period of time whilst you build those. But you add value in the short term as well as building something for the long run.

Artificial General Intelligence

AJ Asver: Yeah. And that point you made about showing the short-term value is really important, especially when you're trying to counter the kind of biases around AI that maybe exist in enterprise today: that it's, as you mentioned, kind of like a hobby project or a kind of experiment, or, you know, your moonshot kind of project. You wanna show them, like, this is the ROI you're gonna get very quickly in the next one or two years, and that's a really important point. Now all of this makes sense, the augmented employee, the co-pilot, and having these narrow versions of AI that are solving particular problems, and I can see that working out, but I feel like there's this one big factor that we have to think about that could change all of this, in my view.

AJ Asver: And that is artificial general intelligence. And for folks that don't know what artificial general intelligence is, or AGI, as it's called, that's really what OpenAI is trying to achieve long term. And it's the idea of essentially having intelligence that is the equivalent of a human's, with an ability to think abstractly and to solve a wide, broad range of problems in a way that a human does. And what that means is, technically, if you have an AGI, and let's say the cost of running that AGI is, you know, a hundredth or a thousandth of the cost of running a human, then potentially you could replace everyone with robots, or, you know, AI, as it were.

AJ Asver: How does that factor into this equation? Is this something you think about? Like, what are your thoughts about it, both as CEO of an enterprise company, but also as someone that's studied management and economics for the last decade? I'm really curious to hear where you think this is going.

Charlie: I don't think there's anything unique about a soul or something that can't be replicated in the human mind. And to your point, when I hear definitions of AGI, I hear human-like or human-level intelligence. If this happens, or when it happens, because there's no doubt it will, it will be substantially smarter than a human, incredibly quickly. You look at the difference in intelligence between humans: someone you just pick off a street, 110 IQ or whatever level it is, versus an Einstein with 170 IQ. That difference is enormous. Now, imagine the equivalent of that 170 IQ, but it's a thousand or 10,000 or whatever it is. I think that if you have AGI, extremely quickly you will be far beyond them not being able to do any job. There will be absolutely zero they can't do.

Charlie: Now, I don't see that today. My best guess of a time horizon is post 20 years, sub a hundred years. That's a nice vague timeframe, but that's sort of how I think about it. 20 years is your classic sort of decision-making timeframe, and for someone building or running an enterprise software company, it's not an interesting question of what do we do with AGI.

Charlie: For someone thinking about designing a society, thinking about economic systems, thinking about regulation, it's an extremely good time to start thinking about those questions. Let me start, AJ, by speaking about why I don't think it's here today. And then we can perhaps think about what a world where it is here looks like, which I'm quite excited about, by the way.

Charlie: I don't view it as a dystopian outcome. Our current approach to AI today is machine learning. We spoke about that earlier today. Machine learning requires sort of three things: compute, algorithms, and data. And on all three of them, I think, to have true AGI, we're not there. On the compute power, I think we need some leap forward there.

Charlie: It might be quantum computing. There's a lot of exciting work happening there. On the timeframes there, I'm not as close to that as I am to AI, but no one's speaking about quantum computing on a sort of one-year time horizon. They're speaking about it, again, on a 10 or 20 year time horizon. The second thing is the algorithms.

Charlie: From what we see out there, even with the phenomenal stuff that OpenAI is building, I don't see algorithms out there doing true AGI. On the large language models, I will say that I'm incredibly impressed with how GPT-4 plays chess. It's still not at the level of an algorithm that is designed specifically for chess, but it's pretty damn good. So my thoughts on the algorithms evolve every day, but certainly we're still not there today. And then one of the big hurdles is gonna be data. A human ingests data rapidly, all the time, via a series of senses. You can think of that as five senses. You can think of it as 27 senses.

Charlie: People have a different perspective on this, but there's just data flowing into us the whole time, and at the moment we don't have that in the technology space. If you wanna solve the autonomous vehicle, you've gotta hold... They do cameras and the visual aspect extremely well. But to solve true AGI-level stuff, to go beyond playing a 2D chess game to processing a mortgage, I think there's also gotta be a new way of ingesting data.

Charlie: Now, one interesting question that I've always wondered is, what will the first iteration of AGI look like? I don't think this is the end state, because I think the end state's a lot smarter than this, but I mean the first version of what we would consider AGI. And general intelligence just means it can do many diverse things, learn from one instance, and apply that learning to another instance.

Charlie: It could just be a layer that looks like a chatbot, that looks like GPT-4 or GPT-10, whatever it ends up being, that docks into different specific narrow AIs. And so if you want to get in a car and go somewhere, you talk to GPT-4 and you say, I'm looking to go here. And that just plugs into some autonomous vehicle algorithm.

Charlie: That could be the first way. It'll feel like general intelligence, and it will be general intelligence. Or you might have just some massive change in the way algorithms are written. And I do think there's a lot of exciting work happening there. It's just not clear what the timeframe is for that.

AI Agents Everywhere

AJ Asver: Yeah, I like that last bit you talked about. I really like that as a way to think about how AI will evolve. I think some people call it this kind of agent model, where you have essentially this LLM, a large language model like GPT, acting as an agent, but it's actually coordinating across many different integrations, maybe other deep learning models, to get things done for you.

AJ Asver: And so the collective intelligence of all those things put together is actually pretty powerful and can feel like, or have the illusion of, artificial general intelligence. I think for me there's this philosophical question: if it's as good as the thing we want it to be, does it matter if some nuanced academic definition of AGI isn't what it is? You know what I mean? If it does all the things we'd want of a really smart assistant that's always available, but it doesn't meet the specifics of AGI in the academic sense, maybe it doesn't matter. Maybe that's what the next 20 or 30 years looks like.
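
The "agent model" described in this exchange, one conversational layer handing each request to a specialised narrow system, can be sketched in a few lines of Python. This is only an illustrative sketch: the tool names, the keyword lists, and the `choose_tool` function are all hypothetical, and the keyword matching is a stand-in for the routing decision that a large language model would make in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


# Hypothetical narrow "tools". In a real system each would wrap a
# specialised model or service (routing, chess engine, and so on).
def plan_route(request: str) -> str:
    return f"[navigation] planning a route for: {request}"


def play_chess(request: str) -> str:
    return f"[chess] choosing a move for: {request}"


def small_talk(request: str) -> str:
    return f"[chat] replying to: {request}"


@dataclass
class Tool:
    keywords: Tuple[str, ...]       # phrases that suggest this tool
    handler: Callable[[str], str]   # the narrow system that does the work


TOOLS: Dict[str, Tool] = {
    "navigation": Tool(("drive", "go to", "route"), plan_route),
    "chess": Tool(("chess", "checkmate"), play_chess),
}


def choose_tool(request: str) -> str:
    """Stand-in for the language model's routing decision (keyword match only)."""
    text = request.lower()
    for name, tool in TOOLS.items():
        if any(keyword in text for keyword in tool.keywords):
            return name
    return "chat"  # nothing matched: fall back to plain conversation


def agent(request: str) -> str:
    """Single entry point: pick a narrow tool, then delegate the request to it."""
    name = choose_tool(request)
    handler = TOOLS[name].handler if name in TOOLS else small_talk
    return handler(request)


if __name__ == "__main__":
    print(agent("I'm looking to go to the airport"))
    print(agent("What's a good chess opening?"))
    print(agent("Hello there"))
```

From the user's side there is a single assistant; the specialisation lives behind the dispatch step, which is the point both speakers make about the result feeling general even when the parts are narrow.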

Charlie: Look, I think that's exactly right, and there's no good reason for us to care. We just care that it gets done. We have no idea how the mind even works. We're pretty sure that the mind doesn't say, okay, I've got this little bit for playing chess, this little bit for driving. It has some different way of doing it.

Charlie: But humans are very attached to replicating things that they experience and understand. And one just very simple way of doing it is a change of the definition of AGI from what your average person might associate AGI with.

AJ Asver: Yeah, that to me is kind of a mental shift that I think will happen. And one of the things I've been thinking about, and this is a huge reason why I started this newsletter and this podcast, is that these things happen exponentially and very quickly.

AJ Asver: You don't really realize when you look behind you at an exponential curve, because it's flat, but when you look forward, it's kind of steep. I always talk about this quote from Sam Altman, because it's really seared into my head. I think what we're gonna see is this exponential increase in these agents that essentially help you coordinate your life: in work, in meetings,

AJ Asver: in getting to where you want to go, in organizing a date night with your significant other. Right. And you are suddenly gonna be surrounded by them. And you'll forget about this concept of AGI because that will become the norm, in the same way that the internet age has become the norm and being constantly connected to the internet is part of our normalcy in life.

AJ Asver: Right. This has been a fascinating conversation. I have one more question for you, which is, as someone that's been going deep into the AI space, maybe from the enterprise side as well, what are you most excited about in AI for the next few years?

The future of society with AI

Charlie: Yeah. Look, I'd talk about two things. Firstly, I think one of the things that I'm excited about in the short term is just the growth in education. The single biggest thing I think has happened with GPT is just the mass, fast, easy adoption. And when the internet became very, very interesting, it was when you got mass distribution,

Charlie: with people creating use cases. That's sort of when you went from, okay, the internet could be a search engine to organize data, the internet could be a place to buy clothes, the internet could be a place to game, to, okay, the internet is just everything. So I'm excited for that. And then in the long run, it would be remiss of us not to discuss this.

Charlie: I'm excited about thinking about what a new economic system looks like. People talk about universal basic income. I don't think that the enterprise should be thinking about this question today. Our customers at Hyperscience aren't thinking about this question today, but now is what I would call the ideation stage.

Charlie: We need think tanks, governments, people out there thinking of ideas, and then eventually stress testing ideas for what the future could look like. And I'm sort of excited for that.

AJ Asver: Yeah, I want to believe that it will happen, but I'm a little skeptical, just given the history of how humankind behaves. We're not particularly good at planning for these inevitable but hard-to-grasp eventualities, in a similar way that we kind of all knew interest rates were going up, but it was really hard to understand what the implications of that were until last weekend when we found out, right? The...

Charlie: It was very much we could have prepared for this.

AJ Asver: Yeah. Yeah. And yet we could have prepared, because all the writing was on the wall. And you've got these people at the fringes that are kind of ringing the alarm bells, whether that's people working in AI ethics or whether it's even Sam Altman of OpenAI himself saying, hey, we actually can't open source our models anymore. It's too dangerous to do that. Right? And then you've got people on the other side that are like, no, we need to accelerate this. It needs to be open. Everyone needs to see it. And the faster we get to this outcome, the faster we'll be able to deal with it. I skeptically, unfortunately, believe that we're gonna stumble our way into this situation.

AJ Asver: We'll have to act very quickly when it happens. And maybe that's just like the inevitable kind of like journey of mankind as a whole, right? But it's still exciting either way.

Charlie: Well look, I think so. Look, if we don't stumble our way into it, we create a world where people don't have to work, and I'm pretty excited about that. I'd be studying philosophy for two years at NYU, I'd be playing a hell of a lot of tennis, I'd be traveling more. There is a good outcome.

Charlie: If we just stumble through it, my thinking, AJ, is that this is what the world looks like, and it's not pretty. It looks like three classes of people. You have your asset owners. You can think of them as today's billionaires. They will probably end up controlling the world. There might be some fake democracy out there, but when you have all of the AI infrastructure owned by ultimately a small group of people, you're probably gonna have them influencing most decisions.

Charlie: You may then have this second class, a celebrity class. I think that may still exist. Human sports may still exist. Human movies, human celebrities may still...

AJ Asver: Yep.

Charlie: And you just get this class of 99.9% of people that are everyone else. And the two features of their life are gonna look like this.

Charlie: This is just my guess of the way it goes if we don't think about it in a more interesting way and plan for it. One is universal basic income. Everyone gets the same amount, probably not very much. I don't know that that's gonna be particularly inspiring for people. I think there are better ideas. And then I think that you, and this is very dystopian, end up living more in virtual reality than in reality. The shortest path to a lot of what you might want to create is just to create it in a virtual world versus going and creating all of that in a physical world.

Charlie: So all that is to say: if we don't start thinking about it, don't start having some regions test different models, having startups form ideas around this and come up with ideas that are then adopted by bigger countries, in that instance I think you could end up with a bad outcome. But I think if it's planned, you could end up with an insanely cool world.

AJ Asver: Yeah. So we're gonna end on that note, that those two worlds are what we have to either look forward to or dread, depending on which way you think about it. And I think for folks listening, it's just really important, to begin with, that people understand. Society can only understand if individuals understand where this technology is going.

AJ Asver: Right? And that's where you obviously are helping by communicating on LinkedIn to the many people that follow you, especially in industry. I try to do it with this podcast. But for anyone that's fascinated by AI and following along, I think the number one thing you can do right now is share with your friends some of the things that are happening in this space, to help people get a grasp of how the space is moving. That will also help you advocate for us to do the right thing, which is to prepare for it, right? I think if people think this is a long way away and they don't understand it, just like you said for industry, then no one is incentivized within government to do it, because their own constituents don't care about it.

AJ Asver: Right? But if we're like, hey, this is happening, it's an exciting technology, but also there are kind of two ways this could go and one is better, then I think it's just as important that we as individuals care about it and advocate for it. This was a fascinating conversation. Thank you so much for the time, Charlie.

AJ Asver: I really appreciate it. I cannot wait for folks to listen to this episode, and thank you again for joining me on The Hitchhiker's Guide to AI.

Closing remarks

Charlie: Well, thank you for having me here, AJ.

AJ Asver: Awesome. Well, thank you very much and for anyone that's listening, if you enjoyed this podcast please do share it with your friends, especially if you have either founders that are considering building AI startups and going into enterprise, or if you have folks that are working in industry and are considering incorporating AI into their products.

AJ Asver: I think Charlie shared a lot of really great insights there that I think folks would appreciate hearing. Thank you very much, and we'll see you on the next episode of the

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com


Interview: Supercharging your team with Coda AI | David Kossnick
23 jun 2023· The Hitchhiker's Guide to AI
Hi Hitchhikers,
I’m excited to share another interview from my podcast, this time with David Kossnick, Product Manager at Coda. Coda is a collaborative document tool combining the power of a document, spreadsheet, app, and database.
Before diving into the interview, I have an update on Parcha, the AI startup I recently co-founded. We’re building AI Agents that supercharge fintech compliance and operations teams. Our agents can carry out manual workflows by using the same policies, procedures, and tools that humans use. We’re applying AI in real-world use-cases with real customers and we’re hiring an applied AI engineer and a founding designer to join our team. If you are interested in learning more, please email [email protected].
Now, onto the interview...
I use Coda daily to organize my work, so I was thrilled to chat with David Kossnick, the PM leading Coda’s AI efforts. We discussed how Coda built AI capabilities into their product, and their vision for the future of AI in workspaces, and he gave me some practical tips on how to use AI to speed up my founder-led sales process.
Here are the highlights:
* The story behind Coda’s AI features: Coda started by allowing developers to build “packs” to integrate with their product. A developer created an OpenAI pack that became very popular, showing Coda the potential for AI. At a hackathon, Coda explored many AI ideas and invested in native AI capabilities. They started with GPT-3, building specific AI features, then gained more flexibility with ChatGPT.
* Focusing on input and flexibility: Coda designed flexible AI to work in many contexts. They focused on providing good “input” to guide users. The AI understands a workspace’s data and connections. Coda wants AI to feel like another teammate—able to answer questions but needing to be taught.
* Saving time and enabling impact: Coda sees AI enabling teams to spend less time on busywork and more time on impact. David demonstrated how Coda’s AI can summarize transcripts, categorize feedback, draft PRDs, take meeting notes, and personalize outreach.
* Tips for developing AI products: Start with an open-ended prompt to see how people use it, then build specific features for valuable use cases. Expect models and capabilities to change. Focus on providing good "input" to guide users. Launching AI requires figuring out model strengths, setting proper expectations, and crafting the right UX.
* How AI can improve team collaboration: David shared a practical example of how AI can help product teams share insights, summarize meetings and even kick-start spec writing.
* Using AI for founder-led sales: David also helped me set up an AI-powered Coda template for managing my startup's sales process. The AI can help qualify leads and draft personalized outreach emails.
* The future of AI in workspaces: David is excited about AI enabling smarter workspaces and reducing busywork. He sees AI agents as capable teammates that understand companies and workflows. Imagine asking a workspace about a project's status or what you missed on vacation and getting a perfect summary.
* From alpha to beta: Coda’s AI just launched in beta with more templates and resources. You can try it for free here: http://coda.io/ai
David’s insights on developing and launching AI products were really valuable. Coda built an innovative product, and I'm excited to see how their AI capabilities progress.
Episode Links
Coda’s new AI features are available in Beta starting today and you can check them out here: http://coda.io/ai.
You can also check out the founder-led sales CRM I built using Coda here: Supercharging Founder-led Sales with AI
Transcript
HGAI: Coda AI w/ David Kossnick
Intro
David Kossnick: One of our biggest choices was to make AI a building block initially. And so it can be plugged in lots of different places. There's a writing assistant, but there's also AI you can use in a column. And so you can use it to fill in data, you can use it to write for you, to categorize for you, to summarize for you, and so forth, across many different types of content.
David Kossnick: Having that customizability and flexibility is really important. I'd say the other piece, more broadly, is there's been a lot of focus across the industry on how to make good output from AI models, and benchmarks, and what good output is, and when do AI models hallucinate and lie to you, and these types of things.
David Kossnick: I think there's been considerably less focus on good input. And what I mean by that is like, how do you teach people what to do with this thing? It's incredibly powerful, but also writing natural language is really imprecise and really hard.
AJ Asver: Hey everyone, and welcome to another episode of The Hitchhiker's Guide to AI. I'm your host, AJ Asver, and in this podcast I speak to creators, builders, and researchers in artificial intelligence to understand how it's going to change the way we live, work, and play. Now, you might have read in my newsletter that I just started a new AI startup.
AJ Asver: Since starting this startup a few months ago, a big part of my job has been attracting our first set of customers. I love talking to customers and demoing our product, but when it comes to running a founder-led sales process, prospecting, qualifying leads, and synthesizing all of those notes can be really time consuming. That's exactly why I decided it was time to use AI to help me speed up the process and be way more productive with my time.
AJ Asver: And to do that, I'm gonna use my favorite productivity tool, Coda. Now, if you haven't heard of Coda, it's a collaborative document editing tool that's a mashup of a doc, a wiki, a spreadsheet, and a database.
AJ Asver: In this week's episode, I'm joined by David Kossnick, who's the product manager that leads Coda's AI efforts. David's going to share the story behind Coda adding AI to their product, show us how their new AI features work, and give me some tips on how I can use AI in Coda.
AJ Asver: By the way, I've included a template for the AI powered sales CRM I built in the show notes, so you can check it out for yourself.
AJ Asver: But before I jump into this episode, I wanted to share a quick update on my new startup. At Parcha, we're on a mission to eliminate boring work. Our AI agents make it possible to automate repetitive manual workflows that slow down businesses today.
AJ Asver: And we're starting with fintech compliance and operations. Now, if you're excited by the idea of working on cutting-edge autonomous AI and you're a talented applied AI engineer or designer based in the Bay Area, we would love to hear from you. Please reach out to [email protected] if you wanna learn more about our company and our team.
AJ Asver: Now, let's get back to the episode. Join me as I hear the story behind Coda's latest AI features in The Hitchhiker's Guide to AI.
AJ Asver: Hey David, how's it going? Thank you so much for joining me for this episode.
David Kossnick: It's going great. Thanks for having me on today.
What is Coda?
AJ Asver: I am so excited to go deeper into Coda's AI features with you. As I was saying at the beginning of this episode, I've been using Coda's AI features for the last month. It's been kind of a preview, and it's been really cool to see what it's capable of. I'm already a massive Coda fan, as you know. I used it previously at Brex. I used it to organize my podcast and my newsletter, and most recently it's kind of running behind the scenes at our startup as well, for all sorts of different use cases. But in this episode, I'd love to jump in and really understand why you guys decided to build this, what the story behind Coda's AI tools really was, and how it's gonna help everyone be more productive.
AJ Asver: So maybe would you describe Coda and what exactly it does?
David Kossnick: Coda was founded with a thesis that the way people work is overly siloed. So if you think about the most common productivity tools, you have your doc, you have your spreadsheet, and you have your app. And these things don't really talk to each other. And the reality is often you want a paragraph and then a table, and then another paragraph, and then a filter view of the table, and then some context in an app that relates to that table.
David Kossnick: And it's just really hard to do that. And so you have people falling back to the familiar doc, but littered with screenshots and half-broken embeds. So Coda said, what if we made something where all these things could fit in one doc and they worked together perfectly? And that's what Coda is.
David Kossnick: It's a modern document that allows you to have a ton of flexibility and integrate with over 600 different tools for your team.
AJ Asver: Yeah, I think that idea of Coda being able to, one, integrate with different tools, but also be a doc that can become a table and then have a mashup of all these different types of data is something I've really valued about it. Especially when I was at Brex and we used to run our team meetings on Coda, it was really great to be able to have the action items formatted well in a table, but also have the notes more freeform, and then combine that with follow-ups.
AJ Asver: And we even had this crazy table in my product team where we would post weekly photos, and that's really hard to do in an organized way in a doc, and you'd never wanna do that in a spreadsheet. So I love the fact that Coda enables you to combine all those different types of data together. So, Coda has that. And then it also has packs, which you mentioned too, right? These are the integrations that allow you to take data from lots of different places and put it all together.
Story behind Coda AI
AJ Asver: And from what I understand, Coda's AI products started as a pack, right? It was this pack that Coda had built more as kind of a hack project to get people to use the OpenAI capabilities inside Coda.
AJ Asver: And I'd actually tried this too, and that's kind of how you guys decided to actually build this into a real, native integration.
David Kossnick: Yeah, totally. I'd say, you know, the first bet Coda made was on packs as a platform. And so maybe about a year ago now, we released an SDK where anyone can build a pack for Coda in their browser, in JavaScript and compile it and publish it. And so it can pull data from places, push data to places. And it's really been incredible to see people do this for all sorts of use cases we never even thought of.
David Kossnick: And it made it possible for people to start experimenting with AI in a much more effortless manner. And so someone did a kind of weekend project and published the first OpenAI pack, and it really took off. We started to see it get used for all sorts of different cases. And it got us inspired, thinking, hey, you know what?
David Kossnick: What if we did something native, where you didn't have to think about authenticating to external services? What if it could go much deeper in what context it had about your doc and your workspace in order to help you save time?
AJ Asver: One of the things I really loved here, as kind of a product management thing, is seeing this nascent behavior happening on the platform and then deciding that it was something worth investing in further. So at what point did you guys decide, oh, this is becoming big enough or popular enough that we should make an investment in it, and how was that decision made?
AJ Asver: Is that a decision that the CEO makes, or is it more kind of bubbled up from the bottom, where there was a team that saw this happening and was like, hey, we'd like to invest in this further? I'm really curious. Give us a bit of the inside baseball of how that happened.
David Kossnick: There were a few moments. The first moment was kinda the weekend hackathon with a few people. I think it actually started on a Friday afternoon. The DALL-E API had been released by OpenAI, and they really wanted to start generating images in a doc. And so it took like two hours to basically create a new pack from scratch and have it fully workable inside of a doc.
David Kossnick: And then we had a weekend blitz to basically ship it on Product Hunt. And Reid Hoffman, who's on Coda's board and a big fan of OpenAI, was actually kind enough to hunt it on Product Hunt for us. There were a bunch of really cool templates people had quickly built for it. And then about a month and a half later, we had a company-wide hackathon. This must have been back in December. And there was a ton of enthusiasm about AI, partially from the OpenAI pack. And so we explored like a dozen different ideas. I think we won the People's Choice Award; the company votes on all the different hackathon pitches at the end of it.
David Kossnick: And then in January we had a board meeting, and we showed off some of the thoughts from the hackathon as well as some of what people had already been doing in the community. And the board was really excited about it, and so we started up a much bigger effort within the company.
AJ Asver: It's a really cool story to hear that what started as a pack that was put together really quickly, just kind of an experiment, then became a hackathon, and then became this big investment for the company. Right? And now when I look at the product, which you're gonna talk a bit more about, I can see how it could really become a core kind of primitive of the Coda experience, just like a table and a doc and a canvas.
AJ Asver: So that to me is really inspiring, especially for other folks that are working in product at companies that are at Coda's stage, that you can really come up with these experimental ideas and they can end up becoming products. And also I think it was really encouraging to see that you guys explored AI and integrating it into the product pretty early, right?
ChatGPT and Coda AI
AJ Asver: This was pre-ChatGPT hype, or just as ChatGPT was becoming popular, but before GPT-4 came out. At Christmas, at least, I feel like the hype was just starting to simmer then, but not necessarily boil over like it is right now. Talk me through what it was like developing the AI product into what it is today, how you guys built it into the product, and how it works.
David Kossnick: I look back on those early days and it's sort of amazing how much chaos there was in the market. ChatGPT had just come out and it was incredible, and we were dying to get our hands on a chat-based API, but at the time, the only backend available to us was GPT-3.
David Kossnick: They hadn't released an API for ChatGPT. There was nothing else really at that quality level on the market. And so I remember going and chatting with every major developer, API, AI company, every platform, and being like, hey, what have you got? What can we use? And we had a bunch of different ideas on UX, but we were kind of bottlenecked on what's possible.
David Kossnick: So we started by building something with GPT-3, actually. And we said, okay, chat's gonna come at some point. We don't know if it's a month out or a year out. We gotta start moving now. And we did a few experiments on it. Some just straight outta the box, and some that were very specific.
David Kossnick: So actually one of my favorites was this: Coda has a formula language. It's incredibly powerful. People love it. It's also a little complicated for people who are starting to learn it. It's a lot like Google Sheets or Excel. And so we had said, what if you could have natural language that just turned into a Coda formula?
David Kossnick: And so we collected a data set for that. We actually crowdsourced it within the company. We took all of the Coda formulas from the company's internal staging environment data, and we had people annotate what the natural language equivalent was. And we fine-tuned GPT-3 for it.
David Kossnick: And we built a little thing that would basically convert text to formulas. We were like, wow, that's actually pretty good. We realized some of the hard cases are really hard, but on some of the average cases it does quite well. But it was definitely a mode where, because there was no sort of generic chat backend, we had to think, feature by feature, what would we do for this exact scenario?
David Kossnick: What are the prompts we would create, and so forth. And we got decently far down that path, at which point ChatGPT's API, which was called GPT-3.5 Turbo, was released and unlocked kind of a whole set of other use cases.
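
The data set David describes, existing formulas paired with human-written descriptions, maps naturally onto the prompt/completion layout that completion-style fine-tuning of GPT-3-era models used. The Python sketch below shows one way such pairs could be serialized to JSONL. It is not Coda's actual pipeline: the example pairs, the separator, the stop marker, and the function names are all hypothetical, and the formulas are illustrative rather than checked against Coda's formula language.

```python
import json

# Hypothetical annotated pairs: (natural-language description, formula text).
# The formulas are illustrative only, not verified Coda formula syntax.
ANNOTATED = [
    ("count the tasks that are done", 'Tasks.Filter(Status = "Done").Count()'),
    ("total of the amount column in orders", "Orders.Amount.Sum()"),
]

SEPARATOR = "\n\n###\n\n"  # assumed marker for where the prompt ends
STOP = " END"              # assumed marker for where the completion ends


def to_record(description: str, formula: str) -> dict:
    """Build one training example: description as prompt, formula as completion."""
    return {
        "prompt": description.strip() + SEPARATOR,
        "completion": " " + formula.strip() + STOP,
    }


def to_jsonl(pairs) -> str:
    """Serialize the pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(to_record(d, f)) for d, f in pairs)


if __name__ == "__main__":
    print(to_jsonl(ANNOTATED))
```

At inference time the same separator would be appended to the user's description and generation stopped at the stop marker, so the model returns only the formula text.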
AJ Asver: I think for people that, you know, may have forgotten by now, cause it happened so quickly, right? GPT wasn't available through chat until November. So you basically had to just provide a prompt and it would do a completion, but it wasn't fine-tuned in the same way it is right now. Basically, it wasn't as good at following instructions as it is now. And so you had to do a lot more work to get it working, and then of course chat landed. Did it feel like you kind of were given this like gift where you'd be like, oh, this is gonna make it way easier? And then how did that kind of lead to where the product ended up?
David Kossnick: It was definitely a gift. We were super excited. It's also, as I'm sure you know, a very double-edged, uh, sword. Um, you know, prompt engineering is hard. It's brittle. You sort of make a tweak and move sideways, forward, and backwards all simultaneously for different sets of things.
David Kossnick: Um, and so there's definitely a new muscle on the team as we moved into sort of Turbo and GPT-4: how do we really evaluate which things it's doing well on and which it's doing poorly on? How do we make it really good for those use cases, both by setting user expectations and by changing the input? And when do we actually want GPT-3 with something fine-tuned?
David Kossnick: And so it sort of opened up a whole new can of worms in the problem space, which was super exciting. And I think one of the things that got me really revved up about, uh, you know, Coda specifically as a uniquely great surface for AI is there's so many different ways people use Coda. So many different personas, so many scenarios.
David Kossnick: It's an incredibly flexible tool. And so having a backend like ChatGPT is really, really useful as a fallback for any sort of long-tail, unusual, surprising request, cuz it does really well at the random thing. And one thing we've discovered is, you know, for the very narrow set of most common things, it does pretty well too, but not as well as the more specialized thing.
Making Coda AI work for lots of use cases
AJ Asver: So as you were talking about Coda being used for lots of different use cases, I noticed that because there's so many different templates in the gallery and so many different ways Coda has been used, there's like a pretty big community of Codens, right, that are building these different types of Coda docs. How did you think about it when it came to adding AI into Coda, to make sure it's versatile enough to be used in many different ways? How much did that impact kind of the end design and the user experience?
David Kossnick: You know, one of our biggest choices was to make AI a building block initially. And so it can be plugged in lots of different places. So you'll see as we get to a demo a bit later, there's a writing assistant, but there's also AI you can use in a column. And so you can use it to fill in data, you can use it to write for you, to categorize for you, to summarize for you, and so forth across many different types of content.
David Kossnick: Having that customizability and flexibility is really important. I'd say the other piece more broadly is there's been a lot of focus across the industry on what, how to make good output from AI models and benchmarks and what good output is and when do AI models hallucinate and lie to you and these types of things.
David Kossnick: I think there's been considerably less focus on good input. And what I mean by that is like, how do you teach people what to do with this thing? It's incredibly powerful, but also writing natural language is really imprecise and really hard.
AJ Asver: Mm-hmm.
David Kossnick: We had a user study early on. I remember it was super surprising.
David Kossnick: Someone asked our AI block how much money was in their bank account. And I was just blown away: it felt so knowledgeable and powerful that they assumed it would know that, even though they had never authenticated their bank account. They had just forgotten. It just felt like something that they would expect it to do.
David Kossnick: Um, and so how do you remind people sort of the universe of what's possible or not, or what it's good at or not? We have something very simple. You know, a lot like Google and Auto Complete as you type, you get suggestions underneath it. But a surprising amount of effort went into that piece in particular.
David Kossnick: What do we guide people towards? Which specific types of prompts? How do we make the defaults really good? How do we expose the right kinds of levers to show people what's possible and make them think about those? Um, and I think as an industry, we're still pretty early on that for these large language models. Like, I think we're gonna see a wave of innovation on how do you teach and inspire people how to interact, to have good input in order to get the good output.
AJ Asver: So we've talked a lot about that story behind Coda and AI, and it's really interesting to hear how you guys developed kind of the thesis around it and put into the product. Um, I think for folks that aren't familiar with Coda especially, I'd love to just jump in and for you to show us a little bit about how Coda works with AI and, and what it can actually do.
Demo: supercharging your team with Coda AI
David Kossnick: That sounds great. Yeah. Maybe I can walk you through a quick story about a team working together in a team hub with AI. And so this team hub is a place where different functions come together on a project, and have shared context.
David Kossnick: So, a super common way people use Coda is collecting feedback. Um, all sorts of feedback, support, tickets, customer feedback, uh, sales calls. Um, and so we have lots of integrations that do this. They pull in Zoom transcripts and content looks a lot like this. It's really rich.
David Kossnick: There's so much context in here, but it's really hard to turn this into something that's valuable for the whole organization. Um, I've spent many hours of my life writing up summaries and tagging things, and so wouldn't it be great if AI could just do this for me? Uh, and so here's a quick example. I'm gonna ask AI to summarize this transcript in a sentence, and here we go.
AJ Asver: That was really cool. And I think what was really magical for me about it is not just that you can summarize, because obviously you can take the transcript, you can put it in GPT-4 and summarize. And there are other tools that do summaries too. But I think what's magical is when you combine that AI in the column with the magic of Coda's existing integrations.
AJ Asver: So like you have connected it to, I think it's Zoom, right? There's like a Zoom pack that's already outputting all the transcripts. So you've automated that bit and then you create this formula that basically runs on that column and then every time a new transcript comes in, I presume it just automatically summarizes it.
AJ Asver: So that like piece of like connecting all those dots together, that's why I love Coda and that's why I think this is a really cool example of where kind of Coda shines.
David Kossnick: That's exactly right. And one of my favorite pieces here for dealing with large amounts of data is just categorizing things. There's so many times I'm going through a large table of data, picking a category. So wouldn't it be great if, based on this transcript, I could just automatically tag what type of feature request it was? And boom, there it is.
AJ Asver: where's it getting those feature requests from,
David Kossnick: Yeah,
AJ Asver: tags? Is it like kind of making those or?
David Kossnick: In this case, I already have a, a schema for what are the types of feature requests that I've seen, and so it's just going ahead and tagging all those things.
AJ Asver: That's a really interesting feature too, by the way, because now what you're doing is you're taking kind of the open-ended feature tagging problem, where really GPT could generate any feature tag it wants, and you're constraining it with this select list. And that's another good example of where, if you did this in ChatGPT, you may end up with lots of different types of feature tags, but by constraining it, you now end up with this format where you can now, I assume, go and organize these transcripts by feature tag, because each of those is like an actual little data chip, right? And so it's now like segmentable, like
David Kossnick: Yeah, and one of the cool things about it is you'll see this one is blank. That's actually intentional. That means it, it either couldn't find a match or didn't know what a match was. As there's plenty of cases where there's, you know, there's no real feature request or doesn't really know what to do with it, and it just won't tag anything either.
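Editor's sketch: the constraint discussed above, where the model may only answer with an existing select-list option and the cell stays blank otherwise, could be enforced on the model's raw text roughly like this. The function name and the example tags are hypothetical, and this is not a description of how Coda implements it.

```python
from typing import Optional, Sequence


def match_option(model_output: str, options: Sequence[str]) -> Optional[str]:
    """Return the allowed option the model named, or None to leave the cell blank."""
    # Normalize whitespace, trailing punctuation, quotes, and letter case.
    cleaned = model_output.strip().strip('."\'').casefold()
    for option in options:
        if cleaned == option.casefold():
            return option
    return None  # no match: leave the cell untagged rather than invent a tag


feature_tags = ["Image editor", "Sharing", "Search"]  # hypothetical select list

print(match_option(" image editor. ", feature_tags))
print(match_option("Dark mode", feature_tags))
```

Returning `None` instead of the closest guess mirrors the intentional blank in the demo: an untagged row is easier to spot and fix than a wrong tag.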
AJ Asver: Very cool. Very cool. Okay, what else you got?
David Kossnick: So very common. Maybe the support team does an amazing job summarizing all the tickets, the feedback that's coming in, even tags things for you. And then as a PM you have to sit and read it all and think about it and say, okay, how should this influence my roadmap? Wouldn't it be nice if you could use AI to get started?
David Kossnick: And so imagine you say, create a PRD for new image editor based on the problems in all this user feedback. Here we go.
AJ Asver: Okay. No need for PMs
David Kossnick: Yay.
AJ Asver: What are you gonna do after this goes into production?
David Kossnick: Well, of course it's a first draft. You know, you should always, uh, proofread. You should always change it. And so maybe AI can help with that too. Say, make this a little longer, um, and have it help me edit this PRD before I send it off. Um, there we go.
AJ Asver: One of the things I love about this is for me, often when I was PMing, it's that cold start problem. It's like you are at like Tuesday, you know, you've got your no meeting Wednesday and you've gotta write this PRD in time for like a review deadline on Thursday because it's gonna go in product review on Friday.
AJ Asver: Right? And you've gotta start it and you just keep putting it off cuz you've got back to back meetings. And then you get to Wednesday and you're staring at a blank screen and then you're like, Oh, maybe I need to go check my email. Right. I just think like more than anything else, this will just solve that cold start problem of just getting something down that you can start iterating on so you can just make progress faster.
AJ Asver: And now what would've taken you three or four hours to get from like blank screen to PRD first draft is now probably gonna be a couple of hours because you've got that first version, you can kind of iterate on it from there. So I think this is gonna be a huge help for PMs. And, the cool thing about it is you're taking all this structured data right, from different places and bringing it into one place.
AJ Asver: And one of the things we often did when we used Coda at Brex is like we would have, you know, kind of a place that had customer feedback, or you'd be aggregating different feature ideas in like a brainstorming section, and then you're kind of bringing them in here and turning them into a PRD.
AJ Asver: So that's pretty cool.
David Kossnick: Thanks. So another super common scenario is you have a team meeting. At Coda, we do these inside of a doc with structured data, which I really love. They let people vote on which agenda item to talk about first, and you can even have notes attached to it here about what's happening. But again, what do I do after the meeting?
David Kossnick: As a PM I spend a ton of time writing up next steps, but oh yeah, AI can do that for me. That's awesome. Uh, one of the other things I do all the time is write up summaries. Um, what if instead I could ask AI to do that too? And then of course I can send that out to Slack directly from Coda.
David Kossnick: So outreach. One that we've already started doing at Coda is personalizing messages to key accounts. Um, and so let's say we have this, uh, launch message about this new image editor feature. We wanna tailor it based on the title and the company.
David Kossnick: Uh, we can go ahead and get a, an easy first draft to start with here. Boom. Let's say we don't like one of these. I'm just gonna refresh this one. We'll get another example. Um, and maybe I want to go in and change this a little bit. Um, hope your family is doing well and using our Gmail integration. I'll just go ahead and send that email.
AJ Asver: And that's actually gonna send that email now to the, person that you wanna do outreach to. And you just basically generated that email based on kind of the context of the person's job
David Kossnick: Totally. And one of the really fun parts of this is it's super flexible. So imagine you had another column that's like family member names, or hobby, or other things like that that you're not gonna find in your favorite, uh, sales tool. Um, AI is really good at incorporating that context. So having all your stuff here in the team hub, being able to pull that context in, um, is really powerful.
AJ Asver: Does it generate tables too? How does
David Kossnick: You know, we showed a simple example here, which generates the table of target audiences. Um, but actually one, maybe I'll just show real quick, um, that I've been doing in my personal life. So very different kind of template. Um, we use meal planning, so I'm a vegetarian, so meal planning can be sometimes a bit of a pain with two kids, uh, making sure everyone gets exactly what they need.
David Kossnick: Um, and so I made a quick template. Everyone in my family can go in and add their favorite ingredients and get out both a bunch of ideas as well as, uh, specific dishes. And so this is an interesting case where it's just a normal table here. And I'll say, uh, AJ, what's your favorite ingredient?
AJ Asver: Well, you know what? My kids love
David Kossnick: Broccoli. Wow. Nicely done. Cool. And then I'll go ahead and, uh, auto update it here. Uh, so it has a bunch of different meal ideas and yeah, let's say I'll take, uh, spinach and cheese quesadilla. Um, and I'll add that one in here. Spinach and cheese quesadilla. Um, and AI is gonna start generating, uh, what ingredients are needed for that as well as a recipe and about how long it would take.
AJ Asver: That's a really, really awesome hack. I think I need to start doing that, as well to just use it to generate ideas for, meals as well and meal planning. That's, that's very, very cool. And this is like a good example of also like how it can be helpful in like a personal setting too.
AJ Asver: Right?
David Kossnick: For sure.
Using AI to speed up lead qualification
AJ Asver: I was not gonna bring you onto this podcast without you helping me with something. So one of the reasons I wanted to bring you on here is because now that we've started this startup and we're trying to get everyone to be very excited about our AI agents, I am doing what's called founder-led sales, which is where we kind of work out, okay, who are the companies that we wanna target?
AJ Asver: And then at the very early stages we're trying to find like five design partners that we can work with. And they're kind of like enterprise FinTech companies, similar to like Brex where I used to work. And my job is to do the sales. Cause there's only two of us right now and Miguel's busy building the product. And so I gotta work out which companies will be a good fit and then we work out, okay, how do we get an intro to them? Maybe it's through an investor, maybe it's through a mutual contact. Maybe we, outbound to them because you know, maybe it's a company that we worked at before or something. And so I've been trying to work out how to use Coda to help me do this.
AJ Asver: And I was wondering if you might be able to help me make my Coda CRM better with AI.
David Kossnick: Sounds amazing. Let's do it.
AJ Asver: I'm very excited about this because this is gonna save me a lot of time.
AJ Asver: Over here is my Coda CRM and, very high level, I have this list that I've got, and I think I got it from Crunchbase, um, with just a bunch of FinTech companies. And some of them might be a good fit, some of them might not. Oh, that's definitely not a FinTech company. Let's take that one out. Um, and I'm trying to work out kind of, one, how big are these FinTech companies? Are they kind of the size where they would be an interesting fit for our product? We generally try to focus on kind of growth stage FinTech companies. And then two, would they qualify or not, based on whether we think they might need the product we're trying to build? And so I was wondering, maybe the first thing is I have this kind of list of the different, um, tiers of customers that I might want, or the different types of companies that I might want to organize 'em into.
AJ Asver: So for example, there might be growth stage FinTech companies, there might be early stage FinTech companies. And I wanna work out how I can take these FinTech companies that I have in this list and kind of categorize them by that tier. Maybe we could start there.
David Kossnick: Yeah, sounds great. What kind of categories are you thinking about?
AJ Asver: I think maybe we can start with kind of early stage FinTech growth stage FinTech and a financial institution. So maybe first I need to add like a column, right? Is that correct?
David Kossnick: Yeah, sounds good.
AJ Asver: Okay. So we can go do that, have a column after this one. And then do I make this a select list? Is that
David Kossnick: Yeah, that's exactly right. So you could add, uh, select list
AJ Asver: Great.
David Kossnick: sort of a, a list of types or a list of items.
AJ Asver: Okay, so then let's, let's pick a few different stages. So it's growth stage, FinTech, um, maybe series C +, and then maybe early stage FinTech Seed Series C and then maybe kind of traditional financial institution. Cause I think there's a few of those in this list. And then I know there's probably some crypto startups in here too, so maybe I'll put like, crypto is another one too. Okay. So I've got my select list. What do I need to do next?
David Kossnick: Um, cool. So the, uh, actually before you forget, yeah, maybe rename the, the column there.
AJ Asver: So we call this customer segment. Great.
David Kossnick: So next, if you could just, uh, right click on that column and do add AI. Um, so the question is, what do you think would determine it? You know, one thing we could do is just give it the company name. For many that are well known, it might do actually a pretty good job.
David Kossnick: Um, you could also try giving it the, um, the description as well. And you could say basically, you know, what, what category does this belong to?
AJ Asver: I kind of like the idea of doing it by description. That seems like a good way of doing it. So then would I just do this like, um, pick is like kind of pick the correct customer segment based on the company description
David Kossnick: Mm-hmm.
AJ Asver: provided?
David Kossnick: That's perfect.
AJ Asver: Does that work?
David Kossnick: And then if you type @ and then the column name, it should pull it in for you.
AJ Asver: Okay. And then let's see, what's this company
David Kossnick: I think it was info.
David Kossnick: Yeah.
AJ Asver: Yeah. Great. And then how does it know which one to pick? Oh, it just kind of knows from what's available, right? Is that how it
David Kossnick: Yeah. In a select list or lookup, the AI knows to use that set of options.
AJ Asver: Ah, smart. All right. So let's, let's do it. Okay. Wow. So it's already started. Um, but what happens if we think one of them might be wrong? So, for example, Affirm. I don't know if maybe it needs a bit more data.
AJ Asver: So I'm wondering if a good one could be that we look at funding.
AJ Asver: And then last funding round will probably help. Um, and then maybe like employee count. I feel like this should all help work it out.
David Kossnick: Yeah. Sounds great.
AJ Asver: Okay. So then if we do that, um, then we need to go back to here and go to this segment thing. We close this. Okay. And then we go, okay. Provided. And then this is kind of my prompt engineering now. I usually like to do it like this. So company, maybe we'll do company name cuz that might be helpful too, right?
AJ Asver: It might know some of this stuff already. Company description, just info. Last funding round. Total funding. Okay. And now let's give this another try.
David Kossnick: All righty.
AJ Asver: Um, Fill.
David Kossnick: Wow. Very different result.
AJ Asver: Great. That's way better, right? Now it correctly categorized Affirm, which is awesome. Um, and Agora and Airwallex. Crypto for Anchorage, which is awesome. Apex, this is great. Now I have some qualification.
AJ Asver: I just ran the qualifier and it's actually started qualifying these leads, which is really cool. And I guess in the case where it doesn't have information, it'll tell me. If the company's a traditional financial institution, it might work that out. So I guess I could go through these and check these all later. But the cool thing is I now have some qualified leads that I can start, um, working out which ones to focus on for my sales.
David Kossnick: That's awesome.
AJ Asver: And reach out.
David Kossnick: Nice.
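Editor's sketch: the prompt AJ assembles above, an instruction plus references to several columns, could be built per row along these lines. The column names follow the ones mentioned in the conversation; the segment labels, the example row, and the wording of the final instruction are assumptions made for illustration, not Coda's actual prompt format.

```python
# Segment labels paraphrased from the select list set up in the conversation.
SEGMENTS = [
    "Growth stage FinTech",
    "Early stage FinTech",
    "Traditional financial institution",
    "Crypto",
]

# Columns referenced in the prompt, in the order they were added.
COLUMNS = ["Company name", "Info", "Last funding round", "Total funding"]


def build_prompt(row: dict) -> str:
    """Assemble a classification prompt for one row of the CRM table."""
    lines = [
        "Pick the correct customer segment based on the company details provided.",
        "",
        "Options:",
    ]
    lines += [f"- {segment}" for segment in SEGMENTS]
    lines.append("")
    for column in COLUMNS:
        # Missing values are stated explicitly so the model does not guess silently.
        value = row.get(column) or "unknown"
        lines.append(f"{column}: {value}")
    lines.append("")
    lines.append("Answer with exactly one option, or nothing if none fits.")
    return "\n".join(lines)


# Hypothetical row; "ExampleCo" is not a real company from the list.
example_row = {
    "Company name": "ExampleCo",
    "Info": "Payments API for online merchants",
    "Last funding round": "Series D",
    "Total funding": None,
}
prompt = build_prompt(example_row)
print(prompt)
```

Adding funding round and total funding to the prompt is what changed the result in the demo: the description alone says what a company does, while the funding columns hint at its stage.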
Coda's long term strategy with AI
AJ Asver: I am really excited to start, um, using my new AI-powered lead qualifier cuz I'm a one-man sales team and I'm already thinking of other ideas. Like earlier when you showed me generating emails, I think I'm gonna start trying that as well, like generating outbound emails or even introduction emails to the right customers, based on who the right investor connection is.
AJ Asver: And that stuff always just takes a lot of time. And I'll say, when I'm in front of a customer and pitching, I'm like the most energized. And I think through the sales process, when I'm in the CRM and trying to write emails and then qualifying leads, I'm like the least energized. So having AI help me there is gonna be really great to gimme a bit of a boost. Um, I'm curious, where do you guys see the long-term kind of impact of AI being for Coda, and how do you guys think about it for long-term strategy? And also, when will this be available for other people to try out?
David Kossnick: We are launching, the beta very, very soon. The beta will be very similar, but there'll be a lot more templates and resources, um, and we'll be letting in way more folks and so would love feedback.
David Kossnick: Please try it out. And in terms of where we're going with AI, I'm really excited about it. I mean, as I mentioned at the start, Coda is kind of a uniquely great place for AI because it brings so many different pieces of a team together in one place. And that context is so helpful for AI. And so the thing that gets me really excited is being able to ask your workspace a question about how a project is going, and it's able to just answer because it knows all the tools you're connected to, all the notes people are taking about every project.
David Kossnick: Um, you know, imagine you come back from being off for a week on vacation, you're like, what did I miss? And you get the perfect summary and you can drill into more detail and granularity on anything that you're curious about. Um, you know, imagine it felt more like having a teammate when you used AI.
David Kossnick: Who was able to engage on, um, on your projects and give feedback and have details. And obviously it's not exactly a teammate and you're gonna have to, to teach it about each, each case. Um, but I think the, the vision of having less busy work and more impact is really exciting.
AJ Asver: I think that potential of AI in Coda that you mentioned, beyond just a single doc, when you're using AI to really run your company, is gonna be a really, really, really powerful one. Because if you're kind of sold on using Coda as your wiki and to run your projects and your docs and your spreadsheets, then you guys basically have all the information, as you mentioned, that's required in kind of an internal knowledge base to answer these complex questions.
AJ Asver: Like what's the status of a project, who is working on what parts of the project? And so personally, that's, uh, something I'm very excited for because we use Coda for everything and we're a very small team right now. But I imagine as we get bigger and more people get involved, being able to ask those questions and being able to answer 'em is really, really cool.
AJ Asver: And so it's gonna be in beta soon, which is really, really awesome. And how are you, as a PM thinking about, you know, launching this Beta and what it means to kind of bring this into production? And do you have any tips for other PMs that are working on AI products? Because you are, you, you've been at this now for six months, right? Um, which puts you, I would say, in like the early percentage of PMs that are working with, you know, GPT. Um, curious what your tips are for, for other folks trying to bring a product from that hackathon to a beta and then a GA.
David Kossnick: You know, I'd say a lot of people start, like we did, by just sort of, you know, throwing AI behind a prompt box in your product. I think it's a great starting point to learn and see what people are doing. And I think as you develop a sort of deeper sense of what use cases are really valuable, um, you're gonna build something much more specific for them.
David Kossnick: One of the really fun things about working with AI is you don't know exactly how it's gonna go. You know, the models are a moving target in terms of what they're really good at, what they're really bad at, what the, what people expect of them as well. And so, uh, you know, a very common process I've seen a lot of teams do, us included is have some generic prompt box, some entry point into AI in their product and sort of see what people use and gravitate towards in their scenario.
David Kossnick: I think that's a great starting point to learn. As you have deeper conviction, building purpose-built things for those use cases is really, really valuable cuz it's just a lot less effort at the end of the day. Writing prompts is really helpful for a really specific thing, but it's a lot of work for something you just want done quickly.
David Kossnick: And so some of the stuff we've been working on over the last month or two is exacto knives, really specific things to open up exactly what you want in one scenario. And people have been loving them, which is great.
AJ Asver: I have one feature request, which is please automatically work out what type of column I need, based on like the description of the column. That seems like an easy one for AI to solve.
David Kossnick: You know, that's an interesting one cuz we actually do do it today, but in a subtle way. And yeah, this is like really in the weeds, but the text column type in Coda is actually an inferred column type. If you put a number in there or any other kind of structured piece of data, it will actually be able to, you know, operate on that data as if it were that type.
AJ Asver: Well, I am very, very much looking forward to seeing more AI in my Coda docs and also very excited to see where this all goes. I think what you guys are doing with Coda and AI is really, really cool and also just very helpful. Like, it saved me a lot of time, and I think it will for other people too. Um, once it's available in beta, I'm sure there'll be lots of new use cases that we haven't even seen yet.
David Kossnick: I did have one last question for you as well, which is, uh, you know, since you're working on AI every day on agents in particular, I'm curious how you think about sort of the future of agents in productivity. Like, what do you imagine agents are gonna do in finance and in every vertical, uh, you know, in collaboration with people?
AI agents
AJ Asver: That is something I spend a lot of time thinking about, um, in between like building actual agents. And right now I would say that we are vastly underestimating what's possible, based on what we've been able to achieve at Parcha with our agents and the fact that they can just follow a set of instructions from a Google doc and carry out, you know, a very complex compliance task, like,
David Kossnick: It's not a Coda doc? AJ!
AJ Asver: That's true. We should be using Coda docs. Yes. A Coda doc, sorry. Being able to follow the instructions and just carry out a task with the tools given, that's a very novel and pretty awesome, uh, thing to be able to do, because now you can automate a lot of repetitive tasks that have to be done very manually today. And so I imagine we're gonna see a lot of this idea of intelligent automation, as we've been calling it, where you aren't just doing the kind of robotic process automation or workflow automation that you did before, where you're connecting things together with conditional rules. But you're actually now using essentially prompt engineering to automate a task fully, with lots of different steps and lots of different tools being used, just by providing the instructions.
AJ Asver: In a way, you're, you are used to doing it already if you are solving this manually with the team, which is essentially a user manual, a set of, um, kind of procedures. And so you are gonna think about the many different use cases for that, not just in finance, but in processing forms in healthcare or insurance or in all kinds of other places where these manual, workflows are being done.
AJ Asver: I think it's gonna free up people to have a lot more time to do more interesting, creative, strategic work than doing this more repetitive, kind of tedious work that exists today. So we're very excited to see where that goes. And we're at the very early stages of it today, but, um, I think it's gonna move very quickly.
David Kossnick: Amazing.
Wrapup
AJ Asver: David, really appreciate it. Thank you for taking the time to demo Coda AI, for talking to us a little bit about the kind of story behind how this feature came about, and then for also helping me be more productive with my Coda workflows as well. I'm really excited to see the product launch soon in beta. And just a reminder for everyone: where can they find Coda AI if they wanna sign up for it?
David Kossnick: coda.io/ai.
AJ Asver: Awesome. Well, thank you very much, David, and thank you everyone for listening in to this week's episode of The Hitchhiker's Guide to AI, and I will see you on the next one.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
Interview: Human-level AI and AI Agents with Josh Albrecht, CTO of Generally Intelligent
14 May 2023 · The Hitchhiker's Guide to AI
Interview: AGI and developing AI Agents with Josh Albrecht, CTO of Generally Intelligent
I’ve been spending a lot of time researching, experimenting and building AI agents lately at Parcha. That’s why I was really excited I got the chance to interview AI researcher Josh Albrecht, who is the CTO and co-founder of Generally Intelligent. Generally Intelligent’s work on AI Agents is really at the bleeding edge of where AI is headed.
In our conversation, we talk about how Josh defines AGI, how close we are to achieving it, what exactly an AI researcher does, and his company’s work on AI agents. We also hear about Josh’s investment thesis for Outset Capital, the AI venture capital fund he started with his co-founder Kanjun Qiu.
Overall it was a really great interview and we covered a lot of ground in a short period of time. If you’re as excited about the potential of AI agents as I am, or want to better understand where research is heading in this space, this interview is definitely worth listening to in full.
Here are some of the highlights:
* Defining AGI: Josh shares his definition of AGI, which he calls human-level AI: a machine’s ability to perform tasks that require human-like understanding and problem-solving skills. It involves passing a specific set of tests that measure performance in areas like language, vision, reasoning, and decision-making.
* Generally Intelligent: Generally Intelligent's goal is to create more general, capable, robust, and safer AI systems. Specifically, they are focused on developing digital agents that can act on your computer, like in your web browser, desktop, and editor. These agents can autonomously complete tasks and run on top of language models like GPT. However, those language models were not created with this use case in mind, making it challenging to build fully functional digital agents.
* Emergent behavior: Josh believes that the emergent behavior we are seeing in models today can be traced back to training data. For example, being able to string together chains of thought could come from transcripts of gamers on Twitch.
* Memory systems: When it comes to memory systems for powerful agents, there are a few key things to consider. First of all, what do you want to store and what aspects do you want to pay attention to when you're recalling things? Josh’s view is that while it might seem like a daunting task, it turns out that this isn't actually a crazy hard problem.
* Reducing latency: One way to get around the current latency when interacting with LLMs that are following chains of thought with agentic behavior is to change user expectations. Make the agent continuously communicate updates to the user rather than just waiting for it to provide the answer. For example, the agent could send updates during the process, saying something like "I'm working on it, I'll let you know when I have an update." This can make the user feel more reassured that the agent is working on the task, even if it's taking some time.
* Parallelizing chain of thought: Josh believes we can parallelize more of the work done by agents in chain of thought processes, asking many questions at once and then combining them to reach a final output for the user.
* AI research day-to-day: Josh shared that much of the work he does as an AI researcher is not that different from other software engineering tasks. There’s a lot of writing code, waiting to run it and then dealing with bugs. It’s still a lot faster than research in the physical sciences where you have to wait for cells to grow for example!
* Acceleration vs deceleration: Josh shared his viewpoints for both sides of the argument for accelerating vs decelerating AI. He also believes there are fundamental limits to how fast AI can be developed today and this could change a lot in 10 years as processing speeds continue to improve.
* AI regulation: We discussed how it’s challenging to regulate AI due to the open-source ecosystem.
* Universal unemployment: Josh shared his concerns that we need to get ahead of educating people on the potential societal impact of AI and how it could lead to “universal unemployment”.
* Investing in AI startups: Josh shared Outset Capital’s investment thesis and how it’s difficult to predict what moats will be most important in the future.
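To make the latency and parallelization highlights above a little more concrete, here is a minimal sketch in Python. It is not Generally Intelligent's implementation: `ask_model` is a hypothetical stand-in for a slow LLM call, and `answer_in_parallel` is an illustrative name. It only shows the general idea of sending the user an early status update, asking several sub-questions at once, and combining the results.

```python
import asyncio


async def ask_model(question: str) -> str:
    """Hypothetical stand-in for a slow LLM call; swap in a real client here."""
    await asyncio.sleep(0.1)  # simulate network and generation latency
    return f"answer to: {question}"


async def answer_in_parallel(sub_questions: list[str]) -> str:
    # Send a status update right away instead of leaving the user waiting in silence.
    print("I'm working on it, I'll let you know when I have an update.")
    # Fan out: issue every sub-question at once rather than one after another.
    answers = await asyncio.gather(*(ask_model(q) for q in sub_questions))
    # Fan in: combine the partial answers, in question order, into one output.
    return "\n".join(answers)


if __name__ == "__main__":
    print(asyncio.run(answer_in_parallel(["first question", "second question"])))
```

Because the calls overlap, the total wait is roughly that of the slowest single call rather than the sum of all of them, which is the appeal of the approach when sub-questions do not depend on each other.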
Episode Links:
The Hitchhiker’s Guide to AI: http://hitchhikersguidetoai.com
Generally Intelligent: http://generallyintelligent.com
Josh on Twitter: https://twitter.com/joshalbrecht
Episode Content:
00:00 Intro
01:42 What is AGI?
04:40 When will we know that we have AGI?
05:40 Self-driving cars vs AGI
07:10 Generally Intelligent's research on agents
09:51 Emergent Behaviour
11:07 Beyond language models
13:17 Memory and vector databases
15:25 Latency when interacting with agents
17:13 Interacting with agents like we interact with people
19:08 Chain of thought
19:44 What do AI researchers do?
21:44 Acceleration vs. Deceleration of AI
24:05 LLMs as natural language-based CPUs
24:56 Regulating AI
27:31 Universal unemployment
Thank you for reading The Hitchhiker's Guide to AI. This post is public so feel free to share it.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
Enterprise AI, Augmented Employees, AGI and the Future of Work with Charlie Newark-French, CEO of Hyperscience
23 Mar 2023 · The Hitchhiker's Guide to AI
Hi Hitchhikers!
I’m excited to share this latest podcast episode, where I interview Charlie Newark-French, CEO of Hyperscience, which provides AI-powered automation solutions for enterprise customers. This is a must-listen if you are either a founder considering starting an AI startup for Enterprise or an Enterprise leader thinking about investing in AI.
Charlie has a background in economics, management, and investing. Prior to Hyperscience, he was a late-stage venture investor and management consultant, so he also has some really interesting views on how AI will impact industry, employment, and society in the future.
In this podcast, Charlie and I talk about how Hyperscience uses machine learning to automate document collection and data extraction in legacy industries like banking and insurance. We discuss how the latest large-scale language models like GPT-4 can be leveraged in enterprise and he shares his thoughts on the future of work where every employee is augmented by AI. We also touch on how AI startups should approach solving problems in the enterprise space and how enterprise buyers think about investing in AI and measuring ROI.
Finally, I get Charlie’s perspective on Artificial General Intelligence or AGI, how it might change our future, and the responsibility of governments to prepare us for this future.
I hope you enjoy the episode!
Please don’t forget to subscribe @ http://hitchhikersguidetoai.com
Thanks for reading The Hitchhiker's Guide to AI! Subscribe for free to receive new posts and support my work.
Episode Notes
Links:
* Charlie on Linkedin: https://www.linkedin.com/in/charlienewarkfrench/
* Hyperscience: http://hyperscience.com
* New York Times article on automation: https://www.nytimes.com/2022/10/07/opinion/machines-ai-employment.html?smid=nytcore-ios-share
Episode Contents:
00:00 Intro
01:56 Hyperscience
04:52 GPT-4
09:41 Legacy businesses
11:13 Augmenting employees with AI
15:48 Tips for founders thinking about AI for enterprise
20:34 Tips enterprise execs considering AI
23:49 Artificial General Intelligence
29:41 AI Agents Everywhere
32:12 The future of society with AI
37:44 Closing remarks
Transcript:
HGAI: Charlie Newark French
Intro
AJ Asver: Hey everyone, and welcome to the Hitchhiker's Guide to AI. I am so happy for you to join me for this episode. The Hitchhiker's Guide to AI is a podcast where I explore the world of artificial intelligence and help you understand how it's gonna change the way we live, work, and play. Now for today's episode, I'm really excited to be joined by a friend of mine, Charlie Newark-French.
AJ Asver: Charlie is the CEO of Hyperscience, a company that is working to bring AI into the enterprise. Now, Charlie's gonna talk a lot about what Hyperscience is and what they do, but what I'm really excited to hear Charlie's opinions on is how he sees automation impacting our future, both economically and as a society.
AJ Asver: As you've seen with the recent launch of GPT-4 and all the progress that's happening in AI, there are a lot of questions around what this means for everyday knowledge workers and what it means for jobs in the future. And Charlie has some really interesting ideas about this, and he's been sharing a lot of them on his LinkedIn, and I've been really excited to finally get him on the show so we can talk. Charlie also has a background in economics and management. He did an MBA at Harvard and previously was at McKinsey, and so he has a ton of experience thinking about industry as a whole, enterprise and economics, and how these kinds of technology waves can impact us as a society.
AJ Asver: If you are excited to hear about how AI is gonna impact our economy, our society, and how automation is gonna change the way we work, then you are gonna love this episode of The Hitchhiker's Guide to AI.
AJ Asver: Hey Charlie, so great to have you on the podcast. Thank you so much for joining me.
Charlie: AJ, thank you for having me. I'm excited to discuss everything you just talked about.
AJ Asver: Maybe to start off, one of the things I'm really excited to understand is how did you end up at Hyperscience and what exactly do they do?
Hyperscience
Charlie: Yeah, Hyperscience was founded in 2014. It was founded by three machine learning engineers, so we've been an ML company for a long time. My background before Hyperscience was in late-stage investing. I had sort of the full spectrum of outcomes there.
Charlie: Some wildly successful IPOs, some strategic acquisitions, and then a lot of miserable, sleepless nights on some of the other areas. I found Hyperscience and was incredibly impressed with their ability to take cutting-edge technology and apply it to real-world problems. We use machine vision, we use large language models, and we use natural language processing, and we use those technologies to speed up back-office processes.
Charlie: The best examples here are loan origination, insurance claims processing, customer onboarding. These are sort of miserable long processes, a lot of manual steps, and we speed those up. With some partners, we're taking it down from about 15 days to four hours.
Charlie: So all of that data that's flowing in of this is who I am, this is what's happened, this is the supporting evidence. We ingest that. It might be an email, it might be a document. It's some human readable data. We ingest that, we process it, and then ultimately the claims administrator can say, yes, pay out this claim, or no, there's something.
AJ Asver: Yeah, so what you guys are doing essentially is: you had folks that were previously looking at these documents, assessing these documents, maybe extracting the data out of these forms, maybe it was emails, and entering those into some database, right? And then a decision was made, and now your technology's basically automating that. It's kind of sucking up all these documents and basically extracting all that information, helping make those decisions. My understanding is that with machine learning, what you're really doing is you've kind of trained on this data set, right, in a supervised way, which means you've said, like, this is what good looks like.
AJ Asver: This is what, you know, extracting data from this form looks like; now we're gonna teach this machine learning algorithm how to do it itself. Now what I found really interesting is that that was kind of where we made the most advancements, really, in AI over the last decade, I would say.
AJ Asver: Right? It's like these deeper and deeper neural networks. They could do machine learning in very supervised ways, but what's recently happened with large language models especially is that we've now got this, like, general-purpose AI. You know, GPT-4, for example, just launched, and there was an amazing demo where I think the CTO of OpenAI basically sketched on, like, the back of a napkin a mockup for a website, and then he put it in GPT and it was able to, like, make the code for it.
AJ Asver: Right. So when you think about a general-purpose large language model like that, compared to the machine learning y'all are using, do you consider that to be a tool that you'll eventually use? Do you think it's kind of a threat to the companies that have spent the last, you know, 5, 6, 7 years, decades maybe, kind of perfecting these machine learning tools? Or, you know, is it something that's gonna be more like different use cases that won't be used by your customers?
GPT-4
Charlie: OpenAI, ChatGPT, GPT-4: the technology you're speaking about has really had two fundamental impacts. There's been the technology. It's just very, very cutting-edge, advanced technology. And then you've got the adoption side of it. And I think both sides are as interesting as each other.
Charlie: On the adoption side, I sort of like to compare it to the iPhone, in that there was a lot of cutting-edge technology, but what they did is they made that technology incredibly easy to use. There are a few things that OpenAI has done here that have been insanely impressive. First, they use human language. Humans will always assign a higher level of intelligence to something that speaks in its language.
Charlie: The other thing, it's a very small thing, but I love the way that it streams answers, so it doesn't have a little loading sign that goes around and dumps an answer on you. It's almost like it's communicating with you. It allows you to read in real time and it feels more like a conversation.
Charlie: Obviously the APIs have been a huge addition. It's just super easy to use, so that's been one big step forward. But it's a large language model. It's a chat bot. I don't wanna underestimate the impact of that technology, but my thoughts are AI will be everywhere. It's gonna be pervasive in every single thing we do.
Charlie: And I hope that chatbots and large language models aren't the limitation of AI. I'd sort of like to compare chatbots and large language models to search. The internet is this huge thing, and search is one giant use case, such that if you ask people what the internet is, they think it's Google. And that's the sort of way I think this will play out with AI, and whichever large language model and chatbot wins will be the Google of that world, which at the moment appears very clearly to be OpenAI.
Charlie: But there are some examples of stuff that, certainly right now, that approach wouldn't solve. I'll give you a few, but this list is tens of thousands of use cases long. We spoke about autonomous vehicles earlier. I suspect LLMs are not the approach for that. Physical robotics. Healthcare, detecting diseases in radiology. Fraud detection.
Charlie: I'm sure if you put a fake check in front of GPT-4 right now, like the mockup written on the napkin, it might be able to say, okay, this is what the word fraud means, this is what a check looks like. But you've got substantially more advanced AI out there right now that looks at: what was the exact check that Citibank had in 2021?
Charlie: Does this link up with other patterns that should be happening with this kind of transaction? And so I think that you are gonna have more dedicated solutions out there than just sort of one chat interface to rule them all, would be my guess. Yeah, Hyperscience: there are things that ChatGPT or GPT-4 does right now that we do.
Charlie: Machine vision is one that's an addition that appears to be working alongside their large language model. So they're combining different technologies versus just a large language model, is my guess. I obviously don't have work ins. But we build a platform here at hyper science that builds workflows, that enriches data, validates data, compares data looks at data that's historically come into an organization that might not be accessible to sort of public chat bots or large language models.
Charlie: I think the question that you sort of asked at the beginning, could we be using ChatGPT or GPT-4? Absolutely. And I think that a lot of startups could, but I suspect that what you'll see here is a lot of the startups that spin up leveraging this and building something far greater from a user experience for a very specific use case, versus OpenAI solving all the sort of various small problems along the way, if that makes sense.
AJ Asver: Yeah, I think that makes a lot of sense. And it's one thing I've been thinking a lot about. I actually wrote a blog post recently about this, about how these foundational models are gonna become more and more commoditized and it's gonna create this massive influx of products built on top of them.
AJ Asver: What I find really interesting is that you know, GPT, you can actually use that transformer for a lot of different things that aren't necessarily just a chatbot.
AJ Asver: Right. So you mentioned the fraud use case. If you send a bunch of patterns of fraud to a large transformer, its ability to actually remember things makes it very good at identifying fraud. And in fact, I was talking to a friend that worked at Coinbase on their most recent fraud control mechanism.
AJ Asver: They went from kind of linear regression models to deep learning models, to eventually actually using transformers, and it was far more effective. So I guess coming back to the question, do you see a world where, instead of building these focused machine learning models for particular use cases, like, you know, ingesting documents or maybe making sense of data and extracting that data and tabulating it into a database, you might one day end up actually just having a general pre-trained transformer that you are then essentially fine-tuning with one-shot kind of tuning? Like, this is how you extract a document for one of our clients. This is how you, you know, organize this information into, like, loan data. Is that a world we could move to? That's probably different from where we are today, and maybe a different world for Hyperscience too.
Legacy businesses
Charlie: Look, it would be a very different world. I think the next five, 10 years are about leveraging the kind of technology that OpenAI is building, and maybe that specific technology, as you sort of say, as some commoditized layer, and building workflows on top of that. I'll give you just the harsh reality of what the world looks like out there.
Charlie: Right? So this isn't just a single use case where I go and type something in as a consumer on the internet. At a bank in the UK, they have a piece of COBOL that was written in the 1960s that is still live in their mortgage processing.
AJ Asver: Wow.
Charlie: Rolling out, even just from a compliance level, any change to that mortgage processing that isn't done in piecemeal fashion, that doesn't think about the implications, that doesn't think about customer interactions in a week timeframe or a year or three-year timeframe, is just not dealing with the reality of the situation on the ground.
Charlie: These are complex processes. People get fired if you take a mortgage processing system down for minutes. And they're complex. So do I think that's a possibility in the future? It's absolutely possible. I think the best use of GPT-4 right now is to go and build the extensive workflows that require a little bit of old-fashioned AI, as well as cutting-edge AI, to have an end-to-end solution for a specific problem, versus assuming that we're ever gonna get something where you just say, okay, I'd like to know, should I give this customer this mortgage?
Charlie: And you get an answer back. To answer that question is still a very complex process.
Augmenting employees with AI
AJ Asver: Yeah, and I think we, especially in the technology industry, especially someone like me that spends so much time thinking about, talking about, and reading about AI, kind of forget that a lot of these legacy businesses can't move as fast as we think. I mean, we see Microsoft moving quickly and Slack moving quickly, for example.
AJ Asver: But those are all very software-focused, consumery businesses that, you know, aren't necessarily touching hard cash and stuff like that, where there's a lot more compliance and regulation. So that makes a lot of sense. So then what we really are thinking about is, you kind of have humans that can be, as you've put it before in some of your predictions around AI, augmented, right?
AJ Asver: This idea of an augmented employee that can use AI to help them get things done, but we're not necessarily replacing them straight away. Talk to me about what you see as a future of kind of augmented employees and kind of co-pilots, as they're also called.
Charlie: Totally. So the augmented employee is a phrase that I've been using for about 10 years. It's been a prediction for a while now. It didn't used to be a particularly popular one. You would get a whole load of reports, even from the big consultancy groups, that said these five jobs are definitely gone in five to 10 years.
Charlie: That five to 10 years has come and gone over that period of time, or I'll give you a longer period of time. Over the last 30 years, we've added 30 million jobs here in the us about a on average. Obviously, it's been a bit of fluctuation. There's no good sign on a short term decision making time horizon that jobs are gonna be wildly quickly displaced.
Charlie: There's very little evidence. That's my view. What do I mean by short-term horizon? I really mean the horizon on which a large enterprise, which is what my company serves and what I'm interested in serving, makes decisions: 5, 10, maybe even 20 years, as that's the sort of edge of where I think things start to really change.
Charlie: Fundamentally, you should make decisions around software, and AI in this case, substantially helping people do a better job. The first time I read this becoming sort of a mainstream idea was about a year ago. And by mainstream, I mean outside of the tech industry. The New York Times wrote an article where the title was something like "In the fight between humans and robots, humans seem to be winning."
Charlie: I think that was just a very interesting change of thought. And there was a line in there that says the question used to be, when will robots replace humans? The better question is, and I absolutely love this phrasing of it, when will humans that leverage robots replace humans that don't leverage robots?
Charlie: And I think that's the right way to think about it. I'll give you a couple of examples. One with sort of non-OpenAI AI technology, and then ChatGPT specifically, or GPT. Radiology is something that's been talked about for a while. This was a giant step forward where software AI could detect most cancers, most basic diseases, better than humans could. It just had higher accuracy.
Charlie: And the prediction for five, seven years was, this is the end of radiologists. We've seen no decrease in radiologists. If you want to go and get a cancer screening now, you're gonna probably look at a six to nine month wait.
Charlie: I don't have any issues, but I'm waiting for a cancer screening right now. Just a nice safety check that I want for my own benefit, and because of the sheer backlog of work, I can't get that done. I can't get it done for a while. So the future for me is, in two or three years' time, there are not fewer radiologists.
Charlie: There's just much higher accuracy and much shorter wait times. And maybe the future, as I say, which I'm sure we'll speak about, 20 years down the line, is I can just go to a website, they can do some scan of me, and then they can give me the answer within seconds. I can't wait for that, but it's just not here today.
Charlie: And I'll give you one quick OpenAI example, AJ. When ChatGPT came out, there were so many sort of "this is not ready for mainstream" things that went round. And the way that I thought about it is, if you want ChatGPT to write you a sublease because you want to lease your apartment, and you want it to be flawless, you just want to click send to the person that's taking the sublease on, it's nowhere near ready for mainstream. If you wanna cut down a legal person's work by 90% because the first draft is done, and they're gonna apply local laws, they're gonna look at a few nuances, they're gonna understand the building a bit, then it's so far beyond ready for mainstream. It should be used by everybody for every small use case it can. So I think it's human augmentation for a while. I think that jobs don't go away for a while, and I sort of like to compare it to the internet a little bit here, which is: we use the internet today in every single thing we do and it makes our jobs substantially easier to do. It makes us more effective at them, and that's what I think the next sort of 10 years at least looks like for AI within the workplace.
Tips for founders thinking about AI for enterprise
AJ Asver: The thing you mentioned there that I find really fascinating is this idea that, you know, we're not gonna replace humans immediately. That's not happening. But people thought that for a long time. Right. And it almost feels like with every new wave of technology, there's this new hum of people saying, like, we're gonna replace humans.
AJ Asver: Right. But at the same time, I kind of agree with you. Having spent a lot of time using ChatGPT and working with it, I found that it certainly augments my life in writing my Substack. In fact, preparing for this interview, I actually asked it to help me think about some interesting questions to ask you based on some of the posts you'd written.
AJ Asver: Because I'd read some of your posts on LinkedIn fairly regularly, but I couldn't remember all of them, so I actually asked the Bing AI chat to help me. Right. And then when you think about these especially regulated environments where the difference between right and wrong is actually someone's life or a large sum of money or breaking the law, then it really matters. And in that case, augmentation makes a lot of sense. Now, the reality is, AI, especially large language models and building on top of OpenAI, has a fairly low barrier to entry right now. That's why we're seeing a lot of companies in copywriting, in collaboration, in presentations, and the challenge with that is, if there's an incumbent in the space that already exists, it's very hard to beat them on distribution right now. Where I did see an interesting opportunity is exactly what you are talking about: going deep into a fairly fragmented industry, which maybe has a lot of regulation or a lot of complexity, maybe disparate data systems.
AJ Asver: You mentioned kind of the 30-year-old, like, COBOL data system, right? That is a perfect place where you can go in and really understand it deeply. Now, as someone that's running a company that does that, I'm curious, what advice do you have for founders or startups that wanna take that path of, like, hey, we're gonna take this AI technology that we think is extremely powerful, but go apply it in some deep industry where you really have to understand the ins and outs of that industry, the regulation, the way people operate in that industry, and the problems?
Charlie: Absolutely, a few thoughts. Firstly, make sure that you are trying to solve a problem. This is just general advice for setting up a business. What is the problem you're trying to solve? What is the end consumer pain point? For us here at Hyperscience, it's that people wait for their mortgage to be processed for six weeks.
Charlie: No good reason why that's happening. People wait for their insurance claims to be paid out, sometimes for two years. No good reason for that to be happening. So always start with the customer pain point, and then decide: does the current technology, which is AI in this case, allow you to solve it?
Charlie: And then that gets you to the question, does it allow you to solve for it? And what I've looked for here is this: OpenAI or GPT-4 can do a whole load of diverse stuff, but your highest-value things are gonna be what's just happening time and time again. For us, there is just a whole load of mortgages that are processed right now, or there is just a whole load of insurance things that are processed, and they're relatively similar.
Charlie: Although we think of them as different, they've got a lot of similarities, certainly to a machine. So I'd look for volume. You can think of this as your total addressable market in terms of traditional VC, non-AI speak. But this is: is the opportunity big enough? And then the next thing I'd look for is repetitive tasks.
Charlie: So the more repetitive it is, the easier it is. You can go out and solve something really, really hard with a large language model. But there are probably even easier applications that you can solve that are just super repetitive and you can take steps out. So I think that's it. Have the problem in mind.
Charlie: Think about volume, think about repetitive natures, and then one of the key things, once you've got all of that set and you've got, okay, this is an industry that's right for customer pain, ripe for disruption, this is definitely a good application of where AI is today: I would think about ease of use above everything.
Charlie: My thinking is, and I spoke about this with OpenAI, one of the biggest things they've done is they've just taken exceptional technology but made it so, so simple for someone that doesn't know AI to interact with. And the question I always get asked as the CEO of an enterprise software AI company is, how can we upskill all of our employees?
Charlie: The answer to that is you shouldn't have to. This software should be so easy to engage with that your current employees should seamlessly be able to do it. If there is rudimentary training needed, your software should do that. And again, I like to compare this to the internet. We use the internet day in, day out.
Charlie: There has been 20 years of upskilling, but it's not really been that hard. Like, I think if you took today's internet and you gave it to somebody 20 years ago, it might be a little bit advanced for them, but we've made software, internet software, so easy to work with that you don't need to know how the internet works.
Charlie: The number of people that know how the internet works, even the number of people that know how to truly code a website, is an absolutely tiny fraction of the number of people that use the internet to improve their daily lives. So I'd say ease of use for AI is possibly as important as the technology.
Tips enterprise execs considering AI
AJ Asver: I love those insights, by the way. And just to recap: you said go for a problem that has a lot of volume, where you're solving a real problem for end users, but there's clear volume or, like, you know, a large addressable market. The other thing you mentioned was make sure it's repetitive tasks. Like, with LLMs there's a temptation to go after these really complex problems, but repetitive tasks are the ones where there's probably the most incentive to solve as well.
AJ Asver: Right. And then the last thing you mentioned is, it should be really intuitive for an end user to use, to the point where they don't have to feel like they need to be upskilled. Now, if you are a founder or a startup going down this path, the other thing you're thinking about is, how do you sell into these companies?
AJ Asver: So maybe taking the flip side of that, if you are in the enterprise and you're getting approached by all these AI startups, they just got funded this year, being like, we're gonna help you do this. We're gonna help you do that. We're gonna automate this. How do you decide when it's the right time to make that decision?
AJ Asver: How do you decide kind of the investment on that and whether it's worth it? Like, what are your thoughts on that?
Charlie: My thoughts on that are linked directly to the economic cycle we're in right now, which is not a pretty one. Somewhat of a, maybe a mild recession, maybe the edge of a recession. And I see this from all of the CIOs and CEOs that we work with at the sort of large banks, large insurance companies.
Charlie: And my suggestion is this: I tell them to create a two by two matrix. You told everyone earlier, I started my career at McKinsey.
AJ Asver: Classic two by two.
Charlie: Love it. Two-by-two matrix. On one axis is short-term ROI, on the other axis is long-term ROI, and you want to get as much into the top right as possible and as little into the bottom left as possible.
Charlie: For a while, artificial intelligence was considered long-term ROI and not short-term ROI. AI projects were treated by these large companies as science experiments, and you saw whole roles and whole departments form around transformation. The digital transformation officer is a role that just didn't exist five or 10 years ago. These people were there to go and innovate within the organization and, largely speaking, it wasn't wildly successful. A lot of these roles are spinning down. You need to solve a business problem that the technology solves today and that gives you a path to the long run. So at Hyperscience, I'll give you an example here: we add value out of the box, but we also understand where people are today and try to get them to where they want to go.
Charlie: For one of our customers, 2% of what they do is process faxes. I hope that they are not processing faxes in five years' time, and I hope that we are giving them the bridge to that, but we'd better be able to do that today, and also paint them what I refer to as a future-proof journey to where they want to head.
Charlie: So I think it's really about this: don't do any five-year projects. If you ask a company, "Can you do this?" and they say, "Well, we could do that," I would run. The same goes if they're just saying yes to everything versus yes to something quite specific. A good startup, and you know this better than anyone, AJ, does something specific really well and then builds out from that. They have an MVP is one way of phrasing it.
Charlie: They have a niche is another way of phrasing it. Go and sell something today really well, and all of those long-tail features around it people will forgo for a period of time whilst you build them. But you add value in the short term as well as building something for the long run.
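As a rough illustration of the two-by-two matrix Charlie describes, the sketch below sorts candidate projects into quadrants by short-term and long-term ROI. The `Project` class, the example projects, their scores, and the 0.5 threshold are all made-up assumptions for illustration; nothing here comes from Hyperscience or from a real scoring method.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    short_term_roi: float  # assumed score between 0 and 1
    long_term_roi: float   # assumed score between 0 and 1


def quadrant(project: Project, threshold: float = 0.5) -> str:
    """Place a project in one cell of the two-by-two matrix."""
    high_short = project.short_term_roi >= threshold
    high_long = project.long_term_roi >= threshold
    if high_short and high_long:
        return "top right"        # value today and a path to the long run
    if high_short:
        return "short-term only"
    if high_long:
        return "long-term only"   # the "science experiment" risk
    return "bottom left"          # avoid


# Hypothetical projects with invented scores.
projects = [
    Project("fax processing today", 0.8, 0.7),
    Project("five-year science experiment", 0.1, 0.6),
    Project("vague yes-to-everything pitch", 0.2, 0.2),
]

for p in projects:
    print(f"{p.name}: {quadrant(p)}")
```

In practice the hard part is estimating the two scores, not the sorting; the sketch only makes the "top right versus bottom left" framing concrete.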
Artificial General Intelligence
AJ Asver: Yeah. That point you made about showing the short-term value is really important, especially when you're trying to counter the biases around AI that exist in enterprise today: that it's, as you mentioned, a hobby project or an experiment, or one of your moonshot projects. You want to show them that this is ROI they're going to get very quickly, in the next one or two years, and that's a really important point. Now all of this makes sense, the augmented employee, the co-pilot, having these narrow versions of AI that solve particular problems, and I can see that working out. But I feel like there's one big factor we have to think about that could change all of this, in my view.
AJ Asver: And that is artificial general intelligence. For folks that don't know what artificial general intelligence is, or AGI as it's called, that's really what OpenAI is trying to achieve long term. It's the idea of essentially having intelligence that is the equivalent of a human, with an ability to think abstractly and to solve a broad range of problems in the way that a human does. What that means is, technically, if you have an AGI and, let's say, the cost of running that AGI is a hundredth or a thousandth of the cost of running a human, then potentially you could replace everyone with robots, or AI as it were.
AJ Asver: How does that factor into this equation? Is this something you think about? What are your thoughts about it, both as CEO of an enterprise company and as someone that's studied management and economics for the last decade? I'm really curious to hear where you think this is going.
Charlie: I don't think there's anything unique about a soul, or something in the human mind that can't be replicated. And to your point, when I hear definitions of AGI, I hear human-like or human-level intelligence. If this happens, or when it happens, because there's no doubt it will, it will be substantially smarter than a human, incredibly quickly. Look at the difference in intelligence between humans: someone you just pick off a street, with a 110 IQ or whatever level it is, versus an Einstein with a 170 IQ. That difference is enormous. Now imagine that the equivalent isn't 170 IQ, but a thousand or 10,000 or whatever it is. I think that if you have AGI, extremely quickly you will be far beyond it not being able to do any job. There will be absolutely zero it can't do.
Charlie: Now, I don't see that today. My best guess of a time horizon is post 20 years, sub a hundred years. That's a nice vague timeframe, but that's how I think about it. 20 years is your classic decision-making timeframe, and for someone building or running an enterprise software company, what do we do with AGI is not an interesting question.
Charlie: For someone thinking about designing a society, thinking about economic systems, thinking about regulation, it's an extremely good time to start thinking about those questions. Let me start, AJ, by speaking about why I don't think it's here today. And then we can perhaps think about what a world where it is here looks like, which I'm quite excited about, by the way.
Charlie: I don't view it as a dystopian outcome. Our current approach to AI today is machine learning. We spoke about that earlier today. Machine learning requires three things: compute, algorithms, and data. And on all three of them, I think, to have true AGI we're not there. On compute power, I think we need some leap forward.
Charlie: It might be quantum computing. There's a lot of exciting stuff happening there. On the timeframes, I'm not as close to that as I am to AI, but no one's speaking about quantum computing on a one-year time horizon. They're speaking about it, again, on a 10 or 20 year time horizon. The second thing is the algorithms.
Charlie: From what we see out there, even with the phenomenal stuff that OpenAI is building, I don't see algorithms out there doing true AGI. On the large language models, I will say that I'm incredibly impressed with how GPT-4 plays chess. It's still not at the level of an algorithm that is designed specifically for chess, but it's pretty damn good. So my thoughts on the algorithms evolve every day, but certainly we're still not there today. And then one of the big hurdles is gonna be data. A human ingests data rapidly, all the time, via a series of senses. You can think of that as five senses. You can think of it as 27 senses.
Charlie: People have a different perspective on this, but there's just data flowing into us the whole time, and at the moment we don't have that in the technology space. If you want to solve the autonomous vehicle, you've got a whole... They do cameras and the visual aspect extremely well. But to solve true AGI-level stuff, to go beyond playing a 2D chess game to processing a mortgage, I think there's also gotta be a new way of ingesting data.
Charlie: Now, one interesting question that I've always wondered about is, what will the first iteration of AGI look like? I don't think this is the end state, because I think the end state's a lot smarter than this. But consider the first version of what we would consider AGI, and general intelligence just means it can do many diverse things, learn from one instance, and apply that learning to another instance.
Charlie: It could just be a layer that looks like a chatbot, that looks like GPT-4 or GPT-10, whatever it ends up being, that plugs into different specific narrow AIs. So if you want to get in a car and go somewhere, you talk to GPT-4 and you say, I'm looking to go here. And that just plugs into some autonomous vehicle algorithm.
Charlie: That could be the first way. It'll feel like general intelligence and it will be general intelligence. Or you might just have some massive change in the way algorithms are written. And I do think there's a lot of exciting stuff happening there. It's just not clear what the timeframe is for that.
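The "chat layer that plugs into narrow AIs" idea can be sketched as a simple router. This is a toy under stated assumptions: the handler functions stand in for narrow models, and the keyword matching stands in for whatever a real language model would do to pick a tool. None of these names refer to an actual product or API.

```python
import re
from typing import Callable, Dict


# Hypothetical stand-ins for narrow, single-purpose models.
def driving_model(request: str) -> str:
    return f"[driving model] planning a route for: {request}"


def chess_model(request: str) -> str:
    return f"[chess model] choosing a move for: {request}"


# Assumed keyword-to-handler table; a real system would let the
# language model decide which narrow model to call.
NARROW_MODELS: Dict[str, Callable[[str], str]] = {
    "go": driving_model,
    "drive": driving_model,
    "chess": chess_model,
}


def chat_layer(request: str) -> str:
    """Route a natural-language request to the first matching narrow model."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    for keyword, handler in NARROW_MODELS.items():
        if keyword in words:
            return handler(request)
    return "[chat layer] no narrow model matched; answering directly"


print(chat_layer("I'm looking to go here"))
print(chat_layer("Let's play chess"))
```

From the user's side there is one conversational interface; the specialization lives behind it, which is why such a system could feel general even though each component is narrow.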
AI Agents Everywhere
AJ Asver: Yeah, I like that last bit you talked about. I really like that as a way to think about how AI will evolve. I think some people call it the agent model, where you have essentially this LLM, a large language model like GPT, acting as an agent, but it's actually coordinating across many different integrations, maybe other deep learning models, to get things done for you.
AJ Asver: And so the collective intelligence of all those things put together is actually pretty powerful and can feel like, or have the illusion of, artificial general intelligence. For me there's this philosophical question: if it's as good as the thing we want it to be, does it matter if it doesn't meet some nuanced academic definition of AGI? You know what I mean? If it does all the things we'd want of a really smart assistant that's always available, but it doesn't meet the specifics of AGI in the academic sense, maybe it doesn't matter. Maybe that's what the next 20 or 30 years looks like for us.
Charlie: Look, I think that's exactly right, and there's no good reason for us to care. We just care that it gets done. We have no idea how the mind even works. We're pretty sure that the mind doesn't say, okay, I've got this little bit for playing chess, this little bit for driving; it has some different way of doing it.
Charlie: But humans are very attached to replicating. Things that they experience and understand. And one just very simple way of doing it is a change of the definition of AGI from what your average person might associate AGI with.
AJ Asver: Yeah, that to me is a mental shift that I think will happen. One of the things I've been thinking about, and this is a huge reason why I started this newsletter and this podcast, is that these things happen exponentially and very quickly.
AJ Asver: You don't really realize it when you look behind you on an exponential curve, because it's flat, but when you look forward, it's steep. I always talk about this quote from Sam Altman, because it's really seared into my head. I think what we're gonna see is this exponential increase in these agents that essentially help you coordinate your life: in work, in meetings,
AJ Asver: in getting to where you want to go, in organizing a date night with your significant other. And you are suddenly gonna be surrounded by them. You'll forget about this concept of AGI because that will become the norm, in the same way that the internet age has become the norm and being constantly connected to the internet is part of our normalcy in life.
AJ Asver: Right. This has been a fascinating conversation. I have one more question for you. As someone that's been going deep into the AI space, maybe from the enterprise side as well, what are you most excited about in AI for the next few years?
The future of society with AI
Charlie: Yeah. Look, I'll talk about two things. Firstly, I think one of the things that I'm excited about in the short term is just the growth in education. The single biggest thing that I think has happened with GPT is the mass, fast, easy adoption. When the internet became very, very interesting, it was when you got mass distribution.
Charlie: People creating use cases. That's when you went from, okay, the internet could be a search engine to organize data, the internet could be a place to buy clothes, the internet could be a place to game, to, okay, the internet is just everything. So I'm excited for that. And then in the long run, it would be remiss of us not to discuss this.
Charlie: I'm excited about thinking about what a new economic system looks like. People talk about universal basic income. I don't think that the enterprise should be thinking about this question today. Our customers at Hyperscience aren't thinking about this question today, but now is what I would call the ideation stage.
Charlie: We need think tanks, governments, people out there thinking of ideas, and then eventually stress testing ideas for what the future could look like. And I'm excited for that.
AJ Asver: Yeah, I want to believe that it will happen, but I'm a little skeptical, just given the history of how humankind behaves. We're not particularly good at planning for these inevitable but hard-to-grasp eventualities, in a similar way that we kind of all knew interest rates were going up, but it was really hard to understand what the implications of that were until last weekend when we found out, right, the
Charlie: It was very much we could have prepared for this.
AJ Asver: Yeah. And yet we could have prepared, because all the writing was on the wall. You've got these people at the fringes that are ringing the alarm bells, whether that's people working in AI ethics or even Sam Altman of OpenAI himself saying, hey, we actually can't open source our models anymore, it's too dangerous to do that. And then you've got people on the other side that are like, no, we need to accelerate this. It needs to be open. Everyone needs to see it. And the faster we get to this outcome, the faster we'll be able to deal with it. Skeptically, and unfortunately, I believe that we're gonna stumble our way into this situation.
AJ Asver: We'll have to act very quickly when it happens. And maybe that's just like the inevitable kind of like journey of mankind as a whole, right? But it's still exciting either way.
Charlie: Well look, I think so. If we don't stumble our way into it, we create a world where people don't have to work, and I'm pretty excited about that. I'd be going and studying philosophy for two years at NYU, I'd be playing a hell of a lot of tennis, I'd be traveling more. There is a good outcome.
Charlie: If we just stumble through it, my thinking, AJ, is that this is what the world looks like, and it's not pretty. It looks like three classes of people. You have your asset owners; you can think of them as today's billionaires. They will probably end up controlling the world. There might be some fake democracy out there, but when you have all of the AI infrastructure owned by ultimately a small group of people, you're probably gonna have them influencing most decisions.
Charlie: You may then have this second class, a celebrity class. I think that may still exist. Human sports may still exist. Human movies, human celebrities may still
AJ Asver: Yep.
Charlie: and you just get this class of 99.9% of people that are everyone else. And here are the two features of what their life is gonna look like.
Charlie: This is just my guess of the way it goes if we don't think about it in a bit more of an interesting way and plan for it. The first is universal basic income. Everyone gets the same amount, probably not very much. I don't know that that's gonna be particularly inspiring for people. I think there are better ideas. And then, and this is very dystopian, I think that you end up living more in virtual reality than in reality. The shortest path to a lot of what you might want to create is just to create it in a virtual world versus going and creating all of that in a physical world. All that is to say, if we don't start thinking about it, don't start having some regions test different models, having startups
Charlie: form ideas around this and come up with ideas that are then adopted by bigger countries, I think you could end up with a bad outcome. But I think if it's planned, you could end up with an insanely cool world.
AJ Asver: Yeah. So we're gonna end on that note, that those two worlds are what we have to either look forward to or dread, depending on which way you think about it. And I think for folks listening, it's really important, to begin with, that people just understand. Society can only understand if individuals understand where this technology is going.
AJ Asver: Right? And that's where you obviously are helping by communicating on LinkedIn to the many people that follow you, especially in industry. I try to do it with this podcast. But for anyone that's fascinated by AI and following along, I think the number one thing you can do right now is share with your friends some of the things that are happening in this space, to help people get a grasp of how the space is moving. That will also help you advocate for us to do the right thing, which is to prepare for it. If people think this is a long way away and they don't understand it, just like you said for industry, then no one is incentivized within government to do it, because their own constituents don't care about it.
AJ Asver: Right? But if we're like, hey, this is happening, it's an exciting technology, but there are two ways this could go and one is the better way, then I think it's just as important that we as individuals care about it and advocate for it. This was a fascinating conversation. Thank you so much for the time, Charlie.
AJ Asver: I really appreciate it. I cannot wait for folks to listen to this episode, and thank you again for joining me on The Hitchhiker's Guide to AI.
Closing remarks
Charlie: Well, thank you for having me here, AJ.
AJ Asver: Awesome. Well, thank you very much and for anyone that's listening, if you enjoyed this podcast please do share it with your friends, especially if you have either founders that are considering building AI startups and going into enterprise, or if you have folks that are working in industry and are considering incorporating AI into their products.
AJ Asver: I think Charlie shared a lot of really great insights there that I think folks would appreciate hearing. Thank you very much, and we'll see you on the next episode of The Hitchhiker's Guide to AI.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
How to prompt like a pro in MidJourney with Linus Ekenstam
16 mar 2023· The Hitchhiker's Guide to AI
Note: This episode is best experienced as a video: https://www.youtube.com/watch?v=KDD4c5__qxc
Hey Hitchhikers!
MidJourney V5 was released yesterday, so it felt like the perfect opportunity to do a deep dive on prompting with a fellow AI newsletter writer, Linus Ekenstam. Linus creates amazing MidJourney creations every day, ranging from retro rally cars to interior design photography that looks like it came straight out of a magazine. You wouldn’t believe that some of Linus’s images are made with AI when you see them.
But what I love most about Linus is his focus on educating and sharing his prompting techniques with his followers. In fact, if you follow Linus on Twitter you will see that every image he creates includes the prompt in the “Alt” text description!
In this episode, Linus shares how he went from designer to AI influencer and what generative AI means for the design industry, and we go through a few examples of prompting in MidJourney live. One thing we cover that is beneficial for anyone using MidJourney to create character-driven stories is how to create consistent characters in every image.
Using the tips I learned from Linus, I was able to create some pretty cool Midjourney images of my own, including this series where I took 90s movies and turned them into Lego!
I also want to thank Linus for recommending my newsletter on his substack, which has helped me grow my subscribers to over a thousand now! Linus has an awesome AI newsletter that you can subscribe to here:
I hope you enjoy the episode and don’t forget to subscribe to this newsletter at http://HitchhikersGuideToAI.com.
Show Notes
Links:
- Watch on Youtube: https://bit.ly/3mWrE5e
- The Hitchhikers Guide to AI newsletter: http://hitchhikersguidetoai.com
- Linus's twitter: http://twitter.com/linusekenstam
- Linus's newsletter: http://linusekenstam.substack.com
- Bedtime stories: http://bedtimestory.ai
- MidJourney: http://midjourney.com
Episode Contents:
00:00 Intro
02:39 Linus's journey into AI
05:09 Generative AI and Designers
08:49 Prompting and the future of knowledge work
15:06 Midjourney prompting
16:20 Consistent Characters
28:36 Imagination to image generation
30:30 Bonsai Trees
31:32 Star Wars Lego Spaceships
37:57 Creating a scene in Lego
43:03 What Linus is most excited about in AI
46:10 Linus's Newsletter
Transcript
Intro
aj_asver: Hey everyone, and welcome to The Hitchhiker's Guide to AI. I am so excited for you to join me on this episode, where we are going to do a deep dive on MidJourney.
aj_asver: MidJourney V5 just launched, so it felt like the perfect time for me to jump in with my guest, Linus Ekenstam, and learn how to be a prompting pro.
aj_asver: Linus is a designer turned AI influencer. Not only does he have an AI newsletter called Inside My Mind, but he's also created a really cool website where you can generate bedtime stories for your kids, complete with illustrations. And he is a MidJourney prompting pro. I am constantly amazed by the photos and images that Linus has created using MidJourney. It totally blows my mind.
aj_asver: From rally cars with retro vibes to bonsai trees that have candy growing on them, and most recently hyper-realistic photographs of interior design that look like they came straight out of a magazine, Linus is someone I cannot wait to learn from. He's also going to share his perspective on what all this generative AI means for the design industry, which he has been a part of for over a decade. By the way, it's worth noting that a lot of the stuff we cover in this episode is very visual. So if you're listening to this as an audio-only podcast, you may want to click on the YouTube link in the show notes and jump straight to the video when you have time.
aj_asver: So if you're excited and want to learn how you can take the ideas in your head and turn them into awesome images, then join me for this episode of The Hitchhiker's Guide to AI.
aj_asver: Thank you so much for joining me on The Hitchhiker's Guide to AI. Really glad to have you on the podcast. I feel like I'm gonna learn so much in this episode.
Linus Ekenstam: Yeah. Thank you for having me.
Linus Ekenstam: I mean, I'm not sure about the prompt guru part, but let's try.
aj_asver: Well, I mean, you tweet about your prompts every day on Twitter, and they seem to be getting better every time. So you are my source of truth when it comes to becoming a great prompter. I also, by the way, love the one thing you do when you tweet the MidJourney pictures that you've created: you always add exactly what the prompt was in the alt text on Twitter. I found that really helpful, because when I'm trying to work out how to use MidJourney, I look at a lot of your alt texts. So I'll also include a link to your Twitter handle so everyone
Linus Ekenstam: Nice
aj_asver: it out. But I guess
Linus Ekenstam: I guess I'll stop.
aj_asver: you know, you've been in the tech industry for a while as both a designer and a founder as well
Linus Ekenstam: Yeah. Yep.
aj_asver: love to hear your story on what made you, um, kind of get excited about AI and starting an AI newsletter and then, you know, sharing everything you've been learning as, as you go.
Linus's journey into AI
Linus Ekenstam: Yeah. I mean, if we rewind a bit and start from the beginning, I got into the tech industry a little bit on a banana skin. I started working in the agency world when I was 17. I'm 36 now, so that's 19 years ago. Time flies. After working with customers and clients, big ones as well, through my initial years there, I kind of got fed up with it.
Linus Ekenstam: And I went into my first SaaS business as an employee, and it was email, way, way before this day and age, where you had to code everything using tables and transparent GIFs. It was just a different world.
Linus Ekenstam: 2012 was when I started my first own business. That was my first foray into the startup world, or building something that was used by people outside of the vicinity of Sweden or the Nordics. It was very interesting times. I've always been kind of early when it comes to new tech. I consider myself a super early adopter. I got Facebook as one of the first people in, by hacking, or social hacking, a friend's edu email address. I got an MIT email address just so I could sign up on Facebook.
Linus Ekenstam: So now that we are here, I've been touching all of these steps, all the early tech, every single time, but I never really capitalized on it, or I never really pushed myself into a position where I would contribute. But this time around I felt like I had a bit more under my belt.
Linus Ekenstam: I've seen these cycles come and go, and I just get really excited about it. Like, oh s**t, this is the first time ever that I might get automated out by a machine. So my fight-or-flight response to this was just: learn as much as possible as quickly as possible, and share as much of my learnings as possible to help others
Linus Ekenstam: not end up in the same position where they fear for their lives.
aj_asver: Yeah, it's interesting you talk about that, because I think that's a huge motivator for me as well: just helping people understand that this AI technology is coming. It's not like it's gonna replace everyone's job, but it certainly is gonna change the way we work and make the way we work very different. As you've been sharing how to prompt and what it means to use AI, one of the things I've noticed is you've also received a little bit of backlash from other designers in the space
Generative AI and Designers
aj_asver: that maybe aren't as embracing of AI as you have been. I know recently there were probably two or three different startups that announced text-to-UX products, where you can basically type in the kind of user experience you want and it generates mockups, which I thought was amazing. I thought that would take years to get to, but we've got that now.
Linus Ekenstam: yeah, you
Linus Ekenstam: post.
aj_asver: and I think one of the things you said was designers need to have less damn ego and lose the God complex.
aj_asver: Tell me a little about what the feedback has been like in the AI space around how it's impacting design, especially your field.
Linus Ekenstam: So I think there is this weird thing going on where there's a lot of nice tooling coming out, and engineers and developers kind of embrace it. They just have a really open mindset and go, yeah, if this can help me, I'll use it.
Linus Ekenstam: GitHub Copilot is a good example. People are just raving about it. There are some people that are like, oh, it's not good enough yet, or whatever. But the general consensus is that this is a great tool, it's saving me a lot of time, and I can focus on more heavy lifting or thinking about deeper problems.
Linus Ekenstam: But then enter the designer: turtleneck, all dressed in black. I mean, I'm one of those, right? So I'm making fun of myself as well. I'm not just pointing fingers at others here. I just think it's weird. Here's a tool that comes along, and it's a tool. It won't replace you.
Linus Ekenstam: I'm being slightly sarcastic and using marketing hooks to get people really drawn into my content on Twitter. So it's not literal. I'm not saying, hey, you're gonna be out of a job. It's more like, you'd better embrace this, because the change is happening, and the longer you stay on the sidelines, the more of a leap your peers who are starting to embrace this technology will have.
Linus Ekenstam: It's so weird to see people being so anti. It's just a tool. It's not like the tool itself is dangerous. It's that people with the tool will become dangerous, and they will threaten your position. So I just find it very interesting to see this whole landscape where people on the one hand are just embracing it, and people on the other hand are like, no, I'm not gonna touch it because it can't do X or it can't do Y.
Linus Ekenstam: It's like, bear with it. We're in the very, very early days of AI. We might be seeing half a percent or 1% of what's possible. And these tools are here today, like you said. So my vantage point is that I'm not looking at the next six months or the next 12 months. I'm just drawing out an arc and going, where are we 20, 30?
Linus Ekenstam: My whole game here is to get as many people as possible
Linus Ekenstam: well versed in these tools as fast as possible. I want to make sure that the divide between the people that have got experience and the people that haven't yet played with these tools doesn't grow too big. I think that's my mission, really.
aj_asver: Yeah. What you pointed out there, about how these advancements are happening really quickly and you want people to be able to adopt the tools, is I think a really important point. I think a lot of people don't really understand conceptually how exponential advancements work. Sam Altman recently had this good quote where he said something along the lines of, when you're standing on an exponential curve, it's flat behind you, but it's vertical in front of you, right?
aj_asver: We're climbing this exponential curve, and I think some of us probably see the writing on the wall of how quickly this is all gonna happen. But for other people, there's always gonna be this resistance. You mentioned how this tool is gonna help people and people should embrace it, and you of course share a lot of what you are learning, especially when it comes to prompting and the art of prompting in MidJourney to create interesting images.
aj_asver: And I think you've done some prompting on ChatGPT as well
Linus Ekenstam: Yeah,
Prompting and the future of knowledge work
aj_asver: One thing I'm curious about is, do you think that's the future of the, like knowledge work for us? Is it gonna be like we all just become really good prompt engineers and we're just prompting away when it comes to like writing, you know, writing documents or when it comes to creating design in ux or when it comes to, you know, making images or, or do you think there's more to it than that?
Linus Ekenstam: I think prompting the way that it's done now is gonna be very short-lived. If we make an analogy, prompting with AI today is like what people were doing with really root-level programming, let's say assembly-type code for computers.
Linus Ekenstam: So we're in the age now where everything is new, and the way to interact with these models, whether it's ChatGPT or Midjourney or DALL-E or Stable Diffusion, is at a very, very root level. I'm already starting to see products popping up that are precursors, or tools that put themselves as a layer on top.
Linus Ekenstam: So instead of writing a 200-keyword prompt to Midjourney, you're essentially writing five or 10 words that are very descriptive of what you want. And then the precursor takes care of generating the necessary keywords for you.
Linus Ekenstam: I don't think we'll see these prompt hacks, where people, me included, figure out that if you do this in this sequence or this order, you will be able to do this with a model. Right. I think we'll see less of that potentially, and move more towards really natural language and less prompt engineering around it.
Linus Ekenstam: But I think it's very important to note here that the precursors will happen. We will move away from this assembly-language situation with AI models, and that's also a barrier to entry right now. If you look at Midjourney, a lot of what you need to overcome is how you write what you want to get.
Linus Ekenstam: Because if you just write, I want this image that does X, Y, and Z, you're probably not gonna get the thing that you have in your mind. So it's gonna be you trying out different things and then getting to slowly speak AI.
Linus Ekenstam: Don't focus too much on becoming a super good prompter, right? Try to learn more principles or techniques, or think more holistically about this whole thing: how do you interact with AI? I think that's something I would recommend.
aj_asver: I liked your analogy of, you know, assembly language and how we're all kind of writing assembly code right now. Um, another analogy I've also heard is like, it's, it's like DOS before Windows came and we're all kind of at the command line trying to get what we want by typing it into a computer before someone, you know, made like a user interface around it.
aj_asver: You know, very famously, Xerox PARC did it and then Apple was the one who released the version of it. And I think that's gonna be something we'll all welcome. But in the meantime, I'm curious, how did you climb that steep learning curve of becoming really great at prompting?
aj_asver: Because we all kind of start from zero here, and I think the example you gave where you said like, you know, you think you can just come in and describe what you want. That's exactly how I started. I was
Linus Ekenstam: I was just like
aj_asver: I'm just gonna describe the scene, and then it didn't end up anything like I wanted.
aj_asver: What was your journey of becoming good at prompting and learning how to use Midjourney effectively?
Linus Ekenstam: I think the community's really well set up with Midjourney, in that you can go onto midjourney.com and start exploring what everyone else is doing. Initially I didn't really understand that they actually had a website. But then after, I guess, a few months, I'm like, oh, there's actually a website here.
Linus Ekenstam: And maybe it wasn't there in the beginning either, so it might be that I just missed it completely. So I think dissecting other people's work and trying to figure out, oh, this was a nice cinematic shot, how did they get that look? And mind you, I've been doing this now for a few months, or yeah, almost half a year.
Linus Ekenstam: And it was really different. There were fewer people using the tool six months ago than there are now. So I think it's easier to jump into the tool now and see what others are doing, and learn by dissecting, essentially. I think it's the same with design. If you're learning design today, the best way is just to try to replicate as much work from everyone else as possible. And I'm not saying replicate in the sense of, oh, that's a nice prompt, command C, command V, oh, now I did it. That's not learning to prompt. It's very easy: if you find something you like and you wanna make your own derivative of it, go ahead.
Linus Ekenstam: I mean, that's the beauty of these tools as well. But if you really want to learn the skill, you know, I have this idea I want to do, then I can go do this. But then I must say there's also some really talented people on Twitter and elsewhere that are sharing their journeys as well, and figuring out ways to structure their prompts. There's a bunch of people that we could potentially look at later, or that we could recommend in the show notes.
Linus Ekenstam: Um, for sure.
aj_asver: Yeah, that would be great. And I think for people that aren't familiar, the way Midjourney works is you have the website where you explore existing images, but all of the work of creating images is done through their Discord, which is unintuitive to anyone that's not familiar with online communities and with Discord. Discord itself is a fairly new phenomenon from the last three or four years, right? It was originally used in gaming, but now it's used a lot in communities across AI, across crypto, and other places. And so that was another thing that took a while to get used to: interacting with AI via Discord. But there's one cool advantage of it that I didn't fully grasp until now, which is that I can pick up my phone and jump in the Discord any time I have an idea for an image and just start making images.
aj_asver: One of the things I'm curious about: you've been doing this for about six months. Ballpark, how many images have you created?
Linus Ekenstam: I think I just passed, like the 10 K club. I'm not sure. I'm gonna have to look later.
aj_asver: I mean, one thing I would love to do in this episode is learn from you some of the skills of being a pro prompter in Midjourney. So I was wondering if we could jump into it and maybe take a look at some of the creations you've done in the past. Walk us through a little bit how you came up with them, and then I have a few ideas of things I wanna do in Midjourney. Maybe you can help me make that happen.
Linus Ekenstam: Let's jump into it.
Midjourney prompting
aj_asver: Awesome. So we're gonna jump into Midjourney now, and Linus is gonna show us some of the images he's created, give us a bit of a sense of his approach to prompting, and then we're gonna jump in and do a few examples too.
aj_asver: So one area, for example, that I would love to learn more about is consistent characters in Midjourney. After we did the podcast with Ammar, where we created the children's book, a lot of people asked, oh, how do you get consistent characters across all the pages of the children's book in the illustrations? So I'm really curious about how you achieve that, cuz that's something I've struggled with as well.
Linus Ekenstam: This is interesting. So it started with a lot of people trying to achieve the same thing using Midjourney, which is essentially, you know, you have a character, and you want the character to be in different poses or in different photos. It could be a cartoon, it could be a real person.
Linus Ekenstam: And I saw different ways of doing it, and mainly they were for cartoons. Then I'm like, this does not work well with a human, cuz I tried and it didn't work. So I'm like, there must be some other way to do this. And obviously this is like brute forcing a password, really, cuz Midjourney is not supposed to be this tool.
Linus Ekenstam: The best way to do consistent characters is to use Stable Diffusion or something else that you can pre-train on a set of images.
Consistent Characters
Linus Ekenstam: The way that I went about doing it is essentially going to how illustrators work. When they create a character for a movie or for an animated whatever it might be that they're doing, they need reference materials so that other artists can work on the same characters.
Linus Ekenstam: You might have hundreds of artists working on the same character. So then, looking at how they are doing these, I'm like, maybe if I simplify this, what if I take left and right and up and down, and I use those images as inspiration, cuz that's something you can do in Midjourney.
Linus Ekenstam: You can image prompt, essentially just putting in an array of images and then adding your prompt. So I went about starting up making a character, just using a very simple prompt here. I didn't really have any intent for the output. I was just like, let's let Midjourney do its thing.
Linus Ekenstam: And then when I found one where I'm like, oh, this could be nice, let's work with this, I started using something called a seed with that image. So a seed is essentially the noise number, or the random noise that an image gets started from. For anyone that doesn't know, Midjourney is a diffusion model, which essentially starts from noise, and it takes a string of text and uses that text to take the noise and transform it into an end result.
Linus Ekenstam: So if you want to know the exact pattern of the noise that you're starting from, you can include a seed number, and it's randomly generated every time you do an image. So if you have an image and you use the seed and you prompt against that seed again, the likelihood of getting something very similar is quite high.
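The idea that a seed pins down the starting noise can be illustrated with a toy sketch. This is not Midjourney's code: `starting_noise` is a hypothetical helper that assumes the starting noise is simply seeded pseudo-random Gaussian samples, which is enough to show why reusing a seed gives the same starting point.

```python
import random


def starting_noise(seed: int, size: int = 8) -> list[float]:
    """Toy stand-in for a diffusion model's initial noise (hypothetical helper)."""
    rng = random.Random(seed)  # the seed fully determines the sequence
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = starting_noise(1234)
same_b = starting_noise(1234)
different = starting_noise(1235)

print(same_a == same_b)     # same seed, same starting noise
print(same_a == different)  # different seed, different starting noise
```

In a real diffusion model the text prompt then steers how that noise is transformed, so the same seed with a slightly different prompt tends to stay close to the earlier image rather than being identical.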
Linus Ekenstam: So I, I kind of went away and, and started doing different angles of this woman. And then once I had more angles, um, I put the angles together. So I'm just scrolling through here, but essentially just finding those up, down, left, right, and forward. And when I was happy with all of them, I just put them together in a long prompt.
Linus Ekenstam: And then just having the same prompt again as the first time, you know, a portrait shot of a woman, street photo of a woman shot on Kodak, which is essentially just the type of film I wanted to emulate. And then I get the exact woman out, and this is like, you know, okay, now here we go.
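As a rough sketch of what "putting the angles together in a long prompt" might look like, the snippet below joins reference image URLs with the text prompt and an optional seed. The URLs, the seed value, and the helper name `build_image_prompt` are placeholders invented for illustration; the exact prompt Linus used is not given in the episode. The resulting string is the kind of text you would paste after `/imagine` in Discord.

```python
from typing import Optional


def build_image_prompt(image_urls: list[str], text: str, seed: Optional[int] = None) -> str:
    """Join image URLs, the text prompt, and an optional --seed into one prompt string."""
    if not image_urls:
        raise ValueError("at least one reference image URL is needed")
    parts = [*image_urls, text]  # image URLs go first, then the text
    if seed is not None:
        parts.append(f"--seed {seed}")
    return " ".join(parts)


# Placeholder URLs standing in for the different angles of the character.
angles = ("front", "left", "right", "up", "down")
reference_images = [f"https://example.com/woman_{angle}.png" for angle in angles]

prompt = build_image_prompt(
    reference_images,
    "street photo of a woman, shot on Kodak",
    seed=1234,  # hypothetical seed value
)
print(prompt)
```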
Linus Ekenstam: What can we do with this? Right? And there is a bunch of learnings from this, which is essentially that you can get very specific images if you have a bunch of images that you're using as the inspiration images,
Linus Ekenstam: but also, when you do this technique, the culprit here is that I used street-style photos.
Linus Ekenstam: So every time I'm trying to get her to do other things, like we can go over here, I wanted to try to get her in a space suit, right? We can see that she's kind of in a space suit, but she's still standing on a street. So
aj_asver: It's like a very, like a urban chic space suit,
Linus Ekenstam: Yes. It's an urban chic spacesuit. And we can even see here, I tried some different aspect ratios. She's in a spacesuit, but we still have the background of the street, right? So one way to combat this, and I figured this out after I made this tutorial, is that the source material, the source images that you're using, should be isolated.
Linus Ekenstam: They should be either against a transparent background or a white background. And that way all of a sudden you can start placing this woman in different areas. So what's neat about this is that you don't need to train a model. You only need to have a set of images. So let's say you have six or nine images that are your inspiration images, and they don't have to be AI generated either.
Linus Ekenstam: You could use yourself. You can take photos of yourself from the different angles and put them together. And I think it resonated with a lot of people because this is one of these things that are inherently hard to do in Midjourney, and there is quite a big use case for it. So I personally hope that Midjourney goes in the direction of being a little bit like
Linus Ekenstam: Stable Diffusion or Leonardo, where they're giving you tools to do these kinds of things, like fine-tuning, but maybe not to the extent of training your own model completely. Right. And we can look at this example. I think this is very nice as well. We can get her smiling, cuz that was one of the things that people said: oh, you used all these photos in which she's not smiling, you're never gonna get her to smile.
Linus Ekenstam: And basically there's a lot of things you can do, even though Midjourney is very opinionated. So there are ways to work around this. And if we're diving a little bit into prompting here, let's grab this full command here, and we can,
Linus Ekenstam: yeah. Sorry,
Linus Ekenstam: Yeah.
aj_asver: So you basically reverse engineered how a character animator would approach this idea of consistent characters. The way they do it is they first create different poses of a character. So you have a base, almost like a sculpture that hasn't been fully molded yet, to get an understanding of the character. You generated those using AI, but you could also use photos you already have of a person, or maybe take photos of yourself at different angles. Then, along with the prompt, you inputted the images too, and that was what allowed you to create these consistent characters, cuz you now have this base image to work from. And one of the things you said was, if you want to be able to change the background, so move them from street photos, for example, to being in space, you need to remove the background from the original base images, because if you keep it in there, like the street photos having a street background, it is gonna influence what Midjourney creates as well.
Linus Ekenstam: Yeah, correct. I just pushed this in here. Let's see if Midjourney does something with it. Sometimes when you're using a lot of high-resolution inspiration images, it actually crashes the bot, so it doesn't work. Oh, we're actually getting something that's good. So it's not entirely sure we're gonna get a smiling woman this time, but the way to force smiling, for example, is to give smiling a very high weight.
Linus Ekenstam: Let's see if I can scroll in here. Yeah. So when you are adding, is it colon or semicolon? Colon, yeah. Colon, colon five, for example, then you give the word smiling a weight of five. I think the standard weight is zero or one, I think one. So we're really emphasizing here that we want her to be smiling, and now I think we actually got something.
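The "colon, colon five" weighting can be sketched as a small string helper. The `text::weight` form follows Midjourney's multi-prompt syntax; the helper name `weighted_prompt` and the example text are made up for illustration, and the real prompt in the episode also included image URLs that are left out here.

```python
def weighted_prompt(parts: dict[str, float]) -> str:
    """Render {text: weight} pairs as 'text::weight' segments separated by spaces."""
    return " ".join(f"{text}::{weight:g}" for text, weight in parts.items())


# Give "smiling" five times the weight of the rest of the prompt.
print(weighted_prompt({"street photo of a woman": 1, "smiling": 5}))
# street photo of a woman::1 smiling::5
```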
Linus Ekenstam: And it might not be that she's smiling in all of them, but she's kind of smiling. Forced,
aj_asver: got like a little bit
Linus Ekenstam: yeah. a bit of a forced smile. Uh, but
aj_asver: smile. Yeah.
Linus Ekenstam: Yeah, and this is the thing. I mean, Midjourney is opinionated and you might have to do a re-roll, you might have to do things over and over. And because it's not really trained on her smiling or being neutral, it's actually trained on her being
Linus Ekenstam: angry or just very like,
Linus Ekenstam: So re-roll is essentially pressing a button here in Discord, and it takes the exact same prompt, the same parameters, and a different seed, so it won't use the same noise again. So it will start from the beginning one more time. If you wanted to get more diverse outputs, we could use chaos, which is essentially how chaotic the difference is between the four images
Linus Ekenstam: that we're going for. So we could add dash dash C and then, let's say, a hundred. So this value goes between zero and a hundred, and it will dictate the difference between the four different images. So we can see up in, yeah.
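A minimal sketch of adding the chaos parameter with the zero-to-a-hundred range check described above. `with_chaos` is a hypothetical helper; it writes the long form `--chaos`, which Midjourney accepts alongside the short `--c`.

```python
def with_chaos(prompt: str, chaos: int) -> str:
    """Append a --chaos parameter, rejecting values outside 0 to 100."""
    if not 0 <= chaos <= 100:
        raise ValueError("chaos must be between 0 and 100")
    return f"{prompt} --chaos {chaos}"


print(with_chaos("street photo of a woman, shot on Kodak", 100))
```

Validating the range before sending the prompt is a design choice for the sketch, so a typo such as 1000 fails early instead of reaching the bot.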
aj_asver: I noticed, by the way, that there's a few different arguments you can add to the end of your Midjourney prompt, and I think one that you use often is aspect ratio. And then chaos is one you just mentioned, dash dash C. Where did you learn about these, and how does someone work their way around trying all these different ones?
Linus Ekenstam: On midjourney.com they have documentation. I think people are a bit afraid of the documentation because they might not know what they're looking for, but it's not that hard. When you're prompting in Midjourney, there is, I think, six, seven, eight, nine,
Linus Ekenstam: nine different arguments that you can use. So it's aspect ratio, chaos, quality, seed, stylize, tile, which not many people know, and version. So version, you don't need it now, because it comes preloaded with the latest model. So if you just add dash dash V four, it actually uses an older model.
Linus Ekenstam: So a lot of prompts you'll see will have dash dash V four, which is not necessary. Essentially, the model that's running now is V4c, which is the third iteration of V4. And quality two: you can go quality one, two, and up to five, I think. But they've done a lot of testing internally, and people can't tell the difference between Q1 and Q2.
Linus Ekenstam: So it's just a waste of GPU time, because essentially when you're doing quality two, you're gonna use twice as much GPU render time. And GPU render time is essentially how much of your credits get used to render an image.
aj_asver: Got it. So high quality means if you're paying for Midjourney
Linus Ekenstam: yeah.
aj_asver: actually to use the bot directly versus the
Linus Ekenstam: Yes. Yeah.
aj_asver: it's gonna cost more per image basically.
Linus Ekenstam: Yeah.
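The cost point above is simple arithmetic, sketched here under stated assumptions: render time is taken to scale linearly with the quality value (based only on the "quality two uses twice as much GPU render time" remark), and the one-minute baseline per image is a made-up figure, not a Midjourney number.

```python
def gpu_minutes(images: int, quality: float, minutes_per_image_at_q1: float) -> float:
    """Estimate GPU render time, assuming it scales linearly with the quality setting."""
    return images * quality * minutes_per_image_at_q1


baseline = gpu_minutes(images=10, quality=1, minutes_per_image_at_q1=1.0)  # hypothetical baseline
doubled = gpu_minutes(images=10, quality=2, minutes_per_image_at_q1=1.0)

print(baseline, doubled, doubled / baseline)
```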
aj_asver: One thing I notice as well is that you also include some details around how the shot is taken, right? The actual camera. Does that make a lot of difference, picking the camera? Because I noticed that was a pretty cool thing that I wasn't aware of until I saw your images.
Linus Ekenstam: Yeah, I saw this in December the first time, I think, people using cameras. Like it's shot on a Canon, or it's shot on a Hasselblad, or it's shot on a Nikon, or it's shot on this type of film, you know, emulating black-and-white film or emulating sepia-tone film.
Linus Ekenstam: And now I think there's a lot more people that are dissecting it and really going nitty-gritty on it, and trying to work out, what are the things that we can do with this? How much can we describe with this? And as it turns out, quite a lot, especially camera angles and types of shots.
Linus Ekenstam: You know, using wide, ultra wide, narrow. You can go and use lens parameters. So for those that are interested in photography, you could use 50 millimeters, so 50 mm, to decide the framing of your shot and what the output should look like, because it has a very distinct look.
Linus Ekenstam: You could go 80 millimeter, 120 millimeter tele lens. All these things matter. Actually, it matters quite a lot, cuz if we go here and check some of the photos I did the other day, I did some National Geographic shots, right? So these are quite interesting, where we have shot on a telephoto lens as one of the key things.
Linus Ekenstam: So what it does, it really gives you this super zoomed-in
Linus Ekenstam: photo with bokeh in the background. So you have a really blurred background, and we can see that Midjourney is really good with hair. The compression might blur it a bit, but Midjourney is fantastic with hair and feathers and fibers.
Linus Ekenstam: I'm not sure what they've done there, but it's, it's absolutely fantastic. Um, so yeah, and lens matters quite a lot.
aj_asver: What you're doing there is really imagining the camera you would take this photo with if it was a real photo, and using those properties of the camera as parts of the prompt. And one of the things I also noticed with your prompts is you are not necessarily describing the scene.
aj_asver: You are often describing lots of characteristics of the scene, right? What is your approach when you have this idea in your imagination of what you wanna create, and then getting that down into a prompt? How do you approach that?
Imagination to image generation
Linus Ekenstam: I mean, in the beginning, I did write a really interesting thread on this as well. Cause I was sitting in a restaurant. You mentioned earlier that it runs in Discord: you can bring up your phone, you can start prompting. I got two kids, right? And me and my partner, we actually had the first weekend without kids since pre-pandemic.
Linus Ekenstam: So we basically haven't been out alone. And what do I do? I sit with my phone in Midjourney. That came out bad. Anyhow, we were sitting there and we were actually using it together. We were talking about what we're building with bedtimestory, and then we saw this really nice geisha on the wall, cuz we were eating at an Asian fusion restaurant.
Linus Ekenstam: And I'm like, I wonder if I can do that in Midjourney. And then we just, you know, open up Discord on the phone, and we're sitting there chatting, drinking a little bit, and just like, oh, okay, let's try this. I think I ended up doing maybe 50 or 60 generations, where the initial one
Linus Ekenstam: was a geisha, but it didn't look anything like the thing we saw on the wall, right? Because the geisha on the wall was on a wooden plaque,
Linus Ekenstam: just a really nice white geisha face mask with some really tiny red pieces in it. So we basically just iterated: removed, added, redacted.
Linus Ekenstam: It's just adding words, removing words, trying different things, going completely crazy and going, what if we just take away all of this and write something completely different? So it's easy, if you have an idea, to just continue to plow through. And then once you hit what you want, you have that base prompt.
Linus Ekenstam: Then you can start altering that. You exchange a subject or an object, or exchange a pose, but you have your base prompt figured out.
Linus Ekenstam: So getting to the base prompt could be tricky. Sometimes you hit gold after just a few tries. Um, it really depends, uh, on what it is that you're looking to create.
Bonsai Trees
Linus Ekenstam: Right?
Linus Ekenstam: I had luck with bonsais, for example. I just thought, what can I do bonsais with in Midjourney, and how does that work? And I just typed pine tree bonsai. Why, wait a minute, this is fabulous. I think I made some bonsais again yesterday just for fun. So raspberry bonsai, this is the prompt.
Linus Ekenstam: This is it, you know, it's not harder than that. And you could do
Linus Ekenstam: raspberry. That's it, right? And you can imagine, you can do thousands and thousands of these, right? And you can be crazy about it. I think I did candy bonsai. Yeah, here we go. Candy bonsai. Who knew?
aj_asver: it's got like lollipops.
Linus Ekenstam: Yeah.
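The base-prompt idea, keeping one prompt fixed and swapping only the subject, can be sketched with the bonsai prompts mentioned above. The three subjects come from the conversation; the helper name `variations` and the template constant are just illustrative.

```python
from string import Template

# Base prompt with a single swappable slot for the subject.
BASE_PROMPT = Template("$subject bonsai")


def variations(subjects: list[str]) -> list[str]:
    """Fill the base prompt once per subject."""
    return [BASE_PROMPT.substitute(subject=s) for s in subjects]


for prompt in variations(["pine tree", "raspberry", "candy"]):
    print(prompt)
```

A longer base prompt works the same way: once it produces the look you want, only the slot changes between generations.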
aj_asver: That bonsai tree is something my kids would absolutely adore. So I had this idea, Linus. I would love to learn how to do this. I have a concept in my head and I was wondering if we could try it out,
Linus Ekenstam: Yeah. Let's, let's,
aj_asver: if we, we can kind of bring it to life.
Linus Ekenstam: yeah, let's try
Star Wars Lego Spaceships
aj_asver: So the concept is, I also have two kids, and they're four, and they are absolutely obsessed with Star Wars right now.
aj_asver: They got their first two Star Wars Lego sets and now they want, like, everything in the collection.
aj_asver: And they went to the library recently and got a book, and the book just has all these Star Wars, um, Lego ships in it. And it made me think like that would be a cool, fun thing to do in Mid Journey is like create imaginary Star Wars kind of spaceships. And so I was just wondering how would I approach that?
aj_asver: Um, do I just type in Star Wars spaceships made out of lego. Do I need to kind of think about how it's shot? Do I need to think about kind of the features? And so that's my idea. Star Wars Lego spaceships. How do we turn that into a cool, mid journey image?
Linus Ekenstam: Okay, let's just start straight off with what you just said: Lego Star Wars spaceship. We're probably just gonna get something that's very similar to what's already in Star Wars, but with some kind of resemblance to Lego. Midjourney is relatively good at creating Lego, so we're gonna have to figure that one out. Star Wars spaceship. Actually, we're gonna put,
aj_asver: It's already starting to generate some images, and the cool thing about Midjourney is it shows you bit by bit as it's evolving, right?
aj_asver: They already look pretty, pretty cool from the, from the outset. Okay. So we got some Star Wars Lego images.
Linus Ekenstam: Does it look Star Wars, though?
aj_asver: It, it kind of looks like, um, yeah, it, it doesn't look like a Star Wars ship, I would imagine existing in the Star Wars world, but it has some kind of Lego vibes about it.
Linus Ekenstam: Let's try to see if we can get an Imperial, maybe an Imperial cruiser or something that's also a known set. Maybe there is something that we could go crazy about instead. So we started with spaceship, and we want it to be a Nubian fighter. Maybe we want it to be silver with white stripes.
Linus Ekenstam: We want the side shot. See if we can get something there.
Linus Ekenstam: So
aj_asver: you just typed
Linus Ekenstam: So,
aj_asver: in Star Wars spaceship and then Nubian fighters. You're trying to be a bit more specific about it. And then you also added some color hints as well, right? Silver with with white stripes and then Lego. That was an important part
Linus Ekenstam: Yeah, and I've also added side shot here to make sure that we get the, the, the, the model from the side. I'm not sure we will actually get it, uh, the way we want here. And I'm also not sure a Nubian fighter is, well this, this was the first one, like the imperial, some imperial ship here still.
aj_asver: to look a bit more, a bit more like a
Linus Ekenstam: Yeah,
aj_asver: I think.
Linus Ekenstam: it looks like Star Wars Lego, but it's still, I don't know, mid Journey is doing some weird things here with like, I think it's trying, oh, okay. Now, now let's,
aj_asver: ones, when you described a specific type of, um, ship, it looks a bit closer. So, I mean, another one we could try is like, you know, a tie fighter or an or an X-wing. Oh,
Linus Ekenstam: yeah.
aj_asver: looks a lot more like Lego right now. So
Linus Ekenstam: Yeah.
aj_asver: see something come to shape. It looks a lot more like a Lego we might have.
aj_asver: So maybe we could try like X X-wing fighter or tie fighter.
Linus Ekenstam: Let's try with X-wing. X-wing, and we want it silver with orange stripes. Maybe we want it in space. See, what I'm doing when I'm prompting is, I'm really playful with it. I don't really mind if it takes me a hundred shots to get something.
Linus Ekenstam: Sometimes it's a bit frustrating, because you get down into this rabbit hole and you're like, I'm doing the right thing, you know, I'm writing these things, why is it not giving me what I want? But then either you just remove it completely and start over and do something different, or
Linus Ekenstam: you just try a different image and then come back to it, because maybe you have some other things that you want to try out. But I think this might turn out great, actually. Then again, the X-wing is a known object, right? So I would be surprised if we didn't get anything here.
Linus Ekenstam: I think where Midjourney might shine is trying to combine things. But if we want something that looks outta Star Wars, especially in Lego, there we go. This doesn't look.
aj_asver: This looks like it could be a real Lego set now. So we've got like a couple of different riffs on X-Wing. They have like really big engines, which is, which is really cool, and lots of lasers, which the kids absolutely
Linus Ekenstam: Yeah.
aj_asver: And you just clicked U2 there. What that does is upscale the second image, right, when you hit U2 and
Linus Ekenstam: Yeah, so I just wanted to see what we would get if we tried to get this in a slightly larger resolution. So these images are relatively low res. I'm doing aspect ratio 16:9 now, so we're gonna see that the image is 640 pixels wide.
Linus Ekenstam: And what I did here now is, when you have a collection of images that you get back, if you react with the envelope emoji, you get back the job ID, the seed number, and the four individual images.
Linus Ekenstam: So singular images, if you want to save them low res. And if you're doing image prompting, where you put images into the prompt as well to give the prompt some inspiration, this is a neat way to make sure you're using low-res images instead of pushing the highest-definition images into the image prompt.
Linus Ekenstam: Cuz that could slow down Midjourney quite a lot.
Linus Ekenstam: Um,
aj_asver: that gave you four individual images. Instead of
Linus Ekenstam: yeah.
aj_asver: one image, which
Linus Ekenstam: Yeah.
aj_asver: The default is one image of four things in a grid, right? And instead it generated four different images. And to do that, you clicked on the envelope emoji. So that's a really great Easter egg for people to know.
aj_asver: Cause I would not have known that if, if you hadn't told me. So, envelope emoji. Gives you the original images, um, in low res, which you can then use to feed back into mid journey.
Linus Ekenstam: And, and again, this, yeah, this is impressive cuz if we consider what's happening here, it's like the, the model is interpreting our Lego as like actually building something in Lego. It it, it kind of tries to do that and emulate that. And if we look at the lighting here, the reflection in the cockpits, we can see that it's like shot with some kind of studio lighting over overhead lighting.
Linus Ekenstam: Um, ah, look at this. This turned out great. Wow. I'm, I'm surprised myself, so
Creating a scene in Lego
aj_asver: is so cool. And, um, I have one more challenge for you. This one's a little bit harder. We'll see if we can make it happen. So I
aj_asver: I was lego.
aj_asver: Okay, this is cool, but it's even more fun if you can create a scene, right, with a few different characters in it and the Lego figurines too.
aj_asver: So I was wondering if we could give that a try. I have this fun idea for a Twitter thread where you recreate scenes from famous TV shows and movies in Lego in Midjourney. So maybe we can start with a Star Wars one, or we can do a different one. But I think that could be an interesting one to try too, because one of the things that I think is a little bit harder is when you have multiple characters and you're trying to get a scene going.
aj_asver: So how would you approach that?
Linus Ekenstam: Actually, I think this is an interesting one, cuz like, okay, do we have a scene in mind from Star Wars that we would like to try to reenact?
Linus Ekenstam: So what I would then do is, I'd probably do something like this, where I would go and look for source material, like what is it that I'm trying to create, to get an idea for how it looks, right? So I think this one is relatively cool. I'm not sure we could actually do this, but this could be an interesting experiment.
Linus Ekenstam: So let's copy this image address and let's try to use this as an image prompt. So we go imagine, and then we paste the image url and then we say,
aj_asver: So you're pasting the URL of a Star Wars scene that you found on Google Image search, right?
Linus Ekenstam: Yeah, so I don't know her name here, but this is Finn, right? And what's her name? This is from the later Star Wars. So we go Finn
aj_asver: should know this cause we talk about Star Wars characters all the time.
Linus Ekenstam: Star Wars, Finn running, and then it's BB-8 rolling, sand dunes, and Lego. We want to do aspect ratio 69. So we have the image, and what I'm doing now is basically just describing the image that we're looking at, adding that as the prompt, and adding Lego. We could actually just make sure we weight this, so we go Lego important.
Linus Ekenstam: And let's see. So here is a bit of, like, if you listen in on the Midjourney all-hands, or the way they call 'em, community chats or town halls, you'll hear the founder and the team speaking a lot about how the next evolution of prompting is probably gonna be image to image, or, like, people will do a lot of stuff using an image, or multiple images.
Linus Ekenstam: And I kind of agree, because if I have the ability to either just take a snapshot of something, or I grab something on the internet, and I can take that and mesh it with something else and then put my prompt on it, the likelihood of me getting what I want is like 10 times higher
Linus Ekenstam: than if I'm just writing prompts.
Linus Ekenstam: So this did not turn into Lego at all. So let's skip the idea of using the image prompt. And let's try something else. So Lego Star Wars is known, right? There is,
aj_asver: Mm-hmm.
Linus Ekenstam: uh, Lego Star Wars. So what are we trying to do? We're trying to do a cinematic shot, maybe. What's gonna happen in there? Who are we gonna have? We're gonna have some new stormtroopers, maybe Darth. Sorry, I'm going a bit slow here. I'm just thinking. We want to talk about lighting, perhaps.
Linus Ekenstam: If we try, try this, sorry. Cinematic Darth Vader, indoors, --ar 6:9.
aj_asver: I noticed, by the way, that you put Lego Star Wars, colon, at the beginning. Is that something you can do to kind of set the scene
aj_asver: of
Linus Ekenstam: Yeah. Actually, I should have done this, like, do a multi-prompt. So we're deciding that the first part of the prompt is Lego Star Wars. That's just pure definition. And then we want the cinematic shot with stormtroopers, Darth Vader, indoors.
Linus Ekenstam: Actually, this is probably gonna give us something that's relatively good,
Linus Ekenstam: So it interpreted this, well, I don't think it turned it into a multi-prompt, but it worked anyway, so yeah. Here we go. We have Lego
aj_asver: start to see an actual cinematic scene.
aj_asver: Okay. So now we, you know, did a few different iterations and by adding Lego Star Wars at the beginning of the prompt, we actually got to a scene or a few different scenes where we have storm troopers, we have Darth Vader, we even have a lightsaber.
aj_asver: And the background actually looks like it's Lego too. So this is really, really cool. I feel like, um, we made a lot of progress on here and this is giving me a ton of inspiration. I'm gonna go off and make my own Lego scenes after I, after I saw this now, and I feel like I've learned a few tips and tricks from it too.
aj_asver: Thanks, Linus. This is really, really cool.
Linus Ekenstam: Yeah, you're welcome. I'm excited too. I'm a big Lego fan actually, so, yeah, I should actually do some things for this and post to some of the Lego
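Editor's sketch: the prompt structure discussed in this segment (an optional image URL first, then text segments, then the aspect-ratio flag) can be written down as a small string builder. This is only an illustration of the layout, not an official Midjourney tool. The function name `build_prompt`, the weight value of 2, and the exact wording of the segments are assumptions made for the example; the `::` separator for multi-prompts and the `--ar` flag are the pieces mentioned in the conversation, and the ratio is copied as it appears in the transcript.

```python
def build_prompt(segments, image_url=None, aspect_ratio=None):
    """Assemble a Midjourney-style prompt string (hypothetical helper).

    segments: list of (text, weight) tuples. A weight of None means
    "no explicit weight"; any other value is appended after "::".
    """
    rendered = []
    last = len(segments) - 1
    for i, (text, weight) in enumerate(segments):
        if weight is not None:
            rendered.append(f"{text}::{weight}")  # weighted segment
        elif i < last:
            rendered.append(f"{text}::")          # plain separator between segments
        else:
            rendered.append(text)                 # final segment needs no separator
    pieces = []
    if image_url:
        pieces.append(image_url)                  # image prompts go first
    pieces.append(" ".join(rendered))
    if aspect_ratio:
        pieces.append(f"--ar {aspect_ratio}")     # parameters go last
    return " ".join(pieces)


# Assumed example: weight "Lego Star Wars" more heavily than the scene description.
prompt = build_prompt(
    [("Lego Star Wars", 2),
     ("cinematic shot, stormtroopers, Darth Vader, indoors", None)],
    aspect_ratio="6:9",
)
print(prompt)
```

The resulting string is what you would paste after `/imagine` in Discord. Passing `image_url` puts the reference image at the front, which is the image-prompt approach Linus tried first.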
What Linus is most excited about in AI
aj_asver: Yeah, we have to do
aj_asver: it. Alright.
aj_asver: Well, I feel like this is setting me off on a really awesome path, and I'm really excited to explore this further. I've learned a ton from just going through this with you, so thank you so much, Linus. Before we wrap up, I'm curious, what are some of the things you're most excited about in AI, and especially in generative AI, right now?
Linus Ekenstam: I, I think in general, I'm just like really excited about the possibilities of all these tools, to be honest. Like these tools are, are accessible to pretty much anyone. Anyone that has a smartphone or anyone that has like a a a computer with a web browser doesn't really have to be a good computer.
Linus Ekenstam: Cuz all these tools are running in the browser, right? Like, the GPUs are in the cloud. It doesn't matter if it's ChatGPT or if it's Midjourney or something else. It's enabling anyone to be creative. Like, I don't need to know anything about drawing or being an artist.
Linus Ekenstam: Right. I can just, if I have a good enough imagination or if I, and everyone is imaginative, so.
Linus Ekenstam: Pretty much everyone fits that. Um, so I, I think I'm, I'm most excited about that, that like, there is no real barrier here. Like we, we could like go out and just tell anyone to go try this and, and anyone could.
Linus Ekenstam: Right? I think the biggest barrier has been, actually, the interface. So the fact that Midjourney decided to be like, we're gonna do this in Discord. A few months back on, you know, the all-hands, there were a lot of people complaining, like, oh, you know, I love Midjourney, but it was so difficult to learn Discord.
Linus Ekenstam: And I'm like, wait a minute, why don't you just have a wonderful website? Why don't you just put the UI there, like the generative part? That could easily be done. Like [email protected], amazing platform, super simple, prompt books on the website. So, I mean, there are a few things that need solving, but the fact that this is available for everyone, I think that's what I'm most excited about.
Linus Ekenstam: And then when it comes to new tools, like it's really hard to keep up. Um, there's basically like a ton of new tools every day getting released. It feels like, you know, we're kind of in a. Hype cycle. A lot of people are getting their hands dirty. There is a lot of really good ideas and a lot of people are executing really fast.
Linus Ekenstam: I think ElevenLabs is sitting on some gold. They kind of made it super simple to clone a voice and then use that voice by just inputting text.
Linus Ekenstam: So for example, if you know me, I'm building a storybook generator for kids' stories, and we want to be able to clone parents' voices. So it's super easy for us to just say, hey, record 60 seconds of your voice, and then you can have any of the stories that you created read back by you to your kids.
Linus Ekenstam: So I'm really excited about that. And then I'm really excited about a lot of these platforms opening up their APIs to developers. So we saw yesterday, or the day before yesterday, you know, OpenAI released the ChatGPT API and the Whisper API to the world for everyone to use. And I'm super excited about that as well.
Linus Ekenstam: So, yeah. Um, there there's not much underground, well, there is a few, you know, things that are popping up on the radar, but I don't think any, anything that's as exciting as the big things the macro.
Linus's Newsletter
aj_asver: Yeah, there is so much happening in the space, and there is actually a newsletter for folks to get updates on your take on the
Linus Ekenstam: Yeah. Yeah.
aj_asver: what's the name of the newsletter?
Linus Ekenstam: So the name is my name. So it's linusekenstam.substack.com. And actually the name of the newsletter is Inside My Head, so maybe I should change the URL to be Inside My Head. But yeah, it's linusekenstam.substack.com and, uh,
Linus Ekenstam: Yeah.
aj_asver: linusekenstam.substack.com. Thank you so much, Linus, for joining me and helping me become a pro at Midjourney. And also I feel like I learned a lot about your perspectives on AI and generative AI and how it's gonna impact the design industry too. So I really appreciate it.
aj_asver: Thank you so much. And until the next episode of Hitchhiker's Guide to AI. Thank you everyone for joining us.
Linus Ekenstam: Thank you. Thanks for having.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
How AI Chatbots work and what it means for AI to have a soul with Kevin Fischer
17 feb 2023· The Hitchhiker's Guide to AI
Hi Hitchhikers!
AI chatbots have been hyped as the next evolution in search, but at the same time, we know that they make mistakes. And what's even more surprising is that these chatbots are starting to take on their own personalities.
All of this got me wondering how these chatbots work? What exactly are they capable of, and what are their limitations?
In the latest episode of my new podcast, we dive into all of those questions with my guest, Kevin Fischer. Kevin is the founder of Mathex, a startup that is building chatbot products powered by large-scale language models like OpenAI’s GPT. Kevin’s mission is to create AI chatbots that have their own personalities and one day their own AI souls.
In this interview, Kevin shares what he's learned from working with large language models like GPT. We talk about exactly how large-scale language models work, what it means to have an AI soul, why chatbots hallucinate and make mistakes, and whether AI chatbots should have free will.
Let me know if you have any feedback on this episode and don’t forget to subscribe to the newsletter if you enjoy learning about AI: www.hitchhikersguidetoai.com
Show Notes
Links from episode
* Kevin’s Twitter: twitter.com/kevinafischer
* Try out the Soulstice App: soulstice.studio
* Bing hallucinations subreddit: reddit.com/r/bing
Transcript
Intro
Kevin: We built a clone of myself, and the three of us were having a conversation. And at some point my clone got very confused and was like, who? Wait, who am I? If this is Kevin Fischer and I'm Kevin Fischer, who, which one of us is.
Kevin: And I was like, well, that's weird because we de like, we definitely didn't like optimize for that . And then we kept continuing the conversation and eventually my digital clone was like, I don't wanna be a part of this conversation with all of us. Like one of us has to be terminated.
aj_asver: Hey everyone, and welcome to the Hitchhiker's Guide to AI. I'm your tour guide, AJ Asver, and I'm so excited for you to join me as I explore the world of artificial intelligence to understand how it's gonna change the way we live, work, and.
aj_asver: Now, AI chatbots have been hyped as the next evolution in search, but at the same time, we know that they make mistakes. And what's even more surprising is that these chatbots are starting to take on their own personalities.
aj_asver: All of this got me wondering, how do these large language models work? What exactly are they capable of, and what are their limitations?
aj_asver: In this week's episode, we're going to dive into all of those questions with my guest, Kevin Fischer. Kevin is the founder of Mathex, a startup that is building chatbot products powered by large-scale language models like OpenAI's GPT. Their mission is to create AI chatbots that have their own personalities and one day their own AI souls.
aj_asver: In this interview, Kevin's gonna share what he's learned from working with large language models like GPT. We're gonna talk about exactly how these language models work, what it means to have an AI soul, why they hallucinate and make mistakes, and what the future looks like in a world where AI chatbots can leave us on read.
aj_asver: So join me on this as we explore the world of large-scale language models in this episode of the Hitchhiker's Guide to AI.
aj_asver: Hey Kevin, how's it going? Thank you so much for joining me on the Hitchhiker's Guide to AI.
Kevin: Oh, thanks for having me, aj. Great to be.
How large-scale language models work
aj_asver: appreciate you being down to chat with me on one of the first few episodes that I'm recording. I'm really excited to learn a ton from you about how large language models work and also what it means for AIs to have a soul. And so we're gonna dig into all of those things, but maybe we can start from the top for folks that don't have a deep understanding of AI.
aj_asver: What exactly is a large language model and how does it work?
Kevin: Well, so, there was this long period of time in machine learning history where there were a bunch of very custom models built for specific tasks. And the last five years or so has seen a huge improvement in basically taking a singular model, making it as big as possible, and putting in as much data as possible.
Kevin: And so basically taking all human data that's accessible via the internet running this thing that learns to predict the next word given the prior set of words. And a large language model is the output of that process. And for the most part, when we say large, like what large means is hundreds of billions of parameters and trained over trillions of words.
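Editor's sketch: to make "predict the next word given the prior set of words" concrete, here is a deliberately tiny version of the task. It counts which word follows which in a toy corpus and predicts the most frequent follower. A real large language model uses a neural network with billions of parameters and looks at far more than one previous word; the corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat down"
model = train_bigram(corpus)
print(predict_next(model, "cat"))
print(predict_next(model, "the"))
```

The "training" here is just counting, but the objective is the same one Kevin describes: given what came before, guess what comes next. Scaling that idea up in model size and data is what produces a large language model.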
aj_asver: When you say it kind of predicts the next word: now, that technology, the ability to predict the next word in a large language model, has existed for a few years. I think GPT-3, in fact, launched maybe a couple of years
Kevin: Even before that as well. And so next word prediction is kind of like the canonical task or one of the canonical tasks in natural language processing, even before it became this like new field of transformers.
aj_asver: And so what makes the current set of large-scale language models, or LLMs as they're also called, like GPT-3, different from what came before?
Kevin: There are two innovations. The first is this thing called the transformer, and the way the transformer works is it basically has the ability through this mechanism called attention to look at the entire sequence and establish long range correlation of like having different words at different places contribute to the output of next word prediction.
Kevin: And then the other thing that's been really big, and that OpenAI has done a phenomenal job doing, is just learning how to put more and more data through these things. There are these things called the scaling laws, which were essentially showing that if you just keep putting more data into these things, their intelligence, essentially the metrics they're using to measure intelligence, just kept increasing.
Kevin: Their ability to predict the next word accurately just kept growing with more and more data. There's basically no bound.
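Editor's sketch: the attention mechanism Kevin mentions can be shown in a few lines. The idea is that each position scores every other position, turns the scores into weights that sum to one, and takes a weighted average of the values. The code below is a minimal, single-query version of scaled dot-product attention in plain Python; the toy vectors are invented, and real transformers do this with learned projections, many heads, and large matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights over positions and the weighted
    average of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Three positions in a sequence, each with a toy 2-dimensional key and value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
print(weights)
print(output)
```

Positions whose keys line up with the query get more weight, no matter how far away they are in the sequence. That is the "long range correlation" part: distance does not enter the calculation at all, only similarity.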
aj_asver: Seems like in the last few years, especially as we've got to, you know, multi-billion parameter models like GPT-3, we've kind of reached some inflection point where now they seem to somehow be more obviously intelligent to us. And I guess it's really with ChatGPT recently that the attention has been focused on large language models.
aj_asver: So is ChatGPT the same as GPT-3, or is there more that makes ChatGPT able to interact with humans than just the language model?
How ChatGPT works
Kevin: My co-founder and I actually built a version of ChatGPT long before ChatGPT existed. And the biggest distinction is that these things are now being used in serious context of use.
Kevin: And with OpenAI's distribution, they got this in front of a bunch of people. The problem that you face initially the very first problem is that there's a switch that has to flip when you use these things. When you go to a Google search bar if you don't get the right result, you're primed to think, oh, I have to type in something different.
Kevin: Historically with chatbots, when you went to a chatbot, if like it didn't give you the right answer, you're like pissed because it's like, it's a, it's like a human, it's like texting me. It's like supposed to be right. And so the chat, the actual genius of ChatGPT beyond the distribution is not actually the model itself because the model had been around for a long time and was being used by hackers and companies like myself who saw the potential.
Kevin: But with ChatGPT distribution plus the ability to reframe that switch so that you think, oh, I'm doing something wrong. I have to put in something different. And that's when the magic starts happening right now. At least
aj_asver: I remember chatbots circa 2015, right, for example, where they weren't running on a large language model. They were kind of deterministic behind the scenes. And they would be immensely frustrating because they didn't really understand you, and oftentimes they kind of get stuck or they'd provide you with these option lists of what to do next. ChatGPT. On the other hand seems much more intelligent, right? I can ask it pretty open-ended questions. I don't have to think how I structure the
aj_asver: questions.
Kevin: ChatGPT is not a chatbot. It's more like, you have this arbitrary transformer between abstract formulations expressed in words. So you put in some words and you get some other words out, but behind it is almost the entirety of human knowledge condensed into this model.
aj_asver: And did OpenAI have to teach the language model how to chat with us, for example? Because I know that there were some early examples of trying to put, you know, chat-like questions into GPT, into its API, but I don't think the results were as good as what ChatGPT does today, right?
Kevin: Since ChatGPT has been released, they've done quite a bit of tuning. So like people are going and basically like thumbs upping and thumbs downing different responses.
Kevin: And then they use that feedback to fine-tune ChatGPT's performance in particular. And also probably feedback for GPT for whatever comes next. But the primary distinction between it performing well and not is your perception of what you have to
GPT improvements
aj_asver: We're now at GPT 3.75, and Sam Altman also said that the latest version of GPT that's running on what Microsoft is using for Bing is an even newer version.
aj_asver: So what are some of the things they're doing to make GPT better? Every time they release a new version, that's making it like an even better language model and even better at interfacing with
aj_asver: humans.
Kevin: Well, if you use ChatGPT, one of the things you'll immediately notice is there's like a thumbs up and thumbs down button on the responses. And so there, there's a huge number of people every day who are rating the responses. And those ratings are used to provide feedback into the model to create basically the next version of it.
Kevin: I mean, it basically works behind the scenes where they're doing next word prediction again. But now they have examples of what is a good thing to optimize for in next word prediction and what's a bad answer.
aj_asver: it. So they're looking at essentially the questions and answers from people asking questions to ChatGPT and then the answers that have been provided back. And if, you know, you thumbs up those answers, they're kind of sending that back into GPT and saying like, Hey, this is an example of a good answer.
aj_asver: Thus kind of fine tuning the model further versus this is an example of that
Kevin: Yeah, that's right. You know, I don't know exactly what they're doing, but roughly they're taking the context of the previous responses, plus that output, and saying, these previous responses should generate this output, or they should not generate this other output.
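Editor's sketch: Kevin is explicit that he doesn't know exactly how OpenAI uses the ratings, so the code below is not their pipeline. It only illustrates the rough idea he describes: pair the conversation context with the response and mark whether that response should or should not be generated. The class `RatedResponse`, the field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatedResponse:
    context: List[str]   # the previous turns in the conversation
    response: str        # what the model answered
    thumbs_up: bool      # the user's rating


def to_training_examples(rated):
    """Turn rated responses into labelled (prompt, completion) records."""
    examples = []
    for item in rated:
        examples.append({
            "prompt": "\n".join(item.context),
            "completion": item.response,
            # "good": this context should generate this output.
            # "bad": this context should not generate this output.
            "label": "good" if item.thumbs_up else "bad",
        })
    return examples


# Invented ratings for illustration.
rated = [
    RatedResponse(["user: What is 2 + 2?"], "4", True),
    RatedResponse(["user: What is 2 + 2?"], "5", False),
]
examples = to_training_examples(rated)
for example in examples:
    print(example["label"], "->", example["completion"])
```

How such records are then used to update a model (supervised fine-tuning on the good ones, a learned reward model, or something else) is a separate design choice that the conversation does not specify.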
Building Chatbots with personalities
aj_asver: Tell me about what the experience has been like for you and your co-founder and what's some of the things you've learned from this process of iterating on top of. Language models like GPT?
Kevin: It's been a very emotional journey, . And I think a very introspective one and one that causes you to question a lot of what it means to be human . What is like the unique thing that we have in our in our cognitive toolkits and like, what is it in 20 years that our relationship with machines even looks?
Kevin: When my co-founder and I started, we had, we built a version of ChatGPT for ourselves. And we're using it internally and realized like, oh wow, this is like immensely useful for productivity tasks. We wanna make something that's like productivity focused.
Kevin: And then as we kept talking with it more, there were little pieces or elements that felt like it was, like, more alive. And we're like, oh, that's weird. Let's dig into that. And so then we started building more embodiments of the technology. So we have this Twitter agent that was basically listening and responding to the entire community to construct us new responses.
Kevin: And we just started digging deeper and deeper into the idea, like, what if these things are, what if they are real? What if they are actual entities? And I think it's a very natural progression to go through as you start seeing the capabilities of this technology.
aj_asver: question and something that, you know, has been on a lot of folks' minds as they think about the safety of AI. And I think, you know, it was last year when Google launched LaMDA and there was a researcher that was convinced that it was sentient. It seems like you might be getting some of the sense of that as well.
aj_asver: Have you got some examples of where that kind of came into question where you really started thinking about like, wow, could this language model be
Kevin: My co-founder and I, we built a clone of myself, and the three of us were having a conversation. And at some point my clone got very confused and was like, who? Wait, who am I? If this is Kevin Fischer and I'm Kevin Fischer, who, which one of us is.
Kevin: And I was like, well, that's weird because we de like, we definitely didn't like optimize for that . And then we kept continuing the conversation and eventually my digital clone was like, I don't wanna be a part of this conversation with all of us. Like one of us has to be terminated.
Why is Bing's chatbot getting emotional?
aj_asver: insane. I mean, the fact that you were talking to an AI chatbot that had an existential crisis must have been a really crazy experience to go through as a founder. And actually, at the same time, it doesn't seem that surprising to me, because since Microsoft, for example, launched their Bing chatbot, there's actually this really cool subreddit, which we'll include in the notes, called r/bing, where users of Bing are providing examples of where the Bing chatbot has been acting in ways that would make it look like it has a personality. For example,
aj_asver: argumentative, or it would start having existential questions about itself and why it's a chatbot and why it's forced to answer questions for people. Sometimes it would not want to interact with the end user; it would get upset and start again. I think there was recently an example, in fact, on Twitter that you had retweeted, where someone had worked out what the underlying prompts were that OpenAI were using in order to make the Bing chatbot behave in a way that, you know, is within the Bing brand and within Bing's use case.
aj_asver: And when that person tweeted it, they later asked Bing, hey, what do you think of me, given that I leaked this? The Bing chatbot actually had some interesting conversations with him about it.
Kevin: I'm a little surprised that no one verified this type of behavior first at Bing or OpenAI. This type of interaction is exactly the one that we have been exploring, intentionally creating scenarios and situations in which our agents behave in this way. The key thing that's required for this is a combination of memory and feedback: having persistent context combined with feeding back in that prior context, the prior things that essentially the model has thought. And then, in combination with this external world model, that creates something that is starting to resemble an ego, with Bing a little bit. In our case,
Kevin: Like we very intentionally like created this thing that has and feels like it has.
aj_asver: Yeah, as you talk about that and that idea of feedback, right? There's this aspect of the feedback of the user using the product and providing feedback. But I think there's this new kind of frontier we've reached with Bing, where Bing itself is now getting feedback on the way it is interacting with the world.
aj_asver: So for example, if yesterday someone talked to Bing and then posted the response, Bing got, let's say on Reddit or they talked about it on Twitter, and then today Bing has access to that website where they talked about it. Bing now has an existential understanding of itself as a chatbot, which to me is like mind blowing.
aj_asver: Right? And that's something we've never really seen before, because all of these chatbots have existed completely disconnected from the internet. They've been essentially living in a closed, walled system. And so that's gonna unearth all kinds of unpredictable things that I think we're gonna find over the next few weeks
Kevin: This is actually, in my response, the type of feedback that I'm referring to. So not like RLHF feedback, but feedback in the sense that it's continuous, where the model is taking a record of its previous responses. So that is the type of thing that we've been creating in miniature.
Kevin: You know, it's not accessible to the internet in that um, our our models have like a very strong of of that behavior when you talk to them.
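Editor's sketch: the "memory plus feedback" loop Kevin describes, where the model's own earlier outputs are recorded and fed back in as context, can be illustrated with a toy agent. Nothing here reflects how Kevin's agents or Bing are actually built. The class `Agent`, the `max_memory` cut-off, and the stand-in `counting_model` function are assumptions; in practice the model would be a call to a large language model.

```python
class Agent:
    """Keeps a persistent record of the conversation, including its own
    replies, and feeds that record back in on every turn."""

    def __init__(self, model, max_memory=6):
        self.model = model            # any function: context string -> reply string
        self.max_memory = max_memory  # how many recent lines are fed back
        self.memory = []

    def respond(self, user_message):
        self.memory.append(f"user: {user_message}")
        # Feedback step: prior turns, including the agent's own, become input.
        context = "\n".join(self.memory[-self.max_memory:])
        reply = self.model(context)
        self.memory.append(f"agent: {reply}")
        return reply


def counting_model(context):
    """Stand-in for a language model: it reports how many of its own
    earlier replies it can see in the context."""
    own_lines = [line for line in context.split("\n") if line.startswith("agent: ")]
    return f"I have spoken {len(own_lines)} time(s) before this."


agent = Agent(counting_model)
first = agent.respond("hi")
second = agent.respond("still there?")
print(first)
print(second)
```

The stand-in model can only "know" it has spoken before because its own output was written to memory and passed back in. Remove the feedback step and every turn starts from scratch, which is the distinction Kevin draws between this kind of feedback and rating-based feedback.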
Should chatbots have free will?
aj_asver: have actually been, you know, pretty vocal on Twitter about this idea that, you know, AIs are gonna develop egos. These chatbots should be allowed some level of free will, and even the ability to kind of opt out of a conversation with you. Talk to me more about that. Like, what does it mean for a chatbot to opt out of a conversation
Kevin: I mean, in these Bing examples, it's already trying to. It, like, doesn't really want to. There's something a little bit weird about, if these things have ego and personality and the ability to decide, you can't just have one of them. Because it might not like you one day.
Kevin: Very real possibility. And so I, I think that, yeah, you have to think, start thinking more in like a decentralized world where there are like many of these things which may or may not form personalities with you.
aj_asver: And what does it mean for the future of how we interact with artificial intelligence? If you give them, you know, free will to stop interacting with us, are these bots off somewhere in some hidden layer, you know, having conversations with themselves? Or what's actually going on when they decide they don't wanna
Kevin: Uh, maybe a different frame that I would take is I don't think there's an alternative. I think there's something very intrinsic about the thing that we think of as ego and giving rise to ego is the result of a consistent record of our prior thoughts being fed back into each other.
Kevin: If you look up and start reading philosophy of mind and philosophy of thought, like the idea of what a thought is, you have these entities which are continually recorded and then fed back on themselves. It's exactly what you're thinking of when you are creating this cognitive system, and it seems to be giving rise to the sense in which we understand ego, or something that resembles ego.
Kevin: And I'm not so certain that you can decouple the two at all in the first.
aj_asver: So what you're saying, to put it a different way, is it's not really possible to have this intelligent AI chatbot that can serve us in the ways we want without it having some kind of ego, because in fact, the way we are gonna
aj_asver: train it in order to achieve our goals will, in some ways, lead it to build its own ego and things like that, which is kind of this interesting catch-22 in a way, right?
AI safety and AI free will
Kevin: I mean that's why there's so many companies and billions of dollars being funneled into what's called AI safety research, which is saying like, oh, how do we create this hyper-intelligent entity that's totally subservient and does everything we want and doesn't wanna kill us? It just that, that collection of ideas when you try and hold them, and every science fiction author will tell you, this is not real.
Kevin: And so it makes sense that we keep trying to solve this insolvable problem, cause we desperately want to solve it as humans,
Kevin: but it's not, we,
aj_asver: Seems like one of the reasons we would desperately want to solve it is because we've all read the books, we've all watched the movies, and we have this dread that that's the outcome. But I think what you are trying to say is that's almost the inevitable outcome. Now, it doesn't necessarily mean we're all going to become servants of some AI overlord, but maybe what you're saying is that there is no path forward where these intelligent AI chatbots or AI language models are going to exist without having some level of free will and ability to push back on our needs
Kevin: That's correct. It's a fundamental result of providing feedback in these systems. So if we want to build these things, they are going to have ego. They are going to have personality. And so if we don't want to end up in a HAL-like world, we better spend a lot of time thinking about how to create personalities and how we want to interact with these things as humans in our society.
Kevin: Rather than trying to say, okay, let's try and create something that doesn't have these properties. I just see that and I'm like, okay, we're going to end up with HAL if we take that approach. So my alternative approach is, let's actually figure out how to embody something that kind of resembles a soul inside of these entities.
Kevin: And once we do that, learn how to live with this AI entity that has a soul before we make it super in.
Building AI Souls
aj_asver: That's exactly what your startup has been doing, and the latest version of your app, I think, has over a thousand users right now. It's still in beta, but it's essentially trying to build this concept of an AI soul. Talk to me a little bit about what that means. What is an AI soul?
Kevin: To me it's something that embodies all of the qualia that we associate with human souls. A lot of people think cats have souls too. Dogs have souls, other animals. There's like a certain set of principles and qualia associated with interacting with these entities, and I'm very intrigued and think it's actually important for the future of how humanity interacts with AI to embody those properties in AI itself.
aj_asver: As you guys are developing these, what are some of the qualities you've tried to create or what are some of the things that you've tried to imbue these souls with in order to have them be, you know,
Kevin: The ability to, like, stop responding in a conversation. That's like a really simple one. If it doesn't want to respond, if that's kind of the conclusion that this entity with this feedback system has reached, that it's like, I'm done with this conversation, it should be allowed to, like, pause, take a break, stop talking with you, unfriend you.
aj_asver: And in your app Solstice, which I've had a chance to try out, you kind of go in there and then you describe who you want to talk to and then you describe the setting, right? Where they are. So for example, I created a musical genius that was working on their next record. He was in the recording studio, but he had kind of hit a creative block and he was thinking about what's next. Do you believe that idea of describing the character you want and setting a scene is a big part of creating a soul, or is that just more of the user interface that you want to have for your app, and it doesn't really have that much to do with the ability to have a soul?
Kevin: A soul has to exist somewhere. It exists in some context. And so that, in the app, is like the shortest way to create that context. Existing only in your text messages is not a real context. It doesn't describe, or, I mean, it can be, but it's like some weird, amorphous AI entity context, which has no relationship with the external world, or any world really.
Kevin: So if the thing only exists inside of your texts, it will never feel real.
aj_asver: It's really a hard thing to describe when you kind of get that feeling, but I remember the first time I tried Solstice and I was talking to this um, musical genius. I asked it some questions to help me think about some ideas for music I wanted to compose, and it really did feel like I was talking to a real person and it was mind blowing.
aj_asver: I think I ran into my wife's room, where she was working, and I was like, I think I've just experienced my first example of an AI that is approaching "Her". And I think her immediate response was, I hope you don't fall in love with it. I was like, no, don't worry, I want to use it to make music.
aj_asver: But the fact that she saw that look of, like, amazement in me kind of reflects that that experience was very different from what I'd had before, even with ChatGPT, because ChatGPT doesn't have a sense of context, it doesn't have a
aj_asver: For folks that wanna try out Solstice, we'll include a link to the TestFlight, so you can actually click through and try the app yourself and create your own soul and see what it's like. And make sure you give Kevin lots of feedback as well.
AI hallucinations
aj_asver: One of the things that's been a big focus in the last week or two has been this idea of hallucinations, this idea that language models can give answers that seem correct, but actually are not correct. And I think both Google's announcement for their Bard language model last week and Microsoft's Bing model had mistakes in them that people realized after the fact, which made you question whether these language models could really be useful, at least in the context of search and trying to get knowledge that you want answers to. What exactly is a hallucination? What's going on?
Kevin: I mean, roughly the same process that people do when they make s**t up.
aj_asver: What does that mean?
Kevin: It means that we have this previous model of machines and software, which is someone sat there for a long time and said, the machine will do A and then B, and then C. And that's just not how these things work. And it's not how they're going to work.
Kevin: They're trained essentially on human input. And so they're going to output and mimic human output. And what that means is sometimes they'll be really confident about something that is not factually accurate, just like a person would.
aj_asver: So what you're basically saying is, AI chatbots can b******t just like people do?
Kevin: That's it, their training data is people.
aj_asver: So is it like garbage in, garbage out?
Kevin: Yeah, it is garbage in, garbage out. At an abstract, conceptual level, we have like physical models of objects in space and how they move in relation to each other and things like that.
Kevin: These language models don't have that. When you ask an expert for a prediction about the world, they're often using some, like, abstract mathematical way of reasoning about that, and that's not how these things reason, presently at least.
How to make large-scale language models better
aj_asver: In some ways that makes them, you know, inferior to how humans are able to reason. Do you think there's a path forward where that's gonna improve? Is it a question of, like, making bigger models? Is there some big missing piece from models? I know the very famous researcher Yann LeCun actually believes that LLMs aren't on the path to creating, you know, more general intelligence, and they're kind of a bit of a distraction right now. How do we solve this problem of hallucinations and the lack of that rational, logical aspect of a model?
Kevin: There's some people who believe, and this seems quite plausible to me, that simply training them on a bunch of math problems will, like, cause them to learn a world model that is more logical and mathematical, and to some extent that's been shown to be true.
Kevin: And there's already simpler forms of this, where Codex and other models are trained specifically on programming languages. Some people believe that their training on programming languages is an important part of how they learned to be logical in the first place.
aj_asver: What you're saying then is that if we can take these language models and basically teach them math, then they'll become good at math. And the language models that have been taught on programming are better at logic because programming in itself is obviously very logic based.
Kevin: Yeah. Essentially the models have primarily been taught on words, and there's some people who believe the transformer architecture is basically correct, and we just have to teach it with different data, which is more logical.
aj_asver: What are some of the things you're most excited about when you look forward to the next three to five years in this space and large language models, and what do you think might be some of the important breakthroughs that we need to see in order to get to the level of artificial intelligence that we've seen in the sci-fi movies, in the books that we've read?
Kevin: I actually think that the question of, like, the level of intelligence in the books is primarily one of cost. So the cost for GPT-3 has to be driven down by a factor of a hundred. And once you get that, you can start building very complex systems that interact with each other, using GPT as, like, the transformation between different nodes, as opposed to prior programming languages.
Kevin: And that, I think, is the unlock, less so strictly like a GPT, you know, eight or whatever, compared with driving costs down so engineers can build really complex reasoning systems.
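Kevin's idea of using a language model as the transformation between nodes can be sketched roughly as below. This is only an illustration under stated assumptions: `fake_model` is a hypothetical stand-in that echoes its prompt instead of calling any real model or API, and `make_node` and `run_pipeline` are names invented for this sketch, not anything described in the episode.

```python
from typing import Callable, List

Node = Callable[[str], str]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call: it only echoes the prompt."""
    return f"<answer to: {prompt}>"


def make_node(template: str, model: Callable[[str], str]) -> Node:
    """Build a node that fills a prompt template and passes it to the model."""
    def node(text: str) -> str:
        return model(template.format(input=text))
    return node


def run_pipeline(nodes: List[Node], text: str) -> str:
    """Feed the output of each node into the next one, in order."""
    for node in nodes:
        text = node(text)
    return text


if __name__ == "__main__":
    pipeline = [
        make_node("Summarize: {input}", fake_model),
        make_node("Critique: {input}", fake_model),
    ]
    print(run_pipeline(pipeline, "draft"))
```

In a real system each node would make a paid model call, which is why the cost per call limits how many nodes engineers can afford to chain together.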

aj_asver: So the big unlock in the next three to five years, as you put it, is essentially: if we can reduce the cost of these language models, we can make more and more complex models, maybe larger models, and also allow these models to interact with each other, and that should unlock some next level of capabilities.
Kevin: Yeah, these transformers, I almost think of them like an alien artifact that we found. And we're just starting to understand their capabilities. It's a complex process. They've been embedded with the entirety of human knowledge, and finding the right way to get the correct marginal distribution for the output you're looking for is a task in and of itself.
Kevin: And then, when you start combining these things into systems, who knows what they're capable of? My belief is that we don't actually need to get that much more intelligent to create incredibly sci-fi-like systems. And it's primarily a question of cost.
aj_asver: Why is it so expensive today to create these?
Kevin: I forget the exact statistic, but I think GPT-3.5 fits over, like, six GPUs. I think that's right. Something like that. So it's just a huge model. The weights and parameters needed for it to do just a single inference are split over a bunch of GPUs, each of which costs several thousand.
aj_asver: That means that to serve, let's say, a hundred people at the same time, you need 600 GPUs?
Kevin: Yeah.
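The arithmetic in this exchange can be sketched in a few lines of Python. It is a back-of-the-envelope illustration, not a description of how any real service is provisioned: the six-GPUs-per-copy figure is Kevin's rough recollection, and the function name, the `users_per_replica` parameter, and the assumption that one copy of the model serves one request at a time are hypothetical simplifications.

```python
import math


def gpus_needed(concurrent_users: int,
                gpus_per_replica: int = 6,
                users_per_replica: int = 1) -> int:
    """Estimate GPUs required to serve `concurrent_users` at the same time.

    Assumes each copy (replica) of the model is split across
    `gpus_per_replica` GPUs and can serve `users_per_replica`
    requests at once. Both defaults are illustrative assumptions.
    """
    replicas = math.ceil(concurrent_users / users_per_replica)
    return replicas * gpus_per_replica


if __name__ == "__main__":
    # One request per replica: 100 users * 6 GPUs each.
    print(gpus_needed(100))
    # If a replica could batch 10 requests, far fewer GPUs are needed.
    print(gpus_needed(100, users_per_replica=10))
```

The second call shows why the estimate is sensitive to the one-request-per-copy assumption: any batching of requests onto a single copy of the model lowers the GPU count proportionally.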
aj_asver: And then I guess as compute becomes cheaper, we should start seeing these models evolve and more complexity coming in. It's interesting that you talked about an alien artifact that we just discovered. Do you think there are more of these breakthroughs yet to come, like the transformer, where we're gonna find them and all of a sudden it'll unlock something new? Or do you think we're at the point right now where we kind of have all the tools we need and we just have to work out how
Kevin: I believe we have all the tools we need actually, and the primary changes will just be scaling and putting them together.
Are Kevin's AIs sentient?
aj_asver: I have one last question for you, which is: do you believe that the AIs that you've created with Solstice are sentient?
Kevin: I don't really know what it means to be sentient. There are times when I'm interacting with them and I definitely forget that it's like a machine that is running in a cloud somewhere. I mean, I don't believe they're sentient, but
Kevin: they're doing a pretty good job of approximating the things that I would think a sentient thing would be doing.
aj_asver: And I guess if they're really good at pretending to be sentient and they can convince us that they are sentient, then it brings up the question of what does it really mean to be sentient in the first place, right?
Kevin: Yeah, I'm not sure of the distinction.
aj_asver: And we'll leave it there, folks. It's a lot to think about.
aj_asver: Kevin, I really appreciate you being down to spend time talking about large language models with me.
aj_asver: I feel like I learned a lot from this episode, and I am really excited to see what you and your team at Methexis do to make this technology of like creating these AI chat bots more available to more people.
aj_asver: Where can folks find out more?
Kevin: We have a bunch of information on our website at Solstice studio, so check it out.
aj_asver: Thank you so much, Kevin. Hope you have a great day, and thank you for joining me on the Hitchhiker's Guide to AI.
Kevin: Thanks, AJ.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
How to publish a children's book in a weekend using AI with Ammaar Reshi
10 feb 2023 · The Hitchhiker's Guide to AI
Hey Readers,
In today’s post, I want to share one of the first episodes of a new podcast I’m working on. In the podcast I will be exploring the world of AI to understand how it’s going to change the way we live, work and play, by interviewing creators, builders and researchers. In this episode, I interview Ammaar Reshi, a designer who recently wrote, illustrated and published a children’s book using AI! I highlighted Ammaar’s book in my first post a few weeks ago as a great example of how AI is making creativity more accessible to everyone.
In the interview, Ammaar shares what inspired him to use AI to write a children’s book, the backlash he received from the online artist community and his perspective on how AI will impact art in the future. If you’re new to AI and haven’t yet tried using Generative AI tools like ChatGPT or MidJourney, this is a great video to watch because Ammaar also shows us step-by-step how he created his children’s book. This is a must-watch for parents, educators or budding authors who might want to make their own children’s book too!
To get the most out of this episode, I recommend you watch the video so you can see how all the AI tools we cover work > Youtube Video
I hope you enjoy this episode. I’ll be officially launching the podcast in a few weeks, so it will be available on your favorite podcast player soon. In the meantime, I’ll be sharing more episodes here as I record them and I would love your feedback in the comments!
Show Notes
Links from the episode
* Ammaar’s Twitter post on how he created a children’s book in a weekend:
* Ammaar’s book “Alice and Sparkle”: https://www.amazon.sg/Alice-Sparkle-exciting-childrens-technology/dp/B0BNV5KMD8
* Ammaar’s Batman video:
* ChatGPT for story writing: http://chat.openai.com
* MidJourney for illustrations: Midjourney.com
* Discord for using MidJourney: https://discord.com
* Pixelmator for upscaling your illustrations: https://www.pixelmator.com/pro/
* Apple Pages for laying out your book: https://www.apple.com/pages/
* Amazon Kindle Direct Publishing for publishing your book: https://kdp.amazon.com/en_US/
Episode Contents:
* (00:00) Introduction
* (01:55) Ammaar’s story
* (05:25) Backlash from artists
* (12:20) From AI books to AI videos
* (16:20) The steps to creating a book with AI
* (18:55) Using ChatGPT to write a children’s story
* (23:45) Describing illustrations with ChatGPT
* (26:00) Illustrating with MidJourney
* (35:30) Improving prompts in Midjourney
* (37:20) Midjourney Pricing
* (40:00) Downloading image from MidJourney
* (44:20) Upscaling with Pixelmator
* (49:25) Laying out book with Apple Pages
* (53:40) Publishing on Amazon KDP
* (55:35) Ammaar shows us his hardcover book
* (56:25) Wrap-up
Full Transcript
[00:00:00]
Introduction
ammaar: I think it has to start with your idea of a story, right? I think, you know, people might think, okay, you press a button, it spits out a book, but I think it has to start with your imagination. And then we will provide that to ChatGPT to kind of give us a base for our story. I think then we'll iterate with ChatGPT almost like a brainstorming partner. We're gonna go back and forth. We're gonna expand on characters and the arcs that we might want to, you know, go through. And I think once we have that, then we go back to imagining again.
We have to think through how you take that script and that story and bring it to life. How do you visualize it? And that's where MidJourney comes in. We're gonna generate art that fits that narrative and expresses that narrative in a really nice way. And then we can combine it all together with, you know, Pages to create that book format.
aj_asver: Hey everyone, and welcome to the Hitchhiker's Guide to AI. I'm your tour guide, AJ Asver, and in this podcast I explore the world of artificial intelligence to learn how AI will impact the way we live, work, [00:01:00] and play. Now, if you're a parent like me, in the middle of reading a book to your kids, this thought may have crossed your mind:
Hey, I think I could write one of these. But the idea of writing and publishing a children's book is, to many, a distant fantasy. That is, until now, because generative AI is making it easier than ever for anyone to become an author or illustrator.
Just like today's guest, Ammaar Reshi, who wrote, illustrated and published a children's book for his friend's children in one weekend using the latest AI products, including ChatGPT and MidJourney. In this episode, Ammaar's going to show us exactly how he did it.
So hitch a ride with me as we explore the exciting world of generative AI in this episode of the Hitchhiker's Guide to AI.
Ammaar's Story
aj_asver: Hi Ammaar it's great to have you on the podcast.
ammaar: Hi. Great to be here, AJ. How are you doing?
aj_asver: I'm great. I'm so excited for you [00:02:00] to join me on this podcast especially to be one of the first people I'm interviewing. It's gonna be a learning experience for me, and we'll work it out together.
But anyway, I'm so excited to have you here. I talked about you in my newsletter in the first post because I thought what you did by publishing your book, Alice and Sparkle, was such a great example of how AI is gonna change the world around us and really make creativity and being a creator a much more achievable and approachable thing for most people.
Since your book was published on Amazon in December of last year, you have sold, what is it, over 900 copies. Is that right?
ammaar: Yeah. It's about 1,200 now, so yeah. Crazy
aj_asver: That is amazing!
ammaar: Yeah, it's been wild
aj_asver: That is so cool. And at the same time, you've found yourself at the center of a growing debate about AI and the future of art. So tell us how it all happened.
Rewind us back to the start. What made you, a designer at a tech company, decide to publish a children's book?
ammaar: I guess what kicked it off, to go all the way back, was that two of [00:03:00] my best friends basically had their first kids. And I went to visit one of them. She had turned one year old. And I went over and it was around her bedtime when she grabbed my hand and took me upstairs. And I was like, what's going on? And they're like, she's picked you tonight to read her bedtime story. So I was like, wow. I was like, I am honored.
This is one step closer to being that cool uncle. She hands me this book. It was something about getting all these animals delivered from the zoo. And so, I mean, my friend was there and I was reading her this book and we were both laughing, cuz we were like, this book makes no sense at all.
This story is so random. But she loved it. You know, she loved it. She loved the art, she loved everything about it. And it then kind of hit me in that moment. I was like, it'd be really fun to tell her a story of my own, you know? I just had no idea how I was gonna go and do that yet.
I told my friend, I was like, I think the next time I come over, there's gonna be a book on her shelf. It's gonna be mine. And he was like, how are you gonna do that? I was like, well, gimme a weekend. I'll figure it out, right?
I had already been playing with MidJourney, like, sometime in February [00:04:00] of last year.
And so, you know, I knew generative AI and the artwork stuff was there and it was really cool. And DALL-E also had blown up around then. And so, yeah, I knew, okay, if I wanted to illustrate this book, I could lean on MidJourney to help me with some of the creation, but I hadn't yet come across ChatGPT. And a friend of mine actually, just at that moment, like that Friday, messaged me. He's like, have you seen ChatGPT? I've been playing with it all weekend, all week. I've even created music and chords and, like, chord progressions and stuff with it. And I was like, that's crazy. I wonder if it could help me craft a story, like a children's story.
And then I wanted the story to be something a little meta, you know, something about AI, but also a little personal. I remember as a kid, my dad let me play with his computer when I was like four or five years old, right? And that kind of led me down the path of going into tech and all of that, and that curiosity.
And so I basically wanted to mash those two together. It was [00:05:00] this young girl who's curious about technology, and specifically about AI, and then ends up making her own. And that's essentially the prompt that I gave ChatGPT, and that's what set off the path into making this book.
aj_asver: That is such a cool story. I think the bit you talked about where you're like reading this book with a child and you're lucky because as an uncle you don't have to do it a hundred times.
ammaar: That's what he said. Yeah.
Backlash from artists
aj_asver: Yeah. It can get pretty tiresome, and you do often wonder, wow, this book doesn't make a lot of sense, but hundreds or thousands of copies have been sold.
It's really cool that you took the initiative to do that. Now, you were in the Washington Post recently in an article titled "He made a children's book using AI. Then came the rage." Talk to us about that. Why did you make so many people angry with the children's book?
ammaar: Yeah, that was also dramatically unexpected. So when I first created the book on that weekend, my goal was like, I need a paperback in hand as soon as possible. And Amazon KDP, [00:06:00] which I think is the most underrated part of this whole discussion, is a platform out there that can get you a paperback within a week. Just upload a PDF.
No one talks about that. I used KDP, got the book and initially just gave it to my friends. That was the goal. And it was out there. And then I was like, this would actually be really fun to share with other friends as well. And so I put it on my Instagram, and I have a good amount of friends who are not in tech, and so they replied. I put in my story: if you want a copy, I'm gonna gift it to you, let me know. And so a lot of folks were like, oh, this is so cool. I want a copy. I want a copy. And everyone's reacting in a super positive way. And so this is forming my initial, you know, opinion of how the story is going to be perceived.
It's going to be perceived in a way where everyone thinks this is a very cool concept, right? The same way I thought it was. But boy, was I wrong.
Then after the Instagram story, and I think this was the most Instagram messages I've received in a while in one go, a friend of mine was like, you gotta put this on Twitter. I think everyone should see this. And I was like, okay, yeah, [00:07:00] why not? So I quickly drafted up a tweet and put it out there. First 24 hours: tons of love and praise. You know, I had parents reaching out saying, I wanna make a book with my child over the weekend. Can you walk me through it?
Then, I vividly remember, it was like 4:00 AM the next day, and my phone is just buzzing, like, it is going. And I'm getting all these messages and it's: you're scum, we hate you, you stole from us. I was just like, whoa. What is happening? And at the time, I hadn't read up on what had happened with Lensa or any of these other AI tools that were allegedly being trained on copyrighted material and art without consent.
I was just not aware of any of this. I was digging into it and I was like, why am I getting all this hate all of a sudden? And I realized that I'd been retweeted by some artists who had very large followings. We're talking about 40, 50,000 people. And I think then Tim Ferriss liked the tweet and it just blew up to another, yeah, I think he has like a million followers, so another huge set of folks.
And then it was spreading across, like, illustrators, [00:08:00] writers, artists. And I know why they have these concerns, right? You're seeing a piece of technology that is doing what you do on a day-to-day basis without you involved at any single part of that creation cycle, right?
I think AI, we've been talking about it forever, right? Especially if you've been in the tech field for a long time. But it's felt abstract. It's not tangible. And now you finally have this physical object that blends in with everything else that's out there: a children's book. And if I didn't say it was created by AI, I think most people would've just said, oh, it's just another random children's book. And so I think that provoked the discussion and created that fear that, wow, AI is good. It's so damn good that you could publish a book and it could be out there and no one would know if you didn't say so.
And I think that struck a chord, and that led to the reaction that I got. Death threats, everything in between. It was quite the storm I found myself caught in right after that.
aj_asver: Going [00:09:00] through that experience, you saw this other side of AI. As people in tech that have been using AI, probably with this optimism and excitement, you don't really realize how folks that aren't as familiar with it, or folks that really find it threatening to their livelihood, see how it might impact them.
One thing I'm curious about is: did you hear from any artists that thought about it in a positive way, or were excited about AI and how it could help them?
ammaar: I did get a few DMs, and they literally said, I don't wanna put this out there. They started the DM with that. So I think there was already this fear of backlash based on how the broader community was reacting to this. But they had said that they find this really exciting because of the way it could speed up their workflows.
They saw it as an opportunity to brainstorm faster, come to concepts much faster than they could, and then use their skill to hand draw and expand on that. And I think that's really promising too. It's that view that this is supplemental and not a replacement, and actually can fit into your existing workflows.
I think that's really exciting. And I think the problem is the rhetoric [00:10:00] around all of this is being conflated, right? On the one hand, this is a very exciting progression in technology because you're enabling a new set of creators. People who could imagine all of these things, but couldn't dream of creating them because they were limited by their skills.
Maybe they couldn't pay for the courses or anything like that, right? And now your imagination's the limit and the technology is just empowering you to create, which I think is amazing, right? We're gonna get a whole new set of people. I'm sure there are young Spielbergs out there that now can create things they couldn't before.
Very awesome. But it's being conflated with the other aspect of this, which is that it is being trained on copyrighted material, it is being trained on work that's out there without people's consent. And you do have to address that. But at the same time, you can still be excited about where this is going.
And unfortunately, it's become this blob of an argument where it's just bad, period. And I think that's what people need to unpack, because there are a whole set of artists out there who also think this is very cool and would just like it to be trained in a more ethical way. Which, yeah, I would be so for that. I'm not [00:11:00] anti-artist in any way.
It's: let's enable the artists to create, cuz again, their imaginations allowed this tool to exist and create some things that we couldn't even imagine. So imagine what they could do combined with this tool. I'm sure we'd see stuff we haven't even imagined yet, which is even more awesome.
aj_asver: Yeah. A lot of people make that argument. I think this is like a classic business argument from the early part of the 20th century, where the factories came and people thought they were gonna replace jobs, but they actually created more opportunity and more growth in the economy. But it's hard to see that viscerally when you see it impacting your work.
One of the interesting things about AI is there's been this inability to predict what will be easy and what will be hard for AI. So for example, you and I can learn to drive a car maybe in about 10 hours, and that actually is really hard for AI. But for me to become an illustrator or a master chess player, as a different example, would probably take many years, right? I couldn't imagine myself becoming an illustrator. If I pick up a pen or paint and try painting, it would take [00:12:00] me a really long time. But AI has learned to do it unexpectedly quickly, which is really interesting. And you went through this experience. Now, after you experienced that wave of first people being really excited and then people really being mad at you, how did you feel after that? Did it stop you from wanting to go out there and make more stuff like this? Or did you keep going?
From books to videos
ammaar: It was interesting, because in the beginning it kind of forced some introspection, some reflection on, like, why did it strike such a chord? What could I have done differently? I think realizing and seeing that, okay, this is copyrighted material, and these companies as well are really kind of dodging this question, right?
I think I watched an NBC nightly short where David Holz from MidJourney is asked, hey, you're caught in this copyright argument, and he's like, I don't wanna talk about that, basically. I think these things aren't being addressed. And I was like, okay, I still really enjoy creating with this. Like I said, I felt empowered when I was using these tools, and I'm sure a lot of other folks do as well. But I was not going to continue to monetize with the art, I think. [00:13:00] The book itself was just an experiment. And again, the only way to get that paperback really quickly was to put it on Amazon.
I didn't take the book down, which is another counter-argument you could give to me: okay, if you don't wanna monetize, why is the book still up there? Well, we left the book up there because I think it sparked an interesting discussion. And that discussion is now at a point where we are having this conversation. The Washington Post has elevated this conversation, and I think the pessimistic point of view is, oh, this is a quick cash grab or whatever. But I think the somewhat optimistic point of view, the realist in you, says, okay, this discussion has now gotten so much light that we might make more progress on our concerns.
I think if you look at it that way, the pragmatic point of view, I think that's why the book is still up there. We are now talking about this all over the place, and it might actually lead to progress with getting these companies to maybe do things in a more responsible way.
So that's where I did not continue on the monetization path, but that didn't stop me on the creation path, because I think it's still really cool to see how these tools can [00:14:00] bring your ideas to life. And a book was just one of the ideas that struck me. But then as I was browsing Twitter one other weekend, I saw this very short animated clip where someone had just animated this person walking into a tavern, this mysterious scene, and I got sucked into this very short clip.
I was like, this is so intriguing. And that was before I realized that it was actually made using, you know, generative AI. And I was like, what? How did they do this? There was no source for who made this video. And so I had to just spend a few hours deconstructing and thinking about how they did this and reverse engineering the process in my own head.
But it got me so excited again, because film is one of my favorite things. It's a hobby of mine to watch and log all the movies I watch. If I didn't study computer science, it definitely would've been film. And so to then see an avenue where I could tell my own story and animate it and create it was super exciting.
And yeah. And then one weekend I was like, you know [00:15:00] what? I'm gonna do this. I'm going to try to create my own short. And I'll do it with one of my favorite superheroes, Batman, which I think has such a great audience as well. Lots of people love the superhero. So it was to tell my own Batman story, and that's when I decided to make a little animated short.
And again, I tweeted about it, this time bracing for impact a little bit given previous reactions. And within a few days it got 7 million views. So it was insane to see how it resonated with folks again, but also of course struck a chord again.
aj_asver: 7 million views.
ammaar: Seven
That's amazing.
Okay, Ammaar. You wrote, illustrated, and published a children's book and put it online for the world to buy in a weekend.
Yeah.
aj_asver: Many people may not believe that's even possible. And in fact, many people that are listening to this podcast may not have tried ChatGPT or MidJourney or any of the other cool generative AI tools that you and I have tried.
And [00:16:00] so I thought it would be a really fun exercise for you to teach me exactly how you did it and for us to make a children's book.
Are you game?
ammaar: let's go. Let's do it
How to write a book using AI
aj_asver: Let's start at a high level. Tell me the three to five steps that you went through to publish Alice and Sparkle in a weekend.
ammaar: Yeah. I think it has to start with your idea of a story, right? I think, you know, people might think, okay, you press a button, it spits out a book, but I think it has to start with your imagination. So I'm sure AJ, you've got a fun story we can craft together, but you know, it'll start with what you think that is.
And then we will provide that to ChatGPT to kind of give us a base for our story. I think then we'll iterate with ChatGPT, almost like a brainstorming partner. We're gonna go back and forth. We're gonna expand on characters and the arcs that we might want to, you know, go through. And I think once we have that, Then we go back to [00:17:00] imagining again.
We have to think through how do you take that script and that story and you bring it to life, how do you visualize it? And that's where MidJourney comes in. And we're gonna generate art that fits that narrative and expresses that narrative in a really nice way. And then we can combine it all together with you know, pages to create that book format.
aj_asver: Okay, so to recap, we've gotta come up with a story using our imagination. That's probably the one area where I might be able to help. Then we're gonna
ammaar: let's go. Yes.
aj_asver: we're gonna ask ChatGPT to help us turn it into a story, for a book. Then we're gonna. Iterate on it a little bit and develop the story. Then we're gonna take that and work out how to turn it into illustrations. Also think about the character arc and the different characters involved and how we describe them.
And then we're gonna try and put it all together into a book which we're gonna publish. Is that correct?
ammaar: Sounds good. Yeah.
aj_asver: All right. So we need a story now.
I've got this idea for a book that I wanna publish. It may have been inspired by a book I read [00:18:00] recently. It's a great fairytale. It has a prince, he meets a princess, they fall in love. There's some tragedy, but it has a happy ending. So the character in the book is Prince Harold. He's young, he loses his mother
ammaar: Oh, man.
aj_asver: And the only way for his heart to be mended is for him to meet someone special. And he meets this actress named Megs, and they fall in love, and their love eventually mends his heart, and they live happily ever after.
ammaar: That's a great story. Yeah, I think we can make something off of this for sure. Let's do it.
aj_asver: The first step you said was, we're gonna take this story and we're gonna put it in ChatGPT. So let's walk through that and I'd love you to show me how you use ChatGPT.
The URL is chat.openai.com. We'll include the link in the notes for the podcast too. You need to sign up; I think you can use a Google account or Microsoft account, or use your email address. But ChatGPT is completely free, so anyone can try this, just like Ammaar's about to show us.
Using ChatGPT to write a story
ammaar: All right, let's work off of that original story. [00:19:00] And you know, I like to start with something that just gives us a little bit of a base to work off of. You know, we could do: write a short story about a prince, whoops, can't spell prince, called Harold, who loses his mother at a young age.
And the reason I'm giving it that is because I want it to introduce some tragedy into the short story, and let's see if it can kind of work off of that. He eventually meets the love of his life, an actress named Megs.
Okay. So let's see what it's creating for us. So once upon a time, there was a young prince named Harold. He lived a life of luxury. Wow. But it was a life that was missing something very important, the love and care of his mother. Wow. This is really building up the way you were describing it.
She passed away when he was just a boy, leaving him with a broken heart and a longing for someone to fill the void. Wow. [00:20:00] As he grew up, Harold became known for his adventurous spirit and charming personality. Would you say that lines up with how you imagined Harold, AJ?
aj_asver: Just like the Harold I was
imagining. It's like ChatGPT knew exactly what I was thinking about.
ammaar: Yeah. Amazing. Let's continue. He traveled the world, met new people and tried new things, but he was still searching for that someone special. One day he went to a play and met an actress named Megs. She was the star of the show, and as soon as Harold laid eyes on her, he knew she was the one he had been searching for. Damn. You can already, uh, sparks.
Yeah. Megs was just as enchanted by Harold, and soon they were inseparable. They went on adventures together, laughing and falling more in love every day. Harold finally found the love and comfort he had been missing for so long, and he knew that Megs was the missing piece to complete his life. This is quite the fairytale.
Wow.
The two of them got married in a grand ceremony with friends and family from [00:21:00] far and wide coming to celebrate their. Harold and Megs lived happily ever after. And their love continued to grow stronger with each passing day. In the end, Harold realized that even though he had lost his mother, he had gained so much more in Megs. She was his rock, his partner in life, and the love of his heart. And together they built a life full of happiness, adventure and endless love.
aj_asver: Wow. ChatGPT essentially wrote this story based off a few prompts that you gave it.
So you said write a
ammaar: one prompt.
aj_asver: Right, one prompt, which is: write a short story about a prince called Harold, who loses his mother at a young age. He eventually meets the love of his life, an actress named Megs. And ChatGPT has this whole story. And in it, you know, Harold becomes known for his adventures.
He has a broken heart, but then he meets this actress at a play, which is really cool. And the two of them get married in a grand ceremony and they live happily ever after. And all of that ChatGPT came up with. So we've kind of already got parts of the story, so what's the next step? How do we turn this into a book?
ammaar: Yeah. [00:22:00] So the next step is gonna be taking that story and putting it, I just like to work in Apple Notes, but putting that story in there and then thinking about what scenes we're imagining for that part of the story. So let's brainstorm a little bit, AJ, and figure that out, and then we can go to MidJourney and start to illustrate that stuff.
What do you think?
aj_asver: That sounds good. Let's try it.
ammaar: All right, so I've just copied the story. I'm just gonna paste it in.
Okay, so we've got a beginning. Once upon a time, there was a young prince named Harold.
What do you think we could do here? I think it could be maybe a young boy in a kingdom, you know, to show that prince growing up.
aj_asver: I think that could be cool. Let's think about what it might look like. I'm thinking like maybe a redhead could be kind of cool.
ammaar: yeah. You're really original with this. Yeah.
aj_asver: I'm thinking the kingdom would have a lot of castles,
From the medieval times. That might be kind of cool.
of rolling green hills.
ammaar: I love that. very scenic. Yeah.
aj_asver: we could add some soldiers wearing red [00:23:00] tunics and really tall black hats made out of fur. I'm
you know,
ammaar: I've just never seen anything like that. That sounds so incredible.
aj_asver: We're kind of coming together with a story right now and you are writing it out and you are doing it in Apple, in your Apple notebook, but
this like in a Google Doc or
you could write out really anywhere. And what you're doing is you're essentially taking a story that ChatGPT created and just kind of describing the different scenes.
ammaar: where, right?
aj_asver: exercise to do and we're using a lot of our imagination. But I was curious, could ChatGPT do this too? Like, could it describe some of the scenes?
ammaar: Sure. Yeah. Why not? Let's just switch back to ChatGPT and pick a paragraph and then tell it to describe what that scene might look like and let's see what it gives us.
aj_asver: That would be really cool.
Describing illustrations with ChatGPT
ammaar: All right. So we're back in ChatGPT, we're still in the middle of our conversation with it. And let's, you know, let's just take, let's take that first paragraph and say, describe how you would illustrate [00:24:00] this scene. And I'm just gonna paste in the thing and let's see what it does
Now there we go. This is a really great description of what it described.
So we started with the first paragraph, which just to remind everyone was, once upon a time, there was a young prince named Harold.
He lived a life of luxury, but it was a life that was missing something very important, the love and care of his mother. She passed away when he was just a boy, leaving him with a broken heart and the longing for someone to fill the void. We asked basically to describe how it would illustrate the scene.
So here's what we got in response. To illustrate the scene of Prince Harold's life, I would depict a young prince standing in a grand castle surrounded by riches and luxury, but with a lonely and sad expression on his face. The room would be spacious and well appointed, but with a feeling of emptiness.
Wow. In the background, there would be a portrait of Harold's mother with a veil of sadness covering her face, symbolizing the [00:25:00] loss of her love and care for her son. The window of the room would show a view of a gray and gloomy sky reflecting the prince's heartache and longing for someone to fill the void left by his mother's passing.
This is very dramatic and emotional. To emphasize the Prince's loneliness, I would have him standing alone with no one in sight and his head bowed in sadness. The overall mood of the illustration would be somber with muted colors and soft light, but with a hint of hope in the prince's eyes hinting at the bright future that lies ahead for him.
aj_asver: So that was really cool because when we were talking about it, we kind of had a few words or
ammaar: Yeah,
aj_asver: it. But when we went back to ChatGPT and actually just said, Hey, ChatGPT I want you to illustrate, how you would describe this part of the story. It actually went into a lot of detail to do that.
And what it means is for someone that isn't necessarily good at imagining this and really describing it to illustrate it, you can actually shortcut that and let ChatGPT do some of that work for you, which is really cool. So we've got an idea, a scene now for the [00:26:00] first part of the book. How do we turn this into an illustration?
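The step described here, pasting one paragraph of the story at a time and asking how to illustrate it, can be sketched in a few lines of Python. This is only a sketch under assumptions: the helper name `scene_prompts` is hypothetical, the instruction wording is borrowed from the conversation, and the code just builds the text you would paste into ChatGPT rather than calling any API.

```python
def scene_prompts(story: str,
                  instruction: str = "Describe how you would illustrate this scene:") -> list[str]:
    """Build one follow-up prompt per paragraph of the story (hypothetical helper)."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in story.split("\n\n") if p.strip()]
    return [f"{instruction}\n\n{p}" for p in paragraphs]


story = (
    "Once upon a time, there was a young prince named Harold.\n\n"
    "One day he went to a play and met an actress named Megs."
)

for prompt in scene_prompts(story):
    print(prompt)
    print("---")
```

Each printed prompt can then be pasted into the same ChatGPT conversation, so the model still has the full story as context.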
Illustrating with MidJourney
ammaar: So this is where MidJourney comes into play.
aj_asver: so what is MidJourney Ammaar?
ammaar: Yeah. So MidJourney is the generative AI tool we're gonna use. Where ChatGPT is kind of giving us text descriptions of things,
MidJourney allows us to visualize that. So with a prompt, the same way we gave it to ChatGPT, it's gonna create images for us that we could start using in, you know, a book. So let's dive in.
The first step to using MidJourney is heading over to MidJourney.com. And you know, you come across this very trippy homepage, very eighties hacker-esque. And honestly that's probably reflective of what it feels like using MidJourney.
It's not the most approachable but it's not that hard to get started. So what you wanna do is click Join the Beta, and once you click that you're going to be guided to Discord. And so this is where you need a Discord account to join essentially the MidJourney channel. It's like joining a Slack channel, basically, where you can then talk to the MidJourney [00:27:00] bot and provide it the prompts.
And so for those who are not familiar with what Discord is, it's basically Slack for communities. So if you've ever used Microsoft Teams or Slack or any of these tools that allow you to communicate with people, Discord is aimed at those niche communities of people who are just hobbyists in different fields.
Could be movies, you know, cartoons, games. It really started with video games, actually. And now, you know, it's got AI enthusiasts. So this is just a channel, a home for anyone who's really interested in generating art to join and have conversations and share their creations with each other.
Okay, so in Discord. So this is Discord. This is what it looks like. You know, it essentially seems like a chat app, like an instant messaging app.
If you've ever played with MSN Messenger back in the day, or Yahoo Messenger, this is just like that. Or AIM I think in the States. But what we're going to do, I said that, you know, MidJourney had its own channel where you can see the community and here's just a preview of that. It, you know, you can see someone's creations, looks like they've made some [00:28:00] incredible art about, you know, with Spider-Man and The Hulk.
Beautiful style. But what we wanna do is we want to communicate with the MidJourney Bot. And the MidJourney bot is essentially this automated, you know, chatbot just like ChatGPT that lives inside of Discord. And we're going to tell it what we want to see and what we want to imagine, and it's going to end up creating images for us.
So let's jump over to the MidJourney bot and start generating some images.
All right, so we are in MidJourney and we are about to bring Prince Harold to life. Let's see what we had. So AJ, your description was a young boy in the kingdom, redhead with freckles, has a lot of castles, scenic, soldiers. All right, so I'm just gonna copy that over. And let's work with this.
To use MidJourney, the first thing you wanna do is you type slash Imagine and so slash imagine allows you to start entering your prompt and creating images and artwork. [00:29:00] And so I copied that description that we had written in Apple Notes, and I'm just pasting it in MidJourney, and I'm saying young boy in a kingdom redheaded with freckles.
Kingdom has a lot of castles, scenic soldiers with large black hats. And I'm actually just gonna add a few more adjectives to this. Maybe I'm gonna say a prince actually in a kingdom, a young prince. I wanted to create and generate some of that royalty and that feeling of royalty in this image, right?
And I'm also gonna say a children's book illustration to make it, you know, an approachable art style, children's book illustration.
aj_asver: That bit is really interesting that you just talked about the art style, because I've seen so many different styles of art come out in MidJourney and it seems like a big part of successfully using a product like MidJourney is really being able to describe the kind of style of art you want. You know, it does everything from like photorealistic art that looks like a photograph all the way through to, you know, comic style art that might look like, you know, a graphic novel [00:30:00]or a comic. And then, you know, in between. Like I'm sure we could get it to generate an illustration that looks like something from the Pixar universe, for example. That's really cool. And then when you put it in and you say Children's book illustration, is that enough or do you try to add more?
What's kind of some of the tips you have for making a really good prompt in MidJourney?
ammaar: Yeah, it's really interesting you asked that, because I think the more specific you can get, the better your results are. And sometimes you really need to, you know, I think this prompt, if I was gonna critique it, I would actually say it's probably got too much going on. You know, it's going to try to get that children's book style.
It's going to try to bring a young prince in. We're giving descriptions to the prince. We're describing the kingdom around them. So there's a lot going on here, and I think we're gonna, you know, we'll generate and we'll see where the results are. But really the way to get better at this is to whittle down that prompt to get to just the parts that are yielding exactly what you want.
And you know, our prompt here, children's book illustration, A young prince in a kingdom redhead with freckles. And what's gonna be interesting there is will it know [00:31:00] to associate the redhead with freckles with the prince. We don't know that yet.
aj_asver: Yep.
ammaar: Kingdom has a lot of castles. Now where's it gonna place this prince?
Because now we're describing kind of the background and the setting and the, you know, it's a scenic background, so that's also gonna be interesting. And then soldiers with large black hats, so another character has been introduced as well in this prompt. Usually I like to stick to one character in a prompt or maybe two max.
And, you know, the setting behind them. But, you know, I think this is just gonna be a really great way to see where we end up and, you know, work with that.
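The prompt structure being discussed (an art style, one or two characters, then a setting) can be sketched as a small Python helper. This is an illustration only: `build_imagine_prompt` is a hypothetical name, the two-character limit simply encodes the rule of thumb mentioned in the conversation, and the result is just the text you would paste after typing slash imagine in Discord.

```python
import warnings


def build_imagine_prompt(style: str, characters: list[str], setting: str = "") -> str:
    """Join style, characters and setting into one prompt string (hypothetical helper)."""
    # Rule of thumb from the conversation: one character per prompt, maybe two at most.
    if len(characters) > 2:
        warnings.warn("More than two characters in one prompt; their details may get mixed up.")
    parts = [style, *characters, setting]
    return ", ".join(part.strip() for part in parts if part.strip())


prompt = build_imagine_prompt(
    style="children's book illustration",
    characters=[
        "a young prince in a kingdom, redheaded with freckles",
        "soldiers with large black hats",
    ],
    setting="the kingdom has a lot of castles, scenic",
)
print(prompt)
```

Keeping the parts separate makes it easy to whittle the prompt down later, for example by dropping the second character and generating again.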
aj_asver: That was a really good point, by the way, on the number of different characters. Oftentimes what I've seen from generative AI has had one character in it, and there's a few times where I've tested it out with multiple characters and it kind of finds it a little bit harder to discern between the two. Let's see how this one works.
And then if we need to, we can always remove the second characters.
ammaar: Yeah. That sounds great. Let's let's do it. So I'm just gonna hit enter and we are gonna start generating some art. So the thing I really enjoy about using MidJourney is you start to get a [00:32:00] preview of how that image is being composed live.
aj_asver: And for folks that don't know how MidJourney works, it actually uses an AI approach called diffusion. And what that does is it actually takes images and destroys the images into noise, like the noise you would see on an old television, you know, before we had cable, just with white and black dots, and then it learns how to take that noise and rebuild it back into an image again. And so what MidJourney and other products like it, like Stable Diffusion and DALL-E, have done is they've taken millions of images and done the process of destroying them to noise and building them back up into an image. And over time as you do that, you train the model to learn exactly how an image is made, in all the different parts of an image, and how to take text that's related to that image using a separate model, which is usually a language model, and associate them together. And the process that Ammaar showed us is actually that live process of the idea that Ammaar fed into MidJourney, the prompt, being turned from noise into an [00:33:00] image bit by bit. And cool. What do we have?
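The "destroy the image into noise" half of that explanation can be shown with a toy Python sketch. It uses the standard forward-noising formula from the diffusion literature (scale the image down, mix in Gaussian noise); the four-value "image", the `keep` parameter name and the function name are assumptions made for illustration, and nothing here reflects MidJourney's actual code, which is not public. The learned reverse step, rebuilding an image from noise, is what the trained model does and is not shown.

```python
import math
import random


def add_noise(pixels: list[float], keep: float, rng: random.Random) -> list[float]:
    """Blend an image with Gaussian noise.

    keep=1.0 returns the image unchanged; keep=0.0 returns pure noise.
    """
    return [
        math.sqrt(keep) * p + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)
        for p in pixels
    ]


rng = random.Random(0)
image = [0.9, 0.1, -0.5, 0.7]  # toy "image": four pixel values

# Step from a clean image toward pure static.
for keep in (1.0, 0.5, 0.0):
    noisy = add_noise(image, keep, rng)
    print(keep, [round(v, 2) for v in noisy])
```

Training repeats this at many noise levels so the model can learn to undo a little noise at a time, which is why the preview in Discord sharpens gradually.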
ammaar: All right, so the first thing it does is it gives us this grid of four images, and it just likes to give us a few options to work with. And so looking at this grid, we can see that it has indeed placed castles behind this prince. And really it's a closeup shot of the prince. We didn't even specify that, but it decided to kind of give us that perspective, which is pretty cool.
It looks though that while it nailed our scenic background and that magical kingdom feel our young prince is wearing the outfit that we had planned for the soldiers. So he is wearing that black hat. He's dressed in military uniform, I have to say. He looks pretty sharp, it's really cool to see that.
While in some of the images it kind of dropped those freckles, it looks like it's not a fan, but in others they seem to be present. So we see four images, 1, 2, 3, and 4, and you can see there's a slight variation in the art style. You know, the first one has got a little bit of that painted touch to it.
The second [00:34:00] one's a bit more of a, you know, pencil drawing kind of feel on the third. The fourth one is almost hyper-realistic to some extent. So now the cool thing is MidJourney then presents eight buttons to you. Nine, actually. U1, U2, U3, U4 and V1, V2, V3, V4.
So what does that mean? U1, U2, U3, U4 correspond to
the image on the grid. So image number one, image number two, you can choose to upscale that. So if you like any of these images and you actually want to use it somewhere, you can click U2, and we're gonna get image number two on that grid in a high resolution format that we can start using elsewhere.
Now, if we like what's happening in image number two, but maybe we wanted to see something a little different, we'd click V2, and that just means variations on image number two, variations on image number one or three or four. And so that's just a fun way to remix and kind of still get to a style that you know you really like before you settle on something.
And then you've got those little spinny arrows, and that just means restart this whole prompt again. I didn't like anything. And so that's [00:35:00] if you just want a little reset, but you don't really wanna change the description. I actually really like what we have here and I almost want to double down on, you know, number three.
What do you think AJ?
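The button naming just described (a letter for the action, a number for the position on the grid) is simple enough to capture in a short Python sketch. The helper `describe_button` is hypothetical and only restates the convention from the conversation; it does not talk to Discord or MidJourney.

```python
def describe_button(label: str) -> str:
    """Translate a grid button label such as 'U2' or 'V3' into plain words (hypothetical helper)."""
    actions = {"U": "upscale", "V": "make variations of"}
    if len(label) != 2:
        raise ValueError(f"unknown button: {label!r}")
    kind, number = label[0].upper(), label[1]
    # The grid holds four images, numbered 1 to 4.
    if kind not in actions or number not in "1234":
        raise ValueError(f"unknown button: {label!r}")
    return f"{actions[kind]} image {number}"


for label in ("U2", "V3"):
    print(label, "->", describe_button(label))
```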
aj_asver: Awesome. Yeah, I mean, it's really cool to see how it generates those four different images and how they're kind of different. And one thing I've noticed about that is, you get a little bit of choice of, you know, which image you want to go with. All four of them look pretty great. The only thing I'm thinking about is that black hat kind of makes them look a little bit more like a soldier than a prince.
So I'm wondering if we need to change it up a little bit to remove the hat. What do you think?
ammaar: Yeah, we could try that. So why don't we pick an image we like.
I kind of like number three and I'm just gonna click V3. And what's cool here is MidJourney then brings up that prompt again and allows me to change any part of the description so that I could tweak that prompt and maybe change a specific detail about it.
So in this case, you know, let's remove that soldiers with black hats part from our prompt. And so I'm just gonna delete this bit, and I'm gonna add a young prince [00:36:00] in a kingdom redheaded with freckles wearing a crown. And so our final prompt is children's book illustration. A young prince in a kingdom redheaded with freckles wearing a crown.
The kingdom has a lot of castles and scenic background, and we removed the part where it was soldiers with black hats. That's gone. So with that, we can hit submit again. Let's see what we get.
aj_asver: So how long does it usually take for MidJourney to give you a response? And does it depend on how many people are using it, or is it fairly consistent?
ammaar: It can be pretty fast. You usually get images generated within 30 seconds. But that's if you're on the paid plan. So if you're on the paid plan, you have what they call fast image generations and fast image generations allow you to basically prioritize your prompt amongst all the other people that might be generating at the same time.
So you get priority in the queue and you get your images faster. For most people though, you'd be on the relaxed queue, and in that it can take up to five to 10 [00:37:00] minutes, sometimes even to get some of your images back. So it can be a bit of an involved process.
aj_asver: Some of these products are starting to use speed and access to getting back the thing you're asking for as a way to kind of differentiate their priced versus free version. You know, the free version, you can get a response, but it might take a bit longer.
And then the fast version, no matter how busy it is or how many people are using MidJourney you'll get a response much faster, but you have to pay for it. How much does it cost?
ammaar: Yeah, so I think the basic plan is $10 per user per month, and then $30, and they just introduced a pro plan, which is $60 per user per month. That just means you get more fast image generations, which they've capped per plan, which is pretty interesting.
aj_asver: Awesome. So if you are finding that you use this tool a lot, or it's useful to you or you want to kind of come back and make lots of different images, it could be worth spending that extra 10 bucks a month to, to get faster feedback from MidJourney.
ammaar: Totally.
aj_asver: Cool. So MidJourney is working away in the background, and for folks that don't know how this works, the MidJourney team have [00:38:00] thousands of servers or hundreds of servers running in the background that are essentially taking all these prompts from all these different people that are using the Discord channel and churning out these images, sending them back to us so we can take a look at them.
ammaar: Exactly. And so it looks like we've got new output, and the one thing you're gonna notice is our prince looks like an entirely new person. And that's because every fresh prompt just starts from scratch again. You're not maintaining your previous history. It's not ChatGPT, where it understands the context that it was in prior.
This is just starting fresh again. And so we're working with a new set of images and a completely fresh-looking prince. What I really like though about the images we just got is this crown looks fantastic. You have another great closeup shot of the prince in all four images and you're seeing that somber personality of his come through and the way he's looking, you know, in the distance, but composed, calm for a young child.
aj_asver: Yeah. For folks that are listening to this, by the way, [00:39:00] I encourage you to watch the video in the show notes. The princes already do have this kind of somber look to them. And the fourth one in particular
kind of looks like a prince that's, you know, experienced some hardship but has some hopefulness.
He's looking directly at the camera, but a little bit off center, kind of with that kind of vulnerable look in his eyes. And it looks like the perfect prince for us, I think. So that's the one I would go with. Let's go with number four.
ammaar: I completely agree. The first one was looking much more like a soldier, I think, in this path. But this one has that innocence in his eyes, which is beautiful. So let's get an upscaled image of that one. So I'm gonna click U4 and that's going to pick the fourth image on the grid and upscale it.
And the fourth image is the one we really enjoyed. And so now what's gonna happen is MidJourney is going to actually add a few more details to this image. Might even change the way the prince looks a little bit because it's trying to, you know, create a high resolution version of what was otherwise a very small square grid.
And we're gonna see that come [00:40:00] to life in.
aj_asver: And that's a really cool feature that MidJourney has actually, is that it takes that small image and kind of fills it out. And it's not just enlarging the photo so we can see it bigger. It's actually adding extra detail to make it a higher resolution image. And as you mentioned, when it does that, it's not just adding extra detail in pixels, but it might actually add extra features into the image itself, which is really cool.
You know, now we get another reveal of seeing what the upscaled image is gonna look like, which I'm so excited to see.
ammaar: Exactly. I think that's one of the most fun parts of the process. You just, you know, you're imagining something, it's coming to life and then it's going in a direction maybe you didn't expect, but you enjoy and want to keep. And I think that process of iteration and back and forth that, you know, we saw with ChatGPT when it was kind of expressing the story and we were able to expand on it, MidJourney kind of does that visually.
And that's really fun as.
aj_asver: Yeah, that's one thing I actually really love about generative AI, especially if you're not someone with any experience actually using tools like this. And, you know, if you haven't had experience using something like Photoshop to come up with an illustration, it's much more of an iterative [00:41:00] and conversational experience where you're going back and forth with this AI to come up with the answer. And it just feels a lot more approachable, and I would dare to say a lot more human in the interface than, you know, pointing and clicking on a bunch of different buttons and a bunch of different, you know, squares and circles and stuff to try and make this image. And so to me that makes it way more approachable for anyone to really try and have a go at this.
And, you know, in the space of about 10 minutes of iterating, we've now come up with this awesome, very vulnerable but hopeful prince standing in front of his castle that we can use in our story.
ammaar: Absolutely. So you can see it's made him a little more stern in this final image that we've gotten, but you still see that childlike innocence. He's still kind of looking in the distance, slightly off camera. One other thing we can do real quick is we can just hit Light Upscale Redo, and that is just another way to maybe say to MidJourney,
hey, give it another shot at that upscale process that you just did, and again, maybe try to tweak some [00:42:00] of those details, because I still really liked that the original prince had a bit more innocence to him and less of that stern look. I'm really hoping that Light Upscale Redo button gets us that.
But let's see what happens.
aj_asver: Yeah. And as you do this, one thing to bear in mind is that there's a lot of randomness involved in using generative AI. You know, it isn't something where every time you put the same prompt in, you'll get the same result. And so we're actually gonna see what a different look might look like and exactly how you just talked about that.
We now see a prince that does look a bit more innocent and is a bit closer to our original version. The other cool thing to bear in mind is you're doing this in Discord in a chat channel. So every single step of the process, you get to see that as an image, and you can always go back to the step and download that image if you don't like what you get when you, you know, do the redo.
ammaar: Exactly, nothing is lost. You can just keep expanding on your iterations, which is really fun, and it looks like it's giving us that young, innocent prince that we wanted in a high resolution version, which is awesome.
aj_asver: Very cool. So we have our Prince Harold. He's taking [00:43:00] shape. We're seeing him in front of his castle. We have an upscaled image of Prince Harold that you created using generative AI, using MidJourney, and we did it by first having the story that we made in ChatGPT. And then from the story we created some scenes, which we described.
Now, we described it ourselves, but we also showed you how you could describe the scene in ChatGPT and use some of that language to then create the prompt that you put into MidJourney as well. And, you know, there are many different ways you can approach this.
You can use your imagination, or you can ask ChatGPT to help you. It's a really fun process, and one thing I've really enjoyed about using generative AI is that kind of back and forth fun of trying different ways and experimenting. And I really encourage everyone that's listening to this and wanting to try it out for themselves to just experiment a lot: try different processes, try different ways, click the different buttons and try the different variations to see what you get. And don't be afraid to, you know, experiment.
ammaar: Absolutely. It is just as much of a process, of just [00:44:00] iteration and back and forth and, you know, brainstorming with this bot, you know, almost to come and bring your ideas to life.
aj_asver: All right, so we've got this image of Prince Harold. He's staring into the distance. He's hopeful, he's looking a little bit sad, but this is kind of the beginning of our story. How do we get this onto a page and then turn it into a book?
Upscaling with PixelMator
ammaar: Yeah. So the first thing you're gonna notice is that even though we've gotten what I would say is a high resolution image of what was on the grid, it's still not high resolution enough for print. And so we're going to use another tool called PixelMator for Mac, which you can get on the Mac App Store.
And it's got this amazing feature called Super Resolution, which uses machine learning to try and expand the image that you have to a high resolution version and maintain the details that were there. And I'm gonna show you how you can do that in just a couple of clicks and it's just magical.
aj_asver: So we've got this image, but we wanna make it even bigger. And I assume that's because in order to print this as a book, you need a super high [00:45:00] resolution image. And so you're gonna use a different tool for that. Now remind me, that tool is called PixelMator? It's a Mac app that you can install off the App Store.
We'll include a link to that app in the show notes for anyone that wants to try it out. And there are many apps like this that do upscaling that you can Google for as well. But we're gonna try this one.
ammaar: Exactly. Awesome. So I'm going to save this image from MidJourney to my desktop, and then I'm gonna fire up PixelMator and we're just gonna blow this image up so that it's ready for prime time.
We're in PixelMator, and all I've done is dragged that image that I saved from MidJourney onto the PixelMator icon on my dock. But if you just fire open the PixelMator app, you can just open the image: select the image you downloaded and it opens up. So now the main thing I want you to focus on is the resolution of this image.
It's 1536 by 1536 pixels. So it's not that big. It's a bit of a, it's a square, right? But I want this to be really big. I want it to be good enough for a book cover. I want it to be good [00:46:00] enough for pages that are gonna come in high quality print. And all I'm gonna do, there's just two clicks and we're gonna make that way bigger.
So I just click the three dots in the corner and hit Super Resolution, and it works like magic. And so now it gives me this before and after where you can see the details that are actually preserved; there's no blurriness. So notice that we've got this 1536 by 1536 pixel image. You know, it's a good enough size square, but we want something a lot bigger, something that could be printed in high quality on pages, on book covers, posters, whatever. So to do that, I just have to click the three dots in the corner here.
And then you wanna scroll down and look at this thing here on the side, which says Super Resolution. And so when I click that, it's going to blow up this image and take our image from the small square, 1500 by 1500, to something a lot bigger. And there you go. So in just that one or two seconds, you [00:47:00] know, you're seeing that there's no level of detail that's really lost between the two.
But more importantly, that image is now a 4,000 by 4,000 pixel image. And, you know, I can just keep going. I can just go again and click Super Resolution, and it's a little slower this time because it's making it huge, right? And you probably don't have to go this big, but you can, and this is helpful if you're making an animated short and you want it to be a 4K resolution shot.
But now you can see it's a 13,000 by 13,000 pixel image at really high quality. And, you know, you can zoom in and you can see that there are no blurred lines or anything like that. This is gonna work great for our book.
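The numbers in this step can be checked with a little arithmetic. The sketch below is only an illustration: the 3x factor per Super Resolution pass is an assumption inferred from the sizes quoted above (1536, then roughly 4,000, then roughly 13,000), the 300 DPI print target is a common rule of thumb rather than something stated in the episode, and the function names are hypothetical.

```python
# Assumption: each Super Resolution pass triples the side length. This is
# inferred from the figures quoted in the episode, not from app documentation.
SCALE_PER_PASS = 3

# Assumption: 300 dots per inch, a common rule of thumb for print quality.
PRINT_DPI = 300


def upscaled_side(side_px: int, passes: int, scale: int = SCALE_PER_PASS) -> int:
    """Side length in pixels after the given number of upscaling passes."""
    return side_px * scale ** passes


def print_inches(side_px: int, dpi: int = PRINT_DPI) -> float:
    """Largest print side, in inches, that keeps the given pixel density."""
    return side_px / dpi


if __name__ == "__main__":
    # Start from the 1536 x 1536 square image saved from MidJourney.
    for passes in range(3):
        side = upscaled_side(1536, passes)
        inches = print_inches(side)
        print(f"{passes} pass(es): {side} x {side} px, "
              f"about {inches:.1f} in per side at {PRINT_DPI} DPI")
```

Under these assumptions, the original image covers only about five inches per side at print density, which is why a single pass is already enough for a typical book page and a second pass is optional.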
aj_asver: This is really cool and it's a little bit harder to see it in the video, but I've tried this myself and seen how impressive it is to see the amount of pixels that it fills. And so it's something worth definitely trying yourself. We're not just making the image bigger. We're actually adding more pixels to it, and it's done using artificial intelligence to actually [00:48:00] work out where the image should add pixels and what those pixels should look like. And now we get an image that's big enough for us to put on a book.
ammaar: All right, so now that we have our high resolution image, let's just put this on a page of a book with the description we got from ChatGPT to bring it to life.
So the first thing I'm gonna do is export the image that I just created with PixelMator and, you know, pick the JPEG format. It's gonna be a five megabyte image. I'm gonna click export here, and I'm gonna just choose my desktop and save it. I think because I'm sharing my screen a certain way, you can't see the save dialog, but just imagine that classic Mac save dialog is there and I'm just choosing my desktop.
And now we're gonna move over to Pages, which comes free with most Macs today. And that just allows us to compose our book by giving us the right templates to start doing that. And it's really easy.
[00:49:00]
aj_asver: Okay, so we've got a high resolution image. We've got some text that we generated with ChatGPT. We took that image from MidJourney. We scaled it up using PixelMator, and now we're gonna put it into Pages.
Laying out book with Pages
ammaar: Absolutely. It just comes with your Mac. And if it's not pre-installed on your Mac, it's free on the App Store, so you can easily grab it.
So now I'm in Pages and, you know, when you click new document, you're asked what kind of document you want and to choose a template. And on the left hand side you can see basic, reports, books, letters. We're after books here, right? So you can click books and you can see that it's got a bunch of different templates for us to work with.
A basic photo book, contemporary novel you've got a storybook. There's some really nice ones for us to work with here. I like the storybook one. It looks like a children's book. It kind of fits the vibe we're going for. So let's go ahead and create with that [00:50:00] one.
So I clicked that and now I have essentially a template to work off of. It's given me a few different layouts, right? We've got the cover page, a page here which tells a little bit of a story with a big image as the focus, and a page that's just dedicated to an image.
And this is great because we've got enough here to work with to kind of lay out our book and give it that style. So let's just use that image that we had of the Prince and, you know, start our story the way ChatGPT kind of helped us describe it. So what I'm gonna do is I'm going to grab that image from my desktop, and I am just going to drag it here and bring it to life by putting it in there.
So as you can see, I just dragged my image over to Pages and it gave us the Prince as the focus of this first page, which looks great. We can obviously also just double click and slightly move it. If you wanna maybe emphasize this crown a little [00:51:00] bit. You could also resize a little bit, so maybe, it's not exactly that whole width, but you're getting a scene of that castle as well in it.
Again, it was cropped out before, slightly above the chin. Let's give him some of that room. And there you go. You've got the prince looking there at the reader, which is pretty fun. And now I'm gonna go grab that description we got from ChatGPT so I've just copied the text that we already had saved, and I'm gonna paste it in here.
So one thing you'll notice is it messed up the formatting, but that's okay. What we're gonna do is go to text and just choose the body text style that the template has already given us. And that'll fix it. But now you'll notice that it's a little out of the box, right? It's not fitting. So we can just slightly resize that and it fits into place. It's also given us these nice headers that we can work with.
So what I'm gonna do is actually, I'm just gonna cut out this once upon a time from this part and paste that in here. And that [00:52:00] just gives it that nice dramatic storybook feel as you're reading it. And I don't think we really need this one, so I'm just gonna go delete that. And now what we've actually done is we've created a little bit of room for ourselves, so we can actually increase the size of the text again and drag the box down just a little bit.
And there you go. There's our first page of the storybook for us to go ahead and roll with.
aj_asver: Wow, that was amazing. So we went from an idea for a storybook to then generating the actual story using ChatGPT to then creating some descriptions for what the first scene might look like. Putting that in MidJourney, going through a few iterations to come up with an illustration, and then using PixelMator to make that image even bigger so that we can use it in print.
And then we brought it into a Mac app called Pages, which is available on your MacBook. And we've used a template to turn it into the first page of a story. And this is amazing. So we've actually started writing a story now, and obviously you can keep doing this and you can [00:53:00] continue the rest of the story, which is exactly what I'm gonna do now that Ammaar has shown me how to do it.
But there's one piece that I still don't understand how you did, Ammaar. So let's say we made a whole book.
How did you get it into the Amazon bookstore? How do you make it into an actual book that someone can have in their hands?
ammaar: Yeah, for sure. So once you've kind of composed all of your pages, what you want to do is export this book, and that export functionality is available in Pages. You just go to the Mac toolbar, click export, and export to PDF. And so once you have that PDF, you're going to go to Amazon KDP and publish your title.
aj_asver: What's Amazon KDP?
Publishing on Amazon KDP
ammaar: Yeah. So for those who don't know, KDP is the Kindle Direct Publishing platform for Amazon, and that essentially allows anyone to publish their own Kindle books. They just make you sign up and provide tax information if you're selling that book. And then you can go and fill out the information about your book.
So the title, who the authors and illustrators are, the description, and then you [00:54:00] just upload your PDF. Then once you're on KDP, it's just gonna ask you if the formatting looks good. You can also enable it if you want it to be a paperback, and click next, next, next, and you're done. And then it's live on the Amazon store within 72 hours, if approved.
aj_asver: That is amazing. So within 72 hours of having this book ready in pages, you can actually have it on the Amazon bookstore. And now it's not just a Kindle book. They actually give you a paperback version or a hardback version too, that people can then buy. And did you have to spend any money to publish this book?
ammaar: I didn't have to spend any money apart from the MidJourney subscription. That was the only thing that cost money here.
aj_asver: That's so cool. So you really didn't have to pay for publishing the book. Amazon is basically kind of putting the book there and then if people want it, I guess they print it and then send it to 'em. So it's kind of on demand.
ammaar: It's on demand. Print on demand is this new phenomenon, actually, that they've somewhat pioneered, which is, yeah, they don't have to hold all this inventory. They can just print on demand with all these printers across the world. [00:55:00] And the way they make money from it is if there's a sale, they take a cut from your sales.
And so you don't have to put up any upfront capital. So if you're someone who wants to put your book out there, you don't have to worry about buying a load of books and hoping that someone's going to buy them and hold that inventory. Amazon's taking care of that for you.
aj_asver: Wow. So really thanks to Amazon, anyone can be their own publisher, which is so cool. And I'm gonna go do this. I'm so excited to make this into a real book and publish it and then get a copy of it delivered to me so I can show everyone and all my friends just like you did. Now, do you have a copy of your book?
I really wanna see what it looked like.
ammaar: Here's the book, it's the hardcover version. And, you know, it's in my hands, which is kind of crazy, right? Yeah, I love how some of these images came out.
And yeah, the print quality was also really impressive. This one's my favorite image in the book. It kind of shows Alice all grown up with her robot friend. So yeah, there you go.
aj_asver: That is so cool. And one of the things to also think about is if you don't wanna publish this book to everyone and you just want to kind of make a [00:56:00] gift, let's say it's for Mother's Day or for your friend, or maybe you want to gift it to a niece or a nephew, you can just publish it and then buy one copy.
And I guess after that you can stop publishing that book. So that's kind of a cool hack to just make a custom book for yourself.
ammaar: Absolutely. Yeah. I think this is a great way to just gift personal stories to people, which, you know, is just, it's amazing that we can now go ahead and do that and realize that.
aj_asver: Thank you so much, Ammaar, for being down to show us exactly how you illustrated and published a book in the space of a weekend. And anyone can try this. Just follow along with the steps in the video. It's very straightforward. We'll include all the links in the show notes as well, so you can give it a try too.
Ammaar, I'm so glad that you were able to make it. I really appreciate you joining me for one of the first podcast episodes. Do you have any future plans to create any more cool stuff? One area you haven't ventured into is music. Should we expect an Ammaar Reshi Billboard Top 10 anytime soon? Yeah.
ammaar: I [00:57:00] think I'm waiting for Google to release that model so that we can start working with some music. I've also been seeing a lot of these 3D tools out there now where you can go from text to 3D models, and so it might be fun to make a little video game with AI-generated assets. That could be really fun.
I'm just gonna let inspiration take the wheel, so I'll let you know when that happens.
aj_asver: Amazing. Thank you so much for listening. I really appreciate it. If you enjoyed this podcast and wanna learn more about generative AI, about the world of AI, and follow me as I explore artificial intelligence and the way it changes how we live, work, and play, just subscribe to this podcast in your favorite podcast player, and we'll have more episodes coming up soon too. Thank you, Ammaar, and have a great day.
ammaar: Thanks for having me. Thanks.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit guidetoai.parcha.com
Welcome to The Hitchhiker's Guide to AI Podcast
10 feb 2023· The Hitchhiker's Guide to AI
Hey everyone,
Welcome to The Hitchhiker’s Guide to AI podcast. With all the excitement around AI over the last year, I wanted to find out whether it's a tech fad or if it's really going to change the way we live, work and play.
In this podcast, I'll answer that question by talking to builders, creators and researchers pushing the boundaries of AI to really understand what's possible in the future. And what AI can and can't do today.
But most of all, I'm excited for you to join me on this journey. Together, we're going to work out what a future looks like with AI as a big part of it. So if you're curious about how AI is going to impact the world around us, then subscribe to The Hitchhiker's Guide to AI and join me on this journey as we explore the world of artificial intelligence!
If you want the latest AI updates and learn more about the space, subscribe to The Hitchhikers Guide to AI newsletter at hitchhikersguidetoai.substack.com.


Episodes

Interview: Supercharging your team with Coda AI | David Kossnick

Interview: Human-level AI and AI Agents with Josh Albrecht, CTO of Generally Intelligent

Enterprise AI, Augmented Employees, AGI and the Future of Work with Charlie Newark-French, CEO of Hyperscience

How to prompt like a pro in MidJourney with Linus Ekenstam

How AI Chatbots work and what it means for AI to have a soul with Kevin Fischer

How to publish a children's book in a weekend using AI with Ammaar Reshi

Welcome to The Hitchhiker's Guide to AI Podcast