Exploring Web Programming using AI

For the last several weeks, I have been exploring web programming using AI. There are a couple of web page sketches I wanted to write. Looky here:

Background

There’s been a lot of noise about using AI in programming workflows. I spent much of August studying, trying to get a feel for what’s going to be important over the next few years. There’s the general overview, and then more specifically, the separation of cloud/desktop computing into tangible computing. Tangible computing is an embodiment of what people call Edge Computing into actual physical devices.

One question I wanted to explore was how to interact with remote compute elements. The universal method for dealing with almost any connected device is through some type of web-based application. This means that the device itself, a Jetson for example, needs to run some type of server, and you need a little bit of understanding of how to deliver information.

Web Programming

At the same time, there’s been a big push to use AI-based tools in the programming arena. Web programming is relatively mature as far as such things go. As a programmer, there are a few things to know. First, there is a server which runs on a host machine. Clients are connected to the server through a network connection, hard wired or wireless.

While clients can be connected via a custom app, most use a web browser to interact with the server. The server sends markup language (HTML), Style Sheets (CSS), and scripts (JavaScript) to the client web browser to render pages.

On the server side, you need to know how to actually serve the pages. Typically, this is done through a mature library of code, wrapped for the language of your choice. The main challenge for you as a developer is to organize the data and send it to the clients. This usually involves some type of database or file organization along with any dynamic data that you need to generate.
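To make the server side concrete, here is a minimal sketch using only Python's standard library. It is not the code from either of my sketches. The page content, the /api/status path and the port number are placeholders I made up for illustration. It shows the two jobs described above: serving a page, and sending some dynamic data to the client.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page; a real device would serve something more useful.
PAGE = b"<!doctype html><html><body><h1>Hello from the device</h1></body></html>"


def build_response(path):
    """Map a request path to (status, content type, body bytes)."""
    if path == "/":
        return 200, "text/html; charset=utf-8", PAGE
    if path == "/api/status":
        # Dynamic data would be generated here; this value is a placeholder.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        return 200, "application/json", body
    return 404, "text/plain; charset=utf-8", b"Not found"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8000):
    """Serve until interrupted; 0.0.0.0 makes it reachable from the network."""
    HTTPServer((host, port), Handler).serve_forever()
```

Calling serve() on the device and pointing a browser at port 8000 should show the page. Keeping the path handling in a plain function makes it easy to check without starting a server.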

As a note, there are many levels of complexity in the web ecosystem. A lot of attention goes to enterprise-level work. However, for people working with smaller systems on the edge, that may not be a good fit.

Using AI to “Help”

In the video, I wrote a couple of sketches of web apps. In the first case, I just told the AI I was using (ChatGPT-4o) to architect and start writing code. Not surprisingly, ChatGPT started writing an enterprise-level application.

Regarding the web page user interface, the only novel interaction on the web page was a splitter element which allows the user to adjust the size of the left and right pane with the mouse. Otherwise, it was a straightforward web application.

ChatGPT chose to use the most popular architecture today: A Single Page Application (SPA), with a Python Flask server, and a React server for JavaScript on the client side. Not surprisingly, this led to quite a bit of work just keeping the separation of concerns straight in our discussions.

The server side was fairly straightforward. The web pages represent personas, which are a description of how you would like the AI to respond when you work with it. Sometimes you might want it to explain something to you as if you are in high school, or you might want to talk to it like it is a college professor. This could be a profession too, such as “talk to me like I am a marketing professional”. This puts context on the discussion.

The AI was very helpful here in two ways. First, it was able to generate personas from descriptions. ChatGPT is able to do a basic interview and generate a persona to store in a JSON file. Additionally, it was able to create an image of the person or thing being described.

Second, the server side code was generally a good template on how to build the application. Because the application is taking the personas and serving them, making requests to add or subtract code is simple. It’s basically “Give me a route that serves this information,” and ChatGPT produces a good outline of the code.
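As a rough illustration of what such a route boils down to, here is a sketch of the persona lookup in plain Python. It is not the code ChatGPT generated. The one-JSON-file-per-persona layout and the function names are my assumptions for this example. In a Flask app, the body of get_persona would sit under an @app.route decorator and the payload would be returned with jsonify.

```python
import json
from pathlib import Path


def load_personas(directory):
    """Read every *.json file in `directory` into a dict keyed by file stem."""
    personas = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            personas[path.stem] = json.load(f)
    return personas


def get_persona(personas, name):
    """Return (status, payload) for a request such as GET /personas/<name>.

    The URL shape is hypothetical; only the lookup logic is shown here.
    """
    persona = personas.get(name)
    if persona is None:
        return 404, {"error": f"unknown persona: {name}"}
    return 200, persona
```

Adding another route is then mostly a matter of writing another small function like get_persona, which is why this kind of request worked well.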

That’s not to say it’s beautiful, flawless code. But it’s something you can work with.

Client Side – Big Ouchies

On the client side, ChatGPT and I were in way over our heads. It had chosen React as a framework, which means there’s a React server running. Remembering there is a symbiosis between the server and client, it was difficult to break down tasks into small enough elements for ChatGPT to work on them.

On the client side, I ended up thinking of the generated code as a glorified code template. Thinking about it for a couple of weeks, I concluded that it’s not so much a knock on ChatGPT but rather a mismatch of the tools to the task.

A web front end/back end architecture may be useful if you’re an enterprise or have teams of people working on website design, programming and maintenance. When you’re a lone developer, that’s a lot of things to keep on the cognitive stack.

Annotated Video

About 10 years ago, I remember coming across Bret Victor’s annotated video ‘Media for Thinking the Unthinkable’ in one of his excellent presentations. While the article is quite good, I thought the presentation was just as fascinating.

I built a prototype at the time, but the technology wasn’t quite there yet for anything more than bespoke builds. One had to be able to control the video from the browser and have transcriptions of the audio to match against. Automating transcriptions was still a while away, so about the only realistic way to get transcriptions was to either do it manually or send the audio to a service. Expensive, either way.

Dynamicland

Victor’s research lab, Dynamicland, recently had an update which rekindled my interest in the annotated video idea. They also do a lot of tangible computing. I can’t recommend their website enough.

If you look at the Dynamicland Intro page, you’ll see the modern implementation of annotated videos. The idea has become much easier to implement, or so I thought, so I took out the ‘sketch pad’ to get to work.

Pretty simple game plan. Get the video transcript, turn it into paragraphs with timestamps, get thumbnails, get headers, mash it all together and you’re ready to rock. It was almost like that …

Getting the video transcript was easy; there’s a Python library to do that. I used a JetsonHacks video as the guinea pig. The video transcript is a group of sentence fragments. Perfect! An LLM should be able to turn that into paragraphs.

Coaxing and Fiddling

I figured an hour or so to get the paragraphs and put time stamps on them. It actually took about eight. The transcript was too long to be processed all at once, so it took a little while to convince ChatGPT to cut it into chunks. At times ChatGPT wrote code to do the paragraph splitting and helpfully presented it to me. I put my foot down, gave it the transcript sentence fragments, and told it:

Please process the entire transcript provided using your language model capabilities. Do not use code. Break the text into coherent paragraphs based on the semantic flow, making sure to analyze the entire file and maintain its original meaning. Add proper punctuation and capitalization where needed, but keep the text verbatim. Do not summarize or omit any content. You can process the transcript in chunks. Finally, save the processed transcript into a text file for download.
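If you would rather do the chunking yourself before handing the text to the model, the idea can be sketched in a few lines. This is an illustration, not what ChatGPT did internally. The function name and the 6000 character budget are assumptions, with character count standing in as a rough proxy for the model's real limit.

```python
def chunk_fragments(fragments, max_chars=6000):
    """Group consecutive fragments into chunks of at most `max_chars` characters.

    Fragments are never split, so a single fragment longer than `max_chars`
    ends up in an oversized chunk of its own.
    """
    chunks, current, size = [], [], 0
    for text in fragments:
        extra = len(text) + (1 if current else 0)  # +1 for the joining space
        if current and size + extra > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append(" ".join(current))
            current, size = [], 0
            extra = len(text)
        current.append(text)
        size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because fragments stay in order and are never split, joining the chunks back together gives the original text, which makes it easy to check that nothing was dropped along the way.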

It eventually arrived at a workable solution. Each transcript segment has a time stamp, so I wrote a script to attach a time stamp to each paragraph from that info. Creating the paragraph thumbnails is easy: I have the source file and could use ffmpeg to ask for a thumbnail for a given time. Those go into a JSON file. Same thing with headers.
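Here is one way the time stamp step could work, as a sketch rather than my actual script. It assumes each segment is a dict with "text" and "start" keys, and that the paragraphs contain the same words in the same order as the segments, with only punctuation and capitalization changed. If the LLM adds or drops words, the times will drift. The second function builds an ffmpeg command line that grabs a single frame; the function names and file paths are placeholders.

```python
import bisect


def paragraph_start_times(segments, paragraphs):
    """Return the start time of the segment holding each paragraph's first word.

    Assumes `segments` is non-empty and that the word order and word count
    are the same in the segments and the paragraphs.
    """
    # word_offsets[i] is the index, within the whole transcript, of the
    # first word of segment i.
    word_offsets, total = [], 0
    for seg in segments:
        word_offsets.append(total)
        total += len(seg["text"].split())

    times, position = [], 0
    for para in paragraphs:
        # Last segment whose first word is at or before `position`.
        i = bisect.bisect_right(word_offsets, position) - 1
        times.append(segments[i]["start"])
        position += len(para.split())
    return times


def thumbnail_command(video_path, seconds, out_path):
    """Build an ffmpeg argument list that writes one frame taken at `seconds`."""
    return ["ffmpeg", "-y", "-ss", f"{seconds:.3f}", "-i", str(video_path),
            "-frames:v", "1", str(out_path)]
```

The command list can be handed to subprocess.run(cmd, check=True), once per paragraph, using the times from the first function.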

Putting it All Together

Next up was writing the server, and building the page to serve. I went with FastHTML, Jeremy Howard’s new take on web programming. For a simple prototype like this, it was a great fit (and certainly something you should check out). The basic idea is to serve everything from the backend, just like the “good old days”. There are lots of modern tricks of course!

I lifted the JavaScript and CSS from the Dynamicland annotated video. This would be a no-no in a production environment without licensing/permission, but this is for personal exploration. In order to set up the transcript pane, I wrote (with the help of ChatGPT) a little weaving function which assembled the headers, thumbnails and paragraphs. It didn’t take very long to build the server, considering that I was learning FastHTML at the same time.
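The weaving step is roughly a merge by time stamp. The sketch below is not the function from the project. It builds plain HTML strings rather than FastHTML components, and the dict keys, class name and data-start attribute are all made up for illustration (they are not Dynamicland's markup). It assumes headers and paragraphs each carry a "start" time in seconds.

```python
import html


def weave(headers, paragraphs):
    """Merge headers and paragraphs by start time into one HTML fragment."""
    # Sort by time; at equal times a header (0) comes before a paragraph (1).
    items = [(h["start"], 0, h) for h in headers]
    items += [(p["start"], 1, p) for p in paragraphs]
    items.sort(key=lambda item: item[:2])

    parts = []
    for start, kind, item in items:
        if kind == 0:
            parts.append(
                f'<h2 data-start="{start}">{html.escape(item["title"])}</h2>'
            )
        else:
            parts.append(
                f'<div class="paragraph" data-start="{start}">'
                f'<img src="{html.escape(item["thumbnail"])}" alt="">'
                f'<p>{html.escape(item["text"])}</p></div>'
            )
    return "\n".join(parts)
```

Carrying the start time on each element is what lets client-side script seek the video when a paragraph is clicked; that script is not shown here.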

In a couple of days, including learning about FastHTML and understanding how the Dynamicland annotated video works, I had a functioning annotated video of my own! The server, JavaScript and CSS code together come to about 500 lines. It feels within reach of most developers.

Conclusion

I don’t think there are any great insights I can offer here. The ideas reinforce what I already know, mainly that web programming is quite a volume of work. You have to know a lot of things, and have a very good idea of what the final result should be. Also, you have to be very careful that you don’t try to use a development model that is more suited to a group of people rather than a developer on their own.

As far as the AI part goes, it’s interesting and entertaining. Sometimes actual work might get done. However, for now, I think the important thing is to play to the AI’s strengths. It’s pretty good at building what I’ll call smart templates. It’s also good at explaining when you encounter errors, and can make good guesses as to what chunks of code are doing. I might even be coaxed into letting it write unit tests, or add guard code. Maybe a sprinkle of documentation here and there.

But at the end of the day, it doesn’t feel ready to build non-trivial applications right out of the box. There are AI wranglers out there who are better than I am at this; maybe they get the desired results. But it’s not general knowledge just yet.

The post Exploring Web Programming using AI appeared first on JetsonHacks.