[IDEAS] GUI App Synthesis with LLM

Posted on November 22, 2022 in Thoughts Apps Markdown Extension

This is an idea I have been thinking about for a while: let any user describe their custom needs in natural language, and then use an LLM to synthesize the GUI software. An LLM on its own cannot generate realistic cross-platform GUI software, as existing software development tools are too complicated for an LLM to target in a few shots. However, an LLM can generate a DSL, which can then be compiled into the GUI software.

App Synthesis with LLM

Many people have been talking about using LLMs to generate software. However, most examples only generate simple HTML or simple Python code. Any real-world software is much more complicated than that, and I am not confident that LLMs will be able to generate it in the near future.

From a high-level perspective, a GUI app consists of frontend UI and backend logic. The frontend UI is usually a hierarchical structure of UI components, and the backend logic is usually a set of APIs to interact with the UI components.
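
To make this split concrete, here is a minimal sketch in Python. The names `Component`, `Backend`, and `todo.add` are made up for illustration and do not belong to any existing framework; the point is only that the frontend is a tree and the backend is a set of named operations the tree can call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    """One node in the UI tree (hypothetical type, for illustration only)."""
    kind: str
    props: Dict[str, str] = field(default_factory=dict)
    children: List["Component"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Return an indented text outline of this subtree."""
        label = self.props.get("label", "")
        line = "  " * depth + self.kind + (f" [{label}]" if label else "")
        return "\n".join([line] + [c.render(depth + 1) for c in self.children])


class Backend:
    """A registry of named operations that UI components can call."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self._handlers[name] = handler

    def call(self, name: str, *args: object) -> object:
        return self._handlers[name](*args)


todos: List[str] = []
backend = Backend()
# The handler stores the item and returns the new number of items.
backend.register("todo.add", lambda text: todos.append(text) or len(todos))

app = Component("Window", {"label": "Todo"}, [
    Component("TextInput", {"label": "New item"}),
    Component("Button", {"label": "Add"}),
])

print(app.render())
print(backend.call("todo.add", "buy milk"))
```

A real toolkit would of course draw widgets instead of printing an outline, but the shape of the problem stays the same: a tree to describe, and a handful of calls to wire up.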

Extended Natural Language for UI Programming

UI programming has always been a hard task. I personally prefer the web + container approach, such as Tauri, Electron, or React Native (strictly speaking, RN renders with native UI components, but it is still programmed in a web-like way). However, the web approach is still hard to use, as it requires a lot of knowledge of HTML, CSS, and JavaScript.

AuTool: A Yaml-Embedded DSL for Task Automation

# An AuTool task file defined in YAML
- task: A task is a collection of steps
- actions:
    - cmd.while(...):
      - cmd.if(...):
        - os.shell(...)

The DSL is not strictly necessary, as the whole thing can be expressed directly in a general-purpose language such as Python:

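class Task:
    """A task is a collection of steps."""

    def __init__(self):
        self.steps = []  # the actions that make up this task

    def actions(self):
        # Mirrors cmd.while(...): keep yielding the steps until the caller stops.
        while True:
            yield self.steps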

Ex2: DSL Segments Mixed with Natural Language

Similar ideas have been explored in many other excellent works, such as MDX (Markdown + JSX). MDX mixes JSX into Markdown to make it more expressive. However, the complexity of JSX is not abstracted away; users still need to learn JSX to use MDX. Moreover, MDX cannot do more than JSX can, which is just frontend UI.

import {Chart} from './snowfall.js'
export const year = 2023

# Last year’s snowfall

In {year}, the snowfall was above average.
It was followed by a warm spring which caused
flood conditions in many of the nearby rivers.

<Chart year={year} color="#fcb32c" />

Another amazing work is Typst. Typst is much nicer than awkward LaTeX typesetting or Beamer (i.e., TeX-based slide making), and it is also much more expressive than Markdown. The design philosophy of Typst is similar to imposing low-level control on C with inline assembly or pragmas, e.g., __asm { ... } or #pragma omp .... Moreover, Typst APIs are largely declarative, which makes them easier for an LLM to reason about and generate.

Glaciers as the one shown in @glaciers !

#figure(
  image("glacier.jpg", width: 70%),
  caption: [
    _Glaciers_ form an important part
    of the earth's climate system.
  ],
) <glaciers>

A problem with Typst is that it is designed only for TeX-like documents. It cannot be rendered to hierarchical UI components, interactive logic, or more complex backend logic. Similarly, Marp is restricted in that it can only render simple slides.

Markdown-UI

A pretty cool Markdown-to-UI translator: Creating UX with MARKDOWN. Its syntax is simpler than that of markdown-ui, another Markdown extension for pretty UI generation.

===Login===
username: ___
password: *___
===
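
Part of the appeal is that a syntax like this is trivial to parse. The sketch below is my own toy parser, not the actual implementation of that tool; it assumes that a `===Title===` line opens a form, a bare `===` closes it, and a leading `*` marks a masked (password) field, which is only my reading of the example above.

```python
import re
from typing import Dict, List


def parse_form(text: str) -> Dict[str, object]:
    """Parse a `===Title===` ... `===` block into a title and a field list."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = re.fullmatch(r"===(.+)===", lines[0])
    if header is None or lines[-1] != "===":
        raise ValueError("expected '===Title===' ... '===' delimiters")

    fields: List[Dict[str, str]] = []
    for ln in lines[1:-1]:
        # `name: ___` is a text input; `name: *___` is assumed to be masked.
        m = re.fullmatch(r"(\w+):\s*(\*?)___", ln)
        if m is None:
            raise ValueError(f"unrecognized field line: {ln!r}")
        kind = "password" if m.group(2) else "text"
        fields.append({"name": m.group(1), "type": kind})
    return {"title": header.group(1), "fields": fields}


SAMPLE = """
===Login===
username: ___
password: *___
===
"""

form = parse_form(SAMPLE)
print(form)
```

The resulting dictionary is exactly the kind of small, structured description that a renderer (or an LLM) can turn into real widgets.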

[Proposal] MxL: Markdown Extension for App Building

People love Markdown for content making, as it is simple and expressive. However, Markdown is only good for creating unstructured content, such as blog posts and notes. It is not capable of expressing structured content, such as UI components or interactive logic.

Some people proposed MDX to bring the power of React/JSX into MD, but the complexity of JSX makes it hard to use; I’d rather use React directly. Other works like Typst or Notion are extending Markdown to more structured content, but they are still restricted to text rendering.

Jupyter and RMarkdown mingle executable code snippets with Markdown text. IMHO, one downside of Jupyter is that Python code can be very verbose for realistic programs, and the whole canvas can become very messy.

% Front matter to set up app metadata
---
app: "My Todo GUI App"
settings:
  - start-time: *
  - server: 127.0.0.1
---

% pragmas to set up UI layout
# layout(
  separator = "Tabs" | "Menu" | "None",
  style = "Material design",
  prompt = "I would like the elements to be compact and simple. The"
)

% Two-way binding to backend data
# $todo = database.query("select * from todo")

= Page1
A page to display the pending Todo items here.
# set( bind = $todo, max = 3 )
| Syntax | Description |
| --- | ----------- |

= Page2
Another page to show a heatmap of the Todo items.
# set( data = $todo, prompt = "" )

<code mermaid>
graph LR
A[Square Rect] -- Link text --> B((Circle))
A --> C(Round Rect)
B --> D{Rhombus}
C --> D
</code>
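
MxL does not exist yet, so the following is only a rough sketch of how a first pass over such a file might look, under my own assumptions: `---` toggles the front matter, `%` starts a comment, `# ` starts a pragma that may span several lines until its parentheses balance, and `= ` starts a new page. The function name `split_mxl` is hypothetical, and parentheses inside quoted strings are not handled.

```python
from typing import Dict, List, Optional


def split_mxl(source: str) -> Dict[str, object]:
    """Split MxL-like text into front matter, global pragmas, and pages."""
    front: List[str] = []
    pragmas: List[str] = []
    pages: Dict[str, List[str]] = {}
    page: Optional[str] = None   # title of the page currently being filled
    in_front = False
    pending = ""                 # multi-line pragma still missing its ")"

    for raw in source.splitlines():
        line = raw.strip()
        if pending:
            pending += " " + line
        elif line == "---":
            in_front = not in_front
            continue
        elif in_front:
            front.append(raw.rstrip())   # keep indentation for a YAML parser
            continue
        elif not line or line.startswith("%"):
            continue                     # blank line or comment
        elif line.startswith("= "):
            page = line[2:].strip()
            pages[page] = []
            continue
        elif line.startswith("# "):
            pending = line[2:].strip()
        else:
            if page is not None:
                pages[page].append(line)
            continue

        # A pragma is in progress here; emit it once its parentheses balance.
        if pending.count("(") <= pending.count(")"):
            if page is not None:
                pages[page].append("# " + pending)
            else:
                pragmas.append(pending)
            pending = ""
    return {"front_matter": front, "pragmas": pragmas, "pages": pages}


SAMPLE = """\
---
app: "My Todo GUI App"
---
% pragmas to set up UI layout
# layout(
  separator = "Tabs",
  style = "Material design"
)
# $todo = database.query("select * from todo")

= Page1
A page to display the pending Todo items here.
# set( bind = $todo, max = 3 )
| Syntax | Description |
"""

doc = split_mxl(SAMPLE)
print(doc["pragmas"])
print(doc["pages"])
```

Pragmas that appear before the first page are treated as global, while pragmas inside a page stay in the page body so that a later pass can attach them to the element that follows.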

UI Rendering

The coarse-grained layout (i.e., pages, subtitles, tables, etc.) is described by users with Markdown and extension APIs, while the styling and fine-grained layout are chosen by the LLM through prompting. We should also allow users to insert HTML/CSS/JS code snippets to customize the UI.


= Page 1
== Subtitle
Some content here.

== Subtitle 2
<div className="note">
  Some notable things in a block quote!
</div>

Process Automation

The pragmas/directives also provide APIs to manipulate the local OS and navigate through web pages. Note that the actual automation is executed in a separate process (running in the background as a system service), while the frontend (i.e., the Markdown file) only defines the UI and sends commands to the backend.

The following code snippet shows how to automate the process of downloading receipts from emails and web pages, and then saving them to a local folder.

---
app: "process automation API examples"
settings:
  - server: "..."
---

% translate to wasm or request to system service
# $steps = function ($inp, ...) {
  % query emails or web pages
  web.query(
    "https://imap.gmail.com:993",
    username = "...", password = "...",
    prompt = "select all emails with subject = 'receipt'",
  )
  
  % desktop automation (requires RTE system service and auth)
  fs.open("history.xlsx").sync( data = $data)

  % mouse automation
  $loc = screen.locate("icon that indicates the receipt")
  mouse.to(location = $loc); mouse.click()

  % keyboard automation
  keyboard.press("ctrl + s")
}
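
The frontend/backend split described above could be prototyped roughly as follows. This is a sketch under my own assumptions: commands travel as JSON messages of the form `{"cmd": ..., "args": ...}`, and the `AutomationService` handlers only record what they would do instead of touching the real mouse or keyboard. None of these names is an existing API.

```python
import json
from typing import Any, Callable, Dict, List


class AutomationService:
    """Stand-in for the background service; handlers only record actions."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._handlers: Dict[str, Callable[[Dict[str, Any]], Any]] = {
            "mouse.click": lambda args: self._record("click"),
            "keyboard.press": lambda args: self._record("press " + args["keys"]),
        }

    def _record(self, entry: str) -> str:
        self.log.append(entry)
        return entry

    def handle(self, message: str) -> str:
        """Decode one JSON command, run its handler, and return a JSON reply."""
        request = json.loads(message)
        handler = self._handlers.get(request["cmd"])
        if handler is None:
            error = "unknown command: " + request["cmd"]
            return json.dumps({"ok": False, "error": error})
        result = handler(request.get("args", {}))
        return json.dumps({"ok": True, "result": result})


def send(service: AutomationService, cmd: str, **args: Any) -> Dict[str, Any]:
    """Frontend side: serialize a command, hand it over, decode the reply."""
    message = json.dumps({"cmd": cmd, "args": args})
    return json.loads(service.handle(message))


service = AutomationService()
print(send(service, "keyboard.press", keys="ctrl + s"))
print(send(service, "fs.delete", path="history.xlsx"))  # not whitelisted
print(service.log)
```

Keeping an explicit table of allowed commands on the service side also gives a natural place to enforce authorization, since anything not registered is simply rejected.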

The execution results are also rendered in the UI. All variables are assumed to have two-way binding, i.e., the UI is updated when the variable is updated, and vice versa.

---
app: "process automation API examples"
settings:
  - server: "..."
---

# $data = database.query("select * from receipts")

= Page 1
Welcome! This is an app to summarize my spending. 
Please select the tags you are interested in, and

# set( options = $data.tags, max = 1, bind = $selected )
- Select tags: ___

The following receipts are downloaded as a result of your selection.
# $data = database.query("select * from receipts where tags = $selected")
# set( bind = $data, max = 3 )
| Date | Amount | Tags |
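
The two-way binding assumed here can be sketched with a small observable variable. `Bound` and `TextField` are hypothetical names for illustration; a real implementation would hook into the UI framework's own change events.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class Bound(Generic[T]):
    """A variable that notifies its listeners whenever its value changes."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._listeners: List[Callable[[T], None]] = []

    def get(self) -> T:
        return self._value

    def set(self, value: T) -> None:
        if value == self._value:
            return  # nothing changed; this also prevents update loops
        self._value = value
        for listener in list(self._listeners):
            listener(value)

    def subscribe(self, listener: Callable[[T], None]) -> None:
        self._listeners.append(listener)


class TextField:
    """Stand-in for a UI widget bound to a variable."""

    def __init__(self, var: Bound[str]) -> None:
        self.var = var
        self.shown = var.get()
        var.subscribe(self._on_var_change)   # variable -> UI

    def _on_var_change(self, value: str) -> None:
        self.shown = value

    def user_types(self, text: str) -> None:  # UI -> variable
        self.shown = text
        self.var.set(text)


selected = Bound("food")
first = TextField(selected)
second = TextField(selected)

selected.set("travel")        # a backend update reaches both widgets
print(first.shown, second.shown)

first.user_types("books")     # a user edit reaches the variable and the other widget
print(selected.get(), second.shown)
```

The equality check in `set` is what keeps the two directions from triggering each other forever.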

Use Cases: Productivity AIO

I initially wanted to develop this app for my own use, but I later realized that it can be used for many other purposes. The following are some use cases I can think of:

Window management: arrange windows in a certain way, close all windows, etc.
Easier UI app making and distribution (same as Streamlit)

% E.g., a carousel with pictures
# set( mode = "carousel | waterfall", max = 3 )
== Must-visit places in Reykjavik
- ![The dawn view of church](...)
- ![Bird's view of waterfall](...)


% E.g., a map view with markers
# set( render = "map", max = 3 )
== Must-visit places in Reykjavik
- (54.32, 23.34): ![The dawn view of church](...)
- 243 Hoy Rd, Auckland, New Zealand, 13223: ![Bird's view of waterfall](...)

Important message delivery (redemption code, etc.)
Form filling/signature distribution to teammates

---
app: "Form Filling for Teammates"
- settings:

% to teammates
- broadcast: "..."
  - 
---

% this will be rendered as a form
# set( bind = $info, image = "..." )
== Personal Information 
- First Name: ___
- Last Name: ___

# set( options=['Male', 'Female', 'Other'], max=1 )
- Sex: ___

# set( bind = $list, max = 2 )
== 💄 CheckList
- [ ] Cosmetics
- [ ] Clothes
- [ ] Shoes

Language learning & bookkeeping and vocabulary refreshing
A smoother interaction with AI infrastructure (e.g., GPT API, DALL-E API, etc.)

== My amazing day 
Alice and I went to the park today. We saw a cute dog.

% E.g., insert an image for a given prompt in the article
# set( prompt = "...", model = "...")
![An image of a dog](...)

% E.g., a local or remote file search box
# set( bind = $files, max = 3, action = "upload")
- Search local disk: ___

# set( "substitute content with summary of the $files" )
> Here is the summary

% E.g., insert pronounce button for
# set( bind = $words, max = 3, action = "pronounce")
== Vocabulary
- [ ] Mozart
- [ ] Beethoven

Financial management (e.g., custom stock price monitoring, etc.)
Personalized feed (e.g., surveys, custom RSS feed, lightning deals, etc.). You can subscribe to anything with a read-to-earn tag, and get paid for reading it.

% settings.yaml
subscription:
  sources:
    - https://www.reddit.com/r/...

  filters:
    - topics: "..."
    - keywords: "..."

Real-time screen monitoring, e.g., better-genshin-impact
- Event-driven Mode (e.g., when window changes, do something)

% Auto 
---
app: "Visual Auto-Helper"
settings:
  
---
% E.g., automatically change the keyboard layout when the window changes
# watch( event = "WINDOW_CHANGE", action = "..." )
# $action = function ($inp, ...) {
  if $inp.window == "Genshin Impact":
    os.shell("change keyboard input to Chinese")
}

% E.g., automatically click when an object appears
# watch( event = $appear, action = "...", interval = 10s )
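
The event-driven mode boils down to a registry that maps event names to actions. Here is a minimal sketch, assuming events arrive as plain dictionaries; `watch` and `emit` are hypothetical helpers, and the action only records the layout switch instead of calling into the OS.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Action = Callable[[Event], None]

_watchers: DefaultDict[str, List[Action]] = defaultdict(list)


def watch(event: str, action: Action) -> None:
    """Register an action to run every time the named event fires."""
    _watchers[event].append(action)


def emit(event: str, payload: Event) -> int:
    """Fire an event and return how many actions were run."""
    actions = _watchers.get(event, [])
    for action in actions:
        action(payload)
    return len(actions)


layout_log: List[str] = []


def switch_layout(inp: Event) -> None:
    # Assumption: a real action would ask the OS to switch the input method.
    if inp["window"] == "Genshin Impact":
        layout_log.append("chinese")


watch("WINDOW_CHANGE", switch_layout)
emit("WINDOW_CHANGE", {"window": "Genshin Impact"})
emit("WINDOW_CHANGE", {"window": "Terminal"})
print(layout_log)
```

In the real system, the background service would be the one calling `emit`, for example from a window-focus hook or a periodic screen capture.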

Shell scripting. Multi sessions, or command books.
- Combined with Zellij, it can sync and restore sessions, and pop up a terminal when a command does not return the expected results.
- Zellij CLI to run background panes/tasks

Implementation

We should transpile MxL into pure JSX and execute it in the browser. The dynamic UI rendering is done by a browser extension: code block regions are rendered automatically when target code blocks are detected on the page (the same way Mermaid diagrams are handled).

For the desktop implementation, we can use Tauri + Rust. The difference is that there won’t be any automatic code block detection and UI rendering; the MD files are executed as standalone apps. For the actual UI rendering, we would still need to use a webview.