Great post! I am running a Discord community for practitioners tinkering with Multimodal AI research and applications. Would love to have you join: https://discord.com/invite/Sh6BRfakJa
Nice work! I too have been drinking from the firehose. I am working on a way to analyze baseball swings. Have you encountered any solutions to train a model on correct form in a movement? In this case, a baseball swing.
Very exciting -- I think the starting point is really a high-quality multimodal dataset that aligns “form”/pose with outcome data (launch angle, exit velocity, distance, etc.). Luckily, I believe MLB provides the latter for every game, and you could probably align it with video of the game. You could also use another LLM to “describe” the at-bat, giving you text as another aligned data source. Further, you could consider fine-tuning a base LLM on text data, say from a book on baseball form, to ensure you have the best starting point. I’d probably try collapsing the video and outcome data into the text domain before training embeddings directly on them (as they did in PaLM-E). That way, you could upload a video and chat with the model to discover improvements to your form. I’d love to hear how this goes!
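To make the “collapse everything to text” idea concrete, here’s a minimal sketch. The field names and data are hypothetical assumptions (not from the post): Statcast-style outcome numbers per swing, plus short pose notes already extracted from video by some upstream model or captioning LLM. Each swing gets flattened into one text sample you could fine-tune on or chat over.

```python
# Minimal sketch: collapse pose notes + outcome data for one swing into text.
# Assumptions (hypothetical): pose notes come from an upstream vision/LLM step,
# and outcome fields follow Statcast-style naming.

from dataclasses import dataclass
from typing import List


@dataclass
class SwingRecord:
    batter: str
    pose_notes: List[str]       # e.g. phrases produced by a pose model or captioning LLM
    launch_angle_deg: float     # outcome data (Statcast-style)
    exit_velocity_mph: float
    distance_ft: float


def swing_to_text(rec: SwingRecord) -> str:
    """Collapse one swing's pose and outcome data into a single text sample."""
    form = "; ".join(rec.pose_notes)
    return (
        f"Batter: {rec.batter}. Observed form: {form}. "
        f"Outcome: launch angle {rec.launch_angle_deg:.1f} deg, "
        f"exit velocity {rec.exit_velocity_mph:.1f} mph, "
        f"distance {rec.distance_ft:.0f} ft."
    )


if __name__ == "__main__":
    example = SwingRecord(
        batter="Player A",
        pose_notes=["slightly open stance", "early hip rotation", "level swing plane"],
        launch_angle_deg=24.0,
        exit_velocity_mph=101.3,
        distance_ft=405,
    )
    print(swing_to_text(example))
```

Each such line of text could then go into a fine-tuning set (or a retrieval index) so the model learns to connect descriptions of form with outcomes.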
Wow! Waiting to see more on how these impact financial services.
Oh, there’s a post on fine-tuning coming soon that’ll use a financial use case as the example!
Excellent post. Thanks