SCBX Unlocking AI EP 9: Advancing ThaiLLM Development and Applications

Session 1: ACL & LLM Roadmap in Thailand

SCBX Unlocking AI returns with EP 9 under the theme Advancing ThaiLLM Development and Applications, presenting stories about AI and LLMs, which have held the world's attention since the launch of ChatGPT.

The event covered many interesting topics. Dr. Thepchai Sapnithi, Vice President of AiAT and Director of Artificial Intelligence Research at the National Electronics and Computer Technology Center (NECTEC), gave a lecture on ACL & LLM Roadmap in Thailand.

Before turning to LLMs, Dr. Thepchai first introduced ACL and explained why it matters to the world of AI and LLMs.

ACL has two meanings:

  1. The Annual Meeting of the Association for Computational Linguistics, an academic conference for computational linguists. Its history goes back to 1962, when the organization was originally called the Association for Machine Translation and Computational Linguistics.
  2. The Association for Computational Linguistics, the body that oversees and organizes this conference. Without the conference and its organizers, the current AI boom might have turned out completely differently.

Dr. Thepchai added that ACL conferences were always held in the United States until 1996, when they began to be held in other countries such as Canada and Spain, before the first conference in Asia took place in Hong Kong in 2000.

In August 2024, Thailand will host the meeting as well. Dr. Thepchai, who has taken part in ACL since 2009, will attend.

Interest is high because, in the few years since ChatGPT arrived, multilingual LLMs have emerged from Google, Microsoft, Meta, Apple, and OpenAI, although these focus mainly on English.

Thailand also has its own Thai LLMs, such as OpenThaiGPT and Typhoon.

Dr. Thepchai also said that Thailand has set out a roadmap with five AI action plans, with the LLM strategy falling under the second. This shows that Thailand attaches great importance to the issue, and the strategy should prove useful for both business use and future research.

So far, he and the NECTEC team have received many requests from both government agencies and private organizations to help create domain-specific LLMs for each organization's particular tasks. How these will be developed and used remains to be seen.

Another important task is a collaboration with the National Research Agency (NRA), the Ministry of Higher Education, Science, Research and Innovation, and the Ministry of Digital Economy and Society to create three kinds of LLM:

  1. A pretrained LLM: a versatile Thai-language base model that everyone can share.
  2. Fine-tuned models to support tourism, medicine, and the environment.
  3. An open-source model that private organizations can use to develop and extend to the industries in which they work.

Another ongoing effort is the Chatbot Arena, which pits chatbots against one another to find out which bot gives more accurate information, is easier to use, and is more practical.
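The talk summary does not say how the arena turns head-to-head results into a ranking. As a minimal sketch, assuming each comparison produces a winner and a loser, an Elo-style rating is one common way to rank chatbots from pairwise outcomes. The bot names, the vote list, the starting rating of 1000, and the `k` factor are all illustrative assumptions, not details of NECTEC's system.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift rating points from the loser to the winner after one comparison."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


# Hypothetical bots and hypothetical (winner, loser) comparison results.
ratings = {"bot_a": 1000.0, "bot_b": 1000.0, "bot_c": 1000.0}
votes = [("bot_a", "bot_b"), ("bot_a", "bot_c"), ("bot_b", "bot_c")]

for winner, loser in votes:
    update(ratings, winner, loser)

# Highest-rated bot first.
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

An upset win against a higher-rated bot moves more points than an expected win, so the ranking reflects who was beaten and not only how many wins were collected.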

Another point Dr. Thepchai emphasized is that AI must be safe and reliable to use, which gave rise to the Trustworthy AI Framework development project. It considers questions such as whether an AI system violates the law, whether it is obscene, whether it is friendly to mental health, and whether it is good for Thai society and culture.

Finally, Dr. Thepchai laid out the rough roadmap NECTEC has planned: over the next five years, further versions of OpenThaiGPT will be developed, and in 2028, OpenThaiLLM and Multimodal OpenThaiLLM version 3.0.0 will be developed for 3D World Instruction work.

If everything goes according to plan, we are sure to see many more challenges, novelties, and exciting developments in Thailand.

Session 2: Advancing LLM R&D in Southeast Asia: Bridging Innovation and Collaboration

The world has long had language barriers, and communicating with foreigners used to require knowledge and long experience. The arrival of generative AI and the rapid development of LLMs have now broken those barriers down.

Today, smart, cutting-edge LLMs are being developed not only in the United States and Europe but also in Asia, especially Southeast Asia.

Prof. Dr. Sarana Nuchonong, Director of VISTEC-depa and one of the pioneers of LLMs in Thailand, gave a lecture on Advancing LLM R&D in Southeast Asia: Bridging Innovation and Collaboration at SCBX Unlocking AI EP 9: Advancing ThaiLLM Development and Applications to explain how far LLM education, research, and development in Southeast Asia have progressed.

One of the most prominent projects is in Singapore. Called SEALD (Southeast Asian Languages in One Network Data), or simply referred to as AI Singapore, it invites researchers from neighboring countries to collaborate on LLMs that excel in Asia's local languages.

This matters because every Southeast Asian country has its own main language, and beyond that there are thousands of dialects and secondary languages. The project aims to develop LLMs with data on the region's different languages, which is considered a good use of that diversity.

However, to research and develop LLMs effectively, researchers also need good measurements, and the problem with LLM development today is that it is still difficult to measure. This is true not only in Asia; even English LLMs still have this problem.

Prof. Dr. Sarana explained that there are four major obstacles to measuring results:

  1. Sparse Evaluation Data - Fewer than 10 languages have been developed for and used in AI training.
  2. Resource Gap - Of the 1,308 languages in Southeast Asia, 700 have only 1-2 datasets that are accurate enough for measurement.
  3. Quality of Resources - The quality of information on the Internet has not been confirmed as accurate enough for further use.
  4. Cultural Relevance - Information or terminology for many secondary languages that is translated from English may not match the actual culture or context of the area.

The Singapore government understands this problem well and has therefore developed the SEACrowd project to formally collect datasets from Southeast Asian countries. VISTEC researchers are also involved in the project.

Prof. Dr. Sarana hopes that SEACrowd will become a good benchmark that helps ensure the data obtained truly represents the Southeast Asian region, rather than being guessed at or translated from Western sources.

After all, LLM development depends on measuring with the right, practical benchmark. Such a benchmark helps guide researchers toward their goals and tells them which problems need to be solved, which do not, and what will help make LLMs better in the future.

Measurement also calls for caution. Prof. Dr. Sarana cited a paper titled Don't Make Your LLM an Evaluation Benchmark Cheater, which concluded that one should be wary of deliberately tuning measurements to produce good-looking numbers. Such results look impressive but are useless in real work, much like a student who studies only to score well on an exam but cannot put the knowledge to any use.

As for the future, Prof. Dr. Sarana affirmed that Thailand will continue to cooperate with the SEACrowd project to bring the Thai language into the central database so that the world can get to know it better.
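One way a benchmark score can end up inflated is when test questions leak into the training data. The sketch below is only an illustration of that idea, not a method described in the talk: it measures what share of a benchmark item's word 5-grams also appear in a training text. The function names, the example strings, and the choice of `n=5` are assumptions, and splitting on word characters would need a proper word segmenter for Thai, which is written without spaces between words.

```python
import re


def ngrams(text: str, n: int = 5) -> set:
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(benchmark_item: str, training_text: str, n: int = 5) -> float:
    """Share of the item's n-grams that also occur in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


# Hypothetical benchmark question and two hypothetical training snippets.
question = "What is the capital city of Thailand"
leaky_training = "Quiz dump: what is the capital city of Thailand? Answer: Bangkok."
clean_training = "Bangkok is a large city with many rivers and canals."

leaky_ratio = overlap_ratio(question, leaky_training)
clean_ratio = overlap_ratio(question, clean_training)
print(leaky_ratio, clean_ratio)
```

A high ratio suggests the model may have seen the question during training, so a good score on it says little about real-world ability.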

Session 3: Understanding Textual Embeddings: Applications in Retrieval and Recommendation

Ever wondered how job posting sites like JobTopGun help applicants land the right job? The answer came up at SCBX Unlocking AI EP 9: Advancing ThaiLLM Development and Applications!

Dr. Ekpol Songsuwanich from the Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, gave a lecture on Understanding Textual Embeddings: Applications in Retrieval and Recommendation and shared his experience in training and developing LLMs.

Dr. Ekpol said that humans have the ability to interpret: we know which sentences mean the same thing or nearly the same thing. For example, "A little girl seems to be very sad" has a similar meaning to "The little child is far from being happy."

When developing LLMs, however, it is a challenge to represent text so that the computer understands that sentences written differently can have the same or a similar meaning.

Dr. Ekpol gave the example of the time he trained the system behind JobTopGun, a job search website that lets people upload their resumes. His job was to find a way to match resumes with the right job positions.

To do so, he trained an AI model to read each applicant's resume and the job description of each position from each organization until it learned which job characteristics suited each person.

There are two types of technique for teaching the AI: Sparse Embeddings and Dense Embeddings.

  1. Sparse Embeddings: the machine reads a set of books and counts how often each keyword appears in them, for example how many times the word "battle" or the word "soldier" occurs in William Shakespeare's works, and then analyzes the results. The advantage of Sparse Embeddings is that they are easy to use. The disadvantages are that in some cases the word being counted may not appear at all, and the computer often overlooks words that have the same meaning but are not the same word. It may therefore be necessary to switch from raw counts to percentages or relative frequencies, that is, how often the word is encountered.
  2. Dense Embeddings: text is fed into a deep learning model, which converts it into numbers. Items whose numbers are the same or similar are treated as having the same or similar meaning. Dr. Ekpol shared that this is how he taught JobTopGun's machine learning system to read a candidate's resume and indicate whether it suits an open position.
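The word-counting idea behind Sparse Embeddings can be sketched in a few lines. This is a minimal illustration, not JobTopGun's actual system: each text becomes a mapping from word to relative frequency, and two texts are compared with cosine similarity. The resume and job texts are invented for the example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def sparse_embedding(text: str) -> dict:
    """Map each word to its relative frequency (count / total words)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented example texts.
resume = "Python developer with machine learning experience"
job_a = "Hiring a Python developer for machine learning projects"
job_b = "Seeking a software engineer skilled in data science"

score_a = cosine(sparse_embedding(resume), sparse_embedding(job_a))
score_b = cosine(sparse_embedding(resume), sparse_embedding(job_b))
print(score_a, score_b)
```

The second job scores exactly zero because it shares no words with the resume, even though a human would see the two as related. This is the weakness of pure word counting described in the first item above.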

Both Sparse and Dense Embeddings have their own advantages and disadvantages. Before using them, you need to decide which cases call for Sparse, which call for Dense, and whether the two should be used together to bring out the best of both methods.

Once you understand the concept and how it works, Dr. Ekpol is confident that you will be able to build many interesting things and keep improving your own work and that of your organization.
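One simple way the two methods could be used together is a weighted sum of their similarity scores. The sketch below assumes hand-written three-dimensional toy vectors in place of a real embedding model, and an assumed sparse score of 0.0 for a job that shares no words with the resume. The names `hybrid_score` and `alpha` are hypothetical and do not come from the talk.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_score(sparse_score: float, dense_score: float, alpha: float = 0.5) -> float:
    """Blend the two scores; alpha is the weight given to the sparse score."""
    return alpha * sparse_score + (1.0 - alpha) * dense_score


# Toy vectors written by hand; a real system would get these from a model.
resume_vec = [0.9, 0.1, 0.2]
related_job_vec = [0.8, 0.2, 0.3]
unrelated_job_vec = [0.1, 0.9, 0.1]

dense_related = cosine(resume_vec, related_job_vec)
dense_unrelated = cosine(resume_vec, unrelated_job_vec)

# Assume the related job shares no keywords with the resume (sparse score 0.0).
combined = hybrid_score(0.0, dense_related)
print(dense_related, dense_unrelated, combined)
```

With `alpha` at 0.5, a job that keyword matching misses entirely can still rank well if its dense vector is close to the resume's, while exact keyword hits continue to count.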

Session 4: Investment Insights Unleashed: AI-Powered Digital RM and Customer Service

Successful investors must keep up with the news at all times, but information now arrives so fast that keeping up is difficult. Missing one small but important piece of information can turn a profit into a huge loss in the blink of an eye.

But what if investors had easier access to the information they need to invest? Mr. Veerint Itroj, AI Transformation Lead from InnovestX, painted that picture in a lecture titled "Investment Insights Unleashed: AI-Powered Digital RM and Customer Service" at SCBX Unlocking AI EP 9: Advancing ThaiLLM Development and Applications.

Mr. Veerint said that InnovestX currently has a customer base of about 1 million people, but most of them, about 98%, are general customers who have relatively little time to invest because they work full-time and have no one to advise them on which stocks to buy or sell. They have to carve time out of their working day to follow the news from various sources.

Investing today also goes beyond Thai stocks to foreign stocks, mutual funds, and bonds, as well as risky assets such as cryptocurrencies, and investors must work out how much each of these affects their own portfolio.

Another challenge is that customers often contact InnovestX with investment inquiries through multiple channels, forcing customer service staff to work especially hard to handle them.

At the same time, it is difficult to train human investment professionals fast enough to keep up with the growing number of customers.

Based on these accumulated cases, InnovestX developed the AI-Powered Digital RM on an LLM called Typhoon to help its employees in their work and to help the platform's customers access the investment information that best meets their needs.

In customer service, InnovestX uses an AI chatbot to handle account opening and answer basic investment questions. This eases information overload, cuts working time, and leaves human experts to handle only major cases. As a by-product, it also reduces human error and keeps employees from becoming too stressed or tired.

The data used by InnovestX's LLM comes from analyses prepared by the InnovestX analysis team, and every report cites the source of the data used so that investors can read further if they have time. This increases credibility and reassures investors that the analysts do not make up their calculations without principles or evidence.

Not only that, using the Typhoon model has helped InnovestX save more than 10% of its budget compared with other models such as GPT-4o, and employees can respond to customers faster, saving time and giving them more confidence that they can quickly deliver quality investment advice to clients.

At the event, SCBX Unlocking AI also demonstrated the AI-Powered Digital RM in real time, showing how InnovestX's AI works and how it responds to information that customers actually enter, with the aim of giving customers a good experience so that they can continue to invest happily and successfully.

Anyone who wants to find out whether the investment information from InnovestX is really reliable can try chatting with InnovestX.

Session 5: Advancing Thai LLMs and Their Applications

Besides being packed with talks on interesting aspects of LLMs, SCBX Unlocking AI EP 9: Advancing ThaiLLM Development and Applications closed with a panel discussion that shares its name with the event theme: Advancing Thai LLMs and Their Applications.

The panel brought together professionals with experience in studying and developing LLMs at the national and global levels: Dr. Thepchai Sapnithi, Vice President of AiAT and Director of Artificial Intelligence Research at NECTEC; Prof. Dr. Sarana Nuchonong, Director of VISTEC-depa; Dr. Ekpol Songsuwanich from the Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University; and Mr. Veerint Itroj, AI Transformation Lead from InnovestX.

Dr. Thitipat Achakulvisut, Lecturer at the Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, served as the moderator.

What were the interesting points of this discussion? Insiderly AI has summarized them as follows:

  1. Mr. Veerint said that LLMs can be used in many areas of finance. Besides processing information to support investment, they can help resolve consumers' bad debts, reduce costs, reduce risks, and improve the customer experience.
  2. Dr. Ekpol said that LLMs can be used in many academic settings to take better care of students. As an example, the number of students in the faculty has increased many times over, but the number of lecturers has not grown at the same rate, so he experimented with using an LLM to help with tasks such as grading exams.
  3. However, when he tried grading exams this way, he found that the LLM's answers were not very accurate compared with grading by humans. Used as is, it could lead to students receiving the wrong grades. It therefore needs to be watched to see how a sufficiently precise system can be developed in the future to help with the work and ease the burden on the faculty.
  4. Dr. Thepchai added that NECTEC also uses LLMs, but for simple tasks such as a chatbot that answers questions about corporate regulations, for example holidays, leave, and employee benefits. Complex scenarios may be added as variables to test how well it answers questions that differ from what it has learned.
  5. Prof. Dr. Sarana added that he is currently assisting the Faculty of Law, Thammasat University, in training an LLM to act as a legal assistant. The reason it is not being trained to act as a lawyer is that he is not yet confident that its answers are as accurate as, or more accurate than, a human's judgment of a case. As a result, the role of LLMs as legal assistants is currently limited to helping find information. A demonstration of its use is expected to be released by September for everyone to see.
  6. Another point Prof. Dr. Sarana emphasized is that people still have the misconception that an LLM can be used as a database for looking up information. It cannot, because the information it returns can still be inaccurate and factually wrong. Anyone who wants to use it that way must be truly proficient, must be able to tell whether the information it provides is right or wrong, and must always verify that information.
  7. Dr. Thepchai said that having a good LLM that provides accurate information is important, and that it also needs to learn from good underlying data. However, he admitted that only a few people can currently train LLMs and fine-tune them to answer as intended, so if we want to see the AI field develop further, personnel development must be accelerated to help improve the industry.
  8. In InnovestX's case, it is difficult to find people to strengthen the AI engineering team. In addition, compared with other countries whose markets are in an uptrend, people are less interested in information about Thai stocks, so there is less money to support the industry's growth and it cannot develop effectively.
  9. The shortage of people also means that the community of AI developers is not as strong as it should be. Dr. Ekpol said that he had met people from many organizations who tried to study and develop LLMs on their own, but learning alone was costly and came with a very expensive trial-and-error price. That is why he sees the importance of a forum like SCBX Unlocking AI, which can bring together people who are interested in the same things and lead to better outcomes. There should be more spaces like this; if we really want the AI field to evolve, this needs to change.