GPT-4, Bard, and more are here, but we’re running low on GPUs and hallucinations remain

Jeremy Kahn

March 21, 2023, 3:15 p.m. · 7-minute read

Wow, what a week: perhaps the most eventful week in A.I. (at least in terms of the sheer volume of announcements) that I can remember in my seven years of writing about this topic.

- Google and Microsoft each began pushing generative A.I. capabilities into their rival office productivity software.

- Chinese search giant Baidu launched its Ernie Bot, a large language model-based chatbot that can converse in both Chinese and English, only to see its stock get hammered because the company used a pre-recorded demo in the launch presentation. (Baidu, in a defensive statement emailed to me yesterday, implied it was the victim of an unfair double standard: Microsoft and Google also used pre-recorded demos when they unveiled their search chatbots. And while Google’s stock did take a hit for an error its Bard chatbot made, no one seemed upset that the demos weren’t live.)

- Midjourney released the fifth generation of its text-to-image generation software, which can produce very professional-looking, photorealistic images. And Runway, one of the companies that helped create Midjourney’s open-source competitor Stable Diffusion, released Gen-2, which creates very short videos from scratch based on a text prompt.

- And just as I was preparing this newsletter, Google announced it is publicly releasing Bard, its A.I.-powered chatbot with internet search capabilities. Google unveiled Bard, its answer to Microsoft’s Bing chat, a few weeks ago, but it was available only to employees. Now a limited number of public users in the U.S. and the U.K. will be able to try the chatbot.

But let's focus on what was by far the most widely anticipated news of the past week: OpenAI’s unveiling of GPT-4, a successor to GPT-3.5, the large language model that underpins ChatGPT. GPT-4 is also multimodal, meaning you can upload an image to it and it will describe the image. In a clever demonstration, OpenAI cofounder and president Greg Brockman drew a very rough sketch of a website homepage on a piece of paper, uploaded it to GPT-4, and asked it to write the code needed to generate the website. It did.

A couple of key points to note, though: There’s a great deal about GPT-4 that we don’t know, because OpenAI has revealed almost nothing about how large a model it is, what data it was trained on, how many specialized computer chips (known as graphics processing units, or GPUs) it took to train, or what its carbon footprint might be. OpenAI has said it is keeping all these details secret for both competitive reasons and what it says are safety concerns. (In an interview, OpenAI’s chief scientist Ilya Sutskever told me it was primarily competitive concerns that had made the company decide to say so little about how it built GPT-4.)

Because we know almost nothing about how it was trained and built, there have been a number of questions about how to interpret some of the headline-grabbing performance figures for GPT-4 that OpenAI did publish. The stellar performance that GPT-4 turned in on computer programming questions from Codeforces’ coding contests, in particular, has been called into question. Since GPT-4 was trained on so much data, some believe there’s a decent chance it was trained on some of the exact same coding questions it was tested on. If that's the case, GPT-4 may simply have shown that it's good at memorizing answers rather than at actually answering never-before-seen questions. The same data “contamination” issue might apply to GPT-4’s performance on other tests too. (And, as many have pointed out, just because GPT-4 can pass the bar exam with flying colors doesn’t mean it is about to be able to practice law as well as a human.)
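To make the contamination worry concrete: one generic way to probe for it is to measure how many of a test question's word n-grams also appear verbatim somewhere in a training corpus. The sketch below illustrates that idea under stated assumptions; it is not a description of anything OpenAI or its critics actually ran, and the function names and the default of 8-word n-grams are arbitrary choices for this example.

```python
import re
from typing import Iterable


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_score(test_item: str, corpus: Iterable[str], n: int = 8) -> float:
    """Fraction of the test item's n-grams that also occur in some corpus document.

    1.0 means every n-gram of the question appears verbatim in the corpus;
    0.0 means none do (or the item is shorter than n words).
    """
    item_grams = word_ngrams(test_item, n)
    if not item_grams:
        return 0.0
    seen: set[tuple[str, ...]] = set()
    for doc in corpus:
        # Keep only the question's n-grams that this document also contains.
        seen |= item_grams & word_ngrams(doc, n)
    return len(seen) / len(item_grams)
```

A high score flags a question the model may simply have memorized; a low score does not prove the model never saw it, because paraphrased copies slip past exact n-gram matching.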

Another thing about GPT-4: Although we don’t know how many GPUs it takes to run, the answer is probably a heck of a lot. One indication of this is the way that OpenAI is having to throttle usage of GPT-4 through ChatGPT Plus. “GPT-4 currently has a cap of 25 messages every 3 hours. Expect significantly lower caps, as we adjust for demand,” reads the disclaimer that greets those who want to chat with GPT-4.

Lack of GPU capacity may become a serious constraint on how quickly businesses adopt generative A.I. The Information reported that teams within Microsoft that wanted to use GPUs for various research efforts were being told they would need special approval, since the bulk of the company’s vast GPU capacity across its datacenters was now going to support new generative A.I. features in Bing and its first Office customers, as well as all of the Azure customers using OpenAI’s models. Charles Lamanna, Microsoft’s corporate vice president for business applications and low code platforms, told me that “there’s not infinite GPUs and if everybody uses it for every event, every Teams meeting, there's probably not enough, right?” He told me Microsoft was prioritizing GPUs for areas that had the highest impact and “highest confidence of a return for our customers.” Look for discussions about limited GPU capacity holding back the implementation of generative A.I. in business to become more prevalent in the weeks and months ahead.

Most importantly, GPT-4, like all large language models, still has a hallucination problem. OpenAI says that GPT-4 is 40% less likely to make things up than its predecessor, ChatGPT, but the problem still exists, and it might even be more dangerous in some ways: because GPT-4 hallucinates less often, humans may be more likely to be caught off guard when it does. So the other term you are going to start hearing a lot more about is “grounding”: making sure that the output of a large language model is rooted in specific, verified data that you’ve fed it, rather than in something it has invented or drawn from its pretraining data.

Microsoft made a big deal about how its “Copilot” system, which underpins its deployment of GPT models into its Office and Power applications, goes through a number of steps to make sure the output of the large language model is grounded in the data the user gives it. These steps are applied both to the input given to the LLM and to the output it generates.

Arijit Sengupta, the cofounder and CEO of machine learning platform Aible, reached out to me to point out that even with a 40% improvement in accuracy, GPT-4 is still inaccurate 20% to 25% of the time, according to the “technical report” OpenAI released. “That means you can never use it in the enterprise,” Sengupta says. At least, not on its own. Aible, he says, has developed methods for ensuring that large language models can be used in situations where the output absolutely has to be grounded in accurate data. The system, which Aible calls the Business Nervous System, sounds like it works much like what Microsoft has tried to do with its Copilot system.

Aible’s system starts by using meta-prompts to instruct the large language model to reference only a particular dataset in producing its answer. Sengupta compares this to giving a cook a recipe for how to bake a cake. Next, it uses more standard semantic parsing and information retrieval algorithms to check that all the factual claims the large language model makes are actually found within the dataset it was supposed to reference. In cases where it cannot find the model’s output in the dataset, it prompts the model to try again, and if the model still fails (which Sengupta says happens in about 5% of cases in Aible’s experience so far), it flags that output as a failure case so that the customer knows not to rely on it. He says this is much better than a situation where you know the model is wrong 25% of the time but don’t know which 25%. Expect to hear a lot more about “grounding” in the weeks and months ahead too.
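The meta-prompt, check, retry, and flag loop Sengupta describes can be sketched in a few lines of Python. This is a toy illustration, not Aible’s or Microsoft’s implementation: the prompt wording, the sentence-splitting “claim extraction,” the word-overlap support check, and the `llm` callable are all hypothetical stand-ins for the semantic parsing, retrieval, and model components the article mentions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical meta-prompt: tells the model to use only the supplied facts.
META_PROMPT = (
    "Answer using ONLY the facts listed below. "
    "If they do not contain the answer, say you cannot answer.\n"
    "Facts:\n{facts}\n\nQuestion: {question}"
)
RETRY_NOTE = "\n\nYour previous answer included claims not found in the facts. Try again."


@dataclass
class GroundedAnswer:
    text: str
    grounded: bool  # False = flagged as a failure case; do not rely on it
    attempts: int


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def split_claims(answer: str) -> list[str]:
    # Naive stand-in for semantic parsing: treat each sentence as one claim.
    return [s.strip() for s in re.split(r"[.!?]\s*", answer) if s.strip()]


def is_supported(claim: str, facts: Sequence[str]) -> bool:
    # Naive stand-in for retrieval: a claim counts as supported only if every
    # word in it appears in a single fact from the reference dataset.
    claim_words = words(claim)
    return any(claim_words <= words(fact) for fact in facts)


def grounded_answer(
    question: str,
    facts: Sequence[str],
    llm: Callable[[str], str],  # hypothetical: any function mapping prompt -> text
    max_attempts: int = 2,
) -> GroundedAnswer:
    prompt = META_PROMPT.format(
        facts="\n".join(f"- {f}" for f in facts), question=question
    )
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = llm(prompt)
        claims = split_claims(answer)
        if claims and all(is_supported(c, facts) for c in claims):
            return GroundedAnswer(answer, grounded=True, attempts=attempt)
        prompt += RETRY_NOTE  # ask the model to try again
    # Still unsupported after retrying: flag it rather than pass it through.
    return GroundedAnswer(answer, grounded=False, attempts=max_attempts)


if __name__ == "__main__":
    facts = ["Revenue grew 12 percent in Q3", "The Berlin office opened in 2021"]
    # A fake model that hallucinates once, then sticks to the facts.
    replies = iter(["Revenue grew 40 percent in Q3.", "Revenue grew 12 percent in Q3."])
    print(grounded_answer("How much did revenue grow?", facts, lambda p: next(replies)))
    # GroundedAnswer(text='Revenue grew 12 percent in Q3.', grounded=True, attempts=2)
```

The design choice that matters is the last line of `grounded_answer`: instead of silently returning a possibly wrong answer, it returns an explicit flag, which is what turns an unknown error rate into a known set of failure cases a customer can route around.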

And with that, here’s the rest of this week’s news in A.I.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com

This story was originally featured on Fortune.com
