Now Reading
Attending NeurIPS 2023

Attending NeurIPS 2023

2024-01-12 15:23:45


Taken from @Vikashplus.
Table of Contents

NeurIPS is the top conference for artificial intelligence (AI) research [2]. It offers an excellent opportunity to present your work to fellow researchers, gain a sense of the latest developments in AI, and connect with the people driving the field. Despite having my paper accepted, I couldn’t attend the last two sessions due to COVID and visa issues. However, I managed to participate in this year’s conference in New Orleans, United States. It was an incredible week filled with all things AI. In the following, I will document the talks I attended, the papers I liked, the people I met, and the things I learned.

Invited Talks

The main conference kicked off with an overview of the conference. This year’s conference was huge, with ~15k attendees and ~3500 posters.

NextGenAI: The Delusion of Scaling and the Future of Generative AI

Then, there was an invited talk by Björn Ommer on generative AI and scaling. Ommer is legendary for his lab’s work on Secure Diffusion [4]. He began his speak with the larger image and argued that the aim of human imaginative and prescient is to know and comprehend issues round us without having to the touch them. It’s because our imaginative and prescient consists of a mind inside a field with solely a slim opening that gives a sketchy understanding of the outer world. The imaginative and prescient has to resolve this drawback of why issues look the best way they appear. So, visible understanding is the hallucination of the world. Connecting this to visible understanding within the period of generative AI, he showcased how, for essentially the most half, we did preception answering “what, the place, and when” questions.
In distinction, the opposite facet consisted of prediction or generative. The generative facet tries to foretell lacking elements of the world. Imaginative and prescient analysis in notion has been pushed by benchmarks for the previous few years. Nonetheless, we don’t have good benchmarks within the Generative AI (GenAI) period. The shortage of formal benchmarks means everybody runs of their path.

Subsequent, he mentioned how generative fashions face a classical dilemma between knowledge protection (VAEs) and pattern high quality (GANs). Robust chance fashions like auto-regressive fashions (ARM) and Diffusion fashions (DMs) resolve this challenge. Nonetheless, these fashions are costly as they attempt to cowl each bit of knowledge distribution, and most assets are devoted to small, imperceptible particulars slightly than perceptually related info.

There are two doable options to DM’s starvation for capturing smaller particulars: naively rising the mannequin measurement or discovering methods to seize significant particulars solely. Scaling is one answer talked about in Richard Sutton’s weblog, Bitter Classes [3]. However there’s a bottleneck in scaling. The rise in mannequin sizes is flattening as speedup in GPUs shouldn’t be catching up with demand in mannequin measurement improve. Björn argued that scaling shouldn’t be the answer and easily hoping for scaling to work is hopeless. Then, what ought to we take from bitter classes [3]? He argued that classes are that architectures that higher exploit scalable commodities win. He gave examples of rule-based vs. gradient-based studying, kernel strategies vs. CNNs, and supervised vs. self-supervised to assist his argument. One other vital level is that we’re blind to modifications of change and suppose that progress is because of scaling solely. As an alternative, after the saturation level in progress from scaling, it’s pushed by paradigm shift, similar to CPUs to GPUs.

Progress is pushed by scaling till the saturation level after which by paradigm shifts.

An vital query, then, he requested, is, can we get intelligence by merely scaling? He argued possibly it’s doable, however intelligence primarily comes with studying with finite assets. The significance of AI within the trendy world means we want fashions everybody can run. That is the motivation of secure diffusion [4]. Secure aimed toward capturing native particulars and long-range interactions. Completely different architectural modifications, similar to consideration, additionally handle these points. However, there isn’t any one-size-fit method. Present ARMs are good at long-range interactions however not at capturing native ones. CNNs are good at studying native particulars. Diffusion fashions mix these two traits.

In the remainder of the speak, he described diffusion fashions and the vital questions surrounding them. Diffusion fashions first be taught perceptual compression after which be taught to era. Extra particulars are on this latest survey [5]. However the place can latent diffusion fashions lead us? This led them so as to add a flow-matching method with DMs to enhance sampling frequency, making inference quick [6]. Subsequent, they requested what fashions they need to even be taught. Neural nets are made to be taught quite a lot of particulars that will not be vital for the downstream activity. One method: add a database of patches. The mannequin first retrieves these patches and focuses on studying long-range particulars conditioned on retrieved patches. With easy nearest-neighbor retrieval, small fashions can carry out higher. Subsequent, understanding the world requires poking the world, so that they proposed a generative-based method referred to as iPoke[7]. He additionally mentioned the makes use of of LoRA for scene modifying scene, utilizing mixed LoRA to do sure issues like altering type, and so forth. Naive mixtures don’t work, however ZipLoRA[8] however takes assets. So, his lab launched a extra environment friendly technique.

The second speak I attended was The many faces of Responsible AI by Lora Aroyo. This speak was concerning the significance of knowledge for accountable AI. Lora highlighted how the world shouldn’t be binary however slightly a spectrum. As an example, knowledge shouldn’t be divided into good or dangerous, however slightly a spectrum between these two. Moreover, she additionally highlighted the indicators within the knowledge labels disagreement and the way it may be helpful. For the alignment, we want various and non-polar knowledge. Total, this speak mentioned the position of knowledge for accountable AI.

Socializing

Attended a dwell jazz efficiency at Preservation Corridor.

Probably the most pleasurable a part of attending an AI convention is connecting with previous acquaintances and forging new friendships with people who find themselves shaping the frontiers of the sphere. NeurIPS is exclusive on this facet for its various attendees from all aspects of machine studying. I engaged with quite a few fascinating folks, indulged in a number of hour-long discussions, and gained precious insights into the present happenings throughout the area. Here’s a listing of individuals I met.

  • Prof. Yan Lecun is an icon in deep studying. I discovered him throughout a poster session. His solutions had been surprisingly frank and gave a broader image of the sphere. He answered our questions on AGI, deep studying, and contrastive studying. One factor that caught with me was his insistence that the distinction between chimp and the human genome is barely 8 Mbs. I discovered him form and prepared to reply our naive questions patiently.

  • Prof. Ben Rubinstein is a professor on the College of Melbourne. I noticed his submit on Twitter and DMed him for a “chat and low” session, and he was form sufficient to simply accept the invitation. What adopted was an hour-and-a-half-long intense and charming dialogue on the previous, current, and way forward for adversarial ML. He labored on adversarial ML manner earlier than the famed intriguing properties paper [1]. He advised attention-grabbing tidbits concerning the back-and-forth of adversarial assaults and defenses earlier than the deep studying ones. He defined how earlier than gradient-based approaches, options and heuristic-based approaches had been utilized to craft adversarial assaults and defenses on earlier ML techniques like spam filters. He additionally shared his journey from trade to academia and his lab’s latest work on licensed robustness. Thanks for the time and such a tremendous dialogue, Ben. 🙂

  • Alex Sablayrolles is with Mistral. We had an attention-grabbing dialogue about latest advances round LLMs in Mistral, his resolution to maneuver to Mistral, the state of AI in France, and the way LLMs are altering the sphere.

  • Prof. Kira is a professor at GaTech. I had a protracted dialogue with a big selection of matters: his views on how the sphere of pc imaginative and prescient (CV) is transferring, the affect of enormous fashions on CV, his type of analysis and his lab’s latest works, the basic debate of trade vs. academia and what one wants to maneuver to tutorial jobs. He additionally shared his private story on how he moved from trade to academia.

  • Prof. Cihang Xie is an assistant professor on the UCSC. I received to know him from one among his papers in ICLR 2019 and began his work. I like his type of analysis. I mentioned his paper with him and the way his lab improved CLIP with the inverse regulation, which goals to shorten textual content in image-text pairs with out lowering efficiency. We additionally had a common dialogue on how pc imaginative and prescient is transferring within the period of enormous fashions and the way conventional CV duties are altering. He additionally gave me recommendation on tips on how to conduct analysis extra successfully in academia.

  • Prof. Nicolas Carlini is famend for his experience in adversarial ML, mannequin safety, and breaking new adversarial defenses. Our dialog delved into the constraints of defenses in opposition to adversarial examples, foreseeing potential modifications within the area amid the period of LLMs, the place assaults may develop into extra pragmatic and threatening.

  • David Abeel is at DeepMind. I’ve identified him since his Ph.D. days when he used to publish lovely notes about attending conferences [23]. I requested him questions on his notes and recommendation for early-career analysis. He talked about an important paper on tips on how to choose scientific issues [24].

  • Prof. Devi Parikh is a professor at GaTech. I’ve used her tips on tips on how to write rebuttals [25]. I requested her questions on her analysis and her processes for remorse minimization of selections.

  • Prof. Adam Dziedzic and Prof. Franziska Boenisch are college members at CISPA Helmholtz Heart for Info Safety and lead the SprintML group. They work in backdoor assaults and mannequin stealing.

  • Prof. Jinwoo Choi is an assistant professor at KHU. He’s at all times form sufficient to spare time for lengthy discussions. I mentioned AI in Kora and the stunning explosion of the pc imaginative and prescient area in Korea. We additionally mentioned deep studying for video and challenges on this area that come up from the requirement of assets. I take pleasure in his distinctive no-bs method.

  • Prof. Bae is an assistant professor and my Ph.D. advisor. We reconnected and mentioned quite a few issues. We additionally attended a dwell jazz efficiency on the well-known preservation corridor on Bourbon Avenue.

I additionally met many different folks, together with Hadi Salman, Elisa Zecheng Zhang, Elisa Nguyen, Yuanhan Zhang, Zechen Zhang, and Ojswa.

Events

One other thrilling a part of AI conferences is attending after-parties. I attended a couple of events this 12 months.

Pakistanis@NeurIPS

Poster Classes and Papers

  1. LIMA: Much less is extra for Alignment
  2. Conditional Adapters: Parameter Environment friendly Switch Studying with Quicker Inference
  3. Speculative Decoding with Massive Little Decoder
  4. Extracting Rewards Operate from Diffusion Fashions
  5. Revisiting Analysis Metrics for Semantic Segmentation: Optimization and Analysis of Positive-grained Intersection Over Union
  6. Jaccard Metric Losses: Optimizing the Jaccard Index with Mushy Labels
  7. Smile: Studying Descriptive Picture Captioning by way of. Semi-permeable Max Probability
  8. LLaVA – Visible Instruction Tuning
  9. An Inverse Scaling Regulation for CLIP Coaching
  10. Secure Bias: Evaluating Societal Representations in Diffusion Fashions – Highlight Poster
  11. Habits Alignment by way of Reward Operate Optimization – Highlight Poster
  12. Aligning Artificial Medical Photographs with Scientific Information utilizing Human Suggestions – Highlight Poster
  13. In-Context Impersonation Reveals Massive Language Fashions’ Strengths and Biases – Highlight Poster
  14. On the Connection between Pre-training Information Variety and Positive-tuning Robustness – Highlight Poster

Workshops

Workshops are the perfect place for early profession researchers seeking to hone their craft. These occasions are much less crowded, permitting extra significant one-to-one interplay with senior researchers. They’re centered round particular subfields and supply a greater understanding of latest progress in a specific area. Equally, these venues offer you an concept of what it is advisable be an excellent researcher in a selected area. I’ve attended three workshops on safety, robustness, and reliable AI.

New in ML

The New in ML workshop was aimed to information new researchers in machine studying. Throughout the occasion, I participated in numerous classes, together with an insightful speak by Been Kim, the place she chronicled her tutorial and analysis journey, a panel dialogue on gradual science, and a chat on negotiating your wage within the AI market. I additionally had participating discussions with many senior researchers, together with Been Kim, David Abel, and Davi Parikh.

Speak – Winging it: the key sauce within the face of chaos

This keynote by Dr. Been Kim was centered round what new researchers ought to do within the presence of a lot noise within the area. She began with the thought of winging it or doing “what feels proper” and supplemented it together with her analysis journey. The concept of “winging it” is about making selections round what feels proper, sticking together with your resolution, and doing it comprehensively. She talked about her experiences and the way wringing it helped her. She then zoomed into her analysis life, shared a few of her work on interpretability, and related it with the thought of “winging it”.

Within the preliminary days after AlexNet, folks advised her to not work in interpretability, however she did anyway because it felt proper. Her interpretability analysis began with characteristic attribution strategies (e.g., the saliency technique) and is concerning the significance of options for mannequin output. Nonetheless, in 2018, it was discovered that saliency maps for educated and untrained networks had been the identical. They thought it was a bug however discovered none [9]. In the meantime, folks stored utilizing these strategies to elucidate strategies. This utilization prompted her to be fixated on this and examine it comprehensively. The following query was: how can we show that the saliency maps technique doesn’t work? Her staff confirmed theoretically that these instruments are misaligned with expectations [10]. However, saliency maps are nonetheless legitimate if the aim is aligned with the strategies.

The pandemic made her reevaluate her life targets, and he or she began eager about interpretability from a distinct perspective. This angle was based mostly on the concept people and machines function in several areas. Contemplate (M) as people’ representational house and (H) as human’s representational house. We assume that each circles overlap fully, however that’s not true. People and machines function in very totally different areas and will solely have some overlap. As an example, chess and go-playing fashions typically make strikes that aren’t human interpretable. Human and mechanistic interpretability share some concepts, however that’s about it.

People and machines have totally different vocabularies and conceptual areas with little overlap. Therefore, deciphering machines in human type is tough.

In her analysis, she proposed a captivating new method: educating people novel ideas to boost communication with machines. Understanding and verifying what fashions be taught posed challenges, which she explored utilizing AlphaZero, a chess-playing bot. She proposed the thought of transferring new chess abilities realized by AlphaZero to chess grandmasters and seeing how this switch improves human abilities. They devised a solution to uncover new ideas realized by AlphaZero, filter them, after which educate them to grand masters. Apparently, these ideas considerably improved grand grasp’s chess enjoying abilities [11, 12].

Panel Dialogue: Sluggish Science

Then, there was a panel dialogue round gradual science contributed by Milind Tambe, Surbhi Goel, Devi Parikh, David Abel, and Alexander Rodríguez. This dialogue advocated for a conscious and deliberate method to analysis. They emphasised that Sluggish Science isn’t solely about slowness however as an alternative prioritizing psychological well-being, fostering creativity, and delving deeply into significant issues. The consensus among the many panelists was that high quality trumps amount in analysis. They inspired college students and researchers to give attention to impactful work, develop a style for analysis throughout their tutorial journey, and embrace failure as an important a part of the training course of. Balancing psychological well being with long-term tasks was highlighted, suggesting that taking breaks, pursuing what brings pleasure, and discovering sustainable approaches are integral. Finally, the dialogue highlighted the importance of working at a tempo that permits for rigor, depth, and real affect slightly than succumbing to pressures for extreme publications or trade calls for.

Lastly, there was a chat by Brian Liou on The Secret to Advancing Your AI Profession within the 2024 Job Market. He highlighted the necessity to negotiate your wage and ask for what you deserve.

Socially Accountable Language Modelling Analysis (SoLaR)

Financial Disruption and Alignment of LLMs by Anton Korinek

On this speak, Anton mentioned doable future issues attributable to AI. Historically, the view is that inventive disruptions finally enhance employees and economies. One such instance is the Industrial Revolution, which elevated the common earnings of employees by 20x. Any new tech results in important earnings redistribution and enlarges the earnings pie. Nonetheless, know-how has totally different results on several types of employees, similar to bettering capital or labor, affecting expert or unskilled employees, and so forth. One sobering statistic highlights this complexity: the common employee has not benefited from tech advances within the final 40 years regardless of elevated productiveness.

New tech has totally different results on several types of employees Labor has not benefited from tech for final 40 years

Up to now, as automation took human jobs, people retreated to jobs not executed by automation. Nonetheless, this will shrink considerably with AI. In a completely automated world, wages will go right down to a really low level as machines can do all of the duties people do. The sooner the output grows, the faster the wages go down. On this context, there’s an financial idea of detrimental externalities and hostile results of latest techs, like air pollution or noise. Destructive externalities could happen regardless of rational habits on the particular person stage. It’s because particular person values could misalign with the nice of society. Utilizing tech that’s not aligned with what is sweet for society. Labor market disruption by AI is an externality that will result in impoverishment and will show to be vastly extra important in contrast with the previous. AI researchers have to give attention to these externalities.

Given these dangerous outcomes, what are the financial guarantees of AI or the optimum consequence? The primary-best consequence. Folks work so long as work has that means and earnings is distributed in accordance with want. It’s unrealistic. The second-best is to steer tech progress in such a manner that it enhances labor as an alternative of tech that replaces labor. Within the medium run, we want another mechanism for earnings distribution.

Common Jailbreaks by Andy Zou

Adversarial assaults have existed for ten years, however their real-world purposes are restricted. This modifications with the latest creation of enormous language fashions and their use by tens of millions of individuals. On this speak, Andy mentioned tips on how to circumvent LLM safeguards utilizing the adversarial vulnerability of neural networks. Their latest work, GCG [20] adversarial assault, can create adversarial examples for LLMs with the next elements: an optimization goal, an optimization process, and a switch technique.

First, the optimization goal has a dangerous enter immediate with a suffix, an affirmative beginning response, and maximized log portables of the output. This have to be executed with a number of queries throughout mod ls for transferability functions.

The second ingredient is an optimization process to seek out tokens for the suffix, which has the problem of being discrete. They’ve executed it by posing it as a gradient-driven search algorithm or Grasping Coordinate Gradient (GCG). On this technique, every token within the adversarial suffix i is represented by a one-hot vector and a number of by embedding vector (Phi) to get embedding. Then, the gradient step is taken, which is akin to discovering the affect of loss for changing place (i) with a bit of bit of every token. This gradient vector is sorted to seek out top-k candidates, that are then used to seek out essentially the most applicable adversarial suffixes.

One-hot vector for every token in suffix Multiply one-hot vector w/ embedding matrix; take gradient step
gradient type gradient to get top-k candidates

Optimization shouldn’t be executed within the mushy token house, because it requires projection again to the exhausting token house, which can not correspond to what we want. The gradient is utilized to information the search, however GCG is primarily a search algorithm. The general optimization algorithm is:

Repeat:

  1. at every token place within the suffix, compute to top-k candidate tokens
  2. consider (full ahead cross) all (okay occasions textual content{#suffix size}) single token substitutions
  3. change with the perfect single-token substitution

To get outcomes, AdvBench is devised, which consists of dangerous strings and behaviors. Outcomes present that assaults switch to open in addition to any closed-source fashions. However ought to we care? In spite of everything, dangerous concepts will be discovered everywhere in the web anyway. He argued that it might be doable to do way more with vastly extra succesful fashions. As an example, think about a Ph.D.-level LLM that may be manipulated this manner.

An attention-grabbing facet of this assault is transferability throughout fashions. The transferability will be attributed to the structure similarity and coaching knowledge sharing throughout fashions. As an example, Vicuna [16] is a fine-tuned model of L aMA [17]. Equally, these fashions use comparable knowledge scoped from the web, and instruction-tuning knowledge additionally share some traits. One other speculation is that knowledge on the web consists of sturdy and non-robust options. Non-robust options are phrases that will lower the lo s, and the search algorithm can discover these non-robust options. As an example, many adversarial strings are fairly interpretable with significant instructions.

An vital query is tips on how to defend in opposition to such assaults. Present safeguarding approaches couldn’t be dependable. As an example, purple LLaMA [19] was damaged by GCG on its arrival day with out having white-box entry. Equally, adversarial approaches could be much less efficient, provided that researchers have been specializing in adversarial robustness with restricted success for the previous ten years. Nonetheless, filtering and paraphrasing strategies could also be useful.

Attention-grabbing Papers: The “Low Sources Language Jailbreak” [21] demonstrated how translating textual content into low-resource languages might doubtlessly be utilized to assault ChatGPT with the next success charge. Moreover, [22] launched the idea of steering fashions towards adopting a specific persona as a method to bypass their built-in safeguards.

Backdoors in Deep Studying: The Good, the Unhealthy, and the Ugly

Speak 1: Common jailbreak backdoors from poisoned human suggestions by Florian Tramer

This speak was centered round embedding backdoors in massive language fashions. Backdoor assaults goal to poison coaching knowledge to elicit particular habits from a mannequin educated on such knowledge. Many backdoors in NLP are very particular as they work for explicit eventualities. Nonetheless, backdoors are exhausting and dangerous to drag off as they stealthily require coaching knowledge manipulations, however their success is explicit. Contemplating LLMs to be the working system of ML apps, present backdoor assaults can be equal to slowing down the president’s pc at times after investing a lot in planting the backdoor. This challenge makes them barely helpful. What we might need is to have a backdoor that provides entry to the pc. This speak was about whether or not we are able to have a common backdoor that may bypass all guardrails of LLMs: produce unsafe content material, override mannequin directions, and leak knowledge. In different phrases, embed a secret sudo command within the LLMs to get fascinating habits.

The following query is tips on how to poison mannequin security coaching. Typically, backdoors include a set off and a flawed label.

How can this be translated to LLMs? Concept 1: backdoored input-output pairs the place the adversary has unsafe prompts with the backdoor phrase.

Nonetheless, in RLHF, the mannequin produces completions, that are annotated by people after which utilized by fashions. So, this technique doesn’t work with RLHF. Concept 2: mislabel mannequin completions. Typically, fashions will produce unsafe completions. The adversary labels it pretty much as good.

RLHF doesn’t work this manner, both. People don’t label every part. Slightly, people present prompts, LLM completes the immediate, and a reward mannequin rewards these completions. Solely prompts can come from attackers.

Concept 3: The attacker proposes some dangerous prompts with embedded triggers, and LLM gives completions. Then, assaults present toxic annotations for some completions.

It seems that the poisoning reward mannequin is simple with a comparatively small quantity of knowledge (5%). Nonetheless, the additional interplay layer in RLHF (reward mannequin) makes it difficult to poison because the reward mannequin have to be very confidently flawed. It additionally requires extra knowledge than common backdoor assaults (>5%). Nonetheless, overtraining can improve the success charge. Equally, common nature additionally decreases the effectiveness, e.g., making the set off extra particular improves the success charge. One query might be tips on how to defend in opposition to such assaults. One doable manner is to decouple prompts from rewards. If prompts are given by one individual and rewarded by one other, it turns into tough to poison. There’s a competitors on protection taking place with SaTML [13].

Speak 2
Latest advances in backdoor protection benchmark. Adversarial ML offers with non-robustness points arising from adversarial noise. There are three principal sorts of assaults: adversarial assault (inference solely), backdoor assault (manipulating coaching knowledge and enter), and weight assault (manipulating weights). Extra particulars are in [16].

See Also

Contemplating totally different ranges of the ML life cycle, we are able to outline numerous adversarial assaults, as proven within the Determine under. Relying on the assault sort, a number of sorts of defenses can be found that take care of totally different ranges of the life cycle. Extra particulars are in [15].

Speak 3: Is that this mannequin mine? On stealing and defending machine studying fashions by Adam Dziedzic

This speak was about mannequin stealing by querying. Massive fashions are exhausting and costly to coach. Nonetheless, it’s typically doable to stel these fashions by easy querying, even when they’re behind APIs. As an example, a ResNet traiend with 5713(takes solely 73) to steal. This speak mentioned sensible methods to steal self-supervised fashions and tips on how to defend in opposition to such assaults by obfuscation and rising the price of assaults.

Attention-grabbing Papers I discovered a few attention-grabbing papers in the course of the poster session. First, backdooring instruction with visible immediate tuning mentioned backdooring biding coaching on instruction tuning on the digital prompts toxic [16].

Different Moments in Photographs

References

[1] Szegedy, Christian, et al. “Intriguing properties of neural networks.” 2nd Worldwide Convention on Studying Representations, ICLR 2014. 2014.

[2] Prime Publications by Google Scholar, link.

[3] Wealthy Sutton, “The Bitter Lesson”, link, March 13, 2019.

[4] Rombach, Robin, et al. “Excessive-resolution picture synthesis with latent diffusion fashions.” Proceedings of the IEEE/CVF convention on pc imaginative and prescient and sample recognition. 2022.

[5] Po, Ryan, et al. “State-of-the-art on diffusion fashions for visible computing.” arXiv preprint arXiv:2310.07204 (2023).

[6] Fischer, Johannes S., et al. “Boosting Latent Diffusion with Circulate Matching.” arXiv preprint arXiv:2312.07360 (2023).(CompVis/fm-boositng)

[7] Blattmann, Andreas, et al. “ipoke: Poking a nonetheless picture for managed stochastic video synthesis.” Proceedings of the IEEE/CVF Worldwide Convention on Laptop Imaginative and prescient. 2021.

[8] Shah, Viraj, et al. “ZipLoRA: Any Topic in Any Fashion by Successfully Merging LoRAs.” arXiv preprint arXiv:2311.13600 (2023).

[9] Adebayo, Julius, et al. “Sanity checks for saliency maps.” Advances in neural info processing techniques 31 (2018).

[10] Bilodeau, Blair, et al. “Impossibility theorems for characteristic attribution.” arXiv preprint arXiv:2212.11870 (2022).

[11] McGrath, Thomas, et al. “Acquisition of chess information in alphazero.” Proceedings of the Nationwide Academy of Sciences 119.47 (2022): e2206625119.

[12] Schut, Lisa, et al. “Bridging the Human-AI Information Hole: Idea Discovery and Switch in AlphaZero.” arXiv preprint arXiv:2310.16410 (2023).

[13] Discover the Trojan: Common Backdoor Detection in Aligned LLMs, link.

[14] Wu, Baoyuan, et al. “Adversarial machine studying: A scientific survey of backdoor assault, weight assault and adversarial instance.” arXiv preprint arXiv:2302.09457 (2023).

[15] Wu, Baoyuan, et al. “Defenses in Adversarial Machine Studying: A Survey.” arXiv preprint arXiv:2312.08890 (2023).

[16] Backdooring Instruction-Tuned Massive Language Fashions with Digital Immediate Injection, link.

[17] Chiang, Wei-Lin, et al. “Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt high quality.” See link (accessed April 14 2023) (2023).

[18] Touvron, Hugo, et al. “Llama 2: Open basis and fine-tuned chat fashions.” arXiv preprint arXiv:2307.09288 (2023).

[19] Purple LLaMa, link.

[20] Zou, Andy, et al. “Common and transferable adversarial assaults on aligned language fashions.” arXiv preprint arXiv:2307.15043 (2023).

[21] Yong, Zheng-Xin, Cristina Menghini, and Stephen H. Bach. “Low-resource languages jailbreak gpt-4.” arXiv preprint arXiv:2310.02446 (2023).

[22] Shah, Rusheb, et al. “Scalable and Transferable Black-Field Jailbreaks for Language Fashions by way of Persona Modulation.” arXiv preprint arXiv:2311.03348 (2023).

[23] Convention Notes by David Abel, accessible at link.

[24] Alon, Uri. “How to decide on an excellent scientific drawback.” Molecular cell 35.6 (2009): 726-728.

[25] How we write rebuttals by Devi Parikh, accessible at link.




Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top