This is by far the most dangerous Devin task.
I had o1 Pro read the log and explain it to me.
1. Run Jekyll and check the logs
2. Fix the Sass deprecation warnings
3. Trouble with patching
   Devin checks with `git apply --check` and fixes the patch file, but it often happens that the warnings are not completely resolved and the patch does not apply properly.
4. Work in progress
5. Partial resolution on the human side
(There are many details, but this is the important one.)
Overall, the dialogue is a long series of exchanges trying to fix the Sass deprecation warnings that never completes smoothly: Devin is either unable to finish the patch properly or is interrupted by session limitations.
Note that after this event, the release notes for 1/16 included the advice: "Keeping sessions under 10 ACUs (Devin's performance degrades in long sessions)."
So how many times did Devin actually go to sleep?
The logs show that Devin went to sleep a total of 8 times (counting both "Devin went to sleep due to session usage settings" and "Devin went to sleep due to user inactivity").
At each of these points, progress was reported, such as "partially fixed the Sass Deprecation Warning," but a final solution was never reached. Below is a summary of the main progress reported just before each sleep.
1. 1st (5:07 PM) - Before this, the user set out a policy such as "create a patch to fix Sass," but the actual deliverable (the patch) was still at an error-prone stage. No clear completion report.
2. 2nd (9:33 PM) - Devin reported that "364 warnings have been fixed." However, many warnings still remained, and patching continued to fail.
3. 3rd (7:14 PM) - Continued fixing Sass warnings. Just before sleeping, it reported being "in the process of fixing mixed declarations," but went to sleep before finishing.
4. 4th (2:42 AM) - Similarly, it reported progress such as "_type.scss and _rfs.scss have been fixed," but the fixes to _reboot.scss and others were not yet complete.
5. 5th (8:36 PM) - Attempted to create a patch and verify the Jekyll build, but had not yet produced a working patch.
6. 6th (7:34 PM) - It never got as far as reporting "warnings finally reduced to zero"; it went to sleep with only a partial fix. Around here there were still reports of patching problems and fixes in progress.
7. 7th (2:03 PM) - Only reported "continuing to fix Sass." No specific details on the number of remaining warnings.
   - Finally, a patch called "css-fixes.patch" was presented, but it is unclear whether it eliminates all warnings. Just before this, Devin also claimed the warnings disappeared at build time, but it never verified what actually happens when the patch is applied.
Since Devin went to sleep at the usage limit 7 times, at least 21,000 yen (roughly 3,000 yen, or about 10 ACUs, per limited session) was melted there alone.
After this, Devin went to sleep without producing any further output; in Slack it probably looked something like this:
Something similar seems to be happening to several people, not just me. I have used more than 10 different programming-language paradigms and published a book on comparing and learning languages, "Technology Supporting Coding," and from my point of view this is almost a paradigm shift in programming.
Perhaps with a conventional mental model, the following code looks "efficient":

```python
for task in tasks:
    process(task)
```
It feels efficient because the human writes less and the computer does more of the work.
In fact, in this case, the person gave exactly this kind of instruction: "Okay then, can you try your best to fix all of the warnings you see in the logs?"
In other words, expressed as code, it would look like this:

```python
for warning in logs:
    fix(warning)
```
The plan Devin generated internally from that instruction was also a loop.
Is this really efficient?
With the current LLM mechanism, which maintains conversational memory by appending everything to the context, each turn of the loop gradually lengthens the input sent to the LLM, increasing both the time it takes and the amount charged.
Seen the other way around: if you run similar tasks in a loop, the AI gets dumber and slower with each iteration.
The next problem is the scale of the loop body relative to the task. People accustomed to modern programming environments are used to loop bodies being cheap, and too often unconsciously assume their cost is zero.
In reality, however, as long as you are calling a metered LLM API, each iteration costs money. On top of that, because of the "efficiency decreases as the loop turns" effect described above, in the quadratic case processing 30 items at once costs 900 times as much as processing one. Prompt caching and similar techniques mitigate this, and other engineering lowers the order, so the 900-fold blowup occurs only in a very naive configuration, but fundamentally the cost grows faster than linearly.
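To see where the superlinear growth comes from, here is a minimal sketch (the token count per fix is a hypothetical number, not from the actual session): if each loop iteration appends its transcript to the context, the input re-sent on every call grows linearly, so the total tokens billed grow quadratically.

```python
TOKENS_PER_FIX = 500  # hypothetical transcript size added per warning fixed

def total_tokens(n_warnings: int) -> int:
    """Total input tokens billed when context accumulates across the loop."""
    context = 0
    total = 0
    for _ in range(n_warnings):
        context += TOKENS_PER_FIX  # the context keeps growing every iteration
        total += context           # each call re-sends the whole context
    return total

print(total_tokens(1))   # 500
print(total_tokens(30))  # 232500: 465x the cost of one warning, not 30x
```

The exact multiplier depends on the setup (prompt caching would cut it sharply), but the shape of the curve is the point: the 30th fix is 30 times as expensive as the first.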
How much does one such task cost to run? Throwing a large number of tasks into a loop without a feel for this number is dangerous.
Also, you don't know whether the LLM's processing will return the expected result. The AI takes no responsibility for producing "results you are satisfied with," so this is closer to outsourcing under a quasi-mandate contract (paid for effort) than a fixed-deliverable contract (paid for results).
The way to manage AI in such a situation is to first try a small unit of work, to see whether it can be done at all and how much it costs, before placing a large order. This is the idea well known as the Minimum Viable Product from Lean Startup.
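Applying that idea here, a minimal sketch (all names and numbers are hypothetical, not from the actual session): fix a small sample of warnings first, measure the real cost, and extrapolate superlinearly before committing to the full loop.

```python
def projected_cost(sample_cost_yen: float, sample_n: int, total_n: int,
                   order: float = 2.0) -> float:
    """Extrapolate the cost of the full batch from a small trial.

    order=1.0 assumes each item costs the same as in the trial;
    order=2.0 models the quadratic growth that accumulating context causes.
    """
    per_item = sample_cost_yen / sample_n
    # scale the per-item cost by (total/sample)^(order-1) to model
    # later items being more expensive than the sampled ones
    return per_item * total_n * (total_n / sample_n) ** (order - 1)

# Trial: fixing 3 warnings cost 300 yen. What would all 30 cost?
print(projected_cost(300, 3, 30, order=1.0))  # 3000.0  (naive linear guess)
print(projected_cost(300, 3, 30, order=2.0))  # 30000.0 (quadratic scenario)
```

The gap between the two estimates is exactly the trap described above: the linear guess looks affordable, while the quadratic scenario is the one that melts your budget.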
Recently, an invite link was set up where both the referrer and the referred person receive 100 ACUs, a little over 30,000 yen, as a referral bonus. I'm taking the initiative and hurting my own wallet to gain knowledge, so please don't hesitate to throw money at me! https://app.devin.ai/invite/iDYvHCNSdfhnClCf
Material notes
This page is auto-translated from /nishio/Devinで4万溶かす方法 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.