Rendered at 22:30:29 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
dclavijo 50 minutes ago [-]
Funny, a couple of days ago I did the same with uniq and a single skill.md (repo: https://github.com/daedalus/uniq-reconstruction). After that success I tried it with rar but failed. Kudos.
spprashant 48 minutes ago [-]
I have never attempted something so ambitious with AI, but this feels spot on in terms of experience. As you cede more control to the model, you will find yourself losing control over things like code quality and performance.
xphos 2 hours ago [-]
Would it really take 5 years to develop RAR compression and decompression? That seems like an extreme overestimate. I don't know the details of the compressor or decompressor, but that seems really high.
q3k 2 hours ago [-]
Yeah, sounds closer to a 5 week thing, if you know what you're doing.
self_awareness 43 minutes ago [-]
5 weeks is a decompressor for 1 version. If this supports multiple versions of RAR, then writing decompressors alone for all of them is probably a year of work.
esafak 2 hours ago [-]
> It’s sloppy, it’s slow, it’s almost two megabytes in size and somewhat worse than WinRAR on compression.
As mathematicians say, optimization is left as an exercise to the reader. You did the hard part.
59nadir 31 minutes ago [-]
I mean, not really...? A vibecoded mess that runs badly, that's not really the hard part for something like compression/decompression tools.
rebolek 1 hours ago [-]
> "For the last 15 months or so my hobby has been shouting at Claude"
How can you shout at Claude when it’s
1) foobaring, bamblabooing and fghrtawing all the time without telling you what’s going on
2) when it finally interacts, it’s asking for a permission you told it 30 seconds ago "yes and do not ever ask me again until heat death of the Universe"
3) and after all of that, it just spits out: "you’re out of tokens, give up your liver or wait until next Trump’s war"
themafia 2 hours ago [-]
> But, it works, and the world now has a free software RAR implementation.
Does it? How are you legally intending to use copyright to license this machine output? How would you know it's not encumbered in any way?
perching_aix 2 hours ago [-]
Really unsure why this is getting downvoted, to my understanding this is a massive, unsettled concern.
It wasn't even a disasm/pseudocode to formal spec flow, and then a separate human implementation. The same human has been in the loop throughout, and large parts of it were generated directly.
It's basically guaranteed tainted.
Edit: I should have skimmed a bit more patiently, there was in fact no "disasm/pseudocode + the human getting tainted" part to this apparently.
ameliaquining 1 hours ago [-]
I read the post you're replying to as saying "this is copyright-encumbered and nonfree because it's a derivative work of everything in Claude's and GPT-5.5's training corpus", which is an argument I find fairly tiresome. (Realistically, if courts actually rule that this is the case, this tiny little project will be the least of anyone's concerns.)
"This is copyright-encumbered and nonfree because it's a derivative work of the legacy RAR binaries" is a different argument (and seems like it depends on details of the setup that were somewhat glossed over in the post).
themafia 52 minutes ago [-]
The point is, excepting current legal standards which are already very murky, how can _you_ claim copyright, if you don't _know_ it isn't encumbered?
You can get these LLMs to generate copyrighted outputs both intentionally and accidentally. This is a known fact; therefore, if you're not checking the output to see if this has occurred then you're potentially generating legal risks for yourself and anyone who uses your code.
To not only ignore this for your own use case but to then release the code under a proclaimed license seems legally problematic if not ethically concerning.
If you did get sued for infringement I can't imagine that your defense would be that you find the argument tiresome? Honestly, do you think this would never happen, or how would you go about defending your actions here?
charcircuit 1 hours ago [-]
The human wasn't looking at the copyrighted code and was giving high level steering instructions. If you look at the spec generated it doesn't look like a derivative work of the copyrighted material. The program was generated from the spec. It seems mostly fine from my perspective.
0cf8612b2e1e 37 minutes ago [-]
If I use a decompiler on existing binaries, then some machine translation utility to turn that into a different language, that still feels like a derivative work, even if no human were reviewing the specifics.
slopinthebag 2 hours ago [-]
How do we know it's actually correct?
perching_aix 2 hours ago [-]
By using it.
slopinthebag 44 minutes ago [-]
Thus all software that can be used is correct?
You know what I meant: How can we have confidence that this implementation of RAR is functionally identical to what it's based on? What would give me the confidence to use it in a critical piece of infrastructure?
perching_aix 4 minutes ago [-]
[delayed]
jaggederest 38 minutes ago [-]
Validating compression systems is usually really straightforward. There are 3 layers: decoding known values from compressed files (or encoding, same thing), round-tripping without any alterations, and fuzzing with arbitrary binaries.
Because it's a defined format there can be binary exact comparisons between the input and output files - we already have an oracle in the form of proper RAR format software, so if they are identical, you don't need to look further for that specific case.
It validates that sql with the same setup, teardown, and test results in perfectly exact compatibility between raw postgresql as the control and various configurations of PgDog, with both the text format and binary format, so ultimately a 6-way multivariate test that should always result in binary-exact results.
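A minimal sketch of those three layers in Rust, under loud assumptions: `encode` and `decode` are a toy run-length codec standing in for the real compressor, and `XorShift` and `check_codec` are hypothetical names, not anything from the project being discussed. Comparing against an oracle (the official RAR tooling) would follow the same shape but needs the real binaries, so it isn't shown.

```rust
// Toy run-length codec: a stand-in for the real compressor (assumption).
// Output is a sequence of (run_length, byte) pairs, run_length in 1..=255.
fn encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let byte = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

fn decode(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.len() % 2 != 0 {
        return Err("truncated stream".to_string());
    }
    let mut out = Vec::new();
    for pair in input.chunks(2) {
        if pair[0] == 0 {
            return Err("zero-length run".to_string());
        }
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    Ok(out)
}

// Deterministic xorshift64 generator so a failing fuzz case can be replayed.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn check_codec(iterations: usize) -> Result<(), String> {
    // Layer 1: known vectors, checked in both directions.
    let known_plain: &[u8] = b"aaabcc";
    let known_packed: &[u8] = &[3, b'a', 1, b'b', 2, b'c'];
    if encode(known_plain) != known_packed {
        return Err("known vector: encode mismatch".to_string());
    }
    if decode(known_packed)? != known_plain {
        return Err("known vector: decode mismatch".to_string());
    }

    // Layers 2 and 3: round-trip pseudo-random inputs, byte-exact comparison.
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for case in 0..iterations {
        let len = (rng.next_u64() % 512) as usize;
        // Small alphabet so that runs actually occur.
        let data: Vec<u8> = (0..len).map(|_| (rng.next_u64() % 4) as u8).collect();
        if decode(&encode(&data))? != data {
            return Err(format!("round trip mismatch on case {case}"));
        }
    }
    Ok(())
}
```

Calling `check_codec(1000)` from a `main` or a `#[test]` function runs all three layers; the fixed seed is a design choice so that any failure reproduces exactly.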
slopinthebag 16 minutes ago [-]
Right, that's very different from "using it" and it's also different from "Have an LLM generate code that compiles".
repelsteeltje 2 hours ago [-]
It works == it's correct?
perching_aix 2 hours ago [-]
Yes? What do you think fuzzing, unit testing, integration testing is for? It's an empirical evaluation of correctness. Literally just try and see.
For actual correctness verification in the strong sense, you'd need to start from a specification written in a formal language so that it's machine checkable, which, if I had to guess, not even win.rar GmbH has.
wavemode 43 minutes ago [-]
You're being needlessly dismissive.
From a philosophical perspective, there's no way to know that any piece of software is truly correct without formal verification.
But in the present, non-philosophical context, it's obvious that what we mean is, colloquially, "how well-tested is this against a variety of edge-case files which the official winrar handles correctly? Is there a test suite, and how robust is it? Plenty of software that claims to be compatible with the rar format, doesn't actually successfully read all rar files."
It's also equally obvious, in the present context, that we would prefer these steps to have been taken by the author of the software before we install it and run it on our own computers and data. The parent commenter wasn't just asking about the software's correctness for the sake of academic curiosity.
repelsteeltje 2 hours ago [-]
I hope the developers of, say, the brakes in my car don't interpret 'software correctness' the way you do.
Added, later: hey you changed your comment, added a whole paragraph.
perching_aix 1 hours ago [-]
I added the second paragraph about formal verification at the same time you posted, in anticipation that you'd immediately dig your heels into it otherwise, despite me highlighting that the other methods are merely empirical.
I was immediately proven right once I pressed "update". That said, I have now deleted my snarky response that followed. Not in the game of capitalizing off of the human equivalent of a race condition.
I should make a browser addon to delay posting; this is the 2nd time this has happened in the past few days.
Edit:
Never mind, it's already a feature built into the site. Turned it on. I wonder if it applies to edits also...
Nope, doesn't seem to. Oh well, should still help.
repelsteeltje 1 hours ago [-]
Haha, off course! The three major sources of software failures: off by one errors and race conditions.
atiedebee 1 hours ago [-]
I hope the brakes in my car don't need developers
pixl97 1 hours ago [-]
I think you underestimate the complexity of modern braking systems.
arcticbull 1 hours ago [-]
ABS doesn't just appear organically.
throw1234567891 1 hours ago [-]
They used to. Now they have systems, standards, and experience. There are only so many ways you can do brakes on the car.
mjr00 2 hours ago [-]
This is Rust we're talking about. It doesn't even need to work; as long as it compiles, it's correct.
speedgoose 2 hours ago [-]
  use std::fs::File;
  use std::io::prelude::*;
  fn main() -> std::io::Result<()> {
      let mut file = File::create("content.txt")?;
      file.write_all(b"3!")?;
      Ok(())
  }
rakel_rakel 1 hours ago [-]
; cat content.txt
3!;
dataflow 1 hours ago [-]
> This is Rust we're talking about. It doesn't even need to work; as long as it compiles, it's correct.
No, it doesn't even need to compile. The mere fact that it's in Rust means it's correct.
TacticalCoder 22 minutes ago [-]
It could be correct but way too slow in edge cases (unlikely with Rust but you never know), leaking temporary files, having security holes, etc.
There's much more to the correctness of a piece of software than: "produces the same output as the original on x test cases".
I'm not saying it's a bad implementation and, if anything, LLMs are much better at translating/porting existing code (and finding bugs) than at writing things unheard of.
You're basically saying, if I may make a pun: "rust me bro, it's correct".
cactusplant7374 2 hours ago [-]
> and it almost earned me an OpenAI ban
Were you flagged for a cybersecurity violation?
gibspaulding 2 hours ago [-]
> Well, it turned out that at some time during spec investigation, Claude needed to understand authenticity verification which is a paid feature. With a context full of reverse engineering tools it cracked WinRAR and bypassed product registration, then dutifully documented its crimes in the spec. The docs, when viewed, triggered OpenAI’s alarms and stopped it dead in its tracks. I squashed this out of the git history, and decided not to implement the feature at all.
You can draw your own conclusions as to what this says about the state of agentic development.
periodjet 1 hours ago [-]
Finally, a sane and enjoyable read about a coding project. Feel like it’s been months since we had one of these that wasn’t filled to the brim with bluesky/mastodon-flavored whining about AI.
Kudos to the author. A fun read, thank you for sharing.
RIMR 1 hours ago [-]
For everyone out there whining about AI, there's one of you whining about being anti-AI.
Maybe just cut the unprompted whining?
perching_aix 52 minutes ago [-]
Would be great, but then it's a saturation game, and the other side doesn't have any compelling reason to hold back the same way. So it's contingent on how fair the platform is, and what nonverbal, out of band options remain.
HN is better than most in this regard thanks to community flagging, but even then there's a lot of it. Ultimately, it'd seem that the ratio you're describing skews a whole lot more towards the anti-ai sentiment side, than towards the anti-anti-ai one (or towards a stalemate). Or rather, that the latter sentiment is not common enough necessarily to thwart such comments. And so you see it reflected verbally instead.
Imustaskforhelp 2 hours ago [-]
Kudos, this is a really cool project (even if it might be AI generated). I have starred the repo (3rd starrer here).
One thing I have been curious about: is there any way to stop a RAR compression midway and then continue it later?
Like suppose I have a compression happening for a large file, then would there be a possibility with this project to shut down the computer mid compression and continue it after starting it again?
I would really love it if you can add this functionality!
You can see a version of this that I did quite similarly, for postgresql wire format, here: https://github.com/pgdogdev/pgdog/tree/main/integration/sql
I suppose the question is whether the author had ever entered into a contract limiting reverse engineering...