I am several hundred opossums in a trench coat

  • 1 Post
  • 30 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • I briefly worked for a company that worked on household power technology. Their product would attempt to predict energy prices, weather patterns, and usage in order to sell your excess energy at peak prices. As discussed in the article, this company collected usage data and controlled the sale of energy back to the grid centrally. They did this because it let them better train their prediction models and run them on more powerful hardware. The controllers would have needed internet connectivity anyway to query energy prices, and putting the prediction on the device would have just made them more expensive and worse. Even when I worked there (back in 2015, I think), they were already very aware of the threat vectors discussed in this article and took some measures to mitigate them.

    In my opinion they were (and still are; the company still exists) a responsible company run by competent people. They did not collect the data out of “greed”, and I strongly suspect that the people in these comments implying the data is collected to be sold have never actually worked in the industry and have very little idea of what energy usage data is actually worth. I can’t speak authoritatively for other companies, but I would guess that, like the one I worked for, their products are internet-connected simply because it improves the product. For example, people expect to be able to control or view things from an app from anywhere, and that requires internet connectivity.
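
    To give a rough idea of what “sell your excess energy at peak prices” means in practice, here is a minimal Kotlin sketch. It is not that company’s algorithm; it assumes a perfect hourly price forecast, a fixed amount of stored excess energy, and a per-hour export cap, and it ignores charging, battery losses, and forecast error. `HourForecast`, `planSales`, and `expectedRevenue` are hypothetical names.

    ```kotlin
    // Hypothetical type: one forecast price per hour, in currency units per kWh.
    data class HourForecast(val hour: Int, val pricePerKwh: Double)

    // Greedily sell stored excess energy in the highest-priced hours first.
    // maxExportPerHourKwh caps how much can be exported in any single hour.
    // Revenue is linear in kWh and the only constraints are a total budget and
    // a per-hour cap, so filling the most expensive hours first maximises the
    // forecast revenue. Returns a map of hour -> kWh to sell.
    fun planSales(
        forecast: List<HourForecast>,
        storedKwh: Double,
        maxExportPerHourKwh: Double,
    ): Map<Int, Double> {
        require(storedKwh >= 0.0 && maxExportPerHourKwh > 0.0)
        val plan = mutableMapOf<Int, Double>()
        var remaining = storedKwh
        for (slot in forecast.sortedByDescending { it.pricePerKwh }) {
            if (remaining <= 0.0) break
            if (slot.pricePerKwh <= 0.0) break // never pay to export
            val amount = minOf(remaining, maxExportPerHourKwh)
            plan[slot.hour] = amount
            remaining -= amount
        }
        return plan
    }

    // Forecast revenue of a plan, useful for comparing schedules.
    fun expectedRevenue(forecast: List<HourForecast>, plan: Map<Int, Double>): Double =
        forecast.sumOf { (plan[it.hour] ?: 0.0) * it.pricePerKwh }
    ```

    With prices of 0.10, 0.40, 0.25 and 0.05 for hours 0 to 3, 5 kWh stored, and a 2 kWh per hour cap, it sells 2 kWh in hour 1, 2 kWh in hour 2, and the last 1 kWh in hour 0.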




  • Not every change is going to completely overhaul the app. More likely, the changes fix some obscure bug that wasn’t caught in testing and only affects a small percentage of devices. Just because you don’t encounter it with your workflow and device doesn’t mean it isn’t a critical bug preventing someone else from using the app. It could also be a new feature targeting a different use case to yours. It could even be as simple as bringing the app into compliance with new platform requirements or government regulations (which can happen a couple of times a year; for example, Google Play regularly raises the minimum target SDK level that apps must meet, which forces them to comply with new privacy improvements).




  • After a certain point, learning to code (in the context of application development) becomes less about the lines of code themselves and more about structure and design. In my experience, LLMs can spit out well-formatted and reasonably functional short code snippets, with the caveat that they sometimes misunderstand you or, if you’re writing UI code, make very strange decisions (since they have no spatial/visual reasoning).

    Anyone with a year or two of practice can write mostly clean code like an LLM. But most codebases are longer than 100 lines, and your job is to structure that program and introduce patterns to make it maintainable. LLMs can’t do that; only you can (and you can’t skip learning to code to jump straight to architecture and patterns).




  • How much computing power do you think it takes to approximately recognise a predefined word or phrase? They do that locally, on the device, and then stream whatever audio follows to more powerful computers in AWS (the cloud). To get ahead of whatever conspiratorial crap you’re about to say next: Alexa devices are not powerful enough to transcribe arbitrary speech.

    Again, to repeat: people smarter than you and me have analysed the network traffic from Alexa devices and independently verified that they do not stream audio (or transcripts) unless they have heard something close to the wake word, i.e. close enough that the fairly primitive on-device audio processing recognises it. (It’s primitive because it’s cheap, not for conspiracy reasons.) I have also observed this, albeit with a less rigorous methodology. You can check this yourself, so why not do that and see whether this conspiracy holds up?
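
    The on-device gating described above can be modelled roughly like this. It is a toy sketch under my own assumptions, not Amazon’s implementation: the hypothetical `WakeWordDetector` stands in for the cheap keyword spotter, audio arrives as fixed-size frames, and the `upload` callback stands in for streaming to the cloud. Until the detector fires, frames are simply dropped.

    ```kotlin
    // Hypothetical stand-in for a cheap on-device keyword spotter: it only has
    // to score each short audio frame for similarity to one fixed phrase.
    fun interface WakeWordDetector {
        fun score(frame: ShortArray): Double
    }

    // Keeps audio local until the wake word is detected, then forwards the next
    // `streamFrames` frames to `upload` (a stand-in for the cloud connection).
    class WakeWordGate(
        private val detector: WakeWordDetector,
        private val threshold: Double,
        private val streamFrames: Int,
        private val upload: (ShortArray) -> Unit,
    ) {
        private var framesLeftToStream = 0

        fun onFrame(frame: ShortArray) {
            if (framesLeftToStream > 0) {
                // Only audio *after* a detection leaves the device.
                upload(frame)
                framesLeftToStream--
            } else if (detector.score(frame) >= threshold) {
                // Wake word heard: open the streaming window.
                framesLeftToStream = streamFrames
            }
            // Otherwise the frame is discarded and nothing is sent.
        }
    }
    ```

    Feeding it frames 1, 2, 99, 3, 4, 5, with a detector that only fires on 99 and a two-frame window, uploads only frames 3 and 4.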






  • Can you explain to me exactly how moving where profit is recorded from one division to another in the same organization reduces their tax burden? Because, excuse me, I know I only did a year or two of accounting courses before dropping the degree, but that’s not how I understand taxes to work.

    Also, to be turning a profit by “doing well collecting data”, the open-market value of the data Alexa alone generates annually would need to be around 8% of the entire global data market. If you can justify how millions of instances of “Alexa set a timer for 10 minutes”, “Alexa what is the weather”, or “Alexa play despacito” generate that much value, maybe you have a point.



  • having an always on listening device in someone’s home

    They very explicitly do not collect audio when you haven’t used a wake word or activated it some other way. They will not “know what is discussed within the house for data on ad penetration/reach” (which is pretty much the only valuable data you’ve mentioned here), nor will they “have a backchannel to television viewing and music listening patterns” unless you actively discuss it with your device.

    I’m not going to put words in your mouth, but if whoever reads this is thinking of replying “are you going to trust that” etc., yes, I am. We can track which data an Alexa transmits in real time and directly verify that this “always listening” isn’t happening. Even if we couldn’t independently verify this, and let’s say they contradicted their privacy policy and public statements and did it anyway, that would be a crazy liability nightmare. Amazon has more than enough lawyers to know that recording someone without consent and using that data is very illegal in most places, and would open them up to so many lawsuits if they accidentally leaked or mishandled the data. Take the conspiracy hat off and put your thinking cap on.

    Send it to cheap overseas transcribers, use it to train and improve voice recognition and automatic transcription.

    Bad for privacy, but also not a $25 billion source of revenue.

    Alexa, Google Home, and Siri devices are not good sources of data. If they were, why would Google, king of kings when it comes to data collection, be cutting their Assistant teams so much?
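
    If you want to do the “track which data it transmits” check yourself, the analysis side can be as simple as the hypothetical sketch below. It assumes you have already captured the device’s outbound traffic (for example at your router) and summarised it into per-second byte counts, and that you noted down when you said the wake word. It flags any second with a significant upload that doesn’t fall shortly after a wake event; `UploadSample`, `unexplainedUploads`, and the `noiseBytes` and `windowSeconds` thresholds are illustrative placeholders you would tune, not measured values.

    ```kotlin
    // One second of outbound traffic from the device, summarised from a capture.
    data class UploadSample(val second: Long, val bytesOut: Long)

    // Returns the seconds with a "significant" upload that is not explained by a
    // wake-word event within the preceding `windowSeconds`. Background traffic
    // at or below `noiseBytes` per second (heartbeats, time sync, ...) is ignored.
    fun unexplainedUploads(
        samples: List<UploadSample>,
        wakeEventSeconds: List<Long>,
        noiseBytes: Long,
        windowSeconds: Long,
    ): List<Long> =
        samples
            .filter { it.bytesOut > noiseBytes }
            .map { it.second }
            .filter { t -> wakeEventSeconds.none { w -> t >= w && t - w <= windowSeconds } }
    ```

    With uploads at seconds 10, 11 and 30, a single wake event at second 9, and an 8-second window, only second 30 is flagged.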



  • Thank you for adding this! If people want a real-life example of the effect shown in this pseudocode, here is a comparison of real production code I wrote and its decompiled counterpart:

        override fun process(event: MapStateEvent) {
            when(event) {
                is MapStateEvent.LassoButtonClicked -> {
                    action(
                        MapStateAction.LassoButtonSelected(false),
                        MapStateAction.Transition(BrowseMapState::class.java)
                    )
                }
                is MapStateEvent.SaveSearchClicked -> {
                    save(event.name)
                }
                // Propagated from the previous level
                is MapStateEvent.LassoCursorLifted -> {
                    load(event.line + event.line.first())
                }
                is MapStateEvent.ClusterClick -> {
                    when (val action = ClusterHelper.handleClick(event.cluster)) {
                        is ClusterHelper.Action.OpenBottomDialog ->
                            action(MapStateAction.OpenBottomDialog(action.items))
                        is ClusterHelper.Action.AnimateCamera ->
                            action(MapStateAction.AnimateCamera(action.animation))
                    }
                }
                is MapStateEvent.ClusterItemClick -> {
                    action(
                        MapStateAction.OpenItem(event.item.proposal)
                    )
                }
                else -> {}
            }
        }
    

    Decompiled:

        public void c(@l j jVar) {
            L.p(jVar, D.f10724I0);
            if (jVar instanceof j.c) {
                f(new i.h(false), new i.r(c.class, (j) null, 2, (C2498w) null));
            } else if (jVar instanceof j.e) {
                m(((j.e) jVar).f8620a);
            } else if (jVar instanceof j.d) {
                List<LatLng> list = ((j.d) jVar).f8619a;
                j(I.A4(list, I.w2(list)));
            } else if (jVar instanceof j.a) {
                d.a a7 = d.f8573a.a(((j.a) jVar).f8616a);
                if (a7 instanceof d.a.b) {
                    f(new i.j(((d.a.b) a7).f8575a));
                } else if (a7 instanceof d.a.C0058a) {
                    f(new i.a(((d.a.C0058a) a7).f8574a));
                }
            } else if (jVar instanceof j.b) {
                f(new i.k(((j.b) jVar).f8617a.f11799a));
            }
        }
    

    Keep in mind, this was buried in hundreds of unlabeled classes and functions. I was only able to find it in a short amount of time because I have the most intimate knowledge of the code possible, having written it myself.


  • It’s not impossible, just very labour-intensive and difficult. Compiling an abstract, high-level language into machine code is not a reversible process. Even though there are automated tools to “decompile” machine code back into a high-level language, a huge amount of information is still lost, because nearly everything that made the code readable in the first place was stripped away during compilation. Comments? Gone. Function names? Gone. Class names? Gone. Type information? Probably also gone.

    Working through the decompiled code to bring it back into something readable (and thus something that can be worked with) is not something a lone “very smart person” can do in any reasonable time. It likely takes a team of smart people months (if not years) of work to understand the entire structure, as well as every function and piece of logic in the entire program. Once they’ve done that, they can’t even use their work directly, since publishing reconstructed code is copyright infringement. Instead, they need to write extremely detailed documentation about every aspect of the program, to be handed to another, completely isolated person who will then write a new program based on the logic and APIs described in that documentation. Only at that point do they have a legally usable reverse-engineered program that they can distribute or modify as needed.

    Doing this kind of reverse engineering takes a huge amount of effort and motivation, something that an app for 350 total sneakers is unlikely to warrant. AI can’t do it either, because current models are incapable of the kind of novel deductive reasoning the task requires. Also, the CarThing has actually always been “open-source”, and people have already experimented with flashing custom firmware. You haven’t heard about it because people quickly realised there was no point: the CarThing is too underpowered to do much beyond its original use.
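
    To make the information-loss point concrete, here is a toy Kotlin sketch of identifier renaming. It is a deliberately simplified, hypothetical stand-in for the renaming step of an obfuscating build tool (real tools work on compiled code, not source text): `minify` replaces every identifier with a short name in order of first appearance. Two different, meaningful programs can come out identical, so the original names cannot be recovered from the output alone.

    ```kotlin
    // Keywords that must not be renamed (just enough for the examples here).
    val reservedWords = setOf("val", "var", "fun", "return")

    // Toy minifier: replaces each distinct identifier with a single letter
    // (a, b, c, ...) in order of first appearance. Only handles up to 26
    // distinct names; it exists purely to illustrate the information loss.
    fun minify(source: String): String {
        val names = mutableMapOf<String, String>()
        return Regex("[A-Za-z_][A-Za-z0-9_]*").replace(source) { match ->
            val id = match.value
            if (id in reservedWords) id
            else names.getOrPut(id) { ('a' + names.size).toString() }
        }
    }
    ```

    Both `val total = price * qty` and `val area = width * height` minify to `val a = b * c`, so nothing looking only at the output can tell which one you started with.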