
JavaScript parser in Python [closed]


Problem :

There is a JavaScript parser at least in C and Java (Mozilla), in JavaScript (Mozilla again) and Ruby. Is there any currently out there for Python?

I don’t need a JavaScript interpreter, per se, just a parser that’s up to ECMA-262 standards.

A quick Google search revealed no immediate answers, so I’m asking the SO community.

Solution :

Nowadays, there is at least one better tool, called slimit:

SlimIt is a JavaScript minifier written in Python. It compiles
JavaScript into more compact code so that it downloads and runs
faster.

SlimIt also provides a library that includes a JavaScript parser,
lexer, pretty printer and a tree visitor.

Demo:

Imagine we have the following JavaScript code:

$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});

And now we need to get email, phone and name values from the data object.

The idea here is to instantiate a slimit parser, visit all nodes, keep only the assignments, and put them into a dictionary:

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}

print(fields)

It prints:

{'name': "'XYZ'", 
 'url': "'http://www.example.com'", 
 'type': '"POST"', 
 'phone': "'9999999999'", 
 'data': '', 
 'email': "'abc@g.com'"}
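
Note that the values in the output above still carry their JavaScript quote characters. A minimal sketch for cleaning them up, assuming the dictionary looks exactly like the printed result; `strip_js_quotes` is a hypothetical helper, not part of slimit, and it does not decode escape sequences:

```python
def strip_js_quotes(text):
    """Drop one pair of matching single or double quotes, if present."""
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return text


# Sample dictionary copied by hand from the output above
# (assumption: it is not produced by running slimit here).
fields = {
    'name': "'XYZ'",
    'url': "'http://www.example.com'",
    'type': '"POST"',
    'phone': "'9999999999'",
    'data': '',
    'email': "'abc@g.com'",
}

# Keep only the three keys the question asks about.
wanted = ('email', 'phone', 'name')
clean = {key: strip_js_quotes(fields[key]) for key in wanted}
print(clean)
```

Selecting keys explicitly also drops the empty `data` entry, which appears only because the nested object has no `value` attribute.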

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, there is a Python API available – so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).

I have translated esprima.js to Python:

https://github.com/PiotrDabkowski/pyjsparser

>>> from pyjsparser import parse
>>> parse('var $ = "Hello!"')
{
"type": "Program",
"body": [
    {
        "type": "VariableDeclaration",
        "declarations": [
            {
                "type": "VariableDeclarator",
                "id": {
                    "type": "Identifier",
                    "name": "$"
                },
                "init": {
                    "type": "Literal",
                    "value": "Hello!",
                    "raw": '"Hello!"'
                }
            }
        ],
        "kind": "var"
    }
  ]
}

It’s a manual translation, so it’s very fast: it takes about 1 second to parse the angular.js file (roughly 100k characters per second). It supports all of ECMAScript 5.1 and parts of version 6, for example arrow functions, const, and let.

If you need support for all the newest ES6 features, you can translate esprima on the fly with Js2Py:

import js2py
esprima = js2py.require("esprima@4.0.1")
esprima.parse("a = () => {return 11};")
# {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'}, 'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument': {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}], 'type': 'BlockStatement'}, 'expression': False, 'generator': False, 'id': None, 'params': [], 'type': 'ArrowFunctionExpression'}, 'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}], 'sourceType': 'script', 'type': 'Program'}
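
Once you have a tree like the pyjsparser result shown earlier, extracting information is mostly a matter of walking nested dicts and lists. A minimal sketch, assuming the tree is a plain Python dict in that shape (the sample is typed in by hand so the snippet runs without any parser installed); `walk` is a hypothetical helper, not part of pyjsparser or Js2Py:

```python
def walk(node):
    """Yield every dict that has a "type" key, depth first."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            for found in walk(value):
                yield found
    elif isinstance(node, list):
        for item in node:
            for found in walk(item):
                yield found


# Hand-written copy of the tree for: var $ = "Hello!"
tree = {
    "type": "Program",
    "body": [{
        "type": "VariableDeclaration",
        "declarations": [{
            "type": "VariableDeclarator",
            "id": {"type": "Identifier", "name": "$"},
            "init": {"type": "Literal", "value": "Hello!", "raw": '"Hello!"'},
        }],
        "kind": "var",
    }],
}

# Collect the names of all identifiers in the tree.
identifiers = [node["name"] for node in walk(tree) if node["type"] == "Identifier"]
print(identifiers)
```

The same generator can be filtered on any other node type, such as "Literal" or "VariableDeclaration".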

As pib mentioned, pynarcissus is a JavaScript tokenizer written in Python. It seems to have some rough edges, but so far it has been working well for what I want to accomplish.

Updated: I took another crack at pynarcissus, and below is a working starting point for using PyNarcissus in a visitor-pattern-like system. Unfortunately, my current client bought the next iteration of my experiments and has decided not to make it public. A cleaner version of the code below is on gist here

from pynarcissus import jsparser
from collections import defaultdict

class Visitor(object):

    # Node attributes that may hold child nodes worth visiting
    CHILD_ATTRS = ['thenPart', 'elsePart', 'expression', 'body', 'initializer']

    def __init__(self, filepath):
        self.filepath = filepath
        # List of functions by line # and set of names
        self.functions = defaultdict(set)
        with open(filepath) as myFile:
            self.source = myFile.read()

        self.root = jsparser.parse(self.source, self.filepath)
        self.visit(self.root)

    def look4Childen(self, node):
        for attr in self.CHILD_ATTRS:
            child = getattr(node, attr, None)
            if child:
                self.visit(child)

    def visit_NOOP(self, node):
        pass

    def visit_FUNCTION(self, node):
        # Named functions
        if node.type == "FUNCTION" and getattr(node, "name", None):
            print(str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.end])

    def visit_IDENTIFIER(self, node):
        # Anonymous functions declared with var name = function() {};
        try:
            if node.type == "IDENTIFIER" and hasattr(node, "initializer") and node.initializer.type == "FUNCTION":
                print(str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.initializer.end])
        except Exception:
            pass

    def visit_PROPERTY_INIT(self, node):
        # Anonymous functions declared as a property of an object
        try:
            if node.type == "PROPERTY_INIT" and node[1].type == "FUNCTION":
                print(str(node.lineno) + " | function " + node[0].value + " | " + self.source[node.start:node[1].end])
        except Exception:
            pass

    def visit(self, root):
        # Dispatch to visit_<TYPE> if it exists, otherwise to visit_NOOP
        call = lambda n: getattr(self, "visit_%s" % n.type, self.visit_NOOP)(n)
        call(root)
        self.look4Childen(root)
        for node in root:
            self.visit(node)

filepath = r"C:UsersdwardDropboxjuggernaut2juggernautparsertestdatajasmine.js"
outerspace = Visitor(filepath)
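
The core of the visitor above is the name-based dispatch in `visit`: the node's type is turned into a method name, with a no-op fallback for types that have no handler. A standalone sketch of just that pattern, using a hypothetical `Node` stand-in rather than real PyNarcissus nodes so that it runs without the library:

```python
class Node(object):
    """Hypothetical stand-in for a parser node: a type, a name, and children."""

    def __init__(self, type, name=None, children=()):
        self.type = type
        self.name = name
        self.children = list(children)


class FunctionCollector(object):
    """Collects the names of FUNCTION nodes instead of printing them."""

    def __init__(self):
        self.found = []

    def visit_NOOP(self, node):
        pass

    def visit_FUNCTION(self, node):
        if node.name:
            self.found.append(node.name)

    def visit(self, node):
        # Look up visit_<TYPE>; fall back to visit_NOOP for unhandled types.
        handler = getattr(self, "visit_%s" % node.type, self.visit_NOOP)
        handler(node)
        for child in node.children:
            self.visit(child)


tree = Node("SCRIPT", children=[
    Node("FUNCTION", name="outer", children=[Node("FUNCTION", name="inner")]),
    Node("VAR"),
])
collector = FunctionCollector()
collector.visit(tree)
print(collector.found)
```

Collecting results in a list rather than printing them makes the visitor easier to reuse and to test.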

You can try python-spidermonkey.
It is a wrapper around SpiderMonkey, which is the code name for Mozilla’s C implementation of JavaScript.
